Voting Systems Certification Standards
Thomas P. Ryan1 and Candice Hoke2
Abstract
This paper analyzes the Diebold Election
Systems, Inc. election management software (GEMS) using publicly accessible postings
of GEMS election databases. It finds
that the GEMS architecture fails to conform to fundamental database design
principles and software industry standards for ensuring accurate data. Thus, in
election tabulations, aspects of the GEMS design can lead to, or fail to
protect against, erroneous reporting of election results. Further, GEMS’s dependence
on Microsoft’s JET technology introduces additional risks to data accuracy and
security.
Despite these
technical and systemic deficiencies,
With unprecedented Federal funding available to States
under the Help America Vote Act of 2002 (
This
paper analyzes the Diebold Election Systems, Inc. (DESI) election management
software named Global Election Management System (“GEMS”) using publicly
accessible postings of GEMS election databases.[8] It finds that the GEMS architecture violates
fundamental design principles and software industry standards for ensuring
accurate data. When utilized for election
tabulations, the GEMS design can lead to data errors, which in turn create a
serious risk for generating erroneous election results.
Either of these design aspects would be
worrisome. For the GEMS database (DB) to
have been structured with fundamental flaws at the levels of both system architecture
and system technology, and yet still obtain Federal and State certification,
raises questions concerning the adequacy of the existing regulatory
standards. Thus the paper turns to ask
what the relationship is between the regulatory standards and the technical
database flaws. It argues the regulatory
standards structurally encourage low DB design standards rather than promoting the
use of tabulation system architecture that meet widely recognized industry standards
for data accuracy and reliability.
This
paper proceeds by briefly reviewing the DB design principles of the First and
Second Normal Forms. In part II, the
paper examines the GEMS DB in light of these fundamental design principles,
concluding that GEMS does not satisfy even the most basic, essential precepts
of the First Normal Form. Further, its use of JET technology renders GEMS
susceptible to additional difficulties. Part
III critically evaluates the federal regulatory structure and standards for
certifying election management software.
It concludes that the federal standards produce the unintended and
injurious consequence of rewarding poor database designs with lower vendor research and development costs, and faster
movement through less intensive certification reviews than if the DB design were
more sophisticated.[10]
I. Database Design Fundamentals
Any successful database (DB) must accurately and
precisely store data without mixing values or losing information--an obvious
essential in managing election results data.
To diminish the incidence of anomalies which reduce the accuracy of DB
contents, computer science and engineering have established fundamental DB design
precepts, including “normalization.”[11]
Normalization
is a methodology of DB design that creates proper relations, removes redundant
data, promotes efficient use of disk space,[12] and reduces the likelihood that accessing and
manipulating data will result in anomalies.
Normal form classification uses consecutive, progressive numerical
titles (e.g., 1NF) to describe in shorthand whether a particular DB has
satisfied the fundamental design precepts.
If a DB design has not been normalized, the DB has been designed in a
manner that fails to prevent avoidable errors and data corruption.[13] For example, when the DB design causes
storage of specific data in multiple locations or tables, updates to that data
can cause anomalies to occur. Failure to
update the specific data in every location virtually simultaneously causes
inconsistencies in the data between the two locations (an update anomaly). Normalized DBs also create correct
dependencies[14] among
data sets. Incorrect dependencies can
create errors when data is added or deleted from the DB.
A. Normal
Forms
1.
First
As Edgar Codd has outlined,[15] satisfaction
of the first Normal Form requires a DB design to (a) eliminate repeating groups
in individual tables (atomicity);[16]
(b) identify each set of related data with a primary key; [17]
and (c) create a separate table for each set of related data.[18]
Violations of
the first Normal Form (1NF) include the flaws of repeating groups, the absence
of unique identifiers, the inclusion of multiple meaningful values in a single
field, and the inclusion of multiple columns representing the same type of
atomic data. Data corruption is highly
probable if any of these violatons are found within the DB design.
2. Second
The overarching purpose of the Second Normal Form
(2NF) is to reduce the amount of redundant and duplicate entries within a DB. A
DB table satisfies 2NF if (a) it conforms to 1NF and (b) each non-primary key
element is dependent upon the primary key.[19] DB satisfaction of 2NF means tables with
repeating information separate the repeating data and reference those records
through the use of “integrity constraints.”
Integrity constraints provide a method to ensure data entry changes or
updates do not result in a loss of data consistency.[20] The most common tool deployed is known as a
foreign key
The first and second Normal Forms contain the most
fundamental design principles for efficient and accurate DBs. Any DB that fails to satisfy the first two
Normal Forms will suffer various failures upon deployment.
This paper’s analysis of the GEMS DB design is based
on review of publicly available GEMS election DBs that are publicly accessible via
the internet.[21]
A.
System Architecture Design Flaws
Analysis of the GEMS DB architecture[22] demonstrates that it violates both 1NF and 2NF.
1.
Violations
of 1NF
The
First 1NF
Violation: GEMS’s Race table violates 1NF because it has multiple
columns representing the same type of atomic data. The Race table is structured
for two columns to contain the same type of atomic data, VGroup1Id and
VGroup2Id, but the purpose of having two columns cannot be distinguished from
examining the table alone. Through
normal operation, the GEMS design creates unnecessary processing and uses DB
storage inefficiently. GEMS thus
violates one of the main purposes of the 1NF:
eliminating duplicative columns from the DB.[23]
Second
1NF Violation: GEMS includes
multiple meaningful values within a single field as demonstrated by the VCenter
table of GEMS. The VCenter table holds
information regarding polling locations but because of the column “Label,” the
VCenter table violates 1NF: multiple meaningful values are held within the
same field. Combining data in this
manner makes it difficult to query voting locations and allows for numerous
entries for the same polling location.
Third 1NF
Violation: Several GEMS tables lack a unique identifier,
a failure demonstrated by review of Figure 1.
Within the two Counter tables, the third
and
forth entries of both Counter tables are indistinguishable from one another
other than by their position in the table.

Figure 1:
These
tables additionally attempt to avoid data duplication via deployment of MS
Access-specific functions, a strategy which has proved to be unreliable.[24]
Other 1NF
Violations The design of a number of other GEMS tables
crucial to producing accurate election results reports violate 1NF principles, with
flaws similar to those inventoried
above. [25]
2.
Violations
of 2NF
The GEMS DB design reveals little if no attempt to
conform to 2NF principles. GEMS lacks
data constraints that ensure data integrity,[26]
and omits referential constraints that ensure data consistency.[27] GEMS also replicates the same data throughout
numerous locations and tables.
The most
troubling violation of the 2NF involves the duplication of data between two
tables holding vote tallies. In Figure 1, for instance, both
CandidateCounter and SumCandidateCounter tables hold total vote data for a
candidate -- which is identified by the CandVGroupID column. This dual depositing scheme for election
results data can easily generate update anomalies — otherwise known as
inconsistencies in election results data for a given candidate and race. After an update anomaly has occurred, it is
impossible to rectify the inconsistency without identifying when the anomaly
occurred, or starting again from the beginning point when data processing
began. If an election ended and the
tables held different numbers in each table, the question becomes which vote
total should be accepted as the correct value.[28]
Eliminating the opportunity for these types of
anomalies is precisely the objective of 2NF.
GEMS’s design thus violates 2NF in tables essential to producing
accurate and consistent election results reports.
Additional exemplars of GEMS’ departures
from 2NF can be identified. DB
architects use data integrity constraints to ensure the type of data is correct
for a given field. These limits ensure a
program cannot input letters where numbers are expected or a negative number
where only positive numbers should be allowed. The SumCandidateCounter table
(see Figure 1), holding vote tally
information, violates 2NF in allowing negative values to be entered into the
table.
VCenterId is the polling location where votes
were registered, and thus should always have a positive value associated with
it.[29] A table that permits incorrect negative values,
such as Figure 2, vitiates one
identifier of DB corruption.
|
Reportunit Id |
VCenter Id |
Counter GroupId |
CandV GroupId |
Total Votes |
|
829 |
-1 |
0 |
1 |
246 |
|
829 |
-1 |
0 |
2 |
45 |
|
829 |
-1 |
0 |
3 |
231 |
|
829 |
-1 |
0 |
4 |
51 |
|
829 |
-1 |
0 |
5 |
252 |
|
829 |
-1 |
0 |
6 |
56 |
|
829 |
-1 |
0 |
7 |
230 |
|
829 |
-1 |
0 |
8 |
49 |
|
829 |
-1 |
0 |
9 |
231 |
Figure 2, Sample Data from
Allowing
negative values into the VCenter column means tracking the origin of votes cannot
occur[30] and
the value ‘-1’ might, but does not necessarily, indicate the table and/or DB is corrupted.[31] Placing data constraints upon the VCenter
table would force the values to remain positive integers.
Integrity constraints provide the foundation for
managing data that resides in several interrelated tables. The election data
management code within the
Microsoft’s
Joint Engine Technology (
But JET’s
limitations have led Microsoft (MS) to
state that JET is inappropriate for systems that require data integrity,
security, and transaction logs and rollbacks.[35]
Microsoft JET … was not intended (or architected) for the
high-stress performance required by 24x7 scenarios, ACID transactions, or
unlimited users, that is, scenarios where there has to be absolute data
integrity or very high concurrency.[36]
An election management system obviously requires both
“absolute data integrity” and in many urban jurisdictions if not all, a “very
high concurrency” of users. Thus, the GEMS’ architects’ choice of inexpensive
JET as the DB engine places the entire election tabulation process at very high
risk.
1. Capacity Limit
of 2 Gigabytes
GEMS
exacerbates JET’s limitations and can lead to DB failure. Its 2 gigabyte limit can easily be exceeded
in large turnout urban elections, especially where central count scanning is
utilized.[37] If the capacity limit is exceeded, database
corruption is highly probable.