GEMS Tabulation Database Design Issues in Relation to

Voting Systems Certification Standards

 

Thomas P. Ryan1 and Candice Hoke2

 

 


Abstract

 

      This paper analyzes the Diebold Election Systems, Inc. election management software (GEMS) using publicly accessible postings of GEMS election databases.  It finds that the GEMS architecture fails to conform to fundamental database design principles and software industry standards for ensuring accurate data. Thus, in election tabulations, aspects of the GEMS design can lead to, or fail to protect against, erroneous reporting of election results. Further, GEMS’s dependence on Microsoft’s JET technology introduces additional risks to data accuracy and security.

Despite these technical and systemic deficiencies, GEMS received approval as complying with Federal Voting System 2002 standards. Questions then arise concerning the adequacy of the 2002 and 2005 regulatory standards.  The paper concludes that the standards structurally encourage and reward election system vendors for using less exacting database design standards.

 

With unprecedented Federal funding available to States under the Help America Vote Act of 2002 (HAVA),[3] election administration has become highly reliant on computer technologies.   While some continue to praise the new voting and tabulation technologies as a significant advance, the augmented computerization has introduced new possibilities for wide-impact election operational errors and may have opened new avenues for tampering with election results.  Previous vulnerability analyses have focused on a direct-recording electronic (DRE) voting machine,[4] a paper ballot optical scanning device,[5]  computerized vote-tallying,[6] and a pilot test of internet voting.[7]  But the systemic design features of currently utilized election tabulation databases have yet to be closely examined.

        This paper analyzes the Diebold Election Systems, Inc. (DESI) election management software named Global Election Management System (“GEMS”) using publicly accessible postings of GEMS election databases.[8]  It finds that the GEMS architecture violates fundamental design principles and software industry standards for ensuring accurate data.  When utilized for election tabulations, the GEMS design can lead to data errors, which in turn create a serious risk for generating erroneous election results.  GEMS architectural design plus its use of Microsoft’s JET technology,[9] introduces significant risk of data errors in elections administered using GEMS.

Either of these design aspects would be worrisome.  For the GEMS database (DB) to have been structured with fundamental flaws at the levels of both system architecture and system technology, and yet still obtain Federal and State certification, raises questions concerning the adequacy of the existing regulatory standards.   Thus the paper turns to ask what the relationship is between the regulatory standards and the technical database flaws.  It argues the regulatory standards structurally encourage low DB design standards rather than promoting the use of tabulation system architecture that meet widely recognized industry standards for data accuracy and reliability. 

        This paper proceeds by briefly reviewing the DB design principles of the First and Second Normal Forms.  In part II, the paper examines the GEMS DB in light of these fundamental design principles, concluding that GEMS does not satisfy even the most basic, essential precepts of the First Normal Form. Further, its use of JET technology renders GEMS susceptible to additional difficulties.  Part III critically evaluates the federal regulatory structure and standards for certifying election management software.  It concludes that the federal standards produce the unintended and injurious consequence of rewarding poor database designs with lower vendor  research and development costs, and faster movement through less intensive certification reviews than if the DB design were more sophisticated.[10] 

 

I.       Database Design Fundamentals

 

Any successful database (DB) must accurately and precisely store data without mixing values or losing information--an obvious essential in managing election results data.  To diminish the incidence of anomalies which reduce the accuracy of DB contents, computer science and engineering have established fundamental DB design precepts, including “normalization.”[11]         

        Normalization is a methodology of DB design that creates proper relations, removes redundant data, promotes efficient use of disk space,[12]  and reduces the likelihood that accessing and manipulating data will result in anomalies.  Normal form classification uses consecutive, progressive numerical titles (e.g., 1NF) to describe in shorthand whether a particular DB has satisfied the fundamental design precepts.  If a DB design has not been normalized, the DB has been designed in a manner that fails to prevent avoidable errors and data corruption.[13]  For example, when the DB design causes storage of specific data in multiple locations or tables, updates to that data can cause anomalies to occur.  Failure to update the specific data in every location virtually simultaneously causes inconsistencies in the data between the two locations (an update anomaly).  Normalized DBs also create correct dependencies[14] among data sets.  Incorrect dependencies can create errors when data is added or deleted from the DB. 

 

A.     Normal Forms

 

1.                          First Normal Form (1NF)

 

As Edgar Codd has outlined,[15] satisfaction of the first Normal Form requires a DB design to (a) eliminate repeating groups in individual tables    (atomicity);[16] (b) identify each set of related data with a primary key; [17] and (c) create a separate table for each set of related data.[18]

    Violations of the first Normal Form (1NF) include the flaws of repeating groups, the absence of unique identifiers, the inclusion of multiple meaningful values in a single field, and the inclusion of multiple columns representing the same type of atomic data.  Data corruption is highly probable if any of these violatons are found within the DB design. 

 

         2.  Second Normal Form (2NF)

 

The overarching purpose of the Second Normal Form (2NF) is to reduce the amount of redundant and duplicate entries within a DB. A DB table satisfies 2NF if (a) it conforms to 1NF and (b) each non-primary key element is dependent upon the primary key.[19]  DB satisfaction of 2NF means tables with repeating information separate the repeating data and reference those records through the use of “integrity constraints.”  Integrity constraints provide a method to ensure data entry changes or updates do not result in a loss of data consistency.[20]  The most common tool deployed is known as a foreign key

The first and second Normal Forms contain the most fundamental design principles for efficient and accurate DBs.  Any DB that fails to satisfy the first two Normal Forms will suffer various failures upon deployment. 


 

II.  GEMS Database Design Flaws

 

This paper’s analysis of the GEMS DB design is based on review of publicly available GEMS election DBs that are publicly accessible via the internet.[21] 

 

A.     System Architecture Design Flaws

 

          Analysis of the GEMS DB architecture[22] demonstrates that it violates both 1NF and 2NF.   

 

1.      Violations of 1NF

 

The GEMS DB design violates fundamental principles of DB architecture for it fails to conform to several 1NF principles.  Consequently, GEMS is susceptible to the common errors and anomalies that 1NF seeks to eliminate.  Most troubling, nonconformity with 1NF can cause erroneous data to be entered into the DB through normal operation of the system.  System failures can then occur without an operator knowing or having any indication that the system is failing because the DB lacks essential design constraints in place to prevent invalid data.

First 1NF Violation: GEMS’s Race table violates 1NF because it has multiple columns representing the same type of atomic data. The Race table is structured for two columns to contain the same type of atomic data, VGroup1Id and VGroup2Id, but the purpose of having two columns cannot be distinguished from examining the table alone.  Through normal operation, the GEMS design creates unnecessary processing and uses DB storage inefficiently.  GEMS thus violates one of the main purposes of the 1NF:  eliminating duplicative columns from the DB.[23]

Second 1NF Violation: GEMS includes multiple meaningful values within a single field as demonstrated by the VCenter table of GEMS.  The VCenter table holds information regarding polling locations but because of the column “Label,” the VCenter table violates 1NF:    multiple meaningful values are held within the same field.  Combining data in this manner makes it difficult to query voting locations and allows for numerous entries for the same polling location. 

Third 1NF Violation:  Several GEMS tables lack a unique identifier, a failure demonstrated by review of Figure 1. Within the two Counter tables, the third

 

 

 

 

and forth entries of both Counter tables are indistinguishable from one another other than by their position in the table. 

 

 

Figure 1: GEMS tables CandidateCounter and SumCandidateCounter

 

These tables additionally attempt to avoid data duplication via deployment of MS Access-specific functions, a strategy which has proved to be unreliable.[24]

Other 1NF Violations  The design of a number of other GEMS tables crucial to producing accurate election results reports violate 1NF principles, with  flaws similar to those inventoried above. [25]

 

2.      Violations of 2NF

 

The GEMS DB design reveals little if no attempt to conform to 2NF principles.  GEMS lacks data constraints that ensure data integrity,[26] and omits referential constraints that ensure data consistency.[27]  GEMS also replicates the same data throughout numerous locations and tables. 

        The most troubling violation of the 2NF involves the duplication of data between two tables holding vote tallies.  In Figure 1, for instance, both CandidateCounter and SumCandidateCounter tables hold total vote data for a candidate -- which is identified by the CandVGroupID column.  This dual depositing scheme for election results data can easily generate update anomalies — otherwise known as inconsistencies in election results data for a given candidate and race.  After an update anomaly has occurred, it is impossible to rectify the inconsistency without identifying when the anomaly occurred, or starting again from the beginning point when data processing began.  If an election ended and the tables held different numbers in each table, the question becomes which vote total should be accepted as the correct value.[28] 

Eliminating the opportunity for these types of anomalies is precisely the objective of 2NF.   GEMS’s design thus violates 2NF in tables essential to producing accurate and consistent election results reports. 

Additional exemplars of GEMS’ departures from 2NF can be identified.  DB architects use data integrity constraints to ensure the type of data is correct for a given field.  These limits ensure a program cannot input letters where numbers are expected or a negative number where only positive numbers should be allowed. The SumCandidateCounter table (see Figure 1), holding vote tally information, violates 2NF in allowing negative values to be entered into the table.

VCenterId is the polling location where votes were registered, and thus should always have a positive value associated with it.[29]  A table that permits incorrect negative values, such as Figure 2, vitiates one identifier of DB corruption.

 

 Reportunit

Id

VCenter

Id

Counter

GroupId

CandV

GroupId

Total

Votes

829

-1

0

1

246

829

-1

0

2

45

829

-1

0

3

231

829

-1

0

4

51

829

-1

0

5

252

829

-1

0

6

56

829

-1

0

7

230

829

-1

0

8

49

829

-1

0

9

231

 

Figure 2, Sample Data from GEMS SumCandidateCounter

 

Allowing negative values into the VCenter column means tracking the origin of votes cannot occur[30] and the value ‘-1’ might, but does not necessarily,  indicate the table and/or DB is corrupted.[31]  Placing data constraints upon the VCenter table would force the values to remain positive integers. 

Integrity constraints provide the foundation for managing data that resides in several interrelated tables. The election data management code within the GEMS software lacks this essential foundation for data accuracy.  Although the GEMS software holds some form of data management, without integrity constraints it is only a matter of time before major problems surface. While modifying the GEMS architecture to include integrity constraints would not completely solve the problem, it would assist in ensuring data accuracy.

 

B.     System Technology Flaws:  Use of JET

 

            Microsoft’s Joint Engine Technology (JET) is a basic DB engine[32] technology that is appropriate for personal computing and very small scale applications requiring DB technology.  Commercially known as Microsoft Access®,  JET is a file-sharing DB that can support DBs with sizes up to 2 gigabytes.[33]  JET is often considered ideal for small DB deployments with very few concurrent user/processes,[34] and can also be used by custom programs to access the data through the Microsoft Data Access Components Application Programming Interface (MDAC API).

            But JET’s limitations have led Microsoft (MS)  to state that JET is inappropriate for systems that require data integrity, security, and transaction logs and rollbacks.[35]

 

Microsoft JET …  was not intended (or architected) for the high-stress performance required by 24x7 scenarios, ACID transactions, or unlimited users, that is, scenarios where there has to be absolute data integrity or very high concurrency.[36]

 

An election management system obviously requires both “absolute data integrity” and in many urban jurisdictions if not all, a “very high concurrency” of users. Thus, the GEMS’ architects’ choice of inexpensive JET as the DB engine places the entire election tabulation process at very high risk. 

 


1.  Capacity Limit of 2 Gigabytes

 

  GEMS exacerbates JET’s limitations and can lead to DB failure.  Its 2 gigabyte limit can easily be exceeded in large turnout urban elections, especially where central count scanning is utilized.[37]  If the capacity limit is exceeded, database corruption is highly probable.

 

2.  Data Corruption During Normal