The following paper was originally published in the
Proceedings of the Fourth Annual Tcl/Tk Workshop
Monterey, California, July 1996

For more information about USENIX Association contact:

1. Phone: 510 528-8649
2. FAX: 510 548-5738
3. Email: office@usenix.org
4. WWW URL: https://www.usenix.org

QuaSR: A Large-Scale Automated, Distributed Testing Environment

Steven Grady, G. S. Madhusudan, and Marc Sugiyama
(steveng@sybase.com, madhu@sybase.com, sugiyama@sybase.com)
Sybase, Inc.

Abstract

The QuaSR project at Sybase, Inc. involves the creation of thousands of automated tests for Sybase, Inc.'s SQL Server, each implemented as an independent Tcl program. The resulting test suite, significantly more than a million lines of code, comprises the largest known Tcl code base. The test harness is written in [incr Tcl] and the test cases in Tcl. Tcl and [incr Tcl]'s extensibility, simplicity, and reliability have made them uniquely suited to the development of a sophisticated automated testing system.

1. Introduction

QuaSR (Quality Systems Re-engineering, pronounced "quasar") is an internal project at Sybase, Inc. to improve the quality of its software, starting with its relational database server product, SQL Server. The primary technical component of this effort is the QuaSR test harness. The harness provides an environment in which small automated test programs may be developed to test arbitrary pieces of functionality, and sets of tests may be combined into single test runs, with distributed resource allocation being handled automatically.
This paper provides an overview of the design of the QuaSR test harness (also known as "QuaSR"), along with samples illuminating the use of the system. It concludes with lessons learned from the application of Tcl and [incr Tcl] to an automated testing environment such as ours.

2. Test Harness Design

2.1 Overview

The goal of the QuaSR project is to deliver a fast, reliable, extensible, and automated test system. Specifically, it must enable overnight execution of the main body of SQL Server regression tests. To this end, it provides for:

  - minimized, automated resource allocation
  - self-analyzing, assertion-based test cases
  - distributed client/server execution
  - standardized test components
  - test case independence

2.2 Test Case Files

(To better understand the system, the following discussion will refer to the example Test Case File, rtrim.tcf. For the sake of brevity, the free-text sections have been edited.)

A Test Case File, or TCF (pronounced "tee-kiff"), is a file containing a set of test cases. The test cases are usually related in the functionality they test. The TCF consists of a declaration of resources available to all test cases, an initialization routine, called tcf_start, a termination routine, called tcf_end, and a set of one or more test cases.

In rtrim.tcf, one resource is declared, a standard database of type stdbempty (a database with a full complement of users defined, but no additional data). It is given the logical name mydb. In tcf_start, a database connection is established to the SQL Server on which mydb resides. The connection is established through the user loginA and given the name sqlcon. This connection is established before any test case in the TCF is executed. Tcf_end closes the sqlcon connection; it is invoked after all test cases are complete. The tcf_run line at the end is an implementation artifact.

The resources section is simple and powerful. In rtrim.tcf, it contains only the declaration of a single database.
However, QuaSR automatically supplies any additional resources necessary to support a standard database; specifically, it results in the implicit declaration of a SQL Server to host the database, three logical devices to support the SQL Server and the database, three physical devices (potentially raw devices or filesystem space) to support the logical devices, a machine that holds the devices and runs SQL Server, and a network address for client-server communication.

Resources can also be declared more explicitly if non-standard configurations are required. For instance, a standard database could also be declared more explicitly as:

    machine mymac
    sql_server -machine mymac mysrv
    stdbempty -sql_server mysrv mydb

The above declaration would ultimately result in exactly the same resource allocations as the simple declaration used in rtrim.tcf. The declaration syntax can, however, be used to declare arbitrarily complex resource relationships. One can declare resources for tests that require interactions between multiple databases, SQL Servers, or machines, non-default device types and sizes, specific platforms or localization values, etc.

2.3 Test Cases

A test case is the smallest unit of functionality in QuaSR. Each test case is given an identifying number and has text describing the assertion under test, a strategy providing an English description of the approach used for the test, and a program implementing the test. The first two sections are normally logged when a test case fails so that all relevant information is readily available for test failure analysis.

rtrim.tcf contains one test case, which tests a negative assertion (one expected to generate an error). Using the resources declared in the resources section and the connection created in tcf_start, it creates the appropriate SQL commands to test the assertion, sends them through the connection, and analyzes the results.
Depending on the results, the test case may generate a PASS, indicating the assertion was found to be valid, a FAIL, indicating the assertion was determined to be invalid, or an UNRESOLVED (caused by the error call), indicating that there was some problem with the test case and that the assertion could not be tested.

2.4 Test Sets

A test set is a set of test cases. An example test set is:

    tests/dml/rtrim.tcf
    tests/dml/select.tcf{1,3,5-8}
    tests/ddl

This test set specifies a set consisting of all the test cases in rtrim.tcf, a subset of the cases in select.tcf, and all the test cases in the directory hierarchy ddl/. Through other QuaSR mechanisms, it is possible to specify test sets such as "all tests that do not require a tape drive", "tests that require SQL Server version 11 that run only on HPs", or "tests that exercise code in cache.c". A test set is used as the basis for resource allocation and usually as the basis for an execution session.

2.5 Test Sessions

A test session normally consists of the following steps:

1. generate
2. scan_res
3. acquire
4. prepare
5. exec
6. release

The steps generate through prepare are used to set up the environment to run a set of tests. The exec step runs the tests. The release step cleans up the session, allowing any acquired resources to be used by others.

Generate is responsible for converting a test set specification file into a format appropriate for use by the rest of the system. For instance, it traverses any directory hierarchies to determine the individual TCFs within them.

Scan_res scans the resource requirements of the TCFs. It reads each TCF, determines the full resource requirements of each, then consolidates the full set of TCF-declared resources into a minimal resource set. Thus, for instance, a set of 100 TCFs may require only a single SQL Server, on a machine with enough space for two standard databases.
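As a rough illustration of the consolidation idea, and not the harness's actual algorithm (which is not spelled out here), the sketch below assumes that TCFs run one at a time and that resources can be reused between them, so a session needs, for each resource type, only the largest count that any single TCF declares. The proc name consolidate and the dictionary format for requirements are hypothetical, and the code assumes Tcl 8.5 or later for the dict command.

```tcl
# Hypothetical sketch: each TCF's needs are a dict of {resource_type count}.
# Assuming TCFs execute sequentially and resources are reusable between
# them, the consolidated set is the per-type maximum over all TCFs.
proc consolidate {tcf_requirements} {
    set minimal [dict create]
    foreach req $tcf_requirements {
        dict for {type count} $req {
            if {![dict exists $minimal $type] || $count > [dict get $minimal $type]} {
                dict set minimal $type $count
            }
        }
    }
    return $minimal
}

# 100 TCFs: most need one standard database, a few need two.
set reqs {}
for {set i 0} {$i < 100} {incr i} {
    if {$i % 25 == 0} {
        lappend reqs {sql_server 1 stdbempty 2}
    } else {
        lappend reqs {sql_server 1 stdbempty 1}
    }
}
puts [consolidate $reqs]
```

Under these assumptions the 100 declarations collapse to one SQL Server and two standard databases. A real consolidation must also account for resource relationships (which database lives on which server, device sizes, platforms), which this sketch ignores.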
(Theoretically, the resulting set may not actually be minimal, but in practice, our algorithm almost always generates a minimal set.)

Acquire uses the minimal resource set as a basis for requesting a machine (or machines) that can support the resources. The resources are allocated from the Resource Manager, and are locked for use during the test session. By allocating the complete set of resources before execution, the user is guaranteed that the test run will not fail during execution due to lack of resources. Platform type, operating system, SQL Server version, and other attributes of the resulting execution environment are recorded for later phases.

Prepare prepares the resources. Support directories are created on the machines, SQL Servers are initialized, and databases are created. This step is the first in which the machines responsible for executing the tests are actually used.

Exec executes the test cases. The user has the option of specifying an execution scenario that contains a subset of the test cases in the original test set. Each TCF is run in turn, binding the logical resource names to the acquired resources, running the tcf_start initialization, executing each of the test cases specified in the scenario, then cleaning up in the tcf_end routine. The results are logged in a journal file, with summary information about the run printed on the standard output. The journal file is a structured file containing all information relevant to a particular run, including resource attributes, debugging information for non-PASSing tests, and timing information.

Release releases the acquired resources, freeing them for others to use.

The separation of the default process into individual steps allows for greater control by the user. For instance, a SQL Server developer can prepare all resources, run the tests, fix bugs demonstrated by the test run, then (using other steps not described above) copy a new version of the SQL Server binary to the acquired machines to re-run the tests.
A test developer could modify the resources declaration in a TCF, then re-scan the resources to verify that the currently-acquired resources are sufficient to support the new declaration. There are also compound steps, such as "setup", which are responsible for executing multiple basic steps.

2.6 User Configuration

QuaSR allows the test runner to specify certain resource configuration values. For instance, the user may specify that unspecified machine platforms should default to "Sparc" or that physical devices should default to raw partitions. Other configuration variables allow the user to specify an alternative binary to execute in place of the standard SQL Server (it will be copied automatically to the remote machines before execution), or to run binaries under a debugger rather than having them invoked automatically.

2.7 Graphical User Interfaces

There are multiple graphical interfaces to the system. One provides an interface to the steps described above (along with information about the current state of the system, buttons to provide terminal connections to SQL Servers, and other conveniences). Another provides a simple way to browse a journal file. Among other things, it provides buttons to traverse the hierarchical format in an intuitive fashion, uses multiple colors to distinguish different types of information, formats query results into a familiar format, and allows the user to inspect the original TCFs.

2.8 Test Code Support

Various support libraries and extensions allow test case writers to write code at a high level conforming to that of the strategy.

QASQL provides commands to communicate with SQL Server, formatting the resulting data stream into a manipulable format.

Undo provides a mechanism for expressing an undo stack; this mechanism is used to record any changes that a test case makes, so that after the execution of the test case, the SQL Servers can be restored to the same state as before the test case was run.
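The undo-stack idea can be sketched in a few lines of Tcl. This is a minimal illustration under stated assumptions, not the Undo library's interface: the names undo_push and undo_all are hypothetical, and the recorded "SQL" is simply appended to a list so that the example runs without a server.

```tcl
# Hypothetical undo stack: each entry is a Tcl script that reverses one change.
set undo_stack {}

# Record a script that undoes a change the test case just made.
proc undo_push {script} {
    global undo_stack
    lappend undo_stack $script
}

# Run the recorded scripts in reverse (last-in, first-out) order, emptying
# the stack as it goes. Errors are collected rather than aborting the unwind.
proc undo_all {} {
    global undo_stack
    set errors {}
    while {[llength $undo_stack] > 0} {
        set script [lindex $undo_stack end]
        set undo_stack [lrange $undo_stack 0 end-1]
        if {[catch {uplevel #0 $script} msg]} {
            lappend errors $msg
        }
    }
    return $errors
}

# Usage: the list "log" stands in for commands sent to a SQL Server.
set log {}
undo_push {lappend log "drop table t1"}   ;# recorded after creating a table
undo_push {lappend log "drop index i1"}   ;# recorded after creating an index
undo_all
puts $log
```

Unwinding in reverse order matters because later changes usually depend on earlier ones; here the index is dropped before the table it was built on.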
The Utility Library provides a set of common high-level procedures and objects to simplify the coding of standard test steps (for instance, the RESULT object in the example test case understands requests for specific pieces of information, such as whether a given server message was contained in the result).

The Log library provides a way to generate uniform, parsable debug messages suitable for later filtering.

The Resources module, which defines the resources available for declaration, supplies a variety of additional methods for information retrieval and resource control, such as determining the exact size of an allocated device, or shutting down and rebooting a SQL Server.

2.9 Utility Scripts

In addition to the journal browser, there are various tools available for analyzing the results of a single run or set of runs. There are tools to process the journal and place the results in a database suitable for querying. There are scripts to summarize the results of a single run and for comparing two runs. Mrsummary summarizes the resource requirements for a particular session. Showsql shows the SQL commands that were sent to SQL Servers in a given session, optionally in a format suitable for input to isql, an interactive SQL shell.

2.10 Other Modules

The agent is a program that must run on each remote machine; it services requests to start processes and capture their output, create and remove files and directories, and other simple, operating-system-specific activities. It is the only part of the system which is ported to all the SQL Server platforms (which include, along with over a dozen UNIX variants, diverse platforms such as VMS, NT, OS/2, and NetWare). The agentlib library provides commands for the client to communicate with the server agents.

The assertion database stores information about assertions and their associated test cases. The results database stores the results of test runs (based on the contents of journal files).
Interactions in the system are controlled by a state machine, which determines what steps are legal at a given stage. This information enhances the main GUI, by limiting users to legal actions.

Tcl, [incr Tcl], and the various extensions are combined into a single interpreter, called squash (Sybase QUality Assurance SHell). Squash is the program that actually runs the code in the test cases.

There is an option to run a full test set in parallel. The algorithm used is very simple: to run n parallel threads, the tests are examined, and those requiring only a standard empty database are placed in n-1 homogeneous buckets; the rest in one heterogeneous bucket. The buckets are then separated into test runs and run individually. Currently, about two thirds of the TCFs fit into the homogeneous buckets. Ignoring time requirements for TCFs, on average a suite can be split into three threads to parallelize at maximum effectiveness: with n = 3, the two homogeneous buckets and the single heterogeneous bucket each hold roughly one third of the TCFs, and adding more threads does not shrink the heterogeneous bucket.

3. SQL Server Test Suite

As of the time of writing (March 1996), thousands of individual test cases have been coded for testing SQL Server, comprising a Tcl code base with a line count in the millions - the largest known Tcl code base in a single project. Some of the cases test multiple variations; the total suite currently performs tens of thousands of product tests. Test suites for products other than SQL Server are also in development.

The SQL Server test cases vary in length from around 10 lines to thousands of lines, with the majority of the simpler cases being under 100 lines. They vary in complexity from those that create and execute a single SQL command and verify the result (as in the test case in rtrim.tcf) to those that have several nested loops, and check the results of tens of SQL commands.

Individual test cases run in times ranging from under one second to about twenty minutes, with the vast majority running in under five seconds (depending on the server platform). On a fast server (e.g. an HP9000/800G with 128 megabytes of memory), a single-threaded run takes under twelve hours.

4. Use of Tcl and [incr Tcl]

The QuaSR test harness is implemented as a combination of C, C++, Perl, Bourne shell, Tcl, and [incr Tcl]. C and C++ are used primarily to provide Tcl extensions (including QASQL, the debug log library, and the agent library) that implement Tcl APIs on top of existing C APIs. Perl and sh are used for a few of the utility scripts, primarily performing file manipulation and process control. The bulk of the design is in Tcl (version 7.3) and [incr Tcl] (version 1.5). The GUIs are written using Tk (version 3.6).

4.1 Statistics

There are about 17,000 lines of Tcl/[incr Tcl] in the harness (plus about 8,000 lines more for the data that go into the standard databases), and 10,000 lines of C and C++. (There are also about 1,000 lines of Bourne shell and Perl scripts.)

4.2 [incr Tcl] classes

The primary use of [incr Tcl] is in the definition of resources. Each resource is a separate class, organized into a single-inheritance hierarchy of about a dozen classes in total. Another hierarchy is used to define types of Resource sets, including the resources associated with a single TCF and the minimal resource set. Class containment is used to describe the relationship between platform-specific versions of SQL Server.

Other [incr Tcl] uses include: the utility library, which uses objects to create high-level interfaces to result streams coming back from SQL queries, and the resource manager, which defines machine attribute requests as objects.

4.3 Conceptual Expression

Nearly all of the conceptually challenging parts of QuaSR are written in Tcl/[incr Tcl]. The resource minimization algorithm, the dynamic resource binding, and the parallelization algorithm are coded in Tcl. The only uses of C/C++ are for C API module interfaces and for the journal browser.
The browser must be able to parse and display files of 10 megabytes and more, so the processing code was rewritten in highly optimized C.

4.4 Test Case Tcl Usage

Because few of the test developers on the project had experience in object-oriented design and implementation, we decided that they could ramp up more quickly if they did not have to learn about [incr Tcl] and class design. In test case code, we minimize the exposure of [incr Tcl] to the use of method invocations on declared resources within the test cases. (Of course, we make full use of [incr Tcl]'s capabilities in the test harness code.)

5. Benefits of Tcl and [incr Tcl]

There was some initial apprehension about the choice of Tcl and [incr Tcl]. In hindsight, Tcl and [incr Tcl] have been suitable for both the harness and the test suite.

5.1 Easy to Learn

Part of the project was hiring and training a group of programmers who would develop tests under our system. At the time of hiring, almost none of them had used Tcl. Tcl's simplicity reduced the ramp-up time for developing tests under QuaSR.

5.2 Extensible

The use of [incr Tcl] classes as the basis for resource design made possible a truly powerful resource declaration language that can easily be extended as new resource requirements are identified. Even significant redesign of the class hierarchy has been possible due to the modular coding available with [incr Tcl]. During the course of the project, new resource classes have been added and old ones changed, with no backwards incompatibilities in the TCFs and few changes in other parts of the harness.

5.3 Embeddable

Standing alone, Tcl with [incr Tcl] would not have been sufficiently powerful to meet the goals of QuaSR in a reasonable timeframe. The embeddable nature of the Tcl interpreter made it possible to build a superstructure (squash) that defines procedures and objects at an appropriate conceptual level.
5.4 Interpreted

Because Tcl is interpreted, and because of the easy transformation between code and data, it has been possible to implement some very powerful constructs around the test cases. Test case code is stored as an [incr Tcl] instance variable in an object, and later executed. The execution is wrapped in code that catches exceptions, invokes user-specified hooks, and processes undo statements to reset the environment, all implemented with almost no effort. Adding further functionality, such as invariant checks between test cases, is similarly simple.

The interpreted nature of QuaSR has also made it easy to test the harness itself, since the tests have easy access to the internal harness procedures. Also, it was possible to implement simple tests for the GUIs by invoking widget actions via Tk's send command.

5.5 Public Domain

Because most of the components of QuaSR are based on public-domain code (Tcl, [incr Tcl], Don Libes' Tcl debugger, Perl, etc.), experimentation with different potential tools required little investment of time or money. The reduced risk made it possible to take more chances when trying to find the right solution.

The reliability of both Tcl and [incr Tcl] has been outstanding, attributable at least in part to their public-domain status, resulting in thousands of developers being able to identify and fix problems.

5.6 List and String Processing

Ultimately, testing SQL Server comes down to sending Transact-SQL commands to the server and verifying that the results are as expected. A language that allows strings to be created easily is appropriate to generating the commands to be sent, and a language that handles structured lists easily is appropriate for analyzing the response stream. The resulting test case code has been easy to read and write.

5.7 [incr Tcl] Cleanly Integrated

Minimizing the exposure of [incr Tcl] to the test case writers would not have been possible had [incr Tcl] not been cleanly designed.
As it is, test coders do not need to learn anything about [incr Tcl] beyond the structure of a method invocation.

6. Disadvantages of Tcl and [incr Tcl]

While most of our initial concerns about the use of Tcl and [incr Tcl] were unfounded, there are a few areas that need to be improved before their use in this project can be called a complete success.

6.1 Performance

Although the bulk of the processing during test case execution is on the server, much of the preparation time is on the client, particularly the scan_res phase, which is responsible for consolidating the resources. The bottleneck here is [incr Tcl]'s performance when handling more than a handful of objects. [incr Tcl] 2.0 is supposed to alleviate this problem, but we have not completed the transition to [incr Tcl] 2.0. The time for simply parsing the TCFs is non-trivial when they comprise over a million lines, and complex algorithms can be painfully slow.

6.2 Memory Consumption

[incr Tcl] 1.5 is a memory hog. This becomes a problem when a large amount of data is held in memory. Again, we are hoping [incr Tcl] 2.0 will alleviate this problem.

6.3 Lack of Development Tools

Although we make some use of public-domain development tools, such as Don Libes' debugger, we sorely feel the lack of industrial-strength tools:

  - a truly integrated debugger that also understands [incr Tcl].
  - a profiler with profiling for Tcl procedures and for [incr Tcl] objects and methods.
  - a syntax checker. In particular, each test case must be visually inspected to make sure there are no syntax errors, a tedious effort that a compiler would render unnecessary.
  - a code coverage analyzer.
  - a Tcl compiler - we are testing a couple of recently released compilers.
  - an object browser for [incr Tcl]. This is mandatory if any serious programming has to be done in [incr Tcl]. Nautilus is a start, but we need tools similar to those supplied with most advanced C++ environments.
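A small part of the manual syntax inspection mentioned above can be automated with Tcl's built-in info complete command, which reports whether a string has balanced braces, brackets, and quotes. The sketch below is only a partial check under that assumption: it does not detect misspelled commands or wrong arguments, so it is no substitute for a real syntax checker. The proc name unbalanced_scripts and the file names are hypothetical.

```tcl
# Return the names of the scripts whose text is not a complete Tcl script,
# that is, has unmatched braces, brackets, or quotes. The input is a flat
# list of name/script pairs.
proc unbalanced_scripts {named_scripts} {
    set bad {}
    foreach {name script} $named_scripts {
        if {![info complete $script]} {
            lappend bad $name
        }
    }
    return $bad
}

# Usage: the second script is missing its closing bracket.
set good   "set x 1\nputs \$x"
set broken "set y \[expr {1 + 2}"
puts [unbalanced_scripts [list good.tcf $good broken.tcf $broken]]
```

In practice the script text would be read from each TCF, for example with open and read, before being passed to the check.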
6.4 Rapid Change in Tcl Versions

QuaSR is an extremely large system that is currently being rolled out to 300-500+ users. The frequency with which Tcl and its extensions change is a double-edged sword. On one hand we get quick bug fixes, but the price we pay is frequent upgrades. Once the system goes into production, frequent upgrades will not be feasible.

6.5 Error Reporting

Error reporting can be improved. The main problem is that errors are not specific enough about their location and the nature of the problem.

7. Use of Alternate Test Harnesses

QuaSR uses the X/Open TET harness as its underlying harness. It was chosen primarily because of its successful use in other projects and its support for assertion-based tests. Its distributed nature was also a factor in its selection. DejaGnu could also have been used, but we were not sure about its maturity and ability to support distributed tests.

Having said this, it should be pointed out that the underlying test harness plays a very minor role in the current QuaSR system, and hence the choice of an underlying harness is not very germane. This may change if the system is used for interactive testing, where DejaGnu has definite advantages. But in the current non-interactive regime of tests, most of the work is done by the test case and the distributed agent, with the underlying harness merely acting as a test case sequencer. If we ever decide to switch to DejaGnu, the effort will not be significant since the underlying harness is fairly well isolated from the rest of the system.

8. Futures

QuaSR is undergoing continuing development. The following changes are expected in the near future:

8.1 New Features

QuaSR was originally designed for SQL Server testing. Relatively simple mechanisms can be added to support client testing as well, including interoperability testing of arbitrary client-server combinations.

Support for testing server products other than SQL Server can be added relatively easily through additions to the resource class tree.
Once the new resources are in place, tests can simply declare the new resource and use it. The new tests would operate cleanly with tests in the existing suite.

8.2 Updated Software

QuaSR is currently being rolled out to various test and development groups at Sybase, Inc. In order to preserve stability, we have not integrated the latest versions of the underlying software. We expect to switch to Tcl 7.5, Tk 4.1, and [incr Tcl] 2.0 when we have the time to deal with any problems generated by the switch.

8.3 Performance Optimization

The current system meets the broad performance goals. However, more work is required before all the specific performance goals are met.

9. Conclusions

The QuaSR project is ambitious both in its scope and its performance goals. The use of Tcl and [incr Tcl] as a basis for its implementation has its pros and cons, but the benefits outweigh the disadvantages. The only significant drawback is the lack of industrial-strength development tools.

From a development standpoint, it is clear that the high-level nature of Tcl was beneficial. Both the harness design team and the test writing team found that the bulk of the design time was spent thinking about conceptual problems; once a solution was devised, it was straightforward to implement in Tcl. The developers spent their time much more effectively than they would have using C or another low-level language.

As a testing tool, Tcl's clean syntax and easy manipulation of structured data allow for powerful tests to be written simply and clearly. Inefficiencies due to interpretation are of little importance, particularly given the client-server nature of the product under test. Similarly, the high-level interfaces made convenient by [incr Tcl] allow for powerful support libraries. High-level libraries allow tests to be written at a level close to that of their pseudo-code strategies.
Although there are challenges involved in using Tcl for a single, large-scale program, it is perfectly suited to the development of (thousands of) small programs. In particular, QuaSR, despite its size, has not been impaired by the use of Tcl, and in fact has been well-served by Tcl and [incr Tcl]'s reliability, simplicity, and extensibility.

10. Bibliography

[Libe] Don Libes, "A Debugger for Tcl Applications", Proceedings of the Tcl/Tk Workshop, University of California at Berkeley, June 10-11, 1993.

[McLe93] M. J. McLennan, "[incr Tcl]: Object-Oriented Programming in Tcl", Proceedings of the Tcl/Tk Workshop, University of California at Berkeley, June 10-11, 1993.

[Savo] Rob Savoye, "The DejaGnu Testing Framework". Available at https://www.cygnus.com/doc/dejagnu/dejagnu_toc.html.

========= rtrim.tcf =========

resources {
    stdbempty mydb
}

tcf_start {
    mydb login loginA sqlcon
}

tcf_end {
    sqlcon close
}

testcase 1 -assertion {
    When "rtrim" is used with less than or greater than one argument,
    then error 174 is generated.
} -strategy {
    1. set up list of no argument and two arguments
    2. execute the following command:
       select rtrim(argumentlist)
    3. if SQL Server does not return error number 174 return FAIL
    4. repeat steps 2 and 3 for all test variations
    5. if all test variations are successful, return PASS,
       otherwise, generate UNRESOLVED
} -code {
    set pass_count 0
    # Argument lists to try: no argument, then two arguments.
    set testlist {
        { }
        { abc, xyz }
    }
    set totvariation [llength $testlist]
    foreach test $testlist {
        set cmd "select rtrim($test)"
        SQL_cmd RESULT sqlcon $cmd
        # Every variation is expected to produce server error 174.
        if { ! [$RESULT servermsg 174] } {
            util_log "expected error 174 NOT returned by server"
            return FAIL
        }
        incr pass_count
    }
    # A count mismatch means the test itself misbehaved: UNRESOLVED.
    if { $pass_count == $totvariation } {
        return PASS
    } else {
        error "Expected variations $totvariation but got $pass_count"
    }
}

tcf_run
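To make the mapping from test code to PASS, FAIL, and UNRESOLVED concrete, here is a minimal sketch of how a harness could run the -code body of a test case such as the one above. It is an assumption about the mechanism, not QuaSR's actual code: run_testcase and __tc_body are hypothetical names, and the body is wrapped in a temporary proc so that "return PASS" and "return FAIL" yield ordinary results, while a call to error is caught and reported as UNRESOLVED.

```tcl
# Hypothetical runner: evaluate a test-case body and classify the outcome.
# Returns a two-element list: the verdict and an explanatory message.
proc run_testcase {code} {
    # Wrap the body in a throwaway proc so "return PASS" yields a result.
    proc __tc_body {} $code
    if {[catch {__tc_body} result]} {
        # An uncaught error means the assertion could not be tested.
        set verdict [list UNRESOLVED $result]
    } elseif {$result eq "PASS" || $result eq "FAIL"} {
        set verdict [list $result ""]
    } else {
        # Any other return value is treated as a defect in the test itself.
        set verdict [list UNRESOLVED "unexpected result: $result"]
    }
    rename __tc_body {}
    return $verdict
}

puts [run_testcase {return PASS}]
puts [run_testcase {return FAIL}]
puts [run_testcase {error "Expected variations 2 but got 1"}]
```

A fuller wrapper would also run user-specified hooks and unwind recorded undo statements after the body finishes, whatever the verdict.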