The following paper was originally presented at the
Third Annual Tcl/Tk Workshop
Toronto, Ontario, Canada, July 1995
sponsored by Unisys, Inc. and USENIX Association

It was published by USENIX Association in the 1995 Tcl/Tk Workshop Proceedings.

For more information about USENIX Association contact:

1. Phone: 510 528-8649
2. FAX: 510 548-5738
3. Email: office@usenix.org
4. WWW URL: https://www.usenix.org

Automatic Generation of Tcl Bindings for C and C++ Libraries

Wolfgang Heidrich
Computer Graphics Lab
University of Waterloo, Canada
wheidrich@cgl.uwaterloo.ca
Heidrich@informatik.uni-erlangen.de

Philipp Slusallek
Computer Graphics Group
Universitaet Erlangen-Nuernberg, Germany
Slusallek@informatik.uni-erlangen.de

Abstract

In the past few years Tcl has found widespread interest as an extensible scripting language. Numerous Tcl interfaces for a variety of C libraries have been created. While most of these language bindings have been created by hand, others have made use of dedicated code generators designed for the specific library. In this paper we present a tool for the automatic generation of Tcl language bindings for arbitrary C libraries. Moreover, the mapping of C++ class hierarchies to [incr Tcl] classes will be described.

1. Introduction
---------------

One of the reasons for the recent success of Tcl is its powerful API to C and C++, which allows the extension of the core language with commands implemented as C functions.
This facility has been used to create a variety of language bindings for C libraries, ranging from different 3D graphics libraries (IRIS GL, OpenGL, VOGLE, SIPP) to several X widget sets, for example Wafe and tclMotif. While most of this work has been done manually, other bindings, like Wafe [Neumann 93], have been made with the help of dedicated code generators, which create the required C code from a simpler description file.

However, none of these semi-automatic systems is capable of creating Tcl bindings for C++ class hierarchies. In [Beier 94] Beier describes a framework for developing C++ class hierarchies in such a way that Tcl bindings can be created easily. The implementation of these classes, however, has to be done manually.

In this paper we will present a tool called Itcl++, which can create Tcl bindings for C libraries automatically from the C header files. Moreover, it can automatically map C++ class hierarchies to equivalent hierarchies in [incr Tcl], an object-oriented extension of Tcl [McLennan 93].

1.1 The Problem
---------------

New functionality can be added to Tcl interpreters by registering C functions of a specific type, which can then be accessed using the normal Tcl command syntax. Parameters are passed to these functions as an array of strings, much in the same way program arguments are passed to the C function main(). The functions then have to parse these strings and convert them to C values and data structures.

The major problem which arises when trying to attach existing C or C++ library functions to Tcl is that they do not normally receive their arguments using this argc/argv mechanism. The developer has to write a C wrapper function, which parses the parameter list, converts the string of each argument to the correct C type, and passes these arguments to the C function. Return values and other output arguments must then be converted back into strings in order to be stored in a Tcl variable.
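To make the shape of such a hand-written wrapper concrete, the following is a minimal sketch. The library function "scale" and the wrapper "scaleCmd" are hypothetical names chosen for illustration. A real Tcl command procedure would also receive a ClientData value and a Tcl_Interp pointer; in this sketch the result string is passed explicitly so that the code compiles without the Tcl headers.

```cpp
#include <cstdio>
#include <string>

// Hypothetical existing library function that knows nothing about Tcl.
int scale(int value, int factor) { return value * factor; }

// Stand-in for a Tcl command procedure (assumption: simplified signature).
// Returns 0 on success and 1 on error, mirroring TCL_OK and TCL_ERROR.
int scaleCmd(int argc, const char *argv[], std::string &result)
{
    int value, factor;

    // argv[0] is the command name, as in main().
    if (argc != 3) {
        result = std::string("wrong # args: should be \"") + argv[0] +
                 " value factor\"";
        return 1;
    }

    // Convert each argument string to the C type the function expects.
    if (std::sscanf(argv[1], "%d", &value) != 1 ||
        std::sscanf(argv[2], "%d", &factor) != 1) {
        result = "expected integer argument";
        return 1;
    }

    // Call the library function and convert its return value back
    // into a string for the interpreter.
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%d", scale(value, factor));
    result = buffer;
    return 0;
}
```

Only the argument count, the conversion formats and the call itself depend on the wrapped function; the rest is boilerplate.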
However, all wrapper functions are very similar to each other. Their main functionality, that is, argument parsing and translation, can be created automatically if sufficient information about argument types is provided. The authors of Wafe use specification files with a special syntax for the description of C functions, widgets and special widget properties. These specification files are then parsed by a Perl script which creates the appropriate C code.

The problem becomes even more complicated if not only functions, but also C++ objects are to be accessed. Not only must a wrapper function be created for each public member function, but since C(++) data cannot be addressed directly from Tcl, string handles need to be assigned, where each handle represents one C++ object on ``the Tcl side'' of the application. Tables that translate handles to and from C++ objects can be implemented using the hash tables provided by Tcl.

Moreover, our main goal was to transparently encapsulate C++ functionality in [incr Tcl] classes and objects. This means that [incr Tcl] classes have to be built, with every member function calling the corresponding C++ function wrapper (see Figure 2 and Section 3.2). All necessary code should be created without human intervention, if at all possible.

2. Structure of Itcl++
----------------------

To meet these requirements, we decided to use a two-step strategy. In the first step, C or C++ header files are parsed and specification files are created from function declarations and C++ class definitions. Additional information from a type declaration file, which contains information about complex C data types like structures or enumerations, is used.

The functionality of the specification files is a superset of those used in the Wafe project. In contrast to Wafe, which uses a proprietary file format, our specification files are just Tcl scripts which are executed with a predefined set of functions.
This approach ensures that our code generators have enough flexibility to handle even very complex situations, as arbitrary Tcl commands may be executed within the specification file.

The generation of specification files is partially based on heuristics, since the semantics of a parameter cannot always be determined by just evaluating its declaration. Consider, for example, the following function.

    void foo( Foo *f );

No decision is possible about whether "f" is supposed to be a pointer to an object of type "Foo", or an array of "Foo" objects: the two alternatives obviously require different conversion code. In cases where ambiguities occur, heuristics must be used, and a warning message is generated. Decisions made in this step may be overridden by simply editing the specification file.

This specification file, perhaps with some modifications made by hand, now contains all the information required to create C++ and [incr Tcl] code in the second step. Every semantic ambiguity in the specification file should now have been resolved, so that code generation can take place without further intervention. Both [incr Tcl] and C (or C++) code are generated, where the [incr Tcl] part is only necessary if arrays or C++ objects are used.

As mentioned above, both code generators are Tcl scripts which define a set of functions and then call the specification files. Another Tcl script is available to generate manual pages from the specification. Figure 1 shows the interaction of all the parts.

3. Generating Code from Specification Files
-------------------------------------------

The specification files contain one command for every C function which is to be mapped to Tcl. The following example specifies that the C function "foo", which takes an integer and produces an integer as a return value, should be made available in Tcl as a command which is also named "foo".

    command int {} foo {
        in int {cname value}
        cmdCode {returnVar= foo( value );}
    }

The line starting with keyword "in" declares the input parameter for the function, with the second entry being the C type, and the third entry being a list of option/value pairs. The option "cname" in the above example is used to assign a name to the variable used in the C code of the wrapper function. If no name is specified, the names "localVar", "localVar1" and so on are used. The same mechanism applies to return values. In our example the option list for the return type is empty, and the default variable name "returnVar" is used.

The line starting with "cmdCode" contains the C code which is to be executed after parameters have been parsed, converted and stored in C variables. In our example the C function "foo" is called with its parameter. Then the result, an integer, is stored in the C variable "returnVar", which is the default name for the variable holding the return value. The code for converting the incoming Tcl parameter, a string, to the correct C type, and for converting the return value back to Tcl, is generated automatically.

Another way of returning values which is often used in C is to pass a pointer to a variable as a function parameter. These semantics can be specified as follows:

    command void {} bar {
        out int {cname outValue}
        cmdCode {bar( &outValue );}
    }

True call by reference semantics as available in C++ can be specified in a similar way.

To allow the conversion of very complex data types, which might require temporary memory, clean-up code may also be specified. The clean-up code usually frees dynamically allocated memory and is executed as the last command in the wrapper function, after all output and return values have been written back to Tcl.

In the following we will show how the different C and C++ data types can be mapped to Tcl. In Section 4 we will describe how the specification files can be generated automatically.
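Returning to the "bar" specification earlier in this section, a wrapper for an output parameter might look roughly like the sketch below. It is not the code Itcl++ emits. It assumes that the Tcl caller passes the name of the variable that should receive the value, and it uses a std::map named "VarTable" (a hypothetical stand-in) where a real wrapper would set a variable in the interpreter, for example with Tcl_SetVar.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Hypothetical library function that returns a value through a pointer.
void bar(int *out) { *out = 42; }

// Stand-in for the interpreter's variable table (assumption, see lead-in).
typedef std::map<std::string, std::string> VarTable;

// Sketch of a wrapper for:
//     command void {} bar { out int {cname outValue} ... }
// argv[1] names the variable that receives the output value.
// Returns 0 on success and 1 on error.
int barCmd(int argc, const char *argv[], VarTable &vars)
{
    int outValue = 0;   // variable named by the "cname" option

    if (argc != 2)
        return 1;

    // The cmdCode from the specification is executed first.
    bar(&outValue);

    // Then the output value is converted to a string and written back.
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%d", outValue);
    vars[argv[1]] = buffer;

    // Clean-up code, if any were specified, would run here as the
    // last step; this example allocates nothing.
    return 0;
}
```

The ordering in the sketch follows the text: command code, then write-back of output values, then clean-up.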

3.1 Basic C Types
-----------------

For the conversion of C types between Tcl and C, only C code needs to be generated. No Tcl or [incr Tcl] code is necessary, and no C++ features are used, except for array types, which we will discuss below.

A separate type specification file, which again is a Tcl script, contains the necessary information to convert values between C and Tcl. Simple C types, like integers, floats, characters and strings can be converted using "scanf" for input, and "printf" for output arguments. This information is stored in a Tcl array variable called "conversion":

    set conversion(scanf,int) "%d"
    set conversion(printf,int) "%d"

Using this information, the following C code will be generated for every integer input variable. Recall that the default name for the C variable used to hold the value is "localVar".

    /* argv[i] holds the i-th parameter */
    if( sscanf( argv[i], "%d", &localVar ) != 1 )
        /* error handling */
        ...

A similar technique is used to create the output code.

More complex C data types like enumerations do not have a "natural" representation in Tcl. However, string constants can be used to represent the C constants in Tcl. Consider the following C type declaration.

    typedef enum {
        Monday, Tuesday, Wednesday, Thursday,
        Friday, Saturday, Sunday
    } Days;

The following lines in the type file specify that type "Days" is an enumeration, with Tcl string "Mon" corresponding to the C constant "Monday", "Tue" corresponding to "Tuesday", and so on.

    set conversion(enum,Days,Mon) Monday
    set conversion(enum,Days,Tue) Tuesday
    ...

The code generated from this specification is a cascaded "if" statement.

    if( !strcmp( argv[i], "Mon" ) )
        localVar= Monday;
    else if( !strcmp( argv[i], "Tue" ) )
        localVar= Tuesday;
    ...

The same principle can be applied when a set of C preprocessor macros can be passed as a parameter. Many commercial libraries use macros instead of enumerations, because the latter are not supported by older compilers.
In this case the macro names can be mapped to Tcl strings by introducing a pseudo enumeration type in the type file.

There are two ways to handle C structures. One approach is to create a system of handles, and only pass these handles to Tcl, instead of the real data. Read or write access to single components of the structure would then be implemented by a call to a C function. We will describe such a system of handles when we discuss the conversion of C++ objects in Section 3.2.

For C structures, however, this approach to accessing the components from Tcl is overly complicated. We therefore chose to convert structures to associative arrays in Tcl. This means that all components of a structure are passed to Tcl and stored in an array, where the component names are used as indices.

To see how structure types can be specified in the type file, consider the following C type definition for a structure holding information about an employee.

    typedef struct {
        char *name;
        int sin;
        float salary;
    } Employee;

The entry in the type file which specifies this C structure is a comma-separated list of components and their types:

    set conversion(struct,Employee) "(char *) name,int sin,float salary"

After the code for this type has been generated, its components can be accessed in Tcl using normal array syntax. For example, the social insurance number of an employee can be accessed like this:

    employee(sin)

The code generated for the conversion of structures recursively applies type conversion to each of the components of the structure.

Like structures, arrays can be handled in two different ways. Again we have to choose between a system based on handles, and one based on the direct mapping of the array contents to a Tcl data structure such as a list. The latter approach, however, raises the problem of deciding at runtime how many elements a passed array contains. Another problem arises with this method when it is used in a C++ context.
Suppose an array is returned from a function call. All its elements will be extracted, and put into a Tcl list. Later, when we want to pass this array to another C++ function, all elements will be transferred back, and a new array will be constructed. While these copy semantics are not usually a problem for arrays of simple types, they might be extremely harmful in the case of arrays of C++ objects, since for each element in the array a constructor will be called.

Therefore we decided to use the same system of handles as for C++ objects to support arrays. In Tcl, these handles are encapsulated in [incr Tcl] objects, which have methods for reading and writing single entries, as well as assigning lists of values to arrays.

3.2 C++ Classes
---------------

The syntax for the specification files presented above can easily be extended to C++ classes. The following is a specification of a simple counter class containing methods for incrementing and decrementing the counter by a given value, and a method for querying the current value. Furthermore it has one constructor, and one destructor. Since destructors in C++ do not take arguments, the existence of a public destructor need only be specified with a binary flag.

    class Counter {} {
        constructor Constructor {
            cmdCode {returnVar= new Counter();}
        }
        destructor
        member int {} getValue {
            cmdCode {returnVar= self->getValue();}
        }
        member void {} += {
            in int {cname value}
            cmdCode {self->operator+=( value );}
        }
        member void {} -= {
            in int {}
            cmdCode {self->operator-=( localVar );}
        }
    }

Since C++, unlike [incr Tcl], allows for more than one constructor in every class, every C++ constructor is mapped to an [incr Tcl] procedure. These procedures will call the [incr Tcl] constructor to create a new object, and will then invoke the corresponding C++ constructor. A more detailed description of the generated [incr Tcl] code can be found in [Heidrich 94].
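For orientation, the sketch below shows a C++ class that would match the Counter specification above, together with two hypothetical member wrappers that execute the corresponding "cmdCode" fragments. The class body, the wrapper names and the simplified wrapper signature (no Tcl_Interp, result passed as a std::string) are assumptions made for illustration. The "self" pointer is passed in directly here, whereas a real wrapper would first resolve it from the object's handle.

```cpp
#include <cstdio>
#include <string>

// Hypothetical C++ class matching the specification above.
class Counter {
public:
    Counter() : count(0) {}
    ~Counter() {}
    int getValue() const { return count; }
    Counter &operator+=(int value) { count += value; return *this; }
    Counter &operator-=(int value) { count -= value; return *this; }
private:
    int count;
};

// Sketch of a wrapper for "member void {} += { in int {cname value} ... }".
// Returns 0 on success and 1 on error.
int counterAddCmd(Counter *self, int argc, const char *argv[],
                  std::string &result)
{
    int value;   // named by the "cname" option
    if (argc != 2 || std::sscanf(argv[1], "%d", &value) != 1) {
        result = "expected integer argument";
        return 1;
    }
    self->operator+=(value);   // cmdCode from the specification
    result.clear();            // void member: empty result
    return 0;
}

// Sketch of a wrapper for "member int {} getValue { ... }".
int counterGetValueCmd(Counter *self, int argc, const char *argv[],
                       std::string &result)
{
    (void)argv;   // no input parameters besides the member name
    if (argc != 1) {
        result = "wrong # args";
        return 1;
    }
    int returnVar = self->getValue();   // cmdCode from the specification
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%d", returnVar);
    result = buffer;
    return 0;
}
```

Apart from the extra "self" argument, the member wrappers have the same structure as wrappers for plain C functions.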

Instead of having different wrappers for each public function of a class, we decided to group these functions into four categories (constructor, destructor, static and non-static member), for each of which we create one wrapper in order to prevent replication of code. The function call scheme is illustrated in Figure 2.

In order to keep track of the C++ objects referenced by [incr Tcl], we use an object server, which consists of two hash tables. One hash table maps C++ pointers to [incr Tcl] object names and is used for handling return values. It is important that this table contains pointers to all C++ subobjects of every registered C++ object. That is, for each registered C++ object the table contains one entry for each class in the inheritance hierarchy of the object.

The second table maps [incr Tcl] objects to C++ object pointers. Again we need different pointers for every class in the inheritance graph of the object, so we cannot just take the [incr Tcl] object name as a hash key, since there is no one-to-one correspondence between names and pointers. Instead, we have to assign a unique handle to each [incr Tcl] subobject. This handle is stored in a private "self" variable within every subobject. During construction of an [incr Tcl] object, the constructor wrapper recursively calls the wrappers of the superclasses, with each wrapper registering the corresponding subobject and initializing the "self" variable (see Figure 3).

The object server is used to convert the arguments when C++ functions are called from [incr Tcl]. If a member function takes an object as one of its parameters, the wrapper function simply queries the object handler for the address of this object. Since every [incr Tcl] object is assigned a C++ object at creation time, this pointer always exists. If, however, a C++ function returns a pointer to an object, this object may or may not be registered, that is, there may or may not be a corresponding [incr Tcl] object.
If such an object exists, its name should be returned to [incr Tcl], otherwise a new [incr Tcl] object must be created and registered with the returned C++ object in the object server. In this case, if the object handler is not able to find a matching entry in its hash table, it creates a new [incr Tcl] object, which in turn registers itself with the object handler (see Figure 4).

One problem that arises with complex class hierarchies is that [incr Tcl] does not support repeated inheritance, i.e. a class may only occur once in the inheritance graph of another class. Thus the inheritance graph for every class may only be a tree instead of an arbitrary DAG as in C++, for example, when using virtual base classes. This means that C++ class hierarchies that use this feature cannot be completely mapped to [incr Tcl], but rather the [incr Tcl] hierarchy has to be cut below the point where this problem would occur.

While this is clearly a restriction, in practice the consequences do not seem to be severe, since in C++ this feature is most often used to provide groups of classes with low-level functionality, for example being writable to some sort of stream. In cases where [incr Tcl] is used as a high-level interface on top of a C++ hierarchy, which seems to be the most appropriate range of application, one could as well do without such low-level functionality. Nonetheless, we think that [incr Tcl] should be changed to support the full C++ semantics of inheritance in the future.

4. Generation of the Specification Files
----------------------------------------

The specification files can be generated by a Tcl script which parses a list of ANSI C or C++ header files. It uses the information from the type description files and from built-in rule tables to figure out the semantics of a given parameter. As mentioned above, some of these rules need to be heuristic.
Most of these heuristics deal with the problem of how to interpret pointer parameters: as arrays or as output values. The rules will determine, for example, that an argument of type "char *" is usually a string, while an argument of type "int *" is probably an output value. Whenever a heuristic rule is used, a warning message will be generated in the output file, so that it is easy to verify the correctness of the decision.

The parser also instantiates C++ templates. This is done by looking for simple type definitions which involve template types, but it is also possible for the programmer to directly specify which instances should be generated for a given template.

In the future, the type information could also be extracted automatically from the definition of enumerations and structures. It is clear, however, that the mapping of C macros to pseudo types mentioned in Section 3.1 would still have to be specified manually.

5. Results
----------

5.1 Use of Itcl++ with our own Class Hierarchy
----------------------------------------------

Itcl++ was originally developed for use in an object-oriented rendering system called VISION, which is currently under development at the computer graphics laboratory at the University of Erlangen [Slusallek 95]. The heuristics used for C++ parsing have been developed using the classes of the VISION hierarchy as a reference. However, the C++ classes have not been changed to accommodate Itcl++.

The VISION system currently consists of about 250 classes, making extensive use of advanced C++ features such as templates. About 100 of the high-level classes with a total of over 650 member functions have been mapped to [incr Tcl]. Heuristics have proven to work very well for this project: after inserting some type declarations in the types file, correct decisions have been made for all parameters of the 650 functions.
As a result, we are able to start Itcl++ from a ``makefile'', so that code is now generated completely automatically, without the need for human intervention.

We now use the [incr Tcl] interface to do initialization and configuration of our application, to describe scenes for our rendering system, and to test and debug new classes.

5.2 Use of Itcl++ with the OpenInventor Class Library
-----------------------------------------------------

We tested Itcl++ with the commercial OpenInventor class library [Strauss 92] from Silicon Graphics. OpenInventor is an object-oriented 3D toolkit, which provides means to display and interactively manipulate complex scenes, using the 3D graphics library OpenGL.

For our testing purposes we chose 32 classes with 190 member functions, mainly geometric objects and manipulators. The C++ parser detected 13 ambiguities, all relating to parameters of type "char *". Based on the heuristic rules, these parameters were interpreted as strings, not as pointers to "char". In all cases this interpretation turned out to be the right one, so that no further human intervention has been necessary.

A specification file of 839 lines was used to create 8204 lines (about 18 KB) of C++ code. This averages to about 43 lines of code per member function.

6. Conclusion and Future Extensions
-----------------------------------

We have presented Itcl++, a tool for automatically generating Tcl/[incr Tcl] interfaces for C and C++ libraries. We have shown that it is possible to map C types to Tcl, and whole C++ class hierarchies to equivalent hierarchies in [incr Tcl]. The approach has been demonstrated using examples from a rendering class library and a commercial graphics library.

Directions for future development include the improvement of the heuristics used in the parsing step, the automatic generation of the type specifications from C and C++ header files, and providing direct read and write access to C variables.

Itcl++ is freely available to the research community. Please contact the authors for details.

7. Acknowledgments
------------------

We would like to thank Gustav Neumann and Stefan Nusser, who implemented a specification syntax and a related code generator for the Wafe program [Neumann 93]. We used their Perl implementation as a starting point for Itcl++. Gustav Neumann also made some suggestions concerning the syntax of our specification files.

We would also like to thank the members of the VISION project, for which the tool was originally written, since they tested early versions of Itcl++ and provided valuable feedback. Finally we would like to thank Fabrice Jaubert for reviewing an early version of the paper and making useful suggestions to improve its quality.

References
----------

[Beier 94] Beier, E. (1994). Tcl meets 3D - interpretative access to object-oriented graphics. In Proc: 2nd Tcl/Tk Workshop, New Orleans, 1994.
[Heidrich 94] Heidrich, W., Slusallek, P., and Seidel, H.-P. (1994). Using C++ class libraries from an interpreted language. In Proceedings of TOOLS USA '94.
[McLennan 93] McLennan, M. J. (1993). [incr Tcl]: Object-Oriented Programming in Tcl. In Proc: 1st Tcl/Tk Workshop, University of California at Berkeley, 1993.
[Neumann 93] Neumann, G. and Nusser, S. (1993). Wafe - an X toolkit based frontend for application programs in various programming languages. In Proc: Usenix Winter Conference, 1993.
[Ousterhout 90] Ousterhout, J. K. (1990). Tcl: an embedded command language. In Proc: Usenix Winter Conference, 1990.
[Ousterhout 94] Ousterhout, J. K. (1994). An introduction to Tcl and Tk. Addison Wesley.
[Slusallek 95] Slusallek, P. and Seidel, H.-P. (1995). Vision - an architecture for global illumination calculations. IEEE Transactions on Visualization and Computer Graphics, 1(1).
[Strauss 92] Strauss, P. S. and Carey, R. (1992). An Object-Oriented 3D graphics toolkit. In ACM Computer Graphics. SIGGRAPH '92 Conference Proceedings.
[Welch 94] Welch, B. (1994). Practical programming in Tcl and Tk. To be published. Prentice Hall.