Dixie Language and Interpreter Issues

R. Stockton Gaines
U. S. C. Information Sciences Institute
4674 Admiralty Way
Marina del Rey, CA 90292

Abstract

Dixie (Distributed Internet Execution Environment) provides a base for sending programs called Dixie applications to Internet sites for execution. It provides the features generally found in operating systems, such as a file system, multiprocessing, interprocess communications, etc., and in addition capabilities to permit Dixie applications to interact with resources at the local site. Security is of first importance; it must not be possible for a Dixie application to have an undesired effect on the local system. This paper explains the Dixie concept and discusses language and execution issues. The languages understood by Dixie, at least initially, will fall in the class of Very High Level Languages, not least because these languages will support the security requirements of Dixie, as well as the command language requirements. Dixie complements these languages, and provides a uniform platform, independent of the local hardware and operating systems, to support Dixie application programs.

Introduction

Dixie (Distributed Internet Execution Environment) is a portable virtual operating system and program execution environment. Once installed on a system connected to the Internet (an Internet host, or simply "host", here), it can execute programs in the languages it supports. Dixie, therefore, provides a means of sending a program to other Internet hosts for execution. Programs which execute in the Dixie environment are called Dixie applications. They are an instantiation of the concept of knowbots or intelligent agents that can travel through the Internet carrying out useful actions.
Dixie complements the recent work on very high level programming languages by providing an interface that is consistent across systems, is secure in that programs executing on Dixie are harmless to the local host system, and provides full operating system functionality. A program written in a language embedded in Dixie will see the same operating system interface anywhere in the Internet, independent of either the operating system or the compilers of the host system itself. Dixie solves another problem, that of software portability. A Dixie application can be run on any host with Dixie installed, without need for changes to be compatible with the host's underlying operating system, compilers or hardware. Dixie therefore is a vehicle for software distribution. Furthermore, because Dixie installations are themselves accessible through the Internet, a natural means of remote maintenance of Dixie applications (as well as Dixie itself) is available. Whereas portable servers such as World Wide Web (WWW) are primarily of interest on hosts that act as servers at Internet sites, Dixie will also be useful when installed on workstations. Dixie will include a GUI package so that a Dixie application running on a workstation can be interactive with the workstation user. Dixie includes a full operating system model, called the Dixie Host Interface (DHI). The main features are: processes, including multiple processes in execution, shared memory between processes, interprocess communications, and process scheduling and swapping; a complete file system interface; and device and service interfaces to the host system on which it is installed. The first priority in designing and implementing Dixie is that it provide a secure execution environment. There are two types of security of concern, both important. First, a Dixie application must not be able to harm the host system on which it runs in any way. 
Second, communications to and from both Dixie applications and the DHI must be authenticated, and be reliable (that is, the message received must be verifiably the message sent). These security considerations will not be discussed further here, but dominate the design of Dixie. Another system that offers the ability to send a program to a remote site for execution is Safe-Tcl [1, 7, 8]. Safe-Tcl is the most recent version of an active mail system (termed enabled mail in Safe-Tcl papers). Enabled mail is mail that when read by the receiver, executes as a program. Dixie takes a much broader view of the issues and requirements for executing programs at remote Internet sites, but many of the security issues are similar. A good discussion can be found in the Safe-Tcl references. The Dixie Host Interface will support multiple program execution environments. Initially these will be based on interpreters. Again, the goal is to provide a host-independent environment that supports useful languages. Dixie will combine three important existing components: the Prospero file system [5], the Tk [6] GUI package, and interpreters for several languages. At the time of writing it is expected that Python [9] will be of great interest. This language is an unusual combination of elegance, simplicity and power, with a number of features that are particularly suitable for Dixie (especially its form of modules). Tcl [6], already in widespread use, is another powerful and important language that will be integrated into Dixie. Other languages of interest are Perl [10], REXX [3] and, when and if it becomes available, Telescript from General Magic. All of these languages are implemented as interpreters, and are suitable both as command languages and programming languages. 
Since in many cases a Dixie application will execute on a remote site from the invoker of the program, and the invoker will not be connected in a session with the application, the ability of the application to generate commands to its operating system (DHI) as well as lower level operating system calls is an important virtue of all these languages. One motivation for Dixie is to provide a method that permits programmatic access to local resources through the Internet in a safe way. For example, sites that maintain databases may wish to make the information in the database accessible without exporting the entire database. An SQL interface will be incorporated in DHI through which accesses to local databases can be defined and controlled by the host owner. For example, if various Departments of Motor Vehicles are interested in making information about automobiles and accidents available, but prohibiting access to any personal information about drivers or automobile owners, appropriate views of the database can be defined that will not support the retrieval of such prohibited information. A Dixie application running on such a host can issue SQL commands against these views, but cannot otherwise access the database. The file system interface will be based on Prospero [5]. It is already being used extensively, and has, for example, been used as the basis for the archie server. Prospero defines a mapped view of the underlying file system. The view that is presented through Prospero consists of a set of directories and files that may be different from the actual structure of the file system of the host computer. The mapping will be definable by the owner of the host system. (Prospero also includes the ability to make visible non-local files that reside on other systems. This, too, may be valuable for Dixie). An important aspect of Prospero is that attributes can be associated with each Prospero visible file and directory. These attributes can include access methods.
For example, a read access method can be defined for each file. When the file is accessed, the routine specified for the file is invoked, rather than simply reading the file in the normal manner provided by the host file system. The attributes can include additional security mechanisms. One example would be the association of an access control list with a file, designating on a per file basis the rights of specific authenticated individuals. Attributes associated with directories would include the right to create a file, and to designate its type. For example, it would be possible, and useful, to restrict the creation of files to files that can be read but not executed. Dixie, through the use of Prospero, will be able to ensure that no Dixie application can install a file in the local file system that is executable, which will prevent many well known attacks on systems. A high degree of security follows from the power to control exactly how Dixie applications can interact with the local file system, including which portion of it is visible, and, through the use of file and directory attributes, to place additional limitations on access to the file system and on the ways in which files are created, named or renamed and modified. Prospero provides the ability to map a single file into an entire file system, from the viewpoint of a Dixie application. Prospero can also map a disk partition as a file system. This will isolate it completely from the host file system, if that is desirable. As can be seen from these examples, Prospero provides complete flexibility in providing persistent storage through a file system interface for Dixie and Dixie applications, with the ability to expose those parts of the host file system that the host owner desires, while restricting all other accesses. In general, restrictions on the use of the host system's resources will be implemented within DHI.
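
As an illustration only, the kind of mapped view and per-file attributes described above can be sketched in Python, one of the languages of interest here. This is not Prospero's actual interface: the names MappedView, FileEntry, DirEntry, AccessDenied and simulated_read are hypothetical, the paths are invented, and reads are simulated rather than performed against a real host file system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


class AccessDenied(Exception):
    """Raised when a request falls outside what the host owner allows."""


def simulated_read(host_path: str) -> str:
    # Stand-in for a per-file read access method; no real file is opened.
    return f"<contents of {host_path}>"


@dataclass
class FileEntry:
    host_path: str                                    # real location on the host
    readers: Set[str] = field(default_factory=set)    # per-file access control list
    read_method: Callable[[str], str] = simulated_read


@dataclass
class DirEntry:
    may_create: bool = False          # right to create files in this directory
    allow_executable: bool = False    # whether created files may be executable


class MappedView:
    """Hypothetical view of the file system as seen by a Dixie application."""

    def __init__(self) -> None:
        self.files: Dict[str, FileEntry] = {}
        self.dirs: Dict[str, DirEntry] = {}

    def read(self, user: str, virtual_path: str) -> str:
        entry = self.files.get(virtual_path)
        if entry is None:
            raise AccessDenied(f"{virtual_path} is not visible in this view")
        if user not in entry.readers:
            raise AccessDenied(f"{user} may not read {virtual_path}")
        # Invoke the routine attached to the file, not a plain host read.
        return entry.read_method(entry.host_path)

    def create(self, user: str, virtual_dir: str, name: str,
               host_path: str, executable: bool = False) -> str:
        directory = self.dirs.get(virtual_dir)
        if directory is None or not directory.may_create:
            raise AccessDenied(f"files may not be created in {virtual_dir}")
        if executable and not directory.allow_executable:
            raise AccessDenied("executable files may not be created here")
        virtual_path = f"{virtual_dir}/{name}"
        self.files[virtual_path] = FileEntry(host_path=host_path, readers={user})
        return virtual_path


# The host owner defines what is visible and what may be created.
view = MappedView()
view.dirs["/outbox"] = DirEntry(may_create=True, allow_executable=False)
view.files["/data/report"] = FileEntry(host_path="/srv/private/r1.txt",
                                       readers={"alice"})
```

In this sketch every restriction lives in the view, not in the application's language, which is the division of responsibility argued for in the text.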
Since DHI provides all support for Dixie applications, which cannot invoke the host operating system directly, restrictions on the language itself will be minimized. For reasons of efficiency or functionality, it may be desirable that a Dixie application be able to make calls on routines that are compiled to run directly on the host computer. For example, if Dixie had been available and in widespread use, it could have been used as the basis for finding the largest prime number using many computers throughout the world. The heart of this distributed application was a relatively simple C program. All of the communications and coordination parts of the application could have been handled through a Dixie program for each host, since the computation requirements for these were not great. But it would have been necessary to provide an interface to the C subroutine from a Dixie application. The main issue here is security. The host owner must be able to trust the C program. The host owner could trust the program if written locally, or obtained from a reliable source. Trust could also be based on an inspection of the program's source code, for programs that are simple enough. In the example just given, this could be straightforward. The program should inspect its inputs to ensure that they are valid, should not make any system calls, and should communicate with the DHI in a straightforward way, such as accepting a single value as an argument and returning a single value. The routine would need to be registered by the host owner as callable through the DHI in order to be accessible to a Dixie application. To deal with more than very simple cases may be a research question.

Language and Interpreter Issues

The philosophy that motivates Dixie is that there is a clear distinction between an operating system and a programming language. Far too much of the operating system tends to get built into programming languages, limiting flexibility and applicability.
This philosophy suggests that the abstractions presented to the programmer should be at a high enough level that there is freedom to do what makes sense during code generation, program execution and in the operating systems to deal with issues of memory management, process structuring and scheduling, etc. A process has an internal behavior and an external behavior. The programming language provides mechanisms for defining objects that populate the internal environment and specifying actions on those objects. A language is also needed to describe the external actions of a process, but that language is, according to the philosophy being espoused here, not part of the programming language. Rather it is a language invoked through the programming language by calls to routines that cause external actions, and by emitting statements in a language that is understood external to the process. A great virtue of many very high level languages is that they provide good tools for generating these statements for external consumption. An example from ADA may help to illustrate the point. ADA includes as language constructs "fork" and "join". Fork and join are process management actions. By including these as primitives, ADA was forced to add a lot more baggage within the language to define and manage what amounts to pseudo processes. These features in turn impose restrictions on the operating system, or else result in a complicated run time package to support ADA. If fork and join are calls on routines that are supplied separately from the programming language, they can have a semantics suitable to both the operating system and hardware environment in which the program will execute, and can be optimized for the needs of different types of applications. 
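
The idea of fork and join as separately supplied routines, rather than language primitives, can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not a DHI design: the names fork, join, ProcessHandle and sum_of_squares are hypothetical, and host threads stand in for whatever process mechanism a real environment would supply.

```python
import threading


class ProcessHandle:
    """Hypothetical handle returned by fork() and consumed by join()."""

    def __init__(self, thread, outcome):
        self.thread = thread
        self.outcome = outcome  # dict filled in by the forked activity


def fork(routine, *args):
    """Start routine(*args) as a separate activity and return a handle.

    Threads are used here only as a stand-in; the caller does not depend
    on how the activity is actually carried out.
    """
    outcome = {}

    def run():
        try:
            outcome["value"] = routine(*args)
        except Exception as exc:  # remember the failure for join()
            outcome["error"] = exc

    thread = threading.Thread(target=run)
    thread.start()
    return ProcessHandle(thread, outcome)


def join(handle):
    """Wait for a forked activity and return its result (or re-raise)."""
    handle.thread.join()
    if "error" in handle.outcome:
        raise handle.outcome["error"]
    return handle.outcome["value"]


def sum_of_squares(n):
    return sum(k * k for k in range(n))


handles = [fork(sum_of_squares, n) for n in (10, 100)]
results = [join(h) for h in handles]
print(results)  # prints [285, 328350]
```

Because the program only calls fork and join, the routines behind those names could be replaced, for example by ones that start separate operating system processes or activities on other hosts, without any change to the language itself.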
The separation of concepts between the programming environment and the supporting operating system environment of Dixie leads to a smaller set of requirements for the languages that provide the execution environment for Dixie applications. The required functionality, to the extent possible, will be provided by a set of run time callable routines that are common to all the programming environments. This has the additional virtue that Dixie can evolve without the need to change all the language interpreters when there is a change in the DHI. Since the Dixie Host Interface acts as the operating system for a set of Dixie processes that are executing Dixie applications, it must provide for the synchronization of the activities of these processes. A design objective of Dixie is to develop a set of synchronization and coordination tools that will support both processes running on the same machine and processes that are distributed among multiple machines. Semaphores and other synchronization mechanisms will be built into the DHI. Such tools are not ordinarily included in operating systems, but there are a couple of advantages. First, they can be made simple and efficient. In addition, the scheduling and swapping policies for Dixie processes can be aware of process synchronization activities, also improving efficiency. An issue that has not received much attention from the programming language community is how a programmer can view and act on a program from within the program. At least two aspects of this are pertinent to Dixie applications. Dixie applications will often execute far away from their creator, and must be able to deal with the local environment in ways anticipated by the programmer, but not interactively with the programmer or invoker during execution. One aspect of making a program aware of itself is to make accessible to the program the attributes of objects within the program. 
These attributes are known to the compiler or interpreter, and often to the run-time code, but generally are not accessible by the program itself. Objects (simple variables, arrays, structures, procedures and functions, etc.) have attributes such as type, dimension, and whether or not they have been written to (set). There are times when a programmer would like to obtain the values of these attributes. Variables to hold these values can be created and set explicitly in some circumstances, but not always. For example, when an array is passed by name, it would often be convenient to obtain the size of the array from attributes known to the run-time code. The current type of a variable is an interesting case in several very high level languages. In some of these languages, the type of all variables is "string" at the language level, but each variable has a dynamic type such as integer, floating point or string at run time. Though the interpreter knows or can determine this dynamic type, it is not always available to the programmer. As an example of its use, one might like to construct a sort routine that checks the types of the elements being sorted, and acts according to this information. Another aspect of a program that is likely to be of interest for Dixie applications is how long the program has run, according to some measure. Host computers that run Dixie so that Dixie applications can access local resources may wish to limit the amount of execution time any one Dixie application can consume. An approximation for this is the number of statements executed. The programmer may wish to write a Dixie application that uses most of the available time, and then interrupts itself to prepare a message reporting the results obtained before terminating (or being terminated). Methods of making this information available conveniently will be explored. A second area of interest is how one constructs programs to react to errors.
The REXX language has incorporated the ON CONDITION concept from PL/1. This is very useful in many cases. The basic concept is that if a certain condition arises during a program, this creates a "trap" to a specified subroutine. It may or may not be possible to return to the point at which the trap occurred, depending on the cause of the trap and details of the programming language. This notion of "if some state is reached, invoke this action" as a global statement to be checked for continuously during program execution, in contrast to explicitly programmed checks, is very powerful. It leads to a very useful kind of internal multithreading within programs. I refer to this as "internal" because it is not visible to the operating system. REXX includes the ability to turn condition checking on and off for specific events. It is very useful for building routines that can react to errors in dealing with the operating system without placing lots of messy error checking code in the middle of what may be already complicated blocks of code. There are several issues in implementing and using an on condition feature. Obviously it can be expensive to carry out checks continuously, so this must be dealt with in sensible ways. It must be clear to the programmer who cares what the overhead is in using this feature. It must be possible to turn checking on and off, so that sections of code where the condition being checked for will not occur need not bear the overhead. When a trap occurs, the question arises of how to determine where the trap was generated. It would be nice to be able to insert labels in the code for this purpose (as it would be for some debugging tools). If this were possible, the value of a variable associated with the trap could be checked to identify the trap location. In REXX it is possible to obtain a line number, which is useful for post-mortem debugging.
This is hard to make use of at run time because line numbers will change each time the program is modified, and it is a problem to keep track of them accurately for use within a trap routine. The availability of an on condition that is triggered by the number of statements executed would be a useful solution to the problem mentioned above of trapping near the end of a Dixie application's allotment of execution time.

References

[1] N. Borenstein and M. Rose, "EMail with a Mind of its Own: the Safe-Tcl Language for Enabled Mail", to be published in ULPAA '94.
[2] B. Borden, R. S. Gaines and N. Shapiro, "MH, A Message Handling System for the UNIX Operating System", The Rand Corporation, R-2376-PAF, October 1979.
[3] M. F. Cowlishaw, The REXX Programming Language, Prentice Hall, 1990.
[4] R. S. Gaines, "An Operating System Based on the Concept of a Supervisory Computer", Communications of the ACM, Vol. 15, No. 3, March 1972.
[5] B. C. Neuman, "The Prospero File System: A global file system based on the Virtual System Model", Computing Systems, 5(4), pp. 407-432, Fall 1992.
[6] J. Ousterhout, Tcl and the Tk Toolkit, Addison-Wesley, Reading, Massachusetts, 1994.
[7] M. Rose and N. Borenstein, "A Model for Enabled Mail (EM)", draft in preparation.
[8] M. Rose and N. Borenstein, "MIME Extensions for Mail-Enabled Applications: Application/Safe-Tcl and Multipart/enabled-mail", draft in preparation.
[9] G. van Rossum, Python 1.0.1, documentation and code available for anonymous ftp from ftp.uu.net in /languages/python.
[10] L. Wall and R. Schwartz, Programming Perl, O'Reilly & Associates, 1990.

Stockton Gaines has worked in the areas of computer operating systems and computer security for over 25 years. His paper "An Operating System Based on the Concept of a Supervisory Computer" [4] was presented at the 3rd Symposium on Operating Systems Principles in 1971.
He was chairman of the ACM's Special Interest Group on Operating Systems (SIGOPS) and Operating Systems editor of the Communications of the ACM from 1975 through 1980. He was a consultant starting in 1981, and consulted on operating systems for IBM, Honeywell and Control Data Corporation, among others, during the years 1981-1989. Dr. Gaines directed ISI's research on parallel computing from 1989 to 1992, which included the porting of the Mach operating system to a distributed memory parallel computer. He developed the concepts of System Manager and Job Manager which form the basis of the Prospero Resource Manager being developed by Cliff Neuman at ISI. Dr. Gaines chaired the first conference on computer security, held in Princeton, NJ in 1972. He chaired the technical committee to oversee the development of a secure version of Unix for ARPA during 1976-1977. Together with Norman Shapiro of the Rand Corporation, he designed the MH mail handling system [2], and he directed its development. Subsequently, he did research on secure message systems. As part of his consulting he worked on security issues for a number of clients.