A DISTRIBUTED SOFTWARE ARCHITECTURE FOR GPS-DRIVEN MOBILE APPLICATIONS

Thomas G. Dennehy
Environmental Research Institute of Michigan
Ann Arbor, MI 48113-4001

ABSTRACT

The unique requirements of voice recognition can shape a software architecture in ways that have proven effective for mobile and distributed applications. We show in this paper that extending the voice recognition model of translating utterances into sentences to include translating a variety of real-world events into a command protocol can create an architecture whose components operate identically on hand-held devices, man-portable or vehicle-borne units, notebook computers, or desktop computers. SANSE, a portable navigation and geographic information management system having several redundant user interfaces, is described. In SANSE a collection of distributed Interactors translate events (spoken words, input from GPS hardware, timers expiring, input from files or communication links, and direct manipulation actions) into SANSE commands that are sent to one or more Receivers, which can execute commands without regard to their source. The complete operation of this system can be captured in a vocabulary of fewer than 70 words, small enough to provide speaker-independent operation yet rich enough to be broadly applicable. The architecture can be extended by adding new Interactor types without affecting the operation of the baseline system.

1. Introduction

SANSE is a software architecture for GPS-driven mobile applications that developed from a simple yet challenging concept: to build a portable navigation and geographic information management system with two completely redundant user interfaces, direct manipulation and voice-activated. The unique requirements of voice recognition shaped the SANSE architecture in a number of ways that proved effective when configuring systems for stand-alone or networked operation. Components of SANSE-based systems can be deployed on hand-held devices, man-portable or vehicle-borne units, notebook computers, or desktop computers.

For a command language to be effective, it must satisfy a number of criteria [1]:

· Expressiveness - The language must provide complete access to the capabilities of the system.

· Expressiveness of Intent - The vocabulary must be precise enough, but the user should not be overburdened with expressing his intent. Commands should be short, a few words at most, and to the point.

· Freedom from Detail - The vocabulary should be interpreted within a general context in order to cut down the detail that needs to be expressed. This should not be confused with context-sensitive grammars, which allow a single word to be interpreted in multiple ways depending on the local sentence context. Such word overloading should be avoided in command languages.

· Principle of Least Surprise - The commands and vocabulary should be familiar and natural and behave in expected ways. A Geographic Information System (GIS) may have several related definitions of North, for example, and although a particular language may choose to recognize only one of these definitions, the language should not redefine North to mean what is generally recognized as South.

But an effective command language provides not only a convenient means to use a system, but also a natural structure around which to organize the system, a structure that is particularly effective for distributed systems.
First, by defining system behavior in terms of a well-understood command set, we can effectively decouple the response to a command from the various and often redundant circumstances that can initiate the command. Second, the command set defines the internal protocol of the system, creating abstract interfaces between architectural components so these components will interact identically whether deployed on a common platform or distributed across a hardware network. Finally, short commands make effective use of inter-process, packet, cellular, and other protocols.

In the next section, we describe the core SANSE architecture for a system with direct manipulation and voice activation, and how this model can be extended for a variety of other input sources. The SANSE protocol is then described, followed by a description of a SANSE-based portable navigation system and a discussion of future directions.

2. Core System Architecture

Creating two redundant user interfaces, voice-activated and direct manipulation, is a difficult problem since these two interface styles communicate very differently. Voice recognition hardware translates utterances into one of several forms, the most common of which are: isolated words, where a token representing each individual recognized word is returned to the host; and connected speech, where speech energy is interpreted according to a sentence grammar loaded onto the hardware, returning legal sentence structures. To accommodate both styles, SANSE chose sentences as the basis for the protocol between the Voice Interactor and the SANSE core. If an isolated word recognizer were chosen, it would be the responsibility of the Voice Interactor to assemble the words into legal sentences.

Visual interface toolkits have various styles of communication. User input can cause events to be posted (X Windows), messages to be sent (Microsoft Windows), or callback functions to be invoked (Xt toolkit, Motif widgets). Complete redundancy between the two user interfaces required the Screen Interactor to relate to the SANSE core in the same way as the Voice Interactor, making the callback structure infeasible. While events or messages are a valid basis for communication, they did not match the natural output of the Voice Interactor. The Screen Interactor was therefore designed to translate direct manipulation actions into sentences.

With outside world interfaces translating external events into sentences, the core SANSE component, the Receiver, was designed to accept and interpret the SANSE command protocol (Figure 1). From the Receiver's perspective, SANSE operation is a stream of commands that can be executed without regard to the circumstances that created them. From the user's perspective, there is no difference between speaking the words PAN LEFT and pressing the corresponding button on screen, allowing voice commands and direct actions to be freely mixed. (Figure omitted.)

The Receiver executes a command by updating one or more state variables known as Subjects. Each Subject has one or more Views associated with it that need to be notified when the Subject is modified. A View is owned by an Interactor, and is simply a representation (visible, audible, or hidden) of one or more Subjects [2]. This Subject/View coupling yields a simple deterministic model for implementing the command set; the complexity of the control structure of the system is independent of the number of commands recognized.
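To make the coupling concrete, the following is a minimal C++ sketch of the model just described. The Subject, View, and Receiver roles are taken from the text; the Command structure, the member names, and the heading example are illustrative assumptions, not the SANSE source.

    #include <string>
    #include <vector>
    #include <iostream>

    // A parsed SANSE command; the fields follow the protocol of section 3
    // (an illustrative struct, not the actual SANSE packet format).
    struct Command {
        std::string org;       // "U" (user) or "S" (system)
        std::string mnemonic;  // e.g. "PAN"
        std::string keyword;   // e.g. "LEFT"
    };

    class Subject;

    // A View is a representation (visible, audible, or hidden) of a Subject.
    class View {
    public:
        virtual ~View() = default;
        virtual void update(const Subject& s) = 0;
    };

    // A Subject is a state variable that notifies its attached Views when modified.
    class Subject {
    public:
        void attach(View* v) { views_.push_back(v); }
        void set(double value) {
            value_ = value;
            for (View* v : views_) v->update(*this);  // notify all attached Views
        }
        double value() const { return value_; }
    private:
        double value_ = 0.0;
        std::vector<View*> views_;
    };

    // The Receiver executes commands by updating Subjects, without regard
    // to which Interactor originated the command.
    class Receiver {
    public:
        explicit Receiver(Subject& heading) : heading_(heading) {}
        void execute(const Command& c) {
            if (c.mnemonic == "PAN")
                heading_.set(heading_.value() + (c.keyword == "LEFT" ? -10.0 : 10.0));
        }
    private:
        Subject& heading_;
    };

    // A hidden View that simply reports the Subject's new state.
    class LogView : public View {
        void update(const Subject& s) override {
            std::cout << "View Heading is now " << s.value() << " degrees\n";
        }
    };

    int main() {
        Subject viewHeading;
        LogView log;
        viewHeading.attach(&log);
        Receiver receiver(viewHeading);

        // Whether "Pan Left" was spoken or the on-screen button was pressed,
        // the Receiver sees the same command.
        receiver.execute({"U", "PAN", "LEFT"});
    }

The essential property is visible in main(): the Receiver executes the same command regardless of which Interactor produced it, and the View is updated through the Subject rather than by the input device.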
The operation of SANSE can be partitioned into three separable occurrences (Figure 2): (Figure omitted.)

1) External events are handled by the various Interactors, which translate those events into SANSE commands and send the commands to the Receiver.

2) The Receiver executes commands, modifying one or more Subjects per command executed.

3) When a Subject is modified, it notifies the Views which are currently attached to it, so that the individual Views may update their appearance to reflect the new state of the Subject.

View update is a different process from the completely local graphic occurrences intended to provide interactive feedback. For example, when a "soft" button is pressed, its appearance will be altered so as to inform the user that the action has been registered (the shading of its borders may invert, for example), but that action reflects only that a particular button was pressed, not that a particular SANSE command was executed as a result. To illustrate, an on-screen button may cause a PAN LEFT command to be sent when pressed, but saying "Pan Left" should not cause the same visual feedback, as though some invisible hand had pressed the button. However, panning left will update any View that is tied to the Subject representing the Forward direction, and that update will occur independent of whichever Interactor sent the command.

The Interactor model can be extended to any structure that translates physical events into SANSE commands to be interpreted by the Receiver. Three more Interactors have proven immediately useful:

· GPS Interactor, which translates real-time global positioning information into SANSE commands;

· Trap Interactor, which originates SANSE commands in response to elapsed time or distance traveled;

· Remote Interactor, which relays SANSE commands received over remote links or read from files.

The Interactors model the redundancy of the user interface (silent operation or hands-free operation) as well as the independence of the interface components. Given this redundancy and independence, different SANSE systems with various combinations of Interactors can be configured, and the extensibility of the system is well defined. New capabilities may be added to the system without affecting its present operation by defining a new Interactor to translate new types of events into commands to send to the Receiver, extending the SANSE command vocabulary if necessary.

From the Receiver's perspective any SANSE Interactor is a drop-in replacement for any other Interactor, enabling consistent operation in both stand-alone and distributed configurations. For example, SANSE would operate identically as a self-contained portable navigation system receiving input from an on-board GPS receiver (through the GPS Interactor) or as a desktop tracking system receiving position information from one or more mobile systems (via a Remote Interactor). The Interactors and the Receiver communicate through an abstract interface that can be implemented using a variety of physical channels and protocols [3].

Because Interactors operate independently of one another and independently of the Receiver, SANSE systems can be deployed in stand-alone or networked configurations using a wide variety of hardware components.

· A simple field data collection application can be hosted on a hand-held device using only the GPS and Trap Interactors, operating either in batch mode or in real-time communication with a base station via radio or cellular links.
(SANSE's command protocol is well-suited to new packet cellular protocols like CDPD.)

· Portable systems incorporating voice response and GIS displays have been hosted on notebook computers outfitted with single-board peripherals.

· Shadow systems (where mobile system A reports its position to desktop or mobile system B) have been deployed with both systems A and B having full display capabilities.

Advances in CPU power, PCMCIA packaging, and storage capacity will make such self-contained SANSE systems no larger or heavier than the notebook computers hosting the software.

3. The SANSE Protocol

SANSE's command protocol has two representations: an internal packet format and an ASCII equivalent. The ASCII representation of a command is a sequence of fields separated by semicolons and terminated by a newline.

Org;Mnemonic;C;Keyword;Data;T_Sent;T_Rec

The Org identifies the command as user-generated (U) or system-generated (S). Each command Mnemonic can have an optional Keyword and/or Data. Data representations for geographic positions, headings, GPS status packets, and numeric choices have been devised; others can be easily added. The T_Sent (Time Sent) is supplied by the Interactor originating the command; the T_Rec (Time Received) is inserted by the Receiver.

There are two mechanisms for repeating command execution. SANSE will repeat the last user command once whenever the Receiver gets the MORE command, or will continuously repeat the last user command when the Receiver gets the CONTINUE command. The continuation process repeats until the next user command is received. System commands (new GPS location or status information, for example) can be executed without interrupting continuation.

The C (Continuation) field of the command protocol has proven valuable for cutting down the communication load between the Receiver and Interactors. Placing the string "ING" in the C field is a request for immediate continuation. Thus, if a button owned by the Screen Interactor is intended to provide sustained operation, it can send a command with the Continuation field set when the button is pressed, and send a STOP command when the button is released. The communication load is therefore independent of the amount of time the button is depressed, and the controls operate effectively in networked configurations. The Voice Interactor uses the present participle form of certain commands to request continuation: "Panning Left" as opposed to "Pan Left."

Thus, the ASCII representation of the user command PAN LEFT would be

U;PAN;;LEFT;;;

while PAN LEFT CONTINUE would appear as

U;PAN;;LEFT;;;
U;CONTINUE;;;;;

but could be abbreviated as

U;PAN;ING;LEFT;;;

Sample content for the data field is illustrated by a choice command like USE 2:

U;USE;;;C 2;;

The internal representation of this protocol is fixed-length packets; the packet size is determined by the size of the largest data element it can contain, currently 20 bytes. The unused packet space in commands that contain only keywords or shorter data is more than compensated for by avoiding the overhead of sending and receiving data-dependent variable-length packets. SANSE components residing on the same host almost always use the internal representation for routing commands; uncoupled components can choose the ASCII or internal format as required.

The Receiver maintains a history file of all commands executed, with commands stored in their ASCII representation. History files can be replayed through the Remote Interactor. During replay, the Remote Interactor can reproduce or accelerate the relative gaps between commands represented by their individual T_Rec stamps.
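As an illustration of the ASCII framing, here is a minimal C++ sketch that splits and reassembles the seven-field representation described above. The struct and function names are hypothetical; real SANSE components would route the fixed-length internal packets instead.

    #include <sstream>
    #include <string>
    #include <vector>
    #include <iostream>

    // The seven ASCII fields in protocol order: Org;Mnemonic;C;Keyword;Data;T_Sent;T_Rec
    struct SanseCommand {
        std::string org, mnemonic, c, keyword, data, t_sent, t_rec;
    };

    // Split one newline-terminated ASCII command on semicolons.
    SanseCommand parse(const std::string& line) {
        std::vector<std::string> f;
        std::stringstream ss(line);
        std::string field;
        while (std::getline(ss, field, ';')) f.push_back(field);
        f.resize(7);  // tolerate omitted trailing fields
        return {f[0], f[1], f[2], f[3], f[4], f[5], f[6]};
    }

    // Reassemble the ASCII representation.
    std::string format(const SanseCommand& c) {
        return c.org + ';' + c.mnemonic + ';' + c.c + ';' + c.keyword + ';' +
               c.data + ';' + c.t_sent + ';' + c.t_rec;
    }

    int main() {
        // "Panning Left": PAN with the Continuation field set to ING.
        SanseCommand cmd = parse("U;PAN;ING;LEFT;;;");
        std::cout << cmd.mnemonic << ' ' << cmd.keyword
                  << (cmd.c == "ING" ? " (continuing)" : "") << '\n';
        std::cout << format(cmd) << '\n';
    }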
4. A SANSE Vocabulary for GPS/GIS Applications

Plates 1 and 2 following the text of this paper illustrate a SANSE-based portable system combining navigation and geographic information display with multimedia field data collection and review. This system was implemented with a command set of 33 operations and a total vocabulary of 70 words, a vocabulary small enough to provide speaker-independent voice response. This section describes the operation of that system and its vocabulary.

4.1 Perspective commands. These commands alter the Field of View, the region of the earth's surface represented by the GIS display.

===========================================
Mnemonic       Argument
-------------------------------------------
TRACK(ING)     Direction
PLACE          Location or KnownPosition
PAN(ING)       LEFT or RIGHT
LOOK           NumericHeading, Direction, or KnownPosition
TIGHTEN(ING)
WIDEN(ING)
ZOOM           IN or OUT
ENLARGE
REDUCE
CONVERGE
-------------------------------------------

The View Point (the center of the Field of View) is typically the position reported by the GPS Interactor, but can be established at an absolute location using the PLACE command, which takes as its argument a geographic Location or the keywords HERE, representing the location currently reported by the GPS Interactor, or BACK, representing a previously stored location (see section 4.4). The Field of View can be moved incrementally by TRACK(ING) in any of the four compass directions (NORTH, SOUTH, EAST, WEST) or FORWARD, BACKWARD, LEFT, or RIGHT relative to the View Heading. The View Heading is typically the current heading reported by the GPS Interactor, but can be rotated by PAN(ING) LEFT or RIGHT or positioned at an absolute heading using the LOOK command.

The extent of the Field of View (the scale of the display) can be changed using the ENLARGE or REDUCE commands. The degree of enlargement or reduction is controlled by the View Finder, whose size is controlled by the TIGHTEN and WIDEN commands. ZOOM IN makes the View Finder as small as it can be; ZOOM OUT removes it from the screen. Finally, the CONVERGE command restores the display scale to the natural scale of the data being viewed.

4.2 Composition commands. These commands manipulate data sets shown in the Field of View.

===============================
Mnemonic   Argument
-------------------------------
USE        Choice or NONE
WITH       Choice or NONE
ADD        Choice or ALL
REMOVE     Choice or ALL
HIDE
SHOW
-------------------------------

The display model combines a raster-based underlay image with vector or symbol-based overlays (annotation). The underlay is a composition of two classes of data: backgrounds and transparencies. Backgrounds might be scanned maps or satellite photos, while transparencies include land use maps, elevation maps, or related data sets. Although a single background or single transparency could function as the underlay image, there are a number of background/transparency combinations that make tactical sense. The various categories of annotation are rank-ordered by priority, and the enabled overlays are drawn in reverse order of priority, lowest to highest.

The USE command specifies the background data set to use, or NONE. The equivalent command for the transparency is WITH. ADD and REMOVE manipulate layers of the overlay; HIDE turns off the current overlays; SHOW restores them. Background, transparency, and overlay choices can always be specified by number, and a number of common types of data (MAP, PHOTO, TRACE) have been assigned keywords in the vocabulary. Future versions of the system may support loading customized vocabularies to represent specific data sets.
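The composition rule (underlay first, then enabled overlays from lowest to highest priority) can be sketched as follows. This is an illustrative C++ fragment under assumed data structures, not the SANSE rendering code.

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    // One annotation overlay with its display priority (higher = more important).
    struct Overlay {
        std::string name;
        int priority;
        bool enabled;
    };

    // Compose the Field of View: background and transparency form the underlay,
    // then the enabled overlays are drawn lowest priority first so the highest
    // priority lands on top.
    void compose(const std::string& background, const std::string& transparency,
                 std::vector<Overlay> overlays, bool hidden) {
        std::cout << "draw underlay: " << background << " with " << transparency << '\n';
        if (hidden) return;  // HIDE suppresses all overlays; SHOW restores them
        std::sort(overlays.begin(), overlays.end(),
                  [](const Overlay& a, const Overlay& b) { return a.priority < b.priority; });
        for (const Overlay& o : overlays)
            if (o.enabled) std::cout << "draw overlay: " << o.name << '\n';
    }

    int main() {
        // USE MAP, WITH an elevation transparency, ADD two overlays.
        compose("MAP", "elevation",
                {{"TRACE", 2, true}, {"markers", 5, true}}, /*hidden=*/false);
    }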
4.3 Screen Management commands. These commands interact with the window management system through the Screen Interactor.

==========================
Mnemonic     Argument
--------------------------
OPEN         Window
WHERE AM I
CLOSE        Window
RAISE        Window
MOVE(ING)    Direction
BEFORE
NEXT
--------------------------

OPEN and CLOSE can be used to configure the display. The window types recognized are:

· VIEW - a window showing the Field of View, along with controls for opening other windows;

· SCALE - showing the current map scale, along with controls for changing scale and manipulating the View Finder;

· COMPASS - showing position, heading, and status information reported by the GPS Interactor;

· KEY - showing the current composition of the Field of View, along with controls for manipulating backgrounds, transparencies, and overlays;

· POINT - showing the View Point and View Heading, along with controls to manipulate them;

· MARKER - showing information about user-defined markers (see next section).

WHERE AM I is a natural equivalent to the command OPEN COMPASS.

Most windows are referenced by their keyword alone, but MARKER windows are referenced by name and number ("Open Marker 4"), or the most recent Marker if the number is omitted. The user can chain through the entire list of Markers once a Marker window is open using the BEFORE and NEXT commands. This same feature could be extended to other kinds of windows representing data maintained in lists.

The RAISE command brings a particular window to the top and makes it the current window. Although a command is provided for MOVE(ING) the current window UP, DOWN, LEFT, or RIGHT, using this command is admittedly far less convenient than using a pointing device. No attempt is made in this command subset to provide access to all the features of a particular window management system or toolkit.

4.4 Action commands. These commands report data, mark locations, and initiate other miscellaneous actions.

===============================================
Mnemonic     Argument     Qualifier
-----------------------------------------------
MORE
CONTINUE
STOP
REF
UNREF
MARK                      SYSTEM or USER
GPS_FIX      GPS Fix      SYSTEM
GPS_STATUS   GPS Status   SYSTEM
CHECK                     SYSTEM or USER
QUIT                      SYSTEM or USER
-----------------------------------------------

As previously discussed, MORE repeats the last user command executed once. CONTINUE repeatedly executes the last user command until the next user command is received. STOP interrupts continuation without executing another command.

The REF command saves the current View Point and View Heading; these values can then be accessed through the keyword BACK (as opposed to HERE). UNREF clears these values.

System-initiated commands are distinct from user commands in that system commands do not interrupt continuation, but instead have their execution interleaved with continuation. GPS_FIX and GPS_STATUS are System commands that relay position, heading, and status reports from a GPS receiver. The System command MARK is initiated by the Trap Interactor periodically to save the current GPS position and heading on a Trace of travel. The interval between Trace points can be time-based, distance-based, or a combination. The other System commands are CHECK, to run a self-test, and the self-evident QUIT; both of these commands can also be user-initiated.
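A minimal sketch of how a Trap Interactor of this kind might originate periodic MARK commands follows. The class, the thresholds, and the encoding of position in the Data field are illustrative assumptions, not the SANSE source.

    #include <cmath>
    #include <iostream>

    // A GPS fix as the Trap Interactor might see it (illustrative fields).
    struct Fix {
        double t;     // seconds since start
        double x, y;  // position in meters (local grid)
    };

    // Originates a System MARK command whenever enough time has elapsed
    // or enough distance has been traveled since the last Trace point.
    class TrapInteractor {
    public:
        TrapInteractor(double maxSeconds, double maxMeters)
            : maxSeconds_(maxSeconds), maxMeters_(maxMeters) {}

        void onFix(const Fix& fix) {
            double dt = fix.t - last_.t;
            double dd = std::hypot(fix.x - last_.x, fix.y - last_.y);
            if (dt >= maxSeconds_ || dd >= maxMeters_) {
                // Send the System MARK command to the Receiver (here, stdout);
                // the position-in-Data encoding below is illustrative only.
                std::cout << "S;MARK;;;" << fix.x << ' ' << fix.y
                          << ';' << fix.t << ";\n";
                last_ = fix;
            }
        }
    private:
        double maxSeconds_, maxMeters_;
        Fix last_{0, 0, 0};
    };

    int main() {
        TrapInteractor trap(/*maxSeconds=*/30.0, /*maxMeters=*/100.0);
        trap.onFix({10, 20, 0});    // 20 m, 10 s: no trap fires yet
        trap.onFix({25, 150, 0});   // 150 m traveled: distance trap fires
        trap.onFix({60, 160, 0});   // 35 s elapsed: time trap fires
    }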
One other System command, MARK, can also be user-initiated. A user can leave a MARK at the current View Point, with an option to annotate that mark with data exchanged with other programs. SANSE applications have used Markers containing spreadsheet data, CAD drawings, or audio recordings (an example of an audio Marker is shown in Plate 2). A Marker file with annotation can be preloaded into SANSE, enabling SANSE to be used in the field to update spatial database information in its native format.

5. Discussion

SANSE is written in the C++ language and used primarily on computers running the Microsoft Windows operating system (Release 3.1 and later). SANSE systems can exchange data with other programs through the Microsoft OLE protocol, with SANSE acting as the OLE client. SANSE can be ported to other operating environments, as it incorporates no proprietary non-standard language features and processes only seven Windows messages in the course of its operation.

Although originally written for use in mobile GPS/GIS applications, the SANSE architecture provides a robust general model for mobile and distributed systems by:

1) defining system behavior in terms of a well-understood command set;

2) effectively decoupling the response to a command from the various and often redundant circumstances that can initiate the command; and

3) creating abstract interfaces between architectural components so these components will interact identically whether deployed on a common platform or distributed across a hardware network.

This approach pays several dividends:

· Redundancy - In representing the redundancy between elements of the operator interface (silent operation and hands-free operation, for example), the architecture also models the independence of the various elements and how they individually relate to the SANSE core.

· Configurability - Since elements of the SANSE interface are independent, the architecture supports instantiating SANSE with elements selectively enabled or disabled. The command set is easily partitioned; a distributed system can have several Receivers, each recognizing only those commands that can make use of the local platform resources.

· Extensibility - In establishing the allocation of functionality between the SANSE core and various elements of the interface, the architecture supports extending the interface by adding new I/O devices (a video camera, for example) without affecting the operation of the present system.

To write distributed programs, one must be conversant in two distinct vocabularies. First there is the vocabulary of the problem domain, or the computation model; software embodying the computation model is called application code. Second, there is the vocabulary of the system domain, or the coordination model; software embodying the coordination model is called system code. It has been shown elsewhere that a well-chosen vocabulary for system code can isolate application code from the details of physical process distribution and communication channels, creating distributed programs that may be conveniently ported across different operating environments [3].
Here we have shown that a well-chosen application vocabulary extends this flexibility to the application components, creating systems whose core functionality is isolated from the many redundant sources of its inputs and outputs, and whose diverse components can serve as drop-in replacements for one another to serve a broad range of needs.

6. Acknowledgments and Contact Information

SANSE was designed at the Environmental Research Institute of Michigan (ERIM) in Ann Arbor, MI. The author wishes to acknowledge the many contributors to the project: Orest Mykolenko, Matt Frazer, Lori Sulik, and Linda Spencer for software design; Dave Symanow, Cyrus Wood, and Len Tomko for hardware design and logistics; and especially Ron Swonger for his vision and management support. Inquiries regarding SANSE may be directed to Jeremy Salinger (jsalinger@erim.org) at ERIM, P.O. Box 134001, Ann Arbor, MI 48113-4001.

7. References

[1] Hilfinger, P., Abstraction Mechanisms and Language Design, The MIT Press, 1983.

[2] Linton, Mark A., et al., "InterViews: A C++ Graphical Interface Toolkit," Proceedings of the USENIX C++ Workshop, Santa Fe, NM, November 1987.

[3] Dennehy, T. G., "Class Libraries as an Alternative to Language Extensions for Distributed Programming," USENIX Symposium on Experiences with Distributed and Multiprocessor Systems III (SEDMS III), Newport Beach, CA, March 26-27, 1992.

(Plates 1 & 2 omitted.)