The following paper was originally published in the Proceedings of the USENIX Summer 1993 Technical Conference, Cincinnati, Ohio, June 21-25, 1993.

Integrating Handwriting Recognition into Unix

James Kempf
Nomadic Systems Group, Sun Microsystems Computer Corp.
2550 Garcia Ave., Mail Stop MTV17-08
Mountain View, CA, 94043

Abstract

Many new portable computers are substituting an electronic stylus, or pen, for the mouse. While the pen can serve as a simple replacement for the mouse, it also provides an enhanced drawing capability. This capability opens up the potential for new modes of user interaction, one of which is text input through handwriting instead of keyboard entry. In this paper, the integration of handwriting recognition into the Unix operating system is discussed. We begin with an examination of the current state of the art in recognition algorithms and how handwriting recognition can enhance a user interface. A standard application program interface for handwriting recognition engines (HRE API) is then presented. The HRE API is distinguished from existing PC operating system API's in that it is specifically designed for multiple handwriting recognition engines of differing technologies, rather than a single, vendor-specific engine, and it shares a relatively narrow surface area with the window system. The latter characteristic allows it to be used with existing window systems, such as X, but does not hinder migration to other window systems should they become available. The API has been implemented with a public domain recognition engine and is currently being circulated among vendors of handwriting recognition engines for comment. Finally, the paper concludes with a discussion of where handwriting recognition belongs in the current X window system architecture, and what would be needed to make handwriting an equal partner with typed keyboard input for text entry.

1. Introduction

An important new trend in portable computing is the substitution of an electronic stylus, or pen, for the mouse. In some machines, the pen serves as a simple replacement pointing device and text entry is still primarily through the keyboard. In other more radical designs, the keyboard has been eliminated entirely. Text entry on keyboardless machines is accomplished by writing on the surface of the display. As the user writes, the window system software leaves a trail of electronic ink, and the result is translated into text by a handwriting recognition engine (HRE) when input is complete. Handwriting recognition distinguishes the use of the electronic stylus in these new machines from older, desktop stylus use, in which the stylus was employed just for drawing and pointing and not for text entry. Although the use of handwriting for text entry evolved in response to the lack of horizontal support for a keyboard in portable computers, there are occasions where handwriting for text entry may be appropriate in desktop systems as well.
This paper explores the integration of handwriting recognition into the Unix operating system. In the next section, we briefly review the hardware characteristics of electronic tablets and pens. Section 3 discusses the characteristics of text input through handwriting recognition in comparison with keyboard entry, and also reviews the current state of handwriting recognition technologies. Given the variety of handwriting recognition technologies available, an open systems approach to integration is essential. Sections 4 through 8 describe an open handwriting recognition engine application program interface (HRE API) written in ANSI C [Kernighan88]. The API can accommodate HRE's of various technologies, but has provisions allowing technologies with distinguishing characteristics to offer extensions without requiring all HRE's to support them. The API also features only a few, very limited contact points with the window system. In contrast, the two major commercial handwriting recognition API's, Microsoft Windows for Pen Computing (MWPC) [Microsoft92] and PenPoint [Go92], couple the handwriting recognition API very tightly with the window system and with the look and feel of the graphical user interface. We conclude the paper with a design for integrating the HRE into the existing standard window system on Unix, the X window system [Scheifler86], in Section 9, and a brief summary in Section 10.

2. Hardware Technologies for Pen Input

Tablet digitizers have existed for a number of years (see [Tappert88] for a review). There are currently three different technologies in widespread commercial use:

o Electromagnetic, in which the pen position is sensed by inductive coupling,

o Electrostatic, in which the pen position is sensed by capacitive coupling,

o Resistive, in which pressure on the tablet surface causes a change in resistivity, allowing the pen position to be sensed.

In addition, experimental acoustic and light sensing pen devices have been demonstrated. The most important advance in the last several years has been the coupling of the tablet with an LCD flat-panel display. With accompanying software, this allows the integrated tablet and display to mimic paper. For handwriting recognition, tablet requirements are very strict: the tablet must have a resolution of 200 points per inch and a sampling rate of 100 points per second.

3. Characteristics of Handwriting Recognition Technologies

A common criticism of handwriting for text entry, especially from professional programmers, is that people can type much faster than they can write. In fact, studies of actual text input speed [Dao92] indicate that people require 10-20 words before coming up to full typing speed, whereas only 2-4 words are required to come up to full speed when writing. Once up to full speed, however, fast typists tend to type at about 70 words per minute while fast writers tend to write at 30 words per minute. Slow typists tend to be somewhat slower than slow writers, 15 as opposed to 20 words per minute. These data indicate that handwriting is clearly not appropriate for tasks involving entry of large amounts of text, such as composing a document or writing a program. However, for applications with command language interfaces, such as spreadsheets, or for editing documents, handwriting could actually be faster than the keyboard. The advantage of handwriting is particularly noticeable when the application also requires interaction with a pointing device.
In a study of expert spreadsheet operators [Dao92], switching between the keyboard and mouse required 25 time units, while no switching was required in a handwriting-only interface. Furthermore, the time required for command and data entry in spreadsheet operation was also reduced in a handwriting-only interface, due to the lack of lag in coming up to full speed for short sequences of words.

Handwriting recognizers can be differentiated into two general categories, depending on the constraints imposed on written input. Block recognizers require users to write cleanly separated, block-printed characters, without any overlap. They recognize handprinting on a character-by-character basis. Some block recognizers relax constraints on overlapping, but still require block printing. Since most people don't naturally write with well-separated block letters, the extra effort required makes text entry with block recognizers slower than natural handwriting. Cursive recognizers allow writing with characters connected by ligatures, which is how most people write naturally. Some cursive recognizers are restricted to lower case only, while others allow mixed case and also handle block printing with overlapping characters. Most cursive recognizers, however, restrict the input text to words in a dictionary in order to obtain higher translation accuracy. Both types of recognizers come in writer-independent and writer-dependent forms. Writer-independent recognizers have good walk-up translation accuracy, and some can be improved by explicit training. Writer-dependent recognizers require the user to train the recognizer in order to get accurate translation.

Studies of recognition performance indicate that the accuracy of translation and recognition rate for current state of the art, commercial recognizers is acceptable, while still open for improvement. Users tend to judge the acceptability of translation accuracy based on the word accuracy of the result [Vallone91]. The word accuracy is the percentage of words correctly translated out of the total words entered. Character accuracy, or the percentage of characters correctly translated out of the total set of characters entered, is less important from the user's perspective. Commercial block recognizers can achieve about a 90% word recognition rate [Vallone91], while commercial cursive recognizers can achieve about the same on restricted dictionary sizes [Kempf92]. Most recognizers can achieve good recognition rates even on PC-class machines. On a 33 MHz 486 PC, cursive recognition requires between 90-180 milliseconds per character, which most users judge as excellent [Kempf92]. While improvements in translation accuracy are desirable, the current state of the art is already highly acceptable in exactly those kinds of limited text input applications where the underlying dynamics of handwriting versus typing indicate handwriting has an advantage.

A wide variety of pattern recognition and artificial intelligence technologies have been applied to the problem of on-line handwriting recognition (for a technology review, see [Tappert88]; for a review of commercial products, see [Gibbs93]). Chain codes, feature analysis, and templates have been applied to handwriting recognition for some time. More recently, neural nets have become popular for both cursive and block recognition [Martin92] [Mori92]. No commercial recognizer has yet exploited the full potential of learning, however.
Since handwriting recognition is something which people are naturally good at, most people expect the computer to perform as well as, if not better than, a person. They expect that the computer will be able to learn their handwriting automatically, even if it is difficult for another person to read. The block recognizers distributed with MWPC and PenPoint, as well as most other block and cursive recognizers, allow users to explicitly train the recognizer for personal variations in their handwriting, but the training must be done explicitly. Ideally, training would occur by having the recognizer observe recognition errors, so that the recognition rate for the primary user improved over time without causing a deterioration in the untrained, walk-up rate.

The above analysis applies primarily to Western languages and the computer command languages derived from them. For East Asian languages, the large number of ideographic characters often requires that multiple keys be pressed to input a single character, rather than a single key as in Western languages. Although no comparative studies have been done, the difficulty of entering ideographic characters with a keyboard suggests that handwriting recognition may actually be a superior method of text entry, even for large volumes of text. Indeed, research into handwriting recognition for East Asian languages is somewhat more active than for Western languages, and commercial products are available targeted specifically to the East Asian market.

4. The Handwriting Recognition Engine API

To help foster the integration of handwriting recognition into the Unix software environment, an application program interface for handwriting recognition was designed. Recognition of text, gestures, and arbitrary objects is supported by the interface. Arbitrary objects need to be supported by the API because some recognizers convert pen strokes into geometric figures. Gestures are noncharacter strokes which act like the commands that are typically on a menu or associated with a function key (copying, pasting, etc.). Pen-based graphical user interfaces (GUI's) employ gestures in place of some menu or keyboard commands. The advantage of a gesture is that it can indicate in a single stroke a target for an action (by its position) and the action to be employed on the target (by its form). In contrast, a mouse or command key interface requires multiple mouse movements and button clicks to specify the target and activate the command.

4.1 Goals

The following is a list of explicit design goals for the API:

o Provide a functionally complete interface to HRE's in a wide variety of technologies,

o Allow particular HRE's to customize the API for technology-specific features,

o Fully support multiple HRE's both at the programmatic and system level,

o Minimize dependence on the window system API and completely decouple from the GUI look and feel,

o Fully support internationalization.

The first three goals address the open system nature of the API. The API must provide basic support for handwriting recognition service across a wide range of technologies and HRE products, without requiring support of technology-specific features from all recognizers. On the other hand, the design should have provision for extension if a particular technology or product can supply some additional service.
Application developers and system integrators may want to supply different recognizer configurations for application-specific and locale-specific purposes. Therefore, the API should support the installation and use of multiple HRE's. The fourth goal recognizes that, unlike MWPC and PenPoint, no one GUI or window system is likely to satisfy the wide range of Unix customers. In addition, since Unix has traditionally been the operating system of choice among a large segment of the computer science research community, a handwriting recognition API decoupled from any particular GUI may help to foster research into new ways of designing pen-based GUI's and new recognition technologies. Finally, the fifth goal addresses the potential interest in handwriting recognition among East Asian and other international users.

There are two important nongoals of the API:

o Support for recognizers which operate upon bitmaps,

o Support for signature verification.

Recognizers operating on bitmaps are typically associated with off-line, optical character recognition. The HRE API is specifically designed for on-line, stroke-based recognizers. For signature verification, there are additional issues, such as encryption, that need to be addressed in a full signature verification API. Signature verification also tends to use algorithms with different data requirements than handwriting recognition, and doesn't support gesture translation.

4.2 Design of the API

The HRE API consists of three pieces:

o A set of C structures for transmitting information between the client application and the recognition engine,

o A functional interface for use by clients of handwriting recognition services,

o An internal structure and functional interface for use by HRE implementors.

Because the input required by the HRE and the translated output are somewhat complex, the set of structures for communicating with the HRE is rich. The structure interface can be broken into the following groups:

o Input structures for conveying pen, tablet, and other recognition information to the recognition engine,

o Output structures for conveying the results of recognition to the client,

o Information structures for conveying information about the HRE configuration or for changing it,

o Structures for passing information to/from the recognizer about gestures,

o Scalar constants and flags for indicating particular pen, tablet, and other state; for example, whether the pen tip is currently in contact with the tablet.

In this paper, the API functions involved in extension and translation, and their associated structure arguments and returns, are discussed in detail; the rest are reviewed briefly. For more information on the HRE API, see [Kempf93b].

4.3 The Recognition Manager

HRE's are packaged as shared libraries [USL92a]. The HRE API implementation, or recognition manager, manages the interaction between the recognition client and the HRE shared library, insulating the client from differences in HRE's. The recognition manager performs the loading and unloading of recognizers, the initialization of internal recognizer state, the mapping of client-visible operation invocations into specific recognizer functions, and the finalization of the recognizer when the HRE is unloaded. The relationship between the HRE shared libraries, recognition manager, and client is illustrated in Figure 1.

Figure 1: Architecture of the Recognition Manager
The client loads a recognizer by calling the function recognizer_load() with arguments specifying the shared library name of a specific recognizer, any character subsets to which the recognizer should be restricted, and (optionally) the directory where the shared library is located. The opaque recognizer object returned is used as the first argument to all client API functions. The client is allowed to restrict the character sets upon which the recognizer operates because such restriction can improve translation accuracy markedly. For example, if the client is just translating Social Security numbers, the recognizer can be restricted to the numerals. When the client no longer needs recognition services, it unloads the recognizer by passing the recognizer object to the function recognizer_unload().

The internal implementation of a recognizer object is a C structure with some data members and a collection of function pointers to the implementation functions in the HRE shared library, similar to a C++ virtual function table [Stroustrup91]. The structure is shown in Figure 2. In addition to the function pointers, the structure contains a starting and ending magic number (recognizer_magic and recognizer_end_magic) to check integrity, the API version number (recognizer_version), a pointer to a structure with the recognition configuration (recognizer_info), and a handle to the shared library (recognizer_handle).

    typedef struct _Recognizer {
        u_int     recognizer_magic;
        char*     recognizer_version;
        rec_info* recognizer_info;
        void*     recognizer_handle;
        u_int     recognizer_end_magic;
    } *recognizer;

Figure 2: Internal Recognizer Structure

The recognition manager calls the global function __recognizer_internal_initialize() in the HRE shared library after the shared library has been loaded. The shared library must implement this function to allocate a recognizer object and initialize the function pointers. It can also perform any HRE-specific initialization. The initialization function is guaranteed to be called before any client-level function. When the recognizer is unloaded, the recognition manager calls the global function __recognizer_internal_finalize() in the HRE shared library. This function must save any recognizer state if necessary, deallocate any internal data structures, and finally deallocate the recognizer object. The recognition manager then unloads the shared library. The recognition manager looks up the initialization and finalization functions in the shared library using the SVR4 function dlsym(3) [USL92b].

A recognizer may require particular internal state files for specifying the recognition characteristics of printed or written script, dictionaries of words, or other technology-specific internal state. The client API contains functions for loading and storing internal recognizer state (recognizer_load_state() and recognizer_store_state()). These two calls can be used by the recognizer to handle client requirements for user-specific or application-specific internal state, such as customized training prototypes of letters for particular users or dictionaries of words in a vocabulary specific to a particular application. In addition, the recognizer can load default prototype files when it is initialized. The API contains no functions for obtaining detailed information about letter prototypes, since this information is highly technology-specific. For example, a cursive recognizer may not be able to provide any stroke information on the words it recognizes while a block recognizer could. If a particular recognizer can supply such additional information, it can add an extension function.
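As a concrete illustration of the load/unload life cycle described above, a client might look like the following sketch. The argument order for recognizer_load() (engine name, character subsets, optional directory) is inferred from the description above, and the subset name "NUMERALS" and engine name "block.so" are purely hypothetical.

    /* Minimal sketch of the recognizer life cycle, assuming the
     * recognizer_load() argument order described in the text. */
    #include <stdio.h>
    #include "hre.h"

    int main(void)
    {
        char* subsets[] = { "NUMERALS", NULL };  /* hypothetical: digits only */
        recognizer rec;

        /* NULL directory: search $RECHOME/$LANG (see Section 4.4). */
        rec = recognizer_load("block.so", subsets, NULL);
        if (rec == NULL) {
            fprintf(stderr, "could not load recognizer\n");
            return 1;
        }

        /* ... call the translation functions of Section 6 ... */

        recognizer_unload(rec);
        return 0;
    }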
4.4 Naming of Handwriting Recognizers and Data

Since recognizers are provided as shared libraries, their names must fit into the standard SVR4 scheme for naming shared libraries [USL92a], namely:

    filename.so

where filename is the name of the recognition engine. The recognition manager uses an environment variable to determine where to look for HRE's of various locales. The pathname for HRE's is:

    $RECHOME/$LANG

where RECHOME is an environment variable set to the root directory containing the system recognizers and LANG is the standard SVR4 environment variable set to the current locale name [USL92c]. If RECHOME is not set, the directory /usr/lib/recognizers is searched. If LANG is not set, the default locale (C) is used. A client can also override these conventions by supplying a directory name argument directly to recognizer_load().

An HRE may require additional files to be loaded after the shared library is loaded and before the HRE is ready to accept client requests. Any additional files should be collocated with the main HRE shared library, and either the shared library _init() function or the recognition engine initialization function __recognizer_internal_initialize() should perform the additional operations.

Some recognizers may collect information for individual users, such as individualized training prototypes. To avoid cluttering up the user's home directory with multiple files, on first use the recognition manager creates a directory for user-specific recognition data. The directory has the name $HOME/.recognizers, where HOME is the environment variable containing the name of the user's home directory. By convention, an individual recognition engine puts user-specific data into files named with the HRE shared library file name, minus the .so extension, or, if more than one file is required, by creating a subdirectory, having the HRE name, to contain the files.

4.5 Error Handling

Client API functions return either a pointer to a translation or other data, or an integer error code. A client determines that an error has occurred if either:

o A NULL pointer has been returned from a function which should return a structure pointer,

o An integer error code has been returned from a function which does not return a structure pointer.

Error handling in the HRE API is similar to that in dynamic linking for SVR4 [USL92b]. Each HRE implements a recognizer_error() function that returns a string describing the last error since the previous time recognizer_error() was called. The recognition manager handles its errors similarly. If no errors occurred, or if recognizer_error() is called twice in a row without any recognition operation between, NULL is returned. As with the recognizers themselves, handling of error messages generated by the recognition manager is internationalized, and individual recognition engine vendors are encouraged to internationalize their error message handling as well.
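As a sketch, and assuming recognizer_error() takes the recognizer object as its first argument like the other client functions, the two conventions might be checked as follows (recognizer_translate() and recognizer_train() are described in Section 6):

    /* Sketch of the two error-checking conventions above; the
     * assumption that 0 means success for recognizer_train() is
     * the author's, not stated in the specification. */
    #include <stdio.h>
    #include "hre.h"

    void check_errors(recognizer rec, rc* context, u_int nstrokes,
                      pen_stroke** strokes, rec_element* elem)
    {
        /* Pointer-returning function: NULL signals an error. */
        rec_alternative** alts =
            recognizer_translate(rec, context, nstrokes, strokes);
        if (alts == NULL)
            fprintf(stderr, "translate: %s\n", recognizer_error(rec));

        /* Integer-returning function: an error code is returned
         * (assumed 0 on success, nonzero on error). */
        if (recognizer_train(rec, context, nstrokes, strokes, elem, true) != 0)
            fprintf(stderr, "train: %s\n", recognizer_error(rec));
    }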
5. Client Structure Interface

The client structure interface is fairly rich due to the nature of the information exchanged between the client and HRE. Clients pass information on the tablet, the physical and linguistic context in which recognition is occurring, and the actual strokes to the recognition engine. The recognition engine returns the recognized object (an ASCII, variable byte, or double byte string, a gesture, or an arbitrary object). The recognition engine must also provide correlation between strokes and recognized text, gestures, or objects, so that a pen-based GUI can display feedback.

The structure definitions reside in the file hre.h. Only structures directly involved in translation are discussed here; see [Kempf93b] for more information about other structures. The structure interface uses a number of standard Unix types and a few additional scalar types. Standard Unix scalar types are taken from sys/types.h, while time value types are taken from sys/time.h. Three additional scalar types are required:

o A standard boolean type is defined for boolean fields:

    typedef u_char bool;
    #define true  1
    #define false 0

o A function type is defined for use in typing the vector of function pointers to extension functions:

    typedef void (*rec_fn)();

o Since reports of recognizer confidence are restricted to the range 0 through 100, a simple type is defined for recognizer confidence:

    typedef u_char rec_confidence;

5.1 Recognition Input Structures

The client uses a number of structures to pass information into the recognizer. In Figure 3, structures and constants which the recognizer must share with the window system are shown. There are only four such structures: a point structure (pen_point), a rectangle structure (pen_rect), a structure describing current pen state (pen_state), and a structure describing the hardware characteristics of the tablet (tablet_info). The pen_point and pen_rect structure definitions are identical to the definitions for the X window system [Nye92], so that rectangles and arrays of points can be passed directly to the recognizer without time-consuming copying. Because X does not yet have a standardized pen extension, the pen_state and tablet_info structures have no equivalent in X.

    typedef struct {
        short x, y;
    } pen_point;

    typedef struct {
        short x, y;
        short width, height;
    } pen_rect;

    #define PEN_DOWN         0x1
    #define PEN_BUTTON1      0x2
    #define PEN_BUTTON2      0x4
    #define PEN_BUTTON3      0x8
    #define PEN_OUT_OF_RANGE 0x10

    typedef struct {
        int    pt_state;
        int    pt_pressure;
        bool   pt_invert;
        double pt_anglex;
        double pt_angley;
        double pt_barrelrotate;
    } pen_state;

    #define TABLET_BARREL1    0x1
    #define TABLET_BARREL2    0x2
    #define TABLET_BARREL3    0x4
    #define TABLET_INTEGRATED 0x8
    #define TABLET_PROXIMITY  0x10
    #define TABLET_RANGE      0x20
    #define TABLET_RELATIVE   0x40
    #define TABLET_PRESSURE   0x80
    #define TABLET_HEIGHT     0x100
    #define TABLET_INVERT     0x200
    #define TABLET_ANGLEX     0x400
    #define TABLET_ANGLEY     0x800
    #define TABLET_ROTATE     0x1000

    typedef struct {
        int   ti_capabilities;
        u_int ti_maxx;
        u_int ti_maxy;
        u_int ti_sample_rate;
        u_int ti_sample_distance;
    } tablet_info;

Figure 3: Structure Definitions Shared with the Window System

The ti_capabilities member indicates basic tablet hardware capabilities, and is a combination of the TABLET_xxx constants using bitwise "or". The ti_maxx and ti_maxy members of the tablet_info structure indicate the maximum x and y coordinates that the tablet reports. All tablets are required to provide these. Some tablets may also be able to report the sampling rate, in samples/second, and the sampling distance. The sampling distance is the number of tablet coordinates the pen must move before a point event is generated. The ti_sample_rate and ti_sample_distance members indicate the sampling rate and sample distance, respectively, and are zero if a tablet is incapable of reporting the information.
The TABLET_xxx constants indicate the following set of capabilities (constant names for reporting in parentheses):

o A depressible pen tip or other indication that the pen is in contact with the tablet (required and assumed by default),

o One to three barrel buttons (TABLET_BARRELx),

o Tablet is integrated with a flat panel display (TABLET_INTEGRATED),

o Tablet reports the pen location even when the pen is not in contact with the tablet. Note that this does not necessarily mean the tablet can report height information (TABLET_PROXIMITY),

o Tablet reports when the pen moves out of sensing range (TABLET_RANGE),

o Tablet only reports relative positions, like a mouse. Absolute position reporting is assumed otherwise (TABLET_RELATIVE),

o Tablet reports the pressure of the pen against the tablet (TABLET_PRESSURE),

o Tablet reports the height of the pen above the tablet (TABLET_HEIGHT),

o Tablet reports when the pen is inverted, such as would be the case during erasing with a real pen having an eraser (TABLET_INVERT),

o Tablet reports the x and y angle of the pen with the tablet surface (TABLET_ANGLEX and TABLET_ANGLEY),

o Tablet reports when the pen barrel has been rotated (TABLET_ROTATE).

The pen_state structure indicates the state of the various pen capabilities during a single pen stroke. The pt_state member is set to the logical "or" of the PEN_xxx constants representing the state. All tablets must report PEN_DOWN when the pen is in contact with the tablet. The other constants indicate additional button (PEN_BUTTONx) and proximity (PEN_OUT_OF_RANGE) state, and can be set if the corresponding flags in the ti_capabilities member of the tablet_info structure indicate the hardware has that capability. Whether the rest of the pen_state members can be set also depends on which flags are set in ti_capabilities. The pt_pressure member is dimensionless and is positive for pressure against and negative for height above the tablet. The pt_invert member is true if the pen is inverted. The pt_anglex, pt_angley, and pt_barrelrotate members are the attitude and barrel rotation angles, respectively, all in radians.

    typedef struct {
        struct timeval ps_tstart;
        struct timeval ps_tend;
        u_int          ps_npts;
        pen_point*     ps_pts;
        pen_state*     ps_state;
    } pen_stroke;

    #define REC_LEFTH      0x1

    #define REC_DEFAULT    0x0
    #define REC_BOTTOM_TOP 0x1
    #define REC_LEFT_RIGHT 0x2
    #define REC_RIGHT_LEFT 0x3
    #define REC_TOP_BOTTOM 0x4

    #define REC_ULEFT      0x0
    #define REC_URIGHT     0x1
    #define REC_LLEFT      0x2
    #define REC_LRIGHT     0x3

    typedef struct {
        u_short        rc_upref;
        bool           rc_gesture;
        rec_confidence rc_error_level;
        u_short        rc_direction;
        u_short        rc_orient;
        tablet_info*   rc_tinfo;
    } rc;

Figure 4: Stroke and Recognition Context Structures

Basic input data for the recognizer is supplied by the pen_stroke and rc, or recognition context, structures, shown along with their constants in Figure 4. The pen_stroke structure contains two timeval structure members, ps_tstart and ps_tend, indicating the starting and ending times of the stroke. The ps_npts and ps_pts members are the number of points in the stroke and the array of points. Note that the client need not copy the array of points returned by the window system, but can simply attach a pointer to the ps_pts member, reducing the amount of time required to initialize the data structures for recognition. The ps_state member is a pointer to a pen_state structure, giving the state of the tablet and pen during the stroke.
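For instance, a client holding a point array delivered by the window system might package a stroke as in the following sketch; the timestamps and pen state are assumed to come from the pen event stream, however the window system delivers it.

    /* Sketch: package window-system ink as a pen_stroke without
     * copying the point array, as described above. */
    #include <sys/time.h>
    #include "hre.h"

    pen_stroke make_stroke(pen_point* pts, u_int npts,
                           struct timeval start, struct timeval end,
                           pen_state* state)
    {
        pen_stroke s;

        s.ps_tstart = start;  /* time the pen first touched down */
        s.ps_tend   = end;    /* time the pen lifted */
        s.ps_npts   = npts;   /* number of sampled points */
        s.ps_pts    = pts;    /* attach the array; no copy needed */
        s.ps_state  = state;  /* pen state during the stroke */
        return s;
    }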
The recognition context is somewhat like a graphics context in X [Nye92]. It contains particular tablet configuration information that can change, but usually remains constant between calls on the recognizer. The rc_upref member indicates the user preferences. Currently, this member only indicates whether the user is right handed (the default) or left handed (REC_LEFTH). The rc_gesture member is set to true if the client would like the recognizer to include gestures in the translation. The rc_error_level member is set to the recognition confidence level, an integer in the range 0 to 100. It indicates the confidence level below which the recognizer should report that no translation could be found. The rc_direction member indicates the preferred and the secondary writing directions. For example, in English, the preferred direction is from left to right, and the secondary direction is from top to bottom. The preferred direction is in the upper byte and the secondary writing direction in the lower byte of the u_short. Writing directions are indicated by the constants REC_DEFAULT through REC_TOP_BOTTOM. The rc_orient member gives the tablet orientation, and is set to one of the constants REC_ULEFT through REC_LRIGHT. The orientation indicates the location of the tablet origin. This information is not in the tablet_info structure because it can change if an application changes the tablet orientation from portrait to landscape, such as might be the case in a forms-based application. Finally, the rc_tinfo member contains a pointer to the underlying tablet information in a tablet_info structure.

    #define REC_NONE    0x0
    #define REC_GESTURE 0x1
    #define REC_ASCII   0x2
    #define REC_VAR     0x4
    #define REC_WCHAR   0x8
    #define REC_OTHER   0x10

    typedef struct {
        char re_type;
        union {
            gesture* gval;
            char*    aval;
            wchar_t* wval;
        } re_result;
        rec_confidence re_conf;
    } rec_element;

    typedef struct {
        u_int        ra_nelem;
        rec_element* ra_elem;
    } rec_alternative;

    typedef struct Gesture {
        char*      g_name;
        u_int      g_nhs;
        pen_point* g_hspots;
        pen_rect   g_bbox;
        void       (*g_action)(struct Gesture*);
    } gesture;

    typedef void (*xgesture)(gesture*);

Figure 5: Basic Output Structures for Translation

5.2 Recognition Output Structures

For most recognizers, the mapping between the input strokes and output text or object is rarely one to one, so the recognizer translation functions must return arrays of alternative translations. The basic output structures for translation are shown in Figure 5. The rec_element structure is the basic recognition return. It contains a re_type field indicating the type of the returned value in the re_result union. Possible types are indicated by the REC_xxx flags in Figure 5. The REC_NONE flag indicates that no value was returned. The REC_ASCII, REC_VAR, and REC_OTHER flags indicate that the aval union member is valid; the client is responsible for casting to the appropriate type. REC_WCHAR indicates that the wval member is valid, while REC_GESTURE indicates that gval is valid. The re_conf member indicates the confidence (0 to 100, but greater than or equal to rc_error_level) which the recognizer places in the translation. The rec_alternative structure provides a way of returning a group of alternative translations for the same strokes. The translations are in the ra_elem array, while the ra_nelem member gives the size of the array.
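A client walking one rec_alternative might dispatch on re_type as in this sketch:

    /* Sketch: examine the elements of one rec_alternative,
     * dispatching on the re_type flags of Figure 5. */
    #include <stdio.h>
    #include "hre.h"

    void show_alternative(rec_alternative* alt)
    {
        u_int i;

        for (i = 0; i < alt->ra_nelem; i++) {
            rec_element* e = &alt->ra_elem[i];

            switch (e->re_type) {
            case REC_ASCII:
                printf("text \"%s\", confidence %d\n",
                       e->re_result.aval, e->re_conf);
                break;
            case REC_GESTURE:
                printf("gesture \"%s\"\n", e->re_result.gval->g_name);
                break;
            case REC_NONE:
                printf("no translation\n");
                break;
            default:
                /* REC_VAR, REC_WCHAR, REC_OTHER: the client casts
                 * aval or wval to the appropriate type. */
                break;
            }
        }
    }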
The gesture structure communicates recognized gestures to the client. The g_name member is a distinctive gesture name. The client identifies the gesture with this name when setting the gesture action. The g_nhs and g_hspots members are the number of gesture hotspots and the hotspots themselves, respectively. Gesture hotspots are like cursor hotspots in X, and are used by the client to calculate the target of the gesture. Similarly, the g_bbox member, containing the bounding box around the gesture, is used to calculate where the gesture target is located. Finally, the g_action member is a pointer to a callback function that the client (usually a window toolkit) installs before recognition begins. Upon recognition, the client can execute the function to perform the gesture action.

    typedef struct {
        rec_element  ro_elem;
        u_int        ro_nstrokes;
        pen_stroke** ro_stroke;
        u_int*       ro_start;
        u_int*       ro_stop;
    } rec_correlation;

    typedef struct {
        u_int            rca_ncorr;
        rec_correlation* rca_corr;
    } rec_corralternative;

Figure 6: Correlated Translation Output Structures

Information on the correlation between a translation and the original strokes is provided by the structures in Figure 6. The rec_correlation structure contains the rec_element member ro_elem with the translation, along with the ro_nstrokes and ro_stroke members, giving the number of strokes and a pointer to a null terminated array of pointers to the strokes corresponding with the translated element. The ro_start and ro_stop members are pointers to arrays of integers giving the starting and stopping point index for the corresponding translation in each stroke. The rec_corralternative structure allows the return of alternative correlated translations. It contains an rca_ncorr member with the number of correlated translations and an rca_corr member with a pointer to the array of correlated translations.

    rec_alternative** recognizer_translate(recognizer rec, rc* rec_xt,
                                           u_int nstrokes, pen_stroke** strokes)

    rec_corralternative** recognizer_correlate(recognizer rec, rc* rec_xt,
                                               u_int nstrokes, pen_stroke** strokes)

    int recognizer_train(recognizer rec, rc* rec_xt, u_int nstrokes,
                         pen_stroke** strokes, rec_element* re, bool replace_p)

    char** recognizer_get_gesture_names(recognizer rec)

    xgesture recognizer_set_gesture_action(recognizer rec, char* name,
                                           xgesture action)

    rec_fn* recognizer_get_extension_functions(recognizer rec)

Figure 7: Translation and Extension API

6. Client Interface Functions

Figure 7 contains the translation and training functions in the API client interface. The recognizer_translate() and recognizer_correlate() functions return null terminated arrays of pointers to rec_alternative and rec_corralternative structures, respectively. These functions provide basic translation services. The recognizer_train() function takes a rec_element pointer and a boolean indicating whether the training should replace any existing translation, in addition to the translation function arguments. The recognizer_train() function trains the recognizer to return the rec_element contents when the argument strokes are recognized. The recognizer_get_gesture_names() function returns an array of gesture names, or NULL if no gestures are supported, and recognizer_set_gesture_action() sets the callback function for a particular gesture. The recognizer_get_extension_functions() call returns a NULL terminated array of pointers to extension functions for the recognizer. Naturally, to use the extension functions, the recognition engine must supply interface definitions for the functions, and the client must compile in the interface definitions and cast the returned pointers to the appropriate types. A recognizer can also extend a structure by "laminating" members on after the standard definition [Go92]. Using a combination of extension functions and laminated structures, the HRE API achieves extensibility similar to single inheritance in C++ [Stroustrup91] (with less strict type checking), but in ANSI C. An ANSI C interface is important because the overwhelming majority of commercial HRE's, and some research ones as well, are implemented in ANSI C.
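Putting the calls in Figure 7 together, a simple client translation step might look like the following sketch. Whether the first element of the first alternative is the most confident is an assumption; a careful client would compare re_conf values and would free the returned structures with the recognition manager's deallocation functions (Section 8).

    /* Sketch: translate a group of strokes and pull out the text of
     * the first element of the first alternative, if there is one. */
    #include <stddef.h>
    #include "hre.h"

    char* first_text(recognizer rec, rc* context,
                     u_int nstrokes, pen_stroke** strokes)
    {
        rec_alternative** alts;
        rec_element* e;

        alts = recognizer_translate(rec, context, nstrokes, strokes);
        if (alts == NULL || alts[0] == NULL || alts[0]->ra_nelem == 0)
            return NULL;              /* error or nothing recognized */

        e = &alts[0]->ra_elem[0];     /* assumed best alternative */
        return (e->re_type == REC_ASCII) ? e->re_result.aval : NULL;
    }

Installing a gesture callback would be similar in spirit: the client obtains a name from recognizer_get_gesture_names() and passes it, along with an xgesture function, to recognizer_set_gesture_action().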
7. Recognition Examples

This section presents some examples of stroke input and the returned data structures.

7.1 Ambiguous First Letter

In the first example, a block recognizer is given handwritten input in which the first two strokes could be interpreted as either "k" or "lc". Suppose the recognizer_translate() function was called with the stroke input. The returned data structure for this example would look like Figure 8. The rec_element structure array for the first letter contains two elements, one for the interpretation of the strokes as "k" and the other for the interpretation as "lc". The rest of the elements have only one alternative. The return value in this case would be a null-terminated array of pointers to the rec_alternative structures for the different letter alternatives.

Figure 8: Returned Data Structure for Ambiguous First Letter

7.2 Correlation of Delayed Strokes

The next example shows how the rec_correlation structure can handle delayed strokes. In the sample word, the writer waited until the end of the word to go back and cross the "t", so the crossing is the fourth and final stroke. The data structure returned from recognizer_correlate() is shown in Figure 9. Again, the recognizer uses a block algorithm for character-based recognition. The first correlation consists of the letter "t" with the first and fourth strokes. The other correlations are for a single stroke only.

Figure 9: Stroke Correlation for Delayed Crossing of "t"

7.3 Alternative Translations of a Single Stroke

The third example shows how a correlation can be established for alternative translations of a single stroke. In this case, a cursive recognizer is doing the translation, and the stroke input could be either the letter "w" or the letters "iu", with the writer having forgotten to dot the "i". Here the correlation indices correspond to points within a single stroke. As shown in Figure 10, the correlation alternative array has two elements, one for the translation of the stroke as "w" and the other for the translation as "iu". The start and stop arrays in the second case indicate where the recognizer stopped translating the stroke as an "i" and started translating it as "u". The low confidence levels for the second correlation indicate that the translation as "iu" is unlikely.

Figure 10: Multiple Translations Corresponding to a Single Stroke
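The correlation returns in these examples could be examined with a sketch like the following, which prints each correlated text element and the point ranges it covers in each stroke:

    /* Sketch: print the correlated translations in a
     * rec_corralternative, using the structures of Figure 6. */
    #include <stdio.h>
    #include "hre.h"

    void show_correlations(rec_corralternative* corr)
    {
        u_int i, j;

        for (i = 0; i < corr->rca_ncorr; i++) {
            rec_correlation* c = &corr->rca_corr[i];

            if (c->ro_elem.re_type != REC_ASCII)
                continue;             /* only text elements here */
            printf("\"%s\" (confidence %d) from %u stroke(s):\n",
                   c->ro_elem.re_result.aval, c->ro_elem.re_conf,
                   c->ro_nstrokes);
            for (j = 0; j < c->ro_nstrokes; j++)
                printf("  stroke %u: points %u through %u\n",
                       j, c->ro_start[j], c->ro_stop[j]);
        }
    }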
8. Implementation Status

The recognition manager has been implemented with a single-stroke, public domain recognizer [Rubine91] as a test vehicle. To ease the task of integrating a recognizer, a collection of allocation and deallocation functions for the API data structures, somewhat like C++ constructors and destructors [Stroustrup91], was added to the recognition manager. Some of these are available to clients through the client API, since they are needed to create structures passed into the recognizer and to deallocate returned structures when they are no longer needed. A GUI-based tool for testing the integration of a recognizer was built around the HRE API. It allows various recognizers to be loaded and tested through the HRE API. The API specification is currently being circulated among vendors of HRE's for comment and is available from the author upon request.

9. Handwriting Recognition and the X Window System Architecture

Since the X window system is the standard window system on most Unix platforms, a standardized API for handwriting recognition needs to be integrated into the X window system architecture. A prerequisite for handwriting recognition is a source of pen strokes. The lack of a standardized pen extension in the X11R5 distribution means that client applications wanting to employ handwriting recognition need to handle the details of pen input themselves. This implies using the primary pointer or an X input extension [Ferguson92] for the pen and implementing electronic ink by drawing lines. Requiring the client to handle the pen is particularly unattractive in a remote viewing situation, since drawing electronic ink involves a time-consuming client-server round trip. An additional problem when the pen is also the primary pointing device is that, while the X input extensions allow other devices to be substituted for the mouse, the hardware characteristics must be exactly the same. As a result, it is not possible to get information specific to the pen through X, such as pressure information, when the pen is the primary pointing device. A standardized pen extension to X would simplify handwriting recognition in applications.

There have been a number of attempts to integrate pen extensions into X. An experimental pen server extension has been contributed to the X11R5 distribution by IBM [Rhyne92]. The IBM server pen extension enforces a policy of electronic inking similar to PenPoint, i.e., the pen trails ink wherever it touches the tablet. IBM has also developed two experimental Motif widgets, one for drawing and one for entering text via handwriting recognition. A simpler server extension for pen input is described in [Kempf93a]. In this extension, the server handles inking and buffers pen points until the client asks for them. Pen events are mapped into mouse events, and the server draws ink only for windows which specifically register themselves as pen windows. User interface policy issues are left to the window manager and toolkit, as is typically the case in X.

Despite the lack of a standardized pen extension, design considerations suggest some alternatives for placing handwriting recognition in the X architecture. Figure 11 gives an overview of how handwriting recognition could fit into a variety of X input situations. Since X applications may want to choose from a variety of different HRE's, handwriting recognition must be available as a client library feature, and not as part of the server. Most X applications are written using a window system toolkit, so the logical place to put handwriting recognition is into an extension of the toolkit. As in the IBM pen extensions, a specialized widget can handle translating pen strokes into text. The toolkit routes pen strokes to the HRE when the strokes appear on a particular widget, and displays the resulting text.
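As a sketch of this toolkit-level flow, a hypothetical pen widget's stroke-completion handler might hand ink to the HRE like this. The widget type, the handler, and display_text() are all illustrative; no standard pen widget exists.

    /* Hypothetical pen-widget handler, called by the toolkit when a
     * stroke is complete on a recognition widget.  All names other
     * than the HRE API calls are illustrative only. */
    #include <stddef.h>
    #include "hre.h"

    extern void display_text(void* widget, char* text);  /* hypothetical */

    void stroke_done(void* widget, recognizer rec, rc* context,
                     pen_stroke* stroke)
    {
        pen_stroke* strokes[1];
        rec_alternative** alts;

        strokes[0] = stroke;
        alts = recognizer_translate(rec, context, 1, strokes);
        if (alts != NULL && alts[0] != NULL && alts[0]->ra_nelem > 0
            && alts[0]->ra_elem[0].re_type == REC_ASCII)
            display_text(widget, alts[0]->ra_elem[0].re_result.aval);
    }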
Note that specialized pen widgets can also handle internationalization, as long as there is no requirement for existing text widgets to perform handwriting recognition. The input part of the X internationalization extension [Widener91] is designed to handle the translation of ASCII key codes to multibyte and wide characters. Since text input through handwriting does not involve key codes, recognition should not require an input method process. A pen extension to an existing internationalized toolkit could simply use the existing locale mechanism to determine which handwriting recognizer to load, as long as text display was internationalized.

Alternatively, existing text widgets could be enhanced to support handwriting input, as is the case in MWPC, or handwriting recognition could be integrated at the level of Xlib [Nye92]. While enhancing Xlib or existing text widgets does allow current applications to migrate transparently to the pen, the model of substituting pen input directly for keyboard input might not be optimal from the user's perspective. Correction of translated handwriting requires more precise hand/eye coordination for graphical input than the simple finger and arm movements of keyboard and mouse input. Users may perceive correcting mistranslated handwriting as requiring more concentration than simply backspacing or using the mouse to select and delete a region. Translation correction also requires some controls, such as a button to indicate when the correction should be accepted. In a study of cursive handwriting recognizers done with MWPC, users often expressed a desire for a different kind of control over the translation and correction of handwriting than exists for keyboard text input windows [Kempf92]. Ultimately, a GUI "look and feel" designed around pen input, such as PenPoint offers, would provide more comfortable interaction on machines where the pen and an integrated tablet and display are the only input devices. In X, a new "look and feel" would require a new toolkit and a new window manager.

Figure 11: Integrating Handwriting Recognition into X

10. Summary

Pen input and handwriting recognition are new text input technologies developed for small, mobile PC-style computers. These technologies allow input of text without using a keyboard. The user writes directly on the display surface with an electronic stylus or pen, and the computer's operating system or window system software follows along, drawing a trail of electronic ink. The pen strokes are then translated into text by a handwriting recognition engine (HRE). Handwriting input tends to be particularly handy when a few words are needed, such as would be the case for a command language style interface or when editing a document. It is less useful when large quantities of text are required, such as when writing a document or programming. While the focus of handwriting recognition has been on mobile computers, handwriting could also play a role in desktop applications.

There are two major kinds of handwriting recognizers: block and cursive. Block recognizers require the user to print in well-separated block letters, while cursive recognizers allow letters to be connected with ligatures. Both block and cursive recognizers have a word translation accuracy of about 85-90%. Currently, a wide variety of recognition technologies are either under development or commercially available.
East Asian languages look like a particularly good candidate for recognition, because the process of entering text on a keyboard in East Asian languages requires more effort.

To help foster development of applications employing handwriting recognition, and research into technological improvements in recognition and handwriting-based user interfaces, a standardized HRE API was designed. The API allows multiple different recognizers to coexist and is internationalized to allow support for local language recognition. The API implementation, or recognition manager, requires the HRE to be packaged as a shared library that is dynamically loaded upon client demand. The API contains functions for translating handwriting, for obtaining translations with correlations between the translation and the original pen strokes, and for training the recognizer. The API also has a function whereby the recognizer can provide a vector of functions for extended capabilities. The recognition manager has been implemented on Solaris 2.x, Sun's distribution of SVR4.

In the current X window system environment, clients are required to deal with the details of pen input themselves, before handing off the pen strokes to the handwriting recognition engine. The alternative is to provide server and toolkit support for pen input. Two server extension prototypes and one toolkit extension prototype have already been developed. Full integration of handwriting into the X window system architecture is likely to require a standardized server extension to X for pen input, and toolkit extensions to handle handwriting recognition. Alternatively, the existing toolkit text widgets could be extended for handwriting input, though users often express a preference for a different kind of control over handwriting translation than exists for keyed text. Ultimately, a very effective pen-enhanced or pen-only GUI may require a redesigned GUI "look and feel" for pen, implemented as a new toolkit and window manager on X.

11. References

[Dao92] Dao, J., "The Next Computer Paradigm," Computer Intelligence Corp., 1992.

[Ferguson92] Ferguson, P., "The X Input Extension," X Resource, 4, pp. 171-270, 1992.

[Gibbs93] Gibbs, M., "Handwriting Recognition: A Comprehensive Comparison," Pen, 12, pp. 31-35, 1993.

[Go92] PenPoint Programmer's Reference, Go Corp., Foster City, CA, 1992.

[Kempf92] Kempf, J., "An Evaluation of Cursive Handwriting Recognition Technology," Sun Microsystems, 1992.

[Kempf93a] Kempf, J., and Wilson, A., "Supporting Mobile, Pen-Based Computing with X," X Resource, 5, pp. 203-211, 1993.

[Kempf93b] Kempf, J., "Preliminary Handwriting Recognition Application Program Interface for SPARC," Sun Microsystems Computer Corp., 1993.

[Kernighan88] Kernighan, B., and Ritchie, D., The C Programming Language: Second Edition, Prentice-Hall, Englewood Cliffs, NJ, 272 pp., 1988.

[Martin92] Martin, G., and Pittman, J., "Recognizing Hand-Printed Letters and Digits," in Advances in Neural Information Processing, D. Touretzky, ed., Morgan Kaufmann, Menlo Park, CA, pp. 415-422, 1990.

[Microsoft92] Microsoft Windows for Pen Computing: Programmer's Reference, Microsoft Corp., Redmond, WA, 1992.

[Mori92] Mori, Y., and Joe, K., "A Large-Scale Neural Network which Recognizes Handwritten Kanji," in Advances in Neural Information Processing, D. Touretzky, ed., Morgan Kaufmann, Menlo Park, CA, pp. 415-422, 1990.

[Nye92] Nye, A., Xlib Programming Manual, O'Reilly and Associates, Sebastopol, CA, 645 pp., 1992.
[Rubine91] Rubine, D., The Automatic Recognition of Gestures, Ph.D. thesis, School of Computer Science, Carnegie Mellon University, 1991.

[Rhyne92] Rhyne, J., et al., "Enhancing the X-Window System," Dr. Dobb's Journal, pp. 30-38, Dec., 1991.

[Scheifler86] Scheifler, R., and Gettys, J., "The X Window System," ACM Transactions on Graphics, 5(2), April, 1986.

[Stroustrup91] Stroustrup, B., The C++ Programming Language: Second Edition, Addison-Wesley, Reading, MA, 669 pp., 1991.

[Tappert88] Tappert, C., et al., "On-Line Handwriting Recognition: A Survey," Proceedings of the 9th Conference on Pattern Recognition, IEEE Computer Society Press, Washington, DC, pp. 1123-1132, 1988.

[USL92a] Unix System V Release 4: Programmer's Guide, Unix System Laboratories, Prentice Hall, Englewood Cliffs, NJ, 1990.

[USL92b] Unix System V Release 4: Interface Definition, Unix System Laboratories, Prentice Hall, Englewood Cliffs, NJ, 1990.

[USL92c] Unix System V Release 4: Multi-National Language Supplement, Unix System Laboratories, Prentice Hall, Englewood Cliffs, NJ, 1990.

[Vallone91] Vallone, R., et al., "Evaluation of Handwriting Recognition Technology: Word-level vs. Character-level Accuracy," Proceedings of the 35th Annual Meeting of the Human Factors Society, 1991.

[Widener91] Widener, G., and Joloboff, V., "Developing Internationalized X Clients," X Resource, 0, pp. 133-152, 1991.

James Kempf (james.kempf@sun.com) has spent 10 years in the computer industry on various interesting system software projects. At Hewlett-Packard Laboratories, he worked in research on object-oriented software. Some of the projects he was involved in include participating in the design of the Common Lisp Object System (CLOS), implementing CLOS in HP Common Lisp, and developing an object-oriented database that allowed exchange of objects between programs written in Common Lisp and Objective-C. In 1988, James moved to Sun Microsystems, where he spent a year helping implement CLOS in Lucid Common Lisp and writing a prototype development environment for CLOS. In 1989, James began working on the Spring distributed, object-oriented operating system at Sun, a project which was moved to Sun Microsystems Laboratories Incorporated when it was formed in 1991. Since 1992, James has worked in Sun Microsystems Computer Corporation as a nomadic software architect and pen-based computing resource person.