The following paper was originally presented at the
Third Annual Tcl/Tk Workshop, Toronto, Ontario, Canada, July 1995,
sponsored by Unisys, Inc. and the USENIX Association.
It was published by the USENIX Association in the
1995 Tcl/Tk Workshop Proceedings.

For more information about the USENIX Association contact:
1. Phone:   510 528-8649
2. FAX:     510 548-5738
3. Email:   office@usenix.org
4. WWW URL: https://www.usenix.org


Taming the Complexity of Distributed Multimedia Applications

Frank Stajano and Rob Walker
Olivetti Research Limited
24A Trumpington Street
Cambridge CB2 1QA
United Kingdom
{fstajano, rwalker}@cam-orl.co.uk
https://www.cam-orl.co.uk/
Phone: (+44 1223) 343.000
Fax: (+44 1223) 313.54

Abstract

The Medusa environment for networked multimedia uses Tcl to compose applications out of low-level processing blocks called modules. A medium-sized application such as a two-way multistream videophone already uses around one hundred interworking modules, running in parallel on several host machines. This paper shows how we overcome the inherent complexity of such applications: to deal with parallelism we use a multi-threaded library hidden behind a single-threaded Tcl interpreter; to build higher-order components than the modules we use the object-oriented extension [incr Tcl]; and to exploit the variety of available input and output devices we adopt the Model-View-Controller paradigm.

Important note from the authors

This is an ascii-only version which we were asked to produce by the conference organisers. To ensure consistency, it was obtained by automatically stripping pictures and formatting from the original, without any further re-editing (picture captions have been left in). As such it is incomplete, because the paper was intended to be read in conjunction with the figures and diagrams that were prepared for it. This version may be usable for text searches, but if you intend to read the paper we advise you to get the full version. You may obtain a full copy, either as postscript or as a paper reprint, from Usenix or from Olivetti Research.

1. Introduction

Medusa [Medusa-ICMCS 94] is an applications environment for distributed multimedia designed to support many simultaneous streams. Tens of parallel streams and over a hundred software modules are used in some of our applications, and several such applications may be active in the system. Input devices in the broadest sense (from cameras and microphones to sensors and location equipment like Active Badges [Badges 94]) can all be used as sources of multimedia streams. The consumers of such streams may be human, for example someone watching a video, or computer-based, such as an analyser recognising a gesture made in front of a camera. It is not uncommon for the same camera stream to be simultaneously sent to a number of video windows on different X servers, a storage repository on the network and a video analyser module on a processor bank.

Modules, the software units that generate, process and consume the data, are optimised for efficiency and written in C++.
Applications, designed for flexibility and configurability, are created in Tcl/Tk: they instantiate the required modules on the appropriate networked hosts, they connect them together and they control them according to the user's directives.

The first few applications were successful and they raised expectations for the subsequent ones. As soon as one aspect of the system became configurable in one program, all the other programs were required to offer the same flexibility. While the modules remained the same, the Tcl portion of the applications started to grow in size and complexity beyond the limits of what could easily be managed within the applications framework we had one year ago [Medusa-Tcl 94]. This paper presents some problems we had to face in three key areas (asynchronous programming, reusable components, multiple interfaces) as a consequence of this growing complexity, together with our solutions.

The inherent parallelism of the system called from the start for an asynchronous interface to the modules. It was not trivial, however, to schedule the various Medusa tasks and the Tcl interpreter in a way that would ensure consistency and efficiency. Maintaining a simple API from Tcl was also a high priority, and a challenging task. We finally implemented a multi-threaded interface to the lower levels of Medusa, but we presented it as single-threaded from Tcl. We also developed a new programming construct, the asynchronous context, to support efficient error checking in an asynchronous environment from within the single-threaded Tcl layer.

After some experience with the system it soon became clear that an extra level of abstraction was highly desirable between the low-level modules and the complete application. Some common patterns of modules emerged repeatedly and the obvious desire was to encapsulate such a medium-level structure into something as self-contained as a module, of which one could create new instances as required. We adopted [incr Tcl] for this [incr Tcl 93].

The first Medusa applications had traditional mouse-based interfaces implemented in Tk. As our data analysers became more refined, we started investigating alternative input devices like pen, gesture and voice. An application could check for input on all these different channels, but it was preferable to find a way to abstract the input interface from the required action. The symmetrical problem was found on the output side, where the same message could be presented through a variety of media including text, video, recorded audio and synthetic voice. We used the Model-View-Controller paradigm for this [Smalltalk 90].

2. Medusa - a brief overview

To follow the rest of the discussion, an understanding of the basics of the Medusa architecture is required. An in-depth presentation of Medusa would be outside the scope of this paper, but a brief overview will be given here. More details may be found in [Medusa-ICMCS 94], [Medusa-Tcl 94] and [Medusa-video 95].

The Network Scalable Personal Computer

The hardware on which Medusa runs is based around the concept of the Network Scalable Personal Computer (NS-PC): the multimedia devices are not peripherals of the main computer but are first-class network citizens with their own ATM connection. This has several advantages.
One of them is scalability: some NS-PCs may have one camera while others have six, without the need for an overdimensioned backplane; in some cases a Network Scalable PC might not even include a workstation: it might be, for example, the logical union of a video tile (a stand-alone LCD display) and an audio brick. Another benefit is the avoidance of undesired interference between streams: in a traditional architecture the digital audio data has to be routed through the workstation's bus, thus degrading the performance of both the workstation and the audio link itself.

NS-PCs, hosts, servers and modules across the network.

From a software point of view, Medusa uses a peer-to-peer architecture in which modules, linked by connections, exchange data and commands. Modules reside in server processes distributed across the hosts of a heterogeneous network. However, connections between modules have the same behaviour and properties regardless of the relative location of the modules they connect: to Medusa, a connection between modules in the same process is no different from one between modules on different machines. At this low level, even the controlling Tcl application is seen as a module; the application process contains a module with many command connections to all the modules it creates.

3. Parallelism, asynchronous execution and threads

Any Medusa process, both the servers containing the data-processing modules and the application process containing the control module, must contain the Medusa scheduler, which manages the processing of messages sent and received over the connections. Because the application process is implemented as a Tcl interpreter containing, among others, the Tk and Medusa extensions, the interaction between the Medusa scheduler and the Tk event loop is particularly important.

Our original implementation of the Tcl interpreter for Medusa was single-threaded. While standard Tcl commands were executing, Medusa was locked out. This didn't affect the multimedia performance, as the interpreter only dealt with the control of modules; the data could still flow through the worker modules, which resided in other processes. Still, this meant that a Medusa-triggered callback might have to wait for a while before being invoked. Because Medusa activity was considered to be high priority, whenever Medusa regained control (because Tcl invoked a Medusa command) it allowed its Tcl callbacks to be invoked. In retrospect this was a bad design because it rendered procedures interruptible by callbacks, whereas they aren't in plain Tcl/Tk.

Tcl does not have any synchronisation primitives to disable interruptions by callbacks scheduled by -command options, after events or bind commands. On the other hand, it doesn't need them, because its mode of operation is such that these callbacks can only be invoked when the interpreter is idle or when a flush is explicitly requested via update. In other words, callbacks can't interrupt anything else by definition. The queuing of the pending callbacks is taken care of by the system and is transparent to the application programmer. We considered whether to introduce disable/enable interrupt primitives, but we soon decided that Tcl's original model was the one to adopt. Procedures must be uninterruptible unless they request an update.
This method is very simple, sufficiently powerful and, most importantly, very natural for the programmer, so much so that in most cases the programmer doesn't even think about what could happen in the middle of what else, and yet Tcl does the right thing in the end. So we followed this philosophy in our new implementation.

The development of a threaded API to the Medusa module layer prompted a major change from the original implementation: we made the new Medusa-Tcl interpreter multithreaded. One thread runs the interpreter while other worker threads running in parallel carry out the individual Medusa commands. The commands for each module must be executed in strict sequence, but the interpreter need not be held up waiting for each one to finish before issuing the next, and commands for different modules can be executed in parallel.

Each Medusa command spawns its own thread and immediately returns control to the interpreter (unless synchronous execution is explicitly requested for that command) so that the Tcl flow of control can move on to the next instruction. To sequence the commands pertaining to a given module, every module has a FIFO queue of worker threads. All threads, except the head of the queue, are blocked, and when a thread finishes it drops off the queue, allowing the one behind it to run.

Although this change involved a complete rewrite of the interpreter, we maintained complete compatibility with the previous Tcl-level API, so that existing scripts could run unmodified. We maintained, for example, the scheme by which Medusa commands can have optional -success and -failure callbacks associated with them for the management of asynchronous calls.

While the syntax described in [Medusa-Tcl 94] was retained for compatibility, some new alternative constructs were introduced to make the Tcl programming interface to the modules even more similar to that of Tk's widgets.

    module .m \
        -host duck -server mgbvideo \
        -factory CameraDevice0 \
        -module video_source \
        -frame_rate 1/2

The module, once created, has a configure method through which its attributes can be inspected and changed:

    .m configure -frame_rate
    => 1/2
    .m configure -frame_rate 1/3
    => 1/3

This syntax implicitly uses the getattributes and setattributes methods of the module.

It is also possible to bind a Tcl variable to an attribute so that changes to one will be reflected in the other and vice versa. This implicitly uses the watchattributes method of the module to propagate changes from Medusa to Tcl, and the trace mechanism on the variable to propagate changes in the reverse direction. This is most useful for controlling an attribute with a Tk widget that can be linked to a variable. The variable becomes the contact point between the module and the widget.

    module bind attrName varName

    .volumeSlider bind gain scalePos
    scale .s -variable scalePos

As we shall see in greater detail later on, the variable can be viewed as the Model in a Model-View-Controller setup; the scale is then both a View and a Controller for the variable. The same goes for the attribute.
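The execution model just described can be summarised with a short fragment. This is an illustrative sketch rather than code from one of our applications: the .cam creation reuses the options shown above, while the second host, server and factory names are made up for the example.

    # Both creations are spawned asynchronously and proceed in parallel
    # on their respective hosts; each command returns control to the
    # interpreter immediately.
    module .cam -host duck -server mgbvideo \
        -factory CameraDevice0 -module video_source \
        -frame_rate 1/2 \
        -success {puts "camera module ready"} \
        -failure {puts "camera module creation failed"}

    module .mic -host swan -server mgbaudio \
        -factory AudioDevice0 -module audio_source

    # Commands for the same module are queued in FIFO order, so this
    # configure cannot overtake the creation of .cam, even though both
    # commands returned immediately.
    .cam configure -frame_rate 1/4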
3.1 An example of asynchronous programming issues: building a pipeline

Because Medusa is a fully distributed system in which the multimedia peripherals are first-class networked computers, most connections from an ultimate source to an ultimate sink (such as from a camera to a video tile, possibly going through buffers, gates, format converters and so on) involve modules that are on different hosts from each other and from the host that the Tcl program is running on. The connect method is used to tell a module to connect to another module. Setting up the complete pipeline involves creating the n distributed modules and connecting up the n-1 pairs by issuing the connect method on one of the modules of each pair.

The pipeline of modules we want to create

Without the programming constructs to specify asynchronous execution, one would create all the modules sequentially (each time waiting for completion of the previous creation) and then connect all the pairs in turn. Neglecting, as a first approximation, the time spent issuing the commands, the situation can be represented as follows, with time on the horizontal axis:

This is clearly unacceptable and was never considered as a viable solution.

In our first implementation the command to create new modules could be invoked asynchronously, but it did not support -success and -failure callbacks. The MDsynch ?module? command (similar to update) blocked the interpreter until no more commands were pending on a given module, or on all the modules known to the interpreter. In this environment one would create all the modules concurrently and then, for each pair in turn, MDsynch the modules of the pair and then connect them before proceeding to the next pair. The resulting situation is:

This is a substantial improvement but still isn't fully optimised. Because at connect time the A-B pair is processed before the B-C pair, the commands "MDsynch A; MDsynch B; Connect A-B" precede "MDsynch C; Connect B-C", and the issuing of the Connect B-C command has to wait unnecessarily for the creation of A (specifically because of the MDsynch A in the treatment of the A-B pair), whereas it could have started earlier otherwise. This is because there is only one Tcl thread of execution and so MDsynch, like update, blocks the whole interpreter.

Note that the above example, for illustration purposes, shows a worst-case scenario in which the creation of the first module takes the longest time; on the other hand the method we outlined cannot optimise the order of creations in advance, because the creation times for the various modules are unknown before creating the modules.

It is interesting to note that even the availability of -success and -failure callbacks is not sufficient to produce the optimum version without resorting to global flags. The problem is that we can't schedule the Connect A-B in the -success callback for either A or B, because the other module might not be ready to accept a connection. We would have instead to schedule something like the following proc:

    proc conditionallyConnect {A B} {
        # Invoked on successful creation
        # of A; tries to connect it to B.
        global created
        if {[info exists created($B)]} {
            $A connect $B
        } else {
            set created($A) 1
        }
    }

Every module except the first and last must have two of these calls in its -success callback, one to attempt to connect it to the previous module and one to attempt to connect it to the next, because it is not known in advance which module of any given pair will be ready first.
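For a three-module pipeline A, B, C, the registration might therefore look like the following sketch (module creation options are omitted; the callback scripts are our illustration of the scheme just described, not code from an actual application):

    # Each module's -success callback tries to connect it to both of
    # its neighbours; whichever end of a pair is created last performs
    # the connection.
    module A -success {conditionallyConnect A B}
    module B -success {conditionallyConnect B A; conditionallyConnect B C}
    module C -success {conditionallyConnect C B}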
The optimal solution, obtainable through the above strategy, has the following diagram:

But this is where our new threaded implementation proves its worth: its sequencing of asynchronous commands is such that this behaviour is obtained by default, with no effort required by the programmer. It is sufficient to issue all the creation commands and then all the connection commands, without having to register any callback or global flag. The create commands will start in parallel for the different modules, each being the head of its queue. The command to connect two modules is effectively present in two queues, but it doesn't hold the second queue up for any length of time; it is just there for synchronisation purposes, to ensure the module has been created. The actual connection operation starts as soon as both modules exist.

A key feature of this scheme is that the command that creates a module, even if it returns before having completed the creation, ensures that the thread queue for the module is ready before returning. This allows the system to accept and enqueue commands (such as connect) for that module even before the module actually exists.

3.2 Programming for robustness in an asynchronous environment

One of the main reasons why prototypes are so much quicker to write than full-fledged applications is that in writing a prototype one can afford to omit a great deal of detail about error checking. While in the prototype an action can simply be expressed by issuing the corresponding command, in a properly written application the command must be wrapped inside a layer of code that traps any potential errors, checks the return code and possibly recovers by attempting an alternative operation. It is well recognised [Man-Month 75] that this error catching in itself adds a significant layer of complexity to the application.

But the problem only becomes worse in an environment using asynchronous programming. To be able to check the return code, which is what Tcl's catch does, the command must have completed its execution, successfully or not. However the asynchronous strategy, which is fundamental to achieve the desired levels of efficiency especially in a distributed system, is to issue the command in the background and move to the next one at once, well before completion; so catch would not work here, because the command to be caught returns control to the interpreter immediately, before knowing whether it will succeed or fail. In a prototype which doesn't check return codes the asynchronous approach is easy to follow: as long as all the commands succeed, everything will work and go very fast; as soon as one command fails, though, the whole prototype will fall over. For an application that wants to be robust, checking the return codes is reasonable, if only tedious, when one can afford to wait until the commands complete before proceeding, i.e. when one has the luxury of using the catch approach; but having to support both error checking and asynchronous programming multiplies, instead of simply adding, the complexities of the two approaches.

It should be noted in passing that the fact of dealing with a distributed environment makes error checking a necessity for every application which is not a prototype. You wouldn't normally think of catching every button creation in your Tk program, because if the program's window comes up at all it is highly probable that it will be able to create all the buttons it needs.
But when the items you create are modules running in external processes and often on external hosts, it would not be wise to assume that, because the program has started on the local computer, all these remote operations will succeed too.

The way Medusa controls asynchronous programming is, as we've seen, by making the -success and -failure options available to every Medusa command. This allows the programmer to register two callbacks, one of which will be invoked when the command completes. While this construct is expressive and complete, at the Tcl level it forces a programming style where the code to perform a complex action, instead of being written as a self-contained procedure, has to be scattered among the nodes of a binary tree of callbacks. We can say from experience that this leads to programs of low readability that are quite hard to maintain and update.

To overcome this problem we developed a new programming construct, the asynchronous context, which allows several related actions to be launched as a group, registering -success and -failure callbacks for the whole group without having to follow the outcome of the individual actions.

Let's use the pipeline example from section 3.1 to show how this construct works. Assume that all the modules have been created and that we want to connect up all the pairs, as fast as possible and checking for any errors.

We create an asynchronous context object, say c, and register our -success and -failure callbacks with it; these callbacks will be relative to the success or failure of the entire operation, not of the individual commands. From within c, we launch all the required Medusa commands: this causes them to be spawned asynchronously, with hidden -success and -failure callbacks added by the context which will notify c of the completion of the launched actions. When we have launched all the required operations, we issue c's commit method, which means that no more commands are to be launched from within this context. From now on, as soon as all the commands launched so far have completed, c will invoke one of its two completion callbacks. The corresponding procedure can be written as follows:

    proc pipe {args} {
        # Argument parsing omitted for brevity. The result of it is
        # that the local variables success, failure and modules are
        # created.
        set c [async_context #auto \
                   -success $success \
                   -failure $failure \
                   -persistent 0]
        set l [llength $modules]
        for {set i [expr $l-1]} {$i>0} {incr i -1} {
            set sink   [lindex $modules $i]
            set source [lindex $modules [expr $i-1]]
            $c launch $i-th_pair \
                $sink connect up input $source down
        }
        $c commit
    }

You may have noticed the -persistent 0 option in the command that creates the context. This instructs the context object to destroy itself after it has reached completion and invoked one of its two callbacks, relieving the programmer from having to keep track of the context herself. In other cases one may want the object to stay around even after completion, so as to be able to query its state or to invoke some of its methods: -persistent 1 will then be used.

More features available for the context object are shown below.

The -sync option
If set to 1, makes the commit method block until completion. Also available as a sync method for persistent objects.

The -cleanuponfailure option
A failure of the context means that at least one of the launched operations failed; but other launched operations may have succeeded.
If this option is set to 1, a matching cleanup command will be called, in case of failure, for each launched operation which succeeded. This requires the programmer to register a cleanup command for every command that is launched. In the example above, one would register a matching disconnect for every connect. Incidentally, this is the reason why the launched operations carry an identifier ($i-th_pair above).

The interrupt method
The typical use of this is to launch a set of operations from the main program while allowing an external event (typically a button press from the user) to abort the sequence. The interrupt method can be invoked at any time during the lifetime of the context, even before a commit. It will have effect if it is received before completion, in which case it will force a failure. If the interrupt comes in before commit, all requests to launch commands are silently ignored (this saves time: if you've already decided to abort the lot, any operation requested after this decision is not even started). If the interrupt comes after the commit but before completion, it won't affect pending operations although it will still force a failure on completion. If the context has already completed then the interrupt has no effect.

The composition properties
The asynchronous context has good composition properties. The construction of a composite object containing several modules (see 4.3) can be wrapped into a context, thus providing -success and -failure callbacks for the composite object. From then on, the construction of such a composite object becomes itself an operation that can be launched from within a higher level context.

The asynchronous context has been a very successful addition to our programming toolkit. It allows us to perform the important task of error checking without giving up the efficiency advantages of asynchronous programming, all with a clean programming interface to single-threaded Tcl.

4. Composing modules into higher order objects

Just like the graphical side of a Tk application is built out of widgets such as buttons and listboxes, the multimedia side of a Medusa application is built out of modules such as cameras and speakers.

In both cases, after some practice in writing applications, it is frequently observed that common patterns emerge: groups of widgets or modules appear in the same arrangement from one program to the next. And, just like Tk programmers have often felt the need for mega-widgets, so Medusa programmers have often wanted mega-modules.

A typical example is the basic video pipeline that puts a video stream from a networked camera onto an X display. A camera module pumps data through a network buffer into a converter and an X sink. The converter translates the incoming video from the format produced by the source to the one accepted by the display (displays with different colour depths will represent pixels in different formats). The X sink puts the frames of video into an X window which may have been created by Tk.

The modules of a video pipeline

The basic parameters for such a pipeline are the frame rate, the size of the picture produced by the camera and the magnification factor (video frames can be resized in software by the converter module).
Unfortunately, to change these parame- ters you must know which modules control them: the frame rate is set by the camera; the magnification factor, by the con- verter; the picture size, by the camera again; and all this without forgetting that, if you change the size or the magni- fication on the modules, you must also change the size of the Tk window containing the X sink. What is needed is a way to create the pipeline as an object which knows what internal op- erations to perform when requested to change those parameters. This would allow Medusa programmers to develop a useful col- lection of parts for common tasks; new applications could then be written by concentrating on their new functionality instead of the perpetual recoding of the common portions. Ideally, for greater completeness, the object would also include the appro- priate Tk widgets through which the parameters could be tuned by the user. Including the controlling widgets in the multime- dia object itself also has the benefit that, whenever that ob- ject appears in an application, the user is always presented with a consistent interface that must be learnt only once. One of the first requirements was for this level of grouping to be implemented in Tcl. Writing the objects in C++ was con- sidered but had little to offer other than some potential ef- ficiency gains; on the other hand it complicated the develop- ment process considerably, especially when embedding Tk wid- gets inside objects. We first tried using pure Tcl and one of the successful results was what we call a vwin object  the combination of a converter and X sink modules, a Tk window and some suitable controlling widgets. A simple program incorporating a vwin (everything below the menu bar is part of the vwin) The vwin was immediately popular and was adopted by practi- cally all the video applications written after it  a practi- cal demonstration of how much the need was felt for higher level reusable components. The drawback of this approach was that the vwin was not really an object but rather a group of modules, procedures and global variables. We therefore adopted the object oriented extension [incr Tcl], which provides lan- guage constructs for developing such groups as first-class ob- jects and endowing them with a syntax similar to that of nor- mal widgets and modules. Of the three pillars of OOP  encapsulation, inheritance and polymorphism  what we needed most was encapsulation. With [incr Tcl], the vwin can at last become an object with its own methods and private data. We did not develop a class for mod- ules in order to inherit from it, because our classes do not usually have an is-a relationship with modules: a class rep- resenting a pair of modules has both of them in it, but is neither; inheritance (even multiple) is generally inappropri- ate in our case. There is however scope for inheritance be- tween objects. A basic video sink widget might simply consist of two Medusa modules and a frame widget, while a more complex one would inherit from this and have extra widgets to control the video parameters. What we have described so far is similar to the experience of many other programmers in the field of Tk widgets: the atoms are too low level to build applications with, so we need a way of composing them into higher level molecules (with apolo- gies to chemists!) with their own methods. We do this by resorting to OOP, using mostly encapsulation and a bit of inheritance. 
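As an illustration of this style of grouping, the following sketch shows roughly what the basic video sink of the previous paragraph could look like. This is not our actual code: the class name, the -magnification attribute and the port arguments are illustrative, and module creation options are omitted.

    itcl_class BasicVideoSink {
        constructor {win} {
            # A Tk frame plus the two Medusa modules (converter and
            # X sink) that draw into it; module options are omitted.
            set frm $win
            frame $frm
            set conv $win-conv
            set sink $win-sink
            module $conv
            module $sink
            $sink connect up input $conv down
        }
        method magnify {factor} {
            # Callers need not know that the converter, and not the
            # X sink, owns the magnification parameter.
            $conv configure -magnification $factor
        }
        protected frm
        protected conv
        protected sink
    }

The richer video sink mentioned above would then inherit from this class and add the extra control widgets, wiring them to methods such as magnify.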
In the field of distributed multimedia, however, there are new major complications, related to the is- sues of connections between objects and cardinality of connec- tions. 4.1 Connections between objects Modules, unlike widgets, must not only be grouped together  they must also be connected for data to flow. Grouping atoms into molecules is a hierarchical composition. But our atoms now also have connections between them. Can we allow con- nections to span molecule boundaries or not? Cutting vertically in our language means drawing molecule boundaries that intersect the pipelines and consequently force the molecule to expose some I/O ports. Cutting horizontally is our shorthand expression for drawing molecule boundaries which are parallel to the data pipelines. If we go back to our video pipeline example it is easy to see that we would like to cut horizontally so as to have the camera, the converter and the Tk frame in the same molecule; this encapsulation would allow us to ask the molecule to change its magnification without us having to know which individual module to address. But on the other hand it is not unreasonable to want to cut vertically to obtain a source / sink arrangement, especially in the case of multiple parallel streams. It can be quite useful to be able to connect the multiple sources from NS-PC A to sinks on either B or C. But if we want to be able to cut both ways simultaneously, a hierarchical composition is no longer satisfactory. So we ask ourselves: should we look for a grouping strategy that is more flexible than the hierarchy? The problem of wanting to group atoms that have connections is found in other fields too, like digital circuit design, and we may draw some inspiration from there. The established practice in digital design is to allow connections to span molecule boundaries. An atom (a logic gate) has connections to the out- side, and so has a molecule (a component like a counter). From the outside, atom and molecule are structurally alike: both have input and output ports accepting data and control infor- mation. The digital design point of view, compared to the mul- timedia point of view, does not give the same emphasis to sources and sinks of data, but nonetheless allows them to be represented in the model. In simple cases, like building a counter out of gates, it is the pattern of interconnections between the I/O ports of the atoms that defines the behaviour of the molecule. In more complex cases, where programmable logic is involved, the behaviour of the molecule may be de- fined by a set of instructions stored in a ROM. This model, although purely hierarchical, has proved its worth in the hardware field. Does it work well with our multimedia modules? The answer is yes, as long as we are prepared to give up some flexibility. Because a hierarchy is involved, we can- not cut both horizontally and vertically: we must decide on one or the other beforehand, when we design the collection of reusable parts that we are going to build. An important lesson to learn from the digital design field is that it is a good idea to make molecules that behave like at- oms when seen from the outside. This means above all that they must be capable of being connected to atoms as well as to other molecules. 
Other features that would be desirable for uniformity but which are not essential to make the system work are the ability to export the internal modules attributes as if they were the molecules own, the ability to respond to the standard module methods (including for example watchattributes on the exported attributes) and the ability of being inspected with standard tools like MDpanel (a Tcl library procedure that pops up a window exposing the attributes of the module). The issue of making molecules look and behave like atoms is shared in the field of mega-widgets [incr Tk 94]. We are still in the experimental stages at the time of writing, but it may well be the case that the solutions adopted by the Tk community to compose widgets may be fruitfully applied to composing mod- ules. An alternative solution is to provide the module look and feel not by emulation but through Medusa itself. We could build a special Medusa module that would do nothing except being a wrapper. This idea consists of recreating the module equiva- lent of what the frame is in the world of Tk widgets: a compo- nent that does nothing of its own but whose function is to group other components. Medusa already has facilities for mod- ule proxying and for handing off data connections to lower level implementation modules (see [Medusa-ICMCS 94]), so this strategy would save us from having to emulate a substantial amount of functionality. The major piece of work would not go into creating the wrapper module but rather into the equivalent of the pack command, say mdpack: the command that puts an existing module into the wrapper, connecting it with others in a user-specified pattern and exporting some of its attributes. This command should be exported to the Tcl level to allow molecules to be defined in Tcl. Building a molecule amounts to creating a wrapper, creating the appropriate modules and plugging the modules together in- side the wrapper with mdpack. [incr Tcl] steps in at this point, to allow the programmer to define molecule types in- stead of just instances. In the tradition of [incr Tcl], some renaming and aliasing acrobatics will become necessary to make the object respond to its module name as well as its object name, but this is an issue that has been dealt with before when building mega-widgets. In building hierarchies of molecules we believe that it will be advantageous to have a lower level where the molecules only contain modules and no Tk widgets. This allows the functional- ity to be separated from the user interface  a topic that will be treated in greater detail in section 5. 4.2 Cardinality of connections As explained in section 2, the configuration of an NS-PC is completely flexible and the number of available devices varies from unit to unit. We have talked of standard high level molecules and how to build them in the style of the digital circuit components, but there we implicitly assumed that a molecule had a definite structure and, for example, a given number of ports. If we want to design a molecule for a multi-headed video source, it must instead have as many output ports as there are available cameras on that given NS-PC; so, while the general structure will be the same, different instances will have different car- dinalities. The constructor takes a list of the available cam- eras as a creation parameter. On the sink side we have yet another situation. 
A multi-headed video sink is made of screen windows, but the NS-PC imposes no definite limit on the number of windows that can be created; it either does not have a screen, in which case it cant make any, or it has one, in which case it can make as many video windows as required. But if a multi-source is to be plugged into a multi-sink, the multi-sink must be ready to accept whatever the multi-source is ready to produce. Always over- dimensioning (and consequently under-using) is not a good so- lution. Over-dimensioning first and then killing off the dan- gling unused sources or sinks after the connection has been established is only slightly better. This is our second major complication in grouping, which comes from the fact of dealing with a heterogeneous distributed sys- tem instead of a uniform configuration where every networked workstation has the same array of multimedia devices. In the digital design case, once a molecule containing five atoms of a kind has been defined, it is always possible to instantiate it; in the networked multimedia case, instead, the available resources are not known before runtime and so we need a more versatile molecule that, instead of five, has as many atoms of that kind as it is possible to have on this host. One solution is to give up completely on the multi-source and multi-sink arrangement and tend to cut horizontally, grouping one pipeline at a time. Later, all the parallel pipelines can be combined into a bigger molecule. This means giving up the advantages of having a multi-source or multi-sink as an ob- ject. We have also been designing an alternative solution, based on delayed creation of the endpoints, which has the advantages of both the vertical and the horizontal styles of grouping: it sees the multiple endpoints as separate entities, but it can also address the multiple pipeline and the individual pipe- lines inside it. The key idea is that the multi-sink and multi-source must not be created in advance, but only when a specific multiconnection request is received. This way it be- comes possible to work out the common subset beforehand and create two endpoints that match. This arrangement amounts to grouping vertically before the endpoints have been created and grouping horizontally once they exist. The main drawback is an increased setup time compared to the case where the endpoints are created in advance. The structure to implement this idea is rather elaborate and it is still part of our research to evaluate whether the benefits of this approach are worth the extra complication. 4.3 The current implementation For most of the compositional approaches presented above we have built prototypes to evaluate their relative advantages and trade-offs. But we havent yet developed any of them into a complete solution upon which to build an extensive library of components. In the meantime, though, with or without a library, we had to produce some reliable applications to put our deployed hard- ware to good use, and for this task we built a few components as we went along, without a grand design behind them but with the practical goals of robustness and reusability. These components are currently used in the supported multime- dia applications that are in daily use at the lab: video mail, video phone, video peek and radio/tv/CD player. They are listed below. The vwin A video sink object with user controls for the stream pa- rameters: picture size, magnification and frame rate. 
Internal structure of the ringer component The ringer A complete audio pipeline from a file source to a speaker that can play a given sound sample and optionally repeat it at regular intervals until stopped. The multicamera A multi-headed video source combining many cameras in parallel and automatically adapting to the number of cameras available on the NS-PC where it is instantiated. The multiwindow A multi-headed video sink consisting of a dynamically variable number of parallel vwins; as the multiwindow is connected to a multicamera, it automatically creates or hides some of its vwins so as to match the number of chan- nels of the multicamera. Internal structure of the video transceiver, showing a hierarchy of building blocks The video transceiver The combination of a multicamera and a multiwindow. Can be connected to another transceiver for a bi-directional link, or it can be looped back on itself for a local view. These components are built as [incr Tcl] objects containing Medusa modules and, where appropriate, Tk widgets. As can be seen we have performed the grouping both ways, depending on the circumstances. The ringer is built by cutting horizon- tally, i.e. by creating an object which contains an entire pipeline and has no data connections to the external world. The other objects are instead built by cutting vertically, i.e. by running the object boundary around an endpoint of the pipeline and thus intersecting the data flow. These components have been built by cutting either horizon- tally or vertically but not both, without resorting to the more complex delayed creation approach outlined earlier. The case of the vwin shows some of the problems of this limita- tion. This component is obviously an endpoint, i.e. an object that has been obtained by cutting vertically. However it has an element of horizontal cutting (which means keeping the whole pipeline together) in that it contains user controls, like the picture size selector, that refer to parameters be- longing to modules outside of itself, in this case the camera module. It is thus necessary to give the vwin a reference to the external module (the module name of the camera it is con- nected to) so that it can ask the camera to produce the appro- priate size whenever the user acts on the vwins controls. The vwins control panel has to act on modules outside the vwins boundaries This pointer going across the object boundary is the dirty part of this scheme. As new features are added to the vwin (this happened recently), the need arises for the vwin to talk to more modules in its pipeline that are outside of itself, and the proliferation of references to modules outside the vwin object is a bad thing. The fact that these components can themselves be used as building blocks for other components, as exemplified by the multiwindow and the video transceiver, is a fundamental property. We would reject any composition method that only allowed one layer of grouping. The ability to make molecules out of other molecules, and not only out of atoms, allows us to build the library of components in a bottom-up style. It must however be noted that the molecules we have built do not behave in all respects like atoms. The programming inter- face is similar for some aspects, like the availability of -success and -failure callbacks on most methods including creation, but the attributes, ports and capabilities mecha- nisms that all modules have are not supported nor emulated. 
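From the application's point of view, using one of these components looks roughly like the following sketch. The option and method names are invented for illustration (they are not our real interfaces), but the shape is representative: creation accepts -success and -failure callbacks like a module's, while the rest of the interface is specific to the component.

    # Create a camera module (options as in section 3) and a vwin to
    # display it; the vwin creation takes completion callbacks.
    module .cam -host duck -server mgbvideo \
        -factory CameraDevice0 -module video_source
    vwin .v -success {puts "video window ready"} \
            -failure {puts "no video window available"}

    # A component-specific method: the vwin is told which camera it is
    # showing, so that its size control can reach back to a module
    # that lies outside the vwin's own boundary.
    .v show_camera .cam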
The molecules also have a richer set of methods than the at- oms, with each component exposing new methods appropriate to the facilities it provides. This makes their programming in- terface more expressive, but it also means that they couldnt be easily treated by automatic tools like Sticks and Boxes (see [Medusa-Tcl 94]). It is debatable whether this is an as- set or a liability; for the atoms, the ability to deal with them only through a limited set of standard commands is mostly an advantage, but for the molecules, given that they can en- capsulate much higher level behaviour, it may not be appropri- ate to restrict the programming interface in a way that may not map naturally to the facilities they offer. An important issue that comes out of this experience is the relevance of a strategy for connecting molecules that have connection ports with a composite structure. For the compo- nents presented above the problem was bypassed by designing the multiple endpoints in pairs and then giving one of them a connect method that knew how to talk to the other one. This works well as long as each type of component can only be con- nected to another given type; but as soon as a component can be connected to several other types of components, a clearly defined multiconnection interface becomes a necessity. Among the issues to be addressed (for which we do not have a com- plete solution yet) are the following. Gender Inputs and outputs must of course be distinguishable, and a port may only be plugged into a port of the opposite gender. But the case of a multiple port is less trivial than it sounds, as the subports composing the multiple port could be of different genders (as in the case of the transceiver). It all depends on whether one allows the connection between two transceivers to be seen as one fat bi-directional pipe, or whether one imposes the constraint that pipes can carry mul- tiple streams but only in one direction. Type While there are good reasons for keeping the connections be- tween modules as untyped, it is important to type the higher-level fat pipes that result from aggregation of many simple connections. Presumably this typing will have to be hierarchical, to respect the fact that the molecule out of which the fat pipe comes may be composed of several subpipes each coming from a submolecule and so on. Cardinality Suitable policies must be defined for the case when the two components to be connected match in gender and type but not in cardinality, e.g. one has five video inputs and another has three video outputs. In our multicamera / multiwindow implementation we arranged for the sink side to reconfigure itself to match the cardinality of the source side, but this will not be possible in every case. In such cases it must be decided whether to connect the common subset or to reject the connection request. Self-description capacity The multiport must encapsulate in itself a description of its gender, type etc. so that the connection operation can be made independent of the objects to be connected. 5. Designing multi-device interfaces In what we call first-generation networked multimedia systems [Pandora 90], the data streams are switched from one place to another in a hands off fashion. In second-generation systems like Medusa, however, the application can analyse and manipu- late the data as it flows. This gives us the opportunity to use the multimedia streams as additional input and output user interfaces to control the application. 
Sound detection, speech recognition, motion detection, gesture recognition, speech synthesis are a few of the new input and output channels that an architecture like Medusa can support. Many transducer modules have been written which turn raw data into higher level events or vice versa. From the point of view of the applications programmer, how- ever, any one of these transducers is still a fairly low level component; it is inconvenient, for example, to have to accept input on several streams by redoing a similar programming job for each of the available input transducers. We adopted the Model-View-Controller paradigm (MVC, see [Smalltalk 90]) as a higher level abstraction that shielded us from this problem. In this paradigm the unit of processing is the Model, which can perform several actions; the input de- vices are called Controllers: they may have very different ways of interfacing to the user, but they share a common in- terface to the Model; a similar arrangement exists for the output devices, called Views: they can show the state of the model in a variety of ways. As a proof of concept, a demo application was written: Grip- ple, an electronic version of a commercial puzzle of the same name. The original Gripple puzzle, marketed by m-squared inc. in 1989 Gripple is rather similar to the classical 15 sliding tiles puzzle; it consists of a set of sixteen numbered discs mounted on five interlocked rotating platforms. Each circular platform carries four discs. There is one platform for each of the cor- ner quartets of discs and one platform in the centre of the board. By rotating the platforms, the discs move from one platform to another. The aim of the game is to bring the discs in the initial configuration after having scrambled the board. The Model contains a representation of the board describing the current disc positions. This is implemented simply as a list of sixteen elements. The model accepts three commands: rotate platform direction where platform is one of the five platforms and direction is either clockwise or anticlockwise. restart ?difficulty? where difficulty is a non-negative integer. If difficulty is omitted, the discs are positioned at random; otherwise, starting from the ordered configuration, that many random scrambling moves are performed (so lower difficulty settings correspond to easier configurations). reload configuration where configuration is a list of sixteen discs. This is used for studying moves. As far as the model is concerned, all these commands ulti- mately have the effect of performing a specific permutation on the sixteen disks. The simplest controller only has a textual interface through which the above commands can be sent to the model. The sim- plest view only shows the list of discs as text. A screen shot showing the video view, the graphical view (note that the underlying model is the same, so the positions of the discs are coherent in the two views) and the mouse controller. A more elaborate view draws coloured discs on a canvas and shows the rotation as an animation. Writing this component taught us that the model should send out actions to the views instead of its new state. For the view, performing the animation is much easier if the model says rotate this plat- form that way than if it says the new configuration of my discs is the following. The mouse controller shows the five platforms. Clicking on a platform with the left button rotates it to the left (anticlockwise), and clicking with the right button rotates to the right. 
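The heart of such a model can stay very small. The following sketch (not the actual Gripple code: the permute helper, which applies a platform's permutation to the board, is assumed and not shown, and the restart logic is omitted) gives the flavour in [incr Tcl]:

    itcl_class GrippleModel {
        constructor {} {
            # The board is a list of sixteen disc numbers; views holds
            # the names of the attached view objects.
            set board {1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16}
            set views {}
        }
        method attach_view {v} { lappend views $v }
        method rotate {platform direction} {
            set board [permute $board $platform $direction]
            # Views receive the action, not the new state: animating
            # "rotate this platform that way" is much easier than
            # working out the difference between two configurations.
            foreach v $views { $v rotate $platform $direction }
        }
        method reload {configuration} {
            set board $configuration
            foreach v $views { $v reload $configuration }
        }
        protected board
        protected views
    }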
To add to the challenge of the game, instead of using numbered discs another view uses video coming from a live television feed. The video is split into sixteen squares covering the same positions as the corresponding discs. Only when the puz- zle is solved can one see the unscrambled picture. Because the target position of the squares is not immediately apparent, the puzzle is much harder if only this view is used. But, be- cause many views can be attached to a model at the same time, one can always get help by temporarily looking at a numbered discs view. The Hand Tracker controller uses an image processing algorithm [Medusa-snakes 95] which spots and tracks a hand in a scene from a camera. A drawing of the five platforms is superimposed on the scene; the player moves the hand to one of the plat- forms and rotates it in the required direction to trigger the corresponding action on the model. The tracking algorithm is based on active shape models. The Voice controller [Medusa-speech 95] takes verbal instruc- tions such as rotate top left platform clockwise. The speech recogniser is based on Hidden Markov Models. The use of a con- strained grammar gives a high recognition rate  a fundamen- tal requirement for usability. From our Gripple experience we can abstract some general guidelines for designing applications that support multiple user interfaces. The concerns are similar to those encountered when dealing with internationalisation of software applications. While in a single-language application it may be appropriate to embed messages to the user in strings within print statements, when the application has to be translated into many other languages it becomes necessary to add a level of indirection. Similarly, if direct programming of the I/O device may be sufficient for a basic keyboard-and-mouse application, a layer of indirection must be added if the same application is to support multiple user interfaces. The application must be represented by a model with a well- defined set of methods. Controllers may be plugged into the model: they are the means through which the user can invoke the models methods. It is not required that every controller offer all the methods of the model: for example most of the controllers described above, except the text controller, do not offer the reload method and do not allow to specify a dif- ficulty parameter to the restart method. Views show the state of the application. Having views and controllers that are not based on Tk allows us to run an application on a remote work- station and control it via local Medusa devices, which is use- ful for example in the case of a wall-mounted audio box in the corridor where there would be no workstation for traditional mouse-based interaction. Using [incr Tcl] we have written Model, View and Controller classes that handle the communication between these elements and various management issues. For any given application, the programmer now designs specialised model, view and controller classes which inherit from the generic ones. If, some months later, a new I/O device is brought along, a new view or con- troller can be derived to make it interact with the applica- tion. Note that device should be taken in its widest sense: even a primitive image analysis module capable of discriminat- ing between no motion, some motion and a lot of motion can be considered as such a device; it is just a matter of mapping this three-way output onto some useful subset of the methods offered by the model. 
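The generic classes themselves need not be large. A minimal sketch of what the base Model and View might provide is given below; our real classes handle more management detail, and the method names here are illustrative. A specialised model (such as the Gripple model sketched earlier) would then inherit from Model instead of keeping its own list of views, and a controller simply holds the name of the model whose methods it invokes, so it needs little more than the View base shown here.

    itcl_class Model {
        constructor {} { set views {} }
        method attach_view {v} { lappend views $v }
        method detach_view {v} {
            set i [lsearch $views $v]
            if {$i >= 0} { set views [lreplace $views $i $i] }
        }
        method notify_views {action} {
            # Broadcast an action (a method name plus its arguments)
            # to every attached view.
            foreach v $views { eval $v $action }
        }
        protected views
    }

    itcl_class View {
        constructor {m} {
            # Attach this view to its model on creation.
            set model $m
            $model attach_view $this
        }
        protected model
    }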
An important development is the concept of using the MVC para- digm on a finer granularity than the whole application. It is quite useful, for example, to represent a communication link between two places with its own model and to be able to view this link in many ways: with audio and video for a workstation to workstation link, with audio only when one of the endpoints does not have a screen and so on. The application will always have to establish and shut down the communication link regard- less of whether it includes video or not, so this abstraction helps us in isolating the internals of the application from the implementation details of how to address the various mul- timedia devices. The core of a conferencing application is in- dependent of the nature of the links between the participants: ringing, accepting or rejecting calls, adding new participants and so on are actions that will always be present whatever the medium. According to the resources available at the different endpoints and to the users preferences, each link will be viewed in the most appropriate way. This novel way of applying the MVC paradigm appears to be quite promising and may well be even more useful to us than the ability to control an application with different user in- terfaces. 6. Conclusions Most if not all of the multimedia environments based on Tcl/Tk, including Medusa, accept the wisdom that Tcl/Tk must only have the roles of glue, control and user interface, while the multimedia processing must be done elsewhere. This is a fundamental principle. But we believe that delegating the media processing layer to a more efficient implementation language, while necessary, is not sufficient: distributed multimedia applications based on plain Tcl without any added structure will easily become too complex to manage and maintain. We have shown three main areas in which we have experienced growing complexity and we have shown our adopted or planned solutions to these problems. (1) It is not trivial to efficiently control parallel execu- tion through a single-threaded process, but it helps to keep the Tcl side single-threaded for simplicity. The introduc- tion of an appropriate programming construct solves some im- portant control problems. (2) Modules are a good first layer, but they are too low level for building large applications. There is a need for a library of reusable building blocks that are semantically high level and syntactically simple. The problems of build- ing these are harder than those of composing Tk widgets. This is due to the data connections and the cardinality is- sues associated with a distributed heterogeneous system. [incr Tcl], while not in itself a complete solution, is a valuable tool which helps a lot in grouping atoms to create reusable components. (3) The many media under which an application can present the same semantic contents to the user should not be handled on an ad-hoc basis without a unifying abstraction: the Model- View-Controller provides this. By supporting our applications environment with the structure introduced by these abstractions we can tame the inherent com- plexity of distributed multimedia programs while reaping the benefits of reconfigurability and ease of prototyping that Tcl/Tk is rightly acclaimed for. 7. Acknowledgements Many people at ORL have been working on Medusa bringing their essential contribution at many levels, from the design of the hardware and of the ATMos operating system to the networking, modules and applications layers. 
Everybody in the Medusa soft- ware team (Martin Brown, Paul Fidler, Tim Glauert, Alan Jones, Ferdi Samaria, Harold Syfrig and the authors) has been in- volved at some stage with the Tcl layer of Medusa. Of specific relevance to this paper Tim Glauert, co-designer of the Medusa software architecture, wrote the threaded C++ API and Frazer Bennet, as well as tracking down some subtle threads bugs, patiently rebuilt our gigantic interpreter every time we needed to pull in a new release of our many exten- sions. Many thanks are also due to the authors of the software that our interpreter is built out of: John Ousterhout for Tcl/Tk, Michael McLennan for [incr Tcl], Brian Smith and his team at Berkeley for Tcl-DP. 8. References [Badges 94] ANDY HARTER, ANDY HOPPER, A Distributed Location System for the Active Office, IEEE Network, Vol. 8, No. 1 [incr Tcl 93] MICHAEL J. MCLENNAN, [incr Tcl] - Object-Oriented Program- ming in Tcl, Proceedings of the Tcl/Tk Workshop, University of California at Berkeley, June 10-11, 1993. [incr Tk 94] MICHAEL J. MCLENNAN, [incr Tk] - Building Extensible Widgets with [incr Tcl], Proceedings of Tcl/Tk Workshop, New Or- leans, June 1994. [Man-Month 75] FREDERICK P. BROOKS, The Mythical Man-Month, Addison-Wesley, 1975, 1982 [Medusa-ICMCS 94] STUART WRAY, TIM GLAUERT, ANDY HOPPER, The Medusa Applica- tions Environment, Proceedings of the International Confer- ence on Multimedia Computing and Systems, Boston MA, May 1994. An extended version appeared in IEEE Multimedia, Vol. 1 No. 4, Winter 1994. Also available as ORL Technical Report 94.3. [Medusa-snakes 95] TONY HEAP, FERDINANDO SAMARIA, Real-Time Hand Tracking and Gesture Recognition using Smart Snakes, ORL Technical Report 95.1 [Medusa-speech 95] ROBERT WALKER, An Application Framework For Speech Recogni- tion in Medusa, Third Year Project Report for the Computer Science Tripos, University of Cambridge, 1995 [Medusa-Tcl 94] FRANK STAJANO, Writing Tcl Programs in the Medusa Applica- tions Environment, Proceedings of Tcl/Tk Workshop, New Or- leans, June 1994. Also available as ORL Technical Report 94.7. [Medusa-video 95] ANDY HOPPER, The Medusa Applications Environment, ORL Tech- nical Report 94.12 (video). Also available as an MPEG- encoded movie that can be viewed online from our web server. [Pandora 90] ANDY HOPPER, Pandora  An Experimental System for Multimedia Applications, ACM Operating Systems Review, Vol 24, No.2 April 1990. Also available as ORL Technical Report 90.1. [Smalltalk 90] PHILIP D. GRAY, RAMZAN MOHAMED, Smalltalk-80: A Practical Introduction, Pitman, 1990. The ORL Technical Reports are available for download on the Internet through Olivetti Research Limiteds web server at https://www.cam-orl.co.uk/ or via anonymous ftp at ftp://ftp.cam-orl.co.uk/pub/docs/ORL ***END***