From: "Douglas C. Schmidt"
To: Lilia Carol Scott
Date: Mon, 07 Mar 1994 17:32:23 -0800
Subject: Re: Your online submission to the C++ conference proceedings

Hi,

Here is the text-version of my USENIX paper!

Doug

----------------------------------------

ASX: An Object-Oriented Framework for Developing Distributed Applications

Douglas C. Schmidt
schmidt@ics.uci.edu
Department of Information and Computer Science
University of California, Irvine, CA 92717, (714) 856-4105

Abstract

The ADAPTIVE Service eXecutive (ASX) is a highly modular and extensible object-oriented framework that simplifies the development and configuration of distributed applications on shared memory multi-processor platforms. This paper describes the structure and functionality of the ASX framework's object-oriented architecture. In addition, the paper presents the results of performance experiments conducted using ASX-based implementations of connection-oriented and connectionless protocols from the TCP/IP protocol family. These experiments measure the performance impact of alternative methods for parallelizing communication protocol stacks. Throughout the paper, examples are presented to indicate how the use of object-oriented techniques facilitates application extensibility, component reuse, and performance enhancement.
Section 1 Introduction

Distributed computing is a promising technology for improving collaboration through connectivity and interworking; performance through parallel processing; reliability and availability through replication; scalability, extensibility, and portability through modularity; and cost effectiveness through resource sharing and open systems. Despite these benefits, distributed applications (such as on-line transaction processing systems, global mobile communication systems, distributed object managers, video-on-demand servers, and communication subsystem protocol stacks) are often significantly more complex to develop and configure than non-distributed applications.

A significant portion of this complexity arises from limitations of the conventional tools and techniques used to develop distributed application software. Conventional application development environments (such as UNIX, Windows NT, and OS/2) lack type-safe, portable, re-entrant, and extensible system call interfaces and component libraries. For instance, endpoints of communication in the widely used socket network programming interface are identified via weakly-typed I/O descriptors, which increase the potential for subtle run-time errors. Another major source of complexity arises from the widespread use of development techniques based upon algorithmic decomposition, which limit the extensibility, reusability, and portability of distributed applications.

Object-oriented techniques offer a variety of principles, methods, and tools that help to alleviate much of the complexity associated with developing distributed applications. To illustrate how these techniques are being successfully applied in several research and commercial settings, this paper describes the structure and functionality of the ADAPTIVE Service eXecutive (ASX).
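The weakly-typed descriptor problem is easy to show. With the raw socket API, a listening descriptor and a connected descriptor are both plain ints. Passing the wrong one to a data-transfer call therefore compiles cleanly and fails only at run time. Below is a minimal C++17 sketch of the type-safe alternative, assuming POSIX sockets. The class names Stream_Endpoint and Acceptor_Endpoint and the helper round_trip are hypothetical illustrations, not the actual ASX IPC SAP interface.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical typed wrapper for a *connected* endpoint (not the ASX API).
class Stream_Endpoint {
public:
  explicit Stream_Endpoint(int fd) : fd_(fd) {}
  ~Stream_Endpoint() { if (fd_ >= 0) ::close(fd_); }
  Stream_Endpoint(const Stream_Endpoint&) = delete;             // one owner per descriptor
  Stream_Endpoint& operator=(const Stream_Endpoint&) = delete;

  ssize_t send(const std::string& s) const { return ::send(fd_, s.data(), s.size(), 0); }
  std::string recv(std::size_t max) const {
    std::string buf(max, '\0');
    ssize_t n = ::recv(fd_, &buf[0], max, 0);
    buf.resize(n > 0 ? static_cast<std::size_t>(n) : 0);
    return buf;
  }
private:
  int fd_;
};

// Hypothetical wrapper for a *passive* (listening) endpoint. It deliberately
// has no send()/recv(), so misusing it for data transfer is a compile-time
// error rather than a run-time failure.
class Acceptor_Endpoint {
public:
  explicit Acceptor_Endpoint(int fd) : fd_(fd) {}
  int get_handle() const { return fd_; }
private:
  int fd_;
};

// A bare int, or the wrong endpoint type, cannot silently become a Stream_Endpoint.
static_assert(!std::is_convertible_v<int, Stream_Endpoint>);
static_assert(!std::is_convertible_v<Acceptor_Endpoint, Stream_Endpoint>);

// Round trip over a connected POSIX socket pair.
std::string round_trip(const std::string& msg) {
  int fds[2];
  if (::socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return "";
  Stream_Endpoint a(fds[0]), b(fds[1]);
  a.send(msg);
  return b.recv(64);
}
```

Each endpoint role is a distinct type. The compiler therefore rejects mistakes that the raw int-based interface would only report through an error return at run time.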
ASX is an object-oriented framework containing automated tools and reusable components that collaborate to simplify the development, configuration, and reconfiguration of distributed applications on shared memory multi-processor platforms. Components in the ASX framework are designed to decouple (1) application-independent components provided by the framework that handle interprocess communication, event demultiplexing, explicit dynamic linking, concurrency, and service configuration from (2) application-specific components inherited or instantiated from the framework that perform the services in a particular distributed application.

The primary unit of configuration in the ASX framework is the service. A service is a portion of a distributed application that offers a single processing capability to communicating entities. Services may be simple (such as returning the current time-of-day) or highly complex (such as a real-time distributed PBX event traffic monitor). By employing object-oriented techniques to decouple the application-specific service functionality from the reusable application-independent framework mechanisms, ASX facilitates the development of applications that are significantly more extensible and portable than those based on conventional algorithmic decomposition techniques. For example, it is possible to dynamically reconfigure one or more services in an ASX-based application without requiring the modification, recompilation, relinking, or restarting of a running system.

In addition to describing the object-oriented architecture of the ASX framework, this paper examines results obtained by using the framework to conduct experiments on protocol stack performance in multi-processor-based communication subsystems.
In the experiments, the ASX components help control for several relevant confounding factors (such as protocol functionality, concurrency control schemes, and application traffic characteristics) in order to precisely measure the performance impact of different methods for parallelizing communication protocol stacks. For example, in the experiments described in Section 3, connectionless and connection-oriented protocol stacks were developed by specializing existing components in the ASX framework via techniques involving inheritance and parameterized types. These techniques hold the protocol functionality constant while allowing the parallel processing structure of the protocol stacks to be altered systematically in a controlled manner.

This paper is organized as follows: Section 2 outlines the primary features of the ASX framework and describes its object-oriented architecture, Section 3 examines empirical results from experiments conducted using the framework to parallelize communication protocol stacks, and Section 4 presents concluding remarks.

Section 2 The ADAPTIVE Service eXecutive Framework

Overview

The ADAPTIVE Service eXecutive (ASX) is an object-oriented framework that is specifically targeted for the domain of distributed applications. The framework simplifies the construction of distributed applications by improving the modularity, extensibility, reusability, and portability of both the application-specific network services and the application-independent OS interprocess communication (IPC), demultiplexing, explicit dynamic linking, and concurrency mechanisms that these services utilize.

A framework is an integrated collection of components that collaborate to produce a reusable architecture for a family of applications.
Object-oriented frameworks are becoming increasingly popular as a means to simplify and automate the development and configuration process associated with complex application domains such as graphical user interfaces, databases, operating system kernels, and communication subsystems. The components in a framework typically include classes (such as message managers, timer-based event managers, demultiplexers, and assorted protocol functions and mechanisms), class hierarchies (such as an inheritance lattice of mechanisms for local and remote interprocess communication), class categories (such as event demultiplexers), and objects (such as a service dispatch table). By emphasizing the integration and collaboration of application-specific and application-independent components, frameworks enable larger-scale reuse of software than simply reusing individual classes or stand-alone functions.

The ASX framework incorporates concepts from several other modular communication frameworks including System V STREAMS, the x-kernel, and the Conduit. These frameworks all contain features that support the flexible configuration of communication subsystems by inter-connecting building-block protocol and service components. In general, these frameworks encourage the development of standard reusable communication-related components by decoupling application-specific processing functionality from the surrounding framework infrastructure. As described below, the ASX framework also contains additional features that help to further decouple application-specific service functionality from (1) the type of locking mechanisms used to synchronize access to shared objects, (2) the use of message-based vs. task-based parallel processing techniques, and (3) the use of kernel-level vs. user-level execution agents.
The Object-Oriented Architecture of ASX

The architecture of the ASX framework was developed incrementally by generalizing from extensive design and implementation experience with a range of distributed applications including on-line transaction processing systems, real-time PBX performance monitoring systems, and multi-processor-based communication subsystems. After building several prototypes and iterating through a number of alternative designs, the class categories illustrated in Figure were identified and implemented. A class category is a collection of components that collaborate to provide a set of related services, such as the communication subsystem services used to implement protocol stacks. A complete distributed application may be formed by combining components in each of the following class categories via C++ language features such as inheritance, aggregation, and template instantiation:

a. Stream Class Category -- These components are responsible for coordinating the configuration and run-time execution of a Stream, which is an object containing a set of hierarchically-related services (such as the layers in a communication protocol stack) defined by an application.

b. Reactor Class Category -- These components are responsible for demultiplexing temporal events generated by a timer-driven callout queue, I/O events received on communication ports, and signal-based events, and for dispatching the appropriate pre-registered handler(s) to process these events.

c. Service Configurator Class Category -- These components are responsible for dynamically linking or dynamically unlinking services into or out of the address space of an application at run-time.

d. Concurrency Class Category -- These components are responsible for spawning, executing, synchronizing, and gracefully terminating services at run-time via one or more threads of control within one or more processes.

e. IPC SAP Class Category -- These components encapsulate standard OS local and remote IPC mechanisms (such as sockets and TLI) within a more type-safe and portable object-oriented interface.

Lines connecting the class categories in Figure indicate dependency relationships. For example, components that implement the application-specific services in a particular distributed application depend on the Stream components, which in turn depend on the Service Configurator components. Since components in the Concurrency class category are used throughout the application-specific and application-independent portions of the ASX framework, they are marked with the global adornment. Note that the ``namespaces'' feature accepted recently by the ANSI C++ committee provides explicit C++ language support for these types of class category relationships.

This section examines the main components in each class category. Relationships between components in the ASX framework are illustrated throughout the paper via Booch notation. Solid rectangles indicate class categories, which combine a number of related classes into a common name space. Solid clouds indicate objects; nesting indicates composition relationships between objects; and undirected edges indicate that some type of link exists between two objects. Dashed clouds indicate classes; directed edges indicate inheritance relationships between classes; and an undirected edge with a small circle at one end indicates either a composition or a uses relation between two classes.

The Stream Class Category

Components in the Stream class category are responsible for coordinating one or more Streams. A Stream is an object used to configure application-specific services into the ASX framework and to execute them. As illustrated in Figure, a Stream contains a series of inter-connected Modules that may be linked together by developers at installation-time or by applications at run-time.
Modules are objects that developers use to decompose the architecture of a distributed application into a series of inter-connected, functionally distinct layers. Each layer implements a cluster of related service-specific functions (such as an end-to-end transport service, a presentation layer formatting service, or a real-time PBX signal routing service). Every Module contains a pair of Queue objects that partition a layer into its constituent read-side and write-side service-specific processing functionality. Any layer that performs multiplexing and demultiplexing of message objects between one or more related Streams may be developed using a Multiplexor object. A Multiplexor is a container class that provides mechanisms to route messages between one or more Modules in a collection of related Streams. A complete Stream is represented as an inter-connected series of independent Module and/or Multiplexor objects that communicate by exchanging messages with adjacent objects. Modules and Multiplexors may be joined together in essentially arbitrary configurations in order to satisfy application requirements and enhance component reuse.

The ASX framework uses C++ language features such as inheritance and parameterized types to enable developers to incorporate service-specific functionality into a Stream without requiring the modification of the basic framework components. For example, incorporating a new service layer into a Stream involves (1) inheriting from the Queue interface and selectively overriding several member functions (described below) in the subclass to implement service-specific functionality, (2) allocating a new Module that contains two instances (one for the read-side and one for the write-side) of the service-specific Queue subclass, and (3) inserting the Module into a Stream object. Service-specific functions in adjacent inter-connected Queues collaborate by exchanging typed messages via a uniform message passing interface.
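The three steps for incorporating a new service layer can be sketched in C++17. The sketch is heavily simplified. Queue, Upcase_Queue, Module, Tail_Queue, Stream, and their member functions (put, next, put_next, push) are hypothetical stand-ins that only mirror the roles described above. They are not the actual ASX class definitions, and only the write side of the Stream is linked.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the framework classes (illustrative assumptions).
struct Message { std::string data; };

class Queue {
public:
  virtual ~Queue() = default;
  virtual int put(Message& msg) = 0;           // service-specific processing hook
  void next(Queue* q) { next_ = q; }
protected:
  int put_next(Message& msg) { return next_ ? next_->put(msg) : -1; }
private:
  Queue* next_ = nullptr;
};

// Step 1: inherit from Queue and override put() with service-specific work.
class Upcase_Queue : public Queue {
public:
  int put(Message& msg) override {
    for (char& c : msg.data)
      c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return put_next(msg);                      // forward to the adjacent layer
  }
};

// Step 2: a Module owns one read-side and one write-side Queue instance.
class Module {
public:
  Module(std::unique_ptr<Queue> reader, std::unique_ptr<Queue> writer)
      : reader_(std::move(reader)), writer_(std::move(writer)) {}
  Queue* reader() { return reader_.get(); }
  Queue* writer() { return writer_.get(); }
private:
  std::unique_ptr<Queue> reader_, writer_;
};

// Bottom of the write side; plays the role of the Stream Tail.
class Tail_Queue : public Queue {
public:
  int put(Message& msg) override { out.push_back(msg.data); return 0; }
  std::vector<std::string> out;
};

// Step 3: the Stream inserts a Module and relinks adjacent write-side Queues.
// (Read-side linking is symmetric and omitted to keep the sketch short.)
class Stream {
public:
  explicit Stream(Tail_Queue* tail) : tail_(tail) {}
  void push(std::unique_ptr<Module> m) {       // insert at the top (no Stream Head here)
    modules_.insert(modules_.begin(), std::move(m));
    for (std::size_t i = 0; i < modules_.size(); ++i)
      modules_[i]->writer()->next(i + 1 < modules_.size()
                                      ? modules_[i + 1]->writer()
                                      : static_cast<Queue*>(tail_));
  }
  int put(Message& msg) {                       // application sends downstream
    return modules_.empty() ? tail_->put(msg) : modules_.front()->writer()->put(msg);
  }
private:
  Tail_Queue* tail_;
  std::vector<std::unique_ptr<Module>> modules_;
};

std::string demo() {
  Tail_Queue tail;
  Stream stream(&tail);
  stream.push(std::make_unique<Module>(std::make_unique<Upcase_Queue>(),
                                       std::make_unique<Upcase_Queue>()));
  Message m{"hello"};
  stream.put(m);
  return tail.out.empty() ? "" : tail.out.front();
}
```

The Stream and Module code never changes when a new layer is added. Only the Queue subclass carries service-specific behavior, which is the decoupling the paragraph above describes.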
To avoid reinventing familiar terminology, many C++ class names in the Stream class category correspond to similar componentry available in the System V STREAMS framework. However, the techniques used to support extensibility and concurrency in the two frameworks are significantly different. For example, adding service-specific functionality to the ASX Stream classes is performed by inheriting from several interfaces and implementations defined by existing framework components. Using inheritance to add service-specific functionality provides greater type-safety than the pointer-to-function idiom used in System V STREAMS.

As described in Section below, the ASX Stream classes also completely redesign and reimplement the co-routine-based, ``weightless'' service processing mechanisms used in System V STREAMS. (A weightless process executes on a run-time stack that is also used by other processes. This greatly complicates programming and increases the potential for deadlock; for example, a weightless process may not suspend execution to wait for resources to become available or events to occur.) These ASX changes enable more effective use of multiple processing elements (PEs) on shared memory multi-processing platforms by reducing the likelihood of deadlock and simplifying flow control between Queues in a Stream.

The remainder of this section discusses the primary components of the ASX Stream class category (i.e., the STREAM class, the Module class, the Queue class, and the Multiplexor class) in detail.

a. The STREAM Class

The STREAM class defines the application interface to a Stream. A STREAM object contains a stack of one or more hierarchically-related services that provide applications with a bi-directional get/put-style interface for sending and receiving data and control messages to the service-specific Module layers within a particular Stream.
The STREAM class also implements mechanisms that allow applications to configure a Stream at run-time by inserting and removing objects of the Module class, which is described next.

b. The Module Class

A Module object attaches a layer of service-specific functionality to the other Module objects that are connected together to form a Stream. By default, two standard Module objects (Stream Head and Stream Tail) are installed automatically when a Stream is opened. These standard Modules interpret pre-defined framework control messages that may be passed through a Stream at run-time. For incoming data, the Stream Tail class typically transforms network packets received by network interfaces or pseudo-devices into a canonical internal message format recognized by other components in a Stream. Likewise, for outgoing data it transforms messages from their internal format into network packets. The Stream Head class provides a message buffering interface between an application and a Stream. I/O between an application and a Stream occurs synchronously when the Stream Head Module appears at the top of a Stream. However, if the Stream Head is omitted, messages percolating up a Stream are delivered into the address space of an application asynchronously.

c. The Queue Abstract Class

Each Module object contains a pair of pointers to objects that are service-specific subclasses of the Queue abstract class. An abstract class in C++ provides an interface that contains at least one pure virtual member function. A pure virtual member function provides only an interface declaration, without any accompanying definition. Subclasses of an abstract class must provide definitions for all its pure virtual member functions before any objects of the class may be instantiated. One Queue subclass handles read-side processing for messages sent upstream to its Module layer, and the other handles write-side processing for messages sent downstream to its Module layer.
The Queue class is an abstract class since its interface defines four pure virtual member functions: open, close, put, and svc. Defining Queue as an abstract class enhances reuse by decoupling the general-purpose components provided by the Stream class category from the service-specific subclasses that inherit from and use these components. Likewise, the use of pure virtual member functions allows the C++ compiler to ensure that a subclass of Queue honors its obligation to provide the following service-specific functionality:

Initialization and Termination Member Functions: Subclasses derived from Queue must implement open and close member functions that perform service-specific Queue initialization and termination activities. These activities typically allocate and free resources such as connection control blocks, I/O descriptors, and synchronization locks. The open and close member functions of a Module's write-side and read-side Queue subclasses are invoked automatically by the ASX framework when the Module is inserted into or removed from a Stream, respectively.

Service-Specific Processing Member Functions: Subclasses of Queue also must define the put and svc member functions, which perform service-specific processing functionality on messages that arrive at a Module layer in a Stream. When messages arrive at the head or the tail of a Stream, they are escorted through a series of inter-connected Queues as a result of invoking the put and/or svc member function of each Queue in the Stream.

A put member function is invoked when a Queue at one layer in a Stream passes a message to an adjacent Queue in another layer. The put member function runs synchronously with respect to its caller, i.e., it borrows the thread of control from the Queue that originally invoked its put member function.
This thread of control typically originates either ``upstream'' from an application process, ``downstream'' from a pool of threads that handle I/O device interrupts, or internal to the Stream from an event dispatching mechanism (such as a timer-driven callout queue used to trigger retransmissions in a connection-oriented transport protocol Module).

The svc member function is used to perform service-specific processing asynchronously with respect to other Queues in its Stream. Unlike put, the svc member function is not directly invoked from an adjacent Queue. Instead, it is invoked by a separate thread associated with its Queue. This thread executes the Queue's svc member function, which runs an event loop that continuously blocks waiting for messages to arrive on the Queue's Message List. A Message List is a standard component in a Queue that is used to buffer a sequence of data messages and control messages for subsequent processing in the svc member function. When messages arrive, the svc member function dequeues the messages and performs the Queue subclass's service-specific processing tasks.

Within a put or svc member function, a message may be forwarded to an adjacent Queue in the Stream by passing the message via the put next utility member function. Put next calls the put member function of the next Queue residing in an adjacent layer. This invocation of put may borrow the thread of control from the caller and handle the message immediately (i.e., the synchronous processing approach illustrated in Figure (1)). Conversely, the put member function may enqueue the message and defer handling to its svc member function, which is executing in a separate thread of control (i.e., the asynchronous processing approach illustrated in Figure (2)). As discussed in Section, the particular processing approach that is selected often has a significant impact on performance and ease of programming.
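The following C++17 sketch contrasts the synchronous and asynchronous approaches, using the standard thread library.

- The Queue class declares the four pure virtual hooks (open, close, put, svc), so the compiler rejects incomplete subclasses.
- Sync_Queue does its work in the caller's borrowed thread.
- Async_Queue's put only enqueues onto a list, and its svc thread drains that list.

All class names and signatures, including next and put_next, are illustrative assumptions rather than the ASX API. Starting the svc thread in open and joining it in close is also an assumption made for the sketch.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical, simplified Queue: the four pure virtual hooks described
// above plus a put_next() helper. The signatures are assumptions.
class Queue {
public:
  virtual ~Queue() = default;
  virtual int open() = 0;                       // Module inserted into a Stream
  virtual int close() = 0;                      // Module removed from a Stream
  virtual int put(const std::string& msg) = 0;  // called by the adjacent Queue
  virtual int svc() = 0;                        // body of the Queue's own thread
  void next(Queue* q) { next_ = q; }
protected:
  int put_next(const std::string& msg) { return next_ ? next_->put(msg) : -1; }
private:
  Queue* next_ = nullptr;
};

// The compiler enforces the contract: Queue itself cannot be instantiated.
static_assert(std::is_abstract_v<Queue>);

// Synchronous layer: works in the caller's (borrowed) thread of control.
class Sync_Queue : public Queue {
public:
  int open() override { return 0; }
  int close() override { return 0; }
  int put(const std::string& msg) override { return put_next("[sync]" + msg); }
  int svc() override { return 0; }              // no separate thread needed
};

// Asynchronous layer: put() only enqueues; svc() runs in a separate thread.
class Async_Queue : public Queue {
public:
  ~Async_Queue() override { close(); }
  int open() override { worker_ = std::thread([this] { svc(); }); return 0; }
  int close() override {
    { std::lock_guard<std::mutex> g(lock_); done_ = true; }
    cond_.notify_one();
    if (worker_.joinable()) worker_.join();
    return 0;
  }
  int put(const std::string& msg) override {
    { std::lock_guard<std::mutex> g(lock_); list_.push_back(msg); }
    cond_.notify_one();
    return 0;
  }
  int svc() override {                          // event loop over the list
    std::unique_lock<std::mutex> g(lock_);
    for (;;) {
      cond_.wait(g, [this] { return done_ || !list_.empty(); });
      if (list_.empty()) return 0;              // closed and fully drained
      std::string msg = std::move(list_.front());
      list_.pop_front();
      g.unlock();
      put_next("[async]" + msg);                // forward without holding the lock
      g.lock();
    }
  }
private:
  std::mutex lock_;
  std::condition_variable cond_;
  std::deque<std::string> list_;                // stands in for the Message List
  bool done_ = false;
  std::thread worker_;
};

// Bottom layer: only the svc() thread writes here, and the caller reads
// only after close() has joined that thread.
class Sink_Queue : public Queue {
public:
  int open() override { return 0; }
  int close() override { return 0; }
  int put(const std::string& msg) override { seen.push_back(msg); return 0; }
  int svc() override { return 0; }
  std::vector<std::string> seen;
};

std::vector<std::string> run_demo() {
  Sync_Queue sync_layer;
  Async_Queue async_layer;
  Sink_Queue sink;
  sync_layer.next(&async_layer);
  async_layer.next(&sink);
  async_layer.open();
  sync_layer.put("a");   // this thread runs Sync_Queue::put, then only enqueues
  sync_layer.put("b");
  async_layer.close();   // drain the list and join the svc() thread
  return sink.seen;
}
```

Async_Queue's put only buffers the message, so the caller's thread returns quickly. The cost is the locking and thread hand-off needed to pass each message to the svc thread. That is one concrete source of the performance trade-off mentioned above.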
In addition to the four pure virtual member function interfaces, each Queue also contains a number of reusable utility member functions (such as put next, getq, and putq) that may be used by service-specific subclasses to query and/or modify the internal state of a Queue object. This internal state includes a pointer to the adjacent Queue on a Stream, a back-pointer to a Queue's enclosing Module (which enables it to locate its sibling), a Message List, and a pair of high and low water mark variables that are used to implement layer-to-layer flow control between adjacent Modules in a Stream. The high water mark indicates the number of bytes of messages the Message List is willing to buffer before it becomes flow controlled. The low water mark indicates the level at which a previously flow controlled Queue is no longer considered to be flow controlled.

Two types of messages may appear on a Message List: simple and composite. A simple message contains a single Message Block, and a composite message contains multiple Message Blocks linked together. Composite messages generally consist of a control block followed by one or more data blocks. A control block contains bookkeeping information (such as destination addresses and length fields), whereas data blocks contain the actual contents of a message. The overhead of passing Message Blocks between Queues is minimized by passing pointers to messages rather than copying data.

d. The Multiplexor Class

A Multiplexor is a C++ container class that provides mechanisms for demultiplexing messages between one or more Modules in a collection of inter-related Streams. Multiplexors are typically used to route Message Blocks between inter-related Streams (such as those used to implement complex protocol families in the Internet and the ISO OSI reference models).
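This small C++17 sketch shows the high and low water mark logic together with composite messages built from chained blocks. It assumes that a Message List becomes flow controlled once its byte count reaches the high water mark. It also assumes the list is released once the count falls to or below the low water mark. The exact comparisons ASX uses are not specified above. Message_Block, Message_List, make_block, and water_mark_demo are hypothetical names.

```cpp
#include <cstddef>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Simplified Message Block: its data plus an optional continuation block,
// so a composite message is a chain (e.g., control block -> data block).
struct Message_Block {
  std::string data;
  std::unique_ptr<Message_Block> cont;
  std::size_t total_length() const {
    return data.size() + (cont ? cont->total_length() : 0);
  }
};

// Hypothetical Message List with high/low water mark flow control.
class Message_List {
public:
  Message_List(std::size_t low, std::size_t high) : low_(low), high_(high) {}

  // Takes ownership of the pointer; message bytes are never copied.
  // Returns false if the list is flow controlled after the enqueue.
  bool enqueue(std::unique_ptr<Message_Block> mb) {
    bytes_ += mb->total_length();
    list_.push_back(std::move(mb));
    if (bytes_ >= high_) flow_controlled_ = true;     // reached the high mark
    return !flow_controlled_;
  }

  std::unique_ptr<Message_Block> dequeue() {
    if (list_.empty()) return nullptr;
    std::unique_ptr<Message_Block> mb = std::move(list_.front());
    list_.pop_front();
    bytes_ -= mb->total_length();
    // Hysteresis: release flow control only at or below the low mark.
    if (flow_controlled_ && bytes_ <= low_) flow_controlled_ = false;
    return mb;
  }

  bool flow_controlled() const { return flow_controlled_; }

private:
  std::size_t low_, high_, bytes_ = 0;
  bool flow_controlled_ = false;
  std::deque<std::unique_ptr<Message_Block>> list_;
};

std::unique_ptr<Message_Block> make_block(std::string s) {
  auto mb = std::make_unique<Message_Block>();
  mb->data = std::move(s);
  return mb;
}

// Records 'F' (flow controlled) or '-' after each step.
std::string water_mark_demo() {
  Message_List list(/*low=*/4, /*high=*/10);
  auto composite = make_block("hdr");        // control block (3 bytes)
  composite->cont = make_block("payload");   // data block (7 bytes)
  std::string trace;
  list.enqueue(std::move(composite));        // 10 bytes: reaches the high mark
  list.enqueue(make_block("abcdef"));        // 16 bytes
  trace += list.flow_controlled() ? 'F' : '-';
  list.dequeue();                            // 6 bytes: between marks, still F
  trace += list.flow_controlled() ? 'F' : '-';
  list.dequeue();                            // 0 bytes: at/below the low mark
  trace += list.flow_controlled() ? 'F' : '-';
  return trace;
}
```

The gap between the two marks provides hysteresis. In the trace, the list stays flow controlled at 6 bytes even though that is below the high mark. This keeps an upstream producer from toggling rapidly between blocked and unblocked states.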
A Multiplexor is implemented as a C++ template class parameterized by an external identifier (such as a network address, port number, or type-of-service field) and an internal identifier (such as a pointer to a Module). These template parameters are instantiated by service-specific Stream components to produce specialized Multiplexor objects that perform efficient intra-Stream message routing. Each Multiplexor object contains a set of Modules that may be linked above and below the Multiplexor in essentially arbitrary configurations.

The Reactor Class Category

Components in the Reactor class category are responsible for demultiplexing (1) temporal events generated by a timer-driven callout queue, (2) I/O events received on communication ports, and (3) signal events, and for dispatching the appropriate pre-registered handler(s) to process these events. The Reactor encapsulates the functionality of the select and poll I/O demultiplexing mechanisms within a portable and extensible C++ wrapper. Select and poll are UNIX system calls that detect the occurrence of different types of input and output events on one or more I/O descriptors simultaneously. To improve portability, the Reactor provides the same interface regardless of whether select or poll is used as the underlying I/O demultiplexor. In addition, the Reactor contains mutual exclusion mechanisms that allow callback-style programming to be performed correctly and efficiently in a multi-threaded event processing environment.

The Reactor contains a set of member functions illustrated in Figure. These member functions provide a uniform interface to manage objects that implement various types of service-specific handlers. Certain member functions register, dispatch, and remove I/O descriptor-based and signal-based handler objects from the Reactor. Other member functions schedule, cancel, and dispatch timer-based handler objects. As shown in Figure, these handler objects all derive from the Event Handler abstract base class.
This class specifies an interface for event registration and service handler dispatching. The Reactor uses the virtual member functions in the Event Handler interface to integrate the demultiplexing of I/O descriptor-based and signal-based events together with timer-based events. I/O descriptor-based events are dispatched via the handle input, handle output, and handle exceptions member functions, and signal-based events are dispatched via the handle signal member function. Timer-based events are dispatched via the handle timeout member function.

Subclasses of Event Handler may augment the base class interface by defining additional member functions and data members. In addition, virtual member functions in the Event Handler interface may be selectively overridden to implement application-specific functionality. Once the pure virtual member functions in the Event Handler base class have been supplied by a subclass, an application may define an instance of the resulting composite service handler object.

When an application instantiates and registers a composite I/O descriptor-based service handler object, the Reactor extracts the underlying I/O descriptor from the object. This descriptor is stored in a table along with I/O descriptors from other registered objects. Subsequently, when the application invokes its main event loop, these descriptors are passed as arguments to the underlying OS event demultiplexing system call (e.g., select or poll). As events associated with a registered handler object occur at run-time, the Reactor automatically detects these events and dispatches the appropriate member function(s) of the service handler object associated with the event. This handler object then becomes responsible for performing its service-specific functionality before returning control to the main Reactor event loop.
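The registration and dispatching steps just described can be sketched as a minimal poll()-based event loop, assuming POSIX. The sketch covers only I/O descriptor-based events; timer and signal dispatching are omitted. Event_Handler, get_fd, handle_input, Reactor, register_handler, and handle_events are illustrative names chosen to echo the text, not verified ASX signatures.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, I/O-only subset of the Event Handler interface.
class Event_Handler {
public:
  virtual ~Event_Handler() = default;
  virtual int get_fd() const = 0;               // descriptor the Reactor watches
  virtual int handle_input(int fd) = 0;         // return -1 to be removed
};

// Minimal Reactor sketch built on poll(); not the ASX implementation.
class Reactor {
public:
  void register_handler(Event_Handler* eh) { table_[eh->get_fd()] = eh; }
  void remove_handler(int fd) { table_.erase(fd); }
  std::size_t size() const { return table_.size(); }

  // One pass of the event loop; returns the number of handlers dispatched,
  // 0 on timeout, or -1 on error.
  int handle_events(int timeout_ms) {
    std::vector<pollfd> fds;
    for (const auto& entry : table_) fds.push_back(pollfd{entry.first, POLLIN, 0});
    int n = ::poll(fds.data(), fds.size(), timeout_ms);
    if (n <= 0) return n;
    int dispatched = 0;
    for (const pollfd& p : fds) {
      if (!(p.revents & (POLLIN | POLLHUP))) continue;
      auto it = table_.find(p.fd);
      if (it == table_.end()) continue;         // removed earlier in this pass
      ++dispatched;
      if (it->second->handle_input(p.fd) == -1) table_.erase(it);
    }
    return dispatched;
  }
private:
  std::map<int, Event_Handler*> table_;         // descriptor -> handler
};

// Example service handler: accumulates whatever arrives on its descriptor.
class Logging_Handler : public Event_Handler {
public:
  explicit Logging_Handler(int fd) : fd_(fd) {}
  int get_fd() const override { return fd_; }
  int handle_input(int fd) override {
    char buf[64];
    ssize_t n = ::read(fd, buf, sizeof buf);
    if (n <= 0) return -1;                      // EOF or error: deregister
    received.append(buf, static_cast<std::size_t>(n));
    return 0;
  }
  std::string received;
private:
  int fd_;
};

std::string reactor_demo() {
  int p[2];
  if (::pipe(p) != 0) return "";
  Reactor reactor;
  Logging_Handler handler(p[0]);
  reactor.register_handler(&handler);
  bool ok = ::write(p[1], "ping", 4) == 4;
  ::close(p[1]);                                // writer done: reader sees EOF
  while (ok && reactor.size() > 0 && reactor.handle_events(1000) > 0) {}
  ::close(p[0]);
  return handler.received;
}
```

The loop in reactor_demo is the "main event loop" from the text. The handler never calls poll itself; it only reacts when the Reactor dispatches handle_input.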
The Service Configurator Class Category

Components in the Service Configurator class category are responsible for explicitly linking or unlinking services dynamically into or out of the address space of an application at run-time. Explicit dynamic linking enables the configuration and reconfiguration of application-specific services without requiring the modification, recompilation, relinking, or restarting of an executing application. The Service Configurator components discussed below include the Service Object inheritance hierarchy (Figure (1)), the Service Repository class (Figure (2)), and the Service Config class (Figure (3)).

The Service Object Inheritance Hierarchy: The Service Object class is the focal point of a multi-level hierarchy of types related by inheritance. The interfaces provided by the abstract classes in this type hierarchy may be selectively implemented by service-specific subclasses in order to access Service Configurator features. These features provide transparent dynamic linking, service handler registration, event demultiplexing, service dispatching, and run-time control of services (such as suspending and resuming a service temporarily). By decoupling the service-specific portions of a handler object from the underlying Service Configurator mechanisms, the effort necessary to insert and remove services from an application at run-time is significantly reduced.

The Service Object inheritance hierarchy consists of the Event Handler and Shared Object abstract base classes, as well as the Service Object abstract derived class. The Event Handler class was described above in the Reactor section. The behavior of the other classes in the Service Configurator class category is outlined below:

The Shared Object Abstract Base Class: This abstract base class specifies an interface for dynamically linking and unlinking objects into and out of the address space of an application.
This abstract base class exports three pure virtual member functions: init, fini, and info. These functions impose a contract between the reusable components provided by the Service Configurator and service-specific objects that utilize these components. By using pure virtual member functions, the Service Configurator ensures that a service handler implementation honors its obligation to provide certain configuration-related information. This information is subsequently used by the Service Configurator to automatically link, initialize, identify, and unlink a service at run-time.

The init member function serves as the entry-point to an object during run-time initialization. This member function is responsible for performing application-specific initialization when an object derived from Shared Object is dynamically linked. The info member function returns a human-readable string that concisely reports service addressing information and documents service functionality. Clients may query an application to retrieve this information and use it to contact a particular service running in the application. The fini member function is called automatically by the Service Configurator class category when an object is unlinked and removed from an application at run-time. This member function typically performs termination operations that release dynamically allocated resources (such as memory or synchronization locks).

The Shared Object base class is defined independently from the Event Handler class to clearly separate their two orthogonal sets of concerns. For example, certain applications (such as a compiler or text editor) might benefit from dynamic linking, though they might not require timer-based, signal-based, or I/O descriptor-based event demultiplexing. Conversely, other applications (such as an ftp server) require event demultiplexing, but might not require dynamic linking.
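A minimal C++17 sketch of the init/fini/info contract follows. To stay self-contained it performs no real dynamic linking. A configurator would normally obtain the object from a shared library, for example with the POSIX dlopen and dlsym calls, before calling init. The exact Shared_Object signatures, Time_Service, and lifecycle_demo are hypothetical.

```cpp
#include <memory>
#include <string>

// Sketch of the Shared Object contract; the exact signatures are assumptions.
class Shared_Object {
public:
  virtual ~Shared_Object() = default;
  virtual int init(int argc, char* argv[]) = 0;  // after the object is linked in
  virtual int fini() = 0;                        // before it is unlinked
  virtual std::string info() const = 0;          // human-readable description
};

// Hypothetical time-of-day service, the kind of "simple" service mentioned
// in the Introduction. It needs dynamic linking but no event demultiplexing.
class Time_Service : public Shared_Object {
public:
  int init(int argc, char* argv[]) override {
    for (int i = 0; i + 1 < argc; ++i)           // optional "-p <port>" argument
      if (std::string(argv[i]) == "-p") port_ = std::stoi(argv[i + 1]);
    active_ = true;
    return 0;
  }
  int fini() override { active_ = false; return 0; }
  std::string info() const override {
    return "time-of-day service, port " + std::to_string(port_) +
           (active_ ? " (active)" : " (inactive)");
  }
private:
  int port_ = 13;       // default: the standard daytime port
  bool active_ = false;
};

// Stand-in for the configurator's side of the contract: initialize, query,
// and finalize a service through the abstract interface only.
std::string lifecycle_demo() {
  char flag[] = "-p", port[] = "10013";
  char* argv[] = {flag, port};
  std::unique_ptr<Shared_Object> svc = std::make_unique<Time_Service>();
  svc->init(2, argv);
  std::string during = svc->info();
  svc->fini();
  return during + "; " + svc->info();
}
```

Time_Service derives only from Shared_Object. This reflects the separation of concerns described above: a service that needs dynamic linking is not forced to implement event-handling hooks.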
The Service Object Abstract Derived Class: Support for dynamic linking, event demultiplexing, and service dispatching is typically necessary to automate the dynamic configuration and reconfiguration of application-specific services in a distributed system. Therefore, the Service Configurator class category defines the Service Object class, which is a composite class that combines the interfaces inherited from both the Event Handler and the Shared Object abstract base classes. During development, application-specific subclasses of Service Object may implement the suspend and resume virtual member functions in this class. The suspend and resume member functions are invoked automatically by the Service Configurator class category in response to certain external events (such as those triggered by receipt of the UNIX SIGHUP signal). An application developer may define these member functions to perform actions necessary to suspend a service object without unlinking it completely, as well as to resume a previously suspended service object. In addition, application-specific subclasses must implement the four pure virtual member functions (init, fini, info, and get fd) that are inherited (but not defined) by the Service Object subclass.

To provide a consistent environment for defining, configuring, and using Streams, the Queue class in the Stream class category is derived from the Service Object inheritance hierarchy (illustrated in Figure (1)). This enables hierarchically-related, application-specific services to be linked and unlinked into and out of a Stream at run-time.

The Service Repository Class: The ASX framework supports the configuration of applications that contain one or more Streams, each of which may have one or more inter-connected service-specific Modules.
Therefore, to simplify run-time administration, it may be necessary to control and coordinate, individually or collectively, the Service_Objects that comprise an application's currently active services. The Service_Repository is an object manager that coordinates local and remote queries and updates involving the services offered by an application. A search structure within the object manager binds service names (represented as ASCII strings) to instances of composite Service_Objects (represented as C++ object code). A service name uniquely identifies an instance of a Service_Object stored in the repository. Each entry in the Service_Repository contains a pointer to the Service_Object portion of a service-specific C++ derived class (shown in Figure (2)). This enables the Service Configurator classes to automatically load, enable, suspend, resume, or unload Service_Objects from a Stream dynamically. The repository also maintains a handle to the underlying shared object file for each dynamically linked Service_Object. This handle is used to unlink and unload a Service_Object from a running application when its service is no longer required. An iterator class is also supplied along with the Service_Repository. This class may be used to visit every Service_Object in the repository without compromising data encapsulation. The Service_Config Class: As illustrated in Figure (3), the Service_Config class integrates several other ASX framework components (such as the Service_Repository, the Service_Object inheritance hierarchy, and the Reactor). The resulting composite Service_Config component is used to automate the static and/or dynamic configuration of concurrent applications that contain one or more Streams. The Service_Config class uses a configuration file to guide its configuration and reconfiguration activities. Each application may be associated with a distinct configuration file.
This file characterizes the essential attributes of the service(s) offered by an application. These attributes include the location of the shared object file for each dynamically linked service, as well as the parameters required to initialize a service at run-time. Consolidating service attributes and installation parameters into a single configuration file simplifies the administration of Streams within an application. Application development is also simplified by decoupling the configuration and reconfiguration mechanisms provided by the framework from the application-specific attributes and parameters specified in a configuration file. Further information on the configuration format utilized by the Service_Config class is presented in .

The Concurrency Class Category

Components in the Concurrency class category are responsible for spawning, executing, synchronizing, and gracefully terminating services at run-time via one or more threads of control within one or more processes. The following subsections discuss the two main groups of classes (Synch and Thr_Manager) in the Concurrency class category.

The Synch Classes

Components in the Stream, Reactor, and Service Configurator class categories described above contain minimal internal locking, to avoid over-constraining the granularity of the synchronization strategies used by an application. In particular, only those framework operations that would not function correctly in a multi-threaded environment (such as enqueueing Message_Blocks onto a Message_List, demultiplexing Message_Blocks onto internal Module addresses stored in a Multiplexor object, or registering an Event_Handler object with the Reactor) are protected by synchronization mechanisms provided by the Synch classes. The Synch classes provide type-safe C++ interfaces for two basic types of synchronization mechanisms: Mutex and Condition objects.
A Mutex object is used to ensure the integrity of a shared resource that may be accessed concurrently by multiple threads of control. A Condition object allows one or more cooperating threads to suspend their execution until a condition expression involving shared data attains a particular state. The ASX framework also provides a collection of more sophisticated concurrency control mechanisms (such as Monitors, Readers/Writer locks, and recursive Mutex objects) that build upon the two basic synchronization mechanisms described below. A Mutex object may be used to serialize the execution of multiple threads by defining a critical section in which only one thread executes at a time. To enter a critical section, a thread invokes the Mutex::acquire member function; to leave it, the thread invokes the Mutex::release member function. These two member functions are implemented via adaptive spin-locks that ensure mutual exclusion by using an atomic hardware instruction. An adaptive spin-lock operates by polling a designated memory location using the hardware instruction until either (1) the value at this location is changed by the thread that currently owns the lock (signifying that the lock has been released and may now be acquired) or (2) the thread that is holding the lock goes to sleep (at which point the spinning thread also goes to sleep to avoid needless polling). On a shared memory multi-processor, the overhead incurred by a spin-lock is relatively minor, since polling affects only the local instruction and data cache of the CPU where the thread is spinning. A spin-lock is a simple and efficient synchronization mechanism for certain types of short-lived resource contention. For example, in the ASX framework, each Message_List in a Queue object contains a Mutex object that prevents race conditions when Message_Blocks are enqueued and dequeued concurrently by multiple threads of control running in adjacent Queues.
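The acquire/release protocol and spin-lock behavior described above can be approximated in portable C++11, as a minimal sketch. The class name Spin_Mutex and the spin limit are assumptions, not ASX code; std::atomic<bool>::exchange plays the role of the atomic test-and-set instruction. Because portable code cannot observe whether the lock holder has gone to sleep, this version merely yields the processor after a fixed number of failed polls, which is only a rough stand-in for the SunOS adaptive spin-lock.

```cpp
#include <atomic>
#include <thread>

// Hypothetical spin-lock with the Mutex-style acquire/release interface.
class Spin_Mutex {
public:
  void acquire() {
    int spins = 0;
    // Atomically set the flag; the previous value says whether another
    // thread already held the lock.
    while (locked_.exchange(true, std::memory_order_acquire)) {
      // Poll with plain loads, so waiting reads this CPU's cached copy
      // of the flag until the lock appears to be released.
      while (locked_.load(std::memory_order_relaxed)) {
        // Crude substitute for "sleep if the holder sleeps": after many
        // failed polls, give the processor to another thread.
        if (++spins >= kSpinLimit) {
          std::this_thread::yield();
          spins = 0;
        }
      }
    }
  }

  void release() { locked_.store(false, std::memory_order_release); }

private:
  static constexpr int kSpinLimit = 1000;  // assumed tuning value
  std::atomic<bool> locked_{false};
};
```

A critical section is then bracketed by m.acquire() and m.release(); the memory orderings make writes performed inside one critical section visible to the next thread that acquires the lock.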
A Condition object is a somewhat different synchronization mechanism that enables a thread to suspend itself indefinitely (via the Condition::wait member function) until a condition expression involving shared data attains a particular state. When another cooperating thread indicates that the state of the shared data has changed (by invoking the Condition::signal member function), the associated Condition object wakes up the suspended thread. The newly awakened thread then re-evaluates the condition expression and resumes processing only if the shared data is now in an appropriate state. For example, each Message_List in the ASX framework contains a pair of Condition objects (named notfull and notempty), in addition to a Mutex object. These Condition objects implement flow control between adjacent Queues. When one Queue attempts to insert a Message_Block into a neighboring Queue that has reached its high water mark, the Message_List::enqueue member function performs a wait operation on the notfull Condition object. This operation atomically relinquishes the processing element (PE) and puts the calling thread to sleep until flow control conditions abate. Subsequently, when the number of bytes in the flow-controlled Queue's Message_List falls below its low water mark, the thread running the blocked Queue is automatically awakened to finish inserting the message and resume its processing tasks. Unlike Mutex objects, Condition objects are not implemented with spin-locks, since there is generally no indication of how long a thread must wait for a particular condition to be signaled. Therefore, Condition objects are implemented via sleep-locks that trigger a context switch to allow other threads to execute. Section discusses the consequences of spin-locks vs. sleep-locks on application performance.
The Thr_Manager Class

The Thr_Manager class contains a set of mechanisms that manage groups of threads that collaborate to implement collective actions (such as a pool of threads that render different portions of a large image in parallel). The Thr_Manager class provides a number of mechanisms (such as suspend_all and resume_all) that suspend and resume a set of collaborating threads atomically. This feature is useful for distributed applications that execute one or more services concurrently. For example, when initializing a Stream composed of Modules that execute in separate threads of control and collaborate by passing messages between threads, it is important to ensure that all Queues in the Stream are completely interconnected before allowing messages to flow through the Stream. The mechanisms in the Thr_Manager class allow these initialization activities to occur atomically.

The IPC_SAP Class Category

Components in the IPC_SAP class category encapsulate standard OS local and remote IPC mechanisms (such as sockets and TLI) within a more type-safe and portable object-oriented interface. IPC_SAP stands for ``InterProcess Communication Service Access Point.'' As shown in Figure , a forest of class categories is rooted at the IPC_SAP base class. These class categories include SOCK_SAP (which encapsulates the socket API), TLI_SAP (which encapsulates the TLI API), SPIPE_SAP (which encapsulates the UNIX SVR4 STREAM pipe API), and FIFO_SAP (which encapsulates the UNIX named pipe API). Each class category in IPC_SAP is itself organized as an inheritance hierarchy in which every subclass provides a well-defined subset of local or remote communication mechanisms. Together, the subclasses within a hierarchy comprise the overall functionality of a particular communication abstraction (such as the Internet-domain or UNIX-domain protocol families).
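To make the idea of such a hierarchy concrete, here is a minimal sketch, assuming a POSIX system. A hypothetical IPC_SAP base class owns an I/O descriptor and implements non-blocking control once, via fcntl, and a hypothetical Pipe_Reader subclass adds only pipe-specific behavior. The class and member names are illustrative and do not reproduce the ASX interfaces.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical root of an IPC wrapper hierarchy: it owns a descriptor and
// provides descriptor-control operations shared by every subclass.
class IPC_SAP {
public:
  int get_fd() const { return fd_; }

  // Turn O_NONBLOCK on or off with fcntl; returns -1 on failure.
  int enable_nonblocking() { return set_flag(O_NONBLOCK, true); }
  int disable_nonblocking() { return set_flag(O_NONBLOCK, false); }

  bool is_nonblocking() const {
    int flags = ::fcntl(fd_, F_GETFL);
    return flags != -1 && (flags & O_NONBLOCK) != 0;
  }

protected:
  // Only subclasses may create or destroy an IPC_SAP.
  explicit IPC_SAP(int fd) : fd_(fd) {}
  ~IPC_SAP() = default;

private:
  int set_flag(int bit, bool on) {
    int flags = ::fcntl(fd_, F_GETFL);
    if (flags == -1) return -1;
    flags = on ? (flags | bit) : (flags & ~bit);
    return ::fcntl(fd_, F_SETFL, flags);
  }

  int fd_;
};

// Hypothetical subclass for the read end of a UNIX pipe. It inherits all
// descriptor control from IPC_SAP and adds only a receive operation.
class Pipe_Reader : public IPC_SAP {
public:
  explicit Pipe_Reader(int fd) : IPC_SAP(fd) {}  // takes ownership of fd
  ~Pipe_Reader() { if (get_fd() != -1) ::close(get_fd()); }
  Pipe_Reader(const Pipe_Reader &) = delete;
  Pipe_Reader &operator=(const Pipe_Reader &) = delete;

  ssize_t recv(void *buf, size_t n) { return ::read(get_fd(), buf, n); }
};
```

Any other subclass (for example, one wrapping a socket descriptor) would inherit enable_nonblocking and is_nonblocking unchanged, which is the kind of code sharing the hierarchy is meant to provide.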
Inheritance-based hierarchical decomposition facilitates the reuse of code that is common among the various IPC_SAP class categories. For example, the C++ interface to lower-level UNIX OS device control system calls such as fcntl and ioctl is inherited and shared by all the other components in the IPC_SAP class category.

Section 3 Performance Experiments on the Communication Subsystem

To illustrate how the components of the ASX framework are used in practice, this section describes results from performance experiments that measure the impact of alternative methods for parallelizing communication subsystems. A communication subsystem is a distributed system that consists of protocol functions (such as routing, segmentation/reassembly, connection management, end-to-end flow control, remote context management, demultiplexing, message buffering, error protection, session control, and presentation conversions) and operating system mechanisms (such as process management, asynchronous event invocation, message buffering, and layer-to-layer flow control) that support the implementation and execution of protocol stacks containing hierarchically-related protocol functions. Advances in VLSI and fiber optic technology are shifting performance bottlenecks from the underlying networks to the communication subsystem. Designing and implementing multi-processor-based communication subsystems that execute protocol functions and OS mechanisms in parallel is a promising technique for increasing protocol processing rates and reducing latency. To significantly increase communication subsystem performance, however, the speed-up obtained from parallel processing must outweigh the context switching and synchronization overhead associated with parallel processing. A context switch is generally triggered when an executing process either voluntarily or involuntarily relinquishes the processing element (PE) it is executing upon.
Depending on the underlying OS and hardware platform, performing a context switch may require dozens to hundreds of instructions due to the flushing of register windows, instruction and data caches, instruction pipelines, and translation look-aside buffers. Synchronization mechanisms are necessary to serialize access to shared objects (such as messages, message queues, protocol context records, and demultiplexing tables) related to protocol processing. Certain methods of parallelizing protocol stacks incur significant synchronization overhead from managing locks associated with processing these shared objects. A number of process architectures have been proposed as the basis for parallelizing communication subsystems. A process architecture binds one or more processing elements (PEs) together with the protocol tasks and messages that implement protocol stacks in a communication subsystem. As shown in Figure (1), the three basic elements that form the foundation of a process architecture are (1) the processing elements (PEs), which are the underlying execution agents for both protocol and application code, (2) control messages and data messages, which are typically sent to and received from one or more applications or network devices, and (3) protocol processing tasks, which perform protocol-related functions upon messages as they arrive at and depart from the communication subsystem. Two fundamental types of process architectures (task-based and message-based) may be created by structuring the three basic process architecture elements shown in Figure (1) in different ways. Task-based process architectures are formed by binding one or more PEs to different units of protocol functionality (shown in Figure (2)). In this architecture, tasks are the active objects, whereas the messages processed by the tasks are the passive objects. Parallelism is achieved by executing protocol tasks on separate PEs and passing data messages and control messages between the tasks/PEs.
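A minimal sketch of a task-based arrangement follows, with standard C++ threads standing in for PEs. Each thread is bound to one hypothetical protocol layer (the active object), and messages (the passive objects) are handed from one layer's queue to the next. The Channel class, the run_task_based function, the "[hdr]" tag standing in for protocol processing, and the empty-string end-of-stream marker are all assumptions for illustration.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Unbounded FIFO used to hand passive messages from one task to the next.
class Channel {
public:
  void put(std::string msg) {
    {
      std::lock_guard<std::mutex> guard(lock_);
      queue_.push_back(std::move(msg));
    }
    ready_.notify_one();
  }

  std::string get() {
    std::unique_lock<std::mutex> guard(lock_);
    ready_.wait(guard, [this] { return !queue_.empty(); });
    std::string msg = std::move(queue_.front());
    queue_.pop_front();
    return msg;
  }

private:
  std::mutex lock_;
  std::condition_variable ready_;
  std::deque<std::string> queue_;
};

// Runs two hypothetical layer tasks, each bound to its own thread.
// Input messages must be non-empty: "" is the end-of-stream marker.
std::vector<std::string> run_task_based(const std::vector<std::string> &input) {
  Channel lower_queue, upper_queue;
  std::vector<std::string> delivered;

  // Lower task (e.g. data-link + transport): tags each message, passes it up.
  std::thread lower_task([&] {
    for (;;) {
      std::string msg = lower_queue.get();
      if (msg.empty()) {        // forward the marker and stop
        upper_queue.put(msg);
        return;
      }
      upper_queue.put("[hdr]" + msg);
    }
  });

  // Upper task (e.g. presentation + application): delivers each message.
  std::thread upper_task([&] {
    for (;;) {
      std::string msg = upper_queue.get();
      if (msg.empty()) return;
      delivered.push_back(msg);  // only this thread writes 'delivered'
    }
  });

  for (const std::string &m : input) lower_queue.put(m);
  lower_queue.put("");          // end of stream
  lower_task.join();
  upper_task.join();            // joining makes 'delivered' safe to read
  return delivered;
}
```

Every message crosses a thread boundary between the two layers; on a multi-processor, each such hand-off is a point where the context switching and synchronization costs described above can arise.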
In contrast, message-based process architectures are formed by binding the PEs to the protocol control messages and data messages received from applications and network interfaces (as shown in Figure (3)). In this architecture, messages are the active objects, whereas tasks are the passive objects. Parallelism is achieved by simultaneously escorting multiple data messages and control messages, each on a separate PE, through a stack of protocol tasks. Section examines how the choice of process architecture significantly affects context switch and synchronization overhead. A survey of alternative process architectures appears in . Selecting an effective process architecture is also an important design decision in application domains other than communication subsystems. For example, real-time PBX monitoring systems and video-on-demand servers also perform non-communication-related tasks (such as database query processing) that benefit from a carefully structured approach to parallelism. This section focuses primarily upon the impact of process architectures on communication subsystem performance, since network protocol behavior and functionality are well understood and the terminology is relatively well defined. Moreover, a large body of literature exists with which to compare the performance results presented in Section . The remainder of this section describes relevant aspects of performance experiments that measure the impact of different process architectures on connectionless and connection-oriented protocol stacks. Synchronization overhead arises from the various mechanisms used to serialize access to shared resources. For example, protocol software typically utilizes locks to protect resources like messages, message queues, protocol context records, and demultiplexing tables against race conditions.
Certain protocol stack and process architecture combinations (such as implementing connection-oriented protocols via Message Parallelism) may incur significant synchronization overhead from managing locks associated with these shared objects. Other sources of synchronization overhead involve contention for shared hardware resources such as I/O buses and global memory. In general, hardware contention represents an upper limit on the benefits that may accrue from parallel processing. The scope of our experiments focuses on software-related techniques for reducing the impact of context switching and locking. Therefore, we do not directly consider the impact of hardware synchronization.

Multi-processor Platform

All experiments were conducted on an otherwise idle Sun 690MP SPARCserver, which contains 4 SPARC 40 MHz processing elements (PEs), each capable of performing at 28 MIPS. The operating system used for the experiments is release 5.3 of SunOS, which provides a multi-threaded kernel that allows multiple system calls and device interrupts to execute in parallel. All the process architectures in these experiments execute protocol tasks in separate unbound threads multiplexed over 1, 2, 3, or 4 SunOS lightweight processes (LWPs) within a process. SunOS 5.3 maps each LWP directly onto a separate kernel thread. Since kernel threads are the units of PE scheduling and execution in SunOS, this mapping enables multiple LWPs (each executing protocol processing tasks in an unbound thread) to run in parallel on the SPARCserver's PEs. Rescheduling and synchronizing a SunOS LWP involves a kernel-level context switch. The time required to perform a context switch between two LWPs was measured to be approximately 30 microseconds. During this time, the OS performs system-related overhead (such as flushing register windows, instruction and data caches, instruction pipelines, and translation look-aside buffers) on the PE and therefore does not perform protocol processing.
Measurements also revealed that it takes approximately 3 microseconds to acquire or release a Mutex object implemented with a SunOS adaptive spin-lock. Likewise, measurements indicated that approximately 90 microseconds are required to synchronize two LWPs using Condition objects implemented with SunOS sleep-locks. The larger overhead of the Condition operations compared with the Mutex operations results from the more complex locking algorithms involved, as well as the additional context switching incurred by the SunOS sleep-locks that implement the Condition objects.

Communication Protocols

Two types of protocol stacks are used in the experiments, one based on the connectionless UDP transport protocol and the other based on the connection-oriented TCP transport protocol. The protocol stacks contain the data-link, transport, and presentation layers. Preliminary tests indicated that the PE, bus, and memory performance of the SunOS multi-processor platform was capable of processing messages through the protocol stack at a much faster rate than the platform's 10 Mbps Ethernet network interface could handle. Therefore, for the process architecture experiments, the network interface was simulated with a single-copy pseudo-device driver operating in loop-back mode. Likewise, the routing and segmentation/reassembly functions of network layer processing were omitted from these experiments, since both the sender and receiver portions of the test programs reside on the same host machine. The presentation layer is included in the experiments since it represents a major bottleneck in high-performance communication systems, due primarily to the large amount of data movement overhead it incurs. Both the connectionless and connection-oriented protocol stacks were developed by specializing existing components in the ASX framework via techniques involving inheritance and parameterized types.
These techniques are used to hold the protocol stack functionality constant while systematically varying the process architecture. For example, each protocol layer is implemented as a Module whose read-side and write-side inherit standard interfaces and implementations from the Queue class. Likewise, the synchronization and demultiplexing mechanisms required by a protocol layer or protocol stack are parameterized using template arguments that are instantiated based on the type of process architecture being tested. Data-link layer processing in each protocol stack is performed by the DLP Module. This Module transforms network packets received from a network interface or loop-back device into a canonical message format used by the Stream components. The transport layer components of the protocol stacks are based on the UDP and TCP implementations in the BSD 4.3 Reno release. The 4.3 Reno TCP implementation contains the TCP header prediction enhancements, as well as the slow start algorithm and congestion avoidance features. The UDP and TCP transport protocols are configured into the ASX framework via the UDP and TCP Modules, respectively. Presentation layer functionality is implemented in the XDR Module using marshalling routines produced by the ONC eXternal Data Representation (XDR) stub generator. The ONC XDR stub generator automatically translates a set of type specifications into marshalling routines that encode/decode implicitly-typed messages before/after they are exchanged among hosts that may possess heterogeneous processor byte-orders. The ONC presentation layer conversion mechanisms consist of a type specification language (XDR) and a set of library routines that implement the appropriate encoding and decoding rules for built-in integral types (e.g., char, short, int, and long) and real types (e.g., float and double).
In addition, these library routines may be combined to produce marshalling routines for arbitrary user-defined composite types (such as records/structures, unions, arrays, and pointers). Messages exchanged via XDR are implicitly typed, which improves marshalling performance at the expense of flexibility. The XDR functions selected for both the connectionless and connection-oriented protocol stacks convert incoming and outgoing messages into and from variable-sized arrays of structures containing both integral and real values. This conversion processing involves byte-order conversions, as well as dynamic memory allocation and deallocation.

Process Architectures

Design of the Task-based Process Architecture

Figure illustrates the ASX framework components that implement a task-based process architecture for the TCP-based connection-oriented and UDP-based connectionless protocol stacks. Protocol-specific processing for the data-link and transport layers is performed in two Modules clustered together into one thread. Likewise, presentation layer and application interface processing is performed in two Modules clustered into a separate thread. These threads cooperate in a producer/consumer manner, operating in parallel on the header and data fields of multiple incoming and outgoing messages. The LP_DLP::svc and LP_XDR::svc member functions perform service-specific processing in parallel within a Stream of Modules. When messages are inserted into a Queue's Message_List, the svc member function dequeues the messages and performs the Queue subclass's service-specific processing tasks (such as data-link layer processing or presentation layer processing). Depending on the ``direction'' of a message (i.e., incoming or outgoing), each cluster of Modules performs its associated protocol functions before passing the message to an adjacent Module running asynchronously in a separate thread.
Messages are not copied when passed between adjacent Queues, since all threads share a common address space. However, moving messages between threads typically invalidates per-PE data caches. The connectionless and connection-oriented task-based protocol stacks are designed in a similar manner. The primary difference is that the objects in the connectionless transport layer Module implement the simpler UDP functionality, which does not generate acknowledgements, keep track of round-trip time estimates, or manage congestion windows. The task-based process architecture test driver always uses PEs in multiples of two: one for the cluster of data-link and transport layer processing Modules and the other for the cluster of presentation layer and application interface processing Modules.

Design of the Message-based Process Architecture

Figure illustrates a message-based process architecture for the connection-oriented protocol stack. When an incoming message arrives, it is handled by the MP_DLP::svc member function, which manages a pool of pre-spawned threads. Each message is associated with a separate thread that escorts the message synchronously through a series of interconnected Queues in a Stream by making an upcall to the put member function in the adjacent Queue at each higher layer in the protocol stack. Each put member function executes the protocol tasks associated with that layer. The MP_TCP::put member function uses Mutex objects that serialize access to per-connection control blocks as separate messages from the same connection ascend the protocol stack in parallel. The connectionless message-based protocol stack is structured in a similar manner. However, the connectionless protocol stack performs the simpler set of UDP functionality. Unlike the MP_TCP::put member function, the MP_UDP::put member function processes each message concurrently and independently, without explicitly preserving inter-message ordering.
This reduces the number of synchronization operations required to locate and update shared resources.

C++ Features Used to Simplify Process Architecture Implementation

Many of the protocol functions, process architecture synchronization mechanisms, and ASX framework support components (such as demultiplexing and message buffering classes) are reused throughout the process architecture test programs described above. For example, process architecture-specific synchronization strategies may be instantiated by selectively instrumenting protocol functions with different types of mutual exclusion mechanisms. When combined with C++ language features such as inheritance and parameterized types, these mechanisms help to decouple protocol processing functionality from the concurrency control scheme used by a particular process architecture. For example, objects of class Multiplexor use a Map_Manager component to demultiplex incoming messages to Modules. Map_Manager is a search structure container class that is parameterized by an external ID, an internal ID, and a mutual exclusion mechanism, as follows:

    template <class EX_ID, class IN_ID, class MUTEX>
    class Map_Manager
    {
    public:
      // ...
      int find (EX_ID ex_id, IN_ID &in_id);
    private:
      MUTEX lock;
      // ...
    };

The type of MUTEX that this template class is instantiated with depends upon the particular choice of process architecture. For instance, the Map_Manager used in the message-based implementation of the TCP protocol stack described in Section is instantiated with the following class parameters:

    // External ID: TCP address; internal ID: TCB; lock: Mutex
    typedef Map_Manager <TCP_Addr, TCB, Mutex> MP_Map_Manager;

This particular instantiation of Map_Manager locates the transport control block (TCB) associated with the TCP address of an incoming message. The Map_Manager class uses the Mutex class described in Section to ensure that its find member function executes as a critical section. This prevents race conditions with other threads that are inspecting or inserting entries into the connection map in parallel.
In contrast, the task-based process architecture implementation of the TCP protocol stack described in Section does not require the same type of concurrency control within a connection. In this case, demultiplexing is performed within the svc member function in the LP_DLP read Queue of the data-link layer Module, which runs in its own separate thread of control. Therefore, the Map_Manager used for the connection-oriented task-based process architecture is instantiated with a different MUTEX class, as follows:

    // External ID: TCP address; internal ID: TCB; lock: Null_Mutex
    typedef Map_Manager <TCP_Addr, TCB, Null_Mutex> LP_Map_Manager;

The acquire and release member functions in the Null_Mutex class are implemented as essentially ``no-op'' inline functions that may be removed completely by the compiler optimizer. The ASX framework employs a C++ idiom that uses a class constructor and destructor to acquire and release locks on synchronization objects, respectively. The Mutex_Block class illustrated below defines a ``block'' of code over which a Mutex object is acquired and then automatically released when the block of code is exited and the object goes out of scope:

    template <class MUTEX>
    class Mutex_Block
    {
    public:
      // Acquire the lock when the block is entered...
      Mutex_Block (MUTEX &m): mutex (m) { this->mutex.acquire (); }
      // ...and release it when the object goes out of scope.
      ~Mutex_Block (void) { this->mutex.release (); }
    private:
      MUTEX &mutex;
    };

This C++ idiom is used in the implementation of the Map_Manager::find member function, as follows:

    template <class EX_ID, class IN_ID, class MUTEX> int
    Map_Manager<EX_ID, IN_ID, MUTEX>::find (EX_ID ex_id, IN_ID &in_id)
    {
      Mutex_Block<MUTEX> monitor (this->lock);

      if (/* ex_id is successfully located */)
        return 0;
      else
        return -1;
    }

When the find member function returns, the destructor for the Mutex_Block object automatically releases the Mutex lock. Note that the Mutex lock is released regardless of which arm of the if/else statement returns from the find member function. In addition, this C++ idiom also properly releases the lock if an exception is raised during processing in the body of the find member function.
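The fragments above omit the search structure itself. A compilable sketch that fills the gaps, assuming std::map as the search structure and a thin Mutex wrapper around std::mutex, might look as follows; the bind member function and all implementation details are assumptions rather than the ASX code.

```cpp
#include <map>
#include <mutex>

// Lock with the acquire/release interface used in the text.
class Mutex {
public:
  void acquire() { m_.lock(); }
  void release() { m_.unlock(); }

private:
  std::mutex m_;
};

// Lock whose operations do nothing; calls inline away entirely.
class Null_Mutex {
public:
  void acquire() {}
  void release() {}
};

// Acquires the lock in the constructor and releases it in the destructor,
// so every exit path from the enclosing scope unlocks it.
template <class MUTEX>
class Mutex_Block {
public:
  explicit Mutex_Block(MUTEX &m) : mutex_(m) { mutex_.acquire(); }
  ~Mutex_Block() { mutex_.release(); }
  Mutex_Block(const Mutex_Block &) = delete;
  Mutex_Block &operator=(const Mutex_Block &) = delete;

private:
  MUTEX &mutex_;
};

// Search structure parameterized by external ID, internal ID, and lock type.
template <class EX_ID, class IN_ID, class MUTEX>
class Map_Manager {
public:
  // Associates ex_id with in_id; returns -1 if ex_id is already bound.
  int bind(const EX_ID &ex_id, const IN_ID &in_id) {
    Mutex_Block<MUTEX> monitor(this->lock_);
    return map_.emplace(ex_id, in_id).second ? 0 : -1;
  }

  // Copies the internal ID bound to ex_id into in_id; returns 0 or -1.
  int find(const EX_ID &ex_id, IN_ID &in_id) {
    Mutex_Block<MUTEX> monitor(this->lock_);
    auto it = map_.find(ex_id);
    if (it != map_.end()) {
      in_id = it->second;
      return 0;   // monitor's destructor releases the lock here...
    } else {
      return -1;  // ...and here
    }
  }

private:
  MUTEX lock_;
  std::map<EX_ID, IN_ID> map_;
};
```

Instantiating Map_Manager<int, int, Mutex> gives a map that is safe to share between threads, while Map_Manager<int, int, Null_Mutex> compiles the locking away for a map confined to one thread, which is the trade-off expressed by the two typedefs above.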
Process Architecture Experiment Results

This section presents measurement results obtained from the data reception portion of the connection-oriented and connectionless protocol stacks implemented using the task-based and message-based process architectures described above. Three types of measurements were obtained for each combination of process architecture and protocol stack: total throughput, context switching overhead, and synchronization overhead. Total throughput was measured by holding the protocol functionality, application traffic patterns, and network interfaces constant while systematically varying the process architecture to determine the resulting performance impact. Each benchmarking session consisted of transmitting 10,000 4 Kbyte messages through an extended version of the widely available ttcp protocol benchmarking tool. The original ttcp program measures the processing resources and overall user and system time required to transfer data between a transmitter process and a receiver process communicating via TCP or UDP. The flow of data is unidirectional, with the transmitter flooding the receiver with a user-selected number of data buffers. Various sender and receiver parameters (such as the number of data buffers transmitted and the size of application messages and protocol windows) may be selected at run-time. The version of ttcp used in our experiments was enhanced to allow a user-specified number of communicating applications to be measured simultaneously. This feature was used to measure the impact of multiple connections on process architecture performance (two connections were used to test the connection-oriented protocols). The ttcp program was also modified to use the ASX-based protocol stacks configured via the process architectures described in Section . To measure the impact of parallelism on throughput, each test was run successively using 1, 2, 3, and 4 PEs, with 1, 2, 3, or 4 LWPs, respectively.
Furthermore, each test was performed several times to detect the amount of spurious interference incurred from other internal OS tasks (the variance between test runs proved to be insignificant). Context switching and synchronization measurements were obtained to help explain differences in the throughput results. These metrics were obtained from the SunOS 5.3 /proc file system, which records the number of voluntary and involuntary context switches incurred by threads in a process, as well as the amount of time spent waiting to obtain and release locks. Figure illustrates throughput (measured in Mbits/sec) as a function of the number of PEs for the task-based and message-based process architectures used to implement the connection-oriented (CO) and connectionless (CL) protocol stacks. The results in this figure indicate that parallelization clearly improves performance. Each 4 Kbyte message required an average of between 3.2 and 3.9 milliseconds to process when 1 PE was used, but only 0.9 to 1.9 milliseconds when 4 PEs were used. However, the message-based process architectures significantly outperformed their task-based counterparts as the number of PEs increased from 1 to 4. For example, the connection-oriented task-based process architecture using 4 PEs (approximately 16 Mbits/sec, or 1.92 milliseconds per-message processing time) performed only slightly better than the message-based process architecture using 2 PEs (14 Mbits/sec, or 2.3 milliseconds per-message processing time). Moreover, had a larger number of PEs been available, it appears likely that the performance improvement gained from parallel processing in the task-based process architectures would have leveled off sooner than in the message-based tests, due to the higher rate of growth in context switching and synchronization shown in Figure and Figure .
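The per-message times and throughput figures above are related by simple arithmetic. Taking a 4 Kbyte message as 4,096 bytes (32,768 bits), a per-message processing time t implies a throughput T of:

```latex
T = \frac{32{,}768\ \text{bits}}{t}, \qquad
\frac{32{,}768\ \text{bits}}{1.92\ \text{ms}} \approx 17.1 \times 10^{6}\ \text{bits/s}, \qquad
\frac{32{,}768\ \text{bits}}{2.3\ \text{ms}} \approx 14.2 \times 10^{6}\ \text{bits/s}
```

These values are close to the approximately 16 and 14 Mbits/sec quoted above; the residual difference depends on rounding and on whether a megabit is taken as 10^6 or 2^20 bits (1.92 ms corresponds to about 16.3 Mbits/sec under the latter convention).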
Figure illustrates the number of voluntary and involuntary context switches incurred by the process architectures measured in this study. A voluntary context switch is triggered when a thread puts itself to sleep until certain resources (such as I/O devices or synchronization locks) become available. For example, when a protocol task attempts to acquire a resource that may not become available immediately (such as obtaining a message from an empty Message List), the protocol task puts itself to sleep by invoking the wait member function of a Condition object. This action causes the OS kernel to suspend the current thread and perform a context switch to another thread that is capable of executing protocol tasks immediately. Figure indicates the number of involuntary context switches incurred by the process architectures. An involuntary context switch occurs when the OS kernel preempts a running thread. For example, the OS preempts running threads periodically when their time-slice expires in order to schedule other threads to execute.

As shown in Figure , the task-based tests incur significantly more voluntary context switches than the message-based process architectures, which accounts for the substantial difference in overall throughput. The primary reason for this difference is that the locking mechanisms used for the message-based process architectures utilize adaptive spin-locks (which rarely trigger a context switch), rather than the sleep-locks used for the task-based process architectures (which trigger a context switch whenever a lock is contended). The task-based process architectures also exhibited greater levels of involuntary context switching. This is due mostly to the fact that they required more time to process the 10,000 messages and were therefore preempted a greater number of times.

Figure indicates the amount of execution time /proc reported as being devoted to waiting to acquire and release locks in the connectionless and connection-oriented benchmark programs.
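The two blocking behaviors discussed in this section, sleeping on a Condition when a Message List is empty and choosing between spin-locks and sleep-locks, can be sketched in modern C++. The `MessageList` and `SpinLock` classes below are hypothetical illustrations, not the ASX classes. The list's lock type is a template parameter (`std::mutex` stands in for a sleep-lock), and `SpinLock` is a simplified spin-then-yield lock rather than a true adaptive mutex, which spins only while the lock holder is running on another PE.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Simplified spin-then-yield lock (hypothetical). It busy-waits for a bounded
// number of attempts and then yields the processor, so a briefly held lock is
// usually acquired without a context switch.
class SpinLock {
public:
    void lock() {
        int spins = 0;
        while (locked_.exchange(true, std::memory_order_acquire)) {
            if (++spins == kSpinLimit) {
                std::this_thread::yield();
                spins = 0;
            }
        }
    }
    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));  // must be held
        locked_.store(false, std::memory_order_release);
    }

private:
    static constexpr int kSpinLimit = 1000;
    std::atomic<bool> locked_{false};
};

// Hypothetical message list whose locking strategy is a template parameter,
// e.g., MessageList<Msg, std::mutex> or MessageList<Msg, SpinLock>.
template <typename Message, typename Lock = std::mutex>
class MessageList {
public:
    void enqueue(Message msg) {
        {
            std::lock_guard<Lock> guard(lock_);
            queue_.push_back(std::move(msg));
        }
        not_empty_.notify_one();  // wake one sleeping consumer, if any
    }

    // If the list is empty, the caller blocks on the condition variable until
    // a message arrives; blocking like this is what the kernel records as a
    // voluntary context switch.
    Message dequeue() {
        std::unique_lock<Lock> guard(lock_);
        not_empty_.wait(guard, [this] { return !queue_.empty(); });
        Message msg = std::move(queue_.front());
        queue_.pop_front();
        return msg;
    }

private:
    Lock lock_;
    std::condition_variable_any not_empty_;  // works with any lockable type
    std::deque<Message> queue_;
};
```

Instantiating the list with either lock type changes only the synchronization strategy, leaving the code that calls `enqueue` and `dequeue` unchanged.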
As with the context switching benchmarks, the message-based process architectures incurred considerably less synchronization overhead, particularly when 4 PEs were used. As before, the spin-locks used by the message-based process architectures reduce the amount of time spent synchronizing, in comparison with the sleep-locks used by the task-based process architectures.

Section 4 Concluding Remarks

Despite an increase in the availability of operating system and hardware platforms that support networking and parallel processing, developing distributed applications that effectively utilize parallel processing remains a complex and challenging task. The ADAPTIVE Service eXecutive (ASX) provides an extensible object-oriented framework that simplifies the development of distributed applications on shared memory multi-processor platforms. The ASX framework employs a variety of advanced OS mechanisms (such as multi-threading and explicit dynamic linking), object-oriented design techniques (such as encapsulation, hierarchical classification, and deferred composition), and C++ language features (such as parameterized types, inheritance, and dynamic binding) to enhance software quality factors (such as robustness, ease of use, portability, reusability, and extensibility) without degrading application performance. In general, the object-oriented techniques and C++ features enhance the software quality factors, whereas the advanced OS mechanisms improve application functionality and performance.

A key aspect of concurrent distributed application performance involves the type of process architecture selected to structure parallel processing of tasks in an application. Empirical benchmark results reported in this paper indicate that the task-based process architectures incur relatively high levels of context switching and synchronization overhead, which significantly reduces their performance.
Conversely, the message-based process architectures incur much less context switching and synchronization overhead, and therefore exhibit higher performance. The ASX framework contributed to these performance experiments by providing a set of object-oriented components that decouple the protocol-specific functionality from the underlying process architecture, thereby simplifying experimentation.

The ASX framework components described in this paper are freely available via anonymous ftp from ics.uci.edu in the file gnu/C++ wrappers.tar.Z. This distribution contains complete source code, documentation, and example test drivers for the C++ components developed as part of the ADAPTIVE project at the University of California, Irvine. Components in the ASX framework have been ported to both UNIX and Windows NT and are currently being used in a number of commercial products, including the Bellcore Q.port ATM signaling software product, the Ericsson EOS family of PBX monitoring applications, and the network management portion of the Motorola Iridium mobile communications system.