The following paper was originally published in the Proceedings of the USENIX Conference on Object-Oriented Technologies (COOTS), Monterey, California, June 1995.

For more information about the USENIX Association contact:
1. Phone: 510 528-8649
2. FAX: 510 548-5738
3. Email: office@usenix.org
4. WWW URL: https://www.usenix.org

Object-Oriented Components for High-speed Network Programming

Douglas C. Schmidt, Tim Harrison, and Ehab Al-Shaer
schmidt@cs.wustl.edu, harrison@cs.wustl.edu, and ehab@cs.wustl.edu
Department of Computer Science
Washington University
St. Louis, MO 63130
(314) 935-7538

Abstract

This paper makes two contributions to the development and evaluation of object-oriented communication software. First, it reports performance results from benchmarking several network programming mechanisms (such as sockets and CORBA) on Ethernet and ATM networks. These results illustrate that developers of bandwidth-intensive and delay-sensitive applications (such as interactive medical imaging or teleconferencing) must evaluate their performance requirements and the efficiency of their communication infrastructure carefully before adopting a distributed object solution. Second, the paper describes the software architecture and design principles of the ACE object-oriented network programming components. These components encapsulate UNIX and Windows NT network programming interfaces (such as sockets, TLI, and named pipes) with C++ wrappers. Developers of object-oriented communication software have traditionally had to choose between the high-performance, lower-level interfaces provided by sockets or TLI and the less efficient, higher-level interfaces provided by communication frameworks like CORBA or DCE. ACE represents a midpoint in the solution space by improving the correctness, programming simplicity, portability, and reusability of performance-sensitive communication software.

Introduction

Distributed object computing (DOC) frameworks like the Common Object Request Broker Architecture (CORBA), OODCE, and OLE/COM are well-suited for applications that exchange richly typed data via request-response or oneway communication. However, current implementations of DOC frameworks may be less suitable for an important class of bandwidth-intensive and delay-sensitive applications that stream relatively simple datatypes over high-speed networks. Medical imaging, interactive teleconferencing, and video-on-demand are common examples of these streaming applications.

Streaming applications with stringent throughput and delay requirements are ideal candidates for high-speed networks such as ATM and FDDI. However, these applications may not be able to tolerate the overhead associated with contemporary DOC frameworks. This overhead stems from non-optimized presentation layer conversions, data copying, and memory management; inefficient receiver-side demultiplexing and dispatching operations; synchronous stop-and-wait flow control; and non-adaptive retransmission timer schemes.

Meeting the throughput demands of streaming applications has traditionally involved direct access to network programming interfaces such as sockets or System V TLI. These lower-level interfaces are efficient since they omit unnecessary functionality (such as presentation layer conversions for ASCII data) and allow fine-grained control over memory management, protocol buffering, demultiplexing, and flow control.
However, conventional network programming interfaces are low-level, non-portable, and non-typesafe, which complicates programming and permits subtle run-time errors. For instance, communication endpoints in the socket interface are identified by weakly-typed integer handles (also known as socket descriptors). Weak type-checking increases the potential for run-time errors since compilers cannot detect or prevent improper use of handles. Thus, operations can be applied to handles incorrectly (such as invoking a read or write on a passive-mode handle that can only accept connections).

Traditionally, developers of high-performance streaming applications had to choose between two solutions:

Higher-level, but less efficient network programming interfaces -- such as DOC frameworks or RPC toolkits;

Lower-level, but more efficient network programming interfaces -- such as sockets or TLI.

This paper describes object-oriented network programming components that provide a midpoint in the solution space. These components are part of the ACE toolkit, which encapsulates conventional network programming interfaces with a family of C++ wrappers. The ACE toolkit improves the correctness, ease of use, portability, and reusability of communication software without sacrificing performance.

This paper is organized as follows: Section compares the performance of several network programming mechanisms (C sockets, C++ wrappers for sockets, and two implementations of CORBA) for a representative streaming application over Ethernet and ATM networks; Section outlines the design of the object-oriented ACE components that encapsulate UNIX and Windows NT network programming interfaces (such as sockets, TLI, STREAM pipes, and named pipes); Section illustrates the differences between programming with C sockets, ACE, and CORBA; Section summarizes the design principles of the ACE wrappers; and Section presents concluding remarks.

Performance Experiments

This section describes performance results from comparing several network programming mechanisms that transfer large streams of data using TCP/IP over Ethernet and ATM networks. The network programming mechanisms compared below include C sockets, C++ wrappers for sockets, and two implementations of CORBA. The benchmark tests are representative of applications written by the authors for the Motorola Iridium project (which is a next-generation satellite-based global personal communication system) and Project Spectrum (which is an enterprise-wide medical imaging system that transports radiology images across high-speed ATM LANs and WANs).

Test Platform and Benchmarks

The performance results in this section were collected using a Bay Networks LattisCell 10114 ATM switch connected to two uni-processor SPARCstation 20 Model 50s. The LattisCell is a 16-port OC-3 switch running at 155 Mbps per port. The SPARCstations contain 100 MIPS SuperSPARC CPUs running SunOS 5.4. The SunOS 5.4 TCP/IP protocol stack is implemented using the STREAMS communication framework. Each SPARCstation 20 has 64 Mbytes of RAM and an ENI-155s-MF ATM adaptor card, which supports 155 Mbits/sec (Mbps) SONET multimode fiber. The Maximum Transmission Unit (MTU) size of a SONET frame on the ENI ATM adaptor is 9,180 bytes. Each ENI card has 512 Kbytes of on-board memory, of which 32 Kbytes is allotted per ATM virtual circuit connection for receiving and transmitting frames (for a total of 64K per connection). This allows up to 8 connections per card.
Data for the experiments was produced and consumed by an extended version of the widely available ttcp protocol benchmarking tool. This tool measures end-to-end data transfer throughput in Mbps from a transmitter process to a remote receiver process. The flow of user data is uni-directional, with the transmitter flooding the receiver with a user-specified number of data buffers. Various sender and receiver parameters (such as the number of data buffers transmitted, the size of data buffers, and the size of the socket transmit and receive queues) may be selected at run-time. The following versions of ttcp were implemented and benchmarked: C version -- this is the standard ttcp program implemented in C. It uses C socket calls to transfer and receive data via TCP/IP. ACE version -- this version replaces all C socket calls in ttcp with the C++ wrappers for sockets provided by the ACE network programming components (version 3.2) . The ACE wrappers encapsulate sockets with efficient and typesafe C++ interfaces. CORBA versions -- two implementations of CORBA were used: version 1.3 of Orbix from IONA Technologies and version 1.2 of ORBeline from Post Modern Computing. These versions replace all C socket calls in ttcp with stubs and skeletons generated from a pair of CORBA IDL definitions. One IDL definition uses a sequence parameter for the data buffer and the other uses a string parameter. Each version of ttcp was compiled using SunC++ 4.0.1 with the highest level of optimization ( -O4 ). To control for confounding factors the timing mechanisms, command-line options, socket options, and communication protocols were held constant for all implementations of ttcp . Only the connection establishment and data transfer mechanisms were varied. Results We ran a series of tests that transferred 64 Mbytes of user data in buffers ranging from 1 byte to 128 Kbytes using TCP/IP over Ethernet and ATM networks. Data buffers were run in increments of 1 byte, 1K, 2K, 4K, 8K, 16K, 32K, 64K, and 128K sizes. Two different sizes for socket queues were also used: 8K (the default on SunOS 5.4) and 64K (the maximum size supported by SunOS 5.4). Each test was run 20 times to account for performance variation due to transient load on the networks and hosts. The variance between runs was very low since the tests were conducted on otherwise unused networks. Figure summarizes the performance results for all the benchmarks using 64K socket queues over a 155 Mbps ATM link and a 10 Mbps Ethernet (the 8K socket queue results are presented below and Tables and summarize the results for all the tests). The C and ACE C++ wrapper versions of ttcp obtained the highest throughput: 62 Mbps using 8K data buffers. In contrast, the Orbix and ORBeline CORBA versions of ttcp peaked at around 39 Mbps with 64K data buffers using IDL sequences . The results for Ethernet show much less variation, with the performance for all tests ranging from around 8 to 8.7 Mbps with 64K socket queues. None of the Ethernet benchmarks ran faster than 8.7 Mbps, which is 87 percent of the maximum speed of a 10 Mbps Ethernet. Although the absolute throughput of ttcp is much faster over ATM, the relative utilization of the network channel speed was much lower (62 Mbps represents only 40 percent of the 155 Mbps ATM link). The disparity between network channel speed and end-to-end application throughput is known as the throughput preservation problem , where only a portion of the available bandwidth is actually delivered to applications. 
This problem stems from operating system and protocol processing overhead (such as data movement, context switching, and synchronization). As shown in Section , the throughput preservation problem is exacerbated by contemporary implementations of DOC frameworks like CORBA, which copy data multiple times during fragmentation/reassembly, marshalling, and demarshalling. Sections and examine these performance results in detail and Section presents recommendations based on the results.

C and ACE Wrapper Implementations of TTCP

Figure illustrates the performance results from the C and ACE wrapper versions of ttcp over ATM and Ethernet. The performance of C sockets and ACE C++ wrappers is roughly equivalent, indicating there is no significant performance penalty for using the ACE wrappers. Both peak at 62 Mbps over ATM using 8K data buffers and 64K socket queues. When the data buffers exceeded 8K, performance began to decline, leveling off at around 48 Mbps with 64K data buffers. This behavior is caused primarily by the MTU size of the ATM network, which is 9,180 bytes (the MTU size of a SONET frame). When data buffers exceed the MTU size they are fragmented and reassembled, thereby lowering performance.

Figure also illustrates the impact of socket queue size on throughput. Larger socket queues increase the TCP window size, which allows the transmission of multiple TCP segments back-to-back. In the case of ATM, increasing the socket queue from 8K to 64K improves ttcp performance significantly, from 23 Mbps to 62 Mbps. The Ethernet results for large and small socket queues are more similar than the ATM results. They peak at 8.4 Mbps with 8K socket queues and 8.7 Mbps with 64K socket queues. In both cases, the factor limiting performance is the slow speed of the network.

CORBA Implementations of TTCP

Figure illustrates the results of measuring two versions of ttcp implemented with two different versions of CORBA. The CORBA implementations were developed using single-threaded versions of Orbix 1.3 and ORBeline 1.2. At the time these tests were performed, neither Orbix nor ORBeline fully supported the OMG 2.0 CORBA standard. This complicated the CORBA versions of ttcp somewhat since different implementations were required to account for differences in Orbix and ORBeline.

Extending ttcp to use CORBA required several modifications to the original C/socket code. All C socket calls were replaced with stubs and skeletons generated from a pair of CORBA interface definitions. One IDL interface uses a sequence to transmit the data and the other IDL interface uses a string, as follows:

typedef sequence<char> ttcp_sequence;

interface TTCP_Sequence {
  oneway void send (in ttcp_sequence ttcp_seq);
};

interface TTCP_String {
  oneway void send (in string ttcp_string);
};

The send operations use oneway semantics since the ttcp benchmarks measure the performance of uni-directional data transfers. The client-side of ttcp was modified to obtain object references to the server-side TTCP_Sequence and TTCP_String object implementations, as follows:

// Use locator service to acquire bindings.
TTCP_String *t_str = TTCP_String::_bind ();
TTCP_Sequence *t_seq = TTCP_Sequence::_bind ();

Data buffers of the appropriate size were initialized and then transmitted by calling the IDL-generated send stubs, as follows:

// String transfer.
char *buffer = new char[buffer_size];
// Initialize data in char * buffer...

while (--buffers_sent >= 0)
  t_str->send (buffer);

// Sequence transfer.
TTCP_Sequence sequence_buffer;
// Initialize data in TTCP_Sequence buffer...

while (--buffers_sent >= 0)
  t_seq->send (sequence_buffer);

The server-side was modified to create object implementations for TTCP_Sequence and TTCP_String. CORBA IDL compilers generate skeletons that translate IDL interface definitions (such as TTCP_Sequence) into C++ base classes (such as TTCP_SequenceBOAImpl). Each IDL operation (such as oneway void send) is mapped to a corresponding C++ pure virtual method (such as virtual void send). Programmers then define C++ derived classes that override these virtual methods to implement application-specific functionality, as follows. (Both CORBA implementations of ttcp used inheritance since ORBeline does not support Orbix's ``TIE'' technique, which uses object composition to tie application-specific classes to the generated IDL skeletons.)

// Implementation class for IDL interface
// that inherits from automatically-generated
// CORBA skeleton class.
class TTCP_Sequence_i : virtual public TTCP_SequenceBOAImpl
{
public:
  TTCP_Sequence_i (void): nbytes_ (0) {}

  // Upcall invoked by the CORBA skeleton.
  virtual void send (const ttcp_sequence &ttcp_seq,
                     CORBA::Environment &IT_env)
  {
    this->nbytes_ += ttcp_seq._length;
    // ...
  }

private:
  // Keep track of bytes received.
  u_long nbytes_;
};

The server-side used the CORBA impl_is_ready event loop to demultiplex incoming requests to the appropriate object implementation, as follows:

int main (int argc, char *argv[])
{
  // Implements the Sequence object.
  TTCP_Sequence_i ttcp_sequence;

  // Implements the String object.
  TTCP_String_i ttcp_string;

  // Tell the ORB that the objects are active.
  CORBA::BOA::impl_is_ready ();
  /* NOTREACHED */
  return 0;
}

Porting ttcp to use CORBA over ATM demonstrated the importance of having sufficient hooks to manipulate underlying OS mechanisms (such as transport layer and socket layer options) that significantly affect performance. In particular, high-performance data transfers over TCP and ATM require large socket queues. This is illustrated by the considerable difference in throughput for the 8K and 64K socket queues in Figures and . Orbix provides hooks to enlarge socket queues via setsockopt by invoking a user-defined callback function whenever a new socket is connected. In contrast, it was hard to enlarge the socket queues using ORBeline 1.2 since it did not provide direct access to sockets (subsequent versions of ORBeline will provide this functionality).
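The hook itself amounts to only a few lines of socket-level code. The following is a minimal sketch of the per-connection tuning such a callback would perform; the function name is hypothetical, but setsockopt with the SO_SNDBUF and SO_RCVBUF options is the standard mechanism for enlarging a socket's queues:

#include <sys/types.h>
#include <sys/socket.h>

// Hypothetical per-connection callback: enlarge the socket queues to 64K
// so that TCP can advertise a larger window and keep the ATM link full.
static void enlarge_socket_queues (int handle)
{
  int size = 64 * 1024;

  // Expand the kernel send and receive buffers for this connection.
  setsockopt (handle, SOL_SOCKET, SO_SNDBUF, (char *) &size, sizeof size);
  setsockopt (handle, SOL_SOCKET, SO_RCVBUF, (char *) &size, sizeof size);
}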
By comparing Figure with Figure it is clear that the CORBA-based ttcp implementations ran considerably slower than the C and ACE wrapper versions on the ATM network, particularly for 8K data buffers. The highest throughput (39 Mbps) was obtained by the Orbix sequence implementation using 64K data buffers and 64K socket queues. The performance leveled off beyond 64K data buffers. Unlike the C and ACE wrapper results in Figure , the performance of the CORBA versions did not decrease when the size of the data buffers exceeded 8K. This behavior stems from the higher fixed overhead of CORBA (such as demultiplexing and memory management), which lowers its performance for small buffer sizes. As the buffer size increases, however, the relative impact of this fixed overhead is reduced. However, as the buffers increase in size the overhead of data copying grows, which ultimately limits the throughput achievable with the CORBA implementations.

Further profiling and examination of the IDL stubs and skeletons generated by Orbix and ORBeline revealed that the CORBA overhead stems from the following sources:

Data Copying: The data buffers exchanged between the sender and receiver in ttcp are treated as a stream of untyped bytes. This is similar to the type of data transmitted by streaming applications such as teleconferencing and medical imaging. Since the data is untyped, the CORBA presentation layer need not perform complex marshalling to handle byte-ordering differences between sender and receiver. Although marshalling is not required, the CORBA implementations incurred significant data copying overhead. The UNIX profiler prof was used to pinpoint the sources of this overhead. prof measures the amount of time spent in functions during program execution. Figure lists, for all the tests, the functions where the most time was spent sending and receiving 64 Mbytes using 128K data buffers and 64K socket queues.

The read and write system calls accounted for most of the execution time in the C and ACE wrapper implementations of ttcp. The remaining time for the sender-side was spent preparing the data for transmission. Note that although the data was transmitted as 512 buffers of 128K each, it was read by the receiver in much smaller chunks (around 4K). This illustrates the fragmentation and reassembly performed by the ATM network adaptors.

The read and write system calls dominated the execution of the CORBA implementations, as well. However, unlike the C and ACE wrapper versions, these implementations spent 4 to 15 percent of their time performing other tasks, such as copying and/or inspecting data (memcpy, strcpy, and strlen), checking for activity on other handles (poll), and manipulating signal handlers (sigaction). The highest cost tasks involved data copying. The IDL stubs and sequences copy data multiple times, e.g., from the TCP data buffer into a marshalling buffer, and then again into the parameter passed to the send upcall.

The results in Figure illustrate that the choice of CORBA IDL parameter datatypes has a significant impact on performance. The sequence implementations shown in Figure peaked at 39 Mbps for Orbix and 38 Mbps for ORBeline. In contrast, the string implementations peaked at 34 Mbps for Orbix and 30 Mbps for ORBeline. The performance variation between the sequence and string versions results from differences in their IDL to C++ mappings. In particular, the IDL sequence mapping contains a length field, whereas the string mapping does not. Thus, the generated stubs and skeletons use this length field to avoid searching each sequence parameter for a terminating NUL character. In contrast, the IDL string implementations use strlen to determine the length of their parameters.

The performance variation between Orbix and ORBeline results from differences in their message fragmentation/reassembly implementations, as well as the design of their socket event handling. As shown in Figure , ORBeline copies data approximately three more times than Orbix on the sender and receiver for both sequence and string. In addition, ORBeline invokes the poll and sigaction system calls over 1,000 times. The Orbix implementation does not perform these extra operations, which is one reason why ORBeline's performance is consistently lower than Orbix's in Figure .
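The length-field difference between the two mappings can be illustrated with a simplified sketch; this is an illustration of the cost, not the ORBs' actual marshalling code:

#include <string.h>
#include <stddef.h>

// Simplified stand-in for the generated sequence type: the data is
// accompanied by an explicit length.
struct ttcp_sequence_t
{
  unsigned long length;
  char *buffer;
};

// A sequence parameter already knows how many bytes to marshal ...
inline size_t payload_size (const ttcp_sequence_t &seq)
{
  return seq.length;
}

// ... whereas a string parameter must be scanned for its terminating
// NUL character on every invocation, touching the entire buffer.
inline size_t payload_size (const char *str)
{
  return strlen (str);
}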
Demultiplexing: Each CORBA request message contains the name of its remote operation, represented as a string. Orbix demultiplexes incoming messages to the upcall by performing a linear search through the list of operations in the IDL interface. In the case of ttcp, linear search suffices since there was only one choice (send). However, this strategy does not scale since search time grows linearly with the number of operations in the IDL interface. Moreover, the order of operations will determine the demultiplexing performance. Therefore, operations in Orbix should be ordered by decreasing frequency of use. In contrast, ORBeline uses hashing to determine the appropriate upcall associated with an incoming request. Hashing is likely to scale better for large IDL interfaces, but may be less efficient for small interfaces. Thus, demultiplexing may benefit from adaptive optimizations that select customized strategies depending on the properties of the IDL interface. Alternatively, perfect hashing or some type of integral indexing scheme could be negotiated between sender and receiver to improve performance and to shield developers from having to manually tune their IDL interfaces.

Memory allocation: CORBA-generated skeletons do not know how the user-supplied upcall will use the parameters passed to it from the request message. Thus, they use conservative memory management techniques that dynamically allocate and release copies of messages before and after an upcall, respectively. These memory management policies are important in some circumstances (e.g., if an upcall is used in a multi-threaded application). However, this strategy needlessly increases processing overhead for streaming applications like ttcp that immediately consume their data without modifying it.
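To make the demultiplexing trade-off above concrete, the following sketch contrasts a linear operation lookup with a hashed one; it is purely illustrative and does not reproduce either ORB's actual dispatching code:

#include <string.h>

// An operation table entry: the IDL operation name and the upcall to run.
typedef void (*upcall_t) (void *request);
struct op_entry { const char *name; upcall_t upcall; };

// Linear search (cf. Orbix): cost grows with the number of operations
// and depends on where the operation sits in the table.
upcall_t lookup_linear (op_entry table[], int n, const char *op)
{
  for (int i = 0; i < n; i++)
    if (strcmp (table[i].name, op) == 0)
      return table[i].upcall;
  return 0;
}

// Hashed lookup (cf. ORBeline): roughly constant cost per request once
// the table is built, at the price of hashing the name on every message.
// Collision handling is omitted for brevity.
upcall_t lookup_hashed (op_entry buckets[], int n_buckets, const char *op)
{
  unsigned long h = 0;
  for (const char *p = op; *p != '\0'; p++)
    h = 31 * h + (unsigned char) *p;

  op_entry &e = buckets[h % n_buckets];
  return (e.name != 0 && strcmp (e.name, op) == 0) ? e.upcall : 0;
}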
Evaluation and Recommendations

Section compared the performance of C, ACE wrapper, and CORBA versions of ttcp in terms of their ability to stream large quantities of data using TCP/IP over Ethernet and ATM networks. Tables and summarize the results for all the ATM and Ethernet tests, respectively. All tests perform roughly the same on Ethernet. However, the data copying overhead of the CORBA implementations significantly limits their throughput on ATM. This illustrates that the overhead of CORBA implementations may not be revealed until the network is no longer the limiting factor. In addition, the profiler results in Figure illustrate that small design and implementation differences have a large performance impact on high-speed networks.

As users and organizations migrate to high-speed networks, the performance limitations of contemporary CORBA implementations will become more evident. This should encourage vendors to optimize the performance of their ORBs for streaming applications running over high-speed networks such as ATM. Key areas of optimization include presentation layer conversions, memory management and memory copying, and receiver-side demultiplexing and dispatching. In particular, implementations must reduce the number of times that large data buffers are copied on the sender and receiver. The need for these optimizations is widely recognized in the communication protocol community, and prototypes that implement these optimizations are becoming available.

Until these optimizations are widely implemented in production systems, however, we recommend that developers of bandwidth-intensive and delay-sensitive streaming applications on high-speed networks consider the following when adopting a distributed object computing solution:

Carefully measure the performance of the communication infrastructure (i.e., the network/host hardware and software). The ttcp benchmarks and ACE source code described in this paper are freely available and may be obtained via anonymous ftp from ics.uci.edu in the file /C++_wrappers.tar.Z or from URL http://www.cs.wustl.edu/~schmidt/. We encourage others to replicate our ttcp experiments using different implementations of CORBA and other network/host platforms and report the results.

Evaluate tools based on empirical measurements and a thorough understanding of application requirements, rather than adopting a particular communication model or implementation unconditionally.

Integrate higher-level DOC frameworks with high-performance object-oriented encapsulations of lower-level network programming interfaces (such as the ACE socket wrappers described in Section ).

Insist that CORBA implementors provide hooks to manipulate the underlying protocol layer and socket layer options conveniently. It is particularly important to increase the size of the socket queues to the largest values supported by the OS.

Tune the size of transmitted data buffers to match the MTU of the network where appropriate.

Use IDL sequences rather than strings to avoid unnecessary data access.

The performance results and recommendations in this paper are not intended as a criticism of the CORBA model or of particular ORB vendors. It is beyond the scope of this paper to discuss the benefits (such as extensibility and maintainability) of CORBA, as well as its limitations. Clearly, implementations of other DOC frameworks (such as OODCE or OLE/COM) that do not address key sources of overhead on high-speed networks will exhibit similar performance problems.

An Object-Oriented Network Programming Interface

Low-level network programming interfaces like sockets or TLI are difficult to program. They require strict attention to many tedious details, making them hard to learn and error-prone to program. In addition, programming directly to low-level interfaces limits portability and reuse. One solution is to develop applications using higher-level distributed object computing (DOC) frameworks like CORBA. DOC frameworks shield developers from low-level programming details and facilitate a reasonably portable distributed computing platform. As described in the previous section, however, the performance of conventional implementations of CORBA may be inadequate for bandwidth-intensive and delay-sensitive streaming applications on high-speed networks.

One method for satisfying the tension between programming simplicity, portability, and run-time efficiency is to encapsulate lower-level network programming interfaces with object-oriented wrappers. By judicious use of language features (such as inlining and templates) and design patterns (such as Factories, Connectors, and Acceptors) it is possible to create reusable object-oriented components that are typesafe, portable, convenient to program, and efficient. This section outlines the design of the IPC SAP object-oriented network programming components provided by the ACE toolkit.
ACE contains a set of object-oriented networking programming components that perform active and passive connection establishment, data transfer, event demultiplexing, event handler dispatching, routing, dynamic (re)configuration of application services, and concurrency control . IPC SAP stands for ``InterProcess Communication Service Access Point.'' It consists of a family of class categories shown in Figure that encapsulate handle-based network programming interfaces such as sockets ( SOCK SAP ), TLI ( TLI SAP ), UNIX SVR4 STREAM pipes ( SPIPE SAP ), and UNIX named pipes ( FIFO SAP ). These network programming wrappers are designed to improve the correctness, programming simplicity, portability, and reusability of performance-sensitive communication software. This section describes the SOCK SAP socket wrappers, focusing on interface design techniques that shield programmers from shortcomings of C, C++, and existing OS network programming interfaces. Limitations with Sockets Sockets were originally developed in BSD UNIX to provide an interface to the TCP/IP protocol suite . From an application's perspective, a socket is a local endpoint of communication that can be bound to an address residing on a local or a remote host. Sockets are accessed via handles , which are unsigned integers that index into a table maintained in the OS. Handles shield applications from the internal representation of OS data structures. In UNIX and Windows NT, socket handles share the same name space as other handles (such as files, named pipes, and terminal devices). The standard socket interface is defined by the C functions shown in Figure . It contains several dozen routines that perform tasks such as locating address information for network services, establishing and terminating connections, and sending and receiving data . Although the socket interface is widely available and widely used, its design has several notable limitations discussed below. These limitations are shared by other lower-level network programming interfaces such as TLI, STREAM pipes, and named pipes. High Potential for Error In UNIX any integral value can be passed as a handle to a socket routine. Therefore, compilers are unable to detect or prevent the erroneous use of handles. This weak type-checking allows subtle errors to occur at run-time since the socket interface cannot enforce the correct use of routines for different communication roles (such as active vs. passive connection establishment or datagram vs. stream communication). Operations (such as invoking a data transfer operation on a handle designated for establishing connections) may therefore be applied improperly on handles. Figure depicts the following subtle (and common) errors that occur when using the socket interface: Forgetting to initialize the length parameter (used by accept ) to the size of struct sockaddr in ; Forgetting to ``zero-out'' all bytes in the socket address structure; Using an address family type that is inconsistent with the protocol family of the socket ( e.g., PF UNIX vs. AF INET ); Neglecting to use the htons library function to convert port numbers from host byte-order to network byte-order and vice versa; Applying the accept function on a SOCK DGRAM socket; Erroneously omitting parentheses in an assignment expression; Trying to read from a passive-mode socket that should only be used to accept connections; Failing to properly detect and handle ``short-writes'' that occur due to buffering in the OS and flow control in the transport protocol. 
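Although the original figure is not reproduced here, the following sketch conveys the kind of code it depicts: a fragment of an echo server that commits several of the errors listed above yet still compiles (the port number and buffer size are arbitrary):

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int echo_server (unsigned short port)
{
  struct sockaddr_in s_addr;  /* Error: never zeroed out. */
  socklen_t length;           /* Error: not initialized to sizeof s_addr. */
  char buf[4096];
  int s_fd, n_fd;

  s_fd = socket (AF_INET, SOCK_STREAM, 0);

  s_addr.sin_family = AF_INET;
  s_addr.sin_port = port;     /* Error: missing htons () conversion. */

  bind (s_fd, (struct sockaddr *) &s_addr, sizeof s_addr);
  listen (s_fd, 5);

  /* Error: missing parentheses, so n_fd is assigned the result of the
     comparison (0 or 1) rather than the connected socket handle. */
  if (n_fd = accept (s_fd, (struct sockaddr *) &s_addr, &length) == -1)
    return -1;

  /* Error: reads from the passive-mode handle s_fd rather than the
     connected handle, and ignores short-reads and short-writes. */
  read (s_fd, buf, sizeof buf);
  write (n_fd, buf, sizeof buf);
  return 0;
}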
Other common misuses of sockets not shown in this example are forgetting to call listen when creating a passive-mode SOCK STREAM listener socket and miscalculating the length of the pathname in a UNIX-domain socket address (the trailing NUL should not be counted). Several of the problems listed above are classic problems with programming in C. For instance, by omitting the parentheses in the expression if (n_fd = accept (s_fd, (struct sockaddr *) &s_addr, &length) == -1) the value of n fd will always be set to either 0 or 1, depending on whether accept() == -1 . This problem is exacerbated by the fact that accept returns the handle of the newly connected socket. If this handle were passed back as an out parameter there would be less incentive to use accept in an assignment expression. A deeper problem is that C's lack of support for data abstraction and object-oriented programming makes it hard to define typesafe, extensible, and reusable component interfaces. For example, the generic sockaddr socket address structure provides a crude form of inheritance to express the commonality between Internet domain and UNIX domain address structures ( sockaddr in and sockaddr un , respectively). These ``subclass'' address structures require the use of a non-typesafe cast to overlay the sockaddr ``base class.'' In an object-oriented language this commonality would be expressed more cleanly and robustly using inheritance and dynamic binding. In general, the use of unsafe typecasts, combined with the weakly-typed handle-based socket interface, makes it impossible for a compiler to detect mistakes at compile-time. Instead, error checking is deferred until run-time, which complicates error handling and reduces application robustness. Most of the error checking has been omitted in these examples to save space. Naturally, robust programs should check the return values of library and system calls. Complex Interface Sockets support multiple protocol families (such as TCP/IP, IPX/SPX, ISO OSI, and UNIX domain sockets) with a single interface. The socket interface contains many functions to support different communication roles (such as active vs. passive connection establishment), communication optimizations (such as writev / readv that send/receive multiple buffers in a single system call), and protocol options (such as broadcasting, multicasting, asynchronous I/O, and urgent data delivery). Although sockets combine this functionality into a common interface, the result is complex and hard to master. Much of this complexity stems from the overly broad and one-dimensional design of the socket interface. That is, all the routines appear at a single level of abstraction (as shown in Figure ). This design increases the amount of effort required to learn and use sockets correctly. In particular, programmers must understand most of the interface to use any part of it effectively. If the socket routines are examined carefully, however, it is clear that the interface decomposes naturally into the following communication dimensions: Type of communication service -- i.e., stream vs. datagram vs. connected datagram; Communication role -- i.e., active vs. passive (clients are typically active, whereas servers are typically passive); Communication domain -- i.e., local IPC only vs. local/remote IPC. Figure classifies the socket routines according to these dimensions. Since the socket interface is one-dimensional, however, this natural clustering of functionality is obscured. 
Another problem with the socket interface is that its several dozen routines lack a uniform naming convention. Non-uniform naming makes it hard to determine the scope of the socket interface. For example, it is not immediately obvious that socket , bind , accept , and connect routines are related. Other network programming interfaces solve this problem by prepending a common prefix before each routine. For example, a t is prepended before each routine in the TLI library. * The SOCK SAP Class Category SOCK SAP is designed to overcome the limitations with sockets described above. It improves the correctness, ease of learning and ease of use, reusability, and portability of communication software without sacrificing performance. This section outlines the software architecture of SOCK SAP and explains the classes used by the programming examples in Section . Readers who are not interested in this level of detail may want to skip to Section , which discusses the general principles underlying the design of the SOCK SAP wrappers. SOCK SAP consists of around one dozen C++ classes that are related by multiple inheritance and composition. These components and their relationships are illustrated via Booch notation in Figure . Dashed clouds indicate classes and directed edges indicate inheritance relationships between these classes ( e.g., SOCK Stream inherits from SOCK ). The general structure of SOCK SAP corresponds to the taxonomy of communication services , communication roles , and communication domains shown in Figure . It is instructive to compare Figure with Figure . The latter is more concise since it uses C++ wrappers to encapsulate the behavior of multiple socket mechanisms within classes related by inheritance. Each class in SOCK SAP provides an abstract interface for a subset of mechanisms that together comprise the overall class category. The functionality of various types of Internet-domain and UNIX-domain sockets is achieved by inheriting mechanisms from the appropriate classes described below. These classes are presented below according to the groupings shown in Figure . Base Classes: The SOCK and LSOCK classes anchor the inheritance hierarchy and enable subsequent derivation and code sharing. Objects of these classes cannot be instantiated since their constructors are declared in the protected section of the class definition. SOCK: this class is the root of the SOCK SAP hierarchy. It provides mechanisms common to all other classes, such as opening and closing local endpoints of communication and handling options (such as selecting socket queue sizes and enabling group communication). LSOCK: this class provides mechanisms that allow applications to send and receive open file handles between unrelated processes on the local host machine (hence the prefix 'L'). Note that System V and BSD UNIX both support this feature, though Windows NT does not. Other classes inherit from LSOCK to obtain this functionality. SOCK SAP distinguishes the LSOCK* and SOCK* classes on the basis of network address formats and communication semantics. In particular, the LSOCK* classes use UNIX pathnames as addresses and only allow intra-machine IPC. The SOCK* classes, on the other hand, use Internet Protocol (IP) addresses and port numbers and allow both intra- and inter-machine IPC. Connection Establishment: Communication software is typified by asymmetric connection behavior between clients and servers. In general, servers listen passively for clients to initiate connections actively . 
The structure of passive/active connection establishment and data transfer relationships is captured by the following connection-oriented SOCK SAP classes:

SOCK Acceptor and LSOCK Acceptor: The *Acceptor classes are factories that passively establish new endpoints of communication in response to active connection requests. The SOCK Acceptor and LSOCK Acceptor factories produce SOCK Stream and LSOCK Stream connection endpoint objects, respectively.

SOCK Connector and LSOCK Connector: The *Connector classes are factories that actively establish new endpoints of communication. These classes establish connections with remote endpoints and produce the appropriate *Stream object when a connection is established. A connection may be initiated either synchronously or asynchronously. The SOCK Connector and LSOCK Connector factories produce SOCK Stream and LSOCK Stream connection endpoint objects, respectively.

Note that the *Acceptor and *Connector classes do not provide methods for sending or receiving data. Instead, they are factories that produce the *Stream data transfer objects described below. The use of strongly-typed interfaces detects accidental misuse of local and non-local *Stream objects at compile-time. In contrast, the socket interface can only detect these type mismatches at run-time.

Stream Communication: Although establishing connections requires a distinction between active and passive roles, once a connection is established data may be exchanged in any order according to the protocol used by the endpoints. SOCK SAP isolates the data transfer behavior in the following classes:

SOCK Stream and LSOCK Stream: These two classes are produced by the *Acceptor or *Connector factories described above. The *Stream classes provide mechanisms for transferring data between two processes. LSOCK Stream objects exchange data between processes on the same host machine; SOCK Stream objects exchange data between processes that may reside on different host machines. The overloaded send and recv *Stream methods provide standard UNIX write and read semantics. Thus, a send may write less (and a recv may read less) than the requested number of bytes. These ``short-writes'' and ``short-reads'' occur due to buffering in the OS and flow control in the transport protocol. To reduce programming effort, the *Stream classes provide send_n and recv_n methods that transmit and receive exactly n bytes. ``Scatter-read'' and ``gather-write'' methods are also provided to efficiently send and receive multiple buffers of data simultaneously.

Datagram Communication:

SOCK CODgram and LSOCK CODgram: These classes provide a ``connected-datagram'' mechanism, which allows the send and recv operations to omit the address of the service when exchanging datagrams. Note that the connected-datagram mechanism is only a syntactic convenience since there are no additional semantics associated with the data transfer (i.e., datagram delivery remains unreliable). SOCK CODgram inherits mechanisms from the SOCK base class. LSOCK CODgram inherits mechanisms from both SOCK CODgram and LSOCK (which provides the ability to pass file handles).

SOCK Dgram and LSOCK Dgram: These classes provide mechanisms for exchanging datagrams between processes running on local and/or remote hosts. Unlike the connected-datagram classes described above, each send and recv operation must provide the address of the service with every datagram sent or received. LSOCK Dgram inherits all the operations of both SOCK Dgram and LSOCK, but only exchanges datagrams between processes on the same host. The SOCK Dgram class, on the other hand, may exchange datagrams between processes on local and/or remote hosts.
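The distinction between the *Dgram and *CODgram wrappers mirrors the two styles of datagram I/O in the underlying socket interface, sketched below with the BSD calls the classes encapsulate (error handling omitted):

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stddef.h>
#include <unistd.h>

void datagram_styles (const sockaddr_in &peer, const char *buf, size_t len)
{
  // Unconnected datagram socket: the peer address accompanies every
  // transmission (the style wrapped by SOCK Dgram).
  int dg = socket (AF_INET, SOCK_DGRAM, 0);
  sendto (dg, buf, len, 0, (const sockaddr *) &peer, sizeof peer);

  // ``Connected'' datagram socket: the peer is fixed up front, so
  // subsequent sends omit the address (the style wrapped by SOCK CODgram).
  // Delivery remains unreliable -- connect () merely caches the address.
  int codg = socket (AF_INET, SOCK_DGRAM, 0);
  connect (codg, (const sockaddr *) &peer, sizeof peer);
  send (codg, buf, len, 0);

  close (dg);
  close (codg);
}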
Group Communication:

SOCK Dgram Bcast: This class provides mechanisms for broadcasting UDP datagrams to processes running on local and/or remote hosts attached to local subnets. The interface for this class supports the broadcast of datagrams to (1) all network interfaces connected to the host machine or (2) a particular network interface. This class shields the end-user from the low-level details required to utilize broadcasting effectively.

SOCK Dgram Mcast: This class provides mechanisms for multicasting UDP datagrams to processes running on local and/or remote hosts attached to local subnets. The interface for this class supports the multicast of datagrams to a particular multicast group. This class shields the end-user from the low-level details required to utilize multicasting effectively.

Network Addressing

Designing an efficient, general-purpose network addressing interface is hard. The difficulty stems from trying to represent different network address formats with a space-efficient and uniform interface. Different address formats store diverse types of information represented with various sizes. For example, an Internet-domain service (such as ftp or telnet) is identified using two fields: (1) a four-byte IP address (which uniquely identifies the remote host machine throughout the Internet) and (2) a two-byte port number (which is used to demultiplex incoming protocol data units to the appropriate client or server process on the remote host machine). In contrast, UNIX-domain sockets rendezvous via UNIX pathnames (which may be up to 108 bytes in length and are meaningful only on a single local host machine).

The existing sockaddr-based network addressing structures provided by the socket interface are cumbersome and error-prone. They require developers to explicitly initialize all the bytes in the address structure to 0 and to use explicit casts. In contrast, the SOCK SAP addressing classes shown in Figure contain mechanisms for manipulating network addresses. The constructors for the Addr base class ensure that all fields are automatically initialized correctly. Moreover, the different sizes, formats, and functionality that exist between different address families are encapsulated in the derived address subclasses. This makes it easier to extend the network addressing scheme to encompass new communication domains. For example, the UNIX Addr subclass is associated with the LSOCK* classes, the INET Addr subclass is associated with the SOCK* and TLI* classes, and the SPIPE Addr subclass is associated with the STREAM Pipe classes.

Programming with SOCK SAP C++ Wrappers

This section illustrates the ACE SOCK SAP wrappers by using them to develop a client/server streaming application. This application is a simplified version of the ttcp program described in Section . For comparison, this application is also written with sockets and CORBA.

Figures and present a client/server program that uses Internet-domain sockets to implement the stream application. The server shown in Figure creates a passive-mode listener socket and waits for clients to connect to it. Once connected, the server receives the data transmitted from the client and displays the data on its standard output stream. The client-side shown in Figure establishes a TCP connection with the server and transmits its standard input stream across the connection. The client uses non-blocking connections to limit the amount of time it waits for a connection to be accepted or refused. Most of the error checking for return values has been omitted to save space. However, it is instructive to note all the socket initialization, network addressing, and flow control details that must be programmed explicitly to make even this simple example work correctly. Moreover, the code in Figures and is not portable to platforms that do not support sockets or select.
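Since those figures are not reproduced here, the following minimal sketch conveys the flavor of the socket-based server they describe; it is a reconstruction rather than the original figure, the port number is arbitrary, and most error handling is omitted:

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

int main (int, char *[])
{
  sockaddr_in s_addr;
  int s_fd = socket (AF_INET, SOCK_STREAM, 0);

  /* Zero out the address, fill it in, and convert the port to network
     byte order -- details that are easy to get wrong. */
  memset (&s_addr, 0, sizeof s_addr);
  s_addr.sin_family = AF_INET;
  s_addr.sin_port = htons (5001);
  s_addr.sin_addr.s_addr = htonl (INADDR_ANY);

  bind (s_fd, (sockaddr *) &s_addr, sizeof s_addr);
  listen (s_fd, 5);

  for (;;)
    {
      socklen_t length = sizeof s_addr;
      int n_fd = accept (s_fd, (sockaddr *) &s_addr, &length);

      char buf[8 * 1024];
      ssize_t n;

      /* Copy everything the client sends to standard output. */
      while ((n = read (n_fd, buf, sizeof buf)) > 0)
        write (1, buf, n);

      close (n_fd);
    }
}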
Figures and use SOCK SAP to reimplement the C versions of the client/server programs. The SOCK SAP programs implement the same functionality as those presented in Figure and Figure . The SOCK SAP C++ programs exhibit the following benefits compared with the socket-based C implementation:

Decreased program size -- e.g., a substantial reduction in the lines of code results from localizing active and passive connection establishment in the SOCK Acceptor and SOCK Connector connection factories. In addition, default values are provided for constructor and method parameters, which reduces the number of arguments needed for common usage patterns.

Increased clarity -- e.g., network addressing and host location are handled by the Addr class, which hides the subtle and error-prone details that must be programmed explicitly in Figures and . Moreover, the low-level details of non-blocking connection establishment are performed by the SOCK Connector.

Increased typesafety -- e.g., the SOCK Acceptor and SOCK Connector connection factory objects must be passed arguments of the SOCK Stream type. This prevents the type errors shown in Figure from occurring at run-time.

Increased portability -- e.g., switching between sockets and TLI simply requires instantiating the client's send_data function and the server's recv_data function with the TLI SAP wrapper classes in place of the SOCK SAP classes. Conditional compilation directives can be used to further decouple the communication software from reliance upon a particular type of network programming interface.

However, the ACE wrappers share some of the same drawbacks as sockets. In particular, too much of the code required to program at this level is not directly related to the application.
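A sketch of what the SOCK SAP version of the server looks like appears below. It is reconstructed from the class descriptions in the previous section; the header names and exact method signatures are assumptions rather than the verbatim ACE 3.2 interfaces:

// Reconstruction of the SOCK SAP server; header names and exact
// signatures are assumed from the class descriptions, not copied from ACE.
#include "SOCK_Acceptor.h"
#include "SOCK_Stream.h"
#include "INET_Addr.h"
#include <unistd.h>

int main (int, char *[])
{
  // The SOCK_Acceptor constructor performs socket(), bind(), and listen().
  INET_Addr addr (5001);
  SOCK_Acceptor acceptor (addr);

  for (;;)
    {
      SOCK_Stream stream;

      // Only a SOCK_Stream may be accepted here, so mixing up
      // communication roles is caught at compile-time.
      acceptor.accept (stream);

      char buf[8 * 1024];
      ssize_t n;

      while ((n = stream.recv (buf, sizeof buf)) > 0)
        write (1, buf, n);

      stream.close ();
    }
}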
In contrast, Figures and illustrate the CORBA version of the stream application implemented using Orbix 1.3. This implementation is considerably more concise than both the C and ACE wrapper versions. CORBA handles the low-level communication details associated with service location, passive and active connection establishment, message framing, marshalling and demarshalling, demultiplexing, and upcall dispatching. This allows developers to concentrate on defining application-specific behavior, rather than wrestling with the details of network programming. The persistent server shown in Figure creates an implementation of a Data Stream IDL interface and informs the ORB that it is ready to receive send requests. It uses a standard ACE class CORBA Handler to register the server and object name with the Orbix daemon automatically. The client shown in Figure uses the Orbix locator service to bind to the marker exported by the Data Stream server. Once bound, the client transmits all data from its standard input to the server via the Data Stream::send proxy. This example behaves slightly differently than the C and ACE wrapper versions since CORBA does not provide a standard means to obtain the host and port of the sender. Moreover, CORBA communication semantics are request-oriented rather than connection-oriented. Thus, other clients could conceivably bind to the same marker name and transmit data via its send method.

Socket Wrapper Design Principles

This section describes the design principles applied throughout the SOCK SAP class category. Although these principles are widely used in domains such as graphical user interfaces, they are less widely applied in the communication software domain.

Only permit typesafe operations: Several limitations with sockets discussed in Section stem from the lack of typesafety in its interface. To enforce typesafety, SOCK SAP ensures all of its objects are properly initialized via constructors. In addition, to prevent accidental violations of typesafety, only legal operations are permitted on SOCK SAP objects. This latter point is illustrated in the SOCK SAP revision of the echo server shown in Figure . This version fixes the problems with sockets and C identified in Figure . Since SOCK SAP classes are strongly typed, invalid operations are rejected at compile-time rather than at run-time. For example, it is not possible to invoke recv or send on a SOCK Acceptor connection factory since these methods are not part of its interface. Likewise, return values are used to convey success or failure of operations, rather than returning more detailed information. This reduces the potential for misuse in assignment expressions.

Simplify for the common case: The key to this principle is ``make it easy to use SOCK SAP correctly, hard to use it incorrectly, but not impossible to use it in ways the class designers did not anticipate originally.'' This principle is exemplified by the get_handle and set_handle methods provided by the IPC SAP root class. These methods extract and assign the underlying handle, respectively. By providing get_handle and set_handle, IPC SAP allows applications to circumvent its type-checking mechanisms in unforeseen situations where applications must interface directly with UNIX system calls (such as select) that expect a handle.

Define parsimonious interfaces: This principle localizes the cost of using a particular abstraction. The IPC SAP interfaces limit the number of details that application developers must remember. IPC SAP provides developers with distinct clusters of classes that perform various types of communication (such as connection-oriented vs. connectionless) and various roles (such as active vs. passive). For example, to reduce the chance of error, the SOCK Acceptor class only permits operations that apply to programs playing a passive role and the SOCK Connector class only permits operations that apply to programs playing an active role. In addition, sending and receiving open file handles has a much simpler calling interface using SOCK SAP compared with using the highly-general UNIX sendmsg/recvmsg routines.

Replace one-dimensional interfaces with hierarchically-related class categories: This principle involves using hierarchically-related class categories to restructure existing one-dimensional socket interfaces. The criteria used to structure the SOCK SAP class category involved identifying and clustering related socket routines to maximize the reuse and sharing of class components. Inheritance makes it possible to support different subsets of functionality in the SOCK SAP class categories. For instance, not all operating systems support passing open file handles (e.g., Windows NT).
Thus, it is possible to omit the LSOCK class (described in Section ) from the inheritance hierarchy without affecting the interfaces of other classes in the SOCK SAP design. Inheritance also increases code reuse and improves modularity. Base classes express similarities between class category components and derived classes express the differences. For example, the SOCK SAP design places shared mechanisms towards the ``root'' of the inheritance hierarchy. These mechanisms include operations for opening/closing and setting/retrieving the underlying socket handles, as well as certain option management functions that are common to all the derived SOCK SAP classes. Subclasses located towards the ``bottom'' of the inheritance hierarchy implement specialized operations that are customized for the type of communication provided (such as stream vs. datagram communication or local vs. remote communication). This approach avoids unnecessary duplication of code since the more specialized derived classes reuse the more general mechanisms provided at the root of the inheritance hierarchy.

Enhance portability with parameterized types: Wrapping sockets with C++ classes (rather than stand-alone C functions) helps to improve portability by allowing the wholesale replacement of network programming mechanisms via parameterized types. Parameterized types decouple applications from reliance on specific network programming interfaces. Figure illustrates this technique by modifying the echo server to become a C++ function template. Depending on certain properties of the underlying OS platform (such as whether it implements TLI or sockets more efficiently), the echo server may be instantiated with either SOCK SAP or TLI SAP classes, as shown in Figure . In general, the use of parameterized types is less intrusive and more extensible than conventional alternatives (such as implementing multiple versions or littering conditional compilation directives throughout the source code). For example, the SOCK SAP and TLI SAP classes offer the same object-oriented interface (depicted in Figure ). Certain OS platforms may possess different underlying network programming interfaces such as sockets but not TLI or vice versa. Using IPC SAP, applications can be written that are transparently parameterized with either the SOCK SAP or TLI SAP class category.

C++ templates support a loose form of type conformance that does not constrain an interface to encompass all potential functionality. Instead, templates are used to parameterize application code that is carefully designed to invoke only a subset of methods that are common to the various communication abstractions (e.g., open, close, send, recv, etc.). The type abstraction provided by templates helps improve portability among platforms that support different network programming interfaces (such as sockets or TLI). For example, parameterizing the transport interface turned out to be useful for developing applications across various SunOS platforms: the socket implementation in SunOS 5.2 was not thread-safe and the TLI implementation in SunOS 4.x contained a number of serious defects.
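Since the figure showing the templatized echo server is not reproduced here, the following sketch illustrates the technique. The method names follow the wrapper descriptions above; the exact ACE signatures are assumptions:

#include <sys/types.h>

// Parameterized echo server: the acceptor and stream types are supplied
// at instantiation time, so the same code works over sockets or TLI.
template <class ACCEPTOR, class STREAM, class ADDR>
int echo_server (const ADDR &addr)
{
  // Factory that creates a passive-mode endpoint (socket/bind/listen for
  // the SOCK SAP instantiation, the TLI equivalents for TLI SAP).
  ACCEPTOR acceptor (addr);

  for (;;)
    {
      STREAM stream;
      char buf[4096];
      ssize_t n;

      if (acceptor.accept (stream) == -1)
        return -1;

      // Echo data back using only methods common to both instantiations.
      while ((n = stream.recv (buf, sizeof buf)) > 0)
        stream.send_n (buf, n);

      stream.close ();
    }
}

// Possible instantiations (class names as used in the text):
//   echo_server<SOCK_Acceptor, SOCK_Stream, INET_Addr> (addr);
//   echo_server<TLI_Acceptor, TLI_Stream, INET_Addr> (addr);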
Inline performance critical methods: To encourage developers to replace existing low-level network programming interfaces with C++ wrappers, the SOCK SAP implementation must operate efficiently. To ensure this, methods in the critical performance path (such as the SOCK Stream recv and send methods) are specified as C++ inline functions to eliminate run-time overhead. Inlining is both time and space efficient since these methods are very short (approximately 2 or 3 lines per method). The use of inlining implies that virtual functions should be used sparingly since most contemporary C++ compilers do not fully optimize away virtual function overhead.

Design auxiliary classes that shield applications from error-prone details: e.g., SOCK SAP contains the Addr class hierarchy (shown in Figure ). This hierarchy supports several diverse network addressing formats via a typesafe C++ interface. The Addr hierarchy eliminates several common programming errors (such as forgetting to zero-out a sockaddr addressing structure) associated with using the C-based family of struct sockaddr data structures directly.

Combine several operations to form a single operation: e.g., the SOCK Acceptor is a factory for passive connection establishment. Its constructor performs the socket calls socket, bind, and listen required to create a passive-mode listener endpoint.

Supply default parameters for typical method argument values: e.g., the addressing parameters to accept are frequently NULL pointers. To simplify programming, these values are given as defaults in SOCK Acceptor::accept so that programmers need not provide them.

Concluding Remarks

An important class of applications requires high-performance streaming communication. Bandwidth-intensive and delay-sensitive streaming applications like medical imaging or teleconferencing are not supported efficiently by contemporary CORBA implementations due to data copying, demultiplexing, and memory management overhead. As shown in Section , this overhead is often masked on low-speed networks like Ethernet and Token Ring. On high-speed networks like ATM or FDDI, however, this overhead becomes a significant factor limiting communication performance.

The ACE socket wrappers described in this paper provide a high-performance network programming interface that shields developers from lower-level details of sockets or TLI without sacrificing performance. The ACE wrappers automate and simplify many aspects (such as initialization, addressing, and handling short-writes) of using lower-level network programming interfaces. They improve portability by shielding applications from platform-specific network programming interfaces. Wrapping sockets with C++ classes (rather than stand-alone C functions) makes it convenient to switch wholesale between different network programming interfaces by using parameterized types. In addition, as shown in Figure , the ACE socket wrappers do not introduce any significant overhead compared with programming with sockets directly.

The primary drawback with the ACE network programming wrappers is that they do not address higher-level issues related to system reliability and availability, flexibility of object location and selection, support for transactions, security, deferred process activation, and the exchange of binary data between different computer architectures. For example, programmers must provide explicit support for presentation layer conversions in conjunction with the ACE wrappers. Therefore, these wrappers are most useful when the datatypes are simple, like those used by the high-performance streaming applications described in this paper.

The ACE C++ wrappers for sockets may be integrated with CORBA to enhance the performance of streaming applications. We've combined CORBA and the ACE wrappers in a high-speed teleradiology system that transfers 10-40 Mbyte medical images over ATM.
In this system, CORBA is used as a signaling mechanism to identify endpoints of communication in a location-independent manner. The ACE wrappers are then used to establish point-to-point TCP connections and transmit bulk data efficiently across the connections. This strategy builds on the strengths of both CORBA and ACE. ACE has been ported to many versions of UNIX and Windows NT and is currently being used in many commercial products including the Bellcore and Siemens Q.port ATM signaling software product, the Ericsson EOS family of telecommunication monitoring applications, the System Control Segment of the Motorola Iridium project, and a high-speed enterprise-wide medical image delivery system for Kodak Health Imaging Systems.