The following paper was originally published in the Proceedings of the
USENIX Summer 1993 Technical Conference, Cincinnati, Ohio, June 21-25, 1993.

For more information about the USENIX Association contact:
1. Phone: 510 528-8649
2. FAX: 510 548-5738
3. Email: office@usenix.org
4. WWW URL: https://www.usenix.org


AudioFile: A Network-Transparent System for Distributed Audio Applications

Thomas M. Levergood, Andrew C. Payne, James Gettys, G. Winfield Treese,
and Lawrence C. Stewart
Digital Equipment Corporation, Cambridge Research Lab
(James Gettys is also with the MIT Laboratory for Computer Science.
The authors' names are in random order.)

Abstract

AudioFile is a portable, device-independent, network-transparent system
for computer audio systems. Similar to the X Window System, it provides an
abstract audio device interface with a simple network protocol to support
a variety of audio hardware and multiple simultaneous clients. AudioFile
emphasizes client handling of audio data and permits exact control of
timing. This paper describes our approach to digital audio, the AudioFile
protocol, the client library, the audio server, and some example client
applications. It also discusses the performance of the system and our
experience using standard networking protocols for audio. A source code
distribution is available by anonymous FTP.

1 Introduction

Audio hardware is becoming increasingly common on desktop computers. In
1990, the authors began a project at Digital's Cambridge Research
Laboratory to explore desktop audio (and video, but that is another
story). We began by designing a flexible I/O device for audio and
telephony. Once that hardware was available, we began work on software.
The result of our efforts is the AudioFile System.

Similar to the X Window System [13], AudioFile was designed to allow
multiple clients, to support a variety of audio hardware, and to permit
transparent access through the network. Since its original implementation,
AudioFile has been used for many applications and experiments with desktop
audio. These applications include audio recording, playback, video
teleconferencing, answering machines, voice mail, telephone control,
speech recognition, and speech synthesis. AudioFile supports multiple
audio data types and sample rates, from 8 KHz telephone quality through 48
KHz high-fidelity stereo. Currently, AudioFile runs on Digital's RISC and
Alpha AXP systems, on Sun's SPARC systems, and on Silicon Graphics' Indigo
workstations. A source code distribution is available by anonymous FTP.

Like the X Window System, AudioFile has four main components:

- The Protocol. The AudioFile System defines a wire protocol that links
  the server with client applications over local and network communication
  channels.

- Client Library and API. The client library and applications programming
  interface (API) provide a means for applications to generate protocol
  requests and to communicate with the server using a procedural
  interface.

- The Server. The AudioFile server contains all code specific to
  individual devices and operating systems. It mediates access to audio
  hardware devices and exports the device-independent interface to
  clients.
- Clients. The AudioFile distribution includes several out-of-the-box
  applications which make the system immediately usable and which serve as
  illustrations for more complex applications.

The parts of the implementation of AudioFile that are not specific to
audio, such as client/server communications, are based on X11 Release 4.
(Why start with a clean slate? For more information on how to steal code,
consult Spencer [14].) We should emphasize that AudioFile is not an
addition to the X Window System; it is a separate entity which borrowed
some source code. We feel quite strongly that audio services should be
separate from graphics.

AudioFile was designed with several goals in mind. These include:

- Network transparency. Applications can run on machines scattered
  throughout the network. Network transparency allows applications to run
  anywhere but still interact with the user. It also enables applications
  such as teleconferencing that need to use audio on several systems
  simultaneously.

- Device independence. Applications need not be rewritten to run on new
  audio hardware. The AudioFile System provides a common abstract
  interface to the real hardware, insulating applications from the messy
  details.

- Support for multiple simultaneous clients. Applications can run
  concurrently, sharing access to the actual audio hardware. (We think
  that something like an "audio window manager" might be useful, but so
  far we have not found it necessary to implement one.)

- Support for a wide range of clients. It should be possible to implement
  applications ranging from audio biff to multiuser teleconferencing. We
  chose to implement a few very general-purpose mechanisms that permit a
  wide variety of applications.

- Simplicity. Simple applications should be simple; complex applications
  should be possible.

This paper begins with a discussion of the historical context of
AudioFile. We discuss the key abstractions, the network protocol, the
client library, and the server implementation. Next, we describe some
sample applications and analyze AudioFile's performance. We conclude with
a brief discussion of our plans for future work.

2 Background

In the early 1980's, Xerox PARC built an Ethernet-based telephone system
called Etherphone [15]. The system also had capabilities for workstation
control of recording, playback, and storage. The Etherphone system was
used primarily to explore issues of multimedia documents and
computer-telephone integration.

In the mid 1980's, the Firefly multiprocessor [17] developed at Digital's
Systems Research Center incorporated telephone-quality audio. An audio
server buffered the past input and future output and exported a remote
procedure call (RPC) interface to clients. This system pioneered explicit
client control of time (as described in Section 3) and was primarily used
for applications such as teleconferencing and multimedia presentations.

In the mid to late 1980's, the MIT Media Lab and Olivetti Research
collaborated on a project called VOX [2]. In this system almost all audio
functions were implemented inside the server, with the client merely
controlling those functions. VOX was constrained by the view that clients
would control the flow of audio between external devices, rather than
handling the data themselves.

Other projects were underway at about the same time as AudioFile.
Digital's XMedia [1] and Bellcore's Sonix [12] are similar to VOX in their
emphasis on handling audio within the server. The conferencing system
described by Terek and Pasquale [16] was based on a modified X server.
In contrast, we think audio and graphics should be kept separate for ease
of implementation and so that machines not equipped with graphics can
still use audio.

3 Audio Abstractions

This section describes the fundamental abstractions of AudioFile. These
provide the view of audio available to clients and guided the design of
the protocol, client library, and audio server.

3.1 Devices

Abstract audio devices are Analog-to-Digital (ADC) and Digital-to-Analog
(DAC) converters which produce and consume sample data at a regular rate
known as the sampling frequency. The sample data are of one of several
predefined types and consist of one or more channels.

3.2 Time

The concept of audio device time is critical to understanding the design
of all AudioFile components. We expose audio device time in the protocol
and at the client library API. All audio recording and playback operations
in the AudioFile System are tagged with time values that are directly
associated with the relevant audio hardware.

There are a remarkable number of clocks in a modern distributed computer
system. A simple desktop system might have four different clocks:
time-of-day, interval timer, display refresh, and audio. Each computer
system in a network has its own clocks. Time-of-day clocks may be
synchronized with protocols such as NTP [9], but we are not aware of any
systems that keep the other clocks synchronized. In principle, it is
possible to use any clock for audio. Because we wanted to be able to
specify audio data down to the individual sample, we chose to use the
audio sample rate clock. The server maintains a representation of this
clock in a "time register" for scheduling all audio events for the
particular device. AudioFile does not provide a complete infrastructure
for synchronization; rather, it supplies low-level timing information to
its clients. Applications can build synchronization mechanisms suitable to
their own needs. (We envision adding standard mechanisms for providing
clock conversion services, but have not yet encountered a compelling need
to do so.)

Audio device time is represented by a 32-bit integer that increments once
per sample period and wraps on overflow. There is no absolute reference
value for a device time; the value is set to 0 when the server is
initialized and advances thereafter. Time comparisons are easy to
implement, as illustrated by the following code fragment for a device
running at 8000 samples per second.

    if ((b - a) > 0)      /* time b is later than time a. */
    if ((b - a) < 0)      /* time b is earlier than time a. */
    if ((b - a) == 8000)  /* time b is one second later than time a. */

This method breaks when the difference approaches 2^31. Programs that deal
with time must be careful not to make comparisons between widely separated
time values. Even at a 48 KHz sampling rate, however, 2^31 samples
represent about 12 hours worth of audio.

Each play and record request carries with it an exact timestamp. The
abstraction is implemented by buffering future playback and recent record
data in the server. Continuous recording or playback is done by advancing
the requested device time by the duration of the previous request.
Explicit control of time provides the mechanism needed for real-time
applications. As long as play requests reach the server before their start
times, playback will be continuous.
A leisurely application will schedule playback well in the future, while a
real-time application will schedule for the very near future.

Recording is a much easier problem than playback. The server buffers all
audio input, typically for several seconds. No data will be lost unless a
request fails to reach the server until well after its start time. Because
the server buffers all device input, clients can request recording at
times "in the past" and deliver instantaneous response. For example, an
application can begin recording at the exact moment the "Record" button is
pressed, even though there is a delay from pressing the button to
scheduling the record. This is a more natural interaction than requiring
the user to wait for an audible beep.

This explicit use of time means that clients operate on blocks of audio
data. The alternative to AudioFile's block-oriented design would be
streams, which would cause problems that AudioFile avoids. It is difficult
to determine how much data is buffered in a stream or to find out if a
stream is running or blocked. Streams tend to obscure issues of bandwidth
and latency that are critical to real-time applications. Finally, in
practice, applications deal with data in blocks anyway.

3.3 Output Model

AudioFile's output model is shown in Figure 1. Clients can schedule
playback at any time from the present to four seconds into the future.
Requests that fall in the past are silently discarded. Requests that fall
beyond the four-second buffer are suspended until time advances to within
four seconds.

[Figure 1: AudioFile server output model]

After a playback request is received by the server, the data are passed
through an optional module that converts the client's data type to the
device's preferred representation. (Conversions could include sample rate
changing as well as data type conversion.) After conversion, a client
specified gain is applied, and the data are combined with the data from
other clients in the playback buffer. The server will mix client data by
default, but preemptive playback is possible. Finally, a master volume
control is applied.

The output model specifies that silence is emitted during periods of time
in which no data have been written to the output buffer. This may reduce
network bandwidth requirements, since clients need not transmit silence
data to the server.

3.4 Input Model

AudioFile's input model is shown in Figure 2. As in the output model, the
server buffers four seconds of data. The recorded data are modified by a
master input gain and placed into the server buffer. Clients requesting
input data older than four seconds in the past are given silence. Requests
within the past four seconds return buffered data, and requests in the
future block until time advances. The system also supports a non-blocking
record request, which will return as much data as possible without
blocking. The input model also supports optional modules which convert the
native audio hardware data type to the client requested type.

[Figure 2: AudioFile server input model]
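To make recording "in the past" concrete, here is a small sketch in the
style of the apass fragment in Section 8.2 (our illustration, not code
from the distribution). It assumes an 8 KHz u-law device, where one sample
occupies one byte, so byte counts and sample counts coincide; the
button_pressed() and still_recording() tests are hypothetical stand-ins
for a user interface, and the AF calls follow the usage shown in Section
8.2.

    /* Sketch: capture the half second of audio *before* the user hit
     * Record, relying on the server's several-second input buffer.
     * button_pressed() and still_recording() are hypothetical; the AF
     * calls are used as in the apass example of Section 8.2. */
    #define PREROLL (8000 / 2)          /* 0.5 s at 8 KHz, 1 byte/sample */
    #define BLOCK   1600                /* 200 ms per request */
    unsigned char buf[BLOCK];
    int t;

    while (!button_pressed())           /* hypothetical UI test */
        ;
    t = AFGetTime(ac) - PREROLL;        /* a time in the recent past */
    while (still_recording()) {         /* hypothetical end test */
        /* blocks until the requested span has been captured */
        AFRecordSamples(ac, t, BLOCK, buf, ABlock);
        write(outfd, buf, BLOCK);       /* save the block */
        t += BLOCK;                     /* schedule the next block */
    }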
3.5 Events

Events are asynchronous messages from server to client. Events may be
generated by a device or as a side effect of some client's request.
Clients must register their interest to receive various classes of events,
such as a telephone device ringing or a change in a property used for
interclient communications.

4 Audio Hardware

AudioFile is designed to support a variety of audio hardware. Currently
supported devices include:

LoFi: LoFi, designed at CRL, is a workstation peripheral later released as
the DECaudio product [7]. (We called it LoFi primarily because it wasn't;
LoFi always included high fidelity audio capability.) LoFi supports two 8
KHz CODECs, one connecting to a telephone line. LoFi contains a digital
signal processing chip with 32K words of memory shared with the host. The
DSP supports a 44.1 KHz stereo DAC and can also be used with external "DSP
port" devices operating at up to 48 KHz. The telephone interface on LoFi
enables applications such as voice mail and remote information access. We
see no difficulty in supporting other kinds of telephone interfaces, such
as ISDN or PhoneStation [19].

JVideo: JVideo is an experimental desktop video peripheral. Like LoFi,
JVideo includes audio hardware based on a DSP and shared memory. However,
JVideo has neither telephony capability nor an external DSP port.

Baseboard: The Personal DECstation series, the Alpha AXP-based systems,
and the SPARCstation 2 all include 8 KHz CODEC devices on their system
modules.

Indigo: The Silicon Graphics Indigo workstations support stereo audio at
rates up to 48 KHz.

LineServer: LineServer is an Ethernet peripheral used within Digital's
research labs for remote bridging and routing. LineServer includes an 8
KHz CODEC. The LineServer version of the AudioFile server is interesting
because the server runs on a nearby host, not on the LineServer itself.

5 Protocol Description

The AudioFile protocol is modeled on the same basic principles as the X
Window System protocol. AudioFile can be used over any transport protocol
that is reliable and which does not reorder or duplicate data. The current
version supports TCP/IP and UNIX-domain sockets. At connection setup, the
client and server exchange version information, the server sends audio
device attributes, and clients provide authentication information.

The AudioFile protocol defines 37 request types. (For comparison, the X
Window System has 119 requests in the core protocol.) Many of these are
for housekeeping purposes, including access control, interclient
communications, and extensions (though no extensions are implemented
today). Other requests support device control, telephony, and audio data
handling. Table 1 summarizes AudioFile's protocol requests.

All protocol requests have a length field, an opcode, and an opcode
extension. Header fields are kept naturally aligned by padding request
data out to a 32-bit boundary.

There are five fixed-size events supporting telephone control and
interclient communications. All events contain the audio device time and
the clock time of the host of the server. The host clock time may be
needed when synchronizing with other media.

Rather than specifying all parameters for play and record with each
request, a client uses an "audio context" (AC) to encapsulate most of
these parameters. The audio context includes the play gain, number of
channels, sample type, and byte order. ACs simplify the programming
interfaces for play and record.

AudioFile adopted from X the idea of property lists to enable clients to
communicate. Properties are associated with a device, and can be read and
written by clients. Clients can register to be notified when properties
change. These facilities can be used to coordinate use of resources and to
share information, such as the last phone number dialed.

  Audio and Events
    SelectEvents          Select which events the client wants
    CreateAC              Create an audio context
    ChangeACAttributes    Change the contents of an audio context
    FreeAC                Free an audio context
    PlaySamples           Play samples
    RecordSamples         Record samples
    GetTime               Get the audio device's time

  Telephony
    QueryPhone            Get telephone state
    EnablePassThrough     Enable telephone passthrough
    DisablePassThrough    Disable telephone passthrough
    HookSwitch            Control hookswitch
    FlashHook             Flash hookswitch
    EnableGainControl     Not for general use
    DisableGainControl    Not for general use
    DialPhone             Obsolete, do not use

  I/O Control
    SetInputGain          Set input gain
    SetOutputGain         Set output gain (volume)
    QueryInputGain        Find out current input gain
    QueryOutputGain       Find out current output gain
    EnableInput           Enable input
    EnableOutput          Enable output
    DisableInput          Disable input
    DisableOutput         Disable output

  Access Control
    SetAccessControl      Set access control
    ChangeHosts           Change access control list
    ListHosts             List which hosts are permitted access

  Atoms and Properties
    InternAtom            Allocate unique ID
    GetAtomName           Get name for ID
    ChangeProperty        Change device property
    DeleteProperty        Remove device property
    GetProperty           Retrieve device property
    ListProperties        List all device properties

  Housekeeping
    NoOperation           Non-blocking NoOperation
    SyncConnection        Round-trip NoOperation
    QueryExtension        Not yet implemented
    ListExtensions        Not yet implemented
    KillClient            Not yet implemented

Table 1: AudioFile protocol requests
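As an illustration of how two clients might share the last number dialed
through a property, here is a sketch using the property functions listed
later in Table 2. The argument lists below are our assumptions, patterned
after the X11 analogues of these calls (XInternAtom, XChangeProperty);
consult the distribution for the real signatures.

    /* Sketch only: property-based interclient communication.  The AF*
     * property calls exist (see Table 2), but these signatures are
     * assumed, modeled on their Xlib counterparts. */
    AAtom last = AFInternAtom(aud, "LAST_NUMBER_DIALED", FALSE);

    /* a dialer client publishes the number it just dialed */
    AFChangeProperties(aud, device, last, "555-1212", 8);

    /* another client, registered for property-change events,
     * reads the number back */
    char number[32];
    AFGetProperty(aud, device, last, number, sizeof(number));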
6 Client Libraries

We have developed two client libraries: a standard interface to the
AudioFile server and a utility library of common functions required by
many clients.

6.1 Core Library

The core client library is the standard interface for AudioFile clients.
Some of its functions provide interfaces to the AudioFile protocol; others
provide an interface to the library's internal data structures. Table 2
summarizes these library functions.

Some library functions, such as AFGetTime(), require an immediate response
from the server; others, such as AFCreateAC(), do not. In the former case,
the library blocks until a reply is received. Otherwise, the library may
defer sending the request and will return to the client immediately.
Certain operations, including the synchronous functions, flush any
deferred requests.

AudioFile provides a simple access control scheme based on host network
address. The access control functions allow hosts to be added to or
removed from the access list and allow access control to be disabled
entirely.

6.2 Client Utility Library

The AudioFile distribution also includes a utility library to provide a
number of facilities that are used by several clients. Two kinds of
facilities are provided: tables and subroutines.

The AudioFile system handles a variety of digital audio data formats, such
as u-law and A-law. The utility library includes tables for converting
these formats to and from linear encoding. The library also includes
tables for computing signal power, gain control, and generating sine waves
at various frequencies.

The utility library gathers together a number of useful subroutines. Most
do not directly interact with the AudioFile protocol. They include
subroutines for generating on-the-fly gain translation tables for u-law
and A-law samples, tone generation procedures, and some miscellaneous
functions. One subroutine, AFDialPhone(), encapsulates the operations
necessary to generate Touch-Tone dialing sequences on a telephone device.
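The u-law tables are precomputed versions of standard CCITT G.711
companding arithmetic. As a sketch of what each table entry holds (our
illustration, not code from the utility library), the following routine
expands one 8-bit u-law code to a linear sample and builds a 256-entry
lookup table of the kind the library provides:

    /* Expand one 8-bit u-law code to a 16-bit linear sample (G.711).
     * The utility library ships this mapping as a precomputed table;
     * this routine shows how such a table can be generated. */
    static int ulaw2linear(unsigned char ulawbyte)
    {
        static const int exp_lut[8] =
            {0, 132, 396, 924, 1980, 4092, 8316, 16764};
        int sign, exponent, mantissa, sample;

        ulawbyte = ~ulawbyte;          /* u-law codes are stored inverted */
        sign = ulawbyte & 0x80;
        exponent = (ulawbyte >> 4) & 0x07;
        mantissa = ulawbyte & 0x0F;
        sample = exp_lut[exponent] + (mantissa << (exponent + 3));
        return sign ? -sample : sample;
    }

    int ulaw_table[256];               /* u-law -> linear lookup table */

    void build_ulaw_table(void)
    {
        int i;
        for (i = 0; i < 256; i++)
            ulaw_table[i] = ulaw2linear((unsigned char) i);
    }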
7 Server Design

The AudioFile server is responsible for managing the audio hardware and
presenting abstract device interfaces to clients via the AudioFile
protocol. This section discusses some of the important issues in the
server's design, the implementation of buffering to provide the audio
device abstraction, and some other details of the server's implementation.

7.1 Implementation Considerations

Our primary concern for the implementation of an AudioFile server was
performance. We wanted the server to run continuously in the background,
so we felt that the quiescent server should present a negligible CPU load.
Further, the load due to the server with a few clients running should
leave most of the CPU available for applications.

We considered using threads to implement the server, but were apprehensive
about the performance and portability of existing thread packages.
Although the internal structure of the server might be slightly cleaner
with threads, we took the safer route and designed the server as a
single-threaded process. The server must handle requests from multiple
active clients, so we designed it so that one client cannot dominate the
available processing time.

7.2 Buffering

The server maintains input and output buffers for each audio device. A
periodic update task moves samples between the server buffers and the
audio hardware. Figure 3 illustrates the server record and play buffers
before and after the update task executes. At each invocation, the update
task moves new record data (since recLastUpdate) from the hardware buffer
to the server buffer, and moves the next batch of playback data (starting
at the "before" timeNextUpdate) from the server buffer to the hardware
buffer. Finally, the update task initializes the end of the server buffer
with silence.

Unless a client request falls into the shaded portions of Figure 3, it can
be handled entirely out of the server buffers. If a record request falls
after recLastUpdate, the server performs an update before handling the
request. If a playback request falls before timeNextUpdate, the server
writes the data all the way through to the hardware. A sketch of this
dispatch logic follows.
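The sketch below uses the wrap-safe time arithmetic of Section 3.2; it is
our illustration, not server source. recLastUpdate and timeNextUpdate are
the per-device marks from Figure 3, and the helper routines are
hypothetical stand-ins for the server's internal entry points.

    /* Hypothetical stand-ins for server internals */
    typedef struct {
        int recLastUpdate;   /* newest record data already in the buffer */
        int timeNextUpdate;  /* play data before this is already committed */
    } Device;
    void UpdateFromHardware(Device *dev);
    void CopyFromServerBuffer(Device *dev, int time, int nsamples);
    void WriteThroughToHardware(Device *dev, int time, int nsamples);
    void MixIntoServerBuffer(Device *dev, int time, int nsamples);

    void HandleRecord(Device *dev, int time, int nsamples)
    {
        if ((time + nsamples) - dev->recLastUpdate > 0)
            UpdateFromHardware(dev);   /* pull the newest samples first */
        CopyFromServerBuffer(dev, time, nsamples);
    }

    void HandlePlay(Device *dev, int time, int nsamples)
    {
        if (time - dev->timeNextUpdate < 0)
            WriteThroughToHardware(dev, time, nsamples);
        else
            MixIntoServerBuffer(dev, time, nsamples);  /* common case */
    }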
  Connection Management
    AFOpenAudioConn         Open a server connection
    AFCloseAudioConn        Close a server connection
    AFSynchronize           Synchronize with the audio server
    AFSetAfterFunction      Set a synchronization function

  Audio Handling
    AFPlaySamples           Play digital audio samples
    AFRecordSamples         Record digital audio samples
    AFGetTime               Get the device time of a device

  Audio Contexts
    AFCreateAC              Create a new audio context (AC)
    AFChangeACAttributes    Modify an AC
    AFFreeAC                Free resources associated with an AC

  Event Handling
    AFEventsQueued          Check for events
    AFPending               Returns number of unprocessed events
    AFIfEvent               Find and dequeue a particular event (blocking)
    AFCheckIfEvent          Find and dequeue a particular event (nonblocking)
    AFPeekIfEvent           Find a particular event (blocking)
    AFNextEvent             Return the next unprocessed event
    AFSelectEvents          Select events of interest

  Telephone
    AFCreatePhoneAC         Create an AC for a telephone device
    AFFlashHook             Flash the hookswitch on a telephone device
    AFHookSwitch            Set the state of the hookswitch
    AFQueryPhone            Returns the state of the hookswitch and loop current

  I/O Control
    AFEnableInput           Enable inputs on an audio device
    AFDisableInput          Disable inputs on an audio device
    AFEnableOutput          Enable outputs on an audio device
    AFDisableOutput         Disable outputs on an audio device
    AFEnablePassThrough     Connect local audio to the telephone
    AFDisablePassThrough    Remove the direct local audio/telephone connection
    AFQueryInputGain        Get minimum/maximum input gains for a device
    AFQueryOutputGain       Get minimum/maximum output gains for a device
    AFSetInputGain          Set the input gain of a device
    AFSetOutputGain         Set the output gain of a device

  Access Control
    AFAddHost               Add a host to the access list
    AFAddHosts              Add a set of hosts to the access list
    AFListHosts             Return the host access list
    AFRemoveHost            Remove a host from the access list
    AFRemoveHosts           Remove a set of hosts from the access list
    AFSetAccessControl      Enable or disable access control checking
    AFEnableAccessControl   Enable access control checking
    AFDisableAccessControl  Disable access control checking

  Properties
    AFGetProperty           Manipulate properties
    AFListProperties        Get a list of existing properties
    AFChangeProperties      Modify a property
    AFDeleteProperty        Delete a property
    AFInternAtom            Install a new atom name
    AFGetAtomName           Fetch the name of an atom

  Error Handling
    AFSetErrorHandler       Set the fatal error handler
    AFSetIOErrorHandler     Set the system call error handler
    AFGetErrorText          Translate error code to a string

  Miscellaneous
    AFNoOp                  Don't do anything
    AFFlush                 Flush any queued requests to the server
    AFSync                  Default synchronization function
    AFAudioConnName         Return the name of the audio server

Table 2: AudioFile client library functions

[Figure 3: AudioFile periodic updates]

7.3 Server Implementation

An AudioFile server is organized like an X server. It includes device
independent audio (DIA), device dependent audio (DDA), and operating
system (OS) components. The DIA section is responsible for managing client
connections, dispatching on client requests, sending replies and events to
clients, and executing the main processing loop. The DDA section is
responsible for presenting the abstract interface for each supported
device and contains all device-specific code. Finally, the OS section
includes all the platform or operating system-specific code. Much of the
OS and DIA code is based on X11R4.

Instead of using threads, we implemented a simple task mechanism which
allows procedures to be scheduled for execution at future times, outside
the main flow of control. The task mechanism is used by the server's
update mechanism and by the dispatcher to resume execution of partially
completed client requests.

At the core of the DIA section is the main control loop, which relies
heavily on the select() system call. select() is called with file
descriptors for client connections and open devices, as well as a timeout
argument for the next task which needs to execute. When select() returns,
the server runs any pending tasks and then handles input events and client
requests.

Client requests are processed by the dispatcher. The request type is used
to index into a table of protocol request handler procedures. All handlers
are implemented by the device independent part of the server, but audio
specific requests are passed to the device dependent audio server. Adding
DDA support for a new audio device is straightforward. The interface
between the DIA and DDA sections of the server is very similar to that in
the X server.

7.4 Device Dependent Server Examples

This section describes some of the device dependent implementation details
of the LoFi, baseboard, and LineServer DDA code.

Alofi: LoFi Server

There are play and record circular buffers for each CODEC device and for
each channel of the stereo HiFi device. For HiFi, we implemented a single
stereo abstract device, as well as separate left and right devices for
those clients not requiring stereo data. The host performs audio I/O by
reading and writing these shared memory buffers. The LoFi's DSP runs a
simple program that maintains the device time and buffers for each device.

We optimized the update procedure to achieve good performance. The
server's periodic updates, which move samples between the server's buffers
and the hardware, can consume quite a few CPU cycles, especially at high
sample rates. We chose to have the HiFi record update run only if there is
an active record client. The first record operation performed under an
audio context marks the context as recording. A per-device reference count
then controls the operation of the record update code. Similarly, the play
update runs only when there is outstanding client data. We also initialize
the server buffer with silence only when absolutely necessary. In the
common case of contiguous playback requests, silence filling is never
necessary. A sketch of this gating appears below.
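The gating itself is simple. In this sketch (ours, not DDA source) the
field names are hypothetical, but the Alofi DDA keeps equivalent
per-device state:

    /* Sketch: skip update work when no client needs it.  The Update*
     * helpers are hypothetical stand-ins for the real buffer movers. */
    typedef struct {
        int recRefCount;     /* ACs currently marked as recording */
        int playLastTime;    /* device time of the last queued play sample */
    } HiFiState;
    void UpdateRecordBuffer(HiFiState *hw);
    void UpdatePlayBuffer(HiFiState *hw);

    void HiFiUpdate(HiFiState *hw, int now)
    {
        if (hw->recRefCount > 0)         /* record update only if needed */
            UpdateRecordBuffer(hw);
        if (hw->playLastTime - now > 0)  /* play update only if data queued */
            UpdatePlayBuffer(hw);
    }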
  Audio Handling
    apass      Record from one AF server and playback on another
    aplay      Playback from files or pipes
    arecord    Record to files or pipes
    abiff      Incoming e-mail notification by audio
    abob       Tk-based multimedia demonstration
    radio      Multicast network audio
    xplay      X-based sound file browser
    abrowse    Tk-based sound file browser

  Device Control
    ahs        Telephone hook switch control
    aphone     Telephone dialer
    aset       Device control
    aevents    Report input events
    adial      Tk telephone dialer
    axset      Tk version of aset
    afxctl     X-based event display and device control

  Signal Processing Utilities
    afft       Tk-based real-time spectrogram display
    xpow       X display of audio signal power
    autil      Signal generator

  Access Control and Properties
    ahost      AudioFile server access control
    alsatoms   Display defined atoms
    aprop      Display and modify properties

Table 3: AudioFile clients

Aaxp and Asparc: Baseboard Servers

The audio servers for the baseboard audio on Alpha AXP workstations and
SPARCstations use kernel device drivers with similar interfaces. The
server's update procedure uses the device driver read and write interfaces
for recording and playing audio data. Because the kernel device drivers do
not maintain a time register for the baseboard CODECs, the server
maintains an estimated value using the system clock and must occasionally
resynchronize with the device driver.

Als: LineServer Server

For the LineServer, an AudioFile server running on a nearby workstation
uses a private UDP-based protocol to communicate with the device. The
LineServer runs simple firmware that processes incoming packets and moves
samples to and from the audio hardware. On the workstation, a periodic
update task moves data between the server's buffers and the LineServer's
buffers using the private protocol. The server makes every attempt to
minimize access to the LineServer, since crossing the network is a
relatively expensive operation. Only requests in the update regions
require network traffic. For requests that require returning a device
time, the server generates an estimate.

8 AudioFile Clients

The AudioFile distribution includes a number of client programs, including
applications for recording, playback, and signal processing, as well as
telephone, device, and access control. Some clients have graphical user
interfaces using X and Tk toolkits [11]. Table 3 summarizes the clients.
In the remainder of this section we describe two clients, aplay and apass,
in some detail, to illustrate the simplicity of the AudioFile client
library programming interface. We also include an answering machine shell
script to show how simple AudioFile clients can be combined into larger
applications.

8.1 aplay

aplay reads digital audio from a file or standard input and sends it to
the server for playback. aplay can serve as the core of a sound-clip
browser or voice mail retrieval program or as the final stage in a signal
processing pipeline. For example, our software implementation of the
DECtalk [3] speech synthesizer uses aplay for output.
aplay handles only "raw" sound files but could be easily extended to
handle popular sound file formats. aplay works with any fixed-size
encoding format and any number of channels, but the user must know the
format of the file and choose an appropriate server device.

As an example, we show a simplified version of aplay. Readers who wish all
the details should read the sources, which are included in the AudioFile
distribution.

    aplay()
    {
        aud = AFOpenAudioConn("");        /* open a connection to the server */
        device = FindDefaultDevice(aud);  /* select audio device */
        /* set up audio context, possibly setting the gain and endian-ness */
        ac = AFCreateAC(aud, device, (ACPlayGain | ACEndian), &attributes);

FindDefaultDevice() locates the lowest numbered audio device that is not
connected to the telephone, which is usually the local audio device.

        srate = ac->device->playSampleFreq;     /* sample rate */
        type = ac->device->playBufType;         /* encoding type */
        ssize = sample_sizes[type] * channels;  /* bytes per sample */
        buf = malloc(BUFSIZE*ssize);            /* allocate the play buffer */
        nbytes = read(fd, buf, BUFSIZE*ssize);  /* pre-read first buffer */

It is not logically necessary to pre-read the first file block, but doing
so avoids putting the latency of the file read between the call to
AFGetTime() and the first call to AFPlaySamples().

        t = AFGetTime(ac);              /* obtain initial device time */
        t = t + (time_offset * srate);  /* schedule initial playback */
        do {                            /* loop until done or brown on top */
            nact = AFPlaySamples(ac, t, nbytes, buf);  /* send samples */
            /*
             * At this point, the buffers in the server hold the samples
             * from time nact to time t.  Next, we figure out how many
             * samples we read from the file, and schedule the next block
             * to start after this one.
             */
            t += (nbytes / ssize);
        } while ((nbytes = read(fd, buf, BUFSIZE*ssize)) > 0);
    }

This code fragment is the inner loop of aplay. It obtains the current
device time and schedules the playback of the first block of audio.
Thereafter, it schedules each successive block to play directly on the
heels of the previous block, so playback will be continuous. After each
call to AFPlaySamples(), the time pointer is simply incremented by the
number of samples played.

There is no code in aplay for flow control. aplay makes a fundamental, but
unwritten, assumption that the file system can supply audio data faster
than it is required by the server. The audio data will be buffered in the
server until aplay gets about four seconds ahead. At that point, the
connection to the server will block occasionally, keeping aplay about four
seconds ahead. The file I/O side of aplay can block for as long as four
seconds without causing a break in the audio.

8.2 apass

Teleconferencing was one of our major goals in the development of
AudioFile. apass is not a complete teleconferencing application; it simply
records from one device and, after a small delay, plays back on another
device. However, apass addresses two of the fundamental problems of
network teleconferencing:

- Management of end-to-end delay. In teleconferencing, it is important to
  have tight control over the end-to-end delay of the audio connection. If
  the round trip delay exceeds about 300 milliseconds, humans begin to
  have difficulty with conversational dynamics. apass sets up a strict
  delay budget, accounting for the various factors involved.

- Multiple clocks. In a system with multiple audio devices, the different
  devices are usually controlled by different clocks.
  If the transmitting end runs faster than the receiving end, then excess
  samples will accumulate in buffers in between, gradually increasing the
  end-to-end delay. If the transmit clock is slower than the receive
  clock, the buffers will run dry and the playback will break up. apass
  tracks the clock rates and resynchronizes as necessary.

apass reads blocks of samples from the transmit server and schedules their
playback on the receive server. The delay between input and output is made
up of three parts:

- Packetization Delay. The last sample of a block must be recorded before
  the first sample can be played back. The size of the block sets the
  minimum end-to-end delay.

- Transport Delay. apass reads samples from the transmit server and sends
  them to the receive server. The transmission delay, software overhead,
  and rescheduling delays make up the transport delay.

- Anti-Jitter Delay. apass uses explicit control over playback time to
  insert extra delay at the receiving server. This absorbs variation in
  the transport delay, provided that the variation is not larger than the
  anti-jitter delay.

    apass()
    {
        faud = AFOpenAudioConn(faf);    /* open connections to the from */
        taud = AFOpenAudioConn(taf);    /* and to audio servers */

        /* set up audio contexts, find sample size and sample rate */
        fac = AFCreateAC(faud, fdevice, ACRecordGain, &attributes);
        srate = fac->device->playSampleFreq;
        ssize = sample_sizes[fac->device->playBufType]
                  * fac->device->playNchannels;
        tac = AFCreateAC(taud, tdevice, ACPlayGain, &attributes);

        /* establish a value for the delay from record to playback */
        delay_in_samples = srate * delay_in_seconds;

        ft = AFGetTime(fac);            /* get starting times */
        /* playback will start delay_in_samples in the future */
        tt = AFGetTime(tac) + delay_in_samples;

        for (;;) {                      /* record samples and play them back */
            factt = AFRecordSamples(fac, ft, samples_bufsize*ssize, buf,
                                    ABlock);
            tactt = AFPlaySamples(tac, tt, samples_bufsize*ssize, buf);

AFRecordSamples() and AFPlaySamples() accept the parameters ft (from-time)
and tt (to-time) respectively, and return factt (from-actual-time) and
tactt (to-actual-time). factt, the current transmit server time, will be
approximately equal to ft+samples_bufsize, because the pacing flow control
is provided by the transmit server.

            /* average recent delay estimates */
            est_delay = average_recent(tt - tactt);
            /* if the delay has drifted outside of the allowable region,
               then resynchronize the connection */
            if ((est_delay < delay_lower_limit) ||
                (est_delay > delay_upper_limit))
                tt = tactt + delay_in_samples;

            /* finally, update the start times of the next block */
            ft += samples_bufsize;
            tt += samples_bufsize;
        }
    }

apass uses a very simple algorithm for synchronizing clocks. tt - tactt is
an estimate of the current end-to-end delay minus the packetization and
average transport delays. This difference should be about
delay_in_samples, but will vary from this nominal value. apass averages
several recent estimates to determine if the end-to-end delay is within
the range specified by delay_in_samples plus or minus the anti-jitter
specification. If the receive clock is too fast, the average delay will
eventually drift below the lower end of the range. If the receive clock is
too slow, then the delay will drift above the upper end of the range. In
either case, tt is reset to the nominal delay, resynchronizing the
connection. Note that time values from the two audio servers cannot be
directly compared because they have different initial values and slightly
different rates.
apass avoids this problem by tracking the buffering available at the
receiving server. A robust teleconferencing application would likely
choose a more sophisticated algorithm for managing multiple clocks, such
as resynchronizing whenever the audio is quiet, or resampling at the
receive sample rate.

8.3 A Trivial Answering Machine

Separate simple clients can be used to construct interesting applications.
For example, we can implement a simple answering machine as a shell
script.

    #!/bin/sh
    #
    while true; do
        #
        aevents -ringcount 3.0      # wait for the phone to ring three times
        ahs off                     # answer the phone (take off-hook)
        aplay -f -d 0 outgoing_message.snd  # play outgoing message
        aplay -f -d 0 beep.snd              # play a beep
        #
        # record up to 30 seconds, or until the caller stops talking
        #
        arecord -silentlevel -35.0 -d 0 -silenttime 4.0 -l 30.0 -t -1.0 >>messages.snd
        aplay -f -d 0 thanks.snd    # play a thank-you message
        ahs on                      # hang up the phone
        #
        # add a date stamp to the message file using a text to speech
        # synthesizer (not part of AudioFile ...)
        #
        date | tts >>messages.snd
    done                            # Go back and get the next message

8.4 Other AudioFile Applications

There are already several applications which use AudioFile but are not
distributed with it. We mention some interesting examples here.

DECtalk: We built a software-only version of the DECtalk text-to-speech
synthesizer, called tts. The synthesizer generates waveform samples on the
standard output, which we can pipe to aplay for output.

DECspin: DECspin is a Digital product for network audio and video
teleconferencing. DECspin uses AudioFile to provide its voice capability.

VAT: A team at the University of California, led by Van Jacobson, has
built a network teleconferencing application using IP multicast protocols
[4]. VAT can use AudioFile for its audio I/O.

Sphinx: Sphinx II is a continuous speech recognition system [6] developed
at Carnegie Mellon University. We use AudioFile's high fidelity input
capability to supply audio to Sphinx.

9 Performance Results

In this section, we present some performance results for our
implementation of the AudioFile System. First, we measure the time to
complete client library operations. Next, we measure the CPU load for
recording and playback. Finally, we briefly discuss our experience using
TCP as the transport protocol.

9.1 Server and Client Performance

We measured latencies and performance of our AudioFile implementation by
timing various client library functions. We tested with two types of
systems (MIPS and Alpha AXP) under six local and networked configurations:

    alpha          Alpha local client & server
    alpha/mips     Alpha client, MIPS server
    alpha/alpha    Alpha networked client & server
    mips           MIPS local client & server
    mips/mips      MIPS networked client & server
    mips/alpha     MIPS client, Alpha server

We did all testing with the LoFi server, Alofi, using the 8 KHz CODEC
device. All MIPS systems were DECstation 5000/200s running ULTRIX 4.3, and
all Alpha AXP systems were DEC 3000/400s running DEC OSF/1 for Alpha AXP
V1.2. All network testing took place on a lightly loaded 10 Mbit/sec
Ethernet.

Play and Record

The AudioFile library functions that move data have latencies that depend
on the length of the data. Figures 4 and 5 show the elapsed time for
various length AFRecordSamples() and AFPlaySamples() requests on the
different system configurations. The requests were scheduled to hit
entirely in the server's buffers (and therefore not block).

[Figure 4: AFRecordSamples() timings]
[Figure 5: AFPlaySamples() timings]
These times were computed by averaging over 1000 operations.

The timings for short requests represent the basic overhead for a
client/server exchange. In the record case, the jumps at approximately 8K
bytes are due to "chunking" performed in the client library. Record
requests larger than 8K bytes (not samples) are broken into 8K byte chunks
to better control interactions with transport protocol heuristics and to
simplify server implementation. Each record request completes
synchronously: the client library waits for the reply before sending the
next chunk. A 16K byte request therefore takes the same time as two
independent 8K byte requests.

Play requests are chunked in a similar fashion, but the client does not
wait for each chunk to complete. The client sends all chunks to the server
and waits only for the reply to the last one. The resulting play timing is
a nearly linear function of the play request size.

Open Loop Record/Play

The timings of various AudioFile operations have implications for
applications that process audio in real time. Simple applications, such as
playing a file, do not really care how long the operations take to
complete, as long as the throughput exceeds the audio data rate. However,
other applications, such as teleconferencing, depend on minimizing the
time needed to handle samples.

To illustrate some of the implementation limits, we coded a loopback test
that reads samples from a device and writes them back as quickly as
possible. The test uses a non-blocking record function that returns only
what samples are available. The algorithm is shown in this code fragment:

    for (;;) {
        now = AFRecordSamples(ac, next, 8000, buf, ANoBlock);
        length = now - next;
        AFPlaySamples(ac, next+4000, length, buf);
        next = now;
    }

The rate at which this loop iterates is governed entirely by the AudioFile
overhead and represents a limit for handling real-time audio. The average
times to complete one iteration are shown in Table 4.

    Configuration (client/server)    Time (ms)
    alpha                            0.87
    alpha/alpha                      1.27
    alpha/mips                       2.17
    mips                             1.93
    mips/alpha                       2.15
    mips/mips                        3.45

Table 4: Loopback timing

AudioFile's overhead establishes a minimum latency for real-time
applications. However, we feel that AudioFile will be adequate for all but
the most demanding real-time requirements. In some networked
configurations, however, AudioFile's overhead will be small compared to
the network delays. For example, a link across North America has a minimum
15 millisecond propagation time, not including transmission and routing
time.

9.2 CPU Usage

We also investigated the CPU usage for playback and record operations. The
tests consisted of playing and recording 30 seconds of audio at two sample
rates and data types: 8 KHz u-law and 44.1 KHz CD-quality stereo. Tests
were done in a local configuration with the server and client running on
the same machine, using UNIX domain sockets. Table 5 summarizes the server
and client CPU usage for the two cases. Both user and system times (in
seconds) are given, and the percentage column indicates the system load,
computed by dividing total CPU time by the duration. These results include
the costs of transferring data to LoFi using programmed I/O. CPU usage
would be reduced by use of shared memory transport or by a DMA I/O device.
                                      Server      Client
  System        Test case             User  Sys   User  Sys   Total  % Load
  Alpha         8 KHz playback        0.2   0.1   0.0   0.1   0.4    1.2
  DEC 3000/400  8 KHz record          0.1   0.1   0.1   0.1   0.5    1.5
                44.1 KHz playback     2.9   2.0   0.0   0.5   5.4    18.0
                44.1 KHz record       3.7   1.1   0.7   1.4   6.9    22.9
  MIPS          8 KHz playback        0.2   0.2   0.0   0.1   0.5    1.7
  DEC 5000/200  8 KHz record          0.2   0.1   0.2   0.1   0.6    2.0
                44.1 KHz playback     1.9   1.9   0.0   2.6   6.4    21.3
                44.1 KHz record       4.4   4.0   1.1   3.4   12.9   43.0

Table 5: CPU usage for playback and record operations

9.3 Data Transport

AudioFile can be used over almost any transport protocol, though the
details of the protocol may affect real-time audio performance. This
section discusses our experience using TCP as the transport layer.

Although applications such as apass may exercise tight control over
timing, most do not have strong real-time requirements. TCP is usually
sufficient for these applications because the delay caused by
retransmission of lost packets is small compared to the buffering of
unplayed samples. On the other hand, applications like teleconferencing do
require timely delivery of the audio data.

We found that a naively implemented teleconferencing application displayed
serious problems when used over a transcontinental TCP link. We observed
frequent and lengthy dropouts in the audio stream, which were especially
likely with bidirectional data streams. These stem from packet losses
caused by a phenomenon known as "ACK-compression" [10, 20], a subtle
consequence of the use of window-based flow control. The duration of each
dropout is exacerbated by TCP's slow-start algorithm [5], which comes into
play when packets are dropped by the network.

ACK-compression occurs when the spacing between acknowledgments is changed
by delays in the routers. This can cause TCP to send large bursts of
packets, which overrun the buffers in a router, causing packets to be
dropped. Unfortunately, the TCP slow-start algorithm converts these losses
into lengthy recovery periods during which data flows more slowly. On a
connection such as a long-haul T1 circuit, it can take several seconds to
restore full throughput.

TCP is arguably the wrong transport protocol for applications such as
teleconferencing, since it tries to guarantee ordered packet delivery
without any concern for packet delay. Many applications instead need
guarantees on bandwidth and latency, but they may be prepared to accept
some lost data. Networks and protocols that provide such guarantees are
active areas of research. To manage these issues, all of the
teleconferencing applications mentioned in Section 8 are split among
sites, using special protocols over long-haul paths, and only communicate
locally with AudioFile servers.

10 Summary

The AudioFile System provides device-independent, network-transparent
audio services. With AudioFile, multiple audio applications can run
simultaneously, sharing access to the actual audio hardware. Network
transparency means that application programs can run on machines scattered
throughout the network. Because AudioFile permits applications to be
device-independent, applications need not be rewritten to work with new
audio hardware.
10.1 Areas for Further Work

It is remarkably difficult to get something as big as AudioFile completely
right. We are very pleased with our basic design decisions, but we do have
a list of items which, if implemented or fixed, would make AudioFile still
more useful.

- Audio devices should support multiple sample rates, and the library
  should support sample rate changing.

- Audio devices should have an ordered list of supported data formats, so
  that the device can express a preference for one format over another.

- The protocol and library should offer improved support for
  synchronization and conversion between clocks, including clock
  prediction routines and the simultaneous reporting of all device clocks.

- Clients should be able to refer symbolically to "the local loudspeaker"
  or "the telephone".

- Clients which deal with files should know about various popular sound
  file formats.

- The server should support compressed data types, which would make
  AudioFile more useful in low-bandwidth environments.

10.2 Conclusions

We believe that AudioFile has done well in meeting our design objectives:
network transparency, device independence, simultaneous clients,
simplicity, and ease of implementation. We also believe our experience has
validated our principles:

- Client control of time. AudioFile permits both real-time and
  non-real-time audio applications using the same primitives.

- No rocket science. Our decisions to build on top of standard
  communications protocols and not to use threads have improved the
  portability of the system. It is also arguable that AudioFile performs
  so well precisely because of its minimalist underpinnings.

- Simplicity. Simple play and record clients require very little code.
  Indeed, many applications can be constructed using independent AudioFile
  clients organized by shell scripts.

- Computers are fast. We did not let fear of per-sample processing get in
  our way. Our slowest implementation uses less than 2% of the CPU for
  telephone-quality playback.

10.3 How to Get AudioFile

AudioFile can be copied by anonymous FTP from crl.dec.com (192.58.206.2)
in /pub/DEC/AF/AF2R2.tar.Z. Some stereo sound bites are in
AF2R2-other.tar.Z. A more detailed description of the design and
implementation of AudioFile is also available as a technical report [8].

We apologize for the code being cluttered with left-justified chicken
scratches. (You really should read Thompson's paper on the Plan 9 C
compiler [18].) We wanted AudioFile to be portable, but at least one major
vendor does not support function prototypes with their stock C compiler.

We have created an Internet mailing list, af@crl.dec.com, for discussions
of AudioFile. Send a message to af-request@crl.dec.com to be added to this
list.

10.4 Acknowledgments

Many people have contributed to AudioFile. We would like to thank Ricky
Palmer and Larry Palmer for the SPARCstation DDA code. They, along with
Lance Berc and Dave Wecker, persevered as early users of AudioFile. Lance
Berc's experiments with long-distance teleconferencing taught us much
about the network issues. Jeff Mogul offered valuable assistance on
understanding these issues. Guido van Rossum contributed the DDA code for
the Silicon Graphics Indigo only two weeks after we released the first
public distribution. Dick Beane provided useful comments on drafts of this
paper. We would also like to thank Victor Vyssotsky and Mark R. Brown for
putting up with us.

References

[1] Susan Angebranndt, Richard L. Hyde, Daphne Huetu Luong, Nagendra
Siravara, and Chris Schmandt.
Integrating audio and telephony in a distributed workstation environment.
In Proceedings of the USENIX Summer Conference. USENIX, June 1991.

[2] B. Arons, C. Binding, K. Lantz, and C. Schmandt. The VOX audio server.
In Multimedia '89: 2nd IEEE COMSOC International Multimedia Communications
Workshop, 1989.

[3] Edward Bruckert, Martin Minow, and Walter Tetschner. Three-tiered
software and VLSI aid developmental system to read text aloud.
Electronics, Apr. 21, 1983.

[4] Steve Deering and Steve Casner. The first IETF audiocast. ACM
Communications Review, 22(3), July 1992.

[5] Van Jacobson. Congestion avoidance and control. In Proc. SIGCOMM '88
Symposium on Communications Architectures and Protocols, pages 314-329,
Stanford, CA, August 1988.

[6] Kai-Fu Lee. Automatic Speech Recognition: The Development of the
SPHINX System. Kluwer Academic Publishers, Norwell, MA, 1989.

[7] Thomas M. Levergood. LoFi: A TURBOchannel audio module. CRL Technical
Report 93/9, Digital Equipment Corporation, Cambridge Research Lab, 1993.

[8] Thomas M. Levergood, Andrew C. Payne, James Gettys, G. Winfield
Treese, and Lawrence C. Stewart. AudioFile: A network-transparent system
for distributed audio applications. CRL Technical Report 93/8, Digital
Equipment Corporation, Cambridge Research Lab, 1993.

[9] D. L. Mills. Network time protocol (NTP). Internet RFC 958, Network
Information Center, September 1985.

[10] Jeffrey C. Mogul. Observing TCP dynamics in real networks. In Proc.
SIGCOMM '92 Symposium on Communications Architectures and Protocols,
Baltimore, MD, August 1992.

[11] John K. Ousterhout. An X11 toolkit based on the Tcl language. In
Proceedings of the USENIX Winter Conference, January 1991.

[12] Steven J. Rohall. Sonix: A network-transparent sound server. In
Proceedings of the Xhibition 92 Conference, June 1992.

[13] Robert W. Scheifler and James Gettys. X Window System. Digital Press,
Bedford, MA, 3rd edition, 1991.

[14] Henry Spencer. How to steal code -or- inventing the wheel only once.
In Proceedings of the USENIX Winter Conference, pages 335-346. USENIX,
February 1988.

[15] D. C. Swinehart, L. C. Stewart, and S. M. Ornstein. Adding voice to
an office computer network. In Proceedings of GlobeCom 1983, November
1983.

[16] Robert Terek and Joseph Pasquale. Experiences with audio conferencing
using the X window system, UNIX, and TCP/IP. In Proceedings of the USENIX
Summer Conference. USENIX, June 1991.

[17] Charles P. Thacker, Lawrence C. Stewart, and Edwin H. Satterthwaite
Jr. Firefly: A multiprocessor workstation. IEEE Transactions on Computers,
37(8):909-920, Aug. 1988.

[18] Ken Thompson. A new C compiler. In Proceedings of the Summer 1990
UKUUG Conference, pages 41-51, London, July 1990.

[19] Stephen A. Uhler. PhoneStation, moving the telephone onto the virtual
desktop. In Proceedings of the USENIX Winter Conference. USENIX, January
1993.

[20] Lixia Zhang, Scott Shenker, and David D. Clark. Observations on the
dynamics of a congestion control algorithm: The effects of two-way
traffic. In Proc. SIGCOMM '91 Symposium on Communications Architectures
and Protocols, pages 133-147, Zurich, September 1991.

Author Information

Lawrence C. Stewart joined the Cambridge Research Lab (CRL) in 1989 after
5 years at Digital's Systems Research Center. His interests include
speech, audio, and multiprocessors. He was one of the designers of the
first Alpha AXP computer system. Before joining Digital, Larry was at
Xerox PARC.
He received an S.B. from MIT in 1976, and M.S. and Ph.D. degrees from
Stanford in 1977 and 1981, all in Electrical Engineering.

G. Winfield Treese joined CRL in 1988 after working at MIT on Project
Athena. Win's interests are in networks and distributed systems. He
received an S.B. in Mathematics from MIT in 1986 and an S.M. in Computer
Science from Harvard University in 1992. He is now pursuing a Ph.D. at MIT
in the area of computer networks.

James Gettys joined CRL in 1989. Jim's focus at CRL is multimedia audio
and video systems. Before joining CRL, Jim spent two years at the Systems
Research Center. Before that, he was a Digital engineer and visiting
scientist at MIT working on Project Athena. He is one of the two principal
designers and developers of the X Window System. Jim received an S.B. from
MIT in 1978.

Andrew C. Payne joined CRL in 1992 after receiving a B.S. in Electrical
Engineering from Cornell University. As a co-op at Digital in 1990, he
helped build the first Alpha AXP chip. Andy's interests include signal
processing, speech, and user interfaces.

Thomas M. Levergood joined CRL in 1990. Tom's focus is on speech and
audio-related research. He is also involved in Alpha AXP system and
software projects, including an experimental evaluation of split
user/supervisor cache memories. He received his B.S. in Electrical
Engineering in 1984 and M.S. in Electrical Engineering in 1993, both from
Worcester Polytechnic Institute.

All of the authors can be reached at: Digital Equipment Corporation,
Cambridge Research Lab, One Kendall Square, Bldg. 650, Cambridge, MA
02139, or by e-mail as {stewart, treese, jg, payne, tml}@crl.dec.com.