A  Usage  Profile  and  Evaluation
                  of  a  Wide-Area  Distributed  File  System (1)

         Mirjana Spasojevic                        M. Satyanarayanan
        Transarc Corporation                  Carnegie Mellon University


                                    Abstract


The evolution of the Andrew File System (AFS) into a wide-area distributed  file
system  has  encouraged  collaboration  and  information dissemination on a much
broader scale than ever before.  In this paper, we examine AFS as a provider  of
wide-area  file  services  to over 80 organizations around the world. We discuss
usage characteristics of AFS derived from empirical measurements of the  system,
and  from user responses to a questionnaire.  Our observations indicate that AFS
provides robust and efficient data access in  its  current  configuration,  thus
confirming  its  viability  as  a  design  point  for wide-area distributed file
systems.


1. Introduction


Over the last decade, distributed file systems such as AFS and NFS in  the  Unix
world, and Netware and LanManager in the MS-DOS world, have risen to prominence.
Today, virtually every organization with a large collection of personal machines
uses  such  a  system.   The  stunning  success  of  the distributed file system
paradigm is attributable to three factors.


First, a distributed file system simplifies  the  separation  of  administrative
concerns  from usage concerns.  Users work on tasks directly relevant to them on
their personal  machines.   Incidental  but  essential  tasks  such  as  backup,
disaster  recovery, and expansion of disk capacity are handled by a professional
staff who focus primarily on the servers.


__________

 1.     This research was funded by the Advanced Research Projects Agency, under
    contract number MDA972-90-C-0036, ARPA order number 7312. The views and
    conclusions expressed in this paper are those of the authors and do not
    represent the official position of ARPA, Transarc Corporation or Carnegie 
    Mellon University.  
    Please direct correspondence to Mirjana Spasojevic, Transarc Corporation, 
    The Gulf Tower, 707 Grant Street, Pittsburgh, PA 15219.


Second, the use of a distributed file system  simplifies  the  sharing  of  data
within  a  user  community.   Such  sharing  can  arise in two forms:  by a user
accessing his files from different machines,  and  by  one  user  accessing  the
files  of  another  user.   The  ability  to  easily access one's files from any
machine enhances a user's mobility within his organization. Although
accessing someone else's files is not a frequent event (a fact confirmed by
many previous studies [1,6]), ease of access once the need arises  is  perceived
as  a  major  benefit  by users.  In other words, while sharing may be rare, the
payoff of being able to share easily is very high.(2)
 

Third, transparency is preserved from the users'  and  applications'  points  of
view.  Applications do not have to be modified to use a distributed file system.
Because a distributed file system looks just like a local file  system,  a  user
does  not  have to learn a completely new set of commands or new methods of file
usage.


The designs of modern distributed file systems reflect these observations.  They
use  a  client-server  model,  offer  location  transparency, rely on caching to
exploit  locality,  provide  fairly  weak  consistency  semantics  relative   to
databases,  and  support programming and user interfaces that are close to those
of a local file system. The success and widespread usage of these systems
confirm the appropriateness of these design choices.


But this success engenders a new question:   "Is  the  distributed  file  system
paradigm  sustainable  at very large scale?" In other words, how well can a very
large  distributed  file  system  meet   the   goals   of   simplifying   system
administration,   supporting   effective   sharing   of   data,  and  preserving
transparency? Growth brings many problems with it [12]: the level of trust
between  users  is  lowered;  failures  tend to be more frequent; administrative
coordination is more difficult; performance  is  degraded.  Overall,  mechanisms
that  work  well  at  small  scale tend to function less effectively as a system
grows.  Given these concerns, how large can a distributed file system get before
it proves too unwieldy to be effective?
__________

 2. In this respect a distributed file system is like a telephone system:
    although a given individual only tends to call a tiny fraction of all
    telephone numbers, the latent ability to effortlessly reach any other
    telephone in the world is viewed as a major asset of the system.


In  this  paper,  we  seek  to  answer  this  question  by  studying  the  usage
characteristics of AFS, the largest currently deployed instance of a distributed
file  system.  At the time of writing, AFS unites about 1,000 servers and 20,000
clients in 7 countries into a single file name space. We estimate that more than
100,000  users  use  this  system  worldwide.   In geographic span as well as in
number of users and machines, AFS is the largest distributed  file  system  that
has ever been built and put to serious use.


Our study confirms that the distributed file system  paradigm  is  indeed  being
effectively  supported  at  the current scale of AFS. Further, our data does not
expose any obvious impediments to further growth of the system. While asymptotic
limits  to  growth  are  inevitable,  they  do  not appear to be just around the
corner.


2. AFS Background


The rationale, detailed design, and evolution of AFS have been  well  documented
in previous papers [2, 5, 9, 10, 11, 15]. In this section, we provide only
enough detail about the current version of AFS (AFS-3) to make the rest of the
paper understandable.


Using a set of trusted servers, AFS presents a  location-transparent  Unix  file
name  space  to clients.  Files and directories are cached on the local disks of
clients using a consistency mechanism based on callbacks [3].   Directories  are
cached  in  their entirety, while files are cached in 64 KB chunks.  All updates
to a file are propagated to its server upon close. Directory  modifications  are
propagated immediately.
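The callback idea can be made concrete with a toy, single-process model. This is a minimal sketch under stated assumptions, not AFS code: the classes `CallbackServer` and `CachingClient` are hypothetical, whole files stand in for 64 KB chunks, RPCs are plain method calls, and directory updates are not modeled. The server records which clients hold a callback on a file; when one client stores a new version on close, the server breaks the callbacks of the other holders, so their next open refetches the data instead of trusting a stale copy.

```python
# Toy model of callback-based cache consistency (hypothetical classes, not AFS code).


class CallbackServer:
    """File server that remembers which clients hold a callback on each file."""

    def __init__(self):
        self.files = {}      # path -> file contents (bytes)
        self.callbacks = {}  # path -> set of clients holding a callback promise

    def fetch(self, client, path):
        # Hand out the data and promise to notify this client if it changes.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files.get(path, b"")

    def store(self, client, path, data):
        self.files[path] = data
        # Break the callbacks held by every other client caching this file.
        for other in self.callbacks.get(path, set()) - {client}:
            other.break_callback(path)
        self.callbacks[path] = {client}


class CachingClient:
    """Client whose cached copies stay valid until the server breaks a callback."""

    def __init__(self, server):
        self.server = server
        self.cache = {}  # path -> contents, trusted while the callback is held

    def open_read(self, path):
        if path not in self.cache:  # miss, or the callback was broken
            self.cache[path] = self.server.fetch(self, path)
        return self.cache[path]

    def close_write(self, path, data):
        # Updates reach the server when the file is closed.
        self.cache[path] = data
        self.server.store(self, path, data)

    def break_callback(self, path):
        self.cache.pop(path, None)


srv = CallbackServer()
alice, bob = CachingClient(srv), CachingClient(srv)
alice.close_write("/afs/cell/notes", b"v1")
print(bob.open_read("/afs/cell/notes"))  # b'v1', fetched from the server
alice.close_write("/afs/cell/notes", b"v2")
print(bob.open_read("/afs/cell/notes"))  # b'v2', refetched after the callback break
```

In this model a client that holds a valid callback never contacts the server on open, which is what lets cached data be reused without a validity check on every access.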


Backup, disk quota enforcement, and most other administrative operations in  AFS
operate  on volumes [13].  A volume is a set of files and directories located on
one server and forming a partial subtree of the shared name  space.   A  typical
installation  has  one volume per user,  one or more volumes per project,  and a
number of volumes containing system software.  The distribution of these volumes
across  servers is an administrative decision.  Volumes that are frequently read
but rarely modified (such as system binaries) may  have  read-only  replicas  at
multiple servers to enhance availability and to evenly distribute server load.


AFS uses an access list mechanism for protection.  The granularity of protection
is  an  entire  directory  rather than individual files. Users may be members of
groups, and access lists may specify rights for users and groups. Authentication
relies on Kerberos [16].
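Directory-granularity protection can be sketched as follows. The sketch assumes the seven standard AFS rights letters (r read, l lookup, i insert, d delete, w write, k lock, a administer) and computes a user's rights on a directory as the union of the access list entries that name the user or one of the user's groups. The helpers `effective_rights` and `may`, and the `groups_of` table standing in for the protection server, are hypothetical; AFS features such as negative rights are omitted, and group membership that AFS treats as implicit is listed explicitly here.

```python
# Directory-level access check (hypothetical helpers; negative rights omitted).
# Standard AFS rights letters: r=read, l=lookup, i=insert, d=delete,
# w=write, k=lock, a=administer.
AFS_RIGHTS = frozenset("rlidwka")


def effective_rights(acl, user, groups_of):
    """Return the set of rights `user` has on one directory.

    acl:       dict principal (user or group name) -> rights letters for the
               directory that contains the files of interest
    groups_of: dict user -> set of group names, a stand-in for the
               protection server's membership lookup
    """
    principals = {user} | groups_of.get(user, set())
    granted = set()
    for principal, letters in acl.items():
        if principal in principals:
            granted |= set(letters) & AFS_RIGHTS
    return granted


def may(acl, user, groups_of, right):
    """True if `user` holds `right` on the directory described by `acl`."""
    return right in effective_rights(acl, user, groups_of)


acl = {"alice": "rlidwka", "proj:members": "rliw", "system:anyuser": "l"}
groups = {"bob": {"proj:members", "system:anyuser"}, "carol": {"system:anyuser"}}
print(may(acl, "bob", groups, "w"))    # True, granted through proj:members
print(may(acl, "carol", groups, "r"))  # False, carol may only list the directory
```

Because the check takes the access list of the containing directory, every file in that directory receives the same answer, which is the granularity described above.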


AFS supports multiple administrative cells, each with its own servers,  clients,
system   administrators  and  users.   Each  cell  is  a  completely  autonomous
environment.  But a federation of cells can cooperate in presenting users with a
uniform, seamless file name space. The ability to decompose a distributed system
into cells simplifies delegation of administrative responsibility [15].


As originally designed, AFS was intended for a LAN. However, the RPC protocol
currently used in AFS has been designed to perform well both on LANs and on
wide-area networks. In conjunction with the cell mechanism, this has made
possible shared access to a common, world-wide file system distributed over
nodes in many countries.


In 1990 the Advanced Research Projects Agency (ARPA) awarded Transarc a contract
to  deploy and evaluate a file system to be shared by 40 to 50 Internet sites in
the US. By mid-1991 there were 14 organizations included in the study.   At  the
time  of  writing this paper, more than 80 organizations were part of this wide-
area distributed file system (wadfs).


The wide-area nature of AFS is clearly visible from Figure 1,  which  shows  the
cells visible at the topmost level of AFS. All these directories, as well as the
trees beneath them, are accessible  via  normal  Unix  file  operations  to  any
workstation anywhere in the system.

      cs.arizona.edu      theory.cornell.edu     soup.mit.edu   spc.uchicago.edu
        cs.brown.edu    kiewit.dartmouth.edu    watch.mit.edu           ucop.edu
              bu.edu northstar.dartmouth.edu         ncat.edu         ni.umd.edu
             cmu.edu             iastate.edu     eos.ncsu.edu        wam.umd.edu
      andrew.cmu.edu         ucs.indiana.edu           nd.edu          umich.edu
     club.cc.cmu.edu                 isi.edu  nsf-centers.edu     citi.umich.edu
          ce.cmu.edu        alefnull.mit.edu         pitt.edu math.lsa.umich.edu
          cs.cmu.edu          athena.mit.edu          psc.edu      lsa.umich.edu
         ece.cmu.edu  rel-eng.athena.mit.edu  rose-hulman.edu         cs.unc.edu
         sei.cmu.edu       media-lab.mit.edu          rpi.edu    css.cs.utah.edu
      cs.cornell.edu             net.mit.edu dsg.stanford.edu  cs.washington.edu
graphics.cornell.edu            sipb.mit.edu  ir.stanford.edu


                            (a) educational cells


             ads.com  ctp.se.ibm.com           prc.unisys.com         gr.osf.org
          bstars.com      mtxinu.com  stars.reston.unisys.com         ri.osf.org
           cards.com       locus.com        grand.central.org     syseng.osf.org
      pub.nsa.hp.com       stars.com               ciesin.org
palo_alto.hpl.hp.com    transarc.com              dce.osf.org


                             (b) commercial cells


           inel.gov      alw.nih.gov                ssc.gov
          nersc.gov     ctd.ornl.gov       cmf.nrl.navy.mil


                             (c) government cells


jrc.flinders.oz.au      uni-freiburg.de        etl.go.jp pegasus.cranfield.ac.uk
    glade.yorku.ca rus.uni-stuttgart.de others.chalmers.se       athena.ox.ac.uk
   writer.yorku.ca       sfc.keio.ac.jp        nada.kth.se
   lrz-muenchen.de         titech.ac.jp          bcc.ac.uk


                             (d) cells outside the US

This figure shows the cells visible from a typical client  in  the  system.  The
listing  above  was  obtained  by doing an "ls /afs" and then sorting the output
according to the domain.  As the figure shows, there are 47  educational  cells,
18 commercial, 6 governmental, and 14 cells outside the United States.


               Figure 1: Cells visible from a typical AFS client.
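The grouping in Figure 1 can be reproduced mechanically from the names returned by `ls /afs`. The rule below is an assumption inferred from the figure rather than anything AFS defines: `.edu` cells count as educational, `.com` and `.org` cells as commercial (as the figure lists them), `.gov` and `.mil` as governmental, and any two-letter country-code domain as outside the US. Sorting by domain components read right to left is likewise an assumption that only approximates the figure's ordering. The function names are hypothetical.

```python
from collections import Counter


# Hypothetical classification rule inferred from Figure 1.
def classify(cell):
    """Map an AFS cell name to one of the categories used in Figure 1."""
    tld = cell.rsplit(".", 1)[-1].lower()
    if tld == "edu":
        return "educational"
    if tld in ("com", "org"):
        return "commercial"
    if tld in ("gov", "mil"):
        return "government"
    if len(tld) == 2:  # two-letter country-code domain
        return "outside US"
    return "other"


def by_category(cells):
    """Group cell names by category, sorting by domain read right to left."""
    groups = {}
    for cell in sorted(cells, key=lambda c: c.lower().split(".")[::-1]):
        groups.setdefault(classify(cell), []).append(cell)
    return groups


def tally(cells):
    """Count cells per category."""
    return Counter(classify(cell) for cell in cells)


sample = ["cs.cmu.edu", "transarc.com", "nersc.gov", "athena.ox.ac.uk"]
print(tally(sample))
```

On a client with AFS mounted, `tally(os.listdir("/afs"))` would give per-category counts of the kind quoted in the figure text.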


3.  Evaluation Methodology


A comprehensive characterization of this system would include an  assessment  of
basic architectural features, an analysis of quantitative data from the deployed
system, and an examination of qualitative information reflecting on issues  such
as user perceptions of quality.


Since earlier papers have explored the architecture of AFS in detail, we omit it
from   this  paper.   Here  we  report  on  AFS  from  two  angles:   first,  by
instrumenting clients and servers and collecting data over  a  period  of  time;
second,  by circulating a questionnaire on various aspects of AFS to a sample of
users and summarizing their responses.  We  believe  that  this  combination  of
quantitative  and qualitative information fairly characterizes the current state
of the system.


One's confidence in the answers of an evaluation can  be  classified  into  four
levels based on the origin of the information:  intrinsic (direct examination of
the system design), empirical (raw measurements), evidentiary (inferences  based
on  raw  data),  and  anecdotal  (information requiring user judgment).  In this
taxonomy, our quantitative information is empirical and  evidentiary  while  our
qualitative information is anecdotal.


3.1   Quantitative Data


Empirical measurements of AFS were performed through the xstat  data  collection
facility [17]. The AFS code was instrumented to allow collection of extended
statistics concerning the operation of servers and  clients.   These  statistics
could  be obtained remotely via an RPC call.  A central data collection machine,
located at Transarc, polled and obtained data from  each  participating  machine
four  times  a  day.   The  collected  data  was  formatted  and inserted into a
relational database for postprocessing.  Figure 2 shows  the  structure  of  our
data collection mechanism.


                 Figure 2: Instrumentation for Data Collection
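The polling side of Figure 2 amounts to a periodic loop that pulls counters from each machine and appends them to a relational table. The sketch below assumes a hypothetical `fetch_stats(host)` callable standing in for the xstat RPC, and uses an SQLite table with an invented schema; only the four-polls-per-day interval comes from the text. Hosts that cannot be reached, including those in cells that have turned data gathering off, are simply skipped.

```python
import sqlite3
import time

POLL_INTERVAL_S = 6 * 60 * 60  # four collection rounds per day


def open_db(path=":memory:"):
    """Open the results database; the schema is a hypothetical example."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS samples "
               "(ts INTEGER, host TEXT, counter TEXT, value INTEGER)")
    return db


def poll_once(db, hosts, fetch_stats):
    """Run one collection round over all known hosts.

    fetch_stats(host) is a hypothetical stand-in for the xstat RPC: it returns
    a dict {counter_name: value} or raises OSError if the host cannot be reached.
    """
    now = int(time.time())
    for host in hosts:
        try:
            stats = fetch_stats(host)
        except OSError:
            continue  # host down or collection disabled by its cell; skip it
        db.executemany(
            "INSERT INTO samples (ts, host, counter, value) VALUES (?, ?, ?, ?)",
            [(now, host, name, value) for name, value in stats.items()],
        )
    db.commit()


def collect_forever(db, hosts, fetch_stats):
    """Poll every POLL_INTERVAL_S seconds until interrupted."""
    while True:
        poll_once(db, hosts, fetch_stats)
        time.sleep(POLL_INTERVAL_S)
```

Keeping the raw counters in one long table leaves aggregation, such as the per-call distributions reported later, to queries run at postprocessing time.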


The  scale  of  the  system  complicated  the  logistics  of   data   collection
considerably.   It  would have been practically infeasible to require the active
cooperation of users or system administrators at many different cells to  assist
in   the  data  collection.   Hence  our  instrumentation  required  no  regular
administrative  effort  by  the  sites  being  monitored.  However,  the  system
administrator  of a cell could turn off data gathering if that cell did not wish
to participate in the study.


Not requiring the active cooperation of remote cells complicated the process  of
discovering  which  clients and servers should be contacted for data collection.
Our solution to this problem was to run  a  discovery  process  once  every  few
weeks.  This  process  queried  the Domain Name Service at each cell to obtain a
list of registered IP addresses. This list was then probed to discover  new  AFS
clients and servers in that cell.
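The discovery step reduces to probing every address the DNS lists for a cell and keeping the ones that answer as AFS machines not already being polled. In this sketch `probe(addr)` is a hypothetical helper; one plausible implementation would check for a response on the AFS file server port (UDP 7000) or the cache manager callback port (UDP 7001), but the text does not say how probing was done.

```python
def discover(addresses, known, probe):
    """Return {address: role} for AFS machines not yet being polled.

    addresses: IP addresses registered in the cell's DNS zone
    known:     dict address -> role for machines already being polled
    probe:     hypothetical callable; probe(addr) returns "server",
               "client", or None if nothing AFS-like answers
    """
    found = {}
    for addr in addresses:
        if addr in known:
            continue  # already polled; no need to probe again
        role = probe(addr)
        if role is not None:
            found[addr] = role
    return found


# Example run with a canned probe instead of real network traffic.
known = {"10.0.0.1": "server"}
answers = {"10.0.0.2": "client", "10.0.0.3": None}
new = discover(["10.0.0.1", "10.0.0.2", "10.0.0.3"], known, answers.get)
known.update(new)
print(new)  # {'10.0.0.2': 'client'}
```

Skipping addresses that are already known keeps each periodic discovery run proportional to the number of new hosts it actually has to probe.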


The measurements were conducted during a 12-week  data  collection  period  from
mid-May to mid-August 1993.  Our data spans 50 file servers and 300 clients from
12 cells in 7 states.  The only  factors  limiting  broader  coverage  were  the
deadlines  for  this  paper, and the need for participating sites to pick up the
versions of AFS software incorporating our instrumentation.


3.2    Qualitative Data


To complement the quantitative data obtained by instrumentation, we  constructed
a  questionnaire  that touched upon a diverse set of issues.  The purpose of the
questionnaire was to elicit user perceptions as well as to obtain a  profile  of
AFS  usage.  The  topics of interest to us included characterization of the user
community,  extent  of  usage  of  native  and  foreign  cells,  and  degree  of
collaboration  within  and  across  cells.  We were also interested in obtaining
user perceptions of performance and reliability of AFS for  native  and  foreign
cell  access.  Finally,  we were interested in the value and adequacy of various
AFS mechanisms such as access control lists,  read-only  replication,  and  data
mobility.


The questionnaire was distributed in two ways:  first,  by  posting  on  several
Netnews  bboards;  second, by direct mailing to AFS contacts in different cells.
We received about 100 responses from 50 cells. The data we present in this paper
is averaged over all these responses.


4.    Observations and Analysis


In this section we present both  quantitative  and  qualitative  data  collected
during  our  12-week  study.   We  begin  by examining storage capacity and user
profile. We then discuss the nature of client-server interaction, including  RPC
traffic  and  bulk  data  transfers.  Next,  we  explore  cache  performance and
availability, two key parameters of any distributed system. Finally, we  examine
the extent to which AFS is used for collaboration and information dissemination.
In discussing these issues, we interleave the  results  of  both  empirical  and
anecdotal  evidence,  pointing  out  corroborations  and contradictions wherever
appropriate.


4.1.     AFS Usage


4.1.1.      Data Profile


Table 1 shows a recent snapshot of the data stored at 17 cells. (3) These cells
comprised 95 file servers, housing almost 50,000 volumes and constituting over
300 GB of data. The data shows that although over 50% of the volumes belong to
individual users, they contain only 23% (73 GB) of the data. A third of the data
(over 100 GB) belongs to backup volumes. Only 4.2% of the volumes are readonly
replicas, and they contain only 7.7% of the data. The remaining 15% of the
volumes correspond to system binaries and data, bulletin boards, and other
miscellaneous data. Together, these volumes contain one third of the total
data.


     ___________________________________________________________
     |__Volume_type__|_Volumes__|__Size_(GB)__|__Avg_(MB/vol)__|
     |  User         |  25,630  |      73     |       2.9      |
     |  Backup       |  14,557  |     105     |       7.2      |
     |  Readonly     |   2,121  |      24     |      11.4      |
     |__Other________|___7,595__|_____111_____|______14.6______|
     |__ALL__________|__49,903__|_____313_____|_______6.3______|

                    Table 1: Storage Capacities of 17 Cells
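The percentages quoted above follow directly from Table 1. This short calculation recomputes each volume type's share of volumes and of data; it agrees with the text to within rounding.

```python
# Table 1 rows: volume type -> (number of volumes, total size in GB).
TABLE1 = {
    "User":     (25_630,  73),
    "Backup":   (14_557, 105),
    "Readonly": (2_121,   24),
    "Other":    (7_595,  111),
}


def shares(table):
    """Percentage of all volumes and of all data for each volume type."""
    total_vols = sum(vols for vols, _ in table.values())
    total_gb = sum(gb for _, gb in table.values())
    return {kind: (100 * vols / total_vols, 100 * gb / total_gb)
            for kind, (vols, gb) in table.items()}


for kind, (pct_vols, pct_data) in shares(TABLE1).items():
    print(f"{kind:9s} {pct_vols:5.1f}% of volumes  {pct_data:5.1f}% of data")
```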


Extrapolating from this evidence,  and  from  additional  information  from  the
questionnaire,  we  estimate  that  the  whole  wadfs contains more than 200,000
volumes with 1.5-2 TB of data. It is  interesting  to  note  that  although  the
average volume size is only 6.3 MB, the raw data indicates that some volumes
contain more than 1.5 GB of data. In other words, volumes span a wide range of
sizes but tend to be skewed toward the low end.


A related but distinct question pertains to how many of these volumes are in
active use every day. To answer this question, we recorded the number of
volumes whose activity level exceeded a specified threshold each day for the
duration of our data collection. The threshold was arbitrarily chosen to be
10 read references to a volume. Our data showed that, on average, a server has
65 active volumes, each containing about 16 MB of data.
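The daily activity count can be expressed as a filter over per-volume read counts. The data layout and function names below are hypothetical, and whether a volume with exactly 10 reads counts as active is not stated, so the strict comparison used here is an assumption.

```python
def active_volumes(daily_reads, threshold=10):
    """Volumes whose read references on one day exceed `threshold`.

    daily_reads: dict volume name -> read references seen that day.
    Strictly greater than is an assumption; the text does not say
    whether exactly `threshold` reads counts as active.
    """
    return {vol for vol, reads in daily_reads.items() if reads > threshold}


def mean_active_per_server(samples, threshold=10):
    """Average number of active volumes over (server, day) samples.

    samples: list of (server, day, daily_reads) tuples (hypothetical layout).
    """
    counts = [len(active_volumes(reads, threshold)) for _, _, reads in samples]
    return sum(counts) / len(counts) if counts else 0.0


samples = [
    ("fs1", 1, {"user.a": 50, "user.b": 3, "proj.x": 11}),
    ("fs2", 1, {"root.cell": 10, "sys.bin": 200}),
]
print(mean_active_per_server(samples))  # 1.5
```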

__________

 3.     These 17 cells were a superset of the 12 from which all other statistics
    in this paper are reported. We were able to obtain a larger sample in this
    case because the necessary instrumentation was present in an earlier release
    of AFS.


4.1.2.      User Profile


The AFS user community consists of a number of academic, government and
commercial sites, and AFS users tend to have very diverse backgrounds. However,
responses to our questionnaire came mostly from AFS contacts, who are usually
system administrators (Figure 3). (4)
__________

 4.     The percentages for some questions do not add up to 100% because some
    respondents did not answer particular questions or they marked more than one
    choice.

The majority of respondents use AFS daily and for most of them the  typical  AFS
session  lasts  a  full  working  day.  Most of them are serious programmers and
two-thirds of them rate their knowledge of AFS to be at an  advanced  or  expert
level.  Most of them had experience with other distributed file systems, usually
NFS.  Our  sample  thus  represents  a  technically   sophisticated   group   of
respondents.   This  renders their assessments of AFS quality more credible, but
also leaves unanswered the question of how naive users view AFS.


1. What is your occupation?         5. How would you rate your knowledge of AFS?
  9%   Student                        2%  Novice
 16%   Researcher/Scientist          28%  Intermediate
 32%   Software Developer            50%  Advanced intermediate
  8%   Manager                       17%  Expert
 11%   Support Staff
 49%   System Administrator         6. What other distributed file systems have
  2%   Other                           you worked with?
                                      89% NFS
2. How often do you typically         17% Apollo Domain
use AFS?                               7% RFS
  0%   Never                           9% Other(s)
  1%   Rarely
 11%   Periodically                 7. What's the best description of how you
 85%   Daily                           use AFS with the other file service
                                       resources at your site?
3. How long does your typical AFS     0%  Don't use AFS at all
session last?                         5%  Use existing files in AFS, but none of
  6%   Under 30 min                       my files are there
  5%   30 min to 1 hr                13%  Store some of my files in AFS,
 12%   1 to 3 hr                          most on other systems
 74%   Full working day              14%  Store many of my files in AFS
                                     66%  Most of my files are in AFS,
                                          including my home directory
4. Which best describes the depth
of your general computing experience?
  1%   Novice
 15%   Casual programmer
 81%   Serious programmer
  2%   Non-technical user

                   Figure 3: A Profile of Survey Participants


4.2.     Client-Server Interaction Profile


How do AFS clients and  servers  interact?   The  answer  to  this  question  is
important  because  knowledge  of  the  relative distribution of file system RPC
calls helps characterize a normal system and identifies the most  common  calls.
This,  in  turn,  allows  performance  tuning  to be focused. Figure 4 lists the
client-server RPC calls with short descriptions.


    Fetch_Data   Returns data of the specified file or directory
                 and places a callback on it.
     Fetch_ACL   Returns the content of the specified file's or directory's
                 access control list.
  Fetch_Status   Returns the status of the specified file or directory and
                 places a callback on it.
    Store_Data   Stores data of the specified file or directory and updates
                 the callback.
     Store_ACL   Stores the content of the specified file's or directory's
                 access control list.
  Store_Status   Stores the status of the specified file or directory and
                 updates the callback.
   Remove_File   Deletes the specified file.
   Create_File   Creates a new file and places a callback on it.
        Rename   Changes the name of a file or directory.
       Symlink   Creates a symbolic link to a file or directory.
          Link   Creates a hard link to a file.
      Make_Dir   Creates a new directory.
    Remove_Dir   Deletes the specified directory which must be empty.
      Set_Lock   Locks the specified file or directory.
   Extend_Lock   Extends a lock on the specified file or directory.
  Release_Lock   Unlocks the specified file or directory.
   GiveUp_Call   Specifies a file that a cache manager has flushed from its
                 cache.
  Get_Vol_Info   Returns the name(s) of servers that store the specified volume.
Get_Vol_Status   Returns the status information about the specified volume.
Set_Vol_Status   Modifies status information on the specified volume.
      Get_Time   Synchronizes the workstation clock and checks if servers
                 are alive.
   Bulk_Status   Same as Fetch_Status but for a list of files or directories.


                       Figure 4: Client-Server RPC Calls


Both servers and clients  have  been  instrumented  to  record  the  information
regarding these calls. They keep statistics about the total number of calls, the
number of successful calls and the average time of execution of successful calls
(with the standard deviation).  During our study, statistics were collected from
46 file servers and 264 clients on a typical day.


4.2.1.      RPC Calls Observed by Servers


Over 440 million calls were observed during the data  collection  period.  About
86%  of  these  were  successful.  Table 2 summarizes the detailed statistics of
calls accounting for at least 1% of the total.


______________________________________________________________________________
|__Type_of_call_______|____%___|___#_of_calls___(%_err.)__|Avg_ms______(s.d.)_|
|__1.___Fetch_Data____|___7.6__|__33,427,405____(0.2)_____|_____116____(486)__|
|__2.___Fetch_Status__|__67.0__|_295,247,833____(18.0)____|______12____(378)__|
|__3.___Store_Data____|___4.0__|__17,336,400____(1.0)_____|_____157____(744)__|
|__4.___Store_Status__|___8.7__|__38,399,197____(0.3)_____|_______3____(119)__|
|__5.___Remove_File___|___1.9__|___8,172,106____(0.0)_____|______40____(335)__|
|__6.___Create_File___|___2.0__|___8,945,032____(15.7)____|______22____(545)__|
|__7.___Extend_Lock___|___1.8__|___7,815,294____(73.1)____|_______9____(291)__|
|__8.___GiveUp_Call___|___1.6__|___6,839,076____(0.0)_____|_______1____(39)___|
|__9.___Get_Time______|___3.2__|__14,210,834____(0.0)_____|_______4____(800)__|
|__ALL________________|_100.0__|_440,778,197____(13.8)____|_________n/a_______|

         Table 2: Average Distribution of RPC Calls Observed by Servers


The most frequent call is Fetch_Status. We conjecture that many of these calls
are  generated by users listing directories in parts of the file name space that
they do not have cached. The relatively high number of unsuccessful calls  (18%)
suggests  that these directories belong to some protected areas of the file name
space. It is interesting to note that despite caching, the number of  Fetch_Data
calls  is  considerably  higher  than  the  number  of  Store_Data  calls.  Both
Fetch_Data and Store_Data calls take considerably longer than other  operations.
This is to be expected, since they involve disk I/O.
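Frequency and latency can be combined into a rough estimate of where server time goes. The calculation below multiplies the number of successful calls in Table 2 by their average latency; it ignores the time spent on failed calls and any overlap between concurrent calls, so it is illustrative only. On these figures Fetch_Data, although under 8% of calls, accounts for the largest share of estimated server time, followed by Fetch_Status and Store_Data.

```python
# Server-side figures from Table 2:
# call -> (number of calls, % failed, average ms of successful calls).
TABLE2 = {
    "Fetch_Data":   (33_427_405,   0.2, 116),
    "Fetch_Status": (295_247_833, 18.0,  12),
    "Store_Data":   (17_336_400,   1.0, 157),
    "Store_Status": (38_399_197,   0.3,   3),
    "Remove_File":  (8_172_106,    0.0,  40),
    "Create_File":  (8_945_032,   15.7,  22),
    "Extend_Lock":  (7_815_294,   73.1,   9),
    "GiveUp_Call":  (6_839_076,    0.0,   1),
    "Get_Time":     (14_210_834,   0.0,   4),
}


def busy_seconds(table):
    """Estimated server time per call type: successful calls x average latency."""
    return {call: calls * (1 - pct_failed / 100) * avg_ms / 1000
            for call, (calls, pct_failed, avg_ms) in table.items()}


ranking = sorted(busy_seconds(TABLE2).items(), key=lambda kv: kv[1], reverse=True)
for call, seconds in ranking[:3]:
    print(f"{call:13s} {seconds / 3600:8.0f} hours")
```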


GiveUp_Call turned out to be the call that takes the least amount of time on
average.  It was even faster than the Get_Time call, which is the simplest call.
Considering the very high standard deviation of the Get_Time call, this might be
just  an  anomaly in the collected data, but it can also be the result of a slow
system call to get the time.


Although Fetch_ACL is not shown in Table 2, our raw data showed  that  it  takes
considerably more time on average than Fetch_Status. This surprised us, since
Fetch_Status also returns access list information. This apparent anomaly was
explained  when  inspection  of  the  AFS code showed that the implementation of
Fetch_ACL contains a call to a protection server, while  the  implementation  of
Fetch_Status does not.


Analysis of RPC calls on a weekly basis  confirms  that  their  distribution  is
stable  over  time.  Table  3  presents  this  data.  This  data  shows only two
significant deviations from the general profile shown in Table 2. One anomaly is
the very high number of Store_Status calls during weeks 10 and 11. We discovered
that more than 90% of these calls were concentrated on  three  file  servers  at
Transarc.  Further investigation revealed that these servers are frequently used
for testing new AFS releases, thus explaining the unusual distribution of calls.


The second anomaly is the unusually high number of Extend_Lock calls during week
4.   This is usually a rarely-occurring call, typically accounting for less than
1% of the calls in other weeks.  Detailed analysis of week 4's data showed  that
the  majority  of  these  Extend_Lock  calls  were concentrated on just one file
server. Our hypothesis is that there was an orphaned process on one of the
clients  repeatedly  trying  to make an Extend_Lock call, but failing because of
expired authentication tickets.  This  also  explains  the  high  percentage  of
failed Extend_Lock calls in Table 2.


Based on this data, one can loosely characterize a normally running system as
one with a very high proportion (above 60%) of Fetch_Status calls, and a smaller,
but still significant, proportion of Fetch_Data and Store_Status calls (about 8%
each). Other frequent calls in such a system include Store_Data and Get_Time.


week Fetch_D Fetch_S Store_D Store_S Remove_F Create_F Extend_L GiveUp_C Get_T
1     8.4     73.2      3.2    2.0     1.2      1.4       3.1      1.8    4.0
2     8.1     71.5      3.4    4.9     1.5      1.6       1.0      1.8    3.9
3     8.2     71.6      3.6    3.7     1.3      1.7       2.2      1.5    4.4
4     7.5     62.3      3.5    4.7     1.4      1.7      12.0      1.3    3.5
5     7.1     76.9      3.3    2.4     1.2      1.6       0.5      1.4    3.7
6     7.3     70.9      4.0    6.3     2.2      2.4       0.3      1.4    2.6
7     7.4     71.0      4.0    5.8     1.7      2.2       0.5      1.8    3.6
8     8.7     66.7      4.2    7.3     2.0      2.5       0.4      1.6    2.8
9     7.1     72.9      3.3    6.4     1.5      1.6       0.4      1.6    3.1
10    7.3     53.6      4.8   21.1     2.8      2.4       0.3      1.3    3.2
11    7.2     52.9      5.4   22.1     2.8      2.6       0.4      1.3    2.5
12    7.0     74.5      3.2    6.1     1.2      1.6       0.6      1.9    2.5
all   7.6     67.0      4.0    8.7     1.8      2.0       1.8      1.5    3.2

This table is based on the same raw data as Table 2, but gives weekly averages
(in percentages) rather than averages across all weeks. Column headings abbreviate
the call names used in Table 2.


           Table 3: Weekly RPC Call Distributions Observed by Servers
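Spotting weeks like 4, 10 and 11 can be automated with a simple heuristic. The sketch below is not the method used in this study: it flags a week when any call's share differs from that call's median weekly share by more than 10 percentage points. The median is used instead of the 12-week average because the average itself is pulled toward the anomalous weeks; the 10-point threshold is an arbitrary choice.

```python
from statistics import median

CALLS = ["Fetch_D", "Fetch_S", "Store_D", "Store_S", "Remove_F",
         "Create_F", "Extend_L", "GiveUp_C", "Get_T"]

# Weekly percentages from Table 3, columns in CALLS order.
WEEKS = {
    1:  [8.4, 73.2, 3.2,  2.0, 1.2, 1.4,  3.1, 1.8, 4.0],
    2:  [8.1, 71.5, 3.4,  4.9, 1.5, 1.6,  1.0, 1.8, 3.9],
    3:  [8.2, 71.6, 3.6,  3.7, 1.3, 1.7,  2.2, 1.5, 4.4],
    4:  [7.5, 62.3, 3.5,  4.7, 1.4, 1.7, 12.0, 1.3, 3.5],
    5:  [7.1, 76.9, 3.3,  2.4, 1.2, 1.6,  0.5, 1.4, 3.7],
    6:  [7.3, 70.9, 4.0,  6.3, 2.2, 2.4,  0.3, 1.4, 2.6],
    7:  [7.4, 71.0, 4.0,  5.8, 1.7, 2.2,  0.5, 1.8, 3.6],
    8:  [8.7, 66.7, 4.2,  7.3, 2.0, 2.5,  0.4, 1.6, 2.8],
    9:  [7.1, 72.9, 3.3,  6.4, 1.5, 1.6,  0.4, 1.6, 3.1],
    10: [7.3, 53.6, 4.8, 21.1, 2.8, 2.4,  0.3, 1.3, 3.2],
    11: [7.2, 52.9, 5.4, 22.1, 2.8, 2.6,  0.4, 1.3, 2.5],
    12: [7.0, 74.5, 3.2,  6.1, 1.2, 1.6,  0.6, 1.9, 2.5],
}


def anomalous_weeks(weeks, threshold=10.0):
    """Map week -> calls whose share is more than `threshold` percentage
    points away from that call's median share over all weeks."""
    medians = [median(row[i] for row in weeks.values()) for i in range(len(CALLS))]
    flagged = {}
    for week, row in sorted(weeks.items()):
        outliers = [CALLS[i] for i, share in enumerate(row)
                    if abs(share - medians[i]) > threshold]
        if outliers:
            flagged[week] = outliers
    return flagged


print(anomalous_weeks(WEEKS))
```

On the Table 3 data this flags Extend_Lock in week 4 and Fetch_Status and Store_Status in weeks 10 and 11, matching the two anomalies discussed above.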


4.2.2.      RPC Calls Generated by Clients


The set of machines from which we were  collecting  data  did  not  represent  a
"closed  system",  i.e.  there  was  no guarantee that participating servers and
clients were contacting only each other.  Thus, the number of calls observed  by
file  servers  does  not  match  the  number  of  calls  generated  by  clients.
Nevertheless,  it  is  interesting  to  compare  these  two  profiles.  Table  4
summarizes the data collected from clients.


______________________________________________________________________________
|__Type_of_call_______|_____%___|__#_of_calls___(%_err.)__|Avg_ms______(s.d.)_|
|__1.___Fetch_Data____|___7.6___|__9,141,014____(0.5)_____|_____158____(614)__|
|__2.___Fetch_Status__|__54.4___|_65,450,704____(14.5)____|______56____(540)__|
|__3.___Store_Data____|__17.3___|_20,816,713____(0.0)_____|______65____(332)__|
|__4.___Store_Status__|___9.8___|_11,806,214____(0.3)_____|______30____(209)__|
|__5.___Remove_File___|___1.0___|__1,260,655____(0.2)_____|______61____(342)__|
|__6.___Create_File___|___1.5___|__1,866,759____(12.6)____|______55____(642)__|
|__7.___GiveUp_Call___|___2.2___|__2,680,433____(0.0)_____|______65____(434)__|
|__8.___Get_Time______|___3.5___|__4,188,167____(8.0)_____|______33____(661)__|
|__ALL________________|_100.0___|120,192,852____(8.5)_____|_________n/a_______|

        Table 4: Average Distribution of RPC Calls Generated by Clients


There were over 120 million calls, out of which 91.5%  were  successful.  Again,
Fetch_Status  calls  dominate.   But  the relative percentage of these calls was
significantly lower than that reported in Table 2  for  servers.   At  the  same
time,  the  relative  percentage  of  Store_Data calls was significantly higher.
Examination of the raw data showed that most of the Store_Data calls came from a set
of eight machines belonging to one cell.  We conjecture that the applications on
these machines differed  substantially  from  the  norm  in  their  file  access
patterns.  When  these machines are excluded from the data set, the frequency of
Fetch_Status calls increases to 62% and the frequency of Store_Data calls  drops
to 5%.  The frequencies of other calls are similar to those reported in Table 2.


Surprisingly, Table 4 shows the average Store_Data call to be much faster than
the average Fetch_Data call. It is even faster than the average Store_Data call
observed by servers (Table 2), which would imply a negative network delay! This
anomaly is also caused by the above-mentioned group of eight clients. When they
are excluded from the analysis, the average time of Store_Data calls increases to
a more credible 149 ms.
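A per-call comparison of client-side and server-side averages makes this kind of anomaly easy to spot. The sketch subtracts the server-side average latency (Table 2) from the client-side average (Table 4) for each call type both tables report. Because the measured machines did not form a closed system, the difference is only a sanity check, not a measurement of network delay.

```python
# Average latency (ms) of successful calls, as seen by servers (Table 2)
# and by clients (Table 4), for the call types both tables report.
SERVER_MS = {"Fetch_Data": 116, "Fetch_Status": 12, "Store_Data": 157,
             "Store_Status": 3, "Remove_File": 40, "Create_File": 22,
             "GiveUp_Call": 1, "Get_Time": 4}
CLIENT_MS = {"Fetch_Data": 158, "Fetch_Status": 56, "Store_Data": 65,
             "Store_Status": 30, "Remove_File": 61, "Create_File": 55,
             "GiveUp_Call": 65, "Get_Time": 33}


def client_minus_server(server_ms, client_ms):
    """Client-observed minus server-observed latency for each shared call type."""
    return {call: client_ms[call] - server_ms[call]
            for call in server_ms if call in client_ms}


gaps = client_minus_server(SERVER_MS, CLIENT_MS)
suspicious = [call for call, gap in gaps.items() if gap < 0]
print(gaps)
print("negative gap:", suspicious)
```

Every call type yields a positive difference except Store_Data, whose client-side average is 92 ms below the server-side one.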


4.2.3.      Causes of RPC Failures


As noted in the previous section, nearly 8.5% of the calls generated by  clients
failed.   We were curious about the nature of these failures since they may have
been symptomatic of underlying performance or  reliability  problems.  To  study
this,  AFS  clients  were instrumented to keep track of failed RPC calls. Errors
were  divided  into  several  categories:  server  problems,  network  problems,
protection   problems   (insufficient  authorization  or  expired  authorization
tickets), volume problems, occurrences of a busy volume (e.g. when a  volume  is
moved to another server) and errors of unknown cause.


Our data showed that the majority of failed calls, 92%, were Fetch_Status calls.
Most of them, 76%, failed because of protection errors. This is consistent with
our earlier hypothesis that periodic jobs on some machines attempt to traverse
the AFS tree and fail when they encounter a protected part of the tree. Another
plausible explanation is the continuous execution of background daemons (e.g.,
xbiff) that produce a failed call every time they run after the user's
authorization ticket has expired. A significant number of unsuccessful calls,
22%, failed for unknown reasons.
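The per-call and per-cause percentages above come from tallying failed RPCs
along two axes. The Python sketch below assumes each failure is logged as an
(RPC type, cause) pair; FailureCause and failure_breakdown are hypothetical
names, and the enumeration mirrors the categories listed in the text rather than
any actual AFS error codes.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable, Tuple


class FailureCause(Enum):
    """Failure categories, following the classification described above."""
    SERVER = "server"
    NETWORK = "network"
    PROTECTION = "protection"    # insufficient or expired authorization
    VOLUME = "volume"
    VOLUME_BUSY = "volume busy"  # e.g., volume being moved to another server
    UNKNOWN = "unknown"


def failure_breakdown(failures: Iterable[Tuple[str, FailureCause]]
                      ) -> Tuple[Dict[str, float], Dict[FailureCause, float]]:
    """Given (rpc_type, cause) pairs for failed calls, return the percentage
    of failures per RPC type and per cause."""
    records = list(failures)
    if not records:
        return {}, {}

    def as_percent(counter: Counter) -> dict:
        return {key: 100.0 * n / len(records) for key, n in counter.items()}

    by_rpc = Counter(rpc for rpc, _ in records)
    by_cause = Counter(cause for _, cause in records)
    return as_percent(by_rpc), as_percent(by_cause)


# Hypothetical log of ten failed calls.
log = ([("Fetch_Status", FailureCause.PROTECTION)] * 7
       + [("Fetch_Status", FailureCause.UNKNOWN)] * 2
       + [("Store_Data", FailureCause.NETWORK)])
rpc_share, cause_share = failure_breakdown(log)
print(rpc_share)  # {'Fetch_Status': 90.0, 'Store_Data': 10.0}
print(cause_share[FailureCause.PROTECTION])  # 70.0
```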


4.2.4.      Bulk Transfer Profile


Statistics concerning file transfers were recorded by both file servers and
clients. AFS performs partial file caching, so the numbers reported here show
transfers on a per-chunk basis rather than on a per-file basis. The exception is
directories, which are cached in their entirety. The chunk size is 64KB by
default, but may be changed on a per-client basis.
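Because transfers are counted per chunk, a large file contributes several
transfer records while a small file or a directory contributes one. The Python
sketch below illustrates this under simplifying assumptions of our own: the
whole object is fetched, and chunks are aligned at multiples of the chunk size.
It is not the AFS cache manager's code, and transfer_sizes is a hypothetical
name.

```python
from typing import List

DEFAULT_CHUNK_SIZE = 64 * 1024  # 64KB default; may be changed per client


def transfer_sizes(length: int, is_directory: bool = False,
                   chunk_size: int = DEFAULT_CHUNK_SIZE) -> List[int]:
    """Sizes of the individual transfers needed to fetch a whole object.

    Directories travel in one piece; files are split into fixed-size chunks,
    the last of which may be partial.
    """
    if length < 0 or chunk_size <= 0:
        raise ValueError("length must be >= 0 and chunk_size > 0")
    if is_directory or length <= chunk_size:
        return [length]
    full, rest = divmod(length, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])


print(transfer_sizes(3_000))                       # [3000]
print(transfer_sizes(150_000))                     # [65536, 65536, 18928]
print(transfer_sizes(200_000, is_directory=True))  # [200000]
```

In this model, a transfer larger than 64KB can only be a directory or come from
a client configured with a larger chunk size.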


The collected statistics are summarized in Table 5.  Our data indicates that the
most  frequently  fetched  chunks  are  in  the range 1-8KB. These correspond to
entire files or directories. This result is consistent with many earlier
studies of file size distributions, which have reported small average file
sizes [6, 8]. The second most frequently fetched chunk size is even smaller, in
the range 0-128B.


_____________________________________________________________________________
|                           |________Servers________|________Clients________|
|___________________________|__Fetched__|___Stored__|__Fetched__|___Stored__|
|    0 B - 128 B            |    32 %   |    44 %   |    33 %   |     6 %   |
|    128 B - 1 KB           |     4 %   |     7 %   |     5 %   |    15 %   |
|    1 KB - 8 KB            |    43 %   |    14 %   |    37 %   |    26 %   |
|    8 KB - 16 KB           |     4 %   |     6 %   |     4 %   |     8 %   |
|    16 KB - 32 KB          |     2 %   |     4 %   |     3 %   |     7 %   |
|    32 KB - 128 KB         |    14 %   |    25 %   |    17 %   |     7 %   |
|____over_128_KB____________|_____1_%___|_____0_%___|_____0_%___|_____0_%___|
|____Daily_per_machine______|__156_MB___|__116_MB___|__5.3_MB___|__4.7_MB___|

                  Table 5: File Transfer Size Distribution
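Table 5 is a histogram of transfer sizes over fixed, roughly logarithmic
buckets. A Python sketch of the bucketing step follows. The bucket boundaries
are read off the table, but treating each bucket as half-open (a 128-byte
transfer counts in "128 B - 1 KB") is our assumption, since the paper does not
say how boundary values were assigned; size_histogram is a hypothetical name.

```python
import bisect
from collections import Counter
from typing import Dict, Iterable

KB = 1024
# Upper bounds of Table 5's buckets, each treated as half-open [low, high).
BOUNDS = [128, 1 * KB, 8 * KB, 16 * KB, 32 * KB, 128 * KB]
LABELS = ["0 B - 128 B", "128 B - 1 KB", "1 KB - 8 KB", "8 KB - 16 KB",
          "16 KB - 32 KB", "32 KB - 128 KB", "over 128 KB"]


def size_histogram(sizes: Iterable[int]) -> Dict[str, float]:
    """Percentage of transfers that fall into each Table 5 size bucket."""
    counts = Counter(LABELS[bisect.bisect_right(BOUNDS, s)] for s in sizes)
    total = sum(counts.values())
    return {label: (100.0 * counts[label] / total if total else 0.0)
            for label in LABELS}


hist = size_histogram([100, 3_000, 5_000, 65_536])
print(hist["1 KB - 8 KB"])  # 50.0
```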


The distribution of fetched data on file servers and clients is very similar.
However, the distribution of stored data differs considerably. This suggests
that even when the RPC call mixes and fetched-data distributions of servers and
clients are similar, their stored-data distributions may differ significantly.
The results from Section 4.1.1 indicate that the amount of data housed by active
volumes is about 1GB per file server. Table 5 shows that only about 15% of this
data (156MB) is actually fetched by clients from each server per day.


4.3.     AFS Performance


4.3.1.      Cache Performance


Cache hit ratio is a critical factor in determining the overall performance of a
system  like  AFS.  Caching is especially valuable in masking the long latencies
typical of wide-area networks.  To  study  this  aspect  of  AFS,  clients  were
instrumented  to  keep  statistics  on cache hit rates and on the percentages of
references made to native and foreign cells.  Since the AFS file cache is  split
into  a  cache  for data and a cache for status information, our statistics were
kept separately for these two categories.
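The instrumentation described above amounts to a small set of counters per
client. The Python sketch below keeps hit and miss counts keyed by cache ("data"
or "status") and by whether the referenced file lives in a foreign cell, and
derives the two quantities plotted in Figure 5. The class and method names are
hypothetical and are not modeled on any actual AFS statistics interface.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Tuple


@dataclass
class Counts:
    hits: int = 0
    misses: int = 0

    @property
    def refs(self) -> int:
        return self.hits + self.misses


class CacheStats:
    """Per-client reference counters, kept separately for the data and
    status caches and for native vs. foreign cells."""

    def __init__(self) -> None:
        self._counts: DefaultDict[Tuple[str, bool], Counts] = defaultdict(Counts)

    def record(self, cache: str, foreign: bool, hit: bool) -> None:
        entry = self._counts[(cache, foreign)]
        if hit:
            entry.hits += 1
        else:
            entry.misses += 1

    def _select(self, cache: str, foreign_only: bool = False) -> List[Counts]:
        return [c for (name, foreign), c in self._counts.items()
                if name == cache and (foreign or not foreign_only)]

    def hit_ratio(self, cache: str) -> float:
        """Combined hit ratio over native and foreign references."""
        sel = self._select(cache)
        refs = sum(c.refs for c in sel)
        return sum(c.hits for c in sel) / refs if refs else 0.0

    def foreign_fraction(self, cache: str) -> float:
        """Fraction of references that went to files in foreign cells."""
        refs = sum(c.refs for c in self._select(cache))
        foreign = sum(c.refs for c in self._select(cache, foreign_only=True))
        return foreign / refs if refs else 0.0


stats = CacheStats()
for _ in range(97):
    stats.record("data", foreign=False, hit=True)
for _ in range(2):
    stats.record("data", foreign=False, hit=False)
stats.record("data", foreign=True, hit=True)
print(stats.hit_ratio("data"), stats.foreign_fraction("data"))
```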


The overall percentage of references to files in foreign cells was 4.5% for data
and 2.3% for status information. However, these numbers showed high variation
from day to day: between 0.5% and 26% for data, and between 0.5% and 34% for
status. Closer inspection of the raw data revealed a group of six machines
contributing to the majority of these references. We conjecture that these
machines run periodic jobs that attempt to traverse the entire AFS tree.(5)
Since these constitute pathological cases, we excluded these machines from our
data set, and obtained the substantially more uniform results shown in Figure 5.
__________

 5. This hypothesis has been verified for at least some of the machines.


Our data indicates that the average cache hit ratio is over 98% for data and
over 96% for status information. Over 95% of data and status references are to
native cells. We also examined statistically whether foreign-cell references
were associated with much lower cache hit ratios, and found no such
correlation.
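The paper does not say which statistical test was applied. One plausible
approach, assumed here, is to compute the Pearson correlation coefficient
between each sample's (for example, each client's or each day's) fraction of
foreign-cell references and its cache hit ratio; a coefficient near zero
indicates no linear relationship. The function name and sample values below are
hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-sample measurements.
foreign_fraction = [0.01, 0.03, 0.05, 0.02, 0.04]
hit_ratio = [0.981, 0.979, 0.983, 0.980, 0.982]
r = pearson(foreign_fraction, hit_ratio)
print(f"r = {r:.2f}")
```

On Python 3.10 or later, statistics.correlation computes the same coefficient. A
full analysis would also test the coefficient's significance, which this sketch
omits.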


      (a) Combined cache hit ratio for native and foreign file references


              (b) Fraction of references to files in foreign cells


This figure shows the observed cache hit ratios and relative proportion of native and foreign cell
references over the data collection period. As explained in Section 4.3.1, data
from  six  machines  was  excluded.   Analysis  of  the raw data showed that the
excluded machines exhibited comparable cache performance to the overall  set  of
machines.  The gaps in histograms on several days correspond to missing data due
to problems with the data collection machine.


             Figure 5: Cache Performance and Reference Mixes


The responses to our questionnaire on AFS performance are presented in Figure 6.
Most respondents (71%) rate the performance of AFS when accessing local data as
good or excellent; only 7% rate it poor or unsatisfactory. AFS performance when
accessing files in a remote cell is rated somewhat lower: 50% of respondents
rate it as good or excellent, while 38% feel it is fair. Compared to other
distributed file systems they have used, 32% of respondents feel that AFS
provides comparable performance, while 41% say that it is faster or much faster.
Overall, the majority of users seem to be satisfied with AFS performance. But
nearly two-thirds of them (65%) also report that performance or reliability has
at times seriously impeded their work.


8. How would you rate AFS performance when accessing files in your own cell?
 22%   Excellent
 49%   Good
 20%   Fair
  4%   Poor
  3%   Unsatisfactory

9. Compared to other distributed file systems you've used, is AFS in your own
cell:
 11%   Much faster
 30%   Faster
 32%   Comparable
 15%   Slower
  2%   Much slower
  5%   Haven't used other distributed file system

10. How would you rate AFS performance when accessing files in a remote cell?
  8%   Excellent
 42%   Good
 38%   Fair
  6%   Poor
  2%   Unsatisfactory
  4%   No experience

11. Have any of the following aspects of AFS seriously impeded your work?
 65%   Performance/reliability
 21%   Authentication/ACLs
  6%   Replication
 19%   Backup/restore
 25%   Semantics (Unix emulation)
 32%   Availability for other hardware/OS bases
 10%   Deployment (i.e., it doesn't run at the places with which I interact)
  5%   Other(s)

                 Figure 6: Users' Perception of AFS Performance


4.3.2.      Frequency of File Server Failures


Interruption of file service in a wadfs is a potential obstacle to providing
transparency. One way of measuring file server downtime is to have file servers
record their own outages and report them to the data collection agents. In our
view, however, the availability of file servers as perceived by client machines
is far more important. We therefore instrumented clients to record outages. A
particular file server's downtime was observed only by those clients that could
not access data from that file server (because of the server's failure and/or
network problems). This approach weights failures by clients' interest in the
files affected; in other words, the inaccessibility of a heavily used file
contributes more to the metric than the inaccessibility of a lightly used file.
Table 6 reports the average inconvenience time, defined as the time during which
a client cannot communicate with at least one file server that it needs to
access.


 _____________________________________________________________________
 |__Type_of_outage______________|___%_of_time____|__Time_(min/day)___|
 |  Servers in the same cell    |  0.08 - 0.59%  |        1.2-8.5    |
 |__Servers_in_the_foreign_cell_|__0.04_-_0.54%__|________0.5-7.7____|

This table shows observed average inconvenience times for clients over the
12-week data collection period. The lower end of each range corresponds to the
case in which, for each client, all daily failures occur simultaneously; the
upper end corresponds to the case in which daily failures do not overlap.


                Table 6: Average Inconvenience Time for Clients
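The two bounds in Table 6 follow from how a day's outage intervals combine. If a
client records each outage it observes as a start and end time, its true
inconvenience time is the length of the union of those intervals, which always
lies between the longest single outage (all outages simultaneous) and the sum of
all outage durations (no overlap). The Python sketch below assumes such
per-client interval logs, which the paper does not describe in this form; the
function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of one observed outage, minutes


def union_length(intervals: List[Interval]) -> float:
    """Total time covered by at least one outage (overlaps counted once)."""
    total = 0.0
    merged_start = merged_end = None
    for start, end in sorted(intervals):
        if merged_end is not None and start <= merged_end:
            merged_end = max(merged_end, end)  # extends the current run
        else:
            if merged_end is not None:
                total += merged_end - merged_start
            merged_start, merged_end = start, end
    if merged_end is not None:
        total += merged_end - merged_start
    return total


def inconvenience_bounds(intervals: List[Interval]) -> Tuple[float, float]:
    """(lower, upper) bounds: all outages simultaneous, versus no two
    outages overlapping."""
    durations = [end - start for start, end in intervals]
    if not durations:
        return 0.0, 0.0
    return max(durations), sum(durations)


day = [(60.0, 70.0), (65.0, 90.0), (300.0, 305.0)]
print(union_length(day), inconvenience_bounds(day))  # 35.0 (25.0, 40.0)
```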


Downtime incident statistics were collected from 235 clients on an average day.
During the twelve-week data collection period, the number of observed server
downtime incidents was 3349 for servers in the same cell and 1159 for servers in
foreign cells (Table 7). It should be noted that a particular server's outage
can be reported multiple times if observed by multiple clients. Also, on an
average day only about 15% of the contacted clients accessed data in foreign
cells and thus were able to observe server downtimes in foreign cells.
According to the numbers collected, on average, a client observes a server
outage every 5-6 days for the local cell and every 2-3 days for foreign cells,
under the assumption that all clients are equally observant (active). The
duration of almost half the outages is less than 10min. Since this is shorter
than the recovery time for a typical server, we conjecture that many of these
short outages are really due to transient network failures.
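The per-client outage intervals can be estimated from figures given in the
text: 235 reporting clients on an average day, 84 days of collection, 3349 local
and 1159 foreign incidents, and, for foreign cells, only the roughly 15% of
clients that access them. The Python sketch below makes the arithmetic explicit;
dividing observing client-days by incidents mirrors the "equally observant"
assumption, and the function name is hypothetical.

```python
def days_between_outages(incidents: int, clients_per_day: float, days: int,
                         observing_fraction: float = 1.0) -> float:
    """Average number of days between outages seen by one client.

    Spreads the incidents evenly over all observing client-days, i.e. it
    assumes every observing client is equally active.
    """
    client_days = clients_per_day * observing_fraction * days
    return client_days / incidents


DAYS = 12 * 7  # twelve-week collection period

local = days_between_outages(3349, 235, DAYS)
foreign = days_between_outages(1159, 235, DAYS, observing_fraction=0.15)
print(f"local cell: every {local:.1f} days; foreign cells: every {foreign:.1f} days")
```

The foreign-cell estimate scales linearly with observing_fraction, so it is only
as reliable as the 15% figure for the share of clients that reach foreign cells.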


_____________________________________________________________
|                        |    Servers in    |  Servers in   |
|__Downtime_durations____|__the_same_cell___|foreign_cells__|
|  0 min - 10 min        |       1584       |      861      |
|  10 min - 30 min       |        759       |      128      |
|  30 min - 1 hr         |        484       |       67      |
|  1 hr - 2 hr           |        275       |       48      |
|  2 hr - 4 hr           |        140       |       21      |
|  4 hr - 8 hr           |         63       |       6       |
|__>_8_hr________________|________44________|______28_______|
|__TOTAL_________________|_______3349_______|_____1159______|

             Table 7: Distribution of File Server Outage Durations


Users tend to notice file server failures less frequently than the empirical
evidence indicates (Figure 7). Failures of servers in local cells are
experienced at most once a month by 77% of respondents. Only 3% witness file
server failures on a daily basis. However, users perceive failures as lasting
longer than the empirical data indicates: less than 10min for 13%, 10 - 30 min
for 36%, 30min - 1hr for 35%, and more than 1 hr for 12% of respondents.
Failures of servers in foreign cells are experienced at most once a month by 54%
of respondents. Among respondents to whom the question applied, the proportion
is higher (about two-thirds), since 20% of respondents answered N/A (question
14).


12. In your experience, how often are the AFS File Servers in your own cell
(organization) down or unavailable?
  2%   Never
 36%   Once every few months
 39%   Once a month
 17%   Once a week
  3%   Daily

13. In your experience, for how long are AFS File Servers typically down when
they crash or in the presence of network problems?
 13%   Less than 10 minutes
 36%   10 minutes to 30 minutes
 35%   30 minutes to 1 hour
 12%   More than 1 hour
  3%   N/A

14. In your experience, how often are the AFS File Servers in other cells down
or unavailable?
  6%   Never
 20%   Once every few months
 29%   Once a month
 17%   Once a week
  7%   Daily
 20%   N/A

               Figure 7: Users' Perception of File Server Failures


4.4.     Sharing in AFS


The existence of cross-cell file  access  in  AFS  is  borne  out  by  the  data
presented  in  Figure 5(b). That figure showed that the percentage of references
to the files in foreign cells was up to 5% for data and up to  4.5%  for  status
information  during the 12-week data collection period. Although 5% may not seem
like much, it is significant because cells represent  organizational  boundaries
and most users tend to access data within their own organizations.


Table 8 presents a cumulative histogram of the number of different cells
contacted by each client during the 12-week period. The table shows that
two-thirds of the clients referenced data in at least one foreign cell, while 3%
of the clients referenced data in all available cells. Further, examination of
the raw data shows that, on average, 15% of the clients referenced foreign data
each day.

              ______________________________________
              |__Cells_contacted__|_%_of_clients___|
              |      >=  1        |       100      |
              |      >=  2        |        67      |
              |      >=  3        |        42      |
              |      >=  6        |        22      |
              |      >= 10        |        15      |
              |      >= 20        |         9      |
              |      >= 50        |         4      |
              |______>=_70________|_________3______|


                      Table 8: Client Contacts with Cells
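Table 8 is a cumulative histogram: each row gives the share of clients that
contacted at least the stated number of distinct cells. Assuming a per-client
record of the set of cells it referenced during the period, the table can be
derived as in the Python sketch below; the input format, function name, and
cell names are hypothetical.

```python
from typing import Dict, Sequence, Set


def cells_contacted_table(cells_by_client: Dict[str, Set[str]],
                          thresholds: Sequence[int] = (1, 2, 3, 6, 10, 20, 50, 70)
                          ) -> Dict[int, float]:
    """For each threshold k, the percentage of clients that contacted at least
    k distinct cells; a client's home cell counts as one of them."""
    counts = [len(cells) for cells in cells_by_client.values()]
    n = len(counts)
    return {k: (100.0 * sum(c >= k for c in counts) / n if n else 0.0)
            for k in thresholds}


# Hypothetical clients and cell names.
contacts = {
    "client-1": {"home.example"},
    "client-2": {"home.example", "cell-a.example"},
    "client-3": {"home.example", "cell-a.example", "cell-b.example"},
}
print(cells_contacted_table(contacts))
```

Because every client contacts at least its home cell, the ">= 1" row is always
100%, and the ">= 2" row is the share of clients that reached any foreign cell.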

We also repeated the study originally reported by Kistler and Satyanarayanan [4]
on the extent of sequential write sharing on directories and files. Every time a
user modified an AFS directory or file, the user's identity was compared to that
of the user who made the previous modification. Our data, showing that 99.1% of
all directory modifications were made by the previous writer, is consistent with
Kistler and Satyanarayanan's observations. Unfortunately, we are not able to
report on write sharing on files due to a bug in the statistics collection
tools.
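The sequential write-sharing measurement compares each modification's author
with the author of the previous modification to the same object. The Python
sketch below assumes a time-ordered log of (object, user) modification events;
the function name and log format are hypothetical, and skipping an object's
first observed modification (which has no predecessor) is our assumption about
how such cases were handled.

```python
from typing import Dict, Iterable, Tuple


def same_writer_fraction(modifications: Iterable[Tuple[str, str]]) -> float:
    """Fraction of modifications made by the same user as the previous
    modification of the same object.

    Input: (object_id, user) pairs in time order. An object's first observed
    modification has no predecessor and is not counted.
    """
    last_writer: Dict[str, str] = {}
    same = compared = 0
    for obj, user in modifications:
        previous = last_writer.get(obj)
        if previous is not None:
            compared += 1
            if previous == user:
                same += 1
        last_writer[obj] = user
    return same / compared if compared else 0.0


log = [("dir1", "alice"), ("dir1", "alice"), ("dir1", "bob"),
       ("dir2", "carol"), ("dir2", "carol")]
print(same_writer_fraction(log))
```

Only the last writer of each object needs to be remembered, so the state grows
with the number of distinct objects modified, not with the length of the log.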


These observations confirm that the wide-area aspects of AFS are indeed
valuable. Our anecdotal data, presented in Figure 8, corroborates this
conclusion. Users rate AFS very highly as a communication and collaboration
tool: its average score of 4.05 is second only to email (question 19). In their
local cell, over 60% of the users read or modify files that do not belong to
them (question 15). Over 90% of users have used or looked at materials that
reside in other cells (question 17), and about 80% possess accounts or
authentication identities in foreign cells (question 18). About 38% of the users
participate in joint projects with people in other cells, although for 23% such
collaboration is not very frequent (question 20).


15. What is the nature of AFS interaction between yourself and others in the
same AFS cell? (Check any that apply.)
  3%   No interaction
 39%   "Looking around" your cell's file space to see what's new
 69%   Reading files
 15%   Accessing AFS-based bboards
 45%   Copying interesting files into your own storage area(s)
 33%   Copying files in one direction (e.g., drop-offs)
 39%   Copying files back and forth, modifying them at each step
 66%   In-place use, modifying files without copying them
  6%   Other

16. Have you ever explored/used the resources available through the
grand.central.org cell?
 34%   Yes, I've used the materials there
 26%   Yes, I've looked through it to see what's there
 21%   No, I haven't had the need/desire
 19%   No, the /afs/grand.central.org directory has not been set up at my site

17. Have you ever explored/used the resources available at other cells?
 71%   Yes, I've used the materials at other cells
 20%   Yes, I've looked through other cells to see what's there
  8%   No, I haven't had the need/desire
  1%   No, the /afs/<other cell> directories have not been set up at my site

18. How many accounts/authentication identities do you have in other cells
(i.e., how many cells other than your home cell can you klog to)?
 20%   0
 36%   1
 19%   2
 18%   3
  7%   More than 3

19. Rate the importance of the following communication/collaboration media and
methods in your organization, using the following scale: 5: Very important,
4: Important, 3: Somewhat important, 2: Not important, 1: Not used at all.
 3.92  Direct phone calls
 2.43  Conference calls
 2.55  Internal (paper) memoranda
 2.14  U.S. mail
 2.69  Express/overnight delivery services
 3.26  FAX
 2.66  Physical media (floppies, hard disks, mag tapes, etc.)
 4.56  Email
 3.49  BBoards/Email lists
 3.55  FTP
 3.06  Local (non-distributed) file system
 3.38  Non-AFS distributed file system (e.g., NFS)
 4.05  AFS
 0.17  Other(s)

20. Are you currently working with people in another AFS cell on joint projects
of any kind?
  7%   Yes, frequently
  8%   Yes, moderately
 23%   Yes, but not very frequently
 15%   No, the people I collaborate with outside my own cell do not (all) run AFS
 43%   No (for any other reason)

                 Figure 8: Users' Perception of Sharing in AFS


Further anecdotal evidence of the value of wide-area file access is provided by
highly visible instances of information dissemination and collaboration in AFS.
For example, AFS has facilitated the development of OSF's DCE. It has also been
used in the STARS project initiated in 1990 by ARPA, which established a
nationwide government-commercial collaboration. In both these cases, wide-area
file access has been used by participant organizations to support sharing and
dissemination. Project software and documentation are located in AFS, and
collaboration via AFS has occurred on a regular basis. AFS has also been used as
a tool for information dissemination. The release of MIT's X11R5 software is a
good example. In September 1991, the X11R5 release was installed into the cell
grand.central.org, and all AFS sites were able to immediately browse and access
the release without manual file transfers.


5.    Conclusion

Our goals in conducting this study were to observe a wadfs in actual use and  to
characterize its usage profile. We were also interested in determining how well
AFS worked at the current scale of the system, and whether any imminent limits
to its further growth were apparent.


The qualitative and quantitative data that we have presented confirms  that  AFS
provides  robust and efficient distributed file access in its present world-wide
configuration.  The caching mechanism is  able  to  satisfy  most  of  the  file
references  from  the  clients' local cache. Even though file server and network
outages can be disruptive for  particular  users,  our  observations  show  that
prolonged server inaccessibility is rare.  Our data shows no obvious bottlenecks
that might preclude further growth of the system.


AFS's divide and conquer technique of using semi-autonomous cells  for  spanning
widely  disparate  organizations  has  proven  to  be  invaluable.  By providing
considerable flexibility in security and storage management policies,  the  cell
mechanism reduces the psychological barrier to entry of new organizations.  As a
consequence, growth in AFS over time has not just been in the number of nodes in
each cell, but also in the total number of cells.


In summary, this paper provides conclusive evidence that AFS is a viable  design
point  in  the  space  of  wide-area  distributed  file  system  designs. We are
convinced that any alternative  design  must  preserve  the  aggressive  caching
policies  and  support  for  autonomous administration that are the hallmarks of
AFS's approach. The absence of either of these features will be fatal in any
attempt  to  build  a  file  system that uses a wide-area network and spans many
organizations.


Acknowledgments


The xstat data collection facility was designed and  implemented  by  Ed  Zayas.
Contributions to the evaluation methodology for wide-area distributed file
systems were made by Ed Zayas, Alfred Spector and  Bob  Sidebotham.   Anne  Jane
Gray  provided  assistance  in organizing this project.  Comments by Mike Kazar,
Maria Ebling, Qi Lu, and Jay Kistler were helpful in improving the presentation.


References


  [1] Baker, M.G., Hartman, J.H., Kupfer, M.D., Shirriff, K.W., Ousterhout,
J.K., Measurements of a Distributed File System. Proceedings of the Thirteenth
ACM Symposium on Operating System Principles, Pacific Grove, CA, October 1991.

  [2] Howard, J.H., Kazar, M.L., Menees, S.G., Nichols, D.A., Satyanarayanan,
M., Sidebotham, R.N., West, M.J., Scale and Performance in a Distributed File
System. ACM Transactions on Computer Systems, Vol. 6, No. 1, February 1988.

  [3] Kazar, M.L., Synchronization and Caching Issues in the Andrew File System.
Usenix Conference Proceedings, Winter 1988.

  [4] Kistler, J., Satyanarayanan, M., Disconnected Operation in the Coda File
System. ACM Transactions on Computer Systems, Vol. 10, No. 1, February 1992.

  [5] Morris, J.H., Satyanarayanan, M., Conner, M.H., Howard, J.H., Rosenthal,
D.S., Smith, F.D., Andrew: A Distributed Personal Computing Environment.
Communications of the ACM, Vol. 29, No. 3, March 1986.

  [6] Ousterhout, J., Da Costa, H., Harrison, D., Kunze, J., Kupfer, M.,
Thompson, J., A Trace-Driven Analysis of the 4.2BSD File System. Proceedings of
the 10th ACM Symposium on Operating System Principles, December 1985.

  [7] Sandberg, R., Goldberg, D., Kleiman, S., Walsh, D., Lyon, B., Design and
Implementation of the Sun Network Filesystem. Summer Usenix Conference
Proceedings, 1985.

  [8] Satyanarayanan, M., A Study of File Sizes and Functional Lifetimes.
Proceedings of the 8th ACM Symposium on Operating System Principles, Asilomar,
December 1981.

  [9] Satyanarayanan, M., Howard, J.H., Nichols, D.A., Sidebotham, R.N.,
Spector, A.Z., West, M.J., The ITC Distributed File System: Principles and
Design. Proceedings of the 10th ACM Symposium on Operating System Principles,
December 1985.

[10] Satyanarayanan, M., Integrating Security in a Large Distributed System.
ACM Transactions on Computer Systems, Vol. 7, No. 3, August 1989.

[11] Satyanarayanan, M., Scalable, Secure, and Highly Available Distributed
File Access. IEEE Computer, Vol. 23, No. 5, May 1990.

[12] Satyanarayanan, M., The Influence of Scale on Distributed File System
Design. IEEE Transactions on Software Engineering, Vol. 18, No. 1, January 1992.

[13] Sidebotham, R.N., Volumes: The Andrew File System Data Structuring
Primitive. European Unix User Group Conference Proceedings, August 1986.

[14] Spector, A.Z., Thoughts on Large Distributed File Systems. Proceedings of
the German National Computer Conference, October 1986.

[15] Spector, A.Z., Kazar, M.L., Wide Area File Service and the AFS
Experimental System. Unix Review, Vol. 7, No. 3, March 1989.

[16] Steiner, J.G., Neuman, C., Schiller, J.I., Kerberos: An Authentication
Service for Open Network Systems. Usenix Conference Proceedings, Winter 1988.

[17] Transarc Corporation, AFS 3.1 Programmer's Reference Manual. FS-00-D180,
Pittsburgh, PA, October 1991.


Mirjana Spasojevic received the B.S. degree in mathematics from  the  University
of Belgrade in 1986, and the M.S. and Ph.D. degrees in computer science from The
Pennsylvania State University, in 1989 and 1991, respectively.  She is currently
working  as  a  System  Designer  at  Transarc  Corporation.   Prior  to joining
Transarc, she was an Assistant Professor at the School of Electrical Engineering
and  Computer  Science,  Washington  State  University.   Her research interests
include distributed operating systems and data management.


Mahadev Satyanarayanan is an Associate Professor of Computer Science at Carnegie
Mellon  University.  He is currently investigating the connectivity and resource
constraints of mobile computing in the context of the Coda File  System.   Prior
to  his work on Coda, he was a principal architect and implementor of the Andrew
File System.  Satyanarayanan received the PhD in Computer Science from  Carnegie
Mellon  University  in 1983, after a Bachelor's degree in Electrical Engineering
and a  Master's  degree  in  Computer  Science  from  the  Indian  Institute  of
Technology,  Madras.  He is a member of the ACM, IEEE, Sigma Xi, and Usenix, and
has been a consultant to industry and government.