The following paper was originally presented at the Ninth System Administration Conference (LISA '95), Monterey, California, September 18-22, 1995, and was published by the USENIX Association in the conference proceedings. For more information about the USENIX Association, see https://www.usenix.org or contact office@usenix.org.

OpenDist - Incremental Software Distribution

Peter W. Osel and Wilfried Günsheimer - Siemens AG, München, Germany

ABSTRACT

OpenDist provides efficient procedures and tools to synchronize our software file servers. This simple goal becomes challenging because of the size and complexity of the supported software, the diversity of platforms, and network constraints. Our current solution is based on rdist(1) [1]. However, it is no longer possible to synchronize the file servers nightly, because it takes several days just to compare distant servers.

We have analyzed the update process to find the bottlenecks in the current solution. We measured the effects of network bandwidth and latency on rdist. We created statistics on the number of files and file sizes within all software packages. We found that not only the line speed but also the line delay contributes substantially to the overall update time. Our measurements revealed that adding a compression mode to rdist would not have solved our problem, so we decided to look for a new solution.

We have compiled a list of requirements for evaluating software distribution solutions. Based on these requirements, we evaluated both commercial and freely available tools. None of the tools fulfilled our most important requirements, so we implemented our own solution. In the following we describe the overall architecture of the toolset and present performance figures for the distribution engine that replaces rdist. The results of the prototype implementation are promising. We conclude with a description of the next steps for enhancing the OpenDist toolset.

Our Environment

The CAD Support Group of the Semiconductor Division of Siemens AG installs, integrates, and distributes all software needed to develop Integrated Circuits. We have development sites in Germany (München and Düsseldorf), Austria (Villach), the United States (Cupertino, CA), and Singapore. The development sites are connected by leased lines with a speed of 64 to 128 kBit/s. At each site, a central file server stores all software. Client workstations mount software from these servers. Software is installed and integrated in München and distributed to all other development sites. System administrators of the development sites initiate the transfer on the master server in München.

The CAD Support Group takes care of the CAD software and tools only. A separate department is responsible for system administration, i.e., maintenance of the operating system and system tools, backups, etc.

Our software distribution problem differs in many ways from the one solved by traditional software distribution tools.
Most software distribution tools we looked at are designed to distribute a moderate number of fairly static software packages of moderate size to many clients. In contrast, we have to synchronize a few file servers (under a dozen) that store many (about 200) packages of sizes ranging from tiny (a couple of kilobytes) to huge (1.8 GBytes). The total size of the software we store is currently 25 GBytes; 10-15 GBytes of it are kept up-to-date at all sites.

Many packages change every day. A change might update only a single file of a few bytes, or it could touch up to 50,000 files for a total of 1 GByte per day. Every month, about 10% of the software changes. Most changes are small, but many files are constantly updated. The installation of a huge patch or a new software package changes many files at once. There is no separate installation or test server; all changes are applied to the systems while our clients are using them. The changes are tested in München and, ideally, copied to all slave file servers within one day. Synchronizing or cloning file servers is the best way to describe our setup.

Our Current Solution

Our current software distribution process uses rdist(1) to find changed files and to update the slave software servers. It is no longer possible to compare two software servers in one night. A complete check of all software packages on the slave file server in Singapore would take several days, which is neither acceptable nor feasible. During that time, software packages would be in inconsistent states, and changes on the master software server could take up to a week to reach the slave file server. Though it is possible to apply different update schedules - updating small packages daily, some weekly - the setup is not satisfactory. With an ever-increasing number of software packages and an ever-growing size of each package, the distribution process using rdist is no longer acceptable.

                                      LAN       München  Düsseldorf  Villach  Cupertino  Singapore
  Line Type                           Ethernet  ISDN     X.25        leased   X.25       leased
  Nominal Line Speed [kBit/s]         10,000    64       64          128      64         64
  Transfer rate [kByte/s]             90-100    6-7      4-5         7-12     2-3        3-4
  Ping time [ms]                      <1        33-88    188-372     81-311   530-1083   617-1375
  rdist file create [s]               0.2       0.2      1.2         0.6      2.1        4.5
  rdist file check [s]                0.02      0.06     0.5         0.2      1.0        2.1
  rdist file delete [s]               0.1       0.13     0.5         0.3      1.0        2.3
  10 kBytes transfer rate [kByte/s]   -         5.9      2.5         4.9      1.5        1.4
  rdist benchmark [h]                 1         2.5      7           4        12         24
  rdist check SW subset [h]           -         -        16          8        -          >80 (2)
  OpenDist check SW subset [h] (3)    0.5       -        2 (1)       0.5      0.75       0.75
  rdist check all SW [h] (2)          3         -        69          27       140        290
  OpenDist check all SW [h]           1.5       -        5 (1)       1.5      2          3

(1) Increased time, because the software pools in Düsseldorf are accessed via NFS, not UFS.
(2) Estimated.
(3) This subset consists of technology data and is changed and distributed daily. It contains approximately 150,000 files with a total of 1.1 GBytes.

Table 1: Line Characteristics

Searching The Bottleneck

We have analyzed the update process to find the bottlenecks in our current solution. We analyzed our lines and measured bandwidth, latency, and compression rate (all leased lines are equipped with datamizers - devices that compress all traffic). We created statistics on the number of files and their sizes for more than 200 software and data packages. Commercial software packages, technology data and cell libraries, as well as many free packages like X11 and the GNU tools were analyzed.
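To illustrate how such per-package statistics can be gathered, here is a minimal Perl sketch in the spirit of the toolset described later. It is not the script we actually used; the pool root and the one-directory-per-package layout are assumptions for illustration.

#!/usr/bin/perl
# pkgstats - report file count, total size, and average file size
# per software package. A sketch: the pool root and the
# one-directory-per-package layout are assumptions.
use strict;
use File::Find;

my $root = shift || '/pool';                  # hypothetical pool root
opendir(DIR, $root) or die "cannot open $root: $!\n";
my @packages = grep { !/^\.\.?$/ && -d "$root/$_" } readdir(DIR);
closedir(DIR);

for my $pkg (sort @packages) {
    my ($files, $bytes) = (0, 0);
    find(sub {
        return unless -f $_;                  # count plain files only
        $files++;
        $bytes += -s _;                       # reuse the stat done by -f
    }, "$root/$pkg");
    next unless $files;
    printf("%-24s %7d files %9.1f MBytes %7.1f kBytes/file\n",
           $pkg, $files, $bytes / (1024 * 1024), $bytes / $files / 1024);
}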
We were also interested in the compression rate and time of software packages, and in how much the compression rate differs when software packages are compressed file by file rather than as a complete archive. We analyzed where rdist spends its time during updates. Compared to the installed software, our change rate is small, so finding changed files must be efficient. Changes can be rather huge, so the transmission of changed files must be efficient, too.

The Benchmark

We wrote a benchmark suite that measures the elapsed time needed to perform typical software distribution operations: installing, comparing, deleting, and updating files of different sizes, and installing symbolic and hard links. All operations were executed many thousands of times to average out fluctuations in link performance. The benchmark measures ping(1), rcp(1), and rdist(1) performance and times. Each rdist test runs on a directory with an appropriate number of random files of the same size. Each test contains an add, check, update, and delete sequence. The file size increases from 1 byte to 1 MByte; thus the effects of the transfer rate and of the rdist protocol can be separated. The rdist part of the benchmark source tree contains approximately 5,000 files. This sums up to 10,000 transferred files, 5,000 check actions, 5,000 delete actions, and 30 MBytes of transferred data per test run. rcp(1) times are measured for a text, a binary, and a compressed file of 1 MByte each; this shows the compression achieved on the line.

The leased lines (except the dialup ISDN link) are shared by many users, so it is not surprising that the benchmark results varied a lot, sometimes by more than a factor of three. To make our benchmarks of the line performance more comparable, we calculated the average of the best results of several runs. Some of the small numbers are within the magnitude of the time resolution and must be interpreted cautiously.

The Results

Size and Composition

Software packages vary substantially in size and in the composition of file types. However, bigger packages do not necessarily have bigger files: they contain a few huge files, but the average file size is more or less independent of the total size of the package (Diagram 1).

[picture fsfn.eps not available]
Diagram 1: Package Size vs. File Count

Compression Factor

The average compression factor of our software packages is three. Most of our software packages were compressed by about this factor, though we observed compression factors between two and five. With gzip(1), you can trade compression speed against compression quality, from fast (less compression) to slow (best compression). For our software packages, increasing the compression quality reduces the compressed file size by less than 5%; the compression time, however, sometimes increased by more than 200% (Diagram 2). The default compression level of 6 is a good compromise, so we use it.

[picture X11-R6.gzip.eps not available]
Diagram 2: Gzip Compression Quality and Speed

Though our leased lines are equipped with datamizers that compress network traffic, it is worthwhile to compress archives before transmission. Datamizers increased the transmission rate of uncompressed data by 10-15%, whereas gzip reduced the data to a third of its original size.

Compression Rate

On a SPARCstation 10/41 (Solaris 2.4, 128 MBytes memory), gzip created compressed data at a rate of 65 kByte/s, many times faster than the speed of our leased lines.
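Measurements like those behind Diagram 2 are easy to reproduce. The following minimal sketch, which is not the authors' benchmark suite, times gzip(1) at three compression levels on a placeholder test archive and reports the resulting compression factors; the one-second resolution of time() is adequate only for large inputs.

#!/usr/bin/perl
# gzlevels - compare gzip compression time and factor per level.
# A sketch: the test archive name is a placeholder.
use strict;

my $file = shift || 'X11R6.tar';       # hypothetical test archive
-f $file or die "no such file: $file\n";
my $orig = -s $file;

for my $level (1, 6, 9) {
    my $start = time;
    system("gzip -$level -c $file > $file.$level.gz") == 0
        or die "gzip -$level failed\n";
    my $sec  = time - $start;
    my $size = -s "$file.$level.gz";
    printf("level %d: %5d s %10d bytes factor %.2f\n",
           $level, $sec, $size, $orig / $size);
    unlink("$file.$level.gz");
}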
This compression rate is important to know when you want to pipeline the creation, compression, and transmission of update archives. If the throughput of the lines is of the same order of magnitude as the gzip output rate, it is advisable to decrease the compression level.

Decompression

Decompressing the archives with gunzip(1) is usually six times faster than compressing the data. The decompression time does not depend significantly on the compression quality chosen for compression (Diagram 2).

Compression and Archives

It is better to compress an archive of files than to archive compressed files. Compressing complete packages is significantly faster and creates smaller archives than compressing each file separately and archiving the compressed files. For example, archiving and compressing X11R6 completed in three minutes of elapsed time, and the overall size was reduced by 55%. Compressing each individual file and archiving the compressed files in a second step took five minutes of elapsed time and reduced the overall size by only 45%. All tests were performed several times on an unloaded machine. Compressing individual files and archiving them needs many more file and disk operations than archiving the uncompressed files and compressing the archive. Also, compressing several small files (or small network packets) separately is not as efficient as compressing them in a single run.

Transmission and Archives

It is better to transmit an archive of files than to transmit each file individually. Depending on the file transfer protocol used, the latency of the line has a high impact on transfer rates: the smaller the files and the higher the latency, the higher the delay caused by inefficient protocols. The latency increases the time rdist needs to check or create files. If you have many files, rdist needs a long time to compare master and slave server; if many or all files changed (e.g., when installing a new software package), rdist will need much more time to transfer all the files. The average file size in our software packages is 30 kBytes (Diagram 3). To our Singapore site, we need about 10 seconds (3 kByte/s) to transfer a file of this size. However, rdist needs more than 4 seconds to create the new file, for a total transmission time of 14 seconds (a 40% increase), i.e., a 30% decrease in transfer rate. The transfer rate for 10 kBytes files is only half of the normally achievable rate (see Table 1).

[picture size_range.eps not available]
Diagram 3: File Size Range (All Packages)

Besides avoiding protocol overhead, the transmission of archives has additional advantages. By first transferring all changed files to a holding disk and installing the changes locally on the remote server from the holding disk, the time during which a software package is in an inconsistent state is significantly reduced. Moreover, we can use the same tools to archive and roll back changes. The installation of changes can be done asynchronously, so a system administrator at the remote site can easily postpone updates. These advantages compensate for the disadvantage of needing holding disks to temporarily store the archives.

[picture lics.eps not available]
Diagram 4: Line Characteristics

rdist and Latency

Although the line speed from München to Villach and to Singapore differs by only a factor of two, the time needed to run the rdist benchmark differs by a factor of six (see Table 1 and Diagram 4). The ping response time (and therefore the latency) has a greater impact on the time rdist needs to create or compare files than the line speed.
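Taken together, these results suggest creating, compressing, and transmitting the update archive in a single pipeline, so that archiving, compression, and line transfer overlap instead of adding up. A minimal sketch of such a pipeline follows; the host name, the paths, and the rsh/cat transport are illustrative assumptions, not the interface of the tools described later.

#!/usr/bin/perl
# pushpkg - archive, compress, and transmit a package in one pass.
# A sketch: host, paths, and the rsh transport are assumptions.
# No intermediate archive file is written, so the three stages
# overlap in time.
use strict;

my $pkg  = shift || '/pool3/gnu';      # hypothetical package path
my $host = shift || 'slave';           # hypothetical slave server
my $hold = '/holding';                 # holding disk on the slave

my $pipeline = "gtar cf - $pkg | gzip -6 | "
             . "rsh $host 'cat > $hold/update.tar.gz'";
system($pipeline) == 0 or die "transfer of $pkg to $host failed\n";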
Benchmark Summary

Our measurements revealed that the line speed is not the only bottleneck: the latency also plays an important role. rdist compares source and target directories file by file. Because the time for this is proportional to the latency, and because our change rate is small compared to the installed software, adding compression to rdist would not have solved our problem: rdist spent most of its time figuring out what to update, not actually updating files. On the other hand, if a new version of our biggest software package is installed, we have to transmit 1.8 GBytes, so the transmission must be optimized, too. The transmission of single files is another bottleneck: in our environment, the protocol overhead and the transmission time are of the same order of magnitude, which reduces the average actual transfer rate by up to 30%. For an efficient solution in our environment, the files that have to be updated must be archived first and then transmitted in one large file. Upgrading our lines would not solve our problem, because the latency would not become small enough; it would also be a very costly solution.

We found that we had to tackle two problems: making the finding of changed files more efficient, and making the transmission of data more efficient. We began to look for a new solution.

Requirements for Software Maintenance

We compiled a long list of requirements that a new solution should fulfill. Here are some of the more important ones:

Optimal Support of Incremental Distribution

We do not want to trace changes as they are applied and re-apply them at a later date on the slave file servers. Changes should be found by comparing the status of the master and the slave file server. The comparison should be stateless - it should not depend on the update history. Each file server is administered by an independent group of system administrators, so we do not want to rely on what we think the status is, but rather have to check the actual status of the remote file server. We have to detect changes applied by remote administrators.

Update Programs Currently Executing

Files that are updated may not be overwritten. The old file has to be moved and unlinked; then the new file has to be moved to its final destination.

Do Not Require Root Permission To Run

We install all software using unprivileged accounts and try to avoid using root permissions as much as possible. Synchronizing software file servers should be done using an unprivileged account, too. If root permissions are required (e.g., to update entries in system files, or for programs that need a set-user-id or set-group-id bit with a system owner or group), a script should be created that the system administrator of the slave file server executes separately.

Support Mapping

To allow localization, a flexible mapping of, e.g., file and path names, permissions, and ownership should be supported. Software packages might be owned by different accounts on different servers. Symbolic links replace files to implement site-specific changes (e.g., for configuration files).

Support Execution of Scripts Before and After Updates

Before and after an update or roll-back it should be possible to execute scripts on the server and the client. You might want to shut down a database server and restart it after the update has finished. License servers might have to be restarted if license files were updated.

Transfer Data Efficiently and Reliably

The data transfer must be efficient, because our links are slow and have a high latency.
If the link fails for a short period of time and the data transfer is aborted, only the missing data should be transmitted, not all of it again.

Be Humble

Do not require a special installation of software packages. We do not want to change the installation of commercial software, and we have to support a variety of different package types.

Should Support Roll-back

It should be possible to undo at least one update. If an update of a software package introduces problems, it should be possible to go back to the previous state of the package. Also, if an update fails, the already applied changes should be rolled back to return to a consistent state.

Minimize Inconsistent States

The time that a software package is in an inconsistent state (the time between the first and the last change that is applied) should be as small as possible. The time between applying changes to software packages that depend on each other should be as small as possible, too. Roll back changes if an update did not complete successfully.

Avoid Errors Pro-actively

Try to verify in advance whether an update is likely to succeed, e.g., check whether the target server has enough disk space to store the new or changed files.

Should Be Flexible

It should be easy to choose alternative distribution media, e.g., tape, email, or a direct network link. Comparing the status of master and slave file servers should be possible even if no direct network link exists between the servers. The tool should be modular and extensible. Tool interfaces should exist and be well documented.

Should Use Standards

Use well-known existing standards and standard tools as much as possible. Do not re-invent wheels.

Should Be Cost Effective

The cost of the product, its installation and customization, and its maintenance must be acceptable.

Evaluation of Alternatives

We took a look at freely available tools, as well as commercial tools, proposed standards, and papers dealing with software management ([14], [17], [20], [21]).

Freely Available Tools

The tools that we looked at can be categorized as follows: tools that help to maintain source code and install software in a heterogeneous platform environment, like rtools [22]; tools whose primary focus is network- and disk-space-efficient installation and unified setup and access by users in a campus network, like ``The Depot'' [3], [18], depot-lite [7], opt_depot [24], ldd [4], lude [8], and beam [19]; and tools that are designed to distribute software, like rdist [1], fdist [6], mirror [25], track [9], sup [2], and SPUDS [5].

All of these tools lack efficient incremental software distribution over slow WAN links. Many make assumptions about the set-up of software packages that impose overly tight restrictions and will not work in our multi-vendor system environment. None of them cares about controlling the transmission in terms of media, scheduling, interruption, or measurement and self-adaptation. No roll-back support exists.

Commercial Tools

Some commercial data distribution tools exist, as well as software management tools that provide additional functions to cover a broader range of the software life cycle, e.g., packaging, installation, and de-installation. XFer from ViaTech, MLINK/ACM & DistribuLink from Legent, Tivoli/Courier from Tivoli, and DSM-SAX from SNI fall more into the data distribution category, whereas HP OpenView Software Distributor from Hewlett-Packard and SunDANS from Sun Microsystems [13] fall into the latter category.
GUIs and object-oriented methods and policies ease software packaging and automate distribution and gathering tasks. Typical application fields for these commercial tools are large companies with many distributed offices and many client machines, such as financial or insurance companies. All commercial tools claim to comply with standards, although it is sometimes hard to tell which standard they mean. Only one commercial tool purports to be compliant with the draft of the POSIX standard 1387.2 (formerly 1003.7.2), Software Administration.

We found it difficult to explain to some tool vendors exactly what we mean by incremental update. No package had proper mechanisms built in; incremental updates can be added to most commercial tools by writing scripts.

Price is also a problem. Truly powerful tools do not start below $50,000 just for the licenses; add an equivalent amount for installation, customization, maintenance, and updates. One benefit of commercial tools is to reduce the required skill and cost of the personnel at remote sites. This does not help for our few, demanding development sites.

Usage of commercial tools is problematic if you want to establish links to external companies: you need to buy licenses, and so does your partner. All the distribution tools we evaluated require daemons running on both the source and the target location, and you have to pay a license fee per master and per client. It is foreseeable that not all companies (especially small consulting groups) are willing or able to spend the extra money and the extra installation effort. Therefore it is very important for us to have a tool that can be used without any restrictions, at least on the client side.

Two software packages looked promising, though:

Tivoli/Courier

Tivoli Systems has implemented an extensive collection of system administration tools. One of them, Tivoli/Courier, allows automatic software distribution and control of server and workstation configurations. Tivoli/Courier is embedded in the Tivoli Management Framework; together with other tools, this forms a complete management environment. A graphical user interface and an object-oriented approach allow easy maintenance of large systems. Tivoli/Courier allows defining software packages and different styles of scheduling, i.e., which files are updated at what time. Scripts can be added to adapt the management environment to special requirements.

Tivoli/Courier does not fit our requirements with respect to incremental distribution as we need it (it could be implemented by external scripts). It was not clear whether we could run Tivoli/Courier standalone, without the framework or the other system administration tools. A direct network link is mandatory. All hosts involved in the update process need licenses, and the software has to be installed as root. The reference customers seem to have a different profile (many hosts to update, static package design, smaller volume) than we have. If we already had the Tivoli Management Environment in production use as the basis of our system administration, it would make sense to evaluate the performance of Tivoli/Courier; for the time being, it would be too costly to implement the Management Environment just to use the Tivoli/Courier part.

XFer

XFer from ViaTech Technology was the second tool we evaluated very closely. XFer is optimized to solve the standard software distribution task: update many hosts spread over the globe with packages of (in our opinion) modest size and well-known structure.
Compression and packaging of updates are standard. Packages, hosts, and other resources are objects and can be managed efficiently. Machines can be grouped in ``profiles''; these profiles allow sending updates to all machines that require certain software, regardless of type and location. But at the time of the evaluation there was no built-in support for incremental distribution (in our terms). The high entry costs would pay off for many client hosts, but not for the few servers we run. Database and protocol overhead are not known. XFer would profit from being installed together with cooperating network, user administration, and configuration management tools, which are not available at our sites.

POSIX 1387.2: Software Administration

P1387.2 provides the basis for standardized software administration. It includes commands to install, remove, configure, and verify software. A distribution format (install image) for software is defined, along with commands to create and to merge distribution images. Provision is also made for tracking what software is installed and what its level is. Commands issued on one system may be directed to occur on any number of systems throughout the network. There is a set of concessions to operating systems not based on POSIX.1 and POSIX.2, and there are exception conditions, so that systems such as DOS can conform to P1387.2.

Status

P1387.2 has passed its ballots within the IEEE and has become the first POSIX system administration project to complete its work. P1387.2 now has to be approved by the IEEE Standards Board, registered by ISO as a Committee Draft, and will soon be balloted as a Draft International Standard. Copies of the current draft, P1387.2/Draft 14a, April 1995, are available from the IEEE Computer Society and from the IEEE Standards Office. A previous version of the draft (P1387.2/Draft 13) is also available by anonymous ftp [16]. See [15] and [23] for a more detailed discussion of the status of P1387.2 as of April 1995.

Suggestions for follow-on activity include a guide to the best use of the current standard; a profile for DOS (and related) systems; version and patch/fix management; policies in distributed management (especially related to the definition of the success of an operation) and associated recovery policies; file sharing and client management; hierarchical distribution; scheduling; and queuing and queue management. Because P1387.2 does not specify the means by which distributed functions occur, the System Management Working Group at X/Open is working to provide the necessary specifications to permit distributed interoperability.

Relevance

P1387.2 focuses on the distribution of software by installation. Software service (patching software) has explicitly been left out of the standard, because the existing schemes currently in use were too diverse; a possible solution is described in the rationale, though. Once ISVs and all our internal developers of software, technology, and library data use P1387.2 for their products, the initial installation of software might become easier (or at least more uniform). Hopefully, even software service (i.e., applying patches) will eventually become standardized. However, unless all changes to all our software are made using standard procedures, we will have to clone file servers. Even if we were able to install all changes in a standard way, we would need some kind of queuing, so that we can first test changes before all servers are updated; updates would have to be scheduled for night time.
We do not believe that there will be a standard or a commercial product for cloning file servers any time soon.

OpenDist

No available tool met all of our important requirements, so we decided to implement our own tool set. OpenDist consists of administrative tools that take care of scheduling updates, statistical tools that report changes and the performance of updates, and distribution tools that do the actual update. The design goals are modularity and flexibility: the tools should be independently usable entities, easily exchangeable, and able to work together in changing configurations. All tools are implemented in Perl [12], version 5. Wherever possible, existing standard tools are used, e.g., GNU tar (gtar(1)), gzip(1), etc. All tools currently use a command line interface; a more convenient graphical user interface for casual users will be added using TkPerl or an HTML browser.

Administrative Tools

These manage the scheduling of updates and call the distribution tools. Software administrators can subscribe and unsubscribe to software packages, query the software package database for information about each software package, temporarily postpone the update of selected packages, force an immediate update of selected packages, or roll back updates. There are tools to browse the update history and to retrieve information about the status of each file server and software package. Information about our software packages is stored in a flat (ASCII) file database and includes: name and purpose of the package, status (test, old, current), dependencies between packages, grouping of packages into bundles, recommended update frequency, contact information for the package maintainer, etc.

Statistical Tools

These display performance information. Transfer rates and update durations are indicators of bottlenecks and problems with the WAN lines; we need early indicators to be able to upgrade our network in a timely manner. The update history can be shown, as well as the current and historic free disk space on the file servers.

Distribution Tools

These distribute the software packages, replacing rdist in our environment. The tools are optimized for low speed links with high latency: software updates are transmitted in a compressed format. They try hard to never leave a package in an inconsistent state, i.e., an update is either completed or rolled back. Pre- and post-processing scripts are supported, e.g., to save files before updating or to restart a license server after updating.

The OpenDist Distribution Engine

The OpenDist distribution engine is implemented in several independent stages. Each stage has clearly defined input and output data formats, so tools can easily be combined and exchanged as long as the interface does not change. When several software packages are updated, the different steps can be pipelined and performed in parallel, to further speed up the update and to optimize resource usage. For lines that do not support independent data transfers in both directions, the sending of updates and the retrieving of index files should not be done in parallel.

For our important packages we run the update once a day on the distribution server. The distribution process is controlled and mainly run on this server. We do not use the master file server but a dedicated machine as the distribution server. This server must have direct access to all file servers. The index files and archives are temporarily stored on a holding disk.

sub scan_it {
    my($Name) = @_;
    my(@filenames);
    ...
    ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
     $atime, $mtime, $ctime, $blksize, $blocks) = lstat($Name);
    ... if file not readable, print warning and return ...
    printf("%s\t0%o\t%d\t%d\t%d\t%d%s\n",
           $Name, $mode, $mtime, $size, $nlink, $ino,
           -l $Name ? "\t->\t" . readlink($Name) : "");
    if (-d $Name && (! -l $Name)) {
        opendir(DIR, $Name);
        @filenames = sort(grep(!/^\.\.?$/, readdir(DIR)));
        closedir(DIR);
        for (@filenames) {
            &scan_it(sprintf("%s/%s", $Name, $_));
        }
    }
}

Figure 1: Excerpt from od_getindex

The Update Flow

A package is updated in the following steps:

+ INDEX: Create an index of the software package on the master and on all slave servers. Compress it and transfer it to the distribution server's database.

+ COMPARE: Compare the master index against each slave index and output a list of changed file attributes. Exceptions are handled here.

+ BUILD-ARCHIVE: Build a compressed archive of all changed files.

+ BUILD-INSTALL: Build an installation script that performs the actual update on the slave server and takes care of changed attributes.

+ BUILD-RESTORE: Build a roll-back script that allows undoing the update in case of failure, or in case the last status should be restored.

+ TRANSMIT: Transmit the archive and the scripts to the slave server.

+ INSTALL: Execute the installation script on the slave server and notify the administrator of success or failure.

The Update Stages in Detail

The INDEX Stage

Every time an update cycle is started, a sorted index of the software package is retrieved first. The index is not stored on the remote system, but immediately compressed and piped to the distribution server. It is plain ASCII and sorted alphabetically by filename to make the comparison of indices easier. Each line in the index file describes one object of the filesystem, e.g., a file, a directory, a FIFO, etc.

The perl script od_getindex is started on the master and the slave servers from the distribution server. od_getindex scans the complete tree beneath a given directory; symbolic links are not traversed. For each entry, the returned index contains the attributes ``name'', ``type & permissions'', ``size'', and ``modification time'' by default. For symbolic links, the link target is appended. For files with a link count greater than one, the link count and the inode number are appended. Optionally, more attributes like owner, group id, or a checksum can be reported. The core of od_getindex is very simple and efficient (Figure 1); for the final version, of course, we added more error checks.

The attributes are separated by tabs. Files with funny characters in their names (e.g., tabs, newlines) are ignored, and od_getindex issues a warning message. A leading hash sign marks extra data or comment lines. Besides these characters, no restrictions are imposed on filenames or package names; usage of these and other non-printable characters should be avoided anyway. Extra data like a time stamp, the remote host name, the blocksize of the file system, or the mapping of user ids to user names are added case by case. The format of the entries is described in ``#FORMAT'' lines.

The output of od_getindex is run through an ignore filter, compressed, and finally transmitted to the distribution server. The index file uses complete filenames relative to the root of the software package and gets quite large (about 100 MBytes for our 25 GBytes of software); on the other hand, the compression rate is obviously very good (about a factor of 10).
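As a sketch of how the distribution server might drive this stage: the host names, the package path, and the naming of the stored index files below are assumptions, and the optional ignore filter described next is omitted.

#!/usr/bin/perl
# getindex - drive the INDEX stage from the distribution server.
# A sketch: hosts, paths, and file naming are assumptions.
use strict;

my %host = (master => 'server', slave => 'client');   # hypothetical
my $pkg  = '/pool3/gnu';
my $hold = '/holding';                 # holding disk, locally mounted

for my $role (sort keys %host) {
    # The index never touches the remote disk: od_getindex writes to
    # stdout, gzip compresses the stream, and rsh carries it home.
    my $cmd = "rsh $host{$role} 'od_getindex $pkg | gzip' "
            . "> $hold/$role.idx.gz";
    system($cmd) == 0 or die "indexing $pkg on $host{$role} failed\n";
}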
The ignore filter is an optional script which, for example, allows avoiding the transfer of unwanted data like huge working directories. Ignore filters can be implemented per site and/or per package. The appropriate ignore filters are transferred from the distribution server in advance.

#Start "od_getindex /pool3/gnu", host: server, time-stamp: 806615319
#FORMAT_FILE     NAME TYPM MTIM SIZE
#FORMAT_HARDLINK NAME TYPM MTIM SIZE NLNK INOD
#FORMAT_SOFTLINK NAME TYPM MTIM SIZE -> TARG
#BLOCKSIZE: 1024
... Directory:
gnu/sun4.1/bin          040775   806615225   1024
... File with link count > 1:
gnu/sun4.1/bin/gzcat    0100755  806614252  65536  2  75248
gnu/sun4.1/bin/gzip     0100755  806614252  65536  2  75248
... Symbolic link:
gnu/sun4.1/bin/gzcmp    0120777  806615225      6  ->  gzdiff
gnu/sun4.1/bin/gzdiff   0100755  746115780   2008
... Plain file:
gnu/sun4.1/bin/gzexe    0100755  746115781   3864
gnu/sun4.1/bin/gzgrep   0100755  746115781   1341
...
#End "od_getindex /pool3/gnu", host: server, time-stamp: 806615321

Figure 2: Excerpt from od_getindex output

Other output filters (e.g., for encryption) can easily be added to the stream. At this stage there is no difference between master and slave servers: the very same program is run at all locations where the package is found.

The COMPARE Stage

As soon as the indices for a master and a slave package are available, the differences can be calculated. od_compare opens both index files and reads them line by line. As the index files are sorted by entry name, missing or extra entries are recognized easily (a simplified sketch of this merge appears at the end of this section). An ADD or DEL tag followed by name and type is printed, followed by a list of the attributes and values relevant for this type. If an entry is in both index files, all significant attributes of the entry are compared, and a list of the changed attributes is printed.

A post-processing filter takes care of well-known exceptions. These exception filters can be defined per package and/or per site. Several possible actions like ``notify administrator'', ``ignore this'', or ``run script'' can be triggered. A typical case is to exclude printer configuration files from being updated. To allow the mapping of specific attributes, an optional mapping filter can be applied to the master index before the comparison; examples are the mapping of user ids or of entry names.

The BUILD Stage

The remaining changes are handed over to the od_build script. This script reads the changes and decides, based on the type attribute, which action should be triggered. The build script packs all files that need an update into an archive; currently we are using gtar to create the archive. The build script also counts the number of changes and their size. If certain built-in limits are exceeded, different actions can be started: e.g., if the overall size of an update archive is very large, a tape might be written and sent instead of transmitting the archive over the direct link; if the number of changes is above a given percentage, the administrator is notified.

Besides the archive, an installation and a roll-back script are generated. The installation script contains appropriate code for every change. It starts the pre- and post-installation scripts, checks for available disk space on the target system, and tries to avoid inconsistent states. The roll-back script is able to undo the latest update, even if the update failed half-way.
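Stripped of attribute checks and header handling, the heart of od_compare is an ordinary merge of two sorted streams. The following simplified sketch reports only ADD and DEL entries; the real tool also compares the attribute columns and prints the Change-* lines shown in Figure 3, and it skips the `#' comment lines.

#!/usr/bin/perl
# A simplified sketch of the sorted merge inside od_compare:
# report entries found only in the master index (ADD) or only in
# the slave index (DEL). Attribute comparison and `#' header
# lines are elided.
use strict;

open(MASTER, $ARGV[0]) or die "master index: $!\n";
open(SLAVE,  $ARGV[1]) or die "slave index: $!\n";

my $m = <MASTER>;
my $s = <SLAVE>;
while (defined $m || defined $s) {
    my ($mname) = defined $m ? split(/\t/, $m) : ();
    my ($sname) = defined $s ? split(/\t/, $s) : ();
    if (defined $m && (!defined $s || $mname lt $sname)) {
        print "ADD: $mname\n";         # entry exists only on the master
        $m = <MASTER>;
    } elsif (defined $s && (!defined $m || $sname lt $mname)) {
        print "DEL: $sname\n";         # entry exists only on the slave
        $s = <SLAVE>;
    } else {
        # entry exists on both sides: compare attributes here (elided)
        $m = <MASTER>;
        $s = <SLAVE>;
    }
}
close(MASTER);
close(SLAVE);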
#Start "od_compare gnu.stat.2 gnu.stat", host: od_host, time-stamp: 806616691 #MASTER #Start "od_getindex /pool3/gnu", host: server, time-stamp: 806615319 #CLIENT #Start "od_getindex /pool2/gnu", host: client, time-stamp: 806614037 ... Changed attributes of file: Change-MTIM: gnu/sun4.1/bin/gzcat FILE 806614252 753610106 Change-PERM: gnu/sun4.1/bin/gzcat FILE 0755 0777 ... Replace a file with a symbolic link: Change-TYPE: gnu/sun4.1/bin/gzcmp SYML SYML FILE Change-SIZE: gnu/sun4.1/bin/gzcmp SYML 6 2008 Change-PERM: gnu/sun4.1/bin/gzcmp SYML 0777 0755 Change-MTIM: gnu/sun4.1/bin/gzcmp SYML 806615225 746115780 Change-TARG: gnu/sun4.1/bin/gzcmp SYML gzdiff ... Added a symbolic link: ADD: gnu/sun4.1/bin/zcat SYML 5 Change-PERM: gnu/sun4.1/bin/zcat SYML 0777 Change-TARG: gnu/sun4.1/bin/zcat SYML gzcat Change-MTIM: gnu/sun4.1/bin/zcat SYML 806614148 ... Deleted a directory + files in it: DEL: gnu/sun5.2/lib DIR 1024 Change-PERM: gnu/sun5.2/lib DIR 0775 Change-MTIM: gnu/sun5.2/lib DIR 806612281 DEL: gnu/sun5.2/lib/libfl.a FILE 1328 Change-PERM: gnu/sun5.2/lib/libfl.a FILE 0644 Change-MTIM: gnu/sun5.2/lib/libfl.a FILE 801569606 ... Change the target of a symbolic link: Change-SIZE: gnu/sun5.4/man SYML 16 11 Change-TARG: gnu/sun5.4/man SYML ./share/man/man3 ./share/man #End "od_compare gnu.stat.2 gnu.stat", host: od_host, time-stamp: 806616694 Figure 3: Excerpt from compare output The TRANSMIT Stage As soon as the archive, the install and the roll-back script are ready, they are scheduled for transmission. The actual transmission can run on a direct link, on tape or by email. The current implementation relies on a direct link. The update package is transmitted in fragments suitable for the line speed. Each fragment gets a checksum and will be retransmitted in case of an error. The fragments might be encrypted. In case the package is going to be transmitted over a leased line, a flag can be set to postpone the transmission until non-busy hours, to avoid resource conflicts with other users. The fragments are decrypted and concatenated on the target system. The INSTALL Stage After the complete update package is on the slave server, the install script is started. At first the install script checks if the necessary disk space is available. Next all files which will be updated or deleted are written to a roll-back archive on the slave distribution holding disk. Care is taken, that files are not overwritten. Old files are moved and unlinked, before new files are installed. This way, we can update programs that are currently executing. Then the changes are applied. The install script runs all pre- and post-installation scripts at the right time. If something fails with the update, the roll-back script is called to restore the latest consistent state. Through the five independent steps changes can be implemented more easily. We could implement different INSTALL backends and use the most appropriate case by case. The separation of finding and applying changes gives us more flexibility in trying not to run into disk space problems: We could remove all files that are going to be changed at the beginning of the update, or we could delete and add files simultaneously. We could even try to allocate extra disk space to prevent someone else from ``stealing'' disk space while we are installing the changes. 
As long as several changes are applied simultaneously, however, it is not possible to guarantee that we will have enough disk space to install the changes, nor that we can roll back to the previous status - though this is unlikely to happen.

Results

OpenDist performs much faster than our previous, rdist-based solution: comparing a subset of 10% of all software on the software servers in Villach and München completes in half an hour instead of 8 hours.

Creating the archive of files to be distributed must be done carefully. There is no backup or archive program that can cope with every special situation (files with holes, excessive pathname or symbolic link lengths, unreadable files and directories, or special (device) files) [10]. Currently we are using gtar and will document the known limitations.

Distributing an archive of files helps to guarantee a consistent state of the target server. We do not start to change the slave server before the complete set of changed files has been transmitted to the slave server's holding disk. During the actual change of the slave server's software package, the network link may be down without affecting the update process.

Roll-backs of (at least one) update can be supported more easily than with rdist: the installation script can archive all files to be deleted or changed, and the distribution engine creates a script that replaces the changed files with the saved ones.

A link (currently a direct network link) between master and slave server is needed only in the INDEX and TRANSMIT stages. The COMPARE and BUILD stages run locally on the distribution server, independent of the line to the remote host.

Instead of line speed and latency, index creation and compression have now become the bottleneck: the transfer rate from our Singapore site would allow transferring the complete compressed index (5.5 MBytes) in about 25 minutes, whereas creating it currently takes about three hours. When the software is accessed over NFS, index creation slows down by a factor of 3-5; therefore, index creation should always be done on the file server that actually stores the software.

Small changes are very likely to be bug fixes to software that is officially released and in use; they have to be found and applied as soon as possible. If a package changed substantially, it is very likely to be new and not yet officially released; changes to such a package are less urgent and can be postponed to weekends and non-busy hours. OpenDist is very efficient in finding differences, and there is enough bandwidth available to apply small changes on the fly. This allows us to run OpenDist once every day on all packages, apply small changes immediately, and postpone major changes to weekends.

The index files can be stored, so the same tools can be used not only to find differences between sites, but also to find changes made to a software package within a specific period of time. Thus the same tools can track the changes applied to our software packages over time.

OpenDist needs no special installation on the remote systems. We can run the OpenDist procedure with any external company without forcing them to buy licenses or to install a huge tool package. All the tools used in OpenDist are freely available and can be distributed, so we could deliver, e.g., perl and the GNU software to the client, too.

OpenDist uses rsh(1) and rcp(1), which require trusted hosts. We would not use this over public networks, but within our corporate network it is acceptable.
It is also possible to replace the use of rsh with a more secure mechanism, if required.

Conclusion

Is OpenDist Better Than rdist?

Are apples better than oranges? It depends. The OpenDist toolset has more functionality than a distribution engine like rdist; in fact, we could use rdist as one of OpenDist's distribution engines. OpenDist fulfills all our important requirements and, compared to our previous rdist-based solution, wins especially in the following categories: efficient transfer over low speed, high latency lines; support for rolling back changes; and the length of time during which software packages are in an inconsistent state. Initial installation and configuration need more effort, though, and you will also need additional disk space for the holding disk on all file servers. rdist is easier to set up and configure and is a standard part of most Unix systems (if your vendor ships an old version of rdist, upgrade to rdist version 6 and complain to your vendor about the old version). We still use rdist for many smaller tasks in the local network, as well as to copy a few files to remote servers. Both tools have their limitations, and OpenDist additionally depends on the tool used to create the archives, which has its own set of limitations [10].

Though we started out simply wanting to replace rdist with a better distribution tool, we realized that software distribution is closely coupled with other software maintenance tasks, like installing software, keeping information about installed software, de-installing software, and making software accessible to users. All stages of the software life cycle interact with each other; changes made in one stage can help to solve problems in related stages.

Future Work

The first implementation of the OpenDist distribution engine has proven to be superior to rdist in our environment, so we will enhance it further. Enhancements include reducing the amount of re-transmitted data in case of line failures and supporting optional transfer media (e.g., using tapes to transmit large, low-priority packages). We will also add more functions to the administrative and statistical tools; scheduling of updates and tools to query the status of the slave software servers will be implemented next. We will fine-tune the parallelization of the update to speed it up further. We plan to implement some of the needed update functions (such as moving a file before updating it) within gtar, which will make the installation script simpler and more robust.

Availability

At the time of writing, OpenDist is only available for use within Siemens AG. We hope to make the source for OpenDist available on the Internet at a later time. The paper and the slides for this talk are available by anonymous ftp from ftp.ConnectDE.NET in the directory /pub/sysadmin/sw-distribution/OpenDist/.

Acknowledgments

Special thanks go to Tom Christiansen for reviewing early drafts of this paper, to all our system administrators for valuable discussions and encouragement, to Gernot Babin for his many contributions, and to SAM for all the support you have given. We thank the companies Interchip and Opis for their support during the evaluation of their products. We also thank Connect! for providing ftp and web space.

Author Information

Peter W. Osel received his diploma in electrical engineering from the Technische Universität München (TUM) in 1985. For three years he worked in the central research and development department of Siemens AG, where he developed tools for the ECAD of Integrated Circuits. Since 1988 he has been working for the Semiconductor Division of Siemens.
He is responsible for the worldwide integration and distribution of the CAD system, the development of central tools, and the coordination of the development sites' system environment. Reach Peter at Siemens AG, HL CAD, Postfach 801709, D-81617 München, Germany; by e-mail at pwo@HL.Siemens.DE; or see his Web page at https://www.ConnectDE.NET/people/pwo/.

Wilfried Günsheimer received his diploma in electrical engineering from the Technische Universität München (TUM) in 1990. Since then he has been working for the Semiconductor Division of Siemens. He started in the application support group for RISC microprocessors and Siemens microcontrollers, where he was responsible for the definition, specification, and test of development tools, customer support, benchmarking, and the simulation of system performance. He ran a small heterogeneous network (PCs, X terminals, workstations) inside the Siemens network. Since the end of 1994 he has been working in the CAD department; management of licenses, installation of software, software distribution, and user support are his main topics. Troubleshooting printer/plotter problems is his favorite time waster. Reach Wilfried at Siemens AG, HL CAD, Postfach 801709, D-81617 München, Germany; or by e-mail at wig@HL.Siemens.DE.

References

[1] Michael A. Cooper, ``Overhauling Rdist for the '90s'', Proceedings of the Sixth Systems Administration Conference (LISA VI), pp. 175-188, Long Beach, CA, October 19-23, 1992.

[2] Stephen Shafer and Mary Thompson, ``The SUP Software Upgrade Protocol'', Carnegie Mellon University, School of Computer Science, 1988. Available from mach.cs.cmu.edu in /usr/mach/public/doc/sup.ps.

[3] Wallace Colyer and Walter Wong, ``Depot: A Tool for Managing Software Environments'', Proceedings of the Sixth Systems Administration Conference (LISA VI), pp. 153-162, Long Beach, CA, October 19-23, 1992.

[4] Walter C. Wong, ``Local Disk Depot - Customizing the Software Environment'', Proceedings of the Seventh Systems Administration Conference (LISA VII), pp. 51-55, Monterey, CA, November 1-5, 1993.

[5] Ola Ladipo, ``A Subscription-Oriented Software Package Update Distribution System (SPUDS)'', Proceedings of the Workshop on Large Installation Systems Administration, pp. 75-77, Monterey, CA, November 17-18, 1988.

[6] Bjorn Satdeva and Paul M. Moriarty, ``Fdist: A Domain Based File Distribution System for a Heterogeneous Environment'', Proceedings of the Fifth Large Installation Systems Administration Conference (LISA V), pp. 109-125, San Diego, CA, September 30 - October 3, 1991.

[7] John P. Rouillard and Richard B. Martin, ``Depot-Lite: A Mechanism for Managing Software'', Proceedings of the Eighth Systems Administration Conference (LISA VIII), pp. 83-91, San Diego, CA, September 19-23, 1994.

[8] Michel Dagenais, Stéphane Boucher, Robert Gérin-Lajoie, Pierre Laplante, and Pierre Mailhot, ``LUDE: A Distributed Software Library'', Proceedings of the Seventh Systems Administration Conference (LISA VII), pp. 25-32, Monterey, CA, November 1-5, 1993.

[9] The track package, available at ftp://ftp.cs.toronto.edu/pub/track.tar.Z.

[10] Elizabeth D. Zwicky, ``Torture-testing Backup and Archive Programs: Things You Ought to Know But Probably Would Rather Not'', Proceedings of the Fifth Large Installation Systems Administration Conference (LISA V), pp. 181-189, San Diego, CA, September 30 - October 3, 1991.

[12] Larry Wall and Randal L. Schwartz, Programming perl, O'Reilly & Associates, Inc., Sebastopol, CA, 1991.
[13] SunDANS (SoftDist) Manual, alpha draft, Sun Microsystems Computer Corporation, Mountain View, CA, September 1993.

[14] Ram R. Vangala, Michael J. Cripps, and Raj G. Varadarajan, ``Software Distribution and Management in a Networked Environment'', Proceedings of the Sixth Systems Administration Conference (LISA VI), pp. 163-170, Long Beach, CA, October 19-23, 1992.

[15] Barrie Archer, ``Towards a POSIX Standard for Software Administration'', Proceedings of the Seventh Systems Administration Conference (LISA VII), pp. 67-79, Monterey, CA, November 1-5, 1993.

[16] ``POSIX 1387 System Administration Standard (draft 13)'', available at ftp://dcdmjw.fnal.gov/posix/.

[17] John Sellens, ``Software Maintenance in a Campus Environment: The Xhier Approach'', Proceedings of the Fifth Large Installation Systems Administration Conference (LISA V), pp. 21-28, San Diego, CA, September 30 - October 3, 1991.

[18] Kenneth Manheimer, Barry A. Warsaw, Stephen N. Clark, and Walter Rowe, ``The Depot: A Framework for Sharing Software Installation Across Organizational and UNIX Platform Boundaries'', Proceedings of the Fourth Large Installation System Administrator's Conference (LISA IV), pp. 37-46, Colorado Springs, CO, October 18-19, 1990.

[19] Thomas Eirich, ``Beam: A Tool for Flexible Software Update'', Proceedings of the Eighth Systems Administration Conference (LISA VIII), pp. 75-82, San Diego, CA, September 19-23, 1994.

[20] D. Nachbar, ``When Network File Systems Aren't Enough: Automatic Software Distribution Revisited'', USENIX Conference Proceedings, Summer 1986, pp. 159-171.

[21] Steven W. Lodin, ``The Corporate Software Bank'', Proceedings of the Seventh Systems Administration Conference (LISA VII), pp. 33-42, Monterey, CA, November 1-5, 1993.

[22] Helen E. Harrison, Stephen P. Schaefer, and Terry S. Yoo, ``Rtools: Tools for Software Management in a Distributed Computing Environment'', Proceedings of the Summer 1988 USENIX Conference, pp. 85-94, San Francisco, CA, 1988.

[23] Nicholas M. Stoughton, ``Standards Update, POSIX System Administration: Software Administration'', Newsgroups: comp.std.unix, Message-ID: <3rsf12$nqu@cygnus.com>, 16 Jun 1995 10:29:06 -0700, available at ftp://ftp.ConnectDE.NET/pub/sysadmin/sw-distribution/OpenDist/P1387.2-status-9504.

[24] opt_depot, available at https://www.arlut.utexas.edu/opt_depot/opt_depot.html.

[25] mirror, available at ftp://src.doc.ic.ac.uk/packages/mirror/.