Speeding Up UNIX Login by Caching the Initial Environment

Carl Hauser - Xerox Palo Alto Research Center

ABSTRACT

A package scheme helps users manage the environment variables needed by the applications that they use, but it imposes a long delay during login while the environment is incrementally constructed. This paper describes an approach to caching the incrementally constructed environment. The mechanism caches different environments for different operating systems and is robust in the face of users' changes to their .login files. For the typical PARC user, who enables 11 packages at login, caching reduces the time to log in from about 30 seconds to about 5 seconds.

Introduction

The Xerox Palo Alto Research Center (PARC) research community and support staff use various UNIX systems and applications in the course of their daily work. Applications are stored on file servers and maintained for multiple systems using strategies similar to those used in the NIST Depot [2]. Each application is centrally maintained, but every user's computing environment is highly customized: login scripts set environment variables, configure terminals, and so on, for the applications that the user actually uses. Managing the contents of the environment is burdensome for users who use many applications on many different kinds of systems.

To alleviate some of this burden, PARC implemented a Packages scheme similar to the Modules scheme described by Furlani [1]. Each application (more precisely, each version of an application) is stored on a file server in a directory tree structured according to the Packages conventions. The conventions require that every package's directory have a top/ subdirectory containing a README file and two C Shell scripts called bringover and enable. A user executes the bringover script once to establish the permanent state of the application in his home directory. Thereafter, the environment in any shell instance can be prepared for the application by executing the enable script.
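Because the Packages layout is a convention, it can be checked mechanically. The following Python sketch is an illustration only and is not part of the Packages system. It assumes that README, bringover, and enable are ordinary files directly under a package's top/ directory, and it reports which of them are missing. The helper name and the script name check_packages.py are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Entries the Packages convention requires in every package's top/ directory
# (assumed here to be ordinary files).
REQUIRED_TOP_FILES = ("README", "bringover", "enable")


def missing_package_files(package_dir: str) -> list[str]:
    """Return the required top/ entries that package_dir lacks (hypothetical helper)."""
    top = Path(package_dir) / "top"
    return [f"top/{name}" for name in REQUIRED_TOP_FILES if not (top / name).is_file()]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python check_packages.py PACKAGE_DIR...
    for pkg in sys.argv[1:]:
        missing = missing_package_files(pkg)
        print(pkg, "ok" if not missing else "missing " + ", ".join(missing))
```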
At a minimum, the enable script adds the package's bin/ directory to the shell's PATH environment variable and its man/ directory to the MANPATH environment variable. In general, however, enable scripts may affect environment variables in arbitrary ways.

Every user's .cshrc file defines aliases implementing bringover and enable commands that take a package name as an argument. Using the support programs of the Packages system, the aliases locate the top/ subdirectory for the named package and source the appropriate OS-and-package-specific bringover or enable file. In short, before using a package for the very first time, a user executes bringover once; thereafter, executing enable in a shell instance sets up that shell's environment variables to use the package. Users typically enable the packages that they use most with commands in their .login files, but they also enable packages interactively, for example, to try out new software.

Notice that, unlike Furlani's Modules, our Packages support only the C Shell (csh) and other shells that use the C Shell language. The caching techniques described here could be applied to module systems using other shells, but we have not needed to do so for the PARC environment.

Problem Statement

As was observed in the Modules system, enabling a package takes a few seconds. This is acceptable when interactively enabling a single package, but it has proven unacceptable when many packages are enabled in the .login script. Because our researchers' work on interoperable and distributed systems leads them to log in often to various machines, they soon find the delay during login intolerable. Reducing the time for each enable would be highly desirable, but it would not be enough. Each time the PATH variable changes, csh hashes every directory on the search path, and this imposes a lower bound on the cost of each enable. Even at that lower bound, the total delay would still be too great.
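The effect of per-enable rehashing can be shown with a deliberately simplified cost model. This is an assumption for illustration, not a measurement. The model assumes three things. First, every change to PATH makes csh scan each directory on the search path once. Second, each enable changes PATH exactly once, by adding one directory. Third, a cached login sets PATH once, to its final value. The Python sketch below counts directory scans under these assumptions. It leaves out the initial set path in .login, because both approaches perform it.

```python
def directory_scans(base_dirs: int, packages: int, cached: bool) -> int:
    """Directory scans caused by enabling `packages` packages (simplified model).

    base_dirs: directories on the path before any enable.
    cached:    True if the final PATH is set once, from a cache file.
    """
    if cached:
        # One rehash of the final path: the base directories plus one per package.
        return base_dirs + packages
    # Incremental enables: the k-th enable rehashes base_dirs + k directories.
    return sum(base_dirs + k for k in range(1, packages + 1))


if __name__ == "__main__":
    # A 7-entry starting path (as in the sample .login of Figure 1) and 11 packages.
    for cached in (False, True):
        label = "cached" if cached else "incremental"
        print(label, directory_scans(7, 11, cached))
```

Under this model, incremental enabling costs 143 directory scans and a single cached assignment costs 18. Any very large directory on the path, such as the standins directory discussed next, is rescanned on every one of the incremental rehashes.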
(The problem is compounded by a large directory, containing standins for all known executables, that many users place at the end of their search paths. The standins help users figure out which package needs to be enabled to provide a command, but with over 6000 entries the directory is very expensive to hash.)

Approach

Our caching approach is rooted in the observation that the state of the environment immediately after login is almost always the same for a given user on a given kind of machine. It would differ only if the user changed her .login file or the system administrators changed the effect of a package's enable script. Changing the .login file is an infrequent event; changing an enable script is rarer still.

------------------------------------------------------------------
set path = ( /usr/ucb /bin /usr/bin $HOME/bin /etc /usr/etc \
             /usr/parc/bin )
set packages=\
    ( misc lemacs openwin X11R5 Xmisc afs gdb lcd )
if ( -r $HOME/.login-shared ) then
    # .login-shared enables each package listed in $packages
    source $HOME/.login-shared
else
    # normally done in .login-shared for fast enable caching
    foreach p ($packages)
        echo -n " $p"
        enable $p
    end
    echo ""
endif

Figure 1: Sample .login fragment
------------------------------------------------------------------

The environment caching mechanism described here reduces the delay during login by setting all environment variables exactly once, from a file source'd from the user's home directory. A separate file is kept for each OS type. If no cache file exists for the OS, or if the .login file has changed, each package is enabled individually so that the environment is correctly initialized. The cache is recomputed asynchronously at each login, so there is at most a one-login delay in correcting the cache for a change made to an enable script.
Thus, the cache is transparent to the user. The only visible effects are the shorter time it takes to log in and the possibility of missing a (rare) enable script change for one login.

It would be possible to make the use of the cache sensitive to changes in the enable scripts by comparing a timestamp in the cache with the timestamp of each enable script. We rejected this for three reasons. First, the additional time required to locate and stat the script files would slow down login in the most frequent case, that of no changes. Second, the vulnerability to changes would remain, because enable scripts may themselves depend on other files that might change, so users would have to be warned of the potential anomaly anyway. Finally, implementing such a test would further complicate the system. We judged that these drawbacks outweighed the benefit of a slightly more sensitive test for cache invalidity. For similar reasons, we judged that completely recomputing the cache file at each login was the better choice. Its accuracy and simplicity outweigh the reduction in system load that might be gained by trying to determine when a recomputation is really needed.

Implementation

Environment caching is implemented by a single, shared C Shell script that is source'd from users' .login files. To use it, users modify their .login files to initialize the shell variable packages with a list of the names of packages to be enabled and then source the file .login-shared, as shown in Figure 1.

.login-shared provides the caching implementation. (The script appears in the Appendix should you want to follow along during the discussion.) It is invoked in two ways. As we have seen, it is source'd from users' .login files. In addition, .login-shared itself starts a nice'd, background C shell that also executes .login-shared.
The first of these invocations establishes the environment for the user's current login session, using a cache if one is available. The second computes a new cache for use the next time the user logs in. (Separate script files implementing these two functions could be used, but having them in a single file is perhaps a bit easier on our administrators.)

.login-shared decides which of these two things to do by checking whether the environment variable MAKEENABLECACHE is defined and, if so, what its value is. If MAKEENABLECACHE is undefined, .login-shared must construct the user environment and start building a new cache in the background. If MAKEENABLECACHE is YES, it should build a new cache. If MAKEENABLECACHE is NO, it should do nothing (see the discussion of login -p below).

To construct the user environment, .login-shared looks for a cache file (by default $HOME/.login-enables-$platform, where $platform names the OS type) and confirms that the file's mtime is later than that of the .login file. If so, it source's the cache. As an additional consistency check, each cache file checks itself against the packages list that it implements. Only if the list matches does the cache construct the environment and signal success by setting the shell variable didenables. Should either the mtime test or the packages-list test fail, .login-shared takes the slow path of individually enabling each package in packages. Finally, it sets MAKEENABLECACHE to NO and returns to the user's .login file.
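The decisions described above can be summarized in a short Python sketch. It models the csh script and does not replace it. The function names and return strings are hypothetical. Following the text, the freshness test compares modification times using os.stat. The Appendix script actually orders the two files with ls -c -t, which sorts by status-change time.

```python
from __future__ import annotations

import os


def action_for(environ: dict[str, str]) -> str:
    """Role of .login-shared, chosen from MAKEENABLECACHE as the text describes."""
    flag = environ.get("MAKEENABLECACHE")
    if flag is None:
        # Sourced from .login in a fresh login: set up this session
        # and start the background cache builder.
        return "construct-environment"
    if flag == "YES":
        # The forked, background csh.
        return "build-cache"
    # "NO": environment inherited (e.g. via login -p); packages already enabled.
    return "nothing"


def cache_is_usable(cache_path: str, login_path: str,
                    cached_packages: list[str], packages: list[str]) -> bool:
    """True if the fast path applies: the cache exists, is newer than .login,
    and was built for exactly this package list."""
    if not os.path.exists(cache_path):
        return False
    if os.stat(cache_path).st_mtime <= os.stat(login_path).st_mtime:
        return False
    # In the csh version, this comparison is made by the cache file itself.
    return cached_packages == packages
```

When cache_is_usable returns False, the caller falls back to enabling each package in turn, which corresponds to the slow path described above.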
------------------------------------------------------------------
#
if ( $?debugenables ) echo " "
if ( $?debugenables ) echo -n enable cache \
    created Thu Jan 27 14:47:32 PST 1994 by tregonsee
if ("$packages"==\
    "import-support-1.0 import-support gnu-2.0 sunpro bridge-2.0") \
    then \
    setenv MANPATH '/import/bridge-2.0/man:/local/sunpro/SUNWspro/man:\
/import/gnu-2.0/sparc-sun-solaris2/man:\
/import/import-support-1.0/sparc-sun-solaris2.2/man:/usr/share/man'
    setenv PATH '/import/bridge-2.0/p2:/local/sunpro/SUNWspro/bin:\
/import/gnu-2.0/sparc-sun-solaris2/bin:\
/import/import-support-1.0/sparc-sun-solaris2.2/bin:.:\
/sbin:/usr/sbin:/usr/bin:/etc:/usr/ccs/bin:/usr/ucb:/usr/openwin/bin'
    set didenables
endif

Figure 2: A small environment cache file
------------------------------------------------------------------

The .login-shared instance that executes in the background receives the list of packages as its arguments. This shell sees a pristine environment in which none of the packages have been enabled. Its initial environment reflects only the contents of the user's .cshrc file and of the .login file up to the point where it source's .login-shared. .login-shared records this initial environment in a temporary file and then enables each of its arguments. When it has finished all of them, it compares the new environment with the old one. It then produces a cache file containing a setenv command for each environment variable that changed or was newly defined. It is beyond the power of simple English to describe the sed, sort, uniq, and awk commands that compare the outputs of the two printenv commands and combine them into a single collection of setenv commands, so please refer to the Appendix for the actual code. Figure 2 is an example cache file produced by .login-shared.*

* Figure 2 has been edited to reduce the line lengths. The setenv commands are not (and must not be) split over lines in the file.

One obvious concern is multiple logins occurring in close succession. Care is required to ensure that the new cache value is correct in this situation. The names of the temporary files that .login-shared creates include the process id of their creator. Furthermore, cache files are first created under temporary names and then mv'd to the proper place. Since mv is not guaranteed to be atomic, another login proceeding simultaneously could in theory see an inconsistent state. Should this happen, however, that login would simply take the long path of separately enabling each package, so no real harm would be done.

The implementation supports all of the operating systems supported by the Packages system. These include Sun Solaris-1 and Solaris-2 for SPARC systems, IBM AIX 3.2 for the RS/6000, SGI Irix 4 and Irix 5 for SGI systems, and OSF/1 for the DEC Alpha.

Performance

In a sample of 62 PARC Solaris-1 users, the number of packages enabled in .login ranges from 3 to 26. The mean is 12 packages, and the median and mode are each 11 packages. (Solaris-1 is the system used by a large majority of PARC UNIX users. One would be hard-pressed to find 26 enable-able packages for any of the other systems.) Recent measurements indicate that enabling 11 packages during login takes 28 to 35 seconds on a typical SPARCstation 2 running Solaris-1. Using a cache to get the same effect takes 4 seconds.

Gotchas

The .login-shared file has gone through several releases over the last two years to correct deficiencies of the original design and implementation. Most of these releases adapted the script to the different locations of utilities such as printenv, uniq, and awk on the various platforms we support. While tedious to correct, such problems are easily predicted in a multi-platform environment. Apart from the locations of the standard commands, no platform-specific customization has been needed. For example, we have not had to use different switches or different awk or sed scripts on the various platforms.
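The background instance's job can be modeled in Python: compare the environments from before and after the enables, emit setenv commands, and install the cache under a process-specific temporary name. This is an illustration under stated assumptions, not a transcription of the Appendix. The sketch takes the two environments as dictionaries instead of parsing printenv output. It emits setenv commands sorted by variable name for variables that are new or changed, and it ignores variables that disappeared, as the original does. Each value is wrapped in single quotes, and every embedded single quote is written as '\'', which is the same substitution the Appendix's sed command performs. The function names are hypothetical. Python's os.replace stands in for mv. It performs a rename, which POSIX requires to be atomic when both names are on the same file system.

```python
from __future__ import annotations

import os


def csh_quote(value: str) -> str:
    """Single-quote a value for csh; each ' becomes '\\'' (close, escaped quote, reopen)."""
    return "'" + value.replace("'", "'\\''") + "'"


def setenv_lines(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """setenv commands for variables that are new or changed; removals are ignored."""
    return [
        f"setenv {name} {csh_quote(value)}"
        for name, value in sorted(after.items())
        if before.get(name) != value
    ]


def write_cache(path: str, packages: list[str], lines: list[str]) -> None:
    """Write a self-checking cache file under a temporary name, then rename it."""
    tmp = f"{path}.{os.getpid()}"  # process-specific temporary name
    with open(tmp, "w") as f:
        # The cache applies itself only if $packages matches the list it was built for.
        f.write('if ( "$packages" == "' + " ".join(packages) + '" ) then\n')
        for line in lines:
            f.write("    " + line + "\n")
        f.write("    set didenables\nendif\n")
    os.replace(tmp, path)
```

For example, setenv_lines({"PATH": "/bin"}, {"PATH": "/pkg/bin:/bin"}) yields ["setenv PATH '/pkg/bin:/bin'"]. A variable whose value is unchanged produces no line.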
Two subtler bugs have emerged and been corrected over this time. The first concerns explicit use of the login -p command. (Recall that login -p passes its caller's environment to the login shell that it creates.) Suppose .login-shared is invoked from a shell started with login -p. It must not compute a new cache from the difference between the original environment it sees and the environment created by enabling the listed packages, because the original environment already has the packages enabled. This is the purpose of setting MAKEENABLECACHE to NO in the environment. A login shell started with login -p inherits MAKEENABLECACHE along with the other environment variables, so .login-shared recognizes the use of login -p and does not compute a new cache.

The other subtlety concerns environment values containing special characters. The printenv command does not quote the values in its output, so the quoting has to be handled by the sed and awk commands that convert printenv output to setenv commands. Although the fix was not difficult, this bug was not triggered until long after the caching programs were deployed.

Finally, while not directly a problem with environment caching, the improved performance of login has encouraged people to enable lots of packages. This has, in turn, pushed them up against csh's 1K limit on the length of the search path. We have had to produce a variant of csh that supports paths up to 4K in length for use at PARC.

Conclusions

A package scheme that provides scripts for setting up shell environments can be very useful to a large community using many applications. However, both the Modules system and the PARC Package system suffer from the long time it takes to establish the initial environment at login. The environment caching technique described here reduces the time taken by login by about 25 seconds for a typical user at PARC. Since logins usually cannot be easily overlapped with other work, they tend to be particularly disruptive to thought processes.
Saving 25 seconds here is therefore more valuable than saving 25 seconds in some other contexts. The cache validation scheme reflects any change that a user makes to her list of enabled packages immediately. Changes that administrators make to the effect of a package's enable script take effect one login late. Most users have found this behavior acceptable. Users who are uncomfortable with it can easily opt out of using the caches by enabling packages directly in their .login files and invoking .login-shared without setting packages.

Acknowledgements

PARC's Package system was originally conceived and implemented by Stan Lanning for SunOS 4.1. Jim Foote contributed much of the multiplatform capability. Dale MacDonald currently maintains the Package system and many of the most commonly used packages. Steve Putz provided examples and fixes for the problem of environment values containing special characters, and Mark Verber acquainted me with the Modules work. As always, discussions with Al Demers provided new insights.

Availability

.login-shared is available for anonymous ftp. The URL is file://ftp.sage.usenix.org/pub/lisa/lisa8/hauser.tar.Z

Author Information

Carl Hauser joined the Computer Science Laboratory at the Xerox Palo Alto Research Center ten years ago as a Member of the Research Staff, after five years at the IBM San Jose Research Laboratory. He develops language run-time implementations for multi-threaded languages and has a particular affinity for developing caching solutions to performance problems. His 1980 dissertation at Cornell University concerned verification of parallel programs. Reach him by mail at Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, or by electronic mail at chauser@parc.xerox.com.

References

[1] Furlani, J. ``Modules: Providing a Flexible User Environment.'' USENIX Large Installation System Administration V Conference Proceedings, 1991, pp. 141-152.
URL: file://ftp.sage.usenix.org/pub/lisa/lisa5/furlani91-modules.ps

[2] Manheimer, K., Warsaw, B., Clark, S., and Rowe, W. ``The Depot: A Framework for Sharing Software Installation Across Organization and UNIX Platform Boundaries.'' USENIX Large Installation System Administration IV Conference Proceedings, 1990, pp. 37-76. URL: file://ftp.sage.usenix.org/pub/lisa/lisa4/manheimer90-depot.troff

Appendix A: .login-shared

#!/bin/csh -f
# .login-shared: Usage in a .login file

# if you want simple enable caching:
#   set packages=(blank-separated-list-of-package-names-to-be-enabled)
#
#   if you want enabling to proceed silently define silentenables
#   #set silentenables=yes
#
#   if ( -r $HOME/.login-shared ) then
#       # .login-shared enables each package listed in $packages
#       source $HOME/.login-shared
#   else
#       # if things are set up right, this branch should never be executed
#       foreach p ($packages)
#           enable $p
#       end
#   endif

# if you enable different packages in different situations
#   if (situation-1) then
#       set cachename=.login-cache1
#       set packages=(list-of-packages-for-situation-1)
#   else if (situation-2) then
#       set cachename=.login-cache2
#       set packages=(list-of-packages-for-situation-2)
#   else if ...
#   endif
#   if ( -r $HOME/.login-shared ) then
#       # .login-shared enables each package listed in $packages
#       source $HOME/.login-shared
#   else
#       # if things are set up right, this branch should never be executed
#       foreach p ($packages)
#           enable $p
#       end
#   endif

# if you don't want enable caching
#   unset packages
#   if ( -r $HOME/.login-shared ) then
#       # .login-shared enables each package listed in $packages
#       source $HOME/.login-shared
#   endif

# That's the end of the documentation for ordinary users. What
# follows is detailed documentation of the caching system.

# Create a .login-enables file to be sourced by .login. The produced
# file contains setenv commands that reflect environment changes made
# by enable files for the listed packages.
# The advantage of using enable caching is that the environment
# setting during .login processing goes fast, while the hard work of
# figuring out what the enable files do is deferred. The disadvantage
# is that the environment variables will be set according to the
# enable files as they existed at the previous login--which, it is
# claimed, is not too bad since enable files are slowly changing
# objects.

# This file is processed twice: once when sourced from .login, once in
# a forked csh. The environment variable MAKEENABLECACHE controls its
# operation.

# MAKEENABLECACHE conventions
# == YES: processing in a forked csh; .login-enables should be created.
# == NO: processing was sourced from a .login that itself is
#    executing in an environment where enables have already been done.
# unset: processing was sourced from a .login that itself is
#    executing in an environment where enables have not yet been done.

# To start using enable caching, change your .login file as described
# above; each subsequent login will use an existing .login-enables to
# speed enable processing, then create a more up-to-date one for the
# next login.

# Bugs:
#   SunOS4.1.1 dependent?
#   mv is not really atomic.
#   Presumes that the only effect of the enable files is to modify the
#   values of environment variables.

# Don't alter the search path set by .login. Instead, explicitly
# reference the desired utilities depending on the host platform:
set platform = `/import/import-support-1.0/bin/sys-os-type.1`

if ( ! $?MAKEENABLECACHE ) then
    if ( ! $?packages ) then
        # If $packages is not set, .login has not been set up to use
        # this mechanism properly.
        # We might want to give some advice about using caching enables
        # here.
        exit
    else
        # enable import-support-1.0
        set packages = (import-support-1.0 $packages)
        # if ( $?packages ) then
        # No MAKEENABLECACHE in environment so fork self to make the file.
        if ( ! $?cachename ) set cachename = .login-enables
        setenv ENABLECACHE $HOME/$cachename-$platform
        setenv MAKEENABLECACHE YES
        # Putting the following command in "()"s makes messages go to
        # /dev/null instead of cluttering up the console.
        # But first, sleep a bit so as not to get in the way of the login.
        (sleep 15; /bin/nice /bin/csh -f $HOME/.login-shared $packages &)
        # Henceforth, logins that inherit the current environment should
        # not do this again.

        # If there was no .login-enables for this platform, or if the
        # .login-enables is older than the .login file, we're forced to
        # do the enables synchronously
        unset didenables
        if ( -r $ENABLECACHE ) then
            set LSRESULT = `/bin/ls -c -t $HOME/.login $ENABLECACHE`
            if ("$LSRESULT[1]" == "$ENABLECACHE" ) then
                # sets didenables if the package list matches
                if ( ! $?silentenables ) echo -n "fast enable: $packages"
                source $ENABLECACHE
                if ( ! $?silentenables && ! $?didenables) echo -n \
                    " -- failed; maybe the list changed?"
                echo ""
            endif
            unset LSRESULT
        endif
        if ( ! $?didenables ) then
            if ( ! $?silentenables ) echo -n "enabling:"
            foreach p ($packages)
                if ( ! $?silentenables ) echo -n " $p"
                enable $p
            end
            if ( ! $?silentenables ) echo ""
        endif
        unset didenables
        unsetenv ENABLECACHE
        # endif
        setenv MAKEENABLECACHE NO
    endif # $?packages = T
else if ( $MAKEENABLECACHE == YES ) then
    # This is the forked self. Update the compiled enable file
    # $ENABLECACHE.

    # Make the compiled file self-validating wrt the package list
    echo "#" > $ENABLECACHE.$$
    echo 'if ( $?debugenables )' echo '" "' >> $ENABLECACHE.$$
    echo 'if ( $?debugenables )' echo -n 'enable cache created' \
        `date` by `hostname` >> $ENABLECACHE.$$
    echo 'if ( ' \"\$packages\" " == " \"$*\" ' ) then '\
        >> $ENABLECACHE.$$

    # Capture the current environment.
    # Each line is prefaced with a "b=" to mark it as being
    # "before" the enables.
    switch ( $platform )
    case mips-sgi-irix4:
    case mips-sgi-irix5:
    case alpha-dec-osf1:
    case rs6000-ibm-aix:
        set PRINTENV = /bin/printenv
        breaksw
    case sparc-sun-solaris1:
    case m68k-sun-solaris1:
    case sparc-sun-solaris2.3:
    case i486-sun-solaris2:
    default:
        set PRINTENV = /usr/ucb/printenv
        breaksw
    endsw
    $PRINTENV | /bin/sed -e "s/^/b=/" > /tmp/environ$$

    # our own definition of enable since user's .cshrc may not execute
    # .cshrc-shared in this forked process
    # n.b. for LISA VIII readers: retrieve the actual script to
    # assure correctness.
    alias enable 'source "`/import/import-support-1.0/'$platform'/bin/package-file-name enable \!*`"'

    # Do an enable for each argument.
    foreach p ($*)
        enable $p
    end

    # Capture the resulting environment, adding it to the file created
    # before the enables.
    # Each line is prefaced with an "a=" to mark it as being
    # "after" the enables.
    $PRINTENV | /bin/sed -e "s/^/a=/" >> /tmp/environ$$
    unset PRINTENV

    # Sort the file with all the environment values. The sort is
    # carefully designed to bring together lines that define the same
    # environment variable, with the "after" line before the "before"
    # line (if there was indeed a "before" value). Note the clever,
    # indirect use of the "a=" and "b=" that were added to each line --
    # we depend on the fact that "a" comes before "b", and that "=" is
    # the field delimiter.
    /bin/sort -t= +1 -2 -o /tmp/environ$$ /tmp/environ$$

    # Delete things that didn't change at all. Note how the leading
    # "a=" or "b=" are ignored in the comparison -- we depend on the
    # fact that the "a=" and "b=" were added to the beginning of the
    # line.
    set UNIQ = /bin/uniq
    if ( $platform == mips-sgi-irix4) then
        set UNIQ = /usr/bin/uniq
    endif
    if ( $platform == mips-sgi-irix5) then
        set UNIQ = /usr/bin/uniq
    endif
    if ( $platform == sparc-sun-solaris1 ) then
        # solaris-1 uniq is (silently) broken on lines longer than
        # 1000 characters
        set UNIQ = /import/textutils/sparc-sun-sunos4.1/bin/uniq
    endif
    $UNIQ -u +2 /tmp/environ$$ /tmp/environ.uniq$$
    unset UNIQ

    # Collect the lines that remain and that come from the "after"
    # environment. At the same time, convert the syntax of the lines to
    # "setenv" commands. If I were really an awk hacker, I could
    # probably have this command do the "uniq" stuff above, too. But
    # I'm not.
    set AWK = /bin/awk
    if ( $platform == mips-sgi-irix4) then
        set AWK = /usr/bin/awk
    endif
    if ( $platform == mips-sgi-irix5) then
        set AWK = /usr/bin/awk
    endif
    /bin/sed "s/'/'\\''/g" /tmp/environ.uniq$$ \
        | $AWK -F= 'BEGIN {sq = sprintf("%c", 39)} $1 == "a" { print " setenv " $2 " " sq substr($0,length($2)+4) sq }' \
        >> $ENABLECACHE.$$
    unset AWK

    echo " set didenables" >> $ENABLECACHE.$$
    echo "endif" >> $ENABLECACHE.$$

    # As atomically as possible, move the uniquely-named file to the
    # standard place.
    /bin/mv -f $ENABLECACHE.$$ $ENABLECACHE

    # Remove the temporary files.
    /bin/rm /tmp/environ*$$
endif # $MAKEENABLECACHE = YES
unset platform