13th Systems Administration Conference (LISA '99)
Our thanks to the summarizers:
David J. Young
After the traditional announcements from the program chair, David Parter, a moment of silence was held in memory of W. Richard Stevens. Andrew Hume, USENIX Association president, requested feedback on the direction of USENIX and SAGE. If you have any comments, please feel free to forward them to him, to any member of the USENIX Board, or to a member of the SAGE Executive Committee.
SAGE President Barb Dijker then presented the 1999 SAGE Outstanding Achievement Award to Wietse Venema for his "continual work to improve the security of systems," including such tools as TCP Wrapper, SATAN, and Postfix, as well as The Coroner's Toolkit.
David Parter then presented the best paper awards:
Best Paper: "Dealing with Public Ethernet Jacks: Switches, Gateways, and Authentication," by Robert Beck, University of Alberta.
Best Student Paper: "A Retrospective on Twelve Years of LISA Proceedings," by Eric Anderson and Dave Patterson, University of California at Berkeley.
The LISA 2000 program chairs were announced: Remy Evard and Phil Scarr.
Getting the Space Shuttle Ready to Fly
Joe Ruga, IBM Global Services
Joe Ruga spoke about his time at Rockwell International as a system administrator and operations support. He gave a high-level view of the growth of the company and how he thought sysadmins played a role and interacted with the business units of Rockwell.
Some interesting insights that Ruga shared with the audience were:
Ruga then talked briefly about how he defines goals using purpose, scope, and concept.
Session: Using Electronic Mail
Summarized by Jim Flanagan
ssmail: Opportunistic Encryption in sendmail
Damian Bentley, Australian National University; Greg Rose, QUALCOMM Australia; and Tara Whalen, Communications Research Centre Canada
The obstacles to a "proper" solution to email snooping are that encryption is not as widely deployed as it should be and that there is no control over the paths email takes.
Defending against active attacks requires an authentication infrastructure that does not currently exist, so ssmail restricts its threat model to passive attacks (i.e., snooping), and the authors adopt the stance that while snooping cannot be eliminated, removing as many opportunities for snooping as possible constitutes progress. Email snooping is not as uncommon as most people think; there were 26 known occurrences of sniffers installed on backbone segments in a single year in Great Britain.
ssmail is a modification to sendmail that will encrypt an SMTP session wherever possible, but will interact normally with non-ssmail MTAs. An ssmail server will advertise the encryption capability during the EHLO phase of an ESMTP negotiation. Both the message body and the envelope (MAIL From:, RCPT To:, etc.) are encrypted.
When encryption is negotiated, the two parties calculate a one-time session key using Diffie-Hellman key agreement. This key is then employed in a stream cipher (either an RC4-alike or SOBER-t32, a cipher developed by the authors, which has a shorter setup time) to encrypt the message traffic. Because the Diffie-Hellman algorithm is expensive and would swamp a busy mail exchanger, ssmail caches the session keys and will reuse these in a faster key-generation algorithm.
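The key-agreement and key-caching idea can be illustrated with a toy sketch. This is not ssmail's code: the small prime, the SHA-256 derivation, the nonce-based rekeying, and all names below are assumptions made for illustration, and the summary does not describe the actual fast key-generation algorithm.

```python
import hashlib
import secrets

# Toy parameters for illustration only; a real deployment would use a
# large, standardized group. 2**127 - 1 is a known Mersenne prime.
P = 2**127 - 1
G = 3


def dh_keypair():
    """Return a (private, public) pair for Diffie-Hellman."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def dh_shared(private, peer_public):
    """Expensive step: modular exponentiation yields the shared secret."""
    return pow(peer_public, private, P)


def session_key(shared_secret, nonce):
    """Cheap step (assumed scheme): hash the secret with a fresh nonce."""
    material = shared_secret.to_bytes(16, "big") + nonce
    return hashlib.sha256(material).digest()


class KeyCache:
    """Remember the shared secret per peer so DH runs only once."""

    def __init__(self):
        self._secrets = {}
        self.dh_runs = 0

    def key_for(self, peer, private, peer_public, nonce):
        if peer not in self._secrets:
            self._secrets[peer] = dh_shared(private, peer_public)
            self.dh_runs += 1
        return session_key(self._secrets[peer], nonce)
```

In this sketch a second session with the same peer skips the modular exponentiation and only rehashes the cached secret, which is the kind of saving the caching is meant to provide on a busy mail exchanger.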
Other approaches to solving this problem include S/WAN, FreeSWAN, and IPSec. Similar work includes SMTP over Transport Layer Security (TLS), which is not as efficient, because it doesn't cache keys, and integrating PGP into MTAs, which would probably swamp keyservers. ssmail is currently in beta test, and users outside Australia will require an export license.
In response to questions from the audience, Greg Rose told us that they did not consider compression, since they wanted to minimize the impact of modifying sendmail, and that ssmail was modular enough to import into other MTAs; in fact someone had already ported it to qmail. Asked about a specific type of denial-of-service attack, Rose reminded the questioner that ssmail takes as a threat model only passive attacks, and DoS attacks are active.
MJDLM: Majordomo-based Distribution List Management
Vincent D. Skahan, Jr., and Robert Katz, The Boeing Company
There are about 125 ongoing lists, regenerated every week, with the possibility of building temporary targeted lists for special purposes. Because of the possible impact of sending messages to, in some cases, 140,000 recipients, the messages need to be approved by Boeing Public Relations; not just anyone can send messages.
List creation is kicked off from a Web interface and sent to DBA staff, who construct an SQL query based on the request. A general sanity check (number of recipients, etc.) is done on the results, and then a list is built or rebuilt. Large changes in the size of a list result in staff notifications. Alternate databases can be used to generate lists, or for additional selection criteria.
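A sanity check of this kind might look like the following sketch. The thresholds, the function name, and the wording of the warnings are hypothetical; the talk did not give the actual rules.

```python
def check_list_rebuild(old_size, new_size, max_recipients=140_000, max_change=0.20):
    """Return a list of warnings for a rebuilt distribution list.

    max_recipients and max_change are illustrative values, not Boeing's.
    """
    warnings = []
    if new_size == 0:
        warnings.append("query returned no recipients")
    if new_size > max_recipients:
        warnings.append("recipient count exceeds expected maximum")
    if old_size > 0:
        change = abs(new_size - old_size) / old_size
        if change > max_change:
            warnings.append(f"list size changed by {change:.0%}")
    return warnings
```

An empty result means the rebuilt list can be installed quietly; any warning would trigger the staff notification described above.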
The flow time for mailings runs from 12 seconds to six hours, depending on the audience. The authors look at the last bounced message to place a lower bound on the total flow time of a mailing, from sending to last recipient delivery. Bounced messages are sent to a procmail-filtered mailbox and categorized by the cause of the bounce. The addresses in the lists are invariant, layer-of-abstraction-type addresses that are translated to real delivery addresses by sendmail. Sendmail 8.8.8 is used for its ability to employ additional alias databases. The translation process can take up to an hour for large mailings, and one planned improvement is to populate the lists with real addresses when they are built.
MJDLM is hosted on a single production server, with a redundant standby and two to three geographically distributed mail servers.
Users don't have the option to unsubscribe, but the team experienced little resistance from the user community, and the system provides a communications channel from the CEO, wherever in the world he happens to be, directly down to the line workers.
One question from the audience was, "How did you get HR to give you access to the personnel database?" to which the answer was, "The CEO told us to do this."
RedAlert: A Scalable System for Application Monitoring
Eric Sorenson, Explosive Networking, and Strata Rose Chalup, VirtualNet
RedAlert's architecture provides a central (or multiply distributed) "alerting" daemon for the aggregation of status traffic and dispatch of alerts, with "testing" clients written around a provided client API. The system is written in object-oriented Perl, and clients are subclassed from the RedAlert::Client module. Client communication with the alerting daemon is done by serializing Perl code with the commonly available Data::Dumper module and sending it via a TCP socket. Separating the alerting and testing functions like this makes it easier for sysadmins to incorporate existing system-monitoring scripts and tools into the RedAlert framework.
The alerting system supports alpha paging, SNMP traps, and email for notification. The daemon is highly configurable and allows for detailed definition of notification thresholds and methods, and different messages based on the category of alert received from clients. The presentation mentioned, but did not elaborate upon, the ability to treat certain alerts as being diagnostic for larger problems (e.g., are all the printers down or is there something wrong with the network?) and only send notifications for the larger problems.
After the talk, Elizabeth Zwicky stepped up to the mike to verify that she heard the speaker say that the clients sent evaluatable code to the server without any sort of authentication or control over reconstitution, and Eric Sorenson acknowledged that this was an area where there was room for improvement.
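One common way to address this kind of concern, which the presentation did not describe, is to send data-only messages protected by a keyed digest. The sketch below is in Python rather than RedAlert's Perl; the message layout, the shared key, and the function names are all assumptions.

```python
import hashlib
import hmac
import json


def pack_status(status, key):
    """Serialize a status report as JSON and prepend an HMAC-SHA256 tag."""
    body = json.dumps(status, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).hexdigest().encode("ascii")
    return tag + b"\n" + body


def unpack_status(message, key):
    """Verify the tag, then parse the body as data (never as code)."""
    tag, _, body = message.partition(b"\n")
    expected = hmac.new(key, body, hashlib.sha256).hexdigest().encode("ascii")
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad message authentication tag")
    return json.loads(body.decode("utf-8"))
```

Because the receiver parses JSON instead of evaluating what arrives, a client cannot make the daemon run arbitrary code, and the tag lets the daemon reject reports from senders that do not hold the key.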
Session: The Way We Work
Summarized by Josh Simon
Deconstructing User Requests and the Nine-Step Model
Thomas A. Limoncelli, Lucent Technologies/Bell Labs
The Greeting ("Hello!")
1. The Greeting
Problem Identification ("What's wrong?")
2. Problem Classification
3. Problem Statement
4. Problem Verification
Planning and Execution ("Fix it")
5. Solution Proposals
6. Solution Selection
7. Execution
Verification ("Did it work?")
8. Craft verification
9. User verification/Closure
Skipping steps can lead to solving the wrong problem (steps 2-5), choosing a solution that doesn't solve the problem (step 6), making a mistake executing the solution (step 7), not checking our own work (step 8), or having the user call back with the same problem (step 9).
Adverse Termination Procedures, or, "How to Fire a System Administrator"
Matthew F. Ringel and Thomas A. Limoncelli, Lucent Technologies/Bell Labs
In summary, if you're in the unenviable position of firing a system administrator, you need to ensure that all three tiers of access are closed properly, because leaving one or more undone can result in a disgruntled person with superuser privileges having access to your systems, networks, and data. A last word: whether you're the one doing the firing or the one being let go, be professional. You may have to work with these people or companies again, and while expletives may be satisfying they're also counterproductive.
Organizing the Chaos: Managing Request Tickets in a Large Environment
Steve Willoughby, Intel Corporation
Having service-level agreements (SLA) with senior management on both the customer and support sides is required. Intel also rotates its senior people onto the help desk, automates processes, and allows the user to control the closure of a ticket. They've found that this system scales better, results in a lower administration/user ratio, and results in users having more control over their problem reports and feeling happier about the process.
Future plans include more work on root-cause analysis to help resolve problems before they become disasters.
Summarized by John Talbot
GTrace: A Graphical Traceroute Tool
Ram Periakaruppan and Evi Nemeth, University of Colorado at Boulder and Cooperative Association for Internet Data Analysis
As with any interesting problem, there are always interesting solutions. The location-detection problem would be easily solved if the LOC resource records in DNS were available for every name and IP reachable on the Internet, but these records are not generally used by most organizations for considerations of security and overhead. There is no IP-to-location master database anywhere on the Net; such a database would be massive to implement and daunting to maintain on a full-time basis. If the maintainer records for domains and IP ranges were used, there would often be discrepancies between the billing addresses of the maintainers and the actual location of the networks under such authority. While none of these problems has a direct solution, some information gleaned from these sources can be used to rule out erroneous information in the data collection process of GTrace. Some solutions have been intuitively applied to the data-collecting features of GTrace that assist in the location-determination process.
The developers of GTrace have used some novel techniques to zero in on location data. Using an even-step search and comparisons of known round-trip times (RTTs) from previously measured or known sources, erroneous location information can be excluded and more suitable location information can be deduced. This method has been dubbed the "clarifier" part of GTrace; it flags suspect RTTs for further inspection and keeps inaccurate information from producing answers that imply physically impossible situations and data-transfer rates. For the known quantities, GTrace comes with an initialization database that contains machine, host, city, organization, and even airport information (no, you can't use GTrace to book a better airfare, sorry). As an extension, the NetGeo online lookup server has been created to track an impressive 76% to 96% of RIPE, APNIC, and ARIN WHOIS records. GTrace also has extensions to let the user/programmer add customized databases, file stores, and text files for additional geographical and data lookups.
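The idea of rejecting physically impossible locations can be sketched as a speed-of-light check. This is an illustration of the principle, not GTrace's algorithm; the propagation speed is an approximation and the function names are made up for the example.

```python
import math

# Light in optical fiber travels at roughly two-thirds of c, i.e. about
# 200 km per millisecond; treat this as an approximation.
FIBER_KM_PER_MS = 200.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def location_is_plausible(src, candidate, rtt_ms):
    """Reject a candidate location the packet could not have reached in time.

    src and candidate are (latitude, longitude) tuples in degrees.
    """
    distance = great_circle_km(*src, *candidate)
    min_rtt_ms = 2 * distance / FIBER_KM_PER_MS
    return rtt_ms >= min_rtt_ms
```

For instance, a hop that answers in 20 ms from Boulder cannot be in Sydney, whatever its WHOIS record says, so that candidate location would be discarded.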
For the user-interface and program extensibility, GTrace provides a sleek interface for mapping location information and onscreen segment and network-hop data-display tables. Additional features, such as the flexibility to use third-party traceroutes, the ability to add new maps, and a zoom feature, make GTrace a very adaptable and versatile tool. For more information, see <http://www.caida.org/Tools/GTrace>.
rat: A Secure Archiving Program with Fast Retrieval
Willem A. (Vlakkies) Schreuder and Maria Murillo, University of Colorado at Boulder
For security, rat uses MD5 and PGP for encryption and checksums. For performance, rat lets the user choose among several compression and extraction options. Also, individual configuration needs and file-compression options can be specified by using a personalized .ratrc file, making rat extremely versatile. The librat library enables the rat archiving and extraction procedures to be accessed at program level. The Qt library is used to implement a GUI for accessing rat archives at the user level.
The rat paper was widely accepted by the attendees at the conference. It is important to note that it is always a good thing when a presentation is followed by a feast of deep technical questions from some of the greatest talents in the group. One attendee suggested that optimization ideas could be handled on I/O levels below the file system itself, and Schreuder, displaying his deep understanding of this new technology, walked through a detailed explanation of seek, open, and close operations that could be used to perform such operations. Other suggestions (using signed integers for the modification/date stamps, and storing ACL information in the archive metadata) were well received by Schreuder.
More information can be found at <http://www.netperls.com/rat/index.html>.
Cro-Magnon: A Patch Hunter-Gatherer
Jeremy Bargen, University of Colorado at Boulder and Raytheon Systems Company; Seth Taplin, University of Colorado at Boulder and CiTR, Inc.
At the heart of the Cro-Magnon suite is an engine that is surrounded by download, authentication, notification, and GUI mechanisms and controls. It is written in Perl and thus can provide virtually infinite module flexibility. However, module implementation is not standard on all UNIX platforms and becomes even tougher ("if not impossible") on NT. Complex module variations and large config files are needed to keep track of large, heavily varying system layouts, since not every system in a heterogeneously operated environment would need to be at the same revision level at the same time.
Ongoing development is planned for Cro-Magnon and its documentation. Greater stress testing is planned for the Cro-Magnon engine. The configuration-file layout may get broken into sections to alleviate the need for a flat master file; why not modularize the config, since the process that runs it is modular? Also, there is an open door to implement existing tools, such as wget, to aid the engine functionality.
While Cro-Magnon doesn't automatically apply the patches, it can save system administrators a large percentage of the time involved in system updates, since retrieving and comparing current against future patches (for those of you who don't patch-and-pray) is 90% of the work. It would be nice to see some standardization in the UNIX patch world. I can imagine vendors sending out their systems updates and software with a Cro-Magnon module, so you install it once and the process takes care of itself for future updates. For a tool that was designed by software developers to simplify their system maintenance and headaches, it has the potential to end a lot of tedium for others, not just for its creators.
Session: Thinking on the Job
Summarized by Jim Flanagan
A Retrospective on 12 Years of LISA Proceedings
Eric Anderson and Dave Patterson, UC Berkeley
Eric Anderson provided a quick overview of the categorization of the 342 papers presented at past LISA conferences, calling out the trends, patterns, and insights gained from the study.
The major pattern: Papers were written either from the point of view of system administrators or from that of academics. The former work tends to be practical and realistic, though repetitive, and the latter work tends to be extensive and detailed, but irrelevant. Why? Since system administrators tend to be busy, they end up all solving the same problems, whereas the academic isn't close enough to the day-to-day work to understand the real problems faced by system administrators. Eric Anderson urged the two camps to work together to produce thorough, relevant research into tools for system administration.
Two other categorizations of the data were presented: the source of the problem the work was trying to solve (the source model), and the tasks focused on in the work (the task model). The main insight gained by examining the papers using the source model is that while system administrators divide their time about equally among configuration management, maintenance, and training tasks, the content of the papers written did not reflect this division. Papers related to configuration-management problems were most prevalent.
Based on this detailed examination of the task model, they recommend moving toward a single methodology for OS and application software installation and package management. Anderson also mentioned that end-user configuration customization hasn't received a lot of attention in recent years.
A trend seen in the task area of configuration management is that corporate mergers, acquisitions, and divestitures, as well as growth in the IT industry, are driving the need for more site moves and related work. This is causing paper authors to look toward a theory of site design that facilitates site moves. Also, a more mobile user community is inspiring growth in the number of network-configuration-management papers.
It was found that more energy was being spent in papers on the performance of backups than on the more critical performance of the restores. The areas of technology trends, security, and archival storage are neglected among the papers.
For email, the noticeable trend is that there were many papers in the earlier years, then a pause in mail research until 1996, when the Internet began to swell and spam, scalability of delivery, and security became bigger problems.
Anderson concluded by repeating that the work by system administrators is repetitive, and that a database of related work would help to alleviate this problem. System administrators might also find benefit in providing guidance to academics looking for research topics rather than striking off on their own to develop solutions to problems.
The raw data and categorizations are available from the authors, who encourage further analysis of the material.
One questioner from the audience wanted to know how the authors determined that systems administrators spend roughly one-third of their time on each of the three problem sources. Anderson replied that they surveyed members of the community.
Managing Security in Dynamic Networks
Alexander V. Konstantinou and Yechiam Yemini, Columbia University; Sandeep Bhatt and S. Rajagopalan, Telcordia Technologies
Configuration management is difficult because it is human-intensive and involves distributed, heterogeneous data. Errors are often introduced because there is no way to verify that configurations actually reflect policy, and mistakes have to be undone by hand. For this reason, a network tends to be reconfigured only if there is a compelling reason to do so.
Conversely, policy decisions have complex implications for the configuration of elements and services. A simple change in policy might require changes to switches, VPN configurations, fileserver ACLs, routers, and more. System administrators should be making changes in a more abstract layer, not at the network-element level.
The authors' proposed solution involves placing a Unified Configuration Semantic Layer between the policy definition and element configuration that employs consistency checking, change propagation, and rollback and recovery functions. Their work builds on the NESTOR network-element-management system developed at Columbia. NESTOR maintains consistent configurations by imposing constraints (such as "All hostnames must be unique") in the form of Object Constraint Language, which is part of the Unified Modeling Language (UML). If the constraints are not fulfilled, then either an error is flagged or a policy script can be executed.
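The combination of constraint checking and rollback can be sketched in a few lines. The sketch is in Python, not Object Constraint Language, and the ConfigStore class and its data layout are hypothetical; it only illustrates the behavior described for the hostname-uniqueness example.

```python
import copy


def hostnames_unique(config):
    """Constraint: no two elements may share a hostname."""
    names = [element["hostname"] for element in config.values()]
    return len(names) == len(set(names))


class ConfigStore:
    """Apply changes transactionally and roll back on constraint failure."""

    def __init__(self, config, constraints):
        self.config = config
        self.constraints = constraints

    def apply(self, change):
        """Run change(config); return the names of violated constraints."""
        snapshot = copy.deepcopy(self.config)
        change(self.config)
        failed = [c.__name__ for c in self.constraints if not c(self.config)]
        if failed:
            self.config = snapshot  # roll back to the last consistent state
        return failed
```

A caller that gets a non-empty list back knows the change was rejected and the previous configuration is still in place; where NESTOR might run a policy script instead, this sketch simply reports the failure.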
The authors use NESTOR constraints to model security policy and also to provide a first attempt at an abstract "universal platform" that can be mapped onto various network-element configuration models.
Deployment of NESTOR for security management involves the creation of a policy, the abstract modeling of the network elements and services, instrumenting the actual network element interfaces, translating the policies into constraints and policy scripts, then deploying and populating a NESTOR server with the above. It should be in the interests of network-element vendors to provide fully instrumented interfaces to their products, if a standard universal platform specification existed.
The authors project that the role of the system administrator will shift to the manipulation of abstractions rather than the direct configuration of elements and services, because the latter does not scale to large, complex networks.
A question raised from the audience was, what happens when the NESTOR server fails for some reason? In that case, if configuration changes were needed they could still be done by hand, but these would not be protected by NESTOR's constraint checking and rollback functions.
It's Elementary, Dear Watson: Applying Logic Programming to Convergent System Management Tasks
Alva L. Couch and Michael Gilfix, Tufts University
Cfengine is almost Prolog, in that it provides a list of assertions which must be true if the system is healthy. Prolog, however, is a language that looks more like a description of policy and can be made to do many of the same things Cfengine and PIKT can do, all in one language. The authors have built a prototype configuration system from SWI-Prolog, which allows them to call code from shared libraries. To this they added various language primitives (du, passwd, etc.).
Couch walked the audience through a sample program that utilized the implicit iteration capabilities of Prolog to examine the home directory of each user in the password file to see if their usage was larger than some value, then send them a "you're a pig" notification. The program pointed out various subtleties in Prolog programming that might be dangerous. For example, if you mistakenly used a literal instead of a variable name in one of the slots of the passwd iterator, Prolog would try to make it true, by changing, say, the home directory of every user to the same value.
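For comparison, the same check written imperatively might look like the sketch below. It is Python, not the authors' Prolog, and the function names and the list of (user, home) pairs are assumptions; sending the notification is left out.

```python
import os


def directory_usage(path):
    """Sum the sizes of regular files under path, in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def over_quota(accounts, limit_bytes):
    """Yield (user, usage) for each account whose home exceeds the limit.

    accounts is an iterable of (user, home_directory) pairs; it is only
    read here, never modified.
    """
    for user, home in accounts:
        usage = directory_usage(home)
        if usage > limit_bytes:
            yield user, usage
```

The explicit loop makes the iteration visible, whereas the Prolog version gets it implicitly from backtracking over the password entries; in exchange, nothing here can be "made true" by rewriting an account.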
Another problem with using Prolog as a system-management language is that making programs efficient is a subtle art, and not one to be undertaken by sleep-deprived sysadmins at 2:00 am in an outage situation. For this reason the authors propose creating a simpler preprocessor language that is translated into Prolog.
A member of the audience asked why, if Prolog allows the creation of new primitives, you would use a preprocessor. The answer was that they wanted to enforce strong typing, something Prolog does not support. More discussion ensued about the implications of accidentally using literals instead of variables and the severe damage that could be caused by the quirks of Prolog. The consensus was that safety features would have to be included before such a Prolog-based system would be a reasonable system-management tool.
Session: Network Infrastructure
Summarized by Bryon Beilman
NetReg: An Automated DHCP Registration System
Peter Valian and Todd K. Watson, Southwestern University
The solution that Peter Valian presented met their requirements and involved a unique way of forcing the users to register their IP addresses using the DNS server fields of the DHCP information. Before they are registered, the DHCP records force them to a fake DNS root server that resolves all addresses to the registration page. Once they register and enter their university account name and password, the software modifies the DHCP configuration file and allows the user to use the network. They are working out some security issues, but the system is low-maintenance and helps to ensure that only authorized and registered users can use their student network. More information can be found at <http://www.southwestern.edu/ITS/netreg/>.
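The registration trick reduces to two small decisions, sketched below. The addresses come from documentation ranges and the function names are invented for the example; NetReg itself works by rewriting the DHCP configuration file, which this sketch does not attempt.

```python
# Hypothetical addresses from the 192.0.2.0/24 documentation range.
REGISTRATION_DNS = ["192.0.2.53"]          # fake root resolver
REAL_DNS = ["192.0.2.10", "192.0.2.11"]    # normal campus resolvers


def dns_servers_for(mac, registered):
    """Pick the DNS servers to hand out in a DHCP lease.

    registered is a set of lowercase MAC addresses already registered.
    """
    return REAL_DNS if mac.lower() in registered else REGISTRATION_DNS


def fake_root_resolve(_hostname, registration_ip="192.0.2.80"):
    """The fake root answers every query with the registration page."""
    return registration_ip
```

An unregistered client therefore reaches the registration page no matter which site it asks for, and starts receiving the real resolvers once its hardware address has been registered.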
Dealing with Public Ethernet Jacks: Switches, Gateways, and Authentication
Robert Beck, University of Alberta
They wanted to make sure that users cannot "snoop" each other's packets, to prevent (or limit) spoofing, and to disallow broadcasting of unknown traffic. Their solution involved using a gateway based on OpenBSD that uses packet filters to block all outbound traffic from a user until that user authenticates. The user can telnet to the gateway and authenticate, after which the user's traffic is allowed through the gateway. They also monitor ARP tables, using swatch on syslog, to detect IP spoofing and take action.
They also use an ident server that rewrites all outbound mail addresses with the users' real names and addresses (that they used to authenticate), so they cannot fake their email addresses. The system works well for them and it is easy for the students to use. More information can be found at <http://www.ualberta.ca/~beck/lisa99.ps>; the code for this solution can be obtained at <ftp://sunsite.ualberta.ca/pub/Local/People/beck/authipf>.
NetMapper: Hostname Resolution Based on Client Network Location
Josh Goldenhar, Cisco Systems, Inc.
Goldenhar gave the example of using NetMapper as a customized Netscape wrapper that allows Netscape to start with different URLs depending on the client location. This can be used to direct the user to the local cafeteria, help-desk number, or some other category that is based on the network grouping.
The second example demonstrated how it could be used to route trouble tickets that came in from a Web form to the local help desk for a traveling user. Salespersons or other people on the road can get their problems routed to the geographically nearest help desk to allow rapid resolution.
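Both examples rest on mapping a client address to a network group. A minimal sketch of that lookup follows; the networks, group names, and URLs are hypothetical, and NetMapper's own data format is not shown in the summary.

```python
import ipaddress

# Hypothetical network groups and per-group start pages.
GROUPS = {
    ipaddress.ip_network("10.1.0.0/16"): "site-a",
    ipaddress.ip_network("10.2.0.0/16"): "site-b",
}
START_PAGES = {
    "site-a": "http://intranet.example.com/site-a/",
    "site-b": "http://intranet.example.com/site-b/",
}
DEFAULT_PAGE = "http://intranet.example.com/"


def group_for(client_ip):
    """Return the most specific group containing client_ip, or None."""
    address = ipaddress.ip_address(client_ip)
    matches = [net for net in GROUPS if address in net]
    if not matches:
        return None
    return GROUPS[max(matches, key=lambda net: net.prefixlen)]


def start_page(client_ip):
    """Choose the browser start page for a client."""
    return START_PAGES.get(group_for(client_ip), DEFAULT_PAGE)
```

The same group_for result could just as well select a help-desk queue for an incoming trouble ticket, which is the second use described above.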
The tool is flexible and can do more than was mentioned. More information can be obtained at <ftp://ftp.eng.cisco.com/josh/NetMapper.tgz>.
Session: File Systems
Summarized by Mike Newton
Enhancements to the Autofs Automounter
Ricardo Labiaga, Sun Microsystems, Inc.
Moving Large Filesystems On-Line, Including Existing HSM Filesystems
Vincent Cordrey, Doug Freyburger, Jordan Schwartz, and Liza Weissler, Collective Technologies
Summarized by Bryon Beilman
Service Trak Meets NLOG/NMAP
Jon Finke, Rensselaer Polytechnic Institute
This combination of tools allows the user to identify site-configuration errors, verify that some new work has not inadvertently turned on a service, and validate the security settings of a host. Some of the lessons learned are that host grouping is very useful, knowing the OS is very handy, and there may be some policy issues with running this kind of tool on your network. More information can be found at <http://www.rpi.edu/~finkej>.
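The core comparison, what a host is supposed to offer versus what a scan actually finds, can be sketched as a set difference. The function name and the port numbers in the usage note are illustrative; the sketch does not reproduce Service Trak's database or NLOG/NMAP's output format.

```python
def audit_host(expected_ports, observed_ports):
    """Compare the services a host should offer with what a scan found."""
    expected, observed = set(expected_ports), set(observed_ports)
    return {
        "unexpected": sorted(observed - expected),
        "missing": sorted(expected - observed),
    }
```

For example, audit_host([22, 25], [22, 25, 80]) reports port 80 as unexpected, which is the "new work inadvertently turned on a service" case.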
Burt: The Backup and Recovery Tool
Eric Melski, Scriptics Corporation
BURT has worked very well for the university; it has consistently high performance and is flexible. They are able to back up data from 350 workstations and from their AFS servers, which contain approximately 900GB, every two weeks. More information can be obtained at <http://www.cs.wisc.edu/~jmelski/burt>.
Design and Implementation of a Failsafe Print System
Giray Pultar, Coubros Consulting LLC
A nice feature of the architecture is that it can print to low-cost printers attached to the back of an X-terminal while still utilizing the centralized spooling model. The system can also route print jobs from VM and VMS to all printers on the system. Pultar can be reached at <email@example.com>.
Summarized by Jim Flanagan
Automated Installation of Linux Systems Using YaST
Dirk Hohndel and Fabian Herschel, SuSE Rhein/Main AG
While proprietary UNIX vendors have tight control over the booting process, an obstacle for Linux is that most PC systems have poorly implemented, nonstandard BIOSes. One can usually count on being able to boot from the floppy, and this becomes the lowest common denominator, though NIC-based net booting solutions are becoming popular. Floppies and unattended systems are, however, an impedance mismatch. SuSE systems come up running after an install, without having to reboot the system, since some systems can hang because of BIOS-related problems. This improves the unattended install process.
The SuSE boot process provides a way to put the system definition on the boot floppy. This can be defined entirely or you can factor the common configurables into this system definition and get the network configurables from a DHCP server. Info files can also be defined for certain classes of hosts, and hosts can be in several classes.
To account for differences in disk layout, YaST uses heuristics to determine how to put filesystems on the available partitions. Package selection can also be predefined with a config file, which can be built by going through a package installation interactively once, and then massaging the resulting package config file for use with unattended installations. YaST is extensible with pre- and postinstall scripts, and most installations take about five minutes.
Future work will include a database-driven configuration engine, Web-based administration, support for net-boot, and a system-cloning capability. When asked if an automated YaST install could be instructed to leave certain partitions untouched, Hohndel replied that you can mark any number of partitions or a whole disk as type "NONE," and YaST will ignore them. Another audience member asked if YaST could be ported to other UNIX systems. A port to another Linux distribution would be trivial, but because of various features of other UNIXes, such as logical volume managers, much work would be needed to make YaST work on those platforms.
Enterprise Rollouts with Jumpstart
Jason Heiss, Collective Technologies
Custom Jumpstart employs two types of servers: boot servers, which need to be on the same subnet as their clients, and install servers, which are simply NFS servers with the OS packages exported. The boot servers need to be configured with information about the clients.
This is done with the command add_install_client. Typical add_install_client invocations can be several lines long, which makes the process tedious and error-prone, since most of the data for the clients are the same. Jason built a tool called Config, which acts as a bulk add_install_client and performs pathological error checking, such as comparing the hosts and ethers tables against reality. Config also knows something about the installation infrastructure and chooses the correct boot/install servers for a given client, and it can mark nonstandard hosts so that they don't get Jumpstarted.
To automate the actual installs, another tool was created, called Start, which, after a few last-minute sanity checks (such as whether users are logged in), forks into multiple processes that log into the clients to kick off the Jumpstarts. Given that each client takes about 15 seconds to initiate, a single-threaded application is not sufficient for hundreds of hosts in a one-hour window.
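The need for concurrency follows from simple arithmetic, and the fan-out itself is short to sketch. Note the substitution: Start forks processes, while this Python sketch uses a thread pool; start_one is a hypothetical caller-supplied function and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def start_all(hosts, start_one, workers=20):
    """Run start_one(host) for every host in a list, several at a time.

    start_one is a caller-supplied function (hypothetical here) that logs
    in to one client and kicks off its install; it returns True on success.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(start_one, hosts))
    return dict(zip(hosts, results))


def sequential_minutes(host_count, seconds_per_host=15):
    """Time needed to start every host one after another."""
    return host_count * seconds_per_host / 60
```

At 15 seconds each, 200 hosts started one at a time take 50 minutes and 400 hosts take 100 minutes, so a serial tool leaves little or no room inside a one-hour window.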
The status of the installation was available on the Web, so that a small team of admins could react quickly to any problems that might arise during the process. Heiss was about to describe how users were notified about impending reboot of their machine when the hall reverberated with a loud "Warning, Warning," and I thought that we were all going to have to leave the building. But the warning continued: "Your machine is about to be Jumpstarted. Please log off." This was one of the suite of warnings that could be piped to /dev/audio if a user was still logged on to a machine that was targeted to be Jumpstarted.
For their infrastructure, rather than waste a machine as a boot server on each subnet, the team used multi-homed hosts on several subnets each. The bandwidth requirements, based on estimates of 500MB/client, 200 clients/hour, give 200Mb/second at the server end. This will keep a server with three switched 100-BaseT interfaces fairly busy and will have a significant impact on your network; Heiss recommended that shared Ethernet be avoided in this situation. The net booting process results in about 60 SFS93 NFSops/sec/client, and so an Enterprise 3000 class machine can serve about 150 clients. Jason also recommended that the data be striped (RAID0 or RAID5) to increase the performance.
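The sizing estimates above can be reproduced with a short calculation. The input figures are the ones quoted in the talk; the function names and the decimal conversion (1 MB = 8 Mb) are assumptions made for this sketch.

```python
def required_mbit_per_s(mb_per_client, clients_per_hour):
    """Sustained server bandwidth in megabits per second."""
    megabytes_per_second = mb_per_client * clients_per_hour / 3600
    return megabytes_per_second * 8


def nfs_ops_per_s(ops_per_client, clients):
    """Aggregate NFS operations per second for concurrent net boots."""
    return ops_per_client * clients


# 500MB per client at 200 clients per hour
bandwidth = required_mbit_per_s(500, 200)

# 60 NFS ops/sec per client across 150 clients
ops = nfs_ops_per_s(60, 150)

# Fraction of three 100-BaseT interfaces consumed by the install traffic
utilization = bandwidth / (3 * 100)
```

The bandwidth works out to roughly 222Mb/second, consistent with the rounded 200Mb/second figure, and fills about three quarters of three 100-BaseT interfaces. The NFS load for 150 clients comes to 9,000 ops/sec.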
During the question session, one person asked how to deal with locally installed software. Jason replied that though they had to deal only with dataless clients, local apps could be installed using a Jumpstart finish script.
Automated Client-Side Integration of Distributed Application Servers
Conrad E. Kimball, Vincent D. Skahan, Jr., David J. Kasik, The Boeing Company; and Roger. L. Droz, Analysts International
The solution involved separating the public view of the application file space from a private view, so that applications could be upgraded or moved behind the scenes without the users modifying their behavior. Multiple versions of applications can be maintained, and the applications are built using the private namespace. Both the public and private views exist under a /boeing directory, with the private application-directory hierarchy mounted from several fileservers under /boeing/mnt.
The public directory hierarchy is then script-generated as a series of symbolic links in /boeing/bin, /boeing/lib, etc.
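A minimal sketch of such a link generator follows. It is not Boeing's script: the layout `<private_root>/<app>/<subdir>/<file>` and the function name `build_public_view` are assumptions chosen for illustration.

```python
import os


def build_public_view(private_root, public_root, subdirs=("bin", "lib")):
    """Link each file in <private_root>/<app>/<subdir>/ into <public_root>/<subdir>/.

    The directory layout is hypothetical; the summary does not describe
    the real one in this much detail.
    """
    created = []
    for app in sorted(os.listdir(private_root)):
        for sub in subdirs:
            src_dir = os.path.join(private_root, app, sub)
            if not os.path.isdir(src_dir):
                continue
            dst_dir = os.path.join(public_root, sub)
            os.makedirs(dst_dir, exist_ok=True)
            for name in sorted(os.listdir(src_dir)):
                link = os.path.join(dst_dir, name)
                if os.path.lexists(link):
                    # Replace a stale link so the public name keeps working
                    # after an application moves or is upgraded.
                    os.remove(link)
                os.symlink(os.path.join(src_dir, name), link)
                created.append(link)
    return created
```

Because users only ever see the public names, rerunning the generator after an application moves is enough to repoint them.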
In response to a question from the audience, Vince Skahan said that they had attempted an AFS implementation of this scheme, but met with limited success and are going to stick with NFS.
Deep Space BIND
Paul Vixie, Internet Software Consortium
Who better to present the Deep Space BIND talk than Paul Vixie? Welcome to a deep history of BIND with a scope targeted on the protocols, implementations, and special interests that have established DNS for well over a decade and left it virtually unchanged for nearly 15 years, and on the DNS MIBs, completed in 1992.
Recently BIND services have improved significantly. New resource records and classifications have been implemented, but Vixie noted that deployment of many resource records has been difficult over the years because of the overhead required to maintain such records and the questionable usefulness of the information that they represent for public Internet DNS queries.
BIND 8.2.2 was released a few weeks before the conference, and BIND 9 has been in development for about a year. BIND-4 was feature-frozen in 1995 at version 4.9.5 and has had only security and bug fixes released since then. The latest BIND-8 release, version 8.2.2, features greater security, performance, usability, and RFC conformance. BIND-8 also has features for selective zone forwarding and an asynchronous resolver that uses pthreads to process multiple transactions concurrently. Vixie advised all to move away from BIND-4 since it "just does wrong" with such attributes as panics on oversized messages, promiscuous data sharing, and the compression of names.
BIND-9 was a complete ground-up rewrite whose objectives were open source, a basis in IETF standards, scalability, and a "carrier grade" production-quality product. Surprisingly, Paul Vixie has had no hand in the coding for BIND-9, since his massive BIND expertise has been required for ongoing support of BIND-8, and he is "planning on retiring" from being the BIND-master, as he is colloquially known. Vixie modestly called this a "good thing," since in his opinion the BIND-8 code should have been written from the ground up as well, and many of the BIND-4-isms were brought into the BIND-8 release simply because of programmer familiarity. Those doing the rewrite have performed a complete restructuring of BIND and placed new emphasis on security, performance, usability, and RFC conformance.
Other efforts are also in place to expand the usability of DNS. Extended DNS has made it harder to add security to the current protocol. Some transaction signatures have been proposed to address authorization and signed keys. Secure DNS (DNSSEC) implements zone authenticity through public-key encryption, using a parent-child keytrust for zone information and transaction-signature (TSIG) relationships between known servers. Also noted was the fact that caching former verifications is generally bad for security. One problem of note is that the GSSAPI in WIN2K does not implement the normal ISC TSIG and is not compatible with the current ticket system or format.
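The idea behind a transaction signature between servers that already share a secret can be sketched with a keyed hash. This illustrates the general shared-key technique only; it is not the TSIG wire format, and the function names, the sample key, and the choice of SHA-256 are assumptions for the example.

```python
import hashlib
import hmac


def sign(message: bytes, shared_key: bytes) -> bytes:
    """Compute a keyed digest that only holders of shared_key can produce."""
    return hmac.new(shared_key, message, hashlib.sha256).digest()


def verify(message: bytes, signature: bytes, shared_key: bytes) -> bool:
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign(message, shared_key), signature)


# Hypothetical key and message, for illustration only
key = b"secret shared by two known servers"
update = b"example zone-transfer request"
signature = sign(update, key)
```

A receiver holding the same key accepts the message only if `verify` succeeds; a message altered in transit, or one signed with a different key, fails the check.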
On a final note, BIND has been released under a BSD-style licensing agreement to promote broad implementations of BIND, which Vixie hopes will benefit an expanding economy.
The Four-Star Approach to Network Management
Jeff R. Allen, WebTV Networks, Inc.; David Williamson, Global Networking and Computing, Inc.
This session attempted to provide an alternative to traditional all-inclusive single-vendor network management solutions. The speakers advocated a modular approach to network management.
Their philosophy for the management environment at WebTV was to avoid the vendor approach: "Deploy monolithic application/framework and solve all problems directly or with add-ons."
Such an approach results in a complex, incomplete and virtually unmanageable implementation and would also overshoot their budget.
They split their requirements into four parts and then identified tools addressing those requirements.
A modular approach allows incremental improvement in the network-management infrastructure. It also reduces the risk of having a large implementation of a vendor-specific product to address a small need.
However, it requires a lot of effort in implementing each of the components and then making them work together. Such a solution may be less reliable, since it contains many components working together that may not have been tested thoroughly before. It also requires considerable knowledge of each of those components.
Their conclusion: Although the four-star approach requires effort and care during implementation, it provides administrators and managers with tools that directly apply to their site and gives them control over their environment.
Microsoft's Internal Deployment of Windows 2000
Curt Cummings, Microsoft, Information Technology Group
The goals of the Microsoft deployment were to:
The planning process began even before the first beta release of Windows 2000. At this stage they decided on a geographic organizational structure and a five-phase rollout.
Phase I was done using Beta 2 of Windows 2000. It was rolled out to 6,000 workstations in the engineering groups in Washington.
Phase II was also restricted to Washington, but included 15,000 workstations. At the same time, 10 resource domains were collapsed to five organizational units (OUs).
Phase III included 25,000 workstations.
Phase IV included 48,000 workstations and collapsing 150 resource domains to 50 OUs.
Phase V, which was not complete as of LISA, was full deployment worldwide. The expected completion date was mid-December.
Cummings discussed the challenges that this migration faced. These included resistance from local administrators worried about losing administrative control of their systems in the consolidated admin structure, lack of tools for synchronization of data across AD "forests," and the need to continue to support NT 4 for ongoing interoperability testing.
Real World Intrusion Detection
Mark K. Mellis, Consultant, SystemExperts Corp.
I've never been too excited about the topic of security, since it brings to mind an image of the corporate security guard rummaging through my backpack looking for "bad things" as I enter or leave my place of employment. Intrusion detection, on the other hand, invokes a stimulating "cat-and-mouse" response, much like the adventure described in The Cuckoo's Egg.
Mark Mellis gave an excellent presentation on what intrusion detection means, how it impacts your organization, what kinds of intrusion detection to implement, and how to deploy intrusion detection.
An often overlooked but extremely important first step in implementing intrusion detection is to establish policy. What are you trying to protect? Who assumes the risk? How do you protect the company when under an attack? Who has the authority to take down the site in an emergency? Questions like these need to be addressed before an effective intrusion-detection strategy can be deployed. Intrusion detection may involve decisions and actions regarding sensitive issues. Privacy concerns or other company policies may impact how you approach your implementation.
Effective intrusion detection also requires comprehensive training. Subscribing to mailing lists and attending conferences and tutorials help people to stay current with the latest methods. Just as important is for your staff to be familiar with all of your tools used for intrusion detection. Simulate a real-life attack to test your staff's ability to detect, classify, and respond to an external threat. Include the real decision-makers so that they too are prepared to make the important decisions.
There are four main types of intrusion detection: network, host, application, and analysis.
Network intrusion detection offers realtime analysis. Some "smart sniffers" provide this ability, but typically network activity is logged for later analysis. Newer routers offer dynamic reconfiguration based on realtime events. This means they dynamically create and destroy paths through the firewall by looking for signatures in network traffic. Network intrusion detection is generally nontrivial to set up and maintain.
Host intrusion detection is an area most familiar to sysadmins. It involves instrumenting the host with tools to monitor host activity. Some of the more popular tools include tripwire (file integrity using checksums), klaxon (port masquerader), tcp-wrappers (track connections), and syslog (log system events).
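The checksum approach to file integrity can be sketched as follows. This is a toy illustration of the general technique, not how tripwire itself is implemented; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib


def checksum(path):
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths):
    """Record a baseline of checksums for the given files."""
    return {path: checksum(path) for path in paths}


def changed_files(baseline, paths):
    """List files whose current checksum differs from the baseline."""
    current = snapshot(paths)
    return sorted(p for p in paths if baseline.get(p) != current[p])
```

The baseline would be taken on a known-good system and stored where an intruder cannot rewrite it; this sketch does not handle files that have been deleted since the baseline.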
Application intrusion detection is analyzing unusual application behavior. An excellent example is a typical e-commerce configuration. A Web application running in a demilitarized zone speaks SQL to a database on a secure net. It is assumed that the Web application makes bug-free SQL database queries. If there are SQL errors in the database logs, it may indicate that someone has compromised the Web server and is performing ad hoc queries against the database.
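A check of this kind might look like the sketch below. The log lines and error phrases are invented for illustration; real database logs differ by product, so the pattern would need to be adapted.

```python
import re

# Hypothetical error phrases; adjust to the messages your database emits.
SQL_ERROR = re.compile(
    r"syntax error|unknown column|permission denied", re.IGNORECASE
)


def suspicious_lines(log_lines, pattern=SQL_ERROR):
    """Return log lines that look like failed SQL statements."""
    return [line for line in log_lines if pattern.search(line)]


sample_log = [
    "10:01:07 query ok: SELECT price FROM items WHERE id = 42",
    "10:01:09 ERROR: syntax error at or near \"UNION\"",
    "10:01:12 ERROR: permission denied for table accounts",
]
alerts = suspicious_lines(sample_log)
```

Since the Web application is assumed to issue only well-formed queries, any match is worth a look rather than a threshold being needed.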
Analysis is another important component to a good intrusion-detection strategy. A restricted-access machine is used as a centralized logging server to store syslog and other data for daily analysis. Simple hourly/daily/weekly reports are then generated, such as:
The information contained in these reports may indicate an unusual event or trend that requires a proactive response.
Intrusion detection is not a project but, rather, a process. It is the detection of, classification of, and response to a network or system event. Implementing different types and levels of intrusion detection, and correlating and analyzing the results, will help you to detect and respond to real-world intrusions.
The System Administrator's Body of Knowledge
Geoff Halprin, The SysAdmin Group
Neither the threat of government censorship nor that of the conference center burning down could have kept Geoff Halprin from delivering his message of developing a maturity model for system administration. Halprin, who endured a series of general fire-control-system false alarms during the presentation (what a trouper!), opened his talk by describing a number of electronic-communications regulations recently invoked in Australia which, based on uninformed decisions and a lack of understanding of what it would take to administer such regulations, effectively result in censorship and reduced protection of copyrights. Halprin's message was that if we do not take more of an interest in what is happening around us, it will happen to us.
Developing a better definition of the body of system-administration practices could help prevent many of the problems we face; it can enhance the ability of people, businesses, and governments to make informed decisions about practices that depend on system-administration support. Halprin noted that with the increasing use of e-commerce, the need to take a disciplined approach to system administration has moved into the spotlight. It is no longer just the concern of big IT departments.
The Systems Administrator's Body of Knowledge (SA-BOK) is being designed to help address these and other system-administration issues through defining the profession and its core elements. One of the key steps toward this goal is to define a taxonomy schema that provides a foundation for expectations, deliverables, and functionality of system administrators.
Halprin listed the roles of system administrators as troubleshooter, "the walking encyclopaedia," toolsmith, researcher, student, technical writer, strategist, tactician, and even a "doctor and counselor" to some. System administrators face such problems in the workplace as lack of understanding from management, lack of accurate reporting metrics, lack of standards, lack of time for proactive work, lack of boundaries (where the job's role starts and stops), and the demands of ever-increasing business needs. Core to all of these is a lack of clear understanding of what our role really entails, with a consequent inability to communicate the needs of that role (time, money, resources) to other communities such as management and government. To help manage such problems, better definitions of what system administrators do, what is needed to do their jobs, and methods to identify difficulties must be developed. Also, the system administrator's image and availability must be clarified, so that system administrators can readily answer the question, "Where the heck were you when it hit the fan?"
As the system-administration field grows, greater emphasis is being placed on availability, standards, and the nature of the job. Meanwhile, the system administrator is expected to understand every detail of a constantly changing environment. How can the system-administration profession maintain a positive development role under such pressures? Halprin stated point-blank to his fellow systems professionals, "We need to grow up."
In the path to professional growth lie many obstacles and requirements. The many unique features of the job of systems administration make defining its taxonomy difficult. Established models rely heavily on predefined iterations to develop a series of procedures that can be followed by less skilled people, whereas system administrators are faced with a continuing stream of unique problems. We must therefore turn our attention to the core competencies and disciplines of system administration, and to the higher-level processes and standards that should be found in mature organizations. Inherent operational costs, technology turnover, and the pressure to succumb to a "just fix it" philosophy tend to override a total-solution implementation, and so the conflict grows.
Halprin pointed to essentials like shared mental models to enhance shared ideas, benchmarks and site evaluations to build organizational maturity, and establishing degrees and certifications in the system-administration field to propel personal development. Several organizational models, including ISACA COBIT, SEI CMM, and the PM-BOK, address similar fields and issues, and can be drawn upon.
Halprin identified 15 areas of systems administration disciplines: change management, problem management, production management, asset management, facilities management, network management, server management, software management, data management, data security, business continuity planning, performance management, process automation, capacity planning, and technology planning. We are all responsible for each of these areas, but we typically worry only about whichever one is hurting us most today. By taking a step back and quantifying these responsibilities, we can then take a proactive stance, planning improvements to each of these areas, and reaping the benefits in reduced stress and increased availability.
Halprin finished by describing the phases of the Taxonomy project, which is a long-term project with the goals of:
These goals are being ambitiously pursued in corresponding phases:
Phase One is the SA-BOK, which seeks to define the domains and subdomains of responsibility and the concepts, knowledge and tasks associated with each domain.
Phase Two is to define levels of maturity with respect to each of these domains, so that organizations can assess their maturity and plan improvement programs.
Phase Three is to capture industry best practices in each of the domains, to provide an industry-wide shared model of the best practices, contributed to and used by all.
For more information, see <http://www.sysadmin.com.au/sa-bok.html>.
Building Internet Data Centers
Jay Yu and Bryan McDonald, GNAC, Inc.
The speakers outlined the need to understand the service levels required for the business. For example:
Once the requirements and service levels have been identified, the decision must be made whether to build and maintain the datacenter locally or outsource it. The decision should primarily be based on the resources available and costs associated with each option. Some points to consider:
In addition, building a datacenter requires interaction with many people in the facilities world. Past experiences and recent ventures in the outsourcing world indicate that it would be wise to outsource the datacenter, unless certain business requirements make it mandatory to house the datacenter locally. In the latter case it should be noted that building datacenters is generally a time-consuming process, and that it's important to organize finances well in advance.
Professional assistance in datacenter design could help address such questions as:
Follow the N+1 rule: provide for N+1 quantities of resources when N are required.
Approaching a Petabyte
Hal Miller, University of Washington
Hal Miller, the immediate past president of SAGE, gave a talk on what it's like to approach a petabyte of storage.
A petabyte is 1,024 terabytes, or approximately 1.1 x 10^15 bytes. (For the curious, the next orders of magnitude are exabyte [EB] and zettabyte [ZB].) The trends are toward explosive growth but with bandwidth bottlenecks. The desire seems to be the equivalent of "dial tone" for IP networking, computing, and storage. This is all well and good, but how do we get there and make it work?
The problems a petabyte presents are many. Miller touched on some of them: 1PB is approximately 100,000 spindles on 18GB disks. Mirrored five-way, that's 500,000 spindles (and two copies offsite). Mirrors take 70,000 spindles, plus RAID drives, spares, and boot blocks, so we're talking around 1,000,000 total spindles. At $1,000 per, that's $1 billion just for the disk; this excludes the costs of servers, towers, networking, and so on. Where do you put these disks? What are the power and cooling requirements for them? How do you perform the backups? How restorable are the backups? Where do you store the backups? How can you afford the storage, the facilities, the power, the cooling, the maintenance, the replacement of disks?
Who faces this problem? Oil companies (geophysical research), medical research (including genetic research), and movie companies (special effects) face it now. Atmospheric sciences, oceanographic sciences, manufacturing, and audio delivery will face it soon. And academic institutions will face it as well, since they do as much research as (if not more than) commercial institutions.
More information is available at <http://chrome.mbt.washington.edu/hal/LISA>.
Providing Reliable NT Desktop Services by Avoiding NT Server
Tom Limoncelli, Lucent Technologies
This was an excellent talk that was misnamed. It should have been titled: "Selecting Client and Server Ends of Systems Separately to Get the Best of Both Worlds."
Vendors with good servers tend to have poor clients, and vice versa. Therefore, pick an open protocol and separate server and client vendors that use it.
Managing Your Network(s): Corporate Mergers & Acquisitions, or, You Got Your Chocolate in My Peanut Butter
Eliot Lear, Cisco Systems
Eliot Lear delivered an astounding wealth of information on how systems folks can deal with corporate mergers.
Company mergers create an exercise in scaling. This is where you and the network enter. The finished product typically does not look like either company. A merger makes the new company larger and can make life easier. The first rule to adopt is to automate as much as possible. More important, though, is the use of standards. This one small rule will save time when you are in the process of integrating two networks and people are continuously asking questions. When you begin to merge the networks, you may lose some functionality. Prepare for this by making a flowchart of what and when things are supposed to happen, then update it as you go along.
Remember: Employees are forgiving. Customers are not.
Many times, as the network admin, you may not know about the merger until the public does. Your most critical activity during this time is dealing with senior management. There may be times when senior managers request something that is not feasible; simultaneously, you are asking management to lay out specific policies and guidelines about the network you are designing. Before you set up security and usage quotas, these policies need to be in place. Requesting these policies and guidelines as soon as possible, pending a merger, is in your best interest. Ideally, get as much information as possible regarding merging sites and departments. Which ones are going to be restructured? Stay away from "stupid network tricks"; either fix it or not. There is no middle ground. An interim fix will always come back and bite you in the end.
Here are some helpful hints that may save future confusion and make the transition smoother. Check for interoperability. For example, are you using ATM on one network and FDDI on the other? Focus on industry standards. Is your addressing global, or private? If you have both, which one will remain? Do not forget to leave room for your new company's requirements, and for growth as well. Look for tools that can help you do these things, such as Cisco Works 2000.
State of the Art in Internet Measurement and Data Analysis: Topology, Workload, Performance, and Routing Statistics
kc claffy, Cooperative Association for Internet Data Analysis
kc claffy gave a very interesting talk about the difficulties in Internet measurements. She broke down assessment into four parts: topology, workload characterization, performance evaluation, and routing. According to her, one of the main problems is the lack of tools.
claffy presented a multifunctional tool, called Skitter, which was created by CAIDA. She used Skitter to display large sections of the networks coming out of California. The 3D graphics were impressive, looking like a multicolored spider web. Skitter is also able to do dynamic discovery of routes, much as routing protocols do on a router. We were able to see a breakdown of the different types of protocols going across a given Internet line. For example, you might see 20,000 TCP packets travel between San Francisco and Los Angeles in a given time period. This is all well and good, but it means absolutely nothing without a point of comparison.
What you want is the ability to determine average usage from an ISP to their customers or from a main branch to a satellite branch. Once an average is calculated, it provides a starting point for future measurement. You can determine whether something went down locally or if it is a widespread problem. All of this sounds great, and I myself have wondered how to implement this for my own network. The big problem is that there are literally thousands of data streams coming into a single recording point. It takes quite a while to decipher the pertinent information and transform it into something usable.
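The baseline idea can be sketched as a simple average-and-deviation check. This is not how Skitter or any CAIDA tool works; the function names, the sample counts, and the three-standard-deviation threshold are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def baseline(samples):
    """Average and spread of past measurements for one link."""
    return mean(samples), pstdev(samples)


def is_unusual(value, samples, threshold=3.0):
    """Flag a new measurement that strays far from the baseline."""
    average, spread = baseline(samples)
    if spread == 0:
        return value != average
    return abs(value - average) / spread > threshold


# Hypothetical packet counts per interval on a single link
history = [20000, 21000, 19000, 20500, 19500]
```

With this history, a reading of 20,400 packets falls within normal variation, while a drop to 5,000 would be flagged for a closer look.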
I can see the appeal, as kc does, in doing the research, simply because it is fascinating. As she pointed out, though, it takes years just to collect the data and then more years to understand and interpret it. Very few people are doing this type of work, which means it may take quite a while to have a complete measurement tool. It is a tedious job, but such measurements will allow us to find and fix problems on the Internet before they turn into a major crisis.
Look Ma, No Hands! Coping with RSI
Trey Harris, University of North Carolina, Chapel Hill
Trey spoke from real-life experience on the topic of repetitive stress injury (RSI). He's heard the myths, seen all the doctors, and gone through lots of trial and error. Above all, he's experienced a lot of pain that has had major impact on his ability to do his day-to-day job. He dispelled myths like "Can't happen to me," and "Can't get any worse." People must be aware of what can happen to them and take corrective measures as soon as possible.
For those who develop RSI, voice dictation is one option. Good packages are hard to find and are still susceptible to problems. Naturally Speaking is one such tool that Trey demonstrated for the audience. He could actually talk almost naturally and it kept up with him. However, when it came to writing Perl code, the results were disastrous. Passwords are a real problem too, unless you have your own private office. Any computer that takes on voice dictation requires lots of spare RAM for reasonable performance.
With respect to the RSI diagnosis, there is still much uncertainty among doctors. It's important to get second and third opinions. Some doctors prematurely offer surgery, which is often not the best option because it's invasive. Carpal tunnel syndrome (CTS) is a subset of injuries under RSI. There are many other ways people can injure their fingers, arms, and wrists. The most common cause of RSI is using the smallest muscles to do repetitive tasks. To avoid RSI, do the opposite: use the largest muscles to do repetitive tasks. Small muscles were never intended to do the type of computer work we do today. Remember, RSI is not limited to computer work.
Panel moderated by Dan Klein, Consultant
The Buzzword Bingo Panel at work
Any opinions are those of the panel and not necessarily widely held.
The idea was to define various buzzwords and whether the average system administrator needs to worry about each right now, in three or nine months, farther out, or never. But in practice, the discussions centered upon definitions and not too much on the worry factor. (Errors in definitions below could be from the panel or be errors resulting from rapid note-taking by the summarizer.) Buzzwords discussed included:
dot-com enabled: PHB (pointy-haired boss)-speak for "can you get us a Web site?".
brochureware: see #1.
garage-band ISP: three guys with six Linux boxes.
USB, Firewire: Both are high-speed peripheral interfaces. USB is definitely here. Firewire has a small installed base and is more suited to consumer and specialized applications (e.g., downloading video); it may become dominant in the home in 3 to 9 months.
FM200: a.k.a. "halon++", a less toxic, less caustic material for fire suppression. It basically works by removing all available oxygen, so it can still kill you if you're too far from the exit when the stuff is released. Very expensive; it will be a while before it is widely adopted.
OODB: object-oriented databases. "Idea has a long way to go before becoming useful" (Greg Rose).
petabyte: three orders of magnitude more than a terabyte. Good for reading/writing, but not reproducing; problems with fsck and dump abound. "If you're not storing video or satellite feeds, not common." Of course, people said a few years ago that nobody needed a terabyte either, so . . . start worrying. You may have a few years to do so.
J++/Visual J++: PHB. Compatibility issues for programmers, but generally sysadmins don't need to worry.
CRM: customer relationship management. More PHB stuff . . . although there was much discussion here. Refers to software to produce statistics that are useful for writing proposals/justifications/reports but may not necessarily do much for how you deal with your customers.
ERP: enterprise resource planning. "Accounting with human resources." Affects sysadmins in that software (e.g., Oracle Financials, Peoplesoft) needs to be installed/maintained.
LDAP: lightweight directory access protocol. It's here. Basically a useful subset of X.500. Interoperability a problem. Windows 2000/Active Directory drops WINS, implements LDAP. "Extensively complicated to configure." But get used to it.
SAN: storage-area network; multiple disks and tapes connected via fiber. Big stuff, especially in backups.
NAS: network-attached storage; basically Network Appliance and other systems of their ilk.
fiber channel: an optical disk-connect technology. (FCAL = fiber channel arbitrated loop; FCVI = fiber channel virtual interface.) More or less a replacement for SCSI. Differing from USB and Firewire in scope/scalability, it is basically a datacenter tool. (Note that channel is IBM-speak for bus.)
.*M.?L: markup languages. Discussion focused on XML, the extensible markup language, becoming a standard for electronic data interchange (EDI); think storing of data with formatting in a form that is easily readable, like HTML. Some say it will replace HTML and perhaps PDF. Brief mention of SGML, Standard Generalized Markup Language. XML is a useful subset of SGML, as is HTML.
SCSI: fast, wide, ultra, differential . . . "All colors of dead chickens" (Brent Chapman). SCSI (OK, we know that's small computer system interface) was basically 8 bits at a given clock rate. "Fast" doubled the clock rate; "wide" moved to 16 bits. "Ultra" doubled the fast clock and moved to 32 bits. Note that as SCSI moved to fast, wide, ultra, the maximum cable length dropped. Differential includes more error-checking on the bus; incidentally, the maximum cable length is "nearly back out to where it should be." Differential regular voltage is 12 volts, low-voltage is 3 volts (w/ 10,000 rpm drives). The thing to note here is that if you mix regular and low voltage, something will fry. Also in all this mess is the wonderful array of connectors, adapters, and compatibility of different types of devices.
SSA: serial storage architecture. Serial bus-based disk architecture from IBM. Cool stuff, but then so was Betamax.
SSL: secure socket layer, a Web thing. If a sysadmin doesn't know about this already, something's wrong. Chapman noted that one needs to plan certificates carefully with respect to server names, etc., since the certificates are not easy or quick to change (or cheap). Also that certificate use does not equal authentication, but is merely a useful addition (e.g., some sites combine certificate use with cookies). Too much to discuss here in three minutes.
PKI: public-key infrastructure. Important, and we have none. Attend a tutorial on cryptography/security and you'll see why.
ASP active server pages. Goes with PHP (pointy-haired protocol). Some dissenting opinions here . . . basically used to generate dynamic content and is "one step smarter than CGI." Others called it "Visual Basic for the Web." Brent Chapman's summary was, "It's not pretty, it's not the way we would do it, but we don't have to do it."
ASP, take 2 application service provider. Theory of multiple businesses sharing really expensive applications that would normally be installed on each business's intranet. Not e-commerce per se. Stay tuned.
IAP: Internet application provider. Refers to e-commerce sites sharing back-end engines (e.g., eBay and others using someone else's auction engine).
DSL: digital subscriber line and its variants. "ISDN on steroids but not a dialup." "How to take one lousy 50-year-old pair of copper wire and achieve reasonable network speed." Seen by some as the true beginning of a paradigm shift; used with VPN (virtual private networking), it will replace corporate dialup. It is already cheaper in most cases to go this route than to maintain a modem pool and pay long distance/800# phone bills. Estimated that 70% of U.S. residents live close enough to a switching station to get 384 kb/s data rates. But one needs to check whether local ISPs have the capacity to support the number of subscribers at such rates; many tier 2 and 3 ISPs cannot. Some sites that rely on employees to use DSL/VPN for access may find that they don't themselves have enough bandwidth, especially as employees start doing things that were previously considered impossible with it.
ISO9000: a standard. "Do you have processes? Are they written down? Do you follow them?" And that's it. Quality of the processes doesn't matter, just whether or not they are repeatable and consistent. If your process is to shoot your customers and you do it every time, you can be ISO9000 certified. (You'll also be in jail.) Another opinion on this was that it meant "It is better to be up than fast; it is better to be reliable than good." "6-Sigma" is in this category, too.
A Couple of Web Servers, a Small Staff, Thousands of Users, and Millions of Web Pages . . . How We Manage (sort of)
Anne Salemme and Jag Patel, MIT
MIT developed its Web servers on the basis of the assumption, "If you build it, they will come." The university currently has 600,000 Web pages on 1,000 Web servers. They decided to use existing resources. AFS is used extensively because of its scalability and security using Kerberos. They also use Apache-SSL, Fast-CGI, and Java servlets.
In 1994-1997, MIT's Web environment had:
In 1998-1999, MIT's Web environment had:
Upcoming items in MIT's Web environment:
Budgeting for SysAdmins
Adam Moskowitz, LION bioscience Research, Inc.; and Gregory H. Hamm, GPC USA, Inc.
This was an excellent practical discussion covering not only elements of a budget in detail but Moskowitz and Hamm's tips with respect to the purpose and people of budgets and the budgeting process.
The purpose of a budget, essentially, is to serve as a very detailed planning tool, describing what you want to do next year and why. It is a way to get funding, but not necessarily the only way. It is an instrument to foster discussions about what your company/department is doing and hopes to accomplish, as well as a means to find out whether everyone is "on the same page." It also lets you answer questions from other employees and departments so that they in turn can plan their own budgets.
The scary part of developing a budget, especially one's first time through the process, is coming up with the numbers. Moskowitz and Hamm counsel that you're not expected to know all the numbers, but simply with whom to talk to get them. Users, your boss, "the bean counters" (accounting and purchasing), and "the suits" (department heads, VPs, directors . . . maybe even your CIO, CFO, CEO for elements of the business plan, if the company structure allows it and it's not a bad idea for your environment): all of these people can be extremely useful to you. Too often the bean counters and suits are seen as adversaries or, worse, stupid. In reality, they're neither; they simply have different jobs from yours. If you play to their strengths and take advantage of what they can offer you, you'll be happier and more likely to get what you and your company need. For example, purchasing could help you out with the numbers on just how much toner you ordered last year, while different levels of management can tell you about hiring plans and company directions.
The last general guideline Moskowitz gave was to plan to have your budget cut, because budgets always are. If your budget is structured into reasonable categories with not entirely obvious slash points, you can whittle down the budget to your own liking, as opposed to having it done to you.
You can find the presentation slides at <http://www.menlo.com/lisa99/budgeting.ppt>.
Simon Cooper, SGI
As the use of the Internet grows, so does the need for inexpensive firewalls to protect the security of internal systems. Simon Cooper described the needs and uses for an inexpensive firewall and how to build and administer the systems.
Inexpensive firewalls are dedicated systems using available or low-cost hardware and free or low-cost software. It was pointed out that these are not no-cost systems: a substantial time investment is needed, and these firewalls do not provide maximum security or the highest reliability available. Appropriate areas of deployment for inexpensive firewalls are departmental networks, small businesses, homes, and personal domains.
The talk covered various aspects of building a firewall, including determining firewall needs; hardware; OS and software selection; OS hardening; kernel defenses; and filtering software information and examples, services, build tips, and experiences. The administration section discussed securing remote-administration connections and maintaining system integrity.
Lee Damon, Qualcomm; and Rob Kolstad, SANS Institute
This session's stated objective was to try to "avoid making egregious first-order mistakes and move on to second-order mistakes."
The speakers began with an attempt to define what ethics are and a discussion of why they might be important. The issue of ethics for system administrators has taken on a higher profile in recent years because of the increasing amount of data, including sensitive data, stored online.
They then went on to discuss some of SAGE's six canons of ethics and how some of them may not be entirely realistic.
Finally, the speakers led an audience discussion of several scenarios that a system administrator might face. They ranged from questions of when to inform a manager about employees misusing company resources to what your response should be to a request by a manager to search for child pornography in an ISP customer's home directory.
These scenarios weren't intended to show us what the "correct" response in a given situation is, but to show us how reasonable people with similar goals will, nonetheless, think differently on ethical matters.
David Williamson, GNAC, Inc.; Gerald W. Carter, Auburn University; Greg Rose, Qualcomm Australia
A review of three recent conferences replaced a session that had to be cancelled. The program chairs from each of the conferences spoke briefly about the highlights of each.
University Issues Panel
Moderated by Jon Finke, Rensselaer Polytechnic Institute
William Annis of the University of Wisconsin described how they have managed the growth in one group within the university. He related how they had developed a detailed planning document for centralizing and standardizing the systems and the implementation of cfengine to ensure consistency across the environment.
David Brumley of Stanford University discussed how his organization deals with computer security and incident response. The goal of the security office was to provide a secure, fast, and reliable network without firewalls; provide technical assistance with technical implementations; and provide a point of contact for incident reporting, handling, and follow-through.
Robyn Landers from the University of Waterloo discussed their solution for residence-hall networking. She described the process for students to get connected and how the university implemented an automated system of limiting the amount of network traffic allowed to individual students. This "rate-limiting" has prevented network overload and has encouraged students to share resources.
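The summary does not say how Waterloo's rate-limiting was implemented. As a rough illustration only, the sketch below uses a token bucket, a common technique for capping per-user traffic; the class names, rates, and student identifiers are all hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Token bucket: refills at `rate` bytes/second, holds at most `capacity` bytes."""
    rate: float
    capacity: float
    tokens: float = 0.0
    last: float = 0.0

    def allow(self, nbytes: int, now: float) -> bool:
        # Refill for the time elapsed since the last packet, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


class PerStudentLimiter:
    """Keeps one bucket per student (hypothetical identifier, e.g. a login name)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, student: str, nbytes: int, now: float) -> bool:
        bucket = self.buckets.get(student)
        if bucket is None:
            # A new student starts with a full bucket.
            bucket = TokenBucket(self.rate, self.capacity,
                                 tokens=self.capacity, last=now)
            self.buckets[student] = bucket
        return bucket.allow(nbytes, now)
```

Each student draws from a separate bucket, so one heavy user exhausts only their own allowance. Timestamps are passed in explicitly to keep the sketch deterministic; a real deployment would enforce limits in the network equipment rather than in a script like this.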
Kathy Penn from the University of Maryland described their backup procedures and policies. She emphasized the importance of documented procedures for doing backups and restores. She suggested that overview information, as well as cookbook-type instructions, is necessary. It is also necessary to document the policies regarding frequency of backups, the creation of archival copies, how long the archives are kept, and what you don't back up. Additionally, provide information on how to request restores and how long a restore should take. A policy for who can request restores of information is critical.
Advanced Topics Workshop
Adam S. Moskowitz, Moderator and Chair
Once again the Advanced Topics Workshop was wonderfully hosted and moderated by Adam Moskowitz. The 30 or so of us each discussed our environments and mentioned some of the problems we were seeing. We then looked at some of the common themes, such as hiring and growth (virtually everybody present had open positions), scaling (especially at the enterprise level), some tools, and areas where we felt there had to be improvement (such as system administrators being able to speak the language of business in order to justify expenses).
The afternoon session of the workshop included some predictions for what we thought would be coming in the next year (wireless LANs, load-balancing hardware, LDAP, the lack of adoption on a widespread basis of Windows 2000, the lack of adoption on a widespread basis of IPv6, an increased demand for H.323 proxies for video conferencing, at least one major DNS outage lasting 24 hours, no new top-level DNS domains like .web and .biz, and no major problems when the century rolls over). Lest you think that we're omniscient (or that we even consider that as a possibility), we also looked at our success rates from the previous four workshops. We were right about some things, dead wrong on others, and one to four years ahead of our time on still others. So take these predictions with a grain (or bushel) of salt.
Finally, we wrapped up the workshop with a discussion of some problems we're facing (a VMS-to-UNIX transition in one place, the administration of customers' router passwords in another, and so on), with possible solutions bandied about. We also briefly touched on some interesting or cool stuff we had done in the past year. A lot of us were doing Y2K remediation and documentation.
GIGA LISA Workshop
Joel Avery, Nortel Networks, chair
At the workshop, we broke into four groups, each of which discussed two topics. I covered NT-UNIX integration and internal firewalls. The group also discussed the "most daunting problem."
Cooperation isn't good enough. Integration is about sharing as much as it makes sense to share between the two operating-system environments.
Password sharing solutions: Sites have started storing password data in various types of databases and have written utilities to reencrypt the passwords for each system. Used were a custom Oracle database, a Radius/Informix utility normally used to control modem dial-ins and routers, and a hacked Kerberos.
File sharing solutions: NetApps and Auspexen support both file systems directly. Smaller sites get along with Samba on UNIX and Dave on Macs.
Patch maintenance: Active versus passive maintenance schedules; no integrated solution, though.
Dataless clients: With file sharing, the dataless model makes excellent sense in both worlds.
No solution presented: Unified user-profile storage on NT to match the user-account-based dot files on UNIX. Since user-configuration information was moved into a database in NT, how can it be moved from machine to machine as a user roams, and how can other users be prevented from accessing a user's email?
As the Internet reached 250K nodes, people started making firewalls. Now that large companies have more than 250K nodes inside their networks, internal firewalls are being installed.
They are for resource constraints. The firewall is to protect the group that installs it, so they are local responsibilities. This got called "directional protection."
Use NAT to redirect by service.
Interesting tidbit: One in 700 employees is actively hostile to his or her employer. I wonder who came up with this and if it is true.
SUNRPC versus NTRPC is a knotty problem. NetMeeting is a bear.
With multiple firewalls, asymmetric routing becomes a serious problem: IP packets do not record their path, and routers can choose between redundant paths, so a reply may come back through a different firewall from the one that passed the request.
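The asymmetric-routing problem can be shown with a toy model. The sketch below assumes the firewalls are stateful, meaning each admits a reply only if it saw the matching outbound packet; the summary does not say what kind of firewalls the group had in mind. The class name and addresses are hypothetical.

```python
class StatefulFirewall:
    """Toy stateful filter: admits inbound traffic only for flows it saw leave."""

    def __init__(self, name: str):
        self.name = name
        self.flows = set()  # (source, destination) pairs seen going out

    def outbound(self, src: str, dst: str) -> bool:
        # Record the flow so that the reply can be matched later.
        self.flows.add((src, dst))
        return True

    def inbound(self, src: str, dst: str) -> bool:
        # A reply from src to dst matches an earlier request from dst to src.
        return (dst, src) in self.flows


fw_a = StatefulFirewall("fw-a")
fw_b = StatefulFirewall("fw-b")  # redundant path, shares no state with fw_a

client, server = "10.1.1.5", "10.2.2.9"
fw_a.outbound(client, server)         # the request leaves through fw_a

print(fw_a.inbound(server, client))   # reply returns via fw_a: accepted
print(fw_b.inbound(server, client))   # reply routed via fw_b: dropped
```

Nothing in the packet tells fw_b that fw_a already approved the conversation, so the second reply is rejected even though the traffic is legitimate. Forcing symmetric routes or sharing state between the firewalls are the usual ways around this.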
Most Daunting Problem
Someone had a pair of datacenters to build the next day. He would run AIX on the main servers, and he wasn't an AIX wizard yet. The group spent an hour asking questions and making recommendations. He took notes the whole time.
Naming Services BOF
This BOF session was advertised to be on naming services (LDAP, NIS, DNS, etc.). It turned out to be a presentation (with supporting transparencies) of a product no longer available called Uname*IT. Uname*IT was developed a couple of years back by a company that has since gone bankrupt. It was basically a database that would allow any admin to maintain a site's name space. All the information is stored in an Informix database, and data can be pushed out in different formats (NIS tables, DNS zone files, etc.).
The presenter didn't really get a chance to explain the product, since he was hammered with questions like: "Why is this a BOF? What do you want from us?" about 10 minutes into the presentation.
It turns out he was only trying to get feedback from people. He's interested in bringing the product back to market (somehow?), and he wanted to know what people thought of it.
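The idea attributed to Uname*IT, one authoritative store pushed out in several formats, can be sketched in a few lines. This is not Uname*IT's code or schema: the record layout, function names, and addresses below are hypothetical, and a plain Python list stands in for the Informix database.

```python
# Hypothetical single source of truth; a real system would query a database.
HOSTS = [
    {"name": "alpha", "ip": "192.0.2.10", "aliases": ["www"]},
    {"name": "beta", "ip": "192.0.2.11", "aliases": []},
]


def to_hosts_map(records):
    """Render records in /etc/hosts style, the usual source for an NIS hosts map."""
    lines = []
    for rec in records:
        names = " ".join([rec["name"]] + rec["aliases"])
        lines.append(rec["ip"] + "\t" + names)
    return "\n".join(lines) + "\n"


def to_zone_records(records):
    """Render the same records as DNS zone-file A and CNAME lines."""
    lines = []
    for rec in records:
        lines.append(rec["name"] + "\tIN\tA\t" + rec["ip"])
        for alias in rec["aliases"]:
            lines.append(alias + "\tIN\tCNAME\t" + rec["name"])
    return "\n".join(lines) + "\n"


print(to_hosts_map(HOSTS))
print(to_zone_records(HOSTS))
```

Because both outputs are generated from the same records, a host renamed in one place cannot drift out of sync between NIS and DNS, which is the appeal of the approach described in the BOF.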
SAGE Community Meeting
Peg Schafer opened the SAGE Community Meeting with a number of announcements, which were followed by a question-and-answer session.
Current activities include preparations for the LISA 2000 conference, December 3-8 next year in New Orleans, and the LISA-NT conference, scheduled for July 30-August 2 in Seattle.
The board was pleased to announce that SAGE was recently able to purchase the sage.org domain name.
SAGE-WISE has formed, representing Wales, Ireland, Scotland, and England.
The topic of "understanding what we do" has been the focus of a number of efforts including the SAGE Taxonomy working group, the salary survey, the occupational analysis survey being conducted by the SAGE Certification working group, and the results from the "Day in the Life" survey. There is increasing activity in how system administrators are educated as well as in helping match mentors with individuals who want to improve their system administration skills.
The question-and-answer session seemed to focus primarily on the need for more publicity and marketing for SAGE that convey the value it offers to system administrators and to businesses.
SAGE Mentoring Project BOF
The primary purposes of this BOF, which was led by Michael Ewan, were to identify individuals who were interested in serving as mentors and to provide the opportunity for individuals who would like to be mentored to step forward. The discussion also centered on the process of matching up individuals with mentors and how SAGE can help with the logistics of the mentoring relationship.
SAGE Taxonomy BOF
Geoff Halprin opened the BOF by asking a number of questions of the audience. The discussion centered on how different organizations have attempted to standardize the work that system administrators do. The group discussed the work proposed by Geoff in his first draft of a "Body of Knowledge" for systems administration and how it can be used. It was suggested that a method of evaluating an organization's competence in each of the "Body of Knowledge" areas would be beneficial.
The name Terminal Room has not been accurate for a long time; it should probably be renamed to "Internet Connection Room."
The terminal room, managed by Lynda McGinley and staffed by volunteers, was actually two rooms: one room with 30 PCs running Linux and a separate room with 40 Ethernet connections for laptops and the Axis Webcam. In addition, 10 modems were set up to allow access to the network from a Sheraton hotel room by dialing a four-digit extension; four of these modems were accessible from other hotels.
The Internet connection was a framed T1 provided by Earthlink. A wireless point-to-point connection from the Convention Center to the hotel was set up for the conference and paid for by GNAC. The networking equipment consisted of Cabletron and NetGear hubs.
As an experiment, 120 Lucent Technologies Wavelan 802.11 Turbo Bronze wireless PCMCIA cards (in both 2 and 11 Mb speeds) were available for checkout with a credit card; they were all checked out in the first couple of hours. Five wireless bridges (or Access Points) were provided to support the Wavelan cards, including one in the hotel bar!
The PCs were rented from Houlihans. Terminal room volunteers Dave Putz and Connie Sieh provided a set of six custom CDs and diskettes for Linux installation that made the installation and configuration go very quickly and smoothly. The PCs were installed with a minimum of software, but did include Netscape and ssh. Dave also provided a Tcl program that monitored the use of the PCs and enabled him to gather usage statistics at the same time. Dave's usage graphs indicate that a majority of the PCs were busy most of the time that the room was open.
USENIX conference attendees have come to depend on the terminal room at large conferences. Because of this, USENIX is looking at the feasibility of providing Internet connectivity at every conference and workshop.
The LISA Reception at the Museum of Flight
LISA Reception: You mean, this thing flew?
Dana Geffner & Monica Ortiz of the USENIX staff with Geoff Halprin at the exhibits Happy Hour