;login: The Magazine of USENIX & SAGE

 

system administration research


by Mark Burgess

Mark is associate professor at Oslo College and is the author of cfengine and winner of the best paper award at LISA 1998.

<Mark.Burgess@iu.hioslo.no>

Part 1: System Administration at the Crossroads

The Delicate Art of Assertion

I like to think of myself as a scientist. I always wanted to do science; I studied physics and have since interested myself in many other areas, not least among them computers. There is something about the creative pursuit of understanding that appeals to my mindset and motivates what I do. On the other hand, I have always found the world of commerce somewhat bewildering.

Apart from feeling nauseated when I am forced to think about money matters, I have an almost pathological distrust of sales promotion. The funniest things get said in the name of marketing. Wild assertions like "You know it makes sense" and "Eight out of ten owners said their cats preferred it." I recall a version of Solaris being released with the slogan "over a hundred bugs fixed," as though this were a good thing. What were they doing there in the first place? Although I am not so naive as to believe that making money is a bad thing, I do believe that the subjective metaphors of marketing have no place in the objective goals of science.

The field of system administration is now at a crossroads, poised between selling its soul to the devil of marketing (sorry, devils) and finding a more rigorous foundation in the tradition of science (cue angelic choir). Like most engineering disciplines, it will no doubt split and go both ways. It is important that a large part of it takes the road to scientific salvation.

If I haven't already made it plain, I'm on the side of science. This is the side that SAGE has been steadily encouraging since the first LISA conferences, by opening up a forum for vendor-neutral discussion. I would like to see much more scientific, technical work presented at the LISA conferences and far fewer descriptions of tools.

Looking back through all the LISA conferences while writing my new book on system administration [Editor's note: See this issue's Bookworm column], I saw a panel debate very early on that asked: why do we keep reinventing the wheel? Pursuing my journey through the papers to the present day, my only answer was: why indeed? Very little progress has been made in the field in more than ten years. What progress there has been is mainly a slight gradient in the level of sophistication, fueled by technological advances.

This set me thinking. Why are we not doing more scientific work? In recent years, it has become popular to study Internet traffic and network behavior in a scientific way, so that the idea of empirical work has become virtually synonymous with testing network behavior. But network behavior is just one tiny part of system administration. What about the rest of it?

This series of articles is aimed at activating the sleeping academic in all of us. The most constructive way, arguably, to complain about the state of the art is to try to improve on it. I would like to provoke that development. The wonderful thing about science is that anyone can do it. You do not need a fancy degree to make a contribution (though the training you get often makes it easier); all you need is a critical sense and a measure of discipline.

Many problems must be faced in addressing system administration scientifically. This is surely why its development has been slow, in spite of the many scientifically trained system administrators in the field. The aim of this series is to explain the meaning of system administration as a scientific pursuit: what is it we should be looking for? How do we look for it? How can we be sure that what we're claiming is correct? How can we be our own worst critics?

Before starting, I would like to offer a word of warning. Like marketing, personal promotion through research competitiveness has no place in science, yet it thrives like a virulent plague among all-too-human researchers, eager to "beat the competition" and be the best by bashing the rest. Since I find LISA's friendly and open-minded atmosphere a refreshing change from many other conferences I attend, I have worried that bringing more critical rigor into the field might also bring the arrogance that frequently attends it. As soon as one person starts being more academic, there could be a race to be the loftiest of them all.

The key is to keep that race constructively aimed at the goal of the research and not at self-promotion or unfriendly one-upmanship. Let us try to avoid these specters as long as possible and cooperate rather than compete; they will no doubt haunt us soon enough. That said, down to business.

Critical Sense and Common Sense

So I'm standing in the corridor and a student from my system-administration course says to me, "Why is Solaris server X soooo sloooow? Couldn't we just get an NT server and be done with it?" I asked him to remove his glasses, so I could do the thing with two fingers; he laughed and disappeared, but then the moment of humor faded into depression. How could someone in my course say such a thing?

For a start, the server that was relevant to his work was a GNU/Linux machine, not the Solaris machine to which he referred. Second, a moment's investigation would have shown that this host was running at lightning speed and doing practically nothing, but that something in the network (a bad switch as it turns out) was causing an appalling network-transmission rate. Third, how would running NT make anything faster?

What made this worse was that I have heard comparable abuses from considerably more informed practitioners. For instance, I once heard, "GNU/Linux is at least as fast as Solaris at NFS." This woke me up. What could this possibly mean? Actually I don't care which is faster, but what was the person trying to say? Were the two operating systems ever compared on identical hardware, under exactly the same conditions? Under what range of tests was this investigation carried out? What hardware was involved? Were the networks the same? Were any measurements actually made, or was this a gut feeling? I suspect the latter.

Here's another one: "Bourne shell scripts are lighter on the system than Perl because Perl has a huge binary, whereas the Bourne shell binary is much smaller." In fact, it is possible to settle the issue by measurement. That is the scientific thing to do. If there is a question, one carries out an investigation. No assertions are without risk of being toppled by contrary evidence. As it turns out, that assertion is full of holes. First of all, let us look at the sizes of the binaries on disk:

  GNU/Linux (bytes on disk)
    516828  /usr/bin/perl
    373176  /bin/bash

  Solaris (bytes on disk)
    718688  /local/bin/perl
     91668  /bin/sh
    688972  /local/gnu/bin/bash

The Perl binary is indeed significantly larger than /bin/sh on the Solaris machine, but not much bigger than bash. (The Solaris binaries are generally larger than the Intel ones, since the SPARC processor is a RISC architecture.) So it is not even generally true that the Perl binary is much bigger than the shell's. But comparing binaries on disk is far from the point: what is interesting in a system-administration context is how much of the system's resources are consumed by a script. The size of the binary on disk is irrelevant. What we need to know is the total resident size of the program in memory! This is a more complex matter. Consider the RSS (resident set size) measure, in kilobytes, from ps ux on a GNU/Linux host:

924 bash
824 perl

Here we find that an actual measurement gives Perl a lower resident size at the time of measurement. Moreover, a shell script is not a complete program — it calls shell commands, each of which is a forked process, adding its own memory imprints to the sum, as well as a lot of context switching and interprocess communication (pipes and the like). Since Perl does not require pipes in order to communicate internally, for most things it suddenly seems like a very lightweight language compared to the shell.
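The point about settling such questions by measurement can be made concrete. What follows is only a minimal sketch, assuming a Unix-like host with Python available; the helper names binary_size and cost_of are hypothetical and appear nowhere in the discussion above. It reports the on-disk size of an interpreter, then the CPU time and context switches consumed by a command together with every child process it forked and waited for, which is the sum that matters for a shell script. It does not measure resident memory.

```python
import os
import resource  # Unix-only module
import shutil
import subprocess
import sys


def binary_size(name):
    """Return the on-disk size in bytes of an executable on PATH, or None."""
    path = shutil.which(name)
    return os.path.getsize(path) if path else None


def cost_of(cmd):
    """Run cmd; return what it and its waited-for children consumed."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "exit_status": result.returncode,
        # User plus system CPU time, summed over the whole process tree.
        "cpu_seconds": (after.ru_utime + after.ru_stime)
                       - (before.ru_utime + before.ru_stime),
        # Voluntary plus involuntary context switches.
        "context_switches": (after.ru_nvcsw + after.ru_nivcsw)
                            - (before.ru_nvcsw + before.ru_nivcsw),
    }


if __name__ == "__main__":
    for name in ("sh", "perl"):
        print(name, binary_size(name))
    # Replace this command with the script under investigation.
    print(cost_of([sys.executable, "-c", "pass"]))
```

A single run proves little. The same command should be measured many times, on the same host and under the same load, before any comparison is drawn.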

System administration is a young field in academic terms, but there is no excuse for unscientific exclamations in a public arena (behind closed doors, over beers, we can BS all we like). Marketing slogans: my program is more secure than yours because it is written in Perl rather than C, my window system is better than yours because more users have it. These contain no truth, they make little sense, they are hot air. Trusted experts, whose opinions are respected, should not make sloppy and unfounded remarks that are unworthy of them. Never is this more prevalent than in operating-system wars or favorite-software scuffles. X is better than Y because I have a good story to tell. The aim of introducing a more scientific culture to the field is not to pull rank, but precisely to make rank-pulling impossible.

Science

Science is a meld of two ideas: theoretical model building and empirical data gathering. Scientists are hunter-gatherers! These two belong together. Models are needed to interpret empirical data and motivate experiments, and data are needed to substantiate theories or to inspire models.

As an empirical science, system administration leaves a lot to be desired. It is generally not hard to make measurements using the variety of programs available to us; rather, the problem is that in order to make a verifiable assertion, we need repeatability. An experiment that can be repeated many times with the same essential result can be trusted far more than a result obtained only once. It weeds out flukes. But repeatability is virtually unobtainable in social sciences (and system admin is a social science), because when we try to measure things where large groups of people are involved, the conditions under which measurements are made change constantly.

Comparing the results of one trial with the results of another requires performing them both under identical conditions. This problem has confounded the social sciences from their outset, but there the problem lies in asking questions that are too broad or too vague. In system administration, we must ask smaller, more concrete questions.

For instance, it does not make sense to compare the performance of a computer system during the day with performance measured during the night, and then use the average of those values as being the true performance. There are all kinds of things going on in a computer system that contribute to (and cannot be separated from) performance measurements.

What users do is a very significant influence (often the most important influence) on the system. Comparing a measurement at different times during the human social cycle is asking for trouble. During the night computer users work, during the day they sleep. (Or is it the other way around?) If multiple measurements are to be made, they should be made under virtually comparable conditions and then analyzed statistically to take into account small residual variations in conditions of measurement. Science is not just about measuring stuff, it is also about separating effects that do not belong together. This is one of the themes we shall return to in detail in this series.
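As a rough illustration of measuring under comparable conditions, the sketch below groups samples by hour of the day before summarizing them, rather than averaging day and night together. The sample values are invented for illustration and the function name summarize_by_hour is hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_hour(samples):
    """Group (hour, value) samples; return {hour: (mean, spread, count)}."""
    groups = defaultdict(list)
    for hour, value in samples:
        groups[hour].append(value)
    summary = {}
    for hour in sorted(groups):
        values = groups[hour]
        # A spread cannot be estimated from a single sample.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[hour] = (mean(values), spread, len(values))
    return summary


if __name__ == "__main__":
    # Invented load averages: three nights at 03:00, three days at 14:00.
    samples = [(3, 0.2), (3, 0.3), (3, 0.25),
               (14, 3.0), (14, 3.4), (14, 3.2)]
    for hour, (avg, spread, n) in summarize_by_hour(samples).items():
        print(f"{hour:02d}:00  mean={avg:.2f}  spread={spread:.2f}  n={n}")
    # The overall mean falls between the two groups and describes neither.
    print("overall mean:", mean(value for _, value in samples))
```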

So much for experiment. As a theoretical science, system administration almost doesn't exist. Actually, as a theoretician, I have been working on this problem, and I'll get back to this in good time. This should not be taken as an indication that theory is unimportant. On the contrary. Real progress will not be made in science without the ability to refer to a model, a set of assumptions, goals, and intentions. This is where theory will enter.

What kinds of issues will we be able to address with research into system administration?

  • Evaluating system policies
  • General investigative, troubleshooting methods
  • Optimal strategies for solving problems
  • Elucidating failures and problems
  • Discovering the need for new technology
  • Comparing technologies
  • Predicting problems in advance
  • Building simulations of complex systems

Cause and Effect

The central principle behind the behavior of any mechanism is that every effect (every change) is the result of a cause. This might sound painfully obvious, but it is also the part of investigation that is so obvious that it is frequently forgotten by the inexperienced. The art of making an investigation, as Sherlock Holmes knew, is to build a chain (sometimes a more complicated web) connecting the observed evidence with a proposed theory. Until this is done, any "explanation" of a phenomenon is simply speculation or assertion.

  • Something happened or changed.
  • What changed?
  • Does it continue to change?
  • Measure the change.
  • What processes affect the values you measured?
  • Guess an explanation for the change (theory).
  • Test the theory by carrying out several experiments.

It is not always possible to prove or disprove results. Usually the world around us is so complex that we cannot know every detail of the causal chain, and uncertainties blur our understanding. This is where statistics come into the picture. If we do not get exactly the same result every time, then we are missing some piece of the puzzle, but maybe it doesn't matter. Maybe we only need to know the causal web approximately, taking into account the main reasons and ignoring the minor changes by calling them "errors" or "uncertainties." In this case we can never actually trace every detail from cause to effect and "prove" a theory. It is only possible to say that something is probably true. The corollary to the above list is this:

  • Do the experiments agree exactly? Were there some discrepancies?
  • Are the discrepancies as big as the measured effect, or small?
  • Is your proposed explanation plausible?
  • Look for every possible flaw in your argument.
  • Are there alternative explanations?

A brief note on case studies. Case studies are anecdotal evidence, useful for motivating work. Case studies are inherently flawed as scientific evidence, however, because they are never repeatable. They suffer from the sociologists' dilemma of never being able to repeat an observation under the same conditions. Thus case studies can never be used to prove a point, only to suggest an explanation. In many cases case studies may be the best we can do, but we should strive for more.

Statistics

Statistics is the course I hated most throughout my school and university years. I hated its pompous jargon and its incessant fascination with rolling dice. It was only later, when I chanced upon the meaning of statistics through the back alley of quantum physics, that I realized that there are, in fact, a few key ideas in statistics that my school-day arrogance had refused to let me see. They lie at the core of an opaque and intensely tedious melanoma of self-importance, whose turgid propensity for making simple things sound technical is paralleled only by postmodern philosophy. Whew! There, I said it and it felt really good! (Just for the record, I am laughing.)

Those who manage to see through or bypass the opacity of statistics courses learn to apply the few principles and understand their meaning. Nothing is worse than a bad statistical analysis for making nonsense of data. But a good statistical analysis can result in a significant improvement in understanding data. Of course, that assumes that it is relevant to apply statistical methods at all.

Statistics is really about classifying the different kinds of change that result from a complex web of cause and effect. We resort to statistics only in situations in which details are missing from our understanding of the link between cause and effect. For instance, user creates new file, new file appears. This experiment does not have to be repeated many times before one understands that the identical outcome results from the identical cause. We do not have to collect statistics about this, because the operations are so primitive that we can trace every part of the change and develop a theoretical explanation that details every link in the chain: it has to work.

Of course, the above example also illustrates the point that, on occasion, apparently well-understood phenomena surprise us. If the file system is incapable of creating a new file, that trusted experiment will also fail, and we must seek a new link in the causal chain to explain the result. Collecting statistics would not help us to understand why the act of creating a new file fails on occasion.

Statistics are useful when changes are tangled in a web of intrigue and we are not able to understand all of the influences or the results they give rise to. In that case we can do several things: if there are both big changes (trends) in data and small changes (fluctuations), then these can be separated by averaging. The average separates the large changes from the small ones, because it smudges out small details.
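One simple way to see averaging act as a separator is a moving average. This is only a sketch: the window size is an arbitrary choice, the data are invented, and the function names are hypothetical. The smoothed series stands for the trend, and what is left over at the center of each window stands for the fluctuation.

```python
def moving_average(values, window):
    """Return the mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def split_trend_and_fluctuation(values, window):
    """Return (trend, residuals); an odd window centers exactly on a sample."""
    trend = moving_average(values, window)
    offset = window // 2
    # Residual: the sample at the window's center minus the local average.
    residuals = [values[i + offset] - t for i, t in enumerate(trend)]
    return trend, residuals


if __name__ == "__main__":
    # Invented measurements: a slow rise with jitter on top.
    load = [1.0, 1.4, 0.9, 1.5, 2.1, 2.4, 1.9, 2.6]
    trend, residuals = split_trend_and_fluctuation(load, 3)
    print("trend:    ", [round(t, 2) for t in trend])
    print("residuals:", [round(r, 2) for r in residuals])
```

A wider window smudges out more detail. Choosing it is a judgment about which changes count as large and which as small.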

Normally one plots an average value and uses error bars to indicate the actual spread of values that were averaged over. The error bars indicate the size of the small fluctuations, or deviations from the "normal" average. If the size of the error bars (the spread of values) is not much smaller than the size of the effect we are looking for, then we cannot trust our statistical analysis: it tells us nothing meaningful (though it might help us quantify the meaninglessness). Indeed, it tells us that separation of large and small is not possible. If the error bars are small enough, then it means that the main features of the data can be understood in terms of the average values.
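The comparison between the error bars and the effect can be sketched in a few lines. The data are invented, the function names are hypothetical, and the factor of three is an arbitrary stand-in for "much smaller", not a formal significance test.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (average, spread) for repeated measurements of one quantity."""
    return mean(values), stdev(values)


def effect_is_resolved(a, b, factor=3.0):
    """True if the difference in averages clearly exceeds both spreads.

    `factor` is an arbitrary assumption about what "much smaller" means.
    """
    mean_a, spread_a = summarize(a)
    mean_b, spread_b = summarize(b)
    effect = abs(mean_a - mean_b)
    return effect > factor * max(spread_a, spread_b)


if __name__ == "__main__":
    # Invented timings in seconds under two configurations.
    quiet_a = [10.1, 9.9, 10.0, 10.2, 9.8]
    quiet_b = [12.0, 12.1, 11.9, 12.2, 11.8]
    noisy_a = [8.0, 12.0, 10.0, 13.0, 7.0]
    noisy_b = [9.0, 14.0, 11.0, 8.0, 13.0]
    print("quiet pair resolved:", effect_is_resolved(quiet_a, quiet_b))
    print("noisy pair resolved:", effect_is_resolved(noisy_a, noisy_b))
```

In the quiet pair the averages differ by far more than the spread of either set, so the difference can be discussed in terms of averages. In the noisy pair the spread swamps the difference, and the averages alone tell us nothing meaningful.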

What this does is to provide a justification for ignoring a minor source of change (the small fluctuations) that we do not want to deal with, allowing us to focus on the main effect. In other words, statistics is a filter for cleaning up a noisy signal or for classifying different parts of the noise. The techniques of statistical analysis are all variations on this simple theme. Statistics is a useful tool, just as a graphic equalizer is useful in a sound studio, but it is not a guarantee for finding meaning.

An Ongoing Discussion

We shall return to more concrete examples in the remainder of this series. With LISA 2000 approaching, I would like to encourage anyone with an interest in their systems to think long and hard about how they can participate in raising the discussion about system administration to a more scientific level. It is not necessary to present a finished tour de force. That is not the real purpose of a paper. It is more realistic to expect to be able to present work that raises questions that could require several years to answer. Ask yourself:

  • Do I have something to say? Even something small?
  • Will this advance the state of the field?
  • Will this contribute to a larger discussion?
  • Is it about the behavior of hosts?
  • Is it about the behavior of users?
  • Is it general, or is it a specific example?
  • Can others learn from my experiences?
  • Have I thought of every possible explanation?
  • Have I explained how my study fits into the context of the larger discussion?
 

Last changed: 20 nov. 2000 ah