As briefly described earlier, most UNIX and UNIX-like operating systems maintain a large number of statistics on events that occur during system operation. These statistics, along with other configuration information, are accessed through a number of command-line utilities (e.g., vmstat, iostat, and netstat). Problem diagnosis usually consists of manually collecting the relevant statistics and inspecting them for aberrant values.
In organizations that run large numbers of UNIX systems, system administrators often rely on home-grown (or commercially available) frameworks of automated scripts to collect information from many systems and analyse the resulting values. There is a wealth of literature describing these tools [29,10,9,2]. In some respects, this resembles our technique of continuous monitoring. However, the information these scripts gather is limited to the granularity at which each operating system exports its statistics. This granularity is usually too coarse for the kind of extensive automatic diagnosis that we can perform inside the operating system kernel with reasonable overhead. These environments are also limited in the types of active tests they can run to pinpoint problems.