
Extensibility issues

An auto-diagnosis system built around the techniques described above must be extensible. As explained earlier, the checks and actions performed by the continuous monitoring logic need to be developed in a phased and conservative manner. Each time a new version of this logic becomes available, a vendor may want to upgrade the systems in the field with it, even if customers do not wish to upgrade the rest of the system. A customer may not want to take on the risk associated with a new software release, or may not want to pay for the release, especially if it contains no functionality that the customer needs. It is, however, usually in the vendor's interest to upgrade the auto-diagnosis logic, because the associated risk is small and the potential benefit is lower support costs.

For example, an appliance problem may first be discovered at one customer's installation because of an environment change, e.g., the addition of a new model of some hardware to the network fabric. Because the problem has not been seen before, debugging it may require significant effort by human experts. Ideally, we would like to leverage this effort by codifying the debugging logic used in the manual diagnosis into the appliance's auto-diagnosis logic and upgrading the auto-diagnosis subsystems of all systems in the field. Subsequent instances of the problem, which would otherwise require significant human intervention, can then be diagnosed automatically, saving considerable time and effort.

Extensibility can be achieved in a variety of ways. One method is for the continuous monitoring system to read a configuration file containing equations that define the periodic checks the monitoring system performs, together with the conditions that either trigger the flagging of an ERROR state or cause an active subtest to be executed. This requires a language for expressing the logic of the periodic checks, and an interpreter for this language must be part of the problem auto-diagnosis subsystem.
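To make the configuration-file approach concrete, the following is a minimal sketch of such a rule language and its interpreter. It is an illustration under stated assumptions, not the format used by any actual appliance: the rule syntax (`name: condition -> action`), the metric names (`disk_used_pct`, `rx_errors`, `tx_errors`, `packets`), and the subtest name `link_check` are all hypothetical. The interpreter accepts only arithmetic, comparisons, and boolean connectives, so a rule file shipped to the field cannot execute arbitrary code.

```python
import ast
import operator

# Hypothetical rule file: one rule per line, "<name>: <condition> -> <action>".
# All metric names, thresholds, and actions here are invented for illustration.
RULES_TEXT = """
# name: condition -> action
disk_full: disk_used_pct > 95 -> ERROR
net_errors: (rx_errors + tx_errors) / packets > 0.01 -> SUBTEST link_check
"""

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_CMP_OPS = {ast.Gt: operator.gt, ast.GtE: operator.ge, ast.Lt: operator.lt,
            ast.LtE: operator.le, ast.Eq: operator.eq, ast.NotEq: operator.ne}


def evaluate(node, metrics):
    """Evaluate a restricted expression tree against a dict of metric values."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, metrics)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return metrics[node.id]  # KeyError if the rule names an unknown metric
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](evaluate(node.left, metrics),
                                       evaluate(node.right, metrics))
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in _CMP_OPS):
        return _CMP_OPS[type(node.ops[0])](evaluate(node.left, metrics),
                                           evaluate(node.comparators[0], metrics))
    if isinstance(node, ast.BoolOp):
        values = [evaluate(v, metrics) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    # Anything else (calls, attribute access, ...) is rejected outright.
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def parse_rules(text):
    """Parse rule text into a list of (name, expression tree, action) tuples."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, rest = line.split(":", 1)
        condition, action = rest.rsplit("->", 1)
        tree = ast.parse(condition.strip(), mode="eval")
        rules.append((name.strip(), tree, action.strip()))
    return rules


def run_checks(rules, metrics):
    """Return (rule name, action) for every rule whose condition holds."""
    return [(name, action) for name, tree, action in rules
            if evaluate(tree, metrics)]


if __name__ == "__main__":
    rules = parse_rules(RULES_TEXT)
    sample = {"disk_used_pct": 97, "rx_errors": 1, "tx_errors": 1, "packets": 1000}
    print(run_checks(rules, sample))
```

In this sketch, upgrading the auto-diagnosis logic in the field amounts to replacing the rule text; the interpreter and the rest of the system software stay unchanged. A production rule language would likely also need time windows, rates of change, and references to configuration history, which are omitted here.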


Gaurav Banga
2000-04-24