NSDI '05 Abstract
IP Fault Localization Via Risk Modeling
Ramana Rao Kompella, University of California, San Diego; Jennifer Yates and Albert Greenberg, AT&T Labs-Research; Alex C. Snoeren, University of California, San Diego
Automated, rapid, and effective fault management is a central goal of
large operational IP networks. Today's networks suffer from a wide
and volatile set of failure modes, where the underlying fault proves
difficult to detect and localize, thereby delaying repair. One of the
main challenges stems from operational reality: IP routing and the
underlying optical fiber plant are typically described by disparate
data models and housed in distinct network management systems.
We introduce a fault-localization methodology based on the use of risk
models and an associated troubleshooting system, SCORE (Spatial
Correlation Engine), which automatically identifies likely root causes
across layers. In particular, we apply SCORE to the problem of localizing link
failures in IP and optical networks. In experiments conducted on
a tier-1 ISP backbone, SCORE proved remarkably effective at localizing
optical link failures using only IP-layer event logs.
SCORE was often able to automatically uncover inconsistencies in the
databases that maintain the critical associations between the IP and optical networks.
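This abstract does not describe SCORE's internal algorithm. As a rough illustration of risk-model-based localization, the following is a minimal sketch that assumes each risk group (for example, a shared fiber span or optical component) maps to the set of IP links that depend on it. It then greedily picks the groups that best explain the observed IP-layer link failures. The greedy set-cover heuristic, the hit-ratio filter, the function name `localize`, the `hit_threshold` parameter, and all group and link names are illustrative assumptions, not the paper's published method.

```python
from typing import Dict, List, Set


def localize(
    risk_groups: Dict[str, Set[str]],
    failed_links: Set[str],
    hit_threshold: float = 0.8,
) -> List[str]:
    """Hypothetical greedy risk-model localization (illustrative only).

    risk_groups maps a shared-risk element (e.g. a fiber span) to the
    IP links that would fail together if that element failed.
    """
    unexplained = set(failed_links)
    hypothesis: List[str] = []
    # Keep only groups whose links mostly failed: the "hit ratio"
    # |group & failed| / |group| must reach the (assumed) threshold.
    candidates = {
        name: links
        for name, links in risk_groups.items()
        if links and len(links & failed_links) / len(links) >= hit_threshold
    }
    while unexplained:
        # Pick the candidate explaining the most still-unexplained failures;
        # ties go to the group listed first in the dict.
        best = max(
            candidates,
            key=lambda name: len(candidates[name] & unexplained),
            default=None,
        )
        if best is None or not candidates[best] & unexplained:
            break  # remaining failures are not explained by any candidate
        hypothesis.append(best)
        unexplained -= candidates[best]
        del candidates[best]
    return hypothesis


if __name__ == "__main__":
    groups = {
        "fiber-A": {"r1-r2", "r1-r3", "r4-r5"},
        "fiber-B": {"r2-r3"},
        "amp-C": {"r1-r2", "r2-r3", "r6-r7"},
    }
    failed = {"r1-r2", "r1-r3", "r4-r5", "r2-r3"}
    print(localize(groups, failed))  # -> ['fiber-A', 'fiber-B']
```

In the example, "amp-C" is filtered out because only two of its three links failed (hit ratio about 0.67, below the assumed 0.8). The two remaining groups together explain every observed failure. Lowering the threshold trades false negatives for more candidate root causes.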
The Proceedings are published as a collective work, © 2005 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.