Talking to an OpenStack Cluster in Plain English

Monday, May 22, 2017 - 4:30pm–4:55pm

Wei Xu, Tsinghua University


Modern systems build on layers and layers of abstractions with tons of modules. These abstractions help development but make the systems a nightmare to operate. OpenStack is a system of this kind: its states, including persistent (DB) states, are distributed across dozens of modules in the system. Operators have to access these states using obscure command line tools that have hundreds of switches no one remembers. Integrating it with other open source projects like Ceph further complicates the problem. Reasoning about the inconsistencies of these states – one of the leading causes of user-visible bugs – is beyond the capability of current log-based monitoring systems.

As both system operation practitioners and academic researchers, we discuss our experience in operating a 130-node OpenStack private cloud, as well as our research on automatically building a knowledge graph from system states and logs. We will demonstrate our natural language interface, which can provide all information about the system, across layers and modules – all with plain English queries. Finally, we also present a simple anomaly detection system that indicates “why” a problem happens.

Wei Xu is an assistant professor at the Institute for Interdisciplinary Information Sciences of Tsinghua University. He received his Ph.D. from UC Berkeley in 2010. He worked at Google as a software engineer before joining Tsinghua University.

Wei Xu has broad research interests in distributed system design and big data. He has published 20+ research papers in leading venues.

He is also the director of the Open Compute Project (OCP) Certification Lab in China. He is a recipient of the Chinese National Youth 1000 Program, a graduate student advising award from Tsinghua, and faculty research awards from Google, IBM, and Microsoft.

