Modulo: Finding Convergence Failure Bugs in Distributed Systems with Divergence Resync Models

Authors: 

Beom Heyn Kim, Samsung Research, University of Toronto; Taesoo Kim, Samsung Research, Georgia Institute of Technology; David Lie, University of Toronto

Abstract: 

While there exist many consistency models for distributed systems, most of those models seek to provide the basic guarantee of convergence: given enough time and no further inputs, all replicas in the system should eventually converge to the same state. However, because of Convergence Failure Bugs (CFBs), many distributed systems do not provide even this basic guarantee. The violation of the convergence property can be crucial to safety-critical applications collectively working together with a shared distributed system. Indeed, many CFBs are reported as major issues by developers. Our key insight is that CFBs are caused by divergence, or differences between the state of replicas, and that a focused exploration of divergence states can reveal bugs in the convergence logic of real distributed systems while avoiding state explosion. Based on this insight, we have designed and implemented Modulo, the first Model-Based Testing tool using Divergence Resync Models (DRMs) to systematically explore divergence and convergence in real distributed systems. Modulo uses DRMs to explore an abstract state machine of the system and derive schedules, the intermediate representation of test cases, which are then translated into test inputs and injected into systems under test (SUTs). We ran Modulo to check ZooKeeper, MongoDB, and Redis and found 11 bugs (including 6 previously unknown ones)

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

Presentation Video