Next: COPE Up: Implementing Causal Logging using Previous: Abstract

Introduction

In nearly all distributed systems, some data values are replicated across different processors. For example, one might have a distributed cache to allow reads to be performed more quickly. Or, one might have multiple copies of a critical data structure to ensure that should a processor crash or become isolated by a network failure, a copy of the data will still remain accessible. Replicated data management is therefore an essential service of distributed systems.

One issue arising in replicated data management is how consistent the replicas need to be. Considerable effort has gone into defining and analyzing different consistency models. A very strong requirement is for the replicas to reflect the real-time order in which they were written. For example, consider two values x and y that are replicated on processors a, b, c and d, and let the initial values of x and y be zero. If processor a sets x to 1 before processor b sets y to 2, then c and d will both see x set to 1 before seeing y set to 2 (the same, of course, holds for a and b). This kind of consistency, called atomic consistency [7], is in general expensive to implement, and thus used only when absolutely required. A weaker form of consistency is called sequential consistency [6] in which some total order on updates is imposed. With the above example, c and d will both see the same order of updates to x and y, but not necessarily the update of x before the update of y. Sequential consistency is less expensive to implement than atomic consistency and yet is often strong enough for many applications.

A still weaker consistency property is causal consistency [1]. Suppose that processor b did not update y until it read x and found a value of 1. In this case, we say that the value of y causally depends upon the value of x--that is, the act of a setting x to 1 in some sense caused b to set y to 2. Causal consistency ensures that the sequence of updates as read by other processors is consistent with causal dependency. In this case, both c and d would see the update of x before the update of y. On the other hand, if b did not read x before setting y, then the updates are said to be concurrent. Concurrent updates are not ordered, and so c could see x updated before y while d could see y updated before x.

Preserving only causal dependencies among replicated data is sufficient for many applications [1]. We give an example of such an application later in this paper, namely optimistic execution in an object-oriented environment. And, causal consistency is cheaper to implement than sequential consistency in a wide-area setting. A problem with wide-area networks is the large variance in communication latency. With sequential consistency, a processor that updates a shared variable must ensure that its update is ordered with any other potentially concurrent updates, and so the latency of an update can be no smaller than the longest latency from the updating processor to a copy of the data [11]. Causal consistency does not require concurrent updates to be ordered and so a processor can simply update its local copy and continue. There can, however, be latency introduced when reading a shared variable [10].

Latency can be reduced by implementing causal consistency through a technique called causal logging [3]. With causal logging, a message that updates a shared variable piggybacks the updates upon which the variable causally depends. That is, causal logging trades off bandwidth for latency. Causal logging has been used in several applications besides implementing causally consistent replicated data, including distributed simulation [5] and techniques for low-cost failure recovery [2]. These applications all share the same general property: a process does not observe an action (such as the delivery of a message or the update of a shared variable) until it has observed all actions that the observed action causally depends upon.

In one of our research projects, we faced the problem of implementing causal logging when constructing a CORBA service. We call this service COPE and briefly describe it in Section 2. To implement this service in CORBA, we hoped to use an interception facility. Interception is a way to add functionality to CORBA services in a manner that is orthogonal and non-intrusive to the main computation. CORBA interception is implemented using interceptors which are code that can be invoked upon a message being sent, or upon a message being received (as well as other trigger points). We describe why this facility allows for a straightforward implementation of causal logging, as well as describing this implementation, in Section 3. We chose OrbixWeb as the platform for implementing COPE because it is a popular Java ORB that provides interception facilities. We detail these features in Section 4.

COPE is a somewhat complex CORBA service, and to the best of our knowledge no other group has considered a service with similar functionality. It shares some features with proposed CORBA security services, most notably Application Access Policy [9]. In implementing COPE, we encountered several problems with OrbixWeb, which we describe in Section 5. We believe that the problems we encountered will also be encountered by those implementing similar CORBA services. Our goal in writing this paper is to discuss the problems we encountered in hope that those building CORBA ORBs will be aware of them when building interception facilities.

Next: COPE Up: Implementing Causal Logging using Previous: Abstract

1999-03-21