



Issues in Object Caching

  This section discusses the problems, independent of consistency levels, that must be addressed in building an object caching system, along with the possible solutions.

1. Object Faulting and Access Detection: A method invocation on an object can be executed locally if the object's state currently resides in the cache. Executing a method invocation therefore requires determining whether the object state is in the cache. If the object is non-resident, it must be brought to the client and made available to the program in memory; this is called object faulting [13]. Concurrent sharing of cached objects also raises the issue of consistency among the copies, and protocols for maintaining consistency of shared objects may require the system to detect updates to object state.

Many present-day operating systems allow user-level programs to exploit virtual memory (VM) mechanisms (e.g., the mmap and mprotect calls) to manipulate page protections. These mechanisms can be used for object faulting: VM-based detection of access violations, coupled with user-defined handlers, can implement both object faulting and the detection of accesses to shared objects. Several software schemes also exist for object faulting and access detection, including tagging to distinguish references to resident objects from references to non-resident ones [13]. Object-oriented programming languages can exploit the indirection [9] implicit in the method invocation mechanism to fold residency checks into the overhead of method invocation. In IBM's SOM, one could use the BeforeAfter metaclass mechanism [10], which allows a before method and an after method to be called before and after (respectively) every method of a class; the metaclass thus gains control around each object invocation and can handle the details of object faulting and access detection. In [37], the authors present a method for write detection in a DSM system that relies on the compiler and runtime system.
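As an illustration, the following sketch (ours, not from the paper) combines the VM mechanisms above: pages holding non-resident objects are initially protected, and a user-defined SIGSEGV handler performs the fault handling. The helpers lookup_object and fetch_object_state are hypothetical placeholders for the cache's internals.

    #include <signal.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    extern void *lookup_object(void *addr);     /* hypothetical: address -> object */
    extern void fetch_object_state(void *obj);  /* hypothetical: bring state over  */

    /* Pages holding non-resident objects are mapped PROT_NONE, so any
       access traps into this handler. */
    static void fault_handler(int sig, siginfo_t *info, void *ctx)
    {
        long pagesize = sysconf(_SC_PAGESIZE);
        void *page = (void *)((uintptr_t)info->si_addr & ~(uintptr_t)(pagesize - 1));

        fetch_object_state(lookup_object(info->si_addr));
        /* Re-enable access; the faulting instruction is then restarted. */
        mprotect(page, pagesize, PROT_READ | PROT_WRITE);
    }

    void install_fault_handler(void)
    {
        struct sigaction sa;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = fault_handler;
        sigaction(SIGSEGV, &sa, 0);
    }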

The VM-based scheme induces page faults, which have considerable overhead, and large page sizes can cause false sharing [3]. However, VM mechanisms impose caching overhead only on object faults and some object accesses, whereas software schemes require extra instructions on every fetch and/or store. In [13, 37], the authors argue that software-based schemes outperform VM-based schemes, but these arguments are made either in the context of an entry-consistent DSM system [37] or in the context of implementing persistent stores and garbage collection [13]. On the other hand, a previous study [7] of granularity choices for data transfer in client-server object-oriented database management systems (OODBMS) favored page servers (mainly because of the consequent advantages of object clustering). Thus, there is no clear choice, and we chose the VM-based scheme.

Another issue that arises in object faulting is the difficulty of detecting the type of access - namely, read-only or read-write - while faulting on an object. When a simple memory object is faulted (as in DSM systems), the type of access is readily known, since the operations are elementary ones (reads or writes). Objects, however, are accessed through method invocations, and when a method is invoked on an object the type of access is not obvious. Faulting in an object copy in read-only mode, only to discover during the method invocation that a read-write copy was needed, can be inefficient. We take the approach of having the class implementor specify the access type associated with each method invocation. Such information must be placed outside the object so that the access mode in which to request a copy can be decided during object faulting; consequently, access to this information should be made very efficient, possibly by caching the relevant parts locally.
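One plausible realization (a sketch under our assumptions, not the paper's actual data structure) is a per-class table, kept outside the objects, mapping each method name to its declared access type; the fault path consults it to decide whether to request a read-only or a read-write copy.

    #include <map>
    #include <string>

    enum AccessType { READ_ONLY, READ_WRITE };

    class MethodAccessTable {
        std::map<std::string, AccessType> table_;
    public:
        /* Called by the class implementor at class-installation time. */
        void declare(const std::string &method, AccessType t) { table_[method] = t; }

        /* Undeclared methods default to READ_WRITE, the safe choice. */
        AccessType lookup(const std::string &method) const {
            std::map<std::string, AccessType>::const_iterator it = table_.find(method);
            return it == table_.end() ? READ_WRITE : it->second;
        }
    };

For example, an implementor might declare get as READ_ONLY and set as READ_WRITE; caching such tables locally, as suggested above, keeps the lookup off the critical path.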

2. Control in caching: We alluded to the issue of control in caching in section 1; here, we recapitulate the issues that make control necessary. Not all objects encapsulate only data: some are really wrappers around control or hardware devices, so it is not desirable to cache all kinds of objects. Once an object is to be cached, the question of the most appropriate consistency level for it remains. Objects also hold references to other objects, and when an object is faulted in, it is not always clear whether the referenced objects should be faulted in as well. Control over these aspects of caching can be exercised by one or more of the following entities, among others.

  1. Class implementor: This is based on the assumption that the class implementor knows best how the object should be implemented.
  2. Client application: Usage patterns of objects may vary depending upon the application. In such cases, client application initiated caching and consistency maintenance can be very useful.
  3. System: Traditional considerations such as load balancing are best handled by the underlying system.

Clearly, no single method of control is suitable for all purposes. For example, an implementor of a class may choose to provide caching for objects of that class on the grounds that most applications would benefit from caching such objects. While this benefits applications that need the default behaviour, some applications may have different requirements; for example, they may choose not to cache these objects. Similarly, the system may decide not to cache instances of the class if the machine on which the client application executes lacks sufficient resources. This scenario illustrates that while the class implementor may be able to decide for the general case, client applications and/or the system may also need control. In our system, we have implemented class-implementor-initiated caching.

The class implementor can choose to inherit (directly or indirectly) from one of the classes in the consistency framework, depending on the need for caching and the desired consistency level. In addition, one of the classes in the consistency framework (the Cached class) provides two methods (readFromString and writeToString) as part of its interface that give the class implementor control over the amount of object state exchanged between processes. We describe the consistency framework in section 4.1.
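The following sketch illustrates implementor-initiated caching as we understand it; only the names Cached, readFromString, and writeToString come from the framework, and the signatures and the example class are our assumptions.

    class Cached {
    public:
        virtual void readFromString(const char *buf) = 0;  /* install shipped state */
        virtual char *writeToString() = 0;                 /* extract state to ship */
        virtual ~Cached() {}
    };

    /* A class implementor enables caching by inheriting from Cached and
       deciding exactly which fields cross process boundaries. */
    class Counter : public Cached {
        long count_;
    public:
        void readFromString(const char *buf) { /* unpack count_ from buf */ }
        char *writeToString() { /* pack count_ into a new buffer */ return 0; }
        void increment() { ++count_; }
    };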

3. Cache Organization: Several issues arise concerning the memory pool that constitutes the cache. First, clients may or may not be allowed to access the cache directly. For example, in Spring [27], clients do not access the cache directly; instead, they communicate with a local proxy server, which in turn has direct access to the cache; in other words, objects accessed by clients are actually cached by the proxy server. Such a proxy server provides better security, since all client invocations can be intercepted by it. This design takes on added importance in object caching because it is customary to give clients access to specific methods of an object rather than to the entire object. However, with this approach every client access to cached objects incurs inter-process communication overhead. We chose to allow client processes direct access to shared objects in the present implementation because of the high overhead of inter-address-space communication in UNIX. In the future, we intend to build a user-level RPC (URPC) [4] like mechanism to reduce inter-process communication overhead, at which point we will investigate the proxy server approach.

Once we decided to allow clients direct access to cached objects, the second issue that arises is whether the memory pool used for caching should be shared across all the clients on a node. We call this scheme per-node caching. The alternative is for each client to have its own pool, which we call per-process caching.

  
Figure 1: Per-node caching and legacy applications

The per-node caching scheme has the advantage that object faulting and consistency-related actions need to be performed on only a single copy at a node, even when several clients at the node access the cached object. However, per-node caching also has disadvantages; a notable one concerns enabling caching in legacy applications. Consider the class definition given for myClass in figure 1, and suppose that client applications A and B used to access obj (in figure 1) through remote invocations on a server and would now like to enable caching for objects of type myClass. One way to do this is to define a new class myNewClass that inherits from myClass and from a class that enables caching (such as the Cached class shown in figure 2 in the next section); the client application then creates instances of myNewClass. Now, A creates obj and caches it. However, when it invokes the set method on obj, it assigns ptrVar in obj to an area of memory in A's private address space. When B executes, it gets the right handle to obj, but its attempt to dereference the pointer results in a segmentation violation.

An obvious solution to this problem is to change the implementation of myClass so that the set method makes ptrVar point to an area in the shared pool of memory that constitutes the cache. One way to do this is to redefine ptrVar to be a pointer of type newLong and define the constructor of this type to allocate memory from shared memory instead of the process's private address space. However, changing (and recompiling) legacy code may not always be an acceptable option. This issue persists even in other approaches to enabling caching, such as a scheme that employs runtime inheritance [26].
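A sketch of the newLong idea follows, assuming a hypothetical shared-pool allocator (shm_alloc/shm_free) over the memory pool that constitutes the cache; by overloading operator new, instances are placed in shared memory without touching the call sites.

    #include <cstddef>

    extern void *shm_alloc(std::size_t n);  /* hypothetical shared-pool allocator */
    extern void  shm_free(void *p);

    struct newLong {
        long value;
        void *operator new(std::size_t n) { return shm_alloc(n); }
        void  operator delete(void *p)    { shm_free(p); }
    };

With this definition, set can execute ptrVar = new newLong, and the pointee lives in the shared pool, where other clients mapping the pool can safely dereference it; the drawback, as noted above, is that myClass must be changed and recompiled.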

One way around this problem is to route method invocations on shared objects through stub routines that perform the necessary marshalling/unmarshalling of arguments. Another is to allow such pointers into private address spaces and handle segmentation faults as they occur. Both approaches introduce additional overhead on each method invocation. In our implementation, we employ per-process caching. The reason the scenario in figure 1 does not cause problems under per-process caching is, briefly, the following: when B tries to access the shared object (obj), it cannot obtain a pointer to A's copy of obj; instead, the entire state of the object is fetched and stored in B's per-process cache, precluding any access to memory belonging to another process's address space.
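To make the contrast concrete, here is a sketch (under our assumptions; the handle type and helpers are hypothetical) of how per-process caching resolves an object handle: a client never imports another process's pointer but instead fetches the whole object state into its own pool.

    #include <map>

    typedef unsigned long ObjectId;          /* hypothetical object handle */

    extern char *fetch_state(ObjectId id);   /* hypothetical: obtain full state  */
    extern void *install_copy(ObjectId id, char *state); /* unpack into own pool */

    class PerProcessCache {
        std::map<ObjectId, void *> resident_;
    public:
        void *resolve(ObjectId id) {
            std::map<ObjectId, void *>::iterator it = resident_.find(id);
            if (it != resident_.end())
                return it->second;           /* already resident locally */
            /* B never sees A's pointer: the entire state is copied here. */
            void *copy = install_copy(id, fetch_state(id));
            resident_[id] = copy;
            return copy;
        }
    };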

4. Cache Management: As described before, clients on a node can freely cache copies of objects to which they have access. To maintain consistency among the copies, mechanisms are required to invalidate or update object copies and to extract state from them. Requests for these actions can arrive asynchronously with respect to the execution of a client process; for example, when an object's state is updated at a remote node, the client may receive a message asking it to invalidate its local copy. Such requests can be handled in several ways. However, Fresco's object transport interface is limited by the interface provided by Sun RPC, on which it is layered, and the transport runtime is not multi-threaded. Here, we discuss possible solutions in this context.

Our approach is to create a cache manager process that shares the memory pool used for caching with the client and fields external (consistency) requests, which may require invalidations, updates, and sometimes extraction of the state of an object copy. In our implementation, two processes share memory by opening a common file and mmap-ing it into their respective address spaces. We considered three approaches; each has problems, which we discuss below:
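A minimal sketch of this sharing setup follows; the file name and pool size are placeholders. Both the client and its cache manager call this routine on the same file and thereafter see each other's writes to the pool.

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    void *map_cache_pool(const char *path, size_t poolsize)
    {
        int fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0)
            return 0;
        ftruncate(fd, poolsize);             /* make the file large enough   */
        void *pool = mmap(0, poolsize, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        close(fd);                           /* the mapping survives close() */
        return pool == MAP_FAILED ? 0 : pool;
    }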

  1. In the first approach, one additional cacher process exists per node, sharing memory pair-wise with each of the clients; we call this process the node cacher. Since the node cacher shares memory with all clients on the node, it needs to open one file per client and may exceed the limit on the maximum number of file descriptors that a process can have open. This problem can be solved by managing the open file descriptors independently of the number of client processes on the node.

  2. The second approach is to have one additional cacher process per client process; we call such processes process cachers. Each client process shares memory with its process cacher. This approach has two problems:
    1.   By doubling the number of processes on a node (one additional process cacher per client), general performance degradation can be expected.
    2.   Consider the case where multiple client processes on the same node (say N1) have copies of some object. Suppose that a client on another node (say N2) wants to access the object and the consistency actions require communication with all the copies (e.g., to invalidate them). In order to communicate with the copies on node N1, multiple remote invocations from N2 to N1 are required. This can be avoided if incoming requests to a node are channelled through some process (for example, the node cacher), which is the approach taken in the next case.
  3. This approach is the same as the previous one except that it additionally has a node cacher (which does not share memory with clients); all incoming requests are channelled through the node cacher which then contacts the individual process cachers. This approach has the following problems:
    1. It has the same problem as in case (2a) above.
    2. This approach incurs the overhead of an additional inter-process communication (IPC) between the node cacher and the process cachers.

    The prototype presented in this paper adopts the last approach. However, based on our experience, we would like to move away from it in the next revision of the caching system; in particular, we would like to adopt the second approach while moving the node cacher functionality into the object manager or a server.
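For concreteness, a sketch of the adopted fan-out path follows; the message layout and the local IPC primitive are our assumptions. A single remote request reaches the node cacher, which relays it over local IPC to each process cacher (the extra hop noted in problem 3b above).

    #include <vector>

    struct ConsistencyRequest {
        unsigned long object_id;
        int action;                      /* e.g., INVALIDATE or UPDATE */
    };

    extern void send_local(int cacher_fd, const ConsistencyRequest &req);
                                         /* hypothetical local IPC send */

    class NodeCacher {
        std::vector<int> process_cachers_;  /* one connection per client */
    public:
        /* One remote invocation reaches the node; fan-out is local. */
        void handle(const ConsistencyRequest &req) {
            for (std::vector<int>::size_type i = 0; i < process_cachers_.size(); ++i)
                send_local(process_cachers_[i], req);
        }
    };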

5. Object Implementation: So far, we have focused on managing object state when objects are cached. A related issue important in object caching is that of installing, finding, and using the code that implements an object's methods; we refer to this code as the object's implementation. A number of choices exist for each of these activities. For example, in Emerald [17], object implementations are stored in concrete type objects. When a kernel receives an object's state, it determines whether a copy of the concrete type object implementing that object already exists locally; if not, the kernel obtains a copy from another node using a location algorithm. In [34], code mobility in Emerald is achieved on heterogeneous computers at the native code level; migrated code runs at native code speed both before and after migration.

In the Common Object Request Broker Architecture (CORBA 1.2) document [28], on the other hand, object implementation information is provided at installation time and stored in the Implementation Repository for use during request delivery. Object adaptors in CORBA are responsible, among other duties, for interpreting object references and mapping them to the corresponding object implementations. These functions are performed with the cooperation of the ORB Core and the implementation skeletons: when a client invokes a method on an object, the ORB Core, object adaptor, and implementation skeleton at the server end arrange for a call to be made to the appropriate method of the implementation. A similar mechanism can be used at the client end to invoke implementations of cached objects, possibly with the aid of a library object adaptor linked with clients that cache objects. This may necessitate installing object implementations at each site of use.
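The following sketch shows one way such a library object adaptor might dispatch at the client; every name here is our own assumption, since CORBA 1.2 does not standardize a client-side implementation path. A locally installed implementation is tried first, with fall-back to an ordinary remote invocation through the ORB.

    #include <map>
    #include <string>

    struct Request { /* marshalled method name and arguments */ };
    typedef void (*MethodStub)(void *impl, Request &r);

    extern void remote_invoke(const std::string &objref, Request &r);
                                          /* usual path through the ORB core */

    class LibraryObjectAdaptor {
        std::map<std::string, void *>     impls_;  /* objref -> local implementation */
        std::map<std::string, MethodStub> stubs_;  /* method -> skeleton entry point */
    public:
        void invoke(const std::string &objref, const std::string &method,
                    Request &r) {
            std::map<std::string, void *>::iterator it = impls_.find(objref);
            std::map<std::string, MethodStub>::iterator st = stubs_.find(method);
            if (it != impls_.end() && st != stubs_.end())
                st->second(it->second, r);         /* dispatch to cached copy */
            else
                remote_invoke(objref, r);          /* no local implementation */
        }
    };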

This issue is particularly important in the context of CORBA-compliant systems, since an important goal of CORBA is to provide interoperability between applications on different machines in heterogeneous distributed environments and to seamlessly interconnect multiple object systems [29]. In our implementation, however, we assume that the implementation of a cached object is available; in fact, the application code is either compiled or dynamically linked with the code that implements the shared objects.





