System Flexibility

In this section, we demonstrate Jupiter's flexibility by examining several configurations of the system's building-block modules. We focus on a recurring example: the object creation subsystem. Through this example, we present several hypothetical ways in which Jupiter could be modified to exploit memory locality on a non-uniform memory access (NUMA) multiprocessor system. In such a system, accessing local memory is faster than accessing remote memory, so it is desirable to use local memory whenever possible.

Object creation begins with ObjectSource, whose getObject method takes a Class to instantiate and returns a new instance of that class. At the implementation level, Java objects are composed of two resources: memory to store field data, and a monitor to synchronize accesses to this data. To allocate the memory and monitor for a new object, the ObjectSource uses a MemorySource and a MonitorSource, respectively. The MemorySource may be as simple as a call to a garbage-collected allocator such as the Boehm conservative collector [12]. Typically, the MonitorSource uses that same MemorySource to allocate a small amount of memory for the monitor.

**Figure 3:** A simple object allocation building-block structure.

The modules used by this simple scheme are shown in Figure 3, where arrows indicate the "uses" relation between modules. The ExecutionEngine at the top is responsible for executing the bytecode instructions, and calls upon various facility classes, of which only ObjectSource is shown. The remainder of this section explores the system modifications that can be implemented by reconfiguring the building blocks of this archetypal object allocation scheme.
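The simple scheme can be sketched in C as follows. This is a minimal illustration rather than Jupiter's actual code: the struct layouts, the `getMemory` signature, the `instanceSize` field, and the use of `calloc` in place of a garbage-collected allocator are all assumptions made for the example. Only the names ObjectSource, MemorySource, MonitorSource, getObject, and getMemory come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical interfaces: names follow the text, signatures are assumed. */
typedef struct MemorySource {
    void *(*getMemory)(struct MemorySource *self, size_t size);
} MemorySource;

typedef struct Monitor { int lockCount; } Monitor;

typedef struct MonitorSource {
    MemorySource *memory;  /* monitors are carved from the same MemorySource */
} MonitorSource;

typedef struct Class { size_t instanceSize; } Class;  /* bytes of field data */

typedef struct Object {
    const Class *cls;
    Monitor *monitor;  /* synchronizes access to the fields */
    void *fields;      /* storage for field data */
} Object;

typedef struct ObjectSource {
    MemorySource *memory;
    MonitorSource *monitors;
} ObjectSource;

/* Stand-in allocator: zeroing calloc instead of a garbage-collected one. */
void *plainGetMemory(MemorySource *self, size_t size) {
    (void)self;
    return calloc(1, size);
}

Monitor *getMonitor(MonitorSource *src) {
    return src->memory->getMemory(src->memory, sizeof(Monitor));
}

/* Builds an object from its two resources: field memory and a monitor.
   Error handling is deliberately minimal in this sketch. */
Object *getObject(ObjectSource *src, const Class *cls) {
    Object *obj = src->memory->getMemory(src->memory, sizeof(Object));
    if (obj == NULL) return NULL;
    obj->cls = cls;
    obj->fields = src->memory->getMemory(src->memory, cls->instanceSize);
    obj->monitor = getMonitor(src->monitors);
    return obj;
}
```

In this sketch the ExecutionEngine would hold a single ObjectSource and call `getObject` on it; it never sees the MemorySource or MonitorSource behind it.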

Suppose the memory allocator on a NUMA system takes a node number as an argument and allocates memory in the physical memory module associated with that node:

 void *nodeAlloc(int nodeNumber, int size);

We can make use of this interface even though the getMemory function of the MemorySource facility takes no nodeNumber argument. We do so by creating one MemorySource object for each node in the system, each bound to its node number. We then choose the node on which to allocate an object by calling that node's MemorySource.
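One way to picture this is a MemorySource that stores its node number and passes it to `nodeAlloc` on every request. The sketch below assumes a node count of four, a function-pointer `getMemory` signature, and a stand-in `nodeAlloc` that counts bytes per node and then calls `malloc`, since real node-local allocation is platform specific. None of these details come from Jupiter itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_NODES 4  /* assumed node count for this sketch */

/* Bytes requested per node; instrumentation for the sketch only. */
size_t bytesOnNode[NUM_NODES];

/* Stand-in for the NUMA allocator in the text: same signature, but it
   only records the request and falls back to malloc. */
void *nodeAlloc(int nodeNumber, int size) {
    assert(nodeNumber >= 0 && nodeNumber < NUM_NODES);
    bytesOnNode[nodeNumber] += (size_t)size;
    return malloc((size_t)size);
}

/* Hypothetical MemorySource: getMemory takes no node argument; the node
   is fixed when the source is created. */
typedef struct MemorySource {
    void *(*getMemory)(struct MemorySource *self, int size);
    int nodeNumber;
} MemorySource;

void *nodeGetMemory(MemorySource *self, int size) {
    return nodeAlloc(self->nodeNumber, size);
}

MemorySource nodeSources[NUM_NODES];  /* one MemorySource per node */

void initNodeSources(void) {
    for (int n = 0; n < NUM_NODES; n++) {
        nodeSources[n].getMemory = nodeGetMemory;
        nodeSources[n].nodeNumber = n;
    }
}
```

After `initNodeSources()`, a caller that wants memory on node 2 simply uses `nodeSources[2]`; the getMemory interface itself is unchanged.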

**Figure 4:** Locality decisions made at the `MemorySource` level.

There are a number of ways the ExecutionEngine can make use of these multiple MemorySources. One way would be to use a "facade" MuxMemorySource module that chooses which subordinate node-specific MemorySource to use, in effect multiplexing several MemorySources into one interface. This is shown in Figure 4. MuxMemorySource uses appropriate heuristics (such as first-hit or round-robin) to delegate the request to the appropriate subordinate MemorySource. The advantage of such a configuration is that it hides the locality decisions inside MuxMemorySource, allowing the rest of the system to be used without any modification.
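A round-robin MuxMemorySource might look like the following sketch. It assumes two nodes, a function-pointer MemorySource interface, and a stand-in `nodeAlloc` that counts calls per node before falling back to `malloc`. The struct layouts and helper names such as `makeMuxMemorySource` are hypothetical, and the rotation counter is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_NODES 2  /* assumed node count */

int allocsOnNode[NUM_NODES];  /* instrumentation for the sketch only */

/* Stand-in for the text's NUMA allocator: counts calls, then uses malloc. */
void *nodeAlloc(int nodeNumber, int size) {
    assert(nodeNumber >= 0 && nodeNumber < NUM_NODES);
    allocsOnNode[nodeNumber]++;
    return malloc((size_t)size);
}

/* The one interface the rest of the system sees (assumed signature). */
typedef struct MemorySource {
    void *(*getMemory)(struct MemorySource *self, int size);
} MemorySource;

typedef struct NodeMemorySource {
    MemorySource base;  /* first member, so a base pointer can be cast back */
    int nodeNumber;
} NodeMemorySource;

void *nodeGetMemory(MemorySource *self, int size) {
    NodeMemorySource *node = (NodeMemorySource *)self;
    return nodeAlloc(node->nodeNumber, size);
}

typedef struct MuxMemorySource {
    MemorySource base;
    MemorySource *subordinates[NUM_NODES];
    int next;  /* round-robin cursor; not thread-safe in this sketch */
} MuxMemorySource;

/* Delegates each request to the next subordinate in rotation. */
void *muxGetMemory(MemorySource *self, int size) {
    MuxMemorySource *mux = (MuxMemorySource *)self;
    MemorySource *chosen = mux->subordinates[mux->next];
    mux->next = (mux->next + 1) % NUM_NODES;
    return chosen->getMemory(chosen, size);
}

NodeMemorySource nodeSources[NUM_NODES];
MuxMemorySource muxSource;

/* Wires the facade to one node-specific source per node. */
MemorySource *makeMuxMemorySource(void) {
    for (int n = 0; n < NUM_NODES; n++) {
        nodeSources[n].base.getMemory = nodeGetMemory;
        nodeSources[n].nodeNumber = n;
        muxSource.subordinates[n] = &nodeSources[n].base;
    }
    muxSource.base.getMemory = muxGetMemory;
    muxSource.next = 0;
    return &muxSource.base;
}
```

Because `makeMuxMemorySource` returns a plain `MemorySource *`, an ObjectSource or MonitorSource written against that interface can use it without knowing that several nodes sit behind it.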

**Figure 5:** Locality decisions made at the `ObjectSource` level.

A second possibility is to manage locality at the ObjectSource level on a per-object basis, as shown in Figure 5. MuxObjectSource is similar to MuxMemorySource, in that it uses some heuristic to determine the node on which to allocate an object. We can use the same node-specific MemorySource code as in the previous configuration from Figure 4. We can also use the same ObjectSource and MonitorSource classes as in the original configuration (Figure 3); we simply use multiple instances of each one. Very little code needs to change in order to implement this configuration.
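The sketch below illustrates this arrangement under several assumptions: two nodes, a round-robin heuristic, a fixed monitor size, and a stand-in `nodeAlloc` that counts calls and uses `malloc`. The struct layouts, the `homeNode` field, and `makeMuxObjectSource` are hypothetical. The point it tries to show is that `getObject`, `getMonitor`, and `getMemory` are ordinary single-node code; only the wiring and the small multiplexer are new.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_NODES 2       /* assumed node count */
#define MONITOR_BYTES 16  /* assumed monitor size */

int allocsOnNode[NUM_NODES];  /* instrumentation for the sketch only */

/* Stand-in for the text's NUMA allocator: counts calls, then uses malloc. */
void *nodeAlloc(int nodeNumber, int size) {
    assert(nodeNumber >= 0 && nodeNumber < NUM_NODES);
    allocsOnNode[nodeNumber]++;
    return malloc((size_t)size);
}

typedef struct MemorySource { int nodeNumber; } MemorySource;
typedef struct MonitorSource { MemorySource *memory; } MonitorSource;
typedef struct ObjectSource {
    MemorySource *memory;
    MonitorSource *monitors;
} ObjectSource;

typedef struct Object {
    void *monitor;
    int homeNode;  /* recorded only so the sketch can be checked */
} Object;

void *getMemory(MemorySource *src, int size) {
    return nodeAlloc(src->nodeNumber, size);
}

void *getMonitor(MonitorSource *src) {
    return getMemory(src->memory, MONITOR_BYTES);
}

/* Ordinary single-node object creation: header plus fields, then monitor. */
Object *getObject(ObjectSource *src, int fieldBytes) {
    Object *obj = getMemory(src->memory, (int)sizeof(Object) + fieldBytes);
    if (obj == NULL) return NULL;
    obj->monitor = getMonitor(src->monitors);
    obj->homeNode = src->memory->nodeNumber;
    return obj;
}

typedef struct MuxObjectSource {
    ObjectSource *subordinates[NUM_NODES];
    int next;  /* round-robin cursor; not thread-safe in this sketch */
} MuxObjectSource;

/* Picks a node per object, so memory and monitor land on the same node. */
Object *muxGetObject(MuxObjectSource *mux, int fieldBytes) {
    ObjectSource *chosen = mux->subordinates[mux->next];
    mux->next = (mux->next + 1) % NUM_NODES;
    return getObject(chosen, fieldBytes);
}

MemorySource memorySources[NUM_NODES];
MonitorSource monitorSources[NUM_NODES];
ObjectSource objectSources[NUM_NODES];
MuxObjectSource muxObjectSource;

/* One MemorySource, MonitorSource, and ObjectSource instance per node. */
MuxObjectSource *makeMuxObjectSource(void) {
    for (int n = 0; n < NUM_NODES; n++) {
        memorySources[n].nodeNumber = n;
        monitorSources[n].memory = &memorySources[n];
        objectSources[n].memory = &memorySources[n];
        objectSources[n].monitors = &monitorSources[n];
        muxObjectSource.subordinates[n] = &objectSources[n];
    }
    muxObjectSource.next = 0;
    return &muxObjectSource;
}
```

One visible difference from multiplexing at the MemorySource level is that the node is chosen once per object, so in this sketch an object's field memory and its monitor always share a node.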

**Figure 6:** Locality decisions made by the `ExecutionEngine` itself.

A third possibility is to allow the ExecutionEngine itself to determine the location of the object to be created. Since the ExecutionEngine has a great deal of information about the Java program being executed, it is likely to be in a position to make good locality decisions on a per-thread basis. In this configuration, shown in Figure 6, the ObjectSource and MemorySource remain the same as in the original configuration. The ExecutionEngine chooses where to allocate its objects by calling the appropriate ObjectSource. Again, we have not changed the ObjectSource or MonitorSource classes, and the node-specific MemorySource class is the same one from the previous configurations.
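As a rough sketch of this configuration, suppose each thread records a home node and the ExecutionEngine allocates a thread's objects from that node's ObjectSource. The `Thread` struct, its `homeNode` field, and this particular per-thread policy are assumptions chosen for illustration; the text does not prescribe a heuristic. As before, `nodeAlloc` is a stand-in that counts calls and uses `malloc`, and the struct layouts are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_NODES 2       /* assumed node count */
#define MONITOR_BYTES 16  /* assumed monitor size */

int allocsOnNode[NUM_NODES];  /* instrumentation for the sketch only */

/* Stand-in for the text's NUMA allocator: counts calls, then uses malloc. */
void *nodeAlloc(int nodeNumber, int size) {
    assert(nodeNumber >= 0 && nodeNumber < NUM_NODES);
    allocsOnNode[nodeNumber]++;
    return malloc((size_t)size);
}

typedef struct MemorySource { int nodeNumber; } MemorySource;
typedef struct MonitorSource { MemorySource *memory; } MonitorSource;
typedef struct ObjectSource {
    MemorySource *memory;
    MonitorSource *monitors;
} ObjectSource;

typedef struct Object { void *monitor; } Object;

void *getMemory(MemorySource *src, int size) {
    return nodeAlloc(src->nodeNumber, size);
}

/* Same single-node creation path as in the simple scheme. */
Object *getObject(ObjectSource *src, int fieldBytes) {
    Object *obj = getMemory(src->memory, (int)sizeof(Object) + fieldBytes);
    if (obj == NULL) return NULL;
    obj->monitor = getMemory(src->monitors->memory, MONITOR_BYTES);
    return obj;
}

/* Hypothetical per-thread state: the node this thread mostly runs on. */
typedef struct Thread { int homeNode; } Thread;

typedef struct ExecutionEngine {
    ObjectSource *objectSources[NUM_NODES];  /* one per node, no multiplexer */
} ExecutionEngine;

/* The engine makes the locality decision itself, here per thread. */
Object *engineNewObject(ExecutionEngine *engine, const Thread *thread,
                        int fieldBytes) {
    assert(thread->homeNode >= 0 && thread->homeNode < NUM_NODES);
    return getObject(engine->objectSources[thread->homeNode], fieldBytes);
}

MemorySource memorySources[NUM_NODES];
MonitorSource monitorSources[NUM_NODES];
ObjectSource objectSources[NUM_NODES];

void initEngine(ExecutionEngine *engine) {
    for (int n = 0; n < NUM_NODES; n++) {
        memorySources[n].nodeNumber = n;
        monitorSources[n].memory = &memorySources[n];
        objectSources[n].memory = &memorySources[n];
        objectSources[n].monitors = &monitorSources[n];
        engine->objectSources[n] = &objectSources[n];
    }
}
```

Here the selection logic lives in `engineNewObject` rather than in a facade module, which is what lets it consult thread state that lower-level modules cannot see.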

These examples demonstrate the flexibility of Jupiter's building-block architecture. Each scheme has advantages and disadvantages, and it is not clear which is best. However, the ease with which they can be incorporated allows researchers to implement and compare them with minimal effort.

Tarek S. Abdelrahman
2002-05-27