
  
Implementation


We added VSs as loadable modules to version 2.0.36 of the Linux kernel; the source code is available at https://www.eecs.umich.edu/~reumann/vs.html. Figure 6 shows the dependencies among the VS modules. To implement the gates, we added only a few lines of call-back code to the intercepted system calls to trigger VS classification. The VS structure itself (Figure 8) contains the previously described membership information, statistics, and resource limits. The VS structure, the VS hierarchy management, and most of the gates are portable since they depend only minimally on Linux internals. Only the placement of the call-back code in the original system calls is Linux-specific.
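To illustrate the shape of these hooks, the following sketch shows a call-back stub of the kind we place in an intercepted system call. The hook-pointer convention and the names vs_fork_hook and sys_fork_gated are illustrative only, not identifiers from the actual patch.

typedef int (*vs_gate_hook_t)(void *task);

/* Set by the fork gate module when it is loaded; NULL otherwise. */
static vs_gate_hook_t vs_fork_hook;

long sys_fork_gated(void *current_task)
{
    /* The few added lines: if a fork gate is loaded, run its
       prefilter/classifier before the original fork logic. */
    if (vs_fork_hook) {
        int err = vs_fork_hook(current_task);
        if (err)
            return err;  /* gate vetoed the call, e.g., a limit was hit */
    }

    /* ... original fork implementation continues here ... */
    return 0;
}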

We implemented VS-level fair-shares [12,15] for CPU and network to provide strict VS-level resource guarantees. VSs that are neither directly nor indirectly (via a parent VS) associated with a share are scheduled on a best-effort basis. Best-effort VSs use all unreserved resource slots. Any excess capacity is shared among VSs that own resource shares in a round-robin fashion (see also firm Capacity Reserves [16]). The implementation of VS resource shares is not portable across platforms. Nevertheless, numerous implementations of capacity reserves and fair-shares exist, so requiring VS-level fair-shares does not limit the applicability of our approach.
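The following sketch shows one plausible reading of this slot-based policy; the data structures and the treatment of idle reservations are our own simplification and not the actual scheduler code.

#define NSLOTS 100                /* scheduling slots per cycle */

struct vs_share {
    int share;                    /* reserved slots; 0 = best-effort */
    int runnable;                 /* nonzero if the VS has work */
};

/* Fill slot[] with owner indices; -1 marks an unreserved slot,
   which best-effort VSs consume. */
void assign_slots(struct vs_share *vss, int nvs, int slot[NSLOTS])
{
    int s = 0, rr = 0;

    /* 1. Give each share-owning VS its reserved slots. */
    for (int i = 0; i < nvs && s < NSLOTS; i++)
        for (int k = 0; k < vss[i].share && s < NSLOTS; k++)
            slot[s++] = i;

    /* 2. Unreserved slots go to best-effort VSs. */
    while (s < NSLOTS)
        slot[s++] = -1;

    /* 3. Excess capacity (a reservation whose owner is idle) is
       re-offered round-robin to share-owning VSs with work. */
    for (s = 0; s < NSLOTS; s++) {
        if (slot[s] < 0 || vss[slot[s]].runnable)
            continue;
        for (int i = 0; i < nvs; i++) {
            int cand = (rr + i) % nvs;
            if (vss[cand].share > 0 && vss[cand].runnable) {
                slot[s] = cand;
                rr = cand + 1;
                break;
            }
        }
    }
}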

VS statistics are cumulative aggregates of the VS's members' statistics. The attributes include a wide range of statistics that Linux keeps for processes and sockets, such as page faults and virtual time elapsed.

Figure 6: VS module dependencies

The OS offers a new system call (servctl) to set up the VS hierarchy, adjust VS membership, policies, attribute inheritance, and resource limits (such as CPU limits), and query VS attributes. It takes a command, the size of the argument, and an argument structure as parameters.
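For illustration, a userspace invocation of servctl might look as follows. The command codes, the argument layout, and the syscall number are assumptions on our part; only the (command, size, argument) signature is fixed by the interface described above.

#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define __NR_servctl   211  /* assumed syscall number */
#define VS_CMD_CREATE  1    /* assumed command codes */
#define VS_CMD_SET_CPU 2

struct vs_arg {             /* assumed argument structure */
    int  vsid;              /* target VS */
    int  parent;            /* parent VS when creating */
    int  cpu_share;         /* CPU share in percent */
    char name[32];
};

static long servctl(int cmd, unsigned long size, void *arg)
{
    return syscall(__NR_servctl, cmd, size, arg);
}

int main(void)
{
    struct vs_arg a;

    memset(&a, 0, sizeof a);
    a.parent = 0;                      /* create below the root VS */
    strcpy(a.name, "httpd-gold");
    if (servctl(VS_CMD_CREATE, sizeof a, &a) < 0)
        perror("servctl create");

    a.cpu_share = 30;                  /* reserve 30% of the CPU */
    if (servctl(VS_CMD_SET_CPU, sizeof a, &a) < 0)
        perror("servctl set cpu");
    return 0;
}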

Gates are implemented as loadable modules. We currently support fork, exit, exec, open, accept, and socket gates. Upon insertion of a gate module, the call-back stubs placed in the corresponding system calls are activated so that the gate's prefilter, postfilter, and classifier execute each time the control flow of a server application passes through the intercepted system call. Each gate also registers its own servctl handler to enable gate configuration.
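The load path of a gate module might look roughly as follows; the registration interface (vs_register_gate and vs_unregister_gate) is an assumption of ours, shaped after the description above.

struct vs_gate_ops {
    int  (*prefilter)(void *ctx);   /* runs before the syscall body */
    int  (*classifier)(void *ctx);  /* assigns VS membership */
    int  (*postfilter)(void *ctx);  /* runs after the syscall body */
    long (*servctl_handler)(int cmd, unsigned long size, void *arg);
};

/* Assumed kernel-side registration entry points. */
extern int  vs_register_gate(const char *syscall_name,
                             struct vs_gate_ops *ops);
extern void vs_unregister_gate(const char *syscall_name);

static int  open_prefilter(void *ctx)  { return 0; /* check limits */ }
static int  open_classifier(void *ctx) { return 0; /* map fd to a VS */ }
static long open_servctl(int cmd, unsigned long size, void *arg)
{
    return 0;                          /* gate-specific configuration */
}

static struct vs_gate_ops open_gate_ops = {
    open_prefilter, open_classifier, 0, open_servctl
};

int init_module(void)                  /* 2.0-era module entry point */
{
    return vs_register_gate("open", &open_gate_ops);
}

void cleanup_module(void)
{
    vs_unregister_gate("open");
}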

The advantage of our modular gate design is that one needs to add to the kernel only those gates that are strictly necessary to classify VS membership and insulate services. This is important because the insertion of each gate into a running kernel increases system overhead (see Section 5 for more detail). The remainder of this section describes our implemented gates.
 
 

Figure 7: Control flow of the fork gate

Fork: Upon interception of fork, the created process is classified as a new member of some VS. To determine the resulting VS affiliation, we check the fork_policy object of the creator's VS. The map_to attribute of the fork_policy specifies the affiliation of the created process. If the VS specified by map_to has reached its process count limit (set via the servctl call), the failure behavior configured for that VS is invoked (Section 3.5). Figure 7 shows a high-level control flow graph for this gated system call.
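The gate's classification step can be sketched as follows; the helper names are hypothetical, and only the map_to lookup, the process count limit, and the configurable failure behavior come from the description above.

struct vs;  /* the service_struct of Figure 8 */

extern struct vs *vs_of_task(void *task);
extern struct vs *fork_map_to(struct vs *creator_vs); /* fork_policy.map_to */
extern int  vs_process_limit_reached(struct vs *vs);
extern int  vs_apply_failure_behavior(struct vs *vs);  /* Section 3.5 */
extern void vs_add_task(struct vs *vs, void *task);

int fork_gate_classify(void *creator, void *child)
{
    struct vs *target = fork_map_to(vs_of_task(creator));

    if (vs_process_limit_reached(target))
        /* fail, block, or redirect the call, as configured */
        return vs_apply_failure_behavior(target);

    vs_add_task(target, child);  /* the child joins the mapped VS */
    return 0;
}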

Exit: When a process exits -- including ungraceful exits via SIGSEGV or other uncaught signals -- it must be removed from the VS with which it is associated. This gate is not configurable.

Exec: Upon calling one of the exec-family system calls, the caller can be reclassified based on the name of the program that was invoked. The gate code checks the name of the program against a hashed mapping table, i.e., the exec_policy field in Figure 8.
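One possible shape of this lookup is sketched below; the bucket layout and hash function are our own, since only the fact that the mapping table is hashed is given.

#include <string.h>

#define EXEC_BUCKETS 64

struct exec_map {
    const char *prog;        /* program name, e.g., "httpd" */
    int vsid;                /* VS to reclassify the caller into */
    struct exec_map *next;
};

static unsigned hash_name(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % EXEC_BUCKETS;
}

int exec_gate_lookup(struct exec_map *table[EXEC_BUCKETS],
                     const char *prog, int current_vsid)
{
    for (struct exec_map *e = table[hash_name(prog)]; e; e = e->next)
        if (strcmp(e->prog, prog) == 0)
            return e->vsid;  /* reclassify the caller */
    return current_vsid;     /* no mapping: affiliation unchanged */
}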

Open: The open gate acts like the exec gate. The only difference is that the file descriptor may be tagged with a VS affiliation at the same time. Moreover, the open gate uses a prefix tree to match file names, so that whole directories -- identified by a shared prefix -- can map to one VS. This is important because the large numbers of data files residing in one directory subtree typically share the same VS classification.
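A minimal trie lookup of this kind might look as follows; the node layout is an assumption, as the text specifies only that a prefix tree matches file names. A single entry for a hypothetical /data/gold prefix would then classify every file beneath that directory.

#include <stddef.h>

struct prefix_node {
    struct prefix_node *child[128];  /* one slot per ASCII character */
    int vsid;                        /* -1 if no VS is mapped here */
};

/* Return the VSID of the longest mapped prefix of path, or -1. */
int open_gate_lookup(const struct prefix_node *root, const char *path)
{
    const struct prefix_node *n = root;
    int best = root->vsid;

    for (size_t i = 0; path[i] != '\0'; i++) {
        n = n->child[(unsigned char)path[i] & 0x7f];
        if (!n)
            break;
        if (n->vsid != -1)
            best = n->vsid;          /* remember the deepest mapping */
    }
    return best;
}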

Socket: The socket gate resembles the fork gate. The socket_policy of a VS specifies the future VS affiliation of the created socket. Once messages are relayed via such a classified socket, they are tagged with the socket's VS affiliation in their IP Type-of-Service (TOS) field, thus allowing VS information to propagate over the network. Since the TOS field may be used by DiffServ to provide differential QoS in a WAN, this field can only be used inside server clusters. If the TOS field cannot be used or one needs more than 256 VSIDs (the TOS field is eight bits wide), one may introduce a new IP option [17] to hold the VSID. This option should be set in every fragment of the IP datagram to facilitate VS-aware routing.
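From userspace, tagging a socket's TOS byte with a VSID would look like the following; in the actual system the kernel tags the packets of classified sockets itself, so this is only an illustration of the mechanism.

#include <netinet/in.h>
#include <netinet/ip.h>
#include <sys/socket.h>

/* Carry an 8-bit VSID in the 8-bit IP TOS field of all messages
   sent over this socket. */
int tag_socket_with_vsid(int sock, unsigned char vsid)
{
    int tos = vsid;
    return setsockopt(sock, IPPROTO_IP, IP_TOS, &tos, sizeof tos);
}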

Close: When a file descriptor or socket is closed, its VS affiliation must be removed.

Accept: The accept gate is quite complex. It first determines the highest-priority VS among the caller, the listening socket, and the incoming connection. The winning VS structure is then checked for a VS mapping based on the incoming IP address and the VS affiliations of the listen socket, process, and incoming connection. The VS affiliation of the incoming connection can only be determined if it was initiated by another server with our VS support and its VSID is from the global VSID range; the VSID is stored in the incoming SYN packet's IP TOS bits. For local accepts, the VS of the incoming connection is the connecting socket's VS affiliation. Both the socket and the receiver may be reclassified.
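Its first step might be sketched as below, assuming that larger precedence values (see Figure 8) denote higher priority; the helper shape is ours.

struct vs_hdr { int precedence; };  /* stand-in for service_struct */

/* Pick the highest-priority VS among caller, listening socket, and
   incoming connection; the latter two may be unclassified (NULL). */
struct vs_hdr *accept_gate_winner(struct vs_hdr *caller,
                                  struct vs_hdr *lsock,
                                  struct vs_hdr *conn)
{
    struct vs_hdr *win = caller;

    if (lsock && lsock->precedence > win->precedence)
        win = lsock;
    if (conn && conn->precedence > win->precedence)
        win = conn;
    return win;
}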
 
 

Figure 8: The VS struct

struct service_struct {

  service_t  sid;                      // VS identifier (VSID)
  struct service_struct *parent;       // enclosing VS in the hierarchy
  char name[SERVICE_NAME_MAX_LEN];
  int precedence;                      // priority, e.g., for the accept gate

  // int_value_ptr is either a value or a pointer to a parent value
  int_value_ptr process_count;
  int_value_ptr socket_count;
  int_value_ptr bytes_sent;
  int_value_ptr vtime;                 // elapsed virtual time
  int_value_ptr majflt;                // major page faults
  int_value_ptr minflt;                // minor page faults

  // member sets
  member_struct *tasks;
  member_struct *sockets;
  member_struct *services;             // child VSs

  fork_policy_struct fork_policy;
  exec_policy_struct exec_policy; ... more ...
  cpu_policy_struct  cpu_policy;
  comm_policy_struct comm_policy;
};
 

The difficulty with accept is that it should not block just because accepting the first pending connection on the listen queue would violate resource limits; another pending connection might be acceptable without violating any VS resource limit. Therefore, our implementation scans the listen queue for the incoming connection whose VS has utilized its resource reservation the least.
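The scan can be sketched as follows; the queue and accounting types are hypothetical, and vs_utilization stands for the used-to-reserved ratio implied above.

struct pending_conn {
    struct pending_conn *next;
    int vsid;                 /* VS affiliation of the connection */
};

extern double vs_utilization(int vsid);  /* used / reserved, assumed */

/* Return the pending connection whose VS has consumed the smallest
   fraction of its reservation; NULL only if the queue is empty. */
struct pending_conn *pick_connection(struct pending_conn *queue)
{
    struct pending_conn *best = 0;
    double best_util = 0.0;

    for (struct pending_conn *c = queue; c; c = c->next) {
        double u = vs_utilization(c->vsid);
        if (!best || u < best_util) {
            best = c;
            best_util = u;
        }
    }
    return best;
}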

Concurrent Gate Versions: A powerful feature of our implementation is that multiple versions of a gate may be loaded at the same time. VSs may specify which gate version to use when their process members invoke the corresponding gated system call. This makes it possible to eliminate unnecessary checks for specific VSs. For example, if forked processes should always inherit their parent's VS affiliation, there is no need to check for a (fork, VSIDx) => VSIDx mapping, as general VS classification requires; one can instead load a fork gate version that always applies the parent's VS affiliation to the forked child. Another example is the accept gate, which is quite complex in its general form. In a server-farm setup, incoming service requests are typically already classified by the frontends, and the applications that process requests in the backends only need to inherit these classifications, which reduces the complexity of the backends' accept gates. We used such an optimized accept gate in our experiments: incoming requests were classified as they were picked up by the HTTP server, and whenever the HTTP server relayed work to a shared backend Fast-CGI (FCGI) service, the backend FCGI process inherited the classification of the requesting HTTP server process.
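The specialized fork gate mentioned above can be sketched with the same hypothetical helpers as before: it skips the mapping-table lookup entirely and simply propagates the parent's affiliation.

struct vs;

extern struct vs *vs_of_task(void *task);
extern void vs_add_task(struct vs *vs, void *task);

int fork_gate_inherit(void *parent, void *child)
{
    /* No (fork, VSIDx) => VSIDx lookup: the child always joins
       its parent's VS. */
    vs_add_task(vs_of_task(parent), child);
    return 0;
}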

