Check out the new USENIX Web site.

Feasibility Study

Our first experiment aimed to determine the performance feasibility of basing SWORD on the DHT-based range search primitive described earlier. As a baseline workload we ran SWORD on approximately 200 PlanetLab nodes on August 1, 2004. Each node sent an update containing 54 metrics every 2 minutes. We measured the query latency for a query that retrieves the full range of one attribute (the node's one-minute load average) under four configurations: 1, 40, 80, and 200 queries issued per minute across the system (evenly distributed among nodes). Each attribute was mapped to approximately six nodes; therefore six nodes were visited in satisfying the range query.

For comparison, we implemented a ``centralized'' version of SWORD in which each update is sent to one of $N$ servers at random, and each query is sent to all $N$ servers. We chose the servers to be machines with low load and low latency/high bandwidth network links. In contrast, the nodes used to satisfy a query in the DHT approach are selected randomly, since nodes choose their DHT IDs randomly.

Figure 2: Performance of DHT-based range search. The bar at each $x$ value shows median query latency and $90th$ and $10th$ percentile latency.
\begin{figure}\centering
\epsfig{file=dhtrange.eps, width=2.75in}
\vspace{-2mm}\vspace{-5mm}
\end{figure}

Figure 2 shows the performance of the DHT range search as a function of query rate. We emphasize that these experiments were conducted over a 4-hour period during which node and network resource contention varied, and the number of nodes in the system varied slightly. These results are therefore only approximately repeatable. For comparison, the ``centralized'' version with one server offered consistently inferior median performance for query rates of 40 queries/min and above (ranging from median latency of 728 ms for 1 query/min to 12.5 seconds for 200 queries/min), while the ``centralized'' version with two servers offered consistently superior median performance for all query rates (with median latency ranging from 251 ms for 1 query/min to 3.3 seconds for 200 queries/min). This suggests that although DHT-based SWORD may offer acceptable performance, a non-DHT-based version with as few as two well-chosen servers may offer superior performance. The general lesson here is that users deploying services intended for the scale of PlanetLab might be wise to consider implementing a simpler ``centralized'' version first, only moving to a decentralized design if they believe it will substantially improve some property of the service.

Our next experiment examined optimizer latency as a function of number of candidate nodes, for a query that requested two groups of nodes, 4 nodes in each group, with at most 150 ms inter-node latency within each group, and all nodes with a varying range of loads ranges that allowed us to control the number of candidate nodes returned. This experiment showed that even under moderately high load (the optimizer was run on a node with a load consistently in the 3-4 range, with over two dozen active slices), and with the worst case of all nodes returned by the distributed query as candidate nodes, the optimizer could satisfy the query in less than seven seconds. Still, this performance is somewhat disappointing, suggesting that we should attempt to improve the optimizer's performance and should consider moving the computation entirely to the (presumably less loaded) user's machine rather than running it on the PlanetLab node that servers as the query's entry point into SWORD.

Jeannie Albrecht 2004-11-03