Check out the new USENIX Web site. next up previous
Next: Experimental Methodology Up: The Design of WOLF Previous: Reading

Garbage Collection

WOLF does not completely eliminate garbage, therefore garbage collection is still needed. Benefit-to-Cost cleaning algorithm works well in most cases while hole-plugging policy works well when the disk segment utilization is very high. Since previous research shows that a single cleaning algorithm is unlikely to perform equally well for all kinds of workloads, we used an adaptive approach similar to the Matthews' method [9]. This policy automatically selects either the benefit-to-cost cleaner or the hole-plugging method depending on the cost-benefit estimates.

In WOLF, the cleaner runs when the system is idle or disk utilization exceeds a high water-mark. In our simulation, the high water-mark is when 80% of the disk is full, and idle is defined as the file system has no activities in 5 minutes. The amount of data that the cleaner may process at one time can be varied. In this paper, we allowed the cleaner to process up to 20 MB at one time. To calculate the benefit and overhead of garage collection, we used the following mathematical model. These formula were developed by Matthews et al. (See more details in [9]).

The benefit-to-cost ratio is defined as follows:


\begin{displaymath}
\frac{benefit}{cost} = \frac{(1-utilization)*age \ of \
segment}{(1+utilization)}
\end{displaymath}

Here utilization represents the ratio of the live bytes to one segment size. Specifically, the cost-benefit values of cleaning and hole-plugging policies are calculated as follows:


\begin{displaymath}
CostBenefit_{Cleaning} =
\frac{TransferTime_{Cleaning}}{SpaceFreed_{Cleaning}}
\end{displaymath}


\begin{displaymath}
CostBenefit_{Plugging} =
\frac{TransferTime_{Plugging}}{SpaceFreed_{Plugging}}
\end{displaymath}

The adaptive policy always picks up segments with the lower Cost-Benefit estimates to clean. Segments with more garbage (hence very low segment utilization and high benefit-to-cost ratios) will be cleaned first. Older segments will also be cleaned first, as data in younger segments will have a better chance to be invalidated in the future.

Because WOLF's buffer manager separates the active data from inactive data which leads to a bimodal disk segment layout, both the benefit-to-cost and hole-plugging methods can benefit from this nice layout. For benefit-to-cost, since most active segments are mostly garbage (hence very low utilization), their benefit-to-cost ratios are very high. These segments will be cleaned first to yield many blank segments. For hole-plugging, when the adaptive cleaner switches to this method (which will tend to occur in very high disk utilization), cleaner uses the least utilized segments to plug the holes in the most utilized segments. WOLF simply reads the few remaining live bytes from an active disk segment and plug them into the few available slots of an inactive disk segment (very high segment utilization).


next up previous
Next: Experimental Methodology Up: The Design of WOLF Previous: Reading
Jun Wang 2001-10-31