
2.2 Creating a reliable logical disk

We use Reed-Solomon erasure-correcting codes [13] to create a highly-reliable logical disk from a large number of partners. A $(k{+}m,m)$-Reed-Solomon erasure-correcting code generates $m$ redundancy blocks from $k$ data blocks in such a way that the original $k$ data blocks can be reconstructed from any $k$ of the $k{+}m$ data and redundancy blocks. By placing each of the $k{+}m$ blocks on a different partner, the logical disk can survive the failure of any $m$ of the $k{+}m$ partners without any data loss, with a space overhead of $m/k$.
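To make the "any $k$ of the $k{+}m$ blocks" property concrete, the following minimal Python sketch implements a Reed-Solomon-style erasure code over the small prime field GF(257). The field choice, function names, and per-symbol Lagrange interpolation are illustrative conveniences, not details taken from this paper; a production implementation would use optimized GF($2^8$) table arithmetic.

\begin{verbatim}
# Illustrative Reed-Solomon-style erasure code: k data blocks in, m
# redundancy blocks out; any k of the k+m blocks reconstruct the data.
# Works over the prime field GF(257), so every byte value 0..255 fits.
P = 257

def _interpolate_at(points, x):
    """Lagrange-interpolate the polynomial through `points` ({x_i: y_i},
    exactly k entries) and evaluate it at x, modulo P."""
    total = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # den^-1 mod P
    return total

def encode_stripe(data_blocks, m):
    """Return m redundancy blocks for k equal-length data blocks.
    Data block i is the code polynomial evaluated at point i; the j-th
    redundancy block is the same polynomial evaluated at point k+j."""
    k, length = len(data_blocks), len(data_blocks[0])
    return [[_interpolate_at({i: data_blocks[i][pos] for i in range(k)}, k + j)
             for pos in range(length)]
            for j in range(m)]

def decode_stripe(surviving, k, length):
    """Rebuild the k data blocks from any k surviving blocks, given as a
    dict mapping block index (0 .. k+m-1) to block contents."""
    assert len(surviving) >= k, "need at least k surviving blocks"
    chosen = dict(list(surviving.items())[:k])
    data = []
    for i in range(k):
        if i in surviving:            # this data block survived; keep it
            data.append(list(surviving[i]))
        else:                         # lost data block: re-interpolate it
            data.append([_interpolate_at(
                {idx: blk[pos] for idx, blk in chosen.items()}, i)
                for pos in range(length)])
    return data

# Example with k=4, m=2 (as in Figure 1): lose any two blocks and recover.
data = [[10, 20], [30, 40], [50, 60], [70, 80]]
red = encode_stripe(data, m=2)
survivors = {0: data[0], 2: data[2], 4: red[0], 5: red[1]}  # blocks 1, 3 lost
assert decode_stripe(survivors, k=4, length=2) == data
\end{verbatim}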

Erasure-correcting codes are more efficient than error-correcting codes because they handle only erasures (detectable losses of data) rather than the more general class of errors (arbitrary changes in data). Block errors can be turned into block erasures by attaching a checksum and a version number to each stored block; any block whose checksum does not verify, or whose version number is not the highest among the correctly checksummed blocks, is treated as erased.
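As an illustration of that rule (not code from this paper), the following Python sketch filters retrieved blocks down to the usable ones at restore time; the use of SHA-256 and the exact record layout are our assumptions.

\begin{verbatim}
# Turning block errors into erasures: keep only blocks whose checksum
# verifies and whose version number is the newest among those that do.
# SHA-256 and this record format are illustrative choices.
import hashlib

def stored_form(version, payload):
    """What a partner stores for one block: the payload plus its metadata."""
    return {"version": version,
            "payload": payload,
            "checksum": hashlib.sha256(payload).hexdigest()}

def usable_blocks(retrieved):
    """Blocks failing either test are simply treated as erased."""
    valid = [b for b in retrieved
             if hashlib.sha256(b["payload"]).hexdigest() == b["checksum"]]
    if not valid:
        return []
    newest = max(b["version"] for b in valid)
    return [b for b in valid if b["version"] == newest]
\end{verbatim}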

Figure 1: Sample block layout using 6 partners ($\cal B$-$\cal G$) with $k=4$ and $m=2$. $R_{w,x,y,z}$ denotes the first redundancy block and $R'_{w,x,y,z}$ the second redundancy block generated from data blocks $w$, $x$, $y$, and $z$.
\begin{figure}\begin{tabular}{\vert c\vert c\vert c\vert c\vert c\vert c\vert}
\hline
$\cal B$ & $\cal C$ & $\cal D$ & $\cal E$ & $\cal F$ & $\cal G$ \\
\hline
1 & 2 & 3 & 4 & $R_{1,2,3,4}$ & $R'_{1,2,3,4}$ \\
5 & 6 & 7 & 8 & $R_{5,6,7,8}$ & $R'_{5,6,7,8}$ \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
\hline
\end{tabular}\end{figure}

So that we can use a small fixed block size, we stripe blocks across the partners using the erasure-correcting code. See Figure 1 for an example block layout for a computer $\cal A$ that uses $k=4$ and $m=2$ and therefore has 6 partners. We place corresponding outputs of the erasure-correcting code on the same partner (e.g., $\cal F$ holds all the first redundancy blocks in Figure 1) to make reconfiguration easier (see Section 2.6).
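The placement rule behind Figure 1 reduces to two index calculations, sketched below with 0-based block and partner numbering of our own choosing (partners 0-5 standing in for $\cal B$-$\cal G$).

\begin{verbatim}
# Striping with a fixed block size: stripe s consists of data blocks
# s*k .. s*k+k-1 on partners 0..k-1, plus m redundancy blocks; the j-th
# redundancy block of every stripe always lands on partner k+j, so each
# partner holds only corresponding outputs of the code.
def data_block_location(b, k):
    """(partner, stripe) holding 0-based data block number b."""
    return b % k, b // k

def redundancy_partner(j, k):
    """Partner holding the j-th redundancy block of every stripe."""
    return k + j

# With k=4, m=2: blocks 0-3 form stripe 0 on partners 0-3, blocks 4-7
# form stripe 1, and partners 4 and 5 hold R and R' respectively.
\end{verbatim}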

Figure 2: Reliability and overhead for increasing values of $m$, holding $k$ constant at 6, and assuming an individual computer reliability of 90%.
\begin{figure}\begin{tabular}{\vert c\vert c\vert c\vert r\vert r\vert}
\hline
$k$ & $m$ & $n$ & reliability & overhead \\
\hline
\vdots & \vdots & \vdots & \vdots & \vdots \\
\hline
6 & 6 & 12 & 99.995\% & 100\% \\
\hline
\end{tabular}\end{figure}

Each computer decides for itself how to trade off reliability against overhead by choosing values for $k$ and $m$; these choices in turn determine the number of partners $n=k+m$ it needs. To get a feel for how these tradeoffs work in practice, see Figures 2 and 3. Figure 2 shows that reliability increases rapidly and overhead only slowly as the number of redundancy blocks ($m$) is increased while the number of data blocks ($k$) is held constant. Figure 3 shows that the overhead needed for a given level of reliability can be decreased by increasing the number of partners ($n{=}k{+}m$). (Unlike traditional RAID systems, we can use high values of $m$ and $n$ because backup and restoration are relatively insensitive to latency.)
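As a rough check, the rows shown in Figures 2 and 3 can be reproduced with a few lines of Python under the failure model spelled out in the next paragraph (independent failures, 90% per-partner availability); the function names are ours.

\begin{verbatim}
# Restoration succeeds when at least k of the n = k+m blocks can still be
# retrieved; with independent partners this is a binomial tail probability.
from math import comb

def reliability(k, m, p=0.9):
    n = k + m
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def overhead(k, m):
    return m / k

print(f"{reliability(6, 6):.3%}  {overhead(6, 6):.0%}")    # 99.995%  100%
print(f"{reliability(18, 10):.3%} {overhead(18, 10):.0%}")  # 99.996%  56%
\end{verbatim}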

These figures were calculated via the binomial distribution, assuming that individual Internet computers fail independently and are 90% reliable. More precisely, they assume that when a computer tries to restore its data, the probability that any particular partner still holds that computer's data uncorrupted and is sufficiently available during the limited restoration time window to supply it is 90%.
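Written out, the reliability tabulated for a given $k$ and $m$ is the probability that at least $k$ of the $k{+}m$ blocks can be retrieved, namely

\begin{displaymath}
\sum_{i=k}^{k+m} \binom{k+m}{i} \, p^{i} \, (1-p)^{k+m-i},
\end{displaymath}

with per-partner availability $p = 0.9$.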

This number is meant to be conservative; we expect from our personal experience that the real number will be considerably higher. Indeed, the only empirical study we know of on PC availability, Bolosky et al. [3], found that over half of all Microsoft's corporate desktops were up over 95% of the time when pinged hourly. As that study included some PCs that are shut down at night or over the weekend and a reasonable restoration window would probably be at least 24 hours, their numbers underestimate restoration availability for machines promising 24-hour availability. Nonetheless, even when using our conservative number, our calculations show that high reliability can be achieved with low overhead.

We expect randomly-chosen Internet PCs to fail independently except in cases of widespread virus damage, sustained loss of Internet connectivity, and (in the case of uncooperative environments) coordinated attacks involving multiple computers. See Section 3.3 for a discussion of why we believe the last of these is unlikely to be a problem in practice.

Figure 3: Reliability and overhead for increasing values of $k$, using the minimum value of $m$ necessary to get a reliability of at least 99.995% and assuming an individual computer reliability of 90%.
\begin{figure}\begin{tabular}{\vert c\vert c\vert c\vert r\vert r\vert}
\hline
$k$ & $m$ & $n$ & reliability & overhead \\
\hline
\vdots & \vdots & \vdots & \vdots & \vdots \\
\hline
18 & 10 & 28 & 99.996\% & 56\% \\
\hline
\end{tabular}\end{figure}

