Binning errors

So far we have assumed that Flow Slices uses binned measurement. This guarantees that, as long as the analysis is on time intervals that are exact multiples of the measurement bin size, it is easy to determine exactly how many of the packets and bytes counted by a record fell within each bin. By default, however, Flow Slices does not use bins, and for records that span bin boundaries the user has to estimate how the packets and bytes were actually divided between the bins. We can prove that our reconstruction of this division is unbiased only if we make an assumption about the spacing of the packets.

Assumption 3. For every flow at the input of the flow slicing algorithm, the time between the arrivals of any two consecutive packets is the same.
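
Assumption 3 can be read as a predicate on a flow's packet timestamps. The Python sketch below checks it for a single flow; the function name and the floating-point tolerance are illustrative assumptions, not part of Flow Slices.

```python
import math


def satisfies_assumption_3(timestamps, rel_tol=1e-9):
    """Return True if all consecutive inter-arrival gaps of one flow are equal.

    `timestamps` are the flow's packet arrival times (seconds), in order.
    `rel_tol` is a hypothetical tolerance that absorbs floating-point rounding.
    Flows with fewer than two packets trivially satisfy the assumption.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return all(math.isclose(g, gaps[0], rel_tol=rel_tol) for g in gaps)
```

For example, a flow with packets at 0, 0.5, 1.0 and 1.5 s satisfies the assumption, while a flow with packets at 0, 0.1 and 1.0 s does not.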

We use the following algorithm to distribute the packets reported by a flow record that spans several bins among the bins it covers. We consider a set of packet arrival events: the first is at the timestamp of the first packet counted by the entry, the last at the timestamp of the last packet counted by the entry, and the remaining ones are evenly spaced between them. We assume that the same number of packets arrived at every arrival event except the first, which accounts for the rest of the record's packet count, and we distribute the packets between bins according to the bin in which each event falls. Under Assumption 3, this can be shown to be an unbiased way of distributing packets between bins.

We recommend distributing the bytes of the flow between bins in proportion to the number of packets attributed to each bin. Assumption 3 alone is not enough to prove that this distribution of bytes is unbiased; that would require an additional assumption that packet sizes within a flow are uniform. Flow arrivals pose no binning problem: because we assume that the first packet counted by the flow record is the one carrying the SYN, we count the flow arrival against the bin containing the first packet.
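
The distribution of packets and bytes described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Flow Slices implementation: bins are taken to be fixed-width half-open intervals, and the number of arrival events and the packet count per event are passed in as parameters (`n_events`, `per_event`, the latter defaulting to one packet). These names and the default are hypothetical choices of this sketch.

```python
import math
from collections import defaultdict


def event_times(t_first, t_last, n_events):
    """Return n_events times evenly spaced from t_first to t_last inclusive."""
    if n_events == 1:
        return [t_first]
    step = (t_last - t_first) / (n_events - 1)
    return [t_first + i * step for i in range(n_events)]


def split_record(t_first, t_last, n_events, packets, nbytes, bin_width,
                 per_event=1.0):
    """Split one flow record's packets and bytes among bins of width bin_width.

    Bin k covers [k * bin_width, (k + 1) * bin_width). Every event except the
    first carries per_event packets; the first carries what remains of
    `packets` (hypothetical parameterization). Bytes are split in proportion
    to packets, which is unbiased only if packet sizes are uniform.
    """
    first_share = packets - per_event * (n_events - 1)
    pkt_per_bin = defaultdict(float)
    for i, t in enumerate(event_times(t_first, t_last, n_events)):
        b = math.floor(t / bin_width)  # bin containing this arrival event
        pkt_per_bin[b] += first_share if i == 0 else per_event
    byte_per_bin = {b: nbytes * p / packets for b, p in pkt_per_bin.items()}
    return dict(pkt_per_bin), byte_per_bin


def flow_arrival_bin(t_first, bin_width):
    """Flow arrivals go to the bin of the first packet (assumed to carry SYN)."""
    return math.floor(t_first / bin_width)


if __name__ == "__main__":
    pkts, byts = split_record(8.0, 14.0, 4, 5.0, 5000.0, bin_width=10.0)
    print(pkts)  # {0: 2.0, 1: 3.0}
    print(byts)  # {0: 2000.0, 1: 3000.0}
    print(flow_arrival_bin(8.0, 10.0))  # 0
```

In the usage example, a record covering 8 s to 14 s with four events falls across two 10 s bins: the events at 10, 12 and 14 s land in bin 1, while the first event at 8 s keeps the remaining two packets in bin 0, and the bytes follow the same 2:3 split.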

Under realistic assumptions about inter-packet arrival times and packet size distributions within flows, we cannot achieve provably unbiased binning of bytes and packets. We therefore turn to measurements to see how large the binning error is on typical traffic. We recommend using such experimental results to decide whether it is worth increasing the size of flow records by adding multiple counters to support binned measurement.
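
One way to measure the binning error on a packet trace is to compare the exact per-bin counts of each flow with the even-spacing reconstruction, which only sees what a record keeps (first and last timestamp, packet count, byte count). The Python sketch below assumes an unsampled record in which every packet is counted, so it uses one arrival event per packet; the function names and the error metric (fraction of packets or bytes attributed to the wrong bin) are illustrative choices, not definitions from the text.

```python
import math
from collections import Counter


def true_bins(times, sizes, w):
    """Exact per-bin packet and byte counts of one flow from a trace."""
    pkts, byts = Counter(), Counter()
    for t, s in zip(times, sizes):
        b = math.floor(t / w)
        pkts[b] += 1
        byts[b] += s
    return pkts, byts


def even_spacing_bins(times, sizes, w):
    """Reconstruction from record fields only: one evenly spaced event per
    packet between first and last timestamp; bytes follow packets."""
    n, total = len(times), sum(sizes)
    step = (times[-1] - times[0]) / (n - 1) if n > 1 else 0.0
    pkts = Counter(math.floor((times[0] + i * step) / w) for i in range(n))
    byts = {b: total * c / n for b, c in pkts.items()}
    return pkts, byts


def binning_error(times, sizes, w):
    """Fraction of the flow's packets and bytes placed in the wrong bin."""
    tp, tb = true_bins(times, sizes, w)
    ep, eb = even_spacing_bins(times, sizes, w)
    bins = set(tp) | set(ep)
    # Totals are conserved, so each misplaced unit is counted twice.
    pkt_err = sum(abs(tp[b] - ep[b]) for b in bins) / (2 * len(times))
    byte_err = sum(abs(tb[b] - eb.get(b, 0.0)) for b in bins) / (2 * sum(sizes))
    return pkt_err, byte_err


if __name__ == "__main__":
    # Evenly spaced flow: Assumption 3 holds, so no error.
    print(binning_error([0.0, 4.0, 8.0, 12.0, 16.0], [100] * 5, 10.0))  # (0.0, 0.0)
    # Bursty flow: four packets early, one late; one packet lands in the wrong bin.
    print(binning_error([0.0, 1.0, 2.0, 3.0, 16.0], [100] * 5, 10.0))  # (0.2, 0.2)
```

Aggregating these per-flow errors over all flows in a trace, weighted by packets or bytes, gives an empirical estimate of how much accuracy multiple per-bin counters would buy.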

Ramana Rao Kompella 2005-08-12