Check out the new USENIX Web site. next up previous
Next: Results Up: Role Correlation Previous: Challenges

The Role Correlation Algorithm

The correlation algorithm operates by comparing the results of two executions of the grouping algorithms. Let Pt-1 and Pt be the group sets generated by the grouping algorithm at time t-1 and t respectively. The correlation algorithm updates the ID set of Pt, so that IDGt = IDGt-1, where $G_t \in P_t$ and $G_{t-1} \in
P_{t-1}$, if and only if Gt and Gt-1 considered to represent the same logical role. More specifically, the connection patterns of the members of Gt and those of Gt-1 are very similar. The groups correlation algorithm correlates the IDGt and IDGt-1 in a meaningful manner and thus allow applications to preserve data specific to a particular group.

The role correlation algorithm:

First, the correlation algorithm eliminates the differences between the two host sets, It and It-1, so that it can compare the connection patterns meaningfully. The algorithm computes the set of nodes that existed at time t-1 but have been removed in time t ( $I_{t-1} \setminus I_{t}$), and the set of nodes that only appear at time t ( $I_{t} \setminus I_{t-1}$). All new nodes are removed from It and deleted nodes are removed from It-1. Thus, the changes in connection set of each host is only as a direct result of changing connection patterns between the host and its neighbors (which existed at time t).

Second, the algorithm heuristically identifies the set, $H_{\textit{same}}$, of nodes that are very like to play the same logical roles from t-1 to t. We say that the two nodes ht and ht-1 are highly likely to be the same machine (i.e. it hasn't changed its logical role) if they have the identical connection sets. Specifically, $H_{\textit{same}} = \{ h_t \vert
\exists h_{t-1} \in I_{t-1}, C(h_t) = C(h_{t-1})\}$. We will explain shortly how we use the fact that a host $h \in H_{\textit{same}}$ to our advantage in computing the time varying similarity measure.

The role correlation algorithm will determine whether the two groups Gt and Gt-1 are the same group by heuristically computing the time-varying similarity measure and comparing against the pre-defined threshold. The group correlation algorithm operates as follows:

  1. For each group Gt, identify Gt and Gt-1 as the same group if i) Gt-1 has the strongest time-varying similarity with Gt, among all the groups in Pt-1 and ii) the average number of connections is at least within Thi percent of the average number of connections of Gt-1.

  2. For each group pair (Gt, Gt-1) that remain uncorrelated, decide whether Gt and Gt-1 represent the same logical group based on how similar the connection patterns between Gt and its neighbor groups are to those between Gt-1 and its neighbor groups.

Step 1 decides whether the two groups Gt and Gt-1 are identical based on the time varying similarity measure. As in Section 4.2, we compute the similarity measure based on the average number of connections between the groups and their common neighbors. However, finding the common neighbor set between Gt and Gt-1 is not trivial. This is because we cannot simply assume that a neighbor $h_t \in C(G_t)$ and a neighbor $h_{t-1} \in C(G_{t-1})$ are the same host even if they have the same host identifier. We use the following techniques to identify the common neighbor set:

The algorithm then computes the time-varying similarity measure between each neighbor pair (ht, ht-1), which meets the aforementioned requirements, as the minimum of the average number of connections between ht and Gt and between ht-1 and Gt-1. If the sum of the similarity measures for all common neighbor pairs within the bounds of the specified thresholds , the algorithm declares that groups Gt and Gt-1 mean the same group.

next up previous
Next: Results Up: Role Correlation Previous: Challenges
Godfrey Tan 2003-04-01