Check out the new USENIX Web site. next up previous
Next: Role Correlation Up: Role Classification Previous: Forming Groups


Merging Groups

The aforementioned group formation algorithm that uses a simple definition of similarity tends to produce too many groups in many situations. Consider, for example, the network in Figure 1 modified so that Sales-1 communicates with the Mail and SalesDatabase servers but not the Web server. The grouping algorithm in Section 4 will create two groups for the sales hosts, one that only contains Sales-1 and one that contains the rest of the sales hosts. This might be appropriate, but it is probably not what a network administrator would want. The group merging phase builds on the results generated by the group formation phase. It merges groups that are similar in connection habits in a way that allows users to control the process so that more meaningful results can be achieved. During the grouping phase, we merge two groups if they meet the following two requirements:
Similarity requirement.
The similarity measure between the two groups exceeds a user-specified threshold.
Connection requirement.
The average number of connections of each group is comparable.
The algorithm repeatedly merges two groups that meet the two requirements and have the highest similarity measure until no groups can be merged. The K value of a newly merged group is set to the minimum number of connections a host in the group has.
Figure 3: Pseudo-code for determining the similarity and connection requirements.
\fbox{
\begin{minipage}{3in}
\begin{tabbing}
\hspace*{0.5cm}\=\hspace{0.5cm}\...
...+ \frac{s}{c2})$\\
\>return $s * 100$\\
\}
\end{tabbing}
\end{minipage}
}
Figure 3 depicts the pseudo-code for determining the average connection requirement and the similarity requirement. The procedure MEETCONNECTIONREQ decides whether the two groups, G1 and G2, meet the connection requirement. This requirement keeps a group with a large number of connections from merging with another group with a much smaller number of connections. The procedure MEETSIMILARITYREQ determines whether the two groups meet the similarity requirement. Shi and Slo, Shi > Slo, are similarity thresholds that can be set by the user to control the merging process. Which threshold is used depends upon whether $\textit{max}(K_{G_1}, K_{G_2}) \geq K^{hi}$, where Khi is a constant intended to define whether a KG value is ``high.'' The similarity threshold for merging groups is higher for groups with a high KG value, those groups whose member hosts share a high number of common neighbors. This is because merging two groups can change the relationships between other groups in a way that induces additional undesirable merges. Again, consider the groups in the example network illustrated in Figure 1. Notice that if N is large, the similarity measure between the SalesDatabase group and the Mail and Web group will be large. Similarly, for large M, the SourceRevisionControl group will be highly similar to the Mail and Web group. If all three groups were to merge, it would effectively cause the sales group and the engineering group to merge, resulting in a partitioning with two groups: one containing all the servers, and one containing all other hosts. In most situations, this grouping would be undesirable since the network administrators lose the important separation between the Sales machines and the Eng machines. For these reasons, groups with high KG values are required to have a higher similarity measure to merge. We discuss how best to choose the constants in Section 6. SIMILARITY computes the similarity s (between 0 and 100) of connection patterns between two groups. CP(x1, x2) returns the total number of connections between x1 and x2, where xi could either be a host or a group. Two groups are considered similar if they have many common neighboring groups and similar average numbers of connections. For example, if the set of neighbors of G1 is a subset of the set of neighbors of G2, it increases the desirability of merging these two groups. However, if the average numbers of connections of G1 and G2 are quite different, the desirability of merging them is lessened.
next up previous
Next: Role Correlation Up: Role Classification Previous: Forming Groups
Godfrey Tan 2003-04-01