
Effects of Similarity Thresholds

In this subsection, we examine how the choice of the user-defined thresholds $S^{lo}$, $S^{hi}$, and $K^{hi}$ affects the number of groups formed by the role classification algorithm. Recall that two groups are merged if and only if their similarity measure is $\geq S^{lo}$. Furthermore, if the maximum KG associated with the groups is $\geq K^{hi}$, they are not merged unless their similarity measure is $\geq S^{hi}$. We require that $0 \leq S^{lo} < S^{hi} \leq 100$.
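
The merge rule recalled above can be written as a small predicate. The following Python sketch is only illustrative: the function name `should_merge` and its parameter names are assumptions made for this example, not the implementation used in the experiments, and `max_kg` simply stands for the maximum KG associated with the two groups.

```python
def should_merge(similarity, max_kg, s_lo, s_hi, k_hi):
    """Return True if two groups may be merged under the given thresholds.

    similarity: similarity measure between the two groups (0 to 100)
    max_kg:     maximum KG associated with the two groups
    s_lo, s_hi: similarity thresholds, with 0 <= s_lo < s_hi <= 100
    k_hi:       KG value from which the stricter threshold applies
    """
    if not 0 <= s_lo < s_hi <= 100:
        raise ValueError("thresholds must satisfy 0 <= s_lo < s_hi <= 100")
    # Groups whose maximum KG reaches k_hi must meet the stricter threshold.
    required = s_hi if max_kg >= k_hi else s_lo
    return similarity >= required
```

With hypothetical thresholds `s_lo=70`, `s_hi=90`, and `k_hi=7`, a pair of groups with similarity 75 and maximum KG 3 would be merged, whereas the same similarity with maximum KG 12 would not, because the stricter threshold applies.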

Figure 6 illustrates how $S^{lo}$ affects the total number of groups formed for both the Mazu and BigCompany networks. The number of groups increases with $S^{lo}$. Again, a large $S^{lo}$ value keeps more groups from merging, and as a result the total number of groups remains large.

The number of groups does not necessarily increase smoothly with $S^{lo}$. For instance, there is a steeper incline (a knee) in the number of groups for the BigCompany network when $S^{lo}$ is increased from 70 to 90. The reason is that the increase in $S^{lo}$ causes some groups with high numbers of connections to split, since they no longer meet the stricter similarity requirement to merge. This in turn causes several neighboring groups to split. The extent to which such splits occur varies from network to network. A knee in the curve indicates that the algorithm can expose the logical structure of the network in two significantly different ways. Consider again the network in Figure 1. If $S^{lo}$ is too low, Mail, Web, SalesDatabase, and SourceRevisionControl will all be placed in one group, whereas all sales and engineering machines will be placed in another. In some cases, such a grouping might be more appropriate than the one shown in Figure 1. Network administrators should compare the grouping results on both sides of the knee and decide which one better suits their needs.

Our experiments show that as long as $S^{hi} \geq 80$, changes to $S^{hi}$ hardly affect the grouping results. Therefore, we suggest that $S^{hi}$ be fixed.

On the other hand, the choice of $K^{hi}$ has a significant impact and should probably vary from network to network. If $K^{hi}$ is set to the maximum number of connections that any host has, the similarity measure between hosts is compared only against $S^{lo}$. If $K^{hi} = 0$, the similarity measure is compared only against $S^{hi}$. Ideally, $K^{hi}$ should be set to a value that partitions the hosts in the network into two groups, one containing all server-like machines and one containing all others.

Figure 7 shows how $K^{hi}$ affects the number of groups formed. For any two data points with the same number of groups, the grouping results are identical. Clearly, the grouping results do not change for the Mazu network when $K^{hi} \geq 4$. Similarly, the grouping results hardly change for the BigCompany network when $K^{hi} \geq 3$. This implies that it is not too difficult to find an appropriate $K^{hi}$ for a particular network. By default, we set $K^{hi} = 7$ and believe that this value will be suitable for most networks. Nevertheless, we are currently working on setting $K^{hi}$ automatically.
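
One simple way to read such a sweep is to look for the smallest tested $K^{hi}$ from which the number of groups no longer changes. The Python sketch below illustrates this idea only; it is not the automatic method mentioned above, the function name `stable_threshold` is an assumption, and the counts in the usage note are made up rather than taken from Figure 7. It relies on the observation that, in these experiments, equal group counts corresponded to identical grouping results.

```python
def stable_threshold(group_counts):
    """Return the smallest tested K^hi from which the group count stays constant.

    group_counts: dict mapping each tested K^hi value to the number of
    groups formed with that setting (hypothetical input from a sweep).
    """
    if not group_counts:
        raise ValueError("group_counts must not be empty")
    ks = sorted(group_counts)
    final = group_counts[ks[-1]]
    stable = ks[-1]
    # Walk down from the largest tested value while the count is unchanged.
    for k in reversed(ks):
        if group_counts[k] != final:
            break
        stable = k
    return stable
```

For example, with the made-up counts `{1: 30, 2: 26, 3: 25, 4: 22, 5: 22, 6: 22, 7: 22}`, the function returns 4, meaning that any tested value of at least 4 yields the same number of groups.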

Godfrey Tan 2003-04-01