Effects of Similarity Thresholds

In this subsection, we examine how the choice of the user-defined thresholds $S^{lo}$, $S^{hi}$, and $K^{hi}$ affects the number of groups formed by the role classification algorithm. Recall that two groups are merged only if their similarity measure is $\geq S^{lo}$. Furthermore, if the maximum $K_G$ associated with the two groups is $\geq K^{hi}$, they are not merged unless their similarity measure is $\geq S^{hi}$. We require that $0 \leq S^{lo} < S^{hi} \leq 100$.
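
The merge test just described can be sketched as a small predicate. This is a minimal illustration, not the authors' implementation: the function and parameter names are hypothetical, the similarity measure is assumed to be on the same 0 to 100 scale as the thresholds, and `max_k` stands for the larger of the two groups' $K_G$ values.

```python
def should_merge(similarity: float, max_k: int,
                 s_lo: float, s_hi: float, k_hi: int) -> bool:
    """Return True if two groups are eligible to be merged.

    similarity: similarity measure between the two groups (assumed 0..100).
    max_k: the larger of the two groups' K_G values (hypothetical name).
    """
    # The thresholds must satisfy 0 <= S_lo < S_hi <= 100.
    if not 0 <= s_lo < s_hi <= 100:
        raise ValueError("require 0 <= S_lo < S_hi <= 100")
    # Groups whose maximum K_G reaches K_hi must clear the stricter S_hi;
    # all other pairs only need to clear S_lo.
    threshold = s_hi if max_k >= k_hi else s_lo
    return similarity >= threshold
```

For example, with $S^{lo} = 60$, $S^{hi} = 90$, and $K^{hi} = 7$ (illustrative values), a similarity of 75 permits a merge when the larger $K_G$ is 3 but not when it is 10.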

Figure 6 illustrates how $S^{lo}$ affects the total number of groups formed for both the Mazu and BigCompany networks. The number of groups increases with $S^{lo}$: a larger $S^{lo}$ prevents more groups from merging, so more groups remain at the end.

The number of groups does not necessarily increase smoothly with $S^{lo}$. For instance, the number of groups for the BigCompany network rises steeply (a knee) when $S^{lo}$ is increased from 70 to 90. This happens because the higher $S^{lo}$ causes some groups with high numbers of connections to split, since they no longer meet the stricter similarity requirement for merging; this in turn causes several neighboring groups to split as well. The extent of such splitting varies from network to network. A knee in the curve indicates that the algorithm can expose the logical structure of the network in two significantly different ways. Consider again the network in Figure 1. If $S^{lo}$ is too low, Mail, Web, SalesDatabase, and SourceRevisionControl will all be placed in one group, whereas all sales and engineering machines will be placed in another. In some cases, such a grouping might be more appropriate than the one shown in Figure 1. Network administrators should therefore compare the grouping results on both sides of the knee and decide which one better suits their needs.
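
When $S^{lo}$ is swept over a range of values, one simple way to locate a candidate knee is to look for the pair of consecutive sweep points where the group count rises most steeply. The sketch below assumes the sweep results are available as ($S^{lo}$, number of groups) pairs; the function name and this steepest-slope heuristic are our own illustrative choices, not part of the role classification algorithm.

```python
from typing import Sequence, Tuple


def steepest_rise(sweep: Sequence[Tuple[float, int]]) -> Tuple[float, float]:
    """Return the (S_lo, next S_lo) interval where the group count rises fastest.

    sweep: (S_lo value, number of groups) pairs from repeated runs
    (hypothetical input format). Uses the largest slope between consecutive
    points as a simple knee heuristic.
    """
    pts = sorted(sweep)
    if len(pts) < 2:
        raise ValueError("need at least two sweep points")
    if len({s for s, _ in pts}) != len(pts):
        raise ValueError("S_lo values must be distinct")

    def slope(pair):
        (s1, g1), (s2, g2) = pair
        return (g2 - g1) / (s2 - s1)

    # max() keeps the first interval among ties, i.e. the lowest S_lo.
    (s1, _), (s2, _) = max(zip(pts, pts[1:]), key=slope)
    return s1, s2
```

The returned interval brackets the two settings whose groupings an administrator would compare. For instance, on the made-up sweep `[(50, 10), (60, 12), (70, 15), (80, 30), (90, 34)]`, which is not data from Figure 6, it returns `(70, 80)`.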

Our experiments show that as long as $S^{hi} \geq 80$, changes to $S^{hi}$ hardly affect the grouping results. We therefore suggest fixing $S^{hi}$ at a value in this range rather than tuning it for each network.

On the other hand, the choice of $K^{hi}$ has a significant impact and should probably vary from network to network. If $K^{hi}$ is set to the maximum number of connections that any host has, the similarity measure between hosts is only compared against $S^{lo}$. If $K^{hi} = 0$, the similarity measure is only compared against $S^{hi}$. Ideally, $K^{hi}$ should be set to a value that partitions the hosts in the network into two sets: one containing all server-like machines, and one containing all others.
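
The role of $K^{hi}$ as a cutoff between server-like machines and the rest can be sketched as follows. This assumes, for illustration only, that each host's $K$ value is its number of connections and that hosts at or above $K^{hi}$ are the ones whose groups must meet $S^{hi}$; the function name and the host counts in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def split_by_k_hi(host_k: Dict[str, int], k_hi: int) -> Tuple[List[str], List[str]]:
    """Split hosts into (server-like, other) lists using the K_hi cutoff.

    host_k maps a host name to its number of connections; treating this
    count as the host's K value is an assumption made for illustration.
    Hosts with K >= k_hi are treated as server-like, so groups containing
    them would have to clear S_hi to merge. Both lists are sorted by name.
    """
    server_like = sorted(h for h, k in host_k.items() if k >= k_hi)
    others = sorted(h for h, k in host_k.items() if k < k_hi)
    return server_like, others
```

With the hypothetical counts `{"Mail": 40, "Web": 35, "SalesDatabase": 12, "sales1": 3, "eng1": 4}` and $K^{hi} = 7$, the server-like set is Mail, SalesDatabase, and Web; with $K^{hi} = 0$, every host lands in the server-like set, matching the case where only $S^{hi}$ applies.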

Figure 7 shows how $K^{hi}$ affects the number of groups formed. In this experiment, any two data points with the same number of groups correspond to identical grouping results. The grouping results do not change for the Mazu network when $K^{hi} \geq 4$, and they hardly change for the BigCompany network when $K^{hi} \geq 3$. This suggests that an appropriate $K^{hi}$ is not difficult to find for a particular network. By default, we set $K^{hi} = 7$, and we believe this value will be suitable for most networks. Nevertheless, we are currently working on setting $K^{hi}$ automatically.
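
Reading a plateau off a $K^{hi}$ sweep such as the one in Figure 7 can be automated with a short helper. This is an illustrative sketch, not the automatic $K^{hi}$ selection mentioned above: it assumes the sweep is given as a mapping from each tried $K^{hi}$ to the number of groups formed, treats equal counts as equal groupings (as observed in this experiment), and uses a hypothetical tolerance `tol` to model results that "hardly change".

```python
from typing import Dict


def stable_k_hi(groups_by_k: Dict[int, int], tol: int = 0) -> int:
    """Return the smallest K_hi from which the group count stays stable.

    groups_by_k maps each K_hi tried in a sweep to the number of groups
    formed (hypothetical input format). Walking down from the largest K_hi,
    a point counts as stable if its group count differs from the count at
    the largest K_hi by at most tol.
    """
    if not groups_by_k:
        raise ValueError("empty sweep")
    ks = sorted(groups_by_k)
    final = groups_by_k[ks[-1]]
    start = ks[-1]
    for k in reversed(ks):
        if abs(groups_by_k[k] - final) > tol:
            break  # first unstable point below the plateau
        start = k
    return start
```

On the made-up sweep `{1: 20, 2: 15, 3: 12, 4: 11, 5: 11, 6: 11, 7: 11}`, which is not data from Figure 7, `stable_k_hi` returns 4.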

Godfrey Tan 2003-04-01