Check out the new USENIX Web site. next up previous
Next: Preliminary Results. Up: : Previous: Challenges and Caveats

Exposing hidden botnet connections

One of the most challenging facets of the botnet membership problem lies in discerning the relationship among (seemingly) different botnets. To highlight this, we examine the existence of hidden relations among the botnets we tracked. The presence of these relations raises new challenges to the accuracy of botnet population counting techniques. Specifically, for botnets that are related, is the aggregate population count simply the sum of the different botnet populations? Or more importantly, how do we characterize the overlap between different botnet populations? In what follows, we discuss our methodology for finding potential hidden relationships among botnets.

Figure 6: Example of a botnet cluster.
      # <DNS name>    <Channel>  <Server ID>   <Botmaster ID>    <Server Version>
     [1] #!GT!# IRC.Death.TeaM.KW [Lindi_Cracker]-1!HackPimp Unreal3.2.5
     [2] #!GT!# IRC.Death.TeaM.KW ChanServ!Coder Unreal3.2.5
     [3] #.rxb0t IRC.Death.TeaM.KW Chan!Coder Unreal3.2.5
     [4] #.rxbot IRC.Death.TeaM.KW Chan!Coder Unreal3.2.5

First, we create for each botnet a $ d$-dimensional structural feature vector $ \vec v_i = x_1, x_2, \dots , x_d$. We choose the following features to represent a botnet's unique identity:

  1. DNS name and/or IP address of IRC Server.
  2. IRC server or IRC network name (e.g., ToXiC.BoTnEt.Net).
  3. Server version (e.g., Unreal3.2.3).
  4. IRC channel name.
  5. Botmaster ID. All these IDs are extracted from the IRC trace by observing the identity of the user with operator privileges who posts commands to the channel.

To reveal the existence of clusters of related botnets we then create a proximity matrix $ \mathbf M$ by calculating a pair-wise scores across all botnet vectors, $ {\cal V} =
\vec{v_1},\vec{v_2},\dots,\vec{v_n}$. For a pair of vectors $ \vec{v_i}, \vec{v_j}$ the pair-wise score $ m_{i,j}$ is a weighted dot product of the two vectors.

$\displaystyle m_{i,j} = \sum_{k=0}^{d} w_k~ (x_{i,k} \cdot x_{j,k})$    

where $ w_k$ is the weight assigned to dimension $ k$ and the product of the two vector fields is one if they are identical, or zero otherwise. Considering that similarity in the names of the IRC servers implies strong correlation between two botnets, we assign a weight of 1.5 to the IRC server dimension, while all other dimensions are given equal weights of 0.5.

Given the matrix $ \mathbf M$, we infer related botnets by extracting botnet groups that have pairwise similarity scores above a threshold $ \delta$. We choose $ \delta= 1.5$, so that two botnets are related if they have the same IRC server DNS name or match in at least three other dimensions.

next up previous
Next: Preliminary Results. Up: : Previous: Challenges and Caveats
Fabian Monrose 2007-04-03