Understanding Network Traffic through Lightweight Hierarchical Clustering

Abdulrahman Hijazi, Hajime Inoue, Dana Jansens, Ashraf Matrawy, Paul van Oorschot, Anil Somayaji
Carleton Computer Security Laboratory (CCSL)
Carleton University, Ottawa, Canada

The complexity of current Internet applications makes understanding network traffic a challenging task. By providing larger-scale aggregates for analysis, unsupervised clustering approaches can greatly aid in identifying new applications, attacks, and other changes in network usage patterns. ADHIC (Approximate Divisive HIerarchical Clustering) is a new algorithm that clusters similar network traffic together without prior knowledge of protocol structures. Packet similarity is determined by comparing substrings within packets at distinguishing offsets. ADHIC is notable in that it:

* produces a hierarchical decomposition of network traffic in the form of a cluster-identifying decision tree,
* needs only a small fraction of packets (about 3% in our traces) to generate a decision tree, and
* generates a decision tree that can be used to cluster packets at wire speeds (250 Mbit/sec in an unoptimized software implementation).

We find that ADHIC appropriately segregates well-known protocols, clusters together traffic of the same protocol running on multiple ports, and segregates traffic from applications, such as p2p, that do not use standard ports. NetADHICT, our implementation of ADHIC, is available for download at https://www.ccsl.carleton.ca/software and is licensed under the GNU GPL.
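To make the idea of a cluster-identifying decision tree built from substring comparisons at fixed offsets more concrete, the following is a minimal Python sketch. It is not NetADHICT's code or data format. The `Node` and `classify` names, the node layout, and the offsets and byte patterns in the example tree are all hypothetical and were chosen only for illustration. The sketch shows just the classification step: given a tree that already exists, a packet is routed to a leaf by testing, at each internal node, whether a given byte string appears at a given offset. It does not model how ADHIC selects distinguishing offsets or builds the tree from its packet sample.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical decision-tree node (illustrative only, not NetADHICT's format).

    An internal node tests whether `substring` appears at byte `offset`
    of the packet. A leaf carries only a cluster label.
    """
    cluster: Optional[str] = None   # set on leaves only
    offset: int = 0                 # byte offset tested at an internal node
    substring: bytes = b""          # byte string expected at that offset
    match: Optional[Node] = None    # child followed when the test succeeds
    miss: Optional[Node] = None     # child followed otherwise


def classify(node: Node, packet: bytes) -> str:
    """Walk from the root to a leaf. Each step is one fixed-offset comparison."""
    while node.cluster is None:
        end = node.offset + len(node.substring)
        # Slicing past the end of a short packet yields fewer bytes, so the
        # comparison simply fails instead of raising an error.
        hit = packet[node.offset:end] == node.substring
        node = node.match if hit else node.miss
    return node.cluster


# Hypothetical two-level tree; the offsets and patterns are invented.
tree = Node(
    offset=0, substring=b"GET ",
    match=Node(cluster="cluster-A"),
    miss=Node(
        offset=4, substring=b"\x00\x01",
        match=Node(cluster="cluster-B"),
        miss=Node(cluster="cluster-C"),
    ),
)

print(classify(tree, b"GET /index.html HTTP/1.1"))      # cluster-A
print(classify(tree, b"\xab\xcd\x01\x00\x00\x01rest"))  # cluster-B
print(classify(tree, b"hello"))                         # cluster-C
```

In this sketch, each packet costs one short byte comparison per tree level. That suggests why clustering with an already-built tree can be cheap compared with building it. The sketch makes no claim about the throughput reported above.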