Dns2Vec: Exploring Internet Domain Names through Deep Learning

Monday, August 12, 2019 - 4:00 pm-4:30 pm

Amit Arora, Hughes Network Systems

Abstract: 

The concept of vector space embeddings was first applied in Natural Language Processing (NLP) but has since been applied to several other domains where there is an aspect of semantic similarity. Here we apply vector space embeddings to Internet domain names; we call this Dns2Vec. A corpus of Domain Name System (DNS) queries was created from traffic from a large Internet Service Provider (ISP). A skip-gram word2vec model was used to create embeddings for domain names. The objective was to find similar domains and examine whether domains in the same category (news, shopping, etc.) cluster together. The embeddings could then be used for several traffic engineering applications such as shaping, content filtering, and prioritization, and also for predicting browsing sequences and detecting anomalies. The results were confirmed by manually examining similar domains returned by the model, visualizing clusters using t-SNE, and using a third-party web categorization service (Symantec K9).
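
As an illustration of the skip-gram setup described above, the following minimal Python sketch shows how ordered DNS queries might be turned into (target, context) training pairs, treating each domain name as a "word". It is not the talk's implementation: the grouping of queries into per-client sessions, the function name `skipgram_pairs`, the window size, and the `.example` domains are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def skipgram_pairs(
    sessions: Iterable[List[str]], window: int = 2
) -> Iterator[Tuple[str, str]]:
    """Yield (target, context) domain pairs from ordered DNS query sessions.

    For each queried domain, every other domain within `window` positions
    in the same session is emitted as a context, as in skip-gram word2vec.
    """
    for session in sessions:
        for i, target in enumerate(session):
            lo = max(0, i - window)
            hi = min(len(session), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a domain is not its own context
                    yield target, session[j]


# Hypothetical toy corpus: each inner list is assumed to be one client's
# DNS queries in time order. Real data would come from ISP traffic logs.
toy_sessions = [
    ["news.example", "cdn.example", "ads.example"],
    ["shop.example", "pay.example"],
]

pairs = list(skipgram_pairs(toy_sessions, window=1))
for target, context in pairs:
    print(target, "->", context)
```

Pairs like these are what a skip-gram model learns from: domains that tend to be queried near each other end up with similar embedding vectors. In practice a library word2vec trainer would generate such pairs internally from the session lists.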

Amit Arora, Hughes Network Systems

Data scientist at Hughes Network Systems. Graduated from the M.S. in Data Science program at Georgetown University in December 2018. Love working with data, R and Python, Machine Learning, AutoML, Apache Spark, Flink, Deep Learning, NLP, Shiny, Elasticsearch, AWS, GCP, Databricks. Have a flair for teaching.

Before transitioning to a full-time data scientist role, I had more than 18 years of work experience in the satellite networking domain. Have worked extensively on satellite systems, with direct experience on satellite modems as well as hub-side gateways. Worked on several key technologies related to IPv6, IMS, traffic acceleration, traffic shaping, encryption, routing, layer 2 protocols, FIPS 140-2 certification, diagnostics, etc.

BibTeX
@conference {238499,
author = {Amit Arora},
title = {{Dns2Vec}: Exploring Internet Domain Names through Deep Learning},
year = {2019},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = aug
}