The identification of the services that generate traffic is crucial for ISPs and companies to plan and monitor the network. The widespread deployment of encryption and the convergence of the web services towards HTTP/HTTPS challenge traditional classification techniques. Algorithms to classify traffic are left with little information, such as server IP addresses, flow characteristics and queries performed at the DNS. Moreover, due to the usage of Content Delivery Networks and cloud infrastructure, it is unclear whether such coarse metadata is sufficient to differentiate the traffic. This paper studies to what extent basic information visible at flow-level measurements is useful for traffic classification on the web. By analyzing a large dataset of flow measurements, we quantify how often the same server IP address is used by different services, and how services use hostnames. Our results show that a very simple classifier that relies only on server IP addresses and on lists of hostnames can distinguish up to 55% of the traffic volume. Yet, collisions of names and addresses are common among popular services, calling for more ingenuity. This paper is a preliminary step in the evaluation of classification algorithms that are suitable for the modern Internet, where only minimal metadata collection will be possible in the network.

Towards web service classification using addresses and DNS

TREVISAN, MARTINO;
2016-01-01

Abstract

The identification of the services that generate traffic is crucial for ISPs and companies to plan and monitor the network. The widespread deployment of encryption and the convergence of the web services towards HTTP/HTTPS challenge traditional classification techniques. Algorithms to classify traffic are left with little information, such as server IP addresses, flow characteristics and queries performed at the DNS. Moreover, due to the usage of Content Delivery Networks and cloud infrastructure, it is unclear whether such coarse metadata is sufficient to differentiate the traffic. This paper studies to what extent basic information visible at flow-level measurements is useful for traffic classification on the web. By analyzing a large dataset of flow measurements, we quantify how often the same server IP address is used by different services, and how services use hostnames. Our results show that a very simple classifier that relies only on server IP addresses and on lists of hostnames can distinguish up to 55% of the traffic volume. Yet, collisions of names and addresses are common among popular services, calling for more ingenuity. This paper is a preliminary step in the evaluation of classification algorithms that are suitable for the modern Internet, where only minimal metadata collection will be possible in the network.
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11368/3024004
 Avviso

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 24
  • ???jsp.display-item.citation.isi??? 22
social impact