Challenges for clustering streaming data are getting continuously more sophisticated. This trend is driven by the the emerging requirements of the application where those algo- rithms are used and the properties of the stream itself. Some of these properties are the continuous data arrival, the time-critical processing of objects, the evolution of the data streams, the presence of outliers and the varying densities of the data. Due to the fact that the stream evolves continuously in the process of its existence, it is crucial that the algorithm autonomously detects clusters of arbitrary shape, with different densities, and varying number of clusters. Recently, the first hierarchical density-based stream clustering algorithm based on cluster stability, called HASTREAM, was proposed. Although the algo- rithm was able to meet the above mentioned requirements, it inherited the main drawback of density-based hierarchical clustering algorithms, namely the efficiency issues. In this paper we propose I-HASTREAM , a first density-based hierarchical clustering algorithm that has considerably less computational time than HASTREAM. Our proposed method utilizes and introduces techniques from the graph theory domain to devise an incremental update of the underlying model instead of repeatedly performing the expensive calculations of the huge graph. Specifically the Prim’s algorithm for constructing the minimal spanning tree is adopted by introducing novel, incremental maintenance of the tree by vertex and edge insertion and deletion. The extensive experimental evaluation study on real world datasets shows that I-HASTREAM is considerably faster than a state-of-the-art hierarchi- cal density-based stream clustering approach while delivering almost the same clustering quality.

Adaptive Stream Clustering Using Incremental Graph Maintenance

CUZZOCREA, Alfredo Massimiliano;
2015

Abstract

Challenges for clustering streaming data are getting continuously more sophisticated. This trend is driven by the the emerging requirements of the application where those algo- rithms are used and the properties of the stream itself. Some of these properties are the continuous data arrival, the time-critical processing of objects, the evolution of the data streams, the presence of outliers and the varying densities of the data. Due to the fact that the stream evolves continuously in the process of its existence, it is crucial that the algorithm autonomously detects clusters of arbitrary shape, with different densities, and varying number of clusters. Recently, the first hierarchical density-based stream clustering algorithm based on cluster stability, called HASTREAM, was proposed. Although the algo- rithm was able to meet the above mentioned requirements, it inherited the main drawback of density-based hierarchical clustering algorithms, namely the efficiency issues. In this paper we propose I-HASTREAM , a first density-based hierarchical clustering algorithm that has considerably less computational time than HASTREAM. Our proposed method utilizes and introduces techniques from the graph theory domain to devise an incremental update of the underlying model instead of repeatedly performing the expensive calculations of the huge graph. Specifically the Prim’s algorithm for constructing the minimal spanning tree is adopted by introducing novel, incremental maintenance of the tree by vertex and edge insertion and deletion. The extensive experimental evaluation study on real world datasets shows that I-HASTREAM is considerably faster than a state-of-the-art hierarchi- cal density-based stream clustering approach while delivering almost the same clustering quality.
http://www.jmlr.org/proceedings/papers/v41/
http://www.jmlr.org/proceedings/papers/v41/hassani15.html
File in questo prodotto:
File Dimensione Formato  
Adaptive Stream Clustering Using Incremental Graph Maintenance.pdf

non disponibili

Descrizione: articolo principale
Tipologia: Documento in Versione Editoriale
Licenza: Digital Rights Management non definito
Dimensione 1.2 MB
Formato Adobe PDF
1.2 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11368/2872362
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact