Reducing Data Dimension for Cluster Detection
TORELLI, Nicola;
2013-01-01
Abstract
Clustering high-dimensional data is often a challenging task, both because of the computational burden required to run any technique and because the difficulty of interpreting clusters generally increases with the data dimension. In this work, a method for finding low-dimensional representations of high-dimensional data is discussed, specifically conceived to preserve possible clusters in the data. It is based on the critical bandwidth, a nonparametric statistic used to test unimodality and related to kernel density estimation. Some useful properties of this statistic are highlighted, and an adjustment that allows it to serve as a basis for reducing dimensionality is suggested. The method is illustrated by simulated and real data examples.

File | Size | Format |
---|---|---|
Torelli_Reducing Data Dimension for Cluster Detection.pdf (article; publisher's version; license: publisher copyright; closed access) | 5.28 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.