A methodology for dealing with spatial big data

SCHOIER, GABRIELLA;BORRUSO, GIUSEPPE
2017-01-01

Abstract

Spatial data mining (SDM) is the extraction of knowledge from spatial data. Recently, location-based services have enabled the collection of large amounts of geo-referenced data, i.e., spatial big data (SBD). Spatial datasets often exceed the capacity of current computing systems to manage them with reasonable effort; data-intensive computing and data mining techniques are therefore useful tools for analysis. In this paper, we present an approach to clustering high-dimensional data that allows flexible statistical modelling of phenomena characterised by unobserved heterogeneity. Numerous clustering algorithms have been developed for large databases; density-based algorithms are particularly suited to handling the huge amounts of data held in large spatial databases. We present the Modified Density-Based Spatial Clustering of Applications with Noise (MDBSCAN) algorithm and compare it with the classical k-means approach. Both are applied to synthetic datasets and to a dataset of satellite images.
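
As background for the density-based family mentioned in the abstract, the following is a minimal sketch of the classical DBSCAN procedure, not the authors' MDBSCAN, whose modifications are not described on this page. The function names, the parameter values `eps=0.6` and `min_pts=4`, and the toy grid data are illustrative assumptions.

```python
import math

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet examined


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Classical DBSCAN (assumed textbook form); returns one label per point."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1           # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also core: keep expanding
    return labels


# Hypothetical toy data: two 5x5 grids with spacing 0.5 and one distant point.
grid_a = [(0.5 * x, 0.5 * y) for x in range(5) for y in range(5)]
grid_b = [(10 + 0.5 * x, 10 + 0.5 * y) for x in range(5) for y in range(5)]
points = grid_a + grid_b + [(30.0, -30.0)]

labels = dbscan(points, eps=0.6, min_pts=4)
```

Unlike k-means, this procedure does not take the number of clusters as input and can leave isolated points unassigned; here the two grids form two clusters and the distant point is labelled as noise.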
Files in this item:

2017_IJBIDM_1705_FPV.pdf
Access: closed
Description: main article
Type: publisher's version
License: Digital Rights Management not defined
Size: 321.3 kB
Format: Adobe PDF

2914562_2017_IJBIDM_1705_FPV-PostPrint.pdf
Access: open
Type: final post-refereeing draft (post-print)
License: Digital Rights Management not defined
Size: 328.56 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/2914562
Citations
  • Scopus 9