Effects on curve clustering of different transformations of chronological textual data

TREVISANI, MATILDE
2016-01-01

Abstract

Chronological corpora are collections of texts ordered in time. In bag-of-words approaches, the data typically consist of the frequencies of individual words in a set of texts grouped into equidistant time points. In our work, the temporal course of a word's occurrences is viewed as a proxy for its life-cycle: the basic objectives are to recognize temporal shapes and to cluster words with similar life-cycles. However, the strong asymmetry of the frequency spectrum typical of textual data has to be taken into account when defining the specific purpose of clustering and, hence, any further processing of the data. Adopting a functional data approach and distance-based curve clustering, we examine the effect of selected data transformations on the resulting word groups.
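The abstract does not list the specific transformations or the distance used, so the following is only a minimal Python sketch under stated assumptions, not the authors' procedure. It uses hypothetical word counts, three illustrative transformations (raw counts, log(1+x), and relative profiles obtained by dividing by each word's total), a discretized L2 distance between curves observed on the same equidistant grid (skipping the smoothing step a functional data analysis would normally include), and average-linkage agglomerative clustering. It illustrates how the frequency asymmetry can make the choice of transformation decide whether words group by frequency level or by life-cycle shape.

```python
import math
from itertools import combinations

# Hypothetical toy data (not from the paper): occurrences of four words in
# five equidistant time points. Two words rise and two decline; two are
# frequent and two are rare.
FREQS = {
    "alpha": [100, 200, 300, 400, 500],  # frequent, rising
    "beta": [500, 400, 300, 200, 100],   # frequent, declining
    "gamma": [1, 2, 3, 4, 5],            # rare, rising
    "delta": [5, 4, 3, 2, 1],            # rare, declining
}


def raw(curve):
    # No transformation: frequency level dominates the distances.
    return [float(x) for x in curve]


def log1p(curve):
    # Compresses the long right tail of the frequency spectrum.
    return [math.log1p(x) for x in curve]


def profile(curve):
    # Divides by the word's total count, keeping only the temporal shape.
    total = sum(curve)
    return [x / total for x in curve]


def l2_distance(f, g):
    # Discrete approximation of the L2 distance between two curves sampled
    # on the same equidistant grid (assumption: no smoothing is applied).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))


def average_linkage(curves, k):
    """Agglomerative clustering with average linkage until k groups remain."""
    clusters = [[name] for name in curves]
    dist = {
        frozenset(pair): l2_distance(curves[pair[0]], curves[pair[1]])
        for pair in combinations(curves, 2)
    }

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > k:
        # Merge the two closest clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


def cluster_words(transform, k=2):
    curves = {word: transform(freqs) for word, freqs in FREQS.items()}
    return average_linkage(curves, k)


if __name__ == "__main__":
    for transform in (raw, log1p, profile):
        print(transform.__name__, cluster_words(transform))
```

With this toy data, raw counts and log(1+x) both separate the frequent words from the rare ones, whereas the relative profiles put the two rising words together and the two declining words together. The log transform reduces the gap between frequency levels but, on its own, does not remove it.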
Files in this item:
TrevisaniTuzzi_cladag2015_cod.pdf

Closed access

Description: Main article
Type: Publisher's version
License: Digital Rights Management not defined
Size: 1.47 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/2846552