Functional model-based curve clustering for discovering temporal patterns in chronological corpora

TREVISANI, MATILDE
2012-01-01

Abstract

In many applications of textual analysis, corpora include texts that have a chronological order. In a typical bag-of-words approach, data from chronological corpora are organized as word-type × time-point contingency tables, where the row frequencies represent the temporal trajectories of “words”. In this setting, the major objectives of analysis are finding clusters of words that portray similar temporal patterns and, possibly, determining prototype patterns of evolution. We propose applying a class of wavelet-based functional clustering mixed models to address the specific issues posed by these data, which are highly sparse over time and individually heterogeneous, besides being high-dimensional. A wavelet representation can accommodate a wider range of functional shapes, such as peak-like curves, and is more computationally efficient than splines. Moreover, it proves useful for inspecting the temporal process of the corpus at different scales. The procedures are tested on different text genres.
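The data layout and the multi-scale idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not name a wavelet family, so the Haar wavelet is assumed here purely for simplicity, the clustering mixed model itself is not shown, and the names `word_time_table` and `haar_dwt` are hypothetical helpers introduced for this example.

```python
from collections import Counter
from math import sqrt


def word_time_table(docs):
    """Build a word-type x time-point contingency table.

    docs is a list of (time_point, text) pairs. Returns the sorted time
    points and a dict mapping each word to its row of counts over time.
    Tokenization is a naive whitespace split (an assumption of this sketch).
    """
    times = sorted({t for t, _ in docs})
    index = {t: i for i, t in enumerate(times)}
    table = {}
    for t, text in docs:
        for word, n in Counter(text.lower().split()).items():
            row = table.setdefault(word, [0] * len(times))
            row[index[t]] += n
    return times, table


def haar_dwt(signal):
    """Full orthonormal Haar decomposition of one word trajectory.

    The length must be a power of two. The result is ordered as
    [approximation, coarsest detail, ..., finest details], so each
    block of coefficients describes the trajectory at one time scale.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = [float(x) for x in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Finer details were computed earlier, so prepend the coarser ones.
        details = [(a - b) / sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / sqrt(2) for a, b in pairs]
    return approx + details


# Toy corpus (invented for illustration): four time points.
docs = [(2001, "crisis bank"), (2002, "bank bank"),
        (2003, "crisis"), (2004, "bank")]
times, table = word_time_table(docs)
coeffs = {word: haar_dwt(row) for word, row in table.items()}
```

In a sketch like this, words would then be compared through their coefficient vectors rather than their raw counts, which is one way a sparse, peak-like trajectory can be summarized by a few non-zero coefficients.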

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/2635128