Functional model-based curve clustering for discovering temporal patterns in chronological corpora
TREVISANI, MATILDE;
2012-01-01
Abstract
In many applications of textual analysis, corpora include texts that have a chronological order. In a typical bag-of-words approach, the data of a chronological corpus are organized as a word-type × time-point contingency table, in which the row frequencies represent the temporal trajectories of "words". In this setting, the major objectives of analysis are to find clusters of words that portray similar temporal patterns and, possibly, to determine prototype patterns of evolution. We propose applying a class of wavelet-based functional clustering mixed models to address the specific issues posed by these data, which are highly sparse over time and individually heterogeneous, as well as high-dimensional. A wavelet representation can accommodate a wider range of functional shapes than splines, such as peak-like curves, and is more computationally efficient. Moreover, it proves useful for inspecting the temporal process of the corpus at different scales. The procedures are tested on corpora of different text genres.
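To make the data layout concrete, the following is a minimal sketch, not the method of the paper: it builds a word-type × time-point frequency table from tokenized texts and computes a Haar wavelet decomposition of one word trajectory. The function names `word_time_table` and `haar_dwt` are hypothetical, the Haar basis is chosen only for simplicity, and the clustering mixed model itself is not shown.

```python
from collections import Counter
import math


def word_time_table(docs_by_time):
    """Build a word-type x time-point table of raw frequencies.

    docs_by_time: list of token lists, one per time point,
    already sorted in chronological order (an assumption of this sketch).
    Returns a dict mapping each word to its frequency trajectory.
    """
    counts = [Counter(tokens) for tokens in docs_by_time]
    vocab = sorted(set().union(*counts))
    return {word: [c[word] for c in counts] for word in vocab}


def haar_dwt(signal):
    """Full orthonormal Haar decomposition of a trajectory.

    The length of `signal` must be a power of two.
    Returns (approximation coefficient, detail levels ordered
    from the coarsest scale to the finest scale).
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(x) for x in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Differences capture local change at this scale.
        details.insert(0, [(a - b) / math.sqrt(2) for a, b in pairs])
        # Scaled sums carry the smoothed trajectory to the next scale.
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx[0], details
```

In a sketch like this, each row of the table would be passed to `haar_dwt`, and the resulting coefficients, rather than the raw frequencies, would be the input to a clustering procedure. The coarse levels describe long-run trends, while the fine levels pick up short peaks.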