Clustering Textual Data by Latent Dirichlet Allocation: Application and Extension to Hierarchical Data
TORELLI, Nicola
2010-01-01
Abstract
Latent Dirichlet Allocation (LDA) is a generative probabilistic model that can be used to describe and analyse textual data. We extend the basic LDA model to search and classify a large set of administrative documents, taking into account the clear hierarchical structure of the textual data. This can be considered a general approach to the analysis of short texts that are semantically linked to larger texts. Some preliminary empirical evidence that supports the proposed model is presented.
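To make "generative probabilistic model" concrete, here is a minimal sketch of the standard (flat) LDA generative process: each topic is a distribution over the vocabulary, each document draws its own mixture over topics, and each word is produced by first picking a topic and then a word from that topic. This is the textbook formulation only, not the hierarchical extension proposed in the paper, whose details are not given in this record. The function names, parameter names, and default values (`sample_dirichlet`, `generate_corpus`, `alpha`, `eta`) are illustrative assumptions.

```python
import random


def sample_dirichlet(rng, concentration, dim):
    """Draw from a symmetric Dirichlet by normalising independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(n_docs, doc_len, n_topics, vocab_size,
                    alpha=0.5, eta=0.5, seed=0):
    """Simulate documents from the basic LDA generative process.

    alpha and eta are assumed symmetric Dirichlet concentrations for the
    document-topic and topic-word distributions respectively.
    """
    rng = random.Random(seed)
    # One word distribution per topic.
    topics = [sample_dirichlet(rng, eta, vocab_size) for _ in range(n_topics)]
    docs, mixtures = [], []
    for _ in range(n_docs):
        # Per-document topic proportions.
        theta = sample_dirichlet(rng, alpha, n_topics)
        doc = []
        for _ in range(doc_len):
            # Pick a topic for this word position, then a word from that topic.
            z = rng.choices(range(n_topics), weights=theta)[0]
            w = rng.choices(range(vocab_size), weights=topics[z])[0]
            doc.append(w)
        docs.append(doc)
        mixtures.append(theta)
    return docs, mixtures, topics
```

Calling `generate_corpus(5, 20, 3, 10)` returns five documents of twenty word indices each, together with the topic mixtures and topic-word distributions that produced them. Fitting LDA to real text runs this process in reverse, inferring the mixtures and topics from the observed words.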