Distance measures for exploring pairs of novels in a large corpus of Italian literature

M. Trevisani
2020-01-01

Abstract

In text clustering, most distance-based methods summarize the occurrences of a set of linguistic features to obtain a distance. This distance should decrease when texts are written by the same author; however, further properties might influence the result: the gender of the authors, their age, their geographical origin, the publication date of the novels, their size, etc. In this study, regression analyses compare the performance of three distances and highlight, among the available covariates, the preeminent effect of the author's hand as well as interesting patterns in the effect of the novels’ size.
Year: 2020
ISBN: 9788891910776

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/2996993