An Author Verification Approach Based on Differential Features
Bartoli, Alberto; Dagri, Alex; De Lorenzo, Andrea; Medvet, Eric; Tarlao, Fabiano
2015-01-01
Abstract
We describe the approach that we submitted to the 2015 PAN competition for the author identification task. The task consists of determining whether an unknown document was written by the same author as a set of documents that share a single author. We propose a machine learning approach based on a number of different features that characterize documents from widely different points of view. We construct non-overlapping groups of homogeneous features, use a random forest regressor for each feature group, and combine the outputs of all regressors by taking their arithmetic mean. We train a separate regressor for each language. Our approach ranked first in the final ranking for the Spanish language.
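The abstract describes the model only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each verification problem has already been turned into a fixed-length numeric feature vector (the paper's actual features, including the differential features named in the title, are not reproduced here), that the regression target is 1.0 for same-author and 0.0 for different-author problems, and that feature groups are given as lists of column indices. The names `TinyForest`, `GroupedVerifier`, and `train_per_language`, the group names, and all hyperparameters are hypothetical. To stay dependency-free, the random forest is a small pure-Python stand-in (bootstrap-sampled regression trees with a random feature subset at each split); in practice a library implementation such as scikit-learn's `RandomForestRegressor` would be the natural choice.

```python
# Illustrative sketch: all names and hyperparameters are hypothetical, not from the paper.
import random
from statistics import mean


def _sse(ys):
    """Sum of squared deviations from the mean: the impurity used to score a split."""
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def _grow(X, y, depth, max_depth, n_sub, rng):
    """Grow a regression tree: a leaf is a float (mean target), a node is (f, t, left, right)."""
    if depth >= max_depth or len(set(y)) == 1:
        return sum(y) / len(y)
    best = None  # (impurity, feature index, threshold)
    for f in rng.sample(range(len(X[0])), min(n_sub, len(X[0]))):  # random feature subset
        for t in sorted({row[f] for row in X})[:-1]:  # skip the max so both sides are non-empty
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every sampled feature is constant: stop here
        return sum(y) / len(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    args = (depth + 1, max_depth, n_sub, rng)
    return (f, t, _grow([X[i] for i in li], [y[i] for i in li], *args),
            _grow([X[i] for i in ri], [y[i] for i in ri], *args))


def _predict(node, row):
    while isinstance(node, tuple):  # descend until a leaf (a float) is reached
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class TinyForest:
    """Random forest regressor: bootstrap-sampled trees, random feature subset per split."""

    def __init__(self, n_trees=25, max_depth=4, seed=0):
        self.n_trees, self.max_depth, self.seed = n_trees, max_depth, seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_sub = max(1, len(X[0]) // 3)  # p/3 features per split, a common regression default
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(len(X)) for _ in X]  # bootstrap: sample rows with replacement
            self.trees.append(_grow([X[i] for i in idx], [y[i] for i in idx],
                                    0, self.max_depth, n_sub, rng))
        return self

    def predict(self, row):
        return mean(_predict(tree, row) for tree in self.trees)


class GroupedVerifier:
    """One forest per non-overlapping feature group; the score is the mean of their outputs."""

    def __init__(self, groups, **forest_kw):
        self.groups = groups  # hypothetical layout: group name -> list of column indices
        self.forest_kw = forest_kw

    def fit(self, X, y):
        self.forests = {g: TinyForest(**self.forest_kw).fit([[r[c] for c in cols] for r in X], y)
                        for g, cols in self.groups.items()}
        return self

    def predict(self, row):
        return mean(self.forests[g].predict([row[c] for c in cols])
                    for g, cols in self.groups.items())


def train_per_language(data, groups, **forest_kw):
    """data maps a language code to (X, y); each language gets its own independent model."""
    return {lang: GroupedVerifier(groups, **forest_kw).fit(X, y) for lang, (X, y) in data.items()}


if __name__ == "__main__":
    rng = random.Random(42)

    def toy(n):
        # Synthetic rows: small values for same-author problems (y=1.0), large otherwise.
        X, y = [], []
        for _ in range(n):
            same = rng.random() < 0.5
            X.append([(0.2 if same else 0.8) + rng.uniform(-0.1, 0.1) for _ in range(4)])
            y.append(1.0 if same else 0.0)
        return X, y

    groups = {"group_a": [0, 1], "group_b": [2, 3]}
    models = train_per_language({"es": toy(40), "en": toy(40)}, groups, n_trees=15, max_depth=3)
    print(models["es"].predict([0.2] * 4), models["es"].predict([0.8] * 4))
```

Because the final score is a plain arithmetic mean over groups, each feature group carries the same weight regardless of how many features it contains, and the per-language dictionary keeps, for example, the Spanish and English models fully independent.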
File: 41-CR.pdf
Access: open access
Description: publisher's PDF
Type: published version
License: Digital Rights Management not specified
Size: 183.88 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.