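“Trovare lavoro” in un corpus di narrativa del XIX-XX secolo. Procedure, aspetti e problemi di creazione, estrazione e rappresentazione dei dati [“Finding a job” in a corpus of 19th- and 20th-century fiction: procedures, aspects, and problems of data creation, extraction, and representation]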
Floriana Carlotta Sciumbata; Paolo Nadalutti; Luca Tringali
2021-01-01
Abstract
This paper illustrates the methods, tools, and preliminary results of a study that aims to create a list of job titles and analyze them in a corpus of 100 Italian canonical and non-canonical works of prose fiction published between 1825 and 1923. Job titles are interesting because they reflect socio-economic changes and give important information on how literary settings and genres changed. After a short introduction to job titles, we discuss the tools and methods used to create the word list, extract the data, and represent them, from both a linguistic and a programming point of view. Some preliminary results are then presented and discussed with a statistical approach: the data do not suggest significant patterns over time; rather, the occurrence of job titles appears more or less consistent from a chronological point of view. Finally, some advantages and limitations are examined. The goal of this study is to develop a set of tools and methods that can be easily reproduced to build complex lexical lists, find their items, and represent data for corpora of any genre and size in a simple and effective way.

File | Description | Type | License | Size | Format
---|---|---|---|---|---
2021_Sciumbata-2_et-al RITT.pdf (open access) | Article | Publisher's version | Creative Commons | 2.73 MB | Adobe PDF