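Distanza intertestuale e lingua fonte: premesse teoriche, compilazione di un corpus e procedure di analisi (Intertextual distance and source language: theoretical premises, corpus compilation and analysis procedures)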
Ondelli
Member of the Collaboration Group
2017-01-01
Abstract
This chapter outlines the theoretical background for applying computational linguistic methods to test the translation universals hypothesis. Starting from the assumption that both the translation process and the source language affect the linguistic features of translations, we use Labbé’s method for calculating intertextual distance to check whether it can distinguish translated from non-translated texts, and whether it can group together texts translated from the same source language within a corpus of translations. In addition to a balanced corpus of newspaper articles (some written originally in Italian, others translated from several languages), ad hoc procedures are needed to offset the effect of differing text lengths and contents on intertextual distance values. Selecting text chunks of equal length and different types of linguistic tokens (grammar words, multi-word units, etc.), together with POS tagging to identify further useful linguistic features, provides a promising basis for evaluating different methods of calculating the intertextual distance between translated and non-translated texts (cosine similarity, machine learning, stylometry).
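To make the core measure concrete, here is a minimal sketch of a relative intertextual distance in the form commonly cited for Labbé’s method: the longer text’s word frequencies are scaled down to the length of the shorter one, the absolute frequency differences are summed, and the sum is normalised to the range 0 (identical distributions) to 1 (no shared vocabulary). It also shows how texts can be cut into equal-length chunks, one of the length-balancing procedures mentioned above. The function names `labbe_distance` and `equal_chunks` and the sample sentences are hypothetical illustrations, not the chapter’s code or corpus data, and any further refinements the chapter applies are not reproduced here.

```python
from collections import Counter


def labbe_distance(tokens_a, tokens_b):
    """Relative intertextual distance (simplified Labbé form), in [0, 1]."""
    # Let A be the shorter text: B's frequencies are scaled down to A's length.
    if len(tokens_a) > len(tokens_b):
        tokens_a, tokens_b = tokens_b, tokens_a
    n_a, n_b = len(tokens_a), len(tokens_b)
    if n_a == 0:
        raise ValueError("both texts must contain at least one token")
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    u = n_a / n_b  # proportionality coefficient N_A / N_B
    vocabulary = set(freq_a) | set(freq_b)
    # Sum of |F_iA - F_iB * u| over the joint vocabulary (Counter returns 0 for absent words).
    abs_dist = sum(abs(freq_a[w] - freq_b[w] * u) for w in vocabulary)
    # The scaled-down B also has N_A tokens, so the maximum possible sum is 2 * N_A.
    return abs_dist / (2 * n_a)


def equal_chunks(tokens, size):
    """Split tokens into consecutive chunks of exactly `size` tokens, dropping the remainder."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


# Made-up example sentences, for illustration only.
original = "il governo ha approvato la legge e il parlamento ha votato".split()
translated = "il governo ha adottato la riforma che il senato ha respinto".split()
print(round(labbe_distance(original, translated), 3))  # → 0.455
```

In practice, the token lists passed to `labbe_distance` could first be restricted to one class of tokens (for example a list of grammar words) or replaced by POS tags, and texts of different lengths could be compared chunk by chunk with `equal_chunks` so that length effects are reduced.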
File | Description | Access | Type | License | Size | Format
---|---|---|---|---|---|---
Testi_corpora_confronti metadati.pdf | Title page and table of contents of the volume | Open access | Publisher's version | Creative Commons | 84.62 kB | Adobe PDF
Distanza intertestuale e lingua fonte premesse teoriche, compilazione di un corpus e procedure di analisi.pdf | Text of the chapter | Open access | Publisher's version | Creative Commons | 6.15 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.