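Distanza intertestuale e lingua fonte: premesse teoriche, compilazione di un corpus e procedure di analisi [Intertextual distance and source language: theoretical premises, corpus compilation and analysis procedures]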

Ondelli, Stefano

doi:10.13137/978-88-8303-913-3/18479

This chapter presents the theoretical background for applying computational linguistic methods to test the translation universals hypothesis. Starting from the assumption that both the translation process and the source language affect the linguistic features of translations, we use Labbé’s method for calculating intertextual distance to check whether it can distinguish translated from non-translated texts and whether it can group together texts translated from the same language within a corpus of translations. Besides the compilation of a balanced corpus of newspaper articles (both originally written in Italian and translated from several languages), the study requires ad hoc procedures to offset the effect of differing text lengths and contents on intertextual distance values. Selecting text chunks of equal length and different types of language tokens (grammar words, multi-words etc.), together with POS tagging to identify additional useful linguistic features, offers a promising way to evaluate different methods of calculating the intertextual distance between translated and non-translated texts (cosine similarity, machine learning, stylometry).
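
As an illustration of the kind of measure the abstract refers to, the following is a minimal sketch of an intertextual distance in the spirit of Labbé’s method, together with a helper for cutting texts into chunks of equal length. It is not taken from the chapter: the function names `chunk_tokens` and `labbe_distance` are hypothetical, tokenisation is assumed to have been done already, and the sketch uses a simplified form of the measure (frequencies of the longer text are scaled down to the size of the shorter one, and the absolute differences are normalised so that the result lies between 0 and 1), without the further refinements discussed in the literature.

```python
from collections import Counter


def chunk_tokens(tokens, size):
    """Split a token list into consecutive chunks of exactly `size` tokens.

    A trailing remainder shorter than `size` is dropped, so that all
    chunks compared later have the same length.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def labbe_distance(tokens_a, tokens_b):
    """Simplified intertextual distance between two token lists.

    Assumption: frequencies of the longer text are rescaled to the size
    of the shorter one; the sum of absolute frequency differences is then
    divided by twice the size of the shorter text, giving a value in [0, 1].
    """
    if not tokens_a or not tokens_b:
        raise ValueError("both texts must contain at least one token")

    # Make sure A is the shorter text.
    if len(tokens_a) > len(tokens_b):
        tokens_a, tokens_b = tokens_b, tokens_a

    n_a, n_b = len(tokens_a), len(tokens_b)
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    scale = n_a / n_b  # reduces B's frequencies to the size of A

    vocabulary = set(freq_a) | set(freq_b)
    total_difference = sum(
        abs(freq_a[word] - freq_b[word] * scale) for word in vocabulary
    )
    return total_difference / (2 * n_a)
```

With this sketch, two identical token lists are at distance 0 and two lists with no shared vocabulary are at distance 1. Comparing equal-length chunks produced by `chunk_tokens` keeps the scaling factor at 1, which is one simple way to limit the effect of text length mentioned in the abstract.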

Distanza intertestuale e lingua fonte: premesse teoriche, compilazione di un corpus e procedure di analisi / Ondelli, Stefano. - STAMPA. - (2017), pp. 27-42. [10.13137/978-88-8303-913-3/18479]