Hamming-like distances for ill-defined strings in linguistic classification
BORTOLUSSI, LUCA; SGARRO, ANDREA
2007-01-01
Abstract
Ill-defined strings often occur in the soft sciences, e.g. in linguistics or in biology. In this paper we consider strings of length l in which each position holds one of three symbols: 0 (false), 1 (true), or b (irrelevant). We examine some generalisations of the usual Hamming distance between crisp binary strings that were recently used in computational linguistics. We comment on their metric properties, since these should guide the choice of clustering algorithm for language classification. The concluding section is devoted to future work and compares the string approach, as currently pursued, with alternative approaches.
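To make the concern about metric properties concrete, here is a minimal sketch of one possible Hamming-like distance on strings over {0, 1, b}. It is an illustrative assumption, not necessarily one of the distances studied in the paper: the function name `hamming_ignoring_irrelevant` and the rule "skip any position where either string holds b" are hypothetical choices made for this example only.

```python
# Hypothetical illustration: a Hamming-like distance that ignores every
# position where at least one of the two strings holds the symbol "b".
# This rule is an assumption for the example, not the paper's definition.

ALPHABET = {"0", "1", "b"}
IRRELEVANT = "b"


def hamming_ignoring_irrelevant(x: str, y: str) -> int:
    """Count positions where both symbols are crisp (0 or 1) and differ."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    if not (set(x) <= ALPHABET and set(y) <= ALPHABET):
        raise ValueError("strings must use only the symbols 0, 1, b")
    return sum(
        1
        for a, c in zip(x, y)
        if a != IRRELEVANT and c != IRRELEVANT and a != c
    )


# On crisp binary strings this reduces to the usual Hamming distance.
print(hamming_ignoring_irrelevant("0110", "0011"))  # 2

# With irrelevant positions the triangle inequality can fail:
x, y, z = "01", "bb", "10"
d_xy = hamming_ignoring_irrelevant(x, y)  # 0, every position is skipped
d_yz = hamming_ignoring_irrelevant(y, z)  # 0, every position is skipped
d_xz = hamming_ignoring_irrelevant(x, z)  # 2, both positions differ
print(d_xz <= d_xy + d_yz)  # False
```

Under this assumed rule two distinct strings can be at distance 0 and the triangle inequality can be violated, so the function is not a metric. That is the kind of property that matters when a clustering algorithm presupposes a true metric.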