Hamming-like distances for ill-defined strings in linguistic classification

BORTOLUSSI, LUCA;SGARRO, ANDREA
2007-01-01

Abstract

Ill-defined strings often occur in soft sciences, e.g. in linguistics or in biology. In this paper we consider strings of length l which have in each position one of three symbols: 0 (false), 1 (true), or b (irrelevant). We examine some generalisations of the usual Hamming distance between crisp binary strings which were recently used in computational linguistics. We comment on their metric properties, since these should guide the selection of the clustering algorithm to be used for language classification. The concluding section is devoted to future work and compares the string approach, as currently pursued, with alternative approaches.
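To make the setting concrete, here is a minimal sketch of one possible Hamming-like distance on strings over 0, 1 and b. It is an illustration only: the abstract does not state the definitions studied in the paper, so the rule used here (a position counts as a mismatch only when both symbols are crisp and differ) and the function name `hamming_ignoring_irrelevant` are assumptions.

```python
IRRELEVANT = "b"  # assumed encoding of the "irrelevant" symbol
ALPHABET = {"0", "1", IRRELEVANT}


def hamming_ignoring_irrelevant(x: str, y: str) -> int:
    """Count positions where both symbols are crisp (0 or 1) and differ.

    Hypothetical generalisation chosen for illustration; positions where
    either string carries the irrelevant symbol contribute nothing.
    """
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    if not (set(x) | set(y)) <= ALPHABET:
        raise ValueError("strings may only contain 0, 1 and b")
    return sum(
        1
        for a, c in zip(x, y)
        if a != IRRELEVANT and c != IRRELEVANT and a != c
    )


# Metric properties can fail for such a rule: here the triangle
# inequality d(x, z) <= d(x, y) + d(y, z) does not hold.
x, y, z = "0", "b", "1"
triangle_holds = hamming_ignoring_irrelevant(x, z) <= (
    hamming_ignoring_irrelevant(x, y) + hamming_ignoring_irrelevant(y, z)
)
```

Under this assumed rule the distance between "0" and "1" is 1, while each is at distance 0 from "b", so the triangle inequality is violated and two distinct strings can be at distance 0. This is the kind of behaviour that matters when choosing a clustering algorithm, since some algorithms presuppose a true metric.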
Use this identifier to cite or link to this document: https://hdl.handle.net/11368/1846267