An elastic-degenerate string is a sequence of n sets of strings of total length N. It has been introduced to represent a multiple alignment of several closely-related sequences (e.g., pan-genome) compactly. In this representation, substrings of these sequences that match exactly are collapsed, while in positions where the sequences differ, all possible variants observed at that location are listed. The natural problem that arises is finding all matches of a deterministic pattern of length m in an elastic-degenerate text. There exists a non-combinatorial O(nm1.381+N)-time algorithm to solve this problem on-line [1]. In this paper, we study the same problem under the edit distance model and present an O(k2mG+kN)-time and O(m)-space algorithm, where G is the total number of strings in the elastic-degenerate text and k is the maximum edit distance allowed. We also present a simple O(kmG+kN)-time and O(m)-space algorithm for solving the problem under Hamming distance.

Approximate pattern matching on elastic-degenerate text

Bernardini, G;
2020-01-01

Abstract

An elastic-degenerate string is a sequence of n sets of strings of total length N. It has been introduced to represent a multiple alignment of several closely-related sequences (e.g., pan-genome) compactly. In this representation, substrings of these sequences that match exactly are collapsed, while in positions where the sequences differ, all possible variants observed at that location are listed. The natural problem that arises is finding all matches of a deterministic pattern of length m in an elastic-degenerate text. There exists a non-combinatorial O(nm1.381+N)-time algorithm to solve this problem on-line [1]. In this paper, we study the same problem under the edit distance model and present an O(k2mG+kN)-time and O(m)-space algorithm, where G is the total number of strings in the elastic-degenerate text and k is the maximum edit distance allowed. We also present a simple O(kmG+kN)-time and O(m)-space algorithm for solving the problem under Hamming distance.
2020
Pubblicato
https://www.sciencedirect.com/science/article/pii/S0304397519305018
File in questo prodotto:
File Dimensione Formato  
1-s2.0-S0304397519305018-main.pdf

Accesso chiuso

Tipologia: Documento in Versione Editoriale
Licenza: Copyright Editore
Dimensione 479.97 kB
Formato Adobe PDF
479.97 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
3019812_1-s2.0-S0304397519305018-main-Post_print.pdf

Accesso chiuso

Tipologia: Bozza finale post-referaggio (post-print)
Licenza: Creative commons
Dimensione 936.9 kB
Formato Adobe PDF
936.9 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11368/3019812
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 11
  • ???jsp.display-item.citation.isi??? 6
social impact