Interactive Example-based Finding of Text Items

Medvet, Eric; Bartoli, Alberto; De Lorenzo, Andrea; Tarlao, Fabiano
2020-01-01

Abstract

We consider the problem of identifying, within a given document, all text items that follow a certain pattern specified by a user. In particular, we focus on scenarios in which the task must be completed very quickly and the user is not able to specify the exact pattern of interest. The key use case is the interactive exploration of documents in search of snippets that cannot be captured by Boolean, word-based search expressions. We propose an interactive framework in which the user provides examples of the items of interest; the system identifies items similar to those examples and progressively refines the similarity criterion by submitting selected queries to the user, in an active learning fashion. The requirement that the search be executed very quickly places severe constraints on the algorithms the system can use, both for identifying the items and for constructing the queries. We propose, and assess experimentally in detail, a number of design options for the components of the learning machinery. The results show that our approach achieves effectiveness close to that of state-of-the-art approaches based on regular expressions, while requiring an execution time that is orders of magnitude shorter.
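The abstract describes the interaction loop only at a high level, and the paper compares several design options for its components, so the following is a minimal illustrative sketch and not the authors' method. It assumes: candidate items are tokens matched by a simple regular expression; similarity is a Jaccard index over bigrams of a character-class "shape" (digits become `0`, letters become `a`/`A`); and queries are chosen by uncertainty sampling, i.e. the unlabeled item whose score is closest to a threshold `tau`. All names (`find_items`, `shape`, `similarity`, `tau`, `n_queries`, `oracle`) are hypothetical, and `oracle` stands in for the user's answers.

```python
import re
from typing import Callable

# Hypothetical tokenizer: words optionally joined by '.', '/', ':' or '-'.
CANDIDATE_RE = re.compile(r"\w+(?:[./:-]\w+)*")


def shape(s: str) -> str:
    """Map digits to '0', lowercase letters to 'a', uppercase letters to 'A';
    keep every other character unchanged."""
    out = []
    for ch in s:
        if ch.isdigit():
            out.append("0")
        elif ch.islower():
            out.append("a")
        elif ch.isupper():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def bigrams(s: str) -> set[str]:
    """Character bigrams of the shape, padded with start/end markers."""
    padded = f"^{shape(s)}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard index between the shape-bigram sets (an assumed criterion)."""
    ga, gb = bigrams(a), bigrams(b)
    return len(ga & gb) / len(ga | gb)


def find_items(doc: str, examples: list[str], oracle: Callable[[str], bool],
               tau: float = 0.5, n_queries: int = 3) -> list[str]:
    """Return the candidates judged similar to the examples after up to
    n_queries active-learning rounds answered by `oracle`."""
    candidates = sorted(set(CANDIDATE_RE.findall(doc)))
    pos, neg = list(examples), []

    def max_sim(item: str, refs: list[str]) -> float:
        return max((similarity(item, r) for r in refs), default=0.0)

    for _ in range(n_queries):
        unlabeled = [c for c in candidates if c not in pos and c not in neg]
        if not unlabeled:
            break
        # Uncertainty sampling: ask about the item whose score is closest to tau.
        query = min(unlabeled, key=lambda c: abs(max_sim(c, pos) - tau))
        (pos if oracle(query) else neg).append(query)

    # An item matches if it is close enough to some positive example and
    # strictly closer to the positives than to every negative answer.
    return [c for c in candidates
            if max_sim(c, pos) >= tau and max_sim(c, pos) > max_sim(c, neg)]
```

For example, with the document `"Published 2020-01-01, revised 2021-03-15; see item 113403 and code AB-12."` and the single example `2020-01-01`, running with no queries also returns the all-digit token `113403`. After two queries answered by an oracle that recognizes `YYYY-MM-DD` dates (the sketch asks about `113403` and then `AB-12`), only the two dates remain. The refinement step here is deliberately simple (nearest-neighbour comparison against positive and negative answers); it only illustrates how user feedback can tighten a similarity criterion.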
2020
Published
https://www.sciencedirect.com/science/article/pii/S095741742030227X
Files in this record:

2020-ESWA-InteractiveExampleBasedTextFinding.pdf
Open Access from 7 April 2022
Description: Main article
Type: Final post-refereeing draft (post-print)
License: Creative Commons
Size: 448.56 kB
Format: Adobe PDF

1-s2.0-S095741742030227X-main.pdf
Closed access
Description: Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.eswa.2020.113403 under CC BY
Type: Publisher's version
License: Publisher's copyright
Size: 1.29 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/2961653