Interactive Example-based Finding of Text Items
Medvet, Eric; Bartoli, Alberto; De Lorenzo, Andrea; Tarlao, Fabiano
2020-01-01
Abstract
We consider the problem of identifying, within a given document, all text items that follow a certain pattern to be specified by a user. In particular, we focus on scenarios in which the task is to be completed very quickly and the user is not able to specify the exact pattern of interest. The key use case is the interactive exploration of documents in search of snippets that do not fit Boolean, word-based search expressions. We propose an interactive framework in which the user provides examples of the items they are interested in; the system identifies items similar to those examples and progressively refines the similarity criterion by submitting selected queries to the user, in an active learning fashion. Because the search must be executed very quickly, severe requirements are placed on the algorithms that the system can use, both for identifying the items and for constructing the queries. We propose and experimentally assess in detail a number of different design options for the components of the learning machinery. The results demonstrate that our approach achieves effectiveness close to that of state-of-the-art approaches based on regular expressions, while requiring an execution time that is orders of magnitude shorter.

File | Access | Description | Type | License | Size | Format |
---|---|---|---|---|---|---|
2020-ESWA-InteractiveExampleBasedTextFinding.pdf | Open Access from 07/04/2022 | Main article | Final post-refereeing draft (post-print) | Creative Commons | 448.56 kB | Adobe PDF |
1-s2.0-S095741742030227X-main.pdf | Closed access | Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.eswa.2020.113403 under CC BY | Publisher's version | Publisher copyright | 1.29 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.