Active Learning of Predefined Models for Information Extraction: Selecting Regular Expressions from Examples

Alberto Bartoli, Andrea De Lorenzo, Eric Medvet, Fabiano Tarlao
2019-01-01

Abstract

We consider the problem of automatically constructing a regular expression for information extraction based only on examples of the desired extraction behavior. We describe an active learning framework that, rather than synthesizing a solution from scratch, selects one from a set of more than 3000 solutions that have already proven useful in a broad range of practical applications. The user provides a single example of the desired extraction and then interactively annotates text snippets selected by the system. The system builds these queries by uncertainty sampling, i.e., at each learning step it selects the snippet on which it is most uncertain. The resulting framework allows many practical extraction problems to be solved quickly and simply.
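As an illustration only, the loop below sketches uncertainty sampling over a fixed pool of candidate regular expressions. It assumes that a snippet's uncertainty is the binary entropy of how the still-consistent candidates split on extracting it (a query-by-committee-style vote), and that each annotation discards the candidates that disagree with it. The function names (`select_regex`, `extractions`, `binary_entropy`), the four-pattern pool and the sample text are hypothetical; the abstract does not specify the authors' actual uncertainty measure, scoring or stopping rule, which may differ.

```python
import math
import re
from typing import Callable, Dict, FrozenSet, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets in the text


def extractions(pattern: re.Pattern, text: str) -> FrozenSet[Span]:
    """Non-empty spans that a candidate regex extracts from the text."""
    return frozenset((m.start(), m.end()) for m in pattern.finditer(text)
                     if m.end() > m.start())


def binary_entropy(p: float) -> float:
    """Uncertainty of a yes/no vote split: 0 when unanimous, 1 at 50/50."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_regex(candidates: List[str], text: str, example: Span,
                 oracle: Callable[[Span], bool]) -> Tuple[List[str], int]:
    """Hypothetical sketch: narrow a fixed regex pool by uncertainty sampling.

    Returns the surviving candidates and the number of queries asked.
    """
    extracted: Dict[str, FrozenSet[Span]] = {
        c: extractions(re.compile(c), text) for c in candidates
    }
    # Keep only candidates consistent with the single user-provided example.
    alive = [c for c in candidates if example in extracted[c]]
    asked = {example}
    queries = 0
    while len(alive) > 1:
        pool = set().union(*(extracted[c] for c in alive)) - asked
        # Score each unasked snippet by how evenly the survivors split on it.
        scored = [(binary_entropy(sum(s in extracted[c] for c in alive) / len(alive)), s)
                  for s in pool]
        scored = [(h, s) for h, s in scored if h > 0]
        if not scored:
            break  # survivors agree on every unasked snippet in this text
        _, snippet = max(scored)  # most uncertain snippet becomes the query
        answer = oracle(snippet)
        queries += 1
        asked.add(snippet)
        # Discard candidates whose behavior contradicts the annotation.
        alive = [c for c in alive if (snippet in extracted[c]) == answer]
    return alive, queries


if __name__ == "__main__":
    text = "Call 040-558-3000 or 040 5583000 on 2019-01-01."
    # Hypothetical toy pool; the framework described above uses 3000+ solutions.
    candidates = [
        r"\d{3}-\d{3}-\d{4}",        # dashed phone numbers only
        r"\d{3}[- ]?\d{3}-?\d{4}",   # phone numbers, dashed or spaced
        r"\d{4}-\d{2}-\d{2}",        # ISO dates
        r"\d+(?:[- ]\d+)*",          # any run of digit groups
    ]
    start = text.index("040-558-3000")
    wanted = {"040-558-3000", "040 5583000"}  # simulated user intent
    alive, queries = select_regex(candidates, text, (start, start + 12),
                                  lambda s: text[s[0]:s[1]] in wanted)
    print(alive, queries)
```

With this toy pool, the single example already rules out the date pattern, and two answers (one on the spaced number, one on the date, in either order) isolate the second pattern. Stopping when the survivors agree on every snippet of the given text is a simplifying assumption: candidates that behave identically on this text may still differ elsewhere.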
Files in this item:
2019 - FSDM_Active_Learning_Predefined_Regex.pdf

Closed access

Type: Publisher's version
License: Publisher's copyright
Size: 441.47 kB
Format: Adobe PDF
FAIA-320-FAIA190157-fm.pdf

Closed access

Description: front matter
Type: Other attached material
License: Publisher's copyright
Size: 286.3 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/2952101