Active Learning of Predefined Models for Information Extraction: Selecting Regular Expressions from Examples
Alberto Bartoli;Andrea De Lorenzo;Eric Medvet;Fabiano Tarlao
2019-01-01
Abstract
We consider the problem of constructing a regular expression for information extraction automatically, based only on examples of the desired extraction behavior. We describe an active learning framework that is not aimed at synthesizing a solution from scratch, but rather is aimed at selecting a solution from a set of more than 3000 solutions that have already proven useful in a broad range of practical applications. The user provides only one example of desired extraction and then interactively annotates text snippets selected by the system. The system constructs such queries based on uncertainty sampling, i.e., by selecting the snippet on which it is most uncertain at each learning step. The resulting framework allows solving many practical extraction problems quickly and simply.

File | Description | Type | License | Access | Size | Format
---|---|---|---|---|---|---
2019 - FSDM_Active_Learning_Predefined_Regex.pdf | | Publisher's version | Publisher copyright | Closed access | 441.47 kB | Adobe PDF
FAIA-320-FAIA190157-fm.pdf | Front matter | Other attached material | Publisher copyright | Closed access | 286.3 kB | Adobe PDF
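
The selection loop summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how uncertainty is measured, so the sketch assumes vote entropy over the extractions proposed by the candidates that are still consistent with the annotations. All function names (`extract`, `consistent`, `vote_entropy`, `select_regex`) are hypothetical, and the candidate list passed in stands in for the predefined set of more than 3000 expressions.

```python
import math
import re
from collections import Counter


def extract(pattern, snippet):
    """Return all substrings that the pattern extracts from the snippet."""
    return tuple(m.group(0) for m in re.finditer(pattern, snippet))


def consistent(pattern, labeled):
    """True if the pattern reproduces every annotated extraction."""
    return all(extract(pattern, s) == tuple(want) for s, want in labeled)


def vote_entropy(patterns, snippet):
    """Assumed uncertainty measure: entropy of the candidates' extractions."""
    votes = Counter(extract(p, snippet) for p in patterns)
    total = sum(votes.values())
    return -sum((n / total) * math.log2(n / total) for n in votes.values())


def select_regex(candidates, seed, pool, oracle, max_queries=10):
    """Narrow a predefined candidate set by querying the user (oracle).

    seed   -- the single initial example: (snippet, list of extractions)
    pool   -- unlabeled snippets the system may ask about
    oracle -- callable returning the desired extractions for a snippet
    """
    labeled = [seed]
    alive = [p for p in candidates if consistent(p, labeled)]
    unlabeled = list(pool)
    for _ in range(max_queries):
        if len(alive) <= 1 or not unlabeled:
            break
        # Uncertainty sampling: ask about the snippet with most disagreement.
        query = max(unlabeled, key=lambda s: vote_entropy(alive, s))
        if vote_entropy(alive, query) == 0.0:
            break  # remaining candidates agree on every unlabeled snippet
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
        alive = [p for p in alive if consistent(p, labeled)]
    return alive
```

In this sketch a candidate is discarded as soon as it disagrees with any annotation, which presumes noise-free answers from the user; a more tolerant variant could score candidates instead of filtering them.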