An essential component of any workflow leveraging digital data consists in the identification and extraction of relevant patterns from a data stream. We consider a scenario in which an extraction inference engine generates an entity extractor automatically from examples of the desired behavior, which take the form of user-provided annotations of the entities to be extracted from a dataset. We propose a methodology for predicting the accuracy of the extractor that may be inferred from the available examples. We propose several prediction techniques and analyze experimentally our proposals in great depth, with reference to extractors consisting of regular expressions. The results suggest that reliable predictions for tasks of practical complexity may indeed be obtained quickly and without actually generating the entity extractor.
Predicting the effectiveness of pattern-based entity extractor inference
BARTOLI, Alberto;DE LORENZO, ANDREA;MEDVET, Eric;TARLAO, FABIANO
2016-01-01
Abstract
An essential component of any workflow leveraging digital data consists in the identification and extraction of relevant patterns from a data stream. We consider a scenario in which an extraction inference engine generates an entity extractor automatically from examples of the desired behavior, which take the form of user-provided annotations of the entities to be extracted from a dataset. We propose a methodology for predicting the accuracy of the extractor that may be inferred from the available examples. We propose several prediction techniques and analyze experimentally our proposals in great depth, with reference to extractors consisting of regular expressions. The results suggest that reliable predictions for tasks of practical complexity may indeed be obtained quickly and without actually generating the entity extractor.File | Dimensione | Formato | |
---|---|---|---|
2016-ASOC-editoriale.pdf
Accesso chiuso
Descrizione: Articolo principale
Tipologia:
Documento in Versione Editoriale
Licenza:
Digital Rights Management non definito
Dimensione
707.26 kB
Formato
Adobe PDF
|
707.26 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
2016_ASOC_PredictingEffectivenessDistanceBased Medved Bartoli.pdf
Open Access dal 21/05/2018
Tipologia:
Bozza finale post-referaggio (post-print)
Licenza:
Creative commons
Dimensione
319.34 kB
Formato
Adobe PDF
|
319.34 kB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.