Predicting the effectiveness of pattern-based entity extractor inference

An essential component of any workflow that leverages digital data is the identification and extraction of relevant patterns from a data stream. We consider a scenario in which an extraction inference engine automatically generates an entity extractor from examples of the desired behavior. These examples take the form of user-provided annotations of the entities to be extracted from a dataset. We propose a methodology for predicting the accuracy of the extractor that can be inferred from the available examples. We develop several prediction techniques and analyze them experimentally in depth, with reference to extractors consisting of regular expressions. The results suggest that reliable predictions for tasks of practical complexity can indeed be obtained quickly and without actually generating the entity extractor.

Bartoli, Alberto; De Lorenzo, Andrea; Medvet, Eric; Tarlao, Fabiano

2016-01-01

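The abstract does not spell out how the accuracy of an inferred extractor is measured. As a minimal sketch, assuming span-level exact matching against the user annotations and the F-measure as the accuracy index (an assumption, not necessarily the paper's exact metric), the quantity a predictor would estimate can be computed as follows *after* a regular-expression extractor has actually been generated. The function names `extract_spans` and `evaluate` and the example text and annotations are hypothetical.

```python
import re
from typing import Set, Tuple

# A span is a (start, end) pair of character offsets, end exclusive,
# the same convention used by re.Match.span().
Span = Tuple[int, int]


def extract_spans(pattern: str, text: str) -> Set[Span]:
    """Apply a regex extractor: each non-empty match is one extracted entity."""
    return {m.span() for m in re.finditer(pattern, text) if m.end() > m.start()}


def evaluate(extracted: Set[Span], annotated: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F-measure of extracted vs. annotated spans.

    Assumption: a span counts as correct only if it coincides exactly with an
    annotated span; partial overlaps count as errors.
    """
    tp = len(extracted & annotated)  # extracted spans that match an annotation exactly
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(annotated) if annotated else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall > 0 else 0.0)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical annotated example: the user marked the two phone numbers.
    text = "Call 040-558-1234 or 040-558-9876; ticket 123-4567."
    annotated = {(5, 17), (21, 33)}

    # Two hypothetical candidate extractors: one precise, one too permissive.
    for pattern in (r"\d{3}-\d{3}-\d{4}", r"[\d-]+"):
        p, r, f = evaluate(extract_spans(pattern, text), annotated)
        print(f"{pattern:20} P={p:.3f} R={r:.3f} F={f:.3f}")
```

The permissive pattern also captures the ticket number, so its precision drops to 2/3 and its F-measure to 0.8, while the precise pattern scores 1.0 on all three indices. Computing this requires running the inferred extractor; the point of the approach summarized in the abstract is to estimate such a value from the annotations alone, before any extractor is generated.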

Year: 2016
Journal: Applied Soft Computing
DOI: https://dx.doi.org/10.1016/j.asoc.2016.05.023
Publication type: 1.1 Journal article

Files in this record:

File	Size	Format	Access
2016-ASOC-editoriale.pdf (main article; publisher's version; DRM licence not defined)	707.26 kB	Adobe PDF	Closed access
2016_ASOC_PredictingEffectivenessDistanceBased Medved Bartoli.pdf (final post-refereeing draft, post-print; Creative Commons licence)	319.34 kB	Adobe PDF	Open access from 21/05/2018

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/2874801
