Automatic Synthesis of Regular Expressions from Examples
BARTOLI, Alberto; DE LORENZO, Andrea; MEDVET, Eric
2014-01-01
Abstract
We propose a system for the automatic generation of regular expressions for text-extraction tasks. The user describes the desired task solely by means of a set of labeled examples. The generated regexes can be used with common engines, such as those that are part of Java, PHP, Perl, and so on. Using the system does not require any familiarity with regular expression syntax. We performed an extensive experimental evaluation on 12 different extraction tasks applied to real-world datasets. We obtained very good results in terms of precision and recall, even in comparison to earlier state-of-the-art proposals. Our results are highly promising toward the achievement of a practical surrogate for the specific skills required for generating regular expressions, and significant as a demonstration of what can be achieved with approaches based on genetic programming (GP) on modern IT technology.

File | Description | Type | License | Size | Format | Access
---|---|---|---|---|---|---
2014-Computer-AutomaticSynthesisRegexExamples (2).pdf | Post-print PDF | Final post-refereeing draft (post-print) | Digital Rights Management not defined | 252.75 kB | Adobe PDF | Open access
computer2014.pdf | Publisher's PDF | Publisher's version | Digital Rights Management not defined | 1.38 MB | Adobe PDF | Closed access
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.