The problem of extracting knowledge from large volumes of unstructured textual information has become increasingly important. We consider the problem of extracting text slices that adhere to a syntactic pattern and propose an approach capable of generating the desired pattern automatically from a few annotated examples. Our approach is based on Genetic Programming and generates extraction patterns in the form of regular expressions that may be input to existing engines without any post-processing. A key feature of our proposal is its ability to discover automatically whether the extraction task may be solved by a single pattern or whether a set of multiple patterns is required. We obtain this property by means of a separate-and-conquer strategy: once a candidate pattern provides adequate performance on a subset of the examples, the pattern is inserted into the set of final solutions and the evolutionary search continues on a smaller set of examples including only those not yet solved adequately. Our proposal outperforms an earlier state-of-the-art approach on three challenging datasets.
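The separate-and-conquer loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the Genetic Programming search is replaced by a hypothetical stand-in (`pick_best`) that simply chooses from a fixed pool of candidate regular expressions, "adequate performance" is simplified to "extracts exactly the annotated snippets", and all names, candidate patterns, and example texts are assumptions made for the sketch.

```python
import re
from typing import List, Tuple

# An example is a text plus the snippets that should be extracted from it.
Example = Tuple[str, List[str]]


def solved(pattern: str, example: Example) -> bool:
    """Simplified adequacy check: the pattern extracts exactly the annotations."""
    text, wanted = example
    return [m.group(0) for m in re.finditer(pattern, text)] == wanted


def pick_best(candidates: List[str], examples: List[Example]) -> str:
    """Stand-in for the evolutionary search (assumption, not GP):
    return the candidate that solves the most remaining examples."""
    return max(candidates, key=lambda p: sum(solved(p, e) for e in examples))


def separate_and_conquer(examples: List[Example], candidates: List[str]) -> List[str]:
    """Collect patterns one at a time, each search seeing only unsolved examples."""
    remaining = list(examples)
    patterns: List[str] = []
    while remaining:
        best = pick_best(candidates, remaining)
        unsolved = [e for e in remaining if not solved(best, e)]
        if len(unsolved) == len(remaining):
            break  # no candidate makes progress; stop rather than loop forever
        patterns.append(best)   # "separate": keep the pattern as a final solution
        remaining = unsolved    # "conquer": continue on what is still unsolved
    return patterns


def extract(patterns: List[str], text: str) -> List[str]:
    """Apply the learned set as one alternation, usable by a standard regex engine."""
    combined = "|".join(f"(?:{p})" for p in patterns)
    return re.findall(combined, text)


# Hypothetical task: dates written in two different formats.
EXAMPLES: List[Example] = [
    ("Published 2015-04-08 online", ["2015-04-08"]),
    ("Due 2014-12-01", ["2014-12-01"]),
    ("Held on 08/04/2015 in Copenhagen", ["08/04/2015"]),
]
CANDIDATES = [r"\d{4}-\d{2}-\d{2}", r"\d{2}/\d{2}/\d{4}", r"\d+"]

learned = separate_and_conquer(EXAMPLES, CANDIDATES)
print(learned)
print(extract(learned, "From 2015-04-08 to 10/04/2015"))
```

In this toy run no single candidate solves all three examples, so the loop ends up with two patterns, one per date format. The same loop would return a single pattern when one candidate covers every example, which is the property the abstract highlights.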
Title: | Learning Text Patterns using Separate-and-Conquer Genetic Programming |
Authors: | |
Publication date: | 2015 |
Handle: | http://hdl.handle.net/11368/2832545 |
ISBN: | 9783319165004 9783319165011 |
URL: | http://link.springer.com/chapter/10.1007/978-3-319-16501-1_2 http://link.springer.com/book/10.1007/978-3-319-16501-1 |
Appears in types: | 4.1 Conference Proceedings Contribution |
Files in this record:
File | Description | Type | License | |
---|---|---|---|---|
2015_EuroGP_LearningMultiPatterns.pdf | pdf post-print | Final post-review draft (post-print) | Digital Rights Management not defined | Open Access |