Genetic programming benchmarks: looking back and looking forward

McDermott, James; Kronberger, Gabriel; Orzechowski, Patryk; Vanneschi, Leonardo; Manzoni, Luca; Kalkreuth, Roman; Castelli, Mauro

doi:10.1145/3578482.3578483

In 2011, something which was latent in discussions of Genetic Programming (GP) was crystallised by Sean Luke in an email to the GP mailing list: “I think GP has a toy problem”. There was a damaging mismatch between the problems used to test GP performance in research papers and the real-world problems that researchers actually cared about. This idea was picked up and expanded by a group of GP researchers to become a discussion, a project, and then two papers:

• Genetic programming needs better benchmarks, James McDermott, David R White, Sean Luke, Luca Manzoni, Mauro Castelli, Leonardo Vanneschi, Wojciech Jaśkowski, Krzysztof Krawiec, Robin Harper, Kenneth De Jong, Una-May O’Reilly, GECCO 2012.
• Better GP benchmarks: community survey results and proposals, David R White, James McDermott, Mauro Castelli, Luca Manzoni, Brian W Goldman, Gabriel Kronberger, Wojciech Jaśkowski, Una-May O’Reilly, Sean Luke, GPEM 2013.

The issues raised in these papers included:

• There is a mismatch between benchmark problems and real-world problems;
• Easy benchmark problems may give misleading information about performance;
• Experimental practice is sometimes inadequate, e.g. in relation to train-test splits and statistical testing;
• GP is highly flexible in many respects, making experimental comparisons difficult.

Neither paper went so far as to curate a new benchmark suite, but the 2013 paper found community support for “blacklisting” certain problems and proposed possible replacements. On the occasion of the ACM SIGEvo 10-year Impact award for the GECCO 2012 paper, recently awarded at GECCO 2022, we (some of the original authors, and some new ones) take the opportunity to look back and to look forward.