An analysis of the ingredients for learning interpretable symbolic regression models with human-in-the-loop and genetic programming

Nadizar, Giorgia; Rovito, Luigi; De Lorenzo, Andrea; Medvet, Eric; Virgolin, Marco
2024-01-01

Abstract

Interpretability is a critical aspect for ensuring the fair and responsible use of machine learning (ML) in high-stakes applications. Genetic programming (GP) has been used to obtain interpretable ML models because it operates at the level of functional building blocks: if these building blocks are interpretable, there is a chance that their composition (i.e., the entire ML model) is also interpretable. However, the degree to which a model is interpretable depends on the observer. Motivated by this, we study a recently introduced human-in-the-loop system that allows the user to steer GP's generation process toward their preferences, which are learned online by an artificial neural network (ANN). We focus on the generation of ML models as analytical functions (i.e., symbolic regression), as this is a key problem in interpretable ML, and propose a two-fold contribution. First, we devise more general representations of the ML models for the ANN to learn from, to enable the application of the system to a wider range of problems. Second, we provide a deeper analysis of the system's components. To this end, we propose an incremental experimental evaluation, aimed at (1) studying the effectiveness with which an ANN can capture the interpretability perceived by simulated users, (2) investigating how GP's outcome is affected across different simulated user feedback profiles, and (3) determining whether human participants would prefer models that were generated with or without their involvement. Our results shed light on the pros and cons of using a human-in-the-loop approach to discover interpretable ML models with GP.
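As a concrete illustration of the mechanism summarized above, the following is a minimal sketch, not the authors' implementation: a candidate symbolic regression expression is encoded with simple, representation-agnostic features, and a tiny neural network is updated online from user-provided interpretability scores, which a GP run could then use as a surrogate objective. The operator set, feature encoding, and all function names below are hypothetical.

```python
# Hypothetical sketch of an online-learned surrogate of perceived interpretability.
import numpy as np

OPS = ["+", "-", "*", "/", "sin", "cos", "exp", "log"]  # assumed operator set

def featurize(expr_tokens):
    """Encode a prefix-notation expression as operator counts, size, and variable count."""
    counts = [expr_tokens.count(op) for op in OPS]
    n_vars = sum(t.startswith("x") for t in expr_tokens)
    return np.array(counts + [len(expr_tokens), n_vars], dtype=float)

class InterpretabilitySurrogate:
    """One-hidden-layer MLP trained online (plain SGD) on user interpretability scores."""
    def __init__(self, n_features, n_hidden=16, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, n_hidden)
        self.b2 = 0.0
        self.lr = lr

    def predict(self, x):
        h = np.tanh(x @ self.W1 + self.b1)
        return 1 / (1 + np.exp(-(h @ self.W2 + self.b2)))  # score in (0, 1)

    def update(self, x, y):
        # One SGD step on the squared error between predicted and user-given score.
        h = np.tanh(x @ self.W1 + self.b1)
        p = 1 / (1 + np.exp(-(h @ self.W2 + self.b2)))
        g = (p - y) * p * (1 - p)          # d(loss)/d(logit)
        gh = g * self.W2 * (1 - h ** 2)    # backprop through tanh
        self.W2 -= self.lr * g * h
        self.b2 -= self.lr * g
        self.W1 -= self.lr * np.outer(x, gh)
        self.b1 -= self.lr * gh

# Toy usage with a simulated user who prefers short expressions without exp/log.
surrogate = InterpretabilitySurrogate(n_features=len(OPS) + 2)
candidates = [["+", "x0", "x1"], ["exp", "*", "x0", "log", "x1"]]
for expr in candidates:
    x = featurize(expr)
    simulated_score = 1.0 if len(expr) <= 3 and "exp" not in expr else 0.2
    surrogate.update(x, simulated_score)
    print(expr, "->", round(float(surrogate.predict(x)), 3))
```

In a full system, the predicted score would be combined with the model's error as an additional GP objective, and the surrogate would be refined whenever the user rates newly generated expressions.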
2024
30 Jan 2024
Published
Files in this record:

3164353-2-31.pdf
  Access: closed access (copy available on request)
  Type: publisher's version (Documento in Versione Editoriale)
  License: publisher copyright
  Size: 11.18 MB
  Format: Adobe PDF

3164353-2-31-Post_print.pdf
  Access: open access
  Type: final post-refereeing draft (post-print)
  License: Digital Rights Management not defined
  Size: 11.79 MB
  Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/3068118
Citations
  • PubMed Central: n/a
  • Scopus: 4
  • Web of Science: n/a