Local search, semantics, and genetic programming: a global analysis / Anselmi, F., Castelli, M., D'Onofrio, A., Manzoni, L., Mariot, L., Saletta, M.. - In: SOFT COMPUTING. - ISSN 1432-7643. - 30:3(2026), pp. 1541-1559. [10.1007/s00500-025-11051-7]

Local search, semantics, and genetic programming: a global analysis

Anselmi, Fabio; Castelli, Mauro; d'Onofrio, Alberto; Manzoni, Luca; Mariot, Luca; Saletta, Martina
2026-01-01

Abstract

Geometric Semantic Genetic Programming (GSGP) is a powerful variant of Genetic Programming (GP) whose genetic operators induce unimodal fitness landscapes. In recent years, a new mutation operator, Geometric Semantic Mutation with Local Search (GSM-LS), has been proposed to include a local search step in the mutation process. The core idea of GSM-LS is to incorporate a linear regression step during mutation, thereby accelerating convergence toward high-quality solutions. Although GSM-LS speeds up the convergence of the evolutionary search, it is prone to overfitting. For this reason, it was suggested to apply GSM-LS only for a limited number of generations and then revert to standard geometric semantic mutation. A more recently defined variant of GSGP (called GSGP-reg) also includes a local search step and shares similar strengths and weaknesses with GSM-LS. Here, we investigate several strategies to mitigate overfitting in GSM-LS and GSGP-reg, ranging from simple regularized regression techniques to adaptive methods that estimate the overfitting risk at each mutation. The latter approaches partition the training set into two subsets: one used to perform the mutation, and the other to evaluate the risk of overfitting based on the mutation’s impact on held-out data. Experimental evaluations on seven real-world regression benchmarks show that, while plain GSGP underperforms on all datasets, methods incorporating local search often achieve significantly better test performance. For example, on the Airfoil dataset, the GSM-LS variant achieves a median RMSE below 10, compared to 30 with standard GSGP. On the LD50 and Bioavailability datasets, the proposed gen and ridge-regularized variants effectively mitigate overfitting, reducing test RMSE by up to 40% relative to baseline GSGP. We conclude that local search, when combined with regularization strategies, enhances GSGP’s performance and generalization capability across a diverse range of tasks.
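As an illustration of the two ideas mentioned in the abstract (a linear regression step inside the mutation, and a held-out split used to judge overfitting risk), the following is a minimal Python sketch operating directly on semantics vectors. It is not the authors' implementation. It assumes the mutated individual has the form alpha0 + alpha1 * parent + alpha2 * (r1 - r2), where r1 and r2 are the outputs of two random trees; the function names, the ridge penalty on alpha1 and alpha2 only, and the accept-if-not-worse rule on held-out data are all hypothetical choices made for this example.

```python
import numpy as np


def fit_ls_coefficients(parent, r1, r2, target, ridge=0.0):
    """Least-squares fit of alpha0 + alpha1*parent + alpha2*(r1 - r2) to target.

    ridge > 0 adds an L2 penalty on alpha1 and alpha2 (assumed form,
    the intercept alpha0 is left unpenalised).
    """
    X = np.column_stack([np.ones_like(parent), parent, r1 - r2])
    # Ridge via an augmented least-squares system: extra rows encode the penalty.
    penalty = np.sqrt(ridge) * np.diag([0.0, 1.0, 1.0])
    A = np.vstack([X, penalty])
    b = np.concatenate([target, np.zeros(3)])
    alphas, *_ = np.linalg.lstsq(A, b, rcond=None)
    return alphas


def apply_mutation(parent, r1, r2, alphas):
    """Semantics of the mutated individual for given coefficients."""
    return alphas[0] + alphas[1] * parent + alphas[2] * (r1 - r2)


def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def guarded_mutation(parent, r1, r2, target, fit_idx, held_idx, ridge=0.0):
    """Fit coefficients on one training subset, check the effect on the other.

    Hypothetical acceptance rule: keep the child only if its RMSE on the
    held-out subset is not worse than the parent's.
    """
    alphas = fit_ls_coefficients(
        parent[fit_idx], r1[fit_idx], r2[fit_idx], target[fit_idx], ridge
    )
    child = apply_mutation(parent, r1, r2, alphas)
    accept = rmse(child[held_idx], target[held_idx]) <= rmse(
        parent[held_idx], target[held_idx]
    )
    return (child if accept else parent), accept
```

Working on semantics vectors rather than on trees keeps the sketch short. A full GSGP system would also need to store the coefficients and the random trees so that the evolved individual can be evaluated on unseen data.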
Use this identifier to cite or link to this document: https://hdl.handle.net/11368/3139241