
Exploring the Effect of Genetic Improvement for Large Language Models-Generated Code

Giovanni Pinna (co-first author); Damiano Ravalico (co-first author); Luigi Rovito (co-first author); Luca Manzoni (second-to-last author); Andrea De Lorenzo (last author)
2025-01-01

Abstract

In recent years, the field of Natural Language Processing (NLP) has made considerable progress with the development of neural network-based models, leading to the creation of various Large Language Models (LLMs). These models have demonstrated strong performance across a wide range of NLP tasks, such as language translation, sentiment analysis, and named entity recognition. One notable application of LLMs is their ability to generate code automatically from simple problem descriptions. However, even advanced LLMs frequently generate incorrect code. To address this issue, we extend a recently proposed method that aims to improve the correctness of code generated by LLMs using an evolutionary approach known as Genetic Improvement (GI). Our method involves constructing a dynamic grammar based on the LLM-generated code and using a problem-agnostic fitness function. In our experiments, we evaluated the proposed method on 25 well-known and widely used problems across four different LLMs, both open-source and proprietary. We demonstrate that our approach significantly improves the accuracy of code generated by LLMs. Specifically, for problems that the LLM alone does not fully solve, we show that GI significantly improves the initial LLM-generated solution in 50% to 75% of cases across the tested models. Our proposed GI approach remains effective as long as the initial LLM-generated code, despite some errors, provides a solid foundation for constructing a correct program.
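To make the idea concrete, the sketch below illustrates the general shape of the approach described in the abstract: take an LLM-generated program as a seed, repeatedly mutate it, and keep variants that pass more test cases. This is not the authors' implementation; the seed program, the max_of example, the test suite, and the operator-swap mutation are hypothetical stand-ins, and the test-pass-rate fitness is only a simple proxy for the problem-agnostic fitness function mentioned in the abstract.

import ast
import random

random.seed(0)  # for a reproducible toy run

# Hypothetical seed, standing in for LLM-generated code that is "almost
# correct": it should return the maximum of a list but uses the wrong
# comparison operator.
SEED_SOURCE = """
def max_of(xs):
    best = xs[0]
    for x in xs:
        if x < best:
            best = x
    return best
"""

# Hypothetical input/output test cases; fitness is the fraction passed.
TESTS = [(([3, 1, 2],), 3), (([5],), 5), (([-1, -7, -3],), -1)]

def fitness(source):
    """Run the candidate program against the tests; 0.0 if it crashes."""
    try:
        env = {}
        exec(source, env)
        fn = env["max_of"]
        passed = sum(1 for args, expected in TESTS if fn(*args) == expected)
        return passed / len(TESTS)
    except Exception:
        return 0.0

# Toy mutation "grammar" derived from the seed's AST: swap comparison
# operators. The paper's dynamic grammar is richer; this only illustrates
# the evolve-and-evaluate loop.
COMPARISONS = [ast.Lt, ast.Gt, ast.LtE, ast.GtE]

def mutate(source):
    tree = ast.parse(source)
    compares = [n for n in ast.walk(tree) if isinstance(n, ast.Compare)]
    if compares:
        node = random.choice(compares)
        node.ops = [random.choice(COMPARISONS)()]
    return ast.unparse(tree)

def genetic_improvement(seed, generations=20, pop_size=10):
    population = [seed] + [mutate(seed) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == 1.0:
            break  # all tests pass
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(population, key=fitness)

if __name__ == "__main__":
    improved = genetic_improvement(SEED_SOURCE)
    print(improved)
    print("fitness:", fitness(improved))

In this toy setting the loop typically flips the faulty comparison within a generation or two, which mirrors the abstract's point that GI is effective when the initial LLM output already provides a solid foundation for a correct program.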
Files in this record:
s42979-025-04281-x.pdf (open access)
Type: Publisher's version
License: Creative Commons
Size: 1.28 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/3115302