
Exploring the Effect of Genetic Improvement for Large Language Models-Generated Code

Giovanni Pinna (co-first author); Damiano Ravalico (co-first author); Luigi Rovito (co-first author); Luca Manzoni (second-to-last author); Andrea De Lorenzo (last author)
2025-01-01

Abstract

In recent years, the field of Natural Language Processing (NLP) has made considerable progress with the development of neural network-based models, leading to the creation of various Large Language Models (LLMs). These models have demonstrated strong performance across a wide range of NLP tasks, such as language translation, sentiment analysis, and named entity recognition. One notable application of LLMs is their ability to generate code automatically from simple problem descriptions. However, even advanced LLMs frequently generate incorrect code. To address this issue, we extend a recently proposed method that aims to improve the correctness of code generated by LLMs using an evolutionary approach known as Genetic Improvement (GI). Our method involves constructing a dynamic grammar based on the LLM-generated code and using a problem-agnostic fitness function. In our experiments, we evaluated the proposed method on 25 well-known and widely used problems across four different LLMs, both open-source and proprietary. We demonstrate that our approach significantly improves the accuracy of code generated by LLMs. Specifically, for problems that the LLM alone does not fully solve, we show that GI significantly improves the initial LLM-generated solution in 50% to 75% of cases across the tested models. Our proposed GI approach remains effective as long as the initial LLM-generated code, despite some errors, provides a solid foundation for constructing a correct program.
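To make the idea concrete, the sketch below illustrates the general shape of the approach described in the abstract: take an LLM-generated program as a seed, repeatedly mutate it, and keep variants that pass more test cases. This is not the authors' implementation; the seed program, the max_of example, the test suite, and the operator-swap mutation are hypothetical stand-ins, and the test-pass-rate fitness is only a simple proxy for the problem-agnostic fitness function mentioned in the abstract.

import ast
import random

random.seed(0)  # for a reproducible toy run

# Hypothetical seed, standing in for LLM-generated code that is "almost
# correct": it should return the maximum of a list but uses the wrong
# comparison operator.
SEED_SOURCE = """
def max_of(xs):
    best = xs[0]
    for x in xs:
        if x < best:
            best = x
    return best
"""

# Hypothetical input/output test cases; fitness is the fraction passed.
TESTS = [(([3, 1, 2],), 3), (([5],), 5), (([-1, -7, -3],), -1)]

def fitness(source):
    """Run the candidate program against the tests; 0.0 if it crashes."""
    try:
        env = {}
        exec(source, env)
        fn = env["max_of"]
        passed = sum(1 for args, expected in TESTS if fn(*args) == expected)
        return passed / len(TESTS)
    except Exception:
        return 0.0

# Toy mutation "grammar" derived from the seed's AST: swap comparison
# operators. The paper's dynamic grammar is richer; this only illustrates
# the evolve-and-evaluate loop.
COMPARISONS = [ast.Lt, ast.Gt, ast.LtE, ast.GtE]

def mutate(source):
    tree = ast.parse(source)
    compares = [n for n in ast.walk(tree) if isinstance(n, ast.Compare)]
    if compares:
        node = random.choice(compares)
        node.ops = [random.choice(COMPARISONS)()]
    return ast.unparse(tree)

def genetic_improvement(seed, generations=20, pop_size=10):
    population = [seed] + [mutate(seed) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == 1.0:
            break  # all tests pass
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(population, key=fitness)

if __name__ == "__main__":
    improved = genetic_improvement(SEED_SOURCE)
    print(improved)
    print("fitness:", fitness(improved))

In this toy setting the loop typically flips the faulty comparison within a generation or two, which mirrors the abstract's point that GI is effective when the initial LLM output already provides a solid foundation for a correct program.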
Files in this record:
s42979-025-04281-x.pdf (open access)
Type: Publisher's version
License: Creative Commons
Size: 1.28 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/3115302