Curriculum learning (CL) consists in using a diverse set of user-provided test cases, with varying levels of difficulty and organized in a suitable progression, for learning a policy. The quality of test cases is important to allow optimization techniques as genetic programming (GP) to solve policy search problems. In this work, we evaluate large language models (LLMs) as providers of test cases for GP-based policy search. We consider two policy search tasks, a single-player and a multi-player game, and four LLMs differing in complexity and specialization, which we prompt in order to generate suitable test cases for the two games. We experimentally assess the intrinsic quality of LLM-generated test cases and their utility when inserted in a curriculum consumed by a GP optimization. We evaluate the robustness of the approach with respect to the way cases are scheduled in curricula and with respect to the policy representation, for which we use both graphs and linear programs evolved by GP. We observe that the effectiveness of LLM-assisted CL depends on both the choice of LLM and the design of the prompting and scheduling strategies. These findings highlight important considerations for leveraging LLMs in automated curriculum design for GP-based optimization.

Policy Search through Genetic Programming and LLM-assisted Curriculum Learning

Giorgia Nadizar;Gloria Pietropolli;Luca Manzoni;Eric Medvet
;
2025-01-01

Abstract

Curriculum learning (CL) consists in using a diverse set of user-provided test cases, with varying levels of difficulty and organized in a suitable progression, for learning a policy. The quality of test cases is important to allow optimization techniques as genetic programming (GP) to solve policy search problems. In this work, we evaluate large language models (LLMs) as providers of test cases for GP-based policy search. We consider two policy search tasks, a single-player and a multi-player game, and four LLMs differing in complexity and specialization, which we prompt in order to generate suitable test cases for the two games. We experimentally assess the intrinsic quality of LLM-generated test cases and their utility when inserted in a curriculum consumed by a GP optimization. We evaluate the robustness of the approach with respect to the way cases are scheduled in curricula and with respect to the policy representation, for which we use both graphs and linear programs evolved by GP. We observe that the effectiveness of LLM-assisted CL depends on both the choice of LLM and the design of the prompting and scheduling strategies. These findings highlight important considerations for leveraging LLMs in automated curriculum design for GP-based optimization.
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11368/3119618
 Avviso

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact