GP for Continuous Control: Teacher or Learner? The Case of Simulated Modular Soft Robots

Medvet, Eric; Nadizar, Giorgia
2024-01-01

Abstract

We consider the problem of optimizing a controller for agents whose observation and action spaces are continuous, i.e., where the controller is a multivariate real function f: R^n → R^m. We use genetic programming (GP) to solve this optimization problem. Namely, we employ a multi-tree-based GP variant, where a candidate solution is an array of m trees, each encoding a scalar-valued function of the agent observation. We compare this form of optimization against the more common one, where the controller is a multi-layer perceptron with a predefined topology whose weights are optimized through neuroevolution (NE). Moreover, we consider GraphEA, an evolutionary algorithm that directly evolves graphs, each having n input nodes and m output nodes. We apply these three approaches to simulated modular soft robots, where a robot is an aggregation of identical soft modules, each employing a controller that processes the local observation and produces the local action. We find that, in our scenario, multi-tree-based GP is competitive with NE and tends to produce different behaviors. We then experimentally investigate the possibility of optimizing a controller using another, pre-optimized controller as a teacher, i.e., we realize a form of offline imitation learning. We consider all the teacher-learner pairs resulting from the three evolutionary algorithms and find that NE is a better learner than GP and GraphEA. However, controllers obtained through offline imitation learning are far less effective than those obtained through direct evolution. We hypothesize that this gap in effectiveness may be explained by the fact that direct evolution allows exploring a larger portion of the observation-action space during the simulations.
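The abstract describes a controller as a function from R^n to R^m, represented either as an array of m GP trees (one per output) or as a fixed-topology multi-layer perceptron, shared by all modules of the robot, and a learner optimized offline to reproduce a teacher's actions. The Python sketch below illustrates these ideas under explicit assumptions: the tuple-based tree encoding, the operator set with protected division, the tanh activation, and the mean-squared-error imitation loss are illustrative choices, not details taken from the paper, and all function names (`eval_tree`, `multi_tree_controller`, `mlp_controller`, `robot_actions`, `imitation_fitness`) are hypothetical.

```python
import math
from typing import Callable, Sequence

# Hypothetical tree encoding (not necessarily the paper's):
#   ("var", i)      reads the i-th observation component
#   ("const", c)    is a numeric constant
#   (op, lhs, rhs)  applies a binary operator to two subtrees
Tree = tuple
Controller = Callable[[Sequence[float]], list]

BINARY_OPS: dict[str, Callable[[float, float], float]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    # Protected division (a common GP convention, assumed here): 1.0 near zero
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}


def eval_tree(tree: Tree, obs: Sequence[float]) -> float:
    """Recursively evaluate one tree on an observation vector."""
    kind = tree[0]
    if kind == "var":
        return obs[tree[1]]
    if kind == "const":
        return tree[1]
    return BINARY_OPS[kind](eval_tree(tree[1], obs), eval_tree(tree[2], obs))


def multi_tree_controller(trees: Sequence[Tree]) -> Controller:
    """An array of m trees gives f: R^n -> R^m, one tree per output."""
    return lambda obs: [eval_tree(t, obs) for t in trees]


def mlp_controller(weights, biases) -> Controller:
    """Fixed-topology MLP (tanh assumed); NE would evolve only weights/biases.
    weights[k][j] is the row of input weights of neuron j in layer k."""
    def f(obs):
        h = list(obs)
        for W, b in zip(weights, biases):
            h = [math.tanh(sum(w * x for w, x in zip(row, h)) + bj)
                 for row, bj in zip(W, b)]
        return h
    return f


def robot_actions(controller: Controller, local_observations) -> list:
    """Every module runs the same controller on its own local observation."""
    return [controller(obs) for obs in local_observations]


def imitation_fitness(learner: Controller, teacher: Controller,
                      observations) -> float:
    """Offline imitation (assumed loss): mean squared error between learner
    and teacher actions on a fixed set of recorded observations; lower is better."""
    total, count = 0.0, 0
    for obs in observations:
        for a_l, a_t in zip(learner(obs), teacher(obs)):
            total += (a_l - a_t) ** 2
            count += 1
    if count == 0:
        raise ValueError("no observations to imitate")
    return total / count
```

For example, with `trees = [("+", ("var", 0), ("const", 1.0)), ("*", ("var", 0), ("var", 1))]`, `multi_tree_controller(trees)` maps the observation `[2.0, 3.0]` to `[3.0, 6.0]`. In this sketch the imitation fitness is computed only on observations recorded beforehand, so the learner is never evaluated on the states its own actions would lead to, whereas direct evolution evaluates each candidate in closed loop during the simulation.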
2024
ISBN: 9789819984121, 9789819984138

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/3070998

Notice: the data displayed have not been validated by the university.
