Learning to explore tree neighbourhoods for phylogenetic inference

Camerota Verdù, Federico Julian; Gasparin, Andrea; Bortolussi, Luca; Castelli, Lorenzo
2026-01-01

Abstract

Phylogenetic inference is a key challenge in computational biology, with applications ranging from evolutionary analysis to comparative genomics. The balanced minimum evolution problem (BMEP) offers a well-established formulation of this problem, but remains computationally intractable for large instances. In this work, we propose a reinforcement learning (RL) framework to tackle the BMEP through local search in the space of phylogenetic trees. Our contributions are threefold: (i) we introduce an improved RL formulation tailored to the structure of phylogenetic inference in the context of the BMEP; (ii) we train an RL agent capable of solving instances with up to 100 taxa; and (iii) we investigate the generalization capabilities of the learned policy across different substitution models, instance sizes, and datasets. To address the limitations of relying solely on the learned policy at inference time, we integrate it into a novel search-based framework that enables effective adaptation during evaluation. Experimental results show that our method outperforms greedy heuristics and matches the performance of state-of-the-art algorithms for the BMEP. When tested under significant distributional shifts, it substantially narrows the gap to state-of-the-art algorithms, demonstrating the potential of RL for phylogenetic inference.
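For context, the objective the abstract refers to is the balanced minimum evolution criterion, which scores a tree topology via Pauplin's formula: the tree length is the sum, over all leaf pairs (i, j), of d_ij · 2^(−τ_ij), where d_ij is the estimated evolutionary distance and τ_ij is the number of edges on the tree path between the two leaves. The sketch below (a minimal illustration; function and variable names are ours, not taken from the paper) computes this score for a toy four-taxon tree:

```python
from collections import deque
from itertools import combinations

def bme_length(adj, leaves, d):
    """Balanced minimum evolution length (Pauplin's formula):
    sum over leaf pairs (i, j) of d[i][j] * 2**(-tau_ij),
    where tau_ij is the number of edges between leaves i and j."""
    def topo_dist(src):
        # BFS from one leaf gives edge-count distances to all nodes.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    total = 0.0
    for i, j in combinations(leaves, 2):
        tau = topo_dist(i)[j]
        total += d[i][j] * 2.0 ** (-tau)
    return total

# Toy example: unrooted quartet tree ((A,B),(C,D)) with
# internal nodes X and Y joined by one edge.
adj = {
    "A": ["X"], "B": ["X"], "C": ["Y"], "D": ["Y"],
    "X": ["A", "B", "Y"], "Y": ["C", "D", "X"],
}
d = {u: {} for u in "ABCD"}
for (u, v), w in {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 4,
                  ("B", "C"): 4, ("B", "D"): 4, ("C", "D"): 2}.items():
    d[u][v] = d[v][u] = w
print(bme_length(adj, list("ABCD"), d))  # prints 3.0
```

A local-search method of the kind described in the abstract would repeatedly evaluate this score on neighbouring topologies (e.g. trees reachable by nearest-neighbour-interchange moves) and move to a neighbour with a lower value.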
Files in this record:

File: CamerotaVerdù_BriefingsBioinf_27_2026_bbaf732.pdf
Access: open access
Type: published version (Documento in Versione Editoriale)
License: Creative Commons
Size: 687.1 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/3124998
Citations
  • PMC: 1
  • Scopus: 0
  • Web of Science: 0