Scalable, fast and accurate differential gene expression testing from millions of cells of multiple patients / Santacatterina, G., Tosato, N., Milite, S., Davydzenka, K., Insaghi, E., Sanguinetti, G., Cozzini, S., Egidi, L., Caravagna, G. - In: NATURE COMMUNICATIONS. - ISSN 2041-1723. - (2026), pp. ---. [10.1038/s41467-026-74451-9]

Scalable, fast and accurate differential gene expression testing from millions of cells of multiple patients

Giovanni Santacatterina;Niccolò Tosato;Edoardo Insaghi;Guido Sanguinetti;Leonardo Egidi;Giulio Caravagna
2026-01-01

Abstract

Since the development of DNA microarrays and, later, bulk RNA sequencing, testing with statistically independent samples has been the standard method for detecting genes with different transcription patterns. Single-cell assays challenge this assumption because individual cells are statistically dependent, and all proposed methodologies present mathematical limitations or computational bottlenecks that prevent a seamless integration of data from many cells and patients simultaneously. In this work, we solve this crucial limitation by introducing a Bayesian framework that retrieves the independence structure at the level of individual patients, separating differences across individuals from actual transcriptional differences. Leveraging multi-GPU computing and variational inference, our approach excels across different experimental designs and scales to analyse over 10 million cells. This framework enables single-cell differential expression analysis that can finally integrate datasets from large clinical cohorts, atlas projects, or drug-response screens with thousands of samples and millions of cells.
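The dependence problem described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's Bayesian framework; it is a minimal, hypothetical example (the expression values, the patient labels, and the helper names `patient_means` and `standard_error` are all assumptions made for illustration). It shows that treating every cell as an independent sample understates uncertainty when cells from the same patient share a patient-specific offset, whereas summarising at the patient level gives a more honest standard error.

```python
import math
import statistics
from collections import defaultdict


def patient_means(values, patients):
    """Average the cell-level values of each patient (one number per patient)."""
    grouped = defaultdict(list)
    for value, patient in zip(values, patients):
        grouped[patient].append(value)
    return {patient: statistics.mean(cells) for patient, cells in grouped.items()}


def standard_error(xs):
    """Standard error of the mean, assuming the entries of xs are independent."""
    return statistics.stdev(xs) / math.sqrt(len(xs))


# Hypothetical expression values for one gene in one condition:
# two patients, three cells each, with a large patient-to-patient offset.
values = [10, 11, 12, 20, 21, 22]
patients = ["p1", "p1", "p1", "p2", "p2", "p2"]

# Naive view: six cells treated as six independent samples.
se_cells = standard_error(values)

# Patient-level view: two independent units, one mean per patient.
se_patients = standard_error(list(patient_means(values, patients).values()))

print(f"cell-level SE: {se_cells:.2f}, patient-level SE: {se_patients:.2f}")
```

In this toy data the cell-level standard error is smaller than the patient-level one, because most of the variation sits between patients rather than between cells. Averaging per patient is only one simple way to respect that structure; the abstract describes a model-based alternative that works with the individual cells directly.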

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/3139261