Scalable, fast and accurate differential gene expression testing from millions of cells of multiple patients / Santacatterina, G., Tosato, N., Milite, S., Davydzenka, K., Insaghi, E., Sanguinetti, G., Cozzini, S., Egidi, L., Caravagna, G.. - In: NATURE COMMUNICATIONS. - ISSN 2041-1723. - (2026), pp. ---. [10.1038/s41467-026-74451-9]
Scalable, fast and accurate differential gene expression testing from millions of cells of multiple patients
Giovanni Santacatterina;Niccolò Tosato;Edoardo Insaghi;Guido Sanguinetti;Leonardo Egidi;Giulio Caravagna
2026-01-01
Abstract
Since the development of DNA microarrays and later RNA bulk sequencing, testing with statistically independent samples has been the standard method for detecting genes with different transcription patterns. Single-cell assays challenge these assumptions because individual cells are statistically dependent, and all proposed methodologies present mathematical limitations or computational bottlenecks that prevent a seamless integration of data from many cells and patients simultaneously. In this work, we solve this crucial limitation by introducing a Bayesian framework that retrieves the independence structure at the level of individual patients, separating differences across individuals from actual transcriptional differences. Leveraging multi-GPU and variational inference, our approach excels across different experimental designs and scales to analyse over 10 million cells. This framework enables single-cell differential expression analysis that can finally integrate datasets from large clinical cohorts, atlas projects, or drug-response screens with thousands of samples and millions of cells.