A Hitchhiker Guide to Structural Variant Calling: A Comprehensive Benchmark Through Different Sequencing Technologies

Nardone, Giuseppe Giovanni; Santin, Aurora; Morgan, Anna; Spedicati, Beatrice; Concas, Maria Pina; Gasparini, Paolo; Girotto, Giorgia
2025-01-01

Abstract

Background: Structural variants (SVs) play a significant role in gene function and are implicated in numerous human diseases. With advances in sequencing technologies, identifying SVs through whole-genome sequencing (WGS) has become a key area of research. However, SV detection results remain variable because of the wide range of available tools and the absence of standardized methodologies. Methods: We assessed the accuracy of SV detection across short-read (srWGS) and long-read (lrWGS) sequencing technologies (Illumina short reads, PacBio long reads, and Oxford Nanopore Technologies (ONT) long reads) using deletion calls from the HG002 benchmark dataset. We examined how variables such as the variant calling algorithm, reference genome choice, alignment strategy, and sequencing coverage influence SV detection performance. Results: DRAGEN v4.2 delivered the highest accuracy among the ten srWGS callers tested. Notably, using a graph-based multigenome reference improved SV calling in complex genomic regions. Moreover, we showed that combining minimap2 with Manta achieved performance comparable to DRAGEN for srWGS. For PacBio lrWGS data, Sniffles2 outperformed the other two tools tested. For ONT lrWGS, alignment with minimap2 consistently gave the best results among the four aligners tested. At coverages up to 10×, Duet achieved the highest accuracy, while at higher coverages Dysgu performed best. Conclusions: These results show for the first time that the choice of alignment software significantly affects SV calling from srWGS, and that an appropriate aligner and caller combination can achieve results comparable to commercial solutions. For lrWGS, performance depends on the sequencing technology and coverage.
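The abstract does not state how deletion calls were matched to the HG002 benchmark or which accuracy metric was reported. As a rough illustration only, the sketch below scores a call set against a truth set, assuming a 50% reciprocal-overlap matching rule and precision, recall, and F1 as the metrics. The `Deletion` class, the `score_calls` function, the threshold, and the toy coordinates are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deletion:
    """A deletion interval (hypothetical representation, half-open coordinates)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def reciprocal_overlap(a: Deletion, b: Deletion) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def score_calls(calls, truth, min_overlap=0.5):
    """Greedily match each call to at most one unused truth deletion.

    The 0.5 threshold is an assumption, not a value reported in the abstract.
    Returns (precision, recall, f1).
    """
    matched_truth = set()
    true_pos = 0
    for call in calls:
        for i, t in enumerate(truth):
            if i not in matched_truth and reciprocal_overlap(call, t) >= min_overlap:
                matched_truth.add(i)
                true_pos += 1
                break
    precision = true_pos / len(calls) if calls else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (invented for illustration, not from HG002).
truth = [
    Deletion("chr1", 1000, 2000),
    Deletion("chr1", 5000, 5600),
    Deletion("chr2", 300, 900),
]
calls = [
    Deletion("chr1", 1050, 2010),  # overlaps the first truth deletion
    Deletion("chr1", 7000, 7500),  # no counterpart in the truth set
]
precision, recall, f1 = score_calls(calls, truth)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same scoring over call sets produced with different callers, aligners, or coverage levels would give one simple way to compare them on a common footing, which is the kind of comparison the abstract summarizes.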
Files in this record:
biomedicines-13-01949.pdf (open access; Adobe PDF, 1.73 MB)
Description: Supplementary material available at: https://www.mdpi.com/article/10.3390/biomedicines13081949/s1
Type: Published version
License: Creative Commons
Use this identifier to cite or link to this document: https://hdl.handle.net/11368/3115398