Whole Genome Sequencing of Italian Isolate Populations to identify rare and characteristic variants and to generate a reference panel for imputation.

COCCA, MASSIMILIANO
2017-03-27

Abstract

Background: The falling cost of Whole Exome Sequencing (WES) and Whole Genome Sequencing (WGS) has started a race toward denser and more accurate maps of the human genome. Yet, even with the contribution of large projects such as UK10K (The UK10K Consortium, 2015), the resources currently available for Genome-Wide Association Studies (GWAS) outdo those available for whole-genome rare-variant analyses in terms of sample size and power to detect associations (e.g. UKB (Sudlow et al., 2015), GIANT (Speliotes et al., 2010)). GWAS are still the most widely used tool for discovering correlations between genotypes and phenotypes, partly because of the development of imputation algorithms, which infer missing genotypes in a sample using a scaffold of known haplotypes (Marchini and Howie, 2010). The release of the 1000 Genomes Project data (1000 Genomes Project Consortium et al., 2012) allowed the creation of a reference panel, based on Next Generation Sequencing data, that comprises populations of different ancestries (Howie et al., 2011). This initial resource proved extremely valuable to the scientific community and has recently been updated (Sudmant et al., 2015). It has also been shown how useful it can be to include WGS data from the study population in the imputation reference panel (Sidore et al., 2015). The race for the ‘best panel’ is still open, and many collaborations based on data sharing are emerging to provide a state-of-the-art resource (McCarthy et al., 2016).
Research aims: With this work we aim to create a resource that improves imputation quality and increases statistical power in the Italian Network of Genetic Isolates (INGI) cohorts and that, at the same time, provides data for a better insight into the structure and distinctive characteristics of our cohorts compared with outbred populations.
Methods: We generated low-coverage WGS data for ∼1000 samples from three INGI cohorts: Carlantino (CARL), Friuli Venezia Giulia (FVG) and Val Borbera (VBI). After characterizing these data, we describe the generation of an imputation reference panel that includes both the INGI and the 1000 Genomes Project phase 3 data.
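The imputation principle mentioned in the Background (inferring missing genotypes from a scaffold of known reference haplotypes) can be illustrated with a deliberately simplified sketch. It assumes phased, biallelic haplotypes coded 0/1, with `None` marking untyped sites, and it swaps the hidden Markov models used by real imputation software for a toy nearest-neighbour rule: the hypothetical function `impute_haplotype` selects the `k` reference haplotypes with the fewest mismatches at the target's observed sites and fills each missing allele by majority vote. It is an illustration only, not the pipeline used in this work.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Assumption: phased, biallelic haplotypes coded 0 (ref) / 1 (alt); None marks an untyped site.
Haplotype = Sequence[int]


def impute_haplotype(
    target: Sequence[Optional[int]],
    panel: List[Haplotype],
    k: int = 3,
) -> List[int]:
    """Toy imputation: fill None entries in `target` by majority vote among the
    k reference haplotypes with the fewest mismatches at the observed sites."""
    if not panel or k < 1:
        raise ValueError("need a non-empty panel and k >= 1")
    observed = [i for i, allele in enumerate(target) if allele is not None]
    if not observed:
        raise ValueError("target has no observed sites to match on")

    def mismatches(ref: Haplotype) -> int:
        # Count disagreements with the target at genotyped positions only.
        return sum(ref[i] != target[i] for i in observed)

    # sorted() is stable, so equally good haplotypes keep their panel order.
    closest = sorted(panel, key=mismatches)[:k]

    imputed: List[int] = []
    for i, allele in enumerate(target):
        if allele is not None:
            imputed.append(allele)  # genotyped alleles are kept unchanged
        else:
            # Majority allele among the closest haplotypes; on a tie,
            # Counter.most_common returns the allele encountered first.
            votes = Counter(ref[i] for ref in closest)
            imputed.append(votes.most_common(1)[0][0])
    return imputed


panel = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 0, 0],
]
target = [0, None, 1, None, 0]
print(impute_haplotype(target, panel))  # → [0, 0, 1, 1, 0]
```

In this toy setting, a panel that contains haplotypes closer to the target (for example, haplotypes sequenced from the study population itself) makes the best matches at observed sites more likely to carry the same untyped alleles, which mirrors the motivation for adding population-specific WGS data to a reference panel.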
GASPARINI, PAOLO
29
2015/2016
Sector MED/38 - General and Specialist Pediatrics
Università degli Studi di Trieste
File in this item: COCCA_PHD_dissertation_23032017.pdf (doctoral thesis, Adobe PDF, 10.66 MB, open access)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11368/2908127