Whole Genome Sequencing of Italian Isolate Populations to identify rare and characteristic variants and to generate a reference panel for imputation / Cocca, Massimiliano. - (2017 Mar 27).
Whole Genome Sequencing of Italian Isolate Populations to identify rare and characteristic variants and to generate a reference panel for imputation.
COCCA, MASSIMILIANO
2017-03-27
Abstract
Background: The drop in Whole Exome Sequencing (WES) and Whole Genome Sequencing (WGS) prices has started a race toward the generation of denser and more accurate maps of the human genome. Even with the contribution of large projects such as UK10K (The UK10K Consortium, 2015), however, the resources currently available for Genome Wide Association Studies (GWAS), e.g. UKB (Sudlow et al., 2015) and GIANT (Speliotes et al., 2010), still exceed those available for whole-genome rare variant analyses in terms of sample size and power to detect associations. GWAS is still the most widely used tool for discovering correlations between genotypes and phenotypes, partly thanks to the development of imputation algorithms, which infer missing genotypes in a sample using a scaffold of known haplotypes (Marchini and Howie, 2010). The release of the 1000 Genomes Project data (1000 Genomes Project Consortium et al., 2012) allowed the creation of a reference panel, based on Next Generation Sequencing data, that comprises populations of different ancestries (Howie et al., 2011). This initial resource proved to be extremely valuable for the scientific community and has recently been updated (Sudmant et al., 2015). Moreover, it showed how useful it can be to include WGS data from the population under study in a reference panel for imputation (Sidore et al., 2015). To date, the race for the ‘best panel’ is still on, and many collaborations based on data sharing are arising to provide a ‘state of the art’ resource (McCarthy et al., 2016).
Research aims: With this work we aim to create a resource that can be used to improve imputation quality and increase statistical power in the Italian Network of Genetic Isolates (INGI) cohorts and that, at the same time, provides data offering better insight into the structure and distinctive characteristics of our cohorts compared with outbred populations.
Methods: We generated low-coverage WGS data for ~1000 samples belonging to three different INGI cohorts: Carlantino (CARL), Friuli Venezia Giulia (FVG) and Val Borbera (VBI). After characterizing these data, we describe the generation of a reference panel for imputation that includes both the INGI and the 1000 Genomes Project phase 3 data.

File | Description | Access | Size | Format |
---|---|---|---|---|
COCCA_PHD_dissertation_23032017.pdf | Doctoral thesis | Open access | 10.66 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.