Whole Genome Sequencing of Italian Isolate Populations to identify rare and characteristic variants and to generate a reference panel for imputation.

Background: The drop in Whole Exome Sequencing (WES) and Whole Genome Sequencing (WGS) prices has started a race toward denser and more accurate maps of the human genome. Yet even with the contribution of large projects such as UK10K (The UK10K Consortium, 2015), the resources currently available for Genome Wide Association Studies (GWAS), e.g. UKB (Sudlow et al., 2015) and GIANT (Speliotes et al., 2010), outdo those available for whole-genome rare variant analyses in terms of sample size and power to detect associations. GWAS is still the most widely used tool for discovering correlations between genotypes and phenotypes, thanks in part to the development of imputation algorithms, which infer missing genotypes in a sample using a scaffold of known haplotypes (Marchini and Howie, 2010). The release of the 1000 Genomes Project data (1000 Genomes Project Consortium et al., 2012) allowed the creation of a reference panel, based on Next Generation Sequencing data, that comprises populations of different ancestries (Howie et al., 2011). This initial resource proved extremely valuable for the scientific community and has recently been updated (Sudmant et al., 2015). It has also been shown how useful it can be to include WGS data from the population under study in a reference panel for imputation (Sidore et al., 2015). The race for the ‘best panel’ is still open, and many data-sharing collaborations are arising to provide a ‘state of the art’ resource (McCarthy et al., 2016).
Research aims: With this work we aim to create a resource that can be used to improve imputation quality and increase statistical power in the Italian Network of Genetic Isolates (INGI) cohorts and that, at the same time, provides data for a better insight into the structure and peculiar characteristics of our cohorts compared with outbred populations.
Methods: We generated low-coverage WGS data for ∼1000 samples belonging to three different INGI cohorts: Carlantino (CARL), Friuli Venezia Giulia (FVG) and Val Borbera (VBI). After characterizing these data, we describe the generation of a reference panel for imputation that includes both the INGI and the 1000 Genomes Project phase 3 data.

Whole Genome Sequencing of Italian Isolate Populations to identify rare and characteristic variants and to generate a reference panel for imputation / Cocca, Massimiliano. - (2017 Mar 27).

COCCA, MASSIMILIANO

2017-03-27

Year of defense: 27-Mar-2017
Supervisor affiliated with the University: GASPARINI, PAOLO
Cycle: 29
Academic year: 2015/2016
Scientific-disciplinary sectors (valid until 24/06/2024): Sector MED/38 - General and Specialist Pediatrics
Publisher: Università degli Studi di Trieste
Appears in types: 8.1 Doctoral thesis

Files in this item:

File	Description	Access	Size	Format
COCCA_PHD_dissertation_23032017.pdf	Doctoral thesis	Open access	10.66 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/2908127
