A population-based approach for gene prioritization in understanding complex traits

Mezzavilla M.; Gasparini P.
2020-01-01

Abstract

Gene prioritization is the process of determining which variants and genes identified in genetic analyses are likely to cause a disease or a variation in a phenotype. For many genes, neither in vitro nor in vivo testing is available, so assessing their pathogenic role can be challenging and can lead to false-positive or false-negative results. In this paper, we propose a gene prioritization score that is specific to the population of interest. We introduce the concept of the singleton-cohort variant (SC variant): a variant with an allele count of one in the cohort under study. The difference between the normalized count of SC variants in the coding region and the normalized count of SC variants in the non-coding region should give an indication of the level of constraint on that gene in a specific population. The score is negative when constraints allow SC variants only in the non-coding region; conversely, it is positive when there are no such constraints. A complementary score is the sum of the normalized SC variant counts in the coding and non-coding regions, which could be used as a proxy for positive or strong purifying selection in a specific population. Our methodology showed a high level of constraint for genes such as USP34 in all subpopulations tested (1000 Genomes dataset). In contrast, some genes showed a strongly negative score only in specific populations, e.g., MYT1L in Europeans, UBR5 in East Asians, and FBXO11 in Africans.
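The two scores described in the abstract can be illustrated with a minimal sketch. The abstract does not state how SC variant counts are normalized, so the sketch assumes normalization by region length (SC variants per 1,000 bp); the names `GeneCounts`, `count_singletons`, and `sc_scores` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class GeneCounts:
    """Per-gene inputs for one cohort (hypothetical container)."""
    gene: str
    sc_coding: int      # SC variants observed in the coding region
    sc_noncoding: int   # SC variants observed in the non-coding region
    coding_bp: int      # coding region length in base pairs
    noncoding_bp: int   # non-coding region length in base pairs


def count_singletons(allele_counts: Iterable[int]) -> int:
    """Count variants whose allele count in the cohort is exactly one."""
    return sum(1 for ac in allele_counts if ac == 1)


def normalized(count: int, length_bp: int, per: int = 1000) -> float:
    """ASSUMPTION: normalize by region length (variants per `per` bp).
    The paper's actual normalization may differ."""
    if length_bp <= 0:
        raise ValueError("region length must be positive")
    return count * per / length_bp


def sc_scores(g: GeneCounts) -> Tuple[float, float]:
    """Return (difference score, sum score).

    difference = normalized coding count - normalized non-coding count
    sum        = normalized coding count + normalized non-coding count
    """
    coding = normalized(g.sc_coding, g.coding_bp)
    noncoding = normalized(g.sc_noncoding, g.noncoding_bp)
    return coding - noncoding, coding + noncoding


# Illustrative values only, not data from the paper.
example = GeneCounts("GENE_A", sc_coding=2, sc_noncoding=30,
                     coding_bp=4000, noncoding_bp=20000)
difference, total = sc_scores(example)
```

With these made-up inputs the difference score is negative, which under the abstract's interpretation would correspond to a gene whose SC variants are concentrated in the non-coding region.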
Files in this item:
File: Mezzavilla2020_Article_APopulation-basedApproachForGe.pdf
Access: closed
Type: publisher's version
License: publisher's copyright
Size: 1.9 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/2964103