Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity

Steven Gazal, Omer Weissbrod, Farhad Hormozdiari, Kushal K. Dey, Joseph Nasser, Karthik A. Jagadeesh, Daniel J. Weiner, Huwenbo Shi, Charles P. Fulco, Luke J. O’Connor, Bogdan Pasaniuc, Jesse M. Engreitz & Alkes L. Price.
Nature Genetics. 2022-06-06.
Abstract
Disease-associated single-nucleotide polymorphisms (SNPs) generally do not implicate target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis. Here, we developed a heritability-based framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk. Our optimal combined S2G strategy (cS2G) included seven constituent S2G strategies and achieved a precision of 0.75 and a recall of 0.33, more than doubling the recall of any individual strategy. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 5,095 causal SNP–gene-disease triplets (with S2G-derived functional interpretation) with high confidence. We further applied cS2G to provide an empirical assessment of disease omnigenicity; we determined that the top 1% of genes explained roughly half of the SNP heritability linked to all genes and that gene-level architectures vary with variant allele frequency.
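The omnigenicity statement above (the top 1% of genes explaining roughly half of the SNP heritability linked to all genes) is a concentration measure over per-gene heritability estimates. A minimal sketch of that arithmetic follows, assuming a plain list of non-negative per-gene heritability values. The function name `top_gene_share` and the toy numbers are hypothetical illustrations, not the authors' code or data, and the sketch does not reproduce how the paper estimates per-gene heritability.

```python
import math


def top_gene_share(gene_h2, top_fraction=0.01):
    """Return the share of total gene-linked heritability held by the top genes.

    gene_h2: iterable of per-gene heritability values (assumed non-negative).
    top_fraction: fraction of genes to treat as "top" (0.01 = top 1%).
    """
    # Rank genes from largest to smallest heritability.
    values = sorted((float(v) for v in gene_h2), reverse=True)
    total = sum(values)
    if not values or total <= 0:
        raise ValueError("need at least one gene and a positive total heritability")
    # Always include at least one gene, rounding the top set up.
    n_top = max(1, math.ceil(top_fraction * len(values)))
    return sum(values[:n_top]) / total


# Hypothetical toy data: one dominant gene among 100 genes.
toy_h2 = [50.0] + [1.0] * 99
share = top_gene_share(toy_h2)
print(f"Top 1% of genes hold {share:.1%} of gene-linked heritability")
```

With real estimates, the same function could be applied to the per-gene heritability values distributed with the paper's data release, after deciding how to handle any negative estimates.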

Related data

Data summary
The list of 19,995 genes, summary statistics of the 63 independent traits, training and validation critical gene sets, S2G and cS2G strategies, SNP annotations, predicted causal SNP–disease pairs from UK Biobank fine-mapping analyses and from the NHGRI-EBI GWAS Catalog, and SNP heritability causally explained by SNPs linked to each gene have been made publicly available at https://alkesgroup.broadinstitute.org/cS2G
Links for all data sets used to create S2G strategies are provided in Supplementary Table 26.
Access to the UK Biobank resource is available via application
The GWAS Catalog is available
Open Targets SNP–gene pairs are available
SNP–gene pairs from ref. 48 are available
The code to estimate precision and recall of S2G strategies and the code to create combined S2G strategies have been made publicly available.