Volume 3 Supplement 7
Inference of disease associations with unmeasured genetic variants by combining results from genome-wide association studies with linkage disequilibrium patterns in a reference data set
© Hadley and Strachan; licensee BioMed Central Ltd. 2009
Published: 15 December 2009
Results from whole-genome association studies of many common diseases are now available. Increasingly, these are being incorporated into meta-analyses to increase the power to detect weak associations with measured single-nucleotide polymorphisms (SNPs). Imputation of genotypes at unmeasured loci has been widely applied using patterns of linkage disequilibrium (LD) observed in the HapMap panels, but there is a need for alternative methods that can utilize the pooled effect estimates from meta-analyses and explore possible associations with SNPs and haplotypes that are not included in HapMap.
By a weighted average technique, we show that association results for common SNPs in an observed data set can be scaled and combined to infer the effect of a genetic variant that has been measured only in an independent reference data set. We show that the ratio p(R-1)/[1 + p(R-1)], where R is the relative risk associated with a measured or unmeasured allele of frequency p, is appropriately scaled by 1/D' and weighted in proportion to r2, where D' and r2 are both common measures of LD derived from the reference data set.
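Both LD measures can be computed from haplotype and allele frequencies estimated in the reference data set. The sketch below assumes those frequencies are already available (for example, from phased haplotypes). The function name `ld_measures` and the example frequencies are hypothetical, and D' is kept signed rather than reported as an absolute value.

```python
def ld_measures(p_ab: float, p: float, q: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci.

    p_ab -- frequency of haplotypes carrying both the target allele and the tag allele
    p    -- frequency of the target allele
    q    -- frequency of the tag allele
    """
    d = p_ab - p * q  # covariance measure of LD
    if d >= 0:
        d_max = min(p * (1 - q), q * (1 - p))
    else:
        d_max = min(p * q, (1 - p) * (1 - q))
    d_prime = d / d_max if d_max > 0 else 0.0  # Lewontin's normalized D
    r2 = d * d / (p * (1 - p) * q * (1 - q))    # squared allelic correlation
    return d, d_prime, r2


# Illustrative values only: target allele at 20%, tag allele at 40%,
# and 80% of target-carrying haplotypes also carry the tag allele.
d, d_prime, r2 = ld_measures(p_ab=0.16, p=0.2, q=0.4)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r2 = {r2:.3f}")
```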
We illustrate this computationally simple method by combining the results of a genome-wide association screen from the North American Rheumatoid Arthritis Consortium with LD measures from the British 1958 Birth Cohort, and explore the validity of underlying assumptions about the generalizability of LD from one population to another, and from healthy subjects to subjects with clinical disease.
The HLA allele DRB1*04 has been shown to be more strongly associated with rheumatoid arthritis than nearby tagging single-nucleotide polymorphisms (SNPs). We propose here a method of inferring the effects of an unmeasured genetic variant (such as DRB1*04) using linkage disequilibrium (LD) measures from an independent reference data set (the British 1958 Birth Cohort) to scale and weight the associations of rheumatoid arthritis with tagSNPs in the North American Rheumatoid Arthritis Consortium (NARAC) data set, supplied as Problem 1 for the Genetic Analysis Workshop 16.
The British 1958 Birth Cohort (B58C) comprises all infants born in England, Wales, and Scotland during one week in 1958. During a follow-up in 2002 to 2004, a cell-line-backed DNA collection was established as a nationally representative reference set for genetic case-control studies. Field protocols and consent forms were approved by the South East England Multi-Centre Research Ethics Committee. Genome-wide data from the Illumina HumanHap550 Beadarray on 1430 members of the B58C were deposited by the Wellcome Trust Sanger Institute. In addition, HLA typing data obtained with Dynal technologies were deposited by the Diabetes & Inflammation Laboratory, Cambridge. Further details about these deposits and the B58C DNA Collection are published online.
The NARAC data set, provided as Problem 1 for the Genetic Analysis Workshop 16, consists of individual-level genotype data for 550,000 SNPs typed on the Illumina HumanHap550 Beadarray, linked to rheumatoid arthritis case/control status and to HLA alleles at the DRB1 locus. We derived a numerical score representing the number of DRB1*04 alleles carried by each individual in both the NARAC and B58C data sets. Tests of association between rheumatoid arthritis case/control status and each tagSNP in the MHC region (chromosome 6, 27-33 Mb) were performed using Stata™ 9.2.
We tested whether combining the LD patterns from the B58C with the association patterns from NARAC provides an unbiased estimate of the effect of the DRB1*04 allele, using only neighboring tagSNPs (± 300 kb) that are common to both data sets. This application has the advantage that the DRB1*04 allele (our target variant) was measured directly in both data sets, permitting validity checks of our inferred relative risk estimate against the observed effect in the NARAC case-control study.
We define Φobs = q(Robs-1)/[1 + q(Robs-1)], where Robs is the relative risk of disease associated with each copy of a measured allele of frequency q. The expression for Φ resembles that for the population-attributable risk fraction (PARF) in epidemiological studies. In this genetic application, however, each individual carries two chromosomes, so the PARF of a variant is Φ(2-Φ). In the Appendix we show that when the tagSNP is more common than the target variant, Φobs/D' is an unbiased estimator of Φtrue, where Φtrue is the equivalent parameter at the unmeasured target locus and D' is the conventional measure of LD between the measured tagSNP and the target variant in an undiseased population. We also show (see Appendix) that, for any given target variant and study design, the variance of Φobs/D' for each tagSNP under the null hypothesis is inversely proportional to the LD measure r2 between the measured tagSNP and the unmeasured target variant in the undiseased population.
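The definitions above translate directly into code. This sketch (function names hypothetical) computes Φ for an allele and checks numerically that Φ(2-Φ) equals 1 - 1/[1 + q(R-1)]², which is the attributable fraction when genotypes are in Hardy-Weinberg proportions and each of the two chromosomes contributes a multiplicative relative risk. That per-allele multiplicative model is an assumption made for the illustration.

```python
def phi(freq: float, rel_risk: float) -> float:
    """Phi = f(R - 1) / [1 + f(R - 1)] for an allele of frequency f
    with per-copy relative risk R."""
    x = freq * (rel_risk - 1.0)
    return x / (1.0 + x)


def parf(freq: float, rel_risk: float) -> float:
    """Population-attributable risk fraction for a variant acting on both
    chromosomes, expressed through the allele-level Phi."""
    f = phi(freq, rel_risk)
    return f * (2.0 - f)


# Compare with the direct attributable-fraction formula under a
# multiplicative (per-allele) model: 1 - 1 / [1 + f(R - 1)]^2.
freq, rr = 0.2, 3.0
direct = 1.0 - 1.0 / (1.0 + freq * (rr - 1.0)) ** 2
print(phi(freq, rr), parf(freq, rr), direct)
```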
We calculated a weighted average of the values of Φinfer = Φobs/D' across all tagSNPs in the selected 600-kb region, and derived an inferred relative risk from this pooled estimate of Φ. In this paper we explore the effect of different assumptions about the generalizability of LD measures and selection of tagSNPs upon this point estimate. An empirical variance of this pooled estimate under the null hypothesis was derived from multiple random permutations of the B58C data set.
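A minimal sketch of the pooling step, assuming the per-tagSNP Φobs values (from the association study) and the D' and r2 values (from the reference set) are already aligned by SNP. Weights are taken as r2, as described in the text, and the pooled Φ is converted back to a relative risk by inverting Φ = p(R-1)/[1 + p(R-1)], which gives R = 1 + Φ/[p(1-Φ)]. The function name and the example inputs are hypothetical, and the permutation-based variance is not reproduced here.

```python
from typing import Sequence


def pooled_inference(phi_obs: Sequence[float], d_prime: Sequence[float],
                     r2: Sequence[float], p_target: float) -> tuple[float, float]:
    """Pool Phi_infer = Phi_obs / D' across tagSNPs with weights proportional
    to r^2, then convert the pooled Phi to a relative risk for a target allele
    of frequency p_target."""
    if not (len(phi_obs) == len(d_prime) == len(r2)):
        raise ValueError("phi_obs, d_prime and r2 must have the same length")
    phi_infer = [po / dp for po, dp in zip(phi_obs, d_prime)]
    total_weight = sum(r2)
    phi_pooled = sum(w * f for w, f in zip(r2, phi_infer)) / total_weight
    # Invert Phi = p(R - 1) / [1 + p(R - 1)]  =>  R = 1 + Phi / [p (1 - Phi)]
    rel_risk = 1.0 + phi_pooled / (p_target * (1.0 - phi_pooled))
    return phi_pooled, rel_risk


# Illustrative inputs for two hypothetical tagSNPs.
phi_pooled, rel_risk = pooled_inference(
    phi_obs=[0.30, 0.20], d_prime=[0.75, 0.50], r2=[0.50, 0.25], p_target=0.2)
print(f"pooled Phi = {phi_pooled:.3f}, inferred R = {rel_risk:.2f}")
```

Weighting by r2 rather than by an estimated standard error keeps the pooling step dependent only on reference-set LD, which matches the inverse-variance argument given in the Appendix.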
Individually linked HLA-DRB1 diplotypes and Illumina HumanHap550 genotypes were available for 1217 members of B58C and for 1187 controls and 799 cases from NARAC. The frequency of DRB1*04 was 21.1% in the B58C reference set, 16.5% in the NARAC controls, and 52.2% in the NARAC cases. Counting chromosomes among cases and controls, we obtained an odds ratio of rheumatoid arthritis per copy of the DRB1*04 variant of 5.54 (95% CI 4.78-6.41), corresponding to a Φ of 0.4280.
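The reported figures can be checked roughly from the published frequencies. This sketch assumes that the per-allele odds ratio stands in for the relative risk and that q is the DRB1*04 frequency in the NARAC controls; the text does not state which frequency was used. Small discrepancies from 5.54 and 0.4280 are expected because the published frequencies are rounded.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Per-allele odds ratio from allele frequencies (counting chromosomes)."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


def phi(freq: float, rel_risk: float) -> float:
    """Phi = f(R - 1) / [1 + f(R - 1)]."""
    x = freq * (rel_risk - 1.0)
    return x / (1.0 + x)


freq_controls = 0.165  # DRB1*04 frequency in NARAC controls
freq_cases = 0.522     # DRB1*04 frequency in NARAC cases

or_rounded = allelic_odds_ratio(freq_cases, freq_controls)  # close to the reported 5.54
phi_reported = phi(freq_controls, 5.54)                     # close to the reported 0.4280
print(f"OR from rounded frequencies: {or_rounded:.2f}")
print(f"Phi from reported OR: {phi_reported:.4f}")
```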
All subsequent data relate to 156 SNPs common to both data sets whose minor allele frequency (MAF) in the reference data set is greater than that of DRB1*04 but less than 44%, and for which |D| > 0.01, where D is the covariance measure of LD. These restrictions remove SNPs for which the effect allele could be inconsistent between the B58C data and the NARAC control data.
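The inclusion criteria can be expressed as a simple filter. This sketch reads both frequency thresholds as applying to the reference data set, which is one interpretation of the criteria above. The `TagSNP` record, its field names, and the SNP names in the example are hypothetical, and the ± 300 kb window is assumed to have been applied beforehand.

```python
from dataclasses import dataclass


@dataclass
class TagSNP:
    name: str
    maf: float  # minor allele frequency in the reference data set
    d: float    # covariance measure of LD with the target variant


def select_tags(tags: list[TagSNP], target_maf: float,
                maf_upper: float = 0.44, min_abs_d: float = 0.01) -> list[TagSNP]:
    """Keep tagSNPs more common than the target variant, below the MAF ceiling,
    and with non-negligible LD (|D| above the threshold)."""
    return [t for t in tags
            if target_maf < t.maf < maf_upper and abs(t.d) > min_abs_d]


candidates = [
    TagSNP("snp_a", 0.30, 0.050),  # passes all criteria
    TagSNP("snp_b", 0.15, 0.050),  # rarer than the target variant
    TagSNP("snp_c", 0.46, 0.050),  # MAF above the 44% ceiling
    TagSNP("snp_d", 0.30, 0.005),  # |D| too small
]
kept = select_tags(candidates, target_maf=0.211)  # DRB1*04 frequency in B58C
print([t.name for t in kept])
```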
[Table: Effect of different choices of tagSNP inclusion criteria on estimation of Φ. Columns include the SE of the pooled Φ and the number of tagSNPs used in the estimate.]
LD patterns in B58C corresponded closely to those in the NARAC controls. Within the NARAC data, however, the LD measures r2 and t in the region of interest differed markedly between cases and controls, reflecting the strong association between the target variant (DRB1*04) and rheumatoid arthritis. Compared with imputation approaches, the inference method has the advantage of relying only on the LD pattern among controls, although it does assume that the sensitivity parameter s generalizes from controls to cases; this assumption was supported in the NARAC data. With modification, the inference approach that we describe can be applied to continuous outcomes. It offers an alternative to imputation methods that will be particularly attractive for target variants not included in standard panels such as HapMap, and for further exploration of genome-wide sets of association measures derived from meta-analyses of a particular disease or quantitative trait.
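[Table: Haplotype frequencies and disease incidence for variants in positive LD. Target variant: e.g., HLA-DRB1*04.]

$$I_a = I_b\,R_{obs}$$

$$I_b = I_o\,\frac{R(1-s)p + (1-q) - (1-s)p}{1-q}$$

$$I_{total} = I_b\,[1 + q(R_{obs}-1)] = I_o\,[1 + p(R-1)]$$

Here I_a and I_b are the disease incidences on chromosomes with and without the measured tag allele (frequency q), I_o is the incidence on chromosomes without the target variant, R is the true relative risk per copy of the target variant (frequency p), and s is the proportion of target-variant chromosomes that also carry the tag allele.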
We show below that Φobs/D' is an unbiased estimator of Φtrue.
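The intermediate steps are not reproduced in this text. One way to reconstruct them from the incidence expressions is sketched here, assuming that I_a and I_b are the incidences on chromosomes with and without the tag allele, I_o is the incidence on chromosomes without the target variant, s is the proportion of target-variant chromosomes that also carry the tag allele, and the tag allele is more common than the target variant (q > p) with D > 0:

```latex
\begin{align*}
I_{total} &= q I_a + (1-q) I_b
  \;\Rightarrow\; I_a = I_o\,\frac{Rsp + q - sp}{q},\\
I_a - I_b &= I_o (R-1)\,p\left[\frac{s}{q} - \frac{1-s}{1-q}\right]
  = I_o\,\frac{(R-1)\,p\,(s-q)}{q(1-q)},\\
\Phi_{obs} &= \frac{q(R_{obs}-1)}{1+q(R_{obs}-1)}
  = \frac{q(I_a - I_b)}{I_{total}}
  = \frac{p(R-1)}{1+p(R-1)}\cdot\frac{s-q}{1-q}
  = \Phi_{true}\,\frac{s-q}{1-q},\\
D &= sp - pq = p(s-q), \qquad D_{max} = \min\{p(1-q),\, q(1-p)\} = p(1-q),\\
D' &= \frac{D}{D_{max}} = \frac{s-q}{1-q}
  \;\Rightarrow\; \frac{\Phi_{obs}}{D'} = \Phi_{true}.
\end{align*}
```

The condition q > p is what makes p(1-q) the smaller of the two candidates for D_max, so that D' reduces exactly to the factor (s-q)/(1-q) that attenuates Φobs.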
Inverse-variance weights proportional to the LD measure r2 are therefore appropriate because p (the minor allele frequency for the target variant) is the same for all tag SNPs and c and N are fixed by the study design.
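The variance argument is not reproduced in this text, and c and N are not defined in it. One way to recover the stated result is sketched below, assuming that N is the number of chromosomes sampled, c is the proportion of them from cases, the usual large-sample variance of an allelic log odds ratio applies under the null hypothesis (so that Φobs is approximately q ln Robs near Robs = 1), and the tagSNP satisfies q > p with D > 0:

```latex
\begin{align*}
\operatorname{Var}(\ln R_{obs}) &\approx \frac{1}{N c(1-c)\, q(1-q)}, \qquad
\operatorname{Var}(\Phi_{obs}) \approx q^2 \operatorname{Var}(\ln R_{obs}) = \frac{q}{N c(1-c)(1-q)},\\
D' &= \frac{D}{p(1-q)}, \qquad r^2 = \frac{D^2}{p(1-p)\,q(1-q)},\\
\operatorname{Var}\!\left(\frac{\Phi_{obs}}{D'}\right) &\approx \frac{q\, p^2 (1-q)^2}{N c(1-c)(1-q)\, D^2}
  = \frac{p}{N c(1-c)(1-p)\, r^2}.
\end{align*}
```

Under these assumptions the tagSNP enters the variance only through r2, so inverse-variance weights reduce to weights proportional to r2.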
List of abbreviations used
B58C: British 1958 Birth Cohort
NARAC: North American Rheumatoid Arthritis Consortium
PARF: Population-attributable risk fraction
We used genotypes from the British 1958 Birth Cohort DNA collection, funded by Medical Research Council grant G0000934 and Wellcome Trust grant 068545/Z/02. Illumina tagSNP genotypes were deposited by the Wellcome Trust Sanger Institute. HLA data were deposited by the Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge (John Todd, Helen Stevens and Neil Walker), funded by Juvenile Diabetes Research Foundation International, the Wellcome Trust, and the National Institute for Health Research Cambridge Biomedical Research Centre. The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences.
This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S7.
- Newton JL, Harney SM, Wordsworth BP, Brown MA: A review of the MHC genetics of rheumatoid arthritis. Genes Immun. 2004, 5: 151-157. 10.1038/sj.gene.6364045.
- Strachan DP, Rudnicka AR, Power C, Shepherd P, Fuller E, Davis A, Gibb I, Kumari M, Rumley A, Macfarlane GJ, Rahi J, Rodgers B, Stansfeld S: Lifecourse influences on health among British adults: effects of region of residence in childhood and adulthood. Int J Epidemiol. 2007, 36: 522-531. 10.1093/ije/dyl309.
- van Heel DA, Franke L, Hunt KA, Gwilliam R, Zhernakova A, Inouye M, Wapenaar MC, Barnardo MC, Bethel G, Holmes GK, Feighery C, Jewell D, Kelleher D, Kumar P, Travis S, Walters JR, Sanders DS, Howdle P, Swift J, Playford RJ, McLaren WM, Mearin ML, Mulder CJ, McManus R, McGinnis R, Cardon LR, Deloukas P, Wijmenga C: A genome-wide association study for celiac disease identifies risk variants in the region harboring IL2 and IL21. Nat Genet. 2007, 39: 827-829. 10.1038/ng2058.
- Nejentsev S, Howson JM, Walker NM, Szeszko J, Field SF, Stevens HE, Reynolds P, Hardy M, King E, Masters J, Hulme J, Maier LM, Smyth D, Bailey R, Cooper JD, Ribas G, Campbell RD, Clayton DG, Todd JA, Wellcome Trust Case Control Consortium: Localization of type 1 diabetes susceptibility to the MHC class I genes HLA-B and HLA-A. Nature. 2007, 450: 887-892. 10.1038/nature06406.
- Genetic information from the British 1958 Birth Cohort. [http://www.b58cgene.sgul.ac.uk]
- Jawaheer D, Seldin MF, Amos CI, Chen WV, Shigeta R, Monteiro J, Kern M, Criswell LA, Albani S, Nelson JL, Clegg DO, Pope R, Schroeder HW, Bridges SL, Pisetsky DS, Ward R, Kastner DL, Wilder RL, Pincus T, Callahan LF, Flemming D, Wener MH, Gregersen PK: A genomewide screen in multiplex rheumatoid arthritis families suggests genetic overlap with other autoimmune diseases. Am J Hum Genet. 2001, 68: 927-936. 10.1086/319518.
- de Bakker PI, McVean G, Sabeti PC, Miretti MM, Green T, Marchini J, Ke X, Monsuur AJ, Whittaker P, Delgado M, Morrison J, Richardson A, Walsh EC, Gao X, Galver L, Hart J, Hafler DA, Pericak-Vance M, Todd JA, Daly MJ, Trowsdale J, Wijmenga C, Vyse TJ, Beck S, Murray SS, Carrington M, Gregory S, Deloukas P, Rioux JD: A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC. Nat Genet. 2006, 38: 1166-1172. 10.1038/ng1885.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.