Admixture mapping analysis in the context of GWAS with GAW18 data
© Chen et al.; licensee BioMed Central Ltd. 2014
Published: 17 June 2014
Admixture mapping is a disease-mapping strategy for identifying disease susceptibility variants in an admixed population that results from mating between 2 historically separated populations differing in allele frequencies and disease prevalence. With the increasing availability of high-density genotyping data generated in genome-wide association studies, it is of interest to investigate how to apply admixture mapping in the context of genome-wide association studies and how to adjust for admixture in association tests. In this study, we first evaluated 3 local ancestry inference methods: LAMP, LAMP-LD, and MULTIMIX. We then performed admixture mapping analysis based on the estimated local ancestry. Finally, we performed association tests with adjustment for local ancestry.
In human genetics studies, several approaches are commonly used to identify disease risk variants, including linkage analysis, association analysis, and admixture mapping. Among these 3 methods, admixture mapping can fill a niche between family-based linkage analysis and population-based association analysis when disease-causing variants differ in frequency between ethnic groups because of drift or selection. This approach has been successfully applied in recent studies of African Americans and Mexican Americans, implicating susceptibility regions for prostate cancer and type 2 diabetes that are associated with ancestry. Of relevance for Genetic Analysis Workshop 18 (GAW18), epidemiological studies have found that rates of hypertension vary markedly across regions and ethnic groups, suggesting that admixture mapping may be a viable approach to analyzing the GAW18 Mexican American cohort. In this study, we first compared 3 local ancestry inference methods on several metrics. Using the estimated local ancestry, we then performed admixture mapping. Finally, we performed association tests with adjustment for local ancestry.
Data set, reference panels, and preprocessing
All analyses presented in this paper are based on the genotyping data of 109 unrelated individuals with blood pressure information from GAW18. Two individuals, T2DG1101320 and T2DG0800490, were excluded because of their high genotype missing rates. For association analysis of quantitative traits, we used log-transformed systolic and diastolic blood pressure measurements from each individual's second examination. For the binary trait (hypertension case or control), all individuals diagnosed with hypertension at least once were classified as cases and all others as controls. Because the purpose of our analysis was methodological, we restricted all analyses to chromosome 3.
We utilized CEU and YRI samples of release 27 of merged phases II and III of the International Haplotype Map Project (HapMap) for European and African ancestry, respectively, and the Human Genome Diversity Project (HGDP) samples from the Americas for Native American ancestry, which include 6 Colombian, 13 Karitiana, 22 Maya, 14 Pima, and 8 Surui individuals (denoted as NA). We extracted markers that were present in both the reference panels and the GAW18 subjects and then removed markers with missingness greater than 20%, resulting in a set of 37,438 single-nucleotide polymorphisms (SNPs). Inconsistency in the strand orientation was observed among data from HapMap, HGDP, and GAW18. We realigned HGDP and GAW18 to the orientation of HapMap, resulting in 7004 of the 37,438 SNPs being recoded.
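The strand realignment step above can be sketched as follows. This is a minimal, hypothetical helper and not the authors' pipeline. It assumes biallelic SNPs with alleles coded as single letters and no missing-genotype codes. The text does not say how strand-ambiguous (A/T or C/G) SNPs were handled, so here they are only flagged.

```python
# Hypothetical sketch: decide whether a SNP in HGDP or GAW18 must be recoded
# to match the HapMap strand, by comparing allele pairs.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def strand_status(target, reference):
    """Compare a SNP's allele pair in a target data set with the HapMap pair.

    Returns "ambiguous" (A/T or C/G), "same", "flip", or "mismatch".
    """
    t, r = set(target), set(reference)
    complemented = {COMPLEMENT[a] for a in t}
    if complemented == t:
        # Palindromic SNP: strand cannot be inferred from the alleles alone
        return "ambiguous"
    if t == r:
        return "same"
    if complemented == r:
        return "flip"
    return "mismatch"


def flip_genotype(genotype):
    """Complement each allele of a genotype string, e.g. 'AG' -> 'TC'."""
    return "".join(COMPLEMENT[a] for a in genotype)
```

SNPs reported as "flip" would have every genotype passed through `flip_genotype`. The count of such SNPs corresponds to the "recoded" total given above.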
Global and local ancestry estimation
We performed supervised global ancestry estimation on chromosome 3 using ADMIXTURE, with the number of ancestral populations fixed at 3. The global estimates served as a reference for assessing the performance of the local ancestry estimates. The performance of ADMIXTURE depends strongly on the number of markers used. More markers are generally needed for adequate genome-wide association study (GWAS) correction than for depicting population structure; as a rule of thumb, about 100,000 genome-wide markers are needed for GWAS correction when populations are within a continent (the context of GAW18). To fully harness the ancestral information in the markers, we used all 37,438 SNPs on chromosome 3. We therefore expect the global ancestry estimates from ADMIXTURE to be of high quality.
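To illustrate what a supervised global ancestry estimate represents, the sketch below estimates one individual's ancestry proportions by expectation-maximization. It assumes the reference allele frequencies (for example, CEU, YRI, and NA) are known and held fixed. This is a simplified stand-in: ADMIXTURE itself fits the model with a different optimization algorithm, and in supervised mode it still estimates allele frequencies from the labelled reference individuals. The function name and arguments are hypothetical.

```python
import numpy as np


def estimate_q(genotypes, freqs, n_iter=500, tol=1e-8):
    """EM estimate of ancestry proportions for one individual.

    genotypes: (m,) counts of the counted allele (0, 1, or 2); NaN marks missing.
    freqs:     (m, K) counted-allele frequencies in the K reference populations,
               treated as fixed (an assumption of this sketch).
    """
    g = np.asarray(genotypes, dtype=float)
    P = np.clip(np.asarray(freqs, dtype=float), 1e-6, 1 - 1e-6)
    keep = ~np.isnan(g)
    g, P = g[keep], P[keep]
    K = P.shape[1]
    q = np.full(K, 1.0 / K)  # start from equal proportions
    for _ in range(n_iter):
        f = P @ q  # admixed allele frequency at each SNP under the current q
        # E-step: posterior ancestry of a counted-allele copy (a) and of an
        # other-allele copy (b); each row sums to 1 over the K populations
        a = q * P / f[:, None]
        b = q * (1 - P) / (1 - f)[:, None]
        # M-step: average posterior ancestry over all 2m allele copies
        q_new = (g[:, None] * a + (2 - g)[:, None] * b).sum(axis=0) / (2 * len(g))
        converged = np.abs(q_new - q).max() < tol
        q = q_new
        if converged:
            break
    return q
```

Because each allele copy's ancestry is averaged over all markers, the precision of `q` grows with the number of informative SNPs. This matches the point above that ADMIXTURE's performance depends on marker count.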
We used LAMP, LAMP-LD, and MULTIMIX to estimate local ancestry. To apply LAMP, we first constructed an ancestry-informative marker (AIM) panel based on the fixation index (Fst), a commonly used measure of genetic differentiation between populations. To calculate Fst, we used allele frequencies for CEU and YRI from HapMap release 27, and allele frequencies for Mayan (MAY) and Pima (PMA) from the Allele FREquency Database (ALFRED). AIMs were selected so that for each SNP:
(a) allele frequencies were similar in Mayan and Pima Indians (Fst(MAY-PMA) < 0.1);
(b) allele frequencies in both Native American populations differed from those in both CEU and YRI (Fst(CEU-MAY) > 0.2, Fst(CEU-PMA) > 0.2, Fst(YRI-MAY) > 0.2, and Fst(YRI-PMA) > 0.2); and
(c) LD r² < 0.1 for each pair of selected SNPs (this step was performed automatically by LAMP).
This resulted in 522 AIMs in total. We ran LAMP in LAMPANC mode, supplying the allele frequencies of CEU, YRI, and NA, with the following configuration parameters: mixture proportions (alpha) = 0.6, 0.1, 0.3; number of generations since admixture (g) = 10; recombination rate (r) = 1e-8; fraction of overlap between adjacent windows (offset) = 0.2; and r² threshold (ldcutoff) = 0.1.
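AIM selection criteria (a) and (b) above can be sketched as a per-SNP filter. The text does not state which Fst estimator was used. As an assumption, this sketch uses the simple two-population form (H_T − H_S)/H_T computed from allele frequencies with equal population weights (Nei's G_ST). The function names and the frequency dictionary layout are hypothetical. Criterion (c), LD pruning, is left to LAMP as in the text.

```python
def fst_two_pop(p1, p2):
    """Fst for one biallelic SNP from two population allele frequencies.

    Assumed estimator: (H_T - H_S) / H_T with equal population weights.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def is_aim(freq, within_max=0.1, between_min=0.2):
    """Apply criteria (a) and (b) to one SNP.

    freq: dict of allele frequencies keyed by 'CEU', 'YRI', 'MAY', 'PMA'.
    """
    # (a) the two Native American populations must be similar
    if fst_two_pop(freq["MAY"], freq["PMA"]) >= within_max:
        return False
    # (b) each Native American population must differ from both CEU and YRI
    return all(
        fst_two_pop(freq[ref], freq[nat]) > between_min
        for ref in ("CEU", "YRI")
        for nat in ("MAY", "PMA")
    )
```

For example, a SNP with frequencies 0.1 (CEU), 0.15 (YRI), 0.9 (MAY), and 0.85 (PMA) passes both criteria.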
To apply LAMP-LD, we first phased the reference panel with the SHAPEIT software using default settings. We then ran LAMP-LD on the GAW18 samples. Because LAMP-LD uses window-based processing, we tested the effect of window size (150, 100, 75, and 50 SNPs). We applied MULTIMIX to both phased (by SHAPEIT) and unphased GAW18 samples, using the phased reference panel. For phased samples, we used the MULTIMIX_EM algorithm with the resolving step. For unphased samples, we used the MULTIMIX_MCMCgeno algorithm with misfitting probabilities set to the values estimated by MULTIMIX_EM.
Estimation of the number of generations since admixture
Given the number of ancestry segments ($A$) of each individual, we estimated the number of generations since admixture ($N$) as $N = \frac{A}{4a(1-a)L}$. Here $a$ is the admixture proportion of European ancestry, and $4a(1-a)$ is the expected number of ancestry-changing recombination events per morgan per generation in a diploid individual. $L$ is the length of the genetic map in morgans (2.217 morgans for chromosome 3).
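The estimator above can be worked through in code. This sketch assumes haplotype-level ancestry calls ordered along the chromosome. It counts segments as runs of identical calls and sums them over the two haplotypes to obtain the diploid $A$. The helper names are hypothetical.

```python
def count_segments(calls):
    """Number of contiguous ancestry segments along one haplotype.

    calls: ancestry labels at consecutive markers, e.g. [0, 0, 1, 1, 0].
    """
    if not calls:
        return 0
    # one segment to start, plus one for every change of ancestry label
    return 1 + sum(1 for prev, cur in zip(calls, calls[1:]) if cur != prev)


def generations_since_admixture(n_segments, a, length_morgans=2.217):
    """N = A / (4 a (1 - a) L); the default L is the chromosome 3 map length."""
    return n_segments / (4 * a * (1 - a) * length_morgans)


# Usage: two haplotypes of one individual, European proportion a = 0.5
hap1 = [0, 0, 1, 1, 0]  # 3 segments
hap2 = [1, 1, 1, 1, 1]  # 1 segment
A = count_segments(hap1) + count_segments(hap2)
N = generations_since_admixture(A, a=0.5)
```

With $a = 0.5$ the denominator reduces to $L$, so $N$ here is $4 / 2.217 \approx 1.8$. Methods that split the genome into many short segments therefore yield proportionally larger estimates of $N$.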
Admixture mapping analysis
We used a linear regression model similar to the model proposed by Zhu et al. for admixture mapping. Let $y_i$ be the residual trait value of individual $i$ after adjusting for age and gender. Let $x_{ij}$ be the European (or Native American) local ancestry of individual $i$ at the $j$th marker, and let $\bar{x}_i$ be the average European (or Native American) ancestry of individual $i$. We tested the null hypothesis $H_0\colon \beta_1 = 0$ in the model $y_i = \beta_0 + \beta_1 x_{ij} + \beta_2 \bar{x}_i + \varepsilon_i$. We selected SNPs whose false discovery rate (FDR)-adjusted p-value was less than 0.05.
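The admixture mapping scan above can be sketched per marker. We read the model as regressing the residual trait on local ancestry at marker $j$, with the individual's average ancestry as a covariate, fitted by ordinary least squares. Local ancestry is assumed to be coded as a dosage (0, 0.5, or 1). The text does not name its FDR procedure, so Benjamini-Hochberg is used here as an assumption. For brevity, p-values use a normal approximation to the t distribution; with about 100 individuals, a t distribution with $n - 3$ degrees of freedom would be more exact. All function names are hypothetical.

```python
import math

import numpy as np


def admixture_mapping_t(y, local_anc, mean_anc):
    """t-statistic of beta_1 in y_i = b0 + b1*x_ij + b2*xbar_i + e_i for every marker j.

    y:         (n,) residual trait values (after adjusting for age and gender)
    local_anc: (n, m) local ancestry dosage of one ancestry per marker
    mean_anc:  (n,) average ancestry of each individual (the covariate xbar_i)
    """
    y = np.asarray(y, dtype=float)
    local_anc = np.asarray(local_anc, dtype=float)
    n, m = local_anc.shape
    t = np.empty(m)
    for j in range(m):
        X = np.column_stack([np.ones(n), local_anc[:, j], mean_anc])
        xtx_inv = np.linalg.inv(X.T @ X)
        beta = xtx_inv @ (X.T @ y)
        resid = y - X @ beta
        sigma2 = (resid @ resid) / (n - X.shape[1])  # residual variance on n - 3 df
        t[j] = beta[1] / math.sqrt(sigma2 * xtx_inv[1, 1])
    return t


def normal_two_sided_p(t):
    """Two-sided p-value from a standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))


def bh_adjust(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A typical call chain is `t = admixture_mapping_t(y, local, local.mean(axis=1))`, then `bh_adjust([normal_two_sided_p(v) for v in t])`. Markers with an adjusted value below 0.05 would be selected.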
Association analysis with adjustment for local ancestry
We propose an association test with adjustment for local ancestry based on the model $y_i = \beta_0 + \beta_1 \bigl(g_{ij} - \sum_{k=1}^{3} \theta_{ijk} f_{jk}\bigr) + \varepsilon_i$. Here $y_i$ is the residual trait value of individual $i$ as defined above, and $g_{ij}$ is half the number of nonreference alleles carried by individual $i$ at the $j$th marker. $\theta_{ijk}$ is the local ancestral proportion for the $k$th ancestral population of individual $i$ at the $j$th marker, and $f_{jk}$ is the frequency of the nonreference allele at the $j$th marker in the $k$th ancestral population. We tested for association under the null hypothesis $H_0\colon \beta_1 = 0$. We selected SNPs whose FDR-adjusted p-value was less than 0.05.
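One reading of the local-ancestry-adjusted association test above is that each genotype is centered by its expectation under the individual's local ancestry. The centered genotype is then regressed on the residual trait. Here the genotype is half the nonreference allele count, and the expectation is the sum over ancestral populations of local ancestry proportion times that population's nonreference allele frequency. This sketch follows that reading as an assumption. It fits a simple one-predictor least-squares regression, and its function names are hypothetical.

```python
import numpy as np


def ancestry_centered_genotype(g, theta, f):
    """g_ij - sum_k theta_ijk * f_jk for one marker j.

    g:     (n,) half the number of nonreference alleles (0, 0.5, or 1)
    theta: (n, K) local ancestry proportions at this marker (rows sum to 1)
    f:     (K,) nonreference allele frequency in each ancestral population
    """
    return np.asarray(g, dtype=float) - np.asarray(theta, dtype=float) @ np.asarray(f, dtype=float)


def slope_t(y, x):
    """OLS t-statistic for b1 in y = b0 + b1 * x + e."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    xc, yc = x - x.mean(), y - y.mean()
    b1 = (xc @ yc) / (xc @ xc)
    resid = yc - b1 * xc
    se = np.sqrt((resid @ resid) / (n - 2) / (xc @ xc))  # standard error of b1
    return b1 / se
```

Centering removes the part of the genotype that is predictable from local ancestry alone. A significant slope therefore reflects allelic association beyond what ancestry at that locus implies.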
Results and discussion
Comparisons of local ancestry inference methods
Among the 3 methods we compared, LAMP represents traditional methods that infer local ancestry from a predefined AIM panel of several thousand SNPs across the entire genome. It does not use linkage disequilibrium (LD) information and is suitable only when SNP density is low. LAMP-LD and MULTIMIX represent methods that account for background LD and can perform multiway admixture deconvolution. LAMP-LD requires phased reference panels. MULTIMIX is more flexible with input data: it can handle both phased and unphased sample genotypes, as well as phased and unphased reference panels. Both methods use window-based processing. MULTIMIX requires an additional boundary-resolving step with a larger computational burden, whereas LAMP-LD resolves window boundaries internally and more efficiently.
Comparisons of different local ancestry inference methods.
Computing time (minutes) | Number of ancestry segments | Correlation with ADMIXTURE
20 + 5603
Comparisons of local ancestry estimates
Comparisons of local estimates from LAMP-LD and MULTIMIX.
Window size (SNPs) | SD of correlation | Mean deviation (%) | SD of deviation (%) | Diploid inconsistency (%)
Admixture mapping analysis and association test
We used local ancestry estimates from MULTIMIX, and from LAMP-LD with a window size of 100 SNPs, for both the admixture mapping analyses and the association tests. No SNP reached an FDR-adjusted p-value below 0.05 in any of the tests. This may reflect low power due to the small sample size, the absence of genomic regions on chromosome 3 that affect these traits, or both.
Overall, the admixture mapping analysis based on local ancestry was underpowered. The genomic control inflation factors for the tests on European ancestry were 0.894 using LAMP-LD estimates and 0.936 using MULTIMIX estimates. We further compared the test statistics (t) from the admixture mapping analyses based on the different local ancestry estimates. In the admixture mapping tests of diastolic blood pressure on European ancestry, 694 SNPs had a t-statistic greater than 3 with MULTIMIX estimates, compared with 68 with LAMP-LD estimates. We used permutation to assess the significance of these counts. Specifically, we permuted the trait values and refitted the regression model 3000 times, recording for each permutation the number of SNPs with a t-statistic greater than 3. Comparing the observed number of such SNPs with its permutation distribution gave a p-value of 0.027 for MULTIMIX estimates and 0.2 for LAMP-LD estimates. This indicates that inconsistent local ancestry inferences between MULTIMIX and LAMP-LD may lead to different conclusions in downstream analyses.
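The permutation procedure above can be sketched as follows. The statistic is the number of markers whose t-statistic exceeds 3. It is recomputed after each permutation of the trait values, and the observed count is compared with its permutation distribution. To keep the sketch short and vectorized, it uses a simple regression of the trait on each marker's local ancestry and omits the average-ancestry covariate. It also uses the add-one p-value convention. Both are assumptions, not details taken from the text, and the function names are hypothetical.

```python
import numpy as np


def count_large_t(y, X, threshold=3.0):
    """Number of markers (columns of X) whose simple-regression t exceeds threshold.

    Assumes no column of X is constant.
    """
    n = len(y)
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    r = (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc @ yc))  # per-marker correlation
    t = r * np.sqrt((n - 2) / (1 - r ** 2))  # t-statistic of the slope
    return int((t > threshold).sum())


def permutation_p(y, X, n_perm=3000, threshold=3.0, seed=0):
    """Observed count and its permutation p-value (add-one convention)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    observed = count_large_t(y, X, threshold)
    null = np.array(
        [count_large_t(rng.permutation(y), X, threshold) for _ in range(n_perm)]
    )
    p = (1 + (null >= observed).sum()) / (n_perm + 1)
    return observed, p
```

Permuting whole trait vectors keeps the correlation among markers intact, which a per-SNP multiple-testing correction would not account for. Because neighboring markers share local ancestry, a count such as "694 SNPs with t > 3" must be judged against this correlated null.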
The small number of individuals investigated was a limitation of the present study. For example, we assumed a 9.75% exposure probability among controls, which is the carrier frequency 1 − 0.95² for a minor allele frequency of 5% under a dominant penetrance model. We further assumed a type I error rate of 0.05/37,438 and a sample of 36 cases and 72 controls (the hypertension prevalence in the sample was 33%). Under these assumptions, the present study had 80% power only for a genotype relative risk as large as 14.4. This low statistical power clearly limits the detection of effects attributable to ancestry adjustment.
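The power arithmetic above can be approximated with a two-sided two-proportion z test comparing carrier frequencies in cases and controls. This is a generic sketch, not necessarily the tool the authors used. It treats the genotype relative risk as an odds ratio when converting the control carrier frequency into a case carrier frequency. That is only an approximation here because hypertension is common in this sample. Function names are hypothetical.

```python
from statistics import NormalDist


def exposure_prob(maf):
    """Carrier frequency under a dominant model with Hardy-Weinberg proportions."""
    return 1 - (1 - maf) ** 2


def two_proportion_power(p0, odds_ratio, n_cases, n_controls, alpha):
    """Approximate power of a two-sided two-proportion z test.

    p0: carrier frequency in controls; odds_ratio stands in for the genotype
    relative risk (an approximation for a common disease).
    """
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))  # carrier frequency in cases
    p_bar = (n_cases * p1 + n_controls * p0) / (n_cases + n_controls)
    se0 = (p_bar * (1 - p_bar) * (1 / n_cases + 1 / n_controls)) ** 0.5  # under H0
    se1 = (p1 * (1 - p1) / n_cases + p0 * (1 - p0) / n_controls) ** 0.5  # under H1
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf((abs(p1 - p0) - z * se0) / se1)


power = two_proportion_power(exposure_prob(0.05), 14.4, 36, 72, 0.05 / 37438)
```

With the inputs quoted above, this approximation returns a power close to 0.8, consistent with the figure in the text. Lowering the effect size to 5 drops the power sharply, which illustrates why smaller effects were undetectable in this sample.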
Although many methods have been proposed to infer local ancestry in recent years, most are not applicable to 3-way admixed Latino and Hispanic populations, nor can they account for background LD. LAMP-LD and MULTIMIX were recently developed to address these challenges. In this report, both methods performed better than LAMP in the context of GWAS with densely spaced markers. Using global estimates from ADMIXTURE as the standard, both methods also achieved high accuracy of ancestry estimation at the global level (greater than 95%). However, 18% of the SNPs had different ancestry inferences between the 2 methods. MULTIMIX with phased samples produced much shorter ancestry segments, which is a major cause of the discrepancy at the local level, and the statistical properties of MULTIMIX need further study. Overall, multiway admixture deconvolution at the local level remains a challenging problem.
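The two kinds of agreement discussed above can be computed with the sketch below. The first is local disagreement between methods. The second is global-level agreement with ADMIXTURE. The text does not define either metric precisely. As assumptions, disagreement is taken as the fraction of (individual, SNP) diploid calls that differ, and global agreement as the Pearson correlation between each individual's chromosome-average local ancestry and the ADMIXTURE estimate. The function names are hypothetical.

```python
import numpy as np


def discordance_rate(calls_a, calls_b):
    """Fraction of (individual, SNP) entries where two methods' diploid calls differ.

    calls_a, calls_b: (n, m) arrays of diploid ancestry calls from two methods.
    """
    a, b = np.asarray(calls_a), np.asarray(calls_b)
    return float((a != b).mean())


def global_level_correlation(local_dosage, global_est):
    """Pearson correlation between chromosome-average local ancestry and global estimates.

    local_dosage: (n, m) local ancestry dosage of one ancestry per SNP
    global_est:   (n,) global proportion of the same ancestry (e.g., from ADMIXTURE)
    """
    avg = np.asarray(local_dosage, dtype=float).mean(axis=1)
    return float(np.corrcoef(avg, np.asarray(global_est, dtype=float))[0, 1])
```

Two methods can agree almost perfectly on the second metric while disagreeing substantially on the first. Averaging over a chromosome hides differences in where segment boundaries fall, which is how high global accuracy and 18% local discordance can coexist.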
It has been shown that local ancestry at a SNP can confound the association signal, and ignoring it could lead to spurious associations. Although no significant association was detected in the present exercise, local ancestry is an important facet of population stratification, and integrating this local heterogeneity into association tests is necessary for admixed samples.
Acknowledgements and declarations
We thank the organizers of GAW18 for providing the data set. This research was supported in part by National Institutes of Health (NIH) grants R01 GM50507, R01 DA12849 and R01 DA030976, and the VA Cooperative Studies Program of the Department of Veterans Affairs. MC, CL, and XC acknowledge support from CSC Scholarship. The GAW18 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. The Genetic Analysis Workshop is supported by NIH grant R01 GM031575.
This article has been published as part of BMC Proceedings Volume 8 Supplement 1, 2014: Genetic Analysis Workshop 18. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S1. Publication charges for this supplement were funded by the Texas Biomedical Research Institute.
- Reich D, Patterson N, De Jager PL, McDonald GJ, Waliszewska A, Tandon A, Lincoln RR, DeLoa C, Fruhan SA, Cabre P, et al: A whole-genome admixture scan finds a candidate locus for multiple sclerosis susceptibility. Nat Genet. 2005, 37: 1113-1118. 10.1038/ng1646.
- Adler S, Pahl M, Abboud H, Nicholas S, Ipp E, Seldin M: Mexican-American admixture mapping analyses for diabetic nephropathy in type 2 diabetes mellitus. Semin Nephrol. 2010, 30: 141-149. 10.1016/j.semnephrol.2010.01.005.
- Alexander DH, Novembre J, Lange K: Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009, 19: 1655-1664. 10.1101/gr.094052.109.
- Baran Y, Pasaniuc B, Sankararaman S, Torgerson DG, Gignoux C, Eng C, Rodriguez-Cintron W, Chapela R, Ford JG, Avila PC, et al: Fast and accurate inference of local ancestry in Latino populations. Bioinformatics. 2012, 28: 1359-1367. 10.1093/bioinformatics/bts144.
- Churchhouse C, Marchini J: Multi-way admixture deconvolution using phased or unphased ancestral panels. Genet Epidemiol. 2013, 37: 1-12. 10.1002/gepi.21692.
- The ALlele FREquency Database. [http://alfred.med.yale.edu/alfred/AboutALFRED.asp]
- Delaneau O, Marchini J, Zagury JF: A linear complexity phasing method for thousands of genomes. Nat Methods. 2011, 9: 179-181. 10.1038/nmeth.1785.
- Zhu X, Young JH, Fox E, Keating BJ, Franceschini N, Kang S, Tayo B, Adeyemo A, Sun YV, Li Y, et al: Combined admixture mapping and association analysis identifies a novel blood pressure genetic locus on 5p13: contributions from the CARe consortium. Hum Mol Genet. 2011, 20: 2285-2295. 10.1093/hmg/ddr113.
- Wang X, Zhu X, Qin H, Cooper RS, Ewens WJ, Li C, Li M: Adjustment for local ancestry in genetic association analysis of admixed populations. Bioinformatics. 2011, 27 (5): 670-677. 10.1093/bioinformatics/btq709.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.