Volume 3 Supplement 7
Genetic Analysis Workshop 16
Accommodating population stratification in case-control association analysis: a new test and its application to genome-wide study on rheumatoid arthritis
 Yufang Zhang^{1},
 Xiangjun Xiao^{1} and
 Kai Wang^{2}
DOI: 10.1186/1753-6561-3-S7-S111
© Zhang et al; licensee BioMed Central Ltd. 2009
Published: 15 December 2009
Abstract
It is well known that conventional association tests can lead to excessive false positives when there is population stratification. We propose a new test for detecting genetic association with a case-control study design. Unlike some other methods for handling population stratification, we treat the cases as one population and the controls as another, even though each of them may be a mixture of several subpopulations. A likelihood-ratio test is used to test whether the allele frequency of a tested single-nucleotide polymorphism in the case population is the same as that in the control population. This new test is applied to the Genetic Analysis Workshop 16 Problem 1 data on rheumatoid arthritis. Compared with the Pearson chi-square genotype test, the association strength of many single-nucleotide polymorphisms is decreased while the signal at the HLA region on 6p21 is maintained.
Background
One well-known drawback of the case-control design in genetic association studies is that it may be affected by population stratification, a confounder arising from differences in ethnic ancestry. If a sample comes from a recent mixture of different ethnic subpopulations, the cases and controls may have different genetic backgrounds, and spurious association may occur. To control the effect of population stratification, genomic control [1], structured association [2], and principal components [3] are commonly used. These methods try to gather information on population structure from markers not associated with the phenotype (null markers). In this paper, we introduce a likelihood-ratio test for genetic association in the presence of population stratification. This method makes no assumptions about the number of subpopulations in cases or in controls, nor does it make use of null markers. The method is then applied to the Genetic Analysis Workshop 16 (GAW16) Problem 1 data set.
Methods
Table 1. Genotype frequencies in cases and controls
 | AA | Aa | aa
Cases | p_{12} = F_{1}p_{1} + (1 - F_{1})p_{1}^{2} | p_{11} = 2(1 - F_{1})p_{1}(1 - p_{1}) | p_{10} = F_{1}(1 - p_{1}) + (1 - F_{1})(1 - p_{1})^{2}
Controls | p_{22} = F_{2}p_{2} + (1 - F_{2})p_{2}^{2} | p_{21} = 2(1 - F_{2})p_{2}(1 - p_{2}) | p_{20} = F_{2}(1 - p_{2}) + (1 - F_{2})(1 - p_{2})^{2}
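Let p_{i} denote the frequency of allele A and F_{i} the F_{ST} coefficient in group i. The null hypothesis of no association is H_{0}: p_{1} = p_{2}. Given the observed genotype counts, the likelihood function is
L(p_{1}, p_{2}, F_{1}, F_{2}) ∝ ∏_{i=1}^{2} ∏_{j=0}^{2} p_{ij}^{n_{ij}},
in which i = 1 or 2 for cases or controls; for each marker genotype, j = 0, 1, or 2 for zero, one, or two A alleles, respectively. n_{ij} are the observed genotype counts and p_{ij} are the genotype frequencies listed in Table 1.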
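As an illustration of Table 1, the following Python sketch computes the genotype frequencies from an allele frequency p and a coefficient F, then evaluates a log-likelihood for one group. It assumes the usual multinomial form for the counts n_{ij} (constant term dropped); the function names and the example counts are hypothetical and are not taken from the authors' C++ or R implementations.

```python
import math

def genotype_freqs(p, F):
    """Genotype frequencies indexed by j = 0, 1, 2 copies of allele A (aa, Aa, AA)."""
    q = 1.0 - p
    return (F * q + (1 - F) * q * q,   # j = 0: aa
            2 * (1 - F) * p * q,       # j = 1: Aa
            F * p + (1 - F) * p * p)   # j = 2: AA

def log_lik(counts, p, F):
    """Multinomial log-likelihood (constant dropped) for one group.

    counts = (n_0, n_1, n_2); genotypes with zero count are skipped.
    """
    return sum(n * math.log(f)
               for n, f in zip(counts, genotype_freqs(p, F)) if n > 0)

# F = 0 gives Hardy-Weinberg proportions.
print(genotype_freqs(0.3, 0.0))
# F > 0 moves probability from heterozygotes to both homozygotes.
print(genotype_freqs(0.3, 0.2))
# Hypothetical counts (aa, Aa, AA) for a single group.
print(log_lik((45, 40, 15), 0.35, 0.1))
```

The three frequencies always sum to one, and the fitted frequency of allele A, p_{i2} + p_{i1}/2, equals p_{i} whatever the value of F_{i}.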
The maximization of the likelihood function L(p_{1}, p_{2}, F_{1}, F_{2}) under the alternative hypothesis is straightforward: the maximum-likelihood estimate of each genotype frequency is the observed genotype frequency in cases and controls. Under the null hypothesis, however, there is no explicit solution to the maximization problem. To maximize the log-likelihood function under H_{0}, we take its first-order partial derivatives with respect to F_{1} and F_{2} and set them to zero. Each of the two equations gives an expression for F_{1} or F_{2} in terms of the common allele frequency p. A grid search over p from 0.001 to 0.999 (step size 0.001) is then used to find the value of p that maximizes the null log-likelihood function.
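The two maximizations can be sketched in Python as follows. This is only an illustration under stated assumptions: the closed-form expressions for F_{1} and F_{2} are not reproduced in the text, so the sketch substitutes a one-dimensional ternary search over F for each grid value of p (the log-likelihood is concave in F because every genotype frequency is linear in F). It also assumes F may take any value that keeps all genotype frequencies positive. All function names and the example counts are hypothetical.

```python
import math

def log_lik(counts, p, F):
    """Log-likelihood of counts (n_0, n_1, n_2) for allele frequency p and coefficient F."""
    q = 1.0 - p
    freqs = (q * (q + F * p), 2 * (1 - F) * p * q, p * (p + F * q))
    return sum(n * math.log(f) for n, f in zip(counts, freqs) if n > 0)

def profile_over_F(counts, p, iters=40):
    """Maximize log_lik over F for fixed p by ternary search (assumed substitute
    for the closed-form solution mentioned in the text)."""
    q = 1.0 - p
    lo = -min(p / q, q / p) + 1e-9   # below this a homozygote frequency turns negative
    hi = 1.0 - 1e-9                  # above this the heterozygote frequency turns negative
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_lik(counts, p, m1) < log_lik(counts, p, m2):
            lo = m1
        else:
            hi = m2
    return log_lik(counts, p, (lo + hi) / 2)

def max_log_lik_null(cases, controls, step=0.001):
    """Grid search over the common allele frequency p from step to 1 - step."""
    grid = [k * step for k in range(1, round(1 / step))]
    return max(profile_over_F(cases, p) + profile_over_F(controls, p)
               for p in grid)

def max_log_lik_alt(cases, controls):
    """Under the alternative, fitted genotype frequencies equal observed ones."""
    total = 0.0
    for counts in (cases, controls):
        n = sum(counts)
        total += sum(c * math.log(c / n) for c in counts if c > 0)
    return total

# Hypothetical genotype counts in the order (aa, Aa, AA).
cases, controls = (10, 40, 50), (50, 40, 10)
ll_alt = max_log_lik_alt(cases, controls)
ll_null = max_log_lik_null(cases, controls)
print(ll_alt, ll_null)
```

Because cases and controls share only p under H_{0}, the null log-likelihood for a fixed p separates into one term per group, which is why each F can be profiled out independently before the grid search.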
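According to standard statistical theory, the likelihood-ratio statistic, that is, twice the difference between the maximized log-likelihoods under the alternative and the null hypotheses, asymptotically follows a chi-square distribution with 1 degree of freedom under the null hypothesis.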
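A minimal sketch of turning two maximized log-likelihoods into a p-value is given below. It relies on the identity that, for 1 degree of freedom, the chi-square upper-tail probability at x equals erfc(sqrt(x/2)). The function name and the input values are hypothetical.

```python
import math

def lrt_p_value(log_lik_alt, log_lik_null):
    """Return the likelihood-ratio statistic and its chi-square (1 df) p-value."""
    # Guard against a tiny negative value caused by numerical rounding.
    stat = max(0.0, 2.0 * (log_lik_alt - log_lik_null))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical maximized log-likelihoods, for illustration only.
stat, p_value = lrt_p_value(-188.70, -190.62)
print(stat, p_value)
```

With these inputs the statistic is 3.84, which sits essentially at the conventional 5% critical value for 1 degree of freedom.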
Results
Discussion
We proposed a test for genetic association studies in the presence of population stratification. Population stratification confounds the difference in genotype frequencies between cases and controls. Unlike some other methods such as structured association, the proposed test does not try to classify each individual. Instead, it allows for the difference in the composition of cases and controls by using two F_{ST} coefficients, one for cases and one for controls. Population genetics suggests that the F_{ST} for a natural population may be small (for instance, 0.001 or 0.01). This may be true for controls, but it need not hold for a selected sample such as cases. It is easy to construct a case sample for which the F_{ST} is 0.8 or higher. Our test provides a simple way to reduce the confounding impact of population stratification compared with the Pearson chi-square statistic.
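To illustrate how a mixed sample can have a large F_{ST}, the sketch below computes the F implied by pooling subpopulations, assuming each subpopulation is in Hardy-Weinberg equilibrium and the sample is a simple mixture with known weights (the Wahlund effect). Under these assumptions the pooled heterozygote frequency is 2(1 - F)p(1 - p), with p the pooled allele frequency. The function name, mixture weights, and allele frequencies are hypothetical.

```python
def mixture_F(weights, allele_freqs):
    """F implied by pooling Hardy-Weinberg subpopulations with the given weights."""
    # Pooled allele frequency of the mixture.
    p_bar = sum(w * p for w, p in zip(weights, allele_freqs))
    # Pooled heterozygote frequency: weighted average of 2p(1 - p).
    het = sum(w * 2 * p * (1 - p) for w, p in zip(weights, allele_freqs))
    # Solve het = 2 (1 - F) p_bar (1 - p_bar) for F.
    return 1.0 - het / (2 * p_bar * (1 - p_bar))

strong = mixture_F([0.5, 0.5], [0.05, 0.95])   # strongly differentiated mixture
weak = mixture_F([0.5, 0.5], [0.48, 0.52])     # nearly homogeneous mixture
print(strong, weak)
```

In this example an equal mixture of subpopulations with allele frequencies 0.05 and 0.95 gives F of about 0.81, whereas frequencies of 0.48 and 0.52 give about 0.0016.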
The proposed method attributes any deficiency in heterozygosity in cases or controls to population stratification. Its power to detect association can be compromised when there is no population stratification, especially when the trait is recessive [5]. Because population stratification affects not only F_{ST} but also allele frequencies in cases and controls, the proposed method cannot completely eliminate the confounding effect of population stratification. Because of the page limitation, no simulation results comparing the proposed method with Pearson's chi-square statistic are reported. One reviewer pointed out that this may make it difficult to interpret the difference between the two methods observed in the current study. In our unreported simulation studies, the proposed method is still more robust to population stratification than Pearson's chi-square statistic.
Conclusion
A method for detecting association in the presence of population stratification is proposed. Analysis of the GAW16 Problem 1 data on rheumatoid arthritis suggests that it is more robust to population stratification than Pearson's chi-square statistic. The proposed test is implemented in two computer languages, C++ and R. Both versions are available from the authors upon request.
List of abbreviations used
GAW16: Genetic Analysis Workshop 16.
Declarations
Acknowledgements
We thank Drs. Deborah Dawson, Trudy Burns, and Jian Huang, GAW16 Group 13 Editors Tony Hinrichs, PhD and Brian Suarez, PhD, and two anonymous reviewers for their valuable comments and suggestions.
This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S7.
Authors’ Affiliations
References
 1. Devlin B, Roeder K: Genomic control for association studies. Biometrics. 1999, 55: 997-1004. 10.1111/j.0006-341X.1999.00997.x.
 2. Pritchard JK: Association mapping in structured populations. Am J Hum Genet. 2000, 67: 170-181. 10.1086/302959.
 3. Price AL: Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006, 38: 904-909. 10.1038/ng1847.
 4. Wright S: Evolution and the Genetics of Populations, volume 2. The Theory of Gene Frequencies. 1969, Chicago, University of Chicago Press.
 5. Won S, Elston RC: The power of independent types of genetic information to detect association in a case-control study design. Genet Epidemiol. 2008, 32: 731-756. 10.1002/gepi.20341.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.