Volume 3 Supplement 7
Family-based genome-wide association study for simulated data of Framingham Heart Study
© Xu et al; licensee BioMed Central Ltd. 2009
Published: 15 December 2009
Genome-wide association studies (GWAS) have quickly become the norm in dissecting the genetic basis of complex diseases. Family-based association approaches have the advantage of being robust to possible hidden population structure in samples. Most of these methods were developed when only a limited number of markers was available, so their applicability and performance for GWAS need to be examined. In this report, we evaluated the properties of the family-based association method implemented by ASSOC in the S.A.G.E. package using the simulated data sets for the Framingham Heart Study, and found that ASSOC is a highly useful tool for GWAS.
Genome-wide association studies (GWAS) are gaining popularity in genetic analysis of complex traits with the development of genotyping technology at the genome level. With genetic information at millions of single-nucleotide polymorphisms (SNPs) and other genetic markers, such studies offer great opportunities and also present challenges in developing appropriate statistical analysis methods. The data distributed by Genetic Analysis Workshop 16 provides a great opportunity to examine the strengths and limitations of current statistical methods for GWAS.
Many GWAS have been reported in the literature and many more are being performed. Most of them are population-based association studies and use designs such as case-control studies. Such designs have the advantages that samples are easy to ascertain and that results have relatively high power when analyses are carried out properly. However, it has been shown that population-based approaches, such as case-control studies, can produce spurious associations in the presence of population substructure, especially in large-scale studies at the genomic level [1, 2]. An alternative approach is to use family-based association methods, such as the transmission-disequilibrium test [3], the family-based association test (FBAT) [4], and a regression method [5] implemented by ASSOC in the S.A.G.E. package. These methods are robust to population substructure and other cryptic relatedness in the samples. However, they were proposed in an era when only a few genetic markers were available and are intended mostly for candidate-gene studies. Their applicability and performance for GWAS have not been examined. The simulated data set of the Framingham Heart Study (FHS) provides both the family structure and the genotype information at genome-wide SNPs. The underlying simulation models are also provided. The purpose of this study is to evaluate the performance of the family-based association method implemented by ASSOC in analyzing GWAS.
ASSOC implements a regression method that tests for association between a continuous trait and one or more covariates, including genetic markers from extended family data, and accounts for familial correlations [5]. The program estimates the parameters of a baseline model and those of alternate models that include specific sets of covariates. A likelihood-ratio test is then performed to evaluate the significance of the covariates.
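The likelihood-ratio comparison described above can be sketched as follows. This is a minimal illustration, not ASSOC's implementation: it assumes the alternate model adds a single marker covariate (1 degree of freedom) and that the statistic follows its asymptotic chi-square distribution. The function name and the log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_baseline, loglik_alternate):
    """Return (statistic, p_value) for a 1-df likelihood-ratio test.

    Assumption: the alternate model nests the baseline model and adds
    exactly one parameter (e.g. one SNP covariate).
    """
    statistic = 2.0 * (loglik_alternate - loglik_baseline)
    # Guard against tiny negative values caused by optimizer noise.
    statistic = max(statistic, 0.0)
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical maximized log-likelihoods for the two models.
stat, p = likelihood_ratio_test(-1523.4, -1519.1)
print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
```

With more than one added covariate the degrees of freedom change, and a general chi-square survival function would be needed in place of the 1-df shortcut used here.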
Table 1. Test results of the five SNPs from the 500 k panel for major gene effects of HDL (table body not recovered; surviving entries: 3.20 × 10^-18 and 1.63 × 10^-7)
We performed association tests of all 1,639 markers on chromosome 19 in the 50 k panel with the simulated HDL data. Of these 1,639 markers, 18 gave tests that were significant at the 0.01 level. Because the HDL data were simulated using the markers in the 500 k panel and not the markers in the 50 k panel, and since the markers in the two panels are not close, we assume that the markers in the 50 k panel did not contribute to the HDL phenotype. Under this assumption, every significant result is a false positive. Our results therefore give an empirical type I error probability of 18/1,639 = 0.011, which agrees well with the nominal level of 0.01.
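The empirical type I error quoted above is simply the fraction of null markers whose tests were significant. A minimal sketch of that arithmetic follows, using the counts from the text. The binomial standard error is an addition for illustration and treats the tests as independent, which is an assumption (neighboring markers may be correlated).

```python
import math


def empirical_type1_error(n_significant, n_tests):
    """Return (rate, standard_error) for an empirical type I error rate.

    The standard error uses the binomial formula and assumes the tests
    are independent, which is only an approximation for nearby markers.
    """
    rate = n_significant / n_tests
    std_err = math.sqrt(rate * (1.0 - rate) / n_tests)
    return rate, std_err


# Counts reported in the text: 18 significant tests out of 1,639 markers.
rate, se = empirical_type1_error(18, 1639)
print(f"empirical type I error = {rate:.3f} (SE {se:.4f})")
```

The rate rounds to 0.011, and the nominal level of 0.01 lies well within one standard error of it.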
We further tested the association of the five SNPs representing the major genes with HDL (Table 1). They were all significant at the 0.05 level regardless of the mode of inheritance and heritability. The true genetic models are additive for rs8103444, rs8035006, and rs8192719, and dominant for rs10820738 and rs3200218. However, if a more stringent significance level were used, as is appropriate for genome-wide studies, only two markers, rs10820738 and rs3200218, would be significant at the 10^-6 level. Table 1 also gives the estimated effect size for the five SNPs. SNP rs10820738 has the largest effect size, which is consistent with the simulation model, in which it has the largest heritability.
Test results among polygenes affecting HDL on chromosome 19
We also applied FBAT to test the association of HDL with all 6,350 SNPs on chromosome 19. A total of 325 tests were significant at the 0.05 level (325/6,350 = 0.051). However, only two of these were true positives, both polygenes (rs10420985 and rs599458, with p-values 0.004408 and 0.028563, respectively). None of the major genes was significant with FBAT. Nonetheless, the type I error rate seems well controlled.
Family-based association methods are appealing alternatives to the population-based case-control design because they are robust to population stratification in the samples. Several such methods have been proposed. However, they were all proposed before the current genomic era. As the norm of the field moves to GWAS, the performance and applicability of these methods need to be examined for GWAS. In this report, we examined the performance of a regression-based method [5], implemented in the program ASSOC in the software package S.A.G.E., using the simulated HDL data in the Framingham Heart Study. Based on the results of the tests with the markers on chromosome 19 in the 50 k panel, we found that ASSOC gives the correct type I error rate. When applied to the markers on chromosome 19 in the 500 k panel (tests performed at the 0.05 level), the empirical type I error rate was 0.06, which is slightly inflated. The reason could be that there is linkage disequilibrium between the markers in the causal genes and nearby markers: when we estimated the type I error rate, we excluded only the markers in the causal genes with either major or polygenic effects, and not the markers in linkage disequilibrium with them. Therefore, as a general conclusion, ASSOC gives approximately the correct type I error rate, and hence is a valid test for GWAS.
In our analysis, ASSOC detected all five major genes and six of the 15 polygenes for HDL on chromosome 19. In contrast, FBAT detected only two of the 15 polygenes and none of the major genes on chromosome 19. It should be noted that the data may be too limited to give a reliable estimate of the power. However, it is encouraging to see that ASSOC could detect one of the polygenes, rs10403702, whose minor allele frequency is only 0.35%. Current association studies generally focus on common SNPs (e.g., SNPs with minor allele frequency > 5%), partly because of the common disease, common variants hypothesis [7–10] and partly because statistical power may be insufficient for rare SNPs when the sample size is limited. However, recent developments in genotyping technology allow efficient genotyping in large samples, and there is a call for shifting the paradigm of association studies to rare SNPs because this may be more effective for discovering susceptibility genes for common diseases [11].
In conclusion, the method implemented in ASSOC provides a valid association test for family-based data and is a reasonably powerful approach to apply in GWAS. However, it should be noted that it is also a rather slow method. In our analysis, it took around 10 minutes to test one marker on our Windows-based workstation with a 2.13 GHz CPU. It would take a substantial amount of time to perform the test for the millions of markers in a GWAS. Because each marker is tested separately, parallel computing would be a natural solution.
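To make the computational cost concrete, the following back-of-the-envelope sketch estimates the wall-clock time from the roughly 10 minutes per marker reported above. It assumes that per-marker tests are independent jobs that can be spread evenly over workers with no overhead. The marker count of 500,000 (suggested by the name of the 500 k panel), the worker count, and both function names are illustrative assumptions, not figures or tools from the study.

```python
def estimated_wall_clock_days(n_markers, minutes_per_marker=10.0, n_workers=1):
    """Estimate run time in days, assuming perfectly even parallel scaling."""
    total_minutes = n_markers * minutes_per_marker
    return total_minutes / n_workers / (60.0 * 24.0)


def split_into_chunks(marker_ids, n_workers):
    """Assign markers to workers round-robin so each gets a similar load."""
    return [marker_ids[i::n_workers] for i in range(n_workers)]


# Hypothetical genome-wide scan of 500,000 markers.
serial_days = estimated_wall_clock_days(500_000)
parallel_days = estimated_wall_clock_days(500_000, n_workers=1000)
print(f"serial: {serial_days:.0f} days, 1000 workers: {parallel_days:.1f} days")

# Each chunk could be written to its own marker list and run as a separate job.
chunks = split_into_chunks(["rs1", "rs2", "rs3", "rs4", "rs5"], n_workers=2)
```

Under these assumptions a serial run would take several thousand days, which is why splitting the marker list across many workers is essential at genome-wide scale.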
Using the simulated data for the Framingham Heart Study, we found that the family-based regression method of George et al. [5], implemented in ASSOC in the S.A.G.E. software, is applicable to GWAS. It provides the correct type I error rate and reasonable power. However, this method is computationally time-consuming.
List of abbreviations used
FBAT: Family-based association test
FHS: Framingham Heart Study
GWAS: Genome-wide association studies
The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences.
This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S7.
1. Xu H, Shete S: Effects of population structure on genetic association studies. BMC Genet. 2005, 6 (suppl 1): S109. 10.1186/1471-2156-6-S1-S109.
2. Marchini J, Cardon LR, Phillips MS, Donnelly P: The effects of human population structure on large genetic association studies. Nat Genet. 2004, 36: 512-517. 10.1038/ng1337.
3. Spielman RS, McGinnis RE, Ewens WJ: Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993, 52: 506-516.
4. Rabinowitz D, Laird N: A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum Hered. 2000, 50: 211-223. 10.1159/000022918.
5. George V, Tiwari HK, Zhu X, Elston RC: A test of transmission/disequilibrium for quantitative traits in pedigree data, by multiple regression. Am J Hum Genet. 1999, 65: 236-245. 10.1086/302444.
6. Barrett JC, Fry B, Maller J, Daly MJ: Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005, 21: 263-265. 10.1093/bioinformatics/bth457.
7. Risch N, Merikangas K: The future of genetic studies of complex human diseases. Science. 1996, 273: 1516-1517. 10.1126/science.273.5281.1516.
8. Muller-Myhsok B, Abel L: Genetic analysis of complex diseases. Science. 1997, 275: 1328-1329.
9. Scott WK, Pericak-Vance MA, Haines JL: Genetic analysis of complex diseases. Science. 1997, 275: 1327. 10.1126/science.275.5304.1327.
10. Long AD, Grote MN, Langley CH: Genetic analysis of complex diseases. Science. 1997, 275: 1328.
11. Gorlov IP, Gorlova OY, Sunyaev SR, Spitz MR, Amos CI: Shifting paradigm of association studies: value of rare single-nucleotide polymorphisms. Am J Hum Genet. 2008, 82: 100-112. 10.1016/j.ajhg.2007.09.006.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.