A gene-based approach for testing association of rare alleles
© Xu and George; licensee BioMed Central Ltd. 2011
Published: 29 November 2011
Rare genetic variants have been shown to contribute to susceptibility to common human diseases, and methods for detecting their association with disease are drawing much attention. In this report, we applied a gene-based approach to the 200 simulated data sets of unrelated individuals. The test can detect the association of some genes with multiple rare variants.
Genome-wide association studies (GWAS) have been promising for identifying the underlying genetic basis of complex disorders. Indeed, many disease susceptibility regions have been identified using this approach. However, there is still “missing heritability” for most common diseases [1]. Part of the reason is that the statistical tests used in traditional GWAS may not have sufficient power to detect the association of rare genetic variants because of low allele counts in a sample. Rare genetic variants have been shown to contribute to the risk of some common disorders [2, 3]. A possible approach is to combine the information at multiple rare genetic variants and test the association collectively in a gene or a pathway [4, 5]. Dering et al. [6] provide a review of the association methods that combine information from multiple genetic markers. Here, we apply a gene-based approach for testing association of rare alleles [7] to the 200 simulated data sets of unrelated individuals provided by Genetic Analysis Workshop 17. The empirical type I error rate and power are reported.
Under the null hypothesis of no association between the set of SNPs in a gene and the disease, the average number of mutations in the case and control groups should be equal, and the test statistic T_G asymptotically follows a central chi-square distribution with one degree of freedom.
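The exact form of T_G is not reproduced in this section. As a minimal sketch, assuming that each individual's rare-allele count in a gene is Poisson distributed and that the statistic is the squared difference in mean counts between cases and controls, scaled by a pooled Poisson variance, a test of this kind might be computed as follows. The function name `gene_based_test` and the pooled-variance form are illustrative assumptions, not the published formula.

```python
import math


def gene_based_test(case_counts, control_counts):
    """Return (statistic, p_value) comparing mean rare-allele counts in one gene.

    Assumption (not taken from the source): per-individual counts are Poisson,
    so the variance of a group mean is lambda / n, with lambda estimated from
    the pooled sample under the null hypothesis.
    """
    n_a, n_g = len(case_counts), len(control_counts)
    if n_a == 0 or n_g == 0:
        raise ValueError("both groups must be non-empty")

    mean_a = sum(case_counts) / n_a
    mean_g = sum(control_counts) / n_g

    # Pooled Poisson rate under the null of equal mean mutation counts.
    pooled = (sum(case_counts) + sum(control_counts)) / (n_a + n_g)
    if pooled == 0:
        # No rare alleles observed in either group: no evidence of a difference.
        return 0.0, 1.0

    variance = pooled * (1.0 / n_a + 1.0 / n_g)
    statistic = (mean_a - mean_g) ** 2 / variance

    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical per-individual counts of rare alleles within one gene.
    print(gene_based_test([1, 0, 2, 1], [0, 0, 1, 0]))
```

The chi-square tail probability is obtained from `math.erfc` because a chi-square variable with one degree of freedom is the square of a standard normal variable, which avoids any dependency beyond the standard library.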
Distribution of SNPs within gene
We identified all the SNPs within a specific gene by comparing each SNP's nucleotide position against the starting and ending positions of the gene. AHNAK has the most SNPs of any gene (231), whereas 1,191 genes have only one SNP. The mean number of SNPs per gene is 8.33, with a variance of 227.6.
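The position-matching step described above can be sketched as follows. The data layout (SNPs as `(chromosome, position)` pairs and genes as a mapping from name to `(chromosome, start, end)`) and the function names are hypothetical, chosen only for illustration. The source does not say whether the reported variance is a sample or a population variance, so the sample variance used here is an assumption.

```python
from collections import Counter
from statistics import mean, variance


def count_snps_per_gene(snps, genes):
    """Count the SNPs whose position falls inside each gene (bounds inclusive).

    snps  : iterable of (chromosome, position) pairs (hypothetical layout)
    genes : dict mapping gene name -> (chromosome, start, end)
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, pos in snps:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


def summarize(counts):
    """Return (mean, sample variance) of SNP counts across genes.

    Genes with zero SNPs are included here; drop them first if only genes
    carrying at least one SNP should be summarized. Needs at least two genes.
    """
    values = list(counts.values())
    return mean(values), variance(values)


if __name__ == "__main__":
    genes = {"G1": ("1", 100, 200), "G2": ("1", 300, 400)}
    snps = [("1", 150), ("1", 200), ("1", 350), ("2", 150)]
    counts = count_snps_per_gene(snps, genes)
    print(dict(counts), summarize(counts))
```

The nested loop is quadratic in the worst case; for a full exome, sorting gene intervals by chromosome and start position would likely be preferable, but the simple form keeps the matching rule explicit.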
Type I error
Table: Significant genes at the 10^-4 level from replicate 1. Columns: gene length (bp), number of SNPs, P-value. Only the P-value column is recoverable:
1.74 × 10^-6, 2.23 × 10^-6, 8.15 × 10^-6, 1.18 × 10^-5, 1.51 × 10^-5, 2.73 × 10^-5, 2.82 × 10^-5, 3.26 × 10^-5, 4.93 × 10^-5, 6.50 × 10^-5, 8.57 × 10^-5
Table: Number of significant tests across 200 replicates. Columns: gene length (bp), number of SNPs, number of significant tests at α = 0.01 and at α = 0.001.
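A count of significant tests across replicates translates directly into an empirical power estimate for a gene: the fraction of replicates in which its P-value falls below the chosen significance level. The helper below is a minimal sketch; the function name and the example P-values are hypothetical and are not taken from the workshop data.

```python
def empirical_power(p_values, alpha):
    """Return (number of significant replicates, empirical power).

    p_values : one P-value per simulation replicate for a single gene
    alpha    : significance level, e.g. 0.01 or 0.001
    """
    if not p_values:
        raise ValueError("at least one replicate is required")
    hits = sum(1 for p in p_values if p < alpha)
    return hits, hits / len(p_values)


if __name__ == "__main__":
    # Hypothetical P-values from four replicates (the study used 200).
    replicate_p = [0.0005, 0.02, 0.005, 0.5]
    for alpha in (0.01, 0.001):
        print(alpha, empirical_power(replicate_p, alpha))
```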
Association methods for rare genetic variants are attracting much attention in the genomic era, especially with the advance of next-generation sequencing technology. Because rare alleles appear in only a few individuals, traditional single-marker tests have low power. An alternative is to group genetic variants by gene or pathway and test the variants in each group collectively. In this report, we applied a gene-based approach to the data on unrelated individuals. The test is based on the Poisson mutation process for rare genetic variants. Our results show that the test has modest power to detect association when all the underlying genes are considered. The type I error rate seems to be well controlled on average but is inflated in some replicates, as shown in Figure 2. However, the approach used to estimate the type I error rate is not rigorous: in each replicate the estimate is based on only 3,169 “null genes” assumed to be unrelated to disease status. The estimate could therefore carry considerable Monte Carlo error. In addition, the assumption that all 3,169 genes are unrelated to disease status may not hold if there are unknown interactions between genes used in simulating the disease phenotype and some of the null genes.
It is encouraging that the test can detect the association signal at FLT1 and PIK3C2B with relatively good power. Nonetheless, the validity and power of the test depend on the assumed distribution of the susceptibility mutations. If a gene has many susceptibility mutations, all of which contribute to disease risk, we would expect a larger difference between the numbers of mutations in the case and control groups, which could translate into higher power than for genes with fewer mutations.
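The Monte Carlo error in a per-replicate type I error estimate can be gauged with a binomial standard error. The sketch below assumes that the null-gene tests within a replicate are independent, which the source does not claim, and the function names are illustrative. Under that assumption, with 3,169 null genes and a nominal level of 0.001, the standard error is about 0.00056, more than half the nominal level itself.

```python
import math


def type1_error_estimate(null_p_values, alpha):
    """Return (empirical type I error rate, binomial standard error).

    null_p_values : P-values of genes assumed unrelated to disease status
    alpha         : nominal significance level
    Assumes independent tests (an assumption not made in the source).
    """
    n = len(null_p_values)
    if n == 0:
        raise ValueError("at least one null P-value is required")
    rate = sum(1 for p in null_p_values if p < alpha) / n
    std_err = math.sqrt(rate * (1.0 - rate) / n)
    return rate, std_err


def nominal_standard_error(alpha, n_tests):
    """Standard error of the estimated rate if the true rate equals alpha."""
    return math.sqrt(alpha * (1.0 - alpha) / n_tests)


if __name__ == "__main__":
    # 3,169 null genes per replicate, as stated in the text.
    print(nominal_standard_error(0.001, 3169))
```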
The validity of using the simulated data sets also depends on the simulation model and its compatibility with the test assumptions. Because this is a group test of all the SNPs within one gene, it might not perform well for genes that have only one or two susceptibility mutations, whereas it does work well for genes with more susceptibility mutations, such as FLT1 and PIK3C2B, which have 11 and 24 susceptibility SNPs, respectively. The simulation model also assumes that all the minor alleles in the model increase disease risk, which may favor some of the collapsing methods.
The proposed gene-based association method can detect the association of some genes with multiple rare variants.
The work of HX and VG was supported by National Institutes of Health grant P01HL069999. The Genetic Analysis Workshops are supported by grant R01GM031575 from the National Institute of General Medical Sciences.
This article has been published as part of BMC Proceedings Volume 5 Supplement 9, 2011: Genetic Analysis Workshop 17. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/5?issue=S9.
- Eichler EE, Flint J, Gibson G, Kong A, Leal SM, Moore JH, Nadeau JH: Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet. 2010, 11: 446-450. 10.1038/nrg2809.
- Johnson N, Fletcher O, Palles C, Rudd M, Webb E, Sellick G, dos Santos Silva I, McCormack V, Gibson L, et al: Counting potentially functional variants in BRCA1, BRCA2, and ATM predicts breast cancer susceptibility. Hum Mol Genet. 2007, 16: 1051-1057. 10.1093/hmg/ddm050.
- Walsh T, McClellan JM, McCarthy SE, Addington AM, Pierce SB, Cooper GM, Nord AS, Kusenda M, Malhotra D, Bhandari A, et al: Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science. 2008, 320: 539-543. 10.1126/science.1155174.
- Li B, Leal SM: Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet. 2008, 83: 311-321. 10.1016/j.ajhg.2008.06.024.
- Madsen BE, Browning SR: A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009, 5: e1000384. 10.1371/journal.pgen.1000384.
- Dering C, Pugh E, Ziegler A: Statistical analysis of rare sequence variants: an overview of collapsing methods. Genet Epidemiol. 2011, X (suppl X): X-X.
- Dong H, Luo L, Hong S, Siu H, Xiao Y, Jin L, Chen R, Xiong M: Integrated analysis of mutations, miRNA, and mRNA expression in glioblastoma. BMC Syst Biol. 2010, 4: 163-182. 10.1186/1752-0509-4-163.
- Almasy LA, Dyer TD, Peralta JM, Kent JW, Charlesworth JC, Curran JE, Blangero J: Genetic Analysis Workshop 17 mini-exome simulation. BMC Proc. 2011, 5 (suppl 9): S2. 10.1186/1753-6561-5-S9-S2.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.