An integrated genome-wide association analysis on rheumatoid arthritis data
- Jun Zhang^{1},
- Xiaofeng Zhu^{2} and
- Richard S Cooper^{3}
https://doi.org/10.1186/1753-6561-1-S1-S35
© Zhang et al; licensee BioMed Central Ltd. 2007
Published: 18 December 2007
Abstract
We propose a nonparametric association analysis combining both family and unrelated case-control genotype data. Under the assumption of Hardy-Weinberg equilibrium, we formed an affected group to compare with a group of unaffecteds.
Comparison with the traditional case-control chi-square test and the transmission-disequilibrium test (TDT) shows that the new approach has noticeably improved power. All analyses were based on the simulated rheumatoid arthritis data provided by Genetic Analysis Workshop 15. For the situation of population stratification, we also suggest an approach to adjust the genotype data using principal components. However, the Genetic Analysis Workshop 15 data were simulated without population stratification. All analyses were done without knowledge of the answers.
Background
Traditional linkage analysis has achieved great success in the genetic dissection of Mendelian diseases caused by a single gene with large effect. However, it is well known that association analysis has more power than linkage analysis for complex diseases such as rheumatoid arthritis (RA) [1]. Genome-wide association studies are now widely planned and carried out, owing to biotechnical improvements and decreasing experimental costs. Traditional association study designs are either family-based or based on unrelated cases and controls. Here we demonstrate an integrated association analysis using both the family and the unrelated simulated data on RA from Genetic Analysis Workshop 15 (GAW15).
Methods
Simulated data without population stratification
And p is the estimated average allele frequency of all subjects in the data. For our final data, r = 2; s = 0; l = 140; m = 56; n = 2, and u = v = 200.
In the presence of population stratification
When population stratification is present, we suggest adjusting the genotype data using principal components before the above procedures are applied. Because the RA data were simulated without a population stratification effect, we give only a brief idea of the method here. The rationale of this approach is that across the genome there should be a consistent pattern among allele frequency differences, and that pattern is summarized by principal components to which many markers contribute. We sketch the procedure below; details may be found in Price et al. [3].

First, pick the founders from each family and all unrelated cases and controls. Denote the genotype at the $i$th locus for the $j$th individual by $g_{ij}$, $i = 1, \ldots, M$ and $j = 1, \ldots, N$. Let $u_i = \frac{1}{N}\sum_{j=1}^{N} g_{ij}$ be the sample mean for the $i$th locus, and let $X = (x_{ij})$ be the matrix normalized by subtracting $u_i$ from the $i$th row and dividing by $\sqrt{\frac{1}{2}u_i(1-u_i)}$.

Second, compute the estimated covariance matrix of all markers, $\psi_{M \times M} = \frac{1}{N-1}XX^{T}$, and list the $k$ largest eigenvalues $\lambda_1, \ldots, \lambda_k$ with corresponding eigenvectors $v_1, \ldots, v_k$. The $l$th eigenvector $v_l = (v_{l1}, \ldots, v_{lM})$ gives the $l$th principal component as $v_l \cdot g = (v_{l1}, \ldots, v_{lM}) \cdot (g_1, \ldots, g_M) = \sum_{i=1}^{M} v_{li} g_i$.

Finally, regress the genotypes on the principal components by $g_{ij,\mathrm{update}} = g_{ij} - \sum_{l=1}^{k} v_{li} \sum_{s=1}^{M} v_{ls} g_{sj}$, where $\sum_{s=1}^{M} v_{ls} g_{sj}$ is the regression coefficient for the $l$th principal component and the $j$th individual.
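The three steps above (normalize, eigen-decompose, subtract the projection) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `adjust_genotypes_by_pcs` is hypothetical, and it assumes genotypes are coded as the allele fraction 0, 0.5, or 1, a coding under which the normalizing variance $\frac{1}{2}u_i(1-u_i)$ is positive for every polymorphic marker. Monomorphic markers are assumed to have been removed beforehand.

```python
import numpy as np


def adjust_genotypes_by_pcs(G, k):
    """Remove the top-k principal components from a genotype matrix.

    G : M x N array, markers in rows and individuals in columns.
        ASSUMPTION: genotypes are coded 0, 0.5, 1 (fraction of the
        reference allele), so 0.5 * u * (1 - u) is a valid variance.
    k : number of leading principal components to remove.

    Returns (G_update, V, eigenvalues), where V is M x k with the
    selected eigenvectors as columns.
    """
    G = np.asarray(G, dtype=float)
    M, N = G.shape

    # Step 1: row means u_i and normalized matrix X.
    u = G.mean(axis=1, keepdims=True)
    var = 0.5 * u * (1.0 - u)
    if np.any(var <= 0):
        raise ValueError("monomorphic marker found; remove it first")
    X = (G - u) / np.sqrt(var)

    # Step 2: M x M covariance matrix of markers and its eigenpairs.
    psi = X @ X.T / (N - 1)
    eigvals, eigvecs = np.linalg.eigh(psi)      # ascending order
    top = np.argsort(eigvals)[::-1][:k]         # indices of k largest
    V = eigvecs[:, top]                         # M x k, unit-norm columns

    # Step 3: per-individual coefficients sum_s v_ls * g_sj (k x N),
    # then subtract their contribution from every genotype.
    coeff = V.T @ G
    G_update = G - V @ coeff
    return G_update, V, eigvals[top]
```

Because the eigenvectors are orthonormal, the updated genotypes have zero projection onto each removed component. The M x M decomposition is used here only because it mirrors the text; for thousands of markers and fewer individuals, working with the N x N matrix of individuals is usually cheaper.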
Results
The most significant SNPs out of the total 9187 markers and their test values with associated p-values before Bonferroni correction.
SNP | Location (cM) | Combined test z | p_{z} | χ^{2} | p_{χ^{2}} | Family z_{fam} | p_{fam} | TDT | p_{TDT} |
---|---|---|---|---|---|---|---|---|---|
SNP6-152 | 49.4300 | 16.10 | 0 | 92.93 | 0 | 12.31 | 0 | 7.57 | 1.87 × 10^{-14} |
SNP6-153 | 49.4606 | 24.07 | 0 | 276.53 | 0 | 16.49 | 0 | 10.52 | 0 |
SNP6-154 | 49.4662 | 23.26 | 0 | 225.80 | 0 | 16.68 | 0 | 10.06 | 0 |
SNP6-155 | 49.6216 | 10.30 | 0 | 57.68 | 3.09 × 10^{-14} | 6.60 | 2.06 × 10^{-11} | 3.88 | 5.22 × 10^{-5} |
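As a worked check of the table, the sketch below converts test statistics to p-values and computes a Bonferroni threshold for 9187 markers. It rests on assumptions not stated in the text: that the z-type statistics are referred to a one-sided standard normal tail, that the χ^{2} statistic has 1 degree of freedom, and that the family-wise level is 0.05. The helper names are hypothetical.

```python
import math

N_MARKERS = 9187   # total number of markers given in the table caption
ALPHA = 0.05       # ASSUMED family-wise significance level


def upper_tail_p(z):
    """One-sided upper-tail p-value for a standard normal statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def chi2_1df_p(x2):
    """p-value for a chi-square statistic, ASSUMING 1 degree of freedom."""
    return math.erfc(math.sqrt(x2 / 2.0))


# Bonferroni: divide the family-wise level by the number of tests.
bonferroni_threshold = ALPHA / N_MARKERS

if __name__ == "__main__":
    print(f"Bonferroni threshold: {bonferroni_threshold:.3e}")
    print(f"TDT = 7.57  -> p = {upper_tail_p(7.57):.3e}")
    print(f"TDT = 3.88  -> p = {upper_tail_p(3.88):.3e}")
    print(f"chi2 = 57.68 -> p = {chi2_1df_p(57.68):.3e}")
```

Under these assumptions the computed values agree closely with the tabulated TDT and χ^{2} p-values for SNP6-152 and SNP6-155, and the threshold is roughly 5.4 × 10^{-6}. Every tabulated p-value except the TDT value for SNP6-155 (5.22 × 10^{-5}) falls below that threshold.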
Type I error rates of different tests for all markers except those on chromosome 6.
Test | Type I error |
---|---|
Combined z test | 0.0508 |
Family z_{ fam } | 0.0507 |
Case/control χ^{2} | 0.0509 |
TDT | 0.0506 |
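An empirical type I error rate of the kind tabulated above is the fraction of markers whose p-value falls below the nominal level. The sketch below assumes the markers supplied carry no true effect, which is presumably why chromosome 6 is excluded. The function name and the example p-values are illustrative only and are not taken from the GAW15 data.

```python
def empirical_type1_error(p_values, alpha=0.05):
    """Fraction of (assumed null) markers with p-value below alpha."""
    p_values = list(p_values)
    if not p_values:
        raise ValueError("at least one p-value is required")
    rejected = sum(1 for p in p_values if p < alpha)
    return rejected / len(p_values)


if __name__ == "__main__":
    # Illustrative p-values only: two of the four fall below 0.05.
    print(empirical_type1_error([0.01, 0.20, 0.50, 0.04]))
```

For a well-calibrated test the result should be close to the nominal level, as with the values near 0.05 in the table.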
Discussion
Under the assumption of Hardy-Weinberg equilibrium, the proposed approach improves power by combining families of different structures with unrelated subjects, and it also gives a potential way to resolve the issue of population stratification. Compared with the traditional TDT, the proposed test can use all the available families and may therefore have better power, because the TDT excludes a certain proportion of families. Under the assumption of no population stratification and low disease prevalence in parents, a simpler test described by Risch and Teng [2] regards all parents from families as unaffected, with the remainder of the test being the same as ours. However, when we carried out this test on the RA data, it led to an inflated type I error rate: at the significance level α = 0.05, the type I error rate reached 0.055. On the other hand, our newly proposed test might lose power when the random mating assumption does not hold.
Recently, Epstein et al. [5] described a likelihood-based approach for combining triads and unrelated subjects, but it requires further work to combine families of different structures. Li et al. [6] also published a likelihood-based approach using a hidden Markov model of affected sibling pairs. However, these approaches cannot deal with the issue of population stratification. We have proposed a principal-component-based approach to address this, and will evaluate the performance of the stratification adjustment procedure elsewhere.
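To illustrate why the TDT discards part of the family data, the following sketch counts transmissions in parent-child trios. It is a generic textbook form of the TDT, not the implementation used for the table: it assumes genotypes coded as the number of copies of allele A (0, 1, 2), handles simple trios only, and reports the statistic on a z scale, $(b - c)/\sqrt{b + c}$, whose square is the usual McNemar-type χ^{2}. The function names are hypothetical.

```python
import math


def tdt_counts(trios):
    """Count allele-A transmissions from heterozygous parents.

    trios : iterable of (father, mother, child) genotypes, each coded
            as the number of copies of allele A (0, 1, or 2).
    Returns (b, c, uninformative): b = A transmitted, c = A not
    transmitted, uninformative = trios with no heterozygous parent.
    """
    b = c = uninformative = 0
    for father, mother, child in trios:
        n_het = (father == 1) + (mother == 1)
        if n_het == 0:
            # Homozygous parents carry no information for the TDT.
            uninformative += 1
        elif n_het == 2:
            # Both parents heterozygous: the child's A count equals
            # the number of A alleles transmitted by the two parents.
            b += child
            c += 2 - child
        else:
            homozygous = father if mother == 1 else mother
            from_het = child - homozygous // 2
            if from_het not in (0, 1):
                raise ValueError("Mendelian inconsistency in trio")
            b += from_het
            c += 1 - from_het
    return b, c, uninformative


def tdt_z(b, c):
    """TDT statistic on the z scale; its square is (b - c)^2 / (b + c)."""
    if b + c == 0:
        raise ValueError("no informative transmissions")
    return (b - c) / math.sqrt(b + c)


if __name__ == "__main__":
    trios = [(1, 0, 1), (1, 2, 2), (1, 1, 2), (0, 0, 0), (2, 2, 2), (0, 1, 0)]
    b, c, skipped = tdt_counts(trios)
    print(b, c, skipped, round(tdt_z(b, c), 4))
```

In this toy input, two of the six trios have only homozygous parents and contribute nothing to the statistic, which is the kind of data loss the integrated test is meant to avoid.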
Declarations
Acknowledgements
The authors are very grateful to the reviewers for their numerous suggestions for improving the format and content of this paper. This work was supported by a grant from the National Human Genome Research Institute (R01 HG003054) to XZ.
This article has been published as part of BMC Proceedings Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/1?issue=S1.
References
- Risch N, Merikangas K: The future of genetic studies of complex human diseases. Science. 1996, 273: 1516-1517. 10.1126/science.273.5281.1516.
- Risch N, Teng J: The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human diseases I. DNA Pooling. Genome Res. 1998, 8: 1273-1288.
- Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D: Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006, 38: 904-909. 10.1038/ng1847.
- Barrett JC, Fry B, Maller J, Daly MJ: Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005, 21: 263-265. 10.1093/bioinformatics/bth457.
- Epstein MP, Veal CD, Trembath RC, Barker JN, Li C, Satten GA: Genetic association analysis using data from triads and unrelated subjects. Am J Hum Genet. 2005, 76: 592-608. 10.1086/429225.
- Li M, Boehnke M, Abecasis G: Efficient study for test of genetic association analysis using sibship data and unrelated cases and controls. Am J Hum Genet. 2006, 78: 778-792. 10.1086/503711.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.