Testing for genetic association taking into account phenotypic information of relatives

Uh, Hae-Won; Wijk, Henk Jan van der; Houwing-Duistermaat, Jeanine J

doi:10.1186/1753-6561-3-S7-S123

Volume 3 Supplement 7

Genetic Analysis Workshop 16

Proceedings
Open access
Published: 15 December 2009


BMC Proceedings volume 3, Article number: S123 (2009)


Abstract

We investigated efficient case-control association analysis using family data. The outcome of interest was coronary heart disease. We employed existing and new methods that take into account the correlations among related individuals to obtain the proper type I error rates. The methods considered for autosomal single-nucleotide polymorphisms were: 1) generalized estimating equations-based methods, 2) variance-modified Cochran-Armitage (MCA) trend test incorporating kinship coefficients, and 3) genotypic modified quasi-likelihood score test. Additionally, for X-linked single-nucleotide polymorphisms we proposed a two-degrees-of-freedom test. Performance of these methods was tested using Framingham Heart Study 500 k array data.

Background

Several single-gene variants associated with coronary heart disease (CHD) were previously reported using Framingham Heart Study (FHS) 100 k array data [1]. Regression models with generalized estimating equations (GEE) [2] as well as family-based association testing using FBAT [3] were used. Neither method utilizes all of the available family information. The FBAT test statistic is based on offspring genotypes conditional on (informative) parental genotypes, whereas the GEE association test uses all individuals with genotype and phenotype data. The latter usually uses an exchangeable working correlation matrix to account for correlation within each sibship. Hence, available parental information is not optimally used.

Our aim is to use family information efficiently. In this paper we study the association between CHD and candidate genes using the binary outcome of CHD directly. The following methods were investigated: 1) a logistic regression model taking into account familial dependence of the observations using GEE, 2) the Cochran-Armitage (CA) trend test taking into account the correlations among related individuals when computing the variance, and 3) extensions of the modified quasi-likelihood score (M_QLS) test [4]. The latter methods also use phenotypic information of ungenotyped family members for an optimal weighting scheme, and can be used for sibships as well as for nuclear families. Because the first two methods are genotypic tests, we extended the allelic M_QLS test to the corresponding genotypic test (gMQLS), assuming a multiplicative model [5].

Until now, little has been reported on the performance of such test statistics for association on the X chromosome [6, 7]. Because the X chromosome represents 2.5% of the human genome for males and 5% for females, information coming from the X chromosome cannot be ignored. To identify X-linked markers for susceptibility to a disease, we investigate statistics to test for association on the X chromosome in a related sample using GEE and a sex-stratified allelic M_QLS test.

Methods

Study sample

We analyzed Problem 2 of the Genetic Analysis Workshop 16 data, using the GeneChip® Human Mapping 500 k Array Set provided by the FHS SHARe (SNP Health Association Resource) project. The large pedigrees (n = 841) were broken up into nuclear family units (n = 1,902). The data consist of 2,878 subjects in the Offspring Cohort (n = 2,555) and their parents in the Original Cohort (n = 323). A binary outcome variable was created as any event of hard CHD (n = 225). The details of data sets created and used are described in Table 1.

Table 1 Description of data used for each method


Single-nucleotide polymorphism (SNP) selection

We checked for inheritance errors. PLINK version 1.02 [8] was used for preprocessing of the data with the following inclusion thresholds: minor allele frequency ≥ 0.01, missing rate per person ≤ 0.1, missing rate per SNP ≤ 0.1, and Hardy-Weinberg equilibrium p ≥ 0.001. For chromosome 8, ignoring relatedness between subjects, we conducted allelic tests for the 22,207 preprocessed SNPs (from 27,362 in the FHS 500 k SNP resource) using PLINK. Then, 121 SNPs were selected using a threshold of allelic p-values < 0.005. For chromosome X, 8,020 SNPs (from 9,828) were tested, and using the same threshold 35 SNPs were selected.
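The per-SNP inclusion thresholds can be illustrated with a small filter. This is a minimal sketch and not the PLINK implementation: the function names and the input layout (genotype counts per SNP) are hypothetical, the Hardy-Weinberg p-value uses a one-degree-of-freedom chi-squared approximation that may differ from the test PLINK applies, and the per-person missing rate filter is not shown.

```python
import math

def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)

def hwe_chisq_p(n_aa, n_ab, n_bb):
    """Approximate HWE p-value (assumption: 1-df chi-squared goodness of fit)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no departure from HWE can be measured
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-squared distribution with 1 df
    return math.erfc(math.sqrt(x2 / 2))

def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_maf=0.01, max_missing=0.1, min_hwe_p=0.001):
    """Apply the three per-SNP thresholds quoted in the text."""
    n_called = n_aa + n_ab + n_bb
    missing_rate = n_missing / (n_called + n_missing)
    return (missing_rate <= max_missing
            and maf(n_aa, n_ab, n_bb) >= min_maf
            and hwe_chisq_p(n_aa, n_ab, n_bb) >= min_hwe_p)
```

For example, `passes_qc(360, 480, 160, 0)` describes a SNP in exact Hardy-Weinberg proportions with a minor allele frequency of 0.4 and no missing calls, so it passes all three thresholds.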

GEE-based and modified CA trend test

One merit of using pedigrees in a case-control study is that cases with affected relatives might have higher expected frequency of associated alleles than cases without affected relatives. For GEE, an exchangeable working correlation matrix was used to account for correlation within each sibship and each family. However, this correlation is prone to misspecification, and subsequent loss of efficiency may be substantial [9].

Under the null hypothesis of no association between genotype and disease, the CA trend test statistic $U^2/\mathrm{Var}(U)$ asymptotically follows a chi-squared distribution with one degree of freedom, where U is a sum of weighted differences of genotype counts between cases and controls. When subjects are biologically related, we need to account for their correlations when computing the variance of U. Slager and Schaid [10] proposed a method in which the variance and covariance terms can be calculated based on identity-by-descent sharing probabilities. We calculated the covariance using the expected identity-by-descent sharing (2 times the kinship coefficient); hence, this method is called the modified Cochran-Armitage (MCA) test.
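The idea of a kinship-corrected variance can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code or the exact estimator of Slager and Schaid: the function name `mca_trend_test` is hypothetical, genotypes are scored additively (0, 1, 2), the per-genotype variance is taken as 2p(1 − p) with p the sample allele frequency (Hardy-Weinberg proportions assumed), and the correlation between two non-inbred relatives is taken as twice their kinship coefficient.

```python
import math

def mca_trend_test(genotypes, affected, kinship):
    """Trend test with a kinship-corrected variance (illustrative sketch).

    genotypes: list of allele counts (0, 1, 2)
    affected:  list of 0/1 case indicators
    kinship:   n x n matrix of kinship coefficients (diagonal is ignored)
    Returns (U, Var(U), chi-squared statistic, p-value).
    """
    n = len(genotypes)
    mean_d = sum(affected) / n
    a = [d - mean_d for d in affected]           # centered case indicators
    u = sum(ai * g for ai, g in zip(a, genotypes))

    p = sum(genotypes) / (2 * n)                  # sample allele frequency
    var_g = 2 * p * (1 - p)                       # assumption: HWE variance

    # Quadratic form a' K a with K_ii = 1 and K_ij = 2 * kinship_ij
    quad = 0.0
    for i in range(n):
        for j in range(n):
            r = 1.0 if i == j else 2 * kinship[i][j]
            quad += a[i] * r * a[j]

    var_u = var_g * quad
    stat = u * u / var_u
    p_value = math.erfc(math.sqrt(stat / 2))      # chi-squared, 1 df
    return u, var_u, stat, p_value
```

With two affected siblings and two unaffected siblings, the positive covariance within each pair inflates Var(U), so the statistic is smaller than the one obtained by treating the four subjects as unrelated.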

M_QLS test and its extensions

Alternatively, we considered M_QLS test proposed by Thornton and McPeek [4], which is said to be more powerful and more widely applicable. It distinguishes between unaffected controls and controls of unknown phenotype (general population controls), and it also incorporates phenotypic data of relatives with missing genotypes.

Suppose we have n + m sampled individuals with phenotypic information. Let Y = (Y₁, ..., Y_n) denote the genotype data of the n individuals with non-missing genotypes, so that m individuals have missing genotypes. Let Φ be the kinship matrix of the individuals with non-missing genotypes, and Φ_{N, M} the kinship matrix between the individuals with non-missing and missing genotypes. The diagonal entries of Φ are 1, and the off-diagonal entries are 2ϕ_ij, where ϕ_ij is the kinship coefficient between the i^th and j^th individuals. A_N and A_M are the column vectors of phenotype codes for the individuals with non-missing and missing genotypes, respectively. The entry in A for the i^th individual from the j^th family is
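For a nuclear family, these matrices have a simple form, because parent-offspring and full-sibling pairs both have kinship coefficient 1/4. The sketch below builds them under the assumptions that the parents are unrelated and nobody is inbred; the function names `nuclear_family_phi` and `split_phi` and the member ordering are hypothetical.

```python
def nuclear_family_phi(n_offspring):
    """Matrix with 1 on the diagonal and 2 * kinship off the diagonal.

    Members are ordered [father, mother, child_1, ..., child_k].
    Assumes unrelated, non-inbred parents.
    """
    n = 2 + n_offspring
    phi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                phi[i][j] = 1.0
            elif i < 2 and j < 2:
                phi[i][j] = 0.0   # spouses: kinship 0
            else:
                phi[i][j] = 0.5   # parent-offspring or full sibs: 2 * 1/4
    return phi

def split_phi(phi, genotyped):
    """Split into Phi (genotyped x genotyped) and Phi_NM (genotyped x ungenotyped)."""
    idx_n = [i for i, g in enumerate(genotyped) if g]
    idx_m = [i for i, g in enumerate(genotyped) if not g]
    phi_n = [[phi[i][j] for j in idx_n] for i in idx_n]
    phi_nm = [[phi[i][j] for j in idx_m] for i in idx_n]
    return phi_n, phi_nm
```

For a family with two children in which only the father lacks genotype data, `split_phi(nuclear_family_phi(2), [False, True, True, True])` returns a 3 × 3 matrix for the genotyped members and a 3 × 1 matrix linking them to the father.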

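$$A_{ij} = \begin{cases} 1 & \text{if the individual is affected,} \\ -k/(1-k) & \text{if the individual is unaffected,} \\ 0 & \text{if the phenotype is unknown,} \end{cases} \qquad (1)$$

with 0 < k < 1 specified to be the population prevalence of the trait. Then, following [4], the statistic is given by

$$M_{QLS} = \frac{\left[\alpha^{T}\left(Y - \hat{p}\,\mathbf{1}\right)\right]^{2}}{\hat{\sigma}^{2}\,\Gamma},$$

where $\alpha = A_N + \Phi^{-1}\Phi_{N,M}A_M$, $\Gamma = \alpha^{T}(\Phi A_N + \Phi_{N,M}A_M) - (\mathbf{1}^{T}\alpha)^{2}(\mathbf{1}^{T}\Phi^{-1}\mathbf{1})^{-1}$,

$\hat{p} = (\mathbf{1}^{T}\Phi^{-1}\mathbf{1})^{-1}\mathbf{1}^{T}\Phi^{-1}Y$, $\hat{\sigma}^{2} = \tfrac{1}{2}\hat{p}(1-\hat{p})$, and $\mathbf{1}$ is the vector of ones of length n. Here $Y_i$ is one half of the number of copies of the tested allele carried by individual i, as in [4].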
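To make the ingredients of the allelic test concrete, the following sketch computes a statistic of this form in plain Python. It is not the authors' R implementation. It assumes the conventions of Thornton and McPeek [4]: Y_i is half the allele count, the allele frequency is estimated by the best linear unbiased estimator, and the variance is ½p(1 − p). All function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def phenotype_code(status, k):
    """Phenotype code: 1 affected, -k/(1-k) unaffected, 0 unknown."""
    if status == "affected":
        return 1.0
    if status == "unaffected":
        return -k / (1.0 - k)
    return 0.0

def mqls(y, a_n, phi, phi_nm=None, a_m=None):
    """Allelic statistic and 1-df p-value (sketch under the stated assumptions).

    y:      half allele counts (0, 0.5, 1) of the n genotyped individuals
    a_n:    phenotype codes of the genotyped individuals
    phi:    n x n matrix (1 on the diagonal, 2 * kinship elsewhere)
    phi_nm: n x m matrix between genotyped and ungenotyped individuals
    a_m:    phenotype codes of the m ungenotyped individuals
    """
    n = len(y)
    ones = [1.0] * n
    alpha = list(a_n)
    if a_m:
        # alpha = A_N + Phi^{-1} Phi_{N,M} A_M
        correction = solve(phi, matvec(phi_nm, a_m))
        alpha = [x + c for x, c in zip(alpha, correction)]
    phi_inv_one = solve(phi, ones)
    denom = dot(ones, phi_inv_one)             # 1' Phi^{-1} 1
    p_hat = dot(phi_inv_one, y) / denom        # uses symmetry of Phi
    sigma2 = 0.5 * p_hat * (1.0 - p_hat)
    gamma = dot(alpha, matvec(phi, alpha)) - dot(ones, alpha) ** 2 / denom
    score = dot(alpha, [yi - p_hat for yi in y])
    stat = score ** 2 / (sigma2 * gamma)
    return stat, math.erfc(math.sqrt(stat / 2))
```

Passing `phi_nm` and `a_m` adds the phenotypes of ungenotyped relatives to the weights: an affected ungenotyped relative increases the weight of the genotyped individuals related to that person.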

We extended the allelic M_QLS test to the corresponding genotypic test, gMQLS, assuming a multiplicative model and using the genotypic mean and the corresponding variance in place of their allelic counterparts.

For the X-linked SNPs, a simple allele-based test can be constructed by counting alleles, with males contributing a single allele and females two alleles. Because the assumption that the allele frequency does not vary with sex could not be met, we stratified the analysis by sex and used the allelic M_QLS test within each stratum. We then combined the two sex-specific chi-squared tests into a two-degrees-of-freedom test (xMQLS).
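One natural way to combine two one-degree-of-freedom chi-squared statistics is to sum them and refer the sum to a chi-squared distribution with two degrees of freedom. The sketch below does exactly that; it assumes the male and female statistics are independent, which is an assumption of this example rather than a statement about the authors' implementation, and the function name is hypothetical.

```python
import math

def combine_sex_strata(chisq_male, chisq_female):
    """Sum two 1-df chi-squared statistics and return (statistic, p-value).

    Assumes the two strata are independent, so the sum has 2 df under the
    null; the upper tail of a 2-df chi-squared distribution is exp(-x / 2).
    """
    stat = chisq_male + chisq_female
    return stat, math.exp(-stat / 2.0)
```

For instance, statistics of 4.0 in males and 6.0 in females give a combined statistic of 10.0 and a p-value of exp(−5).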

The analyses using new methods have been conducted using functions written by the authors in R [11].

Results

Association study for autosomal SNPs on chromosome 8

We compared the following methods: CA, MCA, GEE, and gMQLS. These tests were performed 1) using the Offspring Cohort and 2) using the Original and Offspring Cohorts as described in Table 1. Note that for gMQLS, phenotypic information of un-genotyped individuals was also incorporated. The population prevalence of CHD (k in Eq. (1)) was set to 5%. To compare type 1 error rates, the quantile-quantile plots of 0.5-percentiles (the percentage of SNPs selected) are depicted in Figure 1. The points below the diagonal indicate that the allelic tests ignoring relatedness in PLINK overestimated the association. The results are comparable for these selected SNPs.

In Table 2, the top ten ranking SNPs detected by gMQLS using nuclear families are reported. The gMQLS gave more significant results when information of parental generation was included: for example, the p-value decreased from 9.80 × 10^-5 to 1.05 × 10^-5 for RS17094201. None of the SNPs tested were found to have genome-wide significance (nominal p < 5 × 10^-8).

Table 2 p-Values of autosomal SNPs on chromosome 8 using (1) Offspring Cohort and (2) Original and Offspring Cohort


Testing association for X-linked SNPs

We performed analysis using GEE adjusted for sex and the two-degrees-of-freedom test, xMQLS. The results of the top ten ranking SNPs using xMQLS are reported in Table 3. The xMQLS gave more significant results compared with other methods (minimum p-value = 6.05 × 10^-7).

Table 3 p-Values of X-linked SNPs using PLINK allelic association test, GEE adjusted for sex assuming an additive model, and xMQLS, a two-degrees-of-freedom test


Discussion

The behavior of the GEE-based methods sometimes deviates from that of the other methods, which may be because the working correlation matrix was not specified correctly, especially for nuclear families [9]. This can be a disadvantage of the GEE-based methods for family-based genome-wide association studies.

We did not perform simulation studies regarding type 1 error rates of the new methods. However, a good performance of the allelic variants has been reported [4, 12], and it is reasonable to expect similar performance from the new tests.

The extended M_QLS tests can be used for different types of families, and also to incorporate phenotypic information of ungenotyped relatives. Therefore, a better performance can be expected by increasing the number of cases. For this, selecting families with many cases might be more efficient.

The use of an allelic test for X-linked SNPs invites the criticism that males have only half the impact of females on the analysis. Instead, Clayton [7] proposed genotype-based tests for association that treat males as homozygous females. For females, we denote genotypes 0, 1, and 2, and genotypes of males are coded as 0 and 2. Then, X-chromosome specific covariances can be used to calculate genotypic trend tests taking into account the family relationship.
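The coding that treats males as homozygous females can be written as a small helper. This is only an illustration of the coding rule described above; the function name and the "F"/"M" sex labels are hypothetical.

```python
def code_x_genotype(n_minor_alleles, sex):
    """Code an X-linked genotype, treating males as homozygous females.

    Females ("F") carry 0, 1, or 2 copies and are coded 0, 1, 2.
    Males ("M") carry 0 or 1 copy and are coded 0 or 2.
    """
    if sex == "F":
        if n_minor_alleles not in (0, 1, 2):
            raise ValueError("females carry 0, 1, or 2 copies")
        return n_minor_alleles
    if sex == "M":
        if n_minor_alleles not in (0, 1):
            raise ValueError("males carry 0 or 1 copy")
        return 2 * n_minor_alleles
    raise ValueError("sex must be 'F' or 'M'")
```

Under this coding a male carrier and a homozygous female carrier both receive the score 2, so each male contributes as much to a genotypic trend test as a homozygous female.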

The extended M_QLS methods are promising. However, these may not be computationally feasible for family-based genome-wide association study. We recommend these tests to be used in a two-stage approach.

Conclusion

Analyzing family data using all information available in a case-control association study may improve efficiency. Two different subsets of data were considered: one consisting of the Offspring Cohort, and the other of nuclear families (Original and Offspring Cohorts). To account for relatedness among individuals, we first considered the GEE-based methods. As an alternative, we proposed new methods by extending the CA trend test.

To gain efficiency, we also considered extensions of the M_QLS test. These methods utilize most of the family information, and therefore might be more efficient than the others. Using these methods, we analyzed the real FHS data. The new methods performed well compared with the GEE-based methods.

Adding family information seemed to improve the results. Although only a small number of subjects (n = 323) was added, the proportion of cases among them (20%) was relatively large compared with that in the sibling-only data (6%). Moreover, the gMQLS test might be more efficient because it incorporates all phenotypic information available, including CHD cases among un-genotyped parents.

For X-linked SNPs, similar results were obtained: the xMQLS test outperformed the GEE-based methods on these specific data. Further work should be done to evaluate the new methods.

Abbreviations

CA: Cochran-Armitage
CHD: Coronary heart disease
FHS: Framingham Heart Study
GEE: Generalized estimating equations
gMQLS: Genotypic test corresponding to the modified quasi-likelihood score
MCA: Modified Cochran-Armitage
M_QLS: Modified quasi-likelihood score
SNP: Single-nucleotide polymorphism
xMQLS: Two-degrees-of-freedom M_QLS

References

1. Larson MG, Atwood LD, Benjamin EJ, Cupples LA, D'Agostino RB, Fox CS, Govindaraju DR, Guo CY, Heard-Costa NL, Hwang SJ, Murabito JM, Newton-Cheh C, O'Donnell CJ, Seshadri S, Vasan RS, Wang TJ, Wolf PA, Levy D: Framingham Heart Study 100 K project: genome-wide associations for cardiovascular disease outcomes. BMC Med Genet. 2007, 8 (suppl 1): S5. 10.1186/1471-2350-8-S1-S5.
2. Liang KY, Zeger SL: Longitudinal data analysis using generalized linear models. Biometrika. 1986, 73: 13-22. 10.1093/biomet/73.1.13.
3. Laird NM, Horvath S, Xu X: Implementing a unified approach to family-based tests of association. Genet Epidemiol. 2000, 19 (suppl 1): S36-S42. 10.1002/1098-2272(2000)19:1+<::AID-GEPI6>3.0.CO;2-M.
4. Thornton T, McPeek MS: Case-control association testing with related individuals: a more powerful quasi-likelihood score test. Am J Hum Genet. 2007, 81: 321-337. 10.1086/519497.
5. Sasieni P: From genotypes to genes: doubling the sample size. Biometrics. 1997, 53: 1253-1261. 10.2307/2533494.
6. Zheng G, Joo J, Zhang C, Geller NL: Testing association for markers on the X chromosome. Genet Epidemiol. 2007, 31: 834-843. 10.1002/gepi.20244.
7. Clayton D: Testing for association on the X chromosome. Biostatistics. 2008, 9: 593-600. 10.1093/biostatistics/kxn007.
8. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81: 559-575. 10.1086/519795.
9. Wang YG, Carey V: Working correlation structure misspecification, estimation and covariance design: implications for generalised estimating equations performance. Biometrika. 2003, 90: 29-41. 10.1093/biomet/90.1.29.
10. Slager SL, Schaid DJ: Evaluation of candidate genes in case-control studies: a statistical method to account for related subjects. Am J Hum Genet. 2001, 68: 1457-1462. 10.1086/320608.
11. R Development Core Team: A Language and Environment for Statistical Computing. [http://www.R-project.org]
12. Bourgain C, Hoffjan S, Nicolae R, Newman D, Steiner L, Walker K, Reynolds R, Ober C, McPeek MS: Novel case-control test in a founder population identifies P-selectin as an atopy-susceptibility locus. Am J Hum Genet. 2003, 73: 612-626. 10.1086/378208.


Acknowledgements

The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. H-WU was supported by grants from IOP Genomics/SenterNovem (IGE05007).

This article has been published as part of BMC Proceedings Volume 3 Supplement 7, 2009: Genetic Analysis Workshop 16. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S7.

Author information

Authors and Affiliations

Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, PO Box 9600, Leiden, 2300RC, The Netherlands
Hae-Won Uh, Henk Jan van der Wijk & Jeanine J Houwing-Duistermaat


Corresponding author

Correspondence to Hae-Won Uh.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

H-WU performed the analyses and wrote the manuscript. H-WU and JJH-D participated in the development of the methods, and interpreted the results of the analysis. HJvdW participated in data preprocessing. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Cite this article

Uh, HW., Wijk, H.J. & Houwing-Duistermaat, J.J. Testing for genetic association taking into account phenotypic information of relatives. BMC Proc 3 (Suppl 7), S123 (2009). https://doi.org/10.1186/1753-6561-3-S7-S123
