Fast genome-wide pedigree quantitative trait loci analysis using MENDEL
© Zhou et al.; licensee BioMed Central Ltd. 2014
Published: 17 June 2014
The linkage era left a rich legacy of pedigree samples that can be used for modern genome-wide association studies (GWAS) or next-generation sequencing (NGS) studies. Family designs are naturally equipped to detect rare variants, control for population stratification, and facilitate the study of parent-of-origin effects. Unfortunately, pedigree likelihoods are notoriously hard to compute, and current software for association mapping in pedigrees is prohibitively slow in processing dense marker maps. In a recent release of the comprehensive genetic analysis software MENDEL, we implemented an ultra-fast score test for association mapping with pedigree-based GWAS or NGS study data. Our implementation (a) works for random sample data, pedigree data, or a mix of both; (b) allows for covariate adjustment, including correction for population stratification; (c) accommodates both univariate and multivariate quantitative traits; and (d) allows missing values in multivariate traits. In this paper, we assess the capabilities of MENDEL on the Genetic Analysis Workshop 18 sequencing data. For instance, when jointly testing the 4 longitudinally measured diastolic blood pressure traits, it takes MENDEL less than 51 minutes on a standard laptop computer to read, quality check, and analyze a data set with 959 individuals and 8.3 million single-nucleotide polymorphisms (SNPs). Our analysis reveals an association with one SNP in the q32.2 region of chromosome 1. MENDEL is freely available at http://www.genetics.ucla.edu/software.
Pedigree data are attractive in modern association studies because they permit control of population substructure and study of parent-of-origin effects. Related affecteds are also more likely to share the same disease-predisposing gene than unrelated affecteds. The classical variance component model has been a powerful tool for mapping quantitative trait loci in pedigrees. Polygenic effects are effectively captured by the kinship coefficient matrix as a variance component. In genome-wide association studies (GWAS), the two alleles of a single-nucleotide polymorphism (SNP) shift trait means, and the SNP can be tested as a fixed effect. However, fitting a variance component model with pedigrees is computationally challenging, especially when it has to be done for a huge number of markers. We reexamine the computational bottlenecks and implement an ultra-fast score test for the case in which pedigree structure is explicitly given. Score tests require no additional iteration under the alternative model. All that is needed is evaluation of a quadratic form combining the score vector and the expected information matrix at the maximum likelihood estimates under the null model. Fast pedigree GWAS is now implemented in our software package MENDEL for easy use by the genetics community. In this paper, we demonstrate the capabilities of MENDEL on the Genetic Analysis Workshop 18 (GAW18) sequencing data.
Quantitative trait locus (QTL) association mapping typically invokes the multivariate Gaussian distribution to model the observed trait values $y$ over a pedigree. The standard model (2, Chapter 8) collects the corresponding means into a vector $\nu$ and the corresponding covariances into a matrix $\Omega$ and represents the loglikelihood of a pedigree as
$$L = -\frac{1}{2}\ln\det\Omega - \frac{1}{2}(y-\nu)^T\Omega^{-1}(y-\nu),$$
where the covariance matrix is typically parametrized as
$$\Omega = 2\sigma_a^2\Phi + \sigma_d^2\Delta_7 + \sigma_h^2 H + \sigma_e^2 I.$$
Here the variance component $\Phi$ is the global kinship coefficient matrix capturing additive polygenic effects, and $\Delta_7$ is a condensed identity coefficient matrix capturing dominance genetic effects. The household effect matrix $H$ has entries $h_{ij} = 1$ if individuals $i$ and $j$ are in the same household and 0 otherwise. Individual environmental contributions and trait measurement errors are incorporated via the identity matrix $I$. When one tests multiple traits, the covariance matrix has to be properly augmented by matrix Kronecker products. QTL fixed effects are captured through the mean component $\nu = A\beta$ for some predictor matrix $A$ and vector of regression coefficients $\beta$.
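As a minimal sketch of the model just described, the following Python code builds a kinship coefficient matrix from an explicitly given pedigree and evaluates the Gaussian loglikelihood with only the additive and environmental variance components. It is an illustration under stated assumptions, not MENDEL's implementation: the function names `kinship_matrix` and `pedigree_loglik` are hypothetical, founders are assumed unrelated and non-inbred, individuals are assumed to be ordered so that parents precede their children, the dominance and household components are omitted, and the additive constant of the loglikelihood is dropped.

```python
import numpy as np

def kinship_matrix(parents):
    """Kinship coefficients from a pedigree (hypothetical helper).

    parents[j] is a (father, mother) pair of indices, or (None, None) for a
    founder. Assumes parents always appear before their children and that
    founders are unrelated and non-inbred.
    """
    n = len(parents)
    phi = np.zeros((n, n))
    for j, (father, mother) in enumerate(parents):
        if father is None or mother is None:
            phi[j, j] = 0.5  # founder: kinship with self is 1/2
            continue
        # Kinship of child j with every earlier individual i is the average
        # of i's kinship with the two parents of j.
        for i in range(j):
            phi[i, j] = phi[j, i] = 0.5 * (phi[i, father] + phi[i, mother])
        # Self-kinship grows with the kinship between the parents.
        phi[j, j] = 0.5 + 0.5 * phi[father, mother]
    return phi

def pedigree_loglik(y, A, beta, phi, sigma_a2, sigma_e2):
    """Gaussian loglikelihood of one pedigree, up to an additive constant.

    Covariance is 2 * sigma_a2 * phi + sigma_e2 * I (additive polygenic plus
    environmental components only); the mean is A @ beta.
    """
    n = len(y)
    omega = 2.0 * sigma_a2 * phi + sigma_e2 * np.eye(n)
    resid = y - A @ beta
    chol = np.linalg.cholesky(omega)          # omega = chol @ chol.T
    z = np.linalg.solve(chol, resid)          # whitened residual
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * logdet - 0.5 * float(z @ z)

if __name__ == "__main__":
    # Two founders and their two children (a nuclear family).
    phi = kinship_matrix([(None, None), (None, None), (0, 1), (0, 1)])
    y = np.array([1.2, 0.8, 1.1, 0.9])        # made-up trait values
    A = np.ones((4, 1))                       # intercept-only mean model
    print(pedigree_loglik(y, A, np.array([1.0]), phi, 0.5, 0.5))
```

The Cholesky factorization is used here so that the log determinant and the quadratic form come from a single decomposition of the covariance matrix; this is a common numerical choice, not one stated by the authors.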
To implement likelihood ratio testing, iterative maximum likelihood estimation must be undertaken for each and every SNP under the alternative hypothesis. This requirement is the major stumbling block impeding pedigree analysis. Score tests serve as convenient substitutes for likelihood ratio tests. A careful analysis shows that the basic elements of the score statistic can be quickly assembled. In MENDEL, SNPs with the most impressive score test p-values (top 50 by default) are further tested by the more accurate likelihood ratio method, thus achieving a good compromise of speed and power for large-scale QTL analysis.
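The idea that the null model is fitted once and each SNP then costs only a few matrix-vector products can be sketched as follows. This is a generic one-degree-of-freedom score test for a fixed effect in a Gaussian model whose covariance is held at its null estimate; it is not taken from MENDEL's source code, and the names `null_fit` and `snp_score_test` are hypothetical. The covariance matrix `omega` is assumed to be supplied already estimated under the null, and the covariate matrix `A` is assumed to have full column rank.

```python
import math
import numpy as np

def null_fit(y, A, omega):
    """One-time work under the null model (hypothetical helper).

    Whitens the trait and covariates with the Cholesky factor of omega and
    returns the factor, an orthonormal basis Q for the whitened covariates,
    and the whitened null residual.
    """
    chol = np.linalg.cholesky(omega)
    yw = np.linalg.solve(chol, y)
    Aw = np.linalg.solve(chol, A)
    Q, _ = np.linalg.qr(Aw)                   # reduced QR, Q is n x p
    rw = yw - Q @ (Q.T @ yw)                  # generalized least squares residual
    return chol, Q, rw

def snp_score_test(g, chol, Q, rw):
    """Score statistic and p-value for adding SNP dosage vector g.

    U is the score for the SNP effect at the null estimates and V is its
    variance from the expected information after adjusting for covariates.
    """
    gw = np.linalg.solve(chol, g)
    g_perp = gw - Q @ (Q.T @ gw)              # remove covariate directions
    V = float(g_perp @ g_perp)
    if V < 1e-12:                             # SNP collinear with covariates
        return 0.0, 1.0
    U = float(g_perp @ rw)
    stat = U * U / V                          # chi-square with 1 df under the null
    pval = math.erfc(math.sqrt(stat / 2.0))   # upper tail of chi-square(1)
    return stat, pval

if __name__ == "__main__":
    y = np.array([1.0, 2.0, 3.0, 4.0])        # made-up trait values
    A = np.ones((4, 1))                       # intercept only
    omega = np.eye(4)                         # placeholder null covariance
    chol, Q, rw = null_fit(y, A, omega)
    for g in (np.array([0.0, 1.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0, 2.0])):
        print(snp_score_test(g, chol, Q, rw))
```

In this sketch nothing is re-estimated per SNP: `null_fit` runs once, and each call to `snp_score_test` needs one triangular solve and two projections, which is the reason a score test scales to millions of markers where a likelihood ratio test does not.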
Our analysis is based on the genotype calls for 959 individuals (464 directly sequenced and the rest imputed) provided in the chrX-geno.csv.gz files. Simulated traits in all 200 replicates (SIMPHEN.1-200) were used for size and power studies in the first example. The second example presents results from a pedigree GWAS performed on chromosome 3 using the traits in the first simulation replicate (SIMPHEN.1). A whole genome QTL analysis for the real phenotype diastolic blood pressure (DBP) is presented in the final example.
Adjustment for environmental effects
In the linear mixed model (LMM) used to adjust each trait, i indexes individuals, j indexes the 3 time points, the β's are the fixed effects, each individual has a random intercept assumed to be normally distributed, and the errors are independent standard normal. If we stack the traits into a column, this corresponds to a variance component model with a genetic component built on the kinship coefficient matrix and an environmental component. The LMM is fitted by maximum likelihood (ML).
Summary of environmental effects for the traits systolic blood pressure (top), diastolic blood pressure (middle), and Q1 (bottom) in simulation replicate SIMPHEN.1
Parenthesized values, in source order: (2 × 10^−11), (1 × 10^−11), (7 × 10^−91), (6 × 10^−1), (4 × 10^−1), (1 × 10^−49), (4 × 10^−43), (7 × 10^−4), (8 × 10^−2), (7 × 10^−5), (1 × 10^−1), (8 × 10^−1), (9 × 10^−1), (2 × 10^−16), (7 × 10^−12), (5 × 10^−9), (3 × 10^−7), (2 × 10^−3), (1 × 10^−3), (3 × 10^−2), (3 × 10^−2), (7 × 10^−1), (7 × 10^−1), (4 × 10^−8), (1 × 10^−9), (9 × 10^−1), (9 × 10^−1)
Size and power study (using SIMPHEN.1-200)
Pedigree GWAS on chromosome 3 (using SIMPHEN.1)
Analysis of the real phenotype diastolic blood pressure
The phenotypes (SBP and DBP measured at 4 time points) are available for 1389 members from 20 extended families. The largest family contains 107 individuals; the smallest, 27. Genotypes at 8,348,674 SNPs were available on 959 of the individuals. For brevity, we only present results for the multivariate DBP trait here.
The most significant p-value found by whole-genome analysis was 1 × 10^−10.5, for a SNP in the q32.2 region of chromosome 1 at base-pair position 210,338,112. No other SNP reached genome-wide significance.
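The text does not state which genome-wide significance threshold was applied. As a rough check only, and assuming a Bonferroni correction at a family-wise level of 0.05 over all 8,348,674 SNPs (an assumption, not the authors' stated criterion), the reported top p-value can be compared with the resulting threshold. The function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests

N_SNPS = 8_348_674          # number of SNPs reported in the text
TOP_P = 10 ** -10.5         # most significant p-value reported in the text
ALPHA = 0.05                # assumed family-wise error rate

threshold = bonferroni_threshold(ALPHA, N_SNPS)
print(f"Bonferroni threshold: {threshold:.3e}")
print(f"Top p-value:          {TOP_P:.3e}")
print("Passes assumed threshold:", TOP_P < threshold)
```

Under this assumed threshold of roughly 6 × 10^−9, the reported p-value falls below the cutoff by about two orders of magnitude.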
By supplying a comprehensive, fast, and easy-to-use package for GWAS on quantitative traits in general pedigrees, we hope to encourage exploitation of family-based data sets for gene mapping. A gene mapping study should collect as large a sample as possible consistent with economic constraints and consistent trait phenotyping. If the sample includes pedigrees, all the better. Here we have argued that score tests can efficiently handle unrelated individuals, pedigrees, or a mixture. For human studies, in which controlled breeding is forbidden, nature has provided pedigrees segregating every conceivable genetic trait. Many of these pedigrees are known from previous linkage studies and should be treasured as valuable resources.
The authors gratefully acknowledge the National Institutes of Health (NIH) grants GM053275 (EMS and KL) and HG006139 (HZ, EMS, and KL) and National Science Foundation (NSF) grant DMS-1310319 (HZ). The GAW18 WGS data were provided by the T2D-GENES (Type 2 Diabetes Genetic Exploration by Next-generation sequencing in Ethnic Samples) Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. The GAW is supported by NIH grant R01 GM031575.
This article has been published as part of BMC Proceedings Volume 8 Supplement 1, 2014: Genetic Analysis Workshop 18. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S1. Publication charges for this supplement were funded by the Texas Biomedical Research Institute.
- Ott J, Kamatani Y, Lathrop M: Family-based designs for genome-wide association studies. Nat Rev Genet. 2011, 12: 465-474.
- Lange K: Mathematical and Statistical Methods for Genetic Analysis. Statistics for Biology and Health. 2002, New York, Springer-Verlag, 2nd edition.
- Lange K, Papp JC, Sinsheimer JS, Sripracha R, Zhou H, Sobel ES: Mendel: the Swiss army knife of genetic analysis programs. Bioinformatics. 2013, 29: 1568-1570. 10.1093/bioinformatics/btt187.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.