Structural equation modeling with latent variables for longitudinal blood pressure traits using general pedigrees
© The Author(s). 2016
Published: 18 October 2016
Structural equation modeling (SEM) has been used in a wide range of applied sciences including genetic analysis. The recently developed R package, strum, implements a framework for SEM for general pedigree data. We explored different SEM techniques using strum to analyze the multivariate longitudinal data and to ultimately test the association of genotypes on blood pressure traits. The quantitative blood pressure (BP) traits, systolic BP (SBP) and diastolic BP (DBP) were analyzed as the main traits of interest with age, sex, and smoking status as covariates. The single nucleotide polymorphism (SNP) genotype information from genome-wide association studies (GWAS) data was used for the test of association. The adjustment for hypertension treatment effect was done by the censored regression approach. Two different longitudinal data models, autoregressive model and latent growth curve model, were used to fit the longitudinal BP traits. The test of association for SNP was done using a novel score test within the SEM framework of strum. We found the 10 SNPs within the GWAS suggestive P value level, and among those 10, the most significant top 3 SNPs agreed in rank in both analysis models. The general SEM framework in strum is very useful to model and test for the association with massive genotype data and complex systems of multiple phenotypes with general pedigree data.
Structural equation modeling (SEM) has been used in a wide range of applied sciences as well as in genetic analysis [1, 2], particularly for longitudinal data analysis [3, 4]. SEM is a general and powerful approach to account for measurement error and causal pathways by estimating the parameters for a system of simultaneous equations [5, 6]. The R package strum was recently developed , implementing the framework for SEM for general pedigrees described in Morris et al. . It includes both fitting and simulation of a broad range of latent measurement models and structural equation models with covariates, allowing for a wide variety of models including latent growth curve models. It can handle multilevel models, polygenic random effects and linkage random effects. Traditional structural equation models and confirmatory factor analysis may also be performed.
The Genetic Analysis Workshop 19 family data set includes the longitudinal multivariate blood pressure traits. This complexity of traits in this data set provides a good opportunity to evaluate the flexibility and applicability of the strum package for modeling in family data. This paper explores the 2 different SEM techniques using strum to analyze the multivariate longitudinal data and, ultimately, to test the association of genotype to blood pressure (BP) traits, looking for a set of single-nucleotide polymorphisms (SNPs) that came up as significant in both analysis models.
We analyzed the real family data set, which consists of 1389 individuals from 20 families with 27 to 107 members. The detailed description of the data can be found in Almasy et al. . The quantitative BP traits, systolic BP (SBP) and diastolic BP (DBP) were analyzed as the main traits of interest. We included age, sex, and smoking status as covariates. For both methods, we only included the first 3 visits into our analysis as more than 80 % of data were missing for the 4th visit, and there were 10 families with completely missing data for the 4th visit. The genotype information from genome-wide association studies (GWAS) data was used, which included 472,060 SNPs in total. After removing the SNPs with no variation in the data set or with no score test results, the remaining 460,359 SNPs were tested for association with the main traits by coding additively as 0, 1, or 2 based on the minor allele count.
Before being included in any structural equation analysis, the BP trait value at each visit was adjusted for the effect of hypertension medication as done by other researchers, to reduce the bias in the estimated effect of interest and the loss in statistical power . We followed the censored regression approach of Konigorski et al. . The difference between the observed and fitted BP for the untreated individuals and the difference between the adjusted and fitted BP for the treated individuals are used as our main BP trait values (denoted as rSBP and rDBP).
Model 1: autoregressive model with measurement error
This is a first-order AR model with measurement error. In this model, the latent variable at t(n) is a function of the latent variable at t(n-1) and not any variable before, so the true underlying latent variable has an AR structure. The SNP effect is modeled directly on the latent variable in the first time point and indirectly to other time points.
Model 2: latent growth curve model with latent slope and intercept
This is a method to study growth (or change) over time. In this model, the latent variables at different time points share a common intercept with different slopes. The model includes the SNP effects directly on the intercept, so the SNP effects all time points equally. This approach models individual change process as function of latent intercept and slope factors.
Based on the original strum framework, we developed a new score test. This method is a computationally rapid test of association with many SNPs in GWAS data (manuscript in preparation). In this new score test, we first assessed the null model fit without any SNPs in the model to confirm the appropriateness of the model for the data. For each model, we ran the analysis 3 times to make sure the results were fully converged. Then, one at a time, each SNP was tested for association with the BP traits.
SNPs associated with SBP and DBP in both analysis models
AR P value
LG P value
There have been several recent genetic studies on BP traits [12, 13]. In most studies, 2 BP traits, SBP and DBP, are analyzed separately or they are summarized into 1 value. In addition, the longitudinal values are also summarized into a value. In our study, we report the SNPs associated with the latent variable for both BP traits longitudinally. Therefore, our results and the results from the association test on the summarized BP trait may not be easily comparable, and our results provide different GWAS findings. However, the differences and agreements of the results from ours and from the analysis done in each time points separately might give another interesting and useful insight into the relation between BP traits and genotypes.
The unbalanced missing rates in each time points with the longitudinal data were a limitation with this study. We were only able to include the first 3 visits into our analysis since there were 10 families with the completely missing data for the 4th visit which would have reduced the effective sample size to 10 from the original 20 families.
Similar results were found from 2 different analysis models, but there were differences in magnitude of P values for the top hits. This might be a result of the differences in the number of parameters in the models. Also, the highly significant P values for the same top 3 SNPs from both models might be a result of the low minor allele frequency (MAF). Upon further examination, we found the minor allele was not present in many families, reducing the effective sample size. The effect of MAF on type I error in family data using this approach needs to be investigated.
The initial version of the novel score test we have developed is computationally efficient enough for genome wide analysis, but its statistical properties need to be more fully evaluated. Among the results from 2 analysis models, that is, the AR model and LG model, we found the 10 SNPs within the GWAS suggestive P value level, and among those 10, the most significant top 3 SNPs agreed in rank in both analysis models. The similar results from both models provide more confidence on the results. The general SEM framework in strum is very useful to model and test for the association with massive genotype data and complex systems of multiple phenotypes with general pedigree data.
This work was supported by the National Institutes of Health grant [R01HL096811-01]. We thank the reviewers for their insightful and helpful comments.
This article has been published as part of BMC Proceedings Volume 10 Supplement 7, 2016: Genetic Analysis Workshop 19: Sequence, Blood Pressure and Expression Data. Summary articles. The full contents of the supplement are available online at http://bmcproc.biomedcentral.com/articles/supplements/volume-10-supplement-7. Publication of the proceedings of Genetic Analysis Workshop 19 was supported by National Institutes of Health grant R01 GM031575.
YES designed study, wrote the manuscript, and performed the analysis. NJM and CMS oversaw the project and revised the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Li R, Tsaih SW, Shockley K, Stylianou IM, Wergedal J, Paigen B, et al. Structural model of analysis of multiple quantitative traits. PLoS Genet. 2006;2(7):e114.View ArticlePubMedPubMed CentralGoogle Scholar
- Stein CM, Morris NJ, Nock NL. Structural equation modeling. In: Elston RC, Satagopan JM, Sun S, editors. Statistical Human Genetics. New York: Springer; 2012. p. 495–512.View ArticleGoogle Scholar
- Chavance M, Escolano S, Romon M, Basdevant A, de Lauzon-Guillain B, Charles MA. Latent variables and structural equation models for longitudinal relationship: an illustration in nutritional epidemiology. BMC Med Res Methodol. 2010;10:37.View ArticlePubMedPubMed CentralGoogle Scholar
- Rosel J, Plewis I. Longitudinal data analysis with structural equations. Methodology. 2008;4(1):37–50.View ArticleGoogle Scholar
- Bollen KA. Structural Equations with Latent Variables. New York: Wiley; 1989.View ArticleGoogle Scholar
- Muthen BO. Beyond SEM: general latent variable modeling. Behaviormetrika. 2002;29:81–117.View ArticleGoogle Scholar
- Song YE, Stein CM, Morris NJ. strum: an R package for structural modeling of latent variables for general pedigrees. BMC Genet. 2015;16:35.View ArticlePubMedPubMed CentralGoogle Scholar
- Morris NJ, Elston RC, Stein CM. A framework for structural equation models in general pedigrees. Hum Hered. 2010;70(4):278–86.View ArticlePubMedGoogle Scholar
- Blangero J, Teslovich TM, Sim X, Almeida MA, Jun G, Dyer TD, Johnson M, Peralta JM, Manning AK, Wood AR, et al. Omics squared: Human genomic, transcriptomic, and phenotypic data for Genetic Analysis Workshop 19. BMC Proc. 2015;9 Suppl 8:S2.Google Scholar
- Tobin MD, Sheehan NA, Scurrah KJ, Burton PR. Adjusting for treatment effects instudies of quantitative traits: antihypertensive therapy and systolic blood pressure. Stat Med. 2005;24(19):2911–35.View ArticlePubMedGoogle Scholar
- Konigorski S, Yilmaz YE, Bull SB. Bivariate genetic association analysis of systolic and diastolic blood pressure by copula models. BMC Proc. 2014;8 Suppl 1:S72.View ArticlePubMedPubMed CentralGoogle Scholar
- Levy D, Ehret GB, Rice K, Verwoert GC, Launer LJ, Dehghan A, et al. Genome-wide association study of blood pressure and hypertension. Nature. 2009;41(6):677–87.Google Scholar
- Ganesh SK, Chasman DI, Larson MG, Guo X, Verwoert G, Bis JC, et al. Effects of long-term average of quantitative blood pressure traits on the detection of genetic associations. Am J Hum Genet. 2014;95(1):49–65.View ArticlePubMedPubMed CentralGoogle Scholar