
# Multipoint association mapping for longitudinal family data: an application to hypertension phenotypes

*BMC Proceedings*
**volume 10**, Article number: 54 (2016)

## Abstract

It is essential to develop adequate statistical methods to fully utilize information from longitudinal family studies. We extend our previous multipoint linkage disequilibrium approach—simultaneously accounting for correlations between markers and repeat measurements within subjects, and the correlations between subjects in families—to detect loci relevant to disease through gene-based analysis. Estimates of disease loci and their genetic effects along with their 95 % confidence intervals (or significance levels) are reported. Four different phenotypes—ever having hypertension at 4 visits, incidence of hypertension, hypertension status at baseline only, and hypertension status at 4 visits—are studied using the proposed approach. The efficiency of estimates of disease locus positions (inverse of standard error) improves when using the phenotypes from 4 visits rather than using baseline only.

## Background

Approaches for analyzing longitudinal family data have been categorized into 2 groups [1]: (a) first summarizing repeated measurements into 1 statistic (eg, a mean or slope per subject) and then using the summarized statistic as a standard outcome for genetic analysis; or (b) simultaneous modeling of genetic and longitudinal parameters. In general, joint modeling is appealing because (a) all parameter estimates are mutually adjusted, and (b) within- and between-individual variability at the levels of gene markers, repeat measurements, and family characteristics are correctly accounted for [1].

The semiparametric linkage disequilibrium mapping for the hybrid family design we developed previously [2] uses all markers simultaneously to localize the disease locus without making an assumption about genetic mechanism, except that only 1 disease gene lies in the region under study. The advantages of this approach are (a) it does not require the specification of an underlying genetic model, so estimating the position of a disease locus and its standard error is robust to a wide variety of genetic mechanisms; (b) it provides estimates of disease locus positions, along with a confidence interval for further fine mapping; and (c) it uses linkage disequilibrium between markers to localize the disease locus, which may not have been typed. We extended this approach to map susceptibility genes using longitudinal nuclear family data with an application to hypertension. Four different outcomes were used based on the proposed method: (I) ever having hypertension (“Ever”), (II) incidence event with status changed from unaffected to affected (“Progression”), (III) first available visit as baseline only (“Baseline”), and (IV) all available time points (“Longitudinal”). We compared the estimates of the disease locus positions, their standard errors, the genetic effect estimate at the disease loci, and their significance for the 4 phenotypes to examine the efficiency gained from using repeated longitudinal phenotypes.

## Methods

### Genome-wide genotypes and phenotype data

Association mapping was conducted on chromosome 3 of the genome-wide association study (GWAS) data. A total of 65,519 single-nucleotide polymorphisms (SNPs) included in 1095 genes were genotyped on chromosome 3 for 959 individuals from 20 original pedigrees in Genetic Analysis Workshop 19 (GAW19). Of these individuals, there were 178 (38 %) affected offspring out of 469 offspring for phenotype (I) “Ever”; 130 (31 %) out of 421 offspring for phenotype (II) “Progression”; 64 (11 %) out of 600 offspring for phenotype (III) “Baseline”; and 60 (11 %) out of 565 offspring to approximately 85 (45 %) out of 189 offspring across the 4 visits (or 87 [21.63 %] out of 402 offspring on average) for phenotype (IV) “Longitudinal” (Table 1). To compare phenotypes (I) and (II), only individuals with at least 2 measurements were included in the “Ever” phenotype. PedCut [3] was used to split large pedigrees with more than 20 members into nuclear pedigrees. Consequently, we analyzed a total of 138 pedigrees with 1,495 individuals (the IDs for missing parents were added to form trios). In divided pedigrees, the nuclear families contained between 3 and 25 individuals. Five SNPs were removed because they failed the test of Hardy-Weinberg equilibrium (HWE) (*p* value < 10^{−4}). The HWE test was performed using PLINK 1.07 [4] based on 56 unrelated subjects. (For information on PLINK, see http://pngu.mgh.harvard.edu/purcell/plink/.) A total of 22,056 genotypes from various SNPs with genotyping errors (the genotyping error rate was around 3.51 × 10^{−4}) were further excluded by the MERLIN 1.1.2 computing package (see http://www.sph.umich.edu/csg/abecasis/merlin/tour/linkage.html). No covariates were adjusted for in this approach.
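The HWE filter described above can be illustrated with a minimal sketch. It uses a 1-degree-of-freedom chi-square goodness-of-fit test as a simplified stand-in; the test that PLINK actually applies may differ (for example, an exact test), and the function names and genotype counts below are hypothetical, chosen only so that they sum to the 56 unrelated subjects mentioned in the text.

```python
import math

HWE_THRESHOLD = 1e-4  # SNPs with p value below this are removed (from the text)


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) p value for departure from Hardy-Weinberg proportions.

    Assumption: a simple goodness-of-fit test, not necessarily PLINK's test.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_hwe(n_aa: int, n_ab: int, n_bb: int,
               threshold: float = HWE_THRESHOLD) -> bool:
    """True if the SNP is retained under the HWE filter."""
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= threshold


# Hypothetical genotype counts for 56 unrelated subjects.
in_equilibrium = passes_hwe(14, 28, 14)   # counts match HWE expectations
no_heterozygotes = passes_hwe(28, 0, 28)  # strong heterozygote deficit
```

With only 56 subjects the chi-square approximation is rough for rare alleles, which is one reason an exact test is often preferred in practice.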

### Multipoint linkage disequilibrium mapping

Suppose *M* markers were genotyped in the region *R* at locations \( 0\le {t}_1<{t}_2<\dots <{t}_M\le T \). We assume there are 2 alleles per marker. With *H*(*t*) being the target allele at marker position *t*, and *h*(*t*) being the nontarget allele, we define

\( \begin{array}{l}{Y}_1^{D_{k_{il}}}(t)=\left\{\begin{array}{l}1\kern0.36em \mathrm{if}\;\mathrm{the}\;\mathrm{transmitted}\;\mathrm{paternal}\;\mathrm{allele}\;\mathrm{at}\;t\;\mathrm{is}\;H\;(t)\\ {}0\kern0.24em \mathrm{if}\;\mathrm{the}\;\mathrm{transmitted}\;\mathrm{paternal}\;\mathrm{allele}\;\mathrm{at}\;t\;\mathrm{is}\;h\;(t)\end{array}\right.,\;\\ {}{Y}_2^{D_{k_{il}}}(t)=\left\{\begin{array}{l}1\kern0.24em \mathrm{if}\;\mathrm{the}\;\mathrm{nontransmitted}\;\mathrm{paternal}\;\mathrm{allele}\;\mathrm{at}\;t\ \mathrm{is}\;H(t)\\ {}0\kern0.24em \mathrm{if}\;\mathrm{the}\;\mathrm{nontransmitted}\;\mathrm{paternal}\;\mathrm{allele}\;\mathrm{at}\;t\ \mathrm{is}\;h(t)\end{array}\right.,\end{array} \) for the affected offspring \( {D}_{k_{il}} \),

and \( \begin{array}{l}{Y}_1^{{\overline{D}}_{k_{il}}}(t)=\left\{\begin{array}{l}-1\kern0.36em \mathrm{if}\;\mathrm{the}\;\mathrm{transmitted}\;\mathrm{paternal}\;\mathrm{allele}\;\mathrm{at}\;t\;\mathrm{is}\;H(t)\\ {}0\kern0.24em \mathrm{if}\;\mathrm{the}\;\mathrm{transmitted}\;\mathrm{paternal}\;\mathrm{allele}\;\mathrm{at}\;t\;\mathrm{is}\;h(t)\end{array}\right.,\;\\ {}{Y}_2^{{\overline{D}}_{k_{il}}}(t)=\left\{\begin{array}{l}-1\kern0.24em \mathrm{if}\;\mathrm{the}\;\mathrm{nontransmitted}\;\mathrm{paternal}\;\mathrm{allele}\;\mathrm{at}\;t\;\mathrm{is}\;H(t)\\ {}0\kern0.24em \mathrm{if}\;\mathrm{the}\;\mathrm{nontransmitted}\;\mathrm{paternal}\;\mathrm{allele}\;\mathrm{at}\;t\;\mathrm{is}\;h(t)\end{array}\right.,\end{array} \) for the unaffected offspring \( {\overline{D}}_{k_{il}} \), with the maternal counterparts \( {X}_1 \) and \( {X}_2 \) defined analogously. Then, we define the preferential transmission statistic \( {Y}_{T_{k_{il}}}(t)={Y}_1^{D_{k_{il}}}(t)-{Y}_2^{D_{k_{il}}}(t) \) for the paternal side and \( {X}_{T_{k_{il}}}(t)={X}_1^{D_{k_{il}}}(t)-{X}_2^{D_{k_{il}}}(t) \) for the maternal side for an affected trio; similarly, we define the preferential transmission statistics \( {Y}_{U_{k_{il}}}(t)={Y}_1^{{\overline{D}}_{k_{il}}}(t)-{Y}_2^{{\overline{D}}_{k_{il}}}(t) \) and \( {X}_{U_{k_{il}}}(t)={X}_1^{{\overline{D}}_{k_{il}}}(t)-{X}_2^{{\overline{D}}_{k_{il}}}(t) \) for an unaffected trio for the paternal and maternal sides, respectively, where \( {k}_{il}=1,\dots, {N}_{1il} \) for affected offspring and \( {k}_{il}=1,\dots, {N}_{2il} \) for unaffected offspring, \( {N}_{1il} \) (\( {N}_{2il} \)) is the number of affected (unaffected) offspring in family *i* at the *l*th time point, *i* = 1, …, *n*, and *l* = 1, …, *L* (*L* = 1 or 4 in this study).
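As an illustration of the definitions above, the following minimal sketch computes the preferential transmission statistic for one parent of one offspring at a single marker. The function names and the single-character allele coding are hypothetical; the sign convention (+1 for affected, −1 for unaffected offspring when the allele is the target allele) follows the indicator definitions given in the text.

```python
def transmission_indicator(allele: str, target: str, affected: bool) -> int:
    """Indicator for one parental allele (Y_1, Y_2, X_1 or X_2 in the text).

    Returns +1 (affected offspring) or -1 (unaffected offspring) when the
    allele is the target allele H(t), and 0 when it is the nontarget allele.
    """
    if allele != target:
        return 0
    return 1 if affected else -1


def preferential_transmission(transmitted: str, nontransmitted: str,
                              target: str, affected: bool) -> int:
    """Transmitted-minus-nontransmitted statistic for one parent at one marker."""
    return (transmission_indicator(transmitted, target, affected)
            - transmission_indicator(nontransmitted, target, affected))


# Hypothetical paternal alleles at one marker, with "H" as the target allele.
y_t = preferential_transmission("H", "h", target="H", affected=True)   # affected trio
y_u = preferential_transmission("H", "h", target="H", affected=False)  # unaffected trio
```

The same function applies to the maternal side by passing the maternal transmitted and nontransmitted alleles.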

The expectation of the statistic is \( {\mu}_{1{k}_{il}j}\left(\delta, \pi \right)=E\left[{Y}_{T_{k_{il}}}\left({t}_j\right)\left|{\varPhi}_1\right.\right]=\left(1-2{\theta}_{t_j,\tau}\right)C{\left(1-{\theta}_{t_j,\tau}\right)}^N{\pi}_j \) for case-parent trios and \( {\mu}_{2{k}_{il}j}\left(\delta, \pi \right)=E\left[{Y}_{U_{k_{il}}}\left({t}_j\right)\left|{\varPhi}_2\right.\right]=\left(1-2{\theta}_{t_j,\tau}\right){C}^{*}{\left(1-{\theta}_{t_j,\tau}\right)}^N{\pi}_j \) for control-parent trios, where \( {\theta}_{t_j,\tau } \) is the recombination fraction between marker position \( {t}_j \) and disease locus position *τ*, and is a parametric function of the parameter of primary interest (*τ*, the physical position of the functional variant); *N* is the number of generations since the initiation of the disease variant; \( {\varPhi}_1 \) denotes the event that the offspring is affected; \( {\varPhi}_2 \) represents the event that the offspring is unaffected; \( C=E\left[{Y}_{T_{k_{il}}}\left(\tau \right)\left|{\varPhi}_1\right.\right]=E\left[{X}_{T_{k_{il}}}\left(\tau \right)\left|{\varPhi}_1\right.\right] \); \( {C}^{*}=E\left[{Y}_{U_{k_{il}}}\left(\tau \right)\left|{\varPhi}_2\right.\right]=E\left[{X}_{U_{k_{il}}}\left(\tau \right)\left|{\varPhi}_2\right.\right] \); \( \delta =\left(\tau, N,C,{C}^{*}\right) \) is the vector of parameters; and \( {\pi}_j=\Pr \left[h\left({t}_j\right)\left|h\left(\tau \right)\right.\right] \). \( {\mu}_{1{k}_{il}j} \) is the probability for an affected offspring to receive a target allele, and \( -{\mu}_{2{k}_{il}j} \) is the probability for an unaffected offspring to receive a target allele. The statistics \( {Z}_{1{k}_{il}j}={X}_{T{k}_{il}j}+{Y}_{T{k}_{il}j} \) and \( {Z}_{2{k}_{il}j}={X}_{U{k}_{il}j}+{Y}_{U{k}_{il}j} \) were used to estimate the parameters. The estimating equations used to solve for parameters *δ* are:

where \( {\widehat{\pi}}_j \) is the average of nontransmitted parental alleles in the sample.

The estimating equations were solved iteratively for parameters *τ*, *N*, *C,* and *C**, where *τ* and *C* are the 2 parameters of interest. The variance of the disease locus position estimate was estimated to make inferences about the disease locus position *(τ)* and its genetic effect *(C)* [2]. Theoretically, the genetic effect of *τ*, characterized by *C*, is the transmission probability that the affected offspring will carry the disease allele, *H*, at *τ*. Detailed derivations for case-parent trios in a cross-sectional design can be found in Chiu et al. [2, 5]. We will present the details of this proposed methodology elsewhere.
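To make the role of *τ* concrete, the sketch below evaluates the expected transmission statistic \( \left(1-2\theta \right)C{\left(1-\theta \right)}^N\pi \) at a marker for given parameter values. The text states only that the recombination fraction is a parametric function of *τ*; the Haldane map function and the distance scaling used here are assumptions made for illustration, and all function names and numeric values are hypothetical.

```python
import math


def haldane_theta(distance_morgans: float) -> float:
    """Recombination fraction from map distance (Haldane map function).

    Assumption: the source does not say which map function links theta to tau.
    """
    return 0.5 * (1.0 - math.exp(-2.0 * abs(distance_morgans)))


def expected_transmission(t_j: float, tau: float, n_gen: float,
                          c: float, pi_j: float,
                          morgans_per_unit: float = 1.0) -> float:
    """Expected statistic (1 - 2*theta) * C * (1 - theta)**N * pi_j.

    Pass C for case-parent trios or C* for control-parent trios as `c`.
    `morgans_per_unit` converts position units to Morgans (hypothetical scaling).
    """
    theta = haldane_theta((t_j - tau) * morgans_per_unit)
    return (1.0 - 2.0 * theta) * c * (1.0 - theta) ** n_gen * pi_j


# Hypothetical values: the expectation peaks at the disease locus and decays
# with distance, which is what allows tau to be localized from the markers.
at_locus = expected_transmission(t_j=0.0, tau=0.0, n_gen=100, c=0.2, pi_j=0.5)
nearby = expected_transmission(t_j=0.01, tau=0.0, n_gen=100, c=0.2, pi_j=0.5)
```

At the disease locus itself the recombination fraction is zero, so the expectation reduces to the product of the genetic effect and \( {\pi}_j \).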

Gene-based association mapping was conducted for all SNPs on chromosome 3. This approach accounts for correlations between markers and repeated phenotypes within subjects, and correlations between subjects per family. The consistent estimates of hypertension locus position using “Ever” and “Progression” are shown in Table 2 and Fig. 1, while the consistent estimates of hypertension locus position using baseline and longitudinal data (at all 4 visits) are listed in Table 3 and Fig. 2.

## Results and discussion

A total of 119 (11 %), 79 (7 %), 49 (4 %), and 42 (4 %) of 1095 genes had a significant genetic effect (*P* < 4.57 × 10^{−5} with Bonferroni correction) based on hypertension status at “Ever,” “Progression,” baseline (“Baseline”), and 4 visits (“Longitudinal”), respectively. Only 3 of the genes significantly associated (*P* ≤ 0.05) with the baseline and longitudinal phenotypes overlap with the genes significantly associated with the “Ever” and “Progression” outcomes: *FETUB, IL1RAP,* and *C3orf21*. Several hits identified here have been reported from linkage or GWAS studies. Table 2 shows genes with a significant genetic effect (*P* < 4.57 × 10^{−5}). Table 3 presents the genes that are significant at a significance level of 0.05. Only 1 gene, *GRM7*, is significant at the level of *P* < 4.57 × 10^{−5}.
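The significance threshold quoted above is consistent with a Bonferroni correction of a 0.05 family-wise level over the 1095 genes tested, as the following short check shows. The variable names are illustrative only; the counts are those reported in the text.

```python
N_GENES = 1095   # genes tested on chromosome 3 (from the text)
ALPHA = 0.05     # family-wise significance level

# Bonferroni-corrected per-gene threshold.
bonferroni_threshold = ALPHA / N_GENES

# Significant gene counts per phenotype, as reported in the text.
significant = {"Ever": 119, "Progression": 79, "Baseline": 49, "Longitudinal": 42}

# Percentage of the 1095 genes, rounded to whole percent as in the text.
percent_significant = {
    name: round(100 * count / N_GENES) for name, count in significant.items()
}

print(f"threshold = {bonferroni_threshold:.2e}")
print(percent_significant)
```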

Figures 1 and 2 display the 95 % confidence intervals for the estimate of the hypertension locus position for the 4 phenotypes centered at the estimated disease locus position. The comparison is shown for the genes listed in Tables 2 and 3. The standard errors of the estimates for the disease locus position are smaller in 64 % of the genes based on longitudinal data (Table 3) compared to those based on baseline data. By contrast, the results from “Progression” and “Ever” are similar. This is because the incident cases included in “Progression” were also included in the analysis of “Ever”; only prevalent cases, a relatively small proportion, are additionally included in the analysis of “Ever.”
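The comparison of standard errors can be read through the two quantities used in this article: a 95 % confidence interval centered at the estimated locus position, and efficiency defined as the inverse of the standard error. The sketch below assumes a symmetric normal-approximation interval, which the text does not state explicitly, and all numeric values are hypothetical rather than taken from Tables 2 or 3.

```python
Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def confidence_interval_95(estimate: float, se: float) -> tuple[float, float]:
    """Symmetric 95 % interval centered at the estimate (normal approximation)."""
    half_width = Z_975 * se
    return estimate - half_width, estimate + half_width


def efficiency(se: float) -> float:
    """Efficiency of a position estimate, defined as the inverse of its standard error."""
    return 1.0 / se


# Hypothetical position estimate (arbitrary units) with two standard errors.
tau_hat = 100.0
se_baseline, se_longitudinal = 4.0, 2.5

ci_baseline = confidence_interval_95(tau_hat, se_baseline)
ci_longitudinal = confidence_interval_95(tau_hat, se_longitudinal)
relative_efficiency = efficiency(se_longitudinal) / efficiency(se_baseline)
```

A relative efficiency above 1 corresponds to a narrower interval for the longitudinal analysis at the same estimated position.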

## Conclusions

Methods of genetic analysis rely heavily on correlations among family members’ outcomes to infer genetic effects, whereas longitudinal studies allow investigators to study factors’ effects on outcomes and changes over time [1]. To retrieve full information from longitudinal family data, appropriate statistical approaches are crucial. We proposed a multipoint linkage disequilibrium approach accounting for multilevel correlations between markers per subject, within-subject longitudinal observations, and subjects within families, aiming to correctly localize the disease locus and assess its genetic effects. This approach has several advantages: it allows us to estimate the disease locus position, the disease locus’s genetic effect, and the 95 % confidence intervals without specifying a genetic model for the disease, while still making full use of the markers and repeated measurements. In addition, this approach treats the genotype data as random conditional on the phenotype, eliminating the problem of ascertainment bias. We applied this approach to the baseline and longitudinal prevalence/incidence of hypertension events. The efficiency of parameter estimates was similar for the “Ever” and “Progression” categories, but was improved with repeated longitudinal outcomes compared to the use of “Baseline” only. This difference between analyses might largely result from the different total sample sizes and proportions of hypertensive subjects for different phenotypes. Several identified genes on chromosome 3 for hypertension were consistent with findings from previous linkage and association studies. Despite its advantages, this proposed approach also has limitations; for example, covariate adjustment is not available.

## References

1. Gauderman WJ, Macgregor S, Briollais L, Scurrah K, Tobin M, Park T, et al. Longitudinal data analysis in pedigree studies. Genet Epidemiol. 2003;25 Suppl 1:S18–28.
2. Chiu YF, Lee CY, Kao HY, Pan WH, Hsu FC. Analysis of family- and population-based samples using multiple linkage disequilibrium mapping. Ann Hum Genet. 2013;77(3):251–67.
3. Liu F, Kirichenko A, Axenovich TI, van Duijn CM, Aulchenko YS. An approach for cutting large and complex pedigrees for linkage analysis. Eur J Hum Genet. 2008;16(7):854–60.
4. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet. 2007;81(3):559–75.
5. Chiu YF, Liang KY, Pan WH. Incorporating covariates into multipoint association mapping in the case-parent design. Hum Hered. 2010;69(4):229–41.

## Acknowledgements

We deeply appreciate the reviewers’ thorough reviews and constructive suggestions, which greatly improved the quality of this manuscript. This project was supported by a grant from the Ministry of Science and Technology, Taiwan (MOST102-2118-M-400-005) and a grant from the National Health Research Institutes, Taiwan (PH-103-pp-04). We thank Ms. Karen Klein (Biomedical Research Services and Administration, Wake Forest School of Medicine) for her editorial contributions to this manuscript.

### Declarations

This article has been published as part of *BMC Proceedings* Volume 10 Supplement 7, 2016: Genetic Analysis Workshop 19: Sequence, Blood Pressure and Expression Data. Summary articles. The full contents of the supplement are available online at http://bmcproc.biomedcentral.com/articles/supplements/volume-10-supplement-7. Publication of the proceedings of Genetic Analysis Workshop 19 was supported by National Institutes of Health grant R01 GM031575.

### Authors’ contributions

YFC designed the overall study; CYL conducted the statistical analyses; YFC and FCH drafted the manuscript. All authors read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

## About this article

### Cite this article

Chiu, Y., Lee, C. & Hsu, F. Multipoint association mapping for longitudinal family data: an application to hypertension phenotypes. *BMC Proc* **10**, 54 (2016). https://doi.org/10.1186/s12919-016-0049-2

### Keywords

- Genetic Effect
- Disease Locus
- Association Mapping
- Hypertension Status
- Linkage Disequilibrium Mapping