Volume 1 Supplement 1
Comparison of multipoint linkage analyses for quantitative traits in the CEPH data: parametric LOD scores, variance components LOD scores, and Bayes factors
© Sung et al; licensee BioMed Central Ltd. 2007
Published: 18 December 2007
We performed multipoint linkage analyses with multiple programs and models for several gene expression traits in the Centre d'Etude du Polymorphisme Humain families. All analyses provided consistent results for both peak location and shape. Variance-components (VC) analysis gave wider peaks and Bayes factors gave fewer peaks. Among programs from the MORGAN package, lm_multiple performed better than lm_markers, resulting in less Markov-chain Monte Carlo (MCMC) variability between runs, and the program lm_twoqtl provided higher LOD scores by also including either a polygenic component or an additional quantitative trait locus.
Our aims were 1) to compare results from several multipoint linkage analysis programs that are available for quantitative traits and 2) to investigate the performance of MCMC-based programs on the GAW15 expression data in 14 three-generation CEPH families genotyped for clustered SNP markers. We used three recently developed programs in the MORGAN package: lm_markers, lm_multiple, and lm_twoqtl. These programs provide MCMC-based parametric LOD score analysis, the first two with a one-QTL (1Q) model and the last with more complex models, including a second linked (2Q) or unlinked (UQ) QTL and/or a polygenic component (P). In addition, we used Loki for Bayesian oligogenic analysis and Merlin for VC analysis. These analyses cover most approaches that fully use quantitative trait data from three-generation pedigrees.
For 62 traits previously reported to show evidence of linkage [5, 6], we performed genome-wide VC analysis and obtained the maximum likelihood estimate (MLE) of heritability (h2). We chose six traits that showed high VC LOD scores and h2 ≥ 0.31: CHI3L2, GSTM1, PSPH, VAMP8, PPAT, and TM7SF3. The first two of these had only a single peak with VC LOD > 3, representing potentially simple traits, and the latter four had multiple peaks, representing potentially complex traits. For these six traits, we performed Bayesian oligogenic joint segregation and linkage analyses using Loki and parametric LOD score analysis with a 1Q model using lm_markers and lm_multiple. For the first four traits only, we also performed parametric LOD score analysis with more complex models using lm_twoqtl.
Genetic map and marker data
We used the Rutgers map for linkage analysis. We converted Kosambi map positions to Haldane map positions for analysis, although for ease of comparison with other GAW contributions we present all results on a Kosambi scale. We also constructed a jittered map by adding 0.01 cM between markers with identical positions on this map. We excluded sex chromosomes and used the sex-averaged jittered map for all our linkage analyses because neither MORGAN nor Loki allows multiple markers at the same position. For the VC analysis, we also used the nonjittered map as a comparison. We used Merlin to identify all Mendelian-inconsistent genotypes (69 marker-family combinations) and any obligate recombinations within each cluster (166 cluster-family, or 508 marker-family combinations), where a cluster is defined as a set of markers that have the same Rutgers map position. We coded these markers as missing genotypes in all members of the families with an apparent error.
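As an illustration only, the map preparation described above could be sketched as follows. This is not the code used in the analysis. The function names are hypothetical, the first marker is placed at 0 by convention, and the sketch assumes that the 0.01 cM jitter is inserted as a gap between co-located markers before conversion, which shifts all later markers by the same amount.

```python
import math


def kosambi_to_haldane_cm(d_kosambi_cm):
    """Convert one inter-marker distance from Kosambi cM to Haldane cM."""
    d = d_kosambi_cm / 100.0                    # distance in Morgans
    theta = 0.5 * math.tanh(2.0 * d)            # Kosambi map function
    return -50.0 * math.log(1.0 - 2.0 * theta)  # inverse Haldane map function, in cM


def jitter_and_convert(kosambi_positions_cm, jitter_cm=0.01):
    """Return Haldane positions (relative to the first marker) for sorted
    Kosambi positions, separating co-located markers by jitter_cm."""
    haldane = [0.0]  # assumption: positions are reported relative to marker 1
    for prev, cur in zip(kosambi_positions_cm, kosambi_positions_cm[1:]):
        gap = cur - prev
        if gap < 0:
            raise ValueError("positions must be sorted in map order")
        if gap == 0:
            gap = jitter_cm  # markers in the same cluster get a small gap
        haldane.append(haldane[-1] + kosambi_to_haldane_cm(gap))
    return haldane
```

Converting interval by interval, rather than converting absolute positions, keeps each recombination fraction between adjacent markers unchanged across the two map scales.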
Segregation and linkage analyses
For the 62 traits, we performed genome-wide VC linkage analysis with Merlin for both the jittered and original nonjittered maps. VC LOD scores were computed only at the marker positions. We also obtained MLEs of h2 for these 62 traits with a VC polygenic model. Using Merlin, we obtained MLEs of marker allele frequencies, which we used in all linkage analyses.
For the six traits, we performed Bayesian oligogenic segregation analysis and oligogenic joint segregation and linkage analysis using Loki. For segregation analysis, we used every fourth iteration in a 50 k iteration run to estimate QTL models. For linkage analysis, we used every fourth iteration in a 999 k iteration run to compute Bayes factors for presence versus absence of a QTL in each 2-cM bin. We used QTL models estimated from Bayesian segregation analysis in all our LOD score analyses.
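The bin-wise Bayes factor described above is the ratio of posterior odds to prior odds for the presence of at least one QTL in a bin. A minimal sketch of that calculation from thinned MCMC output is given below. It does not reproduce Loki's output format or its prior: the input structure, the function name, and the caller-supplied `prior_prob_per_bin` are all assumptions made for illustration.

```python
import math


def bin_bayes_factors(sampled_qtl_positions, chrom_length_cm,
                      prior_prob_per_bin, bin_cm=2.0, thin=4):
    """Bayes factor for presence versus absence of a QTL in each bin.

    sampled_qtl_positions: one list per MCMC iteration, holding the cM
    positions of the QTLs on this chromosome in that iteration.
    prior_prob_per_bin: assumed prior probability of >= 1 QTL in a bin.
    """
    n_bins = int(math.ceil(chrom_length_cm / bin_cm))
    kept = sampled_qtl_positions[::thin]  # keep every thin-th iteration
    hits = [0] * n_bins
    for positions in kept:
        # Count each bin at most once per iteration (presence, not number).
        occupied = {min(int(p // bin_cm), n_bins - 1)
                    for p in positions if 0 <= p <= chrom_length_cm}
        for b in occupied:
            hits[b] += 1
    prior_odds = prior_prob_per_bin / (1.0 - prior_prob_per_bin)
    factors = []
    for h in hits:
        post = h / len(kept)
        if post >= 1.0:
            factors.append(math.inf)  # QTL present in every kept iteration
        else:
            factors.append((post / (1.0 - post)) / prior_odds)
    return factors
```

With finite runs, bins visited in every retained iteration or in none give infinite or zero estimates, so long runs are needed before the extreme values are interpretable.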
We recently developed three programs in MORGAN: lm_markers, lm_multiple, and lm_twoqtl. The first two programs compute LOD scores for the 1Q model, and lm_twoqtl computes LOD scores for more complex models. In addition to its MCMC-based approach, lm_markers now can also provide exact computation of LOD scores for small pedigrees with many markers. No other programs provide parametric LOD scores for quantitative traits with many markers. The program lm_multiple differs from lm_markers only in that, instead of updating only one meiosis at a time, it uses an improved sampler that simultaneously updates either a randomly chosen subset of up to eight meioses or a possibly larger subset of meioses in closely related individuals, such as siblings. This multiple-meiosis updating can improve estimates of LOD scores, particularly for data with large sibships. Finally, lm_twoqtl provides LOD scores with models that include additional linked or unlinked QTLs and a polygenic component. Incorporating better modeling of complex traits into linkage analysis can provide higher LOD scores and better localization for complex traits.
Oligogenic segregation analysis results
VC LOD scores and heritabilities for the 62 traits
Of the 62 traits, 24 had a VC LOD score ≥ 3, with h2 ranging from 0.13 to 0.86. Five traits had a maximum VC LOD score < 1, with h2 ranging from 0 to 0.11. Most traits had only a single peak in the genome with VC LOD ≥ 3, suggesting a simple mode of inheritance. Two traits (PSPH and DDX17) had three peaks with VC LOD ≥ 3, and three traits (PPAT, HSD17B12, TUBG1) had two peaks with VC LOD ≥ 3. The jittered and nonjittered maps yielded virtually identical VC LOD scores, except for VAMP8 on chr 2, where the largest peak was slightly narrower with the nonjittered map.
We chose the six traits CHI3L2, GSTM1, PPAT, PSPH, TM7SF3, and VAMP8 for further analysis. The actual locations of these genes were at the maximum VC LOD scores (CHI3L2, GSTM1, PSPH), 10 cM away (VAMP8), or 25 cM away (PPAT). Bayesian oligogenic segregation analysis for these traits provided posterior mean numbers of QTLs ranging from 2 to 3.5. Estimation of the primary QTL model was relatively straightforward (Table 1), whereas the secondary or weaker QTL models were less obvious. Heritabilities estimated from Bayesian oligogenic segregation analysis were sometimes higher than MLEs of h2 obtained from a VC polygenic model. This is not surprising because VC analysis with Merlin uses only additive genetic variance, thus providing only narrow-sense heritabilities, whereas Loki allows for dominance effects, thus providing larger broad-sense heritabilities.
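The distinction between narrow-sense and broad-sense heritability can be made concrete with the standard variance decomposition for a single biallelic QTL. The sketch below is illustrative only and is not taken from Merlin or Loki. The parameterization (genotype values +a, d, -a and allele frequency p) and the function names are assumptions, and all variance not due to the QTL is lumped into one residual term.

```python
def qtl_variance_components(p, a, d):
    """Additive and dominance variance for a biallelic QTL.

    p: frequency of allele A; genotype values are +a (AA), d (Aa), -a (aa).
    Assumes Hardy-Weinberg proportions.
    """
    q = 1.0 - p
    alpha = a + d * (q - p)            # average effect of allele substitution
    var_additive = 2.0 * p * q * alpha ** 2
    var_dominance = (2.0 * p * q * d) ** 2
    return var_additive, var_dominance


def heritabilities(p, a, d, var_residual):
    """Return (narrow-sense, broad-sense) heritability for one QTL."""
    var_a, var_d = qtl_variance_components(p, a, d)
    total = var_a + var_d + var_residual
    return var_a / total, (var_a + var_d) / total
```

For example, with p = 0.5, a = 1, d = 1 (complete dominance) and a residual variance of 0.25, the narrow-sense value is 0.5 and the broad-sense value is 0.75, so an additive-only model recovers only part of the genetic variance.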
Bayes factors using an oligogenic model for the 6 traits
Table 2. Highest LOD score or log (Bayes factor) and run time (in minutes). Locations: CHI3L2 147 cM (chr 1); GSTM1 142 cM (chr 1); PSPH 80 cM (chr 7); VAMP8 113 cM (chr 2). Models listed: 1Q + P; 1Q + UQ.
LOD scores using a one-QTL model for the six traits
Exact LOD scores by family at chromosomal locations with the highest overall LOD score
LOD scores using more complex models for the four traits
More complex trait models led to higher LOD scores than the 1Q model (Table 2). For GSTM1, the 1Q + P model provided the highest LOD scores (Fig. 1B), while for CHI3L2 and VAMP8, LOD scores for the 1Q + UQ and 1Q + P models were almost identical (Fig. 1A, D). For CHI3L2, the model labeled as 1Q + UQ in Table 2 actually included a polygenic component, i.e., 1Q + UQ + P, which increased the run time significantly. In contrast, for PSPH, the 1Q + UQ model gave erratic results, with LOD scores ranging from below -3000 to 40 (Fig. 1C). This may be due to inaccurate estimation of the secondary QTL model: the combined genetic variance from the two QTLs exceeded the total genetic variance obtained from segregation analysis. For VAMP8, the 2Q model produced two peaks of equal magnitude (Fig. 1D), resulting from the identical model for both QTLs.
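The inconsistency noted for PSPH suggests a simple check before running a two-QTL analysis: compare the summed QTL variances with the total genetic variance from segregation analysis. The helper below is a hypothetical sketch of such a check, not part of MORGAN or Loki, and it assumes the two QTLs contribute independently so that their variances add.

```python
def check_variance_budget(qtl_variances, total_genetic_variance, tol=1e-9):
    """Return (ok, excess), where excess is the summed QTL variance minus
    the total genetic variance; ok is False when the QTLs exceed the total."""
    combined = sum(qtl_variances)
    excess = combined - total_genetic_variance
    return excess <= tol, excess
```

A positive excess indicates that at least one of the QTL models is probably overestimated and should be re-examined before its LOD scores are interpreted.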
We performed several multipoint linkage analyses for quantitative traits: VC, Bayesian oligogenic, and parametric LOD score linkage analysis with 1Q, 1Q + P, 1Q + UQ, and 2Q models. We found that all of these analyses provided similar inferences about peak location and shape, with some advantage to using the 1Q + P and 1Q + UQ models over the 1Q model. Use of parametric LOD scores also provided insights into genetic heterogeneity of the traits, which was considerable. However, models for QTLs other than the primary QTL were difficult to estimate with the Bayesian approach for these gene expression traits, suggesting the need for better segregation analysis tools for estimating parameters of complex trait models.
We were able to obtain reliable results for analysis with clustered SNPs with several newly developed MCMC programs in MORGAN. We found that lm_multiple provided better estimates of LOD scores than lm_markers with fewer scans in less time although, in general, both programs performed well with only minor differences in the variability between runs. The MCMC performance obtained here is improved relative to our results for GAW14. Factors in this improvement likely include the use of sequential imputation to obtain starting configurations, less missing data, and different SNP marker maps, in addition to improved algorithms and software. Finally, although our goal here was to compare our developing MCMC-based methods, we advocate use of exact computation when this is practical. On small pedigrees, such as those used here, exact analysis with a 1Q model and lm_markers or with VC methods may be best initially since this is faster than MCMC analysis. Further analyses may use lm_twoqtl, if the evidence warrants it. However, on larger pedigrees, exact multipoint computation may not be possible, in which case these MCMC options are a viable and practical alternative.
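One way to quantify the between-run variability mentioned above is to summarize replicate MCMC runs position by position. The sketch below is an assumption about how such a summary might be computed, not a description of MORGAN output. The input layout (one list of LOD estimates per run, all at the same map positions) and the function name are hypothetical.

```python
import statistics


def between_run_summary(lod_runs):
    """Per-position (mean, standard deviation) of LOD estimates over runs.

    lod_runs: list of runs; each run is a list of LOD scores evaluated at
    the same map positions in the same order.
    """
    n_pos = len(lod_runs[0])
    if any(len(run) != n_pos for run in lod_runs):
        raise ValueError("all runs must cover the same positions")
    summary = []
    for i in range(n_pos):
        values = [run[i] for run in lod_runs]
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        summary.append((statistics.mean(values), sd))
    return summary
```

Where exact LOD scores are available, as for the small pedigrees here, the same positions can also be compared against the exact values to separate Monte Carlo error from model differences.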
We showed that MCMC-based programs from the MORGAN package provide accurate LOD scores for quantitative traits with SNP markers. The program lm_multiple gives more accurate results than lm_markers, and the program lm_twoqtl expands the trait models to include two loci plus a possible polygenic component.
List of Abbreviations
- 1Q + P: One QTL plus a polygenic component
- 1Q + UQ: One linked QTL plus one unlinked QTL
- 2Q: Two linked QTL
- CEPH: Centre d'Etude du Polymorphisme Humain
- GAW: Genetic Analysis Workshop
- h2: Heritability
- MCMC: Markov chain Monte Carlo
- MLE: Maximum likelihood estimate
- QTL: Quantitative trait locus
Supported by NIH grants AG14382, AG05136, AG21544, AG11762, HL30086, GM46255, and HD35465.
This article has been published as part of BMC Proceedings Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/1?issue=S1.
- Cheung VG, Spielman RS: Data for Genetic Analysis Workshop 15 (GAW15), Problem 1: genetics of gene expression in humans. BMC Proc. 2007, 1 (Suppl 1): S2.
- MORGAN. [http://www.stat.washington.edu/thompson/Genepi/genepi.shtml]
- Heath SC: Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am J Hum Genet. 1997, 61: 748-760. 10.1086/515506.
- Abecasis GR, Cherny SS, Cookson WO, Cardon LR: Merlin-rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30: 97-101. 10.1038/ng786.
- Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, Spielman RS, Cheung VG: Genetic analysis of genome-wide variation in human gene expression. Nature. 2004, 430: 743-747. 10.1038/nature02797.
- Cheung VG, Spielman RS, Ewens KG, Weber TM, Morley M, Burdick JT: Mapping determinants of human gene expression by regional and genome-wide association. Nature. 2005, 437: 1365-1369. 10.1038/nature04244.
- Rutgers map (build 35). [http://compgen.rutgers.edu/maps/b35.shtml]
- Sung YJ, Dawson G, Munson J, Estes A, Schellenberg GD, Wijsman EM: Genetic investigation of quantitative traits related to autism: use of multivariate polygenic models with ascertainment adjustment. Am J Hum Genet. 2005, 76: 68-81. 10.1086/426951.
- Sung YJ, Thompson EA, Wijsman EM: MCMC-based linkage analysis for complex traits on general pedigrees: multipoint analysis with a two-locus model and a polygenic component. Genet Epidemiol. 2007, 31: 103-114. 10.1002/gepi.20194.
- Tong L, Thompson EA: Multilocus LOD scores in large pedigrees: combination of exact and approximate calculations. Hum Hered. 2007.
- Sieh W, Basu S, Fu AQ, Rothstein JH, Scheet PA, Stewart WC, Sung YJ, Thompson EA, Wijsman EM: Comparison of marker types and map assumptions using Markov chain Monte Carlo-based linkage analysis of COGA data. BMC Genet. 2005, 6 (Suppl 1): S11. 10.1186/1471-2156-6-S1-S11.
- Wijsman EM, Rothstein JH, Thompson EA: Multipoint linkage analysis with many multiallelic or dense diallelic markers: Markov chain-Monte Carlo provides practical approaches for genome scans on general pedigrees. Am J Hum Genet. 2006, 79: 846-858. 10.1086/508472.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.