Genetic map and marker data
We focused exclusively on the Genetic Analysis Workshop (GAW) 16 50 k marker data, and primarily analyzed chromosome 7. We matched the position of each chromosome 7 marker to the sex-averaged Kosambi map sequence position on the Rutgers map [1], and then converted those positions to a Haldane map. Markers within 0.01 cM of each other were given unique, sequential map positions so that no two markers shared a map position.
We excluded markers with >3% missing data and markers with minor allele frequency <0.05. We used chi-square tests to test the null hypothesis of Hardy-Weinberg equilibrium, and removed markers yielding the largest 1% of the test statistics, leaving 2132 markers on chromosome 7. A "thinned" marker panel was obtained by selecting approximately every tenth marker from this "dense" filtered marker set, preferentially selecting markers with higher rare allele frequencies among founders because they are more suitable for linkage analysis. The final thinned data set included 214 markers on chromosome 7 (and 3465 genome-wide markers) with a marker density of ~1 per cM.
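A minimal sketch of the Hardy-Weinberg filter and the thinning step follows. The Pearson chi-square statistic is standard, but the block-wise thinning rule (keep the highest-frequency marker in each consecutive block of ten) is only one possible reading of "approximately every tenth marker" and is an assumption, as are all function names.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic (1 df) for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)        # frequency of the first allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

def drop_top_fraction(stats, fraction=0.01):
    """Indices of markers kept after removing the largest `fraction` of statistics."""
    n_drop = int(round(len(stats) * fraction))
    ranked = sorted(range(len(stats)), key=lambda i: stats[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return [i for i in range(len(stats)) if i not in dropped]

def thin_by_block(mafs, block=10):
    """Assumed thinning rule: highest-MAF marker in each block of `block` markers."""
    return [max(range(s, min(s + block, len(mafs))), key=lambda i: mafs[i])
            for s in range(0, len(mafs), block)]
```

Under this assumed rule, 2132 markers fall into 214 blocks, which agrees with the size of the thinned chromosome 7 panel but does not confirm that this was the rule used.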
Pedigree data cleaning
To ensure compatibility with the linkage analysis programs in the MORGAN package [2], we merged the two members of each of 25 monozygotic twin pairs. Parents missing pedigree information who were referenced by at least two family members were given records of their own. Mendelian-inconsistent genotypes were identified by Loki 2.4.7 [3] and recoded as missing genotypes for all members of each affected pedigree. Not all individuals sharing a pedigree number could be connected, so we split these larger pedigrees into smaller pedigrees defined by the available parent-offspring relationships.
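Splitting a pedigree along its parent-offspring links amounts to finding the connected components of a graph. The sketch below uses a union-find structure; the input layout (a list of IDs and a list of child-parent pairs) and the function name are assumptions for illustration, not the format used by MORGAN or Loki.

```python
def split_pedigrees(individuals, parent_links):
    """Group individuals into connected components of the parent-offspring graph.

    individuals: iterable of IDs sharing one pedigree number (hypothetical layout).
    parent_links: iterable of (child, parent) pairs.
    """
    root = {ind: ind for ind in individuals}

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]    # path halving keeps the trees shallow
            x = root[x]
        return x

    for child, par in parent_links:
        root.setdefault(child, child)
        root.setdefault(par, par)
        root[find(child)] = find(par)  # merge the two components

    components = {}
    for ind in root:
        components.setdefault(find(ind), []).append(ind)
    return sorted(sorted(c) for c in components.values())
```

Each returned list would then be treated as its own pedigree; individuals with no links come back as singletons.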
Phenotype data refinement
For linkage-based analyses, we focused on high-density lipoprotein level (HDL) and chromosome 7 because of previous evidence of linkage within the Framingham Heart Study (FHS) [4–8]. We used observations from Exam 11 for the Original Cohort, and Exam 1 for the Offspring and Generation 3 Cohorts, age-matching the second and third generations to maximize the number of individuals in our study. To calculate body mass index (BMI), height was imputed from Exam 7 of the Original Cohort data when it was missing from Exam 11. We fit linear regression models to adjust HDL for age, BMI, sex, cholesterol treatment status, and cohort.
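Covariate adjustment of this kind is commonly done by taking residuals from a least-squares fit. The sketch below is a dependency-free illustration of that idea, assuming the adjusted phenotype is the ordinary least-squares residual; the coding of the covariates and the function name are hypothetical and not taken from the analysis itself.

```python
def ols_residuals(rows, y):
    """Residuals of y after least-squares regression on covariate rows.

    rows: one list of numeric covariates per individual (an intercept is added).
    Assumes the covariates are not collinear.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])
    # Normal equations (X'X) b = X'y as an augmented matrix
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return [yi - sum(b * xi for b, xi in zip(beta, x)) for x, yi in zip(X, y)]
```

Categorical covariates such as sex, treatment status, and cohort would need to be coded as indicator columns before being passed in.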
Quantitative trait locus models
We performed Bayesian oligogenic segregation analysis using the software package Loki 2.4.7 [3] to identify and describe models for quantitative trait loci (QTLs) associated with the adjusted HDL phenotype. The QTL with the largest effect size (A allele frequency = 0.76, AA genotype effect = -1.39, Aa genotype effect = -1.75, aa genotype effect = 24.21, variance due to the QTL = 12.04, additive variance = 13.53, dominance variance = 23.54) was incorporated into the MORGAN [2] lm_multiple analysis described below. We used the posterior distribution of these models to generate a sample of simulated traits for use in empirical significance testing [9].
IBD sharing and kinship from population and pedigree data
Without reference to the pedigree structures, we estimated k-coefficients, where ki (i = 0, 1, 2) is the probability that i alleles are shared identical by descent (IBD), using the thinned chromosome 7 and genome-wide panels of markers, as well as subsets of 214 and 1000 genome-wide markers. We estimated k-coefficients for all possible pairs of independent people (n = 1827), using all founders in the pedigrees and other unrelated individuals, and for all pairs of individuals within each pedigree. Kinship coefficients, Φ, were subsequently computed as Φ = 0.25k1 + 0.5k2, and pairs of individuals with Φ > 0.2 were noted.
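As a worked illustration of the kinship formula, the sketch below computes Φ from a triple of k-coefficients and flags pairs above the 0.2 threshold. The function names and the input dictionary layout are hypothetical.

```python
def kinship_from_k(k0, k1, k2):
    """Kinship coefficient from IBD-sharing probabilities: 0.25*k1 + 0.5*k2."""
    if abs(k0 + k1 + k2 - 1.0) > 1e-6:
        raise ValueError("k-coefficients must sum to 1")
    return 0.25 * k1 + 0.5 * k2

def flag_close_pairs(k_by_pair, threshold=0.2):
    """Pairs whose kinship exceeds `threshold`.

    k_by_pair: dict mapping a pair label to a (k0, k1, k2) tuple (assumed layout).
    """
    return [pair for pair, k in k_by_pair.items()
            if kinship_from_k(*k) > threshold]
```

With the expected k-coefficients, parent-offspring pairs (0, 1, 0) and full sibs (0.25, 0.5, 0.25) both give Φ = 0.25 and would be flagged, while half sibs (0.5, 0.5, 0) give Φ = 0.125 and would not.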
We selected four pairs of individuals that showed high apparent relatedness, as estimated from the thinned chromosome 7 markers, but had different pedigree numbers. For each pair, the dense chromosome 7 markers were used to detect IBD segments using the model of Thompson [10]. We used a prior marginal pairwise IBD probability of 0.1 and an IBD-change rate parameter giving a prior expected length of chromosome in a particular IBD state of 1 cM, averaged over the nine possible IBD states in accordance with their marginal prior probabilities. The dense chromosome 7 data set was also used to flag tracts of homozygous markers (>9 SNPs in a row) shared between the two members of each of these four pairs.
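The homozygous-tract flagging can be sketched as a simple run scan. This assumes genotypes are coded as the count of one allele (0, 1, or 2, with None for missing) and that a shared tract means both individuals are homozygous for the same allele at every marker in the run; neither the coding nor the function name comes from the analysis itself.

```python
def shared_homozygous_tracts(geno_a, geno_b, min_run=10):
    """Inclusive (start, end) index pairs of shared homozygous runs.

    A run is a stretch of at least `min_run` consecutive markers (more than
    9 SNPs in a row by default) at which both individuals carry the same
    homozygous genotype. Missing genotypes (None) break a run.
    """
    tracts, start = [], None
    n = min(len(geno_a), len(geno_b))
    for i in range(n):
        a, b = geno_a[i], geno_b[i]
        shared = a is not None and a == b and a in (0, 2)
        if shared and start is None:
            start = i                          # open a new run
        elif not shared and start is not None:
            if i - start >= min_run:
                tracts.append((start, i - 1))  # close a long-enough run
            start = None
    if start is not None and n - start >= min_run:
        tracts.append((start, n - 1))          # run extends to the last marker
    return tracts
```

Long shared homozygous runs are suggestive rather than conclusive, which is why they serve here as a flag alongside the model-based IBD segments.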
We used a new "case-control" study design that corrects for relatedness (both known and estimated as cryptic kinship) within the sample [11], choosing 838 "cases" and 844 "controls" from the upper and lower 15th percentiles of the trait distribution in the full data set. The correction for relatedness essentially eliminates inflated test statistics resulting from inclusion of related individuals. We corrected the naïve chi-square statistic p-values using three types of kinship coefficients: pedigree-based prior, pedigree-based posterior, and population-estimated kinship coefficients. The pedigree-based prior was computed based on pedigree structure alone, while the pedigree-based posterior was based on the gl_auto results (described below) that used both pedigree structure and marker data. The dense chromosome 7 marker panel was used for the case-control study, while the thinned chromosome 7 marker panel was used for estimation of kinship coefficients. The population-estimated kinship coefficient was a maximum-likelihood estimate based on the thinned chromosome 7 marker data.
Linkage analyses
Two MORGAN [2] programs, lm_multiple and gl_auto, were used for lod score analyses and realization of inheritance indicators conditional on marker data, respectively. Options in both programs now allow the multiple-meiosis sampler to be used with the locus sampler, leading to more accurate Markov-chain Monte Carlo (MCMC) sampling of inheritance indicators on large pedigrees [12]. Additionally, both programs have options to run sequentially over pedigrees, permitting easier processing of output on disjoint pedigrees and allowing for exact computation of lod scores on small (<= 14 meioses) pedigrees in lm_multiple, and independent realizations of inheritance indicators in gl_auto. This allows computationally intensive MCMC approximation to be used only where necessary.
Genetic linkage can be detected with pedigree data using inheritance vector realizations. We used the inheritance vectors obtained from gl_auto for two linkage analysis methods: 1) standard variance-components (VC) analysis using SOLAR, and 2) a novel conditional inheritance vector test using the w-score [13], which is the expectation over founder genotypes of a maximized likelihood given those founder genotypes, to test whether we could resolve the number of causal loci in a region of interest indicated by the VC results. The w-score analyses were performed only on the pedigrees of size 4-9, while VC analysis was performed on this subset as well as on all pedigrees for comparison. We summarized the results using randomized p-values for the conditional test [14] and empirical p-values for the VC analysis through trait simulation and the inheritance vectors described above [9]. We also performed three Bayesian oligogenic joint segregation and linkage analyses on all pedigrees using Loki 2.4.7 [3], where every 100th of 500,000 iterations was used to compute Bayes' factors for the presence of a QTL within each 2-cM bin.