Data
Using the Gaussian graphical model for pedigrees and the sparse graphical model for discrete and quantitative traits, we analyzed the Genetic Analysis Workshop 18 (GAW18) data, which include genome-wide association data for 400,000 SNPs along with simulated and real phenotypic information on SBP, DBP, hypertension, blood pressure medication use, and smoking status. The real data contained 939 individuals within 20 pedigrees at 4 time points. Missing data were present at all the time points. At each time point, we excluded individuals with missing data and performed our analyses on the remaining individuals. For the Gaussian graphical model for quantitative traits in pedigrees, we analyzed the first 3 time points for the 6 phenotypes in the real data. The fourth time point was excluded because most of its data were missing.
For the sparse graphical model with discrete and continuous traits, we concentrated on chromosome 3 and used the genome-wide association data to construct the network. Two hundred replicates of simulated data, generated using the real pedigree structures, were available for the 3 time points. We used a single replicate of the simulated data for the phenotypes. Only the unrelated individuals from the first time point were used for this analysis. In the simulated data, a total of 1457 genetic variants were causal for either SBP or DBP across all the chromosomes. Of these 1457 causal variants, 188 were located on chromosome 3. We randomly sampled 20 of these 188 variants for our analysis.
Gaussian graphical models for pedigrees
Accounting for pedigree structure, we derived graphical models of 6 traits and covariates: age, SBP, DBP, hypertension, blood pressure medication use, and smoking status. Because hypertension, blood pressure medication use, and smoking status are discrete phenotypes, we transformed them into quantitative phenotypes using a logistic regression framework in which each discrete phenotype was regressed on all the other phenotypes. At each time point, the graph shows the conditional relationships among the phenotypes. For example, in Figure 1, the graph for the second time point shows that age and DBP are negatively correlated conditional on all the other phenotypes. The weight of the edge is the partial correlation between age and DBP, which was −0.2042. The other edges likewise represent the conditional relationships among the remaining phenotypes. The graph structure remained essentially the same across all 3 time points. Smoking status was not related to any of the other phenotypes at any of the 3 time points. Whereas DBP was inversely correlated with age, SBP was positively correlated with age.
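To make the edge weights concrete, the following minimal sketch shows how partial correlations can be read off the inverse covariance (precision) matrix in an ordinary Gaussian graphical model: the partial correlation between variables i and j, given all the others, is the negative of the (i, j) precision entry divided by the square root of the product of the two corresponding diagonal entries. The sketch deliberately ignores pedigree structure, so it is not the pedigree-adjusted estimator used in the analysis. The function names and the toy covariance matrix are illustrative assumptions, not GAW18 quantities.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity on the right-hand side.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Partial correlation of every pair of variables given all the others."""
    prec = invert(cov)
    n = len(prec)
    return [
        [
            1.0 if i == j else -prec[i][j] / (prec[i][i] * prec[j][j]) ** 0.5
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical covariance matrix for three variables (not GAW18 data).
toy_cov = [
    [1.0, 0.5, 0.25],
    [0.5, 1.0, 0.5],
    [0.25, 0.5, 1.0],
]
pcor = partial_correlations(toy_cov)
```

In this toy matrix the first and third variables are marginally correlated (0.25) but have zero partial correlation once the second variable is conditioned on, so no edge would be drawn between them. This is the sense in which an absent edge in the graph indicates conditional independence.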
Sparse graphical models for binary and quantitative traits
We validated the sparse graphical methodology using the simulated genome-wide association data. Twenty of the 188 causal SNPs on chromosome 3 were randomly sampled. We also analyzed 21 consecutive noncausal SNPs from the same chromosome. The causal and noncausal SNPs analyzed are detailed in Figure 2. The graphical model also included the 6 phenotypes studied in the Gaussian model (age, SBP, DBP, hypertension, blood pressure medication use, and smoking status) for the first time point. Thus, our sparse graphical network model used 6 phenotypes, 20 causal SNPs, and 21 noncausal SNPs.
We performed LASSO regression using all 47 genetic and nongenetic factors and constructed the graph as described in the methods section. To obtain a sparser graph, we used the AND operator when deciding the conditional independence of 2 nodes. The strength of dependence was measured as the maximum of the 2 regression coefficients. Figure 2 shows the sparse graphical network of the phenotypes and the causal and noncausal SNPs. The phenotypes are coded in red, the causal SNPs in pink, and the noncausal SNPs in green. The 21 noncausal SNPs are in linkage disequilibrium (LD) with each other because of their proximity, which explains the large number of edges between them. The network shows that the causal SNPs are linked with different phenotypes, whereas most of the noncausal SNPs are not linked to any phenotype. The exceptions were 2 noncausal SNPs (rs1159106 and rs4684741) that were associated with the phenotypes. This can be explained by these noncausal SNPs being in low LD with 2 causal SNPs (rs11711953 and rs3772985, respectively), as shown by the blue edges in Figure 2. The corresponding values were 0.049 and 0.043, respectively. All of the phenotypes were interrelated except smoking status, which was independent of the other phenotypes and of all genetic variants.
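The edge-construction step can be sketched as follows, assuming the node-wise LASSO fits have already produced a coefficient matrix. Here coef[i][j] is taken to be the coefficient of node j in the regression with node i as the response. An edge is kept only if both directed coefficients are nonzero (the AND rule), and its weight is the larger of the two. Taking absolute values before the maximum is an assumption of this sketch, as are the function name, the tolerance, the node labels, and the toy coefficients.

```python
def and_rule_graph(coef, names, tol=1e-8):
    """Build undirected weighted edges from node-wise regression coefficients.

    coef[i][j] is the coefficient of node j when node i is the response.
    An edge (i, j) is kept only if BOTH coef[i][j] and coef[j][i] are nonzero.
    """
    edges = {}
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):
            b_ij, b_ji = coef[i][j], coef[j][i]
            if abs(b_ij) > tol and abs(b_ji) > tol:
                # Edge strength: the larger of the two coefficients
                # (absolute value is an assumption of this sketch).
                edges[(names[i], names[j])] = max(abs(b_ij), abs(b_ji))
    return edges


# Hypothetical nodes and coefficients (not taken from the GAW18 analysis).
names = ["SBP", "DBP", "snp_a", "snp_b"]
coef = [
    [0.0, 0.6, 0.1, 0.0],  # SBP regressed on the others
    [0.5, 0.0, 0.0, 0.0],  # DBP regressed on the others
    [0.2, 0.0, 0.0, 0.3],  # snp_a regressed on the others
    [0.0, 0.0, 0.0, 0.0],  # snp_b regressed on the others
]
edges = and_rule_graph(coef, names)
```

In this toy input, snp_a selects snp_b but snp_b does not select snp_a, so the AND rule drops that edge. An OR rule would have kept it, which is why the AND rule yields the sparser graph.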
We also conducted an additional validation of the proposed method in which we randomly selected 21 noncausal SNPs from chromosome 3 that were not in LD with any of the causal SNPs or with each other. The phenotypes and the causal SNPs were the same as in the previous scenario. As expected, the resulting sparse graphical network (not shown) had no edges among the noncausal variants and no edges connecting the causal and noncausal variants. The part of the network corresponding to the phenotypes and the causal SNPs was similar to that in the previous scenario.