SIMLAPLOT is a supplement to the simulation program SIMLA [1]. SIMLA uses a prospective logistic regression model as the penetrance function. This allows for the implementation of flexible multivariable genetic models, which may include terms derived from genotypes at a known susceptibility locus or a nearby marker, non-genetic covariate terms, and product terms modeling interaction between genotypes and covariates. The penetrance function can be expressed as follows:
$$\ln\left[\frac{P(\mathrm{AFF}=1 \mid G, E)}{P(\mathrm{AFF}=0 \mid G, E)}\right] = \beta_0 + \beta_1 G + \beta_2 E + \beta_3 (G \times E)$$
where AFF = 1 if affected and AFF = 0 otherwise. G codes for the three possible genotypes (dd, Dd, DD) at a bi-allelic susceptibility locus or nearby marker based on the user-specified mode of inheritance (additive, dominant, or recessive). β1 is the log-transformed odds ratio for the susceptibility locus. E is a continuous, normally distributed covariate; it can be an environmental risk factor, an endophenotype, or a quantitative trait that depends on an underlying QTL. β2 is the log-transformed odds ratio for a user-specified one-unit increase of the continuous covariate. G × E is defined as the product of G and E, and β3 is the log-transformed odds ratio for this interaction term. β0 is the intercept, which adjusts for the user-specified disease prevalence in the population of simulated individuals.
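As an illustration of the logistic penetrance function described above, the following minimal sketch evaluates P(AFF = 1 | G, E) for a given genotype and covariate value. It is not SIMLA's code: the function names, the numeric coding of G (0, 1, 2 copies of D for the additive model; 0/1 indicators for the dominant and recessive models), and the example parameter values are assumptions made for illustration only.

```python
import math


def genotype_code(n_risk_alleles, mode="additive"):
    """Code a genotype (0, 1 or 2 copies of allele D) as the covariate G.

    The numeric coding is an assumption; the text only names the three modes.
    """
    if n_risk_alleles not in (0, 1, 2):
        raise ValueError("n_risk_alleles must be 0, 1 or 2")
    if mode == "additive":
        return float(n_risk_alleles)
    if mode == "dominant":
        return 1.0 if n_risk_alleles >= 1 else 0.0
    if mode == "recessive":
        return 1.0 if n_risk_alleles == 2 else 0.0
    raise ValueError("mode must be 'additive', 'dominant' or 'recessive'")


def penetrance(n_risk_alleles, e, beta0, beta1, beta2, beta3, mode="additive"):
    """P(AFF = 1 | G, E) under the logistic model with a G x E product term."""
    g = genotype_code(n_risk_alleles, mode)
    eta = beta0 + beta1 * g + beta2 * e + beta3 * g * e
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical parameters: genotype odds ratio 2.0, covariate odds ratio 1.5.
for copies in (0, 1, 2):
    p = penetrance(copies, 0.5, -3.0, math.log(2.0), math.log(1.5), 0.0)
    print(copies, round(p, 4))
```

Evaluating `penetrance` over a grid of E values for each of the three genotypes gives one penetrance curve per genotype.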
SIMLAPLOT evaluates QTL models, G × E interaction models, and genetic main effect models with covariate-defined heterogeneity. It produces four types of plots, referred to as plot types 1 through 4 in the order listed here, to explore different aspects of the relationship between affection status, continuous covariate values, and marker genotypes in each model.
Genotype-specific penetrance values as a function of covariate values
Three penetrance curves, one for each genotype, are produced. These curves display changes in penetrance as a function of E, if E is a risk factor for the simulated disease phenotype, either alone or in combination with genetic susceptibility.
Conditional genotype probability as a function of covariate values and affection status
Three frequency curves, one for each genotype, are produced. At each point on the x-axis, the sum of the three frequencies is 1.0. The respective frequencies change as a function of E if the genotypes correspond to a QTL, if there is interaction with an environmental covariate, or if E is an indicator of genetic heterogeneity.
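The conditional genotype probabilities can be written with Bayes' rule as P(G = g | E = e, AFF) ∝ P(AFF | g, e) · f(e | g) · P(g). The sketch below computes them under stated assumptions that are not taken from the text: Hardy-Weinberg genotype frequencies, additive coding of G as 0, 1, 2, and a normal density for E within each genotype. All function and parameter names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def hwe_frequencies(p):
    """Genotype frequencies (dd, Dd, DD) for allele frequency p of D.

    Hardy-Weinberg proportions are an assumption of this sketch.
    """
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def genotype_probs(e, affected, p, means, sds, betas):
    """P(G = g | E = e, AFF) for g = 0, 1, 2 (additive coding assumed)."""
    b0, b1, b2, b3 = betas
    priors = hwe_frequencies(p)
    weights = []
    for g in (0, 1, 2):
        eta = b0 + b1 * g + b2 * e + b3 * g * e
        pen = 1.0 / (1.0 + math.exp(-eta))
        lik = pen if affected else 1.0 - pen
        weights.append(priors[g] * normal_pdf(e, means[g], sds[g]) * lik)
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical QTL-type model: genotype-specific means, no effect on disease risk.
qtl = dict(p=0.3, means=(0.0, 1.0, 2.0), sds=(1.0, 1.0, 1.0),
           betas=(-2.0, 0.0, 0.0, 0.0))
for e in (-1.0, 1.0, 3.0):
    print(e, [round(x, 3) for x in genotype_probs(e, True, **qtl)])
```

In this example the three probabilities sum to 1 at every value of E, and they vary with E only because the genotype-specific means differ; with equal means and all odds ratios equal to 1 they reduce to the population genotype frequencies.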
Covariate distribution for each genotype in affected individuals
Covariate distribution for each genotype in unaffected individuals
The covariate distributions are plotted for each genotype, separately for affected and unaffected individuals. The comparison of the two plots reflects the main effect of E, or the strength of G × E interaction.
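The genotype-specific covariate distribution given affection status is proportional to P(AFF | g, e) · f(e | g). A minimal sketch follows, assuming a normal density for E within each genotype, additive coding of G, and numerical normalization on a uniform grid with the trapezoidal rule. The names and parameter values are hypothetical, and this is not presented as SIMLAPLOT's implementation.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def penetrance(g, e, betas):
    """Logistic penetrance P(AFF = 1 | G = g, E = e)."""
    b0, b1, b2, b3 = betas
    eta = b0 + b1 * g + b2 * e + b3 * g * e
    return 1.0 / (1.0 + math.exp(-eta))


def covariate_density(grid, g, affected, mean, sd, betas):
    """f(E | G = g, AFF) on a uniform grid, normalized by the trapezoidal rule."""
    raw = []
    for e in grid:
        pen = penetrance(g, e, betas)
        raw.append((pen if affected else 1.0 - pen) * normal_pdf(e, mean, sd))
    step = grid[1] - grid[0]
    area = step * (sum(raw) - 0.5 * (raw[0] + raw[-1]))
    return [r / area for r in raw]


# Hypothetical main effect of E only: odds ratio 2.0 per unit of E.
grid = [-6.0 + 0.01 * i for i in range(1201)]
betas = (-2.0, 0.0, math.log(2.0), 0.0)
aff = covariate_density(grid, 0, True, 0.0, 1.0, betas)
unaff = covariate_density(grid, 0, False, 0.0, 1.0, betas)
```

With a positive covariate effect, the density for affected individuals shifts toward larger values of E relative to the density for unaffected individuals, which is the contrast the two plots are meant to show.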
SIMLAPLOT plots the theoretical conditional distributions for the different models given the following input parameters: the mean and standard deviation of E, which may or may not be genotype-dependent; the allele frequency for the susceptibility locus, QTL, or nearby marker; all relevant odds ratios; the mode of inheritance; and the type of model (QTL, G × E, or heterogeneity). Plots generated from these parameters are labeled "model-based". Some parameters, such as genotype-specific means and variances, can be estimated from an existing data set, and some parameters are approximated based on the assumed model, e.g., QTL. SIMLAPLOT also produces the same types of plots based on the observed data ("data-based"). Comparison of the observed to the theoretical distributions may suggest an appropriate model for the observed data set. To produce these plots, SIMLAPLOT uses a kernel density estimate of the form
$$\hat{f}(x) = \frac{1}{nb} \sum_{i=1}^{n} K\left(\frac{x - x_i}{b}\right)$$
with kernel K and width b [2], where x_1, ..., x_n denote the n observed values. Kernel options include Gaussian (the default), rectangular, triangular, and cosine. Because the visual appearance of the plots depends on the smoothing parameters, its robustness to their choice should be evaluated. SIMLAPLOT determines the optimal degree of smoothing by either minimizing the mean squared error (default) or minimizing the mean distance to the center-matched Gaussian predictions [5].
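To make the kernel density estimate concrete, here is a minimal sketch of a fixed-width estimator with the four kernel families named above. The kernel formulas are common textbook forms and may differ in scaling from those SIMLAPLOT uses; the function name `kde` and the sample values are hypothetical, and no automatic choice of the width b is attempted.

```python
import math

# Textbook kernel forms (assumed, not taken from SIMLAPLOT); each integrates to 1.
KERNELS = {
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),
    "rectangular": lambda u: 0.5 if abs(u) <= 1.0 else 0.0,
    "triangular": lambda u: 1.0 - abs(u) if abs(u) <= 1.0 else 0.0,
    "cosine": lambda u: (math.pi / 4.0) * math.cos(math.pi * u / 2.0)
    if abs(u) <= 1.0 else 0.0,
}


def kde(x, data, b, kernel="gaussian"):
    """Kernel density estimate at x: (1 / (n b)) * sum of K((x - x_i) / b)."""
    if b <= 0.0:
        raise ValueError("width b must be positive")
    if not data:
        raise ValueError("data must not be empty")
    k = KERNELS[kernel]
    return sum(k((x - xi) / b) for xi in data) / (len(data) * b)


# Hypothetical covariate values; compare kernels at one evaluation point.
sample = [0.2, 0.9, 1.1, 1.8, 2.5]
for name in KERNELS:
    print(name, round(kde(1.0, sample, 0.5, name), 4))
```

Repeating the evaluation over a grid of x values for several widths b is one simple way to check how strongly the appearance of a plot depends on the smoothing.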
We applied SIMLAPLOT to the GAW15 simulated data sets using the quantitative covariates IgM, anti-CCP (anti-cyclic citrullinated peptide), and severity of RA (rheumatoid arthritis). We analyzed all SNP markers on chromosomes 9, 11, and 18. Because covariate values exist only for affected individuals, we specified a relative risk of 1.0 and focused on two types of plots: the conditional genotype probability (plot type 2) and the covariate distribution for each genotype in affected individuals (plot type 3). The input parameters for an assumed QTL model (plots labeled "model-based"), such as genotype-specific mean and variance, were estimated from the observed data for the specified SNP. We demonstrate SIMLAPLOT with data from Replicate 1. To evaluate our qualitative conclusions, we performed quantitative trait association analysis using the Monks-Kaplan method [3] as implemented in the QTDT program [4]. P-values and their ranks were obtained for all 100 simulated replicates.