To obtain a sample of cases and controls, we randomly chose one case from each simulated affected sib pair in Replicate 1 of GAW15 Problem 3. Our sample consisted of 1500 cases with rheumatoid arthritis and 2000 controls. Available covariates for controls included sex, lifetime smoking, DR alleles, and age. We used only sex, smoking, and DR alleles as significant covariates in our adjusted model. We ran our interaction models twice, once without and once with these covariates. We used the genome-wide 10 K simulated SNP chip set with 9187 polymorphic SNPs. There are no missing data and no errors in this data set. Analyses were performed without knowledge of the simulated answers. Computations were performed with SAS v.9.1.3 (SAS Institute Inc., Cary, NC, USA) on WinXP and Sun (SAS code available upon request, see http://www.statgen.org).
First stage
We tested 9187 SNPs for association by logistic regression modeling according to:
log(r/(1 - r)) = μ + ax + dz, (1)
where r is the probability that an individual is a case, and x and z are dummy variables with x = 1, z = -0.5 for one homozygous genotype, x = 0, z = 0.5 for the heterozygous genotype, and x = -1, z = -0.5 for the other homozygous genotype. μ corresponds to the mean effect. The terms a and d are the additive and dominance coefficients of the tested SNP. The p-value of the global model was used for each SNP. The additive effects model with adjustment for covariates was modeled as:
log(r/(1 - r)) = μ + ax + sex + smoking + DRalleles (2)
following the same notation. For this model, the p-value of the additive coefficient a was used. Bonferroni correction was used for the conditional design.
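The first-stage test in Eq. (1) can be sketched in a few lines. This is a minimal illustration, not the authors' SAS code: the function names (`code_genotype`, `fit_logistic`, `global_p_value`) are hypothetical, the data are simulated toy values rather than GAW15 data, and the global p-value is assumed here to come from a likelihood-ratio test with 2 degrees of freedom (the source does not say which global test was used). Which homozygote receives x = 1 is arbitrary; the sketch assigns it to an allele count of 2.

```python
import numpy as np

def code_genotype(g):
    """Map allele counts 0/1/2 to the additive (x) and dominance (z) dummies."""
    g = np.asarray(g)
    x = g - 1.0                        # homozygotes -> -1 or 1, heterozygote -> 0
    z = np.where(g == 1, 0.5, -0.5)    # heterozygote -> 0.5, homozygotes -> -0.5
    return x, z

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson fit with an intercept; returns (coefficients, log-likelihood)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik

def global_p_value(g, y):
    """Likelihood-ratio p-value (2 df) for mu + a*x + d*z vs. the intercept-only model."""
    x, z = code_genotype(g)
    _, ll_full = fit_logistic(np.column_stack([x, z]), y)
    p0 = y.mean()
    ll_null = len(y) * (p0 * np.log(p0) + (1 - p0) * np.log(1 - p0))
    lr = 2.0 * (ll_full - ll_null)
    return float(np.exp(-lr / 2.0))    # chi-square survival function for exactly 2 df

# Simulated toy data (assumption, not GAW15 data): one SNP with an additive effect.
rng = np.random.default_rng(0)
g = rng.binomial(2, 0.3, size=3500)
x_true, _ = code_genotype(g)
y = (rng.random(3500) < 1.0 / (1.0 + np.exp(-(-0.3 + 0.5 * x_true)))).astype(float)

x_demo, z_demo = code_genotype([0, 1, 2])
p_global = global_p_value(g, y)
print(x_demo, z_demo, p_global)
```

For the covariate-adjusted model in Eq. (2), the same fitting routine could be given the columns x, sex, smoking, and DR alleles, with the test restricted to the additive coefficient a.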
Second stage
We also used logistic regression to model the effect of genotypes and SNP × SNP interactions on disease risk. We included terms that allow estimation of additive and dominance effects for each SNP locus, along with the inter-SNP additive and dominance interactions. The full interaction model, following Cordell's notation [3], is:
log(r/(1 - r)) = μ + a_1 x_1 + d_1 z_1 + a_2 x_2 + d_2 z_2 + i_aa x_1 x_2 + i_ad x_1 z_2 + i_da z_1 x_2 + i_dd z_1 z_2, (3)
where r is the probability that an individual is a case, and x_i and z_i (i = 1, 2) are dummy variables with x_i = 1, z_i = -0.5 for one homozygous genotype, x_i = 0, z_i = 0.5 for the heterozygous genotype, and x_i = -1, z_i = -0.5 for the other homozygous genotype. μ corresponds to the mean effect; the terms a_1, d_1, a_2, and d_2 are the additive and dominance effect coefficients of the two SNPs, and i_aa, i_ad, i_da, and i_dd represent their interaction coefficients. The additive effects-only interaction test was modeled as:
log(r/(1 - r)) = μ + a_1 x_1 + a_2 x_2 + i_aa x_1 x_2 + sex + smoking + DRalleles (4)
according to the same notation.
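To make the two interaction models concrete, the sketch below builds their design columns from a pair of genotype vectors. It is an illustration under stated assumptions: `code_genotype` and `interaction_design` are hypothetical helper names, genotypes are assumed to be stored as allele counts 0/1/2, and the covariates (sex, smoking, DR alleles) are left out; they would simply be appended as extra columns.

```python
import numpy as np

def code_genotype(g):
    """Map allele counts 0/1/2 to the additive (x) and dominance (z) dummies."""
    g = np.asarray(g)
    x = g - 1.0                        # homozygotes -> -1 or 1, heterozygote -> 0
    z = np.where(g == 1, 0.5, -0.5)    # heterozygote -> 0.5, homozygotes -> -0.5
    return x, z

def interaction_design(g1, g2, additive_only=False):
    """Return (term names, design matrix without intercept) for a SNP pair."""
    x1, z1 = code_genotype(g1)
    x2, z2 = code_genotype(g2)
    if additive_only:
        # Additive effects-only interaction model: a1, a2, and i_aa.
        cols = {"a1": x1, "a2": x2, "i_aa": x1 * x2}
    else:
        # Full model: four main effects and four interaction terms.
        cols = {"a1": x1, "d1": z1, "a2": x2, "d2": z2,
                "i_aa": x1 * x2, "i_ad": x1 * z2,
                "i_da": z1 * x2, "i_dd": z1 * z2}
    names = list(cols)
    return names, np.column_stack([cols[n] for n in names])

# All nine two-locus genotype combinations, one row each.
g1 = np.repeat([0, 1, 2], 3)
g2 = np.tile([0, 1, 2], 3)
names_full, X_full = interaction_design(g1, g2)
names_add, X_add = interaction_design(g1, g2, additive_only=True)

# With an intercept, the full model has nine parameters for nine genotype cells.
rank_full = np.linalg.matrix_rank(np.column_stack([np.ones(9), X_full]))
print(names_full, X_full.shape, rank_full)
```

The rank check shows that the full model is saturated over the nine two-locus genotype cells, which is one reason sparse cells can make individual interaction tests unstable.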
SNPs were selected in the first stage at marginal significance thresholds of 0.1 and 0.05 for the simultaneous design, and at the Bonferroni-adjusted threshold for the conditional method. In the second stage, the p-values of the four interaction terms i_aa, i_ad, i_da, and i_dd were considered in the full model [Eq. (3)], and the p-value of the interaction coefficient i_aa was used in the additive-only model [Eq. (4)]. Bonferroni correction was applied using the total number of valid interaction term tests, where valid tests are those for which quasi-separation does not occur when fitting the logistic regression model. We also used an interaction model in which the additive effects are considered without covariates (results not shown); in this case, the p-value of the single interaction term (i_aa) was used.
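The selection and correction steps can be sketched as follows. Several pieces are assumptions rather than details from the source: the helper names are hypothetical, the family-wise α of 0.05 is assumed, and the validity check is a stand-in for the quasi-separation warning that the fitting software would report. The check flags a SNP pair when any of its nine genotype cells is empty or contains only cases or only controls, which is the situation in which the saturated full model without covariates cannot produce finite estimates.

```python
from itertools import combinations, product
import numpy as np

def select_snps(marginal_p, threshold):
    """Indices of SNPs whose first-stage p-value is at or below the threshold."""
    return [j for j, p in enumerate(marginal_p) if p <= threshold]

def pair_is_valid(g1, g2, y):
    """False if any two-locus genotype cell is empty or holds a single class."""
    for a in range(3):
        for b in range(3):
            cell = y[(g1 == a) & (g2 == b)]
            if cell.size == 0 or cell.min() == cell.max():
                return False
    return True

def bonferroni_threshold(n_valid_pairs, terms_per_pair, alpha=0.05):
    """Per-test threshold: alpha divided by the number of valid interaction term tests."""
    return alpha / (n_valid_pairs * terms_per_pair)

# Toy data (assumption): 4 SNPs, every genotype combination once as a control
# and once as a case, so all cells start with both classes present.
combos = np.array(list(product(range(3), repeat=4)))
G = np.vstack([combos, combos])
y = np.concatenate([np.zeros(81, dtype=int), np.ones(81, dtype=int)])
# Force one cell of the SNP pair (0, 2) to contain cases only.
y[(G[:, 0] == 2) & (G[:, 2] == 2)] = 1

marginal_p = [0.001, 0.30, 0.04, 0.08]        # hypothetical first-stage p-values
selected = select_snps(marginal_p, 0.1)       # screening threshold of 0.1
valid_pairs = [(j, k) for j, k in combinations(selected, 2)
               if pair_is_valid(G[:, j], G[:, k], y)]
threshold_full = bonferroni_threshold(len(valid_pairs), terms_per_pair=4)
print(selected, valid_pairs, threshold_full)
```

For the additive-only model, `terms_per_pair` would be 1, since only i_aa is tested for each pair.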