One hundred replicates were studied at the disease susceptibility Locus B, which controls the effect of smoking on rheumatoid arthritis risk. In each replicate, 1500 affected sib pairs were considered for the MIT, 1500 case-parent trios for the log-linear method, 1500 cases and 1500 controls for the case-control design, and only the 1500 cases for the case-only test. We also studied smaller sample sizes (500 trios, and 750 cases with 750 controls) in order to compare the three association methods for the same number of genotyped individuals (1500). Cases were the first affected sib in each sib pair, and controls were the first 1500 control subjects among the 2000 available for each replicate.
Because none of the single-nucleotide polymorphisms (SNPs) close to Locus B were in linkage disequilibrium with this locus, we used the genotypes of all individuals at that locus for the association tests and the exact identity-by-descent (IBD) status provided in the Problem 3 "answers" for the linkage test. For the exposure, we considered lifetime smoking status and did not account for the risk increase mediated indirectly by the effect of smoking on IgM.
The following four methods were compared.
Mean interaction test
The MIT developed by Gauderman and Siegmund [5] is an extension of the mean sharing test [8] to account for G × E interaction. It compares the proportion of alleles shared IBD, π, which is expected to equal 0.5 under the null hypothesis of no linkage, across the three groups of affected sib pairs differing in the number of exposed sibs (2, 1, or 0). The following regression model is used: $\pi_i = \pi + \beta(X_i - \bar{X}) + \varepsilon_i$, where π is the intercept and β the regression coefficient for the exposure, with $X_i$ the exposure covariate centered on its mean $\bar{X}$. We conducted the analysis using the coding scheme consisting of two variables ($X_{EE}$ and $X_{EU}$) contrasting sib pairs with 2, 1, or 0 exposed sibs. The null hypothesis of no linkage is tested by the likelihood ratio test (LRT): $T_{\pi\beta} = 2[\ln\{L(\pi = 0.5, \beta = 0)\} - \ln\{L(\pi, \beta)\}]$, which follows a 50:50 mixture of χ2 distributions with two and three degrees of freedom (df). The alternative hypothesis corresponds to linkage with or without G × E interaction.
In its original presentation, the MIT accounts for G × E interaction in the search for linkage but does not test for G × E interaction itself. We therefore developed an LRT for G × E interaction: $T_{\beta} = 2[\ln\{L(\pi, \beta = 0)\} - \ln\{L(\pi, \beta)\}]$. This statistic follows a χ2 distribution with 2 df.
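As an illustration of how the two reference distributions above translate into p-values, the following minimal sketch uses closed-form χ2 tail probabilities. The function names and the log-likelihood values are hypothetical and are not taken from the analysis; the statistics are computed here as twice the log-likelihood gain of the larger model over the restricted one, so that they are non-negative.

```python
import math

def chi2_sf_2df(t):
    """Upper-tail probability of a chi-square with 2 df (closed form)."""
    return math.exp(-max(t, 0.0) / 2.0)

def chi2_sf_3df(t):
    """Upper-tail probability of a chi-square with 3 df (closed form)."""
    if t <= 0.0:
        return 1.0
    return (math.erfc(math.sqrt(t / 2.0))
            + math.sqrt(2.0 * t / math.pi) * math.exp(-t / 2.0))

def mit_linkage_pvalue(stat):
    """p-value under a 50:50 mixture of chi-square(2 df) and chi-square(3 df)."""
    return 0.5 * chi2_sf_2df(stat) + 0.5 * chi2_sf_3df(stat)

def mit_interaction_pvalue(stat):
    """p-value of the interaction test against a chi-square with 2 df."""
    return chi2_sf_2df(stat)

# Hypothetical maximized log-likelihoods (illustrative values only).
loglik_null = -1045.2            # pi = 0.5, beta = 0
loglik_no_interaction = -1041.0  # pi free, beta = 0
loglik_full = -1036.9            # pi and beta free

t_linkage = 2.0 * (loglik_full - loglik_null)
t_interaction = 2.0 * (loglik_full - loglik_no_interaction)

print("linkage p-value:", mit_linkage_pvalue(t_linkage))
print("interaction p-value:", mit_interaction_pvalue(t_interaction))
```

Only the standard library is used; the closed forms hold for 2 and 3 df specifically and would have to be replaced by a general χ2 routine for other df.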
Log-linear-modeling approach for case-parent triads
Proposed by Umbach and Weinberg [6], this method compares the conditional genotype distribution of exposed cases, given parental genotypes, with that of unexposed cases. Briefly, case-parent triads are divided into 20 categories based on the parental genotypes, the genotype of the case, and the exposure status of the case. The expected number of triads in each category can be expressed according to a log-linear model [3, 6]. LRTs are performed to test for 1) a gene effect ignoring G × E interaction (2 df χ2), 2) a gene effect accounting for G × E interaction (4 df χ2), and 3) a G × E interaction (2 df χ2). Because the true model was dominant, the fit of a dominant model to the data is also tested.
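The count of 20 categories can be recovered by enumeration. The sketch below assumes a diallelic locus, genotypes coded as the number of copies of the variant allele, and parents treated as an unordered pair; these are assumptions made for the illustration, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement

def transmissible_alleles(genotype):
    """Alleles (0 or 1 variant copy) that a parent with this genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]

def compatible_child_genotypes(parent1, parent2):
    """Child genotypes that Mendelian transmission allows for this parental pair."""
    return sorted({a + b
                   for a in transmissible_alleles(parent1)
                   for b in transmissible_alleles(parent2)})

def triad_categories():
    """All (parental pair, child genotype, child exposure) categories."""
    categories = []
    # Unordered parental mating types: (0,0), (0,1), (0,2), (1,1), (1,2), (2,2).
    for parent1, parent2 in combinations_with_replacement((0, 1, 2), 2):
        for child in compatible_child_genotypes(parent1, parent2):
            for exposed in (False, True):
                categories.append(((parent1, parent2), child, exposed))
    return categories

print(len(triad_categories()))  # 10 genotype configurations x 2 exposure levels
```

Under these assumptions the six mating types give 10 Mendelian-compatible parent-child configurations, each split by the exposure status of the case.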
Case-control design
Case-control designs have been widely used to compare the risk of developing a disease according to subjects' genotype and exposure status [4]. Odds ratios (ORs) associated with the exposure, the genotypes, and their interaction terms are estimated and tested for significance. Three LRTs are performed: a 2 df χ2 test for the genetic effect alone, a 4 df χ2 test for the genetic effect accounting for G × E interaction, and a 2 df χ2 test of G × E interaction. The fit of a dominant model to the data is tested using a 2 df LRT.
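To make the interaction OR concrete, the following simplified sketch computes stratum-specific ORs and their ratio from 2 × 2 counts. It assumes a dominant carrier/non-carrier coding and crude (unadjusted) counts, which is a simplification of the genotype-based LRTs described above; all counts and names are hypothetical.

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Crude odds ratio of disease for carriers versus non-carriers."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)

# Hypothetical counts, stratified by exposure status (illustrative values only).
exposed = dict(case_carriers=300, case_noncarriers=200,
               control_carriers=150, control_noncarriers=350)
unexposed = dict(case_carriers=100, case_noncarriers=150,
                 control_carriers=100, control_noncarriers=150)

or_exposed = odds_ratio(**exposed)
or_unexposed = odds_ratio(**unexposed)

# Ratio of stratum-specific ORs: equals 1 when the joint effect of genotype
# and exposure is exactly multiplicative.
interaction_or = or_exposed / or_unexposed

print(or_exposed, or_unexposed, interaction_or)
```

A ratio different from 1 indicates departure from multiplicative joint effects; testing it formally would require a fitted model such as the logistic regression implied by the LRTs above.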
Case-only design
Case-only studies [4, 7] test the interaction between an exposure and a genotype among case subjects only. This design assesses departure from multiplicative joint effects, under the assumption that genotype and exposure are independent in the population. To test for the interaction, a 2 df LRT of homogeneity between the genotype distributions of exposed and unexposed cases is performed.
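A likelihood ratio test of homogeneity on a 2 × 3 table (exposure by genotype) can be sketched as a G-test. The counts and the function name below are hypothetical, and the closed-form p-value is valid only for the 2 df case considered here.

```python
import math

def case_only_g_test(exposed_counts, unexposed_counts):
    """G statistic and p-value for homogeneity of three genotype counts
    between exposed and unexposed cases (2 x 3 table, 2 df)."""
    rows = [list(exposed_counts), list(unexposed_counts)]
    if any(len(row) != 3 for row in rows):
        raise ValueError("expected three genotype counts per exposure group")
    total = sum(sum(row) for row in rows)
    column_totals = [sum(column) for column in zip(*rows)]
    g_stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, column_total in zip(row, column_totals):
            if observed > 0:  # empty cells contribute nothing to G
                expected = row_total * column_total / total
                g_stat += 2.0 * observed * math.log(observed / expected)
    p_value = math.exp(-max(g_stat, 0.0) / 2.0)  # chi-square tail for 2 df
    return g_stat, p_value

# Hypothetical genotype counts among cases (illustrative values only).
g_stat, p_value = case_only_g_test([200, 200, 100], [100, 250, 150])
print(g_stat, p_value)
```

Identical genotype distributions in the two exposure groups give a statistic of zero, whereas the hypothetical counts above yield a statistic well beyond the 2 df critical value of 5.99.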
The power of each test was estimated as the proportion of the 100 replicates in which the test was significant at a nominal type I error rate of 0.05. Type I error rates for the tests of G × E interaction were estimated at the seven loci (A, C-H) that are not supposed to interact with lifetime smoking status.
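The empirical estimate described above amounts to a rejection proportion across replicates. The following minimal sketch also reports a binomial standard error, which is an addition for illustration and not part of the analysis; the p-values and the function name are hypothetical.

```python
import math

def empirical_rejection_rate(p_values, alpha=0.05):
    """Proportion of replicates with p < alpha, with its binomial standard error."""
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one replicate is required")
    rate = sum(1 for p in p_values if p < alpha) / n
    standard_error = math.sqrt(rate * (1.0 - rate) / n)
    return rate, standard_error

# Hypothetical p-values from four replicates (illustrative values only).
rate, standard_error = empirical_rejection_rate([0.01, 0.20, 0.04, 0.50])
print(rate, standard_error)
```

Applied to replicates simulated under the alternative, the proportion estimates power; applied to the null loci, the same function estimates the type I error rate.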