A bivariate counting process model for genetic association of RA severity
Methods for analyzing data on events observed over time have been of considerable interest in recent years. A counting process framework [5, 6] models count data, collected at fixed points of ascertainment, by taking duration of disease into consideration (as is the case in a cross-sectional design, provided that the time of disease onset is available). Under the assumption that the numbers of tender and swollen joints do not decrease through time, we have a recurrent event process.
Let {N
ij
(t), t ≥ 0} represent the underlying counting process in which N
ij
(t) denotes the number of events experienced by the jth process of the ith individual over the continuous time interval (0, t], where i = 1,..., n. Note that j = 1, 2 represents the tender and swollen joint counting processes, respectively. Because it is reasonable to specify a model that depends only on the current covariate values rather than on the entire history process in our setting, we formulated a non-homogeneous Poisson process. To handle any substantial inter-individual variation in the model, we introduced a positive individual-specific random effect u
i
that is shared for both processes of the ith subject. Conditional on u
i
, the counts of the ith subject are independent. To be more specific, we assumed that conditional on u
i
, the counting process {N
ij
(t), t ≥ 0} is a non-homogeneous Poisson process with intensity and mean functions represented as
λ
ij
(t|u
i
) = u
i
λ0j(t) exp{x
ij
Tβ
j
} (1)
and
Λ
ij
(t|u
i
) = u
i
Λ0j(t) exp{x
ij
Tβ
j
}, (2)
respectively. The time line is defined as the time since onset of RA. The form of the intensity function is a relative risk or Cox model, with the baseline intensity function for the jth process λ0j(t) common among all subjects. The covariate vector x
ij
, specific to the ith subject, may include candidate polymorphisms (such as DRB1), single SNPs selected through genome-wide linkage scans, or a vector of SNPs chosen to capture variation in a candidate gene or candidate region (such as PTPN22); it may also include characteristics such as sex and smoking history, or any combination of reasonable interactions. The corresponding regression parameter vector β
j
describes the association of genotype with phenotype for the jth process. Furthermore, each individual has their own frailty u
i
acting multiplicatively on the intensity function.
Unless the variance of the underlying mixing distribution is exceptionally large, the gamma random effect provides a robust approach for modeling mixed-Poisson processes [7]. Assuming that u1,..., u
n
are independent and identically distributed random variables arising from a gamma distribution with mean of 1 and variance of φ, we obtain the expectation E[N
ij
(t)] = E (E[N
ij
(t)|u
i
]) = Λ0j(t) exp{x
ij
Tβ
j
}, which represents the mean number of counts for the jth process of the ith individual over the interval (0, t]. The parameter φ is a measure of heterogeneity between individuals that may not have been sufficiently accounted for by the Poisson model alone. Larger values of φ imply extra-Poisson variation in the model.
We now need to construct a likelihood function based on our bivariate mixed-Poisson process assumptions. Suppose the ith individual is ascertained at time τ
i
, which is relative to the time of onset. Conditional on the random effect u
i
, the distribution of the counts P(N
ij
(τ
i
) = n
ij
|u
i
) has a Poisson form. Moreover, under conditional independence, the bivariate distribution of the counts can be computed as P(ni1, ni2) = ∫ P(ni1, ni2|u
i
) dG(u
i
) = ∫ P(ni1|u
i
) P(ni2|u
i
) g(u
i
) du
i
, which has a convenient closed form expression due to the gamma-Poisson mixture. Thus the log-likelihood log L(θ) = Σ log P(ni1, ni2) is maximized with respect to θ, which consists of parameters β1, β2, φ, and parameters of the baseline intensity functions. We applied a Newton-Raphson technique, but any non-linear maximization algorithm may be used for this purpose. Furthermore, when maximizing the log-likelihood under the counting process framework, it is most convenient to assume a parametric form such as a Weibull model for the baseline intensity functions, although semi-parametric assumptions using piecewise constant models are also reasonable alternatives.
In the case of a univariate counting process model, there is no joint distribution formulated among the counts for each individual. Rather, the log-likelihood is simply L(θ) = ΣΣ log P(n
ij
), where P(n
ij
) = ∫ P(n
ij
|u
i
) dG(u
i
) = ∫ P(n
ij
|u
i
) g(u
i
) du
i
.