In this study, we proposed a method to integrate genetic data and gene expression data based on the gene set approach. We defined a 'gene set' as a set of genes in the same biological pathway and focused on identifying common regulators, on the assumption that genes within the same pathway are controlled, directly or indirectly, by common regulators. However, genes further down the pathway cascade are not easily detected by a classical linkage analysis that uses a strict threshold to correct for multiple comparisons, even though these genes are controlled by common regulators.
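The point about strict thresholds can be illustrated with a short worked example. This is a minimal sketch that assumes a Bonferroni correction (the text does not say which correction the classical analysis used), and the number of tests and the p-value below are hypothetical values chosen only for illustration, not results from this study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if p_value survives the Bonferroni-corrected cutoff."""
    return p_value <= bonferroni_threshold(alpha, n_tests)


# Hypothetical values for illustration only (not taken from the study):
# a downstream gene with a modest but genuine linkage signal.
N_TESTS = 10_000
P_DOWNSTREAM = 1e-4

print(P_DOWNSTREAM <= 0.05)                         # passes a nominal 0.05 cutoff
print(is_significant(P_DOWNSTREAM, 0.05, N_TESTS))  # fails the corrected cutoff of 0.05 / 10,000
```

Under these assumed numbers the gene's signal is nominally significant but is discarded after correction, which is the kind of modest, pathway-wide effect the gene set approach is meant to recover.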
We therefore proposed a two-step procedure for detecting pathway regulators. In the first step, we applied the PIA to obtain p-values from a multipoint linkage analysis; the PIA allowed us to select markers associated with subtle overall changes in gene expression levels. In the second step, once peaks were identified, the peaks of each gene were compared with those of the other genes in the same pathway. To detect peaks shared by genes within the same pathway at a specific marker, we performed Fisher's exact test, which identified one strong candidate regulator of the inflammatory response pathway.
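The text does not spell out the 2×2 table behind the Fisher's exact test. A minimal sketch of the second step might look like the code below, assuming that for each marker all tested genes are cross-classified by pathway membership and by whether they show a linkage peak at that marker, and that a one-sided (enrichment) test is used. The function names, marker labels and gene labels are hypothetical, and the PIA peak-calling step is not reproduced: its output is simply taken as a mapping from markers to genes with peaks.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value for enrichment in a 2x2 table.

    Assumed table layout:
                        peak at marker   no peak
        in pathway            a             b
        not in pathway        c             d

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b       # genes in the pathway
    col1 = a + c       # genes with a peak at this marker
    n = a + b + c + d  # all genes tested
    denom = comb(n, col1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed
    numer = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return numer / denom


def scan_markers(peaks: dict[str, set[str]], pathway: set[str],
                 all_genes: set[str]) -> dict[str, float]:
    """Test every marker for an excess of pathway genes sharing a peak there.

    peaks maps a (hypothetical) marker ID to the set of genes with a linkage
    peak at that marker; pathway and peak sets are assumed to be subsets of all_genes.
    """
    results = {}
    for marker, genes_with_peak in peaks.items():
        a = len(genes_with_peak & pathway)
        b = len(pathway) - a
        c = len(genes_with_peak - pathway)
        d = len(all_genes) - a - b - c
        results[marker] = fisher_exact_greater(a, b, c, d)
    return results


# Toy usage with made-up genes and markers
pathway = {"g1", "g2", "g3"}
all_genes = {"g1", "g2", "g3", "g4", "g5", "g6"}
peaks = {"m1": {"g1", "g2", "g3"}, "m2": {"g1", "g4", "g5"}}
pvals = scan_markers(peaks, pathway, all_genes)
print(min(pvals, key=pvals.get), pvals)
```

The hand-rolled hypergeometric sum keeps the sketch dependency-free; `scipy.stats.fisher_exact` with `alternative='greater'` on the table `[[a, b], [c, d]]` should return the same one-sided p-value. Whether the original analysis used a one-sided or two-sided test is not stated in the text.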
When we applied the gene set approach to the GAW 15 Problem 1 data, its results differed markedly from those of the classical eQTL approach. First, the classical eQTL approach detected significant eQTLs for only 3 of the 23 genes. Moreover, these 3 eQTLs were not linked to one another, so we could not determine whether these genes are regulated by the same regulators. In contrast, the gene set approach identified a significant common regulator of gene expression in the 'inflammatory response pathway'. The most significant marker, rs766737, lies within 2 kb of an mRNA transcript for the T cell receptor alpha locus (TRA@) on 14q11.2.
The T cell receptor (TCR) is a molecule on the surface of T lymphocytes that recognizes antigens bound to major histocompatibility complex (MHC) molecules. On recognizing a foreign antigen, the TCR relays a signal to the nucleus that induces an inflammatory response [12]. The body produces a large repertoire of T cells, each carrying a specific TCR generated by recombination of the genes that encode the receptor, before these cells encounter their complementary antigens. Given this central role of the TCR in initiating inflammation, the gene set approach suggests that genetic variation at the TCR alpha locus affects the expression levels of genes in the inflammatory response pathway.
Our method has two advantages over the classical eQTL approach. First, it allows more functional inferences to be drawn from the results on the basis of the biological pathway: using prior pathway knowledge, we were able to identify regulators of the whole pathway rather than of individual genes. Second, it can detect relatively small but global effects on genes that interact within the same pathway.
The proposed method can be extended further. For example, the current method does not account for linkage disequilibrium (LD) block structure. We expect that a test for significant regions using flanking markers could improve the power to detect common regulators.