Armitage's trend test for genome-wide association analysis: one-sided or two-sided?

The importance of considering confounding due to population stratification in genome-wide association analysis using case-control designs has been a source of debate. Armitage's trend test, together with some other methods developed from it, can correct for population stratification to some extent. However, there is a question of whether the one-sided or the two-sided alternative hypothesis is appropriate or, to put it another way, whether examining both the one-sided and the two-sided alternative hypotheses gives more information. The dataset for Problem 1 of Genetic Analysis Workshop 16 provides a chance to address this question. Because it is part of a combined sample from the North American Rheumatoid Arthritis Consortium (NARAC) and the Swedish Epidemiological Investigation of Rheumatoid Arthritis (EIRA), the results from the combined sample can be used as references. To this end, the last 10,000 single-nucleotide polymorphisms (SNPs) on chromosome 9, which include the common genetic variant at the TRAF1-C5 locus, were examined with Armitage's trend tests. Examining the two-sided alternative hypothesis shows that SNPs rs12380341 (p = 9.7 × 10^-11) and rs872863 (p = 1.7 × 10^-15), along with six SNPs across the TRAF1-C5 locus, rs1953126, rs10985073, rs881375, rs3761847, rs10760130, and rs2900180 (p ~ 1 × 10^-7), are significantly associated with anti-cyclic citrullinated peptide-positive rheumatoid arthritis. However, examining the one-sided alternative hypothesis that the minor allele is positively associated with the disease shows that only those six SNPs across the TRAF1-C5 locus are significantly associated with the disease (p ~ 1 × 10^-8), which is consistent with the results from the combined sample of the NARAC and the EIRA.


Background
The Genetic Analysis Workshop 16 (GAW16) rheumatoid arthritis (RA) dataset is the initial batch of genome-wide association study (GWAS) data for the North American Rheumatoid Arthritis Consortium (NARAC) cases (N_1 = 868) and controls (N_0 = 1194) after removing duplicated and contaminated samples [1]. The scale of the high-throughput genotyping in the NARAC data (~550k single-nucleotide polymorphisms (SNPs)) makes this GWAS challenging to interpret.
One disadvantage of case-control GWAS is that they are prone to a number of biases, including population stratification [2]. The importance of considering confounding due to population stratification in GWAS using case-control designs [3,4] has been a source of debate. Armitage's trend test can correct for population stratification to some extent [5-7], and other methods based on it have also been developed, such as the genomic control approach [8,9]. However, there is still a question as to whether the one-sided or the two-sided alternative hypothesis is appropriate or, to put it another way, whether examining both the one-sided and the two-sided alternative hypotheses gives more information. The dataset for Problem 1 of GAW16 provides a chance to address this question. Because it is part of a combined sample from the NARAC and the Epidemiological Investigation of Rheumatoid Arthritis (EIRA), the results from the combined sample can be used as references.
To this end, the last 10,000 SNPs on chromosome 9, which include the common genetic variant at the TRAF1-C5 locus, were examined with Armitage's trend tests. Two alternative hypotheses were considered: the two-sided alternative that the genotypes at a locus are associated with the disease, and the one-sided alternative that the minor allele at a locus is positively associated with the disease. Three types of scores (co-dominant, dominant, and recessive) were chosen to construct the Armitage's trend tests.

Methods
At any SNP, the data can be summarized in a contingency table as in Table 1. We always assume that "M" is the major allele and "m" is the minor allele. Scores x_0, x_1, and x_2 are assigned to the genotypes MM, Mm, and mm, respectively, to construct Armitage's trend test. Armitage's trend test statistic, X_A^2, is defined as in [5,6].
Under the null hypothesis, X_A^2 approximately follows a χ² distribution with 1 degree of freedom. This test statistic is suitable for the two-sided alternative hypothesis that the genotypes at a SNP are associated with the disease of interest. As discussed in Armitage [5], the validity of the test X_A^2 is not affected by the scoring system chosen, but the choice of scoring system affects the power of the test. There are three common choices of scoring system: 1) co-dominant score: x_0 = 0, x_1 = 1, and x_2 = 2; 2) dominant score: x_0 = 0, x_1 = 1, and x_2 = 1; 3) recessive score: x_0 = 0, x_1 = 0, and x_2 = 1. Here, the scoring systems are named with respect to the minor allele "m".
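As a rough illustration of how the three scoring systems enter the two-sided test, the sketch below uses the standard textbook form of the Cochran-Armitage trend statistic. The notation is an assumption, since Table 1 is not reproduced here: r_i and s_i are the case and control counts for genotypes (MM, Mm, mm), n_i = r_i + s_i, and R, S, and N are the numbers of cases, controls, and all subjects. The genotype counts are hypothetical and are not taken from the NARAC data.

```python
import math

# Scores (x_0, x_1, x_2) for genotypes (MM, Mm, mm), as listed in the text.
SCORES = {
    "co-dominant": (0, 1, 2),
    "dominant": (0, 1, 1),
    "recessive": (0, 0, 1),
}


def trend_chi2(cases, controls, scores):
    """Return (X_A^2, two-sided p) for genotype counts ordered (MM, Mm, mm)."""
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    sx_case = sum(x * r for x, r in zip(scores, cases))
    sx_all = sum(x * n for x, n in zip(scores, totals))
    sxx_all = sum(x * x * n for x, n in zip(scores, totals))
    # The denominator is zero when every subject has the same score;
    # such SNPs should be skipped by the caller.
    chi2 = N * (N * sx_case - R * sx_all) ** 2 / (R * S * (N * sxx_all - sx_all ** 2))
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical counts for illustration only.
cases = (100, 80, 20)
controls = (120, 70, 10)
results = {name: trend_chi2(cases, controls, x) for name, x in SCORES.items()}
for name, (chi2, p) in results.items():
    print(f"{name:12s} X_A^2 = {chi2:.3f}  p = {p:.4f}")
```

With the dominant or recessive scores the statistic reduces to the ordinary Pearson chi-square on the corresponding 2 x 2 table, which is one way to sanity-check an implementation.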
From the rationale of genetic association analysis (see, for example, Risch and Merikangas [10]), it is more informative to look at two one-sided alternative hypotheses: i) the alternative that the minor allele is positively associated with the disease and ii) the alternative that the major allele is positively associated with the disease. Furthermore, because the disease of interest is rare, it is more reasonable to concentrate on the first alternative, although in practice it would be better to consider both alternatives when no prior information is available on which allele is positively associated with the disease. Another reason is that concentrating on a single one-sided alternative can reduce the false-positive rate.
Hereafter, we concentrate on the alternative hypothesis that the minor allele is positively associated with the disease. To this end, a one-sided test statistic Z_A can be defined as the signed counterpart of X_A^2 (so that Z_A^2 = X_A^2), with large positive values supporting this alternative. Under the null hypothesis, Z_A approximately follows the N(0,1) distribution. The same three scoring systems can also be used here. It is shown in Knapp [11] that if the co-dominant scoring system is chosen, then Z_A = Z/√(1 + F), where F is Wright's coefficient of inbreeding and Z is the test statistic that simply compares the frequencies of the minor allele "m" in the case and control groups. Here, F automatically corrects for population stratification to some extent.
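The relation between the co-dominant one-sided trend statistic and the allele-frequency statistic Z, namely Z_A = Z/√(1 + F), can be checked numerically. The sketch below rests on several assumptions that are not stated in the text: Z_A is taken as the signed square root of the standard trend statistic, Z uses the pooled allele frequency in its variance, and F is estimated from the pooled case-control genotype counts. The counts are hypothetical.

```python
import math


def one_sided_trend(cases, controls, scores=(0, 1, 2)):
    """Return (Z_A, p) for the alternative that higher-scored genotypes are enriched in cases."""
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    sx_case = sum(x * r for x, r in zip(scores, cases))
    sx_all = sum(x * n for x, n in zip(scores, totals))
    sxx_all = sum(x * x * n for x, n in zip(scores, totals))
    z_a = (math.sqrt(N) * (N * sx_case - R * sx_all)
           / math.sqrt(R * S * (N * sxx_all - sx_all ** 2)))
    p = 0.5 * math.erfc(z_a / math.sqrt(2.0))  # P(N(0,1) > z_a)
    return z_a, p


def allele_z(cases, controls):
    """Z comparing minor-allele frequencies in cases and controls (pooled variance)."""
    R, S = sum(cases), sum(controls)
    N = R + S
    a = cases[1] + 2 * cases[2]              # minor alleles among cases
    m = a + controls[1] + 2 * controls[2]    # minor alleles in the pooled sample
    return math.sqrt(2 * N) * (N * a - R * m) / math.sqrt(R * S * m * (2 * N - m))


def inbreeding_f(cases, controls):
    """Pooled-sample estimate of Wright's F (an assumed estimator)."""
    n0, n1, n2 = (r + s for r, s in zip(cases, controls))
    return (4 * n0 * n2 - n1 ** 2) / ((2 * n0 + n1) * (n1 + 2 * n2))


# Hypothetical counts ordered (MM, Mm, mm); not taken from the NARAC data.
cases = (100, 80, 20)
controls = (120, 70, 10)
z_a, p_minor = one_sided_trend(cases, controls)
z = allele_z(cases, controls)
f = inbreeding_f(cases, controls)
print(f"Z_A = {z_a:.4f}, Z / sqrt(1 + F) = {z / math.sqrt(1 + f):.4f}, F = {f:.4f}")
```

Under these definitions the two quantities agree exactly, so when F is close to zero the trend test and the simple allele-frequency comparison give nearly the same answer.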

Results
For simplicity of interpretation, we only consider the last 10,000 SNPs on chromosome 9, which include the common genetic variant at the TRAF1-C5 locus. The same analysis can be extended to the whole genome of approximately 550,000 SNPs. For the two-sided alternative that the genotypes at a SNP are associated with the disease, Table 2 summarizes the LOD scores (-log10 p) of the test Z^2, which simply compares the frequencies of the minor allele in both groups, and of the Armitage's tests X_A1^2 with co-dominant score, X_A2^2 with dominant score, and X_A3^2 with recessive score, together with Wright's coefficient of inbreeding F; only those SNPs with LOD > 6 are reported. The SNPs across the TRAF1-C5 locus are marked with asterisks.
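The reporting rule used for the tables (LOD = -log10 p, keep SNPs with LOD > 6) can be sketched as follows. The two rs-numbered p-values are the two-sided values quoted in the abstract; `snp_example` and the helper names `lod` and `report` are hypothetical.

```python
import math


def lod(p):
    """-log10 of a p-value; p must be strictly positive."""
    return -math.log10(p)


def report(p_values, threshold=6.0):
    """Keep the SNPs whose LOD score exceeds the threshold."""
    return {snp: round(lod(p), 2) for snp, p in p_values.items() if lod(p) > threshold}


p_values = {
    "rs12380341": 9.7e-11,  # two-sided p quoted in the abstract
    "rs872863": 1.7e-15,    # two-sided p quoted in the abstract
    "snp_example": 1e-3,    # hypothetical SNP, for contrast
}
reported = report(p_values)
print(reported)
```

A LOD threshold of 6 corresponds to p < 10^-6, so the hypothetical SNP with p = 10^-3 is dropped.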
In Table 2, the six SNPs marked with asterisks have small F (<0.03), which explains why their X_A1^2 values in the third column, which correct for population stratification, are almost the same as Z^2 in the second column. Also, for these six SNPs, X_A1^2 is slightly more significant than X_A2^2 and X_A3^2, and the latter two are close to each other, which suggests that these SNPs are very likely co-dominant. For the other seven SNPs, X_A3^2 is slightly more significant than X_A1^2, but X_A2^2 is not significant at all. This suggests that these SNPs are very likely recessive.
Table 2 also shows that two SNPs, rs12380341 and rs872863, have extremely large LOD scores for Z^2, X_A1^2, and X_A3^2, but surprisingly they were not reported by Plenge et al. [1], whose analysis was based on the combined sample from the NARAC and the EIRA. Are these two SNPs truly associated with the disease, or are they just false positives? Table 3 summarizes the LOD values for the one-sided alternative that the minor allele at a SNP is positively associated with the disease. As before, Z_A1 is the statistic Z_A with co-dominant score, Z_A2 with dominant score, and Z_A3 with recessive score.
In Table 3, only the six SNPs marked with asterisks are significant for the one-sided alternative that the minor allele is positively associated with the disease. These results are completely consistent with those in Plenge et al. [1]. When the other one-sided alternative is considered, namely that the major allele is positively associated with the disease, the other seven SNPs are significant. Therefore, as discussed in the preceding section, and particularly for this dataset, it seems more reasonable to consider the one-sided alternative that the minor allele is positively associated with the disease.
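The way a two-sided signal splits between the two one-sided alternatives can be illustrated with a short sketch. It assumes a trend statistic that is N(0,1) under the null hypothesis and positive when the minor allele is enriched in cases; the Z values are hypothetical and are not the entries of Tables 2 and 3.

```python
import math


def tail_p_values(z):
    """p-values for the three alternatives, given an N(0,1) trend statistic z."""
    p_minor = 0.5 * math.erfc(z / math.sqrt(2.0))   # minor allele positively associated
    p_major = 0.5 * math.erfc(-z / math.sqrt(2.0))  # major allele positively associated
    p_two = math.erfc(abs(z) / math.sqrt(2.0))      # any association
    return p_minor, p_major, p_two


# Hypothetical statistics: one strongly positive, one strongly negative.
results = {z: tail_p_values(z) for z in (5.5, -6.5)}
for z, (p_minor, p_major, p_two) in results.items():
    print(f"z = {z:+.1f}  p_minor = {p_minor:.2e}  p_major = {p_major:.2e}  p_two = {p_two:.2e}")
```

A SNP with a large negative statistic is highly significant under the two-sided alternative and under the major-allele alternative, but not under the minor-allele alternative, which is the pattern described above for the seven SNPs without asterisks.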

Discussion
The question of whether the two-sided alternative or the one-sided alternatives should be considered is difficult to resolve, but this manuscript attempts to raise the question and address it to some extent. Table 3 shows that if we concentrate on the one-sided alternative that the minor allele is positively associated with the disease, we get exactly the same results as Plenge et al. [1]. For rare diseases, we have reason to believe that the alleles positively associated with them have low frequencies in the general population. Based on this belief (or alternative hypothesis), it seems that the SNPs without asterisks are false positives under the two-sided alternative.
However, if we are not willing to assume that the minor allele is positively associated with the disease and do not want to miss any SNPs related to the disease, it is better to consider the two-sided alternative.

Conclusion
More information can be gained from GWAS by using multiple scoring systems in Armitage's trend tests and by examining both the one-sided and the two-sided alternative hypotheses.