We propose a Bayesian logistic regression procedure to select important SNPs based on the *z* scores or the regression coefficient estimates for further analysis. From the simulation studies, when using a Gaussian prior, the percentage of causal SNPs correctly selected ranges from 64% to 83% among the top 20% SNPs. For the Laplace prior, the percentage of correctly identified causal SNPs ranges from 40% to 67%. The Gaussian prior outperforms Laplace prior, which could be attributable to a less stringent feature selection criterion employed for the Gaussian prior.

Among the top 300 SNPs selected by the *z* scores for the dominance model, three are significant after adjusting for multiple comparisons (see Table 2). For the additive model, five additional SNPs are significant after multiple comparisons adjustment. These SNPs lie in a region from 91,959,665 kb to 132,180,015 kb on chromosome 9 (LD plots not included due to space limitations). Three of the eight SNPs are in the region reported in Plenge et al. [5] (rs1953126, rs2900180, and rs942152), and two of them are in LD (rs1953126 and rs2900180). One of these SNPs, rs1953126, was reported in a study of 475 Caucasian patients [8] to be significantly associated with rheumatoid arthritis (odds ratio 1.28, CI 1.16-1.40, trend *p-*value = 1.45 × 10^{-6}). The other five SNPs are not in the candidate region and are not in LD with SNPs in the region. The significance of other SNPs deserves further investigation in an independent sample.

An alternative one-step approach would be reporting permutation *p*-values of Bayesian logistic regression with all SNPs on the whole sample. However, it is well known that increasing number of predictors, and therefore the number of parameters, in a multivariate analysis may reduce power. The two-step approach provides a balance between the need to reduce multiple comparisons and the loss of power due to increasing number of parameters.

We only analyzed SNPs with no missing data due to the incapability of handling missing covariates data of the BBRBMR software. One solution is to first impute the missing genotypes and then run the Bayesian regression on the imputed data. An alternative is to handle missing data directly in a Bayesian analysis by data augmentation.

Here the priors are assumed to be independent and their variances are assumed to be the same. We choose prior variance by cross-validation. An alternative strategy would be specifying a hyper-prior distribution (such as non-informative prior). To incorporate prior knowledge such as physical distance between the SNPs, one can specify prior distribution to have distance-based correlation. How to specify such a correlation for a large scale regression is worth further attention.