### BRSVD method

Let us consider the standard regression model in matrix form:

$$y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n), \tag{1}$$

where $y_{n \times 1}$ is a vector of quantitative dependent variables, $X_{n \times k}$ is the design matrix, $\beta_{k \times 1}$ is a vector of parameters to be estimated, $I_n$ is an $n \times n$ identity matrix, and $\sigma^2$ is an unknown variance; as before, $k$ and $n$ are the number of SNPs and the number of samples, respectively. By applying singular value decomposition (SVD) to the design matrix, $X' = ADF'$, the model in Eq. (1) can be written as:

$$y = FDA'\beta + \varepsilon = FD\gamma + \varepsilon, \qquad \gamma = A'\beta. \tag{2}$$

As in Kwon et al. [2], we call $\gamma$ a superfactor vector because it is expressed as a linear combination of the original parameters $\beta$. Statistical inference is carried out on the superfactor vector instead of on $\beta$. From Eq. (2), the likelihood function of $y$ given $(\gamma, \sigma^2)$ can be obtained as:

$$L(y \mid \gamma, \sigma^2) \propto (\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2}(y - FD\gamma)'(y - FD\gamma)\right\},$$

and the value $\hat{\gamma}$ that maximizes it is the maximum-likelihood (or least-squares) estimator of $\gamma$. Let us choose prior densities for $(\beta \mid \sigma^2)$ and $\sigma^2$ as:

where $IG$ is the inverted gamma distribution and $(\beta^*, m, a, b)$ are known hyperparameters. Because $\gamma = A'\beta$, the conjugate prior density on $\beta$ implies the conjugate prior density on $\gamma$ so that:

Thus the prior density on $(\gamma, \sigma^2)$ can be expressed as:

The joint posterior distribution for $(\gamma, \sigma^2)$ can be obtained by multiplying the likelihood function in Eq. (4) by the prior density in Eq. (9):

The marginal densities for $\gamma$ and $\sigma^2$ can be obtained by integrating Eq. (10) with respect to $\sigma^2$ and $\gamma$, respectively. Given the observed data, the marginal posterior density for $\gamma$ is a multivariate Student’s $t$ distribution in which each element follows a Student’s $t$ distribution with $(n + a)$ degrees of freedom, and the marginal density for $\sigma^2$ is:

With these posterior distributions, $\gamma$ can be estimated through a Markov chain Monte Carlo simulation with a Gibbs sampler, which starts from the maximum-likelihood estimate. To transform the superfactor vector $\gamma$ in Eq. (2) back to $\beta$, the original parameter vector of interest, we use the most general form of the solution to the linear equation $\gamma = A'\beta$ and obtain a unique solution for $\beta$ by choosing $A$ as the generalized inverse of $A'$ [3].

We use a permutation test to estimate the significance of the SNP effects on the phenotype. Let $\hat{\beta}_i$ be the estimate of the $i$th SNP effect from the raw data, and let $\hat{\beta}_i^{(j)}$ be the estimate of the $i$th SNP effect from the $j$th shuffled data set ($j = 1, \dots, J$), obtained by permuting the quantitative trait ($y$). Define $d_{ij}$ as the difference between $\hat{\beta}_i$ and $\hat{\beta}_i^{(j)}$. Then the test statistic can be defined as:

$$\Lambda_i = \frac{\bar{d}_i}{SE(\bar{d}_i)},$$

where $\bar{d}_i$ is the sample mean of $d_{ij}$ over the $J$ permutations and $SE(\bar{d}_i)$ is the standard error of $\bar{d}_i$. Under the null hypothesis ($H_0$: $\beta_i = 0$), the statistic $\Lambda_i$ follows the standard normal distribution when $J$ is large:

$$\Lambda_i \sim N(0, 1).$$
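The SVD reparameterization, the back-transformation from the superfactor vector to the SNP effects, and the permutation statistic can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gibbs-sampler estimate of the superfactor vector is replaced by its least-squares estimate as a stand-in, the standard error is taken to be the standard error of the mean difference over the permutations, and all function names and the toy data are hypothetical.

```python
import numpy as np


def svd_factors(X):
    """Thin SVD of X' (k x n): X' = A D F', hence X = F D A'."""
    A, d, Ft = np.linalg.svd(X.T, full_matrices=False)
    keep = d > 1e-10 * d.max()  # drop numerically zero singular values
    return A[:, keep], d[keep], Ft[keep, :].T


def estimate_beta(A, d, F, y):
    """Stand-in estimator (assumption): least squares for gamma, then beta = A gamma."""
    gamma_hat = (F.T @ y) / d  # minimizes ||y - F D gamma||^2 because F'F = I
    beta_hat = A @ gamma_hat   # A plays the role of the generalized inverse of A'
    return beta_hat, gamma_hat


def permutation_statistic(X, y, n_perm=200, seed=0):
    """Lambda_i = mean_j(d_ij) / SE, with d_ij = beta_hat_i - beta_hat_i^(j)."""
    rng = np.random.default_rng(seed)
    A, d, F = svd_factors(X)  # X is not permuted, so factor it once
    beta_raw, _ = estimate_beta(A, d, F, y)
    diffs = np.empty((n_perm, X.shape[1]))
    for j in range(n_perm):
        # Shuffle the trait only; the genotype matrix stays fixed.
        beta_perm, _ = estimate_beta(A, d, F, rng.permutation(y))
        diffs[j] = beta_raw - beta_perm
    d_bar = diffs.mean(axis=0)
    se = diffs.std(axis=0, ddof=1) / np.sqrt(n_perm)  # standard error of the mean
    return d_bar / se


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.integers(0, 3, size=(30, 100)).astype(float)  # toy genotypes coded 0/1/2
    y = 0.8 * X[:, 0] + rng.normal(size=30)                # toy trait, one causal SNP
    lam = permutation_statistic(X, y)
    print(lam.shape)
```

Because the columns of `A` are orthonormal, `A.T @ beta_hat` returns `gamma_hat` exactly, so the back-transformed vector is consistent with the superfactor estimate; with more SNPs than samples it coincides with the minimum-norm least-squares solution.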

### Study sample and association analysis

We used the unrelated-individuals data set distributed by GAW17, which includes 697 individuals, 24,487 SNPs, and 3 covariates (sex, age, and smoking status). We analyzed the first 10 replicates of phenotypes for quantitative risk factor Q1. First, we performed the single-SNP association test using the simple linear regression model option in PLINK [4]. Second, we applied the PR method with the L1 penalty introduced by Tibshirani [1] using the R package monomvn [5]. We evaluated SNP association with Q1 within the maximum number of SNPs allowed by the package in each step, which is min(*k*, *n* − intercept). Because the package does not provide *p*-values, we used the same permutation technique as in the BRSVD method to obtain empirical *p*-values. Third, we implemented the BRSVD method. To define significant SNPs for each method, we considered the following statistical models: Q1 versus the single SNP and the three covariates for the single-SNP association test; Q1 versus the maximum number of SNPs allowed by the package plus the three covariates for the PR method; and Q1 versus all 24,487 SNPs and the three covariates for the BRSVD method. The SNPs identified as significant under each model were compared with the 39 SNPs listed in the answer sheet distributed by GAW17. The analyses were run for each of the first 10 replicates, and the results were averaged over the 10 replicates (see Results section).
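The comparison against the answer sheet can be sketched as follows: each statistic is converted to a *p*-value under the standard normal reference, SNPs below a threshold are called significant, and true and false positives are averaged over replicates. This is only an illustration under stated assumptions; the two-sided test, the threshold `alpha=0.05`, the SNP identifiers, and the function names are all hypothetical and are not taken from the analysis described above.

```python
import math


def two_sided_p(lam):
    """Two-sided p-value for a statistic treated as standard normal (assumption)."""
    return math.erfc(abs(lam) / math.sqrt(2.0))


def tally(stats_by_snp, causal, alpha=0.05):
    """Count true and false positives among SNPs with p < alpha."""
    called = {snp for snp, lam in stats_by_snp.items() if two_sided_p(lam) < alpha}
    return len(called & causal), len(called - causal)


def average_over_replicates(replicates, causal, alpha=0.05):
    """Mean (true positives, false positives) across replicates."""
    counts = [tally(rep, causal, alpha) for rep in replicates]
    n = len(counts)
    return sum(tp for tp, _ in counts) / n, sum(fp for _, fp in counts) / n


if __name__ == "__main__":
    causal = {"snp1", "snp2"}  # hypothetical answer-sheet SNPs
    replicates = [
        {"snp1": 3.0, "snp2": 0.5, "snp3": -2.5},
        {"snp1": 0.1, "snp2": 2.2, "snp3": 0.3},
    ]
    print(average_over_replicates(replicates, causal))
```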