### Model

First, an animal model BLUP using pedigree and phenotypes was performed to predict breeding values of all animals, and REML estimate of heritability was obtained [9]. Further, the heritability and breeding values were estimated using genomic models [10] and compared with the ones from animal model.

The SNP-BLUP additive genomic model was used to estimate SNP effects in the first step as:

${\mathbf{y}}_{i}=\mu +\sum _{j}{x}_{ij}{\beta}_{j}+{\mathbf{e}}_{i}\phantom{\rule{1em}{0ex}}i=\mathsf{\text{1}},\dots ,n\phantom{\rule{1em}{0ex}}\mathsf{\text{and}}\phantom{\rule{1em}{0ex}}j=\mathsf{\text{1}},\dots ,q$

(1)

where *μ* is the general mean, *β*_{
j
} is the allele substitution effect of the *j*th SNP with $\beta ~N\left(\mathbf{0},\mathbf{I}{\sigma}_{\beta}^{2}\right)$, *x*_{
ij
} is the genotype covariate of the *j*th marker for the *i*th animal, associating marker effects *β*_{
j
} to the phenotype *y*_{
i
}, and e_{
i
} is residual or environmental effect with $e~N\left(0,\mathbf{I}{\sigma}_{e}^{2}\right)$. The genotype covariate is initially coded as 0,1,2 for homozygote, heterozygote and alternate homozygote, and then centered to have mean zero. Flat priors were considered for the residual variance ${\sigma}_{e}^{2}$ and marker variance ${\sigma}_{\beta}^{2}$ to estimate them in a reference Bayesian approach that uses the frequentist likelihood as the Bayesian posterior distribution.

The estimated marker effect

${\widehat{\beta}}_{j}$ from model (1) was used to estimate the variance explained by that marker in the population as

${p}_{j}\left(\mathsf{\text{1}}-{p}_{j}\right){\beta}_{j}^{2}$, where

*p*_{
j
} is the frequency of one of the alleles of the

*j*th SNP. Then, all markers were sorted based on their explained variance and each 150 marker were grouped together. We tried a grid of different SNP-group sizes and among them a SNP-group size of 150 yielded the highest accuracy of PBV in the validation dataset (explained in

*results and discussion*).The model for the second step (ALL-SNP) with grouped SNP and group specific SNP variance was:

${\mathbf{y}}_{i}=\mu +\sum _{j}{x}_{ij}{\beta}_{jk}+{\mathbf{e}}_{i}\phantom{\rule{1em}{0ex}}i=\mathsf{\text{1}},\dots ,n,\phantom{\rule{1em}{0ex}}j=\mathsf{\text{1}},\dots ,q\phantom{\rule{1em}{0ex}}\mathsf{\text{and}}\phantom{\rule{1em}{0ex}}k=\mathsf{\text{1}},\dots ,g$

(2)

where *β*_{
jk
} denotes the effect of the *j*th marker that belong to group *k*, and *g* is the total number of groups. A model was used with a variance parameter per group, and the group variances are jointly modeled to have an inverse chi-square distribution in which the scale is treated as a model parameter. The prior specification was as follows:

*μ~uniform*; ${\beta}_{jk}~N\left(0,{\sigma}_{k}^{2}\right)$; ${\sigma}_{k}^{2}~{\chi}^{-2}\left(scale,df\right)$; ${\sigma}_{e}^{2}~uniform\left(>0\right)$; ${\sigma}_{e}^{2}~uniform\left(>0\right)$; *scale~uniform*(>0); and *df*: a fixed number as hyperparameter.

The fully conditional posterior distributions were as follows:

$\begin{array}{c}\mu |.~N\left({n}^{-1}\sum _{i=1}^{n}\left({y}_{i}-\sum _{j=1}^{q}{x}_{ij}{\beta}_{jk}\right),\frac{{\sigma}_{e}^{2}}{n}\right);\\ {\beta}_{jk}|.~N\left({\left[\sum _{i=1}^{n}{x}_{ij}^{2}+\frac{{\sigma}_{e}^{2}}{{\sigma}_{k}^{2}}\right]}^{-1}\sum _{i=1}^{n}{x}_{ij}^{}\left({y}_{i}-\mu -\sum _{l\ne j}^{q}{x}_{il}{\beta}_{lk}\right),{\left[\sum _{i=1}^{n}{x}_{ij}^{2}+\frac{{\sigma}_{e}^{2}}{{\sigma}_{k}^{2}}\right]}^{-1}{\sigma}_{e}^{2}\right);\end{array}$

${\sigma}_{k}^{2}|.~{\chi}^{-2}\left({\left[{n}_{k}+df\right]}^{-1}\left[{n}_{k}\sum _{j=1}^{{n}_{k}}{\beta}_{jk}^{2}+scale\times df\right],{n}_{k}+df\right)$ where

*n*_{
k
} is the group size for SNP-group

*k*. In this study

*n*_{
k
} was equal to

*p* for all SNP-groups;

$\begin{array}{c}{\sigma}_{e}^{2}|.~{\chi}^{-2}\left({\left[n-2\right]}^{-1}\left[n\left(\sum _{i=1}^{n}{y}_{i}-\mu -\sum _{j=1}^{q}{x}_{ij}{\beta}_{jk}\right)\right],n-2\right);\\ \mathsf{\text{and}}\phantom{\rule{2.77695pt}{0ex}}scale|.~Gamma\left(\frac{g\times df}{2}+1,\frac{2}{df}\sum _{k=1}^{g}\frac{1}{{\sigma}_{k}^{2}}\right).\end{array}$

(3)

In further analyses, respectively, 1500 (SNP1500) and 450 (SNP450a, SNP450b) markers with the largest effects from model (2) were selected and were allocated to groups of size 150 (for 1500 markers), and 75 or 50 (for 450 markers). Then, breeding values of animals without records were predicted using the marker effects from these subsets of markers.