The common dataset consisted of an outbred population simulated with the LDSO software. The simulation comprised 1000 generations of 1000 individuals, followed by 30 generations of 150 individuals. A total of 9990 SNP markers were distributed over 5 chromosomes. Each chromosome was 1 Morgan long and carried 1998 evenly spaced SNPs (1 SNP every 0.05 cM).
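The marker map arithmetic above (5 × 1998 = 9990 SNPs, and 1998 × 0.05 cM = 99.9 cM, which fits within 1 Morgan) can be checked with a minimal Python sketch. Placing the first SNP at 0.05 cM is an assumption made only for illustration, and the function name is hypothetical.

```python
# Hypothetical sketch of the simulated marker map: 5 chromosomes of 1 Morgan
# (100 cM), each carrying 1998 SNPs spaced 0.05 cM apart.
# Starting the first SNP at 0.05 cM is an assumption for illustration.
N_CHROM = 5
SNPS_PER_CHROM = 1998
SPACING_CM = 0.05

def build_snp_map(n_chrom=N_CHROM, snps_per_chrom=SNPS_PER_CHROM,
                  spacing_cm=SPACING_CM):
    """Return a list of (chromosome, position_cM) tuples, one per SNP."""
    snp_map = []
    for chrom in range(1, n_chrom + 1):
        for k in range(1, snps_per_chrom + 1):
            # round() removes floating-point noise from k * spacing
            snp_map.append((chrom, round(k * spacing_cm, 4)))
    return snp_map

snp_map = build_snp_map()
print(len(snp_map))   # 9990
print(snp_map[-1])    # (5, 99.9)
```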
The final dataset used for evaluating genomic selection consisted of 3220 individuals: 20 sires, 200 dams (each sire mated with 10 dams) and 3000 progenies (15 per dam). All individuals were genotyped for the 9990 SNPs, with no missing genotypes and no genotyping errors. Of the 15 progenies of each dam, 10 were phenotyped for a continuous trait. The 2000 progenies with phenotypic records formed the reference population. The remaining 1000 progenies formed the validation population; they had simulated true breeding values but no phenotypic records.
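The mating design and the reference/validation split can be sketched in Python as follows. This is a hypothetical illustration, and it simplifies two things. First, sires and dams are treated as founders with unknown parents, although in the simulation they descend from the LDSO population. Second, which 10 progenies of each dam are phenotyped is an arbitrary choice made here.

```python
# Hypothetical sketch of the mating design: 20 sires x 10 dams each,
# 15 progenies per dam, 10 phenotyped (reference) and 5 not (validation).
N_SIRES, DAMS_PER_SIRE, PROG_PER_DAM, PHENO_PER_DAM = 20, 10, 15, 10

def build_design():
    """Return (pedigree, reference_ids, validation_ids).

    pedigree maps animal id -> (sire id, dam id); 0 marks an unknown parent.
    """
    pedigree = {}
    reference, validation = [], []
    sires = list(range(1, N_SIRES + 1))
    for s in sires:
        pedigree[s] = (0, 0)          # founders: their parents are not modelled here
    next_id = N_SIRES + 1
    for s in sires:
        for _ in range(DAMS_PER_SIRE):
            dam = next_id
            next_id += 1
            pedigree[dam] = (0, 0)
            for k in range(PROG_PER_DAM):
                pedigree[next_id] = (s, dam)
                # the first 10 progenies of each dam are phenotyped (arbitrary choice)
                if k < PHENO_PER_DAM:
                    reference.append(next_id)
                else:
                    validation.append(next_id)
                next_id += 1
    return pedigree, reference, validation

pedigree, reference, validation = build_design()
print(len(pedigree), len(reference), len(validation))   # 3220 2000 1000
```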
Estimation of variance components and EBVs
The variance components and the traditional BLUP EBVs were estimated from the phenotypes and the pedigree with the software DMUv6 [11], based on the following model:
$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{a} + \mathbf{e}$$
where y is the vector of phenotypes of individuals in the reference population, μ is the overall mean, a is the vector of additive genetic effects of the phenotyped individuals and their parents, Z is the incidence matrix relating a to y, and e is the vector of residual errors. The variance-covariance matrices of a and e are $\mathbf{A}\sigma_a^2$ and $\mathbf{I}\sigma_e^2$, respectively. Here A is the additive genetic relationship matrix, I is an identity matrix, $\sigma_a^2$ is the additive genetic variance, and $\sigma_e^2$ is the residual variance.
The reliabilities of the traditional EBVs were obtained directly from DMU. Reliability is defined here as the squared correlation between the EBVs and the unknown true breeding values.
Estimation of SNP effects
BayesA, BayesB and BayesCπ were used to estimate SNP effects in the reference population based on the following model:
$$\mathbf{y} = \mathbf{1}\mu + \mathbf{X}\mathbf{g} + \mathbf{e}$$
where g is the vector of random SNP effects and X is the matrix of genotype indicators, coded 0, 1 or 2 for genotypes 11, 12 and 22, respectively. The terms y, μ and e are as defined above.
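The 0/1/2 coding of X can be illustrated with a minimal Python sketch, with NumPy assumed and hypothetical function names. The code counts copies of allele 2 in each genotype. Two parts are assumptions for illustration: treating the unordered genotype "21" like "12", and the helper that builds the same coding from two 0/1 haplotype arrays.

```python
import numpy as np

# Genotype code = number of copies of allele 2: 11 -> 0, 12 -> 1, 22 -> 2.
# Treating "21" like "12" (unphased genotypes) is an assumption.
GENOTYPE_CODE = {"11": 0, "12": 1, "21": 1, "22": 2}

def genotype_matrix(genotypes):
    """Build X (individuals x SNPs) from rows of genotype strings."""
    return np.array([[GENOTYPE_CODE[g] for g in row] for row in genotypes],
                    dtype=float)

def genotype_matrix_from_haplotypes(hap1, hap2):
    """Same coding from two 0/1 haplotype arrays (1 = allele 2)."""
    return np.asarray(hap1, dtype=float) + np.asarray(hap2, dtype=float)

X = genotype_matrix([["11", "12", "22"],
                     ["12", "22", "11"]])
print(X)
```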
The three Bayesian methods differ in the prior distributions they assume for the SNP effects. BayesA assumes that every SNP has a non-zero effect, each with its own variance. BayesB and BayesCπ assume that each SNP has a zero effect with probability π and a non-zero effect with probability 1-π. The non-zero effects have SNP-specific variances in BayesB but a common variance in BayesCπ. In addition, π is treated as a known parameter in BayesB. In BayesCπ, it is treated as an unknown parameter with a uniform (0, 1) prior distribution. In this study, we set π = 0.99 for BayesB. For all three Bayesian methods, we adopted the same prior distributions of g and e as in [1, 5].
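The contrast between the three priors can be made concrete by drawing one vector of SNP effects from each, as in the Python sketch below. This is a hypothetical illustration. The scaled inverse chi-square prior on the variances and the values nu = 4 and s2 = 0.01 are placeholders, not the hyperparameters of [1, 5]. Only π = 0.99 for BayesB and the uniform prior on π for BayesCπ come from the text.

```python
import numpy as np

def scaled_inv_chi2(rng, nu, s2, size):
    """Draw from a scaled inverse chi-square(nu, s2): nu * s2 / chi2(nu)."""
    return nu * s2 / rng.chisquare(nu, size)

def draw_prior_effects(method, n_snp, rng, nu=4.0, s2=0.01, pi=0.99):
    """Draw one vector of SNP effects g from the prior of `method`.

    nu and s2 are placeholder hyperparameters, not the values of the study.
    """
    if method == "BayesA":
        # every SNP has an effect, each with its own variance
        var = scaled_inv_chi2(rng, nu, s2, n_snp)
        return rng.normal(0.0, np.sqrt(var))
    if method == "BayesB":
        # zero with probability pi; otherwise an SNP-specific variance
        nonzero = rng.random(n_snp) >= pi
        var = scaled_inv_chi2(rng, nu, s2, n_snp)
        return np.where(nonzero, rng.normal(0.0, np.sqrt(var)), 0.0)
    if method == "BayesCpi":
        # pi itself ~ Uniform(0, 1); non-zero effects share one variance
        pi_c = rng.uniform(0.0, 1.0)
        var_common = scaled_inv_chi2(rng, nu, s2, 1)[0]
        nonzero = rng.random(n_snp) >= pi_c
        return np.where(nonzero,
                        rng.normal(0.0, np.sqrt(var_common), n_snp), 0.0)
    raise ValueError(f"unknown method: {method}")

rng = np.random.default_rng(2024)
for method in ("BayesA", "BayesB", "BayesCpi"):
    g = draw_prior_effects(method, 9990, rng)
    print(method, np.count_nonzero(g), "non-zero effects")
```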
The Markov chain was run for 50,000 cycles of Gibbs sampling, and the first 5000 cycles were discarded as burn-in. For BayesB, each Gibbs sampling cycle also included 100 Metropolis-Hastings steps for sampling the SNP effect variances. The SNP effect estimates were obtained by averaging the 45,000 post-burn-in samples of the SNP effects.
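The sampling scheme can be made concrete with a compact single-site Gibbs sampler for BayesCπ, sketched below in Python with NumPy. It is not the software used in the study. Several choices are assumptions: the hyperparameters nu and s2, the starting values, and the prior p(var_e) proportional to 1/var_e on the residual variance. The defaults mirror the 50,000 cycles and 5000 burn-in cycles described above, and the returned SNP effects are averages of the post-burn-in samples. The Metropolis-Hastings step of BayesB is not shown.

```python
import numpy as np

def bayes_c_pi(y, X, n_iter=50_000, burn_in=5_000, nu=4.0, s2=0.01, seed=0):
    """Single-site Gibbs sampler for BayesCpi; returns posterior means (mu, g).

    Assumed priors: g_j = 0 with probability pi, else g_j ~ N(0, var_g);
    var_g ~ scaled-inv-chi2(nu, s2); pi ~ Uniform(0, 1); p(var_e) ~ 1/var_e.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    xtx = np.einsum("ij,ij->j", X, X)       # x_j'x_j for every SNP
    mu, g = y.mean(), np.zeros(m)
    pi, var_g, var_e = 0.5, 0.1, y.var() / 2
    r = y - mu                              # residuals y - mu - Xg
    g_sum, mu_sum = np.zeros(m), 0.0
    for it in range(n_iter):
        r += mu                             # overall mean, flat prior
        mu = rng.normal(r.mean(), np.sqrt(var_e / n))
        r -= mu
        for j in range(m):
            xj = X[:, j]
            r += xj * g[j]                  # remove SNP j from the residuals
            rhs = xj @ r
            c = xtx[j] + var_e / var_g
            # log posterior odds of a non-zero vs zero effect (g_j integrated out)
            log_odds = (np.log((1 - pi) / pi)
                        - 0.5 * np.log(c * var_g / var_e)
                        + rhs**2 / (2 * var_e * c))
            p_in = 1.0 / (1.0 + np.exp(-log_odds))
            if rng.random() < p_in:
                g[j] = rng.normal(rhs / c, np.sqrt(var_e / c))
            else:
                g[j] = 0.0
            r -= xj * g[j]                  # put SNP j back
        k = np.count_nonzero(g)
        var_g = (g @ g + nu * s2) / rng.chisquare(k + nu)
        pi = rng.beta(m - k + 1, k + 1)     # pi = probability of a zero effect
        var_e = (r @ r) / rng.chisquare(n)
        if it >= burn_in:                   # keep post-burn-in samples only
            g_sum += g
            mu_sum += mu
    kept = n_iter - burn_in
    return mu_sum / kept, g_sum / kept

# Small hypothetical example: 200 individuals, 50 SNPs, 5 causal SNPs
rng = np.random.default_rng(42)
X = rng.binomial(2, 0.5, size=(200, 50)).astype(float)
g_true = np.zeros(50)
g_true[[3, 11, 20, 33, 47]] = 1.0
y = 10.0 + X @ g_true + rng.normal(0.0, 1.0, 200)
mu_hat, g_hat = bayes_c_pi(y, X, n_iter=1000, burn_in=200, seed=1)
print(np.round(g_hat[[3, 11, 20, 33, 47]], 2))
```

The short chain in the example keeps a quick check fast. Convergence should be assessed before relying on fewer cycles than described in the text.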
Calculation of GEBVs
The genomic estimated breeding values (GEBVs) of all genotyped individuals were obtained using five methods: BayesA, BayesB, BayesCπ, GBLUP and TABLUP.
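For BayesA, BayesB and BayesCπ, the GEBV of each genotyped individual was calculated as the sum of the estimated marker effects weighted by its marker genotypes, i.e. $\mathrm{GEBV}_i = \sum_j x_{ij}\hat{g}_j$. Here $x_{ij}$ is the genotype code (0, 1 or 2) of individual i at SNP j, and $\hat{g}_j$ is the estimated effect of SNP j.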
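In matrix form, this sum is the product of the genotype matrix and the vector of estimated SNP effects. The minimal Python sketch below uses hypothetical genotypes and effects.

```python
import numpy as np

def gebv_from_snp_effects(X, g_hat):
    """GEBV_i = sum_j x_ij * g_hat_j for every row (individual) of X."""
    return X @ g_hat

# Hypothetical genotypes (0/1/2) of three individuals at three SNPs
X = np.array([[0, 1, 2],
              [2, 1, 0],
              [1, 1, 1]], dtype=float)
g_hat = np.array([0.3, -0.1, 0.05])      # hypothetical estimated SNP effects
print(gebv_from_snp_effects(X, g_hat))
```

Adding the estimated overall mean to every GEBV would shift all values by the same constant. Correlation-based accuracies would not change.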
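For GBLUP and TABLUP, the GEBVs were estimated based on the following model:
$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{u} + \mathbf{e}$$
where u is the vector of genomic breeding values of all genotyped individuals and Z is the incidence matrix relating u to y. The terms y, μ and e are as defined above. The variance-covariance matrix of u is $\mathbf{G}\sigma_a^2$ for GBLUP and $\mathbf{TA}\sigma_a^2$ for TABLUP, where $\sigma_a^2$ is the additive genetic variance estimated from the reference population.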
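The BLUP of u under this model can be sketched in Python with NumPy using the equivalent V-inverse form. This form does not require the relationship matrix to be invertible. The sketch is a hypothetical illustration with these assumptions:
- K stands for either G or TA.
- The variance components are treated as known.
- The mean is estimated by generalized least squares.

Individuals without phenotypes, such as the validation progenies, receive predictions through their genomic relationships with the reference individuals.

```python
import numpy as np

def gblup(y, ref_idx, K, var_a, var_e):
    """BLUP of u for all genotyped individuals under y = 1 mu + Z u + e.

    K: relationship matrix (G for GBLUP, TA for TABLUP) of all genotyped
    individuals; ref_idx: rows of K that have phenotypes, in the order of y.
    Uses u_hat = var_a * K Z' V^-1 (y - 1 mu_hat), V = var_a Z K Z' + var_e I.
    """
    n_ref, n_all = len(ref_idx), K.shape[0]
    Z = np.zeros((n_ref, n_all))
    Z[np.arange(n_ref), ref_idx] = 1.0      # Z picks out phenotyped individuals
    V = var_a * Z @ K @ Z.T + var_e * np.eye(n_ref)
    Vinv_1 = np.linalg.solve(V, np.ones(n_ref))
    mu_hat = Vinv_1 @ y / Vinv_1.sum()      # GLS estimate of the overall mean
    u_hat = var_a * K @ Z.T @ np.linalg.solve(V, y - mu_hat)
    return mu_hat, u_hat

# Hypothetical example: individual 3 has no record but is related to individual 2
K = np.eye(4)
K[2, 3] = K[3, 2] = 0.5
y = np.array([1.0, 2.0, 3.0])
mu_hat, u_hat = gblup(y, [0, 1, 2], K, var_a=1.0, var_e=1.0)
print(mu_hat, u_hat)
```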
The G matrix (realized relationship matrix) was constructed from the genotypes of all markers. The TA matrix (trait-specific marker-derived relationship matrix) was also constructed from the genotypes of all markers. For TA, each marker was weighted by its effect as estimated by BayesB, following the rules proposed by Zhang et al.
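This section does not state the exact formula used for G. A common choice is VanRaden's (2008) first method, and the sketch below uses it as an assumption, with allele frequencies estimated from the genotypes themselves. The second function is a generic marker-weighted variant. It shows how per-marker weights, for example weights derived from BayesB effect estimates, can enter such a matrix. It is a hypothetical illustration and does not reproduce the TA rules of Zhang et al.

```python
import numpy as np

def vanraden_G(X):
    """VanRaden (2008) first method: G = WW' / (2 * sum p(1 - p)), W = X - 2p."""
    p = X.mean(axis=0) / 2.0                # allele frequencies from the data
    W = X - 2.0 * p
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

def weighted_G(X, weights):
    """Generic marker-weighted relationship matrix (illustration only).

    Each marker's contribution is scaled by a non-negative weight; this is
    NOT claimed to be the TA construction of Zhang et al.
    """
    p = X.mean(axis=0) / 2.0
    W = X - 2.0 * p
    return (W * weights) @ W.T / (2.0 * np.sum(weights * p * (1.0 - p)))

rng = np.random.default_rng(0)
X = rng.binomial(2, 0.3, size=(10, 200)).astype(float)   # hypothetical genotypes
G = vanraden_G(X)
w = rng.random(200)                          # hypothetical marker weights
K_weighted = weighted_G(X, w)
print(G.shape, K_weighted.shape)
```

With all weights equal to one, the weighted version reduces to G.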
The accuracies of GEBVs were calculated as the correlation between GEBVs and the simulated true breeding values.
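The accuracy calculation can be sketched as a Pearson correlation, which is assumed here because the text says only "correlation". An optional index argument, a hypothetical convenience, restricts the calculation to a subset such as the validation individuals.

```python
import numpy as np

def accuracy(gebv, tbv, idx=None):
    """Pearson correlation between GEBVs and simulated true breeding values.

    idx optionally restricts the calculation to a subset of individuals.
    """
    gebv, tbv = np.asarray(gebv, dtype=float), np.asarray(tbv, dtype=float)
    if idx is not None:
        gebv, tbv = gebv[idx], tbv[idx]
    return float(np.corrcoef(gebv, tbv)[0, 1])

print(accuracy([0.1, 0.4, 0.2, 0.9], [0.0, 0.5, 0.1, 1.2]))
```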