Volume 6 Supplement 2
Proceedings of the 15th European workshop on QTL mapping and marker assisted selection (QTLMAS)
Comparison of five methods for genomic breeding value estimation for the common dataset of the 15^{th} QTLMAS Workshop
 ChongLong Wang†^{1, 2},
 PeiPei Ma^{1},
 Zhe Zhang^{1},
 XiangDong Ding^{1},
 JianFeng Liu^{1},
 WeiXuan Fu^{1},
 ZiQing Weng^{1} and
 Qin Zhang^{1}
DOI: 10.1186/1753-6561-6-S2-S13
© Wang et al.; licensee BioMed Central Ltd. 2012
Published: 21 May 2012
Abstract
Background
Genomic breeding value estimation is the key step in genomic selection. Among many approaches, BLUP methods and Bayesian methods are most commonly used for estimating genomic breeding values. Here, we applied two BLUP methods, TABLUP and GBLUP, and three Bayesian methods, BayesA, BayesB and BayesCπ, to the common dataset provided by the 15^{th} QTLMAS Workshop to evaluate and compare their predictive performances.
Results
For the 1000 progenies without phenotypic values, the correlations between GEBVs by different methods ranged from 0.812 (GBLUP and BayesCπ) to 0.997 (TABLUP and BayesB). The accuracies of GEBVs (measured as correlations between true breeding values (TBVs) and GEBVs) ranged from 0.774 (GBLUP) to 0.938 (BayesCπ), and the biases of GEBVs (measured as regressions of TBVs on GEBVs) ranged from 1.033 (TABLUP) to 1.648 (GBLUP). The three Bayesian methods and TABLUP had similar accuracy and bias.
Conclusions
BayesA, BayesB, BayesCπ and TABLUP performed similarly and satisfactorily and remarkably outperformed GBLUP for genomic breeding value estimation in this dataset. TABLUP is a promising method for genomic breeding value estimation because of its easy computation of reliabilities of GEBVs and its easy extension to real life conditions such as multiple traits and consideration of individuals without genotypes.
Background
The goal of genomic selection (GS) [1] is to capture all quantitative trait loci (QTL) influencing a trait by tracing all chromosome segments defined by adjacent markers. By using highly dense markers, GS is expected to overcome the main limitation of traditional marker assisted selection (MAS), namely that the markers linked to QTL capture only a limited proportion of the total genetic variance. GS has become feasible only recently, with high-throughput genotyping technology and the availability of highly dense markers covering the whole genome. Genomic breeding value estimation is the key step in GS. A number of approaches have been proposed for estimating genomic breeding values [1–9], among which BLUP methods and Bayesian methods are most commonly used. Here, we applied two BLUP methods (GBLUP [3], TABLUP [4]) and three Bayesian methods (BayesA, BayesB [1], BayesCπ [5]) to the common dataset provided by the 15^{th} QTLMAS Workshop to evaluate and compare their predictive performances.
Methods
Dataset
The common dataset consisted of an outbred population, which had been simulated using the LDSO software [10], with 1000 generations of 1000 individuals, followed by 30 generations of 150 individuals. 9990 SNP markers were distributed on 5 chromosomes. Each chromosome had a size of 1 Morgan and carried 1998 evenly distributed SNPs (1 SNP every 0.05 cM).
The final dataset used for evaluating genomic selection consisted of 3220 individuals, including 20 sires, 200 dams (each sire mated with 10 dams) and 3000 progenies (15 per dam). All individuals were genotyped for the 9990 SNPs, with no missing genotypes or genotyping errors. Of the 15 progenies of each dam, 10 were phenotyped for a continuous trait. The 2000 progenies with phenotypic records were treated as the reference population, and the other 1000 progenies, which had simulated true breeding values but no phenotypic records, were treated as the validation population.
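The population counts and marker density quoted above follow from simple arithmetic. The short sketch below recomputes them from the design figures given in the text; the variable names are illustrative only.

```python
# Design figures taken from the dataset description.
n_sires = 20
dams_per_sire = 10
progeny_per_dam = 15
phenotyped_per_dam = 10

n_dams = n_sires * dams_per_sire                # 200 dams
n_progeny = n_dams * progeny_per_dam            # 3000 progenies
n_reference = n_dams * phenotyped_per_dam       # 2000 with phenotypes
n_validation = n_progeny - n_reference          # 1000 without phenotypes
n_total = n_sires + n_dams + n_progeny          # 3220 genotyped individuals

n_chromosomes = 5
snps_per_chromosome = 1998
chromosome_length_cm = 100.0                    # 1 Morgan = 100 cM
n_snps = n_chromosomes * snps_per_chromosome    # 9990 SNPs
spacing_cm = chromosome_length_cm / snps_per_chromosome  # about 0.05 cM

print(n_dams, n_progeny, n_reference, n_validation, n_total, n_snps)
print(f"{spacing_cm:.2f} cM between adjacent SNPs")
```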
Estimation of variance components and EBVs
Variance components and traditional EBVs were estimated with the model

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Za} + \mathbf{e}$$

where $\mathbf{y}$ is the vector of phenotypes of individuals in the reference population, $\mu$ is the overall mean, $\mathbf{a}$ is the vector of additive genetic effects of the phenotyped individuals and their parents, $\mathbf{Z}$ is the incidence matrix of $\mathbf{a}$, and $\mathbf{e}$ is the vector of residual errors. The variance-covariance matrices of $\mathbf{a}$ and $\mathbf{e}$ are $\mathbf{A}{\sigma}_{a}^{2}$ and $\mathbf{I}{\sigma}_{e}^{2}$, respectively, where $\mathbf{A}$ is the additive genetic relationship matrix, ${\sigma}_{a}^{2}$ is the additive genetic variance, and ${\sigma}_{e}^{2}$ is the residual variance.
The reliabilities of the traditional EBVs, defined as the squared correlation between the EBVs and the unknown true breeding values, were obtained directly from DMU [11].
Estimation of SNP effects
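SNP effects were estimated with BayesA, BayesB and BayesCπ using the model

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Xg} + \mathbf{e}$$

where $\mathbf{g}$ is the vector of random SNP effects and $\mathbf{X}$ is the matrix of genotype indicators (with values 0, 1, or 2 for genotypes 11, 12, and 22, respectively).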
The three Bayesian methods differ in the prior distribution assumed for the SNP effects. BayesA assumes that all SNPs have an effect, but each has a different variance. BayesB and BayesCπ assume that each SNP has either a zero effect, with probability π, or a nonzero effect, with probability 1 − π. For SNPs with nonzero effects, BayesB assumes a different variance for each SNP, whereas BayesCπ assumes a common variance. In addition, π is treated as a known parameter in BayesB, while in BayesCπ it is treated as an unknown parameter with a uniform (0, 1) prior distribution. In this study, we set π = 0.99 for BayesB, and adopted the same prior distributions of g and e for the three Bayesian methods as those in [1, 5].
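The contrast between the three priors can be made concrete by drawing SNP effects from each of them. The sketch below is illustrative only: it assumes scaled inverse chi-square priors for the effect variances, and the hyperparameter values `nu` and `s2` are hypothetical placeholders rather than the values used in [1, 5]. All function names are made up for this example.

```python
import numpy as np

def scaled_inv_chi2(rng, nu, s2, size=None):
    """Draw from a scaled inverse chi-square(nu, s2) distribution."""
    return nu * s2 / rng.chisquare(nu, size=size)

def draw_bayes_a(rng, n_snps, nu, s2):
    """BayesA prior: every SNP has an effect with its own variance."""
    var = scaled_inv_chi2(rng, nu, s2, size=n_snps)
    return rng.normal(0.0, np.sqrt(var))

def draw_bayes_b(rng, n_snps, nu, s2, pi):
    """BayesB prior: zero effect with known probability pi, else SNP-specific variance."""
    nonzero = rng.random(n_snps) >= pi            # P(nonzero) = 1 - pi
    var = scaled_inv_chi2(rng, nu, s2, size=n_snps)
    return np.where(nonzero, rng.normal(0.0, np.sqrt(var)), 0.0)

def draw_bayes_c_pi(rng, n_snps, nu, s2):
    """BayesCpi prior: pi ~ Uniform(0, 1), common variance for nonzero effects."""
    pi = rng.uniform(0.0, 1.0)
    common_var = scaled_inv_chi2(rng, nu, s2)
    nonzero = rng.random(n_snps) >= pi
    effects = np.where(nonzero, rng.normal(0.0, np.sqrt(common_var), size=n_snps), 0.0)
    return effects, pi

rng = np.random.default_rng(0)
nu, s2 = 4.0, 0.01                                # hypothetical hyperparameters
g_a = draw_bayes_a(rng, 9990, nu, s2)
g_b = draw_bayes_b(rng, 9990, nu, s2, pi=0.99)    # pi = 0.99 as in the text
g_c, pi_c = draw_bayes_c_pi(rng, 9990, nu, s2)
print("nonzero effects:", np.count_nonzero(g_a), np.count_nonzero(g_b), np.count_nonzero(g_c))
```

With π = 0.99, roughly 1% of the 9990 SNPs receive a nonzero effect under the BayesB prior, whereas every SNP does under BayesA.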
The Markov chain was run for 50,000 cycles of Gibbs sampling (for BayesB, 100 additional cycles of Metropolis-Hastings sampling were performed for the SNP effect variance in each Gibbs sampling cycle), and the first 5000 cycles were discarded as burn-in. All the samples of SNP effects after burn-in were averaged to obtain the SNP effect estimate.
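The final averaging step can be sketched as follows, assuming the sampled SNP effects have been stored as an array with one row per Gibbs cycle and one column per SNP. The function name and the tiny chain are hypothetical; the sampler itself is not shown.

```python
import numpy as np

def posterior_mean_after_burnin(samples, n_burnin):
    """Average the sampled SNP effects over all cycles after the burn-in."""
    samples = np.asarray(samples, dtype=float)
    if not 0 <= n_burnin < samples.shape[0]:
        raise ValueError("n_burnin must be smaller than the number of cycles")
    return samples[n_burnin:].mean(axis=0)

# Toy chain: 10 cycles x 2 SNPs. In the study the chain had 50,000 cycles,
# of which the first 5000 were discarded.
chain_toy = np.arange(20, dtype=float).reshape(10, 2)
g_hat_toy = posterior_mean_after_burnin(chain_toy, n_burnin=5)
print(g_hat_toy)
```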
Calculation of GEBVs
The genomic estimated breeding values (GEBVs) of all genotyped individuals were obtained using five methods: BayesA, BayesB, BayesCπ, GBLUP and TABLUP.
For BayesA, BayesB and BayesCπ, the GEBV of a genotyped individual was calculated as the sum of all marker effects according to its marker genotypes [1].
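In matrix form this sum is the product of the genotype matrix X and the vector of estimated SNP effects. A minimal sketch with made-up toy values (the function name is illustrative):

```python
import numpy as np

def gebv_from_snp_effects(X, g_hat):
    """GEBV of each individual = sum over SNPs of genotype code x estimated effect."""
    X = np.asarray(X, dtype=float)
    g_hat = np.asarray(g_hat, dtype=float)
    return X @ g_hat

# Three individuals, three SNPs coded 0/1/2 (toy data).
X_toy = np.array([[0, 1, 2],
                  [2, 2, 0],
                  [1, 0, 1]])
g_toy = np.array([0.5, -0.2, 0.1])
gebv_toy = gebv_from_snp_effects(X_toy, g_toy)
print(gebv_toy)
```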
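For GBLUP and TABLUP, the GEBVs were estimated with the model

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Zu} + \mathbf{e}$$

where $\mathbf{u}$ is the vector of genomic breeding values of all genotyped individuals, with variance-covariance matrix equal to $\mathbf{G}{\sigma}_{u}^{2}$ for GBLUP or $\mathbf{TA}{\sigma}_{u}^{2}$ for TABLUP, $\mathbf{Z}$ is the incidence matrix of $\mathbf{u}$, and the other terms are as defined above. ${\sigma}_{u}^{2}$ is the additive genetic variance estimated from the reference population.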
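A minimal sketch of how such a model can be solved, assuming it takes the form y = 1μ + Zu + e with Var(u) = Kσ²_u (K standing for either relationship matrix) and Var(e) = Iσ²_e, and assuming the variance components are known. This is a textbook BLUP solution written for illustration, not the software used in the study; the function name and toy data are hypothetical.

```python
import numpy as np

def blup_genomic_values(y, Z, K, sigma2_u, sigma2_e):
    """BLUP of u in y = 1*mu + Z u + e, with Var(u) = K*sigma2_u, Var(e) = I*sigma2_e."""
    y = np.asarray(y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    K = np.asarray(K, dtype=float)
    n = y.shape[0]
    ones = np.ones(n)
    V = sigma2_u * Z @ K @ Z.T + sigma2_e * np.eye(n)   # phenotypic covariance
    # Generalized least squares estimate of the overall mean.
    mu_hat = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))
    # BLUP for all individuals in u, including those without phenotypes.
    u_hat = sigma2_u * K @ Z.T @ np.linalg.solve(V, y - mu_hat * ones)
    return mu_hat, u_hat

# Toy example: 5 individuals, the first 3 phenotyped; K is the identity
# (unrelated individuals) purely to keep the result easy to check by hand.
y_toy = np.array([10.0, 12.0, 14.0])
Z_toy = np.hstack([np.eye(3), np.zeros((3, 2))])
K_toy = np.eye(5)
mu_hat, u_hat = blup_genomic_values(y_toy, Z_toy, K_toy, sigma2_u=24.82, sigma2_e=58.65)
print(mu_hat, u_hat)
```

With an identity relationship matrix the unphenotyped individuals receive a prediction of zero; it is the off-diagonal elements of the relationship matrix that carry information from the reference to the validation population.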
The G matrix (realized relationship matrix) was constructed by using genotypes of all markers [3]. The TA matrix (trait-specific marker-derived relationship matrix) was constructed by using genotypes of all markers, with each marker weighted by its estimated effect obtained from BayesB, following the rules proposed by Zhang et al. [4].
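For orientation, the sketch below builds a realized relationship matrix in the commonly used form ZZ'/(2Σp_j(1 − p_j)) attributed to [3], where Z holds genotypes centered by twice the allele frequency. Two things are assumptions here: allele frequencies are estimated from the genotyped sample itself, and the optional `weights` argument is a simplified stand-in that only illustrates the idea of marker weighting. It does not reproduce the specific TA construction rules of [4].

```python
import numpy as np

def realized_relationship_matrix(X, weights=None):
    """Marker-based relationship matrix from 0/1/2 genotypes (individuals x SNPs)."""
    X = np.asarray(X, dtype=float)
    p = X.mean(axis=0) / 2.0             # sample frequency of the counted allele
    Z = X - 2.0 * p                      # center each SNP column
    het = 2.0 * p * (1.0 - p)            # expected heterozygosity per SNP
    if weights is None:
        weights = np.ones(X.shape[1])    # unweighted case
    weights = np.asarray(weights, dtype=float)
    # Weighted cross-product, scaled by the weighted sum of heterozygosities.
    return (Z * weights) @ Z.T / np.sum(weights * het)

rng = np.random.default_rng(1)
X_toy = rng.integers(0, 3, size=(6, 50))          # toy genotypes
G_toy = realized_relationship_matrix(X_toy)
w_toy = rng.random(50)                            # hypothetical marker weights
T_toy = realized_relationship_matrix(X_toy, weights=w_toy)
print(G_toy.shape, T_toy.shape)
```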
The accuracies of GEBVs were calculated as the correlation between GEBVs and the simulated true breeding values.
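Both evaluation statistics used in this study, the accuracy (correlation between TBVs and GEBVs) and the bias (regression of TBVs on GEBVs), can be computed in a few lines. The function name and toy values below are illustrative.

```python
import numpy as np

def accuracy_and_bias(tbv, gebv):
    """Return (r, b): correlation of TBV with GEBV and slope of TBV regressed on GEBV."""
    tbv = np.asarray(tbv, dtype=float)
    gebv = np.asarray(gebv, dtype=float)
    r = np.corrcoef(tbv, gebv)[0, 1]
    b = np.cov(tbv, gebv)[0, 1] / np.var(gebv, ddof=1)
    return r, b

# Toy check: TBVs that are an exact linear function of the GEBVs.
gebv_vals = np.array([-1.0, 0.0, 0.5, 2.0])
tbv_vals = 2.0 * gebv_vals + 1.0
r_toy, b_toy = accuracy_and_bias(tbv_vals, gebv_vals)
print(r_toy, b_toy)
```

A regression coefficient of 1 indicates that the GEBVs are on the same scale as the TBVs; a coefficient above 1 means that the spread of the GEBVs understates the spread of the TBVs.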
Results and discussion
Variance components
The estimated additive genetic variance and residual variance were 24.82 and 58.65, respectively. Therefore, the estimated heritability was 0.30. These estimates were used for the subsequent estimation of SNP effects and GEBVs.
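The heritability follows directly from the two variance components, as the ratio of additive genetic variance to total phenotypic variance:

```python
# Variance component estimates reported in the text.
sigma2_a = 24.82   # additive genetic variance
sigma2_e = 58.65   # residual variance

# Heritability: share of the phenotypic variance that is additive genetic.
h2 = sigma2_a / (sigma2_a + sigma2_e)
print(f"h2 = {h2:.2f}")
```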
Estimates of SNP effects
Peak positions of profiles of the estimated SNP effects and the corresponding estimated SNP effects
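| Method | Chr. 1 Pos. | Chr. 1 Effect | Chr. 2 Pos. | Chr. 2 Effect | Chr. 3 Pos. | Chr. 3 Effect | Chr. 4 Pos. | Chr. 4 Effect |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BayesA | 59 | 5.19±0.37 | 3660 | 1.01±0.90 | 4094 | 2.25±0.40 | | |
| | | | 3914 | 0.35±0.73 | | | | |
| BayesB | 59 | 1.96±2.13 | 3660 | 0.73±0.82 | 4092 | 0.91±1.17 | | |
| | | | 3873 | 0.56±0.65 | | | | |
| BayesCπ | 58 | 5.15±0.42 | 3660 | 0.93±0.96 | 4092 | 2.50±0.76 | 7234 | 0.53±1.51 |
| | | | 3873 | 0.76±0.75 | 4331 | 0.41±0.67 | | |
| Simulated QTL | 57 | | 3638 | | 4100 | | 6644 | |
| | | | 3875 | | 4300 | | | |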
Correlations between GEBVs by different methods and between EBVs and GEBVs for the 20 sires
Correlations between GEBVs by different methods (the first 4 columns) and between traditional EBVs and GEBVs (the last column) for the 20 sires
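| | BayesB | BayesCπ | TABLUP | GBLUP | Traditional EBV |
| --- | --- | --- | --- | --- | --- |
| BayesA | 0.999 | 0.995 | 0.995 | 0.972 | 0.942 |
| BayesB | | 0.992 | 0.998 | 0.978 | 0.947 |
| BayesCπ | | | 0.986 | 0.956 | 0.933 |
| TABLUP | | | | 0.986 | 0.952 |
| GBLUP | | | | | 0.966 |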
Correlations between GEBVs by different methods for the 1000 progenies without phenotypic values
Correlations between GEBVs by different methods for the 1000 progenies without phenotypic values.
| | BayesB | BayesCπ | TABLUP | GBLUP |
| --- | --- | --- | --- | --- |
| BayesA | 0.991 | 0.985 | 0.983 | 0.841 |
| BayesB | | 0.986 | 0.997 | 0.860 |
| BayesCπ | | | 0.976 | 0.812 |
| TABLUP | | | | 0.876 |
Accuracies and biases of GEBVs
Accuracies and biases of GEBVs for the 1000 progenies without phenotypic values.
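| Method | r (accuracy: correlation between TBVs and GEBVs) | b (bias: regression of TBVs on GEBVs) |
| --- | --- | --- |
| BayesA | 0.924 | 1.063 |
| BayesB | 0.933 | 1.068 |
| BayesCπ | 0.938 | 1.057 |
| TABLUP | 0.924 | 1.033 |
| GBLUP | 0.774 | 1.648 |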
TABLUP improves on GBLUP by replacing the G matrix with the TA matrix. The construction of the TA matrix takes into account not only the marker genotypes but also the marker effects. The advantage of the TA matrix over the G matrix is that it not only accounts for the Mendelian sampling term, but also puts greater weight on loci explaining more of the genetic variance for the trait of interest. This makes TABLUP more accurate than GBLUP. On the other hand, although TABLUP and the Bayesian methods gave similar accuracies, TABLUP has two important features that the Bayesian methods lack. The first is that the reliability of an individual's GEBV can be calculated by TABLUP through the method outlined for GBLUP by VanRaden [3] and Strandén et al. [13]. The second is that TABLUP can be extended to estimate GEBVs for individuals without genotypes by constructing a joint pedigree-genomic relationship matrix according to the rule proposed by Legarra et al. [14].
Conclusions
BayesA, BayesB, BayesCπ and TABLUP performed similarly and satisfactorily and remarkably outperformed GBLUP for genomic breeding value estimation in this dataset. TABLUP is a promising method for genomic breeding value estimation because of its easy computation of reliabilities of GEBVs and its easy extension to real life conditions such as multiple traits and consideration of individuals without genotypes.
Notes
List of abbreviations used
QTL: quantitative trait locus
MAS: marker assisted selection
GS: genomic selection
BLUP: best linear unbiased prediction
GBLUP: BLUP with a realized relationship matrix
TABLUP: BLUP with a trait specific relationship matrix
EBV(s): estimated breeding value(s)
GEBV(s): genomic estimated breeding value(s)
TBV(s): true breeding value(s)
SNP: single nucleotide polymorphism
Declarations
Acknowledgements
This work was supported by the State High-Tech Development Plan of China (Grant No. 2008AA101002, 2011AA100302), the National Natural Science Foundation of China (Grant No. 30800776, 30972092, 31171200), Beijing Municipal Natural Science Foundation (Grant No. 6102016), and the Modern Pig Industry Technology System Program of Anhui Province.
This article has been published as part of BMC Proceedings Volume 6 Supplement 2, 2012: Proceedings of the 15th European workshop on QTL mapping and marker assisted selection (QTLMAS). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/6/S2.
References
1. Meuwissen THE, Hayes BJ, Goddard ME: Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001, 157: 1819-1829.
2. Solberg TR, Sonesson AK, Woolliams JA, Meuwissen THE: Reducing dimensionality for prediction of genome-wide breeding values. Genetics Selection Evolution. 2009, 41: 29. 10.1186/1297-9686-41-29.
3. VanRaden PM: Efficient methods to compute genomic predictions. J Dairy Sci. 2008, 91: 4414-4423. 10.3168/jds.2007-0980.
4. Zhang Z, Liu J, Ding X, Bijma P, de Koning DJ, Qin Z: Best linear unbiased prediction of genomic breeding values using a trait-specific marker-derived relationship matrix. PLoS ONE. 2010, 5 (9): e12648. 10.1371/journal.pone.0012648.
5. Habier D, Fernando RL, Kizilkaya K, Garrick DJ: Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics. 2011, 12: 186. 10.1186/1471-2105-12-186.
6. Yi N, Xu S: Bayesian LASSO for quantitative trait loci mapping. Genetics. 2008, 179: 1045-1055. 10.1534/genetics.107.085589.
7. Zou H, Hastie T: Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B. 2005, 67: 301-320. 10.1111/j.1467-9868.2005.00503.x.
8. Gianola D, Fernando RL, Stella A: Genomic-assisted prediction of genetic value with semiparametric procedures. Genetics. 2006, 173: 1761-1776. 10.1534/genetics.105.049510.
9. Long N, Gianola D, Rosa GJM, Weigel KA, Avendano S: Machine learning classification procedure for selecting SNPs in genomic selection: application to early mortality in broilers. J Anim Breed Genet. 2007, 124: 377-389. 10.1111/j.1439-0388.2007.00694.x.
10. Ytournel F: Linkage disequilibrium and QTL fine mapping in a selected population. PhD thesis. 2008, Station de Génétique Quantitative et Appliquée, INRA.
11. Madsen P, Jensen J: DMU: A User's Guide. A Package for Analysing Multivariate Mixed Models. University of Aarhus, Faculty of Agricultural Sciences, Department of Animal Breeding and Genetics. 2007.
12. Gianola D, de los Campos G, Hill WG, Manfredi E, Fernando RL: Additive Genetic Variability and the Bayesian Alphabet. Genetics. 2009, 183: 347-363. 10.1534/genetics.109.103952.
13. Strandén I, Garrick DJ: Technical note: derivation of equivalent computing algorithms for genomic predictions and reliabilities of animal merit. J Dairy Sci. 2009, 92: 2971-2975. 10.3168/jds.2008-1929.
14. Legarra A, Aguilar I, Misztal I: A relationship matrix including full pedigree and genomic information. J Dairy Sci. 2009, 92: 4656-4663. 10.3168/jds.2009-2061.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.