Comparison of analyses of the QTLMAS XIII common dataset. I: genomic selection.

BACKGROUND
Genomic selection, the use of markers across the whole genome, receives increasing amounts of attention and is having more and more impact on breeding programs. Development of statistical and computational methods to estimate breeding values based on markers is a very active area of research. A simulated dataset was analyzed by participants of the QTLMAS XIII workshop, allowing a comparison of the ability of different methods to estimate genomic breeding values.


METHODS
A best case scenario was analyzed by the organizers where QTL genotypes were known. Participants submitted estimated breeding values for 1000 unphenotyped individuals together with a description of the applied method(s). The submitted breeding values were evaluated for correlation with the simulated values (accuracy), rank correlation of the best 10% of individuals and error in predictions. Bias was tested by regression of simulated on estimated breeding values.


RESULTS
The accuracy obtained from the best case scenario was 0.94. Six research groups submitted 19 sets of estimated breeding values. Methods that assumed the same variance for markers showed accuracies, measured as correlations between estimated and simulated values, ranging from 0.75 to 0.89 and rank correlations between 0.58 and 0.70. Methods that allowed different marker variances showed accuracies ranging from 0.86 to 0.94 and rank correlations between 0.69 and 0.82. Methods assuming equal marker variances were generally more biased and showed larger prediction errors.


CONCLUSIONS
The best performing methods achieved very high accuracies, close to accuracies achieved in a best case scenario where QTL genotypes were known without error. Methods that allowed different marker variances generally outperformed methods that assumed equal marker variances. Genomic selection methods performed well compared to traditional, pedigree only, methods; all methods showed higher accuracies than those obtained for breeding values estimated solely on pedigree relationships.


Background
When methods for selection based on many markers across the genome, or genomic selection, were first described [1] the application of genetic marker data in plant and animal breeding programs was still limited [2]. In subsequent years the use of individual markers in breeding programs has increased [3,4]. With the availability of assays that provide genotypes for 50,000 or more markers for each individual, the application of genomic selection has started to take hold in recent years. Especially in dairy cattle the use of genomic selection is becoming common practice [5,6]. In other species the application of genomic selection is being considered or evaluated [7][8][9].
Methods to deal with this large number of markers in breeding programs were first proposed by Meuwissen et al. [1], after which a number of alternatives have been suggested, e.g. [10,11]. Most methods have been evaluated in simulations and sometimes on real data. Analyses applying genomic BLUP methodology as defined by Meuwissen et al. [1], or applying Ridge Regression (RR) [12], assume the same variance for each marker. A series of well-known methods are those named BayesA, BayesB, etc. BayesA assumes the same a priori variance for all markers, where effects are drawn from one distribution [1]. BayesB divides the markers into 2 groups: one group that contributes to the genetic variance, whose markers have the same a priori non-zero variance, and another group whose effects are assumed to be zero [1]. Another variant, sometimes referred to as "BayesC", considers two distributions: one with large effects (for markers assumed to be linked to a QTL) and one with small effects (for markers assumed not to be linked to a QTL) [11,13].
The organizers of the previous QTL-MAS workshop initiated a comparison of methods using a simulated dataset which resembled a population one might encounter in litter-bearing animals. They concluded that models that include markers as fixed effects were unlikely to provide any gain from the use of markers, while random effects models and especially the Bayesian analyses were most promising [14].
We aimed to compare methods that estimate genomic breeding values (GEBV) in a dataset that one might encounter in both plant and animal breeding programs and added the complexity of repeated measures over time. Participants of the current QTL-MAS workshop 2009 were invited to predict GEBV and describe their methods and results. Predicted GEBV from the different methods were submitted to the QTL-MAS workshop and compared to the simulated or true breeding values (TBV).

Simulated data
Eighteen QTL were simulated affecting a trait called yield that followed a logistic growth curve. The growth curve was determined by 3 parameters and for each parameter, 6 QTL determined the genetic value with one large QTL (50% of genetic variance) and 5 smaller QTL. Phenotypic values for the parameters were simulated with a heritability of 0.50. Workshop participants were provided with genotypes for a set of biallelic markers that did not include the genotypes for the 18 QTL. Data available to participants of QTLMAS XIII consisted of 100 full-sib families which resulted from factorial mating of 20 female and 5 male parents. Each full-sib family consisted of 20 offspring. Parent-offspring relationships were provided, but relationships between parents were not. All offspring had genotypes for 453 markers, distributed over 5 chromosomes of 1 Morgan each. Phenotypes were provided for the offspring of 50 full-sib families and consisted of cumulative yield values at 5 different points in time, the last time point being 530. Further details of the simulation are described elsewhere [15] and the dataset is available from http://www.qtlmas2009.wur.nl/UK/Dataset.
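As an illustration of what a three-parameter logistic growth curve for cumulative yield can look like, the following minimal sketch may help. The parameterization (asymptote, inflection time, rate), the parameter values and all time points except the last (530) are assumptions made for illustration only; the exact form used in the simulation is described in [15], not here.

```python
import math

def logistic_yield(t, asymptote, t_mid, rate):
    """Cumulative yield at time t under an assumed three-parameter logistic curve."""
    return asymptote / (1.0 + math.exp(-rate * (t - t_mid)))

# Hypothetical parameter values, chosen only for illustration.
params = {"asymptote": 100.0, "t_mid": 300.0, "rate": 0.01}

# Hypothetical time points; only the last value (530) is taken from the text.
time_points = [0, 130, 260, 390, 530]
yields = [logistic_yield(t, **params) for t in time_points]
```

In such a curve, genetic differences between individuals enter through the three parameters, which is why the simulated QTL act on the parameters rather than directly on yield.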

Best-case analysis
The workshop organizers applied a "best-case" analysis to the simulated data to provide an upper bound of the expected accuracies of the contributed analyses. This best-case analysis made use of additional information which was not provided to the workshop participants. The correct model, a logistic growth curve, was used to estimate 3 growth curve parameters from phenotypes for each individual. More importantly, the true genotypes of the QTL were used. Workshop participants could apply the correct growth model without knowing this was the case, but the actual genotypes could not be used. In the best-case analysis, the true QTL genotypes were used as the only variables in a multitrait fixed regression model to estimate the QTL effects on the 3 growth curve parameters. The estimated growth curve QTL effects were subsequently used to predict breeding values for yield at time point 600 for the unphenotyped individuals.

Prediction strategies
QTLMAS XIII participants were asked to predict breeding values for the unphenotyped offspring (n = 1000) at time point 600. Time point 600 was outside the range of time points for which phenotypes were provided. Several strategies could be followed: 1) predict phenotypes at time point 600, using any of several methods, and use these to predict breeding values, 2) use a function to describe the observed phenotypes, predict breeding values for the parameters of this function and use those to calculate EBV at time point 600, 3) predict breeding values for the 5 different time points and use these to extrapolate to time point 600, using any of several methods.

Comparison of predicted breeding values
Accuracies of GEBV, reported by workshop participants for unphenotyped offspring, were calculated as the correlation between the GEBV and the TBV. Bias was assessed from the regression of TBV on the GEBV. The ability of methods to identify the best individuals was assessed from the rank correlation between GEBV and TBV for individuals in the top 10% of TBV. Mean squared prediction error was calculated after GEBV and TBV were each centered on zero. The variances of GEBV were calculated and reported as a proportion of the variance of TBV.
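The five evaluation criteria can be sketched as follows. This is a minimal illustration, not the organizers' code: the function names, the use of a Spearman-type correlation on ranks recomputed within the top 10% subset, and the toy data are all assumptions.

```python
import math

def _mean(xs):
    return sum(xs) / len(xs)

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(xs):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return out

def evaluate(gebv, tbv, top_fraction=0.10):
    """Accuracy, bias slope, top-fraction rank correlation, centered MSE, variance ratio."""
    n = len(tbv)
    g_mean, t_mean = _mean(gebv), _mean(tbv)
    g_c = [g - g_mean for g in gebv]  # GEBV centered on zero
    t_c = [t - t_mean for t in tbv]   # TBV centered on zero
    var_g = sum(g * g for g in g_c) / (n - 1)
    var_t = sum(t * t for t in t_c) / (n - 1)
    cov_gt = sum(g * t for g, t in zip(g_c, t_c)) / (n - 1)
    # Individuals in the top fraction are chosen on TBV (assumed: at least 2 needed).
    n_top = max(2, round(top_fraction * n))
    top = sorted(range(n), key=lambda i: tbv[i], reverse=True)[:n_top]
    return {
        "accuracy": pearson(gebv, tbv),
        "slope_tbv_on_gebv": cov_gt / var_g,  # regression of TBV on GEBV
        "top_rank_correlation": pearson(ranks([gebv[i] for i in top]),
                                        ranks([tbv[i] for i in top])),
        "mse_centered": sum((g - t) ** 2 for g, t in zip(g_c, t_c)) / n,
        "variance_ratio": var_g / var_t,
    }

# Toy data: GEBV that rank individuals perfectly but are inflated by a factor 2.
tbv = [float(i) for i in range(20)]
gebv = [2.0 * t + 3.0 for t in tbv]
result = evaluate(gebv, tbv)
```

The toy example shows why accuracy alone is not sufficient: the inflated GEBV have a perfect correlation with TBV, yet the regression slope of 0.5 and the variance ratio of 4 both flag the inflation.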

Best-case analysis
The regression model applied directly to the QTL genotypes resulted in the highest accuracy (0.985) and rank correlation (0.935) of GEBV with TBV of unphenotyped individuals. However, regression of TBV on GEBV resulted in a regression coefficient of 0.847, which was relatively far from 1 compared to other methods. The variance of GEBV was also higher (34.3) than the variance of the TBV (25.3) and higher than the variance from any other estimation method.

Prediction strategies
All authors applied a procedure with two or three steps. In most cases two steps were used with one step to predict phenotypes or breeding values at time point 600 and one step to estimate genetic effects for the markers. The order of these two steps varied and various methods were applied for both extrapolation to time point 600 as well as for the estimation of marker effects.

Two step strategies
Three authors [16][17][18] started with predicting phenotypes at time point 600, which were then used to predict breeding values. Phenotypes were predicted using a Logistic model (model 5) or a Gompertz model (models 1 to 4). Predicted phenotypes at time point 600 were subsequently used in single trait analyses to predict EBV at time point 600.
Three authors [19][20][21] started by predicting EBV in single trait analyses applied to each of the time points at which phenotypes were available and subsequently used those EBV to extrapolate to an EBV at time point 600. Extrapolation to time point 600 was done using quadratic regression (model 6) or linear regression (models 7 to 18). Linear regression models only used the last 3 time points.
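The linear extrapolation step can be sketched as an ordinary least-squares line through the EBV at the last 3 time points, evaluated at time point 600. This is only an illustrative sketch: the function name, the time points other than 530 and the EBV values are hypothetical, and the participants' own implementations may have differed.

```python
def extrapolate_linear(times, values, target, n_last=3):
    """Fit a least-squares line to the last n_last points and evaluate it at target."""
    t = times[-n_last:]
    v = values[-n_last:]
    t_mean = sum(t) / len(t)
    v_mean = sum(v) / len(v)
    slope = (sum((a - t_mean) * (b - v_mean) for a, b in zip(t, v))
             / sum((a - t_mean) ** 2 for a in t))
    return v_mean + slope * (target - t_mean)

# Hypothetical time points (only 530 is from the text) and hypothetical EBV
# for one individual; the last three EBV lie exactly on the line 2 * t + 1.
times = [0, 130, 260, 390, 530]
ebv_by_time = [10.0, 200.0, 521.0, 781.0, 1061.0]
ebv_600 = extrapolate_linear(times, ebv_by_time, target=600)
```

Because only the last 3 time points enter the fit, the early, non-linear part of the growth curve has no influence on the extrapolated value.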

Three step strategy
One author [17] (model 19) applied a three step strategy where first a growth model was used, then a model to estimate marker effects and finally again a growth model to predict EBV at time point 600. In step 1 the parameters of a Gompertz model were estimated for each of the individuals with phenotypes. Marker effects were estimated for each of the 3 parameters in the Gompertz model in the second step. The third step used these estimated SNP effects to calculate the EBV of individuals at time point 600.

Comparison of predicted breeding values
A number of different estimation methods were applied by participants (Table 1). Apart from the prior knowledge analysis performed by the organizers, no fixed effect models were applied to obtain GEBV. One participant [19] applied a pedigree BLUP model (model 8) as one of their approaches which ignored all marker data. We termed 2 models as being a "genomic BLUP" implementation, which meant that each marker was assumed to have the same variance that was not (re-)estimated in the model. The other 16 models were termed "Bayes" models, which meant that they considered a priori assumptions for the marker variances, and estimated the marker variances conditional on the a priori assumptions and the estimated marker effects. All implementations differed in some aspect. The two genomic BLUP implementations used either a genomic relationship matrix [20] (model 6) or a Ridge Regression approach [18] (model 5) both of which are equivalent to a genomic BLUP implementation [22,23]. It needs to be noted that model 6 did not use all available marker information but limited itself to a single chromosome. The 16 models classified as Bayes applied various forms of marker selection or separating markers into groups with different expected effect sizes. Only one group [19] applied the use of haplotype information in some of their models using either identity by descent matrices based on 2 locus haplotypes (model 9) or identity by state haplotypes (models 10 and 11).
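The equivalence between Ridge Regression on markers and a genomic BLUP based on a genomic relationship matrix can be checked numerically on a toy example. The sketch below is an illustration under simplifying assumptions, not any participant's implementation: there are no fixed effects, phenotypes and genotype codes are assumed to be centered, G is taken as ZZ' without scaling, and the shrinkage parameter lam as well as all data values are hypothetical.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def add_ridge(a, lam):
    """Return a + lam * I for a square matrix a."""
    return [[a[i][j] + (lam if i == j else 0.0) for j in range(len(a))]
            for i in range(len(a))]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ridge_gebv(Z, y, Z_new, lam):
    """Marker model: effects = (Z'Z + lam I)^-1 Z'y, GEBV = Z_new * effects."""
    Zt = transpose(Z)
    effects = solve(add_ridge(matmul(Zt, Z), lam), matvec(Zt, y))
    return matvec(Z_new, effects)

def gblup_gebv(Z, y, Z_new, lam):
    """Relationship model: GEBV = (Z_new Z') (ZZ' + lam I)^-1 y."""
    Zt = transpose(Z)
    alpha = solve(add_ridge(matmul(Z, Zt), lam), y)
    return matvec(matmul(Z_new, Zt), alpha)

# Hypothetical data: 4 phenotyped individuals, 3 markers coded -1/0/1,
# and 2 unphenotyped individuals to be predicted.
Z = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0], [-1.0, 1.0, 0.0], [1.0, -1.0, 0.0]]
y = [1.2, -0.4, 0.3, -1.1]
Z_new = [[0.0, 0.0, 1.0], [1.0, 1.0, -1.0]]
from_markers = ridge_gebv(Z, y, Z_new, lam=2.0)
from_relationships = gblup_gebv(Z, y, Z_new, lam=2.0)
```

Both routes give the same GEBV because Z(Z'Z + lam I)^-1 Z' equals ZZ'(ZZ' + lam I)^-1. Which route is cheaper depends on whether there are fewer markers or fewer individuals.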

Accuracy
The lowest accuracy was 0.647, obtained with pedigree BLUP (model 8), clearly below the second lowest model (model 6) with an accuracy of 0.751. All other methods performed markedly better than the genomic BLUP model 6 (Figure 1). The other genomic BLUP model (model 5) showed an accuracy of 0.889, which was within the lower half of the range of the Bayes models. Accuracies obtained with the Bayes models were between 0.857 and 0.945. The model with the most accurate predictions (model 3) had an accuracy only 0.04 below that of the best-case analysis (0.985).

Rank correlations
Rank correlations were calculated from the ranking of the top 10% of individuals, based on TBV. The range of rank correlations for the Bayes methods was 0.691 to 0.816 (Figure 2). The two best methods switched positions when measured on rank correlation versus accuracy, but overall the evaluation of methods based on accuracy was very similar to evaluation based on their ability to rank the top individuals (correlation = 0.91).

Bias
The pedigree BLUP analysis was found to have a regression coefficient that was closest to 1 (Figure 3). The genomic BLUP method by Schulz-Streek (model 5) also yielded a regression coefficient very close to 1, while the variance of GEBV from that model was approximately twice the variance of EBV obtained with pedigree BLUP (Table 2, Figure 4). Regression coefficients of Bayes models ranged from 0.804 for a Bayesian implementation of LASSO (model 2) to 1.16 for a BayesB implementation (model 16).

Prediction error
Average prediction error was largest for the pedigree BLUP EBV, which is due to the relatively low accuracies for this method and hence significant shrinkage of the resulting EBV (Figure 5). Smallest prediction errors were obtained with the most accurate prediction method (model 3). Average prediction error showed a very strong correlation with accuracy (r = -0.99).

2-step and 3-step methods
Accuracy of GEBV from the 3-step method (model 19) was 0.897, which was very similar to the average accuracy, 0.893, of all the other Bayes methods. Because the simulated QTL affect the parameters of the true logistic growth curve, it might have been expected that methods that look for associations of markers with these underlying parameters would have an advantage. In this dataset, however, this does not appear to be the case. Extrapolation to time point 600 was not a big challenge because for most individuals this time point was within the part of the growth curve where growth was almost linear. In fact, two authors extrapolated in exactly this way, by linear regression on the last three time points, and obtained high accuracies. The impact of the extrapolation method would have been bigger had a time point closer to the asymptote been chosen. However, extrapolation of the data was a secondary objective of the QTLMAS workshop, whereas comparison of methods to obtain GEBV was the primary objective.

Bayes and genomic BLUP methods
QTL contributing to the three parameters of the growth curve were unequal in size. This was expected to favor Bayes methods over genomic BLUP methods that assume the same variance for all markers. This difference was not directly apparent from the results presented in this comparison, as one of the genomic BLUP models (model 6) resulted in a lower accuracy (0.751) than all other marker methods, while the accuracy (0.889) from the other genomic BLUP method (model 5) was higher than that of some of the Bayes methods. The GEBV from model 6 were obtained using a genetic covariance matrix built from a subset of 90 markers that were selected from only the first of the five chromosomes, which may have reduced the accuracy of the resulting GEBV. Most likely the use of a polygenic component increased the accuracy of model 5. While a polygenic component was not used by model 6, some of the Bayes methods did include polygenes (models 7 to 18). In addition to the relatively high accuracy, the best genomic BLUP model (model 5) produced unbiased GEBV, whereas many of the other methods showed moderately to severely biased results.
Table 1 Extrapolation, prediction and estimation methods
The structure of the data was such that full-sib families were either completely phenotyped or completely unphenotyped. This structure was chosen to prevent models from benefiting too much from close relationships between the animals in the training and validation data.
The larger the distance between training and validation individuals, the more the prediction of GEBV for animals without phenotypes relies on LD information [24]. Bayes methods can employ LD to focus variance on specific parts of the genome, whereas genomic BLUP methods only employ the genomic relationship over the whole genome. Therefore, in addition to the small number of QTL with relatively large effects, the population structure was most likely also more beneficial for the Bayes models than for the genomic BLUP models.

Number of markers included in the model
The proportion of markers, π, selected into the model or into the distribution of large marker effects was reported by Pong-Wong and found to be relatively high, even close to 1 for one of their methods in which the model was allowed to estimate the value of π. Nevertheless, the proportion of markers that were included in the distribution of markers with large effects in more than half of the cycles (i.e. that had a posterior probability > 0.5) was limited. The posterior proportions in the various distributions were not reported by the other authors. Results obtained with a method that included only 14 markers (model 7) show that very high accuracies could be obtained, at least in this dataset, with a small number and a small proportion of the total number of markers selected into the model. The 14 SNPs included in model 7 were selected based on their association with the phenotype and analysed with a Bayes model. Preselection of SNPs followed by analysing them as fixed effects was not considered by any of the participants. Results from the QTLMAS comparison in 2008 [14], as well as the first comparison with genomic BLUP, BayesA and BayesB [1], already showed a low accuracy for such fixed effects models, which can be expected when many markers are available to be selected into the model.
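The distinction between the overall proportion π and the markers with a posterior probability > 0.5 can be made concrete with a small sketch. It assumes, hypothetically, that the sampler stores for every cycle a 0/1 indicator per marker telling whether that marker was assigned to the large-effect distribution; the function name and the toy samples are illustrative only.

```python
def posterior_inclusion(indicator_samples):
    """Per-marker fraction of cycles in which the marker was in the large-effect group."""
    n_cycles = len(indicator_samples)
    n_markers = len(indicator_samples[0])
    return [sum(cycle[j] for cycle in indicator_samples) / n_cycles
            for j in range(n_markers)]

# Hypothetical indicators for 4 sampling cycles (rows) and 3 markers (columns).
indicator_samples = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]
pip = posterior_inclusion(indicator_samples)

# Markers included in more than half of the cycles (posterior probability > 0.5).
selected = [j for j, p in enumerate(pip) if p > 0.5]

# Average proportion of markers in the large-effect group across cycles.
mean_proportion = sum(pip) / len(pip)
```

In the toy example the average proportion in the large-effect group is close to 0.6, yet only one marker exceeds the 0.5 threshold, which mirrors how a high π can coexist with few consistently selected markers.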

Conclusions
Accuracies of GEBV were always higher than those estimated based on pedigree alone (model 8). Methods that allow different variances of markers generally performed better than genomic BLUP methods that assume equal variance for all markers, but differences were not very large, except when only a portion of the genome was used (model 6). The best Bayes method achieved an accuracy that was 0.056 higher than the best genomic BLUP method. The simulated QTL varied strongly in size, which will have favored the Bayes methods in the comparison of accuracies between the two types of methods. The highest accuracies obtained were very close to those from the best-case analysis, where knowledge about QTL genotypes was used. Methods to extrapolate to time point 600 from the observed phenotypes at time points up to 530 appear to have had a minor impact on the accuracies of GEBV.