Genomic selection in tree breeding: testing accuracy of prediction models including dominance effect

Background
The concept of marker-assisted selection (MAS) is evolving rapidly in animal and plant breeding. With the advent of high-throughput molecular technologies, numerous molecular markers distributed across the whole genome can be produced to characterize many genetic entries, opening new perspectives for selection methodology. Substantial research activity began in animal breeding following the first theoretical framework for a methodology called genomic selection (GS) [1]. Several statistical approaches have been proposed for predicting genomic breeding values, and numerous results are available that validate the value of this method in animal breeding. In plants, GS is still limited to very advanced model species involved in genetic improvement, and the evidence comes mainly from scenario-based simulation [2,3].
In tree breeding, GS could significantly reduce the cost of genetic improvement schemes by limiting the size and number of field experiments and by facilitating early selection at the nursery stage [4]. While most studies on GS have addressed the prediction of breeding value, which takes into account additive gene effects, there is still a lack of analyses dealing with the total genetic value (genotypic value), which includes both additive and dominance effects. This aspect is important in plant breeding, and especially in tree breeding, where the goal of some programs is the production of clones or elite families. The aim of this study is to investigate the performance of GS in the context of tree breeding when selection is based on genotypic value. The proposed approach takes into account both additive and dominance effects [5] for each marker in the statistical model. Six scenarios are simulated to test the reliability of GS within a recurrent selection scheme.

Simulation
The data used to evaluate the accuracy of the model were simulated using the HaploSim package in R [6].
First, populations were simulated for 1000 generations at an effective size of 100 to reach mutation-drift balance. Fifty parent trees were then selected to start a breeding scheme that was conducted over two generations. At each generation, a progeny test was implemented using a factorial mating design. Fifty percent of the parents were selected and crossed using a circular design to constitute the following generation. At each generation, 670 individuals derived from the mating of 16 females and 34 males were evaluated for clonal selection. The 670 individuals were genotyped for 400 SNP markers equally spaced along a single chromosome of one Morgan, corresponding to an efficient marker density.
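To give a sense of the marker density described above, the following minimal sketch lays out 400 equally spaced positions on a 100 cM chromosome and computes the recombination fraction between adjacent markers. Two choices are assumptions not stated in the text: the first and last markers are placed at the chromosome ends, and Haldane's mapping function is used to convert map distance into a recombination fraction. The function names are illustrative and are not part of HaploSim.

```python
import math


def marker_positions_cm(n_markers=400, chrom_length_cm=100.0):
    """Equally spaced marker positions in cM.

    Assumption: the first and last markers sit at the two chromosome ends.
    """
    step = chrom_length_cm / (n_markers - 1)
    return [i * step for i in range(n_markers)]


def haldane_recombination(distance_cm):
    """Recombination fraction for a map distance in cM.

    Assumption: Haldane's mapping function (no crossover interference).
    """
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


positions = marker_positions_cm()
spacing = positions[1] - positions[0]        # distance between neighbours, in cM
r_adjacent = haldane_recombination(spacing)  # recombination between neighbours
```

Under these assumptions adjacent markers are roughly 0.25 cM apart, so recombination between neighbouring SNPs is rare within a single generation.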
The broad sense heritability H² was equal to 0.3. A gamma distribution was used to sample the 44 QTL effects.
The additive (breeding), dominance and genotypic values were simulated for each individual. The ratio of dominance to additive variance was set to 0.1, 0.5 or 1. Six scenarios were evaluated for predicting the genotypic value, combining the three variance ratios with two QTL distributions (a high proportion of QTL with small effects or with medium effects).
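The way genotypic values and phenotypes relate to QTL effects and to H² can be sketched as follows. This is a simplified illustration, not the HaploSim procedure: the gamma shape and scale, the random sign of effects, the uniformly drawn genotypes and the coding (additive effect times allele count minus one, dominance effect added for heterozygotes) are all assumptions made for the example. The sketch does not rescale effects to reach a target ratio of dominance to additive variance.

```python
import random
import statistics


def sample_qtl_effects(n_qtl, shape, scale, rng):
    """Gamma-distributed effect sizes with a random sign (sign rule is an assumption)."""
    return [rng.gammavariate(shape, scale) * rng.choice((-1.0, 1.0))
            for _ in range(n_qtl)]


def genotypic_value(allele_counts, add_effects, dom_effects):
    """Sum of additive and dominance contributions over all QTL.

    allele_counts holds 0, 1 or 2 copies of the reference allele per QTL.
    """
    additive = sum(a * (x - 1) for a, x in zip(add_effects, allele_counts))
    dominance = sum(d for d, x in zip(dom_effects, allele_counts) if x == 1)
    return additive + dominance


def environmental_variance(var_genotypic, h2_broad):
    """Residual variance such that var_G / (var_G + var_E) equals h2_broad."""
    return var_genotypic * (1.0 - h2_broad) / h2_broad


def simulate_phenotypes(genotypic_values, h2_broad, rng):
    """Add normally distributed environmental noise to the genotypic values."""
    var_g = statistics.pvariance(genotypic_values)
    sd_e = environmental_variance(var_g, h2_broad) ** 0.5
    return [g + rng.gauss(0.0, sd_e) for g in genotypic_values]


rng = random.Random(1)
n_qtl, n_ind = 44, 670
add_effects = sample_qtl_effects(n_qtl, 0.4, 1.0, rng)  # shape and scale are placeholders
dom_effects = sample_qtl_effects(n_qtl, 0.4, 1.0, rng)  # placeholders as well
genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_qtl)] for _ in range(n_ind)]
g_values = [genotypic_value(g, add_effects, dom_effects) for g in genotypes]
phenotypes = simulate_phenotypes(g_values, 0.3, rng)
```

With H² = 0.3, the environmental variance is 7/3 of the genotypic variance, which is what `environmental_variance` returns.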

Analysis model
Genomic selection consists of the following steps: (i) estimation of the effects of all markers in a 'training data set', where the individuals are phenotyped and genotyped; (ii) prediction of the genetic values of other 'evaluation' individuals by combining their marker genotypes with the estimates obtained in step (i).
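The two steps can be illustrated with a minimal sketch. For brevity, step (i) uses ridge-penalized regression fitted by coordinate descent as a stand-in estimator; it is not the Bayesian Lasso applied in this study. The marker coding (allele count minus one for the additive term, a heterozygote indicator for the dominance term), the function names and the toy data are assumptions made for the example.

```python
def code_markers(allele_counts, with_dominance=True):
    """One design row: additive codes (x - 1), then optional heterozygote indicators."""
    row = [float(x - 1) for x in allele_counts]
    if with_dominance:
        row += [1.0 if x == 1 else 0.0 for x in allele_counts]
    return row


def fit_marker_effects(X, y, lam=1.0, n_sweeps=200):
    """Step (i): ridge estimates of all marker effects by coordinate descent.

    Stand-in estimator (assumption); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    mu, beta = 0.0, [0.0] * p
    resid = list(y)  # residuals y - mu - X beta, kept up to date
    col_ss = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(n_sweeps):
        # Update the intercept so that residuals have zero mean.
        shift = sum(resid) / n
        mu += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            denom = col_ss[j] + lam
            if denom == 0.0:
                continue
            # Inner product of column j with the residual excluding its own term.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_ss[j] * beta[j]
            delta = rho / denom - beta[j]
            beta[j] += delta
            for i in range(n):
                resid[i] -= X[i][j] * delta
    return mu, beta


def predict(mu, beta, X):
    """Step (ii): combine marker codes with the estimated effects."""
    return [mu + sum(b * x for b, x in zip(beta, row)) for row in X]


# Toy training set with one marker; phenotypes built as 5 + 2*(x - 1) + 1*[x == 1].
train_genotypes = [[0], [1], [2], [2], [0]]
train_phenotypes = [3.0, 6.0, 7.0, 7.0, 3.0]
X_train = [code_markers(g) for g in train_genotypes]
mu_hat, beta_hat = fit_marker_effects(X_train, train_phenotypes, lam=1e-9, n_sweeps=500)

# Evaluation individuals are genotyped only; their values are predicted.
X_eval = [code_markers(g) for g in [[1], [2]]]
predicted = predict(mu_hat, beta_hat, X_eval)
```

Calling `code_markers` with `with_dominance=False` gives the additive-only design, so the same two functions can be used to compare models with and without dominance effects.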
A Bayesian implementation of the Lasso method, available in the BLR package in R [7,8], was used to estimate the substitution and dominance effects for each of the 400 SNPs. This method predicts the genotypic value using all markers simultaneously, with a different variance for each marker effect. We evaluated the performance of the statistical model with and without dominance effects. The training and validation sets corresponded to the first and second generations, respectively, each containing 670 individuals. The criterion used to compare the scenarios was accuracy, calculated as the correlation between true and predicted genotypic values. Each simulated data set and analysis was replicated 30 times.
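The accuracy criterion can be written out as a short sketch. Pearson's correlation is assumed here, and averaging over replicates is one plausible way to summarize the 30 runs; the function names and the toy values are illustrative.

```python
import math


def accuracy(true_values, predicted_values):
    """Pearson correlation between true and predicted genotypic values."""
    n = len(true_values)
    mean_t = sum(true_values) / n
    mean_p = sum(predicted_values) / n
    cov = sum((t - mean_t) * (p - mean_p)
              for t, p in zip(true_values, predicted_values))
    ss_t = sum((t - mean_t) ** 2 for t in true_values)
    ss_p = sum((p - mean_p) ** 2 for p in predicted_values)
    return cov / math.sqrt(ss_t * ss_p)


def mean_accuracy(replicates):
    """Average accuracy over replicates, each given as a (true, predicted) pair."""
    scores = [accuracy(t, p) for t, p in replicates]
    return sum(scores) / len(scores)


# Toy replicates: one perfectly concordant, one perfectly discordant.
toy_replicates = [
    ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
]
summary = mean_accuracy(toy_replicates)
```

Because the correlation is invariant to the scale and location of the predictions, this criterion measures how well individuals are ranked, which is the property that matters for selection.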

Results
The accuracy of both models decreases as the ratio of dominance to additive variance increases, whatever the QTL distribution (Table 1). In all scenarios, GS is superior to basic phenotypic selection. In addition, the model with dominance effects shows a higher accuracy, especially as the variance ratio increases. For the second generation, the same trend is observed, but the model with dominance effects is more accurate when the ratio of variances is greater than 0.5. A lower accuracy is observed in the second generation; it can be attributed to the low linkage disequilibrium of this breeding population, two generations after selection in the simulated wild population.

Conclusions
The model including dominance effects predicts the genotypic value more accurately, especially as the dominance-to-additive variance ratio increases. These results are particularly relevant for tree improvement in hybrid populations where dominance effects are marked and clonal varieties are produced (for example, eucalyptus and poplar).