Simultaneous QTL detection and genomic breeding value estimation using high density SNP chips
© Veerkamp et al; licensee BioMed Central Ltd. 2010
Published: 31 March 2010
The simulated dataset of the 13th QTL-MAS workshop was analysed to i) detect QTL and ii) predict breeding values for animals without phenotypic information. Several parameterisations considering all SNP simultaneously were applied using Gibbs sampling.
Fourteen QTL were detected at the different time points. Correlations between estimated breeding values from the different models were high, except for the model that assumed that all SNP effects came from one distribution. The model that used only the 14 selected SNP associated with QTL gave correlations close to unity with the full parameterisations.
Nine out of 18 QTL were detected; however, the six QTL for inflection point were missed. Models for genomic selection appeared to be fairly robust, e.g. with respect to accuracy of estimated breeding values. Still, it is worthwhile to investigate the number of QTL underlying the quantitative traits before choosing the model used for genomic selection.
High density SNP chips with ~50,000 SNPs have become available for most livestock species. Breeding value estimation using all these SNPs simultaneously is expected to yield the highest accuracy. Several parameterisations of the SNP effects in the statistical model have been suggested [2–5]. The objectives of this study were to accurately identify QTL and predict breeding values in the simulated data of the 13th QTL-MAS workshop, using different parameterisations for the SNP effects.
The simulated data of the 13th QTL-MAS workshop are described by Coster et al. Simulated data were analysed per time point and, for QTL detection, the change in the trait between subsequent time points was also used. A pedigree-based model was fitted using ASREML. The Gibbs sampler described initially by Meuwissen and Goddard and by Calus et al. [4, 5] was used for models including the SNP parameterisations. The general model used was:
$$ y_i = \mu + \mathrm{animal}_i + \sum_{j=1}^{n_{loc}} \sum_{k=1}^{2} \mathrm{haplotype}_{ijk} + e_i $$
where y_i is the phenotypic record of animal i, µ is the average phenotypic performance, animal_i is the random polygenic effect for animal i, haplotype_ijk is a random effect for a paternal (k = 1) or maternal (k = 2) haplotype at locus j (of n_loc loci) of animal i, and e_i is a random residual for animal i. The first parameterisation was a simple BLUP model that included only the additive relationship matrix between the animals. Other parameterisations assumed that the SNP effects came from one distribution (SNP1, i.e. BayesA), from two distributions (SNP2, i.e. BayesC), or from three distributions allowing for small, medium and large SNP effects (SNP3). A further parameterisation (IBD) assumed that a QTL was located between two SNP, and 453 IBD matrices were calculated for all the haplotypes at a bracket using linkage disequilibrium and linkage analysis information. Finally, a parameterisation used the phased genotypes to construct identical by state haplotypes from either 2 or 5 SNP (IBS2 and IBS5, respectively), as presented before by Villumsen et al., but with the addition that the same SNP were used at the border of two neighbouring brackets. The final reduced model included only the 14 selected SNP that had a posterior probability >0.1 of being associated with a QTL in the SNP2 analysis.
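As an illustration of the last selection step, the posterior probability of a SNP being associated with a QTL can be approximated by the fraction of Gibbs iterations in which that SNP was assigned to the large-effect distribution. The sketch below assumes that the sampler stores one 0/1 indicator per SNP per iteration; the function name `select_snps` and the toy data are hypothetical and are not taken from the software used in this study.

```python
import numpy as np

def select_snps(indicator_samples, threshold=0.1):
    """Select SNP by posterior inclusion probability.

    indicator_samples: (n_iterations, n_snp) array of 0/1 draws; 1 means the
    SNP was assigned to the large-effect distribution in that iteration
    (assumed storage layout, hypothetical).
    """
    samples = np.asarray(indicator_samples, dtype=float)
    # Posterior probability = share of iterations with indicator equal to 1.
    post_prob = samples.mean(axis=0)
    selected = np.flatnonzero(post_prob > threshold)
    return selected, post_prob

# Toy data: 10 iterations, 4 SNP (illustrative values only).
toy = np.zeros((10, 4))
toy[:2, 1] = 1   # SNP 1 included in 2 of 10 iterations
toy[:9, 3] = 1   # SNP 3 included in 9 of 10 iterations
selected, post_prob = select_snps(toy, threshold=0.1)
print(selected, post_prob)
```

In practice the burn-in iterations would be discarded before averaging, and the 10% threshold is as arbitrary here as it was in the analysis.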
An important question is how to model the time series data and extrapolate the breeding values to the required time point 600. The trait means indicated that points 265, 397 and 530 are in the linear part of the growth curve, which was confirmed by high phenotypic and genetic correlations between those points (> 0.95). Graphical inspection confirmed that little information was available to estimate the inflection point or asymptotic values at the population, individual or genetic level. Therefore, all five time points were analysed separately, and a linear regression fitted through the breeding values at points 265, 397 and 530 was used to extrapolate breeding values to the required point 600.
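The extrapolation step can be sketched as follows: a straight line is fitted per animal through its three EBV and evaluated at point 600. This is a minimal sketch that assumes ordinary least squares with equal weights for the three points; the function name `extrapolate_ebv` and the example values are hypothetical.

```python
import numpy as np

TIME_POINTS = np.array([265.0, 397.0, 530.0])

def extrapolate_ebv(ebv, target=600.0, times=TIME_POINTS):
    """Extrapolate EBV to `target` with one least-squares line per animal.

    ebv: (n_animals, 3) array with EBV at the three time points.
    Returns an (n_animals,) array of predicted EBV at `target`.
    """
    ebv = np.asarray(ebv, dtype=float)
    # np.polyfit fits one polynomial per column of y, so animals go in columns.
    slope, intercept = np.polyfit(times, ebv.T, deg=1)
    return intercept + slope * target

# Toy EBV (illustrative values only): animal 0 lies on EBV = 0.01 * t,
# animal 1 has a constant EBV of 1.0.
toy_ebv = np.array([[2.65, 3.97, 5.30],
                    [1.00, 1.00, 1.00]])
pred_600 = extrapolate_ebv(toy_ebv)
print(pred_600)
```

Because only three points in the linear part of the curve are used, the prediction ignores any flattening of the growth curve between 530 and 600.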
Table 1: Evaluation of predicted breeding values (EBV) at point 600 for the animals without phenotypic data. Columns: association of EBV with true breeding value; mean squared error.
Using all SNP simultaneously, 14 QTL were identified with relatively sharp peaks in posterior probability; 9 of these were within 5 cM of one of the 18 simulated QTL, and all 14 were within 10 cM. Surprisingly few false positive QTL were found, especially since the cut-off point of 10% for the posterior probability was set arbitrarily. In the context of the simulated growth curve model, five QTL were found for the asymptote, and four were close to the simulated QTL for relative growth rate. In our analysis these QTL for relative growth rate were found at the first time points only, as expected, since their effect on the variance is largest there. As suggested by the preliminary analysis, no QTL was found within 5 cM of the QTL affecting the inflection point, although one QTL on chromosome 2 was close. It would be interesting to see whether using the growth model in the analysis would be more successful in picking up the QTL for the inflection point, since such a model resembles the underlying simulated model more closely and requires two fewer parameters to be estimated than the model used here. The disadvantage of fitting the growth curve model might be that the sampling covariance between the three parameters, together with the inability to separate these parameters in the current data, might lead to more spurious QTL estimates.
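Counting detected QTL within a given distance of a simulated QTL can be sketched as below. The sketch assumes that each QTL is described by a chromosome number and a position in cM and that a detected QTL counts as a match when any simulated QTL on the same chromosome lies within the window. The function name `count_matches` and all positions are hypothetical and do not correspond to the workshop data.

```python
def count_matches(detected, simulated, window_cm=5.0):
    """Count detected QTL lying within `window_cm` of a simulated QTL.

    detected, simulated: lists of (chromosome, position_in_cM) tuples.
    Distances are only compared within the same chromosome.
    """
    hits = 0
    for chrom, pos in detected:
        if any(c == chrom and abs(pos - p) <= window_cm for c, p in simulated):
            hits += 1
    return hits

# Hypothetical positions, for illustration only.
detected = [(1, 12.0), (2, 48.0), (3, 90.0)]
simulated = [(1, 10.0), (2, 40.0), (4, 90.0)]
within_5 = count_matches(detected, simulated, window_cm=5.0)
within_10 = count_matches(detected, simulated, window_cm=10.0)
print(within_5, within_10)
```

Detected QTL that match no simulated QTL at the widest window would be counted as false positives under this definition.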
Little difference was found between the IBD and SNP methods, although some of the peaks were distributed across more SNP when using IBD. This might be linked to the genetic history of the QTL or to the parameterisation. For example, when the QTL is located exactly at a SNP, using brackets of two SNP will split the effect across the two adjacent brackets.
From the correlations and the MSE, the breeding values appear fairly robust across the different models, with the exception of the model assuming that all SNP effects can be captured with one distribution. Model SNP1 is the exception because its assumption on the distribution of the SNP effects is violated: some large QTL were present and most SNP had no effect in the simulated data. It is interesting to observe that, apart from the BLUP analysis, all regression coefficients deviated from one (Table 1). The coefficient for SNP1 was below one and those for the other models were above one; we have no explanation for this difference. The analysis including a subset of 14 SNP gave high correlations with the fully parameterised methods, suggesting that there was considerable scope for reducing the number of SNP required once the QTL had been estimated in this dataset. This is in agreement with findings in real data.
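The three statistics discussed here can be computed as in the following sketch. It assumes that the regression coefficient is that of true breeding value (TBV) on EBV, which is the usual convention but is not stated explicitly in the text, and that the MSE is the mean squared difference between EBV and TBV. The function name `evaluate_ebv` and the toy values are hypothetical.

```python
import numpy as np

def evaluate_ebv(tbv, ebv):
    """Return (correlation, regression of TBV on EBV, mean squared error)."""
    tbv = np.asarray(tbv, dtype=float)
    ebv = np.asarray(ebv, dtype=float)
    corr = np.corrcoef(tbv, ebv)[0, 1]
    # Slope of TBV regressed on EBV (assumed direction): cov(TBV, EBV) / var(EBV).
    slope = np.cov(tbv, ebv)[0, 1] / np.var(ebv, ddof=1)
    mse = np.mean((ebv - tbv) ** 2)
    return corr, slope, mse

# Toy values (illustrative only): EBV are exactly half the TBV, so the
# ranking is perfect but the EBV are too small in scale.
toy_tbv = np.array([1.0, 2.0, 3.0, 4.0])
toy_ebv = 0.5 * toy_tbv
corr, slope, mse = evaluate_ebv(toy_tbv, toy_ebv)
print(corr, slope, mse)
```

The toy case shows why the correlation and the regression coefficient are reported separately: a model can rank animals perfectly while its EBV are on the wrong scale, giving a slope that deviates from one.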
Nine out of 18 QTL were detected; however, the six QTL for inflection point were missed. Models for genomic selection appeared to be fairly robust. Still, it is worthwhile to investigate the number of QTL underlying the quantitative traits before choosing the model used for genomic selection.
Hendrix Genetics, CRV B.V. and NWO-Casimir (The Netherlands Organization for Scientific Research) are acknowledged for financial support for MPLC. KLV was supported by the Erasmus Mundus Sabretrain project and HM by the EU SABRE project. RFV was funded by the EU RobustMilk project.
This article has been published as part of BMC Proceedings Volume 4 Supplement 1, 2009: Proceedings of 13th European workshop on QTL mapping and marker assisted selection.
The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/4?issue=S1.
- THE Meuwissen, ME Goddard: Mapping multiple QTL using linkage disequilibrium and linkage analysis information and multitrait data. Genet Sel Evol. 2004, 36 (3): 261-279. 10.1186/1297-9686-36-3-261.
- THE Meuwissen, ME Goddard: Prediction of identity by descent probabilities from marker-haplotypes. Genet Sel Evol. 2001, 33: 605-634. 10.1186/1297-9686-33-6-605.
- TM Villumsen, LLG Janss, MS Lund: The importance of haplotype length and heritability using genomic selection in dairy cattle. Journal of Animal Breeding and Genetics. 2009, 126: 3-13. 10.1111/j.1439-0388.2008.00747.x.
- MPL Calus, THE Meuwissen, APW de Roos, RF Veerkamp: Accuracy of Genomic Selection Using Different Methods to Define Haplotypes. Genetics. 2008, 178: 553-561. 10.1534/genetics.107.080838.
- M Calus, T Meuwissen, J Windig, E Knol, C Schrooten, A Vereijken, R Veerkamp: Effects of the number of markers per haplotype and clustering of haplotypes on the accuracy of QTL mapping and prediction of genomic breeding values. Genetics Selection Evolution. 2009, 41: 11. 10.1186/1297-9686-41-11.
- A Coster, J Bastiaansen, M Calus, C Maliepaard, M Bink: QTLMAS 2009: Simulated Dataset. BMC Proceedings. 2010, 4 (Suppl 1): S3. 10.1186/1753-6561-4-S1-S3.
- AR Gilmour, BR Cullis, SJ Welham, R Thompson: ASREML. Program user manual. NSW Agriculture, Orange Agricultural Institute, Forest Road, Orange, NSW, 2800, Australia. 2000.
- BJ Hayes, PJ Bowman, AJ Chamberlain, ME Goddard: Invited review: Genomic selection in dairy cattle: Progress and challenges. J. Dairy Sci. 2009, 92: 433-443. 10.3168/jds.2008-1646.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.