Comparison of analyses of the QTLMAS XII common dataset. I: Genomic selection

Lund, Mogens Sandø; Sahana, Goutam; de Koning, Dirk-Jan; Su, Guosheng; Carlborg, Örjan

doi:10.1186/1753-6561-3-S1-S1

Volume 3 Supplement 1

Proceedings of the 12th European workshop on QTL mapping and marker assisted selection

Introduction
Open access
Published: 23 February 2009

Comparison of analyses of the QTLMAS XII common dataset. I: Genomic selection

Mogens Sandø Lund¹,
Goutam Sahana¹,
Dirk-Jan de Koning²,
Guosheng Su¹ &
…
Örjan Carlborg³

BMC Proceedings volume 3, Article number: S1 (2009) Cite this article

4610 Accesses
47 Citations
Metrics details

Abstract

A dataset was simulated and distributed to participants of the QTLMAS XII workshop who were invited to develop genomic selection models. Each contributing group was asked to describe the model development and validation as well as to submit genomic predictions for three generations of individuals, for which they only knew the genotypes. The organisers used these genomic predictions to perform the final validation by comparison to the true breeding values, which were known only to the organisers. Methods used by the 5 groups fell in 3 classes 1) fixed effects models 2) BLUP models, and 3) Bayesian MCMC based models. The Bayesian analyses gave the highest accuracies, followed by the BLUP models, while the fixed effects models generally had low accuracies and large error variance. The best BLUP models as well as the best Bayesian models gave unbiased predictions. The BLUP models are clearly sensitive to the assumed SNP variance, because they do not estimate SNP variance, but take the specified variance as the true variance. The current comparison suggests that Bayesian analyses on haplotypes or SNPs are the most promising approach for Genomic selection although the BLUP models may provide a computationally attractive alternative with little loss of efficiency. On the other hand fixed effect type models are unlikely to provide any gain over traditional pedigree indexes for selection.

Introduction

Genomic selection

Hybrid Marker Assisted Selection (MAS) schemes were the first tool proposed to include information on a few main genes or quantitative trait loci (QTL) into best linear unbiased prediction (BLUP) of breeding values e.g. [1]. More recently, genomic selection (GS) [2] was proposed. This approach relies on a genome-wide dense marker map, such that markers in linkage disequilibrium (LD) with each QTL are available. GS hence utilizes the data on all available markers to produce genomic estimated breeding values (GEBV) by summing the effects of all small chromosome segments characterized by their marker alleles. Because predictions using GS are based on marker associations and not pedigree information, the requirement to have phenotypes on selection candidates or their close relatives is relaxed and a breeding value can be obtained as soon as the genotypes are available. As a result, the method has the potential to increase genetic progress as well as reducing costs [3]. In breeding schemes with long generation intervals (e.g. cattle) the increase will mainly be a consequence of shortening the interval. In breeding schemes with shorter generation intervals the genetic gain may be increased due to higher accuracies of breeding values at the time of selection. With new genotyping technologies becoming available for livestock species (eg. Bovine50 Bead Chip http://www.illumin.com), GS is now becoming a very attractive approach to predict breeding values.

Several statistical methods have been proposed to be used in genomic prediction models [2, 4]. Apart from the specific models used, the prediction of genomic breeding values also involve steps of data editing, choice of response variable, utilization of marker information, and model validation.

The aim of this study was to compare approaches used to predict genomic breeding values in a situation that mimics real life. This was done by distributing genotypic data, phenotypic data, and pedigree information, but not the true breeding values to participants of the QTL-MAS XII workshop. Participants then applied various approaches to predict genomic breeding values for three generations of non-phenotyped individuals. The organisers then compared the predictions with true simulated breeding values.

Methods

Simulation of data

Pedigree

The pedigree was simulated in three parts as illustrated in Figure 1. First a historic population was simulated with 50 generations without records (G_h1 to G_h50), followed by 4 generations (G_t1 to G_t4; the training set), with both genotype and phenotype records, and finally the last 3 generations (G_v1 to G_v3; the validation set) were only genotypes and true breeding values were generated. The historic population was created by 100 founder individuals (50 males and 50 females) in generations G_h1. For each of the subsequent 50 generations, 50 males and 50 females were produced by randomly sampling parents from the previous generation. The base generation of the recorded pedigree (G_t1) had 15 males and 150 females. The parents for these were sampled randomly from individuals in G_h50. Each male was mated to 10 females and each mating pair produced 10 offspring. This created a fullsib-halfsib design, in which each male had 100 progeny and each female had 10 progeny and a total of 1500 individuals per generation. In the following 5 generations (G_t2 to G_t4 and G_v1 to G_v2), 15 males and 150 females were selected randomly to be parents of the next generation and the same mating design was repeated. In the last 3 generations (G_v1 to G_v3) 400 of the 1500 individuals were selected randomly to be genotyped. The resulting 1200 individuals constitute the validation set.

Marker and QTL

Marker alleles were sampled for 6000 biallelic loci on 6 chromosomes (1000 markers on each chromosome) with 0.1 cM between adjacent loci. In the founder individuals (G_h1), the two alleles at each marker locus were sampled with equal probabilities. Recombination was sampled according to Haldane's mapping function [5]. A total of 48 QTL were simulated. QTL positions were sampled under the assumption of a multinomial distribution of genes across the genome. The multinomial distribution used was based on the genetic map of the mouse genome [6]. The allele substitution effects of the QTL were drawn from a gamma distribution with scale parameter (α) 5.4 and shape parameter (β) 0.42 following [7]. We wanted to control the genetic effects and genomic location of four QTL. To achieve this, we replaced four of the 48 QTL that were closest to the desired locations with QTL of pre-defined effects. The allele substitution effects of these four fixed QTL were standardized based on their individual allelic frequencies in the last generation of the historic pedigree, so that each of these QTL explains a predefined percentage of the genetic variance. Positions and allele substitution effects and contribution to genetic and phenotypic variance for each QTL is available at http://www.computationalgenetics.se/QTLMAS08/QTLMAS/DATA_files/QTLMAS_simulated_effects.pdf

Phenotype

The phenotypes were obtained as the cumulative effects of the 44 randomly drawn QTL, the 4 defined QTL, and a random residual. First, the effects of the 44 random-QTL were summed. The effect was standardized to mean 0 and variance 1. Then the effects of four fixed QTL were added to constitute the total genetic effect. The residual variance was defined to obtain a heritability of 0.30. The individuals' phenotypes were derived as the sum of the individuals' genetic value and a random residual drawn from a normal distribution with mean zero and variance equal to the residual variance.

Validation of prediction models

Participants of the QTLMAS XII workshop were provided pedigree, phenotypic, and genomic data on the 4665 individuals of generations G_t1 to G_t4 and only pedigree and genomic data on generations G_v1 to G_v3. Using these data, they used various models to estimate GEBVs for individuals in generations G_v1 to G_v3. The properties of the reported GEBVs were then assessed by three different criteria to relate the GEBVs reported by the workshop participants and the true breeding values (TBV), which were only known by the organisers of the workshop.

The first criterion was the accuracy of the GEBVs as a measure of their predictive ability. Accuracies were calculated as the correlation between GEBVs and TBVs. The second criterion was the bias of the GEBVs, assessed as the coefficients of regressing TBVs on GEBVs. A regression coefficient close to 1 indicates that predictions are unbiased. The third criterion was the rank correlations between GEBVs and TBVs in the top 10% of individuals ranked on TBVs.

Genomic selection models

A range of different statistical methods were used by QTLMAS XII participants to analyse the simulated data. Some participants have provided GEBVs from all their investigated models to the organisers while some have provided GEBVs from selected models only. According to the properties of the models for which GEBVs have been provided, they can be classified into three categories: 1) fixed effect models, 2) BLUP based models with random marker effects, and 3) Bayesian MCMC based models with a mixture of some markers attributed a large variance and the majority attributed a small variance. Different characteristics of the models used and relation to the model specifications in the contributed papers are shown in Table 1 for the fixed effect models, in Table 2 for the BLUP based models, and in Table 3 for the Bayesian models. Except for models GP_BLUP11, GP_BLUP12, and GP_BLUP16 the provided phenotypes were used as the response variable in the analyses. Only models GP_Bayes1 and GP_Bayes3 included a polygenic effect in the model and only models GP_Bayes1, GP_Bayes3, and GP_Bayes5 used haplotype information.

Table 1 Fixed effects models, their name in the contributed paper, and number of SNPs fitted.

Full size table

Table 2 Random effects BLUP models, their name in the contributed paper, assumed SNP variance and response variable used.

Full size table

Table 3 Bayesian models, their name in the contributed paper, assumptions on SNP effects and polygenic effects.

Full size table

Results and discussion

In generations 1–7, the mean minor allele frequency (MAF) of markers was 0.298. The cumulative distribution of MAF in Figure 2 shows a rather equal distribution, which is a consequence of random drift over the 50 generations from the original starting values of 0.5 for each locus. For 38 marker loci one allele was fixed. The mean linkage disequilibrium (r²) between adjacent SNPs was 0.20 and the median was 0.11. This may be relatively low compared to other simulation studies that often use 1000 generations of random mating. On the other hand, the LD achieved in this paper seems very comparable to the realised values from real data analysis. For markers with a distance around 0.1 Mb the average R2 was 0.14 and 0.22 in different breeds and studies [8–10].

Fixed effect models

Only one contribution [11] used fixed effects models to predict GEBVs of the simulated data. They used 4 different sets of SNPs in the prediction models ranging from using all 6000 SNPs, every 5^th SNP, every 10^th SNP, every 20^th SNP or setting a MAF of 0.2. As expected, the model fit improved as more SNPs were fitted in the training set [11]. However, the predictive ability of the models in the validation dataset decreased with more SNPs in the model (Table 4). Using all 6000 SNPs resulted in a correlation between GEBVs and TBVs (averaged over generations G_v1 to G_v3) of 0.16 compared to a correlation of 0.56 with 300 SNPs.

Table 4 Comparison of genomic estimated breeding values (GEBV) and true breeding values (TBV) for fixed effects models.

Full size table

All evaluated fixed effects models overestimated breeding values severely (Table 4). The regression coefficients of TBVs on GEBVs were between 0.03 and 0.53 averaged over generations G_v1 to G_v3. The biases increased with the number of markers in the model and were particularly strong when all SNPs were used. Table 4 shows that the rank correlations drop dramatically with the inclusion of more markers in the model. This means that the low regression coefficient for this model was not only due to a scaling effect. In the worst case more parameters are fitted than available observations, which lead to a serious over-fitting of data. This is known to result in unstable prediction with large prediction errors [12]. As a result, the over-fitted model had poor predictive ability for the data beyond the training set, resulting in a large variance of GEBVs. These results are well in line with the original observation of [2], where it was found that using a least squares approach leads to biased GEBVs and poor predictive ability. On the other hand [13] found relatively good results using fixed regression models, if markers were selected based on associations to the phenotypes and a liberal significance threshold. This indicates that fixed effects models could be more useful that the current simulations indicated, if a procedure is used that balance the problems of over-fitting data and selecting SNPs with overestimated effects.

Random effect models

Three contributions presented GEBVs from BLUP models fitting SNPs as random effects. In two studies [11, 14] the variance explained by each SNP was given as either the total genetic variance (σ²_G) or the total genetic variance divided by the number of SNPs (σ²_G/N_SNPs). It was apparent that models using σ²_G lead to results similar to those from the fixed effects models. This is most apparent when comparing least squares models GP_Fix2 – GP_Fix5 ([11]; Table 4) to BLUP models GP_BLUP1 – GP_BLUP4 ([11]; Table 5), where correlations and regression coefficients are very similar when the same number of SNPs were used. When σ²_G/N_SNPs was used instead, the models performed much better. This is apparent when comparing accuracies from models GP_BLUP1 to GP_BLUP4 with those from models GP_BLUP5 to GP_BLUP8. The correlation between GEBVs and TBVs increased from 0.23 to 0.75 when 3328 SNPs were used (GP_BLUP1 and GP_BLUP5) and from 0.56 to 0.61 when only 300 SNPs were used (GP_BLUP4 and GP_BLUP8). When using this variance, the SNP effects are regressed considerably more towards zero, which leads to virtually unbiased estimates for the models with more than 600 SNPs.

Table 5 Comparison of genomic estimated breeding values (GEBV) and true breeding values (TBV) for BLUP models.

Full size table

The effect of the assumed variance explained by each SNP on the accuracy of GEBV can also be seen in the other two contributions. In the contribution [14] GEBV were obtained from a BLUP model with the variance either σ²_G or σ²_G/N_SNPs. Here the correlation between TBVs and GEBVs increased from 0.55 to 0.77 and the regression coefficient increased from 0.41 to 0.94, when phenotypic values were used as the response variable. In the study by Pimentel et al. [15], the correlation was 0.22 and the regression was 0.06 using a model with the variance of each SNP equal to residual variance, while the correlation was 0.51 and the regression was 0.31 using a model with the variance σ²_G/N_SNPs. These results clearly show that the importance of fitting the SNP effects as random effects and providing a reasonable SNP variance increases with the number of markers included in the models.

Mixture distribution/Bayesian models

Two contributions [16, 17] analysed data with Bayesian methods and in total submitted GEBVs from five different models. GEBVs from all models were more accurate (Table 6) than the non-Bayesian models. All models had a high predictive ability with correlations between GEBVs and TBVs ranging from 0.84 to 0.92 averaged over generations G_v1 to G_v3.

Table 6 Comparison of genomic estimated breeding values (GEBV) and true breeding values (TBV) for Bayesian models.

Full size table

All Bayesian models used Gibbs sampling algorithms, in which marker or QTL effects were assumed to follow a mixture distribution, where relatively few markers were assumed to explain a large variance and a large number explained a very small variance. This is most likely the main reason for the improved performance of these models over the BLUP models in which all markers are assumed to explain the same amount of variance. The assumption of a homogeneous variance for all markers leads to a poor prediction of the effect of a QTL with a large contribution to the trait from a single marker even if they are in complete LD. In this simulated dataset the 10 largest QTL explain 82.9% of the genetic variance. This may favour the Bayesian models relative to BLUP models, compared to situations with a large number of QTL contributing more equally to the genetic variance.

Fitting single SNP or haplotype effects

Following the internal validation by [17], it can be seen that, for this data, it was an advantage to fit effects of haplotypes rather than effects of single SNPs with the model used in [17]. It must be noted that the data were provided with known haplotypes. In real life, haplotypes are estimated with errors, which may affect the results. The advantage of using haplotypes in this study is most likely because there is higher LD between the haplotypes and the QTL than between any of the individual markers and the QTL. On the other hand there are some disadvantages of fitting haplotype effects. These disadvantages include: 1) for a given position there are more effects to be estimated, 2) large haplotypes are more likely to break up by recombinations, 3) haplotypes are more sensitive to errors in the map. Therefore, the optimal size of haplotypes is a trade-off between having a predictor in high LD with the underlying genes and the precision to estimate haplotype effects [18].

Models GP_Bayes1 and GP_Bayes2 fit haplotype effects with a correlation between alleles proportional to the probability of IBD. This is expected to perform better as more phenotypes contribute to the estimation of particular haplotype effects. However, Table 6 shows that the accuracies are slightly worse and regressions further from 1 compared with the other Bayesian models. In particular, it seems that the predictive ability, both assessed as accuracies and the rank correlations for the top animals, decreases faster over generations. A possible explanation is that the IBD based model is theoretically advantageous, but it may be associated with numerical problems. For instance, the IBD matrices calculated from pair wise comparisons of haplotypes are generally not positive definite and must be manipulated before inversion.

Inclusion of polygenic effects

Comparing GP_Bayes1 to GP_Bayes2 and GP_Bayes3 to GP_Bayes4 (Table 6) shows no effect of including a polygenic component in the model for these analyses. However, we do not know what proportion of genetic variance can be captured by SNP markers in real data. If SNP markers don't capture all the genetic variance it is still important to include a polygenic component. In the present study, the total genetic effect was simulated from bi-allelic QTL and the markers were uniformly distributed across the whole genome. However, in real data, part of the genetic variance comes from structural variation such as copy number variation, inversion, deletion etc. [19]. Also the simulated population is very homogeneous. In real data there are likely to be genetic structures that lead to spurious associations [20]. In such situations inclusion of pedigree may improve predictions more that what is observed in this simulated dataset.

Using EBVs or phenotypic values as response variables

Two of the contributions used both phenotypic values and EBVs as response variables in the analyses. In the study by Pimentel et al. [15], the correlation between TBV and GEBV derived from phenotypic values was slightly higher than the correlation for GEBV derived from EBVs. In the study by Macciotta et al. [14], using a model with SNP variance of σ²_G/N_SNPs, the accuracy of GEBV based on EBVs was 0.53, but the accuracy was 0.77 for GEBV based on phenotypic values. The lower accuracy of GEBV obtained using EBVs as response variable is most likely due to information lost in the procedure of predicting breeding values, which do not have high accuracies themselves. If this is the case, EBVs should only be used as response variables when they have very high reliabilities. The same effect is seen in the models by [15] who also used both EBVs and phenotypes as the response variable.

Model validation

Several workshop participants performed internal validations by either estimating correlations between GEBVs and phenotypes or EBVs in the training set or by cross validation. When correlations are calculated within the training set only, this reflects mainly how well the model fit the data in the training set and not necessarily how well the model predict the next generation. When a statistical model includes too many covariates, relative to the amount of data available, the model may fit the data perfectly but have a poor predictive ability. This was most obvious in the results obtained using fixed effects models. In [11], the correlation between EBV and GEBV in the training set was highest when all markers were included and declined as markers were removed. The predictive ability was, however, actually very poor for the fixed model with all markers (GP_Fix1) and increased as markers were removed from the model (GP_Fix2 – Gp_Fix5).

To assess the predictive ability, it is necessary to perform model validation. There are many approaches to model validation. A common approach in statistical practice is cross-validation where data sample is partitioned into subsets. The analysis is then initially performed on a single of these subsets, while the others are retained for subsequent validation of the initial analysis. In the workshop, [15] validated the models using data of G_t1-G_t3 as training data and G_t4 as test data. This approach may have the disadvantage of using EBVs for validation of GEBVs in G_t4 that were based on information from phenotypes of individuals in G_t1-G_t3. This creates a strong dependency between the data used for model development and validation, which could be avoided by using the phenotypic data in generation G_t4 rather than EBVs. Villumsen et al. [17] used a 5-fold cross validation where in each of the five validation sets, 20% of the data in G_t4, were taken as test data. The advantage of cross validation is that it makes it possible to retain training data as large as possible, while obtaining the amount of total test data as large as required (with maximal total test data equal to the whole data). In the simulated data of this workshop this strategy proved very efficient in selecting models that predicted the TBVs in the validation data very accurately.

Decrease of accuracies over generations

While three generations were simulated for validation, results are given as the mean accuracy of all three generations. This is because we could see no clear trend as to which differences in the models lead to a higher decrease in accuracy over the three generations. The only obvious result was a clear relation between the accuracy of GEBVs in generation G_v1 and the decline from G_v1 to G_v3, such that models with a high accuracy declined less than models with low accuracies.

It must be noted that the current comparison is based on a single replicate of simulated data and therefore the conclusions must be interpreted with caution.

Conclusion

The comparison of the different methods applied to the dataset by the workshop participants clearly shows a distinct clustering of the three approaches, where the Bayesian analyses gave the highest accuracies, followed by the BLUP models, while the fixed effects models generally had low accuracies and large error variance. However, some BLUP models were less biased than some Bayesian models. The BLUP models are clearly sensitive to the given SNP variance, because a BLUP does not estimate SNP variance, but takes the specified variance as the true variance. For instance, if the number of QTL would increase, and each QTL would have a smaller effect, it is expected that the differences between the BLUP and Bayesian models would be smaller. The current comparison suggests that Bayesian analyses on haplotypes or SNPs are the most promising approach for Genomic selection although the BLUP models may provide a computationally attractive alternative with little loss of efficiency. As already concluded by [2] fixed effect type models are unlikely to provide any gain over traditional pedigree indexes for selection.

Abbreviations

GEBV:: genomic estimated breeding values
GS:: genomic selection
TBV:: true breeding value.

References

Fernando RL, Grossman M: Marker assisted selection using best linear unbiased prediction. Genet Sel Evol. 1989, 21: 467-477. 10.1051/gse:19890407.
Article PubMed Central Google Scholar
Meuwissen THE, Hayes BJ, Goddard ME: Prediction of total genetic value using genome-wide dense marker map. Genetics. 2001, 157: 1819-1829.
PubMed Central CAS PubMed Google Scholar
Schaeffer LR: Strategy for applying genome-wide selection in dairy cattle. J Anim Breed Genet. 2006, 123: 218-223. 10.1111/j.1439-0388.2006.00595.x.
Article CAS PubMed Google Scholar
Gianola D, Fernando RL, Stella A: Genomic-assisted prediction of genetic value with semiparametric procedures. Genetics. 2006, 173: 1761-1776. 10.1534/genetics.105.049510.
Article PubMed Central CAS PubMed Google Scholar
Haldane JBS: The combination of linkage values and the calculation of distances between the loci of linked factors. J Genet. 1919, 8: 299-309. 10.1007/BF02983270.
Article Google Scholar
NCBI 2005: National Centre for Biotechnology Information, MGI genetic map of the mouse genome (Mus musculus). [http://www.ncbi.nlm.nih.gov/Genomes/]
Hayes BJ, Goddard ME: The distribution of the effects og genes affecting quantitative traits in livestock. Genet Sel Evol. 2001, 33: 209-229. 10.1051/gse:2001117.
Article PubMed Central CAS PubMed Google Scholar
McKay SD, Schnabel RD, Murdoch BM, Matukumalli LK, Aerts J, Coppieters W, Crews D, Dias Neto E, Gill CA, Gao C, Mannen H, Stothard P, Wang Z, Van Tassell CP, Williams JL, Taylor JF, Moore SS: Whole genome linkage disequilibrium maps in cattle. BMC Genetics. 2007, 8: 74-10.1186/1471-2156-8-74.
Article PubMed Central PubMed Google Scholar
Sargolzaei M, Schenkel FS, Jansen GB, Schaeffer LR: Extent of Linkage Disequilibrium in Holstein Cattle in North America. J Dairy Sci. 2008, 91: 2106-2117. 10.3168/jds.2007-0553.
Article CAS PubMed Google Scholar
de Ross AP, Hayes BJ, Spelman RJ, Goddard ME: Linkage disequilibrium and persistence of phase in Holstein-Fresian, Jersey and Angus cattle. Genetics. 179 (3): 1503-1512. 10.1534/genetics.107.084301.
Zukowski K, Suchochi T, Gontarek A, Szyda J: The impact of single nucleotide polymorphism selection on prediction of genomewide breeding values. BMC Proceedings. 2009, 3 (Suppl 1): S13-10.1186/1753-6561-3-s1-s13.
Article PubMed Central PubMed Google Scholar
Hastie T, Tibshirani R, Friedman J: The elements of statistical learning. 2003, New York, Springer
Google Scholar
Macciotta PP, Gaspa G, Steri R, Pieramati C, Carnier P, Dimauro C: Pre-selection of the most significant SNPs for the estimation of genomic breeding values. BMC Proceedings. 2009, 3 (Suppl 1): S14-10.1186/1753-6561-3-s1-s14.
Article PubMed Central PubMed Google Scholar
Habier D, Fernando RL, Dekkers JCM: The impact on relationship information on genome-assisted breeding values. Genetics. 2007, 177: 2389-2397.
PubMed Central CAS PubMed Google Scholar
Pimentel ECG, König S, Schenkel FS, Simianer H: Comparison of statistical procedures for estimating polygenic effects using dense genome-wide marker data. BMC Proceedings. 2009, 3 (Suppl 1): S12-10.1186/1753-6561-3-s1-s12.
Article PubMed Central PubMed Google Scholar
Calus MPL, de Roos APV, Veerkamp RF: Estimating genomic breeding values from the QTL-MAS Workshop data using single SNP regression and the haplotype/IBD approach. BMC Proceedings. 2009, 3 (Suppl 1): S10-10.1186/1753-6561-3-s1-s10.
Article PubMed Central PubMed Google Scholar
Villumsen TM, Janss L: Bayesian genomic selection: the effect of haplotype length and priors. BMC Proceedings. 2009, 3 (Suppl 1): S11-10.1186/1753-6561-3-s1-s11.
Article PubMed Central PubMed Google Scholar
Villumsen TM, Janss L, Lund MS: The importance of haplotype length and heritability using genomic selection in dairy cattle. J Anim Breed Genet. 2008, Published online: 24 Sep 2008; doi 10.1111/j.1439-0388.2008.00747.x
Google Scholar
Andrew J, Chang SZ, Eichler EE: Structural Variation of the human genome. Annual Review of Genomics and Human Genetics. 2006, 7 (1): 407-10.1146/annurev.genom.7.080505.115618.
Article Google Scholar
Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, McMullen MD, Gaut BS, Nielsen DM, Holland JB, Kresovich S, Buckler ES: A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics. 2006, 38: 203-208. 10.1038/ng1702.
Article CAS PubMed Google Scholar

Download references

Acknowledgements

This article has been published as part of BMC Proceedings Volume 3 Supplement 1, 2009: Proceedings of the 12th European workshop on QTL mapping and marker assisted selection. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/3?issue=S1.

Author information

Authors and Affiliations

Faculty of Agricultural Sciences, Department of Genetics & Biotechnology, Research Centre Foulum, Aarhus University, DK-8830, Box 50, Tjele, Denmark
Mogens Sandø Lund, Goutam Sahana & Guosheng Su
The Roslin Institute and R(D)SVS, University of Edinburgh, Roslin Biocentre, Roslin, Midlothian, EH25 9PS, UK
Dirk-Jan de Koning
Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Box 7023, SE-75007, Uppsala, Sweden
Örjan Carlborg

Authors

Mogens Sandø Lund
View author publications
You can also search for this author in PubMed Google Scholar
Goutam Sahana
View author publications
You can also search for this author in PubMed Google Scholar
Dirk-Jan de Koning
View author publications
You can also search for this author in PubMed Google Scholar
Guosheng Su
View author publications
You can also search for this author in PubMed Google Scholar
Örjan Carlborg
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mogens Sandø Lund.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

MSL, ÖC, and DJK designed the simulations. GSa and MSL wrote simulation software. GSa performed the simulations and computed correlations to true breeding values. MSL drafted the manuscript and all authors contributed to interpretation of results.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Lund, M.S., Sahana, G., de Koning, DJ. et al. Comparison of analyses of the QTLMAS XII common dataset. I: Genomic selection. BMC Proc 3 (Suppl 1), S1 (2009). https://doi.org/10.1186/1753-6561-3-S1-S1

Download citation

Published: 23 February 2009
DOI: https://doi.org/10.1186/1753-6561-3-S1-S1

Proceedings of the 12th European workshop on QTL mapping and marker assisted selection

Comparison of analyses of the QTLMAS XII common dataset. I: Genomic selection

Abstract

Introduction

Genomic selection

Methods

Simulation of data

Pedigree

Marker and QTL

Phenotype

Validation of prediction models

Genomic selection models

Results and discussion

Fixed effect models

Random effect models

Mixture distribution/Bayesian models

Fitting single SNP or haplotype effects

Inclusion of polygenic effects

Using EBVs or phenotypic values as response variables

Model validation

Decrease of accuracies over generations

Conclusion

Abbreviations

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors' contributions

Rights and permissions

About this article

Cite this article

Share this article

BMC Proceedings

Contact us