Comparison of methods for estimation of genetic covariance matrix from SNP or pedigree data utilised to predict breeding value

Mucha, Sebastian; Wolc, Anna; Strabel, Tomasz

doi:10.1186/1753-6561-4-S1-S7

Volume 4 Supplement 1

Proceedings of the 13th European workshop on QTL mapping and marker assisted selection

Proceedings
Open access
Published: 31 March 2010


BMC Proceedings volume 4, Article number: S7 (2010)

Abstract

Background

The aim was to predict breeding values of non-phenotyped individuals based on a dataset prepared for the 13th QTL-MAS Workshop in Wageningen.

Methods

Genetic covariance matrices between animals were estimated with three methods: one using pedigree information only and two based on SNP markers from the first chromosome. Quadratic regression of breeding values, estimated separately in each of the five time points, was used to predict the breeding values in the 6th time point.

Results

Based on the comparison of true and estimated breeding values (BV), it can be concluded that the SNP-based methods provided better estimates (accuracy between 0.75 and 0.80) than the pedigree-based method (0.65).

Conclusions

Even though only SNPs from chromosome 1 were used it was still possible to achieve fairly high accuracies. Most likely this was due to the fact that chromosome 1 contained the QTLs with the largest effects.

Background

The analysis was based on a dataset prepared for the 13th QTL-MAS Workshop in Wageningen [1]. The aim of this paper was to predict breeding values of the 1000 non-phenotyped animals in the 6th time point using three different strategies: one based on similarity between individuals due to common ancestry (pedigree records), and two based on marker similarity. Due to software limitations [2], only one chromosome could be included in the analysis. The first chromosome was chosen based on preliminary results of QTL mapping, performed with a single-QTL model with additive effects in the GridQTL package [3]. The most significant QTLs, affecting the analysed trait in all five time points, were found on chromosome 1.

Methods

Estimation of genetic relationship

Genetic covariance matrices between all animals present in the dataset were estimated with three methods. The first approach (pedigree-based method, PB) used the additive relationship matrix calculated from the pedigree. The second method computed similarity between individuals as a correlation coefficient between allelic states using 90 SNP markers from chromosome 1 (SNP-based method, SNPL). For this purpose the method of Loiselle [4] was used as implemented in the software package SPAGeDi 1.2g [2], which computes the relationship as

$$a_{ij} = \frac{\sum_l \left[ \sum_a \left( \frac{\sum_{c_i} \sum_{c_j} (x_{lc_ia} - p_{la})(x_{lc_ja} - p_{la})}{\sum_{c_i} \sum_{c_j} 1} \right) + \sum_a \frac{p_{la}(1 - p_{la})}{n_l - 1} \right]}{\sum_l \sum_a p_{la}(1 - p_{la})}$$

where x_lcia is an indicator variable (x_lcia = 1 if the allele on chromosome c at locus l for individual i is a, otherwise x_lcia = 0), p_la is the frequency of allele a at locus l in the reference sample, n_l is the number of alleles defined in the sample at locus l (the number of individuals times the ploidy level minus the number of missing alleles), and Σc_i stands for the sum over the homologous chromosomes of individual i. The term involving (n_l - 1) is a sampling bias correction. The program calculates the pairwise relationship between animals i and j (a_ij) but leaves the diagonal elements (a_ii) blank, so self-coancestry had to be estimated as F_k = 1 + 0.5*a_ij, where a_ij is the relationship between parents i and j of individual k.
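The marker-based coefficient described above can be illustrated with a minimal sketch. It is not the SPAGeDi implementation: it assumes diploid animals, biallelic SNPs coded as counts (0, 1 or 2) of one arbitrarily chosen allele, and no missing genotypes, and the function name `loiselle_kinship` is hypothetical. Under these assumptions the sum over the two alleles and the two homologous chromosomes collapses to the per-locus expression used in the loop.

```python
def loiselle_kinship(genotypes):
    """Pairwise Loiselle-type coefficients from SNP allele counts.

    genotypes: one list per individual, holding the count (0, 1, 2) of the
    coded allele at each locus. Assumes diploids and no missing data.
    Diagonal elements are left as None, mirroring the blank diagonal
    described in the text.
    """
    n_ind = len(genotypes)
    n_loci = len(genotypes[0])
    n_copies = 2 * n_ind  # n_l: gene copies scored per locus
    freqs = [sum(g[l] for g in genotypes) / n_copies for l in range(n_loci)]
    # Denominator: sum over loci and both alleles of p(1 - p) = 2p(1 - p)
    denom = sum(2.0 * p * (1.0 - p) for p in freqs)

    kin = [[None] * n_ind for _ in range(n_ind)]
    for i in range(n_ind):
        for j in range(i + 1, n_ind):
            num = 0.0
            for l, p in enumerate(freqs):
                di = genotypes[i][l] - 2.0 * p
                dj = genotypes[j][l] - 2.0 * p
                # Both alleles contribute di*dj/4 each; the second term is
                # the sampling bias correction
                num += di * dj / 2.0 + 2.0 * p * (1.0 - p) / (n_copies - 1)
            kin[i][j] = kin[j][i] = num / denom
    return kin


# Toy data: animals 0 and 1 share a genotype, animal 2 is the opposite homozygote
toy = [[2, 2, 0], [2, 2, 0], [0, 0, 2], [1, 1, 1]]
a = loiselle_kinship(toy)
print(round(a[0][1], 4), round(a[0][2], 4))
```

Pairs with identical genotypes receive positive values and pairs with opposite homozygous genotypes receive negative values, because the coefficient is measured relative to the allele frequencies of the reference sample.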

The third method used MCMC (Markov chain Monte Carlo) simulation to estimate the genetic relationship between animals from a selected set of 39 SNP markers from chromosome 1 with minor allele frequency above 0.1 (selected SNP method, SNPC). This restriction was imposed because the MCMC method is very time-consuming. The software package Citius [5, 6] was used to apply the MCMC method to calculate multilocus genotype probabilities and to analyse genes shared identical by descent (IBD). IBD matrices were calculated at 9 points along the analysed fragment of chromosome 1. Afterwards they were averaged into one G matrix, which was used for further computations.
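Two bookkeeping steps in this paragraph, selecting SNPs by minor allele frequency and averaging position-specific IBD matrices into a single G matrix, can be sketched as follows. The MCMC estimation of the IBD matrices themselves (done by Citius) is not reproduced here. The function names and the 0/1/2 genotype coding are assumptions made for illustration.

```python
def minor_allele_frequency(counts):
    """MAF at one locus from allele counts (0, 1, 2) of diploid animals."""
    p = sum(counts) / (2.0 * len(counts))
    return min(p, 1.0 - p)


def select_snps(genotypes, threshold=0.1):
    """Indices of loci whose minor allele frequency exceeds the threshold."""
    n_loci = len(genotypes[0])
    return [
        l for l in range(n_loci)
        if minor_allele_frequency([g[l] for g in genotypes]) > threshold
    ]


def average_matrices(matrices):
    """Element-wise mean of equally sized square matrices (lists of lists)."""
    k = len(matrices)
    n = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / k for j in range(n)]
        for i in range(n)
    ]


# Toy example: locus 0 is monomorphic and is dropped
toy_genotypes = [[0, 1, 2], [0, 1, 0], [0, 2, 1], [0, 0, 1]]
print(select_snps(toy_genotypes))

# Two hypothetical IBD matrices for the same pair of animals
ibd_a = [[1.0, 0.0], [0.0, 1.0]]
ibd_b = [[1.0, 0.5], [0.5, 1.0]]
print(average_matrices([ibd_a, ibd_b]))
```

A simple unweighted mean treats every evaluation point equally; the text does not say whether any weighting was applied, so equal weights are an assumption.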

Estimation of variance components and breeding values

Variance components were estimated separately for each time point (0, 132, 265, 397, 530), with ASREML [7] using the following model:

y_i = μ + a_i + e_i

where y_i is the analysed trait of animal i, μ is the overall mean, a_i is the random additive genetic effect of animal i, and e_i is the random residual effect.

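The covariance structure was specified as:

$$\mathrm{var}(\mathbf{a}) = \mathbf{G}\sigma_a^2$$

and

$$\mathrm{var}(\mathbf{e}) = \mathbf{I}\sigma_e^2$$

where $\sigma_a^2$ is the additive genetic variance, $\sigma_e^2$ is the residual variance, G is the genetic relationship matrix, and I is the identity matrix.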
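To show how a relationship matrix carries information from phenotyped to non-phenotyped animals under the model y_i = μ + a_i + e_i, the following sketch computes best linear unbiased predictions for all animals. It is not what ASReml does internally: the variance components are assumed to be known rather than estimated by REML, and the function names are hypothetical. With V the covariance matrix of the phenotypes (additive variance times G among phenotyped animals, plus residual variance on the diagonal), the mean is estimated by generalised least squares and the breeding values are predicted from the covariance between each animal and the phenotyped animals.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def blup(G, y, phenotyped, var_a, var_e):
    """Mean and predicted breeding values for every animal in G.

    G: relationship matrix for all animals; phenotyped: indices of animals
    with records; y: their phenotypes in the same order.
    """
    V = [
        [var_a * G[i][j] + (var_e if r == c else 0.0)
         for c, j in enumerate(phenotyped)]
        for r, i in enumerate(phenotyped)
    ]
    v_inv_one = solve(V, [1.0] * len(phenotyped))
    v_inv_y = solve(V, y)
    mu = sum(v_inv_y) / sum(v_inv_one)  # GLS estimate of the mean
    w = solve(V, [yi - mu for yi in y])  # V^-1 (y - mu)
    ebv = [
        var_a * sum(G[k][j] * w[c] for c, j in enumerate(phenotyped))
        for k in range(len(G))
    ]
    return mu, ebv


# Toy example: animal 2 has no record but is related (0.5) to animals 0 and 1
G = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.5, 1.0]]
mu, ebv = blup(G, [12.0, 8.0], [0, 1], var_a=1.0, var_e=1.0)
print(mu, ebv)
```

In this toy case heritability is var_a / (var_a + var_e) = 0.5, so each unrelated phenotyped animal's deviation from the mean is shrunk by half, and the non-phenotyped animal receives the average of its relatives' contributions.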

The analysis was performed with three types of genetic covariance matrices (G) based on: pedigree (PB method), and SNP markers (methods SNPL and SNPC).
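For the PB method, the additive relationship matrix can be built from the pedigree with the standard tabular method. The sketch below is a generic illustration rather than the software used in the paper. It assumes animals are indexed 0..n-1, that parents always appear before their offspring, and that unknown parents are given as `None`; the function name is hypothetical.

```python
def relationship_matrix(pedigree):
    """Additive (numerator) relationship matrix by the tabular method.

    pedigree: list of (sire, dam) index pairs, one per animal, ordered so
    that parents precede offspring; None marks an unknown parent.
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        # Diagonal: 1 plus half the relationship between the parents
        if sire is not None and dam is not None:
            A[i][i] = 1.0 + 0.5 * A[sire][dam]
        else:
            A[i][i] = 1.0
        # Off-diagonals: average relationship of animal j with i's parents
        for j in range(i):
            a = 0.0
            if sire is not None:
                a += 0.5 * A[j][sire]
            if dam is not None:
                a += 0.5 * A[j][dam]
            A[i][j] = A[j][i] = a
    return A


# Two founders, two full sibs, and one offspring of the full sibs
ped = [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
A = relationship_matrix(ped)
print(A[2][3], A[4][4])
```

The diagonal rule is the same one the text applies to fill in the blank diagonal of the SNPL matrix: 1 plus half the relationship between the animal's parents. Because every full sib has the same row in this matrix apart from its own diagonal, a pedigree-only matrix cannot separate non-phenotyped full sibs from one another.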

Quadratic regression of the predicted breeding values (extracted from ASREML) on time was fitted over the first five time points to obtain least squares regression coefficients for each animal. Subsequently, the estimated regression coefficients were used to predict the unknown breeding values in the 6th time point (time 600) using the following formula:

$$y_{600,i} = b_{0i} + b_{1i} \cdot 600 + b_{2i} \cdot 600^2$$

where y_600,i is the breeding value in time point 600, and b_0i, b_1i and b_2i are the least squares regression coefficients estimated for animal i.
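The per-animal extrapolation can be sketched as an ordinary least squares fit of a second-degree polynomial to the five estimated breeding values, followed by evaluation at time 600. The function names are hypothetical, and time is rescaled internally (divided by 600) only to keep the normal equations well conditioned; the returned coefficients are on the original time scale.

```python
TIMES = [0, 132, 265, 397, 530]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_quadratic(times, values, scale=600.0):
    """Least squares coefficients (b0, b1, b2) of values on time and time^2."""
    u = [t / scale for t in times]
    # Normal equations (X'X) c = X'y for the columns 1, u, u^2
    power_sums = [sum(ui ** k for ui in u) for k in range(5)]
    xtx = [[power_sums[r + c] for c in range(3)] for r in range(3)]
    xty = [sum(v * ui ** k for ui, v in zip(u, values)) for k in range(3)]
    c0, c1, c2 = solve(xtx, xty)
    return c0, c1 / scale, c2 / scale ** 2  # back to the original time scale


def predict(coefs, t):
    """Evaluate the fitted quadratic at time t."""
    b0, b1, b2 = coefs
    return b0 + b1 * t + b2 * t * t


# Hypothetical breeding values of one animal at the five time points
ebv = [1.0, 1.8, 2.4, 2.9, 3.2]
print(predict(fit_quadratic(TIMES, ebv), 600))
```

Because the prediction point lies outside the fitted range, the result depends heavily on the curvature estimated from only five points per animal.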

Results

Variance components

Regardless of the method used, genetic and residual variances increased with time (Table 1). Estimates of genetic variance were lower for the SNPC method than for the PB or SNPL methods. For the PB method, heritability decreased from 0.51 (time point 0) to 0.47 (time point 530). For the SNPL method, on the other hand, genetic variance increased more than residual variance, so heritability increased from 0.53 (time point 0) to 0.60 (time point 530). Heritability estimates for the SNPC method differed little between time points and were between 0.40 and 0.41. Correlations between breeding values in the different time points were high: between 0.82 and 0.99 for the SNPL and SNPC methods (Table 2 and Table 3) and between 0.79 and 0.99 for the PB method (Table 4).

Table 1 Estimates of genetic and residual variance and heritability at five time points (T0, T132, T265, T397 and T530) obtained with three methods: SNPL - covariance structure from 90 SNPs on chromosome 1 (Loiselle et al. 1995), SNPC - covariance structure from 30 selected SNPs with minor allele frequency >0.1 (Szydlowski et al. 2008), PB - covariance structure from pedigree.


Table 2 Correlation between breeding values in five time points (T0, T132, T265, T397 and T530), estimated with a genetic covariance matrix based on 90 SNPs from chromosome 1 (SNPL method).


Table 3 Correlation between breeding values in five time points (T0, T132, T265, T397 and T530) estimated with a genetic covariance matrix based on a selected number of SNPs from chromosome 1, with minor allele frequency >0.1 (SNPC method).


Table 4 Correlation between breeding values in five time points (T0, T132, T265, T397 and T530) estimated with a genetic covariance matrix based on pedigree records (PB method).


Table 5 Correlation and regression of true breeding values (provided by the organizers) on breeding values estimated with three methods: PB - relationship based on pedigree records, SNPL - genetic similarity estimated from 90 SNP markers from chromosome 1, SNPC - relationship estimated from selected SNP markers from chromosome 1 with minor allele frequency >0.1.


Breeding values

The variance of predicted breeding values in time point 600 was lowest for the PB method (17.44), highest for the SNPL method (27.94), and intermediate for the SNPC method (20.42). In contrast to the PB method, the SNP-based methods showed considerable variation in breeding values within full-sib (FS) families (Figure 1). Correlations between breeding values in time point 600 estimated with the three methods were between 0.82 and 0.86. When the top 20 non-phenotyped animals selected with each method were compared with the true top 20 non-phenotyped animals from the simulation (as provided by the organizers), the PB method had only 30% of individuals in common. Higher agreement was found for the SNP-based methods: 45% and 50% for the SNPC and SNPL methods, respectively. Accuracy of breeding values (correlation between predicted and true breeding values) for the 1000 non-phenotyped individuals in time point 600 was lowest for the PB method (0.65), higher for the SNPL method (0.75) and highest for the SNPC method (0.80). Regression coefficients of true breeding values on predicted breeding values were between 0.79 and 0.93 (Table 5).
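The three comparison criteria used in this section (accuracy as a correlation, the regression of true on predicted breeding values, and the agreement between top-20 lists) can be computed as in the sketch below. The function names are hypothetical and the toy numbers are not data from the study.

```python
def mean(x):
    return sum(x) / len(x)


def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def accuracy(true_bv, pred_bv):
    """Pearson correlation between true and predicted breeding values."""
    return covariance(true_bv, pred_bv) / (
        covariance(true_bv, true_bv) * covariance(pred_bv, pred_bv)
    ) ** 0.5


def regression_slope(true_bv, pred_bv):
    """Slope of the regression of true on predicted breeding values."""
    return covariance(true_bv, pred_bv) / covariance(pred_bv, pred_bv)


def top_overlap(true_bv, pred_bv, k=20):
    """Fraction of the true top-k animals that are also in the predicted top-k."""
    def top(values):
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        return set(order[:k])
    return len(top(true_bv) & top(pred_bv)) / k


# Toy values: predictions are perfectly correlated but twice as variable
true_bv = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_bv = [2.0, 4.0, 6.0, 8.0, 10.0]
print(accuracy(true_bv, pred_bv))
print(regression_slope(true_bv, pred_bv))
print(top_overlap(true_bv, pred_bv, k=2))
```

A slope below 1 indicates that the predictions are more spread out than the true values they track, which is why the regression coefficient is reported alongside the correlation.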

Discussion

The genetic variance estimated with the PB method was the closest to the true (simulated) one, while variance components obtained with the SNPL method were slightly overestimated. Underestimation of genetic variance with the SNPC method may be due to ignoring part of the SNPs. Changes in heritability estimates for the SNPL method could be due to overestimation of genetic variance, which was greater when phenotypic variance was higher. Genomic breeding values usually show bias, which is a consequence of using marker effects instead of QTL effects [8]. This bias also exists in our results: the regression of true breeding values on predicted breeding values is below 1 (0.79 to 0.93, Table 5).

The rather high variation of breeding values could be partly due to Mendelian variation and partly a result of method inadequacy. Both the SNPL and SNPC methods explored differences among animals within full-sib families, but it is hard to decide which should be preferred: the SNPC method yielded higher accuracy (0.80 versus 0.75), whereas the SNPL method identified more of the true top 20 animals (50% versus 45%).

It is also worth mentioning that the method for prediction of breeding values applied in this paper (quadratic regression) does not take into account the fact that the analysed trait will eventually reach its asymptotic value.

The restriction to one chromosome was imposed partly because the SNPC method is very computationally demanding and partly because the SNPL method had a software limitation on the number of markers. This is a drawback of our analysis, as it neglects a large part of the available SNP information. Nevertheless, this simplified analysis allowed breeding values to be predicted with fairly high accuracy (0.75 to 0.80). However, because it turned out after the analysis that the first chromosome contained the QTLs with the largest effects [1], it may be concluded that the results would be much worse in other, practical situations.

In our analysis we used the concept of a genetic relationship matrix to obtain genomic breeding values, which is similar to the method described by Zhang et al. [9]. Van Raden showed that reliabilities of GEBVs based on this approach are almost as high as with the Bayes B method [10]. Both the SNPL and SNPC methods assumed equal effects of all markers from chromosome 1.

Conclusions

The use of SNP markers makes it possible to differentiate breeding values within full-sib families. Based on the comparison of true (simulated by the organizers) breeding values in time point 600 with our predictions, it can be concluded that the SNP-based methods provided relatively good estimates. Even though only SNPs from chromosome 1 were used, it was still possible to achieve fairly high accuracies. Most likely this was because chromosome 1 contained the most significant QTLs affecting the analysed trait.

References

1. Coster A, Bastiaansen J, Calus M, Maliepaard C, Bink M: QTLMAS 2009, Simulated dataset. Submitted to BMC. 2009
2. Hardy OJ, Vekemans X: SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Molecular Ecology Notes. 2002, 2: 618-620. 10.1046/j.1471-8286.2002.00305.x.
3. Seaton G, Hernandez J, Grunchec JA, White I, Allen J, De Koning DJ, Wei W, Berry D, Haley C, Knott S: GRIDQTL: a grid portal for QTL mapping of compute intensive datasets. Proceedings of the 8th World Congress on Genetics Applied to Livestock Production, Belo Horizonte, MG, Brasil. 2006
4. Loiselle BA, Sork VL, Nason J, Graham C: Spatial genetic structure of a tropical understory shrub, Psychotria officinalis (Rubiaceae). American Journal of Botany. 1995, 82: 1420-1425. 10.2307/2445869.
5. Szydlowski M: Citius: A program to apply Markov chain Monte Carlo method for multilocus analysis of large complex pedigrees - User Guide. http://jay.au.poznan.pl/~mcszyd/citius/citius.pdf. 2008
6. Szydlowski M, Gengler N: Sampling genotype configurations in a large complex pedigree. Journal of Animal Breeding and Genetics. 2008, 125: 330-338. 10.1111/j.1439-0388.2008.00733.x.
7. Gilmour AR, Gogel BJ, Cullis BR, Thompson R: ASReml User Guide Release 2.0. VSN International Ltd, Hemel Hempstead, HP1 1ES, UK. 2006
8. Solberg TR, Sonesson AK, Woolliams JA, Meuwissen THE: Genomic selection using different marker types and densities. Journal of Animal Science. 2008, 86: 2447-2454. 10.2527/jas.2007-0010.
9. Zhang Z, Todhunter RJ, Buckler ES, Van Vleck LD: Technical note: Use of marker-based relationships with multiple-trait derivative-free restricted maximal likelihood. Journal of Animal Science. 2007, 85: 881-885. 10.2527/jas.2006-656.
10. Van Raden PM: Efficient Methods to Compute Genomic Predictions. Journal of Dairy Science. 2008, 91: 4414-4423. 10.3168/jds.2007-0980.


Acknowledgement

The authors would like to express their gratitude to Maciej Szydlowski for discussion and support with the software package Citius.

This article has been published as part of BMC Proceedings Volume 4 Supplement 1, 2009: Proceedings of 13th European workshop on QTL mapping and marker assisted selection.

The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/4?issue=S1.

Author information

Authors and Affiliations

Department of Genetics and Animal Breeding, Poznan University of Life Sciences, Wolynska 33, 60-637, Poznan, Poland
Sebastian Mucha, Anna Wolc & Tomasz Strabel


Corresponding author

Correspondence to Sebastian Mucha.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

SM: performed the analysis and drafted the manuscript. AW: performed the analysis and drafted the manuscript. TS: drafted the manuscript. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Mucha, S., Wolc, A. & Strabel, T. Comparison of methods for estimation of genetic covariance matrix from SNP or pedigree data utilised to predict breeding value. BMC Proc 4 (Suppl 1), S7 (2010). https://doi.org/10.1186/1753-6561-4-S1-S7
