Genome-wide scan for somatic cell counts in Holstein bulls

Background: Mastitis is the most costly disease for dairy production, and control of the disease is often difficult due to its multi-factorial nature. Susceptibility to mastitis is under partial genetic control, and the industry uses indirect selection for decreased concentrations of somatic cells in milk to reduce mastitis.
Methods: A genome-wide scan was performed to identify genomic regions associated with deregressed estimated breeding values (EBVs) for somatic cell counts (SCC) in Holstein bulls. In total, 1183 proven bulls of the Italian Holstein population were genotyped with the BovineSNP50 BeadChip (Illumina, San Diego, CA), and a whole-genome association analysis was performed using the R package GenABEL.
Results: Two chromosomal regions showed association with SCC: a region on chromosome 14 with high significance (P < 5 × 10⁻⁶) and a region on chromosome 6 with moderate significance (P < 5 × 10⁻⁵).
Conclusions: Two regions with effects on SCC have been identified with good statistical support. A further study of these candidate regions will be performed to verify the results and identify the causal mutations.


Background
Mastitis, an inflammation of the mammary gland caused by infection with any of a range of bacteria, is the most costly disease for dairy production. Control of mastitis is difficult due to its multi-factorial nature. Susceptibility to mastitis is under partial genetic control, and the industry uses selection on a correlated trait (somatic cell score in milk) to reduce mastitis incidence in the population. Over the last few years, several studies have identified genetic loci putatively associated with somatic cell counts or clinical mastitis [1,2]. The availability of the bovine genome sequence and of high-density genotyping panels of single nucleotide polymorphisms (SNPs) has allowed a considerable number of bulls worldwide to be genotyped for genomic evaluation and selection. Furthermore, this information can be used to perform association studies with high precision at the genome-wide level. The work reported here used genotypic data from the genomic selection project to perform a genome-wide scan, with the objective of identifying genomic regions associated with deregressed estimated breeding values (DR-EBVs) for somatic cell counts (SCC) in Holstein bulls.

Animals
The bulls chosen for the genome-wide association study were selected from among the 3155 animals progeny tested in Italy with DNA samples available. All these bulls will be used by the Italian National Breeders Association of Holstein Friesian Cattle (ANAFI) to perform national genomic evaluations.
Selection criteria for the association study were intended to obtain: i) bulls with high selection index reliability (PFT > 0.75%); and ii) relationships between animals in the dataset that were as low as possible, by retaining as many families (father-son pairs) as possible. Among the 3155 bulls with biological material available, 2109 met the criteria for inclusion in the study, 1183 of which had already been genotyped with the Bovine 50K SNP chip (Illumina Inc., San Diego).
* Correspondence: giulietta.minozzi@tecnoparco.org; Parco Tecnologico Padano, Via Einstein, Polo Universitario, Lodi 26900, Italy

Phenotype: deregressed EBV for SCC
The EBV for SCC had a mean of 98.73 ± 5.3 for the 2109 bulls and a mean of 98.77 ± 6.3 in the cohort of 1183 animals included in the study. The deregressed EBVs (DR-EBVs) had a mean of 0 and a standard deviation of 5. The DR-EBVs and reliabilities for somatic cell counts were derived from a reduced animal model for single records on a single trait. The deregression strategy was a simplified version of the algorithm of Jamrozik et al. [3], adapted to a single-trait reduced animal model.

Statistical analysis
Genome-wide association analysis was performed with the GenABEL package in R using the three-step GRAMMAR-GC approach (Genome-wide Rapid Association using Mixed Model and Regression, followed by Genomic Control) [4,5]. Uncorrected p-values of P < 5 × 10⁻⁷ were accepted as very strong evidence of genome-wide association, while p-values between 5 × 10⁻⁷ and 5 × 10⁻⁵ were considered moderately significant associations.
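The final genomic control step and the two significance bands can be illustrated with a minimal Python sketch. It is not the GenABEL implementation: the function names are hypothetical, the input is assumed to be a list of 1-degree-of-freedom chi-square statistics (one per SNP), and the inflation factor is assumed to be the usual ratio of the observed median statistic to the expected median.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(chi2_stats):
    """Inflation (or deflation) factor: observed median over expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_correct(chi2_stats):
    """Rescale every test statistic by the genomic control factor.

    Assumption: the factor is applied whether it is above or below 1.
    """
    lam = genomic_control_lambda(chi2_stats)
    return [x / lam for x in chi2_stats]


def classify(p_value, strong=5e-7, moderate=5e-5):
    """Assign a p-value to the significance bands stated in the text."""
    if p_value < strong:
        return "strong"
    if p_value < moderate:
        return "moderate"
    return "not significant"
```

For example, `classify(1e-6)` falls in the moderate band, since it lies between the two thresholds.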

Genotyping and quality control filters
A total of 1183 progeny-tested bulls were genotyped with the BovineSNP50 BeadChip (Illumina, San Diego, CA). Genotype quality assurance was performed within the R statistical environment using the GenABEL package ("check.marker" function) [6]. SNPs were filtered on call rate and minor allele frequency (MAF): markers with more than 5% missing data (call rate below 95%) or with a MAF of less than 5% were removed. Genotyping efficiency of samples was also verified, and samples with more than 5% missing data were removed. Classical multidimensional scaling (MDS) was used to explore population substructure and to verify the genetic homogeneity of the dataset prior to analysis.
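The marker and sample filters described above can be sketched in a few lines of Python. This is an illustration only, not the "check.marker" function: the function names are hypothetical, genotypes are assumed to be coded as 0, 1 or 2 copies of one allele with `None` for a missing call, and both filters are assumed to be evaluated independently on the unfiltered data.

```python
def missing_rate(calls):
    """Fraction of missing calls (None) in a list of genotypes."""
    return sum(g is None for g in calls) / len(calls)


def minor_allele_frequency(calls):
    """MAF of one marker, computed from its non-missing genotypes."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1.0 - freq)


def qc_filter(genotypes, max_missing=0.05, min_maf=0.05):
    """Return (kept_sample_indices, kept_marker_indices).

    genotypes: list of samples, each a list with one call per marker.
    """
    n_markers = len(genotypes[0])
    kept_markers = []
    for j in range(n_markers):
        column = [sample[j] for sample in genotypes]
        # Keep markers with enough calls and a sufficiently common minor allele.
        if missing_rate(column) <= max_missing and minor_allele_frequency(column) >= min_maf:
            kept_markers.append(j)
    # Keep samples with at most max_missing missing calls.
    kept_samples = [i for i, sample in enumerate(genotypes)
                    if missing_rate(sample) <= max_missing]
    return kept_samples, kept_markers
```

The default thresholds mirror the 5% limits given in the text; a real pipeline may apply the sample and marker filters iteratively rather than in a single pass.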

Quality control
Following quality control checks, 641 markers were excluded because of low call rate and 11404 markers were excluded because of low minor allele frequency. Furthermore, markers on the sex chromosomes were removed from the analysis. A total of 8 samples were removed because of low call rate, and a further 2 were eliminated because of high autosomal heterozygosity (FDR < 1%). Mean heterozygosity of the dataset after quality control was 0.33 ± 0.01, while the two samples removed had heterozygosity higher than 0.63, indicating possible sample contamination. No samples were removed due to high identity by state (IBS): mean IBS was 0.70 ± 0.01, based on 2000 autosomal markers, while the exclusion threshold was set at IBS > 0.95. No outliers were identified by classical multidimensional scaling (MDS).
After quality control, the final dataset used in the association analysis contained 1173 samples and 41209 genome-wide SNPs.
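The two sample-level statistics used above, heterozygosity and pairwise IBS, can be made concrete with a small Python sketch. The function names are hypothetical, genotypes are assumed to be coded 0, 1 or 2 with `None` for missing calls, and IBS is assumed to be the mean proportion of alleles shared per marker; the exact definition used by GenABEL may differ.

```python
def heterozygosity(sample):
    """Share of heterozygous calls (coded 1) among non-missing genotypes."""
    observed = [g for g in sample if g is not None]
    return sum(g == 1 for g in observed) / len(observed)


def ibs(sample_a, sample_b):
    """Mean proportion of alleles shared by state over markers called in both."""
    shared = [(2 - abs(a - b)) / 2
              for a, b in zip(sample_a, sample_b)
              if a is not None and b is not None]
    return sum(shared) / len(shared)


def flag_high_ibs(samples, threshold=0.95):
    """Index pairs of samples whose IBS exceeds the threshold."""
    pairs = []
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            if ibs(samples[i], samples[j]) > threshold:
                pairs.append((i, j))
    return pairs
```

A pair flagged by `flag_high_ibs` would correspond to a possible duplicate or near-identical sample; in the study, no pair exceeded the 0.95 threshold.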

Association analysis
Two chromosomal regions showed associations with SCC: a region on chromosome 14 with high significance (P < 5 × 10⁻⁶) and a region on chromosome 6 with moderate significance (P < 5 × 10⁻⁵). These two regions should be tested further to confirm the associations and, potentially, to identify the causative variants that affect this trait.
A recent review of QTL reported on chromosome 14 [7] identified 10 QTL for disease traits such as mastitis, seven of which were related to somatic cell score [8-11]. Interestingly, the SNP identified in this study on chromosome 14 lies within 1 Mb of the QTL reported by Kaupe et al. [7] to be associated with somatic cell score, but far from all other chromosomal regions that harbor QTL for clinical mastitis [7]. Furthermore, Nilsen et al. [2] characterized a region of chromosome 6 in which QTL for clinical mastitis had been identified, and found the Mucin 7 gene to be significantly associated with the trait. Mucin 7 is located close to the casein cluster on chromosome 6. However, the SNP on chromosome 6 identified in this study is more than 2 Mb from the casein cluster, indicating that different genes could be involved. To confirm the results of the current study, both SNPs will be tested in a second, independent set of animals.