BMC Proceedings BioMed Central

Background: Reliable annotation linking oligonucleotide probes to target genes is essential for functional biological analysis of microarray experiments. We used the IMAD, OligoRAP and sigReannot pipelines to update the annotation for the ARK-Genomics Chicken 20K array as part of a joint EADGENE/SABRE workshop. In this manuscript we compare their annotation strategies and results. Furthermore, we analyse the effect of differences in updated annotation on functional analysis for an experiment involving Eimeria-infected chickens, and finally we propose guidelines for optimal annotation strategies.

Results: IMAD, OligoRAP and sigReannot update both annotation and estimated target specificity. All three pipelines can assign oligos to target specificity categories, although with varying degrees of resolution. Target specificity is judged based on the number and type of oligo versus target-gene alignments (hits), which are determined by filter thresholds that users can adjust based on their experimental conditions. Linking oligos to annotation, on the other hand, is based on rigid rules, which differ between pipelines. For 52.7% of the oligos from a subset selected for in-depth comparison, all pipelines linked the oligo to one or more Ensembl genes, with consensus on 44.0%. In 31.0% of the cases none of the pipelines could assign an Ensembl gene to an oligo, and for the remaining 16.3% the coverage differed between pipelines. Differences in updated annotation were mainly due to different thresholds for hybridisation potential filtering of oligo versus target-gene alignments and different policies for expanding annotation using indirect links. The differences in updated annotation packages had a significant effect on GO term enrichment analysis, with consensus on only 67.2% of the enriched terms.

Conclusion: In addition to flexible thresholds to determine target specificity, annotation tools should provide metadata describing the relationships between oligos and the annotation assigned to them. These relationships can then be used to judge the varying degrees of reliability, allowing users to fine-tune the balance between reliability and coverage. This is important because it can have a significant effect on functional microarray analysis, as exemplified by the lack of consensus on almost one third of the terms found with GO term enrichment analysis based on updated IMAD, OligoRAP or sigReannot annotation.


Background
Genome-wide association studies (GWAS) with 100,000 to 1,000,000 single-nucleotide polymorphisms (SNPs) are a promising novel approach for dissecting the genetic background of complex diseases and have become common in the last two years [1]. Because of the high degree of automation in the genotyping process, great care needs to be taken to ensure high data quality [2]. A number of quality criteria appear to be agreed upon, including a SNP-wise call fraction and the conformity of genotype frequencies with Hardy-Weinberg equilibrium [3,4].

Another important criterion is the quality of the results from the calling algorithm, which assigns the fluorescence signal intensities to one of the three possible genotypes. For this evaluation, visual inspection of the signal intensities through cluster plots, also termed signal intensity plots, has been recommended [4,5], so that the validity of the genotype assignment can be assessed. Our experience with previously performed GWAS of coronary artery disease [6,7] has shown that the current gold standard for this evaluation is visual inspection of the cluster plots by at least two independent and experienced readers. This is very time-consuming and depends on the training and availability of experienced readers. Because of this, usually only a selection of interesting SNPs in a GWAS is evaluated. Previous analyses have shown that erroneous genotype scoring can lead to false-positive or false-negative associations, so that the number of low-quality SNPs may be overestimated through this selection [2].
There are two principal approaches to tackle this problem. The first focuses on improving the calling algorithm itself [8]. The second, which is followed here, aims at automating the evaluation of the cluster plots. We have developed an algorithm called automated cluster plot analysis (ACPA) that fulfills four requirements: 1) reduction of workload: out of a vast number of genotyped SNPs, ACPA classifies only a small proportion as being of questionable quality and thus needing visual inspection; 2) high negative predictive value: of the SNPs classified by ACPA as being of high quality, only a small proportion is erroneously classified; 3) reasonable speed: on a simple personal computer, 1,000 SNPs are analyzed in approximately 10 minutes, and the processes may also be split between different machines; and 4) user-friendly environment: ACPA was implemented in R using the GenABEL [9] library.

Data
Signal intensities and genotypes of 6,752 participants in the Framingham Heart Study were provided as Problem 2 for the Genetic Analysis Workshop 16 (GAW16).
Genotyping was performed using the Affymetrix GeneChip Human Mapping 500K Array Set. Details of the study and the genome-wide SNP scan can be found in Cupples et al. [10].

Algorithm
The ACPA algorithm, which analyses one SNP at a time, is described below for the case of exactly three different clusters, i.e., three genotype groups. If the SNP has a low minor allele frequency, leading to only two clusters, the algorithm is adapted appropriately. The validation status c of each cluster plot is logged in a file. The algorithm has been implemented in R (version 2.7.1) and uses the library GenABEL (version 1.4-1) for storing the genotype data. Each cluster plot can be generated together with its cluster boundaries in portable document format (PDF).
Two example plots are displayed in Figure 1. Figures 1a and 1b show SNPs with clearly separated clusters and with poorly separated clusters, respectively. Around each cluster an ellipse is constructed, and the number of potentially misclassified samples, i.e., samples from other clusters falling inside the ellipse, is counted. By using different values for the factor f (step 2d of the algorithm), the size of the ellipses (Figure 1) can be changed. Specifically, lowering f leads to smaller ellipses, so that fewer samples fall within the boundary. Using f = 1.5 follows the commonly used definition of outliers based on quartiles and the interquartile range. The sum of the counts over all three clusters is then compared with a cut-off value t. This parameter depends on the sample size. For this particular data set we chose t = 25, which maximizes the accuracy of ACPA.
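The text does not say how candidate values of t were searched. A minimal sketch of an accuracy-maximizing choice of t, assuming the readers' good/bad labels serve as the reference and that integer candidates are scanned in increasing order, could look as follows; the function name `choose_threshold` and its arguments are hypothetical.

```python
def choose_threshold(counts, is_bad, candidates):
    """Pick the cut-off t that maximizes accuracy against the readers' labels.

    A SNP is predicted to be bad when its ACPA count c exceeds t.
    Ties are resolved in favour of the first candidate examined.
    (Hypothetical helper, not part of the published ACPA code.)
    """
    best_t, best_acc = None, -1.0
    for t in candidates:
        # Count SNPs whose prediction (c > t) agrees with the reader label.
        correct = sum((c > t) == bad for c, bad in zip(counts, is_bad))
        accuracy = correct / len(counts)
        if accuracy > best_acc:
            best_t, best_acc = t, accuracy
    return best_t, best_acc
```

Because t depends on the sample size, such a search would have to be repeated for each data set rather than reusing t = 25 unchanged.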

Evaluation of the algorithm
The performance of the ACPA algorithm was evaluated by comparing its classifications with the decisions made by two experienced readers. First, we performed standard quality control (sQC) and omitted SNPs with deviations from Hardy-Weinberg equilibrium (p < 10⁻⁴ in an exact lack-of-fit test), a missing fraction > 0.02, or a minor allele frequency < 0.01. From the remaining SNPs we randomly selected 1,000. Both readers judged independently whether a SNP should be excluded from or kept for further analyses. They also stated the level of uncertainty (certain/uncertain) of their decisions. Because we cannot expect ACPA to outperform experienced readers, we only considered SNPs for which both readers came to the same decision and expressed certainty. We call a SNP good if both readers recommend keeping it for further analysis, and bad if both readers favor its exclusion.
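The sQC filters above can be sketched as a single keep/drop decision per SNP. This is a minimal sketch under stated assumptions: the text names only an "exact lack-of-fit test", so the exact Hardy-Weinberg test is implemented here with the recursion described by Wigginton, Cutler and Abecasis (2005), which may differ from the implementation actually used. The genotype coding (0 = AA, 1 = AB, 2 = BB, -1 = missing) and the function names `hwe_exact_p` and `passes_sqc` are hypothetical.

```python
def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact HWE p-value conditional on allele counts (assumed
    Wigginton-style recursion over heterozygote counts)."""
    n = n_aa + n_ab + n_bb
    n_rare = 2 * min(n_aa, n_bb) + n_ab        # copies of the minor allele
    probs = [0.0] * (n_rare + 1)               # unnormalized P(#het = h)
    # Start near the expected heterozygote count, with the parity of n_rare.
    mid = n_rare * (2 * n - n_rare) // (2 * n)
    if (n_rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0
    # Walk downwards: two fewer heterozygotes, one more of each homozygote.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - mid - hom_r
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1
    # Walk upwards: two more heterozygotes, one fewer of each homozygote.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - mid - hom_r
    while het <= n_rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2.0) * (het + 1.0))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1
    observed = probs[n_ab]
    # p-value: total probability of tables no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= observed) / sum(probs))


def passes_sqc(genotypes, hwe_alpha=1e-4, max_missing=0.02, min_maf=0.01):
    """Keep a SNP only if it passes the missing-fraction, MAF and HWE filters."""
    n_aa = sum(1 for x in genotypes if x == 0)
    n_ab = sum(1 for x in genotypes if x == 1)
    n_bb = sum(1 for x in genotypes if x == 2)
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    missing_fraction = 1.0 - n_called / len(genotypes)
    maf = (2 * min(n_aa, n_bb) + n_ab) / (2.0 * n_called)
    return (missing_fraction <= max_missing
            and maf >= min_maf
            and hwe_exact_p(n_aa, n_ab, n_bb) >= hwe_alpha)
```

The cheap missing-fraction and MAF checks run first, so the exact test is only evaluated for SNPs that survive them.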
We report sensitivity, specificity, positive predictive values (PPV) and negative predictive values (NPV) of ACPA as well as 95% confidence intervals (95% CI) using Wilson's score method [11].Here, sensitivity denotes the proportion of correctly identified bad clustering SNPs, and specificity is the proportion of correctly identified good clustering SNPs.We evaluated the performance of ACPA for two different ellipse sizes (f = 1.5 and f = 3).
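The four performance measures and their Wilson score intervals can be computed from a 2 × 2 table of ACPA decisions against reader decisions. The sketch below treats "bad SNP" as the positive class, matching the definitions above; the function names and the example counts are hypothetical and are not the study's results.

```python
from math import sqrt


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives 95%)."""
    p = successes / n
    denom = 1.0 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


def acpa_performance(tp, fp, tn, fn):
    """Point estimates and 95% Wilson intervals with "bad SNP" as positive:
    tp = bad SNPs flagged by ACPA, fn = bad SNPs missed,
    tn = good SNPs passed, fp = good SNPs flagged."""
    def estimate(k, n):
        return k / n, wilson_ci(k, n)

    return {
        "sensitivity": estimate(tp, tp + fn),
        "specificity": estimate(tn, tn + fp),
        "ppv": estimate(tp, tp + fp),
        "npv": estimate(tn, tn + fn),
    }


# Illustrative (made-up) counts only.
print(acpa_performance(tp=8, fp=5, tn=85, fn=2))
```

Unlike the simple Wald interval, the Wilson interval stays within [0, 1] and behaves sensibly for proportions close to 0 or 1, such as a specificity near 99%.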

Results
Out of the 486,605 BRLMM-called SNPs that were provided for GAW16, 343,427 SNPs passed the sQC. For 695 of the 1,000 randomly selected SNPs, the two independent readers came to identical decisions, and both readers expressed certainty about their decision. Of these, 588 (84.6%) were judged as correctly called SNPs, and the remaining 107 SNPs were classified as SNPs with unreliable genotype assignment. Point and interval estimates for sensitivity, specificity, PPV, and NPV are shown in Table 1. For f = 1.5 we achieved a specificity of 99%, i.e., almost all good SNPs were recognized by ACPA. Increasing the cluster boundaries (f = 3) resulted in a loss of specificity (86%) but increased the ability of ACPA to detect badly clustered SNPs. The PPV declined from 93% to 53%.

1. Select all individuals with assigned genotypes, i.e., delete all missing genotypes.
2. For clusters k = 1 to 3:
   a) Perform a principal-component analysis, using only the data of cluster k.
   b) Transform the data of all clusters according to the first two principal components estimated in step a).
   c) Calculate the Mahalanobis distance from the center of cluster k to all samples within cluster k.
   d) Define the cluster boundary b as b = q₃ + f · IQR, where q₃ is the upper quartile and IQR is the interquartile range of the Mahalanobis distances computed in step c). Default: f = 3.
   e) Calculate the Mahalanobis distance from the center of cluster k to the samples not included in k.
   f) Count the number cₖ of samples not included in k that fall within the boundary.
3. Sum the number of samples falling within the boundary over all three clusters, i.e., c = ∑ₖ cₖ.
4. A SNP is called to have unreliable genotype assignments if c exceeds a threshold t. Default: t = 25.
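The steps above can be sketched in a few lines of NumPy. This is a minimal, hypothetical re-implementation, not the authors' R/GenABEL code: it assumes the intensities arrive as an n × 2 array, genotypes are coded 0/1/2 with -1 for missing, and every cluster has at least three samples so that its covariance matrix is invertible. Projecting onto a cluster's principal components and dividing each squared score by the corresponding component variance gives exactly the Mahalanobis distance for that cluster. The function names are illustrative.

```python
import numpy as np


def mahalanobis_to_cluster(points, cluster):
    """Mahalanobis distance of each row of `points` from the centre of
    `cluster`, computed in the cluster's principal-component coordinates."""
    center = cluster.mean(axis=0)
    cov = np.cov(cluster, rowvar=False)       # 2 x 2 covariance of cluster k
    eigvals, eigvecs = np.linalg.eigh(cov)    # step 2a: PCA of cluster k
    scores = (points - center) @ eigvecs      # step 2b: project onto its PCs
    # Steps 2c/2e: squared distance = sum of squared PC scores / PC variances.
    return np.sqrt(np.sum(scores ** 2 / eigvals, axis=1))


def acpa_count(intensities, genotypes, f=3.0):
    """Return c, the number of samples lying inside another cluster's boundary."""
    intensities = np.asarray(intensities, dtype=float)
    genotypes = np.asarray(genotypes)
    called = genotypes >= 0                   # step 1: drop missing genotypes
    x, g = intensities[called], genotypes[called]
    c = 0
    for k in np.unique(g):                    # step 2: works for 2 or 3 clusters
        in_k = g == k
        d_in = mahalanobis_to_cluster(x[in_k], x[in_k])
        q1, q3 = np.percentile(d_in, [25, 75])
        boundary = q3 + f * (q3 - q1)         # step 2d
        d_out = mahalanobis_to_cluster(x[~in_k], x[in_k])
        c += int(np.sum(d_out <= boundary))   # step 2f: c_k
    return c                                  # step 3: c = sum of c_k


def acpa_is_unreliable(intensities, genotypes, f=3.0, t=25):
    """Step 4: flag the SNP when c exceeds the threshold t."""
    return acpa_count(intensities, genotypes, f) > t
```

Looping over `np.unique` of the called genotypes is one simple way to cover the two-cluster case of a low-frequency SNP; the adaptation used in ACPA itself is not described in detail.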

Figure 1
Figure 1: Examples of cluster plots. Cluster plots for two SNPs. One spot corresponds to one sample. Samples with genotypes AA and BB are red and blue, respectively. Heterozygous samples are shown in green; samples with missing genotypes are black. The ellipses represent the cluster boundaries as computed by ACPA. a, A SNP with no samples in overlapping ellipses; b, red samples lie in the green ellipse. At the bottom of the green ellipse, samples have been erroneously classified as red samples.

Table 1: Quality of the automatic analysis with ACPA