Prioritizing single-nucleotide variations that potentially regulate alternative splicing
© Teng et al; licensee BioMed Central Ltd. 2011
Published: 29 November 2011
Recent evidence suggests that many complex diseases are caused by genetic variations that play regulatory roles in controlling gene expression. Most genetic studies focus on nonsynonymous variations, which can alter the amino acid composition of a protein and are therefore believed to have the greatest impact on phenotype. Synonymous variations, however, can also play important roles in disease pathogenesis by regulating pre-mRNA processing and translational control. In this study, we systematically survey the effects of single-nucleotide variations (SNVs) on the binding affinity of RNA-binding proteins (RBPs). Among the 10,113 synonymous SNVs identified in 697 individuals in the 1000 Genomes Project and distributed by Genetic Analysis Workshop 17 (GAW17), we identified 182 variations located in alternatively spliced exons that can significantly change the binding affinity of nine RBPs whose binding preferences for 7-mer RNA sequences were previously reported. We found that the minor allele frequencies of these variations are similar to those of nonsynonymous SNVs, suggesting that they are likely functional. We propose a workflow to identify, from exome-sequencing-derived genetic variations, phenotype-associated regulatory SNVs that might affect alternative splicing. Among the SNVs simulated in GAW17 to affect the quantitative traits, we further identified two SNVs for trait Q1 and four for trait Q2 that are predicted to be involved in regulating alternative splicing.
Alternative splicing is an important level of gene regulation and contributes greatly to proteome diversity [1]. In humans, more than 90% of genes encode multiple protein isoforms [2], and many diseases are caused by the dysregulation of splicing patterns [3, 4]. In eukaryotic cells, splicing patterns are tightly regulated in a temporospatial manner by a set of RNA-binding proteins (RBPs) that bind to cis-acting sites on the precursor mRNA (pre-mRNA). With the advent of microarray and next-generation sequencing technologies, the RNA-binding consensus sequences of several RBPs have recently been identified [5–8].
Nonsynonymous single-nucleotide polymorphisms (SNPs) alter the amino acid composition of a protein; their effects on protein function can be predicted by many bioinformatics tools, including SIFT [9], PolyPhen [10], SNPs3D [11], and MAPP [12]. Nonsynonymous SNPs contribute to the etiology of many diseases [13]. Recent studies, however, suggest that synonymous SNPs in exons are also functionally important [14]. These variations frequently affect the binding of splicing regulatory factors (SRFs) and can potentially result in abnormal pre-mRNA splicing patterns. We have previously reported a transcriptome-wide profiling of the SFRS1 protein, a highly conserved, essential pre-mRNA splicing factor with dual functions in constitutive and alternative splicing [7]. A search for the SFRS1 consensus motif within the Human Gene Mutation Database identified 181 mutations in 82 different genes that disrupt predicted SFRS1 binding sites [7].
In this study, we present a computational strategy to systematically characterize the potential of single-nucleotide variations (SNVs) to regulate alternative splicing. Focusing on the exonic SNVs identified in each of the seven populations in the 1000 Genomes Project, we found that the minor allele frequencies (MAFs) of synonymous SNVs that reside in alternatively spliced exons and potentially disrupt RBP binding are closer to those of nonsynonymous SNVs than to those of other SNVs, suggesting that the two groups are under similar selection pressure. This result highlights the potential importance of regulatory SNVs in disease and phenotype associations and the value of integrating biological annotation into genetic association studies. Furthermore, among the SNVs simulated to be associated with the phenotypes distributed by Genetic Analysis Workshop 17 (GAW17) [15], we identified two and four regulatory candidates for phenotypes Q1 and Q2, respectively.
Genotypes of SNVs for 697 individuals from seven populations (CEPH [European-descent population from Utah], Denver Chinese, Han Chinese, Japanese, Luhya, Tuscan, and Yoruba) were derived from the sequence alignment files created in the 1000 Genomes Project and distributed by GAW17. We considered 24,487 exonic SNVs (both synonymous and nonsynonymous) within 3,205 autosomal genes. For phenotypes, the GAW17 committee carried out 200 replicate simulations of traits Q1 and Q2 and of disease liability for the 697 individuals; the simulations were based on an answer sheet of associated SNVs for each trait [15].
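MAFs underlie the comparisons between SNV classes made in this study. As a minimal sketch, assuming genotypes are coded per individual as the count (0, 1, or 2) of one chosen allele, with `None` marking a missing call (a coding we assume here; the GAW17 files may use a different layout), the MAF of one SNV within each population could be computed as below. The function names are hypothetical.

```python
from typing import Iterable, Optional


def minor_allele_frequency(genotypes: Iterable[Optional[int]]) -> float:
    """MAF of one SNV from per-individual allele counts (0, 1, 2; None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    if any(g not in (0, 1, 2) for g in called):
        raise ValueError("genotypes must be coded as 0, 1, or 2")
    # Frequency of the counted allele among the 2N chromosomes with a call.
    freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq, 1.0 - freq)


def maf_by_population(genotypes_by_pop: dict[str, list[Optional[int]]]) -> dict[str, float]:
    """Compute the MAF separately within each population (e.g. CEPH, Yoruba)."""
    return {pop: minor_allele_frequency(gts) for pop, gts in genotypes_by_pop.items()}


if __name__ == "__main__":
    # Hypothetical genotypes for a single SNV in two populations.
    example = {"Tuscan": [0, 1, 2, None, 0], "Yoruba": [2, 2, 2, 1]}
    print(maf_by_population(example))
```

Computing the MAF per population, rather than over all 697 individuals pooled, mirrors the population-by-population analysis described in the text.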
Assessing the capability of a genetic variation to change the binding affinity of an RNA-binding protein
As a proof of concept, we focus our analysis on nine RBPs whose binding affinity has been characterized using an in vitro assay called RNAcompete [8]. Using a customized microarray that contains all possible 7-base and 8-base sequences and a single binding reaction, this technology determines the relative preferences of RBPs for short RNA sequences. For each of the nine RBPs studied (HuR, Vts1, FUSIP1, PTB, U1A, SF2/ASF, SLM2, RBM4, and YB1), a preference score is provided for every possible RNA 7-mer, indicating the binding affinity of that particular RBP and 7-mer pair [8].
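The capability of an SNV to change the binding affinity of an RBP is quantified by the AC score, defined as the logarithmic ratio of the binding likelihoods of the minor and major alleles:
$$AC = \log \frac{L(PS_{min})}{L(PS_{maj})} \qquad (1)$$
where $PS_{min}$ and $PS_{maj}$ are the RBP's preference scores for the minor allele and the major allele, respectively; $L(\cdot)$ is the binding likelihood of an allele, computed from its preference score using the score distributions $b$ and $n$ for binding and nonbinding events, respectively; and, for the minor allele, $P_b(PS_{min}) = \int_{PS_{min}}^{\infty} b(x)\,dx$ is the area to the right of $PS_{min}$ under the binding distribution $b$. A positive or negative AC value indicates that the minor allele causes a gain or loss of binding affinity, respectively.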
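The two steps implied here, scoring each allele against a 7-mer preference table and turning the two scores into an AC value, can be sketched as follows. This is a hypothetical illustration, not the authors' code. It assumes (i) that an allele's preference score is the maximum table score over the seven 7-mers that overlap the variant, (ii) that b and n are normal distributions given as (mean, sd) pairs, and (iii) that the binding likelihood is the ratio of the upper-tail areas under b and n. None of these choices is specified in the text, and all function names and parameter values are made up for the example.

```python
import itertools
import math


def overlapping_kmers(seq: str, pos: int, k: int = 7) -> list[str]:
    """Return every k-mer of `seq` that contains the 0-based index `pos`."""
    start = max(0, pos - k + 1)
    end = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(start, end + 1)]


def allele_score(context: str, pos: int, allele: str,
                 scores: dict[str, float], k: int = 7) -> float:
    """Preference score of one allele placed at `pos` in a DNA/RNA context.

    ASSUMPTION: the maximum score over all overlapping k-mers is used.
    `scores` must contain every RNA k-mer (KeyError otherwise).
    """
    rna = (context[:pos] + allele + context[pos + 1:]).upper().replace("T", "U")
    kmers = overlapping_kmers(rna, pos, k)
    if not kmers:
        raise ValueError("context is shorter than k")
    return max(scores[km] for km in kmers)


def upper_tail(x: float, mean: float, sd: float) -> float:
    """Area to the right of x under a normal(mean, sd) density.

    erfc keeps small tail areas accurate, unlike 1 - cdf.
    """
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def binding_likelihood(ps: float, b: tuple[float, float], n: tuple[float, float]) -> float:
    """ASSUMPTION: ratio of upper-tail areas under the binding (b) and
    nonbinding (n) score distributions."""
    return upper_tail(ps, *b) / upper_tail(ps, *n)


def ac_score(ps_min: float, ps_maj: float,
             b: tuple[float, float], n: tuple[float, float]) -> float:
    """Natural-log ratio of minor- vs major-allele binding likelihoods."""
    return math.log(binding_likelihood(ps_min, b, n) / binding_likelihood(ps_maj, b, n))


if __name__ == "__main__":
    # Hypothetical 7-mer table: every 7-mer scores 0 except one.
    table = {"".join(p): 0.0 for p in itertools.product("ACGU", repeat=7)}
    table["ACGUACU"] = 4.2
    context = "ACGTACGTACGTA"  # the variant sits at 0-based index 6 (G)
    ps_maj = allele_score(context, 6, "G", table)
    ps_min = allele_score(context, 6, "T", table)
    b, n = (2.0, 1.0), (0.0, 1.0)  # hypothetical score distributions
    print(ps_min, ps_maj, ac_score(ps_min, ps_maj, b, n))
```

When b lies to the right of n with equal spread, the tail-area ratio increases with the preference score, so a minor allele that scores higher than the major allele yields a positive AC (gain of binding), consistent with the sign convention stated above.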
Alternative splicing events
We used the AltEvent track in the UCSC (University of California, Santa Cruz) Genome Browser [16] to identify the SNVs residing in exons that can be alternatively spliced. This track documents various types of alternative splicing events that give rise to more than one gene isoform. Here we consider only the SNVs that the UCSC Table Browser labels as lying in alternatively spliced exons, because these variations are more likely to be involved in splicing regulation.
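A local equivalent of this filtering step is an interval-overlap test between SNV positions and exon coordinates. The sketch below assumes the AltEvent exons have already been exported (for example, from the Table Browser) as (chromosome, chromStart, chromEnd) tuples in UCSC's 0-based, half-open coordinates, and that SNV positions are 1-based as in VCF files; the function name, input layout, and SNV identifiers are hypothetical.

```python
from collections import defaultdict


def snvs_in_exons(snvs, exons):
    """Return ids of SNVs that fall inside at least one exon.

    snvs  -- iterable of (snv_id, chrom, pos) with 1-based positions (VCF style)
    exons -- iterable of (chrom, chromStart, chromEnd) in UCSC 0-based,
             half-open coordinates
    """
    # Group exon intervals by chromosome so each SNV is compared only
    # with exons on its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in exons:
        by_chrom[chrom].append((start, end))

    hits = []
    for snv_id, chrom, pos in snvs:
        zero_based = pos - 1  # convert the 1-based position to UCSC convention
        if any(start <= zero_based < end for start, end in by_chrom.get(chrom, [])):
            hits.append(snv_id)
    return hits


if __name__ == "__main__":
    exons = [("chr1", 100, 200), ("chr2", 50, 60)]
    snvs = [("rsA", "chr1", 101), ("rsB", "chr1", 100), ("rsC", "chr1", 200),
            ("rsD", "chr1", 201), ("rsE", "chr2", 55), ("rsF", "chr3", 55)]
    print(snvs_in_exons(snvs, exons))  # prints ['rsA', 'rsC', 'rsE']
```

The linear scan is adequate for a few thousand genes; an interval tree or sorted-interval search would be the usual choice at genome scale.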
GAW17 data analysis
For each of the 24,487 exonic SNVs, we calculated its capability to change the binding affinity of each RBP using the AC score (Eq. (1)), defined as the logarithmic ratio of the binding likelihoods of the minor and major alleles. The more extreme the AC value, positive or negative, the more likely the SNV is to alter the RBP's binding affinity (gain or loss of binding, respectively). For each RBP, the AC values of all exonic SNVs follow a normal-like distribution; one example, for SFRS1 (also known as SF2), is shown in Figure 1B. We treat the outliers, that is, SNVs whose AC values lie more than 3 standard deviations above or below the mean, as candidates that may change the binding affinity of the RBP.
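The outlier rule can be sketched for a single RBP as below. Using the sample standard deviation (`statistics.stdev`) rather than the population standard deviation is an assumption, as is the input layout (a mapping from hypothetical SNV identifiers to AC scores).

```python
import statistics


def ac_outliers(ac_by_snv: dict[str, float], k: float = 3.0) -> tuple[list[str], list[str]]:
    """Split one RBP's AC scores into gain and loss outliers.

    An SNV is a gain candidate if AC > mean + k*sd and a loss candidate if
    AC < mean - k*sd; k = 3 follows the rule stated in the text.
    """
    values = list(ac_by_snv.values())
    if len(values) < 2:
        raise ValueError("need at least two AC values")
    mu = statistics.mean(values)
    sd = statistics.stdev(values)  # ASSUMPTION: sample standard deviation
    gain = sorted(snv for snv, ac in ac_by_snv.items() if ac > mu + k * sd)
    loss = sorted(snv for snv, ac in ac_by_snv.items() if ac < mu - k * sd)
    return gain, loss


if __name__ == "__main__":
    # Hypothetical AC scores: 20 near-zero values plus two extremes.
    scores = {f"snv{i}": (0.1 if i % 2 == 0 else -0.1) for i in range(20)}
    scores["snvGain"] = 10.0
    scores["snvLoss"] = -10.0
    print(ac_outliers(scores))
```

Calling the function once per RBP gives, for each of the nine proteins, the SNVs predicted to gain or lose binding.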
Results and discussion
Many synonymous variations potentially change the binding of RBPs
Synonymous variations that potentially regulate alternative splicing show lower minor allele frequency
Prioritizing phenotype-associated variations that regulate alternative splicing
Alternative splicing regulatory SNVs in Q1 and Q2
Minor allele function
We present a strategy to prioritize synonymous SNVs according to their likely capacity to change the binding affinity of an RBP and thereby affect pre-mRNA splicing. Synonymous variations within alternatively spliced exons that affect RBP binding appear to be under selection pressure similar to that on nonsynonymous SNVs and are therefore candidates for functional SNVs affecting the phenotype. Synonymous SNVs that lie outside AltEvent exons or that do not affect RBP binding are under less selection pressure and are therefore considered less likely to be functional. We also show that some RBPs may have antagonistic relationships when binding to synonymous variations, whereas others share a common consequence of gain or loss of binding. The trait-specific regulatory SNVs indicate that some nonsynonymous SNVs not only cause amino acid substitutions but also regulate alternative splicing.
Our proposed workflow provides a practical way to identify phenotype-associated variations involved in alternative splicing. Note that the workflow can identify both nonsynonymous and synonymous variations that affect alternative splicing. The results of such an analysis can generate novel hypotheses for investigating the mechanisms of disease-causing mutations.
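Read as a set operation, the workflow keeps SNVs that satisfy three criteria described in this article: location in an AltEvent exon, an outlier AC score for at least one RBP, and association with the trait of interest (here, the simulated GAW17 answer-sheet SNVs). The sketch below assumes exactly this intersection; the authors' actual prioritization may apply further filters, and all identifiers are hypothetical.

```python
def prioritize(alt_exon_snvs: set[str],
               rbp_outliers: dict[str, set[str]],
               trait_snvs: set[str]) -> dict[str, list[str]]:
    """Return {snv_id: sorted RBPs whose binding the SNV may alter}.

    alt_exon_snvs -- SNVs located in alternatively spliced (AltEvent) exons
    rbp_outliers  -- RBP name -> SNVs flagged as AC outliers for that RBP
    trait_snvs    -- SNVs associated with the trait (e.g. Q1 or Q2)
    """
    candidates: dict[str, list[str]] = {}
    for rbp, flagged in rbp_outliers.items():
        # Keep only SNVs that pass all three criteria for this RBP.
        for snv in flagged & alt_exon_snvs & trait_snvs:
            candidates.setdefault(snv, []).append(rbp)
    return {snv: sorted(rbps) for snv, rbps in sorted(candidates.items())}


if __name__ == "__main__":
    alt = {"s1", "s2", "s3"}
    outliers = {"SF2/ASF": {"s1", "s4"}, "PTB": {"s1", "s2"}}
    trait = {"s1", "s2", "s5"}
    print(prioritize(alt, outliers, trait))
```

Because nothing in the intersection depends on whether an SNV is synonymous, the same sketch would retain nonsynonymous candidates as well, in line with the note above.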
The binding preferences of the nine RBPs analyzed in this study have been experimentally assessed on RNA 7-mers [8]. Without loss of generality, the same strategy can be applied to other RBPs or to microRNA binding sites. With the advent of next-generation sequencing and the application of CLIP (crosslinking immunoprecipitation)-seq and RIP-seq technologies [7, 18, 19], binding affinities for more RBPs will become available, providing greater opportunity to understand the genetic mechanisms of disease.
This work is supported by National Institutes of Health (NIH) grants R21 AA017941 (awarded to YL and HJE), R01 GM085121 (awarded to JRS), U10 AA008401 (awarded to HJE), and P01 AG018397 (awarded to YL and HJE) and the Indiana Genomics Initiative of Indiana University (supported in part by the Lilly Endowment Inc.). The Genetic Analysis Workshop is supported by NIH grant R01 GM031575.
This article has been published as part of BMC Proceedings Volume 5 Supplement 9, 2011: Genetic Analysis Workshop 17. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/5?issue=S9.
- Black DL: Mechanisms of alternative pre-messenger RNA splicing. Annu Rev Biochem. 2003, 72: 291-336. 10.1146/annurev.biochem.72.121801.161720.
- Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB: Alternative isoform regulation in human tissue transcriptomes. Nature. 2008, 456: 470-476. 10.1038/nature07509.
- Skotheim RI, Nees M: Alternative splicing in cancer: noise, functional, or systematic?. Int J Biochem Cell Biol. 2007, 39: 1432-1449. 10.1016/j.biocel.2007.02.016.
- Fackenthal JD, Godley LA: Aberrant RNA splicing and its functional consequences in cancer cells. Dis Model Mech. 2008, 1: 37-42. 10.1242/dmm.000331.
- Lunde BM, Moore C, Varani G: RNA-binding proteins: modular design for efficient function. Nat Rev Mol Cell Biol. 2007, 8: 479-490. 10.1038/nrm2178.
- Hogan DJ, Riordan DP, Gerber AP, Herschlag D, Brown PO: Diverse RNA-binding proteins interact with functionally related sets of RNAs, suggesting an extensive regulatory system. PLoS Biol. 2008, 6: e255. 10.1371/journal.pbio.0060255.
- Sanford JR, Wang X, Mort M, Vanduyn N, Cooper DN, Mooney SD, Edenberg HJ, Liu Y: Splicing factor SFRS1 recognizes a functionally diverse landscape of RNA transcripts. Genome Res. 2009, 19: 381-394.
- Ray D, Kazan H, Chan ET, Castillo LP, Chaudhry S, Talukder S, Blencowe BJ, Morris Q, Hughes TR: Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins. Nat Biotechnol. 2009, 27: 667-670. 10.1038/nbt.1550.
- Ng PC, Henikoff S: SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003, 31: 3812-3814. 10.1093/nar/gkg509.
- Ramensky V, Bork P, Sunyaev S: Human non-synonymous SNPs: server and survey. Nucleic Acids Res. 2002, 30: 3894-3900. 10.1093/nar/gkf493.
- Yue P, Melamud E, Moult J: SNPs3D: candidate gene and SNP selection for association studies. BMC Bioinformatics. 2006, 7: 166. 10.1186/1471-2105-7-166.
- Stone EA, Sidow A: Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity. Genome Res. 2005, 15: 978-986. 10.1101/gr.3804205.
- Burton PR, Clayton DG, Cardon LR, Craddock N, Deloukas P, Duncanson A, Kwiatkowski DP, McCarthy MI, Ouwehand WH, Samani NJ, et al: Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants. Nat Genet. 2007, 39: 1329-1337. 10.1038/ng.2007.17.
- Cartegni L, Chew SL, Krainer AR: Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet. 2002, 3: 285-298. 10.1038/nrg775.
- Almasy LA, Dyer TD, Peralta JM, Kent JW, Charlesworth JC, Curran JE, Blangero J: Genetic Analysis Workshop 17 mini-exome simulation. BMC Proc. 2011, 5 (suppl 9): S2. 10.1186/1753-6561-5-S9-S2.
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res. 2002, 12: 996-1006. 10.1101/gr.229102.
- Ringner M: What is principal component analysis?. Nat Biotechnol. 2008, 26: 303-304. 10.1038/nbt0308-303.
- Ule J, Jensen K, Mele A, Darnell RB: CLIP: a method for identifying protein-RNA interaction sites in living cells. Methods. 2005, 37: 376-386. 10.1016/j.ymeth.2005.07.018.
- Yeo GW, Coufal NG, Liang TY, Peng GE, Fu XD, Gage FH: An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells. Nat Struct Mol Biol. 2009, 16: 130-137. 10.1038/nsmb.1545.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.