Mining the TRAF6/p62 interactome for a selective ubiquitination motif
© Jadhav et al; licensee BioMed Central Ltd. 2011
Published: 28 April 2011
Skip to main content
© Jadhav et al; licensee BioMed Central Ltd. 2011
Published: 28 April 2011
A new approach is described here to predict ubiquitinated substrates of the E3 ubiquitin ligase, TRAF6, which takes into account its interaction with the scaffold protein SQSTM1/p62. A novel TRAF6 ubiquitination motif defined as [–(hydrophobic)–k–(hydrophobic)–x–x–(hydrophobic)– (polar)–(hydrophobic)–(polar)–(hydrophobic)] was identified and used to screen the TRAF6/p62 interactome composed of 155 proteins, that were either TRAF6 or p62 interactors, or a negative dataset, composed of 54 proteins with no known association to either TRAF6 or p62. NRIF (K19), TrkA (K485), TrkB (K811), TrkC (K602 and K815), NTRK2 (K828), NTRK3 (K829) and MBP (K169) were found to possess a perfect match for the amino acid consensus motif for TRAF6/p62 ubiquitination. Subsequent analyses revealed that this motif was biased to the C-terminal regions of the protein (nearly 50% the sites), and had preference for loops (~50%) and helices (~37%) over beta-strands (15% or less). In addition, the motif was observed to be in regions that were highly solvent accessible (nearly 90%). Our findings suggest that specific Lysines may be selected for ubiquitination based upon an embedded code defined by a specific amino acid motif with structural determinants. Collectively, our results reveal an unappreciated role for the scaffold protein in targeting ubiquitination. The findings described herein could be used to aid in identification of other E3/scaffold ubiquitination sites.
The process of signal transduction is dependent upon specific protein-protein interactions, with a small number of proteins – called 'hubs' possessing the ability to interact with many different partners to form multimeric signaling complexes referred to as signalsomes. These hubs mediate interactions by their modular protein domains that confer specific binding activity to their interacting partners. Protein p62 contains several structural motifs that allow it to function as a hub for protein-protein interactions. These motifs include an acidic interaction domain (AID/ORCA/PC/PB1) that binds the aPKC, a ZZ finger, a binding site for the RING finger protein TRAF6, two PEST sequences, and the UBA domain . In this study, we focused on the mechanism by which TRAF6, along with p62, targets specific Lysines for ubiquitination. The addition of ubiquitin, a 76 amino acid protein, occurs via a three step process involving the concerted action of the E1 ubiquitin activating enzyme, E2 the ubiquitin conjugating enzyme and E3 the ubiquitin ligase. Substrates are recognized by different E3s or E2/E3 complexes. The incorporation of a scaffold into the model, such as p62, would serve as a crucial bridge between enzyme (E3 ligase, TRAF6) and its substrate(s) and provide specificity for enzyme-substrate reactions . Thus substrate recognition, site selection and ultimately the ubiquitination reaction result from the activation of the E3 once docked on the scaffold. In support of this model, inhibition of TRAF6/p62 interaction blocks the activity of TRAF6 along with diminished K63-ubiquitination of its target substrate .
There are approximately one thousand reported E3 ubiquitin ligases in eukaryotes. The preferred substrates for most of these ligases remain largely unknown. Moreover, it remains unclear how among the many Lysines (K) found in an ubiquitinated protein only a few are selected for post-translational modification. Like all signal transduction processes, ubiquitination is a result of specific protein-protein interactions and biochemical reactions occurring at the sub-cellular locales, mediated by modular protein domains. Furthermore, E3 ligases and scaffold proteins interact with numerous binding partners through their multi-domain structures. SQSTM1/p62 is one such scaffold that is known to be a crucial for bridging the E3 ligase, TRAF6 and its substrates while at the same time providing specificity for the enzyme-substrate reactions. We propose that the scaffold plays an integral component in activation of the E3 and may target the specificity of the ubiquitination reaction [3, 4]. Therefore, the scaffold TAB 2 and TAB3 which also interact with TRAF6  would mediate ubiquitination of a different set of substrates, compared to the interaction of TRAF6 with the scaffold p62. These interactions would provide the needed specificity for formation of a signalsome in response to different receptors and external stimuli.
There are approximately 617 genes encoding putative ubiquitin (Ub) E3s compared to the 518 genes reported for protein kinases . Preferred substrates for most of these enzymes remain unknown. The biological importance of E3s requires understanding the site selection process involved in substrate recognition during the ubiquitination reaction. Eukaryotic cells express a single ubiquitin-activating enzyme (E1) that activates free ubiquitin for subsequent transfer to one of approximately 50 ubiquitin-conjugating enzymes (E2) . Ubiquitin E3 ligases recruit both substrate and activated ubiquitin to mediate the transfer of the ubiquitin molecule to the targeted protein either directly or with the help of E2 enzymes . The substrate specificity of the ubiquitination process occurs at the level of the E3 ubiquitin ligases. A large numbers of cellular proteins are known to be ubiquitinated and correspondingly, there are large numbers of E3 ligases with a diverse range of substrates. With fewer than 1% of the cellular proteins being ubiquitinated at a given time on a select number of Lysine residues, our understanding of the ubiquitination process is still in its infancy. A number of in vivo and in vitro methods have been employed to identify ubiquitinated substrates and their sites, including proteome-scale analyses of the substrates [11–13]. All these methods are time-consuming, labor-intensive, and expensive. In addition, they are focused on characterizing the ‘ubiquitinated proteome’ rather than studying single enzyme substrates. In contrast, computational approaches represent promising alternative methods for identification of ubiquitination sites would be of great value to the field. Until recently, no consensus amino acid motif had been reported for a single ubiquitin ligase. The reported biological specificity seems to be associated with substrate selection. This observation prompted us to hypothesize that there exists an embedded code specified by a string of amino acids that is read by E3 ligase in the target substrate. Employing this hypothesis, we developed a method to predict putative TRAF6/p62 ubiquitination sites. To facilitate identification of the putative consensus motif(s) within a substrate protein, a brute-force search algorithm was designed and implemented to search for the site.
As the starting point, we examined protein sequences of two known TRAF6/p62 substrates, TrkA and NRIF. This initial analysis concentrated on target ubiquitination sites selected to optimize our search for any potential consensus motif. Examination of flanking residues surrounding the target Lysine revealed the presence of a likely consensus motif, which was then used to screen additional protein sequences derived from the Trk receptor family. Ubiquitination sites in TrkB and TrkC proteins were first identified in silico and then confirmed through site-directed mutagenesis and functional testing. The final analysis identified a 10-amino acid long sequence of [-hydrophobic – k – hydrophobic – x – x – hydrophobic - polar1 – hydrophobic - polar2 – hydrophobic -]. The hydrophobic amino acids included Alanine, Leucine, Valine, Methionine, Glycine, Phenylalanine, or Isoleucine. The polar1 amino acids included Glutamine, Tyrosine, Cysteine, or Serine and polar2 included Histidine, Aspartic Acid, or Threonine (Figure 1B).
The overall goal of this study was to test the hypothesis that TRAF6 and p62- interacting proteins are putative E3 ubiquitin ligase substrates sharing a common target ubiquitination motif. Because our putative motif was pattern based and did not exhibit a fixed amino-acid sequence, we organized our project in a step-wise fashion. First we asked if the putative motif was broadly present within TRAF6 and p62-interacting proteins. This search included questions regarding the form and complexity of extant motifs, as well as, what subset of amino acids served as the core components. Next we attempted to evaluate the uniqueness of the motif forms identified. This was accomplished by both screening for the occurrence of the motif pattern among unrelated proteins and by estimating the probability of observing such motifs within a randomly generated amino acid sequence. In the final step, we further characterized our target motifs relative to specific characteristics known to be associated with protein function.
To permit testing of our motif pattern, a database containing 209 proteins was organized. Amino acid sequences were obtained from the National Center for Biotechnology Information (NCBI) protein repository. The proteins sequences were from a mixture of mammalian sources with human-derived proteins preferentially used when available. A full listing of proteins is provided in Jadhav . Proteins within the full database were subdivided into two groups, an experimental dataset containing known or likely TRAF6 / p62 interactors (n = 155) and a negative dataset containing randomly selected proteins with no known association (n = 54). The experimental dataset was further divided into five groups depending on the protein’s likelihood of being a TRAF6/p62 substrate. The subsets ranged from known ubiquitinated substrates with mapped sites to either TRAF6 or p62 interacting proteins. All known TRAF6/p62 substrates with verified ubiquitination (Ub) sites were placed in group I. Group II contained known and tested substrates of the TRAF6 E3 ligase whose target Lysine Ub site(s) were not mapped nor identified and their interaction status with p62 was unknown. TRAF6- or p62- interactors identified from various protein-protein interaction databases [HPRD , and BioGRID  and EntrezGene  formed Groups III and IV, respectively. Finally, Group V was comprised of proteins from the insoluble Formic acid (FA) fraction of the brain from p62 knockout mice. These proteins were included because the trafficking and turnover of these proteins are p62 dependent, and therefore, these proteins may also be either TRAF6 or p62 substrates. The negative dataset contained 54 proteins with no known evidence of interaction with either TRAF6 or p62 proteins. This dataset was used both for control comparisons and as a test group for the search algorithm.
Amino acid sequences of the 209 database proteins were searched for matches to all possible combinations of the putative motif using a brute-force approach. An exhaustive search approach was chosen over existing algorithms for two reasons: most existing motif programs are either limited to searching for individual motif sequences which is inconsistent with our variable pattern or programs that do include variation (e.g. SLiMSearch) are designed to only report perfect matches. Because we were interested in evaluating the entire distribution of matches, particularly those found to be near-perfect (7, 8 or 9 positions), we felt that a brute force approach which identified all potential matches would be most informative. To conduct this search, a file containing all unique combinations of seven variable positions in the 10 amino acid long target motif (hydrophobic – k – hydrophobic – x – x – hydrophobic – polar1 – hydrophobic – polar2 – hydrophobic) was generated (Figure 1B). Ideally our search would have included all possible combinations of amino acids hypothesized to occur in each of the 10 motif positions. Such an approach would, however, have required examining over 80 million character strings at each lysine residue; certainly possible but not practical. As a compromise, we chose to search using only the seven well defined positions and to conduct an a posterior analysis of the two positions (x) that could contain any amino acid. The results from this analysis yielded no consistent amino acid pattern for these two sites (data not shown), suggesting that our decision was appropriate. Excluding the two x positions and the fixed Lysine, a total of 201,684 unique seven position motifs were possible. Two computer-based search algorithms were employed to facilitate screening for the presence of motifs. The first program, MotifMaker, was a simple looping program that generated and stored all 201,684 potential motifs combinations. The second program, MotifFinder, implemented a brute-force search algorithm for all possible motif constructs. The analysis started by identifying and counting each K(Lysine) within the target peptide. Any Kwithin 8 residues from the carboxyl end was excluded because it would be impossible for it to be a full motif. The motif search then proceeded by temporarily storing the K-1 , K + 1 , K + 4…K + 8amino acids for each Kas a character string and comparing this string to each of the 201,684 potential motif patterns. A step-up procedure was used to determine the best fit among the potential motifs. For each K, a counter would be initially set at “zero” matches. The counter would be progressively updated as positive matches between the target string and potential motifs were encountered. The matching motif would then be stored in the computer memory. By searching all possible motif combinations, this approach ensured that the maximum ‘best match’ motif was identified. In motifs that matched at all 7 variable positions, a perfect match was identified. In motifs with less than perfect matches (6, 5, 4,…1), the algorithm ensured that no motif with a greater number of matching locations could be found. The procedure was repeated at each Kwithin the target peptide until all positions had been searched. Information on protein ID, the location of each K, the “best match” motif pattern for the amino acids surrounding each K , the corresponding motif pattern and the total count of positive hits was stored as output for each protein. Both programs were developed and executed using MATLAB® V6.5 (MathWorks Inc., Natick MA) and are freely available from the corresponding author (MCW).
To verify the accuracy of our search results both the experimental and negative datasets were submitted to SLiMSearch [http://www.southampton.ac.uk/~re1u06/software/slimsearch/index.html]. This program searches pre-defined SLiMs (Short Linear Motifs) in a protein sequence database and allows individual residues to vary as defined by the user. We utilized this program to search for perfect (10 amino acid) matches which we then compared to the results provided by our MotifFinder algorithm.
Results from the MotifFinder search of the 209 study proteins were compiled as frequency distributions representing the number of positive motif “hits” ranked from 0 (no matches) to 7 (perfect match). Standard descriptive statistics (mean, mode, variance, skewness and kurtosis) were generated for each distribution. Of particular interest were the estimates of kurtosis and skewness  which were derived for each full empirical distribution and for distribution subsets (5, 6, 7 hits). Statistical comparisons between the experimental and negative datasets were made using Chi-squared analyses (with Yates’ correction and one degree of freedom) or t-tests . Statistical calculations were generated using SAS 9.1 (SAS Institute Inc., Cary NC), Minitab 16 (Minitab Inc., State College, PA) or by hand.
A randomization approach was used to further evaluate the uniqueness of the consensus motif and to provide a basis for initial estimates for the probability of pattern occurrence. This procedure involved the creation and searching of 999 amino acid sequences which were randomly generated using a parameter space designed to reflect the information content found in the original experimental dataset. The procedure was initiated by identifying the frequency of occurrence for each amino acid contained in the 155 proteins of the experimental set. This empirical distribution was used as the source for random draws of amino acids (with replacement) that were used to construct 999 sequences. Each of these sequences was 565 amino acids in length which corresponded to the median size of actual proteins in the experimental dataset. The entire randomized dataset was searched for motif matches using MotifFinder and the results compiled / analyzed as described above.
To aid in visual evaluation of putative motifs, sequence logos [WebLogo; 21, 22] were created for the sets of 7 (perfect fit) and 6 (near fit) amino acid motifs identified. This program generates a display of the amino acid distribution surrounding the core motif Lysine. The height of each letter in a stack is proportional to its frequency at that position in the motif set. Letters were further sorted with the most frequent amino acid on top.
PSIPRED [23, 24] was used to predict secondary structures. PSIPRED uses neural networking and searches for homologous proteins with known structures to determine the most likely structure at each residue position. Predictions of disorder regions at predicted ubiquitinated sites were made using the Metaserver of Disorder (MeDor) . MeDor collects disorder and secondary structure predictions from servers available on the web and generates a graphical output. The web-based database SMART  was used to predict signaling domains within the protein sequences identified as containing strong motif patterns. The SABLE server was used to predict from sequence secondary structures and solvent accessibilities, with the goal of identifying potential characteristics of predicted Ub sites in terms of structural profiles .
The 155 proteins in the experimental dataset ranged in length from 103 to 4,572 amino acids with a median length of 565 amino acids (mean = 755.9 ± 609.5). A total of 7,592 Lysine residues (range = 1 – 260 per sequence) were found within the 155 sequences. This produced a median count of 33 Lysines per protein with one Lysine residues occurring on average every 19.7 residues (median = 17). The 54 protein negative dataset contained fewer total Lysines (2,220) but median values were similar between the two datasets. The negative dataset proteins ranged from 147 to 2,115 amino acids in length (median = 584; mean = 738.0 ± 496.7). Lysines ranged from 4 to 151 per protein with a median count of 29 and an average of one Lysine per 18.7 residues. Of the 7,592 total Lysines in the experimental dataset, 7,494 (98.7%) were located at positions within the sequence that could allow for a full 10 amino acid motif. In the negative dataset, 2,209 of 2,220 total Lysines (99.5%) were similarly available.
Because low accuracy hits dominated all statistical comparisons and potentially masked meaningful differences, we next evaluated a subset of the distributions corresponding to match values > 4 (Figure 2B and insert). Our assumption was that matches to the putative motif at 10, 9 or 8 positions might be biologically meaningful while those below these values were likely spurious. Proportions of 5 match hits were consistent between the two datasets (experimental = 86.0; negative = 88.6), however, closer examination of the 6 and 7 hit regions of the distributions revealed a strong dichotomy. Perfect matches to all 7 variable residues of the putative motif (Figure 2B) were only found among the experimental proteins. One protein, TrkC, contained two perfect match sequences. No perfect matches were identified within the negative dataset. Twenty-six matches at 6 variable sites were reported from the experimental dataset and 8 from the negative. Because differences between the dataset results were primarily at the 6 (near perfect) and 7 (perfect) matches and little consistency could be identified in the sequence patterns among the level 5 motif matches, all further analyses concentrated on 6 or 7 match results only.
Using the rate of occurrence of perfect matches (7 variable amino acids) in the randomized dataset as a crude estimator, we made a first approximation of the probability of encountering 8 motifs within 155 proteins. Based an initial expectation of 1 match per 999 proteins, the joint probability of finding 8 perfect motifs at random within 155 proteins was estimated to be extremely small at ~ 4.0 x 10-7 . A similar pattern was found for the near-perfect matches (6 variable amino acids) in the experimental dataset where 26 matches were observed compared to 19.8 predicted. As mentioned, it is obvious that bias did exist within the experimental dataset relate to protein selection, however even if the probability estimates are inflated by 1 or more orders of magnitude the net conclusion remains; the total numbers of perfect and near-perfect motif matches observed were well beyond random expectations.
Collectively, the results provided by comparisons of the experimental, negative and randomized datasets provided evidence in support of our hypothesis that a Lysine-centered ubiquitination motif might exist among TRAF6 / p62 interactors. The overall similarity between the negative and experimental match distributions argues that the selection of potential interacting proteins for the experimental group did not unduly bias the overall distribution pattern. Conversely, the presence of probabilistically unlikely large numbers of perfect and near perfect motif matches within the related experimental group proteins suggested that the motif is by itself or in association with an amino acid sequence of conserved biological importance.
Lysines identified as TRAF6 ubiquitination sites and their structural characteristics.
470 - 490
810 – 820
813 – 817
827 – 834
827 – 833
13 - 40
162 – 171
GO ontology analysis of sites in proteins with perfect match to the hypothesized motif for TRAF6/p62 ubiquitination.
GO: Term for function
small GTPase mediated signal transduction, transmembrane receptor protein tyrosine kinase signaling pathway, nervous system development
Plasma membrane, cytosol, endosome
transmembrane receptor protein tyrosine kinase signaling pathway, regulation of dendrite development
Plasma membrane, cytosol, endosome
transmembrane receptor protein tyrosine kinase signaling pathway, nervous system development
Plasma membrane, cytosol, endosome
nervous system development, transmembrane receptor protein tyrosine kinase signaling pathway, activation of adenylate cyclase activity
neurotrophin binding protein binding
Integral to plasma membrane, cytoplasm
Integral to plasma
development, transmembrane receptor protein tyrosine kinase signaling pathway, activation of adenylate cyclase activity
regulation of transcription
synaptic transmission, central nervous system development, central nervous system development
structural constituent of myelin sheath
The highest possible resolution for investigating structure–function relationships is that of individual residues and their corresponding microenvironments . To provide information on this aspect of hypothetical high-probability sites, the distribution of predicted Lysines residues with regards to their relative position within the protein sequence was evaluated (Figure 5C). Nearly half (48%) of the core motif Lysines were located near the C-termini of the proteins in the experimental dataset as compared to only 28% in the negative dataset. The remaining predicted sites were evenly distributed (25.8%) at the C-terminus or middle region of the proteins in the experimental dataset. On the contrary, within the negative dataset most (42%) target Lysines were located in the middle region of the protein (Figure 5C). This could be either because of false positive prediction of the sites or due to true positive (valid) sites that are buried inside the protein and become exposed when these proteins undergo conformational changes induced by other post-translational modifications or protein-protein interactions. This finding was consistent with studies of the TRAF6 substrate, IRF7 that is ubiquitinated at multiple sites both in vitro and in vivo with the three C-terminal Lysines (positions 444, 446, and 452) essential for activation of IRF7 [31, 32]. Similar studies on SUMOylation sites of LEDGF/p75 have shown that K75, K250, and K254 mapped on the N-terminal region located in evolutionarily conserved charge-rich regions, while C-terminal K364 was identified as solvent exposed . There were 98 Lysines in the N-terminal regions of the proteins in the experimental dataset that were not recognized by the program as they lacked the required 8 amino acids towards the N-terminal end to fit the 10 amino acid long motif. Out of these 98 Lysines, there were five instances of di-Lysines and four tri-Lysines with one occurrence of a poly-Lysine chain of 9 Lysines. The negative dataset on the other hand had 11 N-terminal Lysines, with only one occurrence of a di-Lysine. No specific amino acid distribution pattern was observed surrounding the N-terminal Lysines. The downstream Lysines in the di-Lysine sequences have been reported to be preferentially ubiquitinated .
Recent studies of all post-translationally modified proteins documented in Swiss-Prot have shown that most reversible modifications are found on protein surfaces . Ubiquitinated Lysines are surface exposed but this information is hidden in the primary sequence of the protein which can be detected by a surface accessibility predictor. To examine this possibility for the data, solvent accessibility of the high probability core Lysines for modification was examined. Solvent accessibility of an individual residue is often classified as “buried" or "exposed" using geometric analysis (geometric similarity in the arrangement of the water molecules around proteins)  or predictive methods. Prediction of solvent accessibilities revealed 84% of the highly positive sites in the experimental dataset and 100% of the negative dataset were exposed on the surface of the protein, which in a cellular environment, would be easily accessible to the active TRAF6/p62 complex (Figure 5D). It has been reported that surface accessibility of post-translational modifications is important for protein-protein interactivity .
Localization data of the high probability substrates revealed that relatively few cytosolic proteins were predicted to be TRAF6/p62 substrates. However, when the nuclear proteins in the experimental dataset were examined, the proportion predicted to function as TRAF6/p62 substrates (29%) was slightly higher than the total percentage of nuclear proteins (25%) found in the dataset (Figure 6). A substantial increase in prediction percentages was, however, observed for proteins that were integral to membranes in both the experimental and negative datasets. This finding suggests that either our motif was biased in favor of a pattern that occurs more commonly within membrane bound proteins, or potential that a meaningful cellular association was elucidated. Since the consensus motif was based on plasma membrane bound TrkA and nuclear protein NRIF [3, 4], we hypothesize that a combination of both conclusions may be correct. Further refinement of the motif as more substrates are experimentally verified from various subcellular localizations will be needed to clarify this observation.
We sought to further validate the biological relevance of our hypothetical ubiquitination motif by examining it in an evolutionary context. For this analysis a high-confidence set of TRAF6/P62 substrates, those with perfect matches to our putative motif, were selected for alignment. These proteins were TrkA, TrkB, TrkC, NRIF, NTRK2, NTRK3 and MBP (Table 1). To check for potential evidence of evolutionary pressure to conserve the site-specific ubiquitinated Lysines, conservation of our predicted sites in these eight proteins was examined across multiple species . Our results indicated that a predicted ubiquitination sites were conserved from among six mammalian species (data not shown). This unusually high conservation suggests that the ubiquitination of these sites may potentially be conserved across a wide spectrum of life forms, although this remains untested. A high degree of conservation among proteins that are ubiquitinated also suggests that they may have arisen early in the course of evolution. However, a significant number of ubiquitination sites differ in the ubiquitome and the extent of homology is not uniform because of the high diversity among the proteins. Nevertheless, evidence of conservation does suggest that ubiquitination is in each case indispensable for protein function, which is in turn essential for regulating cellular function. These highly conserved essential ubiquitination events may reflect how early forms of life used protein ubiquitination in specific housekeeping cellular functions. Interestingly, our results indicated that although the surrounding sequence regions may diverge, the critical residues remain conserved. Similar whole genome-scale studies have shown that 2,683 potential SUMO substrates are conserved between human and mouse based on pattern recognition and phylogenetic conservation . In another study, linear pattern recognition in combination with phylogenetic conservation was first used to discover transcription factor binding sites . This report was similar to recent studies on phosphorylation sites which demonstrated similar conservation within protein families , thus pointing at generic regulatory mechanisms which may be conserved across species. The presence of evolutionary conservation is mechanistically important because the short lengths and rarity within a complex proteome make linear motifs difficult to find computationally .
Most proteins in cells undergo post-translational modifications giving them structural and functional diversity for diverse roles in biological processes. Experimental identification and validation of posttranslational modifications (PTMs) is labor-intensive task and can be expensive in the absence of prior knowledge concerning PTMs. Analyzing the ‘ubiquitome’ is one of the most exciting and challenging tasks in current proteomics research. A lack of curated datasets of ubiquitinated proteins presents the ultimate limiting factor in studying substrate selection mechanism in ubiquitination making it difficult to evaluate, and compare target sites. As more and more ligases are identified there exists an urgent need to rapidly and precisely identify enzyme-specific substrates to decode their selectivity and specificity . Computational prediction of PTM sites has provided researchers with information on the high probability PTM sites for further experimental characterizations like PHOSIDA and NetPhos for phosphorylation [41, 42], SUMOsp for SUMOylation  and NetAcet for prediction of N-acetyltransferase A substrates .
In this study, a computational tool was developed to predict Lysine ubiquitination sites from sequences using MATLAB programs and online prediction software. As more validated ubiquitinated sites from experimental data become available, and appropriate changes are made based on the available site data, further predictions can be made. The inclusion of structural information to improve the prediction tools could be another way to enhance the prediction performance as ubiquitination is an enzymatic process, and the interactions between target sites and enzymes concerned should be structurally satisfied. The model that we propose herein could be applied to other E3 Ub ligases that are known to employ scaffold proteins to aid in their substrate selection process. One such example is DYRK2–EDVP E3 ligase complex where DYRK2 not only is it serves as adaptor for assembly of the active Ub ligase complex, but it also phosphorylates its substrate and primes the substrate for degradation . Thus, use of bioinformatics methods to predict site modification in silico could yield more defined results. These prediction tools should be closely integrated into the interpretation of proteomic experiments.
Here we identified the interactome of the active enzyme complex and studied the verified substrates for characterization of target sites to predict substrates. As proteomics methods identify additional in vivo ubiquitination sites, prediction algorithms can be fine tuned and improved. A conserved motif that serves as a recognition determinant for TRAF6/p62 enzyme complex was identified. Studies have shown some structural preferences for ubiquitination of targeted proteins such as preferred choice of Lysines in loops and/or for easily accessible Lysines within α–helical regions . Our findings indicate a bias towards a specific consensus sequence motif for ubiquitination by TRAF6/p62. Moreover, it appears that the active complex targets an accessible surface residue providing the selection process with a conformational recognition mechanism. We propose that the scaffold, p62, plays an important role in recruiting substrates for TRAF6, thereby facilitating interaction with an accessible Lysine residue in a loop or helical structures on the surface of the substrate resulting in K63-polyubiquitination at a specific Lysine, if the flanking residues fit the consensus motif. In this regard, it is of interest to point out that the scaffold TAB2 and TAB3, although ubiquitinated by TRAF6 (7), did not possess the motif identified by our search algorithm. This finding supports the notion that specific interaction of the ligase with a given scaffold may facilitate substrate ubiquitination in response to a given external stimulus.
A total of 42 high probability TRAF6/p62 ubiquitination sites in 30 proteins were identified by our prediction approach. Structural analysis of these predicted TRAF6/P62 substrates showed that the predicted ubiquitination sites were biased towards the C-terminal domain of the protein [31, 32]. Secondary structure analysis of the predicted sites revealed overall preference for loops and helices than beta-strands and solvent accessibility analysis of predicted Lysines revealed most of the predicted sites were exposed on the surface of the protein rather than being buried. There was high structural and phylogenetic conservation of predicted sites. Tertiary structure analyses of investigated proteins revealed that most of the predicted sites are likely to be exposed on the surface of the protein rather than being buried. Although linear conservation of individual amino acids within the consensus motif at the predicted ubiquitinated sites is low, there is a high structural and evolutionary conservation of predicted sites across mammalian species. The high accessibility of ubiquitination sites suggests that they are localized in loops and helices, since these structural elements are usually found at the protein surface. It is well known that the loop regions frequently participate in forming binding sites and active sites of enzymes making them excellent substrates for regulation . Beta sheets can be internal to a protein (largely hydrophobic) or on the surface in which case they are amphipathic, with every other amino acid side chain alternating between hydrophobic and hydrophilic nature. Because post-translational modification sites are predominantly located in rapidly evolving loop regions , relaxed evolutionary constraints on loops allow them to evolve rapidly and rather independently from the protein core . Formally, disordered regions are defined as regions within proteins that lack a precise 3D structure and consist of an ensemble of fluctuating, interconverting conformers. These regions have been known to be associated frequently with posttranslational modifications . Disorder prediction of linear motifs and their flanking regions for the experimentally characterized examples from the Eukaryotic Linear Motif (ELM) database revealed that short recognitions motifs are embedded in locally unstructured regions . Thus, structurally and evolutionarily, our high-confidence set of TRAF6/62 substrates and highly positive motif sites represent a reasonable site for post-translational modification by ubiquitin.
Findings from this study indicate that the ubiquitination site prediction is closely correlated with the amino acid property around the ubiquitination site. Our approach makes it possible to find putative novel ubiquitination sites that have not (yet) been experimentally identified. Thus, in the absence of experimental data, the prediction of novel ubiquitination sites can be taken as the first method of an experimental design for uncovering functionality of any protein of interest and elucidating its involvement in certain signaling cascades. Methods for computational prediction of peptide specificities and identification of substrates could be enhanced by combining different approaches and integrating various types of information. In addition, the prediction approach taken here combined with experimental verification will propel our understanding of ubiquitination mechanisms. Thus, a combination of both a computational and experimental approach could propel our understanding of ubiquitination dynamics into a new phase.
In conclusion, NRIF (K19), TrkA (K485), TrkB (K811), TrkC (K602 and K815), NTRK2 (K828), NTRK3 (K829) and MBP (K169) were found to possess a perfect match for the amino acid consensus motif for TRAF6/p62 ubiquitination. The ability of a given Lysine residue to serve as a bona fide in vivo ubiquitination site has been verified by mutagenesis along with verification of K63-ubiquitin linked chains for sites observed in TrkA, B, C, and NRIF [3, 4]. NTRK2, NTRK3 and MBP are putative TRAF6 substrates which have not been previously reported to interact with p62 and thus define new signaling networks for integration of this E3/scaffold. Altogether these findings suggest that the approach we describe could be applied to other E3 ligases for prediction of their substrates taking into account the specificity provided by the scaffold which aid in the formation of the signalsome in response to external stimuli.
Atypical PKC-interaction domain
Atypical protein kinase C
Eukaryotic linear motifs
Myelin basic protein
NF-kappa-B essential modulator
Neurotrophin receptor interacting factor
Neurotrophic tyrosine receptor kinase 2
Neurotrophic tyrosine receptor kinase 3
Phox and Bem1
Proline Glutamate Serine Threonine
Post translational modification
Really Interesting New Gene
Small ubiquitin-related modifier
TGF-beta activated kinase 1/MAP3K7 binding protein 2
TGF-beta activated kinase 1/MAP3K7 binding protein 3
Tumor necrosis factor receptor associated factor 6
Tropomyosin receptor kinase A
Tropomyosin receptor kinase B
Tropomyosin receptor kinase C
Zeta protein kinase C interacting protein
ZZ-type Zinc finger domain
This study was supported in part by NIH –NINDS 33661 (MWW). We thank S. Santos, J. Wower and N. Singh for reading draft version of this manuscript.
This article has been published as part of BMC Proceedings Volume 5 Supplement 2, 2011: Proceedings of the 6th International Symposium on Bioinformatics Research and Applications (ISBRA'10). The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/5?issue=S2.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.