General concept and views
To present the available information at different levels of abstraction and to support interactive, synchronized exploration (Figure 1), we selected the visualizations described below.
First, we use the standard representations of the three-dimensional (3D) structure and sequence of proteins as provided by UCSF Chimera [12, 13] because sequence changes and their impact on the structure might give valuable insight. UCSF Chimera offers a variety of tools that support the interactive crosstalk between sequences and structures, affording advanced exploration of multiple sequence alignments, comparison of structures and incorporation of user-specific data. In particular, the user can study the amino acid changes between two sequences and their locations on the corresponding protein structures. It is also possible to construct a structure-based sequence alignment from the superposition of two structures. This deep integration of sequences and structures is further complemented by a multitude of molecular graphics features.
Second, we apply the RINerator tool [14] to create a two-dimensional (2D) residue interaction network (RIN) from the protein structure and visualize the RIN with RINalyzer [14] within the Cytoscape platform [15]. Such a network representation is well suited to demonstrating the impact of mutations at the level of individual residue interactions because it highlights changes in local interactions as well as in long-range interaction paths, e.g. indirect interactions between residues.
Third, we offer less complex, aggregated overviews that focus on functional or structural subunits like secondary structure elements and illustrate the location and distribution of the mutations on the protein structure. In particular, we utilize the cartoon view as provided by the Pro-origami web service [16]. The main advantage of this view is that it gives a clear depiction of the chain and the secondary structure elements, while it leaves out the exact spatial location and the interrelations between those elements, which are provided by the other more detailed views. As the visual mapping from a RIN to the corresponding cartoon might be difficult for the user, a network representation that shows the RIN together with aggregated secondary structure elements can be created as an intermediate visualization.
Fourth, we extract additional structural and functional information from external databases and map these data as visual cues onto the visualizations. Functional residue annotations such as protein domain localization as well as binding and catalytic sites are important for identifying mutations that could have a direct impact on the function of the protein because they are in or near such sites. Structural properties of residues such as hydrophobicity, solvent accessible surface area, and polarity are used to characterize their potential effect on protein structure and function. Finally, evolutionary conservation information is crucial for distinguishing between residue changes in conserved regions (less tolerant of sequence changes) and those in variable regions.
Finally, the linkage between the different views is maintained by several mechanisms. For interactive exploration, we propagate the selection of elements in one view to the others. We synchronize orientation and location between RINs and structures using a special layout algorithm that we developed for this purpose. In addition, we ensure a consistent use of information mapping and similar cues across all views. All of the above is accomplished by adapting and extending our plugins RINalyzer [14] and structureViz [17] to integrate the freely available software tools Cytoscape, UCSF Chimera, and Pro-origami into a prototype implementation (Figure 2). Download links and further documentation can be found at the RINalyzer webpage [11].
RIN view and layout
The residue interaction networks (RINs) are generated by RINerator from a 3D protein structure as described previously and shown as a standard network visualization within Cytoscape using RINalyzer [14, 18]. In this visualization, network nodes represent amino acid residues and edges depict non-covalent residue interactions. To transfer the spatial localization information of the mutations from the structure view to the network view, we replaced the previous force-directed layout algorithm with a more appropriate stress minimization variant (Figures 1 and 3).
The new layout method is distance-based, i.e., it allows target distances between residues to be specified. During the layout computation, it minimizes the weighted mean square error between the given distances for pairs of residues and their geometric distances in the layout, with an emphasis on local accuracy. The layout is initialized using a projection of the 3D coordinates onto a 2D plane based on the UCSF Chimera view perspective. To allow for a flexible representation of the residue network and, at the same time, to preserve the user's spatial orientation using the fixed projection coordinates, we compute the stress as a balanced combination of the distance error and the deviation from the projection coordinates, and increase the priority for the latter over the course of the optimization. To emphasize the secondary structure, the distance error weights are larger for distances between residues within the same secondary structure element. Alternatively, the layout method can prioritize certain distances based on user-defined edge weights that represent additional structural or functional information.
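To make the idea of an anchored stress layout more concrete, the following Python sketch minimizes a weighted stress term plus a penalty that pulls each residue towards its projected start position. It is an illustration under stated assumptions and not the RINalyzer implementation: plain gradient descent, the 1/d^2 weighting used for local accuracy, the boost factor for pairs within the same secondary structure element, the linear schedule of the anchor priority, and all function and parameter names are hypothetical choices.

```python
import numpy as np


def build_weights(target, sse_ids, boost=3.0):
    """Error weights (assumed form): 1/d^2 favours short, local distances and
    pairs in the same secondary structure element get an extra factor."""
    target = np.asarray(target, dtype=float)
    safe = np.where(target > 0, target, 1.0)      # avoid division by zero
    weights = 1.0 / safe**2
    ids = np.asarray(sse_ids)
    same_sse = ids[:, None] == ids[None, :]
    weights = np.where(same_sse, boost * weights, weights)
    np.fill_diagonal(weights, 0.0)                # no self-distances
    return weights


def stress(pos, target, weights):
    """Weighted squared error between layout distances and target distances."""
    dist = np.linalg.norm(pos[:, None] - pos[None, :], axis=2)
    return float((np.triu(weights, 1) * (dist - target) ** 2).sum())


def stress_layout(target, weights, anchors, steps=300, lr=0.01,
                  anchor_start=0.1, anchor_end=1.0):
    """Gradient descent on stress + alpha * squared distance to the anchors.
    alpha grows linearly, so the projected positions gain priority over time."""
    anchors = np.asarray(anchors, dtype=float)
    pos = anchors.copy()                          # start from the projection
    for step in range(steps):
        alpha = anchor_start + (anchor_end - anchor_start) * step / max(steps - 1, 1)
        diff = pos[:, None, :] - pos[None, :, :]
        dist = np.maximum(np.linalg.norm(diff, axis=2), 1e-9)
        coeff = weights * (dist - target) / dist
        grad_stress = 2.0 * (coeff[:, :, None] * diff).sum(axis=1)
        grad_anchor = 2.0 * alpha * (pos - anchors)
        pos -= lr * (grad_stress + grad_anchor)
    return pos


# Toy example: target distances of a unit square, one badly placed anchor.
square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
target = np.linalg.norm(square[:, None] - square[None, :], axis=2)
weights = build_weights(target, sse_ids=[0, 0, 1, 1])
anchors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
layout = stress_layout(target, weights, anchors)
```

In this toy setting, the fourth residue moves towards the other three to reduce the distance error, while the anchor term keeps the overall arrangement close to the projected positions. User-defined edge weights could be passed in place of the output of build_weights.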
Aggregated views
The aggregated views are intended to give the user a quick overview of the mutation locations with respect to specific known structural or functional regions. While it would be possible to map additional information directly onto the network representation, the RIN might become too complex for the user. Thus, we utilize views that aggregate regions based on secondary structures, protein domain information, or functional annotations. These views serve as an intermediate visualization when switching between the 3D structure view and the 2D RIN view.
The simple cartoon view provided by the Pro-origami web service reduces the complex 3D protein structure to the essential secondary and super-secondary structure information and presents it with an easily readable layout (Figure 1). Pro-origami provides SVG images, which are enriched with further information in the form of highlighted regions of interest such as the localization of mutated residues. As Pro-origami can decompose proteins into domains, we can also obtain a combined representation of secondary structure and protein domains within the cartoon view.
Comparison view
The representation of protein structures as RINs enables network comparison and alignment to explore the differences between parent and mutant structures further. Besides the comparison of two networks or structures side-by-side, we provide a comparison network view based on the alignment of the underlying sequences (Figure 4). In this view, each node represents a pair of aligned residues and two nodes are connected if the corresponding residues have a non-covalent interaction in either of the two compared RINs. We use visual cues to highlight interactions that were gained or lost upon amino acid change, and we score the fraction of such interactions for each residue to quantify the mutational effect on protein structure and function.
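The construction of such a comparison network can be sketched in a few lines of Python. The sketch below is only an illustration: the data structures, the residue identifiers, and the exact definition of the per-residue score (changed interactions divided by all interactions of the aligned pair) are assumptions and may differ from what RINalyzer computes.

```python
def comparison_network(alignment, parent_edges, mutant_edges):
    """alignment: list of (parent_residue, mutant_residue); None marks a gap.
    parent_edges / mutant_edges: pairs of residue IDs that interact in each RIN.
    Returns (status, scores): status maps a node pair to 'conserved', 'lost'
    or 'gained'; scores[i] is the fraction of changed interactions of node i."""
    parent_col = {p: i for i, (p, _) in enumerate(alignment) if p is not None}
    mutant_col = {m: i for i, (_, m) in enumerate(alignment) if m is not None}

    def to_columns(edges, col):
        # Translate residue pairs into pairs of alignment columns (nodes).
        return {frozenset((col[a], col[b]))
                for a, b in edges if a in col and b in col}

    in_parent = to_columns(parent_edges, parent_col)
    in_mutant = to_columns(mutant_edges, mutant_col)

    status = {}
    for edge in in_parent | in_mutant:            # interaction in either RIN
        if edge in in_parent and edge in in_mutant:
            status[edge] = "conserved"
        elif edge in in_parent:
            status[edge] = "lost"
        else:
            status[edge] = "gained"

    total = [0] * len(alignment)
    changed = [0] * len(alignment)
    for edge, state in status.items():
        for node in edge:
            total[node] += 1
            changed[node] += state != "conserved"
    scores = [c / t if t else 0.0 for c, t in zip(changed, total)]
    return status, scores


# Hypothetical four-residue example with one substitution (L11 -> P11).
alignment = [("A10", "A10"), ("L11", "P11"), ("G12", "G12"), ("K13", "K13")]
parent_edges = [("A10", "L11"), ("L11", "G12"), ("G12", "K13")]
mutant_edges = [("A10", "P11"), ("G12", "K13"), ("P11", "K13")]
status, scores = comparison_network(alignment, parent_edges, mutant_edges)
```

Here the substituted position loses its interaction with G12 and gains one with K13, so two of its three interactions are changed.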
Furthermore, to distinguish between more and less likely mutations, we integrated the amino acid substitution scores from the BLOSUM62 matrix [19] in RINalyzer and assigned a score to each mutated residue in the comparison network. Each score can be used to highlight sequence changes with a stronger impact on the protein.
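As a minimal illustration of such a scoring step, the Python sketch below looks up substitution scores for the mutated positions of an alignment. Only a handful of BLOSUM62 entries are included, so a real analysis would load the full matrix; the function name, the input format, and the use of negative scores as a cut-off for unlikely substitutions are assumptions made for this example.

```python
# A few BLOSUM62 entries for illustration only (symmetric, so keys are sets).
BLOSUM62_EXCERPT = {
    frozenset("A"): 4,     # A -> A (identity)
    frozenset("DE"): 2,    # D <-> E
    frozenset("IV"): 3,    # I <-> V
    frozenset("KR"): 2,    # K <-> R
    frozenset("DL"): -4,   # D <-> L
}


def substitution_scores(aligned_pairs, matrix=BLOSUM62_EXCERPT):
    """Return {alignment column: score} for positions whose residues differ.
    Pairs missing from the matrix (e.g. gaps) raise a KeyError in this sketch."""
    scores = {}
    for column, (parent, mutant) in enumerate(aligned_pairs):
        if parent != mutant:
            scores[column] = matrix[frozenset((parent, mutant))]
    return scores


# Hypothetical alignment columns as (parent, mutant) one-letter codes.
pairs = [("A", "A"), ("D", "E"), ("L", "D"), ("I", "V")]
scores = substitution_scores(pairs)
# Assumed cut-off: negative scores mark substitutions that are rarely observed.
unlikely = [column for column, score in scores.items() if score < 0]
```

In this example, only the exchange of leucine for aspartate receives a negative score and would be flagged.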
Data enrichment
An important component of our visual analytics approach is the mapping of available knowledge onto the visualized sequences and structures. Having this information easily accessible while the user works with the different views should considerably facilitate biological knowledge discovery. This is accomplished by importing the relevant data as node attributes in Cytoscape, which automatically associates them with the RIN and the protein structure. An additional benefit of this integration is that it enables the use of the built-in Cytoscape functionality to create filters based on the imported data and to highlight the residue nodes with attribute values within a given range, e.g. with high or low conservation scores (see Figure 5).
Therefore, in addition to the data given in the contest, we generated or retrieved data from multiple external sources to enrich our visualizations. The following information is regarded as potentially useful for protein analysis:
- Family conservation. ConSurf-DB [20] provides pre-computed profiles of evolutionary sequence conservation.
- Residue interactions. The RINerator package creates a network of non-covalent residue interactions such as contacts and hydrogen bonds for any 3D protein structure.
- Functional sites. Active and binding site information is retrieved manually from UniProtKB [21].
- Domain annotation. Protein domain information is obtained from the SCOP [22] online resource.
- Structural properties. Data for the solvent accessible surface area, secondary structure, hydrophobicity, and other structural properties is retrieved automatically from UCSF Chimera.
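The import-and-filter step described in this section can be pictured with the following Python sketch, which merges per-residue tables from different sources into node attributes and then selects the residues whose attribute value lies within a given range. It does not use the Cytoscape API; the residue identifiers, attribute names, and values are invented for the example, and the conservation grades are assumed to follow the ConSurf scale from 1 (variable) to 9 (conserved).

```python
def merge_attributes(*named_tables):
    """Combine (attribute_name, {residue: value}) tables into one dictionary
    of node attributes keyed by residue identifier."""
    nodes = {}
    for name, table in named_tables:
        for residue, value in table.items():
            nodes.setdefault(residue, {})[name] = value
    return nodes


def select_in_range(nodes, attribute, low, high):
    """Return the residues whose attribute value lies in [low, high]."""
    return sorted(residue for residue, attrs in nodes.items()
                  if attribute in attrs and low <= attrs[attribute] <= high)


# Hypothetical input data for three residues of chain A.
conservation = {"A:10": 9, "A:11": 3, "A:12": 8}
surface_area = {"A:10": 12.5, "A:11": 80.1}
nodes = merge_attributes(("conservation", conservation),
                         ("sasa", surface_area))
highly_conserved = select_in_range(nodes, "conservation", 8, 9)
```

Residues without a value for the chosen attribute are skipped rather than treated as zero, so missing annotations do not appear in the selection.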
Visual cues
The data used to enrich our visualizations is mapped as visual cues like color, shape, or line stroke in the network view and transferred to the other views where possible. Furthermore, the differences caused by the mutations can be highlighted by such cues in all visualizations.
We decided to control most visual properties via user-adjustable options with reasonable defaults. For example, different node shapes are used to distinguish the mutated residues in both the parent and the defective protein (Figure 3). Additionally, several visual styles are offered that map different functional and structural information onto the views so that the user sees the distribution of the corresponding values over the whole protein. Dark colors usually correspond to significant values such as strong hydrophobicity, a large solvent accessible surface area, or a high number of changed residue interactions (Figure 4). For evolutionary conservation, the pink-to-turquoise coloring as applied by ConSurf-DB is used (Figure 5).
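A continuous mapping of this kind, in which larger values are drawn in darker colors, can be sketched as a linear interpolation between two colors. The Python function below is a generic illustration; the two end colors, the clamping of out-of-range values, and the function name are assumptions and do not reproduce the visual styles shipped with the plugins.

```python
def value_to_hex(value, vmin, vmax, light=(255, 255, 255), dark=(0, 0, 128)):
    """Map a value in [vmin, vmax] to a hex color between light and dark.
    Values outside the range are clamped to the nearest end color."""
    if vmax == vmin:
        t = 0.0
    else:
        t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    rgb = [round(l + t * (d - l)) for l, d in zip(light, dark)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hypothetical solvent accessible surface areas mapped to node colors.
low_color = value_to_hex(0.0, 0.0, 150.0)     # smallest value, lightest color
high_color = value_to_hex(150.0, 0.0, 150.0)  # largest value, darkest color
```

Using the same function for the network, cartoon, and structure views would be one way to keep colors consistent across views.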
The visual cues are particularly useful for illustrating the changes in residue interactions due to the mutations in the comparison network view generated from the alignment of the respective sequences in UCSF Chimera. Residue interactions that are either lost or gained upon mutation are highlighted by differently colored and shaped lines (Figure 4). Residues that cannot be aligned are depicted by nodes with different node borders.
Linkage and coordination of views
To ease the user's cognitive load when switching between different views and tools, we link them in several ways. For interactive exploration, we implemented a global selection concept: selecting elements in one view immediately selects their corresponding representatives in all other views. Our linkage concept also ensures the consistent use of information mapping and similar cues across all views, particularly regarding the use of colors.
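One common way to realize such a global selection is a central hub that forwards each selection to all other registered views. The Python sketch below illustrates this pattern only; the class and method names are hypothetical, and the actual communication between Cytoscape, UCSF Chimera, and the cartoon view is not shown.

```python
class SelectionHub:
    """Forwards a selection made in one view to all other registered views."""

    def __init__(self):
        self._views = []

    def register(self, view):
        self._views.append(view)

    def broadcast(self, source, residues):
        for view in self._views:
            if view is not source:
                view.apply_selection(residues)


class View:
    """Stand-in for a structure, network, or cartoon view."""

    def __init__(self, name, hub):
        self.name = name
        self.selected = frozenset()
        self._hub = hub
        hub.register(self)

    def user_select(self, residues):
        # Selection triggered by the user inside this view.
        self.selected = frozenset(residues)
        self._hub.broadcast(self, self.selected)

    def apply_selection(self, residues):
        # Selection received from another view; not re-broadcast, which
        # avoids endless update loops between the views.
        self.selected = residues


hub = SelectionHub()
structure = View("structure", hub)
network = View("network", hub)
cartoon = View("cartoon", hub)
network.user_select({"A:10", "A:42"})
```

After the call in the last line, all three views hold the same set of selected residues.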
Further coordination is achieved by synchronizing the orientation and location of the graphical representations in the different views. For instance, the user can freely explore the 3D structure within the UCSF Chimera window, e.g. by rotating the protein structure. The network view can then be adjusted to the new orientation of the rotated structure by applying the 3D structure-based RIN layout described above.
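The step from a rotated 3D structure to 2D start positions for the network layout can be illustrated with a simple projection. The Python sketch below assumes an orthographic projection along the viewing axis and a 3x3 rotation matrix describing the current view; how this matrix and the residue coordinates are obtained from UCSF Chimera is not shown, and the function name is hypothetical.

```python
import numpy as np


def project_to_view(coords, rotation):
    """Rotate 3D residue coordinates into the viewing frame and drop the
    depth axis (orthographic projection, an assumption of this sketch)."""
    coords = np.asarray(coords, dtype=float)
    rotation = np.asarray(rotation, dtype=float)
    rotated = coords @ rotation.T
    return rotated[:, :2]


# Two hypothetical residue positions, viewed before and after a
# 90 degree rotation about the y axis.
coords = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
identity = np.eye(3)
rot_y_90 = [[0.0, 0.0, 1.0],
            [0.0, 1.0, 0.0],
            [-1.0, 0.0, 0.0]]
before = project_to_view(coords, identity)
after = project_to_view(coords, rot_y_90)
```

The projected coordinates would then serve as the start and anchor positions of the layout, so that the network follows the orientation chosen in the structure view.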
To implement the full linkage between Cytoscape and UCSF Chimera, we made use of their new software versions. We ported the plugins RINalyzer and structureViz to Cytoscape 3, which allowed us to link them closely. For example, while the direct communication between Cytoscape and UCSF Chimera is handled by structureViz, the structure-based layout algorithm is implemented in RINalyzer and invokes structureViz to retrieve the current spatial coordinates.