Characteristics of equipartition for RNA structure

Li, Hengwu; Zhu, Daming; Zhang, Caiming; Han, Huijian; Crandall, Keith A

doi:10.1186/1753-6561-8-S6-S3

Volume 8 Supplement 6

Proceedings of the Great Lakes Bioinformatics Conference 2014

Research
Open access
Published: 13 October 2014

Hengwu Li^1,2, Daming Zhu^2, Caiming Zhang^2, Huijian Han^1 & Keith A Crandall^3

BMC Proceedings volume 8, Article number: S3 (2014)

Abstract

Background

With the continuing discovery of novel RNA molecules with key cellular functions, and of novel pathways and interaction networks, the need for structural information about RNA keeps growing. To predict the structure of long RNAs and to understand their natural folding mechanism, it is important to explore the characteristics of RNA structure.

Methods

We selected the experimentally determined secondary structures of all 480 sequences in the RNA STRAND database that were validated by nuclear magnetic resonance (NMR) or X-ray. For a sequence with multiple domains, the ratios of the domain lengths to the sequence length are computed. For a sequence with one domain and multiple sub-domains, the ratios of the sub-domain lengths to the domain length are computed. The ratios are then compared and analyzed to find the partition characteristic of domains and sub-domains.

Results

For most RNAs, the length ratios of the domains to the sequence are close to equal, and those of the sub-domains to their domain are also nearly identical. Most RNAs with multiple domains have two domains, so the length ratio of each domain to the sequence is close to 0.5. For a sequence with one domain and either no sub-domain or one sub-domain, the centre of the domain or sub-domain is close to that of the sequence.

Conclusions

We report a novel finding, based on statistical analysis, that RNA folding accords with the characteristic of equipartition. This characteristic reflects the folding rules of RNA from a new angle, which may be closer to natural folding.

Background

RNAs are versatile molecules. To fully understand the various functions of RNAs, we first need to understand their structures [1]. Experimental determination of RNA tertiary structure is too expensive and time-consuming to meet practical needs, so computational prediction of RNA structure has become a basic method and a central problem in computational biology [2].

RNA folds as it is transcribed from DNA. For the purpose of predicting RNA structure, a case may be made that the natural folding process of RNA and the simulated folding of RNA using an evolutionary algorithm, which includes intermediate folds, have much in common [3]. Exploring the characteristics of RNA structure is therefore important for understanding its natural folding mechanism.

We compare the structures of a test set consisting of all 480 sequences from RNA STRAND [4] that were validated by NMR or X-ray, and report a novel finding, based on statistical analysis of these experimentally determined secondary structures, that RNA folding accords with the characteristic of equipartition.

Methods

Let the sequence $s = s_1 s_2 \dots s_n$ be a single-stranded RNA molecule, where each base $s_i \in \{A, U, C, G\}$, $1 \le i \le n$. The subsequence $s_{i,j} = s_i s_{i+1} \dots s_j$ is a segment of $s$, $1 \le i \le j \le n$.

If s_i and s_j are complementary bases (A&U, C&G, U&G), then s_i and s_j may constitute a base pair (i, j). A secondary structure S on s is a set of base pairs S = {(i, j)}, where $i, j \in \{1, 2, \dots, n\}$, that satisfies the following conditions.

(No sharp turns.) The ends of each pair in S are separated by at least three intervening bases; that is, if $(i, j) \in S$, then i < j-3.

For any pair (i, j) in S, the paired bases are complementary: $(s_i, s_j) \in \{(A, U), (C, G), (U, G), (U, A), (G, C), (G, U)\}$.

S is a matching: no base appears in more than one pair.

(The non-crossing condition.) If (i, j) and (k, l) are two pairs in S, then they are compatible, that is, they are juxtaposed (e.g. i < j < k < l) or nested (e.g. i < k < l < j).
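The four conditions can be checked mechanically. The following Python sketch is an illustration rather than code from the study; the function name `is_secondary_structure` and the representation of S as a list of 1-based (i, j) tuples are assumptions made here.

```python
from itertools import combinations

# Allowed pairings, including the G-U wobble pair, as listed in the conditions.
ALLOWED = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("U", "G"), ("G", "U")}


def is_secondary_structure(seq, pairs):
    """Return True if `pairs` (1-based (i, j) tuples) meets all four conditions on `seq`."""
    n = len(seq)
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= n):
            return False
        if not i < j - 3:                              # no sharp turns
            return False
        if (seq[i - 1], seq[j - 1]) not in ALLOWED:    # complementary bases
            return False
        if i in used or j in used:                     # S is a matching
            return False
        used.update((i, j))
    # Non-crossing: after sorting, i < k, so two pairs cross exactly when k < j < l.
    for (i, j), (k, l) in combinations(sorted(pairs), 2):
        if k < j < l:
            return False
    return True


print(is_secondary_structure("GGGAAAACCC", [(1, 10), (2, 9), (3, 8)]))
print(is_secondary_structure("GAAAGACAAAAC", [(1, 7), (5, 12)]))
```

The first call describes a simple hairpin and passes every check; the second contains two pairs with i < k < j < l and is rejected by the non-crossing test.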

If (i, j) and $(i + 1, j - 1) \in S$, the base pairs (i, j) and (i+1, j-1) constitute a stack (i, i+1: j-1, j), and m (≥ 1) consecutive stacks form the helix (i, i+m: j-m, j), which has length m+1.
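As an illustration of the helix notation, the sketch below groups a set of base pairs into helices written as (i, i+m, j-m, j). The function name `helices` and the choice to skip isolated pairs (m = 0, which form no stack) are assumptions of this example, not part of the original method.

```python
def helices(pairs):
    """Group 1-based base pairs into helices (i, i+m, j-m, j) with m >= 1 stacks."""
    pair_set = set(pairs)
    found = []
    for i, j in sorted(pair_set):
        if (i - 1, j + 1) in pair_set:
            continue  # (i, j) is not the outermost pair of its run
        m = 0
        # Extend inward while consecutive pairs keep stacking.
        while (i + m + 1, j - m - 1) in pair_set:
            m += 1
        if m >= 1:
            found.append((i, i + m, j - m, j))
    return found


print(helices([(1, 20), (2, 19), (5, 15), (6, 14), (7, 13)]))
# prints [(1, 2, 19, 20), (5, 7, 13, 15)]
```

The second helix has m = 2 stacks and therefore length m+1 = 3 base pairs.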

If base pairs (i, j) and (k, l) are incompatible, they form a pseudoknot (e.g. i < k < j < l). More complex pseudoknots may occur if three or more base pairs cross each other.

In the past, domains have been described as units of compact three-dimensional structure, folding, function and evolution [5]. A domain is a conserved part of a given sequence and structure that exists independently of the rest of the chain, and can often be independently stable and folded. The majority of domains have fewer than 200 residues, with an average of approximately 100 residues [6].

A domain D(i, j) consists of all base pairs (i', j') that satisfy i < i' < j' < j; that is, if (i', j') ∈ D(i, j), then i < i' < j' < j. Each base pair and each helix is placed uniquely in one domain [7].

A domain is closed by a helix or pseudoknot, as shown in Figure 1. A sub-domain is an independently stable part of a domain. If the closing helix or pseudoknot of a domain is deleted, its sub-domains become domains.

By convention, single strands of RNA sequences are written in the 5'-to-3' direction. RNA folds as it is transcribed from DNA. Transcription of the subsequence s_{i,j} begins at its 5'-end s_i and terminates at its 3'-end s_j, as shown in Figure 1. The helix (i, i+m: j-m, j) is completely folded only after s_j has been transcribed.

To understand the natural folding mechanism and pathway of RNA, we selected from RNA STRAND the experimentally determined structures of all 480 non-fragment, non-redundant sequences with secondary or pseudoknotted structures validated by NMR or X-ray, and analyzed their domains and sub-domains.

If the structure of an RNA has multiple domains, we compute R, the ratio of the 3'-end position of each domain to the length of the sequence. Let L be the length of the sequence. For the helix (i, i+m: j-m, j) and the domain D(i, j) that it closes, the ratio of the 3'-end to the length of the sequence s_{1,L} is the ratio of j to L, that is, R = j/L. We compute and analyze the values of R to find the partition characteristic of domains.
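The ratio R = j/L can be computed directly from a list of base pairs. The sketch below is a minimal illustration under two assumptions that are not stated in the text: the structure is free of pseudoknots, and each domain is identified by an outermost base pair, that is, a pair not enclosed by any other pair. The function name `domain_ratios` is hypothetical.

```python
def domain_ratios(pairs, length):
    """Return R = j / L for each top-level domain D(i, j), in 5'-to-3' order.

    Assumes `pairs` are 1-based, non-crossing (i, j) tuples.
    """
    ratios = []
    last_end = 0
    for i, j in sorted(pairs):
        if i > last_end:              # not enclosed by the previous domain
            ratios.append(j / length)
            last_end = j
    return ratios


# Two hairpins of equal size on a sequence of length 20.
print(domain_ratios([(1, 10), (2, 9), (11, 20), (12, 19)], 20))
# prints [0.5, 1.0]
```

In this toy example the first domain ends at 0.5L and the second at L, which is the two-domain equipartition pattern.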

If the structure of an RNA has only one domain, we compute SR, the ratio of the length of each sub-domain to the internal length of the domain. If the domain D(i, j) is closed by a helix (i, i+m: j-m, j), then its internal length is j-i-2m-1, and SR = (q-p+1)/(j-i-2m-1) for the sub-domain D(i+p, i+q) with j-m-i > q > p > m. We compute and analyze the values of SR to find the partition characteristic of sub-domains.
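The formula for SR can be written as a small function. This is a sketch of the definition above; the function name `sub_domain_ratio` and the numeric example (a domain closed by the helix (1, 3: 28, 30)) are made up for illustration.

```python
def sub_domain_ratio(i, j, m, p, q):
    """SR for the sub-domain D(i+p, i+q) of a domain closed by helix (i, i+m: j-m, j)."""
    if not (j - m - i > q > p > m):
        raise ValueError("sub-domain must lie strictly inside the closing helix")
    internal_length = j - i - 2 * m - 1   # bases i+m+1 .. j-m-1
    return (q - p + 1) / internal_length


# Closing helix (1, 3: 28, 30): i = 1, j = 30, m = 2, internal length 24.
print(sub_domain_ratio(1, 30, 2, 3, 14))    # sub-domain D(4, 15), prints 0.5
print(sub_domain_ratio(1, 30, 2, 15, 26))   # sub-domain D(16, 27), prints 0.5
```

Here the two sub-domains each cover 12 of the 24 internal bases, so both have SR = 0.5.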

Results and discussion

Characteristic of equipartition for synthetic RNA

We compare the structures of all 248 sequences of synthetic RNA. The results of the statistical analysis of these structures are shown in Figure 1, Figure 2, Table 1, Table 2 and Table 3.

Table 1 Distribution of domains for synthetic RNA with more than two domains

Table 2 Distribution of domains for synthetic RNA with two domains

Table 3 Distribution of multiple sub-domains for synthetic RNA with one domain

Table 1 shows the distribution of multi-domains for synthetic RNA with more than two domains.

There are 12 sequences with more than two domains (Table 1 and Figure 1). Sequences PDB_00195, PDB_00262, PDB_00754 and PDB_01250 have three domains. Their domains are 0-0.33L, 0.33L-0.66L and 0.66L-L (Table 1 and Figure 1A), which completely fits the characteristic of equipartition. Sequence PDB_01060 has three domains, 0.25L-0.5L, 0.5L-0.75L and 0.75L-L; if we instead divide the sequence into four parts, 0-0.25L, 0.25L-0.5L, 0.5L-0.75L and 0.75L-L, it also conforms to the characteristic of equipartition.

Sequences PDB_00175, PDB_00873, PDB_00447 and PDB_00340 have four domains. Their domains are 0-0.25L, 0.25L-0.5L, 0.5L-0.75L and 0.75L-L (Table 1 and Figure 1B), which completely fits the characteristic of equipartition.

Sequences PDB_01061 and PDB_00370 have five domains. Their domains are 0-0.2L, 0.2L-0.4L, 0.4L-0.6L, 0.6L-0.8L and 0.8L-L (Table 1 and Figure 1C), which completely fits the characteristic of equipartition.

Only sequence PDB_01249 has six domains, 0-0.17L, 0.17L-0.33L, 0.33L-0.5L, 0.5L-0.67L, 0.67L-0.83L and 0.83L-L, which completely fits the characteristic of equipartition (Table 1 and Figure 1D).
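One simple way to quantify how closely a set of domains follows equipartition is to compare each 3'-end ratio with the ideal boundary at the same position, 1/k, 2/k, ..., 1 for k domains. The measure below (the largest absolute deviation) and the name `max_deviation` are assumptions of this sketch; the text itself reports the ratios without a numerical criterion.

```python
def max_deviation(ends):
    """Largest gap between the observed 3'-end ratios and the ideal boundaries idx/k."""
    k = len(ends)
    return max(abs(r - (idx + 1) / k) for idx, r in enumerate(ends))


# Ratios reported above for the six-domain sequence PDB_01249.
six = [0.17, 0.33, 0.5, 0.67, 0.83, 1.0]
print(max_deviation(six) < 0.01)   # prints True
```

For the six-domain sequence every boundary lies within 0.01L of its ideal position, which is consistent with the rounding of the reported ratios to two decimals.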

Table 2 shows the distribution of domains for synthetic RNA with two domains. There are 48 sequences with two domains (Table 2 and Figure 2). The domains of 29 sequences are exactly 0-0.5L and 0.5L-L, and those of 13 sequences are close to 0-0.5L and 0.5L-L, which fits the characteristic of equipartition. The domains are formed by parallel helices or pseudoknots, as shown in Figure 1E and 1F. There are some exceptions: the domains of sequence PDB_00196 are 0-0.33L and 0.33L-L, those of sequence PDB_00868 are 0-0.39L and 0.39L-L, those of sequences PDB_00709 and PDB_00710 are 0-0.4L and 0.4L-L, those of sequence PDB_00971 are 0-0.57L and 0.57L-L, and those of sequence PDB_01138 are 0-0.58L and 0.58L-L. These can be regarded as combinations of several equal domains, since their boundaries are close to 0.33L, 0.4L and 0.6L. For example, we can regard sequence PDB_00196 as having three domains, 0-0.33L, 0.33L-0.66L and 0.66L-L, of which 0.33L-0.66L and 0.66L-L combine into the domain 0.33L-L.
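The interpretation of the exceptions as merged equal parts can be made concrete by searching for the smallest number of equal parts k whose boundaries (multiples of 1/k) lie close to every observed boundary. The sketch below is illustrative only: the function name `smallest_partition`, the tolerance of 0.03L and the upper limit of six parts are assumptions chosen for this example and do not come from the study.

```python
def smallest_partition(boundaries, max_parts=6, tol=0.03):
    """Return the smallest k in 2..max_parts such that every boundary ratio is
    within `tol` of a multiple of 1/k, or None if no such k exists."""
    for k in range(2, max_parts + 1):
        # |b*k - round(b*k)| / k is the distance from b to the nearest multiple of 1/k.
        if all(abs(b * k - round(b * k)) / k <= tol for b in boundaries):
            return k
    return None


print(smallest_partition([0.5]))    # prints 2
print(smallest_partition([0.33]))   # prints 3 (PDB_00196)
print(smallest_partition([0.4]))    # prints 5 (PDB_00709, PDB_00710)
print(smallest_partition([0.58]))   # prints 5 (PDB_01138)
```

The limit on k matters: for a large enough k any boundary is close to some multiple of 1/k, so the search is only informative when the number of parts is kept small.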

The remaining 188 sequences have one domain or one pseudoknot, and the centre of the domain is essentially the same as that of the sequence. In general, they fall into three classes according to their sub-domains: no sub-domain, in 157 sequences (Figure 1G); one sub-domain whose centre is close to that of the sequence (Figure 1H); and multiple, nearly equal sub-domains (Figure 1I).

For a sequence with one domain and no sub-domain, the centre of the domain is close to that of the sequence. There are 11 sequences with one pseudoknotted domain. Sequences PDB_01040, PDB_00020, PDB_01165 and PDB_01194 have one pseudoknot with three helices, and the helices (2,8:18,24) and (26,32:42,48) in PDB_01040, (1,5:15,21) and (22,27:35,40) in PDB_00020, (1,9:12,20) and (23,31:34,42) in PDB_01165, and (1,5:12,16) and (17,21:28,32) in PDB_01194 meet the characteristic of equipartition. PDB_00053, PDB_00209, PDB_00134, PDB_00842, PDB_01059 and PDB_00124 have one pseudoknot with two helices, and the centre of the pseudoknot is the same as that of the sequence. PDB_00759 has two pseudoknots with three helices, and the centre of the helix (3,4:13,14) is essentially the same as that of the sequence.
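The helix coordinates quoted above allow a quick arithmetic check. The sketch below computes the span j-i+1 covered by each helix (i, i+m: j-m, j); comparing spans is an illustration chosen here, and the name `helix_span` is hypothetical.

```python
def helix_span(helix):
    """Number of positions from the 5'-end i to the 3'-end j of a helix (i, i+m, j-m, j)."""
    i, _, _, j = helix
    return j - i + 1


# Helix pairs quoted in the text for four pseudoknotted sequences.
examples = {
    "PDB_01040": [(2, 8, 18, 24), (26, 32, 42, 48)],
    "PDB_00020": [(1, 5, 15, 21), (22, 27, 35, 40)],
    "PDB_01165": [(1, 9, 12, 20), (23, 31, 34, 42)],
    "PDB_01194": [(1, 5, 12, 16), (17, 21, 28, 32)],
}
for name, pair in examples.items():
    print(name, [helix_span(h) for h in pair])
# prints:
# PDB_01040 [23, 23]
# PDB_00020 [21, 19]
# PDB_01165 [20, 20]
# PDB_01194 [16, 16]
```

In three of the four sequences the two helices span exactly the same number of positions, and in PDB_00020 they differ by two.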

Table 3 shows the distribution of sub-domains for synthetic RNA with one domain and multiple sub-domains. There are 7 sequences with one domain and multiple sub-domains (Table 3). The sub-domains of six of these sequences conform to the characteristic of equipartition; only one sequence is an exception.

Table 4 Distribution of domains for tRNA with multiple domains

Table 5 Distribution of sub-domains for tRNA with one domain and multiple sub-domains

Characteristic of equipartition for tRNA

We compare the structures of all 46 sequences of tRNA. The results of the statistical analysis of these secondary structures are shown in Table 4 and Table 5.

Table 4 shows the distribution of domains for tRNA with multiple domains. There are 17 sequences with two domains. The domains of 8 sequences are exactly 0-0.5L and 0.5L-L, and those of 7 sequences are close to 0-0.5L and 0.5L-L, which fits the characteristic of equipartition. There are some exceptions: the domains of sequence PDB_00681 are 0-0.33L and 0.33L-L, and those of sequence PDB_01162 are 0-0.61L and 0.61L-L. These can also be regarded as combinations of equal domains, since their boundaries are close to 0.33L and 0.66L.

Three sequences have three domains. Their domains are close to 0-0.33L, 0.33L-0.66L and 0.66L-L, which fits the characteristic of equipartition.

Sequence PDB_00998 has four domains, 0-0.18L, 0.18L-0.34L, 0.34L-0.5L and 0.5L-L. They can be regarded as two groups, 0-0.5L and 0.5L-L, which fits the characteristic of equipartition. The group 0-0.5L is in turn divided into three domains, 0-0.18L, 0.18L-0.34L and 0.34L-0.5L, which also fits the characteristic.
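The two-level grouping described for PDB_00998 can be checked by re-expressing the boundaries inside the group 0-0.5L relative to that group. The helper `rescale` below is a hypothetical name introduced for this sketch.

```python
def rescale(ends, lo, hi):
    """Express the 3'-end ratios that fall in (lo, hi] relative to the group lo..hi."""
    return [(r - lo) / (hi - lo) for r in ends if lo < r <= hi]


# 3'-end ratios reported above for PDB_00998.
ends = [0.18, 0.34, 0.5, 1.0]
first_group = rescale(ends, 0.0, 0.5)
print([round(r, 2) for r in first_group])   # prints [0.36, 0.68, 1.0]
```

Within the first group the boundaries fall at about 0.36 and 0.68 of the group length, compared with the ideal values 0.33 and 0.67 for three equal parts.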

Sequence PDB_00398 has five domains, 0-0.2L, 0.2L-0.4L, 0.4L-0.6L, 0.6L-0.8L and 0.8L-L, which completely fits the characteristic of equipartition.

Sequence PDB_01000 has six domains, 0-0.18L, 0.18L-0.34L, 0.34L-0.5L, 0.5L-0.68L, 0.68L-0.83L and 0.83L-L, which also fits the characteristic of equipartition.

The remaining 23 sequences have one domain or one pseudoknot, and the centre of the domain is essentially the same as that of the sequence. There are 12 sequences with one domain and multiple sub-domains (Table 5), and their sub-domains are all close to 0-0.33L, 0.33L-0.66L and 0.66L-L, which conforms to the characteristic of equipartition.

Characteristic of equipartition for other RNA

We compare the structures of all 49 sequences of Other RNA, 6 sequences of Ham Ribozyme and 9 sequences of Viral & Phage.

The results of the statistical analysis of these secondary structures are shown in Table 6. For Other RNA, there are 7 sequences with two domains, and their domains are exactly 0-0.5L and 0.5L-L, which completely fits the characteristic of equipartition. Three sequences have three domains. The domains of PDB_00626 and PDB_00739 are exactly 0-0.33L, 0.33L-0.67L and 0.67L-L, which fits the characteristic of equipartition. Sequence PDB_01261 has three domains, which can be divided into two groups, 0-0.5L and 0.5L-L; the group 0.5L-L is then divided into 0.5L-0.75L and 0.75L-L. Sequences PDB_001261 and PDB_00985 have four domains. Their domains are close to 0-0.25L, 0.25L-0.5L, 0.5L-0.75L and 0.75L-L, which fits the characteristic of equipartition. Sequences PDB_01061 and PDB_00370 have five domains. Their domains are 0-0.33L, 0.33L-0.66L and 0.66L-L, which completely fits the characteristic of equipartition.

Table 6 Distribution of domains for Other RNA, Viral Phage and Ham Ribozyme with multiple domains

For Viral & Phage, only one sequence, PDB_00743, has two domains, 0-0.52L and 0.52L-L, which fits the characteristic of equipartition (Table 6). The remaining 8 sequences have only one domain and no sub-domain, and three of them consist of two pseudoknotted helices. These domains all fit the characteristic of equipartition.

For Ham Ribozyme, only one sequence, PDB_00157, has two domains, 0-0.5L and 0.5L-L, which completely fits the characteristic of equipartition (Table 6). The remaining 5 sequences have only one domain with two sub-domains, and their sub-domains also conform to the characteristic of equipartition.

Characteristic of equipartition for other ribozyme

We compare the structures of all 18 sequences of Other Ribozyme. The results of the statistical analysis of these secondary structures are shown in Table 7.

Table 7 Distribution of domains for Other Ribozyme

Sequence PDB_00805 has eight domains, which can be divided into four groups, 0-0.25L, 0.25L-0.5L, 0.5L-0.75L and 0.75L-L, and this conforms to the characteristic of equipartition. Sequence PDB_00176 has four domains, which meet the characteristic of equipartition. Sequence PDB_01187 has four domains, which can be divided into two groups, 0-0.5L and 0.5L-L, and this conforms to the characteristic of equipartition. There are 11 sequences with two domains. They fit the characteristic of equipartition, with three exceptions.

Four sequences have only one domain. PDB_00088 has one domain with no sub-domain and PDB_00142 has one domain with one sub-domain; both fit the characteristic of equipartition. Sequence PDB_00078 has one pseudoknotted domain with four sub-domains, 0-0.36L, 0.36L-0.48L, 0.48L-0.86L and 0.86L-L. They can be divided into two groups, 0-0.48L and 0.48L-L, which nearly fit the characteristic of equipartition. PDB_01185 has one pseudoknotted domain with three sub-domains, and the sub-domains 0-0.52L and 0.52L-L nearly fit the characteristic of equipartition.

16S rRNA and 32S rRNA conform to other characteristics in addition to the characteristic of equipartition; this is a matter for further discussion.

Conclusions

In this paper, we report a novel finding that RNA folding accords with the characteristic of equipartition, based on statistical analysis of the experimentally determined secondary structures of all 480 sequences from RNA STRAND that were validated by NMR or X-ray. For most RNA sequences with multiple domains, the lengths of the domains are close to equal. The lengths of the sub-domains of a single domain are also nearly identical. Most multi-domain sequences have two domains, so the length ratio of the first domain to the sequence is close to 0.5. The characteristic of equipartition reflects the folding rules of RNA from a new angle, which may be closer to natural folding. By applying this characteristic, algorithms can be designed to dynamically predict the structure of long RNAs, and the dynamic folding mechanism and the relation between function, mutation and RNA structure can be understood more deeply from a new point of view.

Declarations

This work was supported by NSFC under grants No. 61070019 and 61272431, the Shandong Province Natural Science Foundation of China under grants No. ZR2011FL029 and ZR2013FM016, the Open Project Program of the Shandong Provincial Key Lab of Software Engineering under grant No. 2011SE004, and the Program for Scientific Research Innovation Team in Colleges and Universities of Shandong Province.

References

1. Staple DW, Butcher SE: Pseudoknots: RNA structures with diverse functions. PLoS Biol. 2005, 3: e213.
2. Mathews DH, Turner DH: Prediction of RNA secondary structure by free energy minimization. Current Opinion in Structural Biology. 2006, 16: 270-278. 10.1016/j.sbi.2006.05.010.
3. Wiese KC, Hendriks A: Comparison of P-RnaPredict and mfold--algorithms for RNA secondary structure prediction. Bioinformatics. 2006, 22: 934-942. 10.1093/bioinformatics/btl043.
4. RNA STRAND. [http://www.rnasoft.ca/strand/]
5. Bork P: Shuffled domains in extracellular proteins. FEBS Lett. 1991, 286: 47-54. 10.1016/0014-5793(91)80937-X.
6. Wheelan SJ, Marchler-Bauer A, Bryant SH: Domain size distributions can predict domain boundaries. Bioinformatics. 2000, 16: 613-618. 10.1093/bioinformatics/16.7.613.
7. Petrov AS, Bernier CR, Hershkovitz E, Xue Y, Waterbury CC: Secondary Structure and Domain Architecture of the 23S rRNA. Nucleic Acids Research. 2013, 41: 7522-7535. 10.1093/nar/gkt513.

Acknowledgements

We thank the anonymous reviewers for their detailed and very useful comments.

This article has been published as part of BMC Proceedings Volume 8 Supplement 6, 2014: Proceedings of the Great Lakes Bioinformatics Conference 2014. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S6.

Author information

Authors and Affiliations

School of Computer Science and Technology, and Shandong Provincial Key Laboratory of Digital Media Technology, Shandong University of Finance and Economics, Jinan, 250014, China
Hengwu Li & Huijian Han
School of Computer Science and Technology, and Shandong Provincial Key Laboratory of Software Engineering, Shandong University, Jinan, 250014, China
Hengwu Li, Daming Zhu & Caiming Zhang
Computational Biology Institute, George Washington University, Ashburn, Virginia, 20147, USA
Keith A Crandall

Corresponding author

Correspondence to Hengwu Li.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

HL and DZ initiated the project and carried out data analysis. KAC conducted statistical modelling and comparison. CZ and HH performed data processing. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Li, H., Zhu, D., Zhang, C. et al. Characteristics of equipartition for RNA structure. BMC Proc 8 (Suppl 6), S3 (2014). https://doi.org/10.1186/1753-6561-8-S6-S3
