Showing papers on "Nucleic acid secondary structure" published in 2021


Journal ArticleDOI
TL;DR: In this article, a long amplicon strategy was used to determine the secondary structure of the SARS-CoV-2 RNA genome at single-nucleotide resolution in infected cells.

166 citations


Journal ArticleDOI
01 Apr 2021-Cell
TL;DR: In this article, the structural landscape of SARS-CoV-2 RNA in infected human cells and from refolded RNAs, as well as the regulatory untranslated regions of six other coronaviruses were determined using icSHAPE.

114 citations


Journal ArticleDOI
TL;DR: UFold proposes a novel image-like representation of RNA sequences, which can be efficiently processed by Fully Convolutional Networks (FCNs), and significantly outperforms previous methods on within-family datasets, while achieving a similar performance as the traditional methods when trained and tested on distinct RNA families.
Abstract: For many RNA molecules, the secondary structure is essential for the correct function of the RNA. Predicting RNA secondary structure from nucleotide sequences is a long-standing problem in genomics, but the prediction performance has reached a plateau over time. Traditional RNA secondary structure prediction algorithms are primarily based on thermodynamic models through free energy minimization, which imposes strong prior assumptions and is slow to run. Here, we propose a deep learning-based method, called UFold, for RNA secondary structure prediction, trained directly on annotated data and base-pairing rules. UFold proposes a novel image-like representation of RNA sequences, which can be efficiently processed by Fully Convolutional Networks (FCNs). We benchmark the performance of UFold on both within- and cross-family RNA datasets. It significantly outperforms previous methods on within-family datasets, while achieving a similar performance as the traditional methods when trained and tested on distinct RNA families. UFold is also able to predict pseudoknots accurately. Its prediction is fast with an inference time of about 160 ms per sequence up to 1500 bp in length. An online web server running UFold is available at https://ufold.ics.uci.edu. Code is available at https://github.com/uci-cbcl/UFold.
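As a rough illustration of what an "image-like representation" of an RNA sequence could look like, the sketch below builds a 16-channel L x L array from the outer product of one-hot encoded nucleotides. The channel layout and the function names (`one_hot`, `image_like`) are assumptions made for illustration only; the abstract does not specify UFold's actual encoding, which is documented in the linked repository.

```python
from itertools import product

BASES = "ACGU"  # assumed alphabet; any other symbol gets an all-zero encoding


def one_hot(seq):
    """Return one 4-element 0/1 vector per nucleotide."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def image_like(seq):
    """Build a 16 x L x L 'image' from a sequence (hypothetical layout).

    Channel (a, b) at pixel (i, j) is 1 when position i holds base a and
    position j holds base b, i.e. the outer product of the one-hot vectors.
    """
    enc = one_hot(seq)
    n = len(enc)
    channels = []
    for a, b in product(range(4), repeat=2):
        channels.append(
            [[enc[i][a] * enc[j][b] for j in range(n)] for i in range(n)]
        )
    return channels


if __name__ == "__main__":
    img = image_like("GCAU")
    print(len(img), len(img[0]), len(img[0][0]))  # 16 4 4
```

Each pixel (i, j) then describes one candidate pairing of positions i and j, which is the kind of grid a convolutional network can process directly.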

42 citations


Journal ArticleDOI
TL;DR: SPOT-RNA2 extends single-sequence deep learning of RNA base pairing to evolutionary profiles and mutational coupling, and can also significantly improve base-pairing prediction by incorporating artificial but functional homologous sequences generated from deep mutational scanning.
Abstract: MOTIVATION: The recent discovery of numerous non-coding RNAs (long non-coding RNAs, in particular) has transformed our perception about the roles of RNAs in living organisms. Our ability to understand them, however, is hampered by our inability to solve their secondary and tertiary structures in high resolution efficiently by existing experimental techniques. Computational prediction of RNA secondary structure, on the other hand, has received much-needed improvement, recently, through deep learning of a large approximate data, followed by transfer learning with gold-standard base-pairing structures from high-resolution 3-D structures. Here, we expand this single-sequence-based learning to the use of evolutionary profiles and mutational coupling. RESULTS: The new method allows large improvement not only in canonical base-pairs (RNA secondary structures) but more so in base-pairing associated with tertiary interactions such as pseudoknots, noncanonical and lone base-pairs. In particular, it is highly accurate for those RNAs of more than 1000 homologous sequences by achieving >0.8 F1-score (harmonic mean of sensitivity and precision) for 14/16 RNAs tested. The method can also significantly improve base-pairing prediction by incorporating artificial but functional homologous sequences generated from deep mutational scanning without any modification. The fully automatic method (publicly available as server and standalone software) should provide the scientific community a new powerful tool to capture not only the secondary structure but also tertiary base-pairing information for building three-dimensional models. It also highlights the future of accurately solving the base-pairing structure by using a large number of natural and/or artificial homologous sequences. AVAILABILITY: Standalone-version of SPOT-RNA2 is available at https://github.com/jaswindersingh2/SPOT-RNA2. Direct prediction can also be made at https://sparks-lab.org/server/spot-rna2/. 
The datasets used in this research can also be downloaded from the GitHub repository and the web server mentioned above.
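The abstract defines the F1-score as the harmonic mean of sensitivity and precision. A minimal sketch of that calculation for base-pair predictions follows; the function name `f1_score`, the representation of pairs as index tuples, and the toy pairs are assumptions for illustration, not the SPOT-RNA2 evaluation code.

```python
def f1_score(predicted_pairs, reference_pairs):
    """F1 = harmonic mean of precision and sensitivity over base pairs.

    Pairs are (i, j) index tuples; (i, j) and (j, i) count as the same pair.
    """
    pred = {tuple(sorted(p)) for p in predicted_pairs}
    ref = {tuple(sorted(p)) for p in reference_pairs}
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)    # fraction of predicted pairs that are correct
    sensitivity = true_pos / len(ref)   # fraction of reference pairs recovered
    return 2 * precision * sensitivity / (precision + sensitivity)


if __name__ == "__main__":
    reference = [(0, 9), (1, 8), (2, 7)]   # toy reference structure
    predicted = [(0, 9), (8, 1), (3, 6)]   # two correct pairs, one wrong
    print(round(f1_score(predicted, reference), 3))  # 0.667
```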

35 citations


Journal ArticleDOI
TL;DR: RASP (RNA Atlas of Structure Probing) is a comprehensive RNA structure probing database that also curates up-to-date datasets from several RNA secondary structure probing studies of the RNA genome of SARS-CoV-2, the RNA virus that caused the ongoing COVID-19 pandemic.
Abstract: RNA molecules fold into complex structures that are important across many biological processes. Recent technological developments have enabled transcriptome-wide probing of RNA secondary structure using nucleases and chemical modifiers. These approaches have been widely applied to capture RNA secondary structure in many studies, but gathering and presenting such data from very different technologies in a comprehensive and accessible way has been challenging. Existing RNA structure probing databases usually focus on low-throughput or very specific datasets. Here, we present a comprehensive RNA structure probing database called RASP (RNA Atlas of Structure Probing) by collecting 161 deduplicated transcriptome-wide RNA secondary structure probing datasets from 38 papers. RASP covers 18 species across animals, plants, bacteria, fungi, and also viruses, and categorizes 18 experimental methods including DMS-seq, SHAPE-Seq, SHAPE-MaP, and icSHAPE, etc. Specially, RASP curates the up-to-date datasets of several RNA secondary structure probing studies for the RNA genome of SARS-CoV-2, the RNA virus that caused the on-going COVID-19 pandemic. RASP also provides a user-friendly interface to query, browse, and visualize RNA structure profiles, offering a shortcut to accessing RNA secondary structures grounded in experimental data. The database is freely available at http://rasp.zhanglab.net.

27 citations


Journal ArticleDOI
TL;DR: Using a 5% divergence filter to infer directionality, 18 of 36 datasets of aligned coding region sequences from a diverse range of mammalian RNA viruses (including Picornaviridae, Flaviviridae, Matonaviridae, Caliciviridae and Coronaviridae) showed a >2-fold base composition normalised excess of C->U transitions compared to U->C (range 2.1x-7.5x).
Abstract: The rapid evolution of RNA viruses has been long considered to result from a combination of high copying error frequencies during RNA replication, short generation times and the consequent extensive fixation of neutral or adaptive changes over short periods. While both the identities and sites of mutations are typically modelled as being random, recent investigations of sequence diversity of SARS coronavirus 2 (SARS-CoV-2) have identified a preponderance of C->U transitions, proposed to be driven by an APOBEC-like RNA editing process. The current study investigated whether this phenomenon could be observed in datasets of other RNA viruses. Using a 5% divergence filter to infer directionality, 18 from 36 datasets of aligned coding region sequences from a diverse range of mammalian RNA viruses (including Picornaviridae, Flaviviridae, Matonaviridae, Caliciviridae and Coronaviridae) showed a >2-fold base composition normalised excess of C->U transitions compared to U->C (range 2.1x-7.5x), with a consistently observed favoured 5' U upstream context. The presence of genome scale RNA secondary structure (GORS) was the only other genomic or structural parameter significantly associated with C->U/U->C transition asymmetries by multivariable analysis (ANOVA), potentially reflecting RNA structure dependence of sites targeted for C->U mutations. Using the association index metric, C->U changes were specifically over-represented at phylogenetically uninformative sites, potentially paralleling extensive homoplasy of this transition reported in SARS-CoV-2. Although mechanisms remain to be functionally characterised, excess C->U substitutions accounted for 11-14% of standing sequence variability of structured viruses and may therefore represent a potent driver of their sequence diversification and longer-term evolution.
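One way to read "base composition normalised excess" is as a ratio of per-site transition rates. The sketch below divides each transition count by the number of sites of the originating base before taking the ratio. This normalisation, the function name and the example counts are assumptions for illustration; the abstract does not give the exact formula used in the study.

```python
def normalised_transition_ratio(n_c_to_u, n_u_to_c, n_c_sites, n_u_sites):
    """Ratio of the C->U rate per C site to the U->C rate per U site.

    A value above 1 means C->U changes are over-represented relative to
    U->C once the different numbers of C and U sites are accounted for.
    """
    if n_u_to_c <= 0 or n_c_sites <= 0 or n_u_sites <= 0:
        raise ValueError("counts of U->C changes and of C and U sites must be positive")
    rate_c_to_u = n_c_to_u / n_c_sites
    rate_u_to_c = n_u_to_c / n_u_sites
    return rate_c_to_u / rate_u_to_c


if __name__ == "__main__":
    # Made-up counts, not data from the study.
    ratio = normalised_transition_ratio(
        n_c_to_u=300, n_u_to_c=100, n_c_sites=2000, n_u_sites=3000
    )
    print(round(ratio, 2))  # 4.5
```

With these made-up counts the raw excess is 3-fold, but because C sites are rarer than U sites the normalised excess is larger.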

23 citations


Journal ArticleDOI
26 Jan 2021-Viruses
TL;DR: In this article, the authors apply what is known of cellular splicing to understand splicing in HIV-1, and present data from their newer and more sensitive deep sequencing assays quantifying the different HIV-1 transcript types.
Abstract: The transcription of the HIV-1 provirus results in only one type of transcript—full length genomic RNA. To make the mRNA transcripts for the accessory proteins Tat and Rev, the genomic RNA must completely splice. The mRNA transcripts for Vif, Vpr, and Env must undergo splicing but not completely. Genomic RNA (which also functions as mRNA for the Gag and Gag/Pro/Pol precursor polyproteins) must not splice at all. HIV-1 can tolerate a surprising range in the relative abundance of individual transcript types, and a surprising amount of aberrant and even odd splicing; however, it must not over-splice, which results in the loss of full-length genomic RNA and has a dramatic fitness cost. Cells typically do not tolerate unspliced/incompletely spliced transcripts, so HIV-1 must circumvent this cell policing mechanism to allow some splicing while suppressing most. Splicing is controlled by RNA secondary structure, cis-acting regulatory sequences which bind splicing factors, and the viral protein Rev. There is still much work to be done to clarify the combinatorial effects of these splicing regulators. These control mechanisms represent attractive targets to induce over-splicing as an antiviral strategy. Finally, splicing has been implicated in latency, but to date there is little supporting evidence for such a mechanism. In this review we apply what is known of cellular splicing to understand splicing in HIV-1, and present data from our newer and more sensitive deep sequencing assays quantifying the different HIV-1 transcript types.

22 citations


Journal ArticleDOI
TL;DR: The role of insertions and deletions (indels) as well as recombination in SARS-CoV-2 evolution has been examined in this paper, using sequences from the GISAID database.
Abstract: The evolutionary dynamics of SARS-CoV-2 have been carefully monitored since the COVID-19 pandemic began in December 2019. However, analysis has focused primarily on single nucleotide polymorphisms and largely ignored the role of insertions and deletions (indels) as well as recombination in SARS-CoV-2 evolution. Using sequences from the GISAID database, we catalogue over 100 insertions and deletions in the SARS-CoV-2 consensus sequences. We hypothesize that these indels are artifacts of recombination events between SARS-CoV-2 replicates whereby RNA-dependent RNA polymerase (RdRp) re-associates with a homologous template at a different locus (“imperfect homologous recombination”). We provide several independent pieces of evidence that suggest this. (1) The indels from the GISAID consensus sequences are clustered at specific regions of the genome. (2) These regions are also enriched for 5’ and 3’ breakpoints in the transcription regulatory site (TRS) independent transcriptome, presumably sites of RNA-dependent RNA polymerase (RdRp) template-switching. (3) Within raw reads, these indel hotspots have cases of both high intra-host heterogeneity and intra-host homogeneity, suggesting that these indels are both consequences of de novo recombination events within a host and artifacts of previous recombination. We briefly analyze the indels in the context of RNA secondary structure, noting that indels preferentially occur in “arms” and loop structures of the predicted folded RNA, suggesting that secondary structure may be a mechanism for TRS-independent template-switching in SARS-CoV-2 or other coronaviruses. These insights into the relationship between structural variation and recombination in SARS-CoV-2 can improve our reconstructions of the SARS-CoV-2 evolutionary history as well as our understanding of the process of RdRp template-switching in RNA viruses.

20 citations


Journal ArticleDOI
TL;DR: In this article, a simplified SPLASH assay is developed to comprehensively map the in vivo RNA-RNA interactome of the SARS-CoV-2 genome across the viral life cycle, revealing the structural basis for the regulation of replication, discontinuous transcription and translational frameshifting.
Abstract: The dynamics of SARS-CoV-2 RNA structure and their functional relevance are largely unknown. Here we develop a simplified SPLASH assay and comprehensively map the in vivo RNA-RNA interactome of SARS-CoV-2 genome across viral life cycle. We report canonical and alternative structures including 5′-UTR and 3′-UTR, frameshifting element (FSE) pseudoknot and genome cyclization in both cells and virions. We provide direct evidence of interactions between Transcription Regulating Sequences, which facilitate discontinuous transcription. In addition, we reveal alternative short and long distance arches around FSE. More importantly, we find that within virions, while SARS-CoV-2 genome RNA undergoes intensive compaction, genome domains remain stable but with strengthened demarcation of local domains and weakened global cyclization. Taken together, our analysis reveals the structural basis for the regulation of replication, discontinuous transcription and translational frameshifting, the alternative conformations and the maintenance of global genome organization during the whole life cycle of SARS-CoV-2, which we anticipate will help develop better antiviral strategies. RNA secondary structure is important for viral replication, transcription and translation. Here the authors employ SPLASH method and map in vivo RNA interactions and dynamics of SARS-CoV-2 RNA genome in different viral life cycles.

18 citations


Journal ArticleDOI
TL;DR: The inherent compactness of mRNA might regulate translation initiation by facilitating the formation of protein complexes that bridge mRNA 5′ and 3′ ends and the proximity of mRNA ends might mediate coupling of 3′ deadenylation to 5′ end mRNA decay.
Abstract: The 5' cap and 3' poly(A) tail of mRNA are known to synergistically regulate mRNA translation and stability. Recent computational and experimental studies revealed that both protein-coding and non-coding RNAs will fold with extensive intramolecular secondary structure, which will result in close distances between the sequence ends. This proximity of the ends is a sequence-independent, universal property of most RNAs. Only low-complexity sequences without guanosines are without secondary structure and exhibit end-to-end distances expected for RNA random coils. The innate proximity of RNA ends might have important biological implications that remain unexplored. In particular, the inherent compactness of mRNA might regulate translation initiation by facilitating the formation of protein complexes that bridge mRNA 5' and 3' ends. Additionally, the proximity of mRNA ends might mediate coupling of 3' deadenylation to 5' end mRNA decay. This article is categorized under: RNA Structure and Dynamics > RNA Structure, Dynamics, and Chemistry; RNA Structure and Dynamics > Influence of RNA Structure in Biological Systems; Translation > Translation Regulation.

17 citations


Journal ArticleDOI
TL;DR: Wang et al. review evidence that RNA methylation plays an important role in RNA transport and function, and offer ideas and clues to inspire future research on the function of RNA motifs in long-distance RNA transport and on the underlying mechanism of systemic RNA signaling.
Abstract: A large number of RNA molecules have been found in the phloem of higher plants, and they can be transported to distant organelles through the phloem. RNA signals transported over long distances are important cues in the defence strategies that plants have evolved against various physiological challenges. The mechanism by which RNAs are selectively transported through phloem cells is still being worked out. Evidence to date shows that several RNA motifs, including polypyrimidine (poly-CU) sequences, transfer RNA (tRNA)-related sequences and single-nucleotide mutations, bind specific RNA-binding proteins to form ribonucleoprotein (RNP) complexes that can facilitate RNA mobility in plants. Furthermore, some RNA secondary structures, such as the tRNA-like structure (TLS), the untranslated region (UTR) of mRNA and the stem-loop structure of pre-miRNA, also contribute to the mobility of RNAs. Recent research found that RNA methylation, such as 5-methylcytosine (m5C), plays an important role in RNA transport and function. These studies lay a theoretical foundation for uncovering the mechanism of RNA transport. We aim to provide ideas and clues to inspire future research on the function of RNA motifs in long-distance RNA transport, and to explore the underlying mechanism of systemic RNA signaling.

Journal ArticleDOI
TL;DR: In this paper, the authors discuss the various roles of G4 structures in bacterial DNA and the application of G4 DNA as an inhibitor or therapeutic agent to tackle bacterial pathogens.
Abstract: DNA strand consisting of multiple runs of guanines can adopt the non-canonical, four-stranded DNA secondary structure known as G-quadruplex or G4 DNA. G4 DNA is thought to play an important role in transcriptional and translational regulation of genes, DNA replication, genome stability, and oncogene expression in eukaryotic genomes. In other organisms including several bacterial pathogens and some plant species, the biological role of G4 DNA and G4 RNA is starting to be explored. Recent investigation showed that G4 DNA and G4 RNA are generally conserved across plant species. In silico analyses of several bacterial genomes identified the putative guanine-rich, G4 DNA-forming sequences in the promoter regions. They were particularly abundant in certain gene classes, suggesting that these highly diverse structures can be employed to regulate expression of genes involved in secondary metabolite synthesis and signal transduction. Furthermore, in the pathogen Mycobacterium tuberculosis, the distribution of G4 motifs and their potential role in the regulation of gene transcription advocate for the use of G4 ligands to develop novel antitubercular therapies. In this review, we discuss the various roles of G4 structures in bacterial DNA and the application of G4 DNA as an inhibitor or therapeutic agent to tackle the bacterial pathogens.
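In silico searches for putative G4-forming sequences are often done with a simple pattern match for four runs of guanines separated by short loops. The sketch below uses the widely used heuristic of runs of at least three G separated by loops of 1 to 7 nucleotides. Whether the studies covered by this review used this exact pattern is an assumption, and the function name `find_putative_g4` is illustrative.

```python
import re

# Four G-runs (>= 3 G each) separated by loops of 1-7 nucleotides.
# This is a common heuristic, not a definition taken from the review.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_putative_g4(seq):
    """Return (start, end, motif) for each non-overlapping match on one strand."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence containing a telomere-like repeat.
    hits = find_putative_g4("ATGGGTTAGGGTTAGGGTTAGGGCA")
    print(hits)  # [(2, 23, 'GGGTTAGGGTTAGGGTTAGGG')]
```

The search covers only the given strand; scanning the reverse complement (or matching runs of C) would be needed to find motifs on the opposite strand.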

Journal ArticleDOI
TL;DR: In this article, an improvement of IPknot was proposed by employing the LinearPartition model and automatically selecting the optimal threshold parameters based on the pseudo-expected accuracy, which showed favorable prediction accuracy across a wide range of conditions in exhaustive benchmarking.
Abstract: RNA structural elements called pseudoknots are involved in various biological phenomena including ribosomal frameshifts. Because it is infeasible to construct an efficiently computable secondary structure model including pseudoknots, secondary structure prediction methods considering pseudoknots are not yet widely available. We developed IPknot, which uses heuristics to speed up computations, but it has remained difficult to apply it to long sequences, such as messenger RNA and viral RNA, because it requires cubic computational time with respect to sequence length and has threshold parameters that need to be manually adjusted. Here, we propose an improvement of IPknot that enables calculation in linear time by employing the LinearPartition model and automatically selects the optimal threshold parameters based on the pseudo-expected accuracy. In addition, IPknot showed favorable prediction accuracy across a wide range of conditions in our exhaustive benchmarking, not only for single sequences but also for multiple alignments.

Journal ArticleDOI
TL;DR: A review of bioinformatics strategies for human cancer A-to-I RNA editing identification is presented in this article, which briefly discusses recent advances in related areas, such as the oncogenic and tumor suppressive effects of RNA editing.
Abstract: As an important regulatory mechanism at the posttranscriptional level in metazoans, adenosine deaminase acting on RNA (ADAR)-induced A-to-I RNA editing modification of double-stranded RNA has been widely detected and reported. Editing may lead to non-synonymous amino acid mutations, RNA secondary structure alterations, pre-mRNA processing changes, and microRNA-mRNA redirection, thereby affecting multiple cellular processes and functions. In recent years, researchers have successfully developed several bioinformatics software tools and pipelines to identify RNA editing sites. However, there are still no widely accepted editing site standards due to the variety of parallel optimization and RNA high-seq protocols and programs. It is also challenging to identify RNA editing by normal protocols in tumor samples due to the high DNA mutation rate. Numerous RNA editing sites have been reported to be located in non-coding regions and can affect the biosynthesis of ncRNAs, including miRNAs and circular RNAs. Predicting the function of RNA editing sites located in non-coding regions and ncRNAs is significantly difficult. In this review, we aim to provide a better understanding of bioinformatics strategies for human cancer A-to-I RNA editing identification and briefly discuss recent advances in related areas, such as the oncogenic and tumor suppressive effects of RNA editing.

Posted ContentDOI
13 Sep 2021-bioRxiv
TL;DR: In this paper, the authors demonstrate that the Vibrio metoecus type III-B (VmeCmr) system is activated by target RNA binding, generating cyclic-triadenylate (cA3) to stimulate a robust NucC-mediated DNase activity.
Abstract: Type III CRISPR systems detect invading RNA, resulting in the activation of the enzymatic Cas10 subunit. The Cas10 cyclase domain generates cyclic oligoadenylate (cOA) second messenger molecules, activating a variety of effector nucleases that degrade nucleic acids to provide immunity. The prophage-encoded Vibrio metoecus type III-B (VmeCmr) locus is uncharacterised, lacks the HD nuclease domain in Cas10 and encodes a NucC DNA nuclease effector that is also found associated with Cyclic-oligonucleotide-based anti-phage signalling systems (CBASS). Here we demonstrate that VmeCmr is activated by target RNA binding, generating cyclic-triadenylate (cA3) to stimulate a robust NucC-mediated DNase activity. The specificity of VmeCmr is probed, revealing the importance of specific nucleotide positions in segment 1 of the RNA duplex and the protospacer flanking sequence (PFS). We harness this programmable system to demonstrate the potential for a highly specific and sensitive assay for detection of the SARS-CoV-2 virus RNA with a limit of detection (LoD) of 2 fM using a commercial plate reader without any extrinsic amplification step. The sensitivity is highly dependent on the guide RNA used, suggesting that target RNA secondary structure plays an important role that may also be relevant in vivo.

Journal ArticleDOI
TL;DR: Genotype-phenotype maps link genetic changes to their fitness effect and are thus an essential component of evolutionary models as discussed by the authors, and the map between RNA sequences and their secondary structures is a ke...
Abstract: Genotype–phenotype maps link genetic changes to their fitness effect and are thus an essential component of evolutionary models. The map between RNA sequences and their secondary structures is a ke...

Journal ArticleDOI
TL;DR: In this article, a CD study on the interaction of a dithymine-functionalized tetra-L-serine with a homoadenine DNA (dA12) reported an interpretation of the experimental data in light of computational studies based on molecular docking and molecular dynamics (MD), as well as computer-assisted CD interpretation and simulation of the predicted complex structure.

Journal ArticleDOI
01 Aug 2021-Viruses
TL;DR: In this paper, a coarse-grained model with a subdomain composition scheme was proposed to reconstruct the complete 3D structure of RNA genomes inside proteinaceous capsids based on secondary structures from experimental techniques.
Abstract: Three-dimensional RNA domain reconstruction is important for the assembly, disassembly and delivery functionalities of a packed proteinaceous capsid. However, to date, the self-association of RNA molecules is still an open problem. Recent chemical probing reports provide, with high reliability, the secondary structure of diverse RNA ensembles, such as those of viral genomes. Here, we present a method for reconstructing the complete 3D structure of RNA genomes, which combines a coarse-grained model with a subdomain composition scheme to obtain the entire genome inside proteinaceous capsids based on secondary structures from experimental techniques. Despite the amount of sampling involved in the folded and also unfolded RNA molecules, advanced microscope techniques can provide points of anchoring, which enhance our model to include interactions between capsid pentamers and RNA subdomains. To test our method, we tackle the satellite tobacco mosaic virus (STMV) genome, which has been widely studied by both experimental and computational communities. We provide not only a methodology to structurally analyze the tertiary conformations of the RNA genome inside capsids, but a flexible platform that allows the easy implementation of features/descriptors coming from both theoretical and experimental approaches.

Journal ArticleDOI
TL;DR: This review presents known bacterial regulatory mechanisms that rely on RNA structure, describes the basic theory of RNA folding and dynamics, and gives examples of multiple mechanisms employed by RNA regulators in the control of bacterial transcription and translation.
Abstract: Due to their high exposure to changing environmental conditions, bacteria have developed many mechanisms enabling immediate adjustments of gene expression. In many cases, the required speed and plasticity of the response are provided by RNA-dependent regulatory mechanisms. This is possible due to the very high dynamics and flexibility of an RNA structure, which provide the necessary sensitivity and specificity for efficient sensing and transduction of environmental signals. In this review, we will discuss the current knowledge about known bacterial regulatory mechanisms which rely on RNA structure. To better understand the structure-driven modulation of gene expression, we describe the basic theory on RNA structure folding and dynamics. Next, we present examples of multiple mechanisms employed by RNA regulators in the control of bacterial transcription and translation.

Journal ArticleDOI
TL;DR: In this article, the authors summarized newly reported methods for probing RNA secondary structure (RSS) in vivo and the functions and mechanisms of RSS in plant physiology, noting that RSS is correlated with regulating splicing, polyadenylation, protein synthesis, and miRNA biogenesis and functions.
Abstract: The majority of the genome is transcribed to RNA in living organisms. RNA transcripts can form astonishing arrays of secondary and tertiary structures via Watson-Crick, Hoogsteen or wobble base pairing. In vivo, RNA folding is not a simple thermodynamics event of minimizing free energy. Instead, the process is constrained by transcription, RNA binding proteins (RBPs), steric factors and micro-environment. RNA secondary structure (RSS) plays myriad roles in numerous biological processes, such as RNA processing, stability, transportation and translation in prokaryotes and eukaryotes. Emerging evidence has also implicated RSS in RNA trafficking, liquid-liquid phase separation and plant responses to environmental variations such as temperature and salinity. At the molecular level, RSS is correlated with regulating splicing, polyadenylation, protein synthesis, and miRNA biogenesis and functions. In this review, we summarized newly reported methods for probing RSS in vivo and functions and mechanisms of RSS in plant physiology.

Journal ArticleDOI
TL;DR: CoBold is presented, a computational method for identifying different functional classes of transient RNA structure features that can either aid or hinder the formation of a known reference RNA structure.
Abstract: RNA structure formation in vivo happens co-transcriptionally while the transcript is being made. The corresponding co-transcriptional folding pathway typically involves transient RNA structure features that are not part of the final, functional RNA structure. These transient features can play important functional roles of their own and also influence the formation of the final RNA structure in vivo. We here present CoBold, a computational method for identifying different functional classes of transient RNA structure features that can either aid or hinder the formation of a known reference RNA structure. Our method takes as input either a single RNA or a corresponding multiple-sequence alignment as well as a known reference RNA secondary structure and identifies different classes of transient RNA structure features that could aid or prevent the formation of the given RNA structure. We make CoBold available via a web-server which includes dedicated data visualisation.

Journal ArticleDOI
TL;DR: The proposed approach based on a metaheuristic algorithm named Chemical Reaction Optimization (CRO) to solve the RNA pseudoknotted structure prediction problem is compared with some existing algorithms and shown that the CRO based model is a better prediction method in terms of accuracy and speed.
Abstract: RNA molecules, especially those including pseudoknots, play a significant role in cell function. In past decades, several methods have been developed to predict RNA secondary structure with pseudoknots and the most popular one uses minimum free energy. It is a nondeterministic polynomial-time hard (NP-hard) problem. We have proposed an approach based on a metaheuristic algorithm named Chemical Reaction Optimization (CRO) to solve the RNA pseudoknotted structure prediction problem. The reaction operators of the CRO algorithm have been redesigned and used on the generated population to find the structure with the minimum free energy. Besides, we have developed an additional operator called the Repair operator, which has a great influence on the accuracy of our algorithm. It helps to increase the true positive base pairs while decreasing the false positive and false negative base pairs. Four energy models have been applied to calculate the energy. To evaluate the performance, we have used four datasets containing RNA pseudoknotted sequences taken from the RNA STRAND and Pseudobase++ databases. We have compared the proposed approach with some existing algorithms and shown that our CRO based model is a better prediction method in terms of accuracy and speed.

Journal ArticleDOI
04 Feb 2021-Cells
TL;DR: In this article, the authors showed that other plastid genes with weak Shine-Dalgarno sequences (SD) are likely to exhibit psbA-like regulation, while those with strong SDs do not, suggesting that changes in mRNA secondary structure might represent a general mechanism for translational regulation of psbA and other plastid genes.
Abstract: mRNA secondary structure influences translation. Proteins that modulate the mRNA secondary structure around the translation initiation region may regulate translation in plastids. To test this hypothesis, we exposed Arabidopsis thaliana to high light, which induces translation of psbA mRNA encoding the D1 subunit of photosystem II. We assayed translation by ribosome profiling and applied two complementary methods to analyze in vivo RNA secondary structure: DMS-MaPseq and SHAPE-seq. We detected increased accessibility of the translation initiation region of psbA after high light treatment, likely contributing to the observed increase in translation by facilitating translation initiation. Furthermore, we identified the footprint of a putative regulatory protein in the 5' UTR of psbA at a position where occlusion of the nucleotide sequence would cause the structure of the translation initiation region to open up, thereby facilitating ribosome access. Moreover, we show that other plastid genes with weak Shine-Dalgarno sequences (SD) are likely to exhibit psbA-like regulation, while those with strong SDs do not. This supports the idea that changes in mRNA secondary structure might represent a general mechanism for translational regulation of psbA and other plastid genes.

Journal ArticleDOI
TL;DR: These results provide valuable TE exon models for studying formation and kinetics of pre-mRNA building blocks required for splice-site selection and will be useful for fine-tuning auxiliary splicing motifs and exon and intron size constraints that govern aberrant splice-site activation.
Abstract: Transposed elements (TEs) have dramatically shaped evolution of the exon-intron structure and significantly contributed to morbidity, but how recent TE invasions into older TEs cooperate in generating new coding sequences is poorly understood. Employing an updated repository of new exon-intron boundaries induced by pathogenic mutations, termed DBASS, here we identify novel TE clusters that facilitated exon selection. To explore the extent to which such TE exons maintain RNA secondary structure of their progenitors, we carried out structural studies with a composite exon that was derived from a long terminal repeat (LTR78) and AluJ and was activated by a C > T mutation optimizing the 5' splice site. Using a combination of SHAPE, DMS and enzymatic probing, we show that the disease-causing mutation disrupted a conserved AluJ stem that evolved from helix 3.3 (or 5b) of 7SL RNA, liberating a primordial GC 5' splice site from the paired conformation for interactions with the spliceosome. The mutation also reduced flexibility of conserved residues in adjacent exon-derived loops of the central Alu hairpin, revealing a cross-talk between traditional and auxiliary splicing motifs that evolved from opposite termini of 7SL RNA and were approximated by Watson-Crick base-pairing already in organisms without spliceosomal introns. We also identify existing Alu exons activated by the same RNA rearrangement. Collectively, these results provide valuable TE exon models for studying formation and kinetics of pre-mRNA building blocks required for splice-site selection and will be useful for fine-tuning auxiliary splicing motifs and exon and intron size constraints that govern aberrant splice-site activation.

Journal ArticleDOI
03 Jun 2021-RNA
TL;DR: In this article, a multivariable linear regression model was developed to predict APOBEC1 dependent C-to-U RNA editing efficiency, incorporating factors independently associated with editing frequencies based on 103 Sanger-confirmed editing sites.
Abstract: Mammalian C-to-U RNA editing was described more than 30 years ago as a single nucleotide modification in small intestinal Apob RNA, later shown to be mediated by the RNA-specific cytidine deaminase APOBEC1. Reports of other examples of C-to-U RNA editing, coupled with the advent of genome-wide transcriptome sequencing, identified an expanded range of APOBEC1 targets. Here we analyze the cis-acting regulatory components of verified murine C-to-U RNA editing targets, including nearest neighbor as well as flanking sequence requirements and folding predictions. RNA secondary structure of the editing cassette was associated with editing frequency and exhibited minimal free energy values comparable to small nuclear RNAs. We summarize findings demonstrating the relative importance of trans-acting factors (A1CF, RBM47) acting in concert with APOBEC1. Co-factor dominance was associated with editing frequency, with RNAs targeted by both RBM47 and A1CF edited at a lower frequency than RBM47 dominant targets. Using this information, we developed a multivariable linear regression model to predict APOBEC1 dependent C-to-U RNA editing efficiency, incorporating factors independently associated with editing frequencies based on 103 Sanger-confirmed editing sites, which accounted for 84% of the observed variance. This model also predicted a composite score for available human C-to-U RNA targets, which again correlated with editing frequency.
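The statement that a regression model "accounted for 84% of the observed variance" is usually read as a coefficient of determination (R²) of 0.84. As a rough sketch of that quantity, assuming this standard definition applies here, the helper below computes R² from observed and model-predicted values; the function name `r_squared` and the toy numbers are hypothetical and are not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Total variation of the observations around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the model predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical editing frequencies and model predictions (illustration only).
    observed = [0.10, 0.40, 0.55, 0.90]
    predicted = [0.20, 0.35, 0.60, 0.80]
    print(f"R^2 = {r_squared(observed, predicted):.3f}")
```

A value of 1 means the predictions reproduce every observation, while a model that always predicts the mean of the observations scores 0.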

Journal ArticleDOI
TL;DR: In this paper, the authors used chemical mapping to determine the secondary structure of segment 8 vRNA of the pandemic A/California/04/2009 (H1N1) strain of IAV.

Journal ArticleDOI
TL;DR: In this paper, the differential accumulation of RNA secondary structures in codewords was measured in the case of the pathogen Plasmodium falciparum, which is a deadly human pathogen responsible for the spread of malaria.
Abstract: Plasmodium falciparum is a deadly human pathogen responsible for the devastating disease called malaria. In this study, we measured the differential accumulation of RNA secondary structures in codi...

Journal ArticleDOI
Linyu Wang1, Xiao-dan Zhong1, Shuo Wang1, Hao Zhang1, Yuanning Liu1 
TL;DR: Wang et al. as discussed by the authors proposed an end-to-end method to predict RNA secondary structure profile based on Bidirectional LSTM and Residual Neural Network, which utilizes data sets generated by multiple biological experiment methods as the training, validation, and test sets.
Abstract: Studies have shown that RNA secondary structure, a planar structure formed by paired bases, plays diverse vital roles in fundamental life activities and complex diseases. An RNA secondary structure profile records whether each base is paired with another. Hence, accurate prediction of the secondary structure profile can help to deduce the secondary structure and binding sites of an RNA. The RNA secondary structure profile can be obtained through biological experiment and calculation methods. The biological experiment methods fall into two kinds: chemical reagent and biological crystallization. The chemical reagent method can produce a large amount of data, but its cost is high and it is always associated with high noise, and limited sequencing coverage makes it difficult to obtain results for all bases of an RNA. By contrast, the biological crystallization method gives accurate results, yet it requires heavy experimental work and high costs. On the calculation side, the existing method is CROSS, which comprises a three-layer fully connected neural network. However, CROSS cannot completely learn the features of the RNA secondary structure profile because of its poor network structure, which leads to low performance. In this paper, a novel end-to-end method, named "RPRes", is proposed to predict the RNA secondary structure profile based on a Bidirectional LSTM and a Residual Neural Network. RPRes uses data sets generated by multiple biological experiment methods as the training, validation, and test sets, which makes it compatible with numerous prediction requirements. Compared with the biological experiment methods, RPRes reduces costs and improves prediction efficiency. Compared with the state-of-the-art calculation method CROSS, RPRes significantly improves performance.
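The abstract above describes a secondary structure profile as a per-base record of whether a base is paired. A minimal sketch of that idea, assuming the structure is available in dot-bracket notation (an assumption, as the paper's own data format is not given here), is shown below; `pairing_profile` is a hypothetical helper name.

```python
def pairing_profile(dot_bracket):
    """Map a dot-bracket string to a 0/1 profile: 1 = paired, 0 = unpaired."""
    paired_symbols = set("()[]{}<>")
    profile = []
    for pos, ch in enumerate(dot_bracket):
        if ch == ".":
            profile.append(0)          # unpaired base
        elif ch in paired_symbols:
            profile.append(1)          # base takes part in a pair
        else:
            raise ValueError(f"unexpected character '{ch}' at position {pos}")
    return profile


if __name__ == "__main__":
    # Hypothetical hairpin: a 3-bp stem closing a 4-nt loop.
    profile = pairing_profile("(((....)))")
    print(profile)
    print("fraction paired:", sum(profile) / len(profile))
```

A predictor of the kind described would output one such value (or a probability) per nucleotide from the sequence alone, and the binary profile serves as the training target.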

Journal ArticleDOI
TL;DR: In this paper, the authors overcome the sensitivity and resolution limitations of conventional ssNMR by using ¹H-detection at fast MAS rates, which can provide rapid access to RNA secondary structure in protein-RNA complexes of any size.
Abstract: Knowledge of RNA structure, either in isolation or in complex, is fundamental to understand the mechanism of cellular processes. Solid-state NMR (ssNMR) is applicable to high molecular-weight complexes and does not require crystallization; thus, it is well-suited to study RNA as part of large multicomponent assemblies. Recently, we solved the first structures of both RNA and an RNA-protein complex by ssNMR using conventional ¹³C- and ¹⁵N-detection. This approach is limited by the severe overlap of the RNA peaks together with the low sensitivity of multidimensional experiments. Here, we overcome the limitations in sensitivity and resolution by using ¹H-detection at fast MAS rates. We develop experiments that allow the identification of complete nucleobase spin-systems together with their site-specific base pair pattern using sub-milligram quantities of one uniformly labelled RNA sample. These experiments provide rapid access to RNA secondary structure by ssNMR in protein-RNA complexes of any size.