Journal ArticleDOI

LNCipedia: a database for annotated human lncRNA transcript sequences and structures.

01 Jan 2013-Nucleic Acids Research (Oxford University Press)-Vol. 41, pp 246-251
TL;DR: LNCipedia is a database of human long non-coding RNA (lncRNA) transcripts and genes that offers 21 488 annotated human lncRNA transcripts obtained from different sources.
Abstract: Here, we present LNCipedia (http://www.lncipedia.org), a novel database for human long non-coding RNA (lncRNA) transcripts and genes. LncRNAs constitute a large and diverse class of non-coding RNA genes. Although several lncRNAs have been functionally annotated, the majority remains to be characterized. Different high-throughput methods to identify new lncRNAs (including RNA sequencing and annotation of chromatin-state maps) have been applied in various studies, resulting in multiple unrelated lncRNA data sets. LNCipedia offers 21 488 annotated human lncRNA transcripts obtained from different sources. In addition to basic transcript information and gene structure, several statistics are determined for each entry in the database, such as secondary structure information, protein coding potential and microRNA binding sites. Our analyses suggest that, much like microRNAs, many lncRNAs have a significant secondary structure, in line with their presumed association with proteins or protein complexes. Available literature on specific lncRNAs is linked, and users or authors can submit articles through a web interface. Protein coding potential is assessed by two different prediction algorithms: Coding Potential Calculator and HMMER. In addition, a novel strategy has been integrated for detecting potentially coding lncRNAs by automatically re-analysing the large body of publicly available mass spectrometry data in the PRIDE database. LNCipedia is publicly available and allows users to query and download lncRNA sequences and structures based on different search criteria. The database may serve as a resource to initiate small- and large-scale lncRNA studies. As an example, the LNCipedia content was used to develop a custom microarray for expression profiling of all available lncRNAs.
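
The abstract names three lines of evidence for coding potential: Coding Potential Calculator, HMMER, and re-analysed PRIDE mass spectrometry data. As a minimal sketch of how a user might combine such evidence when filtering downloaded records, the Python below flags a transcript if any one source suggests coding potential. The record layout, the field names, and the "any single source" rule are assumptions made for illustration; they are not LNCipedia's actual schema or scoring.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TranscriptRecord:
    """Hypothetical in-memory record; all field names are illustrative."""
    transcript_id: str
    cpc_coding: bool        # assumed yes/no call derived from Coding Potential Calculator
    hmmer_domain_hits: int  # assumed number of protein-domain hits found with HMMER
    pride_peptides: int     # assumed number of peptides matched in re-analysed PRIDE data


def coding_evidence(record: TranscriptRecord) -> List[str]:
    """Return the names of the sources that suggest coding potential."""
    evidence = []
    if record.cpc_coding:
        evidence.append("CPC")
    if record.hmmer_domain_hits > 0:
        evidence.append("HMMER")
    if record.pride_peptides > 0:
        evidence.append("PRIDE")
    return evidence


def likely_noncoding(records: Iterable[TranscriptRecord]) -> List[TranscriptRecord]:
    """Keep only records with no coding evidence from any source (assumed rule)."""
    return [r for r in records if not coding_evidence(r)]
```

Returning the list of supporting sources, rather than a single yes/no flag, lets a user apply a stricter or looser rule later, for example requiring agreement between two sources.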
Citations
Journal ArticleDOI
16 Jan 2014-Nature
TL;DR: Understanding this novel RNA crosstalk will lead to significant insight into gene regulatory networks and have implications in human development and disease.
Abstract: Recent reports have described an intricate interplay among diverse RNA species, including protein-coding messenger RNAs and non-coding RNAs such as long non-coding RNAs, pseudogenes and circular RNAs. These RNA transcripts act as competing endogenous RNAs (ceRNAs) or natural microRNA sponges — they communicate with and co-regulate each other by competing for binding to shared microRNAs, a family of small non-coding RNAs that are important post-transcriptional regulators of gene expression. Understanding this novel RNA crosstalk will lead to significant insight into gene regulatory networks and have implications in human development and disease.

2,869 citations

Journal ArticleDOI
TL;DR: The lncRNA landscape characterized here may shed light on normal biology and cancer pathogenesis and may be valuable for future biomarker development.
Abstract: Long noncoding RNAs (lncRNAs) are emerging as important regulators of tissue physiology and disease processes including cancer. To delineate genome-wide lncRNA expression, we curated 7,256 RNA sequencing (RNA-seq) libraries from tumors, normal tissues and cell lines comprising over 43 Tb of sequence from 25 independent studies. We applied ab initio assembly methodology to this data set, yielding a consensus human transcriptome of 91,013 expressed genes. Over 68% (58,648) of genes were classified as lncRNAs, of which 79% were previously unannotated. About 1% (597) of the lncRNAs harbored ultraconserved elements, and 7% (3,900) overlapped disease-associated SNPs. To prioritize lineage-specific, disease-associated lncRNA expression, we employed non-parametric differential expression testing and nominated 7,942 lineage- or cancer-associated lncRNA genes. The lncRNA landscape characterized here may shed light on normal biology and cancer pathogenesis and may be valuable for future biomarker development.

2,209 citations

Journal ArticleDOI
TL;DR: This review guides the reader through important aspects of non-coding RNA biology, including their biogenesis, mode of actions, physiological function, as well as their role in the disease context (such as in cancer or the cardiovascular system).
Abstract: Advances in RNA-sequencing techniques have led to the discovery of thousands of non-coding transcripts with unknown function. There are several types of non-coding linear RNAs such as microRNAs (miRNA) and long non-coding RNAs (lncRNA), as well as circular RNAs (circRNA) consisting of a closed continuous loop. This review guides the reader through important aspects of non-coding RNA biology. This includes their biogenesis, mode of actions, physiological function, as well as their role in the disease context (such as in cancer or the cardiovascular system). We specifically focus on non-coding RNAs as potential therapeutic targets and diagnostic biomarkers.

1,238 citations

Journal ArticleDOI
TL;DR: The review here will emphasize their aberrant expression and function in cancer, and their roles in cancer diagnosis and therapy will also be discussed.

837 citations

Journal ArticleDOI
TL;DR: The characteristics of lncRNAs, including their roles, functions, and working mechanisms are summarized, methods for identifying and annotating lncRNAs are described, and future opportunities for lncRNA-based therapies using antisense oligonucleotides are discussed.

736 citations

References
Journal ArticleDOI
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

14,075 citations

Journal ArticleDOI
TL;DR: The rapidly advancing field of long ncRNAs is reviewed, describing their conservation, their organization in the genome and their roles in gene regulation, and the medical implications.
Abstract: In mammals and other eukaryotes most of the genome is transcribed in a developmentally regulated manner to produce large numbers of long non-coding RNAs (ncRNAs). Here we review the rapidly advancing field of long ncRNAs, describing their conservation, their organization in the genome and their roles in gene regulation. We also consider the medical implications, and the emerging recognition that any transcript, regardless of coding potential, can have an intrinsic function as an RNA.

4,911 citations

Journal ArticleDOI
15 Apr 2010-Nature
TL;DR: It is shown that lincRNAs in the HOX loci become systematically dysregulated during breast cancer progression, indicating that lincRNAs have active roles in modulating the cancer epigenome and may be important targets for cancer diagnosis and therapy.
Abstract: Large intervening non-coding RNAs (lincRNAs) are pervasively transcribed in the genome yet their potential involvement in human disease is not well understood. Recent studies of dosage compensation, imprinting, and homeotic gene expression suggest that individual lincRNAs can function as the interface between DNA and specific chromatin remodelling activities. Here we show that lincRNAs in the HOX loci become systematically dysregulated during breast cancer progression. The lincRNA termed HOTAIR is increased in expression in primary breast tumours and metastases, and HOTAIR expression level in primary tumours is a powerful predictor of eventual metastasis and death. Enforced expression of HOTAIR in epithelial cancer cells induced genome-wide re-targeting of Polycomb repressive complex 2 (PRC2) to an occupancy pattern more resembling embryonic fibroblasts, leading to altered histone H3 lysine 27 methylation, gene expression, and increased cancer invasiveness and metastasis in a manner dependent on PRC2. Conversely, loss of HOTAIR can inhibit cancer invasiveness, particularly in cells that possess excessive PRC2 activity. These findings indicate that lincRNAs have active roles in modulating the cancer epigenome and may be important targets for cancer diagnosis and therapy.

4,605 citations

Book
01 May 2015
TL;DR: This work describes an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm, which computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment.
Abstract: Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the "multiple segment Viterbi" (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call "sparse rescaling". These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.

4,492 citations

Journal ArticleDOI
12 Mar 2009-Nature
TL;DR: It is demonstrated that specific lincRNAs are transcriptionally regulated by key transcription factors in these processes such as p53, NFκB, Sox2, Oct4 (also known as Pou5f1) and Nanog, defining a unique collection of functional lincRNAs that are highly conserved and implicated in diverse biological processes.
Abstract: There is growing recognition that mammalian cells produce many thousands of large intergenic transcripts. However, the functional significance of these transcripts has been particularly controversial. Although there are some well-characterized examples, most (>95%) show little evidence of evolutionary conservation and have been suggested to represent transcriptional noise. Here we report a new approach to identifying large non-coding RNAs using chromatin-state maps to discover discrete transcriptional units intervening known protein-coding loci. Our approach identified ~1,600 large multi-exonic RNAs across four mouse cell types. In sharp contrast to previous collections, these large intervening non-coding RNAs (lincRNAs) show strong purifying selection in their genomic loci, exonic sequences and promoter regions, with greater than 95% showing clear evolutionary conservation. We also developed a functional genomics approach that assigns putative functions to each lincRNA, demonstrating a diverse range of roles for lincRNAs in processes from embryonic stem cell pluripotency to cell proliferation. We obtained independent functional validation for the predictions for over 100 lincRNAs, using cell-based assays. In particular, we demonstrate that specific lincRNAs are transcriptionally regulated by key transcription factors in these processes such as p53, NFκB, Sox2, Oct4 (also known as Pou5f1) and Nanog. Together, these results define a unique collection of functional lincRNAs that are highly conserved and implicated in diverse biological processes.

3,875 citations
