A census of human RNA-binding proteins.
TL;DR: This work presents a census of 1,542 manually curated RBPs that are analysed for their interactions with different classes of RNA, their evolutionary conservation, their abundance and their tissue-specific expression, a critical step towards the comprehensive characterization of proteins involved in human RNA metabolism.
Abstract: Post-transcriptional gene regulation (PTGR) concerns processes involved in the maturation, transport, stability and translation of coding and non-coding RNAs. RNA-binding proteins (RBPs) and ribonucleoproteins coordinate RNA processing and PTGR. The introduction of large-scale quantitative methods, such as next-generation sequencing and modern protein mass spectrometry, has renewed interest in the investigation of PTGR and the protein factors involved at a systems-biology level. Here, we present a census of 1,542 manually curated RBPs that we have analysed for their interactions with different classes of RNA, their evolutionary conservation, their abundance and their tissue-specific expression. Our analysis is a critical step towards the comprehensive characterization of proteins involved in human RNA metabolism.
Citations
••
TL;DR: An enhanced CLIP (eCLIP) protocol is developed that decreases requisite amplification by ∼1,000-fold, decreasing discarded PCR duplicate reads by ∼60% while maintaining single-nucleotide binding resolution, and improves specificity in the discovery of authentic binding sites.
Abstract: As RNA-binding proteins (RBPs) play essential roles in cellular physiology by interacting with target RNA molecules, binding site identification by UV crosslinking and immunoprecipitation (CLIP) of ribonucleoprotein complexes is critical to understanding RBP function. However, current CLIP protocols are technically demanding and yield low-complexity libraries with high experimental failure rates. We have developed an enhanced CLIP (eCLIP) protocol that decreases requisite amplification by ~1,000-fold, decreasing discarded PCR duplicate reads by ~60% while maintaining single-nucleotide binding resolution. By simplifying the generation of paired IgG and size-matched input controls, eCLIP improves specificity in the discovery of authentic binding sites. We generated 102 eCLIP experiments for 73 diverse RBPs in HepG2 and K562 cells (available at https://www.encodeproject.org), demonstrating that eCLIP enables large-scale and robust profiling, with amplification and sample requirements similar to those of ChIP-seq. eCLIP enables integrative analysis of diverse RBPs to reveal factor-specific profiles, common artifacts for CLIP and RNA-centric perspectives on RBP activity.
1,027 citations
••
TL;DR: The RNA targets and molecular and cellular functions of the new RBPs, as well as the possibility that some RBPs may be regulated by RNA rather than regulate RNA, are discussed.
Abstract: RNA-binding proteins (RBPs) are typically thought of as proteins that bind RNA through one or multiple globular RNA-binding domains (RBDs) and change the fate or function of the bound RNAs. Several hundred such RBPs have been discovered and investigated over the years. Recent proteome-wide studies have more than doubled the number of proteins implicated in RNA binding and uncovered hundreds of additional RBPs lacking conventional RBDs. In this Review, we discuss these new RBPs and the emerging understanding of their unexpected modes of RNA binding, which can be mediated by intrinsically disordered regions, protein-protein interaction interfaces and enzymatic cores, among others. We also discuss the RNA targets and molecular and cellular functions of the new RBPs, as well as the possibility that some RBPs may be regulated by RNA rather than regulate RNA.
1,013 citations
••
University of Massachusetts Medical School; Broad Institute; Stanford University; Cold Spring Harbor Laboratory; University of Washington; University of California, San Diego; Massachusetts Institute of Technology; Ludwig Institute for Cancer Research; University of California, San Francisco; Salk Institute for Biological Studies; California Institute of Technology; University of California, Irvine; Pennsylvania State University; Lawrence Berkeley National Laboratory; University of Connecticut Health Center; Université de Montréal; McGill University; University of Minnesota; Florida State University; Yale University; University of Alabama in Huntsville; University of Chicago; University of California, Merced; University of Colorado Boulder; Icahn School of Medicine at Mount Sinai; Pompeu Fabra University; University of Southern California; University of California, Berkeley; Harvard University; Boston University; Tongji University
TL;DR: The authors summarize the data produced by phase III of the Encyclopedia of DNA Elements (ENCODE) project, a resource for better understanding of the human and mouse genomes, which has produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development.
Abstract: The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE and Roadmap Epigenomics data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.
999 citations
••
TL;DR: Together, these data define METTL3 as a regulator of a chromatin-based pathway that is necessary for maintenance of the leukaemic state and identify this enzyme as a potential therapeutic target for acute myeloid leukaemia.
Abstract: N6-methyladenosine (m6A) is an abundant internal RNA modification in both coding and non-coding RNAs that is catalysed by the METTL3-METTL14 methyltransferase complex. However, the specific role of these enzymes in cancer is still largely unknown. Here we define a pathway that is specific for METTL3 and is implicated in the maintenance of a leukaemic state. We identify METTL3 as an essential gene for growth of acute myeloid leukaemia cells in two distinct genetic screens. Downregulation of METTL3 results in cell cycle arrest, differentiation of leukaemic cells and failure to establish leukaemia in immunodeficient mice. We show that METTL3, independently of METTL14, associates with chromatin and localizes to the transcriptional start sites of active genes. The vast majority of these genes have the CAATT-box binding protein CEBPZ present at the transcriptional start site, and this is required for recruitment of METTL3 to chromatin. Promoter-bound METTL3 induces m6A modification within the coding region of the associated mRNA transcript, and enhances its translation by relieving ribosome stalling. We show that genes regulated by METTL3 in this way are necessary for acute myeloid leukaemia. Together, these data define METTL3 as a regulator of a chromatin-based pathway that is necessary for maintenance of the leukaemic state and identify this enzyme as a potential therapeutic target for acute myeloid leukaemia.
705 citations
01 Jan 2009
TL;DR: This review outlines the current understanding of miRNA target recognition in animals and discusses the widespread impact of miRNAs on both the expression and evolution of protein-coding genes.
Abstract: MicroRNAs (miRNAs) are endogenous ∼23 nt RNAs that play important gene-regulatory roles in animals and plants by pairing to the mRNAs of protein-coding genes to direct their posttranscriptional repression. This review outlines the current understanding of miRNA target recognition in animals and discusses the widespread impact of miRNAs on both the expression and evolution of protein-coding genes.
646 citations
References
••
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
88,255 citations
"A census of human RNA-binding prote..." refers background in this paper
...database, thereby adding sequence-related insect rRNA sequences from the NCBI nucleotide database (Altschul et al., 1990)....
••
TL;DR: The Burrows-Wheeler Alignment tool (BWA) is a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT) that efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals.
Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package.
Availability: http://maq.sourceforge.net
43,862 citations
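The BWA abstract above names backward search over the Burrows–Wheeler Transform as its core idea. The following is a minimal Python sketch of that idea for exact matching only; it is not BWA's implementation, which additionally tolerates mismatches and gaps and uses compact index structures. The function names `bwt_index` and `count_exact`, the naive suffix sorting, and the toy reference string are all assumptions made for illustration.

```python
from collections import Counter


def bwt_index(text):
    """Build a toy BWT index: the C table, the occurrence table, and the BWT length."""
    text += "$"  # sentinel; assumed absent from text and smaller than every symbol
    # Naive suffix array: acceptable for a short example, far too slow for a genome.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT is the character preceding each sorted suffix (wrapping to the sentinel).
    bwt = "".join(text[i - 1] for i in suffix_array)
    # c_table[c] = number of characters in text that sort strictly before c.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for sym in occ:
            occ[sym][i + 1] = occ[sym][i] + (1 if sym == ch else 0)
    return c_table, occ, len(bwt)


def count_exact(pattern, index):
    """Count exact occurrences of pattern by backward search."""
    c_table, occ, n = index
    lo, hi = 0, n  # half-open interval of suffix-array rows matching the suffix so far
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = bwt_index("GATTACAGATTACA")  # toy reference, not real sequence data
print(count_exact("ATTA", index))
```

Each pattern character narrows the suffix-array interval with two table lookups, so the search cost in this sketch depends on the pattern length rather than on the reference length once the index is built.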
••
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.
35,225 citations
••
TL;DR: By following this protocol, investigators are able to gain an in-depth understanding of the biological themes in lists of genes that are enriched in genome-scale studies.
Abstract: DAVID bioinformatics resources consists of an integrated biological knowledgebase and analytic tools aimed at systematically extracting biological meaning from large gene/protein lists. This protocol explains how to use DAVID, a high-throughput and integrated data-mining environment, to analyze gene lists derived from high-throughput genomic experiments. The procedure first requires uploading a gene list containing any number of common gene identifiers followed by analysis using one or more text and pathway-mining tools such as gene functional classification, functional annotation chart or clustering and functional annotation table. By following this protocol, investigators are able to gain an in-depth understanding of the biological themes in lists of genes that are enriched in genome-scale studies.
31,015 citations
"A census of human RNA-binding prote..." refers methods in this paper
...Gene Ontology [using the DAVID functional annotation database (Ashburner et al., 2000; Huang et al., 2008)] and GOrilla (Eden et al....
[...]
••
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches; multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.
20,335 citations
"A census of human RNA-binding prote..." refers background in this paper
...9 (Langmead et al., 2009) (Bowtie parameters “-v 1 -m 10 --all --best --strata”), allowing for one mismatch in read alignments and up...
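For readers who want to see how the quoted Bowtie options fit into an invocation, here is a hedged Python sketch that only assembles the argument list and does not run the aligner. The flags are the ones quoted in the excerpt; the index prefix, read file, and output file names are hypothetical placeholders, and the flag descriptions in the comments are paraphrased from the Bowtie 1 manual rather than from the census paper.

```python
import shlex


def bowtie_command(index_prefix, reads_path, output_path):
    """Assemble a Bowtie 1 argument list using the flags quoted in the excerpt."""
    return [
        "bowtie",
        "-v", "1",    # alignments may have at most one mismatch
        "-m", "10",   # drop reads with more than 10 reportable alignments
        "--all",      # report all valid alignments per read
        "--best",     # order reported alignments from best to worst
        "--strata",   # with --best, keep only alignments in the best stratum
        index_prefix,  # hypothetical index prefix
        reads_path,    # hypothetical read file
        output_path,   # hypothetical output file
    ]


cmd = bowtie_command("hypothetical_index", "reads.fastq", "hits.map")
print(shlex.join(cmd))
```

Building the command as a list keeps each flag and value a separate argument, so it could be passed to `subprocess.run` without shell quoting concerns if Bowtie is installed.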