GENCODE: The reference human genome annotation for The ENCODE Project
Jennifer Harrow, Adam Frankish, José M. González, Electra Tapanari, Mark Diekhans, Felix Kokocinski, Bronwen Aken, Daniel Barrell, Amonida Zadissa, Stephen M. J. Searle, If H. A. Barnes, Alexandra Bignell, Veronika Boychenko, Toby Hunt, M. Kay, Gaurab Mukherjee, Jeena Rajan, Gloria Despacio-Reyes, Gary Saunders, Charles A. Steward, Rachel A. Harte, Michael F. Lin, Cédric Howald, Andrea Tanzer, Thomas Derrien, Jacqueline Chrast, Nathalie Walters, Suganthi Balasubramanian, Baikang Pei, Michael L. Tress, Jose Manuel Rodriguez, Iakes Ezkurdia, Jeltje Van Baren, Michael R. Brent, David Haussler, Manolis Kellis, Alfonso Valencia, Alexandre Reymond, Mark Gerstein, Roderic Guigó, Tim Hubbard
TLDR
This work has examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters, 62% of protein-coding genes have annotated polyA sites, and over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas.
Abstract:
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
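The per-biotype locus counts and the "predominant two-exon form" described in the abstract are the kind of summary that can be tallied from a GTF annotation file. The following is a minimal sketch under stated assumptions, not the consortium's pipeline: the sample records are invented for illustration, the function names `parse_attributes` and `summarize_gtf` are hypothetical, and the `gene_type` and `transcript_id` attribute keys follow the GENCODE GTF convention (biotype labels such as `lincRNA` vary between releases).

```python
import re
from collections import Counter, defaultdict

# Matches quoted key/value pairs in the GTF attribute column, e.g. gene_id "G1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Return the quoted key/value pairs of a GTF attribute column as a dict."""
    return dict(ATTR_RE.findall(field))


def summarize_gtf(lines):
    """Count genes per gene_type and build a histogram of exons per transcript."""
    gene_types = Counter()
    exons_per_transcript = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GTF record
        feature = cols[2]
        attrs = parse_attributes(cols[8])
        if feature == "gene":
            gene_types[attrs.get("gene_type", "unknown")] += 1
        elif feature == "exon":
            exons_per_transcript[attrs.get("transcript_id")] += 1
    # Histogram: exon count -> number of transcripts with that many exons
    exon_hist = Counter(exons_per_transcript.values())
    return gene_types, exon_hist


def row(feature, start, end, attrs):
    """Build one tab-separated GTF line for the hypothetical sample below."""
    return "\t".join(["chr1", "EXAMPLE", feature, str(start), str(end), ".", "+", ".", attrs])


# Hypothetical records: one protein-coding gene (3 exons) and two lincRNA genes (2 exons each).
SAMPLE = "\n".join([
    row("gene", 100, 900, 'gene_id "G1"; gene_type "protein_coding";'),
    row("exon", 100, 200, 'gene_id "G1"; transcript_id "T1"; gene_type "protein_coding";'),
    row("exon", 400, 500, 'gene_id "G1"; transcript_id "T1"; gene_type "protein_coding";'),
    row("exon", 800, 900, 'gene_id "G1"; transcript_id "T1"; gene_type "protein_coding";'),
    row("gene", 2000, 2600, 'gene_id "G2"; gene_type "lincRNA";'),
    row("exon", 2000, 2100, 'gene_id "G2"; transcript_id "T2"; gene_type "lincRNA";'),
    row("exon", 2500, 2600, 'gene_id "G2"; transcript_id "T2"; gene_type "lincRNA";'),
    row("gene", 5000, 5700, 'gene_id "G3"; gene_type "lincRNA";'),
    row("exon", 5000, 5100, 'gene_id "G3"; transcript_id "T3"; gene_type "lincRNA";'),
    row("exon", 5600, 5700, 'gene_id "G3"; transcript_id "T3"; gene_type "lincRNA";'),
])

gene_types, exon_hist = summarize_gtf(SAMPLE.splitlines())
print("genes per type:", dict(gene_types))
print("most common exon count:", exon_hist.most_common(1)[0][0])
```

To run this against a real release, the same function can be fed the lines of a GTF file downloaded from gencodegenes.org; the counts will depend on the release and on which biotypes are grouped as long noncoding.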
Citations
Journal Article
Human nonsense-mediated RNA decay initiates widely by endonucleolysis and targets snoRNA host genes
Søren Lykke-Andersen, Yun Chen, Britt R. Ardal, Berit Lilje, Johannes Waage, Albin Sandelin, Torben Heick Jensen
TL;DR: It is hypothesized that snoRNA host genes need to be highly transcribed to accommodate high levels of snoRNA production and that the expression of individual snoRNAs and their cognate spliced RNA can be uncoupled via alternative splicing and NMD.
Journal Article
Arid1a Has Context-Dependent Oncogenic and Tumor Suppressor Functions in Liver Cancer
Xuxu Sun, Sam C. Wang, Yonglong Wei, Xin Luo, Yuemeng Jia, Lin Li, Purva Gopal, Min Zhu, Ibrahim Nassour, Jen Chieh Chuang, Thomas Maples, Cemre Celen, Liem H. Nguyen, Linwei Wu, Shunjun Fu, Weiping Li, Lijian Hui, Feng Tian, Yuan Ji, Shuyuan Zhang, Mahsa Sorouri, Tae Hyun Hwang, Lynda Letzig, Laura P. James, Zixi Wang, Adam C. Yopp, Amit G. Singal, Hao Zhu
TL;DR: Mechanistically, loss of Arid1a within tumors decreased chromatin accessibility and reduced transcription of genes associated with migration, invasion, and metastasis, and ARID1A has context-dependent tumor-suppressive and oncogenic roles in cancer.
Journal Article
Identification and validation of potential prognostic lncRNA biomarkers for predicting survival in patients with multiple myeloma
TL;DR: Four lncRNAs were identified to be significantly associated with overall survival of patients with MM in the training dataset, and were combined to develop a four-lncRNA prognostic signature to stratify patients into high-risk and low-risk groups, demonstrating potential application of lncRNAs as novel independent biomarkers for diagnosis and prognosis in MM.
Journal Article
Meta-Analysis of Genome-Wide Association Studies for Abdominal Aortic Aneurysm Identifies Four New Disease-Specific Risk Loci
Gregory T. Jones, Gerard Tromp, Helena Kuivaniemi, Solveig Gretarsdottir, Annette F. Baas, Betti Giusti, Ewa Strauss, Femke N G van 't Hof, Tom R. Webb, Robert Erdman, Marylyn D. Ritchie, James R. Elmore, Anurag Verma, Sarah A. Pendergrass, Iftikhar J. Kullo, Zi Ye, Peggy L. Peissig, Omri Gottesman, Shefali S. Verma, Jennifer Malinowski, Laura J. Rasmussen-Torvik, Kenneth M. Borthwick, Diane T. Smelser, David R. Crosslin, Mariza de Andrade, Evan J. Ryer, Catherine A. McCarty, E.P. Bottinger, Jennifer A. Pacheco, Dana C. Crawford, David Carrell, Glenn S. Gerhard, David P. Franklin, David J. Carey, Victoria L Phillips, Michael J.A. Williams, Wenhua Wei, Ross D. Blair, Andrew Hill, Thodor M. Vasudevan, David R. Lewis, Ian Thomson, J Krysa, Geraldine B. Hill, Justin A. Roake, Tony R. Merriman, Grzegorz Oszkinis, Silvia Galora, Claudia Saracini, Rosanna Abbate, Raffaele Pulli, Carlo Pratesi, Athanasios Saratzis, Ana Raquel Verissimo, Suzannah Bumpstead, Stephen A. Badger, Rachel E. Clough, Gillian Cockerill, Hany Hafez, D. Julian A. Scott, T. Simon Futers, Simon P. R. Romaine, Katherine I Bridge, Kathryn J. Griffin, Marc A. Bailey, Alberto Smith, Matthew M. Thompson, Frank M. van Bockxmeer, Stefan E Matthiasson, Gudmar Thorleifsson, Unnur Thorsteinsdottir, Jan D. Blankensteijn, Joep A.W. Teijink, Cisca Wijmenga, Jacqueline de Graaf, Lambertus A. Kiemeney, Jes S. Lindholt, Anne Hughes, Declan Bradley, Kathleen Stirrups, Jonathan Golledge, Paul Norman, Janet T. Powell, Steve E. Humphries, Stephen E. Hamby, Alison H. Goodall, Christopher P. Nelson, Natzi Sakalihasan, Audrey Courtois, Robert E. Ferrell, Per Eriksson, Lasse Folkersen, Anders Franco-Cereceda, John D. Eicher, Andrew D. Johnson, Christer Betsholtz, Arno Ruusalepp, Oscar Franzén, Eric E. Schadt, Johan Björkegren, Leonard Lipovich, Anne M. Drolet, Eric L. G. Verhoeven, Clark J. Zeebregts, Robert H. Geelkerken, Marc R.H.M. van Sambeek, Steven M.M. van Sterkenburg, Jean-Paul P.M. de Vries, K. Stefansson, John R. Thompson, Paul I.W. de Bakker, Panos Deloukas, Robert D. Sayers, Seamus C. Harrison, Andre M. van Rij, Nilesh J. Samani, Matthew J. Bown
TL;DR: The 4 new risk loci for AAA seem to be specific for AAA compared with other cardiovascular diseases and related traits, suggesting that traditional cardiovascular risk factor management may only have limited value in preventing the progression of aneurysmal disease.
Journal Article
Comparative transcriptomics in human and mouse
TL;DR: Cross-species comparisons of genomes, transcriptomes and gene regulation are now feasible at unprecedented resolution and throughput, enabling the comparison of human and mouse biology at the molecular level.
References
Journal Article
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, David J. Lipman
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Journal Article
The Protein Data Bank
Helen M. Berman, John D. Westbrook, Zukang Feng, Gary L. Gilliland, Talapady N. Bhat, Helge Weissig, Ilya N. Shindyalov, Philip E. Bourne
TL;DR: The goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and plans for the future development of the resource are described.
Journal Article
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches; multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
Journal Article
The Pfam protein families database
Marco Punta, Penny Coggill, Ruth Y. Eberhardt, Jaina Mistry, John Tate, Chris Boursnell, Ningze Pang, Kristoffer Forslund, Goran Ceric, Jody Clements, Andreas Heger, Liisa Holm, Erik L. L. Sonnhammer, Sean R. Eddy, Alex Bateman, Robert D. Finn
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Journal Article
Pfam: the protein families database.
Robert D. Finn, Alex Bateman, Jody Clements, Penelope Coggill, Ruth Y. Eberhardt, Sean R. Eddy, Andreas Heger, Kirstie Hetherington, Liisa Holm, Jaina Mistry, Erik L. L. Sonnhammer, John Tate, Marco Punta
TL;DR: Pfam is a widely used database of protein families, containing 14,831 manually curated entries in the current version, version 27.0, and has been updated several times since 2012.
Related Papers (5)
BEDTools: a flexible suite of utilities for comparing genomic features
Aaron R. Quinlan, Ira M. Hall