Author

Daniel Barrell

Bio: Daniel Barrell is an academic researcher at the European Bioinformatics Institute. The author has contributed to research on the topics Annotation and UniProt. The author has an h-index of 25 and has co-authored 29 publications receiving 18,261 citations. Previous affiliations of Daniel Barrell include the Wellcome Trust Sanger Institute and the Wellcome Trust.

Papers
Journal ArticleDOI
TL;DR: This work examined the completeness of the GENCODE transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters, 62% of protein-coding genes have annotated polyA sites, and over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas.
Abstract: The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.

4,281 citations

Journal ArticleDOI
Midori A. Harris, Jennifer I. Clark, Ireland A, Jane Lomax, Michael Ashburner, R. Foulger, Karen Eilbeck, Suzanna E. Lewis, B. Marshall, Christopher J. Mungall, J. Richter, Gerald M. Rubin, Judith A. Blake, Carol J. Bult, Dolan M, Drabkin H, Janan T. Eppig, Hill Dp, L. Ni, Ringwald M, Rama Balakrishnan, J. M. Cherry, Karen R. Christie, Maria C. Costanzo, Selina S. Dwight, Stacia R. Engel, Dianna G. Fisk, Jodi E. Hirschman, Eurie L. Hong, Robert S. Nash, Anand Sethuraman, Chandra L. Theesfeld, David Botstein, Kara Dolinski, Becket Feierbach, Tanya Z. Berardini, S. Mundodi, Seung Y. Rhee, Rolf Apweiler, Daniel Barrell, Camon E, E. Dimmer, Lee, Rex L. Chisholm, Pascale Gaudet, Warren A. Kibbe, Ranjana Kishore, Erich M. Schwarz, Paul W. Sternberg, M. Gwinn, Hannick L, Wortman J, Matthew Berriman, Wood, de la Cruz N, Peter J. Tonellato, Pankaj Jaiswal, Seigfried T, White R
TL;DR: The Gene Ontology (GO) project provides structured, controlled vocabularies and classifications that cover several domains of molecular and cellular biology and are freely available for community use in the annotation of genes, gene products and sequences.
Abstract: The Gene Ontology (GO) project (http://www.geneontology.org/) provides structured, controlled vocabularies and classifications that cover several domains of molecular and cellular biology and are freely available for community use in the annotation of genes, gene products and sequences. Many model organism databases and genome annotation groups use the GO and contribute their annotation sets to the GO resource. The GO database integrates the vocabularies and contributed annotations and provides full access to this information in several formats. Members of the GO Consortium continually work collectively, involving outside experts as needed, to expand and update the GO vocabularies. The GO Web resource also provides access to extensive documentation about the GO project and links to applications that use GO data for functional analyses.

3,565 citations

01 Sep 2012
TL;DR: The Encyclopedia of DNA Elements (ENCODE) project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.

2,767 citations

Rolf Apweiler, Maria Jesus Martin, Claire O'Donovan, Michele Magrane, Yasmin Alam-Faruque, Ricardo Antunes, Daniel Barrell, Benoit Bely, M Bingley, David Binns, Lynette Bower, Paul Browne, WM Chan, E. Dimmer, Ruth Y. Eberhardt, A. Fedotov, Rebecca E. Foulger, John S. Garavelli, Rachael P. Huntley, Julius O.B. Jacobsen, M. Kleen, Kati Laiho, Rasko Leinonen, Duncan Legge, Quan Lin, W Liu, Jie Luo, Sandra Orchard, Samuel Patient, Diego Poggioli, Manuela Pruess, Matthew Corbett, G di Martino, M Donnelly, P van Rensburg, Amos Marc Bairoch, Lydie Bougueleret, Ioannis Xenarios, S Altairac, Andrea H. Auchincloss, Ghislaine Argoud-Puy, Kristian B. Axelsen, Delphine Baratin, M. C. Blatter, Brigitte Boeckmann, Jerven Bolleman, L. Bollondi, Emmanuel Boutet, SB Quintaje, Lionel Breuza, Alan Bridge, E. Decastro, L Ciapina, D Coral, Elisabeth Coudert, Isabelle Cusin, G Delbard, M Doche, Dolnide Dornevil, Paula Duek Roggli, Séverine Duvaud, Anne Estreicher, L Famiglietti, M Feuermann, Sebastien Gehant, N. Farriol-Mathis, Serenella Ferro, Elisabeth Gasteiger, Alain Gateau, Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Nicolas Hulo, J. James, S. Jimenez, Florence Jungo, T. Kappler, Guillaume Keller, Corinne Lachaize, L Lane-Guermonprez, Petra S. Langendijk-Genevaux, Lara, P Lemercier, Damien Lieberherr, Tdo Lima, Mangold, Xavier D. Martin, Patrick Masson, M. Moinat, Anne Morgat, Anaïs Mottaz, Salvo Paesano, Ivo Pedruzzi, Sandrine Pilbout, Pillet, Sylvain Poux, Monica Pozzato, Nicole Redaschi, Catherine Rivoire, Bernd Roechert, Maria Victoria Schneider, Christian J. A. Sigrist, K Sonesson, S Staehli, Eleanor J Stanley, Andre Stutz, Shyamala Sundaram, Michael Tognolli, Laure Verbregue, A-L Veuthey, L Yip, L Zuletta, Cathy H. Wu, Cecilia N. Arighi, Leslie Arminski, Winona C. Barker, Chuming Chen, Yingfei Chen, Z-Z Hu, Hongzhan Huang, Raja Mazumder, Peter B. McGarvey, Darren A. Natale, Jules Nchoutmboube, Natalia V. Petrova, N Subramanian, Baris E. Suzek, U. Ugochukwu, Sona Vasudevan, C. R. Vinayaka, LS Yeh, Jian Zhang
01 Jan 2010

961 citations

Journal ArticleDOI
TL;DR: The Gene Ontology Annotation (GOA) database aims to provide high-quality electronic and manual annotations to the UniProt Knowledgebase (Swiss-Prot, TrEMBL and PIR-PSD) using the standardized vocabulary of the Gene Ontology (GO).
Abstract: The Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA) aims to provide high-quality electronic and manual annotations to the UniProt Knowledgebase (Swiss-Prot, TrEMBL and PIR-PSD) using the standardized vocabulary of the Gene Ontology (GO). As a supplementary archive of GO annotation, GOA promotes a high level of integration of the knowledge represented in UniProt with other databases. This is achieved by converting UniProt annotation into a recognized computational format. GOA provides annotated entries for nearly 60,000 species (GOA-SPTr) and is the largest and most comprehensive open-source contributor of annotations to the GO Consortium annotation effort. By integrating GO annotations from other model organism groups, GOA consolidates specialized knowledge and expertise to ensure the data remain a key reference for up-to-date biological information. Furthermore, the GOA database fully endorses the Human Proteomics Initiative by prioritizing the annotation of proteins likely to benefit human health and disease. In addition to a non-redundant set of annotations to the human proteome (GOA-Human) and monthly releases of its GO annotation for all species (GOA-SPTr), a series of GO mapping files and specific cross-references in other databases are also regularly distributed. GOA can be queried through a simple user-friendly web interface or downloaded in a parsable format via the EBI and GO FTP websites. The GOA data set can be used to enhance the annotation of particular model organism or gene expression data sets, although increasingly it has been used to evaluate GO predictions generated from text mining or protein interaction experiments. In 2004, the GOA team will build on its success and will continue to supplement the functional annotation of UniProt and work towards enhancing the ability of scientists to access all available biological information. 
Researchers wishing to query or contribute to the GOA project are encouraged to email: goa@ebi.ac.uk.

917 citations


Cited by
Journal ArticleDOI
TL;DR: The Spliced Transcripts Alignment to a Reference (STAR) software, based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure, outperforms other aligners by a factor of >50 in mapping speed.
Abstract: Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. Availability and implementation: STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

30,684 citations

Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements (ENCODE) project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal ArticleDOI
23 Jan 2015-Science
TL;DR: This paper presents a map of the human tissue proteome based on an integrated omics approach that combines quantitative transcriptomics at the tissue and organ level with tissue microarray-based immunohistochemistry to achieve spatial localization of proteins down to the single-cell level.
Abstract: Resolving the molecular details of proteome variation in the different tissues and organs of the human body will greatly increase our knowledge of human biology and disease. Here, we present a map of the human tissue proteome based on an integrated omics approach that involves quantitative transcriptomics at the tissue and organ level, combined with tissue microarray-based immunohistochemistry, to achieve spatial localization of proteins down to the single-cell level. Our tissue-based analysis detected more than 90% of the putative protein-coding genes. We used this approach to explore the human secretome, the membrane proteome, the druggable proteome, the cancer proteome, and the metabolic functions in 32 different tissues and organs. All the data are integrated in an interactive Web-based database that allows exploration of individual proteins, as well as navigation of global expression patterns, in all major tissues and organs in the human body.

9,745 citations

Journal Article
01 Jan 2012-Nature
TL;DR: The Encyclopedia of DNA Elements (ENCODE) project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

8,106 citations

Journal ArticleDOI
TL;DR: The Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt) consortium, whose mission is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase with extensive cross-references and query interfaces.
Abstract: To provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information, the Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt) consortium. Our mission is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces. The central database will have two sections, corresponding to the familiar Swiss-Prot (fully manually curated entries) and TrEMBL (enriched with automated classification, annotation and extensive cross-references). For convenient sequence searches, UniProt also provides several non-redundant sequence databases. The UniProt NREF (UniRef) databases provide representative subsets of the knowledgebase suitable for efficient searching. The comprehensive UniProt Archive (UniParc) is updated daily from many public source databases. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). The scientific community is encouraged to submit data for inclusion in UniProt.

7,298 citations