Topic
Annotation
About: Annotation is a research topic. Over the lifetime, 6719 publications have been published within this topic receiving 203463 citations. The topic is also known as: note & markup.
Papers published on a yearly basis
Papers
More filters
••
University of North Carolina at Chapel Hill1, Friedrich Miescher Institute for Biomedical Research2, Swiss Institute of Bioinformatics3, University of Melbourne4, Walter and Eliza Hall Institute of Medical Research5, University of California, Berkeley6, Roswell Park Cancer Institute7, University of Maryland, College Park8
TL;DR: This work provides a solution in the form of an R/Bioconductor package tximeta that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files.
Abstract: Correct annotation metadata is critical for reproducible and accurate RNA-seq analysis. When files are shared publicly or among collaborators with incorrect or missing annotation metadata, it becomes difficult or impossible to reproduce bioinformatic analyses from raw data. It also makes it more difficult to locate the transcriptomic features, such as transcripts or genes, in their proper genomic context, which is necessary for overlapping expression data with other datasets. We provide a solution in the form of an R/Bioconductor package tximeta that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files. The correct reference transcriptome is identified via a hashed checksum stored in the quantification output, and key transcript databases are downloaded and cached locally. The computational paradigm of automatically adding annotation metadata based on reference sequence checksums can greatly facilitate genomic workflows, by helping to reduce overhead during bioinformatic analyses, preventing costly bioinformatic mistakes, and promoting computational reproducibility. The tximeta package is available at https://bioconductor.org/packages/tximeta.
81 citations
••
TL;DR: A second-generation gene annotation of human chromosome 22 is reported, using expressed sequence databases, comparative sequence analysis, and experimental verification to suggest that the revised annotation criteria provide a paradigm for future annotation of the human genome.
Abstract: We report a second-generation gene annotation of human chromosome 22. Using expressed sequence databases, comparative sequence analysis, and experimental verification, we have extended genes, fused previously fragmented structures, and identified new genes. The total length in exons of annotation was increased by 74% over our previously published annotation and includes 546 protein-coding genes and 234 pseudogenes. Thirty-two potential protein-coding annotations are partial copies of other genes, and may represent duplications on an evolutionary path to change or loss of function. We also identified 31 non-protein-coding transcripts, including 16 possible antisense RNAs. By extrapolation, we estimate the human genome contains 29,000-36,000 protein-coding genes, 21,300 pseudogenes, and 1500 antisense RNAs. We suggest that our revised annotation criteria provide a paradigm for future annotation of the human genome.
81 citations
•
07 May 2002TL;DR: This contribution describes the open source project Edutella which builds upon metadata standards defined for the WWW and aims to provide an RDF-based metadata infrastructure for P2P applications, building on the recently announced JXTA Framework.
Abstract: P2P applications for searching and exchanging information over the Web have become increasingly popular. This has lead to a number of (usually thematically) focused communities, which allow efficient searching within such communities, and which use specific metadata sets to specify the resources stored within the P2P network. By concentrating on domain and application specific formats for metadata and query languages, however, current P2P networks appear to be fragmenting into non-interoperable niche markets. This contribution describes the open source project Edutella which builds upon metadata standards defined for the WWW and aims to provide an RDF-based metadata infrastructure for P2P applications, building on the recently announced JXTA Framework. We describe one basic service (query) and an Edutella application (annotation) within this network, both being built on a common query language exchange format, and specify the main architecture and APIs of the Edutella P2P network.
81 citations
•
31 Jul 2003
TL;DR: In this article, an interactive machine learning-based system that incrementally learns how to annotate new text data is presented, where the user is selectively presented for review and appropriate action, with a convenient and efficient interface so that context of use can be verified if necessary in order to evaluate the annotations and correct them.
Abstract: An interactive machine learning based system that incrementally learns, on the basis of text data, how to annotate new text data. The system and method starts with partially annotated training data or alternatively unannotated training data and a set of examples of what is to be learned. Through iterative interactive training sessions with a user the system trains annotators, and these are in turn used to discover more annotations in the text data. Once all of the text data or a sufficient amount of the text data is annotated, at the user's discretion, the system learns a final annotator or annotators, which are exported and available to annotate new textual data. As the iterative training process occurs the user is selectively presented for review and appropriate action, system-determined representations of the annotation instances and provided a convenient and efficient interface so that context of use can be verified if necessary in order to evaluate the annotations and correct them, where required. At the user's discretion, annotations that receive a high confidence level can be automatically accepted and those with low confidence levels can be automatically rejected.
81 citations