scispace - formally typeset
Search or ask a question
Topic

Annotation

About: Annotation is a research topic. Over the lifetime, 6719 publications have been published within this topic receiving 203463 citations. The topic is also known as: note & markup.


Papers
More filters
Posted ContentDOI
25 Sep 2019-bioRxiv
TL;DR: This work provides a solution in the form of an R/Bioconductor package tximeta that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files.
Abstract: Correct annotation metadata is critical for reproducible and accurate RNA-seq analysis. When files are shared publicly or among collaborators with incorrect or missing annotation metadata, it becomes difficult or impossible to reproduce bioinformatic analyses from raw data. It also makes it more difficult to locate the transcriptomic features, such as transcripts or genes, in their proper genomic context, which is necessary for overlapping expression data with other datasets. We provide a solution in the form of an R/Bioconductor package tximeta that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files. The correct reference transcriptome is identified via a hashed checksum stored in the quantification output, and key transcript databases are downloaded and cached locally. The computational paradigm of automatically adding annotation metadata based on reference sequence checksums can greatly facilitate genomic workflows, by helping to reduce overhead during bioinformatic analyses, preventing costly bioinformatic mistakes, and promoting computational reproducibility. The tximeta package is available at https://bioconductor.org/packages/tximeta.

81 citations

Journal ArticleDOI
TL;DR: A second-generation gene annotation of human chromosome 22 is reported, using expressed sequence databases, comparative sequence analysis, and experimental verification to suggest that the revised annotation criteria provide a paradigm for future annotation of the human genome.
Abstract: We report a second-generation gene annotation of human chromosome 22. Using expressed sequence databases, comparative sequence analysis, and experimental verification, we have extended genes, fused previously fragmented structures, and identified new genes. The total length in exons of annotation was increased by 74% over our previously published annotation and includes 546 protein-coding genes and 234 pseudogenes. Thirty-two potential protein-coding annotations are partial copies of other genes, and may represent duplications on an evolutionary path to change or loss of function. We also identified 31 non-protein-coding transcripts, including 16 possible antisense RNAs. By extrapolation, we estimate the human genome contains 29,000-36,000 protein-coding genes, 21,300 pseudogenes, and 1500 antisense RNAs. We suggest that our revised annotation criteria provide a paradigm for future annotation of the human genome.

81 citations

Proceedings Article
07 May 2002
TL;DR: This contribution describes the open source project Edutella which builds upon metadata standards defined for the WWW and aims to provide an RDF-based metadata infrastructure for P2P applications, building on the recently announced JXTA Framework.
Abstract: P2P applications for searching and exchanging information over the Web have become increasingly popular. This has lead to a number of (usually thematically) focused communities, which allow efficient searching within such communities, and which use specific metadata sets to specify the resources stored within the P2P network. By concentrating on domain and application specific formats for metadata and query languages, however, current P2P networks appear to be fragmenting into non-interoperable niche markets. This contribution describes the open source project Edutella which builds upon metadata standards defined for the WWW and aims to provide an RDF-based metadata infrastructure for P2P applications, building on the recently announced JXTA Framework. We describe one basic service (query) and an Edutella application (annotation) within this network, both being built on a common query language exchange format, and specify the main architecture and APIs of the Edutella P2P network.

81 citations

Patent
31 Jul 2003
TL;DR: In this article, an interactive machine learning-based system that incrementally learns how to annotate new text data is presented, where the user is selectively presented for review and appropriate action, with a convenient and efficient interface so that context of use can be verified if necessary in order to evaluate the annotations and correct them.
Abstract: An interactive machine learning based system that incrementally learns, on the basis of text data, how to annotate new text data. The system and method starts with partially annotated training data or alternatively unannotated training data and a set of examples of what is to be learned. Through iterative interactive training sessions with a user the system trains annotators, and these are in turn used to discover more annotations in the text data. Once all of the text data or a sufficient amount of the text data is annotated, at the user's discretion, the system learns a final annotator or annotators, which are exported and available to annotate new textual data. As the iterative training process occurs the user is selectively presented for review and appropriate action, system-determined representations of the annotation instances and provided a convenient and efficient interface so that context of use can be verified if necessary in order to evaluate the annotations and correct them, where required. At the user's discretion, annotations that receive a high confidence level can be automatically accepted and those with low confidence levels can be automatically rejected.

81 citations


Network Information
Related Topics (5)
Inference
36.8K papers, 1.3M citations
81% related
Deep learning
79.8K papers, 2.1M citations
80% related
Graph (abstract data type)
69.9K papers, 1.2M citations
80% related
Unsupervised learning
22.7K papers, 1M citations
79% related
Cluster analysis
146.5K papers, 2.9M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20231,461
20223,073
2021305
2020401
2019383
2018373