Annotation

About: Annotation is a research topic. Over its lifetime, 6,719 publications have been published within this topic, receiving 203,463 citations. The topic is also known as note and markup.

Papers

Journal Article

EnTAP: Bringing faster and smarter functional annotation to non-model eukaryotic transcriptomes

Alexander Hart¹, Samuel Ginzburg¹, Muyang Sam Xu¹, Cera R Fisher¹, Nasim Rahmatpour¹, Jeffry B. Mitton², Robin Paul¹, Jill L. Wegrzyn¹

University of Connecticut¹, University of Colorado Boulder²

01 Mar 2020, Molecular Ecology Resources

TL;DR: EnTAP (Eukaryotic Non‐Model Transcriptome Annotation Pipeline) was designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non‐model eukaryotes.

Abstract: EnTAP (Eukaryotic Non-Model Transcriptome Annotation Pipeline) was designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non-model eukaryotes. This software package addresses the fragmentation and related assembly issues that result in inflated transcript estimates and poor annotation rates of protein-coding transcripts. Following filters applied through assessment of true expression and frame selection, open-source tools are leveraged to functionally annotate the reduced set of translated proteins. Downstream features include fast similarity search across five repositories, protein domain assignment, orthologous gene family assessment, and Gene Ontology (GO) term assignment. The final annotation integrates across multiple databases and selects an optimal assignment from a combination of weighted metrics describing similarity search score, taxonomic relationship, and informativeness. Researchers have the option to include additional filters to identify and remove contaminants, identify associated pathways, and prepare the transcripts for enrichment analysis. This fully featured pipeline is easy to install, configure, and runs significantly faster than comparable annotation packages. EnTAP is optimized to generate extensive functional information for the gene space of organisms with limited or poorly characterized genomic resources.
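The final selection step described above, choosing one annotation from several candidate hits using weighted metrics, can be pictured as a weighted score over those candidates. The Python sketch below is illustrative only: the `Hit` fields, the 0 to 1 scales assumed for taxonomic closeness and informativeness, and the values in `WEIGHTS` are hypothetical assumptions, not EnTAP's actual metrics, weights, or code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    subject_id: str
    bit_score: float        # similarity search score
    taxonomic_match: float  # hypothetical: 0..1 closeness to the target taxon
    informativeness: float  # hypothetical: 0..1, low for "uncharacterized protein"-style descriptions


# Hypothetical weights; EnTAP's real weighting is not given in the abstract.
WEIGHTS = {"score": 0.5, "taxonomy": 0.3, "info": 0.2}


def best_hit(hits: List[Hit]) -> Hit:
    """Return the hit with the highest weighted combination of normalized metrics."""
    if not hits:
        raise ValueError("no hits to choose from")
    max_score = max(h.bit_score for h in hits)

    def combined(h: Hit) -> float:
        # Scale the similarity score to 0..1 relative to the best candidate.
        norm_score = h.bit_score / max_score if max_score > 0 else 0.0
        return (WEIGHTS["score"] * norm_score
                + WEIGHTS["taxonomy"] * h.taxonomic_match
                + WEIGHTS["info"] * h.informativeness)

    return max(hits, key=combined)


if __name__ == "__main__":
    hits = [Hit("A", 200.0, 0.2, 0.1), Hit("B", 180.0, 0.9, 0.9)]
    print(best_hit(hits).subject_id)  # B
```

Normalizing the similarity score by the best score in the candidate set keeps it on the same 0 to 1 scale as the other two metrics, so a slightly weaker hit from a closer, better-described relative can win over the top raw score.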

76 citations

Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification

Oliver Schwengers¹, Lukas Jelonek¹, Marius Alfred Dieckmann¹, Sebastian Beyvers¹, Jochen Blom¹, Alexander Goesmann¹

University of Giessen¹

01 Nov 2021

TL;DR: Bakta is a command-line software tool for the robust, taxon-independent, thorough, and nonetheless fast annotation of bacterial genomes. Its workflow includes the detection of small proteins and takes replicon metadata into account.

Abstract: Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command-line software pipelines heavily depend on taxon-specific databases or sufficiently well annotated reference genomes. Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough and, nonetheless, fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow including the detection of small proteins taking into account replicon metadata. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that in addition facilitates the precise assignment of public database cross-references. Annotation results are exported in GFF3 and International Nucleotide Sequence Database Collaboration (INSDC)-compliant flat files, as well as comprehensive JSON files, facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command-line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenomic-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references, whilst providing comparable wall-clock runtimes. Bakta is implemented in Python 3 and runs on MacOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.
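One way to identify sequences without alignment is exact matching on a digest of the sequence. The sketch below assumes MD5 hashing of lightly normalized protein sequences and a toy in-memory reference dictionary; the identifiers `REF_1`/`REF_2` and the helper names are hypothetical, and the code is not Bakta's implementation or database schema.

```python
import hashlib
from typing import Dict, Optional


def seq_digest(protein: str) -> str:
    """Return an MD5 hex digest of a protein sequence after light normalization."""
    # Normalize case and drop a trailing stop symbol so trivially different
    # spellings of the same sequence hash identically.
    normalized = protein.strip().upper().rstrip("*")
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def build_index(reference: Dict[str, str]) -> Dict[str, str]:
    """Map digest -> reference ID for a hypothetical toy reference set."""
    return {seq_digest(seq): ref_id for ref_id, seq in reference.items()}


def identify(query: str, index: Dict[str, str]) -> Optional[str]:
    """Return the matching reference ID, or None if no identical sequence exists."""
    return index.get(seq_digest(query))


if __name__ == "__main__":
    index = build_index({"REF_1": "MKTAYIAK", "REF_2": "MSEQNNTEM"})
    print(identify("mktayiak*", index))  # REF_1
    print(identify("MKTAYIAR", index))   # None
```

A dictionary lookup on the digest costs roughly constant time per query, which is far cheaper than aligning each query against every reference. The trade-off is that only identical sequences match, so near-identical proteins would still need an alignment-based fallback.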

76 citations

Posted Content

Harmonization and Annotation of Single-cell Transcriptomics data with Deep Generative Models

Chenling Xu¹, Romain Lopez¹, Edouard Mehlman¹,², Jeffrey Regier¹, Michael I. Jordan¹, Nir Yosef

University of California, Berkeley¹, École Polytechnique²

29 Jan 2019, bioRxiv

TL;DR: It is demonstrated that scVI and scANVI represent the integrated datasets with a single generative model that can be directly used for any probabilistic decision making task, using differential expression as a case study.

Abstract: As single-cell transcriptomics becomes a mainstream technology, the natural next step is to integrate the accumulating data in order to achieve a common ontology of cell types and states. However, owing to various nuisance factors of variation, it is not straightforward how to compare gene expression levels across data sets and how to automatically assign cell type labels in a new data set based on existing annotations. In this manuscript, we demonstrate that our previously developed method, scVI, provides an effective and fully probabilistic approach for joint representation and analysis of cohorts of single-cell RNA-seq data sets, while accounting for uncertainty caused by biological and measurement noise. We also introduce single-cell ANnotation using Variational Inference (scANVI), a semi-supervised variant of scVI designed to leverage any available cell state annotations — for instance when only one data set in a cohort is annotated, or when only a few cells in a single data set can be labeled using marker genes. We demonstrate that scVI and scANVI compare favorably to the existing methods for data integration and cell state annotation in terms of accuracy, scalability, and adaptability to challenging settings such as a hierarchical structure of cell state labels. We further show that different from existing methods, scVI and scANVI represent the integrated datasets with a single generative model that can be directly used for any probabilistic decision making task, using differential expression as our case study. scVI and scANVI are available as open source software and can be readily used to facilitate cell state annotation and help ensure consistency and reproducibility across studies.

76 citations

Journal Article

Correlative Linear Neighborhood Propagation for Video Annotation

Jinhui Tang¹, Xian-Sheng Hua², Meng Wang², Zhiwei Gu², Guo-Jun Qi³, Xiuqing Wu³

National University of Singapore¹, Microsoft², University of Science and Technology of China³

01 Apr 2009

TL;DR: This paper proposes a novel method named correlative linear neighborhood propagation to improve annotation performance and demonstrates its effectiveness and efficiency on the Text REtrieval Conference VIDeo retrieval evaluation data set.

Abstract: Recently, graph-based semi-supervised learning methods have been widely applied in multimedia research area. However, for the application of video semantic annotation in multi-label setting, these methods neglect an important characteristic of video data: The semantic concepts appear correlatively and interact naturally with each other rather than exist in isolation. In this paper, we adapt this semantic correlation into graph-based semi-supervised learning and propose a novel method named correlative linear neighborhood propagation to improve annotation performance. Experiments conducted on the Text REtrieval Conference VIDeo retrieval evaluation data set have demonstrated its effectiveness and efficiency.
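For context, the graph-based semi-supervised baseline that this work extends can be sketched as plain label propagation. The Python sketch below iterates F ← αPF + (1 − α)Y with a row-normalized affinity matrix P (each node's neighbor weights sum to 1, as with linear neighborhood reconstruction weights). It is a swapped-in textbook technique, not the paper's correlative method: each concept column is propagated independently, which is exactly the limitation the paper addresses. The values of `alpha` and `iters` and the toy graph are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def propagate(W: Matrix, Y: Matrix, alpha: float = 0.9, iters: int = 100) -> Matrix:
    """Plain label propagation: F <- alpha * P F + (1 - alpha) * Y.

    W: symmetric nonnegative affinity matrix (n x n) over video shots.
    Y: initial labels (n x c), 1.0 where a shot is known to show a concept.
    Concept columns are propagated independently (no concept correlation).
    """
    n, c = len(W), len(Y[0])
    # Row-normalize so each node's neighbor weights sum to 1.
    P = []
    for row in W:
        total = sum(row)
        P.append([w / total if total > 0 else 0.0 for w in row])
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [
            [alpha * sum(P[i][j] * F[j][k] for j in range(n)) + (1 - alpha) * Y[i][k]
             for k in range(c)]
            for i in range(n)
        ]
    return F


if __name__ == "__main__":
    # Toy chain graph 0-1-2-3: shot 0 labeled concept 0, shot 3 labeled concept 1.
    W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    Y = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
    F = propagate(W, Y)
    print([max(range(2), key=lambda k: f[k]) for f in F])  # [0, 0, 1, 1]
```

Unlabeled shots inherit the concept of the labeled shot they are closer to on the graph; a correlative variant would additionally let scores for related concepts inform each other.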

76 citations

Proceedings Article

CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning

Ziyu Yao¹, Jayavardhan Reddy Peddamail¹, Huan Sun¹

Ohio State University¹

13 May 2019

TL;DR: This work proposes an effective framework based on reinforcement learning that explicitly encourages the code annotation model to generate annotations that can be used for the retrieval task. It shows that the code annotations generated by this framework are much more detailed and more useful for code retrieval, and that they can further improve the performance of existing code retrieval models significantly.

Abstract: To accelerate software development, much research has been performed to help people understand and reuse the huge amount of available code resources. Two important tasks have been widely studied: code retrieval, which aims to retrieve code snippets relevant to a given natural language query from a code base, and code annotation, where the goal is to annotate a code snippet with a natural language description. Despite their advancement in recent years, the two tasks are mostly explored separately. In this work, we investigate a novel perspective of Code annotation for Code retrieval (hence called “CoaCor”), where a code annotation model is trained to generate a natural language annotation that can represent the semantic meaning of a given code snippet and can be leveraged by a code retrieval model to better distinguish relevant code snippets from others. To this end, we propose an effective framework based on reinforcement learning, which explicitly encourages the code annotation model to generate annotations that can be used for the retrieval task. Through extensive experiments, we show that code annotations generated by our framework are much more detailed and more useful for code retrieval, and they can further improve the performance of existing code retrieval models significantly.
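The central idea, rewarding an annotation by how well it helps retrieve its own code snippet, can be illustrated with a reciprocal-rank reward. In the toy sketch below, a hypothetical bag-of-words overlap scorer stands in for a trained retrieval model; neither the scorer nor the exact reward formula is claimed to be the one used in CoaCor.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase alphabetic tokens; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap_score(query: str, code: str) -> float:
    """Hypothetical toy retriever: fraction of query tokens that appear in the code."""
    q = tokens(query)
    return len(q & tokens(code)) / len(q) if q else 0.0


def reciprocal_rank_reward(annotation: str, target_idx: int, snippets: List[str]) -> float:
    """Reward = 1 / rank of the target snippet when `annotation` is used as the query."""
    scores = [overlap_score(annotation, s) for s in snippets]
    target = scores[target_idx]
    # Rank is 1 plus the number of snippets scored strictly higher (ties favor the target).
    rank = 1 + sum(1 for s in scores if s > target)
    return 1.0 / rank


if __name__ == "__main__":
    snippets = [
        "def read_file(path): return open(path).read()",
        "def sort_list(items): return sorted(items)",
    ]
    print(reciprocal_rank_reward("read a file from path", 0, snippets))  # 1.0
    print(reciprocal_rank_reward("sort items", 0, snippets))             # 0.5
```

In a reinforcement-learning setup, a reward of this kind would be computed for each sampled annotation and used to scale the policy-gradient update, so annotations that rank their own snippet higher are reinforced.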

76 citations

Performance Metrics

Papers: 11,409
Citations: 238,885

No. of papers in the topic in previous years
Year	Papers
2023	1,461
2022	3,073
2021	305
2020	401
2019	383
2018	373
