scispace - formally typeset
Topic

Annotation

About: Annotation is a research topic. Over its lifetime, 6,719 publications have been published on this topic, receiving 203,463 citations. The topic is also known as: note & markup.


Papers
Proceedings Article
01 Dec 2006
TL;DR: bdbms is an extensible prototype database management system for supporting biological data. It extends the functionality of current DBMSs with annotation and provenance management, including storage, indexing, manipulation, and querying of annotations and provenance as first-class objects.
Abstract: Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs with: (1) annotation and provenance management, including storage, indexing, manipulation, and querying of annotation and provenance as first-class objects in bdbms; (2) local dependency tracking to track the dependencies and derivations among data items; (3) update authorization to support data curation via content-based authorization, in contrast to identity-based authorization; and (4) new access methods and their supporting operators that support pattern matching on various types of compressed biological data. This paper presents the design of bdbms along with the techniques proposed to support these functionalities, including an extension to SQL. We also outline some open issues in building bdbms.

45 citations

Journal Article
TL;DR: A combination of Natural Language Processing and probabilistic classification is applied to re-rank documents returned by PubMed according to their relevance to Swiss-Prot annotation, and to identify significant terms in the documents.
Abstract: Motivation: Searching for relevant publications to support manual database annotation is a tedious task. In this paper, we apply a combination of Natural Language Processing (NLP) and probabilistic classification to re-rank documents returned by PubMed according to their relevance to Swiss-Prot annotation, and to identify significant terms in the documents.

45 citations

Proceedings Article
06 Aug 2009
TL;DR: A case study on POS annotation for Bangla and Hindi shows that reliable linguistic annotation requires not only expert annotators but also a great deal of supervision, suggesting that any annotation task demanding deep linguistic knowledge needs both expertise and supervision.
Abstract: Alternative paths to linguistic annotation, such as those using games or exploiting web users, have become popular in recent years owing to their very high benefit-to-cost ratios. In this paper, however, we report a case study on POS annotation for Bangla and Hindi, where we observe that reliable linguistic annotation requires not only expert annotators but also a great deal of supervision. For our hierarchical POS annotation scheme, we find that close supervision and training are necessary at every level of the hierarchy or, equivalently, at every level of tagset complexity. Nevertheless, an intelligent annotation tool can significantly accelerate the annotation process and increase inter-annotator agreement for both expert and non-expert annotators. These findings lead us to believe that reliable annotation requiring deep linguistic knowledge (e.g., POS tagging, chunking, treebanking, semantic role labeling) requires expertise and supervision. The focus, therefore, should be on the design and development of appropriate annotation tools equipped with machine-learning-based predictive modules that can significantly boost annotator productivity.

45 citations

Journal Article
01 Jan 2016, Database
TL;DR: EXTRACT is an interactive annotation tool that helps curators identify and extract standard-compliant terms for annotating metagenomic records and other samples; it speeds up annotation by 15–25% and helps curators detect terms that would otherwise have been missed.
Abstract: The microbial and molecular ecology research communities have made substantial progress in developing standards for annotating samples with environment metadata. However, manual annotation of samples is a highly labor-intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15–25% and helps curators detect terms that would otherwise have been missed. Database URL: https://extract.hcmr.gr/

45 citations

Journal Article
TL;DR: The UCSC Genome Browser (http://genome.ucsc.edu) is an omics data consolidator, graphical viewer, and general bioinformatics resource that continues to serve the community as it enters its 23rd year.
Abstract: The UCSC Genome Browser (https://genome.ucsc.edu) is an omics data consolidator, graphical viewer, and general bioinformatics resource that continues to serve the community as it enters its 23rd year. This year has seen an emphasis on clinical data, with new tracks and an expanded Recommended Track Sets feature on hg38, as well as the addition of a single-cell track group. SARS-CoV-2 remains a focus, with regular annotation updates to the browser and continued curation of our phylogenetic sequence placement tool, hgPhyloPlace, whose tree now contains over 12M sequences. Our GenArk resource has also grown, offering over 2,500 hubs and a system for users to request any missing assemblies. We have expanded our bigBarChart display type and created new ways to visualize data via bigRmsk and dynseq display. Displaying custom annotations is now easier thanks to our chromAlias system, which eliminates the need to rename sequences to the UCSC standard. Users involved in data generation may also be interested in our new tools and trackDb settings, which facilitate the creation and display of their custom annotations.

45 citations


Network Information
Related Topics (5)
Inference: 36.8K papers, 1.3M citations, 81% related
Deep learning: 79.8K papers, 2.1M citations, 80% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 80% related
Unsupervised learning: 22.7K papers, 1M citations, 79% related
Cluster analysis: 146.5K papers, 2.9M citations, 78% related
Performance Metrics
Number of papers on this topic in previous years
Year   Papers
2023   1,461
2022   3,073
2021   305
2020   401
2019   383
2018   373