scispace - formally typeset
Search or ask a question
Topic

Annotation

About: Annotation is a research topic. Over the lifetime, 6719 publications have been published within this topic receiving 203463 citations. The topic is also known as: note & markup.


Papers
More filters
Journal ArticleDOI
TL;DR: The GenomeTools, a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs, strictly follow the annotation graph approach, offering a unified graph-based representation.
Abstract: Genome annotations are often published as plain text files describing genomic features and their subcomponents by an implicit annotation graph. In this paper, we present the GenomeTools, a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs. The GenomeTools strictly follow the annotation graph approach, offering a unified graph-based representation. This gives the developer intuitive and immediate access to genomic features and tools for their manipulation. To process large annotation sets with low memory overhead, we have designed and implemented an efficient pull-based approach for sequential processing of annotations. This allows to handle even the largest annotation sets, such as a complete catalogue of human variations. Our object-oriented C-based software library enables a developer to conveniently implement their own functionality on annotation graphs and to integrate it into larger workflows, simultaneously accessing compressed sequence data if required. The careful C implementation of the GenomeTools does not only ensure a light-weight memory footprint while allowing full sequential as well as random access to the annotation graph, but also facilitates the creation of bindings to a variety of script programming languages (like Python and Ruby) sharing the same interface.

330 citations

Journal ArticleDOI
TL;DR: CAMERA collects and links metadata relevant to environmental metagenome data sets with annotation in a semantically-aware environment allowing users to write expressive semantic queries against the database.
Abstract: The Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA, http://camera.calit2.net/) is a database and associated computational infrastructure that provides a single system for depositing, locating, analyzing, visualizing and sharing data about microbial biology through an advanced web-based analysis portal. CAMERA collects and links metadata relevant to environmental metagenome data sets with annotation in a semantically-aware environment allowing users to write expressive semantic queries against the database. To meet the needs of the research community, users are able to query metadata categories such as habitat, sample type, time, location and other environmental physicochemical parameters. CAMERA is compliant with the standards promulgated by the Genomic Standards Consortium (GSC), and sustains a role within the GSC in extending standards for content and format of the metagenomic data and metadata and its submission to the CAMERA repository. To ensure wide, ready access to data and annotation, CAMERA also provides data submission tools to allow researchers to share and forward data to other metagenomics sites and community data archives such as GenBank. It has multiple interfaces for easy submission of large or complex data sets, and supports pre-registration of samples for sequencing. CAMERA integrates a growing list of tools and viewers for querying, analyzing, annotating and comparing metagenome and genome data.

329 citations

Patent
Ramesh Sarukkai1
26 Jun 2006
TL;DR: In this article, a trust network is defined for each user, and annotations by any member of the user's trust network are made visible to the user during search and/or browsing of the corpus if the querying user and trust network members use similar queries to identify documents in the corpus.
Abstract: Computer systems and methods incorporate user annotations (metadata) regarding various pages or sites, including annotations by a querying user and by members of a trust network defined for the querying user into search and browsing of a corpus such as the World Wide Web. A trust network is defined for each user, and annotations by any member of the querying user's trust network are made visible to the querying user during search and/or browsing of the corpus if the querying user and trust network members use similar queries to identify documents in the corpus. Users can also limit searches to content annotated by members of their trust networks or by members of a community selected by the user.

322 citations

Proceedings ArticleDOI
05 Jun 2009
TL;DR: An empirical study is conducted to examine the effect of noisy annotations on the performance of sentiment classification models, and evaluate the utility of annotation selection on classification accuracy and efficiency.
Abstract: Annotation acquisition is an essential step in training supervised classifiers. However, manual annotation is often time-consuming and expensive. The possibility of recruiting annotators through Internet services (e.g., Amazon Mechanic Turk) is an appealing option that allows multiple labeling tasks to be outsourced in bulk, typically with low overall costs and fast completion rates. In this paper, we consider the difficult problem of classifying sentiment in political blog snippets. Annotation data from both expert annotators in a research lab and non-expert annotators recruited from the Internet are examined. Three selection criteria are identified to select high-quality annotations: noise level, sentiment ambiguity, and lexical uncertainty. Analysis confirm the utility of these criteria on improving data quality. We conduct an empirical study to examine the effect of noisy annotations on the performance of sentiment classification models, and evaluate the utility of annotation selection on classification accuracy and efficiency.

316 citations

Patent
28 Sep 2001
TL;DR: In this paper, a data structure for annotating data files within a database is provided, which comprises a phoneme and word lattice which allows the quick and efficient searching of data files in response to a user's input query.
Abstract: A data structure is provided for annotating data files within a database. The annotation data comprises a phoneme and word lattice which allows the quick and efficient searching of data files within the database in response to a user's input query. The structure of the annotation data is such that it allows the input query to be made by voice and can be used for annotating various kinds of data files, such as audio data files, video data files, multimedia data files etc. The annotation data may be generated from the data files themselves or may be input by the user either from a voiced input or from a typed input.

314 citations


Network Information
Related Topics (5)
Inference
36.8K papers, 1.3M citations
81% related
Deep learning
79.8K papers, 2.1M citations
80% related
Graph (abstract data type)
69.9K papers, 1.2M citations
80% related
Unsupervised learning
22.7K papers, 1M citations
79% related
Cluster analysis
146.5K papers, 2.9M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20231,461
20223,073
2021305
2020401
2019383
2018373