Topic

Annotation

About: Annotation is a research topic. Over its lifetime, 6,719 publications have appeared within this topic, receiving 203,463 citations. The topic is also known as: note, markup.


Papers
Journal Article
TL;DR: A series of algorithmic steps were taken to increase the confidence of EPT identification for these organisms, including generation of smaller subdatabases to be searched against, and definition of EPT criteria that accommodate the more complex eukaryotic gene architecture.
Abstract: While genome sequencing is becoming ever more routine, genome annotation remains a challenging process. Identification of the coding sequences within the genomic milieu presents a tremendous challenge, especially for eukaryotes with their complex gene architectures. Here, we present a method to assist the annotation process through the use of proteomic data and bioinformatics. Mass spectra of digested protein preparations of the organism of interest were acquired and searched against a protein database created by a six-frame translation of the genome. The identified peptides were mapped back to the genome, compared to the current annotation, and then categorized as supporting or extending the current genome annotation. We named the classified peptides Expressed Peptide Tags (EPTs). The well-annotated bacterium Rhodopseudomonas palustris was used as a control for the method and showed a high degree of correlation between EPT mapping and the current annotation, with 86% of the EPTs confirming existing gene calls and less than 1% of the EPTs expanding on the current annotation. The eukaryotic plant pathogens Phytophthora ramorum and Phytophthora sojae, whose genomes have been recently sequenced and are much less well-annotated, were also subjected to this method. A series of algorithmic steps were taken to increase the confidence of EPT identification for these organisms, including generation of smaller subdatabases to be searched against, and definition of EPT criteria that accommodates the more complex eukaryotic gene architecture. As expected, the analysis of the Phytophthora species showed less correlation between EPT mapping and their current annotation. While approximately 76% of Phytophthora EPTs supported the current annotation, a portion of them (7.7% and 12.9% for P. ramorum and P. sojae, respectively) suggested modification to current gene calls or identified novel genes that were missed by the current genome annotation of these organisms.

38 citations
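
The six-frame translation and peptide mapping steps described in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`six_frame_translation`, `locate_peptide`) and the toy sequence are assumptions made for illustration, the standard genetic code is assumed, and a real workflow would match mass spectra against the translated database with a search engine rather than by exact substring lookup.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(genome):
    """Map (strand, offset) to the protein string read in that frame."""
    genome = genome.upper()
    frames = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def locate_peptide(peptide, frames):
    """Return (strand, offset, nucleotide_start) for each exact match.

    For the '-' strand, nucleotide_start is a position on the
    reverse-complemented sequence, not on the original one.
    """
    hits = []
    for (strand, offset), protein in frames.items():
        start = protein.find(peptide)
        while start != -1:
            hits.append((strand, offset, offset + 3 * start))
            start = protein.find(peptide, start + 1)
    return hits


frames = six_frame_translation("ATGGCCAAATAG")  # toy sequence, not real data
print(locate_peptide("MAK", frames))
```

Comparing the returned coordinates with existing gene calls is what would let a peptide be classified as supporting or extending the annotation; that comparison is left out here.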

Book Chapter
02 Nov 2020
TL;DR: Evaluations run on a novel dataset consisting of a set of high-quality manually-curated tables with non-obviously linkable cells show that ambiguity is a key problem for entity linking algorithms and encourage a promising direction for future work in the field.
Abstract: Table annotation is a key task to improve querying the Web and support the Knowledge Graph population from legacy sources (tables). Last year, the SemTab challenge was introduced to unify different efforts to evaluate table annotation algorithms by providing a common interface and several general-purpose datasets as a ground truth. The SemTab dataset is useful to have a general understanding of how these algorithms work, and the organizers of the challenge included some artificial noise to the data to make the annotation trickier. However, it is hard to analyze specific aspects in an automatic way. For example, the ambiguity of names at the entity-level can largely affect the quality of the annotation. In this paper, we propose a novel dataset to complement the datasets proposed by SemTab. The dataset consists of a set of high-quality manually-curated tables with non-obviously linkable cells, i.e., where values are ambiguous names, typos, and misspelled entity names not appearing in the current version of the SemTab dataset. These challenges are particularly relevant for the ingestion of structured legacy sources into existing knowledge graphs. Evaluations run on this dataset show that ambiguity is a key problem for entity linking algorithms and encourage a promising direction for future work in the field.

38 citations

Patent
07 Feb 2002
TL;DR: A system for annotating image data prompts a user to annotate the image data and transmits the annotation to a recipient according to predefined preferences, so that the recipient can review the image data and the annotation together.
Abstract: A system for annotating image data prompts a user to annotate image data and transmits the annotation to a recipient according to predefined preferences to facilitate simultaneous review of the image data and the annotation by the recipient.

38 citations

Proceedings Article
23 Aug 2004
TL;DR: The results of the evaluation suggest that the cross-language annotation transfer methodology is a promising solution allowing for the exploitation of existing (mostly English) annotated resources to bootstrap the creation of annotated corpora in new (resource-poor) languages with greatly reduced human effort.
Abstract: In this paper we illustrate and evaluate an approach to the creation of high quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the assumption that if a text in one language has been annotated and its translation has not, annotations can be transferred from the source text to the target using word alignment as a bridge. The transfer approach has been tested in the creation of the MultiSemCor corpus, an English/Italian parallel corpus created on the basis of the English SemCor corpus. In MultiSemCor texts are aligned at the word level and semantically annotated with a shared inventory of senses. We present some experiments carried out to evaluate the different steps involved in the methodology. The results of the evaluation suggest that the cross-language annotation transfer methodology is a promising solution allowing for the exploitation of existing (mostly English) annotated resources to bootstrap the creation of annotated corpora in new (resource-poor) languages with greatly reduced human effort.

38 citations
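
The transfer idea in the abstract above, copying annotations from source tokens to target tokens across word-alignment links, might look roughly like the sketch below. It is an illustration under stated assumptions, not the MultiSemCor procedure: the function name `project_annotations`, the sense labels, and the one-to-one alignment are all hypothetical, and real alignments contain gaps and many-to-one links that need extra handling.

```python
def project_annotations(source_tags, alignment, target_length):
    """Copy each source token's tag to the target token it is aligned with.

    source_tags   -- list of tags (or None) indexed by source token position
    alignment     -- iterable of (source_index, target_index) pairs
    target_length -- number of tokens in the target sentence
    Unaligned target tokens keep the tag None.
    """
    projected = [None] * target_length
    for src_idx, tgt_idx in alignment:
        tag = source_tags[src_idx]
        if tag is not None:
            projected[tgt_idx] = tag
    return projected


# Hypothetical example: English "the dog barks" aligned to Italian "il cane abbaia".
english_tags = [None, "dog.n.01", "bark.v.01"]   # made-up sense labels
word_alignment = [(0, 0), (1, 1), (2, 2)]        # assumed one-to-one links
print(project_annotations(english_tags, word_alignment, 3))
```

A later alignment link overwrites an earlier one for the same target token here; a production system would need an explicit policy for such conflicts.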

Proceedings Article
09 Jul 2007
TL;DR: A probabilistic approach to refine image annotations by incorporating semantic relations between annotation words using a conditional random field (CRF) model where each vertex indicates the final decision on a candidate annotation word.
Abstract: In this paper, we present a probabilistic approach to refine image annotations by incorporating semantic relations between annotation words. Our approach first predicts a candidate set of annotation words with confidence scores. This is achieved by the relevance vector machine (RVM), a kernel-based probabilistic classifier chosen to cope with nonlinear classification. Given the candidate annotations, we model semantic relationships between words using a conditional random field (CRF) model where each vertex indicates the final decision (true / false) on a candidate annotation word. The refined annotation is given by inferring the most likely states of these vertices. In the CRF model, we consider the confidence scores given by the RVM classifiers as local evidence. In addition, we utilise Normalized Google distances (NGDs) between two words as their contextual potential. NGD is a distance function between two words obtained by searching a pair of words using the Google search engine. It has a simple mathematical formulation with a foundation in Kolmogorov theory. We also propose a learning algorithm to tune the weight parameters in the CRF model. These weight parameters control the balance between the local evidence of a single word and the contextual relation between words. Our experiments on the Corel images demonstrate the effect of our approach.

38 citations
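
The abstract above refers to the Normalized Google Distance without stating its formula. As commonly defined by Cilibrasi and Vitányi, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts pages containing the term or terms and N is the number of indexed pages. The sketch below computes it from hit counts supplied by the caller; the function name `ngd`, the example counts, and the value of N are assumptions, and how the paper turns the distance into a CRF potential is not reproduced here.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from page counts.

    fx, fy -- number of pages containing each word on its own
    fxy    -- number of pages containing both words
    n      -- total number of indexed pages (a normalising constant)
    Returns infinity when the two words never co-occur.
    """
    if fxy == 0:
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Made-up counts for illustration only; no search engine is queried.
print(ngd(fx=100, fy=1000, fxy=10, n=10**6))
```

Words that always appear together give a distance of 0, and larger values indicate weaker association.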


Network Information
Related Topics (5)
Inference
36.8K papers, 1.3M citations
81% related
Deep learning
79.8K papers, 2.1M citations
80% related
Graph (abstract data type)
69.9K papers, 1.2M citations
80% related
Unsupervised learning
22.7K papers, 1M citations
79% related
Cluster analysis
146.5K papers, 2.9M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    1,461
2022    3,073
2021    305
2020    401
2019    383
2018    373