
Showing papers on "Annotation" published in 2006


Journal ArticleDOI
TL;DR: WEGO (Web Gene Ontology Annotation Plot) is a simple but useful tool for visualizing, comparing and plotting GO annotation results, designed to deal with the directed acyclic graph structure of GO to facilitate histogram creation of GO annotation results.
Abstract: Unified, structured vocabularies and classifications freely provided by the Gene Ontology (GO) Consortium are widely accepted in most large-scale gene annotation projects. Consequently, many tools have been created for use with the GO ontologies. WEGO (Web Gene Ontology Annotation Plot) is a simple but useful tool for visualizing, comparing and plotting GO annotation results. Unlike commercial charting software, WEGO is designed to deal with the directed acyclic graph structure of GO to facilitate histogram creation of GO annotation results. WEGO has been used widely in many important biological research projects, such as the rice genome project and the silkworm genome project. It has become one of the daily tools for downstream gene annotation analysis, especially when performing comparative genomics tasks. WEGO, along with two other tools, namely External to GO Query and GO Archive Query, is freely available to all users at http://wego.genomics.org.cn. There are two mirror sites at http://wego2.genomics.org.cn and http://wego.genomics.com.cn. Suggestions are welcome at wego@genomics.org.cn.
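
The abstract's point about the directed acyclic graph can be illustrated with a small sketch. A term may have several parents, so a gene annotated to a specific term should be counted once under every ancestor before a histogram is drawn. The toy term names, the `PARENTS` map and the function names below are hypothetical; this is not WEGO's implementation, only a minimal illustration of the counting problem.

```python
from collections import defaultdict

# Hypothetical toy DAG: child term -> set of parent terms (not real GO IDs).
PARENTS = {
    "binding": {"molecular_function"},
    "catalytic_activity": {"molecular_function"},
    "dna_binding": {"binding"},
    "nuclease_activity": {"catalytic_activity", "dna_binding"},  # two parents
    "molecular_function": set(),
}


def ancestors(term, parents):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def genes_per_term(annotations, parents):
    """Map each term to the genes annotated to it or to any of its descendants."""
    genes = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for anc in ancestors(term, parents):
                genes[anc].add(gene)  # a set, so a gene counts once per term
    return genes


def histogram(annotations, parents, level_terms):
    """Percentage of annotated genes that fall under each chosen term."""
    genes = genes_per_term(annotations, parents)
    total = len(annotations)
    return {t: 100.0 * len(genes[t]) / total for t in level_terms}


# Hypothetical annotation result: gene -> directly assigned terms.
ANNOTATIONS = {
    "gene1": {"nuclease_activity"},
    "gene2": {"dna_binding"},
    "gene3": {"catalytic_activity"},
    "gene4": {"binding", "catalytic_activity"},
}

result = histogram(ANNOTATIONS, PARENTS, ["binding", "catalytic_activity"])
print(result)
```

Because `nuclease_activity` has two parents, `gene1` contributes to both bars, so the percentages need not sum to 100.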

2,460 citations


Journal ArticleDOI
TL;DR: MaGe is a microbial genome annotation system that integrates annotation data from bacterial genomes, enhanced by a gene coding re-annotation process using accurate gene models, with results obtained from a wide range of bioinformatics methods and an advanced web interface allowing multiple users to refine the automatic assignment of gene product functions.
Abstract: Magnifying Genomes (MaGe) is a microbial genome annotation system based on a relational database containing information on bacterial genomes, as well as a web interface to achieve genome annotation projects. Our system allows one to initiate the annotation of a genome at the early stage of the finishing phase. MaGe's main features are (i) integration of annotation data from bacterial genomes enhanced by a gene coding re-annotation process using accurate gene models, (ii) integration of results obtained with a wide range of bioinformatics methods, among which exploration of gene context by searching for conserved synteny and reconstruction of metabolic pathways, (iii) an advanced web interface allowing multiple users to refine the automatic assignment of gene product functions. MaGe is also linked to numerous well-known biological databases and systems. Our system has been thoroughly tested during the annotation of complete bacterial genomes (Acinetobacter baylyi ADP1, Pseudoalteromonas haloplanktis, Frankia alni) and is currently used in the context of several new microbial genome annotation projects. In addition, MaGe allows for annotation curation and exploration of already published genomes from various genera (e.g. Yersinia, Bacillus and Neisseria). MaGe can be accessed at http://www.genoscope.cns.fr/agc/mage.

385 citations


Journal ArticleDOI
TL;DR: All four levels of genome annotation are discussed, with specific emphasis on two-dimensional annotation methods; the fourth dimension arises from the study of changes in genome sequences that occur during adaptive evolution.
Abstract: Our information about the gene content of organisms continues to grow as more genomes are sequenced and gene products are characterized. Sequence-based annotation efforts have led to a list of cellular components, which can be thought of as a one-dimensional annotation. With growing information about component interactions, facilitated by the advancement of various high-throughput technologies, systemic, or two-dimensional, annotations can be generated. Knowledge about the physical arrangement of chromosomes will lead to a three-dimensional spatial annotation of the genome and a fourth dimension of annotation will arise from the study of changes in genome sequences that occur during adaptive evolution. Here we discuss all four levels of genome annotation, with specific emphasis on two-dimensional annotation methods.

384 citations


Patent
Ramesh Sarukkai
26 Jun 2006
TL;DR: In this patent, a trust network is defined for each user, and annotations by any member of the user's trust network are made visible to the user during search and/or browsing of the corpus if the querying user and trust network members use similar queries to identify documents in the corpus.
Abstract: Computer systems and methods incorporate user annotations (metadata) regarding various pages or sites, including annotations by a querying user and by members of a trust network defined for the querying user into search and browsing of a corpus such as the World Wide Web. A trust network is defined for each user, and annotations by any member of the querying user's trust network are made visible to the querying user during search and/or browsing of the corpus if the querying user and trust network members use similar queries to identify documents in the corpus. Users can also limit searches to content annotated by members of their trust networks or by members of a community selected by the user.

322 citations


Journal ArticleDOI
TL;DR: The Rice Annotation Project Database (RAP-DB) is presented, which has been developed to provide access to the annotation data and serves as a hub for rice genomics.
Abstract: With the completion of the rice genome sequencing, a standardized annotation is necessary so that the information from the genome sequence can be fully utilized in understanding the biology of rice and other cereal crops. An annotation jamboree was held in Japan with the aim of annotating and manually curating all the genes in the rice genome. Here we present the Rice Annotation Project Database (RAP-DB), which has been developed to provide access to the annotation data. The RAP-DB has two different types of annotation viewers, BLAST and BLAT search, and other useful features. By connecting the annotations to other rice genomics data, such as full-length cDNAs and Tos17 mutant lines, the RAP-DB serves as a hub for rice genomics. All of the resources can be accessed through http://rapdb.lab.nig.ac.jp/.

243 citations


Journal ArticleDOI
TL;DR: GATU greatly simplifies the initial stages of genome annotation by using a closely related genome as a reference and significantly reduces the time required for annotation of genes and mature peptides as well as helping to standardize gene names between related organisms.
Abstract: Background: Since DNA sequencing has become easier and cheaper, an increasing number of closely related viral genomes have been sequenced. However, many of these have been deposited in GenBank without annotations, severely limiting their value to researchers. While maintaining comprehensive genomic databases for a set of virus families at the Viral Bioinformatics Resource Center (http://www.biovirus.org) and Viral Bioinformatics – Canada (http://www.virology.ca), we found that researchers were unnecessarily spending time annotating viral genomes that were close relatives of already annotated viruses. We have therefore designed and implemented a novel tool, Genome Annotation Transfer Utility (GATU), to transfer annotations from a previously annotated reference genome to a new target genome, thereby greatly reducing this laborious task.

208 citations


Proceedings ArticleDOI
23 Oct 2006
TL;DR: A novel approach to automatically refine the original annotations of images by using Random Walk with Restarts (RWR) to leverage both the corpus information and the original confidence information of the annotations.
Abstract: Image annotation plays an important role in image retrieval and management. However, the results of the state-of-the-art image annotation methods are often unsatisfactory. Therefore, it is necessary to refine the imprecise annotations obtained by existing annotation methods. In this paper, a novel approach to automatically refine the original annotations of images is proposed. On the one hand, for Web images, textual information, e.g. file name and surrounding text, is used to retrieve a set of candidate annotations. On the other hand, for non-Web images that lack textual information, a relevance model-based algorithm using visual information is used to decide the candidate annotations. Then, candidate annotations are re-ranked and only the top ones are retained as the final annotations. To re-rank the annotations, an algorithm using Random Walk with Restarts (RWR) is proposed to leverage both the corpus information and the original confidence information of the annotations. Experimental results on both non-Web images from the Corel dataset and Web images from photo forum sites demonstrate the effectiveness of the proposed method.
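
The re-ranking step can be sketched with a generic Random Walk with Restarts over a word affinity graph. The abstract does not give the paper's exact formulation, so the update rule `r = (1 - c) * P^T r + c * v`, the restart probability `c = 0.3`, and the toy words and weights below are assumptions chosen for illustration only.

```python
def random_walk_with_restarts(weights, restart, c=0.3, tol=1e-10, max_iter=1000):
    """Iterate r <- (1 - c) * P^T r + c * v until r stops changing.

    weights: square list of lists; weights[i][j] is the affinity of words i and j
             (standing in for corpus information such as co-occurrence)
    restart: original confidence scores of the candidate annotations
    c:       restart probability (assumed value, not taken from the paper)
    """
    n = len(weights)
    # Row-normalise affinities into transition probabilities P[i][j] (i -> j).
    P = []
    for row in weights:
        s = sum(row)
        P.append([w / s if s else 0.0 for w in row])
    total = sum(restart)
    v = [x / total for x in restart]  # restart distribution
    r = v[:]
    for _ in range(max_iter):
        new_r = [
            (1 - c) * sum(P[i][j] * r[i] for i in range(n)) + c * v[j]
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r


# Hypothetical candidates: "sky" and "cloud" co-occur often, "car" rarely.
words = ["sky", "cloud", "car"]
weights = [
    [0, 5, 1],
    [5, 0, 1],
    [1, 1, 0],
]
confidence = [0.4, 0.2, 0.4]  # original annotation confidences

scores = random_walk_with_restarts(weights, confidence)
ranked = [w for _, w in sorted(zip(scores, words), reverse=True)]
print(ranked)
```

In this toy case "cloud" starts with the lowest confidence but overtakes "car" after the walk, because its strong link to "sky" feeds it probability mass.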

205 citations


Proceedings Article
01 May 2006
TL;DR: The frame-semantic annotation framework and its cross-lingual applicability, problems arising from exhaustive annotation, strategies for quality control, and possible applications are discussed.
Abstract: This paper describes the SALSA corpus, a large German corpus with manual role-semantic annotation, based on the syntactically annotated TIGER newspaper corpus. The first release, comprising about 20,000 annotated predicate instances (about half the TIGER corpus), is scheduled for mid-2006. In this paper we discuss the annotation framework (frame semantics) and its cross-lingual applicability, problems arising from exhaustive annotation, strategies for quality control, and possible applications.

194 citations


Journal ArticleDOI
TL;DR: The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.
Abstract: The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.

193 citations


Proceedings ArticleDOI
Philip V. Ogren
04 Jun 2006
TL;DR: A general-purpose text annotation tool called Knowtator is introduced that facilitates the manual creation of annotated corpora that can be used for evaluating or training a variety of natural language processing systems.
Abstract: A general-purpose text annotation tool called Knowtator is introduced. Knowtator facilitates the manual creation of annotated corpora that can be used for evaluating or training a variety of natural language processing systems. Building on the strengths of the widely used Protege knowledge representation system, Knowtator has been developed as a Protege plug-in that leverages Protege's knowledge representation capabilities to specify annotation schemas. Knowtator's unique advantage over other annotation tools is the ease with which complex annotation schemas (e.g. schemas which have constrained relationships between annotation types) can be defined and incorporated into use. Knowtator is available under the Mozilla Public License 1.1 at http://bionlp.sourceforge.net/Knowtator.

193 citations


Journal ArticleDOI
TL;DR: The results of the inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area are reported, while supporting practical mining of text for factual information.
Abstract: While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text-mining lies in the ability to identify and characterize text that satisfies various types of information needs. We report here the results of our inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area, while supporting practical mining of text for factual information. Our ultimate goal is to annotate a significant corpus of biomedical text and train machine learning methods to automatically categorize such text along certain dimensions that we have defined. We have identified five qualitative dimensions that we believe characterize a broad range of scientific sentences, and are therefore useful for supporting a general approach to text-mining: focus, polarity, certainty, evidence, and directionality. We define these dimensions and describe the guidelines we have developed for annotating text with regard to them. To examine the effectiveness of the guidelines, twelve annotators independently annotated the same set of 101 sentences that were randomly selected from current biomedical periodicals. Analysis of these annotations shows 70–80% inter-annotator agreement, suggesting that our guidelines indeed present a well-defined, executable and reproducible task. We present our guidelines defining a text annotation task, along with annotation results from multiple independently produced annotations, demonstrating the feasibility of the task. The annotation of a very large corpus of documents along these guidelines is currently ongoing. These annotations form the basis for the categorization of text along multiple dimensions, to support viable text mining for experimental results, methodology statements, and other forms of information. 
We are currently developing machine learning methods, to be trained and tested on the annotated corpus, that would allow for the automatic categorization of biomedical text along the general dimensions that we have presented. The guidelines in full detail, along with annotated examples, are publicly available.
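
As a rough illustration of how a figure such as 70–80% inter-annotator agreement could be computed, the sketch below averages simple percent agreement over every pair of annotators. The abstract does not say which agreement measure was used, so this measure, the function name and the toy labels are all assumptions.

```python
from itertools import combinations


def pairwise_agreement(labels_by_annotator):
    """Fraction of items on which two annotators chose the same label,
    averaged over every pair of annotators."""
    per_pair = [
        sum(a == b for a, b in zip(x, y)) / len(x)
        for x, y in combinations(labels_by_annotator, 2)
    ]
    return sum(per_pair) / len(per_pair)


# Hypothetical polarity labels (P = positive, N = negative)
# from three annotators on the same five sentences.
annotators = [
    ["P", "P", "N", "P", "N"],
    ["P", "P", "N", "N", "N"],
    ["P", "N", "N", "P", "N"],
]

agreement = pairwise_agreement(annotators)
print(f"{agreement:.1%}")
```

Raw percent agreement does not correct for chance; chance-corrected coefficients are a common alternative when label distributions are skewed.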

Proceedings ArticleDOI
01 May 2006
TL;DR: MASI, a distance metric for comparing sets, is discussed and its use in quantifying the reliability of a specific dataset is illustrated, and it is argued that a paradigmatic reliability study should relate measures of inter-annotator agreement to independent assessments, such as significance tests of the annotated variables with respect to other phenomena.
Abstract: Annotation projects dealing with complex semantic or pragmatic phenomena face the dilemma of creating annotation schemes that oversimplify the phenomena, or that capture distinctions conventional reliability metrics cannot measure adequately. The solution to the dilemma is to develop metrics that quantify the decisions that annotators are asked to make. This paper discusses MASI, a distance metric for comparing sets, and illustrates its use in quantifying the reliability of a specific dataset. Annotations of Summary Content Units (SCUs) generate models referred to as pyramids, which can be used to evaluate unseen human summaries or machine summaries. The paper presents reliability results for five pairs of pyramids created for document sets from the 2003 Document Understanding Conference (DUC). The annotators worked independently of each other. Differences between application of MASI to pyramid annotation and its previous application to co-reference annotation are discussed. In addition, it is argued that a paradigmatic reliability study should relate measures of inter-annotator agreement to independent assessments, such as significance tests of the annotated variables with respect to other phenomena. In effect, what counts as sufficiently reliable inter-annotator agreement depends on the use the annotated data will be put to.
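
A set-valued distance of this kind can be sketched as follows. The abstract does not spell out the formula, so the version below follows the commonly cited definition of MASI as one minus the product of the Jaccard coefficient and a monotonicity weight (1 for identical sets, 2/3 when one set contains the other, 1/3 for partial overlap, 0 for disjoint sets). Treat those weights as an assumption to check against the paper.

```python
def masi_distance(a, b):
    """Distance between two label sets: 1 - Jaccard(a, b) * monotonicity(a, b)."""
    a, b = frozenset(a), frozenset(b)
    if not a and not b:
        return 0.0  # two empty sets are treated as identical
    inter = a & b
    jaccard = len(inter) / len(a | b)
    if a == b:
        monotonicity = 1.0
    elif a <= b or b <= a:
        monotonicity = 2 / 3   # one set is a subset of the other
    elif inter:
        monotonicity = 1 / 3   # overlap, but neither contains the other
    else:
        monotonicity = 0.0     # disjoint
    return 1.0 - jaccard * monotonicity


# Hypothetical annotator decisions: each assigns a set of units to one item.
print(masi_distance({"u1", "u2"}, {"u1", "u2"}))        # identical sets
print(masi_distance({"u1", "u2"}, {"u1", "u2", "u3"}))  # subset
print(masi_distance({"u1", "u2"}, {"u2", "u3"}))        # partial overlap
print(masi_distance({"u1"}, {"u2"}))                    # disjoint
```

The graded weights let partially overlapping set assignments earn partial credit instead of counting as complete disagreement.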

Book ChapterDOI
TL;DR: This chapter details the systems used for the deposition, annotation and distribution of the data in the Protein Data Bank archive.
Abstract: In 1998, members of the Research Collaboratory for Structural Bioinformatics became the managers of the Protein Data Bank archive. This chapter details the systems used for the deposition, annotation and distribution of the data in the archive. Keywords: databases; nuclear magnetic resonance; NMR; Protein Data Bank; structure validation

Patent
15 May 2006
TL;DR: In this patent, a facility for annotating media files is described, which displays a timeline indicating a duration of the media file, determines that an annotation is associated with the file, and displays in an area near the timeline an indication of the associated annotation.
Abstract: A facility for annotating media files is described. In various embodiments, the facility displays a timeline indicating a duration of the media file, determines that an annotation is associated with the media file, and displays in an area near the timeline an indication of the associated annotation. In various embodiments, the facility displays a timeline indicative of a duration of the media file, receives an indication to add an annotation at an annotation time relative to the duration of the timeline, receives and stores an annotation, associates the annotation with the annotation time, and displays an indication of the stored annotation at an area near the timeline.

Proceedings ArticleDOI
03 Apr 2006
TL;DR: An annotation-oriented data model for the manipulation and querying of both data and annotations, which allows for the specification of annotations on sets of values and for effectively querying the information on their association is introduced.
Abstract: Annotations play a central role in the curation of scientific databases. Despite their importance, data formats and schemas are not designed to manage the increasing variety of annotations. Moreover, DBMS’s often lack support for storing and querying annotations. Furthermore, annotations and data are only loosely coupled. This paper introduces an annotation-oriented data model for the manipulation and querying of both data and annotations. In particular, the model allows for the specification of annotations on sets of values and for effectively querying the information on their association. We use the concept of block to represent an annotated set of values. Different colors applied to the blocks represent different annotations. We introduce a color query language for our model and prove it to be both complete (it can express all possible queries over the class of annotated databases), and minimal (all the algebra operators are primitive). We present MONDRIAN, a prototype implementation of our annotation mechanism, and we conduct experiments that investigate the set of parameters which influence the evaluation cost for color queries.

Proceedings Article
01 May 2006
TL;DR: Talbanken05, a Swedish treebank based on Talbanken76, a syntactically annotated corpus from the 1970s, and converted to modern formats, is introduced.
Abstract: We introduce Talbanken05, a Swedish treebank based on a syntactically annotated corpus from the 1970s, Talbanken76, converted to modern formats. The treebank is available in three different formats, besides the original one: two versions of phrase structure annotation and one dependency-based annotation, all of which are encoded in XML. In this paper, we describe the conversion process and exemplify the available formats. The treebank is freely available for research and educational purposes.

Proceedings Article
01 May 2006
TL;DR: The paper presents the initial release of the Slovene Dependency Treebank, currently containing 2000 sentences or 30,000 words, based on the Prague Dependency Treebank, which serves as an excellent model for annotation due to the similarity of the languages, the existence of a detailed annotation guide and an annotation editor.
Abstract: The paper presents the initial release of the Slovene Dependency Treebank, currently containing 2000 sentences or 30,000 words. Our approach to annotation is based on the Prague Dependency Treebank, which serves as an excellent model due to the similarity of the languages, the existence of a detailed annotation guide and an annotation editor. The initial treebank contains a portion of the MULTEXT-East parallel word-level annotated corpus, namely the first part of the Slovene translation of Orwell’s “1984”. This corpus was first parsed automatically, to arrive at the initial analytic level dependency trees. These were then hand corrected using the tree editor TrEd; simultaneously, the Czech annotation manual was modified for Slovene. The current version is available in XML/TEI, as well as derived formats, and has been used in a comparative evaluation using the MALT parser, and as one of the languages present in the CoNLL-X shared task on dependency parsing. The paper also discusses further work, in the first instance the composition of the corpus to be annotated next.

Patent
18 Jan 2006
TL;DR: In this patent, a system is presented for receiving and distributing annotations of a digital work (e.g., textual or graphical annotations) and displaying indicators to identify content in the digital work for which annotations are available.
Abstract: Methods and systems for receiving and distributing annotations of a digital work (84) include receiving an annotation of the digital work (84), storing the annotation, and providing the annotation to the user. The user may be required to submit a valid authorization credential (90) for the annotation. Annotations may be textual or graphical, and may be associated with particular content in a digital work. Indicators may be displayed to identify content in the digital work for which annotations are available. A user may exchange compensation or perform a specified action for access to an annotation. Some or all of the compensation received for an annotation may be distributed to the author of the annotation. Multiple annotations may be listed in an order based on a criterion, such as ranking, price, or date of receipt. Users that purchase a digital work may automatically receive an authorization credential (90) to receive annotations of the digital work.

Journal ArticleDOI
01 Sep 2006 - Yeast
TL;DR: The ways in which the S. cerevisiae sequence and annotation have changed are discussed, the multiple sources of experimental and comparative data on which these changes are based are considered, and the methods for evaluating, incorporating and documenting these new data are described.
Abstract: The S. cerevisiae genome is the most well-characterized eukaryotic genome and one of the simplest in terms of identifying open reading frames (ORFs), yet its primary annotation has been updated continually in the decade since its initial release in 1996 (Goffeau et al., 1996). The Saccharomyces Genome Database (SGD; www.yeastgenome.org) (Hirschman et al., 2006), the community-designated repository for this reference genome, strives to ensure that the S. cerevisiae annotation is as accurate and useful as possible. At SGD, the S. cerevisiae genome sequence and annotation are treated as a working hypothesis, which must be repeatedly tested and refined. In this paper, in celebration of the tenth anniversary of the completion of the S. cerevisiae genome sequence, we discuss the ways in which the S. cerevisiae sequence and annotation have changed, consider the multiple sources of experimental and comparative data on which these changes are based, and describe our methods for evaluating, incorporating and documenting these new data.

Proceedings ArticleDOI
26 Oct 2006
TL;DR: This paper proposes a novel automatic image annotation method based on manifold ranking learning, in which the visual and textual information are well integrated, and designs a new scheme named the Nearest Spanning Chain (NSC) to generate an adaptive similarity graph.
Abstract: Automatic keyword annotation is a promising solution to enable more effective image search by using keywords. In this paper, we propose a novel automatic image annotation method based on manifold ranking learning, in which the visual and textual information are well integrated. Due to complex and unbalanced data distribution and limited prior information in practice, we design two new schemes to make manifold ranking efficient for image annotation. Firstly, we design a new scheme named the Nearest Spanning Chain (NSC) to generate an adaptive similarity graph, which is robust across data distributions and easy to implement. Secondly, the word-to-word correlations obtained from WordNet and the pairwise co-occurrence are taken into consideration to expand the annotations and prune irrelevant annotations for each image. Experiments conducted on the standard Corel dataset and a web image dataset demonstrate the effectiveness and efficiency of the proposed method for image annotation.

Journal ArticleDOI
TL;DR: Virtual Ribosome is a DNA translation tool with two areas of focus: providing a strong translation tool in its own right, with an integrated ORF finder, and integrating sequence feature annotation, in particular native support for working with files containing intron/exon structure annotation.
Abstract: Virtual Ribosome is a DNA translation tool with two areas of focus: (i) providing a strong translation tool in its own right, with an integrated ORF finder, full support for the IUPAC degenerate DNA alphabet and all translation tables defined by the NCBI taxonomy group, including the use of alternative start codons; and (ii) integration of sequence feature annotation, in particular native support for working with files containing intron/exon structure annotation. The software is available for both download and online use at http://www.cbs.dtu.dk/services/VirtualRibosome/.
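
To make the two core operations concrete, here is a minimal sketch of codon-by-codon translation and a forward-strand ORF scan using the standard genetic code (NCBI translation table 1). It is not Virtual Ribosome's code: the function names are hypothetical, and it leaves out the degenerate IUPAC alphabet, alternative tables and start codons, the reverse strand, and intron/exon handling that the abstract lists.

```python
from itertools import product

# Standard genetic code (NCBI table 1), amino acids listed in TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA codon by codon; 'X' marks codons not in the table."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def find_orfs(dna, min_codons=2):
    """Yield (start, end, protein) for ATG...stop stretches in the three
    forward reading frames; end is the index just past the stop codon."""
    dna = dna.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3, translate(dna[start:i])
                start = None


print(translate("ATGGCCTAA"))
print(list(find_orfs("CCATGGCCAAATAGGG")))
```

Building the table from the 64-character string keeps the sketch short; swapping in another NCBI table would only mean replacing `AMINO_ACIDS`.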

Journal ArticleDOI
TL;DR: Some of the recent and popular approaches developed in Bioinformatics to predict functions for hypothetical proteins are discussed, including automated genome sequence analysis and annotation.
Abstract: The complete human genome sequences in the public database provide ways to understand the blueprint of life. As of June 29, 2006, complete genomes are available for 27 archaea, 326 bacteria and 21 eukaryotes, and sequencing is in progress for 316 bacterial, 24 archaeal and 126 eukaryotic genomes. Traditional biochemical and molecular experiments can assign accurate functions to genes in these genomes. However, the process is time-consuming and costly. Despite several efforts, only 50-60% of genes have been annotated in most completely sequenced genomes. Automated genome sequence analysis and annotation may provide ways to understand genomes. Thus, determination of protein function is one of the challenging problems of the post-genome era. This requires bioinformatics to develop efficient tools for predicting the functions of un-annotated protein sequences. Here, we discuss some of the recent and popular approaches developed in bioinformatics to predict functions for hypothetical proteins.

Journal ArticleDOI
TL;DR: AGMIAL is a genome annotation system for prokaryotes, built on the W3 standard Web services framework, that is used by several INRA laboratories to analyze genomes of bacteria relevant to the food-processing industry.
Abstract: We have implemented a genome annotation system for prokaryotes called AGMIAL. Our approach embodies a number of key principles. First, expert manual annotators are seen as a critical component of the overall system; user interfaces were cyclically refined to satisfy their needs. Second, the overall process should be orchestrated in terms of a global annotation strategy; this facilitates coordination between a team of annotators and automatic data analysis. Third, the annotation strategy should allow progressive and incremental annotation from a time when only a few draft contigs are available, to when a final finished assembly is produced. The overall architecture employed is modular and extensible, being based on the W3 standard Web services framework. Specialized modules interact with two independent core modules that are used to annotate, respectively, genomic and protein sequences. AGMIAL is currently being used by several INRA laboratories to analyze genomes of bacteria relevant to the food-processing industry, and is distributed under an open source license.

Journal ArticleDOI
TL;DR: An ontology-based approach to automatic annotation of learning objects’ (LOs) content units, tested in TANGRAM, an integrated learning environment for the domain of Intelligent Information Systems, that provides a solution for automatic metadata generation for LOs’ components.
Abstract: This paper presents an ontology-based approach to automatic annotation of learning objects’ (LOs) content units that we tested in TANGRAM, an integrated learning environment for the domain of Intelligent Information Systems. The approach does not primarily focus on automatic annotation of entire LOs, as other relevant solutions do. Instead, it provides a solution for automatic metadata generation for LOs’ components (i.e., smaller, potentially reusable, content units). Here we mainly report on the content-mining algorithms and heuristics applied for determining values of certain metadata elements used to annotate content units. Specifically, the focus is on the following elements: title, description, unique identifier, subject (based on a domain ontology), and pedagogical role (based on an ontology of pedagogical roles). Additionally, as TANGRAM is grounded on an LO content structure ontology that drives the process of an LO decomposition into its constituent content units, each thus generated content unit is implicitly semantically annotated with its role/position in the LO’s structure. Employing such semantic annotations, TANGRAM allows assembling content units into new LOs personalized to the users’ goals, preferences, and learning styles. In order to provide the evaluation of the proposed solution, we describe our experiences with automatic annotation of slide presentations, one of the most common LO types.

Proceedings Article
01 Jan 2006
TL;DR: A set of music-scene descriptions for the RWC Music Database, consisting of the beat structure, melody line, and chorus sections, is annotated and named the AIST Annotation.
Abstract: In this paper, we introduce our activities regarding the manual annotation of the musical pieces of the RWC Music Database. Although the RWC Music Database is widely used, its annotated descriptions are not widely available. We therefore annotated a set of music-scene descriptions consisting of the beat structure, melody line, and chorus sections. We call this AIST Annotation. We also manually synchronized standard MIDI files with the corresponding audio signals at the beat level. We hope that the AIST Annotation will contribute to further advances in the field of music information processing.

Proceedings Article
01 May 2006
TL;DR: The SALTO tool was originally developed for the annotation of semantic roles in the frame semantics paradigm, but can be used for graphical annotation of treebanks with general relational information in a simple drag-and-drop fashion.
Abstract: In this paper, we describe the SALTO tool. It was originally developed for the annotation of semantic roles in the frame semantics paradigm, but can be used for graphical annotation of treebanks with general relational information in a simple drag-and-drop fashion. The tool additionally supports corpus management and quality control.

Journal ArticleDOI
TL;DR: The evaluation of the prototype system on 17,000 images and 7736 automatically extracted annotation words from crawled Web pages for multi-modal image retrieval has indicated that the proposed semantic model and the developed Bayesian framework are superior to a state-of-the-art peer system in the literature.
Abstract: This paper addresses the automatic image annotation problem and its application to multi-modal image retrieval. The contribution of our work is three-fold. (1) We propose a probabilistic semantic model in which the visual features and the textual words are connected via a hidden layer which constitutes the semantic concepts to be discovered to explicitly exploit the synergy among the modalities. (2) The association of visual features and textual words is determined in a Bayesian framework such that the confidence of the association can be provided. (3) Extensive evaluation on a large-scale, visually and semantically diverse image collection crawled from the Web is reported to evaluate the prototype system based on the model. In the proposed probabilistic model, a hidden concept layer which connects the visual feature layer and the word layer is discovered by fitting a generative model to the training images and annotation words through an Expectation-Maximization (EM) based iterative learning procedure. The evaluation of the prototype system on 17,000 images and 7736 automatically extracted annotation words from crawled Web pages for multi-modal image retrieval has indicated that the proposed semantic model and the developed Bayesian framework are superior to a state-of-the-art peer system in the literature.
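
As a rough illustration of the hidden concept layer described in this abstract, the following Python sketch fits a pLSA-style mixture p(f, w) = Σ_z P(z) P(f|z) P(w|z) with EM and then scores words for a visual feature via P(w|f) = Σ_z P(w|z) P(z|f). It is not the authors' implementation: the abstract does not give the model's parametrization, so the sketch assumes visual features have been quantized into discrete tokens, and all function names, the toy data, and the smoothing constant `eps` are hypothetical.

```python
import math
import random


def _random_dist(rng, n):
    """Return a random strictly positive probability vector of length n."""
    raw = [rng.random() + 0.1 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def init_params(n_feats, n_words, n_concepts, seed=0):
    """Randomly initialise P(z), P(f|z) and P(w|z) (hypothetical helper)."""
    rng = random.Random(seed)
    p_z = _random_dist(rng, n_concepts)
    p_f_z = [_random_dist(rng, n_feats) for _ in range(n_concepts)]
    p_w_z = [_random_dist(rng, n_words) for _ in range(n_concepts)]
    return p_z, p_f_z, p_w_z


def log_likelihood(pairs, params):
    """Log-likelihood of observed (feature, word) pairs under the mixture."""
    p_z, p_f_z, p_w_z = params
    return sum(
        math.log(sum(p_z[z] * p_f_z[z][f] * p_w_z[z][w] for z in range(len(p_z))))
        for f, w in pairs
    )


def em_step(pairs, params, eps=1e-9):
    """One EM iteration; eps is a tiny smoothing term that avoids exact zeros."""
    p_z, p_f_z, p_w_z = params
    k, n_f, n_w = len(p_z), len(p_f_z[0]), len(p_w_z[0])
    z_cnt = [0.0] * k
    f_cnt = [[0.0] * n_f for _ in range(k)]
    w_cnt = [[0.0] * n_w for _ in range(k)]
    for f, w in pairs:
        # E-step: responsibility P(z | f, w) of each hidden concept.
        joint = [p_z[z] * p_f_z[z][f] * p_w_z[z][w] for z in range(k)]
        total = sum(joint)
        for z in range(k):
            r = joint[z] / total
            z_cnt[z] += r
            f_cnt[z][f] += r
            w_cnt[z][w] += r
    # M-step: re-estimate each distribution from the expected counts.
    new_p_z = [(c + eps) / (len(pairs) + k * eps) for c in z_cnt]
    new_p_f_z = [[(f_cnt[z][f] + eps) / (z_cnt[z] + n_f * eps) for f in range(n_f)]
                 for z in range(k)]
    new_p_w_z = [[(w_cnt[z][w] + eps) / (z_cnt[z] + n_w * eps) for w in range(n_w)]
                 for z in range(k)]
    return new_p_z, new_p_f_z, new_p_w_z


def word_posterior(f, params):
    """P(w | f) = sum_z P(w|z) P(z|f): a confidence score for every word."""
    p_z, p_f_z, p_w_z = params
    weights = [p_z[z] * p_f_z[z][f] for z in range(len(p_z))]
    total = sum(weights)
    p_z_given_f = [x / total for x in weights]
    return [sum(p_z_given_f[z] * p_w_z[z][w] for z in range(len(p_z)))
            for w in range(len(p_w_z[0]))]


# Toy data (made up): features 0-1 co-occur with words 0-1, features 2-3 with words 2-3.
pairs = [(f, w) for f in (0, 1) for w in (0, 1)]
pairs += [(f, w) for f in (2, 3) for w in (2, 3)]
pairs = pairs * 5

params = init_params(n_feats=4, n_words=4, n_concepts=2)
for _ in range(100):
    params = em_step(pairs, params)
```

Calling `word_posterior(0, params)` returns a probability vector over the four toy words, which plays the role of the association confidence mentioned in point (2) of the abstract. Like any EM fit, the result depends on the random initialisation and may settle in a local optimum.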

Journal ArticleDOI
TL;DR: This work presents similarities and differences with respect to other approaches for metadata creation, and describes languages and tools that can be used to implement these annotations.
Abstract: Metadata is used to describe documents and applications, improving information seeking and retrieval and its understanding and use. Metadata can be expressed in a wide variety of vocabularies and languages, and can be created and maintained with a variety of tools. Ontology based annotation refers to the process of creating metadata using ontologies as their vocabularies. We present similarities and differences with respect to other approaches for metadata creation, and describe languages and tools that can be used to implement these annotations.
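
To make the idea of using an ontology as the annotation vocabulary concrete, here is a minimal Python sketch. The ontology, the term names, and the helper functions are all hypothetical and are not drawn from the paper, which surveys real languages and tools. The sketch only shows two consequences of the definition: a term outside the ontology is rejected, and an accepted term carries its more general classes with it.

```python
# Hypothetical toy ontology: each term maps to its parent class (None for a root).
ONTOLOGY = {
    "Document": None,
    "Article": "Document",
    "JournalArticle": "Article",
    "Image": None,
    "Photograph": "Image",
}


def ancestors(term):
    """Return the chain of more general terms, nearest parent first."""
    chain = []
    parent = ONTOLOGY[term]
    while parent is not None:
        chain.append(parent)
        parent = ONTOLOGY[parent]
    return chain


def annotate_resource(resource_id, term):
    """Create a metadata record whose vocabulary is restricted to the ontology."""
    if term not in ONTOLOGY:
        raise ValueError(f"'{term}' is not a term of the ontology")
    return {
        "resource": resource_id,
        "type": term,
        "inferred_types": ancestors(term),
    }


record = annotate_resource("doc-1", "JournalArticle")
```

Here `record["inferred_types"]` is `["Article", "Document"]`, so a search for all documents can also find resources annotated with a more specific term. Free-text keyword tagging gives no such inference.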

Book Chapter
01 May 2006
TL;DR: The Photocopain system is described, a semi-automatic image annotation system which combines information about the context in which a photograph was captured with information from other readily available sources in order to generate outline annotations for that photograph that the user may further extend or amend.
Abstract: Photo annotation is a resource-intensive task, yet is increasingly essential as image archives and personal photo collections grow in size. There is an inherent conflict in the process of describing and archiving personal experiences, because casual users are generally unwilling to expend large amounts of effort on creating the annotations which are required to organise their collections so that they can make best use of them. This paper describes the Photocopain system, a semi-automatic image annotation system which combines information about the context in which a photograph was captured with information from other readily available sources in order to generate outline annotations for that photograph that the user may further extend or amend.

Journal ArticleDOI
10 Jul 2006
TL;DR: An ontology driven approach to the representation of all the Robot Scientist's data and metadata is applied, based on a general ontology of experiments, which aids the curation and annotating of the experimental data and metadata, and the equipment metadata, and supports the design of database systems to hold the data and metadata.
Abstract: Motivation: A Robot Scientist is a physically implemented robotic system that can automatically carry out cycles of scientific experimentation. We are commissioning a new Robot Scientist designed to investigate gene function in S. cerevisiae. This Robot Scientist will be capable of initiating >1,000 experiments, and making >200,000 observations a day. Robot Scientists provide a unique test bed for the development of methodologies for the curation and annotation of scientific experiments: because the experiments are conceived and executed automatically by computer, it is possible to completely capture and digitally curate all aspects of the scientific process. This new ability brings with it significant technical challenges. To meet these we apply an ontology driven approach to the representation of all the Robot Scientist’s data and metadata. Results: We demonstrate the utility of developing an ontology for our new Robot Scientist. This ontology is based on a general ontology of experiments. The ontology aids the curation and annotating of the experimental data and metadata, and the equipment metadata, and supports the design of database systems to hold the data and metadata. Availability: EXPO in XML and OWL formats is at:. All materials about the Robot Scientist project are available at: . Contact: [email protected]