Showing papers on "Annotation" published in 2001


Proceedings ArticleDOI
01 Apr 2001
TL;DR: The paper presents the overall design of Annotea and describes some of the issues the project faced and how it has solved them, including combining RDF with XPointer, XLink, and HTTP.
Abstract: Annotea is a Web-based shared annotation system based on a general-purpose open RDF infrastructure, where annotations are modeled as a class of metadata. Annotations are viewed as statements made by an author about a Web document. Annotations are external to the documents and can be stored in one or more annotation servers. One of the goals of this project has been to re-use as much existing W3C technology as possible. We have reached it mostly by combining RDF with XPointer, XLink, and HTTP. We have also implemented an instance of our system using the Amaya editor/browser and a generic RDF database, accessible through an Apache HTTP server. In this implementation, the merging of annotations with documents takes place within the client. The paper presents the overall design of Annotea and describes some of the issues we have faced and how we have solved them.
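
The idea of an annotation as a set of external RDF statements about a document, merged with the document on the client, can be illustrated with a minimal sketch. The `ex:` property names, the URLs, and the in-memory `store` below are hypothetical stand-ins chosen for illustration; they are not Annotea's actual schema, server, or protocol.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical annotation store: statements live outside the documents.
store: List[Triple] = [
    Triple("ex:ann1", "rdf:type", "ex:Annotation"),
    Triple("ex:ann1", "ex:annotates", "http://example.org/doc.html"),
    # An XPointer-style fragment says which part of the document is annotated.
    Triple("ex:ann1", "ex:context", "http://example.org/doc.html#xpointer(id('intro'))"),
    Triple("ex:ann1", "ex:author", "Alice"),
    Triple("ex:ann1", "ex:body", "http://example.org/notes/1.html"),
    Triple("ex:ann2", "ex:annotates", "http://example.org/other.html"),
]


def annotations_for(doc_url: str, triples: List[Triple]) -> Dict[str, Dict[str, str]]:
    """Collect every annotation whose target is doc_url, as a client would before merging."""
    ids = [t.subject for t in triples
           if t.predicate == "ex:annotates" and t.obj == doc_url]
    return {i: {t.predicate: t.obj for t in triples if t.subject == i} for i in ids}


found = annotations_for("http://example.org/doc.html", store)
```

In a real deployment the lookup would presumably be an HTTP query to an annotation server rather than a list scan; the sketch only shows that the document itself is never modified.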

765 citations


Journal ArticleDOI
TL;DR: The Comprehensive Microbial Resource (http://cmr.jcvi.org) provides a web-based central resource for the display, search and analysis of the sequence and annotation for complete and publicly available bacterial and archaeal genomes.
Abstract: The Comprehensive Microbial Resource or CMR (http://cmr.jcvi.org) provides a web-based central resource for the display, search and analysis of the sequence and annotation for complete and publicly available bacterial and archaeal genomes. In addition to displaying the original annotation from GenBank, the CMR makes available secondary automated structural and functional annotation across all genomes to provide consistent data types necessary for effective mining of genomic data. Precomputed homology searches are stored to allow meaningful genome comparisons. The CMR supplies users with over 50 different tools to utilize the sequence and annotation data across one or more of the 571 currently available genomes. At the gene level users can view the gene annotation and underlying evidence. Genome level information includes whole genome graphical displays, biochemical pathway maps and genome summary data. Comparative tools display analysis between genomes with homology and genome alignment tools, and searches across the accessions, annotation, and evidence assigned to all genes/genomes are available. The data and tools on the CMR aid genomic research and analysis, and the CMR is included in over 200 scientific publications. The code underlying the CMR website and the CMR database are freely available for download with no license restrictions.

502 citations


Journal ArticleDOI
TL;DR: The Distributed Annotation System (DAS) allows sequence annotations to be decentralized among multiple third-party annotators and integrated on an as-needed basis by client-side software.
Abstract: Background: Currently, most genome annotation is curated by centralized groups with limited resources. Efforts to share annotations transparently among multiple groups have not yet been satisfactory.

442 citations


Book ChapterDOI
12 Mar 2001
TL;DR: Houdini is presented, an annotation assistant for the modular checker ESC/Java, which generates a large number of candidate annotations and uses ESC/Java to verify or refute each of these annotations.
Abstract: A static program checker that performs modular checking can check one program module for errors without needing to analyze the entire program. Modular checking requires that each module be accompanied by annotations that specify the module. To help reduce the cost of writing specifications, this paper presents Houdini, an annotation assistant for the modular checker ESC/Java. To infer suitable ESC/Java annotations for a given program, Houdini generates a large number of candidate annotations and uses ESC/Java to verify or refute each of these annotations. The paper describes the design, implementation, and preliminary evaluation of Houdini.
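
The generate-then-refute idea described above can be sketched as a small loop. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_refute` and `PROVABLE_GIVEN` are hypothetical stand-ins for a real checker such as ESC/Java, and repeating the check until nothing more is refuted is an assumption about how the candidates are pruned, not something the abstract spells out.

```python
from typing import Callable, Dict, FrozenSet, Set


def prune_candidates(candidates: Set[str],
                     refute: Callable[[FrozenSet[str]], Set[str]]) -> Set[str]:
    """Drop refuted candidates until the checker accepts all that remain."""
    current = set(candidates)
    while True:
        refuted = refute(frozenset(current)) & current
        if not refuted:
            return current
        current -= refuted


# Hypothetical checker model: each annotation is provable only if the
# annotations it relies on are still assumed. "w > 0" has no entry, so
# the toy checker always refutes it.
PROVABLE_GIVEN: Dict[str, Set[str]] = {
    "x != null": set(),
    "y >= 0": {"x != null"},
    "z > 0": {"w > 0"},
}


def toy_refute(assumed: FrozenSet[str]) -> Set[str]:
    return {a for a in assumed
            if a not in PROVABLE_GIVEN or not PROVABLE_GIVEN[a] <= assumed}


surviving = prune_candidates({"x != null", "y >= 0", "z > 0", "w > 0"}, toy_refute)
```

Here "w > 0" is refuted in the first pass and "z > 0", which relied on it, in the second, which is why a single pass would not be enough in this toy model.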

423 citations


Journal ArticleDOI
TL;DR: A wide variety of existing annotation formats are surveyed and a common conceptual core, the annotation graph, is demonstrated, which provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.

398 citations


Journal ArticleDOI
TL;DR: The aim of high-quality annotation is to identify the key features of the genome — in particular, the genes and their products.
Abstract: The genome sequence of an organism is an information resource unlike any that biologists have previously had access to. But the value of the genome is only as good as its annotation. It is the annotation that bridges the gap from the sequence to the biology of the organism. The aim of high-quality annotation is to identify the key features of the genome - in particular, the genes and their products. The tools and resources for annotation are developing rapidly, and the scientific community is becoming increasingly reliant on this information for all aspects of biological research.

370 citations


Patent
28 Sep 2001
TL;DR: A data structure for annotating data files within a database is provided; the annotation data comprises a phoneme and word lattice that allows quick and efficient searching of the data files in response to a user's input query.
Abstract: A data structure is provided for annotating data files within a database. The annotation data comprises a phoneme and word lattice which allows the quick and efficient searching of data files within the database in response to a user's input query. The structure of the annotation data is such that it allows the input query to be made by voice and can be used for annotating various kinds of data files, such as audio data files, video data files, multimedia data files etc. The annotation data may be generated from the data files themselves or may be input by the user either from a voiced input or from a typed input.

314 citations


Journal ArticleDOI
TL;DR: The number of potential errors in the prediction of detailed functions is higher than is usually believed in the annotation of microbial genomes.

307 citations


Proceedings Article
30 Jul 2001
TL;DR: CREAM (Creating RElational, Annotation-based Metadata), a framework for an annotation environment that allows the construction of relational metadata, i.e. metadata that comprises class instances and relationship instances, is presented.
Abstract: Richly interlinked, machine-understandable data constitutes the basis for the Semantic Web. Annotating web documents is one of the major techniques for creating metadata on the Web. However, annotation tools so far are restricted in their capabilities of providing richly interlinked and truly machine-understandable data. They basically allow the user to annotate with plain text according to a template structure, such as Dublin Core. Here we present CREAM (Creating RElational, Annotation-based Metadata), a framework for an annotation environment that allows the construction of relational metadata, i.e. metadata that comprises class instances and relationship instances. These instances are not based on a fixed structure, but on a domain ontology. We discuss some of the requirements one has to meet when developing such a framework, e.g. the integration of a metadata crawler, inference services, document management and information extraction, and describe its implementation, viz. Ont-O-Mat, a component-based, ontology-driven annotation tool.

203 citations


Proceedings Article
01 Jan 2001
TL;DR: A novel approach to semi-automatically and progressively annotating images with keywords is presented, and a preliminary user study is described showing that users view annotations as important and will likely use them in image retrieval.
Abstract: A novel approach to semi-automatically and progressively annotating images with keywords is presented. The progressive annotation process is embedded in the course of integrated keyword-based and content-based image retrieval and user feedback. When the user submits a keyword query and then provides relevance feedback, the search keywords are automatically added to the images that receive positive feedback and can then facilitate keyword-based image retrieval in the future. The coverage and quality of image annotation in such a database system is improved progressively as the cycle of search and feedback increases. The strategy of semi-automatic image annotation is better than manual annotation in terms of efficiency and better than automatic annotation in terms of accuracy. A performance study is presented which shows that high annotation coverage can be achieved with this approach, and a preliminary user study is described showing that users view annotations as important and will likely use them in image retrieval. The user study also suggested user interface enhancements needed to support relevance feedback. We believe that similar approaches could also be applied to annotating and managing other forms of multimedia objects.
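
The feedback cycle described above, in which query keywords are attached to images that receive positive feedback, can be sketched as follows. The class and method names are hypothetical, and the content-based retrieval step that would surface unannotated images to the user is not modeled; the user's positive choices are simply passed in.

```python
from typing import Dict, Iterable, List, Set


class KeywordIndex:
    """Toy image database whose keyword annotations grow through feedback."""

    def __init__(self, image_ids: Iterable[str]) -> None:
        self.keywords: Dict[str, Set[str]] = {i: set() for i in image_ids}

    def search(self, query: Iterable[str]) -> List[str]:
        """Keyword-based retrieval: images sharing at least one query keyword."""
        wanted = set(query)
        return sorted(i for i, kws in self.keywords.items() if kws & wanted)

    def record_feedback(self, query: Iterable[str], positive_ids: Iterable[str]) -> None:
        """Attach the query keywords to every image the user marked as relevant."""
        terms = set(query)
        for image_id in positive_ids:
            self.keywords[image_id].update(terms)

    def coverage(self) -> float:
        """Fraction of images that carry at least one keyword."""
        annotated = sum(1 for kws in self.keywords.values() if kws)
        return annotated / len(self.keywords)


index = KeywordIndex(["a", "b", "c", "d"])
before = index.search(["sunset"])            # nothing is annotated yet
index.record_feedback(["sunset"], ["a", "c"])
after = index.search(["sunset"])
```

Each search-and-feedback round can only add keywords, so coverage in this sketch never decreases, which mirrors the progressive improvement the abstract describes.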

183 citations


Journal ArticleDOI
TL;DR: The statistical evaluation of the generated rules by cross-validation suggests that by applying them on arbitrary proteins 33% of their keyword annotation can be generated with an error rate of 1.5%, and the coverage rate can be increased to 60% by tolerating a higher error rate.
Abstract: Motivation: The gap between the amount of newly submitted protein data and reliable functional annotation in public databases is growing. Traditional manual annotation by literature curation and sequence analysis tools without the use of automated annotation systems is not able to keep up with the ever increasing quantity of data that is submitted. Automated supplements to manually curated databases such as TrEMBL or GenPept cover raw data but provide only limited annotation. To improve this situation automatic tools are needed that support manual annotation, automatically increase the amount of reliable information and help to detect inconsistencies in manually generated annotations. Results: A standard data mining algorithm was successfully applied to gain knowledge about the Keyword annotation in SWISS-PROT. 11 306 rules were generated, which are provided in a database and can be applied to yet unannotated protein sequences and viewed using a web browser. They rely on the taxonomy of the organism, in which the protein was found and on signature matches of its sequence. The statistical evaluation of the generated rules by cross-validation suggests that by applying them on arbitrary proteins 33% of their keyword annotation can be generated with an error rate of 1.5%. The coverage rate of the keyword annotation can be increased to 60% by tolerating a higher error rate of 5%. Availability: The results of the automatic data mining process can be browsed on http://golgi.ebi.ac.uk:8080/Spearmint/. Source code is available upon request.

Proceedings ArticleDOI
01 Mar 2001
TL;DR: How users react to lost annotations, the relationship between types of document modifications and user expectations, and whether users pay attention to text surrounding their annotations are explored, which could contribute substantially to effective digital document annotation systems.
Abstract: Increasingly, documents exist primarily in digital form. System designers have recently focused on making it easier to read digital documents, with annotation as an important new feature. But supporting annotation well is difficult because digital documents are frequently modified, making it challenging to correctly reposition annotations in modified versions. Few systems have addressed this issue, and even fewer have approached the problem from the users' point of view. This paper reports the results of two studies examining user expectations for robust annotation positioning in modified documents. We explore how users react to lost annotations, the relationship between types of document modifications and user expectations, and whether users pay attention to text surrounding their annotations. Our results could contribute substantially to effective digital document annotation systems.

Patent
30 Aug 2001
TL;DR: A system and method for adding hyperlinked information to a television broadcast system, comprising a video source providing video information, an annotation system, and an augmented video information transmission generator that receives the annotation data, the video information, and the annotation data timing information.
Abstract: The invention features a system and method for adding hyperlinked information to a television broadcast system. The system includes a video source providing video information and an annotation system. The annotation system generates annotation data to be associated with the video information and generates annotation data timing information. The hyperlinked broadcast system also includes an augmented video information transmission generator that receives the annotation data, the video information, and the annotation data timing information. The augmented video information transmission generator generates an augmented video transmission signal including the annotation data, the annotation data timing information and the video information. In operation, the augmented video information transmission generator associates the video information with the annotation data using the annotation data timing information. A receiver displays the annotation information associated with the video signal on a frame by frame basis.

Patent
28 Sep 2001
TL;DR: A method for integrated retrieval and annotation of stored images involves running a user application in which text entered by a user is continuously monitored to isolate the context expressed by the text.
Abstract: A method for integrated retrieval and annotation of stored images involves running a user application in which text entered by a user is continuously monitored to isolate the context expressed by the text. The context is matched with metadata associated with the stored images, thereby providing one or more matched images, and the matched images are retrieved and displayed in proximity with the text. The context is then utilized to provide suggested annotations to the user for the matched images, together with the capability of selecting certain of the suggested annotations for subsequent association with the matched images. In a further extension, the method provides the user with the capability of inserting selected ones of the matched images into the text of the application, and further provides for automatically updating the metadata for the matched images.

Patent
07 Dec 2001
TL;DR: A collaborative annotation system for facilitating annotations of time-based media, such as video, by users is disclosed. It involves displaying and controlling the display of a time-based medium, receiving and storing input for defining a location in the time-based medium, and performing and storing a valuation relating to the annotation.
Abstract: A collaborative annotation system for facilitating annotations, such as commentaries, of time-based media, such as video, by users is disclosed. The system involves displaying and controlling the display of a time-based medium, and receiving and storing input for defining a location in the time-based medium. The system also involves receiving and storing an annotation relating to the context of the location, and performing and storing a valuation relating to the annotation.

Patent
13 Aug 2001
TL;DR: A system providing a user interface to annotate different items in a media production system, such as a digital non-linear post production system, is described; parts of the production, such as clips, frames and layers, that have an associated annotation are provided with a visual annotation marker.
Abstract: A system providing a user interface to annotate different items in a media production system such as in a digital non-linear post production system. Parts of the production, such as clips, frames and layers, that have an associated annotation are provided with a visual annotation marker. The annotation marker can use shape, color or animation to convey source, urgency, status or other information. Annotations can be text, freehand drawing, audio, or other. Annotations can be automatically generated. Annotations can be compiled into records, searched and transferred. A state of an application program can be stored and transferred to a remote system. The remote system attempts to recreate the original state of the application program. If the remote system is unable to do so, an image of the state of the application program is obtained, instead. Assignment of control to various functions of an application program is achieved by associating a function (i.e., modifying a parameter) with a user control at a remote location.

Patent
06 Jun 2001
TL;DR: A trace address register is configured to store an address corresponding to a trace unit, and an annotation enable bit is used to indicate whether annotation transactions are to be generated.
Abstract: A method and mechanism for annotating a transaction stream. A processing unit is configured to generate annotation transactions which are inserted into a transaction stream. The transaction stream, including the annotations, are subsequently observed by a trace unit for debug or other analysis. In one embodiment, a processing unit includes a trace address register and an annotation enable bit. The trace address register is configured to store an address corresponding to a trace unit and the enable bit is configured to indicate whether annotation transactions are to be generated. Annotation instructions are added to operating system or user code at locations where annotations are desired. In one embodiment, annotation transactions correspond to transaction types which are not unique to annotation transactions. In one embodiment, an annotation instruction includes a reference to the trace address register which contains the address of the trace unit. Upon detecting the annotation instruction, and detecting annotations are enabled, the processing unit generates an annotation transaction addressed to the trace unit. In one embodiment, annotation transactions may be used to indicate context switches, processor mode changes, timestamps, or address translation information.

Patent
06 Apr 2001
TL;DR: A method and apparatus for annotating an image is disclosed, in which the image is displayed together with a plurality of icons, each associated with metadata, and the metadata associated with a selected icon is stored as an annotation of the image.
Abstract: A method and apparatus for annotating an image (407) is disclosed. The image (407) and a plurality of icons (403) are displayed such that each icon is associated with metadata. At least one of the icons is selected depending on at least one subject of the image (407) and the metadata associated with the selected icon is stored as an annotation of the subject of the image.

Proceedings Article
01 Jan 2001
TL;DR: A Semantic Annotation Tool is described that extracts knowledge structures from web pages through the use of simple user-defined knowledge extraction patterns, with the final aim of supporting ontology population by using its information extraction component.
Abstract: This paper describes a Semantic Annotation Tool for extraction of knowledge structures from web pages through the use of simple user-defined knowledge extraction patterns. The semantic annotation tool contains: an ontology-based mark-up component which allows the user to browse and to mark-up relevant pieces of information; a learning component (Crystal from the University of Massachusetts at Amherst) which learns rules from examples; and an information extraction component which extracts the objects and relations between these objects. Our final aim is to provide support for ontology population by using the information extraction component. Our system uses as its domain of study “KMi Planet”, a Web-based news server that helps to communicate relevant information between members in our institute.

Journal ArticleDOI
TL;DR: The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database.
Abstract: The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. The Protein Information Resource (PIR) for over three decades has been a community resource that provides protein databases and analysis tools to support research on molecular evolution, functional genomics and computational biology. The PIR, along with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), maintains and distributes the PIR-International Protein Sequence Database, the most comprehensive, well-annotated and non-redundant protein sequence database in the public domain.
To further support genomic and proteomic research, we have greatly improved our bioinformatics infrastructure in the last 2 years, which allows us: (i) to continue to provide high quality protein sequence data and annotation, while keeping pace with the large influx of data being generated by genome sequencing projects; (ii) to develop an integrated system of protein databases and analytical tools for expert annotation and knowledge discovery; and (iii) to improve accessibility of our resource and interoperability of our databases. Some key developments include: highly-automated protein sequence classification and annotation, enhanced web site with many new search engines and functionality for protein data mining and analysis, a new integrated classification database that provides comprehensive descriptions of family relationships and functional/structural annotations, database migration into Oracle 8i object-relational database system and database distribution in XML format.

Journal ArticleDOI
TL;DR: The MATE workbench is a program which provides support for the annotation of speech and text, and provides facilities for flexible display and editing of such annotations, and complex querying of a resulting corpus.

Proceedings Article
01 Jan 2001
TL;DR: The development of a fast, robust and highly usable annotation tool for the annotation of XML-encoded multi-modal language corpora was a major objective of the work presented.
Abstract: We present a tool for the annotation of XML-encoded multi-modal language corpora. Non-hierarchical data is supported by means of standoff annotation. We define base level and suprabase level elements and theory-independent markables for multi-modal annotation and apply them to a cospecification annotation scheme. We also describe how arbitrary annotation schemes can be represented in terms of these elements. Apart from theoretical considerations, however, the development of a fast, robust and highly usable annotation tool was a major objective of the work presented.
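
Standoff annotation, as mentioned above, keeps annotations apart from the base data and points into it by position, which is what lets overlapping, non-hierarchical spans coexist. The sketch below is a generic illustration of that idea; the `Markable` fields, layer names, and token list are hypothetical and do not reproduce the tool's actual XML format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Markable:
    layer: str   # e.g. a speech layer or a gesture layer
    start: int   # index of the first base-level token covered
    end: int     # index one past the last token covered
    label: str


# Base level: the tokens themselves are stored once and never modified.
tokens: List[str] = ["She", "put", "it", "there", "and", "pointed"]

# Standoff level: annotations refer to token positions instead of wrapping text.
markables = [
    Markable("speech", 0, 4, "utterance-1"),
    Markable("gesture", 3, 6, "pointing"),
]


def text_of(m: Markable) -> str:
    return " ".join(tokens[m.start:m.end])


def overlaps(a: Markable, b: Markable) -> bool:
    return a.start < b.end and b.start < a.end


def contains(a: Markable, b: Markable) -> bool:
    return a.start <= b.start and b.end <= a.end


speech, gesture = markables
crossing = (overlaps(speech, gesture)
            and not contains(speech, gesture)
            and not contains(gesture, speech))
```

The two spans cross without either containing the other, so they could not both be expressed as nested inline elements in a single XML tree, while the standoff representation handles them without difficulty.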

Journal ArticleDOI
TL;DR: The Open Archives Initiative (OAI) is dedicated to solving problems of digital library interoperability, focusing on defining simple protocols, most recently for the exchange of metadata from archives.
Abstract: The Open Archives Initiative (OAI) is dedicated to solving problems of digital library interoperability. Its focus has been on defining simple protocols, most recently for the exchange of metadata from archives. The OAI evolved out of a need to increase access to scholarly publications by supporting the creation of interoperable digital libraries. As a first step towards such interoperability, a metadata harvesting protocol was developed to support the streaming of metadata from one repository to another, ultimately to a provider of user services such as browsing, searching, or annotation. This article provides an overview of the mission, philosophy, and technical framework of the OAI.

Patent
04 Jun 2001
TL;DR: A system and method is presented that uses metadata to create an association between key words in textual files or files containing text, key objects in image files or pictures, and key names associated with those files, and the files or their file names.
Abstract: The present invention is directed to a system and method which uses metadata to create an association between key words in textual files or files containing text; key objects in image files or pictures; and key names associated with textual files, files containing text, image files and picture files and the files or their file names. Key words in textual files or files containing text can be identified by the user or through semantics processing. Key objects in image and picture files can be identified by the user or through object recognition software. Key names in textual files, files containing text, image files and picture files are identified by a narrative or other spoken words given by the user to the processing system with respect to specific pictures.

Proceedings ArticleDOI
Milind Naphade, Ching-Yung Lin, John R. Smith, Belle L. Tseng, Sankar Basu
TL;DR: A video annotation tool that has been developed for the purpose of annotating generic video sequences in the context of a recent video-TREC benchmarking exercise is described and it is shown how active learning strategy can be potentially implemented in this context to further improve the performance of the annotation tool.
Abstract: Model-based approach to video retrieval requires ground-truth data for training the models. This leads to the development of video annotation tools that allow users to annotate each shot in the video sequence as well as to identify and label scenes, events, and objects by applying the labels at the shot-level. The annotation tool considered here also allows the user to associate the object-labels with an individual region in a key-frame image. However, the abundance of video data and diversity of labels make annotation a difficult and overly expensive task. To combat this problem, we formulate the task of annotation in the framework of supervised training with partially labeled data by viewing it as an exercise in active learning. In this scenario, one first trains a classifier with a small set of labeled data, and subsequently updates the classifier by selecting the most informative, or most uncertain subset of the available data-set. Consequently, propagation of labels to yet unlabeled data is automatically achieved as well. The purpose of this paper is primarily twofold. The first is to describe a video annotation tool that has been developed for the purpose of annotating generic video sequences in the context of a recent video-TREC benchmarking exercise. The tool is semi-automatic in that it automatically propagates labels to similar shots, which requires the user to confirm or reject the propagated labels. The second purpose is to show how active learning strategy can be potentially implemented in this context to further improve the performance of the annotation tool. While many versions of active learning could be thought of, we specifically report results on experiments with support vector machine classifiers with polynomial kernels.
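
The active-learning cycle described above (train on a few labels, ask the user about the most uncertain item, then propagate labels to the rest) can be sketched with a deliberately simple classifier. Note the substitution: the paper reports experiments with support vector machines with polynomial kernels, whereas this sketch uses a one-dimensional nearest-centroid rule so that it stays self-contained. The feature values and the `ask_user` oracle are hypothetical.

```python
from typing import Dict, List, Tuple

Labeled = List[Tuple[float, int]]  # (shot feature, label) pairs; labels are 0 or 1


def centroids(labeled: Labeled) -> Tuple[float, float]:
    """Mean feature value of the positive and of the negative examples."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)


def predict(x: float, labeled: Labeled) -> int:
    pos_c, neg_c = centroids(labeled)
    return 1 if abs(x - pos_c) < abs(x - neg_c) else 0


def most_uncertain(pool: List[float], labeled: Labeled) -> float:
    """Pick the unlabeled item whose distances to the two centroids are closest."""
    pos_c, neg_c = centroids(labeled)
    return min(pool, key=lambda x: abs(abs(x - pos_c) - abs(x - neg_c)))


def ask_user(x: float) -> int:
    """Hypothetical oracle standing in for the human annotator."""
    return int(x > 0.5)


labeled: Labeled = [(0.1, 0), (0.9, 1)]   # small initial labeled set
pool: List[float] = [0.2, 0.45, 0.8]      # unlabeled shots

queried = most_uncertain(pool, labeled)   # the item nearest the decision boundary
pool.remove(queried)
labeled.append((queried, ask_user(queried)))

# Labels for the remaining shots are propagated by the updated classifier.
propagated: Dict[float, int] = {x: predict(x, labeled) for x in pool}
```

In the tool described above the propagated labels are shown to the user for confirmation or rejection; that confirmation step is omitted here.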

Journal ArticleDOI
TL;DR: A general approach to annotation inference for a given static program checker that reuses the checker as a subroutine is presented and shows how it applies to ESC.

Proceedings ArticleDOI
07 Jul 2001
TL;DR: The authors presented a language-neutral, theory-neutral method for annotating sentence-internal temporal relations, which can be applied without special training and can be used in lexicon/induction, translation and linguistic investigation.
Abstract: The aim of this paper is to present a language-neutral, theory-neutral method for annotating sentence-internal temporal relations. The annotation method is simple and can be applied without special training. The annotations are provided with a well-defined model-theoretic interpretation for use in the content-based comparison of annotations. Temporally annotated corpora have a number of applications, in lexicon/induction, translation and linguistic investigation. A searchable multi-language database has already been created.


Proceedings Article
01 Jan 2001
TL;DR: This position paper explains how a basic annotation schema can be extended to support new scenarios and describes and evaluates some other features and modifications that are useful when implementing these scenarios.
Abstract: In this position paper, we describe three user scenarios that benefit from metadata based annotation infrastructure. We explain how a basic annotation schema can be extended to support new scenarios. We also describe and evaluate some other features and modifications that are useful when implementing these scenarios. The most laborious part in the scenarios is the design and implementation of new user interfaces; the metadata infrastructure itself easily supports the needs of the different applications and new schemas.

Patent
13 Nov 2001
TL;DR: A method of optimizing a computer program includes generating annotation information about the computer program, storing the annotation information with the program, and dynamically optimizing the program based on the annotation information while the program is being executed.
Abstract: A method of optimizing a computer program includes generating annotation information about the computer program, storing the annotation information with the computer program, and dynamically optimizing the computer program based on the annotation information while the computer program is being executed.