
Showing papers on "Annotation" published in 2002


Journal ArticleDOI
TL;DR: A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu.
Abstract: As vertebrate genome sequences near completion and research refocuses to their analysis, the issue of effective genome annotation display becomes critical. A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu. This browser displays assembly contigs and gaps, mRNA and expressed sequence tag alignments, multiple gene predictions, cross-species homologies, single nucleotide polymorphisms, sequence-tagged sites, radiation hybrid data, transposon repeats, and more as a stack of coregistered tracks. Text and sequence-based searches provide quick and precise access to any region of specific interest. Secondary links from individual features lead to sequence details and supplementary off-site databases. One-half of the annotation tracks are computed at the University of California, Santa Cruz from publicly available sequence data; collaborators worldwide provide the rest. Users can stably add their own custom tracks to the browser for educational or research purposes. The conceptual and technical framework of the browser, its underlying MySQL database, and overall use are described. The web site currently serves over 50,000 pages per day to over 3000 different users.

9,605 citations


Journal ArticleDOI
TL;DR: The paper presents the overall design of Annotea and describes some of the issues the project has faced and how it has solved them, including combining RDF with XPointer, XLink, and HTTP.

565 citations


Journal ArticleDOI
TL;DR: FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects.
Abstract: The well-established inaccuracy of purely computational methods for annotating genome sequences necessitates an interactive tool to allow biological experts to refine these approximations by viewing and independently evaluating the data supporting each annotation. Apollo was developed to meet this need, enabling curators to inspect genome annotations closely and edit them. FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects.

439 citations


Book ChapterDOI
01 Oct 2002
TL;DR: With the help of Amilcare, OntoMat-Annotizer extracts knowledge structures from web pages using knowledge extraction rules, which are the result of a learning cycle based on already annotated pages.
Abstract: Richly interlinked, machine-understandable data constitute the basis for the Semantic Web. We provide a framework, S-CREAM, that allows for creation of metadata and is trainable for a specific domain. Annotating web documents is one of the major techniques for creating metadata on the web. The implementation of S-CREAM, OntoMat-Annotizer, now supports the semi-automatic annotation of web pages. This semi-automatic annotation is based on the information extraction component Amilcare. With the help of Amilcare, OntoMat-Annotizer extracts knowledge structures from web pages using knowledge extraction rules. These rules are the result of a learning cycle based on already annotated pages.

355 citations


Proceedings Article
24 Mar 2002
TL;DR: This paper reports on a new corpus, its ontological basis, annotation scheme, and statistics of annotated objects, and the tools used for corpus annotation and management.
Abstract: With the information overload in genome-related fields, there is an increasing need for natural language processing technology to extract information from literature, and various attempts at information extraction using NLP have been made. We are developing the necessary resources, including a domain ontology and an annotated corpus built from research abstracts in the MEDLINE database (GENIA corpus). We are building the ontology and the corpus simultaneously, using each to inform the other. In this paper we report on our new corpus, its ontological basis, annotation scheme, and statistics of annotated objects. We also describe the tools used for corpus annotation and management.

285 citations


Proceedings ArticleDOI
07 May 2002
TL;DR: This work provides a framework, CREAM, that allows for creation of metadata, and describes its implementation, viz. Ont-O-Mat, a component-based, ontology-driven Web page authoring and annotation tool.
Abstract: Richly interlinked, machine-understandable data constitute the basis for the Semantic Web. We provide a framework, CREAM, that allows for creation of metadata. While the annotation mode of CREAM allows one to create metadata for existing web pages, the authoring mode lets authors create metadata --- almost for free --- while putting together the content of a page. As a particularity of our framework, CREAM allows one to create relational metadata, i.e. metadata that instantiate interrelated definitions of classes in a domain ontology rather than a comparatively rigid template-like schema such as Dublin Core. We discuss some of the requirements one has to meet when developing such an ontology-based framework, e.g. the integration of a metadata crawler, inference services, document management and a meta-ontology, and describe its implementation, viz. Ont-O-Mat, a component-based, ontology-driven Web page authoring and annotation tool.

261 citations


Proceedings ArticleDOI
03 Jun 2002
TL;DR: The two problems correspond to two fundamentally distinct notions of provenance, why and where-provenance, which gives important insights into computational issues involved in data provenance or lineage --- the process by which data moves through databases.
Abstract: We study two classes of view update problems in relational databases. We are given a source database S, a monotone query Q, and the view Q(S) generated by the query. The first problem that we consider is the classical view deletion problem where we wish to identify a minimal set T of tuples in S whose deletion will eliminate a given tuple t from the view. We study the complexity of optimizing two natural objectives in this setting, namely, find T to minimize the side-effects on the view, and the source, respectively. For both objective functions, we show a dichotomy in the complexity. Interestingly, the problem is either in P or is NP-hard, for queries in the same class in either objective function. The second problem in our study is the annotation placement problem. Suppose we annotate an attribute of a tuple in S. The rules for carrying the annotation forward through a query are easily stated. On the other hand, suppose we annotate an attribute of a tuple in the view Q(S), what annotation(s) in S will cause this annotation to appear in the view, minimizing the propagation to other attributes in Q(S)? View annotation is becoming an increasingly useful method of communicating meta-data among users of shared scientific data sets, and to our knowledge, there has been no formal study of this problem. Our study of these problems gives us important insights into computational issues involved in data provenance or lineage --- the process by which data moves through databases. We show that the two problems correspond to two fundamentally distinct notions of provenance, why and where-provenance.

236 citations
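
The forward direction mentioned in the abstract above (carrying an annotation on a source attribute through a query into the view) can be illustrated with a minimal sketch. It is not the paper's formalism: the `Cell` type, the helper names, and the rule that a joined attribute carries the union of both sides' annotations are all assumptions made for illustration, and the toy relations are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A value plus the set of annotations attached to it (hypothetical type)."""
    value: object
    notes: frozenset = frozenset()


def make_row(**values):
    """Build a row (attribute name -> Cell) with no annotations."""
    return {name: Cell(v) for name, v in values.items()}


def annotate(row, attr, note):
    """Return a copy of the row with `note` attached to one attribute."""
    new_row = dict(row)
    new_row[attr] = Cell(row[attr].value, row[attr].notes | {note})
    return new_row


def project(rows, attrs):
    """Projection copies cells, so annotations travel with the values.

    Duplicates are not eliminated here (bag semantics, for brevity).
    """
    return [{a: r[a] for a in attrs} for r in rows]


def natural_join(left, right):
    """Join on shared attribute names.

    Assumed rule: a shared attribute in the output is copied from both
    inputs, so it carries the union of their annotations.
    """
    out = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[a].value == r[a].value for a in shared):
                merged = dict(l)
                for a, cell in r.items():
                    if a in shared:
                        merged[a] = Cell(cell.value, l[a].notes | cell.notes)
                    else:
                        merged[a] = cell
                out.append(merged)
    return out


# Toy source database S (invented data) and a view Q(S).
genes = [make_row(gene="g1", organism="fly"),
         make_row(gene="g2", organism="rice")]
funcs = [make_row(gene="g1", function="kinase")]
genes[0] = annotate(genes[0], "organism", "checked by curator")

view = project(natural_join(genes, funcs), ["organism", "function"])
for row in view:
    print({a: (c.value, sorted(c.notes)) for a, c in row.items()})
```

The reverse direction studied in the paper, choosing where in S to place an annotation so that it appears at a given view attribute with minimal spread, is the hard part and is not attempted here.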


Journal ArticleDOI
TL;DR: A bibliography submission system is developed for scientists to submit, categorize and retrieve literature information, and a non-redundant reference protein database, PIR-NREF is introduced.
Abstract: The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases).

223 citations


Patent
26 Aug 2002
TL;DR: In this article, a system and method for facilitating annotation of a document co-browsed by multiple attendees of a collaboration session is presented, where the annotation event is forwarded to other attendees.
Abstract: A system and method for facilitating annotation of a document co-browsed by multiple attendees of a collaboration session. A co-browsed page is served to the attendees by a collaboration server. An attendee (e.g., the host) annotates the page by highlighting a portion, placing or moving a pointer, scrolling the page within a window, or taking some other action. A collaboration applet operating in conjunction with the attendee's browser notes the position of the annotation (and size of annotation if it involves highlighting) and normalizes that position relative to the page. The collaboration applet then transmits an annotation event to the collaboration server, with the normalized position. The event is forwarded to other attendees, where the annotation is recreated.

162 citations
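
The normalization step described in the abstract above can be sketched as follows. The abstract does not say how positions are normalized; dividing by the rendered page width and height is an assumption, and the `Highlight` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Highlight:
    """Position and size of a highlighted region (hypothetical type)."""
    x: float
    y: float
    width: float
    height: float


def normalize(h, page_width, page_height):
    """Express a highlight as fractions of the page it was drawn on.

    Assumption: 'normalized relative to the page' means dividing by the
    rendered page dimensions on the annotating attendee's side.
    """
    return Highlight(h.x / page_width, h.y / page_height,
                     h.width / page_width, h.height / page_height)


def denormalize(h, page_width, page_height):
    """Recreate the highlight on another attendee's rendering of the page."""
    return Highlight(h.x * page_width, h.y * page_height,
                     h.width * page_width, h.height * page_height)


# The host draws on an 800x1200 rendering; another attendee sees 400x600.
drawn = Highlight(x=200, y=300, width=100, height=150)
event_payload = normalize(drawn, 800, 1200)
recreated = denormalize(event_payload, 400, 600)
print(event_payload)
print(recreated)
```

Sending fractions rather than pixels lets each attendee's browser place the annotation correctly regardless of window size.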


Proceedings ArticleDOI
24 Aug 2002
TL;DR: This paper addresses issues related to building a large-scale Chinese corpus and tries to answer four questions: how to speed up annotation, how to maintain high annotation quality, for what purposes is the corpus applicable, and finally what future work the authors anticipate.
Abstract: In this paper we address issues related to building a large-scale Chinese corpus. We try to answer four questions: (i) how to speed up annotation, (ii) how to maintain high annotation quality, (iii) for what purposes is the corpus applicable, and finally (iv) what future work we anticipate.

152 citations


Book ChapterDOI
01 Oct 2002
TL;DR: A model of interaction that addresses such issues and Melita, an annotation framework that implements a methodology for active annotation for the Semantic Web based on IE are presented.
Abstract: The process of document annotation for the Semantic Web is complex and time consuming, as it requires a great deal of manual annotation. Information extraction from texts (IE) is a technology used by some very recent systems for reducing the burden of annotation. The integration of IE systems in annotation tools is quite a new development and there is still a need to think through the impact of the IE system on the whole annotation process. In this paper we initially discuss a number of requirements for the use of IE as support for annotation. Then we present and discuss a model of interaction that addresses such issues and Melita, an annotation framework that implements a methodology for active annotation for the Semantic Web based on IE. Finally we present an experiment that quantifies the gain in using IE as support to human annotators.

Journal ArticleDOI
TL;DR: A new automated annotation system and database called Rice Genome Automated Annotation System (RiceGAAS) has been developed to execute a reliable and up-to-date analysis of the genome sequence as well as to store and retrieve the results of annotation.
Abstract: An extensive effort of the International Rice Genome Sequencing Project (IRGSP) has resulted in rapid accumulation of genome sequence, and >137 Mb has already been made available to the public domain as of August 2001. This requires a high-throughput annotation scheme to extract biologically useful and timely information from the sequence data on a regular basis. A new automated annotation system and database called Rice Genome Automated Annotation System (RiceGAAS) has been developed to execute a reliable and up-to-date analysis of the genome sequence as well as to store and retrieve the results of annotation. The system has the following functional features: (i) collection of rice genome sequences from GenBank; (ii) execution of gene prediction and homology search programs; (iii) integration of results from various analyses and automatic interpretation of coding regions; (iv) re-execution of analysis, integration and automatic interpretation with the latest entries in reference databases; (v) integrated visualization of the stored data using web-based graphical view. RiceGAAS also has a data submission mechanism that allows public users to perform fully automated annotation of their own sequences. The system can be accessed at http://RiceGAAS.dna.affrc.go.jp/.

Proceedings ArticleDOI
11 Jul 2002
TL;DR: An experiment that uses translation equivalents derived from parallel corpora to determine sense distinctions that can be used for automatic sense-tagging and other disambiguation tasks shows that sense distinctions derived from cross-lingual information are at least as reliable as those made by human annotators.
Abstract: This paper describes an experiment that uses translation equivalents derived from parallel corpora to determine sense distinctions that can be used for automatic sense-tagging and other disambiguation tasks. Our results show that sense distinctions derived from cross-lingual information are at least as reliable as those made by human annotators. Because our approach is fully automated through all its steps, it could provide means to obtain large samples of "sense-tagged" data without the high cost of human annotation.

Patent
20 May 2002
TL;DR: In this paper, a system and method allows for the annotation of an image stream, which may be produced by, for example, an ingestible capsule, where a user inputs an annotation which corresponds to a portion of the moving image, and the annotation is recorded in a database associated with the selected portion.
Abstract: A system and method allows for the annotation of an image stream, which may be produced by, for example, an ingestible capsule. A workstation accepts images acquired by the capsule and displays the images on a monitor as a moving image. A user inputs an annotation which corresponds to a portion of the moving image, and the annotation is recorded in a database associated with the selected portion. The annotation may include, for example, textual notes regarding the image portion. The annotations may be displayed at a later time.

Journal ArticleDOI
TL;DR: The PRINTS database houses a collection of protein fingerprints that may be used to make family and tentative functional assignments for uncharacterised sequences, and the use of its relational cousin, PRINTS-S, to model relationships between families, including those beyond the reach of conventional sequence analysis approaches, is reported.
Abstract: The PRINTS database houses a collection of protein fingerprints. These may be used to make family and tentative functional assignments for uncharacterised sequences. The September 2001 release (version 32.0) includes 1600 fingerprints, encoding approximately 10 000 motifs, covering a range of globular and membrane proteins, modular polypeptides and so on. In addition to its continued steady growth, we report here its use as a source of annotation in the InterPro resource, and the use of its relational cousin, PRINTS-S, to model relationships between families, including those beyond the reach of conventional sequence analysis approaches. The database is accessible for BLAST, fingerprint and text searches at http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/.

Patent
10 Apr 2002
TL;DR: A Common Annotation Framework includes an annotation having a context anchor that identifies a resource and a position in the resource that the annotation pertains to, and a content anchor that identifies data that is annotating the resource.
Abstract: A Common Annotation Framework includes, in an embodiment, an annotation having a context anchor that identifies a resource and a position in the resource that the annotation pertains to, and a content anchor that identifies data that is annotating the resource. The annotation can also be extended with client application-defined data and/or functionality, and the framework can be extended with one or more of application-defined objects, methods, and annotation stores.
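
The structure described in the abstract above (a context anchor, a content anchor, and room for application-defined extensions and stores) might be modeled along these lines. This is a sketch only: every class, field, and method name here is hypothetical, and the patent's actual object model is not reproduced.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class ContextAnchor:
    """Identifies the annotated resource and a position within it."""
    resource: str   # e.g. a URL or file path (assumed representation)
    position: str   # e.g. a character range or element path (assumed)


@dataclass(frozen=True)
class ContentAnchor:
    """Identifies the data that annotates the resource."""
    data: str


@dataclass
class Annotation:
    context: ContextAnchor
    content: ContentAnchor
    # Client applications can attach their own data without changing the core.
    extensions: Dict[str, Any] = field(default_factory=dict)


class InMemoryStore:
    """One possible application-defined annotation store (hypothetical)."""

    def __init__(self) -> None:
        self._items: List[Annotation] = []

    def add(self, annotation: Annotation) -> None:
        self._items.append(annotation)

    def for_resource(self, resource: str) -> List[Annotation]:
        return [a for a in self._items if a.context.resource == resource]


store = InMemoryStore()
store.add(Annotation(
    context=ContextAnchor(resource="report.html", position="chars 120-164"),
    content=ContentAnchor(data="Check this figure against the source."),
    extensions={"color": "yellow"},
))
store.add(Annotation(
    context=ContextAnchor(resource="notes.html", position="chars 0-10"),
    content=ContentAnchor(data="Unrelated note."),
))
found = store.for_resource("report.html")
print(found)
```

Keeping the two anchors separate means the same annotating data could, in principle, be attached to more than one position without duplicating it.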

Journal ArticleDOI
TL;DR: The development of GO Engine, a computational platform for GO annotation, and analysis of the resultant GO annotations of human proteins are reported, which centered on sequence homology with GO-annotated proteins and protein domain analysis.
Abstract: Recent progress in genomic sequencing, computational biology, and ontology development has presented an opportunity to investigate biological systems from a unique perspective, that is, examining genomes and transcriptomes through the multiple and hierarchical structure of Gene Ontology (GO). We report here our development of GO Engine, a computational platform for GO annotation, and analysis of the resultant GO annotations of human proteins. Protein annotation was centered on sequence homology with GO-annotated proteins and protein domain analysis. Text information analysis and a multiparameter cellular localization predictive tool were also used to increase the annotation accuracy, and to predict novel annotations. The majority of proteins corresponding to full-length mRNA in GenBank, and the majority of proteins in the NR database (nonredundant database of proteins) were annotated with one or more GO nodes in each of the three GO categories. The annotations of GenBank and SWISS-PROT proteins are available to the public at the GO Consortium web site.

Dissertation
01 Jan 2002
TL;DR: A fine-grained annotation scheme with which all events, times and temporal relations reported in a text can be captured, and a graphical annotation tool to aid the application of the scheme to text, which allows easy markup of sophisticated temporal annotations.
Abstract: Many natural language processing applications, such as information extraction, question answering, topic detection and tracking, would benefit significantly from the ability to accurately position reported events in time, either relatively with respect to other events or absolutely with respect to calendrical time. However, relatively little work has been done to date on the automatic extraction of temporal information from text. Before we can progress to automatically position reported events in time, we must gain an understanding of the mechanisms used to do this in language. This understanding can be promoted through the development of an annotation scheme, which allows us to identify the textual expressions conveying events, times and temporal relations in a corpus of 'real' text. This thesis describes a fine-grained annotation scheme with which we can capture all events, times and temporal relations reported in a text. To aid the application of the scheme to text, a graphical annotation tool has been developed. This tool not only allows easy markup of sophisticated temporal annotations, it also contains an interactive, inference-based component supporting the gathering of temporal relations. The annotation scheme and the tool have been evaluated through the construction of a trial corpus during a pilot study. In this study, a group of annotators was supplied with a description of the annotation scheme and asked to apply it to a trial corpus. The pilot study showed that the annotation scheme was difficult to apply, but is feasible with improvements to the definition of the annotation scheme and the tool. Analysis of the resulting trial corpus also provides preliminary results on the relative extent to which different linguistic mechanisms, explicit and implicit, are used to convey temporal relational information in text.

Patent
Steven J. Simske
10 Sep 2002
TL;DR: In this article, the authors present a system for generating image annotation information comprising selecting images to be annotated, analyzing selected images to identify associated information, generating annotation information from at least one of said selected images using said associated information and annotating the selected images with the annotation information.
Abstract: The present invention is directed to a system for and method of generating image annotation information comprising selecting images to be annotated, analyzing said selected images to identify associated information, generating annotation information from at least one of said selected images using said associated information, and annotating said selected images with the annotation information

Patent
31 Dec 2002
TL;DR: A method and system for automated annotation and retrieval of remote digital content is described in this paper, where the image capture device of the present invention is configured to communicate with one or more external devices using a wired or wireless protocol.
Abstract: A method and system for automated annotation and retrieval of remote digital content is described. The image capture device of the present invention is configured to communicate with one or more external devices using a wired or wireless protocol. For example, Smart tag, 802.11, or Bluetooth protocols may be used to enable the camera to communicate with the external device, associated with an object of interest, to obtain metadata corresponding to a captured image of the object. The metadata collected using various forms of technology, as noted above for instance, can be used to automatically index a digital image and/or other digital content without any manual intervention.

Journal ArticleDOI
TL;DR: The process of annotating a previously annotated genome sequence is defined as 're-annotation', and the strengths and weaknesses of current manual and automatic genome-wide re-annotation approaches are examined.
Abstract: Annotation, the process by which structural or functional information is inferred for genes or proteins, is crucial for obtaining value from genome sequences. We define the process of annotating a previously annotated genome sequence as 're-annotation', and examine the strengths and weaknesses of current manual and automatic genome-wide re-annotation approaches.

Proceedings Article
01 May 2002
TL;DR: The annotation specification and the difficult tagging problems that have emerged so far in the corpus annotation project are presented.
Abstract: This paper describes our corpus annotation project. The annotated corpus has relevance tags which consist of predicate-argument relations, relations between nouns, and coreferences. To construct this relevance-tagged corpus, we investigated a large corpus and established the specification of the annotation. This paper shows the specification and difficult tagging problems which have emerged through the annotation so far.

Patent
29 Oct 2002
TL;DR: In this paper, an intermediate server is provided between a user's client terminal and a server managing digital contents, where the user wishing to add annotation information to be shared, acquires the original digital contents via the intermediate server, not directly from the Web server.
Abstract: A mechanism for sharing information (annotation information) added to digital contents such as a Web page, by a plurality of users. An intermediate server is provided between a user's client terminal and a server managing digital contents. The user wishing to add annotation information to be shared, acquires the original digital contents via the intermediate server, not directly from the Web server. Then, the intermediate server caches the digital contents concerned and transmits them to the client terminal along with an annotation tool having a program for adding annotation information. Annotation information prepared by this annotation tool is stored in the intermediate server along with position information where it is displayed. For correlating the original digital contents and data about the annotation information with each other, a session key is produced as key information. The intermediate server produces address information (URL) for contents reproduction including data about the session key.

Proceedings Article
07 May 2002
TL;DR: This contribution describes the open source project Edutella which builds upon metadata standards defined for the WWW and aims to provide an RDF-based metadata infrastructure for P2P applications, building on the recently announced JXTA Framework.
Abstract: P2P applications for searching and exchanging information over the Web have become increasingly popular. This has led to a number of (usually thematically) focused communities, which allow efficient searching within such communities, and which use specific metadata sets to specify the resources stored within the P2P network. By concentrating on domain and application specific formats for metadata and query languages, however, current P2P networks appear to be fragmenting into non-interoperable niche markets. This contribution describes the open source project Edutella which builds upon metadata standards defined for the WWW and aims to provide an RDF-based metadata infrastructure for P2P applications, building on the recently announced JXTA Framework. We describe one basic service (query) and an Edutella application (annotation) within this network, both being built on a common query language exchange format, and specify the main architecture and APIs of the Edutella P2P network.

Proceedings ArticleDOI
27 Oct 2002
TL;DR: An implementation of a freeform annotation system that accommodates dynamic document layout and explores a range of heuristics and algorithms required to handle common types of annotation, and concludes with a discussion of possible extensions.
Abstract: Freeform digital ink annotation allows readers to interact with documents in an intuitive and familiar manner. Such marks are easy to manage on static documents, and provide a familiar annotation experience. In this paper, we describe an implementation of a freeform annotation system that accommodates dynamic document layout. The algorithm preserves the correct position of annotations when documents are viewed with different fonts or font sizes, with different aspect ratios, or on different devices. We explore a range of heuristics and algorithms required to handle common types of annotation, and conclude with a discussion of possible extensions to handle special kinds of annotations and changes to documents.

Patent
16 Oct 2002
TL;DR: In this article, a system and method for dynamic modification and generation of data is described, which includes an annotation server that is connected to a user and a content provider, configured to modify a copy of an enterprise's stored Web content without necessarily modifying the actual stored web content.
Abstract: A system and method for dynamic modification and generation of data is described. One embodiment includes an annotation server that is connected to a user and a content provider. The annotation server is configured to modify a copy of an enterprise's stored Web content without necessarily modifying the actual stored Web content. The annotation server then provides the modified content to the user for viewing or other use.

Journal ArticleDOI
TL;DR: This work states that as more of the human genome draft sequence is finished, and genomes from other organisms begin to be sequenced, the demand for accurate and reliable genome annotation will increase significantly, and automated bioinformatics solutions are increasingly required.

Patent
06 Jun 2002
TL;DR: In this paper, the authors present methods for remote users of a collaborative application to generate annotation information, send that annotation information to an application sharer device, and receive back a display combining output of the collaborative application with the annotation information.
Abstract: Disclosed are methods for remote users of a collaborative application to generate annotation information, send that annotation information to an application sharer device, and receive back a display combining output of the collaborative application with the annotation information. A collaborative application display is visible on an application viewer's screen. To make an annotation, a user draws over the shared display. The annotation is intercepted and sent to the sharer. On the sharer, the annotation is graphically blended with the display produced by the collaborative application. The combination is then sent to the remote viewers for display. The sharer may visually indicate, via color or a text flag, for example, the source of each annotation. The sharer may time out an annotation, or may delete the annotation if the collaborative application's display has scrolled underneath the annotation, causing the annotation to “lose its place” in the display and become meaningless.

Patent
16 Sep 2002
TL;DR: In this paper, automatic emphasis of freeform annotations contained within an electronic document is performed based on a determined importance of each annotation, which is determined by a mark parser that groups, types and ranks each of the annotations.
Abstract: Automatic emphasis of freeform annotations contained within an electronic document is performed based on a determined importance of each annotation. The importance of each annotation is determined by a mark parser that groups, types and ranks each of the annotations. A weighted value is assigned to each grouped, ranked and typed annotation based on temporal and spatial information. The display characteristics of the weighted annotation are altered based on the weighted value.

Book ChapterDOI
30 Oct 2002
TL;DR: A number of candidate interpretations of annotation are identified, and the impact these interpretations may have on Semantic Web applications is discussed.
Abstract: Semantic metadata will play a significant role in the provision of the Semantic Web. Agents will need metadata that describes the content of resources in order to perform operations, such as retrieval, over those resources. In addition, if rich semantic metadata is supplied, those agents can then employ reasoning over the metadata, enhancing their processing power. Key to this approach is the provision of annotations, both through automatic and human means. The semantics of these annotations, however, in terms of the mechanisms through which they are interpreted and presented to the user, are sometimes unclear. In this paper, we identify a number of candidate interpretations of annotation, and discuss the impact these interpretations may have on Semantic Web applications.