Showing papers on "Annotation" published in 2004


Journal ArticleDOI
Midori A. Harris, Jennifer I. Clark, Ireland A, Jane Lomax, Michael Ashburner, R. Foulger, Karen Eilbeck, Suzanna E. Lewis, B. Marshall, Christopher J. Mungall, J. Richter, Gerald M. Rubin, Judith A. Blake, Carol J. Bult, Dolan M, Drabkin H, Janan T. Eppig, Hill Dp, L. Ni, Ringwald M, Rama Balakrishnan, J. M. Cherry, Karen R. Christie, Maria C. Costanzo, Selina S. Dwight, Stacia R. Engel, Dianna G. Fisk, Jodi E. Hirschman, Eurie L. Hong, Robert S. Nash, Anand Sethuraman, Chandra L. Theesfeld, David Botstein, Kara Dolinski, Becket Feierbach, Tanya Z. Berardini, S. Mundodi, Seung Y. Rhee, Rolf Apweiler, Daniel Barrell, Camon E, E. Dimmer, Lee, Rex L. Chisholm, Pascale Gaudet, Warren A. Kibbe, Ranjana Kishore, Erich M. Schwarz, Paul W. Sternberg, M. Gwinn, Hannick L, Wortman J, Matthew Berriman, Wood, de la Cruz N, Peter J. Tonellato, Pankaj Jaiswal, Seigfried T, White R
TL;DR: The Gene Ontology (GO) project provides structured, controlled vocabularies and classifications that cover several domains of molecular and cellular biology and are freely available for community use in the annotation of genes, gene products and sequences.
Abstract: The Gene Ontology (GO) project (http://www.geneontology.org/) provides structured, controlled vocabularies and classifications that cover several domains of molecular and cellular biology and are freely available for community use in the annotation of genes, gene products and sequences. Many model organism databases and genome annotation groups use the GO and contribute their annotation sets to the GO resource. The GO database integrates the vocabularies and contributed annotations and provides full access to this information in several formats. Members of the GO Consortium continually work collectively, involving outside experts as needed, to expand and update the GO vocabularies. The GO Web resource also provides access to extensive documentation about the GO project and links to applications that use GO data for functional analyses.

3,565 citations


Journal ArticleDOI
TL;DR: GO::TermFinder comprises a set of object-oriented Perl modules for accessing Gene Ontology information and evaluating and visualizing the collective annotation of a list of genes to GO terms, which can be used to draw conclusions from microarray and other biological data.
Abstract: Summary: GO::TermFinder comprises a set of object-oriented Perl modules for accessing Gene Ontology (GO) information and evaluating and visualizing the collective annotation of a list of genes to GO terms. It can be used to draw conclusions from microarray and other biological data, calculating the statistical significance of each annotation. GO::TermFinder can be used on any system on which Perl can be run, either as a command line application, in single or batch mode, or as a web-based CGI script. Availability: The full source code and documentation for GO::TermFinder are freely available from http://search.cpan.org/dist/GO-TermFinder/
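The abstract above does not say which test is used to calculate the statistical significance of an annotation. As a minimal sketch, assuming a hypergeometric test (a common choice for this kind of enrichment analysis, not confirmed by the abstract), the p-value for one GO term could be computed as below. The function name and parameters are illustrative and are not part of GO::TermFinder, which is written in Perl.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in the sample.

    population: number of genes in the background set
    annotated:  background genes annotated to the GO term
    sample:     size of the gene list under study
    hits:       genes in the list annotated to the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for hits, hits + 1, ..., upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes overall, 3 carry the term, 2 were picked, 1 is a hit.
p = enrichment_p_value(population=10, annotated=3, sample=2, hits=1)
```

A real analysis over many GO terms would also need a multiple-testing correction, which this sketch leaves out.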

1,869 citations


Journal ArticleDOI
TL;DR: The Gene Ontology Annotation database aims to provide high-quality electronic and manual annotations to the UniProt Knowledgebase (Swiss-Prot, TrEMBL and PIR-PSD) using the standardized vocabulary of the Gene Ontology (GO).
Abstract: The Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA) aims to provide high-quality electronic and manual annotations to the UniProt Knowledgebase (Swiss-Prot, TrEMBL and PIR-PSD) using the standardized vocabulary of the Gene Ontology (GO). As a supplementary archive of GO annotation, GOA promotes a high level of integration of the knowledge represented in UniProt with other databases. This is achieved by converting UniProt annotation into a recognized computational format. GOA provides annotated entries for nearly 60,000 species (GOA-SPTr) and is the largest and most comprehensive open-source contributor of annotations to the GO Consortium annotation effort. By integrating GO annotations from other model organism groups, GOA consolidates specialized knowledge and expertise to ensure the data remain a key reference for up-to-date biological information. Furthermore, the GOA database fully endorses the Human Proteomics Initiative by prioritizing the annotation of proteins likely to benefit human health and disease. In addition to a non-redundant set of annotations to the human proteome (GOA-Human) and monthly releases of its GO annotation for all species (GOA-SPTr), a series of GO mapping files and specific cross-references in other databases are also regularly distributed. GOA can be queried through a simple user-friendly web interface or downloaded in a parsable format via the EBI and GO FTP websites. The GOA data set can be used to enhance the annotation of particular model organism or gene expression data sets, although increasingly it has been used to evaluate GO predictions generated from text mining or protein interaction experiments. In 2004, the GOA team will build on its success and will continue to supplement the functional annotation of UniProt and work towards enhancing the ability of scientists to access all available biological information. 
Researchers wishing to query or contribute to the GOA project are encouraged to email: goa@ebi.ac.uk.

917 citations


Journal ArticleDOI
TL;DR: The process of Arabidopsis functional annotation is described using a variety of data sources and several ways in which this information can be accessed and used to infer knowledge about Arabidopsis and other plant species are illustrated.
Abstract: Controlled vocabularies are increasingly used by databases to describe genes and gene products because they facilitate identification of similar genes within an organism or among different organisms. One of The Arabidopsis Information Resource's goals is to associate all Arabidopsis genes with terms developed by the Gene Ontology Consortium that describe the molecular function, biological process, and subcellular location of a gene product. We have also developed terms describing Arabidopsis anatomy and developmental stages and use these to annotate published gene expression data. As of March 2004, we used computational and manual annotation methods to make 85,666 annotations representing 26,624 unique loci. We focus on associating genes to controlled vocabulary terms based on experimental data from the literature and use The Arabidopsis Information Resource-developed PubSearch software to facilitate this process. Each annotation is tagged with a combination of evidence codes, evidence descriptions, and references that provide a robust means to assess data quality. Annotation of all Arabidopsis genes will allow quantitative comparisons between sets of genes derived from sources such as microarray experiments. The Arabidopsis annotation data will also facilitate annotation of newly sequenced plant genomes by using sequence similarity to transfer annotations to homologous genes. In addition, complete and up-to-date annotations will make unknown genes easy to identify and target for experimentation. Here, we describe the process of Arabidopsis functional annotation using a variety of data sources and illustrate several ways in which this information can be accessed and used to infer knowledge about Arabidopsis and other plant species.

449 citations


Proceedings Article
01 May 2004
TL;DR: The current state of development of the manual annotation tool ELAN is shown, usage requirements from three different groups of users are presented, and it is shown how one annotation model and a number of generic design principles guided the choices made during the development of ELAN.
Abstract: This paper shows the current state of development of the manual annotation tool ELAN. It presents usage requirements from three different groups of users and shows how one annotation model and a number of generic design principles guided the choices made during the development of ELAN.
Introduction: At the Max-Planck-Institute for Psycholinguistics (MPI, http://www.mpi.nl), software development on tools for the manual annotation of multimedia data has been going on since the early 1990s. Over this decade there have been large changes in enabling technology and in insights into the nature of linguistic annotation. Media frameworks for the handling of digital audio and especially digital video files have matured, as has media streaming technology. XML has come into existence and has become highly relevant in a short time. Rendering and input of Unicode characters is now commonplace. Simultaneously, users gained experience with the first generation of video annotation tools and became aware of and got used to these new technologies. From this a new set of requirements arose. Finally, annotation tool builders are better aware of each other’s approaches, annotation models and annotation document formats. Clearly convergence is going on, leading to easier exchange of data between annotation tools. An important role in this process was played by the paper by Bird & Liberman (2001) that introduced Annotation Graphs. We are closely watching and trying to participate in standards initiatives, for example ISO TC37/SC4.
The first video annotation tool developed at the MPI was MediaTagger, a QuickTime-based application that runs only on pre-OS X Macintoshes. It started as a first attempt to exploit the QuickTime Movie data structure, and especially its text tracks, as an informal model for linguistic annotation. Since then several new formal models were made, each one building on the experiences of the previous ones and considering new user requirements. The formal modeling languages that were used are Entity-Relationship diagrams and UML. A detailed presentation and evaluation of these models can be found in (Brugman & Wittenburg, 2001). The next chapters will discuss the requirements of several different groups of users and describe the latest state of ELAN functionality. We will then present our model for annotation in some detail and show how we can cover the needs of very different user groups with one relatively simple model. In the discussion, plans for future development will be presented.
User requirements: ELAN is developed with a number of different user groups in mind. These users are situated both within the MPI and, in an increasing number of cases, outside the MPI. Often they are participating in externally funded projects (DoBeS, ECHO). We will discuss the main requirements per group, although there is of course a substantial overlap between each group’s needs.

428 citations


Proceedings ArticleDOI
17 May 2004
TL;DR: PANKOW (Pattern-based Annotation through Knowledge on the Web), a method which employs an unsupervised, pattern-based approach to categorize instances with regard to an ontology, is proposed.
Abstract: The success of the Semantic Web depends on the availability of ontologies as well as on the proliferation of web pages annotated with metadata conforming to these ontologies. Thus, a crucial question is where to acquire these metadata from. In this paper we propose PANKOW (Pattern-based Annotation through Knowledge on the Web), a method which employs an unsupervised, pattern-based approach to categorize instances with regard to an ontology. The approach is evaluated against the manual annotations of two human subjects. The approach is implemented in OntoMat, an annotation tool for the Semantic Web, and shows very promising results.

388 citations


Journal ArticleDOI
TL;DR: The Vertebrate Genome Annotation (Vega) database was first made public in 2004 and now contains comprehensive annotation on 20 of the 24 human chromosomes, four whole mouse chromosomes and around 40% of the zebrafish Danio rerio genome.
Abstract: The Vertebrate Genome Annotation (Vega) database (http://vega.sanger.ac.uk) has been designed to be a community resource for browsing manual annotation of finished sequences from a variety of vertebrate genomes. Its core database is based on an Ensembl-style schema, extended to incorporate curation-specific metadata. In collaboration with the genome sequencing centres, Vega attempts to present consistent high-quality annotation of the published human chromosome sequences. In addition, it is also possible to view various finished regions from other vertebrates, including mouse and zebrafish. Vega displays only manually annotated gene structures built using transcriptional evidence, which can be examined in the browser. Attempts have been made to standardize the annotation procedure across each vertebrate genome, which should aid comparative analysis of orthologues across the different finished regions.

373 citations


Journal ArticleDOI
31 Aug 2004
TL;DR: An annotation management system for relational databases where every piece of data in a relation is assumed to have zero or more annotations associated with it and annotations are propagated along, from the source to the output, as data is being transformed through a query.
Abstract: We present an annotation management system for relational databases. In this system, every piece of data in a relation is assumed to have zero or more annotations associated with it and annotations are propagated along, from the source to the output, as data is being transformed through a query. Such an annotation management system is important for understanding the provenance and quality of data, especially in applications that deal with integration of scientific and biological data. We present an extension, pSQL, of a fragment of SQL that has three different types of annotation propagation schemes, each useful for different purposes. The default scheme propagates annotations according to where data is copied from. The default-all scheme propagates annotations according to where data is copied from among all equivalent formulations of a given query. The custom scheme allows a user to specify how annotations should propagate. We present a storage scheme for the annotations and describe algorithms for translating a pSQL query under each propagation scheme into one or more SQL queries that would correctly retrieve the relevant annotations according to the specified propagation scheme. For the default-all scheme, we also show how we generate finitely many queries that can simulate the annotation propagation behavior of the set of all equivalent queries, which is possibly infinite. The algorithms are implemented and the feasibility of the system is demonstrated by a set of experiments that we have conducted.
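The default scheme described above, in which annotations follow the data they are attached to as a query copies it to the output, can be illustrated with a small sketch. This is not pSQL and not the authors' storage scheme: the in-memory representation, the function `project`, and the choice to union annotations when duplicate rows are merged are assumptions made for illustration.

```python
def project(relation, columns, where=lambda row: True):
    """Select rows, keep `columns`, and carry each cell's annotations along.

    A relation is a list of rows; a row maps column name -> (value, annotations).
    Output rows with equal values are merged and their annotations unioned
    (an assumption of this sketch).
    """
    merged = {}
    for row in relation:
        if not where(row):
            continue
        key = tuple(row[c][0] for c in columns)
        notes = merged.setdefault(key, [set() for _ in columns])
        # Each output cell inherits the annotations of the cell it was copied from.
        for slot, c in zip(notes, columns):
            slot |= row[c][1]
    return [
        {c: (v, frozenset(n)) for c, v, n in zip(columns, key, notes)}
        for key, notes in merged.items()
    ]


# Hypothetical toy relation: two rows share a name but carry different notes.
genes = [
    {"id": ("g1", {"from SGD"}), "name": ("ACT1", {"curated"})},
    {"id": ("g2", set()), "name": ("ACT1", {"unverified"})},
]
result = project(genes, ["name"])
```

Here `result` holds a single `ACT1` row whose `name` cell carries both `curated` and `unverified`, since both source cells were copied into it.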

313 citations


Patent
17 Jul 2004
TL;DR: A system and method including a user interface to manage sets of digital data (e.g., files), such as digital photographs or email messages, is described.
Abstract: Described is a system and method including a user interface to manage sets of digital data (e.g., files) such as digital photographs or email messages. The system and method comprise a rapid sort mechanism and an underlying support mechanism that associates metadata with each set of digital data, including annotation metadata obtained from the sort mechanism. As the user scrolls through images that represent the sets of digital data and categorizes them, metadata as to its particular categorization or lack of categorization is implicitly obtained and associated with each set of digital data. Grouping of sets of digital data into clusters is also provided, with a visual indication as to which cluster a set of digital data belongs. With respect to digital photography, the system and method make annotating and classifying digital photographs significantly easier and faster than contemporary photograph management mechanisms.

304 citations


Journal ArticleDOI
TL;DR: The TIGER Treebank, a corpus of currently 40,000 syntactically annotated German newspaper sentences, is described, along with the query tool TIGERSearch, its query language, which was designed to facilitate a simple formulation of complex queries, and a graphical user interface for query input.
Abstract: This paper reports on the TIGER Treebank, a corpus of currently 40,000 syntactically annotated German newspaper sentences. We describe what kind of information is encoded in the treebank and introduce the different representation formats that are used for the annotation and exploitation of the treebank. We explain the different methods used for the annotation: interactive annotation, using the tool ANNOTATE, and LFG parsing. Furthermore, we give an account of the annotation scheme used for the TIGER treebank. This scheme is an extended and improved version of the NEGRA annotation scheme and we illustrate in detail the linguistic extensions that were made concerning the annotation in the TIGER project. The main differences concern coordination, verb subcategorization, expletives and proper nouns. In addition, the paper also presents the query tool TIGERSearch that was developed in the project to exploit the treebank in an adequate way. We describe the query language, which was designed to facilitate a simple formulation of complex queries; furthermore, we briefly introduce TIGERin, a graphical user interface for query input. The paper concludes with a summary and some directions for future work.

253 citations


Patent
Wen-Yin Liu1, Hong-Jiang Zhang1
20 Oct 2004
TL;DR: A multimedia object retrieval and annotation system integrates an annotation process with object retrieval and relevance feedback processes; the annotation process is performed in the background, hidden from the user, as the user conducts normal searches.
Abstract: A multimedia object retrieval and annotation system integrates an annotation process with object retrieval and relevance feedback processes. The annotation process annotates multimedia objects, such as digital images, with semantically relevant keywords. The annotation process is performed in the background, hidden from the user, as the user conducts normal searches. The annotation process is “semi-automatic” in that it utilizes both keyword-based information retrieval and content-based image retrieval techniques to automatically search for multimedia objects, and then encourages users to provide feedback on the retrieved objects. The user identifies objects as either relevant or irrelevant to the query keywords and, based on this feedback, the system automatically annotates the objects with semantically relevant keywords and/or updates associations between the keywords and objects. As the retrieval-feedback-annotation cycle is repeated, the annotation coverage and accuracy of future searches continue to improve.

Proceedings Article
06 May 2004
TL;DR: An approach to two areas of biomedical information extraction, drug development and cancer genomics using a framework which includes corpus annotation integrated at multiple levels: a Treebank containing syntactic structure, a Propbank containing predicate-argument structure, and annotation of entities and relations among the entities.
Abstract: We describe an approach to two areas of biomedical information extraction, drug development and cancer genomics. We have developed a framework which includes corpus annotation integrated at multiple levels: a Treebank containing syntactic structure, a Propbank containing predicate-argument structure, and annotation of entities and relations among the entities. Crucial to this approach is the proper characterization of entities as relation components, which allows the integration of the entity annotation with the syntactic structure while retaining the capacity to annotate and extract more complex events. We are training statistical taggers using this annotation for such extraction as well as using them for improving the annotation process.

Patent
26 Oct 2004
TL;DR: A fact extraction tool set (FEX) finds and extracts targeted pieces of information from text; its fact extraction tool is a pattern matching language used to write scripts that find and match patterns of attributes that correspond to targeted pieces of information in the text, and extract that information.
Abstract: A fact extraction tool set ('FEX') finds and extracts targeted pieces of information from text using linguistic and pattern matching technologies, and in particular, text annotation and fact extraction. Text annotation tools break a text, such as a document, into its base tokens and annotate those tokens or patterns of tokens with orthographic, syntactic, semantic, pragmatic and other attributes. A user-defined 'Annotation Configuration' controls which annotation tools are used in a given application. XML is used as the basis for representing the annotated text. A tag uncrossing tool resolves conflicting (crossed) annotation boundaries in an annotated text to produce well-formed XML from the results of the individual annotators. The fact extraction tool is a pattern matching language which is used to write scripts that find and match patterns of attributes that correspond to targeted pieces of information in the text, and extract that information.

Journal ArticleDOI
TL;DR: The Linguistic Annotation Framework, under development within ISO TC37 SC4 WG1, is intended to serve as a basis for harmonizing existing language resources as well as developing new ones.
Abstract: This paper describes the Linguistic Annotation Framework under development within ISO TC37 SC4 WG1. The Linguistic Annotation Framework is intended to serve as a basis for harmonizing existing language resources as well as developing new ones.

Journal ArticleDOI
TL;DR: This database system, with its focus on facilitating flexible queries of the data and providing access to both peer-reviewed annotations as well as alternate annotation information, may be a suitable model for other genome projects wishing to use a continually updated, community-based annotation approach.
Abstract: Using the Pseudomonas aeruginosa Genome Project as a test case, we have developed a database and submission system to facilitate a community-based approach to continually updated genome annotation (http://www.pseudomonas.com). Researchers submit proposed annotation updates through one of three web-based form options which are then subjected to review, and if accepted, entered into both the database and log file of updates with author acknowledgement. In addition, a coordinator continually reviews literature for suitable updates, as we have found such reviews to be the most efficient. Both the annotations database and updates-log database have Boolean search capability with the ability to sort results and download all data or search results as tab-delimited files. To complement this peer-reviewed genome annotation, we also provide a linked GBrowse view which displays alternate annotations. Additional tools and analyses are also integrated, including PseudoCyc, and knockout mutant information. We propose that this database system, with its focus on facilitating flexible queries of the data and providing access to both peer-reviewed annotations as well as alternate annotation information, may be a suitable model for other genome projects wishing to use a continually updated, community-based annotation approach. The source code is freely available under GNU General Public Licence.

Patent
Seiya Shimizu1, Asako Kitaura1
30 Sep 2004
TL;DR: In this article, a technique is presented which enables users to add, with ease, annotation information to an electronic document on a network and to share the annotation information within a disclosure range set to the annotations.
Abstract: A technique is provided which enables users to add, with ease, annotation information to an electronic document on a network and to share the annotation information within a disclosure range set for the annotation information. Electronic information is provided in a state that allows annotation information to be attached to it, annotation information to be attached to the electronic information is stored, attribute information indicating the disclosure range of the annotation information is stored and, when a user requests annotation information, the annotation information available to that user is provided to the user's terminal by consulting the attribute information.

Proceedings ArticleDOI
10 Oct 2004
TL;DR: A coherent language model for automatic image annotation is proposed that takes into account word-to-word correlation; it can automatically determine the annotation length and, used with active learning, can significantly reduce the required number of annotated image examples.
Abstract: Image annotations allow users to access a large image database with textual queries. There have been several studies on automatic image annotation utilizing machine learning techniques, which automatically learn statistical models from annotated images and apply them to generate annotations for unseen images. One common problem shared by most previous learning approaches for automatic image annotation is that each annotated word is predicted for an image independently of the other annotated words. In this paper, we propose a coherent language model for automatic image annotation that takes into account the word-to-word correlation by estimating a coherent language model for an image. This new approach has two important advantages: 1) it is able to automatically determine the annotation length to improve the accuracy of retrieval results, and 2) it can be used with active learning to significantly reduce the required number of annotated image examples. Empirical studies with the Corel dataset are presented to show the effectiveness of the coherent language model for automatic image annotation.

Proceedings ArticleDOI
10 Oct 2004
TL;DR: This paper proposes a multi-level approach to annotate the semantics of natural scenes by using both the dominant image components (salient objects) and the relevant semantic concepts to achieve automatic image annotation at the content level.
Abstract: Automatic image annotation is a promising solution to enable semantic image retrieval via keywords. In this paper, we propose a multi-level approach to annotate the semantics of natural scenes by using both the dominant image components (salient objects) and the relevant semantic concepts. To achieve automatic image annotation at the content level, we use salient objects as the dominant image components for image content representation and feature extraction. To support automatic image annotation at the concept level, a novel image classification technique is developed to map the images into the most relevant semantic image concepts. In addition, Support Vector Machine (SVM) classifiers are used to learn the detection functions for the pre-defined salient objects, and finite mixture models are used for semantic concept interpretation and modeling. An adaptive EM algorithm is proposed to determine the optimal model structure and model parameters simultaneously. We also demonstrate that our algorithms are effective in enabling multi-level annotation of natural scenes in a large-scale image dataset.

Patent
08 Nov 2004
TL;DR: A user with a predetermined privilege selects an annotation widget in a displayed document and is presented with the associated annotation document; the user performs an annotation task modifying the annotation document and submits it to the annotation store, the submission triggering the workflow action program to progress the workflow to another step.
Abstract: A displayed document comprises an annotation widget, the widget associated with an annotation document and a corresponding annotation key in an annotation store. The annotation document is associated with a workflow action program. A user with a predetermined privilege selects a widget and is presented with the annotation document. The user performs an annotation task modifying the annotation document and submits the annotation document to the annotation store, the submission triggering the workflow action program to progress the workflow to another step.

Proceedings ArticleDOI
01 Dec 2004
TL;DR: The associations identified in this study could serve as the starting point for adding associative relations across hierarchies to GO, but would require manual curation.
Abstract: The Gene Ontology (GO) is a controlled vocabulary widely used for the annotation of gene products. GO is organized in three hierarchies for molecular functions, cellular components, and biological processes but no relations are provided among terms across hierarchies. The objective of this study is to investigate three non-lexical approaches to identifying such associative relations in GO and compare them among themselves and to lexical approaches. The three approaches are: computing similarity in a vector space model, statistical analysis of co-occurrence of GO terms in annotation databases, and association rule mining. Five annotation databases (FlyBase, the Human subset of GOA, MGI, SGD, and WormBase) are used in this study. A total of 7,665 associations were identified by at least one of the three non-lexical approaches. Of these, 12% were identified by more than one approach. While there are almost 6,000 lexical relations among GO terms, only 203 associations were identified by both non-lexical and lexical approaches. The associations identified in this study could serve as the starting point for adding associative relations across hierarchies to GO, but would require manual curation. The application to quality assurance of annotation databases is also discussed.
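One of the three non-lexical approaches named above is the analysis of co-occurrence of GO terms in annotation databases. As a rough sketch of its first step only, the code below counts how often two terms from different hierarchies annotate the same gene product. The data layout, the function name, and the toy term labels (`F1`, `P1`, and so on) are assumptions for illustration; the study's statistical test is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cross_hierarchy_cooccurrence(annotations):
    """Count term pairs from different hierarchies that share a gene product.

    annotations: dict mapping gene product -> set of (term, hierarchy) pairs.
    Returns a Counter keyed by (term_a, term_b), with the two terms in sorted order.
    """
    counts = Counter()
    for terms in annotations.values():
        # Sorting makes each unordered pair appear under one consistent key.
        for (t1, h1), (t2, h2) in combinations(sorted(terms), 2):
            if h1 != h2:
                counts[(t1, t2)] += 1
    return counts


# Hypothetical toy annotation set; the labels are placeholders, not GO IDs.
toy = {
    "geneA": {("F1", "function"), ("P1", "process")},
    "geneB": {("F1", "function"), ("P1", "process"), ("P2", "process")},
    "geneC": {("F1", "function"), ("C1", "component")},
}
pairs = cross_hierarchy_cooccurrence(toy)
```

Raw counts like these would still need a significance test against the frequency of each term before a pair could be proposed as an associative relation.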

Patent
Jordi Albornoz1, Lee Feigenbaum1, Douglas R. Fish1, Sean J. Martin1, Hoa Tran1, David A. Wall1 
17 Dec 2004
TL;DR: In this paper, the authors present methods, systems, and articles of manufacture for managing an annotation system that includes storing annotations for a document family, i.e., a series of versions of a data source.
Abstract: The present invention generally provides methods, systems, and articles of manufacture for managing an annotation system that includes storing annotations for a document family, i.e., a series of versions of a data source. Annotations created for one version of the data source may be viewed in context from both subsequent and prior versions of the same data source. Embodiments of the invention associate annotations with both a data source “family identifier” as well as a “version identifier.” Other than adding a family ID to the data source, the data source remains unchanged. The family ID is maintained across different versions of the data source, whereas version IDs are determined for a specific version of the data source. Version IDs can be constructed from each data source directly, and do not need to be stored.
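The family and version identifiers described above can be sketched as follows. The abstract says only that version IDs can be constructed from each data source directly; using a SHA-256 digest of the content for that purpose is an assumption of this sketch, as are the class and method names.

```python
import hashlib
from collections import defaultdict


def version_id(content: bytes) -> str:
    """Derive a version ID from the data itself, so it never needs to be stored."""
    return hashlib.sha256(content).hexdigest()


class AnnotationStore:
    """Keeps annotations keyed by family ID, each tagged with a version ID."""

    def __init__(self):
        self._by_family = defaultdict(list)

    def add(self, family_id, content, note):
        self._by_family[family_id].append((version_id(content), note))

    def for_family(self, family_id):
        """All annotations made on any version of the data source."""
        return list(self._by_family[family_id])

    def for_version(self, family_id, content):
        """Only the annotations made on this exact version."""
        vid = version_id(content)
        return [note for v, note in self._by_family[family_id] if v == vid]


store = AnnotationStore()
store.add("doc-42", b"first draft", "check the figure")
store.add("doc-42", b"second draft", "figure fixed")
```

Looking up by family returns both notes, which is what allows an annotation made on one version to be shown alongside earlier or later versions.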

Journal ArticleDOI
TL;DR: GOblet is a comprehensive web server application providing the annotation of anonymous sequence data with Gene Ontology (GO) terms and provides an improved display of results with the aid of Java applets.
Abstract: GOblet is a comprehensive web server application providing the annotation of anonymous sequence data with Gene Ontology (GO) terms. It uses a variety of different protein databases (human, murines, invertebrates, plants, sp-trembl) and their respective GO mappings. The user selects the appropriate database and alignment threshold and thereafter submits single or multiple nucleotide or protein sequences. Results are shown in different ways, e.g. as survey statistics for the main GO categories for all sequences or as detailed results for each single sequence that has been submitted. In its newest version, GOblet allows the batch submission of sequences and provides an improved display of results with the aid of Java applets. All output data, together with the Java applet, are packed into a downloadable archive for local installation and analysis. GOblet can be accessed freely at http://goblet.molgen.mpg.de.

Patent
Yue Pan1, Li Zhang1
21 Oct 2004
TL;DR: The patent presents a method of annotation for electronic documents, a method for creating, modifying and browsing an annotation in an electronic document, and an apparatus and system for editing and browsing annotations in electronic documents.
Abstract: The present invention provides a method of annotation for electronic documents, a method for creating, modifying and browsing an annotation in an electronic document, and an apparatus and system for editing and browsing annotations in electronic documents. The method of annotation for electronic documents includes: storing annotation contents for one or more electronic documents into a shared dictionary; and when a reader browses an electronic document, providing the reader with annotations for the electronic document based on the shared dictionary.

Proceedings ArticleDOI
24 Apr 2004
TL;DR: Usability issues encountered in using a camera phone as an image annotation device immediately after image capture and users' responses to use of such a system are presented.
Abstract: In this paper we describe a system that allows users to annotate digital photos at the time of capture. The system uses camera phones with a lightweight client application and a server to store the images and metadata and assists the user in annotation on the camera phone by providing guesses about the location and content of the photos. By conducting user interface testing, surveys, and focus groups we were able to evaluate the usability of this system and uncover usage patterns and motivations that will inform our development of future mobile media annotation applications. In this paper we present usability issues encountered in using a camera phone as an image annotation device immediately after image capture and users' responses to use of such a system.

Journal ArticleDOI
TL;DR: The MyHits web server is a new integrated service dedicated to the annotation of protein sequences and to the analysis of their domains and signatures and includes a new collection of tools to generate graphical representations of pairwise and multiple sequence alignments including their annotated features.
Abstract: The MyHits web server (http://myhits.isb-sib.ch) is a new integrated service dedicated to the annotation of protein sequences and to the analysis of their domains and signatures. Guest users can use the system anonymously, with full access to (i) standard bioinformatics programs (e.g. PSI-BLAST, ClustalW, T-Coffee, Jalview); (ii) a large number of protein sequence databases, including standard (Swiss-Prot, TrEMBL) and locally developed databases (splice variants); (iii) databases of protein motifs (Prosite, Interpro); (iv) a precomputed list of matches (‘hits’) between the sequence and motif databases. All databases are updated on a weekly basis and the hit list is kept up to date incrementally. The MyHits server also includes a new collection of tools to generate graphical representations of pairwise and multiple sequence alignments including their annotated features. Free registration enables users to upload their own sequences and motifs to private databases. These are then made available through the same web interface and the same set of analytical tools. Registered users can manage their own sequences and annotations using only web tools and freeze their data in their private database for publication purposes.

Proceedings Article
01 May 2004
TL;DR: The First Release of the American National Corpus (ANC) was made available in mid-fall, 2003 and includes approximately 11 million words of American English, including written and spoken data and a variety of text types annotated for part of speech and lemma.
Abstract: The First Release of the American National Corpus (ANC) was made available in mid-fall, 2003. The data includes approximately 11 million words of American English, including written and spoken data and a variety of text types annotated for part of speech and lemma. The corpus is provided in XML format conformant to the XML Corpus Encoding Standard (XCES) (http://www.xml-ces.org), and is distributed in both a stand-off version (where annotation is in an XML document separate from the primary texts) and a merged version (where annotation is included in-line in the texts). The merged version includes annotation for part of speech and lemma produced by the Biber tagger; in stand-off annotation, in addition to the Biber tagging, morpho-syntactic annotations of the data are provided using the CLAWS 5 and 7 tagsets as well as several other tagsets.
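
The difference between the stand-off and merged versions can be illustrated with a small sketch. It is not the XCES format: the `tok` element, the attribute names, the character-offset representation, and the tag values are hypothetical, chosen only to show how annotations held apart from the primary text can be folded in-line.

```python
from xml.sax.saxutils import escape, quoteattr

# Primary text and a separate (stand-off) annotation layer that points
# into it by character offsets. All values here are illustrative.
text = "Dogs bark"
standoff = [
    {"start": 0, "end": 4, "pos": "NNS", "lemma": "dog"},
    {"start": 5, "end": 9, "pos": "VBP", "lemma": "bark"},
]


def merge(text, standoff):
    """Fold non-overlapping stand-off annotations into in-line markup."""
    out = []
    cursor = 0
    for ann in sorted(standoff, key=lambda a: a["start"]):
        # Copy any unannotated text between the previous token and this one.
        out.append(escape(text[cursor:ann["start"]]))
        token = escape(text[ann["start"]:ann["end"]])
        out.append(
            f'<tok pos={quoteattr(ann["pos"])} lemma={quoteattr(ann["lemma"])}>{token}</tok>'
        )
        cursor = ann["end"]
    out.append(escape(text[cursor:]))
    return "".join(out)


merged = merge(text, standoff)
print(merged)
```

Keeping the layer separate leaves the primary text untouched and allows several tagsets to annotate the same data, which is how the abstract describes the stand-off release.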

Journal ArticleDOI
TL;DR: An XML-based Java application is described that provides a function-oriented overview of the results of cluster analysis of gene-expression microarray data based on Gene Ontology terms and associations.
Abstract: Summary: An XML-based Java application is described that provides a function-oriented overview of the results of cluster analysis of gene-expression microarray data based on Gene Ontology terms and associations. The application generates one HTML page with listings of the frequencies of explicit and implicit Gene Ontology annotations for each cluster, and separate, linked pages with listings of explicit annotations for each gene in a cluster. Availability: http://www.charite.de/ch/medgen/ontologizer
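
A minimal sketch of counting explicit and implicit annotation frequencies for one cluster follows. It assumes that "implicit" means annotations inferred by propagating each explicit term to its ancestors in the ontology; the toy ontology, the gene names, and the function names are hypothetical, and the application itself is written in Java rather than Python.

```python
from collections import Counter

# Hypothetical toy ontology: child term -> list of parent terms.
parents = {
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "DNA binding": ["binding"],
    "binding": ["molecular_function"],
}


def ancestors(term, parents):
    """Collect every ancestor of a term by walking parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def cluster_frequencies(cluster, parents):
    """Count genes per term, separately for explicit and inferred annotations."""
    explicit = Counter()
    implicit = Counter()
    for terms in cluster.values():
        explicit.update(terms)
        inferred = set()
        for term in terms:
            inferred |= ancestors(term, parents)
        # A term counts as implicit for a gene only if it was not annotated directly.
        implicit.update(inferred - set(terms))
    return explicit, implicit


cluster = {
    "gene1": {"protein kinase activity"},
    "gene2": {"kinase activity", "DNA binding"},
}
explicit, implicit = cluster_frequencies(cluster, parents)
print(dict(explicit))
print(dict(implicit))
```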

Proceedings ArticleDOI
25 May 2004
TL;DR: A new digital annotation system organized in a client-server architecture, where the client is a plug-in for a standard web browser and the servers are repositories of annotations to which different clients can login.
Abstract: Digital annotation of multimedia documents adds information to a document (e.g. a web page) or parts of it (a multimedia object such as an image or a video stream contained in the document). Digital annotations can be kept private or shared among different users over the internet, allowing discussions and cooperative work. We study the possibility of annotating multimedia documents with objects which are in turn of a multimedia nature. Annotations can refer to whole documents or single portions thereof, as usual, but also to multi-objects, i.e. groups of objects contained in a single document. We designed and developed a new digital annotation system organized in a client-server architecture, where the client is a plug-in for a standard web browser and the servers are repositories of annotations to which different clients can log in. Annotations can be retrieved and filtered, and one can choose different annotation servers for a document. We present a platform-independent design for such a system, and illustrate a specific implementation for Microsoft Internet Explorer on the client side and JSP/MySQL on the server side.

Proceedings Article
01 Jan 2004
TL;DR: First experiences with the MATE scheme for anaphoric annotation and the lessons learned from them are discussed, and a few modifications are suggested.
Abstract: In the five years since it was proposed, the MATE scheme for anaphoric annotation has been used in a variety of annotation projects, and the resulting corpora have been used to study both anaphora resolution and NL generation. Annotation tools inspired by the proposals have been used in some of these projects. In this paper we discuss these first experiences with the scheme and some lessons that have been learned, and suggest a few modifications.