
Showing papers on "Annotation" published in 2003


Journal ArticleDOI
TL;DR: DAVID is a web-accessible program that integrates functional genomic annotations with intuitive graphical summaries, and assists in the interpretation of genome-scale datasets by facilitating the transition from data collection to biological meaning.
Abstract: The distributed nature of biological knowledge poses a major challenge to the interpretation of genome-scale datasets, including those derived from microarray and proteomic studies. This report describes DAVID, a web-accessible program that integrates functional genomic annotations with intuitive graphical summaries. Lists of gene or protein identifiers are rapidly annotated and summarized according to shared categorical data for Gene Ontology, protein domain, and biochemical pathway membership. DAVID assists in the interpretation of genome-scale datasets by facilitating the transition from data collection to biological meaning.

8,849 citations


Journal ArticleDOI
TL;DR: The SWISS-PROT protein knowledgebase connects amino acid sequences with the current knowledge in the Life Sciences by providing an interdisciplinary overview of relevant information that brings together experimental results, computed features and sometimes even contradictory conclusions.
Abstract: The SWISS-PROT protein knowledgebase (http://www.expasy.org/sprot/ and http://www.ebi.ac.uk/swissprot/) connects amino acid sequences with the current knowledge in the Life Sciences. Each protein entry provides an interdisciplinary overview of relevant information by bringing together experimental results, computed features and sometimes even contradictory conclusions. Detailed expertise that goes beyond the scope of SWISS-PROT is made available via direct links to specialised databases. SWISS-PROT provides annotated entries for all species, but concentrates on the annotation of entries from human (the HPI project) and other model organisms to ensure the presence of high quality annotation for representative members of all protein families. Part of the annotation can be transferred to other family members, as is already done for microbes by the High-quality Automated and Manual Annotation of microbial Proteomes (HAMAP) project. Protein families and groups of proteins are regularly reviewed to keep up with current scientific findings. Complementarily, TrEMBL strives to comprise all protein sequences that are not yet represented in SWISS-PROT, by incorporating a perpetually increasing level of mostly automated annotation. Researchers are welcome to contribute their knowledge to the scientific community by submitting relevant findings to SWISS-PROT at swiss-prot@expasy.org.

3,440 citations


Proceedings ArticleDOI
28 Jul 2003
TL;DR: The approach shows the usefulness of using formal information retrieval models for the task of image annotation and retrieval by assuming that regions in an image can be described using a small vocabulary of blobs.
Abstract: Libraries have traditionally used manual image annotation for indexing and later retrieving their image collections. However, manual image annotation is an expensive and labor-intensive procedure, and hence there has been great interest in coming up with automatic ways to retrieve images based on content. Here, we propose an automatic approach to annotating and retrieving images based on a training set of images. We assume that regions in an image can be described using a small vocabulary of blobs. Blobs are generated from image features using clustering. Given a training set of images with annotations, we show that probabilistic models allow us to predict the probability of generating a word given the blobs in an image. This may be used to automatically annotate and retrieve images given a word as a query. We show that relevance models allow us to derive these probabilities in a natural way. Experiments show that the annotation performance of this cross-media relevance model is almost six times as good (in terms of mean precision) as a model based on word-blob co-occurrence, and twice as good as a state-of-the-art model derived from machine translation. Our approach shows the usefulness of using formal information retrieval models for the task of image annotation and retrieval.
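The generative idea in the abstract, scoring each vocabulary word by how likely it is to be generated alongside the query image's blobs across training images, can be sketched as follows. This is an illustrative simplification, not the paper's exact estimator; the smoothing scheme and function names are assumptions:

```python
def cmrm_scores(query_blobs, training_set, vocab, smoothing=0.1):
    """Score each word w for an unannotated image under a simplified
    cross-media relevance model: sum over training images J of
    P(w | J) * prod_i P(b_i | J), with uniform additive smoothing.
    `training_set` is a list of (words, blobs) pairs."""
    scores = {}
    for w in vocab:
        total = 0.0
        for words, blobs in training_set:
            n = len(words) + len(blobs)
            denom = n + smoothing * len(vocab)
            p_w = (words.count(w) + smoothing) / denom
            p_b = 1.0
            for b in query_blobs:
                p_b *= (blobs.count(b) + smoothing) / denom
            total += p_w * p_b
        scores[w] = total
    return scores
```

The top-scoring words become the image's predicted annotation; the same scores support retrieval when a word is given as the query.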

1,275 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigate the use of ontological annotation to measure the similarities in knowledge content or "semantic similarity" between entries in a data resource, and present a simple extension that enables a semantic search of the knowledge held within sequence databases.
Abstract: Motivation: Many bioinformatics data resources not only hold data in the form of sequences, but also as annotation. In the majority of cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper we investigate the use of ontological annotation to measure the similarities in knowledge content or ‘semantic similarity’ between entries in a data resource. These measures allow a bioinformatician to perform a similarity measure over annotation in a manner analogous to those performed over sequences. A measure of semantic similarity for the knowledge component of bioinformatics resources should afford a biologist a new tool in their repertoire of analyses. Results: We present the results from experiments that investigate the validity of using semantic similarity by comparison with sequence similarity. We show a simple extension that enables a semantic search of the knowledge held within sequence databases. Availability: Software available from http://www.russet.
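Measures of this kind are often based on information content, e.g. Resnik-style similarity: two ontology terms are as similar as their most informative common ancestor. A minimal sketch, assuming ancestor sets and corpus annotation probabilities have been precomputed (the paper evaluates a family of such measures, not this exact function):

```python
import math

def resnik_similarity(t1, t2, ancestors, term_prob):
    """Resnik-style semantic similarity between two ontology terms:
    the information content -log p of their most informative common
    ancestor. `ancestors[t]` is the set of t's ancestors including t;
    `term_prob[t]` is the term's annotation probability in a corpus."""
    common = ancestors[t1] & ancestors[t2]
    if not common:
        return 0.0
    return max(-math.log(term_prob[a]) for a in common)
```

Terms whose only shared ancestor is the root (probability 1.0) score 0, so the measure rewards specific shared annotation rather than mere co-membership in the ontology.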

903 citations


Journal ArticleDOI
TL;DR: A genome annotation system for prokaryote genomes, which is well tested and readily adaptable to different tasks, and can be employed as a flexible framework for the large-scale evaluation of different annotation strategies.
Abstract: The flood of sequence data resulting from the large number of current genome projects has increased the need for a flexible, open source genome annotation system, which so far has not existed. To account for the individual needs of different projects, such a system should be modular and easily extensible. We present a genome annotation system for prokaryote genomes, which is well tested and readily adaptable to different tasks. The modular system was developed using an object-oriented approach, and it relies on a relational database backend. Using a well-defined application programming interface (API), the system can be linked easily to other systems. GenDB supports manual as well as automatic annotation strategies. The software is currently in use in more than a dozen microbial genome annotation projects. In addition to its use as a production genome annotation system, it can be employed as a flexible framework for the large-scale evaluation of different annotation strategies. The system is open source.

711 citations


Journal ArticleDOI
TL;DR: The Protein Information Resource is an integrated public resource of protein informatics that supports genomic and proteomic research and scientific discovery and has developed a bibliography system for literature searching, mapping, and user submission.
Abstract: The Protein Information Resource (PIR) is an integrated public resource of protein informatics that supports genomic and proteomic research and scientific discovery. PIR maintains the Protein Sequence Database (PSD), an annotated protein database containing over 283 000 sequences covering the entire taxonomic range. Family classification is used for sensitive identification, consistent annotation, and detection of annotation errors. The superfamily curation defines signature domain architecture and categorizes memberships to improve automated classification. To increase the amount of experimental annotation, the PIR has developed a bibliography system for literature searching, mapping, and user submission, and has conducted retrospective attribution of citations for experimental features. PIR also maintains NREF, a non-redundant reference database, and iProClass, an integrated database of protein family, function, and structure information. PIR-NREF provides a timely and comprehensive collection of protein sequences, currently consisting of more than 1 000 000 entries from PIR-PSD, SWISS-PROT, TrEMBL, RefSeq, GenPept, and PDB. The PIR web site (http://pir.georgetown.edu) connects data analysis tools to underlying databases for information retrieval and knowledge discovery, with functionalities for interactive queries, combinations of sequence and text searches, and sorting and visual exploration of search results. The FTP site provides free download for PSD and NREF biweekly releases and auxiliary databases and files.

446 citations


Book ChapterDOI
01 Jan 2003
TL;DR: Inspired by the Penn Treebank, the most widely used syntactically annotated corpus of English, this work decided to develop a similarly sized corpus of Czech with a rich annotation scheme.
Abstract: The availability of annotated data (with as rich and “deep” annotation as possible) is desirable in any new developments. Textual data are used in the so-called training phase of various empirical methods for solving problems in the field of computational linguistics. While there are many methods that use texts in their plain (or raw) form (in most cases for so-called unsupervised training), more accurate results may be obtained if annotated corpora are available. The data annotation itself is a complex task. While morphologically annotated corpora (pioneered by Henry Kucera in the 1960s) are now available for English and other languages, syntactically annotated corpora are rare. Inspired by the Penn Treebank, the most widely used syntactically annotated corpus of English, we decided to develop a similarly sized corpus of Czech with a rich annotation scheme.

409 citations


Book ChapterDOI
20 Oct 2003
TL;DR: A simplistic upper-level ontology is introduced which starts with some basic philosophic distinctions and goes down to the most popular entity types, thus providing many of the inter-domain common sense concepts and allowing easy domain-specific extensions.
Abstract: The realization of the Semantic Web depends on the availability of a critical mass of metadata for web content, linked to formal knowledge about the world. This paper presents our vision of a holistic system allowing annotation, indexing, and retrieval of documents with respect to real-world entities. A system (called KIM) that partially implements this concept is briefly presented and used for evaluation and demonstration. Our understanding is that a system for semantic annotation should be based upon specific knowledge about the world, rather than indifferent to any ontological commitments and general knowledge. To assure efficiency and reusability of the metadata, we introduce a simplistic upper-level ontology which starts with some basic philosophical distinctions and goes down to the most popular entity types (people, companies, cities, etc.), thus providing many of the inter-domain common sense concepts and allowing easy domain-specific extensions. Based on the ontology, an extensive knowledge base of entity descriptions is maintained. A semantically enhanced information extraction system that provides automatic annotation with references to classes in the ontology and instances in the knowledge base is presented. Based on these annotations, we perform IR-like indexing and retrieval, further extended using the ontology and knowledge about the specific entities.

366 citations


Proceedings ArticleDOI
TL;DR: An innovative image annotation tool for classifying image regions in one of seven classes - sky, skin, vegetation, snow, water, ground, and buildings - or as unknown is described.
Abstract: The paper describes an innovative image annotation tool for classifying image regions into one of seven classes - sky, skin, vegetation, snow, water, ground, and buildings - or as unknown. This tool could be productively applied in the management of large image and video databases, where a considerable volume of images/frames must be automatically indexed. The annotation is performed by a classification system based on a multi-class Support Vector Machine. Experimental results on a test set of 200 images are reported and discussed.

296 citations


Book ChapterDOI
20 Oct 2003
TL;DR: The KIM platform allows KIM-based applications to use it for automatic semantic annotation, content retrieval based on semantic restrictions, and querying and modifying the underlying ontologies and knowledge bases.
Abstract: The KIM platform provides a novel Knowledge and Information Management infrastructure and services for automatic semantic annotation, indexing, and retrieval of documents. It provides a mature infrastructure for scalable and customizable information extraction (IE) as well as annotation and document management, based on GATE. In order to provide a basic level of performance and allow easy bootstrapping of applications, KIM is equipped with an upper-level ontology and a knowledge base providing extensive coverage of entities of general importance. The ontologies and knowledge bases involved are handled using cutting-edge Semantic Web technology and standards, including RDF(S) repositories, ontology middleware and reasoning. From a technical point of view, the platform allows KIM-based applications to use it for automatic semantic annotation, content retrieval based on semantic restrictions, and querying and modifying the underlying ontologies and knowledge bases. This paper presents the KIM platform, with emphasis on its architecture, interfaces, tools, and other technical issues.

291 citations


01 Jan 2003
TL;DR: The experiments show that the classifiers induced from balanced data sampled with the proposed method are more accurate than those induced from the original data.

Abstract: There has been an increasing interest in tools for automating the annotation of databases. Machine learning techniques are promising candidates to help curators, at least, to guide the process of annotation, which is mostly done manually. Following previous work on automated annotation using symbolic machine learning techniques, the present work deals with a common problem in machine learning: classes usually have skewed prior probabilities, i.e., there is a large number of examples of one class compared with just a few examples of the other class. This happens because a large number of proteins is not annotated for every feature. Thus, we analyze and employ some techniques aimed at balancing the training data. Our experiments show that the classifiers induced from balanced data sampled with our method are more accurate than those induced from the original data.
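The balancing step the abstract refers to can be as simple as random undersampling of the majority class. A sketch of that baseline (the paper analyzes several sampling techniques, so this is illustrative, not their method):

```python
import random
from collections import Counter

def balance_by_undersampling(examples, labels, seed=0):
    """Balance a skewed training set by randomly undersampling every
    class down to the minority class size. `examples` and `labels`
    are parallel lists; returns a list of (example, label) pairs."""
    rng = random.Random(seed)
    minority_size = min(Counter(labels).values())
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    balanced = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, minority_size):
            balanced.append((x, y))
    return balanced
```

Undersampling discards majority-class data; oversampling the minority class is the complementary option when training data are scarce.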

01 Jan 2003
TL;DR: A tool for semantic annotation and search in a collection of art images using multiple existing ontologies, including the Art and Architecture Thesaurus, WordNet, ULAN and Iconclass is discussed.
Abstract: In this paper we discuss a tool for semantic annotation and search in a collection of art images. Multiple existing ontologies are used to support this process, including the Art and Architecture Thesaurus, WordNet, ULAN and Iconclass. We discuss knowledge-engineering aspect such as the annotation structure and links between the ontologies. The annotation and search process is illustrated with an application scenario.

Journal ArticleDOI
TL;DR: Public viewers can currently browse updated annotation information for Escherichia coli K-12 strain MG1655, genome-wide transcript profiles from more than 50 microarray experiments and an extensive collection of mutant strains and associated phenotypic data.
Abstract: ASAP (a systematic annotation package for community analysis of genomes) is a relational database and web interface developed to store, update and distribute genome sequence data and functional characterization (https://asap.ahabs.wisc.edu/annotation/php/ASAP1.htm). ASAP facilitates ongoing community annotation of genomes and tracking of information as genome projects move from preliminary data collection through post-sequencing functional analysis. The ASAP database includes multiple genome sequences at various stages of analysis, corresponding experimental data and access to collections of related genome resources. ASAP supports three levels of users: public viewers, annotators and curators. Public viewers can currently browse updated annotation information for Escherichia coli K-12 strain MG1655, genome-wide transcript profiles from more than 50 microarray experiments and an extensive collection of mutant strains and associated phenotypic data. Annotators worldwide are currently using ASAP to participate in a community annotation project for the Erwinia chrysanthemi strain 3937 genome. Curation of the E. chrysanthemi genome annotation as well as those of additional published enterobacterial genomes is underway and will be publicly accessible in the near future.


Journal ArticleDOI
TL;DR: A web tool to predict Gene Ontology (GO) terms that uses BLAST to identify homologous sequences in GO annotated databases and a graph is returned to the user via email.
Abstract: Summary: We have developed a web tool to predict Gene Ontology (GO) terms. The tool accepts an input DNA or protein sequence, and uses BLAST to identify homologous sequences in GO annotated databases. A graph is returned to the user via email. Availability: The tool is freely available at: http://udgenome.
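The annotation-transfer step behind a tool of this kind can be sketched as a vote over the GO terms of the BLAST hits. Everything below is a hypothetical simplification: running BLAST itself is out of scope, and the tuple format and support threshold are assumptions, not the tool's actual interface:

```python
from collections import Counter

def predict_go_terms(blast_hits, min_support=2):
    """Predict GO terms for a query sequence from its annotated
    homologs. `blast_hits` is a list of (subject_id, go_terms,
    bit_score) tuples; a term is predicted when at least
    `min_support` distinct hits carry it, ranked by recurrence."""
    votes = Counter()
    for subject_id, go_terms, bit_score in blast_hits:
        for term in set(go_terms):  # count each term once per hit
            votes[term] += 1
    return [t for t, n in votes.most_common() if n >= min_support]
```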

Journal ArticleDOI
TL;DR: The Gene Ontology (GO) Consortium has produced a controlled vocabulary for annotation of gene function that is used in many organism-specific gene annotation databases, and the relationships among GO attributes with decision trees and Bayesian networks are modeled.
Abstract: The Gene Ontology Consortium (Gene Ontology Consortium 2000) provides a standardized vocabulary for the annotation of gene attributes, which fall into the three general categories of molecular function, biological process, and cellular component. Organism-specific databases such as FlyBase (FlyBase Consortium 2002), Saccharomyces Genome Database (SGD; Cherry et al. 1998), Mouse Genome Database (MGD; Blake et al. 2002), and WormBase (Stein et al. 2001), have codeveloped this vocabulary, and have used it to annotate genes with the attributes that the biomedical literature asserts that they hold. These databases are incomplete because there are genes whose attributes are not yet all known, and because there is literature that has not yet been digested by the database curators. In such cases it is useful to have a prediction of whether a gene has a certain attribute. Such predictions can help to make the databases more complete (and consequently more useful to researchers) by directing curators toward literature that they may have overlooked. Also, predictions that are not presently supported by the literature provide new hypotheses that may be tested experimentally. A variety of approaches for predicting Gene Ontology (GO) attributes have been attempted. Natural language processing was used in Raychaudhuri et al. (2002) to automate the curator's task of extracting gene–attribute associations from literature abstracts. Others have assigned attributes to genes on the basis of microarray data (Hvidsten et al. 2001) or protein folds (Schug et al. 2002). These approaches are especially valuable for assigning attributes to genes with otherwise unknown function. But once some attributes of a gene are known, statistical patterns among the annotations themselves can be useful for predicting additional attributes. 
In this paper, we model the probabilistic relationships between the GO annotations using two approaches, one based on decision trees and the other based on Bayesian networks. We assess the models using cross-validation on the SGD and FlyBase databases. We also manually assess 100 of those gene–attribute associations that the models indicate are likely to hold but that have not been annotated in the databases.
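The kind of statistical pattern these models exploit can be illustrated with a simple co-annotation conditional probability, a crude stand-in for the paper's decision trees and Bayesian networks (illustrative only; the actual models are richer):

```python
def coannotation_probability(target, annotations, database):
    """Estimate P(target term | a gene's known terms) from
    co-annotation counts: the fraction of genes in `database`
    (a dict mapping gene -> set of GO terms) that carry all of
    `annotations` and also carry `target`."""
    matching = [terms for terms in database.values()
                if annotations <= terms]
    if not matching:
        return 0.0
    return sum(target in terms for terms in matching) / len(matching)
```

A high estimated probability for an unannotated gene-attribute pair is exactly the kind of prediction that can direct curators to overlooked literature or suggest a testable hypothesis.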

Proceedings ArticleDOI
20 May 2003
TL;DR: This work describes a framework of metadata creation when web pages are generated from a database and the database owner is cooperatively participating in the Semantic Web, and refers to the framework as deep annotation.

Abstract: The success of the Semantic Web crucially depends on the easy creation, integration and use of semantic data. For this purpose, we consider an integration scenario that defies core assumptions of current metadata construction methods. We describe a framework of metadata creation for the case where web pages are generated from a database and the database owner is cooperatively participating in the Semantic Web. This leads us to the definition of ontology mapping rules by manual semantic annotation and the usage of the mapping rules and of web services for semantic queries. In order to create metadata, the framework combines the presentation layer with the data description layer -- in contrast to "conventional" annotation, which remains at the presentation layer. Therefore, we refer to the framework as deep annotation. We consider deep annotation particularly valid because (i) web pages generated from databases outnumber static web pages, (ii) annotation of web pages may be a very intuitive way to create semantic data from a database, and (iii) data from databases should not be materialized as RDF files; it should remain where it can be handled most efficiently -- in its databases.

Proceedings ArticleDOI
02 Nov 2003
TL;DR: The experimental evaluation was conducted within a family album of a few thousand photographs, and the results show that the proposed approach is effective and efficient for automated face annotation in family albums.

Abstract: Automatic annotation of photographs is one of the most desirable needs in family photograph management systems. In this paper, we present a learning framework to automate face annotation in family photograph albums. Firstly, methodologies of content-based image retrieval and face recognition are seamlessly integrated to achieve automated annotation. Secondly, face annotation is formulated in a Bayesian framework, in which the face similarity measure is defined as maximum a posteriori (MAP) estimation. Thirdly, to deal with missing features, marginal probability is used so that samples which have missing features are compared with those having the full feature set to ensure a non-biased decision. The experimental evaluation was conducted within a family album of a few thousand photographs, and the results show that the proposed approach is effective and efficient for automated face annotation in family albums.
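The MAP formulation in the abstract amounts to choosing the identity that maximizes the posterior P(identity | features) ∝ P(features | identity) P(identity). A generic sketch with the likelihood function and priors supplied by the caller; the paper's similarity measure and missing-feature marginalization are more involved than this:

```python
import math

def annotate_face(features, priors, likelihood):
    """MAP face annotation: return the identity maximizing
    log P(features | identity) + log P(identity).
    `priors` maps identity -> prior probability; `likelihood` is a
    callable (features, identity) -> P(features | identity) > 0."""
    best, best_score = None, -math.inf
    for identity, prior in priors.items():
        score = math.log(likelihood(features, identity)) + math.log(prior)
        if score > best_score:
            best, best_score = identity, score
    return best
```

In a family-album setting the priors can encode how often each person appears, so frequent family members win ties over one-off visitors.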

01 Jan 2003
TL;DR: A new version of the VideoAnnEx, a.k.a. the IBM MPEG-7 Annotation Tool, was developed for collaborative multimedia annotation tasks in a distributed environment, and a forum was proposed to collaboratively annotate semantic labels for the NIST TRECVID 2003 development set.

Abstract: We developed a new version of the VideoAnnEx, a.k.a. the IBM MPEG-7 Annotation Tool, for collaborative multimedia annotation tasks in a distributed environment. The VideoAnnEx assists authors in the task of annotating video sequences with MPEG-7 metadata. Each shot in the video sequence can be annotated with static scene descriptions, key object descriptions, event descriptions, and other lexicon sets. The annotated descriptions are associated with each video shot or with regions in the keyframes, and are stored as MPEG-7 XML files. We proposed a forum to collaboratively annotate semantic labels for the NIST TRECVID 2003 development set. From April to July 2003, 111 researchers from 23 institutes worked together to associate 198K ground-truth labels (433K after hierarchy propagation) with 62.2 hours of video. This large set of valuable ground-truth data is publicly available to the research community, especially for multimedia indexing and retrieval, semantic understanding, and supervised machine learning fields.

Patent
05 Jun 2003
TL;DR: Embodiments provide a system, method, apparatus, means and computer program code that allow multiple annotations to a document to be created and that distinguish between the annotations made by different people as mentioned in this paper.
Abstract: Embodiments provide a system, method, apparatus, means, and computer program code that allow multiple annotations to a document to be created and that distinguish between the annotations made by different people. The people may view documents, exchange ideas and messages, etc. via a server or conference/collaboration system at different times and/or without being in direct communication with each other. In such an off-line collaboration mode, the people may want to listen to, view, or add annotations regarding one or more documents. The methods and systems described herein allow users to follow the trail of annotations regarding a document and to distinguish between the voice or other audible annotations created by other people.

Journal ArticleDOI
TL;DR: This unit addresses the issue of how GO vocabularies are constructed and related to genes and gene products and concludes with a discussion of how researchers can access, browse, and utilize the GO project in the course of their own research.
Abstract: Scientists wishing to utilize genomic data have quickly come to realize the benefit of standardizing descriptions of experimental procedures and results for computer-driven information retrieval systems. The focus of the Gene Ontology project is three-fold. First, the project goal is to compile the Gene Ontologies: structured vocabularies describing domains of molecular biology. Second, the project supports the use of these structured vocabularies in the annotation of gene products. Third, the gene product-to-GO annotation sets are provided by participating groups to the public through open access to the GO database and Web resource. This unit describes the current ontologies and what is beyond the scope of the Gene Ontology project. It addresses the issue of how GO vocabularies are constructed and related to genes and gene products. It concludes with a discussion of how researchers can access, browse, and utilize the GO project in the course of their own research.

Journal ArticleDOI
TL;DR: The HAMAP project, or 'High-quality Automated and Manual Annotation of microbial Proteomes', aims to integrate manual and automatic annotation methods in order to enhance the speed of the curation process while preserving the quality of the database annotation.

Journal ArticleDOI
TL;DR: The approach to protein functional annotation is described with case studies and an examination of common identification errors, and it is illustrated that data integration in PIR supports exploration of protein relationships and may reveal protein functional associations beyond sequence homology.

Patent
Steve Nelson1, Jason Harris1
15 May 2003
TL;DR: In this article, an annotation management system for providing real-time annotations for media content during a videoconference session is provided, which includes a media management server configured to manage media data and annotation data for distribution to participants of the videocon conference session.
Abstract: An annotation management system for providing real-time annotations for media content during a videoconference session is provided. The annotation management system includes a media management server configured to manage media data and annotation data for distribution to participants of the videoconference session. A storage server in communication with the media management server is configured to store the media data and the annotation data. An event database in communication with the media management server is configured to capture events associated with the annotation data. A media analysis server is in communication with the media management server, the event database, and the storage server. The media analysis server is configured to associate the stored annotation data with the captured events to enable reconstruction of the videoconference session based on the captured events. A videoconference system, a computer readable medium, a graphical user interface, and a method are also included.

Proceedings ArticleDOI
12 Apr 2003
TL;DR: A new method is proposed for detecting errors in "gold-standard" part-of-speech annotation based on n-grams occurring in the corpus with multiple taggings based on closed-class analysis and finite-state tagging guide patterns.
Abstract: We propose a new method for detecting errors in "gold-standard" part-of-speech annotation. The approach locates errors with high precision based on n-grams occurring in the corpus with multiple taggings. Two further techniques, closed-class analysis and finite-state tagging guide patterns, are discussed. The success of the three approaches is illustrated for the Wall Street Journal corpus as part of the Penn Treebank.
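The core idea, that word n-grams recurring in the corpus with more than one tagging are suspect, can be sketched directly. This is a simplification: the paper adds further heuristics (such as the closed-class analysis mentioned above) to separate genuine ambiguity from annotation error:

```python
from collections import defaultdict

def variation_ngrams(tagged_corpus, n=3):
    """Find candidate annotation errors as 'variation n-grams':
    word n-grams occurring in the corpus with different tag
    sequences. `tagged_corpus` is a flat list of (word, tag) pairs;
    returns a dict mapping each varying word n-gram to the set of
    tag sequences observed for it."""
    seen = defaultdict(set)
    words = [w for w, t in tagged_corpus]
    tags = [t for w, t in tagged_corpus]
    for i in range(len(words) - n + 1):
        seen[tuple(words[i:i + n])].add(tuple(tags[i:i + n]))
    return {ng: ts for ng, ts in seen.items() if len(ts) > 1}
```

Longer n-grams provide more disambiguating context, so variation within them is more likely to indicate a true error than a legitimate ambiguity.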

01 Jan 2003
TL;DR: In the framework of the ongoing project RoadRunner, this work has developed a prototype, called Labeller, that automatically annotates data extracted by automatically generated wrappers, and its underlying approach has a general validity and therefore it can be applied together with other wrapper generator systems.
Abstract: Data extraction from web pages is performed by software modules called wrappers. Recently, some systems for the automatic generation of wrappers have been proposed in the literature. These systems are based on unsupervised inference techniques: taking as input a small set of sample pages, they can produce a common wrapper to extract relevant data. However, due to the automatic nature of the approach, the data extracted by these wrappers have anonymous names. In the framework of our ongoing project RoadRunner, we have developed a prototype, called Labeller, that automatically annotates data extracted by automatically generated wrappers. Although Labeller has been developed as a companion system to our wrapper generator, its underlying approach has general validity and can therefore be applied together with other wrapper generator systems. We have tested the prototype on several real-life web sites, obtaining encouraging results.


Journal ArticleDOI
TL;DR: A software package, which performs annotation based on GO terms for anonymous cDNA or protein sequences using the species independent GO structure and vocabulary together with a series of protein databases collected from various sites, to perform a detailed GO annotation by sequence similarity searches.
Abstract: Gene Ontology (GO) is the most widely accepted attempt to construct a unified and structured vocabulary for the description of genes and their products in any organism. Annotation by GO terms is performed in most of the current genome projects, which besides generality has the advantage of being very convenient for computer based classification methods. However, direct use of GO in small sequencing projects is not easy, especially for species not commonly represented in public databases. We present a software package (GOblet), which performs annotation based on GO terms for anonymous cDNA or protein sequences. It uses the species independent GO structure and vocabulary together with a series of protein databases collected from various sites, to perform a detailed GO annotation by sequence similarity searches. The sensitivity and the reference protein sets can be selected by the user. GOblet runs automatically and is available as a public service on our web server. The paper also addresses the reliability of automated GO annotations by using a reference set of more than 6000 human proteins. The GOblet server is accessible at http://goblet.molgen.mpg.de.

Journal ArticleDOI
TL;DR: ChipInfo is designed for retrieving annotation information from online databases and organizing such information into easily interpretable tabular format outputs and enables users to independently update the information resource files of these software packages.
Abstract: To date, assembling comprehensive annotation information for all probe sets of any Affymetrix microarrays remains a time-consuming, error-prone and challenging task. ChipInfo is designed for retrieving annotation information from online databases such as NetAffx and Gene Ontology and organizing such information into easily interpretable tabular format outputs. As companion software to dChip and GoSurfer, ChipInfo enables users to independently update the information resource files of these software packages. It also has functions for computing related summary statistics of probe sets and Gene Ontology terms. ChipInfo is available at http://biosun1.harvard.edu/complab/chipinfo/.

Book ChapterDOI
01 Jan 2003
TL;DR: A new, interactive semi-automatic annotation process that allows efficient and reliable annotations and is sped up by incrementally presenting structures and by automatically highlighting unreliable assignments is presented.
Abstract: We report on the syntactic annotation of a German newspaper corpus. The annotations consist of context-free structures, additionally allowing crossing branches, with labeled nodes (phrases) and edges (grammatical functions). Furthermore, we present a new, interactive semi-automatic annotation process that allows efficient and reliable annotations. The annotation process is sped up by incrementally presenting structures and by automatically highlighting unreliable assignments.