
Showing papers on "Annotation" published in 2003


Journal ArticleDOI
TL;DR: DAVID is a web-accessible program that integrates functional genomic annotations with intuitive graphical summaries, and assists in the interpretation of genome-scale datasets by facilitating the transition from data collection to biological meaning.
Abstract: The distributed nature of biological knowledge poses a major challenge to the interpretation of genome-scale datasets, including those derived from microarray and proteomic studies. This report describes DAVID, a web-accessible program that integrates functional genomic annotations with intuitive graphical summaries. Lists of gene or protein identifiers are rapidly annotated and summarized according to shared categorical data for Gene Ontology, protein domain, and biochemical pathway membership. DAVID assists in the interpretation of genome-scale datasets by facilitating the transition from data collection to biological meaning.

8,849 citations


Journal ArticleDOI
TL;DR: The SWISS-PROT protein knowledgebase connects amino acid sequences with the current knowledge in the Life Sciences by providing an interdisciplinary overview of relevant information that brings together experimental results, computed features and sometimes even contradictory conclusions.
Abstract: The SWISS-PROT protein knowledgebase (http://www.expasy.org/sprot/ and http://www.ebi.ac.uk/swissprot/) connects amino acid sequences with the current knowledge in the Life Sciences. Each protein entry provides an interdisciplinary overview of relevant information by bringing together experimental results, computed features and sometimes even contradictory conclusions. Detailed expertise that goes beyond the scope of SWISS-PROT is made available via direct links to specialised databases. SWISS-PROT provides annotated entries for all species, but concentrates on the annotation of entries from human (the HPI project) and other model organisms to ensure the presence of high quality annotation for representative members of all protein families. Part of the annotation can be transferred to other family members, as is already done for microbes by the High-quality Automated and Manual Annotation of microbial Proteomes (HAMAP) project. Protein families and groups of proteins are regularly reviewed to keep up with current scientific findings. Complementarily, TrEMBL strives to comprise all protein sequences that are not yet represented in SWISS-PROT, by incorporating a perpetually increasing level of mostly automated annotation. Researchers are welcome to contribute their knowledge to the scientific community by submitting relevant findings to SWISS-PROT at swiss-prot@expasy.org.

3,440 citations


Proceedings ArticleDOI
28 Jul 2003
TL;DR: The approach shows the usefulness of using formal information retrieval models for the task of image annotation and retrieval by assuming that regions in an image can be described using a small vocabulary of blobs.
Abstract: Libraries have traditionally used manual image annotation for indexing and later retrieving their image collections. However, manual image annotation is an expensive and labor-intensive procedure, and hence there has been great interest in coming up with automatic ways to retrieve images based on content. Here, we propose an automatic approach to annotating and retrieving images based on a training set of images. We assume that regions in an image can be described using a small vocabulary of blobs. Blobs are generated from image features using clustering. Given a training set of images with annotations, we show that probabilistic models allow us to predict the probability of generating a word given the blobs in an image. This may be used to automatically annotate and retrieve images given a word as a query. We show that relevance models allow us to derive these probabilities in a natural way. Experiments show that the annotation performance of this cross-media relevance model is almost six times as good (in terms of mean precision) as a model based on word-blob co-occurrence, and twice as good as a state-of-the-art model derived from machine translation. Our approach shows the usefulness of using formal information retrieval models for the task of image annotation and retrieval.
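The generative idea in the abstract, scoring each vocabulary word by how likely it is to be generated alongside the query image's blobs across training images, can be sketched as follows. This is an illustrative simplification, not the paper's exact estimator; the smoothing scheme and function names are assumptions:

```python
def cmrm_scores(query_blobs, training_set, vocab, smoothing=0.1):
    """Score each word w for an unannotated image under a simplified
    cross-media relevance model: sum over training images J of
    P(w | J) * prod_i P(b_i | J), with uniform additive smoothing.
    `training_set` is a list of (words, blobs) pairs."""
    scores = {}
    for w in vocab:
        total = 0.0
        for words, blobs in training_set:
            n = len(words) + len(blobs)
            denom = n + smoothing * len(vocab)
            p_w = (words.count(w) + smoothing) / denom
            p_b = 1.0
            for b in query_blobs:
                p_b *= (blobs.count(b) + smoothing) / denom
            total += p_w * p_b
        scores[w] = total
    return scores
```

The top-scoring words become the image's predicted annotation; the same scores support retrieval when a word is given as the query.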

1,275 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigate the use of ontological annotation to measure the similarities in knowledge content or "semantic similarity" between entries in a data resource, and present a simple extension that enables a semantic search of the knowledge held within sequence databases.
Abstract: Motivation: Many bioinformatics data resources not only hold data in the form of sequences, but also as annotation. In the majority of cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper we investigate the use of ontological annotation to measure the similarities in knowledge content or ‘semantic similarity’ between entries in a data resource. These measures allow a bioinformatician to perform a similarity measure over annotation in a manner analogous to those performed over sequences. A measure of semantic similarity for the knowledge component of bioinformatics resources should afford a biologist a new tool in their repertoire of analyses. Results: We present the results from experiments that investigate the validity of using semantic similarity by comparison with sequence similarity. We show a simple extension that enables a semantic search of the knowledge held within sequence databases. Availability: Software available from http://www.russet.
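Measures of this kind are often based on information content, e.g. Resnik-style similarity: two ontology terms are as similar as their most informative common ancestor. A minimal sketch, assuming ancestor sets and corpus annotation probabilities have been precomputed (the paper evaluates a family of such measures, not this exact function):

```python
import math

def resnik_similarity(t1, t2, ancestors, term_prob):
    """Resnik-style semantic similarity between two ontology terms:
    the information content -log p of their most informative common
    ancestor. `ancestors[t]` is the set of t's ancestors including t;
    `term_prob[t]` is the term's annotation probability in a corpus."""
    common = ancestors[t1] & ancestors[t2]
    if not common:
        return 0.0
    return max(-math.log(term_prob[a]) for a in common)
```

Terms whose only shared ancestor is the root (probability 1.0) score 0, so the measure rewards specific shared annotation rather than mere co-membership in the ontology.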

903 citations


Journal ArticleDOI
TL;DR: A genome annotation system for prokaryote genomes, which is well tested and readily adaptable to different tasks, and can be employed as a flexible framework for the large-scale evaluation of different annotation strategies.
Abstract: The flood of sequence data resulting from the large number of current genome projects has increased the need for a flexible, open source genome annotation system, which so far has not existed. To account for the individual needs of different projects, such a system should be modular and easily extensible. We present a genome annotation system for prokaryote genomes, which is well tested and readily adaptable to different tasks. The modular system was developed using an object-oriented approach, and it relies on a relational database backend. Using a well-defined application programming interface (API), the system can be linked easily to other systems. GenDB supports manual as well as automatic annotation strategies. The software is currently in use in more than a dozen microbial genome annotation projects. In addition to its use as a production genome annotation system, it can be employed as a flexible framework for the large-scale evaluation of different annotation strategies. The system is open source.

711 citations


Journal ArticleDOI
TL;DR: The Protein Information Resource is an integrated public resource of protein informatics that supports genomic and proteomic research and scientific discovery and has developed a bibliography system for literature searching, mapping, and user submission.
Abstract: The Protein Information Resource (PIR) is an integrated public resource of protein informatics that supports genomic and proteomic research and scientific discovery. PIR maintains the Protein Sequence Database (PSD), an annotated protein database containing over 283 000 sequences covering the entire taxonomic range. Family classification is used for sensitive identification, consistent annotation, and detection of annotation errors. The superfamily curation defines signature domain architecture and categorizes memberships to improve automated classification. To increase the amount of experimental annotation, the PIR has developed a bibliography system for literature searching, mapping, and user submission, and has conducted retrospective attribution of citations for experimental features. PIR also maintains NREF, a non-redundant reference database, and iProClass, an integrated database of protein family, function, and structure information. PIR-NREF provides a timely and comprehensive collection of protein sequences, currently consisting of more than 1 000 000 entries from PIR-PSD, SWISS-PROT, TrEMBL, RefSeq, GenPept, and PDB. The PIR web site (http://pir.georgetown.edu) connects data analysis tools to underlying databases for information retrieval and knowledge discovery, with functionalities for interactive queries, combinations of sequence and text searches, and sorting and visual exploration of search results. The FTP site provides free download for PSD and NREF biweekly releases and auxiliary databases and files.

446 citations


Book ChapterDOI
01 Jan 2003
TL;DR: Inspired by the Penn Treebank, the most widely used syntactically annotated corpus of English, this work decided to develop a similarly sized corpus of Czech with a rich annotation scheme.
Abstract: The availability of annotated data (with as rich and “deep” annotation as possible) is desirable in any new developments. Textual data are used in the so-called training phase of various empirical methods for solving problems in the field of computational linguistics. While there are many methods that use texts in their plain (or raw) form (in most cases for so-called unsupervised training), more accurate results may be obtained if annotated corpora are available. The data annotation itself is a complex task. While morphologically annotated corpora (pioneered by Henry Kucera in the 1960s) are now available for English and other languages, syntactically annotated corpora are rare. Inspired by the Penn Treebank, the most widely used syntactically annotated corpus of English, we decided to develop a similarly sized corpus of Czech with a rich annotation scheme.

409 citations


Book ChapterDOI
20 Oct 2003
TL;DR: A simplistic upper-level ontology is introduced which starts with some basic philosophic distinctions and goes down to the most popular entity types, thus providing many of the inter-domain common sense concepts and allowing easy domain-specific extensions.
Abstract: The realization of the Semantic Web depends on the availability of a critical mass of metadata for web content, linked to formal knowledge about the world. This paper presents our vision of a holistic system allowing annotation, indexing, and retrieval of documents with respect to real-world entities. A system (called KIM) that partially implements this concept is briefly presented and used for evaluation and demonstration. Our understanding is that a system for semantic annotation should be based upon specific knowledge about the world, rather than indifferent to any ontological commitments and general knowledge. To assure efficiency and reusability of the metadata, we introduce a simplistic upper-level ontology which starts with some basic philosophical distinctions and goes down to the most popular entity types (people, companies, cities, etc.), thus providing many of the inter-domain common sense concepts and allowing easy domain-specific extensions. Based on the ontology, an extensive knowledge base of entity descriptions is maintained. A semantically enhanced information extraction system that provides automatic annotation with references to classes in the ontology and instances in the knowledge base is presented. Based on these annotations, we perform IR-like indexing and retrieval, further extended using the ontology and knowledge about the specific entities.

366 citations


Proceedings ArticleDOI
TL;DR: An innovative image annotation tool for classifying image regions in one of seven classes - sky, skin, vegetation, snow, water, ground, and buildings - or as unknown is described.
Abstract: The paper describes an innovative image annotation tool for classifying image regions into one of seven classes - sky, skin, vegetation, snow, water, ground, and buildings - or as unknown. This tool could be productively applied in the management of large image and video databases, where a considerable volume of images/frames must be automatically indexed. The annotation is performed by a classification system based on a multi-class Support Vector Machine. Experimental results on a test set of 200 images are reported and discussed.

296 citations


Book ChapterDOI
20 Oct 2003
TL;DR: The KIM platform allows KIM-based applications to use it for automatic semantic annotation, content retrieval based on semantic restrictions, and querying and modifying the underlying ontologies and knowledge bases.
Abstract: The KIM platform provides a novel Knowledge and Information Management infrastructure and services for automatic semantic annotation, indexing, and retrieval of documents. It provides a mature infrastructure for scalable and customizable information extraction (IE) as well as annotation and document management, based on GATE. In order to provide a basic level of performance and allow easy bootstrapping of applications, KIM is equipped with an upper-level ontology and a knowledge base providing extensive coverage of entities of general importance. The ontologies and knowledge bases involved are handled using cutting-edge Semantic Web technology and standards, including RDF(S) repositories, ontology middleware and reasoning. From a technical point of view, the platform allows KIM-based applications to use it for automatic semantic annotation, content retrieval based on semantic restrictions, and querying and modifying the underlying ontologies and knowledge bases. This paper presents the KIM platform, with emphasis on its architecture, interfaces, tools, and other technical issues.

291 citations


01 Jan 2003
TL;DR: The experiments show that the classifiers induced from balanced data sampled with the proposed method are more accurate than those induced from the original data.

Abstract: There has been an increasing interest in tools for automating the annotation of databases. Machine learning techniques are promising candidates to help curators, at least, to guide the process of annotation, which is mostly done manually. Following previous work on automated annotation using symbolic machine learning techniques, the present work deals with a common problem in machine learning: classes usually have skewed prior probabilities, i.e., there is a large number of examples of one class compared with just a few examples of the other class. This happens because a large number of proteins is not annotated for every feature. Thus, we analyze and employ some techniques aimed at balancing the training data. Our experiments show that the classifiers induced from balanced data sampled with our method are more accurate than those induced from the original data.
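The balancing step the abstract refers to can be as simple as random undersampling of the majority class. A sketch of that baseline (the paper analyzes several sampling techniques, so this is illustrative, not their method):

```python
import random
from collections import Counter

def balance_by_undersampling(examples, labels, seed=0):
    """Balance a skewed training set by randomly undersampling every
    class down to the minority class size. `examples` and `labels`
    are parallel lists; returns a list of (example, label) pairs."""
    rng = random.Random(seed)
    minority_size = min(Counter(labels).values())
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    balanced = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, minority_size):
            balanced.append((x, y))
    return balanced
```

Undersampling discards majority-class data; oversampling the minority class is the complementary option when training data are scarce.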

01 Jan 2003
TL;DR: A tool for semantic annotation and search in a collection of art images using multiple existing ontologies, including the Art and Architecture Thesaurus, WordNet, ULAN and Iconclass is discussed.
Abstract: In this paper we discuss a tool for semantic annotation and search in a collection of art images. Multiple existing ontologies are used to support this process, including the Art and Architecture Thesaurus, WordNet, ULAN and Iconclass. We discuss knowledge-engineering aspect such as the annotation structure and links between the ontologies. The annotation and search process is illustrated with an application scenario.

Journal ArticleDOI
TL;DR: Public viewers can currently browse updated annotation information for Escherichia coli K-12 strain MG1655, genome-wide transcript profiles from more than 50 microarray experiments and an extensive collection of mutant strains and associated phenotypic data.
Abstract: ASAP (a systematic annotation package for community analysis of genomes) is a relational database and web interface developed to store, update and distribute genome sequence data and functional characterization (https://asap.ahabs.wisc.edu/annotation/php/ASAP1.htm). ASAP facilitates ongoing community annotation of genomes and tracking of information as genome projects move from preliminary data collection through post-sequencing functional analysis. The ASAP database includes multiple genome sequences at various stages of analysis, corresponding experimental data and access to collections of related genome resources. ASAP supports three levels of users: public viewers, annotators and curators. Public viewers can currently browse updated annotation information for Escherichia coli K-12 strain MG1655, genome-wide transcript profiles from more than 50 microarray experiments and an extensive collection of mutant strains and associated phenotypic data. Annotators worldwide are currently using ASAP to participate in a community annotation project for the Erwinia chrysanthemi strain 3937 genome. Curation of the E. chrysanthemi genome annotation as well as those of additional published enterobacterial genomes is underway and will be publicly accessible in the near future.


Journal ArticleDOI
TL;DR: A web tool to predict Gene Ontology (GO) terms that uses BLAST to identify homologous sequences in GO annotated databases and a graph is returned to the user via email.
Abstract: Summary: We have developed a web tool to predict Gene Ontology (GO) terms. The tool accepts an input DNA or protein sequence, and uses BLAST to identify homologous sequences in GO annotated databases. A graph is returned to the user via email. Availability: The tool is freely available at: http://udgenome.
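The annotation-transfer step behind a tool of this kind can be sketched as a vote over the GO terms of the BLAST hits. Everything below is a hypothetical simplification: running BLAST itself is out of scope, and the tuple format and support threshold are assumptions, not the tool's actual interface:

```python
from collections import Counter

def predict_go_terms(blast_hits, min_support=2):
    """Predict GO terms for a query sequence from its annotated
    homologs. `blast_hits` is a list of (subject_id, go_terms,
    bit_score) tuples; a term is predicted when at least
    `min_support` distinct hits carry it, ranked by recurrence."""
    votes = Counter()
    for subject_id, go_terms, bit_score in blast_hits:
        for term in set(go_terms):  # count each term once per hit
            votes[term] += 1
    return [t for t, n in votes.most_common() if n >= min_support]
```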

Journal ArticleDOI
TL;DR: The Gene Ontology (GO) Consortium has produced a controlled vocabulary for annotation of gene function that is used in many organism-specific gene annotation databases, and the relationships among GO attributes with decision trees and Bayesian networks are modeled.
Abstract: The Gene Ontology Consortium (Gene Ontology Consortium 2000) provides a standardized vocabulary for the annotation of gene attributes, which fall into the three general categories of molecular function, biological process, and cellular component. Organism-specific databases such as FlyBase (FlyBase Consortium 2002), Saccharomyces Genome Database (SGD; Cherry et al. 1998), Mouse Genome Database (MGD; Blake et al. 2002), and WormBase (Stein et al. 2001), have codeveloped this vocabulary, and have used it to annotate genes with the attributes that the biomedical literature asserts that they hold. These databases are incomplete because there are genes whose attributes are not yet all known, and because there is literature that has not yet been digested by the database curators. In such cases it is useful to have a prediction of whether a gene has a certain attribute. Such predictions can help to make the databases more complete (and consequently more useful to researchers) by directing curators toward literature that they may have overlooked. Also, predictions that are not presently supported by the literature provide new hypotheses that may be tested experimentally. A variety of approaches for predicting Gene Ontology (GO) attributes have been attempted. Natural language processing was used in Raychaudhuri et al. (2002) to automate the curator's task of extracting gene–attribute associations from literature abstracts. Others have assigned attributes to genes on the basis of microarray data (Hvidsten et al. 2001) or protein folds (Schug et al. 2002). These approaches are especially valuable for assigning attributes to genes with otherwise unknown function. But once some attributes of a gene are known, statistical patterns among the annotations themselves can be useful for predicting additional attributes. 
In this paper, we model the probabilistic relationships between the GO annotations using two approaches, one based on decision trees and the other based on Bayesian networks. We assess the models using cross-validation on the SGD and FlyBase databases. We also manually assess 100 of those gene–attribute associations that the models indicate are likely to hold but that have not been annotated in the databases.
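The kind of statistical pattern these models exploit can be illustrated with a simple co-annotation conditional probability, a crude stand-in for the paper's decision trees and Bayesian networks (illustrative only; the actual models are richer):

```python
def coannotation_probability(target, annotations, database):
    """Estimate P(target term | a gene's known terms) from
    co-annotation counts: the fraction of genes in `database`
    (a dict mapping gene -> set of GO terms) that carry all of
    `annotations` and also carry `target`."""
    matching = [terms for terms in database.values()
                if annotations <= terms]
    if not matching:
        return 0.0
    return sum(target in terms for terms in matching) / len(matching)
```

A high estimated probability for an unannotated gene-attribute pair is exactly the kind of prediction that can direct curators to overlooked literature or suggest a testable hypothesis.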

Proceedings ArticleDOI
20 May 2003
TL;DR: This work describes a framework of metadata creation when web pages are generated from a database and the database owner is cooperatively participating in the Semantic Web, and refers to the framework as deep annotation.

Abstract: The success of the Semantic Web crucially depends on the easy creation, integration and use of semantic data. For this purpose, we consider an integration scenario that defies core assumptions of current metadata construction methods. We describe a framework of metadata creation for the case where web pages are generated from a database and the database owner is cooperatively participating in the Semantic Web. This leads us to the definition of ontology mapping rules by manual semantic annotation and the usage of the mapping rules and of web services for semantic queries. In order to create metadata, the framework combines the presentation layer with the data description layer -- in contrast to "conventional" annotation, which remains at the presentation layer. Therefore, we refer to the framework as deep annotation. We consider deep annotation particularly valid because (i) web pages generated from databases outnumber static web pages, (ii) annotation of web pages may be a very intuitive way to create semantic data from a database, and (iii) data from databases should not be materialized as RDF files; it should remain where it can be handled most efficiently -- in its databases.

Proceedings ArticleDOI
02 Nov 2003
TL;DR: The experimental evaluation was conducted within a family album of a few thousand photographs, and the results show that the proposed approach is effective and efficient for automated face annotation in family albums.

Abstract: Automatic annotation of photographs is one of the most desirable needs in family photograph management systems. In this paper, we present a learning framework to automate face annotation in family photograph albums. Firstly, methodologies of content-based image retrieval and face recognition are seamlessly integrated to achieve automated annotation. Secondly, face annotation is formulated in a Bayesian framework, in which the face similarity measure is defined as maximum a posteriori (MAP) estimation. Thirdly, to deal with missing features, marginal probability is used so that samples which have missing features are compared with those having the full feature set to ensure a non-biased decision. The experimental evaluation was conducted within a family album of a few thousand photographs, and the results show that the proposed approach is effective and efficient for automated face annotation in family albums.
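The MAP formulation in the abstract amounts to choosing the identity that maximizes the posterior P(identity | features) ∝ P(features | identity) P(identity). A generic sketch with the likelihood function and priors supplied by the caller; the paper's similarity measure and missing-feature marginalization are more involved than this:

```python
import math

def annotate_face(features, priors, likelihood):
    """MAP face annotation: return the identity maximizing
    log P(features | identity) + log P(identity).
    `priors` maps identity -> prior probability; `likelihood` is a
    callable (features, identity) -> P(features | identity) > 0."""
    best, best_score = None, -math.inf
    for identity, prior in priors.items():
        score = math.log(likelihood(features, identity)) + math.log(prior)
        if score > best_score:
            best, best_score = identity, score
    return best
```

In a family-album setting the priors can encode how often each person appears, so frequent family members win ties over one-off visitors.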

01 Jan 2003
TL;DR: A new version of the VideoAnnEx, a.k.a. the IBM MPEG-7 Annotation Tool, was developed for collaborative multimedia annotation tasks in a distributed environment, and a forum was proposed to collaboratively annotate semantic labels for the NIST TRECVID 2003 development set.

Abstract: We developed a new version of the VideoAnnEx, a.k.a. the IBM MPEG-7 Annotation Tool, for collaborative multimedia annotation tasks in a distributed environment. The VideoAnnEx assists authors in the task of annotating video sequences with MPEG-7 metadata. Each shot in the video sequence can be annotated with static scene descriptions, key object descriptions, event descriptions, and other lexicon sets. The annotated descriptions are associated with each video shot or with regions in the keyframes, and are stored as MPEG-7 XML files. We proposed a forum to collaboratively annotate semantic labels for the NIST TRECVID 2003 development set. From April to July 2003, 111 researchers from 23 institutes worked together to associate 198K ground-truth labels (433K after hierarchy propagation) with 62.2 hours of video. This large set of valuable ground-truth data is publicly available to the research community, especially for multimedia indexing and retrieval, semantic understanding, and supervised machine learning fields.

Patent
05 Jun 2003
TL;DR: Embodiments provide a system, method, apparatus, means and computer program code that allow multiple annotations to a document to be created and that distinguish between the annotations made by different people as mentioned in this paper.
Abstract: Embodiments provide a system, method, apparatus, means, and computer program code that allow multiple annotations to a document to be created and that distinguish between the annotations made by different people. The people may view documents, exchange ideas and messages, etc. via a server or conference/collaboration system at different times and/or without being in direct communication with each other. In such an off-line collaboration mode, the people may want to listen to, view, or add annotations regarding one or more documents. The methods and systems described herein allow users to follow the trail of annotations regarding a document and to distinguish between the voice or other audible annotations created by other people.

Journal ArticleDOI
TL;DR: This unit addresses the issue of how GO vocabularies are constructed and related to genes and gene products and concludes with a discussion of how researchers can access, browse, and utilize the GO project in the course of their own research.
Abstract: Scientists wishing to utilize genomic data have quickly come to realize the benefit of standardizing descriptions of experimental procedures and results for computer-driven information retrieval systems. The focus of the Gene Ontology project is three-fold. First, the project goal is to compile the Gene Ontologies: structured vocabularies describing domains of molecular biology. Second, the project supports the use of these structured vocabularies in the annotation of gene products. Third, the gene product-to-GO annotation sets are provided by participating groups to the public through open access to the GO database and Web resource. This unit describes the current ontologies and what is beyond the scope of the Gene Ontology project. It addresses the issue of how GO vocabularies are constructed and related to genes and gene products. It concludes with a discussion of how researchers can access, browse, and utilize the GO project in the course of their own research.

Journal ArticleDOI
TL;DR: The HAMAP project, or 'High-quality Automated and Manual Annotation of microbial Proteomes', aims to integrate manual and automatic annotation methods in order to enhance the speed of the curation process while preserving the quality of the database annotation.

Journal ArticleDOI
TL;DR: The approach to protein functional annotation is described with case studies and an examination of common identification errors, and it is illustrated that data integration in PIR supports exploration of protein relationships and may reveal protein functional associations beyond sequence homology.

Patent
Steve Nelson1, Jason Harris1
15 May 2003
TL;DR: In this article, an annotation management system for providing real-time annotations for media content during a videoconference session is provided, which includes a media management server configured to manage media data and annotation data for distribution to participants of the videocon conference session.
Abstract: An annotation management system for providing real-time annotations for media content during a videoconference session is provided. The annotation management system includes a media management server configured to manage media data and annotation data for distribution to participants of the videoconference session. A storage server in communication with the media management server is configured to store the media data and the annotation data. An event database in communication with the media management server is configured to capture events associated with the annotation data. A media analysis server is in communication with the media management server, the event database, and the storage server. The media analysis server is configured to associate the stored annotation data with the captured events to enable reconstruction of the videoconference session based on the captured events. A videoconference system, a computer readable medium, a graphical user interface, and a method are also included.

Proceedings ArticleDOI
12 Apr 2003
TL;DR: A new method is proposed for detecting errors in "gold-standard" part-of-speech annotation based on n-grams occurring in the corpus with multiple taggings based on closed-class analysis and finite-state tagging guide patterns.
Abstract: We propose a new method for detecting errors in "gold-standard" part-of-speech annotation. The approach locates errors with high precision based on n-grams occurring in the corpus with multiple taggings. Two further techniques, closed-class analysis and finite-state tagging guide patterns, are discussed. The success of the three approaches is illustrated for the Wall Street Journal corpus as part of the Penn Treebank.
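The core idea, that word n-grams recurring in the corpus with more than one tagging are suspect, can be sketched directly. This is a simplification: the paper adds further heuristics (such as the closed-class analysis mentioned above) to separate genuine ambiguity from annotation error:

```python
from collections import defaultdict

def variation_ngrams(tagged_corpus, n=3):
    """Find candidate annotation errors as 'variation n-grams':
    word n-grams occurring in the corpus with different tag
    sequences. `tagged_corpus` is a flat list of (word, tag) pairs;
    returns a dict mapping each varying word n-gram to the set of
    tag sequences observed for it."""
    seen = defaultdict(set)
    words = [w for w, t in tagged_corpus]
    tags = [t for w, t in tagged_corpus]
    for i in range(len(words) - n + 1):
        seen[tuple(words[i:i + n])].add(tuple(tags[i:i + n]))
    return {ng: ts for ng, ts in seen.items() if len(ts) > 1}
```

Longer n-grams provide more disambiguating context, so variation within them is more likely to indicate a true error than a legitimate ambiguity.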

01 Jan 2003
TL;DR: In the framework of the ongoing project RoadRunner, this work has developed a prototype, called Labeller, that automatically annotates data extracted by automatically generated wrappers, and its underlying approach has a general validity and therefore it can be applied together with other wrapper generator systems.
Abstract: Data extraction from web pages is performed by software modules called wrappers. Recently, some systems for the automatic generation of wrappers have been proposed in the literature. These systems are based on unsupervised inference techniques: taking as input a small set of sample pages, they can produce a common wrapper to extract relevant data. However, due to the automatic nature of the approach, the data extracted by these wrappers have anonymous names. In the framework of our ongoing project RoadRunner, we have developed a prototype, called Labeller, that automatically annotates data extracted by automatically generated wrappers. Although Labeller has been developed as a companion system to our wrapper generator, its underlying approach has general validity and can therefore be applied together with other wrapper generator systems. We have tested the prototype on several real-life web sites, obtaining encouraging results.


Journal ArticleDOI
TL;DR: A software package, which performs annotation based on GO terms for anonymous cDNA or protein sequences using the species independent GO structure and vocabulary together with a series of protein databases collected from various sites, to perform a detailed GO annotation by sequence similarity searches.
Abstract: Gene Ontology (GO) is the most widely accepted attempt to construct a unified and structured vocabulary for the description of genes and their products in any organism. Annotation by GO terms is performed in most of the current genome projects, which besides generality has the advantage of being very convenient for computer based classification methods. However, direct use of GO in small sequencing projects is not easy, especially for species not commonly represented in public databases. We present a software package (GOblet), which performs annotation based on GO terms for anonymous cDNA or protein sequences. It uses the species independent GO structure and vocabulary together with a series of protein databases collected from various sites, to perform a detailed GO annotation by sequence similarity searches. The sensitivity and the reference protein sets can be selected by the user. GOblet runs automatically and is available as a public service on our web server. The paper also addresses the reliability of automated GO annotations by using a reference set of more than 6000 human proteins. The GOblet server is accessible at http://goblet.molgen.mpg.de.

Journal ArticleDOI
TL;DR: ChipInfo is designed for retrieving annotation information from online databases and organizing such information into easily interpretable tabular format outputs and enables users to independently update the information resource files of these software packages.
Abstract: To date, assembling comprehensive annotation information for all probe sets of any Affymetrix microarrays remains a time-consuming, error-prone and challenging task. ChipInfo is designed for retrieving annotation information from online databases such as NetAffx and Gene Ontology and organizing such information into easily interpretable tabular format outputs. As companion software to dChip and GoSurfer, ChipInfo enables users to independently update the information resource files of these software packages. It also has functions for computing related summary statistics of probe sets and Gene Ontology terms. ChipInfo is available at http://biosun1.harvard.edu/complab/chipinfo/.

Book ChapterDOI
01 Jan 2003
TL;DR: A new, interactive semi-automatic annotation process that allows efficient and reliable annotations and is sped up by incrementally presenting structures and by automatically highlighting unreliable assignments is presented.
Abstract: We report on the syntactic annotation of a German newspaper corpus. The annotations consist of context-free structures, additionally allowing crossing branches, with labeled nodes (phrases) and edges (grammatical functions). Furthermore, we present a new, interactive semi-automatic annotation process that allows efficient and reliable annotations. The annotation process is sped up by incrementally presenting structures and by automatically highlighting unreliable assignments.