Showing papers on "Annotation" published in 2004

Journal Article•DOI•

The Gene Ontology (GO) database and informatics resource.

Midori A. Harris, Jennifer I. Clark¹, Ireland A¹, Jane Lomax¹, Michael Ashburner¹,², R. Foulger¹,², Karen Eilbeck¹,³, Suzanna E. Lewis¹,³, B. Marshall¹,³, Christopher J. Mungall¹,³, J. Richter¹,³, Gerald M. Rubin¹,³, Judith A. Blake¹, Carol J. Bult¹, Dolan M¹, Drabkin H¹, Janan T. Eppig¹, Hill Dp¹, L. Ni¹, Ringwald M¹, Rama Balakrishnan¹,⁴, J. M. Cherry¹,⁴, Karen R. Christie¹,⁴, Maria C. Costanzo¹,⁴, Selina S. Dwight¹,⁴, Stacia R. Engel¹,⁴, Dianna G. Fisk¹,⁴, Jodi E. Hirschman¹,⁴, Eurie L. Hong¹,⁴, Robert S. Nash¹,⁴, Anand Sethuraman¹,⁴, Chandra L. Theesfeld¹,⁴, David Botstein¹,⁵, Kara Dolinski¹,⁵, Becket Feierbach¹,⁵, Tanya Z. Berardini¹,⁶, S. Mundodi¹,⁶, Seung Y. Rhee¹,⁶, Rolf Apweiler¹, Daniel Barrell¹, Camon E¹, E. Dimmer¹, Lee¹, Rex L. Chisholm, Pascale Gaudet¹,⁷, Warren A. Kibbe¹,⁷, Ranjana Kishore¹,⁸, Erich M. Schwarz¹,⁸, Paul W. Sternberg¹,⁸, M. Gwinn¹, Hannick L¹, Wortman J¹, Matthew Berriman¹,⁹, Wood¹,⁹, de la Cruz N¹,¹⁰, Peter J. Tonellato¹,¹⁰, Pankaj Jaiswal¹,¹¹, Seigfried T¹,¹², White R¹,¹³•Institutions (13)

Wellcome Trust¹, University of Cambridge², University of California, Berkeley³, Stanford University⁴, Princeton University⁵, Carnegie Institution for Science⁶, Northwestern University⁷, California Institute of Technology⁸, Wellcome Trust Sanger Institute⁹, Medical College of Wisconsin¹⁰, Cornell University¹¹, Iowa State University¹², Incyte¹³

01 Jan 2004-Nucleic Acids Research

TL;DR: The Gene Ontology (GO) project provides structured, controlled vocabularies and classifications that cover several domains of molecular and cellular biology and are freely available for community use in the annotation of genes, gene products and sequences.

Abstract: The Gene Ontology (GO) project (http://www.geneontology.org/) provides structured, controlled vocabularies and classifications that cover several domains of molecular and cellular biology and are freely available for community use in the annotation of genes, gene products and sequences. Many model organism databases and genome annotation groups use the GO and contribute their annotation sets to the GO resource. The GO database integrates the vocabularies and contributed annotations and provides full access to this information in several formats. Members of the GO Consortium continually work collectively, involving outside experts as needed, to expand and update the GO vocabularies. The GO Web resource also provides access to extensive documentation about the GO project and links to applications that use GO data for functional analyses.

3,565 citations

Journal Article•DOI•

GO::TermFinder: open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes

Elizabeth I. Boyle, Shuai Weng, Jeremy Gollub¹, Heng Jin¹, David Botstein, J. Michael Cherry, Gavin Sherlock•Institutions (1)

Stanford University¹

12 Dec 2004-Bioinformatics

TL;DR: GO::TermFinder comprises a set of object-oriented Perl modules for accessing Gene Ontology information and evaluating and visualizing the collective annotation of a list of genes to GO terms, which can be used to draw conclusions from microarray and other biological data.

Abstract: Summary: GO::TermFinder comprises a set of object-oriented Perl modules for accessing Gene Ontology (GO) information and evaluating and visualizing the collective annotation of a list of genes to GO terms. It can be used to draw conclusions from microarray and other biological data, calculating the statistical significance of each annotation. GO::TermFinder can be used on any system on which Perl can be run, either as a command line application, in single or batch mode, or as a web-based CGI script. Availability: The full source code and documentation for GO::TermFinder are freely available from http://search.cpan.org/dist/GO-TermFinder/
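The abstract does not say which statistical test GO::TermFinder applies, so the following is only a minimal sketch of one common way to score the enrichment of a GO term in a gene list: the upper tail of a hypergeometric distribution. It is written in Python rather than the tool's Perl, and the function name and parameters are hypothetical; they are not part of the GO::TermFinder API.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """Return P(X >= hits) for a hypergeometric draw (illustrative only).

    population: number of genes in the background set
    annotated:  background genes annotated to the GO term
    sample:     size of the gene list being tested
    hits:       genes in the list that are annotated to the term
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie between 0 and population")
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(max(hits, 0), max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 6000 background genes, 40 annotated to a term,
    # and a list of 50 genes of which 5 carry the term.
    print(enrichment_p_value(6000, 40, 50, 5))
```

Because many terms are usually tested at once, a multiple-testing correction would normally follow; that step is omitted from this sketch.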

1,869 citations

Journal Article•DOI•

The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology

Evelyn Camon¹, Michele Magrane¹, Daniel Barrell¹, Vivian Lee¹, Emily Dimmer¹, John Maslen¹, David Binns¹, Nicola Harte¹, Rodrigo Lopez¹, Rolf Apweiler¹•Institutions (1)

European Bioinformatics Institute¹

01 Jan 2004-Nucleic Acids Research

TL;DR: The Gene Ontology Annotation database aims to provide high-quality electronic and manual annotations to the UniProt Knowledgebase (Swiss-Prot, TrEMBL and PIR-PSD) using the standardized vocabulary of the Gene Ontology (GO).

Abstract: The Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA) aims to provide high-quality electronic and manual annotations to the UniProt Knowledgebase (Swiss-Prot, TrEMBL and PIR-PSD) using the standardized vocabulary of the Gene Ontology (GO). As a supplementary archive of GO annotation, GOA promotes a high level of integration of the knowledge represented in UniProt with other databases. This is achieved by converting UniProt annotation into a recognized computational format. GOA provides annotated entries for nearly 60,000 species (GOA-SPTr) and is the largest and most comprehensive open-source contributor of annotations to the GO Consortium annotation effort. By integrating GO annotations from other model organism groups, GOA consolidates specialized knowledge and expertise to ensure the data remain a key reference for up-to-date biological information. Furthermore, the GOA database fully endorses the Human Proteomics Initiative by prioritizing the annotation of proteins likely to benefit human health and disease. In addition to a non-redundant set of annotations to the human proteome (GOA-Human) and monthly releases of its GO annotation for all species (GOA-SPTr), a series of GO mapping files and specific cross-references in other databases are also regularly distributed. GOA can be queried through a simple user-friendly web interface or downloaded in a parsable format via the EBI and GO FTP websites. The GOA data set can be used to enhance the annotation of particular model organism or gene expression data sets, although increasingly it has been used to evaluate GO predictions generated from text mining or protein interaction experiments. In 2004, the GOA team will build on its success and will continue to supplement the functional annotation of UniProt and work towards enhancing the ability of scientists to access all available biological information. 
Researchers wishing to query or contribute to the GOA project are encouraged to email: goa@ebi.ac.uk.

917 citations

Journal Article•DOI•

Functional Annotation of the Arabidopsis Genome Using Controlled Vocabularies

Tanya Z. Berardini¹, Suparna Mundodi, Leonore Reiser, Eva Huala, Margarita Garcia-Hernandez, Peifen Zhang, Lukas A. Mueller, Jungwoon Yoon, Aisling Doyle, Gabriel C. Lander, Nick Moseyko, Danny Yoo, Iris Xu, Brandon Zoeckler, Mary Montoya, Neil A. Miller, Dan C. Weems, Seung Y. Rhee•Institutions (1)

Carnegie Institution for Science¹

01 Jun 2004-Plant Physiology

TL;DR: The process of Arabidopsis functional annotation is described using a variety of data sources, and several ways in which this information can be accessed and used to infer knowledge about Arabidopsis and other plant species are illustrated.

Abstract: Controlled vocabularies are increasingly used by databases to describe genes and gene products because they facilitate identification of similar genes within an organism or among different organisms. One of The Arabidopsis Information Resource's goals is to associate all Arabidopsis genes with terms developed by the Gene Ontology Consortium that describe the molecular function, biological process, and subcellular location of a gene product. We have also developed terms describing Arabidopsis anatomy and developmental stages and use these to annotate published gene expression data. As of March 2004, we used computational and manual annotation methods to make 85,666 annotations representing 26,624 unique loci. We focus on associating genes to controlled vocabulary terms based on experimental data from the literature and use The Arabidopsis Information Resource-developed PubSearch software to facilitate this process. Each annotation is tagged with a combination of evidence codes, evidence descriptions, and references that provide a robust means to assess data quality. Annotation of all Arabidopsis genes will allow quantitative comparisons between sets of genes derived from sources such as microarray experiments. The Arabidopsis annotation data will also facilitate annotation of newly sequenced plant genomes by using sequence similarity to transfer annotations to homologous genes. In addition, complete and up-to-date annotations will make unknown genes easy to identify and target for experimentation. Here, we describe the process of Arabidopsis functional annotation using a variety of data sources and illustrate several ways in which this information can be accessed and used to infer knowledge about Arabidopsis and other plant species.

449 citations

Proceedings Article•

Annotating Multi-media/Multi-modal Resources with ELAN

Hennie Brugman¹, Albert Russel¹•Institutions (1)

Max Planck Society¹

01 May 2004

TL;DR: The current state of development of the manual annotation tool ELAN is shown, usage requirements from three different groups of users are presented, and the paper describes how one annotation model and a number of generic design principles guided the choices made during the development of ELAN.

Abstract: This paper shows the current state of development of the manual annotation tool ELAN. It presents usage requirements from three different groups of users and shows how one annotation model and a number of generic design principles guided the choices made during the development process of ELAN.

Introduction: At the Max-Planck-Institute for Psycholinguistics (MPI), software development on annotation tools for the manual annotation of multimedia data has been going on since the early 1990s. Over this decade there have been large changes in enabling technology and in insights into the nature of linguistic annotation. Media frameworks for the handling of digital audio and especially digital video files have matured, as has media streaming technology. XML has come into existence and has become highly relevant in a short time. Rendering and input of Unicode characters is now commonplace. Simultaneously, users gained experience with the first generation of video annotation tools and became aware of and got used to these new technologies. From this a new set of requirements arose. Finally, annotation tool builders are better aware of each other's approaches, annotation models and annotation document formats. Clearly convergence is going on, leading to easier exchange of data between annotation tools. An important role in this process was played by the paper by Bird & Liberman (2001) that introduced Annotation Graphs. We are closely watching and trying to participate in standards initiatives, for example ISO TC37/SC4. The first video annotation tool developed at the MPI was MediaTagger, a QuickTime-based application that runs only on pre-OS X Macintoshes. It started as a first attempt to exploit the QuickTime Movie data structure, and especially its text tracks, as an informal model for linguistic annotation. Since then several new formal models were made, each one building on the experiences of the previous ones and considering new user requirements. The formal modeling languages that were used are Entity-Relationship diagrams and UML. A detailed presentation and evaluation of these models can be found in (Brugman & Wittenburg, 2001). The next chapters will discuss the requirements of several different groups of users and describe the latest state of ELAN functionality. We will then present our model for annotation in some detail and show how we can cover the needs of very different user groups with one relatively simple model. In the discussion, plans for future development will be presented. (Footnote 1: http://www.mpi.nl)

User requirements: ELAN is developed with a number of different user groups in mind. These users are situated both within the MPI and, in an increasing number of cases, outside the MPI. Often they are participating in externally funded projects (DoBeS, ECHO). We will discuss the main requirements per group, although there is of course a substantial overlap between the groups' needs.

428 citations

Proceedings Article•DOI•

Towards the self-annotating web

Philipp Cimiano¹, Siegfried Handschuh¹, Steffen Staab¹•Institutions (1)

Karlsruhe Institute of Technology¹

17 May 2004

TL;DR: PANKOW (Pattern-based Annotation through Knowledge on the Web), a method which employs an unsupervised, pattern-based approach to categorize instances with regard to an ontology, is proposed.

Abstract: The success of the Semantic Web depends on the availability of ontologies as well as on the proliferation of web pages annotated with metadata conforming to these ontologies. Thus, a crucial question is where to acquire these metadata from. In this paper we propose PANKOW (Pattern-based Annotation through Knowledge on the Web), a method which employs an unsupervised, pattern-based approach to categorize instances with regard to an ontology. The approach is evaluated against the manual annotations of two human subjects. The approach is implemented in OntoMat, an annotation tool for the Semantic Web, and shows very promising results.

388 citations

Journal Article•DOI•

The vertebrate genome annotation (Vega) database

Laurens G. Wilming¹, James G. R. Gilbert¹, Kerstin Howe¹, Stephen J. Trevanion¹, Tim Hubbard¹, Jennifer Harrow¹•Institutions (1)

Wellcome Trust Sanger Institute¹

17 Dec 2004-Nucleic Acids Research

TL;DR: The Vertebrate Genome Annotation (Vega) database was first made public in 2004 and now contains comprehensive annotation on 20 of the 24 human chromosomes, four whole mouse chromosomes and around 40% of the zebrafish Danio rerio genome.

Abstract: The Vertebrate Genome Annotation (Vega) database (http://vega.sanger.ac.uk) has been designed to be a community resource for browsing manual annotation of finished sequences from a variety of vertebrate genomes. Its core database is based on an Ensembl-style schema, extended to incorporate curation-specific metadata. In collaboration with the genome sequencing centres, Vega attempts to present consistent high-quality annotation of the published human chromosome sequences. In addition, it is also possible to view various finished regions from other vertebrates, including mouse and zebrafish. Vega displays only manually annotated gene structures built using transcriptional evidence, which can be examined in the browser. Attempts have been made to standardize the annotation procedure across each vertebrate genome, which should aid comparative analysis of orthologues across the different finished regions.

373 citations

Journal Article•DOI•

An annotation management system for relational databases

Deepavali Bhagwat¹, Laura Chiticariu¹, Wang-Chiew Tan¹, Gaurav Vijayvargiya¹•Institutions (1)

University of California, Santa Cruz¹

31 Aug 2004

TL;DR: An annotation management system for relational databases where every piece of data in a relation is assumed to have zero or more annotations associated with it and annotations are propagated along, from the source to the output, as data is being transformed through a query.

Abstract: We present an annotation management system for relational databases. In this system, every piece of data in a relation is assumed to have zero or more annotations associated with it and annotations are propagated along, from the source to the output, as data is being transformed through a query. Such an annotation management system is important for understanding the provenance and quality of data, especially in applications that deal with integration of scientific and biological data. We present an extension, pSQL, of a fragment of SQL that has three different types of annotation propagation schemes, each useful for different purposes. The default scheme propagates annotations according to where data is copied from. The default-all scheme propagates annotations according to where data is copied from among all equivalent formulations of a given query. The custom scheme allows a user to specify how annotations should propagate. We present a storage scheme for the annotations and describe algorithms for translating a pSQL query under each propagation scheme into one or more SQL queries that would correctly retrieve the relevant annotations according to the specified propagation scheme. For the default-all scheme, we also show how we generate finitely many queries that can simulate the annotation propagation behavior of the set of all equivalent queries, which is possibly infinite. The algorithms are implemented and the feasibility of the system is demonstrated by a set of experiments that we have conducted.
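As a rough illustration of the default scheme described above (annotations follow the data to wherever it is copied), the sketch below attaches a set of annotations to each cell and carries them through a simple select-project query. It is plain Python, not pSQL; the `Cell` class, the `select_project` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Cell:
    """A single value in a relation plus the annotations attached to it."""
    value: object
    annotations: FrozenSet[str] = frozenset()


Row = Dict[str, Cell]


def select_project(rows: Iterable[Row], keep: Callable[[Row], bool], columns: List[str]) -> List[Row]:
    """Filter rows with `keep`, then copy only `columns` into the output.

    Each output cell is the source cell itself, so its annotations reach the
    result according to where the value was copied from.
    """
    return [{col: row[col] for col in columns} for row in rows if keep(row)]


if __name__ == "__main__":
    # Hypothetical relation with per-cell annotations.
    genes = [
        {"id": Cell("g1"), "function": Cell("kinase", frozenset({"curated 2004"}))},
        {"id": Cell("g2"), "function": Cell("unknown", frozenset({"predicted"}))},
    ]
    result = select_project(genes, lambda r: r["function"].value != "unknown", ["function"])
    for row in result:
        print(row["function"].value, sorted(row["function"].annotations))
```

In this sketch a column that is only inspected by the filter, and not copied to the output, contributes no annotations to the result, which mirrors the "where data is copied from" wording of the abstract.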

313 citations

Patent•

Rapid visual sorting of digital files and data

Steven M. Drucker¹, Curtis G. Wong¹, Asta Roseway¹, Steven C. Glenner¹, Steven D. DeMar¹•Institutions (1)

Microsoft¹

17 Jul 2004

TL;DR: A system and method are described that include a user interface to manage sets of digital data (e.g., files such as digital photographs or email messages).

Abstract: Described is a system and method including a user interface to manage sets of digital data (e.g., files) such as digital photographs or email messages. The system and method comprise a rapid sort mechanism and an underlying support mechanism that associates metadata with each set of digital data, including annotation metadata obtained from the sort mechanism. As the user scrolls through images that represent the sets of digital data and categorizes them, metadata as to its particular categorization or lack of categorization is implicitly obtained and associated with each set of digital data. Grouping of sets of digital data into clusters is also provided, with a visual indication as to which cluster a set of digital data belongs. With respect to digital photography, the system and method makes annotating and classifying digital photographs significantly easier and faster than contemporary photograph management mechanisms.

304 citations

Journal Article•DOI•

TIGER: Linguistic Interpretation of a German Corpus

Sabine Brants¹, Stefanie Dipper², Peter Eisenberg³, Silvia Hansen-Schirra¹, Esther König², Wolfgang Lezius², Christian Rohrer², George Smith³, Hans Uszkoreit¹•Institutions (3)

Saarland University¹, University of Stuttgart², University of Potsdam³

01 Dec 2004-Research on Language and Computation

TL;DR: The TIGER Treebank, a corpus of currently 40,000 syntactically annotated German newspaper sentences, is described, along with the query language designed to facilitate a simple formulation of complex queries and a graphical user interface for query input.

Abstract: This paper reports on the TIGER Treebank, a corpus of currently 40,000 syntactically annotated German newspaper sentences. We describe what kind of information is encoded in the treebank and introduce the different representation formats that are used for the annotation and exploitation of the treebank. We explain the different methods used for the annotation: interactive annotation, using the tool ANNOTATE, and LFG parsing. Furthermore, we give an account of the annotation scheme used for the TIGER treebank. This scheme is an extended and improved version of the NEGRA annotation scheme and we illustrate in detail the linguistic extensions that were made concerning the annotation in the TIGER project. The main differences concern coordination, verb subcategorization, expletives and proper nouns. In addition, the paper presents the query tool TIGERSearch that was developed in the project to exploit the treebank in an adequate way. We describe the query language, which was designed to facilitate a simple formulation of complex queries; furthermore, we briefly introduce TIGERin, a graphical user interface for query input. The paper concludes with a summary and some directions for future work.

253 citations

Patent•

Semi-automatic annotation of multimedia objects

Wen-Yin Liu¹, Hong-Jiang Zhang¹•Institutions (1)

Microsoft¹

20 Oct 2004

TL;DR: A multimedia object retrieval and annotation system integrates an annotation process with object retrieval and relevance feedback processes; the annotation is performed in the background, hidden from the user, as the user conducts normal searches.

Abstract: A multimedia object retrieval and annotation system integrates an annotation process with object retrieval and relevance feedback processes. The annotation process annotates multimedia objects, such as digital images, with semantically relevant keywords. The annotation process is performed in background, hidden from the user, as the user conducts normal searches. The annotation process is “semi-automatic” in that it utilizes both keyword-based information retrieval and content-based image retrieval techniques to automatically search for multimedia objects, and then encourages users to provide feedback on the retrieved objects. The user identifies objects as either relevant or irrelevant to the query keywords and based on this feedback, the system automatically annotates the objects with semantically relevant keywords and/or updates associations between the keywords and objects. As the retrieval-feedback-annotation cycle is repeated, the annotation coverage and accuracy of future searches continues to improve.

Proceedings Article•

Integrated Annotation for Biomedical Information Extraction

Seth Kulick¹, Ann Bies¹, Mark Liberman¹, Mark Mandel¹, Ryan McDonald², Martha Palmer, Andrew I. Schein, Lyle H. Ungar, Scott Winters, Pete White•Institutions (2)

University of Pennsylvania¹, Children's Hospital of Philadelphia²

06 May 2004

TL;DR: An approach to two areas of biomedical information extraction, drug development and cancer genomics, is described, using a framework that includes corpus annotation integrated at multiple levels: a Treebank containing syntactic structure, a Propbank containing predicate-argument structure, and annotation of entities and relations among the entities.

Abstract: We describe an approach to two areas of biomedical information extraction, drug development and cancer genomics. We have developed a framework which includes corpus annotation integrated at multiple levels: a Treebank containing syntactic structure, a Propbank containing predicate-argument structure, and annotation of entities and relations among the entities. Crucial to this approach is the proper characterization of entities as relation components, which allows the integration of the entity annotation with the syntactic structure while retaining the capacity to annotate and extract more complex events. We are training statistical taggers using this annotation for such extraction as well as using them for improving the annotation process.

Patent•

Extraction of facts from text

Mark Wasson, James Wiltshire¹, Donald Loritz¹, Steve Xu¹, Shian-Jung Chen¹, Valentina Templar¹, Eleni Koutsomitopoulou¹•Institutions (1)

LexisNexis¹

26 Oct 2004

TL;DR: The fact extraction tool in the fact extraction tool set (FEX) is a pattern matching language used to write scripts that find and match patterns of attributes corresponding to targeted pieces of information in the text, and extract that information.

Abstract: A fact extraction tool set ('FEX') finds and extracts targeted pieces of information from text using linguistic and pattern matching technologies, and in particular, text annotation and fact extraction. Text annotation tools break a text, such as a document, into its base tokens and annotate those tokens or patterns of tokens with orthographic, syntactic, semantic, pragmatic and other attributes. A user-defined 'Annotation Configuration' controls which annotation tools are used in a given application. XML is used as the basis for representing the annotated text. A tag uncrossing tool resolves conflicting (crossed) annotation boundaries in an annotated text to produce well-formed XML from the results of the individual annotators. The fact extraction tool is a pattern matching language which is used to write scripts that find and match patterns of attributes that correspond to targeted pieces of information in the text, and extract that information.
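To make the notion of crossed annotation boundaries concrete, here is a small sketch that only detects them. It does not reproduce the patent's tag uncrossing tool, whose algorithm the abstract does not give. Spans are hypothetical `(start, end, label)` triples with half-open character offsets, and the function name is an assumption made for illustration.

```python
from itertools import combinations
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def crossing_pairs(spans: List[Span]) -> List[Tuple[str, str]]:
    """Return label pairs whose spans overlap without one containing the other.

    Such pairs cannot be serialized as properly nested XML elements.
    """
    crossed = []
    for a, b in combinations(sorted(spans), 2):
        # After sorting, a starts no later than b. The two spans cross when
        # b starts strictly inside a and ends strictly after a ends.
        if a[0] < b[0] < a[1] < b[1]:
            crossed.append((a[2], b[2]))
    return crossed


if __name__ == "__main__":
    # Hypothetical annotator output over one sentence.
    print(crossing_pairs([(0, 5, "np"), (3, 8, "date"), (0, 2, "det")]))
```

A resolver could then split one span of each crossed pair at the other's boundary so that the pieces nest; which span to split is a policy choice that the abstract leaves open.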

Journal Article•DOI•

International standard for a linguistic annotation framework

Nancy Ide¹, Laurent Romary²•Institutions (2)

Vassar College¹, French Institute for Research in Computer Science and Automation²

01 Sep 2004-Natural Language Engineering

TL;DR: The Linguistic Annotation Framework, under development within ISO TC37 SC4 WG1, is intended to serve as a basis for harmonizing existing language resources as well as developing new ones.

Abstract: This paper describes the Linguistic Annotation Framework under development within ISO TC37 SC4 WG1. The Linguistic Annotation Framework is intended to serve as a basis for harmonizing existing language resources as well as developing new ones.

2003 Standard for the Annotation of Temporal Expressions

Lisa Ferro, Laurie Gerber, Inderjeet Mani, Beth Sundheim, George Wilson

01 Jan 2004

TL;DR: The views, opinions, and/or findings contained in this report are those of the MITRE Corporation and should not be construed as an official Government position, policy, or decision, unless designated by other documentation.

Abstract: The views, opinions, and/or findings contained in this report are those of the MITRE Corporation and should not be construed as an official Government position, policy, or decision, unless designated by other documentation.

Journal Article•DOI•

Pseudomonas aeruginosa Genome Database and PseudoCAP: facilitating community-based, continually updated, genome annotation.

Geoffrey L. Winsor¹, Raymond Lo¹, Shannan J. Ho Sui¹, Korine S.E. Ung¹, Shaoshan Huang, Dean Cheng¹, Wai Kay Ho Ching¹, Robert E. W. Hancock², Fiona S. L. Brinkman¹•Institutions (2)

Simon Fraser University¹, University of British Columbia²

17 Dec 2004-Nucleic Acids Research

TL;DR: This database system, with its focus on facilitating flexible queries of the data and providing access to both peer-reviewed annotations as well as alternate annotation information, may be a suitable model for other genome projects wishing to use a continually updated, community-based annotation approach.

Abstract: Using the Pseudomonas aeruginosa Genome Project as a test case, we have developed a database and submission system to facilitate a community-based approach to continually updated genome annotation (http://www.pseudomonas.com). Researchers submit proposed annotation updates through one of three web-based form options which are then subjected to review, and if accepted, entered into both the database and log file of updates with author acknowledgement. In addition, a coordinator continually reviews literature for suitable updates, as we have found such reviews to be the most efficient. Both the annotations database and updates-log database have Boolean search capability with the ability to sort results and download all data or search results as tab-delimited files. To complement this peer-reviewed genome annotation, we also provide a linked GBrowse view which displays alternate annotations. Additional tools and analyses are also integrated, including PseudoCyc, and knockout mutant information. We propose that this database system, with its focus on facilitating flexible queries of the data and providing access to both peer-reviewed annotations as well as alternate annotation information, may be a suitable model for other genome projects wishing to use a continually updated, community-based annotation approach. The source code is freely available under GNU General Public Licence.

Patent•

Information sharing device and information sharing method

Seiya Shimizu¹, Asako Kitaura¹•Institutions (1)

Fujitsu¹

30 Sep 2004

TL;DR: A technique is presented which enables users to add, with ease, annotation information to an electronic document on a network and to share the annotation information within a disclosure range set for the annotations.

Abstract: A technique is provided which enables users to add, with ease, annotation information to an electronic document on a network and to share the annotation information within a disclosure range set to the annotation information. Electronic information is provided in a state that allows annotation information to be attached to it; the annotation information to be attached to the electronic information is stored; attribute information indicating the disclosure range of the annotation information is stored; and, when a user requests annotation information, the annotation information available to that user is provided to the user's terminal by consulting the attribute information.

Proceedings Article•DOI•

Effective automatic image annotation via a coherent language model and active learning

Rong Jin¹, Joyce Y. Chai¹, Luo Si²•Institutions (2)

Michigan State University¹, Carnegie Mellon University²

10 Oct 2004

TL;DR: A coherent language model for automatic image annotation is proposed that takes into account word-to-word correlation by estimating a coherent language model for an image; it can be used with active learning to significantly reduce the required number of annotated image examples.

Abstract: Image annotations allow users to access a large image database with textual queries. There have been several studies on automatic image annotation utilizing machine learning techniques, which automatically learn statistical models from annotated images and apply them to generate annotations for unseen images. One common problem shared by most previous learning approaches for automatic image annotation is that each annotated word is predicted for an image independently from other annotated words. In this paper, we propose a coherent language model for automatic image annotation that takes into account the word-to-word correlation by estimating a coherent language model for an image. This new approach has two important advantages: 1) it is able to automatically determine the annotation length to improve the accuracy of retrieval results, and 2) it can be used with active learning to significantly reduce the required number of annotated image examples. Empirical studies with the Corel dataset are presented to show the effectiveness of the coherent language model for automatic image annotation.

Proceedings Article•DOI•

Multi-level annotation of natural scenes using dominant image components and semantic concepts

Jianping Fan¹, Yuli Gao¹, Hangzai Luo¹•Institutions (1)

University of North Carolina at Charlotte¹

10 Oct 2004

TL;DR: This paper proposes a multi-level approach to annotate the semantics of natural scenes by using both the dominant image components (salient objects) and the relevant semantic concepts to achieve automatic image annotation at the content level.

Abstract: Automatic image annotation is a promising solution to enable semantic image retrieval via keywords. In this paper, we propose a multi-level approach to annotate the semantics of natural scenes by using both the dominant image components (salient objects) and the relevant semantic concepts. To achieve automatic image annotation at the content level, we use salient objects as the dominant image components for image content representation and feature extraction. To support automatic image annotation at the concept level, a novel image classification technique is developed to map the images into the most relevant semantic image concepts. In addition, Support Vector Machine (SVM) classifiers are used to learn the detection functions for the pre-defined salient objects, and finite mixture models are used for semantic concept interpretation and modeling. An adaptive EM algorithm is proposed to determine the optimal model structure and model parameters simultaneously. We also demonstrate that our algorithms are very effective in enabling multi-level annotation of natural scenes in a large-scale image dataset.

Patent•

Multi-user, multi-timed collaborative annotation

Elias Albornoz Jordi A Feigenb¹, Lee Feigenbaum¹, Sean J. Martin¹, Simon L. Martin¹, Lonnie A. McCullough¹, Elias Torres¹•Institutions (1)

IBM¹

08 Nov 2004

TL;DR: A user with a predetermined privilege selects a widget and is presented with the associated annotation document; the user performs an annotation task that modifies the annotation document and submits it to the annotation store, and the submission triggers the workflow action program to progress the workflow to another step.

Abstract: A displayed document comprises an annotation widget, the widget being associated with an annotation document and a corresponding annotation key in an annotation store. The annotation document is associated with a workflow action program. A user with a predetermined privilege selects a widget and is presented with the annotation document. The user performs an annotation task modifying the annotation document and submits the annotation document to the annotation store, the submission triggering the workflow action program to progress the workflow to another step.

Proceedings Article•DOI•

Non-lexical approaches to identifying associative relations in the gene ontology

Olivier Bodenreider¹, Marc Aubry, Anita Burgun•Institutions (1)

National Institutes of Health¹

01 Dec 2004

TL;DR: The associations identified in this study could serve as the starting point for adding associative relations across hierarchies to GO, but would require manual curation.

Abstract: The Gene Ontology (GO) is a controlled vocabulary widely used for the annotation of gene products. GO is organized in three hierarchies for molecular functions, cellular components, and biological processes but no relations are provided among terms across hierarchies. The objective of this study is to investigate three non-lexical approaches to identifying such associative relations in GO and compare them among themselves and to lexical approaches. The three approaches are: computing similarity in a vector space model, statistical analysis of co-occurrence of GO terms in annotation databases, and association rule mining. Five annotation databases (FlyBase, the Human subset of GOA, MGI, SGD, and WormBase) are used in this study. A total of 7,665 associations were identified by at least one of the three non-lexical approaches. Of these, 12% were identified by more than one approach. While there are almost 6,000 lexical relations among GO terms, only 203 associations were identified by both non-lexical and lexical approaches. The associations identified in this study could serve as the starting point for adding associative relations across hierarchies to GO, but would require manual curation. The application to quality assurance of annotation databases is also discussed.
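
The co-occurrence idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not say which statistic was used, so the lift score here is an assumption, and the gene names, term labels, and the function `cross_hierarchy_lift` are hypothetical. The sketch counts how often two terms from different hierarchies annotate the same gene product and compares that with what independence would predict.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy annotations: gene product -> set of (hierarchy, term) pairs.
# MF = molecular function, BP = biological process, CC = cellular component.
ANNOTATIONS = {
    "gene_a": {("MF", "kinase activity"), ("BP", "phosphorylation")},
    "gene_b": {("MF", "kinase activity"), ("BP", "phosphorylation"), ("CC", "cytoplasm")},
    "gene_c": {("MF", "DNA binding"), ("CC", "nucleus")},
    "gene_d": {("MF", "DNA binding"), ("BP", "transcription"), ("CC", "nucleus")},
    "gene_e": {("MF", "kinase activity"), ("CC", "nucleus")},
}


def cross_hierarchy_lift(annotations):
    """Return {(term_a, term_b): lift} for term pairs from different hierarchies.

    lift = P(a and b) / (P(a) * P(b)); values above 1 mean the two terms
    co-occur more often than expected if they were independent.
    """
    n = len(annotations)
    term_counts = Counter()
    pair_counts = Counter()
    for terms in annotations.values():
        term_counts.update(terms)
        # Sorting gives every pair a canonical order, so (a, b) == (b, a).
        for a, b in combinations(sorted(terms), 2):
            if a[0] != b[0]:  # keep only pairs that cross hierarchies
                pair_counts[(a, b)] += 1
    return {
        (a, b): (count / n) / ((term_counts[a] / n) * (term_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }


if __name__ == "__main__":
    scores = cross_hierarchy_lift(ANNOTATIONS)
    for pair, lift in sorted(scores.items(), key=lambda item: -item[1]):
        print(pair, round(lift, 3))
```

Pairs with a high score would only be candidates for an associative relation; as the abstract notes, they would still require manual curation.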

Patent•

Method for associating annotations with document families

Jordi Albornoz¹, Lee Feigenbaum¹, Douglas R. Fish¹, Sean J. Martin¹, Hoa Tran¹, David A. Wall¹•Institutions (1)

IBM¹

17 Dec 2004

TL;DR: Methods, systems, and articles of manufacture are presented for managing an annotation system that stores annotations for a document family, i.e., a series of versions of a data source.

Abstract: The present invention generally provides methods, systems, and articles of manufacture for managing an annotation system that includes storing annotations for a document family, i.e., a series of versions of a data source. Annotations created for one version of the data source may be viewed in context from both subsequent and prior versions of the same data source. Embodiments of the invention associate annotations with both a data source “family identifier” as well as a “version identifier.” Other than adding a family ID to the data source, the data source remains unchanged. The family ID is maintained across different versions of the data source, whereas version IDs are determined for a specific version of the data source. Version IDs can be constructed from each data source directly, and do not need to be stored.
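
The family ID and version ID scheme described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented implementation: the classes `DataSource` and `AnnotationStore` are hypothetical, and deriving the version ID from a SHA-256 hash of the content is just one way to construct it "from each data source directly" without storing it.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataSource:
    """One version of a document; only the family ID is added to it."""
    family_id: str
    content: str


def version_id(source):
    # Assumption: the version ID is a content hash, so it can be recomputed
    # from the data source at any time and never needs to be stored in it.
    return hashlib.sha256(source.content.encode("utf-8")).hexdigest()


@dataclass
class AnnotationStore:
    # family ID -> list of (version ID, annotation text)
    _by_family: dict = field(default_factory=dict)

    def annotate(self, source, text):
        record = (version_id(source), text)
        self._by_family.setdefault(source.family_id, []).append(record)

    def annotations_for(self, source):
        """Return (text, made_on_this_version) for the whole document family."""
        current = version_id(source)
        records = self._by_family.get(source.family_id, [])
        return [(text, vid == current) for vid, text in records]


if __name__ == "__main__":
    v1 = DataSource("doc-42", "first draft")
    v2 = DataSource("doc-42", "second draft")
    store = AnnotationStore()
    store.annotate(v1, "check this figure")
    print(store.annotations_for(v2))
```

Because lookups go through the family ID, an annotation made on one version remains visible from earlier and later versions, while the version ID records which version it was originally attached to.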

Journal Article•DOI•

GOblet: a platform for Gene Ontology annotation of anonymous sequence data

Detlef Groth¹, Hans Lehrach¹, Steffen Hennig¹•Institutions (1)

Max Planck Society¹

01 Jul 2004-Nucleic Acids Research

TL;DR: GOblet is a comprehensive web server application providing the annotation of anonymous sequence data with Gene Ontology (GO) terms and provides an improved display of results with the aid of Java applets.

Abstract: GOblet is a comprehensive web server application providing the annotation of anonymous sequence data with Gene Ontology (GO) terms. It uses a variety of different protein databases (human, murines, invertebrates, plants, sp-trembl) and their respective GO mappings. The user selects the appropriate database and alignment threshold and thereafter submits single or multiple nucleotide or protein sequences. Results are shown in different ways, e.g. as survey statistics for the main GO categories for all sequences or as detailed results for each single sequence that has been submitted. In its newest version, GOblet allows the batch submission of sequences and provides an improved display of results with the aid of Java applets. All output data, together with the Java applet, are packed to a downloadable archive for local installation and analysis. GOblet can be accessed freely at http://goblet.molgen.mpg.de.

Patent•

Method and System of Annotation for Electronic Documents

Yue Pan¹, Li Zhang¹•Institutions (1)

IBM¹

21 Oct 2004

TL;DR: A method of annotation for electronic documents is presented, together with a method for creating, modifying and browsing annotations in an electronic document, and an apparatus and system for editing and browsing annotations in electronic documents.

Abstract: The present invention provides a method of annotation for electronic documents, a method for creating, modifying and browsing an annotation in an electronic document, and an apparatus and system for editing and browsing annotations in electronic documents. The method of annotation for electronic documents includes: storing annotation contents for one or more electronic documents in a shared dictionary; and, when a reader browses an electronic document, providing the reader with annotations for the electronic document based on the shared dictionary.

Proceedings Article•DOI•

Photo annotation on a camera phone

Anita Wilhelm¹, Yuri Takhteyev¹, Risto Sarvas², Nancy A. Van House¹, Marc Davis¹•Institutions (2)

University of California, Berkeley¹, Helsinki Institute for Information Technology²

24 Apr 2004

TL;DR: Usability issues encountered in using a camera phone as an image annotation device immediately after image capture and users' responses to use of such a system are presented.

Abstract: In this paper we describe a system that allows users to annotate digital photos at the time of capture. The system uses camera phones with a lightweight client application and a server to store the images and metadata and assists the user in annotation on the camera phone by providing guesses about the location and content of the photos. By conducting user interface testing, surveys, and focus groups we were able to evaluate the usability of this system and uncover usage patterns and motivations that will inform our development of future mobile media annotation applications. In this paper we present usability issues encountered in using a camera phone as an image annotation device immediately after image capture and users' responses to use of such a system.

Journal Article•DOI•

MyHits: a new interactive resource for protein annotation and domain identification

Marco Pagni¹, Vassilios Ioannidis¹, Lorenzo Cerutti¹, Monique Zahn-Zabal¹, Monique Zahn-Zabal², C. Victor Jongeneel³, Laurent Falquet¹•Institutions (3)

Swiss Institute of Bioinformatics¹, Ludwig Institute for Cancer Research², National Center for Supercomputing Applications³

01 Jul 2004-Nucleic Acids Research

TL;DR: The MyHits web server is a new integrated service dedicated to the annotation of protein sequences and to the analysis of their domains and signatures and includes a new collection of tools to generate graphical representations of pairwise and multiple sequence alignments including their annotated features.

Abstract: The MyHits web server (http://myhits.isb-sib.ch) is a new integrated service dedicated to the annotation of protein sequences and to the analysis of their domains and signatures. Guest users can use the system anonymously, with full access to (i) standard bioinformatics programs (e.g. PSI-BLAST, ClustalW, T-Coffee, Jalview); (ii) a large number of protein sequence databases, including standard (Swiss-Prot, TrEMBL) and locally developed databases (splice variants); (iii) databases of protein motifs (Prosite, Interpro); (iv) a precomputed list of matches (‘hits’) between the sequence and motif databases. All databases are updated on a weekly basis and the hit list is kept up to date incrementally. The MyHits server also includes a new collection of tools to generate graphical representations of pairwise and multiple sequence alignments including their annotated features. Free registration enables users to upload their own sequences and motifs to private databases. These are then made available through the same web interface and the same set of analytical tools. Registered users can manage their own sequences and annotations using only web tools and freeze their data in their private database for publication purposes.

Proceedings Article•

The American National Corpus First Release.

Nancy Ide¹, Keith Suderman¹•Institutions (1)

Vassar College¹

01 May 2004

TL;DR: The First Release of the American National Corpus (ANC) was made available in mid-fall, 2003 and includes approximately 11 million words of American English, including written and spoken data and a variety of text types annotated for part of speech and lemma.

Abstract: The First Release of the American National Corpus (ANC) was made available in mid-fall, 2003. The data includes approximately 11 million words of American English, including written and spoken data and a variety of text types annotated for part of speech and lemma. The corpus is provided in XML format conformant to the XML Corpus Encoding Standard (XCES) (http://www.xml-ces.org), and is distributed in both a stand-off version (where annotation is in an XML document separate from the primary texts) and a merged version (where annotation is included in-line in the texts). The merged version includes annotation for part of speech and lemma produced by the Biber tagger; in stand-off annotation, in addition to the Biber tagging, morpho-syntactic annotations of the data are provided using the CLAWS 5 and 7 tagsets as well as several other tagsets.

Journal Article•DOI•

Ontologizing gene-expression microarray data: characterizing clusters with Gene Ontology

Peter N. Robinson¹, Andreas Wollstein¹, Ulrike Böhme¹, Brad Beattie²•Institutions (2)

Humboldt University of Berlin¹, Memorial Sloan Kettering Cancer Center²

12 Apr 2004-Bioinformatics

TL;DR: An XML-based Java application is described that provides a function-oriented overview of the results of cluster analysis of gene-expression microarray data based on Gene Ontology terms and associations.

Abstract: Summary: An XML-based Java application is described that provides a function-oriented overview of the results of cluster analysis of gene-expression microarray data based on Gene Ontology terms and associations. The application generates one HTML page with listings of the frequencies of explicit and implicit Gene Ontology annotations for each cluster, and separate, linked pages with listings of explicit annotations for each gene in a cluster. Availability: http://www.charite.de/ch/medgen/ontologizer

Proceedings Article•DOI•

MADCOW: a multimedia digital annotation system

Paolo Bottoni¹, Roberta Civica¹, Stefano Levialdi¹, Laura Orso¹, Emanuele Panizzi¹, Rosa Trinchese¹•Institutions (1)

Sapienza University of Rome¹

25 May 2004

TL;DR: A new digital annotation system is presented, organized in a client-server architecture, where the client is a plug-in for a standard web browser and the servers are repositories of annotations to which different clients can log in.

Abstract: Digital annotation of multimedia documents adds information to a document (e.g. a web page) or parts of it (a multimedia object such as an image or a video stream contained in the document). Digital annotations can be kept private or shared among different users over the internet, allowing discussions and cooperative work. We study the possibility of annotating multimedia documents with objects which are in turn of a multimedia nature. Annotations can refer to whole documents or single portions thereof, as usual, but also to multi-objects, i.e. groups of objects contained in a single document. We designed and developed a new digital annotation system organized in a client-server architecture, where the client is a plug-in for a standard web browser and the servers are repositories of annotations to which different clients can log in. Annotations can be retrieved and filtered, and one can choose different annotation servers for a document. We present a platform-independent design for such a system, and illustrate a specific implementation using Microsoft Internet Explorer on the client side and JSP/MySQL on the server side.

Proceedings Article•

The MATE/GNOME Proposals for Anaphoric Annotation, Revisited

Massimo Poesio

01 Jan 2004

TL;DR: The first experiences with the MATE scheme for anaphoric annotation are discussed, along with some lessons that have been learned, and a few modifications are suggested.

Abstract: In the five years since it was proposed, the MATE scheme for anaphoric annotation has been used in a variety of annotation projects, and the resulting corpora have been used to study both anaphora resolution and NL generation. Annotation tools inspired by the proposals have been used in some of these projects. In this paper we discuss these first experiences with the scheme and some lessons that have been learned, and we suggest a few modifications.
