
Showing papers on "Annotation" published in 2008


Journal ArticleDOI
TL;DR: The Blast2GO framework is used to carry out a detailed analysis of annotation behaviour through homology transfer and its impact on functional genomics research, offering biologists useful information to take into account when functionally characterizing their sequence data.
Abstract: Functional genomics technologies have been widely adopted in the biological research of both model and non-model species. An efficient functional annotation of DNA or protein sequences is a major requirement for the successful application of these approaches as functional information on gene products is often the key to the interpretation of experimental results. Therefore, there is an increasing need for bioinformatics resources which are able to cope with large amounts of sequence data, produce valuable annotation results and are easily accessible to laboratories where functional genomics projects are being undertaken. We present the Blast2GO suite as an integrated and biologist-oriented solution for the high-throughput and automatic functional annotation of DNA or protein sequences based on the Gene Ontology vocabulary. The most outstanding Blast2GO features are: (i) the combination of various annotation strategies and tools controlling type and intensity of annotation, (ii) the numerous graphical features such as the interactive GO-graph visualization for gene-set function profiling or descriptive charts, (iii) the general sequence management features and (iv) high-throughput capabilities. We used the Blast2GO framework to carry out a detailed analysis of annotation behaviour through homology transfer and its impact on functional genomics research. Our aim is to offer biologists useful information to take into account when addressing the task of functionally characterizing their sequence data.

3,306 citations
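To make the homology-transfer idea above concrete, here is a minimal Python sketch of transferring GO terms from BLAST hits to a query; the data structures, thresholds and example accessions are illustrative assumptions, not Blast2GO's actual algorithm.

```python
# Minimal sketch of GO transfer by homology: collect GO terms from BLAST hits
# that pass similarity and e-value cutoffs and transfer them to the query.
# The data structures and thresholds here are illustrative, not Blast2GO's.

from dataclasses import dataclass

@dataclass
class BlastHit:
    subject_id: str
    percent_identity: float   # alignment identity (0-100)
    evalue: float
    go_terms: frozenset       # GO terms annotated on the subject sequence

def transfer_go_terms(hits, min_identity=55.0, max_evalue=1e-6, min_support=1):
    """Return GO terms supported by at least `min_support` qualifying hits."""
    support = {}
    for hit in hits:
        if hit.percent_identity < min_identity or hit.evalue > max_evalue:
            continue
        for term in hit.go_terms:
            support[term] = support.get(term, 0) + 1
    return {term for term, n in support.items() if n >= min_support}

hits = [
    BlastHit("sp|P12345", 78.2, 1e-40, frozenset({"GO:0005515", "GO:0006468"})),
    BlastHit("sp|Q67890", 42.0, 1e-3,  frozenset({"GO:0016301"})),  # filtered out
]
print(transfer_go_terms(hits))   # -> the two GO terms from the strong hit
```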


Journal ArticleDOI
TL;DR: The Blast2GO suite is described as a comprehensive bioinformatics tool for functional annotation of sequences and data mining on the resulting annotations, primarily based on the gene ontology (GO) vocabulary.
Abstract: Functional annotation of novel sequence data is a primary requirement for the utilization of functional genomics approaches in plant research. In this paper, we describe the Blast2GO suite as a comprehensive bioinformatics tool for functional annotation of sequences and data mining on the resulting annotations, primarily based on the gene ontology (GO) vocabulary. Blast2GO optimizes function transfer from homologous sequences through an elaborate algorithm that considers similarity, the extension of the homology, the database of choice, the GO hierarchy, and the quality of the original annotations. The tool includes numerous functions for the visualization, management, and statistical analysis of annotation results, including gene set enrichment analysis. The application supports InterPro, enzyme codes, KEGG pathways, GO direct acyclic graphs (DAGs), and GOSlim. Blast2GO is a suitable tool for plant genomics research because of its versatility, easy installation, and friendly use.

1,889 citations
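The abstract above says the transfer rule weighs similarity, homology extension and the quality of the source annotations. The sketch below shows one hedged way such a weighted rule could look: the evidence-code weights and threshold are invented for illustration and are not Blast2GO's actual annotation rule.

```python
# Simplified annotation-rule sketch: score each candidate GO term by the best
# (similarity x evidence-quality) product over the hits that carry it, and keep
# terms above a threshold. The weights and the evidence-code table are illustrative.

EVIDENCE_WEIGHT = {"EXP": 1.0, "IDA": 1.0, "ISS": 0.8, "IEA": 0.7}  # hypothetical weights

def score_terms(hits, threshold=55.0):
    """hits: iterable of (percent_identity, go_term, evidence_code) tuples."""
    best = {}
    for identity, term, evidence in hits:
        score = identity * EVIDENCE_WEIGHT.get(evidence, 0.5)
        best[term] = max(best.get(term, 0.0), score)
    return {term: s for term, s in best.items() if s >= threshold}

hits = [(82.0, "GO:0003824", "IEA"), (60.0, "GO:0005634", "ISS"), (40.0, "GO:0005975", "EXP")]
print(score_terms(hits))  # only GO:0003824 clears the threshold
```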


Journal ArticleDOI
TL;DR: This work highlights the importance of SOPs for genome annotation, notes the lack of a central repository to store and disseminate annotation procedures and protocols, and endorses an online repository of SOPs.
Abstract: The methodologies used to generate genome and metagenome annotations are diverse and vary between groups and laboratories. Descriptions of the annotation process are helpful in interpreting genome annotation data. Some groups have produced Standard Operating Procedures (SOPs) that describe the annotation process, but standards are lacking for structure and content of these descriptions. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse an online repository of SOPs.

585 citations


Book ChapterDOI
12 Oct 2008
TL;DR: This work introduces a new baseline technique for image annotation that treats annotation as a retrieval problem and outperforms the current state-of-the-art methods on two standard and one large Web dataset.
Abstract: Automatically assigning keywords to images is of great interest as it allows one to index, retrieve, and understand large collections of image data. Many techniques have been proposed for image annotation in the last decade that give reasonable performance on standard datasets. However, most of these works fail to compare their methods with simple baseline techniques to justify the need for complex models and subsequent training. In this work, we introduce a new baseline technique for image annotation that treats annotation as a retrieval problem. The proposed technique utilizes low-level image features and a simple combination of basic distances to find nearest neighbors of a given image. The keywords are then assigned using a greedy label transfer mechanism. The proposed baseline outperforms the current state-of-the-art methods on two standard and one large Web dataset. We believe that such a baseline measure will provide a strong platform to compare and better understand future annotation techniques.

483 citations
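The retrieval-style baseline described above combines simple per-feature distances, finds nearest neighbours, and transfers their keywords. A rough Python paraphrase of that idea follows; the features, weighting scheme and neighbourhood size are placeholders, not the authors' exact configuration.

```python
import numpy as np
from collections import Counter

# Retrieval-style annotation sketch: average simple per-feature distances,
# take the K nearest training images, and transfer their most frequent keywords.
# Features, weights and K are placeholders, not the paper's exact procedure.

def combined_distance(query_feats, train_feats):
    """query_feats: dict name -> 1D array; train_feats: dict name -> 2D array (n_train, dim)."""
    dists = []
    for name, q in query_feats.items():
        diff = train_feats[name] - q            # (n_train, dim)
        d = np.linalg.norm(diff, axis=1)
        dists.append(d / (d.max() + 1e-12))     # scale each feature's distances to [0, 1]
    return np.mean(dists, axis=0)               # equal contribution from each feature

def annotate(query_feats, train_feats, train_keywords, k=5, n_keywords=5):
    order = np.argsort(combined_distance(query_feats, train_feats))[:k]
    counts = Counter()
    for rank, idx in enumerate(order):
        for kw in train_keywords[idx]:
            counts[kw] += k - rank               # closer neighbours weigh more
    return [kw for kw, _ in counts.most_common(n_keywords)]

rng = np.random.default_rng(0)
train_feats = {"color": rng.random((20, 8)), "texture": rng.random((20, 16))}
train_keywords = [["sky", "sea"] if i % 2 else ["grass", "tree"] for i in range(20)]
query = {"color": rng.random(8), "texture": rng.random(16)}
print(annotate(query, train_feats, train_keywords))
```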


Journal ArticleDOI
TL;DR: A corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts, which is also a good resource for the linguistic analysis of scientific and clinical texts.
Abstract: Detecting uncertain and negative assertions is essential in most BioMedical Text Mining tasks where, in general, the aim is to derive factual knowledge from textual data. This article reports on a corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts (we call this corpus the BioScope corpus). The corpus consists of three parts, namely medical free texts, biological full papers and biological scientific abstracts. The dataset contains annotations at the token level for negative and speculative keywords and at the sentence level for their linguistic scope. The annotation process was carried out by two independent linguist annotators and a chief linguist – also responsible for setting up the annotation guidelines – who resolved cases where the annotators disagreed. The resulting corpus consists of more than 20,000 sentences that were considered for annotation and over 10% of them actually contain one (or more) linguistic annotation suggesting negation or uncertainty. Statistics are reported on corpus size, ambiguity levels and the consistency of annotations. The corpus is accessible for academic purposes and is free of charge. Apart from the intended goal of serving as a common resource for the training, testing and comparing of biomedical Natural Language Processing systems, the corpus is also a good resource for the linguistic analysis of scientific and clinical texts.

410 citations
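The annotation layers described above (token-level cues, sentence-level scopes) can be made concrete with a small data-structure sketch; the field names and example sentence are hypothetical and do not reproduce the corpus's actual XML schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Minimal, hypothetical representation of BioScope-style annotations: a cue is a
# keyword span tagged as negation or speculation, and each cue carries a
# linguistic scope given as a token range within the sentence.

@dataclass
class Cue:
    kind: str                  # "negation" or "speculation"
    tokens: Tuple[int, int]    # [start, end) token offsets of the cue keyword
    scope: Tuple[int, int]     # [start, end) token offsets of its linguistic scope

@dataclass
class Sentence:
    tokens: List[str]
    cues: List[Cue] = field(default_factory=list)

    def in_scope(self, token_index: int) -> List[str]:
        """Return the kinds of cues whose scope covers the given token."""
        return [c.kind for c in self.cues
                if c.scope[0] <= token_index < c.scope[1]]

sent = Sentence(
    tokens="The drug did not inhibit cell growth in these assays .".split(),
    cues=[Cue(kind="negation", tokens=(3, 4), scope=(3, 8))],
)
print(sent.in_scope(5))   # ['negation'] -> "cell" lies inside the negation scope
```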


Journal ArticleDOI
TL;DR: A new type of semantic annotation, event annotation, has been completed as an addition to the existing annotations in the GENIA corpus and is expected to become a valuable resource for NLP (Natural Language Processing)-based TM in the bio-medical domain.
Abstract: Advanced Text Mining (TM) such as semantic enrichment of papers, event or relation extraction, and intelligent Question Answering have increasingly attracted attention in the bio-medical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, due to the complexity of the task, semantic annotation has never been tried on a large scale, apart from relatively simple term annotation. We have completed a new type of semantic annotation, event annotation, which is an addition to the existing annotations in the GENIA corpus. The corpus has already been annotated with POS (Parts of Speech), syntactic trees, terms, etc. The new annotation was made on half of the GENIA corpus, consisting of 1,000 Medline abstracts. It contains 9,372 sentences in which 36,114 events are identified. The major challenges during event annotation were (1) to design a scheme of annotation which meets specific requirements of text annotation, (2) to achieve biology-oriented annotation which reflects biologists' interpretation of text, and (3) to ensure the homogeneity of annotation quality across annotators. To meet these challenges, we introduced new concepts such as Single-facet Annotation and Semantic Typing, which have collectively contributed to the successful completion of a large-scale annotation. The resulting event-annotated corpus is the largest and one of the best in quality among similar annotation efforts. We expect it to become a valuable resource for NLP (Natural Language Processing)-based TM in the bio-medical domain.

401 citations
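As an illustration of what an event annotation record might contain (a semantic type, a trigger span, and typed participants pointing at terms or at other events), here is a hypothetical sketch; it mirrors the general idea of event annotation, not the GENIA corpus's actual encoding.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical sketch of an event annotation: each event has a semantic type,
# a trigger expression in the text, and typed participants that point either to
# annotated terms or to other events.

@dataclass
class Term:
    term_id: str
    span: Tuple[int, int]        # character offsets in the abstract
    semantic_type: str           # e.g. "Protein"

@dataclass
class Event:
    event_id: str
    event_type: str              # e.g. "Positive_regulation"
    trigger_span: Tuple[int, int]
    participants: Dict[str, str] = field(default_factory=dict)  # role -> term/event id

terms = {"T1": Term("T1", (0, 5), "Protein"), "T2": Term("T2", (16, 21), "Protein")}
events = [
    Event("E1", "Gene_expression", (22, 32), {"Theme": "T2"}),
    Event("E2", "Positive_regulation", (6, 15), {"Cause": "T1", "Theme": "E1"}),
]
for e in events:
    print(e.event_id, e.event_type, e.participants)
```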


Journal ArticleDOI
TL;DR: Based on the comprehensive characterization of tomato fruit metabolites, it is demonstrated that metabolite annotation facilitates the systematic analysis of unknown metabolites and biological interpretation of their relationships, which provide a basis for integrating metabolite information into the system-level study of plant biology.
Abstract: A large number of metabolites are found in each plant, most of which have not yet been identified. Development of a methodology is required to deal systematically with unknown metabolites, and to elucidate their biological roles in an integrated 'omics' framework. Here we report the development of a 'metabolite annotation' procedure. The metabolite annotation is a process by which structures and functions are inferred for metabolites. Tomato (Solanum lycopersicum cv. Micro-Tom) was used as a model for this study using LC-FTICR-MS. Collected mass spectral features, together with predicted molecular formulae and putative structures, were provided as metabolite annotations for 869 metabolites. Comparison with public databases suggests that 494 metabolites are novel. A grading system was introduced to describe the evidence supporting the annotations. Based on the comprehensive characterization of tomato fruit metabolites, we identified chemical building blocks that are frequently found in tomato fruit tissues, and predicted novel metabolic pathways for flavonoids and glycoalkaloids. These results demonstrate that metabolite annotation facilitates the systematic analysis of unknown metabolites and biological interpretation of their relationships, which provide a basis for integrating metabolite information into the system-level study of plant biology.

280 citations


Proceedings ArticleDOI
19 Jun 2008
TL;DR: A corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts and is called the BioScope corpus, which consists of medical free texts, biological full papers and biological scientific abstracts.
Abstract: This article reports on a corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts (we call this corpus the BioScope corpus). The corpus consists of three parts, namely medical free texts, biological full papers and biological scientific abstracts. The dataset contains annotations at the token level for negative and speculative keywords and at the sentence level for their linguistic scope. The annotation process was carried out by two independent linguist annotators and a chief annotator -- also responsible for setting up the annotation guidelines -- who resolved cases where the annotators disagreed. We will report our statistics on corpus size, ambiguity levels and the consistency of annotations.

191 citations


PatentDOI
TL;DR: A system and/or method is proposed that facilitates generating a point of interest related to a map: an interface component can collect a portion of annotation data from two or more users, wherein the annotation data is associated with a digital map and includes at least one of a map location and a user-specific description of the map location.
Abstract: The claimed subject matter provides a system and/or a method that facilitates generating a point of interest related to a map. An interface component can collect a portion of annotation data from two or more users, wherein the portion of annotation data is associated with a digital map and includes at least one of a map location and a user specific description of the map location. An annotation aggregator can evaluate annotation data corresponding to the map location on the digital map. The annotation aggregator can create a point of interest (POI) for the map location based upon the evaluation and populates the digital map with at least one of an identified location extracted from two or more users or a universal description extracted from two or more users.

189 citations
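A hedged sketch of the aggregation idea in the patent abstract above: group user annotations that fall near the same coordinates and summarize each group as a point of interest. The clustering radius and the "most common description wins" rule are assumptions for illustration, not the claimed method.

```python
from collections import Counter
import math

# Sketch of turning per-user map annotations into points of interest: group
# annotations within a small radius and summarize each group with its centroid
# and most common description.

def _close(a, b, radius_km=0.2):
    # rough planar distance in km, adequate for small radii
    dx = (a[0] - b[0]) * 111.0
    dy = (a[1] - b[1]) * 111.0 * math.cos(math.radians(a[0]))
    return math.hypot(dx, dy) <= radius_km

def aggregate_pois(annotations, min_users=2):
    """annotations: list of (user_id, (lat, lon), description)."""
    groups = []
    for user, loc, desc in annotations:
        for g in groups:
            if _close(g["center"], loc):
                g["members"].append((user, loc, desc))
                break
        else:
            groups.append({"center": loc, "members": [(user, loc, desc)]})
    pois = []
    for g in groups:
        users = {u for u, _, _ in g["members"]}
        if len(users) < min_users:
            continue
        lats = [l[0] for _, l, _ in g["members"]]
        lons = [l[1] for _, l, _ in g["members"]]
        label = Counter(d for _, _, d in g["members"]).most_common(1)[0][0]
        pois.append({"lat": sum(lats) / len(lats), "lon": sum(lons) / len(lons), "label": label})
    return pois

ann = [("u1", (47.6205, -122.3493), "Space Needle"),
       ("u2", (47.6207, -122.3490), "Space Needle"),
       ("u3", (47.6097, -122.3331), "Pike Place")]
print(aggregate_pois(ann))   # one POI; the single-user annotation is dropped
```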


Proceedings ArticleDOI
30 Mar 2008
TL;DR: This paper describes the collaborative annotation system used to annotate the High Level Features (HLF) in the development set of TRECVID 2007 and shows that Active Learning allows simultaneously getting the most useful information from the partial annotation and significantly reducing the annotation effort per participant relative to previous collaborative annotations.
Abstract: Concept indexing in multimedia libraries is very useful for users searching and browsing but it is a very challenging research problem as well. Beyond the systems' implementation issues, semantic indexing is strongly dependent upon the size and quality of the training examples. In this paper, we describe the collaborative annotation system used to annotate the High Level Features (HLF) in the development set of TRECVID 2007. This system is web-based and takes advantage of an Active Learning approach. We show that Active Learning allows simultaneously getting the most useful information from the partial annotation and significantly reducing the annotation effort per participant relative to previous collaborative annotations.

187 citations
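The core loop behind such a system is uncertainty-driven selection of what to annotate next. Below is a generic uncertainty-sampling sketch using scikit-learn on synthetic data; it is not the TRECVID collaborative system itself, and the pool size, seed size and batch size are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Generic active-learning sketch (uncertainty sampling): repeatedly train on the
# labelled pool, pick the shots the classifier is least sure about, and send
# only those to the annotators.

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
true_w = rng.normal(size=20)
y = (X @ true_w > 0).astype(int)            # stands in for human HLF judgements

labelled = list(range(20))                   # small seed annotation
unlabelled = [i for i in range(len(X)) if i not in labelled]

for round_ in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X[labelled], y[labelled])
    probs = clf.predict_proba(X[unlabelled])[:, 1]
    uncertainty = -np.abs(probs - 0.5)       # closest to 0.5 = least certain
    query = [unlabelled[i] for i in np.argsort(uncertainty)[-10:]]  # 10 most uncertain shots
    labelled.extend(query)                   # "annotate" the queried shots
    unlabelled = [i for i in unlabelled if i not in set(query)]
    acc = clf.score(X[unlabelled], y[unlabelled])
    print(f"round {round_}: labelled={len(labelled)} accuracy on pool={acc:.3f}")
```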


Patent
21 Mar 2008
TL;DR: In this article, a digital work may be annotated using an eBook reader device and an invariant location reference identifier corresponding to the specified portion of the digital work can then be added to the annotation.
Abstract: A digital work may be annotated using an eBook reader device (300). Upon receiving (706) an annotation relating to a specific portion of the digital work, an invariant location reference identifier (500) corresponding to the specified portion of the digital work may be appended (710) to the annotation. The annotation may then be stored (712) in association with the digital work for later reference. In some instances, an annotation may be presented (618) on an eBook reader device upon receipt (612) of a valid authorization credential granting access to the annotation.

Journal ArticleDOI
TL;DR: WGAViewer is a suite of JAVA software tools that provides a user-friendly interface to automatically annotate, visualize, and interpret the set of P-values emerging from a WGA study, and can be used to highlight possible functional mechanisms in an automatic manner.
Abstract: To meet the immediate need for a framework of post-whole genome association (WGA) annotation, we have developed WGAViewer, a suite of JAVA software tools that provides a user-friendly interface to automatically annotate, visualize, and interpret the set of P-values emerging from a WGA study. Most valuably, it can be used to highlight possible functional mechanisms in an automatic manner, for example, by directly or indirectly implicating a polymorphism with an apparent link to gene expression, and help to generate hypotheses concerning the possible biological bases of observed associations. The easily interpretable diagrams can then be used to identify the associations that seem most likely to be biologically relevant, and to select genomic regions that may need to be resequenced in a search for candidate causal variants. In this report, we used our recently completed study on host control of HIV-1 viral load during the asymptomatic set point period as an illustration for the heuristic annotation of this software and its contributive role in a successful WGA project.
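To illustrate the kind of post-association annotation step described above (relating top-ranked P-values to genomic features), here is a small sketch that maps significant SNPs onto gene intervals; the gene table, coordinates and threshold are made up and this is not WGAViewer's implementation.

```python
# Sketch of a post-association annotation step: take SNPs below a P-value
# threshold and report which annotated gene interval (if any) each one falls in.

genes = [  # (chromosome, start, end, symbol) -- illustrative coordinates only
    ("chr6", 100_000, 150_000, "GENE_A"),
    ("chr6", 400_000, 470_000, "GENE_B"),
]

snps = [  # (rs_id, chromosome, position, p_value) -- illustrative values only
    ("rs0001", "chr6", 120_500, 3.2e-9),
    ("rs0002", "chr6", 300_000, 4.0e-8),
    ("rs0003", "chr2", 50_000, 0.42),
]

def annotate_hits(snps, genes, p_threshold=5e-8):
    hits = []
    for rs_id, chrom, pos, p in snps:
        if p > p_threshold:
            continue
        overlapping = [sym for c, start, end, sym in genes
                       if c == chrom and start <= pos <= end]
        hits.append((rs_id, p, overlapping or ["intergenic"]))
    return hits

for rs_id, p, where in annotate_hits(snps, genes):
    print(f"{rs_id}\tP={p:.1e}\t{','.join(where)}")
```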

Journal ArticleDOI
TL;DR: Three image annotation approaches are reviewed: free text annotation, keyword annotation and annotation based on ontologies; the creation of keyword vocabularies for use in automated image annotation evaluation is also discussed.
Abstract: In order to evaluate automated image annotation and object recognition algorithms, ground truth in the form of a set of images correctly annotated with text describing each image is required. In this paper, three image annotation approaches are reviewed: free text annotation, keyword annotation and annotation based on ontologies. The practical aspects of image annotation are then considered. We discuss the creation of keyword vocabularies for use in automated image annotation evaluation. As direct manual annotation of images requires much time and effort, we also review various methods to make the creation of ground truth more efficient. An overview of annotated image datasets for computer vision research is provided.

Proceedings ArticleDOI
07 Jul 2008
TL;DR: This work shows how semantic metadata about social networks and family relationships can be used to improve semantic annotation suggestion and indicates that utilizing relationships among people while searching can provide at least 28% higher recall and 55% higher precision than keyword search while still being up to 12 times faster.
Abstract: The number of personal multimedia objects, such as digital photographs and videos, are exploding on the web through popular sites such as Flickr, YouTube, and FaceBook hosting billions of user-created items. Semantic annotation can be an extremely effective way to search, browse, and organize media objects, but can require extensive human involvement. In this work, we show how semantic metadata about social networks and family relationships can be used to improve semantic annotation suggestion. This includes up to 82% recall for people annotations as well as recall improvements of 20-26% in tag annotation recall when no annotation history is available. In addition, utilizing relationships among people while searching can provide at least 28% higher recall and 55% higher precision than keyword search while still being up to 12 times faster. Methods are evaluated on real personal photo collections containing up to 120k photos from Flickr as well as 41k annotated photos from our prototype system.

Journal ArticleDOI
TL;DR: This manuscript describes the creation of a comprehensive gene wiki, seeded with data from public domain sources, which will enable and encourage community annotation of gene function.
Abstract: This manuscript describes the creation of a comprehensive gene wiki, seeded with data from public domain sources, which will enable and encourage community annotation of gene function.

Proceedings Article
01 Jan 2008
TL;DR: The motivation for following the Paninian framework as the annotation scheme is provided and it is argued that the Paninian framework is better suited to model the various linguistic phenomena manifest in Indian languages.
Abstract: The paper introduces a dependency annotation effort which aims to fully annotate a million word Hindi corpus. It is the first attempt of its kind to develop a large scale tree-bank for an Indian language. In this paper we provide the motivation for following the Paninian framework as the annotation scheme and argue that the Paninian framework is better suited to model the various linguistic phenomena manifest in Indian languages. We present the basic annotation scheme. We also show how the scheme handles some phenomenon such as complex verbs, ellipses, etc. Empirical results of some experiments done on the currently annotated sentences are also reported.

Journal ArticleDOI
TL;DR: The issues involved in this task are discussed, and the results strongly suggest that automatic annotation along most of the dimensions is highly feasible and that this new framework for scientific sentence categorization is applicable in practice.
Abstract: Motivation: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no ‘average biologist’ client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD) database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective, if the system can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks. Results: The annotation scheme was applied to a large corpus in a controlled effort by eight independent annotators, where three individual annotators independently tagged each sentence. We then trained and tested machine learning classifiers to automatically categorize sentence fragments based on the annotation. We discuss here the issues involved in this task, and present an overview of the results. The latter strongly suggest that automatic annotation along most of the dimensions is highly feasible, and that this new framework for scientific sentence categorization is applicable in practice. Contact: shatkay@cs.queensu.ca
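The classification step described above can be sketched with a standard text-classification pipeline; the toy fragments, the single "evidence vs. other" dimension and the TF-IDF/logistic-regression choice are placeholders for the paper's multi-dimensional scheme and real training data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Sketch of automating one annotation dimension: train a text classifier on
# sentence fragments labelled by annotators and apply it to new sentences.

fragments = [
    "Western blot analysis showed a two-fold increase in expression",
    "Cells were lysed and proteins separated by SDS-PAGE",
    "These findings suggest a possible role in apoptosis",
    "It remains unclear whether the interaction is direct",
]
labels = ["evidence", "evidence", "other", "other"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(fragments, labels)

new = ["Immunoprecipitation confirmed the binding in vitro",
       "The mechanism may involve an unknown cofactor"]
print(list(zip(new, clf.predict(new))))
```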

Journal ArticleDOI
TL;DR: MetaMap generates precise results at the expense of recall, while the statistical method obtains better recall at a lower precision rate; dictionary look-up already provides competitive results, indicating that the use of disease terminology is highly standardized throughout the terminologies and the literature.
Abstract: Background: In recent years, the recognition of semantic types from the biomedical scientific literature has been focused on named entities like protein and gene names (PGNs) and gene ontology terms (GO terms). Other semantic types like diseases have not received the same level of attention. Different solutions have been proposed to identify disease named entities in the scientific literature. While matching the terminology with language patterns suffers from low recall (e.g., Whatizit), other solutions make use of morpho-syntactic features to better cover the full scope of terminological variability (e.g., MetaMap). Currently, MetaMap, which is provided by the National Library of Medicine (NLM), is the state-of-the-art solution for the annotation of concepts from UMLS (Unified Medical Language System) in the literature. Nonetheless, its performance has not yet been assessed on an annotated corpus. In addition, little effort has been invested so far to generate an annotated dataset that links disease entities in text to disease entries in a database, thesaurus or ontology and that could serve as a gold standard to benchmark text mining solutions.
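The dictionary look-up approach mentioned in the TL;DR can be sketched directly: normalize a terminology, compile it into a pattern, and link each match back to a concept identifier. The three-entry terminology below is made up for illustration; a real system would load a disease vocabulary such as one derived from UMLS.

```python
import re

# Sketch of dictionary look-up for disease mentions: build a regular expression
# from a small terminology and map each match back to its concept identifier.

terminology = {  # concept id -> synonyms (illustrative entries)
    "D003924": ["type 2 diabetes", "diabetes mellitus type 2"],
    "D001943": ["breast cancer", "breast carcinoma"],
    "D010300": ["parkinson's disease", "parkinson disease"],
}

surface_to_id = {syn.lower(): cid for cid, syns in terminology.items() for syn in syns}
pattern = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, surface_to_id), key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def find_diseases(text):
    return [(m.group(0), surface_to_id[m.group(0).lower()], m.span())
            for m in pattern.finditer(text)]

text = "Patients with Type 2 diabetes and Parkinson's disease were excluded."
for mention, concept_id, span in find_diseases(text):
    print(f"{mention!r} -> {concept_id} at {span}")
```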

Proceedings Article
01 Nov 2008
TL;DR: The creators of signed language corpora should prioritize annotation above transcription and ensure that signs are identified using unique gloss-based annotations; without this, the whole rationale for corpus creation is undermined.
Abstract: The essential characteristic of a signed language corpus is that it has been annotated, and not, contrary to the practice of many signed language researchers, that it has been transcribed. Annotations are necessary for corpus-based investigations of signed or spoken languages. Multi-media annotation software can now be used to transform a recording into a machine-readable text without it first being necessary to transcribe the text, provided that linguistic units are uniquely identified and annotations subsequently appended to these units. These unique identifiers are here referred to as ID-glosses. The use of ID-glosses is only possible if a reference lexical database (i.e., dictionary) exists as the result of prior foundation research into the lexicon. In short, the creators of signed language corpora should prioritize annotation above transcription, and ensure that signs are identified using unique gloss-based annotations. Without this the whole rationale for corpus-creation is undermined.

Journal ArticleDOI
TL;DR: This work discusses how the gene prediction research field increasingly shifts from methods that typically exploited one or two types of data to more integrative approaches that simultaneously deal with various experimental, statistical, or other in silico evidence.
Abstract: In this era of whole genome sequencing, reliable genome annotations (identification of functional regions) are the cornerstones for many subsequent analyses. Not only is careful annotation important for studying the gene and gene family content of a genome and its host, but also for wide-scale transcriptome and proteome analyses attempting to describe a certain biological process or to get a global picture of a cell's behavior. Although the number of sequenced genomes is increasing thanks to the application of new technologies, genome-wide analyses will critically depend on the quality of the genome annotations. However, the annotation process is more complicated in the plant field than in the animal field because of the limited funding that leads to much fewer experimental data and less annotation expertise. This situation calls for highly automated annotation platforms that can make the best use of all available data, experimental or not. We discuss how the gene prediction (the process of predicting protein gene structures in genomic sequences) research field increasingly shifts from methods that typically exploited one or two types of data to more integrative approaches that simultaneously deal with various experimental, statistical, or other in silico evidence. We illustrate the importance of integrative approaches for producing high-quality automatic annotations of genomes of plants and algae as well as of fungi that live in close association with plants using the platform EuGene as an example.

Proceedings ArticleDOI
28 May 2008
TL;DR: The proposed framework is grounded in existing literature, interviews with experienced coders, and ongoing discussions with researchers in multiple disciplines, and directly addresses the workflow and needs of both researchers and video coders.
Abstract: Digital tools for annotation of video have the promise to provide immense value to researchers in disciplines ranging from psychology to ethnography to computer science. With traditional methods for annotation being cumbersome, time-consuming, and frustrating, technological solutions are situated to aid in video annotation by increasing reliability, repeatability, and workflow optimizations. Three notable limitations of existing video annotation tools are lack of support for the annotation workflow, poor representation of data on a timeline, and poor interaction techniques with video, data, and annotations. This paper details a set of design requirements intended to enhance video annotation. Our framework is grounded in existing literature, interviews with experienced coders, and ongoing discussions with researchers in multiple disciplines. Our model is demonstrated in a new system called VCode and VData. The benefit of our system is that it directly addresses the workflow and needs of both researchers and video coders.

Patent
23 Jan 2008
TL;DR: The authors present an annotation tool whose user interface allows a user to enter and view annotations associated with content such as a video and to associate each annotation with a particular time segment, such that when that time segment is played in a video player, its associated annotation is presented.
Abstract: Aspects of the subject matter described herein relate to annotating and sharing content. In aspects, an annotation tool presents a user interface that allows a user to enter and view annotations associated with content such as a video. The annotation tool allows the user to associate each annotation with a particular time segment of the video such that when that time segment is played in a video player, its associated annotation is presented. The annotation tool also presents a user interface that allows the user to share the video as annotated with other users as desired. Other users receiving the annotated video may further annotate the video and share it with others.

Journal ArticleDOI
TL;DR: ConFunc significantly outperforms BLAST and PSI-BLAST, obtaining levels of recall and precision that are not obtained by either method and a maximum precision 24% greater than that of BLAST.
Abstract: Motivation: The success of genome sequencing has resulted in many protein sequences without functional annotation. We present ConFunc, an automated Gene Ontology (GO)-based protein function prediction approach, which uses conserved residues to generate sequence profiles to infer function. ConFunc splits sets of sequences identified by PSI-BLAST into sub-alignments according to their GO annotations. Conserved residues are identified for each GO term sub-alignment, for which a position specific scoring matrix is generated. This combination of steps produces a set of feature (GO annotation) derived profiles from which protein function is predicted. Results: We assess the ability of ConFunc, BLAST and PSI-BLAST to predict protein function in the twilight zone of sequence similarity. ConFunc significantly outperforms BLAST & PSI-BLAST, obtaining levels of recall and precision that are not obtained by either method and a maximum precision 24% greater than that of BLAST. Further, for a large test set of sequences with homologues of low sequence identity, at high levels of precision, ConFunc obtains recall six times greater than BLAST. These results demonstrate the potential for ConFunc to form part of an automated genomics annotation pipeline. Availability: http://www.sbg.bio.ic.ac.uk/confunc Contact: m.sternberg@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
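A simplified sketch of the idea of GO-specific sub-alignments follows: group query-aligned homologues by GO term, find positions conserved within each group, and score the query against them. This only illustrates the concept; ConFunc itself builds PSI-BLAST-derived position-specific scoring matrices rather than the majority-residue profile used here, and the sequences are toy examples.

```python
from collections import Counter, defaultdict

def conserved_positions(sequences, min_fraction=0.8):
    """Return {position: residue} for columns where one residue dominates."""
    profile = {}
    for pos in range(len(sequences[0])):
        residue, count = Counter(seq[pos] for seq in sequences).most_common(1)[0]
        if residue != "-" and count / len(sequences) >= min_fraction:
            profile[pos] = residue
    return profile

def predict_go(query, hits, min_score=0.7):
    """hits: list of (aligned_sequence, set_of_go_terms); all aligned to the query."""
    by_term = defaultdict(list)
    for seq, terms in hits:
        for term in terms:
            by_term[term].append(seq)
    predictions = {}
    for term, seqs in by_term.items():
        profile = conserved_positions(seqs)
        if not profile:
            continue
        score = sum(query[p] == r for p, r in profile.items()) / len(profile)
        if score >= min_score:
            predictions[term] = round(score, 2)
    return predictions

query = "MKTAYIAKQR"
hits = [("MKTAYLAKQR", {"GO:0016301"}),
        ("MKSAYIAKQR", {"GO:0016301"}),
        ("MQTWYIACQR", {"GO:0005515"}),
        ("AQTWYIACQK", {"GO:0005515"})]
print(predict_go(query, hits))   # only GO:0016301 is predicted for this toy query
```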

01 Jan 2008
TL;DR: The first version of Phrase Detectives, to the authors' knowledge the first game designed for collaborative linguistic annotation on the Web, is presented, applying this game-based methodology to linguistic annotation tasks like anaphoric annotation.
Abstract: Annotated corpora of the size needed for modern computational linguistics research cannot be created by small groups of hand annotators. One solution is to exploit collaborative work on the Web and one way to do this is through games like the ESP game. Applying this methodology however requires developing methods for teaching subjects the rules of the game and evaluating their contribution while maintaining the game entertainment. In addition, applying this method to linguistic annotation tasks like anaphoric annotation requires developing methods for presenting text and identifying the components of the text that need to be annotated. In this paper we present the first version of Phrase Detectives (http://www.phrasedetectives.org), to our knowledge the first game designed for collaborative linguistic annotation on the Web.

Journal ArticleDOI
TL;DR: annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently, and is ideally suited for non-model species EST-sequencing projects.
Abstract: The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets poses a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways. annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and also in a relational postgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non-model species EST-sequencing projects.
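The downstream annotation-retrieval step described above can be sketched as: parse tabular BLAST output, keep the best hit per query, and look up that subject's GO/EC/KEGG annotations in a reference table. The in-memory dictionary and the four-column file layout are assumptions standing in for annot8r's UniProt-derived postgreSQL database and real BLAST output.

```python
import csv
from io import StringIO

reference = {  # UniProt accession -> annotations (illustrative entries)
    "Q9XYZ1": {"GO": ["GO:0004672"], "EC": ["2.7.11.1"], "KEGG": ["ko04010"]},
    "P54321": {"GO": ["GO:0005975"], "EC": ["3.2.1.4"],  "KEGG": ["ko00500"]},
}

blast_tabular = StringIO(           # query, subject, %identity, e-value
    "EST_001\tQ9XYZ1\t71.4\t1e-45\n"
    "EST_001\tP54321\t33.0\t2e-04\n"
    "EST_002\tP54321\t64.2\t3e-30\n"
)

def best_hits(handle, max_evalue=1e-10):
    """Keep the lowest-e-value hit per query that passes the e-value cutoff."""
    best = {}
    for query, subject, identity, evalue in csv.reader(handle, delimiter="\t"):
        evalue = float(evalue)
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best

for query, (subject, evalue) in best_hits(blast_tabular).items():
    print(query, subject, reference.get(subject, {}))
```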

Journal ArticleDOI
TL;DR: This paper develops a rule-based approach to formulate explicit annotations for images fully automatically, so that by the use of this method, a semantic query such as "sunset by the sea in autumn in New York" can be answered and indexed purely by machine.
Abstract: As the number of Web images is increasing at a rapid rate, searching them semantically presents a significant challenge. Many raw images are constantly uploaded with little meaningful direct annotations of semantic content, limiting their search and discovery. In this paper, we present a semantic annotation technique based on the use of image parametric dimensions and metadata. Using decision trees and rule induction, we develop a rule-based approach to formulate explicit annotations for images fully automatically, so that by the use of our method, semantic query such as "sunset by the sea in autumn in New York" can be answered and indexed purely by machine. Our system is evaluated quantitatively using more than 100,000 Web images. Experimental results indicate that this approach is able to deliver highly competent performance, attaining good recall and precision rates of sometimes over 80%. This approach enables a new degree of semantic richness to be automatically associated with images which previously can only be performed manually.
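To illustrate rule induction over image parameters and metadata, the sketch below trains a small decision tree on capture-time features and prints the learned rules. The features (hour of capture, exposure time, a GPS-derived coast flag) and the scene labels are invented for illustration and are not the paper's actual rule base.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

#               hour  exposure_s  near_coast   (hypothetical metadata features)
X = [
    [19, 1/60,  1],   # evening, by the sea
    [20, 1/30,  1],
    [12, 1/500, 0],   # midday, inland
    [13, 1/640, 0],
    [18, 1/50,  1],
    [11, 1/800, 1],
]
y = ["sunset by the sea", "sunset by the sea", "daytime inland",
     "daytime inland", "sunset by the sea", "daytime coast"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["hour", "exposure_s", "near_coast"]))
print(tree.predict([[19, 1/40, 1]]))   # -> ['sunset by the sea']
```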

Journal ArticleDOI
TL;DR: It is concluded how text mining techniques could be tightly integrated into the manual annotation process through novel authoring systems to scale-up high-quality manual curation.
Abstract: The biomedical literature can be seen as a large integrated, but unstructured data repository. Extracting facts from literature and making them accessible is approached from two directions: manual curation efforts develop ontologies and vocabularies to annotate gene products based on statements in papers. Text mining aims to automatically identify entities and their relationships in text using information retrieval and natural language processing techniques. Manual curation is highly accurate but time consuming, and does not scale with the ever increasing growth of literature. Text mining as a high-throughput computational technique scales well, but is error-prone due to the complexity of natural language. How can both be married to combine scalability and accuracy? Here, we review the state-of-the-art text mining approaches that are relevant to annotation and discuss available online services analysing biomedical literature by means of text mining techniques, which could also be utilised by annotation projects. We then examine how far text mining has already been utilised in existing annotation projects and conclude how these techniques could be tightly integrated into the manual annotation process through novel authoring systems to scale-up high-quality manual curation.

Patent
28 Jun 2008
TL;DR: In this article, a method of annotating a digital clip and setting a duration over which the annotation applies is disclosed, which provides a graphical user interface (GUI) with a display area for displaying the digital clip.
Abstract: A method of annotating a digital clip and setting a duration over which the annotation applies is disclosed. The method provides a graphical user interface (GUI) with a display area for displaying the digital clip. The GUI provides controls for entering notes, including graphical notes, on the clip. The GUI also provides controls for setting the duration for which the annotation applies.

Journal ArticleDOI
01 Aug 2008 - Genomics
TL;DR: The UCSC Genome Bioinformatics website provides a quick and easy-to-use visual display of genomic data, placing annotation tracks beneath genome coordinate positions, allowing rapid visual correlation of different types of information.

Proceedings ArticleDOI
16 Jun 2008
TL;DR: This paper introduces the main features of the UAM CorpusTool, software for human and semi-automatic annotation of text and images, and shows how to set up an annotation project, how to annotate text files at multiple annotation levels, and how to perform cross-layer searches of the corpus.
Abstract: This paper introduces the main features of the UAM CorpusTool, software for human and semi-automatic annotation of text and images. The demonstration will show how to set up an annotation project, how to annotate text files at multiple annotation levels, how to automatically assign tags to segments matching lexical patterns, and how to perform cross-layer searches of the corpus.
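The "assign tags to segments matching lexical patterns" feature mentioned above can be sketched as a table of pattern-to-tag rules applied over the text; the rules and tag names are made up for illustration and do not reflect UAM CorpusTool's own rule format.

```python
import re

# Sketch of pattern-based tag assignment: each rule pairs a regular expression
# with a tag, and matching spans become tagged segments.

rules = [
    (re.compile(r"\b(?:must|should|ought to)\b", re.IGNORECASE), "modality-obligation"),
    (re.compile(r"\b(?:may|might|could)\b", re.IGNORECASE),      "modality-possibility"),
    (re.compile(r"\b\d+(?:\.\d+)?\s?%"),                          "quantity-percentage"),
]

def auto_tag(text):
    segments = []
    for pattern, tag in rules:
        for m in pattern.finditer(text):
            segments.append({"start": m.start(), "end": m.end(),
                             "text": m.group(0), "tag": tag})
    return sorted(segments, key=lambda s: s["start"])

doc = "Participants must complete the task; results may vary by up to 15 %."
for seg in auto_tag(doc):
    print(seg)
```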