
Showing papers on "Annotation" published in 2007


Journal ArticleDOI
TL;DR: The expanded DAVID Knowledgebase now integrates almost all major and well-known public bioinformatics resources centralized by the DAVID Gene Concept, a single-linkage method to agglomerate tens of millions of diverse gene/protein identifiers and annotation terms from a variety of public bioinformatics databases.
Abstract: All tools in the DAVID Bioinformatics Resources aim to provide functional interpretation of large lists of genes derived from genomic studies. The newly updated DAVID Bioinformatics Resources consists of the DAVID Knowledgebase and five integrated, web-based functional annotation tool suites: the DAVID Gene Functional Classification Tool, the DAVID Functional Annotation Tool, the DAVID Gene ID Conversion Tool, the DAVID Gene Name Viewer and the DAVID NIAID Pathogen Genome Browser. The expanded DAVID Knowledgebase now integrates almost all major and well-known public bioinformatics resources centralized by the DAVID Gene Concept, a single-linkage method to agglomerate tens of millions of diverse gene/protein identifiers and annotation terms from a variety of public bioinformatics databases. For any uploaded gene list, the DAVID Resources now provides not only the typical gene-term enrichment analysis, but also new tools and functions that allow users to condense large gene lists into gene functional groups, convert between gene/protein identifiers, visualize many-genes-to-many-terms relationships, cluster redundant and heterogeneous terms into groups, search for interesting and related genes or terms, dynamically view genes from their lists on bio-pathways and more. With DAVID (http://david.niaid.nih.gov), investigators gain more power to interpret the biological mechanisms associated with large gene lists.
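The "DAVID Gene Concept" described above is, at bottom, single-linkage clustering over shared identifier cross-references: any two identifiers connected by a cross-reference land in the same gene cluster. A minimal sketch of that general idea follows, using a union-find over a list of identifier pairs; it illustrates the technique only, not DAVID's implementation, and the pair list is merely an example.

```python
# Minimal single-linkage agglomeration of gene/protein identifiers.
# Any two identifiers linked by a cross-reference end up in one cluster.
# Illustrative only: the cross-reference pairs below are just an example.

def cluster_identifiers(xref_pairs):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in xref_pairs:
        union(a, b)

    clusters = {}
    for ident in parent:
        clusters.setdefault(find(ident), set()).add(ident)
    return list(clusters.values())

pairs = [
    ("ENTREZ:7157", "UNIPROT:P04637"),            # linked by cross-reference
    ("UNIPROT:P04637", "ENSEMBL:ENSG00000141510"),
    ("ENTREZ:672", "UNIPROT:P38398"),             # an unrelated second gene
]
print(cluster_identifiers(pairs))  # -> two clusters
```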

1,842 citations


Journal ArticleDOI
TL;DR: Through incorporation of multiple transcript and proteomic expression data sets, the Institute for Genomic Research has been able to annotate 24,799 genes (31,739 gene models), representing ∼50% of the total gene models, as expressed in the rice genome.
Abstract: In The Institute for Genomic Research Rice Genome Annotation project (http://rice.tigr.org), we have continued to update the rice genome sequence with new data and improve the quality of the annotation. In our current release of annotation (Release 4.0; January 12, 2006), we have identified 42,653 non-transposable element-related genes encoding 49,472 gene models as a result of the detection of alternative splicing. We have refined our identification methods for transposable element-related genes resulting in 13,237 genes that are related to transposable elements. Through incorporation of multiple transcript and proteomic expression data sets, we have been able to annotate 24,799 genes (31,739 gene models), representing approximately 50% of the total gene models, as expressed in the rice genome. All structural and functional annotation is viewable through our Rice Genome Browser which currently supports 59 tracks. Enhanced data access is available through web interfaces, FTP downloads and a Data Extractor tool developed in order to support discrete dataset downloads.

1,117 citations


Journal ArticleDOI
TL;DR: The Genome Browser displays a wide variety of annotations at all scales from the single nucleotide level up to a full chromosome and includes assembly data, genes and gene predictions, mRNA and EST alignments, and comparative genomics, regulation, expression and variation data.
Abstract: The University of California, Santa Cruz Genome Browser Database contains, as of September 2006, sequence and annotation data for the genomes of 13 vertebrate and 19 invertebrate species. The Genome Browser displays a wide variety of annotations at all scales from the single nucleotide level up to a full chromosome and includes assembly data, genes and gene predictions, mRNA and EST alignments, and comparative genomics, regulation, expression and variation data. The database is optimized for fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. In the past year, 22 new assemblies and several new sets of human variation annotation have been released. New features include VisiGene, a fully integrated in situ hybridization image browser; phyloGif, for drawing evolutionary tree diagrams; a redesigned Custom Track feature; an expanded SNP annotation track; and many new display options. The Genome Browser, other tools, downloadable data files and links to documentation and other information can be found at http://genome.ucsc.edu/.

1,061 citations


Proceedings ArticleDOI
29 Apr 2007
TL;DR: The incentives for annotation in Flickr, a popular web-based photo-sharing system, and ZoneTag, a cameraphone photo capture and annotation tool that uploads images to Flickr, are investigated to offer a taxonomy of motivations for annotation along two dimensions (sociality and function).
Abstract: Why do people tag? Users have mostly avoided annotating media such as photos -- both in desktop and mobile environments -- despite the many potential uses for annotations, including recall and retrieval. We investigate the incentives for annotation in Flickr, a popular web-based photo-sharing system, and ZoneTag, a cameraphone photo capture and annotation tool that uploads images to Flickr. In Flickr, annotation (as textual tags) serves both personal and social purposes, increasing incentives for tagging and resulting in a relatively high number of annotations. ZoneTag, in turn, makes it easier to tag cameraphone photos that are uploaded to Flickr by allowing annotation and suggesting relevant tags immediately after capture. A qualitative study of ZoneTag/Flickr users exposed various tagging patterns and emerging motivations for photo annotation. We offer a taxonomy of motivations for annotation in this system along two dimensions (sociality and function), and explore the various factors that people consider when tagging their photos. Our findings suggest implications for the design of digital photo organization and sharing applications, as well as other applications that incorporate user-based annotation.

912 citations


Journal ArticleDOI
TL;DR: The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes.
Abstract: The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrate and 21 invertebrate species as of September 2007. For each assembly, the GBD contains a collection of annotation data aligned to the genomic sequence. Highlights of this year's additions include a 28-species human-based vertebrate conservation annotation, an enhanced UCSC Genes set, and more human variation, MGC, and ENCODE data. The database is optimized for fast interactive performance with a set of web-based tools that may be used to view, manipulate, filter and download the annotation data. New toolset features include the Genome Graphs tool for displaying genome-wide data sets, session saving and sharing, better custom track management, expanded Genome Browser configuration options and a Genome Browser wiki site. The downloadable GBD data, the companion Genome Browser toolset and links to documentation and related information can be found at: http://genome.ucsc.edu/.

551 citations


Journal ArticleDOI
TL;DR: The DAVID Knowledgebase is designed to facilitate high throughput gene functional analysis, and not only provides quick access to a wide range of heterogeneous annotation data in a centralized location, but also enriches the level of biological information for an individual gene.
Abstract: Background: Due to the complex and distributed nature of biological research, our current biological knowledge is spread over many redundant annotation databases maintained by many independent groups. Analysts usually need to visit many of these bioinformatics databases in order to integrate comprehensive annotation information for their genes, which becomes one of the bottlenecks, particularly for the analytic task associated with a large gene list. Thus, a highly centralized and ready-to-use gene-annotation knowledgebase is in demand for high throughput gene functional analysis. Description: The DAVID Knowledgebase is built around the DAVID Gene Concept, a single-linkage method to agglomerate tens of millions of gene/protein identifiers from a variety of public genomic resources into DAVID gene clusters. The grouping of such identifiers improves the cross-reference capability, particularly across NCBI and UniProt systems, enabling more than 40 publicly available functional annotation sources to be comprehensively integrated and centralized by the DAVID gene clusters. The simple, pair-wise, text format files which make up the DAVID Knowledgebase are freely downloadable for various data analysis uses. In addition, a well organized web interface allows users to query different types of heterogeneous annotations in a high-throughput manner. Conclusion: The DAVID Knowledgebase is designed to facilitate high throughput gene functional analysis. For a given gene list, it not only provides quick access to a wide range of heterogeneous annotation data in a centralized location, but also enriches the level of biological information for an individual gene. Moreover, the entire DAVID Knowledgebase is freely downloadable or searchable at http://david.abcc.ncifcrf.gov/knowledgebase/.

500 citations


Journal ArticleDOI
TL;DR: A corpus targeted at protein, gene, and RNA relationships which serves as a resource for the development of information extraction systems and their components such as parsers and domain analyzers is introduced.
Abstract: Lately, there has been a great interest in the application of information extraction methods to the biomedical domain, in particular, to the extraction of relationships of genes, proteins, and RNA from scientific publications. The development and evaluation of such methods requires annotated domain corpora. We present BioInfer (Bio Information Extraction Resource), a new public resource providing an annotated corpus of biomedical English. We describe an annotation scheme capturing named entities and their relationships along with a dependency analysis of sentence syntax. We further present ontologies defining the types of entities and relationships annotated in the corpus. Currently, the corpus contains 1100 sentences from abstracts of biomedical research articles annotated for relationships, named entities, as well as syntactic dependencies. Supporting software is provided with the corpus. The corpus is unique in the domain in combining these annotation types for a single set of sentences, and in the level of detail of the relationship annotation. We introduce a corpus targeted at protein, gene, and RNA relationships which serves as a resource for the development of information extraction systems and their components such as parsers and domain analyzers. The corpus will be maintained and further developed with a current version being available at http://www.it.utu.fi/BioInfer.

479 citations


Proceedings ArticleDOI
Lyndon Kennedy1, Mor Naaman1, Shane Ahern1, Rahul Nair1, Tye Rattenbury1 
29 Sep 2007
TL;DR: A location-tag-vision-based approach to retrieving images of geography-related landmarks and features from the Flickr dataset is demonstrated, suggesting that community-contributed media and annotation can enhance and improve access to multimedia resources - and the understanding of the world.
Abstract: The advent of media-sharing sites like Flickr and YouTube has drastically increased the volume of community-contributed multimedia resources available on the web. These collections have a previously unimagined depth and breadth, and have generated new opportunities - and new challenges - to multimedia research. How do we analyze, understand and extract patterns from these new collections? How can we use these unstructured, unrestricted community contributions of media (and annotation) to generate "knowledge"? As a test case, we study Flickr - a popular photo-sharing website. Flickr supports photo, time and location metadata, as well as a light-weight annotation model. We extract information from this dataset using two different approaches. First, we employ a location-driven approach to generate aggregate knowledge in the form of "representative tags" for arbitrary areas in the world. Second, we use a tag-driven approach to automatically extract place and event semantics for Flickr tags, based on each tag's metadata patterns. With the patterns we extract from tags and metadata, vision algorithms can be employed with greater precision. In particular, we demonstrate a location-tag-vision-based approach to retrieving images of geography-related landmarks and features from the Flickr dataset. The results suggest that community-contributed media and annotation can enhance and improve our access to multimedia resources - and our understanding of the world.
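The location-driven approach assigns "representative tags" to arbitrary areas. One natural scoring, shown here purely as an illustration (the paper's exact formulation may differ), is TF-IDF-like: rank tags that are frequent inside an area but rare across areas. The areas and tags below are invented.

```python
# Sketch of location-driven "representative tags": score tags that are
# frequent inside an area but rare across all areas (TF-IDF flavour).
# Illustrates the general idea only, not the paper's exact method.
import math
from collections import Counter

def representative_tags(area_tags, all_areas, k=3):
    """area_tags: tags seen in the target area;
    all_areas: one tag list per area (including the target)."""
    tf = Counter(area_tags)
    n_areas = len(all_areas)
    scores = {}
    for tag, count in tf.items():
        df = sum(1 for tags in all_areas if tag in tags)  # areas with tag
        scores[tag] = count * math.log(n_areas / df)
    return sorted(scores, key=scores.get, reverse=True)[:k]

areas = [
    ["goldengate", "bridge", "fog", "sanfrancisco"],  # hypothetical area A
    ["alcatraz", "prison", "fog", "sanfrancisco"],    # hypothetical area B
    ["mission", "mural", "tacos", "sanfrancisco"],    # hypothetical area C
]
# "sanfrancisco" appears everywhere, so it scores zero; area-specific
# tags like "goldengate" rise to the top.
print(representative_tags(areas[0], areas))
```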

417 citations


Journal ArticleDOI
Tsuyoshi Tanaka1, Baltazar A. Antonio1, Shoshi Kikuchi1, Takashi Matsumoto1, Yoshiaki Nagamura1, Hisataka Numa1, Hiroaki Sakai1, Jianzhong Wu1, Takeshi Itoh1, Takeshi Itoh2, Takuji Sasaki1, Ryo Aono, Yasuyuki Fujii3, Takuya Habara, Erimi Harada, Masako Kanno, Yoshihiro Kawahara4, Hiroaki Kawashima, Hiromi Kubooka, Akihiro Matsuya, Hajime Nakaoka, Naomi Saichi, Ryoko Sanbonmatsu, Yoshiharu Sato, Yuji Shinso, Mami Suzuki, Jun-ichi Takeda, Motohiko Tanino, Fusano Todokoro, Kaori Yamaguchi, Naoyuki Yamamoto, Chisato Yamasaki, Tadashi Imanishi2, Toshihisa Okido, Masahito Tada, Kazuho Ikeo, Yoshio Tateno, Takashi Gojobori, Yao-Cheng Lin5, Fu Jin Wei5, Yue-Ie C. Hsing5, Qiang Zhao, Bin Han, Melissa Kramer6, Richard W. McCombie6, David Lonsdale7, Claire O'Donovan7, Eleanor J. Whitfield7, Rolf Apweiler7, Kanako O. Koyanagi8, Jitendra P. Khurana9, Saurabh Raghuvanshi9, Nagendra K. Singh10, Akhilesh K. Tyagi9, Georg Haberer, Masaki Fujisawa, Satomi Hosokawa, Yukiyo Ito, Hiroshi Ikawa, Michie Shibata, Mayu Yamamoto, Richard Bruskiewich11, Douglas R. Hoen12, Thomas E. Bureau12, Nobukazu Namiki13, Hajime Ohyanagi13, Yasumichi Sakai13, Satoshi Nobushima13, Katsumi Sakata13, Roberto A. Barrero14, Yutaka Sato15, Alexandre Souvorov16, Brian Smith-White16, Tatiana Tatusova16, Suyoung An17, Gynheung An17, Satoshi Oota, Galina Fuks18, Joachim Messing, Karen R. Christie19, Damien Lieberherr20, Hyeran Kim21, Andrea Zuccolo21, Rod A. Wing, Kan Nobuta22, Pamela J. Green22, Cheng Lu22, Blake C. Meyers22, Cristian Chaparro23, Benoît Piégu23, Olivier Panaud23, Manuel Echeverria23 
TL;DR: The latest version of the RAP-DB contains a variety of annotation data as follows: clone positions, structures and functions of 31,439 genes validated by cDNAs, RNA genes detected by massively parallel signature sequencing (MPSS) technology and sequence similarity, flanking sequences of mutant lines, transposable elements, etc.
Abstract: The Rice Annotation Project Database (RAP-DB) was created to provide the genome sequence assembly of the International Rice Genome Sequencing Project (IRGSP), manually curated annotation of the sequence, and other genomics information that could be useful for a comprehensive understanding of rice biology. Since the last publication of the RAP-DB, the IRGSP genome has been revised and reassembled. In addition, a large number of rice-expressed sequence tags have been released, and functional genomics resources have been produced worldwide. Thus, we have thoroughly updated our genome annotation by manual curation of all the functional descriptions of rice genes. The latest version of the RAP-DB contains a variety of annotation data as follows: clone positions, structures and functions of 31,439 genes validated by cDNAs, RNA genes detected by massively parallel signature sequencing (MPSS) technology and sequence similarity, flanking sequences of mutant lines, transposable elements, etc. Other annotation data such as Gnomon can be displayed along with those of RAP for comparison. We have also developed a new keyword search system to allow the user to access useful information. The RAP-DB is available at: http://rapdb.dna.affrc.go.jp/ and http://rapdb.lab.nig.ac.jp/.

342 citations


Journal ArticleDOI
TL;DR: The current release of ORegAnno comprises 30,145 records curated from 922 publications and describing regulatory sequences for over 3,853 genes and 465 transcription factors from 19 species.
Abstract: ORegAnno is an open-source, open-access database and literature curation system for community-based annotation of experimentally identified DNA regulatory regions, transcription factor binding sites and regulatory variants. The current release comprises 30,145 records curated from 922 publications and describing regulatory sequences for over 3,853 genes and 465 transcription factors from 19 species. A new feature called the 'publication queue' allows users to input relevant papers from the scientific literature as targets for annotation. The queue contains 4,438 gene regulation papers entered by experts and another 54,351 identified by text-mining methods. Users can enter or 'check out' papers from the queue for manual curation using a series of user-friendly annotation pages. A typical record entry consists of species, sequence type, sequence, target gene, binding factor, experimental outcome and one or more lines of experimental evidence. An evidence ontology was developed to describe and categorize these experiments. Records are cross-referenced to Ensembl or Entrez gene identifiers, PubMed and dbSNP and can be visualized in the Ensembl or UCSC genome browsers. All data are freely available through search pages, XML data dumps or web services at: http://www.oreganno.org.

262 citations


Journal ArticleDOI
TL;DR: GO resources at SGD have been modified to distinguish data sources and annotation methods so that GO annotations from independent sources can be compared to those made by SGD to help keep the literature-based GO annotations current.
Abstract: The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) collects and organizes biological information about the chromosomal features and gene products of the budding yeast Saccharomyces cerevisiae. Although published data from traditional experimental methods are the primary sources of evidence supporting Gene Ontology (GO) annotations for a gene product, high-throughput experiments and computational predictions can also provide valuable insights in the absence of an extensive body of literature. Therefore, GO annotations available at SGD now include high-throughput data as well as computational predictions provided by the GO Annotation Project (GOA UniProt; http://www.ebi.ac.uk/GOA/). Because the annotation method used to assign GO annotations varies by data source, GO resources at SGD have been modified to distinguish data sources and annotation methods. In addition to providing information for genes that have not been experimentally characterized, GO annotations from independent sources can be compared to those made by SGD to help keep the literature-based GO annotations current.

Patent
19 Dec 2007
TL;DR: In this patent, an annotation associated with a media file is indexed to a first instance of that media file; by comparing features of the two instances, a mapping is created between the first instance and a second instance of the media file.
Abstract: A system and method for transferring annotations associated with a media file. An annotation associated with a media file is indexed to a first instance of that media file. By comparing features of the two instances, a mapping is created between the first instance of the media file and a second instance of the media file. The annotation can be indexed to the second instance using the mapping between the first and second instances. The annotation can be processed (displayed, stored, or modified) based on the index to the second instance.
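The core of the method is a mapping between two instances of the same media file, built by comparing features, through which the annotation's index is transferred. The sketch below illustrates this with hypothetical per-second fingerprints and a simple best-shift alignment; a real system would derive fingerprints from the media content, and the patent does not prescribe this particular alignment.

```python
# Sketch of transferring a time-indexed annotation between two instances
# of the same media file. The "fingerprints" are hypothetical per-second
# feature values; the alignment is a naive best-shift search.

def estimate_offset(fingerprints_a, fingerprints_b):
    """Find the shift (in seconds) that best aligns instance B to A."""
    best_offset, best_matches = 0, -1
    for offset in range(-len(fingerprints_b), len(fingerprints_a)):
        matches = sum(
            1
            for i, fp in enumerate(fingerprints_a)
            if 0 <= i - offset < len(fingerprints_b)
            and fingerprints_b[i - offset] == fp
        )
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset

def remap_annotation(annotation, offset):
    """Re-index an annotation dict {'time': seconds, 'text': ...} to B."""
    return {"time": annotation["time"] - offset, "text": annotation["text"]}

a = ["x", "p", "q", "r", "s"]          # instance A fingerprints
b = ["p", "q", "r", "s"]               # instance B: A with 1 s trimmed
note = {"time": 2.0, "text": "goal scored"}   # indexed to instance A
offset = estimate_offset(a, b)                # -> 1
print(remap_annotation(note, offset))         # indexed to B at t=1.0
```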

Proceedings ArticleDOI
29 Apr 2007
TL;DR: This paper develops several innovative interaction techniques for semi-automatic photo annotation that provide a more user-friendly interface for the annotation of person names, locations, and events, and thus substantially improve annotation performance, especially for large photo albums.
Abstract: Digital photo management is becoming indispensable for explosively growing family photo albums due to the rapid popularization of digital cameras and mobile phone cameras. In an effective photo management system, photo annotation is the most challenging task. In this paper, we develop several innovative interaction techniques for semi-automatic photo annotation. Compared with traditional annotation systems, our approach provides the following new features: "cluster annotation" puts similar faces or photos with similar scenes together and enables users to label them in one operation; "contextual re-ranking" boosts labeling productivity by guessing the user's intention; "ad hoc annotation" allows users to label photos while they are browsing or searching, and improves system performance progressively through learning propagation. Our results show that these techniques provide a more user-friendly interface for the annotation of person names, locations, and events, and thus substantially improve annotation performance, especially for large photo albums.

Patent
Yong Ju Jung1, Jae-won Lee1, Ji Yeun Kim1, Kim Sang Kyun1, Han Ick Sang1 
26 Jan 2007
TL;DR: A system, method and medium for semantically indexing a plurality of photos based on a user's annotation is presented; it includes analyzing the user's annotation and extracting a shared index from it, detecting a situation change in the plurality of photos, and indexing the photos according to the situation change based on the shared index.
Abstract: A system, method and medium indexing a plurality of photos semantically based on a user's annotation. The method includes analyzing the user's annotation and extracting a shared index from the user's annotation, detecting a situation change in the plurality of photos, and indexing the plurality of photos according to the situation change based on the shared index.

Journal ArticleDOI
01 Jul 2007
TL;DR: A novel approach, information theory-based semantic similarity (ITSS), is proposed to automatically predict molecular functions of genes based on existing GO annotations; it is able to generate highly accurate predictions in sparsely annotated portions of GO, where previous algorithms have failed.
Abstract: Motivation: Despite advances in the gene annotation process, the functions of a large portion of gene products remain insufficiently characterized. In addition, the in silico prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or functional genomic approaches. To our knowledge, no prediction method has been demonstrated to be highly accurate for sparsely annotated GO terms (those associated with fewer than 10 genes). Results: We propose a novel approach, information theory-based semantic similarity (ITSS), to automatically predict molecular functions of genes based on existing GO annotations. Using a 10-fold cross-validation, we demonstrate that the ITSS algorithm obtains prediction accuracies (precision 97%, recall 77%) comparable to other machine learning algorithms when compared in similar conditions over densely annotated portions of the GO datasets. This method is able to generate highly accurate predictions in sparsely annotated portions of GO, where previous algorithms have failed. As a result, our technique generates an order of magnitude more functional predictions than previous methods. A 10-fold cross-validation demonstrated a precision of 90% at a recall of 36% for the algorithm over sparsely annotated networks of the recent GO annotations (about 1,400 GO terms and 11,000 genes in Homo sapiens). To our knowledge, this article presents the first historical rollback validation for the predicted GO annotations, which may represent more realistic conditions than more widely used cross-validation approaches. By manually assessing a random sample of 100 predictions conducted in a historical rollback evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43–58%) can be achieved for the human GO Annotation file dated 2003. Availability: The program is available on request. The 97,732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset and other supplementary information are available at http://phenos.bsd.uchicago.edu/ITSS/ Contact: Lussier@uchicago.edu Supplementary information: Supplementary data are available at Bioinformatics online.
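ITSS builds on information-theoretic semantic similarity over the GO graph. The standard building block is Resnik-style similarity: the information content (IC) of the most informative common ancestor of two terms, where a term's IC is the negative log of its annotation frequency. A toy sketch follows; the mini-ontology and counts are invented, and this is only the ingredient, not the full ITSS prediction algorithm.

```python
# Resnik-style semantic similarity between two ontology terms: the
# information content of their most informative common ancestor.
# The toy DAG and annotation counts below are invented.
import math

parents = {  # child -> parents in a toy GO-like DAG
    "binding": {"molecular_function"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
    "molecular_function": set(),
}
annotations = {  # genes annotated at or below each term
    "molecular_function": 100, "binding": 40,
    "dna_binding": 10, "rna_binding": 8,
}
TOTAL = annotations["molecular_function"]

def ancestors(term):
    out = {term}
    for p in parents[term]:
        out |= ancestors(p)
    return out

def ic(term):
    return -math.log(annotations[term] / TOTAL)

def resnik(t1, t2):
    common = ancestors(t1) & ancestors(t2)
    return max(ic(t) for t in common)

print(resnik("dna_binding", "rna_binding"))  # IC of "binding"
```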

Posted Content
TL;DR: The Linguistic Annotation Framework is intended to serve as a basis for harmonizing existing language resources as well as developing new ones within ISO TC37 SC4 WG1.
Abstract: This paper describes the Linguistic Annotation Framework under development within ISO TC37 SC4 WG1. The Linguistic Annotation Framework is intended to serve as a basis for harmonizing existing language resources as well as developing new ones.

Proceedings ArticleDOI
08 May 2007
TL;DR: P-TAG, a method which automatically generates personalized tags for Web pages, produces keywords relevant not only to the page's textual content but also to the data residing on the surfer's Desktop, thus expressing a personalized viewpoint.
Abstract: The success of the Semantic Web depends on the availability of Web pages annotated with metadata. Free-form metadata or tags, as used in social bookmarking and folksonomies, have become more and more popular and successful. Such tags are relevant keywords associated with or assigned to a piece of information (e.g., a Web page), describing the item and enabling keyword-based classification. In this paper we propose P-TAG, a method which automatically generates personalized tags for Web pages. Upon browsing a Web page, P-TAG produces keywords relevant not only to its textual content but also to the data residing on the surfer's Desktop, thus expressing a personalized viewpoint. Empirical evaluations with several algorithms pursuing this approach showed very promising results. We are therefore very confident that such a user-oriented automatic tagging approach can provide large-scale personalized metadata annotations as an important step towards realizing the Semantic Web.
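The core idea is to combine two evidence sources: keywords from the page's own text and terms found in the surfer's desktop documents. A deliberately simplified sketch of that combination follows; the scoring scheme and texts are invented, and the paper evaluates several more sophisticated algorithms.

```python
# Sketch of personalized tag generation in the P-TAG spirit: extract
# candidate keywords from a Web page and boost those also found in the
# user's desktop documents. Scoring and texts are illustrative only.
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "is", "in", "for", "on"}

def keywords(text):
    words = [w.strip(".,").lower() for w in text.split()]
    return Counter(w for w in words if w not in STOPWORDS and len(w) > 2)

def personalized_tags(page_text, desktop_texts, k=3, boost=2.0):
    page_terms = keywords(page_text)
    desktop_terms = set()
    for doc in desktop_texts:
        desktop_terms |= set(keywords(doc))
    scores = {
        term: count * (boost if term in desktop_terms else 1.0)
        for term, count in page_terms.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

page = "Semantic Web metadata and annotation for Web pages"
desktop = ["notes on annotation tools", "draft paper on metadata standards"]
print(personalized_tags(page, desktop))  # desktop-backed terms rank higher
```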

Journal ArticleDOI
TL;DR: The Munich Information Center for Protein Sequences combines automatic processing of large amounts of sequences with manual annotation of selected model genomes and the compilation of manually curated databases for protein interactions based on scrutinized information from the literature.
Abstract: The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. The task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

Proceedings ArticleDOI
Rui Li1, Shenghua Bao1, Yong Yu1, Ben Fei2, Zhong Su2 
08 May 2007
TL;DR: This paper proposes a novel algorithm, namely Effective Large Scale Annotation Browser (ELSABer), to browse large-scale social annotation data, and helps users browse huge numbers of annotations in a semantic, hierarchical and efficient way.
Abstract: This paper is concerned with the problem of browsing social annotations. Today, many services (e.g., Del.icio.us, Flickr) help users manage and share their favorite URLs and photos based on social annotations. Due to the exponential increase of social annotations, however, more and more users are facing the problem of how to effectively find desired resources in large annotation data. Existing methods such as tag clouds and annotation matching work well only on small annotation sets. Thus, an effective approach for browsing large-scale annotation sets and the associated resources is in great demand by both ordinary users and service providers. In this paper, we propose a novel algorithm, namely Effective Large Scale Annotation Browser (ELSABer), to browse large-scale social annotation data. ELSABer helps users browse huge numbers of annotations in a semantic, hierarchical and efficient way. More specifically, ELSABer has the following features: 1) the semantic relations between annotations are explored for browsing of similar resources; 2) the hierarchical relations between annotations are constructed for browsing in a top-down fashion; 3) the distribution of social annotations is studied for efficient browsing. By incorporating personal and time information, ELSABer can be further extended for personalized and time-related browsing. A prototype system is implemented and shows promising results.
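Browsing "in a top-down fashion" requires a hierarchy over annotations. One standard way to induce one from co-occurrence data, shown below purely as an illustration (it is not claimed to be ELSABer's algorithm), is subsumption: tag A becomes a parent of tag B when most resources tagged B are also tagged A, but not the reverse.

```python
# Sketch of inducing a browsable tag hierarchy from co-occurrence by
# subsumption: A subsumes B if P(A|B) is high while P(B|A) is low.
# A standard heuristic shown for illustration; thresholds are arbitrary.
from itertools import permutations

def subsumption_edges(tagged_resources, hi=0.8, lo=0.5):
    tags = {t for tags in tagged_resources for t in tags}

    def count(*ts):
        return sum(1 for r in tagged_resources if all(t in r for t in ts))

    edges = []
    for a, b in permutations(tags, 2):
        both, nb, na = count(a, b), count(b), count(a)
        if nb and na and both / nb >= hi and both / na <= lo:
            edges.append((a, b))  # a is a parent of b
    return edges

resources = [  # hypothetical tagged bookmarks
    {"programming", "python"}, {"programming", "python"},
    {"programming", "java"}, {"programming"},
]
print(subsumption_edges(resources))  # programming -> python, -> java
```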

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A content-based image annotation refinement (CIAR) algorithm is proposed to re-rank the candidate annotations of images; it leverages both corpus information and the content feature of a query image.
Abstract: Automatic image annotation has been an active research topic due to its great importance in image retrieval and management. However, results of the state-of-the-art image annotation methods are often unsatisfactory. Despite continuous efforts in inventing new annotation algorithms, it would be advantageous to develop a dedicated approach that could refine imprecise annotations. In this paper, a novel approach to automatically refining the original annotations of images is proposed. For a query image, an existing image annotation method is first employed to obtain a set of candidate annotations. Then, the candidate annotations are re-ranked and only the top ones are reserved as the final annotations. By formulating the annotation refinement process as a Markov process and defining the candidate annotations as the states of a Markov chain, a content-based image annotation refinement (CIAR) algorithm is proposed to re-rank the candidate annotations. It leverages both corpus information and the content feature of a query image. Experimental results on a typical Corel dataset show not only the validity of the refinement, but also the superiority of the proposed algorithm over existing ones.
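The refinement step treats candidate annotations as states of a Markov chain and re-ranks them by a random walk, so that candidates supported by strongly related co-candidates rise. The sketch below shows that general mechanism with an invented word-relatedness table; the paper's actual transition definition, which also incorporates the query image's content feature, differs.

```python
# Sketch of re-ranking candidate annotations with a random walk, in the
# spirit of Markov-chain annotation refinement: candidates are states,
# transitions follow word relatedness, and the stationary distribution
# gives the refined ranking. The similarity table is invented.

def rerank(candidates, sim, damping=0.85, iters=50):
    n = len(candidates)
    score = {c: 1.0 / n for c in candidates}
    for _ in range(iters):
        new = {}
        for c in candidates:
            flow = sum(
                score[d] * sim.get((d, c), 0.0)
                / max(sum(sim.get((d, e), 0.0) for e in candidates), 1e-9)
                for d in candidates if d != c
            )
            new[c] = (1 - damping) / n + damping * flow
        score = new
    return sorted(candidates, key=score.get, reverse=True)

candidates = ["sky", "water", "car"]
sim = {  # hypothetical word relatedness, e.g. from corpus co-occurrence
    ("sky", "water"): 0.9, ("water", "sky"): 0.9,
    ("sky", "car"): 0.1, ("car", "sky"): 0.1,
    ("water", "car"): 0.1, ("car", "water"): 0.1,
}
print(rerank(candidates, sim))  # mutually coherent tags rise to the top
```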

Proceedings ArticleDOI
24 Sep 2007
TL;DR: This work developed Kodak's consumer video benchmark data set, which includes a significant number of videos from actual users, a rich lexicon that accommodates consumers' needs, and the annotation of a subset of concepts over the entire video data set.
Abstract: Semantic indexing of images and videos in the consumer domain has become a very important issue for both research and actual application. In this work we developed Kodak's consumer video benchmark data set, which includes (1) a significant number of videos from actual users, (2) a rich lexicon that accommodates consumers' needs, and (3) the annotation of a subset of concepts over the entire video data set. To the best of our knowledge, this is the first systematic work in the consumer domain aimed at the definition of a large lexicon, construction of a large benchmark data set, and annotation of videos in a rigorous fashion. Such effort will have significant impact by providing a sound foundation for developing and evaluating large-scale learning-based semantic indexing/annotation techniques in the consumer domain.

Journal ArticleDOI
TL;DR: The annotation of most genomes becomes outdated over time, owing in part to our ever-improving knowledge of genomes and in part to improvements in bioinformatics software.
Abstract: The annotation of most genomes becomes outdated over time, owing in part to our ever-improving knowledge of genomes and in part to improvements in bioinformatics software. Unfortunately, annotation is rarely if ever updated and resources to support routine reannotation are scarce. Wiki software, which would allow many scientists to edit each genome's annotation, offers one possible solution.

Proceedings ArticleDOI
17 Sep 2007
TL;DR: A relational database representation is described that captures both the inter- and intra-layer dependencies, along with details of an object-oriented API for efficient, multi-tiered access to this data.
Abstract: The OntoNotes project is creating a corpus of large-scale, accurate, and integrated annotation of multiple levels of the shallow semantic structure in text. Such rich, integrated annotation covering many levels will allow for richer, cross-level models enabling significantly better automatic semantic analysis. At the same time, it demands a robust, efficient, scalable mechanism for storing and accessing these complex inter-dependent annotations. We describe a relational database representation that captures both the inter- and intra-layer dependencies and provide details of an object-oriented API for efficient, multi-tiered access to this data.
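A minimal sketch of what such a relational representation can look like: one table for the structural units, one for per-layer annotations, and a link table for inter-layer dependencies. The schema below (all table and column names) is invented for illustration; it is not OntoNotes' actual schema or API.

```python
# Minimal sketch of a relational store for multi-layer annotation, in the
# spirit of the representation described above. Schema names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sentence (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE annotation (
    id INTEGER PRIMARY KEY,
    sentence_id INTEGER REFERENCES sentence(id),
    layer TEXT,            -- e.g. 'syntax', 'propbank', 'coref'
    start_tok INTEGER, end_tok INTEGER, label TEXT);
CREATE TABLE annotation_link (   -- inter-layer dependencies
    from_id INTEGER REFERENCES annotation(id),
    to_id   INTEGER REFERENCES annotation(id),
    relation TEXT);
""")
conn.execute("INSERT INTO sentence VALUES (1, 'The cat slept.')")
conn.execute("INSERT INTO annotation VALUES (1, 1, 'syntax', 0, 1, 'NP')")
conn.execute("INSERT INTO annotation VALUES (2, 1, 'propbank', 0, 1, 'ARG0')")
conn.execute("INSERT INTO annotation_link VALUES (2, 1, 'aligned_to')")

# Multi-tiered access: fetch one sentence's annotations across all layers.
rows = conn.execute(
    "SELECT layer, label FROM annotation WHERE sentence_id = 1"
).fetchall()
print(rows)
```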

Proceedings Article
11 Oct 2007
TL;DR: An annotation methodology is described and encouraging initial results of inter-annotator agreement are reported; comparisons are made between different text sub-genres and between annotators with different skills.
Abstract: The Clinical E-Science Framework (CLEF) project is building a framework for the capture, integration and presentation of clinical information: for clinical research, evidence-based health care and genotype-meets-phenotype informatics. A significant portion of the information required by such a framework originates as text, even in EHR-savvy organizations. CLEF uses Information Extraction (IE) to make this unstructured information available. An important part of IE is the identification of semantic entities and relationships. Typical approaches require human annotated documents to provide both evaluation standards and material for system development. CLEF has a corpus of clinical narratives, histopathology reports and imaging reports from 20 thousand patients. We describe the selection of a subset of this corpus for manual annotation of clinical entities and relationships. We describe an annotation methodology and report encouraging initial results of inter-annotator agreement. Comparisons are made between different text sub-genres, and between annotators with different skills.
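For two annotators assigning one label per unit, the customary chance-corrected agreement measure is Cohen's kappa: observed agreement minus expected chance agreement, normalized. A self-contained sketch with hypothetical entity labels follows; the CLEF paper's own agreement measures and figures are in the paper itself.

```python
# Cohen's kappa, the usual chance-corrected agreement measure for two
# annotators. The entity labels below are hypothetical examples.
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[lab] * cb[lab] for lab in ca) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["Condition", "Drug", "Condition", "Locus", "Drug", "Condition"]
b = ["Condition", "Drug", "Locus",     "Locus", "Drug", "Condition"]
print(round(cohen_kappa(a, b), 3))  # -> 0.75
```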

Journal ArticleDOI
TL;DR: The proposed formal model captures both syntactic and semantic aspects of the annotations; it is built on previously existing models and may be seen as an extension of them.
Abstract: This article is a study of the themes and issues concerning the annotation of digital contents, such as textual documents, images, and multimedia documents in general. These digital contents are automatically managed by different kinds of digital library management systems and more generally by different kinds of information management systems. Even though this topic has already been partially studied by other researchers, the previous research work on annotations has left many open issues. These issues concern the lack of clarity about what an annotation is, what its features are, and how it is used. These issues are mainly due to the fact that models and systems for annotations have only been developed for specific purposes. As a result, there is only a fragmentary picture of the annotation and its management, and this is tied to specific contexts of use and lacks general validity. The aim of the article is to provide a unified and integrated picture of the annotation, ranging from defining what an annotation is to providing a formal model. The key ideas of the model are: the distinction between the meaning and the sign of the annotation, which represent the semantics and the materialization of an annotation, respectively; the clear formalization of the temporal dimension involved with annotations; and the introduction of a distributed hypertext between digital contents and annotations. Therefore, the proposed formal model captures both syntactic and semantic aspects of the annotations. Furthermore, it is built on previously existing models and may be seen as an extension of them.
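The model's key distinctions translate naturally into data structures: an annotation pairs a sign (its materialization) with a meaning (its semantics), carries a timestamp for the temporal dimension, and anchors into a hypertext of contents and other annotations. The dataclass sketch below is an informal reading of those ideas; the field names are illustrative, and the article defines the model formally.

```python
# Informal sketch of the model's key distinctions as data structures.
# Field names are illustrative, not the article's formal definitions.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Sign:              # the materialization, e.g. text or an image
    media_type: str
    payload: str

@dataclass
class Meaning:           # the semantics, e.g. a comment or a critique
    category: str
    statement: str

@dataclass
class Annotation:
    sign: Sign
    meaning: Meaning
    created: datetime    # the temporal dimension
    anchors: list = field(default_factory=list)  # hypertext links to
                                                 # contents or annotations

note = Annotation(
    Sign("text/plain", "dubious claim, cf. section 3"),
    Meaning("critique", "The cited statistic lacks a source."),
    datetime(2007, 5, 1),
    anchors=["doc:report.pdf#page=4"],
)
print(note.meaning.category, "on", note.anchors[0])
```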

Patent
Mor Naaman1, Marc Davis1, Shane Ahern1, Simon P. King1, Rahul Nair1, Jeannie Hui-I Yang1 
08 Feb 2007
TL;DR: In this paper, the authors present a set of annotation suggestions that are most likely to be relevant to the particular user and/or media context. But they do not specify whether the existing annotations were created or selected by the user, a member of the user's social network, or members of the general public.
Abstract: Disclosed are apparatus and methods for facilitating annotation of media objects by a user. Mechanisms present a user with an easily usable set of annotation suggestions that are most likely to be relevant to the particular user and/or media context. In general, existing annotations are analyzed to determine a set of suggested annotations. Annotation suggestions for a particular user are based on an analysis of the relevance, to the particular user, of existing annotations of one or more media objects so that the most likely relevant annotations are presented as suggested annotations. In particular embodiments, this analysis depends on whether the existing annotations were created and/or selected by the particular user, a member of the particular user's social network, or members of the general public.

Journal ArticleDOI
TL;DR: A Web-based tool for creating and sharing annotations is described and the effect on learning of its use with college students is investigated and it is concluded that there is value in further study of collaborative learning through shared annotation.
Abstract: Web-based learning has become an important way to enhance learning and teaching, offering many learning opportunities. A limitation of current Web-based learning is the restricted ability of students to personalize and annotate the learning materials. Providing personalized tools and analyzing some types of learning behavior, such as students' annotation, has attracted attention as a means to enhance Web-based learning. We describe a Web-based tool for creating and sharing annotations and investigate the effect on learning of its use with college students. First, an annotation tool was designed and implemented for the research. Second, learning support mechanisms, including full and group annotation sharing, were developed to promote students' motivation for annotation. Lastly, experiments with individual and shared annotation were conducted and the results show that the influence of annotation on learning performance becomes stronger with the use of sharing mechanisms. We conclude that there is value in further study of collaborative learning through shared annotation.

Journal ArticleDOI
01 Dec 2007
TL;DR: This paper presents a gesture annotation scheme for the specific purpose of automatically generating and animating character-specific hand/arm gestures, but with potential general value, and focuses on how to capture temporal structure and locational information with relatively little annotation effort.
Abstract: The empirical investigation of human gesture stands at the center of multiple research disciplines, and various gesture annotation schemes exist, with varying degrees of precision and required annotation effort. We present a gesture annotation scheme for the specific purpose of automatically generating and animating character-specific hand/arm gestures, but with potential general value. We focus on how to capture temporal structure and locational information with relatively little annotation effort. The scheme is evaluated in terms of how accurately it captures the original gestures by re-creating those gestures on an animated character using the annotated data. This paper presents our scheme in detail and compares it to other approaches.

Proceedings Article
01 Jun 2007
TL;DR: This paper addresses the issue of whether a corpus annotated by means of AL can be re-used to train classifiers different from the ones employed by AL, supplying alternative feature sets as well.
Abstract: We consider the impact Active Learning (AL) has on effective and efficient text corpus annotation, and report on reduction rates for annotation efforts of up to 72%. We also address the issue of whether a corpus annotated by means of AL - using a particular classifier and a particular feature set - can be re-used to train classifiers different from the ones employed by AL, supplying alternative feature sets as well. Finally, we report on our experience with the AL paradigm under real-world conditions, i.e., the annotation of large-scale document corpora for the life sciences.
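The AL loop behind such effort reductions is pool-based uncertainty sampling: train on the labels collected so far, then ask the annotator only for the pool example the current classifier is least sure about. A dependency-free toy sketch follows; the one-dimensional centroid classifier, data, and budget are invented, while the paper uses real classifiers and corpora.

```python
# Sketch of pool-based Active Learning with uncertainty sampling: train
# on a small seed set, then repeatedly query the "annotator" for the pool
# example the classifier is least certain about. Toy 1-D setup.

def train(labeled):
    """Per-class mean of 1-D features (a toy centroid classifier)."""
    means = {}
    for x, y in labeled:
        means.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in means.items()}

def uncertainty(model, x):
    """Small margin between the two nearest centroids = uncertain."""
    dists = sorted(abs(x - m) for m in model.values())
    return -(dists[1] - dists[0])  # higher value = more uncertain

def oracle(x):            # stands in for the human annotator
    return "pos" if x >= 5 else "neg"

labeled = [(1.0, "neg"), (9.0, "pos")]            # seed annotations
pool = [2.0, 4.8, 5.2, 8.0, 4.9, 7.5]             # unlabeled examples
for _ in range(3):                                # annotation budget: 3
    model = train(labeled)
    pick = max(pool, key=lambda x: uncertainty(model, x))
    pool.remove(pick)
    labeled.append((pick, oracle(pick)))          # query the annotator
print("queried:", [x for x, _ in labeled[2:]])    # boundary cases first
```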

Journal ArticleDOI
TL;DR: The MIRIAM Resources project allows easy access to MIRIAM URIs and the associated information, and is therefore crucial to fostering general use of MIRIAM annotations in computational models of biological processes.
Abstract: Background: The Minimal Information Requested In the Annotation of biochemical Models (MIRIAM) is a set of guidelines for the annotation and curation processes of computational models, in order to facilitate their exchange and reuse. An important part of the standard consists in the controlled annotation of model components, based on Uniform Resource Identifiers. In order to enable interoperability of this annotation, the community has to agree on a set of standard URIs, corresponding to recognised data types. MIRIAM Resources are being developed to support the use of those URIs.