Topic
Annotation
About: Annotation is a research topic. Over its lifetime, 6719 publications have been published within this topic, receiving 203463 citations. The topic is also known as: note & markup.
Papers
29 Jun 2005 TL;DR: Extensions to a corpus annotation scheme for the manual annotation of attributions, as well as opinions, emotions, sentiments, speculations, evaluations and other private states in language are described.
Abstract: This paper describes extensions to a corpus annotation scheme for the manual annotation of attributions, as well as opinions, emotions, sentiments, speculations, evaluations and other private states in language. It discusses the scheme with respect to the "Pie in the Sky" Check List of Desirable Semantic Information for Annotation. We believe that the scheme is a good foundation for adding private state annotations to other layers of semantic meaning.
95 citations
01 Sep 2009 TL;DR: A novel and efficient approach, named domain adaptive semantic diffusion (DASD), to exploit semantic context while considering the domain-shift-of-context for large scale video concept annotation, which provides a means to handle domain change between training and test data.
Abstract: Learning to cope with domain change is known to be a challenging problem in many real-world applications. This paper proposes a novel and efficient approach, named domain adaptive semantic diffusion (DASD), to exploit semantic context while considering the domain-shift-of-context for large scale video concept annotation. Starting with a large set of concept detectors, the proposed DASD refines the initial annotation results using a graph diffusion technique, which preserves the consistency and smoothness of the annotation over a semantic graph. Unlike existing graph learning methods, which capture relations among data samples, the semantic graph treats concepts as nodes and the concept affinities as the weights of edges. In particular, the DASD approach is capable of simultaneously improving the annotation results and adapting the concept affinities to new test data. The adaptation provides a means to handle domain change between training and test data, which occurs very often in video annotation tasks. We conduct extensive experiments to improve annotation results of 374 concepts over 340 hours of videos from TRECVID 2005-2007 data sets. Results show consistent and significant performance gains over various baselines. In addition, the proposed approach is very efficient, completing DASD over 374 concepts within just 2 milliseconds for each video shot on a regular PC.
94 citations
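The core idea in the DASD abstract above, refining detector scores by diffusing them over a graph whose nodes are concepts and whose edge weights are concept affinities, can be illustrated with a small sketch. This is a generic neighbor-averaging diffusion, not the paper's algorithm: it does not adapt the affinities to test data, and the function name `diffuse_scores`, the `step` and `iterations` parameters, and the example concepts are all assumptions made for illustration.

```python
def diffuse_scores(scores, affinity, step=0.25, iterations=2):
    """Smooth one shot's concept scores over a concept graph.

    scores:   list of initial detector scores, one per concept.
    affinity: symmetric matrix (list of lists) of non-negative concept
              affinities with a zero diagonal (hypothetical input format).
    """
    n = len(scores)
    degree = [sum(row) for row in affinity]
    current = list(scores)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if degree[i] == 0:
                # An isolated concept has no neighbors to borrow evidence from.
                updated.append(current[i])
                continue
            # Affinity-weighted average of the neighboring concepts' scores.
            neighbor_avg = sum(
                affinity[i][j] * current[j] for j in range(n)
            ) / degree[i]
            # Move part of the way toward the neighborhood average.
            updated.append((1 - step) * current[i] + step * neighbor_avg)
        current = updated
    return current


# Hypothetical concepts: "road" and "car" are related, "ocean" is isolated.
initial = [0.9, 0.2, 0.1]
concept_affinity = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
refined = diffuse_scores(initial, concept_affinity)
```

In this toy setting the weak "car" score is pulled up by the strong "road" score, while "ocean" is left unchanged because it has no affinity to any other concept.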
TL;DR: An ontology-based approach to automatic annotation of learning objects’ (LOs) content units that is tested in TANGRAM, an integrated learning environment for the domain of Intelligent Information Systems and provides a solution for automatic metadata generation for LOs components.
Abstract: This paper presents an ontology-based approach to automatic annotation of learning objects’ (LOs) content units that we tested in TANGRAM, an integrated learning environment for the domain of Intelligent Information Systems. The approach does not primarily focus on automatic annotation of entire LOs, as other relevant solutions do. Instead, it provides a solution for automatic metadata generation for LOs’ components (i.e., smaller, potentially reusable, content units). Here we mainly report on the content-mining algorithms and heuristics applied for determining values of certain metadata elements used to annotate content units. Specifically, the focus is on the following elements: title, description, unique identifier, subject (based on a domain ontology), and pedagogical role (based on an ontology of pedagogical roles). Additionally, as TANGRAM is grounded on an LO content structure ontology that drives the process of an LO decomposition into its constituent content units, each thus generated content unit is implicitly semantically annotated with its role/position in the LO’s structure. Employing such semantic annotations, TANGRAM allows assembling content units into new LOs personalized to the users’ goals, preferences, and learning styles. In order to provide the evaluation of the proposed solution, we describe our experiences with automatic annotation of slide presentations, one of the most common LO types.
94 citations
01 May 2004 TL;DR: The First Release of the American National Corpus (ANC) was made available in mid-fall, 2003 and includes approximately 11 million words of American English, including written and spoken data and a variety of text types annotated for part of speech and lemma.
Abstract: The First Release of the American National Corpus (ANC) was made available in mid-fall, 2003. The data includes approximately 11 million words of American English, including written and spoken data and a variety of text types annotated for part of speech and lemma. The corpus is provided in XML format conformant to the XML Corpus Encoding Standard (XCES) (http://www.xml-ces.org), and is distributed in both a stand-off version (where annotation is in an XML document separate from the primary texts) and a merged version (where annotation is included in-line in the texts). The merged version includes annotation for part of speech and lemma produced by the Biber tagger; in stand-off annotation, in addition to the Biber tagging, morpho-syntactic annotations of the data are provided using the CLAWS 5 and 7 tagsets as well as several other tagsets.
94 citations
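The distinction drawn in the ANC abstract above between stand-off annotation (kept separate from the primary text) and merged annotation (included in-line) can be made concrete with a short sketch. The element name `tok`, the attribute names, the character-offset tuples, and the tag labels are hypothetical choices for illustration; they are not the XCES schema, the Biber tagger output, or the CLAWS tagsets.

```python
from xml.sax.saxutils import escape, quoteattr


def merge_standoff(text, annotations):
    """Build an in-line (merged) version from stand-off annotations.

    text:        the untouched primary text.
    annotations: iterable of (start, end, pos, lemma) tuples, where start
                 and end are character offsets into text and the spans
                 do not overlap (hypothetical stand-off format).
    """
    pieces = []
    cursor = 0
    for start, end, pos, lemma in sorted(annotations):
        # Copy any unannotated text between the previous span and this one.
        pieces.append(escape(text[cursor:start]))
        token = escape(text[start:end])
        pieces.append(
            f"<tok pos={quoteattr(pos)} lemma={quoteattr(lemma)}>{token}</tok>"
        )
        cursor = end
    # Copy whatever follows the last annotated span.
    pieces.append(escape(text[cursor:]))
    return "".join(pieces)


primary_text = "Dogs bark"
standoff = [(0, 4, "NNS", "dog"), (5, 9, "VBP", "bark")]
print(merge_standoff(primary_text, standoff))
# prints: <tok pos="NNS" lemma="dog">Dogs</tok> <tok pos="VBP" lemma="bark">bark</tok>
```

Keeping the annotations as offsets leaves the primary text unmodified, so several independent annotation layers can point into the same text; the merged form trades that flexibility for a single self-contained document.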
15 Aug 2005 TL;DR: A novel method for automatic annotation of images with keywords from a generic vocabulary of concepts or objects for the purpose of content-based image retrieval, with results presented on two image collections, COREL and key-frames from TRECVID.
Abstract: This paper introduces a novel method for automatic annotation of images with keywords from a generic vocabulary of concepts or objects for the purpose of content-based image retrieval. An image, represented as a sequence of feature-vectors characterizing low-level visual features such as color, texture or oriented-edges, is modeled as having been stochastically generated by a hidden Markov model, whose states represent concepts. The parameters of the model are estimated from a set of manually annotated (training) images. Each image in a large test collection is then automatically annotated with the a posteriori probability of concepts present in it. This annotation supports content-based search of the image-collection via keywords. Various aspects of model parameterization, parameter estimation, and image annotation are discussed. Empirical retrieval results are presented on two image collections: COREL and key-frames from TRECVID. Comparisons are made with two other recently developed techniques on the same datasets.
94 citations
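The annotation step described in the hidden Markov model abstract above, scoring each concept by its a posteriori probability given the image's feature sequence, can be sketched with the standard forward-backward algorithm. This is a simplified illustration under stated assumptions, not the paper's model: feature vectors are assumed to be quantized into discrete symbols, every observed symbol is assumed to have non-zero probability under at least one state, the parameters are made up rather than estimated from training images, and the names `concept_posteriors` and `image_concept_scores` are hypothetical.

```python
def concept_posteriors(observations, start, trans, emit):
    """Per-position posterior probability of each concept (HMM state).

    observations: list of discrete symbol indices (assumed quantized features).
    start[s]:     probability of starting in state s.
    trans[r][s]:  probability of moving from state r to state s.
    emit[s][o]:   probability that state s emits symbol o.
    """
    n = len(start)
    # Forward pass, normalized at every step to avoid numerical underflow.
    alphas = []
    prev = None
    for symbol in observations:
        if prev is None:
            cur = [start[s] * emit[s][symbol] for s in range(n)]
        else:
            cur = [
                emit[s][symbol] * sum(prev[r] * trans[r][s] for r in range(n))
                for s in range(n)
            ]
        total = sum(cur)
        cur = [v / total for v in cur]
        alphas.append(cur)
        prev = cur
    # Backward pass, also normalized per step.
    betas = [[1.0] * n for _ in observations]
    for t in range(len(observations) - 2, -1, -1):
        nxt = observations[t + 1]
        raw = [
            sum(trans[s][r] * emit[r][nxt] * betas[t + 1][r] for r in range(n))
            for s in range(n)
        ]
        total = sum(raw)
        betas[t] = [v / total for v in raw]
    # Combine both passes; renormalizing removes the per-step scale factors.
    posteriors = []
    for a, b in zip(alphas, betas):
        raw = [x * y for x, y in zip(a, b)]
        total = sum(raw)
        posteriors.append([v / total for v in raw])
    return posteriors


def image_concept_scores(posteriors, concepts):
    """Average the per-position posteriors into one score per concept."""
    count = len(posteriors)
    return {
        name: sum(row[i] for row in posteriors) / count
        for i, name in enumerate(concepts)
    }


# Toy model: two concepts, two symbols (0 = "bluish", 1 = "greenish").
concepts = ["sky", "grass"]
start = [0.5, 0.5]
trans = [[0.8, 0.2], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.1, 0.9]]
scores = image_concept_scores(
    concept_posteriors([0, 0, 1], start, trans, emit), concepts
)
```

Averaging the per-position posteriors is only one possible way to turn them into an image-level keyword score; it is used here because it keeps the scores in the range 0 to 1 and makes them sum to 1 across concepts.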