Topic
Annotation
About: Annotation is a research topic. Over the lifetime, 6719 publications have been published within this topic receiving 203463 citations. The topic is also known as: note & markup.
Papers published on a yearly basis
Papers
More filters
•
AT&T1
TL;DR: In this paper, a method for collaborative sketch annotation of a program of multimedia content is proposed, which enables a first user to create a sketch annotation, enabling the second user to access the sketch annotation.
Abstract: A method for collaborative sketch annotating of a program of multimedia content includes enabling a first user to create a sketch annotation, enabling the first user to store sketch annotation data related to the sketch annotation, and enabling a second user to access the sketch annotation. The second user may navigate the program using the sketch annotation and/or an indication of the sketch annotation. The first user may create the sketch annotation while viewing the program, for example, and the program may be paused for adding the sketch annotation to one or more paused frames. The sketch annotations may include chronological information indicative of a chronological location of the sketch annotation within the program.
37 citations
01 Jan 2012
TL;DR: The NTU-MC compilation taps on the linguistic diversity of multilingual texts available within Singapore to provide valuable information on linguistic diversity for traditional linguistic research as well as natural language processing tasks.
Abstract: The NTU-MC compilation taps on the linguistic diversity of multilingual texts available within Singapore. The current version of NTU-MC contains 375,000 words (15,000 sentences) in 6 languages (English, Chinese, Japanese, Korean, Indonesian and Vietnamese) from 6 language families (Indo-European, Sino-Tibetan, Japonic, Korean as a language isolate, Austronesian and Austro-Asiatic). The NTU-MC is annotated with a layer of monolingual annotation (POS tags) and cross-lingual annotation (sentence-level alignments). The diverse language data and cross-lingual annotations provide valuable information on linguistic diversity for traditional linguistic research as well as natural language processing tasks. This paper describes the corpus compilation process with the evaluation of the monolingual and cross-lingual annotations of the corpus data. The corpus is available under the Creative Commons - Attribute 3.0 Unported license (CC by).
37 citations
21 Jul 2006
TL;DR: The Prague Dependency Treebank 2.0 contains a large amount of Czech texts with complex and interlinked morphological, syntactic and complex semantic annotation, adapted for the current Computational Linguistics research needs.
Abstract: The Prague Dependency Treebank 2.0 (PDT 2.0) contains a large amount of Czech texts with complex and interlinked morphological (two million words), syntactic (1.5 MW) and complex semantic annotation (0.8 MW); in addition, certain properties of sentence information structure and coreference relations are annotated at the semantic level.
PDT 2.0 is based on the long-standing Praguian linguistic tradition, adapted for the current Computational Linguistics research needs. The corpus itself uses the latest annotation technology. Software tools for corpus search, annotation and language analysis are included. Extensive documentation (in English) is provided as well.
36 citations
••
01 Jan 2001TL;DR: The tag sets and the annotation procedures that are currently being developed and tested are discussed, focusing on how some typical spoken language phenomena are dealt with in the CGN corpus.
Abstract: Of the ten million words of contemporary standard Dutch in the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN), a selection of one million words of natural spoken language will be annotated syntactically. In the present paper we discuss the tag sets and the annotation procedures that are currently being developed and tested. The annotation tags provide information about syntactic constituents and about the semantic relations (dependencies) between these constituents. The annotation graphs allow crossing branches, which makes it possible to represent dependencies independently of surface word order. Moreover, constituents can carry multiple dependency roles, a feature that is exploited in the annotation of non-local dependencies and ellipsis. The annotation process is carried out semi-automatically, using an interactive annotation environment developed within the NEGRA project, a syntactically annotated corpus of German newspaper texts. We illustrate the approach with some real life examples from the CGN corpus, focusing on how some typical spoken language phenomena are dealt with.
36 citations
••
TL;DR: GenoSurf is implemented, a multi-ontology semantic search system providing access to a consolidated collection of metadata attributes found in the most relevant genomic datasets; values of 10 attributes are semantically enriched by making use of the most suited available ontologies.
Abstract: Many valuable resources developed by world-wide research institutions and consortia describe genomic datasets that are both open and available for secondary research, but their metadata search interfaces are heterogeneous, not interoperable and sometimes with very limited capabilities. We implemented GenoSurf, a multi-ontology semantic search system providing access to a consolidated collection of metadata attributes found in the most relevant genomic datasets; values of 10 attributes are semantically enriched by making use of the most suited available ontologies. The user of GenoSurf provides as input the search terms, sets the desired level of ontological enrichment and obtains as output the identity of matching data files at the various sources. Search is facilitated by drop-down lists of matching values; aggregate counts describing resulting files are updated in real time while the search terms are progressively added. In addition to the consolidated attributes, users can perform keyword-based searches on the original (raw) metadata, which are also imported; GenoSurf supports the interplay of attribute-based and keyword-based search through well-defined interfaces. Currently, GenoSurf integrates about 40 million metadata of several major valuable data sources, including three providers of clinical and experimental data (TCGA, ENCODE and Roadmap Epigenomics) and two sources of annotation data (GENCODE and RefSeq); it can be used as a standalone resource for targeting the genomic datasets at their original sources (identified with their accession IDs and URLs), or as part of an integrated query answering system for performing complex queries over genomic regions and metadata.
36 citations