scispace - formally typeset
Topic

Annotation

About: Annotation is a research topic. Over its lifetime, 6,719 publications have been published within this topic, receiving 203,463 citations. The topic is also known as: note & markup.


Papers
Patent
Brian Amento, Larry Stead, Mukesh Nathan
30 Sep 2008
TL;DR: This patent proposes a method for collaborative sketch annotation of a program of multimedia content: a first user creates a sketch annotation, and a second user can then access it.
Abstract: A method for collaborative sketch annotating of a program of multimedia content includes enabling a first user to create a sketch annotation, enabling the first user to store sketch annotation data related to the sketch annotation, and enabling a second user to access the sketch annotation. The second user may navigate the program using the sketch annotation and/or an indication of the sketch annotation. The first user may create the sketch annotation while viewing the program, for example, and the program may be paused for adding the sketch annotation to one or more paused frames. The sketch annotations may include chronological information indicative of a chronological location of the sketch annotation within the program.

37 citations
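The abstract describes sketch annotations that carry a chronological location within the program, which a second user can use to navigate. A minimal Python sketch of that idea follows. It assumes positions are stored as seconds from the start of the program; the names `SketchAnnotation`, `AnnotatedProgram`, `position_s`, `author`, `strokes` and `next_after` are hypothetical and do not come from the patent.

```python
from __future__ import annotations

from bisect import bisect_right, insort
from dataclasses import dataclass, field


@dataclass(order=True)
class SketchAnnotation:
    # Chronological location within the program, in seconds (assumed unit).
    position_s: float
    # Hypothetical metadata, excluded from ordering comparisons.
    author: str = field(compare=False)
    strokes: list = field(default_factory=list, compare=False)


class AnnotatedProgram:
    """Keeps sketch annotations sorted by chronological location."""

    def __init__(self) -> None:
        self._annotations: list[SketchAnnotation] = []

    def add(self, annotation: SketchAnnotation) -> None:
        # order=True makes annotations compare by position_s, so insort keeps the list sorted.
        insort(self._annotations, annotation)

    def next_after(self, current_s: float) -> SketchAnnotation | None:
        """Return the first annotation strictly after current_s, or None if there is none."""
        positions = [a.position_s for a in self._annotations]
        index = bisect_right(positions, current_s)
        return self._annotations[index] if index < len(self._annotations) else None
```

A second user's player could call `next_after` with the current playback position to jump to the next annotated frame; a real system would also need to persist and share the annotation data, which this sketch leaves out.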

01 Jan 2012
TL;DR: The NTU-MC compilation taps into the linguistic diversity of multilingual texts available in Singapore, providing valuable information on linguistic diversity for traditional linguistic research as well as for natural language processing tasks.
Abstract: The NTU-MC compilation taps into the linguistic diversity of multilingual texts available within Singapore. The current version of NTU-MC contains 375,000 words (15,000 sentences) in 6 languages (English, Chinese, Japanese, Korean, Indonesian and Vietnamese) from 6 language families (Indo-European, Sino-Tibetan, Japonic, Korean as a language isolate, Austronesian and Austro-Asiatic). The NTU-MC is annotated with a layer of monolingual annotation (POS tags) and cross-lingual annotation (sentence-level alignments). The diverse language data and cross-lingual annotations provide valuable information on linguistic diversity for traditional linguistic research as well as natural language processing tasks. This paper describes the corpus compilation process with the evaluation of the monolingual and cross-lingual annotations of the corpus data. The corpus is available under the Creative Commons Attribution 3.0 Unported license (CC BY).

37 citations

21 Jul 2006
TL;DR: The Prague Dependency Treebank 2.0 contains a large amount of Czech text with complex, interlinked morphological, syntactic and semantic annotation, based on the Praguian linguistic tradition adapted to current computational linguistics research needs.
Abstract: The Prague Dependency Treebank 2.0 (PDT 2.0) contains a large amount of Czech text with complex and interlinked morphological (two million words), syntactic (1.5 million words) and complex semantic annotation (0.8 million words); in addition, certain properties of sentence information structure and coreference relations are annotated at the semantic level. PDT 2.0 is based on the long-standing Praguian linguistic tradition, adapted to current computational linguistics research needs. The corpus itself uses the latest annotation technology. Software tools for corpus search, annotation and language analysis are included. Extensive documentation (in English) is provided as well.

36 citations

Book Chapter
01 Jan 2001
TL;DR: The tag sets and the annotation procedures that are currently being developed and tested are discussed, focusing on how some typical spoken language phenomena are dealt with in the CGN corpus.
Abstract: Of the ten million words of contemporary standard Dutch in the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN), a selection of one million words of natural spoken language will be annotated syntactically. In the present paper we discuss the tag sets and the annotation procedures that are currently being developed and tested. The annotation tags provide information about syntactic constituents and about the semantic relations (dependencies) between these constituents. The annotation graphs allow crossing branches, which makes it possible to represent dependencies independently of surface word order. Moreover, constituents can carry multiple dependency roles, a feature that is exploited in the annotation of non-local dependencies and ellipsis. The annotation process is carried out semi-automatically, using an interactive annotation environment developed within the NEGRA project, which produced a syntactically annotated corpus of German newspaper texts. We illustrate the approach with some real-life examples from the CGN corpus, focusing on how some typical spoken language phenomena are dealt with.

36 citations
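The two graph properties named in the abstract, crossing branches and multiple dependency roles per constituent, can be illustrated with a small data structure. The following Python sketch is a hypothetical model, not the CGN or NEGRA annotation format: each constituent records the set of word positions it covers and a list of (role, head) pairs, and a constituent whose positions are not contiguous is one that needs a crossing branch to be drawn over the surface word order.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A constituent in a hypothetical annotation graph."""

    label: str
    # 0-based word positions covered by this constituent.
    span: set[int] = field(default_factory=set)
    # Dependency roles as (role, head label) pairs; a constituent may carry several.
    roles: list[tuple[str, str]] = field(default_factory=list)


def is_discontinuous(node: Node) -> bool:
    """True if the covered words are not contiguous, i.e. the node requires a crossing branch."""
    if not node.span:
        return False
    return max(node.span) - min(node.span) + 1 != len(node.span)
```

For example, a constituent covering positions {0, 3, 4} is discontinuous because words 1 and 2 intervene, while one covering {1, 2} is not. Storing roles as a list rather than a single field is what lets one constituent fill, say, a role in a main clause and another in an elliptical clause.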

Journal Article
01 Jan 2019 - Database
TL;DR: GenoSurf is a multi-ontology semantic search system that provides access to a consolidated collection of metadata attributes found in the most relevant genomic datasets; the values of 10 attributes are semantically enriched using the most suitable available ontologies.
Abstract: Many valuable resources developed by worldwide research institutions and consortia describe genomic datasets that are both open and available for secondary research, but their metadata search interfaces are heterogeneous, not interoperable and sometimes very limited in capability. We implemented GenoSurf, a multi-ontology semantic search system providing access to a consolidated collection of metadata attributes found in the most relevant genomic datasets; the values of 10 attributes are semantically enriched using the most suitable available ontologies. The user of GenoSurf provides the search terms as input, sets the desired level of ontological enrichment and obtains as output the identity of matching data files at the various sources. Search is facilitated by drop-down lists of matching values; aggregate counts describing the resulting files are updated in real time as search terms are progressively added. In addition to the consolidated attributes, users can perform keyword-based searches on the original (raw) metadata, which are also imported; GenoSurf supports the interplay of attribute-based and keyword-based search through well-defined interfaces. Currently, GenoSurf integrates about 40 million metadata items from several major data sources, including three providers of clinical and experimental data (TCGA, ENCODE and Roadmap Epigenomics) and two sources of annotation data (GENCODE and RefSeq); it can be used as a standalone resource for targeting the genomic datasets at their original sources (identified by their accession IDs and URLs), or as part of an integrated query answering system for performing complex queries over genomic regions and metadata.

36 citations
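The interplay between a search term, the chosen level of ontological enrichment and the per-source aggregate counts can be illustrated with a toy Python sketch. This is not GenoSurf's implementation: the ontology fragment, the records, the source names and the `search` function are all hypothetical, and enrichment is reduced to a single on/off switch that expands the term with related ontology terms before matching.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical ontology fragment: search term -> equivalent or narrower terms.
ONTOLOGY: dict[str, set[str]] = {
    "breast cancer": {"breast carcinoma", "breast invasive carcinoma"},
}

# Toy metadata records (source, file id, disease value), invented for illustration.
RECORDS: list[tuple[str, str, str]] = [
    ("SourceA", "a-1", "breast carcinoma"),
    ("SourceA", "a-2", "breast cancer"),
    ("SourceB", "b-1", "breast invasive carcinoma"),
    ("SourceB", "b-2", "lung adenocarcinoma"),
]


def search(term: str, enrich: bool) -> tuple[list[str], Counter[str]]:
    """Return matching file ids and the number of matching files per source.

    With enrich=False only exact matches count; with enrich=True the term is
    first expanded with the ontology terms listed for it.
    """
    wanted = {term}
    if enrich:
        wanted |= ONTOLOGY.get(term, set())
    matches = [(source, file_id) for source, file_id, value in RECORDS if value in wanted]
    counts = Counter(source for source, _ in matches)
    return [file_id for _, file_id in matches], counts
```

With these toy records, `search("breast cancer", enrich=False)` finds only `a-2`, while `enrich=True` also finds `a-1` and `b-1`, and the counts change from one SourceA file to two SourceA files and one SourceB file. Recomputing `counts` after each added term is the simplest way to mimic the real-time aggregate counts the abstract mentions.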


Network Information
Related Topics (5)
Inference: 36.8K papers, 1.3M citations, 81% related
Deep learning: 79.8K papers, 2.1M citations, 80% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 80% related
Unsupervised learning: 22.7K papers, 1M citations, 79% related
Cluster analysis: 146.5K papers, 2.9M citations, 78% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    1,461
2022    3,073
2021    305
2020    401
2019    383
2018    373