scispace - formally typeset
Topic

Annotation

About: Annotation is a research topic. Over its lifetime, 6,719 publications have been published within this topic, receiving 203,463 citations. The topic is also known as: note & markup.


Papers
Patent
21 Jan 2009
TL;DR: In this patent, a method of generating annotation tags (28) for a digital image (22) includes maintaining a library (16) of human-meaningful words or phrases organized as category entries (72) according to a number of defined image description categories (70), and receiving context metadata (20) associated with the capture of a given digital image (22).
Abstract: In one embodiment, a method of generating annotation tags (28) for a digital image (22) includes maintaining a library (16) of human-meaningful words or phrases organized as category entries (72) according to a number of defined image description categories (70), and receiving context metadata (20) associated with the capture of a given digital image (22). The method further includes selecting particular category entries (72-1, 72-2) as vocabulary metadata (24) for the digital image (22) by mapping the context metadata (20) into the library (16), and generating annotation tags (28) for the digital image (22) by logically combining the vocabulary metadata (24) according to a defined set of deductive logic rules (30) that are predicated on the defined image description categories (70). In another embodiment, a processing apparatus (12), such as a digital processor (18, 26) and supporting memory (14), etc., is configured to carry out the above method, or to carry out variations of the above method.
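The abstract describes two stages: context metadata from image capture is mapped into a categorized vocabulary, and the selected entries are then combined by rules predicated on those categories. The Python below is a minimal sketch of that flow under stated assumptions, not the patented implementation. The category names, the context fields (`hour`, `month`, `location`), the phrases, and the rules are all hypothetical, and each library entry is assumed to be a predicate paired with a phrase.

```python
from typing import Any, Callable

# Hypothetical library: each image description category maps to entries,
# where an entry is (predicate on a context value, human-meaningful phrase).
Library = dict[str, list[tuple[Callable[[Any], bool], str]]]

LIBRARY: Library = {
    "time_of_day": [
        (lambda h: 5 <= h < 12, "morning"),
        (lambda h: 12 <= h < 18, "afternoon"),
        (lambda h: 18 <= h < 22, "evening"),
    ],
    "season": [
        (lambda m: m in (12, 1, 2), "winter"),
        (lambda m: m in (6, 7, 8), "summer"),
    ],
    "place": [
        (lambda p: p == "beach", "at the beach"),
        (lambda p: p == "park", "in the park"),
    ],
}

# Hypothetical: which capture-context field feeds which category.
CONTEXT_FIELD = {"time_of_day": "hour", "season": "month", "place": "location"}

# A rule names the categories it is predicated on and how to combine their entries.
Rule = tuple[tuple[str, ...], Callable[..., str]]

RULES: list[Rule] = [
    (("season", "place"), lambda s, p: f"{s} {p}"),
    (("time_of_day", "place"), lambda t, p: f"{t} {p}"),
    (("season",), lambda s: s),
]


def select_vocabulary(context: dict[str, Any], library: Library) -> dict[str, str]:
    """Map context metadata into the library, keeping the first matching entry per category."""
    vocab: dict[str, str] = {}
    for category, entries in library.items():
        value = context.get(CONTEXT_FIELD[category])
        if value is None:
            continue  # no context for this category, so no vocabulary for it
        for matches, phrase in entries:
            if matches(value):
                vocab[category] = phrase
                break
    return vocab


def generate_tags(vocab: dict[str, str], rules: list[Rule]) -> list[str]:
    """Fire each rule whose categories all have vocabulary and collect the results."""
    tags: list[str] = []
    for required, combine in rules:
        if all(category in vocab for category in required):
            tags.append(combine(*(vocab[category] for category in required)))
    return tags


context = {"hour": 19, "month": 7, "location": "beach"}
vocab = select_vocabulary(context, LIBRARY)
tags = generate_tags(vocab, RULES)
print(tags)  # ['summer at the beach', 'evening at the beach', 'summer']
```

In this sketch a rule fires only when every category it is predicated on has vocabulary. As a result, missing context metadata suppresses the corresponding tags rather than producing partial ones.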

40 citations

Posted Content (DOI)
24 Apr 2018, bioRxiv
TL;DR: EnTAP (Eukaryotic Non-Model Transcriptome Annotation Pipeline) was designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non-model eukaryotes.
Abstract: EnTAP (Eukaryotic Non-Model Transcriptome Annotation Pipeline) was designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non-model eukaryotes. This software package addresses the fragmentation and related assembly issues that result in inflated transcript estimates and poor annotation rates. After filters based on assessment of true expression and frame selection are applied, open-source tools are leveraged to functionally annotate the translated proteins. Downstream features include fast similarity search across three repositories, protein domain assignment, orthologous gene family assessment, and Gene Ontology term assignment. The final annotation integrates across multiple databases and selects an optimal assignment from a combination of weighted metrics describing similarity search score, taxonomic relationship, and informativeness. Researchers have the option to include additional filters to identify and remove contaminants, identify associated pathways, and prepare the transcripts for enrichment analysis. This fully featured pipeline is easy to install and configure, and it runs significantly faster than comparable annotation packages. It was developed to address many of the issues in existing software solutions. EnTAP is optimized to generate extensive functional information for the gene space of organisms with limited or poorly characterized genomic resources.
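EnTAP's final step picks one annotation from a weighted combination of three metrics: similarity score, taxonomic relationship, and informativeness. The abstract does not give the formula. The following Python is therefore only an illustrative sketch of weighted selection, not EnTAP's actual scoring or API. Several details are assumptions: the 0-to-1 taxonomy and informativeness scores, normalization by the best bitscore, the weights, and the hit IDs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """One similarity-search hit for a translated protein (fields are assumptions)."""
    subject_id: str
    bitscore: float         # similarity search score
    taxonomic_match: float  # 0..1, higher = closer to the target organism's lineage
    informativeness: float  # 0..1, e.g. 0 for "uncharacterized protein"


# Hypothetical weights; the abstract does not state EnTAP's actual weighting.
WEIGHTS = {"similarity": 0.5, "taxonomy": 0.3, "informativeness": 0.2}


def select_best_hit(hits: list[Hit], weights: dict[str, float] = WEIGHTS) -> Optional[Hit]:
    """Return the hit with the highest weighted score, or None if there are no hits."""
    if not hits:
        return None
    top = max(h.bitscore for h in hits)

    def score(h: Hit) -> float:
        # Normalize the bitscore against the best hit so all terms share a 0..1 scale.
        similarity = h.bitscore / top if top > 0 else 0.0
        return (weights["similarity"] * similarity
                + weights["taxonomy"] * h.taxonomic_match
                + weights["informativeness"] * h.informativeness)

    return max(hits, key=score)


hits = [
    Hit("A", bitscore=500, taxonomic_match=0.2, informativeness=0.0),
    Hit("B", bitscore=450, taxonomic_match=1.0, informativeness=1.0),
    Hit("C", bitscore=480, taxonomic_match=0.5, informativeness=0.8),
]
best = select_best_hit(hits)
print(best.subject_id)  # B
```

In this example, hit A has the highest raw bitscore but scores 0.56. Hit B is taxonomically closer and more informative and scores 0.95, so it wins. This is the kind of trade-off a weighted selection is meant to make.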

40 citations

01 Mar 2000
TL;DR: The creation and initial annotation of the Monroe corpus are discussed. The corpus is a collection of video and audio data of 20 human-human, mixed-initiative, task-oriented dialogs about disaster-handling tasks, and the report covers how the dialogs were collected, what tasks were used, and how the data was transcribed and aligned.
Abstract: In this report we discuss the creation and initial annotation of the Monroe corpus, a collection of video and audio data of 20 human-human, mixed-initiative, task-oriented dialogs about disaster-handling tasks. We describe how the dialogs were collected, what tasks were used, and how the data was transcribed and aligned.

40 citations

Proceedings Article (DOI)
01 Aug 2019
TL;DR: A robust English corpus and annotation schema is presented that allows us to explore the less straightforward examples of term-definition structures in free and semi-structured text.
Abstract: Definition extraction has been a popular topic in NLP research for well over a decade, but it has historically been limited to well-defined, structured, and narrow conditions. In reality, natural language is messy, and messy data requires both complex solutions and data that reflects that reality. In this paper, we present a robust English corpus and annotation schema that allows us to explore the less straightforward examples of term-definition structures in free and semi-structured text.

40 citations

Dissertation
07 Jul 2017
TL;DR: This thesis investigates the problem of detecting hate speech posted online with an exhaustive and methodical approach, and investigates the potential advantages of using hierarchical classes to annotate a dataset.
Abstract: The use of the internet and social networks, in particular for communication, has increased significantly in recent years. This growth has also led to more aggressive communication. It is therefore important that governments and social network platforms have tools to detect this type of communication, because it can be harmful to its targets. In this thesis we investigate the problem of detecting hate speech posted online. The first goal of our work was to provide a complete overview of the topic from the perspective of computer science and engineering. We adopted an exhaustive and methodical approach, a Systematic Literature Review. As a result, we critically summarized different perspectives on the concept of hate speech. We then complemented our definition with rules, examples, and a comparison with related concepts such as cyberbullying and abusive language. Regarding past work on the topic, we observed that most studies tackle the problem as a machine learning classification task. They use either general text mining features (e.g. n-grams, word2vec) or hate-speech-specific features (e.g. othering discourse). Most of these studies collect new datasets, but those datasets remain private, which makes it more difficult to compare results across works. We also concluded that the field is still at an early stage, with several open research opportunities. As we found no research on the topic in Portuguese, the second goal of this work was to annotate a dataset for this language and make it available as well. For the dataset annotation, we built a classification system with a hierarchical structure. The main advantage of this strategy is that it lets us better capture nuances in the hate speech concept, such as the existence and intersectionality of hate speech subtypes.
Our data was collected from Twitter and manually annotated following a set of rules, which are also a valuable product of our work. We annotated a dataset of 5,668 messages from 1,156 distinct users, considering 85 distinct classes of hate speech. Of the 5,668 messages, around 22% contain some type of hate speech. The hierarchical approach improved annotator agreement, although agreement remained an issue when identifying hate speech. Further analysis showed that the different types of hate speech have different characteristics (e.g. number of messages, time of occurrence, vocabulary size, distinct n-grams, and POS). A final goal of our thesis was to investigate the potential advantages of using hierarchical classes to annotate a dataset. For this, we used the annotated Portuguese dataset and conducted an experiment with training, validation, and test phases. The experiment compared two approaches: a "unimodel", which uses only the hate speech class, and a "multimodel", which uses the several hierarchical classes. The main conclusion was that the multimodel seemed to perform slightly better than the unimodel on the F1 metric. Additionally, our method helped to identify a larger number of hate speech messages, because it achieves better recall at the expense of precision. Finally, we believe this experiment can be extended in the future to better identify hate speech and its subtypes.
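The thesis finds that better recall at the expense of precision can still raise F1. This follows from F1 being the harmonic mean of precision and recall. A small Python sketch makes the arithmetic concrete. The confusion counts are hypothetical and are not results from the thesis.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 (harmonic mean of the two) from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts on a test set containing 120 hate speech messages.
uni = precision_recall_f1(tp=60, fp=20, fn=60)    # fewer, more precise detections
multi = precision_recall_f1(tp=80, fp=40, fn=40)  # more detections, more false positives

print("unimodel   P=%.3f R=%.3f F1=%.3f" % uni)
print("multimodel P=%.3f R=%.3f F1=%.3f" % multi)
```

With these made-up counts, the second model's precision drops from 0.75 to about 0.67. Its recall rises from 0.50 to about 0.67, and F1 rises from 0.60 to about 0.67.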

40 citations


Network Information
Related Topics (5)
Inference
36.8K papers, 1.3M citations
81% related
Deep learning
79.8K papers, 2.1M citations
80% related
Graph (abstract data type)
69.9K papers, 1.2M citations
80% related
Unsupervised learning
22.7K papers, 1M citations
79% related
Cluster analysis
146.5K papers, 2.9M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    1,461
2022    3,073
2021    305
2020    401
2019    383
2018    373