Proceedings Article

Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages

TL;DR
The HASOC track intends to stimulate development in Hate Speech identification for Hindi, German and English; the approaches submitted most often were LSTM networks processing word embedding input.
Abstract
The identification of Hate Speech in Social Media is of great importance and receives much attention in the text classification community. There is a huge demand for research on languages other than English. The HASOC track intends to stimulate development in Hate Speech identification for Hindi, German and English. Three datasets were developed from Twitter and Facebook and made available. Binary classification and more fine-grained subclasses were offered in three subtasks. For all subtasks, 321 experiments were submitted. The approaches used most often were LSTM networks processing word embedding input. The best systems for identification of Hate Speech in English, Hindi, and German achieved Macro-F1 scores of 0.78, 0.81 and 0.61, respectively.
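Macro-F1, the metric reported above, is the unweighted mean of the per-class F1 scores, so a minority class such as hate speech counts as much as the majority class. The following is a minimal Python sketch, assuming string labels; the function name `macro_f1` and the labels `HOF`/`NOT` in the usage example are illustrative assumptions, not taken from the track's evaluation script.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred.

    Illustrative sketch only; not the official HASOC evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        # Count true positives, false positives and false negatives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    # Each class contributes equally, regardless of how many examples it has.
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy labels: two hateful/offensive (HOF) posts, four not offensive (NOT).
gold = ["HOF", "HOF", "NOT", "NOT", "NOT", "NOT"]
pred = ["HOF", "NOT", "NOT", "NOT", "NOT", "HOF"]
print(macro_f1(gold, pred))  # 0.625
```

In this toy example accuracy would be 4/6 (about 0.67). Macro-F1 is lower at 0.625 because the smaller HOF class, with an F1 of 0.5, is weighted the same as the NOT class, with an F1 of 0.75.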

Citations
Proceedings Article

SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)

TL;DR: The SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2020) included three subtasks corresponding to the hierarchical taxonomy of the OLID schema and was offered in five languages: Arabic, Danish, English, Greek, and Turkish.
Journal Article

Resources and benchmark corpora for hate speech detection: a systematic review

TL;DR: This review systematically analyzes the resources made available by the community at large, including their development methodology, topical focus, language coverage, and other factors, highlighting a heterogeneous, growing landscape.
Proceedings Article

Overview of the HASOC Track at FIRE 2020: Hate Speech and Offensive Language Identification in Tamil, Malayalam, Hindi, English and German

TL;DR: The HASOC track is dedicated to evaluating technology for finding offensive language and hate speech; it has attracted much interest, and over 40 research groups have participated and described their approaches in papers.
Posted Content

A Large-Scale Semi-Supervised Dataset for Offensive Language Identification

TL;DR: This work creates the largest available dataset for this task, SOLID, which contains over nine million English tweets labeled in a semi-supervised manner, and demonstrates experimentally that using SOLID along with OLID yields improved performance on the OLID test set for two different models, especially for the lower levels of the taxonomy.
References
Proceedings Article

A Survey on Hate Speech Detection using Natural Language Processing

TL;DR: A survey on hate speech detection describes key areas that have been explored to automatically recognize these types of utterances using natural language processing and discusses the limitations of those approaches.
Posted Content

Automated Hate Speech Detection and the Problem of Offensive Language

TL;DR: This article used a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and trained a multi-class classifier to distinguish hate speech from other offensive language, finding that racist and homophobic tweets are more likely to be classified as hate speech but that sexist tweets are generally classified as offensive.
Journal Article

A Survey on Automatic Detection of Hate Speech in Text

TL;DR: This survey organizes and describes the current state of the field, providing a structured overview of previous approaches, including core algorithms, methods, and main features used, and provides a unifying definition of hate speech.
Proceedings Article

SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter

TL;DR: The paper describes the organization of SemEval-2019 Task 5 on the detection of hate speech against immigrants and women in Spanish and English messages extracted from Twitter, and provides an analysis and discussion of the participant systems and the results they achieved in both subtasks.
Proceedings Article

SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval).

TL;DR: The SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval) was based on a new dataset, the Offensive Language Identification Dataset (OLID), which contains over 14,000 English tweets, and featured three sub-tasks.