Transfer Learning for Hate Speech Detection in Social Media

Posted Content

Marian-Andrei Rizoiu, Tianyu Wang, Gabriela Ferraro, Hanna Suominen

10 Jun 2019 - arXiv: Social and Information Networks

TL;DR: The paper develops automated text analytics methods that jointly learn a single representation of hate from several smaller, unrelated data sets. This representation supports an interpretable two-dimensional text visualization, called the Map of Hate, that separates different types of hate speech and explains what makes text harmful.

Abstract: In today's society more and more people are connected to the Internet, and its information and communication technologies have become an essential part of our everyday life. Unfortunately, the flip side of this increased connectivity to social media and other online content is cyber-bullying and -hatred, among other harmful and anti-social behaviors. Models based on machine learning and natural language processing provide a way to detect this hate speech in web text in order to make discussion forums and other media and platforms safer. The main difficulty, however, is annotating a sufficiently large number of examples to train these models. In this paper, we report on developing automated text analytics methods capable of jointly learning a single representation of hate from several smaller, unrelated data sets. We train and test our methods on a total of 37,520 English tweets that have been annotated to differentiate harmless messages from racist or sexist content in the first detection task, and from hateful or offensive content in the second detection task. Our most sophisticated method combines a deep neural network architecture with transfer learning. It is capable of creating word and sentence embeddings that are specific to these tasks while also embedding the meaning of generic hate speech. It achieves a macro-averaged F1 of 78% and 72% in the first and second task, respectively. This method enables generating an interpretable two-dimensional text visualization, called the Map of Hate, that is capable of separating different types of hate speech and explaining what makes text harmful. These methods and insights hold potential not only for safer social media, but also for a reduced need to expose human moderators and annotators to distressing online messaging.
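
The abstract reports results as macro-averaged F1, which is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `macro_f1` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrongly assigned
            fn[truth] += 1   # true class was missed

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (assumed for illustration, not data from the paper)
y_true = ["none", "none", "racism", "sexism"]
y_pred = ["none", "racism", "racism", "sexism"]
score = macro_f1(y_true, y_pred)
```

In this toy case the per-class F1 scores are 2/3, 2/3, and 1, so the macro average is 7/9. Because every class is weighted equally, a model that ignores a small class is penalized more under this metric than under plain accuracy.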
