Open Access Journal Article

Automatic detection of cyberbullying in social media text

TLDR
This paper describes the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and performs a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection.
Abstract
While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages, and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most to the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1 score of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems.
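The reported F1 scores combine precision and recall for one class of a binary task. The following is a minimal sketch of that computation, assuming the score is taken over the positive (cyberbullying-related) class, which the abstract does not state explicitly. The function name `binary_f1` and the toy labels are hypothetical and are not drawn from the paper's corpus or results.

```python
from typing import Sequence


def binary_f1(gold: Sequence[int], predicted: Sequence[int], positive: int = 1) -> float:
    """Return the F1 score of the `positive` class for a binary labelling task."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # True positives: posts that are positive and were predicted positive.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: posts predicted positive that are not.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: positive posts the classifier missed.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    if tp == 0:
        # Precision and recall are both zero or undefined; report 0.0 by convention.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for illustration only: 1 = bullying-related post, 0 = other post.
gold = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0]
score = binary_f1(gold, predicted)
print(f"F1 = {score:.2f}")
```

Because bullying-related posts are typically a minority of a corpus, a score on the positive class is more informative than plain accuracy, which a classifier could inflate by labelling every post as negative.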


Citations
Journal Article

Political discourse content analysis: a critical overview of a computerized text analysis program Linguistic Inquiry and Word Count (LIWC)

TL;DR: The authors examined and analyzed the linguistic and psychological features of political discourse using a computer-based Linguistic Inquiry and Word Count (LIWC) content analysis program to explore the relationship between political discourse and the personality of politicians.
Proceedings Article

Challenges and frontiers in abusive content detection

TL;DR: In this article, the authors delineate and clarify the main challenges and frontiers in the abusive content detection field, critically evaluate their implications and discuss potential solutions, and highlight ways in which social scientific insights can advance research.
Posted Content

Racism is a Virus: Anti-Asian Hate and Counterhate in Social Media during the COVID-19 Crisis

TL;DR: Analysis of the social network reveals that hateful and counterspeech users interact and engage extensively with one another, instead of living in isolated polarized communities, and finds that nodes were highly likely to become hateful after being exposed to hateful content in the year 2020.
Journal Article

Misogyny Detection in Twitter: a Multilingual and Cross-Domain Study

TL;DR: It is concluded that misogyny is quite a specific kind of abusive language, and it is found experimentally to differ from sexism, a distinction worth exploring in further investigation.
References
Journal Article

Scikit-learn: Machine Learning in Python

TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Journal Article

LIBSVM: A library for support vector machines

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Journal Article

Support-Vector Networks

TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Journal Article

A Coefficient of Agreement for Nominal Scales

TL;DR: In this article, the authors present a procedure for having two or more judges independently categorize a sample of units and determine the degree, significance, and sampling stability of their agreement, that is, the extent to which these judgments are reproducible, i.e., reliable.
Journal Article

Latent Dirichlet Allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.