Author

Chris Emmery

Other affiliations: University of Antwerp

Bio: Chris Emmery is an academic researcher at Tilburg University. The author has contributed to research on topics including computer science and stylometry, has an h-index of 6, and has co-authored 16 publications receiving 232 citations. Previous affiliations of Chris Emmery include the University of Antwerp.

Topics: Computer science, Stylometry, Data collection, Rule-based system, Rewriting ...

Papers

Journal Article

Automatic detection of cyberbullying in social media text

Cynthia Van Hee¹, Gilles Jacobs¹, Chris Emmery², Bart Desmet¹, Els Lefever¹, Ben Verhoeven², Guy De Pauw², Walter Daelemans², Veronique Hoste¹

Ghent University¹, University of Antwerp²

08 Oct 2018-PLOS ONE

TL;DR: This paper describes the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and performs a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection.

Abstract: While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most for the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1 score of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems.
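
The F1 score reported in this abstract is the harmonic mean of precision and recall for the positive class. As a minimal sketch of how such a score is computed, the function below works from raw confusion counts; the function name and the example counts are hypothetical illustrations, not the paper's evaluation code or data.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 = 2PR / (P + R) for the positive class.

    tp, fp, fn are true positives, false positives and false negatives.
    """
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
score = f1_score(tp=8, fp=2, fn=6)
print(f"F1 = {score:.4f}")
```

Because F1 ignores true negatives, it is a common choice when the positive class is rare, which is typically the case for bullying-related posts.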

231 citations

Journal Article

Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity

Chris Emmery¹,², Ben Verhoeven², Guy De Pauw², Gilles Jacobs³, Cynthia Van Hee³, Els Lefever³, Bart Desmet³, Veronique Hoste³, Walter Daelemans²

Tilburg University¹, University of Antwerp², Ghent University³

25 Oct 2019-arXiv: Computation and Language

TL;DR: An effective crowdsourcing method is presented: simulating real-life bullying scenarios in a lab setting generates plausible data that can be used to enrich real data, which largely circumvents the restrictions on data that can be collected and increases classifier performance.

Abstract: The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of the recent research uses small, heterogeneous datasets, without a thorough evaluation of applicability. In this paper, we further illustrate these issues, as we (i) evaluate many publicly available resources for this task and demonstrate difficulties with data collection. These predominantly yield small datasets that fail to capture the required complex social dynamics and impede direct comparison of progress. We (ii) conduct an extensive set of experiments that indicate a general lack of cross-domain generalization of classifiers trained on these sources, and openly provide this framework to replicate and extend our evaluation criteria. Finally, we (iii) present an effective crowdsourcing method: simulating real-life bullying scenarios in a lab setting generates plausible data that can be effectively used to enrich real data. This largely circumvents the restrictions on data that can be collected, and increases classifier performance. We believe these contributions can aid in improving the empirical practices of future research in the field.

28 citations

Proceedings Article

Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

Stéphan Tulkens¹, Chris Emmery¹, Walter Daelemans¹

University of Antwerp¹

01 Jul 2016

TL;DR: The authors demonstrate the performance of multiple types of embeddings, created with both count and prediction-based architectures on a variety of corpora, in two language-specific tasks: relation evaluation, and dialect identification.

Abstract: Word embeddings have recently seen a strong increase in interest as a result of strong performance gains on a variety of tasks. However, most of this research also underlined the importance of benchmark datasets, and the difficulty of constructing these for a variety of language-specific tasks. Still, many of the datasets used in these tasks could prove to be fruitful linguistic resources, allowing for unique observations into language use and variability. In this paper we demonstrate the performance of multiple types of embeddings, created with both count and prediction-based architectures on a variety of corpora, in two language-specific tasks: relation evaluation, and dialect identification. For the latter, we compare unsupervised methods with a traditional, hand-crafted dictionary. With this research, we provide the embeddings themselves, the relation evaluation task benchmark for use in further research, and demonstrate how the benchmarked embeddings prove a useful unsupervised linguistic resource, effectively used in a downstream task.

26 citations

Posted Content

Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

Stéphan Tulkens¹, Chris Emmery¹, Walter Daelemans¹

University of Antwerp¹

01 Jul 2016-arXiv: Computation and Language

TL;DR: This paper demonstrates the performance of multiple types of embeddings, created with both count and prediction-based architectures on a variety of corpora, in two language-specific tasks: relation evaluation, and dialect identification.

21 citations

Proceedings Article

Simple Queries as Distant Labels for Predicting Gender on Twitter

Chris Emmery¹, Grzegorz Chrupała², Walter Daelemans¹

University of Antwerp¹, Tilburg University²

01 Sep 2017

TL;DR: This paper demonstrates the effectiveness of gathering distant labels for self-reported gender on Twitter using simple queries and offers a cheap, extensible, and fast alternative that can be employed beyond the task of gender classification.

Abstract: The majority of research on extracting missing user attributes from social media profiles use costly hand-annotated labels for supervised learning. Distantly supervised methods exist, although these generally rely on knowledge gathered using external sources. This paper demonstrates the effectiveness of gathering distant labels for self-reported gender on Twitter using simple queries. We confirm the reliability of this query heuristic by comparing with manual annotation. Moreover, using these labels for distant supervision, we demonstrate competitive model performance on the same data as models trained on manual annotations. As such, we offer a cheap, extensible, and fast alternative that can be employed beyond the task of gender classification.

18 citations

Cited by

A study of the influence of the author's gender and psychological characteristics on the quantitative parameters of their text using the Linguistic Inquiry and Word Count program

Литвинова Татьяна Александровна, Литвинова Ольга Александровна, Рыжкова Екатерина Сергеевна, Бирюкова Елизавета Дмитриевна, Середин Павел Владимирович, Загоровская Ольга Владимировна

01 Jan 2015

410 citations

Journal Article

Political discourse content analysis: a critical overview of a computerized text analysis program Linguistic Inquiry and Word Count (LIWC)

Angelika Yanovets, Oksana Smal

30 Jan 2020

TL;DR: The authors examined and analyzed the linguistic and psychological features of political discourse using a computer-based Linguistic Inquiry and Word Count (LIWC) content analysis program to explore the relationship between political discourse and the personality of politicians.

Abstract: The article examines and analyzes the linguistic and psychological features of political discourse using a computer-based Linguistic Inquiry and Word Count (LIWC) content analysis program to explore the relationship between political discourse and the personality of politicians. As for political discourse, it is perhaps the communicator, the linguistic personality, who plays the most important role in the communication. The linguistic personality of a politician is of particular interest in political discourse content analysis, since it has the greatest influence on the public consciousness via mass media. Using text as a source of psychological and cognitive information has been gaining popularity. Researchers use a variety of methods to analyze texts, but Linguistic Inquiry and Word Count (LIWC) has proved to be the most common technique. The analysis of linguistic patterns of political discourse shows that in the context of political speech events such as media interviews, politicians make a unique choice of lexical units, which can be interpreted as a manifestation of certain personality traits. However, despite the significance of the results, there are clear limitations to the use of computerized methodologies for political discourse content analysis, such as the limited interpretive capacity of software to understand pragmatic and contextual use of lexical units.

286 citations

Journal Article

Do not blame it on the algorithm: an empirical assessment of multiple recommender systems and their impact on content diversity

Judith Möller¹, Damian Trilling¹, Natali Helberger¹, B. van Es

University of Amsterdam¹

01 Mar 2018-Information, Communication & Society

TL;DR: It is found that all of the recommendation logics under study proved to lead to a rather diverse set of recommendations that are on par with human editors and that basing recommendations on user histories can substantially increase topic diversity within a recommendation set.

Abstract: In the debate about filter bubbles caused by algorithmic news recommendation, the conceptualization of the two core concepts in this debate, diversity and algorithms, has received little attention in social scientific research. This paper examines the effect of multiple recommender systems on different diversity dimensions. To this end, it maps different values that diversity can serve, and a respective set of criteria that characterizes a diverse information offer in this particular conception of diversity. We make use of a data set of simulated article recommendations based on actual content of one of the major Dutch broadsheet newspapers and its users (N=21,973 articles, N=500 users). We find that all of the recommendation logics under study proved to lead to a rather diverse set of recommendations that are on par with human editors and that basing recommendations on user histories can substantially increase topic diversity within a recommendation set.

222 citations

Proceedings Article

Challenges and frontiers in abusive content detection

Bertie Vidgen, Alex Harris, Dong Nguyen, Rebekah Tromble, Scott A. Hale, Helen Margetts

01 Aug 2019

TL;DR: In this article, the authors delineate and clarify the main challenges and frontiers in the abusive content detection field, critically evaluate their implications and discuss potential solutions, and highlight ways in which social scientific insights can advance research.

Abstract: Online abusive content detection is an inherently difficult task. It has received considerable attention from academia, particularly within the computational linguistics community, and performance appears to have improved as the field has matured. However, considerable challenges and unaddressed frontiers remain, spanning technical, social and ethical dimensions. These issues constrain the performance, efficiency and generalizability of abusive content detection systems. In this article we delineate and clarify the main challenges and frontiers in the field, critically evaluate their implications and discuss potential solutions. We also highlight ways in which social scientific insights can advance research. We discuss the lack of support given to researchers working with abusive content and provide guidelines for ethical research.

153 citations

Youth Risk Behavior Surveillance - United States, 1993

Laura Kann, Charles W. Warren, Barbara I. Williams

24 Mar 1995

TL;DR: This report summarizes results from the national survey, 24 state surveys, and nine local surveys conducted among high school students during February through May 1993; the results indicate that substantial morbidity and social problems among adolescents also result from unintended pregnancies and sexually transmitted diseases.

Abstract:

PROBLEM/CONDITION: Priority health risk behaviors that contribute to the leading causes of mortality, morbidity, and social problems among youth and adults often are established during youth, extend into adulthood, and are interrelated.

REPORTING PERIOD: February through May 1993.

DESCRIPTION OF SYSTEM: The Youth Risk Behavior Surveillance System (YRBSS) monitors six categories of priority health risk behaviors among youth and young adults: behaviors that contribute to unintentional and intentional injuries, tobacco use, alcohol and other drug use, sexual behaviors, dietary behaviors, and physical activity. The YRBSS includes a national, school-based survey conducted by CDC and state and local school-based surveys conducted by state and local education agencies. This report summarizes results from the national survey, 24 state surveys, and nine local surveys conducted among high school students during February through May 1993.

RESULTS AND INTERPRETATION: In the United States, 72% of all deaths among school-age youth and young adults are from four causes: motor vehicle crashes, other unintentional injuries, homicide, and suicide. Results from the 1993 YRBSS suggest that many high school students practice behaviors that may increase their likelihood of death from these four causes: 19.1% rarely or never used a safety belt, 35.3% had ridden with a driver who had been drinking alcohol during the 30 days preceding the survey, 22.1% had carried a weapon during the 30 days preceding the survey, 80.9% ever drank alcohol, 32.8% ever used marijuana, and 8.6% had attempted suicide during the 12 months preceding the survey. Substantial morbidity and social problems among adolescents also result from unintended pregnancies and sexually transmitted diseases, including human immunodeficiency virus (HIV) infection. YRBSS results indicate that in 1993, 53.0% of high school students had had sexual intercourse, 52.8% of sexually active students had used a condom during last sexual intercourse, and 1.4% ever injected an illegal drug. Among adults, 67% of all deaths are from three causes: heart disease, cancer, and stroke. In 1993, many high school students practiced behaviors that may increase the risk for these health problems: 30.5% of high school students had smoked cigarettes during the 30 days preceding the survey, only 15.4% had eaten five or more servings of fruits and vegetables during the day preceding the survey, and only 34.3% had attended physical education class daily.

ACTIONS TAKEN: YRBSS data are being used nationwide by health and education officials to improve school health policies and programs designed to reduce risks associated with the leading causes of mortality and morbidity. At the national level, YRBSS data are being used to measure progress toward achieving 26 national health objectives and one of eight National Education Goals.

132 citations
