Home
/
Authors
/
Veronique Hoste

Author

Veronique Hoste

Other affiliations: Katholieke Universiteit Leuven, University of Antwerp, Hogeschool Gent

Bio: Veronique Hoste is an academic researcher from Ghent University. The author has contributed to research in topics: Machine translation & Sentiment analysis. The author has an hindex of 28, co-authored 173 publications receiving 3760 citations. Previous affiliations of Veronique Hoste include Katholieke Universiteit Leuven & University of Antwerp.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2005
2004
2003
2002
2001
2000

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

SemEval-2016 task 5 : aspect based sentiment analysis

[...]

Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Ion Androutsopoulos¹, Suresh Manandhar², Mohammad AL-Smadi³, Mahmoud Al-Ayyoub³, Yanyan Zhao⁴, Bing Qin⁴, Orphée De Clercq⁵, Veronique Hoste⁶, Marianna Apidianaki, Xavier Tannier, Natalia V. Loukachevitch⁷, Evgeniy V. Kotelnikov, Núria Bel⁸, Salud María Jiménez-Zafra⁹, Gülşen Eryiğit¹⁰ - Show less +14 more•Institutions (10)

Athens University of Economics and Business¹, University of York², Jordan University of Science and Technology³, Harbin Institute of Technology⁴, Ghent University⁵, Hogeschool Gent⁶, Moscow State University⁷, Pompeu Fabra University⁸, University of Jaén⁹, Istanbul Technical University¹⁰

01 Jan 2016

TL;DR: This paper describes the SemEval 2016 shared task on Aspect Based Sentiment Analysis (ABSA), a continuation of the respective tasks of 2014 and 2015, which attracted 245 submissions from 29 teams and provided 19 training and 20 testing datasets.

...read moreread less

Abstract: This paper describes the SemEval 2016 shared task on Aspect Based Sentiment Analysis (ABSA), a continuation of the respective tasks of 2014 and 2015. In its third year, the task provided 19 training and 20 testing datasets for 8 languages and 7 domains, as well as a common evaluation procedure. From these datasets, 25 were for sentence-level and 14 for text-level ABSA; the latter was introduced for the first time as a subtask in SemEval. The task attracted 245 submissions from 29 teams.

...read moreread less

1,139 citations

Proceedings Article•DOI•

SemEval-2018 Task 3: Irony Detection in English Tweets

[...]

Cynthia Van Hee¹, Els Lefever¹, Veronique Hoste¹•Institutions (1)

Ghent University¹

01 Jun 2018

TL;DR: This paper presents the first shared task on irony detection: given a tweet, automatic natural language processing systems should determine whether the tweet is ironic and which type of irony (if any) is expressed (Task B) and demonstrates that fine-grained irony classification is much more challenging than binary irony detection.

...read moreread less

Abstract: This paper presents the first shared task on irony detection: given a tweet, automatic natural language processing systems should determine whether the tweet is ironic (Task A) and which type of irony (if any) is expressed (Task B). The ironic tweets were collected using irony-related hashtags (i.e. #irony, #sarcasm, #not) and were subsequently manually annotated to minimise the amount of noise in the corpus. Prior to distributing the data, hashtags that were used to collect the tweets were removed from the corpus. For both tasks, a training corpus of 3,834 tweets was provided, as well as a test set containing 784 tweets. Our shared tasks received submissions from 43 teams for the binary classification Task A and from 31 teams for the multiclass Task B. The highest classification scores obtained for both subtasks are respectively F1= 0.71 and F1= 0.51 and demonstrate that fine-grained irony classification is much more challenging than binary irony detection.

...read moreread less

241 citations

Journal Article•DOI•

Automatic detection of cyberbullying in social media text

[...]

Cynthia Van Hee¹, Gilles Jacobs¹, Chris Emmery², Bart Desmet¹, Els Lefever¹, Ben Verhoeven², Guy De Pauw², Walter Daelemans², Veronique Hoste¹ - Show less +5 more•Institutions (2)

Ghent University¹, University of Antwerp²

08 Oct 2018-PLOS ONE

TL;DR: This paper describes the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and performs a series of binary classification experiments to determine the feasibility of automatic cyberbullies detection.

...read moreread less

Abstract: While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most for the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1 score of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems.

...read moreread less

231 citations

Journal Article•DOI•

Emotion detection in suicide notes

[...]

Bart Desmet¹, Veronique Hoste¹•Institutions (1)

Hogeschool Gent¹

01 Nov 2013-Expert Systems With Applications

TL;DR: It is shown that fine-grained automatic emotion detection benefits from classifier optimization and a combined lexico-semantic feature representation, and concludes that natural language processing techniques have future application potential for suicide prevention.

...read moreread less

Abstract: The success of suicide prevention, a major public health concern worldwide, hinges on adequate suicide risk assessment. Online platforms are increasingly used for expressing suicidal thoughts, but manual monitoring is unfeasible given the information overload experts are confronted with. We investigate whether the recent advances in natural language processing, and more specifically in sentiment mining, can be used to accurately pinpoint 15 different emotions, which might be indicative of suicidal behavior. A system for automatic emotion detection was built using binary support vector machine classifiers. We hypothesized that lexical and semantic features could be an adequate way to represent the data, as emotions seemed to be lexicalized consistently. The optimal feature combination for each of the different emotions was determined using bootstrap resampling. Spelling correction was applied to the input data, in order to reduce lexical variation. Classification performance varied between emotions, with scores up to 68.86% F-score. F-scores above 40% were achieved for six of the seven most frequent emotions: thankfulness, guilt, love, information, hopelessness and instructions. The most salient features are trigram and lemma bags-of-words and subjectivity clues. Spelling correction had a slightly positive effect on classification performance. We showed that fine-grained automatic emotion detection benefits from classifier optimization and a combined lexico-semantic feature representation. The modest performance improvements obtained through spelling correction might indicate the robustness of the system to noisy input text. We conclude that natural language processing techniques have future application potential for suicide prevention.

...read moreread less

167 citations

Proceedings Article•

SemEval-2010 Task 1: Coreference Resolution in Multiple Languages

[...]

Marta Recasens¹, Lluís Màrquez², Emili Sapena², M. Antònia Martí¹, Mariona Taulé¹, Veronique Hoste³, Massimo Poesio⁴, Yannick Versley⁵ - Show less +4 more•Institutions (5)

University of Barcelona¹, Polytechnic University of Catalonia², Hogeschool Gent³, University of Trento⁴, University of Tübingen⁵

15 Jul 2010

TL;DR: The SemEval-2010 task on coreference resolution in multiple languages as mentioned in this paper evaluated and compared automatic resolution systems for six different languages (Catalan, Dutch, English, German, Italian and Spanish) in four evaluation settings and using four different metrics.

...read moreread less

Abstract: This paper presents the SemEval-2010 task on Coreference Resolution in Multiple Languages. The goal was to evaluate and compare automatic coreference resolution systems for six different languages (Catalan, Dutch, English, German, Italian, and Spanish) in four evaluation settings and using four different metrics. Such a rich scenario had the potential to provide insight into key issues concerning coreference resolution: (i) the portability of systems across languages, (ii) the relevance of different levels of linguistic information, and (iii) the behavior of scoring metrics.

...read moreread less

165 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Machine learning

[...]

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

...read moreread less

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

...read moreread less

13,246 citations

Journal Article•

Data Mining Practical Machine Learning Tools and Techniques

[...]

อนิรุธ สืบสิงห์

01 Jan 2014-Journal of management science

9,185 citations

Journal Article•DOI•

Regression Diagnostics: Identifying Influential Data and Sources of Collinearity

[...]

W. W. Muir¹•Institutions (1)

University of Strathclyde¹

01 May 1981

TL;DR: This chapter discusses Detecting Influential Observations and Outliers, a method for assessing Collinearity, and its applications in medicine and science.

...read moreread less

Abstract: 1. Introduction and Overview. 2. Detecting Influential Observations and Outliers. 3. Detecting and Assessing Collinearity. 4. Applications and Remedies. 5. Research Issues and Directions for Extensions. Bibliography. Author Index. Subject Index.

...read moreread less

4,948 citations

Journal Article•DOI•

Word sense disambiguation: A survey

[...]

Roberto Navigli¹•Institutions (1)

Sapienza University of Rome¹

23 Feb 2009-ACM Computing Surveys

TL;DR: This work introduces the reader to the motivations for solving the ambiguity of words and provides a description of the task, and overviews supervised, unsupervised, and knowledge-based approaches.

...read moreread less

Abstract: Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problems in artificial intelligence. We introduce the reader to the motivations for solving the ambiguity of words and provide a description of the task. We overview supervised, unsupervised, and knowledge-based approaches. The assessment of WSD systems is discussed in the context of the Senseval/Semeval campaigns, aiming at the objective evaluation of systems participating in several different disambiguation tasks. Finally, applications, open problems, and future directions are discussed.

...read moreread less

2,178 citations

Journal Article•DOI•

Analyzing linguistic data: a practical introduction to statistics using R

[...]

Elisabeth Dévière¹•Institutions (1)

Katholieke Universiteit Leuven¹

16 Apr 2009-Journal of Applied Statistics

TL;DR: The author guides the reader in about 350 pages from descriptive and basic statistical methods over classification and clustering to (generalised) linear and mixed models to enable researchers and students alike to reproduce the analyses and learn by doing.

...read moreread less

Abstract: The complete title of this book runs ‘Analyzing Linguistic Data: A Practical Introduction to Statistics using R’ and as such it very well reflects the purpose and spirit of the book. The author guides the reader in about 350 pages from descriptive and basic statistical methods over classification and clustering to (generalised) linear and mixed models. Each of the methods is introduced in the context of concrete linguistic problems and demonstrated on exciting datasets from current research in the language sciences. In line with its practical orientation, the book focuses primarily on using the methods and interpreting the results. This implies that the mathematical treatment of the techniques is held at a minimum if not absent from the book. In return, the reader is provided with very detailed explanations on how to conduct the analyses using R [1]. The first chapter sets the tone being a 20-page introduction to R. For this and all subsequent chapters, the R code is intertwined with the chapter text and the datasets and functions used are conveniently packaged in the languageR package that is available on the Comprehensive R Archive Network (CRAN). With this approach, the author has done an excellent job in enabling researchers and students alike to reproduce the analyses and learn by doing. Another quality as a textbook is the fact that every chapter ends with Workbook sections where the user is invited to exercise his or her analysis skills on supplemental datasets. Full solutions including code, results and comments are given in Appendix A (30 pages). Instructors are therefore very well served by this text, although they might want to balance the book with some more mathematical treatment depending on the target audience. After the introductory chapter on R, the book opens on graphical data exploration. Chapter 3 treats probability distributions and common sampling distributions. Under basic statistical methods (Chapter 4), distribution tests and tests on means and variances are covered. Chapter 5 deals with clustering and classification. Strangely enough, the clustering section has material on PCA, factor analysis, correspondence analysis and includes only one subsection on clustering, devoted notably to hierarchical partitioning methods. The classification part deals with decision trees, discriminant analysis and support vector machines. The regression chapter (Chapter 6) treats linear models, generalised linear models, piecewise linear models and a substantial section on models for lexical richness. The final chapter on mixed models is particularly interesting as it is one of the few text book accounts that introduce the reader to using the (innovative) lme4 package of Douglas Bates which implements linear mixed-effects models. Moreover, the case studies included in this

...read moreread less

1,679 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse