scispace - formally typeset
Author

Patxi Galán-García

Bio: Patxi Galán-García is an academic researcher from the University of Deusto. The author has contributed to research in the topics of game theory and anonymity, has an h-index of 7, and has co-authored 11 publications receiving 333 citations.

Papers
Proceedings Article
01 Jan 2013
TL;DR: Presents a methodology to detect fake profiles on the Twitter social network that are employed for defamatory activities and to associate them with a real profile within the same network, by analysing the content of the comments generated by both profiles.
Abstract: The use of new technologies, along with the popularity of social networks, has given users the power of anonymity. The ability to create an alter ego with no relation to the actual user creates a situation in which no one can certify the match between a profile and a real person. This generates situations, repeated daily, in which users with fake accounts, or at least accounts not related to their real identity, publish news, reviews or multimedia material trying to discredit or attack other people, who may or may not be aware of the attack. These acts can have a great impact on the affected victims’ environment, generating situations in which virtual attacks escalate into fatal consequences in real life. In this paper, we present a methodology to detect fake profiles on the Twitter social network that are employed for defamatory activities and to associate them with a real profile within the same network, by analysing the content of the comments generated by both profiles. Alongside this approach, we also present a successful real-life use case in which this methodology was applied to detect and stop a cyberbullying situation in a real elementary school.
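The core of the approach is comparing the text produced by a suspect profile against that of candidate real profiles. A minimal sketch of that idea, using simple term-frequency vectors and cosine similarity (the function names, data and scoring are illustrative assumptions, not the paper's actual features):

```python
# Hypothetical sketch: score how similar the writing in a suspect (fake)
# profile is to each candidate real profile by comparing the text of
# their comments. Names and data below are illustrative only.
import math
from collections import Counter

def tf_vector(texts):
    """Term-frequency vector over a profile's concatenated comments."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_candidates(fake_comments, candidates):
    """Rank candidate real profiles by textual similarity to the fake one."""
    fake_vec = tf_vector(fake_comments)
    scores = {name: cosine_similarity(fake_vec, tf_vector(texts))
              for name, texts in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The highest-ranked candidate is then a hypothesis for the real person behind the fake account, to be confirmed by further evidence.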

147 citations

Journal ArticleDOI
TL;DR: In this article, the authors present a methodology to detect fake profiles on the Twitter social network that are employed for defamatory activities and to associate them with a real profile within the same network, by analysing the content of the comments generated by both profiles.
Abstract: The use of new technologies, along with the popularity of social networks, has given users the power of anonymity. The ability to create an alter ego with no relation to the actual user creates a situation in which no one can certify the match between a profile and a real person. This generates situations, repeated daily, in which users with fake accounts, or at least accounts not related to their real identity, publish news, reviews or multimedia material trying to discredit or attack other people, who may or may not be aware of the attack. These acts can have a great impact on the affected victims’ environment, generating situations in which virtual attacks escalate into fatal consequences in real life. In this paper, we present a methodology to detect fake profiles on the Twitter social network that are employed for defamatory activities and to associate them with a real profile within the same network, by analysing the content of the comments generated by both profiles. Alongside this approach, we also present a successful real-life use case in which this methodology was applied to detect and stop a cyberbullying situation in a real elementary school.

115 citations

Book ChapterDOI
01 Jan 2014
TL;DR: Proposes a content-based approach to filtering spam tweets, using the text of the tweet together with machine learning and compression algorithms to filter undesired messages.
Abstract: Twitter has become one of the most used social networks. And, as happens with every popular medium, it is prone to misuse. In this context, spam on Twitter has emerged in recent years, becoming an important problem for users. Several approaches have appeared that are able to determine whether a user is a spammer or not. However, these blacklisting systems cannot filter every spam message, and a spammer may simply create another account and resume sending spam. In this paper, we propose a content-based approach to filter spam tweets. We use the text of the tweet together with machine learning and compression algorithms to filter these undesired tweets.
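One classic way compression algorithms are used for text classification, which may be close to what the chapter describes: a tweet is assigned to the class whose training corpus compresses it best, i.e. whose compressor gains the most from shared patterns. A minimal sketch with `zlib` (corpora here are toy examples, not the paper's data):

```python
# Compression-based text classification sketch: label a tweet by which
# class corpus yields the smallest increase in compressed size when the
# tweet is appended to it. Corpora below are illustrative only.
import zlib

def compressed_size(text):
    """Size in bytes of the zlib-compressed text."""
    return len(zlib.compress(text.encode("utf-8")))

def classify(tweet, spam_corpus, ham_corpus):
    """Assign the class whose corpus compresses the tweet best."""
    spam_delta = compressed_size(spam_corpus + " " + tweet) - compressed_size(spam_corpus)
    ham_delta = compressed_size(ham_corpus + " " + tweet) - compressed_size(ham_corpus)
    return "spam" if spam_delta < ham_delta else "ham"
```

The intuition: if the tweet repeats phrases already present in the spam corpus, the compressor encodes it with cheap back-references, so the size increase is small.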

44 citations

Book ChapterDOI
01 Jan 2013
TL;DR: Negobot is a conversational agent that poses as a child in chats, social networks and other channels affected by paedophile behaviour; its most innovative proposal is to treat the conversation itself as a game, applying game theory.
Abstract: Children have increasingly become active users of the Internet and, although any segment of the population is susceptible to falling victim to the existing risks, they in particular are one of the most vulnerable. Thus, some of the major scourges of this cyber-society are paedophile behaviour on the Internet, child pornography and the sexual exploitation of children. Against this background, Negobot is a conversational agent that poses as a child in chats, social networks and other channels affected by paedophile behaviour. As a conversational agent, Negobot has a strong technical base of Natural Language Processing and information retrieval, as well as Artificial Intelligence and Machine Learning. However, the most innovative proposal of Negobot is to consider the conversation itself as a game, applying game theory. In this context, Negobot first proposes a competitive game in which the system identifies the best strategies for achieving its goal: obtaining information that allows us to infer whether the subject conversing with the agent has paedophile tendencies, while ensuring its actions do not lead the alleged offender to abandon the conversation because of suspicious behaviour on the agent's part.
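The game-theoretic idea can be sketched as a per-turn strategy choice: each conversational strategy trades information gain against the risk that the suspect leaves. The model below is a hypothetical illustration of that trade-off, not Negobot's actual payoff functions:

```python
# Hypothetical sketch of Negobot's game-theoretic turn selection: pick
# the strategy that maximises expected information gain, discounted by
# the probability that the suspect abandons the conversation.
# Strategy names, gains and probabilities are illustrative assumptions.
def expected_payoff(info_gain, leave_prob):
    """Payoff = information gained if the suspect stays, zero if they leave."""
    return info_gain * (1.0 - leave_prob)

def choose_strategy(strategies):
    """strategies: {name: (info_gain, leave_prob)} -> name with best payoff."""
    return max(strategies, key=lambda s: expected_payoff(*strategies[s]))
```

Under such a model, an aggressive, direct probe can lose to a moderate one: its higher information gain is outweighed by the risk of scaring the subject off.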

23 citations

Journal ArticleDOI
TL;DR: Collective classification for text classification is an interesting method for optimising the classification of partially-labelled data; this paper proposes, for the first time, collective classification algorithms for spam filtering to cope with the large number of unclassified e-mails sent every day.
Abstract: Spam has become a major issue in computer security because it is a channel for threats such as computer viruses, worms and phishing. Many solutions feature machine-learning algorithms trained using statistical representations of the terms that usually appear in the e-mails. Still, these methods require a training step with labelled data. Dealing with situations where the availability of labelled training instances is limited slows down the progress of filtering systems and offers advantages to spammers. Currently, many approaches direct their efforts towards Semi-Supervised Learning (SSL). SSL is a halfway method between supervised and unsupervised learning which, in addition to unlabelled data, receives some supervision information, such as the association of targets with some of the examples. Collective Classification for Text Classification is an interesting method for optimising the classification of partially-labelled data. In this way, we propose here, for the first time, Collective Classification algorithms for spam filtering to cope with the large number of unclassified e-mails that are sent every day.
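The essence of collective classification is that an instance's label depends on the labels of related instances, so a few seed labels can be propagated through a similarity graph. A toy sketch of that propagation step (the graph construction, features and actual algorithms in the paper are far richer):

```python
# Minimal collective-classification sketch: seed labels on a few e-mails
# are iteratively propagated to unlabelled neighbours in a similarity
# graph. Once a node is labelled, its label is kept (self-training style).
# Graph and labels are toy inputs, not the paper's setup.
def propagate_labels(graph, seed_labels, iterations=10):
    """graph: {node: [neighbours]}, seed_labels: {node: 'spam'|'ham'}.
    Unlabelled nodes take the majority label among labelled neighbours."""
    labels = dict(seed_labels)
    for _ in range(iterations):
        changed = False
        for node, neighbours in graph.items():
            if node in labels:
                continue  # seed and already-inferred labels stay fixed
            votes = [labels[n] for n in neighbours if n in labels]
            if votes:
                labels[node] = max(set(votes), key=votes.count)
                changed = True
        if not changed:
            break  # no unlabelled node gained a label this pass
    return labels
```

This addresses exactly the scenario the abstract describes: a large pool of unclassified e-mails reachable from a small labelled set.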

14 citations


Cited by
01 Jan 2001
TL;DR: The probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the value of the thing expected upon its happening.
Abstract: Problem Given the number of times in which an unknown event has happened and failed: Required the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named. SECTION 1 Definition 1. Several events are inconsistent, when if one of them happens, none of the rest can. 2. Two events are contrary when one, or other of them must; and both together cannot happen. 3. An event is said to fail, when it cannot happen; or, which comes to the same thing, when its contrary has happened. 4. An event is said to be determined when it has either happened or failed. 5. The probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the value of the thing expected upon its happening.
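In modern terms, Bayes' stated problem has a closed-form answer: with a uniform prior on the unknown probability p, observing k successes in n trials gives a Beta(k+1, n−k+1) posterior, and the "chance that the probability lies between two degrees" is the integral of that density over the interval. A small numeric sketch:

```python
# Bayes' original problem: P(lo < p < hi | k successes in n trials),
# assuming a uniform prior on p. The posterior density is
# Beta(k+1, n-k+1): (n+1) * C(n, k) * p^k * (1-p)^(n-k).
# Integrated numerically with the midpoint rule.
from math import comb

def posterior_prob_between(k, n, lo, hi, steps=100_000):
    """Posterior probability that p lies in (lo, hi)."""
    norm = (n + 1) * comb(n, k)  # normalising constant of Beta(k+1, n-k+1)
    dp = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        p = lo + (i + 0.5) * dp  # midpoint of each sub-interval
        total += norm * p**k * (1 - p)**(n - k) * dp
    return total
```

For example, after a single success in a single trial the posterior density is 2p, so the chance that p exceeds 1/2 is 3/4.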

368 citations

Proceedings ArticleDOI
15 Jun 2018
TL;DR: Proposes an incremental and iterative methodology that leverages the power of crowdsourcing to annotate a large collection of tweets with a set of abuse-related labels, and identifies a reduced but robust set of labels to characterize abuse-related tweets.
Abstract: In recent years online social networks have suffered an increase in sexism, racism, and other types of aggressive and cyberbullying behavior, often manifesting itself through offensive, abusive, or hateful language. Past scientific work focused on studying these forms of abusive activity in popular online social networks, such as Facebook and Twitter. Building on such work, we present an eight month study of the various forms of abusive behavior on Twitter, in a holistic fashion. Departing from past work, we examine a wide variety of labeling schemes, which cover different forms of abusive behavior. We propose an incremental and iterative methodology that leverages the power of crowdsourcing to annotate a large collection of tweets with a set of abuse-related labels. By applying our methodology and performing statistical analysis for label merging or elimination, we identify a reduced but robust set of labels to characterize abuse-related tweets. Finally, we offer a characterization of our annotated dataset of 80 thousand tweets, which we make publicly available for further scientific exploration.
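The crowdsourced annotation step the abstract describes typically reduces several workers' labels per tweet to one final label. A hypothetical sketch of such a merging rule, keeping a label only when a clear majority of annotators agrees (the 60% threshold is an illustrative assumption, not the paper's):

```python
# Hypothetical label-merging sketch for crowdsourced annotations: keep
# the majority label when agreement is high enough, otherwise flag the
# tweet (None) for another annotation round. Threshold is illustrative.
from collections import Counter

def merge_annotations(annotations, min_agreement=0.6):
    """annotations: {tweet_id: [worker labels]} -> {tweet_id: label or None}."""
    merged = {}
    for tweet_id, labels in annotations.items():
        label, count = Counter(labels).most_common(1)[0]
        merged[tweet_id] = label if count / len(labels) >= min_agreement else None
    return merged
```

Tweets mapped to `None` are exactly the "hard" cases an incremental methodology would route back to annotators or use to refine the label set.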

351 citations

01 Jan 2017
TL;DR: This work proposes a variety of hate categories and designs and implements two classifiers for the Italian language, based on different learning algorithms: the first based on Support Vector Machines (SVM) and the second on a particular Recurrent Neural Network named Long Short Term Memory (LSTM).
Abstract: While favouring communication and easing information sharing, Social Network Sites are also used to launch harmful campaigns against specific groups and individuals. Cyberbullying, incitement to self-harm practices and sexual predation are just some of the severe effects of massive online offensives. Moreover, attacks can be carried out against groups of victims and can degenerate into physical violence. In this work, we aim at containing and preventing the alarming diffusion of such hate campaigns. Using Facebook as a benchmark, we consider the textual content of comments that appeared on a set of public Italian pages. We first propose a variety of hate categories to distinguish the kind of hate. Crawled comments are then annotated by up to five distinct human annotators, according to the defined taxonomy. Leveraging morpho-syntactic features, sentiment polarity and word-embedding lexicons, we design and implement two classifiers for the Italian language, based on different learning algorithms: the first on Support Vector Machines (SVM) and the second on a particular Recurrent Neural Network named Long Short-Term Memory (LSTM). We test these two learning algorithms in order to verify their classification performance on the task of hate speech recognition. The results show the effectiveness of the two classification approaches tested over the first manually annotated Italian Hate Speech Corpus of social media text.
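The SVM branch of such a pipeline could be set up along these lines with scikit-learn (an assumption for illustration; the paper's actual features also include morpho-syntactic information, sentiment polarity and word embeddings, which are omitted here):

```python
# Sketch of a text classifier in the spirit of the paper's SVM branch:
# TF-IDF word/bigram features feeding a linear SVM. Training data below
# is a toy stand-in, not the annotated Italian corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def train_hate_classifier(comments, labels):
    """Fit a TF-IDF + linear-SVM pipeline on labelled comments."""
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    model.fit(comments, labels)
    return model
```

The LSTM counterpart would replace the TF-IDF features with word-embedding sequences; the paper compares the two on the same annotated corpus.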

286 citations

Journal ArticleDOI
08 Oct 2018-PLOS ONE
TL;DR: Describes the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch, and performs a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection.
Abstract: While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most for the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1 score of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems.
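The F1 scores the paper reports (64% for English, 61% for Dutch) combine precision and recall on the positive, cyberbullying-related class. A small self-contained sketch of that metric, as it would be computed on a hold-out set (labels here are toy values):

```python
# F1 score for a binary cyberbullying detector: harmonic mean of
# precision and recall on the positive class. Toy labels, not the
# paper's data.
def f1_score(y_true, y_pred, positive="bully"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because F1 ignores true negatives, it is a common choice when the positive class (here, bullying posts) is rare relative to ordinary traffic.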

231 citations

Journal ArticleDOI
TL;DR: Three of the main tasks facing this issue concern: (1) the detection of opinion spam in review sites, (2) the detection of fake news and spam in microblogging, and (3) the credibility assessment of online health information.
Abstract: In the Social Web scenario, where large amounts of User Generated Content diffuse through Social Media, the risk of running into misinformation is not negligible. For this reason, assessing and mining the credibility of both sources of information and the information itself constitutes a fundamental issue nowadays. Credibility, also referred to as believability, is a quality perceived by individuals, who are not always able to discern, with their cognitive capacities, genuine information from fake information. For this reason, in recent years several approaches have been proposed to automatically assess credibility in Social Media. Most of them are based on data-driven models, i.e., they employ machine-learning techniques to identify misinformation, but recently model-driven approaches are also emerging, as well as graph-based approaches focusing on credibility propagation. Since multiple social applications have been developed for different aims and in different contexts, several solutions have been considered to address the issue of credibility assessment in Social Media. Three of the main tasks facing this issue and considered in this article concern: (1) the detection of opinion spam in review sites, (2) the detection of fake news and spam in microblogging, and (3) the credibility assessment of online health information. Despite the high number of interesting solutions proposed in the literature to tackle the above three tasks, some issues remain unsolved; they mainly concern both the absence of predefined benchmarks and gold-standard datasets, and the difficulty of collecting and mining large amounts of data, which has not yet received the attention it deserves. For further resources related to this article, please visit the WIREs website.

159 citations