scispace - formally typeset
Author

Patxi Galán-García

Bio: Patxi Galán-García is an academic researcher from the University of Deusto. The author has contributed to research in the topics of game theory and anonymity, has an h-index of 7, and has co-authored 11 publications receiving 333 citations.

Papers
Proceedings Article
01 Jan 2013
TL;DR: Presents a methodology to detect fake profiles on the Twitter social network that are employed for defamatory activities and to associate them with a real profile within the same network, by analysing the content of the comments generated by both profiles.
Abstract: The use of new technologies, along with the popularity of social networks, has given users the power of anonymity. The ability to create an alter ego with no relation to the actual user creates a situation in which no one can certify the match between a profile and a real person. This generates situations, repeated daily, in which users with fake accounts, or at least accounts not related to their real identity, publish news, reviews or multimedia material trying to discredit or attack other people, who may or may not be aware of the attack. These acts can have a great impact on the affected victims’ environment, generating situations in which virtual attacks escalate into fatal consequences in real life. In this paper, we present a methodology to detect fake profiles on the Twitter social network that are employed for defamatory activities and to associate them with a real profile within the same network, by analysing the content of the comments generated by both profiles. Alongside this approach, we also present a successful real-life use case in which this methodology was applied to detect and stop a cyberbullying situation in a real elementary school.
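The core of the approach is comparing the text produced by a suspect profile against that of candidate real profiles. A minimal sketch of that idea, using simple term-frequency vectors and cosine similarity (the function names, data and scoring are illustrative assumptions, not the paper's actual features):

```python
# Hypothetical sketch: score how similar the writing in a suspect (fake)
# profile is to each candidate real profile by comparing the text of
# their comments. Names and data below are illustrative only.
import math
from collections import Counter

def tf_vector(texts):
    """Term-frequency vector over a profile's concatenated comments."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_candidates(fake_comments, candidates):
    """Rank candidate real profiles by textual similarity to the fake one."""
    fake_vec = tf_vector(fake_comments)
    scores = {name: cosine_similarity(fake_vec, tf_vector(texts))
              for name, texts in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The highest-ranked candidate is then a hypothesis for the real person behind the fake account, to be confirmed by further evidence.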

147 citations

Journal ArticleDOI
TL;DR: In this article, the authors present a methodology to detect fake profiles on the Twitter social network that are employed for defamatory activities and to associate them with a real profile within the same network, by analysing the content of the comments generated by both profiles.
Abstract: The use of new technologies, along with the popularity of social networks, has given users the power of anonymity. The ability to create an alter ego with no relation to the actual user creates a situation in which no one can certify the match between a profile and a real person. This generates situations, repeated daily, in which users with fake accounts, or at least accounts not related to their real identity, publish news, reviews or multimedia material trying to discredit or attack other people, who may or may not be aware of the attack. These acts can have a great impact on the affected victims’ environment, generating situations in which virtual attacks escalate into fatal consequences in real life. In this paper, we present a methodology to detect fake profiles on the Twitter social network that are employed for defamatory activities and to associate them with a real profile within the same network, by analysing the content of the comments generated by both profiles. Alongside this approach, we also present a successful real-life use case in which this methodology was applied to detect and stop a cyberbullying situation in a real elementary school.

115 citations

Book ChapterDOI
01 Jan 2014
TL;DR: Proposes a content-based approach to filtering spam tweets, using the text of the tweet together with machine learning and compression algorithms to filter undesired messages.
Abstract: Twitter has become one of the most used social networks. And, as happens with every popular medium, it is prone to misuse. In this context, spam on Twitter has emerged in recent years, becoming an important problem for users. Several approaches have appeared that are able to determine whether a user is a spammer or not. However, these blacklisting systems cannot filter every spam message, and a spammer may simply create another account and resume sending spam. In this paper, we propose a content-based approach to filter spam tweets. We use the text of the tweet together with machine learning and compression algorithms to filter these undesired tweets.
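One classic way compression algorithms are used for text classification, which may be close to what the chapter describes: a tweet is assigned to the class whose training corpus compresses it best, i.e. whose compressor gains the most from shared patterns. A minimal sketch with `zlib` (corpora here are toy examples, not the paper's data):

```python
# Compression-based text classification sketch: label a tweet by which
# class corpus yields the smallest increase in compressed size when the
# tweet is appended to it. Corpora below are illustrative only.
import zlib

def compressed_size(text):
    """Size in bytes of the zlib-compressed text."""
    return len(zlib.compress(text.encode("utf-8")))

def classify(tweet, spam_corpus, ham_corpus):
    """Assign the class whose corpus compresses the tweet best."""
    spam_delta = compressed_size(spam_corpus + " " + tweet) - compressed_size(spam_corpus)
    ham_delta = compressed_size(ham_corpus + " " + tweet) - compressed_size(ham_corpus)
    return "spam" if spam_delta < ham_delta else "ham"
```

The intuition: if the tweet repeats phrases already present in the spam corpus, the compressor encodes it with cheap back-references, so the size increase is small.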

44 citations

Book ChapterDOI
01 Jan 2013
TL;DR: Negobot is a conversational agent that poses as a child in chats, social networks and other channels affected by paedophile behaviour; its most innovative proposal is to treat the conversation itself as a game, applying game theory.
Abstract: Children have increasingly become active users of the Internet and, although any segment of the population is susceptible to falling victim to the existing risks, they in particular are one of the most vulnerable. Thus, some of the major scourges of this cyber-society are paedophile behaviour on the Internet, child pornography and the sexual exploitation of children. Against this background, Negobot is a conversational agent that poses as a child in chats, social networks and other channels affected by paedophile behaviour. As a conversational agent, Negobot has a strong technical base of Natural Language Processing and information retrieval, as well as Artificial Intelligence and Machine Learning. However, the most innovative proposal of Negobot is to consider the conversation itself as a game, applying game theory. In this context, Negobot first proposes a competitive game in which the system identifies the best strategies for achieving its goal: obtaining information that allows us to infer whether the subject conversing with the agent has paedophile tendencies, while ensuring its actions do not lead the alleged offender to abandon the conversation because of suspicious behaviour on the agent's part.
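The game-theoretic idea can be sketched as a per-turn strategy choice: each conversational strategy trades information gain against the risk that the suspect leaves. The model below is a hypothetical illustration of that trade-off, not Negobot's actual payoff functions:

```python
# Hypothetical sketch of Negobot's game-theoretic turn selection: pick
# the strategy that maximises expected information gain, discounted by
# the probability that the suspect abandons the conversation.
# Strategy names, gains and probabilities are illustrative assumptions.
def expected_payoff(info_gain, leave_prob):
    """Payoff = information gained if the suspect stays, zero if they leave."""
    return info_gain * (1.0 - leave_prob)

def choose_strategy(strategies):
    """strategies: {name: (info_gain, leave_prob)} -> name with best payoff."""
    return max(strategies, key=lambda s: expected_payoff(*strategies[s]))
```

Under such a model, an aggressive, direct probe can lose to a moderate one: its higher information gain is outweighed by the risk of scaring the subject off.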

23 citations

Journal ArticleDOI
TL;DR: Collective classification for text classification is an interesting method for optimising the classification of partially-labelled data; this paper proposes, for the first time, collective classification algorithms for spam filtering to cope with the large number of unclassified e-mails sent every day.
Abstract: Spam has become a major issue in computer security because it is a channel for threats such as computer viruses, worms and phishing. Many solutions feature machine-learning algorithms trained using statistical representations of the terms that usually appear in the e-mails. Still, these methods require a training step with labelled data. Dealing with situations where the availability of labelled training instances is limited slows down the progress of filtering systems and offers advantages to spammers. Currently, many approaches direct their efforts towards Semi-Supervised Learning (SSL). SSL is a halfway method between supervised and unsupervised learning which, in addition to unlabelled data, receives some supervision information, such as the association of targets with some of the examples. Collective Classification for Text Classification is an interesting method for optimising the classification of partially-labelled data. In this way, we propose here, for the first time, Collective Classification algorithms for spam filtering to cope with the large number of unclassified e-mails that are sent every day.
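The essence of collective classification is that an instance's label depends on the labels of related instances, so a few seed labels can be propagated through a similarity graph. A toy sketch of that propagation step (the graph construction, features and actual algorithms in the paper are far richer):

```python
# Minimal collective-classification sketch: seed labels on a few e-mails
# are iteratively propagated to unlabelled neighbours in a similarity
# graph. Once a node is labelled, its label is kept (self-training style).
# Graph and labels are toy inputs, not the paper's setup.
def propagate_labels(graph, seed_labels, iterations=10):
    """graph: {node: [neighbours]}, seed_labels: {node: 'spam'|'ham'}.
    Unlabelled nodes take the majority label among labelled neighbours."""
    labels = dict(seed_labels)
    for _ in range(iterations):
        changed = False
        for node, neighbours in graph.items():
            if node in labels:
                continue  # seed and already-inferred labels stay fixed
            votes = [labels[n] for n in neighbours if n in labels]
            if votes:
                labels[node] = max(set(votes), key=votes.count)
                changed = True
        if not changed:
            break  # no unlabelled node gained a label this pass
    return labels
```

This addresses exactly the scenario the abstract describes: a large pool of unclassified e-mails reachable from a small labelled set.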

14 citations


Cited by
01 Jan 2001
TL;DR: The probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the value of the thing expected upon its happening.
Abstract: Problem Given the number of times in which an unknown event has happened and failed: Required the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named. SECTION 1 Definition 1. Several events are inconsistent, when if one of them happens, none of the rest can. 2. Two events are contrary when one, or other of them must; and both together cannot happen. 3. An event is said to fail, when it cannot happen; or, which comes to the same thing, when its contrary has happened. 4. An event is said to be determined when it has either happened or failed. 5. The probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the value of the thing expected upon its happening.
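In modern terms, Bayes' stated problem has a closed-form answer: with a uniform prior on the unknown probability p, observing k successes in n trials gives a Beta(k+1, n−k+1) posterior, and the "chance that the probability lies between two degrees" is the integral of that density over the interval. A small numeric sketch:

```python
# Bayes' original problem: P(lo < p < hi | k successes in n trials),
# assuming a uniform prior on p. The posterior density is
# Beta(k+1, n-k+1): (n+1) * C(n, k) * p^k * (1-p)^(n-k).
# Integrated numerically with the midpoint rule.
from math import comb

def posterior_prob_between(k, n, lo, hi, steps=100_000):
    """Posterior probability that p lies in (lo, hi)."""
    norm = (n + 1) * comb(n, k)  # normalising constant of Beta(k+1, n-k+1)
    dp = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        p = lo + (i + 0.5) * dp  # midpoint of each sub-interval
        total += norm * p**k * (1 - p)**(n - k) * dp
    return total
```

For example, after a single success in a single trial the posterior density is 2p, so the chance that p exceeds 1/2 is 3/4.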

368 citations

Proceedings ArticleDOI
15 Jun 2018
TL;DR: Proposes an incremental and iterative methodology that leverages the power of crowdsourcing to annotate a large collection of tweets with a set of abuse-related labels, and identifies a reduced but robust set of labels to characterize abuse-related tweets.
Abstract: In recent years online social networks have suffered an increase in sexism, racism, and other types of aggressive and cyberbullying behavior, often manifesting itself through offensive, abusive, or hateful language. Past scientific work focused on studying these forms of abusive activity in popular online social networks, such as Facebook and Twitter. Building on such work, we present an eight month study of the various forms of abusive behavior on Twitter, in a holistic fashion. Departing from past work, we examine a wide variety of labeling schemes, which cover different forms of abusive behavior. We propose an incremental and iterative methodology that leverages the power of crowdsourcing to annotate a large collection of tweets with a set of abuse-related labels. By applying our methodology and performing statistical analysis for label merging or elimination, we identify a reduced but robust set of labels to characterize abuse-related tweets. Finally, we offer a characterization of our annotated dataset of 80 thousand tweets, which we make publicly available for further scientific exploration.
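The crowdsourced annotation step the abstract describes typically reduces several workers' labels per tweet to one final label. A hypothetical sketch of such a merging rule, keeping a label only when a clear majority of annotators agrees (the 60% threshold is an illustrative assumption, not the paper's):

```python
# Hypothetical label-merging sketch for crowdsourced annotations: keep
# the majority label when agreement is high enough, otherwise flag the
# tweet (None) for another annotation round. Threshold is illustrative.
from collections import Counter

def merge_annotations(annotations, min_agreement=0.6):
    """annotations: {tweet_id: [worker labels]} -> {tweet_id: label or None}."""
    merged = {}
    for tweet_id, labels in annotations.items():
        label, count = Counter(labels).most_common(1)[0]
        merged[tweet_id] = label if count / len(labels) >= min_agreement else None
    return merged
```

Tweets mapped to `None` are exactly the "hard" cases an incremental methodology would route back to annotators or use to refine the label set.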

351 citations

01 Jan 2017
TL;DR: This work proposes a variety of hate categories and designs and implements two classifiers for the Italian language, based on different learning algorithms: the first based on Support Vector Machines (SVM) and the second on a particular Recurrent Neural Network named Long Short Term Memory (LSTM).
Abstract: While favouring communication and easing information sharing, Social Network Sites are also used to launch harmful campaigns against specific groups and individuals. Cyberbullying, incitement to self-harm practices and sexual predation are just some of the severe effects of massive online offensives. Moreover, attacks can be carried out against groups of victims and can degenerate into physical violence. In this work, we aim at containing and preventing the alarming diffusion of such hate campaigns. Using Facebook as a benchmark, we consider the textual content of comments that appeared on a set of public Italian pages. We first propose a variety of hate categories to distinguish the kind of hate. Crawled comments are then annotated by up to five distinct human annotators, according to the defined taxonomy. Leveraging morpho-syntactic features, sentiment polarity and word-embedding lexicons, we design and implement two classifiers for the Italian language, based on different learning algorithms: the first on Support Vector Machines (SVM) and the second on a particular Recurrent Neural Network named Long Short-Term Memory (LSTM). We test these two learning algorithms in order to verify their classification performance on the task of hate speech recognition. The results show the effectiveness of the two classification approaches tested over the first manually annotated Italian Hate Speech Corpus of social media text.
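The SVM branch of such a pipeline could be set up along these lines with scikit-learn (an assumption for illustration; the paper's actual features also include morpho-syntactic information, sentiment polarity and word embeddings, which are omitted here):

```python
# Sketch of a text classifier in the spirit of the paper's SVM branch:
# TF-IDF word/bigram features feeding a linear SVM. Training data below
# is a toy stand-in, not the annotated Italian corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def train_hate_classifier(comments, labels):
    """Fit a TF-IDF + linear-SVM pipeline on labelled comments."""
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    model.fit(comments, labels)
    return model
```

The LSTM counterpart would replace the TF-IDF features with word-embedding sequences; the paper compares the two on the same annotated corpus.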

286 citations

Journal ArticleDOI
08 Oct 2018-PLOS ONE
TL;DR: Describes the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch, and performs a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection.
Abstract: While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most for the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1 score of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems.
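The F1 scores the paper reports (64% for English, 61% for Dutch) combine precision and recall on the positive, cyberbullying-related class. A small self-contained sketch of that metric, as it would be computed on a hold-out set (labels here are toy values):

```python
# F1 score for a binary cyberbullying detector: harmonic mean of
# precision and recall on the positive class. Toy labels, not the
# paper's data.
def f1_score(y_true, y_pred, positive="bully"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because F1 ignores true negatives, it is a common choice when the positive class (here, bullying posts) is rare relative to ordinary traffic.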

231 citations

Journal ArticleDOI
TL;DR: Three of the main tasks facing this issue concern: (1) the detection of opinion spam in review sites, (2) the detection of fake news and spam in microblogging, and (3) the credibility assessment of online health information.
Abstract: In the Social Web scenario, where large amounts of User Generated Content diffuse through Social Media, the risk of running into misinformation is not negligible. For this reason, assessing and mining the credibility of both sources of information and the information itself constitutes a fundamental issue nowadays. Credibility, also referred to as believability, is a quality perceived by individuals, who are not always able to discern, with their cognitive capacities, genuine information from fake information. For this reason, in recent years several approaches have been proposed to automatically assess credibility in Social Media. Most of them are based on data-driven models, i.e., they employ machine-learning techniques to identify misinformation, but recently model-driven approaches are also emerging, as well as graph-based approaches focusing on credibility propagation. Since multiple social applications have been developed for different aims and in different contexts, several solutions have been considered to address the issue of credibility assessment in Social Media. Three of the main tasks facing this issue and considered in this article concern: (1) the detection of opinion spam in review sites, (2) the detection of fake news and spam in microblogging, and (3) the credibility assessment of online health information. Despite the high number of interesting solutions proposed in the literature to tackle the above three tasks, some issues remain unsolved; they mainly concern both the absence of predefined benchmarks and gold-standard datasets, and the difficulty of collecting and mining large amounts of data, which has not yet received the attention it deserves. For further resources related to this article, please visit the WIREs website.

159 citations