Author

Qian Xu

Bio: Qian Xu is an academic researcher from Baidu. The author has contributed to research on the topics of Spambot and Spamming. The author has an h-index of 1 and has co-authored 1 publication receiving 87 citations.

Papers
Journal Article
Qian Xu1, Evan Wei Xiang1, Qiang Yang2, Jiachun Du2, Jieping Zhong2 
TL;DR: This service-side solution uses graph data mining to distinguish spammers from nonspammers and detect spam without checking a message's contents.
Abstract: Short Message Service text messages are indispensable, but they face a serious problem from spamming. This service-side solution uses graph data mining to distinguish spammers from nonspammers and detect spam without checking a message's contents.

90 citations
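
The abstract above describes detecting spammers from graph structure alone, without reading message contents. A minimal sketch of that general idea follows. It is not the authors' method: the two features (`out_degree`, `reciprocity`), the thresholds in `flag_suspects`, and the toy log are all hypothetical choices made for illustration.

```python
from collections import defaultdict

def sender_features(messages):
    """Compute per-sender graph features from (sender, recipient) pairs.

    Only who-messaged-whom is used; message text is never inspected.
    """
    recipients = defaultdict(set)  # sender -> set of distinct recipients
    for sender, recipient in messages:
        recipients[sender].add(recipient)

    features = {}
    for sender, outs in recipients.items():
        # A recipient counts as replying if it ever sent a message back.
        replied = sum(1 for r in outs if sender in recipients.get(r, set()))
        features[sender] = {
            "out_degree": len(outs),
            "reciprocity": replied / len(outs),
        }
    return features

def flag_suspects(features, min_out_degree=3, max_reciprocity=0.2):
    """Hypothetical rule: many distinct recipients, almost no replies."""
    return sorted(
        s for s, f in features.items()
        if f["out_degree"] >= min_out_degree
        and f["reciprocity"] <= max_reciprocity
    )

# Toy log: "s" broadcasts to four users; "a", "b" and "c" chat mutually.
log = [("s", "a"), ("s", "b"), ("s", "c"), ("s", "d"),
       ("a", "b"), ("b", "a"), ("a", "c"), ("c", "a")]
suspects = flag_suspects(sender_features(log))
```

In practice such hand-set thresholds would likely be replaced by a classifier trained on labeled senders, with the graph features as input.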


Cited by
Journal Article
TL;DR: Different real-world applications have varying definitions of suspicious behaviors, and detection methods often look for the most suspicious parts of the data by optimizing scores, but quantifying the suspiciousness of a behavioral pattern is still an open issue.
Abstract: Different real-world applications have varying definitions of suspicious behaviors. Detection methods often look for the most suspicious parts of the data by optimizing scores, but quantifying the suspiciousness of a behavioral pattern is still an open issue.

116 citations

Proceedings Article
22 Jul 2012
TL;DR: A Supervised Matrix Factorization method with Social Regularization (SMFSR) for spammer detection in social networks that exploits both social activities as well as users' social relations in an innovative and highly scalable manner is proposed.
Abstract: As the popularity of social media increases, as evidenced by Twitter, Facebook and China's Renren, spamming activities have also grown in number and variety. On social network sites, spammers often disguise themselves by creating fake accounts and hijacking normal users' accounts for personal gain. Unlike spammers in traditional systems such as SMS and email, spammers in social media behave like normal users and continually change their spamming strategies to fool anti-spamming systems. However, due to privacy and resource concerns, many social media websites cannot fully monitor all the content of users, making many previous approaches, such as topology-based and content-classification-based methods, infeasible to use. In this paper, we propose a Supervised Matrix Factorization method with Social Regularization (SMFSR) for spammer detection in social networks that exploits both social activities and users' social relations in an innovative and highly scalable manner. The proposed method detects spammers collectively based on users' social actions and social relations. We have empirically tested our method on data from Renren.com, which is one of the largest social networks in China, and demonstrated that our new method can improve detection performance significantly.

109 citations

Journal Article
TL;DR: Using the dataset from a popular OHC, the research demonstrated that the proposed metric is highly effective in identifying influential users and combining the metric with other traditional measures further improves the identification of influential users.

106 citations

31 Mar 2013
TL;DR: The results indicate that the procedure followed to build the collection does not lead to near-duplicates and, regarding the classifiers, that Support Vector Machines outperform the other evaluated techniques and, hence, can be used as a good baseline for further comparison.
Abstract: The growth in mobile phone users has led to a dramatic increase in SMS spam messages. Recent reports clearly indicate that the volume of mobile phone spam is increasing dramatically year by year. In practice, fighting this plague is made difficult by several factors, including the lower rate of SMS, which has allowed many users and service providers to ignore the issue, and the limited availability of mobile phone spam-filtering software. Probably one of the major concerns in academic settings is the scarcity of public SMS spam datasets, which are sorely needed for validation and comparison of different classifiers. Moreover, traditional content-based filters may have their performance seriously degraded, since SMS messages are fairly short and their text is generally rife with idioms and abbreviations. In this paper, we present details about a new real, public and non-encoded SMS spam collection that is the largest one as far as we know. Moreover, we offer a comprehensive analysis of this dataset in order to ensure that there are no duplicated messages coming from previously existing datasets, since they may ease the task of learning SMS spam classifiers and could compromise the evaluation of methods. Additionally, we compare the performance achieved by several established machine learning techniques. In summary, the results indicate that the procedure followed to build the collection does not lead to near-duplicates and, regarding the classifiers, that Support Vector Machines outperform the other evaluated techniques and, hence, can be used as a good baseline for further comparison.

84 citations
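
The abstract above stresses checking a collection for duplicated or near-duplicated messages before evaluating classifiers. The sketch below shows one simple way such a check could look, using Jaccard similarity over word shingles. This is a swapped-in illustrative technique, not the procedure the authors report; the shingle size `k`, the `threshold`, and the sample messages are assumptions.

```python
import re
from itertools import combinations

def shingles(text, k=2):
    """Return the set of k-word shingles of a lowercased message."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short to shingle: treat the whole message as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicates(messages, threshold=0.6):
    """Return index pairs whose shingle sets overlap at least `threshold`."""
    sets = [shingles(m) for m in messages]
    return [
        (i, j)
        for i, j in combinations(range(len(messages)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    ]

# Made-up messages: the first two differ only in case and punctuation.
corpus = [
    "WIN a free prize now, call 0800 today",
    "Win a FREE prize now call 0800 today!",
    "Are we still meeting for lunch tomorrow?",
]
pairs = near_duplicates(corpus)
```

The pairwise comparison is quadratic in the number of messages, which is acceptable for a sketch but would probably need an approximate method such as MinHash for a large collection.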

Journal Article
TL;DR: The proposed text processing approach is based on lexicographic and semantic dictionaries along with state-of-the-art techniques for semantic analysis and context detection and aims to alleviate factors that can degrade the algorithms' performance, such as redundancies and inconsistencies.
Abstract: The rapid popularization of smartphones has contributed to the growth of online Instant Messaging and SMS usage as an alternative way of communication. The increasing number of users, along with the trust they inherently have in their devices, makes such messages a propitious environment for spammers. In fact, reports clearly indicate that the volume of spam over Instant Messaging and SMS is dramatically increasing year by year. It represents a challenging problem for traditional filtering methods nowadays, since such messages are usually fairly short and normally rife with slang, idioms, symbols and acronyms that make even tokenization a difficult task. In this scenario, this paper proposes and then evaluates a method to normalize and expand original short and messy text messages in order to acquire better attributes and enhance the classification performance. The proposed text processing approach is based on lexicographic and semantic dictionaries along with state-of-the-art techniques for semantic analysis and context detection. This technique is used to normalize terms and create new attributes in order to change and expand original text samples, aiming to alleviate factors that can degrade the algorithms' performance, such as redundancies and inconsistencies. We have evaluated our approach with a public, real and non-encoded dataset along with several established machine learning methods. Our experiments were diligently designed to ensure statistically sound results, which indicate that the proposed text processing techniques can in fact enhance Instant Messaging and SMS spam filtering.

80 citations
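
The abstract above describes normalizing noisy short messages with dictionaries and then expanding them with extra attributes. A toy sketch of that two-step pipeline follows. The `LINGO` and `CONCEPTS` dictionaries and all function names are hypothetical, and the context detection mentioned in the abstract is deliberately omitted, so every lookup here is context-free.

```python
import re

# Hypothetical toy dictionaries; a real system would load far larger ones.
LINGO = {"u": "you", "r": "are", "2": "to", "txt": "text", "pls": "please"}
CONCEPTS = {"free": ["gratis"], "prize": ["award", "reward"]}

def tokenize(text):
    """Lowercase the message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def normalize(tokens):
    """Replace slang and abbreviations with their dictionary form."""
    return [LINGO.get(t, t) for t in tokens]

def expand(tokens):
    """Append related terms so a classifier sees extra attributes."""
    expanded = list(tokens)
    for t in tokens:
        expanded.extend(CONCEPTS.get(t, []))
    return expanded

def preprocess(text):
    """Tokenize, normalize, then expand a raw message."""
    return expand(normalize(tokenize(text)))

sample = preprocess("U won a FREE prize, txt WIN 2 claim pls")
```

Because the lookups ignore context, a token such as "2" is always rewritten to "to", even when it is a genuine number; resolving that kind of ambiguity is what a context-detection step would be for.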