Proceedings Article

Negative Deceptive Opinion Spam

01 Jun 2013, pp. 497-501
TL;DR: This work creates and studies the first dataset of deceptive opinion spam with negative sentiment reviews, and finds that standard n-gram text categorization techniques can detect negative deceptive opinion spam with performance far surpassing that of human judges.
Abstract: The rising influence of user-generated online reviews (Cone, 2011) has led to growing incentive for businesses to solicit and manufacture DECEPTIVE OPINION SPAM—fictitious reviews that have been deliberately written to sound authentic and deceive the reader. Recently, Ott et al. (2011) have introduced an opinion spam dataset containing gold standard deceptive positive hotel reviews. However, the complementary problem of negative deceptive opinion spam, intended to slander competitive offerings, remains largely unstudied. Following an approach similar to Ott et al. (2011), in this work we create and study the first dataset of deceptive opinion spam with negative sentiment reviews. Based on this dataset, we find that standard n-gram text categorization techniques can detect negative deceptive opinion spam with performance far surpassing that of human judges. Finally, in conjunction with the aforementioned positive review dataset, we consider the possible interactions between sentiment and deception, and present initial results that encourage further exploration of this relationship.
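To make "standard n-gram text categorization" concrete, here is a minimal sketch of a unigram-plus-bigram classifier. It is not the authors' pipeline: a citing paper below reports that they used an SVM, and this sketch swaps in a small multinomial Naive Bayes so that it runs with the standard library alone. The class name `NgramNaiveBayes`, the helper `ngrams`, and the four toy reviews are invented for illustration and carry no claim about real deception cues.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return all 1..n_max-gram features of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class NgramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.docs = Counter(labels)   # label -> number of training documents
        self.counts = {}              # label -> Counter of n-gram counts
        for text, label in zip(texts, labels):
            self.counts.setdefault(label, Counter()).update(ngrams(text))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            denom = sum(counter.values()) + len(self.vocab)
            score = math.log(self.docs[label] / total_docs)  # log prior
            for feat in ngrams(text):
                if feat in self.vocab:  # n-grams never seen in training are ignored
                    score += math.log((counter[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this sketch (NOT taken from the dataset).
train_texts = [
    "my husband and I stayed here and the room was terrible terrible",
    "I will never ever stay at this horrible hotel again",
    "the bathroom sink leaked and the elevator was slow",
    "room 412 faced the street and the noise started at 6am",
]
train_labels = ["deceptive", "deceptive", "truthful", "truthful"]

model = NgramNaiveBayes().fit(train_texts, train_labels)
print(model.predict("the elevator was slow and the sink leaked"))
```

With a real corpus, the same n-gram features would typically be fed to a linear classifier and scored with cross-validation; the toy data here only exercises the mechanics.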
Citations
01 Jan 1999
TL;DR: The Longman Student Grammar of Spoken and Written English, as discussed by the authors, is a large-scale grammar of English that aims to show how grammatical choices are used to create discourse in different situations.
Abstract: These tell us what choices are available in the grammar, but we also need to understand how these choices are used to create discourse in different situations. The year 1999 saw the publication of a large-scale grammar of English with the aim of meeting the above needs: the Longman Grammar of Spoken and Written English.

1,038 citations

Journal ArticleDOI
TL;DR: A rigorous survey on sentiment analysis is presented, which portrays views presented by over one hundred articles published in the last decade regarding necessary tasks, approaches, and applications of sentiment analysis.
Abstract: With the advent of Web 2.0, people became more eager to express and share their opinions on the web regarding day-to-day activities and global issues. The evolution of social media has also contributed immensely to these activities, providing a transparent platform to share views across the world. These electronic Word of Mouth (eWOM) statements expressed on the web are prevalent in business and service industries, enabling customers to share their points of view. Over the last decade and a half, research communities, academia, and public and service industries have worked rigorously on sentiment analysis, also known as opinion mining, to extract and analyze public mood and views. In this regard, this paper presents a rigorous survey on sentiment analysis, which portrays views presented by over one hundred articles published in the last decade regarding necessary tasks, approaches, and applications of sentiment analysis. Several sub-tasks need to be performed for sentiment analysis, which in turn can be accomplished using various approaches and techniques. This survey, covering literature published during 2002-2015, is organized on the basis of sub-tasks to be performed, machine learning and natural language processing techniques used, and applications of sentiment analysis. The paper also presents open issues, along with a summary table of a hundred and sixty-one articles.

1,011 citations


Cites background or methods from "Negative Deceptive Opinion Spam"

  • ...Document level 73 [13], [18], [22], [32], [33], [36], [40], [43], [45], [48], [50], [51], [53], [54], [61], [64], [66], [77], [81], [80], [85], [88], [90], [91], [94], [96], [101], [111], [117], [121], [123], [130], [131], [132], [148], [155], [156], [157], [158], [167], [168], [169], [175], [176], [177], [179], [180], [182], [194], [195], [197], [200], [203], [205], [206], [207], [209], [210], [211], [212], [217], [220], [221], [222], [223], [224], [225], [226], [227], [228], [229], [231], [232]...

    [...]

  • ...[200] dataset, which contains 400 deceptive and 400 truthful reviews on each positive and negative category....

    [...]

  • ...Some promising review spam detection methods included duplicate finding methods [234], concept similarity based method [235], content based method [200, 210], and review and reviewer oriented features based method [236] etc....

    [...]

  • ...[200] developed a negative deceptive opinion dataset and performed spam classification using SVM....

    [...]

  • ...S# | Tasks and applications | #Articles | References
      1 | Subjectivity Classification | 6 | [44], [75], [110], [163], [167], [174]
      2 | Polarity determination | 43 | [12], [26], [29], [32], [33], [35], [40], [45], [48], [50], [54], [57], [66], [85], [95], [96], [108], [109], [112], [114], [123], [126], [154], [156], [157], [160], [162], [165], [166], [168], [169], [170], [171], [172], [176], [177], [178], [179], [180], [203], [205], [206], [209]
      3 | Vagueness in opinionated text | 5 | [22], [41], [86], [216], [217]
      4 | Multi- & cross-lingual SA | 6 | [46], [88], [94], [115], [148], [173]
      5 | Cross-domain SA | 4 | [36], [98], [99], [121]
      6 | Review usefulness measurement | 13 | [76], [78], [81], [130], [221], [222], [223], [224], [225], [226], [227], [228], [229]
      7 | Opinion spam detection | 7 | [199], [200], [212], [216], [220], [231], [232]
      8 | Lexica and corpora creation | 22 | [21], [23], [24], [30], [52], [55], [56], [69], [74], [97], [106], [111], [116], [117], [118], [127], [136], [202], [207], [211], [213], [214]
      9 | Opinion word and aspects extraction, entity recognition, name disambiguation | 36 | [8], [11], [25], [27], [35], [37], [59], [60], [61], [62], [63], [67], [68], [92], [93], [100], [101], [102], [107], [125], [132], [175], [182], [185], [186], [189], [190], [191], [193], [194], [195], [196], [218], [240], [241], [243]
      10 | Applications of SA | 21 | [13], [18], [43], [47], [49], [51], [53], [58], [64], [73], [77], [79], [80], [90], [91], [124], [131], [155], [158], [183], [184]
      Total | 163...

    [...]

Journal ArticleDOI
06 Nov 2015
TL;DR: This research surveys the current state-of-the-art technologies that are instrumental in the adoption and development of fake news detection.
Abstract: This research surveys the current state-of-the-art technologies that are instrumental in the adoption and development of fake news detection. "Fake news detection" is defined as the task of categorizing news along a continuum of veracity, with an associated measure of certainty. Veracity is compromised by the occurrence of intentional deceptions. The nature of online news publication has changed, such that traditional fact checking and vetting from potential deception is impossible against the flood arising from content generators, as well as various formats and genres. The paper provides a typology of several varieties of veracity assessment methods emerging from two major categories -- linguistic cue approaches (with machine learning), and network analysis approaches. We see promise in an innovative hybrid approach that combines linguistic cue and machine learning, with network-based behavioral data. Although designing a fake news detector is not a straightforward problem, we propose operational guidelines for a feasible fake news detecting system.

715 citations


Cites background from "Negative Deceptive Opinion Spam"

  • ...The classification of sentiment (Pang & Lee, 2008; Ott et al., 2013) is based on the underlying intuition that deceivers use unintended emotional communication, judgment or evaluation of affective state (Hancock, Woodworth, & Porter, 2011)....

    [...]

  • ...Comparison between human judgement and SVM classifiers showed 86% performance accuracy on negative deceptive opinion spam (Ott et al., 2013)....

    [...]

Journal ArticleDOI
TL;DR: This paper provides a strong and comprehensive comparative study of current research on detecting review spam using various machine learning techniques, and devises a methodology for further investigation.
Abstract: Online reviews are often the primary factor in a customer’s decision to purchase a product or service, and are a valuable source of information that can be used to determine public opinion on these products or services. Because of their impact, manufacturers and retailers are highly concerned with customer feedback and reviews. Reliance on online reviews gives rise to the potential concern that wrongdoers may create false reviews to artificially promote or devalue products and services. This practice is known as Opinion (Review) Spam, where spammers manipulate and poison reviews (i.e., making fake, untruthful, or deceptive reviews) for profit or gain. Since not all online reviews are truthful and trustworthy, it is important to develop techniques for detecting review spam. By extracting meaningful features from the text using Natural Language Processing (NLP), it is possible to conduct review spam detection using various machine learning techniques. Additionally, reviewer information, apart from the text itself, can be used to aid in this process. In this paper, we survey the prominent machine learning techniques that have been proposed to solve the problem of review spam detection and the performance of different approaches for classification and detection of review spam. The majority of current research has focused on supervised learning methods, which require labeled data, a scarcity when it comes to online review spam. Research on methods for Big Data is of interest, since there are millions of online reviews, with many more being generated daily. To date, we have not found any papers that study the effects of Big Data analytics for review spam detection. The primary goal of this paper is to provide a strong and comprehensive comparative study of current research on detecting review spam using various machine learning techniques and to devise methodology for conducting further investigation.

355 citations

Proceedings ArticleDOI
01 Jun 2014
TL;DR: This paper explores generalized approaches for identifying online deceptive opinion spam based on a new gold standard dataset comprising data from three different domains, each of which contains three types of reviews: customer-generated truthful reviews, Turker-generated deceptive reviews, and employee (domain-expert) generated deceptive reviews.
Abstract: Consumers’ purchase decisions are increasingly influenced by user-generated online reviews. Accordingly, there has been growing concern about the potential for posting deceptive opinion spam: fictitious reviews that have been deliberately written to sound authentic, to deceive the reader. In this paper, we explore generalized approaches for identifying online deceptive opinion spam based on a new gold standard dataset, which is comprised of data from three different domains (i.e. Hotel, Restaurant, Doctor), each of which contains three types of reviews, i.e. customer generated truthful reviews, Turker generated deceptive reviews and employee (domain-expert) generated deceptive reviews. Our approach tries to capture the general difference of language usage between deceptive and truthful reviews, which we hope will help customers make purchase decisions and help review portal operators, such as TripAdvisor or Yelp, investigate possible fraudulent activity on their sites.

293 citations


Cites background or methods from "Negative Deceptive Opinion Spam"

  • ...…Turk. A couple of follow-up works have been introduced based on Ott et al.’s dataset, including estimating prevalence of deception in online reviews (Ott et al., 2012), identification of negative deceptive opinion spam (Ott et al., 2013), and identifying manipulated offerings (Li et al., 2013b)....

    [...]

  • ...Ott et al. created a gold-standard collection by employing Turkers to write fake reviews, and follow-up research was based on their data (Ott et al., 2012; Ott et al., 2013; Li et al., 2013b; Feng and Hirst, 2013)....

    [...]

  • ...Identifying positive/negative opinion spam is explored in (Ott et al., 2011; Ott et al., 2013)...

    [...]

References
Journal ArticleDOI
TL;DR: A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented; tests for interobserver bias are given in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.
Abstract: This paper presents a general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies. The procedure essentially involves the construction of functions of the observed proportions which are directed at the extent to which the observers agree among themselves and the construction of test statistics for hypotheses involving these functions. Tests for interobserver bias are presented in terms of first-order marginal homogeneity and measures of interobserver agreement are developed as generalized kappa-type statistics. These procedures are illustrated with a clinical diagnosis example from the epidemiological literature.

64,109 citations
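As a rough illustration of what a kappa-type agreement statistic measures, the sketch below computes Cohen's kappa for two raters, the simplest member of that family. It is not the generalized methodology of the paper above, which covers multivariate categorical data; the function name `cohen_kappa` and the two judges' labels are invented for this example.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical judgments from two human judges on six reviews.
judge_1 = ["deceptive", "truthful", "truthful", "deceptive", "truthful", "truthful"]
judge_2 = ["deceptive", "truthful", "deceptive", "truthful", "truthful", "truthful"]
kappa = cohen_kappa(judge_1, judge_2)
print(round(kappa, 2))  # 0.25
```

Here the judges agree on 4 of 6 items (0.667), but their label frequencies alone would produce 0.556 agreement by chance, so kappa is only 0.25.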

Book
01 Jan 1999
TL;DR: The authors compare the frequency of constructions in different contexts, from conversation to fiction to academic prose, using the 40 million-word Longman Spoken and Written English Corpus.
Abstract:
  • Over 350 tables and graphs show the frequency of constructions in different contexts, from conversation to fiction to academic prose
  • Entirely corpus-based with 6000 authentic examples from the 40 million-word Longman Spoken and Written English Corpus
  • Suggests the reasons why we choose a particular structure in a particular context
  • Compares British and American spoken and written English
Areas covered include basic grammar: description and distribution, key word classes and their phrases and complex structures. Each area is subdivided into more detailed content.

3,876 citations

Journal ArticleDOI
TL;DR: The generalized additive model for location, scale and shape (GAMLSS) as mentioned in this paper is a general class of statistical models for a univariate response variable, which assumes independent observations of the response variable y given the parameters, the explanatory variables and the values of the random effects.
Abstract: Summary. A general class of statistical models for a univariate response variable is presented which we call the generalized additive model for location, scale and shape (GAMLSS). The model assumes independent observations of the response variable y given the parameters, the explanatory variables and the values of the random effects. The distribution for the response variable in the GAMLSS can be selected from a very general family of distributions including highly skew or kurtotic continuous and discrete distributions. The systematic part of the model is expanded to allow modelling not only of the mean (or location) but also of the other parameters of the distribution of y, as parametric and/or additive nonparametric (smooth) functions of explanatory variables and/or random-effects terms. Maximum (penalized) likelihood estimation is used to fit the (non)parametric models. A Newton–Raphson or Fisher scoring algorithm is used to maximize the (penalized) likelihood. The additive terms in the model are fitted by using a backfitting algorithm. Censored data are easily incorporated into the framework. Five data sets from different fields of application are analysed to emphasize the generality of the GAMLSS class of models.

2,386 citations


"Negative Deceptive Opinion Spam" refers methods in this paper

  • ...We use the R package GAMLSS (Rigby and Stasinopoulos, 2005) to fit a log-normal distribution (left truncated at 150 characters) to the lengths of the deceptive reviews....

    [...]
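The excerpt above mentions fitting a log-normal distribution left-truncated at 150 characters. The authors did this with the R package GAMLSS; the sketch below only illustrates the underlying likelihood in plain Python, with a coarse grid search standing in for a real optimizer. The function names, the grid ranges, and the toy review lengths are assumptions made for this example.

```python
import math


def truncated_lognormal_logpdf(x, mu, sigma, lower=150.0):
    """Log-density of a log-normal left-truncated at `lower`."""
    if x < lower:
        return -math.inf
    z = (math.log(x) - mu) / sigma
    log_pdf = -math.log(x * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z
    # Mass kept after truncation: P(X >= lower) = 1 - Phi((ln lower - mu) / sigma).
    tail = 0.5 * math.erfc((math.log(lower) - mu) / (sigma * math.sqrt(2)))
    if tail <= 0.0:
        return -math.inf
    return log_pdf - math.log(tail)


def fit_by_grid(lengths, lower=150.0):
    """Coarse grid-search maximum likelihood for (mu, sigma)."""
    best = (None, None, -math.inf)
    for i in range(40, 81):        # mu from 4.0 to 8.0 in steps of 0.1
        for j in range(1, 31):     # sigma from 0.05 to 1.50 in steps of 0.05
            mu, sigma = i / 10, j / 20
            ll = sum(truncated_lognormal_logpdf(x, mu, sigma, lower) for x in lengths)
            if ll > best[2]:
                best = (mu, sigma, ll)
    return best


# Toy review lengths in characters, invented for this sketch.
toy_lengths = [160, 220, 310, 450, 520, 700, 980]
mu_hat, sigma_hat, log_lik = fit_by_grid(toy_lengths)
print(mu_hat, sigma_hat)
```

Dividing by the retained tail mass is what distinguishes this from an ordinary log-normal fit: without it, the estimate would be biased by the reviews that the 150-character cutoff never lets into the sample.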

Journal ArticleDOI
TL;DR: The study explores the interaction situation, and considers how within deception interactions differences in neuroanatomy and cultural influences combine to produce specific types of body movements and facial expressions which escape efforts to deceive and emerge as leakage or deception clues.
Abstract: Research relevant to psychotherapy regarding facial expression and body movement has shown that the kind of information which can be gleaned from the patient's words (information about affects, attitudes, interpersonal styles, psychodynamics) can also be derived from his concomitant nonverbal behavior. The study explores the interaction situation, and considers how, within deception interactions, differences in neuroanatomy and cultural influences combine to produce specific types of body movements and facial expressions which escape efforts to deceive and emerge as leakage or deception clues.

1,594 citations


"Negative Deceptive Opinion Spam" refers background in this paper

  • ..., 2001), and (3) increased negative emotion terms, often attributed to leakage cues (Ekman and Friesen, 1969), but perhaps better explained in our case as an exaggeration of the underlying review sentiment....

    [...]

Journal ArticleDOI
TL;DR: It is proposed that people judge others' deceptions more harshly than their own and that this double standard in evaluating deceit can explain much of the accumulated literature.
Abstract: We analyze the accuracy of deception judgments, synthesizing research results from 206 documents and 24,483 judges. In relevant studies, people attempt to discriminate lies from truths in real time with no special aids or training. In these circumstances, people achieve an average of 54% correct lie-truth judgments, correctly classifying 47% of lies as deceptive and 61% of truths as nondeceptive. Relative to cross-judge differences in accuracy, mean lie-truth discrimination abilities are nontrivial, with a mean accuracy d of roughly .40. This produces an effect that is at roughly the 60th percentile in size, relative to others that have been meta-analyzed by social psychologists. Alternative indexes of lie-truth discrimination accuracy correlate highly with percentage correct, and rates of lie detection vary little from study to study. Our meta-analyses reveal that people are more accurate in judging audible than visible lies, that people appear deceptive when motivated to be believed, and that individuals regard their interaction partners as honest. We propose that people judge others' deceptions more harshly than their own and that this double standard in evaluating deceit can explain much of the accumulated literature.

1,493 citations


"Negative Deceptive Opinion Spam" refers background or result in this paper

  • ...To validate the credibility of our deceptive reviews, we show that human deception detection performance on the negative reviews is low, in agreement with decades of traditional deception detection research (Bond and DePaulo, 2006)....

    [...]

  • ...Recent large-scale meta-analyses have shown human deception detection performance is low, with accuracies often not much better than chance (Bond and DePaulo, 2006)....

    [...]