Journal ArticleDOI

Comments Mining With TF-IDF: The Inherent Bias and Its Removal

01 Mar 2019-IEEE Transactions on Knowledge and Data Engineering (IEEE)-Vol. 31, Iss: 3, pp 437-450
TL;DR: This paper reveals the bias introduced by between-participants’ discourse to the study of comments in social media, and proposes an adjustment to tf-idf that accounts for this bias.
Abstract: Text mining has gained great momentum in recent years, with user-generated content becoming widely available. One key use is comment mining, with much attention being given to sentiment analysis and opinion mining. An essential step in the process of comment mining is text pre-processing, a step in which each linguistic term is assigned a weight that commonly increases with its appearance in the studied text, yet is offset by the frequency of the term in the domain of interest. A common practice is to use the well-known tf-idf formula to compute these weights. This paper reveals the bias introduced by between-participants’ discourse to the study of comments in social media, and proposes an adjustment. We find that content extracted from discourse is often highly correlated, resulting in dependency structures between observations in the study, thus introducing a statistical bias. Ignoring this bias can manifest in a non-robust analysis at best and can lead to an entirely wrong conclusion at worst. We propose an adjustment to tf-idf that accounts for this bias. We illustrate the effects of both the bias and the correction with data from seven Facebook fan pages, covering different domains, including news, finance, politics, sport, shopping, and entertainment.
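The weighting described in the abstract (a weight that grows with a term's appearance in a text and is offset by how common the term is across the collection) can be sketched as follows. This is the textbook tf-idf with raw counts and idf = log(N / df), not the adjusted formula the paper proposes, which this page does not give. The function name `tf_idf`, the whitespace tokenization, and the sample comments are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Textbook variant: weight = raw term count * log(N / df), where N is
    the number of documents and df is the number of documents that
    contain the term. This is NOT the bias-adjusted version.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical comments, tokenized on whitespace.
comments = [
    "great goal great match".split(),
    "great price".split(),
    "bad match".split(),
]
weights = tf_idf(comments)
```

In this sketch a term that occurs in every document receives weight zero, which is the offsetting effect the abstract refers to.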
Citations
01 Dec 2013
TL;DR: This article conducted an online experiment to examine the impacts of interactivity in CSR messages on corporate reputation and word-of-mouth intentions and found that an increase in perceived interactivity leads to higher message credibility and stronger feelings of identification with the company, which also boost corporate reputation.
Abstract: Companies increasingly communicate about corporate social responsibility (CSR) through interactive online media. We examine whether using such media is beneficial to a company's reputation. We conducted an online experiment to examine the impacts of interactivity in CSR messages on corporate reputation and word-of-mouth intentions. Our findings suggest that an increase in perceived interactivity leads to higher message credibility and stronger feelings of identification with the company, which also boost corporate reputation and word-of-mouth. This result implies that using interactive channels to communicate about CSR can improve corporate reputation. Our results also show that the detrimental impacts of negative user evaluations on corporate reputation are much higher than the favorable impacts of positive evaluations. This finding suggests that, despite the effectiveness of interactive communication channels, firms need to carefully monitor these channels.

220 citations

Journal ArticleDOI
TL;DR: The present research can help online retailers identify the most helpful reviews and, thus, reduce consumers' search costs as well as assist reviewers in contributing more valuable online reviews.
Abstract: Online review helpfulness has always sparked a heated discussion among academics and practitioners. Despite the fact that research has extensively examined the impacts of review title and content on perceptions of online review helpfulness, the underlying mechanism of how the similarities between a review's title and content may affect review helpfulness has rarely been explored. Based on mere exposure theory, a research model reflecting the influences of title-content similarity and sentiment consistency on review helpfulness was developed and empirically examined by using data collected from 127,547 product reviews on Amazon.com. TF-IDF and cosine similarity were used for measuring the text similarity between review title and review content, and the Tobit model was used for regression analysis. The results showed that the title-content similarity positively affected review helpfulness. In addition, the positive effect of title-content similarity on review helpfulness is increased when the title-content sentiment consistency is high. The title sentiment also negatively moderates the impact of the title-content similarity on review helpfulness. The present research can help online retailers identify the most helpful reviews and, thus, reduce consumers' search costs as well as assist reviewers in contributing more valuable online reviews.
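The title-content similarity measure mentioned in this abstract can be illustrated with a minimal sketch: both texts are turned into tf-idf vectors that share one idf table, and the cosine of the angle between the vectors is taken as the similarity. The abstract does not describe the authors' tokenization or idf variant, so the helper names `cosine` and `title_content_similarity`, the lowercase whitespace tokenizer, and the idf values are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_u * norm_v)


def title_content_similarity(title, content, idf):
    """Weight both texts with the same idf table, then compare them."""
    def vec(text):
        tf = Counter(text.lower().split())
        # Terms missing from the idf table get weight zero (an assumption).
        return {t: n * idf.get(t, 0.0) for t, n in tf.items()}
    return cosine(vec(title), vec(content))


# Hypothetical idf table; in practice it would be estimated from the corpus.
idf = {"battery": 1.2, "life": 0.9, "great": 0.3, "screen": 1.1, "poor": 1.0}
sim = title_content_similarity("great battery life", "battery life is great", idf)
```

Because word order is ignored, a title and a body that use the same weighted terms score close to 1, while texts with no weighted terms in common score 0.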

60 citations

Journal ArticleDOI
TL;DR: An online shopping support model using deep‐learning–based opinion mining and q‐rung orthopair fuzzy interaction weighted Heronian mean (q‐ROFIWHM) operators is proposed to support consumers' purchase decisions.
Abstract: In the process of online shopping, consumers usually compare the review information of the same product in different e‐commerce platforms. The sentiment orientation of online reviews from different platforms interactively influences consumers’ purchase decisions. However, due to the limitation of the ability to process information manually, it is difficult for a consumer to accurately identify the sentiment orientation of all reviews one by one and describe the process of their interactive influence. To this end, we proposed an online shopping support model using deep‐learning–based opinion mining and q‐rung orthopair fuzzy interaction weighted Heronian mean (q‐ROFIWHM) operators. First, in the proposed method, the deep‐learning model is used to automatically extract different product attribute words and opinion words from online reviews, and match the corresponding attribute‐opinion pairs; meanwhile, the sentiment dictionary is used to calculate sentiment orientation, including positive, negative, and neutral sentiments. Second, the proportions of the three kinds of sentiments about each attribute of the same product are calculated. According to the proportion value of attribute sentiment from different platforms, the sentiment information is converted into multiple cross‐decision matrices, which are represented by the q‐rung orthopair fuzzy set. Third, considering the interactive characteristics of the decision matrix, the q‐ROFIWHM operators are proposed to aggregate this cross‐decision information, and then the ranking result is determined by a score function to support consumers' purchase decisions. Finally, an actual example of mobile phone purchase is given to verify the rationality of the proposed method, and the sensitivity and the comparison analysis are used to show its effectiveness and superiority.

59 citations


Additional excerpts

  • ...1 | Attribute‐opinion pairs mining The research on the mining of attribute‐opinion word pairs has attracted wide attention, mainly including the following three aspects: (a) The mining of attribute‐opinion word pairs is regarded as a task of “keyword” extraction, and these keywords are extracted with unsupervised methods, for example, latent Dirichlet allocation (LDA),(12,13) TextRank,(14,15) and term frequency‐ inverse document frequency (TF‐IDF).(16,17) However, those unsupervised methods have their limitations....

    [...]

Journal ArticleDOI
TL;DR: A novel SSC-VIKOR approach is developed to prioritize vehicle brand candidates from a big data analytical viewpoint, contributing to much more precise operations management on marketing strategy, quality improvement and intelligent recommendation.
Abstract: The increasingly booming e-commerce development has stimulated vehicle consumers to express individual reviews through online forums. The purpose of this paper is to probe into vehicle consumer consumption behavior and make recommendations for potential consumers from a textual comments viewpoint. A big data analytic-based approach is designed to discover vehicle consumer consumption behavior from an online perspective. To reduce the subjectivity of expert-based approaches, a parallel Naive Bayes approach is designed to perform the sentiment analysis, and the Saaty scale-based (SSC) scoring rule is employed to obtain the specific sentimental value of each attribute class, contributing to the multi-grade sentiment classification. To achieve intelligent recommendation for potential vehicle customers, a novel SSC-VIKOR approach is developed to prioritize vehicle brand candidates from a big data analytical viewpoint. The big data analytics indicate that the “cost-effectiveness” characteristic is the most important factor that vehicle consumers care about, and the data mining results enable automakers to better understand consumer consumption behavior. The case study illustrates the effectiveness of the integrated method, contributing to much more precise operations management on marketing strategy, quality improvement and intelligent recommendation. Research on consumer consumption behavior is usually based on survey-based methods, and most previous studies of comment analysis focus on binary analysis. The hybrid SSC-VIKOR approach is developed to fill the gap from the big data perspective.

40 citations

References
Journal ArticleDOI
TL;DR: This publication contains reprint articles for which IEEE does not hold copyright and which are likely to be copyrighted.
Abstract: Social network sites (SNSs) are increasingly attracting the attention of academic and industry researchers intrigued by their affordances and reach. This special theme section of the Journal of Computer-Mediated Communication brings together scholarship on these emergent phenomena. In this introductory article, we describe features of SNSs and propose a comprehensive definition. We then present one perspective on the history of such sites, discussing key changes and developments. After briefly summarizing existing scholarship concerning SNSs, we discuss the articles in this special section and conclude with considerations for future research.

14,912 citations


"Comments Mining With TF-IDF: The In..." refers background in this paper

  • ...Social media and in particular social networks (SNS) are today’s major form of communication used on a daily basis [1]....

    [...]

Journal ArticleDOI
TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
Abstract: The experimental evidence accumulated over the past 20 years indicates that text-indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other more elaborate text representations. These results depend crucially on the choice of effective term weighting systems. This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.

9,460 citations


"Comments Mining With TF-IDF: The In..." refers methods in this paper

  • ...Several variations and adjustments were offered, including normalizing tf_{t,d} and optional weighting schemes (such as BM25) by [48], [49], [50], [51]....

    [...]

Book
08 Jul 2008
TL;DR: This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems and focuses on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis.
Abstract: An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

7,452 citations


"Comments Mining With TF-IDF: The In..." refers background in this paper

  • ...A complete survey on different aspects of sentiment analysis is given in [36], [37], [38]....

    [...]

01 Jan 2002
TL;DR: In this paper, the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, was considered and three machine learning methods (Naive Bayes, maximum entropy classification, and support vector machines) were employed.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

6,980 citations

Proceedings ArticleDOI
06 Jul 2002
TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

6,626 citations


"Comments Mining With TF-IDF: The In..." refers methods in this paper

  • ...Accounting for n-grams in tf-idf weights has been addressed by several researchers, such as [60], who show that unigrams better predict class membership than...

    [...]