Journal Article

Box-office forecasting based on sentiments of movie reviews and Independent subspace method

01 Dec 2016 - Information Sciences (Elsevier Science Inc.) - Vol. 372, pp. 608-624
TL;DR: New box-office forecasting models are presented to enhance the forecasting accuracy by utilizing review sentiments and employing non-linear machine learning algorithms, and an independent subspace method (ISM) is applied.
About: This article is published in Information Sciences. The article was published on 2016-12-01. It has received 70 citations to date. The article focuses on the topics: Probabilistic forecasting & Support vector machine.
Citations
Journal Article
TL;DR: Both social media data and stock values are essential to forecast monthly total vehicle sales, and deseasonalizing procedures by the LSSVR models can improve forecasting accuracy in predicting monthly total vehicle sales.
Abstract: Owing to the boom in social media, making comments or expressing opinions about merchandise online has become easier than before. Data from social media might be one of the essential inputs for forecasting sales of vehicles. Besides, some other effects, such as stock market values, influence the purchasing power for vehicles. In this paper, both multivariate regression models with social media data and stock market values and time series models are employed to predict monthly total vehicle sales. The least squares support vector regression (LSSVR) models are used to deal with multivariate regression data. Three types of data, namely sentiment scores of tweets, stock market values, and hybrid data, are employed in this paper to forecast monthly total vehicle sales in the USA. The hybrid data contain both sentiment scores of tweets and stock market values. In addition, seasonal factors of monthly total vehicle sales are employed to deseasonalize both monthly total vehicle sales and the three types of input data. The time series models include the naive model, the exponential smoothing model, the autoregressive integrated moving average model, the seasonal autoregressive integrated moving average model, and backpropagation neural networks and LSSVR with time series models. The numerical results indicate that using hybrid data with deseasonalizing procedures by the LSSVR models can obtain more accurate results than other models with different data. Thus, both social media data and stock values are essential to forecast monthly total vehicle sales, and deseasonalizing procedures can improve forecasting accuracy in predicting monthly total vehicle sales.
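The deseasonalizing step mentioned in the abstract above can be illustrated with a short sketch. The abstract does not say how the seasonal factors were computed, so the ratio-to-overall-mean factors used here are an assumption, and the function names and sales figures are hypothetical.

```python
from statistics import mean


def seasonal_factors(series, period=12):
    """Assumed method: mean of each month divided by the overall mean."""
    overall = mean(series)
    return [mean(series[i::period]) / overall for i in range(period)]


def deseasonalize(series, factors):
    """Divide each observation by the factor of its position in the cycle."""
    period = len(factors)
    return [x / factors[i % period] for i, x in enumerate(series)]


# Made-up monthly sales for two identical years (illustration only).
sales = [100, 90, 110, 120, 130, 140, 150, 145, 125, 115, 105, 160] * 2
factors = seasonal_factors(sales)
adjusted = deseasonalize(sales, factors)
```

Because the two hypothetical years are identical, every adjusted value collapses to the overall mean, which shows that the seasonal pattern has been removed. The same factors would be applied to the input series before fitting a regression model.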

71 citations


Cites methods from "Box-office forecasting based on sen..."

  • ...[12] employed multiple linear regressions, classification and regression trees, artificial neural networks, and support vector regression models to forecast the number of film audiences....

    [...]

Journal Article
TL;DR: The applicability of the developed approach is explored by a case study, in which customer reviews about hotel experiences are evaluated using lexicon-based sentiment analysis and alternative hotels are ranked according to the findings from the sentiment analysis by the Intuitionistic fuzzy (IF)-ELECTRE integrated with VIKOR methodology.

68 citations

Journal Article
TL;DR: To address the shortcomings of limited research in forecasting the power of social media in India, sentiment analysis and prediction algorithms are used to analyze the performance of Indian movies based on data obtained from social media sites.
Abstract: Purpose – The purpose of this paper is to address the shortcomings of limited research in forecasting the power of social media in India. Design/methodology/approach – This paper uses sentiment analysis and prediction algorithms to analyze the performance of Indian movies based on data obtained from social media sites. The authors used Twitter4j Java API for extracting the tweets through authenticating connection with Twitter web sites and stored the extracted data in MySQL database and used the data for sentiment analysis. To perform sentiment analysis of Twitter data, the Probabilistic Latent Semantic Analysis classification model is used to find the sentiment score in the form of positive, negative and neutral. The data mining algorithm Fuzzy Inference System is used to implement sentiment analysis and predict movie performance that is classified into three categories: hit, flop and average. Findings – In this study the authors found results of movie performance at the box office, which had been based ...

52 citations

Journal Article
TL;DR: This paper proposes a novel framework for gauging the ratings of online reviews using machine learning techniques, and focuses on improving the accuracy of neutral sentiment, and concludes by showing how this can be achieved without sacrificing the accuracies of positive or negative ratings.
Abstract: Online reviews are becoming increasingly important for decision-making. Consumers often refer to online reviews for opinions before making a purchase. Marketers also acknowledge the importance of online reviews and use them to improve product success. However, the massive amount of online review data, as well as its unstructured nature, is a challenge for anyone wanting to derive a conclusion quickly. In this paper, we propose a novel framework for gauging the ratings of online reviews using machine learning techniques. This framework uses a combination of text pre-processing and feature extraction methods. Here, we investigate four different aspects of the new framework. First, we assess the performance of single and ensemble classifiers in predicting sentiment—positive or negative—initially on a specific dataset (Yelp), but subsequently also on two other datasets (Amazon's product reviews and a movie review dataset). Second, using the best identified classifiers, we improve the accuracy with which neutral polarity can be predicted, an ability largely overlooked in the literature. Third, we further improve the performance of these classifiers by testing different pre-processing and feature extraction methods. Finally, we measure how well our deep learning approach performs on the same task compared to the best previously identified classifiers. Our extensive testing shows that the linear-kernel support vector machine, logistic regression and multilayer perceptron are the three best single classifiers in terms of accuracy, precision, recall, and F-measure. Their performance could be further improved if they were used as base classifiers for ensemble models. We also observe that several text pre-processing techniques—negation word identification, word elongation correction, and part of speech lemmatisation (combined with Terms Frequency and N-gram words)—can increase accuracy. 
In addition, we demonstrate that the general sentiment of lexicons such as SentiWordNet 3.0 and SenticNet 4 can be used to generate features with good results, although deep learning models can perform equally well. Experiments with different datasets confirm that our framework provides consistent outcomes. In particular, we have focused on improving the accuracy of neutral sentiment, and we conclude by showing how this can be achieved without sacrificing the accuracy of positive or negative ratings.

30 citations

References
Book
Vladimir Vapnik
01 Jan 1995
TL;DR: Covers the setting of the learning problem, consistency of learning processes, bounds on the rate of convergence of learning processes, controlling the generalization ability of learning processes, constructing learning algorithms, and what is important in learning theory.
Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

40,147 citations


"Box-office forecasting based on sen..." refers methods in this paper

  • ...The performance variation from worst to best among the forecasting models when the ISM is applied is about -5% to 15% in the ANN, and about -8% to 13% in the SVR, respectively, which is smaller than the approximately -43% to 12% in the CART....

    [...]

  • ...Model W23 (j = 2, k = 3)
      Category / Variable    MLR  CART  ANN  SVR  Total
      External Factor
        Ratio Scr_k    86.53***  102.02***  80.89***  94.61***  4
        Seasonality_k  5.47***  4.69***  2.86*  6.23***  4
      Audience Factor
        Avg rating_j   4.49***  2.15*  7.31**  6.02***  4
        N rating_j     0.22*  1.38***  2
        Sum PosRev_j   0.24***  7.88***  1.24**  3
        Sum NeuRev_j   0.24*  1
        Sum NegRev_j   1.17***  1.39***  2
        Ratio PosRev_j 0.24***  0.07  2
        Ratio NeuRev_j 0
        Ratio NegRev_j 0.12***  0.96  0.27*  3
      Motion Picture Factor
        KOR            0.09  1.35***  2
        US             0  0.15  0.75***  3
        MPAA           0.56***  0.96*  1.42***  3
        Dir Pop        0.09  0.54  0.64***  3
        Sales Pow      0.01  1
        Dis Pow        1.34***  1
        Act Pop        2.84***  1.00**  2.87  4.48***  4
        N aud_j        388.36***  433.03***  373.41***  397.70***  4
      Total number of important factors  13  8  8  17
    week, which is considered as the most influential variable for the box-office forecasting, is included or not....

    [...]

  • ...Algorithm / Parameters / Settings
      MLR:  -  /  -
      CART: Pruning: Off
            Min. number of observations per tree leaf: 1
            Min. number of observations per tree parent: 10
      ANN:  Number of hidden nodes: 1, 3, 5, 7, 9, 11, 13, 15
            Max. number of epochs: 300
            Target training error: 1.0 × 10^−7
      SVR:  Kernel type: Gaussian (RBF kernel)
            Gamma: 2^{−10, −9, ..., −1, 0}
            nu: 0.1, 0.3, 0.5, 0.7, 0.9
            Cost: 2^{−1, 0, ..., 7, 8}
    instances were randomly sampled....

    [...]

  • ...Model W01 (j = 0, k = 1)
      Category / Variable    MLR  CART  ANN  SVR  Total
      External Factor
        Ratio Scr_k    197.07***  347.81***  273.68***  247.78***  4
        Seasonality_k  27.84***  20.67***  34.05***  39.81***  4
      Audience Factor
        Avg rating_j   6.98***  1.87*  7.98***  3.24***  4
        N rating_j     1.47***  1.23  4.11***  5.83***  4
        Sum PosRev_j   0
        Sum NeuRev_j   0.11  0.97  1.96  0.45  4
        Sum NegRev_j   0.2  1
        Ratio PosRev_j 0.72  1.32  0.17  3
        Ratio NeuRev_j 3.71**  1
        Ratio NegRev_j 7.90***  0.59***  2
      Motion Picture Factor
        KOR            2.05***  3.32**  4.48***  3
        US             3.03***  1.46*  7.06***  4.19***  4
        MPAA           1.37***  1.03  9.22***  3
        Dir Pop        3.70***  4.74***  3.87***  6.17***  4
        Sales Pow      1.77*  1
        Dis Pow        0.88***  0.65*  2.37*  0.35  4
        Act Pop        17.08***  17.33***  21.59***  15.43***  4
        N aud_j        −  −  −  −  −
      Total number of important factors  11  11  15  13
    prior to the release....

    [...]

  • ...The best algorithm from the ANN and SVR varies depending on the forecasting model and evaluation criteria....

    [...]
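
The ANN and SVR settings quoted in the excerpts above can be enumerated as a candidate grid. This is only a sketch: the excerpt lists the candidate values but does not say how they were searched, so the exhaustive Cartesian product is an assumption, and the dictionary keys (`kernel`, `gamma`, `nu`, `C`) are hypothetical names chosen for illustration.

```python
from itertools import product

# Candidate values as listed in the quoted settings.
hidden_nodes = list(range(1, 16, 2))            # 1, 3, ..., 15
gammas = [2.0 ** e for e in range(-10, 1)]      # 2^-10, ..., 2^0
nus = [0.1, 0.3, 0.5, 0.7, 0.9]
costs = [2.0 ** e for e in range(-1, 9)]        # 2^-1, ..., 2^8

# Assumed exhaustive search: every (gamma, nu, cost) combination.
svr_grid = [
    {"kernel": "rbf", "gamma": g, "nu": n, "C": c}
    for g, n, c in product(gammas, nus, costs)
]

print(len(hidden_nodes), len(svr_grid))
```

Under that assumption the SVR grid has 11 × 5 × 10 = 550 candidates, against 8 candidate hidden-layer sizes for the ANN.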

Journal Article
TL;DR: This tutorial gives an overview of the basic ideas underlying Support Vector (SV) machines for function estimation, and includes a summary of currently used algorithms for training SV machines, covering both the quadratic programming part and advanced methods for dealing with large datasets.
Abstract: In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.

10,696 citations

Book
08 Jul 2008
TL;DR: This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems and focuses on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis.
Abstract: An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

7,452 citations


"Box-office forecasting based on sen..." refers methods in this paper

  • ...A sentiment corpus was built by simply looking up a small number of words in a dictionary in the public domain [41] because building up a large number of words for each domain requires significant time and cost [17, 42, 54]....

    [...]

Book
01 Dec 1981
TL;DR: This book covers simple and multiple linear regression, model adequacy checking, variable selection and model building, multicollinearity, robust regression, and the validation of regression models.
Abstract: Preface. Introduction. Simple Linear Regression. Multiple Linear Regression. Model Adequacy Checking. Transformations and Weighting to Correct Model Inadequacies. Diagnostics for Leverage and Influence. Polynomial Regression Models. Indicator Variables. Variable Selection and Model Building. Multicollinearity. Robust Regression. Introduction to Nonlinear Regression. Generalized Linear Models. Other Topics in the Use of Regression Analysis. Validation of Regression Models. Appendix A. Statistical Tables. Appendix B. Data Sets for Exercises. Appendix C. Supplemental Technical Material. References. Index.

5,664 citations

Posted Content
TL;DR: A simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down); a review is classified as recommended if the average semantic orientation of its phrases is positive.
Abstract: This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., "subtle nuances") and a negative semantic orientation when it has bad associations (e.g., "very cavalier"). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor". A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.
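
The semantic-orientation rule described in the abstract above can be sketched in a few lines. The sketch reads "mutual information" as pointwise mutual information estimated from co-occurrence counts, which is an assumption relative to the abstract text; the function names and all counts are hypothetical, and every count must be positive.

```python
import math
from statistics import mean


def pmi(joint, count_a, count_b, total):
    """Pointwise mutual information (base 2) from raw counts."""
    return math.log2((joint * total) / (count_a * count_b))


def semantic_orientation(near_excellent, near_poor,
                         hits_phrase, hits_excellent, hits_poor, total):
    """PMI(phrase, "excellent") minus PMI(phrase, "poor")."""
    return (pmi(near_excellent, hits_phrase, hits_excellent, total)
            - pmi(near_poor, hits_phrase, hits_poor, total))


def classify_review(phrase_scores):
    """Recommended if the average orientation of the phrases is positive."""
    return "recommended" if mean(phrase_scores) > 0 else "not recommended"


# Made-up counts for one phrase (illustration only).
so = semantic_orientation(near_excellent=40, near_poor=10,
                          hits_phrase=200, hits_excellent=1000,
                          hits_poor=500, total=1_000_000)
```

The phrase count and the corpus size cancel in the difference, so the score reduces to log2 of (near_excellent × hits_poor) / (near_poor × hits_excellent).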

4,526 citations


"Box-office forecasting based on sen..." refers background in this paper

  • ...A particular expression can be viewed as positive or negative based on the domain [49, 52]....

    [...]