scispace - formally typeset
Journal ArticleDOI

Forecasting with twitter data

Reads0
Chats0
TLDR
Whether a public sentiment indicator extracted from daily Twitter messages can indeed improve the forecasting of social, economic, or commercial indicators is assessed and nonlinear models do take advantage of Twitter data when forecasting trends in volatility indices, while linear ones fail systematically when forecasting any kind of financial time series.
Abstract
The dramatic rise in the use of social network platforms such as Facebook or Twitter has resulted in the availability of vast and growing user-contributed repositories of data. Exploiting this data by extracting useful information from it has become a great challenge in data mining and knowledge discovery. A recently popular way of extracting useful information from social network platforms is to build indicators, often in the form of a time series, of general public mood by means of sentiment analysis. Such indicators have been shown to correlate with a diverse variety of phenomena. In this article we follow this line of work and set out to assess, in a rigorous manner, whether a public sentiment indicator extracted from daily Twitter messages can indeed improve the forecasting of social, economic, or commercial indicators. To this end we have collected and processed a large amount of Twitter posts from March 2011 to the present date for two very different domains: stock market and movie box office revenue. For each of these domains, we build and evaluate forecasting models for several target time series both using and ignoring the Twitter-related data. If Twitter does help, then this should be reflected in the fact that the predictions of models that use Twitter-related data are better than the models that do not use this data. By systematically varying the models that we use and their parameters, together with other tuning factors such as lag or the way in which we build our Twitter sentiment index, we obtain a large dataset that allows us to test our hypothesis under different experimental conditions. Using a novel decision-tree-based technique that we call summary tree we are able to mine this large dataset and obtain automatically those configurations that lead to an improvement in the prediction power of our forecasting models. As a general result, we have seen that nonlinear models do take advantage of Twitter data when forecasting trends in volatility indices, while linear ones fail systematically when forecasting any kind of financial time series. In the case of predicting box office revenue trend, it is support vector machines that make best use of Twitter data. In addition, we conduct statistical tests to determine the relation between our Twitter time series and the different target time series.

read more

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI

Machine learning

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Journal ArticleDOI

Insights from hashtag #supplychain and Twitter Analytics: Considering Twitter and Twitter data for supply chain practice and research

TL;DR: In this article, the authors proposed a novel analytical framework (Twitter Analytics) for analyzing supply chain tweets, highlighting the current use of Twitter in supply chain contexts, and further developing insights into the potential role of Twitter for supply chain practice and research.
Journal ArticleDOI

Using Data Sciences in Digital Marketing: Framework, methods, and performance metrics

TL;DR: A comprehensive literature review of major scientific contributions made so far in this research area is undertaken and a holistic overview of the main applications of Data Sciences to digital marketing is presented to generate insights related to the creation of innovative Data Mining and knowledge discovery techniques.
Journal ArticleDOI

Social media data analytics to improve supply chain management in food industries

TL;DR: A big-data analytics-based approach that considers social media (Twitter) data for the identification of supply chain management issues in food industries includes text analysis using a support vector machine (SVM) and hierarchical clustering with multiscale bootstrap resampling.
Journal ArticleDOI

Search engine marketing is not all gold

TL;DR: This study highlights how SEM often not only fails to provide benefits but also destructs value if not done properly, with potential fallouts which affect the long-term benefits.
References
More filters
Journal ArticleDOI

A Coefficient of agreement for nominal Scales

TL;DR: In this article, the authors present a procedure for having two or more judges independently categorize a sample of units and determine the degree, significance, and significance of the units. But they do not discuss the extent to which these judgments are reproducible, i.e., reliable.
Journal ArticleDOI

Latent dirichlet allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hof-mann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal ArticleDOI

Machine learning

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Related Papers (5)