The power of prediction with social media

TL;DR: It is argued that statistical models seem to be the most fruitful approach to apply to make predictions from social media data in the field of social media-based prediction and forecasting.
Abstract: Social media provide an impressive amount of data about users and their interactions, thereby offering computer and social scientists, economists, and statisticians, among others, new opportunities for research. Arguably, one of the most interesting lines of work is that of predicting future events and developments from social media data. However, current work is fragmented and lacks widely accepted evaluation approaches. Moreover, since the first techniques emerged rather recently, little is known about their overall potential, limitations and general applicability to different domains. Therefore, better understanding the predictive power and limitations of social media is of utmost importance. Different types of forecasting models and their adaptation to the special circumstances of social media are analyzed, and the most representative research conducted to date is surveyed. Presentations of current research on techniques, methods, and empirical studies aimed at the prediction of future or current events from social media data are provided. A taxonomy of prediction models is introduced, along with their relative advantages and the particular scenarios in which they have been applied. The main areas of prediction that have attracted research so far are described, and the main contributions made by the papers in this special issue are summarized. Finally, it is argued that statistical models seem to be the most fruitful approach to apply to make predictions from social media data. This special issue raises important questions to be addressed in the field of social media-based prediction and forecasting, fills some gaps in current research, and outlines future lines of work.

The Power of Prediction with Social Media
Harald Schoen, University of Bamberg (Germany), harald.schoen@uni-bamberg.de
Daniel Gayo-Avello, University of Oviedo (Spain), dani@uniovi.es
Panagiotis Takis Metaxas, Wellesley College and Harvard University (USA), pmetaxas@seas.harvard.edu
Eni Mustafaraj, Wellesley College (USA), eni.mustafaraj@wellesley.edu
Markus Strohmaier, Graz University of Technology (Austria), markus.strohmaier@tugraz.at
Peter Gloor, MIT (USA), pgloor@mit.edu
Abstract
Social media today provide an impressive amount of data about users and their societal
interactions, thereby offering computer and social scientists, economists, and statisticians,
among others, many new opportunities for research exploration. Arguably, one of the most
interesting lines of work is that of predicting future events and developments based on social
media data, as we have recently seen in the areas of politics, finance, entertainment, market
demands, health, etc. In fact, an average of one in seven research papers presented at the WWW,
ICWSM and IEEE SocialCom Conferences between 2007 and 2012 contain the term “predict” in
their title. This upward trend, starting from 0 in 2006 and reaching 18% in 2012, shows a
significant interest of the research community in predicting with Social Media.
But what can be successfully predicted and why? Since the first algorithms and techniques
emerged rather recently, little is known about their overall potential, limitations and general
applicability to different domains.
Better understanding the predictive power and limitations of social media is therefore of utmost
importance, in order to be successful and avoid false expectations, misinformation or unintended
consequences. Today, current methods and techniques are far from being well understood, and it
is mostly unclear to what extent or under what conditions the different methods for prediction
can be applied to social media. While there exists a respectable and growing amount of literature
in this area, current work is fragmented, characterized by a lack of commonly accepted
evaluation approaches. Yet, this research seems to have reached a sufficient level of interest and
relevance to justify a dedicated section.
This special section aims to shape a frame of important questions to be addressed in this field,
and fill the gaps in current research with presentations of early research on algorithms,
techniques, methods and empirical studies aimed at the prediction of future or current events
based on user-generated content in social media.

Introduction
Human beings are fascinated with what will happen in the future and, indeed, we even associate
intelligence with an ability to predict future events (Hawkins 2004). In ancient times, several
techniques were invented including inspecting bird flights, haruspicy, and astrology. Later,
predictions were done mostly through experts who had developed their own intuitions and
methods of prediction. Unfortunately, such expert knowledge is idiosyncratic and cannot be
automatized or even duplicated. In more recent times, the research community has developed
much more sophisticated techniques that aim to predict future outcomes using data-based models.
Such model-based forecasts have proved to be quite successful in predicting a diversity of
outcomes including economic, societal, and political outcomes (e.g., Campbell 2008; Clements
and Hendry, 2011; Silver 2012). Despite their general success, even these models cannot predict
the future perfectly, because real-world outcomes can change in ways that are not anticipated by
data-based models.
The advent of social media provides researchers with a new and rich source of easily accessible
data about individuals, society and, potentially, the world in general. In particular, data from
social media captures online behavior of users who communicate or interact on a diversity of
issues and topics. It is the intent of this special section to focus on novel methods of prediction
that are based on data harvested from social media. In recent years, such data has proven to be
very popular with scholars interested in developing predictive models. With varying success, an
emerging community of researchers has utilized social media data for a wide variety of purposes,
for example, to predict stock market movements (e.g., Bollen et al., 2010), to predict
announcements of flu outbreaks (Lampos et al., 2010), to forecast box-office revenues for
movies (Asur and Huberman, 2010) and even to predict election outcomes (Gayo-Avello, 2012),
to name a few. The models and areas of application are diverse and, moreover, predictions based
on social media data have also attracted considerable attention from the public through
traditional and online media. These media are projecting an impression of social media as a
widely accepted and reliable source of data for predicting future outcomes.
However, reality is more complicated than that. There are many theoretical and methodological
issues in predicting future outcomes using social media data that are far from being settled, and
deeper studies and experiments are required to discover the true potential of social media as a
reliable source of data. While prediction represents a problem in a wide variety of scholarly
fields, social media-based forecasts today receive significant attention. We thus consider it
appropriate to discuss and reflect on the promises of social media-based forecasts as well as the
perils and pitfalls they are plagued with, and strategies to address these problems.
This special section aims to serve as a platform for these and related matters. Its intended
audience is computer scientists, social scientists, economists, statisticians, and other researchers
interested in the application of multidisciplinary approaches to exploit user-generated content to
better understand (and predict) societal behaviors. This issue includes three works approaching
such topics from different points of view: the credibility of data appearing in social media; the
detection of unexpected phenomena as deviations of the “pulse” of social media; and a
conceptual framework to survey the current body of research.
This guest editorial is organized as follows: First, we discuss the different approaches to build
forecasting models. Then, we analyze how such models could be adapted to the special
circumstances of social media and the caveats that apply (e.g., the pervasive need for machine
learning methods to ensure the quality of data). Additionally, we discuss one peculiar
idiosyncrasy of social media-based forecasting: the fact that sometimes it is not forecasting but
nowcasting; i.e. the variable of interest is estimated in real time using online trails as proxies.
After that, we briefly survey the most representative research conducted on the topic to date,
and introduce the papers published in this special section. The editorial concludes with some
final remarks.
Different Types of Forecasting Models
A fundamental question we need to address in order to tackle the previous issues is: What
enables prediction based on social media data? A first requirement is that the prediction itself
somehow must be encoded within the data; without any signal w.r.t. the phenomenon of interest
the data would be rendered useless. Second, the data collection needs to maintain the encoding of
the answer. Third, the analysis performed on the collected data must be able to reveal the prediction.
Without all three of these fundamental requirements, predictions are either not possible or no
better than pure chance.
It is fundamental, therefore, to examine the ways the research community conducts both the
process of collecting data and its analysis. We observe that there are three prevailing practices:
Data could be collected through past logs of experiences, and statistical models are often
employed to make sense of them. Data could also be collected on demand. A traditional and
direct way to do that is by using polling, asking the public directly for their opinion or behavioral
intentions, as is done with survey models. However, social media provides an additional and
indirect¹ way to collect data on demand. Researchers can unobtrusively approach social media to
observe the public’s behavior and then derive their intention or opinion from the observed
behavior (e.g., using machine-learning techniques, cf. Bishop, 2006). When the interest lies in
the users’ opinion about the outcome of an event, rather than their intention with regard to it, the
method is somewhat comparable to prediction market models.
¹ Maybe it would be more appropriate to denote this kind of collection as unobtrusive or
non-reactive; see, for instance, Janetzko (2008) for a broad introduction to this approach to
measurement in the social sciences.
In the following subsections we will discuss these different types of forecasting models we have
seen in various fields. Without loss of generality, we use electoral predictions as a running
example for the discussion.
Prediction Market Models
The prediction market model attempts to capitalize on the so-called “wisdom of crowds”
approach (Surowiecki 2004). A large number of people give their best guesses for an outcome
variable. In this respect, this approach is based on subjective evidence. Then, the individual
guesses are aggregated in some way and the aggregate guess, according to this line of reasoning,
will closely approximate the real outcome. This approach underlies a host of prediction
markets (e.g., Arrow et al., 2008; Rhode and Strumpf, 2004; Forsythe et al., 1992). Participants
deal with assets that are linked to the quantity of interest, i.e. the occurrence of an outcome or a
parameter, such as a party’s vote share. Market prices are thus interpreted as predictions of the
occurrence probability or another parameter of interest.
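The aggregation step described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited studies: the function names, the choice of mean and median as aggregation rules, the example guesses, and the contract price are all hypothetical and serve only to make the idea concrete.

```python
from statistics import mean, median


def aggregate_guesses(guesses):
    """Combine individual guesses into crowd-level estimates."""
    if not guesses:
        raise ValueError("at least one guess is required")
    return {"mean": mean(guesses), "median": median(guesses)}


def implied_probability(price, payout=1.0):
    """Read the price of a contract that pays `payout` if the event occurs
    as the market's implied probability of that event."""
    if not 0 <= price <= payout:
        raise ValueError("price must lie between 0 and the payout")
    return price / payout


# Hypothetical guesses of a party's vote share, in percent.
guesses = [38.0, 42.5, 40.0, 45.0, 39.5, 41.0]
crowd = aggregate_guesses(guesses)
print(crowd)

# Hypothetical contract trading at 0.62 with a payout of 1.00.
print(implied_probability(0.62))
```

Real prediction markets aggregate through trading rather than through a simple average, so the mean and median here are only stand-ins for "aggregated in some way".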
Prediction markets have been shown to be quite successful in predicting several outcomes (cf.
Wolfers and Zitzewitz, 2004). At the same time, it has been pointed out that successful market-
based predictions require certain preconditions to be met, including a sound market architecture
guaranteeing the heterogeneity of participants (Surowiecki 2004). Moreover, some critics object
that electoral markets simply mirror information available from other sources, i.e. election polls,
and add no new information (Erikson and Wlezien, 2008a).
Survey Models
We refer to the second kind of forecast models as “survey models” because their approach is
typical of election surveys that are sometimes used to predict election outcomes. In this model,
an appropriate random sample from the people who might affect future outcomes is required.
Then, the people included in this random sample are questioned about the ways in which they
intend to act (for example, vote in an election or purchase consumer goods). Then, the
distribution of behavioral intentions is interpreted as a forecast of the future outcome.
This procedure assumes that the sample is not biased and respondents’ future behavior does not
differ systematically from their stated intention (e.g., Perry, 1979; Rattinger and Ohr, 1993). The
usefulness of this model thus critically hinges upon the quality of the sample, the right questions
to be asked in the survey, and the interval between the interview and the future outcome. Quite
obviously, undecided respondents are a source of potential obstacles in the analysis.
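As a rough illustration of the survey model, the following Python sketch turns a list of stated intentions into forecast shares with an approximate margin of error. It assumes a simple random sample and the normal approximation for a proportion, and it simply drops undecided respondents, which is only one of several possible (and debatable) treatments. The function name and the response counts are hypothetical.

```python
import math
from collections import Counter


def survey_forecast(responses, z=1.96):
    """Return {option: (share, margin_of_error)} among decided respondents.

    Assumes a simple random sample; z=1.96 corresponds to an
    approximate 95% confidence level.
    """
    decided = [r for r in responses if r != "undecided"]
    n = len(decided)
    if n == 0:
        raise ValueError("no decided respondents in the sample")
    counts = Counter(decided)
    forecast = {}
    for option, count in counts.items():
        share = count / n
        margin = z * math.sqrt(share * (1 - share) / n)
        forecast[option] = (share, margin)
    return forecast


# Hypothetical sample of 1,000 respondents.
responses = ["A"] * 480 + ["B"] * 420 + ["undecided"] * 100
forecast = survey_forecast(responses)
for option, (share, margin) in sorted(forecast.items()):
    print(f"{option}: {share:.1%} +/- {margin:.1%}")
```

The margin of error only captures sampling variability; it says nothing about a biased sample or about respondents who later act against their stated intention.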
Statistical Models

The third method builds on statistical models of the outcome of interest². Using some kind of
time-series analysis, univariate models aim at detecting past regularities in the outcome variable
that are then used to predict its future development. Multivariate models capture the relationship
between the outcome variable and several predictor variables. Whereas data-driven models
simply aim at detecting empirical relationships, theory-driven models identify predictors that can
be linked to the outcome in theoretically meaningful ways. In this vein, the vote share of an
incumbent party might be modeled as a function of the state of the economy several months
before an election, the results of trial heat polls, and the length of incumbency (for a variety of
electoral forecast models see, e.g., Campbell 2008; Erikson and Wlezien, 2008; Hibbs, 2008;
Holbrook, 2008). Having established a robust empirical model, predicting a future outcome
requires filling in relevant information on predictor variables and then calculating the dependent
variable (for a discussion of statistical issues in predictions see, e.g., Brandt et al., 2011;
Montgomery et al., 2012).
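The two steps just described, establishing an empirical model and then filling in predictor values, can be sketched in Python as follows. This is an illustrative ordinary least squares fit, not a reproduction of any cited forecasting model. The data are synthetic and noise-free, generated from a made-up rule (30 + 1.5 × growth + 0.4 × poll − 2.0 × terms), and all names and numbers are assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Xd = [[1.0] + list(row) for row in X]
    k = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(coefs, x):
    """Fill in predictor values and compute the dependent variable."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


# Synthetic past elections: (economic growth in %, trial-heat poll in %,
# number of terms in office). The values are invented for illustration.
X = [(2.0, 48, 1), (-1.0, 46, 2), (3.0, 55, 1),
     (0.5, 45, 3), (1.0, 51, 2), (-0.5, 43, 1)]
# Incumbent vote share generated from a made-up linear rule, without noise.
y = [30 + 1.5 * g + 0.4 * p - 2.0 * t for g, p, t in X]

coefs = fit_ols(X, y)
upcoming = (1.5, 50, 2)  # hypothetical predictor values for the next election
print([round(c, 3) for c in coefs])
print(round(predict(coefs, upcoming), 2))
```

Because the synthetic data contain no noise, the fit recovers the generating rule almost exactly; with real data the coefficients would be estimates with uncertainty attached.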
The success of statistical predictive models hinges upon the robustness of the empirical
relationships; in particular, the patterns detected in the past are assumed to hold in the future. In
the absence of a structural break, such predictions are likely to prove valuable. Yet, the absence
of structural breaks cannot be taken for granted. For example, an external shock might alter the
relationship between gross income and gross demand, or a new party might change the logic of
party competition and vote choice. Put differently, the success of statistical models crucially
hinges upon the assumption that the future closely resembles the past.
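The effect of a structural break can be made concrete with a small Python sketch. It fits a line to synthetic pre-shock data on income and demand and then compares the forecast with what would be observed if the relationship stayed the same or changed. The linear rules (slope 0.8 before the shock, 0.5 after) and all numbers are invented for illustration.

```python
def fit_line(xs, ys):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic past observations: demand = 10 + 0.8 * income (made-up rule).
incomes_past = [10, 20, 30, 40, 50]
demand_past = [10 + 0.8 * x for x in incomes_past]
intercept, slope = fit_line(incomes_past, demand_past)

income_future = 60
forecast = intercept + slope * income_future

# If the past relationship still holds, the forecast is accurate.
actual_no_break = 10 + 0.8 * income_future
# After a hypothetical shock the slope drops to 0.5 and the forecast fails.
actual_with_break = 10 + 0.5 * income_future

print(f"forecast: {forecast:.1f}")
print(f"error without break: {abs(forecast - actual_no_break):.1f}")
print(f"error with break: {abs(forecast - actual_with_break):.1f}")
```

The model itself is unchanged in both cases; only the world it is applied to differs, which is why the absence of a break cannot be verified from past data alone.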
Forecasting Models with Social Media
In principle, the identified types of forecasting models can be adapted to and/or used in the
context of social media. Yet, it remains unclear what type of model best suits the characteristics
and the fabric of social media data.
Social media allow users to interact, to share content, and to create content collectively (O’Reilly,
2005; Shirky, 2008; Gauntlett et al., 2011). Social media comprise, inter alia, weblogs, social
networking sites, and platforms for music, video, and photo sharing. Every move users make on
social media is documented in machine-readable formats. When analyzing these data, their
origin must be taken into account. In particular, Internet users and, even more so, users of social
media have voluntarily decided to use these applications and thus differ from the population at
large in terms of demographic characteristics, socio-economic variables, and socio-political
attitudes (e.g., Hargittai and Hinnant, 2008).
² This approach, also labeled “econometric models”, is commonplace, for instance, in
economics (e.g., Clements and Hendry, 2011; Hendry and Ericsson, 2001; Granger et al., 2006)
or electoral forecasting (e.g., Lewis-Beck and Rice, 1992; Campbell and Garand, 2000).

References
Book
Christopher M. Bishop1
17 Aug 2006
TL;DR: Probability Distributions, linear models for Regression, Linear Models for Classification, Neural Networks, Graphical Models, Mixture Models and EM, Sampling Methods, Continuous Latent Variables, Sequential Data are studied.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

22,840 citations

Journal ArticleDOI
TL;DR: This book covers a broad range of topics for regular factorial designs and presents all of the material in very mathematical fashion and will surely become an invaluable resource for researchers and graduate students doing research in the design of factorial experiments.
Abstract: (2007). Pattern Recognition and Machine Learning. Technometrics: Vol. 49, No. 3, pp. 366-366.

18,802 citations

Posted Content
TL;DR: This paper was the first initiative to try to define Web 2.0 and understand its implications for the next generation of software, looking at both design patterns and business modes.
Abstract: This paper was the first initiative to try to define Web2.0 and understand its implications for the next generation of software, looking at both design patterns and business modes. Web 2.0 is the network as platform, spanning all connected devices; Web 2.0 applications are those that make the most of the intrinsic advantages of that platform: delivering software as a continually-updated service that gets better the more people use it, consuming and remixing data from multiple sources, including individual users, while providing their own data and services in a form that allows remixing by others, creating network effects through an "architecture of participation," and going beyond the page metaphor of Web 1.0 to deliver rich user experiences.

7,513 citations

Book
08 Jul 2008
TL;DR: This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems and focuses on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis.
Abstract: An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

7,452 citations

Book
01 May 2012
TL;DR: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining.
Abstract: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining. In fact, this research has spread outside of computer science to the management sciences and social sciences due to its importance to business and society as a whole. The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, blogs, micro-blogs, Twitter, and social networks. For the first time in human history, we now have a huge volume of opinionated data recorded in digital form for analysis. Sentiment analysis systems are being applied in almost every business and social domain because opinions are central to almost all human activities and are key influencers of our behaviors. Our beliefs and perceptions of reality, and the choices we make, are largely conditioned on how others see and evaluate the world. For this reason, when we need to make a decision we often seek out the opinions of others. This is true not only for individuals but also for organizations. This book is a comprehensive introductory and survey text. It covers all important topics and the latest developments in the field with over 400 references. It is suitable for students, researchers and practitioners who are interested in social media analysis in general and sentiment analysis in particular. Lecturers can readily use it in class for courses on natural language processing, social media analysis, text mining, and data mining. Lecture slides are also available online.

4,515 citations

Frequently Asked Questions (8)
Q1. What contributions have the authors mentioned in the paper "The power of prediction with social media" ?

This upward trend, starting from 0 in 2006 and reaching 18% in 2012, shows a significant interest of the research community in predicting with social media. While there exists a respectable and growing amount of literature in this area, current work is fragmented and characterized by a lack of commonly accepted evaluation approaches. Yet, this research seems to have reached a sufficient level of interest and relevance to justify a dedicated section. This special section aims to shape a frame of important questions to be addressed in this field, and to fill the gaps in current research with presentations of early research on algorithms, techniques, methods and empirical studies aimed at the prediction of future or current events based on user-generated content in social media. Since the first algorithms and techniques emerged rather recently, little is known about their overall potential, limitations and general applicability to different domains.

If there is anything that human experience has taught us, it is that predicting the future is both highly desirable and extremely difficult. While the authors do not have reasons to doubt it, they are consciously cautious about the validity of their own arguments regarding the future of forecasting using social media data. In the future, one might identify the existence of a new, fourth prediction model that is made possible by the idiosyncrasies of social media.

When publishing their results, it is of utmost importance to report decisions concerning market design issues, including resistance to tampering, as they might influence prediction outcomes. 

The main challenges of the proposed methods lie in finding an automatic way to determine the best keywords and their associated weights to fit user-generated data to the ground truth available for training. 
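The weight-fitting problem described above can be illustrated with a minimal sketch. It assumes a plain linear model fitted by gradient descent on mean squared error, which is only one possible choice and is not taken from the papers in the special section. The function name `fit_keyword_weights`, the keyword counts, and the ground-truth values are all hypothetical and invented for illustration.

```python
def fit_keyword_weights(counts, truth, lr=0.05, epochs=2000):
    """Fit one weight per keyword by gradient descent on mean squared error.

    counts: list of rows, one per time period, each holding keyword frequencies.
    truth:  list of ground-truth values for the same periods.
    """
    n_periods = len(counts)
    n_keywords = len(counts[0])
    weights = [0.0] * n_keywords
    for _ in range(epochs):
        grads = [0.0] * n_keywords
        for row, target in zip(counts, truth):
            # Prediction error of the current weighted keyword sum.
            error = sum(w * x for w, x in zip(weights, row)) - target
            for j, x in enumerate(row):
                grads[j] += 2 * error * x / n_periods
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Hypothetical data: frequencies of two keywords over five periods.
counts = [[1, 0], [0, 1], [2, 1], [1, 3], [3, 2]]
# Illustrative ground truth, constructed here as 2.0 * kw1 + 0.5 * kw2.
truth = [2.0 * a + 0.5 * b for a, b in counts]

weights = fit_keyword_weights(counts, truth)
```

Because the toy ground truth is built as an exact linear combination, the fitted weights approach 2.0 and 0.5. Real social media data would be noisy, and choosing which keywords to include at all remains the harder, open part of the problem.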

This is of interest because sentiment analysis has become inextricably associated with social media-based prediction, although up to now it has been applied in the form of very simple methods.

Regardless of the model of choice, social media poses problems regarding the quality and credibility of the collected data, and those problems must be addressed with techniques which are independent of the predictive models.

Electoral forecasting from social media is a field where further research is needed, and where the main focus should be on providing general models that can be applied not to a single election but to multiple elections.

Probably reflecting its financial importance, research on predicting the stock market from Web data largely predates the existence of user-generated content.