Journal ArticleDOI

Automated news reading: Stock price prediction based on financial news using context-capturing features

01 Jun 2013-Vol. 55, Iss: 3, pp 685-697
TL;DR: It is shown that robust feature selection, when combined with complex feature types, lifts classification accuracies significantly above previous approaches and reduces the problem of over-fitting when applying a machine learning approach.
Abstract: We examine whether stock price prediction based on textual information in financial news can be improved, given that previous approaches only yield prediction accuracies close to guessing probability. Accordingly, we enhance existing text mining methods by using more expressive features to represent text and by employing market feedback as part of our feature selection process. We show that a robust feature selection, when combined with complex feature types, lifts classification accuracies significantly above previous approaches. This is because our approach selects semantically relevant features and thus reduces the problem of over-fitting when applying a machine learning approach. We also demonstrate that our approach is highly profitable for trading in practice. The methodology can be transferred to any other application area that provides textual information and corresponding effect data.
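The abstract's core idea, using market feedback in feature selection, can be sketched in miniature. This is a hedged toy illustration, not the authors' implementation: documents are represented by word bigrams, each bigram is scored with a chi-square statistic measuring its association with the subsequent price direction, and only the top-scoring features are kept. The corpus, the labels, and the choice of bigrams as the feature type are invented for the example.

```python
# Toy sketch of market-feedback feature selection (not the paper's code).
# Each bigram feature is scored by a chi-square statistic measuring how
# strongly its presence is associated with the subsequent price direction.

def bigrams(text):
    words = text.lower().split()
    return list(zip(words, words[1:]))

def chi_square_scores(docs, labels):
    """Chi-square association between feature presence and an up/down label."""
    n = len(docs)
    n_up = sum(labels)
    doc_feats = [set(bigrams(d)) for d in docs]
    vocab = set().union(*doc_feats)
    scores = {}
    for f in vocab:
        a = sum(1 for fe, y in zip(doc_feats, labels) if f in fe and y == 1)
        b = sum(1 for fe, y in zip(doc_feats, labels) if f in fe and y == 0)
        c = n_up - a            # price up, feature absent
        d = (n - n_up) - b      # price down, feature absent
        den = (a + b) * (c + d) * (a + c) * (b + d)
        scores[f] = n * (a * d - b * c) ** 2 / den if den else 0.0
    return scores

docs = [
    "profit beats forecast strongly",
    "profit beats estimates again",
    "ceo resigns amid probe",
    "regulator probe hits shares",
]
labels = [1, 1, 0, 0]   # 1 = price rose after the news, 0 = it fell
top = sorted(chi_square_scores(docs, labels).items(), key=lambda kv: -kv[1])[:3]
print(top)
```

The bigram ('profit', 'beats'), which occurs only in the "up" documents, receives the highest score; features that occur equally often in both classes score near zero, which is how such supervised selection discards semantically irrelevant text features.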
Citations
Journal ArticleDOI
TL;DR: This survey paper provides a comprehensive overview of recent advances in sentiment analysis, with refined categorizations of a large number of recent articles and an illustration of current research trends in sentiment analysis and its related areas.

2,152 citations


Cites background or methods from "Automated news reading: Stock price..."

  • ...SA is not only applied on product reviews but can also be applied on stock markets [4,5], news articles, [6] or political debates [7]....


  • ...Table 1 contains the articles reference [4–7] and [12–61]....


  • ...Hagenau and Liebmann [5] used feedback features by employing market feedback as part of their feature selection process regarding stock market data....


Journal ArticleDOI
TL;DR: A comparative analysis of systems for market prediction based on online text mining, extending to the theoretical and technical foundations behind each; it should help the research community structure this emerging field and identify the exact aspects that require further research and are of special significance.
Abstract: The quality of the interpretation of the sentiment in online buzz in social media and online news can determine the predictability of financial markets and cause huge gains or losses. That is why a number of researchers have recently turned their full attention to different aspects of this problem. However, to the best of our knowledge there is no well-rounded theoretical and technical framework for approaching it. We believe the existing lack of clarity on the topic is due to its interdisciplinary nature, which involves at its core both behavioral economics and artificial intelligence. We dive deeper into this interdisciplinary nature and contribute to the formation of a clear frame of discussion. We review the related works on market prediction based on online text mining and produce a picture of the generic components that they all share. We furthermore compare each system with the rest and identify their main differentiating factors. Our comparative analysis of the systems extends to the theoretical and technical foundations behind each. This work should help the research community to structure this emerging field and identify the exact aspects which require further research and are of special significance.

476 citations


Cites background or methods or result from "Automated news reading: Stock price..."

  • ...Some of the other works are using a technique called n-grams (Butler & Kešelj, 2009; Hagenau et al., 2013)....


  • ...Table 4 (continued), columns: Reference; Algorithm type; Algorithm details; Training vs. testing volume and sampling; Sliding window; Semantics; Syntax; News and tech. data; Software.
    - Wuthrich et al. (1998): multi-algorithm experiments (k-NN, ANNs, naïve Bayes, rule-based); last 100 training days to forecast 1 day; sliding window: yes; semantics: yes; syntax: no; news and tech. data: no; software: not mentioned.
    - Werner and Murray (2004): naïve Bayes, SVM; 1000 messages vs. the rest; sliding window: no; semantics: no; syntax: no; news and tech. data: no; software: Rainbow package.
    - Groth and Muntermann (2011): naïve Bayes, k-NN, ANN, SVM; stratified cross-validations; sliding window: no; semantics: no; syntax: no; news and tech. data: no; software: not mentioned.
    (D) Decision rules and trees: the next group of algorithms used in the literature, as indicated in Table 4....


  • ...Experiment periods are also contrasted with shortest being only 5 days in case of the work of Peramunetilleke and Wong (2002) and up to multiple years with the longest at 24 years from 1980 to 2004 by Tetlock (2007) followed by 14 years from 1997 to 2011 in the work of Hagenau et al. (2013), 13 years from 1994 to 2007 in the work of Li (2010) with the latter looking at an annual timeframe and the formers at daily timeframes....


  • ...Few of the researchers have taken advantage of these additional inputs as indicated in Table 4 (Butler & Kešelj, 2009; Hagenau et al., 2013; Rachlin et al., 2007; Schumaker & Chen, 2009; Schumaker et al., 2012; Zhai et al., 2007)....


Journal ArticleDOI
TL;DR: This review article clarifies the scope of NLFF research by ordering and structuring techniques and applications from related work, and aims to increase the understanding of progress and hotspots in NLFF, and bring about discussions across many different disciplines.
Abstract: Natural language processing (NLP), or the pragmatic research perspective of computational linguistics, has become increasingly powerful due to data availability and various techniques developed in the past decade. This increasing capability makes it possible to capture sentiments more accurately and semantics in a more nuanced way. Naturally, many applications are starting to seek improvements by adopting cutting-edge NLP techniques. Financial forecasting is no exception. As a result, articles that leverage NLP techniques to predict financial markets are fast accumulating, gradually establishing the research field of natural language based financial forecasting (NLFF), or from the application perspective, stock market prediction. This review article clarifies the scope of NLFF research by ordering and structuring techniques and applications from related work. The survey also aims to increase the understanding of progress and hotspots in NLFF, and bring about discussions across many different disciplines.

270 citations


Cites background from "Automated news reading: Stock price..."

  • ...Prior to it, some relevant discussions about news impact on stock markets can be spotted within papers, such as [67] and [43]....


  • ...Prior to it, some relevant discussions about news impact on stock markets can be spotted within papers, such as Li et al. (2014b) and Hagenau et al. (2013)....


Journal ArticleDOI
TL;DR: Twitter sentiment and posting volume were found to be relevant for forecasting returns of the S&P 500 index, portfolios of lower market capitalization, and some industries; Kalman Filter (KF) sentiment was also informative for forecasting returns.
Abstract: In this paper, we propose a robust methodology to assess the value of microblogging data for forecasting stock market variables: returns, volatility and trading volume of diverse indices and portfolios. The methodology uses sentiment and attention indicators extracted from microblogs (a large Twitter dataset is adopted) and survey indices (AAII and II, USMC and Sentix), diverse forms of daily aggregation of these indicators, a Kalman Filter to merge microblog and survey sources, a realistic rolling-windows evaluation, several Machine Learning methods and the Diebold-Mariano test to validate whether the sentiment- and attention-based predictions are valuable when compared with an autoregressive baseline. We found that Twitter sentiment and posting volume were relevant for forecasting returns of the S&P 500 index, portfolios of lower market capitalization and some industries. Additionally, KF sentiment was informative for forecasting returns. Moreover, Twitter and KF sentiment indicators were useful for predicting some survey sentiment indicators. These results confirm the usefulness of microblogging data for financial expert systems, allowing prediction of stock market behavior and providing a valuable alternative to existing survey measures, with advantages such as fast and cheap creation and daily frequency.
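The evaluation design described in the abstract, rolling windows, an autoregressive baseline, and a Diebold-Mariano comparison of forecast losses, can be sketched in miniature. Everything below is a synthetic stand-in: toy data, one-variable least-squares models, a fixed window size, and a plain DM statistic with no small-sample correction.

```python
# Hedged sketch of a rolling-window forecast comparison (toy data, not the
# paper's setup): (a) a naive autoregressive baseline, (b) a model using a
# sentiment indicator, compared via a Diebold-Mariano-style statistic.
import math, random

random.seed(0)
T = 200
sentiment = [random.gauss(0, 1) for _ in range(T)]
# Toy returns: partly driven by yesterday's sentiment, plus noise.
returns = [0.0] + [0.5 * sentiment[t - 1] + random.gauss(0, 0.5) for t in range(1, T)]

def ols_slope(xs, ys):
    """Slope of y ~ x through the origin: a minimal stand-in for a regression."""
    sxx = sum(x * x for x in xs)
    return sum(x * y for x, y in zip(xs, ys)) / sxx if sxx else 0.0

window = 50
loss_base, loss_sent = [], []
for t in range(window, T):
    ys = returns[t - window + 1:t]                 # targets r_i
    b_ar = ols_slope(returns[t - window:t - 1], ys)    # r_{i-1} as predictor
    b_se = ols_slope(sentiment[t - window:t - 1], ys)  # s_{i-1} as predictor
    loss_base.append((returns[t] - b_ar * returns[t - 1]) ** 2)
    loss_sent.append((returns[t] - b_se * sentiment[t - 1]) ** 2)

# Diebold-Mariano-style statistic on the loss differential d_t.
d = [lb - ls for lb, ls in zip(loss_base, loss_sent)]
mean_d = sum(d) / len(d)
var_d = sum((x - mean_d) ** 2 for x in d) / (len(d) - 1)
dm = mean_d / math.sqrt(var_d / len(d))
print(f"baseline MSE={sum(loss_base)/len(loss_base):.3f}, "
      f"sentiment MSE={sum(loss_sent)/len(loss_sent):.3f}, DM={dm:.2f}")
```

Because the toy returns are constructed to depend on lagged sentiment, the sentiment model beats the autoregressive baseline and the DM statistic comes out positive; on real data that comparison is exactly what the test is meant to settle.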

255 citations

Journal ArticleDOI
TL;DR: Experimental results show that the proposed end-to-end multi-filters neural network outperforms traditional machine learning models, statistical models, and single-structure networks in terms of accuracy, profitability, and stability.
Abstract: Stock price modeling and prediction have been challenging objectives for researchers and speculators because of the noisy and non-stationary characteristics of samples. With the growth of deep learning, the task of feature learning can be performed more effectively by a purposely designed network. In this paper, we propose a novel end-to-end model named multi-filters neural network (MFNN), designed specifically for feature extraction on financial time-series samples and the price-movement prediction task. Both convolutional and recurrent neurons are integrated to build the multi-filters structure, so that information from different feature spaces and market views can be obtained. We apply our MFNN to extreme market prediction and signal-based trading simulation tasks on the Chinese stock market index CSI 300. Experimental results show that our network outperforms traditional machine learning models, statistical models, and single-structure (convolutional, recurrent, and LSTM) networks in terms of accuracy, profitability, and stability.
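The multi-filters idea, passing one price series through filters of different kinds and concatenating the results into a single feature vector, can be illustrated without deep learning machinery. The kernels, the smoothing constant, and the price series below are invented, and simple fixed filters stand in for the trained convolutional and recurrent neurons of the MFNN.

```python
# Toy illustration of a multi-filter feature extractor (not the MFNN itself):
# a 1-D convolutional filter bank plus a recurrent exponential filter,
# concatenated into one feature vector.

def conv1d(series, kernel):
    k = len(kernel)
    return [sum(kernel[j] * series[i + j] for j in range(k))
            for i in range(len(series) - k + 1)]

def recurrent_filter(series, alpha=0.3):
    """Exponential smoothing: a minimal stand-in for a recurrent unit."""
    state, out = 0.0, []
    for x in series:
        state = alpha * x + (1 - alpha) * state
        out.append(state)
    return out

prices = [10.0, 10.2, 10.1, 10.5, 10.4, 10.8, 11.0, 10.9]
filters = {
    "momentum": [-1.0, 0.0, 1.0],   # difference over a 3-step window
    "smooth":   [1/3, 1/3, 1/3],    # moving average
}
features = []
for name, kernel in filters.items():
    features.extend(conv1d(prices, kernel))   # convolutional view
features.extend(recurrent_filter(prices))     # recurrent view
print(len(features), features[:3])
```

In the actual model both filter families are learned jointly end to end; the point of the sketch is only the shape of the architecture, i.e. that features from different "market views" end up side by side in one representation.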

253 citations

References
Book
01 Jan 2020
TL;DR: In this article, the authors present a comprehensive introduction to the theory and practice of artificial intelligence for modern applications, including game playing, planning and acting, and reinforcement learning with neural networks.
Abstract: The long-anticipated revision of this #1 selling book offers the most comprehensive, state of the art introduction to the theory and practice of artificial intelligence for modern applications. Intelligent Agents. Solving Problems by Searching. Informed Search Methods. Game Playing. Agents that Reason Logically. First-order Logic. Building a Knowledge Base. Inference in First-Order Logic. Logical Reasoning Systems. Practical Planning. Planning and Acting. Uncertainty. Probabilistic Reasoning Systems. Making Simple Decisions. Making Complex Decisions. Learning from Observations. Learning with Neural Networks. Reinforcement Learning. Knowledge in Learning. Agents that Communicate. Practical Communication in English. Perception. Robotics. For computer professionals, linguists, and cognitive scientists interested in artificial intelligence.

16,983 citations

Journal ArticleDOI
TL;DR: Several arguments supporting the observed high accuracy of SVMs are reviewed, and numerous examples and proofs of most of the key theorems are given.
Abstract: The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.
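At the center of the tutorial is the linear SVM and its decision function sign(w·x + b). A minimal, assumption-laden sketch: 1-D separable toy data trained by plain subgradient descent on the regularized hinge loss, rather than the quadratic-programming view the tutorial develops; the learning rate and regularization constant are arbitrary.

```python
# Toy linear SVM via hinge-loss subgradient descent (a sketch, not the
# tutorial's QP formulation). Labels y must be in {-1, +1}.

def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (w * x + b)
            if margin < 1:                 # inside margin: hinge subgradient
                w += lr * (y * x - lam * w)
                b += lr * y
            else:                          # outside margin: only regularization
                w -= lr * lam * w
    return w, b

xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(xs, ys)
preds = [1 if w * x + b > 0 else -1 for x in xs]
print(w, b, preds)
```

The regularization term lam * w is what pushes toward the maximum-margin solution; kernels, the non-separable case, and the VC-dimension arguments the tutorial covers are all outside this sketch.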

15,696 citations

Book
01 Jan 1983
Introduction to Modern Information Retrieval

12,059 citations

Book ChapterDOI
21 Apr 1998
TL;DR: This paper explores the use of Support Vector Machines for learning text classifiers from examples and analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task.
Abstract: This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best-performing methods and behave robustly over a variety of different learning tasks. Furthermore, they are fully automatic, eliminating the need for manual parameter tuning.
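The setting the paper analyzes, text as very high-dimensional, sparse bag-of-words vectors fed to a linear classifier, can be sketched briefly. For brevity a perceptron stands in for the SVM the paper actually uses, and the toy corpus and labels are invented.

```python
# Toy bag-of-words text classification (a perceptron stands in for the SVM).
from collections import Counter

def vectorize(doc, vocab):
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocab]

train = [
    ("earnings rise and dividend grows", 1),
    ("record profit and strong outlook", 1),
    ("lawsuit losses and weak outlook", -1),
    ("shares slump on weak earnings", -1),
]
vocab = sorted({w for doc, _ in train for w in doc.lower().split()})
X = [vectorize(doc, vocab) for doc, _ in train]
ys = [y for _, y in train]

# Perceptron training: adjust weights on every misclassified document.
w = [0.0] * len(vocab)
for _ in range(20):
    for x, y in zip(X, ys):
        if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
            w = [wi + y * xi for wi, xi in zip(w, x)]

score = sum(wi * xi for wi, xi in zip(w, vectorize("strong profit outlook", vocab)))
print("positive" if score > 0 else "negative")
```

Even this tiny example shows the structure the paper exploits: each document touches only a few of the many vocabulary dimensions, and a linear separator over those sparse vectors is enough, which is why large-margin linear methods scale so well to text.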

8,658 citations

Journal ArticleDOI
TL;DR: An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL, and performs slightly better than a much more elaborate system with which it has been compared.
Abstract: The automatic removal of suffixes from words in English is of particular interest in the field of information retrieval. An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL. Although simple, it performs slightly better than a much more elaborate system with which it has been compared. It effectively works by treating complex suffixes as compounds made up of simple suffixes, and removing the simple suffixes in a number of steps. In each step the removal of the suffix is made to depend upon the form of the remaining stem, which usually involves a measure of its syllable length.
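The abstract's description translates into a small sketch: compute the stem's measure m (the number of vowel-consonant sequences, the "measure of its syllable length" mentioned above) and strip a suffix only when the condition on m holds. Only a handful of sample rules from the full five-step algorithm are shown, and 'y' is treated as a plain consonant, which the real algorithm does not.

```python
# Heavily simplified sketch of Porter-style suffix stripping: a suffix is
# removed only if the remaining stem's measure m satisfies the rule's
# condition. This is a tiny subset of the real five-step algorithm.
import re

def measure(stem_text):
    """Approximate Porter measure m: count of VC sequences in the stem.
    'y' is treated as a consonant here, a simplification of the real rules."""
    pattern = re.sub(r"[aeiou]+", "V", re.sub(r"[^aeiou]+", "C", stem_text.lower()))
    return pattern.count("VC")

# (suffix, replacement, minimum measure required of the remaining stem);
# min_m = -1 marks an unconditional rule, as for SSES -> SS in the original.
RULES = [
    ("ational", "ate", 0),   # m > 0: relational -> relate
    ("ization", "ize", 0),   # m > 0
    ("ement", "", 1),        # m > 1: replacement -> replac, but cement stays
    ("sses", "ss", -1),      # unconditional: caresses -> caress
]

def stem(word):
    for suffix, repl, min_m in RULES:
        if word.endswith(suffix):
            base = word[: -len(suffix)]
            if measure(base) > min_m:
                return base + repl
            return word
    return word

for w in ["relational", "replacement", "cement", "caresses"]:
    print(w, "->", stem(w))
```

The measure condition is what keeps short words intact: "cement" survives because its pre-suffix stem "c" has m = 0, exactly the behavior the abstract describes when it says suffix removal depends on the form of the remaining stem.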

7,572 citations