Open Access

Automating Text Processing Using Analytics - Automating text classifications and financial news parsing

Thomas Forss
TLDR
This thesis presents an automation approach based on machine learning, shows how it improves text classification performance, and shows that it can reach practically acceptable performance levels even in certain abstract classification problems.
Abstract
Automating repetitive processes and replacing manual tasks with automated systems is an area of research that will greatly impact and transform our lives during the 21st century. Automation comes in many forms, and we are now at the start of an era after which repetitive non-creative tasks will be handled mainly by machines. In this thesis, two analytics approaches are presented that can be used to automate text processing tasks. The first is an automation approach using machine learning in which we show how we can improve text classification performance, and how we, through these improvements, can reach practically acceptable performance levels even in certain abstract classification problems. We test the developed methods on problematic web content categories, such as violence, racism, and hate. The second is an automation approach that uses network analytics to automatically process texts. We use this approach to automate processing of financial news and to automatically extract new information. We show that through automating the process, we can extract company-specific sentiment risks that a person would not identify simply by reading the news articles. Lastly, we show that the risks we have extracted can be used to identify companies that are at higher risk of a stock price decrease.

