Word Embedding, Neural Networks and Text Classification: What is the State-of-the-Art?

doi:10.5282/JUMS/V4I1PP35-62

Estevan Vilar

Vol. 4, Iss. 1, pp. 35-62

TLDR

This bachelor thesis introduces the machine learning methodology of text classification, discusses the current development of Convolutional Neural Networks and Recurrent Neural Networks from a text classification perspective, and proposes a method for the models to cope with words absent from the training corpus.

Abstract:

In this bachelor thesis, I first introduce the machine learning methodology of text classification with the goal of describing how neural networks function. Then, I identify and discuss the current development of Convolutional Neural Networks and Recurrent Neural Networks from a text classification perspective and compare both models. Furthermore, I introduce different techniques used to translate textual information into a representation the computer can process, which ultimately serves as input for the models previously discussed. From there, I propose a method for the models to cope with words absent from a training corpus. This first part also aims to make the machine learning world accessible to a broader audience than computer science students and experts. To test the proposal, I implement and compare two state-of-the-art models and eight different word representations using pre-trained vectors on a dataset provided by LogMeIn and on a common benchmark. I find that, with my configuration, Convolutional Neural Networks are easier to train and also yield better results. Nevertheless, I highlight that models combining both architectures can potentially perform better, but need more work on identifying appropriate hyperparameters for training. Finally, I find that the efficacy of word embedding methods depends not only on the dataset but also on the model used to tackle the subsequent task. In my context, they can boost performance by up to 10.2% compared to a random initialization. However, further investigations are necessary to evaluate the value of my proposal with a corpus that contains a greater ratio of unknown relevant words.

Keywords: neural networks; machine learning; word embedding; text classification; business analytics

Citations

Journal Article

Learn#: A Novel incremental learning method for text classification

Shan Guangxu, +4 more

- 01 Jun 2020 -

Expert Systems With Applications

TL;DR: A novel incremental learning strategy that has the advantage of a shorter training time than the One-Time model, because it only needs to train a new Student model each time, without changing the existing Student models.

Journal Article

A classification method based on encoder‐decoder structure with paper content

Yi Yin, +3 more

- 24 Mar 2020 -

Concurrency and Computation: Practice an...

TL;DR: This paper proves the effectiveness of the proposed paper classification method by evaluating it on paper data from Web of Science and obtaining relevant experimental results.

Proceedings Article

Voting-Based Multiple Classification Approach for Turkish News Texts

Basak Buluz, +2 more

TL;DR: This study uses Kemik, a dataset of Turkish news content prepared by the Natural Language Processing Group at Yıldız Technical University, and adopts a hierarchical, voting-based approach built on machine learning methods.

Posted Content

EEMC: Embedding Enhanced Multi-tag Classification.

Yanlin Li, +2 more

- 29 Sep 2020 -

arXiv: Learning

TL;DR: This work uses representation learning to map raw data to a low-dimensional feature space, and finds that virtual data generated by simple linear operations in the representation space still retains the information of the raw data.

Dissertation

Virtual Assistant Design for Water Systems Operation

Yousra Mohamed

TL;DR: This research developed a named entity recognizer that is able to infer semantics in the water field by leveraging state-of-the-art methods for word embeddings, and leveraged chatbot framework architecture to provide a context-aware virtual assistance experience.
