Latent semantic analysis for text categorization using neural network

doi:10.1016/J.KNOSYS.2008.03.045

Journal Article

Bo Yu, +2 more

- 01 Dec 2008 -

Knowledge Based Systems

- Vol. 21, Iss: 8, pp 900-904

TLDR

Experimental results show that the models using MBPNN outperform the basic BPNN, and that applying LSA in this system leads to a dramatic dimensionality reduction while achieving good classification results.

Abstract:

New text categorization models using a back-propagation neural network (BPNN) and a modified back-propagation neural network (MBPNN) are proposed. An efficient feature selection method is used to reduce the dimensionality and improve performance. The basic BPNN learning algorithm suffers from slow training, so we modify it to accelerate training; categorization accuracy improves as a result. Traditional word-matching-based text categorization systems use the vector space model (VSM) to represent documents. However, the VSM needs a high-dimensional space to represent a document and does not take into account the semantic relationships between terms, which can lead to poor classification accuracy. Latent semantic analysis (LSA) can overcome these problems by using statistically derived conceptual indices instead of individual words. It constructs a conceptual vector space in which each term or document is represented as a vector. It not only greatly reduces the dimensionality but also discovers important associative relationships between terms. We test our categorization models on the 20-newsgroup data set. Experimental results show that the models using MBPNN outperform the basic BPNN, and that applying LSA in our system leads to a dramatic dimensionality reduction while achieving good classification results.
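The LSA step described in the abstract can be illustrated with a minimal sketch. It assumes LSA is computed as a truncated singular value decomposition of a term-by-document count matrix, which is the standard formulation; the abstract does not state the paper's term weighting, feature selection method, or number of retained dimensions. The toy matrix `A`, the rank `k = 2`, and the helper names `lsa_project`, `fold_in`, and `cosine` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def lsa_project(term_doc, k):
    """Rank-k LSA of a term-by-document matrix (rows = terms, columns = documents).

    Returns (term_vectors, doc_vectors, U_k):
      term_vectors: one k-dimensional row per term (U_k scaled by singular values)
      doc_vectors:  one k-dimensional row per document (A^T U_k)
      U_k:          the k leading left singular vectors, used to fold in new documents
    """
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    term_vectors = U_k * s_k
    doc_vectors = term_doc.T @ U_k
    return term_vectors, doc_vectors, U_k


def fold_in(term_counts, U_k):
    """Map an unseen document's term-count vector into the same k-dimensional space."""
    return np.asarray(term_counts, dtype=float) @ U_k


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy data: 6 terms x 4 documents.
# Documents 0 and 1 use only the first three terms; documents 2 and 3 only the last three.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)

term_vecs, doc_vecs, U_k = lsa_project(A, k=2)

# Each document is now a 2-dimensional vector instead of a 6-dimensional one.
print(doc_vecs.shape)
print(cosine(doc_vecs[0], doc_vecs[1]))  # same-topic pair
print(cosine(doc_vecs[0], doc_vecs[2]))  # different-topic pair
```

In a pipeline like the one the abstract outlines, the rows of `doc_vecs` would serve as the reduced input features for the neural network classifier, and `fold_in` would map test documents into the same space without recomputing the decomposition.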
