Journal Article

Latent semantic analysis for text categorization using neural network

TLDR
Experimental results show that models using MBPNN outperform those using the basic BPNN, and that applying LSA to the system yields dramatic dimensionality reduction while achieving good classification results.
Abstract
New text categorization models using a back-propagation neural network (BPNN) and a modified back-propagation neural network (MBPNN) are proposed. An efficient feature selection method is used to reduce the dimensionality and improve performance. The basic BPNN learning algorithm suffers from slow training, so we modify it to accelerate training; categorization accuracy improves as a result. Traditional word-matching text categorization systems use the vector space model (VSM) to represent documents. However, the VSM needs a high-dimensional space to represent a document and does not take into account the semantic relationships between terms, which can lead to poor classification accuracy. Latent semantic analysis (LSA) can overcome these problems by using statistically derived conceptual indices instead of individual words. It constructs a conceptual vector space in which each term or document is represented as a vector. It not only greatly reduces the dimensionality but also discovers important associative relationships between terms. We test our categorization models on the 20-newsgroup data set. Experimental results show that the models using MBPNN outperform those using the basic BPNN, and that applying LSA to our system leads to dramatic dimensionality reduction while achieving good classification results.
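
The abstract does not spell out how the conceptual vector space is built. The following is a minimal sketch, assuming the standard truncated-SVD formulation of LSA; the toy term-document matrix, the choice k = 2, and the helper names lsa_project, fold_in and cosine are illustrative assumptions, not taken from the paper. It shows documents that share no identical term profile landing on the same latent direction, and how an unseen document could be mapped into the reduced space before being passed to a classifier such as a BPNN.

```python
import numpy as np

def lsa_project(term_doc, k):
    """Truncated SVD of a (n_terms, n_docs) matrix; returns k-dim document vectors."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Row j holds U_k^T a_j = S_k v_j, the coordinates of document j in the latent space.
    doc_vectors = (s_k[:, None] * Vt_k).T
    return U_k, s_k, doc_vectors

def fold_in(U_k, new_doc):
    """Project an unseen document's term vector onto the same k latent axes."""
    return new_doc @ U_k

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical term-document counts (rows: terms, columns: documents).
# Rows: neural, network, training, hockey, goal, team
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 2],
], dtype=float)

U_k, s_k, docs = lsa_project(A, k=2)   # 6 term dimensions reduced to 2

# An unseen document containing only "neural" and "training".
query = np.array([1, 0, 1, 0, 0, 0], dtype=float)
query_vec = fold_in(U_k, query)

print("doc0 vs doc1:", round(cosine(docs[0], docs[1]), 3))
print("doc0 vs doc2:", round(cosine(docs[0], docs[2]), 3))
print("query vs doc0:", round(cosine(query_vec, docs[0]), 3))
```

In this sketch the rows of docs (or query_vec for new text) would serve as the low-dimensional input features to the network, in place of the full VSM term vector.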

Citations
Journal Article

A Review of Machine Learning Algorithms for Text-Documents Classification

TL;DR: This paper reviews the theory and methods of document classification and text mining, focusing mainly on text representation and machine learning techniques.
Journal Article

A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm

TL;DR: Two-stage feature selection and feature extraction is used to improve the performance of text categorization and the proposed model is able to achieve high categorization effectiveness as measured by precision, recall and F-measure.
Journal Article

A novel probabilistic feature selection method for text classification

TL;DR: This study proposes a novel filter based probabilistic feature selection method, namely distinguishing feature selector (DFS), for text classification that is compared with well-known filter approaches including chi square, information gain, Gini index and deviation from Poisson distribution.
Journal Article

Hybrid feature selection for text classification

TL;DR: A hybrid feature selection strategy, which consists of both filter and wrapper feature selection steps, is proposed to comprehensively analyze the redundancy or relevancy of the text features selected by different methods in the case of different feature set sizes, dataset characteristics, classifiers, and success measures.
Journal Article

Text classification using genetic algorithm oriented latent semantic features

TL;DR: Experimental results demonstrate that GALSF outperforms both LSI and filter-based feature selection methods on benchmark datasets for various feature dimensions.
References
Journal Article

Indexing by Latent Semantic Analysis

TL;DR: A new method for automatic indexing and retrieval is described that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
Book Chapter

Text Categorization with Support Vector Machines: Learning with Many Relevant Features

TL;DR: This paper explores the use of Support Vector Machines for learning text classifiers from examples and analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task.
Proceedings Article

A re-examination of text categorization methods

TL;DR: The results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small, and that all the methods perform comparably when the categories have over 300 instances.
Journal Article

Using linear algebra for intelligent information retrieval

TL;DR: A lexical match between words in users’ requests and those in or assigned to documents in a database helps retrieve textual materials from scientific databases.
Proceedings Article

Feature selection, perceptron learning, and a usability case study for text categorization

TL;DR: An automated learning approach to text categorization, based on perceptron learning and a new feature selection metric called correlation coefficient, is described; empirical results indicate that this approach outperforms the best published results on this Reuters collection.