Open Access Journal Article

Using Maximum Entropy Model for Chinese text categorization

TLDR
In this article, the authors use the maximum entropy model for text categorization and compare it to Bayes, KNN, and SVM, showing that its performance is higher than Bayes and comparable with KNN and SVM.
Abstract
Maximum Entropy Model is a probability estimation technique widely used for a variety of natural language tasks. It offers a clean and flexible framework for combining diverse pieces of contextual information to estimate the probability of a certain linguistic phenomenon. For many NLP tasks, this approach performs at a near state-of-the-art level, or outperforms other competing probabilistic methods when trained and tested under similar conditions. In this paper, we use the maximum entropy model for text categorization. We compare and analyze its categorization performance under different approaches to text feature generation, different numbers of features, and different smoothing techniques. Moreover, in experiments we compare it to Bayes, KNN, and SVM, and show that its performance is higher than Bayes and comparable with KNN and SVM. We think it is a promising technique for text categorization.
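To make the technique concrete: a conditional maximum entropy classifier with a Gaussian prior on its weights is mathematically equivalent to L2-regularized multinomial logistic regression, so the sketch below illustrates the paper's setup using scikit-learn as a stand-in. This is a minimal illustration, not the authors' implementation; the toy corpus, labels, and parameter values are hypothetical.

# A minimal sketch, not the authors' implementation. A conditional maximum
# entropy classifier with a Gaussian prior is equivalent to L2-regularized
# multinomial logistic regression, so scikit-learn's LogisticRegression
# serves as a stand-in. The toy corpus and labels are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

train_docs = [
    "the team won the match in overtime",      # sports
    "the striker scored twice in the final",   # sports
    "the central bank raised interest rates",  # finance
    "stocks fell after the earnings report",   # finance
]
train_labels = ["sports", "sports", "finance", "finance"]

# For Chinese text, which lacks whitespace word boundaries, character
# n-gram features are a common substitute for word segmentation, e.g.
# TfidfVectorizer(analyzer="char", ngram_range=(1, 2)).
model = Pipeline([
    ("features", TfidfVectorizer()),        # text feature generation
    ("maxent", LogisticRegression(C=1.0)),  # C is the inverse strength of
                                            # the Gaussian-prior smoothing
])
model.fit(train_docs, train_labels)
print(model.predict(["the goalkeeper made a brilliant save"]))  # predicted label

Varying C here plays the role of the smoothing comparison in the paper: a smaller C means a tighter Gaussian prior, i.e. heavier smoothing of the feature weights.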



Citations
Journal Article

Latent semantic analysis for text categorization using neural network

TL;DR: Experimental results show that the models using MBPNN outperform the basic BPNN, and that the application of LSA in this system can lead to dramatic dimensionality reduction while achieving good classification results.
Journal Article

Advances in Machine Learning Based Text Categorization

TL;DR: It is pointed out that problems such as nonlinearity, skewed data distribution, the labeling bottleneck, hierarchical categorization, scalability of algorithms, and categorization of Web pages are the key problems in the study of text categorization.
Proceedings Article

LSTM-CNN Hybrid Model for Text Classification

TL;DR: A hybrid model of LSTM and CNN is proposed that can effectively improve the accuracy of text classification; the performance of the hybrid model is compared with that of other models in the experiments.
Journal Article

Text categorization based on combination of modified back propagation neural network and latent semantic analysis

TL;DR: The MBPNN is proposed to accelerate the training of the BPNN and improve categorization accuracy, and the application of LSA in the system can lead to dramatic dimensionality reduction while achieving good classification results.
Proceedings Article

Research on Short Text Classification Algorithm Based on Statistics and Rules

TL;DR: Building on several commonly used classic text classification algorithms and their major feature extraction methods, a short text classification method based on statistics and rules is proposed and shown to perform better than the other algorithms.
References
Journal Article

An Evaluation of Statistical Approaches to Text Categorization

TL;DR: Analysis and empirical evidence suggest that the evaluation results on some versions of Reuters were significantly affected by the inclusion of a large portion of unlabelled documents, making those results difficult to interpret and leading to considerable confusion in the literature.

Using Maximum Entropy for Text Classification

TL;DR: This paper uses maximum entropy techniques for text classification by estimating the conditional distribution of the class variable given the document; accuracy is compared to naive Bayes, showing that maximum entropy is sometimes significantly better, but also sometimes worse.

Maximum entropy models for natural language ambiguity resolution

TL;DR: This thesis demonstrates that several important kinds of natural language ambiguities can be resolved to state-of-the-art accuracies using a single statistical modeling technique based on the principle of maximum entropy.

A Simple Introduction to Maximum Entropy Models for Natural Language Processing

TL;DR: The goal of this report is to provide enough detail to re-implement the maximum entropy models described in Reynar and Ratnaparkhi, and also to provide a simple explanation of the maximum entropy formalism.
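For orientation, the conditional maximum entropy model described in this report and used throughout the works above has the standard form below; the f_i are (typically binary) feature functions over a document d and a candidate class c, the lambda_i are learned weights, and Z(d) is a normalizer (notation ours, following the general maxent literature):

p(c \mid d) = \frac{1}{Z(d)} \exp\Big( \sum_i \lambda_i f_i(d, c) \Big),
\qquad
Z(d) = \sum_{c'} \exp\Big( \sum_i \lambda_i f_i(d, c') \Big)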
Proceedings Article

Smoothing methods in maximum entropy language modeling

TL;DR: It is shown that straightforward maximum entropy models with nested features and discounted feature counts approximate backing-off smoothed relative frequency models with Kneser's advanced marginal backoff distribution, and perplexity results for nested and non-nested features are presented.
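Beyond the discounting approach studied in that work, a standard alternative smoothing technique for maximum entropy models is a Gaussian prior on the weights, which turns maximum-likelihood training into the penalized objective below (notation ours; \sigma^2 is the prior variance):

\ell(\lambda) = \sum_{(d,\,c)} \log p_\lambda(c \mid d) - \sum_i \frac{\lambda_i^2}{2\sigma^2}

In the scikit-learn sketch above, this prior corresponds to the L2 penalty whose strength is set through the parameter C.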