Open Access Journal Article

Using Maximum Entropy Model for Chinese text categorization

TLDR
In this article, the authors use the maximum entropy model for text categorization and compare it to Bayes, KNN, and SVM, showing that its performance is higher than Bayes and comparable with KNN and SVM.
Abstract
Maximum Entropy Model is a probability estimation technique widely used for a variety of natural language tasks. It offers a clean and flexible framework for combining diverse pieces of contextual information to estimate the probability of a certain linguistic phenomenon. For many NLP tasks, this approach performs at a near state-of-the-art level, or outperforms other competing probabilistic methods when trained and tested under similar conditions. In this paper, we use the maximum entropy model for text categorization. We compare and analyze its categorization performance under different approaches to text feature generation, different numbers of features, and different smoothing techniques. Moreover, in experiments we compare it to Bayes, KNN, and SVM, and show that its performance is higher than Bayes and comparable with KNN and SVM. We think it is a promising technique for text categorization.
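To make the technique concrete: a conditional maximum entropy classifier with a Gaussian prior on its weights is mathematically equivalent to L2-regularized multinomial logistic regression, so the sketch below illustrates the paper's setup using scikit-learn as a stand-in. This is a minimal illustration, not the authors' implementation; the toy corpus, labels, and parameter values are hypothetical.

# A minimal sketch, not the authors' implementation. A conditional maximum
# entropy classifier with a Gaussian prior is equivalent to L2-regularized
# multinomial logistic regression, so scikit-learn's LogisticRegression
# serves as a stand-in. The toy corpus and labels are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

train_docs = [
    "the team won the match in overtime",      # sports
    "the striker scored twice in the final",   # sports
    "the central bank raised interest rates",  # finance
    "stocks fell after the earnings report",   # finance
]
train_labels = ["sports", "sports", "finance", "finance"]

# For Chinese text, which lacks whitespace word boundaries, character
# n-gram features are a common substitute for word segmentation, e.g.
# TfidfVectorizer(analyzer="char", ngram_range=(1, 2)).
model = Pipeline([
    ("features", TfidfVectorizer()),        # text feature generation
    ("maxent", LogisticRegression(C=1.0)),  # C is the inverse strength of
                                            # the Gaussian-prior smoothing
])
model.fit(train_docs, train_labels)
print(model.predict(["the goalkeeper made a brilliant save"]))  # predicted label

Varying C here plays the role of the smoothing comparison in the paper: a smaller C means a tighter Gaussian prior, i.e. heavier smoothing of the feature weights.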



Citations
Journal Article

Latent semantic analysis for text categorization using neural network

TL;DR: Experimental results show that the models using MBPNN outperform the basic BPNN, and that the application of LSA in this system can lead to dramatic dimensionality reduction while achieving good classification results.
Journal Article

Advances in Machine Learning Based Text Categorization

TL;DR: It is pointed out that problems such as nonlinearity, skewed data distribution, the labeling bottleneck, hierarchical categorization, scalability of algorithms, and categorization of Web pages are the key problems in the study of text categorization.
Proceedings Article

LSTM-CNN Hybrid Model for Text Classification

TL;DR: A hybrid model of LSTM and CNN is proposed that can effectively improve the accuracy of text classification; the performance of the hybrid model is compared with that of other models in the experiments.
Journal Article

Text categorization based on combination of modified back propagation neural network and latent semantic analysis

TL;DR: The MBPNN is proposed to accelerate the training of the BPNN and improve categorization accuracy, and the application of LSA in the system can lead to dramatic dimensionality reduction while achieving good classification results.
Proceedings Article

Research on Short Text Classification Algorithm Based on Statistics and Rules

TL;DR: Building on several commonly used classic text classification algorithms and their major feature extraction methods, a short text classification method based on statistics and rules is proposed and shown to perform better than the other algorithms.
References
Journal Article

An Evaluation of Statistical Approaches to Text Categorization

TL;DR: Analysis and empirical evidence suggest that the evaluation results on some versions of Reuters were significantly affected by the inclusion of a large portion of unlabelled documents, making those results difficult to interpret and leading to considerable confusion in the literature.

Using Maximum Entropy for Text Classification

TL;DR: This paper uses maximum entropy techniques for text classification by estimating the conditional distribution of the class variable given the document; accuracy is compared to naive Bayes, showing that maximum entropy is sometimes significantly better, but also sometimes worse.

Maximum entropy models for natural language ambiguity resolution

TL;DR: This thesis demonstrates that several important kinds of natural language ambiguities can be resolved to state-of-the-art accuracies using a single statistical modeling technique based on the principle of maximum entropy.

A Simple Introduction to Maximum Entropy Models for Natural Language Processing

TL;DR: The goal of this report is to provide enough detail to re-implement the maximum entropy models described in Reynar and Ratnaparkhi, and also to provide a simple explanation of the maximum entropy formalism.
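For orientation, the conditional maximum entropy model described in this report and used throughout the works above has the standard form below; the f_i are (typically binary) feature functions over a document d and a candidate class c, the lambda_i are learned weights, and Z(d) is a normalizer (notation ours, following the general maxent literature):

p(c \mid d) = \frac{1}{Z(d)} \exp\Big( \sum_i \lambda_i f_i(d, c) \Big),
\qquad
Z(d) = \sum_{c'} \exp\Big( \sum_i \lambda_i f_i(d, c') \Big)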
Proceedings Article

Smoothing methods in maximum entropy language modeling

TL;DR: It is shown that straightforward maximum entropy models with nested features and discounted feature counts approximate backing-off smoothed relative frequency models with Kneser's advanced marginal backoff distribution, and perplexity results for nested and non-nested features are presented.
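Beyond the discounting approach studied in that work, a standard alternative smoothing technique for maximum entropy models is a Gaussian prior on the weights, which turns maximum-likelihood training into the penalized objective below (notation ours; \sigma^2 is the prior variance):

\ell(\lambda) = \sum_{(d,\,c)} \log p_\lambda(c \mid d) - \sum_i \frac{\lambda_i^2}{2\sigma^2}

In the scikit-learn sketch above, this prior corresponds to the L2 penalty whose strength is set through the parameter C.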