Author

Ronglu Li

Bio: Ronglu Li is an academic researcher from Fudan University. The author has contributed to research on topics including Bayes' theorem and categorization, has an h-index of 2, and has co-authored 2 publications receiving 41 citations.

Papers
Journal Article
TL;DR: In this article, the authors used the maximum entropy model for text categorization and compared it to Bayes, KNN, and SVM, showing that its performance is higher than Bayes and comparable with KNN and SVM.
Abstract: The Maximum Entropy Model is a probability estimation technique widely used for a variety of natural language tasks. It offers a clean and flexible framework for combining diverse pieces of contextual information to estimate the probability of certain linguistic phenomena. For many NLP tasks this approach performs near the state-of-the-art level, or outperforms other competing probabilistic methods when trained and tested under similar conditions. In this paper, we use the maximum entropy model for text categorization. We compare and analyze its categorization performance using different approaches to text feature generation, different numbers of features, and smoothing techniques. Moreover, in experiments we compare it to Bayes, KNN and SVM, and show that its performance is higher than Bayes and comparable with KNN and SVM. We think it is a promising technique for text categorization.
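
As a rough, hedged illustration of the technique the abstract describes, the sketch below trains a maximum entropy classifier (equivalently, multinomial logistic regression, where p(c|d) is proportional to the exponential of a weighted sum of features) on TF-IDF features. The 20 Newsgroups dataset, the TF-IDF feature generation, and the L2 regularization standing in for smoothing are illustrative assumptions, not the paper's actual experimental setup.

```python
# Minimal maximum-entropy text categorization sketch (assumptions noted above).
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

# TF-IDF stands in for the paper's feature-generation step; the L2 penalty (C)
# plays a role analogous to smoothing the maximum entropy parameter estimates.
maxent = make_pipeline(
    TfidfVectorizer(max_features=20000, sublinear_tf=True),
    LogisticRegression(max_iter=1000, C=1.0),
)
maxent.fit(train.data, train.target)
print("macro-F1:", f1_score(test.target, maxent.predict(test.data), average="macro"))
```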

35 citations

Book ChapterDOI
14 Apr 2004
TL;DR: This work uses the maximum entropy model for text categorization, compares and analyzes its categorization performance using different approaches to text feature generation, different numbers of features, and smoothing techniques, and concludes that it is a promising technique for text categorization.
Abstract: The Maximum Entropy Model is a probability estimation technique widely used for a variety of natural language tasks. It offers a clean and flexible framework for combining diverse pieces of contextual information to estimate the probability of certain linguistic phenomena. For many NLP tasks this approach performs near the state-of-the-art level, or outperforms other competing probabilistic methods when trained and tested under similar conditions. In this paper, we use the maximum entropy model for text categorization. We compare and analyze its categorization performance using different approaches to text feature generation, different numbers of features, and smoothing techniques. Moreover, in experiments we compare it to Bayes, KNN and SVM, and show that its performance is higher than Bayes and comparable with KNN and SVM. We think it is a promising technique for text categorization.

6 citations


Cited by
Journal ArticleDOI
TL;DR: Experimental results show that the models using MBPNN outperform the basic BPNN, and that applying LSA in this system leads to dramatic dimensionality reduction while achieving good classification results.
Abstract: New text categorization models using a back-propagation neural network (BPNN) and a modified back-propagation neural network (MBPNN) are proposed. An efficient feature selection method is used to reduce the dimensionality as well as improve the performance. The basic BPNN learning algorithm has the drawback of slow training speed, so we modify it to accelerate training; the categorization accuracy is also improved as a consequence. A traditional word-matching based text categorization system uses the vector space model (VSM) to represent documents. However, it needs a high-dimensional space to represent each document and does not take into account the semantic relationships between terms, which can lead to poor classification accuracy. Latent semantic analysis (LSA) can overcome these problems by using statistically derived conceptual indices instead of individual words. It constructs a conceptual vector space in which each term or document is represented as a vector. It not only greatly reduces the dimensionality but also discovers important associative relationships between terms. We test our categorization models on the 20-newsgroup data set; experimental results show that the models using MBPNN outperform the basic BPNN, and that applying LSA in our system leads to dramatic dimensionality reduction while achieving good classification results.
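
A minimal sketch of the LSA-plus-neural-network pipeline outlined above: TF-IDF document vectors are projected into a low-dimensional latent semantic space with truncated SVD (LSA) and then classified with a small feed-forward network. The plain MLPClassifier below is an ordinary back-propagation network, not the paper's MBPNN, and the dimensions and dataset are assumptions.

```python
# LSA dimensionality reduction followed by a feed-forward classifier (sketch).
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

model = make_pipeline(
    TfidfVectorizer(max_features=30000),      # high-dimensional VSM representation
    TruncatedSVD(n_components=300),           # LSA: project onto ~300 latent concepts
    MLPClassifier(hidden_layer_sizes=(100,), max_iter=50),  # plain BPNN-style classifier
)
model.fit(train.data, train.target)
print("accuracy:", model.score(test.data, test.target))
```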

115 citations

Journal ArticleDOI
TL;DR: It is pointed out that problems such as nonlinearity, skewed data distributions, the labeling bottleneck, hierarchical categorization, scalability of algorithms, and categorization of Web pages are the key problems in the study of text categorization.
Abstract: In recent years, there have been extensive studies and rapid progress in automatic text categorization, which is one of the hotspots and key techniques in the information retrieval and data mining field. Highlighting the state-of-the-art challenges and research trends in content information processing for the Internet and other complex applications, this paper presents a survey of up-to-date developments in text categorization based on machine learning, including models, algorithms and evaluation. It is pointed out that problems such as nonlinearity, skewed data distributions, the labeling bottleneck, hierarchical categorization, scalability of algorithms, and categorization of Web pages are the key problems in the study of text categorization. Possible solutions to these problems are also discussed. Finally, some future directions of research are given.

112 citations

Proceedings ArticleDOI
01 Oct 2018
TL;DR: A hybrid model of LSTM and CNN is proposed that can effectively improve the accuracy of text classification; the performance of the hybrid model is compared with that of other models in the experiments.
Abstract: Text classification is a classic task in the field of natural language processing. However, existing text classification methods still need improvement because of the complex abstraction of text semantic information and the strong relevance of context. In this paper, we combine the advantages of two traditional neural network models, Long Short-Term Memory (LSTM) and the Convolutional Neural Network (CNN): the LSTM can effectively preserve the characteristics of historical information in long text sequences, while the CNN structure extracts local features of the text. We propose a hybrid model of LSTM and CNN, constructing a CNN on top of the LSTM so that the text feature vectors output by the LSTM are further processed by the CNN structure. The performance of the hybrid model is compared with that of other models in the experiments. The experimental results show that the hybrid model can effectively improve the accuracy of text classification.
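
A minimal PyTorch sketch of an LSTM-then-CNN hybrid in the spirit of the abstract: an LSTM reads the token sequence, and a 1-D convolution with max-pooling extracts local features from the LSTM's hidden states before a linear classifier. Vocabulary size, layer dimensions, kernel size, and the number of classes are illustrative assumptions.

```python
# LSTM -> CNN hybrid text classifier (sketch under the assumptions above).
import torch
import torch.nn as nn

class LSTMCNNClassifier(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=128, hidden_dim=128,
                 num_filters=100, kernel_size=3, num_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.conv = nn.Conv1d(hidden_dim, num_filters, kernel_size)
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, token_ids):                # (batch, seq_len)
        x = self.embed(token_ids)                # (batch, seq_len, embed_dim)
        h, _ = self.lstm(x)                      # (batch, seq_len, hidden_dim)
        h = h.transpose(1, 2)                    # (batch, hidden_dim, seq_len)
        c = torch.relu(self.conv(h))             # local features over LSTM states
        pooled = torch.max(c, dim=2).values      # global max pool over time
        return self.fc(pooled)                   # (batch, num_classes)

# Example forward pass on a random batch of token ids.
logits = LSTMCNNClassifier()(torch.randint(1, 20000, (4, 50)))
print(logits.shape)  # torch.Size([4, 10])
```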

67 citations

Journal ArticleDOI
TL;DR: The MBPNN is proposed to accelerate the training speed of BPNN and improve the categorization accuracy, and the application of LSA for the system can lead to dramatic dimensionality reduction while achieving good classification results.
Abstract: This paper proposes a new text categorization model based on the combination of a modified back-propagation neural network (MBPNN) and latent semantic analysis (LSA). The traditional back-propagation neural network (BPNN) has a slow training speed and is easily trapped in a local minimum, which leads to poor performance and efficiency. In this paper, we propose the MBPNN to accelerate the training of the BPNN and improve the categorization accuracy. LSA can overcome the problems of word-based representations by using statistically derived conceptual indices instead of individual words. It constructs a conceptual vector space in which each term or document is represented as a vector. It not only greatly reduces the dimensionality but also discovers important associative relationships between terms. We test our categorization model on the 20-newsgroup and Reuters-21578 corpora; experimental results show that the MBPNN is much faster than the traditional BPNN and also enhances its performance, and that applying LSA in our system leads to dramatic dimensionality reduction while achieving good classification results.
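
The abstract does not spell out the exact modification, but a common way to speed up plain back-propagation and reduce the chance of getting stuck in shallow local minima is to add a momentum term to the weight update. The sketch below shows only that generic update rule, as an assumption about what a "modified BPNN" could look like, not the paper's actual MBPNN.

```python
# Gradient-descent weight update with a momentum term (illustrative assumption,
# not necessarily the paper's MBPNN modification).
import numpy as np

def backprop_step(W, grad, velocity, lr=0.1, momentum=0.9):
    """One weight update: gradient descent plus an accumulated momentum term."""
    velocity = momentum * velocity - lr * grad   # smooth the descent direction
    return W + velocity, velocity

# Toy usage: minimise f(W) = ||W||^2, whose gradient is 2 * W.
W = np.array([1.0, -2.0])
v = np.zeros_like(W)
for _ in range(200):
    W, v = backprop_step(W, grad=2 * W, velocity=v)
print(W)  # converges toward the minimum at [0, 0]
```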

36 citations

Proceedings ArticleDOI
29 Jul 2010
TL;DR: Building on several commonly used classic text classification algorithms, and focusing mainly on the major feature extraction methods, a short text classification approach based on statistics and rules is proposed and shown to outperform other algorithms.
Abstract: In this paper, we first give an overview of short text research and short text classification. Building on several commonly used classic text classification algorithms, and focusing mainly on the major feature extraction methods, we propose a short text classification approach based on statistics and rules. Experiments show that this algorithm performs better than other algorithms. In order to improve the recall rate of short text classification, a two-step classification method is put forward.
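
A hedged sketch of what a two-step "rules plus statistics" short-text classifier could look like: step one applies hand-written keyword rules, and step two falls back to a statistical classifier when no rule fires, which is one way to trade precision in the first step for recall in the second. The rule table, class labels, training snippets, and naive Bayes fallback are all illustrative assumptions, not the system described in the paper.

```python
# Two-step short-text classification: keyword rules first, statistics as fallback.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

RULES = {                 # keyword -> class label (hypothetical examples)
    "goal": "sports",
    "election": "politics",
    "stock": "finance",
}

train_texts = ["the team scored a late goal", "the election results are in",
               "stock prices fell sharply", "parliament passed the bill"]
train_labels = ["sports", "politics", "finance", "politics"]

fallback = make_pipeline(TfidfVectorizer(), MultinomialNB())
fallback.fit(train_texts, train_labels)

def classify_short_text(text):
    for keyword, label in RULES.items():       # step 1: rule matching
        if keyword in text.lower():
            return label
    return fallback.predict([text])[0]         # step 2: statistical fallback

print(classify_short_text("a stunning goal in extra time"))   # matched by a rule
print(classify_short_text("the senate debate continues"))     # handled statistically
```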

33 citations