Author

Li Ronglu

Bio: Li Ronglu is an academic researcher from Fudan University. The author has contributed to research in topics: Classification rule & n-gram. The author has an h-index of 2 and has co-authored 3 publications receiving 54 citations.

Papers
Journal ArticleDOI
Chen Xiao-Yun, Chen Yi, Wang Lei, Li Ronglu, Hu Yunfa 
TL;DR: This study shows that word frequency helps improve the accuracy of association categorization and that a classification rule tree improves its efficiency.

Abstract: The association categorization approach based on frequent patterns has recently been presented; it builds classification rules from the frequent patterns in each category and classifies new text using these rules. However, the method has two shortcomings when applied to text data: it ignores each word's frequency within a text, and the rule pruning used to improve classification efficiency causes a marked drop in accuracy when large numbers of rules are generated. Therefore, a text categorization algorithm based on frequent patterns with term frequency is presented. This study shows that word frequency helps improve the accuracy of association categorization and that a classification rule tree improves its efficiency. Experimental results show that this association classification outperforms three typical text classification methods, Bayes, kNN (k-nearest neighbor), and SVM (support vector machines), making it a promising text classification method.
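The rule-based classification with term-frequency weighting described above might look like the following minimal sketch. The rule format, the example rules, and the scoring function are illustrative assumptions, not the paper's exact algorithm:

```python
# Minimal sketch of association-based text classification weighted by
# term frequency. Rules, confidences, and scoring are hypothetical.
from collections import Counter

# Each rule: (frozenset of words, category, confidence) - assumed format.
RULES = [
    (frozenset({"goal", "match"}), "sports", 0.9),
    (frozenset({"stock", "market"}), "finance", 0.8),
    (frozenset({"match", "market"}), "finance", 0.4),
]

def classify(text, rules=RULES):
    """Score each category by its matched rules, weighting each rule's
    confidence by the frequency of its terms in the text."""
    tf = Counter(text.lower().split())
    scores = Counter()
    for pattern, category, conf in rules:
        if all(word in tf for word in pattern):
            weight = sum(tf[word] for word in pattern)  # term-frequency weight
            scores[category] += conf * weight
    return scores.most_common(1)[0][0] if scores else None

print(classify("the stock market fell as the market opened"))  # finance
```

Weighting by term frequency is the paper's key addition over plain rule matching: a rule whose terms occur often in the document contributes more to its category's score.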

7 citations

Book ChapterDOI
Ma Haibing1, Wang Chen1, Li Ronglu1, Liu Yong1, Hu Yunfa1 
29 Mar 2005
TL;DR: TG, an efficient pattern-growth algorithm for mining frequent embedded subtrees in a forest of rooted, labeled, and ordered trees, is presented; TG is found to outperform TreeMiner, one of the fastest previously proposed methods, by a factor of 4 to 15.

Abstract: Methods for mining frequent trees are widely used in domains such as bioinformatics, web mining, and chemical compound structure mining. In this paper, we present TG, an efficient pattern-growth algorithm for mining frequent embedded subtrees in a forest of rooted, labeled, and ordered trees. It uses a rightmost-path expansion scheme to construct the complete pattern growth space, and creates a projected database for every grow point of the pattern ready to grow. The problem is thereby transformed from mining frequent trees to finding frequent nodes in the projected database. We conduct detailed experiments to test its performance and scalability and find that TG outperforms TreeMiner, one of the fastest methods proposed before, by a factor of 4 to 15.
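The rightmost-path expansion idea can be sketched as follows: a new node may only be attached to nodes on the rightmost root-to-leaf path of the current pattern, which enumerates each ordered tree exactly once. The (label, children-list) tree encoding is an assumption for illustration; this is a sketch of the expansion scheme, not the TG algorithm itself:

```python
# Sketch of rightmost-path expansion for ordered, labeled trees.
# A tree is encoded as (label, [children]) - a hypothetical encoding.
import copy

def rightmost_path(tree):
    """Return the nodes along the rightmost root-to-leaf path."""
    path = [tree]
    node = tree
    while node[1]:                 # while the node has children
        node = node[1][-1]          # descend into the rightmost child
        path.append(node)
    return path

def expansions(tree, new_label):
    """All one-node extensions attaching new_label to a rightmost-path node."""
    results = []
    for depth in range(len(rightmost_path(tree))):
        t = copy.deepcopy(tree)     # copy, then append a leaf at the grow point
        node = t
        for _ in range(depth):
            node = node[1][-1]
        node[1].append((new_label, []))
        results.append(t)
    return results
```

Restricting growth to the rightmost path is what makes the candidate space complete without duplicates; in TG each such grow point additionally gets its own projected database, which is not modeled here.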

1 citation


Cited by
Journal ArticleDOI
TL;DR: Experimental results show that the models using MBPNN outperform the basic BPNN, and that applying LSA to the system leads to dramatic dimensionality reduction while achieving good classification results.

Abstract: New text categorization models using a back-propagation neural network (BPNN) and a modified back-propagation neural network (MBPNN) are proposed. An efficient feature selection method is used to reduce the dimensionality as well as improve the performance. The basic BPNN learning algorithm has the drawback of slow training speed, so we modify it to accelerate training; categorization accuracy also improves as a result. Traditional word-matching-based text categorization uses the vector space model (VSM) to represent documents. However, the VSM needs a high-dimensional space and does not take into account the semantic relationships between terms, which can also lead to poor classification accuracy. Latent semantic analysis (LSA) can overcome these problems by using statistically derived conceptual indices instead of individual words. It constructs a conceptual vector space in which each term or document is represented as a vector. This not only greatly reduces the dimensionality but also discovers important associative relationships between terms. We test our categorization models on the 20-newsgroup data set; experimental results show that the models using MBPNN outperform the basic BPNN, and that the application of LSA in our system leads to dramatic dimensionality reduction while achieving good classification results.
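The LSA dimensionality-reduction step described above amounts to a truncated SVD of the term-document matrix. A minimal sketch, assuming a toy count matrix (the paper's preprocessing and feature selection are not reproduced here):

```python
# Minimal LSA sketch: truncated SVD of a toy term-document matrix.
import numpy as np

# Rows = terms, columns = documents (illustrative counts).
X = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 2, 2],
], dtype=float)

k = 2  # number of latent dimensions to keep
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Each document becomes a k-dimensional vector in the concept space.
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T

print(doc_vectors.shape)  # (4, 2): 4 documents in a 2-dimensional space
```

Classifiers (such as the BPNN/MBPNN above) are then trained on these low-dimensional document vectors instead of the raw high-dimensional term counts.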

115 citations

Journal ArticleDOI
TL;DR: It is pointed out that problems such as nonlinearity, skewed data distribution, the labeling bottleneck, hierarchical categorization, algorithm scalability, and the categorization of Web pages are the key problems in the study of text categorization.

Abstract: In recent years, there have been extensive studies and rapid progress in automatic text categorization, which is one of the hotspots and key techniques in the information retrieval and data mining fields. Highlighting the state-of-the-art challenges and research trends in content information processing for the Internet and other complex applications, this paper presents a survey of recent developments in machine-learning-based text categorization, covering models, algorithms, and evaluation. It is pointed out that problems such as nonlinearity, skewed data distribution, the labeling bottleneck, hierarchical categorization, algorithm scalability, and the categorization of Web pages are the key problems in the study of text categorization. Possible solutions to these problems are also discussed. Finally, some future research directions are given.

112 citations

Journal ArticleDOI
TL;DR: A novel deep neural network model, the Attention-Based BiGRU-CNN network (ABBC), is proposed; it combines the characteristics and advantages of convolutional neural networks, attention mechanisms, and recurrent neural networks, and achieves the best performance on the Chinese question classification task.

Abstract: Chinese question classification is one of the essential tasks in natural language processing (NLP) for Chinese, owing to the language's distinctive characteristics. Methods in the literature are usually based on rules or traditional machine learning, which require manually created rules or features, so classification accuracy is constrained by the inherent limitations of these methods. As deep learning-based methods have been shown to mine deep information from text, this article proposes a novel deep neural network model, the Attention-Based BiGRU-CNN network (ABBC), and applies it to the Chinese question classification task. The model combines the characteristics and advantages of convolutional neural networks, attention mechanisms, and recurrent neural networks. Our model can not only extract the features of Chinese questions effectively but also learn the context information of words, addressing the Text-CNN model's loss of positional features. Comparing our model with four other classic models, the experimental results show that our model achieves the best performance on the Chinese question classification task.
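The "attention" part of such a model typically pools the recurrent hidden states into a single context vector. A minimal NumPy sketch of attention pooling over BiGRU-style outputs; the shapes and the scoring vector are illustrative assumptions, not the ABBC architecture:

```python
# Sketch of attention pooling over a sequence of recurrent hidden states.
import numpy as np

def attention_pool(H, w):
    """H: (T, d) hidden states for T time steps; w: (d,) learned scoring
    vector (assumed). Returns a softmax-weighted sum of the hidden states."""
    scores = H @ w                       # one scalar score per time step
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                 # softmax over time steps
    return alpha @ H                     # (d,) context vector

T, d = 5, 8
H = np.random.default_rng(0).standard_normal((T, d))  # stand-in for BiGRU outputs
w = np.ones(d)                                        # stand-in scoring vector
c = attention_pool(H, w)
print(c.shape)  # (8,)
```

In the full model, the context vector (possibly combined with convolutional features) would feed a final classification layer; here only the pooling step is shown.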

73 citations

Proceedings ArticleDOI
01 Oct 2018
TL;DR: A hybrid model of LSTM and CNN is proposed that can effectively improve the accuracy of text classification; its performance is compared with that of other models in experiments.

Abstract: Text classification is a classic task in the field of natural language processing. However, existing text classification methods still need improvement because of the complex abstraction of text semantic information and the strong relevance of context. In this paper, we combine the advantages of two traditional neural network models: Long Short-Term Memory (LSTM) and the Convolutional Neural Network (CNN). LSTM can effectively preserve the characteristics of historical information in long text sequences, while the CNN structure extracts local features of the text. We propose a hybrid model of LSTM and CNN, constructing the CNN on top of the LSTM so that the text feature vectors output by the LSTM are further processed by the CNN structure. The performance of the hybrid model is compared with that of other models in the experiment. The experimental results show that the hybrid model can effectively improve the accuracy of text classification.
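The "CNN on top of LSTM outputs" idea can be sketched as a 1-D convolution with max-over-time pooling applied to a sequence of hidden vectors. The LSTM itself is stubbed with random vectors here, and the filter shapes are assumptions, so this shows only the data flow, not the paper's trained model:

```python
# Sketch of a 1-D convolution + max-over-time pooling over LSTM outputs.
import numpy as np

def conv1d_maxpool(H, filters, width=3):
    """H: (T, d) sequence of hidden vectors; filters: (f, width*d) kernels.
    Slides each filter over windows of `width` time steps, then takes the
    max over time, returning an (f,)-dim feature vector."""
    T, d = H.shape
    windows = np.stack([H[t:t + width].ravel() for t in range(T - width + 1)])
    feature_maps = windows @ filters.T          # (T - width + 1, f)
    return feature_maps.max(axis=0)             # max pooling over time

rng = np.random.default_rng(1)
H = rng.standard_normal((10, 16))               # stand-in for LSTM outputs
F = rng.standard_normal((4, 3 * 16))            # 4 filters of width 3
print(conv1d_maxpool(H, F).shape)  # (4,)
```

The pooled feature vector would then feed a softmax classifier; in the full hybrid model both the LSTM and the filters are learned jointly by back-propagation.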

67 citations

Journal ArticleDOI
TL;DR: The MBPNN is proposed to accelerate the training speed of the BPNN and improve categorization accuracy; applying LSA to the system leads to dramatic dimensionality reduction while achieving good classification results.

Abstract: This paper proposes a new text categorization model based on the combination of a modified back-propagation neural network (MBPNN) and latent semantic analysis (LSA). The traditional back-propagation neural network (BPNN) trains slowly and easily becomes trapped in a local minimum, leading to poor performance and efficiency. In this paper, we propose the MBPNN to accelerate the training of the BPNN and improve categorization accuracy. LSA can overcome the problems of word-based representations by using statistically derived conceptual indices instead of individual words. It constructs a conceptual vector space in which each term or document is represented as a vector; this not only greatly reduces the dimensionality but also discovers important associative relationships between terms. We test our categorization model on the 20-newsgroup and Reuters-21578 corpora; experimental results show that the MBPNN is much faster than the traditional BPNN and also improves its performance, and that the application of LSA in our system leads to dramatic dimensionality reduction while achieving good classification results.

36 citations