Journal Article
A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm
TLDR
Two-stage feature selection and feature extraction is used to improve the performance of text categorization and the proposed model is able to achieve high categorization effectiveness as measured by precision, recall and F-measure.
Abstract:
Text categorization is widely used when organizing documents in digital form. Due to the increasing number of documents in digital form, automated text categorization has become more promising in the last ten years. A major problem of text categorization is its large number of features, most of which are irrelevant noise that can mislead the classifier. Therefore, feature selection is often used in text categorization to reduce the dimensionality of the feature space and to improve performance. In this study, two-stage feature selection and feature extraction are used to improve the performance of text categorization. In the first stage, each term within the document is ranked according to its importance for classification using the information gain (IG) method. In the second stage, genetic algorithm (GA) feature selection and principal component analysis (PCA) feature extraction are applied separately to the terms, which are ranked in decreasing order of importance, and a dimension reduction is carried out. Thereby, during text categorization, terms of less importance are ignored, and the feature selection and extraction methods are applied only to the terms of highest importance; thus, the computational time and complexity of categorization are reduced. To evaluate the effectiveness of the dimension reduction methods in our proposed model, experiments are conducted using the k-nearest neighbour (KNN) and C4.5 decision tree algorithms on the Reuters-21578 and Classic3 dataset collections for text categorization. The experimental results show that the proposed model is able to achieve high categorization effectiveness as measured by precision, recall and F-measure.
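The first stage described in the abstract, ranking terms by information gain and keeping only the highest-ranked ones, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each document is represented as a set of terms (binary presence/absence), and the function names, the toy corpus, and the cutoff `k` are all hypothetical. The second stage (GA or PCA) would then operate only on the retained terms.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """IG of a term, treated as a binary present/absent feature (assumption)."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def rank_terms(docs, labels):
    """Return (score, term) pairs in decreasing order of information gain."""
    vocab = sorted(set().union(*docs))
    scored = [(information_gain(docs, labels, t), t) for t in vocab]
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


def select_top_terms(docs, labels, k):
    """Keep the k highest-ranked terms; the rest are ignored."""
    return [term for _, term in rank_terms(docs, labels)[:k]]


# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"oil", "price", "market"},
    {"oil", "barrel", "market"},
    {"wheat", "harvest", "market"},
    {"wheat", "grain", "market"},
]
labels = ["crude", "crude", "grain", "grain"]

top_terms = select_top_terms(docs, labels, 2)
print(top_terms)  # ['oil', 'wheat']
```

In this toy corpus, "market" appears in every document and therefore has zero information gain, while "oil" and "wheat" separate the two classes perfectly and rank first.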
Citations
Implication of Deep Learning for the automation of Design Patterns Organization
Shahid Hussain, Wai Jacky Keung, Arif Ali Khan, Francesco Piccialli, Adnan khunzada, Salvatore Cuomo, Awais Ahmad, Gwanggil Jeon
TL;DR: In this article, the authors proposed an approach by leveraging a powerful deep learning algorithm named Deep Belief Network (DBN) which learns on the semantic representation of documents formulated in the form of feature vectors and performed a case study in the context of a text categorization based automated system used for the classification and selection of software design patterns.
Proceedings Article
Correlation-Supported Composite Service Reselection
TL;DR: This paper considers task correlations for runtime rebinding by extracting the QoS dependencies among services from the log repository through the APRIORI data mining method and mapping them to the task correlations using the defined mapping rules.
Journal Article
Mother’s Lifestyle Feature Relevance for NICU and Preterm Birth Prediction
Himani Deshpande, Leena Ragha
TL;DR: Of all the features, hypertension, diabetes, PCOS and consumption of outside food during the teenage years are found to be the most relevant for predicting preterm birth and the need for a neonatal intensive care unit (NICU) facility for the newborn.
Journal Article
Cluster Analysis of US COVID-19 Infected States for Vaccine Distribution
TL;DR: This study collects medical indicators for each state in the United States from 2020 to 2021, and through feature selection, each state is clustered according to the epidemic’s severity, showing that the Cascade K-means cluster analysis has the highest accuracy.
References
Book
Genetic algorithms in search, optimization, and machine learning
TL;DR: In this article, the authors present the computer techniques, mathematical tools, and research results that will enable both students and practitioners to apply genetic algorithms to problems in many fields, including computer programming and mathematics.
Book
Adaptation in natural and artificial systems
TL;DR: Names of founding work in the area of adaptation and modification, which aims to mimic biological optimization, and some (non-GA) branches of AI.
Journal Article
Induction of Decision Trees
TL;DR: This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, describes one such system, ID3, in detail, and discusses a reported shortcoming of the basic algorithm.
Journal Article
Nearest neighbor pattern classification
Thomas M. Cover, Peter E. Hart
TL;DR: The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points, so it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.
Journal Article
Term Weighting Approaches in Automatic Text Retrieval
Gerard Salton, Chris Buckley
TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.