Book Chapter, DOI: 10.1007/978-981-13-7082-3_42

A Hybrid Feature Selection Approach for Handling a High-Dimensional Data

01 Jan 2019, pp. 365-373
Abstract: We propose a hybrid feature selection method that combines Mutual Information (MI), a filter method, with Recursive Feature Elimination (RFE), a wrapper method. The methodology combines the strengths of both filter and wrapper methods. The performance of the proposed method is measured on three benchmark datasets (Ionosphere, Libras Movement, and Clean) from the UCI Repository. We compare the classification accuracy of the proposed hybrid method with MI, RFE, and the original features using a random forest classifier. Performance is compared using four classification measures: F1-score, recall, precision, and accuracy. The result analysis shows that the proposed hybrid method outperforms the other methods.
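
A minimal sketch of the kind of two-stage pipeline the abstract describes, written with scikit-learn. The abstract does not give the feature counts, forest settings, or evaluation protocol, so the synthetic data, `N_FILTER`, `N_FINAL`, the tree count, and the 3-fold cross-validation below are all assumptions made for illustration. This is not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a UCI dataset (assumption: not the paper's data).
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=6, n_redundant=4, random_state=0
)

# Hypothetical sizes: the filter keeps 20 features, the wrapper keeps 8 of those.
N_FILTER, N_FINAL = 20, 8


def mi_score(features, target):
    """Mutual information between each feature and the class label."""
    return mutual_info_classif(features, target, random_state=0)


hybrid = Pipeline([
    # Stage 1 (filter): keep the N_FILTER features with the highest MI score.
    ("mi", SelectKBest(mi_score, k=N_FILTER)),
    # Stage 2 (wrapper): repeatedly refit a forest and drop the least
    # important remaining feature until N_FINAL features are left.
    ("rfe", RFE(RandomForestClassifier(n_estimators=30, random_state=0),
                n_features_to_select=N_FINAL, step=1)),
    # Final classifier trained on the surviving features only.
    ("clf", RandomForestClassifier(n_estimators=30, random_state=0)),
])

# Selection happens inside each training fold, so the held-out fold
# never influences which features are chosen.
scores = cross_val_score(hybrid, X, y, cv=3, scoring="accuracy")

# Refit on all data to inspect which original columns survive both stages.
hybrid.fit(X, y)
kept_by_mi = np.flatnonzero(hybrid.named_steps["mi"].get_support())
selected = kept_by_mi[hybrid.named_steps["rfe"].get_support()]
print("mean accuracy:", scores.mean())
print("selected columns:", selected.tolist())
```

Placing both selection stages inside the pipeline is a design choice of this sketch: it avoids choosing features with knowledge of the test fold, which would inflate the reported accuracy.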

Topics: Feature selection (56%), Feature (computer vision) (54%), Mutual information (51%)

Citations

Open access, Journal Article, DOI: 10.1109/ACCESS.2019.2936346
19 Aug 2019, IEEE Access
Abstract: Feature selection has been a significant preprocessing procedure for classification in supervised machine learning. It is mostly applied when the attribute set is very large, since a large set of attributes often tends to misguide the classifier. Extensive research has been performed to increase the efficacy of the predictor by finding the optimal set of features. The feature subset should enhance classification accuracy through the removal of redundant features. We propose a new feature selection mechanism, an amalgamation of the filter and wrapper techniques that takes the benefits of both methods into consideration. Our hybrid model is based on a two-phase process in which we rank the features and then choose the best subset of features based on the ranking. We validated our model on various datasets, using multiple evaluation metrics. Furthermore, we have compared and analyzed our results against previous works. The proposed model outperformed many existing algorithms and gave good results.


Topics: Feature selection (63%), Classifier (UML) (51%), Ranking (50%)

12 Citations


Book Chapter, DOI: 10.1007/978-3-030-35288-2_42
02 Dec 2019
Abstract: Incompleteness is one of the challenging issues in data science. One approach to tackle this issue is using imputation methods to estimate the missing values in incomplete data sets. In spite of the popularity of adopting this approach in several machine learning tasks, it has been rarely investigated in symbolic regression. In this work, a genetic programming (GP) based feature selection and ranking method is proposed and applied to high-dimensional symbolic regression with incomplete data. The main idea is to construct GP programs for each incomplete feature using other features as predictors. The predictors selected by these GP programs are then ranked based on the fitness values of the best constructed GP programs and the frequency of occurrences of the predictors in these programs. The experimental work is conducted on high-dimensional data where the number of features is greater than the number of instances.

Topics: Imputation (statistics) (59%), Symbolic regression (59%), Genetic programming (56%)

7 Citations


Book Chapter, DOI: 10.1007/978-3-030-34094-0_5
01 Jan 2020
Abstract: As data dimensionality increases, many fields, including machine learning, informatics, and medicine, face new challenges. Dimensionality reduction can be considered a basic method for handling high-dimensional data, because reducing the number of dimensions makes many existing operations on the data easier to apply.


5 Citations


Proceedings Article, DOI: 10.1145/3374135.3385285
G S Thejas, Daniel Jimenez, S. Sitharama Iyengar, Jerry Miller, +2 more (3 institutions)
02 Apr 2020
Abstract: When seeking to obtain insights from massive amounts of data, supervised classification problems require preprocessing to optimize computation. Among the various preprocessing steps, feature selection (FS) ensures that machine learning methods receive only relevant data. We propose hybrid FS methods using unsupervised classification, statistical scoring, and a wrapper method. In our tests on twelve datasets, the performance gain of our novel method over existing FS methods represents an advancement in supervised classification.


Topics: Feature selection (58%), Random forest (52%), Cross-validation (51%)

4 Citations


Open access, Journal Article, DOI: 10.1016/J.ARRAY.2020.100033
01 Sep 2020
Abstract: Big Data has received much attention across multiple industry domains. In the digital and computing world, information is generated and collected at a rate that quickly exceeds the boundaries. Traditional data integration systems interconnect a limited number of resources and are built through relatively stable, generally complex, and time-consuming design activities. However, the rapid growth of these large data sets creates difficulties in learning heterogeneous data structures for integration and indexing. It also creates difficulty in information retrieval for the various data analysis requirements. In this paper, a probabilistic feature patterns (PFP) approach using a feature transformation and selection method is proposed for efficient data integration, together with a feature latent semantic analysis (F-LSA) method for indexing the unsupervised multiple heterogeneous integrated cluster data sources. The PFP approach takes advantage of the feature transformation and selection mechanism to map and cluster the data for integration, and analyzes the contextual relations of the data features using LSA to provide an appropriate index for fast and accurate data extraction. A large volume of BibText data from different publication sources is processed and evaluated to assess the effectiveness of the proposal. The analytical study and the results show an improvement in integration and indexing.

Topics: Data integration (60%), Data extraction (57%), Data structure (57%)

3 Citations


References

Open access
01 Jan 2007

17,312 Citations


Journal Article, DOI: 10.1109/TPAMI.2005.159
Hanchuan Peng, Fuhui Long, Chris Ding (1 institution)
Abstract: Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement on feature selection and classification accuracy.
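
The first-order incremental selection mentioned in the abstract can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: features are taken to be discrete, mutual information is estimated from raw counts, relevance and redundancy are combined by subtraction, and the function names `mutual_information` and `mrmr` and the toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Plug-in estimate of I(a; b) in nats for two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def mrmr(columns, labels, k):
    """Greedily pick k feature indices: high relevance, low redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    # Start with the single most relevant feature.
    selected = [max(range(len(columns)), key=relevance.__getitem__)]
    while len(selected) < k:
        best, best_score = None, float("-inf")
        for j in range(len(columns)):
            if j in selected:
                continue
            # Redundancy: mean MI between the candidate and chosen features.
            redundancy = sum(
                mutual_information(columns[j], columns[i]) for i in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


# Toy data: the label is determined jointly by bits a and b.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
noise = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [2 * x + y for x, y in zip(a, b)]
columns = [a, list(a), b, noise]  # column 1 duplicates column 0

print(mrmr(columns, labels, 2))  # prints [0, 2]
```

On this toy data the duplicate column is as relevant as the first one, but its redundancy cancels that relevance, so the second pick is the complementary bit in column 2 rather than the copy.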

Topics: Feature selection (63%), Minimum redundancy feature selection (61%), Feature extraction (55%)

7,109 Citations



Open access, Journal Article, DOI: 10.1023/A:1025667309714
01 Oct 2003, Machine Learning
Abstract: Relief algorithms are general and successful attribute estimators. They are able to detect conditional dependencies between attributes and provide a unified view on attribute estimation in regression and classification. In addition, their quality estimates have a natural interpretation. While they have commonly been viewed as feature subset selection methods applied in a preprocessing step before a model is learned, they have actually been used successfully in a variety of settings, e.g., to select splits or to guide constructive induction in the building phase of decision or regression tree learning, as an attribute weighting method, and in inductive logic programming. A broad spectrum of successful uses calls for especially careful investigation of the various features Relief algorithms have. In this paper we theoretically and empirically investigate and discuss how and why they work, their theoretical and practical properties, their parameters, what kinds of dependencies they detect, how they scale up to large numbers of examples and features, how to sample data for them, how robust they are to noise, how irrelevant and redundant attributes influence their output, and how different metrics influence them.
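
To illustrate what detecting conditional dependencies can look like, here is a minimal sketch of the basic two-class Relief weight update (nearest hit, nearest miss). It is a simplified illustration, not the ReliefF or RReliefF variants the paper analyzes. It assumes features scaled to [0, 1], Manhattan distance, every row used once as the probe instance, and ties broken by row order. The function name `relief` and the XOR toy data are hypothetical.

```python
def relief(rows, labels):
    """Basic Relief feature weights for a two-class problem."""
    n, d = len(rows), len(rows[0])
    weights = [0.0] * d

    def distance(u, v):
        # Manhattan distance; assumes every feature lies in [0, 1].
        return sum(abs(p - q) for p, q in zip(u, v))

    for i, (x, y) in enumerate(zip(rows, labels)):
        others = [j for j in range(n) if j != i]
        # Nearest neighbor with the same class (hit) and the other class (miss).
        hit = min((j for j in others if labels[j] == y),
                  key=lambda j: distance(x, rows[j]))
        miss = min((j for j in others if labels[j] != y),
                   key=lambda j: distance(x, rows[j]))
        for f in range(d):
            # Reward features that differ on the miss, penalize those
            # that differ on the hit.
            weights[f] += (abs(x[f] - rows[miss][f])
                           - abs(x[f] - rows[hit][f])) / n
    return weights


# XOR toy data: the label depends on features 0 and 1 only jointly;
# feature 2 is irrelevant.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [a ^ b for a, b, _ in rows]
weights = relief(rows, labels)
print(weights)
```

On this data neither of the first two features is informative on its own, yet together they receive a total weight of +1 while the irrelevant third feature ends at -1. How the +1 splits between the two depends on tie-breaking among equally near misses.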

Topics: Feature selection (55%), Decision tree (52%), Feature (machine learning) (52%)

2,235 Citations


Journal Article, DOI: 10.1016/J.CHEMOLAB.2006.01.007
Abstract: In this paper we apply the recently introduced Random Forest-Recursive Feature Elimination (RF-RFE) algorithm to the identification of relevant features in the spectra produced by Proton Transfer Reaction-Mass Spectrometry (PTR-MS) analysis of agroindustrial products. The method is compared with the more traditional Support Vector Machine-Recursive Feature Elimination (SVM-RFE), extended to allow multiclass problems, and with a baseline method based on the Kruskal–Wallis statistic (KWS). In particular, we apply all selection methods to the discrimination of nine varieties of strawberries and six varieties of typical cheeses from Trentino Province, North Italy. Using replicated experiments we estimate unbiased generalization errors. Our results show that RF-RFE outperforms SVM-RFE and KWS on the task of finding small subsets of features with high discrimination levels on PTR-MS data sets. We also show how selection probabilities and feature co-occurrence can be used to highlight the most relevant features for discrimination.


334 Citations


Performance
Metrics
No. of citations received by the Paper in previous years
Year  Citations
2022  1
2021  6
2020  4
2019  2
Network Information
Related Papers (5)
01 Jan 2017

David Zhang, Dongmin Guo +1 more

25 Jun 2021

M. Sreedevi, Garikapati Manasa +1 more

22 Dec 2014

Nooshin Taheri, Hossein Nezamabadi-pour
