Journal ArticleDOI

Feature Selection Based on Structured Sparsity: A Comprehensive Study

TL;DR: This paper not only compares the differences and commonalities of these methods in terms of their regression and regularization strategies, but also provides useful guidelines for practitioners in related fields on how to perform feature selection.
Abstract: Feature selection (FS) is an important component of many pattern recognition tasks. In these tasks, one is often confronted with very high-dimensional data. FS algorithms are designed to identify the relevant feature subset from the original features, which can facilitate subsequent analysis, such as clustering and classification. Structured sparsity-inducing feature selection (SSFS) methods have been widely studied in the last few years, and a number of algorithms have been proposed. However, there is no comprehensive study concerning the connections between different SSFS methods, and how they have evolved. In this paper, we attempt to provide a survey on various SSFS methods, including their motivations and mathematical representations. We then explore the relationship among different formulations and propose a taxonomy to elucidate their evolution. We group the existing SSFS methods into two categories, i.e., vector-based feature selection (feature selection based on the lasso) and matrix-based feature selection (feature selection based on the $l_{r,p}$-norm). Furthermore, FS has been combined with other machine learning algorithms for specific applications, such as multitask learning, multilabel learning, multiview learning, classification, and clustering. This paper not only compares the differences and commonalities of these methods in terms of their regression and regularization strategies, but also provides useful guidelines for practitioners in related fields on how to perform feature selection.
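The survey's two categories can be made concrete with a small sketch. The following is a minimal illustration (not from the paper) of both families on synthetic data: vector-based selection via the lasso, and matrix-based selection via an $l_{2,1}$-norm penalty whose proximal step shrinks whole rows of the weight matrix. All data shapes and regularization values are arbitrary choices.

```python
# Minimal sketch (not from the paper): the two SSFS families the survey
# distinguishes -- vector-based (lasso) vs. matrix-based (l_{2,1}-norm) FS.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))          # n examples, d features
y = X[:, :5] @ rng.standard_normal(5)       # only the first 5 features matter
Y = np.column_stack([y, 2 * y])             # multi-output targets for the matrix case

# Vector-based: the lasso drives individual coefficients to exactly zero.
w = Lasso(alpha=0.1).fit(X, y).coef_
lasso_rank = np.argsort(-np.abs(w))

# Matrix-based: proximal gradient on (1/2)||XW - Y||_F^2 + lam * ||W||_{2,1};
# the l_{2,1} prox shrinks whole rows of W, selecting features jointly.
W = np.zeros((X.shape[1], Y.shape[1]))
step, lam = 1.0 / np.linalg.norm(X, 2) ** 2, 5.0
for _ in range(500):
    G = W - step * X.T @ (X @ W - Y)        # gradient step on the squared loss
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    W = np.maximum(0, 1 - step * lam / np.maximum(norms, 1e-12)) * G  # row-wise prox
l21_rank = np.argsort(-np.linalg.norm(W, axis=1))

print(lasso_rank[:5], l21_rank[:5])         # both should recover features 0..4
```
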
Citations
Journal ArticleDOI
TL;DR: A new discriminative correlation filter (DCF) based tracking method is proposed that enables joint spatial-temporal filter learning in a lower-dimensional discriminative manifold by applying structured spatial sparsity constraints to multi-channel filters.
Abstract: With efficient appearance learning models, discriminative correlation filter (DCF) has been proven to be very successful in recent video object tracking benchmarks and competitions. However, the existing DCF paradigm suffers from two major issues, i.e., spatial boundary effect and temporal filter degradation. To mitigate these challenges, we propose a new DCF-based tracking method. The key innovations of the proposed method include adaptive spatial feature selection and temporal consistent constraints, with which the new tracker enables joint spatial-temporal filter learning in a lower dimensional discriminative manifold. More specifically, we apply structured spatial sparsity constraints to multi-channel filters. Consequently, the process of learning spatial filters can be approximated by the lasso regularization. To encourage temporal consistency, the filter model is restricted to lie around its historical value and updated locally to preserve the global structure in the manifold. Last, a unified optimization framework is proposed to jointly select temporal consistency preserving spatial features and learn discriminative filters with the augmented Lagrangian method. Qualitative and quantitative evaluations have been conducted on a number of well-known benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123, and VOT2018. The experimental results demonstrate the superiority of the proposed method over the state-of-the-art approaches.
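As a concrete anchor for the "approximated by the lasso regularization" step above, here is a hedged sketch of element-wise soft-thresholding (the lasso proximal operator) applied to a multi-channel filter. The filter shape and threshold are hypothetical, and this is not the authors' augmented-Lagrangian solver.

```python
# Hedged sketch: soft-thresholding, the operation the abstract says
# approximates the structured spatial sparsity constraint on the filters.
import numpy as np

def soft_threshold(F, tau):
    """Element-wise lasso prox: shrink filter coefficients toward zero."""
    return np.sign(F) * np.maximum(np.abs(F) - tau, 0.0)

filters = np.random.randn(31, 31, 10)        # H x W x channels, hypothetical size
sparse_filters = soft_threshold(filters, tau=0.5)

# Spatial positions whose coefficients survive across channels act as the
# "selected" spatial features; the rest are suppressed by the sparsity prior.
active = np.abs(sparse_filters).sum(axis=2) > 0
print(active.mean())                         # fraction of retained spatial cells
```
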

233 citations

Journal ArticleDOI
TL;DR: This work designs a triple-band absorber using the REACTIVE method, in which a deep learning model computes the metasurface structure automatically from the desired absorption rate given as input.
Abstract: Metasurfaces provide unprecedented routes to manipulating electromagnetic waves and can realize many exotic functionalities. Despite the rapid development of metasurfaces in recent years, the design process is still time-consuming and computationally expensive, and it remains complicated for layman users because plenty of specialized knowledge is required. In this work, a metasurface design method named REACTIVE is proposed on the basis of deep learning, since deep learning has shown natural advantages in mining undefined rules automatically in many fields. REACTIVE is capable of calculating a metasurface structure directly from a given design target; it also makes the design process automatic, more efficient, and less demanding in both time and computational resources. Moreover, it requires less specialized knowledge, so that engineers need only attend to the design target. Herein, a triple-band absorber is designed using the REACTIVE method, where a deep learning model computes the metasurface structure automatically from the desired absorption rate given as input. The whole design process is 200 times faster than the conventional one, which convincingly demonstrates the superiority of this design method. REACTIVE is an effective design tool, especially for layman users and engineers.
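The inverse-design workflow described above (design target in, structure out) can be sketched generically. The toy regressor below stands in for REACTIVE, whose actual architecture is not specified here; every shape, dataset, and name is hypothetical.

```python
# Hedged sketch of the inverse-design idea (not REACTIVE's actual network):
# a regressor maps a desired absorption spectrum to structure parameters.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
structures = rng.uniform(0, 1, (1000, 8))            # 8 geometric parameters
proj = rng.standard_normal((8, 64))                  # toy stand-in "simulator"
spectra = np.abs(np.sin(structures @ proj))          # 64-point absorption spectra

inverse_model = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=500)
inverse_model.fit(spectra, structures)               # learn: spectrum -> structure

target_spectrum = np.abs(np.sin(rng.uniform(0, 1, 8) @ proj))
predicted_structure = inverse_model.predict(target_spectrum.reshape(1, -1))
print(predicted_structure.round(2))                  # candidate design to verify in a solver
```
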

228 citations

Journal ArticleDOI
TL;DR: A novel semisupervised feature selection framework is proposed that mines correlations among multiple tasks and is applied to different multimedia applications; it outperforms other state-of-the-art feature selection algorithms.
Abstract: In this paper, we propose a novel semisupervised feature selection framework by mining correlations among multiple tasks and apply it to different multimedia applications. Instead of independently computing the importance of features for each task, our algorithm leverages shared knowledge from multiple related tasks, thus improving the performance of feature selection. Note that the proposed algorithm is built upon the assumption that different tasks share some common structures. The proposed algorithm selects features in a batch mode, by which the correlations between various features are taken into consideration. Besides, considering that labeling a large amount of training data in the real world is both time-consuming and tedious, we adopt manifold learning, which exploits both labeled and unlabeled training data for feature space analysis. Since the objective function is nonsmooth and difficult to solve, we propose an iterative algorithm with fast convergence. Extensive experiments on different applications demonstrate that our algorithm outperforms other state-of-the-art feature selection algorithms.
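A hedged sketch of the manifold-learning ingredient: build a neighborhood graph over labeled and unlabeled examples, then score each feature by its smoothness over the graph Laplacian. This is a generic construction, not the paper's exact objective; sizes and the neighbor count are arbitrary.

```python
# Minimal sketch (assumptions, not the paper's algorithm): the manifold term
# used in semisupervised FS -- a graph Laplacian over both labeled and
# unlabeled examples, so feature scores respect the data geometry.
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.randn(200, 30)                     # labeled + unlabeled examples
A = kneighbors_graph(X, n_neighbors=5, mode="connectivity", include_self=False)
A = 0.5 * (A + A.T)                              # symmetrize the kNN graph
L = np.diag(np.asarray(A.sum(axis=1)).ravel()) - A.toarray()  # unnormalized Laplacian

# Smoothness of feature j over the graph: f_j^T L f_j (smaller = smoother).
smoothness = np.einsum('ij,jk,ik->i', X.T, L, X.T)
print(np.argsort(smoothness)[:10])               # features most consistent with the manifold
```
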

210 citations

Journal ArticleDOI
TL;DR: A comprehensive survey on the state-of-the-art works applying swarm intelligence to achieve feature selection in classification, with a focus on the representation and search mechanisms.
Abstract: One of the major problems in Big Data is the large number of features or dimensions, which causes “the curse of dimensionality” when applying machine learning, especially classification algorithms. Feature selection is an important technique that selects small and informative feature subsets to improve learning performance. Feature selection is not an easy task due to its large and complex search space. Recently, swarm intelligence techniques have gained much attention from the feature selection community because of their simplicity and potential global search ability. However, there has been no comprehensive survey on swarm intelligence for feature selection in classification, which is the most widely investigated area in feature selection. The few existing short surveys lack in-depth discussion of state-of-the-art methods and of the strengths and limitations of existing methods, particularly in terms of the representation and search mechanisms, which are two key components in adapting swarm intelligence to feature selection problems. This paper presents a comprehensive survey on the state-of-the-art works applying swarm intelligence to achieve feature selection in classification, with a focus on the representation and search mechanisms. The expectation is to present an overview of different kinds of state-of-the-art approaches together with their advantages and disadvantages, encourage researchers to investigate more advanced methods, provide practitioners with guidance on choosing appropriate methods for real-world scenarios, and discuss potential limitations and issues for future research.
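The "representation" this survey emphasizes is typically a binary mask per swarm particle. Below is a minimal binary-PSO-style loop (a generic sketch, not any specific published variant) where each particle encodes a feature subset and fitness is cross-validated accuracy; all hyperparameters are illustrative.

```python
# Hedged sketch: a swarm particle encodes a candidate feature subset as a
# binary mask, scored by a classifier. Minimal binary-PSO-style search.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=0)
rng = np.random.default_rng(0)
n_particles, d = 10, X.shape[1]
pos = rng.random((n_particles, d)) < 0.5           # boolean masks = feature subsets
vel = np.zeros((n_particles, d))

def fitness(mask):
    """Cross-validated accuracy of a k-NN classifier on the selected features."""
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
for _ in range(20):
    gbest = pbest[pbest_fit.argmax()].astype(float)
    vel = (0.7 * vel
           + rng.random((n_particles, d)) * (pbest.astype(float) - pos.astype(float))
           + rng.random((n_particles, d)) * (gbest - pos.astype(float)))
    pos = rng.random((n_particles, d)) < 1.0 / (1.0 + np.exp(-vel))  # sigmoid transfer
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]

print(np.flatnonzero(pbest[pbest_fit.argmax()]))   # selected feature indices
```
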

202 citations

References
Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso is proposed, which minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant.
Abstract: We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
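To make the key property concrete: the sum-of-absolute-values constraint produces exact zeros via soft-thresholding. A minimal coordinate-descent sketch, assuming the penalized (Lagrangian) formulation rather than the constrained one stated in the abstract:

```python
# Hedged illustration: the l1 penalty zeroes some coefficients exactly.
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent on (1/2n)||y - Xw||^2 + lam * ||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]           # residual excluding feature j
            rho = X[:, j] @ r / n
            z = (X[:, j] @ X[:, j]) / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z  # soft threshold
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(100)
w = lasso_cd(X, y, lam=0.1)
print(np.flatnonzero(w))        # typically only features 0 and 1 survive
```
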

40,785 citations


"Feature Selection Based on Structur..." refers background in this paper

  • ...An interesting way to cope with feature selection in the learning by examples framework is to resort to regularization techniques based on the $l_1$ penalty [37], [38]....


Journal ArticleDOI
TL;DR: It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like the LARS algorithm does for the lasso.
Abstract: We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case.
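The grouping effect can be demonstrated in a few lines with scikit-learn: with two nearly identical predictors, the lasso will often keep only one, while the elastic net retains both with similar coefficients. The regularization values below are arbitrary.

```python
# Hedged sketch of the grouping effect described in the abstract above.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
x1 = rng.standard_normal(200)
x2 = x1 + 0.01 * rng.standard_normal(200)         # near-duplicate of x1
X = np.column_stack([x1, x2, rng.standard_normal((200, 5))])
y = x1 + x2 + 0.1 * rng.standard_normal(200)

print(Lasso(alpha=0.1).fit(X, y).coef_[:2])                     # often one near-zero entry
print(ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y).coef_[:2])  # both retained
```
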

16,538 citations


"Feature Selection Based on Structur..." refers methods in this paper

  • ...To handle features with strong correlations, elastic net regularization [46] is proposed as...


Journal ArticleDOI
22 Dec 2000-Science
TL;DR: Locally linear embedding (LLE) is introduced, an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs and learns the global structure of nonlinear manifolds.
Abstract: Many areas of science depend on exploratory data analysis and visualization. The need to analyze large amounts of multivariate data raises the fundamental problem of dimensionality reduction: how to discover compact representations of high-dimensional data. Here, we introduce locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. Unlike clustering methods for local dimensionality reduction, LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations do not involve local minima. By exploiting the local symmetries of linear reconstructions, LLE is able to learn the global structure of nonlinear manifolds, such as those generated by images of faces or documents of text.
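A minimal usage sketch with scikit-learn's implementation of LLE (not the authors' original code); the swiss-roll data and neighbor count are illustrative.

```python
# Hedged sketch: LLE as described above, via scikit-learn.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)   # nonlinear 3-D manifold
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Z = lle.fit_transform(X)          # neighborhood-preserving 2-D embedding
print(Z.shape, lle.reconstruction_error_)
```
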

15,106 citations


"Feature Selection Based on Structur..." refers background in this paper

  • ...The first term of (23), arg minZ Z T =Im×m tr(Z L Z T ), is exactly the same as [91], which is to find the lowdimensional embedding of each example....


Journal ArticleDOI
TL;DR: The contributions of this special issue cover a wide range of aspects of variable selection: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
Abstract: Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing of internet documents, gene expression array analysis, and combinatorial chemistry. The objective of variable selection is three-fold: improving the prediction performance of the predictors, providing faster and more cost-effective predictors, and providing a better understanding of the underlying process that generated the data. The contributions of this special issue cover a wide range of aspects of such problems: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
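Among the aspects listed above, feature ranking is the simplest to illustrate: score each feature independently, then keep the top-k. A hedged sketch using mutual information as the (arbitrarily chosen) ranking criterion; multivariate methods go beyond this by scoring subsets jointly.

```python
# Hedged sketch of univariate feature ranking.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=40, n_informative=6, random_state=0)
scores = mutual_info_classif(X, y, random_state=0)  # score each feature independently
top_k = np.argsort(-scores)[:6]
print(top_k)                      # indices of the highest-ranked features
```
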

14,509 citations


"Feature Selection Based on Structur..." refers background in this paper

  • ...E. $l_{2,0}$-Norm Regularized/Constrained Feature Selection: Sparse feature selection [71] selects features by solving a smoothed general loss function with an $l_{2,0}$-norm constraint....


  • ...(22) 4) Feature Selection via Joint Embedding Learning and Sparse Regression: Instead of regressing each example to its label [26], [86], [88], the objective of joint embedding learning and sparse regression (JELSR) [89], [90] is to regress each example $X_i$ to its low-dimensional embedding $Z_i \in \mathbb{R}^m$, where $m$ is the dimensionality of the embedding....


  • ...C. $l_{2,1}$-Norm Regularized/Constrained Feature Selection: 1) Efficient and Robust Feature Selection via Joint $l_{2,1}$-Norm Minimization: Nie et al. [26] aim to learn a linear function $y = x^T W + b$, such that for $n$ training examples, $Y_i \approx X_i^T W + b$, i.e., $\min_{W,b} \|Y_i - X_i^T W - b\|_2$.


  • ...To solve this issue, feature selection (also known as feature ranking, subset, or variable selection) [1]–[7] techniques are designed to select a subset of features from the high-dimensional feature set for a compact and accurate data representation....


  • ...6) Unsupervised Feature Selection: Maximum margin criterion (MMC) [95], [96] is a supervised subspace method, a variant of linear discriminant analysis (LDA)....

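The $l_{2,1}$ formulations quoted in these excerpts are commonly solved by iterative reweighting. The sketch below uses a simplified variant (squared loss plus an $l_{2,1}$ penalty, rather than Nie et al.'s $l_{2,1}$ loss) to show how whole rows of $W$ shrink and features are then ranked by row norm; all sizes are arbitrary.

```python
# Hedged sketch: iteratively reweighted least squares for
# min_W ||XW - Y||_F^2 + lam * ||W||_{2,1} (simplified variant).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 40))
Y = X[:, :4] @ rng.standard_normal((4, 3))        # 3 targets, 4 relevant features
lam, d = 10.0, X.shape[1]

D = np.eye(d)
for _ in range(30):
    W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)     # closed-form reweighted step
    row_norms = np.maximum(np.linalg.norm(W, axis=1), 1e-8)
    D = np.diag(1.0 / (2.0 * row_norms))                # small rows get heavier penalty

print(np.argsort(-np.linalg.norm(W, axis=1))[:4])  # should recover features 0..3
```
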

Journal ArticleDOI
TL;DR: This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups and the total similarity within the groups.
Abstract: We propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features and their consistencies in the image data, our approach aims at extracting the global impression of an image. We treat image segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. The normalized cut criterion measures both the total dissimilarity between the different groups as well as the total similarity within the groups. We show that an efficient computational technique based on a generalized eigenvalue problem can be used to optimize this criterion. We applied this approach to segmenting static images, as well as motion sequences, and found the results to be very encouraging.
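The generalized-eigenvalue route the abstract describes can be sketched directly: relax the normalized cut to $(D - W)v = \lambda D v$ and bipartition by the sign of the second-smallest eigenvector (the Fiedler vector). Point data stands in for image pixels here, and the affinity bandwidth is arbitrary.

```python
# Hedged sketch: spectral relaxation of the normalized cut.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
sq = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / 0.5)                      # Gaussian affinity between "pixels"
np.fill_diagonal(W, 0)
D = np.diag(W.sum(axis=1))

# Solve the generalized eigenproblem (D - W) v = lambda * D v.
vals, vecs = eigh(D - W, D)
labels = (vecs[:, 1] > 0).astype(int)      # sign of the Fiedler vector bipartitions
print(labels)                              # two blocks of 30, up to label swap
```
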

13,789 citations