Open Access Proceedings Article

Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models

Tong Zhang
Vol. 21, pp. 1921-1928
TL;DR
This work proposes a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial, and proves strong theoretical results showing that this procedure is effective in learning sparse representations.
Abstract
Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with non-zero coefficients and reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial. We prove strong theoretical results showing that this procedure is effective in learning sparse representations. Experimental results support our theory.
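To make the procedure concrete, here is a minimal sketch of the forward-backward idea described in the abstract, for a least-squares objective. The stopping threshold `epsilon`, the `backward_ratio` criterion, and the iteration cap are illustrative assumptions, not the paper's exact conditions or constants.

```python
import numpy as np

def least_squares_error(X, y, support):
    """Squared error of the least-squares fit restricted to `support`."""
    if not support:
        return float(y @ y)
    Xs = X[:, sorted(support)]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ coef
    return float(resid @ resid)

def foba(X, y, epsilon=1e-3, backward_ratio=0.5, max_iter=100):
    """Forward-backward greedy selection (illustrative sketch).

    Forward: add the feature giving the largest error reduction; stop when
    the reduction falls below `epsilon`.  Backward: after each forward step,
    repeatedly drop the selected feature whose removal increases the error
    by less than `backward_ratio` times the last forward gain.
    """
    n, d = X.shape
    support = set()
    err = least_squares_error(X, y, support)
    for _ in range(max_iter):
        # Forward step: try every unselected feature, keep the best one.
        best_j, best_err = None, err
        for j in range(d):
            if j not in support:
                e = least_squares_error(X, y, support | {j})
                if e < best_err:
                    best_j, best_err = j, e
        gain = err - best_err
        if best_j is None or gain < epsilon:
            break
        support.add(best_j)
        err = best_err
        # Adaptive backward steps: remove features that have become nearly useless.
        while len(support) > 1:
            drop_j, drop_err = None, np.inf
            for j in support:
                e = least_squares_error(X, y, support - {j})
                if e < drop_err:
                    drop_j, drop_err = j, e
            if drop_err - err < backward_ratio * gain:
                support.remove(drop_j)
                err = drop_err
            else:
                break
    return sorted(support)
```

The returned index set can then be refit by ordinary least squares on the selected columns to obtain the sparse coefficient estimate.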



Citations
Book

Machine Learning: A Probabilistic Perspective

TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Journal Article

Data-driven discovery of partial differential equations.

TL;DR: In this paper, the authors propose a sparse regression method for discovering the governing partial differential equation(s) of a given system from time series measurements in the spatial domain. The method relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models.
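As a rough illustration of the sparse-regression idea summarized above (a sequential thresholded least-squares stand-in, not the paper's exact procedure), the sketch below fits u_t = Theta @ xi and prunes small coefficients. The candidate library `Theta` (columns such as u, u_x, u_xx, u*u_x evaluated on the data) and the threshold value are assumed inputs for illustration.

```python
import numpy as np

def sparse_pde_fit(Theta, ut, threshold=0.1, n_iter=10):
    """Sequential thresholded least squares: fit u_t = Theta @ xi, then
    repeatedly zero out coefficients smaller than `threshold` and refit on
    the surviving candidate terms.  Nonzero entries of xi identify the
    terms of the recovered PDE."""
    xi, *_ = np.linalg.lstsq(Theta, ut, rcond=None)
    for _ in range(n_iter):
        keep = np.abs(xi) >= threshold
        if not keep.any():
            return np.zeros_like(xi)
        xi = np.zeros_like(xi)
        xi[keep], *_ = np.linalg.lstsq(Theta[:, keep], ut, rcond=None)
    return xi
```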
Journal Article

Analysis of Multi-stage Convex Relaxation for Sparse Regularization

TL;DR: A multi-stage convex relaxation scheme is presented for solving problems with non-convex objective functions arising from sparse regularization, and it is shown that the local solution obtained by this procedure is superior to the global solution of the standard L1 convex relaxation for learning sparse targets.
Posted Content

Multi-Label Prediction via Compressed Sensing

TL;DR: In this paper, the authors develop a general theory for a variant of the error-correcting output code scheme that uses ideas from compressed sensing to exploit output sparsity; the approach can be regarded as a simple reduction from multi-label regression problems to binary regression problems.
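A rough sketch of that reduction, assuming a Gaussian sensing matrix and ridge regression as the base learner (both illustrative choices rather than the paper's prescriptions): labels are compressed to a few random measurements at training time and recovered with a sparse solver at prediction time.

```python
import numpy as np
from sklearn.linear_model import Ridge, OrthogonalMatchingPursuit

def train_cs_multilabel(X, Y, m, seed=0):
    """Compress d-dimensional sparse label vectors to m << d random
    measurements and fit one (multi-output) regressor to the compressed
    targets."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, Y.shape[1])) / np.sqrt(m)  # sensing matrix
    Z = Y @ A.T                                            # compressed labels
    reg = Ridge(alpha=1.0).fit(X, Z)
    return A, reg

def predict_cs_multilabel(A, reg, x, sparsity):
    """Predict the compressed measurements for one example, then recover a
    sparse label vector with orthogonal matching pursuit."""
    z_hat = reg.predict(x.reshape(1, -1)).ravel()
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=sparsity,
                                    fit_intercept=False)
    omp.fit(A, z_hat)
    return omp.coef_  # nonzero entries mark the predicted labels
```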
Posted Content

Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection

TL;DR: The submodularity ratio is introduced as a key quantity to help understand why greedy algorithms perform well even when the variables are highly correlated, and it is shown to be a stronger predictor of the performance of greedy algorithms than other spectral parameters.
References
Journal Article

Greed is good: algorithmic results for sparse approximation

TL;DR: This article presents new results on using a greedy algorithm, orthogonal matching pursuit (OMP), to solve the sparse approximation problem over redundant dictionaries and develops a sufficient condition under which OMP can identify atoms from an optimal approximation of a nonsparse signal.
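For reference, a bare-bones version of the OMP loop described here (unit-norm dictionary columns and a target sparsity k >= 1 are assumed):

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: greedily pick the dictionary atom most
    correlated with the current residual, then re-solve least squares on
    all selected atoms and update the residual."""
    support, coef = [], np.zeros(D.shape[1])
    residual = y.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # best-matching atom
        if j not in support:
            support.append(j)
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef[support] = sol
    return coef
```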
Journal Article

On Model Selection Consistency of Lasso

TL;DR: It is proved that a single condition, which is called the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model both in the classical fixed p setting and in the large p setting as the sample size n gets large.
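For concreteness, the condition can be checked numerically for a given design matrix and an assumed true support. The sketch below evaluates the standard quantity max_j |C[S^c, S] @ inv(C[S, S]) @ sign(beta_S)|, which should stay below 1; this is the usual formulation, not the paper's exact constants.

```python
import numpy as np

def irrepresentable_value(X, support, sign_beta):
    """Return max_j |C[S^c, S] @ inv(C[S, S]) @ sign(beta_S)| with
    C = X.T @ X / n.  A value strictly below 1 indicates the (strong)
    irrepresentable condition holds for this design and sign pattern."""
    n, p = X.shape
    S = np.asarray(support)
    Sc = np.setdiff1d(np.arange(p), S)
    C = X.T @ X / n
    v = C[np.ix_(Sc, S)] @ np.linalg.solve(C[np.ix_(S, S)], sign_beta)
    return float(np.max(np.abs(v)))
```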
Journal Article

Simultaneous analysis of Lasso and Dantzig selector

TL;DR: In this article, the authors show that the Lasso estimator and the Dantzig selector exhibit similar behavior under a sparsity scenario, and they derive, in parallel, oracle inequalities for the prediction risk in the general nonparametric regression model, as well as bounds on the ℓ_p estimation loss for 1 ≤ p ≤ 2.