IEEE Transactions on Information Theory, Vol. 57, No. 8, August 2011
Nonconcave Penalized Likelihood With NP-Dimensionality

Jianqing Fan and Jinchi Lv
Abstract—Penalized likelihood methods are fundamental to ultrahigh dimensional variable selection. How high dimensionality such methods can handle remains largely unknown. In this paper, we show that in the context of generalized linear models, such methods possess model selection consistency with oracle properties even for dimensionality of nonpolynomial (NP) order of sample size, for a class of penalized likelihood approaches using folded-concave penalty functions, which were introduced to ameliorate the bias problems of convex penalty functions. This fills a long-standing gap in the literature where the dimensionality is allowed to grow slowly with the sample size. Our results are also applicable to penalized likelihood with the $L_1$-penalty, which is a convex function at the boundary of the class of folded-concave penalty functions under consideration. The coordinate optimization is implemented for finding the solution paths, whose performance is evaluated by a few simulation examples and a real data analysis.

Index Terms—Coordinate optimization, folded-concave penalty, high dimensionality, Lasso, nonconcave penalized likelihood, oracle property, SCAD, variable selection, weak oracle property.
I. INTRODUCTION

The analysis of data sets with the number of variables comparable to or much larger than the sample size frequently arises nowadays in many fields ranging from genomics and health sciences to economics and machine learning. The data that we collect are usually of the type $(\mathbf{x}_i, y_i)_{i=1}^n$, where the $y_i$'s are independent observations of the response variable given its covariates, or explanatory variables, $\mathbf{x}_i \in \mathbb{R}^p$. Generalized linear models (GLMs) provide a flexible parametric approach to estimating the covariate effects (McCullagh and Nelder, 1989). In this paper we consider the variable selection problem of nonpolynomial (NP) dimensionality in the context of GLMs. By NP-dimensionality we mean that $\log p = O(n^a)$ for some $a \in (0, 1)$. See Fan and Lv (2010) for an overview of recent developments in high dimensional variable selection.

Manuscript received January 13, 2010; revised February 23, 2011; accepted March 02, 2011. J. Fan was supported in part by NSF Grants DMS-0704337 and DMS-0714554 and in part by NIH Grant R01-GM072611 from the National Institute of General Medical Sciences. J. Lv was supported in part by NSF CAREER Award DMS-0955316, in part by NSF Grant DMS-0806030, and in part by the 2008 Zumberge Individual Award from USC's James H. Zumberge Faculty Research and Innovation Fund. J. Fan is with the Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: jqfan@princeton.edu). J. Lv is with the Information and Operations Management Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089 USA (e-mail: jinchilv@marshall.usc.edu). Digital Object Identifier 10.1109/TIT.2011.2158486
We denote by $X = (\mathbf{x}_1, \ldots, \mathbf{x}_p)$ the $n \times p$ design matrix with $\mathbf{x}_j = (x_{1j}, \ldots, x_{nj})^T$, and by $\mathbf{y} = (Y_1, \ldots, Y_n)^T$ the $n$-dimensional response vector. Throughout the paper we consider a deterministic design matrix. With a canonical link, the conditional distribution of $\mathbf{y}$ given $X$ belongs to the canonical exponential family, having the following density function with respect to some fixed measure:

$$f_n(\mathbf{y}; X, \boldsymbol{\beta}) = \prod_{i=1}^n f_0(y_i; \theta_i) = \prod_{i=1}^n c(y_i) \exp\left[\frac{y_i \theta_i - b(\theta_i)}{\phi}\right] \quad (1)$$

where $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)^T$ is an unknown $p$-dimensional vector of regression coefficients, $\{f_0(y; \theta) : \theta \in \mathbb{R}\}$ is a family of distributions in the regular exponential family with dispersion parameter $\phi \in (0, \infty)$, and $(\theta_1, \ldots, \theta_n)^T = X\boldsymbol{\beta}$. As is common in GLM, the function $b(\theta)$ is implicitly assumed to be twice continuously differentiable with $b''(\theta)$ always positive. In the sparse modeling, we assume that the majority of the true regression coefficients $\boldsymbol{\beta}_0 = (\beta_{0,1}, \ldots, \beta_{0,p})^T$ are exactly zero. Without loss of generality, assume that $\boldsymbol{\beta}_0 = (\boldsymbol{\beta}_1^T, \boldsymbol{\beta}_2^T)^T$ with each component of $\boldsymbol{\beta}_1$ nonzero and $\boldsymbol{\beta}_2 = \mathbf{0}$. Hereafter we refer to the support $\mathrm{supp}(\boldsymbol{\beta}_0) = \{1, \ldots, s\}$ as the true underlying sparse model of the indices. Variable selection aims at locating those predictors with nonzero regression coefficients and giving an efficient estimate of $\boldsymbol{\beta}_1$.

In view of (1), the log-likelihood $\log f_n(\mathbf{y}; X, \boldsymbol{\beta})$ of the sample is given, up to an affine transformation, by

$$\ell_n(\boldsymbol{\beta}) = n^{-1}\left[\mathbf{y}^T X \boldsymbol{\beta} - \mathbf{1}^T \mathbf{b}(X\boldsymbol{\beta})\right] \quad (2)$$

where $\mathbf{b}(\boldsymbol{\theta}) = (b(\theta_1), \ldots, b(\theta_n))^T$ for $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_n)^T$. We consider the following penalized likelihood:

$$Q_n(\boldsymbol{\beta}) = \ell_n(\boldsymbol{\beta}) - \sum_{j=1}^p p_\lambda(|\beta_j|) \quad (3)$$

where $p_\lambda(\cdot)$ is a penalty function and $\lambda \ge 0$ is a regularization parameter.
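To make the objective concrete, here is a minimal numerical sketch of (2) and (3) for the special case of logistic regression, where $b(\theta) = \log(1 + e^\theta)$ and $\phi = 1$. The function names are ours, and the default $L_1$ penalty can be swapped for any penalty in the class considered below.

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Log-likelihood (2) for logistic regression, up to an affine
    transformation: n^{-1} [y^T X beta - 1^T b(X beta)],
    with b(theta) = log(1 + exp(theta)) and dispersion phi = 1."""
    theta = X @ beta
    return (y @ theta - np.logaddexp(0.0, theta).sum()) / len(y)

def penalized_likelihood(beta, X, y, lam,
                         penalty=lambda t, lam: lam * np.abs(t)):
    """Penalized likelihood Q_n(beta) in (3); the default penalty is
    the L1 penalty p_lambda(t) = lam * |t|."""
    return log_likelihood(beta, X, y) - penalty(beta, lam).sum()
```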
In a pioneering paper, Fan and Li (2001) build the theoretical foundation of nonconcave penalized likelihood for variable selection. The penalty functions that they used are not arbitrary nonconvex functions, but rather folded-concave functions. For this reason, we will call them more precisely folded-concave penalties. The paper also introduces the oracle property for model selection. An estimator $\hat{\boldsymbol{\beta}} = (\hat{\boldsymbol{\beta}}_1^T, \hat{\boldsymbol{\beta}}_2^T)^T$ is said to have the oracle property (Fan and Li, 2001) if it enjoys the model selection consistency in the sense of $\hat{\boldsymbol{\beta}}_2 = \mathbf{0}$ with probability tending to 1 as $n \to \infty$, and it attains an information bound mimicking that of the oracle estimator, where $\hat{\boldsymbol{\beta}}_1$ is a subvector of $\hat{\boldsymbol{\beta}}$ formed by its first $s$ components and the oracle knew the true model $\mathrm{supp}(\boldsymbol{\beta}_0)$ ahead of time. Fan and Li (2001) study the oracle properties of nonconcave penalized likelihood estimators in the finite-dimensional setting. Their results were extended later by Fan and Peng (2004) to the setting of $p = o(n^{1/5})$ or $o(n^{1/3})$ in a general likelihood framework.
How large can the dimensionality $p$ be, compared with the sample size $n$, such that the oracle property continues to hold in penalized likelihood estimation? What role does the penalty function play? In this paper, we provide an answer to these long-standing questions for a class of penalized likelihood methods using folded-concave penalties in the context of GLMs with NP-dimensionality. We also characterize the nonasymptotic weak oracle property and the global optimality of the nonconcave penalized maximum likelihood estimator. Our theory applies to the $L_1$-penalty as well, but its conditions are far more stringent than those for other members of the class. These constitute the main theoretical contributions of the paper.
Numerous efforts have lately been devoted to studying the properties of variable selection with ultrahigh dimensionality, and significant progress has been made. Meinshausen and Bühlmann (2006), Zhao and Yu (2006), and Zhang and Huang (2008) investigate the issue of model selection consistency for LASSO under different setups when the number of variables is of a greater order than the sample size. Candes and Tao (2007) introduce the Dantzig selector to handle the NP-dimensional variable selection problem, which was shown to behave similarly to Lasso by Bickel et al. (2009). Zhang (2010) is among the first to study the nonconvex penalized least-squares estimator with NP-dimensionality and demonstrates its advantages over LASSO. He also developed the PLUS algorithm to find the solution path that has the desired sampling properties. Fan and Lv (2008) and Huang et al. (2008) introduce the independence screening procedure to reduce the dimensionality in the context of least-squares. The former establishes the sure screening property with NP-dimensionality and the latter also studies the bridge regression, a folded-concave penalty approach. Hall and Miller (2009) introduce feature ranking using a generalized correlation, and Hall et al. (2009) propose independence screening using tilting methods and empirical likelihood. Fan and Fan (2008) investigate the impact of dimensionality on ultrahigh dimensional classification and establish an oracle property for features annealed independence rules. Lv and Fan (2009) make important connections between model selection and sparse recovery using folded-concave penalties and establish a nonasymptotic weak oracle property for the penalized least squares estimator with NP-dimensionality. There are also a number of important papers on establishing the oracle inequalities for penalized empirical risk minimization. For example, Bunea et al. (2007) establish sparsity oracle inequalities for the Lasso under quadratic loss in the context of least-squares; van de Geer (2008) obtains a nonasymptotic oracle inequality for the empirical risk minimizer with the $L_1$-penalty in the context of GLMs; Koltchinskii (2008) proves oracle inequalities for penalized least squares with entropy penalization.
The rest of the paper is organized as follows. In Section II, we discuss the choice of penalty functions and characterize the nonconcave penalized likelihood estimator and its global optimality. We study the nonasymptotic weak oracle properties and oracle properties of the nonconcave penalized likelihood estimator in Sections III and IV, respectively. Section V introduces a coordinate optimization algorithm, the iterative coordinate ascent (ICA) algorithm, to solve regularization problems with concave penalties. In Section VI, we present three numerical examples using both simulated and real data sets. We provide some discussions of our results and their implications in Section VII. Proofs are presented in Section VIII. Technical details are relegated to the Appendix.
II. NONCONCAVE PENALIZED LIKELIHOOD ESTIMATION

In this section, we discuss the choice of penalty functions in regularization methods and characterize the nonconcave penalized likelihood estimator as well as its global optimality.
A. Penalty Function

For any penalty function $p_\lambda(t)$, $t \ge 0$, let $\rho(t; \lambda) = \lambda^{-1} p_\lambda(t)$. For simplicity, we will drop its dependence on $\lambda$ and write it as $\rho(t)$ when there is no confusion. Many penalty functions have been proposed in the literature for regularization. For example, the best subset selection amounts to using the $L_0$ penalty. The ridge regression uses the $L_2$ penalty. The $L_q$ penalty for $q \in (0, 2)$ bridges these two cases (Frank and Friedman, 1993). Breiman (1995) introduces the non-negative garrote for shrinkage estimation and variable selection. Lasso (Tibshirani, 1996) uses the $L_1$-penalized least squares. The SCAD penalty (Fan, 1997; Fan and Li, 2001) is the function whose derivative is given by

$$p'_\lambda(t) = \lambda \left\{ I(t \le \lambda) + \frac{(a\lambda - t)_+}{(a - 1)\lambda} I(t > \lambda) \right\}, \quad a > 2 \quad (4)$$

where often $a = 3.7$ is used, and MCP (Zhang, 2010) is defined through $p'_\lambda(t) = (a\lambda - t)_+ / a$. Clearly the SCAD penalty takes off at the origin as the $L_1$ penalty and then levels off, and MCP translates the flat part of the derivative of SCAD to the origin. A family of folded-concave penalties that bridge the $L_0$ and $L_1$ penalties was studied by Lv and Fan (2009).
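As a quick reference implementation, the derivative in (4) and the MCP derivative can be coded directly; this is a sketch with our own function names, using the common choice $a = 3.7$ for SCAD as a default.

```python
import numpy as np

def scad_deriv(t, lam, a=3.7):
    """SCAD derivative (4): lam * { I(t <= lam)
       + (a*lam - t)_+ / ((a-1)*lam) * I(t > lam) }, for t >= 0, a > 2."""
    t = np.abs(t)
    return lam * np.where(t <= lam, 1.0,
                          np.maximum(a * lam - t, 0.0) / ((a - 1.0) * lam))

def mcp_deriv(t, lam, a=2.0):
    """MCP derivative: (a*lam - t)_+ / a; the flat part of the SCAD
    derivative translated to the origin."""
    return np.maximum(a * lam - np.abs(t), 0.0) / a
```

Plotting these two derivatives against $t$ makes the statement above visible: the SCAD derivative stays at $\lambda$ near the origin before decaying, while the MCP derivative starts decaying immediately from its value $\lambda$ at $t = 0$.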
Hereafter we consider penalty functions $p_\lambda(\cdot)$ that satisfy the following condition:

Condition 1: $\rho(t; \lambda)$ is increasing and concave in $t \in [0, \infty)$, and has a continuous derivative $\rho'(t; \lambda)$ with $\rho'(0+; \lambda) > 0$. In addition, $\rho'(t; \lambda)$ is increasing in $\lambda \in (0, \infty)$ and $\rho'(0+; \lambda)$ is independent of $\lambda$.

The above class of penalty functions has been considered by Lv and Fan (2009). Clearly the $L_1$ penalty is a convex function that falls at the boundary of the class of penalty functions satisfying Condition 1. Fan and Li (2001) advocate penalty functions that give estimators with three desired properties: unbiasedness, sparsity and continuity, and provide insights into them (see also Antoniadis and Fan, 2001). SCAD satisfies Condition 1 and the above three properties simultaneously. The $L_1$ penalty and MCP also satisfy Condition 1, but the $L_1$ penalty does not enjoy the unbiasedness due to its constant rate of penalty, and MCP violates the continuity property. However, our results are applicable to the $L_1$-penalized and MCP regression. Condition 1 is needed for establishing the oracle properties of the nonconcave penalized likelihood estimator.
B. Nonconcave Penalized Likelihood Estimator

It is generally difficult to study the global maximizer of the penalized likelihood analytically without concavity. As is common in the literature, we study the behavior of local maximizers.

We introduce some notation to simplify our presentation. For any $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_n)^T$, define

$$\boldsymbol{\mu}(\boldsymbol{\theta}) = (b'(\theta_1), \ldots, b'(\theta_n))^T \quad \text{and} \quad \Sigma(\boldsymbol{\theta}) = \mathrm{diag}\{b''(\theta_1), \ldots, b''(\theta_n)\}. \quad (5)$$

It is known that the $n$-dimensional response vector $\mathbf{y}$ following the distribution in (1) has mean vector $\boldsymbol{\mu}(X\boldsymbol{\beta}_0)$ and covariance matrix $\phi\Sigma(X\boldsymbol{\beta}_0)$. For a vector $\mathbf{v} = (v_1, \ldots, v_q)^T$, let $\bar{p}'_\lambda(\mathbf{v}) = (p'_\lambda(|v_1|)\,\mathrm{sgn}(v_1), \ldots, p'_\lambda(|v_q|)\,\mathrm{sgn}(v_q))^T$, where $\mathrm{sgn}(\cdot)$ denotes the sign function. We denote by $\|\cdot\|_q$ the $L_q$ norm of a vector or matrix for $q \in [1, \infty]$. Following Lv and Fan (2009) and Zhang (2010), define the local concavity of the penalty $p_\lambda$ at $\mathbf{v} = (v_1, \ldots, v_q)^T$ as

$$\kappa(p_\lambda; \mathbf{v}) = \lim_{\epsilon \to 0+} \max_{1 \le j \le q} \sup_{t_1 < t_2 \in (|v_j| - \epsilon,\, |v_j| + \epsilon)} -\frac{p'_\lambda(t_2) - p'_\lambda(t_1)}{t_2 - t_1}. \quad (6)$$

By the concavity of $p_\lambda$ in Condition 1, we have $\kappa(p_\lambda; \mathbf{v}) \ge 0$. It is easy to show by the mean-value theorem that $\kappa(p_\lambda; \mathbf{v}) = \max_{1 \le j \le q} -p''_\lambda(|v_j|)$ provided that the second derivative of $p_\lambda$ is continuous. For the SCAD penalty, $\kappa(p_\lambda; \mathbf{v}) = 0$ unless some component of $|\mathbf{v}|$ takes values in $[\lambda, a\lambda]$. In the latter case, $\kappa(p_\lambda; \mathbf{v}) = (a - 1)^{-1}$.
Throughout the paper, we use $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ to represent the smallest and largest eigenvalues of a symmetric matrix, respectively.
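The local concavity (6) can be approximated numerically by finite differences of $p'_\lambda$ near each $|v_j|$. The sketch below is ours and is only meant to illustrate the definition; for SCAD it returns approximately $(a - 1)^{-1}$ when some $|v_j|$ falls in $[\lambda, a\lambda]$, and 0 otherwise.

```python
import numpy as np

def local_concavity(p_deriv, v, lam, eps=1e-4):
    """Finite-difference approximation of the local concavity (6):
    the maximal downward slope of p'_lambda near each |v_j|."""
    kappa = 0.0
    for vj in np.abs(np.asarray(v, dtype=float)):
        t = np.linspace(max(vj - eps, 0.0), vj + eps, 5)
        slopes = np.diff(p_deriv(t, lam)) / np.diff(t)
        kappa = max(kappa, -slopes.min())
    return kappa

# Example with scad_deriv from the sketch above (a = 3.7, lam = 0.3):
# local_concavity(scad_deriv, [0.5], lam=0.3)  ->  approx. 1/(3.7 - 1)
```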
The following theorem gives a sufficient condition on the strict local maximizer of the penalized likelihood $Q_n(\boldsymbol{\beta})$ in (3).

Theorem 1 (Characterization of PMLE): Assume that $p_\lambda$ satisfies Condition 1. Then $\hat{\boldsymbol{\beta}} \in \mathbb{R}^p$ is a strict local maximizer of the nonconcave penalized likelihood $Q_n(\boldsymbol{\beta})$ defined by (3) if

$$X_1^T\left[\mathbf{y} - \boldsymbol{\mu}(X_1\hat{\boldsymbol{\beta}}_1)\right] - n\,\bar{p}'_\lambda(\hat{\boldsymbol{\beta}}_1) = \mathbf{0} \quad (7)$$

$$\left\|X_2^T\left[\mathbf{y} - \boldsymbol{\mu}(X_1\hat{\boldsymbol{\beta}}_1)\right]\right\|_\infty < n\,p'_\lambda(0+) \quad (8)$$

$$\lambda_{\min}\left[X_1^T \Sigma(X_1\hat{\boldsymbol{\beta}}_1) X_1\right] > n\,\kappa(p_\lambda; \hat{\boldsymbol{\beta}}_1) \quad (9)$$

where $X_1$ and $X_2$ respectively denote the submatrices of $X$ formed by columns in $\mathrm{supp}(\hat{\boldsymbol{\beta}})$ and its complement, and $\hat{\boldsymbol{\beta}}_1$ is a subvector of $\hat{\boldsymbol{\beta}}$ formed by all nonzero components. On the other hand, if $\hat{\boldsymbol{\beta}}$ is a local maximizer of $Q_n(\boldsymbol{\beta})$, then it must satisfy (7)–(9) with strict inequalities replaced by nonstrict inequalities.

There is only a tiny gap (nonstrict versus strict inequalities) between the necessary condition for a local maximizer and the sufficient condition for a strict local maximizer. Conditions (7) and (9) ensure that $\hat{\boldsymbol{\beta}}$ is a strict local maximizer of (3) when constrained on the subspace $\{\boldsymbol{\beta} \in \mathbb{R}^p : \boldsymbol{\beta}_{\mathrm{supp}(\hat{\boldsymbol{\beta}})^c} = \mathbf{0}\}$ of $\mathbb{R}^p$, where $\boldsymbol{\beta}_{\mathrm{supp}(\hat{\boldsymbol{\beta}})^c}$ denotes the subvector of $\boldsymbol{\beta}$ formed by components in the complement of $\mathrm{supp}(\hat{\boldsymbol{\beta}})$. Condition (8) makes sure that the sparse vector $\hat{\boldsymbol{\beta}}$ is indeed a strict local maximizer of (3) on the whole space $\mathbb{R}^p$.
When $p_\lambda$ is the $L_1$ penalty, the penalized likelihood function $Q_n(\boldsymbol{\beta})$ in (3) is concave in $\boldsymbol{\beta}$. Then the classical convex optimization theory applies to show that $\hat{\boldsymbol{\beta}} \in \mathbb{R}^p$ is a global maximizer if and only if there exists a subgradient $\mathbf{z} \in \partial\|\hat{\boldsymbol{\beta}}\|_1$ such that

$$n^{-1} X^T\left[\mathbf{y} - \boldsymbol{\mu}(X\hat{\boldsymbol{\beta}})\right] - \lambda\mathbf{z} = \mathbf{0} \quad (10)$$

that is, it satisfies the Karush-Kuhn-Tucker (KKT) conditions, where the subdifferential of the $L_1$ penalty is given by $z_j = \mathrm{sgn}(\hat{\beta}_j)$ for $\hat{\beta}_j \ne 0$ and $z_j \in [-1, 1]$ for $\hat{\beta}_j = 0$. Thus, condition (10) reduces to (7) and (8) with the strict inequality replaced by a nonstrict inequality. Since $\kappa(p_\lambda; \hat{\boldsymbol{\beta}}_1) = 0$ for the $L_1$-penalty, condition (9) holds provided that $X_1^T \Sigma(X_1\hat{\boldsymbol{\beta}}_1) X_1$ is nonsingular. However, to ensure that $\hat{\boldsymbol{\beta}}$ is the strict maximizer we need the strict inequality in (8).
C. Global Optimality

A natural question is when the nonconcave penalized maximum likelihood estimator (NCPMLE) $\hat{\boldsymbol{\beta}}$ is a global maximizer of the penalized likelihood $Q_n(\boldsymbol{\beta})$. We characterize such a property from two perspectives.

1) Global Optimality: Assume that the $n \times p$ design matrix $X$ has a full column rank $p$. This implies that $p \le n$. Since $b''$ is always positive, it is easy to show that the Hessian matrix of $-\ell_n(\boldsymbol{\beta})$ is always positive definite, which entails that the log-likelihood function $\ell_n(\boldsymbol{\beta})$ is strictly concave in $\boldsymbol{\beta}$. Thus, there exists a unique maximizer of $\ell_n(\boldsymbol{\beta})$. Let $S_c = \{\boldsymbol{\beta} \in \mathbb{R}^p : Q_n(\boldsymbol{\beta}) \ge c\}$ for some $c \in \mathbb{R}$, and let $\kappa(p_\lambda)$ be the maximum concavity of the penalty function $p_\lambda$, defined by taking the supremum in (6) over all $0 < t_1 < t_2 < \infty$. For the $L_1$ penalty, SCAD and MCP, we have $\kappa(p_\lambda) = 0$, $(a - 1)^{-1}$, and $a^{-1}$, respectively. The following proposition gives a sufficient condition on the global optimality of the NCPMLE.

Proposition 1 (Global Optimality): Assume that $X$ has rank $p$ and satisfies

$$\min_{\boldsymbol{\beta} \in S_c} \lambda_{\min}\left[n^{-1} X^T \Sigma(X\boldsymbol{\beta}) X\right] \ge \kappa(p_\lambda). \quad (11)$$

Then the NCPMLE $\hat{\boldsymbol{\beta}}$ is a global maximizer of the penalized likelihood $Q_n(\boldsymbol{\beta})$ if $\hat{\boldsymbol{\beta}} \in S_c$.

Note that for penalized least-squares, (11) reduces to

$$\lambda_{\min}\left(n^{-1} X^T X\right) \ge \kappa(p_\lambda). \quad (12)$$

This condition holds for sufficiently large $a$ in SCAD and MCP, when the correlation between covariates is not too strong. The latter holds for design matrices constructed by using spline bases to approximate a nonparametric function. According to Proposition 1, under (12), any local minimizer of the penalized least-squares with folded-concave penalty is a global minimizer.
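Under our reading of (12), the check is a one-liner on the Gram matrix. This sketch assumes the reconstruction of (11)–(12) given above, with $\kappa(p_\lambda) = (a - 1)^{-1}$ for SCAD and $a^{-1}$ for MCP.

```python
import numpy as np

def satisfies_12(X, a, penalty="scad"):
    """Check lambda_min(n^{-1} X^T X) >= kappa(p_lambda): our reading
    of condition (12) for penalized least squares."""
    kappa = 1.0 / (a - 1.0) if penalty == "scad" else 1.0 / a
    n = X.shape[0]
    return np.linalg.eigvalsh(X.T @ X / n).min() >= kappa
```

Since $\kappa(p_\lambda)$ decreases as $a$ grows, the check succeeds for sufficiently large $a$ whenever the standardized covariates are not too strongly correlated, matching the remark above.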
The proposition below gives a condition under which the penalty term in (3) does not change the global maximizer. It will be used to derive the condition under which the PMLE is the same as the oracle estimator in Proposition 3(b). Here for simplicity we consider the SCAD penalty $p_\lambda$ given by (4), and the technical arguments are applicable to other folded-concave penalties as well.
Proposition 2 (Robustness): Assume that $X$ has rank $p$ with $p \le n$ and that the minimum eigenvalue condition of Proposition 1 holds with some margin. Then the SCAD penalized likelihood estimator $\hat{\boldsymbol{\beta}}$ is the global maximizer of $Q_n(\boldsymbol{\beta})$ and equals the unpenalized maximum likelihood estimator when the minimum signal of the latter is sufficiently large relative to $a\lambda$.
2) Restricted Global Optimality: When $p > n$, it is hard to show the global optimality of a local maximizer. However, we can study the global optimality of the NCPMLE $\hat{\boldsymbol{\beta}}$ on the union of coordinate subspaces. A subspace of $\mathbb{R}^p$ is called a coordinate subspace if it is spanned by a subset of the natural basis $\{\mathbf{e}_1, \ldots, \mathbf{e}_p\}$, where each $\mathbf{e}_j$ is the $p$-vector with $j$th component 1 and 0 elsewhere. Here each $\mathbf{e}_j$ corresponds to the $j$th predictor. We will investigate the global optimality of $\hat{\boldsymbol{\beta}}$ on the union $\mathbb{S}_s$ of all $s$-dimensional coordinate subspaces of $\mathbb{R}^p$ in Proposition 3(a).
Of particular interest is to derive the conditions under which the PMLE is also an oracle estimator, in addition to possessing the above restricted global optimality on $\mathbb{S}_s$. To this end, we introduce an identifiability condition on the true model $\mathrm{supp}(\boldsymbol{\beta}_0)$. The true model is called $\delta$-identifiable for some $\delta > 0$ if

$$\ell_n(\boldsymbol{\beta}_0) \ge \sup\left\{\ell_n(\boldsymbol{\beta}) : \boldsymbol{\beta} \in \mathbb{S}_s,\ \mathrm{supp}(\boldsymbol{\beta}) \ne \mathrm{supp}(\boldsymbol{\beta}_0)\right\} + \delta. \quad (13)$$

In other words, $\mathrm{supp}(\boldsymbol{\beta}_0)$ is the best subset of size $s$, with a margin at least $\delta$. The following proposition is an easy consequence of Propositions 1 and 2.
Proposition 3 (Global Optimality on $\mathbb{S}_s$):

a) If the conditions of Proposition 1 are satisfied for each $n \times s$ submatrix of $X$, then the NCPMLE $\hat{\boldsymbol{\beta}}$ is a global maximizer of $Q_n(\boldsymbol{\beta})$ on $\mathbb{S}_s$.

b) Assume that the conditions of Proposition 2 are satisfied for the $n \times s$ submatrix of $X$ formed by columns in $\mathrm{supp}(\boldsymbol{\beta}_0)$, and that the true model is $\delta$-identifiable for some sufficiently large $\delta > 0$. Then the SCAD penalized likelihood estimator $\hat{\boldsymbol{\beta}}$ is the global maximizer on $\mathbb{S}_s$ and equals the oracle maximum likelihood estimator.

On the event that the PMLE is the same as the oracle estimator, it of course possesses the oracle property.
III. NONASYMPTOTIC WEAK ORACLE PROPERTIES

In this section, we study a nonasymptotic property of the nonconcave penalized likelihood estimator $\hat{\boldsymbol{\beta}}$, called the weak oracle property, introduced by Lv and Fan (2009) in the setting of penalized least squares. The weak oracle property means sparsity in the sense of $\hat{\boldsymbol{\beta}}_2 = \mathbf{0}$ with probability tending to 1 as $n \to \infty$, and consistency under the $L_\infty$ loss, where $\hat{\boldsymbol{\beta}} = (\hat{\boldsymbol{\beta}}_1^T, \hat{\boldsymbol{\beta}}_2^T)^T$ and $\hat{\boldsymbol{\beta}}_2$ is a subvector of $\hat{\boldsymbol{\beta}}$ formed by components in $\mathrm{supp}(\boldsymbol{\beta}_0)^c$. This property is weaker than the oracle property introduced by Fan and Li (2001).
A. Regularity Conditions

As mentioned before, we condition on the design matrix $X$ and use the penalty in the class satisfying Condition 1. Let $X_1$ and $X_2$ respectively be the submatrices of the design matrix $X$ formed by columns in $\mathrm{supp}(\boldsymbol{\beta}_0)$ and its complement. To simplify the presentation, we assume without loss of generality that each covariate $\mathbf{x}_j$ has been standardized so that $\|\mathbf{x}_j\|_2 = \sqrt{n}$. If the covariates have not been standardized, the results still hold with $\|\mathbf{x}_j\|_2$ assumed to be in the order of $\sqrt{n}$. Let

$$d_n = \frac{1}{2}\min\{|\beta_{0,j}| : \beta_{0,j} \ne 0\} \quad (14)$$

be half of the minimum signal. We make the following assumptions on the design matrix and the distribution of the response. Let $b_s$ be a diverging sequence of positive numbers that depends on the nonsparsity size $s$ and hence depends on $n$. Recall that $\boldsymbol{\beta}_1$ is the vector of nonvanishing components of the true parameter $\boldsymbol{\beta}_0$.

Condition 2: The design matrix $X$ satisfies

$$\left\|\left[X_1^T \Sigma(X_1\boldsymbol{\beta}_1) X_1\right]^{-1}\right\|_\infty = O(b_s n^{-1}) \quad (15)$$

$$\left\|X_2^T \Sigma(X_1\boldsymbol{\beta}_1) X_1 \left[X_1^T \Sigma(X_1\boldsymbol{\beta}_1) X_1\right]^{-1}\right\|_\infty \le \min\left\{C\,\frac{\rho'(0+)}{\rho'(d_n)},\ O(n^{\alpha_1})\right\} \quad (16)$$

for some $C \in (0, 1)$ and $\alpha_1 \ge 0$, together with a third, technical smoothness condition (17), in which the derivative is taken componentwise and $\circ$ denotes the Hadamard (componentwise) product. Here the $L_\infty$ norm of a matrix is the maximum of the $L_1$ norm of each row.
Here and below, $b_s$ is associated with the regularization parameter $\lambda$ satisfying (18) unless specified otherwise. For the classical Gaussian linear regression model, we have $b''(\theta) = 1$ and thus $\Sigma(X_1\boldsymbol{\beta}_1) = I_n$, so the left-hand side of (15) becomes $\|(X_1^T X_1)^{-1}\|_\infty$. In this case, since we will assume that $s = o(n)$, condition (15) usually holds with $b_s = O(\sqrt{s})$. In fact, Wainwright (2009) shows that this rate is attained if the rows of $X_1$ are i.i.d. Gaussian vectors. In general, since the diagonal entries of $\Sigma(X_1\boldsymbol{\beta}_1)$ are the variance factors $b''(\theta_i)$, the same order of $b_s$ can be taken if these entries are bounded away from zero. More generally, (15) can be bounded via the submatrix of $X_1$ consisting of the rows of the samples with $b''(\theta_i) \ge c$ for some $c > 0$, and the above remark for the multiple regression model applies to that submatrix.
The left-hand side of (16) consists of the multiple regression coefficients of each unimportant variable in $X_2$ on $X_1$, using the weighted least squares with weight matrix $\Sigma(X_1\boldsymbol{\beta}_1)$. The order $O(n^{\alpha_1})$ is mainly technical and can be relaxed, whereas the condition involving the ratio $\rho'(0+)/\rho'(d_n)$ is genuine. When the $L_1$ penalty is used, the upper bound in (16) is more restrictive, requiring the left-hand side to be uniformly less than 1. This condition is the same as the strong irrepresentable condition of Zhao and Yu (2006) for the consistency of the LASSO estimator, namely $\|X_2^T X_1 (X_1^T X_1)^{-1}\|_\infty \le C < 1$. It is a drawback of the $L_1$ penalty. In contrast, when a folded-concave penalty is used, the upper bound on the right-hand side of (16) can grow to $\infty$ at rate $O(n^{\alpha_1})$.

Condition (16) controls the uniform growth rate of the $L_1$-norm of these multiple regression coefficients, a notion of weak correlation between $X_1$ and $X_2$. If each element of the multiple regression coefficients is bounded, then the $L_1$ norm of each row is of order $O(s)$. Hence, we can handle the nonsparse dimensionality $s$ allowed by (16) as long as the first term in (16) dominates, which occurs for the SCAD type of penalty with $d_n \gg \lambda$. Of course, the actual dimensionality can be higher or lower, depending on the correlation between $X_1$ and $X_2$, but for finite nonsparse dimensionality $s$, (16) is usually satisfied.
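For the unweighted (least-squares) case, the quantity on the left-hand side of (16) reduces to the matrix appearing in Zhao and Yu's condition and can be computed directly; a small sketch (function name ours):

```python
import numpy as np

def irrepresentable_norm(X, support):
    """L_infty norm (maximum row-wise L1 norm) of
    X_2^T X_1 (X_1^T X_1)^{-1}: the multiple regression coefficients of
    each unimportant variable on X_1 with identity weights.
    Zhao and Yu (2006) require this to be uniformly less than 1."""
    X1, X2 = X[:, support], X[:, ~support]
    coefs = X2.T @ X1 @ np.linalg.inv(X1.T @ X1)
    return np.abs(coefs).sum(axis=1).max()
```

For a folded-concave penalty, (16) instead compares this norm against $C\,\rho'(0+)/\rho'(d_n)$, which is far less restrictive when $d_n$ is large relative to $\lambda$.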
For the Gaussian linear regression model, condition (17)
holds automatically.
We now choose the regularization parameter $\lambda$ and introduce Condition 3. We will assume that half of the minimum signal satisfies $d_n \ge n^{-\gamma} \log n$ for some $\gamma \in (0, 1/2)$, and take $\lambda$ satisfying the rate constraint (18), in which $b_s$ is associated with the nonsparsity size $s$.

Condition 3: Assume that $\lambda$ satisfies (18) together with growth conditions tying $b_s$, $d_n$, $s$, and $n$, with an additional moment-type restriction if the responses are unbounded.

The condition on the local concavity, $n\,\kappa(p_\lambda; \hat{\boldsymbol{\beta}}_1) < \lambda_{\min}[X_1^T \Sigma(X_1\hat{\boldsymbol{\beta}}_1) X_1]$, is needed to ensure condition (9). The condition always holds for the $L_1$ penalty and is satisfied for the SCAD type of penalty when $d_n \gg \lambda$.
In view of (7) and (8), to study the nonconcave penalized likelihood estimator $\hat{\boldsymbol{\beta}}$ we need to analyze the deviation of the $p$-dimensional random vector $X^T\mathbf{y}$ from its mean $X^T\boldsymbol{\mu}(X\boldsymbol{\beta}_0)$, where $\mathbf{y} = (Y_1, \ldots, Y_n)^T$ denotes the $n$-dimensional random response vector in the GLM (1). The following proposition, whose proof is given in Section VIII.E, characterizes such deviation for the case of bounded responses and the case of unbounded responses satisfying a moment condition, respectively.

Proposition 4 (Deviation): Let $Y_1, \ldots, Y_n$ be the components of the $n$-dimensional independent random response vector $\mathbf{y}$ and $\boldsymbol{\mu} = E\mathbf{y}$. Then:

a) If $Y_1, \ldots, Y_n$ are bounded in an interval $[c_0, c_1]$ for some $c_0 < c_1$, then for any $\epsilon > 0$ a Hoeffding-type exponential tail bound (19) holds for the deviation.

b) If $Y_1, \ldots, Y_n$ are unbounded and there exist some constants such that the moment condition (20) holds, then for any $\epsilon > 0$ a corresponding exponential tail bound (21) holds.

In light of (1), it is known that for the exponential family, the moment-generating function of $Y_i$ is given by $E\exp(tY_i) = \exp\{[b(\theta_i + t\phi) - b(\theta_i)]/\phi\}$, where $\theta_i + t\phi$ is in the domain of $b(\cdot)$. Thus, the moment condition (20) is reasonable. It is easy to show that condition (20) holds for the Gaussian linear regression model and for the Poisson regression model with bounded mean responses. Similar probability bounds also hold for sub-Gaussian errors.
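As a sanity check on part (a), one can simulate bounded (Bernoulli) responses and compare the empirical exceedance frequency of $\|X^T(\mathbf{y} - \boldsymbol{\mu})\|_\infty$ with a Hoeffding-plus-union bound; the constants below are the textbook Hoeffding ones and are not claimed to match (19) exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, eps = 500, 200, 0.1
X = rng.standard_normal((n, p))
X *= np.sqrt(n) / np.linalg.norm(X, axis=0)  # standardize: ||x_j||_2 = sqrt(n)
mu = np.full(n, 0.3)                          # Bernoulli(0.3) mean responses

hits = 0
for _ in range(200):
    y = rng.binomial(1, mu)
    hits += np.max(np.abs(X.T @ (y - mu))) > n * eps
# Hoeffding + union bound for responses in [0, 1] with sum_i x_ij^2 = n:
# P(max_j |x_j^T (y - mu)| > n * eps) <= 2 p exp(-2 n eps^2)
print(hits / 200, 2 * p * np.exp(-2 * n * eps**2))
```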
We now express the results in Proposition 4 in a unified form. For the case of bounded responses, and for the case of unbounded responses satisfying the moment condition (20), suitable rate functions can be defined so that the exponential bounds in (19) and (21) take the common form (22), with the two cases differing only in the constants involved.
B. Weak Oracle Properties

Theorem 2 (Weak Oracle Property): Assume that Conditions 1–3 and the probability bound (22) are satisfied, together with the accompanying growth conditions on $s$ and $p$. Then there exists a nonconcave penalized likelihood estimator $\hat{\boldsymbol{\beta}} = (\hat{\boldsymbol{\beta}}_1^T, \hat{\boldsymbol{\beta}}_2^T)^T$ such that for sufficiently large $n$, with probability at least $1 - \delta_n$ for an explicit sequence $\delta_n \to 0$, $\hat{\boldsymbol{\beta}}$ satisfies:

a) (Sparsity) $\hat{\boldsymbol{\beta}}_2 = \mathbf{0}$;

b) ($L_\infty$ loss) $\|\hat{\boldsymbol{\beta}}_1 - \boldsymbol{\beta}_1\|_\infty = O(n^{-\gamma} \log n)$,

where $\hat{\boldsymbol{\beta}}_1$ and $\boldsymbol{\beta}_1$ are respectively the subvectors of $\hat{\boldsymbol{\beta}}$ and $\boldsymbol{\beta}_0$ formed by components in $\mathrm{supp}(\boldsymbol{\beta}_0)$.

Under the given regularity conditions, the dimensionality $p$ is allowed to grow up to exponentially fast with the sample size $n$. The growth rate of $\log p$ is controlled by the deviation bound (22), which also enters the nonasymptotic probability bound $1 - \delta_n$. This probability tends to 1 under our technical assumptions. From the proof of Theorem 2, we see that with asymptotic probability one, the $L_\infty$ estimation loss of the nonconcave penalized likelihood estimator $\hat{\boldsymbol{\beta}}_1$ is bounded from above by three terms (see (45)), where the second term is associated with the penalty function $p_\lambda$ through the ratio $\rho'(d_n)/\rho'(0+)$. For the $L_1$ penalty, this ratio is equal to one, and for other concave penalties, it can be (much) smaller than one. This is in line with the

References

Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.

Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68, 49–67.

Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.

van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York.