
Complexity analysis of interior point algorithms for non-Lipschitz and nonconvex minimization

TLDR
A first order interior point algorithm is proposed for a class of non-Lipschitz and nonconvex minimization problems with box constraints, which arise from applications in variable selection and regularized optimization; the objective function value is reduced monotonically along the iteration points.
Abstract
We propose a first order interior point algorithm for a class of non-Lipschitz and nonconvex minimization problems with box constraints, which arise from applications in variable selection and regularized optimization. The objective functions of these problems are continuously differentiable typically at interior points of the feasible set. Our first order algorithm is easy to implement and the objective function value is reduced monotonically along the iteration points. We show that the worst-case iteration complexity for finding an ε scaled first order stationary point is O(ε^{-2}). Furthermore, we develop a second order interior point algorithm using the Hessian matrix, and solve a quadratic program with a ball constraint at each iteration. Although the second order interior point algorithm costs more computational time than the first order algorithm in each iteration, its worst-case iteration complexity for finding an ε scaled second order stationary point is reduced to O(ε^{-3/2}). Note that an ε scaled second order stationary point must also be an ε scaled first order stationary point.



Mathematical Programming manuscript No.
(will be inserted by the editor)
Complexity Analysis of Interior Point Algorithms
for Non-Lipschitz and Nonconvex Minimization
Wei Bian · Xiaojun Chen · Yinyu Ye
July 25, 2012
Abstract We propose a first order interior point algorithm for a class of non-Lipschitz and nonconvex minimization problems with box constraints, which arise from applications in variable selection and regularized optimization. The objective functions of these problems are continuously differentiable typically at interior points of the feasible set. Our algorithm is easy to implement and the objective function value is reduced monotonically along the iteration points. We show that the worst-case complexity for finding an ε scaled first order stationary point is O(ε^{-2}). Moreover, we develop a second order interior point algorithm using the Hessian matrix, and solve a quadratic program with a ball constraint at each iteration. Although the second order interior point algorithm costs more computational time than the first order algorithm in each iteration, its worst-case complexity for finding an ε scaled second order stationary point is reduced to O(ε^{-3/2}). An ε scaled second order stationary point is also an ε scaled first order stationary point.
Keywords constrained non-Lipschitz optimization · complexity analysis ·
interior point method · first order algorithm · second order algorithm
This work was supported partly by Hong Kong Research Council Grant PolyU5003/10p, The Hong Kong Polytechnic University Postdoctoral Fellowship Scheme and the National Natural Science Foundation of China (11101107, 10971043).
Wei Bian
Department of Mathematics, Harbin Institute of Technology, Harbin, China. Current ad-
dress: Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong
Kong. E-mail: bianweilvse520@163.com.
Xiaojun Chen
Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong,
China. E-mail: maxjchen@polyu.edu.hk
Yinyu Ye
Department of Management Science and Engineering, Stanford University, Stanford, CA
94305. E-mail: yinyu-ye@stanford.edu

Mathematics Subject Classification (2000) 90C30 · 90C26 · 65K05 ·
49M37
1 Introduction
In this paper, we consider the following optimization problem:

    min  f(x) = H(x) + λ Σ_{i=1}^n ϕ(x_i^p)
    s.t. x ∈ Ω = {x : 0 ≤ x ≤ b},                                      (1)

where H : R^n → R is continuously differentiable, ϕ : R_+ → R is continuous and concave, λ > 0, 0 < p < 1, and b = (b_1, b_2, . . . , b_n)^T with b_i > 0, i = 1, 2, . . . , n. Moreover, ϕ is continuously differentiable in R_{++} and ϕ(0) = 0. Without loss of generality, we assume that a minimizer of (1) exists and min_{x∈Ω} f(x) ≥ 0.
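To make the setting concrete, the following is a minimal Python sketch of evaluating f in (1) for one common instance, taking H(x) = (1/2)‖Ax − c‖² as the data fitting term and ϕ(t) = t. The data A, c, λ, p, b below are illustrative choices of ours, not taken from the paper.

    import numpy as np

    # Minimal sketch: evaluate f(x) = H(x) + lam * sum(phi(x_i^p)) from (1) for the
    # common instance H(x) = 0.5*||Ax - c||^2 and phi(t) = t. Data are illustrative.
    rng = np.random.default_rng(0)
    m, n = 5, 3
    A, c = rng.standard_normal((m, n)), rng.standard_normal(m)
    lam, p = 0.1, 0.5
    b = np.ones(n)                          # box upper bounds of Omega = [0, b]

    def f(x):
        H = 0.5 * np.sum((A @ x - c) ** 2)  # smooth data-fitting term
        penalty = lam * np.sum(x ** p)      # concave, non-Lipschitz regularizer (needs x >= 0)
        return H + penalty

    x = 0.5 * b                             # an interior point of the box
    print(f(x))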
Problem (1) is nonsmooth, nonconvex, and non-Lipschitz, and has been used extensively in image restoration, signal processing and variable selection; see, e.g., [8,11,13,19]. The function H(x) is often used as a data fitting term, while the function Σ_{i=1}^n ϕ(x_i^p) is used as a regularization term. Numerical experiments indicate that these types of problems can be solved effectively for finding a local minimizer or stationary point. However, little theoretical complexity or convergence rate analysis is known for these problems, in contrast to the complexity study of convex optimization over the past thirty years.
There have been few results on the complexity analysis of nonconvex optimization problems. Using an interior-point algorithm, Ye [17] proved that an ε scaled KKT or first order stationary point of general quadratic programming can be computed in O(ε^{-1} log(ε^{-1})) iterations, where each iteration solves a ball-constrained or trust-region quadratic program that is equivalent to a simplex convex quadratic minimization problem. He also proved that, as ε → 0, the iterative sequence converges to a point satisfying the scaled second order necessary optimality condition. The same complexity result was extended to linearly constrained concave minimization by Ge et al. [10].
Cartis, Gould and Toint [2] estimated the worst-case complexity of a first order trust-region or quadratic regularization method for solving the following unconstrained nonsmooth, nonconvex minimization problem

    min_{x∈R^n} Φ_h(x) := H(x) + h(c(x)),

where h : R^m → R is convex but may be nonsmooth and c : R^n → R^m is continuously differentiable. They showed that their method takes at most O(ε^{-2}) steps to reduce the size of a first order criticality measure below ε, which is of the same order as the worst-case complexity of steepest-descent methods applied to unconstrained, nonconvex smooth optimization. However, f in (1) cannot be written in the form of Φ_h.

Garmanjani and Vicente [9] proposed a class of smoothing direct-search methods for the unconstrained optimization of nonsmooth functions by applying a direct-search method to the smoothing function f̃ of the objective function f [3]. When f is locally Lipschitz, the smoothing direct-search method [9] takes at most O(ε^{-3} log ε^{-1}) iterations to find an x such that ‖∇f̃(x, µ)‖ ≤ ε and µ ≤ ε, where µ is the smoothing parameter. As µ ↓ 0, f̃(x, µ) → f(x) and ∇f̃(x, µ) → v with v ∈ ∂f(x).
Recently, Bian and Chen [1] proposed a smoothing sequential quadratic programming (SSQP) algorithm for solving the following non-Lipschitz unconstrained minimization problem

    min_{x∈R^n} f_0(x) := H(x) + λ Σ_{i=1}^n ϕ(|x_i|^p).               (2)

At each step, the SSQP algorithm solves a strongly convex quadratic minimization problem with a diagonal Hessian matrix, which has a simple closed-form solution. The SSQP algorithm is easy to implement and its worst-case complexity for reaching an ε scaled stationary point is O(ε^{-2}).
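The closed form referred to above is easy to see in the simplest case: an unconstrained strongly convex quadratic with a diagonal Hessian separates coordinate-wise. The sketch below only illustrates this general point; the actual SSQP subproblem in [1] has additional structure that is not shown here, and the data are illustrative.

    import numpy as np

    # Sketch: minimizing q(d) = <g, d> + 0.5 * d^T D d with D diagonal and positive
    # definite separates into n scalar problems, so the minimizer is d_i = -g_i / D_ii.
    g = np.array([1.0, -2.0, 0.5])
    D = np.array([2.0, 4.0, 1.0])           # diagonal entries of the Hessian, all > 0

    d = -g / D                              # coordinate-wise closed-form minimizer
    print(d)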
Obviously, the objective functions f of (1) and f_0 of (2) are identical in R^n_+. Moreover, f is smooth in the interior of R^n_+. Note that problem (2) includes the l_2-l_p problem

    min_{x∈R^n} ‖Ax − c‖^2 + λ Σ_{i=1}^n |x_i|^p                       (3)

as a special case, where A ∈ R^{m×n}, c ∈ R^m, and p ∈ (0, 1).
In this paper, we analyze the worst-case complexity of interior point methods for solving problem (1). We propose a first order interior point algorithm whose worst-case complexity for finding an ε scaled first order stationary point is O(ε^{-2}), and we develop a second order interior point algorithm whose worst-case complexity for finding an ε scaled second order stationary point is O(ε^{-3/2}).
Our paper is organized as follows. In Section 2, a first order interior point algorithm is proposed for solving (1), which only uses ∇f and a Lipschitz constant of ∇H on Ω and is easy to implement. Every iterate x^k > 0 belongs to Ω and the objective function is monotonically decreasing along the generated sequence {x^k}. Moreover, the algorithm produces an ε scaled first order stationary point of (1) in at most O(ε^{-2}) steps. In Section 3, a second order interior point algorithm is given to solve a special case of (1), where H is twice continuously differentiable, ϕ(t) := t and Ω = {x : x ≥ 0}. By using the Hessian of H, the second order interior point algorithm can generate an interior ε scaled second order stationary point in at most O(ε^{-3/2}) steps.
Throughout this paper, K = {0, 1, 2, . . .}, I = {1, 2, . . . , n}, I_b = {i ∈ {1, 2, . . . , n} : b_i < +∞} and e_n = (1, 1, . . . , 1)^T ∈ R^n. For x ∈ R^n, A = (a_{ij})_{m×n} ∈ R^{m×n} and q > 0, we write |A|^q = (|a_{ij}|^q)_{m×n}, x^q = (x_1^q, x_2^q, . . . , x_n^q)^T, [x_i]_{i=1}^n := x, |x| = (|x_1|, |x_2|, . . . , |x_n|)^T, ‖x‖ = ‖x‖_2 := (x_1^2 + x_2^2 + · · · + x_n^2)^{1/2} and diag(x) = diag(x_1, x_2, . . . , x_n). For two matrices A, B ∈ R^{n×n}, we write A ⪰ B when A − B is positive semi-definite.
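A small numpy rendering of this notation may help; the arrays below are illustrative only.

    import numpy as np

    # Entrywise powers, diag, and the positive semidefinite ordering used above.
    A = np.array([[1.0, -2.0], [0.5, 3.0]])
    x = np.array([4.0, 9.0])
    q = 0.5

    abs_A_q = np.abs(A) ** q                # |A|^q = (|a_ij|^q)
    x_q = x ** q                            # x^q = (x_1^q, ..., x_n^q)^T
    X = np.diag(x)                          # diag(x)

    def psd_geq(M, N, tol=1e-10):           # "M >= N" here: M - N is positive semidefinite
        return np.all(np.linalg.eigvalsh(M - N) >= -tol)

    print(abs_A_q, x_q, psd_geq(X, np.eye(2)), sep="\n")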

2 First Order Method
In this section, we propose a first order interior point algorithm for solving (1), which uses a first order reduction technique and keeps all iterates x^k > 0 in the feasible set Ω. We show that the objective function value f(x^k) is monotonically decreasing along the sequence {x^k} generated by the algorithm, and that the worst-case complexity of the algorithm for generating an interior ε scaled first order stationary point of (1) is O(ε^{-2}), which is of the same order as the worst-case complexity of steepest-descent methods for nonconvex smooth optimization problems. Moreover, it is worth noting that the proposed first order interior point algorithm is easy to implement, and the computational cost of each step is small.
Throughout this section, we need the following assumptions.

Assumption 2.1: ∇H is globally Lipschitz on Ω with a Lipschitz constant β. In particular, when I_b ≠ ∅ we choose β such that β ≥ max_{i∈I_b} 1/b_i and β ≥ 1.

Assumption 2.2: For any given x^0 ∈ Ω, there is R ≥ 1 such that

    sup{‖x‖ : f(x) ≤ f(x^0), x ∈ Ω} ≤ R.

When H(x) = (1/2)‖Ax − c‖^2, Assumption 2.1 holds with β = ‖A^T A‖. Assumption 2.2 holds if Ω is bounded, or if H(x) ≥ 0 for all x ∈ Ω and ϕ(t) → ∞ as t → ∞.
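For the quadratic H mentioned in the remark above, the constants in the assumptions can be computed directly; the following sketch uses illustrative data and a bounded box, and the choice R = max(1, ‖b‖) is our own simple bound rather than anything prescribed by the paper.

    import numpy as np

    # Sketch: for H(x) = 0.5*||Ax - c||^2, grad H is Lipschitz with beta = ||A^T A||
    # (spectral norm); on a bounded box [0, b], R can be taken as max(1, ||b||).
    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 4))
    b = 2.0 * np.ones(4)                          # finite upper bounds, so I_b = {1,...,n}

    beta = np.linalg.norm(A.T @ A, 2)             # Lipschitz constant of grad H
    beta = max(beta, 1.0, float(np.max(1.0 / b))) # enforce beta >= 1 and beta >= max 1/b_i
    R = max(1.0, float(np.linalg.norm(b)))        # bounds ||x|| over the level set in Omega

    print(beta, R)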
2.1 First Order Necessary Conditions
Note that for problem (1), when 0 < p < 1, the Clarke generalized gradient of ϕ(|s|^p) does not exist at 0. Inspired by the scaled first and second order necessary conditions for local minimizers of unconstrained non-Lipschitz optimization in [5,6], we give scaled first and second order necessary conditions for local minimizers of the constrained non-Lipschitz optimization problem (1) in this section. Then, for any ε ∈ (0, 1], the ε scaled first order and second order stationary points of (1) can be deduced directly.

First, for ε > 0, an ε global minimizer of (1) is defined as a feasible solution x_ε with 0 ≤ x_ε ≤ b and

    f(x_ε) ≤ min_{0≤x≤b} f(x) + ε.

It has been proved in [4] that finding a global minimizer of the unconstrained l_2-l_p minimization problem (3) is strongly NP hard. For the unconstrained l_2-l_p problem (3), any local minimizer x satisfies the first order necessary condition [6]

    X A^T (Ax − c) + λp|x|^p = 0,                                      (4)

and the second order necessary condition

    X A^T A X + λp(p − 1)|X|^p ⪰ 0,                                    (5)

where X = diag(x) and |X|^p = diag(|x_1|^p, . . . , |x_n|^p).
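As a sanity check, the left-hand sides of the two scaled conditions (4) and (5) are straightforward to evaluate numerically at a candidate point; the data and the candidate x below are illustrative, so the printed residual need not be zero.

    import numpy as np

    # Sketch: evaluate the left-hand sides of the scaled conditions (4) and (5) for the
    # l2-lp problem (3) at a candidate point x. Data and x are illustrative only.
    A = np.array([[1.0, 0.0], [1.0, 1.0]])
    c = np.array([1.0, 2.0])
    lam, p = 0.05, 0.5
    x = np.array([0.9, 1.1])                      # candidate point (not a verified minimizer)

    X = np.diag(x)
    lhs4 = X @ A.T @ (A @ x - c) + lam * p * np.abs(x) ** p               # vector in (4)
    lhs5 = X @ A.T @ A @ X + lam * p * (p - 1) * np.diag(np.abs(x) ** p)  # matrix in (5)

    print(np.linalg.norm(lhs4))                   # zero at a point satisfying (4)
    print(np.linalg.eigvalsh(lhs5))               # all nonnegative when (5) holds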

For (1), if x is a local minimizer of (1) at which f is differentiable, then x satisfies
(i) [∇f(x)]_i = 0 if x_i ≠ b_i;
(ii) [∇f(x)]_i ≤ 0 if x_i = b_i.
Although [∇f(x)]_i does not exist when x_i = 0, one can see that, as x_i → 0+, [∇f(x)]_i → +∞.
Denote

    X∇f(x) = X∇H(x) + λp[ϕ′(x_i^p) x_i^p]_{i=1}^n.

Similarly, using X as a scaling matrix, if x is a local minimizer of (1), then x satisfies the scaled first order necessary condition

    [X∇f(x)]_i = 0 if x_i ≠ b_i;                                       (6a)
    [∇f(x)]_i ≤ 0 if x_i = b_i.                                        (6b)
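The effect of the scaling is easy to see numerically: as a component x_i approaches 0, the corresponding entry of ∇f(x) blows up while the entry of X∇f(x) stays bounded. The sketch below uses ϕ(t) = t and illustrative data of our own choosing.

    import numpy as np

    # Sketch: compare [grad f(x)]_1 and [X grad f(x)]_1 as x_1 -> 0+ for
    # f(x) = 0.5*||Ax - c||^2 + lam * sum(x_i^p), i.e. phi(t) = t. Illustrative data.
    A = np.array([[1.0, 0.0], [0.0, 2.0]])
    c = np.array([1.0, 1.0])
    lam, p = 0.1, 0.5

    def grad_f(x):                                 # valid for x > 0
        return A.T @ (A @ x - c) + lam * p * x ** (p - 1)

    for t in [1e-1, 1e-3, 1e-6]:
        x = np.array([t, 1.0])
        g = grad_f(x)
        print(t, g[0], (x * g)[0])                 # unscaled entry diverges, scaled entry -> 0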
Now we can define an ε scaled first order stationary point of (1).

Definition 1 For a given 0 < ε ≤ 1, we call x ∈ Ω an ε scaled first order stationary point of (1) if there is δ > 0 such that
(i) |[X∇f(x)]_i| ≤ ε if x_i < b_i − δ;
(ii) [∇f(x)]_i ≤ ε if x_i ≥ b_i − δ.

Definition 1 is consistent with the first order necessary conditions in (6a)-(6b) with ε = 0. Moreover, Definition 1 is consistent with the first order necessary conditions given in [6] with ε = 0 for unconstrained optimization.
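A direct transcription of Definition 1 into a check is given below; the inequality direction in (ii) follows the reconstruction of (6b) above, and the inputs (the gradient, δ, ε) are assumed to be supplied by the caller at a point x > 0 where f is differentiable.

    import numpy as np

    # Sketch of the check in Definition 1. grad is grad f(x); the scaled gradient
    # [X grad f(x)]_i equals x_i * grad_i. All inputs are illustrative assumptions.
    def is_eps_scaled_first_order_point(x, grad, b, eps, delta):
        scaled = x * grad                          # [X grad f(x)]_i
        interior = x < b - delta                   # components covered by condition (i)
        cond_i = np.all(np.abs(scaled[interior]) <= eps)
        cond_ii = np.all(grad[~interior] <= eps)   # components covered by condition (ii)
        return bool(cond_i and cond_ii)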
2.2 First Order Interior Point Algorithm
Note that for any x, x^+ ∈ (0, b], Assumption 2.1 implies that

    H(x^+) ≤ H(x) + ⟨∇H(x), x^+ − x⟩ + (β/2)‖x^+ − x‖^2.               (7)

Since ϕ is concave on [0, +∞), for any s, t ∈ (0, +∞),

    ϕ(t) ≤ ϕ(s) + ⟨∇ϕ(s), t − s⟩.                                      (8)

Thus, for any x, x^+ ∈ (0, b], we obtain

    f(x^+) ≤ f(x) + ⟨∇f(x), x^+ − x⟩ + (β/2)‖x^+ − x‖^2.               (9)

Let X d_x = x^+ − x. We obtain

    f(x^+) ≤ f(x) + ⟨X∇f(x), d_x⟩ + (β/2)‖X d_x‖^2.                    (10)
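Inequality (10) can be checked numerically for a concrete instance; the sketch below takes H(x) = (1/2)‖Ax − c‖², ϕ(t) = t and β = ‖A^T A‖, all of which are illustrative choices consistent with the remark after Assumption 2.2.

    import numpy as np

    # Sketch: verify f(x+) <= f(x) + <X grad f(x), d_x> + (beta/2)*||X d_x||^2 from (10)
    # for H(x) = 0.5*||Ax - c||^2, phi(t) = t, beta = ||A^T A||, and X d_x = x+ - x.
    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 3))
    c = rng.standard_normal(5)
    lam, p = 0.1, 0.5
    beta = np.linalg.norm(A.T @ A, 2)

    f = lambda x: 0.5 * np.sum((A @ x - c) ** 2) + lam * np.sum(x ** p)
    grad_f = lambda x: A.T @ (A @ x - c) + lam * p * x ** (p - 1)

    x = 0.6 * np.ones(3)                           # interior point of (0, b]
    x_plus = np.array([0.4, 0.7, 0.5])             # another interior point
    d_x = (x_plus - x) / x                         # X d_x = x+ - x with X = diag(x)
    rhs = f(x) + (x * grad_f(x)) @ d_x + 0.5 * beta * np.sum((x * d_x) ** 2)
    print(f(x_plus) <= rhs + 1e-12)                # inequality (10) should hold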
We now use the reduction idea to solve the constrained non-Lipschitz optimization problem (1). To achieve a reduction of the objective function, we
