Active Learning Based on
Locally Linear Reconstruction
Lijun Zhang, Student Member, IEEE, Chun Chen, Member, IEEE, Jiajun Bu, Member, IEEE,
Deng Cai, Member, IEEE, Xiaofei He, Senior Member, IEEE, and Thomas S. Huang, Life Fellow, IEEE
Abstract—We consider the active learning problem, which aims to select the most representative points. Out of many existing active
learning techniques, optimum experimental design (OED) has received considerable attention recently. The typical OED criteria
minimize the variance of the parameter estimates or predicted value. However, these methods see only global euclidean structure,
while the local manifold structure is ignored. For example, I-optimal design selects those data points such that other data points can be
best approximated by linear combinations of all the selected points. In this paper, we propose a novel active learning algorithm which
takes into account the local structure of the data space. That is, each data point should be approximated by the linear combination of
only its neighbors. Given the local reconstruction coefficients for every data point and the coordinates of the selected points, a
transductive learning algorithm called Locally Linear Reconstruction (LLR) is proposed to reconstruct every other point. The most
representative points are thus defined as those whose coordinates can be used to best reconstruct the whole data set. The sequential
and convex optimization schemes are also introduced to solve the optimization problem. The experimental results have demonstrated
the effectiveness of our proposed method.
Index Terms—Active learning, experimental design, local structure, reconstruction.
1 INTRODUCTION
In many real-world applications, there are huge volumes of unlabeled data, but the labels are usually difficult and expensive to obtain. Semi-supervised learning [1], [2], [3] addresses
this problem by exploring additional information contained
in the unlabeled data. Active learning reduces the labeling
cost in a complementary way by querying the labels of the
most informative points. Thus, instead of being a passive
recipient of data to be processed, the active learner has
the ability to control what data are added to its training set [4].
In this way, we expect that the active learner can achieve high
accuracy using as few labeled points as possible [5].
The main challenge in active learning is how to evaluate
the informativeness of the unlabeled points. One of the
most widely used principles is uncertainty sampling. That is,
the active learner queries those points whose predicted
labels are most uncertain using the current trained model.
This principle has been applied to logistic regression [6],
support vector machines [7], nearest neighbor classifiers [8],
[9], etc. Other popular active learning principles include
query by committee [10], [11], estimated error reduction [12],
[13], and variance reduction [4], [14].
The principle of variance reduction is derived from Optimum Experimental Design (OED) [14]. In statistics, the problem of selecting samples to label is typically referred to as experimental design. The sample $\mathbf{x}$ is referred to as an experiment, and its label $y$ as a measurement. The
study of OED is concerned with the design of experiments
that are expected to minimize variances of a parameterized
model [14], [15], [16], [17]. There are two types of selection
criteria for OED. One type is to choose data points to
minimize the variance of the model’s parameters, which
results in D, A, and E-optimal Design. The other is to
minimize the variance of the prediction value, which results
in I and G-optimal Design.
Recently, Yu et al. have proposed Transductive Experi-
mental Design (TED) [16], which has yielded impressive
results. TED is fundamentally based on I-optimal design
but evaluates the average predictive variance over one test
set that is given beforehand. It has been shown that finding
those points which minimize the average predictive
variance of the estimated function is equivalent to finding
those points such that other points can be best approxi-
mated by linear combinations of the selected points. TED is
a global method in the sense that each data point is linearly
reconstructed by using all of the selected data points, no
matter how far away the selected data points are from the
point to be reconstructed.
In reality, the high-dimensional data may not be
uniformly distributed in the whole ambient space. Instead,
recent studies [18], [19], [20], [21] have shown that naturally
occurring data may reside on a lower dimensional sub-
manifold which is embedded in the high-dimensional
ambient space. However, previous approaches such as
TED fail to take into account this manifold structure. Given
. L. Zhang, C. Chen, and J. Bu are with the Zhejiang Provincial Key
Laboratory of Service Robot, College of Computer Science, Cao Guangbiao
Building, Yuquan Campus, Zhejiang University, Hangzhou 310027,
China. E-mail: {zljzju, chenc, bjj}@zju.edu.cn.
. D. Cai and X. He are with the State Key Lab of CAD & CG, College of
Computer Science, Zhejiang University, 388 Yu Hang Tang Rd.,
Hangzhou 310027, China. E-mail: {dengcai, xiaofeihe}@cad.zju.edu.cn.
. T.S. Huang is with the Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, 405 North Mathews Ave., Urbana, IL 61801. E-mail: huang@ifp.uiuc.edu.
Manuscript received 29 Jan. 2010; revised 26 Aug. 2010; accepted 25 Nov.
2010; published online 28 Jan. 2011.
Recommended for acceptance by J. Winn.
For information on obtaining reprints of this article, please send e-mail to:
tpami@computer.org, and reference IEEECS Log Number
TPAMI-2010-01-0069.
Digital Object Identifier no. 10.1109/TPAMI.2011.20.

a data point, it is more reasonable to reconstruct it by using
only its nearest neighbors [18].
In this paper, we propose a novel active learning
algorithm which selects the most representative points with
respect to the intrinsic geometrical structure of the data.
Inspired by Locally Linear Embedding (LLE) [18], we
assume that each data point and its neighbors lie on or close
to a locally linear patch of the manifold. Then, the manifold
structure is characterized by the linear coefficients that
reconstruct each data point from its neighbors. A transduc-
tive learning algorithm called Locally Linear Reconstruction
(LLR) is proposed to reconstruct the whole data set by using
the given local reconstruction coefficients for every data
point and the coordinates of the selected points. The most
representative points are therefore defined as those whose
coordinates can be used to best reconstruct the whole data
set. A sequential optimization scheme and a convex
relaxation are proposed to solve the optimization problem.
The outline of the paper is as follows: In Section 2, we
review the related work in experimental design. Our
proposed active learning algorithm (LLR$_{\text{Active}}$) is introduced
in Section 3. In Section 4, we propose two computational
schemes to solve the optimization problem. Experiments are
presented in Section 5. Finally, we provide some concluding
remarks and suggestions for future work in Section 6.
Notation. Capital letters (e.g., $M$) are used to denote matrices. For a given matrix $M$, we denote its $i$th column by $M_{\cdot i}$ and its $i$th row by $M_{i\cdot}$. Script capital letters (e.g., $\mathcal{X}$) are used to denote ordinary sets. Blackboard bold capital letters (e.g., $\mathbb{R}$) are used to denote number sets. Lowercase letters are used to denote scalars, and bold lowercase letters are used to denote vectors. We use $\mathbf{x}_i$ to denote both the $i$th point and its coordinate (a column vector).
2 RELATED WORK
As described, the work most related to our proposed
approach is optimum experimental design. In this section,
we will briefly describe the generic active learning problem
and then provide a review of the conventional experimental
design criteria and the recently proposed Transductive
Experimental Design algorithm.
2.1 The Active Learning Problem
The generic problem of active learning is the following.
Given a set of points $\{\mathbf{x}_1, \ldots, \mathbf{x}_m\}$ in $\mathbb{R}^d$, find a subset $\{\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_k}\} \subset \mathcal{X}$ which contains the most informative points. That is, if the points $\mathbf{x}_{s_i}$ $(i = 1, \ldots, k)$ are labeled and used as training points, we can predict the labels of the unlabeled points most precisely. Active learning is usually
referred to as experimental design in statistics. Since our
approach is motivated by recent progress in experimental
design [14], [16], [17], we begin with a brief description of it.
2.2 Optimum Experimental Design
We consider a linear regression model
$y = \mathbf{w}^T\mathbf{x} + \epsilon,$   (1)
where $\mathbf{w} \in \mathbb{R}^d$ is the parameter vector, $y$ is the real-valued output, and $\epsilon$ is the measurement noise with zero mean and constant variance $\sigma^2$. Optimum experimental design attempts to select the most informative experiments (or data points) to learn a prediction function $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x}$ so that the expected prediction error can be minimized. Given a set of measured data points $(\mathbf{x}_{s_1}, y_1), \ldots, (\mathbf{x}_{s_k}, y_k)$, the most popular estimation method is least squares, in which we minimize the residual sum of squares (RSS):
$\mathrm{RSS}(\mathbf{w}) = \sum_{i=1}^{k} \big(y_i - f(\mathbf{x}_{s_i})\big)^2.$   (2)
Let $Z = [\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_k}]^T$ and $\mathbf{y} = [y_1, \ldots, y_k]^T$. The optimal solution is
$\hat{\mathbf{w}} = (Z^T Z)^{-1} Z^T \mathbf{y}.$   (3)
It can be proved [22] that $\hat{\mathbf{w}}$ is an unbiased estimation of $\mathbf{w}$ and its covariance can be expressed as
$\mathrm{Cov}(\hat{\mathbf{w}}) = \sigma^2 (Z^T Z)^{-1}.$   (4)
The criteria of OED [14] can be classified into two categories. The first category is to select the points $\mathbf{x}_{s_i}$ in order to minimize the size of the parameter covariance matrix [23]. The typical methods in this category include D, A, and E-optimal design. D-optimal design minimizes the determinant of $\mathrm{Cov}(\hat{\mathbf{w}})$, and thus minimizes the volume of the confidence region. A-optimal design minimizes the trace of $\mathrm{Cov}(\hat{\mathbf{w}})$, and thus minimizes the dimensions of the enclosing box around the confidence region. E-optimal design minimizes the largest eigenvalue of $\mathrm{Cov}(\hat{\mathbf{w}})$, and thus minimizes the size of the major axis of the confidence region.
The other category of experimental design criteria is to select the points $\mathbf{x}_{s_i}$ in order to minimize the variance of the predicted value over some region of interest $\mathcal{O}$ [24], [25]. Given a test point $\mathbf{v} \in \mathcal{O}$, the predicted value is $\hat{\mathbf{w}}^T\mathbf{v}$, with variance $\mathbf{v}^T\mathrm{Cov}(\hat{\mathbf{w}})\mathbf{v}$. The two most common criteria in this category are I and G-optimal design. I-optimal design minimizes the average predictive variance $\int_{\mathbf{v}\in\mathcal{O}} \mathbf{v}^T\mathrm{Cov}(\hat{\mathbf{w}})\mathbf{v}\, d\pi(\mathbf{v})$, where $\pi$ is a probability distribution on $\mathcal{O}$. G-optimal design minimizes the maximum predictive variance, i.e., $\max_{\mathbf{v}\in\mathcal{O}} \{\mathbf{v}^T\mathrm{Cov}(\hat{\mathbf{w}})\mathbf{v}\}$.
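To make these criteria concrete, here is a small numpy sketch (synthetic data, an assumed noise variance, and an arbitrary candidate design, none of which come from the paper) that forms $\mathrm{Cov}(\hat{\mathbf{w}})$ as in (4) and evaluates the D-, A-, E-, and I-criteria for that design.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 0.25                       # assumed noise variance
X = rng.standard_normal((100, 5))   # synthetic pool of m = 100 candidate points in R^5

def oed_criteria(sel):
    """Evaluate the classical OED criteria for the design Z = X[sel]."""
    Z = X[sel]                              # k x d design matrix
    cov = sigma2 * np.linalg.inv(Z.T @ Z)   # Cov(w_hat), Eq. (4)
    return {
        "D": np.linalg.det(cov),             # volume of the confidence region
        "A": np.trace(cov),                  # sum of parameter variances
        "E": np.linalg.eigvalsh(cov).max(),  # largest eigenvalue
        # I-criterion: average predictive variance v^T Cov(w_hat) v over the pool
        "I": np.mean(np.einsum("ij,jk,ik->i", X, cov, X)),
    }

print(oed_criteria(sel=list(range(10))))    # criteria for an arbitrary 10-point design
```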
2.3 Transductive Experimental Design
Recently, Yu et al. [16] proposed the TED approach, which
can be seen as the discrete version of I-optimal design. TED
considers the Regularized Least Squares formulation (ridge
regression) as follows:
$\hat{\mathbf{w}}_{\mathrm{ridge}} = \arg\min_{\mathbf{w}} \sum_{i=1}^{k} \big(y_i - f(\mathbf{x}_{s_i})\big)^2 + \lambda \|\mathbf{w}\|^2,$   (5)
where $\lambda \geq 0$ is the regularization parameter. It is easy to check that the optimal solution has the following expression:
$\hat{\mathbf{w}}_{\mathrm{ridge}} = (Z^T Z + \lambda I)^{-1} Z^T \mathbf{y},$   (6)
where $I$ is the identity matrix. The covariance matrix of $\hat{\mathbf{w}}_{\mathrm{ridge}}$ is
$\mathrm{Cov}(\hat{\mathbf{w}}_{\mathrm{ridge}})$
$= (Z^T Z + \lambda I)^{-1} Z^T \mathrm{Cov}(\mathbf{y})\, Z (Z^T Z + \lambda I)^{-1}$
$= \sigma^2 (Z^T Z + \lambda I)^{-1} Z^T Z (Z^T Z + \lambda I)^{-1}$
$= \sigma^2 (Z^T Z + \lambda I)^{-1} (Z^T Z + \lambda I - \lambda I)(Z^T Z + \lambda I)^{-1}$
$= \sigma^2 (Z^T Z + \lambda I)^{-1} - \lambda\sigma^2 (Z^T Z + \lambda I)^{-2}.$   (7)

Since the regularization parameter $\lambda$ is usually set to be very small, we have
$\mathrm{Cov}(\hat{\mathbf{w}}_{\mathrm{ridge}}) \approx \sigma^2 (Z^T Z + \lambda I)^{-1}.$   (8)
Similarly to I-optimal design, TED selects those points which can minimize the average predictive variance over a pregiven test set. For simplicity, we assume the test set is just $\mathcal{X}$. Let $X = [\mathbf{x}_1, \ldots, \mathbf{x}_m]^T$. The average predictive variance is
$\frac{1}{m}\sum_{i=1}^{m} \mathbf{x}_i^T \mathrm{Cov}(\hat{\mathbf{w}}_{\mathrm{ridge}}) \mathbf{x}_i \approx \frac{\sigma^2}{m}\sum_{i=1}^{m} \mathbf{x}_i^T (Z^T Z + \lambda I)^{-1} \mathbf{x}_i = \frac{\sigma^2}{m}\mathrm{Tr}\big(X(Z^T Z + \lambda I)^{-1}X^T\big).$   (9)
Thus, TED is formulated as the following optimization problem:
$\min \;\mathrm{Tr}\big(X(Z^T Z + \lambda I)^{-1}X^T\big)$   (10)
with variable $Z = [\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_k}]^T$. After some mathematical derivation, the above problem can be formulated as
$\min \;\sum_{i=1}^{m} \|\mathbf{x}_i - Z^T\mathbf{a}_i\|^2 + \lambda\|\mathbf{a}_i\|^2,$   (11)
where the variables are $Z = [\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_k}]^T$ and $\mathbf{a}_i \in \mathbb{R}^k$, $i = 1, \ldots, m$ [16]. The first term in the objective function shows
that the data points selected by TED are the most
representative ones. That is, the selected points can be used
to reconstruct the whole data set most precisely. The second
term indicates that TED penalizes the norm of the
reconstruction coefficients. So, it tends to select points with
large norm. Notice that TED is closely related to the
problem of Column-Based Matrix Decomposition [26].
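As a concrete illustration, the following sketch (synthetic data; $\lambda$ chosen arbitrarily) evaluates the TED criterion of (10) for a candidate subset, and also the reconstruction view of (11) with the ridge coefficients solved in closed form; the two values agree up to the constant factor $\lambda$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))    # synthetic pool of m = 200 points in R^10
lam = 0.01                            # regularization parameter (assumed)

def ted_criterion(sel):
    """Criterion of Eq. (10): Tr(X (Z^T Z + lam I)^{-1} X^T) for the subset Z = X[sel]."""
    Z = X[sel]
    G = np.linalg.inv(Z.T @ Z + lam * np.eye(X.shape[1]))
    return np.trace(X @ G @ X.T)

def ted_reconstruction_view(sel):
    """Objective of Eq. (11): ridge-reconstruct every x_i from the selected points."""
    Z = X[sel]                                                        # k x d
    A = np.linalg.solve(Z @ Z.T + lam * np.eye(len(sel)), Z @ X.T)    # k x m coefficients a_i
    residual = ((X.T - Z.T @ A) ** 2).sum()
    return residual + lam * (A ** 2).sum()

sel = list(range(20))
print(ted_reconstruction_view(sel) / ted_criterion(sel))   # equals lam: the two objectives match up to that factor
```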
3 ACTIVE LEARNING BASED ON LOCALLY LINEAR RECONSTRUCTION
In this section, we introduce a novel active learning
algorithm based on the principle of locally linear
reconstruction.
3.1 Locally Linear Reconstruction
Recent studies [18], [19], [20], [21], [27] have shown that
naturally occurring data may reside on a lower dimensional
submanifold which is embedded in the high-dimensional
ambient space. However, previous experimental design
approaches only take into account the global euclidean
structure of the data space, whereas the local manifold
structure is not well respected.
Inspired by LLE [18], we assume that the data lie on a
low-dimensional manifold which can be approximated
linearly in a local area of the high-dimensional space.
Therefore, we require that a data point can only be linearly
reconstructed from its neighbors. The optimal reconstruction
coefficients can be obtained by solving the following
problem [18]:
$\min \;\sum_{i=1}^{m}\Big\|\mathbf{x}_i - \sum_{j=1}^{m} W_{ij}\mathbf{x}_j\Big\|^2$
$\mathrm{s.t.} \;\sum_{j=1}^{m} W_{ij} = 1,\; i = 1, \ldots, m$
$\qquad W_{ij} = 0 \;\text{if}\; \mathbf{x}_j \notin N_p(\mathbf{x}_i),$   (12)
where the variable is the matrix $W \in \mathbb{R}^{m\times m}$. Here, $W_{ij}$ summarizes the contribution of the $j$th data point to the $i$th reconstruction, and $N_p(\mathbf{x}_i)$ is the neighborhood of $\mathbf{x}_i$ defined by its $p$ nearest neighbors.
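A minimal numpy sketch of how the weights in (12) can be computed; the brute-force nearest-neighbor search and the small regularizer added to the local Gram matrix are implementation choices assumed here, not prescribed by the paper.

```python
import numpy as np

def local_reconstruction_weights(X, p=5, reg=1e-3):
    """Solve Eq. (12): reconstruct each x_i from its p nearest neighbors,
    with the weights in each row summing to one."""
    m = X.shape[0]
    W = np.zeros((m, m))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    for i in range(m):
        nbrs = np.argsort(d2[i])[1:p + 1]        # p nearest neighbors, excluding x_i itself
        G = X[nbrs] - X[i]                       # neighbors shifted to the origin, p x d
        C = G @ G.T                              # local Gram matrix
        C += reg * np.trace(C) * np.eye(p)       # small regularizer for conditioning (assumption)
        w = np.linalg.solve(C, np.ones(p))
        W[i, nbrs] = w / w.sum()                 # enforce the sum-to-one constraint
    return W
```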
To measure the representativeness of the selected data
points, we need to design a data reconstruction mechanism
by using the reconstruction coefficients. Given a set of
selected data points $\{\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_k}\} \subset \mathcal{X}$, we propose a transductive learning algorithm, called LLR, to reconstruct the data points. Let $\{\mathbf{q}_1, \ldots, \mathbf{q}_m\}$ denote the reconstructed points. Their coordinates are determined by minimizing the following cost function:
$\Phi(\mathbf{q}_1, \ldots, \mathbf{q}_m) = \sum_{i=1}^{k}\|\mathbf{q}_{s_i} - \mathbf{x}_{s_i}\|^2 + \mu\sum_{i=1}^{m}\Big\|\mathbf{q}_i - \sum_{j=1}^{m} W_{ij}\mathbf{q}_j\Big\|^2,$   (13)
where $\mu$ is a suitable constant. The role of the first term on the right-hand side of the cost function is to fix the coordinates of the selected data points. The second term requires the reconstructed points to share the same local geometrical structure with the original points.
Let $X = [\mathbf{x}_1, \ldots, \mathbf{x}_m]^T$, $Q = [\mathbf{q}_1, \ldots, \mathbf{q}_m]^T$, and $\Lambda$ be an $m \times m$ diagonal matrix whose diagonal entry $\Lambda_{ii}$ is 1 if $i \in \{s_1, \ldots, s_k\}$ and 0 otherwise. Then, the above cost function (13) can be rewritten in the following matrix form:
$\Phi(Q) = \mathrm{Tr}\big((Q - X)^T\Lambda(Q - X)\big) + \mu\,\mathrm{Tr}(Q^T M Q),$   (14)
where $M = (I - W)^T(I - W)$. Requiring that the gradient of $\Phi(Q)$ vanish gives the following equation:
$\Lambda(Q - X) + \mu M Q = 0.$   (15)
Finally, the reconstructed points are given by
$Q = (\mu M + \Lambda)^{-1}\Lambda X.$   (16)
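Given $W$, a selected index set, and a choice of $\mu$, (16) can be evaluated directly; a minimal sketch (a linear solve replaces the explicit inverse):

```python
import numpy as np

def llr_reconstruct(X, W, selected, mu=1.0):
    """Reconstruct all points from the selected ones via Eq. (16): Q = (mu*M + Lambda)^{-1} Lambda X."""
    m = X.shape[0]
    M = (np.eye(m) - W).T @ (np.eye(m) - W)
    Lam = np.zeros((m, m))
    Lam[selected, selected] = 1.0                 # diagonal selection matrix Lambda
    return np.linalg.solve(mu * M + Lam, Lam @ X)
```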
The LLR algorithm presented here shares many common
properties with LLE [18]. For example, we use the same
objective function (12) to find the reconstruction coeffi-
cients. However, the goals of LLE and LLR are different.
LLE uses the reconstruction coefficients to obtain lower
dimensional representations of the original data points.
Suppose $\mathbf{y}_i$ is the $l$ ($l \ll d$)-dimensional embedding of $\mathbf{x}_i$, $i = 1, \ldots, m$. LLE solves the following optimization problem to obtain the $\mathbf{y}_i$'s:
$\Psi(\mathbf{y}_1, \ldots, \mathbf{y}_m) = \sum_{i=1}^{m}\Big\|\mathbf{y}_i - \sum_{j=1}^{m} W_{ij}\mathbf{y}_j\Big\|^2.$   (17)
For our LLR algorithm, the goal is to reconstruct the data set. Therefore, the reconstructed data point $\mathbf{q}_i$ has the same dimension as the original data point $\mathbf{x}_i$. Moreover, for the selected data points $\mathbf{x}_{s_i}$, $i = 1, \ldots, k$, their coordinates are given. Therefore, their reconstructions (i.e., $\mathbf{q}_{s_i}$) should be as close to their original coordinates (i.e., $\mathbf{x}_{s_i}$) as possible.
Our ultimate goal is to select the most representative data
points, so that the reconstruction error can be minimized.
There are also some works in semi-supervised learning which follow a principle similar to that of LLR, such as [2], [28], [29].
However, all of these approaches aim to predict the labels
for the unlabeled points by using both labeled and
unlabeled points. In LLR, there is no label prediction task.
The task of LLR is to reconstruct the data set, given some
selected points and the reconstruction coefficients.
3.2 Selecting the Most Representative Points
Given the original data points $\mathbf{x}_1, \ldots, \mathbf{x}_m$ and the reconstructed data points $\mathbf{q}_1, \ldots, \mathbf{q}_m$, the reconstruction error can be measured as follows:
$e(\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_k}) = \|X - Q\|_F^2$
$= \|X - (\mu M + \Lambda)^{-1}\Lambda X\|_F^2$
$= \|X - (\mu M + \Lambda)^{-1}(\Lambda + \mu M - \mu M)X\|_F^2$
$= \|(\mu M + \Lambda)^{-1}\mu M X\|_F^2,$   (18)
where $\|\cdot\|_F$ denotes the matrix Frobenius norm, defined by $\|A\|_F^2 = \mathrm{Tr}(AA^T) = \mathrm{Tr}(A^TA)$. Clearly, the reconstruction error is only dependent on the selected data points $\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_k}$.
Thus, the most representative points are naturally defined as those which minimize the reconstruction error (18). That is, given their coordinates, we can reconstruct the whole data set most precisely by using the LLR algorithm. Suppose we are going to select $k$ points; the active learning problem is then formally defined below:
Definition 1. Active Learning based on LLR:
$\min \;\|(\mu M + \Lambda)^{-1}\mu M X\|_F^2$
$\mathrm{s.t.} \;\Lambda \;\text{is diagonal},\; \Lambda_{ii} \in \{0, 1\},\; i = 1, \ldots, m$
$\qquad \sum_{i=1}^{m}\Lambda_{ii} = k,$   (19)
where the variable is the diagonal matrix $\Lambda \in \mathbb{R}^{m\times m}$.
Given the optimal solution $\hat{\Lambda}$ of (19), we select those data points whose corresponding entries in the diagonal matrix are 1. After we obtain the labels of the selected points, we can use any supervised or semi-supervised algorithms [1], [2], [3], [22], [30] to predict the labels of the other points.
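For a candidate index set, the criterion of (18)-(19) can be evaluated directly, as in the following sketch (again treating $\mu$ as a user-chosen constant):

```python
import numpy as np

def reconstruction_error(X, W, selected, mu=1.0):
    """Objective of Eq. (19): || (mu*M + Lambda)^{-1} mu*M X ||_F^2 for a candidate selection."""
    m = X.shape[0]
    M = (np.eye(m) - W).T @ (np.eye(m) - W)
    Lam = np.zeros((m, m))
    Lam[selected, selected] = 1.0
    R = np.linalg.solve(mu * M + Lam, mu * M @ X)     # (mu*M + Lambda)^{-1} mu*M X
    return float((R ** 2).sum())
```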
4 OPTIMIZATION SCHEME
The optimization problem of LLR$_{\text{Active}}$ (19) is difficult due to its combinatorial nature. In this section, we develop two
optimization schemes to solve (19). The first one is a
sequential greedy approach, and the second one is a convex
relaxation. The solution of the sequential approach is suboptimal, but its sequential property makes it much more efficient than convex optimization, so it can be applied to large-scale data sets. Moreover, our experimental results show only a slight performance difference between the sequential and convex schemes. On the other hand, the convex relaxation approach is guaranteed to find the globally optimal solution of the relaxed problem, but it is computationally expensive.
4.1 The Sequential Approach
Suppose a set of $n$ points $\mathcal{Z}_n = \{\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_n}\} \subset \mathcal{X}$ have been selected as the $n$ most representative ones. Let $\Lambda_n$ denote the corresponding $m \times m$ diagonal matrix whose diagonal entry $(\Lambda_n)_{ii}$ is 1 if $\mathbf{x}_i \in \mathcal{Z}_n$ and 0 otherwise. Let $E_i$ be an $m \times m$ matrix whose $(i, i)$th entry is 1 and all the other entries are 0. The $(n+1)$th point $\mathbf{x}_{s_{n+1}}$ can be found by solving the following problem:
$s_{n+1} = \arg\min_{i \notin \{s_1, \ldots, s_n\}} \|(\mu M + \Lambda_n + E_i)^{-1}\mu M X\|_F^2.$   (20)
As can be seen, the most expensive calculation in (20) is the
matrix inverse ðM þ
n
þ
i
Þ
1
. Since the matrix M is
sparse, the sparse Cholesky factorization [31] can be applied
to accelerate the calculation of ðM þ
n
þ
i
Þ
1
MX. But
the sequential solver based on the sparse Cholesky
factorization still needs to perform m n factorizations in
order to solve (20), and thus doesn’t scale well.
A much faster method is to use the Sherman-Morrison-Woodbury formula [32] to avoid directly inverting a matrix. Given an invertible matrix $A$ and two column vectors $\mathbf{u}$ and $\mathbf{v}$, the Sherman-Morrison-Woodbury formula states:
$(A + \mathbf{u}\mathbf{v}^T)^{-1} = A^{-1} - \dfrac{A^{-1}\mathbf{u}\mathbf{v}^T A^{-1}}{1 + \mathbf{v}^T A^{-1}\mathbf{u}}.$   (21)
Denote the $i$th unit vector as $\mathbf{e}_i$. It is easy to check that $E_i = \mathbf{e}_i\mathbf{e}_i^T$. Define
$H = (\mu M + \Lambda_n)^{-1}.$
Let $H_{\cdot i}$ denote the $i$th column of $H$, and $H_{i\cdot}$ denote the $i$th row of $H$. Following (21), we get
$(\mu M + \Lambda_n + E_i)^{-1} = H - \dfrac{H_{\cdot i}H_{i\cdot}}{1 + H_{ii}}.$   (22)
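A quick numerical check of the rank-one update (22) on synthetic data (the positive definite matrix below simply plays the role of $\mu M + \Lambda_n$):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
A = B @ B.T + np.eye(6)        # synthetic symmetric positive definite matrix (stands in for mu*M + Lambda_n)
H = np.linalg.inv(A)
i = 3
e = np.zeros(6); e[i] = 1.0    # E_i = e_i e_i^T

lhs = np.linalg.inv(A + np.outer(e, e))                   # direct inverse of the updated matrix
rhs = H - np.outer(H[:, i], H[i, :]) / (1.0 + H[i, i])    # rank-one update of Eq. (22)
print(np.allclose(lhs, rhs))                              # True
```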
With (22), the objective function of (20) can be rewritten as
$\|(\mu M + \Lambda_n + E_i)^{-1}\mu M X\|_F^2$
$= \mu^2\,\mathrm{Tr}(HMXX^TMH) - \dfrac{2\mu^2 H_{i\cdot}MXX^TMHH_{\cdot i}}{1 + H_{ii}} + \dfrac{\mu^2 H_{i\cdot}H_{\cdot i}\, H_{i\cdot}MXX^TMH_{\cdot i}}{(1 + H_{ii})^2}.$   (23)
For brevity, the derivations of (22) and (23) are given in
Appendices A and B, respectively, which can be found on
the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.20.
Denote $A = MXX^TM$. Notice that $\mathrm{Tr}(HAH)$ is a constant when selecting the $(n+1)$th data point. Therefore, the optimization problem (20) becomes
$s_{n+1} = \arg\min_{i \notin \{s_1, \ldots, s_n\}} \dfrac{1}{1 + H_{ii}}\left(\dfrac{H_{i\cdot}H_{\cdot i}\, H_{i\cdot}AH_{\cdot i}}{1 + H_{ii}} - 2H_{i\cdot}AHH_{\cdot i}\right).$   (24)

Since $H_{i\cdot}H_{\cdot i} = \|H_{\cdot i}\|^2$, the optimization problem (24) can be further simplified as
$s_{n+1} = \arg\min_{i \notin \{s_1, \ldots, s_n\}} \dfrac{1}{1 + H_{ii}}\, H_{i\cdot}\left(A\left(\dfrac{\|H_{\cdot i}\|^2}{2(1 + H_{ii})}I - H\right)\right) H_{\cdot i}.$   (25)
After we have selected the $(n+1)$th point $\mathbf{x}_{s_{n+1}}$, the $H$ matrix can be updated as
$H \leftarrow (\mu M + \Lambda_{n+1})^{-1} = (\mu M + \Lambda_n + E_{s_{n+1}})^{-1}.$
The matrix inverse can be computed according to (22). This process is repeated until we have selected $k$ points. In the beginning, there are no data points selected. Therefore, we set $H = (\mu M)^{-1}$. Since $M$ is singular, a small ridge term is added to it. The sequential approach is summarized in Table 1.
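Since Table 1 only summarizes the procedure, here is a compact sketch of the sequential solver built on (22) and (25); the value of $\mu$ and the size of the ridge added at initialization are implementation choices assumed here.

```python
import numpy as np

def sequential_llr_active(X, W, k, mu=1.0, ridge=1e-6):
    """Greedily select k points following the sequential scheme of Section 4.1."""
    m = X.shape[0]
    M = (np.eye(m) - W).T @ (np.eye(m) - W)
    A = M @ X @ X.T @ M                               # A = M X X^T M, fixed throughout
    H = np.linalg.inv(mu * M + ridge * np.eye(m))     # H = (mu*M)^{-1}, ridged since M is singular
    selected = []
    for _ in range(k):
        best_i, best_val = None, np.inf
        for i in range(m):
            if i in selected:
                continue
            Hi = H[:, i]                              # H is symmetric: row i equals column i
            denom = 1.0 + H[i, i]
            # criterion of Eq. (24)/(25), up to the constant term Tr(HAH)
            val = ((Hi @ Hi) * (Hi @ A @ Hi) / denom - 2.0 * (Hi @ A @ H @ Hi)) / denom
            if val < best_val:
                best_i, best_val = i, val
        selected.append(best_i)
        # rank-one update of H via Eq. (22)
        H = H - np.outer(H[:, best_i], H[best_i, :]) / (1.0 + H[best_i, best_i])
    return selected
```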
4.2 The Convex Relaxation
In this section, we discuss how to perform convex
relaxation to solve the optimization problem (19).
First, we rewrite the objective function of (19) as follows:
$\|(\mu M + \Lambda)^{-1}\mu M X\|_F^2$
$= \mu^2\,\mathrm{Tr}\big(X^TM(\mu M + \Lambda)^{-1}(\mu M + \Lambda)^{-1}MX\big)$
$= \mu^2\,\mathrm{Tr}\big(X^TM(\mu^2M^2 + \mu M\Lambda + \mu\Lambda M + \Lambda)^{-1}MX\big),$   (26)
where, in line 3, we use the property $\Lambda^2 = \Lambda$.
Since $\Lambda$ is diagonal, we introduce a vector $\boldsymbol{\gamma} = [\gamma_1, \ldots, \gamma_m]^T$ such that $\Lambda = \mathrm{diag}(\boldsymbol{\gamma})$. Here, the value of $\gamma_i$ indicates whether or not the data point $\mathbf{x}_i$ is selected. Define an affine function
$h(\boldsymbol{\gamma}) = \mu^2M^2 + \sum_{i=1}^{m}\gamma_i\big(\mu M_{\cdot i}\mathbf{e}_i^T + \mu\mathbf{e}_iM_{i\cdot} + \mathbf{e}_i\mathbf{e}_i^T\big).$
Thus, the original optimization problem (19) is equivalent to
$\min \;\mathrm{Tr}\big(X^TM\,h(\boldsymbol{\gamma})^{-1}MX\big)$
$\mathrm{s.t.} \;\boldsymbol{\gamma} \in \{0, 1\}^m,\; \mathbf{1}^T\boldsymbol{\gamma} = k,$   (27)
where the variable is $\boldsymbol{\gamma} \in \mathbb{R}^m$ and $\mathbf{1}$ is a column vector of all ones. Notice that the variable vector $\boldsymbol{\gamma}$ is sparse and has only $k$ nonzero entries.
In order to solve the above optimization problem efficiently, we relax the integer constraints on the $\gamma_i$'s and allow the $\gamma_i$'s to take real nonnegative values. Then, the value of $\gamma_i$ indicates how significantly $\mathbf{x}_i$ contributes to the minimization in problem (27). The sparseness of $\boldsymbol{\gamma}$ can be controlled by minimizing the $\ell_1$-norm of $\boldsymbol{\gamma}$ ($\|\boldsymbol{\gamma}\|_1$), which has conventionally been applied to lasso regression [22], [33]. Following the convention in the field of optimization, we use $\succeq$ to denote componentwise inequality between two vectors of the same dimension. For example, $\boldsymbol{\gamma} \succeq \boldsymbol{\beta}$ means that $\gamma_i \geq \beta_i$ for all $i$. Because all the elements of $\boldsymbol{\gamma}$ are nonnegative, $\|\boldsymbol{\gamma}\|_1$ is equal to $\mathbf{1}^T\boldsymbol{\gamma}$. Finally, the optimization problem becomes
$\min \;\mathrm{Tr}\big(X^TM\,h(\boldsymbol{\gamma})^{-1}MX\big) + \mathbf{1}^T\boldsymbol{\gamma}$
$\mathrm{s.t.} \;\boldsymbol{\gamma} \succeq \mathbf{0},$   (28)
where the variable is $\boldsymbol{\gamma} \in \mathbb{R}^m$ and $\mathbf{0}$ is the column vector of all zeros. It can be shown that the problem (28) is a convex optimization problem with variable $\boldsymbol{\gamma}$ [33].
The objective function of problem (28) is twice continuously differentiable, so it can be solved directly by standard optimization techniques [33]. In particular, we show that it can be cast as a Semi-Definite Programming (SDP) problem, which can be solved using a standard SDP package. By introducing an auxiliary variable $P \in \mathbb{R}^{d\times d}$, the problem (28) can be equivalently rewritten as
$\min \;\mathrm{Tr}(P) + \mathbf{1}^T\boldsymbol{\gamma}$
$\mathrm{s.t.} \;P \succeq_{\mathbb{S}^d_+} X^TM\,h(\boldsymbol{\gamma})^{-1}MX,\quad \boldsymbol{\gamma} \succeq \mathbf{0},$   (29)
with variables $P \in \mathbb{R}^{d\times d}$ and $\boldsymbol{\gamma} \in \mathbb{R}^m$. Here, $\mathbb{S}^d_+$ denotes the set of symmetric positive semi-definite $d \times d$ matrices, which is called the positive semi-definite cone in the field of optimization. The associated generalized inequality $\succeq_{\mathbb{S}^d_+}$ is the usual matrix inequality: $A \succeq_{\mathbb{S}^d_+} B$ means $A - B$ is a positive semi-definite $d \times d$ matrix [33].
The problem (29) can be cast as an SDP by using the Schur complement theorem [33]. Consider a symmetric matrix $X$ partitioned as
$X = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}.$
If $A$ is invertible, the matrix $S = C - B^TA^{-1}B$ is called the Schur complement of $A$ in $X$. The Schur complement theorem states that, if $A$ is positive definite, then $X$ is positive semi-definite if and only if $S$ is positive semi-definite. According to this theorem, problem (29) is equivalent to the following SDP problem:
$\min \;\mathrm{Tr}(P) + \mathbf{1}^T\boldsymbol{\gamma}$
$\mathrm{s.t.} \;\begin{bmatrix} h(\boldsymbol{\gamma}) & MX \\ X^TM & P \end{bmatrix} \succeq 0,\quad \boldsymbol{\gamma} \succeq \mathbf{0}.$
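A sketch of this SDP in a modeling language, under the assumption that CVXPY and an SDP-capable solver such as SCS are available; the data, the stand-in weight matrix, and the choice of $\mu$ are synthetic and only meant to show how the Schur-complement constraint is written.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
m, d, p, mu = 30, 4, 5, 1.0
X = rng.standard_normal((m, d))                     # synthetic data matrix

W = np.zeros((m, m))                                # stand-in weights; in practice solve Eq. (12)
for i in range(m):
    nbrs = rng.choice([j for j in range(m) if j != i], size=p, replace=False)
    W[i, nbrs] = 1.0 / p

M = (np.eye(m) - W).T @ (np.eye(m) - W)
MX = M @ X

gamma = cp.Variable(m, nonneg=True)                 # relaxed selection indicators
P = cp.Variable((d, d), symmetric=True)             # auxiliary variable of problem (29)

# affine map h(gamma) = mu^2 M^2 + sum_i gamma_i (mu M_{.i} e_i^T + mu e_i M_{i.} + e_i e_i^T)
h = mu ** 2 * (M @ M)
for i in range(m):
    e = np.zeros(m); e[i] = 1.0
    h = h + gamma[i] * (mu * np.outer(M[:, i], e) + mu * np.outer(e, M[i, :]) + np.outer(e, e))

# Schur-complement block constraint encoding P >= X^T M h(gamma)^{-1} M X
constraints = [cp.bmat([[h, MX], [MX.T, P]]) >> 0]
prob = cp.Problem(cp.Minimize(cp.trace(P) + cp.sum(gamma)), constraints)
prob.solve(solver=cp.SCS)

k = 5
selected = np.argsort(-gamma.value)[:k]             # keep the k largest entries as the selected points
print(selected)
```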
TABLE 1. The Sequential Approach for LLR$_{\text{Active}}$

References
S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge Univ. Press, 2004.
C.J.C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
S.T. Roweis and L.K. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding," Science, vol. 290, pp. 2323-2326, 2000.
J.B. Tenenbaum, V. de Silva, and J.C. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction," Science, vol. 290, pp. 2319-2323, 2000.