Active Learning Based on
Locally Linear Reconstruction
Lijun Zhang, Student Member, IEEE, Chun Chen, Member, IEEE, Jiajun Bu, Member, IEEE,
Deng Cai, Member, IEEE, Xiaofei He, Senior Member, IEEE, and Thomas S. Huang, Life Fellow, IEEE
Abstract—We consider the active learning problem, which aims to select the most representative points. Out of many existing active
learning techniques, optimum experimental design (OED) has received considerable attention recently. The typical OED criteria
minimize the variance of the parameter estimates or predicted value. However, these methods see only global euclidean structure,
while the local manifold structure is ignored. For example, I-optimal design selects those data points such that other data points can be
best approximated by linear combinations of all the selected points. In this paper, we propose a novel active learning algorithm which
takes into account the local structure of the data space. That is, each data point should be approximated by the linear combination of
only its neighbors. Given the local reconstruction coefficients for every data point and the coordinates of the selected points, a
transductive learning algorithm called Locally Linear Reconstruction (LLR) is proposed to reconstruct every other point. The most
representative points are thus defined as those whose coordinates can be used to best reconstruct the whole data set. The sequential
and convex optimization schemes are also introduced to solve the optimization problem. The experimental results have demonstrated
the effectiveness of our proposed method.
Index Terms—Active learning, experimental design, local structure, reconstruction.
1 INTRODUCTION
In many real-world applications, there are huge volumes of unlabeled data, but the labels are usually difficult and expensive to obtain. Semi-supervised learning [1], [2], [3] addresses
this problem by exploring additional information contained
in the unlabeled data. Active learning reduces the labeling
cost in a complementary way by querying the labels of the
most informative points. Thus, instead of being a passive
recipient of data to be processed, the active learner has
the ability to control what data are added to its training set [4].
In this way, we expect that the active learner can achieve high
accuracy using as few labeled points as possible [5].
The main challenge in active learning is how to evaluate
the informativeness of the unlabeled points. One of the
most widely used principles is uncertainty sampling. That is,
the active learner queries those points whose predicted
labels are most uncertain using the current trained model.
This principle has been applied to logistic regression [6],
support vector machines [7], nearest neighbor classifiers [8],
[9], etc. Other popular active learning principles include
query by committee [10], [11], estimated error reduction [12],
[13], and variance reduction [4], [14].
The principle of variance reduction is derived from Optimum Experimental Design (OED) [14]. In statistics, the problem of selecting samples to label is typically referred to as experimental design. The sample $\mathbf{x}$ is referred to as an experiment, and its label $y$ as a measurement. The
study of OED is concerned with the design of experiments
that are expected to minimize variances of a parameterized
model [14], [15], [16], [17]. There are two types of selection
criteria for OED. One type is to choose data points to
minimize the variance of the model’s parameters, which
results in D, A, and E-optimal Design. The other is to
minimize the variance of the prediction value, which results
in I and G-optimal Design.
Recently, Yu et al. have proposed Transductive Experi-
mental Design (TED) [16], which has yielded impressive
results. TED is fundamentally based on I-optimal design
but evaluates the average predictive variance over one test
set that is given beforehand. It has been shown that finding
those points which minimize the average predictive
variance of the estimated function is equivalent to finding
those points such that other points can be best approxi-
mated by linear combinations of the selected points. TED is
a global method in the sense that each data point is linearly
reconstructed by using all of the selected data points, no
matter how far away the selected data points are from the
point to be reconstructed.
In reality, the high-dimensional data may not be
uniformly distributed in the whole ambient space. Instead,
recent studies [18], [19], [20], [21] have shown that naturally
occurring data may reside on a lower dimensional sub-
manifold which is embedded in the high-dimensional
ambient space. However, previous approaches such as
TED fail to take into account this manifold structure. Given
. L. Zhang, C. Chen, and J. Bu are with the Zhejiang Provincial Key
Laboratory of Service Robot, College of Computer Science, Cao Guangbiao
Building, Yuquan Campus, Zhejiang University, Hangzhou 310027,
China. E-mail: {zljzju, chenc, bjj}@zju.edu.cn.
. D. Cai and X. He are with the State Key Lab of CAD & CG, College of
Computer Science, Zhejiang University, 388 Yu Hang Tang Rd.,
Hangzhou 310027, China. E-mail: {dengcai, xiaofeihe}@cad.zju.edu.cn.
. T.S. Huang is with the Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, 405 North Mathews Ave., Urbana, IL 61801. E-mail: huang@ifp.uiuc.edu.
Manuscript received 29 Jan. 2010; revised 26 Aug. 2010; accepted 25 Nov.
2010; published online 28 Jan. 2011.
Recommended for acceptance by J. Winn.
For information on obtaining reprints of this article, please send e-mail to:
tpami@computer.org, and reference IEEECS Log Number
TPAMI-2010-01-0069.
Digital Object Identifier no. 10.1109/TPAMI.2011.20.

a data point, it is more reasonable to reconstruct it by using
only its nearest neighbors [18].
In this paper, we propose a novel active learning
algorithm which selects the most representative points with
respect to the intrinsic geometrical structure of the data.
Inspired by Locally Linear Embedding (LLE) [18], we
assume that each data point and its neighbors lie on or close
to a locally linear patch of the manifold. Then, the manifold
structure is characterized by the linear coefficients that
reconstruct each data point from its neighbors. A transduc-
tive learning algorithm called Locally Linear Reconstruction
(LLR) is proposed to reconstruct the whole data set by using
the given local reconstruction coefficients for every data
point and the coordinates of the selected points. The most
representative points are therefore defined as those whose
coordinates can be used to best reconstruct the whole data
set. A sequential optimization scheme and a convex
relaxation are proposed to solve the optimization problem.
The outline of the paper is as follows: In Section 2, we
review the related work in experimental design. Our
proposed active learning algorithm (LLR$_{\text{Active}}$) is introduced
in Section 3. In Section 4, we propose two computational
schemes to solve the optimization problem. Experiments are
presented in Section 5. Finally, we provide some concluding
remarks and suggestions for future work in Section 6.
Notation. Capital letters (e.g., $M$) are used to denote matrices. For a given matrix $M$, we denote its $i$th column by $M_{\cdot i}$ and its $i$th row by $M_{i\cdot}$. Script capital letters (e.g., $\mathcal{X}$) are used to denote ordinary sets. Blackboard bold capital letters (e.g., $\mathbb{R}$) are used to denote number sets. Lowercase letters are used to denote scalars, and bold lowercase letters are used to denote vectors. We use $\mathbf{x}_i$ to denote both the $i$th point and its coordinate (a column vector).
2 RELATED WORK
As described, the work most related to our proposed
approach is optimum experimental design. In this section,
we will briefly describe the generic active learning problem
and then provide a review of the conventional experimental
design criteria and the recently proposed Transductive
Experimental Design algorithm.
2.1 The Active Learning Problem
The generic problem of active learning is the following.
Given a set of points $\{\mathbf{x}_1, \ldots, \mathbf{x}_m\}$ in $\mathbb{R}^d$, find a subset $\{\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_k}\} \subset \mathcal{X}$ which contains the most informative points. That is, if the points $\mathbf{x}_{s_i}$ $(i = 1, \ldots, k)$ are labeled and used as training points, we can predict the labels of the unlabeled points most precisely. Active learning is usually
referred to as experimental design in statistics. Since our
approach is motivated by recent progress in experimental
design [14], [16], [17], we begin with a brief description of it.
2.2 Optimum Experimental Design
We consider a linear regression model
$y = \mathbf{w}^T\mathbf{x} + \epsilon,$   (1)
where $\mathbf{w} \in \mathbb{R}^d$ is the parameter vector, $y$ is the real-valued output, and $\epsilon$ is the measurement noise with zero mean and constant variance $\sigma^2$. Optimum experimental design attempts to select the most informative experiments (or data points) to learn a prediction function $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x}$ so that the expected prediction error can be minimized. Given a set of measured data points $(\mathbf{x}_{s_1}, y_1), \ldots, (\mathbf{x}_{s_k}, y_k)$, the most popular estimation method is least squares, in which we minimize the residual sum of squares (RSS):
$\mathrm{RSS}(\mathbf{w}) = \sum_{i=1}^{k} \big(y_i - f(\mathbf{x}_{s_i})\big)^2.$   (2)
Let $Z = [\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_k}]^T$ and $\mathbf{y} = [y_1, \ldots, y_k]^T$. The optimal solution is
$\hat{\mathbf{w}} = (Z^T Z)^{-1} Z^T \mathbf{y}.$   (3)
It can be proved [22] that $\hat{\mathbf{w}}$ is an unbiased estimation of $\mathbf{w}$ and its covariance can be expressed as
$\mathrm{Cov}(\hat{\mathbf{w}}) = \sigma^2 (Z^T Z)^{-1}.$   (4)
The criteria of OED [14] can be classified into two categories. The first category is to select the points $\mathbf{x}_{s_i}$ in order to minimize the size of the parameter covariance matrix [23]. The typical methods in this category include D, A, and E-optimal design. D-optimal design minimizes the determinant of $\mathrm{Cov}(\hat{\mathbf{w}})$, and thus minimizes the volume of the confidence region. A-optimal design minimizes the trace of $\mathrm{Cov}(\hat{\mathbf{w}})$, and thus minimizes the dimensions of the enclosing box around the confidence region. E-optimal design minimizes the largest eigenvalue of $\mathrm{Cov}(\hat{\mathbf{w}})$, and thus minimizes the size of the major axis of the confidence region.
The other category of experimental design criteria is to select the points $\mathbf{x}_{s_i}$ in order to minimize the variance of the predicted value over some region of interest $\mathcal{O}$ [24], [25]. Given a test point $\mathbf{v} \in \mathcal{O}$, the predicted value is $\hat{\mathbf{w}}^T\mathbf{v}$, with variance $\mathbf{v}^T\mathrm{Cov}(\hat{\mathbf{w}})\mathbf{v}$. The two most common criteria in this category are I and G-optimal design. I-optimal design minimizes the average predictive variance $\int_{\mathbf{v}\in\mathcal{O}} \mathbf{v}^T\mathrm{Cov}(\hat{\mathbf{w}})\mathbf{v}\, d\pi(\mathbf{v})$, where $\pi$ is a probability distribution on $\mathcal{O}$. G-optimal design minimizes the maximum predictive variance, i.e., $\max_{\mathbf{v}\in\mathcal{O}} \{\mathbf{v}^T\mathrm{Cov}(\hat{\mathbf{w}})\mathbf{v}\}$.
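To make these criteria concrete, here is a small numpy sketch (synthetic data, an assumed noise variance, and an arbitrary candidate design, none of which come from the paper) that forms $\mathrm{Cov}(\hat{\mathbf{w}})$ as in (4) and evaluates the D-, A-, E-, and I-criteria for that design.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 0.25                       # assumed noise variance
X = rng.standard_normal((100, 5))   # synthetic pool of m = 100 candidate points in R^5

def oed_criteria(sel):
    """Evaluate the classical OED criteria for the design Z = X[sel]."""
    Z = X[sel]                              # k x d design matrix
    cov = sigma2 * np.linalg.inv(Z.T @ Z)   # Cov(w_hat), Eq. (4)
    return {
        "D": np.linalg.det(cov),             # volume of the confidence region
        "A": np.trace(cov),                  # sum of parameter variances
        "E": np.linalg.eigvalsh(cov).max(),  # largest eigenvalue
        # I-criterion: average predictive variance v^T Cov(w_hat) v over the pool
        "I": np.mean(np.einsum("ij,jk,ik->i", X, cov, X)),
    }

print(oed_criteria(sel=list(range(10))))    # criteria for an arbitrary 10-point design
```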
2.3 Transductive Experimental Design
Recently, Yu et al. [16] proposed the TED approach, which
can be seen as the discrete version of I-optimal design. TED
considers the Regularized Least Squares formulation (ridge
regression) as follows:
$\hat{\mathbf{w}}_{\mathrm{ridge}} = \arg\min_{\mathbf{w}} \sum_{i=1}^{k} \big(y_i - f(\mathbf{x}_{s_i})\big)^2 + \lambda \|\mathbf{w}\|^2,$   (5)
where $\lambda \geq 0$ is the regularization parameter. It is easy to check that the optimal solution has the following expression:
$\hat{\mathbf{w}}_{\mathrm{ridge}} = (Z^T Z + \lambda I)^{-1} Z^T \mathbf{y},$   (6)
where $I$ is the identity matrix. The covariance matrix of $\hat{\mathbf{w}}_{\mathrm{ridge}}$ is
$\mathrm{Cov}(\hat{\mathbf{w}}_{\mathrm{ridge}})$
$= (Z^T Z + \lambda I)^{-1} Z^T \mathrm{Cov}(\mathbf{y})\, Z (Z^T Z + \lambda I)^{-1}$
$= \sigma^2 (Z^T Z + \lambda I)^{-1} Z^T Z (Z^T Z + \lambda I)^{-1}$
$= \sigma^2 (Z^T Z + \lambda I)^{-1} (Z^T Z + \lambda I - \lambda I)(Z^T Z + \lambda I)^{-1}$
$= \sigma^2 (Z^T Z + \lambda I)^{-1} - \lambda\sigma^2 (Z^T Z + \lambda I)^{-2}.$   (7)

Since the regularization parameter $\lambda$ is usually set to be very small, we have
$\mathrm{Cov}(\hat{\mathbf{w}}_{\mathrm{ridge}}) \approx \sigma^2 (Z^T Z + \lambda I)^{-1}.$   (8)
Similarly to I-optimal design, TED selects those points which can minimize the average predictive variance over a pregiven test set. For simplicity, we assume the test set is just $\mathcal{X}$. Let $X = [\mathbf{x}_1, \ldots, \mathbf{x}_m]^T$. The average predictive variance is
$\frac{1}{m}\sum_{i=1}^{m} \mathbf{x}_i^T \mathrm{Cov}(\hat{\mathbf{w}}_{\mathrm{ridge}}) \mathbf{x}_i \approx \frac{\sigma^2}{m}\sum_{i=1}^{m} \mathbf{x}_i^T (Z^T Z + \lambda I)^{-1} \mathbf{x}_i = \frac{\sigma^2}{m}\mathrm{Tr}\big(X(Z^T Z + \lambda I)^{-1}X^T\big).$   (9)
Thus, TED is formulated as the following optimization problem:
$\min \;\mathrm{Tr}\big(X(Z^T Z + \lambda I)^{-1}X^T\big)$   (10)
with variable $Z = [\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_k}]^T$. After some mathematical derivation, the above problem can be formulated as
$\min \;\sum_{i=1}^{m} \|\mathbf{x}_i - Z^T\mathbf{a}_i\|^2 + \lambda\|\mathbf{a}_i\|^2,$   (11)
where the variables are $Z = [\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_k}]^T$ and $\mathbf{a}_i \in \mathbb{R}^k$, $i = 1, \ldots, m$ [16]. The first term in the objective function shows
that the data points selected by TED are the most
representative ones. That is, the selected points can be used
to reconstruct the whole data set most precisely. The second
term indicates that TED penalizes the norm of the
reconstruction coefficients. So, it tends to select points with
large norm. Notice that TED is closely related to the
problem of Column-Based Matrix Decomposition [26].
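As a concrete illustration, the following sketch (synthetic data; $\lambda$ chosen arbitrarily) evaluates the TED criterion of (10) for a candidate subset, and also the reconstruction view of (11) with the ridge coefficients solved in closed form; the two values agree up to the constant factor $\lambda$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))    # synthetic pool of m = 200 points in R^10
lam = 0.01                            # regularization parameter (assumed)

def ted_criterion(sel):
    """Criterion of Eq. (10): Tr(X (Z^T Z + lam I)^{-1} X^T) for the subset Z = X[sel]."""
    Z = X[sel]
    G = np.linalg.inv(Z.T @ Z + lam * np.eye(X.shape[1]))
    return np.trace(X @ G @ X.T)

def ted_reconstruction_view(sel):
    """Objective of Eq. (11): ridge-reconstruct every x_i from the selected points."""
    Z = X[sel]                                                        # k x d
    A = np.linalg.solve(Z @ Z.T + lam * np.eye(len(sel)), Z @ X.T)    # k x m coefficients a_i
    residual = ((X.T - Z.T @ A) ** 2).sum()
    return residual + lam * (A ** 2).sum()

sel = list(range(20))
print(ted_reconstruction_view(sel) / ted_criterion(sel))   # equals lam: the two objectives match up to that factor
```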
3 ACTIVE LEARNING BASED ON LOCALLY LINEAR RECONSTRUCTION
In this section, we introduce a novel active learning
algorithm based on the principle of locally linear
reconstruction.
3.1 Locally Linear Reconstruction
Recent studies [18], [19], [20], [21], [27] have shown that
naturally occurring data may reside on a lower dimensional
submanifold which is embedded in the high-dimensional
ambient space. However, previous experimental design
approaches only take into account the global euclidean
structure of the data space, whereas the local manifold
structure is not well respected.
Inspired by LLE [18], we assume that the data lie on a
low-dimensional manifold which can be approximated
linearly in a local area of the high-dimensional space.
Therefore, we require that a data point can only be linearly
reconstructed from its neighbors. The optimal reconstruction
coefficients can be obtained by solving the following
problem [18]:
$\min \;\sum_{i=1}^{m}\Big\|\mathbf{x}_i - \sum_{j=1}^{m} W_{ij}\mathbf{x}_j\Big\|^2$
$\mathrm{s.t.} \;\sum_{j=1}^{m} W_{ij} = 1,\; i = 1, \ldots, m$
$\qquad W_{ij} = 0 \;\text{if}\; \mathbf{x}_j \notin N_p(\mathbf{x}_i),$   (12)
where the variable is the matrix $W \in \mathbb{R}^{m\times m}$. Here, $W_{ij}$ summarizes the contribution of the $j$th data point to the $i$th reconstruction, and $N_p(\mathbf{x}_i)$ is the neighborhood of $\mathbf{x}_i$ defined by its $p$ nearest neighbors.
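A minimal numpy sketch of how the weights in (12) can be computed; the brute-force nearest-neighbor search and the small regularizer added to the local Gram matrix are implementation choices assumed here, not prescribed by the paper.

```python
import numpy as np

def local_reconstruction_weights(X, p=5, reg=1e-3):
    """Solve Eq. (12): reconstruct each x_i from its p nearest neighbors,
    with the weights in each row summing to one."""
    m = X.shape[0]
    W = np.zeros((m, m))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    for i in range(m):
        nbrs = np.argsort(d2[i])[1:p + 1]        # p nearest neighbors, excluding x_i itself
        G = X[nbrs] - X[i]                       # neighbors shifted to the origin, p x d
        C = G @ G.T                              # local Gram matrix
        C += reg * np.trace(C) * np.eye(p)       # small regularizer for conditioning (assumption)
        w = np.linalg.solve(C, np.ones(p))
        W[i, nbrs] = w / w.sum()                 # enforce the sum-to-one constraint
    return W
```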
To measure the representativeness of the selected data
points, we need to design a data reconstruction mechanism
by using the reconstruction coefficients. Given a set of
selected data points $\{\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_k}\} \subset \mathcal{X}$, we propose a transductive learning algorithm, called LLR, to reconstruct the data points. Let $\{\mathbf{q}_1, \ldots, \mathbf{q}_m\}$ denote the reconstructed points. Their coordinates are determined by minimizing the following cost function:
$\Phi(\mathbf{q}_1, \ldots, \mathbf{q}_m) = \sum_{i=1}^{k}\|\mathbf{q}_{s_i} - \mathbf{x}_{s_i}\|^2 + \mu\sum_{i=1}^{m}\Big\|\mathbf{q}_i - \sum_{j=1}^{m} W_{ij}\mathbf{q}_j\Big\|^2,$   (13)
where $\mu$ is a suitable constant. The role of the first term on the right-hand side of the cost function is to fix the coordinates of the selected data points. The second term requires the reconstructed points to share the same local geometrical structure with the original points.
Let $X = [\mathbf{x}_1, \ldots, \mathbf{x}_m]^T$, $Q = [\mathbf{q}_1, \ldots, \mathbf{q}_m]^T$, and $\Lambda$ be an $m \times m$ diagonal matrix whose diagonal entry $\Lambda_{ii}$ is 1 if $i \in \{s_1, \ldots, s_k\}$ and 0 otherwise. Then, the above cost function (13) can be rewritten in the following matrix form:
$\Phi(Q) = \mathrm{Tr}\big((Q - X)^T\Lambda(Q - X)\big) + \mu\,\mathrm{Tr}(Q^T M Q),$   (14)
where $M = (I - W)^T(I - W)$. Requiring that the gradient of $\Phi(Q)$ vanish gives the following equation:
$\Lambda(Q - X) + \mu M Q = 0.$   (15)
Finally, the reconstructed points are given by
$Q = (\mu M + \Lambda)^{-1}\Lambda X.$   (16)
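Given $W$, a selected index set, and a choice of $\mu$, (16) can be evaluated directly; a minimal sketch (a linear solve replaces the explicit inverse):

```python
import numpy as np

def llr_reconstruct(X, W, selected, mu=1.0):
    """Reconstruct all points from the selected ones via Eq. (16): Q = (mu*M + Lambda)^{-1} Lambda X."""
    m = X.shape[0]
    M = (np.eye(m) - W).T @ (np.eye(m) - W)
    Lam = np.zeros((m, m))
    Lam[selected, selected] = 1.0                 # diagonal selection matrix Lambda
    return np.linalg.solve(mu * M + Lam, Lam @ X)
```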
The LLR algorithm presented here shares many common
properties with LLE [18]. For example, we use the same
objective function (12) to find the reconstruction coeffi-
cients. However, the goals of LLE and LLR are different.
LLE uses the reconstruction coefficients to obtain lower
dimensional representations of the original data points.
Suppose $\mathbf{y}_i$ is the $l$ ($l \ll d$)-dimensional embedding of $\mathbf{x}_i$, $i = 1, \ldots, m$. LLE solves the following optimization problem to obtain the $\mathbf{y}_i$'s:
$\Psi(\mathbf{y}_1, \ldots, \mathbf{y}_m) = \sum_{i=1}^{m}\Big\|\mathbf{y}_i - \sum_{j=1}^{m} W_{ij}\mathbf{y}_j\Big\|^2.$   (17)
For our LLR algorithm, the goal is to reconstruct the data set. Therefore, the reconstructed data point $\mathbf{q}_i$ has the same dimension as the original data point $\mathbf{x}_i$. Moreover, for the selected data points $\mathbf{x}_{s_i}$, $i = 1, \ldots, k$, their coordinates are given. Therefore, their reconstructions (i.e., $\mathbf{q}_{s_i}$) should be as close to their original coordinates (i.e., $\mathbf{x}_{s_i}$) as possible.
Our ultimate goal is to select the most representative data
points, so that the reconstruction error can be minimized.
There are also some works in semi-supervised learning which follow a principle similar to that of LLR, such as [2], [28], [29].
However, all of these approaches aim to predict the labels
for the unlabeled points by using both labeled and
unlabeled points. In LLR, there is no label prediction task.
The task of LLR is to reconstruct the data set, given some
selected points and the reconstruction coefficients.
3.2 Selecting the Most Representative Points
Given the original data points $\mathbf{x}_1, \ldots, \mathbf{x}_m$ and the reconstructed data points $\mathbf{q}_1, \ldots, \mathbf{q}_m$, the reconstruction error can be measured as follows:
$e(\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_k}) = \|X - Q\|_F^2$
$= \|X - (\mu M + \Lambda)^{-1}\Lambda X\|_F^2$
$= \|X - (\mu M + \Lambda)^{-1}(\Lambda + \mu M - \mu M)X\|_F^2$
$= \|(\mu M + \Lambda)^{-1}\mu M X\|_F^2,$   (18)
where $\|\cdot\|_F$ denotes the matrix Frobenius norm, defined by $\|A\|_F^2 = \mathrm{Tr}(AA^T) = \mathrm{Tr}(A^TA)$. Clearly, the reconstruction error is only dependent on the selected data points $\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_k}$.
Thus, the most representative points are naturally defined as those which minimize the reconstruction error (18). That is, given their coordinates, we can reconstruct the whole data set most precisely by using the LLR algorithm. Suppose we are going to select $k$ points; the active learning problem is then formally defined below:
Definition 1. Active Learning based on LLR:
$\min \;\|(\mu M + \Lambda)^{-1}\mu M X\|_F^2$
$\mathrm{s.t.} \;\Lambda \;\text{is diagonal},\; \Lambda_{ii} \in \{0, 1\},\; i = 1, \ldots, m$
$\qquad \sum_{i=1}^{m}\Lambda_{ii} = k,$   (19)
where the variable is the diagonal matrix $\Lambda \in \mathbb{R}^{m\times m}$.
Given the optimal solution $\hat{\Lambda}$ of (19), we select those data points whose corresponding entries in the diagonal matrix are 1. After we obtain the labels of the selected points, we can use any supervised or semi-supervised algorithms [1], [2], [3], [22], [30] to predict the labels of the other points.
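For a candidate index set, the criterion of (18)-(19) can be evaluated directly, as in the following sketch (again treating $\mu$ as a user-chosen constant):

```python
import numpy as np

def reconstruction_error(X, W, selected, mu=1.0):
    """Objective of Eq. (19): || (mu*M + Lambda)^{-1} mu*M X ||_F^2 for a candidate selection."""
    m = X.shape[0]
    M = (np.eye(m) - W).T @ (np.eye(m) - W)
    Lam = np.zeros((m, m))
    Lam[selected, selected] = 1.0
    R = np.linalg.solve(mu * M + Lam, mu * M @ X)     # (mu*M + Lambda)^{-1} mu*M X
    return float((R ** 2).sum())
```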
4 OPTIMIZATION SCHEME
The optimization problem of LLR$_{\text{Active}}$ (19) is difficult due to its combinatorial nature. In this section, we develop two
optimization schemes to solve (19). The first one is a
sequential greedy approach, and the second one is a convex
relaxation. The solution of the sequential approach is suboptimal, but its sequential property makes it much more efficient than convex optimization, so it can be applied to large-scale data sets. Moreover, our experimental results show only a slight performance difference between the sequential and convex schemes. On the other hand, the convex relaxation approach is guaranteed to find the globally optimal solution of the relaxed problem, but it is computationally expensive.
4.1 The Sequential Approach
Suppose a set of $n$ points $\mathcal{Z}_n = \{\mathbf{x}_{s_1}, \ldots, \mathbf{x}_{s_n}\} \subset \mathcal{X}$ have been selected as the $n$ most representative ones. Let $\Lambda_n$ denote the corresponding $m \times m$ diagonal matrix whose diagonal entry $(\Lambda_n)_{ii}$ is 1 if $\mathbf{x}_i \in \mathcal{Z}_n$ and 0 otherwise. Let $E_i$ be an $m \times m$ matrix whose $(i, i)$th entry is 1 and all the other entries are 0. The $(n+1)$th point $\mathbf{x}_{s_{n+1}}$ can be found by solving the following problem:
$s_{n+1} = \arg\min_{i \notin \{s_1, \ldots, s_n\}} \|(\mu M + \Lambda_n + E_i)^{-1}\mu M X\|_F^2.$   (20)
As can be seen, the most expensive calculation in (20) is the
matrix inverse ðM þ
n
þ
i
Þ
1
. Since the matrix M is
sparse, the sparse Cholesky factorization [31] can be applied
to accelerate the calculation of ðM þ
n
þ
i
Þ
1
MX. But
the sequential solver based on the sparse Cholesky
factorization still needs to perform m n factorizations in
order to solve (20), and thus doesn’t scale well.
A much faster method is to use the Sherman-Morrison-Woodbury formula [32] to avoid directly inverting a matrix. Given an invertible matrix $A$ and two column vectors $\mathbf{u}$ and $\mathbf{v}$, the Sherman-Morrison-Woodbury formula states:
$(A + \mathbf{u}\mathbf{v}^T)^{-1} = A^{-1} - \dfrac{A^{-1}\mathbf{u}\mathbf{v}^T A^{-1}}{1 + \mathbf{v}^T A^{-1}\mathbf{u}}.$   (21)
Denote the $i$th unit vector as $\mathbf{e}_i$. It is easy to check that $E_i = \mathbf{e}_i\mathbf{e}_i^T$. Define
$H = (\mu M + \Lambda_n)^{-1}.$
Let $H_{\cdot i}$ denote the $i$th column of $H$, and $H_{i\cdot}$ denote the $i$th row of $H$. Following (21), we get
$(\mu M + \Lambda_n + E_i)^{-1} = H - \dfrac{H_{\cdot i}H_{i\cdot}}{1 + H_{ii}}.$   (22)
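A quick numerical check of the rank-one update (22) on synthetic data (the positive definite matrix below simply plays the role of $\mu M + \Lambda_n$):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
A = B @ B.T + np.eye(6)        # synthetic symmetric positive definite matrix (stands in for mu*M + Lambda_n)
H = np.linalg.inv(A)
i = 3
e = np.zeros(6); e[i] = 1.0    # E_i = e_i e_i^T

lhs = np.linalg.inv(A + np.outer(e, e))                   # direct inverse of the updated matrix
rhs = H - np.outer(H[:, i], H[i, :]) / (1.0 + H[i, i])    # rank-one update of Eq. (22)
print(np.allclose(lhs, rhs))                              # True
```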
With (22), the objective function of (20) can be rewritten as
$\|(\mu M + \Lambda_n + E_i)^{-1}\mu M X\|_F^2$
$= \mu^2\,\mathrm{Tr}(HMXX^TMH) - \dfrac{2\mu^2 H_{i\cdot}MXX^TMHH_{\cdot i}}{1 + H_{ii}} + \dfrac{\mu^2 H_{i\cdot}H_{\cdot i}\, H_{i\cdot}MXX^TMH_{\cdot i}}{(1 + H_{ii})^2}.$   (23)
For brevity, the derivations of (22) and (23) are given in
Appendices A and B, respectively, which can be found on
the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPAMI.2011.20.
Denote $A = MXX^TM$. Notice that $\mathrm{Tr}(HAH)$ is a constant when selecting the $(n+1)$th data point. Therefore, the optimization problem (20) becomes
$s_{n+1} = \arg\min_{i \notin \{s_1, \ldots, s_n\}} \dfrac{1}{1 + H_{ii}}\left(\dfrac{H_{i\cdot}H_{\cdot i}\, H_{i\cdot}AH_{\cdot i}}{1 + H_{ii}} - 2H_{i\cdot}AHH_{\cdot i}\right).$   (24)

Since $H_{i\cdot}H_{\cdot i} = \|H_{\cdot i}\|^2$, the optimization problem (24) can be further simplified as
$s_{n+1} = \arg\min_{i \notin \{s_1, \ldots, s_n\}} \dfrac{1}{1 + H_{ii}}\, H_{i\cdot}\left(A\left(\dfrac{\|H_{\cdot i}\|^2}{2(1 + H_{ii})}I - H\right)\right) H_{\cdot i}.$   (25)
After we have selected the $(n+1)$th point $\mathbf{x}_{s_{n+1}}$, the $H$ matrix can be updated as
$H \leftarrow (\mu M + \Lambda_{n+1})^{-1} = (\mu M + \Lambda_n + E_{s_{n+1}})^{-1}.$
The matrix inverse can be computed according to (22). This process is repeated until we have selected $k$ points. In the beginning, there are no data points selected. Therefore, we set $H = (\mu M)^{-1}$. Since $M$ is singular, a small ridge term is added to it. The sequential approach is summarized in Table 1.
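Since Table 1 only summarizes the procedure, here is a compact sketch of the sequential solver built on (22) and (25); the value of $\mu$ and the size of the ridge added at initialization are implementation choices assumed here.

```python
import numpy as np

def sequential_llr_active(X, W, k, mu=1.0, ridge=1e-6):
    """Greedily select k points following the sequential scheme of Section 4.1."""
    m = X.shape[0]
    M = (np.eye(m) - W).T @ (np.eye(m) - W)
    A = M @ X @ X.T @ M                               # A = M X X^T M, fixed throughout
    H = np.linalg.inv(mu * M + ridge * np.eye(m))     # H = (mu*M)^{-1}, ridged since M is singular
    selected = []
    for _ in range(k):
        best_i, best_val = None, np.inf
        for i in range(m):
            if i in selected:
                continue
            Hi = H[:, i]                              # H is symmetric: row i equals column i
            denom = 1.0 + H[i, i]
            # criterion of Eq. (24)/(25), up to the constant term Tr(HAH)
            val = ((Hi @ Hi) * (Hi @ A @ Hi) / denom - 2.0 * (Hi @ A @ H @ Hi)) / denom
            if val < best_val:
                best_i, best_val = i, val
        selected.append(best_i)
        # rank-one update of H via Eq. (22)
        H = H - np.outer(H[:, best_i], H[best_i, :]) / (1.0 + H[best_i, best_i])
    return selected
```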
4.2 The Convex Relaxation
In this section, we discuss how to perform convex
relaxation to solve the optimization problem (19).
First, we rewrite the objective function of (19) as follows:
$\|(\mu M + \Lambda)^{-1}\mu M X\|_F^2$
$= \mu^2\,\mathrm{Tr}\big(X^TM(\mu M + \Lambda)^{-1}(\mu M + \Lambda)^{-1}MX\big)$
$= \mu^2\,\mathrm{Tr}\big(X^TM(\mu^2M^2 + \mu M\Lambda + \mu\Lambda M + \Lambda)^{-1}MX\big),$   (26)
where, in line 3, we use the property $\Lambda^2 = \Lambda$.
Since $\Lambda$ is diagonal, we introduce a vector $\boldsymbol{\gamma} = [\gamma_1, \ldots, \gamma_m]^T$ such that $\Lambda = \mathrm{diag}(\boldsymbol{\gamma})$. Here, the value of $\gamma_i$ indicates whether or not the data point $\mathbf{x}_i$ is selected. Define an affine function
$h(\boldsymbol{\gamma}) = \mu^2M^2 + \sum_{i=1}^{m}\gamma_i\big(\mu M_{\cdot i}\mathbf{e}_i^T + \mu\mathbf{e}_iM_{i\cdot} + \mathbf{e}_i\mathbf{e}_i^T\big).$
Thus, the original optimization problem (19) is equivalent to
$\min \;\mathrm{Tr}\big(X^TM\,h(\boldsymbol{\gamma})^{-1}MX\big)$
$\mathrm{s.t.} \;\boldsymbol{\gamma} \in \{0, 1\}^m,\; \mathbf{1}^T\boldsymbol{\gamma} = k,$   (27)
where the variable is $\boldsymbol{\gamma} \in \mathbb{R}^m$ and $\mathbf{1}$ is a column vector of all ones. Notice that the variable vector $\boldsymbol{\gamma}$ is sparse and has only $k$ nonzero entries.
In order to solve the above optimization problem efficiently, we relax the integer constraints on the $\gamma_i$'s and allow the $\gamma_i$'s to take real nonnegative values. Then, the value of $\gamma_i$ indicates how significantly $\mathbf{x}_i$ contributes to the minimization in problem (27). The sparseness of $\boldsymbol{\gamma}$ can be controlled by minimizing the $\ell_1$-norm of $\boldsymbol{\gamma}$ ($\|\boldsymbol{\gamma}\|_1$), which has conventionally been applied to lasso regression [22], [33]. Following the convention in the field of optimization, we use $\succeq$ to denote componentwise inequality between two vectors of the same dimension. For example, $\boldsymbol{\gamma} \succeq \boldsymbol{\beta}$ means that $\gamma_i \geq \beta_i$ for all $i$. Because all the elements of $\boldsymbol{\gamma}$ are nonnegative, $\|\boldsymbol{\gamma}\|_1$ is equal to $\mathbf{1}^T\boldsymbol{\gamma}$. Finally, the optimization problem becomes
$\min \;\mathrm{Tr}\big(X^TM\,h(\boldsymbol{\gamma})^{-1}MX\big) + \mathbf{1}^T\boldsymbol{\gamma}$
$\mathrm{s.t.} \;\boldsymbol{\gamma} \succeq \mathbf{0},$   (28)
where the variable is $\boldsymbol{\gamma} \in \mathbb{R}^m$ and $\mathbf{0}$ is the column vector of all zeros. It can be shown that the problem (28) is a convex optimization problem with variable $\boldsymbol{\gamma}$ [33].
The objective function of problem (28) is twice continuously differentiable, so it can be solved directly by standard optimization techniques [33]. In particular, we show that it can be cast as a Semi-Definite Programming (SDP) problem, which can be solved using a standard SDP package. By introducing an auxiliary variable $P \in \mathbb{R}^{d\times d}$, the problem (28) can be equivalently rewritten as
$\min \;\mathrm{Tr}(P) + \mathbf{1}^T\boldsymbol{\gamma}$
$\mathrm{s.t.} \;P \succeq_{\mathbb{S}^d_+} X^TM\,h(\boldsymbol{\gamma})^{-1}MX,\quad \boldsymbol{\gamma} \succeq \mathbf{0},$   (29)
with variables $P \in \mathbb{R}^{d\times d}$ and $\boldsymbol{\gamma} \in \mathbb{R}^m$. Here, $\mathbb{S}^d_+$ denotes the set of symmetric positive semi-definite $d \times d$ matrices, which is called the positive semi-definite cone in the field of optimization. The associated generalized inequality $\succeq_{\mathbb{S}^d_+}$ is the usual matrix inequality: $A \succeq_{\mathbb{S}^d_+} B$ means $A - B$ is a positive semi-definite $d \times d$ matrix [33].
The problem (29) can be cast as an SDP by using the Schur complement theorem [33]. Consider a symmetric matrix $X$ partitioned as
$X = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}.$
If $A$ is invertible, the matrix $S = C - B^TA^{-1}B$ is called the Schur complement of $A$ in $X$. The Schur complement theorem states that, if $A$ is positive definite, then $X$ is positive semi-definite if and only if $S$ is positive semi-definite. According to this theorem, problem (29) is equivalent to the following SDP problem:
$\min \;\mathrm{Tr}(P) + \mathbf{1}^T\boldsymbol{\gamma}$
$\mathrm{s.t.} \;\begin{bmatrix} h(\boldsymbol{\gamma}) & MX \\ X^TM & P \end{bmatrix} \succeq 0,\quad \boldsymbol{\gamma} \succeq \mathbf{0}.$
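A sketch of this SDP in a modeling language, under the assumption that CVXPY and an SDP-capable solver such as SCS are available; the data, the stand-in weight matrix, and the choice of $\mu$ are synthetic and only meant to show how the Schur-complement constraint is written.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
m, d, p, mu = 30, 4, 5, 1.0
X = rng.standard_normal((m, d))                     # synthetic data matrix

W = np.zeros((m, m))                                # stand-in weights; in practice solve Eq. (12)
for i in range(m):
    nbrs = rng.choice([j for j in range(m) if j != i], size=p, replace=False)
    W[i, nbrs] = 1.0 / p

M = (np.eye(m) - W).T @ (np.eye(m) - W)
MX = M @ X

gamma = cp.Variable(m, nonneg=True)                 # relaxed selection indicators
P = cp.Variable((d, d), symmetric=True)             # auxiliary variable of problem (29)

# affine map h(gamma) = mu^2 M^2 + sum_i gamma_i (mu M_{.i} e_i^T + mu e_i M_{i.} + e_i e_i^T)
h = mu ** 2 * (M @ M)
for i in range(m):
    e = np.zeros(m); e[i] = 1.0
    h = h + gamma[i] * (mu * np.outer(M[:, i], e) + mu * np.outer(e, M[i, :]) + np.outer(e, e))

# Schur-complement block constraint encoding P >= X^T M h(gamma)^{-1} M X
constraints = [cp.bmat([[h, MX], [MX.T, P]]) >> 0]
prob = cp.Problem(cp.Minimize(cp.trace(P) + cp.sum(gamma)), constraints)
prob.solve(solver=cp.SCS)

k = 5
selected = np.argsort(-gamma.value)[:k]             # keep the k largest entries as the selected points
print(selected)
```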
TABLE 1. The Sequential Approach for LLR$_{\text{Active}}$

References
S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge Univ. Press, 2004.
C.J.C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
S.T. Roweis and L.K. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding," Science, vol. 290, pp. 2323-2326, 2000.
J.B. Tenenbaum, V. de Silva, and J.C. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction," Science, vol. 290, pp. 2319-2323, 2000.