EFFICIENT MONTE CARLO OPTIMIZATION FOR MULTI-LABEL CLASSIFIER CHAINS*

Jesse Read, Luca Martino (Dept. of Signal Theory and Communications, Univ. Carlos III de Madrid, Spain)
David Luengo (Dept. of Circuits and Systems Engineering, Univ. Politecnica de Madrid, Spain)

*This work has been partly supported by the Spanish government through projects COMONSENS (CSD2008-00010), DEIPRO (TEC2009-14504-C02-01), ALCIT (TEC2012-38800-C03-01), COMPREHENSION (TEC2012-38883-C02-01) and DISSECT (TEC2012-38058-C03-01).
ABSTRACT

Multi-label classification (MLC) is the supervised learning problem where an instance may be associated with multiple labels. Modeling dependencies between labels allows MLC methods to improve their performance at the expense of an increased computational cost. In this paper we focus on the classifier chains (CC) approach for modeling dependencies. On the one hand, the original CC algorithm makes a greedy approximation, and is fast but tends to propagate errors down the chain. On the other hand, a recent Bayes-optimal method improves the performance, but is computationally intractable in practice. Here we present a novel double-Monte Carlo scheme (M2CC), both for finding a good chain sequence and performing efficient inference. The M2CC algorithm remains tractable for high-dimensional data sets and obtains the best overall accuracy, as shown on several real data sets with input dimension as high as 1449 and up to 103 labels.

Index Terms: multi-label classification; Monte Carlo methods; classifier chains
1. INTRODUCTION
Multi-label classification (MLC) is the supervised learning problem where an instance may be associated with multiple labels, rather than with a single label as in traditional binary or multi-class single-label classification (SLC) problems. The MLC learning context is receiving increased attention in the literature, since it arises naturally in a wide variety of domains: text, audio, still images and video, bioinformatics, etc. [1, 2]. The main challenge in this area is modeling label dependencies without incurring intractable complexity.
A basic approach to MLC is provided by the so-called binary relevance (BR) method, which decomposes the MLC problem into a set of SLC problems (one per label) and uses a separate classifier for each label. In this way, the multi-label problem is turned into a series of standard binary classification problems that can be solved with any off-the-shelf binary classifier (e.g., a logistic regressor or a support vector machine). Unfortunately, although BR has a low computational cost, it cannot provide high performance, because it does not model dependencies between labels [2, 3, 4, 5, 6].
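[Editor's note: to make the BR transformation concrete, here is a minimal Python sketch. It assumes scikit-learn's LogisticRegression as the off-the-shelf binary classifier; the paper itself uses SVMs fitted with logistic models, so this choice is illustrative only.]

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_br(X, Y):
    """Binary relevance: fit one independent binary classifier per label.
    X: (N, D) feature matrix; Y: (N, L) binary label matrix."""
    return [LogisticRegression().fit(X, Y[:, j]) for j in range(Y.shape[1])]

def predict_br(classifiers, X):
    """Predict each label independently; no label dependencies are modeled."""
    return np.column_stack([h.predict(X) for h in classifiers])
```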
In order to model dependencies explicitly, several alternative schemes have been proposed, such as the so-called label powerset (LP) method [7]. LP considers each potential combination of labels in the MLC problem as a single label. In this way, the multi-label problem is turned into a traditional multi-class problem that can be solved using standard methods. Unfortunately, given the huge number of class values produced by this transformation, this method is usually infeasible in practice, and suffers from issues like overfitting. This was recognised by [3, 8], which provide approximations to the LP scheme that reduce these problems, although such methods have been superseded in recent years.
A more recent idea is using classifier chains (CC), which improves the performance of BR and LP by constructing a sequence of classifiers that make use of previous outputs of the chain. The original CC method, introduced in [4] and extended in [5, 9], makes a greedy approximation, and is fast but tends to propagate errors down the chain. Nevertheless, a very recent extensive experimental comparison reaffirmed that CC is among the highest-performing methods for MLC, and recommended it as a benchmark algorithm [10]. A CC-based Bayes-optimal method, probabilistic classifier chains (PCC), has also been recently proposed [5]. However, although it improves the performance of CC, its computational cost is too large for most real-world applications.
In this paper we introduce a novel method that attains the performance of PCC, but remains tractable for high-dimensional data sets. Our approach (M2CC) is based on a double Monte Carlo optimization technique and, unlike all other chain-based methods in the literature, it explicitly searches the space of possible chain sequences during the training stage. Hence, predictive performance can be traded off for scalability depending on the application.

The paper is organized as follows. In Section 2 we review multi-label classification and the important developments leading up to this paper. In Section 3 we detail our proposed novel methods. In Section 4 we carry out empirical evaluations. Finally, in Section 5 we draw some conclusions and mention possible future work.

2. MULTI-LABEL CLASSIFICATION (MLC)

Let us assume that we have a set of training data composed of $N$ labelled examples, $\mathcal{D} = \{(\mathbf{x}^{(i)}, \mathbf{y}^{(i)})\}_{i=1}^{N}$, where $\mathbf{x}^{(i)} = [x_1^{(i)}, \ldots, x_D^{(i)}]^\top$ is the $i$-th $D$-dimensional instance (input), with $x_d^{(i)} \in \mathcal{X}_d$ for $1 \le d \le D$, and $\mathbf{y}^{(i)} = [y_1^{(i)}, \ldots, y_L^{(i)}]^\top$ is the $i$-th example's $L \times 1$ label relevance vector (output), with $y_j^{(i)} \in \{0, 1\}$ being its $j$-th label assignment (1 iff the label is relevant to $\mathbf{x}^{(i)}$, 0 otherwise).
In MLC we seek to learn a function, $\mathbf{y} = \mathbf{h}(\mathbf{x})$, that assigns a vector of labels, $\mathbf{y} \in \{0,1\}^L$, to each instance, $\mathbf{x} \in \mathcal{X}_1 \times \cdots \times \mathcal{X}_D$. Let us assume that the true distribution of the data is $f(\mathbf{y}|\mathbf{x})$. From a Bayesian point of view, the optimal label assignment (i.e., the one with the largest probability of being the true one) for a given test instance, $\mathbf{x}^*$, is provided by the maximum a posteriori (MAP) label estimate:

$$\mathbf{y}_{\mathrm{MAP}} = \mathbf{h}_{\mathrm{MAP}}(\mathbf{x}^*) = \arg\max_{\mathbf{y}} f(\mathbf{y}|\mathbf{x}^*). \tag{1}$$

Unfortunately, the true distribution, $f(\mathbf{y}|\mathbf{x})$, is usually unknown, and the classifier has to work with an approximation, $p(\mathbf{y}|\mathbf{x})$, constructed from the training data. Hence, the (possibly sub-optimal) label prediction is finally given by

$$\mathbf{y}^* = \mathbf{h}(\mathbf{x}^*) = \arg\max_{\mathbf{y}} p(\mathbf{y}|\mathbf{x}^*). \tag{2}$$
2.1. Classifier Chains (CC)

Classifier chains (CC) is based on modeling the correlation among labels using the chain rule of probability. Given a data instance, $\mathbf{x}$, and a vector of label indexes, $\mathbf{s} = [s_1, \ldots, s_L]^\top$, obtained as a permutation of $\{1, \ldots, L\}$, $p(\mathbf{y}|\mathbf{x}, \mathbf{s})$ may be expressed as¹

$$p(\mathbf{y}|\mathbf{x}^*, \mathbf{s}) = p(\tilde{y}_1|\mathbf{x}^*) \prod_{j=2}^{L} p(\tilde{y}_j|\mathbf{x}^*, \tilde{y}_1, \ldots, \tilde{y}_{j-1}), \tag{3}$$

where $\tilde{\mathbf{y}} = [\tilde{y}_1, \ldots, \tilde{y}_L]^\top$ is the permuted label vector, $\tilde{y}_j = y_{s_j}$ is the $j$-th label in the permutation, and the probabilities in (3) are learnt from the labelled data during the training stage.
During the test stage, CC follows a single path greedily down the chain of $L$ binary classifiers, with the $j$-th classifier, $h_j$, predicting the $j$-th label's relevance, $\tilde{y}_j^*$, using the test instance, $\mathbf{x}^*$, and all previous predictions, $\{\tilde{y}_1^*, \ldots, \tilde{y}_{j-1}^*\}$, as

$$\tilde{y}_j^* = h_j(\mathbf{x}^*|\mathbf{s}) = \arg\max_{\tilde{y}_j} p(\tilde{y}_j|\mathbf{x}^*, \tilde{y}_1^*, \ldots, \tilde{y}_{j-1}^*). \tag{4}$$

In carrying out classification down a chain in this way, CC models label dependencies and, as a result, usually performs much better than BR, while being similar in memory and time requirements in practice. However, due to its greedy approach it is susceptible to errors in the initial links of the chain [5].
¹Theoretically, Eq. (3) does not depend on the label order. However, since all the probabilities in (3) are estimated from the training data, the label order can have a large effect in practice, as recognized by [5].
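[Editor's note: as an illustration of Eq. (4), the following sketch implements greedy chain inference: each classifier sees the input features augmented with the labels predicted so far. The `models` list and its scikit-learn-style `predict_proba` interface are assumptions, not the authors' implementation.]

```python
import numpy as np

def cc_greedy_predict(models, x, s):
    """Greedy CC inference, Eq. (4).
    models[j]: binary classifier for the j-th label in chain order,
               trained on [x, y_tilde_1, ..., y_tilde_{j-1}].
    x: (D,) test instance; s: permutation of label indexes."""
    L = len(s)
    y = np.zeros(L, dtype=int)        # predictions in original label order
    prev = []                         # predictions of earlier chain links
    for j, sj in enumerate(s):
        z = np.concatenate([x, prev]) # augment input with previous outputs
        p1 = models[j].predict_proba(z.reshape(1, -1))[0, 1]
        y[sj] = int(p1 > 0.5)         # greedy: keep only the argmax branch
        prev.append(y[sj])
    return y
```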
2.2. Probabilistic Classifier Chains (PCC)

Probabilistic classifier chains (PCC) was introduced in [5]. In the training phase, PCC is identical to CC. However, during the test stage PCC provides Bayes-optimal inference by exploring all the $2^L$ possible paths of the chain. Hence, for a given test instance, $\mathbf{x}^*$, PCC provides the optimum label estimate, obtained by maximizing over the whole label vector, $\mathbf{y}$, rather than over the individual labels, $\tilde{y}_j$:

$$\mathbf{y}^* = \mathbf{h}(\mathbf{x}^*|\mathbf{s}) = \arg\max_{\mathbf{y}} p(\mathbf{y}|\mathbf{x}^*, \mathbf{s}), \tag{5}$$

where $p(\mathbf{y}|\mathbf{x}^*, \mathbf{s})$ is given by (3). In [5] an overall improvement of PCC over CC is reported, but at the price of high computational complexity: it is intractable for more than about 10 labels ($2^{10}$ paths), which rules out the majority of problems in the multi-label domain.
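[Editor's note: a minimal sketch of Eq. (5), enumerating all $2^L$ paths (feasible only for small $L$). It reuses the hypothetical `models`/`predict_proba` interface of the CC sketch above.]

```python
import itertools
import numpy as np

def pcc_exhaustive_predict(models, x, s):
    """Bayes-optimal PCC inference, Eq. (5): score every one of the
    2^L label paths under the chain model (3) and keep the best."""
    L = len(s)
    best_path, best_prob = None, -1.0
    for path in itertools.product([0, 1], repeat=L):   # all 2^L paths
        prob = 1.0
        for j in range(L):
            z = np.concatenate([x, path[:j]]).reshape(1, -1)
            p1 = models[j].predict_proba(z)[0, 1]
            prob *= p1 if path[j] == 1 else 1.0 - p1   # chain rule, Eq. (3)
        if prob > best_prob:
            best_path, best_prob = path, prob
    y = np.zeros(L, dtype=int)
    y[list(s)] = best_path                             # undo the permutation
    return y
```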
3. EFFICIENT DOUBLE MONTE CARLO TECHNIQUE FOR CLASSIFIER CHAINS
In chain-based MLC problems, for any given test instance, $\mathbf{x}^*$, and label order, $\mathbf{s}$, we wish to find the best label-relevance vector, $\mathbf{y}^* = [y_1^*, \ldots, y_L^*]$, out of the $2^L$ possible label vectors or paths. However, the best inference on a poor model will not be as good as the best inference on a good model. Therefore, at training time we also wish to find the best chain order or sequence, $\mathbf{s} = [s_1, \ldots, s_L]$, out of the $L!$ possible chains.

Unfortunately, the optimal solution of these two problems is not feasible for large values of $L$. Hence, in this section we introduce an efficient double Monte Carlo strategy for quasi-optimal inference in classifier chains. We present both a tractable label prediction scheme at test time (MCC) and a method that performs an additional search for the optimal chain sequence at build time (M2CC); an issue which, to the best of our knowledge, has not yet been successfully tackled, except by means of avoiding it using a network, such as the conditional dependency network (CDN) of [6].
3.1. Training step: finding the best chain

In order to obtain the best chain (i.e., the optimal label order) during the training step, we introduce a payoff function,

$$J(\mathbf{s}) = \sum_{i=1}^{N} p\big(\mathbf{y}^{(i)}\big|\mathbf{x}^{(i)}, \mathbf{s}\big), \tag{6}$$

and the optimal sequence, $\hat{\mathbf{s}}$, is the one that maximizes (6) over the set of $L!$ possible sequences, i.e.,

$$\hat{\mathbf{s}} = \arg\max_{\mathbf{s}} J(\mathbf{s}) = \arg\max_{\mathbf{s}} \sum_{i=1}^{N} p\big(\mathbf{y}^{(i)}\big|\mathbf{x}^{(i)}, \mathbf{s}\big). \tag{7}$$

The exact solution of (7) is intractable even for medium values of $L$. Therefore, we propose using the Monte Carlo approach summarized in Algorithm 1 to perform an efficient exploration of the label-sequence space.

Algorithm 1. Finding a suitable $\mathbf{s}$.

Input:
  $\mathcal{D} = \{(\mathbf{x}^{(i)}, \mathbf{y}^{(i)})\}_{i=1}^{N}$: training data.
  $\pi(\mathbf{s}|\mathbf{s}_{t-1})$: proposal function.
  $T'$: number of iterations.

Algorithm:
  1. Start with some random sequence, $\mathbf{s}_0$, and build an initial model, $p(\mathbf{y}|\mathbf{x}, \mathbf{s}_0)$.
  2. For $t = 1, \ldots, T'$:
     (a) Draw $\mathbf{s}' \sim \pi(\mathbf{s}|\mathbf{s}_{t-1})$ and build the model $p(\mathbf{y}|\mathbf{x}, \mathbf{s}')$.
     (b) If $J(\mathbf{s}') > J(\mathbf{s}_{t-1})$: accept, $\mathbf{s}_t \leftarrow \mathbf{s}'$.
     (c) Else: reject, $\mathbf{s}_t \leftarrow \mathbf{s}_{t-1}$.

Output:
  $\hat{\mathbf{s}} = \mathbf{s}_{T'}$: estimated label sequence.
This algorithm starts with a randomly chosen label sequence, $\mathbf{s}_0$, which is then modified trying to find at least a local maximum of the payoff function. More specifically, given a sequence $\mathbf{s}_{t-1}$, the proposal function $\pi(\mathbf{s}_t|\mathbf{s}_{t-1})$ consists of choosing uniformly two positions of the label sequence, $1 \le \ell, m \le L$, and swapping the labels corresponding to those positions, so that $s_t(\ell) = s_{t-1}(m)$ and $s_t(m) = s_{t-1}(\ell)$.
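[Editor's note: the following is a minimal sketch of Algorithm 1, assuming a hypothetical `build_cc_model(D, s)` that trains a chain in order `s`, and a `payoff(model, D)` implementing Eq. (6); both names are illustrative.]

```python
import random

def find_chain_sequence(D, L, T_prime, build_cc_model, payoff,
                        rng=random.Random(0)):
    """Monte Carlo search over label orders (Algorithm 1).
    Proposal: swap two uniformly chosen positions of the current sequence."""
    s = list(range(L))
    rng.shuffle(s)                      # random initial sequence s_0
    model = build_cc_model(D, s)
    best_J = payoff(model, D)           # J(s_0), Eq. (6)
    for _ in range(T_prime):
        s_new = s[:]
        l, m = rng.sample(range(L), 2)  # pick two positions uniformly
        s_new[l], s_new[m] = s_new[m], s_new[l]
        cand = build_cc_model(D, s_new)
        J_new = payoff(cand, D)
        if J_new > best_J:              # accept only improvements
            s, model, best_J = s_new, cand, J_new
    return s, model
```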
3.2. Inference (test) step: finding the best path $\mathbf{y}^*$

In the test step, for a given test instance, $\mathbf{x}^*$, for which the true label association is unknown, and a label order (either estimated for M2CC or randomly chosen for MCC), we wish to find the optimal label vector that maximizes (5). In general, this problem can be solved exactly for low values of $L$ by exploring all the $2^L$ possible paths, as in the PCC method [5]. However, when $L$ grows this method quickly becomes computationally intractable. Therefore, we propose here using the random search Monte Carlo approach shown in Algorithm 2 to approximate (5). This algorithm starts from the greedy inference offered by standard CC and draws samples $\mathbf{y}^{(t)}$, $t = 1, \ldots, T$, according to the model $p(\mathbf{y}|\mathbf{x}^*, \mathbf{s})$, providing a predicted label sequence

$$\mathbf{y}^* = \arg\max_{\mathbf{y}_t^*} p(\mathbf{y}_t^*|\mathbf{x}^*, \mathbf{s}), \tag{8}$$

where $\mathbf{y}_t^*$ ($1 \le t \le T$) are the samples accepted by the algorithm.

Algorithm 2. Finding $\mathbf{y}^*$ for a given test instance $\mathbf{x}^*$.

Input:
  $\mathbf{x}^*$: test instance.
  $\mathbf{s}$: label order (estimated or chosen randomly).
  $p(\mathbf{y}|\mathbf{x}, \mathbf{s})$: probabilistic model (from the training stage).

Algorithm:
  1. Obtain an initial path, $\mathbf{y}_0$, using CC.
  2. For $t = 1, \ldots, T$:
     (a) Draw $\mathbf{y}' \sim p(\mathbf{y}|\mathbf{x}^*, \mathbf{s})$.
     (b) If $p(\mathbf{y}'|\mathbf{x}^*, \mathbf{s}) > p(\mathbf{y}_{t-1}|\mathbf{x}^*, \mathbf{s})$: accept, $\mathbf{y}_t \leftarrow \mathbf{y}'$.
     (c) Else: reject, $\mathbf{y}_t \leftarrow \mathbf{y}_{t-1}$.

Output:
  $\mathbf{y}^* = \mathbf{y}_T$: predicted label assignment.
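[Editor's note: a sketch of Algorithm 2, again assuming the hypothetical `models`/`predict_proba` chain interface used above. `sample_path` draws one path by sampling each label from its conditional instead of taking the argmax.]

```python
import numpy as np

def greedy_path(models, x, L):
    """Greedy CC path (Eq. (4)): take the argmax label at each link."""
    path = []
    for j in range(L):
        z = np.concatenate([x, path]).reshape(1, -1)
        path.append(int(models[j].predict_proba(z)[0, 1] > 0.5))
    return path

def sample_path(models, x, L, rng):
    """Draw one label path y ~ p(y|x, s) by sampling down the chain."""
    path = []
    for j in range(L):
        z = np.concatenate([x, path]).reshape(1, -1)
        p1 = models[j].predict_proba(z)[0, 1]
        path.append(int(rng.random() < p1))
    return path

def path_prob(models, x, path):
    """Probability of a complete path under the chain model, Eq. (3)."""
    prob = 1.0
    for j, yj in enumerate(path):
        z = np.concatenate([x, path[:j]]).reshape(1, -1)
        p1 = models[j].predict_proba(z)[0, 1]
        prob *= p1 if yj == 1 else 1.0 - p1
    return prob

def mcc_predict(models, x, s, T, rng=np.random.default_rng(0)):
    """Algorithm 2: start from the greedy CC path, then keep the best of
    T paths sampled from the chain model itself."""
    best = greedy_path(models, x, len(s))
    best_p = path_prob(models, x, best)
    for _ in range(T):
        cand = sample_path(models, x, len(s), rng)
        p = path_prob(models, x, cand)
        if p > best_p:                 # accept only improvements, Eq. (8)
            best, best_p = cand, p
    y = np.zeros(len(s), dtype=int)
    y[list(s)] = best                  # undo the permutation
    return y
```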
4. EXPERIMENTS

We perform experiments on a collection of real-world data sets familiar in the multi-label literature [3, 4, 5], whose characteristics are shown in Table 1. We compare our two novel methods (MCC and M2CC) to baseline BR [7], the original classifier chains method CC [4], the Bayes-optimal rendition PCC [5], and also the conditional dependency networks method CDN of [6] under $I = 1000$ total iterations. For our methods, we use $T = 100$ (inference $\mathbf{y}$-step) and just $T' = 10$ for M2CC (training $\mathbf{s}$-step).² As a base classifier we use support vector machines fitted with logistic models in order to have a probabilistic output [11].³
Table 1. Multi-label data sets and their characteristics: n indicates numeric variables; b indicates binary variables; LC is label cardinality (average number of relevant labels per example).

Dataset   N     L    D      LC    Type
Music     593   6    12n    1.87  audio
Scene     2407  6    294n   1.07  image
Yeast     2417  14   103n   4.24  biology
Genbase   661   27   1185b  1.25  biology
Medical   978   45   1449b  1.25  text
Enron     1702  53   1001b  3.38  text
Reuters   6000  103  500n   1.46  text
We carry out 5-fold cross validation (CV). Results for predictive performance are displayed in Table 2. As a performance measure we have used the exact match score (inversely equivalent to subset zero-one loss),

$$\text{EXACT MATCH} = \frac{1}{N} \sum_{i=1}^{N} I\big(\mathbf{y}^{(i)} = \mathbf{y}^{*(i)}\big),$$

where $I(\cdot)$ is an indicator function (returning 1 iff the logical condition is fulfilled and zero otherwise), as this is the loss function minimized by the MAP estimator [5]. Results under other measures of evaluation can be seen in [13]. Note that, since PCC is only tractable on data sets where $L < 10$, we can only provide results for the first two data sets, with DNF (Did Not Finish) in Table 2 indicating this fact.
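[Editor's note: for concreteness, a one-line computation of this score; `Y_true` and `Y_pred` are assumed to be N x L binary matrices.]

```python
import numpy as np

def exact_match(Y_true, Y_pred):
    """Fraction of instances whose entire label vector is predicted exactly."""
    return float(np.mean(np.all(Y_true == Y_pred, axis=1)))
```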
²Better results can be obtained by increasing $T'$ at the cost of more running time; however, even $T' = 10$ proves enough to improve the predictive performance under our method.

³All methods are implemented and will be made available within the MEKA framework (http://meka.sourceforge.net).

Table 2. Average exact match over 5-fold CV.

Dataset    BR     CC     PCC    CDN    MCC    M2CC
Music      0.299  0.287  0.346  0.297  0.346  0.361
Scene      0.538  0.545  0.636  0.531  0.636  0.657
Yeast      0.140  0.151  DNF    0.069  0.209  0.206
Genbase    0.941  0.964  DNF    0.945  0.964  0.967
Medical    0.585  0.622  DNF    0.602  0.629  0.627
Enron      0.065  0.099  DNF    0.073  0.101  0.103
Reuters    0.287  0.346  DNF    0.271  0.366  0.364
avg. rank  4.57   3.43   -      4.71   1.57   1.43
Table 3. Average running time (seconds) over 5-fold CV.

Dataset  BR   CC   PCC  CDN    MCC   M2CC
Music    0    0    0    5      1     4
Scene    12   10   15   92     25    170
Yeast    10   10   DNF  88     32    222
Genbase  10   7    DNF  572    201   382
Medical  9    10   DNF  1546   338   506
Enron    102  91   DNF  3091   706   1399
Reuters  106  119  DNF  14734  1831  20593
Table 4. Average exact match of the ensemble methods over 5-fold CV (rank in parentheses).

Dataset    ECC        EM2CC
Music      0.314 (2)  0.329 (1)
Scene      0.608 (2)  0.633 (1)
Yeast      0.186 (2)  0.193 (1)
Genbase    0.945 (1)  0.945 (1)
Medical    0.643 (2)  0.649 (1)
Enron      0.112 (2)  0.116 (1)
Reuters    0.364 (1)  0.360 (2)
avg. rank  1.71       1.14
Results for running time are also given in Table 3. Furthermore, the original CC paper [4] also presented CC in bagging ensembles (ECC) to improve predictive performance. We also bag M2CC to create the ensemble method EM2CC. We use 10 models for each ensemble, each one starting with a different random initialization of the chain sequence ($\mathbf{s}_0$). Results for the predictive performance of ECC vs. EM2CC are given in Table 4.
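[Editor's note: a sketch of the bagging construction, assuming a hypothetical single-model training routine `train_m2cc`; the per-label majority vote shown here is one plausible combination rule, not necessarily the one used in the paper.]

```python
import numpy as np

def train_em2cc(X, Y, n_models=10, rng=np.random.default_rng(0)):
    """Bag n_models M2CC members, each trained on a bootstrap resample
    and started from a different random chain sequence s_0."""
    members = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap resample
        members.append(train_m2cc(X[idx], Y[idx],
                                  seed=int(rng.integers(1 << 30))))
    return members

def predict_em2cc(members, x, predict_one):
    """Combine member predictions by per-label majority vote."""
    votes = np.array([predict_one(m, x) for m in members])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```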
As claimed in the literature, CC improves over BR in all cases. PCC in turn improves on CC in the two cases where it is tractable. The MCC methods perform the best overall. Both of them outperform CC on every occasion, with the exception of ties on Genbase. We note that MCC provides identical results to PCC on both data sets that it finishes on. M2CC obtains even higher performance than PCC on these data sets, underlining the importance of the chain sequence in constructing classifier chains, and the fact that we have been able to leverage this to create a better model. As expected, M2CC also outperforms MCC in most cases, and overall, precisely because it optimises the chain-sequence space, improving the sequence of labels at training time.

Clearly MCC and M2CC take much longer than the standard greedy CC method, but they are still tractable on all the data sets we looked at (unlike PCC), and the improvement in predictive performance is well worth the trade-off. Furthermore, we note that our methods are generally faster than the conditional dependency network CDN (with the exception of M2CC on some data sets).

Finally, we note that, although ECC is able to offer an improvement over CC (particularly on Yeast, Medical and Enron), EM2CC still maintains a clear advantage over ECC overall. We also notice that, while a bagging ensemble can raise the accuracy of CC, even this additional accuracy does not always compete well with a single MCC or M2CC model (comparing Tables 2 and 4).
5. CONCLUSIONS AND FUTURE WORK

We have introduced two novel efficient Monte Carlo (MC) algorithms (MCC and M2CC) for multi-label learning using classifier chains. The proposed approaches use MC techniques to efficiently search the label-path space at inference time and, in the case of M2CC, also the chain-sequence space at training time. We show through an empirical evaluation that using these methods results in better predictive performance than related methods while remaining computationally tractable. In future work, we intend to look at more advanced random search algorithms and dependency structures other than chain models, as well as different payoff functions. We also plan to extend this work to multi-valued target attributes and hierarchical MLC problems.
6. RELATION TO PRIOR WORK

This work builds on the classifier chains (CC) framework for multi-label classification (MLC) [4] and its recent probabilistic extension, probabilistic classifier chains (PCC) [5]. More specifically, since the Bayes-optimal approach proposed by PCC is infeasible in practice due to its computational cost, we propose a tractable inference scheme, based on Monte Carlo (MC) methods, which attains a similar performance to PCC. Furthermore, we also introduce an MC approach for the optimization of the chain of classifiers during the training stage, an issue that has not been tackled before as far as we know, except by avoiding it altogether (e.g., by using conditional dependency networks [6]). Finally, ensemble versions of the two MC approaches proposed have been developed, following the line of ECC and EPCC [4, 5].

7. REFERENCES

[1] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Mining multi-label data," in Data Mining and Knowledge Discovery Handbook, O. Maimon and L. Rokach, Eds., 2nd edition, Springer, 2010.

[2] Jesse Read, Scalable Multi-label Classification, Ph.D. thesis, University of Waikato, 2010.

[3] Grigorios Tsoumakas and Ioannis Vlahavas, "Random k-labelsets: An ensemble method for multilabel classification," in ECML '07: 18th European Conference on Machine Learning, Springer, 2007, pp. 406–417.

[4] Jesse Read, Bernhard Pfahringer, Geoffrey Holmes, and Eibe Frank, "Classifier chains for multi-label classification," Machine Learning, 2011.

[5] Weiwei Cheng, Krzysztof Dembczyński, and Eyke Hüllermeier, "Bayes optimal multilabel classification via probabilistic classifier chains," in ICML '10: 27th International Conference on Machine Learning, Haifa, Israel, June 2010, Omnipress.

[6] Yuhong Guo and Suicheng Gu, "Multi-label classification using conditional dependency networks," in IJCAI '11: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, IJCAI/AAAI, 2011, pp. 1300–1305.

[7] Grigorios Tsoumakas and Ioannis Katakis, "Multi-label classification: An overview," International Journal of Data Warehousing and Mining, vol. 3, no. 3, pp. 1–13, 2007.

[8] Jesse Read, Bernhard Pfahringer, and Geoff Holmes, "Multi-label classification using ensembles of pruned sets," in ICDM '08: Eighth IEEE International Conference on Data Mining, IEEE, 2008, pp. 995–1000.

[9] Julio H. Zaragoza, Luis Enrique Sucar, Eduardo F. Morales, Concha Bielza, and Pedro Larrañaga, "Bayesian chain classifiers for multidimensional classification," in IJCAI '11: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, 2011.

[10] Gjorgji Madjarov, Dragi Kocev, Dejan Gjorgjevikj, and Sašo Džeroski, "An extensive experimental comparison of methods for multi-label learning," Pattern Recognition, vol. 45, no. 9, pp. 3084–3104, Sept. 2012.

[11] Trevor Hastie and Robert Tibshirani, "Classification by pairwise coupling," in Advances in Neural Information Processing Systems, vol. 10, Michael I. Jordan, Michael J. Kearns, and Sara A. Solla, Eds., MIT Press, 1998.

[12] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten, "The WEKA data mining software: An update," SIGKDD Explorations, vol. 11, no. 1, 2009.

[13] Jesse Read, Luca Martino, and David Luengo, "Efficient Monte Carlo optimization for multi-label classifier chains," Tech. Rep., Universidad Carlos III de Madrid, Dec. 2012, arXiv:1211.2190.