Sensitivity of Support Vector Machines to Random Feature Selection in Classification of Hyperspectral Data


2880 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 48, NO. 7, JULY 2010
Björn Waske, Member, IEEE, Sebastian van der Linden,
Jón Atli Benediktsson, Fellow, IEEE, Andreas Rabe, and Patrick Hostert
Abstract—The accuracy of supervised land cover classifications
depends on factors such as the chosen classification algorithm,
adequate training data, the input data characteristics, and the se-
lection of features. Hyperspectral imaging provides more detailed
spectral and spatial information on the land cover than other
remote sensing resources. Over the past ten years, traditional and
formerly widely accepted statistical classification methods have
been superseded by more recent machine learning algorithms,
e.g., support vector machines (SVMs), or by multiple classifier
systems (MCS). This can be explained by limitations of statistical
approaches with regard to high-dimensional data, multimodal
classes, and often limited availability of training data. In the pre-
sented study, MCSs based on SVM and random feature selection
(RFS) are applied to explore the potential of a synergetic use of
the two concepts. We investigated how the number of selected
features and the size of the MCS influence classification accuracy
using two hyperspectral data sets, from different environmental
settings. In addition, experiments were conducted with a vary-
ing number of training samples. Accuracies are compared with
regular SVM and random forests. Experimental results clearly
demonstrate that the generation of an SVM-based classifier system
with RFS significantly improves overall classification accuracy as
well as producer’s and user’s accuracies. In addition, the ensemble
strategy results in smoother, i.e., more realistic, classification maps
than those from stand-alone SVM. Findings from the experiments
were successfully transferred onto an additional hyperspectral
data set.
Index Terms—Classifier ensembles, hyperspectral data, multi-
ple classifier systems (MCSs), random feature selection (RFS),
support vector machines (SVMs).
Manuscript received July 2, 2009; revised November 19, 2009. Date of
publication March 29, 2010; date of current version June 23, 2010. This work
was supported in part by the Research Fund of the University of Iceland and in
part by the Icelandic Research Fund.
B. Waske is with the Institute of Geodesy and Geoinformation, Faculty of
Agriculture, University of Bonn, 53115 Bonn, Germany (e-mail:
bwaske@uni-bonn.de).
J. A. Benediktsson is with the Faculty of Electrical and Computer Engineering,
University of Iceland, 107 Reykjavik, Iceland.
S. van der Linden, A. Rabe, and P. Hostert are with the Geomatics Laboratory,
Geography Department, Humboldt-Universität zu Berlin, 10099 Berlin,
Germany.
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TGRS.2010.2041784
I. INTRODUCTION
Remote sensing applications, such as land cover classification,
provide a variety of important information
for decision support and environmental monitoring systems.
When complex environments are mapped or when very detailed
analyses are performed, the spectral and spatial resolution
requirements can be very high, e.g., in urban area mapping,
the characterization of mineral composition, or in plant-type
differentiation [1], [2]. In such situations, airborne hyperspec-
tral sensors are—at the moment—probably the most valuable
single data source. Data from these sensors provide detailed and
spectrally continuous spatial information on the land surface,
ranging from the visible to the short-wave infrared regions
of the electromagnetic spectrum. They enable discriminating
between spectrally similar land cover classes that occur at
highly frequent spatial patterns [3].
However, it is well known that increasing data dimensionality
and high redundancy between features might cause problems
during data analysis, e.g., in the context of supervised classi-
fication: The overall map accuracy can decrease when only a
limited number of training samples are available [4]. Against
this background, machine learning algorithms such as support
vector machines (SVMs) and concepts like multiple classifier
systems (MCSs) have emerged over the past decade [5]–[9].
SVMs construct an optimal separating hyperplane between
training samples of two classes within the multidimensional
feature space. In linearly nonseparable cases, the data are mapped
using a kernel function into a higher dimensional feature space.
This enables the definition of a separating hyperplane, which
appears nonlinear in the original feature space. Based on this
so-called kernel trick, SVM can describe complex classes with
multimodal distributions in the feature space. Despite their
relatively good performance when high-dimensional data are
classified with only small training sets, a sufficient number of
samples should be considered to ensure that adequate train-
ing samples are included during supervised classification [7].
Recent studies discuss the use of SVM for spectral–spatial
classification of urban hyperspectral data [10]–[12] and multi-
source classification [13]–[15] and extend the supervised SVM
techniques by semisupervised concepts [16], [17].
The concept of MCS, on the other hand, does not refer to
a specific algorithm but to the idea of combining outputs from
more than one classifier to enhance classification accuracy [18].
These outputs may result from either a set of different classi-
fication algorithms or independent variants of the same algo-
rithms (i.e., the so-called base classifier). The latter are achieved
by modifying aspects of the input data, to which the base classi-
fier is sensitive during separate training processes. This includes
the following: the generation of training sample subsets, named
bagging [19], the sequential reweighting of training samples,
boosting [20], and the generation of feature subsets, e.g., by
random feature selection (RFS) [21]. Afterward, outputs from
the various instances of the same base classifier are combined to
create the final class decision [18]. Many MCS applications employ
base classifiers of relatively little computational demand,
usually self-learning decision trees (DTs), which construct
many rather simple decision boundaries that are parallel to the
feature axis. MCSs were recently reviewed in the context of
remote sensing [22] and yield excellent results when dealing
with hyperspectral and multisource data sets [8], [9], [23]–[25].
In summary, the concepts of SVM and MCS are based on
different ideas: While the first focuses on optimizing a single
processing step, i.e., the fitting of the presumably optimal
separating hyperplane, the latter relies on an ideally positive influence
of a combined decision derived from several suboptimal
yet sometimes computationally simple outputs. Nevertheless,
the two approaches are not exclusive, and it appears desirable
to combine them in a complementary approach. Studies trying
to employ computationally more complex classifiers such as
neural networks [26] and SVMs [27] in MCSs exist, but they
are very limited, particularly in the remote sensing context. In
[28], the use of an SVM ensemble for the spectral–contextual
classification of a single high-spatial resolution remote sensing
image was shown to increase the accuracy of individual results.
Results in a non-remote-sensing context are more controversial,
though: Whereas bagging and boosting were successfully used
with SVMs in [27], the results of an empirical analysis in [29]
demonstrate that ensembles based on bagging and boosting are
not generally better than a single SVM in terms of accuracy.
Given these discrepancies and the significantly higher compu-
tational demand of SVM compared to DT, the idea of an MCS
based on SVM requires well-thought-out and systematic approaches
to develop an SVM ensemble concept that leads to a general
trend of increasing accuracy and to give appropriate guidelines
to achieve efficient processing, i.e., reliable default parameters.
We present an SVM ensemble that uses RFS to generate
independent variants of SVM results. This novel combination
of the two classifier concepts appears particularly useful with
regard to the high dimensionality and redundancy of hyperspec-
tral information. We expect the results of this MCS to show
clearer trends than those reported in previous studies that use
bagging or boosting [27]–[29]. In order to evaluate the potential
of such a concept, we focus on three main research questions.
1) Is there a significant increase in accuracy compared to
regular SVM and advanced DT classification, when RFS
is performed to construct SVM ensembles?
2) What is the impact of the two parameters, namely, feature
subset size and ensemble size, on the accuracy and stabil-
ity of ensembles in terms of the classification result?
3) Is it possible to derive default values or recommendations
for the parameters in (2) in order to make the use of SVM
ensembles with RFS feasible?
To answer these three research questions, the specific objec-
tive of our study is the classification of two different hyper-
spectral data sets, i.e., an urban area from the city of Pavia,
Italy, and a volcanic area from Hekla volcano, Iceland, with
various SVM ensembles. These are generated by systematically
increasing the number of randomly selected features before
SVM classification as well as the number of repetitions that are
combined for the final class decision. Moreover, the size of the
training sets is varied to investigate the possible influences of
the number of training samples on the classification accuracy.
This paper is organized as follows. Section II introduces the
conceptual and algorithmic framework of SVM and MCS.
The used SVM ensemble strategy is explained in Section III.
The experimental setup and experimental results are presented
in Section IV. Section V discusses results, followed by the
conclusion in Section VI.
II. BACKGROUND
A. SVMs
The SVM is a universal learning machine for solving binary
classification problems and can be seen as an approximate
implementation of Vapnik’s Structural Risk Minimisation principle,
which has been shown to be superior to the traditional Empirical
Risk Minimisation principle [30]. SVMs are able to separate
complex (e.g., multimodal) class distributions in high-dimensional
feature spaces by using nonlinear kernel functions
and to deal with noise and class confusion via a regularization
technique. A detailed introduction on the general concept of
SVM is given in [31]; an overview in the context of remote
sensing can be found, e.g., in [5] and [6].
In this paper, a Gaussian radial basis function (RBF) kernel
is used, which is widely accepted in remote sensing
applications. A one-against-one (OAO) strategy is used to
handle multiclass problems with the originally binary SVM.
The OAO class decision is determined by a majority vote using
classwise decisions [6], [7].
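The two building blocks named above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function names `rbf_kernel` and `oao_vote` are hypothetical, the kernel parameter is called `g` to match the notation used later in the paper, and the tie-breaking rule (smallest class label wins) is an assumption.

```python
import math
from collections import Counter
from itertools import combinations


def rbf_kernel(x, y, g):
    """Gaussian RBF kernel k(x, y) = exp(-g * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * sq_dist)


def oao_vote(binary_decisions):
    """Combine one-against-one decisions by majority vote.

    binary_decisions maps each class pair (i, j) to the class that won
    the corresponding binary SVM. Ties are broken in favour of the
    smallest class label (an assumption made for this sketch).
    """
    votes = Counter(binary_decisions.values())
    best = max(votes.values())
    return min(c for c, v in votes.items() if v == best)


classes = [1, 2, 3]
pairs = list(combinations(classes, 2))  # n(n-1)/2 binary problems
decisions = {(1, 2): 2, (1, 3): 3, (2, 3): 2}  # toy binary outcomes
winner = oao_vote(decisions)
```

For n classes, the OAO strategy trains n(n-1)/2 binary SVMs, so the 22-class Hekla problem would involve 231 binary classifiers per multiclass SVM.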
B. MCSs
The concept of classifier ensembles or MCSs is based on
the hypothesis that independent classifiers generate individual
errors, which are not produced by the majority of the other
classifiers within the ensemble. The basic aim in MCS is
therefore to generate a diverse set of classifiers, making each
individual classifier as unique as possible [18].
Among MCS approaches based on iterative and independent
variations of a base classifier, bagging and boosting are perhaps
the most widely used concepts for the construction of classifier ensembles.
Boosting [20] consecutively modifies the training data
by adaptively weighting the training samples after each individ-
ual iteration. Accurately classified samples are assigned a lower
weight than those samples classified incorrectly. Bagging, on
the other hand, randomly generates a set of training sample
subsets. Each sample subset is used to train an individual base
classifier, and the outputs are combined to generate a final
class decision. While boosting is performed in series, bagging
[19] can be performed in parallel and may result in lower
computation time.
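As a rough illustration of why bagging parallelizes easily, the sketch below draws the training sample subsets for all ensemble members up front. It assumes the standard bootstrap formulation (sampling with replacement, same size as the original set); the function names are hypothetical.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement (one bagging replicate)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def bagging_replicates(n_samples, n_classifiers, seed=0):
    """Generate one independent bootstrap replicate per base classifier."""
    rng = random.Random(seed)
    return [bootstrap_indices(n_samples, rng) for _ in range(n_classifiers)]


replicates = bagging_replicates(n_samples=100, n_classifiers=10)
```

Because no replicate depends on the result of another, the base classifiers can be trained concurrently, whereas boosting needs the errors of iteration t before the weights for iteration t + 1 can be set.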
The generation of classifier ensembles by RFS—also known
as the random subspace method or attribute bagging—is an-
other method used to train an MCS [21], [32]. Each base
classifier is trained on randomly generated and independently
drawn feature subsets. The diverse outputs are combined to
define the final MCS class decision, often by a majority vote.
In [33], the concept was successfully applied to a set of multi-
temporal SAR images using a DT as a base classifier.
Breiman [34] introduced random forests (RF) which is a
DT-based classifier concept based on training sample and
feature subsets. Each tree within the ensemble is trained on
bootstraps of the training samples (i.e., the same as in bagging).
In addition, at each internal node of the DT, a randomly
generated feature subset is used.
III. SVM ENSEMBLE STRATEGY
In [29], SVM ensembles were constructed by bagging and
different boosting variants. On average, these ensembles out-
performed a standard SVM in terms of accuracy. Nevertheless,
a general improvement of MCS strategies by incorporating
SVM cannot be stated based on the results in [29], and greater
computational time does not necessarily result in higher classification
accuracies. In fact, increased computation may sometimes
even reduce the accuracy. In [35],
the stability of SVM in terms of classification performance
was investigated following a bagging strategy. The obtained
experimental results underline that bagging does not improve,
or even slightly decreases, the classification results of SVM.
Thus, Buciu et al. [35] consider SVM as a stable classifier with
regard to bagging. However, the instability of the base classifier,
i.e., a small change in the training samples leads to varying
classification results, is an important requirement for a bagging
ensemble. Boosting, on the other hand, is computationally very
demanding, because it must be processed in series, and performs
less efficiently than bagging in the context of SVM [29]. In regard
to these findings and the fact that ensembles generated by RFS
can outperform bagging and boosting in terms of accuracy [21],
[32], the construction of SVM ensembles by RFS seems more
adequate.
Whereas bagging is based on the selection of training samples,
RFS modifies the $d$-dimensional feature space $\mathbb{R}^{d}$, and
each classifier within the ensemble is generated by randomly
choosing a feature subspace $\mathbb{R}^{d'}$ of user-defined size $d'$. Ho [21]
recommends the selection of $d' \approx d/2$ features for a DT-based
classifier system, where $d$ is the total number of features.
By using RFS for generating an ensemble of size $z$ for solving
a classification problem with $n$ classes, each classifier with
class decision $Y_k(x) \in \{1, \ldots, n\}$ and $1 \le k \le z$ is trained
on a randomly generated feature subset. The feature subsets
are usually generated without replacement, and $d'$ features are
selected out of the whole feature set of size $d$. As in bagging,
the different classifications are often combined to the final class
decision $Y(x)$ by a simple majority vote using classwise scores

$$Y(x) = \arg\max_{i \in \{1, \ldots, n\}} S_i(x) \tag{1}$$

$$S_i(x) = \sum_{k=1}^{z} \begin{cases} 1, & \text{if } Y_k(x) = i \\ 0, & \text{otherwise.} \end{cases} \tag{2}$$
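The voting rule in (1) and (2) translates almost directly into code. The following is a minimal sketch for a single pixel; the function names are hypothetical, and the handling of ties (the smallest class label wins) is an assumption, since the text does not specify one.

```python
from collections import Counter


def class_scores(member_decisions, n_classes):
    """S_i(x) from (2): number of ensemble members voting for class i."""
    counts = Counter(member_decisions)
    return {i: counts.get(i, 0) for i in range(1, n_classes + 1)}


def majority_vote(member_decisions, n_classes):
    """Y(x) from (1): the class with the highest score.

    Ties go to the smallest class label; this tie-break is an assumption.
    """
    scores = class_scores(member_decisions, n_classes)
    return max(sorted(scores), key=lambda i: scores[i])


# Decisions Y_k(x) of z = 5 ensemble members for one pixel, n = 3 classes.
decisions = [3, 1, 3, 2, 3]
scores = class_scores(decisions, n_classes=3)
label = majority_vote(decisions, n_classes=3)
```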
Our proposed strategy for the MCS is shown in Fig. 1. As a
first step after preprocessing, an RFS is performed to generate
various feature subspaces of size $d'$ (the feature subset
size). Afterward, an individual SVM is applied to each feature
subset, providing an individual classification result. These steps
are performed $z$ times, either in parallel or in series, with $z$ being
the number of classifiers within the ensemble (i.e., ensemble
size). The $z$ classification outputs are combined with a majority
vote, according to (1). The proposed fusion strategy has been
compared to alternative approaches, e.g., majority vote on
binary decision and class probability values, which, in no case,
further improved the results. We believe that a detailed analysis
of more sophisticated fusion strategies may be worthwhile. It
requires a separate analysis that goes beyond the scope of this
paper, however.
Fig. 1. Schematic diagram of the RFS using $z$ iterations and selecting $d'$ out of
$d$ available features. It should be noted that the classification of each individual
SVM already requires a voting strategy (Section II-A). This is not shown in the
diagram, and the individual outputs refer to a land cover map containing class
labels between 1 and c.
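The strategy described above can be sketched end to end as follows. To keep the example self-contained and runnable with the standard library only, a simple nearest-mean classifier stands in for the SVM; this substitution is an assumption of the sketch and not the authors' base classifier. All names (`NearestMean`, `train_rfs_ensemble`, `predict_rfs_ensemble`) are hypothetical.

```python
import random
from collections import Counter


class NearestMean:
    """Stand-in base classifier (NOT an SVM): picks the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        self.means = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, row):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(row, self.means[c]))
        return min(sorted(self.means), key=sq_dist)


def train_rfs_ensemble(X, y, subset_size, ensemble_size, seed=0):
    """Train ensemble_size classifiers, each on subset_size random features."""
    rng = random.Random(seed)
    d = len(X[0])
    ensemble = []
    for _ in range(ensemble_size):
        # Draw the feature subset without replacement.
        features = sorted(rng.sample(range(d), subset_size))
        X_sub = [[row[j] for j in features] for row in X]
        ensemble.append((features, NearestMean().fit(X_sub, y)))
    return ensemble


def predict_rfs_ensemble(ensemble, row):
    """Majority vote over the members' class labels, as in (1)."""
    votes = Counter(
        clf.predict_one([row[j] for j in features]) for features, clf in ensemble
    )
    best = max(votes.values())
    return min(c for c, v in votes.items() if v == best)


# Toy data: four "bands", two classes.
X = [[0.0, 0.0, 0.0, 0.0], [0.1, 0.2, 0.0, 0.1],
     [1.0, 1.0, 1.0, 1.0], [0.9, 1.1, 1.0, 0.8]]
y = [1, 1, 2, 2]
ensemble = train_rfs_ensemble(X, y, subset_size=2, ensemble_size=5)
```

Each member stores the indices of its own feature subset, so the same subset is applied at prediction time; swapping in a real SVM would only require replacing the `NearestMean` class.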
IV. EXPERIMENTAL SETUP AND RESULTS
A. Experimental Setup
Two hyperspectral data sets from study sites with different
environmental settings were used in this study. Both classification
problems are set up in ways that require hyperspectral
information for an appropriate description of target surface
types, i.e., lava types of different mineral composition and
age as well as urban surfaces. The latter study also requires a
very high spatial resolution in order to avoid a high fraction of
mixed pixels.
The first study site lies around the central-volcano Hekla, one
of the most active volcanoes in Iceland (Fig. 2). The image was
collected by Airborne Visible/Infrared Imaging Spectrometer
(AVIRIS) on a cloud-free day (June 17, 1991). AVIRIS operates
from the visible to the short-wave infrared regions of the
electromagnetic spectrum, ranging from 0.4 to 2.4 μm. Due to a
malfunction, spectrometer 4 operating in the wavelength range
from 1.84 to 2.4 μm was working incorrectly. Thus, the bands
from spectrometer 4 were deleted from the image data along
with the first channels of each of the three spectrometers, which
contained noise. Finally, 157 data channels were left. The image
strip is 2048 × 614 pixels, with a spatial resolution of 20 m
[36]. The classification aims at 22 land cover classes, mainly
lava flows from different eruptions and older hyaloclastites.
The second data set was acquired by the ROSIS-3 (Reflective
Optics System Imaging Spectrometer) sensor over the city
of Pavia, Italy (Fig. 3). The 115 bands of the sensor cover
the 0.43–0.86-μm range of the electromagnetic spectrum. The
spatial resolution of the data set is 1.3 m per pixel. The image
strip is 610 × 340 pixels, surrounding the University of Pavia.
Some bands have been removed due to noise; the remaining 103
channels have been used in the classification. The classification
aims at nine land cover classes.
Fig. 2. AVIRIS data, Hekla, Iceland. False-color composite and corresponding
ground truth areas representing 22 land cover classes.
Fig. 3. ROSIS-3 data, Pavia, Italy. False-color composite and corresponding
ground truth areas representing nine land cover classes.
For both data sets, ground truth information was used for
generating training and validation sample sets, using expert
knowledge in image interpretation. An equalized random sam-
pling was performed, guaranteeing that all classes are equally
included in the sample set. To investigate the possible influence
of the number of training samples on the performance of the
proposed method, training sets of different sizes were generated,
containing 25, 50, 100, and 200 training samples per class,
respectively (from now on referred to as tr#25, tr#50, ...). For
each data set, an independent test set was available, containing
14 966 and 40 002 samples, respectively. Several experiments
were conducted to investigate the sensitivity of SVM classifiers
to RFS. Diverse SVM ensembles were generated for the two
data sets using the following: 1) feature subsets of differ-
ent sizes (10%, 20%,...,90% of all available features) and
TAB LE I
D
ATA SET 1—HEKLA.OVERALL ACCURACY (IN PERCENT)USING
DIFFERENT METHODS AND NUMBER OF TRAINING SAMPLES.THE
SVM ENSEMBLES ARE BASED ON THE RANDOMLY SELECTION OF
30% OF ALL FEATURES.
INDICATES A SIGNIFICANT DIFFERENCE
(α =0.01) IN COMPARISON TO THE REGULAR SVM
2) different numbers of classifiers within the ensemble (10, 25,
and 50). Aside from this, a standard SVM, which is based on
the whole data set, and an RF classifier [34] were applied to the
images using the same training sets.
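The parameter combinations described in this setup can be enumerated as in the sketch below. Rounding the percentage to the nearest whole number of bands is an assumption; it reproduces the 16 bands that the text reports for 10% of the 157 Hekla channels. The function and variable names are hypothetical.

```python
def subset_sizes(n_features, percentages):
    """Number of features drawn for each percentage of the full feature set."""
    return [round(p * n_features / 100) for p in percentages]


percentages = list(range(10, 100, 10))   # 10%, 20%, ..., 90%
ensemble_sizes = [10, 25, 50]            # classifiers per ensemble
training_sizes = [25, 50, 100, 200]      # training samples per class

hekla_subsets = subset_sizes(157, percentages)   # 157 AVIRIS channels
pavia_subsets = subset_sizes(103, percentages)   # 103 ROSIS-3 channels

# One experiment per (feature subset size, ensemble size, training set size).
hekla_runs = [
    (d_sub, z, tr)
    for d_sub in hekla_subsets
    for z in ensemble_sizes
    for tr in training_sizes
]
```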
The training and classification were performed using imageSVM
[37], which is a freely available IDL/ENVI implementation.
ImageSVM¹ is based on the LIBSVM approach by
Chang and Lin [38] for the training of the SVM. A Gaussian
RBF kernel was used, and the regularization parameter C and
the kernel parameter g were determined by a grid search using
a threefold cross validation. Training and grid search are performed
for each SVM classifier in the ensemble, i.e., in the
case of an ensemble size of 50, individual SVMs are trained on
50 different feature subsets with each training including its own
grid search.
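Because every ensemble member runs its own grid search, the number of binary SVM trainings grows quickly. The sketch below gives a rough count under stated assumptions: each (C, g) grid point is evaluated with full cross validation, one final training follows with the selected pair, and each multiclass SVM consists of n(n-1)/2 OAO problems. The grid size of 25 points used in the example is purely hypothetical, since the text does not state the grid that was searched.

```python
def binary_svm_fits(n_classes, ensemble_size, grid_points, folds=3):
    """Rough count of binary SVM trainings for one RFS ensemble.

    Assumes every (C, g) grid point is evaluated with `folds`-fold cross
    validation, followed by one final training with the best pair.
    """
    oao_pairs = n_classes * (n_classes - 1) // 2     # one-against-one problems
    per_member = oao_pairs * (grid_points * folds + 1)
    return ensemble_size * per_member


# Hekla: 22 classes, ensemble size 50, hypothetical 5 x 5 grid.
hekla_fits = binary_svm_fits(n_classes=22, ensemble_size=50, grid_points=25)
# Pavia: 9 classes, same hypothetical grid.
pavia_fits = binary_svm_fits(n_classes=9, ensemble_size=50, grid_points=25)
```

The count scales linearly with ensemble size, which is one reason the paper's question about sensible default values for ensemble and feature subset size matters in practice.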
The RF classification was performed with WEKA [39] using
100 iterations. First experiments have shown that an increased
number of iterations does not significantly improve the total
accuracy. The number of features at each node was set to
the square root of the number of input features. This value is
considered adequate in the literature [24] and proved reliable in
previous studies by the authors [25].
Accuracy assessment was performed, giving overall accu-
racies and confusion matrices that were used to calculate the
producer’s and user’s accuracies. Based on 99% confidence
intervals, a statistical comparison of ensemble-based results and
maps from regular SVM was performed.
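The accuracy measures named above follow from the confusion matrix as sketched below. The row/column convention (rows are reference classes, columns are map classes) and the normal-approximation interval are assumptions of this sketch; the text does not spell out how the 99% confidence intervals were computed. The toy matrix is invented for illustration only.

```python
import math


def accuracies(confusion):
    """confusion[i][j]: samples of reference class i assigned to map class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    overall = sum(confusion[i][i] for i in range(n)) / total
    # Producer's accuracy: correct samples divided by the reference total.
    producers = [confusion[i][i] / sum(confusion[i]) for i in range(n)]
    # User's accuracy: correct samples divided by the mapped total.
    users = [
        confusion[j][j] / sum(confusion[i][j] for i in range(n)) for j in range(n)
    ]
    return overall, producers, users


def confidence_interval(p, n_samples, z=2.576):
    """Normal-approximation interval for an accuracy p; z = 2.576 gives 99%."""
    half_width = z * math.sqrt(p * (1 - p) / n_samples)
    return p - half_width, p + half_width


toy_confusion = [[45, 5], [10, 40]]   # invented two-class example
overall, producers, users = accuracies(toy_confusion)
low, high = confidence_interval(overall, n_samples=100)
```

Under this approximation, two overall accuracies can be regarded as significantly different when their intervals do not overlap, which is one simple way to read a comparison "based on 99% confidence intervals".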
B. Results for Data From Hekla, Iceland
The results demonstrate that the SVM ensemble outper-
formed the regular SVM and RF classifier in terms of overall
accuracy for the four training sample sets. The RF achieved an
overall accuracy between 83.3% and 92.7%; a regular SVM,
on the other hand, achieved accuracies between 82.2% and
96.9%. In contrast to this, an SVM ensemble that is based
on RFS achieves overall accuracies between 88.3% and 97.7%
(Table I).
These results clearly underline that the overall accuracy
can be increased by the SVM ensemble. However, a strong
influence of the number of training samples on classification
accuracy can be observed: For all methods, accuracies increase
monotonically with sample set size. The rate of this increase is
highest for the regular SVM (14.7% difference between tr#25
and tr#200) and lowest for SVM ensembles with 50 iterations
(8%). SVM ensembles with ten iterations outperform RF and
a regular SVM in terms of accuracy for all training sample
sets. The overall accuracy was further improved by increasing
the number of iterations from 10 to 25. No significant further
increase takes place for larger ensembles.
¹ Software available at http://www.hu-geomatics.de.
Feature subset size significantly affected the overall accuracy
(Fig. 4).
Irrespective of the number of training samples, the use of
only 10% of the features (i.e., 16 bands) was ineffective in
terms of accuracies (e.g., 93.4% accuracy with 50 iterations and
tr#100). In many cases, the accuracies were below the accuracies
for the regular SVM. In contrast to this, the use of 20% of
the available features increased the accuracy and outperformed
a regular SVM classifier in terms of accuracy, or performed
at least equally well. The maximum accuracy is achieved by
generating an ensemble with 30% of the features (e.g., 96.6%
with 50 iterations and tr#100). The use of additional features
did not further improve the overall accuracy. In the case of tr#25
and tr#50, the overall accuracy decreases significantly when
feature subset size is further increased.
With regard to the class accuracies, the proposed strategy
outperformed the other methods in the majority of the cases.
In Fig. 5, the differences between the producer’s and user’s
accuracies, achieved by the SVM ensemble and a regular SVM, are
shown. While some classes show almost no differences, such as
the three vegetation-covered Andesite Lava classes (classes 6, 7,
and 8) as well as Firn and Glacier Ice and Snow (classes 21 and
22), the difference tends significantly toward the positive in the
majority of the classes, i.e., the ensemble achieves higher class
accuracies. As for the overall accuracy, this effect is reduced
with an increasing number of training samples (Fig. 4). In the
case of tr#50, the class accuracies are often improved by 5%
or even more using the ensemble approach. On the other hand,
the improvement in the case of the training set tr#200 is less
significant and usually below 5%. Only the user’s accuracies for
classes 5 and 14 are improved by more than 5% (Andesite Lava
1991 II and Hyaloclastite Formation III) due to the ensemble
classifier (see Figs. 5a and 5b). The two classification maps
achieved by the regular SVM and ensemble appear similar in
many regions (Figs. 6 and 7). Nevertheless, some differences
exist, and the map achieved by the ensemble appears more
homogeneous. Classification accuracies achieved by the SVM
ensemble were significantly higher than those produced by the
regular SVM classifier with the respective number of training
samples based on a test with a 99% confidence interval
(α = 0.01) (Table I).
C. Results for Data From Pavia, Italy
As for the Hekla data set, the results achieved with the SVM
ensemble show higher overall accuracies than those for the
regular SVM and RF for three training sample sets (Table II).
Again, SVM ensembles with ten iterations yield higher
accuracies than the regular SVM. Accuracies for 25 and 50
iterations are even higher but do not show relevant differences.
Fig. 4. Hekla data. Overall accuracy (in percent) achieved by the SVM
ensemble using different number of iterations, input features, and training
samples.
The experimental results again show that SVM ensembles
perform well with small training sample sets: Accuracies
achieved with the regular SVM and tr#25 are 4.5% and 4.9%
below accuracies obtained with the ensemble approach in 25 or
50 iterations. For tr#200, this difference is only 2.3%.
The investigation of the impact of the number of features on
the overall accuracy clearly showed that the use of only 10% of
the features was ineffective and resulted in lower overall accura-
cies than a regular SVM classifier does. The adequate number
of features is 30%, using the training sets tr#100 and tr#200,
and an increased number of features do not improve the overall

References
Random Forests
The Nature of Statistical Learning Theory
Data Mining: Practical Machine Learning Tools and Techniques
Leo Breiman, Bagging predictors
A Tutorial on Support Vector Machines for Pattern Recognition