What are the common concepts for the construction of classifier ensembles?

Among MCS approaches based on iterative and independent variations of a base classifier, bagging and boosting are perhaps the most widely used concepts for the construction of classifier ensembles.

What is the main reason why the classifier is not stable?

The instability of the base classifier, i.e., a small change in the training samples leads to varying classification results, is an important requirement for a bagging ensemble.

What is the reason for the large surplus in accuracy achieved by the SVM ensemble approach?

One reason for this might be that, in the case of small suboptimal training sample sets, the SVM classifier is affected by the curse of dimensionality, even though SVMs usually perform well in high-dimensional feature space and with small training sets.

How many pixels were considered in the experiment?

The scene consists of 145 × 145 pixels, and 14 land cover classes were considered in their experiments, ranging from 54 to 2466 pixels in size.

What is the way to use the SVM ensemble strategy?

Particularly for small training sample sets, the presented SVM ensemble strategy by RFS constitutes a feasible approach and useful modification of the regular SVM.

Why does the use of SVM ensembles yield better results than regular SVM?

Due to the fact that even values outside these ranges yield results superior to those from regular SVM and the relatively small values for ensemble size and feature subset size, the use of SVM ensembles appears worthwhile, and efficient implementation strategies should be investigated.

(Open Access) Sensitivity of Support Vector Machines to Random Feature Selection in Classification of Hyperspectral Data (2010) | Björn Waske

Q: What are the contributions in "Sensitivity of support vector machines to random feature selection in classification of hyperspectral data" ?

The authors investigated how the number of selected features and the size of the MCS influence classification accuracy using two hyperspectral data sets, from different environmental settings.

2880 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 48, NO. 7, JULY 2010

Sensitivity of Support Vector Machines to Random Feature Selection in Classification of Hyperspectral Data

Björn Waske, Member, IEEE, Sebastian van der Linden,

Jón Atli Benediktsson, Fellow, IEEE, Andreas Rabe, and Patrick Hostert

Abstract—The accuracy of supervised land cover classifications depends on factors such as the chosen classification algorithm, adequate training data, the input data characteristics, and the selection of features. Hyperspectral imaging provides more detailed spectral and spatial information on the land cover than other remote sensing resources. Over the past ten years, traditional and formerly widely accepted statistical classification methods have been superseded by more recent machine learning algorithms, e.g., support vector machines (SVMs), or by multiple classifier systems (MCSs). This can be explained by limitations of statistical approaches with regard to high-dimensional data, multimodal classes, and often limited availability of training data. In the presented study, MCSs based on SVM and random feature selection (RFS) are applied to explore the potential of a synergetic use of the two concepts. We investigated how the number of selected features and the size of the MCS influence classification accuracy using two hyperspectral data sets, from different environmental settings. In addition, experiments were conducted with a varying number of training samples. Accuracies are compared with regular SVM and random forests. Experimental results clearly demonstrate that the generation of an SVM-based classifier system with RFS significantly improves overall classification accuracy as well as producer’s and user’s accuracies. In addition, the ensemble strategy results in smoother, i.e., more realistic, classification maps than those from stand-alone SVM. Findings from the experiments were successfully transferred onto an additional hyperspectral data set.

Index Terms—Classifier ensembles, hyperspectral data, multiple classifier systems (MCSs), random feature selection (RFS), support vector machines (SVMs).

I. INTRODUCTION

REMOTE sensing applications, such as land cover classification, provide a variety of important information

for decision support and environmental monitoring systems.

Manuscript received July 2, 2009; revised November 19, 2009. Date of publication March 29, 2010; date of current version June 23, 2010. This work was supported in part by the Research Fund of the University of Iceland and in part by the Icelandic Research Fund.

B. Waske is with the Institute of Geodesy and Geoinformation, Faculty of Agriculture, University of Bonn, 53115 Bonn, Germany (e-mail: bwaske@uni-bonn.de).

J. A. Benediktsson is with the Faculty of Electrical and Computer Engineering, University of Iceland, 107 Reykjavik, Iceland.

S. van der Linden, A. Rabe, and P. Hostert are with the Geomatics Laboratory, Geography Department, Humboldt-Universität zu Berlin, 10099 Berlin, Germany.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TGRS.2010.2041784

When complex environments are mapped or when very detailed analyses are performed, the spectral and spatial resolution

requirements can be very high, e.g., in urban area mapping,

the characterization of mineral composition, or in plant-type

differentiation [1], [2]. In such situations, airborne hyperspec-

tral sensors are—at the moment—probably the most valuable

single data source. Data from these sensors provide detailed and

spectrally continuous spatial information on the land surface,

ranging from the visible to the short-wave infrared regions

of the electromagnetic spectrum. They enable discriminating

between spectrally similar land cover classes that occur at

highly frequent spatial patterns [3].

However, it is well known that increasing data dimensionality

and high redundancy between features might cause problems

during data analysis, e.g., in the context of supervised classi-

ﬁcation: The overall map accuracy can decrease when only a

limited number of training samples are available [4]. Against

this background, machine learning algorithms such as support

vector machines (SVMs) and concepts like multiple classiﬁer

systems (MCSs) have emerged over the past decade [5]–[9].

SVMs construct an optimal separating hyperplane between

training samples of two classes within the multidimensional

feature space. In linearly nonseparable cases, the data are mapped

using a kernel function into a higher dimensional feature space.

This enables the deﬁnition of a separating hyperplane, which

appears nonlinear in the original feature space. Based on this

so-called kernel trick, SVM can describe complex classes with

multimodal distributions in the feature space. Despite their

relatively good performance when high-dimensional data are

classiﬁed with only small training sets, a sufﬁcient number of

samples should be considered to ensure that adequate train-

ing samples are included during supervised classiﬁcation [7].

Recent studies discuss the use of SVM for spectral–spatial

classiﬁcation of urban hyperspectral data [10]–[12] and multi-

source classiﬁcation [13]–[15] and extend the supervised SVM

techniques by semisupervised concepts [16], [17].

The concept of MCS, on the other hand, does not refer to

a speciﬁc algorithm but to the idea of combining outputs from

more than one classiﬁer to enhance classiﬁcation accuracy [18].

These outputs may result from either a set of different classi-

ﬁcation algorithms or independent variants of the same algo-

rithms (i.e., the so-called base classiﬁer). The latter are achieved

by modifying aspects of the input data, to which the base classi-

ﬁer is sensitive during separate training processes. This includes

the following: the generation of training sample subsets, named

bagging [19], the sequential reweighting of training samples,

boosting [20], and the generation of feature subsets, e.g., by

WASKE et al.: SENSITIVITY OF SVMs TO RANDOM FEATURE SELECTION 2881

random feature selection (RFS) [21]. Afterward, outputs from

the various instances of the same base classiﬁer are combined to

create the final class decision [18]. Many MCS applications employ base classifiers of relatively little computational demand,

usually self-learning decision trees (DTs), which construct

many rather simple decision boundaries that are parallel to the

feature axis. MCSs were recently reviewed in the context of

remote sensing [22] and yield excellent results when dealing

with hyperspectral and multisource data sets [8], [9], [23]–[25].

In summary, the concepts of SVM and MCS are based on

different ideas: While the ﬁrst focuses on optimizing a single

processing step, i.e., the ﬁtting of the presumably optimal

separating hyperplane, the latter relies on an ideally positive influence of a combined decision derived from several suboptimal

yet sometimes computationally simple outputs. Nevertheless,

the two approaches are not exclusive, and it appears desirable

to combine them in a complementary approach. Studies trying

to employ computationally more complex classiﬁers such as

neural networks [26] and SVMs [27] in MCSs exist, but they

are very limited, particularly in the remote sensing context. In

[28], the use of an SVM ensemble for the spectral–contextual

classiﬁcation of a single high-spatial resolution remote sensing

image was shown to increase the accuracy of individual results.

Results in a non-remote-sensing context are more controversial,

though: Whereas bagging and boosting were successfully used

with SVMs in [27], the results of an empirical analysis in [29]

demonstrate that ensembles based on bagging and boosting are

not generally better than a single SVM in terms of accuracy.

Given these discrepancies and the signiﬁcantly higher compu-

tational demand of SVM compared to DT, the idea of an MCS

based on SVM requires well-thought-out and systematic approaches

to develop an SVM ensemble concept that leads to a general

trend of increasing accuracy and to give appropriate guidelines

to achieve efﬁcient processing, i.e., reliable default parameters.

We present an SVM ensemble that uses RFS to generate

independent variants of SVM results. This novel combination

of the two classiﬁer concepts appears particularly useful with

regard to the high dimensionality and redundancy of hyperspec-

tral information. We expect the results of this MCS to show

clearer trends than those reported in previous studies that use

bagging or boosting [27]–[29]. In order to evaluate the potential

of such a concept, we focus on three main research questions.

1) Is there a signiﬁcant increase in accuracy compared to

regular SVM and advanced DT classiﬁcation, when RFS

is performed to construct SVM ensembles?

2) What is the impact of the two parameters, namely, feature

subset size and ensemble size, on the accuracy and stabil-

ity of ensembles in terms of the classiﬁcation result?

3) Is it possible to derive default values or recommendations

for the parameters in (2) in order to make the use of SVM

ensembles with RFS feasible?

To answer these three research questions, the speciﬁc objec-

tive of our study is the classiﬁcation of two different hyper-

spectral data sets, i.e., an urban area from the city of Pavia,

Italy, and a volcanic area from Hekla volcano, Iceland, with

various SVM ensembles. These are generated by systematically

increasing the number of randomly selected features before

SVM classiﬁcation as well as the number of repetitions that are

combined for the ﬁnal class decision. Moreover, the size of the

training sets is varied to investigate the possible inﬂuences of

the number of training samples on the classiﬁcation accuracy.

This paper is organized as follows. Section II introduces the

conceptual and algorithmic framework of SVM and MCS.

The used SVM ensemble strategy is explained in Section III.

The experimental setup and experimental results are presented

in Section IV. Section V discusses results, followed by the

conclusion in Section VI.

II. BACKGROUND

A. SVMs

The SVM is a universal learning machine for solving binary

classiﬁcation problems and can be seen as an approximate

implementation of Vapnik’s Structural Risk Minimisation principle, which has been shown to be superior to the traditional Empirical Risk Minimisation principle [30]. SVMs are able to separate complex (e.g., multimodal) class distributions in high-dimensional feature spaces by using nonlinear kernel functions

and to deal with noise and class confusion via a regularization

technique. A detailed introduction on the general concept of

SVM is given in [31]; an overview in the context of remote

sensing can be found, e.g., in [5] and [6].

In this paper, a Gaussian radial basis function (RBF) kernel

function is used, which is widely accepted in remote sensing

applications. A one-against-one (OAO) strategy is used to

handle multiclass problems with the originally binary SVM.

The OAO class decision is determined by a majority vote using

classwise decisions [6], [7].
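As a rough illustration of the two ingredients named above, the following Python sketch shows a Gaussian RBF kernel and a one-against-one majority vote. It is not the authors' implementation: `toy_binary_decide` and the `prototypes` table are hypothetical stand-ins for trained pairwise SVMs, and the tie-breaking rule is an assumption of this sketch.

```python
import math
from collections import Counter
from itertools import combinations

def rbf_kernel(x, y, g):
    """Gaussian RBF kernel k(x, y) = exp(-g * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * sq_dist)

def oao_vote(x, classes, binary_decide):
    """One-against-one: every class pair casts one vote; the most-voted class wins."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_decide(x, a, b)] += 1
    # Assumption of this sketch: ties go to the class listed first.
    return max(classes, key=lambda c: votes[c])

# Hypothetical stand-in for trained pairwise SVMs: one prototype per class,
# and the pairwise decision picks the class whose prototype is more similar.
prototypes = {1: (0.0, 0.0), 2: (1.0, 1.0), 3: (4.0, 4.0)}

def toy_binary_decide(x, a, b, g=0.5):
    sim_a = rbf_kernel(x, prototypes[a], g)
    sim_b = rbf_kernel(x, prototypes[b], g)
    return a if sim_a >= sim_b else b

label = oao_vote((0.9, 1.2), sorted(prototypes), toy_binary_decide)
```

With n classes, the vote runs over n(n - 1)/2 pairwise decisions, which is why each individual SVM in the later ensemble already contains a voting step of its own.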

B. MCSs

The concept of classiﬁer ensembles or MCSs is based on

the hypothesis that independent classiﬁers generate individual

errors, which are not produced by the majority of the other

classiﬁers within the ensemble. The basic aim in MCS is

therefore to generate a diverse set of classiﬁers, making each

individual classiﬁer as unique as possible [18].

Among MCS approaches based on iterative and independent

variations of a base classiﬁer, bagging and boosting are perhaps

the most widely used concepts for the construction of classifier ensembles. Boosting [20] consecutively modifies the training data

by adaptively weighting the training samples after each individ-

ual iteration. Accurately classiﬁed samples are assigned a lower

weight than those samples classiﬁed incorrectly. Bagging, on

the other hand, randomly generates a set of training sample

subsets. Each sample subset is used to train an individual base

classiﬁer, and the outputs are combined to generate a ﬁnal

class decision. While boosting is performed in series, bagging

[19] can be performed in parallel and may result in lower

computation time.
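The sampling step of bagging can be sketched in a few lines of Python. This is a minimal illustration, assuming bootstrap samples of the same size as the training set; the function name and the seed are choices of this sketch, not taken from [19].

```python
import random

def bootstrap_indices(n_samples, n_bags, seed=0):
    """Draw n_bags bootstrap samples: n_samples indices each, WITH replacement."""
    rng = random.Random(seed)
    return [[rng.randrange(n_samples) for _ in range(n_samples)]
            for _ in range(n_bags)]

# Each inner list selects the training samples for one base classifier.
bags = bootstrap_indices(n_samples=10, n_bags=3)
```

Because the bags are drawn independently of each other, the corresponding base classifiers can be trained in parallel, which is the contrast to boosting drawn in the text.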

The generation of classiﬁer ensembles by RFS—also known

as the random subspace method or attribute bagging—is an-

other method used to train an MCS [21], [32]. Each base

classiﬁer is trained on randomly generated and independently

drawn feature subsets. The diverse outputs are combined to

deﬁne the ﬁnal MCS class decision, often by a majority vote.

In [33], the concept was successfully applied to a set of multi-

temporal SAR images using a DT as a base classiﬁer.
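The feature-drawing step of RFS might look as follows in Python. This is a sketch under stated assumptions: the function name, the seed, and the example values (100 features, subsets of 30, 10 classifiers) are illustrative and not taken from [21] or [32].

```python
import random

def draw_feature_subsets(d, d_sub, z, seed=0):
    """Return z independently drawn feature-index subsets of size d_sub.

    Within one subset the indices are sampled without replacement;
    different subsets are drawn independently, so they may overlap.
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(range(d), d_sub)) for _ in range(z)]

subsets = draw_feature_subsets(d=100, d_sub=30, z=10)
```

Each subset would then be used to restrict the training data of one base classifier to the selected feature columns.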

Breiman [34] introduced random forests (RF) which is a

DT-based classiﬁer concept based on training sample and


feature subsets. Each tree within the ensemble is trained on

bootstraps of the training samples (i.e., the same as in bagging).

In addition, at each internal node of the DT, a randomly

generated feature subset is used.

III. SVM ENSEMBLE STRATEGY

In [29], SVM ensembles were constructed by bagging and

different boosting variants. On average, these ensembles out-

performed a standard SVM in terms of accuracy. Nevertheless,

a general improvement of MCS strategies by incorporating

SVM cannot be stated based on the results in [29], and greater

computational time does not necessarily result in higher clas-

siﬁcation accuracies. On the other hand, increased computa-

tions may sometimes, in fact, reduce the accuracy. In [35],

the stability of SVM in terms of classiﬁcation performance

was investigated following a bagging strategy. The obtained

experimental results underline that bagging does not improve

nor even slightly decrease the classiﬁcation results of SVM.

Thus, Buciu et al. [35] consider SVM as a stable classiﬁer with

regard to bagging. However, the instability of the base classiﬁer,

i.e., a small change in the training samples leads to varying

classification results, is an important requirement for a bagging ensemble. Boosting, on the other hand, is computationally very demanding, because it must be processed in series and performs less efficiently than bagging in the context of SVM [29]. In regard

to these ﬁndings and the fact that ensembles generated by RFS

can outperform bagging and boosting in terms of accuracy [21],

[32], the construction of SVM ensembles by RFS seems more

adequate.

Whereas bagging is based on the selection of training samples, RFS modifies the $d$-dimensional feature space $F$, and each classifier within the ensemble is generated by randomly choosing a feature subspace $F'$ of user-defined size $d'$. Ho [21] recommends the selection of $d' \approx d/2$ features for a DT-based classifier system, where $d$ is the total number of features.

By using RFS for generating an ensemble of size $z$ for solving a classification problem with $n$ classes, each classifier with class decision $Y_k(x) \in \{1, \ldots, n\}$ and $1 \le k \le z$ is trained on a randomly generated feature subset. The feature subsets are usually generated without replacement, and $d'$ features are selected out of the whole feature set of size $d$. As in bagging, the different classifications are often combined to the final class decision $Y(x)$ by a simple majority vote using classwise scores $V_i(x)$:

$$Y(x) = \arg\max_{i \in \{1, \ldots, n\}} V_i(x) \tag{1}$$

$$V_i(x) = \sum_{k=1}^{z} \begin{cases} 1, & \text{if } Y_k(x) = i \\ 0, & \text{otherwise.} \end{cases} \tag{2}$$
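The majority vote in (1) and (2) amounts to counting, per class, how many of the z classifiers chose it and then taking the class with the highest count. A minimal Python sketch follows; the function names are illustrative, and the tie-breaking rule (lowest class label wins) is an assumption of this sketch, since the text does not specify one.

```python
def classwise_scores(decisions, n):
    """For each class i in 1..n, count the classifiers that voted for i, as in (2)."""
    scores = [0] * (n + 1)  # index 0 is unused so that class labels stay 1-based
    for y_k in decisions:
        scores[y_k] += 1
    return scores

def majority_vote(decisions, n):
    """Final class decision: the class with the highest score, as in (1)."""
    scores = classwise_scores(decisions, n)
    # max() returns the first maximum, so ties go to the lowest class label.
    return max(range(1, n + 1), key=lambda i: scores[i])

decisions = [3, 1, 3, 2, 3]  # class decisions of z = 5 classifiers for one pixel
final = majority_vote(decisions, n=4)
```

Here class 3 collects three of the five votes, so it becomes the final class decision for that pixel.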

Our proposed strategy for the MCS is shown in Fig. 1. As a first step after preprocessing, an RFS is performed to generate various feature subspaces of size $d'$ (the feature subset size). Afterward, an individual SVM is applied on each feature subset, providing an individual classification result. These steps are performed $z$ times, either in parallel or in series, with $z$ being the number of classifiers within the ensemble (i.e., ensemble size). The $z$ classification outputs are combined with a majority vote, according to (1). The proposed fusion strategy has been compared to alternative approaches, e.g., majority vote on binary decision and class probability values, which, in no case, further improved the results. We believe that a detailed analysis of more sophisticated fusion strategies may be worthwhile. It requires a separate analysis that goes beyond the scope of this paper, however.

Fig. 1. Schematic diagram of the RFS using $z$ iterations and selecting $d'$ out of $d$ available features. It should be noted that the classification of each individual SVM already requires a voting strategy (Section II-A). This is not shown in the diagram, and the individual outputs refer to a land cover map containing class labels between 1 and c.
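The overall strategy (draw a feature subset, train a base classifier on it, repeat z times, and fuse by majority vote) can be sketched end to end in Python. To keep the example self-contained, a nearest-centroid classifier is swapped in for the SVM; it is only a placeholder for the base classifier and does not reproduce the behavior of an SVM. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def train_centroids(X, y, features):
    """Stand-in base classifier (NOT an SVM): class means on the chosen features."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def predict_centroids(model, row, features):
    """Assign the class whose centroid is closest on the chosen features."""
    def sq_dist(center):
        return sum((row[f] - m) ** 2 for f, m in zip(features, center))
    return min(sorted(model), key=lambda c: sq_dist(model[c]))

def train_rfs_ensemble(X, y, d_sub, z, seed=0):
    """Train z base classifiers, each on its own random feature subset."""
    rng = random.Random(seed)
    d = len(X[0])
    ensemble = []
    for _ in range(z):
        features = rng.sample(range(d), d_sub)  # without replacement
        ensemble.append((features, train_centroids(X, y, features)))
    return ensemble

def predict_rfs_ensemble(ensemble, row):
    """Fuse the z individual decisions by a simple majority vote."""
    votes = Counter(predict_centroids(m, row, f) for f, m in ensemble)
    top = max(votes.values())
    return min(c for c, v in votes.items() if v == top)  # tie: lowest label

# Toy data: four samples, four features, two classes.
X = [[0.0, 0.1, 0.2, 0.0], [0.2, 0.0, 0.1, 0.1],
     [1.0, 0.9, 1.1, 1.0], [0.9, 1.0, 1.0, 1.2]]
y = [1, 1, 2, 2]
ensemble = train_rfs_ensemble(X, y, d_sub=2, z=5)
pred = predict_rfs_ensemble(ensemble, [0.95, 1.0, 0.9, 1.1])
```

The loop body has no dependence between iterations, so the z base classifiers could equally be trained in parallel, as noted in the text.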

IV. EXPERIMENTAL SETUP AND RESULTS

A. Experimental Setup

Two hyperspectral data sets from study sites with different environmental settings were used in this study. Both classification problems are set up in ways that require hyperspectral

information for an appropriate description of target surface

types, i.e., lava types from different mineral composition and

age as well as urban surfaces. The latter study also requires a

very high spatial resolution in order to avoid a high fraction of

mixed pixels.

The ﬁrst study site lies around the central-volcano Hekla, one

of the most active volcanoes in Iceland (Fig. 2). The image was

collected by Airborne Visible/Infrared Imaging Spectrometer

(AVIRIS) on a cloud-free day (June 17, 1991). AVIRIS operates

from the visible to the short-wave infrared regions of the

electromagnetic spectrum, ranging from 0.4 to 2.4 μm. Due to a

malfunction, spectrometer 4 operating in the wavelength range

from 1.84 to 2.4 μm was working incorrectly. Thus, the bands

from spectrometer 4 were deleted from the image data along

with the ﬁrst channels of each of the three spectrometers, which

contained noise. Finally, 157 data channels were left. The image

strip is 2048 × 614 pixels, with a spatial resolution of 20 m

[36]. The classification aims at 22 land cover classes, mainly

lava ﬂows from different eruptions and older hyaloclastites.

The second data set was acquired by the ROSIS-3 (Reflective Optics System Imaging Spectrometer) sensor over the city of Pavia, Italy (Fig. 3). The 115 bands of the sensor cover the 0.43–0.86-μm range of the electromagnetic spectrum. The spatial resolution of the data set is 1.3 m per pixel. The image strip is 610 × 340 pixels, surrounding the University of Pavia. Some bands have been removed due to noise; the remaining 103 channels have been used in the classification. The classification aims at nine land cover classes.

Fig. 2. AVIRIS data, Hekla, Iceland. False-color composite and corresponding ground truth areas representing 22 land cover classes.

Fig. 3. ROSIS-3 data, Pavia, Italy. False-color composite and corresponding ground truth areas representing nine land cover classes.

For both data sets, ground truth information was used for

generating training and validation sample sets, using expert

knowledge in image interpretation. An equalized random sam-

pling was performed, guaranteeing that all classes are equally

included in the sample set. To investigate the possible inﬂuence

of the number of training samples on the performance of the

proposed method, training sets with different size were gener-

ated, containing 25, 50, 100, and 200 training samples per class,

respectively (from now on referred to as tr#25, tr#50, ...). For

each data set, an independent test set was available, containing

14 966 and 40 002 samples, respectively. Several experiments

were conducted to investigate the sensitivity of SVM classiﬁers

to RFS. Diverse SVM ensembles were generated for the two data sets using the following: 1) feature subsets of different sizes (10%, 20%, ..., 90% of all available features) and 2) different numbers of classifiers within the ensemble (10, 25, and 50). Aside from this, a standard SVM, which is based on the whole data set, and an RF classifier [34] were applied to the images using the same training sets.

TABLE I
DATA SET 1—HEKLA. OVERALL ACCURACY (IN PERCENT) USING DIFFERENT METHODS AND NUMBER OF TRAINING SAMPLES. THE SVM ENSEMBLES ARE BASED ON THE RANDOM SELECTION OF 30% OF ALL FEATURES. ∗ INDICATES A SIGNIFICANT DIFFERENCE (α = 0.01) IN COMPARISON TO THE REGULAR SVM

The training and classification were performed using imageSVM [37], which is a freely available IDL/ENVI implementation. ImageSVM is based on the LIBSVM approach by

Chang and Lin [38] for the training of the SVM. A Gaussian

RBF kernel was used, and a regularization parameter C and

a kernel parameter g are determined by a grid search using

a threefold cross validation. Training and grid search are per-

formed for each SVM classiﬁer in the ensemble, i.e., in the

case of an ensemble size of 50, individual SVMs are trained on

50 different feature subsets with each training including its own

grid search.
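The parameter selection described above can be sketched generically in Python. This is not the imageSVM or LIBSVM procedure: the grid values are hypothetical (the text does not list them), the folds are assigned round-robin rather than randomly, and `cv_accuracy` is a hypothetical callback standing in for "train an SVM with (C, g) on the training folds and score it on the held-out fold".

```python
from itertools import product

def kfold_indices(n_samples, k):
    """Split sample indices 0..n_samples-1 into k folds (round-robin assignment)."""
    return [list(range(i, n_samples, k)) for i in range(k)]

def grid_search(n_samples, c_grid, g_grid, cv_accuracy, k=3):
    """Return the (C, g) pair with the best mean accuracy over k folds."""
    folds = kfold_indices(n_samples, k)
    best_params, best_score = None, -1.0
    for C, g in product(c_grid, g_grid):
        scores = []
        for i, test_idx in enumerate(folds):
            # All folds except fold i form the training part.
            train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            scores.append(cv_accuracy(C, g, train_idx, test_idx))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = (C, g), mean_score
    return best_params, best_score

# Dummy callback for demonstration only: its score peaks at C = 10, g = 0.1.
def dummy_accuracy(C, g, train_idx, test_idx):
    return 1.0 / (1.0 + abs(C - 10) + abs(g - 0.1))

best, score = grid_search(30, [1, 10, 100], [0.01, 0.1, 1.0], dummy_accuracy)
```

Since this search is repeated for every ensemble member, an ensemble of size 50 multiplies the cost of one grid search by 50, which illustrates why the text stresses the computational demand of SVM ensembles.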

The RF classiﬁcation was performed with WEKA [39] using

100 iterations. First experiments have shown that an increased

number of iterations does not significantly improve the total

accuracy. The number of features at each node was set to

the square root of the number of input features. This value is

considered adequate in the literature [24] and proved reliable in previous studies by the authors [25].

Accuracy assessment was performed, giving overall accu-

racies and confusion matrices that were used to calculate the

producer’s and user’s accuracies. Based on 99% conﬁdence

intervals, a statistical comparison of ensemble-based results and

maps from regular SVM was performed.
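The accuracy measures named above can be computed from a confusion matrix as in the following Python sketch. Two points are assumptions of this sketch rather than statements from the paper: rows are taken as reference classes and columns as mapped classes, and the interval uses the normal approximation for a proportion (z = 2.576 for roughly 99%), since the text does not say how its confidence intervals were derived. The matrix values are made up.

```python
import math

def accuracies(cm):
    """cm[i][j] = number of reference-class-i samples mapped to class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(n)]
    overall = sum(diag) / total
    # Producer's accuracy: correct samples divided by the reference (row) total.
    producers = [diag[i] / sum(cm[i]) for i in range(n)]
    # User's accuracy: correct samples divided by the mapped (column) total.
    users = [diag[j] / sum(cm[i][j] for i in range(n)) for j in range(n)]
    return overall, producers, users

def confidence_interval(p, n_samples, z=2.576):
    """Normal-approximation interval for a proportion (assumed method)."""
    half = z * math.sqrt(p * (1.0 - p) / n_samples)
    return p - half, p + half

# Hypothetical three-class confusion matrix with 150 test samples.
cm = [[45, 5, 0],
      [4, 40, 6],
      [1, 5, 44]]
overall, producers, users = accuracies(cm)
low, high = confidence_interval(overall, sum(map(sum, cm)))
```

Under this approach, two overall accuracies would be called significantly different at α = 0.01 when their 99% intervals do not overlap; with test sets of many thousand samples, as used here, such intervals become narrow.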

B. Results for Data From Hekla, Iceland

The results demonstrate that the SVM ensemble outper-

formed the regular SVM and RF classiﬁer in terms of overall

accuracy for the four training sample sets. The RF achieved an

overall accuracy between 83.3% and 92.7%; a regular SVM,

on the other hand, achieved accuracies between 82.2% and

96.9%. In contrast to this, an SVM ensemble that is based

on RFS achieves overall accuracies between 88.3% and 97.7%

(Table I).

Software available at http://www.hu-geomatics.de.

These results clearly underline that the overall accuracy can be increased by the SVM ensemble. However, a strong influence of the number of training samples on classification

accuracy can be observed: For all methods, accuracies increase

monotonically with sample set size. The rate of this increase is

highest for the regular SVM (14.7% difference between tr#25

and tr#200) and lowest for SVM ensembles with 50 iterations

(8%). SVM ensembles with ten iterations outperform RF and

a regular SVM in terms of accuracy for all training sample

sets. The overall accuracy was further improved by increasing

the number of iterations from 10 to 25. No signiﬁcant further

increase takes place for larger ensembles.

Feature subset size signiﬁcantly affected the overall accuracy

(Fig. 4).

Irrespective of the number of training samples, the use of

only 10% of the features (i.e., 16 bands) was ineffective in

terms of accuracies (e.g., 93.4% accuracy with 50 iterations and

tr#100). In many cases, the accuracies were below the accuracies for the regular SVM. In contrast to this, the use of 20% of the available features increased the accuracy and outperformed a regular SVM classifier in terms of accuracy, or performed at least equally well. The maximum accuracy is achieved by

generating an ensemble with 30% of the features (e.g., 96.6%

with 50 iterations and tr#100). The use of additional features

did not further improve the overall accuracy. In the case of tr#25

and tr#50, the overall accuracy decreases signiﬁcantly when

feature subset size is further increased.
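To make the percentages concrete, the feature subset sizes can be tabulated for both data sets (157 channels for Hekla, 103 for Pavia). The Python sketch below assumes rounding to the nearest integer; the text confirms only that 10% of the Hekla features corresponds to 16 bands, so the remaining values are illustrative.

```python
def subset_size(d, percent):
    """Number of features in a subset holding `percent` % of d features.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return int(d * percent / 100 + 0.5)

percentages = range(10, 100, 10)  # 10%, 20%, ..., 90%
hekla_sizes = [subset_size(157, p) for p in percentages]
pavia_sizes = [subset_size(103, p) for p in percentages]
```

Under this assumption, the 30% setting that yields the maximum accuracy for Hekla corresponds to 47 of the 157 channels.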

With regard to the class accuracies, the proposed strategy

outperformed the other methods in the majority of the cases.

In Fig. 5, the differences between the producer’s and user’s accuracy, achieved by the SVM ensemble and a regular SVM, are

shown. While some classes show almost no differences, such as

the three vegetation-covered Andesite Lava classes (classes 6, 7,

and 8) as well as Firn and Glacier Ice and Snow (classes 21 and

22), the difference tends signiﬁcantly toward the positive in the

majority of the classes, i.e., the ensemble achieves higher class

accuracies. As for the overall accuracy, this effect is reduced

with an increasing number of training samples (Fig. 4). In the

case of tr#50, the class accuracies are often improved by 5%

or even more using the ensemble approach. On the other hand,

the improvement in the case of the training set tr#200 is less

signiﬁcant and usually below 5%. Only the user accuracies for

classes 5 and 14 are improved by more than 5% (Andesite Lava

1991 II and Hyaloclastite Formation III) due to the ensemble

classiﬁer (see Figs. 5a and 5b). The two classiﬁcation maps

achieved by the regular SVM and ensemble appear similar in

many regions (Figs. 6 and 7). Nevertheless, some differences

exist, and the map achieved by the ensemble appears more homogeneous. Classification accuracies achieved by the SVM

ensemble were signiﬁcantly higher than those produced by the

regular SVM classiﬁer with the respective number of train-

ing samples based on a test with a 99% conﬁdence interval

(α =0.01) (Table I).

C. Results for Data From Pavia, Italy

As for the Hekla data set, the results achieved with the SVM

ensemble show higher overall accuracies than those for the

regular SVM and RF for three training sample sets (Table II).

Again, SVM ensembles with ten iterations yield higher

accuracies than the regular SVM. Accuracies for 25 and 50

iterations are even higher but do not show relevant differences.

Fig. 4. Hekla data. Overall accuracy (in percent) achieved by the SVM

ensemble using different number of iterations, input features, and training

samples.

The experimental results again show that SVM ensembles

perform well with small training sample sets: Accuracies

achieved with the regular SVM and tr#25 are 4.5% and 4.9%

below accuracies obtained with the ensemble approach in 25 or

50 iterations. For tr#200, this difference is only 2.3%.

The investigation of the impact of the number of features on

the overall accuracy clearly showed that the use of only 10% of

the features was ineffective and resulted in lower overall accura-

cies than a regular SVM classiﬁer does. The adequate number

of features is 30%, using the training sets tr#100 and tr#200,

and an increased number of features does not improve the overall


References

Random Forests

The Nature of Statistical Learning Theory

Data Mining: Practical Machine Learning Tools and Techniques

Bagging predictors

A Tutorial on Support Vector Machines for Pattern Recognition
