A Two Level Approach for Scene Recognition
Le Lu, Computer Science Department, Johns Hopkins University, Baltimore, MD 21218
Kentaro Toyama, Microsoft Research, One Microsoft Way, Redmond, WA 98052
Gregory D. Hager, Computer Science Department, Johns Hopkins University, Baltimore, MD 21218
Abstract
Classifying pictures into one of several semantic categories is a classical image understanding problem. In this paper, we present a stratified approach to both binary (outdoor-indoor) and multiple-category scene classification. We first learn mixture models for 20 basic classes of local image content based on color and texture information. Once trained, these models are applied to a test image and produce 20 probability density response maps (PDRMs) indicating the likelihood that each image region was produced by each class. We then extract some very simple features from those PDRMs and use them to train a bagged LDA classifier for 10 scene categories. For this process, no explicit region segmentation or spatial context model is computed.
To test this classification system, we created a labeled database of 1500 photos taken under very different environment and lighting conditions, using different cameras, and from 43 persons over 5 years. The classification rate of outdoor-indoor classification is 93.8%, and the classification rate for 10 scene categories is 90.1%. As a byproduct, local image patches can be contextually labeled into the 20 basic material classes by using Loopy Belief Propagation [33] as an anisotropic filter on PDRMs, producing an image-level segmentation if desired.
1 Introduction
Classifying pictures into semantic types of scenes [24, 26, 22] is a classical image understanding problem which requires the effective interaction of high level semantic information and low level image observations. Our goal is to build a very practical prototype for scene classification of typical consumer photos, along the lines of the Kodak system [22]. Thus, we are interested in systems that are accurate, efficient, and able to work with a wide range of photos and photographic quality.
Given the extremely large within-category variations in typical photographs, it is usually simpler to break the problem of scene classification into a two-step process. In this paper, we first train local, image-patch-based color-texture Gaussian mixture models (GMMs) to detect each of 20 materials in a local image patch. These models are used to scan an image and generate 20 local responses for each pixel. Each response map, called a Probability Density Response Map (PDRM), can be taken as a real-valued image indicating the relative likelihood of each material at each image location. We then compute moments from the response maps and form a feature vector for each photo. By employing the random subspace method [12, 28] and bootstrapping [31], we obtain a set of LDA scene classifiers over these feature vectors. These classification results are combined into the final decision through bagging [2]. After learning the local and global models, a typical 1200 × 800 image can be classified in less than 1 second with our unoptimized Matlab implementation. There is therefore potential to develop a real-time scene classifier based on our approach. A complete diagram of our approach is shown in Figure 1.
[Title footnote] The work was partially performed while the first author was a summer intern at Microsoft Research.
There are several related efforts in this area. Luo et al. [19, 22] propose a bottom-up approach that first finds and labels well-segmented image regions, such as water, beach, and sky, and then learns a spatial contextual model among the regions. A Bayesian network encodes these relational dependencies. By comparison, we do not perform an explicit spatial segmentation, and we use relatively simple (LDA-based) classification methods. Perona et al. [8, 30] present a constellation model of clustered feature components for object recognition. Their method works well for detecting single objects, but strongly depends on the performance and reliability of the interest detector [13]. In the case of scene classification, we need to model more than one class of material, where classes are non-structural and do not have significant features (such as foliage and rock) [13]. This motivates our use of a GMM on the feature space. In order to maintain good stability, we estimate the GMM in a linear subspace computed by LDA. These density models are quite flexible and can be used to model a wide variety of image patterns with a good compromise between discrimination and smoothness.
Kumar et al. [14, 15] propose the use of Markov random field (MRF)-based spatial contextual models to detect man-made buildings in a natural landscape. They build a multi-scale color and texture descriptor to capture the local dependence among building and non-building image blocks and use an MRF to model the prior of block labels. In our work, we have found that simple local labeling suffices to generate good classification results; indeed, regularization using the loopy belief propagation method [33] yields no significant improvement in performance. Thus, we claim that there is no need to segment image regions explicitly for scene classification, as other authors have done [22, 19, 15].
Linear discriminant analysis (LDA) is an optimization method to compute linear combinations of features that have more power to separate different classes. For texture modeling, Zhu et al. [35] pursue features to find the marginal distributions, which are also linear combinations of the basic filter banks, but they use a much more complex method (Markov chain Monte Carlo) to stochastically search the space of linear coefficients. In our case, the goal is not to build a generative model for photos belonging to different scenes, but simply to discriminate among them. We show that a simple method such as LDA, if designed properly, can be very effective and efficient for building a useful classifier for complex scenes.
We organize the rest of the paper as follows. In section 2, we present the local image-level processing used to create PDRMs. In section 3, we describe how PDRMs are processed to perform scene classification. Section 4 gives experimental results and analysis of the performance of the patch-based material detector and the image-based scene classifier on a database of 1500 personal photos taken by 43 users with traditional or digital cameras over the last 5 years. Finally, we summarize the paper and discuss future work in section 5.
2 Local Image-Level Processing
The role of image-level processing is to roughly classify local image content at each location in the image. The general approach is to compute feature vectors of both color and texture, and then develop classifiers for these features. In our current implementation, we have chosen to perform supervised feature classification. Although arguably less practical than corresponding unsupervised methods, supervised classification permits us to control the structure of the representations built at this level, and thereby to better understand the relationship between low-level representations and overall system performance.
In this step, we compute 20 data-driven probabilistic density models to describe the color-texture properties of image patches of 20 predefined materials (see footnote 1). These 20 categories are: building, blue sky, bush, other (mostly trained with human clothes), cloudy sky, dirt, mammal, pavement, pebble, rock, sand, skin, tree, water, shining sky, grass, snow, carpet, wall and furniture.
[Footnote 1] The vocabulary of materials to be detected is designed by considering their popularity in the usual family photos. This definition is, of course, not unique or optimized.
To prepare the training data, we manually crop image regions for each material in our database, and randomly draw dozens of 25 by 25 pixel patches from each rectangle. Altogether, we have 2000 image patches for each material. Some examples of the cropped images and sampled image patches are shown in Figure 2. For simplicity, we do not precisely follow the material boundaries in the photos while cropping. Some outlier features are thus included in the training patches. Fortunately, these outliers are smoothed nicely by learning continuous mixture density models.
Multi-scale image representation and automatic scale selection have been topics of intense discussion over the last decade [17, 20, 13, 6, 14]. In general, the approach of most authors has been to first normalize images with respect to the estimated scale of local image regions before learning. However, it is not a trivial problem to reliably recover the local image scales for a collection of 1500 family photos. We instead choose to train the GMM using the raw image patches extracted directly from the original pictures. For the labeled image patches with closer and coarser views, their complex color-texture distributions can be approximated by a multi-modal Gaussian mixture model during clustering.
2.1 Color-Texture Descriptor for Image Patches
Our first problem is to extract a good color-texture descriptor which effectively allows us to distinguish the appearance of different materials. In the domain of color, experimental evaluation of several color models has not indicated significant performance differences among color representations. As a result, we simply represent the color of an image patch as the mean color in RGB space.
There are also several methods to extract texture feature vectors for image patches. Here we consider two: filter banks and the Haralick texture descriptor. Filter banks have been widely used for 2 and 3 dimensional texture recognition [16, 5, 27]. We apply the Leung-Malik (LM) filter bank [16], which consists of 48 isotropic and anisotropic filters with 6 directions, 3 scales and 2 phases. Thus, each patch is represented by a 48 component feature vector.
Figure 1: The diagram of our two level approach for scene recognition. The dashed line boxes are the input data or output learned models; the solid line boxes represent the functions of our algorithm. [Diagram boxes. Patch-level processing: labeled image patches for each of 20 materials (e.g., blue sky, tree, pavement); patch based color-texture feature extraction; LDA projection for material classes; patch-based discriminative Gaussian mixture density model for each of 20 materials (e.g., GMM for pavement, tree, blue sky). Image-level processing: probability density response maps for each of 20 materials (e.g., PDRM for pavement, tree, blue sky); moments feature extraction and vectorization of each PDRM; LDA projection for scene categories (bootstrapping + random subspace sampling); bagging of LDA classifiers for scene categories.]
Figure 2: (a, c, e, g) Examples of cropped subimages of building, building under closer view, human skin, and grass respectively. (b, d, f, h) Examples of image patches of these materials including local patches sampled from the above subimages. Each local image patch is 25 by 25 pixels.
The Haralick texture descriptor [10] is designed for image classification and has been adopted in the area of image retrieval [1]. Haralick texture measurements are derived from the Gray Level Co-occurrence Matrix (GLCM). The GLCM, also called the Grey Tone Spatial Dependency Matrix, is a tabulation of how often different combinations of pixel brightness values (grey levels) occur in an image region. GLCM texture considers the relation between two pixels at a time, called the reference and the neighbor pixel. Their spatial relation is determined by two factors, the orientation and the offset. Given any image patch, we search all the pixel pairs satisfying a certain spatial relation and record their second order gray level distributions in a 2 dimensional histogram indexed by their brightness values (see footnote 2). Haralick also designed 14 different texture features [10] based on the GLCM. We selected 5 texture features: dissimilarity, Angular Second Moment (ASM), mean, standard deviation (STD) and correlation. Definitions for these can be found in Appendix A.
[Footnote 2] The reference and neighbor pixel intensities normally need to be quantized into 16 or fewer levels instead of 256, so that the resulting GLCM is not too sparse.
There is no general argument for whether filter bank features or Haralick features are the better texture descriptor. We evaluate their texture discrimination performance experimentally in section 4 and find that Haralick features generally perform better.
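As an illustration of the GLCM measurements described above, the following Python sketch builds a symmetric, normalized co-occurrence matrix for a single pixel offset and derives the five selected features from it. The function names (`glcm`, `haralick_subset`), the quantization scheme, and the feature formulas are assumptions based on common GLCM conventions; the definitions actually used in the paper are those of its Appendix A and may differ in detail.

```python
import math

def glcm(patch, dx=1, dy=0, levels=16, max_value=255):
    """Symmetric, normalized GLCM of a 2-D list of integer gray values for offset (dx, dy)."""
    # Quantize intensities into `levels` bins (16 or fewer keeps the matrix dense).
    q = [[min(v * levels // (max_value + 1), levels - 1) for v in row] for row in patch]
    h, w = len(q), len(q[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                i, j = q[y][x], q[ny][nx]
                counts[i][j] += 1.0  # reference -> neighbor
                counts[j][i] += 1.0  # neighbor -> reference (symmetric matrix)
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("patch too small for this offset")
    return [[c / total for c in row] for row in counts]

def haralick_subset(p):
    """Dissimilarity, ASM, mean, standard deviation and correlation of a symmetric GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    dissimilarity = sum(v * abs(i - j) for i, j, v in cells)
    asm = sum(v * v for _, _, v in cells)
    mean = sum(i * v for i, _, v in cells)
    var = sum((i - mean) ** 2 * v for i, _, v in cells)
    # Row and column marginals coincide for a symmetric GLCM, so one mean/variance suffices.
    cov = sum((i - mean) * (j - mean) * v for i, j, v in cells)
    correlation = cov / var if var > 0 else 0.0
    return {
        "dissimilarity": dissimilarity,
        "asm": asm,
        "mean": mean,
        "std": math.sqrt(var),
        "correlation": correlation,
    }
```

Calling `haralick_subset(glcm(patch))` for several offsets and orientations and concatenating the results with the mean RGB color would give one possible color-texture feature vector for a patch.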
2.2 Discriminative Mixture Density Models for 20 Materials
The color and texture features for 2000 image patches form, in principle, an empirical model for each material. However, classifying new patches against the raw features would require the solution to a high-dimensional nearest-neighbor problem, and the result would be sensitive to noise and outliers. Instead, we compute a continuous membership function using a Gaussian mixture model.
Although we have 2000 training samples, our feature vectors have 40 dimensions, so the training set is still too sparse to learn a good mixture model without dimensionality reduction. Because one of our purposes is to maximize the discrimination among different materials, Linear Discriminant Analysis (LDA) [31] was chosen to project the data into a subspace where each class is well separated. The LDA computation is reviewed in Appendix B.
When each class has a Gaussian density with a common covariance matrix, LDA is the optimal transform to separate data from different classes. Unfortunately, the material color-texture distributions all have multiple modes because the training image patches are sampled from a large variety of photos. Therefore we have two options: employ LDA to discriminate among 20 material classes, or use LDA to separate all the modes of materials. Although the latter seems closer to the model for which LDA was designed, we found its material classification rate is worse because the optimal separation among the multiple modes within the same material class is irrelevant. Therefore we choose the former.
The LDA computation provides a projection of the original feature space into a lower-dimensional feature space Z. We assume that the color-texture features of each material class are described by a finite mixture distribution on Z of the form
$$P(z \mid c) = \sum_{k=1}^{g_c} \pi_k^c \, G(z; \mu_k^c, \Sigma_k^c), \qquad c = 1, 2, \ldots, 20 \tag{1}$$
where the $\pi_k^c$ are the mixing proportions ($\sum_{k=1}^{g_c} \pi_k^c = 1$) and $G(z; \mu_k^c, \Sigma_k^c)$ is a multivariate Gaussian function depending on a parameter vector $\theta_k^c$. The number of mixtures $g_c$ and the model parameters $\{\pi_k^c, \theta_k^c\}$ for each material class c are initialized by spectral clustering [21] and learned in an iterative Expectation-Maximization manner [31, 7], where $g_c$ ranged from 4 to 8 depending on the material class. In summary, discriminative Gaussian mixture models are obtained by applying LDA across the material classes and learning the GMM within each material class, respectively.
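To make equation (1) concrete, the following Python sketch evaluates the mixture density of one material class at a projected feature vector z. It is a minimal illustration only: the function names are hypothetical, and diagonal covariance matrices are assumed for brevity, whereas the paper's model allows a general covariance for each component.

```python
import math

def gaussian_diag(z, mean, var):
    """Gaussian density with diagonal covariance (an assumption made for brevity)."""
    log_p = 0.0
    for zi, mi, vi in zip(z, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (zi - mi) ** 2 / vi)
    return math.exp(log_p)

def mixture_density(z, weights, means, variances):
    """P(z | c) = sum_k pi_k * G(z; mu_k, Sigma_k) for one material class c."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must sum to 1")
    return sum(
        w * gaussian_diag(z, m, v)
        for w, m, v in zip(weights, means, variances)
    )
```

Evaluating `mixture_density` once per material class at every scanned patch location would yield the 20 density values that make up the response maps.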
3 Global Image Processing
Once we obtain 20 Gaussian mixture models $\{\pi_k^i, P(z; \theta_k^i),\ i = 1, 2, \ldots, 20\}$ for 20 material classes, we can evaluate the membership density values of image patches for each material class. For any given photo, we scan local image patches, extract their color-texture feature vector, normalize each of its components to the range 0 to 1 [1], project it to the lower dimensional subspace Z computed by LDA, and finally compute the density value given by equation (1) for all 20 material classes. The result is 20 real-valued grid maps (see footnote 3) representing membership support for each of the 20 classes. An example is shown in Figure 3. Two examples of the local patch labeling for indoor and outdoor photos are shown in Figure 4.
[Footnote 3] The size of the map depends on the original photo size and the patches' spatial sampling intervals.
Figure 4: (a) The local patch material labeling results of an indoor photo. (b) The local patch material labeling results of an outdoor photo. Loopy belief propagation is used for enhancement. The colored dots represent the material label and the boundaries are manually overlaid for illustration purposes only. [Material labels shown in the figure: Skin, Bush, Furniture, Wall & Curtain, Pebble, Blue Sky, Water, Pavement, Sand, Rock, Grass, Other, Building.]
Our next goal is to classify the photos into one of ten categories: cityscape, landscape, mountain, beach, snow, other outdoors, portrait, party, still life and other indoor. In order to classify photos, we must still reduce the dimension of the PDRMs to a manageable size. To do this, we compute the zeroth, first, and second order moments of each PDRM. Intuitively, the zeroth moment describes the prevalence of a given material class in an image; the first moment describes where it occurs, and the second moment its spatial "spread". The moment features from the 20 PDRMs are combined in a global feature vector Y.
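The moment features described above can be sketched as follows. This Python example treats one PDRM as a 2-D grid of non-negative values and returns its prevalence, centroid, and central second moments. The function name and the particular normalization (mean response for the zeroth moment, mass-weighted centroid and variances for the others) are assumptions; the paper does not spell out these details.

```python
def pdrm_moments(pdrm):
    """Zeroth, first and second order moments of one response map (2-D list)."""
    h, w = len(pdrm), len(pdrm[0])
    mass = sum(sum(row) for row in pdrm)
    if mass == 0:
        return [0.0] * 6  # material absent: no location or spread to report
    cells = [(x, y, v) for y, row in enumerate(pdrm) for x, v in enumerate(row)]
    # First moments: where the material occurs (mass-weighted centroid).
    cx = sum(x * v for x, _, v in cells) / mass
    cy = sum(y * v for _, y, v in cells) / mass
    # Second central moments: how widely the response is spread.
    vxx = sum((x - cx) ** 2 * v for x, _, v in cells) / mass
    vyy = sum((y - cy) ** 2 * v for _, y, v in cells) / mass
    vxy = sum((x - cx) * (y - cy) * v for x, y, v in cells) / mass
    # Zeroth moment, normalized by map size: prevalence of the material.
    return [mass / (h * w), cx, cy, vxx, vyy, vxy]

def global_feature_vector(pdrms):
    """Concatenate the moment features of all material maps into one vector Y."""
    features = []
    for pdrm in pdrms:
        features.extend(pdrm_moments(pdrm))
    return features
```

With 20 maps and six values per map, this particular layout would give a 120-dimensional vector; the dimensionality used in the paper is not stated in this section.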
Using the scene category labels of the training photos, we now compute the LDA transform that attempts to separate the training feature vectors of different categories. For indoor-outdoor recognition, the LDA projected subspace has only one dimension. As in a typical pattern classification problem, we can find the optimal decision boundary from the training data and apply it to the testing data. Finding decision boundaries for 10 scene category recognition is more complex. In practice, it is very difficult to train a GMM classifier because the data are too sparse over the 10 categories. As a result, we have used both the nearest neighbor and K-means [31] classifiers for this decision problem.
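For the two-class (indoor-outdoor) case, the one-dimensional LDA projection and its decision boundary can be illustrated with a small Python sketch. It uses 2-D feature vectors so that the within-class scatter matrix can be inverted in closed form, and it places the threshold midway between the projected class means. Both choices, like the function names, are simplifying assumptions and not the authors' procedure for choosing the boundary.

```python
def fisher_two_class(class_a, class_b):
    """Fisher discriminant direction w and midpoint threshold for 2-D points."""
    def mean(points):
        n = len(points)
        return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

    ma, mb = mean(class_a), mean(class_b)
    # Pooled within-class scatter matrix S_W = [[sxx, sxy], [sxy, syy]].
    sxx = syy = sxy = 0.0
    for points, m in ((class_a, ma), (class_b, mb)):
        for p in points:
            dx, dy = p[0] - m[0], p[1] - m[1]
            sxx += dx * dx
            syy += dy * dy
            sxy += dx * dy
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    # w = S_W^{-1} (m_a - m_b), using the closed-form 2x2 inverse.
    d0, d1 = ma[0] - mb[0], ma[1] - mb[1]
    w = [(syy * d0 - sxy * d1) / det, (sxx * d1 - sxy * d0) / det]
    threshold = 0.5 * ((w[0] * ma[0] + w[1] * ma[1]) + (w[0] * mb[0] + w[1] * mb[1]))
    return w, threshold

def classify(x, w, threshold, label_a="outdoor", label_b="indoor"):
    """Class a projects above the threshold because w points from b towards a."""
    return label_a if w[0] * x[0] + w[1] * x[1] > threshold else label_b
```

A singular scatter matrix, which this sketch rejects, is exactly the situation in which the null subspace of the within-class scatter becomes relevant.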
We have found that the standard method for creating an LDA classifier works well for indoor-outdoor scene classification, but the classification results for 10 scene categories are not good enough to constitute a practical prototype. To improve the classification rate, we have implemented variations on random subspace generation [12, 28] and bootstrapping [31] to create multiple LDA classifiers. These classifiers are combined using bagging [2]. Recall that LDA is a two-step process that first computes the singular value decomposition (SVD) [9] of the within-class scatter matrix $S_W$, then, after normalization, computes the SVD of the between-class scatter matrix $S_B$. After the first step, $S_W$ is divided into the principal subspace $S_P$ of the nonzero eigenvalues $\Lambda_P$ and their associated eigenvectors $U_P$, and the null subspace $S_N$ with the zero eigenvalues $\Lambda_N$ and corresponding eigenvectors $U_N$. In the traditional LDA transform, only $S_P$ is used for the whitening of $S_W$ and normalization of $S_B$, while $S_N$ is discarded (see equation 10 in Appendix B). Chen et al. [4] have found that the null subspace $S_N$, which satisfies $U_N^T S_W U_N = 0$, also contains important discriminatory information. Here we make use of this observation by uniformly sampling an eigenvector matrix $U_r$ from $\{U_P \cup U_N\}$ and using it in place of $U$ in the initial LDA projection step. Several projections (including the original LDA projection matrix) are thus created.
Figure 3: (a) Photo 1459#. (b) Its confidence map. (c, d, e, f, g) Its support maps of blue sky, cloudy sky, water, building and skin. Only the material classes with significant membership support are shown.
In the second step of LDA, the subset $V_P$ of the full eigenvector matrix $V$ with the largest eigenvalues normally replaces $V$ in equation (10). It is also possible that there is useful discriminative information in the subspace $\{V \setminus V_P\}$. Therefore we employ a sampling strategy similar to that of [28] in the context of PCA, first sampling a small subset of eigenvectors $V_r$ of $\{V \setminus V_P\}$, then replacing $V$ with the joint subspace $\{V_P \cup V_r\}$ in equation (10).
Finally, we also perform bootstrapping [31] by sampling subsets of the training set and creating LDA classifiers for these subsets. Through the above three random sampling processes, we learn a large set of LDA subspaces and classifiers, which we combine using majority voting (bagging) [2]. In Section 4, we show the bagged recognition rates of 20 classifiers from bootstrapping replicates and 20 from random subspace sampling.
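The combination step can be sketched independently of how each LDA classifier is trained. The Python example below draws a bootstrap replicate of training indices and combines the predictions of several classifiers by majority vote. The function names are hypothetical, classifiers are modeled simply as callables that map a feature vector to a label, and the tie-breaking rule (earliest vote wins) is an assumption.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Sample n training indices with replacement (one bootstrap replicate)."""
    return [rng.randrange(n) for _ in range(n)]

def majority_vote(labels):
    """Most frequent label; ties go to the label that was voted first."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label

def bagged_predict(classifiers, x):
    """Apply every classifier to x and combine the outputs by majority vote."""
    return majority_vote([clf(x) for clf in classifiers])
```

In a full system, each callable would wrap one LDA projection (from a bootstrap replicate or a randomly sampled subspace) together with its decision rule, for example `bagged_predict(lda_classifiers, y)` for a global feature vector `y`.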
4 Experiments
Our photo collection currently consists of 540 indoor and 860 outdoor customer photos. We randomly select half of them as the training data and use the other photos as the testing data. We have also intentionally minimized redundancy when collecting photos, i.e., only one photo is selected when there are several similar pictures.
Figure 5: The pairwise confusion matrix of 20 material classes. The indexing order of the confusion matrix is shown on the left of the matrix: building, blue sky, bush, other, c-sky, dirt, mammal, pavement, pebble, rock, sand, skin, tree, water, s-sky, grass, snow, carpet, furniture, wall. The indexing order is symmetrical.
We first address the problem of image-patch-based color-texture feature description and classification. A comparison of the recognition rates on 1200 testing image patches for each material class, for different color-texture descriptors, different numbers of training patches and different classifiers, is provided in Figure 6 (a, b). In particular, we have also benchmarked the LDA+GMM model against a brute-force nearest neighbor classifier. Let $x_j$ and $z_j$ represent an image patch feature vector before and after the LDA projection, respectively. The nearest neighbor classifier computes the class label of a testing patch j as the label of that training patch l such that $\|x_j - x_l\| = \min_i \{\|x_j - x_i\|\}$, where i ranges over the training image patches of all material classes. The GMM classifier simply chooses the maximal class density, i.e., the class $c^*$ such that $P(z_j \mid c^*) = \max_{c=1,2,\ldots,20} \{P(z_j \mid c)\}$.
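The two decision rules compared above can be written in a few lines of Python. In this sketch the function names are hypothetical, the nearest neighbor rule uses the Euclidean distance, and each class density is supplied as a callable standing in for a trained mixture model.

```python
import math

def nearest_neighbor_label(x, training_set):
    """Label of the training patch closest to x; training_set holds (vector, label) pairs."""
    closest = min(training_set, key=lambda item: math.dist(x, item[0]))
    return closest[1]

def max_likelihood_label(z, class_densities):
    """Class c* maximizing P(z | c); class_densities maps each label to a density function."""
    return max(class_densities, key=lambda c: class_densities[c](z))
```

The nearest neighbor rule must visit every training patch for each query, while the density rule evaluates only one mixture per class, which is part of the practical appeal of the LDA+GMM approach.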
Comparing the plots shown in Figure 6, the classifier based on the maximum likelihood of GMM density functions outperforms the nearest neighbor classifier, thus validating the use of the LDA+GMM method. We also compared the recognition rates of 4 different feature combinations and found that the Haralick texture descriptor combined with the mean color of the image patch yields the best results. Finally, in Figure 6 (b), we see that the LDA+GMM method improves the recognition rate significantly as the number of training image patches increases from 500, becoming stable after 2000 patches.
Figure 5 shows the confusion rate using the GMM classifiers learned from 2000 training image patches per class. The size of the white rectangle in each grid cell is proportional to the pairwise recognition error ratio. The largest and smallest confusion rates are 23.6% and 0.24%, respectively. From Figure 5, we see that the pebble, rock and sand classes are well separated, which shows that our patch-level learning process achieves a good balance of Haralick texture and color cues by finding differences among material classes with similar color. There is significant confusion among grass, bush and tree due to their similar color and texture distributions. For some material classes, such as furniture, carpet, and other, the overall confusion rates are also high.
For global classification, we have found that first order