Proceedings Article

Boosted local structured HOG-LBP for object localization

20 Jun 2011-pp 1393-1400
TL;DR: This paper proposes a boosted Local Structured HOG-LBP based object detector. Local Structured Descriptors, developed from shape and texture information respectively, capture the object's local structure, and a boosted feature selection and fusion scheme combines them for part based object detectors.
Abstract: Object localization is a challenging problem due to variations in object's structure and illumination. Although existing part based models have achieved impressive progress in the past several years, their improvement is still limited by low-level feature representation. Therefore, this paper mainly studies the description of object structure from both feature level and topology level. Following the bottom-up paradigm, we propose a boosted Local Structured HOG-LBP based object detector. Firstly, at feature level, we propose Local Structured Descriptor to capture the object's local structure, and develop the descriptors from shape and texture information, respectively. Secondly, at topology level, we present a boosted feature selection and fusion scheme for part based object detector. All experiments are conducted on the challenging PASCAL VOC2007 datasets. Experimental results show that our method achieves the state-of-the-art performance.


Boosted Local Structured HOG-LBP for Object Localization
Junge Zhang, Kaiqi Huang, Yinan Yu and Tieniu Tan
National Laboratory of Pattern Recognition
Institute of Automation, Chinese Academy of Sciences
{jgzhang,kqhuang,ynyu,tnt}@nlpr.ia.ac.cn
Abstract
Object localization is a challenging problem due to vari-
ations in object’s structure and illumination. Although ex-
isting part based models have achieved impressive progress
in the past several years, their improvement is still limited
by low-level feature representation. Therefore, this paper
mainly studies the description of object structure from both
feature level and topology level. Following the bottom-up
paradigm, we propose a boosted Local Structured HOG-
LBP based object detector. Firstly, at feature level, we pro-
pose Local Structured Descriptor to capture the object’s
local structure, and develop the descriptors from shape
and texture information, respectively. Secondly, at topol-
ogy level, we present a boosted feature selection and fusion
scheme for part based object detector. All experiments are
conducted on the challenging PASCAL VOC2007 datasets.
Experimental results show that our method achieves the
state-of-the-art performance.
1. Introduction
Object localization is an essential task in computer vi-
sion. Impressive performance improvement in object lo-
calization has been achieved via the progress in: 1) learn-
ing object structure [5, 8, 19, 20, 27] and detector model,
and 2) learning low-level feature based appearance model
[3, 10, 18, 21, 24, 25].
Detector models mainly include part based models [8, 9, 20, 27]
and rigid template models [1, 24, 25]. Part based models describe
an object's structure using several parts and the relationships
between them. They can be considered a top-down structure that
tackles the problems of partial occlusion and appearance
variations. Part based models [8, 20, 27] have been shown to
succeed on many difficult datasets [14]. Because of this
robustness to deformation, the part based model is regarded as a
promising method for localizing objects in images, which
motivates us to focus on it. Rigid template models cannot
describe variations in an object's structure with a fixed
template. Therefore, they perform well on ideally conditioned
databases but suffer on difficult data with deformations.
Progress in low-level features has also greatly advanced object
localization. One representative feature is the Histogram of
Oriented Gradients (HOG) [1]. Others include Pairs of Adjacent
Segments (PAS) [10] and Local Binary Pattern (LBP) [17].

Figure 1. Detection results of different methods. (a) is the
original image. (b) is the result of SVM+HOG. (c) is the result
from [7] and (d) is the result of the proposed method.
One important problem in object localization is how to describe
an object's structure robustly. The part based model, as a
top-down structure, models object structure well at topology
level [20]. But obtaining a robust low-level feature
representation remains a challenge for improving the part based
model. In the field of signal processing, a signal is considered
structured when the local intensity varies along some preferred
orientations [4]. Local structure can be corners, edges or
crossings, etc. Research in signal processing indicates that
there is a relation between local energy and local structure,
and that local energy can represent local structure well [4].
From this aspect, the popular features HOG and LBP are histogram
features. Thus, they cannot effectively describe an object's
local structure information, which is important for object
localization.
Motivated by these challenges of robust low-level feature
representation for the part based model, we address the problem
with a part model built on Local Structured Descriptors. Firstly,
we propose Local Structured HOG (LSHOG), in which the Local
Structured Descriptor is computed from the local energy of shape
information. Secondly, similar to LSHOG, we present Local
Structured LBP (LSLBP), in which the Local Structured Descriptor
is based on texture information. In addition, to tackle
non-linear illumination changes, we clip the large feature values
they cause with a truncation term. To reduce the effect of small
deformations, LSLBP uses spatial weighting, which has proved
robust to aliasing, and bin interpolation, which describes
histograms accurately. Thirdly, we present a boosted Local
Structured HOG-LBP based object detector, and the proposed method
achieves state-of-the-art performance on the challenging PASCAL
VOC datasets [14]. Figure 1 gives an example of person detection.
The rest of this paper is organized as follows. Section
2 gives a brief overview of related work. Section 3 intro-
duces the framework of our approach. Section 4 shows and
analyzes the experimental results and Section 5 draws con-
clusions.
2. Related work
This paper focuses on two basic problems: how to ac-
curately describe object structure at feature level and how
to fuse multiple Local Structured Descriptors for part based
model at topology level.
2.1. Features for object localization
Various visual features such as HOG and LBP have been proposed
for object localization. HOG was first proposed for human
detection [1]. Since then, HOG has proved to be one of the most
successful features in general object localization [14]. During
the past few years, many variants of HOG have been presented,
such as Co-occurrence Histograms of Oriented Gradients (CoHOG)
[26], in which co-occurrence with various positional offsets is
adopted to express complex object shapes. In [8],
contrast-sensitive and contrast-insensitive features are used to
formulate a more informative gradient feature. LBP was first
presented by Ojala et al. [17] for the purpose of texture
classification. Uniform LBP was then developed to reduce the
negative effect of noise. In [16], Mu et al. stated that the
traditional LBP did not perform well in human detection, so they
proposed two variants of LBP named Semantic LBP (S-LBP) and
Fourier LBP (F-LBP). Wang et al. also proposed a cell-structured
LBP [25] that divides the scanning window into non-overlapping
cells for human detection. These features (HOG, LBP, CoHOG,
S-LBP, etc.) are all histogram features, which are limited in
describing an object's local structure. In addition, PAS [10]
has shown attractive performance compared with HOG in recent
years. PAS uses line segments to capture the object's global
shape and its structure, which differs from the description
schemes of HOG and LBP. But the boundary detection in PAS is very
time consuming, which limits its wide application.

Figure 2. The framework of the Local Structured HOG-LBP based
part based object detector. [Diagram labels: LSHOG; LSLBP;
boosted feature; learning feature; training samples; gradient
image; LBP image; feature pool; initialize root; initialize
parts from root; update parts and retrain; detector; training
part based detector.] This paper mainly focuses on feature
construction and multiple-feature learning for the part based
model. We perform feature selection at root level. In the
training phase, part models are initialized and updated using
the feature learnt from the root. We adopt latent SVM from [8].
2.2. Part based models
Part based models are robust to partial occlusion and small
deformation because they describe an object's structure
expressively by considering the relationships between parts.
During the past decade, the most representative part models have
been the constellation model proposed by Fergus et al. [9] and
the star-structured part model presented by Felzenszwalb et al.
[8]. In [9], the parts' locations are determined by interest
points, while in [8] they are searched through the dense HOG
feature. In particular, the star-structured part model is
discriminatively trained and has demonstrated state-of-the-art
performance in the past several years. For convenience, we refer
to the method in [8] as DPBM. In DPBM, an object is represented
by a root model and several part models. The parts' locations
are considered latent information, and a latent SVM is proposed
to efficiently optimize the model's parameters. DPBM provides a
very strong benchmark in the field of object localization. But
the performance of DPBM is still limited by its low-level feature
representation.
Figure 3. The flowchart of the computation of Local Structured
HOG. [Diagram labels: histogram; Local Structured Descriptor;
Local Structured HOG.]
3. Boosted Local Structured HOG-LBP for part based model
We show our framework of training Local Structured
HOG-LBP based part model in Figure 2. The system con-
sists of two parts: learning feature and training deformable
part based detector. The first stage is learning feature, in-
cluding extraction of Local Structured Descriptors based
on shape and texture information, and feature selection of
LSLBP in a supervised manner. In the stage of training ob-
ject detector, we firstly train the root model using the learnt
feature from the first stage, then initialize parts models from
the root model. We use latent SVM [6, 7, 8] to iteratively
train the part based detector.
3.1. Local Structured HOG
In this subsection, we introduce the details of the Local
Structured Descriptor based on shape information. As shown in
Figure 3, the LSHOG computation consists of gradient
computation, orientation binning, normalization and formulation
of the Local Structured Descriptor. LSHOG includes both the
histogram feature and the Local Structured Descriptor. Thus,
LSHOG not only describes shape information through the histogram
feature, but also captures the relative local structure
information through the structured descriptor. The earlier steps
are similar to HOG in [1]. In particular, we do not perform
gamma/color normalization or Gaussian weighting because we find
they have little effect on performance.
The gradient features used in LSHOG include both unsigned and
signed gradients [1, 8]. Their orientation ranges are 0° to 180°
and 0° to 360°, respectively. To obtain a cell-structured feature
descriptor, the cell size is set to 8×8.
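As a rough illustration of the gradient and orientation-binning steps, the sketch below builds one magnitude-weighted orientation histogram per 8×8 cell, for either the unsigned (0° to 180°) or the signed (0° to 360°) range. The function name `cell_orientation_histograms`, the use of `numpy.gradient`, the hard (non-interpolated) bin assignment, and the bin counts (9 unsigned, 18 signed) are assumptions made for illustration; the text does not specify them, and block normalization is omitted.

```python
import numpy as np

def cell_orientation_histograms(image, cell=8, n_bins=9, signed=False):
    """Magnitude-weighted orientation histogram per cell (hard binning, no normalization)."""
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)              # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    period = 360.0 if signed else 180.0      # signed: 0-360 degrees, unsigned: 0-180 degrees
    angle = np.degrees(np.arctan2(gy, gx)) % period
    # Clamp guards against an angle that rounds to exactly `period`.
    bins = np.minimum((angle / period * n_bins).astype(int), n_bins - 1)

    h, w = image.shape[0] // cell, image.shape[1] // cell
    hist = np.zeros((h, w, n_bins))
    for i in range(h):
        for j in range(w):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            hist[i, j] = np.bincount(bins[rows, cols].ravel(),
                                     weights=magnitude[rows, cols].ravel(),
                                     minlength=n_bins)
    return hist
```

Calling the function twice, once with `signed=False` and once with `signed=True, n_bins=18`, would give the two gradient features mentioned above under these assumptions.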
Local Structured Description. As discussed in the section above,
the original HOG and its variants are still histogram features
and cannot describe local structure effectively.
Empirically, the boundary of any object (e.g., a person) tends
to be continuous, and spatially adjacent regions must have a
certain structural relation. As mentioned above, PAS [10]
captures the spatial structure of objects by encoding the
lengths of adjacent segments and their relative angles in the
final descriptor. However, the Berkeley probability boundary
detector used in PAS is very time consuming, which limits PAS's
large scale application. In the field of signal processing,
local energy based structure representation is widely used for
its robustness to noise and aliasing [4]. Inspired by this
progress, we adopt the local gradient energy to capture local
structure. We believe the relative local structure between
adjacent blocks is more informative. Therefore, we use the
relative gradient energy within an object's adjacent blocks to
capture the local structure.

Figure 4. The details of the computation of LSHOG. The left
image illustrates the histogram of gradient in each cell. The
right image gives the gradient energy via the sum of squares of
the histogram of gradient in each cell. [Diagram labels: blocks
LR1 to LR4 over cells i to i+2, j to j+2.]
The computation of LSHOG is illustrated in Figure 4. Let
$F_{i,j}$ $(i = 1, 2, \dots, h;\ j = 1, 2, \dots, w)$ be the
feature map, where $h$ and $w$ are the height and width of the
feature map, respectively. Let $H_{i,j}$ specify the sum of the
histogram of gradients at $F(i, j)$, and let $LR_i$
$(i = 1, 2, 3, 4)$ be the square block consisting of four
adjacent cells around cell $(i+1, j+1)$. To avoid a large local
structure value, we define, for example, $LR_1$ by

$$LR_1 = \frac{H_{i+1,j+1}}{E_{i,j} + E_{i,j+1} + E_{i+1,j} + E_{i+1,j+1}} \qquad (1)$$

where $E_{i,j}$ $(i = 1, 2, \dots, h;\ j = 1, 2, \dots, w)$
denotes the gradient energy obtained from the sum of squares of
the gradient histogram at each cell $(i, j)$ of $F$. The
computation of $LR_2$, $LR_3$ and $LR_4$ is similar to that of
$LR_1$. We can then define the Local Structured Descriptor as
follows. The Local Horizontal Structure (LHS) is defined as:
$$LHS_1 = \lambda \, | LR_1 - LR_2 |, \qquad LHS_2 = \lambda \, | LR_3 - LR_4 | \qquad (2)$$

The Local Vertical Structure (LVS) is defined as:

$$LVS_1 = \lambda \, | LR_1 - LR_3 |, \qquad LVS_2 = \lambda \, | LR_2 - LR_4 | \qquad (3)$$

The Local Diagonal Structure (LDS) is defined as:

$$LDS_1 = \lambda \, | LR_1 - LR_4 |, \qquad LDS_2 = \lambda \, | LR_2 - LR_3 | \qquad (4)$$

And the Local Overall Structure (LOS) is defined by

$$LOS = \lambda' \, | LR_1 + LR_2 + LR_3 + LR_4 | \qquad (5)$$

Figure 5. Overview of the computation of LSLBP. [Diagram labels:
trilinear interpolation; blocks LR1 to LR4 over cells i to i+2,
j to j+2.]
The control parameter $\lambda$ can be taken as a normalization
factor for LHS, LVS and LDS. We set

$$\lambda = \frac{\sqrt{\sigma \times 18}}{4} \qquad (6)$$

where $\sigma$ is the maximum possible value of the gradient
feature. The purpose of Eq. 6 is to bring the Local Structured
Descriptor's values to the same order of magnitude as the
histogram feature's values. In LSHOG, we use the truncation
value $\sigma = 0.2$, so $\lambda = 0.4743$. For LOS, we find
that the setting $\lambda' = 0.1$ is enough; it serves the same
purpose as $\lambda$. As illustrated above, this coding scheme
has several advantages: 1) It is simple to compute. 2) It is
robust to small deformation: because the descriptor is related
to the energy of local regions, a small deformation changes the
energy of the corresponding region only slightly. 3) It is easy
to apply to other pixel based histogram features.
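The descriptor defined in Eqs. 1 to 6 can be sketched for a single 3×3 cell neighbourhood as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `local_structured_descriptor` is hypothetical; LR_1 to LR_4 are taken to be the top-left, top-right, bottom-left and bottom-right 2×2 blocks around the centre cell, which matches the horizontal, vertical and diagonal naming; the structure terms are read as absolute differences; and λ is computed as sqrt(σ × 18) / 4, which reproduces the stated value 0.4743 for σ = 0.2. Block energies are assumed to be nonzero.

```python
import numpy as np

def local_structured_descriptor(hist, i, j, sigma=0.2, n_bins=18, lam_os=0.1):
    """Seven structure values for the 3x3 cell neighbourhood whose top-left cell is (i, j).

    hist: array of shape (rows, cols, bins) holding one histogram per cell.
    """
    H = hist.sum(axis=2)             # H[i, j]: sum of the cell histogram
    E = (hist ** 2).sum(axis=2)      # E[i, j]: sum of squares (cell energy)
    centre = H[i + 1, j + 1]

    def lr(r, c):
        # 2x2 block with top-left cell (r, c); each of the four blocks
        # used below contains the centre cell (i + 1, j + 1). (Eq. 1)
        return centre / E[r:r + 2, c:c + 2].sum()

    lr1, lr2 = lr(i, j), lr(i, j + 1)            # assumed layout: top-left, top-right
    lr3, lr4 = lr(i + 1, j), lr(i + 1, j + 1)    # assumed layout: bottom-left, bottom-right
    lam = np.sqrt(sigma * n_bins) / 4.0          # Eq. 6
    return np.array([
        lam * abs(lr1 - lr2), lam * abs(lr3 - lr4),   # LHS_1, LHS_2 (Eq. 2)
        lam * abs(lr1 - lr3), lam * abs(lr2 - lr4),   # LVS_1, LVS_2 (Eq. 3)
        lam * abs(lr1 - lr4), lam * abs(lr2 - lr3),   # LDS_1, LDS_2 (Eq. 4)
        lam_os * abs(lr1 + lr2 + lr3 + lr4),          # LOS (Eq. 5)
    ])
```

For a perfectly uniform neighbourhood all four block ratios coincide, so the six difference terms vanish and only the overall term remains; a cell with unusually high energy in one corner changes only the terms that involve its block.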
3.2. Local Structured LBP
In this subsection, we give the details of the Local Structured
Descriptor based on texture information. As shown in Figure 5,
we first compute the uniform binary pattern at each pixel; the
initial cell-structured LBP descriptor is then formulated by
trilinear interpolation. The final LSLBP consists of both the
binary pattern histogram and the Local Structured Descriptor.
The local structure coding scheme is similar to that of LSHOG.
The LSLBP is computed with a cell size of 8 × 8 to be compatible
with LSHOG. Much previous work on LBP did not use trilinear
interpolation, which is in fact very helpful for the accurate
description of histogram based features [1].
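To make the per-pixel step concrete, the sketch below computes 8-neighbour LBP codes, maps them to 59 bins (58 uniform patterns plus one shared bin for all non-uniform patterns), and accumulates one histogram per 8×8 cell. The function names, the radius-1 neighbourhood, the clockwise neighbour order and the `>=` comparison are assumptions for illustration. For brevity the histogram uses hard assignment, so the trilinear interpolation recommended in the text is deliberately left out.

```python
import numpy as np

def uniform_lbp_lookup():
    """Map each 8-bit LBP code to one of 59 bins: 58 uniform patterns + 1 shared bin."""
    lookup = np.empty(256, dtype=int)
    next_bin = 0
    for code in range(256):
        bits = [(code >> k) & 1 for k in range(8)]
        # Count 0/1 transitions around the circular 8-bit pattern.
        transitions = sum(bits[k] != bits[(k + 1) % 8] for k in range(8))
        if transitions <= 2:
            lookup[code] = next_bin
            next_bin += 1
        else:
            lookup[code] = 58        # all non-uniform patterns share the last bin
    return lookup

def lbp_cell_histograms(image, cell=8):
    """59-bin uniform LBP histogram per cell (hard assignment, border pixels dropped)."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    centre = image[1:-1, 1:-1]
    # Eight neighbours at radius 1, visited clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=int)
    for k, (dy, dx) in enumerate(offsets):
        neighbour = image[1 + dy:rows - 1 + dy, 1 + dx:cols - 1 + dx]
        codes |= (neighbour >= centre).astype(int) << k
    bins = uniform_lbp_lookup()[codes]

    h, w = bins.shape[0] // cell, bins.shape[1] // cell
    hist = np.zeros((h, w, 59))
    for i in range(h):
        for j in range(w):
            block = bins[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            hist[i, j] = np.bincount(block.ravel(), minlength=59)
    return hist
```

Replacing the hard `np.bincount` vote with votes shared between neighbouring cells would be the natural place to add the interpolation.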
Similar to LSHOG, we capture the local structure
through texture information via LHS, LVS, LDS and LOS.
In LSLBP, each cell’s energy is computed from the sum of
squares of binary patterns histogram. That is,
E
i,j
=
59
p=1
h
2
p
(7)
where $h$ is the histogram of binary patterns and $h_p$ denotes its $p$-th entry. In this way, the LSLBP captures the local structure from the aspect of texture, which is complementary to LSHOG.
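The cell energy of Eq. 7 is simple enough to state directly in code. The sketch below assumes each cell stores its 59-bin histogram as a list of floats and that the cells are arranged in a rows × cols grid; both the data layout and the function names are hypothetical.

```python
def cell_energy(hist):
    """Eq. 7: sum of squared histogram entries of one cell."""
    return sum(h * h for h in hist)


def energy_map(cell_histograms):
    """Apply Eq. 7 to every cell of a rows x cols grid of histograms."""
    return [[cell_energy(h) for h in row] for row in cell_histograms]


# Hypothetical 1 x 2 grid of 59-bin histograms.
flat = [0.1] * 59
peaked = [0.0] * 58 + [0.3]
print(energy_map([[flat, peaked]]))
```

The resulting energy map plays the same role for LSLBP that the gradient energy plays for LSHOG: LHS, LVS, LDS and LOS are then computed from the energies of neighbouring regions.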
By the construction of its coding scheme, LBP is invariant to linear illumination changes. In the non-linear case, some LBP values tend to become too large while others do not. To reduce the possible negative effect of these non-linear changes, we clip the entries of uniform patterns at 0.2. In particular, the entry of the non-uniform pattern is often much larger than those of the uniform patterns, so we empirically limit its maximum value to 0.3. The normalization factors $\lambda$ and $\lambda'$ are set by the same scheme as in LSHOG.
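The two truncation thresholds can be sketched as follows. The caps 0.2 and 0.3 are the values given above; storing the non-uniform pattern in the last of the 59 bins (index 58) is an assumption about the histogram layout, and the function name is hypothetical.

```python
def truncate_lbp_histogram(hist, uniform_cap=0.2, non_uniform_cap=0.3,
                           non_uniform_index=58):
    """Clip uniform-pattern entries at 0.2 and the non-uniform entry at 0.3."""
    return [
        min(h, non_uniform_cap if p == non_uniform_index else uniform_cap)
        for p, h in enumerate(hist)
    ]


# Hypothetical normalized histogram with two oversized entries.
hist = [0.05] * 59
hist[3] = 0.5    # a uniform pattern inflated by a non-linear change
hist[58] = 0.9   # the non-uniform bin
clipped = truncate_lbp_histogram(hist)
print(clipped[3], clipped[58], clipped[0])
```

Entries below their cap pass through unchanged, so the truncation only affects the few bins that dominate the histogram.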
3.3. Learning feature and training detector
In this subsection, we address the problem of combining LSHOG and LSLBP and of training a part based model with the learnt LSHOG-LSLBP feature. This work differs from [25], in which a rigid template model is trained for human detection using concatenated basic HOG-LBP.
Before giving the details of feature learning, we first formulate the combination of multiple features in general terms.
Fusion problem. Let us denote the training samples as $\{(x_i, y_i)\}_{i=1,\dots,N}$, where $x_i \in X$ is the training image and $y_i \in \{+1, -1\}$ is the corresponding class label. We can extract different types of features such as LSHOG, LSLBP, etc., which are denoted by $f_i^l \in F$ $(i = 1,\dots,N;\ l = 1,\dots,M)$, where $f_i^l$ denotes the $l$-th feature extracted from sample $x_i$, $N$ is the number of training samples and $M$ is the total number of feature types. Therefore, the feature combination could be formulated as a learning problem:
$$g : \alpha_1 T_1(f^1) + \dots + \alpha_l T_l(f^l) + \dots + \alpha_M T_M(f^M) \longrightarrow (-1, +1) \tag{8}$$
where $T_l$ is the transformation function of the $l$-th feature and $\alpha_l$ is the corresponding weight. $g$ is the optimization function.
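To make the form of Eq. 8 concrete, the sketch below evaluates the weighted sum for one sample. The transformation functions are taken to be simple dot products and the decision is the sign of the fused score; both choices, as well as all names and numbers, are illustrative assumptions rather than the transformations learnt in this paper.

```python
from typing import Callable, Sequence

Feature = Sequence[float]
Transform = Callable[[Feature], float]


def linear(weights: Sequence[float]) -> Transform:
    """Hypothetical T_l: a dot product with fixed weights."""
    return lambda f: sum(w * x for w, x in zip(weights, f))


def fuse(features: Sequence[Feature], transforms: Sequence[Transform],
         alphas: Sequence[float]) -> float:
    """Weighted sum alpha_1 T_1(f^1) + ... + alpha_M T_M(f^M) of Eq. 8."""
    if not (len(features) == len(transforms) == len(alphas)):
        raise ValueError("need one transform and one weight per feature type")
    return sum(a * t(f) for a, t, f in zip(alphas, transforms, features))


def classify(score: float) -> int:
    """Map the fused score to a label in {-1, +1}."""
    return 1 if score >= 0 else -1


# Two hypothetical feature types for one sample (M = 2).
features = [[1.0, 2.0], [3.0]]
transforms = [linear([0.5, 0.5]), linear([-1.0])]
alphas = [2.0, 0.5]
score = fuse(features, transforms, alphas)
print(score, classify(score))
```

Naïve combination, MKL and Boosting differ in how $T_l$ and $\alpha_l$ are obtained; the evaluation step has the same additive shape in each case.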
Many popular methods have been proposed to tackle the feature combination problem, including Multiple Kernel Learning (MKL) [23, 24], Boosting [11] and subspace learning [12]. These methods can be roughly divided into two categories: those operating at the basic feature level and those operating in a feature subspace. In this paper, we mainly investigate methods at the feature level, including naïve combination, MKL and Boosting.
For the above three combination schemes, we take a unified approach to learning the feature and training the part based object detector with the learnt feature. The whole framework includes two stages: 1) Feature learning stage; 2) Part based model training stage.
Feature learning stage. The goal of this paper is to train an LSHOG-LSLBP based part based detector. Hence, the key problem is how to learn the feature for the part models. In this work, we use the star-structured part based model [8], and the inference for a detection window under the part based model can be summarized as,
$$\mathrm{score}_{\mathrm{subwindow}} = sr + \sum_{i=1}^{N} sp_i - \sum_{i=1}^{N} dc_i \tag{9}$$
where $sr$ is the root score (the rigid template model is analogous to the root model here), $sp_i$ is the score of the $i$-th part filter, $dc_i$ is the deformation cost of the $i$-th part filter and $N$ is the number of parts. In the star-structured part based model, the part models are initialized from the root model. Therefore, we can perform feature selection on the root feature only. In the stage of training the part based object detector, we use the learnt feature to initialize both the root model and the part models. An important advantage of this approach is that the learning procedure does not need to know the sizes of the part models.
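The window score of Eq. 9 can be written out as a short sketch, assuming the root score, part scores and deformation costs have already been computed for one candidate placement. The function name and the numbers are hypothetical; the search over part placements performed by the actual detector is not shown.

```python
def subwindow_score(root_score, part_scores, deformation_costs):
    """Eq. 9: root score plus part scores minus deformation costs."""
    if len(part_scores) != len(deformation_costs):
        raise ValueError("each part needs exactly one deformation cost")
    return root_score + sum(part_scores) - sum(deformation_costs)


# Hypothetical values for a model with N = 2 parts.
print(subwindow_score(1.0, [0.5, 0.25], [0.125, 0.125]))
```

A part that matches well but sits far from its anchor position contributes a high $sp_i$ and a high $dc_i$, so the two sums trade appearance against geometry.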
Because the part based model is built on dense cell-structured features (LSHOG, LSLBP, etc.), there are still two strategies for learning the feature from the root: one learns from the features of each cell; the other learns from the features within the whole detection window. Because our objective is to optimize and classify features from the whole detection window rather than from each cell, we adopt the latter strategy, i.e., learning the feature from the detection window. In addition, to train a part model with multiple components [8] split by aspect ratio, the feature learning procedure is performed for each component.
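As one possible illustration of boosting used for feature selection over a whole detection window, the sketch below runs discrete AdaBoost with threshold stumps on window-level feature vectors and records which dimensions are picked. This section does not specify the boosting variant or the weak learner, so discrete AdaBoost with stumps is a stand-in chosen for brevity, and all names and data are hypothetical.

```python
import math


def stump_output(x, dim, thr, pol):
    """Weak learner: +pol when x[dim] >= thr, otherwise -pol."""
    return pol if x[dim] >= thr else -pol


def best_stump(X, y, w):
    """Return (error, dim, thr, pol) of the lowest weighted-error stump."""
    best = (float("inf"), 0, 0.0, 1)
    for dim in range(len(X[0])):
        for thr in sorted({x[dim] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_output(xi, dim, thr, pol) != yi)
                if err < best[0]:
                    best = (err, dim, thr, pol)
    return best


def adaboost_select(X, y, rounds):
    """Discrete AdaBoost; each round selects one feature dimension."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, dim, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, dim, thr, pol))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_output(xi, dim, thr, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def boosted_predict(model, x):
    """Sign of the weighted vote of all selected stumps."""
    s = sum(a * stump_output(x, d, t, p) for a, d, t, p in model)
    return 1 if s >= 0 else -1


# Hypothetical window-level features: only dimension 1 separates the classes.
X = [[0.3, 0.9, 0.1], [0.7, 0.8, 0.4], [0.5, 0.1, 0.2], [0.6, 0.2, 0.3]]
y = [1, 1, -1, -1]
model = adaboost_select(X, y, rounds=2)
print(sorted({dim for _, dim, _, _ in model}))
```

The selected dimensions and their weights together form a learnt feature in the sense used here: a subset of the concatenated window descriptor with a weight per selected entry.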
Part based model training stage. First, we use the learnt feature to initialize the root model. The part models are then initialized from the root model. Latent SVM [6, 7, 8] is used to train the part models iteratively. The whole procedure is summarized in Algorithm 1.
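The control flow of the two stages described above can be sketched as a skeleton in which every learning step is an injected callable. None of the callables below corresponds to a real library function: `learn_feature`, `init_root`, `init_part` and `retrain` are hypothetical placeholders for MKL or Boosting, root initialization, part initialization and one latent SVM update, and the stubs in the usage example only exercise the loop structure.

```python
def train_part_based_detector(components, learn_feature, init_root,
                              init_part, retrain, n_parts, n_iters):
    """Two-stage training: learn one feature per component, then train."""
    # Stage 1: feature learning from positive and negative root features.
    learnt = [learn_feature(c["PF"], c["NF"]) for c in components]

    # Stage 2: initialize root and parts from the learnt feature, then iterate.
    models = []
    for feature in learnt:
        root = init_root(feature)
        parts = [init_part(root, j) for j in range(n_parts)]
        model = {"root": root, "parts": parts}
        for _ in range(n_iters):
            model = retrain(model)
        models.append(model)
    return models


# Stub callables, for illustration only.
models = train_part_based_detector(
    components=[{"PF": [[1.0]], "NF": [[0.0]]},
                {"PF": [[2.0]], "NF": [[0.0]]}],
    learn_feature=lambda pf, nf: {"n_pos": len(pf), "n_neg": len(nf)},
    init_root=lambda feature: {"feature": feature},
    init_part=lambda root, j: {"index": j, "source": root},
    retrain=lambda m: {**m, "rounds": m.get("rounds", 0) + 1},
    n_parts=6,
    n_iters=3,
)
print(len(models), len(models[0]["parts"]), models[0]["rounds"])
```

Because the parts are derived from the root inside the second stage, the first stage never needs the part sizes, which matches the advantage noted for learning on the root feature only.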
Algorithm 1: Learning feature and training object detector.
    Learnt feature LF_i := Ø;
    for component i := 1 to N do
        PF: extract positive features from the i-th root;
        NF: random sampling from negative samples;
        Learn feature (MKL, Boosting, etc.) from PF, NF;
        Add learnt feature to LF_i;
    end
    Train part based object detector:
    for component i := 1 to N do
        Initialize the i-th root from LF_i;
        for part j := 1 to N_part do
            Initialize the j-th part from the i-th root;
        end
        for iteration k := 1 to K_iter do
            Update models and retrain;
        end
    end

4. Experiments
We evaluate the proposed method on the challenging PASCAL VOC datasets [14], which are widely acknowledged as difficult benchmark datasets for object localization. The PASCAL VOC datasets contain 20 object classes covering person, vehicles (e.g., car, bus), household objects (e.g., chair, sofa) and animals (e.g., cat, horse) [14]. The criterion adopted in the VOC challenge is Average Precision (AP). Our method achieves state-of-the-art results on the PASCAL VOC datasets compared with other related methods.
Experiments are conducted in four groups: 1) single-LSHOG experiments designed to validate the effectiveness of the Local Structured Descriptor; 2) single-LSLBP experiments designed to validate the effectiveness of trilinear interpolation, truncation and the Local Structured Descriptor; 3) comparison experiments with different combination schemes; 4) the full results of the proposed boosted Local Structured HOG-LBP based object detector on PASCAL VOC2007.
Several versions of latent SVM have been released on Felzenszwalb's homepage. To avoid confusion, we refer to voc-release3.1 [6] as V3 and voc-release4 [7] as V4 for short. The latent SVM from V4 is adopted only in the full experiments on the PASCAL VOC datasets, and the latent SVM from V3 is used in all other experiments. Using both versions in this way serves to verify the stability of the proposed method.
4.1. Localization results with LSHOG
To validate the proposed LSHOG, we train a person detector using LSHOG on the PASCAL VOC2007 dataset with the latent SVM from V3. We achieve an AP score of 37.4% on person, an improvement of 1.2 percentage points over the 36.2% of V3. We also run comparison experiments on the aeroplane and dog categories, which were randomly chosen from the 20 classes. The results are presented in Figure 6, from which we can see that the improvement is promising.
These results validate that the Local Structured Descriptor can effectively capture more structural information and improve the detection performance. It should be highlighted that this simple coding scheme can easily be extended to other pixel based histogram features.