RESEARCH Open Access
Embedded tracking algorithm based on multi-feature crowd fusion and visual object compression
Zheng Wenyi^{1,2*} and Dong Decun^{1}
Abstract
Low accuracy and poor real-time performance for moving objects in dynamic, complex environments have become the bottleneck of target location and tracking. In order to improve positioning accuracy and the quality of the tracking service, we propose an embedded tracking algorithm based on multi-feature fusion and visual object compression. On the one hand, according to the features of the target, the optimal feature-matching method is selected, and a multi-feature crowd fusion location model is proposed. On the other hand, to reduce the dimension of the multidimensional space composed of the moving object's visual frames and to compress the visual object, an embedded tracking algorithm is established. Experimental results show that the proposed tracking algorithm has high precision, low energy consumption, and low delay.
Keywords: Visual object compression, Embedded tracking, Crowd fusion, Multi-feature systems
1 Introduction
Moving target tracking is one of the most active areas in the development of science and technology, and target tracking algorithms have been widely valued by countries around the world [1]. With continuous improvement and expansion of their performance, location and tracking algorithms have been successfully applied in industry, agriculture, health care, and the service industry [2], and have also proved their worth in dangerous situations such as urban security, defense, and space exploration [3].
A low-complexity and high-accuracy algorithm was presented in [4] for reducing the computational load of the traditional data-fusion algorithm with heterogeneous observations for location tracking. Trogh et al. [5] presented a radio frequency-based location tracking system, which could improve performance by eliminating shadowing. In [6], Liang and Krause proposed a proof-of-concept system based on a sensor fusion approach, built with considerations for lower cost and higher mobility, deployability, and portability, by combining the drift velocities of anchor nodes. The scheme of [7] could estimate the drift velocity of the tracked node by using the spatial correlation of ocean currents. A distributed multi-human location algorithm was researched by Yang et al. [8] for a binary piezoelectric infrared sensor tracking system.
A model-based approach was presented in [9], which is used to predict the geometric structure of an object using its visual hull. A task-dependent codebook compression framework was proposed to learn a compression function and adapt the codebook compression [10]. Ji et al. [11] proposed a novel compact bag-of-patterns descriptor with an application to low bit rate mobile landmark search. In [12], blocks lying in object regions were flagged as compression blocks, and an object tree was added to each coding tree unit to describe the object's shape.
However, guaranteeing high-precision tracking of moving objects in dynamic, complex environments while keeping the complexity of the optimization algorithm low remains one of the most difficult problems. Based on the results of the above research, an embedded tracking algorithm based on multi-feature crowd fusion and visual object compression is proposed for mobile object tracking.
The rest of the paper is organized as follows. Section 2 describes the location model based on multi-feature crowd fusion. In Section 3, we show the embedded tracking algorithm for visual object compression. We analyze and evaluate the proposed scheme in Section 4. Finally, we conclude the paper in Section 5.

* Correspondence: weyzheng@sina.com
1 The Key Laboratory of Road and Traffic Engineering, Ministry of Education, Tongji University, Shanghai 201804, China
2 School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, Jiangsu, China

Wenyi and Decun EURASIP Journal on Embedded Systems (2016) 2016:16
DOI 10.1186/s13639-016-0053-7
2 Location model based on multi-feature crowd
fusion
Generally, target location is divided into two stages. In the first stage, the features of the moving target are extracted. Feature extraction is completed with the following steps.

(1) Capture the moving target image frame sequence.
(2) Extract the characteristics of the real-time target image frames.
(3) Between the current image frame and the still image frame of the target, search for the image frame whose extracted features are most similar to the target motion characteristics.
The second stage matches the characteristics of the moving target: different features are chosen according to the characteristics of the target, and the best feature-matching scheme is selected.
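The two-stage scheme above can be sketched as follows. This is a minimal illustration: the feature definitions, function names, and frame data are assumptions, not the paper's actual descriptors.

```python
def extract_features(frame):
    # Hypothetical stand-ins for the paper's four single features:
    # deformation (C_def), light (C_lgf), size (C_siz), color (C_col).
    flat = [v for row in frame for v in row]
    mean = sum(flat) / len(flat)
    return {
        "C_lgf": mean,                          # light: mean intensity
        "C_siz": float(len(flat)),              # size: pixel count
        "C_col": float(max(flat) - min(flat)),  # color: intensity spread
        "C_def": sum(abs(v - mean) for v in flat) / len(flat),  # deformation
    }

def best_matching_feature(target_feats, frame_feats):
    # Stage two: choose the single feature whose value is closest
    # to the target's, i.e. select the best feature-matching scheme.
    return min(target_feats, key=lambda k: abs(target_feats[k] - frame_feats[k]))

target = extract_features([[10, 12], [11, 13]])
candidate = extract_features([[10, 12], [11, 14]])
best = best_matching_feature(target, candidate)
```

In a real system each entry would be a full descriptor rather than a scalar, but the selection logic, picking the feature with the smallest matching distance, is the same.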
The above localization scheme has the following defects:
(1) The extracted feature is single. Such feature extraction makes it difficult to locate complex moving objects with multiple states.
(2) The features of the moving object change in complex scenes, e.g., $C_{\mathrm{def}}$ (deformation), $C_{\mathrm{lgf}}$ (light), $C_{\mathrm{siz}}$ (size), and $C_{\mathrm{col}}$ (color), making the single-feature matching success rate $SR_{\mathrm{FM}}$ very low, as shown in formula (1).
$$
\left\{
\begin{aligned}
f(\mathrm{IF}_i) &= \sum_{i=1}^{N} G_i \, h\!\left(\mathrm{if},\, h_{i-1},\, \sum_{j=1}^{i} h_j\right) \\
h\!\left(\mathrm{if}(C_{\mathrm{def}}, C_{\mathrm{lgf}}),\, h_{i-1},\, \sum_{j=1}^{i} h_j\right) &= \mathrm{if}_i(C_{\mathrm{siz}}, C_{\mathrm{col}}) - \alpha \left\| h_{i-1} \right\| \left(\alpha,\, \sum_{j=1}^{i} h_j\right) \\
SR_{\mathrm{FM}} &= \sum_{i=1}^{M} f(\mathrm{if}_i, h_i) \Big/ \sum_{j=1}^{N} f(\mathrm{if}_j) \;\leq\; \frac{f(\mathrm{if}_M, h_M)}{N}
\end{aligned}
\right.
\qquad (1)
$$
Here, if represents an image frame, and IF is the image frame sequence. G is the vector representation of the image frame matrix. h is the function used to measure the similarity between image frame features. N is the length of the captured moving target image frame sequence, and M is the number of image frames whose features are matched.
From formula (1), it is found that the upper bound of the matching success rate is $f(\mathrm{if}_M, h_M)/N$. But the success rate is inversely proportional to the number of captured image frames. This conclusion shows that the number of captured image frames restricts single-feature matching.
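The inverse relation between the captured-frame count N and the single-feature matching success rate can be illustrated with a toy computation; the function name and the per-frame scores are hypothetical, a simplified reading of formula (1):

```python
def matching_success_rate(all_scores, matched_scores):
    # SR_FM as a ratio of matched-frame scores to all captured-frame
    # scores (simplified reading of formula (1)).
    return sum(matched_scores) / sum(all_scores)

# With a fixed budget of M = 3 matched frames, the rate drops as the
# number of captured frames N grows (scores are hypothetical).
rate_small = matching_success_rate([1.0] * 10, [1.0] * 3)   # N = 10
rate_large = matching_success_rate([1.0] * 100, [1.0] * 3)  # N = 100
```

Here rate_small is 0.3 while rate_large is only 0.03, matching the observation that more captured frames lower the single-feature success rate.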
(3) The real-time accuracy and robustness of target motion tracking are poor, as shown in formula (2). In order to improve the accuracy and robustness, a single feature set can track the feature series, but the complexity of the transition algorithm is then too high, as shown in formula (3).
$$
\left\{
\begin{aligned}
A_{\mathrm{TR}} &= \frac{1}{N}\sum_{i=1}^{M} \sin(\beta - \alpha_i) + \sum_{i=1}^{M} h_i \sqrt{\alpha} \\
RUS_{\mathrm{TR}} &= \rho\, f(\mathrm{IF}_M)\, SR_{\mathrm{FM}}
\end{aligned}
\right.
\qquad (2)
$$
Here, $A_{\mathrm{TR}}$ indicates the positioning accuracy, and $RUS_{\mathrm{TR}}$ indicates the location robustness. $\beta$ is the included angle between adjacent image frames, and $\rho$ denotes the error vector.
$$
\left\{
\begin{aligned}
CLE_{\mathrm{TSA}} &= \left| \mathrm{IF}_i - \rho \sin\beta \right| \;\leq\; M\, g(h_i, \alpha) \\
g(h_i, \alpha) &\approx \sum_{i=1}^{M} h_i\, \mathrm{if}(C_{\mathrm{def}}, C_{\mathrm{lgf}}, C_{\mathrm{siz}}, C_{\mathrm{col}}) \;\geq\; M\, \mathrm{if}_M(C_{\mathrm{def}}, C_{\mathrm{lgf}}, C_{\mathrm{siz}}, C_{\mathrm{col}})
\end{aligned}
\right.
\qquad (3)
$$
Here, $CLE_{\mathrm{TSA}}$ represents the complexity of the transition algorithm, and the function $g(h_i, \alpha)$ represents the transition algorithm. It can be found that the complexity is proportional to the number of image frames and the degree of feature matching. This shows that more space, time, and computation must be spent in order to match more features across the image frames.
In order to solve the above problems, we propose a multi-feature crowd fusion location model. The model analyzes the dynamic motion of the target, the moving track, and the structure parameters of the image frame. The state characteristics of different targets are captured, and a vector composed of multiple features is constructed, as shown in formula (4). This vector integrates the characteristics of the motion state with deformation, light, size, and color and can effectively improve the low matching success rate of single-feature extraction, as shown in formula (5).
$$
\left\{
\begin{aligned}
ML_F &= \begin{bmatrix} v_{\mathrm{mot}_1} f_{11} & \cdots & v_{\mathrm{mot}_1} f_{1L} \\ \vdots & \ddots & \vdots \\ v_{\mathrm{mot}_K} f_{K1} & \cdots & v_{\mathrm{mot}_K} f_{KL} \end{bmatrix} \\
v_{\mathrm{mot}} &= \frac{M}{N} \sum_{i=1}^{N-M} f(\mathrm{if}_i, h_i)
\end{aligned}
\right.
\qquad (4)
$$
Here, $v_{\mathrm{mot}}$ is the target motion trajectory fitting function. K is the number of time-series features, and L is the number of spatial-sequence features.
$$
SR_{\mathrm{FM}} =
\begin{cases}
\dfrac{\alpha \cdot \mathrm{rank}\{ML_F\}}{\tan\beta}, & \beta < \alpha \ \text{or}\ \beta > \rho \\[4pt]
1, & \alpha \leq \beta \leq \rho
\end{cases}
\qquad (5)
$$
Here, $\mathrm{rank}\{ML_F\}$ is the rank of the multi-feature vector. From formula (5), we can see that a high matching success rate can be guaranteed as long as the multiple feature vectors are solved correctly.
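The piecewise success rate of formula (5) can be sketched as follows; the function name and all parameter values are hypothetical, chosen only to exercise both branches.

```python
import math

def sr_fm(alpha, beta, rho, rank_mlf):
    # Piecewise success rate of formula (5): saturates at 1 when the
    # inter-frame angle beta lies in [alpha, rho]; otherwise it scales
    # with the rank of the multi-feature vector ML_F.
    if alpha <= beta <= rho:
        return 1.0
    return alpha * rank_mlf / math.tan(beta)

full = sr_fm(alpha=0.2, beta=0.5, rho=1.0, rank_mlf=4)     # angle in range
partial = sr_fm(alpha=0.2, beta=1.2, rho=1.0, rank_mlf=4)  # angle out of range
```

The in-range angle yields the saturated rate of 1, while the out-of-range angle yields a reduced rate that grows with the rank of the multi-feature vector.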
In order to further reduce the complexity and improve the accuracy and reliability of multi-feature matching, the model incorporates a multi-feature fusion mechanism based on crowd feature analysis. The combination of the multi-feature vector and the target motion curve is shown in Fig. 1. In the crowd analysis, the curve and the multiple features are relatively independent, as are the arc and the multiple features. Through crowd analysis, the multi-feature vectors are optimized so that the features in the vector are not mutually exclusive, which improves the performance of multi-feature fusion and reduces the amount of fusion operations, as shown in formula (6).
$$
\left\{
\begin{aligned}
fu_{\mathrm{comp}} &= \frac{1}{M^2} \sum_{i=1}^{K} \sum_{j=1}^{L} G(i,j)\, ML_F(i,j) \\
G(i,j) &= \frac{f(\mathrm{IF}_i)\, f(\mathrm{IF}_j)}{h\!\left(f,\, ML_{F_i},\, ML_{F_j}\right)}
\end{aligned}
\right.
\qquad (6)
$$
In summary, the multi-feature crowd fusion algorithm is shown in Fig. 2.
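The fusion operation of formula (6) reduces to a weighted average over the multi-feature vector. A toy sketch, where the multi-feature vector, the crowd-similarity weights G(i, j), and M are all hypothetical values:

```python
def crowd_fusion(mlf, G, M):
    # fu_comp = (1/M^2) * sum_i sum_j G(i,j) * ML_F(i,j), as in formula (6).
    total = 0.0
    for i in range(len(mlf)):
        for j in range(len(mlf[0])):
            total += G[i][j] * mlf[i][j]
    return total / (M * M)

mlf = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical 2x2 multi-feature vector ML_F
G = [[1.0, 0.5], [0.5, 1.0]]    # hypothetical crowd-similarity weights G(i,j)
fu = crowd_fusion(mlf, G, M=2)
```

Because the weights G(i, j) are precomputed from the crowd analysis, the fusion itself is a single weighted sum, which is where the reduction in fusion operations comes from.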
3 Embedded tracking algorithm for visual object
compression
The visual frame of the moving object constitutes the
multidimensional space of Q dimension. The visual
frame in this space is denoted as $x_Q$.

Fig. 1 Crowd fusion between multiple features
Fig. 2 Multi-feature crowd fusion location algorithm and workflow (a motion object passes through feature extraction of the single features C_def, C_lgf, C_siz, and C_col, the multi-feature vector ML_F, and the crowd analyzing model to the fusion step, which outputs SR_FM, A_TR, and fu_comp)

The visual frame of the internal elements is used to form a visual matrix
VM. The elements of the visual matrix are subject to interference from the multidimensional space, so their information is easily distorted. In order to solve this problem, the visual matrix VM can be compressed. The visual frame of the L-dimensional space tracking system is shown in Fig. 3. Here, the VM is selected as the object of the center visual frame VF, perpendicular to the coordinate axes of the L-dimensional space. The angles to the vertical line are denoted as $\theta_1, \dots, \theta_Q$. MO represents the moving target. In the moving process of objects, $\varphi$ is the angle between the VF point and the motion direction. The VF point can collect visual targets with different degrees, and these targets are used to update the elements of the visual matrix VM. The fusion results of VF and VM are mapped into the MO plane. The compression matrix must satisfy the omni-directional low-dimensional characteristics, as shown in formula (7). After the matrix is compressed, the distribution characteristics of the visual frame must be satisfied, as shown in formula (8).
$$
\left\{
\begin{aligned}
F(\theta, \varphi) &= \frac{\| VM \|_H \sin\theta \cos\varphi}{2d} \\
D_{\mathrm{eff}} &= \left\lceil \frac{d}{2\pi Q} \sum_{i=1}^{Q} \frac{\sin\theta_i \cos\varphi}{\theta_i^{\min}} \right\rceil
\end{aligned}
\right.
\qquad (7)
$$
Here, $F(\theta, \varphi)$ represents the direction function of visual frames in the L-dimensional space, and $d$ represents the distance between the VF point and the origin of the L-dimensional space. $D_{\mathrm{eff}}$ represents the effective dimension of the visual tracking space. This value is obviously less than L, which can effectively reduce the dimension and improve the compression efficiency of the visual frame.
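A minimal sketch of one reading of formula (7): the effective dimension is the ceiling of the averaged directional response over the Q axis angles. The function name and sample angles are assumptions, and using $\theta_i$ itself in place of $\theta_i^{\min}$ is a stand-in, not the paper's definition.

```python
import math

def effective_dimension(d, Q, thetas, phi):
    # One reading of formula (7): ceiling of the averaged directional
    # response over the Q axis angles. theta_i itself stands in for the
    # garbled theta_i^min term (an assumption, not the paper's definition).
    s = sum(math.sin(t) * math.cos(phi) / t for t in thetas)
    return math.ceil(d / (2 * math.pi * Q) * s)

thetas = [0.3 * (i + 1) for i in range(8)]  # hypothetical axis angles
d_eff = effective_dimension(d=2.0, Q=8, thetas=thetas, phi=0.4)
# d_eff comes out well below Q = 8, illustrating the dimension reduction
```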
Fig. 3 L-dimensional space tracking system for visual frame
Fig. 4 Embedded tracking algorithm architecture with visual frame target compression (features in the Q-dimensional space pass through crowd fusion, location, and compression of the visual frame VF on the normal plane to embedded target object tracking)
$$
f(VF) =
\begin{cases}
\dfrac{\| VM \|_H}{D_{\mathrm{eff}}}, & \theta_1 < \varphi < \theta_{D_{\mathrm{eff}}} \\[4pt]
\dfrac{\| VM \|_H}{Q}, & \varphi = \theta_1, \cdots, \theta_Q \\[4pt]
\left\{ D_{\mathrm{eff}} \right\}_{i=1,\dots,\| VM \|_H}^{\max}, & \varphi > \theta_{D_{\mathrm{eff}}} \ \text{or}\ \varphi < \theta_1
\end{cases}
\qquad (8)
$$
Here, $f(VF)$ is the visual frame distribution density function, and $\left\{ D_{\mathrm{eff}} \right\}_{i=1,\dots,\| VM \|_H}^{\max}$ represents the largest spatial dimension in the rank of the visual matrix VM.
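The piecewise density of formula (8) can be sketched as follows. All names are illustrative, and the out-of-range branch (garbled in the source) is replaced by a simple stand-in maximum.

```python
def vf_density(phi, thetas, vm_norm, d_eff):
    # Piecewise density of formula (8); thetas holds the axis angles
    # theta_1..theta_Q and vm_norm stands for ||VM||_H.
    q = len(thetas)
    if any(abs(phi - t) < 1e-12 for t in thetas):
        return vm_norm / q            # phi coincides with an axis angle
    if thetas[0] < phi < thetas[d_eff - 1]:
        return vm_norm / d_eff        # phi inside the effective span
    return vm_norm                    # out-of-span stand-in for the max branch

thetas = [0.1, 0.4, 0.7, 1.0]         # hypothetical axis angles
inside = vf_density(0.5, thetas, vm_norm=8.0, d_eff=3)
on_axis = vf_density(0.4, thetas, vm_norm=8.0, d_eff=3)
```

Angles inside the effective span are divided by the (smaller) effective dimension, so the density there is higher than at the axis angles, which are averaged over all Q dimensions.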
After the visual matrix VM is compressed and recon-
structed by the visual frame, the tracking signal is shown
in formula (9).
$$
vf_{D_{\mathrm{eff}}} = VM\, \sin\theta \cos\varphi \sum_{i=1}^{\| VM \|_H} x_i
\qquad (9)
$$
The visual object compression method can obtain the mapping relationship between the visual frame and the moving object from the L-dimensional or $D_{\mathrm{eff}}$-dimensional space by choosing $\left\{ D_{\mathrm{eff}} \right\}_{i=1,\dots,\| VM \|_H}^{\min}$, and realize the target motion prediction. The specific steps of the visual target compression algorithm are as follows:
(1) The visual matrix migrates toward the core visual frame: the state of the visual frame $VF_C$ is $\{\theta_C, \varphi_C, D_{\mathrm{eff}\_C}\}$, which is captured by the current moving target. The visual frames propagate along the direction of $F(\theta_C, \varphi_C)$. The new state $\{\theta_U, \varphi_U, D_{\mathrm{eff}\_U}\}$ of the visual frame is obtained after spreading on the dimensional space $D_{\mathrm{eff}\_C}$.

(2) The moving object, the current state of the visual frame, and the diffusion of the visual frame form a compressed plane PCV: the compressed point set PT is formed in the plane. Any two points $PT_i$ and $PT_j$ in the plane form a visual line: the normal vector of $PT_i$ is $NV_i = \sin\theta_C \cos\varphi_U \| PT_j \|$, and the normal vector of any point $PT_j$ on the plane is $NV_j = \sin\theta_C \cos\varphi_U \| PT_i \|$.

(3) The included angle between the normal plane of the vector $NV_i$ and the normal plane of the vector $NV_j$: the relation between the plane angle and the direction arc is $\sin\gamma = \frac{NV_i NV_j}{\| PT \|} \arctan\frac{|\theta_C - \theta_U|}{|\varphi_C - \varphi_U|}$. The vector mapping relation between the plane angle and the direction field of the moving object is shown in formula (10).
$$
\left\{
\begin{aligned}
M_{\gamma} &= \begin{bmatrix} M_{PT_i} & M_{PT_j} \\ M_{VF_C} & M_{VF_U} \end{bmatrix} \\
M_{(i,j)} &= F(\sin\theta_C, \cos\varphi_U)\, \| PT_i \| \| PT_j \| \sum_{i=1}^{\| VM \|_H} x_i
\end{aligned}
\right.
\qquad (10)
$$
The mapping matrix $M_{\gamma}$ is divided into 4 submatrices. Each submatrix $M_{(i,j)}$ is obtained by solving the direction function, the compressed visual frame points, and the signal strength.
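The quadrant split described above can be sketched as follows; the function name and the sample matrix are illustrative, since the source does not specify the partition mechanics beyond "4 submatrices".

```python
def split_into_submatrices(m):
    # Split the mapping matrix M_gamma into its four quadrant
    # submatrices M_(i,j), as suggested by formula (10).
    half_r, half_c = len(m) // 2, len(m[0]) // 2
    quad = lambda rows, cols: [[m[r][c] for c in cols] for r in rows]
    top, bottom = range(half_r), range(half_r, len(m))
    left, right = range(half_c), range(half_c, len(m[0]))
    return (quad(top, left), quad(top, right),
            quad(bottom, left), quad(bottom, right))

m_gamma = [[1, 2, 3, 4],   # hypothetical 4x4 mapping matrix
           [5, 6, 7, 8],
           [9, 10, 11, 12],
           [13, 14, 15, 16]]
q11, q12, q21, q22 = split_into_submatrices(m_gamma)
```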
(4) The 4 submatrices of the moving object multi-feature fusion mapping matrix are obtained by the visual frame analysis. The tracking matrix $M_T$ is obtained through the target compression:

$$
M_T = \begin{bmatrix}
ML_F(i,j) & fu_{\mathrm{comp}} \\
SR_{\mathrm{FM}} & \mathrm{rank}\{ML_F\} \tan\gamma \\
M\, \mathrm{if}_M(C_{\mathrm{def}}, C_{\mathrm{lgf}}, C_{\mathrm{siz}}, C_{\mathrm{col}}) & \dfrac{1}{N}\sum_{i=1}^{M} \sin(\gamma - \alpha_i)
\end{bmatrix}
$$
Fig. 5 Location error (x-axis: number of visual frame samples, 200 to 800; y-axis: location error (%); curves: single feature vs. multiple features fusion)
Table 1 Delay with 1000 visual frame samples

Normal plane radian | Location delay with single feature | Location delay with multiple features and fusion
------------------- | ---------------------------------- | ------------------------------------------------
30                  | 25.7 ms                            | 1.9 ms
50                  | 89.4 ms                            | 2.0 ms
110                 | 345.2 ms                           | 1.8 ms