Journal ArticleDOI

Real-Time Modeling of 3-D Soccer Ball Trajectories From Multiple Fixed Cameras

01 Mar 2008-IEEE Transactions on Circuits and Systems for Video Technology (Institute of Electrical and Electronics Engineers)-Vol. 18, Iss: 3, pp 350-362
TL;DR: In this paper, model-based approaches for real-time 3-D soccer ball tracking are proposed, using image sequences from multiple fixed cameras as input, and incorporating motion cues and temporal hysteresis thresholding in ball detection and employing phase-specific models to estimate ball trajectories.
Abstract: In this paper, model-based approaches for real-time 3-D soccer ball tracking are proposed, using image sequences from multiple fixed cameras as input. The main challenges include filtering false alarms, tracking through missing observations, and estimating 3-D positions from single or multiple cameras. The key innovations are: 1. incorporating motion cues and temporal hysteresis thresholding in ball detection; 2. modeling each ball trajectory as curve segments in successive virtual vertical planes so that the 3-D position of the ball can be determined from a single camera view; and 3. introducing four motion phases (rolling, flying, in possession, and out of play) and employing phase-specific models to estimate ball trajectories which enables high-level semantics applied in low-level tracking. In addition, unreliable or missing ball observations are recovered using spatio-temporal constraints and temporal filtering. The system accuracy and robustness are evaluated by comparing the estimated ball positions and phases with manual ground-truth data of real soccer sequences.

Summary (6 min read)

Introduction

  • Model-based approaches for real-time 3D soccer ball tracking are proposed, using image sequences from multiple fixed cameras as input.
  • Although players can be successfully detected and tracked on the basis of color and shape [1, 10, 12], similar methods cannot be extended to ball detection and tracking for several reasons.

B. Contributions of This Work

  • A system is presented for model-based 3D ball tracking from real soccer videos.
  • The main contributions can be summarized as follows.
  • Firstly, a motion-based thresholding process along with temporal filtering is used to detect the ball; meanwhile, a probability measure is defined to capture the likelihood that any specific detected moving object represents the ball.
  • Secondly, the 3D ball motion is modeled as a series of planar curves each residing in a vertical virtual plane (VVP), which involves geometric based vision techniques for 3D ball positioning.
  • Thirdly, the ball trajectories are modeled with four phases of ball motion (rolling, flying, in-possession and out-of-play); for the first two, phase-specific models are employed to estimate ball positions along linear and parabolic trajectories, respectively.

C. Structure of the Paper

  • In Section II, the method the authors used for tracking and detecting moving objects is described, using Gaussian mixtures [17] and calibrated cameras [19].
  • In Section III, a method is presented for identifying the ball from these objects.
  • These methods operate in the image plane from each camera separately.
  • In Section IV, the data from multiple cameras is integrated, to provide a segment-based model of the ball trajectory over the entire pitch, estimating 3D ball positions from either single view or multiple views.
  • In Section V, a technique is introduced for recognizing different phases of ball motion and for applying phase-specific models for robust ball tracking; experimental results are presented in Section VI and the conclusions are drawn in Section VII.

II. MOVING OBJECTS DETECTION AND TRACKING

  • To locate and track players and the soccer ball, a multi-modal adaptive background model is utilized which provides robust foreground detection using image differencing [17].
  • This detection process is applied only to visible pitch pixels of the appropriate color.
  • Grouped foreground connected components (i.e. blobs) are tracked by a Kalman filter which estimates 2D position, velocity and object dimensions.
  • Greater detail is given in the subsections below.

A. Determining Pitch Masks

  • Rather than process the whole image, a pitch mask is developed to avoid processing pixels containing spectators.
  • This mask is the intersection of a geometry-based mask and a color-based mask; the former constrains processing to only those pixels on the pitch, and can be easily derived from a coordinate transform of the position of the pitch in the ground plane to the image plane.
  • Note, however, that parts of the pitch can be occluded by foreground spectators or parts of the stadium.
  • The hue component of the HSV color space is used to identify the region of the background image representing the pitch, since it is robust to shadows and other variations in the appearance of the grass.
  • Lower and upper hue thresholds, defined as the positions at which the histogram has decreased by 90% of the peak frequency, delimit an interval around the peak; image pixels contributing to this interval are included in the color-based mask $M_c$.

B. Detecting Moving Objects

  • Over the mask M detected above, foreground pixels are located using the robust multi-modal adaptive background model [17].
  • Firstly, an initial background image is determined by a per-pixel Gaussian Mixture Model, and then the background image is progressively updated using a running average algorithm for efficiency.
  • The distribution which matches each new pixel observation $I_k$ is updated as $\mu_k = (1-\rho)\mu_{k-1} + \rho I_k$ and $\sigma_k^2 = (1-\rho)\sigma_{k-1}^2 + \rho\,(I_k-\mu_k)^T(I_k-\mu_k)$.
  • For each unmatched distribution, the parameters remain the same but its weight decreases.
  • Inside these foreground masks, a set of foreground regions are generated using connected component analysis.

C. Tracking Moving Objects

  • A Kalman tracker is used in the image plane to filter noisy measurements and to split objects that merge because of the frequent occlusions of players and the ball.
  • The state $x_I$ and measurement $z_I$ follow the Kalman filter equations $x_I(k+1) = A_I x_I(k) + w_I(k)$ and $z_I(k) = H_I x_I(k) + v_I(k)$ (7), where $w_I$ and $v_I$ are the image-plane process and measurement noise, and $A_I$ and $H_I$ are the state transition and measurement matrices, respectively.
  • Further detail on the method for data association and handling of occlusions can be found in [18].

D. Computing Ground Plane Positions

  • Using Tsai's algorithm for camera calibration [19], the measurements are transformed from image co-ordinates into world co-ordinates.
  • Basically, the pin-hole model of 3D-2D perspective projection is employed in [19] to estimate a total of 11 intrinsic and extrinsic camera parameters.
  • In addition, the effective pixel dimensions are obtained in both horizontal and vertical directions as two fixed intrinsic constants.
  • (This assumption is usually true for players, but the ball could be anywhere on the line between that ground plane point and the camera position).
  • For each tracked object, position and attribute measurement vectors are defined as $p_i = [x\ y\ z\ v_x\ v_y]^T$ and $a_i = [w\ h\ a\ n]^T$.

III. DETECTING BALL-LIKE FEATURES

  • The two elementary properties to distinguish the ball from players and other false alarms are its size and color.
  • In general, a ball moving rapidly in the image plane is more likely to be positioned above the ground plane; therefore, the size threshold should be increased to accommodate the consequent over-estimation of the ball size.
  • In addition, the proportion of white color within the object is required to be no less than 30% of the whole area.
  • Candidates with a likelihood above $h_1$ are unequivocally designated a 'ball' label, and candidates with a likelihood below $h_3$ are unequivocally classified as 'not ball' (i.e. false alarms).
  • Application of the temporal filter successfully locates the ball among these various candidates.
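
To make the hysteresis step concrete, the sketch below (Python/NumPy; the thresholds and names are hypothetical, since the exact values are not published here) accepts frames above the upper threshold outright and lets in-between frames inherit a 'ball' label only when they are temporally connected to a confident detection:

```python
import numpy as np

def hysteresis_labels(likelihood, h_high=0.8, h_low=0.3):
    """Temporal hysteresis thresholding of per-frame ball likelihoods.

    Frames with likelihood >= h_high are labelled 'ball'; frames with
    likelihood <= h_low are labelled 'not ball'; uncertain frames in
    between keep the label only if they are connected in time to a
    confident 'ball' frame (a temporal analogue of Canny hysteresis).
    """
    likelihood = np.asarray(likelihood, dtype=float)
    strong = likelihood >= h_high
    weak = likelihood > h_low            # strong or uncertain candidates
    labels = strong.copy()

    # Forward pass: an uncertain frame following an accepted frame is accepted.
    for k in range(1, len(labels)):
        if weak[k] and labels[k - 1]:
            labels[k] = True
    # Backward pass: uncertain frames leading up to a confident detection.
    for k in range(len(labels) - 2, -1, -1):
        if weak[k] and labels[k + 1]:
            labels[k] = True
    return labels

# A dip in likelihood between two confident detections is bridged:
print(hysteresis_labels([0.9, 0.5, 0.4, 0.9, 0.2, 0.5]))
# [ True  True  True  True False False]
```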

IV. MODEL-BASED 3D POSITION ESTIMATION IN SINGLE AND MULTIPLE VIEWS

  • The detection results of the ball from all single views are integrated for estimation of 3D position.
  • Otherwise, the 2D image position can only provide constraints for the 3D line on which, somewhere, the ball is located.
  • After a segment-based model of the ball motion is presented, two methods are provided for determining 3D ball positions.
  • The first method is for cases in which the ball is detected from only one camera: the instant when the ball bounces on the ground is detected and the corresponding ball height is estimated as zero.
  • The second is for cases in which the ball is visible from at least two cameras, so that integration of multiple observations is used.

A. The Ball Motion Model

  • During a soccer game, the ball is moving regularly from one place to another.
  • In a special case when the ball is rolling on the ground, the curve will become a straight line.
  • The complete ball trajectory can be modeled as a sequence of adjacent planar curve segments.
  • While beyond the scope of this paper, if the ball is struck so as to impart significant spin about an axis, then it will 'swerve' in the air and the assumption that the ball travels in a vertical plane is invalid, although the 'swerve' may be approximated by several segments, each defined by a vertical plane.
  • These estimated 3D ball positions are described as fully determined estimates, in contrast to most observations, which are only determined up to a line passing through the camera focal point.

B. Fully Determined Estimates from a Single View

  • From a single camera view, the strategy adopted for determining a 3D ball position, is to detect an occasion in which the ball bounces off some other object: players, ground or goal-post.
  • If the height at which the bounce occurs can be estimated, then this height, together with its 2D image location, completely determines the 3D ball position at this time.
  • Then, the height of the ball position is estimated as zero if there are no players or other objects near the ball.
  • It can be assumed the ball is two meters off the ground plane when it strikes a player's head.
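
A minimal sketch of how a detected bounce fixes the 3-D position from a single view, assuming the camera centre and the ground-plane point of Section II-D are already available (names and numbers below are illustrative):

```python
import numpy as np

def ball_position_at_height(ground_point_xy, camera_center, height):
    """3-D ball position from a single view given the bounce height.

    ground_point_xy : (x, y) obtained by assuming the observation lies on
                      the ground plane (Section II-D).
    camera_center   : (cx, cy, cz) of the calibrated camera in world frame.
    height          : estimated ball height in metres (0 for a ground
                      bounce, roughly 2 m when it strikes a player's head).

    The true ball lies on the line joining the camera centre and the
    ground-plane point; fixing z = height selects a unique point.
    """
    g = np.array([ground_point_xy[0], ground_point_xy[1], 0.0])
    c = np.asarray(camera_center, dtype=float)
    t = height / c[2]              # fraction of the way from ground point to camera
    return g + t * (c - g)

# Example with made-up numbers: camera mounted 20 m high, header at 2 m.
print(ball_position_at_height((30.0, 12.0), (0.0, 0.0, 20.0), 2.0))
```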

C. Fully Determined Estimates from Multiple Views

  • When a ball is observed in multiple cameras, there are multiple projection lines from each camera position through the corresponding observation (which, in this application, can be terminated at the ground plane).
  • False observations may exist which will lead to incorrect solutions.
  • Some false estimates can be generated from the mis-association of the ball (in one camera) and e.g. some background clutter (from another camera).
  • When the different measurement covariances for $p_1$ and $p_2$ are considered, the distances from $b$ to $p_1$ and $p_2$ are changed into Mahalanobis distances.
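
The triangulation formulas are not spelled out here; the sketch below shows one standard construction that fuses two projection rays by taking the midpoint of their common perpendicular (an equal-weight simplification of the Mahalanobis-weighted distances mentioned above):

```python
import numpy as np

def triangulate_two_rays(c1, d1, c2, d2):
    """3-D point closest to two (non-parallel) projection rays.

    Each ray is a camera centre c and a direction d towards the observed
    ball.  Returns the midpoint of the common perpendicular and the gap
    between the rays; a large gap suggests a mis-association, e.g. the
    ball in one camera paired with background clutter in another.
    """
    c1, d1, c2, d2 = (np.asarray(v, dtype=float) for v in (c1, d1, c2, d2))
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)

    # Minimise |(c1 + s*d1) - (c2 + t*d2)|^2 over the ray parameters s, t.
    a = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    b = np.array([(c2 - c1) @ d1, (c2 - c1) @ d2])
    s, t = np.linalg.solve(a, b)
    p1, p2 = c1 + s * d1, c2 + t * d2
    return (p1 + p2) / 2.0, float(np.linalg.norm(p1 - p2))
```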

E. Estimation of Missed or Uncertain Ball Positions

  • For those frames without ball observations in any single view or with ball observations of lower likelihood, i.e. less than a given threshold, the 3D ball positions are estimated by using polynomial interpolation in a curve on the corresponding vertical planes (see Section V).
  • Each curve is calculated from two fully determined estimates.
  • If more fully determined estimates are available, then they could all be incorporated into the estimation of the trajectory based on a more general least squares estimator [25].
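
A small sketch of this recovery step, assuming the fully determined estimates of one segment are given as (time, position) pairs; with the minimum number of points the polynomial fit is an interpolation, and with more it becomes the least-squares estimate of [25] (all names are hypothetical):

```python
import numpy as np

def interpolate_missing_positions(times, points, query_times, flying=True):
    """Recover unobserved 3-D ball positions within one trajectory segment.

    times, points : fully determined estimates (t_i, [x_i, y_i, z_i]);
                    at least two for a rolling segment, three for a flying
                    segment (with only two flying-ball points the known
                    gravity fixes the quadratic term instead, see the
                    flying-ball sketch further below).
    query_times   : frame times with missing or unreliable observations.
    """
    t = np.asarray(times, dtype=float)
    p = np.asarray(points, dtype=float)          # shape (N, 3)
    q = np.asarray(query_times, dtype=float)

    # Constant ground-plane velocity: x(t) and y(t) are linear in time.
    x = np.polyval(np.polyfit(t, p[:, 0], 1), q)
    y = np.polyval(np.polyfit(t, p[:, 1], 1), q)
    # Height: straight line when rolling, parabola when flying.
    z = np.polyval(np.polyfit(t, p[:, 2], 2 if flying else 1), q)
    return np.stack([x, y, z], axis=1)
```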

A. Four Phases of Ball Motion

  • It is proposed to model the ball motion at each instant into four phases, namely rolling (R), flying (F), in-possession (P) and out-of-play (O).
  • A different tracking model is applicable to each phase, and furthermore the designation also provides a useful insight into the semantic progression of the game.
  • Though some other semantic events have been analyzed for soccer video understanding [21-24], those works focus on players' motion in a broadcasting context, and phase transitions in the ball trajectory have not been discussed.
  • This is because in-possession phases act as special periods that initialize other phases (such as rolling or flying), i.e. literally kicking the ball off in a particular direction.
  • Furthermore, the pattern of play is punctuated by periods when the ball is out-of-play, e.g. caused by fouls, ball crossing touchline, off-side or in-possession by the goal-keeper.

B. Estimating Motion Phases

  • Given observations of the ball from separate cameras and height cues obtained as described in Section IV, what follows is the estimation of the current ball phase.
  • Prior to this stage, at each frame there is at most one estimate of ball position from each of the camera views, and each estimate is assigned a measure of the likelihood that it represents the ball.
  • A 'soft' classification [26] of the four phases is introduced, which is then input into a decision process to determine the final estimate of the phase.
  • Smooth functions are chosen to provide a measure, bounded between 0 and 1, of the membership of each motion phase.
  • For each of the in-play phases, a specific model is then employed for robust trajectory estimation below.
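
The exact membership functions are not published; the sketch below only illustrates the idea with logistic functions of ball height and ground-plane speed, so the thresholds $z_f$ and $v_r$, the steepness, and the simple out-of-play rule are all assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def phase_memberships(height, speed, observed, z_f=0.5, v_r=0.5, k=8.0):
    """Soft (0..1) memberships for the four phases of ball motion.

    height, speed : estimated ball height (m) and ground-plane speed (m/s).
    observed      : whether the ball was detected in at least one view.
    z_f, v_r, k   : hypothetical soft thresholds and steepness.
    """
    flying = sigmoid(k * (height - z_f))
    moving = sigmoid(k * (speed - v_r))
    rolling = (1.0 - flying) * moving
    in_possession = (1.0 - flying) * (1.0 - moving)
    out_of_play = 0.0 if observed else 1.0

    scores = {"F": flying, "R": rolling, "P": in_possession, "O": out_of_play}
    # The final phase decision picks the largest membership.
    return max(scores, key=scores.get), scores
```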

C. Phase-specific Trajectory Estimation

  • Finally, in this section, the three different in-play models of ball motion are described, starting with the flying trajectory.
  • Disregarding air friction, the velocity parallel to the ground plane is constant and thus the ball follows a single parabolic trajectory.
  • Disregarding all friction, $x(t)$ and $y(t)$ satisfy $x(t) = x_0 + v_x t$ and $y(t) = y_0 + v_y t$, whether the ball is rolling or flying.
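
Combining the constant ground-plane velocity with gravity acting on the height gives the flying-ball model; the sketch below recovers the parabola from two fully determined 3-D points, which the paper states are sufficient (function names are illustrative):

```python
import numpy as np

G = 9.81   # gravitational acceleration (m/s^2); air friction is disregarded

def flying_trajectory(t1, p1, t2, p2):
    """Parabolic trajectory of a flying ball through two fully determined
    3-D positions p1 at time t1 and p2 at time t2.  Returns position(t)."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    dt = t2 - t1
    v_xy = (p2[:2] - p1[:2]) / dt                    # constant ground-plane velocity
    v_z = (p2[2] - p1[2] + 0.5 * G * dt ** 2) / dt   # initial vertical velocity

    def position(t):
        s = t - t1
        return np.array([p1[0] + v_xy[0] * s,
                         p1[1] + v_xy[1] * s,
                         p1[2] + v_z * s - 0.5 * G * s ** 2])

    return position

# A rolling ball uses the same construction with v_z = 0 and no gravity
# term, i.e. a straight line on the ground plane.
```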

A. System Architecture

  • The proposed system was tested on data from matches played at Fulham Football Club, U.K., during the 2001 Premiership season, captured by eight fixed cameras.
  • The 3D ball trajectory is visualized along with tracked players on a virtual playfield.
  • The multi-view tracker is responsible for orchestrating the process by which the (single-view) Feature Servers generate their feature results.
  • Eight cameras were statically mounted around the stadium as described in Figure 8.
  • The white balance was set to automatic on all cameras.

B. Data Preparation and Results

  • The proposed model has been tested on several sequences with up to 8 cameras, each sequence containing over 5500 frames.
  • Then, all the ball candidates detected from 8 sequences are integrated for multi-view tracking of the ball and 3D positioning.
  • Then, its 3D position is estimated by using multi-view geometry constraints.
  • For the frames between two GT frames, the GT positions are linearly interpolated.
  • The first is the distance (in meters) between estimated and ground-truth (GT) ball positions, in which only the 2D distance in the x-y plane is used.

C. Evaluation of Tracking Accuracy versus Latency

  • In eight testing sequences of 5500 frames each, 3D ball positions are estimated in about 3720 frames.
  • The ground-plane errors among calibration projections are estimated to be between 0.1 and 2.5 meters, depending on the distance of the ground point to the cameras.
  • Estimated ball positions are shown as a magenta trajectory.
  • From this table it can be observed that, without temporal filtering, only 34.5% of ball positions can be recovered.

D. Evaluation of Phase Transition Accuracy

  • Figure 11 illustrates a complete 3D trajectory history (ground plane projection) from frame 0 to 954 and its corresponding phase transitions.
  • Secondly, about 25% of rolling and 13% of in-possession balls are misjudged as each other, which happens when a rolling ball cannot be observed in a crowd or when an in-possession ball is rolling near the player who possesses it.
  • This misjudgment affects the accuracy of the ground-truth as well as the estimate from the proposed method.
  • Heights below $z_f$ will not be recognized correctly.
  • It is worth noting that there are some phase transitions missing from the estimated trajectory.

E. System Limitations

  • As discussed above, there are two main drawbacks in their system, in terms of tracking accuracy and phase transition accuracy, owing to severe occlusions or insufficient observations.
  • In principle, most of these problems may be resolved by adding more cameras, even capturing images from above the pitch.
  • Occlusions are still unavoidable in the soccer context, which constrains the overall recovery rate and accuracy.
  • Moreover, their system ignores air friction and cannot model some complex movements of the ball, such as the 'swerve', which may be an interesting topic for further investigation.

VII. CONCLUSIONS

  • A method has been described for real-time 3D trajectory estimation of the ball in a soccer game.
  • In the proposed system, video data is captured from multiple fixed and calibrated cameras.
  • Temporal filtering of the ball likelihood has also proved essential for robust ball detection and tracking.
  • One interesting feature of the approach is that it uses high-level phase transition information to aid low-level tracking.
  • Through recognition of the four phases, phase-specific models are successfully applied in estimating 3D position of the ball.




Abstract: In this paper, model-based approaches for
real-time 3D soccer ball tracking are proposed, using image
sequences from multiple fixed cameras as input. The main
challenges include filtering false alarms, tracking through
missing observations and estimating 3D positions from single
or multiple cameras. The key innovations are: i) incorporating
motion cues and temporal hysteresis thresholding in ball
detection; ii) modeling each ball trajectory as curve segments in
successive virtual vertical planes so that the 3D position of the
ball can be determined from a single camera view; iii)
introducing four motion phases (rolling, flying, in possession,
and out of play) and employing phase-specific models to
estimate ball trajectories which enables high-level semantics
applied in low-level tracking. In addition, unreliable or missing
ball observations are recovered using spatio-temporal
constraints and temporal filtering. The system accuracy and
robustness are evaluated by comparing the estimated ball
positions and phases with manual ground-truth data of real
soccer sequences.
Index Terms: Motion analysis, video signal processing,
geometric modeling, tracking, multiple cameras,
three-dimensional vision.
I. INTRODUCTION
With the development of computer vision and multimedia
technologies, many important applications have been
developed in automatic soccer video analysis and content-based
indexing, retrieval and visualization [1-3]. By accurately
tracking players and ball, a number of innovative applications
can be derived for automatic comprehension of sports events.
These include annotation of video content, summarization,
team strategy analysis and verification of referee decisions, as
Manuscript received Dec 20, 2005. This work was supported in part by the
European Commission under Project IST-2001-37422.
J. Ren is with School of Informatics, University of Bradford, BD7 1DP, U.K.,
on leave from the School of Computers, Northwestern Polytechnic University,
Xi'an, 710072, China (email: j.ren@bradford.ac.uk; npurjc@yahoo.com).
J. Orwell and G. A. Jones are with Digital Imaging Research Centre, Kingston
University, Surrey, KT1 2EE, U.K. (email: j.orwell@kingston.ac.uk;
g.jones@kingston.ac.uk).
M. Xu is with Signal Processing Lab, Engineering Department, Cambridge
University, CB2 1PZ, U.K. (email: mx204@cam.ac.uk).
Copyright (c) 2007 IEEE. Personal use of this material is permitted. However,
permission to use this material for any other purposes must be obtained from the
IEEE by sending an email to pubs-permissions@ieee.org.
well as the 2D or 3D reconstruction and visualization of action
[3-16]. In addition, some more recent work on tracking of
players and the ball can be also found in [27-29].
In a soccer match, the ball is invariably the focus of
attention. Although players can be successfully detected and
tracked on the basis of color and shape [1, 10, 12], similar
methods cannot be extended to ball detection and tracking for
several reasons. First, the ball is small and exhibits irregular
shape, variable size and inconsistent color when moving
rapidly, as illustrated in Figure 1. Second, the ball is frequently
occluded by players or is out of all camera fields of view (FOV),
such as when it is kicked high in the air. Finally, the ball often
leaves the ground surface, and its 3D position cannot be
uniquely determined without the measurements from at least
two cameras with overlapping fields of view. Therefore, 3D
ball position estimation and tracking is, arguably, the most
important challenge in soccer video analysis. In this paper the
problem under investigation is the automatic ball tracking from
multiple fixed cameras.
A. Related Work
Generally, TV broadcast cameras or fixed-cameras around
the stadium are the two usual sources of soccer image streams.
While TV imagery generally provides high resolution data of
the ball in the image centre, the complex camera movements
and partial views of the field make it hard to obtain accurate
camera parameters for on-field ball positioning. On the other
hand, fixed cameras are easily calibrated, but their wide-angle
field of view makes ball detection more difficult, since the ball
is often represented by only a small number of pixels.
In the soccer domain, fully automatic methods for limited
scene understanding have been proposed, e.g. recognition of
replays from cinematic features extracted from broadcast TV
data [1] and detection of the ball in broadcast TV data [1, 2,
4-9]. Gong et al adopted white color and circular shape to
detect balls in image sequences [1]. In Yow et al [2], the ball is
detected by template matching in each of the reference frames
and then tracked between each pair of these reference frames.
Seo et al applied template matching and Kalman filter to track
balls after manual initialization [4]. Tong et al [5] employed
indirect ball detection by eliminating non-ball regions using
color and shape constraints. In Yamada et al [6], white regions
Fig. 1. Ball samples in various sizes, shapes and colors.

are taken as ball candidates after removal of players and field
lines. In Yu et al [7, 8], candidate balls are first identified by
size range, color and shape, and then these candidates are
further verified by trajectory mining with a Kalman filter.
D'Orazio et al [9] detected the ball using a modified Hough
transform along with a neural classifier.
Using soccer sequences from fixed cameras, usually there
are two steps for the estimation and tracking of 3D ball
positions. Firstly, the ball is detected and tracked in each single
view independently. Then, 2D ball positions from different
camera views are integrated to obtain 3D positions using
known motion models [10-12]. Ohno et al arranged eight
cameras to attain a full view of the pitch [10]. They modeled the
3D ball trajectory by considering air friction and gravity which
depend on an unsolved initial velocity. Matsumoto et al [11]
used four cameras in their optimized viewpoint determination
system, in which template matching is also applied for ball
detection. Bebie and Bieri [12] employed two cameras for
soccer game reconstruction, and modeled 3D trajectory
segments by Hermite spline curves. However, about one-fifth of
the ball positions need to be set manually before estimation. In
Kim et al [13] and Reid and North [14], reference players and
shadows were utilized in the estimation of 3D ball positions.
These are unlikely to be robust as the shadow positions depend
more on light source positions than on camera projections.
B. Contributions of This Work
In this paper, a system is presented for model-based 3D ball
tracking from real soccer videos. The main contributions can
be summarized as follows.
Firstly, a motion-based thresholding process along with
temporal filtering is used to detect the ball, which has proved to
be robust to the inevitable variations in ball color and size that
result from its rapid movement. Meanwhile, a probability
measure is defined to capture the likelihood that any specific
detected moving object represents the ball.
Secondly, the 3D ball motion is modeled as a series of planar
curves each residing in a vertical virtual plane (VVP), which
involves geometric based vision techniques for 3D ball
positioning. To determine each vertical plane, at least two
observed positions of the ball with reliable height estimate are
required. These reliable estimates are obtained by either
recognizing a bouncing on the ground from single view, or
triangulating from multiple views. Based on these VVPs, the
3D ball positions are determined in single camera views by
projections. Ball positions for frames without any valid
observations are easily estimated by polynomial interpolation
to allow a continuous 3D ball trajectory to be generated.
Thirdly, the ball trajectories are modeled as one of four
phases of ball motion: rolling, flying, in-possession and
out-of-play. These phase types were chosen because they each
require different models in trajectory recovery. For the first two
types, phase-specific models are employed to estimate ball
positions in linear and parabolic trajectories, respectively. It is
shown how two 3D points are sufficient to estimate the
parabolic trajectory of a flying ball. In addition, the transitions
from one phase to another also provide useful semantic insight
into the progression of the game, i.e. they coincide with the
passes, kicks etc. that constitute the play.
C. Structure of the Paper
The remaining part of the paper is organized as follows. In
Section II, the method we used for tracking and detecting
moving objects is described, using Gaussian mixtures [17] and
calibrated cameras [19]. In Section III, a method is presented
for identifying the ball from these objects. These methods
operate in the image plane from each camera separately. In
Section IV, the data from multiple cameras is integrated, to
provide a segment-based model of the ball trajectory over the
entire pitch, estimating 3D ball positions from either single
view or multiple views. In Section V, a technique is introduced
for recognizing different phases of ball motion, and for
applying phase-specific models for robust ball tracking.
Experimental results are presented in Section VI and the
conclusions are drawn in Section VII.
II. MOVING OBJECTS DETECTION AND TRACKING
To locate and track players and the soccer ball, a
multi-modal adaptive background model is utilized which
provides robust foreground detection using image differencing
[17]. This detection process is applied only to visible pitch
pixels of the appropriate color. Grouped foreground connected-
components (i.e. blobs) are tracked by a Kalman filter which
estimates 2D position, velocity and object dimensions. These
2D positions and dimensions are converted to 3D coordinates
on the pitch. Greater detail is given in the subsections below.
A. Determining Pitch Masks
Rather than process the whole image, a pitch mask is
developed to avoid processing pixels containing spectators.
This mask is defined as the intersection of the geometry-based mask $M_g$ and the color-based mask $M_c$, as shown in Figure 2. The former constrains processing to only those pixels on the pitch, and can be easily derived from a coordinate transform of the position of the pitch in the ground plane to the image plane as follows. For each image pixel $p$, compute its corresponding ground-plane point $P$. If $P$ locates within the pitch, then $p$ is set to 255 in $M_g$, otherwise 0. Note, however, that parts of the pitch can be occluded by foreground spectators or parts of the stadium. Thus, a color-based mask is used to exclude these elements from the overall pitch mask (i.e. the region to be processed).
The hue component of the HSV color space is used to identify the region of the background image representing the pitch, since it is robust to shadows and other variations in the appearance of the grass. As it is assumed that the pitch region has an approximately uniform color and occupies the dominant area of the background image, pixels belonging to the pitch will contribute to the largest peak in any hue histogram. Lower and upper hue thresholds $H_1$ and $H_2$ delimit an interval around the position $H_0$ of this maximum. Defined as the positions at which the histogram has decreased by 90% of the peak frequency, image pixels contributing to this interval are included in the color-based mask $M_c$.
A morphological closing operation is performed on $M_c$ to bridge the gaps caused by the white field lines in the initial color-based mask. Thus the final mask, $M$, can be generated as follows:
$$M_c = \{(u,v) \mid H(u,v) \in [H_1, H_2]\} \bullet B \qquad (1)$$
$$M = M_g \cap M_c \qquad (2)$$
where the morphological closing operation is denoted by $\bullet$ and $B$ is its square structuring element of size $6 \times 6$.
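
A compact sketch of equations (1)-(2), assuming the background hue channel (0-179, OpenCV convention) and the geometry mask $M_g$ are available as NumPy arrays; the real system may differ in detail:

```python
import numpy as np
from scipy import ndimage

def pitch_mask(hue, geometry_mask, drop=0.9, close_size=6):
    """Pitch mask M = M_g AND M_c, equations (1)-(2).

    hue           : hue channel of the background image.
    geometry_mask : boolean M_g (ground-plane point inside the pitch).
    drop          : thresholds lie where the histogram has fallen by 90%
                    of the peak frequency.
    """
    hist, edges = np.histogram(hue[geometry_mask], bins=180, range=(0, 180))
    peak = int(hist.argmax())

    # Walk outwards from the peak until the frequency drops below 10% of it.
    lo = peak
    while lo > 0 and hist[lo - 1] > (1.0 - drop) * hist[peak]:
        lo -= 1
    hi = peak
    while hi < len(hist) - 1 and hist[hi + 1] > (1.0 - drop) * hist[peak]:
        hi += 1

    color_mask = (hue >= edges[lo]) & (hue < edges[hi + 1])
    # Morphological closing with a 6x6 structuring element bridges field lines.
    color_mask = ndimage.binary_closing(color_mask, structure=np.ones((close_size, close_size)))
    return geometry_mask & color_mask
```
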
B. Detecting Moving Objects
Over the mask $M$ detected above, foreground pixels are located using the robust multi-modal adaptive background model [17]. Firstly, an initial background image is determined by a per-pixel Gaussian Mixture Model, and then the background image is progressively updated using a running average algorithm for efficiency.
Each per-pixel Gaussian Mixture Model is represented as $(\mu_k^{(j)}, \sigma_k^{(j)}, \omega_k^{(j)})$, where $\mu_k^{(j)}$, $\sigma_k^{(j)}$ and $\omega_k^{(j)}$ are the mean, root of the trace of the covariance matrix, and weight of the $j$-th distribution at frame $k$. The distribution which matches each new pixel observation $I_k$ is updated as follows:
$$\mu_k = (1-\rho)\mu_{k-1} + \rho I_k$$
$$\sigma_k^2 = (1-\rho)\sigma_{k-1}^2 + \rho\,(I_k - \mu_k)^T (I_k - \mu_k) \qquad (3)$$
where $\rho$ is the updating rate satisfying $0 \le \rho \le 1$. For each unmatched distribution, the parameters remain the same but its weight decreases. The initial background image is selected as the distribution with the greatest weight at each pixel.
Given the input image $I_k$, the foreground binary mask $F_k$ can be generated by comparing $\|I_k - \mu_{k-1}\|$ against a threshold, i.e. $2.5\,\sigma_k$. To accelerate the process of updating the background image, a running average algorithm is further employed after the initial background and foreground have been estimated:
$$\mu_k = [\alpha_L I_k + (1-\alpha_L)\mu_{k-1}]\,\bar{F}_k + [\alpha_H I_k + (1-\alpha_H)\mu_{k-1}]\,F_k \qquad (4)$$
where $\bar{F}_k$ is the complement of $F_k$. The use of two update weights (where $0 \le \alpha_H \le \alpha_L \le 1$) ensures that the background image is updated slowly in the presence of foreground regions. Updating is required even when a pixel is flagged as moving to allow the system to overcome mistakes in the initial background estimate.
Inside these foreground masks, a set of foreground regions are generated using connected component analysis. Each region is represented by its centroid $(r_0, c_0)$, area $a$, and bounding box, where $(r_1, c_1)$ and $(r_2, c_2)$ are the top-left and bottom-right corners of the bounding box.
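
The sketch below condenses the detection step to a single Gaussian per pixel (the paper maintains a small mixture per pixel) and applies the deviation update of equation (3) together with the running-average mean update of equation (4); the rate values are illustrative only:

```python
import numpy as np

RHO = 0.05                      # matched-distribution update rate, Eq. (3)
ALPHA_L, ALPHA_H = 0.2, 0.01    # running-average rates, Eq. (4), alpha_H <= alpha_L

def detect_and_update(mu, sigma, frame, pitch_mask):
    """One frame of foreground detection and background update.

    mu, sigma : per-pixel background mean image and deviation (initialised
                in the paper from a per-pixel Gaussian mixture).
    frame     : current image; pitch_mask is the mask M of Section II-A.
    """
    diff = np.linalg.norm(frame.astype(float) - mu, axis=-1)
    foreground = (diff > 2.5 * sigma) & pitch_mask

    # Eq. (3): update the matched distribution's deviation.
    sigma = np.sqrt((1 - RHO) * sigma ** 2 + RHO * diff ** 2)

    # Eq. (4): running average, slow rate (alpha_H) where foreground was
    # detected so that moving objects do not get absorbed into the background.
    alpha = np.where(foreground, ALPHA_H, ALPHA_L)[..., None]
    mu = alpha * frame + (1 - alpha) * mu
    return mu, sigma, foreground

# Connected components of 'foreground' (e.g. scipy.ndimage.label) then give
# the blobs that are handed to the Kalman tracker of Section II-C.
```
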
C. Tracking Moving Objects
A Kalman tracker is used in the image plane to filter noisy measurements and to split objects that merge because of the frequent occlusions of players and the ball. The state $x_I$ and measurement $z_I$ are given by:
$$x_I = [\,r_0\ \ c_0\ \ \dot{r}_0\ \ \dot{c}_0\ \ \Delta r_1\ \ \Delta c_1\ \ \Delta r_2\ \ \Delta c_2\,]^T \qquad (5)$$
$$z_I = [\,r_0\ \ c_0\ \ r_1\ \ c_1\ \ r_2\ \ c_2\,]^T \qquad (6)$$
where $(r_0, c_0)$ is the centroid, $(\dot{r}_0, \dot{c}_0)$ is the velocity, $(r_1, c_1)$ and $(r_2, c_2)$ are the top-left and bottom-right corners of the bounding box respectively (such that $r_1 \le r_2$ and $c_1 \le c_2$), and $(\Delta r_1, \Delta c_1)$ and $(\Delta r_2, \Delta c_2)$ are the relative positions of the two opposite corners to the centroid.
The state transition and measurement equations in the Kalman filter are:
$$x_I(k+1) = A_I\, x_I(k) + w_I(k)$$
$$z_I(k) = H_I\, x_I(k) + v_I(k) \qquad (7)$$
where $w_I$ and $v_I$ are the image plane process noise and measurement noise, and $A_I$ and $H_I$ are the state transition matrix and measurement matrix, respectively.
$$A_I = \begin{bmatrix} I_2 & \Delta T\, I_2 & O_2 & O_2 \\ O_2 & I_2 & O_2 & O_2 \\ O_2 & O_2 & I_2 & O_2 \\ O_2 & O_2 & O_2 & I_2 \end{bmatrix}, \qquad H_I = \begin{bmatrix} I_2 & O_2 & O_2 & O_2 \\ I_2 & O_2 & I_2 & O_2 \\ I_2 & O_2 & O_2 & I_2 \end{bmatrix} \qquad (8)$$
In equation (8), $I_2$ and $O_2$ represent $2 \times 2$ identity and zero matrices; $\Delta T$ is the time interval between frames. Further detail on the method for data association and handling of occlusions can be found in [18].
Fig. 2. Extraction of pitch masks based on both color and geometry: (a) Original background image, (b) Geometry-based mask of pitch, (c) Color-based mask of pitch, and (d) Final mask obtained.
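
For concreteness, the block structure of $A_I$ and $H_I$ and a standard predict/update cycle can be written as follows (a sketch; the process and measurement noise covariances $Q$ and $R$ are left to the caller):

```python
import numpy as np

def kalman_matrices(dt):
    """Transition and measurement matrices of equations (7)-(8).

    State:       [r0 c0 r0' c0' dr1 dc1 dr2 dc2]  (centroid, velocity and
                 corner offsets relative to the centroid).
    Measurement: [r0 c0 r1 c1 r2 c2]               (centroid and corners).
    """
    I2, O2 = np.eye(2), np.zeros((2, 2))
    A = np.block([[I2, dt * I2, O2, O2],
                  [O2, I2,      O2, O2],
                  [O2, O2,      I2, O2],
                  [O2, O2,      O2, I2]])
    H = np.block([[I2, O2, O2, O2],   # r0, c0
                  [I2, O2, I2, O2],   # r1 = r0 + dr1, c1 = c0 + dc1
                  [I2, O2, O2, I2]])  # r2 = r0 + dr2, c2 = c0 + dc2
    return A, H

def kalman_step(x, P, z, A, H, Q, R):
    """One predict/update cycle for a tracked blob."""
    x, P = A @ x, A @ P @ A.T + Q                  # predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
    x = x + K @ (z - H @ x)                        # correct with measurement z
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```
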
D. Computing Ground Plane Positions
Using Tsai's algorithm for camera calibration [19], the measurements are transformed from image co-ordinates into world co-ordinates. Basically, the pin-hole model of 3D-2D perspective projection is employed in [19] to estimate a total of 11 intrinsic and extrinsic camera parameters. In addition, the effective pixel dimensions in the image are obtained in both horizontal and vertical directions as two fixed intrinsic constants. These two constants are then used to calculate the world co-ordinate measurements of the objects on the basis of the detected image-plane bounding boxes. Let $(x, y, z)$ denote the 3D object position in world co-ordinates; then $x$ and $y$ are estimated using the center point of the bottom line of each bounding box, and $z$ is initialized as zero. Until Section IV, all objects are assumed to lie on the ground plane. (This assumption is usually true for players, but the ball could be anywhere on the line between that ground plane point and the camera position.) For each tracked object, position and attribute measurement vectors are defined as $p_i = [\,x\ y\ z\ v_x\ v_y\,]^T$ and $a_i = [\,w\ h\ a\ n\,]^T$. In addition, a ground plane velocity $(v_x, v_y)$ is estimated from the projection of the image-plane velocity (which is obtained from the image plane tracking process) onto the ground plane. Note that this ground-plane velocity is not intended to estimate the real velocity in cases where the ball is off the ground. The attributes $w$, $h$ and $a$ are an object's width, height and area, measured in meters (and meters squared), and calculated by assuming the object touches the ground plane. Besides, each object is validated before further processing provided that its size satisfies $w \ge 0.1$ m, $h \ge 0.1$ m and $a \ge 0.03$ m$^2$. Finally, $n$ is the longevity of the tracked object, measured in frames.
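
A minimal sketch of the image-to-ground back-projection, assuming the Tsai calibration has been assembled into a 3x4 projection matrix (the paper works directly with the calibrated parameters and pixel dimensions):

```python
import numpy as np

def image_to_ground(P, u, v):
    """Back-project an image point (u, v) onto the ground plane z = 0.

    P is the 3x4 projection matrix from Tsai calibration [19]; the bottom
    centre of each bounding box is used as (u, v) so that players' feet
    are assumed to touch the ground plane.
    """
    # With z = 0 the projection reduces to a homography formed by
    # columns 0, 1 and 3 of P:  s * [u, v, 1]^T = H * [x, y, 1]^T.
    H = P[:, [0, 1, 3]]
    w = np.linalg.solve(H, np.array([u, v, 1.0]))
    return w[:2] / w[2]              # world (x, y) in metres, with z = 0
```
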
III. DETECTING BALL-LIKE FEATURES
To identify ball-like features in a single-view process, each
of the tracked objects is attributed with a likelihood l that
represents the ball. The two elementary properties to
distinguish the ball from players and other false alarms are its
size and color. Three simple features are used to describe the
size of the object, i.e. its width, height, and area, in which
measurements in real-world units are adopted for robustness
against variable sizes of the ball in image plane. A fourth
feature derived from its color appearance, measures the
proportion of the object's area that is white.
To discriminate the ball from other objects, a
straightforward process is to apply fixed thresholds to these
features. However, this suffers from several difficulties. Firstly,
false alarms such as fragmented field lines or fragments of
players (especially socks) cannot always be discriminated.
Secondly, if no information is available about the height of the
ball, the estimate of the dimensions may be inaccurate. For
example, by assuming the ball is touching the ground plane, an
airborne ball will appear to be a larger object. Thirdly, the
image of a fast-moving ball is affected by motion blurring,
rendering it larger and less white than a stationary (or slower
moving) ball.
A key observation from soccer videos is that the ball in play
is nearly always moving, which suggests that the velocity may
be a useful additional discriminant. Thus, as field markings are
stationary the majority of these markings can be discriminated
from the ball by thresholding both the size and absolute velocity
of the detected object.
Another category of false alarms is caused by a part of a
player that has become temporarily disassociated from the
remainder of the player. A typical cause of this phenomenon is
imperfect foreground segmentation. However, such transitory
artifacts do not in general persist for longer than a couple of
Fig. 3. Tracked ball with ID and assigned likelihood: (a) Id=7, l=0.9; (b) l=0.0, the ball is moving out of the current camera view; (c) Id=16, l=0.9; and (d) the ball is merged with player 9; in frames #977, #990, #1044, and #1056, respectively.

frames, whereupon the correct representation is resumed.
Therefore, this category of false alarm can be correctly
discriminated by discarding all short-lived objects, i.e. whose
longevity is less than five frames.
Features describing the velocity and longevity of the
observations are used to solve the three difficulties described
above. These features (derived from tracking) are employed
alongside size and color features to help discriminate the ball
from other objects. The velocity feature is also useful when the
size of the detected ball is overestimated, either through a
motion-blur effect (proportional to the duration of the
shutter-speed), or a range error effect (incorrectly assuming
the object lies on the ground plane). Here, the key innovation is
to allow the size threshold to vary as a function of the estimated
ground-plane velocity. There is a simple rationale for the
motion-blur effect: the expected area is also directly
proportional to the image-plane speed. The range error effect is
more complicated as the 3D trajectory of the ball may be
directly towards the camera generating zero velocity in the
image plane. However, in general it can be assumed that the
ball rapidly moving in the image plane is more likely to be
positioned above the ground plane, and therefore, the size
threshold should be increased to accommodate the consequent
over-estimation of the ball size.
As for a standard soccer ball, it has a constant diameter $d_0$ (between 0.216 m and 0.226 m) and an area (of a great circle) $a_0$ of about 0.04 m$^2$. Considering the over-estimated ball size during fast movement, two thresholds for the width and height of the ball, $w_0$ and $h_0$, are defined by
$$w_0 = d_0 + |v_x|\,\Delta T, \qquad h_0 = d_0 + |v_y|\,\Delta T \qquad (9)$$
For robustness, valid size ranges of the ball are required to satisfy $|w - w_0| \le d_0/5$, $|h - h_0| \le d_0/5$, and $|a - a_0| \le a_0/8 + \big((|v_x| + |v_y|)\,\Delta T\big)^2$. In addition, the proportion of white color within the object is required to be no less than 30% of the whole area. All objects having size and color outside the prescribed thresholds are assigned a likelihood of zero and excluded from further processing. Each remaining object is classed as a ball candidate, and assigned an estimate of the likelihood that it represents the ball. The proposed form for this estimate is the following equation, incorporating both its absolute velocity $v_i$ and longevity $n$:
$$l_i = \frac{|v_i|}{v_{\max}} \left(1 - e^{-n/t_0}\right) \qquad (10)$$
where $v_{\max}$ is the maximum absolute velocity of all the objects detected in the given camera, at a given frame (including the ball, if visible, and also non-ball objects), and $t_0$ is a constant parameter. Thus, faster moving objects are considered more likely to be the ball based on the fact that, in the professional game, the ball normally moves faster than other objects.
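
Equations (9)-(10) and the validity tests can be combined as in the sketch below; the nominal ball constants follow the text, while $t_0$ and the exact form of the area tolerance are assumptions here:

```python
import numpy as np

D0, A0 = 0.22, 0.04     # nominal ball diameter (m) and great-circle area (m^2)
T0 = 10.0               # longevity constant t0 (value not given in the text)

def ball_likelihood(w, h, a, white_ratio, vx, vy, v_max, n, dt):
    """Likelihood that a tracked object represents the ball, Eqs. (9)-(10).

    w, h, a     : object width, height (m) and area (m^2), ground-plane units.
    white_ratio : proportion of white pixels inside the object.
    vx, vy      : ground-plane velocity (m/s); v_max is the largest object
                  speed seen in this camera at this frame.
    n, dt       : track longevity in frames and the frame interval (s).
    """
    # Eq. (9): size thresholds grow with speed, allowing for motion blur
    # and the range error of an airborne ball.
    w0 = D0 + abs(vx) * dt
    h0 = D0 + abs(vy) * dt
    a_tol = A0 / 8.0 + ((abs(vx) + abs(vy)) * dt) ** 2

    valid = (abs(w - w0) <= D0 / 5.0 and abs(h - h0) <= D0 / 5.0 and
             abs(a - A0) <= a_tol and white_ratio >= 0.3 and n >= 5)
    if not valid:
        return 0.0

    # Eq. (10): faster and longer-lived candidates are more ball-like.
    return (np.hypot(vx, vy) / v_max) * (1.0 - np.exp(-n / T0))
```
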
Figure 3 shows partial views of camera #1 with the detected ball at frames 977, 990, 1044 and 1056, respectively. The ball or each player is assigned a unique ID unless it is near the
Fig. 4. Thirty seconds of single camera tracking data from camera #1 (a) and filtered results of the ball in (b) and (c), in which time t moves from left to right, and the x-coordinate of the objects, c0, is plotted up the y-axis. In (b) and (c), the most-likely ball is labeled in black; (b) is the result of filtering on appearance and velocity and (c) is the result after temporal filtering.

Citations
Dissertation
01 Jan 2014
TL;DR: A real-time distributed video pipeline that produces high resolution panorama video is introduced; by selecting three current state-of-the-art object tracking algorithms and implementing three of their own tracker algorithms, the authors evaluate the performance and feasibility of object tracking in a real-time video pipeline.
Abstract: Great technological advances in the last decades have enabled the creation of multimedia systems that capture high resolution images at high frame rates. As more and more digital video content is generated, we strive for ways to automatically analyze and understand video through computer vision, for categorization, searchable video and other purposes. Object recognition forms an important part in this evolution, where objects in video are found and identified. In this thesis, we introduce a real-time distributed video pipeline that produces high resolution panorama video. Created as a diverse tool for capture and analysis of sports footage, the system allows for real-time generation and viewing of a cylindrical panorama covering an entire soccer field using five industrial cameras. In three case studies, we investigate the potential for ball and object tracking within such a panorama, by installing the system in both indoor and outdoor locations using different configurations. By selecting three current state-of-the-art object tracking algorithms and through implementation of three of our own tracker algorithms, we evaluate the performance and feasibility of object tracking in a real-time video pipeline. We detail and evaluate the performance of these six trackers in all case studies, and learn that there is often a trade-off between tracking robustness and execution speed. Trackers that maintain adaptive appearance models to track unknown objects, struggle with handling the workload of the high resolution video produced by Bagadus. By developing tracking algorithms specific to each sequence, we are able to achieve usable performance by reducing computations, at the sacrifice of portability. We recognize that especially ball tracking in team sports is a very challenging case, and hope to provide knowledge useful for further research.

2 citations


Cites background from "Real-Time Modeling of 3-D Soccer Ba..."

  • ...Exploiting the fact that soccer is a sport based around the ball, [47] made separate physics-based motion models for different phases the ball was in (flying, rolling, in-possession)....


  • ...Lots of research has been produced in combining object tracking with broadcast soccer footage [69, 38, 43, 47], but these tracking algorithms do not operate within the context of an video pipeline producing such extreme image resolutions....


  • ...We also saw that many try to tackle the challenge of team sports by tracking both players and the ball, and reasoning about the gameplay in different ways [38, 43, 47, 61, 69]....


Proceedings ArticleDOI
03 Sep 2012
TL;DR: Some in depth study on the challenges and issues in many real time video surveillance applications are presented, highlighting the need for an improved video tracking algorithms for effective design of video surveillance systems.
Abstract: Real time video surveillance is an interdisciplinary task that has perceived reasonable attention safety and security purpose. Such video surveillance task is challenging task which involves detection of one or more moving objects from a video sequence. Though there has been several analysis made on different perspectives of video surveillance, there are many issues left open for investigation namely segmentation of moving objects, foreground and background detection, preprocessing, feature extraction and so on. This paper presents some in depth study on the challenges and issues in many real time video surveillance applications, highlighting the need for an improved video tracking algorithms for effective design of video surveillance systems. In addition the paper focuses on to provide a new proposal in three fold ways, there by producing a refined approach as compared to previous techniques for real time video surveillance. The proposed system is experimented over the synthetic data set and also tested under commercial data repository, which leads to results that were promising.

2 citations


Cites background from "Real-Time Modeling of 3-D Soccer Ba..."

  • ...…for many applications like number plate recognition, automatic face recognition, traffic monitoring, explorative visualization and analysis (Buter et al., 2011), in-house health care systems (Raty, 2010; Shiu, 2010; Jinchang et al., 2008) and human machine interaction (Halim et al., 2011)....


Proceedings ArticleDOI
01 Jun 2022
TL;DR: In this article, a small neural network trained on image patches around candidates generated by a conventional ball detector is used to predict the confidence of having a ball in the image patch, and through its confidence output, the model improves the detection rate by filtering the candidates produced by the detector.
Abstract: Ball 3D localization in team sports has various applications including automatic offside detection in soccer, or shot release localization in basketball. Today, this task is either resolved by using expensive multi-views setups, or by restricting the analysis to ballistic trajectories. In this work, we propose to address the task on a single image from a calibrated monocular camera by estimating ball diameter in pixels and use the knowledge of real ball diameter in meters. This approach is suitable for any game situation where the ball is (even partly) visible. To achieve this, we use a small neural network trained on image patches around candidates generated by a conventional ball detector. Besides predicting ball diameter, our network outputs the confidence of having a ball in the image patch. Validations on 3 basketball datasets reveals that our model gives remarkable predictions on ball 3D localization. In addition, through its confidence output, our model improves the detection rate by filtering the candidates produced by the detector. The contributions of this work are (i) the first model to address 3D ball localization on a single image, (ii) an effective method for ball 3D annotation from single calibrated images, (iii) a high quality 3D ball evaluation dataset annotated from a single viewpoint. In addition, the code to reproduce this research will be made freely available at https://github.com/gabriel-vanzandycke/deepsport

1 citation

DOI
01 Jan 2019
TL;DR: This thesis proposes a model that tracks both types of objects simultaneously, while respecting the physical laws of ball motion when in free fall, and interaction constraints that appear when players are in the possession of the ball.
Abstract: Multiple object tracking is a crucial Computer Vision Task. It aims at locating objects of interest in the image sequences, maintaining their identities, and identifying their trajectories over time. A large portion of current research focuses on tracking pedestrians, and other types of objects, that often exhibit predictable behaviours, that allow us, as humans, to track those objects. Nevertheless, most existing approaches rely solely on simple affinity or appearance cues to maintain the identities of the tracked objects, ignoring their behaviour. This presents a challenge when objects of interest are invisible or indistinguishable for a long period of time. In this thesis, we focus on enhancing the quality of multiple object trackers by learning and exploiting the long ranging models of object behaviour. Such behaviours come in different forms, be it a physical model of the ball motion, model of interaction between the ball and the players in sports or motion patterns of pedestrians or cars, that is specific to a particular scene. In the first part of the thesis, we begin with the task of tracking the ball and the players in team sports. We propose a model that tracks both types of objects simultaneously, while respecting the physical laws of ball motion when in free fall, and interaction constraints that appear when players are in the possession of the ball. We show that both the presence of the behaviour models and the simultaneous solution of both tasks aids the performance of tracking, in basketball, volleyball, and soccer. In the second part of the thesis, we focus on motion models of pedestrian and car behaviour that emerge in the outdoor scenes. Such motion models are inherently global, as they determine where people starting from one location tend to end up much later in time. Imposing such global constraints while keeping the tracking problem tractable presents a challenge, which is why many approaches rely on local affinity measures. We formulate a problem of simultaneously tracking the objects and learning their behaviour patterns. We show that our approach, when applied in conjunction with a number of state-of-the-art trackers, improves their performance, by forcing their output to follow the learned motion patterns of the scene. In the last part of the thesis, we study a new emerging class of models for multiple object tracking, that appeared recently due to availability of large scale datasets sequence models for multiple object tracking. While such models could potentially learn arbitrarily long ranging behaviours, training them presents several challenges. We propose a training scheme and a loss function

1 citation


Cites methods from "Real-Time Modeling of 3-D Soccer Ba..."

  • ...In [147], Canny-like hysteresis is used to select candidates above a certain confidence level and link them to already hypothesized trajectories....


References
Journal ArticleDOI
TL;DR: There is a natural uncertainty principle between detection and localization performance, which are the two main goals, and with this principle a single operator shape is derived which is optimal at any scale.
Abstract: This paper describes a computational approach to edge detection. The success of the approach depends on the definition of a comprehensive set of goals for the computation of edge points. These goals must be precise enough to delimit the desired behavior of the detector while making minimal assumptions about the form of the solution. We define detection and localization criteria for a class of edges, and present mathematical forms for these criteria as functionals on the operator impulse response. A third criterion is then added to ensure that the detector has only one response to a single edge. We use the criteria in numerical optimization to derive detectors for several common image features, including step edges. On specializing the analysis to step edges, we find that there is a natural uncertainty principle between detection and localization performance, which are the two main goals. With this principle we derive a single operator shape which is optimal at any scale. The optimal detector has a simple approximate implementation in which edges are marked at maxima in gradient magnitude of a Gaussian-smoothed image. We extend this simple detector using operators of several widths to cope with different signal-to-noise ratios in the image. We present a general method, called feature synthesis, for the fine-to-coarse integration of information from operators at different scales. Finally we show that step edge detector performance improves considerably as the operator point spread function is extended along the edge.

28,073 citations


"Real-Time Modeling of 3-D Soccer Ba..." refers methods in this paper

  • ...The filter uses hysteresis to process the likelihood estimates into discrete labels, in an approach similar to the Canny filter [20]....


Book
01 Jun 1969
TL;DR: In this paper, Monte Carlo techniques are used to fit dependent and independent variables least squares fit to a polynomial least-squares fit to an arbitrary function fitting composite peaks direct application of the maximum likelihood.
Abstract: Uncertainties in measurements probability distributions error analysis estimates of means and errors Monte Carlo techniques dependent and independent variables least-squares fit to a polynomial least-squares fit to an arbitrary function fitting composite peaks direct application of the maximum likelihood. Appendices: numerical methods matrices graphs and tables histograms and graphs computer routines in Pascal.

12,737 citations


"Real-Time Modeling of 3-D Soccer Ba..." refers methods in this paper

  • ...If more fully determined estimates are available, then they could all be incorporated into the estimation of the trajectory based on a more general least-squares estimator [ 25 ]....


  • ...Moreover, if more than two ball positions have been decided within a curve segment, then a least-squares calculation of the trajectory segment can be used to provide a more robust estimate [ 25 ]....


Proceedings ArticleDOI
23 Jun 1999
TL;DR: This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model, resulting in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes.
Abstract: A common method for real-time segmentation of moving regions in image sequences involves "background subtraction", or thresholding the error between an estimate of the image without moving objects and the current image. The numerous approaches to this problem differ in the type of background model used and the procedure used to update the model. This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model. The Gaussian, distributions of the adaptive mixture model are then evaluated to determine which are most likely to result from a background process. Each pixel is classified based on whether the Gaussian distribution which represents it most effectively is considered part of the background model. This results in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. This system has been run almost continuously for 16 months, 24 hours a day, through rain and snow.

7,660 citations


"Real-Time Modeling of 3-D Soccer Ba..." refers methods in this paper

  • ...To locate and track players and the soccer ball, a multimodal adaptive background model is utilized that provides robust foreground detection using image differencing [17]....


  • ...In Section II, the method we used for tracking and detecting moving objects is described, using Gaussian mixtures [17] and calibrated cameras [19]....


  • ...Over the mask detected above, foreground pixels are located using the robust multimodal adaptive background model [17]....


Journal ArticleDOI
Roger Y. Tsai1
01 Aug 1987
TL;DR: In this paper, a two-stage technique for 3D camera calibration using TV cameras and lenses is described, aimed at efficient computation of camera external position and orientation relative to object reference coordinate system as well as the effective focal length, radial lens distortion, and image scanning parameters.
Abstract: A new technique for three-dimensional (3D) camera calibration for machine vision metrology using off-the-shelf TV cameras and lenses is described. The two-stage technique is aimed at efficient computation of camera external position and orientation relative to object reference coordinate system as well as the effective focal length, radial lens distortion, and image scanning parameters. The two-stage technique has advantage in terms of accuracy, speed, and versatility over existing state of the art. A critical review of the state of the art is given in the beginning. A theoretical framework is established, supported by comprehensive proof in five appendixes, and may pave the way for future research on 3D robotics vision. Test results using real data are described. Both accuracy and speed are reported. The experimental results are analyzed and compared with theoretical prediction. Recent effort indicates that with slight modification, the two-stage calibration can be done in real time.

5,940 citations

Frequently Asked Questions (2)
Q1. What are the future works in this paper?

The authors model the ball trajectory as curve segments in consecutive virtual vertical planes, which can accurately approximate the real cases even in complex situations. Using geometric reconstruction techniques, the authors can successfully estimate 3D ball positions from a single view.

In this paper, a real-time 3D soccer ball tracking system is proposed, using image sequences from multiple fixed cameras as input.