Journal ArticleDOI

Real-Time Modeling of 3-D Soccer Ball Trajectories From Multiple Fixed Cameras

01 Mar 2008-IEEE Transactions on Circuits and Systems for Video Technology (Institute of Electrical and Electronics Engineers)-Vol. 18, Iss: 3, pp 350-362

TL;DR: In this paper, model-based approaches for real-time 3-D soccer ball tracking are proposed, using image sequences from multiple fixed cameras as input, and incorporating motion cues and temporal hysteresis thresholding in ball detection and employing phase-specific models to estimate ball trajectories.

Abstract: In this paper, model-based approaches for real-time 3-D soccer ball tracking are proposed, using image sequences from multiple fixed cameras as input. The main challenges include filtering false alarms, tracking through missing observations, and estimating 3-D positions from single or multiple cameras. The key innovations are: 1. incorporating motion cues and temporal hysteresis thresholding in ball detection; 2. modeling each ball trajectory as curve segments in successive virtual vertical planes so that the 3-D position of the ball can be determined from a single camera view; and 3. introducing four motion phases (rolling, flying, in possession, and out of play) and employing phase-specific models to estimate ball trajectories which enables high-level semantics applied in low-level tracking. In addition, unreliable or missing ball observations are recovered using spatio-temporal constraints and temporal filtering. The system accuracy and robustness are evaluated by comparing the estimated ball positions and phases with manual ground-truth data of real soccer sequences.

Summary (6 min read)

Introduction

  • Model-based approaches for real-time 3D soccer ball tracking are proposed, using image sequences from multiple fixed cameras as input.
  • Although players can be successfully detected and tracked on the basis of color and shape [1, 10, 12], similar methods cannot be extended to ball detection and tracking for several reasons.

B. Contributions of This Work

  • A system is presented for model-based 3D ball tracking from real soccer videos.
  • The main contributions can be summarized as follows.
  • Meanwhile, a probability measure is defined to capture the likelihood that any specific detected moving object represents the ball.
  • Secondly, the 3D ball motion is modeled as a series of planar curves each residing in a vertical virtual plane (VVP), which involves geometric based vision techniques for 3D ball positioning.
  • For the first two types, phase-specific models are employed to estimate ball positions in linear and parabolic trajectories, respectively.

C. Structure of the Paper

  • In Section II, the method the authors used for tracking and detecting moving objects is described, using Gaussian mixtures [17] and calibrated cameras [19].
  • In Section III, a method is presented for identifying the ball from these objects.
  • These methods operate in the image plane from each camera separately.
  • In Section IV, the data from multiple cameras is integrated, to provide a segment-based model of the ball trajectory over the entire pitch, estimating 3D ball positions from either single view or multiple views.
  • Experimental results are presented in Section VI and the conclusions are drawn in Section VII.

II. MOVING OBJECTS DETECTION AND TRACKING

  • To locate and track players and the soccer ball, a multi-modal adaptive background model is utilized which provides robust foreground detection using image differencing [17].
  • This detection process is applied only to visible pitch pixels of the appropriate color.
  • Grouped foreground connected components (i.e. blobs) are tracked by a Kalman filter which estimates 2D position, velocity and object dimensions.
  • Greater detail is given in the subsections below.

A. Determining Pitch Masks

  • Rather than process the whole image, a pitch mask is developed to avoid processing pixels containing spectators.
  • The former constrains processing to only those pixels on the pitch, and can be easily derived from a coordinate transform of the position of the pitch in the ground plane to the image plane as follows.
  • Note, however, that parts of the pitch can be occluded by foreground spectators or parts of the stadium.
  • The hue component of the HSV color space is used to identify the region of the background image representing the pitch, since it is robust to shadows and other variations in the appearance of the grass.
  • The lower and upper hue thresholds are defined as the positions at which the histogram has decreased by 90% from the peak frequency; image pixels contributing to this interval are included in the color-based mask M_c.

B. Detecting Moving Objects

  • Over the mask M detected above, foreground pixels are located using the robust multi-modal adaptive background model [17].
  • Firstly, an initial background image is determined by a per-pixel Gaussian Mixture Model, and then the background image is progressively updated using a running average algorithm for efficiency.
  • The distribution which matches each new pixel observation I_k is updated by blending its mean and variance toward the new observation at a fixed updating rate.
  • For each unmatched distribution, the parameters remain the same but its weight decreases.
  • Inside these foreground masks, a set of foreground regions are generated using connected component analysis.

C. Tracking Moving Objects

  • A Kalman tracker is used in the image plane to filter noisy measurements and split merged objects because of frequent occlusions of players and the ball.
  • The state transition and measurement equations in the Kalman filter are x_I(k+1) = A_I x_I(k) + w_I(k) and z_I(k) = H_I x_I(k) + v_I(k), where w_I and v_I are the image plane process noise and measurement noise, and A_I and H_I are the state transition matrix and measurement matrix, respectively.
  • Further detail on the method for data association and handling of occlusions can be found in [18].

D. Computing Ground Plane Positions

  • Using Tsai's algorithm for camera calibration [19], the measurements are transformed from image coordinates into world coordinates.
  • Basically, the pin-hole model of 3D-to-2D perspective projection is employed in [19] to estimate a total of 11 intrinsic and extrinsic camera parameters.
  • In addition, the effective pixel dimensions in the image are obtained in both horizontal and vertical directions as two fixed intrinsic constants.
  • (This assumption is usually true for players, but the ball could be anywhere on the line between that ground plane point and the camera position).
  • For each tracked object, a position and attribute measurement vector are defined as p_i = [x y z v_x v_y]^T and a_i = [w h a n]^T.

III. DETECTING BALL-LIKE FEATURES

  • The two elementary properties to distinguish the ball from players and other false alarms are its size and color.
  • In general, a ball moving rapidly in the image plane is more likely to be positioned above the ground plane; therefore, the size threshold should be increased to accommodate the consequent over-estimation of the ball size.
  • In addition, the proportion of white color within the object is required to be no less than 30% of the whole area.
  • Candidates with a likelihood above h1 are unequivocally designated a 'ball' label, and candidates with a likelihood below h3 are unequivocally classified as 'not ball' (i.e. false alarms).
  • Application of the temporal filter successfully locates the ball among these various candidates.
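The temporal hysteresis thresholding described in these bullets can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: the threshold values and the one-candidate-per-frame simplification are placeholders.

```python
def hysteresis_label(likelihoods, h_high=0.7, h_low=0.3):
    """Label each frame's best ball candidate using two thresholds.

    A likelihood above h_high forces the label 'ball'; below h_low forces
    'not ball'; values in between keep the previous frame's label
    (hysteresis), which suppresses flicker from borderline detections.
    """
    labels, prev = [], 'not ball'
    for l in likelihoods:
        if l >= h_high:
            prev = 'ball'
        elif l <= h_low:
            prev = 'not ball'
        # intermediate likelihoods leave prev unchanged
        labels.append(prev)
    return labels

print(hysteresis_label([0.9, 0.5, 0.2, 0.5, 0.8]))
```

A single fixed threshold at, say, 0.5 would flip the label on every borderline frame; the two-threshold scheme only changes state on confident evidence.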

IV. MODEL-BASED 3D POSITION ESTIMATION IN SINGLE AND MULTIPLE VIEWS

  • The detection results of the ball from all single views are integrated for estimation of 3D position.
  • Otherwise, the 2D image position can only provide constraints for the 3D line on which, somewhere, the ball is located.
  • After a segment-based model of the ball motion is presented, two methods are provided for determining 3D ball positions.
  • The first method is for cases in which the ball is detected from only one camera: the instant when the ball bounces on the ground is detected, and the corresponding height is estimated as zero.
  • The second is for cases in which the ball is visible from at least two cameras, so integration of multiple observations is used.

A. The Ball Motion Model

  • During a soccer game, the ball is moving regularly from one place to another.
  • In a special case when the ball is rolling on the ground, the curve will become a straight line.
  • The complete ball trajectory can be modeled as a sequence of adjacent planar curve segments.
  • While beyond the scope of this paper, if the ball is struck to impart significant spin about an axis, then it will 'swerve' in the air and the assumption that the ball travels in a vertical plane is invalid, although the 'swerve' may be approximated by several segments, each defined by a vertical plane.
  • These estimated 3D ball positions are described as fully determined estimates, in contrast to most observations, which are only determined up to a line passing through the camera focal point.

B. Fully Determined Estimates from a Single View

  • From a single camera view, the strategy adopted for determining a 3D ball position is to detect an occasion on which the ball bounces off some other object: players, the ground or a goal-post.
  • If the height at which the bounce occurs can be estimated, then this height, together with its 2D image location, completely determines the 3D ball position at this time.
  • Then, the height of the ball position is estimated as zero if there are no players or other objects near the ball.
  • It can be assumed the ball is two meters off the ground plane when it strikes a player's head.

C. Fully Determined Estimates from Multiple Views

  • When a ball is observed in multiple cameras, there are multiple projection lines from each camera position through the corresponding observation (which, in this application, can be terminated at the ground plane).
  • False observations may exist which will lead to incorrect solutions.
  • Some false estimates can be generated from the mis-association of the ball (in one camera) and e.g. some background clutter (from another camera).
  • When the different measurement covariances for p1 and p2 are considered, the distances from b to p1 and p2 are changed into Mahalanobis distances.

E. Estimation of Missed or Uncertain Ball Positions

  • For those frames without ball observations in any single view or with ball observations of lower likelihood, i.e. less than a given threshold, the 3D ball positions are estimated by using polynomial interpolation in a curve on the corresponding vertical planes (see Section V).
  • Each curve is calculated from two fully determined estimates.
  • If more fully determined estimates are available, then they could all be incorporated into the estimation of the trajectory based on a more general least squares estimator [25].

A. Four Phases of Ball Motion

  • It is proposed to model the ball motion at each instant into four phases, namely rolling (R), flying (F), in-possession (P) and out-of-play (O).
  • A different tracking model is applicable to each phase, and furthermore the designation also provides a useful insight into the semantic progression of the game.
  • Though some other semantic events have been analyzed for soccer video understanding [21-24], they focus on players' motion in a broadcasting context, and phase transitions in the ball trajectory have not been discussed.
  • This is because in-possession phases act as special periods that initialize other phases (such as rolling or flying), i.e. literally kicking the ball off in a particular direction.
  • Furthermore, the pattern of play is punctuated by periods when the ball is out-of-play, e.g. caused by fouls, the ball crossing the touchline, off-side, or possession by the goal-keeper.

B. Estimating Motion Phases

  • Given observations of the ball from separate cameras and height cues obtained as described in Section IV, what follows is the estimation of the current ball phase.
  • Prior to this stage, at each frame there is at most one estimate of ball position from each of the camera views, and each estimate is assigned a measure of the likelihood that it represents the ball.
  • A 'soft' classification [26] of the four phases is introduced, which is then input into a decision process to determine the final estimate of the phase.
  • Smooth functions are chosen to provide a measure, bounded between 0 and 1, of the membership of each motion phase.
  • For each of the in-play phases, a specific model is then employed for robust trajectory estimation below.

C. Phase-specific Trajectory Estimation

  • Finally, in this section, the three different in-play models of ball motion are described, starting with the flying trajectory.
  • Disregarding air friction, the velocity parallel to the ground plane is constant and thus the ball follows a single parabolic trajectory.
  • Disregarding all friction, x(t) and y(t) will satisfy the same equations of motion in the ground plane, whether the ball is rolling or flying.
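Under the flying-phase assumptions above (gravity only, no air friction), two fully determined 3D points suffice to recover the parabolic trajectory inside its vertical plane. The sketch below solves only the vertical component z(t) = z0 + vz·t − g·t²/2 from two time-stamped height samples; function names are illustrative, not from the paper.

```python
G = 9.81  # gravitational acceleration, m/s^2

def fit_parabola(t1, z1, t2, z2):
    """Recover initial height z0 and vertical velocity vz of a flying ball
    from two height samples, assuming z(t) = z0 + vz*t - 0.5*G*t^2.

    Two unknowns and two equations give a unique solution when t1 != t2.
    """
    # Move the known gravity term to the left side, leaving a linear
    # system in (z0, vz): z + 0.5*G*t^2 = z0 + vz*t.
    b1 = z1 + 0.5 * G * t1 * t1
    b2 = z2 + 0.5 * G * t2 * t2
    vz = (b2 - b1) / (t2 - t1)
    z0 = b1 - vz * t1
    return z0, vz

def height(z0, vz, t):
    """Evaluate the fitted vertical trajectory at time t."""
    return z0 + vz * t - 0.5 * G * t * t

# Example: a ball leaving the ground at t = 0 and landing at t = 2 s
# must have been launched with vz = G (peak of ~4.9 m at t = 1 s).
z0, vz = fit_parabola(0.0, 0.0, 2.0, 0.0)
```

The horizontal components x(t) and y(t) are linear in t under the same no-friction assumption, so they are fixed by the same two points.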

A. System Architecture

  • The proposed system was tested on data from matches played at Fulham Football Club, U.K., during the 2001 Premiership season, captured by eight fixed cameras.
  • The 3D ball trajectory is visualized along with tracked players on a virtual playfield.
  • The multi-view tracker is responsible for orchestrating the process by which the (single-view) Feature Servers generate their feature results.
  • Eight cameras were statically mounted around the stadium as shown in Figure 8.
  • The white balance was set to automatic on all cameras.

B. Data Preparation and Results

  • The proposed model has been tested in several sequences with up to 8 cameras, and each sequence has over 5500 frames.
  • Then, all the ball candidates detected from 8 sequences are integrated for multi-view tracking of the ball and 3D positioning.
  • Then, its 3D position is estimated by using multi-view geometry constraints.
  • For the frames between two GT frames, the estimated GT positions are linearly interpolated.
  • The first is the distance (in meters) between estimated and ground-truth (GT) ball positions, in which only 2D distance in x-y plane is used.

C. Evaluation of Tracking Accuracy versus Latency

  • In eight testing sequences of 5500 frames each, 3D ball positions are estimated in about 3720 frames.
  • The ground-plane errors among calibration projections are estimated to be between 0.1 and 2.5 meters, depending on the distance of the ground point to the cameras.
  • Estimated ball positions are shown as a magenta trajectory.
  • From this table it can be observed that, without temporal filtering, only 34.5% of ball positions can be recovered.

D. Evaluation of Phase Transition Accuracy

  • Figure 11 illustrates a complete 3D trajectory history (ground plane projection) from frame 0 to 954 and its corresponding phase transitions.
  • Secondly, about 25% of rolling and 13% of in-possession balls are confused with each other, which happens when a rolling ball cannot be observed in a crowd or an in-possession ball rolls near the player who possesses it.
  • This misjudgment affects the accuracy of the ground-truth as well as the estimate from the proposed method.
  • Heights below z_f will not be recognized correctly.
  • It is worth noting that there are some phase transitions missing from the estimated trajectory.

E. System Limitations

  • As discussed above, there are two main drawbacks in their system in terms of tracking accuracy and phase transition accuracy, owing to severe occlusions or insufficient observations.
  • In principle, most of these problems may be resolved by adding cameras, even ones capturing images from above the pitch.
  • Occlusions are still unavoidable in the soccer context, which constrains the overall recovery rate and accuracy.
  • Moreover, their system ignores air friction and cannot model some complex movements of the ball, such as the 'swerve'; this may be an interesting topic for further investigation.

VII. CONCLUSIONS

  • A method has been described for real-time 3D trajectory estimation of the ball in a soccer game.
  • In the proposed system, video data is captured from multiple fixed and calibrated cameras.
  • Temporal filtering of the ball likelihood has also proved essential for robust ball detection and tracking.
  • One interesting feature of the approach is that it uses high-level phase transition information to aid low-level tracking.
  • Through recognition of the four phases, phase-specific models are successfully applied in estimating 3D position of the ball.


Index Terms: Motion analysis, video signal processing, geometric modeling, tracking, multiple cameras, three-dimensional vision.
I. INTRODUCTION
With the development of computer vision and multimedia
technologies, many important applications have been
developed in automatic soccer video analysis and content-based
indexing, retrieval and visualization [1-3]. By accurately
tracking players and ball, a number of innovative applications
can be derived for automatic comprehension of sports events.
These include annotation of video content, summarization,
team strategy analysis and verification of referee decisions, as
well as the 2D or 3D reconstruction and visualization of action
[3-16]. In addition, some more recent work on tracking of
players and the ball can also be found in [27-29].

Manuscript received Dec 20, 2005. This work was supported in part by the
European Commission under Project IST-2001-37422. J. Ren is with the School
of Informatics, University of Bradford, BD7 1DP, U.K., on leave from the
School of Computers, Northwestern Polytechnic University, Xi'an, 710072,
China (email: j.ren@bradford.ac.uk; npurjc@yahoo.com). J. Orwell and G. A.
Jones are with the Digital Imaging Research Centre, Kingston University,
Surrey, KT1 2EE, U.K. (email: j.orwell@kingston.ac.uk;
g.jones@kingston.ac.uk). M. Xu is with the Signal Processing Lab, Engineering
Department, Cambridge University, CB2 1PZ, U.K. (email: mx204@cam.ac.uk).
Copyright (c) 2007 IEEE. Personal use of this material is permitted. However,
permission to use this material for any other purposes must be obtained from
the IEEE by sending an email to pubs-permissions@ieee.org.
In a soccer match, the ball is invariably the focus of
attention. Although players can be successfully detected and
tracked on the basis of color and shape [1, 10, 12], similar
methods cannot be extended to ball detection and tracking for
several reasons. First, the ball is small and exhibits irregular
shape, variable size and inconsistent color when moving
rapidly, as illustrated in Figure 1. Second, the ball is frequently
occluded by players or is out of all camera fields of view (FOV),
such as when it is kicked high in the air. Finally, the ball often
leaves the ground surface, and its 3D position cannot be
uniquely determined without the measurements from at least
two cameras with overlapping fields of view. Therefore, 3D
ball position estimation and tracking is, arguably, the most
important challenge in soccer video analysis. In this paper, the
problem under investigation is automatic ball tracking from
multiple fixed cameras.
A. Related Work
Generally, TV broadcast cameras or fixed-cameras around
the stadium are the two usual sources of soccer image streams.
While TV imagery generally provides high-resolution data of
the ball in the image centre, the complex camera movements
and partial views of the field make it hard to obtain accurate
camera parameters for on-field ball positioning. On the other
hand, fixed cameras are easily calibrated, but their wide-angle
field of view makes ball detection more difficult, since the ball
is often represented by only a small number of pixels.
In the soccer domain, fully automatic methods for limited
scene understanding have been proposed, e.g. recognition of
replays from cinematic features extracted from broadcast TV
data [1] and detection of the ball in broadcast TV data [1, 2,
4-9]. Gong et al adopted white color and circular shape to
detect balls in image sequences [1]. In Yow et al [2], the ball is
detected by template matching in each of the reference frames
and then tracked between each pair of these reference frames.
Seo et al applied template matching and Kalman filter to track
balls after manual initialization [4]. Tong et al [5] employed
indirect ball detection by eliminating non-ball regions using
color and shape constraints. In Yamada et al [6], white regions
are taken as ball candidates after the removal of players and
field lines. In Yu et al [7, 8], candidate balls are first identified
by size range, color and shape, and then these candidates are
further verified by trajectory mining with a Kalman filter.
D'Orazio et al [9] detected the ball using a modified Hough
transform along with a neural classifier.

Real-time Modeling of 3D Soccer Ball Trajectories from Multiple Fixed Cameras
Jinchang Ren, James Orwell, Graeme A Jones and Ming Xu

Fig. 1. Ball samples in various sizes, shapes and colors.
Using soccer sequences from fixed cameras, usually there
are two steps for the estimation and tracking of 3D ball
positions. Firstly, the ball is detected and tracked in each single
view independently. Then, 2D ball positions from different
camera views are integrated to obtain 3D positions using
known motion models [10-12]. Ohno et al arranged eight
cameras to attain a full view of the pitch [10]. They modeled the
3D ball trajectory by considering air friction and gravity which
depend on an unsolved initial velocity. Matsumoto et al [11]
used four cameras in their optimized viewpoint determination
system, in which template matching is also applied for ball
detection. Bebie and Bieri [12] employed two cameras for
soccer game reconstruction, and modeled 3D trajectory
segments by Hermite spline curves. However, about one-fifth of
the ball positions need to be set manually before estimation. In
Kim et al [13] and Reid and North [14], reference players and
shadows were utilized in the estimation of 3D ball positions.
These are unlikely to be robust as the shadow positions depend
more on light source positions than on camera projections.
B. Contributions of This Work
In this paper, a system is presented for model-based 3D ball
tracking from real soccer videos. The main contributions can
be summarized as follows.
Firstly, a motion-based thresholding process along with
temporal filtering is used to detect the ball, which has proved to
be robust to the inevitable variations in ball color and size that
result from its rapid movement. Meanwhile, a probability
measure is defined to capture the likelihood that any specific
detected moving object represents the ball.
Secondly, the 3D ball motion is modeled as a series of planar
curves each residing in a vertical virtual plane (VVP), which
involves geometric based vision techniques for 3D ball
positioning. To determine each vertical plane, at least two
observed positions of the ball with reliable height estimate are
required. These reliable estimates are obtained by either
recognizing a bouncing on the ground from single view, or
triangulating from multiple views. Based on these VVPs, the
3D ball positions are determined in single camera views by
projections. Ball positions for frames without any valid
observations are easily estimated by polynomial interpolation
to allow a continuous 3D ball trajectory to be generated.
Thirdly, the ball trajectories are modeled as one of four
phases of ball motion rolling, flying, in-possession and
out-of-play. These phase types were chosen because they each
require different models in trajectory recovery. For the first two
types, phase-specific models are employed to estimate ball
positions in linear and parabolic trajectories, respectively. It is
shown how two 3D points are sufficient to estimate the
parabolic trajectory of a flying ball. In addition, the transitions
from one phase to another also provide useful semantic insight
into the progression of the game, i.e. they coincide with the
passes, kicks etc. that constitute the play.
C. Structure of the Paper
The remaining part of the paper is organized as follows. In
Section II, the method we used for tracking and detecting
moving objects is described, using Gaussian mixtures [17] and
calibrated cameras [19]. In Section III, a method is presented
for identifying the ball from these objects. These methods
operate in the image plane from each camera separately. In
Section IV, the data from multiple cameras is integrated, to
provide a segment-based model of the ball trajectory over the
entire pitch, estimating 3D ball positions from either single
view or multiple views. In Section V, a technique is introduced
for recognizing different phases of ball motion, and for
applying phase-specific models for robust ball tracking.
Experimental results are presented in Section VI and the
conclusions are drawn in Section VII.
II. MOVING OBJECTS DETECTION AND TRACKING
To locate and track players and the soccer ball, a
multi-modal adaptive background model is utilized which
provides robust foreground detection using image differencing
[17]. This detection process is applied only to visible pitch
pixels of the appropriate color. Grouped foreground connected-
components (i.e. blobs) are tracked by a Kalman filter which
estimates 2D position, velocity and object dimensions. These
2D positions and dimensions are converted to 3D coordinates
on the pitch. Greater detail is given in the subsections below.
A. Determining Pitch Masks
Rather than process the whole image, a pitch mask is
developed to avoid processing pixels containing spectators.
This mask is defined as the intersection of the geometry-based
mask M_g and the color-based mask M_c, as shown in Figure 2.
The former constrains processing to only those pixels on the
pitch, and can be easily derived from a coordinate transform of
the position of the pitch in the ground plane to the image plane
as follows. For each image pixel p, compute its corresponding
ground-plane point P. If P lies within the pitch, then p is set to
255 in M_g, otherwise 0. Note, however, that parts of the pitch
can be occluded by foreground spectators or parts of the
stadium. Thus, a color-based mask is used to exclude these
elements from the overall pitch mask (i.e. the region to be
processed).
The hue component of the HSV color space is used to
identify the region of the background image representing the
pitch, since it is robust to shadows and other variations in the
appearance of the grass. As it is assumed that the pitch region
has an approximately uniform color and occupies the dominant
area of the background image, pixels belonging to the pitch will
contribute to the largest peak in any hue histogram. Lower and
upper hue thresholds H_1 and H_2 delimit an interval around
the position H_0 of this maximum. These thresholds are defined
as the positions at which the histogram has decreased by 90%
from the peak frequency; image pixels contributing to this
interval are included in the color-based mask M_c.
A morphological closing operation is performed on M_c to
bridge the gaps caused by the white field lines in the initial
color-based mask. Thus the final mask, M, can be generated as
follows:

  M_c = {(u, v) | H(u, v) ∈ [H_1, H_2]} ● B   (1)
  M = M_g ∩ M_c   (2)

where the morphological closing operation is denoted by ● and
B is its square structuring element of size 6×6.

(a) (b)
(c) (d)
Fig. 2. Extraction of pitch masks based on both color and geometry:
(a) Original background image, (b) Geometry-based mask of pitch, (c)
Color-based mask of pitch, and (d) Final mask obtained.
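The hue-threshold selection can be sketched as follows: find the peak bin H0 of the hue histogram, then walk outward until the counts fall to 10% of the peak (a decrease of 90%), giving the interval [H1, H2]. This is a pure-Python sketch on a toy histogram; bin width and smoothing details are assumptions, not the paper's exact procedure.

```python
def hue_interval(hist, drop=0.9):
    """Return (H1, H2): the bins around the histogram peak where counts
    have decreased by `drop` (here 90%) from the peak frequency."""
    h0 = max(range(len(hist)), key=lambda i: hist[i])  # peak bin H0
    cutoff = hist[h0] * (1.0 - drop)                   # 10% of the peak
    h1 = h0
    while h1 > 0 and hist[h1 - 1] >= cutoff:           # walk left
        h1 -= 1
    h2 = h0
    while h2 < len(hist) - 1 and hist[h2 + 1] >= cutoff:  # walk right
        h2 += 1
    return h1, h2

# Toy hue histogram with a dominant "grass" peak around bin 4.
hist = [1, 2, 3, 40, 100, 60, 30, 5, 2, 1]
print(hue_interval(hist))
```

Pixels whose hue falls inside the returned interval would then be set in the color-based mask before the morphological closing of equation (1).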
B. Detecting Moving Objects
Over the mask M detected above, foreground pixels are
located using the robust multi-modal adaptive background
model [17]. Firstly, an initial background image is determined
by a per-pixel Gaussian Mixture Model, and then the
background image is progressively updated using a running
average algorithm for efficiency.
Each per-pixel Gaussian mixture model is represented as
(μ_k^(j), σ_k^(j), ω_k^(j)), where μ_k^(j), σ_k^(j) and ω_k^(j)
are the mean, root of the trace of the covariance matrix, and
weight of the j-th distribution at frame k. The distribution which
matches each new pixel observation I_k is updated as follows:

  μ_{k+1} = (1 − α) μ_k + α I_k
  σ_{k+1}² = (1 − α) σ_k² + α (I_k − μ_k)^T (I_k − μ_k)   (3)

where α is the updating rate satisfying 0 < α < 1. For each
unmatched distribution, the parameters remain the same but its
weight decreases. The initial background image is selected as
the distribution with the greatest weight at each pixel.
Given the input image I_k, the foreground binary mask F_k
can be generated by comparing ||I_k − μ_{k−1}|| against a
threshold, i.e. 2.5 σ_k. To accelerate the process of updating
the background image, a running average algorithm is further
employed after the initial background and foreground have
been estimated:

  μ_k = [α_H I_k + (1 − α_H) μ_{k−1}] F̄_k + [α_L I_k + (1 − α_L) μ_{k−1}] F_k   (4)

where F̄_k is the complement of F_k. The use of two update
weights (where 0 < α_L < α_H < 1) ensures that the
background image is updated slowly in the presence of
foreground regions. Updating is required even when a pixel is
flagged as moving to allow the system to overcome mistakes in
the initial background estimate.
Inside these foreground masks, a set of foreground regions
is generated using connected component analysis. Each region
is represented by its centroid (r_0, c_0), area a, and bounding
box, where (r_1, c_1) and (r_2, c_2) are the top-left and
bottom-right corners of the bounding box.
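The dual-rate running average of equation (4) can be sketched per pixel as below. The rate values and the naming (fast rate for background pixels, slow rate for foreground-flagged pixels) are illustrative assumptions, not the paper's tuned parameters.

```python
def update_background(mu_prev, pixel, is_foreground,
                      alpha_h=0.05, alpha_l=0.005):
    """Dual-rate running-average update of one background pixel (cf. eq. (4)).

    Pixels outside the foreground mask blend quickly (alpha_h); pixels
    currently flagged as foreground still update, but slowly (alpha_l),
    so the model can recover from mistakes in the initial background
    estimate without absorbing moving objects.
    """
    a = alpha_l if is_foreground else alpha_h
    return a * pixel + (1.0 - a) * mu_prev

# A background pixel drifts toward the new observation much faster
# than a foreground-flagged pixel does.
bg = update_background(100.0, 120.0, is_foreground=False)
fg = update_background(100.0, 120.0, is_foreground=True)
```

Applying this per pixel (or vectorized over the whole image) reproduces the two-term blend of equation (4), since each pixel belongs to exactly one of F_k and its complement.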
C. Tracking Moving Objects
A Kalman tracker is used in the image plane to filter noisy
measurements and to split merged objects caused by frequent
occlusions of players and the ball. The state x_I and
measurement z_I are given by:

  x_I = [r_0 c_0 ṙ_0 ċ_0 Δr_1 Δc_1 Δr_2 Δc_2]^T   (5)
  z_I = [r_0 c_0 r_1 c_1 r_2 c_2]^T   (6)

where (r_0, c_0) is the centroid, (ṙ_0, ċ_0) is its velocity,
(r_1, c_1) and (r_2, c_2) are the top-left and bottom-right
corners of the bounding box respectively (such that r_1 ≤ r_2
and c_1 ≤ c_2), and (Δr_1, Δc_1) and (Δr_2, Δc_2) are the
relative positions of the two opposite corners to the centroid.
The state transition and measurement equations in the Kalman
filter are:

  x_I(k+1) = A_I x_I(k) + w_I(k)
  z_I(k) = H_I x_I(k) + v_I(k)   (7)

where w_I and v_I are the image plane process noise and
measurement noise, and A_I and H_I are the state transition
matrix and measurement matrix, respectively:

  A_I = [ I_2  T·I_2  O_2  O_2
          O_2  I_2    O_2  O_2
          O_2  O_2    I_2  O_2
          O_2  O_2    O_2  I_2 ]

  H_I = [ I_2  O_2  O_2  O_2
          I_2  O_2  I_2  O_2
          I_2  O_2  O_2  I_2 ]   (8)

In equation (8), I_2 and O_2 represent the 2×2 identity and
zero matrices; T is the time interval between frames. Further
detail on the method for data association and handling of
occlusions can be found in [18].
D. Computing Ground Plane Positions
Using Tsai's algorithm for camera calibration [19], the measurements are transformed from image co-ordinates into world co-ordinates. The pin-hole model of 3D-to-2D perspective projection is employed in [19] to estimate a total of 11 intrinsic and extrinsic camera parameters. In addition, the effective dimensions of a pixel are obtained in both the horizontal and vertical directions as two fixed intrinsic constants. These two constants are then used to calculate the world co-ordinate measurements of the objects from the detected image-plane bounding boxes. Let (x, y, z) denote the 3D object position in world co-ordinates; then x and y are estimated using the center point of the bottom line of each bounding box, and z is initialized to zero. Until Section IV, all objects are assumed to lie on the ground plane. (This assumption is usually true for players, but the ball could be anywhere on the line between that ground plane point and the camera position.) For each tracked object, a position measurement vector p_i = [x y z v_x v_y]^T and an attribute measurement vector a_i = [w h a n]^T are defined. The ground-plane velocity (v_x, v_y) is estimated from the projection of the image-plane velocity (obtained from the image-plane tracking process) onto the ground plane. Note that this ground-plane velocity is not intended to estimate the real velocity in cases where the ball is off the ground. The attributes w, h, and a are an object's width, height, and area, also measured in meters (and meters squared), and calculated by assuming the object touches the ground plane. Each object is validated before further processing provided that its size satisfies w > 0.1 m, h > 0.1 m, and a > 0.03 m². Finally, n is the longevity of the tracked object, measured in frames.
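The size validation step can be expressed as a one-line gate. A sketch follows, assuming (as reconstructed from the thresholds above) that the bounds are lower limits used to reject fragments too small to be a player or ball:

```python
def is_valid_object(w, h, a):
    """Size gate for tracked objects (Section II.D).

    w and h are the object's width and height in metres, a its area in
    square metres, all computed by assuming the object touches the
    ground plane. Objects failing the gate are dropped before the
    ball-likelihood stage.
    """
    return w > 0.1 and h > 0.1 and a > 0.03
```

For example, a ball-sized object (0.22 m × 0.22 m, 0.04 m²) passes, while a thin sliver such as a fragmented field line does not.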
III. DETECTING BALL-LIKE FEATURES
To identify ball-like features in a single-view process, each of the tracked objects is assigned a likelihood l that it represents the ball. The two elementary properties that distinguish the ball from players and other false alarms are its size and color. Three simple features are used to describe the size of the object, i.e., its width, height, and area, for which measurements in real-world units are adopted for robustness against the variable size of the ball in the image plane. A fourth feature, derived from its color appearance, measures the proportion of the object's area that is white.
To discriminate the ball from other objects, a straightforward approach is to apply fixed thresholds to these features. However, this suffers from several difficulties. Firstly, false alarms such as fragmented field lines or fragments of players (especially socks) cannot always be discriminated. Secondly, if no information is available about the height of the ball, the estimate of its dimensions may be inaccurate; for example, under the assumption that the ball is touching the ground plane, an airborne ball will appear to be a larger object. Thirdly, the image of a fast-moving ball is affected by motion blurring, rendering it larger and less white than a stationary (or slower-moving) ball.
A key observation from soccer videos is that the ball in play is nearly always moving, which suggests that velocity may be a useful additional discriminant. Thus, as field markings are stationary, the majority of these markings can be discriminated from the ball by thresholding both the size and the absolute velocity of the detected object.
Another category of false alarms is caused by a part of a player that has become temporarily disassociated from the remainder of the player. A typical cause of this phenomenon is imperfect foreground segmentation. However, such transitory artifacts do not in general persist for longer than a couple of frames, whereupon the correct representation is resumed. Therefore, this category of false alarm can be correctly discriminated by discarding all short-lived objects, i.e., those whose longevity is less than five frames.

Fig. 3. Tracked ball with ID and assigned likelihood: (a) Id=7, l=0.9; (b) l=0.0, the ball is moving out of the current camera view; (c) Id=16, l=0.9; and (d) the ball is merged with player 9; in frames #977, #990, #1044, and #1056, respectively.
Features describing the velocity and longevity of the observations are used to address the three difficulties described above. These features (derived from tracking) are employed alongside the size and color features to help discriminate the ball from other objects. The velocity feature is also useful when the size of the detected ball is overestimated, either through a motion-blur effect (proportional to the shutter duration) or a range-error effect (incorrectly assuming the object lies on the ground plane). Here, the key innovation is to allow the size threshold to vary as a function of the estimated ground-plane velocity. There is a simple rationale for the motion-blur effect: the expected area of the blurred ball is directly proportional to the image-plane speed. The range-error effect is more complicated, as the 3D trajectory of the ball may be directed towards the camera, generating zero velocity in the image plane. In general, however, it can be assumed that a ball moving rapidly in the image plane is more likely to be positioned above the ground plane, and therefore the size threshold should be increased to accommodate the consequent over-estimation of the ball size.
A standard soccer ball has a constant diameter d_0 (between 0.216 m and 0.226 m) and an area (of a great circle) a_0 of about 0.04 m². To account for the over-estimated ball size during fast movement, two thresholds for the width and height of the ball, w_0 and h_0, are defined by:

w_0 = d_0 + |v_x| T
h_0 = d_0 + |v_y| T   (9)
For robustness, the size of a valid ball candidate is required to satisfy |w − w_0| < d_0/5, |h − h_0| < d_0/5, and |a − a_0| < a_0/8 + ((|v_x| + |v_y|) T)². In addition, the proportion of white color within the object is required to be no less than 30% of the whole area. All objects with size or color outside the prescribed thresholds are assigned a likelihood of zero and excluded from further processing. Each remaining object is classed as a ball candidate and assigned an estimate of the likelihood that it represents the ball. The proposed form for this estimate, incorporating both the object's absolute velocity v_i and longevity n, is:
l_i = (v_i / v_max)(1 − e^(−n/t_0))   (10)

where v_max is the maximum absolute velocity of all the objects detected in the given camera at a given frame (including the ball, if visible, and also non-ball objects), and t_0 is a constant parameter. Thus, faster-moving objects are considered more likely to be the ball, reflecting the fact that, in the professional game, the ball normally moves faster than other objects.
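The candidate test of eq. (9) and the likelihood of eq. (10) can be sketched together. The equations follow the reconstruction given in this section; the frame interval T, the constant t0, and the default arguments are illustrative assumptions, not values stated by the authors:

```python
import math

D0, A0 = 0.22, 0.04   # nominal ball diameter (m) and great-circle area (m^2)

def is_ball_candidate(w, h, a, vx, vy, T=0.04, white_frac=1.0, n=10):
    """Velocity-adapted size gate of eq. (9), plus the 30% white-color
    test and the five-frame longevity test. T and the defaults for
    white_frac and n are illustrative."""
    w0 = D0 + abs(vx) * T                  # width threshold grows with speed
    h0 = D0 + abs(vy) * T                  # height threshold grows with speed
    size_ok = (abs(w - w0) < D0 / 5
               and abs(h - h0) < D0 / 5
               and abs(a - A0) < A0 / 8 + ((abs(vx) + abs(vy)) * T) ** 2)
    return size_ok and white_frac >= 0.3 and n >= 5

def ball_likelihood(v_obj, v_max, n, t0=10.0):
    """Eq. (10): l_i = (v_i / v_max) * (1 - exp(-n / t0)).

    v_obj is the object's absolute ground-plane speed, v_max the largest
    speed among all objects in this camera at this frame, n the longevity
    in frames; t0 = 10.0 is an illustrative value for the constant.
    """
    if v_max <= 0.0:
        return 0.0
    return (v_obj / v_max) * (1.0 - math.exp(-n / t0))
```

A stationary ball-sized, all-white object passes the gate, while a player-sized object fails on size; the likelihood saturates towards v_i / v_max as longevity grows.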
Figure 3 shows partial views of camera #1 with the detected ball at frames 977, 990, 1044, and 1056, respectively. The ball or each player is assigned a unique ID unless it is near the

Fig. 4. Thirty seconds of single-camera tracking data from camera #1 (a), and filtered results of the ball in (b) and (c), in which time t moves from left to right and the x-coordinate of the objects, c_0, is plotted up the y-axis. In (b) and (c), the most-likely ball is labeled in black; (b) is the result of filtering on appearance and velocity, and (c) is the result after temporal filtering.
Citations
Patent
04 Dec 2009
Abstract: Systems, apparatuses, and methods estimate the distance between a player and a ball by transmitting a chirp (sweep signal) to a radio tag located on the ball. During the chirp, the frequency of the transmitted signal is changed in a predetermined fashion. The radio tag doubles the transmitted frequency and returns the processed signal to a transceiver typically located on the player. The currently transmitted frequency is then compared with the received frequency to obtain a difference frequency from which an apparatus may estimate the distance. The apparatus may simultaneously receive the processed signal from the radio tag while transmitting the sweep signal.

581 citations


Patent
17 Feb 2012
Abstract: A sensor system is adapted for use with an article of footwear and includes an insert member including a first layer and a second layer, a port connected to the insert and configured for communication with an electronic module, a plurality of force and/or pressure sensors on the insert member, and a plurality of leads connecting the sensors to the port.

440 citations


Patent
11 Aug 2010
Abstract: A method for processing data includes receiving a depth map of a scene containing a humanoid form. The depth map is processed so as to identify three-dimensional (3D) connected components in the scene, each connected component including a set of the pixels that are mutually adjacent and have mutually-adjacent depth values. Separate, first and second connected components are identified as both belonging to the humanoid form, and a representation of the humanoid form is generated including both of the first and second connected components.

161 citations


Patent
28 Feb 2011
Abstract: A method for image processing includes receiving a depth image of a scene containing a human subject and receiving a color image of the scene containing the human subject. A part of a body of the subject is identified in at least one of the images. A quality of both the depth image and the color image is evaluated, and responsively to the quality, one of the images is selected to be dominant in processing of the part of the body in the images. The identified part is localized in the dominant one of the images, while using supporting data from the other one of the images.

159 citations


Journal ArticleDOI
TL;DR: A survey of soccer video analysis systems for different applications: video summarization, provision of augmented information, high-level analysis, and for each application area the computer vision methodologies, their strengths and weaknesses are analyzed.
Abstract: This paper presents a survey of soccer video analysis systems for different applications: video summarization, provision of augmented information, high-level analysis. Computer vision techniques have been adapted to be applicable in the challenging soccer context. Different semantic levels of interpretation are required according to the complexity of the corresponding applications. For each application area we analyze the computer vision methodologies, their strengths and weaknesses and we investigate whether these approaches can be applied to extensive and real time soccer video analysis.

147 citations


Cites background or methods from "Real-Time Modeling of 3-D Soccer Ba..."

  • ...In [44] multiple fixed cameras are used for real time modeling of 3D ball trajectories....

    [...]

  • ...Different ball samples in various sizes, shapes, and colors (from [44])....

    [...]

  • ...[44] Multiview tracking eight cameras Eight fixed cameras 720 576 Model-based 3D position estimation 8 seq of 5500 frames 3 s delay...

    [...]

  • ...these cases, many works propose approaches based on the evaluation of the ball trajectory [44–48,51], since the analysis of kinematic parameters allows the ball to be detected from among a set of ball candidates....

    [...]


References

Journal ArticleDOI
TL;DR: There is a natural uncertainty principle between detection and localization performance, which are the two main goals, and with this principle a single operator shape is derived which is optimal at any scale.
Abstract: This paper describes a computational approach to edge detection. The success of the approach depends on the definition of a comprehensive set of goals for the computation of edge points. These goals must be precise enough to delimit the desired behavior of the detector while making minimal assumptions about the form of the solution. We define detection and localization criteria for a class of edges, and present mathematical forms for these criteria as functionals on the operator impulse response. A third criterion is then added to ensure that the detector has only one response to a single edge. We use the criteria in numerical optimization to derive detectors for several common image features, including step edges. On specializing the analysis to step edges, we find that there is a natural uncertainty principle between detection and localization performance, which are the two main goals. With this principle we derive a single operator shape which is optimal at any scale. The optimal detector has a simple approximate implementation in which edges are marked at maxima in gradient magnitude of a Gaussian-smoothed image. We extend this simple detector using operators of several widths to cope with different signal-to-noise ratios in the image. We present a general method, called feature synthesis, for the fine-to-coarse integration of information from operators at different scales. Finally we show that step edge detector performance improves considerably as the operator point spread function is extended along the edge.

26,639 citations


"Real-Time Modeling of 3-D Soccer Ba..." refers methods in this paper

  • ...The filter uses hysteresis to process the likelihood estimates into discrete labels, in an approach similar to the Canny filter [20]....

    [...]


Book
01 Jun 1969
Abstract: Uncertainties in measurements probability distributions error analysis estimates of means and errors Monte Carlo techniques dependent and independent variables least-squares fit to a polynomial least-squares fit to an arbitrary function fitting composite peaks direct application of the maximum likelihood. Appendices: numerical methods matrices graphs and tables histograms and graphs computer routines in Pascal.

12,721 citations


Journal ArticleDOI
TL;DR: Numerical methods matrices graphs and tables histograms and graphs computer routines in Pascal and Monte Carlo techniques dependent and independent variables least-squares fit to a polynomial least-square fit to an arbitrary function fitting composite peaks direct application of the maximum likelihood.
Abstract: Uncertainties in measurements probability distributions error analysis estimates of means and errors Monte Carlo techniques dependent and independent variables least-squares fit to a polynomial least-squares fit to an arbitrary function fitting composite peaks direct application of the maximum likelihood. Appendices: numerical methods matrices graphs and tables histograms and graphs computer routines in Pascal.

10,545 citations


"Real-Time Modeling of 3-D Soccer Ba..." refers methods in this paper

  • ...If more fully determined estimates are available, then they could all be incorporated into the estimation of the trajectory based on a more general least-squares estimator [ 25 ]....

    [...]

  • ...Moreover, if more than two ball positions have been decided within a curve segment, then a least-squares calculation of the trajectory segment can be used to provide a more robust estimate [ 25 ]....

    [...]


Proceedings ArticleDOI
23 Jun 1999
TL;DR: This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model, resulting in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes.
Abstract: A common method for real-time segmentation of moving regions in image sequences involves "background subtraction", or thresholding the error between an estimate of the image without moving objects and the current image. The numerous approaches to this problem differ in the type of background model used and the procedure used to update the model. This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model. The Gaussian, distributions of the adaptive mixture model are then evaluated to determine which are most likely to result from a background process. Each pixel is classified based on whether the Gaussian distribution which represents it most effectively is considered part of the background model. This results in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. This system has been run almost continuously for 16 months, 24 hours a day, through rain and snow.

7,436 citations


"Real-Time Modeling of 3-D Soccer Ba..." refers methods in this paper

  • ...To locate and track players and the soccer ball, a multimodal adaptive background model is utilized that provides robust foreground detection using image differencing [17]....

    [...]

  • ...In Section II, the method we used for tracking and detecting moving objects is described, using Gaussian mixtures [17] and calibrated cameras [19]....

    [...]

  • ...Over the mask detected above, foreground pixels are located using the robust multimodal adaptive background model [17]....

    [...]


Journal ArticleDOI
Roger Y. Tsai
01 Aug 1987
Abstract: A new technique for three-dimensional (3D) camera calibration for machine vision metrology using off-the-shelf TV cameras and lenses is described. The two-stage technique is aimed at efficient computation of camera external position and orientation relative to object reference coordinate system as well as the effective focal length, radial lens distortion, and image scanning parameters. The two-stage technique has advantage in terms of accuracy, speed, and versatility over existing state of the art. A critical review of the state of the art is given in the beginning. A theoretical framework is established, supported by comprehensive proof in five appendixes, and may pave the way for future research on 3D robotics vision. Test results using real data are described. Both accuracy and speed are reported. The experimental results are analyzed and compared with theoretical prediction. Recent effort indicates that with slight modification, the two-stage calibration can be done in real time.

5,771 citations


Frequently Asked Questions (2)
Q1. What are the future works in this paper?

The authors model the ball trajectory as curve segments in consecutive virtual vertical planes, which can accurately approximate the real trajectory even in complex situations. Using geometric reconstruction techniques, the authors can successfully estimate 3D ball positions from a single view. 

In this paper, a real-time 3D soccer ball tracking system is proposed, using image sequences from multiple fixed cameras as input.