# Real-Time Modeling of 3-D Soccer Ball Trajectories From Multiple Fixed Cameras

TL;DR: In this paper, model-based approaches for real-time 3-D soccer ball tracking are proposed, using image sequences from multiple fixed cameras as input, and incorporating motion cues and temporal hysteresis thresholding in ball detection and employing phase-specific models to estimate ball trajectories.

Abstract: In this paper, model-based approaches for real-time 3-D soccer ball tracking are proposed, using image sequences from multiple fixed cameras as input. The main challenges include filtering false alarms, tracking through missing observations, and estimating 3-D positions from single or multiple cameras. The key innovations are: 1. incorporating motion cues and temporal hysteresis thresholding in ball detection; 2. modeling each ball trajectory as curve segments in successive virtual vertical planes so that the 3-D position of the ball can be determined from a single camera view; and 3. introducing four motion phases (rolling, flying, in possession, and out of play) and employing phase-specific models to estimate ball trajectories which enables high-level semantics applied in low-level tracking. In addition, unreliable or missing ball observations are recovered using spatio-temporal constraints and temporal filtering. The system accuracy and robustness are evaluated by comparing the estimated ball positions and phases with manual ground-truth data of real soccer sequences.

## Summary (6 min read)

### Introduction

- Model-based approaches for real-time 3D soccer ball tracking are proposed, using image sequences from multiple fixed cameras as input.
- 10, 12], similar methods cannot be extended to ball detection and tracking for several reasons.

### B. Contributions of This Work

- A system is presented for model-based 3D ball tracking from real soccer videos.
- The main contributions can be summarized as follows.
- Meanwhile, a probability measure is defined to capture the likelihood that any specific detected moving object represents the ball.
- Secondly, the 3D ball motion is modeled as a series of planar curves each residing in a vertical virtual plane (VVP), which involves geometric based vision techniques for 3D ball positioning.
- For the first two types, phase-specific models are employed to estimate ball positions in linear and parabolic trajectories, respectively.

### C. Structure of the Paper

- In Section II, the method the authors used for tracking and detecting moving objects is described, using Gaussian mixtures [17] and calibrated cameras [19].
- In Section III, a method is presented for identifying the ball from these objects.
- These methods operate in the image plane from each camera separately.
- In Section IV, the data from multiple cameras is integrated, to provide a segment-based model of the ball trajectory over the entire pitch, estimating 3D ball positions from either single view or multiple views.
- Experimental results are presented in Section VI and the conclusions are drawn in Section VII.

### II. MOVING OBJECTS DETECTION AND TRACKING

- To locate and track players and the soccer ball, a multi-modal adaptive background model is utilized which provides robust foreground detection using image differencing [17].
- This detection process is applied only to visible pitch pixels of the appropriate color.
- Grouped foreground connected components (i.e. blobs) are tracked by a Kalman filter which estimates 2D position, velocity and object dimensions.
- Greater detail is given in the subsections below.

### A. Determining Pitch Masks

- Rather than process the whole image, a pitch mask is developed to avoid processing pixels containing spectators.
- The former constrains processing to only those pixels on the pitch, and can be easily derived from a coordinate transform of the position of the pitch in the ground plane to the image plane as follows.
- Note, however, that parts of the pitch can be occluded by foreground spectators or parts of the stadium.
- The hue component of the HSV color space is used to identify the region of the background image representing the pitch, since it is robust to shadows and other variations in the appearance of the grass.
- The bounds of the pitch hue interval are defined as the positions at which the histogram has decreased by 90% of the peak frequency; image pixels contributing to this interval are included in the color-based mask $M_c$.
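The hue-interval rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `hue_interval` and `colour_mask`, the binned-histogram input, and the decision to ignore hue wrap-around (assuming the grass hue lies well inside the histogram range) are all assumptions.

```python
def hue_interval(hue_hist, drop=0.9):
    """Return (low, high) bin indices of the interval around the peak.

    The interval grows outwards from the peak bin until the bin count
    has fallen by `drop` (90% by default) of the peak frequency.
    Hue wrap-around is ignored (assumption for this sketch).
    """
    peak = max(range(len(hue_hist)), key=lambda i: hue_hist[i])
    cutoff = hue_hist[peak] * (1.0 - drop)
    low = peak
    while low > 0 and hue_hist[low - 1] > cutoff:
        low -= 1
    high = peak
    while high < len(hue_hist) - 1 and hue_hist[high + 1] > cutoff:
        high += 1
    return low, high


def colour_mask(hue_image, low, high):
    """Mark pixels whose hue bin lies inside [low, high] as pitch (True)."""
    return [[low <= h <= high for h in row] for row in hue_image]
```

In use, the histogram would be built from the background image, and the resulting mask combined with the geometry-based pitch mask before foreground detection.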

### B. Detecting Moving Objects

- Over the mask M detected above, foreground pixels are located using the robust multi-modal adaptive background model [17].
- Firstly, an initial background image is determined by a per-pixel Gaussian Mixture Model, and then the background image is progressively updated using a running average algorithm for efficiency.
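- The distribution which matches each new pixel observation $I_k$ is updated as follows: $\mu_k = (1-\rho)\,\mu_{k-1} + \rho\,I_k$ and $\sigma_k^2 = (1-\rho)\,\sigma_{k-1}^2 + \rho\,(I_k-\mu_k)^T(I_k-\mu_k)$, where $\rho$ is the update rate.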
- For each unmatched distribution, the parameters remain the same but its weight decreases.
- Inside these foreground masks, a set of foreground regions are generated using connected component analysis.
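As a rough illustration of the per-pixel update described above, the sketch below handles a single grey-level pixel. It assumes the standard adaptive mixture update of [17]; the names `update_matched` and `update_weights`, the scalar (rather than colour-vector) pixel, and the rates `rho` and `alpha` are assumptions made for the example.

```python
def update_matched(mu, var, pixel, rho):
    """Update mean and variance of the distribution matched by `pixel`."""
    mu_new = (1.0 - rho) * mu + rho * pixel
    diff = pixel - mu_new
    var_new = (1.0 - rho) * var + rho * diff * diff
    return mu_new, var_new


def update_weights(weights, matched_index, alpha):
    """Raise the matched weight and decay the unmatched ones.

    Unmatched distributions keep their parameters; only their weight
    decreases. The result is renormalised to sum to one.
    """
    new = [
        (1.0 - alpha) * w + (alpha if i == matched_index else 0.0)
        for i, w in enumerate(weights)
    ]
    total = sum(new)
    return [w / total for w in new]
```

A pixel would then be flagged as foreground when it matches none of the distributions currently regarded as background.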

### C. Tracking Moving Objects

- A Kalman tracker is used in the image plane to filter noisy measurements and split merged objects because of frequent occlusions of players and the ball.
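- With state $\mathbf{x}_I$ and measurement $\mathbf{z}_I$, the state transition and measurement equations in the Kalman filter are $\mathbf{x}_I(k) = \mathbf{A}_I\,\mathbf{x}_I(k-1) + \mathbf{w}_I(k)$ and $\mathbf{z}_I(k) = \mathbf{H}_I\,\mathbf{x}_I(k) + \mathbf{v}_I(k)$ (7), where $\mathbf{w}_I$ and $\mathbf{v}_I$ are the image plane process noise and measurement noise, and $\mathbf{A}_I$ and $\mathbf{H}_I$ are the state transition matrix and measurement matrix, respectively.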
- Further detail on the method for data association and handling of occlusions can be found in [18].
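To make the predict/update cycle concrete, here is a deliberately reduced sketch: a constant-velocity Kalman filter for one image axis, with state [position, velocity] and a position-only measurement. The tracker described above also estimates object dimensions and works in 2D; the function name `kalman_step` and the noise levels `q` and `r` are assumptions for illustration only.

```python
def kalman_step(x, P, z, dt=1.0, q=1e-2, r=1.0):
    """One predict/update cycle.

    x: [position, velocity]; P: 2x2 covariance (nested lists);
    z: measured position; q, r: process and measurement noise variances.
    """
    # Predict with A = [[1, dt], [0, 1]] and Q = q * I.
    pos = x[0] + dt * x[1]
    vel = x[1]
    p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q
    # Update with H = [1, 0]: only the position is observed.
    s = p00 + r                # innovation variance
    k0, k1 = p00 / s, p10 / s  # Kalman gain
    resid = z - pos
    x_new = [pos + k0 * resid, vel + k1 * resid]
    P_new = [
        [(1.0 - k0) * p00, (1.0 - k0) * p01],
        [p10 - k1 * p00, p11 - k1 * p01],
    ]
    return x_new, P_new
```

When a blob is occluded, the update half can simply be skipped for that frame, so the prediction carries the track forward until a measurement is available again.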

### D. Computing Ground Plane Positions

- Using Tsai's algorithm for camera calibration [19], the measurements are transformed from image co-ordinates into world co-ordinates.
- The pin-hole model of 3D-2D perspective projection is employed in [19] to estimate a total of 11 intrinsic and extrinsic camera parameters.
- In addition, the effective pixel dimensions in the horizontal and vertical directions are obtained as two fixed intrinsic constants.
- (This assumption is usually true for players, but the ball could be anywhere on the line between that ground plane point and the camera position).
- For each tracked object, a position and attribute measurement vector is defined as $\mathbf{p}_i = [x\ y\ z\ v_x\ v_y]^T$ and $\mathbf{a}_i = [w\ h\ a\ n]^T$.

### III. DETECTING BALL-LIKE FEATURES

- The two elementary properties to distinguish the ball from players and other false alarms are its size and color.
- In general, it can be assumed that a ball moving rapidly in the image plane is more likely to be positioned above the ground plane, and therefore the size threshold should be increased to accommodate the consequent over-estimation of the ball size.
- In addition, the proportion of white color within the object is required to be no less than 30% of the whole area.
- Candidates with a likelihood above $h_1$ are unequivocally designated a "ball" label, and candidates with a likelihood below $h_3$ are unequivocally classified as "not ball" (i.e. false alarms).
- Application of the temporal filter successfully locates the ball among these various candidates.
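The temporal hysteresis idea can be sketched for a single candidate track as below. The treatment of in-between likelihoods is an assumption borrowed from Canny-style hysteresis (an intermediate frame is accepted only if it is temporally connected to an accepted frame); the function name and the threshold names `h_high` and `h_low` are hypothetical.

```python
def hysteresis_labels(likelihoods, h_high, h_low):
    """Label each frame of a track as ball (True) or not ball (False).

    Frames above h_high are accepted outright, frames below h_low are
    rejected outright, and frames in between are accepted only when
    they connect, through other accepted frames, to a strong frame.
    """
    n = len(likelihoods)
    labels = [p > h_high for p in likelihoods]
    # Forward sweep: grow accepted runs towards later frames.
    for i in range(1, n):
        if not labels[i] and likelihoods[i] >= h_low and labels[i - 1]:
            labels[i] = True
    # Backward sweep: grow accepted runs towards earlier frames.
    for i in range(n - 2, -1, -1):
        if not labels[i] and likelihoods[i] >= h_low and labels[i + 1]:
            labels[i] = True
    return labels
```

The backward sweep needs future frames, so a filter of this kind trades some latency for robustness.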

### IV. MODEL BASED 3D POSITION ESTIMATION IN SINGLE AND MULTIPLE VIEWS

- The detection results of the ball from all single views are integrated for estimation of 3D position.
- Otherwise, the 2D image position can only provide constraints for the 3D line on which, somewhere, the ball is located.
- After a segment-based model of the ball motion is presented, two methods are provided for determining 3D ball positions.
- The first method is for cases in which the ball is detected from only one camera: the instant at which the ball bounces on the ground is detected, and the height of the ball at that instant is taken as zero, which fixes its 3D position.
- The second is for cases in which the ball is visible from at least two cameras, so that the multiple observations can be integrated.

### A. The Ball Motion Model

- During a soccer game, the ball is moving regularly from one place to another.
- In a special case when the ball is rolling on the ground, the curve will become a straight line.
- The complete ball trajectory can be modeled as a sequence of adjacent planar curve segments.
- While beyond the scope of this paper, if the ball is struck to impart significant spin about an axis, then it will "swerve" in the air and the assumption that the ball travels in a vertical plane is invalid, although the "swerve" may be approximated by several segments, each defined by a vertical plane.
- These estimated 3D ball positions are described as fully determined estimates, in contrast to most observations, which are only determined up to a line passing through the camera focal point.

### B. Fully Determined Estimates from a Single View

- From a single camera view, the strategy adopted for determining a 3D ball position is to detect an occasion on which the ball bounces off some other object: a player, the ground or a goal-post.
- If the height at which the bounce occurs can be estimated, then this height, together with its 2D image location, completely determines the 3D ball position at this time.
- Then, the height of the ball position is estimated as zero if there are no players or other objects near the ball.
- It can be assumed that the ball is two meters above the ground plane when it strikes a player's head.
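Once a height is assumed, the 3D position follows from intersecting the projection line with a horizontal plane. The sketch below assumes the line is given by the camera position and the point where the observation projects onto the ground plane; the function name and argument layout are hypothetical.

```python
def point_on_ray_at_height(camera, ground_point, height):
    """Intersect the camera-to-ground projection line with z = height.

    camera: (x, y, z) position of the camera, with z > 0.
    ground_point: (x, y) where the observation meets the ground plane.
    height: assumed ball height, e.g. 0.0 for a bounce on the ground.
    """
    cx, cy, cz = camera
    gx, gy = ground_point
    if not 0.0 <= height < cz:
        raise ValueError("height must lie between the ground and the camera")
    # Fraction of the way from the camera down to the ground point.
    s = (cz - height) / cz
    return (cx + s * (gx - cx), cy + s * (gy - cy), height)
```

With `height=0.0` the function returns the ground point itself; with the two-meter header assumption it moves the estimate back along the line towards the camera.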

### C. Fully Determined Estimates from Multiple Views

- When a ball is observed in multiple cameras, there are multiple projection lines from each camera position through the corresponding observation (which, in this application, can be terminated at the ground plane).
- False observations may exist which will lead to incorrect solutions.
- Some false estimates can be generated from the mis-association of the ball (in one camera) and e.g. some background clutter (from another camera).
- When the different measurement covariances for $p_1$ and $p_2$ are considered, the distances from $b$ to $p_1$ and $p_2$ are changed into Mahalanobis distances.
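For two cameras, one simple way to combine the projection lines is to take the midpoint of the shortest segment joining them. The sketch below covers only this equal-weight Euclidean case (it does not use measurement covariances or Mahalanobis distances); the function name is hypothetical, and the returned gap could be thresholded to reject mis-associated observations.

```python
import math


def closest_point_between_lines(c1, g1, c2, g2):
    """Return (midpoint, gap) for two 3D lines c1->g1 and c2->g2.

    Each line runs from a camera position c to the ground-plane point g
    of its observation. `gap` is the shortest distance between the lines.
    """
    d1 = [g1[i] - c1[i] for i in range(3)]
    d2 = [g2[i] - c2[i] for i in range(3)]
    w = [c1[i] - c2[i] for i in range(3)]
    a = sum(u * u for u in d1)
    b = sum(u * v for u, v in zip(d1, d2))
    c = sum(v * v for v in d2)
    d = sum(u * v for u, v in zip(d1, w))
    e = sum(u * v for u, v in zip(d2, w))
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("projection lines are (nearly) parallel")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [c1[i] + s * d1[i] for i in range(3)]
    p2 = [c2[i] + t * d2[i] for i in range(3)]
    midpoint = tuple((p1[i] + p2[i]) / 2.0 for i in range(3))
    return midpoint, math.dist(p1, p2)
```

A large gap suggests that the two observations do not belong to the same object, for example a ball in one view paired with background clutter in another.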

### E. Estimation of Missed or Uncertain Ball Positions

- For those frames without ball observations in any single view or with ball observations of lower likelihood, i.e. less than a given threshold, the 3D ball positions are estimated by using polynomial interpolation in a curve on the corresponding vertical planes (see Section V).
- Each curve is calculated from two fully determined estimates.
- If more fully determined estimates are available, then they could all be incorporated into the estimation of the trajectory based on a more general least squares estimator [25].
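As an illustration of filling in positions between two fully determined estimates, the sketch below handles a flying segment: horizontal motion is linear and the height follows a parabola. Fixing the quadratic coefficient from gravity, the constant `G`, seconds as the time unit, and the function name are all assumptions for this example.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


def interpolate_flight(t1, p1, t2, p2, t):
    """Estimate the 3D ball position at time t on a flying segment.

    (t1, p1) and (t2, p2) are two fully determined estimates, with
    times in seconds and positions as (x, y, z) in meters.
    """
    dt = t2 - t1
    tau = t - t1
    vx = (p2[0] - p1[0]) / dt
    vy = (p2[1] - p1[1]) / dt
    # Initial vertical speed chosen so the parabola passes through p2.
    vz = (p2[2] - p1[2]) / dt + 0.5 * G * dt
    return (
        p1[0] + vx * tau,
        p1[1] + vy * tau,
        p1[2] + vz * tau - 0.5 * G * tau * tau,
    )
```

For a rolling segment the same function could be used with the vertical terms dropped, since the curve degenerates to a straight line on the ground.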

### A. Four Phases of Ball Motion

- It is proposed to model the ball motion at each instant into four phases, namely rolling (R), flying (F), in-possession (P) and out-of-play (O).
- A different tracking model is applicable to each phase, and furthermore the designation also provides a useful insight into the semantic progression of the game.
- Although other semantic events have been analyzed for soccer video understanding [21-24], that work focuses on players' motion in a broadcast context, and phase transitions in the ball trajectory have not been discussed.
- This is because in-possession phases act as special periods that initialize other phases (such as rolling or flying), i.e. literally kicking the ball off in a particular direction.
- Furthermore, the pattern of play is punctuated by periods when the ball is out-of-play, e.g. caused by fouls, ball crossing touchline, off-side or in-possession by the goal-keeper.

### B. Estimating Motion Phases

- Given observations of the ball from separate cameras and height cues obtained as described in Section IV, what follows is the estimation of the current ball phase.
- Prior to this stage, at each frame there is at most one estimate of ball position from each of the camera views, and each estimate is assigned a measure of the likelihood that it represents the ball.
- A "soft" classification [26] of the four phases is introduced, which is then input into a decision process to determine the final estimate of the phase.
- Smooth functions are chosen to provide a measure, bounded between 0 and 1, of the membership of each motion phase.
- For each of the in-play phases, a specific model is then employed for robust trajectory estimation below.
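The soft classification can be pictured with smooth membership functions such as sigmoids. Everything specific in the sketch below is hypothetical and not taken from the paper: the choice of cues (ball height, distance to the nearest player, an on-pitch flag), the sigmoid centres and scales, and the simple maximum-membership decision.

```python
import math


def sigmoid(x, centre, scale):
    """Smooth step from 0 to 1 around `centre`."""
    return 1.0 / (1.0 + math.exp(-(x - centre) / scale))


def phase_memberships(height, dist_to_player, on_pitch):
    """Return memberships in [0, 1] for phases R, F, P and O."""
    if not on_pitch:
        return {"R": 0.0, "F": 0.0, "P": 0.0, "O": 1.0}
    flying = sigmoid(height, 0.5, 0.1)               # well above the ground
    near = 1.0 - sigmoid(dist_to_player, 1.5, 0.3)   # close to a player
    return {
        "R": (1.0 - flying) * (1.0 - near),
        "F": flying,
        "P": (1.0 - flying) * near,
        "O": 0.0,
    }


def decide_phase(memberships):
    """Pick the phase with the largest membership."""
    return max(memberships, key=memberships.get)
```

A real decision process would also look at neighbouring frames, since a phase rarely lasts for a single frame.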

### C. Phase-specific Trajectory Estimation

- Finally, in this section, the three different in-play models of ball motion are described, starting with the flying trajectory.
- Disregarding air friction, the velocity parallel to the ground plane is constant and thus the ball follows a single parabolic trajectory.
- Disregarding all friction, $x(t)$ and $y(t)$ satisfy the same constant-velocity equations whether the ball is rolling or flying: $x(t) = x_0 + v_x t$ and $y(t) = y_0 + v_y t$.

### A. System Architecture

- The proposed system was tested on data from matches played at Fulham Football Club, U.K., in the 2001 Premiership season, captured by eight fixed cameras.
- The 3D ball trajectory is visualized along with tracked players on a virtual playfield.
- The multi-view tracker orchestrates the process by which the (single-view) Feature Servers generate their feature results.
- Eight cameras were statically mounted around the stadium as described in Figure 8.
- The white balance was set to automatic on all cameras.

### B. Data Preparation and Results

- The proposed model has been tested in several sequences with up to 8 cameras, and each sequence has over 5500 frames.
- Then, all the ball candidates detected from 8 sequences are integrated for multi-view tracking of the ball and 3D positioning.
- Then, its 3D position is estimated by using multi-view geometry constraints.
- For the frames between two GT frames, the estimated GT positions are linearly interpolated.
- The first is the distance (in meters) between estimated and ground-truth (GT) ball positions, in which only 2D distance in x-y plane is used.
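The position metric described above can be sketched as follows: ground-truth positions are linearly interpolated between annotated frames, and the error is the x-y distance to the estimate. The data layout (a sorted list of `(frame, (x, y))` pairs with distinct frame indices) and the function names are assumptions.

```python
import math


def interpolate_gt(key_frames, frame):
    """Linearly interpolate the ground-truth (x, y) at `frame`.

    key_frames: list of (frame_index, (x, y)) sorted by frame index.
    """
    for (f0, p0), (f1, p1) in zip(key_frames, key_frames[1:]):
        if f0 <= frame <= f1:
            a = (frame - f0) / (f1 - f0)
            return (p0[0] + a * (p1[0] - p0[0]), p0[1] + a * (p1[1] - p0[1]))
    raise ValueError("frame lies outside the annotated range")


def ground_plane_error(estimate, gt_xy):
    """Distance in meters in the x-y plane; the height is ignored."""
    return math.hypot(estimate[0] - gt_xy[0], estimate[1] - gt_xy[1])
```

For example, an estimate of (8, 14, 3) against an interpolated ground truth of (5, 10) gives an error of 5 meters, regardless of the estimated height.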

### C. Evaluation of Tracking Accuracy versus Latency

- In eight testing sequences of 5500 frames each, 3D ball positions are estimated in about 3720 frames.
- The ground-plane errors among calibration projections are estimated to be between 0.1 and 2.5 meters, depending on the distance of the ground point to the cameras.
- Estimated ball positions are shown as a magenta trajectory.
- From this table it can be observed that, without temporal filtering, only 34.5% of ball positions can be recovered.

### D. Evaluation of Phase Transition Accuracy

- Figure 11 illustrates a complete 3D trajectory history (ground plane projection) from frame 0 to 954 and its corresponding phase transitions.
- Secondly, about 25% of rolling and 13% of in-possession balls are mistaken for each other, which happens when a rolling ball cannot be observed in a crowd or an in-possession ball is rolling near the player who possesses it.
- This misjudgment affects the accuracy of the ground-truth as well as the estimate from the proposed method.
- Heights below $z_f$ will not be recognized correctly.
- It is worth noting that there are some phase transitions missing from the estimated trajectory.

### E. System Limitations

- As discussed above, the system has two main drawbacks, in tracking accuracy and in phase transition accuracy, both owing to severe occlusions or insufficient observations.
- In principle, most of these problems may be resolved by adding cameras, even ones capturing images from above the pitch.
- Occlusions are still unavoidable in the soccer context, which constrains the overall recovery rate and accuracy.
- Moreover, the system ignores air friction and cannot model some complex movements of the ball, such as the "swerve", and this may be an interesting topic for further investigation.

### VII. CONCLUSIONS

- A method has been described for real-time 3D trajectory estimation of the ball in a soccer game.
- In the proposed system, video data is captured from multiple fixed and calibrated cameras.
- Temporal filtering of the ball likelihood also proved essential for robust ball detection and tracking.
- One interesting feature of the approach is that it uses high-level phase transition information to aid low-level tracking.
- Through recognition of the four phases, phase-specific models are successfully applied in estimating 3D position of the ball.


##### Citations


### Cites background or methods from "Real-Time Modeling of 3-D Soccer Ba..."

...In [44] multiple fixed cameras are used for real time modeling of 3D ball trajectories....

[...]

...Different ball samples in various sizes, shapes, and colors (from [44])....

[...]

...[44] Multiview tracking eight cameras Eight fixed cameras 720 × 576 Model-based 3D position estimation 8 seq of 5500 frames 3 s delay...

[...]

...these cases, many works propose approaches based on the evaluation of the ball trajectory [44–48,51] since the analysis of kinematic parameters allows the ball detected from among a set of ball candidates....

[...]

##### References

### "Real-Time Modeling of 3-D Soccer Ba..." refers methods in this paper

...The filter uses hysteresis to process the likelihood estimates into discrete labels, in an approach similar to the Canny filter [20]....

[...]

### "Real-Time Modeling of 3-D Soccer Ba..." refers methods in this paper

...If more fully determined estimates are available, then they could all be incorporated into the estimation of the trajectory based on a more general least-squares estimator [ 25 ]....

[...]

...Moreover, if more than two ball positions have been decided within a curve segment, then a least-squares calculation of the trajectory segment can be used to provide a more robust estimate [ 25 ]....

[...]

### "Real-Time Modeling of 3-D Soccer Ba..." refers methods in this paper

...To locate and track players and the soccer ball, a multimodal adaptive background model is utilized that provides robust foreground detection using image differencing [17]....

[...]

...In Section II, the method we used for tracking and detecting moving objects is described, using Gaussian mixtures [17] and calibrated cameras [19]....

[...]

...Over the mask detected above, foreground pixels are located using the robust multimodal adaptive background model [17]....

[...]

##### Frequently Asked Questions (2)

###### Q2. What are the contributions in this paper?

In this paper, a real-time 3D soccer ball tracking system is proposed, using image sequences from multiple fixed cameras as input.