TL;DR: In this paper, model-based approaches for real-time 3-D soccer ball tracking are proposed, using image sequences from multiple fixed cameras as input, and incorporating motion cues and temporal hysteresis thresholding in ball detection and employing phase-specific models to estimate ball trajectories.
Abstract: In this paper, model-based approaches for real-time 3-D soccer ball tracking are proposed, using image sequences from multiple fixed cameras as input. The main challenges include filtering false alarms, tracking through missing observations, and estimating 3-D positions from single or multiple cameras. The key innovations are: 1. incorporating motion cues and temporal hysteresis thresholding in ball detection; 2. modeling each ball trajectory as curve segments in successive virtual vertical planes so that the 3-D position of the ball can be determined from a single camera view; and 3. introducing four motion phases (rolling, flying, in possession, and out of play) and employing phase-specific models to estimate ball trajectories, which enables high-level semantics to be applied in low-level tracking. In addition, unreliable or missing ball observations are recovered using spatio-temporal constraints and temporal filtering. The system accuracy and robustness are evaluated by comparing the estimated ball positions and phases with manual ground-truth data of real soccer sequences.
Model-based approaches for real-time 3D soccer ball tracking are proposed, using image sequences from multiple fixed cameras as input.
10, 12], similar methods cannot be extended to ball detection and tracking for several reasons.
A. Related Work
Generally, TV broadcast cameras or fixed-cameras around the stadium are the two usual sources of soccer image streams.
While TV imagery generally provides high resolution data of the ball in the image centre, the complex camera movements and partial views of the field make it hard to obtain accurate camera parameters for on-field ball positioning.
Tong et al. [5] employed indirect ball detection by eliminating non-ball regions using color and shape constraints.
Using soccer sequences from fixed cameras, usually there are two steps for the estimation and tracking of 3D ball positions.
These are unlikely to be robust as the shadow positions depend more on light source positions than on camera projections.
B. Contributions of This Work
A system is presented for model-based 3D ball tracking from real soccer videos.
The main contributions can be summarized as follows.
Meanwhile, a probability measure is defined to capture the likelihood that any specific detected moving object represents the ball.
Secondly, the 3D ball motion is modeled as a series of planar curves each residing in a vertical virtual plane (VVP), which involves geometric based vision techniques for 3D ball positioning.
For the first two types, phase-specific models are employed to estimate ball positions in linear and parabolic trajectories, respectively.
C. Structure of the Paper
In Section II, the method the authors used for tracking and detecting moving objects is described, using Gaussian mixtures [17] and calibrated cameras [19].
In Section III, a method is presented for identifying the ball from these objects.
These methods operate in the image plane from each camera separately.
In Section IV, the data from multiple cameras is integrated, to provide a segment-based model of the ball trajectory over the entire pitch, estimating 3D ball positions from either single view or multiple views.
Experimental results are presented in Section VI and the conclusions are drawn in Section VII.
II. MOVING OBJECTS DETECTION AND TRACKING
To locate and track players and the soccer ball, a multi-modal adaptive background model is utilized which provides robust foreground detection using image differencing [17].
This detection process is applied only to visible pitch pixels of the appropriate color.
Grouped foreground connected components (i.e. blobs) are tracked by a Kalman filter which estimates 2D position, velocity and object dimensions.
Greater detail is given in the subsections below.
A. Determining Pitch Masks
Rather than process the whole image, a pitch mask is developed to avoid processing pixels containing spectators.
The former constrains processing to only those pixels on the pitch, and can be easily derived from a coordinate transform of the position of the pitch in the ground plane to the image plane as follows.
Note, however, that parts of the pitch can be occluded by foreground spectators or parts of the stadium.
The hue component of the HSV color space is used to identify the region of the background image representing the pitch, since it is robust to shadows and other variations in the appearance of the grass.
The interval bounds are defined as the positions at which the histogram has decreased by 90% of the peak frequency; image pixels contributing to this interval are included in the color-based mask $M_c$.
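The hue-interval selection can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an integer hue channel in a NumPy array, reads "decreased by 90% of the peak frequency" as the point where the bin count falls below 10% of the peak, and ignores hue wrap-around. The names `hue_interval` and `color_mask` and the 180-bin range are assumptions.

```python
import numpy as np

def hue_interval(hue, n_bins=180, drop=0.9):
    """Return (low, high) hue bins bracketing the dominant histogram peak.

    hue  : integer array of hue values in [0, n_bins) (assumed range)
    drop : fraction of the peak by which the count must fall (0.9 -> 10% of peak)
    """
    hist = np.bincount(hue.ravel(), minlength=n_bins)
    peak = int(np.argmax(hist))
    cutoff = (1.0 - drop) * hist[peak]
    # Walk outwards from the peak until the count drops below the cutoff.
    low = peak
    while low > 0 and hist[low - 1] >= cutoff:
        low -= 1
    high = peak
    while high < n_bins - 1 and hist[high + 1] >= cutoff:
        high += 1
    return low, high

def color_mask(hue, n_bins=180, drop=0.9):
    """Boolean mask of pixels whose hue lies inside the dominant interval."""
    low, high = hue_interval(hue, n_bins, drop)
    return (hue >= low) & (hue <= high)
```

In practice such a mask would be combined with the geometry-based mask so that only pitch-colored pixels inside the pitch boundary are processed.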
B. Detecting Moving Objects
Over the mask M detected above, foreground pixels are located using the robust multi-modal adaptive background model [17].
Firstly, an initial background image is determined by a per-pixel Gaussian Mixture Model, and then the background image is progressively updated using a running average algorithm for efficiency.
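The distribution which matches each new pixel observation $\mathbf{I}_k$ is updated as follows, with $\rho$ the learning rate:
$$\boldsymbol{\mu}_k = (1-\rho)\,\boldsymbol{\mu}_{k-1} + \rho\,\mathbf{I}_k, \qquad \sigma_k^2 = (1-\rho)\,\sigma_{k-1}^2 + \rho\,(\mathbf{I}_k - \boldsymbol{\mu}_k)^T(\mathbf{I}_k - \boldsymbol{\mu}_k).$$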
For each unmatched distribution, the parameters remain the same but its weight decreases.
Inside these foreground masks, a set of foreground regions are generated using connected component analysis.
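The running update of a matched distribution, and the weight decay of an unmatched one, might look like the sketch below. It assumes the standard mixture-of-Gaussians update of [17] with a scalar variance per distribution. The names `rho` and `alpha` for the learning rates, and both function names, are assumptions rather than symbols confirmed by the source.

```python
import numpy as np

def update_matched(mu, var, pixel, rho):
    """Running update of mean and variance for the matched distribution."""
    pixel = np.asarray(pixel, dtype=float)
    mu_new = (1.0 - rho) * mu + rho * pixel
    diff = pixel - mu_new  # deviation from the updated mean
    var_new = (1.0 - rho) * var + rho * float(diff @ diff)
    return mu_new, var_new

def decay_unmatched_weight(weight, alpha):
    """Unmatched distributions keep their parameters; only the weight shrinks."""
    return (1.0 - alpha) * weight
```

A small `rho` makes the background adapt slowly, which helps keep a briefly stationary ball or player from being absorbed into the background.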
C. Tracking Moving Objects
A Kalman tracker is used in the image plane to filter noisy measurements and split merged objects because of frequent occlusions of players and the ball.
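The state transition and measurement equations in the Kalman filter, for the state $\mathbf{x}_I$ and measurement $\mathbf{z}_I$, are:
$$\mathbf{x}_I(k+1) = \mathbf{A}_I\,\mathbf{x}_I(k) + \mathbf{w}_I(k), \qquad \mathbf{z}_I(k) = \mathbf{H}_I\,\mathbf{x}_I(k) + \mathbf{v}_I(k) \quad (7)$$
where $\mathbf{w}_I$ and $\mathbf{v}_I$ are the image plane process noise and measurement noise, and $\mathbf{A}_I$ and $\mathbf{H}_I$ are the state transition matrix and measurement matrix, respectively.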
Further detail on the method for data association and handling of occlusions can be found in [18].
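A minimal sketch of one predict/update cycle of such an image-plane tracker is given below. The state layout `[x, y, vx, vy, w, h]`, the constant-velocity transition, and the function names are illustrative assumptions; the source only states that 2D position, velocity and object dimensions are estimated. Data association and occlusion handling are not covered.

```python
import numpy as np

def constant_velocity_model(dt=1.0):
    """Transition matrix A and measurement matrix H for state [x, y, vx, vy, w, h]."""
    A = np.eye(6)
    A[0, 2] = dt  # x advances by vx * dt
    A[1, 3] = dt  # y advances by vy * dt
    H = np.zeros((4, 6))
    H[0, 0] = H[1, 1] = 1.0  # position is measured
    H[2, 4] = H[3, 5] = 1.0  # width and height are measured
    return A, H

def kalman_step(x, P, z, A, H, Q, R):
    """One predict/update cycle; Q and R are process and measurement covariances."""
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

When a blob is missing in a frame, the prediction `A @ x` alone can carry the track forward until a measurement is associated again.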
D. Computing Ground Plane Positions
Using Tsai's algorithm for camera calibration [19], the measurements are transformed from image co-ordinates into world co-ordinates.
Basically, the pin-hole model of 3D-2D perspective projection is employed in [19] to estimate a total of 11 intrinsic and extrinsic camera parameters.
In addition, the effective dimensions of a pixel in the image are obtained in both horizontal and vertical directions as two fixed intrinsic constants.
(This assumption is usually true for players, but the ball could be anywhere on the line between that ground plane point and the camera position).
For each tracked object, position and attribute measurement vectors are defined as $\mathbf{p}_i = [x\ y\ z\ v_x\ v_y]^T$ and $\mathbf{a}_i = [w\ h\ a\ n]^T$.
III. DETECTING BALL-LIKE FEATURES
The two elementary properties to distinguish the ball from players and other false alarms are its size and color.
In general it can be assumed that a ball moving rapidly in the image plane is more likely to be positioned above the ground plane, and therefore the size threshold should be increased to accommodate the consequent over-estimation of the ball size.
In addition, the proportion of white color within the object is required to be no less than 30% of the whole area.
Candidates with a likelihood above $h_1$ are unequivocally designated a 'ball' label, and candidates with a likelihood below $h_3$ are unequivocally classified as 'not ball' (i.e. false alarms).
Application of the temporal filter successfully locates the ball among these various candidates.
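Temporal hysteresis thresholding of a per-frame likelihood sequence can be sketched as below. This is an illustrative one-dimensional analogue of Canny-style hysteresis, not the paper's exact filter: frames above the upper threshold are accepted outright, and frames between the two thresholds are accepted only when they are temporally connected to an accepted frame. The function name and the parameters `h_high` and `h_low` are assumptions.

```python
def hysteresis_labels(likelihoods, h_high, h_low):
    """Label each frame True ('ball') or False ('not ball') with temporal hysteresis."""
    n = len(likelihoods)
    labels = [lk > h_high for lk in likelihoods]
    # Forward pass: extend accepted runs to later frames above the lower threshold.
    for i in range(1, n):
        if not labels[i] and labels[i - 1] and likelihoods[i] >= h_low:
            labels[i] = True
    # Backward pass: extend accepted runs to earlier frames above the lower threshold.
    for i in range(n - 2, -1, -1):
        if not labels[i] and labels[i + 1] and likelihoods[i] >= h_low:
            labels[i] = True
    return labels
```

For example, with `h_high=0.8` and `h_low=0.4`, an isolated run of 0.5 values stays rejected, while 0.5 and 0.6 values adjacent to a 0.9 frame are accepted.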
IV. MODEL BASED 3D POSITION ESTIMATION IN SINGLE AND
The detection results of the ball from all single views are integrated for estimation of 3D position.
Otherwise, the 2D image position can only provide constraints for the 3D line on which, somewhere, the ball is located.
After a segment-based model of the ball motion is presented, two methods are provided for determining 3D ball positions.
The first method is for cases in which the ball is detected from only one camera: the instant when the ball bounces on the ground is detected and the corresponding height is estimated as zero.
The second is for cases in which the ball is visible from at least two cameras, so the multiple observations are integrated.
A. The Ball Motion Model
During a soccer game, the ball is moving regularly from one place to another.
In a special case when the ball is rolling on the ground, the curve will become a straight line.
The complete ball trajectory can be modeled as a sequence of adjacent planar curve segments.
While beyond the scope of this paper, if the ball is struck to impart significant spin about an axis, then it will 'swerve' in the air and the assumption that the ball travels in a vertical plane is invalid, although the 'swerve' may be approximated by several segments, each defined by a vertical plane.
These estimated 3D ball positions are described as fully determined estimates, in contrast to most observations, which are only determined up to a line passing through the camera focal point.
B. Fully Determined Estimates from a Single View
From a single camera view, the strategy adopted for determining a 3D ball position is to detect an occasion in which the ball bounces off some other object: players, ground or goal-post.
If the height at which the bounce occurs can be estimated, then this height, together with its 2D image location, completely determines the 3D ball position at this time.
Then, the height of the ball position is estimated as zero if there are no players or other objects near the ball.
It can be assumed the ball is two meters off the ground plane when it strikes a player's head.
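Once a height is assumed, the 3D position follows from intersecting the projection line with the horizontal plane at that height. The sketch below assumes a plain 3x4 pinhole projection matrix `P` (a hypothetical input; the lens distortion handled by Tsai's model is ignored here) and a world frame whose z axis is height above the ground plane.

```python
import numpy as np

def backproject_to_height(P, u, v, h=0.0):
    """Intersect the projection line through pixel (u, v) with the plane z = h.

    Solves P @ [X, Y, h, 1] ~ [u, v, 1] for the two unknowns X and Y.
    """
    A = np.array([
        [P[0, 0] - u * P[2, 0], P[0, 1] - u * P[2, 1]],
        [P[1, 0] - v * P[2, 0], P[1, 1] - v * P[2, 1]],
    ])
    b = np.array([
        u * (P[2, 2] * h + P[2, 3]) - (P[0, 2] * h + P[0, 3]),
        v * (P[2, 2] * h + P[2, 3]) - (P[1, 2] * h + P[1, 3]),
    ])
    X, Y = np.linalg.solve(A, b)
    return np.array([X, Y, h])
```

Calling it with `h=0.0` corresponds to a ground bounce, and `h=2.0` to the header case mentioned above.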
C. Fully Determined Estimates from Multiple Views
When a ball is observed in multiple cameras, there are multiple projection lines from each camera position through the corresponding observation (which, in this application, can be terminated at the ground plane).
False observations may exist which will lead to incorrect solutions.
Some false estimates can be generated from the mis-association of the ball (in one camera) and e.g. some background clutter (from another camera).
When the different measurement covariances for $\mathbf{p}_1$ and $\mathbf{p}_2$ are considered, the distances from $\mathbf{b}$ to $\mathbf{p}_1$ and $\mathbf{p}_2$ are changed into Mahalanobis distances.
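For two views, a simple unweighted version of this estimate takes the closest points on the two projection lines and returns their midpoint. The sketch below makes that equal-covariance assumption explicitly and does not implement the Mahalanobis weighting; the function name and the returned gap are illustrative additions.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Midpoint of the shortest segment between two 3D lines.

    c1, c2 : camera positions; d1, d2 : direction vectors of the projection lines.
    Returns (estimate, gap), where gap is the distance between the two lines.
    """
    c1, d1, c2, d2 = (np.asarray(v, dtype=float) for v in (c1, d1, c2, d2))
    w = c1 - c2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b
    if denom < eps:
        raise ValueError("projection lines are (nearly) parallel")
    s = (b * e - c * d) / denom  # parameter of the closest point on line 1
    t = (a * e - b * d) / denom  # parameter of the closest point on line 2
    p1 = c1 + s * d1
    p2 = c2 + t * d2
    return 0.5 * (p1 + p2), float(np.linalg.norm(p1 - p2))
```

A large gap suggests the two observations do not belong to the same object, which is one plausible way to reject mis-associations such as ball versus background clutter.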
E. Estimation of Missed or Uncertain Ball Positions
For those frames without ball observations in any single view or with ball observations of lower likelihood, i.e. less than a given threshold, the 3D ball positions are estimated by using polynomial interpolation in a curve on the corresponding vertical planes (see Section V).
Each curve is calculated from two fully determined estimates.
If more fully determined estimates are available, then they could all be incorporated into the estimation of the trajectory based on a more general least squares estimator [25].
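Recovering missed positions within a flying segment can be sketched as a least-squares fit through the fully determined estimates. The sketch assumes no air friction, a known gravitational acceleration `G`, and timestamps in seconds; with exactly two estimates the fit reduces to interpolation. The function name `fit_flight` is hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)

def fit_flight(times, positions):
    """Fit x(t), y(t) linearly and z(t) as a parabola with known gravity.

    times     : sequence of N timestamps (N >= 2)
    positions : N x 3 array of fully determined 3D estimates
    Returns a function mapping query time(s) to 3D position(s).
    """
    t = np.asarray(times, dtype=float)
    p = np.asarray(positions, dtype=float)
    cx = np.polyfit(t, p[:, 0], 1)
    cy = np.polyfit(t, p[:, 1], 1)
    # Adding back the gravity term leaves a function that is linear in t.
    cz = np.polyfit(t, p[:, 2] + 0.5 * G * t**2, 1)

    def predict(tq):
        tq = np.asarray(tq, dtype=float)
        z = np.polyval(cz, tq) - 0.5 * G * tq**2
        return np.stack([np.polyval(cx, tq), np.polyval(cy, tq), z], axis=-1)

    return predict
```

For instance, two ground bounces one second apart imply a peak height of about 1.23 m at the midpoint of the flight.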
A. Four Phases of Ball Motion
It is proposed to model the ball motion at each instant into four phases, namely rolling (R), flying (F), in-possession (P) and out-of-play (O).
A different tracking model is applicable to each phase, and furthermore the designation also provides a useful insight into the semantic progression of the game.
Though some other semantic events have been analyzed for soccer video understanding [21-24], they focus on players' motion in a broadcasting context, and phase transitions in the ball trajectory have not been discussed.
This is because in-possession phases act as special periods that initialize other phases (such as rolling or flying), i.e. literally kicking the ball off in a particular direction.
Furthermore, the pattern of play is punctuated by periods when the ball is out-of-play, e.g. caused by fouls, ball crossing touchline, off-side or in-possession by the goal-keeper.
B. Estimating Motion Phases
Given observations of the ball from separate cameras and height cues obtained as described in Section IV, what follows is the estimation of the current ball phase.
Prior to this stage, at each frame there is at most one estimate of ball position from each of the camera views, and each estimate is assigned a measure of the likelihood that it represents the ball.
A 'soft' classification [26] of the four phases is introduced, which is then input into a decision process to determine the final estimate of the phase.
Smooth functions are chosen to provide a measure, bounded between 0 and 1, of the membership of each motion phase.
For each of the in-play phases, a specific model is then employed for robust trajectory estimation below.
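To illustrate the idea of bounded, smooth membership functions, the sketch below uses logistic functions of the ball height and the distance to the nearest player. The choice of features, every threshold value, and the final arg-max decision are assumptions made for illustration only; the source does not specify them here.

```python
import math

def sigmoid(x, center, width):
    """Smooth step from 0 to 1 around `center`; `width` controls the softness."""
    return 1.0 / (1.0 + math.exp(-(x - center) / width))

def phase_memberships(height, dist_to_player, in_pitch):
    """Membership in [0, 1] for rolling (R), flying (F), in-possession (P), out-of-play (O)."""
    if not in_pitch:
        return {"R": 0.0, "F": 0.0, "P": 0.0, "O": 1.0}
    flying = sigmoid(height, center=0.5, width=0.1)        # hypothetical 0.5 m height cue
    far = sigmoid(dist_to_player, center=1.0, width=0.2)   # hypothetical 1 m proximity cue
    return {
        "R": (1.0 - flying) * far,
        "F": flying,
        "P": (1.0 - flying) * (1.0 - far),
        "O": 0.0,
    }

def decide_phase(memberships):
    """Simple decision rule: pick the phase with the largest membership."""
    return max(memberships, key=memberships.get)
```

A fuller decision process would presumably also smooth the labels over time, so that a single noisy height estimate does not trigger a spurious phase transition.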
C. Phase-specific Trajectory Estimation
Finally, in this section, the three different in-play models of ball motion are described, starting with the flying trajectory.
Disregarding air friction, the velocity parallel to the ground plane is constant and thus the ball follows a single parabolic trajectory.
Disregarding all friction, $x(t)$ and $y(t)$ will satisfy the following equations, whether the ball is rolling or flying: [...]
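The equations themselves did not survive extraction. As a hedged reconstruction from the stated assumption of constant velocity parallel to the ground plane, the ground-plane motion within one segment can be written as follows, where $t_0$, $x_0$, $y_0$, $v_x$ and $v_y$ are assumed symbols for the start time, start position and ground-plane velocity of the segment:

```latex
\begin{aligned}
x(t) &= x_0 + v_x\,(t - t_0),\\
y(t) &= y_0 + v_y\,(t - t_0).
\end{aligned}
```

Eliminating $t$ gives $v_y\,(x - x_0) = v_x\,(y - y_0)$, a straight line on the ground plane, which is consistent with each segment lying in a single vertical plane.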
A. System Architecture
The proposed system was tested on data from matches played at Fulham Football Club, U.K., in the 2001 Premiership season, captured by eight fixed cameras.
The 3D ball trajectory is visualized along with tracked players on a virtual playfield.
The multi-view tracker is responsible for orchestrating the process by which the (single-view) Feature Servers generate their feature results.
Eight cameras were statically mounted around the stadium as described in Figure 8.
The white balance was set to automatic on all cameras.
B. Data Preparation and Results
The proposed model has been tested in several sequences with up to 8 cameras, and each sequence has over 5500 frames.
Then, all the ball candidates detected from 8 sequences are integrated for multi-view tracking of the ball and 3D positioning.
Then, its 3D position is estimated by using multi-view geometry constraints.
For the frames between two GT frames, the estimated GT positions are linearly interpolated.
The first is the distance (in meters) between estimated and ground-truth (GT) ball positions, in which only 2D distance in x-y plane is used.
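The ground-truth interpolation and the distance measure can be sketched as below, assuming ground-truth positions are annotated at sparse frame indices and estimates are given as 3D points. The function names are illustrative.

```python
import numpy as np

def interpolate_gt(gt_frames, gt_xy, frames):
    """Linearly interpolate ground-truth (x, y) positions at the requested frames."""
    gt_xy = np.asarray(gt_xy, dtype=float)
    x = np.interp(frames, gt_frames, gt_xy[:, 0])
    y = np.interp(frames, gt_frames, gt_xy[:, 1])
    return np.stack([x, y], axis=1)

def ground_plane_errors(est_xyz, gt_xy):
    """Per-frame distance in meters, using only the x-y components of the estimates."""
    est = np.asarray(est_xyz, dtype=float)
    return np.linalg.norm(est[:, :2] - np.asarray(gt_xy, dtype=float), axis=1)
```

Note that `np.interp` expects the ground-truth frame indices to be increasing.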
C. Evaluation of Tracking Accuracy versus Latency
In eight testing sequences of 5500 frames each, 3D ball positions are estimated in about 3720 frames.
The ground-plane errors among calibration projections are estimated to be between 0.1 and 2.5 meters, depending on the distance of the ground point to the cameras.
Estimated ball positions are shown as a magenta trajectory.
From this table it can be observed that, without temporal filtering, only 34.5% of ball positions can be recovered.
D. Evaluation of Phase Transition Accuracy
Figure 11 illustrates a complete 3D trajectory history (ground plane projection) from frame 0 to 954 and its corresponding phase transitions.
Secondly, about 25% of rolling and 13% of in-possession balls are mistaken for each other, which happens when a rolling ball cannot be observed in a crowd or an in-possession ball is rolling near the player who possesses it.
This misjudgment affects the accuracy of the ground-truth as well as the estimate from the proposed method.
Heights below $z_f$ will not be recognized correctly.
It is worth noting that there are some phase transitions missing from the estimated trajectory.
E. System Limitations
As discussed above, there are two main drawbacks in their system in terms of tracking accuracy and phase transition accuracy owing to severe occlusions or insufficient observations.
In principle, most of these problems may be resolved by installing additional cameras, even ones capturing images from above the pitch.
Occlusions are still unavoidable in the soccer context, which constrains the overall recovery rate and accuracy.
Moreover, their system ignores air friction and cannot model some complex movements of the ball, such as the 'swerve', and this may be an interesting topic for further investigation.
VII. CONCLUSIONS
A method has been described for real-time 3D trajectory estimation of the ball in a soccer game.
In the proposed system, video data is captured from multiple fixed and calibrated cameras.
Temporal filtering of the ball likelihood is also shown to be essential for robust ball detection and tracking.
One interesting feature of the approach is that it uses high-level phase transition information to aid low-level tracking.
Through recognition of the four phases, phase-specific models are successfully applied in estimating 3D position of the ball.
TL;DR: In this article, the radio tag doubles the transmitted frequency and returns the processed signal to a transceiver typically located on the player, and the currently transmitted frequency is then compared with the received frequency to obtain a difference frequency from which an apparatus may estimate the distance.
Abstract: Systems, apparatuses, and methods estimate the distance between a player and a ball by transmitting a chirp (sweep signal) to a radio tag located on the ball. During the chirp, the frequency of the transmitted signal is changed in a predetermined fashion. The radio tag doubles the transmitted frequency and returns the processed signal to a transceiver typically located on the player. The currently transmitted frequency is then compared with the received frequency to obtain a difference frequency from which an apparatus may estimate the distance. The apparatus may simultaneously receive the processed signal from the radio tag while transmitting the sweep signal.
TL;DR: In this paper, a sensor system is adapted for use with an article of footwear and includes an insert member including a first layer and a second layer, a port connected to the insert and configured for communication with an electronic module, a plurality of force and/or pressure sensors on the insert member, and leads connecting the sensors to the port.
Abstract: A sensor system is adapted for use with an article of footwear and includes an insert member including a first layer and a second layer, a port connected to the insert and configured for communication with an electronic module, a plurality of force and/or pressure sensors on the insert member, and a plurality of leads connecting the sensors to the port.
TL;DR: A survey of soccer video analysis systems for different applications: video summarization, provision of augmented information, high-level analysis, and for each application area the computer vision methodologies, their strengths and weaknesses are analyzed.
Abstract: This paper presents a survey of soccer video analysis systems for different applications: video summarization, provision of augmented information, high-level analysis. Computer vision techniques have been adapted to be applicable in the challenging soccer context. Different semantic levels of interpretation are required according to the complexity of the corresponding applications. For each application area we analyze the computer vision methodologies, their strengths and weaknesses and we investigate whether these approaches can be applied to extensive and real time soccer video analysis.
164 citations
Cites background or methods from "Real-Time Modeling of 3-D Soccer Ba..."
...In [44] multiple fixed cameras are used for real time modeling of 3D ball trajectories....
[...]
...Different ball samples in various sizes, shapes, and colors (from [44])....
[...]
...[44] Multiview tracking eight cameras Eight fixed cameras 720 × 576 Model-based 3D position estimation 8 seq of 5500 frames 3 s delay...
[...]
...these cases, many works propose approaches based on the evaluation of the ball trajectory [44–48,51] since the analysis of kinematic parameters allows the ball detected from among a set of ball candidates....
TL;DR: In this article, a method for processing data includes receiving a depth map of a scene containing a humanoid form, which is processed so as to identify three-dimensional (3D) connected components in the scene, each connected component including a set of the pixels that are mutually adjacent and have mutually-adjacent depth values.
Abstract: A method for processing data includes receiving a depth map of a scene containing a humanoid form. The depth map is processed so as to identify three-dimensional (3D) connected components in the scene, each connected component including a set of the pixels that are mutually adjacent and have mutually-adjacent depth values. Separate, first and second connected components are identified as both belonging to the humanoid form, and a representation of the humanoid form is generated including both of the first and second connected components.
TL;DR: In this paper, a method for image processing includes receiving a depth image of a scene containing a human subject and receiving a color image of the scene containing the human subject, and a part of a body of the subject is identified in at least one of the images.
Abstract: A method for image processing includes receiving a depth image of a scene containing a human subject and receiving a color image of the scene containing the human subject. A part of a body of the subject is identified in at least one of the images. A quality of both the depth image and the color image is evaluated, and responsively to the quality, one of the images is selected to be dominant in processing of the part of the body in the images. The identified part is localized in the dominant one of the images, while using supporting data from the other one of the images.
TL;DR: There is a natural uncertainty principle between detection and localization performance, which are the two main goals, and with this principle a single operator shape is derived which is optimal at any scale.
Abstract: This paper describes a computational approach to edge detection. The success of the approach depends on the definition of a comprehensive set of goals for the computation of edge points. These goals must be precise enough to delimit the desired behavior of the detector while making minimal assumptions about the form of the solution. We define detection and localization criteria for a class of edges, and present mathematical forms for these criteria as functionals on the operator impulse response. A third criterion is then added to ensure that the detector has only one response to a single edge. We use the criteria in numerical optimization to derive detectors for several common image features, including step edges. On specializing the analysis to step edges, we find that there is a natural uncertainty principle between detection and localization performance, which are the two main goals. With this principle we derive a single operator shape which is optimal at any scale. The optimal detector has a simple approximate implementation in which edges are marked at maxima in gradient magnitude of a Gaussian-smoothed image. We extend this simple detector using operators of several widths to cope with different signal-to-noise ratios in the image. We present a general method, called feature synthesis, for the fine-to-coarse integration of information from operators at different scales. Finally we show that step edge detector performance improves considerably as the operator point spread function is extended along the edge.
28,073 citations
"Real-Time Modeling of 3-D Soccer Ba..." refers methods in this paper
...The filter uses hysteresis to process the likelihood estimates into discrete labels, in an approach similar to the Canny filter [20]....
TL;DR: Covers Monte Carlo techniques, least-squares fits to a polynomial and to an arbitrary function, fitting composite peaks, and direct application of the maximum likelihood.
Abstract: Uncertainties in measurements; probability distributions; error analysis; estimates of means and errors; Monte Carlo techniques; dependent and independent variables; least-squares fit to a polynomial; least-squares fit to an arbitrary function; fitting composite peaks; direct application of the maximum likelihood. Appendices: numerical methods; matrices; graphs and tables; histograms and graphs; computer routines in Pascal.
10,546 citations
"Real-Time Modeling of 3-D Soccer Ba..." refers methods in this paper
...If more fully determined estimates are available, then they could all be incorporated into the estimation of the trajectory based on a more general least-squares estimator [ 25 ]....
[...]
...Moreover, if more than two ball positions have been decided within a curve segment, then a least-squares calculation of the trajectory segment can be used to provide a more robust estimate [ 25 ]....
TL;DR: This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model, resulting in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes.
Abstract: A common method for real-time segmentation of moving regions in image sequences involves "background subtraction", or thresholding the error between an estimate of the image without moving objects and the current image. The numerous approaches to this problem differ in the type of background model used and the procedure used to update the model. This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model. The Gaussian, distributions of the adaptive mixture model are then evaluated to determine which are most likely to result from a background process. Each pixel is classified based on whether the Gaussian distribution which represents it most effectively is considered part of the background model. This results in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. This system has been run almost continuously for 16 months, 24 hours a day, through rain and snow.
7,660 citations
"Real-Time Modeling of 3-D Soccer Ba..." refers methods in this paper
...To locate and track players and the soccer ball, a multimodal adaptive background model is utilized that provides robust foreground detection using image differencing [17]....
[...]
...In Section II, the method we used for tracking and detecting moving objects is described, using Gaussian mixtures [17] and calibrated cameras [19]....
[...]
...Over the mask detected above, foreground pixels are located using the robust multimodal adaptive background model [17]....
TL;DR: In this paper, a two-stage technique for 3D camera calibration using TV cameras and lenses is described, aimed at efficient computation of camera external position and orientation relative to object reference coordinate system as well as the effective focal length, radial lens distortion, and image scanning parameters.
Abstract: A new technique for three-dimensional (3D) camera calibration for machine vision metrology using off-the-shelf TV cameras and lenses is described. The two-stage technique is aimed at efficient computation of camera external position and orientation relative to object reference coordinate system as well as the effective focal length, radial lens distortion, and image scanning parameters. The two-stage technique has advantage in terms of accuracy, speed, and versatility over existing state of the art. A critical review of the state of the art is given in the beginning. A theoretical framework is established, supported by comprehensive proof in five appendixes, and may pave the way for future research on 3D robotics vision. Test results using real data are described. Both accuracy and speed are reported. The experimental results are analyzed and compared with theoretical prediction. Recent effort indicates that with slight modification, the two-stage calibration can be done in real time.
The authors model the ball trajectory as curve segments in consecutive virtual vertical planes, which can accurately approximate real cases even in complex situations. Using geometric reconstruction techniques, the authors can successfully estimate 3D ball positions from a single view.
Q2. What are the contributions in this paper?
In this paper, a real-time 3D soccer ball tracking system is proposed, using image sequences from multiple fixed cameras as input.