First Intl Workshop on Image and Signal Processing and Analysis, June 14-15, 2000, Pula, Croatia
Computer Vision System for Tracking Players in Sports Games
Janez Perš, Stanislav Kovacic
Faculty of Electrical Engineering, University of Ljubljana
Tržaška 25, 1000 Ljubljana
janez.pers@kiss.uni-lj.si, stanislav.kovacic@fe.uni-lj.si
Abstract
The development of a computer vision system for tracking players in indoor team games is
presented. Several image processing and tracking methods are described, along with camera
calibration and lens distortion correction. The output of the system consists of
spatio-temporal trajectories of the players, which can be further processed and analyzed by
sport experts. In some critical situations the automatic tracking process must be manually
interrupted. To correct mistrackings, human supervision is required. Some experimental
results are presented as well.
Keywords: object tracking, human motion analysis, camera calibration, handball
1. Introduction
Human motion analysis is receiving increasing attention from computer vision researchers.
This interest is motivated by applications over a wide spectrum of topics (Aggarwal and Cai,
1997). In this paper we concentrate on tracking people in team sports based on computer
vision. If the trajectories are determined with sufficient accuracy, a wealth of additional
information, e.g. player speed, acceleration and player interactions, can be obtained.
For many years the analysis of a sport event has been based on observation sheets filled in
during the match. Later, with the help of video recordings, motion acquisition and analysis were
performed manually, a time-consuming and tedious task. In the past, progress in introducing
computer vision technology to the team sports domain was slow, due to inadequate video
and computational facilities and the complexity of the tracking problem itself. Players
move rapidly, change direction unpredictably and collide with one another. They violate
the smooth motion assumption, on which many tracking algorithms are based. Players appear
in the images as highly non-rigid forms, especially due to the movements of their extremities.
Many of the proposed approaches solved the motion acquisition and analysis problem only
partially and were therefore unable to provide what sports experts need, i.e.
tracking of every player on the whole field at every instant of time (Erdmann, 1994).
2. Image acquisition
Careful planning of image acquisition is crucial for the success of the whole project,
and inappropriately placed cameras can add a significant degree of difficulty to the tracking
problem (Intille and Bobick, 1995). To determine the trajectories of the players from the
beginning to the end of the game, the players have to be in the field of view for the
duration of the whole match. Two stationary cameras, mounted directly above the court as
shown in Fig. 1 (a), were chosen as the most straightforward solution.

Fig. 1: (a) Camera placement. (b) Image obtained.
Both the handball match and the test video sequence of players moving on predefined paths
were recorded using two PAL S-VHS video recorders. The transfer to the digital domain was
carried out using a Motion-JPEG frame grabber at 25 frames per second with an image
resolution of 384x288 pixels. The result is shown in Fig. 1 (b).
3. Camera calibration
Due to significant radial distortion, the otherwise widely used calibration technique (Tsai, 1987)
failed to produce satisfactory results. We decided to take a different approach by modeling the
radial image distortion more accurately.
Let us imagine an ideal pinhole camera C, mounted on a pan-tilt device above the point 0, as
illustrated in Fig. 2. X is the observed point on the court plane, at distance R from the point 0.
h is the distance from the camera to the court plane. Angle α is the angle of the pan-tilt device
when observing the point X. The differential dR of radius R is projected to the differential dr,
which is parallel to the camera image plane. The image of dr appears on the image plane.
Relations between dR, dr and α are given within the triangle on the enlarged part of Fig. 2 (a).
Fig. 2: (a) Model of radial distortion. (b) The resulting images after correction.
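Thus, we can write the following relations:
$$dr = \cos(\alpha)\,dR, \qquad \alpha = \operatorname{arctg}\left(\frac{R}{h}\right), \tag{1}$$
$$dr = \cos\left(\operatorname{arctg}\left(\frac{R}{h}\right)\right)dR. \tag{2}$$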
Now, let us substitute the pan-tilt camera with a fixed camera equipped with a wide-angle lens.
The whole area, which would be covered by changing the angle α of a pan-tilt camera, is
captured simultaneously in a single image of the stationary camera. Additionally, let us
assume the scaling factor between dr and the image of dr on the image plane to be 1.
Therefore, we can obtain the length of the image of radius R on the image plane by integrating
the left side of eq. (2) over the interval (0, r_1), and the right side over the interval (0, R_1).
With R_1 being the distance from the observed point X to the point 0, and r_1 being the distance
from the image of point X to the image of point 0 on the image plane, we have the solution of
the inverse problem:
$$r_1 = h \ln\left(\frac{R_1}{h} + \sqrt{1 + \frac{R_1^2}{h^2}}\right). \tag{3}$$
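The integration step behind eq. (3) can be spelled out as follows. This is a sketch of the intermediate algebra, using the standard identity cos(arctg x) = 1/sqrt(1 + x^2) and the unit scaling factor assumed in the text; the inverse hyperbolic sine is introduced here only as a compact intermediate form.

```latex
\int_0^{r_1} dr
  = \int_0^{R_1} \cos\left(\operatorname{arctg}\frac{R}{h}\right) dR
  = \int_0^{R_1} \frac{dR}{\sqrt{1 + R^2/h^2}}
  = h \,\operatorname{arsinh}\left(\frac{R_1}{h}\right)
  = h \ln\left(\frac{R_1}{h} + \sqrt{1 + \frac{R_1^2}{h^2}}\right)
```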
By solving eq. (3) for R_1 we obtain the formula which can be used to correct the radial
distortion:
$$R_1 = \frac{h\left(e^{2 r_1/h} - 1\right)}{2\,e^{r_1/h}}. \tag{4}$$
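The forward model and its inverse can be checked numerically with a short sketch. It assumes eq. (3) reads r_1 = h ln(R_1/h + sqrt(1 + R_1^2/h^2)) and eq. (4) reads R_1 = h (e^(2 r_1/h) - 1) / (2 e^(r_1/h)), with r_1 expressed in the same units as h (unit scaling factor). The function names and the camera height used below are illustrative assumptions, not values from the paper.

```python
import math


def court_to_image_radius(R1: float, h: float) -> float:
    """Eq. (3): image radius r1 of a court point at radius R1, camera height h."""
    x = R1 / h
    return h * math.log(x + math.sqrt(1.0 + x * x))


def image_to_court_radius(r1: float, h: float) -> float:
    """Eq. (4): court radius R1 recovered from the (distorted) image radius r1."""
    e = math.exp(r1 / h)
    return h * (e * e - 1.0) / (2.0 * e)


if __name__ == "__main__":
    h = 10.0  # hypothetical camera height, same units as R1
    for R1 in (0.0, 5.0, 20.0):
        r1 = court_to_image_radius(R1, h)
        restored = image_to_court_radius(r1, h)
        # r1 grows more slowly than R1, which is the radial compression
        # that the correction undoes.
        print(f"R1={R1:5.1f}  r1={r1:7.4f}  restored={restored:7.4f}")
```

Applying the second function to the radius of every pixel, measured from the image of point 0, would map image points back to court-plane distances.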
4. Player tracking
We developed three different algorithms for use in the player tracker: motion detection, template
tracking and color-based tracking. We tested different combinations of these algorithms; each
one has its own advantages and disadvantages.
4.1 Motion detection
The most straightforward approach to motion detection is image subtraction. Each frame C in
the sequence was subtracted from the reference frame R, i.e. an image of the empty court,
yielding the difference image D,
$$D = |R_R - R_C| + |G_R - G_C| + |B_R - B_C|, \tag{5}$$
where R, G and B denote the red, green and blue components, and the subscripts C and R denote
the current frame and the reference frame, respectively. The difference image D is then
thresholded using a fixed threshold value and subsequently filtered to reduce image noise. The
filtered image is shown in Fig. 3. The resulting blobs in the filtered image are then counted and
labeled, and their centers of gravity are calculated. Some of the blobs correspond to the players,
while others are caused by noise, shadows and other distracting objects. This limits the
efficiency of this technique. Collisions of more than two players and intense player shadows
touching other players almost always require an intervention by the human operator. A typical
problem is shown in Fig. 3 (b),(c).
Fig. 3: (a) Blobs detected in the motion detection stage. (b) Typical scene, which confuses
the motion detection algorithm. Four players are marked by white arrows. (c) Corresponding
difference image.
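The motion detection stage can be sketched in a few lines. The sketch reads eq. (5) as a per-pixel sum of absolute R, G, B differences; the threshold value, the 4-connected labeling and the minimum blob area (a crude stand-in for the noise filter, which the paper does not specify) are assumptions made for illustration.

```python
from collections import deque


def difference_image(current, reference):
    """Eq. (5) as read here: per-pixel sum of absolute R, G, B differences.

    Images are lists of rows; each pixel is an (R, G, B) tuple.
    """
    return [
        [sum(abs(c - r) for c, r in zip(cur_px, ref_px))
         for cur_px, ref_px in zip(cur_row, ref_row)]
        for cur_row, ref_row in zip(current, reference)
    ]


def find_blobs(diff, threshold, min_area):
    """Threshold, label 4-connected blobs, return (area, (row, col)) per blob.

    Dropping blobs smaller than min_area is an assumed substitute for the
    noise filtering step.
    """
    rows, cols = len(diff), len(diff[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or diff[r0][c0] <= threshold:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill of one blob
                r, c = queue.popleft()
                pixels.append((r, c))
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= rr < rows and 0 <= cc < cols
                            and not seen[rr][cc] and diff[rr][cc] > threshold):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            if len(pixels) >= min_area:
                centre = (sum(p[0] for p in pixels) / len(pixels),
                          sum(p[1] for p in pixels) / len(pixels))
                blobs.append((len(pixels), centre))
    return blobs


# Synthetic scene: uniform grey court, one 3x3 "player", one noise pixel.
reference = [[(100, 100, 100)] * 30 for _ in range(20)]
current = [list(row) for row in reference]
for r in range(5, 8):
    for c in range(10, 13):
        current[r][c] = (200, 180, 40)
current[15][2] = (250, 250, 250)  # isolated bright pixel, removed by min_area

blobs = find_blobs(difference_image(current, reference), threshold=60, min_area=4)
print(blobs)  # [(9, (6.0, 11.0))]
```

Two players whose blobs touch would be merged into a single blob by this kind of labeling, which is consistent with the collision problem described above.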

4.2 Template tracking
Visual differences between the players and the background objects were exploited to further
improve the tracking process.
A feature set that could be used to successfully separate objects from the background was
needed. This is a difficult problem, as the shape of the players varies over time and the image
resolution is low. Players are represented by a relatively small area of only 10-15 pixels in
diameter. We have defined a set of 2D functions, i.e. templates {K_j, j = 1, ..., 14}, which
extract the very basic appearances of the player. They resemble the Walsh functions, but they
are not orthogonal. These functions are shown in Fig. 4 (a).
Fig. 4: (a) Basic appearances of the player. Black areas represent zeros, while grey areas
represent values of 1. (b) Classification of an unknown object F with respect to the background G
and the player H, using the distances d_GF and d_FH.
Each channel of the RGB color image is processed separately and the vector of 14 features F for
each channel is obtained using the following expression:
$$F_j = \sum_{x=1}^{16} \sum_{y=1}^{16} K_j(x, y)\, I(x, y), \qquad (j = 1, \ldots, 14), \tag{6}$$
where K_j is one of the templates, and I is one of the three RGB channels, obtained from the
current image. The result is a 42-dimensional feature vector F (14 features for each of the three
channels). Additionally, a similar vector of features G is calculated from the reference image of
the empty playing court at the same coordinates. The third vector H is obtained by averaging the
last n feature vectors F, which allows a certain adaptivity, as the player appearance changes over
time. The simplified, two-dimensional case of classification is shown in Fig. 4 (b).
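The feature extraction of eq. (6) can be sketched as follows. The actual 14 templates appear only in Fig. 4 (a), so the binary stripe patterns generated here are placeholders invented for this example, not the authors' templates; the function names are likewise hypothetical.

```python
def make_placeholder_templates(size=16):
    """Return 14 binary stripe patterns (placeholders, NOT the Fig. 4 (a) set)."""
    templates = []
    for k in range(2, 9):  # 7 stripe frequencies, vertical and horizontal each
        templates.append([[1 if (x * k // size) % 2 == 0 else 0
                           for x in range(size)] for _ in range(size)])
        templates.append([[1 if (y * k // size) % 2 == 0 else 0
                           for _ in range(size)] for y in range(size)])
    return templates


def channel_features(templates, channel):
    """Eq. (6): F_j = sum over x, y of K_j(x, y) * I(x, y) for one channel."""
    return [
        sum(K[y][x] * channel[y][x]
            for y in range(len(channel)) for x in range(len(channel[y])))
        for K in templates
    ]


def feature_vector(templates, patch_rgb):
    """Concatenate the per-channel features into one 3 * 14 = 42 element vector."""
    features = []
    for ch in range(3):
        channel = [[px[ch] for px in row] for row in patch_rgb]
        features.extend(channel_features(templates, channel))
    return features


templates = make_placeholder_templates()
patch = [[(10, 20, 30)] * 16 for _ in range(16)]  # uniform 16x16 RGB patch
F = feature_vector(templates, patch)
print(len(F), F[0], F[14])  # 42 1280 2560
```

Evaluating the same function on the reference image at the same coordinates would give the background vector G.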
Fig. 5: Locating the player wearing the yellow dress using the set of templates. (a) Player is
shown in the center of the image. (b) The difference image, as defined in (5), for that particular
case. (c) Calculated similarity measure s, as defined in (7). The white region (showing high
similarity) marks the player location more accurately than the difference image shown in (b).

Both distances (Fig. 4 (b)) and the corresponding similarity measure s are defined as follows:
$$d_{GF} = \sqrt{\sum_{j=1}^{42} (F_j - G_j)^2}, \qquad d_{FH} = \sqrt{\sum_{j=1}^{42} (F_j - H_j)^2}, \qquad s = \frac{d_{GF}}{d_{GF} + d_{FH}}. \tag{7}$$
The results of player localization using this approach are shown in Fig. 5.
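A minimal sketch of the classification step is given below. It assumes the similarity takes the form s = d_GF / (d_GF + d_FH), so that s approaches 1 when the unknown vector F is close to the player model H and far from the background G, which matches the white (high similarity) region in Fig. 5 (c). The handling of the degenerate case and the function names are assumptions.

```python
import math


def distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def similarity(F, G, H):
    """Assumed form of eq. (7): s = d_GF / (d_GF + d_FH), in the range [0, 1]."""
    d_gf = distance(G, F)
    d_fh = distance(F, H)
    total = d_gf + d_fh
    # F, G and H coincide: return a neutral value (an assumption, not from the paper).
    return 0.5 if total == 0 else d_gf / total


def update_player_model(recent_features):
    """H: element-wise mean of the last n feature vectors F."""
    n = len(recent_features)
    return [sum(column) / n for column in zip(*recent_features)]


# Two-dimensional toy case, as in Fig. 4 (b).
G = [0.0, 0.0]                                    # background
H = update_player_model([[2.0, 0.0], [4.0, 0.0], [6.0, 0.0]])  # player model
print(H)                                          # [4.0, 0.0]
print(similarity([3.0, 0.0], G, H))               # 0.75
```

Evaluating s at every candidate position around the last known location and taking the maximum would give the new position estimate.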
4.3 Color tracking
Color as an identifying feature can also be used for tracking players. The three-dimensional
RGB color representation was chosen, as some players wear dark dresses, which would result
in undefined values of hue. Color identification and localization based on color histograms
(Swain and Ballard, 1991) is not appropriate, as there are only a few (3-6) pixels that closely
resemble the reference color of the player's dress. To ensure reliable tracking, only the
location of the pixel that is most similar to the player's color becomes the new player position.
The similarity measure used is defined as follows:
$$S(x, y) = \left( (I_R(x, y) - C_R)^2 + (I_G(x, y) - C_G)^2 + (I_B(x, y) - C_B)^2 \right)^{\frac{1}{2}}, \tag{8}$$
where I is the image and C is the color of the player. Subscripts R, G and B denote the red,
green and blue channel, respectively.
The algorithm searches for the pixel most similar to the recorded color of the player in a limited
area around the previous player position. The colors of the players are defined by the operator
supervising the tracking process before the actual tracking is started.
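The color search can be sketched as follows. The sketch reads eq. (8) as the Euclidean distance in RGB space, so the most similar pixel is the one with the smallest S. The square search window and its radius are assumptions, since the text only states that the search area is limited.

```python
def color_distance(pixel, color):
    """Eq. (8) as read here: Euclidean RGB distance between a pixel and color C."""
    return sum((p - c) ** 2 for p, c in zip(pixel, color)) ** 0.5


def track_color(image, prev_pos, player_color, radius):
    """Return (row, col) of the best-matching pixel within radius of prev_pos.

    The image is a list of rows of (R, G, B) tuples. The square window is an
    assumed shape for the "limited area" around the previous position.
    """
    rows, cols = len(image), len(image[0])
    r0, c0 = prev_pos
    best_pos, best_dist = prev_pos, float("inf")
    for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1)):
            d = color_distance(image[r][c], player_color)
            if d < best_dist:
                best_pos, best_dist = (r, c), d
    return best_pos


# Synthetic frame: green court with two yellowish pixels.
frame = [[(40, 120, 60)] * 20 for _ in range(20)]
frame[9][12] = (230, 210, 30)   # the tracked player, close to the last position
frame[2][2] = (240, 220, 20)    # a similarly dressed player outside the window

new_pos = track_color(frame, prev_pos=(10, 10), player_color=(240, 220, 20), radius=3)
print(new_pos)  # (9, 12)
```

Restricting the search to the window is what keeps the tracker from jumping to a distant player with the same dress color.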
5. Results
The following tracking methods were tested: the motion detection algorithm (A), the color
tracking algorithm (B) and the combination of the color and template tracking algorithms (C). The
use of color tracking in C avoids the drift in player position caused by template tracking.
Tests were performed on two video sequences. Their lengths were 30 and 50 seconds,
respectively.
The first sequence was taken from the recording of a handball match and was used to evaluate
the reliability of the algorithms by counting the human interventions required to maintain an
error-free tracking process. The second one was the test sequence, in which players were standing
still. The measured distances therefore directly correspond to the noise added by a particular
tracking method. Additionally, the processing time per frame on a PC equipped with a 500 MHz
Pentium III processor was measured. The results are shown in Table 1.
Table 1: Comparison results for three different combinations of tracking methods.
Method   Interventions   Noise        Time
A        45              80 meters    0.424 sec/frame
B        12              249 meters   0.175 sec/frame
C        14              55 meters    0.229 sec/frame
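The figures in Table 1 can be put on a per-second basis with a small calculation. It assumes that the interventions were counted on the 30-second match sequence and that every frame of the 25 frames per second recording is processed; neither assumption is stated explicitly in the text.

```python
FPS = 25                 # acquisition rate from Section 2
SEQUENCE_1_SECONDS = 30  # assumed length of the handball match sequence

# method: (interventions, processing time in seconds per frame), from Table 1
results = {
    "A": (45, 0.424),
    "B": (12, 0.175),
    "C": (14, 0.229),
}


def summarize(interventions, sec_per_frame):
    """Return (interventions per second of play, processing time per second of video)."""
    per_second = interventions / SEQUENCE_1_SECONDS
    slowdown = sec_per_frame * FPS
    return per_second, slowdown


for method, (count, frame_time) in results.items():
    per_second, slowdown = summarize(count, frame_time)
    print(f"{method}: {per_second:.2f} interventions/s, "
          f"{slowdown:.1f} s of processing per second of video")
```

Under these assumptions method A needs 1.5 interventions per second of play, while methods B and C need fewer than 0.5.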
The advantages and disadvantages of each method can be clearly seen. Motion
detection (A) introduced little noise to the trajectories. On the other hand, it required more than one
intervention per second of playing time, which puts enormous pressure on

References
Aggarwal and Cai (1997). Human motion analysis: a review.
Swain and Ballard (1991). Color indexing.
Tsai (1987). A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses.