Proceedings Article

3D Estimation and Visualization of Motion in a Multicamera Network for Sports

07 Sep 2011, pp. 15-19

TL;DR: This work develops image processing and computer vision techniques for visually tracking a tennis ball, in 3D, on a court instrumented with multiple low-cost IP cameras, and incorporates a physics-based trajectory model into the system.

Abstract: In this work, we develop image processing and computer vision techniques for visually tracking a tennis ball, in 3D, on a court instrumented with multiple low-cost IP cameras. The technique first obtains 2D ball tracking data from each camera view using 2D object tracking methods. Next, an automatic feature-based video synchronization method is applied. This technique uses the extracted 2D ball information from two or more camera views, plus camera calibration information. In order to find the 3D trajectory, the temporal 3D locations of the ball are estimated using triangulation of corresponding 2D locations obtained from automatically synchronized videos. Furthermore, in order to improve the continuity of the tracked 3D ball during times when no two cameras have overlapping views of the ball location, we incorporate a physics-based trajectory model into the system. The resultant 3D ball tracks are then visualized in a virtual 3D graphical environment. Finally, we quantify the accuracy of our system in terms of reprojection error.

Topics: Tennis ball (67%), Video tracking (58%), Camera resectioning (57%), Image processing (52%)



3D Estimation and Visualization of Motion in a
Multicamera Network for Sports
Anil Kumar, P. Shashidhar Chavan, Sharatchandra V.K
and Sumam David
National Institute of Technology Karnataka,
Karnataka, India
Email: {anil3487, shashi3494, sharatkashimath}@gmail.com
sumam@ieee.org
Philip Kelly and Noel E. O’Connor
CLARITY: Centre for Sensor Web Technologies,
Dublin City University,
Dublin 9, Ireland
Email: {philip.kelly, noel.oconnor}@dcu.ie
Abstract—In this work, we develop image processing and
computer vision techniques for visually tracking a tennis ball, in
3D, on a court instrumented with multiple low-cost IP cameras.
The technique first extracts 2D ball track data from each camera
view, using object tracking methods. Next, an automatic
feature-based video synchronization method is applied. This
technique uses the extracted 2D ball information from two or
more camera views, plus camera calibration information. Then,
in order to find the 3D trajectory, the temporal 3D locations
of the ball are estimated using triangulation of corresponding
2D locations obtained from automatically synchronized videos.
Furthermore,
we also incorporate a physics-based trajectory model into the
system to improve the continuity of the tracked 3D ball during
times when no two cameras have overlapping views of the ball
location. The resultant 3D ball tracks are then visualized in
a virtual 3D graphical environment. Finally, we quantify the
accuracy of our system in terms of reprojection error.
I. INTRODUCTION
In professional sports we are familiar with high-end camera
technology being used to enhance the viewer experience above
and beyond a traditional broadcast. High profile examples
include the Hawk-Eye Officiating System as used in tennis,
snooker and cricket. Whilst extremely valuable to the viewing
experience, such technologies are only feasible for high profile
professional sports. Sports video analysis has also been exten-
sively used by coaches for the effective training of athletes.
Presently, there are several commercial technological solutions
for sports video analysis. However, these systems, again, tend
to be expensive to purchase and run.
Advances in camera technology, coupled with falling prices,
mean that reasonable-quality visual capture is now within
reach of most local and amateur sporting and leisure organi-
zations. Thus it becomes feasible for every field sports club,
whether tennis, soccer, cricket or hockey, to install their own
camera network at their local ground. By enabling sports video
analysis with low cost camera networks, many local amateur
clubs and sports institutions will be able to make use of these
types of technologies. In these cases, the motivation is usually
not for broadcast purposes, but rather for the technology to act
as a video referee or adjudicator, and also to facilitate coaches
and mentors to provide better feedback to athletes based on
recorded competitive training matches, training drills or any
prescribed set of activities.
In this work, we focus on tracking a tennis ball in 3D space
during a tennis match using the videos obtained from a low-cost
camera network. Although the obtained 3D data could be
used for decision making purposes, as in Hawk-Eye, we focus
on its use as a low-cost tennis analysis system for coaching.
This 3D ball track data can be used for analysis purposes
such as determining the speed of the ball over the net (a
common tennis coach requirement), classification of type of
shots played by the players, or to index the video frames and
classify important events for coaching [1]. One of the main
problems with low-cost camera networks is that there is
typically no synchronization between sensors; as such, a need
for automatic video synchronization algorithms exists. In
addition, less expensive cameras also introduce distortion [2]
in the acquired videos, hence calibration of both the camera
intrinsics and extrinsics is essential. In this work, we also
introduce a physics-based model into our system to predict the
position of the ball when there is a lack of overlapping data
from different camera views. Modeling of the ball trajectory is
an essential part of this system, as it provides continuity of
the tracked features, leading to improved tracking robustness.
The remainder of this paper is organized as follows: Section
II outlines previous work in the area. We give a high-level
overview of our system in Section III. In that section, we
subsequently describe the video analysis components that
underpin the ball tracking techniques. In addition, that section
provides details on the physics-based modeling that we
incorporated and the visualization framework developed using OpenGL.
Section IV provides quantitative experimental evaluation of
our system in terms of reprojection error, and graphical results
indicating the advantages of prediction. Finally, we give our
conclusions and directions for future work in section V.
II. RELATED WORK
The work of [1] illustrates how a low-cost camera network
could be effectively used for performance analysis if ball and
player tracks are known. Our work extends that described by
Aksay et al. [3], where techniques for 2D ball tracking,
feature-based automatic video synchronization and 3D estimation
are described. We utilize the above-mentioned techniques and
improve the overall quality of the system by developing our

Fig. 1. Camera locations around the court.
Fig. 2. Block diagram of the system.
own algorithm for prediction in case of missing points in the
trajectory. For this work, the dataset from the “3DLife ACM
Multimedia Grand Challenge 2010 Dataset” [4] is utilized.
This dataset includes 9 video streams of a competitive singles
tennis match scenario from 9 IP cameras placed at different
positions around an entire tennis court (see Figure 1). This
dataset also includes chessboard images and 3D locations of
some known objects in the scene for camera calibration.
III. ALGORITHMIC DESIGN
Figure 2 represents our system at a block level. We use
videos acquired from both sideview cameras and the overhead
camera in the dataset for 2D ball detection and tracking, as
explained in III-A. Camera calibration data is acquired for
each individual camera using the Matlab camera calibration
toolbox [5]. Once the tracking information from each 2D
camera view has been acquired, every camera's video stream is
synchronized with respect to the overhead camera (see Section
III-B). Once synchronized, the 2D ball tracking is extended into
3D space using the camera calibration information, as explained
in Section III-C. We then introduce a physics-based trajectory
model, which is required to provide continuity in the obtained
3D ball tracks. The algorithmic description of the trajectory
modeling is provided in Section III-D. Finally, we created a
virtual tennis court using OpenGL and visualized the motion
of the ball, which is explained in detail in Section III-E.
Fig. 3. Results of object tracking using frame differencing and thresholding:
(a) original frame; (b) dilated moving pixels; (c) ball blob.
A. 2D Ball Detection and Tracking
Object tracking techniques include frame differencing, op-
tical flow, mean-shift and various other methods. A simple
frame differencing and thresholding method would suffice in
the given context, since the data set provided has a static
background with the only moving objects being tennis ball
and players. In order to extract the ball trajectories, we begin
with detection of ball candidates for every video frame S(n).
We use a method similar to the one described in [3]. All of
the moving parts of the frame that satisfy certain color and
size constraints are initially considered as ball candidates. We
detect moving parts using the differences between adjacent
luminance frames. For the n-th luminance frame, $S_y(n)$, we
obtain the moving parts by thresholding the image $M(n)$,
calculated as:
$$M(n) = \left| S_y(n+1) - S_y(n) \right| \cdot \left| S_y(n) - S_y(n-1) \right| \tag{1}$$
where the $\cdot$ in the above equation represents
element-by-element multiplication. In this way, the real moving
parts of $S(n)$ are heavily emphasised in $M(n)$. Using 3 adjacent
frames to detect moving parts in the middle frame, as in
equation 1, is a necessary step so that ambiguities in the
location of moving parts are avoided.
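As an illustration of equation (1), the following minimal NumPy sketch computes M(n) from three consecutive luminance frames and thresholds it. The function name `moving_mask` and the `threshold` parameter are assumptions made for this example; the paper does not state the threshold value it used.

```python
import numpy as np


def moving_mask(prev_y, curr_y, next_y, threshold):
    """Boolean mask of pixels that moved in the middle frame (equation 1)."""
    # Cast to float first so that uint8 subtraction cannot wrap around.
    prev_y, curr_y, next_y = (
        np.asarray(frame, dtype=np.float64) for frame in (prev_y, curr_y, next_y)
    )
    # Element-by-element product of the two absolute frame differences.
    m = np.abs(next_y - curr_y) * np.abs(curr_y - prev_y)
    # Hypothetical fixed threshold; in practice it would be set empirically.
    return m > threshold
```

A pixel survives only if it differs from both the previous and the next frame, which is why the product localizes motion in the middle frame rather than leaving a trail at the old and new positions.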
To eliminate the false candidates from the obtained locations,
colour, size and distance constraints are applied. We first
eliminate false candidates based on the colour information.
The blue, $C_b$, and red, $C_r$, chroma channel values of the
tennis ball are inspected over different frames and for
different cameras. An empirical range of $C_b$ and $C_r$ values
is set, and moving pixels outside this range are eliminated.
Next, we apply a morphological operation (dilation) to enlarge
the regions of moving pixels, which would otherwise make it
very difficult to discriminate between different objects.
Dilation also helps in distinguishing the blobs corresponding
to players from those corresponding to the tennis ball, as the
player blobs are much larger than the ball blobs. Hence, by
empirically setting a threshold value on the blob size, larger
blobs are removed. The final constraint considered when
eliminating false ball candidates is based on the distance
between tennis ball positions in two consecutive frames. A
maximum distance is set and any moving pixels outside this
distance are eliminated. This way, only one coordinate
(corresponding to the center of mass of the tennis ball) is
extracted for each frame. Figure 3(b) and (c) show the results
of the above techniques on an input frame.
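The size and distance constraints above can be sketched as a small filter over blob candidates. This is a hedged illustration only: the name `select_ball`, the `(cx, cy, area)` candidate format, and the tie-breaking rules (nearest blob to the previous position, or smallest blob when no previous position exists) are assumptions, since the paper does not specify how a single coordinate is chosen.

```python
import math


def select_ball(candidates, previous_xy, max_area, max_jump):
    """Pick one ball position from blob candidates, or None if all are rejected.

    candidates  : list of (cx, cy, area) blobs that already passed the colour test
    previous_xy : ball position in the previous frame, or None if unknown
    """
    # Size constraint: blobs above the area threshold are assumed to be players.
    small = [c for c in candidates if c[2] <= max_area]
    if previous_xy is not None:
        # Distance constraint: the ball cannot jump further than max_jump pixels.
        small = [c for c in small if math.dist(c[:2], previous_xy) <= max_jump]
        key = lambda c: math.dist(c[:2], previous_xy)  # assumed: prefer nearest
    else:
        key = lambda c: c[2]  # assumed: prefer the smallest blob
    if not small:
        return None
    best = min(small, key=key)
    return (best[0], best[1])
```

For example, with blobs at (100, 100), (105, 103) and (300, 50) and a previous position of (104, 102), the candidate at (105, 103) is returned when `max_jump` is 20 pixels.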

Fig. 4. Plot of LM (∆) vs. Framedelay
B. 3D Estimation and Video Synchronization
Since the tennis videos in our data set are recorded at
different frame rates, there is no guarantee that the videos
were started at the same time or that no frames were dropped
within each sequence. As such, there is a need to synchronize
these videos before the 2D ball tracks from multiple cameras
can be used for 3D estimation. We therefore implemented the
feature-based automatic video synchronization technique
explained in [3]. This method requires estimated 3D coordinate
features for each frame in order to determine the
de-synchronization timing. Hence, it is an inter-dependency
problem, where 3D coordinates are required to synchronize the
videos, and synchronized videos are required to calculate
accurate 3D coordinates. We first calculate the 3D trajectories
point-by-point using triangulation [6] of two 2D trajectories
from the two videos to be synchronized. Then, the 3D
trajectories are back-projected onto one of the camera views.
Assuming that the camera calibration is accurate, the
back-projected 3D trajectories should be almost identical to
the original 2D camera trajectories when the time shift used
is close to the real de-synchronization of the videos.
However, due to issues such as non-ideal calibration data and
outlier 3D trajectories, the measure $LM(\Delta)$ suggested in
[3] is used to find the best matching time shift $\Delta_{\max}$:
$$LM(\Delta) = \frac{L(\Delta)}{D(\Delta)}, \qquad L(\Delta) = \operatorname{count}\left( \lVert or - bp \rVert < T_L \right) \tag{2}$$
$$D(\Delta) = \frac{\sum_{\lVert or - bp \rVert < T_L} \lVert or - bp \rVert}{L(\Delta)} \tag{3}$$
where $\lVert or - bp \rVert$ is the Euclidean distance between the
original point and the back-projected point, and $\Delta$ is the
tested time shift. $D(\Delta)$ is the normalized Euclidean distance
between points, calculated using only those points whose
reprojected points are within an empirically set distance $T_L$.
The required time shift is
$$\Delta_{\max} = \arg\max_{\Delta} LM(\Delta) \tag{4}$$
Figure 4 shows the plot of $LM(\Delta)$ for different frame delays
in a test scenario. We choose the value of frame delay
that corresponds to the maximum value of $LM(\Delta)$. In this
work, all the videos were synchronized with reference to
the overhead camera, since this camera has a field of view
covering the whole of the tennis court.
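The search over time shifts in equations (2) to (4) can be sketched as follows. This is a minimal illustration, not the implementation from [3]: `lm_score`, `best_shift` and the callable `pairs_for_shift` (which is assumed to triangulate and back-project the tracks for a given shift and return the original and back-projected 2D points) are hypothetical names introduced here.

```python
import math


def lm_score(original, back_projected, t_l):
    """LM = L / D for one candidate time shift (equations 2 and 3)."""
    dists = [math.dist(o, b) for o, b in zip(original, back_projected)]
    inliers = [d for d in dists if d < t_l]
    count = len(inliers)               # L(delta): number of points within T_L
    if count == 0:
        return 0.0                     # no support for this shift
    mean_dist = sum(inliers) / count   # D(delta): mean inlier distance
    if mean_dist == 0.0:
        return math.inf                # perfect overlap
    return count / mean_dist


def best_shift(shifts, pairs_for_shift, t_l):
    """Return the shift with the largest LM value (equation 4)."""
    return max(shifts, key=lambda s: lm_score(*pairs_for_shift(s), t_l))
```

The measure rewards shifts that produce many inliers with a small mean distance, so a shift supported by only a few accidental matches scores lower than one where most of the trajectory lines up.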
C. Robust 3D Tracking
A disadvantage of considering only two cameras is that
we do not obtain a temporally continuous 3D ball track, because
synchronized 2D data is not available in both views for every
frame. To overcome this drawback, we employed a robust 3D
tracking method using 3D coordinates obtained from different
camera pairs at different points in time. We combine the
tracking data from these multiple cameras to calculate a more
stable, robust and accurate 3D ball trajectory.
Let the 2D coordinate of the tennis ball at time instance $t$ in
the $i$-th camera view be $p_{2D,i} = [x_i(t), y_i(t)]^T$. We
calculate 3D points using triangulation of the 2D point in each
camera ($p_{2D,i}$) with the 2D point in the overhead camera (the
9th camera) ($p_{2D,9}$):
$$p_{3D,i} = \operatorname{triangulate}(p_{2D,i}, p_{2D,9}) \tag{5}$$
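The paper refers to [6] for the triangulation step in equation (5) and does not spell out the method. As one common possibility, the sketch below uses standard linear (DLT) triangulation with 3x4 projection matrices. The names `triangulate` and `project`, and the assumption that lens distortion has already been removed from the 2D points, are choices made for this example.

```python
import numpy as np


def triangulate(P_a, P_b, xy_a, xy_b):
    """Linear (DLT) triangulation of one 3D point from two calibrated views.

    P_a, P_b   : 3x4 camera projection matrices (assumed known from calibration)
    xy_a, xy_b : undistorted pixel coordinates of the ball in each view
    """
    rows = []
    for P, (x, y) in ((P_a, xy_a), (P_b, xy_b)):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point to pixel coordinates (used for back-projection)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]
```

The same `project` helper gives the back-projected 2D point used later to judge how well a triangulated point agrees with the overhead view.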
The 3D points calculated at each time instance correspond
to one real-world 3D coordinate, and ideally all of them should
be identical. However, due to several factors such as camera
calibration errors, 2D tracker errors or triangulation
approximation, the 3D points will tend to differ slightly, so
some formal technique for combining these multiple 3D points is
needed. In this work, we use weighted averaging to find a
robust and accurate 3D point $p_{3D}$:
$$p_{3D} = \frac{\sum_i w_i \, p_{3D,i}}{\sum_i w_i} \tag{6}$$
where $w_i$ is a measure of the level of accuracy of each
3D point $p_{3D,i}$. $w_i$ is calculated as the inverse of the
Euclidean distance between the original 2D point ($p_{2D,9}$) and
the back-projected 2D point ($bp_{2D,i}$) in the 9th camera view,
as shown below:
$$w_i = \frac{1}{d_i} = \frac{1}{\lVert p_{2D,9} - bp_{2D,i} \rVert} \tag{7}$$
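Equations (6) and (7) amount to an inverse-distance weighted mean. A minimal sketch follows; the function name `fuse_points` and the small `eps` guard against a zero back-projection distance are assumptions added for this example.

```python
def fuse_points(points_3d, reproj_dists, eps=1e-9):
    """Weighted average of per-pair 3D estimates (equations 6 and 7).

    points_3d    : list of (x, y, z) estimates, one per camera pair
    reproj_dists : distance d_i in the overhead view for each estimate, in pixels
    """
    # w_i = 1 / d_i; eps (an assumption) avoids division by zero.
    weights = [1.0 / max(d, eps) for d in reproj_dists]
    total = sum(weights)
    return tuple(
        sum(w * p[axis] for w, p in zip(weights, points_3d)) / total
        for axis in range(3)
    )
```

For instance, fusing (0, 0, 0) with distance 1 and (3, 3, 3) with distance 2 gives weights 1 and 0.5, so the result is (1, 1, 1): the estimate with the smaller back-projection distance pulls the average towards itself.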
D. Physics Based Trajectory Modeling
Temporal prediction of the ball coordinates through times
when no 3D ball information is available is necessary
to increase the continuity of the tracked features. This
prediction is achieved by considering the trajectory of the
ball to be that of a projectile. Projectiles are particles
which are projected under gravity through air, such as objects
thrown by hand or shells fired from a gun. Typically,
projectiles are described mathematically with both horizontal
and vertical velocity components, and are subject to a
downward vertical acceleration (i.e. acceleration due to
gravity). To simplify the problem, a few assumptions have been
made. Effects such as air resistance and ball spin, which would
require modification of the model, have been neglected. We
consider the following kinematic equations of motion to predict
the position of the ball in case of missing 3D points:
$$v = u + at \tag{8}$$
$$s = ut + \frac{1}{2} a t^2 \tag{9}$$
$$v^2 = u^2 + 2as \tag{10}$$
where $v$ is the velocity at any time $t$, $u$ is the initial velocity, $a$
is the acceleration, and $s$ is the displacement in time $t$. In our
problem, as air resistance and ball spin are not considered, only
the Z component of acceleration is non-zero (equal to −g, the
acceleration due to gravity), so the X and Y components of
acceleration are set to zero. We consider the x, y and z
components of velocity separately and apply the above
equations to predict the position of the ball in case of missing
points in the ball trajectory.
Fig. 5. Visualization of ball trajectory at two different viewpoints.
The coordinates are predicted using the following steps. Say
frames i to i + k require prediction:
1) For the frame i with no 3D coordinate estimated, the
x, y, z components of velocity are found using the
tracked 3D points in frames i − 1 and i − 2.
2) Using this velocity information and the equations of motion,
the 3D coordinate in frame i is predicted.
3) Steps 1 and 2 are carried out for all consecutive frames
(up to frame i + k).
4) Predicted points are retained only if the predicted point
in frame i + k is within some tolerable distance of the
estimated 3D coordinate in frame i + k + 1.
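The four steps above can be sketched as a gap-filling routine. This is an illustration under stated assumptions rather than the authors' code: coordinates are taken to be in metres with Z pointing up, `dt` is the frame interval in seconds, and the names `predict_next` and `fill_gaps` are hypothetical. The paper only says that velocities are found from frames i − 1 and i − 2; treating that finite difference as a mid-interval velocity and advancing it by half a frame with equation (8) is a choice made here.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumes metres and seconds)


def predict_next(p_prev2, p_prev1, dt, g=G):
    """Predict the 3D point one frame after p_prev1 (drag-free projectile)."""
    accel = (0.0, 0.0, -g)  # only the Z component is non-zero
    predicted = []
    for a, x2, x1 in zip(accel, p_prev2, p_prev1):
        u_mid = (x1 - x2) / dt      # step 1: finite-difference velocity
        u = u_mid + a * dt / 2.0    # equation (8): advance by half a frame
        predicted.append(x1 + u * dt + 0.5 * a * dt * dt)  # equation (9)
    return tuple(predicted)


def fill_gaps(track, dt, tol, g=G):
    """Return a copy of track with validated predictions replacing None entries."""
    filled = list(track)
    i = 0
    while i < len(filled):
        if filled[i] is not None:
            i += 1
            continue
        j = i
        while j < len(filled) and filled[j] is None:
            j += 1  # j ends at the first tracked frame after the gap
        has_history = i >= 2 and filled[i - 1] is not None and filled[i - 2] is not None
        if has_history and j < len(filled):
            p2, p1 = filled[i - 2], filled[i - 1]
            predictions = []
            for _ in range(i, j):  # step 3: repeat for frames i .. i+k
                p2, p1 = p1, predict_next(p2, p1, dt, g)
                predictions.append(p1)
            # Step 4: keep the run only if its last point is near the next
            # tracked point; tol must allow for one frame of ball motion.
            if math.dist(predictions[-1], filled[j]) <= tol:
                filled[i:j] = predictions
        i = j
    return filled
```

Gaps at the very start of a track (fewer than two preceding points) or running to the end of the track (no following point to validate against) are left unfilled in this sketch.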
E. Visualization Framework
If the developed algorithms are to be used effectively for
performance analysis by coaches or as a decision-making
tool, a 3D graphical user interface (GUI) is essential, as the
visualisation makes the system more intuitive and appealing.
We have developed a GUI using OpenGL [7], one of the most
widely used and supported 2D and 3D graphics application
programming interfaces (APIs). It is hardware independent and
highly portable, hence it can be used across many
platforms. The framework we developed is a virtual tennis
court, with an interface for selecting different camera views
and zoom-in features (see Figure 5).
IV. EXPERIMENTAL RESULTS
To evaluate our approach, we quantify the accuracy of our
system in terms of reprojection error, which is defined as the
distance between the actual 2D pixel coordinates and the
reprojected pixel coordinates, calculated using the L1 norm.
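A minimal sketch of this error measure is given below. The paper does not state how per-frame errors are aggregated, so averaging over frames is an assumption, as is the function name `mean_l1_reprojection_error`.

```python
def mean_l1_reprojection_error(observed, reprojected):
    """Mean L1 distance, in pixels, between observed and reprojected 2D points."""
    if not observed or len(observed) != len(reprojected):
        raise ValueError("need two non-empty point lists of equal length")
    total = sum(
        abs(ox - rx) + abs(oy - ry)  # L1 norm of the pixel offset
        for (ox, oy), (rx, ry) in zip(observed, reprojected)
    )
    return total / len(observed)  # assumed aggregation: mean over frames
```

For example, observed points (10, 10) and (20, 20) with reprojections (12, 9) and (20, 24) give per-frame errors of 3 and 4 pixels, and a mean of 3.5 pixels.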
TABLE I
REPROJECTION ERRORS
Camera Combinations    Reprojection Error (pixels)
Camera 2 & 9           10.5891
Camera 4 & 9            7.2260
Camera 2, 4 & 9        12.8549
Table I shows the reprojection error obtained for different
combinations of cameras used for 3D tracking. As the number
of cameras considered for analysis increases, the number of
tracked points also increases, but at the cost of a higher
reprojection error. Unfortunately, for the tracked points
obtained with the prediction model included, the reprojection
error cannot be calculated, since we do not have ground truth
data to compare with.
Fig. 6. Continuity of tracked features for different conditions.
Fig. 7. Reprojected trajectory (green) of the ball: (a) without prediction;
(b) with prediction.
A graphical representation of the results obtained for various
techniques is shown in Figure 6. In this figure, time is on
the horizontal axis, and times at which the ball is tracked are
highlighted with a horizontal line, with times when the ball
track is lost represented by a gap. From this figure, we can
see that increasing the number of cameras used for tracking
also increases the continuity of the tracked features. The
bottom line represents continuity when trajectory modeling
is included. We can observe that some of the gaps are filled
after incorporating prediction into the system.
This advantage of incorporating trajectory modeling can
also be seen in Figure 7, where trajectories with and without
prediction are depicted. Notice how the tracked trajectory of
the ball (in green) is extended in (b) when compared to (a).
V. CONCLUSIONS AND FUTURE WORK
In this paper we presented algorithms associated with 2D
and 3D object tracking and video synchronization. We also
presented a basic, physics-based model to increase the
continuity of the tracked features. We believe that, though
the prediction model is very basic, it could be the first step
towards the development of a more complex modelling system. In
future work, a more accurate model of the ball trajectory could
be developed to improve the continuity of the tracked features,
by considering real-world effects such as ball spin and air
resistance.
ACKNOWLEDGMENT
This work is supported by Science Foundation Ireland under grant
07/CE/I1147.

REFERENCES
[1] P. Kelly, J. Diego, P.-M. Agapito, C. O. Conaire, D. Connaghan,
J. Kuklyte, and N. E. O’Connor, “Performance analysis and visualisation in
tennis using a low-cost camera network,” in Multimedia Grand Challenge
Track at ACM Multimedia, 2010.
[2] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the
OpenCV Library, M. Loukides, Ed. O’Reilly Publications, 2008.
[3] A. Aksay, V. Kitanovski, K. Vaiapury, E. Onasoglou, J. D. P. M. Agapito,
P. Daras, and E. Izquierdo, “Robust 3D tracking in tennis videos,” in
Engage Summer School, Sept. 2010.
[4] C. O. Conaire, P. Kelly, D. Connaghan, and N. E. O’Connor,
“Tennissense: A platform for extracting semantic information from
multi-camera tennis data,” in DSP 2009 - 16th International Conference on
Digital Signal Processing, 2009, pp. 1062–1067.
[5] J. Bouguet, “Camera calibration toolbox for Matlab.” [Online]. Available:
http://www.vision.caltech.edu/bouguetj/
[6] Y. Morvan, “Acquisition, compression and rendering of depth and
texture for multiview video,” Ph.D. dissertation, Eindhoven University of
Technology, Eindhoven, The Netherlands, 2009.
[7] K. Group, “OpenGL overview.” [Online]. Available:
http://www.opengl.org/about/overview/
Citations
More filters

Journal ArticleDOI
TL;DR: An exhaustive survey of all the published research works on ball tracking in a categorical manner is presented to present discussions on the published work so far and views and opinions followed by a modified block diagram of the tracking process.
Abstract: Increase in the number of sport lovers in games like football, cricket, etc. has created a need for digging, analyzing and presenting more and more multidimensional information to them. Different classes of people require different kinds of information and this expands the space and scale of the required information. Tracking of ball movement is of utmost importance for extracting any information from the ball based sports video sequences. Based on the literature survey, we have initially proposed a block diagram depicting different steps and flow of a general tracking process. The paper further follows the same flow throughout. Detection is the first step of tracking. Dynamic and unpredictable nature of ball appearance, movement and continuously changing background make the detection and tracking processes challenging. Due to these challenges, many researchers have been attracted to this problem and have produced good results under specific conditions. However, generalization of the published work and algorithms to different sports is a distant dream. This paper is an effort to present an exhaustive survey of all the published research works on ball tracking in a categorical manner. The work also reviews the used techniques, their performance, advantages, limitations and their suitability for a particular sport. Finally, we present discussions on the published work so far and our views and opinions followed by a modified block diagram of the tracking process. The paper concludes with the final observations and suggestions on scope of future work.

27 citations


Journal ArticleDOI
Abstract: This paper presents a novel framework for predicting shot location and type in tennis. Inspired by recent neuroscience discoveries, we incorporate neural memory modules to model the episodic and semantic memory components of a tennis player. We propose a Semi-Supervised Generative Adversarial Network architecture that couples these memory models with the automatic feature learning power of deep neural networks, and demonstrate methodologies for learning player level behavioral patterns with the proposed framework. We evaluate the effectiveness of the proposed model on tennis tracking data from the 2012 Australian Tennis Open and exhibit applications of the proposed method in discovering how players adapt their style depending on the match context.

16 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: It is shown theoretically and empirically that a simple motion trajectory analysis suffices to translate from pixel measurements to the person's metric height, reaching a MAE of up to 3.9 cm on jumping motions, and that this works without camera and ground plane calibration.
Abstract: Estimating the metric height of a person from monocular imagery without additional assumptions is ill-posed. Existing solutions either require manual calibration of ground plane and camera geometry, special cameras, or reference objects of known size. We focus on motion cues and exploit gravity on earth as an omnipresent reference 'object' to translate acceleration, and subsequently height, measured in image-pixels to values in meters. We require videos of motion as input, where gravity is the only external force. This limitation is different to those of existing solutions that recover a person's height and, therefore, our method opens up new application fields. We show theoretically and empirically that a simple motion trajectory analysis suffices to translate from pixel measurements to the person's metric height, reaching a MAE of up to 3.9 cm on jumping motions, and that this works without camera and ground plane calibration.

7 citations


Cites background or methods from "3D Estimation and Visualization of ..."

  • ..., when the direction of gravity, camera intrinsic, and extrinsic parameters are calibrated, it is true that q can be further decomposed to compute the object’s distance d and extend in all directions, which was the focus of previous studies [16, 23, 24, 17, 29]....

    [...]

  • ...[17] analyze tennis and use physics to fill-in frames for which no multiview triangulation of the ball is available....

    [...]

  • ...Our method is inspired by approaches that estimate the 3D trajectory of rigid objects in free fall [16, 23, 24, 17, 29]....

    [...]


01 Jan 2015
Abstract: Tracking a moving object and reconstructing its trajectory can be done with a stereo camera system, since the two cameras enable depth vision. However, such a system would not work if one of the cameras fails to detect the object. If that happens, it would be beneficial if the system could still use the functioning camera to make an approximate trajectory reconstruction.In this study, I have investigated how past observations from a stereo system can be used to recreate trajectories when video from only one of the cameras is available. Several approaches have been implemented and tested, with varying results. The best method was found to be a nearest neighbors-search optimized by a Kalman filter. On a test set with 10000 golf shots, the algorithm was able to create estimations which on average differed around 3.5 meters from the correct trajectory, with better results for trajec-tories originating close to the camera.

5 citations


Cites background or methods from "3D Estimation and Visualization of ..."

  • ...[22] have created a multi-camera system with a physics model which tracks the trajectory of a tennis ball....

    [...]

  • ...Possible use cases are automated referee systems [22], registration for statistical purposes [6], improving TV-broadcasts with augmented graphics etc....

    [...]


Journal ArticleDOI
Joongsik Kim1, Moonsoo Ra1, Hongjun Lee1, Jeyeon Kim1, Whoi-Yul Kim1 
TL;DR: The experimental results show that the proposed method can estimate a 3D baseball trajectory precisely using a multiple unsynchronized camera system and is robust to variations in capture delay, both in the simulation space and in real-world situations.
Abstract: We developed a method for the precise estimation of the 3D trajectory of a baseball by modeling the movement of the baseball and estimating the capture delay, using multiple unsynchronized cameras. To develop the proposed algorithm, we mimicked the real-world process of capturing a baseball in simulation space, and analyzed the capture process using a multiple unsynchronized camera system. We represented the movement of the baseball using a piece-wise spline model, and predicted the position of the baseball in the subframes in a manner which is robust to position error and change in direction of movement of the baseball. This method accurately predicts the baseball position over time by modeling the movement of the baseball in a real baseball game environment, and improves the accuracy of the reconstructed 3D baseball trajectories. We defined an objective function to estimate the capture delay, and estimate the optimal capture delay parameter using non-linear optimization method. In addition, we evaluated the performance of the proposed method in simulation space and in a real-world situation. The experimental results show that the proposed method can estimate a 3D baseball trajectory precisely using a multiple unsynchronized camera system and is robust to variations in capture delay, both in the simulation space and in real-world situations.

3 citations


Cites background from "3D Estimation and Visualization of ..."

  • ...frame units have previously been proposed [10], [11]....



References

DOI
01 Jan 2009
TL;DR: A new multi-view depth-estimation technique is proposed, employing a one-dimensional optimization strategy that reduces the noise level in the estimated depth images and enforces consistent depth images across the views, and is suitable for execution on a standard Graphics Processor Unit (GPU).
Abstract: Three-dimensional (3D) video and imaging technologies are an emerging trend in the development of digital video systems, as we presently witness the appearance of 3D displays, coding systems, and 3D camera setups. Three-dimensional multi-view video is typically obtained from a set of synchronized cameras, which capture the same scene from different viewpoints. This technique especially enables applications such as free-viewpoint video or 3D-TV. Free-viewpoint video applications provide the feature to interactively select and render a virtual viewpoint of the scene. A 3D experience, such as in 3D-TV, is obtained if the data representation and display make it possible to distinguish the relief of the scene, i.e., the depth within the scene. With 3D-TV, the depth of the scene can be perceived using a multi-view display that simultaneously renders several views of the same scene. To render these multiple views on a remote display, an efficient transmission, and thus compression, of the multi-view video is necessary. However, a major problem when dealing with multi-view video is the intrinsically large amount of data to be compressed, decompressed and rendered. We aim at an efficient and flexible multi-view video system, and explore three different aspects. First, we develop an algorithm for acquiring a depth signal from a multi-view setup. Second, we present efficient 3D rendering algorithms for a multi-view signal. Third, we propose coding techniques for 3D multi-view signals, based on the use of an explicit depth signal. Accordingly, the thesis is divided into three parts. The first part (Chapter 3) addresses the problem of 3D multi-view video acquisition. Multi-view video acquisition refers to the task of estimating and recording a 3D geometric description of the scene. A 3D description of the scene can be represented by a so-called depth image, which can be estimated by triangulation of the corresponding pixels in the multiple views.
Initially, we focus on the problem of depth estimation using two views, and present the basic geometric model that enables the triangulation of corresponding pixels across the views. Next, we review two calculation/optimization strategies for determining corresponding pixels: a local and a one-dimensional optimization strategy. Second, to generalize from the two-view case, we introduce a simple geometric model for estimating the depth using multiple views simultaneously. Based on this geometric model, we propose a new multi-view depth-estimation technique, employing a one-dimensional optimization strategy that (1) reduces the noise level in the estimated depth images and (2) enforces consistent depth images across the views. The second part (Chapter 4) details the problem of multi-view image rendering. Multi-view image rendering refers to the process of generating synthetic images using multiple views. Two different rendering techniques are initially explored: a 3D image warping and a mesh-based rendering technique. Each of these methods has its limitations and suffers from either high computational complexity or low image rendering quality. As a consequence, we present two image-based rendering algorithms that improve the balance between the aforementioned issues. First, we derive an alternative formulation of the relief texture algorithm, which was extended to the geometry of multiple views. The proposed technique features two advantages: it avoids rendering artifacts ("holes") in the synthetic image and it is suitable for execution on a standard Graphics Processor Unit (GPU). Second, we propose an inverse mapping rendering technique that allows a simple and accurate re-sampling of synthetic pixels. Experimental comparisons with 3D image warping show an improvement of rendering quality of 3.8 dB for the relief texture mapping and 3.0 dB for the inverse mapping rendering technique.
The third part concentrates on the compression problem of multi-view texture and depth video (Chapters 5–7). In Chapter 5, we extend the standard H.264/MPEG-4 AVC video compression algorithm for handling the compression of multi-view video. As opposed to the Multi-view Video Coding (MVC) standard that encodes only the multi-view texture data, the proposed encoder performs the compression of both the texture and the depth multi-view sequences. The proposed extension is based on exploiting the correlation between the multiple camera views. To this end, two different approaches for predictive coding of views have been investigated: a block-based disparity-compensated prediction technique and a View Synthesis Prediction (VSP) scheme. Whereas VSP relies on an accurate depth image, the block-based disparity-compensated prediction scheme can be performed without any geometry information. Our encoder adaptively selects the most appropriate prediction scheme using a rate-distortion criterion for an optimal prediction-mode selection. We present experimental results for several texture and depth multi-view sequences, yielding a quality improvement of up to 0.6 dB for the texture and 3.2 dB for the depth, when compared to solely performing H.264/MPEG-4 AVC disparity-compensated prediction. Additionally, we discuss the trade-off between random access to a user-selected view and the coding efficiency. Experimental results illustrating and quantifying this trade-off are provided. In Chapter 6, we focus on the compression of a depth signal. We present a novel depth image coding algorithm which concentrates on the special characteristics of depth images: smooth regions delineated by sharp edges. The algorithm models these smooth regions using parameterized piecewise-linear functions and sharp edges by a straight line, so that it is more efficient than a conventional transform-based encoder.
To optimize the quality of the coding system for a given bit rate, a special global rate-distortion optimization balances the rate against the accuracy of the signal representation. For typical bit rates, i.e., between 0.01 and 0.25 bit/pixel, experiments have revealed that the coder outperforms a standard JPEG-2000 encoder by 0.6-3.0 dB. Preliminary results were published in the Proceedings of the 26th Symposium on Information Theory in the Benelux. In Chapter 7, we propose a novel joint depth-texture bit-allocation algorithm for the joint compression of texture and depth images. The described algorithm combines the depth and texture Rate-Distortion (R-D) curves, to obtain a single R-D surface that allows the optimization of the joint bit-allocation in relation to the obtained rendering quality. Experimental results show an estimated gain of 1 dB compared to a compression performed without joint bit-allocation optimization. Besides this, our joint R-D model can be readily integrated into a multi-view H.264/MPEG-4 AVC coder because it yields the optimal compression setting with a limited computational effort.

81 citations


Proceedings ArticleDOI
05 Jul 2009
TL;DR: TennisSense, a technology platform for the digital capture, analysis and retrieval of tennis training and matches, is introduced and the algorithms for extracting useful metadata from the overhead court camera are described and evaluated.
Abstract: In this paper, we introduce TennisSense, a technology platform for the digital capture, analysis and retrieval of tennis training and matches. Our algorithms for extracting useful metadata from the overhead court camera are described and evaluated. We track the tennis ball using motion images for ball candidate detection and then link ball candidates into locally linear tracks. From these tracks we can infer when serves and rallies take place. Using background subtraction and hysteresis-type blob tracking, we track the tennis players' positions. The performance of both modules is evaluated using ground-truthed data. The extracted metadata provides valuable information for indexing and efficient browsing of hours of multi-camera tennis footage and we briefly illustrate how this data is used by our tennis-coach playback interface.

38 citations


"3D Estimation and Visualization of ..." refers background in this paper

  • ...A physics based trajectory model, which is required to provide continuity in obtained 3D data ball tracks, is then employed....



Proceedings Article
01 Oct 2010
TL;DR: A novel system for tennis performance analysis that allows coaches to review games and provide detailed audio-visual feedback to tennis athletes and can be generalised to other sports and allow a range of non-professional sports clubs to provide high-quality feedback to their athletes.
Abstract: We describe a novel system for tennis performance analysis that allows coaches to review games and provide detailed audio-visual feedback to tennis athletes. The basis for our system is a network of low-cost IP cameras surrounding the tennis court. Our system exploits the output of several visual analysis modules, including the tracking of players and the tennis ball, and the extraction of player silhouettes for 3D reconstruction. A range of intuitive tools within the interface allow tennis coaches to add 2D and 3D annotations to live video, view play from multiple perspectives, record audio commentary and compute game statistics in real-time. The result is a video file that can be used to provide personalised feedback to the players or for use as a teaching resource for others. While we focus on tennis in this work, we believe our system can be generalised to other sports and allow a range of non-professional sports clubs to provide high-quality feedback to their athletes.

9 citations


"3D Estimation and Visualization of ..." refers background or methods in this paper

  • ...Figure 2 represents our system at a block level....


  • ...This 3D data can be used for analysis purposes such as determining the speed of the ball over the net (a common tennis coach requirement), classification of type of shots played by the players, or to index the video frames and classify important events for coaching [1]....



Frequently Asked Questions (2)
Q1. What have the authors contributed in "3D Estimation and Visualization of Motion in a Multicamera Network for Sports"?

In this work, the authors develop image processing and computer vision techniques for visually tracking a tennis ball, in 3D, on a court instrumented with multiple low-cost IP cameras. Furthermore, the authors also incorporate a physics-based trajectory model into the system to improve the continuity of the tracked 3D ball during times when no two cameras have overlapping views of the ball location. Finally, the authors quantify the accuracy of their system in terms of reprojection error. 
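
To make the accuracy measure concrete, the following is a minimal sketch of how a reprojection error can be computed. It is not the authors' implementation: it assumes an ideal pinhole camera described by a 3x4 projection matrix, and the function names, camera matrices, and pixel values are all hypothetical illustration values.

```python
import math

def project(P, X):
    """Project 3D point X = (x, y, z) with a 3x4 pinhole projection matrix P."""
    xh = (X[0], X[1], X[2], 1.0)  # homogeneous coordinates
    u, v, w = (sum(p * c for p, c in zip(row, xh)) for row in P)
    return (u / w, v / w)

def reprojection_error(P, X, observed):
    """Pixel distance between an observed 2D point and the projection of X."""
    u, v = project(P, X)
    return math.hypot(u - observed[0], v - observed[1])

def mean_reprojection_error(cameras, X, observations):
    """Average the per-camera reprojection errors for one 3D estimate."""
    errors = [reprojection_error(P, X, obs)
              for P, obs in zip(cameras, observations)]
    return sum(errors) / len(errors)

# Hypothetical cameras: focal length 1000 px, principal point (640, 360).
# cam_a sits at the origin; cam_b is shifted 1 m along +x. Both look down +z.
cam_a = [[1000.0, 0.0, 640.0, 0.0],
         [0.0, 1000.0, 360.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
cam_b = [[1000.0, 0.0, 640.0, -1000.0],
         [0.0, 1000.0, 360.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]

ball = (0.5, 0.2, 10.0)                       # estimated 3D ball position (m)
detections = [(693.0, 384.0), (590.0, 380.0)]  # made-up 2D detections (px)

mean_err = mean_reprojection_error([cam_a, cam_b], ball, detections)
```

A low mean error indicates that the estimated 3D position is consistent with the 2D detections in every view; real cameras would also need lens distortion handled before this comparison.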

In future work, accurate modelling of the ball trajectory could be developed to ensure the continuity of the tracked features by considering real-time scenarios such as ball spin and air resistance.
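
As an illustration of what a physics-based trajectory model with air resistance might look like, here is a small sketch that predicts the ball position across a gap in the 3D track. It is an assumption-laden example, not the model used in the paper: the quadratic drag term, the lumped coefficient `drag_k`, the z-up coordinate frame, and all numeric values are hypothetical, and ball spin is left out.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def step(pos, vel, dt, drag_k=0.0):
    """Advance one time step under gravity and quadratic air drag.

    drag_k is a hypothetical lumped coefficient (1/m), so that the drag
    acceleration is -drag_k * |v| * v. The z axis points up.
    """
    speed = math.sqrt(sum(c * c for c in vel))
    acc = [-drag_k * speed * vel[0],
           -drag_k * speed * vel[1],
           -drag_k * speed * vel[2] - G]
    new_vel = [v + a * dt for v, a in zip(vel, acc)]
    # Semi-implicit Euler: the position update uses the new velocity.
    new_pos = [p + v * dt for p, v in zip(pos, new_vel)]
    return new_pos, new_vel

def predict(pos, vel, duration, dt=0.001, drag_k=0.0):
    """Predict position and velocity after `duration` seconds."""
    for _ in range(int(round(duration / dt))):
        pos, vel = step(pos, vel, dt, drag_k)
    return pos, vel

# Last triangulated state before the ball leaves the overlapping views
# (made-up values): position in metres, velocity in metres per second.
last_pos = [0.0, 0.0, 1.0]
last_vel = [20.0, 0.0, 2.0]

pos_vacuum, _ = predict(last_pos, last_vel, duration=0.2)
pos_drag, _ = predict(last_pos, last_vel, duration=0.2, drag_k=0.02)
```

With `drag_k=0.0` the sketch reduces to a plain ballistic arc; a nonzero coefficient shortens the predicted horizontal travel. In practice the coefficient would have to be fitted to measured trajectories.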