# 3D Estimation and Visualization of Motion in a Multicamera Network for Sports

07 Sep 2011-pp 15-19

TL;DR: This work develops image processing and computer vision techniques for visually tracking a tennis ball, in 3D, on a court instrumented with multiple low-cost IP cameras, and incorporates a physics-based trajectory model into the system.

Abstract: In this work, we develop image processing and computer vision techniques for visually tracking a tennis ball, in 3D, on a court instrumented with multiple low-cost IP cameras. The technique first obtains 2D ball tracking data from each camera view using 2D object tracking methods. Next, an automatic feature-based video synchronization method is applied. This technique uses the extracted 2D ball information from two or more camera views, plus camera calibration information. To find the 3D trajectory, the temporal 3D locations of the ball are estimated using triangulation of corresponding 2D locations obtained from the automatically synchronized videos. Furthermore, to improve the continuity of the tracked 3D ball during times when no two cameras have overlapping views of the ball location, we incorporate a physics-based trajectory model into the system. The resultant 3D ball tracks are then visualized in a virtual 3D graphical environment. Finally, we quantify the accuracy of our system in terms of reprojection error.

## Summary (2 min read)

### Introduction

- In professional sports the authors are familiar with high-end camera technology being used to enhance the viewer experience above and beyond a traditional broadcast.
- By enabling sports video analysis with low cost camera networks, many local amateur clubs and sports institutions will be able to make use of these types of technologies.
- This 3D ball track data can be used for analysis purposes such as determining the speed of the ball over the net (a common tennis coach requirement), classification of type of shots played by the players, or to index the video frames and classify important events for coaching [1].
- In addition, the use of less expensive cameras also leads to distortion [2] in the acquired videos; hence calibration of both camera intrinsics and extrinsics is essential.
- The remainder of this paper is organized as follows: Section II outlines previous work in the area.

### III. ALGORITHMIC DESIGN

- The authors use videos acquired from both sideview cameras and the overhead camera in the dataset for 2D ball detection and tracking, as explained in III-A. Camera calibration data is acquired for each individual camera using the Matlab camera calibration toolbox [5].
- Next, the authors apply morphological operations to enlarge the moving pixels, which would otherwise make it very difficult to discriminate between different objects.
- The final constraint considered when eliminating false ball candidates is based on the distance between tennis ball positions in two consecutive frames.
- As such, there is need to synchronize these videos before the 2D ball tracks from multiple cameras can be used for 3D estimation.

### C. Robust 3D Tracking

- A disadvantage of considering only two cameras is that the authors do not get a continuous temporal 3D ball track stream, because synchronized 2D data is not available in two views through all the frames.
- To overcome this drawback, the authors employed a robust 3D tracking method using 3D coordinates obtained from different camera pairs at different points in time.
- The authors combine the tracking data from these multiple cameras to calculate a more stable, robust and accurate 3D ball trajectory.
- The authors calculate 3D points by triangulating the 2D point in each camera ($p_{2D,i}$) with the 2D point in the overhead camera, the 9th camera ($p_{2D,9}$): $p_{3D,i} = \mathrm{triangulate}(p_{2D,i}, p_{2D,9})$ (5).
- The 3D points calculated at each time instance correspond to one real-world 3D coordinate, and ideally all of them should be identical.

### D. Physics Based Trajectory Modeling

- Temporal prediction of the ball coordinates through times when no 3D ball information is available is necessary to increase the continuity of the tracked features.
- Projectiles are particles which are projected under gravity through air, such as objects thrown by hand or shells fired from a gun.
- To simplify the problem, a few assumptions have been made.
- Parameters like air resistance and ball spin, which would require modification of the model, have been neglected.
- The authors have developed a GUI using OpenGL [7], one of the most widely used and supported 2D and 3D graphics application programming interfaces (APIs).

### IV. EXPERIMENTAL RESULTS

- To evaluate their approach, the authors quantify the accuracy of their system in terms of reprojection error, defined as the distance, computed with the L1 norm, between the actual 2D pixel coordinates and the reprojected pixel coordinates.
- As the number of cameras considered for analysis increases, the number of tracked points also increases, but at the cost of a higher reprojection error.
- Time is on the horizontal axis; times at which the ball is tracked are highlighted with a horizontal line, and times when the ball track is lost appear as gaps.
- The bottom line represents continuity when trajectory modeling is included.
- The authors can observe that some of the gaps are filled after incorporating prediction in the system.

3D Estimation and Visualization of Motion in a Multicamera Network for Sports

Anil Kumar, P. Shashidhar Chavan, Sharatchandra V.K

and Sumam David

National Institute of Technology Karnataka,

Karnataka, India

Email: {anil3487, shashi3494, sharatkashimath}@gmail.com

sumam@ieee.org

Philip Kelly and Noel E. O’Connor

CLARITY: Centre for Sensor Web Technologies,

Dublin City University,

Dublin 9, Ireland

Email: {philip.kelly, noel.oconnor}@dcu.ie

Abstract: In this work, we develop image processing and computer vision techniques for visually tracking a tennis ball, in 3D, on a court instrumented with multiple low-cost IP cameras. The technique first extracts 2D ball track data from each camera view, using object tracking methods. Next, an automatic feature-based video synchronization method is applied. This technique uses the extracted 2D ball information from two or more camera views, plus camera calibration information. Then, in order to find the 3D trajectory, the temporal 3D locations of the ball are estimated using triangulation of corresponding 2D locations obtained from the automatically synchronized videos. Furthermore, we also incorporate a physics-based trajectory model into the system to improve the continuity of the tracked 3D ball during times when no two cameras have overlapping views of the ball location. The resultant 3D ball tracks are then visualized in a virtual 3D graphical environment. Finally, we quantify the accuracy of our system in terms of reprojection error.

I. INTRODUCTION

In professional sports we are familiar with high-end camera technology being used to enhance the viewer experience above and beyond a traditional broadcast. High profile examples include the Hawk-Eye Officiating System as used in tennis, snooker and cricket. Whilst extremely valuable to the viewing experience, such technologies are only feasible for high profile professional sports. Sports video analysis has also been extensively used by coaches for the effective training of athletes. Presently, there are several commercial technological solutions for sports video analysis. However, these systems, again, tend to be expensive to purchase and run.

Advances in camera technology, coupled with falling prices, mean that reasonable quality visual capture is now within reach of most local and amateur sporting and leisure organizations. Thus it becomes feasible for every field sports club, whether tennis, soccer, cricket or hockey, to install their own camera network at their local ground. By enabling sports video analysis with low cost camera networks, many local amateur clubs and sports institutions will be able to make use of these types of technologies. In these cases, the motivation is usually not for broadcast purposes, but rather for the technology to act as a video referee or adjudicator, and also to help coaches and mentors provide better feedback to athletes based on recorded competitive training matches, training drills or any prescribed set of activities.

In this work, we focus on tracking a tennis ball in 3D space during a tennis match using the videos obtained from a low-cost camera network. Although the obtained 3D data could be used for decision making purposes, as in Hawk-Eye, we focus on its use as a low-cost tennis analysis system for coaching. This 3D ball track data can be used for analysis purposes such as determining the speed of the ball over the net (a common tennis coach requirement), classification of the type of shots played by the players, or to index the video frames and classify important events for coaching [1]. One of the main problems with low-cost camera networks is that there is typically no synchronization between sensors; as such, a need for automatic video synchronization algorithms exists. In addition, the use of less expensive cameras also leads to distortion [2] in the acquired videos; hence calibration of both camera intrinsics and extrinsics is essential. In this work, we also introduce a physics based model into our system, to predict the position of the ball when there is a lack of overlapping data from different camera views. Modeling of the ball trajectory is an essential part of this system as it provides continuity of the tracked features, leading to improved tracking robustness.

The remainder of this paper is organized as follows: Section II outlines previous work in the area. We give a high-level overview of our system in Section III. In that section, we subsequently describe the video analysis components that underpin the ball tracking techniques. In addition, that section provides details on the physics based modeling that we incorporated and the visualization framework developed using OpenGL. Section IV provides a quantitative experimental evaluation of our system in terms of reprojection error, and graphical results indicating the advantages of prediction. Finally, we give our conclusions and directions for future work in Section V.

II. RELATED WORK

The work of [1] illustrates how a low-cost camera network could be effectively used for performance analysis if ball and player tracks are known. Our work extends that described by Aksay et al. [3], where techniques for 2D ball tracking, feature based automatic video synchronization and 3D estimation are described. We utilize the above mentioned techniques and improve the overall quality of the system by developing our own algorithm for prediction in case of missing points in the trajectory. For this work, the dataset from the “3DLife ACM Multimedia Grand Challenge 2010 Dataset” [4] is utilized. This dataset includes 9 video streams of a competitive singles tennis match scenario from 9 IP cameras placed at different positions around an entire tennis court (see Figure 1). This dataset also includes chessboard images and 3D locations of some known objects in the scene for camera calibration.

Fig. 1. Camera locations around the court.

Fig. 2. Block diagram of the system.

III. ALGORITHMIC DESIGN

Figure 2 represents our system at a block level. We use videos acquired from both sideview cameras and the overhead camera in the dataset for 2D ball detection and tracking, as explained in Section III-A. Camera calibration data is acquired for each individual camera using the Matlab camera calibration toolbox [5]. Once the tracking information from each 2D camera view has been acquired, every camera's video stream is synchronized with respect to the overhead camera (see Section III-B). Once synced, the ball tracking is extended into 3D space using the camera calibration information, as explained in Section III-C. We then introduce a physics based trajectory model, which is required to provide continuity in the obtained 3D ball tracks. The algorithmic description of trajectory modeling is provided in Section III-D. Finally, we created a virtual tennis court using OpenGL and visualized the motion of the ball, which is explained in detail in Section III-E.

Fig. 3. Results of object tracking using frame differencing and thresholding: (a) original frame; (b) dilated moving pixels; (c) ball blob.

A. 2D Ball Detection and Tracking

Object tracking techniques include frame differencing, optical flow, mean-shift and various other methods. A simple frame differencing and thresholding method suffices in the given context, since the data set provided has a static background with the only moving objects being the tennis ball and the players. In order to extract the ball trajectories, we begin with the detection of ball candidates for every video frame $S(n)$. We use a method similar to the one described in [3]. All of the moving parts of the frame that satisfy certain color and size constraints are initially considered as ball candidates. We detect moving parts by utilizing the luminance difference between adjacent frames. For the $n$-th luminance frame, $S_y(n)$, we obtain the moving parts by thresholding the image $M(n)$, calculated as:

$$M(n) = \operatorname{abs}\left[S_y(n+1) - S_y(n)\right] \cdot \operatorname{abs}\left[S_y(n) - S_y(n-1)\right] \tag{1}$$

where the $\cdot$ in the above equation represents element by element multiplication. In this way, the real moving parts of $S(n)$ are heavily emphasised in $M(n)$. Using 3 adjacent frames to detect moving parts in the middle frame, as in Equation 1, is a necessary step so that ambiguities in the location of moving parts are avoided.
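
As a minimal sketch of Equation 1, the snippet below computes the motion image for the middle of three frames and thresholds it. Frames are represented as plain lists of rows; the function names and the threshold value `t=1000` are illustrative assumptions, not values from the paper.

```python
def motion_map(prev, curr, nxt):
    """Eq. (1): element-wise product of the two absolute adjacent-frame differences.

    Each argument is a luminance frame given as a list of rows of numbers.
    """
    return [
        [abs(n - c) * abs(c - p) for p, c, n in zip(row_p, row_c, row_n)]
        for row_p, row_c, row_n in zip(prev, curr, nxt)
    ]


def threshold(motion, t):
    """Binary mask: 1 where the motion value exceeds the (hypothetical) threshold t."""
    return [[1 if v > t else 0 for v in row] for row in motion]


# Toy 1x5 frames: a bright "ball" moves one pixel to the right per frame.
prev = [[0, 200, 0, 0, 0]]
curr = [[0, 0, 200, 0, 0]]
nxt = [[0, 0, 0, 200, 0]]

mask = threshold(motion_map(prev, curr, nxt), t=1000)
```

In this toy case only the pixel occupied by the ball in the middle frame survives, because it is the only location where both adjacent differences are non-zero. This illustrates why three frames remove the ambiguity a single two-frame difference would leave.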

To eliminate the false candidates from the obtained locations, distance, colour and size constraints are applied. We first eliminate false candidates based on the colour information. The blue, $C_b$, and red, $C_r$, channel values of the tennis ball are inspected over different frames and for different cameras. An empirical range of $C_b$ and $C_r$ values is set, and moving pixels outside this range are eliminated. Next, we apply morphological operations (dilation) to enlarge the moving pixels, which would otherwise make it very difficult to discriminate between different objects. Dilation also helps in identifying the blobs corresponding to the players and the tennis ball, as blobs corresponding to players will be much larger than the blobs corresponding to the ball. Hence, by empirically setting a threshold value on the blob size, larger blobs are removed. The final constraint considered when eliminating false ball candidates is based on the distance between tennis ball positions in two consecutive frames. A maximum distance is set and any moving pixels outside this distance are eliminated. This way, only one coordinate (corresponding to the center of mass of the tennis ball) is extracted for each frame. Figure 3(b) and (c) show the results of the above discussed techniques on an input frame.
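
The size and distance constraints can be sketched as a simple filter over candidate blobs. This is an illustration under assumptions: the blob representation `(cx, cy, area)`, the function name `select_ball`, the threshold values, and the rule of keeping the nearest surviving candidate are all hypothetical, since the paper only states that empirical thresholds are used.

```python
import math


def select_ball(candidates, prev_pos, max_area, max_dist):
    """Return the centroid of the most plausible ball blob, or None.

    candidates: list of (cx, cy, area) blobs that already passed the colour test.
    prev_pos:   (x, y) ball position in the previous frame.
    """
    best, best_d = None, None
    for cx, cy, area in candidates:
        if area > max_area:          # size constraint: player blobs are much larger
            continue
        d = math.hypot(cx - prev_pos[0], cy - prev_pos[1])
        if d > max_dist:             # distance constraint between consecutive frames
            continue
        if best_d is None or d < best_d:   # assumption: keep the nearest survivor
            best, best_d = (cx, cy), d
    return best


blobs = [(100.0, 50.0, 900), (42.0, 31.0, 12), (300.0, 200.0, 10)]
ball = select_ball(blobs, prev_pos=(40.0, 30.0), max_area=50, max_dist=25.0)
```

Here the first blob is rejected as too large (a player), the third as too far from the previous position, and the second is returned as the ball centroid.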

Fig. 4. Plot of $LM(\Delta)$ vs. frame delay.

B. 3D Estimation and Video Synchronization

Since the tennis videos in our data set are recorded at different frame rates, there is no guarantee that the videos were started at the same time or that there will be no dropped frames through each sequence. As such, these videos need to be synchronized before the 2D ball tracks from multiple cameras can be used for 3D estimation. We therefore implemented the feature based automatic video synchronization technique explained in [3]. This method requires estimated 3D coordinate features for each frame in order to determine the de-synchronization timing. Hence, it is an inter-dependency problem, where 3D coordinates are required to synchronize the videos, and synchronized videos are required to calculate accurate 3D coordinates. We first calculate the 3D trajectories point-by-point using triangulation [6] of two 2D trajectories from the two videos to be synchronized. Then, the 3D trajectories are back-projected onto one of the camera views. Assuming that the camera calibration is accurate, the back-projected 3D trajectories should be almost identical to the original 2D camera trajectories when the time shift used is close to the real de-synchronization of the videos. However, due to issues such as non-ideal calibration data and outlier 3D trajectories, the measure $LM(\Delta)$ suggested in [3] is used to find the best matching time shift $\Delta_{max}$.

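$$LM(\Delta) = \frac{L(\Delta)}{D(\Delta)}, \qquad L(\Delta) = \mathrm{count}\left(\|or - bp\| < T_L\right) \tag{2}$$

$$D(\Delta) = \frac{\sum_{\|or - bp\| < T_L} \|or - bp\|}{L(\Delta)} \tag{3}$$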

where $\|or - bp\|$ is the Euclidean distance between the original point and the back-projected point, and $\Delta$ is the tested time shift. $D(\Delta)$ is the normalized Euclidean distance between points, calculated using only those points whose back-projected points are within an empirically set distance $T_L$. The required time shift is

$$\Delta_{max} = \arg\max_{\Delta} LM(\Delta) \tag{4}$$

Figure 4 shows the plot of $LM(\Delta)$ for different frame delays in a test scenario. We choose the frame delay that corresponds to the maximum value of $LM(\Delta)$. In this work, all the videos were synchronized with reference to the overhead camera, since this camera has a field of view covering the whole of the tennis court.
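
The search over time shifts in Equations 2 to 4 might look like the sketch below. It assumes the original and back-projected 2D points have already been paired for each tested shift (the triangulation and back-projection steps are not shown). The names `lm_score` and `best_shift`, the handling of shifts with no inliers, and the guard against a zero mean distance are assumptions added for illustration.

```python
import math


def lm_score(pairs, t_l):
    """LM = L / D for one tested shift (Eqs. 2-3).

    pairs: list of ((x, y), (x, y)) original / back-projected 2D points.
    t_l:   inlier distance threshold T_L.
    """
    dists = [math.hypot(o[0] - b[0], o[1] - b[1]) for o, b in pairs]
    inliers = [d for d in dists if d < t_l]
    count = len(inliers)                 # L(delta)
    if count == 0:
        return 0.0                       # assumption: no inliers scores zero
    mean_dist = sum(inliers) / count     # D(delta)
    if mean_dist == 0.0:
        return math.inf                  # assumption: guard against division by zero
    return count / mean_dist


def best_shift(pairs_by_shift, t_l):
    """Eq. 4: the tested shift whose LM score is largest."""
    return max(pairs_by_shift, key=lambda d: lm_score(pairs_by_shift[d], t_l))


# Toy data: back-projection distances are (10, 20), (5, 5) and (1, 3) pixels.
pairs_by_shift = {
    -1: [((0, 0), (6, 8)), ((10, 0), (10, 20))],
    0: [((0, 0), (3, 4)), ((10, 0), (10, 5))],
    1: [((0, 0), (0, 1)), ((10, 0), (10, 3))],
}
shift = best_shift(pairs_by_shift, t_l=8.0)
```

The measure rewards shifts that yield many inliers with a small mean distance, so the shift of 1 frame wins in this toy example.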

C. Robust 3D Tracking

A disadvantage of considering only two cameras is that we do not get a continuous temporal 3D ball track stream, because synchronized 2D data is not available in two views through all the frames. To overcome this drawback, we employed a robust 3D tracking method using 3D coordinates obtained from different camera pairs at different points in time. We combine the tracking data from these multiple cameras to calculate a more stable, robust and accurate 3D ball trajectory. Let the 2D coordinate of the tennis ball at time instance $t$ in the $i$-th camera view be $p_{2D,i} = [x_i(t), y_i(t)]^T$. We calculate 3D points by triangulating the 2D point in each camera ($p_{2D,i}$) with the 2D point in the overhead camera, the 9th camera ($p_{2D,9}$):

$$p_{3D,i} = \mathrm{triangulate}(p_{2D,i}, p_{2D,9}) \tag{5}$$
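
The paper does not spell out the triangulation routine beyond citing [6]. As a stand-in, the sketch below uses the common midpoint method: each 2D point is assumed to have already been converted, via the camera calibration, into a viewing ray (camera centre plus direction), and the 3D point is taken as the midpoint of the shortest segment between the two rays. The function names and the ray inputs are illustrative assumptions, not the authors' implementation.

```python
def dot(a, b):
    """Dot product of two equal-length tuples."""
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment joining rays c1 + s*d1 and c2 + t*d2.

    c1, c2: camera centres (X, Y, Z); d1, d2: viewing-ray directions.
    Returns None when the rays are (nearly) parallel.
    """
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None
    s = (b * e - c * d) / denom   # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom   # parameter of the closest point on ray 2
    q1 = tuple(p + s * v for p, v in zip(c1, d1))
    q2 = tuple(p + t * v for p, v in zip(c2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(q1, q2))


# Two skew rays that pass 1 unit apart near (2, 0, 0.5).
point = triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                             (2.0, -3.0, 1.0), (0.0, 1.0, 0.0))
```

With noisy 2D tracks the two rays rarely intersect exactly, which is why a closest-point estimate of this kind is used rather than a true intersection.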

The 3D points calculated at each time instance correspond to one real-world 3D coordinate, and ideally all of them should be identical. However, due to several factors like camera calibration errors, 2D tracker errors or triangulation approximation, the 3D points will tend to differ slightly, so some formal technique for combining these multiple 3D points is needed. In this work, we use weighted averaging to find a robust and accurate 3D point $p_{3D}$:

$$p_{3D} = \frac{\sum_i w_i \, p_{3D,i}}{\sum_i w_i} \tag{6}$$

where $w_i$ is a measure of the level of accuracy of each 3D point $p_{3D,i}$. $w_i$ is calculated as the inverse of the Euclidean distance between the original 2D point ($p_{2D,9}$) and the back-projected 2D point ($bp_{2D,i}$) in the 9th camera view, as shown below.

$$w_i = \frac{1}{d_i} = \frac{1}{\|p_{2D,9} - bp_{2D,i}\|} \tag{7}$$
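
The weighted fusion of Equations 6 and 7 can be sketched as follows, assuming the per-pair 3D estimates and their back-projections into the overhead view are already available. The function name `fuse_points` and the small `eps` guard against a zero distance are illustrative assumptions.

```python
import math


def fuse_points(points_3d, backprojections, p_overhead, eps=1e-9):
    """Weighted mean of per-pair 3D estimates (Eqs. 6-7).

    points_3d:       list of (X, Y, Z), one per side camera paired with the overhead camera.
    backprojections: list of (x, y), each 3D estimate projected into the overhead view.
    p_overhead:      (x, y) tracked ball position in the overhead view.
    eps:             hypothetical guard so a zero distance does not divide by zero.
    """
    weights = [
        1.0 / max(math.hypot(bx - p_overhead[0], by - p_overhead[1]), eps)
        for bx, by in backprojections
    ]
    total = sum(weights)
    return tuple(
        sum(w * p[k] for w, p in zip(weights, points_3d)) / total
        for k in range(3)
    )


fused = fuse_points(
    points_3d=[(1.0, 2.0, 3.0), (3.0, 2.0, 1.0)],
    backprojections=[(101.0, 100.0), (100.0, 103.0)],
    p_overhead=(100.0, 100.0),
)
```

In this example the first estimate back-projects 1 pixel from the tracked point and the second 3 pixels away, so the first receives three times the weight and the fused point lies closer to it.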

D. Physics Based Trajectory Modeling

Temporal prediction of the ball coordinates through times when no 3D ball information is available is necessary to increase the continuity of the tracked features. This prediction is achieved by considering the trajectory of the ball to be that of a projectile. Projectiles are particles which are projected under gravity through air, such as objects thrown by hand or shells fired from a gun. Typically, projectiles are described mathematically with both horizontal and vertical velocity components, and are subject to a downward vertical acceleration (i.e. acceleration due to gravity). To simplify the problem, a few assumptions have been made. Parameters like air resistance and ball spin, which would require modification of the model, have been neglected. We consider the following kinematic equations of motion to predict the position of the ball in case of missing 3D points:

$$v = u + at \tag{8}$$

$$s = ut + \frac{1}{2} a t^2 \tag{9}$$

$$v^2 = u^2 + 2as \tag{10}$$

where $v$ is the velocity at any time $t$, $u$ is the initial velocity, $a$ is the acceleration, and $s$ is the distance traveled in time $t$. In our problem, as air resistance and ball spin are not considered, only the Z component of acceleration exists (gravity, acting downward), so the X and Y components of acceleration are set to zero. We consider the x, y and z components of velocity separately and apply the above equations to predict the position of the ball in case of missing points in the ball trajectory.

Fig. 5. Visualization of ball trajectory at two different viewpoints.

The coordinates are predicted using the following steps. Say frames $i$ to $i + k$ require prediction:

1) For the frame $i$ with no 3D coordinate estimated, the x, y, z components of velocity are computed using the tracked 3D points in frames $i - 1$ and $i - 2$.

2) Using this velocity information and the equations of motion, the 3D coordinate in frame $i$ is predicted.

3) Steps 1 and 2 are carried out for all consecutive frames (up to frame $i + k$).

4) Predicted points are retained only if the predicted point in frame $i + k$ is within some tolerable distance of the estimated 3D coordinate in frame $i + k + 1$.
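
The four steps above can be sketched as a small gap-filling routine. This is a minimal illustration under stated assumptions: coordinates are in metres with Z pointing up, the frame interval `dt` is constant, gravity is taken as 9.81 m/s², and the function name `predict_gap` and tolerance `tol` are hypothetical. The velocity is a plain finite difference of the two preceding points, as in step 1, which is itself an approximation of the instantaneous velocity.

```python
import math

G = 9.81  # m/s^2; assumes Z points up and coordinates are in metres


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def predict_gap(track, start, end, dt, tol):
    """Fill track[start..end] (inclusive) with projectile predictions.

    track: list of (x, y, z) tuples or None, one per frame; frames start-2,
           start-1 and end+1 must hold tracked points.
    dt:    time between frames in seconds.
    tol:   hypothetical acceptance distance for step 4.
    Returns a new list; the gap is filled only if the check in step 4 passes.
    """
    filled = list(track)
    for i in range(start, end + 1):          # step 3: repeat for every gap frame
        p1, p2 = filled[i - 1], filled[i - 2]
        # Step 1: finite-difference velocity from the two preceding frames.
        vx, vy, vz = ((a - b) / dt for a, b in zip(p1, p2))
        # Step 2: s = u*t + 0.5*a*t^2 with a = (0, 0, -G).
        filled[i] = (
            p1[0] + vx * dt,
            p1[1] + vy * dt,
            p1[2] + vz * dt - 0.5 * G * dt * dt,
        )
    # Step 4: compare the last prediction with the next tracked point.
    if distance(filled[end], track[end + 1]) <= tol:
        return filled
    return list(track)


track = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.2), None, None, (4.0, 0.0, 1.5)]
filled = predict_gap(track, start=2, end=3, dt=0.1, tol=1.5)
rejected = predict_gap(track, start=2, end=3, dt=0.1, tol=0.5)
```

Because step 4 compares the prediction for frame $i + k$ with the tracked point one frame later, the tolerance has to allow for roughly one frame of ball motion. With the tighter tolerance in the second call the predictions are discarded and the gap is left unfilled.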

E. Visualization Framework

If the developed algorithms are to be effectively used for performance analysis by coaches or as a decision making tool, a 3D graphical user interface (GUI) is essential, as the visualisation makes the system more intuitive and appealing. We have developed a GUI using OpenGL [7], one of the most widely used and supported 2D and 3D graphics application programming interfaces (APIs). It is hardware independent and highly portable, hence it can be used across many platforms. The framework we developed is a virtual tennis court, with an interface for selecting different camera views and zoom-in features (see Figure 5).

IV. EXPERIMENTAL RESULTS

To evaluate our approach, we quantify the accuracy of our system in terms of reprojection error, which is defined as the distance, calculated using the L1 norm, between the actual 2D pixel coordinates and the reprojected pixel coordinates.
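
A minimal sketch of an L1 reprojection error over a set of tracked points is given below. The paper does not state how per-point errors are aggregated, so averaging over points is an assumption, as are the function name and the sample coordinates.

```python
def l1_reprojection_error(observed, reprojected):
    """Mean L1 distance (|dx| + |dy|) over paired 2D points.

    Averaging over points is an assumption; the paper does not state how
    per-point errors are aggregated.
    """
    errors = [
        abs(ox - rx) + abs(oy - ry)
        for (ox, oy), (rx, ry) in zip(observed, reprojected)
    ]
    return sum(errors) / len(errors)


observed = [(100.0, 50.0), (120.0, 60.0)]
reprojected = [(103.0, 46.0), (119.0, 62.0)]
error = l1_reprojection_error(observed, reprojected)
```

The two per-point errors here are 7 and 3 pixels, giving a mean of 5 pixels.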

TABLE I. REPROJECTION ERRORS

| Camera combination | Reprojection error |
|---|---|
| Cameras 2 & 9 | 10.5891 |
| Cameras 4 & 9 | 7.2260 |
| Cameras 2, 4 & 9 | 12.8549 |

Table I shows the reprojection error obtained for different combinations of cameras used for 3D tracking. As the number of cameras considered for analysis increases, the number of tracked points also increases, but at the cost of a higher reprojection error. Unfortunately, for the tracked points obtained with the prediction model included, the reprojection error cannot be calculated since we do not have ground truth data to compare with.

Fig. 6. Continuity of tracked features for different conditions.

Fig. 7. Reprojected trajectory (green) of the ball: (a) without prediction; (b) with prediction.

A graphical representation of the results obtained for various techniques is shown in Figure 6. In this figure, time is on the horizontal axis; times at which the ball is tracked are highlighted with a horizontal line, and times when the ball track is lost are represented by a gap. From this figure, we can see that increasing the number of cameras used for tracking also increases the continuity of the tracked features. The bottom line represents continuity when trajectory modeling is included. We can observe that some of the gaps are filled after incorporating prediction in the system.

This advantage of incorporating trajectory modeling can also be seen in Figure 7, where trajectories with and without prediction are depicted. Notice how the tracked trajectory of the ball (in green) is extended in (b) compared to (a).

V. CONCLUSIONS AND FUTURE WORK

In this paper we presented algorithms associated with 2D and 3D object tracking and video synchronization. We also presented a basic physics based model to increase the continuity of the tracked features. We believe that, though the prediction model is very basic, it could be the first step towards the development of a more complex modelling system. In future work, a more accurate model of the ball trajectory could be developed to improve the continuity of the tracked features, by considering real-world effects like ball spin and air resistance.

ACKNOWLEDGMENT

This work is supported by Science Foundation Ireland under grant 07/CE/I1147.

REFERENCES

[1] P. Kelly, J. Diego, P.-M. Agapito, C. O. Conaire, D. Connaghan, J. Kuklyte, and N. E. O’Connor, “Performance analysis and visualisation in tennis using a low-cost camera network,” in Multimedia Grand Challenge Track at ACM Multimedia, 2010.

[2] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with OpenCV Library, M. Loukides, Ed. O’Reilly Publications, 2008.

[3] A. Aksay, V. Kitanovski, K. Vaiapury, E. Onasoglou, J. D. P. M. Agapito, P. Daras, and E. Izquierdo, “Robust 3d tracking in tennis videos,” in Engage Summer School, Sept. 2010.

[4] C. O. Conaire, P. Kelly, D. Connaghan, and N. E. O’Connor, “Tennissense: A platform for extracting semantic information from multi-camera tennis data,” in DSP 2009 - 16th International Conference on Digital Signal Processing, 2009, pp. 1062–1067.

[5] J. Bouguet, “Camera calibration toolbox for matlab.” [Online]. Available: http://www.vision.caltech.edu/bouguetj/

[6] Y. Morvan, “Acquisition, compression and rendering of depth and texture for multiview video,” Ph.D. dissertation, Eindhoven University of Technology, Eindhoven, The Netherlands, 2009.

[7] Khronos Group, “Opengl overview.” [Online]. Available: http://www.opengl.org/about/overview/

##### Citations

TL;DR: An exhaustive survey of all the published research works on ball tracking in a categorical manner is presented to present discussions on the published work so far and views and opinions followed by a modified block diagram of the tracking process.

Abstract: Increase in the number of sport lovers in games like football, cricket, etc. has created a need for digging, analyzing and presenting more and more multidimensional information to them. Different classes of people require different kinds of information and this expands the space and scale of the required information. Tracking of ball movement is of utmost importance for extracting any information from the ball based sports video sequences. Based on the literature survey, we have initially proposed a block diagram depicting different steps and flow of a general tracking process. The paper further follows the same flow throughout. Detection is the first step of tracking. Dynamic and unpredictable nature of ball appearance, movement and continuously changing background make the detection and tracking processes challenging. Due to these challenges, many researchers have been attracted to this problem and have produced good results under specific conditions. However, generalization of the published work and algorithms to different sports is a distant dream. This paper is an effort to present an exhaustive survey of all the published research works on ball tracking in a categorical manner. The work also reviews the used techniques, their performance, advantages, limitations and their suitability for a particular sport. Finally, we present discussions on the published work so far and our views and opinions followed by a modified block diagram of the tracking process. The paper concludes with the final observations and suggestions on scope of future work.

27 citations

TL;DR: In this article, a semi-supervised generative adversarial network (GAN) was proposed to predict shot location and type in tennis players based on their episodic and semantic memory components.

Abstract: This paper presents a novel framework for predicting shot location and type in tennis. Inspired by recent neuroscience discoveries, we incorporate neural memory modules to model the episodic and semantic memory components of a tennis player. We propose a Semi-Supervised Generative Adversarial Network architecture that couples these memory models with the automatic feature learning power of deep neural networks, and demonstrate methodologies for learning player level behavioral patterns with the proposed framework. We evaluate the effectiveness of the proposed model on tennis tracking data from the 2012 Australian Tennis Open and exhibit applications of the proposed method in discovering how players adapt their style depending on the match context.

16 citations

TL;DR: It is shown theoretically and empirically that a simple motion trajectory analysis suffices to translate from pixel measurements to the person's metric height, reaching a MAE of up to 3.9 cm on jumping motions, and that this works without camera and ground plane calibration.

Abstract: Estimating the metric height of a person from monocular imagery without additional assumptions is ill-posed. Existing solutions either require manual calibration of ground plane and camera geometry, special cameras, or reference objects of known size. We focus on motion cues and exploit gravity on earth as an omnipresent reference 'object' to translate acceleration, and subsequently height, measured in image-pixels to values in meters. We require videos of motion as input, where gravity is the only external force. This limitation is different to those of existing solutions that recover a person's height and, therefore, our method opens up new application fields. We show theoretically and empirically that a simple motion trajectory analysis suffices to translate from pixel measurements to the person's metric height, reaching a MAE of up to 3.9 cm on jumping motions, and that this works without camera and ground plane calibration.

7 citations

01 Jan 2015

TL;DR: In this article, the authors investigated how past observations from a stereo system can be used to recreate trajectories when video from only one of the cameras is available, and the best method was found to be a nearest neighbors-search optimized by a Kalman filter.

Abstract: Tracking a moving object and reconstructing its trajectory can be done with a stereo camera system, since the two cameras enable depth vision. However, such a system would not work if one of the cameras fails to detect the object. If that happens, it would be beneficial if the system could still use the functioning camera to make an approximate trajectory reconstruction. In this study, I have investigated how past observations from a stereo system can be used to recreate trajectories when video from only one of the cameras is available. Several approaches have been implemented and tested, with varying results. The best method was found to be a nearest neighbors-search optimized by a Kalman filter. On a test set with 10000 golf shots, the algorithm was able to create estimations which on average differed around 3.5 meters from the correct trajectory, with better results for trajectories originating close to the camera.

5 citations

TL;DR: The experimental results show that the proposed method can estimate a 3D baseball trajectory precisely using a multiple unsynchronized camera system and is robust to variations in capture delay, both in the simulation space and in real-world situations.

Abstract: We developed a method for the precise estimation of the 3D trajectory of a baseball by modeling the movement of the baseball and estimating the capture delay, using multiple unsynchronized cameras. To develop the proposed algorithm, we mimicked the real-world process of capturing a baseball in simulation space, and analyzed the capture process using a multiple unsynchronized camera system. We represented the movement of the baseball using a piece-wise spline model, and predicted the position of the baseball in the subframes in a manner which is robust to position error and change in direction of movement of the baseball. This method accurately predicts the baseball position over time by modeling the movement of the baseball in a real baseball game environment, and improves the accuracy of the reconstructed 3D baseball trajectories. We defined an objective function to estimate the capture delay, and estimate the optimal capture delay parameter using non-linear optimization method. In addition, we evaluated the performance of the proposed method in simulation space and in a real-world situation. The experimental results show that the proposed method can estimate a 3D baseball trajectory precisely using a multiple unsynchronized camera system and is robust to variations in capture delay, both in the simulation space and in real-world situations.

3 citations

### Cites background from "3D Estimation and Visualization of ..."

[...]

##### References


[...]

01 Jan 2009

TL;DR: A new multi-view depth-estimation technique is proposed, employing a one-dimensional optimization strategy that reduces the noise level in the estimated depth images and enforces consistent depth images across the views, and is suitable for execution on a standard Graphics Processor Unit (GPU).

Abstract: Three-dimensional (3D) video and imaging technologies are an emerging trend in the development of digital video systems, as we presently witness the appearance of 3D displays, coding systems, and 3D camera setups. Three-dimensional multi-view video is typically obtained from a set of synchronized cameras, which are capturing the same scene from different viewpoints. This technique especially enables applications such as free-viewpoint video or 3D-TV. Free-viewpoint video applications provide the feature to interactively select and render a virtual viewpoint of the scene. A 3D experience, as in 3D-TV for example, is obtained if the data representation and display make it possible to distinguish the relief of the scene, i.e., the depth within the scene. With 3D-TV, the depth of the scene can be perceived using a multi-view display that simultaneously renders several views of the same scene. To render these multiple views on a remote display, efficient transmission, and thus compression, of the multi-view video is necessary. However, a major problem when dealing with multi-view video is the intrinsically large amount of data to be compressed, decompressed and rendered. We aim at an efficient and flexible multi-view video system, and explore three different aspects. First, we develop an algorithm for acquiring a depth signal from a multi-view setup. Second, we present efficient 3D rendering algorithms for a multi-view signal. Third, we propose coding techniques for 3D multi-view signals, based on the use of an explicit depth signal. Accordingly, the thesis is divided into three parts. The first part (Chapter 3) addresses the problem of 3D multi-view video acquisition. Multi-view video acquisition refers to the task of estimating and recording a 3D geometric description of the scene. A 3D description of the scene can be represented by a so-called depth image, which can be estimated by triangulation of the corresponding pixels in the multiple views. 
Initially, we focus on the problem of depth estimation using two views, and present the basic geometric model that enables the triangulation of corresponding pixels across the views. Next, we review two calculation/optimization strategies for determining corresponding pixels: a local and a one-dimensional optimization strategy. Second, to generalize from the two-view case, we introduce a simple geometric model for estimating the depth using multiple views simultaneously. Based on this geometric model, we propose a new multi-view depth-estimation technique, employing a one-dimensional optimization strategy that (1) reduces the noise level in the estimated depth images and (2) enforces consistent depth images across the views. The second part (Chapter 4) details the problem of multi-view image rendering. Multi-view image rendering refers to the process of generating synthetic images using multiple views. Two different rendering techniques are initially explored: a 3D image warping and a mesh-based rendering technique. Each of these methods has its limitations and suffers from either high computational complexity or low image rendering quality. As a consequence, we present two image-based rendering algorithms that improve the balance between these issues. First, we derive an alternative formulation of the relief texture algorithm which was extended to the geometry of multiple views. The proposed technique features two advantages: it avoids rendering artifacts ("holes") in the synthetic image and it is suitable for execution on a standard Graphics Processor Unit (GPU). Second, we propose an inverse mapping rendering technique that allows a simple and accurate re-sampling of synthetic pixels. Experimental comparisons with 3D image warping show an improvement of rendering quality of 3.8 dB for the relief texture mapping and 3.0 dB for the inverse mapping rendering technique. 
The third part concentrates on the compression problem of multi-view texture and depth video (Chapters 5–7). In Chapter 5, we extend the standard H.264/MPEG-4 AVC video compression algorithm for handling the compression of multi-view video. As opposed to the Multi-view Video Coding (MVC) standard that encodes only the multi-view texture data, the proposed encoder performs the compression of both the texture and the depth multi-view sequences. The proposed extension is based on exploiting the correlation between the multiple camera views. To this end, two different approaches for predictive coding of views have been investigated: a block-based disparity-compensated prediction technique and a View Synthesis Prediction (VSP) scheme. Whereas VSP relies on an accurate depth image, the block-based disparity-compensated prediction scheme can be performed without any geometry information. Our encoder adaptively selects the most appropriate prediction scheme using a rate-distortion criterion for an optimal prediction-mode selection. We present experimental results for several texture and depth multi-view sequences, yielding a quality improvement of up to 0.6 dB for the texture and 3.2 dB for the depth, when compared to solely performing H.264/MPEG-4 AVC disparity-compensated prediction. Additionally, we discuss the trade-off between the random-access to a user-selected view and the coding efficiency. Experimental results illustrating and quantifying this trade-off are provided. In Chapter 6, we focus on the compression of a depth signal. We present a novel depth image coding algorithm which concentrates on the special characteristics of depth images: smooth regions delineated by sharp edges. The algorithm models these smooth regions using parameterized piecewise-linear functions and sharp edges by a straight line, so that it is more efficient than a conventional transform-based encoder. 
To optimize the quality of the coding system for a given bit rate, a special global rate-distortion optimization balances the rate against the accuracy of the signal representation. For typical bit rates, i.e., between 0.01 and 0.25 bit/pixel, experiments have revealed that the coder outperforms a standard JPEG-2000 encoder by 0.6-3.0 dB. Preliminary results were published in the Proceedings of 26th Symposium on Information Theory in the Benelux. In Chapter 7, we propose a novel joint depth-texture bit-allocation algorithm for the joint compression of texture and depth images. The described algorithm combines the depth and texture Rate-Distortion (R-D) curves, to obtain a single R-D surface that allows the optimization of the joint bit-allocation in relation to the obtained rendering quality. Experimental results show an estimated gain of 1 dB compared to a compression performed without joint bit-allocation optimization. Besides this, our joint R-D model can be readily integrated into a multi-view H.264/MPEG-4 AVC coder because it yields the optimal compression setting with limited computational effort.
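The two-view triangulation mentioned in this abstract reduces, for the special case of a rectified stereo pair, to the standard relation Z = f·B / d between depth, focal length, baseline and disparity. The sketch below illustrates only that textbook case; the rectification assumption and the function name `depth_from_disparity` are illustrative and not taken from the thesis.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Depth (meters) of a point seen by a rectified stereo pair.

    x_left, x_right: horizontal pixel coordinates of the same scene point
                     in the left and right images (assumed rectified).
    focal_px:        focal length expressed in pixels.
    baseline_m:      distance between the two camera centers in meters.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Usage: a 20-pixel disparity with f = 1000 px and a 10 cm baseline.
depth = depth_from_disparity(420.0, 400.0, focal_px=1000.0, baseline_m=0.1)
```

Because depth is inversely proportional to disparity, a fixed one-pixel matching error costs far more depth accuracy for distant points than for near ones, which is one reason noise reduction in the estimated depth images matters.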

81 citations

[...]

TL;DR: TennisSense, a technology platform for the digital capture, analysis and retrieval of tennis training and matches, is introduced and the algorithms for extracting useful metadata from the overhead court camera are described and evaluated.

Abstract: In this paper, we introduce TennisSense, a technology platform for the digital capture, analysis and retrieval of tennis training and matches. Our algorithms for extracting useful metadata from the overhead court camera are described and evaluated. We track the tennis ball using motion images for ball candidate detection and then link ball candidates into locally linear tracks. From these tracks we can infer when serves and rallies take place. Using background subtraction and hysteresis-type blob tracking, we track the tennis players' positions. The performance of both modules is evaluated using ground-truthed data. The extracted metadata provides valuable information for indexing and efficient browsing of hours of multi-camera tennis footage, and we briefly illustrate how this data is used by our tennis-coach playback interface.
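Linking per-frame ball candidates into locally linear tracks can be sketched as a greedy constant-velocity association. This is not the TennisSense algorithm, whose details are not given in the abstract; the function name `link_candidates` and the gate radii `gate` and `start_gate` are assumptions chosen for illustration.

```python
import math


def link_candidates(frames, gate=3.0, start_gate=15.0):
    """Greedily link 2D candidates into tracks that are locally linear.

    frames: list (one entry per video frame) of lists of (x, y) candidates.
    gate: max distance (pixels) between a candidate and the position
          extrapolated linearly from the last two points of a track.
    start_gate: looser radius used while a track has a single point.
    Returns a list of tracks, each a list of (frame_index, x, y).
    """
    finished, active = [], []
    for t, candidates in enumerate(frames):
        unused = list(candidates)
        still_active = []
        for track in active:
            if len(track) >= 2:
                (_, x1, y1), (_, x2, y2) = track[-2], track[-1]
                predicted, radius = (2 * x2 - x1, 2 * y2 - y1), gate
            else:
                predicted, radius = track[-1][1:], start_gate
            best = min(unused, key=lambda c: math.dist(c, predicted), default=None)
            if best is not None and math.dist(best, predicted) <= radius:
                unused.remove(best)
                track.append((t, best[0], best[1]))
                still_active.append(track)
            else:
                finished.append(track)  # no plausible continuation
        # every unmatched candidate seeds a new one-point track
        still_active.extend([[(t, x, y)] for x, y in unused])
        active = still_active
    return finished + active


# Usage: one ball moving by (+10, +5) per frame among spurious detections.
frames = [[(0, 0), (50, 50)], [(10, 5), (80, 20)], [(20, 10), (3, 90)], [(30, 15)]]
ball_track = max(link_candidates(frames), key=len)
```

Spurious motion detections rarely line up across consecutive frames, so taking the longest track (or discarding very short ones) is a simple way to separate the ball from noise in this sketch.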

38 citations

### "3D Estimation and Visualization of ..." refers background in this paper

[...]

[...]

01 Oct 2010

TL;DR: A novel system for tennis performance analysis that allows coaches to review games and provide detailed audio-visual feedback to tennis athletes and can be generalised to other sports and allow a range of non-professional sports clubs to provide high-quality feedback to their athletes.

Abstract: We describe a novel system for tennis performance analysis that allows coaches to review games and provide detailed audio-visual feedback to tennis athletes. The basis for our system is a network of low-cost IP cameras surrounding the tennis court. Our system exploits the output of several visual analysis modules, including the tracking of players and the tennis ball, and the extraction of player silhouettes for 3D reconstruction. A range of intuitive tools within the interface allow tennis coaches to add 2D and 3D annotations to live video, view play from multiple perspectives, record audio commentary and compute game statistics in real-time. The result is a video file that can be used to provide personalised feedback to the players or for use as a teaching resource for others. While we focus on tennis in this work, we believe our system can be generalised to other sports and allow a range of non-professional sports clubs to provide high-quality feedback to their athletes.

9 citations

### "3D Estimation and Visualization of ..." refers background or methods in this paper

[...]

##### Frequently Asked Questions (2)

###### Q2. What future work is proposed in "3D Estimation and Visualization of Motion in a Multicamera Network for Sports"?

In future work, a more accurate model of the ball trajectory could be developed to ensure the continuity of the tracked features, by accounting for real-world effects such as ball spin and air resistance.
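A trajectory model that accounts for air resistance and spin can be sketched as a numerical integration of gravity, quadratic drag and a Magnus force. The block below is a minimal illustration under stated assumptions, not the model used in the paper: the mass, radius, air density, drag coefficient `cd` and constant lift coefficient `cl` are assumed typical values, and the names `acceleration` and `simulate` are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(sum(c * c for c in a))


def acceleration(v, spin, mass, radius, rho, cd, cl):
    """Gravity + quadratic drag + Magnus force along spin x velocity."""
    area = math.pi * radius ** 2
    speed = norm(v)
    k = 0.5 * rho * area * speed / mass
    drag = [-k * cd * c for c in v]  # opposes the velocity
    axis = cross(spin, v)
    axis_len = norm(axis)
    if axis_len > 0.0:
        magnus = [k * cl * speed * c / axis_len for c in axis]
    else:
        magnus = [0.0, 0.0, 0.0]
    return (drag[0] + magnus[0], drag[1] + magnus[1], drag[2] + magnus[2] - G)


def simulate(p0, v0, spin=(0.0, 0.0, 0.0), cd=0.55, cl=0.0,
             mass=0.057, radius=0.033, rho=1.2, dt=0.001, t_max=10.0):
    """Integrate the flight (semi-implicit Euler) until the ball reaches z <= 0.

    Only the direction of `spin` is used; its strength is folded into the
    constant lift coefficient `cl` (a simplifying assumption).
    Returns a list of (time, (x, y, z)) samples.
    """
    p, v, t = list(p0), list(v0), 0.0
    path = [(t, tuple(p))]
    while p[2] > 0.0 and t < t_max:
        a = acceleration(v, spin, mass, radius, rho, cd, cl)
        v = [v[i] + a[i] * dt for i in range(3)]
        p = [p[i] + v[i] * dt for i in range(3)]
        t += dt
        path.append((t, tuple(p)))
    return path


# Usage: landing distance with drag only, then with topspin about the +y axis.
flat = simulate((0.0, 0.0, 1.0), (20.0, 0.0, 5.0))[-1][1][0]
topspin = simulate((0.0, 0.0, 1.0), (20.0, 0.0, 5.0), spin=(0.0, 1.0, 0.0), cl=0.2)[-1][1][0]
```

In a tracking system, such a model could be fitted to the triangulated 3D points on either side of a gap and then used to fill in the positions where no two cameras see the ball.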