IEEE Transactions on Systems, Man, and Cybernetics—Part B: Cybernetics, Vol. 34, No. 1, February 2004, pp. 430–439
Arbitrary Viewpoint Video Synthesis
From Multiple Uncalibrated Cameras
Satoshi Yaguchi and Hideo Saito
Abstract—We propose a method for arbitrary view synthesis from an uncalibrated multiple-camera system, targeting large spaces such as soccer stadiums. In Projective Grid Space (PGS), a three-dimensional space defined by the epipolar geometry between two basis cameras in the camera system, we reconstruct three-dimensional shape models from silhouette images. Using the three-dimensional shape models reconstructed in the PGS, we obtain a dense map of point correspondences between reference images. The obtained correspondences are used to synthesize images of arbitrary views between the reference images. We also propose a method for merging the synthesized images with a virtual background scene in the PGS. We apply the proposed methods to image sequences taken by a multiple-camera system installed in a large concert hall. The synthesized virtual-camera image sequences have sufficient quality to demonstrate the effectiveness of the proposed method.
Index Terms—Fundamental matrix, projective geometry, projective grid space, shape from multiple cameras, view interpolation, virtual view synthesis.
I. INTRODUCTION
THE synthesis of new views from images can enhance the
visual-entertainment effect of a movie or broadcast for a
television viewer. One way of enhancing the visual effect is
through virtual movement of viewpoint, which makes viewers
virtually feel that they are in the target scene. Recent applica-
tions of this effect can be found in the futuristic movie “The
Matrix,” and the SuperBowl XXXV broadcast by CBS in 2001
which used the EyeVision system. Virtualized Reality [8], a pi-
oneering project in this field, has achieved virtual viewpoint
movement for dynamic scenes by using computer vision tech-
nology. Whereas “The Matrix” and EyeVision use the switching
effect of real images taken by multiple cameras, computer-vi-
sion-based technology can synthesize arbitrary viewpoint im-
ages to create a virtual viewpoint movement effect.
We aim to apply virtualized reality technology to actual
sporting events. New-view images are generated by rendering
pixel values of input images in accordance with the geometry of
the new view and a three-dimensional (3-D) structure model of
the scene, which is reconstructed from multiple-view images.
The 3-D shape reconstruction from multiple views requires
camera calibration, which is carried out in order to relate the
camera geometry to the object space geometry. For camera cal-
ibration, the 3-D positions of several points in Euclidean space
Manuscript received February 19, 2002; revised July 2, 2002. This paper was
recommended by Associate Editor I. Gu.
The authors are with the Department of Information and Com-
puter Science, Keio University, Yokohama 223-8522, Japan (e-mail:
yagu@ozawa.ics.keio.ac.jp; saito@ozawa.ics.keio.ac.jp).
Digital Object Identifier 10.1109/TSMCB.2003.817108
and 2-D positions of those points on each view image must
be measured precisely. For this reason, when there are many
cameras involved in the production of an event, a lot of effort
must be expended to perform the calibration. This is especially
true in the case of a large space, such as a stadium, where it is
very difficult to set many calibration points whose positions
have to be precisely measured for the entire area. We have
already proposed a new framework for shape reconstruction
from multiple uncalibrated cameras in a projective grid space
(PGS) [15], in which coordinates between cameras are defined
by using epipolar geometry instead of calibration.
In this paper, we present a method for generating arbitrary
views from image sequences taken from multiple uncalibrated
cameras. The shape-from-silhouette (SS) [2], [14] method is ap-
plied to reconstruct the shape model in the PGS. Then, the dense
corresponding relation between the images derived from the
shape model is used to synthesize intermediate appearance view
images. We demonstrate the proposed framework by showing
several virtual image sequences generated from corrected mul-
tiple-camera image sequences captured in a large space.
II. RELATED WORKS
View synthesis from stereo images has long been a topic of
study [17]. Once the disparity between a pair of stereo images
is obtained, it can be modified to obtain intermediate images.
However, a hole, where no pixel value can be assigned from
the original stereo pair, generally appears in synthesized view
images because of occluded regions.
One method devised for removing such a hole caused by an
occlusion is to use a completely closed 3-D shape model of the
object, which can be obtained by using shape scanning tech-
nology [4], [24] or recovered from multiple-view images [6],
[19], [23]. Such a framework for generating new views from
the recovered 3-D model of an object and its texture map on the
3-D model surface is generally called model-based rendering
(MBR). MBR can handle the occlusion problem, but registra-
tion errors in the texture map on the constructed 3-D model may
cause blurring of the synthesized virtual images.
Alternatively, image-based rendering (IBR) [1], [3], [5], [7],
[10], [11], [18] has recently been developed for generating
new-view images from multiple-view images without using
a 3-D shape model of the object. Because IBR is essentially
based on 2-D image processing (cut, warp, paste, etc.), the
errors in 3-D shape reconstruction do not affect the quality of
the generated images as much as they do for the model-based
rendering method. This implies that the quality of the input

images can be well preserved in the new-view images; however, the occlusion effects have to be ignored.
Appearance-based virtual-view synthesis [16] combines the advantages of MBR and IBR. A 3-D shape model recovered from multiple images provides the information required for the IBR process, such as the correspondence map and the occluded areas of the input images. The precise and dense
correspondences make it possible to generate virtual views at
arbitrary viewpoints without losing pixels even in partially oc-
cluded regions. Image-based Visual Hull (IBVH) [12] is another
virtual view synthesis method that has the advantages of MBR
and IBR. In IBVH, the hull shape of the object is represented
by the intersection of silhouettes on the epipolar lines of one
base camera. Such image-based representation contributes to
high-speed rendering with conventional image rendering hard-
ware. With IBVH, however, it is difficult to manipulate reconstructed objects and virtual objects in the virtual space, because no explicit 3-D shape model is represented. The concept of a visual hull
was originally proposed by Laurentini [9]. Although the visual
hull reconstructed from silhouette images cannot represent an
actual 3-D shape, the visual hull can be used as an approxima-
tion of the actual 3-D shape in some cases, such as IBVH [12]
and the method presented in this paper.
The method presented in this paper extends the appear-
ance-based virtual-view synthesis to the projective reconstruc-
tion framework in the PGS. By applying the PGS to this virtual-view synthesis technique, the strong camera calibration required in conventional work can be avoided.
III. PROJECTIVE GRID SPACE
Reconstructing a 3-D shape model from multiple-view im-
ages requires a relationship between the 3-D coordinate of the
object scene and the 2-D coordinate of the camera-image plane.
Projection matrices that represent this relationship are estimated
from measurements of 3-D/2-D correspondences obtained at a
set of sample points. Since the 3-D coordinates are defined in-
dependently from the camera geometries, the 3-D positions of
the sample points must be measured independently from each
camera geometry. This procedure is called camera calibration
[22]. Calibrating every camera in a multiple-camera system requires a lot of work [8], [23].
In our method, a 3-D point is related to a 2-D image point
without estimating the projection matrices in a PGS [15], which
is determined by using only the fundamental matrices [25] rep-
resenting the epipolar geometry between two basis cameras. Be-
cause the 3-D coordinates in a PGS are defined directly in terms of the camera-image coordinates, the 3-D positions of the sample points do not have to be measured. Therefore, the PGS en-
ables 3-D reconstruction from multiple images without the need
to estimate the projection matrices of each camera.
Fig. 1 shows the PGS scheme. The PGS is defined by the image coordinates of the two basis cameras. Each pixel point (p, q) in the first basis camera image defines one grid line in the space. On that grid line, grid-node points are indexed by the horizontal position r in the second basis image. Thus, the P and Q coordinates of the PGS are given by the horizontal and vertical coordinates of the first basis image, and the R coordinate of the PGS is given by the horizontal coordinate of the second basis image. Since the fundamental matrix F_{12} restricts the position in the second basis view to the epipolar line of (p, q), the horizontal coordinate r alone is sufficient for defining the grid point. In this way, the projective grid space is defined by the two basis view images, and its node points are represented as A(p, q, r).
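As a concrete illustration of this mapping, the sketch below (our own, not from the paper; the convention that F_{12} maps points of basis image 1 to epipolar lines in basis image 2 is an assumption) recovers the two basis-image projections a1 = (p, q) and a2 = (r, s) of a grid node A(p, q, r).

```python
import numpy as np

def pgs_point_in_basis_views(p, q, r, F12):
    """Project a projective-grid point A(p, q, r) onto the two basis images.

    Illustrative sketch (not the authors' code).  The convention assumed here
    is that F12 maps a point of basis image 1 to its epipolar line in basis
    image 2, i.e. x2^T F12 x1 = 0.
    """
    a1 = np.array([p, q, 1.0])
    a, b, c = F12 @ a1                 # epipolar line a*u + b*v + c = 0 in image 2
    if abs(b) < 1e-9:
        # Nearly vertical epipolar line: the vertical coordinate s is ill-defined.
        raise ValueError("cannot recover s from a (nearly) vertical epipolar line")
    s = -(a * r + c) / b               # point on the line with horizontal coordinate r
    return (p, q), (r, s)              # a1 in basis image 1, a2 in basis image 2
```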
We should note here the potential problem in this PGS frame-
work. If the epipolar lines are nearly parallel, the epipolar-line transfer scheme fails to determine accurate intersection
points. In such a case, we cannot recover a correct 3-D shape
model and synthesize intermediate view images based on this
PGS framework. This situation can be avoided by distributing
the camera system so that the epipolar lines are not parallel
between cameras.
IV. MODEL RECONSTRUCTION
Under the PGS framework, we reconstruct a 3-D shape model
of the dynamic object by using the SS method. (We assume
that the silhouette has been previously extracted by background
subtraction.)
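The paper does not state how the silhouettes are obtained; the following minimal sketch, a simple per-pixel difference against a background image, is one hypothetical way such masks could be produced.

```python
import numpy as np

def silhouette_by_background_subtraction(frame, background, threshold=30):
    """Minimal background-subtraction sketch (illustrative; the paper does not
    specify how the silhouettes are extracted).

    frame, background: HxWx3 uint8 images from the same camera.
    Returns a boolean HxW silhouette mask.
    """
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff.max(axis=2) > threshold
```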
In the conventional SS method, each voxel in a certain Eu-
clidean space is projected onto every silhouette image with pro-
jection matrices (which are calculated by accurately calibrating
every camera [2], [14]) to check whether it is included in the ob-
ject region. In applying the SS method in the PGS, every point
in the PGS must be projected onto each silhouette image. As de-
scribed in the previous section, the PGS is defined by two basis
views, and a point in the PGS is represented as A(p, q, r). Point A is projected onto a1(p, q) and a2(r, s) in the first basis image and the second basis image, respectively. Point a1 is projected as the epipolar line l on the second basis view. Point a2(r, s) on the projected line l (Fig. 1) is expressed as

[r  s  1] F_{12} [p  q  1]^T = 0    (1)

where F_{12} represents the fundamental matrix between the first and second basis images.
The projected point in an arbitrary i-th real image is determined from the two fundamental matrices F_{1i} and F_{2i} between the two basis images and the i-th image. Since A(p, q, r) is projected onto a1(p, q) in the first basis image, the projected point in the i-th image must be on the epipolar line l_{1i} of a1(p, q), which is derived by F_{1i} as

l_{1i} = F_{1i} [p  q  1]^T.    (2)

In the same way, the projected point in the i-th image must be on the epipolar line l_{2i} of a2(r, s) in the second basis image, which is derived by F_{2i} as

l_{2i} = F_{2i} [r  s  1]^T.    (3)

Fig. 1. Definition of projective grid. Point A(p, q, r) on the projective grid space is projected to a1(p, q) and a2(r, s) on the first and second basis images.
Fig. 2. Projection of point in space onto an image. Point A(p, q, r) on the projective grid space is projected to the intersection of two epipolar lines in the image of other view i.
The point where the epipolar lines l_{1i} and l_{2i} intersect is the projected point of A(p, q, r) onto the i-th image (Fig. 2). In this way,
every projective grid point is projected onto every image, where
the relationship can be represented by only the fundamental ma-
trices between the image and two basis images.
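The sketch below (our own illustration, assuming the convention x_i^T F x_basis = 0 for the fundamental matrices) computes this projection by intersecting the two epipolar lines of (2) and (3); it also flags the near-parallel case discussed at the end of Section III. Using the cross product of the homogeneous line coefficients avoids an explicit linear solve and makes the degenerate case easy to detect.

```python
import numpy as np

def project_to_camera_i(a1, a2, F1i, F2i, eps=1e-9):
    """Project a PGS point onto an arbitrary camera i (illustrative sketch).

    a1 = (p, q): projection of the point in the first basis image.
    a2 = (r, s): projection of the point in the second basis image.
    F1i, F2i:   fundamental matrices assumed to map points of basis image 1 / 2
                to epipolar lines in image i (x_i^T F x_basis = 0).
    """
    l1 = F1i @ np.array([a1[0], a1[1], 1.0])   # epipolar line of a1 in image i, eq. (2)
    l2 = F2i @ np.array([a2[0], a2[1], 1.0])   # epipolar line of a2 in image i, eq. (3)
    x = np.cross(l1, l2)                       # homogeneous intersection of the two lines
    if abs(x[2]) < eps:
        # Nearly parallel epipolar lines: the intersection is ill-conditioned,
        # the failure case noted at the end of Section III.
        raise ValueError("epipolar lines are nearly parallel")
    return x[:2] / x[2]
```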
The process for reconstructing the 3-D shape model is outlined as follows.
First, two cameras are selected as basis cameras, and then the
coordinate of the PGS is determined. Every voxel in a certain re-
gion is projected onto each silhouette image with the proposed
scheme, as shown in Figs. 1 and 2. The voxel that is projected
onto the object silhouette for all images is considered an ex-
istent voxel, while others are considered nonexistent. Thus the
volume of the object can be determined in the voxel represented
in the PGS. In this process, the order in which the existence of
a voxel is checked is important for reducing the computational
complexity, because the cost of computing the projection of a
voxel onto an image is not the same for all the images in the
proposed scheme. Since the vertical and horizontal coordinate
of the first basis-view image are equivalent to P and Q coordi-
nates in the PGS, projecting each voxel onto the first basis view
image requires no calculation involving a fundamental matrix.
In the second basis view image, the projected point is decided
by calculating only one multiplication of a fundamental matrix
to determine the epipolar line. This implies that the cost of projection onto the second basis view is about half of that for the other images. Therefore, the order in
which the existence of a voxel is checked should be Basis view
1, Basis view 2, and so on.
After the voxel existence determination, the implicit surface
of the voxel representation of the object is extracted by using
the Marching Cubes algorithm. Finally, the object model is re-
constructed as a surface representation in the PGS.
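A compact sketch of the whole reconstruction loop is given below. It is not the authors' implementation; the data layout, the assumption that grid nodes coincide with pixel positions, and the use of scikit-image's Marching Cubes are ours, but it follows the carving order recommended above (basis view 1, then basis view 2, then the remaining views). The early exits mean most voxels are rejected by the two cheap basis-view tests before any further fundamental-matrix products are computed.

```python
import numpy as np
from skimage import measure   # Marching Cubes; any iso-surface extractor would do

def carve_in_pgs(silhouettes, F12, F1, F2, grid_shape):
    """Shape-from-silhouette in the PGS (illustrative sketch, not the authors' code).

    silhouettes: list of HxW boolean masks; index 0 and 1 are the basis cameras.
    F12:         fundamental matrix from basis image 1 to basis image 2.
    F1[i], F2[i]: fundamental matrices from basis image 1 / 2 to camera i (i >= 2).
    grid_shape:  (P, Q, R) numbers of grid nodes; nodes are assumed to sit at
                 integer pixel positions of the basis images.
    """
    occupancy = np.zeros(grid_shape, dtype=bool)

    def inside(mask, u, v):
        u, v = int(round(u)), int(round(v))
        return 0 <= v < mask.shape[0] and 0 <= u < mask.shape[1] and mask[v, u]

    for p in range(grid_shape[0]):
        for q in range(grid_shape[1]):
            if not inside(silhouettes[0], p, q):          # basis view 1: outside object
                continue
            a, b, c = F12 @ np.array([p, q, 1.0])         # epipolar line in basis view 2
            if abs(b) < 1e-9:
                continue
            for r in range(grid_shape[2]):
                s = -(a * r + c) / b                      # projection onto basis view 2
                if not inside(silhouettes[1], r, s):
                    continue
                exists = True
                for i in range(2, len(silhouettes)):      # remaining views: line intersection
                    l1 = F1[i] @ np.array([p, q, 1.0])
                    l2 = F2[i] @ np.array([r, s, 1.0])
                    x = np.cross(l1, l2)
                    if abs(x[2]) < 1e-9 or not inside(silhouettes[i], x[0] / x[2], x[1] / x[2]):
                        exists = False
                        break
                occupancy[p, q, r] = exists

    # Implicit surface of the voxel model, as in the Marching Cubes step above.
    verts, faces, _, _ = measure.marching_cubes(occupancy.astype(np.float32), level=0.5)
    return occupancy, verts, faces
```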
V. VIRTUAL VIEW SYNTHESIS
An arbitrary view image from a 3-D shape model can be gen-
erated by texture mapping onto the 3-D shape model [8], [23]
or by morphing from the point correspondence of some refer-
ence images calculated using the model [1], [3], [16], [18]. In
the former, the textures of the images are projected onto the 3-D shape model and then re-projected onto the new view image. In this pro-
cedure, however, the generated images are likely to suffer from
rendering artifacts caused by the inaccuracy of the 3-D shape.
Therefore, we apply the latter procedure to generate arbitrary
view images.
A. Arbitrary View Synthesis
Arbitrary view images are synthesized as intermediate images
of two or three real neighboring reference images. If two refer-
ence images are selected, a virtual viewpoint can be taken on
the line between the two real reference viewpoints. If three are
selected, the virtual viewpoint can be taken from the inside of
the triangle formed by the three real viewpoints. Therefore, if a
number of cameras are mounted on the surface of a hemisphere
enclosing the target space and any three of them form a triangle
effectively, the virtual viewpoint can be moved freely all around
the half sphere.
For the synthesis of arbitrary view images, intermediate im-
ages are synthesized by interpolating two or three reference im-
ages. The interpolation is based on the related concepts of view
interpolation [3]. First, an image depicting the depth (a depth
image) of the 3-D model is rendered on each reference image.
To render the depth image, the 3-D positions of all the vertices
on the surface representation of the 3-D shape model in the
PGS are projected onto each reference viewpoint by applying
the smallest depth value to the points projected onto the depth
image. The depth d of a surface point from the reference viewpoint can be calculated by the following equation:

d = || P_A - P_V ||    (4)

where P_A and P_V represent the 3-D position of the surface point in the PGS and the viewpoint of the reference image, respectively. The 3-D position of the viewpoint can be determined
by using the epipolar geometry of the cameras in the following
procedure.
As shown in Fig. 3, the viewpoints of the two basis cameras
and the other cameras are indicated by V_1, V_2, and V_i, respectively. Since the first basis camera viewpoint V_1 would be projected onto everywhere in the first basis image (Image 1), the P and Q components of V_1 cannot be determined uniquely. Thus, we take the center point (p_c, q_c) of the first basis image. The R component of V_1 is the horizontal component of the projection of V_1 onto the second basis image; thus the epipole of the first basis camera in the second basis image determines the R component of V_1. Therefore, the 3-D position of the viewpoint of the first basis camera is defined as V_1 = (p_c, q_c, r_{e1}), where r_{e1} is the horizontal coordinate of that epipole.

Fig. 3. Position of the viewpoint of each camera in PGS.
Fig. 4. Synthesis of desired view from three neighboring view images.

In the same way, if (u_{e2}, v_{e2}) is the epipole of the second basis camera in the first basis image, then the P and Q components of the 3-D position of the second basis camera viewpoint V_2 are represented as u_{e2} and v_{e2}, respectively. We also define the R component of V_2 by the center position r_c of the second basis image. Then the 3-D position of the viewpoint of the second basis camera is defined as V_2 = (u_{e2}, v_{e2}, r_c). On the other hand, the viewpoint V_i of any other camera i can be represented in the same manner by using the epipoles of camera i in the two basis images.
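The sketch below illustrates these two ingredients: epipoles obtained as null vectors of the fundamental matrices (assuming the convention x2^T F12 x1 = 0), and a depth value taken as the Euclidean distance between the PGS coordinates of the surface point and the viewpoint, which is our reading of (4). The function names are ours.

```python
import numpy as np

def epipole(F, left=False):
    """Epipole of a fundamental matrix F (convention x2^T F x1 = 0 assumed).

    The right null vector is the epipole in image 1 (the image of camera 2's
    centre); the left null vector is the epipole in image 2 (the image of
    camera 1's centre).
    """
    _, _, Vt = np.linalg.svd(F.T if left else F)
    e = Vt[-1]
    return e[:2] / e[2]

def depth_in_pgs(P_A, P_V):
    """Depth used in the visibility test; taken here (an assumption consistent
    with (4) as reconstructed above) as the Euclidean distance between the PGS
    coordinates of the surface point P_A and the viewpoint P_V."""
    return np.linalg.norm(np.asarray(P_A, float) - np.asarray(P_V, float))

def basis1_viewpoint(F12, image1_shape):
    """Viewpoint of the first basis camera in the PGS: image centre for (P, Q)
    and the horizontal coordinate of its epipole in basis image 2 for R."""
    h, w = image1_shape[:2]
    e_in_image2 = epipole(F12, left=True)   # image of camera 1's centre in image 2
    return np.array([w / 2.0, h / 2.0, e_in_image2[0]])
```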
After rendering these depth images of the reference view-
points, an intermediate viewpoint image is synthesized as fol-
lows. Let w_1, w_2, and w_3 represent the weights of the interpolation of the reference view images. First, each vertex on the 3-D surface model is projected onto all the reference viewpoints, and the projected points are denoted x_1, x_2, and x_3, as shown in Fig. 4. Then, the pixel position x_v of the vertex on the interpolated view image is calculated by the following equation:

x_v = w_1 x_1 + w_2 x_2 + w_3 x_3.    (5)
Fig. 5. Extracting the correspondence points between the two basis view images.

Next, the visibility of the vertex from each reference viewpoint is checked by comparing the depth from the reference viewpoint to the vertex with the depth value stored in the depth image at the reference viewpoint. If they are not equal, the vertex is regarded as invisible from that reference viewpoint. Let v_1, v_2, and v_3 represent the visibility (1: visible, 0: invisible) for the reference viewpoints. If the vertex is visible from at least one reference viewpoint, the color value is determined as

c_v = (v_1 w_1 c_1 + v_2 w_2 c_2 + v_3 w_3 c_3) / (v_1 w_1 + v_2 w_2 + v_3 w_3)    (6)

where c_1, c_2, and c_3 are the colors of the projected points and c_v is the interpolated color.
By changing the weighting ratio, the virtual viewpoint can be
moved inside of the triangle.
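A per-vertex sketch of (5) and (6) follows; it is illustrative only, and the normalization over the visible views mirrors the form of (6) given above.

```python
import numpy as np

def blend_vertex(points, colors, visible, weights):
    """Blend one surface vertex into the virtual view (illustrative sketch of
    (5) and (6); the normalization in (6) follows the reconstruction above).

    points:  projections x_1, x_2, x_3 of the vertex in the reference images (2-D).
    colors:  RGB values c_1, c_2, c_3 sampled at those projections.
    visible: visibility flags v_1, v_2, v_3 (1: visible, 0: invisible).
    weights: interpolation weights w_1, w_2, w_3 (assumed to sum to 1); changing
             them moves the virtual viewpoint inside the reference triangle.
    """
    w = np.asarray(weights, dtype=float)
    x = np.asarray(points, dtype=float)
    c = np.asarray(colors, dtype=float)
    v = np.asarray(visible, dtype=float)

    position = (w[:, None] * x).sum(axis=0)            # eq. (5)

    vw = v * w
    if vw.sum() < 1e-12:
        return position, None                          # hidden in all reference views
    color = (vw[:, None] * c).sum(axis=0) / vw.sum()   # eq. (6)
    return position, color
```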
The interpolation strategy presented here can synthesize geo-
metrically correct intermediate images only if the viewing direc-
tions of the reference cameras are parallel to each other. Actu-
ally, they cannot be parallel because we assume all the cameras
are directed at the common objective space. Thus, the interpo-
lation strategy relies on the geometric approximation that the reference cameras are parallel. In this paper, we assume that
the distortion caused by the approximation is not obvious in the
camera placement shown in Fig. 9. If we need to synthesize a
geometrically correct interpolation between the reference view-
points, we need to take into account the homographic transfor-
mation of the image plane that occurs among the reference cam-
eras and the interpolated viewpoint, as Seitz
et al. proposed in
[18].
B. Synthesizing the Floor Plane
We also propose here a method to synthesize a floor plane within this projective framework. Since the floor plane is removed at the
step where the silhouette image is made, the SS method only
provides the 3-D shape of an object without its background. We
generate more realistic images by synthesizing a floor plane
image.
Since the coordinate axes of the PGS are defined by two basis
cameras, a line and a plane cannot be represented in the PGS by the same form of equation as in Euclidean space. Therefore,
we need to represent the floor plane by using basis views. We
synthesize the floor plane from more than three points extracted
from the real background image. In the following, we explain
the details of the procedure.
Several correspondence points on the floor region between
the two basis view images are extracted as shown in Fig. 5.
Those points are picked out manually in our experiment. From

Fig. 6. Projecting the vertices of the Delaunay triangles onto the other view image using fundamental matrices.
Fig. 7. Synthesizing the floor plane on the arbitrary view image.
Fig. 8. Event hall at B-con Plaza.
the definition of the PGS, if a correspondence point has image coordinates (u, v) on the first basis view and (u', v') on the second, then its coordinate in the PGS becomes (u, v, u'). Since the coordinate of a point in the PGS is
fixed, the point can be projected onto every input view image
with the fundamental matrices in the same way as stated before.
Fig. 9. Camera placement in our system.
Fig. 10. Feature point extraction for estimating fundamental matrices.
The points on the floor are triangulated so that the floor plane
can be represented by a Delaunay triangulation in the first basis
view image. The vertices of the triangle mesh are projected onto
two interpolating background images using fundamental ma-
trices as shown in Fig. 6.
Synthesizing the background image of an arbitrary viewpoint
is done using the same interpolation strategy described in the
previous section. The background images require the correspon-
dence of all the points between the two references. According to
the correspondence of the vertices of each triangle mesh, affine
transforms between the two background images are calculated
for each triangle mesh. Since the affine transforms of all the tri-
angle meshes provide pixel-wise correspondence between the
two reference background images, all the pixel positions and values are determined in accordance with (5) and (6), as shown in Fig. 7.
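The sketch below outlines this floor interpolation (our own illustration using SciPy's Delaunay triangulation; the paper triangulates in the first basis view, which is approximated here by triangulating the points in the first reference image): the floor points are triangulated, the vertex positions are blended as in (5), and a per-triangle affine map is derived for warping the pixels.

```python
import numpy as np
from scipy.spatial import Delaunay

def triangle_affine(src_tri, dst_tri):
    """2x3 affine matrix mapping three source vertices onto three target vertices."""
    A = np.hstack([src_tri, np.ones((3, 1))])   # rows [x, y, 1]
    return np.linalg.solve(A, dst_tri).T        # solve A @ M^T = dst for the 2x3 M

def interpolate_floor(pts_ref1, pts_ref2, w):
    """Floor-plane interpolation sketch (illustrative; not the authors' code).

    pts_ref1, pts_ref2: Nx2 arrays of corresponding floor points in the two
                        reference background images.
    w:                  blending weight in [0, 1] from reference 1 to reference 2.
    """
    pts_ref1 = np.asarray(pts_ref1, float)
    pts_ref2 = np.asarray(pts_ref2, float)
    tri = Delaunay(pts_ref1)                              # Delaunay triangulation
    pts_interp = (1.0 - w) * pts_ref1 + w * pts_ref2      # vertex positions, as in (5)
    affines = [triangle_affine(pts_ref1[simplex], pts_interp[simplex])
               for simplex in tri.simplices]              # per-triangle warp
    return tri, pts_interp, affines
```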
Although the use of affine transforms is not perspectively cor-
rect, we ignore such perspective errors because the distance be-
tween the object and the camera is relatively large in the present
experiment. In the case of such an approximation that cannot

References

R. Y. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses."
M. Levoy and P. Hanrahan, "Light field rendering."
B. Curless and M. Levoy, "A volumetric method for building complex models from range images."
S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen, "The lumigraph."
Frequently Asked Questions
Q1. How long does it take to render the intermediate images?

Once the 3-D model is generated, it takes about 0.16 s to render the intermediate images from the reference images, which includes 0.05 s to access the file and 0.04 s to display the image. 

Since the PGS is defined by the basis cameras, the geometrical settings of the basis cameras affect the results obtained by the proposed method.

The authors select two basis cameras so that the angle between the viewing directions of the basis cameras is close to 90 degrees, to make the P, Q, and R axes almost perpendicular to each other.

Since the field of view of the cameras used in this experiment is less than 10 degrees, the voxel density in the PGS of the objective area is roughly homogeneous.

In this section, the authors propose a method for reconstructing precise 3-D shape models in a large target space, which involves dividing the target space into several small subcells and reconstructing a 3-D shape model for every cell.

The fundamental matrices between the cameras are obtained by putting a checkerboard pattern at various heights, as depicted in Fig. 10, so that the image feature points can be distributed in the objective space. 

From such images, about 50 image feature points are extracted, and the same feature points extracted in the other cameras are then matched manually.
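Given such manually matched points, the fundamental matrices could be estimated as in the sketch below (our illustration using OpenCV's estimator; the paper does not state which estimation algorithm was used).

```python
import numpy as np
import cv2

def estimate_F(pts_a, pts_b):
    """Estimate a fundamental matrix from manually matched points.

    pts_a, pts_b: Nx2 arrays of corresponding image points (N ~ 50 in the
    experiment described above).  RANSAC is used here for robustness; the
    paper does not specify the estimator.
    """
    pts_a = np.asarray(pts_a, dtype=np.float32)
    pts_b = np.asarray(pts_b, dtype=np.float32)
    F, inlier_mask = cv2.findFundamentalMat(pts_a, pts_b, cv2.FM_RANSAC)
    return F, inlier_mask
```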

Even when the target space is very large, e.g., a soccer field or an American football field, the authors can synthesize arbitrary view images by dividing the whole target space into several cells and reconstructing a 3-D shape model in each cell separately.