Creating image-based VR using a self-calibrating fisheye lens

Yalin Xiong and Ken Turkowski
pp. 237-243

Creating Image-Based VR Using a Self-Calibrating Fisheye Lens

Yalin Xiong and Ken Turkowski
QuickTime VR Group, Apple Computer, Cupertino, CA 95014

Abstract
Image-based virtual reality is emerging as a major alternative to the more traditional 3D-based VR. The main advantages of image-based VR are its photo-quality realism and 3D illusion without any 3D information. Unfortunately, creating content for image-based VR is usually a very tedious process. This paper proposes to use a non-perspective fisheye lens to capture the spherical panorama with very few images. Unlike most camera calibration in computer vision, self-calibration of the fisheye lens poses new questions regarding the parameterization of the distortion and wrap-around effects. Because of its unique projection model and large field of view (near 180 degrees), most of the ambiguity problems in self-calibrating a traditional lens can be solved trivially. We demonstrate that with four fisheye lens images, we can seamlessly register them to create the spherical panorama, while self-calibrating its distortion and field of view.
1 Introduction

Image-based virtual reality is emerging as a major alternative to the more traditional 3D-based VR. Unlike virtual environments generated by 3D graphics, in which the information to represent the environment is kept internally as geometry and texture maps, image-based VR represents the environment by one or more images, which can be either captured by camera or synthesized from 3D computer graphics. There are two types of image-based VR representations: the single-node 2D representation [2], which represents the virtual world around one nodal point by a panorama, and the light-field 4D representation [5], which represents the virtual world contained in a pre-defined 3D volume. The main advantages of image-based VR are its simplicity for rendering, photographic-quality realism, and the 3D illusion experienced by users.
This paper is concerned with creating content for single-node 2D panoramas. The conventional way to create a surrounding panorama is by rotating a camera around its nodal point. Using a 15mm lens with 35mm film, it takes about 12 pictures to capture a panorama with a 90-degree vertical field of view. Capturing a full spherical panorama requires at least 30 pictures and involves rotating the camera along two different axes. In addition, the image registration process becomes complicated. Fortunately, some commercially available fisheye lenses enable us to capture spherical panoramas using far fewer pictures because of their near 180-degree field of view.
Surprisingly, there is little literature on the calibration of fisheye lenses. Most of the published and patented works on using fisheye lenses assume either an ideal projection model [1, 8] or use the distortion model of rectilinear lenses with additional nonlinear terms [7]. We found in experiments that neither of these two schemes is accurate enough for the purpose of registering multiple fisheye images into panoramas. Furthermore, we also need to minimize the requirements for elaborate calibration equipment so that the method is easy to use. Therefore, self-calibration of the fisheye lens is also desirable.
The fundamental difference between a fisheye lens and an ordinary rectilinear lens is that the projection from a 3D ray to a 2D image position in the fisheye lens is intrinsically non-perspective. Many projection models for fisheye lenses have been proposed in the literature [6]. We found that the equi-distance model is a reasonable first-order approximation. On top of the equi-distance model, we model the additional radial lens distortion by a third-order polynomial. Experimental results demonstrate that fisheye images can be registered seamlessly when the distortions are corrected.
By establishing the correspondence between two or more images, it is shown in [3] that many camera parameters can be recovered without a priori knowledge of the camera motion or scene geometry. Unfortunately, self-calibration in general is unstable if the image center and the field of view are unknown. The self-calibration of a fisheye lens is even more difficult because of its unknown lens distortion. But for a fisheye lens, the image center can be determined trivially

Figure 1: An Image from a Fisheye Lens
as the center of the ellipse which envelopes the image (Figure 1). When we rotate the camera around its nodal point to capture the spherical panorama, the wrap-around effect, i.e., the overlap between the first and last images, provides enough constraints for its field of view. Once we know those intrinsic parameters, the self-calibration becomes very stable. Hartley in [4] proposed a similar self-calibration approach for a rectilinear lens by rotating the camera, though it is difficult to assess his results for image registration purposes.
Another major difference between the work presented in this paper and other published works on self-calibration is that we register images while self-calibrating the camera. The benefit is that the quality of the calibration is iteratively improved because of the improved image registration, and the quality of the image registration is iteratively improved because of the improved calibration. We adopt a multi-level gradient-based registration to register the fisheye images while self-calibrating the distortion parameters and field of view. Using Levenberg-Marquardt minimization, we show that the registration process, with the radial distortion modelled as a cubic polynomial, produces excellent spherical panoramas.
2 Fisheye Projection Model and Distortion

The projection from 3D rays to 2D image positions in a fisheye lens can be approximated by the so-called "equi-distance" model. Suppose a 3D ray from the nodal point of the lens is specified by two angles, the latitude θ and the longitude φ, as in Figure 2. Then the equi-distance projection model projects the 3D ray into an image position (x, y), in which

    x = kθ cos φ                                            (1)
    y = kθ sin φ                                            (2)

Figure 2: Equi-Distance Projection Model

where k is a scale factor determined by the focal length and the scale of the film scanning. In other words, the equi-distance model maps the latitude angle θ to the polar distance r in the image, i.e., r = sqrt(x² + y²) = kθ, as well as the longitude angle φ to the polar direction in the image.

The advantage of this projection model over the traditional planar projection model is that it allows an arbitrarily large field of view, at least mathematically. Current commercial fisheye lenses include the Nikon 8mm (180-degree FOV) and 6mm (220-degree FOV). We tested the equi-distance projection model on the 8mm fisheye lens, and found that it is a good first-order approximation, as we will show later.

The radial distortion model captures the higher-order effects in the mapping between the latitude angle θ and the polar distance r:

    r = c1 θ + c2 θ² + c3 θ³                                (3)

where the order of the polynomial can be determined experimentally.
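To make the model concrete, here is a small Python sketch of the forward projection of Eqs. (1)-(3), combined with an elliptical image mapping; the default coefficients (c1 = 2/π, c2 = c3 = 0) give the ideal equi-distance model with r normalized to 1 at the image boundary, and the center and radii values are illustrative placeholders, not calibrated values.

```python
import math

def fisheye_project(theta, phi, c=(2.0 / math.pi, 0.0, 0.0),
                    center=(384.0, 256.0), radii=(256.0, 256.0)):
    """Project a 3D ray (latitude theta, longitude phi) to pixel coordinates.

    Uses the cubic radial model r = c1*theta + c2*theta^2 + c3*theta^3,
    normalized so r = 1 at the boundary of the image ellipse.
    """
    c1, c2, c3 = c
    r = c1 * theta + c2 * theta ** 2 + c3 * theta ** 3  # Eq. (3)
    ox, oy = center
    Rx, Ry = radii
    # scale the normalized polar distance by the ellipse radii
    return ox + Rx * r * math.cos(phi), oy + Ry * r * math.sin(phi)
```

With the ideal coefficients, a ray at latitude 90 degrees lands exactly on the image-circle boundary.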
3 Image Registration and Self-Calibration

3.1 Camera Setup

Figure 3 shows the setup for capturing spherical panoramas and self-calibrating. The Nikon N900 camera is mounted on a platform which can slide in two orthogonal directions. The pointing direction of the camera is slightly tilted upward, for reasons we will explain later.

The nodal point of the fisheye lens needs to be adjusted so that it lies on the rotation axis of the tripod. Once the camera is set up properly, we can take either four pictures by rotating the camera 90 degrees after

(Note that we use the same notation φ for the 2D polar direction and the 3D longitude angle because they are the same as long as the tangential distortion is zero, which is assumed in this paper.)
Figure 3: Camera Setup
every shot, or three pictures by rotating it 120 degrees.
We prefer the four-picture method simply because it
provides larger overlap regions.
3.2 Objective Function and Minimization

Given the four images I0, I1, I2, and I3, we formulate the registration and self-calibration problems as a single nonlinear minimization problem. The 3D reference frame is the camera coordinate of image I0. The following 34 parameters are adjusted in the minimization process:

Camera rotations: We fully parameterize the relative orientations of the camera coordinates of I1, I2, and I3 with respect to the reference frame in order to accommodate arbitrary, unconstrained rotations. Three angles (roll, pitch, yaw) for each image yield nine rotation parameters (i = 1, 2, 3).

Image centers and radii: As shown in Figure 1, the envelope of the image is an ellipse with two slightly different principal radii. The parameters are the image center positions (o_x, o_y) and radii (R_x, R_y) (i = 0, 1, 2, 3). The total number of parameters is sixteen.

Radial lens distortion: We use one cubic polynomial to represent the mapping between the latitude angle and the polar distance for all images. The parameters are c1, c2, and c3. The reason to choose a cubic polynomial is purely experimental, and specific to the Nikon 8mm fisheye lens we have. For other fisheye lenses, the order of the polynomial may need to be higher or lower.

Image brightness difference: The brightness scaling factor (contrast) and offset (brightness). The six illumination parameters are s_i and a_i (i = 1, 2, 3).
Let us first consider the registration of two fisheye images I_i and I_j. The objective function is:

    e = (1/|S|) Σ_{x∈S} [ (s_i I_i(x) + a_i) − (s_j I_j(T(x; A)) + a_j) ]²      (4)

where S is the overlap region, T( ) is a function which transforms the image position x in I_i to its corresponding position in I_j, and A is the vector of all parameters listed above except the brightness compensation parameters. The overlap region S is determined by the current estimate of the camera parameters and rotations.
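A direct (if slow) pixel-loop sketch of a brightness-compensated objective of this form follows; `warp` stands in for the transformation T(x; A), and the bilinear sampler is our own addition to keep the objective smooth under subpixel motion, so both names are assumptions rather than the paper's notation.

```python
import numpy as np

def pair_objective(I_i, I_j, warp, s=(1.0, 1.0), a=(0.0, 0.0)):
    """Brightness-compensated mean squared difference over the overlap.

    `warp(x, y)` maps integer pixel coords of I_i to float coords in I_j,
    returning None outside the overlap region.
    """
    def bilinear(img, x, y):
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        if not (0 <= x0 < img.shape[1] - 1 and 0 <= y0 < img.shape[0] - 1):
            return None  # outside the sampled image
        fx, fy = x - x0, y - y0
        top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
        bot = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
        return top * (1 - fy) + bot * fy

    total, count = 0.0, 0
    for y in range(I_i.shape[0]):
        for x in range(I_i.shape[1]):
            t = warp(x, y)
            if t is None:
                continue
            v = bilinear(I_j, *t)
            if v is None:
                continue
            d = (s[0] * I_i[y, x] + a[0]) - (s[1] * v + a[1])
            total += d * d
            count += 1
    return total / max(count, 1)
```

A production implementation would vectorize the loop and precompute the warp field once per iteration.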
The transformation function T can be decomposed into three concatenated functions:

    T(x) = F3(F2(F1(x)))                                    (5)

The first function F1(x) transforms the image position x into a 3D ray direction (θ, φ). In the following discussion, we will drop the image subscript to simplify the notation. Let

    x = (x, y)ᵀ                                             (6)
    o = (o_x, o_y)ᵀ                                         (7)
    R = (R_x, R_y)ᵀ                                         (8)

we can represent the image position in the polar coordinates of image I_i as

    r = sqrt( ((x − o_x)/R_x)² + ((y − o_y)/R_y)² )          (9)
    φ = atan2( (y − o_y)/R_y, (x − o_x)/R_x )                (10)

where atan2 is the arc tangent function with quadrant information. Therefore, the 3D ray direction of x represented in the camera coordinate of I_i is:

    θ = P⁻¹(r)                                               (11)
    φ = φ                                                    (12)

where P⁻¹( ) is the inverse function of the distortion polynomial in Eq. 3. In practice, the inverse can be solved using the Newton-Raphson root-finding method.
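The Newton-Raphson inversion of the distortion polynomial takes only a few lines; the ideal equi-distance guess θ ≈ r/c1 used as the starting point below is our choice, not stated in the paper.

```python
import math

def invert_distortion(r, c=(2.0 / math.pi, 0.0, 0.0), iters=20):
    """Solve r = c1*t + c2*t^2 + c3*t^3 for the latitude t = theta
    by Newton-Raphson iteration."""
    c1, c2, c3 = c
    t = r / c1  # ideal equi-distance initial guess
    for _ in range(iters):
        f = c1 * t + c2 * t * t + c3 * t ** 3 - r   # residual P(t) - r
        df = c1 + 2.0 * c2 * t + 3.0 * c3 * t * t   # derivative P'(t)
        t -= f / df
    return t
```

Because the calibrated polynomial stays monotonically increasing over the lens's latitude range, the iteration converges in a handful of steps.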
The second function F2( ) converts the 3D ray direction into the camera coordinate of I_j. Let M_i and M_j be 3×3 rotation matrices computed from the roll/pitch/yaw angles of I_i and I_j; we then have

    u_j = M_j M_iᵀ u_i                                       (13)

in which

    u_i = [ sin θ cos φ, sin θ sin φ, cos θ ]ᵀ               (14)

Therefore, the 3D ray direction in the camera coordinate of I_j, with u_j = (u_x, u_y, u_z)ᵀ, can be represented as

    θ_j = acos(u_z)                                          (15)
    φ_j = atan2(u_y, u_x)                                    (16)
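Eqs. (13)-(16) amount to composing two rotation matrices and converting between angles and unit vectors. The sketch below assumes a roll/pitch/yaw convention of rotations about the x, y, and z axes applied in that order, which the paper does not specify.

```python
import numpy as np

def rot_matrix(roll, pitch, yaw):
    """Rotation matrix from roll/pitch/yaw (about x, y, z, in that order)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def transfer_ray(theta, phi, M_i, M_j):
    """Eqs. (13)-(16): ray (theta, phi) in camera i -> (theta, phi) in camera j."""
    u_i = np.array([np.sin(theta) * np.cos(phi),
                    np.sin(theta) * np.sin(phi),
                    np.cos(theta)])                 # Eq. (14)
    u_j = M_j @ M_i.T @ u_i                         # Eq. (13)
    theta_j = np.arccos(np.clip(u_j[2], -1.0, 1.0))  # Eq. (15)
    phi_j = np.arctan2(u_j[1], u_j[0])               # Eq. (16)
    return theta_j, phi_j
```

Transferring a ray from camera i to camera j and back recovers the original angles, which is a convenient sanity check independent of the rotation convention.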
The third function F3( ) maps the 3D ray (θ_j, φ_j) onto the image position in I_j. The image position in polar coordinates is

    r_j = c1 θ_j + c2 θ_j² + c3 θ_j³                         (17)
    φ_j = φ_j                                                (18)

In Cartesian image coordinates, the position is

    x_j = o_x + R_x r_j cos φ_j                              (19)
    y_j = o_y + R_y r_j sin φ_j                              (20)
The minimum of the objective function in Eq. 4 is reached when its derivative is zero. When four images are considered together, the overall objective function is the sum over all image pairs with overlap:

    E = Σ_{(i,j)} e_ij                                       (21)

The Levenberg-Marquardt method is then used to minimize the objective function with proper initial estimates of the parameters.
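The paper uses Levenberg-Marquardt for the minimization; a minimal, generic version of the algorithm with a finite-difference Jacobian looks like the sketch below (in practice a library routine such as SciPy's `least_squares` would be used, and the residual here is any function returning a NumPy vector).

```python
import numpy as np

def levenberg_marquardt(residual, x0, iters=50, lam=1e-3):
    """Minimal Levenberg-Marquardt loop for sum-of-squares residuals."""
    x = np.asarray(x0, dtype=float)
    r = residual(x)
    for _ in range(iters):
        # finite-difference Jacobian of the residual vector
        eps = 1e-6
        J = np.empty((r.size, x.size))
        for k in range(x.size):
            dx = np.zeros_like(x)
            dx[k] = eps
            J[:, k] = (residual(x + dx) - r) / eps
        # damped normal equations: (J^T J + lam I) step = -J^T r
        A = J.T @ J + lam * np.eye(x.size)
        step = np.linalg.solve(A, -(J.T @ r))
        r_new = residual(x + step)
        if r_new @ r_new < r @ r:
            x, r = x + step, r_new  # accept step, relax damping
            lam *= 0.5
        else:
            lam *= 10.0             # reject step, increase damping
    return x
```

The damping term lam interpolates between Gauss-Newton (small lam) and gradient descent (large lam), which is what makes the method robust to the strong nonlinearity the paper notes in the rotation and center parameters.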
3.3 Initial Estimates and Damping

The initial estimate problem is important for any nonlinear optimization in order to avoid local minima and divergence. Among the parameters we need to optimize, we can set the initial radial distortion model to the ideal equi-distance projection (c1 = 2/π, c2 = c3 = 0.0), and the brightness difference parameters to either s = 1.0 and a = 0.0 or values computed from camera exposure/aperture settings. We need to be especially careful about the rotation angles, image centers and radii because they are the main sources of the nonlinearity in the objective function. The optimization can rarely recover from grossly erroneous rotation angles, image centers or radii.

Between two arbitrary fisheye images taken by rotating the camera around its nodal point, we need an initial estimate of the rotation represented by either the roll/pitch/yaw angles or a rotation matrix M. If we have, for example, three points in two images matched manually, we can minimize the following function to get an initial estimate of the rotation matrix:

    E = Σ (u_j − M u_i)² + C(M)                              (22)

where u_i and u_j are the two 3D rays computed as in Eq. 14 from the image positions using the current camera parameters, and the term C(M) constrains the matrix M to be a rotation matrix.

It is well known that self-calibration is difficult when the image center position is unknown. Fortunately, we have an independent way to compute a good initial estimate of the image center due to the unique projection model of the fisheye lens. According to the equi-distance and radial distortion models, the image center position coincides with the center position of the ellipse. The initial estimates of the radii and image centers are obtained by fitting the ellipse. In order for the nonlinear optimization to be stable and more likely to converge to the global minimum, we find from experiments that we need to dampen the image center positions and radii.
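Eq. (22) enforces orthonormality through a penalty term C(M). A closed-form alternative that solves the same least-squares rotation fit is the SVD-based orthogonal Procrustes solution, sketched here as a hypothetical substitute rather than the paper's method:

```python
import numpy as np

def initial_rotation(u_i, u_j):
    """Least-squares rotation M with u_j ≈ M u_i, from matched unit rays
    given as rows of u_i and u_j (orthogonal Procrustes via SVD)."""
    H = np.asarray(u_j).T @ np.asarray(u_i)
    U, _, Vt = np.linalg.svd(H)
    # force det(M) = +1 so the result is a proper rotation, not a reflection
    d = np.sign(np.linalg.det(U @ Vt))
    return U @ np.diag([1.0, 1.0, d]) @ Vt
```

Three non-collinear matched rays, as suggested in the text, are enough to determine the rotation uniquely.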
4 Experiments

4.1 Minimization Feedback

There is no foolproof way to guarantee that the Levenberg-Marquardt method, or any other nonlinear minimization method, converges to the global minimum. Therefore, we need to provide users with the necessary feedback in order for them to tune parameters. In addition, providing feedback while performing the nonlinear minimization increases user-friendliness as well.

In the nonlinear minimization process, after every iteration we show users the current status of the registration. The issue is how to display the current spherical panorama to users in an efficient and intuitive way. In the experiments shown below, we use the ideal equi-distance projection to project the whole spherical panorama into an image (Figure 4) as if it were imaged by an ideal fisheye lens with a FOV of 360 degrees. The north and south poles are indicated as in the figure, and the outermost circle of the image corresponds to a single ray θ = π. This 360-degree spherical mapping can be physically approximated by the reflection on a shiny ball, such as a Christmas ornament, viewed from far away.
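This 360-degree feedback mapping is just the ideal equi-distance projection extended to θ = π; a sketch follows, where the image size and the placement of θ = 0 at the center are our assumptions about the display, not parameters given in the paper.

```python
import math

def sphere_to_feedback(theta, phi, size=512):
    """Map a spherical direction (theta, phi) into the 360-degree
    feedback image: polar distance proportional to theta, so theta = 0
    is the image center and theta = pi the outermost circle."""
    r = theta / math.pi            # normalized: 1.0 at theta = pi
    cx = cy = size / 2.0
    return cx + cx * r * math.cos(phi), cy + cy * r * math.sin(phi)
```

Iterating this over all pixels of the registered fisheye images produces the kind of whole-sphere status image shown in Figure 4.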

[Plot accompanying Figure 8: normalized radial distance from the image center (0 to 1) versus latitude angle θ (0 to 90 degrees), for the ideal equi-distance model and the calibrated distortion model.]
References

[1] Zuoliang Cao, Sung J. Oh, and Ernest Hall. Omnidirectional dynamic vision positioning for a mobile robot. Optical Engineering, 25(12):1278-1283, December 1986.
[2] Shenchang E. Chen. QuickTime VR: an image-based approach to virtual environment navigation. In Proc. SIGGRAPH Conference, pages 29-38, August 1995.
[3] O. D. Faugeras, Q. T. Luong, and S. J. Maybank. Camera self-calibration: Theory and experiments. In Proc. European Conference on Computer Vision, pages 321-334, 1992.
[4] Richard I. Hartley. Self-calibration from multiple views with a rotating camera. In Proc. European Conference on Computer Vision, pages 471-478, 1994.
[5] Marc Levoy and Pat Hanrahan. Light field rendering. In Proc. SIGGRAPH Conference, pages 31-42, August 1996.
[6] Kenro Miyamoto. Fish eye lens. Journal of Optical Society of America, 54:1060-1061, 1964.
[7] S. Shah and J. K. Aggarwal. A simple calibration procedure for fisheye (high distortion) lens camera. In Proc. Int'l Conference on Robotics and Automation, pages 3422-3427, 1994.
[8] Steven D. Zimmermann. Omniview motionless camera orientation system. U.S. Patent No. 5185667, 1993.
Figure 4: Feedback from the minimization

4.2 Experimental Results

We tested our algorithm using four fisheye images (Figure 5) taken by rotating the camera roughly 90 degrees for every shot. We used Kodak ASA 400 film, and the images were scanned at a resolution of 768 × 512 with 24-bit color. In the bottom portion of each image the tripod is visible. Since the field of view of the fisheye lens is near 180 degrees, and its nodal point has to be on the rotation axis, there appears to be no easy way to get around this problem. In our minimization, we do not take the bottom portion of the fisheye images into account. The reason we usually tilt the camera upward is that, since the bottom portion contains the tripod anyway, we are better off tilting it upward so that the top portion (near the north pole) is covered redundantly.
In the minimization, the initial rotation angles are 90 degrees apart, and the initial rotation axis points north. The image registration is gradient-based. We currently use the derivative of a Gaussian as the gradient filter and the Gaussian as the smoothing filter. The sizes of the smoothing and gradient filters are adjustable to achieve registration at different scales. Figure 6 shows the feedback information during the minimization. The lens distortion model is the cubic polynomial of Eq. 3. The seams in the feedback images are intentionally left visible so that users know where each fisheye image is mapped. Those seams will not be visible in the final stitched panoramas. We can see that the optimization converges quickly to the global minimum. Figure 7 shows the final results of the minimizations when the ideal equi-distance projection model and our cubic distortion model are used. We also tested the same optimization on three other sets of fisheye images taken indoors and outdoors using the same fisheye lens. In all cases, we were able to converge to the global minimum on our first try.
Figure 8: Calibrated Lens Distortion

The result of the self-calibration of the fisheye lens is the cubic polynomial of the projection model specified in Eq. 3. Figure 8 shows the calibrated projection model and the ideal equi-distance model.

Once the four fisheye images are registered and the fisheye lens is calibrated, we can represent the spherical panorama using any projection. The equi-distance projection we used in the minimization feedback is one choice. We can also, for example, project the spherical panorama onto a cube. Figure 9 shows the texture maps as projected onto the six faces of the cube.
