Creating image-based VR using a self-calibrating fisheye lens

Yalin Xiong and Ken Turkowski
pp. 237-243

Creating Image-Based VR Using a Self-Calibrating Fisheye Lens

Yalin Xiong and Ken Turkowski
QuickTime VR Group, Apple Computer, Cupertino, CA 95014

Abstract
Image-based virtual reality is emerging as a major alternative to the more traditional 3D-based VR. The main advantages of image-based VR are its photo-quality realism and 3D illusion without any 3D information. Unfortunately, creating content for image-based VR is usually a very tedious process. This paper proposes to use a non-perspective fisheye lens to capture the spherical panorama with very few images. Unlike most camera calibration in computer vision, self-calibration of the fisheye lens poses new questions regarding the parameterization of the distortion and wrap-around effects. Because of its unique projection model and large field of view (near 180 degrees), most of the ambiguity problems in self-calibrating a traditional lens can be solved trivially. We demonstrate that with four fisheye lens images, we can seamlessly register them to create the spherical panorama, while self-calibrating its distortion and field of view.
1 Introduction

Image-based virtual reality is emerging as a major alternative to the more traditional 3D-based VR. Unlike virtual environments generated by 3D graphics, in which the information to represent the environment is kept internally as geometry and texture maps, image-based VR represents the environment by one or more images, which can be either captured by camera or synthesized from 3D computer graphics. There are two types of image-based VR representations: the single-node 2D representation [2], which represents the virtual world around one nodal point by a panorama, and the light-field 4D representation [5], which represents the virtual world contained in a pre-defined 3D volume. The main advantages of image-based VR are its simplicity for rendering, photographic-quality realism, and the 3D illusion experienced by users.
This paper is concerned with creating content for single-node 2D panoramas. The conventional way to create a surrounding panorama is by rotating a camera around its nodal point. Using a 15mm lens with 35mm film, it takes about 12 pictures to capture a panorama with a 90-degree vertical field of view. Capturing a full spherical panorama requires at least 30 pictures and involves rotating the camera along two different axes. In addition, the image registration process becomes complicated. Fortunately, some commercially available fisheye lenses enable us to capture spherical panoramas using far fewer pictures because of their near 180-degree field of view.
Surprisingly, there is little literature on the calibration of fisheye lenses. Most of the published and patented works on using fisheye lenses assume either an ideal projection model [1, 8] or use the distortion model of rectilinear lenses with additional nonlinear terms [7]. We found in experiments that neither of these two schemes is accurate enough for the purpose of registering multiple fisheye images into panoramas. Furthermore, we also need to minimize the requirements for elaborate calibration equipment so that the method is easy to use. Therefore, self-calibration of the fisheye lens is also desirable.
The fundamental difference between a fisheye lens and an ordinary rectilinear lens is that the projection from a 3D ray to a 2D image position in the fisheye lens is intrinsically non-perspective. Many projection models for fisheye lenses have been proposed in the literature [6]. We found that the equi-distance model is a reasonable first-order approximation. On top of the equi-distance model, we model the additional radial lens distortion by a third-order polynomial. Experimental results demonstrate that fisheye images can be registered seamlessly when the distortions are corrected.
By establishing the correspondence between two or more images, it is shown in [3] that many camera parameters can be recovered without a priori knowledge of the camera motion or scene geometry. Unfortunately, self-calibration in general is unstable if the image center and the field of view are unknown. The self-calibration of a fisheye lens is even more difficult because of its unknown lens distortion. But for a fisheye lens, the image center can be determined trivially

Figure 1: An Image from a Fisheye Lens
as the center of the ellipse which envelopes the image (Figure 1). When we rotate the camera around its nodal point to capture the spherical panorama, the wrap-around effect, i.e., the overlap between the first and last images, provides enough constraints for its field of view. Once we know those intrinsic parameters, the self-calibration becomes very stable. Hartley in [4] proposed a similar self-calibration approach for a rectilinear lens by rotating the camera, though it is difficult to assess his results for image registration purposes.
Another major difference between the work presented in this paper and other published works on self-calibration is that we register images while self-calibrating the camera. The benefit is that the quality of the calibration is iteratively improved because of the improved image registration, and the quality of the image registration is iteratively improved because of the improved calibration. We adopt a multi-level gradient-based registration to register the fisheye images while self-calibrating the distortion parameters and field of view. Using Levenberg-Marquardt minimization, we show that the registration process, with the radial distortion modelled as a cubic polynomial, produces excellent spherical panoramas.
2 Fisheye Projection Model and Distortion

The projection from 3D rays to 2D image positions in a fisheye lens can be approximated by the so-called "equi-distance" model. Suppose a 3D ray from the nodal point of the lens is specified by two angles, the latitude θ and the longitude φ, as in Figure 2. Then the equi-distance projection model projects the 3D ray into an image position (x, y), in which

    x = kθ cos φ                                            (1)
    y = kθ sin φ                                            (2)

Figure 2: Equi-Distance Projection Model

where k is a scale factor determined by the focal length and the scale of the film scanning. In other words, the equi-distance model maps the latitude angle θ to the polar distance r in the image, i.e., r = sqrt(x² + y²) = kθ, as well as the longitude angle φ to the polar direction in the image.

The advantage of this projection model over the traditional planar projection model is that it allows an arbitrarily large field of view, at least mathematically. Current commercial fisheye lenses include the Nikon 8mm (180-degree FOV) and 6mm (220-degree FOV). We tested the equi-distance projection model on the 8mm fisheye lens, and found that it is a good first-order approximation, as we will show later.

The radial distortion model captures the higher-order effects in the mapping between the latitude angle θ and the polar distance r:

    r = c1 θ + c2 θ² + c3 θ³                                (3)

where the order of the polynomial can be determined experimentally.
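To make the model concrete, here is a small Python sketch of the forward projection of Eqs. (1)-(3), combined with an elliptical image mapping; the default coefficients (c1 = 2/π, c2 = c3 = 0) give the ideal equi-distance model with r normalized to 1 at the image boundary, and the center and radii values are illustrative placeholders, not calibrated values.

```python
import math

def fisheye_project(theta, phi, c=(2.0 / math.pi, 0.0, 0.0),
                    center=(384.0, 256.0), radii=(256.0, 256.0)):
    """Project a 3D ray (latitude theta, longitude phi) to pixel coordinates.

    Uses the cubic radial model r = c1*theta + c2*theta^2 + c3*theta^3,
    normalized so r = 1 at the boundary of the image ellipse.
    """
    c1, c2, c3 = c
    r = c1 * theta + c2 * theta ** 2 + c3 * theta ** 3  # Eq. (3)
    ox, oy = center
    Rx, Ry = radii
    # scale the normalized polar distance by the ellipse radii
    return ox + Rx * r * math.cos(phi), oy + Ry * r * math.sin(phi)
```

With the ideal coefficients, a ray at latitude 90 degrees lands exactly on the image-circle boundary.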
3 Image Registration and Self-Calibration

3.1 Camera Setup

Figure 3 shows the setup for capturing spherical panoramas and self-calibrating. The Nikon N900 camera is mounted on a platform which can slide in two orthogonal directions. The pointing direction of the camera is slightly tilted upward, for reasons we will explain later.

The nodal point of the fisheye lens needs to be adjusted so that it lies on the rotation axis of the tripod. Once the camera is set up properly, we can take either four pictures by rotating the camera 90 degrees after

(Note that we use the same notation φ for the 2D polar direction and the 3D longitude angle because they are the same as long as the tangential distortion is zero, which is assumed in this paper.)
Figure 3: Camera Setup
every shot, or three pictures by rotating it 120 degrees.
We prefer the four-picture method simply because it
provides larger overlap regions.
3.2 Objective Function and Minimization

Given the four images I0, I1, I2, and I3, we formulate the registration and self-calibration problems as a single nonlinear minimization problem. The 3D reference frame is the camera coordinate of image I0. The following 34 parameters are adjusted in the minimization process:

Camera rotations: We fully parameterize the relative orientations of the camera coordinates of I1, I2, and I3 with respect to the reference frame in order to accommodate arbitrary, unconstrained rotations. Three angles (roll, pitch, yaw) for each image yield nine rotation parameters (i = 1, 2, 3).

Image centers and radii: As shown in Figure 1, the envelope of the image is an ellipse with two slightly different principal radii. The parameters are the image center positions (o_x, o_y) and radii (R_x, R_y) (i = 0, 1, 2, 3). The total number of parameters is sixteen.

Radial lens distortion: We use one cubic polynomial to represent the mapping between the latitude angle and the polar distance for all images. The parameters are c1, c2, and c3. The reason to choose a cubic polynomial is purely experimental, and specific to the Nikon 8mm fisheye lens we have. For other fisheye lenses, the order of the polynomial may need to be higher or lower.

Image brightness difference: The brightness scaling factor (contrast) and offset (brightness). The six illumination parameters are s_i and a_i (i = 1, 2, 3).
Let us first consider the registration of two fisheye images I_i and I_j. The objective function is:

    e = (1/|S|) Σ_{x∈S} [ (s_i I_i(x) + a_i) − (s_j I_j(T(x; A)) + a_j) ]²      (4)

where S is the overlap region, T( ) is a function which transforms the image position x in I_i to its corresponding position in I_j, and A is the vector of all parameters listed above except the brightness compensation parameters. The overlap region S is determined by the current estimate of the camera parameters and rotations.
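A direct (if slow) pixel-loop sketch of a brightness-compensated objective of this form follows; `warp` stands in for the transformation T(x; A), and the bilinear sampler is our own addition to keep the objective smooth under subpixel motion, so both names are assumptions rather than the paper's notation.

```python
import numpy as np

def pair_objective(I_i, I_j, warp, s=(1.0, 1.0), a=(0.0, 0.0)):
    """Brightness-compensated mean squared difference over the overlap.

    `warp(x, y)` maps integer pixel coords of I_i to float coords in I_j,
    returning None outside the overlap region.
    """
    def bilinear(img, x, y):
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        if not (0 <= x0 < img.shape[1] - 1 and 0 <= y0 < img.shape[0] - 1):
            return None  # outside the sampled image
        fx, fy = x - x0, y - y0
        top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
        bot = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
        return top * (1 - fy) + bot * fy

    total, count = 0.0, 0
    for y in range(I_i.shape[0]):
        for x in range(I_i.shape[1]):
            t = warp(x, y)
            if t is None:
                continue
            v = bilinear(I_j, *t)
            if v is None:
                continue
            d = (s[0] * I_i[y, x] + a[0]) - (s[1] * v + a[1])
            total += d * d
            count += 1
    return total / max(count, 1)
```

A production implementation would vectorize the loop and precompute the warp field once per iteration.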
The transformation function T can be decomposed into three concatenated functions:

    T(x) = F3(F2(F1(x)))                                    (5)

The first function F1(x) transforms the image position x into a 3D ray direction (θ, φ). In the following discussion, we will drop the image subscript to simplify the notation. Let

    x = (x, y)ᵀ                                             (6)
    o = (o_x, o_y)ᵀ                                         (7)
    R = (R_x, R_y)ᵀ                                         (8)

we can represent the image position in the polar coordinates of image I_i as

    r = sqrt( ((x − o_x)/R_x)² + ((y − o_y)/R_y)² )          (9)
    φ = atan2( (y − o_y)/R_y, (x − o_x)/R_x )                (10)

where atan2 is the arc tangent function with quadrant information. Therefore, the 3D ray direction of x represented in the camera coordinate of I_i is:

    θ = P⁻¹(r)                                               (11)
    φ = φ                                                    (12)

where P⁻¹( ) is the inverse function of the distortion polynomial in Eq. 3. In practice, the inverse can be solved using the Newton-Raphson root-finding method.
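The Newton-Raphson inversion of the distortion polynomial takes only a few lines; the ideal equi-distance guess θ ≈ r/c1 used as the starting point below is our choice, not stated in the paper.

```python
import math

def invert_distortion(r, c=(2.0 / math.pi, 0.0, 0.0), iters=20):
    """Solve r = c1*t + c2*t^2 + c3*t^3 for the latitude t = theta
    by Newton-Raphson iteration."""
    c1, c2, c3 = c
    t = r / c1  # ideal equi-distance initial guess
    for _ in range(iters):
        f = c1 * t + c2 * t * t + c3 * t ** 3 - r   # residual P(t) - r
        df = c1 + 2.0 * c2 * t + 3.0 * c3 * t * t   # derivative P'(t)
        t -= f / df
    return t
```

Because the calibrated polynomial stays monotonically increasing over the lens's latitude range, the iteration converges in a handful of steps.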
The second function F2( ) converts the 3D ray direction into the camera coordinate of I_j. Let M_i and M_j be 3×3 rotation matrices computed from the roll/pitch/yaw angles of I_i and I_j; we then have

    u_j = M_j M_iᵀ u_i                                       (13)

in which

    u_i = [ sin θ cos φ, sin θ sin φ, cos θ ]ᵀ               (14)

Therefore, the 3D ray direction in the camera coordinate of I_j, with u_j = (u_x, u_y, u_z)ᵀ, can be represented as

    θ_j = acos(u_z)                                          (15)
    φ_j = atan2(u_y, u_x)                                    (16)
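Eqs. (13)-(16) amount to composing two rotation matrices and converting between angles and unit vectors. The sketch below assumes a roll/pitch/yaw convention of rotations about the x, y, and z axes applied in that order, which the paper does not specify.

```python
import numpy as np

def rot_matrix(roll, pitch, yaw):
    """Rotation matrix from roll/pitch/yaw (about x, y, z, in that order)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def transfer_ray(theta, phi, M_i, M_j):
    """Eqs. (13)-(16): ray (theta, phi) in camera i -> (theta, phi) in camera j."""
    u_i = np.array([np.sin(theta) * np.cos(phi),
                    np.sin(theta) * np.sin(phi),
                    np.cos(theta)])                 # Eq. (14)
    u_j = M_j @ M_i.T @ u_i                         # Eq. (13)
    theta_j = np.arccos(np.clip(u_j[2], -1.0, 1.0))  # Eq. (15)
    phi_j = np.arctan2(u_j[1], u_j[0])               # Eq. (16)
    return theta_j, phi_j
```

Transferring a ray from camera i to camera j and back recovers the original angles, which is a convenient sanity check independent of the rotation convention.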
The third function F3( ) maps the 3D ray (θ_j, φ_j) onto the image position in I_j. The image position in polar coordinates is

    r_j = c1 θ_j + c2 θ_j² + c3 θ_j³                         (17)
    φ_j = φ_j                                                (18)

In Cartesian image coordinates, the position is

    x_j = o_x + R_x r_j cos φ_j                              (19)
    y_j = o_y + R_y r_j sin φ_j                              (20)
The minimum of the objective function in Eq. 4 is reached when its derivative is zero. When four images are considered together, the overall objective function is the sum over all image pairs with overlap:

    E = Σ_{(i,j)} e_ij                                       (21)

The Levenberg-Marquardt method is then used to minimize the objective function with proper initial estimates of the parameters.
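The paper uses Levenberg-Marquardt for the minimization; a minimal, generic version of the algorithm with a finite-difference Jacobian looks like the sketch below (in practice a library routine such as SciPy's `least_squares` would be used, and the residual here is any function returning a NumPy vector).

```python
import numpy as np

def levenberg_marquardt(residual, x0, iters=50, lam=1e-3):
    """Minimal Levenberg-Marquardt loop for sum-of-squares residuals."""
    x = np.asarray(x0, dtype=float)
    r = residual(x)
    for _ in range(iters):
        # finite-difference Jacobian of the residual vector
        eps = 1e-6
        J = np.empty((r.size, x.size))
        for k in range(x.size):
            dx = np.zeros_like(x)
            dx[k] = eps
            J[:, k] = (residual(x + dx) - r) / eps
        # damped normal equations: (J^T J + lam I) step = -J^T r
        A = J.T @ J + lam * np.eye(x.size)
        step = np.linalg.solve(A, -(J.T @ r))
        r_new = residual(x + step)
        if r_new @ r_new < r @ r:
            x, r = x + step, r_new  # accept step, relax damping
            lam *= 0.5
        else:
            lam *= 10.0             # reject step, increase damping
    return x
```

The damping term lam interpolates between Gauss-Newton (small lam) and gradient descent (large lam), which is what makes the method robust to the strong nonlinearity the paper notes in the rotation and center parameters.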
3.3 Initial Estimates and Damping

The initial estimate problem is important for any nonlinear optimization in order to avoid local minima and divergence. Among the parameters we need to optimize, we can set the initial radial distortion model to the ideal equi-distance projection (c1 = 2/π, c2 = c3 = 0.0), and the brightness difference parameters to either s = 1.0 and a = 0.0 or values computed from camera exposure/aperture settings. We need to be especially careful about the rotation angles, image centers and radii because they are the main sources of the nonlinearity in the objective function. The optimization can rarely recover from grossly erroneous rotation angles, image centers or radii.

Between two arbitrary fisheye images taken by rotating the camera around its nodal point, we need an initial estimate of the rotation represented by either the roll/pitch/yaw angles or a rotation matrix M. If we have, for example, three points in two images matched manually, we can minimize the following function to get an initial estimate of the rotation matrix:

    E = Σ (u_j − M u_i)² + C(M)                              (22)

where u_i and u_j are the two 3D rays computed as in Eq. 14 from the image positions using the current camera parameters, and the term C(M) constrains the matrix M to be a rotation matrix.

It is well known that self-calibration is difficult when the image center position is unknown. Fortunately, we have an independent way to compute a good initial estimate of the image center due to the unique projection model of the fisheye lens. According to the equi-distance and radial distortion models, the image center position coincides with the center position of the ellipse. The initial estimates of the radii and image centers are obtained by fitting the ellipse. In order for the nonlinear optimization to be stable and more likely to converge to the global minimum, we find from experiments that we need to dampen the image center positions and radii.
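Eq. (22) enforces orthonormality through a penalty term C(M). A closed-form alternative that solves the same least-squares rotation fit is the SVD-based orthogonal Procrustes solution, sketched here as a hypothetical substitute rather than the paper's method:

```python
import numpy as np

def initial_rotation(u_i, u_j):
    """Least-squares rotation M with u_j ≈ M u_i, from matched unit rays
    given as rows of u_i and u_j (orthogonal Procrustes via SVD)."""
    H = np.asarray(u_j).T @ np.asarray(u_i)
    U, _, Vt = np.linalg.svd(H)
    # force det(M) = +1 so the result is a proper rotation, not a reflection
    d = np.sign(np.linalg.det(U @ Vt))
    return U @ np.diag([1.0, 1.0, d]) @ Vt
```

Three non-collinear matched rays, as suggested in the text, are enough to determine the rotation uniquely.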
4 Experiments

4.1 Minimization Feedback

There is no foolproof way to guarantee that the Levenberg-Marquardt method, or any other nonlinear minimization method, converges to the global minimum. Therefore, we need to provide users with the necessary feedback in order for them to tune parameters. In addition, providing feedback while performing the nonlinear minimization increases user-friendliness as well.

In the nonlinear minimization process, after every iteration we show users the current status of the registration. The issue is how to display the current spherical panorama to users in an efficient and intuitive way. In the experiments shown below, we use the ideal equi-distance projection to project the whole spherical panorama into an image (Figure 4) as if it were imaged by an ideal fisheye lens with a FOV of 360 degrees. The north and south poles are indicated as in the figure, and the outermost circle of the image corresponds to a single ray θ = π. This 360-degree spherical mapping can be physically approximated by the reflection on a shiny ball, such as a Christmas ornament, viewed from far away.
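This 360-degree feedback mapping is just the ideal equi-distance projection extended to θ = π; a sketch follows, where the image size and the placement of θ = 0 at the center are our assumptions about the display, not parameters given in the paper.

```python
import math

def sphere_to_feedback(theta, phi, size=512):
    """Map a spherical direction (theta, phi) into the 360-degree
    feedback image: polar distance proportional to theta, so theta = 0
    is the image center and theta = pi the outermost circle."""
    r = theta / math.pi            # normalized: 1.0 at theta = pi
    cx = cy = size / 2.0
    return cx + cx * r * math.cos(phi), cy + cy * r * math.sin(phi)
```

Iterating this over all pixels of the registered fisheye images produces the kind of whole-sphere status image shown in Figure 4.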

[Plot accompanying Figure 8: normalized radial distance from the image center (0 to 1) versus latitude angle θ (0 to 90 degrees), for the ideal equi-distance model and the calibrated distortion model.]
References

[1] Zuoliang Cao, Sung J. Oh, and Ernest Hall. Omnidirectional dynamic vision positioning for a mobile robot. Optical Engineering, 25(12):1278-1283, December 1986.
[2] Shenchang E. Chen. QuickTime VR: an image-based approach to virtual environment navigation. In Proc. SIGGRAPH Conference, pages 29-38, August 1995.
[3] O. D. Faugeras, Q. T. Luong, and S. J. Maybank. Camera self-calibration: Theory and experiments. In Proc. European Conference on Computer Vision, pages 321-334, 1992.
[4] Richard I. Hartley. Self-calibration from multiple views with a rotating camera. In Proc. European Conference on Computer Vision, pages 471-478, 1994.
[5] Marc Levoy and Pat Hanrahan. Light field rendering. In Proc. SIGGRAPH Conference, pages 31-42, August 1996.
[6] Kenro Miyamoto. Fish eye lens. Journal of Optical Society of America, 54:1060-1061, 1964.
[7] S. Shah and J. K. Aggarwal. A simple calibration procedure for fisheye (high distortion) lens camera. In Proc. Int'l Conference on Robotics and Automation, pages 3422-3427, 1994.
[8] Steven D. Zimmermann. Omniview motionless camera orientation system. U.S. Patent No. 5185667, 1993.
Figure 4: Feedback from the minimization

4.2 Experimental Results

We tested our algorithm using four fisheye images (Figure 5) taken by rotating the camera roughly 90 degrees for every shot. We used Kodak ASA 400 film, and the images were scanned at a resolution of 768 × 512 with 24-bit color. In the bottom portion of each image the tripod is visible. Since the field of view of the fisheye lens is near 180 degrees, and its nodal point has to be on the rotation axis, there appears to be no easy way to get around this problem. In our minimization, we do not take the bottom portion of the fisheye images into account. The reason we usually tilt the camera upward is that, since the bottom portion contains the tripod anyway, we are better off tilting it upward so that the top portion (near the north pole) is covered redundantly.
In the minimization, the initial rotation angles are 90 degrees apart, and the initial rotation axis points north. The image registration is gradient-based. We currently use the derivative of a Gaussian as the gradient filter and the Gaussian as the smoothing filter. The sizes of the smoothing and gradient filters are adjustable to achieve registration at different scales. Figure 6 shows the feedback information during the minimization. The lens distortion model is the cubic polynomial of Eq. 3. The seams in the feedback images are intentionally left visible so that users know where each fisheye image is mapped. Those seams will not be visible in the final stitched panoramas. We can see that the optimization converges quickly to the global minimum. Figure 7 shows the final results of the minimizations when the ideal equi-distance projection model and our cubic distortion model are used. We also tested the same optimization on three other sets of fisheye images taken indoors and outdoors using the same fisheye lens. In all cases, we were able to converge to the global minimum on our first try.
Figure 8: Calibrated Lens Distortion

The result of the self-calibration of the fisheye lens is the cubic polynomial of the projection model specified in Eq. 3. Figure 8 shows the calibrated projection model and the ideal equi-distance model.

Once the four fisheye images are registered and the fisheye lens is calibrated, we can represent the spherical panorama using any projection. The equi-distance projection we used in the minimization feedback is one choice. We can also, for example, project the spherical panorama onto a cube. Figure 9 shows the texture maps as projected onto the six faces of the cube.
