Open AccessProceedings ArticleDOI

A "Region-Growing" Algorithm for Matching of Terrain Images.

Q: How many processors can parallelise the algorithm?

The authors have already parallelised their algorithm on a network of SUN workstations, and speedup is, as predicted, linear up to 15 processors (which was all the authors could get their hands on!).

Q: What is the main item not specified by this pseudo-code?

Selecting which match should be used to grow from ...The remaining major item not specified by this pseudo-code is which match should be selected from list_to_be_grown_from on each iteration.

Q: How many interations does it take to get a good theoretical estimate?

The number of interations required depends heavily upon the input data (image & initial parameter values), so it is very difficult to get a good theoretical estimate.

Q: Who helped us with the Transputer-related issues?

The authors would also like to thank their colleagues for help and useful discussions; most notably, Kevin Collins (of RSRE) helped us with Transputer-related issues, while Tim Day and JP.

Q: What is the reason for linearizing in the way Gruen does?

"One of the reasons for linearizing in the way Gruen does is that the equations are then in the form of a "Gauss-Markov estimation model".

G. P. Otto, +1 more

- pp 1-6

About:

This article is published in Alvey Vision Conference.The article was published on 1988-01-01 and is currently open access. It has received 40 citations till now. The article focuses on the topics: Matching (statistics) & Terrain.

Content maybe subject to copyright Report

A "Region-Growing" Algorithm for Matching of Terrain Images

G. P. Otto

T. K. W. Chau

Department of Computer Science

University College London

Gower Street

London WC1E6BT

This paper describes and discusses a new algorithm for stereo match-

ing, which has been designed to work well with data from the SPOT

satellite.* It is basically an

extension

ofGruen's adaptive least squares

correlation algorithm,

so that whole images can be automatically

matched,

instead of just selected patches. Initial results on quality and

speed are presented, together with a theoretical analysis of

the

potential

speed

both conventional and multi-processor architectures.

OVERVIEW

This paper describes and discusses a new algorithm for stereo

matching, which has been designed to work well with data from the

SPOT satellite.

It is basically an extension of Gnien's adaptive least

squares correlation algorithm,

so that whole images can be

automatically matched, instead of just selected patches.

THE PROBLEM

We wanted an algorithm capable of producing accurate

and

dense

disparity maps quickly and

reliably

using images produced by the SPOT

satellite. (Ultimately, we want accurate digital elevation models, but

this

paper does

not deal with the associated sensor calibration and orien-

tation problems.) As a target, we want our system to be able to achieve

height accuracies of 5 m or better (RMS) over all of the visible scene

surface. Assuming SPOT images with a base-height ratio of about 1,

this means that we need to match to

better than

0.5 pixelf wherever

pos-

sible.

The panchromatic scanner on the SPOT satellite is a linear 6000

element CCD array; 2-dimensional images are produced by composing

a succession of 1-D images, taken at 1.5ms intervals, as the satellite

orbits the earth at" 7km/s. The linear array is approximately perpendic-

ular to the satellite's track, so the 2-D images are approximately square.

(SPOT supply the data as 6000 x 6000 images.) For stereo work, the

array is frequently tilted so that it is looking at angles off the vertical of

up to 25° or so, thus allowing base-height ratios of just over 1. This,

and the fact that 1 pixel corresponds to about 10m x 10m on the ground,

gives the potential for deriving good quality DEM's from this data. For

more information, see (e.g.) Chevrel et al.,

Gugan

and Dowman.

POSSIBLE APPROACHES TO SOLVING THE PROBLEM

Let us start by summarising some important facts about our appli-

cation area:

• The surface we are looking at is typically continuous. (Overhang-

ing cliffs & such-like are rare.) On the other

hand,

there may well

be cloud and haze between the sensors and that surface, so

regions may be occluded or changed in contrast.

• The stereo pairs of images are taken at different times, so transi-

tory phenomena such as clouds and wave-patterns will not match

up between the two images.

• The images typically contain lots of small-scale texture (though

often with poor contrast), but large-scale features are relatively

sparse.

• There are often significant distortions between corresponding

image patches — the base-height ratio of the sensor-scene combi-

nation is typically about 1 (which makes it easier to attain good

height accuracy), and the scenes often contain steeply sloping

regions.

• The initial disparity ranges can be large — up to about 1000 pix-

els (e.g. when trying to match two SPOT images, of base-height

ratio " 1, of the Himalayas) — even for "ordinary terrain", ranges

of 100-200 pixels

are

typical.

• Obtaining accurate sensor orientation information is relatively

difficult/expensive, so we do not wish to use geometrical con-

straints for the matching unless necessary, or unless not-very-

accurate information is adequate.

• Since SPOT uses a scanned line sensor, algorithms which rely

upon the epipolar lines being known need to be converted into a

successive refinement format (if they are to produce accurate

disparities). See Otto

for

discussion of

this

point

3.1.

An area-based (correlation) algorithm is most appropriate for

this task

People commonly divide stereo matching algorithms into two

kinds:

feature-based and area-based. In feature-based althorithms, the

original pixel data is converted into some more abstract "features"

(often line-segments) before matching, whereas in area-based algo-

rithms the pixel data is compared directly (typically by minimising

some measure of "mismatch" over small areas surrounding the points of

interest). Feature-based algorithms are typically faster, since converting

images into "features" reduces the quantity of data to be handled, and

makes the comparison/matching between images easier. For this rea-

son,

we started by considering primarily feature-based algorithms.

However, we came to the conclusion that an area-based (correla-

tion) algorithm was more appropriate for our needs, because of the

difficulty of simultaneously attaining the accuracy & density goals with

a feature-based algorithm. This difficulty arises from the need to be

able to locate a "feature" to high accuracy (at least 0.2 pixel) near any

position in an image, in order to be able to obtain correspondences accu-

rate to better than 0.5 pixel "densely" over the scene, t (Large

regions

the SPOT images we have been working with (views of Aix-en-

Provence) have few good edges; in addition, there is significant noise

(much of it linear!) in the images we have.)

Area-based algorithms require assumptions about the smoothness

of the surface being viewed (e.g. that, locally, it is approximately flat)

so that disparities at neighbouring points can be related. However,

where those assumptions are valid (which they usually are in this appli-

cation), the area-based algorithms can produce very accurate answers

since they make use of all of the local data. Furthermore, area-based

algorithms will perform reasonably well even in regions which are

nearly homogeneous.

The area-based algorithm we chose to concentrate on was one

described by Gruen,

because it was claimed to achieve very high

accuracy

(though that was only on selected regions of images), and

because it could be made to run at an acceptable speed (see appendix)

t We need "better thin", so that we have same margin for error in the conversion from

disparities to heights, and for homogeneous and difficult-to-match areas.

t Note that it is difficult to "post-process" the correspondences to increase accuracy,

because of the lack of large-scale regularity in our scenes. On the other hand, post-

processing can be effectively used K> remove "blunders". i-e. grossly inaccurate

matches.

123

AVC 1988 doi:10.5244/C.2.19

even though it allowed for distortions caused by (e.g.) the different

viewing angles.

3.2.

Outline of Gruen's Algorithm for matching image patches

Gruen's algorithm is an adaptive least-squares correlation algo-

rithm — the basic idea being to minimise the sum-of-the-square-of-the-

differences between two image patches, with the minimisation being

over a set of parameters specifying how the patches (and their grey-

levels) are allowed to be distorted between images. Such distortion can

arise from many causes (e.g. perspective distortion); Gruen

allows for

an affine transformation between coordinates in the images, and an

additive distortion to the grey-levels. This geometric distortion is

more-or-less equivalent to assuming that the viewed surface is approxi-

mately planar (within the region visible in a patch), and that each patch

subtends a small angle at the sensor (so that non-linear terms in the per-

spective distortion can be ignored). Note that it does not assume any-

thing about the angle of inclination of the surface, and that it will cope

with large base-height ratios.

Ending the best fit is essentially a multi-parameter optimization

problem. Gruen solves this iteratively, by making initial estimates of

the parameters, and then linearising the problem in such a way that he

has to solve a set of over-constrained linear equations at each stage.

(See his papers, and the appendix, for more detail. Alternatively, Chau

and Otto

give a concise & precise formulation.)

This algorithm is capable of producing high-accuracy results at a

moderate computational cost, but its radius of convergence is small —

e.g. it needs to be given starting values for disparities which are within a

few pixels of

the

true values.

33.

Solving the "search" problem

The correlation algorithm can accurately match points in the two

images — but it needs to be given good approximations to start with.

Rather than attempt some form of general search (over a potentially

large range of disparities and distortions), this seemed to be an excellent

time to exploit the continuity of the surface being viewed. This led to

the following algorithm:

DESCRIPTION OF THE REGION-GROWING ALGORITHM

The essence of the algorithm is simple: start with an approximate

match between a point in one image and a point in the other, use

Gruen's algorithm to produce a more accurate match and the distortion

parameters, and use this to predict approximate matches for points in

the neighbourhood of the first match. Then use Gruen's algorithm to

refine these matches, and so on.

In pseudo-code this becomes:

{ INPUTS: two images; 1

or more

approximate matches between the images }

set list_to_be_grown_from to empty

for each approximate match

run Gruen's algorithm

if it converges

store result in list_to_be_grown_from

while list_to_be_grown_from is not empty

pick an item from the list (& remove it from the list)

for each "neighbour" of

the

selected match

if "neighbour" not already matched

use selected item to predict match

run Gruen's algorithm using prediction

if it converges (and satisfies any constraints we might impose)

store result in list_to_be_grown_from

This pseudo-code doesn't mention output — clearly, the results

can either be output as they are generated, or at the end, when they can

be ordered in whatever fashion is convenient

Another thing not defined by this pseudo-code is what a "neigh-

bour" is. Since Gruen's algorithm is only applicable when the scene

surface is approximately planart, it does not seem worthwhile to use it

directly to attempt to match every pixel in (say) the left hand image to

t Though it can be generalised somewhat— Be discussion in lection 4.Z

the corresponding point in the right-hand image — if we use Gruen's

algorithm on every 5th or 10th pixel, we can predict quite accurately

what the matches are for intervening points. For this reason, our current

implementation of this algorithm allows us to specify a grid of regularly

spaced pixels in the left-hand image, which are the points for which

Gruen's algorithm will be used. Then the "neighbours" are the four

nearest points on this grid.

4.1.

Selecting which match should be used to grow from ...

The remaining major item not specified by this pseudo-code is

which match should be selected from list_to_be_grown_from on each

iteration. Our first attempt used a stack (last-in

first-out

list), because it

was simple and would keep the list size to a minimum. However, this

version turned out to be susceptible to mismatches, which could occur

when a region of the image was relatively homogeneous, or was

obscured by cloud. (See below for more discussion.) To cure this, we

decided to use a "best-first" strategy.

The "best-first" strategy is based on assigning a measure of

"goodness" to each match we obtain using Gruen's algorithm. Then,

when selecting a match to use for prediction, we use the "best" match

left in list to be grown from. (This requires the list to be a priority

queue — but this is not a notable overhead if a data-structure such as a

heap

is used.)

4.1.1.

Which match is "best"?

Gruen's algorithm can fail in two main ways:

(i) it fails to converge;

(ii) it converges, but to the wrong match.

The first of these failure modes does not trouble the region-

growing algorithm too badly, since

(a) it knows that it has failed, and

(b) it is likely to have several more attempts, from other direc-

tions.

This latter point is important, since if the failure to converge was

due to a bad prediction (because, for example, a ridge or other

discontinuity has been crossed), there is a good chance that the

predictions from at least one of the other directions will be better.

(E.g. the one(s) from the other side of the ridge.)

On the other hand, the second mode of failure could be very seri-

ous,

for two reasons:

(a) the algorithm does not know that a bad match has occurred,

and

(b) it may lead to worse matches in future, if the surrounding

regions of the images lack sufficient distinguishing features

to ensure that Gruen's algorithm either produces the correct

match or nothing at all.

(The first, depth-first, implementation of this algorithm went

wrong for this reason — when it attempted to grow through a

nearly homogeneous region, it wandered sufficiently far from the

correct match that

later

matches converged wrongly.)

There are two ways of curing this:

• devise tests which ensure that a match is "correct";

• grow from the matches which are (believed to be) most accurate

first

The first method is desirable, but it is not obvious how to do this

thoroughly. However, relatively simple tests can detect many

errors;

for

example, using the sensor geometry one can constrain the possible

matches to a line — anything too far from this is

blunder.

The second method is relatively easy, so long as one can produce some

number which is likely to correlate with the accuracy of the match. A

number which seems plausible, and which has succeeded well so far, is

the value of the largest eigenvalue of the 2 x 2 matrix composed of the

estimated (co)variances of the line & sample disparities. (See Gruen

for the derivation of this matrix.) This number should correlate well

with the magnitude of the error in disparity for that match, but the

derivation of the estimated (co)variances depends on assumptions about

124

the image noise characteristics which are only approximately true, at

best, so there may be better measures of "goodness".

4.2.

Choosing the set of allowable distortions

The accuracy of our solution is a trade-off between the following

factors:

(i) how well the distortions we allow (or correct for) model the real

distortions (within an image patch);

(ii) how many parameters we have to estimate (the more parameters

we have to estimate from a given set of data, the less accurate the

estimate of any parameter

becomes,

in general);

(iii) and how big we can make our patches (so that we can "average

over" more data, and thus obtain more accurate answers).

This leads to two design questions:

• what are the significant distortions within a given size of patch?

• what is the smallest set of parameters which will compensate for

those distortions? (In particular, it may be possible to correct for

some of the distortions using just (e.g.) the sensor geometry,

rather than

measuring them from the data in a patch.)

It is convenient to divide the distortions into two classes:

radiometric (which affect the measured grey-level at any corresponding

point),

and geometric (which affect the positions of corresponding

points). The radiometric distortions will arise from causes such as vari-

ations in sensor gain, atmospheric haze & so forth. The geometric dis-

tortions will arise because of the projection of the viewed surface onto

two different viewpoints.

4.2.1.

Radiometric distortions

Variations in atmospheric haze alone can cause significant varia-

tion in grey-levels. (A factor of two or more in contrast for correspond-

ing patches.) Thus, we need to compensate for changes in contrast

between images. In addition, there is often a significant additive offset,

so we currently allow for an additive and a multiplicative distortion

between left and

right

images patches. This has worked well so far.

4.2.2.

Geometric distortions

The geometric distortions model the way in which the disparities

vary in a local region of the images. The disparities will, in general,

have both an x and y componentf. The x disparity can vary, almost

arbitrarily, due to changes in scene height. However, because the

earth's surface is relatively smooth (at the scales we are considering),

the variation can be modelled quite well (locally) by a linear variation.

The y disparity can, in theory, be determined once the sensor orienta-

tion is known accurately and the x disparity is

known.

(The y disparity

cannot be determined accurately without knowing the scene height (or

equivalently, the x disparity) — see Otto.

) However, the SPOT

satellite's attitude varies slightly & not very predictably during its orbit,

so that it is very difficult to calculate the y disparity to better than a

pixel or so, unless hundred's of control points are used. Thus, it is

easier, and more accurate, to measure the y disparity than to calculate it.

For the sake of computational convenience (and because that is the way

Gruen

did it), we also measure, rather than pre-calculate, the varia-

tion in v disparity across a patch. However, we would expect some

improvement in quality (and speedl) if we utilised the geometric infor-

mation from the SPOT headers, so that these distortions could be

expressed as functions of

the

other distortions. We intend to experiment

with this fairly soon.

This model of the geometric distortion gives us 6 parameters (

x-disparity, y-disparity, ^-disparity

dx-disparity

dy-disparity

'ty

) — '•

e- we are mode

g the disparities (locally) using

the first-order terms of a Taylor series.

t We will unime that x increafea from left to right i

downward.

i an image, and y mcieaiBi

This model will break down at discontinuities (such as a

ridge),

where the terrain is very rough (e.g. craggy peaks). This has not yet

proved to be a significant problem, but more experimentation is required

before we can be confident of this. If this algorithm were to be applied

to aerial photographs (which have a much higher resolution), then we

would need to do some modification to cope with (e.g.) urban areas,

with their many sharp changes in slope. Li such areas, it may be

appropriate to augment our matcher with some edge-based matching, or

to allow "folded

plate"

distortions in the correlations.

5. GENERATING APPROXIMATE MATCHES FOR THE

REGION-GROWING TO START FROM

This region-growing algorithm requires a few approximate

correspondences to start growing the regions from. These correspon-

dences need to be accurate to about 1-2 pixels, and, ideally, there should

be at least one such match in each "isolated region" of the images. (By

"isolated region" we mean a region which is surrounded by nearly

homogeneous or obscured regions, so that the region-growing algorithm

has no good path to follow from other well-textured areas.) We have

used only small numbers of approximate correspondences to seed our

algorithm (typically 3 or 4); this has worked well so far, but we will

probably use more when we automate this stage properly.

Such correspondences can be generated by hand (e.g. any ground

control features used for sensor orientation), or automatically, by an

algorithm such as Barnard and Thompson's algorithm.

Kevin Collins

has modified this algorithm to work with SPOT data — see Collins et

al.,

Day and Muller,

and Muller et al.

Another method, which shows promise, but which hasn't been

fully tested yet, is based on the idea of picking isolated "features" using

some straightforward feature detector (e.g. critical points of

the

intensity

surface, or Moravec

or Forstner

), finding the largest set of matches

which are geometrically consistent with the a priori data, and with each

other, and then keeping only those points which are uniquely matched

and which correlate well (using e.g. Gruen) with each other.

6. HOW GOOD IS THE REGION-GROWING ALGORITHM?

6.1.

How good are the results?

We only have preliminary results about the quality and robustness

of this algorithm. One example will give the flavour of the results so

far:

Using the region-growing algorithm to produce a 30m DEM over a pair

of 240 x 240 subimages extracted from SPOT images of Aix-en-

Provence, we obtained plausible matches for over

99%

of the matchable

points. (Some points were not visible in both images, or were obscured

by cloud.) Comparing the matched points against a DEM derived from

underflight photography, we found:

Mean difference = -3.716

Standard deviation =

7.955

RMS difference =

8.780

(No blunder detection was used.) The DEM derived from the

underflight photography is estimated to have an RMS height error of

about l-2m; in addition, much of the remaining error is due to the sen-

sor model used. The remaining error is that due to the matching algo-

rithm; this error appears to be significantly less than 0.5 pixels (RMS),

possibly as low as 0.1 pixels (RMS). One notable cause of error

appears to be the smoothing effect of the largish correlation patches

used (21 x 21 pixels), since in the area being viewed there were several

narrow ravines. We expect these results to improve when we have

incorporated various refinements into the algorithm and the sensor

model.

Colleagues in the Department of Photogrammetry and Surveying

are going to do a thorough analysis of the quality of results obtained by

this algorithm. See Day and Muller

for a description of some of their

early results.

125

62.

How fast is this algorithm?

The speed of the algorithm depends partly on the implementation,

and partly on the parameters used (e.g. patch size and grid spacing).

Since we are still in the process of evaluating and tuning it for real data,

we cannot yet give a definitive answer. Thus, this section will just out-

line

our

knowledge so far.

6.2.1.

Speed of Gruen's algorithm

On a single SUN 3/180, using a 68881 floating point coprocessor,

our current implementation takes just under

sec to do one iteration of

Gruen's algorithm, using a patch size of 21 x 21, or about

sec for a

patch size of 15 x 15. (The time is approximately proportional to the

area of the patches used.) Receding to use integer arithmetic wherever

possible would probably speed this up by a factor of about 4. However,

a 20 MHz T800 Transputer^ should be able to reduce these times to

about 0.06 and 0.03 seconds respectively, if good code is produced for

the critical sections. (See appendix.)

For any given pair of patches, these times need to be multiplied

by the number of iterations required to get accurate convergence — our

initial experiments indicate that typically only one or two iterations are

needed when Gruen's algorithm is used as part of the region-growing

algorithm.

Overall, then, matching one pair of patches on a Transputer will

typically take between

0.05s

and 0.2s, depending on the parameters

used, and the image characteristics.

6.22. Parallelizing the region-growing algorithm

Since Gruen's algorithm requires significant computation (at least

50 ms worth) on comparatively little data (two image patches, together

with the initial values of the parameters add up to a few kbytes), an easy

and efficient way of parallelising the region-growing algorithm is to

have one central "master", which manages the priority queue & the

image data, and many "workers", which just apply Gruen's algorithm to

whatever patches they are given, and return their results to the "master".

Since we wish to match many more points than we have Transputers,

and Transputer links are fast enough to transfer the data faster than the

CPU's can process it, this automatically leads to good load-balancing

and a high processor utilization, with little "parallelization" overhead,

i.e. We expect the speedup to be linear in the number of processors.

Furthermore, the "workers" do not need much memory — 100-

200kbytes is ample.

If we have enough "workers", then the one "master" will become

a bottleneck. Initial analysis has shown that this will not happen until

there are at least 30 "workers", and probably more. Even then, it would

be relatively straightforward to partition the master over two or more

processors.

We have already parallelised our algorithm on a network of SUN

workstations, and speedup is, as predicted, linear up to 15 processors

(which was all we could get our hands on!). A Transputer implementa-

tion has just begun (in collaboration with the Royal Signals & Radar

Establishment, Malvem).

63.

Summary — speed

Using 30 T800 Transputers, and the algorithm above, it should be

possible to produce a high-quality, dense disparity map from a pair of

SPOT images in about 2 hours.J Clearly, this time will vary, depending

upon the image chacteristics and the algorithm parameters; but this time

should be fairly realistic and representative.

7. SUMMARY AND CONCLUSIONS

We have described an algorithm which is capable of producing

high-quality, dense range-maps, which runs at a reasonable rate on con-

ventional processors, and which can achieve linear speedup on multi-

processor architectures. It is applicable when the scene being viewed

has significant texture, but few discontinuities, and a full range-map is

required.

The speed is independent of the range of disparities present in the

images, and the algorithm does not require any knowledge of the sensor

geometry. (Though such knowledge can be used to get more accurate

matches, and to assist in the initial "seeding" of the algorithm with

approximate matches.)

8. ACKNOWLEDGEMENTS

This work was done as part of a collaborative research project,

funded by the Alvey Directorate. We would also like to thank our col-

leagues for help and useful discussions; most notably, Kevin Collins (of

RSRE) helped us with Transputer-related issues, while Tim Day and J-

P. Muller (both of UCL-P&S) provided us with input & test data, and

helped with the quality assessment.

APPENDIX — Speed of Gruen's algorithm on one

processor

This section analyses the speed potential of Gruen's

stereo

match-

ing algorithm,

running on a single processor machine. It assumes

that it is being used for matching one patch in one image to one patch in

the other. This is because Gruen is likely to be parallelized (for a

multi-processor system) by doing such matches concurrently on

separate processors, with little or no interaction between them. See

main text for a fuller discussion.

Al. Assumptions, parameters and notation used in this analysis

The (resampled) image patches will be

regarded

rectangular,

size N xM. The number of parameters to be determined will be denoted

by p. The notation will generally follow Gruen;

exceptions will be

noted where they occur.

Edge (of matrix/image) effects will not be discussed since they

would complicate the analysis, but don't significantly affect the speed.

A2.

"Basic" Gruen

This section discusses a "no bells or whistles" version of Gruen's

algorithm (the core of his 1985 paper) — the next section will discuss

the possible refinements, and their effects upon the speed.

As mentioned above, the algorithm is iterative, so we will begin

by discussing the steps

required

within one iteration.

A2.1 Resampling an image patch

The algorithm works by matching image patches — typically rec-

tangles of between 15x15 and 30x30 pixels. The left-hand patch is just

a subwindow of the left-hand image, and stays constant throughout.t

The

right-hand

patch is a distorted and resampled subwindow from the

right-hand image. The distortion is an affine transformation (which

includes the disparity information), together (typically) with a

radiometric adjustment of some kind. (E.g. a constant added to each

point of

the

patent.) Gruen uses bilinear interpolation.

Essence of

the

resampling:

t One 20MHz T800 Transputer can manage between

and 15 MFlops on floating-point

intensive code. This, together with the fast serial links, makes the Transputer a very

suitable building block for this application.

t This figure is based on: each point requiring about 0.2 CrV-seconds to match (on

average); matching all the points on a square grid consisting of every 6th pixel

horizontally

vertically (ie. 1 in 36 pixels of the

original);

6000 x 6000 images.

The collaborating partners are the Department of Photogrammeiry and Surveying at

University College London, the Royal Signals and Radar Establishment, Laser-Scan

Laboratories

and

Thom-EMI Central Research Laboratory.

t The 1986 paper also allows the left-hand patch to move, so that the final disparity

estimate is obtained for a fixed xj» position with respect to the ground. This isn't

relevant to

us,

so we will ignore it here.

i Such an offset doesn't need to be done during the reaampling. All it requires is Ihe

corresponding distortion parameter, and appropriate design matrix entries.

126

FOR x = 0

N -1 DO /* coords in resampled patch */

FORy = 0TOM-lDO

x_from = a*x + b*y + c;

a, b, c constants */

y_from = d*x + e*y + f; I* d, e, f constants */

patch[x,y] = interpolate(rhimage,x_from,y_from);

At first glance, this seems to require 4 multiplications and several addi-

tions within the inner loop to calculate x_from & y_from. However, by

moving constant expressions out of the loop & so forth, the new values

of x_from, y_from can be calculated from the old with just one addition

each. So, let us concentrate on the interpolation

part

for

while.

The basis of

the

(bilinear) interpolation is:

int_x = \xjrom\ ;

int_y =

\.yjrom\ ;

fract_x = x_from - int_x;

fract_y = y_from - int_y;

tempi = (l-fract_x)*rhimage[int_x,int_y] +

fract_x*rhimage[int_x+1

,int_y];

temp2 = (l-fract_x)*rhimage[int_x,int_y+l] +

fract_x*rhimage[int_x+1 ,int_y+1

];

result = (l-fract_y)*templ + fract_y*temp2;

This can be rearranged into:

[ calculate int_x, int_y, fract_x, fract_y ]

tempi = rhimage[int_x,int_y] + fract_x*(rhimage[int_x+l,int_y] -

rhimage[int_x,int_y]);

temp2 = rhimage[int_x,int_y+l] + fract_x*(rhimage[int_x+l,int_y+l] -

rhimage[int_x,int_y+l]);

result = tempi + fract_y*(temp2-templ);

In addition to calculating

int_[xy],

fract_[xy], and some array accesses,

this requires 3 multiplications each time, and 6 additions/subtractions.t

int_[xy],

fractjxy] can be calculated directly instead of calculating

x_from, y_from — thus 4 additions are needed within the loop to calcu-

late these. (In practice, int_x will stay constant for many iterations of

the

inner

loop,

so this could probably be reduced to 3 additions, on aver-

age.)

The array accesses can be speeded up by using pointers and regis-

ters (in a language like C

) or, in cccam,

"IS"

and local variables, so

we will just regard them as part of the "housekeeping". The multiplica-

tions and additions/subtractions can all be done using integer arithmetic,

if the numbers are scaled suitably. Thus, we can assume that integer

arithmetic has been used, if it is faster on the system being used.

Overall then, the resampling will require about

MJN.Q

multiplications +10 additions + some

housekeeping)

A2.2.

Forming the "design matrix" and "observation equations"

The "design matrix" (called A in Gruen's papers) is a matrix with

MN rows (one for each x,y position within the resampled image

patches), and p columns. In the 1985 version, each row of the matrix

has the form:

(gx *g* y-8x gy x.g

y.g

where g

denotes 4^-, and g

denotes -SS-, evaluated at the appropriate

(xy) coordinates within the patch. To form each row of this, we need

to calculate g

, g

(essentially two subtractions to form the first differ-

ence approximations), and then multiply these by x, y appropriately (4

multiplications). The time to insert the final 1 will be regarded as part

of the house-keeping.

With care & implicit decimal points, these calculations can also

be done using integer arithmetic. Overall then, forming the design

matrix will require about

MN.(4

multiplications

+ 2 additions + some

housekeeping)

f Fuitber simplification is possible if the remapping is a pure translation, since then

ftmc;_[xy] arc c

Also required for the observation matrix is the vector /, which is

an NM column vector, each of whose elements is f(xy}-g(x,y) for

the appropriate point of the image patch. This adds another

MN.(additions + some housekeeping)

A23.

Solving the over-constrained linear equations

To do this, Gruen first calculates A

PA and A

Pl. P is a

xNM

matrix specifying the weights to be used when summing the

squares of the differences of the distorted patches. For the simple case,

it is equal to the identity matrix, and can be ignored.

In general, to multiply an

xm matrix by a

xp matrix requires

nmp multiplications + ~nmp additions. (There are cunning methods

which are slightly faster on large matrices, but they are unlikely to be

faster on this size of matrix. See Aho

for an introduction to these

methods.)

Thus,

to calculate

PA will require roughly

NM.p

.(addition +

multiplication

housekeeping)

operations, since the

transpose can be done as part of the matrix multiplication. In fact, we

can halve this, since A

PA will be symmetric, so we need only calcu-

late the upper (or lower) triangle. Again, with careful juggling, integer

arithmetic can be used if it is faster.

To calculate A

PI will require about NMp additions and multi-

plications — integer arithmetic can again be used.

If we denote the column vector consisting of the (updates to) the

parameters as a (Gruen uses x in his paper), then we now have that

PA)a = (A

Pl)

Since A

PA is a fairly small, real symmetric positive definite matrix,

this can most easily & efficiently be solved using Cholesky decomposi-

tion.

(See Stoer,

chapters 4 and 8, for an explanation of this method

and a discussion of the alternatives.) This requires about

multipli-

cations and additions, + p square roots, to decompose the matrix into a

product of triangular

matrices.

Then, - 2.p

operations (multiplications

and additions) are required to solve for a given this decomposition.

These matrix operations must be done in floating point (prefer-

ably double precision) to avoid overflow and/or significant rounding

errors.

A2.4.

Termination test

This is very simple — the iterations finish when the changes in

the remapping parameters fall below a threshold. This requires 6 abso-

lute value operations, and 6 comparisons — negligible in comparison to

the other operations.

A2.5.

Summary — operations, time, per iteration

Adding up the

figures

above, we get

add +

(•*g-+2p

)

floating

point operations + p square roots +

housekeeping

If we take N, M = 20, and p = 7 (fairly typical values), this gives us

about

15000

mult.

+ 18000 add + 155

floating

pt. ops + 7 square roots +

housekeeping

per iteration. Clearly, unless floating point operations are very much

slower than integer operations, we can neglect the specifically floating

point operations in comparison with the other arithmetic. (N.B. Float-

ing point may be used for the other operations, but only if it is faster.)

If we assume that these operations will take of the order of a \is each

(plausible if using floating point on a T800 Transputer), this gives us a

total time of about 34ms — so if we allow a bit extra for housekeeping

& so forth, we get a round

figure

of about 50ms.

Alternatively, if we assume that we use integer arithmetic on a

Transputer (or a SUN-3), then a multiplication will take about 2\xs, and

an addition/subtraction < 'A\ls, giving a total time of about 40ms per

iteration for the calculations. Adding in something for housekeeping

127

HTML Viewer

Citations

PDF

Open Access

More filters

Journal ArticleDOI

Development and Status of Image Matching in Photogrammetry

Armin Gruen

- 01 Mar 2012 -

Photogrammetric Record

TL;DR: In this paper, the authors describe the development of image matching techniques in photogrammetry over the past 50 years, address the results of some empirical accuracy studies and also provide a critical account of some of the problems that remain.

...read moreread less

DissertationDOI

Automatic Digital Surface Model (DSM) generation from linear array images

Li Zhang

TL;DR: In this paper, an image matching approach for automatic digital surface model (DSM) generation from linear array images, which has the ability to provide dense, precise and reliable results is presented.

...read moreread less

Scene Reconstruction and Visualization From Community Photo Collections Recent progress is described in digitizing and visualizing the world from data captured by people taking photos and uploading them to the web.

Noah Snavely, +5 more

TL;DR: In this article, the authors review recent progress on these challenging computer vision problems, and describe how to use the recovered structure to turn community photo collections into immersive, interactive 3D experiences.

...read moreread less

Algorithm Theoretical Basis

David J. Diner, +8 more

Automatic dsm generation from linear array imagery data

Zhang Li, +1 more

TL;DR: A matching procedure for automatic DSM generation from linear array imagery data with effective combination of several image matching algorithms and automatic quality control that can provide dense, precise and reliable results.

...read moreread less

Collapse

`Region-growing' algorithm for matching of terrain images

G. P. Otto, +1 more

- 01 May 1989 -

Image and Vision Computing

Using synthetic images to register real images with surface models

Berthold K. P. Horn, +1 more

- 01 Nov 1978 -

Communications of The ACM

PMF: A Stereo Correspondence Algorithm Using a Disparity Gradient Limit:

Stephen Pollard, +2 more

- 01 Aug 1985 -

Perception

Disparity Analysis of Images

Stephen T. Barnard, +1 more

- 01 Apr 1980 -

IEEE Transactions on Pattern Analysis an...

3D Terrain Matching Algorithm Based on 3D Zernike Moments

Bin Ye, +1 more

Frequently Asked Questions (16)

Q1. What have the authors contributed in "A "Region-Growing" algorithm for matching of terrain images" ?

This paper describes and discusses a new algorithm for stereo matching, which has been designed to work well with data from the SPOT satellite. Initial results on quality and speed are presented, together with a theoretical analysis of the potential speed on both conventional and multi-processor architectures.

Q2. How long does it take to match a patch?

then, matching one pair of patches on a Transputer will typically take between 0.05s and 0.2s, depending on the parameters used, and the image characteristics.

Q3. What is the error in the DEM derived from the underflight photography?

The DEM derived from the underflight photography is estimated to have an RMS height error of about l-2m; in addition, much of the remaining error is due to the sensor model used.

Q4. How many constant expressions can be added to the old?

by moving constant expressions out of the loop & so forth, the new values of x_from, y_from can be calculated from the old with just one addition each.

Q5. How many processors can parallelise the algorithm?

The authors have already parallelised their algorithm on a network of SUN workstations, and speedup is, as predicted, linear up to 15 processors (which was all the authors could get their hands on!).

Q6. What is the main item not specified by this pseudo-code?

Selecting which match should be used to grow from ...The remaining major item not specified by this pseudo-code is which match should be selected from list_to_be_grown_from on each iteration.

Q7. How many control points are used to calculate the y disparity?

the SPOT satellite's attitude varies slightly & not very predictably during its orbit, so that it is very difficult to calculate the y disparity to better than a pixel or so, unless hundred's of control points are used.

Q8. How many interations does it take to get a good theoretical estimate?

The number of interations required depends heavily upon the input data (image & initial parameter values), so it is very difficult to get a good theoretical estimate.

Q9. Who helped us with the Transputer-related issues?

The authors would also like to thank their colleagues for help and useful discussions; most notably, Kevin Collins (of RSRE) helped us with Transputer-related issues, while Tim Day and JP.

Q10. What is the reason for linearizing in the way Gruen does?

"One of the reasons for linearizing in the way Gruen does is that the equations are then in the form of a "Gauss-Markov estimation model".

Q11. What is the remaining error due to the algorithm?

The remaining error is that due to the matching algorithm; this error appears to be significantly less than 0.5 pixels (RMS), possibly as low as 0.1 pixels (RMS).

Q12. How many pixels are in the original?

This figure is based on: each point requiring about 0.2 CrV-seconds to match (on average); matching all the points on a square grid consisting of every 6th pixel horizontally or vertically (ie. 1 in 36 pixels of the original); 6000 x 6000 images.

Q13. What is the way to divide the distortions into two classes?

It is convenient to divide the distortions into two classes: radiometric (which affect the measured grey-level at any corresponding point), and geometric (which affect the positions of corresponding points).

Q14. how long does it take to produce a high-quality, dense disparity map?

Summary — speedUsing 30 T800 Transputers, and the algorithm above, it should be possible to produce a high-quality, dense disparity map from a pair of SPOT images in about 2 hours.

Q15. What can be used to help in the initial "Seeding" of the algorithm?

(Though such knowledge can be used to get more accurate matches, and to assist in the initial "seeding" of the algorithm with approximate matches.

Q16. How many approximate correspondences do the authors need to seed their algorithm?

The authors have used only small numbers of approximate correspondences to seed their algorithm (typically 3 or 4); this has worked well so far, but the authors will probably use more when the authors automate this stage properly.

A "Region-Growing" Algorithm for Matching of Terrain Images.

Citations

Development and Status of Image Matching in Photogrammetry

Automatic Digital Surface Model (DSM) generation from linear array images

Scene Reconstruction and Visualization From Community Photo Collections Recent progress is described in digitizing and visualizing the world from data captured by people taking photos and uploading them to the web.

Algorithm Theoretical Basis

Automatic dsm generation from linear array imagery data

Related Papers (5)

`Region-growing' algorithm for matching of terrain images

Using synthetic images to register real images with surface models

PMF: A Stereo Correspondence Algorithm Using a Disparity Gradient Limit:

Disparity Analysis of Images

3D Terrain Matching Algorithm Based on 3D Zernike Moments

Frequently Asked Questions (16)

Q1. What have the authors contributed in "A "Region-Growing" algorithm for matching of terrain images" ?

Q2. How long does it take to match a patch?

Q3. What is the error in the DEM derived from the underflight photography?

Q4. How many constant expressions can be added to the old?

Q5. How many processors can parallelise the algorithm?

Q6. What is the main item not specified by this pseudo-code?

Q7. How many control points are used to calculate the y disparity?

Q8. How many interations does it take to get a good theoretical estimate?

Q9. Who helped us with the Transputer-related issues?

Q10. What is the reason for linearizing in the way Gruen does?

Q11. What is the remaining error due to the algorithm?

Q12. How many pixels are in the original?

Q13. What is the way to divide the distortions into two classes?

Q14. how long does it take to produce a high-quality, dense disparity map?

Q15. What can be used to help in the initial "Seeding" of the algorithm?

Q16. How many approximate correspondences do the authors need to seed their algorithm?