(Open Access) New dictionary and fast atom searching method for matching pursuit representation of displaced frame difference (2002) | F. Moschetti

NEW DICTIONARY AND FAST ATOM SEARCHING METHOD FOR MATCHING PURSUIT REPRESENTATION OF DISPLACED FRAME DIFFERENCE

Fulvio Moschetti, Lorenzo Granai, Pierre Vandergheynst, Pascal Frossard

ITS Institute of Signal Processing EPFL, CH 1015 Lausanne Switzerland

IBM TJ Watson Research Center PO Box 219, Hawthorne, NY 10598, USA

ABSTRACT

Matching Pursuit decomposes a signal into a linear

expansion of functions selected from a redundant

dictionary, isolating the signal structures that are coherent

with respect to a given dictionary. In this paper we focus

on the Matching Pursuit representation of the displaced

frame difference (dfd). In particular, we introduce a new

dictionary for Matching Pursuit that efficiently exploits the

signal structures of the dfd. We also propose a fast

strategy to find the atoms exploiting the max of the

absolute value of the error in the motion predicted image

and the convergence of the MSE with the rotation of the

atoms.

Results show that the fast strategy is quite robust when

compared to exhaustive search techniques and it improves

the results of a suboptimal search strategy based on a

genetic algorithm.

1. INTRODUCTION

High compression ratios in video coding are achieved by adopting hybrid systems that combine two stages. In the first stage, motion estimation and compensation predict each frame from the neighboring frames. In the second stage, the prediction error is coded. Current video compression standards use block-based DCT to code the residual error. In [1, 2] the authors have shown that improved coding efficiency can be achieved by replacing the DCT with an overcomplete transform. Non-orthogonal transforms are indeed a valid alternative to orthogonal transforms such as DCT or wavelet-based schemes, especially at low bit rates, where most of the signal energy can be captured by only a few elements. Matching Pursuit (MP) algorithms iteratively decompose a signal into its most important features using a set of atoms chosen from a redundant dictionary of basis functions. Particular attention has to be dedicated to the dictionary design, since it impacts the coding performance.

In this paper we present a new dictionary that efficiently captures the contours and edges of dfd images. It improves the performance of a previously introduced dictionary [3], based on oriented and anisotropically refined atoms, whose quality already surpassed that of the commonly used two-dimensional separable Gabor functions.

The main limitation in the adoption of a redundant

dictionary remains the encoding complexity. For this

reason we propose a fast and efficient method for atom

selection. This method has been compared to another sub-

optimal approach based on a genetic algorithm and to a

full search based approach.

2. MATCHING PURSUIT

A detailed explanation of the theory of Matching Pursuit can be found in Mallat and Zhang [4]. Here we just recall the basics of the iterative process used for the selection of the waveforms that represent the signal structures.

Let $D = \{g_\gamma\}_{\gamma \in \Gamma}$ be a dictionary of unit-norm vectors called atoms, and let $\Gamma$ represent the set of possible indexes. The function f is first decomposed as follows:

$f = \langle f, g_{\gamma_0} \rangle\, g_{\gamma_0} + Rf$, (1)

where Rf is the residual component after having approximated f in the direction of $g_{\gamma_0}$. Since Rf and $g_{\gamma_0}$ are orthogonal, it follows that

$\|f\|^2 = |\langle f, g_{\gamma_0} \rangle|^2 + \|Rf\|^2$. (2)

To minimize $\|Rf\|$, we must choose $g_{\gamma_0}$ such that the projection $|\langle f, g_{\gamma_0} \rangle|$ is maximal. Applying such a procedure iteratively, after N iterations we obtain:

$f = \sum_{n=0}^{N-1} \langle R^n f, g_{\gamma_n} \rangle\, g_{\gamma_n} + R^N f$, (3)

where $R^n f$ is the residual at the n-th step and $R^0 f = f$. As in (2) we can write

$\|f\|^2 = \sum_{n=0}^{N-1} |\langle R^n f, g_{\gamma_n} \rangle|^2 + \|R^N f\|^2$. (4)

This equation expresses the energy conservation.
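The iteration of Eqs. (1)-(4) can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the experiments: the function name `matching_pursuit`, the random toy dictionary and the signal length are assumptions chosen only to make the energy conservation of Eq. (4) easy to check numerically.

```python
import numpy as np

def matching_pursuit(f, dictionary, n_iter):
    """Greedy MP over a dictionary whose rows are unit-norm atoms.

    f          : 1-D signal (a flattened image block works as well)
    dictionary : array of shape (n_atoms, len(f)), rows of unit L2 norm
    Returns the selected atom indices, their coefficients and R^N f.
    """
    residual = np.asarray(f, dtype=float).copy()      # R^0 f = f
    indices, coeffs = [], []
    for _ in range(n_iter):
        products = dictionary @ residual              # <R^n f, g_gamma> for all atoms
        best = int(np.argmax(np.abs(products)))       # maximal projection
        c = products[best]
        residual = residual - c * dictionary[best]    # R^{n+1} f
        indices.append(best)
        coeffs.append(c)
    return indices, np.array(coeffs), residual

# Toy data: a random redundant dictionary (hypothetical, not the paper's).
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 16))
D /= np.linalg.norm(D, axis=1, keepdims=True)         # unit-norm atoms
f = rng.standard_normal(16)

idx, c, r = matching_pursuit(f, D, n_iter=5)
energy_in = np.sum(f ** 2)                            # ||f||^2
energy_out = np.sum(c ** 2) + np.sum(r ** 2)          # right-hand side of Eq. (4)
reconstruction = c @ D[idx] + r                       # right-hand side of Eq. (3)
```

With unit-norm atoms, `energy_in` and `energy_out` agree up to floating-point error, and `reconstruction` equals `f`, which is a convenient sanity check for any atom search strategy plugged into the loop.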

The convergence of MP depends on both the dictionary

and the (sub)optimal search strategy. In [4] it has been

shown that there are two real numbers $\alpha, \beta \in\ ]0, 1]$ such that for all $m \geq 0$ the following relation is valid:

$\|R^{m+1} f\| \leq (1 - \alpha^2 \beta^2)^{1/2} \cdot \|R^m f\|$. (5)

$\alpha$ is an optimality factor related to the strategy adopted to determine the best atom in the dictionary, while $\beta$ strictly depends on the dictionary, representing its ability to capture the features of the input function f [5].

The dictionary is built by acting on a generating function g of unit $L^2$ norm by means of a family of unitary operators $U_\gamma$:

$D = \{U_\gamma,\ \gamma \in \Gamma\}$, (6)

for a given set of indexes $\Gamma$. Basically this set contains three types of operations: translation $\vec{d}$, rotation $\theta$ and anisotropic scaling $(c_1, c_2)$. A possible action of $U_\gamma$ on the generating atom g is thus given by:

$U_\gamma g = U(\vec{d}, \theta)\, A(c_1, c_2)\, g$, (7)

where U is an operator that acts on translation and rotation, while A is an anisotropic dilation operator.

The system we implemented for the Matching Pursuit representation takes as input the difference between the original frame and the motion-compensated one. Motion estimation and compensation are realized as full-search block matching. Matching Pursuit decomposition is then applied on blocks of S×S pixels, with S = 32. The decomposition is halted when the MSE reaches a threshold value of 65, or when the number of atoms exceeds 7 per block.
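The block partitioning and the halting rule can be sketched as follows. This is an illustration under stated assumptions: the helper names `split_into_blocks` and `should_stop` are hypothetical, the MSE is taken here as the mean of the squared residual samples, and the atom budget is read as a cap of 7 atoms, in line with the later remark that MP stops at the 7th iteration.

```python
import numpy as np

S = 32               # block size given in the text
MSE_THRESHOLD = 65   # halt once the block MSE reaches this value
MAX_ATOMS = 7        # atom budget per block (read as a cap of 7, an assumption)

def split_into_blocks(dfd, s=S):
    """Yield (row, col, block) for every full s x s block of the dfd frame."""
    h, w = dfd.shape
    for r in range(0, h - s + 1, s):
        for c in range(0, w - s + 1, s):
            yield r, c, dfd[r:r + s, c:c + s]

def should_stop(residual_block, n_atoms):
    """True when either halting condition is met for this block."""
    mse = float(np.mean(residual_block ** 2))   # assumed definition of the MSE
    return mse <= MSE_THRESHOLD or n_atoms >= MAX_ATOMS

# Toy frame with hypothetical dimensions, only to exercise the helpers.
frame = np.full((64, 96), 100.0)
blocks = list(split_into_blocks(frame))
```

A per-block coder would call `should_stop` before each atom search and move to the next block as soon as it returns `True`.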

3. NEW DICTIONARY

In our case we are targeting a particular set of images, which are the by-product of the motion compensation. They have a particular signal structure, characterized by edges. To capture these features the proposed dictionary is composed of anisotropically refined atoms. Anisotropy increases the redundancy in the dictionary because of the introduction of an extra parameter to code, but as has been shown in [3] this produces an overall increase in efficiency. In our experiments [6] we chose as generating function the combination of a Gaussian and a triangular function, that is

$g_0(x, y) = w \cdot e^{-(x^2 + y^2)}$, (8)

where $w$ is the triangular function, defined piecewise [the piecewise definition is not recoverable from this copy of the text].

We have compared this dictionary with the one introduced in [3], which uses the combination of a Gaussian and its second derivative, as expressed by:

$g_1(x, y) = (4x^2 - 2) \cdot e^{-(x^2 + y^2)}$. (9)
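To make the action of translation, rotation and anisotropic scaling on a generating function concrete, the following sketch samples one atom derived from the Gaussian second-derivative function of Eq. (9) on a 32×32 grid. The function names `g1` and `make_atom`, the sampling grid, the parameter values and the discrete renormalization are assumptions made for illustration; the paper does not specify its discretization.

```python
import numpy as np

def g1(x, y):
    """Generating function of Eq. (9): (4x^2 - 2) * exp(-(x^2 + y^2))."""
    return (4.0 * x ** 2 - 2.0) * np.exp(-(x ** 2 + y ** 2))

def make_atom(gen, size, center, theta, scales):
    """Sample a translated, rotated, anisotropically scaled atom on a grid."""
    cy, cx = center
    c1, c2 = scales
    rows, cols = np.mgrid[0:size, 0:size].astype(float)
    dx, dy = cols - cx, rows - cy                    # translation to the center
    xr = np.cos(theta) * dx + np.sin(theta) * dy     # rotation by theta
    yr = -np.sin(theta) * dx + np.cos(theta) * dy
    atom = gen(xr / c1, yr / c2)                     # anisotropic scaling
    norm = np.linalg.norm(atom)
    return atom / norm if norm > 0 else atom         # unit L2 norm on the grid

# Hypothetical parameters: center of the block, angle index n = 5, scales (2, 6).
atom = make_atom(g1, size=32, center=(16, 16), theta=5 * np.pi / 128, scales=(2.0, 6.0))
atom_norm = np.linalg.norm(atom)
```

Passing a different generating function as `gen` gives atoms of another dictionary with the same parameters, which is the kind of reuse suggested later when the two dictionaries are compared.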

A possible way to compare two dictionaries consists in

considering the convergence speed of Matching Pursuit,

that corresponds to its ability to extract the maximum

signal energy in a few iterations. Namely, the decay rate of

the residue represents the coding efficiency of the

Matching Pursuit.

From (5) we can see that the error decay rate involves two parameters, $\alpha$ and $\beta$. Using an exhaustive search of the atom parameters, $\alpha$ no longer has any influence (since it is equal to 1), so we can try to estimate $\beta$, which depends solely on the dictionary construction. In this case we have:

$\beta \leq \left(1 - \dfrac{\|R^{m+1} f\|^2}{\|R^m f\|^2}\right)^{1/2}$. (10)

Eq. (10) sets an upper limit for $\beta$, which can be estimated by measuring the values $T_m$ for each iteration m.
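As a worked illustration of this bound, setting $\alpha = 1$ in Eq. (5) and solving for $\beta$ gives $\beta \leq (1 - \|R^{m+1} f\|^2 / \|R^m f\|^2)^{1/2}$ at every iteration. The sketch below evaluates that expression from a list of residual norms. The function name, the sample norms and the choice of the minimum as the final estimate are assumptions; the text does not state how the per-iteration values are aggregated.

```python
import numpy as np

def beta_upper_bounds(residual_norms):
    """Per-iteration upper bounds on beta from ||R^0 f||, ||R^1 f||, ...

    Assumes a full search (alpha = 1), so that Eq. (5) yields
    beta <= sqrt(1 - ||R^{m+1} f||^2 / ||R^m f||^2).
    """
    r = np.asarray(residual_norms, dtype=float)
    ratios = (r[1:] / r[:-1]) ** 2
    return np.sqrt(np.clip(1.0 - ratios, 0.0, None))

# Hypothetical residual norms, for illustration only.
norms = [10.0, 8.0, 6.8, 6.0]
bounds = beta_upper_bounds(norms)
beta_estimate = bounds.min()   # the bound has to hold at every iteration
```

For the first step of this toy sequence the bound is $\sqrt{1 - 0.64} = 0.6$; later steps, where the residual decays more slowly, give tighter bounds.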

We can also notice that the efficiency of the dictionary might change at each iteration, as a result of the changing nature of the residue in the iterative Matching Pursuit operation. For three different sequences (Stefan, Akiyo and Coastguard) of five images, we have measured the values of $T_m$ at each iteration for all the blocks into which the image has been divided. We have estimated in this way the value of $\beta$. Results show an improvement of about 4% for the dictionary obtained using $g_0$ rather than $g_1$, with values of $\beta$ of 0.50 and 0.48 respectively.

These results represent an upper bound in the decay of the

residual images. In section 5 we will see that in the

practical case the spread between the two dictionaries is

more pronounced.

4. NEW FAST CONVERGING ALGORITHM

The direct application of the Matching Pursuit algorithm

would require us to test each 2-D basis function at all

possible integer-pixels locations in the image and compute

all of the resulting inner products.

In this section we introduce a fast atom selection algorithm that reduces the number of positions searched and the number of angles evaluated.

4.1. Coordinates selection

In order to reduce this computationally demanding task,

we make some assumptions about the residual image to be

coded. We assume that the energy in the image is

concentrated in the areas where the motion predicted

model was inadequate. In particular, the points where the

atoms have to be set are chosen by selecting the maximum

value of the absolute difference in the motion residual

frame. In the exhaustive search approach each dictionary

structure is centered at each location in the block area and

the inner product between the structure and the SxS region

of image data is computed.

Directly choosing the position where the atom has to be set reduces by a factor of $S^2$ the number of times the inner product has to be computed.
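The coordinate selection rule amounts to locating the largest absolute value in the residual block. A minimal sketch, assuming the block is stored as a 2-D NumPy array (the function name `select_position` and the toy block are hypothetical):

```python
import numpy as np

def select_position(residual_block):
    """Return (row, col) of the largest absolute value in the residual block."""
    flat_index = int(np.argmax(np.abs(residual_block)))
    return np.unravel_index(flat_index, residual_block.shape)

# Toy block: the strongest prediction error is negative on purpose.
block = np.zeros((32, 32))
block[5, 20] = -90.0
block[12, 3] = 40.0
row, col = select_position(block)
```

Only the atoms centered at `(row, col)` then need to be tested, instead of one candidate per pixel of the S×S block.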

4.2. Angle selection

Atoms are identified by position, scale factors and rotation. We propose an algorithm to reduce the operations needed to compute the exact rotation for the atom. The full search procedure tests every angle from the smallest to the largest, selecting the one whose projection has the highest value. Recall that $\theta \in [0, \pi[$ and that we express it using an integer n such that $\theta = n \cdot \pi / 128$.

The assumption we made is that, for most of the atoms, the value of the scalar product between the residual image and the atom increases as we get closer to the right angle. We propose a method based on a dichotomic search.

First the algorithm tests four angles set at $i\delta$, with $i = 0, \dots, 3$ and $\delta = \pi/4$; these are called respectively A, B, C, D in Fig. 1. Once the best matching angle is found, a dichotomic process starts, which keeps halving the angular step until we get to the unit angle $\gamma = \pi/128$. At each step two comparisons are made, and the angle giving the best match becomes the angle around which the new rotation angles are searched.

Indicating with N = 128 the number of possible angles to be searched, the complexity $N_r$ of the proposed search can be expressed as follows:

$N_r = 4 + 2\log_2(\delta / \gamma)$. (11)

The gain G in terms of complexity of the proposed approach when compared to the full search is the following:

$G = \dfrac{N}{N_r} = \dfrac{N}{4 + 2\log_2(\delta / \gamma)}$. (12)
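The angle search and its evaluation count can be sketched as below. This is one possible reading of the procedure, not the authors' code: angles are indexed by n with θ = nπ/128, indices are assumed to wrap around at π, ties keep the current center, and `toy_score` is a hypothetical unimodal stand-in for the scalar product between the residual and the rotated atom.

```python
import math

N = 128                 # admissible angles, theta = n * pi / N
COARSE_STEP = N // 4    # delta = pi/4 expressed in units of gamma = pi/N

def dichotomic_angle_search(score):
    """Return (best_n, evaluations) for a score function over n in [0, N)."""
    evaluations = 0

    def evaluate(n):
        nonlocal evaluations
        evaluations += 1
        return score(n % N)              # assumed wrap-around at pi

    # Coarse stage: the four angles A, B, C, D.
    candidates = [i * COARSE_STEP for i in range(4)]
    values = [evaluate(n) for n in candidates]
    best_value = max(values)
    best_n = candidates[values.index(best_value)]

    # Refinement: halve the step, test both neighbours, keep the best.
    step = COARSE_STEP // 2
    while step >= 1:
        centre = best_n
        for n in (centre - step, centre + step):
            v = evaluate(n)
            if v > best_value:
                best_value, best_n = v, n % N
        step //= 2
    return best_n, evaluations

def toy_score(target):
    """Hypothetical unimodal score peaking at `target` (circular distance)."""
    def score(n):
        d = abs(n - target)
        return -min(d, N - d)
    return score

best_n, evaluations = dichotomic_angle_search(toy_score(45))
predicted = 4 + 2 * int(math.log2(COARSE_STEP))   # Eq. (11) with delta/gamma = 32
gain = N / evaluations                            # Eq. (12)
```

With δ/γ = 32, Eq. (11) gives $N_r = 4 + 2 \cdot 5 = 14$ evaluations instead of 128, so the gain of Eq. (12) is $128/14 \approx 9.1$. The search returns the true maximum only when the score behaves as assumed in the text, that is, when it increases as the angle approaches the best orientation.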

5. EXPERIMENTAL RESULTS

In our experiments we have used the two functions $g_0$ and $g_1$ and we have compared the different search strategies. FS is the Full Search; Max+FS is the algorithm that places the atom at the maximum of the MAD of the dfd frame and uses a full search for the three other parameters. GA is a genetic algorithm as described in [7]. Max+GA places the atom with the same approach as Max+FS and uses a GA for the other parameters. Max+ang. places the atom as in the previous two methods and adopts the angle selection method described in subsection 4.2; the scale parameters are searched using an exhaustive approach.

We have used these different search algorithms with the dictionaries obtained from the generating functions $g_0$ and $g_1$. Results in Table 1 have been obtained coding the sequence Stefan. We can see that the function $g_0$ shows a better behavior than $g_1$. Results are expressed in MSE and number of atoms needed to code the frame.

The spread between the two dictionaries increases when sub-optimal search algorithms are used. Probably this is due to the fact that the peaky generating function $g_0$ is particularly well suited when the atom is set in the position where there is a peak in the residual.

We can notice that the criterion of setting the atom in the

Max of the absolute value introduces a 10% increase in

MSE when compared to an exhaustive approach.

On the contrary, the same criterion (Max+GA algorithm)

introduces a 10% improvement in the MSE when

compared to a complete GA algorithm.

Results for the Max+ang. algorithm show that the idea that the MSE converges as the atom gets closer to the right orientation is quite correct. In fact the difference in MSE when compared to the Max+FS approach is very limited. We can state that the logarithmic search has little influence on the quality performance.

On the contrary, the position chosen to set the atom has a big impact on the final result.

Table 1. Comparison of the two dictionaries derived from g0 and g1 and of the various atom selection algorithms for the sequence Stefan.

Algorithm   MSE g0   MSE g1   Atoms g0   Atoms g1
FS          193      198      382        382
Max+FS      218      230      387        389
Max+GA      223      236      393        395
GA          249      256      401        404
Max+ang.    221      229      391        388

Concerning the complexity, we have a great improvement over the full search, which is expressed in Eq. (12). Because of the difficulty of measuring the convergence of the GA, we can only say that, in absolute terms, implementations of Max+ang. and GA take approximately the same time to code a frame.

Fig. 1. Dichotomic algorithm for the angle selection. The first four positions checked are A, B, C, D; 1 and 2 represent the first two steps of the log search.

Fig. 2.a and 2.b are two typical examples of the behavior of $\mathrm{MSE} = \|R^n f\|^2$ when coding a block with the different algorithms proposed so far. We have selected a well and a badly motion-compensated block. MP stops at the 7th iteration or when the MSE is 65. FS is the limit to be reached. Max+FS works better than GA.

Another interesting observation that came out of our results is that the parameters of the atoms for the dictionaries obtained from $g_0$ and $g_1$ are quite similar. Position, rotation and scale factors are very close, so a possible improvement could be to use more dictionaries and try different functions once the atom parameters have been determined. This would imply a limited computational impact, since just one more match would be necessary. Moreover, the cost of coding another dictionary would be just one bit per atom, while adaptive dictionaries could probably better match the evolving structure of the residual.

6. CONCLUSIONS

We introduced a new Matching Pursuit dictionary and a

new strategy for the atom selection. The dictionary

proposed shows a good behavior for dfd images.

The search algorithms introduced dramatically reduce the

computational complexity when compared to the

exhaustive search, while the quality impact is limited. This

new methodology shows some improvements when

compared to another sub-optimal strategy based on a

genetic algorithm. Parameters of the two examined

dictionaries are quite similar once the atom is selected,

which suggests that adapting dictionaries could probably

better represent the evolving nature of the residual.

Variable dictionaries might indeed improve the coding

efficiency and represent an interesting topic for further

investigations.

7. REFERENCES

1. Neff R. and Zakhor A., Very Low Bit-Rate Video Coding Based on Matching Pursuit. IEEE Trans. Circuits Syst. Video Technol., February 1997, vol. 7, no. 1, p. 158-171.

2. Osama K. Al-Shaykh, et al., Video Compression Using Matching Pursuit. IEEE Trans. Circuits Syst. Video Technol., February 1999, vol. 9, no. 1, p. 123-143.

3. Vandergheynst P. and Frossard P., Efficient Image Representation by Anisotropic Refinement in Matching Pursuit. In ICASSP 2001. 2001, Salt Lake City.

4. Mallat S. and Zhang Z., Matching Pursuit with Time-Frequency Dictionary. IEEE Trans. on Signal Processing, December 1993, vol. 41, no. 12, p. 3397-3415.

5. Frossard P. and Vandergheynst P., Redundancy in Non-Orthogonal Transforms. In ISIT. 2001, Washington DC.

6. Granai L., Master Thesis: Codifica Video con Matching Pursuit. October 2001, University of Siena, Italy.

7. Figueras y Ventura R. M., Vandergheynst P. and Frossard P., Evolutionary Multiresolution Matching Pursuit and its Relations with Human Visual System. To appear in Proceedings of Eusipco. 2002.

Fig. 2. MSE versus number of iterations in two blocks, (a) and (b), of the sequence Stefan.
