Algorithm of defining 1-D indexing for M-D mixed radix FFT implementation

doi:10.1109/PACRIM.1993.407316


Chwen-Jye Ju, Ph.D.

Sharp Microelectronics Technology, Inc.

5700 NW Pacific Rim Blvd., Camas, WA 98607, USA

ABSTRACT

The multi-dimensional (M-D) fast Fourier transform (FFT) is an essential algorithm in array signal processing. However, the calculation of M-D indexing and the transposition of the data matrix required by the M-D FFT are the algorithm's performance killers. This paper proposes a novel M-D to 1-D FFT signal flow graph (SFG) mapping. Thus, the M-D FFT can be efficiently implemented by the unified 1-D indexing, and the address generator design can be simplified. In addition, the matrix transposition is no longer necessary. Finally, practical chip design considerations in implementing the algorithm are given.

1. INTRODUCTION

In recent decades, the fast Fourier transform algorithm has been a driving force in the progress of digital signal processing. With the advance of VLSI technology, the FFT algorithm has been pushed further toward solving multidimensional array signal processing problems in real time. However, there is no efficient addressing method for 1-D to M-D FFTs. Therefore, this paper addresses this problem and proposes a unified addressing scheme for 1-D to M-D FFTs. All the M-D indexing can be simplified and implemented by 1-D indexing. The proposed approach has been implemented by many companies in their high-end systems such as radar, medical image recovery, etc.

A novel vector-matrix representation of 1-D to M-D radix-2 FFT algorithms has been discussed in [1,2]. It is shown that the M-D FFT has the same matrix form as the 1-D FFT if both have the same number of data. This implies that the SFG structure of the M-D FFT can be mapped to that of the 1-D FFT. Thus, the unified 1-D indexing can be applied to the M-D FFT. This paper extends the radix-2 FFT results to the mixed radix FFT case. For definiteness, this paper only discusses the decimation-in-time (DIT), digit-reverse-input and normal-output FFT algorithms.

Section 2 introduces an easy way of constructing an M-D FFT SFG structure. The required M-D FFT addressing sequences, including digit-reverse, data, and twiddle factor, are defined in Sections 3, 4, and 5. Section 6 investigates the practical design considerations of the algorithm. The unified indexing for 1-D to M-D FFT algorithms has been implemented in the array processor chip set LH9124/LH9320 developed by Sharp Microelectronics Technology [3,4]. It can be seen from the chip set implementation that the proposed M-D FFT approach has tremendous advantages over the traditional M-D FFT approach in both cost and performance.

2. M-D FFT SIGNAL FLOW GRAPH

It is well known that the twiddle factor matrix of the DIT FFT can be recursively partitioned into the multiplication of the butterfly stage (BS) matrices [5,6]. These matrices can also be represented by cascading butterfly stages of the FFT signal flow graph as shown in Fig. 1. Thus, the SFG structure of the 1-D FFT can be represented by

$$\mathrm{SFG} = BS_{11} \mathbin{@} BS_{12} \mathbin{@} \cdots \mathbin{@} BS_{1 s_1} \tag{1}$$

where $s_1$ denotes the number of FFT stages and "@" is a cascading operator. $BS_{1j}$ can be an arbitrary radix-$n_{1j}$ butterfly stage. Thus, Fig. 1 can be represented by 3@2@2@3.

2.1. 2-D FFT Signal Flow Graph

If the 2-D FFT is implemented by the 1-D FFT, we can select either the row-column or the column-row approach. For definiteness in implementation, we will define an array mapping. For a 2-D array with row length $L_1$ and column length $L_2$, the 2-D array mapping for the row-column approach will be $(N_1,N_2) = (L_1,L_2)$ and that for the column-row approach will be $(N_1,N_2) = (L_2,L_1)$. Thus, the SFG structure of the 2-D FFT can be represented by that of the 1-D FFT with the length $N = N_1 \cdot N_2$. If the SFG structure of the $N_1$-point FFT is

$$\mathrm{SFG}_1 = BS_{11} \mathbin{@} BS_{12} \mathbin{@} \cdots \mathbin{@} BS_{1 s_1} \tag{2}$$

and that of the $N_2$-point FFT is

$$\mathrm{SFG}_2 = BS_{21} \mathbin{@} BS_{22} \mathbin{@} \cdots \mathbin{@} BS_{2 s_2} \tag{3}$$

then the SFG structure of the 2-D FFT will be

$$\mathrm{SFG} = \mathrm{SFG}_1 \mathbin{@} \mathrm{SFG}_2. \tag{4}$$

IEEE Pac Rim '93, 0-7803-0971-5/93/$3.00 © 1993 IEEE

Fig. 1 shows the mapped SFG structure of a 6 by 6 2-D FFT. The $N_1$-point FFT is implemented by one radix-3 stage followed by one radix-2 stage. The $N_2$-point FFT is implemented by one radix-2 stage followed by one radix-3 stage. Thus, the 2-D FFT can be implemented as the 36-point 1-D FFT with SFG structure 3@2@2@3 if the input, output and twiddle factor sequences are properly defined.
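The cascade of (4) can be sketched in a few lines of Python by treating each SFG as a list of stage radices. The list representation and the names `cascade` and `total_points` are illustrative assumptions, not notation from the paper.

```python
from math import prod

def cascade(*sfgs):
    """Concatenate per-dimension stage lists: SFG = SFG_1 @ SFG_2 @ ..."""
    return [radix for sfg in sfgs for radix in sfg]

def total_points(sfg):
    """Length of the 1-D FFT that a cascade of radix stages implements."""
    return prod(sfg)

# 6 by 6 2-D FFT of Fig. 1: N1-point FFT as 3@2, N2-point FFT as 2@3.
sfg1, sfg2 = [3, 2], [2, 3]
sfg = cascade(sfg1, sfg2)
print("@".join(map(str, sfg)), "with", total_points(sfg), "points")
```

The same helper accepts any number of dimensions, which is all the M-D extension of the mapping needs at the level of the stage list.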

2.2. M-D FFT Signal Flow Graph

The 2-D to 1-D FFT SFG mapping can be extended to the M-D case. Set an M-D array $(L_1, L_2, \ldots, L_M)$ with the length of the j-th tuple $L_j$. There are two approaches, row-column and column-row, to implement the 2-D FFT. However, there are $M!$ approaches to implement the M-D FFT. For definiteness, the order of the M-D FFT implementation will be mapped to the M-D array $(N_1, N_2, \ldots, N_M)$. The total number of points of the mapped 1-D FFT will be

$$N = N_1 \cdot N_2 \cdots N_M. \tag{5}$$

The SFG structure of the M-D FFT is represented by

$$\mathrm{SFG} = \mathrm{SFG}_1 \mathbin{@} \mathrm{SFG}_2 \mathbin{@} \cdots \mathbin{@} \mathrm{SFG}_M \tag{6}$$

where $\mathrm{SFG}_i$ can be further partitioned into

$$\mathrm{SFG}_i = BS_{i1} \mathbin{@} BS_{i2} \mathbin{@} \cdots \mathbin{@} BS_{i s_i}. \tag{7}$$

3. DIGIT-REVERSE SEQUENCE

It is well known for the in-place FFT algorithm that if the input array is in normal order, then the output array after the FFT operations will be in digit-reverse order, and vice versa. This section investigates how to define addressing for the M-D normal and digit-reverse arrays. Thus, those M-D arrays can be efficiently implemented by 1-D addressing.

3.1. 2-D Digit-Reverse Sequence

Given a 2-D array $[(n_1,n_2)]$, after its discrete Fourier transform we may get another 2-D array $[(k_1,k_2)]$ in normal order, as in the following mapping

$$[(n_1,n_2)] \xrightarrow{\text{2-D DFT}} [(k_1,k_2)]. \tag{8}$$

Moreover, for the 2-D array after the 2-D FFT operations, we may get the 2-D array in digit-reverse order, as in the following mapping

$$[(n_1,n_2)] \xrightarrow{\text{C-R 2-D FFT}} [(\mathrm{dr}(k_1),\mathrm{dr}(k_2))]. \tag{9}$$

The following shows what the mapping will be if only 1-D addressing is employed. The 1-D addressing for the 2-D normal array can be defined in the last-tuple-major order as

$$\mathrm{nr}(n_1,n_2) = n_1 \cdot N_2 + n_2 = n_1 n_2. \tag{10}$$

The digit-reverse addressing for the last-tuple-major order of the 2-D normal array can be derived as

$$\mathrm{dr}(\mathrm{nr}(n_1,n_2)) = \mathrm{dr}(n_2)\,\mathrm{dr}(n_1) = \mathrm{nr}(\mathrm{dr}(n_2),\mathrm{dr}(n_1)) \tag{11}$$

and that of the 2-D digit-reverse array can be derived as

$$\mathrm{dr}(\mathrm{nr}(\mathrm{dr}(n_1),\mathrm{dr}(n_2))) = \mathrm{nr}(n_2,n_1). \tag{12}$$

It can be seen from (10), (11) and (12) that if the inputs of the 2-D FFT are defined in the normal (digit-reverse) column-major order, then the outputs of the 2-D FFT will be in the digit-reverse (normal) row-major order. Similarly, if the inputs are in the normal (digit-reverse) row-major order, then the outputs are in the digit-reverse (normal) column-major order. Thus, the 2-D FFT implemented by the unified 1-D addressing gets the mapping as follows

$$[(n_1,n_2)] \xrightarrow{\text{unified 1-D FFT}} [(\mathrm{dr}(k_2),\mathrm{dr}(k_1))]. \tag{13}$$

The digit-reverse operation is reversible. Thus, we have

$$\mathrm{dr}(\mathrm{dr}(\mathrm{nr}(n_1,n_2))) = n_1 n_2 = \mathrm{nr}(n_1,n_2). \tag{14}$$

3.2. M-D Digit-Reverse Sequence

The 2-D to 1-D indexing mapping can be extended to the M-D case. The M-D element stored in the memory can be defined in the last-tuple-major order as

$$\mathrm{nr}(n_1,n_2,\ldots,n_M) = n_1 n_2 \cdots n_{M-1} n_M. \tag{15}$$

The digit-reverse addressing for the last-tuple-major order of the M-D normal array can be derived as

$$\mathrm{dr}(\mathrm{nr}(n_1,n_2,\ldots,n_M)) = \mathrm{dr}(n_M)\,\mathrm{dr}(n_{M-1}) \cdots \mathrm{dr}(n_1) = \mathrm{nr}(\mathrm{dr}(n_M),\mathrm{dr}(n_{M-1}),\ldots,\mathrm{dr}(n_1)) \tag{16}$$

and that of the M-D digit-reverse array can be derived as

$$\mathrm{dr}(\mathrm{nr}(\mathrm{dr}(n_1),\ldots,\mathrm{dr}(n_M))) = \mathrm{nr}(n_M,\ldots,n_1). \tag{17}$$

Therefore, if the inputs of the M-D FFT are defined in the normal (digit-reverse) last-tuple-major order, then the outputs of the M-D FFT will be in the digit-reverse (normal) first-tuple-major order, and vice versa. Similarly, the digit-reverse operation is reversible as

$$\mathrm{dr}(\mathrm{dr}(\mathrm{nr}(n_1,\ldots,n_M))) = \mathrm{nr}(n_1,\ldots,n_M). \tag{18}$$
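The identities (16) and (18) can be checked numerically with a short Python sketch. It assumes one particular digit convention (the last radix of the cascade is the least significant digit of a normal-order index); the function names are illustrative. Under this convention the second reversal in (18) has to use the radix list in reversed order, since a digit-reversed mixed-radix index lives in the reversed radix system.

```python
from itertools import product
from math import prod

def digit_reverse(n, radices):
    """Reverse the mixed-radix digits of n (last radix = least significant)."""
    p = 0
    for r in reversed(radices):
        n, digit = divmod(n, r)
        p = p * r + digit
    return p

def nr(indices, lengths):
    """Last-tuple-major 1-D address: the last index varies fastest."""
    addr = 0
    for idx, length in zip(indices, lengths):
        addr = addr * length + idx
    return addr

def check_identities(sfgs):
    """Exhaustively test (16) and (18) for per-dimension radix lists."""
    lengths = [prod(s) for s in sfgs]
    all_radices = [r for s in sfgs for r in s]
    for n in product(*(range(length) for length in lengths)):
        addr = nr(n, lengths)
        rev = digit_reverse(addr, all_radices)
        # (16): per-dimension reversal with the tuple order flipped.
        per_dim = [digit_reverse(ni, s) for ni, s in zip(n, sfgs)]
        if rev != nr(per_dim[::-1], lengths[::-1]):
            return False
        # (18): reversing again (radix order flipped) restores the address.
        if digit_reverse(rev, all_radices[::-1]) != addr:
            return False
    return True

print(check_identities([[3, 2], [2, 3]]))       # 6 by 6 2-D example
print(check_identities([[2, 2], [3], [2, 3]]))  # a 4 by 3 by 6 3-D example
```

Because the whole M-D array is reversed by one call on the concatenated radix list, no per-dimension index arithmetic or transposition is needed, which is the point of the unified 1-D addressing.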

3.3. Parameter Definition

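$$w_{ij} = N_{i+1} \cdot N_{i+2} \cdots N_M \cdot n_{i(j+1)} \cdot n_{i(j+2)} \cdots n_{i s_i} \tag{19}$$

$$G_{ij} = N_{i+1} \cdot N_{i+2} \cdots N_M \cdot n_{ij} \cdot n_{i(j+1)} \cdots n_{i s_i} \tag{20}$$

$$F_{ij} = N_1 \cdot N_2 \cdots N_{i-1} \cdot n_{i1} \cdot n_{i2} \cdots n_{i(j-1)} \tag{21}$$

$$\bar{w}_{ij} = N_1 \cdot N_2 \cdots N_{i-1} \cdot w_{ij} \tag{22}$$

$$\bar{F}_{ij} = n_{i1} \cdot n_{i2} \cdots n_{i(j-1)} \tag{23}$$

Here $n_{ij}$ is the radix of the $BS_{ij}$ stage, and an empty product equals 1.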
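As a cross-check against Table I, the stage parameters can be computed from the per-dimension radix lists. The sketch below assumes the following reading of the definitions: $w_{ij}$ is the product of all radices after stage $(i,j)$ in the cascade, $G_{ij} = w_{ij} \cdot n_{ij}$, $F_{ij}$ is the product of all radices before the stage, $\bar{F}_{ij}$ is the product of the earlier radices of dimension $i$ only, and $\bar{w}_{ij} = N_1 \cdots N_{i-1} \cdot w_{ij}$. The function name and the dictionary keys are illustrative.

```python
from math import prod

def stage_parameters(sfgs):
    """Per-stage parameters for a cascade of per-dimension radix lists."""
    lengths = [prod(s) for s in sfgs]
    stages = []
    for i, radices in enumerate(sfgs):
        before = prod(lengths[:i])       # N_1 * ... * N_{i-1}
        after = prod(lengths[i + 1:])    # N_{i+1} * ... * N_M
        for j, r in enumerate(radices):
            w = after * prod(radices[j + 1:])
            f_bar = prod(radices[:j])
            stages.append({
                "r": r,
                "w": w,
                "G": w * r,
                "F": before * f_bar,
                "w_bar": before * w,
                "F_bar": f_bar,
            })
    return stages

# 6 by 6 2-D FFT mapped to the 3@2@2@3 cascade.
for stage in stage_parameters([[3, 2], [2, 3]]):
    print(stage)
```

For the 3@2 by 2@3 case this gives w = 12, 6, 3, 1, G = 36, 12, 6, 3 and F = 1, 3, 6, 12, with the 2-D twiddle parameters differing from the 1-D ones only in the second dimension.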

4. DATA SEQUENCE

The data sequence for the mapped M-D FFT will be the same as that for the 1-D FFT in each stage if the total number of data is the same. As shown in Fig. 1, the data sequences for the first and second stages of the row FFT are the same as those for the first and second stages of the 1-D FFT, and those for the first and second stages of the column FFT are the same as those for the third and fourth stages of the 1-D FFT. The addressing algorithm to generate the data sequence for the $BS_{ij}$-stage of the M-D FFT is listed in the following

    for (k=0; k <= F_ij - 1; k++)
        for (l=0; l <= G_ij - 1; l++)
            { output l * F_ij + k }
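A minimal Python sketch of this address generator follows. It assumes that F is the product of the radices of all earlier stages and G = N / F, and that each run of `radix` consecutive addresses feeds one butterfly; that grouping is an interpretation, not something the listing states.

```python
def data_sequence(F, G):
    """Addresses visited in one butterfly stage: l * F + k, k outer, l inner."""
    return [l * F + k for k in range(F) for l in range(G)]

def butterflies(F, G, radix):
    """Group consecutive addresses into radix-sized butterflies."""
    seq = data_sequence(F, G)
    return [seq[b:b + radix] for b in range(0, len(seq), radix)]

# Second stage of the 36-point 3@2@2@3 cascade: F = 3, G = 12, radix 2.
print(butterflies(3, 12, 2)[:4])   # [[0, 3], [6, 9], [12, 15], [18, 21]]
```

Under these assumptions the butterfly stride grows from 1 in the first stage to N divided by the last radix in the final stage, as expected for a decimation-in-time FFT with digit-reversed input.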

5. TWIDDLE FACTOR SEQUENCE

In the mapped M-D implementation, the M-D FFT can employ exactly the same data and digit-reverse addressing sequences as the 1-D FFT. However, the twiddle factor sequences will be different except for the first dimension, as shown in Figs. 1 and 2. The indices of the twiddle factors in the figures are indicated upper for the 2-D case and lower for the 1-D case. Nevertheless, with different parameter settings, both M-D and 1-D twiddle factor sequences can be generated by the same operation.

The addressing algorithm to generate the twiddle factor sequence for the $BS_{1j}$-stage of the 1-D FFT is listed as

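    for (k=0; k <= F_1j - 1; k++)
        for (l=0; l <= w_1j - 1; l++)
            for (m=0; m <= r_1j - 1; m++)
                { output m * k * w_1j }

and that for the $BS_{ij}$-stage of the M-D FFT is listed as

    for (k=0; k <= Fbar_ij - 1; k++)
        for (l=0; l <= wbar_ij - 1; l++)
            for (m=0; m <= r_ij - 1; m++)
                { output m * k * wbar_ij }

where Fbar_ij and wbar_ij denote the barred parameters $\bar{F}_{ij}$ and $\bar{w}_{ij}$.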
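The following Python sketch runs a 6 by 6 2-D FFT as one 36-point cascade and compares it with a directly evaluated 2-D DFT. It rests on several assumptions that are not spelled out in the text: the loop bounds read as (k outer, l middle, m inner), $r_{ij}$ is the stage radix, the output values are exponents of $W_N = e^{-2\pi i/N}$, every run of r consecutive sequence entries forms one butterfly, and the result $X(k_1,k_2)$ is read at address $k_2 \cdot N_1 + k_1$. All function names are illustrative.

```python
import cmath

def digit_reverse(n, radices):
    """Reverse the mixed-radix digits of n (last radix = least significant)."""
    p = 0
    for r in reversed(radices):
        n, digit = divmod(n, r)
        p = p * r + digit
    return p

def twiddle_sequence(F, w, r):
    """Exponents m*k*w of W_N, with k outer, l middle, m inner."""
    return [m * k * w for k in range(F) for l in range(w) for m in range(r)]

def mapped_fft(x, sfgs):
    """M-D FFT of a last-tuple-major list x, run as one 1-D cascade."""
    N = len(x)
    radices = [r for s in sfgs for r in s]
    buf = [0j] * N
    for n in range(N):                       # digit-reversed input load
        buf[digit_reverse(n, radices)] = x[n]
    F = 1                                    # product of all earlier radices
    for s in sfgs:
        F_bar = 1                            # earlier radices of this dimension
        for r in s:
            w_bar = N // (F_bar * r)
            addr = [l * F + k for k in range(F) for l in range(N // F)]
            tw = twiddle_sequence(F_bar, w_bar, r)
            for b in range(0, N, r):
                legs = [buf[addr[b + m]]
                        * cmath.exp(-2j * cmath.pi * tw[b + m] / N)
                        for m in range(r)]
                for q in range(r):           # r-point DFT of the scaled legs
                    buf[addr[b + q]] = sum(
                        legs[m] * cmath.exp(-2j * cmath.pi * m * q / r)
                        for m in range(r))
            F *= r
            F_bar *= r
    return buf

N1, N2 = 6, 6
x = [complex(n % 7, n % 5) for n in range(N1 * N2)]   # arbitrary test data
X = mapped_fft(x, [[3, 2], [2, 3]])

def dft2(k1, k2):
    """Directly evaluated 2-D DFT sample, used only as a reference."""
    return sum(x[n1 * N2 + n2]
               * cmath.exp(-2j * cmath.pi * (n1 * k1 / N1 + n2 * k2 / N2))
               for n1 in range(N1) for n2 in range(N2))

err = max(abs(X[k2 * N1 + k1] - dft2(k1, k2))
          for k1 in range(N1) for k2 in range(N2))
print(err < 1e-9)
```

Passing a single radix list such as `[[3, 2, 2, 3]]` turns the same routine into the 36-point 1-D FFT; only the twiddle parameters of the later stages change, while the data addresses and the input reversal stay identical.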

Table I lists the parameters required to generate the data and twiddle factor sequences of the 36-point 1-D FFT and the 6 by 6 2-D FFT with 3@2@2@3 and 2@3@3@2 SFG structures. Two parameters are required for the data sequence and three parameters are required for the twiddle factor sequence of each stage. With the same number of array points, there is no difference in setting parameters for the data sequences of 1-D and M-D FFTs. However, the parameter setting for the twiddle factor sequences of 1-D and M-D FFTs is different.

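Table I. Parameter Setting for Data and Twiddle Factor Sequences of 36-Point 1-D and 6 by 6 2-D FFTs

SFG structure   Parameter       Stage 1   Stage 2   Stage 3   Stage 4
3@2@2@3         radix               3         2         2         3
                w (1-D)            12         6         3         1
                G                  36        12         6         3
                F (1-D)             1         3         6        12
                w-bar (2-D)        12         6        18         6
                F-bar (2-D)         1         3         1         2
2@3@3@2         radix               2         3         3         2
                w (1-D)            18         6         2         1
                G                  36        18         6         2
                F (1-D)             1         2         6        18
                w-bar (2-D)        18         6        12         6
                F-bar (2-D)         1         2         1         3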

6. ALGORITHM REALIZED BY LH9124/LH9320

This section discusses the hardware realization of the proposed algorithm. It is impractical to build butterfly modules for all the radixes in the data path of a chip. Therefore, the execution unit (LH9124) of SMT's array processor chip set selects radix-2, radix-4, and radix-16 modules [3]. The radix-16 butterfly is too complex to be directly implemented. Thus, the radix-16 is actually implemented by two radix-4 stages and can be finished every 16 cycles [7].

The proposed addressing algorithm is realized by a programmable address generator called the LH9320 [4]. It provides the address patterns required by the LH9124. Since the radix-16 butterfly is implemented by two radix-4 stages, the algorithm for generating the twiddle factor sequence of the quasi radix-16 stage has to be modified as

    for (k=0; k <= F_1j - 1; k++)
        for (l=0; l <= w_1j/4 - 1; l++) {
            for (n=0; n <= 3; n++)
                { output n * k * w_1j }
            for (n=0; n <= 3; n++)
                for (m=1; m <= 3; m++)
                    { output m * (n * F_1j + k) * w_1j/4 }
        }
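The quasi radix-16 generator can be sketched in Python as below. The sketch assumes that F and w are the parameters of the first radix-4 pass (so N = 4 * F * w), that both inner indices run over 0..3 and 1..3 respectively, and that the multiplier inside the second pass is the same F; these readings are assumptions chosen so that each (k, l) pair yields 4 + 12 = 16 exponents, matching the 16-cycle butterfly.

```python
def quasi_radix16_twiddles(F, w):
    """Twiddle exponents for a radix-16 stage run as two radix-4 passes.

    F and w belong to the first radix-4 pass; w must be a multiple of 4.
    """
    seq = []
    for k in range(F):
        for l in range(w // 4):
            # First radix-4 pass: exponents n * k * w.
            seq += [n * k * w for n in range(4)]
            # Second radix-4 pass: position n * F + k, step w / 4,
            # with the trivial m = 0 leg left out.
            seq += [m * (n * F + k) * (w // 4)
                    for n in range(4) for m in range(1, 4)]
    return seq

# 16-point case: F = 1, w = 4.
print(quasi_radix16_twiddles(1, 4))
# [0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 2, 4, 6, 3, 6, 9]
```

In the 16-point case the second pass produces the exponents m * n of $W_{16}$, which are the usual inter-stage twiddles of a radix-16 butterfly split into two radix-4 passes.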

Table II compares the performance of the 1-D and M-D FFTs. It can be seen that, with the same number of array points, both 1-D and M-D FFTs have the same performance. With a 25 nanosecond machine cycle time, the 256 by 256 2-D complex FFT can be finished within 6.56 milliseconds.

There are several advantages to the proposed M-D FFT implementation. First, the number of instructions required is greatly reduced. Thus, the program memory is not necessary and the performance can be improved by reducing instruction pipeline overhead. For example, the proposed approach requires only 3 instructions to implement a 16 by 16 by 16 3-D FFT, while the traditional approach requires 768 instructions. Second, no data matrix transposition is required because the transpo[...]

Table II. Performance of FFTs by LH9124/LH9320

FFT                      Instructions   Machine cycles   Time (µs)
1-D 64K points                4             262416         6560.4
2-D 256 by 256                4             262416         6560.4
3-D 16 by 16 by 16            3
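The quoted execution time follows from the Table II figures if 262416 is read as a machine cycle count and 6560.4 as a time in microseconds (the column headings are an assumption here):

```latex
262416 \ \text{cycles} \times 25 \ \text{ns/cycle}
  = 6\,560\,400 \ \text{ns}
  = 6560.4 \ \mu\text{s}
  \approx 6.56 \ \text{ms}
```

The same count applies to the 64K-point 1-D FFT and the 256 by 256 2-D FFT, since both have $2^{16}$ points and share one cascade.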

Fig. 1. Signal Flow Graph of 3@2@2@3 1-D FFT and 3@2 by 2@3 2-D FFT


7. CONCLUSIONS

This paper defines the unified 1-D addressing for the M-D FFT implementation. The addressing sequences can be derived from the factorization of the twiddle factor matrix [6]. The discussion only includes the decimation-in-time, digit-reverse-input and normal-output FFT algorithms. Essentially all the results can be extended to other algorithms in a straightforward manner. Algorithms for defining mixed radix 1-D FFT indexing can also be found in [8].

The unified indexing concept of the M-D FFT implementation automatically solves the scaling problem of block floating-point arithmetic. The concept can also be extended to derive efficient general DSP algorithms for block floating-point arithmetic, such as IIR filtering, adaptive filtering, polyphase filter banks, and multichannel DSP [9].

ACKNOWLEDGMENT

The author wishes to thank the System and Design groups of Sharp Microelectronics Technology for practically implementing the unified FFT algorithms in the array processing chip set.

REFERENCES

[1] C. Ju and M. Fleming, "Design concept of real-time array signal processors," Proceedings of the International Conference on Signal Processing Applications and Technology, Boston, pp. 188-197, Nov. 1992.

[2] C. Ju, "Equivalent relationship and unified indexing of FFT algorithms," Proceedings of the International Symposium on Circuits and Systems, Chicago, May 1993.

[3] LH9124 Digital Signal Processor User's Guide, Sharp Electronics Corporation.

[4] LH9320 Address Generator User's Guide, Sharp Electronics Corporation.

[5] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, Apr. 1965.

[6] C. Ju, "Derivation and realization of fast Fourier transform," unpublished.

[7] C. Ju, LH9124/LH9320 Fast Fourier Transform Application Note, Sharp Electronics Corporation.

[8] G. L. DeMuth, "Algorithms for defining mixed radix FFT flow graphs," IEEE Trans. on Acoustics, Speech, and Signal Processing, pp. 1349-1358, Sept. 1989.

[9] C. Ju, "General DSP algorithms for block floating-point arithmetic," unpublished.
