
Algorithm of defining 1-D indexing for M-D mixed radix FFT implementation

C.-J. Ju
- Vol. 2, pp 484-488
TLDR
The M-D FFT can be efficiently implemented by the unified 1-D indexing, and the address generator design can be simplified, because the matrix transposition is no longer necessary.
Abstract: 
A novel M-D (multidimensional) to 1-D FFT (fast Fourier transform) signal flow graph mapping is proposed. Thus, the M-D FFT can be efficiently implemented by the unified 1-D indexing, and the address generator design can be simplified. In addition, the matrix transposition is no longer necessary. The addressing sequences can be derived from the factorization of the twiddle factor matrix. The unified indexing concept of the M-D FFT implementation automatically solves the scaling problem of the block floating-point arithmetic. Practical chip design considerations in implementing the algorithm are presented.


ALGORITHM OF DEFINING 1-D INDEXING FOR M-D MIXED RADIX FFT IMPLEMENTATION
Chwen-Jye Ju, Ph.D.
Sharp Microelectronics Technology, Inc.
5700 NW Pacific Rim Blvd., Camas, WA 98607, USA

ABSTRACT
The multidimensional (M-D) fast Fourier transform (FFT) is an essential algorithm in array signal processing. However, the calculation of M-D indexing and the transposition of the data matrix required by the M-D FFT are the algorithm's performance killers. This paper proposes a novel M-D to 1-D FFT signal flow graph (SFG) mapping. Thus, the M-D FFT can be efficiently implemented by the unified 1-D indexing, and the address generator design can be simplified. In addition, the matrix transposition is no longer necessary. Finally, practical chip design considerations in implementing the algorithm are given.
1. INTRODUCTION
In recent decades, the fast Fourier transform algorithm has been a driving force in the progress of digital signal processing. With the advance of VLSI technology, the FFT algorithm has been pushed further toward solving multidimensional array signal processing problems in real time. However, there is no efficient addressing method for 1-D to M-D FFTs. Therefore, this paper addresses this problem and proposes a unified addressing for 1-D to M-D FFTs. All the M-D indexing can be simplified and implemented by 1-D indexing. The proposed approach has been implemented by many companies in their high-end systems such as radar, medical image recovery, etc.

A novel vector-matrix representation of 1-D to M-D radix-2 FFT algorithms has been discussed in [1,2]. It is shown that the M-D FFT has the same matrix form as the 1-D FFT if both have the same number of data. This implies that the SFG structure of the M-D FFT can be mapped to that of the 1-D FFT. Thus, the unified 1-D indexing can be applied to the M-D FFT. This paper extends the radix-2 FFT results to the mixed radix FFT case.

For definiteness, this paper only discusses the decimation-in-time (DIT) digit-reverse-input and normal-output FFT algorithms. Section 2 introduces an easy way of constructing an M-D FFT SFG structure. The required M-D FFT addressing sequences, including digit-reverse, data, and twiddle factor, are defined in Sections 3, 4, and 5. Section 6 investigates the practical design considerations of the algorithm. The unified indexing for 1-D to M-D FFT algorithms has been implemented in the array processor chip set LH9124/LH9320 developed by Sharp Microelectronics Technology [3,4]. It can be seen from the chip set implementation that the proposed M-D FFT approach has tremendous advantages over the traditional M-D FFT approach in both cost and performance.
2. M-D FFT SIGNAL FLOW GRAPH
It is well known that the twiddle factor matrix of the DIT FFT can be recursively partitioned into the multiplication of butterfly stage (BS) matrices [5,6]. These matrices can also be represented by cascading butterfly stages of the FFT signal flow graph as shown in Fig. 1. Thus, the SFG structure of the 1-D FFT can be represented by

$$\mathrm{SFG} = BS_{11} \,@\, BS_{12} \,@\, \cdots \,@\, BS_{1s_1} \tag{1}$$

where $s_1$ denotes the number of FFT stages and "@" is a cascading operator. $BS_{1j}$ can be an arbitrary radix-$n_{1j}$ butterfly stage. Thus, Fig. 1 can be represented by 3@2@2@3.
2.1. 2-D FFT Signal Flow Graph
If the 2-D FFT is implemented by the 1-D FFT, we can select either the row-column or the column-row approach. For definiteness in implementation, we will define an array mapping. For a 2-D array with row length $L_1$ and column length $L_2$, the 2-D array mapping for the row-column approach will be $(N_1, N_2) = (L_1, L_2)$ and that for the column-row approach will be $(N_1, N_2) = (L_2, L_1)$. Thus, the SFG structure of the 2-D FFT can be represented by that of the 1-D FFT with the length $N = N_1 * N_2$. If the SFG structure of the $N_1$-point FFT is

$$\mathrm{SFG}_1 = BS_{11} \,@\, BS_{12} \,@\, \cdots \,@\, BS_{1s_1} \tag{2}$$

and that of the $N_2$-point FFT is

$$\mathrm{SFG}_2 = BS_{21} \,@\, BS_{22} \,@\, \cdots \,@\, BS_{2s_2}, \tag{3}$$

then the SFG structure of the 2-D FFT will be

$$\mathrm{SFG} = \mathrm{SFG}_1 \,@\, \mathrm{SFG}_2. \tag{4}$$
IEEE Pac Rim '93, 0-7803-0971-5/93/$3.00 © 1993 IEEE

Fig. 1 shows the mapped SFG structure of a 6 by 6 2-D FFT. The $N_1$-point FFT is implemented by one radix-3 stage followed by one radix-2 stage. The $N_2$-point FFT is implemented by one radix-2 stage followed by one radix-3 stage. Thus, the 2-D FFT can be implemented as the 36-point 1-D FFT with SFG structure 3@2@2@3 if the input, output and twiddle factor sequences are properly defined.
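The cascading of per-dimension SFGs into one 1-D SFG can be sketched in a few lines. This is only an illustration: representing an SFG as a Python list of stage radices and the helper name `cascade` are assumptions made here, not notation from the paper.

```python
from math import prod

def cascade(*sfgs):
    """Cascade ('@') several SFGs, each given as a list of stage radices."""
    flat = []
    for sfg in sfgs:
        flat.extend(sfg)
    return flat

# Fig. 1 example: the N1-point FFT is 3@2 and the N2-point FFT is 2@3.
sfg1 = [3, 2]
sfg2 = [2, 3]
sfg = cascade(sfg1, sfg2)      # mapped 1-D SFG structure
n_points = prod(sfg)           # total length N = N1 * N2

print("@".join(str(r) for r in sfg), n_points)
```

The same helper accepts any number of dimensions, so an M-D structure is obtained by passing M radix lists in the chosen implementation order.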
2.2. M-D FFT Signal Flow Graph
The 2-D to 1-D FFT SFG mapping can be easily extended to the M-D case. Set an M-D array $(L_1, L_2, \ldots, L_M)$ with the length of the j-th tuple $L_j$. There are two approaches, row-column and column-row, to implement the 2-D FFT. However, there are $M!$ approaches to implement the M-D FFT. For definiteness, the order of the M-D FFT implementation will be mapped to the M-D array $(N_1, N_2, \ldots, N_M)$. The total number of points of the mapped 1-D FFT will be

$$N = N_1 * N_2 * \cdots * N_M. \tag{5}$$

The SFG structure of the M-D FFT is represented by

$$\mathrm{SFG} = \mathrm{SFG}_1 \,@\, \mathrm{SFG}_2 \,@\, \cdots \,@\, \mathrm{SFG}_M \tag{6}$$

where $\mathrm{SFG}_i$ can be further partitioned into

$$\mathrm{SFG}_i = BS_{i1} \,@\, BS_{i2} \,@\, \cdots \,@\, BS_{is_i}. \tag{7}$$

3. DIGIT-REVERSE SEQUENCE
It is well known for the in-place FFT algorithm that if the input array is in normal order, then the output array after FFT operations will be in digit-reverse order and vice versa. This section investigates how to define addressing for the M-D normal and digit-reverse arrays. Thus, those M-D arrays can be efficiently implemented by 1-D addressing.
3.1. 2-D Digit-Reverse Sequence
Given a 2-D array $[(n_1, n_2)]$, after its discrete Fourier transform we may get another 2-D array $[(k_1, k_2)]$ in normal order as the following mapping

$$[(n_1, n_2)] \xrightarrow{\text{2-D DFT}} [(k_1, k_2)]. \tag{8}$$

Moreover, for the 2-D array after the 2-D FFT operations, we may get the 2-D array in digit-reverse order as the following mapping

$$[(n_1, n_2)] \xrightarrow{\text{C-R 2-D FFT}} [(dr(k_1), dr(k_2))]. \tag{9}$$

The following will show what the mapping will be if only 1-D addressing is employed. The 1-D addressing for the 2-D normal array can be defined in the last-tuple-major order as

$$nr(n_1, n_2) \equiv n_1 * N_2 + n_2 = n_1 n_2 \tag{10}$$

and that of the 2-D digit-reverse array can be derived as

$$dr(nr(n_1, n_2)) = dr(n_2)\,dr(n_1) = nr(dr(n_2), dr(n_1)) \tag{11}$$

$$dr(nr(dr(n_1), dr(n_2))) = nr(n_2, n_1). \tag{12}$$

It can be seen from (10), (11) and (12) that if the inputs of the 2-D FFT are defined in the normal (digit-reverse) column-major order, then the outputs of the 2-D FFT will be in the digit-reverse (normal) row-major order. Similarly, if the inputs are in the normal (digit-reverse) row-major order, then the outputs are in the digit-reverse (normal) column-major order. Thus, the 2-D FFT implemented by the unified 1-D addressing will get the mapping as follows

$$[(n_1, n_2)] \xrightarrow{\text{Unified 1-D FFT}} [(dr(k_2), dr(k_1))]. \tag{13}$$

The digit-reverse operation is reversible. Thus, we have

$$dr(dr(nr(n_1, n_2))) = n_1 n_2 = nr(n_1, n_2). \tag{14}$$
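The identities (11) and (14) can be checked numerically for the 6 by 6 example of Fig. 1. The sketch below is illustrative only. The function `digit_reverse` and its digit convention (first radix is the most significant digit on input and the least significant on output) are assumptions chosen to match a digit-reverse-input DIT FFT; for a mixed radix structure the inverse is taken with the radix list reversed, which coincides with the forward operation when the structure is symmetric, as 3@2@2@3 is.

```python
def digit_reverse(n, radices):
    """Mixed-radix digit reversal of address n.

    n is read as digits d1..ds, with d1 most significant (radix radices[0]).
    The result stores d1 as the least significant digit.
    """
    digits = []
    for r in reversed(radices):      # peel off ds, ..., d1
        digits.append(n % r)
        n //= r
    digits.reverse()                 # now [d1, ..., ds]
    out, weight = 0, 1
    for d, r in zip(digits, radices):
        out += d * weight
        weight *= r
    return out

N1, N2 = 6, 6
SFG1, SFG2 = [3, 2], [2, 3]          # Fig. 1 example
SFG = SFG1 + SFG2                    # 3@2@2@3

def nr(a, b, len_b):
    """Last-tuple-major 1-D address of the 2-D index (a, b)."""
    return a * len_b + b

# Eq. (11): reversing the 1-D address swaps the tuples and reverses each one.
eq11 = all(
    digit_reverse(nr(n1, n2, N2), SFG)
    == nr(digit_reverse(n2, SFG2), digit_reverse(n1, SFG1), N1)
    for n1 in range(N1) for n2 in range(N2)
)

# Eq. (14): digit reversal is undone by reversing with the radix list flipped.
eq14 = all(
    digit_reverse(digit_reverse(a, SFG), SFG[::-1]) == a
    for a in range(N1 * N2)
)
print(eq11, eq14)
```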
3.2. M-D Digit-Reverse Sequence
The 2-D to 1-D indexing mapping can be extended to the M-D case. The M-D element stored in the memory can be defined in the last-tuple-major order as

$$nr(n_1, n_2, \ldots, n_M) = n_1 n_2 \cdots n_{M-1} n_M. \tag{15}$$

The digit-reverse addressing for the last-tuple-major order of the M-D normal array can be derived as

$$dr(nr(n_1, n_2, \ldots, n_M)) = dr(n_M)\,dr(n_{M-1}) \cdots dr(n_1) = nr(dr(n_M), dr(n_{M-1}), \ldots, dr(n_1)) \tag{16}$$

and that of the M-D digit-reverse array can be

$$dr(nr(dr(n_1), \ldots, dr(n_M))) = nr(n_M, \ldots, n_1). \tag{17}$$

Therefore, if the inputs of the M-D FFT are defined in the normal (digit-reverse) last-tuple-major order, then the outputs of the M-D FFT will be in the digit-reverse (normal) first-tuple-major order and vice versa. Similarly, the digit-reverse operation is reversible as

$$dr(dr(nr(n_1, \ldots, n_M))) = nr(n_1, \ldots, n_M). \tag{18}$$
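A small check of (15) and (16) for a 3-D case may help make the tuple ordering concrete. The array size 2 by 3 by 4 and its per-dimension structures 2, 3 and 2@2 are hypothetical values picked for this sketch, and `digit_reverse` follows the same assumed digit convention as a digit-reverse-input DIT FFT (first radix most significant on input).

```python
from itertools import product

def digit_reverse(n, radices):
    """Mixed-radix digit reversal; radices[0] is the most significant digit of n."""
    digits = []
    for r in reversed(radices):
        digits.append(n % r)
        n //= r
    digits.reverse()
    out, weight = 0, 1
    for d, r in zip(digits, radices):
        out += d * weight
        weight *= r
    return out

def nr(index, lengths):
    """Last-tuple-major address of an M-D index, eq. (15)."""
    addr = 0
    for n, length in zip(index, lengths):
        addr = addr * length + n
    return addr

sfgs = [[2], [3], [2, 2]]            # hypothetical 2 by 3 by 4 array
lengths = [2, 3, 4]
flat = [r for sfg in sfgs for r in sfg]

# Eq. (16): the reversed address equals the last-tuple-major address of the
# tuples taken in reverse order, each tuple digit-reversed on its own.
eq16 = all(
    digit_reverse(nr(idx, lengths), flat)
    == nr([digit_reverse(n, s) for n, s in zip(idx[::-1], sfgs[::-1])],
          lengths[::-1])
    for idx in product(*(range(length) for length in lengths))
)
print(eq16)
```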
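3.3. Parameter Definition

$$w_{ij} = N_{i+1} * N_{i+2} * \cdots * N_M * n_{i(j+1)} * n_{i(j+2)} * \cdots * n_{is_i} \tag{19}$$

$$G_{ij} = N_{i+1} * N_{i+2} * \cdots * N_M * n_{ij} * n_{i(j+1)} * \cdots * n_{is_i} \tag{20}$$

$$\bar{w}_{ij} = N_1 * N_2 * \cdots * N_{i-1} * n_{i1} * n_{i2} * \cdots * n_{i(j-1)} \tag{21}$$

$$v_{ij} = N_1 * \cdots * N_{i-1} * N_{i+1} * \cdots * N_M * n_{i(j+1)} * \cdots * n_{is_i} \tag{22}$$

$$\bar{v}_{ij} = n_{i1} * n_{i2} * \cdots * n_{i(j-1)} \tag{23}$$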
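The per-stage parameters can be tabulated mechanically from the stage radices. The following sketch assumes that the five parameters are the products described in the comments (later radices, current and later radices, earlier radices, and their per-dimension counterparts); the function name `stage_parameters` and the dictionary keys are illustrative. Under that reading it reproduces the 3@2@2@3 values listed in Table I.

```python
from math import prod

def stage_parameters(sfgs):
    """Return one dict per butterfly stage for the given per-dimension SFGs."""
    lengths = [prod(sfg) for sfg in sfgs]
    rows = []
    for i, sfg in enumerate(sfgs):
        before_dims = prod(lengths[:i])        # N_1 * ... * N_(i-1)
        after_dims = prod(lengths[i + 1:])     # N_(i+1) * ... * N_M
        for j, r in enumerate(sfg):
            earlier = prod(sfg[:j])            # n_i1 * ... * n_i(j-1)
            later = prod(sfg[j + 1:])          # n_i(j+1) * ... * n_is_i
            rows.append({
                "r": r,
                "w": after_dims * later,
                "G": after_dims * r * later,
                "wbar": before_dims * earlier,
                "v": before_dims * after_dims * later,
                "vbar": earlier,
            })
    return rows

two_d = stage_parameters([[3, 2], [2, 3]])     # 6 by 6 2-D FFT
one_d = stage_parameters([[3, 2, 2, 3]])       # 36-point 1-D FFT
for name in ("r", "w", "G", "wbar", "v", "vbar"):
    print(name, [row[name] for row in two_d])
```

For a 1-D FFT the two twiddle parameter pairs coincide (v equals w and vbar equals wbar), which is one way to see that only the twiddle factor settings distinguish the 1-D and M-D cases.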
4. DATA SEQUENCE
The data sequence for the mapped M-D FFT will be the same as that for the 1-D FFT in each stage if the total number of data is the same. As shown in Fig. 1, the data sequences for the first and second stages of the row FFT are the same as those for the first and second stages of the 1-D FFT, and those for the first and second stages of the column FFT are the same as those for the third and fourth stages of the 1-D FFT. The addressing algorithm to generate the data sequence for the $BS_{ij}$ stage of the M-D FFT is listed in the following, where wbar_ij denotes $\bar{w}_{ij}$:

    for (k = 0; k <= wbar_ij - 1; k++)
        for (l = 0; l <= G_ij - 1; l++)
            { output l * wbar_ij + k }
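A direct transcription of the data addressing loop is given below as a sketch. It assumes the outer bound and the address stride are the product of the radices of all earlier stages (called `wbar` here) and the inner bound is the product of the current and later radices (`G`), so that each stage emits every one of the N addresses exactly once and each run of r consecutive outputs forms one radix-r butterfly.

```python
def data_sequence(wbar, G):
    """Addresses visited by one butterfly stage, in processing order."""
    return [l * wbar + k for k in range(wbar) for l in range(G)]

# Stage 2 of the 36-point 3@2@2@3 structure: radix 2, wbar = 3, G = 12.
stage2 = data_sequence(3, 12)
# Stage 4: radix 3, wbar = 12, G = 3.
stage4 = data_sequence(12, 3)

print(stage2[:4])   # first two radix-2 butterflies
print(stage4[:3])   # first radix-3 butterfly
```

In this reading the stride between the inputs of one butterfly grows from stage to stage, which is the usual pattern for an in-place DIT FFT with digit-reversed input.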
5. TWIDDLE FACTOR SEQUENCE
In the mapped M-D implementation, the M-D FFT can employ exactly the same data and digit-reverse addressing sequences as the 1-D FFT. However, the twiddle factor sequences will be different except for the first dimension, as shown in Figs. 1 and 2. The indices of the twiddle factors in the figures are indicated upper for the 2-D case and lower for the 1-D case. Nevertheless, with different parameter settings, both M-D and 1-D twiddle factor sequences can be generated by the same operation. The addressing algorithm to generate the twiddle factor sequence for the $BS_{1j}$ stage of the 1-D FFT is listed as

    for (k = 0; k <= wbar_1j - 1; k++)
        for (l = 0; l <= w_1j - 1; l++)
            for (m = 0; m <= r_1j - 1; m++)
                { output m * k * w_1j }

and that for the $BS_{ij}$ stage of the M-D FFT is listed as

    for (k = 0; k <= vbar_ij - 1; k++)
        for (l = 0; l <= v_ij - 1; l++)
            for (m = 0; m <= r_ij - 1; m++)
                { output m * k * v_ij }

Here wbar and vbar denote $\bar{w}$ and $\bar{v}$.
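Because the 1-D and M-D loops differ only in their parameters, a single generator covers both. In this sketch the argument names `outer` and `step` are placeholders introduced here: the 1-D case passes the earlier-radix product and the later-radix product of the whole 1-D structure, while the M-D case passes the per-dimension counterparts. The outputs are exponents of the N-th root of unity.

```python
def twiddle_sequence(outer, step, radix):
    """Twiddle exponents (of the N-th root of unity) for one butterfly stage."""
    return [m * k * step
            for k in range(outer)
            for _ in range(step)
            for m in range(radix)]

# 36-point 1-D FFT, stage 2 (radix 2): outer = 3, step = 6.
tw_1d = twiddle_sequence(3, 6, 2)
# 6 by 6 2-D FFT, stage 4 (radix 3): outer = 2, step = 6.
tw_2d = twiddle_sequence(2, 6, 3)

print(tw_1d[12:16])
print(tw_2d[18:21])
```

Each stage emits N exponents, one per data address, so the twiddle stream can be consumed in lockstep with the data stream.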
Table I lists the parameters required to generate the data and twiddle factor sequences of the 36-point 1-D FFT and the 6 by 6 2-D FFT with 3@2@2@3 and 2@3@3@2 SFG structures. Two parameters are required for the data sequence and three parameters are required for the twiddle factor sequence of each stage. With the same number of array points, there is no difference in setting the parameters for the data sequences of 1-D and M-D FFTs. However, the parameter settings for the twiddle factor sequences of 1-D and M-D FFTs are different.
Table I. Parameter Setting for Data and Twiddle Factor Sequences of 36-Point 1-D and 6 by 6 2-D FFTs

SFG structure 3@2@2@3
Stage           1    2    3    4
r               3    2    2    3
w              12    6    3    1
G              36   12    6    3
wbar            1    3    6   12
v              12    6   18    6
vbar            1    3    1    2

SFG structure 2@3@3@2
Stage           1    2    3    4
r               2    3    3    2
w              18    6    2    1
G              36   18    6    2
wbar            1    2    6   18
v              18    6   12    6
vbar            1    2    1    3
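As an end-to-end consistency check, the data and twiddle factor sequences can drive a complete transform that is then compared with a direct DFT. The sketch below is not the chip implementation. It assumes the parameter reading used in this section (data stride equal to the product of all earlier radices, twiddle step equal to N divided by the per-dimension sub-transform size), a row-major input array, and an output stored with the first tuple varying fastest; `mapped_fft` and `naive_dft2` are names introduced here.

```python
import cmath
from math import prod

def digit_reverse(n, radices):
    digits = []
    for r in reversed(radices):
        digits.append(n % r)
        n //= r
    digits.reverse()
    out, weight = 0, 1
    for d, r in zip(digits, radices):
        out += d * weight
        weight *= r
    return out

def mapped_fft(x, sfgs):
    """In-place DIT FFT over 1-D addresses; sfgs holds stage radices per dimension."""
    flat = [r for sfg in sfgs for r in sfg]
    N = prod(flat)
    data = [0j] * N
    for n in range(N):                      # digit-reverse-input ordering
        data[digit_reverse(n, flat)] = complex(x[n])
    wbar = 1                                # product of all earlier radices
    for sfg in sfgs:
        vbar = 1                            # earlier radices within this dimension
        for r in sfg:
            G = N // wbar
            v = N // (vbar * r)
            addrs = [l * wbar + k for k in range(wbar) for l in range(G)]
            twids = [m * k * v for k in range(vbar) for _ in range(v) for m in range(r)]
            for b in range(0, N, r):        # one radix-r butterfly per chunk
                a = addrs[b:b + r]
                t = [data[a[m]] * cmath.exp(-2j * cmath.pi * twids[b + m] / N)
                     for m in range(r)]
                for q in range(r):
                    data[a[q]] = sum(t[m] * cmath.exp(-2j * cmath.pi * m * q / r)
                                     for m in range(r))
            wbar *= r
            vbar *= r
    return data

def naive_dft2(x, len1, len2):
    """Direct 2-D DFT of a row-major array (len1 = 1 gives a 1-D DFT)."""
    return {(k1, k2): sum(x[n1 * len2 + n2]
                          * cmath.exp(-2j * cmath.pi * (n1 * k1 / len1 + n2 * k2 / len2))
                          for n1 in range(len1) for n2 in range(len2))
            for k1 in range(len1) for k2 in range(len2)}

x = [complex(n % 7, (3 * n) % 5) for n in range(36)]
y1, ref1 = mapped_fft(x, [[3, 2, 2, 3]]), naive_dft2(x, 1, 36)
y2, ref2 = mapped_fft(x, [[3, 2], [2, 3]]), naive_dft2(x, 6, 6)
err_1d = max(abs(y1[k] - ref1[0, k]) for k in range(36))
err_2d = max(abs(y2[k1 + 6 * k2] - ref2[k1, k2]) for k1 in range(6) for k2 in range(6))
print(err_1d < 1e-9, err_2d < 1e-9)
```

The two calls share the digit reversal and the data addressing; only the list of per-dimension structures, and therefore the twiddle parameters, changes between the 1-D and 2-D transforms, and no transposition step appears.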
6. ALGORITHM REALIZED BY LH9124/LH9320
This section discusses the hardware realization of the proposed algorithm. It is impractical to build butterfly modules for all the radixes in the data path of a chip. Therefore, the execution unit (LH9124) of SMT's array processor chip set selects radix-2, radix-4, and radix-16 modules [3]. The radix-16 butterfly is too complex to be directly implemented. Thus, the radix-16 butterfly is actually implemented by two radix-4 stages and can be finished every 16 cycles [7].

The proposed addressing algorithm is realized by a programmable address generator called LH9320 [4]. It provides the address patterns required by the LH9124. Since the radix-16 butterfly is implemented by two radix-4 stages, the algorithm for generating the twiddle factor sequence of the quasi radix-16 stage has to be modified as

    for (k = 0; k <= wbar_1j - 1; k++)
        for (l = 0; l <= w_1j/4 - 1; l++) {
            for (n = 0; n <= 3; n++)
                for (m = 0; m <= 3; m++)
                    { output m * k * w_1j }
            for (n = 0; n <= 3; n++)
                for (m = 0; m <= 3; m++)
                    { output m * (n * wbar_1j + k) * w_1j/4 }
        }
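The quasi radix-16 twiddle generation can be sketched as follows. Two assumptions are made here and are not confirmed by the source: the parameter `w` is taken as it would be for the first of the two radix-4 sub-stages (so it is divisible by 4), and each group of 16 data points receives the 16 twiddles of the first radix-4 pass followed by the 16 twiddles of the second pass.

```python
def quasi_radix16_twiddles(wbar, w):
    """Twiddle exponents for a radix-16 stage built from two radix-4 passes."""
    out = []
    for k in range(wbar):
        for _ in range(w // 4):
            # First radix-4 pass: four butterflies sharing position k.
            for n in range(4):
                for m in range(4):
                    out.append(m * k * w)
            # Second radix-4 pass: positions n * wbar + k, step w / 4.
            for n in range(4):
                for m in range(4):
                    out.append(m * (n * wbar + k) * (w // 4))
    return out

# A single radix-16 stage covering N = 16 points: wbar = 1, w = 4.
tw16 = quasi_radix16_twiddles(1, 4)
print(tw16[16:])
```

For N = 16 the first pass needs no twiddles and the second pass reduces to the exponents m * n of a conventional 4 by 4 decomposition, which is the expected behavior of two cascaded radix-4 stages.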
Table II compares the performance of the 1-D and M-D FFTs. It can be seen that, with the same number of array points, both 1-D and M-D FFTs have the same performance. With a 25 nanosecond machine cycle time, the 256 by 256 2-D complex FFT can be finished within 6.56 milliseconds.

There are several advantages for the proposed M-D FFT implementation. First, the number of instructions required is greatly reduced. Thus, the program memory is not necessary and the performance can be improved by reducing instruction pipeline overhead. For example, the proposed approach requires only 3 instructions to implement a 16 by 16 by 16 3-D FFT, while the traditional approach requires 768 instructions. Second, no data matrix transposition is required because the transpo...

Table II. Performance of FFTs by LH9124/LH9320
FFT                     Instructions   Cycles    Time (µs)
1-D 64K points          4              262416    6560.4
2-D 256 by 256          4              262416    6560.4
3-D 16 by 16 by 16      3
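The quoted execution time follows directly from the cycle count and the 25 ns machine cycle:

$$262416 \times 25\ \text{ns} = 6\,560\,400\ \text{ns} = 6560.4\ \mu\text{s} \approx 6.56\ \text{ms}.$$

The cycle count itself can be split as $262416 = 4 \times 65536 + 272$. One plausible reading, offered here as an assumption rather than a statement from the source, is four radix-16 passes over the 64K points plus a small fixed overhead. Similarly, the figure of 768 instructions for the traditional 16 by 16 by 16 approach equals $3 \times 16 \times 16$, which would correspond to one instruction per 16-point 1-D FFT in each of the three dimensions; this interpretation is also an assumption.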
Fig. 1 Signal Flow Graph of 3@2@2@3 1-D FFT and 3@2 by 2@3 2-D FFT
7. CONCLUSIONS
This paper defines the unified 1-D addressing for the M-D FFT implementation. The addressing sequences can be derived from the factorization of the twiddle factor matrix [6]. The discussion only includes the decimation-in-time digit-reverse-input and normal-output FFT algorithms. Essentially all the results can be extended to other algorithms in a straightforward manner. Algorithms for defining mixed radix 1-D FFT indexing can also be found in [8].

The unified indexing concept of the M-D FFT implementation automatically solves the scaling problem of the block floating-point arithmetic. The concept can also be extended to derive efficient general DSP algorithms for block floating-point arithmetic such as IIR filtering, adaptive filtering, polyphase filter bank, and multichannel DSP [9].

ACKNOWLEDGMENT
The author wishes to thank the System and Design groups of Sharp Microelectronics Technology for practically implementing the unified FFT algorithms in the array processing chip set.
REFERENCES
[1] C. Ju and M. Fleming, "Design concept of real-time array signal processors," Proceeding of the International Conference on Signal Processing Applications and Technology, Boston, pp. 188-197, Nov. 1992.
[2] C. Ju, "Equivalent relationship and unified indexing of FFT algorithms," Proceeding of International Symposium on Circuits and Systems, Chicago, May 1993.
[3] LH9124 Digital Signal Processor User's Guide, Sharp Electronics Corporation.
[4] LH9320 Address Generator User's Guide, Sharp Electronics Corporation.
[5] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, Apr. 1965.
[6] C. Ju, "Derivation and realization of fast Fourier transform," unpublished.
[7] C. Ju, LH9124/LH9320 Fast Fourier Transform Application Note, Sharp Electronics Corporation.
[8] G. L. DeMuth, "Algorithms for defining mixed radix FFT flow graphs," IEEE Trans. on Acoustics, Speech, and Signal Processing, pp. 1349-1358, Sept. 1989.
[9] C. Ju, "General DSP algorithms for block floating-point arithmetic," unpublished.