The Feedforward Short-Time Fourier Transform

doi:10.1109/TCSII.2016.2534838


Mario Garrido Gálvez

Journal Article

N.B.: When citing this work, cite the original article.


Mario Garrido Gálvez, The Feedforward Short-Time Fourier Transform, IEEE Transactions on Circuits and Systems II: Express Briefs, 2016, 63(9), pp. 868-872.

http://dx.doi.org/10.1109/TCSII.2016.2534838

Postprint available at: Linköping University Electronic Press

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-131671

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS 1

The Feedforward Short-Time Fourier Transform

Mario Garrido, Member, IEEE

Abstract—This paper presents the feedforward short-time

Fourier transform (STFT). This new approach is based on reusing the calculations of the STFT at consecutive time instants. This leads to significant savings in hardware components with respect to FFT-based STFTs. Furthermore, the feedforward STFT does not have the accumulative error of iterative STFT approaches. As a result, the proposed feedforward STFT presents an excellent trade-off between hardware utilization and performance.

Index Terms—STFT, FFT, feedforward, pipelined architecture

I. INTRODUCTION

The short-time Fourier transform (STFT) is a linear transform used to calculate the evolution of a signal over time, offering a trade-off between spectral and temporal resolutions. Whereas the fast Fourier transform (FFT) is adequate for the study of stationary signals, i.e., those signals whose parameters do not vary over time such as sinusoids, the STFT is adequate for non-stationary signals. In this case, parameters such as amplitude, frequency and phase can vary over time.

The STFT is mainly used in spectral analysis [1]–[3],

being a key element in many applications such as medical

applications [4]–[6], digital receivers [7]–[9], and musical and

audio signal analysis [10]–[12].

Nowadays there are two main approaches to calculate the STFT in hardware. The first one consists of using several FFT modules in parallel [1]. Each of these modules calculates all the STFT frequencies at a certain time instant. The second approach obtains the values for each frequency independently in an iterative fashion [13], [14].

Although both approaches are useful for calculating the

STFT, they have important drawbacks. On the one hand, the

FFT-based STFT has a high cost in terms of hardware, as it

needs the implementation of multiple FFTs in parallel. On the

other hand, the iterative STFT generates an accumulative error

that increases at each iteration.

This paper presents the feedforward STFT. This new approach has the advantage that it requires significantly less hardware than the FFT-based STFT and, at the same time, it does not generate accumulative errors like the iterative STFT.

The paper is organized as follows. Section II reviews previous approaches for the STFT. Section III explains the basic principle on which the feedforward STFT is based. Section IV presents the feedforward STFT algorithm. Section V shows the proposed feedforward STFT architecture. Section VI presents the STFT for real-valued inputs. Finally, Section VII summarizes the main conclusions of the paper.

M. Garrido is with the Department of Electrical Engineering,

Linköping University, SE-581 83 Linköping, Sweden, e-mail:

mario.garrido.galvez@liu.se


Fig. 1. FFT-based STFT.

II. STFT REVIEW

In digital systems the STFT of a discrete signal x[n] is defined as

$$\mathrm{STFT}_x[n, k] = X[n, k] = \sum_{m=n}^{n+(N-1)} x[m]\, e^{-j \frac{2\pi k}{N} m}, \tag{1}$$

where k = 0, 1, ..., N − 1. In the equation, time and frequency are represented by the discrete variables n and k, respectively.

The STFT at a certain time n corresponds to the FFT of

the sequence from samples x[n] to x[n + (N − 1)]. Thus, one

approach to calculate the STFT is to use N FFT processors in

parallel [1], as shown in Fig. 1. In this case, all the processors

start to calculate the FFT at the same time. Due to the delay

of the buffer, the FFTs start with different samples and, thus,

calculate the FFT at different time instants.
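As a point of reference, the FFT-based view can be sketched in a few lines of Python. This is only a minimal software sketch, not the architecture of Fig. 1. It assumes that the phase is taken relative to the start of each window, so that each row is exactly the DFT of x[n] to x[n + (N − 1)], and it uses a plain O(N^2) DFT in place of an FFT. The names `dft` and `stft_direct` are illustrative and do not come from the paper.

```python
import cmath

def dft(window):
    """Plain N-point DFT of a list of samples (O(N^2), for reference only)."""
    N = len(window)
    return [sum(window[m] * cmath.exp(-2j * cmath.pi * k * m / N) for m in range(N))
            for k in range(N)]

def stft_direct(x, N):
    """One DFT per time instant n, over the window x[n] ... x[n + N - 1]."""
    return [dft(x[n:n + N]) for n in range(len(x) - N + 1)]

x = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0]
X = stft_direct(x, 4)
# X[n][0] is the sum of the samples in window n
print(len(X), X[0][0], X[1][0])
```

Every window is transformed from scratch here, which is the software counterpart of dedicating one FFT processor to each time instant.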

Considering that each FFT is implemented by a single-path delay feedback (SDF) FFT [1], each FFT processor has $2\log_2 N$ adders, $\log_2 N$ multipliers and a total memory size of N. Therefore, the entire STFT has $2N\log_2 N$ adders, $N\log_2 N$ multipliers and a memory size of $N^2$.

Another approach consists of calculating the STFT for each frequency independently [13], [14]. This is done by the iterative structure in Fig. 2, based on the equation:

$$\mathrm{STFT}_x[n, k] = e^{j \frac{2\pi}{N} k} \left( \mathrm{STFT}_x[n-1, k] + x[n-1+N] - x[n-1] \right) \tag{2}$$

Therefore, at each time instant, each frequency is updated with

the incoming value. In this case the number of adders and

multipliers is N, and the total memory size is 2N.

The hardware cost is smaller in the iterative approach compared to the FFT-based one. However, the iterative approach has the drawback that the quantization error accumulates at each iteration [14].
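The iterative update can be illustrated with a short Python sketch. It assumes that each STFT value is the DFT of the current window with the phase taken relative to the start of the window, which is the reading under which the recurrence in (2) holds exactly. The function names are illustrative. The check runs in double-precision floating point, so it only confirms the recurrence; it does not model the fixed-point quantization error discussed above.

```python
import cmath

def dft(window):
    """Plain N-point DFT, used as the reference and for the first window."""
    N = len(window)
    return [sum(window[m] * cmath.exp(-2j * cmath.pi * k * m / N) for m in range(N))
            for k in range(N)]

def stft_iterative(x, N):
    """Update every frequency bin from its previous value, as in (2)."""
    rot = [cmath.exp(2j * cmath.pi * k / N) for k in range(N)]
    X = [dft(x[0:N])]                      # initial window computed directly
    for n in range(1, len(x) - N + 1):
        prev = X[-1]
        # add the incoming sample, drop the outgoing one, then rotate
        X.append([rot[k] * (prev[k] + x[n - 1 + N] - x[n - 1]) for k in range(N)])
    return X

x = [((7 * i) % 5) - 2.0 for i in range(40)]
N = 8
X = stft_iterative(x, N)
# Largest deviation from a direct DFT of each window
err = max(abs(a - b) for n in range(len(X)) for a, b in zip(X[n], dft(x[n:n + N])))
print(err)
```

Because each output is built from the previous one, any rounding introduced at one step is carried into all later steps, which is the mechanism behind the accumulative error.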


Fig. 2. Iterative STFT.

Fig. 3. Flow graph of a 16-point radix-2 DIT FFT.

III. BASIC PRINCIPLE OF THE FEEDFORWARD STFT

The feedforward STFT is based on the radix-2 decimation

in time (DIT) FFT. The DIT algorithm has the property that

the STFT can reuse operations of consecutive FFTs as shown

next.

The flow graph of a radix-2 DIT FFT for N = 16 points is shown in Fig. 3. The numbers at the input represent the index of the input sequence, x[n], whereas those at the output are the frequencies, k, of the output signal X[k]. In the flow graph, the input values are depicted in bit-reversed order and the output frequencies are in natural order.

The flow graph consists of a series of $n = \log_2 N$ stages, s ∈ {1, ..., n}, where additions, subtractions and complex multiplications are calculated. Here n denotes the number of stages, not the time index used in Section II. Additions and subtractions come in pairs, forming a structure called a butterfly. The flow graph in Fig. 3 assumes that the lower edge of each butterfly is always multiplied by −1. These −1 factors are usually not depicted in order to simplify the graphs.

Fig. 4. Basic Principle of the proposed STFT.

The multiplications are represented by the numbers between the stages. Each number, φ, in between the stages indicates a multiplication by the twiddle factor:

$$W_N^{\phi} = e^{-j \frac{2\pi}{N} \phi} \tag{3}$$
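As a small numerical illustration of (3), the following Python sketch evaluates a few twiddle factors for N = 8. The helper name `twiddle` is illustrative. The chosen values are the constants that appear in the 8-point pseudocode of Section IV.

```python
import cmath

def twiddle(N, phi):
    """Twiddle factor W_N^phi = exp(-j*2*pi*phi/N), as in (3)."""
    return cmath.exp(-2j * cmath.pi * phi / N)

w2 = twiddle(8, 2)   # close to -j
w1 = twiddle(8, 1)   # close to 0.7071 - 0.7071j
w3 = twiddle(8, 3)   # close to -0.7071 - 0.7071j
print(w2, w1, w3)
```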

Let us imagine that we are calculating the STFT and, therefore, we have to calculate consecutive FFTs. Fig. 4 shows this case. The first FFT is calculated on samples x[0] to x[15], the second FFT is calculated on samples x[1] to x[16], and the third FFT is calculated on samples x[2] to x[17]. In each consecutive FFT a new sample arrives at the input and another sample is discarded. The second FFT discards sample x[0] and incorporates sample x[16]. Fig. 4 also shows the butterflies of the first stage of the algorithm. It can be observed that all the butterflies of the first stage can be reused for the next FFT except one of them. This means that there are many operations that are repeated among consecutive FFTs and only need to be calculated once. Therefore, if we store previous results, we only need to calculate one butterfly for each incoming sample in order to calculate the STFT. If we generalize this, for the second stage we have to calculate 2 butterflies for each incoming sample and for any stage s ∈ {1, ..., n}, we have to calculate $2^{s-1}$ butterflies.
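The reuse at the first stage can be checked with a short Python sketch. It relies on the fact that the two samples combined by a first-stage butterfly of the radix-2 DIT FFT differ by N/2 positions. The function name `first_stage_pairs` is illustrative.

```python
def first_stage_pairs(start, N):
    """Index pairs combined by the first-stage butterflies of a radix-2 DIT FFT
    computed on samples x[start] ... x[start + N - 1]."""
    return {(start + m, start + m + N // 2) for m in range(N // 2)}

N = 16
fft1 = first_stage_pairs(0, N)   # FFT on x[0] ... x[15]
fft2 = first_stage_pairs(1, N)   # FFT on x[1] ... x[16]
reused = fft1 & fft2             # butterflies shared by both FFTs
new = fft2 - fft1                # butterflies that must be computed
print(len(reused), sorted(new))  # 7 [(8, 16)]
```

Seven of the eight first-stage butterflies of the second FFT are already available from the first one; only the butterfly that involves the incoming sample x[16] is new.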

Regarding multiplications, the DIT algorithm has the particularity that multiplications can be reused for the STFT in the same way as the butterflies. Indeed, in Fig. 3 the flow graph until stage 2 is formed by four equal parts and the flow graph until stage 3 is formed by two equal parts. This shows the large number of operations that are repeated among consecutive STFTs.


IV. PROPOSED FEEDFORWARD STFT ALGORITHM

  for i = 1 : length(input),
      x00 = input(i);                        -- Input Sample
      m1 = mod(i, N/2);                      -- Stage 1
      x10 = b10(m1) + x00;
      x11 = b10(m1) - x00;
      b10(m1) = x00;
      m2 = mod(i, N/4);                      -- Stage 2
      xr11 = (-j) * x11;
      x20 = b20(m2) + x10;
      x21 = b20(m2) - x10;
      x22 = b21(m2) + xr11;
⋆     x23 = b21(m2) - xr11;
      b20(m2) = x10;
      b21(m2) = x11;
      m3 = mod(i, N/8);                      -- Stage 3
      xr21 = (-j) * x21;
      xr22 = (0.7071 - 0.7071j) * x22;
⋆     xr23 = (-0.7071 - 0.7071j) * x23;
      x30 = b30(m3) + x20;
      x31 = b30(m3) - x20;
      x32 = b31(m3) + xr21;
⋆     x33 = b31(m3) - xr21;
      x34 = b32(m3) + xr22;
      x35 = b32(m3) - xr22;
⋆     x36 = b33(m3) + xr23;
⋆     x37 = b33(m3) - xr23;
      b30(m3) = x20;
      b31(m3) = x21;
      b32(m3) = x22;
⋆     b33(m3) = x23;
      STFT(i, 0) = x30;                      -- Outputs
      STFT(i, 1) = x34;
      STFT(i, 2) = x32;
⋆     STFT(i, 3) = x36;
      STFT(i, 4) = x31;
      STFT(i, 5) = x35;
⋆     STFT(i, 6) = x33;
⋆     STFT(i, 7) = x37;
  end
                                                                  (4)

The pseudocode of the proposed feedforward STFT algorithm is shown in equation (4) for the case of N = 8 points. At each iteration the algorithm takes one input value x[i] from the input signal x[n] and provides 8 output frequencies corresponding to time i. They are provided in STFT(i, k), k = 0, ..., N − 1.

The proposed algorithm is separated into stages for an easier explanation. First, the input value x[i] is saved in the variable x00. The first stage consists of a memory b10 and the two variables x10 and x11. The memory implements a circular buffer of size N/2, addressed by the variable m1. Thus, when all the memory has been filled, it starts to be filled again from the beginning. The two variables x10 and x11 are the outputs of the butterfly of the first stage, whose inputs are the value in the buffer and the input signal.

Note that the first stage only calculates a butterfly for each incoming input sample. This agrees with Fig. 4, where only the calculation of one butterfly is needed at the first stage. Likewise, two samples that are operated in the butterfly differ by N/2 samples. This is the reason why the buffer has a length of N/2.

The second stage includes two buffers, b20 and b21, each of length N/4, and four variables x20 to x23 that save the results of the butterflies. Before the butterflies are calculated, the input x11 is multiplied by the twiddle factor $W_8^2 = -j$. After the butterflies are calculated, the buffers b20 and b21 are updated.

The third stage is similar to the second one. It only differs in the twiddle factors that are calculated and the number of butterflies.

After the third stage, the output of the STFT is obtained and stored in the variable 'STFT'. For this purpose, the outputs are assigned according to the bit reversal algorithm [15].
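The output assignment in (4) can be reproduced with a few lines of Python. This is only a sketch of the index mapping, with an illustrative helper `bit_reverse`; it is not the bit reversal circuit of [15].

```python
def bit_reverse(q, bits):
    """Reverse the lowest `bits` bits of q."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (q & 1)
        q >>= 1
    return r

# Frequency index k assigned to each third-stage output x30 ... x37 for N = 8
order = [bit_reverse(q, 3) for q in range(8)]
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

The list matches the assignments in (4): x30 goes to frequency 0, x31 to frequency 4, x32 to frequency 2, and so on.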

In the general case for any N, the pseudocode of the feedforward STFT algorithm is shown in equation (5).

  for i = 1 : length(input),
      for s = 1 : n,
          m = mod(i, N/2^s);
          for k = 0 : 2 : 2^s - 1,
              xr = x_{s-1,k/2} * TW(s, k, n);
              x_{s,k} = b_{s,k/2}(m) + xr;
              x_{s,k+1} = b_{s,k/2}(m) - xr;
          end
          for k = 0 : 2^{s-1} - 1,
              b_{s,k}(m) = x_{s-1,k};
          end
      end
  end
                                                                  (5)
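The general algorithm can be sketched in Python as follows. This is a hedged software sketch under stated assumptions rather than a definitive implementation. The paper does not spell out TW(s, k, n), so the twiddle exponent used here (the bit-reversed path index scaled by N/2^s) is an assumption chosen to reproduce the constants of the 8-point case in (4). The buffers are assumed to start at zero, and the names `feedforward_stft` and `bit_reverse` are illustrative.

```python
import cmath

def bit_reverse(q, bits):
    """Reverse the lowest `bits` bits of q."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (q & 1)
        q >>= 1
    return r

def feedforward_stft(x, N):
    """Return one list of N frequency bins per input sample (N a power of 2).

    Once the buffers are full (i >= N - 1), output i is the DFT of
    x[i - N + 1] ... x[i].
    """
    n = N.bit_length() - 1                 # number of stages, N = 2**n
    # buf[s][p]: circular buffer of path p in stage s, length N / 2**s
    buf = {s: [[0j] * (N >> s) for _ in range(1 << (s - 1))]
           for s in range(1, n + 1)}
    # Constant twiddle of path p in stage s (assumed exponent, see lead-in)
    tw = {s: [cmath.exp(-2j * cmath.pi * bit_reverse(p, s - 1) * (N >> s) / N)
              for p in range(1 << (s - 1))]
          for s in range(1, n + 1)}
    out = []
    for i, sample in enumerate(x):
        prev = [complex(sample)]
        for s in range(1, n + 1):
            m = i % (N >> s)
            cur = []
            for p, value in enumerate(prev):
                xr = value * tw[s][p]
                old = buf[s][p][m]
                cur += [old + xr, old - xr]   # one butterfly per path
                buf[s][p][m] = value          # store the unrotated input
            prev = cur
        # last-stage outputs are in bit-reversed frequency order
        out.append([prev[bit_reverse(k, n)] for k in range(N)])
    return out

x = [float((3 * i) % 7) for i in range(20)]
Y = feedforward_stft(x, 8)
print(Y[7][0])   # bin 0 of the first full window: sum of x[0] ... x[7]
```

No output depends on a previous output, only on stored stage inputs, so rounding in one time instant is not propagated to later ones.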

The number of additions of the proposed algorithm is 2N − 2 and the number of multiplications is N − 1 per incoming sample. This can be compared to the use of an FFT-based STFT in software: the FFT-based STFT requires calculating one FFT for each incoming sample. In software this means $N\log_2 N$ additions and $(N/2)\log_2 N$ multiplications per incoming sample. Thus, the proposed algorithm has fewer additions and multiplications.

Fig. 6 compares the proposed algorithm to the FFT-based STFT for the case of an 8-point STFT, using an Intel CORE i3 and MATLAB R2009b. The simulation runs 100 iterations of each algorithm. Fig. 6 shows the time of each iteration. In the proposed algorithm the average time per iteration is 13.29 µs, whereas the average time for the FFT-based approach is 15.69 µs. The FFT-based approach therefore takes about 18 % longer per iteration than the proposed algorithm, which amounts to a time saving of about 15 % for the proposed algorithm.

V. PROPOSED FEEDFORWARD STFT ARCHITECTURE

The hardware implementation of the feedforward STFT is shown in Fig. 5 for the case of N = 16. It consists of four stages with butterflies, multipliers and buffers. Boxed numbers represent the length of the buffers, whereas numbers close to the multipliers indicate the value φ of the twiddle factor. Note that all the multipliers in the feedforward STFT multiply by a constant, which simplifies the design.

Fig. 5. Hardware architecture of the proposed 16-point radix-2 DIT STFT.

Fig. 6. Execution time of the proposed vs. FFT-based algorithm, in an Intel CORE i3 using MATLAB R2009b. (Plot of time in seconds, on a scale of 10^−5, versus iteration, with the curves "Proposed algorithm" and "Using FFT".)

As in the feedforward STFT algorithm, the number of butterflies and multipliers increases with the stage. Specifically, the number of adders in stage s is $2^s$, the number of constant multipliers is $2^{s-1}$, and the memory is N/2. This leads to a total of 2N − 2 adders, N − 1 multipliers and a memory size of $(N/2)\log_2 N$ in the entire architecture.
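The totals follow from summing the per-stage figures over the $n = \log_2 N$ stages. The short derivation below is added for convenience and uses only the per-stage counts stated above:

```latex
\begin{align*}
\text{adders:} \quad & \sum_{s=1}^{n} 2^{s} = 2^{n+1} - 2 = 2N - 2, \\
\text{multipliers:} \quad & \sum_{s=1}^{n} 2^{s-1} = 2^{n} - 1 = N - 1, \\
\text{memory:} \quad & \sum_{s=1}^{n} \frac{N}{2} = \frac{N}{2}\, n = \frac{N}{2} \log_2 N .
\end{align*}
```

For the 16-point case of Fig. 5 (n = 4), this gives 30 adders, 15 constant multipliers and a memory size of 32.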

From the DIT FFT algorithm, the feedforward STFT architecture inherits the property that the output of each stage s provides the result of a $2^s$-point STFT.

Table I compares the FFT-based, iterative and feedforward STFTs. Compared to the FFT-based STFT, the feedforward STFT reduces the number of adders and multipliers and the amount of memory significantly. Compared to the iterative STFT, the feedforward STFT has a comparable number of adders and multipliers and has the advantage that it does not have accumulative error in the calculations.

TABLE I
COMPARISON OF STFT HARDWARE IMPLEMENTATIONS

STFT Implementation   FFT-based     Iterative   Feedforward
Adders                2N log2 N     N           2N − 2
Multipliers           N log2 N      N           N − 1
Memory                N^2           2N          (N/2) log2 N
Accumulative Error    No            Yes         No
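To make the comparison concrete, the expressions of Table I can be evaluated for a given N. The Python sketch below only tabulates the formulas stated in the paper; the function name `stft_resources` and the dictionary layout are illustrative.

```python
def stft_resources(N):
    """Adders, multipliers and memory of the three approaches in Table I."""
    n = N.bit_length() - 1   # log2(N) for N a power of two
    return {
        "FFT-based":   {"adders": 2 * N * n, "multipliers": N * n, "memory": N * N},
        "Iterative":   {"adders": N,         "multipliers": N,     "memory": 2 * N},
        "Feedforward": {"adders": 2 * N - 2, "multipliers": N - 1, "memory": (N // 2) * n},
    }

table = stft_resources(16)
for name, cost in table.items():
    print(name, cost)
```

For N = 16 the feedforward STFT needs 30 adders and 15 multipliers, against 128 adders and 64 multipliers for the FFT-based STFT and 16 of each for the iterative one.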

VI. STFT FOR REAL-VALUED INPUTS

The STFT version for real-valued inputs is derived from the proposed STFT by following the approach in [16]. This approach considers the property that in a real-valued FFT $X[N-k] = X^{*}[k]$. This allows for removing the calculation of part of the flow graph as shown in Fig. 8. Specifically, X[12] is obtained from X[4]; X[14] and X[6] from X[2] and X[10]; and X[15], X[7], X[11] and X[3] from X[1], X[9], X[5] and X[13], respectively.
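The symmetry can be verified numerically with a short Python sketch. It uses a plain DFT on an arbitrary real-valued 16-point sequence; the sample values and the helper name `dft` are illustrative and are not taken from the paper.

```python
import cmath

def dft(window):
    """Plain N-point DFT of a list of samples."""
    N = len(window)
    return [sum(window[m] * cmath.exp(-2j * cmath.pi * k * m / N) for m in range(N))
            for k in range(N)]

# Arbitrary real-valued input of length N = 16
x = [0.5, -1.0, 2.0, 3.0, 0.0, 1.5, -2.5, 1.0,
     4.0, -0.5, 0.25, 2.0, -3.0, 1.0, 0.0, 0.75]
X = dft(x)
N = len(x)
# Largest deviation from X[N - k] = conj(X[k]) over k = 1 ... N - 1
err = max(abs(X[N - k] - X[k].conjugate()) for k in range(1, N))
print(err)
```

Since X[12] equals the conjugate of X[4], and likewise for the other pairs listed above, only one frequency of each pair has to be computed.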

The impact on the feedforward STFT algorithm is that certain lines of the algorithm can be removed. Specifically, the lines with a star (⋆) in equation (4) do not need to be calculated. They correspond to the frequencies that can be obtained from their symmetric frequency.

With respect to the STFT architecture, a number of butterflies, buffers and multipliers can be removed from the feedforward STFT in Fig. 5, leading to the architecture in Fig. 7. Specifically, the datapaths for the frequencies that can be obtained from their symmetric frequency are removed.

Furthermore, in the real feedforward STFT in Fig. 7, some
