Automatic implementation of FIR filters on field programmable gate arrays

S. Mohanakrishnan, J. B. Evans
01 Mar 1995, Vol. 2, Iss. 3, pp. 51-53
Abstract
This letter describes a CAD system for automatic implementation of FIR filters on Xilinx field programmable gate arrays (FPGA). Given the frequency specifications, this software package designs an FIR filter, optimizes the filter coefficients in the power of two coefficient space, and implements it on FPGA chips. The FPGA specific mapping techniques used to increase speed are discussed. The performance of the typical filters that were implemented is presented.

AUTOMATIC IMPLEMENTATION OF FIR
FILTERS ON FIELD PROGRAMMABLE
GATE ARRAYS
Satish Mohanakrishnan and Joseph B. Evans
Telecommunications & Information Sciences Laboratory
Department of Electrical Engineering & Computer Science
University of Kansas
Lawrence, KS 66045-2228
October 7, 1993
ABSTRACT
This paper describes a CAD system for automatic implementation of FIR filters
on Xilinx Field Programmable Gate Arrays. Given the frequency specifications,
this software package designs an FIR filter, optimizes the filter coefficients in the
power of two coefficient space and implements it on an FPGA chip. The FPGA
specific mapping techniques used to increase speed are discussed. The performance
of the typical filters which were implemented is presented.
This research is partially supported by the Kansas Technology Enterprise Corporation through
the Center for Excellence in Computer-Aided Systems Engineering and by the University of Kansas
General Research allocation 3626-20-0038.

Figure 1: FIR Filter Structure (labels: input x_k, output y_k, delay elements D, tap weights w_0 through w_{N-1})
1 Introduction
Finite Impulse Response (FIR) filters without full multipliers and their potential high-speed
VLSI implementations have received attention over the past decade [1, 2, 3, 4].
An efficient FIR filter architecture suitable for Field Programmable Gate Arrays
(FPGAs), which requires each coefficient to be a sum or difference of two
power-of-two terms, was discussed in [1]. In this paper, we present an improved filter tap
structure and several mapping techniques that were used to increase the sampling
rate. This paper also describes a CAD system that can be used for the design of
FIR filters, optimization of filter coefficients in the discrete coefficient space, and
subsequent implementation on Xilinx XC3100 series FPGAs.
2 Background
In binary arithmetic, multiplication by a power-of-two is simply a shift operation.
Implementation of systems with multiplications may be simplified by using only a
limited number of power-of-two terms, so that only a limited number of shift and
add operations are required.
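As a minimal sketch of the shift-and-add idea, the following Python fragment multiplies an integer sample by a coefficient expressed as a sum or difference of two power-of-two terms. The function names and the (sign, shift) pair encoding are illustrative assumptions, not notation from the paper.

```python
def shift_term(x, sign, shift):
    """Return sign * x * 2**shift using only a shift and an optional negation."""
    shifted = x << shift
    return shifted if sign > 0 else -shifted


def multiply_two_terms(x, term_a, term_b):
    """Multiply x by a coefficient given as two (sign, shift) power-of-two terms."""
    return shift_term(x, *term_a) + shift_term(x, *term_b)


# Hypothetical example: the coefficient 6 written as 2**3 - 2**1.
product = multiply_two_terms(13, (+1, 3), (-1, 1))
```

Here `product` equals `13 * 6`, obtained with two shifts and one subtraction instead of a full multiplier.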
In order to obtain good performance using a small number of such terms, the
number of power-of-two terms used in approximating each coefficient value, the
architecture of the filter, and the optimization technique used to derive the discrete
space coefficient values must be carefully selected. It was demonstrated in [4] that
an FIR filter with -60dB of frequency response ripple magnitude can be realized
using two power-of-two terms for each coefficient value.
An inverted form FIR filter, which will be used in our FPGA implementations,
is depicted in Figure 1. If the coefficient value is an integer power-of-two or a sum
of two powers-of-two, the multipliers can be replaced by one or two shifters. Since
the coefficients will be fixed for this class of filter, the coefficient values can be
realized by appropriately routing the inputs to the full adders in the filter structure.
That is, moving the adder inputs k places to the left achieves the same effect as
would a coefficient value of 2^k.
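The behaviour of the inverted form structure can be sketched in software as follows. This is a behavioural model under stated assumptions, not the authors' implementation: each coefficient is a list of hypothetical (sign, shift) pairs, samples are plain integers, and `direct_fir` is included only as a reference for checking the result.

```python
def tap_product(x, coeff):
    """Form x times a coefficient given as (sign, shift) power-of-two terms."""
    total = 0
    for sign, shift in coeff:
        shifted = x << shift  # fixed routing in hardware, a shift here
        total += shifted if sign > 0 else -shifted
    return total


def inverted_fir(samples, coeffs):
    """Inverted (transposed) form: the input is broadcast to every tap and
    partial sums move through the delay elements toward the output."""
    regs = [0] * (len(coeffs) - 1)  # one delay element between adjacent taps
    outputs = []
    for x in samples:
        products = [tap_product(x, c) for c in coeffs]
        outputs.append(products[0] + (regs[0] if regs else 0))
        # Ascending order reads regs[i + 1] before it is overwritten.
        for i in range(len(regs)):
            upstream = regs[i + 1] if i + 1 < len(regs) else 0
            regs[i] = products[i + 1] + upstream
    return outputs


def direct_fir(samples, weights):
    """Reference convolution y_k = sum_i w_i * x_(k-i), used only for checking."""
    return [sum(w * samples[k - i] for i, w in enumerate(weights) if k - i >= 0)
            for k in range(len(samples))]


# Hypothetical coefficients: 2 + 1 = 3, 4 - 1 = 3, -2 - 1 = -3.
coeffs = [[(+1, 1), (+1, 0)], [(+1, 2), (-1, 0)], [(-1, 1), (-1, 0)]]
weights = [tap_product(1, c) for c in coeffs]
samples = [1, 2, 3, 4]
result = inverted_fir(samples, coeffs)
```

For these inputs the inverted form and the direct convolution produce the same output sequence.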
Figure 2: FIR Filter Architecture (labels: input, filter taps, output adder, path widths B_d and B_i)
3 Architecture
The overall filter structure is shown in Figure 2, where the filter taps and final adder
stage are shown. The adder is required to resolve the carries that are generated
and propagated through the pipeline. The structure of a portion of a typical filter
tap is shown in Figure 3, where the internal pipeline is depicted. The two shifted
versions of the data corresponding to the two power-of-two components of each
coefficient are shown as dotted lines. Two adders are necessary for adding the sum
and carry generated by the previous tap and the two shifted versions. The sign of
the coefficients is controlled by inverters. The sum and carry signals from the full
adders are pipelined using a carry-save addition (CSA) technique in order to increase
the sampling rate and alleviate potential routing delays. The hardware requirements
for a tap with B_d input datapath bits and B_i intermediate accumulation path bits are
then 2 B_i full adders and a minimum of 2 B_i flip-flops.
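The carry-save step inside a tap can be illustrated with a short bit-level sketch. It assumes unsigned words of `width` bits with negative terms already reduced modulo 2**width (two's complement); the function names are hypothetical, and the cost helper simply restates the 2 B_i figures given above.

```python
def carry_save_add(a, b, c, width):
    """One row of full adders: compress three words into a sum word and a
    carry word without propagating carries along the row."""
    mask = (1 << width) - 1
    sum_word = (a ^ b ^ c) & mask
    carry_word = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return sum_word, carry_word


def tap(sum_in, carry_in, term1, term2, width):
    """Two adder rows per tap: the first adds one shifted data term to the
    incoming sum and carry, the second adds the other shifted term."""
    s1, c1 = carry_save_add(sum_in, carry_in, term1, width)
    return carry_save_add(s1, c1, term2, width)


def tap_cost(b_i):
    """Hardware for one tap as stated in the text: 2*B_i full adders and at
    least 2*B_i flip-flops."""
    return {"full_adders": 2 * b_i, "min_flip_flops": 2 * b_i}


width = 8
s_out, c_out = tap(23, 42, 17, (-5) % (1 << width), width)
resolved = (s_out + c_out) % (1 << width)  # the job of the final adder stage
```

The sum and carry words are only combined once, in the final adder, which is why each tap's critical path is a single full adder delay rather than a full carry chain.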
4 CAD Tool
4.1 Filter Design and Optimization
Figure 3: FIR Filter Tap Structure (labels: two rows of full adders FA with inputs X_3..X_0, S_3..S_0 and C_3..C_0, and delay elements D)
The first stage in the design process is to obtain the filter coefficients. Given
the frequency specifications, MILP3, written by Y.C. Lim [5], is used to obtain a
continuous solution (which assumes infinite precision coefficient values). MILP3
then uses standard integer programming techniques to optimize this continuous solution
in the discrete powers-of-two coefficient space [2]. The resulting discrete solution
has coefficients which are a sum or difference of power-of-two terms.
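To make the discrete coefficient space concrete, the sketch below enumerates all values formed from at most two signed power-of-two terms and rounds each continuous coefficient to the nearest one. This simple rounding is a stand-in chosen for illustration and is not the integer programming optimization performed by MILP3; the windowed-sinc prototype, the cutoff of 0.2, and the 8-bit shift range are likewise assumptions.

```python
import cmath
import math


def two_term_values(max_shift):
    """All values s1*2**-k1 + s2*2**-k2 with k in 1..max_shift, plus
    single-term values and zero."""
    terms = [0.0]
    for k in range(1, max_shift + 1):
        terms += [2.0 ** -k, -(2.0 ** -k)]
    return sorted({a + b for a in terms for b in terms})


def quantize(coeffs, values):
    """Round each coefficient to the nearest representable value."""
    return [min(values, key=lambda v: abs(v - c)) for c in coeffs]


def response(h, omega):
    """Frequency response H(e^{j omega}) of the FIR filter h."""
    return sum(hk * cmath.exp(-1j * omega * k) for k, hk in enumerate(h))


def lowpass(num_taps, cutoff):
    """Hamming-windowed sinc lowpass; cutoff is in cycles per sample."""
    mid = (num_taps - 1) / 2
    h = []
    for k in range(num_taps):
        t = k - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        h.append(ideal * window)
    return h


h = lowpass(11, 0.2)            # continuous (infinite precision) solution
values = two_term_values(8)     # discrete coefficient space
hq = quantize(h, values)        # rounded solution
grid = [math.pi * i / 256 for i in range(257)]
max_dev = max(abs(response(h, w) - response(hq, w)) for w in grid)
bound = sum(abs(a - b) for a, b in zip(h, hq))
```

The response deviation `max_dev` can never exceed `bound`, the sum of the coefficient errors. An optimizer that searches the discrete space jointly, as described above, can generally do better than independent rounding.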
4.2 Xilinx Implementation
The output of the optimization stage is fed to code which maps the filter onto the
FPGA. With the help of the Xilinx tools, the configuration details for the FPGA are
then generated.
4.2.1 Place and Route
Due to the limited availability of global and local routing resources, placement of
Configurable Logic Blocks (CLBs) and routing of nets are very critical in any FPGA
design. The Automatic Place and Route (APR) program provided by Xilinx cannot
be used to provide 100% placement for the following reasons. Due to the large
number of variables in the optimization problem, it takes an exorbitant amount of
CPU time for placement. Since it is a general purpose package based on heuristic
methods, it cannot always give the optimum placement for all designs. For
instance, when APR was given full freedom of placement over the entire 22 x 22
array of CLBs for an 11 tap filter, it took 9 hours 2 minutes and 27 seconds on a Sun
SPARCstation-2 to complete placement and routing.
Figure 4: Mapping of the filter architecture on the Xilinx FPGA (CLB columns labelled 1 and 2)
The mapping of the architecture in Figure 3 is shown in Figure 4, where each
full adder is implemented in a Configurable Logic Block (CLB). The two rows of
full adders map to alternate columns of the chip referred to as 1 and 2 as shown in
Figure 4. To reduce congestion, the two shifted versions of the data are distributed
among the two sets of full adders, whereas in the previous approach [1], they were
routed to the first set of full adders. In the previous tap structure, the sum outputs
of the second set of full adders in any tap are fed to the corresponding full adders in
the next tap, which are two columns away. When the new structure is mapped onto
the FPGA, the routing is only between CLBs which are in the adjacent columns.
This makes more efficient use of the local routing resources. This structure has been
found to achieve an improvement of 5 - 15% in the sampling rate for several typical
filters.
The input data bus is distributed using horizontal long lines from one end of
the chip to the other. By careful assignment of input pins and hence the horizontal
long lines to the data, it is possible to reduce the maximum distance between
any horizontal long line and a CLB where the data is needed. The assignment
which gives the least distance, and hence the least delay, varies from filter to filter. This
optimization problem, with a relatively small number of variables, is
solved very effectively by APR to give an improvement of 20 - 30% in the sampling
rate over the unoptimized placement.
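The flavour of this assignment problem can be shown with a toy model. Everything here is a hypothetical simplification: long lines and CLBs are identified only by row index, distance is the row difference, and an exhaustive search stands in for APR, which the text says is what actually solves the problem.

```python
from itertools import permutations


def worst_distance(assignment, needs):
    """Largest row distance between the long line carrying a data bit and
    any CLB row that consumes that bit."""
    return max(abs(assignment[bit] - row)
               for bit, rows in needs.items() for row in rows)


def best_assignment(needs, line_rows):
    """Try every assignment of data bits to distinct long-line rows and keep
    the one with the smallest worst-case distance."""
    bits = sorted(needs)
    best = None
    for perm in permutations(line_rows, len(bits)):
        assignment = dict(zip(bits, perm))
        cost = worst_distance(assignment, needs)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Hypothetical example: three data bits, each needed in two CLB rows.
needs = {0: [0, 1], 1: [3, 4], 2: [1, 2]}
naive = {0: 0, 1: 1, 2: 2}  # bits assigned to rows in index order
naive_cost = worst_distance(naive, needs)
best_cost, best_map = best_assignment(needs, range(5))
```

In this example the index-order assignment leaves one bit three rows from where it is needed, while the searched assignment brings the worst case down to one row.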
References

VLSI array processors
Sun-Yuan Kung, 01 Jan 1985
A general overview of VLSI array processors, with a unified treatment from algorithm, architecture, and application perspectives. Application domains include digital filtering, spectrum estimation, adaptive array processing, image/vision processing, and seismic and tomographic signal processing.

FIR filter design over a discrete powers-of-two coefficient space
Digital filters with coefficient values selected from the powers-of-two coefficient space are designed using integer programming; the resulting frequency responses are shown to be superior to those obtained by simply rounding the coefficients.

Design of cascade form FIR filters with discrete valued coefficients
By cascading two direct-form FIR filters, each with coefficients that are the sum or difference of two power-of-two terms, it is possible to achieve very small peak ripple.

An input data word size of 10 bits was used for all the examples; the 22 rows provide sufficient intermediate word width protection against overflow. 

With the Xilinx XC3195, which has an array of 22 by 22 (484) CLBs, the maximum intermediate wordlength that can be accommodated is 22 bits.

The Automatic Place and Route (APR) program typically requires 10 - 15 minutes for routing this type of FIR filter implementation. 

It is possible to implement the final adder stage in the XC4000 series of FPGAs, however, by virtue of the fast carry logic supported by these devices. 
