What are the hardware requirements for a tap with Bd input datapath bits and Bi intermediate accumulation path bits?

The hardware requirements for a tap with Bd input datapath bits and Bi intermediate accumulation path bits are then 2Bi full adders and a minimum of 2Bi flip-flops.

How many input data bits were used for each example?

An input data word size of 10 bits was used for all the examples; the 22 rows provide sufficient intermediate word width protection against overflow.

How many bits can be accommodated in the XC3195?

With the Xilinx XC3195, which has an array of 22 by 22 (484) CLBs, the maximum intermediate wordlength that can be accommodated is 22 bits.

Why are placement and routing so critical in FPGA design?

Due to the limited availability of global and local routing resources, placement of Configurable Logic Blocks (CLBs) and routing of nets are very critical in any FPGA design.

How long did it take to complete the placement and routing?

For instance, when APR was given full freedom of placement for all of the 22 x 22 array of CLBs for an 11 tap filter, it took 9 hours 2 minutes and 27 secs on a Sun SPARCstation–2 for the completion of placement and routing.

How long does APR typically take to route this type of FIR filter implementation?

The Automatic Place and Route (APR) program typically requires 10 - 15 minutes for routing this type of FIR filter implementation.

What is the description of the FIR filter architecture?

An efficient FIR filter architecture suitable for Field Programmable Gate Arrays (FPGA), which requires the coefficients to be a sum or difference of two power-of-two terms, was discussed in [1].

What must be carefully selected to obtain good performance with a small number of power-of-two terms?

In order to obtain good performance using a small number of such terms, the number of power-of-two terms used in approximating each coefficient value, the architecture of the filter, and the optimization technique used to derive the discrete space coefficient values must be carefully selected.

How can the authors implement the final adder stage in the XC4000 series?

It is possible to implement the final adder stage in the XC4000 series of FPGAs, however, by virtue of the fast carry logic supported by these devices.

(Open Access) Automatic implementation of FIR filters on field programmable gate arrays (1995) | S. Mohanakrishnan

What is the purpose of this paper?

In this paper, the authors present an improved filter tap structure and several mapping techniques which were used to increase the sampling rate.

How many power-of-two terms can be used for each coefficient value?

It was demonstrated in [4] that an FIR filter with -60dB of frequency response ripple magnitude can be realized using two power-of-two terms for each coefficient value.

AUTOMATIC IMPLEMENTATION OF FIR FILTERS ON FIELD PROGRAMMABLE GATE ARRAYS

Satish Mohanakrishnan and Joseph B. Evans

Telecommunications & Information Sciences Laboratory

Department of Electrical Engineering & Computer Science

University of Kansas

Lawrence, KS 66045-2228

October 7, 1993

ABSTRACT

This paper describes a CAD system for automatic implementation of FIR ﬁlters

on Xilinx Field Programmable Gate Arrays. Given the frequency speciﬁcations,

this software package designs an FIR ﬁlter, optimizes the ﬁlter coefﬁcients in the

power of two coefﬁcient space and implements it on an FPGA chip. The FPGA

speciﬁc mapping techniques used to increase speed are discussed. The performance

of the typical ﬁlters which were implemented is presented.

This research is partially supported by the Kansas Technology Enterprise Corporation through

the Center for Excellence in Computer-Aided Systems Engineering and by the University of Kansas

General Research allocation 3626-20-0038.

Figure 1: FIR Filter Structure

1 Introduction

Finite Impulse Response filters without full multipliers and their potential high speed

VLSI implementations have received attention over the past decade [1, 2, 3, 4].

An efﬁcient FIR ﬁlter architecture suitable for Field Programmable Gate Arrays

(FPGA), which requires the coefﬁcients to be a sum or difference of two power-

of-two terms was discussed in [1]. In this paper, we present an improved ﬁlter tap

structure and several mapping techniques which were used to increase the sampling

rate. This paper also describes a CAD system which can be used for design of

FIR ﬁlters, optimization of ﬁlter coefﬁcients in the discrete coefﬁcient space, and

subsequent implementation on Xilinx XC3100 series FPGAs.

2 Background

In binary arithmetic, multiplication by a power-of-two is simply a shift operation.

Implementation of systems with multiplications may be simpliﬁed by using only a

limited number of power-of-two terms, so that only a limited number of shift and

add operations are required.
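As a small illustration of the shift-and-add idea, the sketch below multiplies an integer sample by a coefficient written as a sum or difference of two power-of-two terms. The function name and the (shift, sign) parameterization are assumptions made for this example and do not come from the paper.

```python
def multiply_by_two_term_coefficient(x, shift_a, sign_a, shift_b, sign_b):
    """Multiply x by (sign_a * 2**shift_a + sign_b * 2**shift_b).

    Only two shifts and one add/subtract are used, so no general
    multiplier is needed. Names and argument layout are hypothetical.
    """
    term_a = x << shift_a  # x * 2**shift_a
    term_b = x << shift_b  # x * 2**shift_b
    return sign_a * term_a + sign_b * term_b


# Coefficient 12 = 2**3 + 2**2 applied to the sample 5.
product_sum = multiply_by_two_term_coefficient(5, 3, +1, 2, +1)
# Coefficient 7 = 2**3 - 2**0 applied to the sample 5.
product_diff = multiply_by_two_term_coefficient(5, 3, +1, 0, -1)
```

In hardware the shifts cost nothing beyond wiring, which is why restricting coefficients to this form is attractive.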

In order to obtain good performance using a small number of such terms, the

number of power-of-two terms used in approximating each coefﬁcient value, the

architecture of the ﬁlter, and the optimization technique used to derive the discrete

space coefﬁcient values must be carefully selected. It was demonstrated in [4] that

an FIR ﬁlter with -60dB of frequency response ripple magnitude can be realized

using two power-of-two terms for each coefﬁcient value.
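To make the two-term restriction concrete, the following sketch rounds a single coefficient to the nearest sum or difference of at most two power-of-two terms by exhaustive search. This is plain per-coefficient rounding, used here only for illustration; it is not the optimization procedure of [4], which selects coefficients against the frequency response of the whole filter. The function name and the `max_shift` limit are assumptions.

```python
from itertools import product


def nearest_two_term(value, max_shift=8):
    """Return (approximation, (term_1, term_2)) for a coefficient value.

    Each term is 0 or a signed power of two 2**-k with 0 <= k <= max_shift.
    Hypothetical helper: simple rounding, not a filter-level optimization.
    """
    candidates = [0.0] + [
        sign * 2.0 ** -k for k in range(max_shift + 1) for sign in (1, -1)
    ]
    best = min(
        product(candidates, repeat=2),
        key=lambda pair: abs(value - (pair[0] + pair[1])),
    )
    return best[0] + best[1], best


approx, terms = nearest_two_term(0.375)  # 0.375 is exactly 2**-2 + 2**-3
```

Rounding each coefficient independently generally gives a worse frequency response than a joint search over the discrete coefficient space, which is the motivation for the careful selection described above.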

An inverted form FIR ﬁlter, which will be used in our FPGA implementations,

is depicted in Figure 1. If the coefﬁcient value is an integer power-of-two or a sum

of two powers-of-two, the multipliers can be replaced by one or two shifters. Since

the coefﬁcients will be ﬁxed for this class of ﬁlter, the coefﬁcient values can be

realized by appropriately routing the inputs to the full adders in the ﬁlter structure.

That is, moving the adder inputs n places to the left achieves the same effect as

would a coefficient value of 2^n.
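The inverted form structure can be modeled behaviorally as below. This is a sketch under assumptions: integer coefficients stand in for the hard-wired shifts, the delay elements are modeled as a Python list, and the function name is invented for the example.

```python
def inverted_form_fir(samples, coefficients):
    """Behavioral model of an inverted (transposed) form FIR filter.

    Every tap sees the current input sample at the same time; the partial
    sums move through the delay elements toward the output.
    """
    num_taps = len(coefficients)
    registers = [0] * (num_taps - 1)  # one delay element between adjacent taps
    outputs = []
    for x in samples:
        # In the FPGA each product is a wired shift; here it is an integer multiply.
        products = [c * x for c in coefficients]
        outputs.append(products[0] + (registers[0] if registers else 0))
        # Update in ascending order so each register reads the old upstream value.
        for k in range(num_taps - 1):
            upstream = registers[k + 1] if k + 1 < num_taps - 1 else 0
            registers[k] = products[k + 1] + upstream
    return outputs


impulse_response = inverted_form_fir([1, 0, 0, 0], [1, 2, 3])
```

Feeding a unit impulse returns the coefficients in order, which is a quick check that the delay line is wired correctly.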

Figure 2: FIR Filter Architecture

3 Architecture

The overall ﬁlter structure is shown in Figure 2, where the ﬁlter taps and ﬁnal adder

stage are shown. The adder is required to resolve the carries that are generated

and propagated through the pipeline. The structure of a portion of a typical ﬁlter

tap is shown in Figure 3, where the internal pipeline is depicted. The two shifted

versions of the data corresponding to the two power-of-two components of each

coefﬁcient are shown as dotted lines. Two adders are necessary for adding the sum

and carry generated by the previous tap and the two shifted versions. The sign of

the coefﬁcients is controlled by inverters. The sum and carry signals from the full

adders are pipelined using a carry-save addition (CSA) technique in order to increase

the sampling rate and alleviate potential routing delays. The hardware requirements

for a tap with Bd input datapath bits and Bi intermediate accumulation path bits are

then 2Bi full adders and a minimum of 2Bi flip-flops.
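A bit-level sketch of one tap may help make the carry-save idea concrete. It assumes that each of the two rows of full adders compresses three words into a sum word and a carry word, that words wrap modulo 2**width, and that the sign inverters are left out; all function names are hypothetical. The resource count simply restates the 2Bi figures given above.

```python
def carry_save_add(a, b, c, width):
    """One row of full adders: reduce three words to a sum word and a carry word."""
    mask = (1 << width) - 1
    sum_word = (a ^ b ^ c) & mask  # per-bit sum output of each full adder
    carry_word = (((a & b) | (a & c) | (b & c)) << 1) & mask  # carries move up one bit
    return sum_word, carry_word


def filter_tap(prev_sum, prev_carry, shifted_a, shifted_b, width):
    """Two adder rows: add the previous tap's sum and carry to both shifted inputs."""
    s1, c1 = carry_save_add(prev_sum, prev_carry, shifted_a, width)
    return carry_save_add(s1, c1, shifted_b, width)


def tap_resources(bi):
    """Hardware per tap for Bi intermediate accumulation path bits."""
    return {"full_adders": 2 * bi, "min_flip_flops": 2 * bi}


# The carries are only resolved at the end, here by an ordinary addition.
width = 22
s, c = filter_tap(1000, 24, 5 << 3, 5 << 1, width)
resolved = (s + c) & ((1 << width) - 1)
```

No carry ripples across the word inside a tap, so the critical path is a single full adder per row; the final adder stage performs the one carry-propagating addition.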

4 CAD Tool

4.1 Filter Design and Optimization

The ﬁrst stage in the design process is to obtain the ﬁlter coefﬁcients. Given

the frequency speciﬁcations, MILP3, written by Y.C. Lim [5] is used to obtain a

continuous solution (which assumes infinite precision coefficient values). MILP3

uses standard integer programming techniques to optimize this continuous solution

in the discrete powers-of-two coefficient space [2]. The resulting discrete solution

has coefficients which are a sum or difference of power-of-two terms.

Figure 3: FIR Filter Tap Structure

4.2 Xilinx Implementation

The output of the optimization stage is fed to code which maps the ﬁlter onto the

FPGA. With the help of the Xilinx tools, the conﬁguration details for the FPGA are

then generated.

4.2.1 Place and Route

Due to the limited availability of global and local routing resources, placement of

Configurable Logic Blocks (CLBs) and routing of nets are very critical in any FPGA

design. The Automatic Place and Route (APR) program provided by Xilinx cannot

be used to provide 100% placement for the following reasons. Due to the large

number of variables in the optimization problem, it takes an exorbitant amount of

CPU time for placement. Since it is a general purpose package based on heuristic

methods, it cannot always give the optimum placement for all the designs. For

instance, when APR was given full freedom of placement for all of the 22 x 22

array of CLBs for an 11 tap filter, it took 9 hours 2 minutes and 27 secs on a Sun

SPARCstation–2 for the completion of placement and routing.

Figure 4: Mapping of the filter architecture on the Xilinx FPGA

The mapping of the architecture in Figure 3 is shown in Figure 4, where each

full adder is implemented in a Conﬁgurable Logic Block (CLB). The two rows of

full adders map to alternate columns of the chip referred to as 1 and 2 as shown in

Figure 4. To reduce congestion, the two shifted versions of the data are distributed

among the two sets of full adders, whereas in the previous approach [1], they were

routed to the ﬁrst set of full adders. In the previous tap structure, the sum outputs

of the second set of full adders in any tap are fed to the corresponding full adders in

the next tap, which are two columns away. When the new structure is mapped onto

the FPGA, the routing is only between CLBs which are in the adjacent columns.

This makes more efﬁcient use of the local routing resources. This structure has been

found to achieve an improvement of 5 - 15% in the sampling rate for several typical

ﬁlters.

The input data bus is distributed using horizontal long lines from one end of

the chip to the other. By careful assignment of input pins and hence the horizontal

long lines to the data, it is possible to reduce the maximum distance between

any horizontal long line and a CLB where the data is needed. The assignment

which gives the least distance and hence the delay varies from ﬁlter to ﬁlter. This

optimization problem, with a relatively small number of variables, is

solved very effectively by APR to give an improvement of 20 - 30% in the sampling

rate over the unoptimized placement.


References

VLSI Array processors

VLSI array processors

FIR filter design over a discrete powers-of-two coefficient space

VLSI and Modern Signal Processing

Design of cascade form FIR filters with discrete valued coefficients
