World Applied Sciences Journal 18 (9): 1291-1295, 2012 ISSN 1818-4952 © IDOSI Publications, 2012 DOI: 10.5829/idosi.wasj.2012.18.09.1134

# **Spartan-3A FPGA Implementation**

Mohammed H. Al Mijalli

Department of Biomedical Technology, College of Applied Medical Sciences, King Saud University, Riyadh, Saudi Arabia

**Abstract:** This paper presents the field programmable gate array (FPGA) design and implementation of Braun's multipliers, described in Very High Speed Integrated Circuit Hardware Description Language (VHDL), on four Spartan-3A devices: XC3S50A (package: tq144, speed grade: -5), XC3S200A (package: ft256, speed grade: -5), XC3S400A (package: fg400, speed grade: -5) and XC3S700A (package: fg484, speed grade: -5). Resource utilization is reported for 4×4, 6×6, 8×8 and 12×12-bit Braun's multipliers. The Spartan-3A devices show identical numbers of four-input LUTs, occupied slices and bonded IOBs and the same total equivalent gate count, but their average connection delays and maximum pin delays differ. For average connection delay and maximum pin delay, all devices show broadly comparable behaviour.

Key words: Application Specific Integrated Circuits (ASICs) • Braun's Multipliers • Field Programmable Gate Array (FPGA) • Spartan-3A • Very High Speed Integrated Circuit Hardware Description Language (VHDL)

## INTRODUCTION

In digital signal processing (DSP) systems, multipliers are required by finite impulse response (FIR) filters and by other DSP functions, including the Fourier transform and the discrete cosine transform (DCT); an efficient implementation of multipliers is therefore the key to cost-effective solutions for these applications [1-4]. The alternatives for hardware implementation, such as developing Application Specific Integrated Circuit (ASIC) designs, buying expensive fixed-function processors (for example, an FFT chip) or using an array of microprocessors, are costly and time consuming. Recent advances in very large scale integration (VLSI) technology have increased FPGA performance and size, offering a new hardware acceleration option with real-time reconfiguration [5]. New communication standards and high channel aggregation requirements are pushing DSP system performance beyond the capabilities of digital signal processors. Xilinx FPGA families include embedded DSP block multipliers, which makes them an excellent solution for DSP systems.

Numerous research efforts toward efficient hardware implementation of parallel multipliers have been reported [1-4, 6-10].

The objective of this work is to demonstrate the hardware implementation of Braun's multipliers on Spartan-3A FPGA devices, namely the XC3S50A, XC3S200A, XC3S400A and XC3S700A.

## MATERIALS AND METHODS

**FPGA Platform:** FPGAs are essentially hardware implementation devices, although they are programmable. In a microprocessor, a design is realized by scheduling operations on a fixed processing engine, whereas an FPGA configures the hardware itself to perform the operations required by a particular design. FPGAs are particularly suited to algorithms that can exploit the massive parallelism offered by their architecture. Remarkable reductions in computation time have been achieved by mapping computation-intensive tasks to hardware and exploiting the parallelism in the algorithms. The flexibility of the FPGA allows even higher performance by trading off precision and range in the number format for an increased number of parallel arithmetic units. This has driven a new type of processing called reconfigurable computing, in which time-intensive tasks are offloaded from software to FPGAs.

Corresponding Author: Mohammed H. Al Mijalli, Department of Biomedical Technology, College of Applied Medical Sciences, King Saud University, Riyadh, Saudi Arabia.



Fig. 1: The architecture of a 6×6-bit Braun's multiplier

**Spartan-3:** Spartan-3 FPGAs [11] are designed to meet the requirements of high-volume, low-cost electronic systems. The Spartan-3 family includes eight members, offering densities ranging from 50,000 to five million system gates. A Spartan-3 FPGA consists of five fundamental programmable functional elements: configurable logic blocks (CLBs), input/output blocks (IOBs), block RAMs, dedicated 18×18-bit multipliers and digital clock managers (DCMs).

**Braun's Multipliers:** Braun's multiplier is an $m \times n$-bit parallel array multiplier, generally known as a carry-save multiplier. As shown in Fig. 1, it is constructed from $m \times (n-1)$ adders and $m \times n$ AND gates. All partial-product bits are generated in parallel by the AND gates. Each row of adders adds a new row of partial-product bits to the sums produced by the previous row. Instead of being propagated along its own row, the carry-out of each adder is passed to the adder of the next higher bit weight in the following row, where it is added to the sum generated by the preceding adder and the newly generated partial-product bit. This accumulation is performed by carry-save adders (CSAs); in the last stage of the multiplier, a ripple carry adder (RCA) is used instead of a CSA to merge the remaining sums and carries. Braun's multiplier suffers from a glitching problem, which is due to the RCA.
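The carry-save structure described above can be checked with a small behavioural model. The Python sketch below is not the author's VHDL design; it is a hypothetical bit-level model (the names `full_adder` and `braun_multiply` are ours) that assumes the common textbook arrangement: AND gates form the partial products, each of the $n-1$ adder rows passes its carries to the next row instead of rippling them, and a final RCA merges the leftover sum and carry bits. Cells with a constant-zero input, which would be half adders or plain wires in hardware, are still modelled as full adders for simplicity.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def braun_multiply(y: int, x: int, m: int, n: int) -> int:
    """Bit-level model of an m x n unsigned Braun (carry-save array) multiplier.

    y is the m-bit multiplicand Y, x is the n-bit multiplier X.
    """
    if not (0 <= y < (1 << m) and 0 <= x < (1 << n)):
        raise ValueError("operands out of range for the given widths")
    ybits = [(y >> i) & 1 for i in range(m)]
    xbits = [(x >> j) & 1 for j in range(n)]
    p = [0] * (m + n)  # product bits P_0 .. P_{m+n-1}

    # Row 0: AND-gate outputs Y_i & X_0; no adders are needed yet.
    s = [ybits[i] & xbits[0] for i in range(m)]  # s[i] has weight row + i
    c = [0] * m                                  # c[i] has weight row + i + 1
    p[0] = s[0]

    # Rows 1..n-1: carry-save adders; each carry moves down to the next row.
    for j in range(1, n):
        new_s, new_c = [0] * m, [0] * m
        for i in range(m):
            pp = ybits[i] & xbits[j]                  # AND gate, weight i + j
            prev_sum = s[i + 1] if i + 1 < m else 0   # previous row's sum, weight i + j
            new_s[i], new_c[i] = full_adder(pp, prev_sum, c[i])
        s, c = new_s, new_c
        p[j] = s[0]  # lowest bit of each row is a finished product bit

    # Last stage: a ripple carry adder merges the remaining sums and carries.
    carry = 0
    for k in range(m):
        a = s[k + 1] if k + 1 < m else 0
        p[n + k], carry = full_adder(a, c[k], carry)
    assert carry == 0  # the product of an m-bit and an n-bit number fits in m + n bits

    return sum(bit << w for w, bit in enumerate(p))
```

For example, `braun_multiply(11, 13, 4, 4)` returns 143. Keeping the carries in a separate vector until the final stage is what distinguishes the carry-save array from a chain of ripple adders: only the last row waits for a carry to propagate across the full word.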

**Mathematical Basis of Braun's Multiplier:** Consider a generic $m \times n$ multiplication of an unsigned $m$-bit multiplicand $Y = Y_{m-1} \dots Y_0$ and an unsigned $n$-bit multiplier $X = X_{n-1} \dots X_0$, whose values are

$$Y = \sum_{i=0}^{m-1} Y_i 2^i$$

$$X = \sum_{i=0}^{n-1} X_i 2^i$$

The $(m+n)$-bit product $P = P_{m+n-1} \dots P_1 P_0$ (with $P_{2n-1}$ as the most significant bit when $m = n$), which results from multiplying the multiplicand $Y$ by the multiplier $X$, can be written as follows:

$$P = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (Y_i \cdot X_j) 2^{i+j}$$
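As a worked illustration of the double sum, the short Python sketch below (function names are hypothetical) builds the $m \times n$ matrix of AND-gate outputs $Y_i \cdot X_j$ and weights each entry by $2^{i+j}$. It evaluates the equation directly rather than modelling the adder array, so it serves only as a reference value for the hardware.

```python
def partial_product_matrix(y: int, x: int, m: int, n: int) -> list[list[int]]:
    """pp[i][j] = Y_i AND X_j, i.e. the output of one of the m*n AND gates."""
    return [[((y >> i) & 1) & ((x >> j) & 1) for j in range(n)] for i in range(m)]


def product_from_double_sum(y: int, x: int, m: int, n: int) -> int:
    """Evaluate P = sum_i sum_j (Y_i * X_j) * 2**(i + j)."""
    pp = partial_product_matrix(y, x, m, n)
    return sum(pp[i][j] << (i + j) for i in range(m) for j in range(n))


# Worked 4 x 4 example: Y = 1011b (11), X = 1101b (13).
pp = partial_product_matrix(0b1011, 0b1101, 4, 4)
print(len(pp) * len(pp[0]))                           # 16 AND gates
print(product_from_double_sum(0b1011, 0b1101, 4, 4))  # 143 = 10001111b
```

Here nine of the sixteen AND gates output 1 (three set bits in $Y$ times three set bits in $X$), and their weighted sum is $P = 143$.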

## RESULTS

The 4×4, 6×6, 8×8 and 12×12-bit Braun's multipliers were designed in VHDL and implemented on four Xilinx Spartan-3A FPGA devices, XC3S50A (package: tq144, speed grade: -5), XC3S200A (package: ft256, speed grade: -5), XC3S400A (package: fg400, speed grade: -5) and XC3S700A (package: fg484, speed grade: -5), using the Xilinx ISE 9.2i design tool [12].

## DISCUSSION

Figs. 2, 3, 4 and 5 show the block diagrams of the 4×4, 6×6, 8×8 and 12×12-bit Braun's multipliers. Figs. 6 and 7 illustrate the complete internal RTL schematics of the 4×4 and 6×6-bit Braun's multipliers, and Figs. 8 and 9 show part of the internal RTL schematics of the 8×8 and 12×12-bit Braun's multipliers. Tables 1-4 summarize the FPGA resource utilization of the standard 4×4, 6×6, 8×8 and 12×12-bit Braun's multipliers. The resource utilization is identical on the Spartan-3A XC3S50A, XC3S200A, XC3S400A and XC3S700A devices. Figs. 10 and 11 show the differences in average connection delay and maximum pin delay across these Spartan-3A FPGA devices.

Fig. 2: Block diagram of 4×4-bit Braun's multiplier

Fig. 3: Block diagram of 6×6-bit Braun's multiplier

Fig. 4: Block diagram of 8×8-bit Braun's multiplier

Fig. 5: Block diagram of 12×12-bit Braun's multiplier

Fig. 6: Complete RTL schematic of the 4×4-bit Braun's multiplier

Table 1: FPGA resource utilization for standard Braun's multipliers on Spartan-3A XC3S50A (package: tq144, speed grade: -5)

| Bit Width | Multiplier | Four-Input LUTs (of 1408) | Occupied Slices (of 704) | Bonded IOBs (of 108) | Total Equivalent Gate Count | Average Connection Delay (ns) | Maximum Pin Delay (ns) |
|-----------|------------|---------------------------|--------------------------|----------------------|-----------------------------|-------------------------------|------------------------|
| 4×4       | Standard   | 32                        | 17                       | 16                   | 192                         | 0.668                         | 1.786                  |
| 6×6       | Standard   | 75                        | 40                       | 24                   | 450                         | 0.818                         | 2.574                  |
| 8×8       | Standard   | 133                       | 69                       | 32                   | 798                         | 0.766                         | 2.237                  |
| 12×12     | Standard   | 295                       | 152                      | 48                   | 1770                        | 0.917                         | 3.770                  |

Table 2: FPGA resource utilization for standard Braun's multipliers on Spartan-3A XC3S200A (package: ft256, speed grade: -5)

| Bit Width | Multiplier | Four-Input LUTs (of 3584) | Occupied Slices (of 1792) | Bonded IOBs (of 195) | Total Equivalent Gate Count | Average Connection Delay (ns) | Maximum Pin Delay (ns) |
|-----------|------------|---------------------------|---------------------------|----------------------|-----------------------------|-------------------------------|------------------------|
| 4×4       | Standard   | 32                        | 17                        | 16                   | 192                         | 0.866                         | 2.061                  |
| 6×6       | Standard   | 75                        | 40                        | 24                   | 450                         | 0.847                         | 2.869                  |
| 8×8       | Standard   | 133                       | 69                        | 32                   | 798                         | 0.802                         | 2.841                  |
| 12×12     | Standard   | 295                       | 152                       | 48                   | 1770                        | 0.938                         | 3.527                  |

Table 3: FPGA resource utilization for standard Braun's multipliers on Spartan-3A XC3S400A (package: fg400, speed grade: -5)

| Bit Width | Multiplier | Four-Input LUTs (of 7168) | Occupied Slices (of 3584) | Bonded IOBs (of 311) | Total Equivalent Gate Count | Average Connection Delay (ns) | Maximum Pin Delay (ns) |
|-----------|------------|---------------------------|---------------------------|----------------------|-----------------------------|-------------------------------|------------------------|
| 4×4       | Standard   | 32                        | 17                        | 16                   | 192                         | 0.958                         | 2.823                  |
| 6×6       | Standard   | 75                        | 40                        | 24                   | 450                         | 0.892                         | 3.209                  |
| 8×8       | Standard   | 133                       | 69                        | 32                   | 798                         | 0.859                         | 2.703                  |
| 12×12     | Standard   | 295                       | 152                       | 48                   | 1770                        | 1.013                         | 3.230                  |

Table 4: FPGA resource utilization for standard Braun's multipliers on Spartan-3A XC3S700A (package: fg484, speed grade: -5)

| Bit Width | Multiplier | Four-Input LUTs (of 11776) | Occupied Slices (of 5888) | Bonded IOBs (of 372) | Total Equivalent Gate Count | Average Connection Delay (ns) | Maximum Pin Delay (ns) |
|-----------|------------|----------------------------|---------------------------|----------------------|-----------------------------|-------------------------------|------------------------|
| 4×4       | Standard   | 32                         | 17                        | 16                   | 192                         | 1.336                         | 3.425                  |
| 6×6       | Standard   | 75                         | 40                        | 24                   | 450                         | 0.874                         | 2.965                  |
| 8×8       | Standard   | 133                        | 69                        | 32                   | 798                         | 1.325                         | 4.187                  |
| 12×12     | Standard   | 295                        | 152                       | 48                   | 1770                        | 1.020                         | 4.626                  |
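Because the absolute resource counts are the same on every device, what changes from one device to the next is the fraction of the device that each multiplier occupies. The short Python sketch below (a hypothetical helper, not part of the ISE flow) turns the counts and the device capacities given in parentheses in the headers of Tables 1-4 into percentages. It also encodes one pattern visible in the tables: an $n \times n$ multiplier uses $4n$ bonded IOBs, which is consistent with $2n$ input pins plus $2n$ output pins.

```python
# Resource counts copied from Tables 1-4 (identical on all four devices),
# keyed by bit width n of an n x n multiplier.
USED = {
    4:  {"LUTs": 32,  "Slices": 17,  "IOBs": 16},
    6:  {"LUTs": 75,  "Slices": 40,  "IOBs": 24},
    8:  {"LUTs": 133, "Slices": 69,  "IOBs": 32},
    12: {"LUTs": 295, "Slices": 152, "IOBs": 48},
}

# Available resources per device, from the table headers.
AVAILABLE = {
    "XC3S50A":  {"LUTs": 1408,  "Slices": 704,  "IOBs": 108},
    "XC3S200A": {"LUTs": 3584,  "Slices": 1792, "IOBs": 195},
    "XC3S400A": {"LUTs": 7168,  "Slices": 3584, "IOBs": 311},
    "XC3S700A": {"LUTs": 11776, "Slices": 5888, "IOBs": 372},
}


def utilization(width: int, device: str) -> dict[str, float]:
    """Percentage of each resource used by a width x width multiplier on a device."""
    used, avail = USED[width], AVAILABLE[device]
    return {r: round(100.0 * used[r] / avail[r], 2) for r in used}


# Every n x n design uses 4n bonded IOBs (2n operand inputs + 2n product outputs).
assert all(USED[w]["IOBs"] == 4 * w for w in USED)

for device in AVAILABLE:
    for width in USED:
        print(device, f"{width}x{width}", utilization(width, device))
```

For instance, the 12×12-bit multiplier takes about 21% of the LUTs and 44% of the bonded IOBs of the XC3S50A, but only about 2.5% of the LUTs of the XC3S700A.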

Fig. 7: Complete RTL schematic of the 6×6-bit Braun's multiplier

Fig. 8: Part of RTL schematic of the 8×8-bit Braun's multiplier

Fig. 9: Part of RTL schematic of the 12×12-bit Braun's multiplier

Fig. 10: Average connection delay of Braun's multipliers on Spartan-3A devices

Fig. 11: Maximum pin delay of Braun's multipliers on Spartan-3A devices

The only differences are seen in average connection delay and maximum pin delay. For the 4×4-bit multiplier, the average connection delay increases gradually from the smallest device (XC3S50A) to the largest (XC3S700A). The 6×6-bit multiplier breaks this pattern at the XC3S700A device, whose average connection delay (0.874 ns) is slightly lower than that of the XC3S400A (0.892 ns). The 8×8 and 12×12-bit multipliers follow the same increasing trend as the 4×4-bit multiplier. The maximum pin delay of the 4×4-bit multiplier likewise increases from the XC3S50A to the XC3S700A. For the 6×6-bit multiplier, the XC3S700A shows a lower maximum pin delay than the XC3S400A, whereas for the 8×8 and 12×12-bit multipliers the XC3S700A shows the highest maximum pin delay of the four devices (4.187 ns and 4.626 ns, respectively). Overall, all devices exhibit broadly similar trends in average connection delay and maximum pin delay.
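The delay trends described above can be checked directly against the tabulated values. In this Python sketch (helper names are hypothetical), each list is ordered from the smallest (XC3S50A) to the largest (XC3S700A) device, a trend counts as increasing only if every step is strictly larger, and the slowest device is simply the one with the largest delay for that bit width.

```python
DEVICES = ["XC3S50A", "XC3S200A", "XC3S400A", "XC3S700A"]  # smallest to largest

# Delays in ns copied from Tables 1-4, one value per device in DEVICES order.
AVG_CONNECTION_DELAY = {
    4: [0.668, 0.866, 0.958, 1.336],
    6: [0.818, 0.847, 0.892, 0.874],
    8: [0.766, 0.802, 0.859, 1.325],
    12: [0.917, 0.938, 1.013, 1.020],
}
MAX_PIN_DELAY = {
    4: [1.786, 2.061, 2.823, 3.425],
    6: [2.574, 2.869, 3.209, 2.965],
    8: [2.237, 2.841, 2.703, 4.187],
    12: [3.770, 3.527, 3.230, 4.626],
}


def increases_with_device_size(values: list[float]) -> bool:
    """True if the delay grows strictly at every step from XC3S50A to XC3S700A."""
    return all(a < b for a, b in zip(values, values[1:]))


def slowest_device(values: list[float]) -> str:
    """Device with the largest delay in this row."""
    return DEVICES[values.index(max(values))]


def summarize(table: dict[int, list[float]]) -> dict[int, tuple[bool, str]]:
    """Map bit width -> (strictly increasing across devices?, slowest device)."""
    return {w: (increases_with_device_size(v), slowest_device(v)) for w, v in table.items()}


print(summarize(MAX_PIN_DELAY))
# {4: (True, 'XC3S700A'), 6: (False, 'XC3S400A'), 8: (False, 'XC3S700A'), 12: (False, 'XC3S700A')}
```

Running `summarize(AVG_CONNECTION_DELAY)` in the same way reports an increasing trend for the 4×4, 8×8 and 12×12-bit multipliers and flags the 6×6-bit case, where the XC3S400A rather than the XC3S700A has the largest average connection delay.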

## CONCLUSION

We have presented the hardware design and FPGA implementation of a parallel architecture for standard Braun's multipliers using VHDL. The design was implemented on the Xilinx Spartan-3A XC3S50A (package: tq144, speed grade: -5), XC3S200A (package: ft256, speed grade: -5), XC3S400A (package: fg400, speed grade: -5) and XC3S700A (package: fg484, speed grade: -5) devices using the Xilinx ISE 9.2i design tool. The Spartan-3A devices show identical numbers of four-input LUTs, occupied slices and bonded IOBs and the same total equivalent gate count, but their average connection delays and maximum pin delays differ. Nevertheless, all devices show broadly similar trends in average connection delay and maximum pin delay.

## ACKNOWLEDGMENT

The author extends appreciation to the College of Applied Medical Sciences Research Center and the Deanship of Scientific Research at King Saud University for funding this research.

## REFERENCES

1. Rais, M.H., 2009. Efficient hardware realization of truncated multipliers using FPGA, World Academy of Science, Engineering and Technology, 33: 741-745. http://www.waset.org/journals/waset/v33.php
2. Rais, M.H. and M.H. Al Mijalli, 2011. Braun's multipliers: Spartan-3AN based design and implementation, J. Comput. Sci., 7(11): 1629-1632.
3. Vijeyakumar, K.N., V. Sumathy, S. Komanduri and C.C.G. Suji, 2011. Design of low-power high-speed error tolerant shift and add multiplier, J. Comput. Sci., 7(12): 1839-1845.
4. Song, M.A., L.D. Van and S.Y. Kuo, 2007. Adaptive low-error fixed-width Booth multipliers, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sci., E90-A(6): 1180-1187.
5. Parker, M., 2010. Digital Signal Processing: Everything You Need to Know to Get Started, Newnes, Burlington, MA.
6. Rais, M.H. and M.H. Al Mijalli, 2011. Field programmable gate arrays based design, implementation and delay study of Braun's multipliers, J. Comput. Sci., 7(12): 1894-1899.
7. Rais, M.H. and M.H. Al Mijalli, 2011. Virtex-5 FPGA based Braun's multipliers, Int. J. Comput. Sci. Network Security, 11(8): 81-84.
8. Algnabi, Y.S., R. Teymourzadeh, M. Othman, M.S. Islam and M.V. Hong, 2010. On-chip implementation of pipeline digit-slicing multiplier-less butterfly for fast Fourier transform architecture, Am. J. Eng. Appl. Sci., 3(4): 757-764.
9. Al-Qadi, Z. and M. Aqel, 2009. Performance analysis of parallel matrix multiplication algorithms used in image processing, World Applied Sciences Journal, 6(1): 45-52.
10. Asadi, P. and K. Navi, 2007. A new low power 32×32-bit multiplier, World Applied Sciences Journal, 2(4): 341-347.
11. Xilinx, 2008. Spartan-3 FPGA family datasheet. http://www.xilinx.com/support/documentation/data_sheets/ds099.pdf
12. Xilinx, 2007. ISE 9.2i design tool. www.xilinx.com/prs_rls/2007/software/0786_ise92i.htm