Proceedings ArticleDOI

An efficient implementation of floating point multiplier

TL;DR: An efficient implementation of an IEEE 754 single-precision floating-point multiplier targeted at a Xilinx Virtex-5 FPGA, implemented in VHDL as a technology-independent pipelined design.

Abstract: In this paper we describe an efficient implementation of an IEEE 754 single-precision floating-point multiplier targeted at the Xilinx Virtex-5 FPGA. VHDL is used to implement a technology-independent pipelined design. The multiplier implementation handles the overflow and underflow cases. Rounding is not implemented, in order to preserve precision when the multiplier is used in a Multiply-and-Accumulate (MAC) unit. With a latency of three clock cycles, the design achieves 301 MFLOPS. The multiplier was verified against the Xilinx floating-point multiplier core.
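
The abstract outlines the standard single-precision datapath: XOR the signs, add the biased exponents and subtract one bias, multiply the 24-bit significands (hidden bit included), normalize, and flag overflow/underflow, with the product truncated rather than rounded. As a rough illustration, here is a bit-level software model of that flow in C; it is a sketch under the assumptions that inputs are normalized (no NaNs or denormals) and that underflow flushes to zero, neither of which the abstract specifies.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Bit-level model of an IEEE 754 single-precision multiply without
     * rounding (the product mantissa is truncated). Inputs are assumed
     * to be normalized numbers; NaNs and denormals are not handled, and
     * underflow flushes to zero -- assumptions, since the abstract does
     * not specify these cases. */
    static uint32_t fp32_mul_truncate(uint32_t a, uint32_t b,
                                      int *overflow, int *underflow)
    {
        uint32_t sign = (a ^ b) & 0x80000000u;      /* sign = XOR of input signs */
        int32_t  ea = (int32_t)((a >> 23) & 0xFF);  /* biased exponents          */
        int32_t  eb = (int32_t)((b >> 23) & 0xFF);
        uint64_t ma = (a & 0x7FFFFFu) | 0x800000u;  /* significands with the     */
        uint64_t mb = (b & 0x7FFFFFu) | 0x800000u;  /* hidden leading 1          */

        uint64_t prod = ma * mb;          /* 24 x 24 -> 48-bit product        */
        int32_t  exp  = ea + eb - 127;    /* add exponents, remove one bias   */

        if (prod & (1ULL << 47)) {        /* product of [1,2) values is in    */
            prod >>= 1;                   /* [1,4); normalize when in [2,4)   */
            exp += 1;
        }
        uint32_t mant = (uint32_t)((prod >> 23) & 0x7FFFFFu); /* truncate */

        *overflow  = (exp >= 255);
        *underflow = (exp <= 0);
        if (*overflow)  return sign | 0x7F800000u;  /* saturate to +/- infinity */
        if (*underflow) return sign;                /* flush to +/- zero        */
        return sign | ((uint32_t)exp << 23) | mant;
    }

    int main(void)
    {
        float x = 1.5f, y = 2.5f, r;
        uint32_t xb, yb, rb;
        int ovf, unf;
        memcpy(&xb, &x, 4);
        memcpy(&yb, &y, 4);
        rb = fp32_mul_truncate(xb, yb, &ovf, &unf);
        memcpy(&r, &rb, 4);
        printf("%g * %g = %g (overflow=%d, underflow=%d)\n", x, y, r, ovf, unf);
        return 0;
    }

Truncation leaves an error of up to one unit in the last place per multiplication, but it keeps the low product bits available to a downstream accumulator, which is why the authors omit rounding in the MAC context.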


Citations

Journal ArticleDOI
06 Jan 2021-Nature
TL;DR: In this paper, the authors demonstrate a computationally specific integrated photonic hardware accelerator (tensor core) that is capable of operating at speeds of trillions of multiply-accumulate operations per second.
Abstract: With the proliferation of ultrahigh-speed mobile networks and internet-connected devices, along with the rise of artificial intelligence (AI) [1], the world is generating exponentially increasing amounts of data that need to be processed in a fast and efficient way. Highly parallelized, fast and scalable hardware is therefore becoming progressively more important [2]. Here we demonstrate a computationally specific integrated photonic hardware accelerator (tensor core) that is capable of operating at speeds of trillions of multiply-accumulate operations per second (10^12 MAC operations per second, or tera-MACs per second). The tensor core can be considered as the optical analogue of an application-specific integrated circuit (ASIC). It achieves parallelized photonic in-memory computing using phase-change-material memory arrays and photonic chip-based optical frequency combs (soliton microcombs [3]). The computation is reduced to measuring the optical transmission of reconfigurable and non-resonant passive components and can operate at a bandwidth exceeding 14 gigahertz, limited only by the speed of the modulators and photodetectors. Given recent advances in hybrid integration of soliton microcombs at microwave line rates [3-5], ultralow-loss silicon nitride waveguides [6,7], and high-speed on-chip detectors and modulators, our approach provides a path towards full complementary metal-oxide-semiconductor (CMOS) wafer-scale integration of the photonic tensor core. Although we focus on convolutional processing, more generally our results indicate the potential of integrated photonics for parallel, fast, and efficient computational hardware in data-heavy AI applications such as autonomous driving, live video processing, and next-generation cloud computing services.
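
The core computational claim is that a convolution, once lowered to matrix-vector products, is just an array of multiply-accumulate operations, which the photonic core evaluates by measuring transmissions along each crossbar row in a single shot. A small C sketch of the equivalent arithmetic, with hypothetical dimensions (the device performs each row sum in parallel rather than sequentially as here):

    #include <stdio.h>

    /* Illustrative arithmetic model: a photonic crossbar evaluates y = W*x
     * by encoding each weight as a transmission and summing the products
     * along a row at a photodetector. In software the same operation costs
     * M*N multiply-accumulates; the dimensions here are hypothetical. */
    enum { M = 2, N = 3 };

    static long matvec(const double w[M][N], const double x[N], double y[M])
    {
        long macs = 0;
        for (int i = 0; i < M; i++) {
            y[i] = 0.0;
            for (int j = 0; j < N; j++) {
                y[i] += w[i][j] * x[j];  /* one multiply-accumulate (MAC) */
                macs++;
            }
        }
        return macs;  /* M*N MACs per matrix-vector product */
    }

    int main(void)
    {
        const double w[M][N] = { {1, 2, 3}, {4, 5, 6} };
        const double x[N] = { 1, 1, 1 };
        double y[M];
        long macs = matvec(w, x, y);
        printf("y = [%g, %g], MACs = %ld\n", y[0], y[1], macs);
        return 0;
    }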

478 citations

Journal ArticleDOI
TL;DR: The results indicate the potential of integrated photonics for parallel, fast, and efficient computational hardware in data-heavy AI applications such as autonomous driving, live video processing, and next-generation cloud computing services.
Abstract: With the proliferation of ultra-high-speed mobile networks and internet-connected devices, along with the rise of artificial intelligence, the world is generating exponentially increasing amounts of data - data that needs to be processed in a fast, efficient and smart way. These developments are pushing the limits of existing computing paradigms, and highly parallelized, fast and scalable hardware concepts are becoming progressively more important. Here, we demonstrate a computationally specific integrated photonic tensor core - the optical analogue of an ASIC - capable of operating at tera-multiply-accumulate-per-second (TMAC/s) speeds. The photonic core achieves parallelized photonic in-memory computing using phase-change memory arrays and photonic chip-based optical frequency combs (soliton microcombs). The computation is reduced to measuring the optical transmission of reconfigurable and non-resonant passive components and can operate at a bandwidth exceeding 14 GHz, limited only by the speed of the modulators and photodetectors. Given recent advances in hybrid integration of soliton microcombs at microwave line rates, ultra-low-loss silicon nitride waveguides, and high-speed on-chip detectors and modulators, our approach provides a path towards full CMOS wafer-scale integration of the photonic tensor core. While we focus on convolution processing, more generally our results indicate the major potential of integrated photonics for parallel, fast, and efficient computational hardware in demanding AI applications such as autonomous driving, live video processing, and next-generation cloud computing services.

463 citations

Proceedings ArticleDOI
31 Dec 2012
TL;DR: The Urdhva-triyakbhyam sutra of the Vedic multiplication technique is used for the mantissa multiplication in an IEEE 754 floating-point multiplier.
Abstract: In this paper, the Vedic multiplication technique is used to implement an IEEE 754 floating-point multiplier. The Urdhva-triyakbhyam sutra is used for the multiplication of the mantissas. The underflow and overflow cases are handled. The inputs to the multiplier are provided in IEEE 754 32-bit format. The multiplier is implemented in VHDL on a Virtex-5 FPGA.
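
The Urdhva-triyakbhyam ("vertically and crosswise") sutra forms each product column from all partial products whose digit indices sum to that column's index, then propagates carries upward; in hardware every column can be computed concurrently by a small adder tree. A minimal sequential C model of the scheme on binary operands (the 8-bit width is an arbitrary illustration, not the paper's mantissa width):

    #include <stdint.h>
    #include <stdio.h>

    /* Sequential model of the Urdhva-triyakbhyam scheme: column k of the
     * product sums every partial product a[i]*b[j] with i + j == k, plus
     * the carry left over from column k-1. */
    #define NBITS 8

    static uint32_t urdhva_mul(uint16_t a, uint16_t b)
    {
        uint32_t result = 0, carry = 0;
        for (int k = 0; k < 2 * NBITS - 1; k++) {
            uint32_t col = carry;
            for (int i = 0; i < NBITS; i++) {
                int j = k - i;
                if (j >= 0 && j < NBITS)
                    col += ((a >> i) & 1u) * ((b >> j) & 1u); /* crosswise */
            }
            result |= (col & 1u) << k;   /* keep the column's sum bit */
            carry = col >> 1;            /* ripple the rest upward    */
        }
        return result | (carry << (2 * NBITS - 1));
    }

    int main(void)
    {
        uint16_t a = 200, b = 123;
        printf("urdhva_mul = %u, expected = %u\n",
               urdhva_mul(a, b), (unsigned)(a * b));
        return 0;
    }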

32 citations


Cites methods from "An efficient implementation of floa..."

  • ...9, Number 1, pp 827-831 2009 [5] Al-Ashrafy, M....

    [...]

  • ...The Exponent Calculation Unit is implemented in this paper using 8 BIT Ripple Carry Adder [5, 6]....

    [...]
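
The second quoted snippet implements the exponent calculation with an 8-bit ripple-carry adder [5, 6]. For reference, a gate-level C model of that structure, where each full adder's carry-out feeds the next bit position:

    #include <stdint.h>
    #include <stdio.h>

    /* Gate-level model of an 8-bit ripple-carry adder: each full adder
     * produces sum = a ^ b ^ cin and carry = majority(a, b, cin), and the
     * carry ripples from bit 0 up to bit 7. */
    static uint8_t ripple_add8(uint8_t a, uint8_t b, int cin, int *cout)
    {
        uint8_t sum = 0;
        int c = cin;
        for (int i = 0; i < 8; i++) {
            int ai = (a >> i) & 1, bi = (b >> i) & 1;
            sum |= (uint8_t)((ai ^ bi ^ c) << i);   /* full-adder sum bit   */
            c = (ai & bi) | (ai & c) | (bi & c);    /* full-adder carry out */
        }
        *cout = c;
        return sum;
    }

    int main(void)
    {
        int cout;
        uint8_t s = ripple_add8(200, 100, 0, &cout);
        printf("200 + 100 = %d with carry-out %d\n", s + (cout << 8), cout);
        return 0;
    }

In the exponent path, the two biased exponents are added and the bias (127) is subtracted, which such an adder can handle in two passes; the exact arrangement is the citing paper's design and is not reproduced here.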

Proceedings ArticleDOI
01 Nov 2013
TL;DR: An efficient floating-point multiplier using the Karatsuba algorithm that implements the significand multiplication along with the sign-bit and exponent computations.
Abstract: This paper presents an efficient floating-point multiplier using the Karatsuba algorithm. Digital signal processing algorithms and media applications use a large number of multiplications, which is both time- and power-consuming. We have used the IEEE 754 format for binary representation of the floating-point numbers. Verilog HDL is used to implement the Karatsuba multiplication algorithm as a technology-independent pipelined design. This multiplier implements the significand multiplication along with the sign-bit and exponent computations. Three-stage pipelining is used in the design, with a latency of 8 clock cycles. In this design, the mantissa bits are divided into three parts of particular bit widths so that the multiplication can be done using the standard multipliers available in the FPGA Cyclone II device family; the design is synthesized using Altera Quartus II.
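
Karatsuba's identity trades one large multiplication for three smaller ones plus additions. The paper splits the mantissa into three parts to match the FPGA's embedded multipliers; the sketch below instead shows the standard two-way split on a 24-bit significand, purely to illustrate the identity the design exploits:

    #include <stdint.h>
    #include <stdio.h>

    /* Two-way Karatsuba on a 24-bit significand. Split x = xh*2^12 + xl
     * and y = yh*2^12 + yl; then
     *   x*y = hi*2^24 + mid*2^12 + lo
     * with hi = xh*yh, lo = xl*yl, mid = (xh+xl)*(yh+yl) - hi - lo,
     * so one 24x24 multiply becomes three ~12x12 multiplies. (The cited
     * design uses a three-part split to fit its FPGA multipliers; this
     * two-way version only shows the identity.) */
    static uint64_t karatsuba24(uint32_t x, uint32_t y)
    {
        uint32_t xh = x >> 12, xl = x & 0xFFFu;
        uint32_t yh = y >> 12, yl = y & 0xFFFu;

        uint64_t hi  = (uint64_t)xh * yh;              /* 12 x 12          */
        uint64_t lo  = (uint64_t)xl * yl;              /* 12 x 12          */
        uint64_t mid = (uint64_t)(xh + xl) * (yh + yl) /* 13 x 13 at worst */
                     - hi - lo;

        return (hi << 24) + (mid << 12) + lo;
    }

    int main(void)
    {
        uint32_t x = 0xC00000u, y = 0xA00000u; /* significands of 1.5 and 1.25 */
        printf("karatsuba = %llu, expected = %llu\n",
               (unsigned long long)karatsuba24(x, y),
               (unsigned long long)((uint64_t)x * y));
        return 0;
    }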

24 citations


Cites methods from "An efficient implementation of floa..."

  • ...Logic is reduced by simply implementing first seven one subtractors (OS) and two MSB zero subtractors (ZS) [6]....

    [...]

References
Book
01 Jan 1992
TL;DR: A standard digital signal processing textbook covering discrete-time signals and systems, the z-transform and its application to the analysis of LTI systems, the DFT and FFT algorithms, digital filter design, and related topics.
Abstract: 1. Introduction. 2. Discrete-Time Signals and Systems. 3. The Z-Transform and Its Application to the Analysis of LTI Systems. 4. Frequency Analysis of Signals and Systems. 5. The Discrete Fourier Transform: Its Properties and Applications. 6. Efficient Computation of the DFT: Fast Fourier Transform Algorithms. 7. Implementation of Discrete-Time Systems. 8. Design of Digital Filters. 9. Sampling and Reconstruction of Signals. 10. Multirate Digital Signal Processing. 11. Linear Prediction and Optimum Linear Filters. 12. Power Spectrum Estimation. Appendix A. Random Signals, Correlation Functions, and Power Spectra. Appendix B. Random Number Generators. Appendix C. Tables of Transition Coefficients for the Design of Linear-Phase FIR Filters. Appendix D. List of MATLAB Functions. References and Bibliography. Index.

3,911 citations

Book
01 Jan 1993
TL;DR: The third edition (revised printing) adds new pedagogical and reference features, including new information and challenging exercises for the advanced student, and a complete, searchable index covering the material in both the book and the CD.
Abstract: What's new in the Third Edition, Revised Printing: the same great book gets better. This revised printing features all of the original content along with these additional features: Appendix A (Assemblers, Linkers, and the SPIM Simulator) has been moved from the CD-ROM into the printed book; corrections and bug fixes.

Third Edition features. New pedagogical features:
  • Understanding Program Performance: analyzes key performance issues from the programmer's perspective.
  • Check Yourself Questions: helps students assess their understanding of key points of a section.
  • Computers in the Real World: illustrates the diversity of applications of computing technology beyond traditional desktops and servers.
  • For More Practice: provides students with additional problems they can tackle.
  • In More Depth: presents new information and challenging exercises for the advanced student.

New reference features:
  • Highlighted glossary terms and definitions appear on the book page, as bold-faced entries in the index, and as a separate and searchable reference on the CD.
  • A complete index of the material in the book and on the CD appears in the printed index, and the CD includes a fully searchable version of the same index.
  • Historical Perspectives and Further Readings have been updated and expanded to include the history of software R&D.
  • CD-Library provides materials collected from the web which directly support the text.

In addition to thoroughly updating every aspect of the text to reflect the most current computing technology, the third edition:
  • Uses standard 32-bit MIPS 32 as the primary teaching ISA.
  • Presents the assembler-to-HLL translations in both C and Java.
  • Highlights the latest developments in architecture in Real Stuff sections: Intel IA-32, PowerPC 604, Google's PC cluster, Pentium P4, the SPEC CPU2000 benchmark suite for processors, the SPEC Web99 benchmark for web servers, the EEMBC benchmark for embedded systems, the AMD Opteron memory hierarchy, and AMD vs. IA-64.

New support for distinct course goals: many of the adopters who have used our book throughout its two editions are refining their courses with a greater hardware or software focus. We have provided new material to support these course goals.

New material to support a hardware focus:
  • Using logic design conventions
  • Designing with hardware description languages
  • Advanced pipelining
  • Designing with FPGAs
  • HDL simulators and tutorials
  • Xilinx CAD tools

New material to support a software focus:
  • How compilers work
  • How to optimize compilers
  • How to implement object-oriented languages
  • MIPS simulator and tutorial
  • History sections on programming languages, compilers, operating systems and databases

On the CD:
  • NEW: search function to search for content on both the CD-ROM and the printed text
  • CD-Bars: full-length sections that are introduced in the book and presented on the CD
  • CD-Appendixes: Appendices B-D
  • CD-Library: materials collected from the web which directly support the text
  • CD-Exercises: For More Practice provides exercises and solutions for self-study; In More Depth presents new information and challenging exercises for the advanced or curious student
  • Glossary: terms that are defined in the text are collected in this searchable reference
  • Further Reading: references are organized by the chapter they support
  • Software: HDL simulators, MIPS simulators, and FPGA design tools
  • Tutorials: SPIM, Verilog, and VHDL
  • Additional support: processor models, labs, homeworks, and an index covering the book and CD contents

Instructor support provided on textbooks.elsevier.com:
  • Solutions to all the exercises
  • Figures from the book in a number of formats
  • Lecture slides prepared by the authors and other instructors
  • Lecture notes

1,521 citations

01 Nov 1997
Computer Organization and Design: The Hardware/Software Interface, 4th edition, by David A. Patterson and John L. Hennessy.

832 citations

Proceedings ArticleDOI
19 Apr 1995
TL;DR: Custom floating-point formats, derived for individual applications, can be implemented on a fraction of a single FPGA, and using higher-level languages like VHDL facilitates the development of custom operators without significantly impacting operator performance or area.
Abstract: Many algorithms rely on floating point arithmetic for the dynamic range of representations and require millions of calculations per second. Such computationally intensive algorithms are candidates for acceleration using custom computing machines (CCMs) being tailored for the application. Unfortunately, floating point operators require excessive area (or time) for conventional implementations. Instead, custom formats, derived for individual applications, are feasible on CCMs, and can be implemented on a fraction of a single FPGA. Using higher-level languages, like VHDL, facilitates the development of custom operators without significantly impacting operator performance or area. Properties, including area consumption and speed of working arithmetic operator units used in real-time applications, are discussed.
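
To make the custom-format idea concrete, here is a hypothetical 16-bit format (1 sign bit, 6 exponent bits with bias 31, 9 fraction bits, truncating conversion) modeled in C; the field widths are illustrative assumptions, not formats from the paper:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* A hypothetical custom 16-bit float: 1 sign bit, 6 exponent bits
     * (bias 31), 9 fraction bits. The widths are illustrative; the paper's
     * point is that they can be tuned per application. Conversion truncates,
     * and denormals are flushed to zero. */
    #define FRAC_BITS 9
    #define BIAS      31

    static uint16_t pack_custom(float f)
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        uint16_t sign = (uint16_t)((bits >> 31) << 15);
        int32_t  exp  = (int32_t)((bits >> 23) & 0xFF) - 127 + BIAS;
        uint16_t frac = (uint16_t)((bits >> (23 - FRAC_BITS)) & 0x1FFu);
        if (exp <= 0)  return sign;                      /* flush to zero   */
        if (exp >= 63) return sign | (63u << FRAC_BITS); /* overflow -> inf */
        return sign | ((uint16_t)exp << FRAC_BITS) | frac;
    }

    static float unpack_custom(uint16_t h)
    {
        uint32_t sign = (uint32_t)(h >> 15) << 31;
        int32_t  exp  = (h >> FRAC_BITS) & 0x3F;
        uint32_t frac = (uint32_t)(h & 0x1FFu) << (23 - FRAC_BITS);
        uint32_t bits;
        if (exp == 0)       bits = sign;               /* zero     */
        else if (exp == 63) bits = sign | 0x7F800000u; /* infinity */
        else bits = sign | (uint32_t)(exp - BIAS + 127) << 23 | frac;
        float f;
        memcpy(&f, &bits, sizeof f);
        return f;
    }

    int main(void)
    {
        float x = 3.14159f;
        printf("%g -> 0x%04X -> %g\n",
               x, pack_custom(x), unpack_custom(pack_custom(x)));
        return 0;
    }

Narrowing the exponent and fraction to what an application actually needs is what lets such operators fit in a fraction of an FPGA, at the cost of range and precision.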

248 citations


"An efficient implementation of floa..." refers methods in this paper

  • ...In [3], a custom 16/18 bit three stage pipelined floating point multiplier that doesn’t support rounding modes was implemented....

    [...]