
Showing papers on "Residue number system" published in 2013


Proceedings ArticleDOI
07 Apr 2013
TL;DR: A new RNS Montgomery multiplication algorithm with fault detection capability that is additionally compatible with leak-resistant arithmetic is proposed, and an architecture that implements the proposed algorithm is presented.
Abstract: Recent studies have demonstrated the importance of protecting the hardware implementations of cryptographic functions against side channel and fault attacks. In recent years, very efficient implementations of modular arithmetic have been done in RNS (RSA, ECC, pairings) on FPGAs as well as on GPUs. Thus the protection of RNS Montgomery modular multiplication is a crucial issue. For that purpose, some techniques have been proposed to protect this RNS operation against side channel analysis. Nevertheless, there are still no effective and generic approaches for the detection of fault injection that would additionally be compatible with leak-resistant arithmetic. This paper proposes a new RNS Montgomery multiplication algorithm with fault detection capability. A mathematical analysis demonstrates the validity of the proposed approach. Moreover, an architecture that implements the proposed algorithm is presented. A comparative analysis shows that the introduction of the proposed fault detection technique requires only a limited increase in area.

28 citations


Patent
30 Apr 2013
TL;DR: In one or more implementations, an RNS ALU or processor comprises a plurality of digit slices configured to perform modular arithmetic functions, and operation of the digit slices may be controlled by a controller.
Abstract: Methods and systems for residue number system based ALUs, processors, and other hardware provide the full range of arithmetic operations while taking advantage of the benefits of the residue numbers in certain operations. In one or more embodiments, an RNS ALU or processor comprises a plurality of digit slices configured to perform modular arithmetic functions. Operation of the digit slices may be controlled by a controller. Residue numbers may be converted to and from fixed or mixed radix number systems for internal use and for use in various computing systems.
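
The conversion to and from residue form mentioned in this abstract can be illustrated in software. The following is a minimal sketch, not the patented hardware: the moduli set and all function names are assumptions chosen for illustration, and the reverse conversion uses the textbook Chinese Remainder Theorem.

```python
from math import prod

# Hypothetical pairwise-coprime moduli; dynamic range M = 7 * 11 * 13 * 15 = 15015.
MODULI = (7, 11, 13, 15)

def to_rns(x, moduli=MODULI):
    """Forward conversion: one residue ("digit") per modulus."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Channel-wise addition; no carries cross between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def rns_mul(a, b, moduli=MODULI):
    """Channel-wise multiplication; each channel works independently."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))

def from_rns(residues, moduli=MODULI):
    """Reverse conversion with the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # modular inverse via pow needs Python 3.8+
    return total % M
```

Results are only meaningful while the true sum or product stays below the dynamic range M; beyond that the reconstructed value wraps around modulo M.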

27 citations


Journal ArticleDOI
20 Jan 2013
TL;DR: Experimental results obtained for the supported back-ends are presented targeting the implementation of the modular exponentiation used in the Rivest-Shamir-Adleman (RSA) algorithm and Elliptic Curve (EC) point multiplication and suggest competitive latency and throughput with minimum design effort.
Abstract: This article proposes the Computing with the Residue Number System (CRNS) framework, which aims at the design automation of accelerators for Modular Arithmetic (MA). The framework provides a comprehensive set of tools ranging from a programming language and respective compiler to back-ends targeting parallel computation platforms such as Graphics Processing Units (GPUs) and reconfigurable hardware. Given an input algorithm described with a high-level programming language, the CRNS can be used to obtain in a few seconds the corresponding optimized Parallel Thread Execution (PTX) program ready to be run on GPUs or the Hardware Description Language (HDL) specification of a fully functional accelerator suitable for reconfigurable hardware and embedded systems. The resulting framework's implementations benefit from the Residue Number System (RNS) arithmetic's parallelization properties in a fully automated way. Designers do not need to be familiar with the mathematical details concerning the employed arithmetic, namely the RNS representation. In order to thoroughly describe and evaluate the proposed framework, experimental results obtained for the supported back-ends (GPU and HDL) are presented targeting the implementation of the modular exponentiation used in the Rivest-Shamir-Adleman (RSA) algorithm and Elliptic Curve (EC) point multiplication. Results suggest competitive latency and throughput with minimum design effort, overcoming the development issues that arise in the specification and verification of dedicated solutions.

25 citations


Journal ArticleDOI
TL;DR: This paper presents a new design of efficient residue generators and the design approach is demonstrated with large input wordlength of 64 bits for arbitrary moduli of up to 6 bits, and the proposed design eliminates the bottleneck carry propagation additions and modular adder tree of existing designs, and circumvents the undesirably high architectural disparity.
Abstract: Recent analyses demonstrate that operations in some bases of the Residue Number System (RNS) exhibit higher resiliency to process variations than in the normal binary number system. Under this premise, arbitrary moduli offer greater flexibility in forming high cardinality balanced RNS with variation-insensitive small residue operations for a given dynamic range. Because an arbitrary modulus offers few number-theoretic properties to exploit, converting an integer into a residue for such a modulus is as difficult as a complex arithmetic operation, particularly when the wordlength ratio of integer to modulus is very large. This paper presents a new design of efficient residue generators, and the design approach is demonstrated with a large input wordlength of 64 bits for arbitrary moduli of up to 6 bits. The proposed design eliminates the bottleneck carry propagation additions and modular adder tree of existing designs, and circumvents the undesirably high architectural disparity for different moduli of inconsistent cyclic periodicity. Our experimental results on moduli of different periodicities show that the proposed design is on average 27.7% faster and 28.7% smaller than the state-of-the-art residue generator. Our power simulation results also show that the proposed residue generator reduces the total power and the leakage power of the latter by 44.5% and 24.7% on average, respectively.
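
The "cyclic periodicity" this abstract refers to can be shown with a small software sketch. It is an illustration of the general folding idea only, not the paper's hardware design, and it assumes an odd modulus; the function names are hypothetical. For an odd modulus m, the powers of 2 modulo m repeat with some period P, so an input can be cut into P-bit chunks that are simply added before one small final reduction.

```python
def period_of_two(m):
    """Multiplicative order of 2 modulo an odd m > 1 (the cyclic period)."""
    if m < 3 or m % 2 == 0:
        raise ValueError("m must be odd and greater than 1")
    p, v = 1, 2 % m
    while v != 1:
        v = (v * 2) % m
        p += 1
    return p

def residue_by_folding(x, m):
    """Compute x mod m for x >= 0 by adding P-bit chunks, since 2^P = 1 (mod m)."""
    p = period_of_two(m)
    mask = (1 << p) - 1
    acc = 0
    while x:
        acc += x & mask   # every chunk carries weight 1 modulo m
        x >>= p
    return acc % m        # small final reduction of the folded sum
```

Moduli with a long period (for example 59, whose period is 58) benefit little from this folding on a 64-bit input, which hints at why designs tuned to one periodicity differ so much from those tuned to another.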

25 citations


Book ChapterDOI
20 Aug 2013
TL;DR: A new RNS modular inversion algorithm based on the extended Euclidean algorithm and the plus-minus trick is described, which is 6 to 10 times faster than an RNS version based on Fermat's little theorem.
Abstract: The paper describes a new RNS modular inversion algorithm based on the extended Euclidean algorithm and the plus-minus trick. In our algorithm, comparisons over large RNS values are replaced by cheap computations modulo 4. Comparisons to an RNS version based on Fermat's little theorem were carried out. The number of elementary modular operations is significantly reduced: by a factor of 12 to 26 for multiplications and 6 to 21 for additions. Virtex 5 FPGA implementations show that, for a similar area, our plus-minus RNS modular inversion is 6 to 10 times faster.
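
The two inversion strategies compared in this abstract can be contrasted on ordinary integers. This is a rough sketch only: it works in plain binary arithmetic rather than RNS and does not implement the plus-minus trick, and the function names are assumptions.

```python
def inverse_fermat(a, p):
    """a^(p-2) mod p; valid only for prime p and a not divisible by p."""
    return pow(a, p - 2, p)

def inverse_euclid(a, p):
    """Modular inverse via the extended Euclidean algorithm."""
    old_r, r = a % p, p
    old_s, s = 1, 0          # invariant: old_s * a = old_r (mod p)
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ValueError("a is not invertible modulo p")
    return old_s % p
```

The Fermat version costs a full modular exponentiation, while the Euclidean version needs divisions and comparisons, which are the operations that are awkward in RNS and that the paper replaces with computations modulo 4.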

22 citations


Journal ArticleDOI
TL;DR: In this brief, a fast and area efficient $2^n$ signed integer RNS scaler for the moduli set $\{2^n-1, 2^n, 2^n+1\}$ is proposed and achieves at least 21.6% of area saving, 28.8% of speedup, and 32.5% of total power reduction for n ranging from 5 to 8.
Abstract: Scaling is a problematic operation in residue number system (RNS) but a necessary evil in implementing many digital signal processing algorithms for which RNS is particularly good. Existing signed integer RNS scalers entail a dedicated sign detection circuit, which is as complex as the magnitude scaling operation preceding it. In order to correct the incorrectly scaled negative integer in residue form, substantial hardware overheads have been incurred to detect the range of the residues upon magnitude scaling. In this brief, a fast and area efficient $2^n$ signed integer RNS scaler for the moduli set $\{2^n-1, 2^n, 2^n+1\}$ is proposed. A complex sign detection circuit has been obviated and replaced by simple logic manipulation of some bit-level information of intermediate magnitude scaling results. Compared with the latest signed integer RNS scalers of comparable dynamic ranges, the proposed architecture achieves at least 21.6% of area saving, 28.8% of speedup, and 32.5% of total power reduction for n ranging from 5 to 8.
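
To make the scaling operation concrete, here is a minimal sketch of magnitude scaling by $2^n$ for this moduli set. It handles only non-negative values, so it leaves out the sign handling that is the paper's actual contribution, and the function name is hypothetical. It relies on $2^n \equiv 1 \pmod{2^n-1}$ and $2^n \equiv -1 \pmod{2^n+1}$.

```python
def scale_by_2n(x1, x2, x3, n):
    """Residues of floor(X / 2^n) for moduli (2^n - 1, 2^n, 2^n + 1), X >= 0."""
    m1, m2, m3 = (1 << n) - 1, 1 << n, (1 << n) + 1
    # Y = (X - x2) / 2^n; dividing by 2^n is multiplying by 1 (mod m1) or -1 (mod m3).
    y1 = (x1 - x2) % m1
    y3 = (x2 - x3) % m3
    # Base extension: Y < m1 * m3, so rebuild Y from (y1, y3) and reduce mod 2^n.
    k = ((y3 - y1) * pow(m1, -1, m3)) % m3
    y = y1 + m1 * k
    return y1, y % m2, y3
```

The two outer channels need only a subtraction each; the cost is concentrated in recovering the middle residue, because $2^n$ has no inverse modulo $2^n$.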

22 citations


Proceedings ArticleDOI
14 Nov 2013
TL;DR: The use of a Residue Number System error correcting code to increase data transmission reliability in WSNs is considered.
Abstract: The WSN standard IEEE 802.15.4 mainly uses the 2.4 GHz frequency for data transmission. This unlicensed frequency band is used by a variety of devices, standards and applications: IEEE 802.11, Bluetooth, etc. This paper considers increasing data transmission reliability in WSNs with the use of a Residue Number System error correcting code. These codes have high correcting characteristics and a simplified coding procedure.

20 citations


Proceedings Article
20 Jun 2013
TL;DR: The proposed work proves that using the RNS results in faster and power-reduced image filtering applications, and highlights the negative effect of using improper moduli sets in RNS based image-processing applications.
Abstract: This paper presents an implementation of a digital image processing application on the field programmable gate array (FPGA) using the residue number system (RNS). This application includes applying a number of filters in spatial domain, such as sharpen, edge detection and enhancing on a gray-scale image. All the processing is done using the RNS instead of the binary number system. The proposed work proves that using the RNS results in faster and power-reduced image filtering applications. Moreover, this paper highlights the negative effect of using improper moduli sets in RNS based image-processing applications.

18 citations


Journal ArticleDOI
TL;DR: A novel algorithm and its VLSI implementation structure are proposed for a modulo $2^n-2^k-1$ adder to eliminate the re-computation of carries and get a flexible tradeoff between area and delay.
Abstract: The modular adder is one of the key components for the application of the residue number system (RNS). Moduli sets with the form $2^n-2^k-1$ ($1 \le k \le n-2$) can offer excellent balance among the RNS channels for multi-channel RNS processing. In this paper, a novel algorithm and its VLSI implementation structure are proposed for the modulo $2^n-2^k-1$ adder. In the proposed algorithm, parallel prefix operation and carry correction techniques are adopted to eliminate the re-computation of carries. Any existing parallel prefix structure can be used in the proposed structure. Thus, we can get a flexible tradeoff between area and delay with the proposed structure. Compared with modular adders of the same type with traditional structures, the proposed modulo $2^n-2^k-1$ adder offers better performance in delay and area.

18 citations


Proceedings ArticleDOI
04 Sep 2013
TL;DR: This paper proposes a robustness evaluation of an RSA cryptosystem against collision attacks and correlation electromagnetic analysis, and explores the robustness of RNS-RSA against EM analyses.
Abstract: This paper proposes a robustness evaluation of an RSA cryptosystem against collision attacks and correlation electromagnetic analysis. Our hardware co-processor is based on the Residue Number System (RNS) in order to perform modular operations over large numbers. To increase its robustness against Side-Channel Analysis, we implemented two different countermeasures. The first one spatially permutes the elements of the RNS bases in order to blur electromagnetic emanations. The second countermeasure aims at randomizing RNS bases before each modular exponentiation. To the best of the authors' knowledge, this is the first paper that explores the robustness of RNS-RSA against EM analyses.

14 citations


Proceedings ArticleDOI
01 Jul 2013
TL;DR: This work proposes a novel architecture for image filtering and edge detection using the Residue Number System (RNS), and shows that employing RNS for the arithmetic processing leads to small hardware complexity and high operation frequency.
Abstract: Image edge detection plays a fundamental role in image processing as well as in computer and machine vision, and when applied to medical images it can improve medical diagnosis. Thus, an efficient hardware implementation of an edge detection system is highly desirable. In this work we propose a novel architecture for image filtering and edge detection using the Residue Number System (RNS). A VLSI implementation of the proposed architecture shows that employing RNS for the arithmetic processing leads to small hardware complexity and high operation frequency.

Proceedings ArticleDOI
19 May 2013
TL;DR: It is shown that the RNS substantially reduces the filter sensitivity to delay variations, when compared to digital filter designs that use conventional positional number systems, such as the widely-used two's-complement representation.
Abstract: This paper investigates the use of the Residue Number System (RNS) in the hardware design of VLSI FIR filters implemented in nano-scale technologies prone to process variation effects. It is shown that the RNS substantially reduces the filter sensitivity to delay variations, when compared to digital filter designs that use conventional positional number systems, such as the widely-used two's-complement representation. The inherent tolerance of the introduced RNS architectures to delay variations is shown to make large design parameter margins unnecessary. Therefore, we demonstrate that the use of RNS can achieve a high timing yield without resorting to costly over-design, which may unnecessarily increase system complexity. This benefit comes in addition to the area, time and power benefits achieved due to the use of the RNS. The quantitative digital filter design space exploration reported in the paper takes into consideration the filter order as well as criteria related to the filter output signal quality, such as the signal-to-noise ratio (SNR), and it demonstrates that the proposed architectures offer effective solutions for hardware design using modern and future nano-scale processes, for filter cases of practical interest.

Journal ArticleDOI
TL;DR: The Cyclic Property of Residue-Digit Difference (CPRDD) and the Target Race Distance (TRD) are presented, which are used to speed up residue scaling and error correction in residue number systems without the need for Mixed Radix Conversion or Chinese Remainder Theorem techniques.
Abstract: In this paper, we present two new algorithms in residue number systems for scaling and error correction. The first algorithm is the Cyclic Property of Residue-Digit Difference (CPRDD). It is used to speed up residue multiple error correction through its parallel processes. The second is called the Target Race Distance (TRD). It is used to speed up residue scaling. Both algorithms are used without the need for Mixed Radix Conversion (MRC) or Chinese Remainder Theorem (CRT) techniques, which are time-consuming and hardware-intensive. Furthermore, the residue scaling can be performed in parallel for any combination of moduli set members without using lookup tables.

Journal ArticleDOI
TL;DR: Theoretical analysis and simulation results demonstrate that the proposed residue number system (RNS)-based OFDM parallel transmission scheme has the ability to achieve desirable PAPR reduction and low computational complexity without nonlinear distortion.
Abstract: The peak-to-average power ratio (PAPR) is one of the main challenges in multicarrier transmissions. Aiming at reducing the PAPR, we propose a residue number system (RNS)-based OFDM parallel transmission scheme. The key idea of the proposed scheme is to utilize the parallel property of RNS to convert the input signals into parallel smaller residue signals, while utilizing the characteristic of the RNS modular operation to effectively limit the output in each residue subchannel after the inverse fast Fourier transform, which is smaller than the corresponding modulus. The main contribution of the proposed scheme is to reduce the dynamic range of the transmitted signal without nonlinear distortion so as to reduce the PAPR during the transmission. A generalized performance of the proposed scheme is analyzed in this paper, including the PAPR reduction, the complexity, the transmission bandwidth, etc. Also, an approximate formula to calculate the transmission bandwidth of the proposed scheme is derived, which simplifies the design procedure in practice and implies that a minor increase of the dynamic range of RNS will bring a comparative improvement in transmission bandwidth consumption. Theoretical analysis and simulation results demonstrate that the proposed scheme can achieve desirable PAPR reduction and low computational complexity without nonlinear distortion.

Proceedings ArticleDOI
01 Nov 2013
TL;DR: A novel algorithm for single error correction in a redundant residue number system (RRNS) is proposed, which introduces pseudo syndromes to represent the extension of received residues by exploiting the parallelism property of the Chinese remainder theorem (CRT).
Abstract: In this paper, we propose a novel algorithm for single error correction in a redundant residue number system (RRNS). First, pseudo syndromes are introduced to represent the extension of received residues by exploiting the parallelism property of the Chinese remainder theorem (CRT). Then each pseudo syndrome is modified by canceling a list of specific values in parallel. For each specific value, we define the corresponding result as the assuming syndrome. Under specific conditions there exists one and only one assuming syndrome equal to the actual syndrome. Finally, the assuming syndromes are used to locate a unique error through a mapping built in advance. The hardware complexity of this new algorithm is $O(k)$ and the latency is $O(\log_2 k)$.
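
For readers new to RRNS codes, single error correction can be sketched with the classical projection approach, which is swapped in here for simplicity and is not the pseudo-syndrome algorithm of this paper. The moduli, names, and the choice of two redundant moduli larger than every information modulus are assumptions of the sketch.

```python
from math import prod

INFO = (3, 5, 7)        # hypothetical information moduli; legitimate range [0, 105)
REDUNDANT = (11, 13)    # hypothetical redundant moduli, larger than all of INFO
MODULI = INFO + REDUNDANT
LEGIT = prod(INFO)

def crt(residues, moduli):
    """Chinese Remainder Theorem reconstruction (pow inverse needs Python 3.8+)."""
    M = prod(moduli)
    return sum(r * (M // m) * pow(M // m, -1, m)
               for r, m in zip(residues, moduli)) % M

def encode(x):
    return tuple(x % m for m in MODULI)

def correct_single_error(received):
    """Return (value, error_index); error_index is None if nothing looks wrong."""
    received = tuple(received)
    value = crt(received, MODULI)
    if value < LEGIT:
        return value, None
    for i in range(len(MODULI)):
        # Projection: drop residue i and reconstruct from the remaining four.
        candidate = crt(received[:i] + received[i + 1:], MODULI[:i] + MODULI[i + 1:])
        if candidate < LEGIT:
            return candidate, i
    raise ValueError("more than one residue appears to be corrupted")
```

With this moduli choice any two legitimate codewords differ in at least three residues, so at most one projection can fall in the legitimate range after a single error. The projections are tried sequentially here, whereas the paper targets a parallel hardware realization.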

Proceedings ArticleDOI
01 Nov 2013
TL;DR: Four fast and small architectures for these specific moduli, targeting modern FPGAs with fast carry chains and designed for binary and diminished-one representation with and without zero value management, are presented.
Abstract: Modular addition is a widely used operation in Residue Number System applications. Specific sets of moduli allow fast RNS operations such as binary conversions and multiplications. Most of them use modulo $2^n - 1$ and $2^n + 1$ additions. This paper presents four fast and small architectures for these specific moduli targeting modern FPGAs with fast carry chains. The use of this arithmetic dedicated feature allows fast and small modular adders. Our modulo $2^n - 1$ adders have a single zero representation. Our modulo $2^n + 1$ adders are designed for binary and diminished-one representation with and without zero value management.
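
The "single zero representation" for modulo $2^n - 1$ addition can be illustrated with a short behavioral sketch. This models the well-known end-around-carry technique in software and is not the FPGA architecture of the paper; the function name is hypothetical.

```python
def add_mod_2n_minus_1(a, b, n):
    """Add a, b in [0, 2^n - 2] modulo 2^n - 1 using an end-around carry."""
    mask = (1 << n) - 1
    s = a + b
    s = (s & mask) + (s >> n)      # feed the carry-out back into the least significant bit
    return 0 if s == mask else s   # collapse the all-ones pattern so zero has one encoding
```

Without the final step, both the all-zeros and the all-ones bit patterns would stand for zero, which complicates later comparisons and zero detection.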

Proceedings ArticleDOI
01 Oct 2013
TL;DR: This paper addresses a recent work on modulo-$(2^n \pm 3)$ multipliers that are realized as normal n-bit multipliers, followed by conversion of 2n-bit products to RNS residues, and aims to enhance the performance of such modular multipliers by eliminating the carry propagate adder that operates at the end of the preliminary binary multiplication.
Abstract: Modular adders and multipliers have applications in residue number system (RNS) arithmetic, cryptography, and error-checking, where general architectures are usually designed for moduli of the form 2n±k ± 1, with very efficient realizations. However, less efficient arithmetic circuits also occasionally appear in the relevant literature for moduli of the form $2^n \pm \delta$, where $\delta$ is an odd integer and $\delta \neq 1$. In particular, adders, multipliers and RNS converters have been recently offered for modulo-$(2^n \pm 3)$. In this paper, we address a recent work on modulo-$(2^n \pm 3)$ multipliers that are realized as normal n-bit multipliers, followed by conversion of 2n-bit products to RNS residues. We aim to enhance the performance of such modular multipliers by eliminating the carry propagate adder that operates at the end of the preliminary binary multiplication. Analytical and synthesis based evaluation has shown improvements in latency and power dissipation. Our designs also require less area for the same delay.

Proceedings ArticleDOI
25 Nov 2013
TL;DR: A new architecture for carry-free BSD-RNS addition utilizing a recently proposed posibit and negabit BSD representation is presented, which has 21% less delay than a 2's complement BSD-RNS adder and, for the same delay, 48% less area and 28% less power than the most efficient existing BSD-RNS adder.
Abstract: Binary Signed-Digit Residue Number System (BSD-RNS) has been proposed in the literature as an appropriate number system to perform arithmetic operations in parallel. BSD-RNS addition is the basic operation, and improving its performance results in efficient VLSI arithmetic circuits. Here, we present a new architecture for carry-free BSD-RNS addition utilizing a recently proposed posibit and negabit BSD representation. Compared to the 2's complement BSD-RNS adder, the proposed architecture has 21% less delay. Besides, for the same delay (0.6 ns), we obtain 48% less area and 28% less power than the most efficient existing BSD-RNS adder.

Proceedings ArticleDOI
02 Jul 2013
TL;DR: The paper discusses the Residue Number System (RNS) implementation of an all-pole recursive digital filter, which offers the advantage of using arithmetic based on integer operations and a simple hardware realization involving arithmetic operations implemented in parallel.
Abstract: The paper discusses the Residue Number System (RNS) implementation of an all-pole recursive digital filter. The RNS offers the advantage of using arithmetic based on integer operations and a simple hardware realization involving arithmetic operations implemented in parallel. An RNS implementation of an IIR filter needs a scaling operation to keep data within the limited dynamic range. The effects of residue scaling on the performance of recursive digital filters are described here.

Journal Article
TL;DR: The Aryabhata remainder theorem (ART) is used to achieve the functionality of a t-out-of-n OT protocol that is more efficient than Chang and Lee’s mechanism, and BAN logic is used to prove that the proposed protocol maintains security when messages are transmitted between the sender and the receiver.
Abstract: Because the t-out-of-n oblivious transfer (OT) protocol can guarantee the privacy of both participants, i.e., the sender and the receiver, it has been used extensively in the study of cryptography. Recently, Chang and Lee presented a robust t-out-of-n OT protocol based on the Chinese remainder theorem (CRT). In this paper, we use the Aryabhata remainder theorem (ART) to achieve the functionality of a t-out-of-n OT protocol, which is more efficient than Chang and Lee’s mechanism. Analysis showed that our proposed protocol meets the fundamental requirements of a general t-out-of-n OT protocol. We also utilized BAN logic to prove that our proposed protocol maintains the security when messages are transmitted between the sender and the receiver.

Proceedings ArticleDOI
01 Jul 2013
TL;DR: A simple methodology for designing reverse converters based on the Chinese Remainder Theorem (CRT) or the New CRT-I method is introduced and proposed converters are shown to be area, delay and power efficient for several moduli sets.
Abstract: The diminished-one encoding is often considered when representing the operands in the modulo $2^k+1$ channels of a Residue Number System (RNS) since it can offer increased arithmetic processing speed. However, limited research is available on the design of residue-to-binary (reverse) converters for RNSs that use the diminished-one encoding in one or more channels. In this paper we introduce a simple methodology for designing such converters which can be applied to reverse converters based on the Chinese Remainder Theorem (CRT) or the New CRT-I method. Efficient converters for three moduli sets, covering different dynamic ranges, are also analytically presented. The proposed converters are shown to be area, delay and power efficient for several moduli sets.

Proceedings ArticleDOI
02 Dec 2013
TL;DR: A novel implementation of a scaling system with a constant scaling factor for a three-moduli set is presented; it differs from existing scaling systems mainly in eliminating the overhead conversion system and employing modular reducers in the implementation of the design.
Abstract: The need for faster digital circuitry has turned researchers to alternative number systems and arithmetic level modifications. One such number system is the Residue Number System (RNS). With computationally fast addition, subtraction and multiplication, it has become widely used in many areas of Digital Signal Processing (DSP). The drawback of using RNS is that it has several computationally slow and resource-intensive operations, most notably scaling and conversion. This paper presents a novel implementation of a scaling system with a constant scaling factor for the three-moduli set $\{2^n - 1, 2^n, 2^n + 1\}$. The proposed system differs from existing scaling systems mainly in eliminating the overhead conversion system and employing modular reducers in the implementation of the design. Many algorithms for fast scaling in RNS exist, but this paper focuses on developing a particular algorithm using adder based techniques.

Journal ArticleDOI
05 Mar 2013
TL;DR: A technique based on the residue number system with a diminished-1 encoded channel has been used for implementing a finite impulse response (FIR) digital filter, and a systolic design is used for the efficient realization of the FIR filter.
Abstract: A technique based on the residue number system (RNS) with a diminished-1 encoded channel has been used for implementing a finite impulse response (FIR) digital filter. The proposed RNS architecture of the filter consists of three main blocks: a forward converter, a reverse converter, and an arithmetic processor for each channel. An architecture for a residue-to-binary (reverse) converter with a diminished-1 encoded channel has been proposed. Besides, for all RNS channels, a systolic design is used for the efficient realization of the FIR filter. A numerical example illustrates the principles of diminished-1 residue arithmetic, signal processing, and decoding for FIR filters.
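
The three-block structure described in this abstract (forward conversion, per-channel arithmetic, reverse conversion) can be mimicked in a few lines. This is a behavioral sketch under assumptions: the moduli set and function name are hypothetical, every channel uses ordinary binary residues rather than diminished-1 encoding, and the reverse conversion is a plain CRT.

```python
from math import prod

MODULI = (31, 32, 33)   # hypothetical set {2^5 - 1, 2^5, 2^5 + 1}; M = 32736

def fir_rns(samples, taps, moduli=MODULI):
    """FIR filter y[i] = sum_k taps[k] * samples[i - k], computed channel by channel."""
    M = prod(moduli)
    out = []
    for i in range(len(samples)):
        acc = [0] * len(moduli)
        for k, h in enumerate(taps):
            if i - k >= 0:
                x = samples[i - k]
                # Forward conversion plus multiply-accumulate, independent per channel.
                acc = [(a + (h % m) * (x % m)) % m for a, m in zip(acc, moduli)]
        # Reverse conversion (CRT), then map [0, M) back to the signed range.
        y = sum(r * (M // m) * pow(M // m, -1, m) for r, m in zip(acc, moduli)) % M
        out.append(y - M if y >= M // 2 else y)
    return out
```

The output is correct only while every true filter output lies in the signed range $[-M/2, M/2)$, which is why the moduli must be sized from the tap and sample word lengths.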

Book ChapterDOI
20 Jul 2013
TL;DR: Experimental results show that the proposed crypto-processor with coarse-grained reconfigurable datapath has better tradeoff among algorithm flexibility, performance and area than other related works.
Abstract: This paper proposes a unified and programmable crypto-processor with a coarse-grained reconfigurable datapath to perform either RSA or elliptic curve cryptosystems (ECC) over the prime field GF(p), which uses the Residue Number System (RNS) as basic arithmetic to exploit data-level parallelism and a Transport Triggered Architecture to improve instruction-level parallelism. The reconfigurable datapath provides three configuration modes to accelerate the RNS Montgomery multiplication (RNSMM). An efficient RNS base, $2^n - c_i$, is chosen to reduce the multiplication complexity of RNSMM. Experimental results show that the proposed processor has a better tradeoff among algorithm flexibility, performance and area than other related works.

Proceedings ArticleDOI
02 Dec 2013
TL;DR: This work contributes to the modular multiplication operation C = A × B, the basis of many public-key cryptosystems including RSA and Elliptic Curve Cryptography (ECC).
Abstract: This work contributes to the modular multiplication operation C = A × B, the basis of many public-key cryptosystems including RSA and Elliptic Curve Cryptography (ECC). We use the Residue Number System (RNS) to speed up long wordlength modular multiplication. The RNS leads to a highly parallel algorithm which we exploit with a massively parallel hardware implementation capable of exceptionally low latency. This paper presents an architecture for this scheme consisting of a scalable array of identical processing elements.

01 Jan 2013
TL;DR: The proposed multipliers are considerably faster and more compact than the hardware implementations they are compared with, which makes them a viable option for efficient designs.
Abstract: Modular multiplication plays an important role in encryption. One encryption method that needs fast modular multiplication is RSA, where large numbers are needed to support large moduli. In such methods, RNS is usually used to represent numbers, with multiplication as the core operation. Modulo $2^n+1$ multipliers are primitive computational logic components widely used in residue arithmetic, digital signal processing, fault tolerance and cryptography. Here, two residue number system multipliers are introduced, both based on classifications of couples or triplets of input operands, which results in a low complexity RNS multiplier. The first modular multiplier is a combinational circuit which enables parallel prefix adder application in modulo $2^n+1$. The second modulo $2^n+1$ multiplier uses $n+1$ partial products, each $n$ bits wide, constructed by utilizing an inverted end-around-carry, carry save adder (CSA) tree and a parallel adder at the end. The performance and efficiency of the proposed multipliers are evaluated and compared with those of the earlier fastest modulo $2^n+1$ multipliers. The proposed multipliers are considerably faster and more compact than the hardware implementations they are compared with, which makes them a viable option for efficient designs.

Proceedings ArticleDOI
25 Nov 2013
TL;DR: The proposed residue generator is well suited for Residue Number System (RNS) based applications which use both modulo $2^n+1$ and $2^n-1$ residues.
Abstract: In this paper, we propose an efficient residue generator which concurrently computes the residues modulo $2^n+1$ and modulo $2^n-1$. The input operands are divided into n-bit vectors which are then grouped into two sets and added by two separate Carry Save Adder (CSA) trees. The output carry of each stage of the first CSA tree is used as input to the corresponding stage of the second CSA tree, while the output vectors of the trees are finally added modulo $2^n+1$ and $2^n-1$ to compute the residues. The proposed residue generator is well suited for Residue Number System (RNS) based applications which use both modulo $2^n+1$ and $2^n-1$ residues. An efficient configurable modulo $2^n \pm 1$ residue generator is also proposed.
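
The arithmetic identity behind splitting the input into n-bit vectors can be shown in software. This is a sketch of the underlying number theory only, not the shared CSA-tree hardware of the paper, and the function name is hypothetical. Since $2^n \equiv 1 \pmod{2^n-1}$ and $2^n \equiv -1 \pmod{2^n+1}$, the first residue comes from a plain sum of the chunks and the second from an alternating sum.

```python
def residues_2n_pm1(x, n):
    """Return (x mod (2^n + 1), x mod (2^n - 1)) for x >= 0 from its n-bit chunks."""
    mask = (1 << n) - 1
    plain, alternating, sign = 0, 0, 1
    while x:
        chunk = x & mask
        plain += chunk                # each chunk has weight 1 modulo 2^n - 1
        alternating += sign * chunk   # weights alternate +1, -1 modulo 2^n + 1
        sign = -sign
        x >>= n
    return alternating % (mask + 2), plain % mask
```

Both sums read the same chunks in a single pass, which is the sharing opportunity that a concurrent generator for the two moduli exploits.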

Proceedings ArticleDOI
17 Oct 2013
TL;DR: The Residue Number System (RNS) has great potential for accelerating arithmetic operation, achieved by breaking operands into several residues and computing with them independently.
Abstract: The paper is a part of student cooperation in AKTION project (Austria-Czech). Theoretical work on the numerical solution of ordinary differential equations by the Taylor series method has been going on for a number of years. The simulation language TKSL has been created to test the properties of the technical initial problems and to test an algorithm for Taylor series method [1]. The Residue Number System (RNS) has great potential for accelerating arithmetic operation, achieved by breaking operands into several residues and computing with them independently.

Journal Article
TL;DR: In this work a scaling technique for signed residue numbers is proposed, based on conversion to the Mixed-Radix System adapted for FPGA implementation; the basic blocks of the scaler are realized as modified two-operand modulo adders with additional multiply and modulo reduction operations included.
Abstract: In this work a scaling technique for signed residue numbers is proposed. The method is based on conversion to the Mixed-Radix System (MRS) adapted for FPGA implementation. The scaling factor is assumed to be a product of moduli from the Residue Number System (RNS) base. Scaling is performed by scaling the terms of the mixed-radix expansion, generating residue representations of the scaled terms, adding these representations in binary, and generating residues for all moduli. The sign is detected on the basis of the value of the most significant coefficient of the MRS representation. For negative numbers, the residues are corrected accordingly. The basic blocks of the scaler are realized as modified two-operand modulo adders with additional multiply and modulo reduction operations included. The pipelined realization of the scaler in the Xilinx environment is shown and analyzed with respect to hardware amount and maximum pipelining frequency. The design is based on LUTs ($2^6 \times 1$) that simulate small RAMs serving as the main component for the look-up realization.
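
The mixed-radix expansion that this scaler is built on can be sketched as follows. This is the textbook mixed-radix conversion on ordinary Python integers, not the pipelined FPGA design, and it omits the scaling and sign-correction steps; the function names are hypothetical.

```python
def mixed_radix_digits(residues, moduli):
    """Digits a_i with X = a_0 + a_1*m_0 + a_2*m_0*m_1 + ... and 0 <= a_i < m_i."""
    r = list(residues)
    digits = []
    for i, mi in enumerate(moduli):
        a = r[i] % mi
        digits.append(a)
        # Subtract the digit and divide by m_i in every remaining channel.
        for j in range(i + 1, len(moduli)):
            mj = moduli[j]
            r[j] = ((r[j] - a) * pow(mi, -1, mj)) % mj   # needs Python 3.8+
    return digits

def from_mixed_radix(digits, moduli):
    """Evaluate the mixed-radix expansion back into an ordinary integer."""
    x, weight = 0, 1
    for a, m in zip(digits, moduli):
        x += a * weight
        weight *= m
    return x
```

Because the last digit carries the largest weight, its value indicates roughly where X lies in the dynamic range, which is what makes the most significant coefficient usable for sign detection.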

Book ChapterDOI
01 Jan 2013
TL;DR: This chapter attempts to review opportunities to exploit the bit-level flexibility of the target to match the arithmetic to the application and systematically investigates non-standard precisions, but also non-standard number systems and non-standard operations which can be implemented efficiently on reconfigurable hardware.
Abstract: An often overlooked way to increase the efficiency of HPC on FPGA is to exploit the bit-level flexibility of the target to match the arithmetic to the application. The ideal operator, for each elementary computation, should toggle and transmit just the number of bits required by the application at this point. FPGAs have the potential to get much closer to this ideal than microprocessors. Therefore, reconfigurable computing should systematically investigate non-standard precisions, but also non-standard number systems and non-standard operations which can be implemented efficiently on reconfigurable hardware. This chapter attempts to review these opportunities systematically.