
Showing papers in "Iet Computers and Digital Techniques in 2007"


Journal ArticleDOI
TL;DR: A novel design is provided for the BCD-digit multiplier, which can serve as the key building block of a decimal multiplier, irrespective of the degree of parallelism, in semi- and fully parallel hardware decimal multiplication units.
Abstract: With the growing popularity of decimal computer arithmetic in scientific, commercial, financial and Internet-based applications, hardware realisation of decimal arithmetic algorithms is gaining more importance. Hardware decimal arithmetic units now serve as an integral part of some recently commercialised general purpose processors, where complex decimal arithmetic operations, such as multiplication, have been realised by rather slow iterative hardware algorithms. However, with the rapid advances in very large scale integration (VLSI) technology, semi- and fully parallel hardware decimal multiplication units are expected to evolve soon. The dominant representation for decimal digits is the binary-coded decimal (BCD) encoding. The BCD-digit multiplier can serve as the key building block of a decimal multiplier, irrespective of the degree of parallelism. A BCD-digit multiplier produces a two-BCD digit product from two input BCD digits. We provide a novel design for the latter, showing some advantages in BCD multiplier implementations.
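As a functional illustration of the building block described above (a behavioural model, not the paper's circuit design), a BCD-digit multiplier maps two BCD digits to a two-digit BCD product:

```python
def bcd_digit_multiply(a, b):
    """Multiply two BCD digits (0-9); return the two-BCD-digit product
    as a (tens, units) pair, each representable as a 4-bit nibble."""
    assert 0 <= a <= 9 and 0 <= b <= 9, "operands must be valid BCD digits"
    p = a * b                   # binary product, 0..81
    return p // 10, p % 10      # tens digit, units digit

# e.g. 7 x 9 = 63 -> tens digit 6, units digit 3
```

The hardware interest lies in computing this mapping without an actual binary multiply-and-divide, but the input/output behaviour is exactly this.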

73 citations


Journal ArticleDOI
TL;DR: The development of a very-large-scale integration architecture for a high-performance watermarking chip is presented which can perform both invisible robust and invisible fragile image watermarking in the spatial domain.
Abstract: Research in digital watermarking is mature. Several software implementations of watermarking algorithms are described in the literature, but few attempts have been made to describe hardware implementations. The ultimate objective of the research is to develop low-power, high-performance, real-time, reliable and secure watermarking systems, which can be achieved through hardware implementations. The development of a very-large-scale integration architecture for a high-performance watermarking chip is presented which can perform both invisible robust and invisible fragile image watermarking in the spatial domain. The watermarking architecture is prototyped in two ways: (i) by using a Xilinx field-programmable gate array and (ii) by building a custom integrated circuit. This prototype is the first watermarking chip with both invisible robust and invisible fragile watermarking capabilities.

68 citations


Journal ArticleDOI
TL;DR: Qualitative and quantitative comparisons indicate that the proposed multipliers compare favourably with the earlier solutions.
Abstract: A new modulo 2^n + 1 multiplier architecture is proposed for operands in the weighted representation. A new set of partial products is derived, and it is shown that all required correction factors can be merged into a single constant. It is also proposed that part of the correction factor be treated as a partial product, whereas the rest is handled by the final parallel adder. The proposed multipliers utilise a total of (n+1) partial products, each n bits wide, and are built using an inverted end-around-carry, carry-save adder tree and a final adder. Qualitative and quantitative area and delay comparisons indicate that the proposed multipliers compare favourably with earlier solutions.
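A behavioural reference for what such a multiplier computes; the partial-product set and correction-factor merging are hardware details not modelled here:

```python
def mul_mod_2n_plus_1(a, b, n):
    """Reference model of modulo-(2**n + 1) multiplication of weighted
    (ordinary binary) operands."""
    modulus = (1 << n) + 1
    assert 0 <= a < modulus and 0 <= b < modulus
    return (a * b) % modulus

# For n = 4 the modulus is 17; a hardware design would instead reduce
# (n+1) partial products with an inverted end-around-carry CSA tree.
```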

65 citations


Journal ArticleDOI
TL;DR: Two complementary approaches have been proposed to improve the efficiency of RNS based on triple moduli sets: enhancing multipliers modulo 2^n+1, which perform the most complex arithmetic operation, and overloading the binary channel in order to obtain a more balanced moduli set.
Abstract: Residue number systems (RNS) are non-weighted systems that allow addition, subtraction and multiplication to be performed concurrently and independently on each residue. The triple moduli set {2^n−1, 2^n, 2^n+1} and its extensions have gained unprecedented importance in RNS, mainly because of the simplicity of the arithmetic units for the individual channels and of the converters to and from RNS. However, there is neither a perfect balance between the various elements of this moduli set nor an exact equivalence in the complexity of the individual arithmetic units for each residue. Two complementary approaches have been proposed to improve the efficiency of RNS based on this type of moduli set: enhancing multipliers modulo 2^n+1, which perform the most complex arithmetic operation, and overloading the binary channel in order to obtain a more balanced moduli set. Experimental results show that, when applied together, these techniques can improve the efficiency of the multipliers by up to 32%.
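A small sketch of the channel-wise arithmetic that makes the {2^n−1, 2^n, 2^n+1} moduli set attractive; Chinese Remainder Theorem reconstruction stands in for the output converter (requires Python 3.8+ for the modular inverse via `pow`):

```python
from math import prod

def to_rns(x, moduli):
    return [x % m for m in moduli]

def from_rns(residues, moduli):
    """Chinese Remainder Theorem reconstruction (output-converter model)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)    # modular inverse of Mi mod m
    return x % M

n = 4
moduli = [2**n - 1, 2**n, 2**n + 1]     # {15, 16, 17}, pairwise coprime
a, b = 100, 23
prod_residues = [(x * y) % m            # independent per-channel multiplies
                 for x, y, m in zip(to_rns(a, moduli), to_rns(b, moduli), moduli)]
assert from_rns(prod_residues, moduli) == a * b   # 2300 < 15*16*17 = 4080
```

Each channel's multiply is small and independent, which is exactly the parallelism the abstract refers to; the imbalance is that the 2^n+1 channel needs one extra bit and more complex logic.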

58 citations


Journal ArticleDOI
TL;DR: This work provides basic results and motivation for continued study of the direct synthesis of NCV circuits, and establishes relations between function realizations in different circuit cost metrics.
Abstract: A breadth-first search method for determining optimal three-qubit circuits composed of quantum NOT, CNOT, controlled-V and controlled-V+ (NCV) gates is introduced. Results are presented for simple gate count and for technology-motivated cost metrics. The optimal NCV circuits are also compared with NCV circuits derived from optimal NOT, CNOT and Toffoli (NCT) gate circuits. This work provides basic results and motivation for continued study of the direct synthesis of NCV circuits, and establishes relations between function realizations in different circuit cost metrics.

48 citations


Journal ArticleDOI
TL;DR: Comparisons with Gaussian specific generators show that the new architecture uses less than half the resources, provides a higher sample rate, and retains statistical quality for up to 50 billion samples, but can also generate other distributions.
Abstract: A hardware architecture for non-uniform random number generation, which allows the generator's distribution to be modified at run-time without reconfiguration is presented. The architecture is based on a piecewise linear approximation, using just one table lookup, one comparison and one subtract operation to map from a uniform source to an arbitrary non-uniform distribution, resulting in very low area utilisation and high speeds. Customisation of the distribution is fully automatic, requiring less than a second of CPU time to approximate a new distribution, and typically around 1000 cycles to switch distributions at run-time. Comparison with Gaussian-specific generators shows that the new architecture uses less than half the resources, provides a higher sample rate and retains statistical quality for up to 50 billion samples, but can also generate other distributions. When higher statistical quality is required and multiple samples are required per cycle, a two-level piecewise generator can be used, reducing the RAM required per generated sample while retaining the simplicity and speed of the basic technique.
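A software sketch of the piecewise-linear idea: the high bits of a uniform sample index a table of linear segments of the target distribution's inverse CDF, and the low bits interpolate within the segment. The table layout here is a hypothetical simplification of the hardware datapath, but it shows why switching distributions only requires regenerating the table:

```python
import math

def make_table(inv_cdf, segments=256):
    """Tabulate linear segments of the target inverse CDF; regenerating
    this table is all that is needed to switch distributions."""
    xs = [i / segments for i in range(segments + 1)]
    return [(inv_cdf(xs[i]), inv_cdf(xs[i + 1]) - inv_cdf(xs[i]))
            for i in range(segments)]

def sample(table, u):
    """Map one uniform sample u in [0, 1) to the target distribution."""
    idx = int(u * len(table))        # table lookup via the high bits
    base, span = table[idx]
    frac = u * len(table) - idx      # low bits interpolate in-segment
    return base + span * frac

# Example: approximate an exponential distribution (clipped to avoid log(0))
exp_table = make_table(lambda u: -math.log(1 - 0.999 * u))
```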

45 citations


Journal ArticleDOI
TL;DR: Key features such as assertion threading, activity monitors, assertion and cover counters and completion mode assertions are presented to provide better and more diversified ways to achieve visibility within the assertion circuits, which, in turn, lead to more efficient circuit debugging.
Abstract: Although assertions are a great tool for aiding debugging in the design and implementation verification stages, their use in silicon debug has been limited so far. A set of techniques for debugging with the assertions in either pre-silicon or post-silicon scenarios are discussed. Presented are features such as assertion threading, activity monitors, assertion and cover counters and completion mode assertions. The common goal of these checker enhancements is to provide better and more diversified ways to achieve visibility within the assertion circuits, which, in turn, lead to more efficient circuit debugging. Experimental results show that such modifications can be done with modest checker hardware overhead.

32 citations


Journal ArticleDOI
TL;DR: A field programmable gate array (FPGA)-based implementation of a physical random number generator (PRNG) that can be implemented completely in digital technology, requires no external components, is very small in area, achieves very high throughput and has good statistical properties.
Abstract: A field programmable gate array (FPGA)-based implementation of a physical random number generator (PRNG) is presented. The PRNG uses an alternating step generator construction to decorrelate an oscillator-phase-noise-based physical random source. The resulting design can be implemented completely in digital technology, requires no external components, is very small in area, achieves very high throughput and has good statistical properties. The PRNG was implemented on an FPGA device and tested using the NIST, Diehard and TestU01 random number test suites.
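The classic alternating step generator construction can be sketched in software; here ordinary LFSRs stand in for the oscillator-phase-noise source the paper decorrelates:

```python
class LFSR:
    """Fibonacci LFSR; `taps` are the bit positions XORed into the feedback."""
    def __init__(self, taps, state):
        self.taps, self.state = taps, state
        self.width = max(taps) + 1
    def step(self):
        fb = 0
        for t in self.taps:
            fb ^= (self.state >> t) & 1
        out = self.state & 1
        self.state = (self.state >> 1) | (fb << (self.width - 1))
        return out
    def output(self):
        return self.state & 1

def asg_bit(ctrl, gen_a, gen_b):
    """Alternating step generator: the control LFSR selects which of the
    two generator LFSRs advances; the output XORs their output bits."""
    if ctrl.step():
        gen_a.step()
    else:
        gen_b.step()
    return gen_a.output() ^ gen_b.output()

ctrl  = LFSR([0, 1], 0b10)       # 2-bit control register
gen_a = LFSR([0, 2], 0b101)      # 3-bit generator
gen_b = LFSR([0, 3], 0b1001)     # 4-bit generator
bits = [asg_bit(ctrl, gen_a, gen_b) for _ in range(16)]
```

The irregular clocking is what decorrelates the source: an observer of the output cannot tell which generator advanced on a given step.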

31 citations


Journal ArticleDOI
TL;DR: The results of the analysis indicate that TDIM is the most efficient of the three circuits analysed; this method has been incorporated in a high-resolution time measurement system in the sub-picosecond range and has subsequently been fabricated by Sun Microsystems.
Abstract: An increasingly important issue in the implementation of high-performance circuits using either System-on-Chip or System-in-Package technology is ensuring correct timing performance at the input/output interfaces of cores or chips. These interfaces are not accessible to conventional Automatic Test Equipment (ATE). Moreover, even if these nodes were accessible, the ATE's limited measurement accuracy would necessitate tight guard bands, adversely impacting yield. To address this issue of internal time parameter measurement, the circuitry normally resident in the ATE to perform the measurements is incorporated into the design itself. This paper is a case study of three time measurement techniques potentially suitable for circuit integration, namely the Time Difference Measurement (TDM), Successive Approximation Time Measurement (SATM) and Time Delay Interpolation Measurement (TDIM) methods. The techniques are analysed and compared for a number of design parameters, such as area overhead, ease of calibration, timing resolution, and robustness to process, temperature and supply voltage variations. The results of the analysis indicate that TDIM is the most efficient of the three circuits analysed; this method has been incorporated in a high-resolution time measurement system in the sub-picosecond range and has subsequently been fabricated by Sun Microsystems.

30 citations


Journal ArticleDOI
TL;DR: An efficient design methodology and a systematic approach for the implementation of multiplication and squaring functions for unsigned large integers, using small-size embedded multipliers are presented and a set of equations is derived to aid in the realisation.
Abstract: An efficient design methodology and a systematic approach for the implementation of multiplication and squaring functions for unsigned large integers, using small-size embedded multipliers are presented. A general architecture of the multiplier and squarer is proposed and a set of equations is derived to aid in the realisation. The inputs of the multiplier and squarer are split into several segments leading to an efficient utilisation of the small-size embedded multipliers and a reduced number of required addition operations. Various benchmarks were tested for different segments ranging from 2 to 5 targeting Xilinx Spartan-3 FPGAs. The synthesis was performed with the aid of the Xilinx ISE 7.1 XST tool. The approach was compared with the traditional technique using the same tool. The results illustrate that the design approach is very efficient in terms of both timing and area savings. Combinational delay is reduced by an average of 7.71% for the multiplier and 21.73% for the squarer. In terms of 4-input look-up tables, area is lowered by an average of 11.63% for the multiplier and 52.22% for the squarer. In the case of the multiplier, both approaches use the same number of embedded multipliers. For the squarer, the proposed approach reduces the number of required embedded multipliers by an average of 32.77% compared with the traditional technique.
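The segmentation idea can be sketched behaviourally. The 17-bit unsigned segment width below is my assumption (an 18×18 signed embedded multiplier handles 17 unsigned bits per operand); the paper's exact segment equations are not reproduced:

```python
def segmented_multiply(a, b, seg_bits=17, segments=3):
    """Multiply large unsigned integers using only small seg_bits-wide
    multiplications, modelling embedded-multiplier usage."""
    mask = (1 << seg_bits) - 1
    A = [(a >> (i * seg_bits)) & mask for i in range(segments)]
    B = [(b >> (i * seg_bits)) & mask for i in range(segments)]
    acc = 0
    for i in range(segments):
        for j in range(segments):
            acc += (A[i] * B[j]) << ((i + j) * seg_bits)  # small multiply, shift, add
    return acc
```

For squaring, A[i]*A[j] equals A[j]*A[i], so nearly half of the small multiplications can be shared, which is the intuition behind the squarer's large embedded-multiplier savings.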

24 citations


Journal ArticleDOI
TL;DR: A double rotation CORDIC algorithm with an efficient strategy to predict the rotation direction is proposed for a high-speed sine and cosine generator and complex multiplier and results show that the computation time can be improved and the overall power consumption reduced.
Abstract: Coordinate rotation digital computer (CORDIC) is a well-known algorithm using simple adders and shifters to evaluate various elementary functions. A double rotation CORDIC algorithm with an efficient strategy to predict the rotation direction is proposed for a high-speed sine and cosine generator and complex multiplier. Simulation results show that the computation time can be improved by 37.2%, 42.67% and 46.04% for 16-bit, 32-bit and 64-bit operands, respectively. In addition, the overall power consumption per CORDIC arithmetic computation can be improved by 21.2% and 38.5% for 32-bit and 64-bit operands, respectively. Thus, the proposed double rotation CORDIC processor is suitable for high-speed applications.
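For reference, the conventional single-rotation CORDIC in rotation mode is sketched below; the paper's double-rotation variant with direction prediction modifies this iteration but evaluates the same functions:

```python
import math

def cordic_sin_cos(theta, iterations=32):
    """Conventional CORDIC in rotation mode: only shifts (2**-i scalings),
    adds and a small angle table; valid for |theta| < ~1.74 rad."""
    # Pre-scale by the constant CORDIC gain so (x, y) converges to (cos, sin).
    K = 1.0
    for i in range(iterations):
        K /= math.sqrt(1 + 2.0 ** (-2 * i))
    x, y, z = K, 0.0, theta
    for i in range(iterations):
        d = 1 if z >= 0 else -1                 # rotation direction
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)           # angle table entry
    return y, x                                 # (sin, cos)
```

The direction `d` depends on the running residual `z`, which serialises the iterations; predicting the directions in advance is what enables the speed-ups reported above.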

Journal Article
TL;DR: This paper presents an extension of a DLBIST scheme to transition fault testing, targeting the transition fault model, which is widely used for complexity reasons; functional justification is used to generate the required pattern pairs.
Abstract: BIST is an attractive approach to detect delay faults due to its inherent support for at-speed test. Deterministic logic BIST (DLBIST) is a technique which was successfully applied to stuck-at fault testing. As delay faults have lower random pattern testability than stuck-at faults, the need for DLBIST schemes is increased. Nevertheless, an extension to delay fault testing is not trivial, since this necessitates the application of pattern pairs. Consequently, delay fault testing is expected to require a larger mapping effort and logic overhead than stuck-at fault testing. In this paper, we consider the so-called transition fault model, which is widely used for complexity reasons. We present an extension of a DLBIST scheme for transition fault testing. Functional justification is used to generate the required pattern pairs. The efficiency of the extended scheme is investigated by using industrial benchmark circuits.

Journal ArticleDOI
TL;DR: This work presents two high-resolution time measurement schemes for digital Built-in Self-Test (BIST) applications, namely: Two-Delay Interpolation Method and the Time Amplifier.
Abstract: The rapid pace of change in IC technology, specifically in the speed of operation, demands sophisticated design solutions for IC testing methodologies. Moreover, the current technology of System-on-Chip makes great demands on the accurate testing of internal timing parameters as access to internal nodes through input/output pins becomes more difficult. This work presents two high-resolution time measurement schemes for digital Built-in Self-Test (BIST) applications, namely the Two-Delay Interpolation Method and the Time Amplifier. The two schemes are subsequently combined to produce a novel design for BIST time measurement which offers two main advantages: a small time interval measurement capability which advances the state of the art, and a small footprint, occupying 0.2 mm² or the equivalent of 3020 transistors, compared with a recent design which has the equivalent of 4800 transistors.

Journal ArticleDOI
TL;DR: Major modifications to the PIC scheme to improve its update performance are presented and the new coding scheme is called PIC with segmented domain, which can be implemented with embedded SRAM rather than with TCAM.
Abstract: Filter encoding can effectively enhance the efficiency of ternary content addressable memory (TCAM)-based packet classification. It can minimise the range expansion problem, reduce the TCAM space requirement and improve the lookup rate for IPv6. However, additional complexity is inevitably incurred in the filter table update operations. Although the average update cost of the prefix inclusion coding (PIC) scheme is very low, the worst-case update cost can be significantly higher. Major modifications to the PIC scheme to improve its update performance are presented. The new coding scheme is called PIC with segmented domain. By dividing the field value domain into multiple segments, the mapping of field values to code points can be more structural and help avoid massive code-point relocation in the event of new insertions. Moreover, the simplified codeword lookup for the address fields can be implemented with embedded SRAM rather than with TCAM. Consequently, the lookup rate of the search engine can be improved to handle the OC-768 line rate.
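The range expansion problem mentioned above arises because a range rule (e.g. a port range) must be split into ternary prefixes before it fits a TCAM. A standard range-to-prefix split, shown to illustrate the problem PIC mitigates (this is not the PIC encoding itself), looks like:

```python
def range_to_prefixes(lo, hi, width=16):
    """Expand the integer range [lo, hi] into a minimal set of
    (value, prefix_len) ternary prefixes covering it."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo that still fits in [lo, hi].
        size = lo & -lo if lo else 1 << width
        while size > hi - lo + 1:
            size >>= 1
        plen = width - size.bit_length() + 1    # fixed bits in the prefix
        prefixes.append((lo, plen))
        lo += size
    return prefixes

# Classic worst case: [1, 14] on a 4-bit field needs 6 TCAM entries.
```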

Journal ArticleDOI
TL;DR: A number of real diagnosis results from the wafer testing data including both stuck-open faults and intra-gate bridging faults have confirmed the effectiveness of this new method.
Abstract: A comprehensive solution to the intra-gate diagnosis problem, including intra-gate bridging and stuck-open faults is provided. The work is based on a local transformation technique that allows transistor-level faults to be diagnosed by the commonly available gate-level fault diagnosis tools without having to deal with the complexity of a transistor-level description of the whole circuit. Three transformations are described: one for stuck-open faults, one for intra-gate resistive-open faults and one for intra-gate bridging faults. Experimental work has been conducted at NXP Semiconductors using the NXP diagnosis tool – FALOC. A number of real diagnosis results from the wafer testing data including both stuck-open faults and intra-gate bridging faults have confirmed the effectiveness of this new method.

Journal ArticleDOI
TL;DR: A comparison of clock pausing, clock stretching, and data-driven clocking schemes and how they can be applied to an existing partitioned synchronous architecture to obtain a reliable, low-latency and efficient clock control architecture is presented.
Abstract: Because of the increasing complexity of distributing a global clock over a single die, globally asynchronous and locally synchronous systems are becoming an efficient alternative technique for designing distributed systems-on-chip (SoC). A number of independently clocked synchronous domains can be integrated by clock pausing, clock stretching or data-driven clocking techniques. Such techniques are applied on point-to-point inter-domain communication schemes. Presented here is a comparison of these schemes and how they can be applied to an existing partitioned synchronous architecture to obtain a reliable, low-latency and efficient clock control architecture. The comparison highlights the advantages and disadvantages of one scheme over the other in terms of logical correctness, circuit implementation, performance and relative power consumption. Also presented are circuit solutions for stretchable and data-driven clocking schemes. These circuit solutions can be easily plugged into existing partitioned synchronous islands. To enable early evaluation of functional correctness, also proposed is the use of Petri net modelling techniques to model the asynchronous control blocks that constitute the interface between the synchronous islands.

Journal ArticleDOI
TL;DR: In this article, a multi-layer sequence pair-representation-based floorplanner is proposed to find high-quality floorplans for applications that use partial reconfiguration.
Abstract: Partial dynamic reconfiguration is an emerging area in field programmable gate arrays (FPGA) designs, which is used for saving device area and cost. In order to reduce the reconfiguration overhead, two consecutive similar sub-designs should be placed in the same locations to get the maximum reuse of common components. This requires that all the future designs be considered while floorplanning for any given design. A comprehensive framework for floorplanning designs on partial reconfigurable architecture is provided. Several reconfiguration-specific floorplanning cost functions and moves that aim to reduce the reconfiguration overhead are introduced. A new multi-layer sequence pair-representation-based floorplanner that allows overlap of static and non-static components of multiple designs and guarantees a feasible overlapping floorplan with minimal area packing is introduced. A new matching algorithm that covers all possible matchings of static blocks during floorplanning for multiple designs is presented. In our experiments, it is shown that the proposed floorplanner gives more than 50% savings in reconfiguration frames compared with the scheme where no reuse is done. Further, compared with a traditional sequential floorplanner, our floorplanner removes infeasibility in many designs, achieves an improvement of clock period by 12% on average and reduces the place and route time significantly. The proposed floorplanner could be used for finding high-quality floorplans for applications that use partial reconfiguration.

Journal ArticleDOI
TL;DR: A Pareto-based approach is proposed combining a design-time application and platform exploration with a low-complexity run-time manager to avoid conservative worst-case assumptions and eliminate large run- time overheads on the state-of-the-art RTOS kernels.
Abstract: In a Multi-Processor System-on-Chip (MP-SoC) environment, a customized run-time management layer should be incorporated on top of the basic Operating System services to alleviate the run-time decision-making and to globally optimise costs (e.g. energy consumption) across all active applications, according to application constraints (e.g. performance, user requirements) and available platform resources. To that end, to avoid conservative worst-case assumptions, while also eliminating large run-time overheads on state-of-the-art RTOS kernels, a Pareto-based approach is proposed combining a design-time application and platform exploration with a low-complexity run-time manager. The design-time exploration phase of this approach is the main contribution of this work. It is also substantiated with two real-life multimedia applications (image processing and a video codec). These are simulated on an MP-SoC platform simulator and used to illustrate the optimal trade-offs offered by the design-time exploration to the run-time manager.

Journal ArticleDOI
TL;DR: The results show that both schemes do not degrade network performance in terms of average packet latency and throughput if the flit injection rate is slower than 0.57 flit/cycle/node.
Abstract: Reducing the design complexity of switches is essential for cost reduction and power saving in on-chip networks. In wormhole-switched networks, packets are split into flits, which are then admitted into and delivered in the network. When reaching their destinations, flits are ejected from the network. Since flit admission, flit delivery and flit ejection interfere with each other directly and indirectly, techniques for admitting and ejecting flits exert a significant impact on network performance and switch cost. Different flit-admission and flit-ejection micro-architectures are investigated. In particular, for flit admission, a novel coupling scheme which binds a flit-admission queue with a physical channel (PC) is presented. This scheme simplifies the switch crossbar from 2p×p to (p+1)×p, where p is the number of PCs per switch. For flit ejection, a p-sink model that uses only p flit sinks to eject flits is proposed. In contrast to an ideal ejection model, which requires p·v flit sinks (v is the number of virtual channels per PC), the buffering cost of flit sinks becomes independent of v. The proposed flit-admission and flit-ejection schemes are evaluated with both uniform and locality traffic in a 2D 4×4 mesh network. The results show that both schemes do not degrade network performance in terms of average packet latency and throughput if the flit injection rate is slower than 0.57 flit/cycle/node.

Journal ArticleDOI
TL;DR: This paper proposes an irredundant bus encoding scheme for on-chip buses to tackle the thermal issue and results show that the encoding scheme is very efficient to reduce the on-chip bus temperature rise over substrate temperature, with much less overhead compared to other low power encoding schemes.
Abstract: As technology scales, increasing clock rates, decreasing interconnect pitch and the introduction of low-k dielectrics have made self-heating of the global interconnects an important issue in VLSI design. Further, high bus temperatures have had a negative impact on the delay and reliability of on-chip interconnects. Energy and thermal models are used to characterise the effects of self-heating on the temperature of on-chip interconnects. The results obtained show that self-heating of on-chip buses contribute significantly to the temperature of the bus, which increases as technology scales, motivating the need to find solutions to mitigate this effect. The theoretical analysis performed shows that spreading switching activities among all bus lines can effectively reduce the peak temperature of the on-chip bus. Based on this observation, a thermal spreading encoding scheme for on-chip buses is proposed to tackle the thermal issue. The results obtained show that this approach is very effective in reducing the transient peak temperature among bus lines, with much less overhead compared with other low-power encoding schemes. This technique can then be combined with low-power encoding schemes to further reduce the on-chip bus temperature.

Journal ArticleDOI
TL;DR: A tool called BITLINKER, which creates partially reconfigurable modules from the bit-streams of individual components, is described; it is also capable of performing restricted component placement and interconnect routing between the assembled components.
Abstract: A tool called BITLINKER, which creates partially reconfigurable modules from the bit-streams of individual components, is described. It is also capable of performing restricted component placement and interconnect routing between the assembled components. The resulting modules are used in applications that exploit partial dynamic reconfiguration. The tool is integrated in a design flow particularly aimed at dynamically reconfigurable platform field-programmable gate arrays (FPGAs). The associated development design flow and a run-time support system that can be used to manage module activation and data communication are described. Evaluation results obtained with a Virtex-II Pro system are also reported.

Journal ArticleDOI
Myung-Hoon Yang1, Youbean Kim1, Young-Kyu Park1, Duk C. Lee1, Sungho Kang1 
TL;DR: Experimental results for the largest ISCAS'89 benchmark circuits show that the proposed scheme can reduce the switching activity by 50% with little hardware overhead compared with previous schemes.
Abstract: A new low-power testing methodology is proposed to reduce the excessive power dissipation associated with scan-based designs during deterministic test pattern generation by linear feedback shift registers (LFSRs) in built-in self-test. This new method utilises two split LFSRs to reduce the amount of switching activity. The original test cubes are partitioned into zero-set and one-set cubes according to the specified bits in the test cubes, and the split LFSR generates a zero-set or one-set cube for the given test cube. In cases where the current scan-shift value is a don't-care bit with respect to the output values of the LFSRs, the last value shifted into the scan chain is shifted in again, so that no transition is produced. Experimental results for the largest ISCAS'89 benchmark circuits show that the proposed scheme can reduce the switching activity by 50% with little hardware overhead compared with previous schemes.
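The transition-suppression idea for don't-care bits can be sketched independently of the split-LFSR machinery (a simplified behavioural model, not the paper's hardware):

```python
def fill_dont_cares_low_power(test_cube):
    """Fill 'X' (don't-care) positions of a test cube by repeating the
    previously shifted value, so the filled bit causes no transition
    at the scan input."""
    filled, last = [], '0'
    for bit in test_cube:
        if bit == 'X':
            bit = last          # repeat-fill: zero transitions on this shift
        filled.append(bit)
        last = bit
    return ''.join(filled)

def transitions(bits):
    """Count scan-input transitions, a proxy for shift power."""
    return sum(a != b for a, b in zip(bits, bits[1:]))
```

Repeat-fill minimises transitions among all fills of the don't-care bits, which is the effect the split-LFSR scheme achieves in hardware.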

Journal ArticleDOI
TL;DR: A low-power signed pipelined truncated multiplier is proposed that can dynamically detect multiple combinations of input ranges and deactivate a large amount of the unnecessary transitions in non-effective ranges to reduce the power consumption.
Abstract: An energy-efficient multiplier is very desirable for multimedia and digital signal processing systems. In many of these systems, the effective dynamic range of input operands for multipliers is generally limited to a small range and the case with maximum range seldom occurs. In addition, the output products of multipliers are usually rounded or truncated to avoid growth in word size. Based on these features, a low-power signed pipelined truncated multiplier is proposed that can dynamically detect multiple combinations of input ranges and deactivate a large amount of the unnecessary transitions in non-effective ranges to reduce the power consumption. Moreover, the proposed multiplier can trade output precision against power consumption so as to further reduce power consumption. Experimental results show that the proposed multiplier consumes up to 90% less power than the conventional standard multiplier while still maintaining an acceptable output precision and quality.

Journal ArticleDOI
TL;DR: A feasibility study of accelerating fault simulation by emulation on FPGA is described, showing that it is beneficial to use emulation for circuits/methods that require large numbers of test vectors, e.g., sequential circuits and/or genetic algorithms.
Abstract: A feasibility study of accelerating fault simulation by emulation on field programmable gate arrays (FPGAs) is described. Fault simulation is an important subtask in test pattern generation and it is frequently used throughout the test generation process. The problems associated with fault simulation of sequential circuits are explained. Alternatives that can be considered as trade-offs in terms of the required FPGA resources and accuracy of test quality assessment are discussed. In addition, an extension to the existing environment for re-configurable hardware emulation of fault simulation is presented. It incorporates hardware support for fault dropping. The proposed approach allows simulation speed-up of 40–500 times as compared to the state-of-the-art in software-based fault simulation. On the basis of the experiments, it can be concluded that it is beneficial to use emulation for circuits/methods that require large numbers of test vectors while using simple but flexible algorithmic test vector generating circuits, for example built-in self-test.
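Fault dropping, the optimisation given hardware support here, can be shown on a toy gate-level stuck-at simulator (a hypothetical miniature model, not the emulation environment itself):

```python
def simulate(netlist, inputs, fault=None):
    """Evaluate a topologically ordered two-input netlist; optionally
    force one net to a stuck-at value."""
    values = dict(inputs)
    for net, op, ins in netlist:
        a, b = values[ins[0]], values[ins[1]]
        values[net] = {'AND': a & b, 'OR': a | b, 'XOR': a ^ b}[op]
        if fault is not None and fault[0] == net:
            values[net] = fault[1]              # inject stuck-at fault
    return values[netlist[-1][0]]               # circuit output

def fault_simulate(netlist, vectors, faults):
    """Detect faults across a vector set, dropping each fault from
    further simulation as soon as one vector detects it."""
    remaining, detected = list(faults), []
    for vec in vectors:
        good = simulate(netlist, vec)
        for flt in remaining[:]:
            if simulate(netlist, vec, flt) != good:
                detected.append(flt)
                remaining.remove(flt)           # fault dropping
    return detected

# f = (a AND b) OR c
netlist = [('g', 'AND', ('a', 'b')), ('f', 'OR', ('g', 'c'))]
vectors = [{'a': 1, 'b': 1, 'c': 0}, {'a': 0, 'b': 0, 'c': 0}]
faults = [('g', 0), ('f', 1)]                   # g stuck-at-0, f stuck-at-1
```

Dropping shrinks the fault list as simulation proceeds, which is why hardware support for it pays off when test sets are large.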

Journal ArticleDOI
TL;DR: The first in-depth study on applying statistical timing analysis with cross-chip and on-chip variations to speed-binning and guard-banding in FPGAs has been presented and the effects of timing-model with guard-banding and speed-binning on statistical performance and timing yield are quantified.
Abstract: Process variations affecting timing and power are an important issue for modern integrated circuits in nanometre technologies. Field programmable gate arrays (FPGAs) are similar to application-specific integrated circuits (ASICs) in their susceptibility to these issues, but face the unique challenge that critical paths are unknown at test time. The first in-depth study on applying statistical timing analysis with cross-chip and on-chip variations to speed-binning and guard-banding in FPGAs is presented. Considering the uniqueness of re-programmability in FPGAs, the effects of timing models with guard-banding and speed-binning on statistical performance and timing yield are quantified. A new variation-aware statistical placement, the first statistical algorithm for FPGA layout, has also been developed; for Microelectronics Center of North Carolina (MCNC) and Quartus University Interface Program (QUIP) designs it achieves a yield loss of 29.7% of the original yield loss with guard-banding and 4% of the original with speed-binning.

Journal ArticleDOI
TL;DR: By simulations, it is shown that the performance of the proposed method is close to that of the true rounding method and much better than those of other methods.
Abstract: This study presents a design method for a fixed-width two's complement squarer that receives an n-bit input and produces an n-bit squared product. To efficiently compensate for the truncation error, modified Booth-folding encoder signals are used to generate the error compensation bias. The truncated bits are divided into two groups depending on their effect on the truncation error, and a different error compensation method is applied to each group. Simulations show that the performance of the proposed method is close to that of the true rounding method and much better than those of other methods. The proposed fixed-width two's complement squarers also achieve about 34% reduction in area, 35% reduction in power consumption and 10% improvement in speed compared with conventional squarers.
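A toy model of the fixed-width trade-off can be written down directly. The sketch below truncates the final 2n-bit square rather than the individual partial products, so here a constant bias recovers exact rounding; in the paper's hardware the low-order partial products are never generated at all, and the bias derived from the Booth-folding encoder signals can only approximate rounding. All names are illustrative:

```python
# Toy model of fixed-width squaring: keep only the top n bits of the
# 2n-bit square. In this model the whole product is available, so a
# bias of one half-ulp (bias=1 below) makes truncation equal rounding;
# real fixed-width hardware never forms the low partial products and
# must approximate this bias.

def true_round_square(x, n):
    """Ideal n-bit result: round the 2n-bit square to n bits."""
    return (x * x + (1 << (n - 1))) >> n

def truncated_square(x, n, bias=0):
    """Fixed-width result: drop the low n bits, optionally adding a
    compensation bias (in half-ulp units) before truncating."""
    return (x * x + (bias << (n - 1))) >> n

def mean_abs_error(n, bias):
    """Average error of the fixed-width result over all n-bit inputs."""
    errs = [abs(truncated_square(x, n, bias) - true_round_square(x, n))
            for x in range(1 << n)]
    return sum(errs) / len(errs)
```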

Journal ArticleDOI
TL;DR: A robust new paradigm for diagnosing hold-time violation at scan chains with the ability to tolerate non-ideal conditions is presented and two algorithms including a greedy algorithm and a so-called best-alignment based algorithm are proposed.
Abstract: Hold-time violation is a common cause of failure at scan chains. A robust new paradigm for diagnosing such failures is presented. Compared to previous methods, its main advantage is the ability to tolerate non-ideal conditions, for example in the presence of certain core-logic faults or faults that manifest themselves intermittently. The diagnosis problem is first formulated as a 'delay insertion process'. Upon this formulation, two algorithms, a 'greedy' algorithm and a so-called 'best-alignment-based' algorithm, are proposed. Experimental results on a number of practical designs and ISCAS'89 benchmark circuits demonstrate the paradigm's effectiveness.
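A drastically simplified software model of the idea: assume a hold violation at cell k makes the shifted-out bits from position k on appear one place early, predict the corrupted stream for every candidate k, and greedily pick the candidate that best aligns with what was actually observed. The one-bit-skip fault model and all names are invented for illustration, not the paper's exact formulation:

```python
# Toy scan-chain hold-time diagnosis by trying a "delay insertion" at
# each cell and scoring the alignment with the observed streams.

def apply_hold_fault(bits, k):
    """Illustrative fault model: a hold violation at cell k lets data
    race through it, so bits from position k on shift one place early."""
    return bits[:k] + bits[k + 1:] + [bits[-1]]

def diagnose(loaded_patterns, observed_streams):
    """Greedy: return the fault position whose predicted corruption
    matches the observed streams on the most bits."""
    n = len(loaded_patterns[0])
    def score(k):
        return sum(p == o
                   for pat, obs in zip(loaded_patterns, observed_streams)
                   for p, o in zip(apply_hold_fault(pat, k), obs))
    return max(range(n), key=score)
```

Because the decision is a best-match score rather than an exact equality, the same scheme keeps working when a few observed bits are corrupted by intermittent behaviour.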

Journal ArticleDOI
TL;DR: The study included remodeling the entire hardware architecture removing the shifter from the scalable computing part and embedding it in the non-scalable memory unit instead, which resulted in a speedup to the complete inversion process with an area increase due to the new memory shifting unit.
Abstract: Modular inversion is a fundamental operation in several cryptographic systems. It can be computed in software or hardware, but hardware computation has proven to be faster and more secure. This research focused on improving a scalable inversion hardware architecture proposed in 2004 for the finite field GF(p). The architecture comprises two parts, a computing unit and a memory unit. The memory unit holds all the data bits of the computation, whereas the computing unit performs all the arithmetic operations on a word-by-word (digit-by-digit) basis, making the design scalable. The main objective of this paper is to show the cost and benefit of modifying the memory unit to include shifting, previously one of the tasks of the scalable computing unit. The study included remodelling the entire hardware architecture, removing the shifter from the scalable computing part and embedding it in the non-scalable memory unit instead. This modification speeds up the complete inversion process at the cost of an area increase due to the new memory shifting unit. Several design schemes are compared, giving the user the complete picture from which to choose depending on the application's needs.
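For reference, the class of algorithm such hardware implements can be modelled in software by the binary (shift-based) extended Euclidean inversion over GF(p), which uses only the shifts, additions and subtractions the architecture provides; this is a generic sketch, not the paper's word-serial datapath:

```python
# Binary (shift-based) modular inversion over GF(p), p an odd prime.
# Invariants: x1*a = u (mod p) and x2*a = v (mod p) hold throughout,
# so whichever of u, v reaches 1 yields the inverse.

def mod_inverse(a, p):
    """Return a^-1 mod p using only shifts, additions and subtractions."""
    u, v = a % p, p
    x1, x2 = 1, 0
    while u != 1 and v != 1:
        while u % 2 == 0:                      # halve u, keep invariant
            u >>= 1
            x1 = (x1 >> 1) if x1 % 2 == 0 else ((x1 + p) >> 1)
        while v % 2 == 0:                      # halve v, keep invariant
            v >>= 1
            x2 = (x2 >> 1) if x2 % 2 == 0 else ((x2 + p) >> 1)
        if u >= v:                             # subtract the smaller
            u, x1 = u - v, x1 - x2
        else:
            v, x2 = v - u, x2 - x1
    return x1 % p if u == 1 else x2 % p
```

Every step is a shift, add or subtract over the full operand width, which is exactly what a scalable design processes word by word; moving the shift into the memory unit removes it from that word-serial critical loop.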

Journal ArticleDOI
TL;DR: An efficient roll-forward checkpointing/recovery scheme for distributed systems has been presented and offers the main advantages of both the synchronous and the asynchronous approaches, that is simple recovery and simple way to create checkpoints.
Abstract: An efficient roll-forward checkpointing/recovery scheme for distributed systems is presented, improving on our earlier work. The use of forced checkpoints helps in designing a single-phase non-blocking algorithm to find consistent global checkpoints. It offers the main advantages of both the synchronous and the asynchronous approaches, that is, simple recovery and a simple way to create checkpoints. The algorithm produces a reduced number of checkpoints. Since each process independently decides whether to take a forced checkpoint, the algorithm is simple, fast and efficient. The proposed scheme performs better than some noted existing works, and the advantages stated above also ensure that it can work efficiently in mobile computing environments.
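The generic communication-induced flavour of a forced-checkpoint rule can be sketched as follows: each message piggybacks the sender's checkpoint-interval number, and a receiver checkpoints before delivery if the message comes from a later interval than its own. This is an illustrative simplification of the idea, not the paper's algorithm:

```python
# Toy communication-induced forced-checkpoint rule. Each process tracks
# its checkpoint interval; a message carrying a larger interval number
# forces a checkpoint before delivery, with no coordination messages.

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.interval = 0          # current checkpoint-interval number
        self.forced = 0            # count of forced checkpoints taken

    def take_checkpoint(self, forced=False):
        self.interval += 1
        if forced:
            self.forced += 1

    def send(self):
        return self.interval       # piggybacked on every message

    def receive(self, piggyback):
        # Each process decides locally: single-phase and non-blocking.
        if piggyback > self.interval:
            self.take_checkpoint(forced=True)
```

Because the decision uses only locally held state plus the piggybacked number, no process ever blocks waiting for others, which is what makes the approach attractive for mobile environments.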

Journal ArticleDOI
TL;DR: Novel architectures for combined units that perform modulo 2^n - 1/diminished-1 modulo 2^n + 1 multiplication or sum-of-squares depending on the value of a control signal are proposed.
Abstract: Digital signal processing and multimedia applications often profit from the use of a residue number system. Among the most commonly used moduli in such systems are those of the 2^n - 1 and 2^n + 1 forms, and among the most commonly used operations are multiplication and sum-of-squares. These operations are currently performed using distinct design units and/or consecutive machine cycles. Novel architectures for combined units that perform modulo 2^n - 1/diminished-1 modulo 2^n + 1 multiplication or sum-of-squares, depending on the value of a control signal, are proposed.
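The modulo 2^n - 1 channel arithmetic mentioned above avoids division by end-around addition, since 2^n = 1 (mod 2^n - 1): the high half of the 2n-bit product is simply folded back onto the low half. A minimal software model of that reduction (illustrative, not the proposed architecture):

```python
# Modulo 2^n - 1 multiplication via end-around addition: fold the high
# half of the product back onto the low half instead of dividing.

def mul_mod_2n_minus_1(a, b, n):
    mask = (1 << n) - 1            # the modulus 2^n - 1 as a bit mask
    p = a * b
    while p > mask:
        p = (p & mask) + (p >> n)  # end-around: 2^n = 1 (mod 2^n - 1)
    return 0 if p == mask else p   # 0 and 2^n - 1 both represent zero
```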