
Showing papers in "IET Computers and Digital Techniques" in 2017


Journal ArticleDOI
TL;DR: The ECP proposed in this study over F_p performs better than available hardware in terms of area and timing, and the area–delay product of this design is very low compared with similar designs.
Abstract: This study presents a description of an efficient hardware implementation of an elliptic curve cryptography processor (ECP) for modern security applications. A high-performance elliptic curve scalar multiplication (ECSM), which is the key operation of an ECP, is developed in both affine and Jacobian coordinates over a prime field of size p using the National Institute of Standards and Technology standard. A novel combined point doubling and point addition architecture is proposed using efficient modular arithmetic to achieve high speed and low hardware utilisation of the ECP in Jacobian coordinates. This new architecture has been synthesised for both application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) targets. A 65 nm CMOS ASIC implementation of the proposed ECP in Jacobian coordinates takes 0.56 and 0.73 ms for 224-bit and 256-bit elliptic curve cryptography, respectively. The ECSM is also implemented in an FPGA and provides a better delay performance than previous designs. The implemented design is area-efficient, requiring few resources and no digital signal processing (DSP) slices on an FPGA. Moreover, the area–delay product of this design is very low compared with similar designs. To the best of the authors' knowledge, the ECP proposed in this study over F_p performs better than available hardware in terms of area and timing.
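At its core, an ECSM repeatedly applies point doubling and point addition. A minimal software sketch of left-to-right double-and-add on a toy short-Weierstrass curve in affine coordinates (illustrative parameters only, not the NIST curves or the combined Jacobian-coordinate architecture of the paper):

```python
# Minimal double-and-add elliptic curve scalar multiplication (affine coordinates).
# Toy curve y^2 = x^3 + a*x + b over F_p; parameters are illustrative, not NIST curves.
p, a, b = 97, 2, 3
INF = None  # point at infinity

def point_add(P, Q):
    if P is INF: return Q
    if Q is INF: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return INF
    if P == Q:                              # point doubling
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:                                   # point addition
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def scalar_mult(k, P):
    """Left-to-right double-and-add: one doubling per bit, one addition per set bit."""
    R = INF
    for bit in bin(k)[2:]:
        R = point_add(R, R)                 # double
        if bit == '1':
            R = point_add(R, P)             # add
    return R

print(scalar_mult(20, (3, 6)))  # 20*G on the toy curve
```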

52 citations


Journal ArticleDOI
TL;DR: This study proposes a multi-key ECC based on the residue number system that employs deep pipelining to allow the concurrent encryption of 21 keys; the results are compared with existing ECC architectures.
Abstract: Public-key cryptosystems such as elliptic curve cryptography (ECC) and Rivest–Shamir–Adleman (RSA) are widely used for data security in computing systems. ECC provides a high level of security with a much smaller key than RSA, which makes ECC a preferred choice in many applications. This study proposes a multi-key ECC based on the residue number system. The proposed architecture employs deep pipelining to allow the concurrent encryption of 21 keys. The proposed architectures are implemented on two different field-programmable gate array (FPGA) platforms and the results are compared with existing ECC architectures. The proposed implementation on a Virtex-7 FPGA achieves a throughput of 1816 kbps at a clock frequency of 73 MHz.

17 citations


Journal ArticleDOI
TL;DR: The authors introduce an algorithm to optimise the computing time of feature extraction methods for colour images, choosing the generalised Fourier descriptor (GFD) and generalised colour Fourier descriptor (GCFD) models.
Abstract: Optimising computing times of applications is an increasingly important task in many different areas such as scientific and industrial applications. The graphics processing unit (GPU) is considered one of the powerful engines for computationally demanding applications since it offers a highly parallel architecture. In this context, the authors introduce an algorithm to optimise the computing time of feature extraction methods for colour images. They choose the generalised Fourier descriptor (GFD) and generalised colour Fourier descriptor (GCFD) models as methods to extract image features for various applications, such as real-time colour object recognition or image retrieval. They compare the experimental computing times on the central processing unit and the GPU. They also present a case study of these descriptors using two platforms: an NVIDIA GeForce GT525M and an NVIDIA GeForce GTX480. The experimental results demonstrate that the execution time can be reduced considerably, by up to 34× for GFD and 56× for GCFD.
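As a rough CPU-side illustration, assuming the common definition of GFDs as integrals of the 2-D power spectrum over rings of constant radius (the paper's GPU kernels and the GCFD colour extension are not reproduced here):

```python
# Rough CPU sketch of generalised Fourier descriptors (GFD), assuming the common
# definition: integrate the 2-D power spectrum over rings of constant radius.
# The paper's GPU parallelisation and colour (GCFD) extension are not reproduced.
import numpy as np

def gfd(image, n_rings=32):
    spec = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2   # centred power spectrum
    h, w = spec.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)                       # radius of each frequency bin
    bins = np.minimum((r / r.max() * n_rings).astype(int), n_rings - 1)
    desc = np.bincount(bins.ravel(), weights=spec.ravel(), minlength=n_rings)
    return desc / desc[0]                                      # normalise by the DC ring

img = np.random.rand(64, 64)       # placeholder image
print(gfd(img)[:5])
```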

15 citations


Journal ArticleDOI
TL;DR: In this study, an area- and power-efficient iterative floating-point (FP) multiplier architecture is designed and implemented on FPGA devices with a pipelined architecture, achieving lower power consumption.
Abstract: In this study, an area- and power-efficient iterative floating-point (FP) multiplier architecture is designed and implemented on FPGA devices with a pipelined architecture. The proposed multiplier supports both single-precision (SP) and double-precision (DP) operations. The operation mode can be switched at run time by changing the precision selection signal. The Karatsuba algorithm is applied when mapping the mantissa multiplier in order to reduce the number of digital signal processing (DSP) blocks required. For DP operations, an iterative method is applied, which requires much less hardware than a fully pipelined DP multiplier and thus reduces the power consumption. To further reduce the power consumption, the logic blocks unused in a specific operation mode are disabled. Compared with previous work, the proposed multiplier achieves a 33% reduction in DSP blocks, uses 4.3% fewer look-up tables (LUTs) and 31.2% fewer flip-flops, and has a 4% faster clock frequency on Virtex-5 devices. Compared with the intellectual property core DP multiplier provided by the FPGA vendors, the proposed multiplier requires fewer DSP blocks and achieves lower power consumption. The mapping solutions and implementation results of the proposed multiplier on Xilinx Virtex-7 and Altera Arria-10 devices are also presented. In addition, the results of a direct implementation of the proposed architecture on an STM 90 nm ASIC platform are reported.
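Karatsuba computes a full-width product from three half-width products instead of four, which is what allows the mantissa multiplier to use fewer DSP blocks. A minimal one-level software sketch (illustrative only, not the paper's DSP-block mapping), using mantissa-width operands as an example:

```python
# One level of Karatsuba: a full-width product from three half-width products.
# Illustrative software sketch only; the paper maps this structure onto DSP blocks.
def karatsuba_1level(a, b, n):
    """Multiply two n-bit integers using three (n/2)-bit multiplications."""
    half = n // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    p_hi = a_hi * b_hi                                    # high half-product
    p_lo = a_lo * b_lo                                    # low half-product
    p_mid = (a_hi + a_lo) * (b_hi + b_lo) - p_hi - p_lo   # cross terms from one multiply
    return (p_hi << n) + (p_mid << half) + p_lo

# 53-bit double-precision mantissas, split as 54-bit operands for an even halving
a, b = 0x1FFFFFFFFFFFFF, 0x1ABCDEF0123456
assert karatsuba_1level(a, b, 54) == a * b
```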

15 citations


Journal ArticleDOI
TL;DR: This study proposes an effective strategy to ensure an energy consumption gain under time constraints, through a power-aware model based on dynamic voltage and frequency scaling and dynamic power management appropriate to WSNs, and on a global Earliest Deadline First scheduler.
Abstract: The sharp increase in wireless sensor network (WSN) performance has increased their power requirements. However, with a limited battery lifetime, it is more and more difficult to deploy many more sensors with today's solutions. Therefore, the authors need to implement autonomous WSNs without any human intervention or external power supply. To this end, this study proposes an effective strategy to ensure an energy consumption gain that takes time constraints into account, through a power-aware model based on dynamic voltage and frequency scaling and dynamic power management, which are appropriate to WSNs, and on a global Earliest Deadline First scheduler. To select the most suitable simulator for integrating and simulating the developed models, more than 25 existing WSN simulators are outlined and evaluated. On the basis of this comparative analysis, the authors chose the simulation tool for real-time multiprocessor scheduling (STORM) to validate their work, owing to its multiple advantages.
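The energy gain from DVFS comes from the convexity of the power-speed relation: running a fixed workload at a lower voltage and frequency costs more time but markedly less dynamic energy, provided deadlines still hold. A minimal sketch with assumed parameters (illustrative values, not the paper's STORM-based model):

```python
# Minimal DVFS energy estimate under the usual convex power model P_dyn ~ C * V^2 * f,
# with V assumed to scale roughly linearly with f. Parameters are illustrative
# assumptions, not the STORM model used in the paper.
def dynamic_energy(cycles, f, c_eff=1e-9, v_at_fmax=1.2, f_max=1e9):
    v = v_at_fmax * (f / f_max)          # assumed linear voltage/frequency relation
    power = c_eff * v * v * f            # dynamic power at this operating point
    time = cycles / f                    # execution time at frequency f
    return power * time, time

for f in (1e9, 0.5e9):                   # full speed vs. half speed
    e, t = dynamic_energy(cycles=2e8, f=f)
    print(f"f={f/1e6:.0f} MHz: time={t*1e3:.0f} ms, dynamic energy={e*1e3:.2f} mJ")
```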

13 citations


Journal ArticleDOI
TL;DR: This study predicts circuit timing for all test patterns using three machine learning techniques, neural network (NN), support vector regression (SVR), and least-square boosting (LSBoost), and proposes four feature extractions to reduce the huge dimension of raw data.
Abstract: Excessive power supply noise (PSN) such as IR drop can cause yield loss when testing very large scale integration chips. However, simulation of circuit timing with PSN is not an easy task. In this study, the authors predict circuit timing for all test patterns using three machine learning techniques: neural network (NN), support vector regression (SVR), and least-square boosting (LSBoost). To reduce the huge dimension of the raw data, they propose four feature extractions: input/output transition (IOT), flip-flop transition in window (FFTW), switching activity in window (SAW), and terminal FF transition of long paths (PATH). SAW and FFTW are physical-aware features, while PATH is a timing-aware feature. Their experimental results on the leon3mp benchmark circuit (638 K gates, 2 K test patterns) show that, compared with the simple IOT method, SAW effectively reduces the dimension by up to 472 times without significant impact on prediction accuracy (correlation coefficient = 0.79). The results show that NN has the best prediction accuracy and SVR has the least under-prediction, while LSBoost uses the least memory. The proposed method is more than six orders of magnitude faster than traditional circuit simulation tools.
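As a rough illustration of the regression setup, with scikit-learn models standing in for the paper's NN, SVR and LSBoost implementations (LSBoost is MATLAB's least-squares boosting) and random placeholder features instead of extracted SAW/FFTW/PATH data:

```python
# Sketch of the regression setup: predict a per-pattern circuit delay from extracted
# features. scikit-learn models are stand-ins for the paper's NN / SVR / LSBoost;
# the feature matrix here is a random placeholder, not real SAW data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((2000, 128))                                    # e.g. SAW features per pattern
y = X[:, :16].sum(axis=1) + 0.1 * rng.standard_normal(2000)    # surrogate delay target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
models = {
    "NN": MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0),
    "SVR": SVR(C=10.0),
    "LSBoost-like": GradientBoostingRegressor(random_state=0),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    r = np.corrcoef(m.predict(X_te), y_te)[0, 1]
    print(f"{name}: correlation coefficient = {r:.2f}")
```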

13 citations


Journal ArticleDOI
TL;DR: The results show that the proposed architecture for the cube operation, based on the Yavadunam sutra of Vedic mathematics, is useful for low-area and high-speed applications in a microprocessor environment.
Abstract: This study presents a generalised architecture for the cube operation based on the Yavadunam sutra of Vedic mathematics. This algorithm converts the cube of a large-magnitude number into a smaller-magnitude number and addition operations. The Vedic sutra for decimal numbers is extended to the binary radix-2 number system for digital platforms. The cubic architecture is synthesised and simulated using Xilinx ISE 14.1 software and implemented on various field-programmable gate array devices for comparison purposes. The Cadence Encounter® RTL Compiler RC13.10 (v13.10-s006_1) is also used for the application-specific integrated circuit platform. The performance parameters such as delay, area and power are obtained from the synthesis reports. The results show that the proposed architecture is useful for low-area and high-speed applications in a microprocessor environment.
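One common statement of the Yavadunam cube identity (assumed here to be the form the architecture builds on) is N³ = (N − 2d)·B² + 3d²·B − d³ for a number N near a base B with deficiency d = B − N; in radix-2 the base is a power of two, so the multiplications by B reduce to shifts. A quick numerical check:

```python
# Numerical check of the Yavadunam cube identity: for N near a base B with
# deficiency d = B - N,  N^3 = (N - 2d)*B^2 + 3*d^2*B - d^3.
# In binary (radix-2) hardware, B is a power of two, so the B multiplications are shifts.
def yavadunam_cube(n, base):
    d = base - n
    return (n - 2 * d) * base * base + 3 * d * d * base - d * d * d

for n, base in [(98, 100), (996, 1000), (0b11110, 1 << 5)]:
    assert yavadunam_cube(n, base) == n ** 3
    print(n, "**3 =", yavadunam_cube(n, base))
```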

12 citations


Journal ArticleDOI
TL;DR: Comparison of analytical results in terms of performance and costs for different network dimensions indicates that the proposed CBP-Mesh offers short latency, high throughput and good scalability at a small increase in power and energy.
Abstract: This study presents an efficient and scalable networks-on-chip (NoC) topology termed cross-by-pass-mesh (CBP-Mesh). The proposed architecture is derived from the traditional mesh topology by the addition of cross-by-pass links in the network. The design and impact of adding cross-by-pass links to the topology are analysed in detail with the help of synthetic, hotspot and embedded traffic traces. The advantages of the proposed CBP-Mesh over competing topologies include a reduction in network diameter, an increase in bisection bandwidth, a reduction in the average number of hops, and improved symmetry and regularity of the network. Synthetic traffic traces and some real embedded-system workloads are applied to the proposed CBP-Mesh and its competitor two-dimensional NoC topologies. The comparison of analytical results in terms of performance and costs for different network dimensions indicates that the proposed CBP-Mesh offers short latency, high throughput and good scalability at a small increase in power and energy.

11 citations


Journal ArticleDOI
TL;DR: A modified robust mixed-norm (MRMN) adaptive filter algorithm that is robust to impulsive noise, with a higher convergence rate and lower steady-state error (SSE); a significant improvement in SSE and convergence speed is obtained compared with existing adaptive filters for similar specifications.
Abstract: Adaptive filters are prevalent in many real-time signal processing applications. Many adaptive algorithms already exist, but most of them assume white Gaussian noise as the disturbance. However, in many applications, such as electrocardiogram and foetal heart rate measurement, low-frequency atmospheric noise, underwater acoustic noise and signal measurement in instrumentation, impulsive noise is more common. This study presents a modified robust mixed-norm (MRMN) adaptive filter algorithm that is robust to impulsive noise, with a higher convergence rate and lower steady-state error (SSE). The MRMN adaptive filter algorithm has been simulated using Matlab and the Xilinx System Generator high-level synthesis tool, and a significant improvement in SSE and convergence speed is obtained compared with existing adaptive filter algorithms for similar specifications. The proposed algorithm is also described in VHDL and synthesised using the Xilinx synthesis tool in order to implement it on a field-programmable gate array (FPGA). The post-place-and-route FPGA implementation results show nearly a 90% reduction in resource utilisation and a nearly 2.6 times improvement in clock frequency compared with the existing FPGA-based implementation for similar specifications.
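For orientation, the baseline robust mixed-norm update blends an LMS (L2) term with a sign-LMS (L1) term so that isolated impulses do not destabilise adaptation. A minimal sketch of that baseline follows; the paper's specific MRMN modification and its FPGA datapath are not reproduced here:

```python
# Sketch of the baseline robust mixed-norm (RMN) update, blending LMS (L2) and
# sign-LMS (L1) terms to tolerate impulsive noise:
#   w <- w + mu * (lam * e + (1 - lam) * sign(e)) * x
# The paper's specific MRMN modification and FPGA implementation are not reproduced.
import numpy as np

def rmn_filter(x, d, taps=8, mu=0.01, lam=0.7):
    w = np.zeros(taps)
    y_hat = np.zeros_like(d)
    for n in range(taps - 1, len(x)):
        xn = x[n - taps + 1:n + 1][::-1]      # current and past samples, newest first
        y_hat[n] = w @ xn
        e = d[n] - y_hat[n]
        w += mu * (lam * e + (1 - lam) * np.sign(e)) * xn
    return w, y_hat

rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
h = np.array([0.5, -0.3, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0])           # unknown system
d = np.convolve(x, h)[:len(x)] + 0.01 * rng.standard_normal(5000)  # desired signal
d += (rng.random(5000) < 0.01) * rng.standard_normal(5000) * 10    # impulsive noise
w, _ = rmn_filter(x, d)
print(np.round(w, 2))          # should approach h despite the impulses
```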

10 citations


Journal ArticleDOI
TL;DR: This work proposes a simple and intuitive dynamic encoding scheme that eliminates all TT and HT at hot locations, hence reducing energy consumption and improving MLC STT-MRAM lifetime.
Abstract: Shifting market trends towards mobile, Internet of things, and data-centric applications create opportunities for emerging low-power non-volatile memories. The attractive features of spin-torque-transfer magnetic RAM (STT-MRAM) make it a promising candidate for future on-chip cache memory. Two-bit multiple-level cell (MLC) STT-MRAMs suffer from higher write energy, performance overhead, and lower cell endurance when compared with their single-level counterpart. These unwanted effects are mainly due to write operations known as two-step transitions (TT) and hard transitions (HT). Here, the authors offer a solution to tackle the write energy problem in MLC STT-MRAM by minimising the number of TT and HT transitions. By analysing real applications, it was observed that specific locations within a cache block undergo many more TT and HT transitions, resulting in hot locations, when compared with the remaining (cold) locations. These hot locations are more detrimental to the lifetime and reliability of the MRAM device. In this work, the authors propose a simple and intuitive dynamic encoding scheme that eliminates all TT and HT at hot locations, hence reducing energy consumption and improving MLC STT-MRAM lifetime. Results on PARSEC benchmarks demonstrate the effectiveness and scalability of the proposed approach to potentially prolong MLC STT-MRAM lifetime.

9 citations


Journal ArticleDOI
TL;DR: The results indicate that the proposed scheduler improves deadline tardiness and provides hard real-time guarantees by combining cache and task partitioning with scheduling optimisations.
Abstract: A two-phase colour-aware real-time scheduler to reduce the contention caused by the cache coherence protocol due to accesses to shared cache partitions in a multicore processor is proposed. The first phase is a colour-aware task partitioning (CAP) algorithm that assigns tasks that share colours to a common processor whenever possible. The second phase is a dynamic colour-aware scheduler that detects cache coherence activities at run-time, preventing the execution of tasks that interfere with each other and thus reducing the contention caused by the cache coherence protocol. The authors compare the proposed scheduler with a CAP without run-time optimisation and with the best-fit decreasing heuristic in terms of deadline misses and tardiness of several task sets using a real-time operating system and a modern 8-core processor. The results indicate that the proposed scheduler improves deadline tardiness and provides hard real-time guarantees by combining cache and task partitioning with scheduling optimisations.

Journal ArticleDOI
TL;DR: The authors apply a Markov chain to model and analyse the component dependability of CPSs and propose recovery techniques to guarantee a high level of dependability, so as to assure the continuity of system operation.
Abstract: As cyber-physical systems (CPSs) are often used in safety-critical areas, the dependability of the system is an important issue that needs to be analysed. Any failure in the components of the CPS could result in a degradation of the physical state, which then causes major harm to life and/or property. Since the concept of dependence leads to that of trust, the subsystems of the CPS should be dependable to each other, delivering the requested services as specified without failing during operation. In this study, the authors apply a Markov chain to model and analyse the component dependability of CPSs and propose recovery techniques to guarantee a high level of dependability, so as to assure the continuity of system operation.
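As a minimal illustration of the modelling style (a toy two-state availability model, not the paper's CPS model): a component that fails at rate λ and is repaired at rate μ has steady-state availability μ/(λ + μ), which can also be recovered numerically from the generator matrix of the Markov chain.

```python
# Toy continuous-time Markov dependability model (not the paper's CPS model):
# one component alternates between UP and DOWN with failure rate lam and repair rate mu.
# Steady-state availability is mu / (lam + mu); here it is also recovered numerically
# from the generator matrix as a cross-check.
import numpy as np

lam, mu = 1e-4, 1e-2            # failures/hour, repairs/hour (illustrative values)
Q = np.array([[-lam,  lam],     # generator matrix, states: 0 = UP, 1 = DOWN
              [  mu,  -mu]])

# Solve pi @ Q = 0 with sum(pi) = 1.
A = np.vstack([Q.T, np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print("availability (numeric)     =", pi[0])
print("availability (closed form) =", mu / (lam + mu))
```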

Journal ArticleDOI
TL;DR: Experimental results demonstrate the effectiveness of the proposed algorithm in significantly reducing the area and power of synthesised sequential circuits while enhancing their fault tolerance.
Abstract: Recently, a finite state machine-based fault tolerance technique for sequential circuits, based on protecting a few states with a high probability of occurrence, has been proposed. In this study, the authors propose an algorithm that starts with a given state assignment targeting the optimisation of either area or power and generates a state assignment that preserves the original state assignment and satisfies the fault tolerance requirements for the protected states. Experimental results demonstrate the effectiveness of the proposed algorithm in significantly reducing the area and power of synthesised sequential circuits while enhancing their fault tolerance.

Journal ArticleDOI
TL;DR: This study describes a hardware implementation of the real-time scheduler named nHSE (hardware scheduler engine for n tasks) and presents the results obtained using the appropriate schedulability methods used in real- time environments.
Abstract: Taking into consideration the requirements of real-time embedded systems, the processor scheduler must guarantee a constant scheduling frequency, providing determinism and predictability of task execution. The purpose of this study is to implement the nMPRA (multi pipeline register architecture) processor in a field-programmable gate array and to integrate the already existing scheduling methods, thus providing a preemptive schedulability analysis of the proposed architecture based on the pipeline assembly line and hardware scheduler. This study describes a hardware implementation of the real-time scheduler named nHSE (hardware scheduler engine for n tasks) and presents the results obtained using the appropriate schedulability methods used in real-time environments. The scheduling and task-switch operations are the main sources of non-determinism; they are successfully dealt with by the real-time nMPRA concept in order to improve the system's functionality. Some mechanisms used for synchronisation and inter-task communication are also taken into consideration.

Journal ArticleDOI
TL;DR: A temperature gradient-aware thermal simulator for three-dimensional ICs (called 3D-TarGA) at the architectural level that considers the thermal effects in leakage power, thermal conductivity, thermal radiation, and thermal convection to reflect the physical–thermal interactive effects of ICs at the early stages of IC design.
Abstract: Nowadays, thermal simulators of integrated circuits (ICs) at the architectural level tend to neglect thermal effects in temperature-dependent factors (such as leakage power and thermal conductivity) and the heat dissipation mechanism of thermal radiation at the early stages of IC design. Hence, the analysis results of thermal simulators may not be sufficient to reflect the physical–thermal interactive effects of ICs. This study presents a temperature gradient-aware thermal simulator for three-dimensional ICs (called 3D-TarGA) at the architectural level. The temperature gradient-aware thermal analysis of 3D-TarGA considers the thermal effects in leakage power, thermal conductivity, thermal radiation, and thermal convection to reflect the physical–thermal interactive effects of ICs at the early stages of IC design. Experimental results show that, when the thermal effects are ignored, the maximum absolute error in the IC temperature obtained with 3D-TarGA is 1.62°C relative to the published thermal simulator HotSpot. Moreover, the maximum absolute difference in the IC temperature between considering and ignoring the thermal effects in 3D-TarGA is 2.7°C.
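The physical–thermal interaction such a simulator captures is a feedback loop: leakage power rises with temperature, and temperature rises with total power, so the two must be iterated to a consistent solution. A minimal fixed-point sketch with assumed coefficients (illustrative values, not 3D-TarGA's model or geometry):

```python
# Minimal leakage-temperature feedback loop: leakage grows with temperature and
# temperature grows with total power, so the two are iterated to a fixed point.
# All coefficients below are illustrative assumptions, not the 3D-TarGA model.
import math

P_DYN = 10.0          # dynamic power, W
R_TH = 2.0            # junction-to-ambient thermal resistance, K/W
T_AMB = 45.0          # ambient temperature, C
P_LEAK_REF, T_REF, K = 1.0, 25.0, 0.02   # leakage = P_LEAK_REF * exp(K * (T - T_REF))

T = T_AMB
for it in range(50):
    p_leak = P_LEAK_REF * math.exp(K * (T - T_REF))   # temperature-dependent leakage
    T_new = T_AMB + R_TH * (P_DYN + p_leak)           # temperature from total power
    if abs(T_new - T) < 1e-6:
        break
    T = T_new

print(f"converged after {it} iterations: T = {T_new:.2f} C, leakage = {p_leak:.2f} W")
```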

Journal ArticleDOI
TL;DR: This study presents the exploration of a low-cost optimised HLS solution capable of handling hardware Trojans (providing security) that alter computational output, and indicates a significant reduction in the cost of the security-aware HLS solution through the proposed approach compared with a recent approach.
Abstract: Owing to the massive complexity of modern digital integrated circuits (ICs), which makes complete in-house development impractical, globalisation of the design process establishes itself as an inevitable solution for faster and more efficient design. However, globalisation involves importing intellectual property (IP) cores from various third-party vendors, rendering an IP susceptible to hardware threats. To provide trust and security in digital ICs within user constraints, the design of a low-cost optimised dual modular redundant solution through a Trojan-secured high-level synthesis (HLS) methodology is crucial. This study presents the exploration of a low-cost optimised HLS solution capable of handling hardware Trojans (providing security) that alter computational output. The key contributions of the study are as follows: (i) a novel low-cost security-aware HLS approach; (ii) a novel encoding for representing a bacterium in the design space (comprising the candidate datapath resource configuration and vendor allocation information for a Trojan-secured solution); and (iii) a novel exploration process with an efficient vendor allocation procedure that assists in yielding a low-cost Trojan-secured schedule. Experimental results indicate a significant reduction (82.4%) in the cost of the security-aware HLS solution through the proposed approach compared with a recent approach.

Journal ArticleDOI
TL;DR: An online non-clairvoyant scheduling algorithm, multiprocessor priority round robin (MPRR), is introduced, which is O(α)-competitive; more precisely, 2α²/(α − 1/2)-competitive, i.e. the competitive ratio is 5.33 for α = 2 and 7.3 for α = 3.
Abstract: In the current epoch, energy consumption is a great concern in online non-clairvoyant job scheduling. Online non-clairvoyant scheduling has been studied less extensively than online clairvoyant scheduling. The authors study the non-clairvoyant scheduling problem of minimising the total prioritised flow time plus energy, where jobs with arbitrary sizes and priorities arrive online. The authors consider the unbounded speed model in multiprocessor settings, where the speed of each of the m individual processors can vary from zero to infinity, i.e. over [0, ∞), to save energy and to optimise the prioritised flow time plus energy. The authors consider the traditional power function P(s) = s^α, where s is the speed of a processor and α > 1 is a constant. In this study, the authors introduce an online non-clairvoyant scheduling algorithm, multiprocessor priority round robin (MPRR), which is O(α)-competitive; more precisely, 2α²/(α − 1/2)-competitive, i.e. the competitive ratio is 5.33 for α = 2 and 7.3 for α = 3. The algorithm is analysed using a potential-function argument against an optimal offline adversary.

Journal ArticleDOI
TL;DR: Experiments show that, compared with the fully-throttling based vertical throttling scheme, the proposed CFP-DTM can improve the throughput by 27.5% and reduce the thermal control oscillation by 3°C under the maximum system workload.
Abstract: Three-dimensional networks-on-chip are beneficial for performance improvement, but suffer from severe thermal issues. Dynamic thermal management (DTM) schemes have been proposed to keep the temperature below the thermal limit while improving the system performance. However, existing fully-throttling DTM schemes degrade the network availability and thus decrease the system performance. In this study, a novel collaborative fuzzy-based partially-throttling DTM (CFP-DTM) scheme is developed. Two main components are involved in the CFP-DTM: (i) a fuzzy-based clock gating scheme that dynamically adjusts the throttling ratio and the throttled nodes; and (ii) a highly adaptive throttling-aware routing scheme that lets packets detour around easily congested channels. Experiments show that, compared with the fully-throttling based vertical throttling scheme, the proposed CFP-DTM can improve the throughput by 27.5% and reduce the thermal control oscillation by 3°C under the maximum system workload.

Journal ArticleDOI
TL;DR: A formal verification methodology for checking both functional and timing requirements of real-time digital controllers targeted at field programmable gate array technology is proposed and one of the key ideas is the overloaded use of rank functions for timing verification.
Abstract: A formal verification methodology for checking both the functional and timing requirements of real-time digital controllers targeted at field-programmable gate array technology is proposed. Timed transition systems (TTSs) are used to model both the digital controller circuit and the high-level specification requirements. Timed well-founded simulation (TWFS) refinement is used as the notion of correctness and defines what it means for an implementation TTS to satisfy a specification TTS. The primary contribution is a set of proof obligation templates (based on TWFS refinement) that account for both functional and timing requirements. The proof obligations generated using the templates can be checked using a decision procedure. One of the key ideas is the overloaded use of rank functions (typically used for liveness verification) for timing verification. The efficiency and scalability of the approach are demonstrated using three case studies.

Journal ArticleDOI
TL;DR: This study proposes an efficient very large scale integration (VLSI) architecture for a quadruple-throughput fixed-point multiply-accumulate circuit (MAC) and shows that the proposed architecture achieves a greater throughput improvement than existing designs.
Abstract: This study proposes an efficient very large scale integration (VLSI) architecture for a quadruple-throughput fixed-point multiply-accumulate circuit (MAC). The proposed n × n bit MAC can perform one n × n bit, two n × (n/2) bit or four (n/2) × (n/2) bit MAC operations in parallel. The objective of the proposed MAC is to improve the throughput of existing MAC designs. The proposed and existing designs are implemented using a 45 nm TSMC CMOS library, and the results show that the proposed architecture achieves a greater throughput improvement than existing designs. For example, the proposed 32 × 32 bit MAC architecture achieves a 60.4% improvement in throughput over the existing array multiplier-based double-throughput MAC.

Journal ArticleDOI
TL;DR: This study motivates the use of STT-MRAM for the first-level caches of a multicore processor to reduce energy consumption without significantly degrading performance; the proposed STT hierarchy shows good scalability over CMOS, with a few benchmarks scaling significantly better.
Abstract: Spintronic memory [spin-transfer torque magnetic random access memory (STT-MRAM)] is an attractive alternative technology to CMOS since it offers higher density and virtually no leakage current. Spintronic memory continues to require higher write energy, however, presenting a challenge to memory hierarchy design when energy consumption is a concern. This study motivates the use of STT-MRAM for the first-level caches of a multicore processor to reduce energy consumption without significantly degrading performance. The large STT-MRAM first-level cache implementation saves leakage power, while the use of a small level-0 cache recovers the performance lost to the long STT-MRAM write latencies. The combination of both reduces the energy-delay product by 65% on average compared with the CMOS baseline. The proposed STT hierarchy also shows good scalability over the CMOS design, with a few benchmarks scaling significantly better. The PARSEC and Splash2 benchmark suites are analysed running on a modern multicore platform, comparing the performance, energy consumption and scalability of the spintronic cache system with a CMOS design.

Journal ArticleDOI
TL;DR: This study aims to optimise the VLSI implementation of the WiMedia MAC system by proposing a hybrid shared-memory and message-passing multiprocessor system-on-chip (MPSoC) architecture that achieves a 24% performance improvement and 22% power savings over the conventional shared-memory-only architecture.
Abstract: Ultra-wideband (UWB) is a well-known radio technology whose media access control (MAC) protocol, WiMedia MAC, has considerable potential to ensure high-speed and high-quality data communication for wireless personal area networks. However, these benefits involve a heavy computational workload, thereby posing a challenge to the conventional very-large-scale integration (VLSI) approach in terms of providing the required performance and power efficiency. Therefore, this study aims to optimise the VLSI implementation of the WiMedia MAC system by proposing a hybrid shared-memory and message-passing multiprocessor system-on-chip (MPSoC) architecture. The proposed solution combines state-of-the-art MPSoC technology and application-specific instruction-set processor techniques to (i) accelerate the MAC protocol at the task level by using parallel processing, (ii) enable the use of custom instructions to optimise inter-processor communication through an explicit message-passing mechanism, and (iii) ease the implementation process by using a high-level software/hardware co-design methodology. The proposed platform is implemented both in system-level SystemC for architecture exploration and in standard-cell technology for future chip implementation. Experimental results show that the proposed hybrid MPSoC architecture achieves a 24% performance improvement and 22% power savings over the conventional shared-memory-only architecture.

Journal ArticleDOI
TL;DR: The authors propose using two well-known micro-architectural approaches, last-write-update (LWU) and value locality (VL), to minimise the number of write operations into the register file.
Abstract: In this paper, the authors propose two approaches that employ spin-transfer torque random access memory (STTRAM) in the design of the register file, an important part of embedded processors. However, STTRAM suffers from limited endurance and high latency in the write operation. Consequently, employing STTRAM in the register file entails two challenges: (i) the lifetime significantly decreases as data are frequently written into the register file; and (ii) the delay of the critical path increases as a result of the slow write operation. The authors propose using two well-known micro-architectural approaches, last-write-update (LWU) and value locality (VL), to minimise the number of write operations into the register file. The main observation behind LWU is that only the last write operations of certain registers in the re-order buffer need to be considered. VL exploits the fact that a limited number of values are more likely to be written into the register file. Their simulations show that, by employing the LWU and VL approaches, the lifetime of the STTRAM-based register file, compared with that of a traditional static RAM-based register file architecture, extends to about 40 years on average and around 3.5 years in the worst case, while improving power consumption by 30%.

Journal ArticleDOI
TL;DR: A new efficient algorithm for translating linear temporal logic (LTL) formulas to Büchi automata, which are used by LTL model checkers, using the principle of alternating automata and keeping only the positive transitions, without generating the intermediate generalised automata.
Abstract: In this study, the author presents a new efficient algorithm for translating linear temporal logic (LTL) formulas to Büchi automata, which are used by LTL model checkers. The general idea of this algorithm is to generate Büchi automata from LTL formulas using the principle of alternating automata, keeping only the positive transitions and without generating the intermediate generalised automata. The LTL translation is the heart of any LTL model checker and affects its performance. The translation performance is measured not only by its speed and the size of the produced Büchi automaton (number of states and number of transitions), but also by the correctness of the produced automaton and its level of determinism. The author shows that this method is different from the others and is very competitive with the most efficient translators to date.

Journal ArticleDOI
TL;DR: This study presents a fast and accurate delay model by extracting the key parameters affecting FPGA delay and by combining the classical Elmore equivalent model and the powerful learning capability of neural network.
Abstract: Field programmable gate arrays (FPGAs) are adopted in many electronic systems due to their design flexibility and high performance. To provide the right FPGAs for different applications, FPGA architectural exploration is needed. Accurate estimation of the area and delay of low-level FPGA circuits is required to evaluate different architecture candidates during the exploration. In this study, the authors present a fast and accurate delay model by extracting the key parameters affecting FPGA delay and by combining the classical Elmore equivalent model with the powerful learning capability of a neural network. The derived model integrates seamlessly with the existing FPGA architecture exploration flow. Experimental results show that, compared with the circuit simulator HSPICE, this model speeds up delay estimation by 2863 times with an average error of 1.9% during the architectural exploration process. This fast and accurate estimation allows FPGA architects to explore more architectural options in limited time, resulting in optimised FPGA architectures.
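The classical Elmore model combined here with the neural network estimates the delay to a node in an RC tree as the sum, over the resistors on the path from the source, of each resistance times the capacitance downstream of it. A minimal sketch for an RC ladder with illustrative values (not FPGA routing parasitics):

```python
# Elmore delay of a simple RC ladder: the delay to node k is the sum, over every
# resistor on the path from the source, of that resistance times the total
# capacitance downstream of it. Values are illustrative, not FPGA routing parameters.
def elmore_ladder(R, C):
    """R[i], C[i]: resistance into node i and capacitance at node i (chain topology)."""
    delays = []
    for k in range(len(R)):
        d = sum(R[i] * sum(C[i:]) for i in range(k + 1))
        delays.append(d)
    return delays

R = [100, 100, 100]          # ohms
C = [1e-15, 1e-15, 2e-15]    # farads
print([f"{d*1e15:.0f} fs" for d in elmore_ladder(R, C)])   # delays to nodes 0, 1, 2
```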

Journal ArticleDOI
Irith Pomeranz
TL;DR: This study describes a reconstruction procedure based on repeating short subsequences of primary input patterns from the sequence; it provides a new low-complexity option for increasing the fault coverage, and thus addresses the high computational complexity of sequential test generation.
Abstract: Simulation-based sequential test generation procedures address the high computational complexity of sequential test generation by replacing the deterministic branch-and-bound process with lower-complexity processes. These processes introduce new primary input patterns into a functional test sequence in order to increase its fault coverage. This study observes that, even without introducing new primary input patterns, it is possible to increase the fault coverage of a functional test sequence by applying the same primary input patterns in different orders. This is referred to as reconstruction of the sequence. It provides a new low-complexity option for increasing the fault coverage, and thus addressing the high computational complexity of sequential test generation. This study describes a reconstruction procedure that is based on repeating short subsequences of primary input patterns from the sequence. Experimental results demonstrate the effectiveness of the reconstruction procedure in increasing the fault coverage as part of a simulation-based sequential test generation procedure.

Journal ArticleDOI
Irith Pomeranz
TL;DR: In this article, the authors develop a quantitative metric for assessing the ability of functional capture cycles to take the circuit into its functional state space and ensure functional operation conditions, based on the distances between the states that the circuit traverses during functional capture cycles and the reachable states that the circuit can enter during functional operation.
Abstract: Several test generation procedures are based on the expectation that after clocking a circuit in functional mode for several clock cycles, the circuit enters its functional state space and operates under functional operation conditions. Functional operation conditions are important for avoiding overtesting of delay faults. This study develops a quantitative metric for assessing the ability of functional capture cycles to take the circuit into its functional state space, and ensure functional operation conditions. The metric is based on the distances between the states that the circuit traverses during functional capture cycles and reachable states that the circuit can enter during functional operation. The paper also describes a procedure for modifying a test set so as to reduce the values of the metric for its tests.
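A minimal sketch of one plausible realisation of such a metric, under the assumption (not confirmed by the abstract) that the distance used is the bit-wise Hamming distance from each traversed state to the nearest known reachable state:

```python
# One plausible realisation of the metric (an assumption, not the paper's exact
# definition): for each state traversed during functional capture cycles, take the
# Hamming distance to the nearest known reachable state; the score of a test is,
# for example, the maximum such distance, and lower means closer to functional
# operation conditions.
def hamming(a, b):
    return bin(a ^ b).count("1")

def capture_metric(traversed_states, reachable_states):
    return max(min(hamming(s, r) for r in reachable_states) for s in traversed_states)

reachable = {0b0000, 0b0011, 0b1100}          # toy sample of reachable states
traversed = [0b0011, 0b0111, 0b1110]          # states seen during capture cycles
print(capture_metric(traversed, reachable))   # 1: every traversed state is within 1 bit
```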