
Showing papers in "IET Computers and Digital Techniques" in 2017


Journal ArticleDOI
TL;DR: The ECP proposed in this study over F_p performs better than available hardware in terms of area and timing, and the area–delay product of this design is very low compared with similar designs.
Abstract: This study presents a description of an efficient hardware implementation of an elliptic curve cryptography processor (ECP) for modern security applications. A high-performance elliptic curve scalar multiplication (ECSM), which is the key operation of an ECP, is developed in both affine and Jacobian coordinates over a prime field of size p using the National Institute of Standards and Technology standard. A novel combined point doubling and point addition architecture is proposed using efficient modular arithmetic to achieve high speed and low hardware utilisation of the ECP in Jacobian coordinates. This new architecture has been synthesised for both application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) targets. A 65 nm CMOS ASIC implementation of the proposed ECP in Jacobian coordinates takes 0.56 and 0.73 ms for 224-bit and 256-bit elliptic curve cryptography, respectively. The ECSM is also implemented in an FPGA and provides a better delay performance than previous designs. The implemented design is area-efficient, requiring few resources and no digital signal processing (DSP) slices on an FPGA. Moreover, the area–delay product of this design is very low compared with similar designs. To the best of the authors' knowledge, the ECP proposed in this study over F_p performs better than available hardware in terms of area and timing.
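At its core, an ECSM repeatedly applies point doubling and point addition. A minimal software sketch of left-to-right double-and-add on a toy short-Weierstrass curve in affine coordinates (illustrative parameters only, not the NIST curves or the combined Jacobian-coordinate architecture of the paper):

```python
# Minimal double-and-add elliptic curve scalar multiplication (affine coordinates).
# Toy curve y^2 = x^3 + a*x + b over F_p; parameters are illustrative, not NIST curves.
p, a, b = 97, 2, 3
INF = None  # point at infinity

def point_add(P, Q):
    if P is INF: return Q
    if Q is INF: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return INF
    if P == Q:                              # point doubling
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:                                   # point addition
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def scalar_mult(k, P):
    """Left-to-right double-and-add: one doubling per bit, one addition per set bit."""
    R = INF
    for bit in bin(k)[2:]:
        R = point_add(R, R)                 # double
        if bit == '1':
            R = point_add(R, P)             # add
    return R

print(scalar_mult(20, (3, 6)))  # 20*G on the toy curve
```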

52 citations


Journal ArticleDOI
TL;DR: This study proposes a multi-key ECC based on the residue number system that employs deep pipelining to allow the concurrent encryption of 21 keys; the results are compared with existing ECC architectures.
Abstract: Public-key cryptosystems such as elliptic curve cryptography (ECC) and Rivest–Shamir–Adleman (RSA) are widely used for data security in computing systems. ECC provides a high level of security with a much smaller key than RSA, which makes ECC a preferred choice in many applications. This study proposes a multi-key ECC based on the residue number system. The proposed architecture employs deep pipelining to allow the concurrent encryption of 21 keys. The proposed architectures are implemented on two different field-programmable gate array (FPGA) platforms and the results are compared with existing ECC architectures. The proposed implementation on a Virtex-7 FPGA achieves a throughput of 1816 kbps at a clock frequency of 73 MHz.

17 citations


Journal ArticleDOI
TL;DR: The authors introduce an algorithm to optimise the computing time of feature extraction methods for colour images, choosing the generalised Fourier descriptor (GFD) and generalised colour Fourier descriptor (GCFD) models.
Abstract: Optimising computing times of applications is an increasingly important task in many different areas such as scientific and industrial applications. The graphics processing unit (GPU) is considered one of the powerful engines for computationally demanding applications since it offers a highly parallel architecture. In this context, the authors introduce an algorithm to optimise the computing time of feature extraction methods for colour images. They choose the generalised Fourier descriptor (GFD) and generalised colour Fourier descriptor (GCFD) models as methods to extract image features for various applications, such as real-time colour object recognition or image retrieval. They compare the experimental computing times on the central processing unit and the GPU. They also present a case study of these descriptors using two platforms: an NVIDIA GeForce GT525M and an NVIDIA GeForce GTX480. The experimental results demonstrate that the execution time can be reduced considerably, by up to 34× for GFD and 56× for GCFD.
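As a rough CPU-side illustration, assuming the common definition of GFDs as integrals of the 2-D power spectrum over rings of constant radius (the paper's GPU kernels and the GCFD colour extension are not reproduced here):

```python
# Rough CPU sketch of generalised Fourier descriptors (GFD), assuming the common
# definition: integrate the 2-D power spectrum over rings of constant radius.
# The paper's GPU parallelisation and colour (GCFD) extension are not reproduced.
import numpy as np

def gfd(image, n_rings=32):
    spec = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2   # centred power spectrum
    h, w = spec.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)                       # radius of each frequency bin
    bins = np.minimum((r / r.max() * n_rings).astype(int), n_rings - 1)
    desc = np.bincount(bins.ravel(), weights=spec.ravel(), minlength=n_rings)
    return desc / desc[0]                                      # normalise by the DC ring

img = np.random.rand(64, 64)       # placeholder image
print(gfd(img)[:5])
```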

15 citations


Journal ArticleDOI
TL;DR: In this study, an area- and power-efficient iterative floating-point (FP) multiplier architecture is designed and implemented on FPGA devices with a pipelined architecture, achieving lower power consumption.
Abstract: In this study, an area- and power-efficient iterative floating-point (FP) multiplier architecture is designed and implemented on FPGA devices with a pipelined architecture. The proposed multiplier supports both single-precision (SP) and double-precision (DP) operations. The operation mode can be switched at run time by changing the precision selection signal. The Karatsuba algorithm is applied when mapping the mantissa multiplier in order to reduce the number of digital signal processing (DSP) blocks required. For DP operations, an iterative method is applied, which requires much less hardware than a fully pipelined DP multiplier and thus reduces the power consumption. To further reduce the power consumption, the logic blocks unused in a specific operation mode are disabled. Compared with previous work, the proposed multiplier achieves a 33% reduction in DSP blocks, uses 4.3% fewer look-up tables (LUTs) and 31.2% fewer flip-flops, and has a 4% faster clock frequency on Virtex-5 devices. Compared with the intellectual property core DP multiplier provided by the FPGA vendors, the proposed multiplier requires fewer DSP blocks and achieves lower power consumption. The mapping solutions and implementation results of the proposed multiplier on Xilinx Virtex-7 and Altera Arria-10 devices are also presented. In addition, the results of a direct implementation of the proposed architecture on an STM 90 nm ASIC platform are reported.
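Karatsuba computes a full-width product from three half-width products instead of four, which is what allows the mantissa multiplier to use fewer DSP blocks. A minimal one-level software sketch (illustrative only, not the paper's DSP-block mapping), using mantissa-width operands as an example:

```python
# One level of Karatsuba: a full-width product from three half-width products.
# Illustrative software sketch only; the paper maps this structure onto DSP blocks.
def karatsuba_1level(a, b, n):
    """Multiply two n-bit integers using three (n/2)-bit multiplications."""
    half = n // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    p_hi = a_hi * b_hi                                    # high half-product
    p_lo = a_lo * b_lo                                    # low half-product
    p_mid = (a_hi + a_lo) * (b_hi + b_lo) - p_hi - p_lo   # cross terms from one multiply
    return (p_hi << n) + (p_mid << half) + p_lo

# 53-bit double-precision mantissas, split as 54-bit operands for an even halving
a, b = 0x1FFFFFFFFFFFFF, 0x1ABCDEF0123456
assert karatsuba_1level(a, b, 54) == a * b
```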

15 citations


Journal ArticleDOI
TL;DR: This study proposes an effective strategy to ensure an energy consumption gain under time constraints, through a power-aware model based on dynamic voltage and frequency scaling and dynamic power management appropriate to WSNs, and on a global Earliest Deadline First scheduler.
Abstract: The sharp increase in wireless sensor network (WSN) performance has increased their power requirements. However, with a limited battery lifetime, it is more and more difficult to deploy many more sensors with today's solutions. Therefore, the authors need to implement autonomous WSNs without any human intervention or external power supply. To this end, this study proposes an effective strategy to ensure an energy consumption gain that takes time constraints into account, through a power-aware model based on dynamic voltage and frequency scaling and dynamic power management, which are appropriate to WSNs, and on a global Earliest Deadline First scheduler. To select the most suitable simulator for integrating and simulating the developed models, more than 25 existing WSN simulators are outlined and evaluated. On the basis of this comparative analysis, the authors chose the simulation tool for real-time multiprocessor scheduling (STORM) to validate their work, owing to its multiple advantages.
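The energy gain from DVFS comes from the convexity of the power-speed relation: running a fixed workload at a lower voltage and frequency costs more time but markedly less dynamic energy, provided deadlines still hold. A minimal sketch with assumed parameters (illustrative values, not the paper's STORM-based model):

```python
# Minimal DVFS energy estimate under the usual convex power model P_dyn ~ C * V^2 * f,
# with V assumed to scale roughly linearly with f. Parameters are illustrative
# assumptions, not the STORM model used in the paper.
def dynamic_energy(cycles, f, c_eff=1e-9, v_at_fmax=1.2, f_max=1e9):
    v = v_at_fmax * (f / f_max)          # assumed linear voltage/frequency relation
    power = c_eff * v * v * f            # dynamic power at this operating point
    time = cycles / f                    # execution time at frequency f
    return power * time, time

for f in (1e9, 0.5e9):                   # full speed vs. half speed
    e, t = dynamic_energy(cycles=2e8, f=f)
    print(f"f={f/1e6:.0f} MHz: time={t*1e3:.0f} ms, dynamic energy={e*1e3:.2f} mJ")
```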

13 citations


Journal ArticleDOI
TL;DR: This study predicts circuit timing for all test patterns using three machine learning techniques, neural network (NN), support vector regression (SVR), and least-square boosting (LSBoost), and proposes four feature extractions to reduce the huge dimension of raw data.
Abstract: Excessive power supply noise (PSN) such as IR drop can cause yield loss when testing very large scale integration chips. However, simulation of circuit timing with PSN is not an easy task. In this study, the authors predict circuit timing for all test patterns using three machine learning techniques: neural network (NN), support vector regression (SVR), and least-square boosting (LSBoost). To reduce the huge dimension of the raw data, they propose four feature extractions: input/output transition (IOT), flip-flop transition in window (FFTW), switching activity in window (SAW), and terminal FF transition of long paths (PATH). SAW and FFTW are physical-aware features, while PATH is a timing-aware feature. Their experimental results on the leon3mp benchmark circuit (638 K gates, 2 K test patterns) show that, compared with the simple IOT method, SAW effectively reduces the dimension by up to 472 times without significant impact on prediction accuracy (correlation coefficient = 0.79). The results show that NN has the best prediction accuracy and SVR has the least under-prediction, while LSBoost uses the least memory. The proposed method is more than six orders of magnitude faster than traditional circuit simulation tools.
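As a rough illustration of the regression setup, with scikit-learn models standing in for the paper's NN, SVR and LSBoost implementations (LSBoost is MATLAB's least-squares boosting) and random placeholder features instead of extracted SAW/FFTW/PATH data:

```python
# Sketch of the regression setup: predict a per-pattern circuit delay from extracted
# features. scikit-learn models are stand-ins for the paper's NN / SVR / LSBoost;
# the feature matrix here is a random placeholder, not real SAW data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((2000, 128))                                    # e.g. SAW features per pattern
y = X[:, :16].sum(axis=1) + 0.1 * rng.standard_normal(2000)    # surrogate delay target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
models = {
    "NN": MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0),
    "SVR": SVR(C=10.0),
    "LSBoost-like": GradientBoostingRegressor(random_state=0),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    r = np.corrcoef(m.predict(X_te), y_te)[0, 1]
    print(f"{name}: correlation coefficient = {r:.2f}")
```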

13 citations


Journal ArticleDOI
TL;DR: The results show that the proposed architecture for the cube operation, based on the Yavadunam sutra of Vedic mathematics, is useful for low-area and high-speed applications in a microprocessor environment.
Abstract: This study presents a generalised architecture for the cube operation based on the Yavadunam sutra of Vedic mathematics. This algorithm converts the cube of a large-magnitude number into a smaller-magnitude number and addition operations. The Vedic sutra for decimal numbers is extended to the binary radix-2 number system for digital platforms. The cubic architecture is synthesised and simulated using Xilinx ISE 14.1 software and implemented on various field-programmable gate array devices for comparison purposes. The Cadence Encounter® RTL Compiler RC13.10 (v13.10-s006_1) is also used for the application-specific integrated circuit platform. The performance parameters such as delay, area and power are obtained from the synthesis reports. The results show that the proposed architecture is useful for low-area and high-speed applications in a microprocessor environment.
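One common statement of the Yavadunam cube identity (assumed here to be the form the architecture builds on) is N³ = (N − 2d)·B² + 3d²·B − d³ for a number N near a base B with deficiency d = B − N; in radix-2 the base is a power of two, so the multiplications by B reduce to shifts. A quick numerical check:

```python
# Numerical check of the Yavadunam cube identity: for N near a base B with
# deficiency d = B - N,  N^3 = (N - 2d)*B^2 + 3*d^2*B - d^3.
# In binary (radix-2) hardware, B is a power of two, so the B multiplications are shifts.
def yavadunam_cube(n, base):
    d = base - n
    return (n - 2 * d) * base * base + 3 * d * d * base - d * d * d

for n, base in [(98, 100), (996, 1000), (0b11110, 1 << 5)]:
    assert yavadunam_cube(n, base) == n ** 3
    print(n, "**3 =", yavadunam_cube(n, base))
```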

12 citations


Journal ArticleDOI
TL;DR: Comparison of analytical results in terms of performance and costs for different network dimensions indicates that the proposed CBP-Mesh offers short latency, high throughput and good scalability at a small increase in power and energy.
Abstract: This study presents an efficient and scalable networks-on-chip (NoC) topology termed cross-by-pass-mesh (CBP-Mesh). The proposed architecture is derived from the traditional mesh topology by the addition of cross-by-pass links in the network. The design and impact of adding cross-by-pass links to the topology are analysed in detail with the help of synthetic, hotspot and embedded traffic traces. The advantages of the proposed CBP-Mesh over competing topologies include a reduction in network diameter, an increase in bisection bandwidth, a reduction in the average number of hops, and improved symmetry and regularity of the network. Synthetic traffic traces and some real embedded-system workloads are applied to the proposed CBP-Mesh and its competitor two-dimensional NoC topologies. The comparison of analytical results in terms of performance and costs for different network dimensions indicates that the proposed CBP-Mesh offers short latency, high throughput and good scalability at a small increase in power and energy.

11 citations


Journal ArticleDOI
TL;DR: A modified robust mixed-norm (MRMN) adaptive filter algorithm that is robust to impulsive noise, with a higher convergence rate and lower steady-state error (SSE); a significant improvement in SSE and convergence speed is obtained compared with existing adaptive filters for similar specifications.
Abstract: Adaptive filters are prevalent in many real-time signal processing applications. Many adaptive algorithms already exist, but most of them assume white Gaussian noise as the disturbance. However, in many applications, such as electrocardiogram and foetal heart rate measurement, low-frequency atmospheric noise, underwater acoustic noise and signal measurement in instrumentation, impulsive noise is more common. This study presents a modified robust mixed-norm (MRMN) adaptive filter algorithm that is robust to impulsive noise, with a higher convergence rate and lower steady-state error (SSE). The MRMN adaptive filter algorithm has been simulated using Matlab and the Xilinx System Generator high-level synthesis tool, and a significant improvement in SSE and convergence speed is obtained compared with existing adaptive filter algorithms for similar specifications. The proposed algorithm is also described in VHDL and synthesised using the Xilinx synthesis tool in order to implement it on a field-programmable gate array (FPGA). The post-place-and-route FPGA implementation results show nearly a 90% reduction in resource utilisation and a nearly 2.6 times improvement in clock frequency compared with the existing FPGA-based implementation for similar specifications.
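For orientation, the baseline robust mixed-norm update blends an LMS (L2) term with a sign-LMS (L1) term so that isolated impulses do not destabilise adaptation. A minimal sketch of that baseline follows; the paper's specific MRMN modification and its FPGA datapath are not reproduced here:

```python
# Sketch of the baseline robust mixed-norm (RMN) update, blending LMS (L2) and
# sign-LMS (L1) terms to tolerate impulsive noise:
#   w <- w + mu * (lam * e + (1 - lam) * sign(e)) * x
# The paper's specific MRMN modification and FPGA implementation are not reproduced.
import numpy as np

def rmn_filter(x, d, taps=8, mu=0.01, lam=0.7):
    w = np.zeros(taps)
    y_hat = np.zeros_like(d)
    for n in range(taps - 1, len(x)):
        xn = x[n - taps + 1:n + 1][::-1]      # current and past samples, newest first
        y_hat[n] = w @ xn
        e = d[n] - y_hat[n]
        w += mu * (lam * e + (1 - lam) * np.sign(e)) * xn
    return w, y_hat

rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
h = np.array([0.5, -0.3, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0])           # unknown system
d = np.convolve(x, h)[:len(x)] + 0.01 * rng.standard_normal(5000)  # desired signal
d += (rng.random(5000) < 0.01) * rng.standard_normal(5000) * 10    # impulsive noise
w, _ = rmn_filter(x, d)
print(np.round(w, 2))          # should approach h despite the impulses
```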

10 citations


Journal ArticleDOI
TL;DR: This work proposes a simple and intuitive dynamic encoding scheme that eliminates all TT and HT at hot locations, hence reducing energy consumption and improving MLC STT-MRAM lifetime.
Abstract: Shifting market trends towards mobile, Internet of things, and data-centric applications create opportunities for emerging low-power non-volatile memories. The attractive features of spin-torque-transfer magnetic RAM (STT-MRAM) make it a promising candidate for future on-chip cache memory. Two-bit multiple-level cell (MLC) STT-MRAMs suffer from higher write energy, performance overhead, and lower cell endurance when compared with their single-level counterpart. These unwanted effects are mainly due to write operations known as two-step transitions (TT) and hard transitions (HT). Here, the authors offer a solution to tackle the write energy problem in MLC STT-MRAM by minimising the number of TT and HT transitions. By analysing real applications, it was observed that specific locations within a cache block undergo many more TT and HT transitions, resulting in hot locations, when compared with the remaining (cold) locations. These hot locations are more detrimental to the lifetime and reliability of the MRAM device. In this work, the authors propose a simple and intuitive dynamic encoding scheme that eliminates all TT and HT at hot locations, hence reducing energy consumption and improving MLC STT-MRAM lifetime. Results on PARSEC benchmarks demonstrate the effectiveness and scalability of the proposed approach to potentially prolong MLC STT-MRAM lifetime.

9 citations


Journal ArticleDOI
TL;DR: The results indicate that the proposed scheduler improves deadline tardiness and provides hard real-time guarantees by combining cache and task partitioning with scheduling optimisations.
Abstract: A two-phase colour-aware real-time scheduler to reduce the contention caused by the cache coherence protocol due to accesses to shared cache partitions in a multicore processor is proposed. The first phase is a colour-aware task partitioning (CAP) algorithm that assigns tasks that share colours to a common processor whenever possible. The second phase is a dynamic colour-aware scheduler that detects cache coherence activities at run-time, preventing the execution of tasks that interfere with each other and thus reducing the contention caused by the cache coherence protocol. The authors compare the proposed scheduler with a CAP without run-time optimisation and with the best-fit decreasing heuristic in terms of deadline misses and tardiness of several task sets using a real-time operating system and a modern 8-core processor. The results indicate that the proposed scheduler improves deadline tardiness and provides hard real-time guarantees by combining cache and task partitioning with scheduling optimisations.

Journal ArticleDOI
TL;DR: The authors apply a Markov chain to model and analyse the component dependability of CPSs and propose recovery techniques to guarantee a high level of dependability, so as to assure the continuity of system operation.
Abstract: As cyber-physical systems (CPSs) are often used in safety-critical areas, the dependability of the system is an important issue that needs to be analysed. Any failure in the components of the CPS could result in a degradation of the physical state, which then causes major harm to life and/or property. Since the concept of dependence leads to that of trust, the subsystems of the CPS should be dependable to each other, delivering the requested services as specified without failing during operation. In this study, the authors apply a Markov chain to model and analyse the component dependability of CPSs and propose recovery techniques to guarantee a high level of dependability, so as to assure the continuity of system operation.
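As a minimal illustration of the modelling style (a toy two-state availability model, not the paper's CPS model): a component that fails at rate λ and is repaired at rate μ has steady-state availability μ/(λ + μ), which can also be recovered numerically from the generator matrix of the Markov chain.

```python
# Toy continuous-time Markov dependability model (not the paper's CPS model):
# one component alternates between UP and DOWN with failure rate lam and repair rate mu.
# Steady-state availability is mu / (lam + mu); here it is also recovered numerically
# from the generator matrix as a cross-check.
import numpy as np

lam, mu = 1e-4, 1e-2            # failures/hour, repairs/hour (illustrative values)
Q = np.array([[-lam,  lam],     # generator matrix, states: 0 = UP, 1 = DOWN
              [  mu,  -mu]])

# Solve pi @ Q = 0 with sum(pi) = 1.
A = np.vstack([Q.T, np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print("availability (numeric)     =", pi[0])
print("availability (closed form) =", mu / (lam + mu))
```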

Journal ArticleDOI
TL;DR: Experimental results demonstrate the effectiveness of the proposed algorithm in significantly reducing the area and power of synthesised sequential circuits while enhancing their fault tolerance.
Abstract: Recently, a finite state machine-based fault tolerance technique for sequential circuits, based on protecting a few states with a high probability of occurrence, has been proposed. In this study, the authors propose an algorithm that starts with a given state assignment targeting the optimisation of either area or power and generates a state assignment that preserves the original state assignment and satisfies the fault tolerance requirements for the protected states. Experimental results demonstrate the effectiveness of the proposed algorithm in significantly reducing the area and power of synthesised sequential circuits while enhancing their fault tolerance.

Journal ArticleDOI
TL;DR: This study describes a hardware implementation of the real-time scheduler named nHSE (hardware scheduler engine for n tasks) and presents the results obtained using the appropriate schedulability methods used in real- time environments.
Abstract: Taking into consideration the requirements of real-time embedded systems, the processor scheduler must guarantee a constant scheduling frequency, providing determinism and predictability of task execution. The purpose of this study is to implement the nMPRA (multi pipeline register architecture) processor in a field-programmable gate array and to integrate the already existing scheduling methods, thus providing a preemptive schedulability analysis of the proposed architecture based on the pipeline assembly line and hardware scheduler. This study describes a hardware implementation of the real-time scheduler named nHSE (hardware scheduler engine for n tasks) and presents the results obtained using the appropriate schedulability methods used in real-time environments. The scheduling and task-switch operations are the main sources of non-determinism; they are successfully dealt with by the real-time nMPRA concept in order to improve the system's functionality. Some mechanisms used for synchronisation and inter-task communication are also taken into consideration.

Journal ArticleDOI
TL;DR: A temperature gradient-aware thermal simulator for three-dimensional ICs (called 3D-TarGA) at the architectural level that considers the thermal effects in leakage power, thermal conductivity, thermal radiation, and thermal convection to reflect the physical–thermal interactive effects of ICs at the early stages of IC design.
Abstract: Nowadays, thermal simulators of integrated circuits (ICs) at the architectural level tend to neglect thermal effects in temperature-dependent factors (such as leakage power and thermal conductivity) and the heat dissipation mechanism of thermal radiation at the early stages of IC design. Hence, the analysis results of thermal simulators may not be sufficient to reflect the physical–thermal interactive effects of ICs. This study presents a temperature gradient-aware thermal simulator for three-dimensional ICs (called 3D-TarGA) at the architectural level. The temperature gradient-aware thermal analysis of 3D-TarGA considers the thermal effects in leakage power, thermal conductivity, thermal radiation, and thermal convection to reflect the physical–thermal interactive effects of ICs at the early stages of IC design. Experimental results show that, when the thermal effects are ignored, the maximum absolute error in the IC temperature obtained with 3D-TarGA is 1.62°C relative to the published thermal simulator HotSpot. Moreover, the maximum absolute difference in the IC temperature between considering and ignoring the thermal effects in 3D-TarGA is 2.7°C.
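The physical–thermal interaction such a simulator captures is a feedback loop: leakage power rises with temperature, and temperature rises with total power, so the two must be iterated to a consistent solution. A minimal fixed-point sketch with assumed coefficients (illustrative values, not 3D-TarGA's model or geometry):

```python
# Minimal leakage-temperature feedback loop: leakage grows with temperature and
# temperature grows with total power, so the two are iterated to a fixed point.
# All coefficients below are illustrative assumptions, not the 3D-TarGA model.
import math

P_DYN = 10.0          # dynamic power, W
R_TH = 2.0            # junction-to-ambient thermal resistance, K/W
T_AMB = 45.0          # ambient temperature, C
P_LEAK_REF, T_REF, K = 1.0, 25.0, 0.02   # leakage = P_LEAK_REF * exp(K * (T - T_REF))

T = T_AMB
for it in range(50):
    p_leak = P_LEAK_REF * math.exp(K * (T - T_REF))   # temperature-dependent leakage
    T_new = T_AMB + R_TH * (P_DYN + p_leak)           # temperature from total power
    if abs(T_new - T) < 1e-6:
        break
    T = T_new

print(f"converged after {it} iterations: T = {T_new:.2f} C, leakage = {p_leak:.2f} W")
```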

Journal ArticleDOI
TL;DR: This study presents the exploration of a low-cost optimised HLS solution capable of handling hardware Trojans (providing security) that alter computational output, and indicates a significant reduction in the cost of the security-aware HLS solution through the proposed approach compared with a recent approach.
Abstract: Owing to the massive complexity of modern digital integrated circuits (ICs), which makes complete in-house development impractical, globalisation of the design process establishes itself as an inevitable solution for faster and more efficient design. However, globalisation involves importing intellectual property (IP) cores from various third-party vendors, rendering an IP susceptible to hardware threats. To provide trust and security in digital ICs within user constraints, the design of a low-cost optimised dual modular redundant solution through a Trojan-secured high-level synthesis (HLS) methodology is crucial. This study presents the exploration of a low-cost optimised HLS solution capable of handling hardware Trojans (providing security) that alter computational output. The key contributions of the study are as follows: (i) a novel low-cost security-aware HLS approach; (ii) a novel encoding for representing a bacterium in the design space (comprising the candidate datapath resource configuration and vendor allocation information for a Trojan-secured solution); and (iii) a novel exploration process with an efficient vendor allocation procedure that assists in yielding a low-cost Trojan-secured schedule. Experimental results indicate a significant reduction (82.4%) in the cost of the security-aware HLS solution through the proposed approach compared with a recent approach.

Journal ArticleDOI
TL;DR: An online non-clairvoyant scheduling algorithm, multiprocessor priority round robin (MPRR), is introduced, which is O(α)-competitive; more precisely, 2α²/(α − 1/2)-competitive, i.e. the competitive ratio is 5.33 for α = 2 and 7.3 for α = 3.
Abstract: In the current epoch, energy consumption is a great concern in online non-clairvoyant job scheduling. Online non-clairvoyant scheduling has been studied less extensively than online clairvoyant scheduling. The authors study the non-clairvoyant scheduling problem of minimising the total prioritised flow time plus energy, where jobs with arbitrary sizes and priorities arrive online. The authors consider the unbounded speed model in multiprocessor settings, where the speed of each of the m individual processors can vary from zero to infinity, i.e. over [0, ∞), to save energy and to optimise the prioritised flow time plus energy. The authors consider the traditional power function P(s) = s^α, where s is the speed of a processor and α > 1 is a constant. In this study, the authors introduce an online non-clairvoyant scheduling algorithm, multiprocessor priority round robin (MPRR), which is O(α)-competitive; more precisely, 2α²/(α − 1/2)-competitive, i.e. the competitive ratio is 5.33 for α = 2 and 7.3 for α = 3. The algorithm is analysed using a potential-function argument against an optimal offline adversary.

Journal ArticleDOI
TL;DR: Experiments show that, compared with the fully-throttling based vertical throttling scheme, the proposed CFP-DTM can improve the throughput by 27.5% and reduce the thermal control oscillation by 3°C under the maximum system workload.
Abstract: Three-dimensional networks-on-chip are beneficial for performance improvement, but suffer from severe thermal issues. Dynamic thermal management (DTM) schemes have been proposed to keep the temperature below the thermal limit while improving the system performance. However, existing fully-throttling DTM schemes degrade the network availability and thus decrease the system performance. In this study, a novel collaborative fuzzy-based partially-throttling DTM (CFP-DTM) scheme is developed. Two main components are involved in the CFP-DTM: (i) a fuzzy-based clock gating scheme that dynamically adjusts the throttling ratio and the throttled nodes; and (ii) a highly adaptive throttling-aware routing scheme that lets packets detour around easily congested channels. Experiments show that, compared with the fully-throttling based vertical throttling scheme, the proposed CFP-DTM can improve the throughput by 27.5% and reduce the thermal control oscillation by 3°C under the maximum system workload.

Journal ArticleDOI
TL;DR: A formal verification methodology for checking both functional and timing requirements of real-time digital controllers targeted at field programmable gate array technology is proposed and one of the key ideas is the overloaded use of rank functions for timing verification.
Abstract: A formal verification methodology for checking both the functional and timing requirements of real-time digital controllers targeted at field-programmable gate array technology is proposed. Timed transition systems (TTSs) are used to model both the digital controller circuit and the high-level specification requirements. Timed well-founded simulation (TWFS) refinement is used as the notion of correctness and defines what it means for an implementation TTS to satisfy a specification TTS. The primary contribution is a set of proof obligation templates (based on TWFS refinement) that account for both functional and timing requirements. The proof obligations generated using the templates can be checked using a decision procedure. One of the key ideas is the overloaded use of rank functions (typically used for liveness verification) for timing verification. The efficiency and scalability of the approach are demonstrated using three case studies.

Journal ArticleDOI
TL;DR: This study proposes an efficient very large scale integration (VLSI) architecture for a quadruple-throughput fixed-point multiply-accumulate circuit (MAC) and shows that the proposed architecture achieves a greater throughput improvement than existing designs.
Abstract: This study proposes an efficient very large scale integration (VLSI) architecture for a quadruple-throughput fixed-point multiply-accumulate circuit (MAC). The proposed n × n bit MAC can perform one n × n bit, two n × (n/2) bit or four (n/2) × (n/2) bit MAC operations in parallel. The objective of the proposed MAC is to improve the throughput of existing MAC designs. The proposed and existing designs are implemented using a 45 nm TSMC CMOS library, and the results show that the proposed architecture achieves a greater throughput improvement than existing designs. For example, the proposed 32 × 32 bit MAC architecture achieves a 60.4% improvement in throughput over the existing array multiplier-based double-throughput MAC.

Journal ArticleDOI
TL;DR: This study motivates the use of STT-MRAM for the first-level caches of a multicore processor to reduce energy consumption without significantly degrading performance; the proposed STT hierarchy shows good scalability over CMOS, with a few benchmarks scaling significantly better.
Abstract: Spintronic memory [spin-transfer torque magnetic random access memory (STT-MRAM)] is an attractive alternative technology to CMOS since it offers higher density and virtually no leakage current. Spintronic memory continues to require higher write energy, however, presenting a challenge to memory hierarchy design when energy consumption is a concern. This study motivates the use of STT-MRAM for the first-level caches of a multicore processor to reduce energy consumption without significantly degrading performance. The large STT-MRAM first-level cache implementation saves leakage power, while the use of a small level-0 cache recovers the performance lost to the long STT-MRAM write latencies. The combination of both reduces the energy-delay product by 65% on average compared with the CMOS baseline. The proposed STT hierarchy also shows good scalability over the CMOS design, with a few benchmarks scaling significantly better. The PARSEC and Splash2 benchmark suites are analysed running on a modern multicore platform, comparing the performance, energy consumption and scalability of the spintronic cache system with a CMOS design.

Journal ArticleDOI
TL;DR: This study aims to optimise the VLSI implementation of the WiMedia MAC system by proposing a hybrid shared-memory and message-passing multiprocessor system-on-chip (MPSoC) architecture that achieves a 24% performance improvement and 22% power savings over the conventional shared-memory-only architecture.
Abstract: Ultra-wideband (UWB) is a well-known radio technology whose media access control (MAC) protocol, WiMedia MAC, has considerable potential to ensure high-speed and high-quality data communication for wireless personal area networks. However, these benefits involve a heavy computational workload, thereby posing a challenge to the conventional very-large-scale integration (VLSI) approach in terms of providing the required performance and power efficiency. Therefore, this study aims to optimise the VLSI implementation of the WiMedia MAC system by proposing a hybrid shared-memory and message-passing multiprocessor system-on-chip (MPSoC) architecture. The proposed solution combines state-of-the-art MPSoC technology and application-specific instruction-set processor techniques to (i) accelerate the MAC protocol at the task level by using parallel processing, (ii) enable the use of custom instructions to optimise inter-processor communication through an explicit message-passing mechanism, and (iii) ease the implementation process by using a high-level software/hardware co-design methodology. The proposed platform is implemented both in system-level SystemC for architecture exploration and in standard-cell technology for future chip implementation. Experimental results show that the proposed hybrid MPSoC architecture achieves a 24% performance improvement and 22% power savings over the conventional shared-memory-only architecture.

Journal ArticleDOI
TL;DR: The authors propose using two well-known micro-architectural approaches, last-write-update (LWU) and value locality (VL), to minimise the number of write operations into the register file.
Abstract: In this paper, the authors propose two approaches that employ spin-transfer torque random access memory (STTRAM) in the design of the register file, an important part of embedded processors. However, STTRAM suffers from limited endurance and high latency in the write operation. Consequently, employing STTRAM in the register file entails two challenges: (i) the lifetime significantly decreases as data are frequently written into the register file; and (ii) the delay of the critical path increases as a result of the slow write operation. The authors propose using two well-known micro-architectural approaches, last-write-update (LWU) and value locality (VL), to minimise the number of write operations into the register file. The main observation behind LWU is that only the last write operations of certain registers in the re-order buffer need to be considered. VL exploits the fact that a limited number of values are more likely to be written into the register file. Their simulations show that, by employing the LWU and VL approaches, the lifetime of the STTRAM-based register file, compared with that of a traditional static RAM-based register file architecture, extends to about 40 years on average and around 3.5 years in the worst case, while improving power consumption by 30%.

Journal ArticleDOI
TL;DR: A new efficient algorithm for translating linear temporal logic (LTL) formulas to Büchi automata, which are used by LTL model checkers, using the principle of alternating automata and keeping only the positive transitions, without generating the intermediate generalised automata.
Abstract: In this study, the author presents a new efficient algorithm for translating linear temporal logic (LTL) formulas to Büchi automata, which are used by LTL model checkers. The general idea of this algorithm is to generate Büchi automata from LTL formulas using the principle of alternating automata, keeping only the positive transitions and without generating the intermediate generalised automata. The LTL translation is the heart of any LTL model checker and affects its performance. The translation performance is measured not only by its speed and the size of the produced Büchi automaton (number of states and number of transitions), but also by the correctness of the produced automaton and its level of determinism. The author shows that this method is different from the others and is very competitive with the most efficient translators to date.

Journal ArticleDOI
TL;DR: This study presents a fast and accurate delay model by extracting the key parameters affecting FPGA delay and by combining the classical Elmore equivalent model and the powerful learning capability of neural network.
Abstract: Field programmable gate arrays (FPGAs) are adopted in many electronic systems due to their design flexibility and high performance. To provide the right FPGAs for different applications, FPGA architectural exploration is needed. Accurate estimation of the area and delay of low-level FPGA circuits is required to evaluate different architecture candidates during the exploration. In this study, the authors present a fast and accurate delay model by extracting the key parameters affecting FPGA delay and by combining the classical Elmore equivalent model with the powerful learning capability of a neural network. The derived model integrates seamlessly with the existing FPGA architecture exploration flow. Experimental results show that, compared with the circuit simulator HSPICE, this model speeds up delay estimation by 2863 times with an average error of 1.9% during the architectural exploration process. This fast and accurate estimation allows FPGA architects to explore more architectural options in limited time, resulting in optimised FPGA architectures.
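The classical Elmore model combined here with the neural network estimates the delay to a node in an RC tree as the sum, over the resistors on the path from the source, of each resistance times the capacitance downstream of it. A minimal sketch for an RC ladder with illustrative values (not FPGA routing parasitics):

```python
# Elmore delay of a simple RC ladder: the delay to node k is the sum, over every
# resistor on the path from the source, of that resistance times the total
# capacitance downstream of it. Values are illustrative, not FPGA routing parameters.
def elmore_ladder(R, C):
    """R[i], C[i]: resistance into node i and capacitance at node i (chain topology)."""
    delays = []
    for k in range(len(R)):
        d = sum(R[i] * sum(C[i:]) for i in range(k + 1))
        delays.append(d)
    return delays

R = [100, 100, 100]          # ohms
C = [1e-15, 1e-15, 2e-15]    # farads
print([f"{d*1e15:.0f} fs" for d in elmore_ladder(R, C)])   # delays to nodes 0, 1, 2
```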

Journal ArticleDOI
Irith Pomeranz
TL;DR: This study describes a reconstruction procedure based on repeating short subsequences of primary input patterns from the sequence; it provides a new low-complexity option for increasing the fault coverage, and thus addresses the high computational complexity of sequential test generation.
Abstract: Simulation-based sequential test generation procedures address the high computational complexity of sequential test generation by replacing the deterministic branch-and-bound process with lower-complexity processes. These processes introduce new primary input patterns into a functional test sequence in order to increase its fault coverage. This study observes that, even without introducing new primary input patterns, it is possible to increase the fault coverage of a functional test sequence by applying the same primary input patterns in different orders. This is referred to as reconstruction of the sequence. It provides a new low-complexity option for increasing the fault coverage, and thus addressing the high computational complexity of sequential test generation. This study describes a reconstruction procedure that is based on repeating short subsequences of primary input patterns from the sequence. Experimental results demonstrate the effectiveness of the reconstruction procedure in increasing the fault coverage as part of a simulation-based sequential test generation procedure.

Journal ArticleDOI
Irith Pomeranz
TL;DR: In this article, the authors develop a quantitative metric for assessing the ability of functional capture cycles to take the circuit into its functional state space and ensure functional operation conditions, based on the distances between the states that the circuit traverses during functional capture cycles and the reachable states that the circuit can enter during functional operation.
Abstract: Several test generation procedures are based on the expectation that after clocking a circuit in functional mode for several clock cycles, the circuit enters its functional state space and operates under functional operation conditions. Functional operation conditions are important for avoiding overtesting of delay faults. This study develops a quantitative metric for assessing the ability of functional capture cycles to take the circuit into its functional state space, and ensure functional operation conditions. The metric is based on the distances between the states that the circuit traverses during functional capture cycles and reachable states that the circuit can enter during functional operation. The paper also describes a procedure for modifying a test set so as to reduce the values of the metric for its tests.
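A minimal sketch of one plausible realisation of such a metric, under the assumption (not confirmed by the abstract) that the distance used is the bit-wise Hamming distance from each traversed state to the nearest known reachable state:

```python
# One plausible realisation of the metric (an assumption, not the paper's exact
# definition): for each state traversed during functional capture cycles, take the
# Hamming distance to the nearest known reachable state; the score of a test is,
# for example, the maximum such distance, and lower means closer to functional
# operation conditions.
def hamming(a, b):
    return bin(a ^ b).count("1")

def capture_metric(traversed_states, reachable_states):
    return max(min(hamming(s, r) for r in reachable_states) for s in traversed_states)

reachable = {0b0000, 0b0011, 0b1100}          # toy sample of reachable states
traversed = [0b0011, 0b0111, 0b1110]          # states seen during capture cycles
print(capture_metric(traversed, reachable))   # 1: every traversed state is within 1 bit
```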