Other affiliations: Imperial College London
Bio: Qiang Liu is an academic researcher from Tianjin University. The author has contributed to research in topics: Field-programmable gate array & Hardware Trojan. The author has an hindex of 17, co-authored 59 publications receiving 696 citations. Previous affiliations of Qiang Liu include Imperial College London.
Papers published on a yearly basis
TL;DR: An IC market model is elaborate to illustrate the potential HT threats faced by the parties involved in the model and categorize the recent research advances in the countermeasures against HT attacks.
Abstract: Hardware Trojans (HTs) can be implanted in security-weak parts of a chip with various means to steal the internal sensitive data or modify original functionality, which may lead to huge economic losses and great harm to society. Therefore, it is very important to analyze the specific HT threats existing in the whole life cycle of integrated circuits (ICs), and perform protection against hardware Trojans. In this paper, we elaborate an IC market model to illustrate the potential HT threats faced by the parties involved in the model. Then we categorize the recent research advances in the countermeasures against HT attacks. Finally, the challenges and prospects for HT defense are illuminated.
••03 Apr 2017
TL;DR: A novel method to optimise CNN-based object detection algorithms targeting embedded FPGA platforms by taking network architectures and resource constraints as input, and tunes hardware parameters with algorithm-specific information to explore the design space and achieve high performance.
Abstract: Algorithms based on Convolutional Neural Network (CNN) have recently been applied to object detection applications, greatly improving their performance. However, many devices intended for these algorithms have limited computation resources and strict power consumption constraints, and are not suitable for algorithms designed for GPU workstations. This paper presents a novel method to optimise CNN-based object detection algorithms targeting embedded FPGA platforms. Given parameterised CNN hardware modules, an optimisation flow takes network architectures and resource constraints as input, and tunes hardware parameters with algorithm-specific information to explore the design space and achieve high performance. The evaluation shows that our design model accuracy is above 85% and, with optimised configuration, our design can achieve 49.6 times speed-up compared with software implementation.
23 Sep 2008
TL;DR: A nonlinear optimization framework is proposed to automate exploration of the design space consisting of data-reuse (buffering) decisions and loop-level parallelization, in the context of field-programmable-gate-array-targeted hardware compilation, and shows that efficient solution techniques exist for this problem.
Abstract: A nonlinear optimization framework is proposed in this paper to automate exploration of the design space consisting of data-reuse (buffering) decisions and loop-level parallelization, in the context of field-programmable-gate-array-targeted hardware compilation. Buffering frequently accessed data in on-chip memories can reduce off-chip memory accesses and open avenues for parallelization. However, the exploitation of both data reuse and parallelization is limited by the memory resources available on-chip. As a result, considering these two problems separately, e.g., first exploring data reuse and then exploring data-level parallelization, based on the data-reuse options determined in the first step, may not yield the performance-optimal designs for limited on-chip memory resources. We consider both problems at the same time, exposing the dependence between the two. We show that this combined problem can be formulated as a nonlinear program and further show that efficient solution techniques exist for this problem, based on recent advances in optimization of so-called geometric programming problems. The results from applying this framework to several real benchmarks implemented on a Xilinx device demonstrate that given different constraints on on-chip memory utilization, the corresponding performance-optimal designs are automatically determined by the framework. We have also implemented designs determined by a two-stage optimization method that first explores data reuse and then explores parallelization on the same platform, and by comparison, the performance-optimal designs proposed by our framework are faster than the designs determined by the two-stage method by up to 5.7 times.
TL;DR: The proposed AES design can overcome most existing designs and achieves a throughput of 75.9 Gbps on a latest FPGA device and two parallel implementations of the proposed design can meet the real-time encryption/decryption demand for 100 Gbps data rate.
Abstract: Aiming at protection of high speed data, field programmable gate array (FPGA)-based advanced encryption standard (AES) design is proposed here. Deep investigation into the logical operations of AES with regard to FPGA architectures leads to two efficient pipelining structures for the AES hardware implementation. The two design options allow users to make a trade-off among speed, resource usage and power consumption. In addition, a new key expansion scheme is proposed to address the potential issues of existing key expansion scheme used in AES. The proposed key expansion scheme with additional non-linear operations increases the complexity of cracking keys by up to 2(N − 1) times for N-round AES. The proposed design is evaluated on various FPGA devices and is compared with several existing AES implementations. In terms of both throughput and throughput per slice, the proposed design can overcome most existing designs and achieves a throughput of 75.9 Gbps on a latest FPGA device. Two parallel implementations of the proposed design can meet the real-time encryption/decryption demand for 100 Gbps data rate. Furthermore, the proposed AES design is implemented on the Zynq xc7z020 FPGA platform, demonstrating its application to image encryption.
••01 Sep 2009
TL;DR: Harmonic is a toolchain that targets multiprocessor heterogeneous systems comprising different types of processing elements such as general-purposed processors, digital signal processors, and field-programmable gate arrays from a high-level C program.
Abstract: This paper describes Harmonic, a toolchain that targets multiprocessor heterogeneous systems comprising different types of processing elements such as general-purposed processors (GPPs), digital signal processors (DSP), and field-programmable gate arrays (FPGAs) from a high-level C program. The main goal of Harmonic is to improve an application by partitioning and optimising each part of the program, and selecting the most appropriate processing element in the system to execute each part. The core tools include a task transformation engine, a mapping selector, a data representation optimiser, and a hardware synthesiser. We also use the C language with source-annotations as intermediate representation for the toolchain, making it easier for users to understand and to control the compilation process.
01 Jan 2011
TL;DR: The method is suited to online forecasting in many applications and in this paper it is used to predict hourly values of solar power for horizons of up to 36 h, where the results indicate that for forecasts up to 2 h ahead the most important input is the available observations ofSolar power, while for longer horizons NWPs are theMost important input.
Abstract: This paper describes a new approach to online forecasting of power production from PV systems. The method is suited to online forecasting in many applications and in this paper it is used to predict hourly values of solar power for horizons of up to 36 h. The data used is 15-min observations of solar power from 21 PV systems located on rooftops in a small village in Denmark. The suggested method is a two-stage method where first a statistical normalization of the solar power is obtained using a clear sky model. The clear sky model is found using statistical smoothing techniques. Then forecasts of the normalized solar power are calculated using adaptive linear time series models. Both autoregressive (AR) and AR with exogenous input (ARX) models are evaluated, where the latter takes numerical weather predictions (NWPs) as input. The results indicate that for forecasts up to 2 h ahead the most important input is the available observations of solar power, while for longer horizons NWPs are the most important input. A root mean square error improvement of around 35% is achieved by the ARX model compared to a proposed reference model.
TL;DR: This article surveys Heterogeneous Computing Techniques (HCTs) such as workload partitioning that enable utilizing both CPUs and GPUs to improve performance and/or energy efficiency and reviews both discrete and fused CPU-GPU systems.
Abstract: As both CPUs and GPUs become employed in a wide range of applications, it has been acknowledged that both of these Processing Units (PUs) have their unique features and strengths and hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this article, we survey Heterogeneous Computing Techniques (HCTs) such as workload partitioning that enable utilizing both CPUs and GPUs to improve performance and/or energy efficiency. We review heterogeneous computing approaches at runtime, algorithm, programming, compiler, and application levels. Further, we review both discrete and fused CPU-GPU systems and discuss benchmark suites designed for evaluating Heterogeneous Computing Systems (HCSs). We believe that this article will provide insights into the workings and scope of applications of HCTs to researchers and motivate them to further harness the computational powers of CPUs and GPUs to achieve the goal of exascale performance.
TL;DR: A convolutional neural network architecture in which the neural network is divided into hardware and software parts to increase performance and reduce the cost of implementation resources is proposed.
Abstract: Convolutional neural networks are a promising tool for solving the problem of pattern recognition. Most well-known convolutional neural networks implementations require a significant amount of memory to store weights in the process of learning and working. We propose a convolutional neural network architecture in which the neural network is divided into hardware and software parts to increase performance and reduce the cost of implementation resources. We also propose to use the residue number system (RNS) in the hardware part to implement the convolutional layer of the neural network. Software simulations using Matlab 2018b showed that convolutional neural network with a minimum number of layers can be quickly and successfully trained. The hardware implementation of the convolution layer shows that the use of RNS allows to reduce the hardware costs on 7.86%–37.78% compared to the two’s complement implementation. The use of the proposed heterogeneous implementation reduces the average time of image recognition by 41.17%.
01 Jan 2010
TL;DR: This paper proposes BlueChip, a defensive strategy that has both a design-time component and a runtime component that is able to prevent all hardware attacks the authors evaluate while incurring a small runtime overhead.
Abstract: The computer systems security arms race between attackers and defenders are largely taken place in the domain of software systems, but as hardware complexity and design processes have envolved, novel and potent hardware-based security threats are now possible. This article presents Unused Circuit Identification (UCI), an approach for detecting suspicious circuits during design time, and BlueChip, a hybrid hardware/software approach to detaching suspicious circuits and making up for UCI classifier errors during runtime.