Author

Xi Li

Bio: Xi Li is an academic researcher at the University of Science and Technology of China. The author has contributed to research topics including Speedup and Scheduling (computing), has an h-index of 18, and has co-authored 166 publications receiving 1359 citations.


Papers
Journal ArticleDOI
TL;DR: This paper designs the deep learning accelerator unit (DLAU), a scalable accelerator architecture for large-scale deep learning networks that uses a field-programmable gate array (FPGA) as the hardware prototype and employs three pipelined processing units to improve throughput.
Abstract: As an emerging field of machine learning, deep learning shows excellent ability in solving complex learning problems. However, the networks are becoming increasingly large scale due to the demands of practical applications, which poses a significant challenge to constructing high-performance implementations of deep learning neural networks. To improve performance while maintaining low power cost, in this paper we design the deep learning accelerator unit (DLAU), a scalable accelerator architecture for large-scale deep learning networks that uses a field-programmable gate array (FPGA) as the hardware prototype. The DLAU accelerator employs three pipelined processing units to improve throughput and utilizes tile techniques to exploit locality in deep learning applications. Experimental results on a state-of-the-art Xilinx FPGA board demonstrate that the DLAU accelerator achieves up to 36.1× speedup compared with Intel Core 2 processors, with a power consumption of 234 mW.
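The tile technique described above amounts to splitting a large layer computation into chunks that fit in on-chip buffers. Below is a minimal Python sketch of that idea for a fully connected layer; it is not the authors' implementation, and the tile size and matrix shapes are assumptions for illustration.

```python
# Hypothetical sketch of the tiling idea behind DLAU: a large fully connected
# layer (y = W x) is processed in TILE_SIZE chunks so that only one tile of
# weights and inputs needs to reside in on-chip buffers at a time.
import numpy as np

TILE_SIZE = 32  # assumed on-chip buffer capacity, in elements

def tiled_matvec(W, x, tile=TILE_SIZE):
    """Compute y = W @ x one input tile at a time, accumulating partial sums."""
    rows, cols = W.shape
    y = np.zeros(rows)
    for start in range(0, cols, tile):
        end = min(start + tile, cols)
        # In hardware this slice would be streamed into on-chip buffers;
        # the accumulation stands in for the pipelined processing units.
        y += W[:, start:end] @ x[start:end]
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 100))
    x = rng.standard_normal(100)
    assert np.allclose(tiled_matvec(W, x), W @ x)
    print("tiled result matches the untiled reference")
```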

268 citations

Journal ArticleDOI
TL;DR: A new architecture for an FPGA-based CNN accelerator is proposed that maps each layer to its own on-chip unit, with all units working concurrently as a pipeline, which can achieve maximum resource utilization as well as optimal computational efficiency.
Abstract: Recently, field-programmable gate arrays (FPGAs) have been widely used in implementations of hardware accelerators for convolutional neural networks (CNNs). However, most existing accelerators are designed with the same idea as their ASIC counterparts, in which all operations from different layers are mapped to the same hardware units and work in a multiplexed way. This approach does not take full advantage of the reconfigurability and customizability of FPGAs, resulting in a certain degree of computational-efficiency degradation. In this paper, we propose a new architecture for FPGA-based CNN accelerators that maps each layer to its own on-chip unit, with all units working concurrently as a pipeline. A comprehensive mapping and optimization methodology based on a roofline-model-oriented optimization model is proposed, which can achieve maximum resource utilization as well as optimal computational efficiency. In addition, to ease the programming burden, we propose a design framework that provides a one-stop function for developers to generate the accelerator with our optimization methodology. We evaluate our proposal by implementing different modern CNN models on Xilinx Zynq-7020 and Virtex-7 690t FPGA platforms. Experimental results show that our implementations achieve a peak performance of 910.2 GOPS on the Virtex-7 690t and 36.36 GOP/s/W energy efficiency on the Zynq-7020, which are superior to previous approaches.
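The roofline-model-oriented optimization mentioned above bounds each layer's attainable throughput by either peak compute or memory bandwidth times operational intensity. The sketch below illustrates only that bound; the device parameters and layer figures are hypothetical, not values from the paper.

```python
# Illustrative roofline calculation (not the paper's exact optimization model):
# a layer's attainable performance is limited either by peak compute or by
# memory bandwidth multiplied by the layer's operational intensity.
def roofline_attainable_gops(peak_gops, bandwidth_gbps, ops, bytes_moved):
    """Return the roofline bound in GOP/s for a layer.

    ops         -- number of operations the layer performs
    bytes_moved -- bytes transferred to/from off-chip memory
    """
    operational_intensity = ops / bytes_moved          # OP per byte
    return min(peak_gops, bandwidth_gbps * operational_intensity)

if __name__ == "__main__":
    # Hypothetical device parameters, loosely in the range of a mid-size FPGA.
    PEAK_GOPS = 200.0        # peak throughput of the mapped compute units
    BANDWIDTH_GBPS = 4.0     # off-chip memory bandwidth in GB/s

    # First layer (high intensity) is compute-bound; second is memory-bound.
    for ops, bytes_moved in [(6.0e9, 60e6), (0.1e9, 80e6)]:
        bound = roofline_attainable_gops(PEAK_GOPS, BANDWIDTH_GBPS, ops, bytes_moved)
        print(f"intensity={ops / bytes_moved:.2f} OP/B -> bound {bound:.1f} GOP/s")
```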

80 citations

Journal ArticleDOI
01 Jan 2016
TL;DR: Experimental results on the prototype system demonstrate that NeverStop can efficiently reduce the average waiting time for vehicles, and a genetic algorithm illustrating how the average waiting time is derived is presented.
Abstract: Academia and industry have entered the big data era in many fields related to computer software and embedded systems, and the intelligent transportation system problem is one of the important areas among real big data application scenarios. However, managing traffic lights efficiently poses a significant challenge due to the scale of the accumulated, dynamic car-flow data. In this paper, we present NeverStop, which applies genetic algorithms and fuzzy control methods to big data intelligent transportation systems. NeverStop uses sensors to control the traffic lights at an intersection automatically; it combines a fuzzy control method with a genetic algorithm to adjust the waiting time at the traffic lights, so the average waiting time can be significantly reduced, and the genetic algorithm illustrates how the average waiting time is derived. A prototype system has been implemented on an EBox-II terminal device, running the fuzzy control and genetic algorithms. Experimental results on the prototype system demonstrate that NeverStop can efficiently reduce the average waiting time for vehicles.
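As a rough illustration of how a genetic algorithm can tune green-time splits to reduce average waiting time, here is a toy sketch. It is not the NeverStop algorithm: the cycle length, arrival rates, and waiting-time model are invented for the example.

```python
# Toy genetic algorithm in the spirit of NeverStop (not the paper's code):
# evolve the green-time allocation for two phases of an intersection so that a
# simple average-waiting-time model is minimized. All parameters are made up.
import random

random.seed(42)

CYCLE = 60.0                      # assumed total cycle length in seconds
ARRIVALS = (0.4, 0.2)             # assumed vehicles/s on the two approaches

def avg_wait(green_a):
    """Crude model: vehicles arriving during a red phase wait, on average,
    half of that red phase."""
    green_b = CYCLE - green_a
    wait_a = ARRIVALS[0] * green_b * (green_b / 2)   # approach A waits during B's green
    wait_b = ARRIVALS[1] * green_a * (green_a / 2)
    vehicles = sum(ARRIVALS) * CYCLE
    return (wait_a + wait_b) / vehicles

def evolve(pop_size=20, generations=40, mutation=2.0):
    population = [random.uniform(5, CYCLE - 5) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=avg_wait)                 # fitness = lower average wait
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = (a + b) / 2 + random.gauss(0, mutation)   # crossover + mutation
            children.append(min(max(child, 5), CYCLE - 5))
        population = survivors + children
    return min(population, key=avg_wait)

if __name__ == "__main__":
    best = evolve()
    print(f"best green split: A={best:.1f}s, B={CYCLE - best:.1f}s, "
          f"average wait {avg_wait(best):.1f}s")
```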

62 citations

Proceedings ArticleDOI
04 May 2015
TL;DR: This work uses an FPGA to design a deep learning accelerator that focuses on the implementation of the prediction process, data-access optimization, and a pipeline structure, and the accelerator achieves promising results.
Abstract: Recently, machine learning has been widely used in applications and cloud services. As an emerging field of machine learning, deep learning shows excellent ability in solving complex learning problems. To give users a better experience, high-performance implementations of deep learning applications are very important. As a common means of accelerating algorithms, FPGAs offer high performance, low power consumption, small size, and other desirable characteristics. We therefore use an FPGA to design a deep learning accelerator that focuses on the implementation of the prediction process, data-access optimization, and a pipeline structure. Compared with a 2.3 GHz Intel Core 2 CPU, our accelerator achieves promising results.
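The prediction process that such an accelerator implements is the feed-forward pass of a trained network. The sketch below is a plain software reference of that pass, assuming an illustrative 64-32-10 sigmoid network; it shows only the computation that would be offloaded, not the accelerator design itself.

```python
# Minimal reference of the prediction (feed-forward) pass that the accelerator
# offloads; the layer shapes and activation function are illustrative only.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, layers):
    """Run inference layer by layer.

    layers -- list of (weights, bias) tuples; on the FPGA each iteration of
              this loop corresponds to work fed through the pipelined units.
    """
    activation = x
    for weights, bias in layers:
        activation = sigmoid(weights @ activation + bias)
    return activation

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical 64-32-10 network with random weights.
    layers = [
        (rng.standard_normal((32, 64)) * 0.1, np.zeros(32)),
        (rng.standard_normal((10, 32)) * 0.1, np.zeros(10)),
    ]
    scores = predict(rng.standard_normal(64), layers)
    print("predicted class:", int(np.argmax(scores)))
```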

57 citations

Proceedings ArticleDOI
11 Nov 2010
TL;DR: A dynamic, role-dependent reputation evaluation mechanism is presented to determine whether an incoming traffic message is significant and trustworthy to the driver, and the proposed system can effectively prevent false messages from spreading in VANET environments.
Abstract: Vehicular Ad Hoc Networks (VANETs) have received more and more attention from researchers for their promising role in road safety, exchanging real-time warning messages through vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) communication. However, an inaccurate traffic warning message can affect drivers' decisions, misguide their behavior, and even cause serious car accidents. In this paper, we propose an event-based reputation model to filter bogus warning messages. Our solution classifies all vehicles that encounter the same traffic event into different roles. A dynamic, role-dependent reputation evaluation mechanism is presented to determine whether an incoming traffic message is significant and trustworthy to the driver. Simulation results show that our proposed system can effectively prevent false messages from spreading in VANET environments.
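One way to picture a role-dependent reputation evaluation is as a weighted vote in which each report counts according to the sender's role and past reputation. The sketch below is a hypothetical illustration of that idea; the roles, weights, and threshold are assumptions, not the paper's mechanism.

```python
# Hypothetical sketch of role-dependent reputation aggregation for a traffic
# warning message; roles, weights, and the threshold are illustrative.
from dataclasses import dataclass

# Assumed role weights: vehicles closer to the event carry more weight.
ROLE_WEIGHT = {"witness": 1.0, "nearby": 0.6, "relay": 0.3}
TRUST_THRESHOLD = 0.5

@dataclass
class Report:
    vehicle_id: str
    role: str          # "witness", "nearby", or "relay"
    reputation: float  # 0.0 .. 1.0, maintained from past behaviour
    confirms: bool     # True if the vehicle confirms the event

def message_trust(reports):
    """Weighted vote: each report counts with role weight x sender reputation."""
    score, total = 0.0, 0.0
    for r in reports:
        w = ROLE_WEIGHT[r.role] * r.reputation
        score += w if r.confirms else -w
        total += w
    return score / total if total else 0.0

if __name__ == "__main__":
    reports = [
        Report("car-17", "witness", 0.9, True),
        Report("car-04", "nearby", 0.7, True),
        Report("car-88", "relay", 0.4, False),   # possibly a bogus denial
    ]
    trust = message_trust(reports)
    print(f"trust={trust:.2f} ->",
          "accept warning" if trust >= TRUST_THRESHOLD else "discard")
```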

53 citations


Cited by
Journal ArticleDOI
TL;DR: A comprehensive review of historical and recent state-of-the-art approaches in visual, audio, and text processing; social network analysis; and natural language processing is presented, followed by an in-depth analysis of pivotal and groundbreaking advances in deep learning applications.
Abstract: The field of machine learning is witnessing its golden era as deep learning slowly becomes the leader in this domain. Deep learning uses multiple layers to represent the abstractions of data to build computational models. Some key enabling deep learning algorithms, such as generative adversarial networks, convolutional neural networks, and model transfer, have completely changed our perception of information processing. However, there exists a gap in understanding behind this tremendously fast-paced domain, because it has never previously been represented from a multiscope perspective. This lack of core understanding renders these powerful methods black-box machines and inhibits development at a fundamental level. Moreover, deep learning has repeatedly been perceived as a silver bullet for all stumbling blocks in machine learning, which is far from the truth. This article presents a comprehensive review of historical and recent state-of-the-art approaches in visual, audio, and text processing; social network analysis; and natural language processing, followed by an in-depth analysis of pivotal and groundbreaking advances in deep learning applications. The review also examines the issues faced in deep learning, such as unsupervised learning, black-box models, and online learning, and illustrates how these challenges can be transformed into prolific future research avenues.

824 citations

Journal ArticleDOI
TL;DR: Emerging research on deep learning models for big data feature learning is reviewed, the remaining challenges of big data deep learning are pointed out, and future topics are discussed.
Abstract: Deep learning, currently one of the most remarkable machine learning techniques, has achieved great success in many applications such as image analysis, speech recognition, and text understanding. It uses supervised and unsupervised strategies to learn multi-level representations and features in hierarchical architectures for classification and pattern-recognition tasks. Recent developments in sensor networks and communication technologies have enabled the collection of big data. Although big data provides great opportunities for a broad range of areas including e-commerce, industrial control, and smart medical care, it poses many challenging issues for data mining and information processing due to its characteristics of large volume, variety, velocity, and veracity. In the past few years, deep learning has played an important role in big data analytics solutions. In this paper, we review emerging research on deep learning models for big data feature learning. Furthermore, we point out the remaining challenges of big data deep learning and discuss future topics.

785 citations

Journal ArticleDOI
24 Jun 2019
TL;DR: In this paper, the authors review state-of-the-art approaches in these areas as well as explore potential solutions to address these challenges, including providing enough computing power, redundancy, and security so as to guarantee the safety of autonomous vehicles.
Abstract: Safety is the most important requirement for autonomous vehicles; hence, the ultimate challenge of designing an edge computing ecosystem for autonomous vehicles is to deliver enough computing power, redundancy, and security so as to guarantee the safety of autonomous vehicles. Specifically, autonomous driving systems are extremely complex; they tightly integrate many technologies, including sensing, localization, perception, decision making, as well as the smooth interactions with cloud platforms for high-definition (HD) map generation and data storage. These complexities impose numerous challenges for the design of autonomous driving edge computing systems. First, edge computing systems for autonomous driving need to process an enormous amount of data in real time, and often the incoming data from different sensors are highly heterogeneous. Since autonomous driving edge computing systems are mobile, they often have very strict energy consumption restrictions. Thus, it is imperative to deliver sufficient computing power with reasonable energy consumption, to guarantee the safety of autonomous vehicles, even at high speed. Second, in addition to the edge system design, vehicle-to-everything (V2X) provides redundancy for autonomous driving workloads and alleviates stringent performance and energy constraints on the edge side. With V2X, more research is required to define how vehicles cooperate with each other and the infrastructure. Last, safety cannot be guaranteed when security is compromised. Thus, protecting autonomous driving edge computing systems against attacks at different layers of the sensing and computing stack is of paramount concern. In this paper, we review state-of-the-art approaches in these areas as well as explore potential solutions to address these challenges.

369 citations

Journal ArticleDOI
TL;DR: A novel classification framework is proposed that provides a full picture of the current literature on where and how BDA has been applied within the SCM context and reveals a number of research gaps, which lead to future research directions.
Abstract: The rapidly growing interest from both academics and practitioners in the application of Big Data Analytics (BDA) to Supply Chain Management (SCM) has created a need to review up-to-date research developments in order to develop a new agenda. This review responds to that call by proposing a novel classification framework that provides a full picture of the current literature on where and how BDA has been applied within the SCM context. The classification framework is structured using the content analysis method of Mayring (2008) and addresses four research questions: (1) in what areas of SCM BDA is being applied, (2) at what level of analytics BDA is used in these application areas, (3) what types of BDA models are used, and (4) what BDA techniques are employed to develop these models. The discussion of these four questions reveals a number of research gaps, which lead to future research directions.

329 citations

Journal ArticleDOI
TL;DR: The techniques investigated in this paper represent the recent trends in FPGA-based accelerators for deep learning networks and are expected to direct future advances in efficient hardware accelerators and to be useful for deep learning researchers.
Abstract: Due to recent advances in digital technologies and the availability of credible data, an area of artificial intelligence, deep learning, has emerged and has demonstrated its ability and effectiveness in solving complex learning problems not possible before. In particular, convolutional neural networks (CNNs) have demonstrated their effectiveness in image detection and recognition applications. However, they require intensive CPU operations and memory bandwidth that make general-purpose CPUs fail to achieve the desired performance levels. Consequently, hardware accelerators that use application-specific integrated circuits, field-programmable gate arrays (FPGAs), and graphics processing units have been employed to improve the throughput of CNNs. More precisely, FPGAs have recently been adopted for accelerating the implementation of deep learning networks due to their ability to maximize parallelism and their energy efficiency. In this paper, we review recent techniques for accelerating deep learning networks on FPGAs. We highlight the key features employed by the various techniques to improve acceleration performance. In addition, we provide recommendations for enhancing the utilization of FPGAs for CNN acceleration. The techniques investigated in this paper represent the recent trends in FPGA-based accelerators for deep learning networks; thus, this paper is expected to direct future advances in efficient hardware accelerators and to be useful for deep learning researchers.

308 citations