
Showing papers in "IEEE Circuits and Systems Magazine" in 2021


Journal ArticleDOI
TL;DR: In this paper, the authors survey recent progress in SRAM- and RRAM-based CIM macros that have been demonstrated in silicon and discuss general design challenges of CIM chips, including the analog-to-digital conversion bottleneck, variations in analog compute, and device non-idealities.
Abstract: Compute-in-memory (CIM) is a new computing paradigm that addresses the memory-wall problem in hardware accelerator design for deep learning. The input vector and weight matrix multiplication, i.e., the multiply-and-accumulate (MAC) operation, can be performed in the analog domain within the memory sub-array, leading to significant improvements in throughput and energy efficiency. Static random access memory (SRAM) and emerging non-volatile memories such as resistive random access memory (RRAM) are promising candidates for storing the weights of deep neural network (DNN) models. In this review, we first survey recent progress in SRAM- and RRAM-based CIM macros that have been demonstrated in silicon. We then discuss general design challenges of CIM chips, including the analog-to-digital conversion (ADC) bottleneck, variations in analog compute, and device non-idealities. Next, we introduce the DNN+NeuroSim benchmark framework, which is capable of evaluating versatile device technologies for CIM inference and training performance from a software/hardware co-design perspective.
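
To make the in-memory MAC operation described above concrete, the following is a minimal behavioral sketch in Python; it is an illustration, not code from the paper, and the ADC resolution, weight normalization, and variation model are assumptions chosen only to show how the analog dot product and the ADC bottleneck interact.

import numpy as np

def cim_mac_column(x, w, adc_bits=4, sigma=0.02, rng=np.random.default_rng(0)):
    """Analog dot product of binary inputs x with conductance-encoded weights w,
    followed by a uniform ADC that models the read-out bottleneck."""
    g = w + rng.normal(0.0, sigma, size=w.shape)    # device-to-device variation on weights
    i_bl = np.dot(x, g)                             # bit-line current ~ sum of x_i * g_i
    full_scale = float(x.size)                      # weights assumed normalized to [0, 1]
    levels = 2 ** adc_bits - 1
    code = np.clip(np.round(i_bl / full_scale * levels), 0, levels)
    return code / levels * full_scale               # de-quantized MAC result

x = np.random.randint(0, 2, 64)                     # 64 binary input activations
w = np.random.rand(64)                              # analog weight levels in [0, 1]
print(cim_mac_column(x, w), np.dot(x, w))           # quantized vs. ideal MAC

Sweeping adc_bits in this sketch shows why the ADC is a bottleneck: accuracy degrades quickly at low resolution, while ADC area and energy grow quickly at high resolution.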

94 citations


Journal ArticleDOI
TL;DR: A survey of the state of the art in modeling, analysis, and control of epidemic dynamics can be found in this paper, where the authors discuss different approaches to epidemic modeling, ranging from the first implementations of scalar systems of differential equations, which describe the epidemic spreading at the population level, to the most recent models on dynamic networks, which capture the spatial spread and the time-varying nature of human interactions.
Abstract: During the ongoing COVID-19 pandemic, mathematical models of epidemic spreading have emerged as powerful tools to produce valuable predictions of the evolution of the pandemic, helping public health authorities decide which intervention policies should be implemented. The study of these models, grounded in systems theory and often analyzed using control-theoretic tools, is an extremely important area for many researchers from different fields, including epidemiology, engineering, physics, mathematics, computer science, sociology, economics, and management. In this survey, we review the history and present the state of the art in the modeling, analysis, and control of epidemic dynamics. We discuss different approaches to epidemic modeling, whether deterministic or stochastic, ranging from the first implementations of scalar systems of differential equations, which describe the epidemic spreading at the population level, to the most recent models on dynamic networks, which capture the spatial spread and the time-varying nature of human interactions.
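
As a concrete instance of the scalar differential-equation models mentioned above, here is a minimal Python sketch of the classic SIR model; it is an illustration only, and the rate constants beta and gamma are arbitrary values, not parameters from the survey.

import numpy as np

def simulate_sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01, days=160, dt=0.1):
    """Forward-Euler integration of dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I."""
    s, i, r = s0, i0, 0.0
    trajectory = []
    for _ in range(int(days / dt)):
        ds = -beta * s * i                 # new infections leave S
        di = beta * s * i - gamma * i      # infections enter I, recoveries leave I
        dr = gamma * i                     # recoveries enter R
        s, i, r = s + ds * dt, i + di * dt, r + dr * dt
        trajectory.append((s, i, r))
    return np.array(trajectory)

traj = simulate_sir()
print("peak infected fraction:", traj[:, 1].max())

Network models generalize this by giving each node or group its own compartments and coupling the infection terms through a contact graph, which is what captures the spatial and time-varying interactions mentioned above.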

48 citations


Journal ArticleDOI
TL;DR: An overview of previous work on FPGA-based robotic accelerators covering different stages of the robotic system pipeline is presented in this article, along with some commercial and space applications, to serve as a guide for future work.
Abstract: Recent research on robotics has shown significant improvement, spanning algorithms, mechanics, and hardware architectures. Robots, including manipulators, legged robots, drones, and autonomous vehicles, are now widely applied in diverse scenarios. However, the high computation and data complexity of robotic algorithms pose great challenges to their application. On the one hand, CPU platforms are flexible enough to handle multiple robotic tasks, and GPU platforms offer higher computational capacity and easy-to-use development frameworks, so they have been widely adopted in several applications. On the other hand, FPGA-based robotic accelerators are becoming increasingly competitive alternatives, especially in latency-critical and power-limited scenarios. With specially designed hardware logic and algorithm kernels, FPGA-based accelerators can surpass CPUs and GPUs in performance and energy efficiency. In this paper, we give an overview of previous work on FPGA-based robotic accelerators covering different stages of the robotic system pipeline. An analysis of software and hardware optimization techniques and main technical issues is presented, along with some commercial and space applications, to serve as a guide for future work.

35 citations


Journal ArticleDOI
TL;DR: The design of an FPGA architecture involves many different design choices starting from the high-level architectural parameters down to the transistor-level implementation details, with the goal of making a highly programmable device while minimizing the area and performance cost of reconfigurability as discussed by the authors.
Abstract: Since their inception more than thirty years ago, field-programmable gate arrays (FPGAs) have been widely used to implement a myriad of applications from different domains. As a result of their low-level hardware reconfigurability, FPGAs have much faster design cycles and lower development costs compared to custom-designed chips. The design of an FPGA architecture involves many different design choices, starting from the high-level architectural parameters down to the transistor-level implementation details, with the goal of making a highly programmable device while minimizing the area and performance cost of reconfigurability. As the needs of applications and the capabilities of process technology are constantly evolving, FPGA architectures must also adapt. In this article, we review the evolution of the different key components of modern commercial FPGA architectures and shed light on their main design principles and implementation challenges.

32 citations


Journal ArticleDOI
TL;DR: A comprehensive overview of the state of the art in non-conventional computer arithmetic can be found in this article, where several alternative computing models and emerging technologies are analyzed, such as nanotechnologies, superconductor devices, and biological and quantum-based computing.
Abstract: Arithmetic plays a major role in a computer's performance and efficiency. Building new computing platforms supported by the traditional binary arithmetic and silicon-based technologies to meet the requirements of today's applications is becoming increasingly challenging, regardless of whether we consider embedded devices or high-performance computers. As a result, a significant amount of research effort has been devoted to the study of nonconventional number systems to investigate more efficient arithmetic circuits and improved computer technologies to facilitate the development of computational units that can meet the requirements of applications in emergent domains. This paper presents an overview of the state of the art in nonconventional computer arithmetic. Several alternative computing models and emerging technologies are analyzed, such as nanotechnologies, superconductor devices, and biological- and quantum-based computing, and their applications to multiple domains are discussed. A comprehensive approach is followed in a survey of the logarithmic and residue number systems, the hyperdimensional and stochastic computation models, the arithmetic for quantum- and DNA-based computing systems, and techniques for approximate computing. Technologies, processors, and systems addressing these nonconventional computer arithmetic systems are also reviewed, taking into consideration some of the most prominent applications, such as deep learning or post-quantum cryptography. In the end, some conclusions are drawn, and directions for future research on nonconventional computer arithmetic are discussed.
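
To illustrate one of the number systems surveyed above, the sketch below shows a residue number system (RNS), in which an integer is represented by its residues modulo a set of pairwise-coprime moduli, so that addition and multiplication become independent, carry-free channel operations. The moduli set {7, 11, 13} is an arbitrary choice for illustration, not one proposed in the paper.

from math import prod

MODULI = (7, 11, 13)                       # dynamic range = 7 * 11 * 13 = 1001

def to_rns(x):
    return tuple(x % m for m in MODULI)

def from_rns(residues):
    # Chinese Remainder Theorem reconstruction
    M = prod(MODULI)
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)       # pow(Mi, -1, m) is the modular inverse
    return x % M

a, b = 123, 456
c = tuple((ra * rb) % m for ra, rb, m in zip(to_rns(a), to_rns(b), MODULI))
print(from_rns(c), (a * b) % prod(MODULI)) # both equal (a * b) mod 1001

Because each residue channel is narrow and carry-free, RNS multipliers can be smaller and faster than a single wide binary multiplier, at the cost of more expensive comparison, scaling, and conversion operations.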

28 citations


Journal ArticleDOI
TL;DR: In this article, the authors present a survey of the state-of-the-art advances in human vital signs detection using radar sensors, their integration and coexistence with communication systems, and their issues in spectrum sharing.
Abstract: This paper presents a survey of the state-of-the-art advances in human vital signs detection using radar sensors, their integration and coexistence with communication systems, and their issues in spectrum sharing. The focus of this survey is to review the detection, monitoring, and tracking of vital signs, specifically the respiration rate and heartbeat rate, over the last five years. It is observed that, in line with technological advancements, a multitude of radar types operating in diverse frequency spectra have been introduced with different hardware implementations, considering various detection scenarios, and applying multiple signal processing algorithms. The aims of these studies vary: enhancing detection accuracy, improving processing speed, reducing power consumption, simplifying the hardware used, lowering implementation costs, or combinations of these. In addition, this review focuses on literature aimed at increasing detection accuracy and reducing processing time using FPGAs, before benchmarking them against other processing platforms. Finally, a perspective on the future of human vital signs detection using radar sensors concludes this review.

20 citations


Journal ArticleDOI
TL;DR: In this article, the authors focus on FPGA accelerators that have seen wide-scale deployment in large cloud infrastructures, discuss existing practices in big data analytics frameworks, examine the gap in development abstractions, and provide some perspectives on how to address these challenges in the future.
Abstract: The big data revolution has ushered in an era of ever-increasing volumes and complexity of data requiring ever faster computational analysis. During this very same era, CPU performance growth has been stagnating, pushing the industry either to scale computation horizontally using multiple nodes in data centers, or to scale vertically using heterogeneous components to reduce compute time. Meanwhile, networking and storage continue to provide both higher throughput and lower latency, which allows heterogeneous components deployed in data centers around the world to be leveraged. Still, the integration of big data analytics frameworks with heterogeneous hardware components such as GPGPUs and FPGAs is challenging, because there is an increasing gap in the level of abstraction between analytics solutions developed with big data analytics frameworks and accelerated kernels developed with heterogeneous components. In this article, we focus on FPGA accelerators that have seen wide-scale deployment in large cloud infrastructures. FPGAs allow the implementation of highly optimized hardware architectures, tailored exactly to an application and unburdened by the overhead associated with traditional general-purpose computer architectures. FPGAs implementing dataflow-oriented architectures with high levels of (pipeline) parallelism can provide high application throughput, often with high energy efficiency. Latency-sensitive applications can leverage FPGA accelerators by connecting directly to the physical layer of a network and performing data transformations without going through the software stacks of the host system. While these advantages of FPGA accelerators hold promise, difficulties associated with programming and integration limit their use. This article explores the existing practices in big data analytics frameworks, discusses the aforementioned gap in development abstractions, and provides some perspectives on how to address these challenges in the future.

14 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose a distributed control mechanism for high-level synthesis (HLS) tools to handle dynamic events, which can exploit the same optimization opportunities as standard HLS circuits, but also introduce HLS features similar to those of modern superscalar processors.
Abstract: High-level synthesis (HLS) tools generate hardware designs from high-level programming languages and should liberate designers from the details of hardware description languages like VHDL and Verilog. HLS tools typically build datapaths that are controlled using a centralized controller, which relies on a compile-time schedule to determine the clock cycle in which each operation executes. Such an approach results in high-throughput pipelined designs only in cases where memory accesses are provably independent and critical control decisions can be determined at compile time. Unfortunately, when this is not the case, current tools must make pessimistic assumptions, yielding inferior schedules and lower performance. Recent advances in HLS have explored methods to overcome the conservatism of static scheduling and to address the inability of HLS tools to handle dynamic events. Dataflow circuits play a significant role in this context: they are built out of units that communicate using point-to-point pairs of handshake control signals, and this distributed control mechanism effectively implements a dynamic schedule, adapted at runtime to particular memory and control outcomes. Dataflow circuits can exploit the same optimization opportunities as standard HLS circuits (i.e., pipelining and resource sharing), but also bring to HLS features similar to those of modern superscalar processors (i.e., out-of-order memory accesses and speculative execution), which are key for HLS to be successful in new contexts and broader application domains.

11 citations


Journal ArticleDOI
TL;DR: In this paper, the trade-off between accuracy and power consumption under various background noise conditions is achieved by using an 8-stage radix-2 single-path delay feedback FFT (R2SDF-FFT) and a precision self-adaptive architecture with approximate computing.
Abstract: This paper presents system-architecture-circuit co-designs for computing the MFCC feature extraction for speech keyword recognition. The trade-off between accuracy and power consumption under various background noise conditions is achieved by using an 8-stage radix-2 single-path delay feedback FFT (R2SDF-FFT) and a precision self-adaptive architecture with approximate computing. The R2SDF-FFT structure with fine-grained bit-width quantization can reduce memory size by 35.7%. Approximate multiplication and addition with dual-Vdd are proposed to further improve the FFT computing energy efficiency. Finally, we present the precision self-adaptive MFCC architecture with the proposed FFT, which can be dynamically configured to use two calculation modes with different hardware settings according to the background noise of the input speech. Implemented and evaluated in 22 nm technology, the power consumption of the proposed design can be reduced by up to 76.3%, while the accuracy is increased by 0.8%.
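
For context, here is a hedged software sketch of the standard MFCC pipeline that such an accelerator implements (framing, windowing, FFT, Mel filterbank, log, DCT); the frame size, filter count, and sample rate are illustrative assumptions, not the paper's hardware settings.

import numpy as np
from scipy.fftpack import dct

def mfcc_frame(frame, fs=16000, n_fft=256, n_mels=26, n_ceps=13):
    """Compute MFCCs for a single speech frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    # Triangular Mel filterbank spanning 0 Hz to fs/2
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv_mel = lambda m: 700 * (10 ** (m / 2595) - 1)
    edges = inv_mel(np.linspace(mel(0), mel(fs / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = np.linspace(0, 1, c - l, endpoint=False)   # rising edge
        fbank[i, c:r] = np.linspace(1, 0, r - c, endpoint=False)   # falling edge
    log_energy = np.log(fbank @ spectrum + 1e-10)
    return dct(log_energy, norm='ortho')[:n_ceps]                  # keep first cepstral coefficients

print(mfcc_frame(np.random.randn(256)).shape)                      # (13,)

In the paper's design, the FFT stage of this pipeline corresponds to the R2SDF-FFT block, whose bit widths and approximate arithmetic are what the precision self-adaptive architecture reconfigures according to the background noise.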

10 citations


Journal ArticleDOI
TL;DR: In this article, a comprehensive review of hardware security primitives and methodologies based on post-CMOS technologies is presented, covering in particular true random number generators, physically unclonable functions, side-channel analysis countermeasures, and hardware obfuscation techniques.
Abstract: Emerging nanoelectronic semiconductor devices have been quite promising in enhancing hardware-oriented security and trust. However, implementing hardware security primitives and methodologies incurs large area overhead and power consumption. Furthermore, new attack models and vulnerabilities are constantly emerging and cannot be adequately addressed by current CMOS technology. This paper presents, for the first time, a comprehensive review of hardware security primitives and methodologies based on post-CMOS technologies, particularly true random number generators, physically unclonable functions, side-channel analysis countermeasures, and hardware obfuscation techniques. Various beyond-CMOS device technologies, including the tunneling FET (TFET), hybrid phase transition FET (HyperFET), carbon nanotube FET (CNTFET), silicon nanowire FET (SiNWFET), symmetrical tunneling FET (SymFET), phase-change memory (PCM), spin-transfer torque magnetic tunnel junction (STT-MTJ), and resistive random access memory (RRAM), have been considered in this study. First, the basic principle of operation and the unusual characteristics of nanoelectronic devices used for hardware security applications are extensively discussed. Next, the challenges of CMOS technology and the benefits of emerging nanotechnologies for the design of hardware security primitives and methodologies are reported. Then, different analyses are presented to demonstrate the promising performance of post-CMOS devices over current CMOS technology in different countermeasures. Finally, challenges, future directions, and plans for achieving further research outcomes in this field are presented.

9 citations


Journal ArticleDOI
TL;DR: In this paper, the authors look at trends in deep learning research that present new opportunities for domain-specific hardware architectures and explore how next-generation compilation tools might support them by exploiting domain-specific information to improve the performance of deep learning workloads.
Abstract: With the continued slowing of Moore's law and Dennard scaling, it has become more imperative that hardware designers make the best use of domain-specific information to improve designs. Gone are the days when we could rely primarily on silicon process technology improvements to provide faster and more efficient computation. Instead, architectural improvements are necessary to provide improved performance, power reduction, and/or reduced cost. Nowhere is this more apparent than when looking at deep learning workloads. Cutting-edge techniques achieving state-of-the-art training accuracy demand ever-larger training datasets and more complex network topologies, which results in longer training times. At the same time, after training these networks, we expect them to be deployed widely. As a result, executing large networks efficiently becomes critical, whether that execution is done in a data center or in an embedded system. In this article, we look at trends in deep learning research that present new opportunities for domain-specific hardware architectures and explore how next-generation compilation tools might support them.

Journal ArticleDOI
TL;DR: ArtificialVentilator, as discussed by the authors, is a less-than-$200 artificial ventilator that can be used against the COVID-19 pandemic. Built from low-cost, easy-to-find materials, it was designed to help developing countries where supplies for building new medical equipment are limited.
Abstract: In this paper, a less-than-$200 artificial ventilator that can be used against the COVID-19 pandemic is presented. Built from low-cost, easy-to-find materials, it has been designed to help developing countries where supplies for building new medical equipment are limited. It complies with medical requirements, allowing medical staff to monitor and adjust ventilation parameters such as tidal volume, maximum intra-lung pressure, and breathing rate. Even though this ventilator is low cost, particular attention has been paid to improving its overall reliability. Since using low-cost recycled materials may lead to mechanical failures, this potential drawback is addressed with an intelligent embedded hardware failure detector implemented inside the microcontroller. Using an optimized K-means algorithm, it quickly learns the normal operation corresponding to the pair formed by a given ventilator set-up and a patient. In case of a mechanical breakdown, an alert is generated to inform medical staff. First, the mechanical, electrical, and software architectures of the system are presented; then the hardware failure detection algorithm is detailed. Finally, test results obtained at IRBA using an artificial lung are discussed. The overall project has been published as open source on GitHub: https://github.com/iutgeiitoulon/ArtificialVentilator.
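
As an illustration of the kind of K-means-based failure detector described above, here is a minimal Python sketch: cluster feature vectors from normal ventilation cycles, then flag cycles whose distance to the nearest centroid exceeds a learned threshold. The feature set, cluster count, and 3-sigma threshold rule are assumptions for illustration, not the authors' implementation.

import numpy as np
from sklearn.cluster import KMeans

def fit_normal_model(cycles, k=3):
    """cycles: array (n_cycles, n_features), e.g. peak pressure, tidal volume, cycle period."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(cycles)
    d = np.min(km.transform(cycles), axis=1)          # distance to nearest centroid
    return km, d.mean() + 3 * d.std()                 # simple 3-sigma alarm threshold

def is_failure(km, threshold, cycle):
    return np.min(km.transform(cycle.reshape(1, -1))) > threshold

normal = np.random.normal([25.0, 450.0, 3.0], [1.0, 20.0, 0.1], size=(200, 3))
km, thr = fit_normal_model(normal)
print(is_failure(km, thr, np.array([40.0, 200.0, 5.0])))   # anomalous cycle -> True

On a microcontroller the same idea can run without scikit-learn: after training, only the centroids and the threshold need to be stored, and detection reduces to a few distance computations per breath cycle.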

Journal ArticleDOI
TL;DR: In this article, the authors note that the average current through a periodically inverted capacitor C remains the same when it is replaced by a resistor of value ${R} = {T}/(2{C})$, where ${T}$ is the period of inversions.
Abstract: The first publication of a switched-capacitor resistor occurred nearly 150 years ago in James Clerk Maxwell's pioneering book A Treatise on Electricity and Magnetism [1]. He pointed out that the average current through a periodically inverted capacitor C remains the same when it is replaced by a resistor of value ${R} = {T}/(2{C})$, where ${T}$ is the period of inversions. He used this equivalence to give a method for measuring capacitance using a circuit employing a battery and galvanometer. The next publication came nearly a century later, when D. L. Fried proposed the idea of sampled-data analog filters [2], containing only switches, capacitors and (if necessary) amplifiers. His paper showed how to realize the equivalent of a resistor using two switches and a capacitor. A motivation for using such circuits may be found in the history of analog filters. These were developed for telephony, and initially used resistors, capacitors, and inductors. Inductors were bulky and lossy, and were replaced at the earliest opportunity by alternative circuits using amplifiers. The resulting active-RC filters were realized by discrete elements: capacitors, resistors, and integrated-circuit amplifiers. With the development of integrated-circuit technology, there was strong motivation to put these filters on a single-substrate IC. However, the absolute values of resistors and capacitors could be only poorly controlled by the fabrication process: errors of 20-30% were common. Since the errors of resistors and capacitors did not track each other, the errors of time constants given by RC products were unacceptably high. This made their frequency responses unpredictable. Trimming could be used to tune such filters, but this was expensive. When the resistors were replaced by their switched-capacitor (SC) equivalents, the RC time constants were replaced by time constants of the form ${TC}_{1}/{C}_{2}$. Since the switching period ${T}$ can be accurately controlled, as can the ratio of on-chip capacitors, the SC filters (SCFs) could be implemented with high accuracy. The design of such filters was initially based on that of active-RC ones, but it was soon recognized that they can be more effectively designed directly in the sampled-data domain, in terms of the ${z}$ variable, similarly to digital filters. Although other design techniques exist, the most popular one constructs higher-order filters as a cascade of lower-order sections, such as the biquadratic filter or "biquad" shown in Figure 1.
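
For readers following the argument, a short derivation of Maxwell's equivalence (a sketch using the same notation as above, with the capacitor held at a voltage difference V and inverted once per period T) is:

\begin{aligned}
\Delta q &= 2CV && \text{charge transferred by one inversion of } C \text{ charged to } V,\\
I_{\text{avg}} &= \frac{\Delta q}{T} = \frac{2CV}{T} && \text{one inversion per period } T,\\
R_{\text{eq}} &= \frac{V}{I_{\text{avg}}} = \frac{T}{2C} && \text{the equivalent resistance quoted above.}
\end{aligned}

Fried's two-switch realization follows the same reasoning with a charge of CV transferred per period, giving ${R} = {T}/{C}$, and the RC products of an active-RC prototype then become the well-controlled ratios ${TC}_{1}/{C}_{2}$ mentioned in the text.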

Journal ArticleDOI
TL;DR: In this article, the authors introduce the great scientific contributions of A.F. Ioffe and several research projects of the members of the IEEE CAS Chapter, St. Petersburg.
Abstract: October 2020 has become remarkable for commemorating the 140th birthday of the outstanding academician A.F. Ioffe, "the father of Soviet physics." A.F. Ioffe supervised the first research projects on semiconductor physics, in particular, thermoelectricity. In this article, we introduce the great scientific contributions of A.F. Ioffe and several research projects of the members of the IEEE CAS Chapter, St. Petersburg, based on the results of Ioffe's scientific group.


Journal ArticleDOI
TL;DR: U=RIsolve, as presented by the authors, is a web-based framework for teaching and self-learning the Node Voltage Method (NVM) for electrical circuit analysis, aimed at helping electrical engineering undergraduate students learn circuit analysis.
Abstract: Learning circuit analysis is often a challenge for electrical engineering undergraduate students. During the course's early stages, procedures such as correctly identifying branches, nodes, and loops, expressing Kirchhoff's current and voltage laws, and applying analysis/simplification theorems and algorithms usually involve tricky and error-prone steps for beginners. While circuit simulators offer validation for theoretical and experimental analysis, they explain neither how to obtain the results nor the fundamental laws supporting them. In this regard, a new web-based framework dubbed U=RIsolve (read as "you resolve") for teaching and self-learning the Node Voltage Method (NVM) is presented in this paper. The U=RIsolve application goes far beyond the capabilities of traditional circuit simulators, as it outputs the fundamental circuit information, the methodology, the equations, and the results related to the NVM. An overview of the application's main features, user interface, and usage in electrical circuit analysis is provided, along with planned future developments.
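
To illustrate the Node Voltage Method the tool teaches, here is a minimal Python sketch that writes KCL at each non-reference node as G·v = i and solves the resulting linear system. The two-node example circuit is an assumption chosen for illustration and is not one of U=RIsolve's built-in exercises.

import numpy as np

# Example circuit: a 1 A source injected into node 1; R1 = 2 ohm from node 1 to ground,
# R2 = 4 ohm between nodes 1 and 2, R3 = 8 ohm from node 2 to ground.
G1, G2, G3 = 1 / 2, 1 / 4, 1 / 8
G = np.array([[G1 + G2, -G2],
              [-G2,      G2 + G3]])        # nodal conductance matrix (one KCL row per node)
i = np.array([1.0, 0.0])                   # injected source currents per node
v = np.linalg.solve(G, i)
print(v)                                   # node voltages [v1, v2] in volts (about [1.714, 1.143])

The same pattern scales to any linear resistive circuit: off-diagonal entries are the negative conductances between node pairs, diagonal entries are the sums of conductances attached to each node, and the right-hand side collects the independent current sources.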

Journal ArticleDOI
TL;DR: Presents information on SERESSA 2020, the 16th International School on the Effects of Radiation on Embedded Systems for Space Applications.
Abstract: Presents information on SERESSA 2020, the 16th International School on the Effects of Radiation on Embedded Systems for Space Applications.


Journal ArticleDOI
TL;DR: Wang et al. designed a vision-controlled automatic wheelchair based on the pupil center corneal reflection (PCCR) technique, which realizes eye-controlled movement and other auxiliary functions such as emergency communication and a rollover alarm.
Abstract: Quadriplegia is an extremely serious condition. Owing to the impaired function of patients' limbs, it is difficult for them to move freely using a wheelchair. This dilemma has become a huge obstacle to the rehabilitation of the disabled and has added an extra burden to society. To improve the quality of their lives, our team designed a vision-controlled automatic wheelchair based on the pupil center corneal reflection (PCCR) technique. By constructing a multi-sensor real-time intelligent control system, the wheelchair realizes eye-controlled movement and other auxiliary functions such as emergency communication and a rollover alarm. Through continuous optimization over multiple versions, a more intelligent, highly reliable, and low-cost wheelchair has been manufactured.

Journal ArticleDOI
TL;DR: ProtonDx will provide a response to the COVID-19 pandemic by bringing nucleic-acid-based molecular diagnostics to the palm of your hand, and will support the deployment of the Lacewing technology, which achieves accurate, rapid, handheld, and low-cost detection of SARS-CoV-2 and other respiratory infections.
Abstract: ProtonDx will provide a response to the COVID-19 pandemic by bringing nucleic-acid-based molecular diagnostics to the palm of your hand. It will support the deployment of the Lacewing technology, which achieves accurate, rapid, handheld, and low-cost detection of SARS-CoV-2 and other respiratory infections. Results are synchronized to electronic health records and geotagged for real-time surveillance of disease progression. The device was designed for use at the point of need, in places such as pharmacies, schools, and workplaces. Its unique approach combines standard semiconductor technology, advanced molecular biology, and 3D-printed microfluidics to match the performance of a bench-based instrument. Clinical trials are currently in progress at Imperial NHS Trust, London, UK, which will lead to regulatory approvals and commercialization in the coming months.