Showing papers on "Programmable logic device" published in 2014


Proceedings ArticleDOI
23 Jun 2014
TL;DR: The nn-X system is presented, a scalable, low-power coprocessor for enabling real-time execution of deep neural networks, able to achieve a peak performance of 227 G-ops/s, which translates to a performance per power improvement of 10 to 100 times that of conventional mobile and desktop processors.
Abstract: Deep networks are state-of-the-art models used for understanding the content of images, videos, audio and raw input data. Current computing systems are not able to run deep network models in real-time with low power consumption. In this paper we present nn-X: a scalable, low-power coprocessor for enabling real-time execution of deep neural networks. nn-X is implemented on programmable logic devices and comprises an array of configurable processing elements called collections. These collections perform the most common operations in deep networks: convolution, subsampling and non-linear functions. The nn-X system includes 4 high-speed direct memory access interfaces to DDR3 memory and two ARM Cortex-A9 processors. Each port is capable of a sustained throughput of 950 MB/s in full duplex. nn-X is able to achieve a peak performance of 227 G-ops/s, a measured performance in deep learning applications of up to 200 G-ops/s while consuming less than 4 watts of power. This translates to a performance per power improvement of 10 to 100 times that of conventional mobile and desktop processors.

318 citations


Journal ArticleDOI
01 Jun 2014-Energy
TL;DR: The applications of artificial intelligence-based methods for tracking the maximum power point based upon neural networks, fuzzy logic, evolutionary algorithms, which include genetic algorithms, particle swarm optimization, ant colony optimization, and other hybrid methods are reviewed and analysed.

118 citations


Journal ArticleDOI
TL;DR: A feasibility and performance study of electrically reconfigurable nanowire transistors with selectable pFET and nFET operations is presented and a novel physical structure capable of computing a NAND as well as NOR function is introduced.
Abstract: A feasibility and performance study of electrically reconfigurable nanowire transistors with selectable pFET and nFET operations is presented. The challenges toward circuit implementation are evaluated based on transient simulations of logic circuits. A novel physical structure capable of computing a NAND as well as NOR function is introduced. The new approach provides a flexible platform to develop and test fine-grain reconfigurable circuits and systems.

100 citations


Journal ArticleDOI
TL;DR: This paper presents a hardware version of a modified Akers logic array, where the values stored within the array serve as primary inputs, and uses memristors, which are nonvolatile memory devices with noteworthy properties.

85 citations


Journal ArticleDOI
TL;DR: A novel field programmable gate array architecture with resistive random access memory (RRAM)-based programmable interconnects (FPGA-RPI) is introduced, whose programmable interconnects have a 96% smaller footprint, 55% higher performance, and 79% lower power consumption compared to other FPGA counterparts.
Abstract: In this paper we introduce a novel field programmable gate array (FPGA) architecture with resistive random access memory (RRAM)-based programmable interconnects (FPGA-RPI). Programmable interconnects are the dominant part of an FPGA. We use RRAMs to build programmable interconnects, and optimize their structures by exploiting opportunities that emerge in RRAM-based circuits. FPGA-RPI can be fabricated by the existing CMOS-compatible RRAM process. Using an advanced placement and routing tool named VPR-RPI, which was developed to deal with the novel architecture, a customized CAD flow is provided for FPGA-RPI. Results show that the programmable interconnects of FPGA-RPI have a 96% smaller footprint, 55% higher performance, and 79% lower power consumption compared to other FPGA counterparts.

84 citations


Proceedings ArticleDOI
03 Nov 2014
TL;DR: This paper introduces the notion of PUF-based logic which can be configured to be functionally equivalent to any arbitrary design, as well as a new architecture for wire merging that obfuscates signal paths exponentially.
Abstract: There is a great need to develop universal and robust techniques for intellectual property protection of integrated circuits. In this paper, we introduce techniques for the obfuscation of an arbitrary circuit by using physical unclonable functions (PUFs) and programmable logic. Specifically, we introduce the notion of PUF-based logic which can be configured to be functionally equivalent to any arbitrary design, as well as a new architecture for wire merging that obfuscates signal paths exponentially. We systematically apply our techniques in such a way so as to maximize obfuscation while minimizing area and delay overhead. We analyze our techniques on popular benchmark circuits and show them to be resilient against very powerful reverse engineering attacks in which the adversary has knowledge of the complete netlist along with the ability to read and write to any flip-flop in the circuit.

81 citations


Journal ArticleDOI
TL;DR: A new logic circuit design paradigm, which assumes parallel processing of input signals, is proposed, along with a methodology for the construction of robust programmable composite memristive switches of variable precision.
Abstract: This brief contributes to the design of computational and reconfigurable structures that exploit unique threshold-dependent switching response of single memristors and their compositions. A new logic circuit design paradigm, which assumes parallel processing of input signals, is proposed, along with a methodology for the construction of robust programmable composite memristive switches of variable precision. This methodology is applied to the design of memristive computing circuits. A SPICE simulation-based validation of the proposed circuits and systems is provided.

70 citations


Journal ArticleDOI
TL;DR: In this article, a programmable array logic device (PAL) was demonstrated on a single flexible transparent plastic foil, which can be configured for different logic functions by an additional process step of inkjet printing conductive wires and poly(3,4-ethylenedioxythiophene):poly(styrene sulfonate) (PEDOT:PSS) resistors between transistors or between logic blocks.

58 citations


Journal ArticleDOI
TL;DR: The proposed circuit can perform any of the four digital logic operations (NOT, NOR, XOR, AND) by using the appropriate optical pump signal at the selection port of the multiplexer.

48 citations


Journal ArticleDOI
TL;DR: In this article, spin-memristor threshold logic (SMTL) gates employ a memristive cross-bar array to perform current-mode summation of binary inputs, whereas the low-voltage fast-switching spintronic threshold devices carry out the threshold operation in an energy efficient manner.
Abstract: A threshold logic gate performs a weighted sum of multiple inputs and compares the sum with a threshold. We propose spin-memristor threshold logic (SMTL) gates, which employ a memristive cross-bar array to perform current-mode summation of binary inputs, whereas the low-voltage fast-switching spintronic threshold devices carry out the threshold operation in an energy efficient manner. Field-programmable SMTL gate arrays can operate at a small terminal voltage of ∼50 mV, resulting in ultralow power consumption in gates as well as programmable interconnect networks. We evaluate the performance of SMTL using threshold logic synthesis. Results for common benchmarks show that SMTL-based programmable logic hardware can be more than 100× more energy efficient than the state-of-the-art CMOS field-programmable gate array.

47 citations
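
The threshold logic gate defined in the SMTL abstract above (a weighted sum of binary inputs compared with a threshold) can be sketched in software. This is a minimal behavioural sketch, not the SMTL circuit: the function name `threshold_gate`, the integer weights, and the "fires when the sum is greater than or equal to the threshold" convention are all assumptions made for illustration.

```python
def threshold_gate(inputs, weights, threshold):
    """Return 1 if the weighted sum of binary inputs reaches the threshold, else 0.

    Assumption: the gate fires on sum >= threshold; the abstract only says the
    sum is "compared with a threshold".
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


# The same structure realises different Boolean functions purely by
# reprogramming the weights and the threshold (hypothetical helper names).
def and2(a, b):
    return threshold_gate((a, b), (1, 1), 2)


def or2(a, b):
    return threshold_gate((a, b), (1, 1), 1)


def majority3(a, b, c):
    return threshold_gate((a, b, c), (1, 1, 1), 2)
```

Changing only the threshold turns the two-input gate from AND into OR, which is the sense in which such a gate is programmable.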


Patent
24 Nov 2014
TL;DR: In this article, a reconfigurable logic device is integrated into a die-stacked memory device to provide implementation flexibility in performing various data manipulation operations and other memory operations that use data stored in the die-stacked memory device or result in data that is to be stored in the die-stacked memory device.
Abstract: A die-stacked memory device incorporates a reconfigurable logic device to provide implementation flexibility in performing various data manipulation operations and other memory operations that use data stored in the die-stacked memory device or that result in data that is to be stored in the die-stacked memory device. One or more configuration files representing corresponding logic configurations for the reconfigurable logic device can be stored in a configuration store at the die-stacked memory device, and a configuration controller can program a reconfigurable logic fabric of the reconfigurable logic device using a selected one of the configuration files. Due to the integration of the logic dies and the memory dies, the reconfigurable logic device can perform various data manipulation operations with higher bandwidth and lower latency and power consumption compared to devices external to the die-stacked memory device.

Patent
03 Feb 2014
TL;DR: In this article, a programmable logic device includes a plurality of programmable logic elements (PLEs) whose electrical connection is controlled by first configuration data; each PLE includes an LUT, an FF to which the output signal of the LUT is input, and an MUX that includes at least two switches, each including a first and a second transistor.
Abstract: A programmable logic device includes a plurality of programmable logic elements (PLEs) whose electrical connection is controlled by first configuration data. Each of the PLEs includes an LUT in which a relationship between a logic level of an input signal and a logic level of an output signal is determined by second configuration data, an FF to which the output signal of the LUT is input, and an MUX. The MUX includes at least two switches, each including a first and a second transistor. A signal including third configuration data is input to a gate of the second transistor through the first transistor. The output signal of the LUT or an output signal of the FF is input to one of a source and a drain of the second transistor.
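
The LUT, FF and MUX arrangement described in this abstract can be modelled behaviourally. The sketch below is an illustration under stated assumptions, not the patented transistor-level circuit: the class name `LogicElement`, the `use_ff` flag (standing in for the configuration data that steers the MUX), and the most-significant-bit-first input ordering are all hypothetical.

```python
class LogicElement:
    """Behavioural model of one programmable logic element (LUT + FF + MUX)."""

    def __init__(self, lut_bits, use_ff):
        n = len(lut_bits)
        if n == 0 or n & (n - 1):
            raise ValueError("LUT size must be a power of two")
        self.lut_bits = list(lut_bits)  # truth table set by configuration data
        self.use_ff = use_ff            # MUX select: FF output or LUT output
        self.ff = 0                     # flip-flop state, assumed to reset to 0

    def lut(self, inputs):
        """Look up the output for the given input bits (first input is the MSB)."""
        if 2 ** len(inputs) != len(self.lut_bits):
            raise ValueError("input count does not match LUT size")
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return self.lut_bits[index]

    def clock(self, inputs):
        """On a clock edge the FF captures the current LUT output."""
        self.ff = self.lut(inputs)

    def output(self, inputs):
        """The MUX forwards either the registered or the combinational value."""
        return self.ff if self.use_ff else self.lut(inputs)


# Usage: a 2-input XOR, once combinational and once registered.
xor_comb = LogicElement([0, 1, 1, 0], use_ff=False)
xor_reg = LogicElement([0, 1, 1, 0], use_ff=True)
```

With `use_ff=False` the element behaves as pure combinational logic; with `use_ff=True` its output changes only after `clock` is called.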

Journal ArticleDOI
TL;DR: A novel one-dimensional PLA element is reported that incorporates resistive switch gate structures on a semiconductor nanowire and it is shown that multiple elements can be integrated to realize functional PLAs.
Abstract: Programmable logic arrays (PLA) constitute a promising architecture for developing increasingly complex and functional circuits through nanocomputers from nanoscale building blocks. Here we report a novel one-dimensional PLA element that incorporates resistive switch gate structures on a semiconductor nanowire and show that multiple elements can be integrated to realize functional PLAs. In our PLA element, the gate coupling to the nanowire transistor can be modulated by the memory state of the resistive switch to yield programmable active (transistor) or inactive (resistor) states within a well-defined logic window. Multiple PLA nanowire elements were integrated and programmed to yield a working 2-to-4 demultiplexer with long-term retention. The well-defined, controllable logic window and long-term retention of our new one-dimensional PLA element provide a promising route for building increasingly complex circuits with nanoscale building blocks.

Proceedings ArticleDOI
03 Mar 2014
TL;DR: Compared with the conventional cell design using a CMOS routing multiplexer (MUX), the proposed programmable-logic cell array achieves a 60% active power saving and 3 times faster operation.
Abstract: A programmable-logic cell that utilizes a complementary atom switch (CAS) is fabricated using a 65-nm node CMOS process. A 16-bit ALU is implemented and demonstrated on a 24×24 programmable-logic cell array including 645 kbit of CAS for both routing switches and configuration memories. Compared with the conventional cell design using a CMOS routing multiplexer (MUX), the proposed programmable-logic cell array achieves a 60% active power saving and 3 times faster operation.

Patent
30 Oct 2014
TL;DR: In this article, a method of operating an intelligent programmable logic controller over a plurality of scan cycles is described, in which the controller selects one or more soft-sensors available in a control program corresponding to a production unit, each soft-sensor comprising a local parameter or variable used by the control program.
Abstract: A method of operating an intelligent programmable logic controller over a plurality of scan cycles includes the intelligent programmable logic controller selecting one or more soft-sensors available in a control program corresponding to a production unit, each soft-sensor comprising a local parameter or variable used by the control program. The intelligent programmable logic controller determines updated soft-sensor values corresponding to the one or more soft-sensors during each scan cycle and stores those values during each scan cycle on a non-volatile computer-readable storage medium operably coupled to the intelligent programmable logic controller. Additionally, the intelligent programmable logic controller annotates the updated soft-sensor values with automation system context information to generate contextualized data.

Journal ArticleDOI
TL;DR: In this paper, a ring oscillator time-to-digital converter (TDC) is used for the detection of atmospheric muon flux attenuation due to the presence of matter.
Abstract: Time-of-flight (TOF) techniques are standard techniques in high energy physics to determine particles’ propagation directions. Since particle velocities are generally close to c, the speed of light, and detector typical dimensions at the metre level, the state-of-the-art TOF techniques should reach sub-nanosecond timing resolution. Among the various techniques already available, the recently developed ring oscillator time-to-digital converter (TDC) ones, implemented in low-cost programmable logic circuits like FPGA (field programmable gate array), feature a very interesting figure of merit since a very good timing performance may be achieved with limited processing resources. This issue is relevant for applications where unmanned sensors should have the lowest possible power consumption. Actually this paper describes in detail the application of this kind of TOF technique to muon tomography of geological bodies. Muon tomography aims at measuring density variations and absolute densities through the detection of atmospheric muon flux’s attenuation, due to the presence of matter. When the measured fluxes become very low, an identified source of noise comes from backwards propagating particles hitting the detector in a direction pointing to the geological body. The separation between through-going and backward-going particles on the basis of the TOF information is therefore a key parameter for the tomography analysis and subsequent forecasts. This paper describes a TDC implementation fulfilling the requirements of a TOF measurement applied to muon tomography.

Journal ArticleDOI
TL;DR: A new configurable logic block (CLB) is designed, implemented and simulated in QCA; it uses a signal distribution network method to avoid the coplanar wire-crossing problem and can be configured as an FPGA.
Abstract: Quantum-dot cellular automata (QCA) is a promising, emerging nanotechnology based on single electron effects in quantum dots and molecules. This paper presents the design, implementation and simulation of a configurable logic block for a field programmable gate array (FPGA) in QCA. Previous works focus on QCA-based FPGAs that have fixed logic and programmable interconnection, or programmable logic and fixed interconnection; the structures proposed in this paper have both programmable logic and programmable interconnection. The presented look-up table, implemented with a novel structure, also acts as a pipeline whenever read/write operations occur. We present novel decoders and multiplexers implemented in QCA and designed with the minimum number of majority gates and cells. Finally, a new configurable logic block (CLB) is designed, implemented and simulated in QCA, using a signal distribution network method to avoid the coplanar problem of crossing wires. QCADesigner software is used for the detailed layout, and QCADesigner together with HDLQ Verilog is used for circuit simulation. The proposed CLB is simulated with programming by the QCADesigner software. The area and delay of the QCA-based CLB presented in this paper are compared with those of CLBs based on CMOS, nanomaterial and CNT (32 nm). Results show that the proposed CLB will do the task with a minimum clock and can be configured as an FPGA.
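
The abstract mentions multiplexers built from majority gates, the basic logic primitive in QCA. As a rough illustration of how that works at the Boolean level, the sketch below composes a 2-to-1 multiplexer from three-input majority gates and an inverter. It is a textbook-style construction assumed for illustration only; it is not the paper's minimised design, and the function names are hypothetical.

```python
def majority(a, b, c):
    """Three-input majority vote: 1 when at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0


def and2(a, b):
    # Fixing one majority input to 0 yields AND.
    return majority(a, b, 0)


def or2(a, b):
    # Fixing one majority input to 1 yields OR.
    return majority(a, b, 1)


def inv(a):
    return 1 - a


def mux2(sel, d0, d1):
    """2-to-1 multiplexer from three majority gates and one inverter."""
    return or2(and2(d0, inv(sel)), and2(d1, sel))
```

This version spends three majority gates per multiplexer; the paper states that its own designs use the minimum number of majority gates and cells, which this sketch does not attempt to reproduce.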

Patent
05 Feb 2014
TL;DR: In this paper, an information processing method and an electronic device are described, in which a primary PLD is connected to the N data lines of a group of SGPIO (Serial General Purpose Input Output) buses through N data pins and communicates with at least two secondary PLDs through that group of buses.
Abstract: The invention discloses an information processing method and an electronic device. In the information processing method, a primary PLD (Programmable Logic Device) is connected to the N data lines of a group of SGPIO (Serial General Purpose Input Output) buses through N corresponding data pins and communicates with at least two secondary PLDs through the group of SGPIO buses, wherein N is the number of pins of one secondary PLD connected to the SGPIO buses, and the secondary PLDs communicate with the primary PLD through the group of SGPIO buses.

Proceedings ArticleDOI
27 Aug 2014
TL;DR: This work presents the first dynamic emission analysis of a hardware implementation, and presents practical results for a common Complex Programmable Logic Device (CPLD), suggesting the same approach can be applied to hardware implementations in general.
Abstract: Today, hardware implementations are the basis for many security applications, such as cryptographic ciphers. Such applications are realized using complex combinatorial logic circuits of substantial size. Therefore, understanding the gate-level implementation can be crucial for the attacker. However, Hardware Description Language (HDL) behavioral models and gate-level net list are seldom available for a particular design. Executing software directly on the device to assist in understanding the implementation is one potential solution. However, this may either be infeasible or completely impossible in practice as target devices may be incapable of executing code. Currently, few works have proposed forms of dynamic gate-level analysis of the actual hardware implementations. Moreover, current reverse-engineering techniques based on physical delayering and optical imaging cannot be applied to programmable logic. In this work we present the first dynamic emission analysis of a hardware implementation. This technique does not require any prior knowledge about the target device. Furthermore, it does not require code to be executed by the target. Hardware implementations consist of basic primitives that form the building blocks of complex hardware functions. By individually analyzing each primitive and correlating the corresponding optical images, the emission fingerprint of each primitive can be identified. As a result the hardware implementation of the device can be reconstructed. We present practical results for a common Complex Programmable Logic Device (CPLD). However, the same approach can be applied to hardware implementations in general.

Book ChapterDOI
18 Jul 2014
TL;DR: Since temporal logic specifications are notoriously difficult to use in practice, G4LTL-ST supports engineers in specifying realizable control problems by suggesting suitable restrictions on the behavior of the control environment from failed synthesis attempts.
Abstract: G4LTL-ST automatically synthesizes control code for industrial Programmable Logic Controls (PLC) from timed behavioral specifications of input-output signals. These specifications are expressed in a linear temporal logic (LTL) extended with non-linear arithmetic constraints and timing constraints on signals. G4LTL-ST generates code in IEC 61131-3-compatible Structured Text, which is compiled into executable code for a large number of industrial field-level devices. The synthesis algorithm of G4LTL-ST implements pseudo-Boolean abstraction of data constraints and the compilation of timing constraints into LTL, together with a counterstrategy-guided abstraction-refinement synthesis loop. Since temporal logic specifications are notoriously difficult to use in practice, G4LTL-ST supports engineers in specifying realizable control problems by suggesting suitable restrictions on the behavior of the control environment from failed synthesis attempts.

Journal ArticleDOI
TL;DR: This work proposes a new architecture, called fine-grain dynamically reconfigurable (FDR), that consists of an array of homogeneous reconfigurable logic elements (LEs), which significantly enhances the flexibility of allocating hardware resources between LUTs and interconnects based on application needs.
Abstract: Prior work has shown that due to the overhead incurred in enabling reconfigurability, field-programmable gate arrays (FPGAs) require 21× more silicon area, 3× larger delay, and 10× more dynamic power consumption compared with application-specific integrated circuits (ASICs). We have earlier presented a hybrid CMOS/nanotechnology reconfigurable architecture (NATURE). It uses the concept of temporal logic folding and fine-grain (i.e., cycle-level) dynamic reconfiguration to increase logic density by an order of magnitude. Since logic folding reduces area usage significantly, on-chip communications tend to become localized. To take full advantage of this fact, we propose a new architecture, called fine-grain dynamically reconfigurable (FDR), that consists of an array of homogeneous reconfigurable logic elements (LEs). Each LE can be arbitrarily configured into a lookup table (LUT) or interconnect or a combination of both. This significantly enhances the flexibility of allocating hardware resources between LUTs and interconnects based on application needs. The proposed FDR architecture eliminates most of the long-distance and global wires, which occupy most of the area in conventional FPGAs. Fine-grain dynamic reconfiguration is enabled by local embedded static RAM blocks. The experiments show that, on an average, area, delay, and power are improved by 9.14×, 1.11×, and 1.45×, compared with a conventional FPGA architecture that does not use the concept of logic folding. Compared with NATURE with deep logic folding, area, delay, and power are improved by 2.12×, 3.28×, and 1.74×, respectively. Although this does not eliminate the FPGA-ASIC area/delay/power gaps, it makes progress toward bridging these gaps.

Proceedings ArticleDOI
10 Jul 2014
TL;DR: This paper presents a set of techniques for taking advantage of the streaming character of the algorithm by selectively switching off parts of the circuit that cannot execute, thus saving power.
Abstract: Streaming applications describe a broad class of computing algorithms in areas such as signal processing, media coding and compression, cryptography, video analytics, network routing and packet processing and many others. For many of these applications, programmable logic devices such as FPGAs are the implementation platform of choice due to their higher flexibility compared to ASICs and lower power consumption and higher performance compared to processors. This paper presents a set of techniques for taking advantage of the streaming character of the algorithm by selectively switching off parts of the circuit that cannot execute, thus saving power. The implementation is integrated into an existing high-level synthesis flow, and applied to a variety of applications, resulting in up to 20% power reduction with a very small additional logic footprint and no loss in throughput. © 2014 European Electronic Chips & Systems design ECSI.

Journal ArticleDOI
TL;DR: It is demonstrated that the short-circuit effect in an FTM provides the opportunity to design a novel FTM-based Boolean logic block, which implements logic operation inside a single memristor.
Abstract: Thanks to the progress in nonvolatile (NV) devices, such as magnetic tunnel junctions and phase change memories, various NV logic blocks have recently been proposed to overcome energy/delay bottlenecks caused by von Neumann computing architecture. The ferroelectric tunnel memristor (FTM) is an emerging NV multilevel device and was recently reported to show excellent performance. In this paper, we demonstrated that the short-circuit effect in an FTM provides the opportunity to design a novel FTM-based Boolean logic block. This block is composed of an FTM and a load resistor. Unlike classical schemes, where at least two cells are required as operands, our FTM-based block implements logic operation inside a single memristor. With a compact model of an FTM, transient simulation is performed to validate NAND and NOR logic functions. Finally, we provide the method of performance optimization and discuss the advantages/disadvantages of the proposed logic block to summarize our work.

Proceedings ArticleDOI
22 May 2014
TL;DR: This work describes the design and implementation of a remote reconfigurable FPGA-based SoC (System on Chip) with a Service-oriented configuration interface over the Internet, and underlines several solutions in order to meet the specific challenges raised by this type of integration.
Abstract: The growing development of Cloud Computing raised the need of making hardware available “as a Service”. Reconfigurable hardware - like FPGA (Field Programmable Gate Arrays) - makes the ideal candidate for being integrated into Cloud systems due to their high flexibility and scalability. This work describes the design and implementation of a remote reconfigurable FPGA-based SoC (System on Chip) with a Service-oriented configuration interface over the Internet, and underlines several solutions in order to meet the specific challenges raised by this type of integration. The FPGA board (that can be at a remote location) is configured by a Web Service linked to a JSP (JavaServer Pages) Web-page where the user can provide a bitstream configuration file for the Programmable Logic (PL). On the SoC/FPGA board, dedicated software has been developed to run on the embedded Processing System (PS), managing the downloaded bitstreams and configuring the PL.

BookDOI
13 Oct 2014
TL;DR: A considerable part of the book is devoted to design methods oriented on implementing control units using FPGA and CPLD chips; such important issues as the design of reliable FSMs, automatic design of concurrent logic controllers, and the models and methods for creating infrastructure IP services for SoCs are also presented.
Abstract: Logic design of digital devices is a very important part of Computer Science. It deals with the design and testing of logic circuits for both the data-path and the control unit of a digital system. Design methods depend strongly on the logic elements used for implementation of logic circuits. Different programmable logic devices are widely used for implementation of logic circuits. Nowadays, we witness the rapid growth of new chips, but there is a strong lack of new design methods. This book includes a variety of design and test methods targeted at different digital devices. It covers methods of digital system design, the development of a theoretical base for construction and design of PLD-based devices, and the application of UML for digital design. A considerable part of the book is devoted to design methods oriented on implementing control units using FPGA and CPLD chips. Such important issues as the design of reliable FSMs, automatic design of concurrent logic controllers, and the models and methods for creating infrastructure IP services for SoCs are also presented. The editors of the book hope that it will be interesting and useful for experts in Computer Science and Electronics, as well as for students, who are viewed as designers of future digital devices and systems.

Journal ArticleDOI
TL;DR: In this article, a multi-context field-programmable gate array (FPGA) enabling fine-grained power gating (PG) is fabricated by a hybrid process involving a 1.0 μm c-axis aligned crystalline In-Ga-Zn-O (CAAC-IGZO) field effect transistor (FET) and a 0.5 μm complementary metal oxide semiconductor (CMOS) FET.
Abstract: A multi-context (MC) field-programmable gate array (FPGA) enabling fine-grained power gating (PG) is fabricated by a hybrid process involving a 1.0 μm c-axis aligned crystalline In-Ga-Zn-O (CAAC-IGZO) field-effect transistor (FET), which is one of CAAC oxide-semiconductor (OS) FETs, and a 0.5 μm complementary metal oxide semiconductor (CMOS) FET. The FPGA achieves a 20% layout area reduction in a routing switch and an 82.8% reduction in power required to retain data of configuration memory (CM) cells at 2.5 V driving compared to a static random access memory (SRAM)-based FPGA. A controller for fine-grained PG can be implemented at an area overhead of 7.5% per programmable logic element (PLE) compared to a PLE without PG. For each PLE, the power overhead with fine-grained PG amounts to 2.25 and 2.26 nJ for power-on and power-off, respectively, and break-even time (BET) is 19.4 μs at 2.5 V and 10 MHz driving.

Journal ArticleDOI
TL;DR: A novel, very fast hardware accelerator is proposed and implemented in the programmable logic (PL) of a Xilinx Zynq microchip, demonstrating significant speedup compared with software running on a general-purpose PC and on the ARM.
Abstract: The paper suggests a technique for solving the matrix/set covering problem in all programmable systems-on-chip. A novel, very fast hardware accelerator is proposed and implemented in the programmable logic (PL) of a Xilinx Zynq microchip. The accelerator is managed by software running in the processing system (ARM Cortex-A9) available on the same microchip and communicating with the PL through high-speed interfaces. The results of implementation, experiments, and comparisons demonstrate significant speedup compared with software running on a general-purpose PC and on the ARM. DOI: http://dx.doi.org/10.5755/j01.eee.20.5.7116

Proceedings ArticleDOI
01 Dec 2014
TL;DR: A hardware-software co-design approach of an OpenFlow switch using a state-of-the-art heterogeneous system-on-chip (SoC) platform and results show that the design targeted at Xilinx Zynq can achieve a total 88 Gbps throughput for a 1K flow table which supports dynamic and hitless updates.
Abstract: Software Defined Networking (SDN) has been proposed as a flexible solution for the next generation Internet provision. OpenFlow is a pioneering protocol for SDN which enables a hardware data plane to be managed by a software-based controller in a standard way. In this paper, we present a hardware-software co-design approach of an OpenFlow switch using a state-of-the-art heterogeneous system-on-chip (SoC) platform. Specifically, we implement the OpenFlow switch on a Xilinx Zynq ZC706 board. The Xilinx Zynq SoC family provides a tight coupling of field programmable gate array (FPGA) fabric and ARM processor cores, making it an attractive on-chip implementation platform for SDN switches. High-performance, yet highly-programmable, data plane processing can reside in programmable logic, while complex control software can reside in ARM processor. Our proposed architecture involves a methodology that scales across: (a) a range of possible packet throughput rates and (b) a range of possible flow table sizes. Post-place-and-route results show that our design targeted at Xilinx Zynq can achieve a total 88 Gbps throughput for a 1K flow table which supports dynamic and hitless updates. Correct operation has been demonstrated using a ZC706 board.

Proceedings ArticleDOI
24 Mar 2014
TL;DR: The potential of emerging spin-torque devices for computing is discussed, including the possibility of using coupled spin-torque nano oscillators for low-power non-Boolean computing; circuit and system-level design techniques that leverage the specific spin-device characteristics need to be explored to achieve energy-efficiency, performance and reliability comparable to those of CMOS.
Abstract: In this paper we discuss the potential of emerging spin-torque devices for computing applications. Recent proposals for spin-based computing schemes may be differentiated as 'all-spin' vs. hybrid, programmable vs. fixed, and Boolean vs. non-Boolean. All-spin logic-styles may offer high area-density due to the small form-factor of nano-magnetic devices. However, circuit and system-level design techniques need to be explored that leverage the specific spin-device characteristics to achieve energy-efficiency, performance and reliability comparable to those of CMOS. The non-volatility of nano-magnets can be exploited in the design of energy and area-efficient programmable logic. In such logic-styles, spin-devices may play the dual role of computing as well as memory-elements that provide field-programmability. Spin-based threshold logic design is presented as an example. Emerging spintronic phenomena may lead to ultra-low-voltage, current-mode, spin-torque switches that can offer attractive computing capabilities, beyond digital switches. Such devices may be suitable for non-Boolean data-processing applications which involve analog processing. Integration of such spin-torque devices with charge-based devices like CMOS and resistive memory can lead to highly energy-efficient information processing hardware for applications like pattern-matching, neuromorphic-computing, image-processing and data-conversion. Finally, we discuss the possibility of using coupled spin-torque nano oscillators for low-power non-Boolean computing.

Proceedings ArticleDOI
29 Sep 2014
TL;DR: Test results show that the experimental platform manages stored data with high speed and stable performance, and can effectively evaluate the indicators of storage systems based on NAND flash.
Abstract: In this paper we describe a hardware-software co-design experimental platform for NAND flash based on Zynq. Our novel experimental platform utilizes the PL (Programmable Logic within Zynq) to achieve the timing control and bad block management of NAND flash, and to provide a high-speed parallel algorithm verification environment for users. In addition, it utilizes the PS (Processing System within Zynq) to implement the well-known hybrid FAST FTL algorithm, so it can also provide an evaluation baseline for FTL algorithms and a running environment for compute-intensive data processing algorithms, such as compression or error correction. Test results show that our experimental platform manages stored data with high speed and stable performance, and can effectively evaluate the indicators of storage systems based on NAND flash.