Showing papers presented at "Asia and South Pacific Design Automation Conference in 2014"

Proceedings Article•DOI•

Qubit placement to minimize communication overhead in 2D quantum architectures

Alireza Shafaei¹, Mehdi Saeedi¹, Massoud Pedram¹•Institutions (1)

20 Feb 2014

TL;DR: In this article, the authors proposed an optimization method that considers qubit-to-qubit interactions in 2D grid architectures to alleviate the latency of quantum circuits mapped to these architectures.

Abstract: Regular, local-neighbor topologies of quantum architectures restrict interactions to adjacent qubits, which in turn increases the latency of quantum circuits mapped to these architectures. To alleviate this effect, optimization methods that consider qubit-to-qubit interactions in 2D grid architectures are presented in this paper. The proposed approaches benefit from a Mixed Integer Programming (MIP) formulation for the qubit placement problem. Simulation results on various benchmarks show an average reduction of 27% in communication overhead between qubits compared to the best results of previous work.
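
To make the notion of communication overhead on a 2D grid concrete, here is a minimal Python sketch. It is not the paper's MIP formulation: it assumes a static placement, scores each two-qubit gate by the Manhattan distance between its operands minus one (the hops needed to make them adjacent), and uses exhaustive search in place of a solver. The names `communication_overhead` and `best_placement` are hypothetical.

```python
from itertools import permutations


def manhattan(a, b):
    """Manhattan distance between two (row, col) grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def communication_overhead(placement, gates):
    """Sum of (distance - 1) over all two-qubit gates.

    Assumption: one hop is needed per unit of distance beyond adjacency,
    and the placement stays fixed for the whole circuit.
    """
    return sum(manhattan(placement[q1], placement[q2]) - 1 for q1, q2 in gates)


def best_placement(num_qubits, gates, rows, cols):
    """Brute-force stand-in for an optimizer; only viable for tiny grids."""
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    best = None
    for perm in permutations(cells, num_qubits):
        placement = dict(enumerate(perm))
        cost = communication_overhead(placement, gates)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best
```

For a four-gate ring of interactions on a 2x2 grid, a row-major placement leaves two gates non-adjacent, while the search finds a placement in which every interacting pair is adjacent.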

122 citations

Proceedings Article•DOI•

Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators

Zidong Du, Avinash Lingamneni¹, Yunji Chen, Krishna V. Palem¹, Olivier Temam², Chengyong Wu•Institutions (2)

Rice University¹, French Institute for Research in Computer Science and Automation²

20 Feb 2014

TL;DR: This paper proposes to expand the application scope, error tolerance as well as the energy savings of inexact computing systems through neural network architectures, and demonstrates that the proposed inexact neural network accelerator could achieve 43.91%-62.49% savings in energy consumption.

Abstract: In recent years, inexact computing has been increasingly regarded as one of the most promising approaches for reducing energy consumption in many applications that can tolerate a degree of inaccuracy. Driven by the principle of trading tolerable amounts of application accuracy in return for significant resource savings - the energy consumed, the (critical path) delay and the (silicon) area being the resources - this approach has been limited to certain application domains. In this paper, we propose to expand the application scope, error tolerance as well as the energy savings of inexact computing systems through neural network architectures. Such neural networks are fast emerging as popular candidate accelerators for future heterogeneous multi-core platforms, and have flexible error tolerance limits owing to their ability to be trained. Our results based on simulated 65nm technology designs demonstrate that the proposed inexact neural network accelerator could achieve 43.91%-62.49% savings in energy consumption (with corresponding delay and area savings being 18.79% and 31.44% respectively) when compared to existing baseline neural network implementation, at the cost of an accuracy loss (quantified as the Mean Square Error (MSE) which increases from 0.14 to 0.20 on average).

114 citations

Proceedings Article•DOI•

Training itself: Mixed-signal training acceleration for memristor-based neural network

Boxun Li¹, Yuzhi Wang¹, Yu Wang¹, Yi Chen², Huazhong Yang¹•Institutions (2)

Tsinghua University¹, University of Pittsburgh²

20 Feb 2014

TL;DR: This work modifies the original stochastic gradient descent algorithm by approximating calculations and designing an alternative computing method, and proposes a mixed-signal acceleration architecture for the modified training algorithm by equipping the original memristor-based neural network architecture with the copy crossbar technique, weight update units, sign calculation units and other assistant units.

Abstract: The artificial neural network (ANN) is among the most widely used methods in data processing applications. The memristor-based neural network further demonstrates a power efficient hardware realization of ANN. The training phase is the critical operation of a memristor-based neural network. However, the traditional training method for memristor-based neural networks is time consuming and energy inefficient. Users have to first work out the parameters of memristors through digital computing systems and then tune the memristor to the corresponding state. In this work, we introduce a mixed-signal training acceleration framework, which realizes the self-training of memristor-based neural networks. We first modify the original stochastic gradient descent algorithm by approximating calculations and designing an alternative computing method. We then propose a mixed-signal acceleration architecture for the modified training algorithm by equipping the original memristor-based neural network architecture with the copy crossbar technique, weight update units, sign calculation units and other assistant units. The experiment on the MNIST database demonstrates that the proposed mixed-signal acceleration is 3 orders of magnitude faster and 4 orders of magnitude more energy efficient than the CPU implementation counterpart at the cost of a slight decrease of the recognition accuracy (< 5%).

88 citations

Proceedings Article•DOI•

A DC-DC boost converter with variation tolerant MPPT technique and efficient ZCS circuit for thermoelectric energy harvesting applications

Jungmoon Kim¹, Minseob Shim¹, Junwon Jung¹, Heejun Kim¹, Chulwoo Kim¹•Institutions (1)

Korea University¹

20 Feb 2014

TL;DR: A finely controlled zero-current switching (ZCS) scheme together with the accurate MPPT technique enhances the overall efficiency of the converter because of an optimal turn-on time generated by a one-shot pulse generator that is proposed.

Abstract: This paper presents a boost converter with the maximum power point tracking (MPPT) technique for thermoelectric energy harvesting (EH) applications. The technique realizes variation tolerance by adjusting the switching frequency fSW of the converter. A finely controlled zero-current switching (ZCS) scheme together with the accurate MPPT technique enhances the overall efficiency (η) of the converter because of an optimal turn-on time generated by a proposed one-shot pulse generator. Moreover, the ZCS technique can deal with low and high temperature differences applied to the thermoelectric generator. Experimentally, the converter implemented in a 0.35 µm BCDMOS process had a peak η of 72% at the input voltage VIN of 500mV while supplying a 5.62V output.

87 citations

Proceedings Article•DOI•

Optimal SWAP gate insertion for nearest neighbor quantum circuits

Robert Wille¹, Aaron Lye¹, Rolf Drechsler¹•Institutions (1)

University of Bremen¹

20 Feb 2014

TL;DR: This work presents an exact approach that enables nearest neighbor-compliance by inserting a minimal number of SWAP gates, and demonstrates the applicability of the approach which enabled a comparison of results obtained by heuristic methods to the actual optimum.

Abstract: Motivated by its promising applications e.g. for database search or factorization, significant progress has been made in the development of automated design methods for quantum circuits. But in order to keep up with recent physical developments in this domain, new technological constraints have to be considered. Limited interaction distance between gate qubits is one of the most common of these constraints. This led to the development of several strategies aiming at making a given quantum circuit nearest neighbor-compliant by inserting SWAP gates into the existing structure. Usually these strategies are of heuristic nature. In this work, we present an exact approach that enables nearest neighbor-compliance by inserting a minimal number of SWAP gates. Experiments demonstrate the applicability of the approach which enabled a comparison of results obtained by heuristic methods to the actual optimum.
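
As an illustration of what a minimal SWAP count means, the following Python sketch finds it by exhaustive shortest-path search on a linear (1D) qubit arrangement. This is an assumption-laden toy, not the exact approach of the paper: the function name `min_swaps_linear`, the 1D layout and the search strategy are all choices made for this example. A state is the current qubit order plus the number of gates already executed. Executing an adjacent gate costs nothing, and each adjacent SWAP costs one.

```python
import heapq


def min_swaps_linear(num_qubits, gates):
    """Minimal number of adjacent SWAPs so that every two-qubit gate,
    taken in order, acts on neighbouring positions of a line."""
    start = (tuple(range(num_qubits)), 0)  # (order of qubits, gates executed)
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        cost, state = heapq.heappop(heap)
        order, done = state
        if cost > dist[state]:
            continue  # stale heap entry
        if done == len(gates):
            return cost
        a, b = gates[done]
        if abs(order.index(a) - order.index(b)) == 1:
            # Operands are adjacent: execute the gate at no cost.
            successors = [((order, done + 1), cost)]
        else:
            # Otherwise try every adjacent SWAP, each costing one.
            successors = []
            for i in range(num_qubits - 1):
                swapped = list(order)
                swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
                successors.append(((tuple(swapped), done), cost + 1))
        for nxt, new_cost in successors:
            if new_cost < dist.get(nxt, float("inf")):
                dist[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return None
```

The state space grows factorially with the number of qubits, so this only works for very small circuits. It can still serve as a reference optimum when checking a heuristic on toy inputs.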

78 citations

Proceedings Article•DOI•

Service adaptions for mixed-criticality systems

Pengcheng Huang¹, Georgia Giannopoulou¹, Nikolay Stoimenov¹, Lothar Thiele¹•Institutions (1)

ETH Zurich¹

20 Feb 2014

TL;DR: This paper studies the reconfiguration of services provided to low criticality tasks in reaction to the overruns of high criticality tasks, and derives tight analysis results under Earliest Deadline First (EDF) scheduling.

Abstract: Complex embedded systems are typically mixed-critical, where heterogeneous guarantees must be provided for functionalities of different criticalities. We study in this paper the reconfiguration of services provided to low criticality tasks in reaction to the overruns of high criticality tasks. We further investigate the quantification of the resetting time of the system services. For both service reconfiguration and resetting, we derive tight analysis results under Earliest Deadline First (EDF) scheduling.

62 citations

Proceedings Article•DOI•

Task- and network-level schedule co-synthesis of Ethernet-based time-triggered systems

Licong Zhang, Dip Goswami, Reinhard Schneider, Samarjit Chakraborty

20 Feb 2014

TL;DR: This work formulates the co-synthesis problem of task and communication schedules as a Mixed Integer Programming (MIP) model taking into account a number of Ethernet-specific timing parameters such as interframe gap, precision and synchronization error.

Abstract: In this paper, we study time-triggered distributed systems where periodic application tasks are mapped onto different end stations (processing units) communicating over a switched Ethernet network. We address the problem of application level (i.e., both task- and network-level) schedule synthesis and optimization. In this context, most of the recent works [10], [11] either focus on communication schedule or consider a simplified task model. In this work, we formulate the co-synthesis problem of task and communication schedules as a Mixed Integer Programming (MIP) model taking into account a number of Ethernet-specific timing parameters such as interframe gap, precision and synchronization error. Our formulation is able to handle one or multiple timing objectives such as application response time, end-to-end delay and their combinations. We show the applicability of our formulation considering an industrial size case study using a number of different sets of objectives. Further, we show that our formulation scales to systems with reasonably large size.

60 citations

Proceedings Article•DOI•

Storage-less and converter-less maximum power point tracking of photovoltaic cells for a nonvolatile microprocessor

Cong Wang¹, Naehyuck Chang², Younghyun Kim², Sangyoung Park², Yongpan Liu¹, Hyung Gyu Lee³, Rong Luo¹, Huazhong Yang¹•Institutions (3)

Tsinghua University¹, Seoul National University², Daegu University³

20 Feb 2014

TL;DR: This paper pioneers the maximum power point tracking (MPPT) of photovoltaic cells that directly supply power to a microprocessor without an energy storage element (a battery or a large-size capacitor) or power converters, with a huge reduction in cost, weight and volume, and an extended lifetime.

Abstract: This paper pioneers the maximum power point tracking (MPPT) of photovoltaic (PV) cells that directly supply power to a microprocessor without an energy storage element (a battery or a large-size capacitor) or power converters. The maximum power point tracking is conventionally performed by an MPPT charger that stores energy in the energy storage element, and a voltage regulator (typically a DC-DC converter) produces a proper voltage level for the microprocessor. The energy storage element is an energy buffer and makes it possible to perform MPPT of the PV cells and power management of the microprocessor independently. However, the energy storage element, MPPT charger and DC-DC converter cause seriously limited lifetime (when a typical battery is adopted), significant energy loss (typically over 20%), increased weight/volume and high cost, etc. The proposed method enables extremely fine-grain dynamic power management (DPM) every few hundred microseconds and performs the MPPT without using an MPPT charger, a DC-DC converter or an energy storage element. We achieve 84.5% of energy harvesting efficiency using the proposed setup with huge reduction in cost, weight and volume, and extended lifetime, which is not even numerically comparable with conventional MPPT methods.

60 citations

Proceedings Article•DOI•

A model-based design of Cyber-Physical Energy Systems

Mohammad Abdullah Al Faruque¹, Fereidoun Ahourai¹•Institutions (1)

University of California, Irvine¹

20 Feb 2014

TL;DR: An MBD method and its associated tool for the purpose of designing and validating various control algorithms for a residential microgrid are demonstrated, and various use cases are presented to demonstrate how different levels of control algorithms may be developed, simulated, debugged, and analyzed by using the GridMat toolbox.

Abstract: Cyber-Physical Energy Systems (CPES) are an amalgamation of both power grid technology, and the intelligent communication and co-ordination between the supply and the demand side through distributed embedded computing. Through this combination, CPES are intended to deliver power efficiently, reliably, and economically. The design and development work needed to either implement a new power grid network or upgrade a traditional power grid to a CPES-compliant one is both challenging and time consuming due to the heterogeneous nature of the associated components/subsystems. The Model Based Design (MBD) methodology has been widely seen as a promising solution to address the associated design challenges of creating a CPES. In this paper, we demonstrate an MBD method and its associated tool for the purpose of designing and validating various control algorithms for a residential microgrid. Our presented co-simulation engine GridMat is a MATLAB/Simulink toolbox; its purpose is to co-simulate the power systems modeled in GridLAB-D as well as the control algorithms that are modeled in Simulink. We have presented various use cases to demonstrate how different levels of control algorithms may be developed, simulated, debugged, and analyzed by using our GridMat toolbox for a residential microgrid.

56 citations

Proceedings Article•DOI•

The data center as a grid load stabilizer

Hao Chen¹, Michael C. Caramanis¹, Ayse K. Coskun¹•Institutions (1)

Boston University¹

20 Feb 2014

TL;DR: This paper proposes a dynamic control policy that modulates the data center power consumption in response to ISO requests by leveraging server power capping techniques and various server power states, and demonstrates that using this policy, data centers can provide fast reserves in quantities that are substantial proportions of their average energy consumption.

Abstract: To accommodate the increasing presence of volatile and intermittent renewable energy sources in power generation, independent system operators (ISOs) offer opportunities for demand side regulation service (RS) so as to stabilize the grid load. These power market features allow the demand side to earn monetary credits by modulating its power consumption dynamically following an RS signal broadcast by the ISO. This paper studies the capacities and benefits of a major potential demand side, the data center, to provide RS. We propose a dynamic control policy that modulates the data center power consumption in response to ISO requests by leveraging server power capping techniques and various server power states. Results demonstrate that using our policy, data centers can provide fast reserves in quantities that are substantial proportions (around 50%) of their average energy consumption, with no major deterioration in quality of service (QoS). By doing so, data centers decrease their energy costs by around 50%, while providing the ISOs and society in general with cost effective demand side reserves that render massive renewable generation adoption affordable.

52 citations

Proceedings Article•DOI•

CNPUF: A Carbon Nanotube-based Physically Unclonable Function for secure low-energy hardware design

S. T. Choden Konigsmark¹, Leslie K. Hwang¹, Deming Chen¹, Martin D. F. Wong¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

20 Feb 2014

TL;DR: Carbon Nanotube PUF is presented, the first PUF design that takes advantage of unique CNFET characteristics and achieves higher reliability against environmental variations and increased resistance against modeling attacks.

Abstract: Physically Unclonable Functions (PUFs) are used to provide identification, authentication and secret key generation based on unique and unpredictable physical characteristics. Carbon Nanotube Field Effect Transistors (CNFETs) were shown to have excellent electrical and unique physical characteristics and are promising candidates to replace silicon transistors in future Very Large Scale Integration (VLSI) designs. We present Carbon Nanotube PUF (CNPUF), the first PUF design that takes advantage of unique CNFET characteristics. CNPUF achieves higher reliability against environmental variations and increased resistance against modeling attacks. Furthermore, CNPUF achieves considerable power and energy reductions of 89.6% and 98%, respectively, in comparison to previous ultra-low power PUF designs. Additionally, CNPUF allows a power-security tradeoff.

Proceedings Article•DOI•

A network-flow-based optimal sample preparation algorithm for digital microfluidic biochips

Trung Anh Dinh¹, Shigeru Yamashita¹, Tsung-Yi Ho²•Institutions (2)

Ritsumeikan University¹, National Cheng Kung University²

20 Feb 2014

TL;DR: For the first time, an optimal sample preparation algorithm based on a minimum-cost maximum-flow model is presented that can obtain both the optimal cost of sample and buffer usage and the waste amount even for multiple-target concentrations.

Abstract: Sample preparation, which is a front-end process to produce droplets of the desired target concentrations from input reagents, plays a pivotal role in every assay, laboratory, and application in biomedical engineering and life science. The consumption of sample/buffer/waste is usually used to evaluate the effectiveness of a sample preparation process. In this paper, for the first time, we present an optimal sample preparation algorithm based on a minimum-cost maximum-flow model. By using the proposed model, we can obtain both the optimal cost of sample and buffer usage and the waste amount even for multiple-target concentrations. Experiments demonstrate that we can consistently achieve much better results not only in the consumption of sample and buffer but also in the waste amount when compared with all state-of-the-art previous approaches.

Proceedings Article•DOI•

Architectural aspects in design and analysis of SOT-based memories

Rajendra Bishnoi¹, Mojtaba Ebrahimi¹, Fabian Oboril¹, Mehdi B. Tahoori¹•Institutions (1)

Karlsruhe Institute of Technology¹

20 Feb 2014

TL;DR: This work provides a very detailed analysis of SOT-MRAM at both circuit- and architecture-level, and presents a detailed evaluation of performance and energy related parameters and compares the novel SOT-MRAM with several other memory technologies.

Abstract: Magnetic Random Access Memory (MRAM) is a very promising emerging memory technology because of its various advantages such as non-volatility, high density and scalability. In particular, Spin Orbit Torque (SOT) MRAM is gaining interest as it comes along with all the benefits of its predecessor Spin Transfer Torque (STT) MRAM, but is supposed to eliminate some of its shortcomings. Especially the split of read and write paths in SOT-MRAM promises faster access times and lower energy consumption compared to STT-MRAM. In this work, we provide a very detailed analysis of SOT-MRAM at both circuit- and architecture-level. We present a detailed evaluation of performance and energy related parameters and compare the novel SOT-MRAM with several other memory technologies. Our architecture-level analysis shows that with a hybrid combination of SRAM for the L1-cache and SOT-MRAM for the L2-cache the energy consumption can be reduced by 63% on average while the performance can be increased by 1%. In addition, the memory area is 43% lower compared to an SRAM-only configuration.

Proceedings Article•DOI•

QoS-aware dynamic resource allocation for spatial-multitasking GPUs

Paula Aguilera¹, Katherine Morrow¹, Nam Sung Kim¹•Institutions (1)

University of Wisconsin-Madison¹

20 Feb 2014

TL;DR: A runtime technique to dynamically partition GPU resources between concurrently running applications - at least one of which has a quality-of-service requirement - can satisfy a 100% QoS requirement while also achieving either a 7W power consumption reduction or a 17.57% performance improvement for co-executing best-effort applications.

Abstract: General-purpose computing on GPUs (GPGPU computing) is becoming widely adopted; however, some GPGPU applications fail to fully utilize GPU resources. In these cases, spatial multitasking better exploits the parallelism offered by GPUs by partitioning the GPU resources among simultaneously-running applications. When one or more such applications have quality-of-service (QoS) requirements, enough resources must be allocated for those applications to satisfy their requirements. Remaining resources can be either disabled to reduce power consumption or used to accelerate other applications. However, we observe that the amount of resources for a QoS application to satisfy its performance requirement is dependent in part upon the co-executing applications. In this paper, we propose a runtime technique to dynamically partition GPU resources between concurrently running applications - at least one of which has a QoS requirement. We demonstrate that the proposed technique can satisfy a 100% QoS requirement while also achieving either a 7W power consumption reduction or a 17.57% performance improvement for co-executing best-effort applications.

Proceedings Article•DOI•

Efficient synthesis of quantum circuits implementing Clifford group operations

Philipp Niemann¹, Robert Wille¹, Rolf Drechsler¹•Institutions (1)

University of Bremen¹

20 Feb 2014

TL;DR: An automatic synthesis approach for quantum circuits that implement Clifford Group operations that exploits specific properties of the unitary transformation matrices that are associated to quantum operations is proposed.

Abstract: Quantum circuits established themselves as a promising emerging technology and, hence, attracted considerable attention in the domain of computer-aided design. As a result, many approaches for synthesis of corresponding netlists have been proposed in the last decade. However, as the design of quantum circuits faces serious obstacles caused by phenomena such as superposition, entanglement, and phase shifts, automatic synthesis still represents a significant challenge. In this paper, we propose an automatic synthesis approach for quantum circuits that implement Clifford Group operations. These circuits are essential for many quantum applications and cover core aspects of quantum functionality. The proposed approach exploits specific properties of the unitary transformation matrices that are associated to quantum operations. Furthermore, Quantum Multiple-Valued Decision Diagrams (QMDDs) are employed for an efficient representation of these matrices. Experimental results confirm that this enables a compact realization of the respective quantum functionality.

Proceedings Article•DOI•

Adjustable contiguity of run-time task allocation in networked many-core systems

Mohammad Fattah¹, Pasi Liljeberg¹, Juha Plosila¹, Hannu Tenhunen¹•Institutions (1)

Information Technology University¹

01 Jan 2014

TL;DR: Up to 35% drop in the network costs can be gained by adjusting the level of contiguity compared to non-contiguous cases, while the achieved throughput is kept constant in CASqA.

Abstract: In this paper, we propose a run-time mapping algorithm, CASqA, for networked many-core systems. In this algorithm, the level of contiguousness of the allocated processors (α) can be adjusted in a fine-grained fashion. A strictly contiguous allocation (α = 0) decreases the latency and power dissipation of the network and improves the applications' execution time. However, it limits the achievable throughput and increases the turnaround time of the applications. As a result, recent works consider non-contiguous allocation (α = 1) to improve the throughput, traded off against the applications' execution time and network metrics. In contrast, our experiments show that a higher throughput (by 3%) with improved network performance can be achieved when using intermediate α values. More precisely, up to 35% drop in the network costs can be gained by adjusting the level of contiguity compared to non-contiguous cases, while the achieved throughput is kept constant. Moreover, CASqA provides at least 32% energy saving in the network compared to other works.

Proceedings Article•DOI•

A coherent hybrid SRAM and STT-RAM L1 cache architecture for shared memory multicores

Jianxing Wang¹, Yenni Tim¹, Weng-Fai Wong¹, Zhong-Liang Ong¹, Zhenyu Sun², Hai Li²•Institutions (2)

National University of Singapore¹, University of Pittsburgh²

20 Feb 2014

TL;DR: This paper proposes a hybrid L1 cache architecture that incorporates both SRAM and STT-RAM, and the key novelty of the proposal is the exploitation of the MESI cache coherence protocol to perform dynamic block reallocation between different cache partitions.

Abstract: STT-RAM is an emerging NVRAM technology that promises high density, low energy and a comparable access speed to conventional SRAM. This paper proposes a hybrid L1 cache architecture that incorporates both SRAM and STT-RAM. The key novelty of the proposal is the exploitation of the MESI cache coherence protocol to perform dynamic block reallocation between different cache partitions. Compared to the pure SRAM-based design, our hybrid scheme achieves 38% energy savings with a mere 0.8% IPC degradation while extending the lifespan of the STT-RAM partition at the same time.

Proceedings Article•DOI•

HDTV1080p HEVC Intra encoder with source texture based CU/PU mode pre-decision

Jia Zhu¹, Zhenyu Liu¹, Dongsheng Wang¹, Qingrui Han², Yang Song²•Institutions (2)

Tsinghua University¹, Huawei²

01 Jan 2014

TL;DR: To alleviate the burden of the Intra encoder, the RD-cost is estimated from the source image textures, and two promising CU/PU mode candidates are dynamically selected to execute exhaustive RDO processing.

Abstract: HEVC doubles the coding efficiency with more than 4x coding complexity as compared to H.264/AVC. To alleviate the burden of the Intra encoder, we estimate the RD-cost from the source image textures, and dynamically select two promising CU/PU mode candidates to execute exhaustive RDO processing. As integrated in our hardwired encoder, an average of 61.7% of the computation complexity was saved with a 4.53% rate increase. With TSMC 90nm technology, the real-time encoder for HDTV1080p at 44fps is implemented with 2269k gates at a 357MHz operating frequency.

Proceedings Article•DOI•

A comprehensive and accurate latency model for Network-on-Chip performance analysis

Zhiliang Qian¹, Da-Cheng Juan², Paul Bogdan³, Chi-Ying Tsui¹, Diana Marculescu², Radu Marculescu²•Institutions (3)

Hong Kong University of Science and Technology¹, Carnegie Mellon University², University of Southern California³

20 Feb 2014

TL;DR: The proposed framework analyzes the links dependency and then determines the ordering of queuing analysis for performance modeling, and can be used to analyze various traffic scenarios for NoC platforms with arbitrary buffer and packet lengths.

Abstract: In this work, we propose a new, accurate, and comprehensive analytical model for Network-on-Chip (NoC) performance analysis. Given the application communication graph, the NoC architecture, and the routing algorithm, the proposed framework analyzes the links dependency and then determines the ordering of queuing analysis for performance modeling. The channel waiting times in the links are estimated using a generalized G/G/1/K queuing model, which can tackle bursty traffic and dependent arrival times with general service time distributions. The proposed model is general and can be used to analyze various traffic scenarios for NoC platforms with arbitrary buffer and packet lengths. Experimental results on both synthetic and real applications demonstrate the accuracy and scalability of the newly proposed model.

Proceedings Article•DOI•

The stochastic modeling of TiO2 memristor and its usage in neuromorphic system design

Miao Hu¹, Yu Wang², Qinru Qiu³, Yi Chen¹, Hai Li¹•Institutions (3)

University of Pittsburgh¹, Tsinghua University², Syracuse University³

01 Jan 2014

TL;DR: A macro cell design composed of multiple parallel-connected memristors can be successfully used in implementing the weight storage unit and the stochastic neuron - the two fundamental components in neural networks (NNs), providing a feasible solution in memristor-based hardware implementation.

Abstract: Memristor, the fourth basic circuit element, has shown great potential in neuromorphic circuit design for its unique synapse-like feature. However, though the continuous resistance state of memristor has been expected, obtaining and maintaining an arbitrary intermediate state cannot be well controlled in today's memristive systems. In addition, stochastic switching behaviors have been widely observed. To facilitate the investigation of memristor-based hardware implementation, we built a stochastic behavior model of TiO2 memristive devices based on real experimental results. By leveraging the stochastic behavior of memristors, a macro cell design composed of multiple parallel-connected memristors can be successfully used in implementing the weight storage unit and the stochastic neuron - the two fundamental components in neural networks (NNs), providing a feasible solution in memristor-based hardware implementation.

Proceedings Article•DOI•

ABCD-NL: Approximating Continuous non-linear dynamical systems using purely Boolean models for analog/mixed-signal verification

Aadithya V. Karthik¹, Sayak Ray¹, Pierluigi Nuzzo¹, Alan Mishchenko¹, Robert K. Brayton¹, Jaijeet Roychowdhury¹•Institutions (1)

University of California, Berkeley¹

20 Feb 2014

TL;DR: This work formally verifies the throughput of an AMS signaling system - modelled in SPICE using 22nm BSIM4 transistors, Booleanized with high accuracy using ABCD-NL, and property-checked using ABC.

Abstract: We present ABCD-NL, a technique that approximates non-linear analog circuits using purely Boolean models, to high accuracy. Given an analog/mixed-signal (AMS) system (e.g., a SPICE netlist), ABCD-NL produces a Boolean circuit representation (e.g., an And Inverter Graph, Finite State Machine, or Binary Decision Diagram) that captures the I/O behaviour of the given system, to near SPICE-level accuracy, without making any a priori simplifications. The Boolean models produced by ABCD-NL can be used for high-speed simulation and formal verification of AMS designs, by leveraging existing tools developed for Boolean/hybrid systems analysis (e.g., ABC [1]). We apply ABCD-NL to a number of SPICE-level AMS circuits, including data converters, charge pumps, comparators, non-linear signaling/communications sub-systems, etc. Also, we formally verify the throughput of an AMS signaling system - modelled in SPICE using 22nm BSIM4 transistors, Booleanized with high accuracy using ABCD-NL, and property-checked using ABC.

Proceedings Article•DOI•

DPA: A data pattern aware error prevention technique for NAND flash lifetime extension

Jie Guo¹, Zhijie Chen¹, Danghui Wang, Zili Shao, Yi Chen¹•Institutions (1)

University of Pittsburgh¹

20 Feb 2014

TL;DR: A Data Pattern Aware (DPA) error protection technique is proposed to extend the lifespan of NAND flash based storage systems (NFSS) by up to 4×, offering a complementary solution to other lifetime enhancement techniques such as wear-leveling.

Abstract: Recent research reveals that the bit error rate of a NAND flash cell is highly dependent on the stored data patterns. In this work, we propose a Data Pattern Aware (DPA) error protection technique to extend the lifespan of NAND flash based storage systems (NFSS). DPA manipulates the ratio of 1's and 0's in the stored data to minimize the occurrence of data patterns that are susceptible to bit error noise. Consequently, the NAND flash cell bit error rate is reduced, which extends system endurance. Our simulation results show that, with marginal hardware and power overhead, the DPA scheme can increase the NFSS lifetime by up to 4×, offering a complementary solution to other lifetime enhancement techniques such as wear-leveling.
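
The abstract does not say how DPA changes the ratio of 1's and 0's, so the following is only a minimal sketch of one generic way to bias that ratio: invert a word whenever the favored bit value is in the minority, and store a one-bit flag so the word can be recovered. The function names, the flag layout, and the `favored` parameter (which bit value is assumed to be less error-prone) are hypothetical and are not taken from the paper.

```python
def encode_word(bits, favored=1):
    """Return (flag, stored_bits); invert the word if `favored` is in the minority.

    Illustrative flag-bit inversion only, not the DPA algorithm itself.
    `favored` is an assumed parameter naming the less error-prone bit value.
    """
    favored_count = sum(1 for b in bits if b == favored)
    if 2 * favored_count < len(bits):
        # Favored value is in the minority: store the complement and set the flag.
        return 1, [1 - b for b in bits]
    return 0, list(bits)


def decode_word(flag, stored_bits):
    """Undo encode_word: invert again if the flag is set."""
    return [1 - b for b in stored_bits] if flag else list(stored_bits)


if __name__ == "__main__":
    flag, stored = encode_word([0, 0, 0, 1])
    print(flag, stored, decode_word(flag, stored))
```

After encoding, at least half of the stored bits in every word hold the favored value, at the cost of one extra flag bit per word.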

Proceedings Article•DOI•

Through-silicon-via inductor: Is it real or just a fantasy?

Umamaheswara Rao Tida, Cheng Zhuo¹, Yiyu Shi•Institutions (1)

Intel¹

20 Feb 2014

TL;DR: A novel shield mechanism utilizing the micro-channel, a technique conventionally used for heat removal, to reduce the substrate loss and increase the quality factor and the inductance of the TSV inductor is proposed.

Abstract: Through-silicon-vias (TSVs) can potentially be used to implement inductors in three-dimensional (3D) integrated systems for minimal footprint and large inductance. However, unlike conventional 2D spiral inductors, TSV inductors are fully buried in the lossy substrate and thus suffer from a low quality factor. In this paper, we propose a novel shield mechanism utilizing the micro-channel, a technique conventionally used for heat removal, to reduce the substrate loss. This technique increases the quality factor and the inductance of the TSV inductor by up to 21x and 17x, respectively. It enables us to implement TSV inductors with up to 38x smaller area and 33% higher quality factor than spiral inductors of the same inductance. To the best of the authors' knowledge, this is the first proposal for improving the quality factor of TSV inductors. We hope our study points out a new and exciting research direction for 3D IC designers.

Proceedings Article•DOI•

Efficient techniques for the capacitance extraction of chip-scale VLSI interconnects using floating random walk algorithm

Chao Zhang¹, Wenjian Yu¹•Institutions (1)

Tsinghua University¹

01 Jan 2014

TL;DR: To enable capacitance extraction of chip-scale VLSI layouts using the floating random walk (FRW) algorithm, two techniques are proposed, including a virtual Gaussian surface sampling technique that performs efficient random sampling on the Gaussian surface for complex nets with vias and optimizes the sampling scheme to reduce the random walk time.

Abstract: To enable capacitance extraction of chip-scale VLSI layouts using the floating random walk (FRW) algorithm, two techniques are proposed. The first is a virtual Gaussian surface sampling technique. It performs efficient random sampling on the Gaussian surface for complex nets with vias, and optimizes the sampling scheme to reduce the random walk time. The second is a parallelized, improved construction approach for the Octree-based space management structure. It can be over 5000X faster than the existing approach and provides the same convenience to the FRW procedure. Numerical experiments on large cases with up to half a million conductors validate the proposed techniques and demonstrate a fast FRW solver for chip-scale extraction tasks.
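
For readers unfamiliar with random-walk field solvers, the sketch below shows the general idea using walk-on-spheres, a simpler related Monte Carlo method for Laplace's equation. It is not the paper's FRW algorithm: it has no Gaussian surface sampling, no conductors, and no octree, and it works on a 2D unit square chosen purely for illustration. All function names and parameters are hypothetical.

```python
import math
import random


def distance_to_boundary(x, y):
    """Distance from (x, y) to the nearest edge of the unit square."""
    return min(x, 1.0 - x, y, 1.0 - y)


def nearest_boundary_point(x, y):
    """Project (x, y) onto the closest edge of the unit square."""
    d = distance_to_boundary(x, y)
    if d == x:
        return 0.0, y
    if d == 1.0 - x:
        return 1.0, y
    if d == y:
        return x, 0.0
    return x, 1.0


def estimate_potential(x0, y0, boundary_value, walks=2000, eps=1e-3, seed=0):
    """Walk-on-spheres estimate of a harmonic function at (x0, y0).

    Each walk jumps to a uniformly random point on the largest circle that
    fits inside the domain, until it lands within `eps` of the boundary.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        x, y = x0, y0
        while True:
            r = distance_to_boundary(x, y)
            if r < eps:
                break
            theta = rng.uniform(0.0, 2.0 * math.pi)
            x += r * math.cos(theta)
            y += r * math.sin(theta)
        bx, by = nearest_boundary_point(x, y)
        total += boundary_value(bx, by)
    return total / walks


if __name__ == "__main__":
    # u(x, y) = x is harmonic, so the estimate should be close to x0.
    print(estimate_potential(0.3, 0.5, lambda bx, by: bx))
```

Because the walks are independent, this kind of solver parallelizes naturally, and most of the per-step cost is the distance query, which is why a fast space management structure matters at chip scale.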

Proceedings Article•DOI•

Energy efficient in-memory machine learning for data intensive image-processing by non-volatile domain-wall memory

Hao Yu¹, Yuhao Wang¹, Shuai Chen¹, Wei Fei¹, Chuliang Weng², Junfeng Zhao², Zhulin Wei²•Institutions (2)

Nanyang Technological University¹, Huawei²

01 Jan 2014

TL;DR: It is shown that all operations involved in machine learning on neural networks can be mapped to a logic-in-memory architecture, called DW-NN, built from non-volatile domain-wall nanowires.

Abstract: Image processing in conventional logic-memory I/O-integrated systems incurs significant communication congestion at memory I/Os for excessively large image data at exa-scale. This paper explores an in-memory machine learning architecture on neural networks, called DW-NN, that utilizes the newly introduced domain-wall nanowire. We show that all operations involved in machine learning on neural networks can be mapped to a logic-in-memory architecture built from non-volatile domain-wall nanowires. Domain-wall nanowire based logic is customized for in-memory machine learning within image data storage. As such, both neural network training and processing can be performed locally within the memory. The experimental results show that system throughput in DW-NN is improved by 11.6x and energy efficiency is improved by 92x compared to a conventional image processing system.

Proceedings Article•DOI•

An overview of spin-based integrated circuits

Wang Kang, Weisheng Zhao, Zhaohao Wang, Jacques-Olivier Klein, Yue Zhang, Djaafar Chabi, Youguang Zhang¹, Dafiné Ravelosona, Claude Chappert•Institutions (1)

Beihang University¹

20 Feb 2014

TL;DR: The status and prospects of spin-based integrated circuits under intense investigation are overviewed, with particular attention to their merits and challenges for practical applications.

Abstract: Conventional CMOS integrated circuits suffer from severe power and scalability challenges as technology scales into ultra-deep-submicron nodes. Alternative approaches beyond charge-only based circuits are therefore being sought. In particular, spin-based devices and integrated circuits show promising merits for overcoming these issues by adding the spin degree of freedom of electrons to electronic circuits. Spintronics has now become a hot topic in both academia and industry. This paper overviews the status and prospects of spin-based integrated circuits under intense investigation and addresses in particular their merits and challenges for practical applications.

Proceedings Article•DOI•

Modeling and design analysis of 3D vertical resistive memory — A low cost cross-point architecture

Cong Xu¹, Niu Dimin¹, Shimeng Yu², Yuan Xie¹•Institutions (2)

Pennsylvania State University¹, Arizona State University²

20 Feb 2014

TL;DR: An array-level model is developed which is capable of analyzing the read/write noise margin of a 3D-VRAM array in the presence of the sneak leakage current and voltage drop and a system-level design tool is built that is able to explore the design space with specified constraints and find the optimal design points with different targets.

Abstract: Resistive Random Access Memory (ReRAM) is one of the most promising emerging non-volatile memory (NVM) candidates due to its fast read/write speed, excellent scalability, and low-power operation. The recently proposed 3D vertical cross-point ReRAM (3D-VRAM) architecture has attracted a lot of attention because it offers a cost-competitive replacement for NAND Flash. In this work, we first develop an array-level model that includes the geometries and properties of all the components in the 3D structure. The model is capable of analyzing the read/write noise margin of a 3D-VRAM array in the presence of sneak leakage current and voltage drop. We then build a system-level design tool that explores the design space under specified constraints and finds the optimal design points for different targets. We also study the impact of different design parameters on the array size, bit density, and overall cost-per-bit. Compared to the state-of-the-art 3D horizontal ReRAM (3D-HRAM), the 3D-VRAM shows a great cost advantage when stacking more than 16 layers.

Proceedings Article•DOI•

Low power design of the next-generation High Efficiency Video Coding

Muhammad Shafique¹, Jorg Henkel¹•Institutions (1)

Karlsruhe Institute of Technology¹

20 Feb 2014

TL;DR: A comprehensive analysis of the computational complexity, power consumption, temperature, and memory access behavior for the next-generation High Efficiency Video Coding (HEVC) standard is provided.

Abstract: This paper provides a comprehensive analysis of the computational complexity, power consumption, temperature, and memory access behavior of the next-generation High Efficiency Video Coding (HEVC) standard. We highlight the associated design challenges and present several low-power algorithmic and architectural techniques for developing power-efficient HEVC-based multimedia systems. We explore the interplay between the algorithms and architectures to provide high power efficiency while leveraging application-specific knowledge and video content characteristics.

Proceedings Article•DOI•

3DLAT: TSV-based 3D ICs crosstalk minimization utilizing Less Adjacent Transition code

Qiaosha Zou¹, Niu Dimin¹, Yan Cao¹, Yuan Xie¹•Institutions (1)

Pennsylvania State University¹

01 Jan 2014

TL;DR: A novel ω-LAT coding scheme is proposed to reduce the capacitive crosstalk and minimize the power consumption overhead in the TSV array. Combined with transition signaling, the LAT coding scheme restricts the number of transitions in every transmission cycle to minimize crosstalk and power consumption.

Abstract: 3D integration is one of the promising solutions for overcoming the interconnect bottleneck, using vertical through-silicon via (TSV) interconnects. This paper investigates crosstalk in 3D IC designs, especially the capacitive crosstalk in TSV interconnects. We propose a novel ω-LAT coding scheme to reduce the capacitive crosstalk and minimize the power consumption overhead in the TSV array. Combined with transition signaling, the LAT coding scheme restricts the number of transitions in every transmission cycle to minimize crosstalk and power consumption. Compared to other 3D crosstalk minimization coding schemes, the proposed coding can provide the same delay reduction with more affordable overhead. The performance and power analysis shows that when ω is 4, the proposed LAT coding scheme can achieve a 38% reduction in interconnect crosstalk delay compared to data transmission without coding. By reducing the value of ω, further reduction can be achieved.
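
Transition signaling, which the abstract combines with the LAT code, can be sketched as follows: a wire toggles in a cycle exactly when the corresponding codeword bit is 1, so limiting the 1's in each codeword limits the transitions on the bus. The ω-LAT codebook itself is not described in the abstract and is not reproduced here; the function names and the all-zero initial wire state are assumptions made for illustration.

```python
def ts_encode(codewords, width):
    """Map codewords to wire levels: a 1 bit toggles its wire, a 0 bit holds it."""
    state = [0] * width  # assumed initial wire levels
    wire_levels = []
    for word in codewords:
        state = [s ^ b for s, b in zip(state, word)]
        wire_levels.append(list(state))
    return wire_levels


def ts_decode(wire_levels, width):
    """Recover codewords by XOR-ing each cycle's levels with the previous cycle's."""
    prev = [0] * width  # must match the encoder's initial state
    codewords = []
    for levels in wire_levels:
        codewords.append([a ^ b for a, b in zip(levels, prev)])
        prev = levels
    return codewords


def transitions_per_cycle(codeword):
    """Under transition signaling, the wire transitions equal the 1's in the codeword."""
    return sum(codeword)


if __name__ == "__main__":
    words = [[1, 0, 1], [1, 1, 0], [0, 0, 0]]
    levels = ts_encode(words, 3)
    print(levels, ts_decode(levels, 3))
```

With this mapping, any constraint a code places on the pattern of 1's in a codeword becomes a constraint on which wires switch in the same cycle.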

Proceedings Article•DOI•

Self-aligned double patterning layout decomposition with complementary e-beam lithography

Jhih-Rong Gao¹, Bei Yu¹, David Z. Pan¹•Institutions (1)

University of Texas at Austin¹

20 Feb 2014

TL;DR: A new layout decomposition framework for self-aligned double patterning and complementary EBL is presented, which considers overlay minimization and EBL throughput optimization simultaneously and performs conflict elimination with a merge-and-cut technique.

Abstract: Advanced lithography techniques enable higher pattern resolution; however, techniques such as extreme ultraviolet lithography and e-beam lithography (EBL) are not yet ready for high-volume production. Recently, complementary lithography has become promising; it allows two different lithography processes to work together to achieve high-quality layout patterns without greatly increasing manufacturing cost. In this paper, we present a new layout decomposition framework for self-aligned double patterning and complementary EBL, which considers overlay minimization and EBL throughput optimization simultaneously. We perform conflict elimination with a merge-and-cut technique and formulate it as a matching-based problem. The results show that our approach is fast and effective: all conflicts are resolved with minimal overlay error and e-beam utilization.
