
Showing papers published by Hewlett-Packard in 2016


Journal ArticleDOI
18 Jun 2016
TL;DR: This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner.
Abstract: A number of recent efforts have attempted to design accelerators for popular machine learning algorithms, such as those involving convolutional and deep neural networks (CNNs and DNNs). These algorithms typically involve a large number of multiply-accumulate (dot-product) operations. A recent project, DaDianNao, adopts a near data processing approach, where a specialized neural functional unit performs all the digital arithmetic operations and receives input weights from adjacent eDRAM banks. This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner. While the use of crossbar memory as an analog dot-product engine is well known, no prior work has designed or characterized a full-fledged accelerator based on crossbars. In particular, our work makes the following contributions: (i) We design a pipelined architecture, with some crossbars dedicated for each neural network layer, and eDRAM buffers that aggregate data between pipeline stages. (ii) We define new data encoding techniques that are amenable to analog computations and that can reduce the high overheads of analog-to-digital conversion (ADC). (iii) We define the many supporting digital components required in an analog CNN accelerator and carry out a design space exploration to identify the best balance of memristor storage/compute, ADCs, and eDRAM storage on a chip. On a suite of CNN and DNN workloads, the proposed ISAAC architecture yields improvements of 14.8×, 5.5×, and 7.5× in throughput, energy, and computational density (respectively), relative to the state-of-the-art DaDianNao architecture.
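The analog dot-product idea at the heart of this approach can be illustrated with a few lines of linear algebra: weights are stored as crossbar conductances, inputs are applied as row voltages, and each column wire sums its currents per Kirchhoff's law. The sketch below is a minimal functional model with hypothetical values, not the ISAAC design itself.

```python
import numpy as np

def crossbar_dot_product(conductances, voltages):
    # Each column current is I_j = sum_i V_i * G_ij, so reading all
    # columns at once yields a full matrix-vector product in one step.
    return voltages @ conductances

# Hypothetical weights stored as conductances (siemens) and inputs
# applied as row voltages (volts).
G = np.array([[1e-6, 2e-6],
              [3e-6, 4e-6]])
V = np.array([0.5, 1.0])
I = crossbar_dot_product(G, V)  # column currents, in amps
```

In hardware this multiply-accumulate happens in the analog domain in a single read cycle, which is what makes the crossbar attractive as a dot-product engine.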

1,558 citations


Journal ArticleDOI
18 Jun 2016
TL;DR: This work proposes a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM-based main memory, which distinguishes itself from prior work on NN acceleration with significant performance improvement and energy saving.
Abstract: Processing-in-memory (PIM) is a promising solution to address the "memory wall" challenges for future computer systems. Prior proposed PIM architectures put additional computation logic in or near memory. The emerging metal-oxide resistive random access memory (ReRAM) has shown its potential to be used for main memory. Moreover, with its crossbar array structure, ReRAM can perform matrix-vector multiplication efficiently, and has been widely studied to accelerate neural network (NN) applications. In this work, we propose a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM-based main memory. In PRIME, a portion of ReRAM crossbar arrays can be configured as accelerators for NN applications or as normal memory for a larger memory space. We provide microarchitecture and circuit designs to enable the morphable functions with an insignificant area overhead. We also design a software/hardware interface for software developers to implement various NNs on PRIME. Benefiting from both the PIM architecture and the efficiency of using ReRAM for NN computation, PRIME distinguishes itself from prior work on NN acceleration, with significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance by ~2360× and reduces the energy consumption by ~895×, across the evaluated machine learning benchmarks.

1,197 citations


Proceedings ArticleDOI
05 Jun 2016
TL;DR: The Dot-Product Engine (DPE) is developed as a high-density, high-power-efficiency accelerator for approximate matrix-vector multiplication, together with a conversion algorithm that maps arbitrary matrix values appropriately to memristor conductances in a realistic crossbar array.
Abstract: Vector-matrix multiplication dominates the computation time and energy for many workloads, particularly neural network algorithms and linear transforms (e.g., the Discrete Fourier Transform). Utilizing the natural current accumulation feature of memristor crossbars, we developed the Dot-Product Engine (DPE) as a high-density, high-power-efficiency accelerator for approximate matrix-vector multiplication. We first invented a conversion algorithm to map arbitrary matrix values appropriately to memristor conductances in a realistic crossbar array, accounting for device physics and circuit issues to reduce computational errors. The accurate device resistance programming in large arrays is enabled by closed-loop pulse tuning and access transistors. To validate our approach, we simulated and benchmarked one of the state-of-the-art neural networks for pattern recognition on the DPEs. The result shows no accuracy degradation compared to the software approach (99% pattern recognition accuracy for the MNIST data set) with only a 4-bit DAC/ADC requirement, while the DPE can achieve a speed-efficiency product of 1,000× to 10,000× compared to a custom digital ASIC.
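The conversion step can be pictured as an affine rescaling of matrix values into the devices' usable conductance window; the paper's actual algorithm additionally accounts for device physics and circuit non-idealities, so the sketch below (with made-up bounds) captures only the first-order idea.

```python
import numpy as np

def map_to_conductance(M, g_min=1e-6, g_max=1e-4):
    # Affine map of arbitrary matrix values onto [g_min, g_max],
    # a hypothetical usable conductance window for the devices.
    lo, hi = float(M.min()), float(M.max())
    return g_min + (M - lo) * (g_max - g_min) / (hi - lo)

M = np.array([[0.0, 0.5],
              [1.0, 0.25]])
G = map_to_conductance(M)  # every entry now lies in [1e-6, 1e-4]
```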

603 citations


Proceedings ArticleDOI
05 Jun 2016
TL;DR: This work proposes Pinatubo, a Processing In Non-volatile memory ArchiTecture for bUlk Bitwise Operations, which redesigns the read circuitry so that it can compute the bitwise logic of two or more memory rows very efficiently, and support one-step multi-row operations.
Abstract: Processing-in-memory (PIM) provides high bandwidth, massive parallelism, and high energy efficiency by implementing computations in main memory, thereby eliminating the overhead of data movement between CPU and memory. While most recent work has focused on PIM in DRAM with 3D die-stacking technology, we propose to leverage the unique features of emerging non-volatile memory (NVM), such as resistance-based storage and current sensing, to enable efficient PIM design in NVM. We propose Pinatubo, a Processing In Non-volatile memory ArchiTecture for bUlk Bitwise Operations. Instead of integrating complex logic inside the cost-sensitive memory, Pinatubo redesigns the read circuitry so that it can compute the bitwise logic of two or more memory rows very efficiently, and support one-step multi-row operations. The experimental results on data-intensive graph processing and database applications show that Pinatubo achieves a ∼500× speedup and ∼28,000× energy saving on bitwise operations, and 1.12× overall speedup and 1.11× overall energy saving over the conventional processor.
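Functionally, a one-step multi-row operation reduces several activated memory rows to a single result row. The software model below only illustrates the semantics with hypothetical rows; in Pinatubo the reduction happens inside the modified read circuitry via current sensing, not in software.

```python
from functools import reduce

# Three hypothetical 4-bit memory rows activated simultaneously.
rows = [0b1100, 0b1010, 0b0110]

or_result = reduce(lambda a, b: a | b, rows)   # one-step multi-row OR
and_result = reduce(lambda a, b: a & b, rows)  # one-step multi-row AND
```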

389 citations


Journal ArticleDOI
TL;DR: In this Review, memristors are examined from the frameworks of both von Neumann and neuromorphic computing architectures and a new logic computational process based on the material implication is discussed, which will substantially decrease the energy consumption for futuristic information technology.
Abstract: In this Review, memristors are examined from the frameworks of both von Neumann and neuromorphic computing architectures. For the former, a new logic computational process based on material implication is discussed. It consists of several memristors that play the roles of combined logic processor and memory, called a stateful logic circuit. In this circuit configuration, the logic process flows primarily along a time dimension, whereas in current von Neumann computers it occurs along a spatial dimension. In the stateful logic computation scheme, the energy required for data transfer between the logic and memory chips can be saved. The non-volatile memory in this circuit also saves the energy required for data refresh. Neuromorphic (cognitive) computing refers to a computing paradigm that mimics the human brain. Currently, neuromorphic or cognitive computing mainly relies on software emulation of several brain functionalities, such as image and voice recognition, utilizing the recently highlighted deep learning algorithms. However, the human brain typically consumes ≈10–20 W for selected “human-like” tasks, which at present can be mimicked only by a supercomputer consuming several tens of kilowatts to megawatts. Therefore, hardware implementation of such brain functionality must eventually be sought for power-efficient computation. Several fundamental ideas for utilizing memristors in these regards, and recent progress on them, are reviewed. Finally, material and processing issues are dealt with, followed by the conclusion and outlook of the field. These technical improvements will substantially decrease the energy consumption of futuristic information technology.
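The material-implication ("stateful") logic discussed for the von Neumann framework can be sketched in a few lines: IMPLY computes (NOT p) OR q, and combined with a cell cleared to 0 it yields NAND, which is functionally complete. In a memristor circuit the result is written into a device's resistance state rather than carried on a wire; the Boolean model below is only the logical skeleton.

```python
def imply(p, q):
    # Material implication: p IMP q == (NOT p) OR q.
    return (not p) or q

def nand(p, q):
    # NAND(p, q) == p IMP (q IMP FALSE); FALSE plays the role of a
    # work cell cleared to 0 before the operation.
    return imply(p, imply(q, False))

table = {(p, q): nand(p, q) for p in (False, True) for q in (False, True)}
```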

260 citations


Journal ArticleDOI
TL;DR: In this article, the formation of an Al-rich conduction channel through the AlN layer is revealed, and the motion of positively charged nitrogen vacancies is likely responsible for the observed switching.
Abstract: High-performance memristors based on AlN films have been demonstrated, which exhibit ultrafast ON/OFF switching times (≈85 ps for microdevices with waveguide) and relatively low switching current (≈15 μA for 50 nm devices). Physical characterizations are carried out to understand the device switching mechanism, and rationalize speed and energy performance. The formation of an Al-rich conduction channel through the AlN layer is revealed. The motion of positively charged nitrogen vacancies is likely responsible for the observed switching.

245 citations


Journal ArticleDOI
TL;DR: Evaluating occupancy trends for 511 populations of terrestrial mammals and birds, representing 244 species from 15 tropical forest protected areas on three continents, finds that occupancy declined in 22%, increased in 17%, and exhibited no change in 22% of populations during the last 3–8 years, while 39% of populations were detected too infrequently to assess occupancy changes.
Abstract: Extinction rates in the Anthropocene are three orders of magnitude higher than background and disproportionately occur in the tropics, home of half the world’s species. Despite global efforts to combat tropical species extinctions, lack of high-quality, objective information on tropical biodiversity has hampered quantitative evaluation of conservation strategies. In particular, the scarcity of population-level monitoring in tropical forests has stymied assessment of biodiversity outcomes, such as the status and trends of animal populations in protected areas. Here, we evaluate occupancy trends for 511 populations of terrestrial mammals and birds, representing 244 species from 15 tropical forest protected areas on three continents. For the first time to our knowledge, we use annual surveys from tropical forests worldwide that employ a standardized camera trapping protocol, and we apply analytics that correct for imperfect detection. We found that occupancy declined in 22%, increased in 17%, and exhibited no change in 22% of populations during the last 3–8 years, while 39% of populations were detected too infrequently to assess occupancy changes. Despite extensive variability in occupancy trends, these 15 tropical protected areas have not exhibited systematic declines in biodiversity (i.e., occupancy, richness, or evenness) at the community level. Our results differ from reports of widespread biodiversity declines based on aggregated secondary data and expert opinion and suggest less extreme deterioration in tropical forest protected areas. We simultaneously fill an important conservation data gap and demonstrate the value of large-scale monitoring infrastructure and powerful analytics, which can be scaled to incorporate additional sites, ecosystems, and monitoring methods.
In an era of catastrophic biodiversity loss, robust indicators produced from standardized monitoring infrastructure are critical to accurately assess population outcomes and identify conservation strategies that can avert biodiversity collapse.

188 citations


Journal ArticleDOI
TL;DR: Evaluation in two testbeds on Android phones shows that the SemanticSLAM system achieves a median localization error of 0.53 meters, which is 62 percent better than a system that does not use SLAM, with a 33 percent lower convergence time.
Abstract: Indoor localization using mobile sensors has gained momentum lately. Most of the current systems rely on an extensive calibration step to achieve high accuracy. We propose SemanticSLAM, a novel unsupervised indoor localization scheme that bypasses the need for war-driving. SemanticSLAM leverages the idea that certain locations in an indoor environment have a unique signature on one or more phone sensors. Climbing stairs, for example, has a distinct pattern on the phone's accelerometer; a specific spot may experience an unusual magnetic interference while another may have a unique set of Wi-Fi access points covering it. SemanticSLAM uses these unique points in the environment as landmarks and combines them with dead-reckoning in a new Simultaneous Localization And Mapping (SLAM) framework to reduce both the localization error and convergence time. In particular, the phone inertial sensors are used to keep track of the user's path, while the observed landmarks are used to compensate for the accumulation of error in a unified probabilistic framework. Evaluation in two testbeds on Android phones shows that the system can achieve a median localization error of 0.53 meters. In addition, the system can detect the location of landmarks with 0.83 meters median error. This is 62 percent better than a system that does not use SLAM. Moreover, SemanticSLAM has a 33 percent lower convergence time compared to the same system. This highlights the promise of SemanticSLAM as an unconventional approach for indoor localization.

176 citations


Journal ArticleDOI
TL;DR: In this paper, the authors developed a highly accurate compact dynamical model for their electrical conduction that showed that the negative differential resistance in these devices results from a thermal feedback mechanism, which can be minimized by thermally isolating the selector or by incorporating materials with larger activation energies for electron motion.
Abstract: A number of important commercial applications would benefit from the introduction of easily manufactured devices that exhibit current-controlled, or “S-type,” negative differential resistance (NDR). A leading example is emerging non-volatile memory based on crossbar array architectures. Due to the inherently linear current vs. voltage characteristics of candidate non-volatile memristor memory elements, individual memory cells in these crossbar arrays can be addressed only if a highly non-linear circuit element, termed a “selector,” is incorporated in the cell. Selectors based on a layer of niobium oxide sandwiched between two electrodes have been investigated by a number of groups because the NDR they exhibit provides a promisingly large non-linearity. We have developed a highly accurate compact dynamical model for their electrical conduction that shows that the NDR in these devices results from a thermal feedback mechanism. A series of electrothermal measurements and numerical simulations corroborate this model. These results reveal that the leakage currents can be minimized by thermally isolating the selector or by incorporating materials with larger activation energies for electron motion.

157 citations


Journal ArticleDOI
TL;DR: A charge-trap-associated switching model is proposed to account for this self-rectifying memristive behavior, and an asymmetric voltage scheme (AVS) to decrease the write power consumption by utilizing this self-rectifying memristor is described.
Abstract: A Pt/NbOx/TiOy/NbOx/TiN stack integrated on a 30 nm contact via shows a programming current as low as 10 nA and 1 pA for the set and reset switching, respectively, and a self-rectifying ratio as high as ∼10⁵, which are suitable characteristics for low-power memristor applications. It also shows a forming-free characteristic. A charge-trap-associated switching model is proposed to account for this self-rectifying memristive behavior. In addition, an asymmetric voltage scheme (AVS) to decrease the write power consumption by utilizing this self-rectifying memristor is also described. When the device is used in a 1000 × 1000 crossbar array with the AVS, the programming power can be decreased to 8.0% of the power consumption of a conventional biasing scheme. If the AVS is combined with a nonlinear selector, a power consumption reduction to 0.31% of the reference value is possible.

150 citations


Proceedings ArticleDOI
25 Mar 2016
TL;DR: This work presents the design and implementation of JUSTDO logging, a new failure atomicity mechanism that greatly reduces the memory footprint of logs, simplifies log management, and enables fast parallel recovery following failure.
Abstract: Persistent memory invites applications to manipulate persistent data via load and store instructions. Because failures during updates may destroy transient data (e.g., in CPU registers), preserving data integrity in the presence of failures requires failure-atomic bundles of updates. Prior failure atomicity approaches for persistent memory entail overheads due to logging and CPU cache flushing. Persistent caches can eliminate the need for flushing, but conventional logging remains complex and memory intensive. We present the design and implementation of JUSTDO logging, a new failure atomicity mechanism that greatly reduces the memory footprint of logs, simplifies log management, and enables fast parallel recovery following failure. Crash-injection tests confirm that JUSTDO logging preserves application data integrity, and performance evaluations show that it improves throughput 3× or more compared with a state-of-the-art alternative for a spectrum of data-intensive algorithms.

Journal ArticleDOI
TL;DR: This work investigated the suitability of tantalum oxide (TaOx) transistor-memristor (1T1R) arrays for such applications, particularly the ability to accurately, repeatedly, and rapidly reach arbitrary conductance states and the trade-offs between programming speed and programming error.
Abstract: Beyond use as high density non-volatile memories, memristors have potential as synaptic components of neuromorphic systems. We investigated the suitability of tantalum oxide (TaOx) transistor-memristor (1T1R) arrays for such applications, particularly the ability to accurately, repeatedly, and rapidly reach arbitrary conductance states. Programming is performed by applying an adaptive pulsed algorithm that utilizes the transistor gate voltage to control the SET switching operation and increase programming speed of the 1T1R cells. We show the capability of programming 64 conductance levels with <0.5% average accuracy using 100 ns pulses and studied the trade-offs between programming speed and programming error. The algorithm is also utilized to program 16 conductance levels on a population of cells in the 1T1R array showing robustness to cell-to-cell variability. In general, the proposed algorithm results in approximately 10× improvement in programming speed over standard algorithms that do not use the transistor gate to control memristor switching. In addition, after only two programming pulses (an initialization pulse followed by a programming pulse), the resulting conductance values are within 12% of the target values in all cases. Finally, endurance of more than 10⁶ cycles is shown through open-loop (single pulses) programming across multiple conductance levels using the optimized gate voltage of the transistor. These results are relevant for applications that require high speed, accurate, and repeatable programming of the cells such as in neural networks and analog data processing.
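The adaptive idea can be sketched as a program-and-verify loop: read the cell, and if it is not yet within tolerance of the target, fire another SET pulse with a slightly higher gate voltage. The device model and all constants below are hypothetical stand-ins, not the paper's measured 1T1R behaviour.

```python
def program_cell(read, pulse, target, tol, v_gate=0.8, v_step=0.05,
                 max_pulses=100):
    # Adaptive program-and-verify: the transistor gate voltage, which
    # limits SET current, is ramped up until the read-back conductance
    # is within `tol` (relative) of `target`.
    for n in range(max_pulses):
        g = read()
        if abs(g - target) / target <= tol:
            return n, g  # pulses used, final conductance
        pulse(v_gate)
        v_gate += v_step
    return max_pulses, read()

# Toy device model: each pulse adds conductance proportional to the
# gate voltage applied during that pulse.
state = {"g": 0.0}
def read():
    return state["g"]
def pulse(v_gate):
    state["g"] += 10e-6 * v_gate

pulses, g = program_cell(read, pulse, target=50e-6, tol=0.15)
```

A real implementation would also apply RESET pulses to correct overshoot; the loose tolerance here merely keeps the toy model convergent.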

Proceedings ArticleDOI
19 Oct 2016
TL;DR: This paper presents Makalu, a system that addresses non-volatile memory management and offers an integrated allocator and recovery-time garbage collector that maintains internal consistency, avoids NVRAM memory leaks, and is efficient, all in the face of failures.
Abstract: Byte addressable non-volatile memory (NVRAM) is likely to supplement, and perhaps eventually replace, DRAM. Applications can then persist data structures directly in memory instead of serializing them and storing them onto a durable block device. However, failures during execution can leave data structures in NVRAM unreachable or corrupt. In this paper, we present Makalu, a system that addresses non-volatile memory management. Makalu offers an integrated allocator and recovery-time garbage collector that maintains internal consistency, avoids NVRAM memory leaks, and is efficient, all in the face of failures. We show that a careful allocator design can support a less restrictive and a much more familiar programming model than existing persistent memory allocators. Our allocator significantly reduces the per allocation persistence overhead by lazily persisting non-essential metadata and by employing a post-failure recovery-time garbage collector. Experimental results show that the resulting online speed and scalability of our allocator are comparable to well-known transient allocators, and significantly better than state-of-the-art persistent allocators.

Journal ArticleDOI
TL;DR: In this article, a behavioral circuit model for microring that quantitatively explains the wide variations in resonance splitting observed in experiments is presented. But, due to the stochastic nature of backscattering, this splitting is different for each resonance.
Abstract: Silicon microring resonators very often exhibit resonance splitting due to backscattering. This effect is hard to quantitatively and predictively model. This paper presents a behavioral circuit model for microrings that quantitatively explains the wide variations in resonance splitting observed in experiments. The model is based on an in-depth analysis of the contributions to backscattering by both the waveguides and couplers. Backscattering transforms unidirectional microrings into bidirectional circuits by coupling the clockwise and counterclockwise circulating modes. In high-Q microrings, visible resonance splitting will be induced, but, due to the stochastic nature of backscattering, this splitting is different for each resonance. Our model, based on temporal coupled mode theory, and the associated fitting method are both accurate and robust, and can also explain asymmetrically split resonances. The cause of asymmetric resonance splitting is identified as the backcoupling in the couplers. This is experimentally confirmed, and its dependency on gap and coupling length is further analyzed. Moreover, the wide variation in resonance splitting of one spectrum is analyzed and successfully explained by our circuit model, which incorporates most linear parasitic effects in the microring. This analysis uncovers multi-cavity interference within the microring as an important source of this variation.

Journal ArticleDOI
TL;DR: A dynamic voltage divider between the series resistor R_S and the memristor during both the set and the reset switching cycles can suppress the inherent irregularity of the voltage dropped on the memristor, resulting in greatly reduced switching variability.
Abstract: The impact of a series resistor (R_S) on the variability and endurance performance of memristors was studied in the TaOx memristive system. A dynamic voltage divider between the R_S and the memristor during both the set and the reset switching cycles can suppress the inherent irregularity of the voltage dropped on the memristor, resulting in greatly reduced switching variability. By selecting the proper resistance value of R_S for the set and reset cycles respectively, we observed a dramatically improved endurance of the TaOx memristor. Such a voltage divider effect can thus be critical for memristor applications that require low variability, high endurance, and fast speed.
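The divider effect itself is elementary: the memristor sees V·R_mem/(R_mem + R_S), so when its resistance collapses during a SET transition the voltage across it automatically drops, self-limiting the switching. The numbers below are illustrative, not the paper's device values.

```python
def v_memristor(v_applied, r_mem, r_series):
    # Voltage actually dropped across the memristor in a series divider.
    return v_applied * r_mem / (r_mem + r_series)

V, R_S = 2.0, 1e3
before_set = v_memristor(V, 10e3, R_S)  # high-resistance state: ~1.82 V
after_set = v_memristor(V, 1e3, R_S)    # resistance collapses: 1.00 V
```

With no series resistor, the full 2.0 V would remain across the memristor throughout the transition; the divider's negative feedback is what damps overshoot and variability.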

Proceedings ArticleDOI
01 Dec 2016
TL;DR: A patient similarity evaluation framework based on temporal matching of longitudinal patient EHRs, which takes a convolutional neural network architecture, and learns an optimal representation of patient clinical records through medical concept embedding.
Abstract: Evaluating the clinical similarities between pairwise patients is a fundamental problem in healthcare informatics. A proper patient similarity measure enables various downstream applications, such as cohort study and treatment comparative effectiveness research. One major carrier for conducting patient similarity research is the Electronic Health Records (EHRs), which are usually heterogeneous, longitudinal, and sparse. Though existing studies on learning patient similarity from EHRs have proven useful in solving real clinical problems, their applicability is limited due to the lack of medical interpretations. Moreover, most previous methods assume a vector-based representation for patients, which typically requires aggregation of medical events over a certain time period. As a consequence, the temporal information will be lost. In this paper, we propose a patient similarity evaluation framework based on temporal matching of longitudinal patient EHRs. Two efficient methods are presented, unsupervised and supervised, both of which preserve the temporal properties in EHRs. The supervised scheme takes a convolutional neural network architecture, and learns an optimal representation of patient clinical records with medical concept embedding. The empirical results on real-world clinical data demonstrate substantial improvement over the baselines.

Proceedings ArticleDOI
14 Mar 2016
TL;DR: This work designs a system for incremental deployment of hybrid SDN networks consisting of both legacy forwarding devices and programmable SDN switches, and designs the system on a production SDN controller to answer the following questions: which legacy devices to upgrade to SDN, and how legacy and SDN devices can interoperate in a hybrid environment.
Abstract: Introducing SDN into an existing network causes both deployment and operational issues. A systematic incremental deployment methodology as well as a hybrid operation model is needed. We present such a system for incremental deployment of hybrid SDN networks consisting of both legacy forwarding devices (i.e., traditional IP routers) and programmable SDN switches. We design the system on a production SDN controller to answer the following questions: which legacy devices to upgrade to SDN, and how legacy and SDN devices can interoperate in a hybrid environment to satisfy a variety of traffic engineering (TE) goals such as load balancing and fast failure recovery. Evaluation on real ISP and enterprise topologies shows that with only 20% devices upgraded to SDN, our system reduces the maximum link usage by an average of 32% compared with pure-legacy networks (shortest path routing), while only requiring an average of 41% of flow table capacity compared with pure-SDN networks.

Journal ArticleDOI
20 Aug 2016
TL;DR: In this article, a waveguide Si–Ge avalanche photodiode (APD) is presented; Si–Ge APDs show a significant improvement in receiver sensitivity compared to their III–V counterparts due to the superior impact ionization property of silicon.
Abstract: Silicon-germanium (Si–Ge)-based avalanche photodiodes (APDs) have shown a significant improvement in receiver sensitivity compared to their III–V counterparts due to the superior impact ionization property of silicon. However, conventional Si–Ge APDs typically operate at high voltages and low speed, limiting the application of this technology to data communication. In this paper, we present a waveguide Si–Ge avalanche photodiode using a thin silicon multiplication region with a breakdown voltage of −10 V, a speed of 25 GHz, and a gain-bandwidth product (GBP) of 276 GHz. At 1550 nm, sensitivities of −25 dBm and −16 dBm are achieved at 12.5 Gbps and 25 Gbps, respectively. This design will enable implementation of Si–Ge APDs for optical interconnects in data centers and high-performance computers, allowing significant reductions in aggregate system laser power (and therefore cost).

Journal ArticleDOI
TL;DR: In this paper, a three-terminal hybrid III-V-on-silicon laser that integrates a metaloxide-semiconductor (MOS) capacitor into the laser cavity is demonstrated.
Abstract: Finely tunable microring laser exploits integrated capacitive structure. Large-scale computer installations are severely limited by network-bandwidth constraints and energy costs that arise from architectural designs originally based on copper interconnects [1]. Wavelength-division multiplexed (WDM) photonic links can increase the network bandwidth but are sensitive to environmental perturbations and manufacturing imperfections that can affect the precise emission wavelength and output power of laser transmitters [2,3]. Here, we demonstrate a new design of a three-terminal hybrid III–V-on-silicon laser that integrates a metal-oxide-semiconductor (MOS) capacitor into the laser cavity. The MOS capacitor makes it possible to introduce the plasma-dispersion effect [4] and thus change the laser modal refractive index and free-carrier absorption (FCA) loss to tune the laser wavelength and output power, respectively. The approach enables a highly energy-efficient method to tune the output power and wavelength of microring lasers, with future prospects for high-speed, chirp-free direct laser modulation. The concept is potentially applicable to other diode laser platforms.

Journal ArticleDOI
TL;DR: The stochastic behaviour near the point-contact regime is modelled using Molecular Dynamics–Langevin simulations, and the observed frequency-dependent noise behaviour is understood in terms of thermally activated atomic-scale fluctuations that make and break a quantum conductance channel.
Abstract: Tantalum oxide memristors can switch continuously from a low-conductance semiconducting to a high-conductance metallic state. At the boundary between these two regimes are quantized conductance states, which indicate the formation of a point contact within the oxide characterized by multistable conductance fluctuations and enlarged electronic noise. Here, we observe diverse conductance-dependent noise spectra, including a transition from 1/f² (activated transport) to 1/f (flicker noise) as a function of the frequency f, and a large peak in the noise amplitude at the conductance quantum G_Q = 2e²/h, in contrast to suppressed noise at the conductance quantum observed in other systems. We model the stochastic behaviour near the point contact regime using Molecular Dynamics–Langevin simulations and understand the observed frequency-dependent noise behaviour in terms of thermally activated atomic-scale fluctuations that make and break a quantum conductance channel. These results provide insights into switching mechanisms and guidance to device operating ranges for different applications.
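For reference, the conductance quantum at which the noise amplitude peaks, G_Q = 2e²/h, evaluates to roughly 77.5 μS (using the exact SI values of e and h):

```python
e = 1.602176634e-19  # elementary charge, C (exact in the 2019 SI)
h = 6.62607015e-34   # Planck constant, J·s (exact in the 2019 SI)

G_Q = 2 * e**2 / h   # conductance quantum, in siemens (~7.748e-5 S)
```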

Journal ArticleDOI
TL;DR: Oxygen migration in tantalum oxide, a promising next-generation storage material, is studied using in operando X-ray absorption spectromicroscopy, establishing the critical role of temperature-driven oxygen migration.
Abstract: Oxygen migration in tantalum oxide, a promising next-generation storage material, is studied using in operando X-ray absorption spectromicroscopy. This approach allows a physical description of the evolution of conduction channel and eventual device failure. The observed ring-like patterns of oxygen concentration are modeled using thermophoretic forces and Fick diffusion, establishing the critical role of temperature-driven oxygen migration.

Journal ArticleDOI
TL;DR: An integrated memory cell with a memristor and a trilayer crested barrier selector, showing repeatable nonlinear current–voltage switching loops is presented.
Abstract: An integrated memory cell with a memristor and a trilayer crested barrier selector, showing repeatable nonlinear current–voltage switching loops, is presented. The fully atomic-layer-deposited TaN₁₊ₓ/Ta₂O₅/TaN₁₊ₓ crested barrier selector yields a large nonlinearity (>10⁴), high endurance (>10⁸), low variability, and low temperature dependence.

Journal ArticleDOI
01 Oct 2016
TL;DR: The key objective of MOCC is to avoid clobbered reads for high conflict workloads, without any centralized mechanisms or heavyweight interthread communication, and it achieves orders of magnitude higher performance for dynamic workloads on modern servers.
Abstract: Future servers will be equipped with thousands of CPU cores and deep memory hierarchies. Traditional concurrency control (CC) schemes---both optimistic and pessimistic---slow down by orders of magnitude in such environments for highly contended workloads. Optimistic CC (OCC) scales the best for workloads with few conflicts, but suffers from clobbered reads for high conflict workloads. Although pessimistic locking can protect reads, it floods cache-coherence backbones in deep memory hierarchies and can also cause numerous deadlock aborts. This paper proposes a new CC scheme, mostly-optimistic concurrency control (MOCC), to address these problems. MOCC achieves orders of magnitude higher performance for dynamic workloads on modern servers. The key objective of MOCC is to avoid clobbered reads for high conflict workloads, without any centralized mechanisms or heavyweight interthread communication. To satisfy such needs, we devise a native, cancellable reader-writer spinlock and a serializable protocol that can acquire, release and re-acquire locks in any order without expensive interthread communication. For low conflict workloads, MOCC maintains OCC's high performance without taking read locks. Our experiments with high conflict YCSB workloads on a 288-core server reveal that MOCC performs 8× and 23× faster than OCC and pessimistic locking, respectively. It achieves 17 million TPS for TPC-C and more than 110 million TPS for YCSB without conflicts, 170× faster than pessimistic methods.
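
The core MOCC policy, reading optimistically by default but taking pessimistic read locks on records that have proven contentious, can be sketched in a few lines. This is a toy, single-version illustration under our own assumptions, not the paper's protocol: the `Record` and `MoccTxn` classes and the `HOT` threshold are hypothetical names, a plain `threading.RLock` stands in for the paper's native cancellable reader-writer spinlock, and deadlock avoidance is ignored.

```python
import threading

HOT = 3  # hypothetical abort-count threshold marking a record "contentious"

class Record:
    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.RLock()  # stand-in for a cancellable RW spinlock
        self.temperature = 0           # abort count; proxy for conflict level

class MoccTxn:
    def __init__(self):
        self.read_set = {}   # record -> version observed at read time
        self.write_set = {}  # record -> new value
        self.held = []       # locks currently held by this transaction

    def read(self, rec):
        if rec.temperature >= HOT:        # pessimistic path for hot records:
            rec.lock.acquire()            # lock before reading, protecting the
            self.held.append(rec)         # read from being clobbered
        self.read_set[rec] = rec.version  # optimistic path: remember version
        return rec.value

    def write(self, rec, value):
        self.write_set[rec] = value

    def commit(self):
        try:
            for rec in self.write_set:    # lock all write targets
                if rec not in self.held:
                    rec.lock.acquire()
                    self.held.append(rec)
            # OCC-style validation: did any record change since we read it?
            for rec, seen in self.read_set.items():
                if rec.version != seen:
                    rec.temperature += 1  # record grows hotter on each abort
                    return False
            for rec, val in self.write_set.items():
                rec.value = val
                rec.version += 1
            return True
        finally:
            for rec in self.held:
                rec.lock.release()
            self.held.clear()
```

A transaction that read a record which another transaction subsequently updated fails validation at commit, and each such abort raises the record's temperature until future readers take the lock up front instead.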

Journal ArticleDOI
13 Dec 2016-ACS Nano
TL;DR: It is shown that the formation and dissolution of the conduction channel are successfully modeled by radial thermophoresis and Fick diffusion of oxygen atoms driven by Joule heating, confirming and quantifying two opposing nanoscale radial forces that affect bipolar memristor switching.
Abstract: Transition-metal-oxide memristors, or resistive random-access memory (RRAM) switches, are under intense development for storage-class memory because of their favorable operating power, endurance, speed, and density. Their commercial deployment critically depends on predictive compact models based on understanding nanoscale physicochemical forces, which remains elusive and controversial owing to the difficulties in directly observing atomic motions during resistive switching. Here, using scanning transmission synchrotron X-ray spectromicroscopy to study in situ switching of hafnium oxide memristors, we directly observed the formation of a localized oxygen-deficiency-derived conductive channel surrounded by a low-conductivity ring of excess oxygen. Subsequent thermal annealing homogenized the segregated oxygen, resetting the cells toward their as-grown resistance state. We show that the formation and dissolution of the conduction channel are successfully modeled by radial thermophoresis and Fick diffusion of oxygen atoms driven by Joule heating.

Proceedings ArticleDOI
03 Oct 2016
TL;DR: This paper designs and implements MUSE, a lightweight user grouping algorithm, which addresses the above challenges and shows MUSE can achieve high throughput gains over existing designs.
Abstract: Multi-User MIMO, the hallmark of IEEE 802.11ac and the upcoming 802.11ax, promises significant throughput gains by supporting multiple concurrent data streams to a group of users. However, identifying the best-throughput MU-MIMO groups in commodity 802.11ac networks poses three major challenges: a) Commodity 802.11ac users do not provide full CSI feedback, which has been widely used for MU-MIMO grouping. b) Users with heterogeneous channel bandwidths limit grouping opportunities. c) Resource-limited APs cannot support the computationally and memory-intensive operations required by existing algorithms. Hence, state-of-the-art designs are either not portable to 802.11ac APs, or perform poorly, as shown by our testbed experiments. In this paper, we design and implement MUSE, a lightweight user grouping algorithm that addresses the above challenges. Our experiments with commodity 802.11ac testbeds show MUSE can achieve high throughput gains over existing designs.

Proceedings Article
01 Jan 2016
TL;DR: This paper proposes a new device-agnostic system, called BLE-Guardian, that protects the privacy of the users/environments equipped with BLE devices/IoTs and enables the users and administrators to control those who discover, scan and connect to their devices.
Abstract: Bluetooth Low Energy (BLE) has emerged as an attractive technology to enable Internet of Things (IoTs) to interact with others in their vicinity. Our study of the behavior of more than 200 types of BLE-equipped devices has led to a surprising discovery: the BLE protocol, despite its privacy provisions, fails to address the most basic threat of all—hiding the device’s presence from curious adversaries. Revealing the device’s existence is the stepping stone toward more serious threats that include user profiling/fingerprinting, behavior tracking, inference of sensitive information, and exploitation of known vulnerabilities on the device. With thousands of manufacturers and developers around the world, it is very challenging, if not impossible, to envision the viability of any privacy or security solution that requires changes to the devices or the BLE protocol. In this paper, we propose a new device-agnostic system, called BLE-Guardian, that protects the privacy of the users/environments equipped with BLE devices/IoTs. It enables the users and administrators to control those who discover, scan and connect to their devices. We have implemented BLE-Guardian using Ubertooth One, an off-the-shelf open Bluetooth development platform, facilitating its broad deployment. Our evaluation with real devices shows that BLE-Guardian effectively protects the users’ privacy while incurring little overhead on the communicating BLE-devices.

Proceedings ArticleDOI
25 Mar 2016
TL;DR: This paper proposes Silent Shredder, which repurposes initialization vectors used in standard counter mode encryption to completely eliminate the data shredding writes in operating systems, and speeds up reading shredded cache lines, and hence reduces power consumption and improves overall performance.
Abstract: As non-volatile memory (NVM) technologies are expected to replace DRAM in the near future, new challenges have emerged. For example, NVMs have slow and power-consuming writes, and limited write endurance. In addition, NVMs have a data remanence vulnerability, i.e., they retain data for a long time after being powered off. NVM encryption alleviates the vulnerability, but exacerbates the limited endurance by increasing the number of writes to memory. We observe that, in current systems, a large percentage of main memory writes result from data shredding in operating systems, a process of zeroing out physical pages before mapping them to new processes, in order to protect previous processes' data. In this paper, we propose Silent Shredder, which repurposes initialization vectors used in standard counter mode encryption to completely eliminate the data shredding writes. Silent Shredder also speeds up reading shredded cache lines, and hence reduces power consumption and improves overall performance. To evaluate our design, we run three PowerGraph applications and 26 multi-programmed workloads from the SPEC 2006 suite, on a gem5-based full system simulator. Silent Shredder eliminates an average of 48.6% of the writes in the initialization and graph construction phases. It speeds up main memory reads by 3.3 times, and improves the number of instructions per cycle (IPC) by 6.4% on average. Finally, we discuss several use cases, including virtual machines' data isolation and user-level large data initialization, where Silent Shredder can be used effectively at no extra cost.
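
The mechanism described above, repurposing counter-mode initialization vectors so that shredding a page becomes a metadata update rather than a stream of zero writes, can be sketched roughly as follows. This is an assumption-laden toy model of ours, not the paper's design: SHA-256 stands in for an AES counter-mode keystream, and the `SHREDDED` sentinel and `EncryptedMemory` class are hypothetical names.

```python
import hashlib

LINE = 16      # bytes per cache line in this toy model
SHREDDED = 0   # hypothetical reserved counter value marking a zeroed line

def pad(key, counter, addr):
    # Stand-in for an AES counter-mode keystream block derived from
    # (key, per-line counter/IV, line address).
    return hashlib.sha256(b"%b|%d|%d" % (key, counter, addr)).digest()[:LINE]

class EncryptedMemory:
    def __init__(self, key, lines=4):
        self.key = key
        self.ctr = [SHREDDED] * lines        # fresh lines start shredded
        self.cells = [bytes(LINE)] * lines   # ciphertext lines

    def write(self, addr, plaintext):
        assert len(plaintext) == LINE
        self.ctr[addr] = max(1, self.ctr[addr] + 1)  # never reuse SHREDDED
        ks = pad(self.key, self.ctr[addr], addr)
        self.cells[addr] = bytes(a ^ b for a, b in zip(plaintext, ks))

    def read(self, addr):
        if self.ctr[addr] == SHREDDED:   # shredded line: skip the memory
            return bytes(LINE)           # access and return zeros by convention
        ks = pad(self.key, self.ctr[addr], addr)
        return bytes(a ^ b for a, b in zip(self.cells[addr], ks))

    def shred(self, addr):
        # "Zero" the line with a counter update only: no data write to NVM,
        # which is where the endurance and power savings come from.
        self.ctr[addr] = SHREDDED
```

Shredding never touches the data array, and reads of shredded lines short-circuit to zeros, mirroring the eliminated writes and faster shredded-line reads reported above.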

Journal ArticleDOI
TL;DR: Experiments on publicly available datasets show that the proposed reference-based method for person reidentification across different cameras outperforms most of the state-of-the-art approaches.
Abstract: Person identification across nonoverlapping cameras, also known as person reidentification, aims to match people at different times and locations. Reidentifying people is critical in applications such as wide-area surveillance and visual tracking. Due to the appearance variations in pose, illumination, and occlusion in different camera views, person reidentification is inherently difficult. To address these challenges, a reference-based method is proposed for person reidentification across different cameras. Instead of directly matching people by their appearance, the matching is conducted in a reference space where the descriptor for a person is translated from the original color or texture descriptors to similarity measures between this person and the exemplars in the reference set. A subspace is first learned in which the correlations of the reference data from different cameras are maximized using regularized canonical correlation analysis (RCCA). For reidentification, the gallery data and the probe data are projected onto this RCCA subspace and the reference descriptors (RDs) of the gallery and probe are generated by computing the similarity between them and the reference data. The identity of a probe is determined by comparing the RD of the probe and the RDs of the gallery. A reranking step is added to further improve the results using a saliency-based matching scheme. Experiments on publicly available datasets show that the proposed method outperforms most of the state-of-the-art approaches.
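
The reference-descriptor step can be illustrated with a minimal sketch: translate a raw appearance feature into a vector of similarities against a reference set, then match probe to gallery in that reference space. Note this deliberately omits the RCCA subspace projection and the saliency-based reranking; all function names and the cosine similarity choice are our own assumptions.

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors (0.0 for zero vectors).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def reference_descriptor(feat, reference_set):
    # Translate an appearance feature into similarities to reference exemplars.
    return [cosine(feat, ref) for ref in reference_set]

def reidentify(probe, gallery, reference_set):
    # Match in reference space: compare RDs rather than raw appearance.
    rd_probe = reference_descriptor(probe, reference_set)
    rd_gallery = [reference_descriptor(g, reference_set) for g in gallery]
    scores = [cosine(rd_probe, rd) for rd in rd_gallery]
    return max(range(len(gallery)), key=scores.__getitem__)
```

Because both probe and gallery are described relative to the same exemplars, camera-specific appearance differences are absorbed into the reference space rather than compared directly.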

Posted Content
TL;DR: An analytical framework for distribution of popular content in an information centric network (ICN) comprising access ICNs, a transit ICN, and a content provider is developed using a generalized Zipf distribution to model content popularity.
Abstract: We develop an analytical framework for distribution of popular content in an Information Centric Network (ICN) that comprises Access ICNs, a Transit ICN and a Content Provider. Using a generalized Zipf distribution to model content popularity, we devise a game theoretic approach to jointly determine caching and pricing strategies in such an ICN. Under the assumption that the caching cost of the access and transit ICNs is inversely proportional to popularity, we show that the Nash caching strategies in the ICN are 0-1 (all or nothing) strategies. Further, for the case of symmetric Access ICNs, we show that the Nash equilibrium is unique and the caching policy (0 or 1) is determined by a threshold on the popularity of the content (reflected by the Zipf probability metric), i.e., all content more popular than the threshold value is cached. We also show that the resulting threshold of the Access and Transit ICNs, as well as all prices can be obtained by a decomposition of the joint caching and pricing problem into two independent caching only and pricing only problems.
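
The threshold structure of the Nash caching strategy can be illustrated with a small sketch: compute generalized Zipf popularities, then cache exactly the items whose popularity meets a threshold, yielding the 0-1 policy described above. The threshold value itself would come from solving the decomposed caching problem in the paper; here it is just a parameter, and the function names are ours.

```python
def zipf_popularity(n, alpha):
    # Generalized Zipf distribution over n items ranked by popularity:
    # p_k is proportional to 1 / k^alpha.
    weights = [1.0 / (k ** alpha) for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def nash_caching(popularity, threshold):
    # 0-1 (all or nothing) policy: cache an item iff its Zipf probability
    # meets the popularity threshold.
    return [1 if p >= threshold else 0 for p in popularity]
```

Because the Zipf probabilities are monotonically decreasing in rank, any threshold selects a prefix of the popularity ranking: the most popular items are cached and the long tail is not.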

Proceedings ArticleDOI
07 May 2016
TL;DR: The first real-world investigation of software practitioners' ability to identify gender-inclusiveness issues in software they create/maintain using GenderMag is presented, which was a multiple-case field study of software teams at three major U.S. technology organizations.
Abstract: Gender inclusiveness in computing settings is receiving a lot of attention, but one potentially critical factor has mostly been overlooked -- software itself. To help close this gap, we recently created GenderMag, a systematic inspection method to enable software practitioners to evaluate their software for issues of gender-inclusiveness. In this paper, we present the first real-world investigation of software practitioners' ability to identify gender-inclusiveness issues in software they create/maintain using this method. Our investigation was a multiple-case field study of software teams at three major U.S. technology organizations. The results were that, using GenderMag to evaluate software, these software practitioners identified a surprisingly high number of gender-inclusiveness issues: 25% of the software features they evaluated had gender-inclusiveness issues.