scispace - formally typeset
Search or ask a question

Showing papers by "Nagarajan Kandasamy published in 2019"


Journal ArticleDOI
07 Oct 2019
TL;DR: PALP as mentioned in this paper is a new mechanism that enables partition-level parallelism within each PCM bank, and exploits such parallelism by using the memory controller's access scheduling decisions.
Abstract: Phase-change memory (PCM) devices have multiple banks to serve memory requests in parallel. Unfortunately, if two requests go to the same bank, they have to be served one after another, leading to lower system performance. We observe that a modern PCM bank is implemented as a collection of partitions that operate mostly independently while sharing a few global peripheral structures, which include the sense amplifiers (to read) and the write drivers (to write). Based on this observation, we propose PALP, a new mechanism that enables partition-level parallelism within each PCM bank, and exploits such parallelism by using the memory controller’s access scheduling decisions. PALP consists of three new contributions. First, we introduce new PCM commands to enable parallelism in a bank’s partitions in order to resolve the read-write bank conflicts, with no changes needed to PCM logic or its interface. Second, we propose simple circuit modifications that introduce a new operating mode for the write drivers, in addition to their default mode of serving write requests. When configured in this new mode, the write drivers can resolve the read-read bank conflicts, working jointly with the sense amplifiers. Finally, we propose a new access scheduling mechanism in PCM that improves performance by prioritizing those requests that exploit partition-level parallelism over other requests, including the long outstanding ones. While doing so, the memory controller also guarantees starvation-freedom and the PCM’s running-average-power-limit (RAPL).We evaluate PALP with workloads from the MiBench and SPEC CPU2017 Benchmark suites. Our results show that PALP reduces average PCM access latency by 23%, and improves average system performance by 28% compared to the state-of-the-art approaches.

38 citations


Journal ArticleDOI
TL;DR: The proposed framework first extracts the precise times at which a charge pump in the hardware is activated to support neural computations within a workload, then uses a characterized NBTI reliability model to estimate the charge pump's aging during the workload execution.
Abstract: Neuromorphic hardware with non-volatile memory (NVM) can implement machine learning workload in an energy-efficient manner. Unfortunately, certain NVMs such as phase change memory (PCM) require high voltages for correct operation. These voltages are supplied from an on-chip charge pump. If the charge pump is activated too frequently, its internal CMOS devices do not recover from stress, accelerating their aging and leading to negative bias temperature instability (NBTI) generated defects. Forcefully discharging the stressed charge pump can lower the aging rate of its CMOS devices, but makes the neuromorphic hardware unavailable to perform computations while its charge pump is being discharged. This negatively impacts performance such as latency and accuracy of the machine learning workload being executed. In this letter, we propose a novel framework to exploit workload-specific performance and lifetime trade-offs in neuromorphic computing. Our framework first extracts the precise times at which a charge pump in the hardware is activated to support neural computations within a workload. This timing information is then used with a characterized NBTI reliability model to estimate the charge pump's aging during the workload execution. We use our framework to evaluate workload-specific performance and reliability impacts of using 1) different SNN mapping strategies and 2) different charge pump discharge strategies. We show that our framework can be used by system designers to explore performance and reliability trade-offs early in the design of neuromorphic hardware such that appropriate reliability-oriented design margins can be set.

33 citations


Posted Content
TL;DR: In this paper, the authors propose a framework to exploit workload-specific performance and lifetime trade-offs in neuromorphic computing by extracting the precise times at which a charge pump is activated to support neural computations within a workload.
Abstract: Neuromorphic hardware with non-volatile memory (NVM) can implement machine learning workload in an energy-efficient manner. Unfortunately, certain NVMs such as phase change memory (PCM) require high voltages for correct operation. These voltages are supplied from an on-chip charge pump. If the charge pump is activated too frequently, its internal CMOS devices do not recover from stress, accelerating their aging and leading to negative bias temperature instability (NBTI) generated defects. Forcefully discharging the stressed charge pump can lower the aging rate of its CMOS devices, but makes the neuromorphic hardware unavailable to perform computations while its charge pump is being discharged. This negatively impacts performance such as latency and accuracy of the machine learning workload being executed. In this paper, we propose a novel framework to exploit workload-specific performance and lifetime trade-offs in neuromorphic computing. Our framework first extracts the precise times at which a charge pump in the hardware is activated to support neural computations within a workload. This timing information is then used with a characterized NBTI reliability model to estimate the charge pump's aging during the workload execution. We use our framework to evaluate workload-specific performance and reliability impacts of using 1) different SNN mapping strategies and 2) different charge pump discharge strategies. We show that our framework can be used by system designers to explore performance and reliability trade-offs early in the design of neuromorphic hardware such that appropriate reliability-oriented design margins can be set.

15 citations


Posted Content
TL;DR: P is proposed, a new mechanism that enables partition-level parallelism within each PCM bank, and exploits such parallelism by using the memory controller’s access scheduling decisions, and reduces average PCM access latency and improves average system performance.
Abstract: Phase-change memory (PCM) devices have multiple banks to serve memory requests in parallel. Unfortunately, if two requests go to the same bank, they have to be served one after another, leading to lower system performance. We observe that a modern PCM bank is implemented as a collection of partitions that operate mostly independently while sharing a few global peripheral structures, which include the sense amplifiers (to read) and the write drivers (to write). Based on this observation, we propose PALP, a new mechanism that enables partition-level parallelism within each PCM bank, and exploits such parallelism by using the memory controller's access scheduling decisions. PALP consists of three new contributions. First, we introduce new PCM commands to enable parallelism in a bank's partitions in order to resolve the read-write bank conflicts, with minimal changes needed to PCM logic and its interface. Second, we propose simple circuit modifications that introduce a new operating mode for the write drivers, in addition to their default mode of serving write requests. When configured in this new mode, the write drivers can resolve the read-read bank conflicts, working jointly with the sense amplifiers. Finally, we propose a new access scheduling mechanism in PCM that improves performance by prioritizing those requests that exploit partition-level parallelism over other requests, including the long outstanding ones. While doing so, the memory controller also guarantees starvation-freedom and the PCM's running-average-power-limit (RAPL). We evaluate PALP with workloads from the MiBench and SPEC CPU2017 Benchmark suites. Our results show that PALP reduces average PCM access latency by 23%, and improves average system performance by 28% compared to the state-of-the-art approaches.

7 citations


Journal ArticleDOI
TL;DR: A strategy based on adaptive-rate compressive sampling that exploits the fact that the signals of interest often can be sparsified under an appropriate representation basis and that the sampling rate can be tuned as a function of sparsity is developed and validates.
Abstract: Performance monitoring of datacenters provides vital information for dynamic resource provisioning, anomaly detection, and capacity planning decisions. Online monitoring, however, incurs a variety of costs: the very act of monitoring a system interferes with its performance, consuming network bandwidth and disk space. With the goal of reducing these costs, this paper develops and validates a strategy based on adaptive-rate compressive sampling. It exploits the fact that the signals of interest often can be sparsified under an appropriate representation basis and that the sampling rate can be tuned as a function of sparsity. We use the Trade6 application as our experimental platform and measure the signals of interest—in our case, signals pertaining to memory and disk I/O activity—using adaptive sampling. We then evaluate whether the reconstructed signals can be used for trend detection to track the gradual deterioration of system performance associated with software aging. Our experiments show that the signals recovered by our methods can be used to detect, with high confidence, the existence of trends within the original signal. We also evaluate the reconstructed signals for threshold-violation detection wherein the magnitude of the signal exceeds a preset value. Our experiments show that performance bottlenecks and anomalies that manifest themselves in portions of the signal where its magnitude exceeds a threshold value can also be detected using the reconstructed signals. Most importantly, detection of these anomalies is achieved using a substantially reduced sample size—a reduction of more than 70 percent when compared to the standard fixed-rate sampling method.

5 citations


Proceedings ArticleDOI
08 Jul 2019
TL;DR: A low-cost method of obtaining a sparse representation of the data collected at each individual server while preserving a specified fidelity with respect to the original signal is developed, which is a function of the specified fidelity.
Abstract: The volume of data needed for effective monitoring of datacenters poses significant challenges in its collection, transmission, analysis, and storage. Considering a setting wherein data collected locally at a server is sent to a monitoring station for analysis, this paper develops computationally efficient methods for systematic reduction of this data during the transfer and its subsequent recovery at the monitoring station. Specifically, we develop a low-cost method of obtaining a sparse representation of the data collected at each individual server while preserving a specified fidelity with respect to the original signal. The sparsified representation obtained from the data-collection step is amenable to further compression prior to transmission to the monitoring station. Upon receipt of the compressed-data stream at the monitoring station, a method of sparse-signal recovery is utilized to reconstruct the original full-length signal for further analysis. The techniques are validated using workload traces collected from one of Google's production clusters. Experiments show that the achieved data reduction, which is a function of the specified fidelity, is significant: to reconstruct the signal with a fidelity between 90%–95%, the sample size that must be be transferred to the monitoring station is under 10% of the original. We also verify that the recovered signal tracks the target minimum fidelity requirements specified by the operator with high precision.

3 citations


Journal ArticleDOI
01 Sep 2019
TL;DR: The implemented rolling physical layer key policy and Phy-Leave system resulted in a less than 1% increase in the area of a Virtex6 FPGA, demonstrating physical layer obfuscation as a means to increase the security of wireless communication without a significant cost in hardware.
Abstract: Obfuscation of the orthogonal frequency-division multiplexing (OFDM) physical layer is described in this paper as a means to enhance the security of wireless communication. The standardization of the communication channel between two trusted parties results in a variety of security threats, including vulnerabilities in WPA/WPA2 protocols that allow for the extraction of the software layer encryption key. Obfuscating the physical layer of the OFDM pipeline provides an additional layer of security in the event that the software layer key is compromised and allows for rolling updates of the physical layer key without altering the software layer key. The interleaver stage of the OFDM pipeline is redesigned to utilize a physical layer key, which is termed Phy-Leave. The Phy-Leave interleaver is evaluated through both MATLAB simulation and hardware prototyping on the Software Defined Communication (SDC) testbed using a Virtex6 FPGA. The implemented rolling physical layer key policy and Phy-Leave system resulted in a less than 1% increase in the area of a Virtex6 FPGA, demonstrating physical layer obfuscation as a means to increase the security of wireless communication without a significant cost in hardware.

1 citations


Patent
05 Dec 2019
TL;DR: In this paper, a key-based interleaver for enhancing the security of wireless communication includes a physical layer communication channel key to provide security even when the software encryption key is compromised.
Abstract: A key-based interleaver for enhancement the security of wireless communication includes a physical layer communication channel key to provide security even when the software encryption key is compromised. A method of creating a secure communication link using a physical layer interleaving system includes implementing a key policy implementation that utilizes temporal dependency and interleaving bits using a flexible inter and intra-block data interleaver.

1 citations