Showing papers by "Nagarajan Kandasamy" published in 2019

Journal Article

Enabling and Exploiting Partition-Level Parallelism (PALP) in Phase Change Memories

Shihao Song¹, Anup Das¹, Onur Mutlu², Nagarajan Kandasamy¹

Drexel University¹, ETH Zurich²

07 Oct 2019

TL;DR: PALP is a new mechanism that enables partition-level parallelism within each PCM bank and exploits that parallelism through the memory controller's access scheduling decisions.

Abstract: Phase-change memory (PCM) devices have multiple banks to serve memory requests in parallel. Unfortunately, if two requests go to the same bank, they have to be served one after another, leading to lower system performance. We observe that a modern PCM bank is implemented as a collection of partitions that operate mostly independently while sharing a few global peripheral structures, which include the sense amplifiers (to read) and the write drivers (to write). Based on this observation, we propose PALP, a new mechanism that enables partition-level parallelism within each PCM bank, and exploits such parallelism by using the memory controller’s access scheduling decisions. PALP consists of three new contributions. First, we introduce new PCM commands to enable parallelism in a bank’s partitions in order to resolve the read-write bank conflicts, with no changes needed to PCM logic or its interface. Second, we propose simple circuit modifications that introduce a new operating mode for the write drivers, in addition to their default mode of serving write requests. When configured in this new mode, the write drivers can resolve the read-read bank conflicts, working jointly with the sense amplifiers. Finally, we propose a new access scheduling mechanism in PCM that improves performance by prioritizing those requests that exploit partition-level parallelism over other requests, including the long outstanding ones. While doing so, the memory controller also guarantees starvation-freedom and the PCM’s running-average-power-limit (RAPL). We evaluate PALP with workloads from the MiBench and SPEC CPU2017 Benchmark suites. Our results show that PALP reduces average PCM access latency by 23%, and improves average system performance by 28% compared to the state-of-the-art approaches.
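
The scheduling idea in this abstract can be illustrated with a small sketch. It is not the authors' scheduler: the `Request` type, the `can_overlap` rule, the `max_age` starvation threshold, and the choice to treat write-write pairs as non-overlapping are all assumptions made for illustration, and the RAPL power constraint is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Request:
    rid: int            # request identifier (hypothetical field)
    partition: int      # partition inside the bank that the request targets
    is_write: bool      # True for a write, False for a read
    age: int = 0        # scheduling rounds the request has waited so far


def can_overlap(a, b):
    """Assumed rule: two requests to the same bank can be served in parallel
    when they target different partitions and are not both writes."""
    return a.partition != b.partition and not (a.is_write and b.is_write)


def pick_next(queue, in_flight, max_age=8):
    """Pick the next request for one bank.

    queue     -- pending requests in arrival order
    in_flight -- request currently being served in this bank, or None
    max_age   -- assumed starvation threshold, in scheduling rounds
    """
    if not queue:
        return None
    # Starvation guard: a request that has waited too long is served first.
    oldest = max(queue, key=lambda r: r.age)
    if oldest.age >= max_age:
        return oldest
    # Otherwise prefer the earliest request that can run alongside the
    # in-flight one, even if older requests are still waiting.
    if in_flight is not None:
        for r in queue:
            if can_overlap(r, in_flight):
                return r
    # Fall back to plain arrival order.
    return queue[0]
```

With a write in flight to partition 0, a queued read to partition 1 is picked ahead of an earlier read to partition 0, until the earlier read reaches `max_age`.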

38 citations

Journal Article

A Framework to Explore Workload-Specific Performance and Lifetime Trade-offs in Neuromorphic Computing

Adarsha Balaji¹, Shihao Song¹, Anup Das¹, Nikil Dutt², Jeffrey L. Krichmar², Nagarajan Kandasamy¹, Francky Catthoor³

Drexel University¹, University of California, Irvine², Katholieke Universiteit Leuven³

01 Jul 2019-IEEE Computer Architecture Letters

TL;DR: The proposed framework first extracts the precise times at which a charge pump in the hardware is activated to support neural computations within a workload, then uses a characterized NBTI reliability model to estimate the charge pump's aging during the workload execution.

Abstract: Neuromorphic hardware with non-volatile memory (NVM) can implement machine learning workloads in an energy-efficient manner. Unfortunately, certain NVMs such as phase change memory (PCM) require high voltages for correct operation. These voltages are supplied from an on-chip charge pump. If the charge pump is activated too frequently, its internal CMOS devices do not recover from stress, accelerating their aging and leading to defects generated by negative bias temperature instability (NBTI). Forcefully discharging the stressed charge pump can lower the aging rate of its CMOS devices, but makes the neuromorphic hardware unavailable to perform computations while its charge pump is being discharged. This negatively impacts performance metrics, such as latency and accuracy, of the machine learning workload being executed. In this letter, we propose a novel framework to exploit workload-specific performance and lifetime trade-offs in neuromorphic computing. Our framework first extracts the precise times at which a charge pump in the hardware is activated to support neural computations within a workload. This timing information is then used with a characterized NBTI reliability model to estimate the charge pump's aging during the workload execution. We use our framework to evaluate workload-specific performance and reliability impacts of using 1) different spiking neural network (SNN) mapping strategies and 2) different charge pump discharge strategies. We show that our framework can be used by system designers to explore performance and reliability trade-offs early in the design of neuromorphic hardware such that appropriate reliability-oriented design margins can be set.

33 citations

Posted Content

A Framework to Explore Workload-Specific Performance and Lifetime Trade-offs in Neuromorphic Computing

Adarsha Balaji¹, Shihao Song¹, Anup Das¹, Nikil Dutt², Jeffrey L. Krichmar², Nagarajan Kandasamy¹, Francky Catthoor³

Drexel University¹, University of California, Irvine², Katholieke Universiteit Leuven³

01 Nov 2019-arXiv: Emerging Technologies

TL;DR: In this paper, the authors propose a framework to exploit workload-specific performance and lifetime trade-offs in neuromorphic computing by extracting the precise times at which a charge pump is activated to support neural computations within a workload.

Abstract: Neuromorphic hardware with non-volatile memory (NVM) can implement machine learning workloads in an energy-efficient manner. Unfortunately, certain NVMs such as phase change memory (PCM) require high voltages for correct operation. These voltages are supplied from an on-chip charge pump. If the charge pump is activated too frequently, its internal CMOS devices do not recover from stress, accelerating their aging and leading to defects generated by negative bias temperature instability (NBTI). Forcefully discharging the stressed charge pump can lower the aging rate of its CMOS devices, but makes the neuromorphic hardware unavailable to perform computations while its charge pump is being discharged. This negatively impacts performance metrics, such as latency and accuracy, of the machine learning workload being executed. In this paper, we propose a novel framework to exploit workload-specific performance and lifetime trade-offs in neuromorphic computing. Our framework first extracts the precise times at which a charge pump in the hardware is activated to support neural computations within a workload. This timing information is then used with a characterized NBTI reliability model to estimate the charge pump's aging during the workload execution. We use our framework to evaluate workload-specific performance and reliability impacts of using 1) different spiking neural network (SNN) mapping strategies and 2) different charge pump discharge strategies. We show that our framework can be used by system designers to explore performance and reliability trade-offs early in the design of neuromorphic hardware such that appropriate reliability-oriented design margins can be set.

15 citations

Posted Content

Enabling and Exploiting Partition-Level Parallelism (PALP) in Phase Change Memories

Shihao Song¹, Anup Das¹, Onur Mutlu², Nagarajan Kandasamy¹

Drexel University¹, ETH Zurich²

21 Aug 2019-arXiv: Hardware Architecture

TL;DR: PALP, a new mechanism that enables partition-level parallelism within each PCM bank and exploits that parallelism through the memory controller's access scheduling decisions, reduces average PCM access latency and improves average system performance.

Abstract: Phase-change memory (PCM) devices have multiple banks to serve memory requests in parallel. Unfortunately, if two requests go to the same bank, they have to be served one after another, leading to lower system performance. We observe that a modern PCM bank is implemented as a collection of partitions that operate mostly independently while sharing a few global peripheral structures, which include the sense amplifiers (to read) and the write drivers (to write). Based on this observation, we propose PALP, a new mechanism that enables partition-level parallelism within each PCM bank, and exploits such parallelism by using the memory controller's access scheduling decisions. PALP consists of three new contributions. First, we introduce new PCM commands to enable parallelism in a bank's partitions in order to resolve the read-write bank conflicts, with minimal changes needed to PCM logic and its interface. Second, we propose simple circuit modifications that introduce a new operating mode for the write drivers, in addition to their default mode of serving write requests. When configured in this new mode, the write drivers can resolve the read-read bank conflicts, working jointly with the sense amplifiers. Finally, we propose a new access scheduling mechanism in PCM that improves performance by prioritizing those requests that exploit partition-level parallelism over other requests, including the long outstanding ones. While doing so, the memory controller also guarantees starvation-freedom and the PCM's running-average-power-limit (RAPL). We evaluate PALP with workloads from the MiBench and SPEC CPU2017 Benchmark suites. Our results show that PALP reduces average PCM access latency by 23%, and improves average system performance by 28% compared to the state-of-the-art approaches.

7 citations

Journal Article

An Efficient Strategy for Online Performance Monitoring of Datacenters via Adaptive Sampling

Tingshan Huang¹, Nagarajan Kandasamy², Harish Sethu², Matthew C. Stamm²

Akamai Technologies¹, Drexel University²

01 Jan 2019-IEEE Transactions on Cloud Computing

TL;DR: A strategy based on adaptive-rate compressive sampling is developed and validated; it exploits the fact that the signals of interest can often be sparsified under an appropriate representation basis and that the sampling rate can be tuned as a function of sparsity.

Abstract: Performance monitoring of datacenters provides vital information for dynamic resource provisioning, anomaly detection, and capacity planning decisions. Online monitoring, however, incurs a variety of costs: the very act of monitoring a system interferes with its performance, consuming network bandwidth and disk space. With the goal of reducing these costs, this paper develops and validates a strategy based on adaptive-rate compressive sampling. It exploits the fact that the signals of interest often can be sparsified under an appropriate representation basis and that the sampling rate can be tuned as a function of sparsity. We use the Trade6 application as our experimental platform and measure the signals of interest—in our case, signals pertaining to memory and disk I/O activity—using adaptive sampling. We then evaluate whether the reconstructed signals can be used for trend detection to track the gradual deterioration of system performance associated with software aging. Our experiments show that the signals recovered by our methods can be used to detect, with high confidence, the existence of trends within the original signal. We also evaluate the reconstructed signals for threshold-violation detection wherein the magnitude of the signal exceeds a preset value. Our experiments show that performance bottlenecks and anomalies that manifest themselves in portions of the signal where its magnitude exceeds a threshold value can also be detected using the reconstructed signals. Most importantly, detection of these anomalies is achieved using a substantially reduced sample size—a reduction of more than 70 percent when compared to the standard fixed-rate sampling method.

5 citations

Proceedings Article

Data Reduction, Compression, and Recovery for Online Performance Monitoring

Salvador DeCelles¹, Matthew C. Stamm¹, Nagarajan Kandasamy¹

Drexel University¹

08 Jul 2019

TL;DR: A low-cost method is developed for obtaining a sparse representation of the data collected at each individual server while preserving a specified fidelity with respect to the original signal; the achieved data reduction is a function of that fidelity.

Abstract: The volume of data needed for effective monitoring of datacenters poses significant challenges in its collection, transmission, analysis, and storage. Considering a setting wherein data collected locally at a server is sent to a monitoring station for analysis, this paper develops computationally efficient methods for systematic reduction of this data during the transfer and its subsequent recovery at the monitoring station. Specifically, we develop a low-cost method of obtaining a sparse representation of the data collected at each individual server while preserving a specified fidelity with respect to the original signal. The sparsified representation obtained from the data-collection step is amenable to further compression prior to transmission to the monitoring station. Upon receipt of the compressed-data stream at the monitoring station, a method of sparse-signal recovery is utilized to reconstruct the original full-length signal for further analysis. The techniques are validated using workload traces collected from one of Google's production clusters. Experiments show that the achieved data reduction, which is a function of the specified fidelity, is significant: to reconstruct the signal with a fidelity between 90% and 95%, the sample size that must be transferred to the monitoring station is under 10% of the original. We also verify that the recovered signal tracks the target minimum fidelity requirements specified by the operator with high precision.
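
The trade-off between fidelity and data volume described above can be sketched with a simple transform-and-threshold scheme. This is a stand-in, not the paper's method: the choice of an orthonormal DCT as the representation basis, the definition of fidelity as the fraction of signal energy retained, and the function names `sparsify` and `recover` are all assumptions made for illustration.

```python
import math


def dct(x):
    """Orthonormal DCT-II, written out directly (O(n^2)) for clarity."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def sparsify(signal, fidelity):
    """Keep the largest-magnitude coefficients until the retained energy
    reaches `fidelity` (0..1) of the total. Returns {index: coefficient}."""
    coeffs = dct(signal)
    total = sum(c * c for c in coeffs)
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept, energy = {}, 0.0
    for k in order:
        if total == 0 or energy >= fidelity * total:
            break
        kept[k] = coeffs[k]
        energy += coeffs[k] ** 2
    return kept


def recover(kept, n):
    """Rebuild a length-n signal from the retained coefficients."""
    c = [0.0] * n
    for k, v in kept.items():
        c[k] = v
    return idct(c)
```

Because the transform is orthonormal, the squared reconstruction error equals the energy of the discarded coefficients, so the error is bounded by `(1 - fidelity)` times the signal energy. Only the `kept` dictionary would need to be sent to the monitoring station.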

3 citations

Journal Article

Securing Wireless Communication via Hardware-Based Packet Obfuscation

James Chacko¹, Kyle Juretus¹, Marko Jacovic¹, Cem Sahin¹, Nagarajan Kandasamy¹, Ioannis Savidis¹, Kapil R. Dandekar¹

Drexel University¹

01 Sep 2019

TL;DR: The implemented rolling physical layer key policy and Phy-Leave system resulted in a less than 1% increase in the area of a Virtex6 FPGA, demonstrating physical layer obfuscation as a means to increase the security of wireless communication without a significant cost in hardware.

Abstract: Obfuscation of the orthogonal frequency-division multiplexing (OFDM) physical layer is described in this paper as a means to enhance the security of wireless communication. The standardization of the communication channel between two trusted parties results in a variety of security threats, including vulnerabilities in WPA/WPA2 protocols that allow for the extraction of the software layer encryption key. Obfuscating the physical layer of the OFDM pipeline provides an additional layer of security in the event that the software layer key is compromised and allows for rolling updates of the physical layer key without altering the software layer key. The interleaver stage of the OFDM pipeline is redesigned to utilize a physical layer key, which is termed Phy-Leave. The Phy-Leave interleaver is evaluated through both MATLAB simulation and hardware prototyping on the Software Defined Communication (SDC) testbed using a Virtex6 FPGA. The implemented rolling physical layer key policy and Phy-Leave system resulted in a less than 1% increase in the area of a Virtex6 FPGA, demonstrating physical layer obfuscation as a means to increase the security of wireless communication without a significant cost in hardware.
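
The general idea of an interleaver whose permutation depends on a secret key can be sketched as follows. This is not the Phy-Leave design: deriving the permutation by seeding Python's `random.Random` with the key is an assumption chosen for brevity, that generator is not cryptographically secure, and the function names are hypothetical.

```python
import random


def key_permutation(key, block_len):
    """Derive a permutation of block positions from the key.
    Illustrative only: random.Random is NOT a secure generator."""
    rng = random.Random(key)
    perm = list(range(block_len))
    rng.shuffle(perm)
    return perm


def interleave(bits, key):
    """Reorder one block of bits according to the key-derived permutation."""
    perm = key_permutation(key, len(bits))
    return [bits[p] for p in perm]


def deinterleave(bits, key):
    """Undo interleave(); restores the original order only with the same key."""
    perm = key_permutation(key, len(bits))
    out = [0] * len(bits)
    for pos, p in enumerate(perm):
        out[p] = bits[pos]
    return out
```

A receiver holding the same key recovers the block with `deinterleave(interleave(bits, key), key)`. A rolling key policy would correspond to changing `key` between blocks on both sides.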

1 citation

Patent

Physical Layer Key based Interleaving for Secure Wireless Communication

Kapil R. Dandekar¹, James Chacko, Kyle Juretus, Marko Jacovic, Cem Sahin, Nagarajan Kandasamy, Ioannis Savidis

Drexel University¹

05 Dec 2019

TL;DR: A key-based interleaver for enhancing the security of wireless communication includes a physical layer communication channel key to provide security even when the software encryption key is compromised.

Abstract: A key-based interleaver for enhancing the security of wireless communication includes a physical layer communication channel key to provide security even when the software encryption key is compromised. A method of creating a secure communication link using a physical layer interleaving system includes implementing a key policy that utilizes temporal dependency and interleaving bits using a flexible inter- and intra-block data interleaver.

1 citation