Author

Ron Kalla

Bio: Ron Kalla is an academic researcher from IBM. The author has contributed to research in topics: POWER5 & IBM. The author has an h-index of 3 and has co-authored 4 publications receiving 322 citations.

Papers
Journal ArticleDOI
TL;DR: IBM's 7th-generation Power chip continues the Power Systems line with a balanced multi-core design, eDRAM technology, and SMT4, delivering greater than 4X the performance of the previous generation in the same power envelope.
Abstract: The Power7 is IBM's first eight-core processor, with each core capable of four-way simultaneous-multithreading operation. Its key architectural features include an advanced memory hierarchy with three levels of on-chip cache; embedded-DRAM devices used in the highest level of the cache; and a new memory interface. This balanced multicore design scales from 1 to 32 sockets in commercial and scientific environments.
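To put the abstract's figures in concrete terms, here is a minimal back-of-the-envelope sketch in C. The per-chip numbers (8 cores, SMT4) and the 1-32 socket range come from the abstract; the hardware-thread totals are derived arithmetic, not figures quoted from the paper.

```c
/* Thread-count arithmetic implied by the abstract: 8 cores per chip,
 * 4 SMT threads per core, scaling from 1 to 32 sockets. The totals are
 * derived here for illustration, not quoted from the paper. */
#include <stdio.h>

int main(void) {
    const int cores_per_chip = 8;  /* Power7: IBM's first eight-core processor */
    const int smt_per_core   = 4;  /* four-way simultaneous multithreading */

    for (int sockets = 1; sockets <= 32; sockets *= 2)
        printf("%2d socket(s): %4d hardware threads\n",
               sockets, sockets * cores_per_chip * smt_per_core);
    return 0;
}
```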

259 citations

Journal ArticleDOI
TL;DR: With a new core microarchitecture design, along with an innovative I/O fabric to support several accelerated computing requirements, the Power9 processor meets the diverse computing needs of the cognitive era and provides a platform for accelerated computing.
Abstract: The IBM Power9 processor has an enhanced core and chip architecture that provides superior thread performance and higher throughput. The core and chip architectures are optimized for emerging workloads to support the needs of next-generation computing. Multiple variants of silicon target the scale-out and scale-up markets. With a new core microarchitecture design, along with an innovative I/O fabric to support several accelerated computing requirements, the Power9 processor meets the diverse computing needs of the cognitive era and provides a platform for accelerated computing.
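As context for the "multiple variants of silicon" mentioned above, the short sketch below contrasts the two widely documented Power9 configurations. The core counts and SMT widths are taken from IBM's public Power9 material, not from this abstract, so treat them as an assumption.

```c
/* The two widely documented Power9 silicon variants: SMT4 cores for the
 * scale-out market, SMT8 cores for scale-up. Core counts and SMT widths
 * are from public IBM material, not from this paper. */
#include <stdio.h>

struct variant { const char *market; int cores; int smt; };

int main(void) {
    const struct variant v[] = {
        { "scale-out", 24, 4 },  /* up to 24 SMT4 cores */
        { "scale-up",  12, 8 },  /* up to 12 SMT8 cores */
    };
    for (int i = 0; i < 2; i++)
        printf("%-9s: %2d cores x SMT%d = %d hardware threads per chip\n",
               v[i].market, v[i].cores, v[i].smt, v[i].cores * v[i].smt);
    return 0;
}
```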

82 citations


Cited by
Journal ArticleDOI
24 Dec 2015 - Nature
TL;DR: This demonstration could represent the beginning of an era of chip-scale electronic–photonic systems with the potential to transform computing system architectures, enabling more powerful computers, from network infrastructure to data centres and supercomputers.
Abstract: An electronic–photonic microprocessor chip manufactured using a conventional microelectronics foundry process is demonstrated; the chip contains 70 million transistors and 850 photonic components and directly uses light to communicate to other chips. The rapid transfer of data between chips in computer systems and data centres has become one of the bottlenecks in modern information processing. One way of increasing speeds is to use optical connections rather than electrical wires, and the past decade has seen significant efforts to develop silicon-based nanophotonic approaches to integrate such links within silicon chips, but incompatibility between the manufacturing processes used in electronics and photonics has proved a hindrance. Now Chen Sun et al. describe a 'system on a chip' microprocessor that successfully integrates electronics and photonics yet is produced using standard microelectronic chip fabrication techniques. The resulting microprocessor combines 70 million transistors and 850 photonic components and can communicate optically with the outside world. This result promises a way forward for new fast, low-power computing system architectures.

Data transport across short electrical wires is limited by both bandwidth and power density, which creates a performance bottleneck for semiconductor microchips in modern computer systems—from mobile phones to large-scale data centres. These limitations can be overcome [1,2,3] by using optical communications based on chip-scale electronic–photonic systems [4,5,6,7] enabled by silicon-based nanophotonic devices [8]. However, combining electronics and photonics on the same chip has proved challenging, owing to microchip manufacturing conflicts between electronics and photonics. Consequently, current electronic–photonic chips [9,10,11] are limited to niche manufacturing processes and include only a few optical devices alongside simple circuits. Here we report an electronic–photonic system on a single chip integrating over 70 million transistors and 850 photonic components that work together to provide logic, memory, and interconnect functions. This system is a realization of a microprocessor that uses on-chip photonic devices to directly communicate with other chips using light. To integrate electronics and photonics at the scale of a microprocessor chip, we adopt a 'zero-change' approach to the integration of photonics. Instead of developing a custom process to enable the fabrication of photonics [12], which would complicate or eliminate the possibility of integration with state-of-the-art transistors at large scale and at high yield, we design optical devices using a standard microelectronics foundry process that is used for modern microprocessors [13,14,15,16]. This demonstration could represent the beginning of an era of chip-scale electronic–photonic systems with the potential to transform computing system architectures, enabling more powerful computers, from network infrastructure to data centres and supercomputers.

1,058 citations

Journal ArticleDOI
TL;DR: Studies based on the proposed scaling methodology show that in-plane STT-MRAM will outperform SRAM from the 15 nm node, while its perpendicular counterpart requires further innovations in MTJ materials to overcome poor write-performance scaling from the 22 nm node onwards.
Abstract: This paper explores the scalability of in-plane and perpendicular MTJ-based STT-MRAMs from 65 nm to 8 nm while taking realistic variability effects into consideration. We focus on the read and write performance of an STT-MRAM-based cache rather than the obvious advantages such as the denser bit-cell and zero static power. An accurate MTJ macromodel capturing key MTJ properties was adopted for efficient Monte Carlo simulations. For the simulation of access devices and peripheral circuitry, ITRS-projected transistor parameters were utilized and calibrated using the MASTAR tool, which is widely used in industry. 6T SRAM and STT-MRAM arrays were implemented with aggressive assist schemes to mimic industrial memory designs. A constant J_C0·RA/V_DD scaling scenario was used, which to first order gives the optimal balance between the read and write margins of STT-MRAMs. The thermal stability factor ensuring a 10-year retention time was obtained by adjusting the free-layer thickness as well as assuming improvement in the crystalline anisotropy. Our studies based on the proposed scaling methodology show that in-plane STT-MRAM will outperform SRAM from the 15 nm node, while its perpendicular counterpart requires further innovations in MTJ materials in order to overcome the poor write-performance scaling from the 22 nm node onwards.
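The thermal stability factor ensuring a 10-year retention time follows from the standard thermal-activation model t_ret ≈ τ0·exp(Δ), where Δ = E_b/(k_B·T). The quick check below assumes a conventional attempt time of τ0 = 1 ns and a single-bit retention target (an array needs a larger Δ to bound total failures); these are textbook defaults, not values from the paper.

```c
/* Back-of-the-envelope check of the thermal stability factor Delta needed
 * for 10-year retention, using the thermal-activation model
 *   t_ret ~= tau0 * exp(Delta),   Delta = E_b / (k_B * T).
 * Assumptions (not from the paper): tau0 = 1 ns attempt time and a
 * single-bit retention target; array-level yield requires a larger Delta. */
#include <stdio.h>
#include <math.h>

int main(void) {
    const double tau0  = 1e-9;                        /* attempt time, s */
    const double t_ret = 10.0 * 365.25 * 24 * 3600;   /* 10 years, in s */
    printf("Delta >= ln(t_ret / tau0) = %.1f\n", log(t_ret / tau0)); /* ~40 */
    return 0;
}
```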

322 citations

Journal ArticleDOI
04 Jun 2011
TL;DR: Gives an abstract-machine semantics for IBM POWER multiprocessors that abstracts from most of the implementation detail yet explains the behaviour of a range of subtle examples, which should bring new clarity to concurrent systems programming for these architectures.
Abstract: Exploiting today's multiprocessors requires high-performance and correct concurrent systems code (optimising compilers, language runtimes, OS kernels, etc.), which in turn requires a good understanding of the observable processor behaviour that can be relied on. Unfortunately this critical hardware/software interface is not at all clear for several current multiprocessors.

In this paper we characterise the behaviour of IBM POWER multiprocessors, which have a subtle and highly relaxed memory model (ARM multiprocessors have a very similar architecture in this respect). We have conducted extensive experiments on several generations of processors: POWER G5, 5, 6, and 7. Based on these, on published details of the microarchitectures, and on discussions with IBM staff, we give an abstract-machine semantics that abstracts from most of the implementation detail but explains the behaviour of a range of subtle examples. Our semantics is explained in prose but defined in rigorous machine-processed mathematics; we also confirm that it captures the observable processor behaviour, or the architectural intent, for our examples with an executable checker. While not officially sanctioned by the vendor, we believe that this model gives a reasonable basis for reasoning about current POWER multiprocessors.

Our work should bring new clarity to concurrent systems programming for these architectures, and is a necessary precondition for any analysis or verification. It should also inform the design of languages such as C and C++, where the language memory model is constrained by what can be efficiently compiled to such multiprocessors.
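A concrete instance of the "subtle examples" such a semantics must explain is the classic message-passing (MP) litmus test, sketched below with C11 atomics. This is an illustration, not code or notation from the paper: with relaxed atomics, POWER is allowed to reorder the two stores or the two loads so that the reader sees the flag set but stale data; the release/acquire pair, which compilers map to lwsync and a dependency or isync on POWER, forbids that outcome.

```c
/* Message-passing (MP) litmus test in C11 atomics -- an illustration of
 * the kind of example the paper's model explains, not code from the paper.
 * With memory_order_relaxed everywhere, POWER may expose r1==1 && r2==0;
 * the release/acquire pair below forbids that outcome. */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_int data, flag;

static void *writer(void *arg) {
    (void)arg;
    atomic_store_explicit(&data, 1, memory_order_relaxed);
    atomic_store_explicit(&flag, 1, memory_order_release); /* lwsync; st */
    return NULL;
}

static void *reader(void *arg) {
    (void)arg;
    int r1 = atomic_load_explicit(&flag, memory_order_acquire);
    int r2 = atomic_load_explicit(&data, memory_order_relaxed);
    printf("r1=%d r2=%d\n", r1, r2); /* r1==1 && r2==0 is now forbidden */
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    pthread_create(&t0, NULL, writer, NULL);
    pthread_create(&t1, NULL, reader, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}
```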

249 citations

Proceedings ArticleDOI
04 Dec 2010
TL;DR: Analyzes the relative merits of conventional cores and U-cores in the face of technology constraints; the model's predictive power depends on U-core-specific parameters derived by measuring the performance and power of tuned applications on today's state-of-the-art multicores, GPUs, FPGAs, and ASICs.
Abstract: To extend the exponential performance scaling of future chip multiprocessors, improving energy efficiency has become a first-class priority. Single-chip heterogeneous computing has the potential to achieve greater energy efficiency by combining traditional processors with unconventional cores (U-cores) such as custom logic, FPGAs, or GPGPUs. Although U-cores are effective at increasing performance, their benefits can also diminish given the scarcity of projected bandwidth in the future. To understand the relative merits of different approaches in the face of technology constraints, this work builds on prior modeling of heterogeneous multicores to support U-cores. Unlike prior models that trade performance, power, and area using well-known relationships between simple and complex processors, our model must consider the less-obvious relationships between conventional processors and a diverse set of U-cores. Further, our model supports speculation about future designs from scaling trends predicted by the ITRS road map. The predictive power of our model depends upon U-core-specific parameters derived by measuring the performance and power of tuned applications on today's state-of-the-art multicores, GPUs, FPGAs, and ASICs. Our results reinforce some current-day understandings of the potential and limitations of U-cores and also provide new insights on their relative merits.
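As a rough illustration of the kind of trade-off such models capture, here is a minimal Amdahl-style sketch in the spirit of, but far simpler than, the paper's model. The parameters f (fraction of work a U-core can accelerate) and s_u (the U-core's speedup on that work) are hypothetical values chosen for illustration.

```c
/* Amdahl's-law-style model of a chip pairing a conventional core with a
 * U-core -- in the spirit of, but far simpler than, the paper's model.
 * f and s_u are hypothetical illustration values, not from the paper. */
#include <stdio.h>

static double speedup(double f, double s_u) {
    /* Unaccelerated fraction runs at 1x; accelerated fraction at s_u. */
    return 1.0 / ((1.0 - f) + f / s_u);
}

int main(void) {
    const double s_u = 50.0;   /* e.g., custom logic on its target kernel */
    for (int i = 5; i <= 9; i++) {
        double f = i / 10.0;   /* fraction of work the U-core accelerates */
        printf("f = %.1f -> overall speedup = %5.2fx\n", f, speedup(f, s_u));
    }
    return 0;
}
```

Even with a 50x U-core, the overall speedup is capped by the unaccelerated fraction, which is one reason the bandwidth and technology constraints the abstract mentions matter so much.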

249 citations

Proceedings ArticleDOI
04 Nov 2013
TL;DR: PHANTOM is the first demonstration of a practical oblivious processor; it is efficient in both area and performance and can provide strong confidentiality guarantees when offloading computation to the cloud.
Abstract: We introduce PHANTOM [1], a new secure processor that obfuscates its memory access trace. To an adversary who can observe the processor's output pins, all memory access traces are computationally indistinguishable (a property known as obliviousness). We achieve obliviousness through a cryptographic construct known as Oblivious RAM, or ORAM. We first improve an existing ORAM algorithm and construct an empirical model for its trusted storage requirement. We then present PHANTOM, an oblivious processor whose novel memory controller aggressively exploits DRAM bank parallelism to reduce ORAM access latency and scales well to a large number of memory channels. Finally, we build a complete hardware implementation of PHANTOM on a commercially available FPGA-based server, and through detailed experiments show that PHANTOM is efficient in both area and performance. Accessing 4KB of data from a 1GB ORAM takes 26.2 µs (13.5 µs for the data to be available), a 32x slowdown over accessing 4KB from regular memory, while SQLite queries on a population database see a 1.2-6x slowdown. PHANTOM is the first demonstration of a practical oblivious processor and can provide strong confidentiality guarantees when offloading computation to the cloud.
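For readers unfamiliar with the underlying construct, the sketch below outlines a single Path ORAM access of the kind PHANTOM's memory controller performs in hardware. The tree layout, bucket size, and stash handling are simplified, hypothetical choices, and the write-back/eviction step is only indicated in a comment; none of this is code from the paper.

```c
/* Single Path ORAM access, heavily simplified. Real designs (and PHANTOM)
 * keep the stash across accesses, evict on write-back, and encrypt every
 * bucket; parameters here are hypothetical illustration values. */
#include <stdlib.h>
#include <string.h>

#define L  10                /* tree height: 2^L leaves */
#define Z   4                /* blocks per bucket */
#define N  (1 << L)          /* number of logical blocks (illustrative) */

typedef struct { int id; char data[64]; } Block;  /* id < 0 => dummy */

static Block tree[(2 << L) - 1][Z];  /* complete binary tree of buckets */
static int   pos[N];                 /* position map: block id -> leaf */
static Block stash[(L + 1) * Z];     /* trusted on-chip stash */
static int   stash_n;

/* Read every bucket on the path from `leaf` up to the root into the stash. */
static void read_path(int leaf) {
    int node = leaf + N - 1;         /* leaf's index in the heap layout */
    for (int lvl = 0; lvl <= L; lvl++, node = (node - 1) / 2)
        for (int z = 0; z < Z; z++)
            if (tree[node][z].id >= 0)
                stash[stash_n++] = tree[node][z];
}

/* One oblivious access: path read, remap to a fresh random leaf, copy out. */
static void oram_access(int id, char out[64]) {
    stash_n = 0;                     /* sketch only: a real stash persists */
    int leaf = pos[id];
    pos[id] = rand() % N;            /* remap before the block is reused */
    read_path(leaf);
    for (int i = 0; i < stash_n; i++)
        if (stash[i].id == id)
            memcpy(out, stash[i].data, sizeof stash[i].data);
    /* write_path(leaf): evict stash blocks back along the same path,
     * each as deep as its new leaf allows -- omitted in this sketch. */
}

int main(void) {
    memset(tree, 0xFF, sizeof tree); /* mark every slot dummy (id == -1) */
    for (int i = 0; i < N; i++) pos[i] = rand() % N;
    char buf[64] = {0};
    oram_access(7, buf);             /* a miss on the still-empty tree */
    return 0;
}
```

Every access touches (L+1)·Z blocks plus re-encryption work regardless of what is actually read, which is the kind of fixed overhead behind the 32x slowdown reported above.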

249 citations