Author

Ken Mai

Other affiliations: Stanford University
Bio: Ken Mai is an academic researcher from Carnegie Mellon University. His research spans topics including Flash memory and Flash (photography). He has an h-index of 30 and has co-authored 76 publications receiving 5,858 citations. Previous affiliations of Ken Mai include Stanford University.


Papers
Journal ArticleDOI
01 Apr 2001
TL;DR: Wires that shorten in length as technologies scale have delays that either track gate delays or grow slowly relative to gate delays, which is good news since these "local" wires dominate chip wiring.
Abstract: Concern about the performance of wires in scaled technologies has led to research exploring other communication methods. This paper examines wire and gate delays as technologies migrate from 0.18-μm to 0.035-μm feature sizes to better understand the magnitude of the wiring problem. Wires that shorten in length as technologies scale have delays that either track gate delays or grow slowly relative to gate delays. This result is good news since these "local" wires dominate chip wiring. Despite this scaling of local wire performance, computer-aided design (CAD) tools must still become more sophisticated in dealing with these wires. Under scaling, the total number of wires grows exponentially, so CAD tools will need to handle an ever-growing percentage of all the wires in order to keep designer workloads constant. Global wires present a more serious problem to designers. These are wires that do not scale in length since they communicate signals across the chip. The delay of these wires will remain constant if repeaters are used, meaning that, relative to gate delays, their delays scale upwards. These increased delays for global communication will drive architectures toward modular designs with explicit global latency mechanisms.
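The scaling argument above leans on a standard RC fact: an unrepeated wire's delay grows quadratically with its length, while breaking the wire into repeater-driven segments makes the delay grow roughly linearly, which is why repeaters keep global wire delay bounded. The Python sketch below illustrates this with a simple Elmore-style model; all component values (driver resistance, repeater input capacitance, wire resistance and capacitance per mm) are assumed placeholders, not figures from the paper.

```python
# Illustrative Elmore-delay comparison of an unrepeated vs. a repeated global wire.
# All parameter values are made-up placeholders, not data from the paper.

R_DRV = 1_000.0      # driver/repeater output resistance (ohms), assumed
C_IN  = 2e-15        # repeater input capacitance (F), assumed
R_W   = 200.0        # wire resistance per mm (ohms/mm), assumed
C_W   = 200e-15      # wire capacitance per mm (F/mm), assumed

def unrepeated_delay(length_mm: float) -> float:
    """Single driver at the near end: distributed RC grows quadratically with length."""
    rw, cw = R_W * length_mm, C_W * length_mm
    return 0.69 * R_DRV * cw + 0.38 * rw * cw

def repeated_delay(length_mm: float, n_segments: int) -> float:
    """Wire broken into n segments, one repeater per segment: delay grows ~linearly."""
    seg = length_mm / n_segments
    rw, cw = R_W * seg, C_W * seg
    per_segment = 0.69 * R_DRV * (cw + C_IN) + 0.38 * rw * cw
    return n_segments * per_segment

for L in (2, 5, 10, 20):                      # wire lengths in mm
    print(f"{L:2d} mm: unrepeated {unrepeated_delay(L)*1e9:6.2f} ns, "
          f"repeated x{L} {repeated_delay(L, L)*1e9:6.2f} ns")
```

At short lengths the two are comparable, but the unrepeated delay pulls away quadratically as the wire gets longer, which is the mechanism behind the abstract's claim about repeated global wires.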

1,486 citations

Proceedings ArticleDOI
Ken Mai, Tim Paaske, Nuwan Jayasena, Ron Ho, William J. Dally, Mark Horowitz
01 May 2000
TL;DR: Simulations of the mappings show that the Smart Memories architecture can successfully map two very different machines at opposite ends of the architectural spectrum, the Imagine stream processor and the Hydra speculative multiprocessor, with only modest performance degradation.
Abstract: Trends in VLSI technology scaling demand that future computing devices be narrowly focused to achieve high performance and high efficiency, yet also target the high volumes and low costs of widely applicable general purpose designs. To address these conflicting requirements, we propose a modular reconfigurable architecture called Smart Memories, targeted at computing needs in the 0.1-μm technology generation. A Smart Memories chip is made up of many processing tiles, each containing local memory, local interconnect, and a processor core. For efficient computation under a wide class of possible applications, the memories, the wires, and the computational model can all be altered to match the applications. To show the applicability of this design, two very different machines at opposite ends of the architectural spectrum, the Imagine stream processor and the Hydra speculative multiprocessor, are mapped onto the Smart Memories computing substrate. Simulations of the mappings show that the Smart Memories architecture can successfully map these architectures with only modest performance degradation.
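As a loose illustration of the tile-based organization the abstract describes, the sketch below models a chip as tiles whose local memories can be switched between usage modes to suit different mappings. The class names, fields, and mode strings are hypothetical; this is not the Smart Memories configuration interface, just a toy in its spirit.

```python
# Toy illustration of a tile-based, reconfigurable layout in the spirit of the
# abstract above. Names, fields, and modes are hypothetical, not from the paper.

from dataclasses import dataclass, field
from typing import List

@dataclass
class MemoryMat:
    size_kb: int
    mode: str = "cache"          # e.g. "cache", "scratchpad", or "speculative-buffer"

@dataclass
class Tile:
    core: str                    # processor core in the tile
    mats: List[MemoryMat] = field(default_factory=list)

    def reconfigure(self, mode: str) -> None:
        """Switch every local memory mat in this tile to a new usage mode."""
        for mat in self.mats:
            mat.mode = mode

# Map two very different targets onto the same substrate by reconfiguring tiles.
chip = [Tile(core=f"core{i}", mats=[MemoryMat(16) for _ in range(8)]) for i in range(4)]

for tile in chip[:2]:
    tile.reconfigure("scratchpad")           # stream-style mapping (Imagine-like)
for tile in chip[2:]:
    tile.reconfigure("speculative-buffer")   # speculative multiprocessor mapping (Hydra-like)

for tile in chip:
    print(tile.core, {m.mode for m in tile.mats})
```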

477 citations

Proceedings ArticleDOI
12 Mar 2012
TL;DR: A framework for fast and accurate characterization of flash memory throughout its lifetime is designed and implemented, and distinct error patterns, such as cycle-dependency, location-dependency, and value-dependency, are demonstrated for various types of flash operations.
Abstract: As NAND flash memory manufacturers scale down to smaller process technology nodes and store more bits per cell, reliability and endurance of flash memory reduce. Wear-leveling and error correction coding can improve both reliability and endurance, but finding effective algorithms requires a strong understanding of flash memory error patterns. To enable such understanding, we have designed and implemented a framework for fast and accurate characterization of flash memory throughout its lifetime. This paper examines the complex flash errors that occur at 30-40nm flash technologies. We demonstrate distinct error patterns, such as cycle-dependency, location-dependency and value-dependency, for various types of flash operations. We analyze the discovered error patterns and explain why they exist from a circuit and device standpoint. Our hope is that the understanding developed from this characterization serves as a building block for new error tolerance algorithms for flash memory.
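The sketch below shows the general shape of such a characterization loop: program/erase a block with a known pattern, read it back, and log when and where bit errors appear. The flash-access functions (erase_block, program_page, read_page) are hypothetical stand-ins for a real test platform, not an API from the paper.

```python
# Sketch of a flash characterization loop: repeatedly program/erase a block with a
# known pattern, read it back, and log where and when bit errors occur.
# erase_block/program_page/read_page are hypothetical placeholders for hardware access.

import random

PAGE_BYTES = 4096
PATTERN = bytes([0xA5] * PAGE_BYTES)

def erase_block(block: int) -> None: ...
def program_page(block: int, page: int, data: bytes) -> None: ...

def read_page(block: int, page: int) -> bytes:
    # Placeholder read: returns the pattern with rare random single-bit flips.
    return bytes(b ^ (1 << random.randrange(8)) if random.random() < 1e-4 else b
                 for b in PATTERN)

error_log = []                                   # (P/E cycle, page, bit-error count)

for cycle in range(1, 101):                      # apply program/erase wear
    erase_block(0)
    for page in range(16):
        program_page(0, page, PATTERN)
    for page in range(16):
        readback = read_page(0, page)
        bit_errors = sum(bin(a ^ b).count("1") for a, b in zip(readback, PATTERN))
        if bit_errors:
            error_log.append((cycle, page, bit_errors))

print(f"logged {len(error_log)} erroneous page reads")
# error_log can then be mined for cycle-, location-, and value-dependent patterns.
```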

383 citations

Proceedings ArticleDOI
01 Dec 2007
TL;DR: Two-dimensional (2D) error coding in embedded memories is proposed as a scalable multi-bit error protection technique to improve memory reliability and yield, and it is shown that 2D error coding can correct clustered errors up to 32×32 bits with significantly smaller performance, area, and power overheads than conventional techniques.
Abstract: In deep sub-micron ICs, growing amounts of on-die memory and scaling effects make embedded memories increasingly vulnerable to reliability and yield problems. As scaling progresses, soft and hard errors in the memory system will increase and single error events are more likely to cause large-scale multi-bit errors. However, conventional memory protection techniques can neither detect nor correct large-scale multi-bit errors without incurring large performance, area, and power overheads. We propose two-dimensional (2D) error coding in embedded memories, a scalable multi-bit error protection technique to improve memory reliability and yield. The key innovation is the use of vertical error coding across words, applied only for error correction, in combination with conventional per-word horizontal error coding. We evaluate this scheme in the cache hierarchies of two representative chip multiprocessor designs and show that 2D error coding can correct clustered errors up to 32×32 bits with significantly smaller performance, area, and power overheads than conventional techniques.
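A heavily simplified sketch of the two-dimensional idea follows: a horizontal check per word locates a damaged word, and a vertical check word formed across all words reconstructs it. Single-bit parity stands in for the stronger per-word and vertical ECC codes the paper actually uses, purely to keep the example short.

```python
# Simplified 2D coding illustration: horizontal parity per word detects which word
# was hit, and a vertical check word (bitwise XOR across all words) rebuilds it.
# Real designs use proper ECC codes; plain parity keeps this sketch short.

def hparity(word: int) -> int:
    return bin(word).count("1") & 1             # even-parity bit for one word

words = [0b10110010, 0b01100111, 0b11110000, 0b00011011]
hbits = [hparity(w) for w in words]             # horizontal check bits, one per word
vword = 0
for w in words:
    vword ^= w                                  # vertical check word across the words

# A clustered error clobbers word 2 with an odd number of bit flips
# (single-bit parity only notices odd-weight errors; real ECC does better).
corrupted = list(words)
corrupted[2] ^= 0b01011011

bad = [i for i, w in enumerate(corrupted) if hparity(w) != hbits[i]]
assert bad == [2], "horizontal parity locates the damaged word"

recovered = vword
for i, w in enumerate(corrupted):
    if i != bad[0]:
        recovered ^= w                          # XOR the good words into the check word
corrupted[bad[0]] = recovered
assert corrupted == words
print(f"recovered word 2: {recovered:#010b}")
```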

343 citations

Proceedings ArticleDOI
18 Mar 2013
TL;DR: A key result is that the threshold voltage distribution can be modeled, with more than 95% accuracy, as a Gaussian distribution with additive white noise, which shifts to the right and widens as P/E cycles increase.
Abstract: With continued scaling of NAND flash memory process technology and multiple bits programmed per cell, NAND flash reliability and endurance are degrading. Understanding, characterizing, and modeling the distribution of the threshold voltages across different cells in a modern multi-level cell (MLC) flash memory can enable the design of more effective and efficient error correction mechanisms to combat this degradation. We show the first published experimental measurement-based characterization of the threshold voltage distribution of flash memory. To accomplish this, we develop a testing infrastructure that uses the read retry feature present in some 2Y-nm (i.e., 20-24nm) flash chips. We devise a model of the threshold voltage distributions taking into account program/erase (P/E) cycle effects, analyze the noise in the distributions, and evaluate the accuracy of our model. A key result is that the threshold voltage distribution can be modeled, with more than 95% accuracy, as a Gaussian distribution with additive white noise, which shifts to the right and widens as P/E cycles increase. The novel characterization and models provided in this paper can enable the design of more effective error tolerance mechanisms for future flash memories.
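The snippet below sketches the kind of model the abstract describes: threshold voltages drawn from a Gaussian whose mean shifts right and whose spread widens as program/erase cycles accumulate. The drift and widening coefficients are invented for illustration; the paper fits its parameters from measured 2Y-nm MLC data.

```python
# Sketch of a threshold-voltage model in the spirit of the abstract: a Gaussian
# whose mean drifts right and whose spread widens with P/E cycles.
# All coefficients below are assumed for illustration, not fitted values from the paper.

import random

def sample_vth(pe_cycles: int, n: int) -> list[float]:
    mean  = 1.0 + 2e-4 * pe_cycles          # assumed drift of the distribution mean (V)
    sigma = 0.10 + 5e-5 * pe_cycles         # assumed widening of the distribution (V)
    return [random.gauss(mean, sigma) for _ in range(n)]

for cycles in (0, 5_000, 20_000):
    vths = sample_vth(cycles, n=10_000)
    mu = sum(vths) / len(vths)
    var = sum((v - mu) ** 2 for v in vths) / len(vths)
    print(f"{cycles:6d} P/E cycles: mean Vth ~ {mu:.3f} V, std ~ {var ** 0.5:.3f} V")
```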

290 citations


Cited by
Journal ArticleDOI


08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one: an odd beast when first encountered, an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Journal ArticleDOI
TL;DR: Focusing on using probabilistic metrics such as average values or variance to quantify design objectives such as performance and power will lead to a major change in SoC design methodologies.
Abstract: On-chip micronetworks, designed with a layered methodology, will meet the distinctive challenges of providing functionally correct, reliable operation of interacting system-on-chip components. A system on chip (SoC) can provide an integrated solution to challenging design problems in the telecommunications, multimedia, and consumer electronics domains. Much of the progress in these fields hinges on the designers' ability to conceive complex electronic engines under strong time-to-market pressure. Success will require using appropriate design and process technologies, as well as interconnecting existing components reliably in a plug-and-play fashion. Focusing on using probabilistic metrics such as average values or variance to quantify design objectives such as performance and power will lead to a major change in SoC design methodologies. Overall, these designs will be based on both deterministic and stochastic models. Creating complex SoCs requires a modular, component-based approach to both hardware and software design. Despite numerous challenges, the authors believe that developers will solve the problems of designing SoC networks. At the same time, they believe that a layered micronetwork design methodology will likely be the only path to mastering the complexity of future SoC designs.

3,852 citations

Journal ArticleDOI
10 Jun 2009
TL;DR: The current performance and future demands of interconnects to and on silicon chips are examined, and the requirements for optoelectronic and optical devices are projected if optics is to solve the major problems of interconnects for future high-performance silicon chips.
Abstract: We examine the current performance and future demands of interconnects to and on silicon chips. We compare electrical and optical interconnects and project the requirements for optoelectronic and optical devices if optics is to solve the major problems of interconnects for future high-performance silicon chips. Optics has potential benefits in interconnect density, energy, and timing. The necessity of low interconnect energy imposes low limits especially on the energy of the optical output devices, with a ~ 10 fJ/bit device energy target emerging. Some optical modulators and radical laser approaches may meet this requirement. Low (e.g., a few femtofarads or less) photodetector capacitance is important. Very compact wavelength splitters are essential for connecting the information to fibers. Dense waveguides are necessary on-chip or on boards for guided wave optical approaches, especially if very high clock rates or dense wavelength-division multiplexing (WDM) is to be avoided. Free-space optics potentially can handle the necessary bandwidths even without fast clocks or WDM. With such technology, however, optics may enable the continued scaling of interconnect capacity required by future chips.
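As a back-of-envelope companion to the energy and capacitance targets quoted above, the snippet below evaluates the usual C*V^2 switching-energy estimate for a few hypothetical receiver capacitances and voltage swings; the specific values are assumptions, not numbers from the paper.

```python
# Back-of-envelope C*V^2 estimate of the electrical energy to charge a receiver
# node, to put a ~10 fJ/bit target in context. All values are assumptions.

for cap_fF in (1, 5, 10):            # photodetector/receiver node capacitance (fF)
    for swing_V in (0.5, 1.0):       # required voltage swing at that node (V)
        energy_fJ = (cap_fF * 1e-15) * swing_V ** 2 / 1e-15   # E ~ C * V^2
        print(f"C = {cap_fF:2d} fF, V = {swing_V:.1f} V  ->  ~{energy_fJ:4.1f} fJ")
```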

1,959 citations

Journal ArticleDOI
TL;DR: The research shows that NoC constitutes a unification of current trends of intrachip communication rather than an explicit new alternative.
Abstract: The scaling of microchip technologies has enabled large scale systems-on-chip (SoC). Network-on-chip (NoC) research addresses global communication in SoC, involving (i) a move from computation-centric to communication-centric design and (ii) the implementation of scalable communication structures. This survey presents a perspective on existing NoC research. We define the following abstractions: system, network adapter, network, and link to explain and structure the fundamental concepts. First, research relating to the actual network design is reviewed. Then system level design and modeling are discussed. We also evaluate performance analysis techniques. The research shows that NoC constitutes a unification of current trends of intrachip communication rather than an explicit new alternative.

1,720 citations

Proceedings ArticleDOI
23 Jun 2002
TL;DR: An end-to-end model is described and validated that enables us to compute the soft error rates (SER) for existing and future microprocessor-style designs and predicts that the SER per chip of logic circuits will increase nine orders of magnitude from 1992 to 2011 and at that point will be comparable to the SER per chip of unprotected memory elements.
Abstract: This paper examines the effect of technology scaling and microarchitectural trends on the rate of soft errors in CMOS memory and logic circuits. We describe and validate an end-to-end model that enables us to compute the soft error rates (SER) for existing and future microprocessor-style designs. The model captures the effects of two important masking phenomena, electrical masking and latching-window masking, which inhibit soft errors in combinational logic. We quantify the SER due to high-energy neutrons in SRAM cells, latches, and logic circuits for feature sizes from 600 nm to 50 nm and clock periods from 16 to 6 fan-out-of-4 inverter delays. Our model predicts that the SER per chip of logic circuits will increase nine orders of magnitude from 1992 to 2011 and at that point will be comparable to the SER per chip of unprotected memory elements. Our result emphasizes that computer system designers must address the risks of soft errors in logic circuits for future designs.
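A quick arithmetic restatement of the headline figure: nine orders of magnitude of growth over the 19 years from 1992 to 2011 corresponds to roughly a threefold increase per year, as the short calculation below shows.

```python
# Restating the abstract's headline number: nine orders of magnitude of logic-SER
# growth between 1992 and 2011 works out to roughly 3x per year (10**(9/19) ~ 2.97).

years = 2011 - 1992
total_growth = 10 ** 9
per_year = total_growth ** (1 / years)
print(f"{years} years, {total_growth:.0e} total  ->  ~{per_year:.2f}x per year")
```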

1,506 citations