Author

Gaurav Mittal

Bio: Gaurav Mittal is a researcher at IBM who has contributed to research on circuit design and POWER6. The author has an h-index of 3 and has co-authored 6 publications that have received 222 citations.

Topics: Circuit design, POWER6, Microprocessor, Integrated circuit, Batch file

Papers

Proceedings Article•DOI•

Design of the Power6 Microprocessor

Joshua Friedrich¹, Bradley McCredie¹, Norman Karl James¹, B. Huott¹, Brian W. Curran¹, Eric Fluhr¹, Gaurav Mittal¹, E. Chan¹, Y.H. Chan¹, Donald W. Plass¹, Sam Gat-Shang Chu¹, Hung Le¹, L. Clark¹, J. Ripley¹, Scott A. Taylor¹, Jack DiLullo¹, M. Lanzerotti¹

IBM¹

18 Jun 2007

TL;DR: The POWER6™ microprocessor combines ultra-high frequency operation, aggressive power reduction, a highly scalable memory subsystem, and mainframe-like reliability, availability, and serviceability.

Abstract: The POWER6™ microprocessor combines ultra-high frequency operation, aggressive power reduction, a highly scalable memory subsystem, and mainframe-like reliability, availability, and serviceability. The 341 mm² 700M transistor dual-core microprocessor is fabricated in a 65 nm SOI process with 10 levels of low-k copper interconnect. It operates at clock frequencies over 5 GHz in high-performance applications, and consumes under 100 W in power-sensitive applications.

120 citations

Journal Article•DOI•

POWER7™, a Highly Parallel, Scalable Multi-Core High End Server Processor

Dieter Wendel¹, Ronald Nick Kalla¹, James D. Warnock¹, Robert Alan Cargnoni¹, Sam Gat-Shang Chu¹, Joachim Gerhard Clabes¹, Daniel M. Dreps¹, David A. Hrusecky¹, Joshua Friedrich¹, Saiful Islam¹, J. Kahle¹, Jentje Leenstra¹, Gaurav Mittal¹, Jose Angel Paredes¹, J. Pille¹, Phillip J. Restle¹, Balaram Sinharoy¹, G Smith¹, William J. Starke¹, Scott A. Taylor¹, J. A. Van Norstrand¹, S. Weitzel¹, Phillip G. Williams¹, Victor Zyuban¹

IBM¹

01 Jan 2011-IEEE Journal of Solid-state Circuits

TL;DR: The organization of the design and the features of the processor core are described, before moving on to discuss the circuits used for analog elements, clock generation and distribution, and I/O designs, including special features for test, debug, and chip frequency tuning.

Abstract: This paper gives an overview of the latest member of the POWER™ processor family, POWER7™. Eight quad-threaded cores, operating at frequencies up to 4.14 GHz, are integrated together with two memory controllers and high speed system links on a 567 mm² die, employing 1.2B transistors in a 45 nm CMOS SOI technology with 11 layers of low-k copper wiring. The technology features deep trench capacitors which are used to build a 32 MB embedded DRAM L3 based on a 0.067 µm² DRAM cell. The functionally equivalent chip transistor count would have been over 2.7B if the L3 had been implemented with a conventional 6 transistor SRAM cell. (A detailed paper about the eDRAM implementation will be given in a separate paper of this Journal). Deep trench capacitors are also used to reduce on-chip voltage island supply noise. This paper describes the organization of the design and the features of the processor core, before moving on to discuss the circuits used for analog elements, clock generation and distribution, and I/O designs. The final section describes the details of the clocked storage elements, including special features for test, debug, and chip frequency tuning.
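
The equivalent-transistor figure in the abstract above can be roughly sanity-checked. The following is a minimal sketch, assuming one transistor per eDRAM bit and six per SRAM bit, and counting only the 32 MB of raw data bits; tags, ECC, and redundancy are not modeled. The function name and its parameters are illustrative and do not come from the paper.

```python
def sram_equivalent_transistors(total_transistors, l3_bytes,
                                edram_t_per_bit=1, sram_t_per_bit=6):
    """Estimate the chip transistor count if the eDRAM L3 were built from SRAM.

    Assumption: every L3 data bit costs edram_t_per_bit transistors today and
    would cost sram_t_per_bit transistors in SRAM. Overhead bits are ignored.
    """
    bits = l3_bytes * 8
    return total_transistors + bits * (sram_t_per_bit - edram_t_per_bit)


estimate = sram_equivalent_transistors(
    total_transistors=1_200_000_000,  # 1.2B transistors, from the abstract
    l3_bytes=32 * 1024 * 1024,        # 32 MB L3, from the abstract
)
print(f"{estimate / 1e9:.2f}B")  # 2.54B
```

Under these assumptions the estimate comes to about 2.54B, somewhat below the "over 2.7B" quoted in the abstract; the difference presumably comes from bits that this sketch leaves out.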

53 citations

Journal Article•DOI•

Design and Implementation of the POWER6 Microprocessor

Benjamin Stolt, Yonatan Mittlefehldt, Sanjay Dubey, Gaurav Mittal, Mike Lee, Joshua Friedrich, Eric Fluhr

28 Jan 2008-IEEE Journal of Solid-state Circuits

TL;DR: Some of the circuit methodology and implementation innovations used in the development of POWER6, with particular emphasis on custom, synthesized, register file and SRAM design, as well as the electrical characterizations performed in the lab are described.

Abstract: The IBM POWER6 processor is a dual-core, 341 mm², 790 million transistor chip fabricated using IBM's 65 nm partially-depleted SOI process. Capable of running at frequencies up to 5 GHz in high performance applications, it can also operate under 100 W for power-sensitive applications. Traditional power-intensive and deep-pipelining techniques used in high frequency design were abandoned in favor of more power efficient circuit design methodologies. The complexity and size of POWER6, together with its high operating frequency, presented a number of significant challenges for its multi-site team to complete the design on an aggressive schedule. This paper describes some of the circuit methodology and implementation innovations used in the development of POWER6, with particular emphasis on custom, synthesized, register file and SRAM design, as well as the electrical characterizations performed in the lab.

44 citations

Patent•

System and method for text based placement engine for custom circuit design

Sanjay Dubey¹, Gaurav Mittal¹

IBM¹

19 Apr 2005

TL;DR: A system and method that uses a text-based script file to capture a circuit design and lets the circuit designer manipulate that file: the designer can add, delete, or move components using tags and commands stored in the script file.

Abstract: A system and method that uses a text-based script file to capture a circuit design and allows a circuit designer to manipulate the script file. The circuit designer can add, delete, or move components using various tags and commands that are stored in the script file. When the design is complete, or ready to be tested, the script file is processed creating a layout representation file that is readable by a graphics-based circuit design tool.

3 citations

Patent•

Methods for generating a contributor-based power abstract for a device

Nagashyamala R. Dhanwada¹, William W. Dungan¹, David J. Hathaway¹, Arun Joseph¹, Gaurav Mittal¹, Ricardo H. Nigaglioni¹

IBM¹

05 Jan 2017

TL;DR: A clock power component and a switching characteristic are identified for each of a plurality of clock gating domains, and the switching characteristics of all domains are combined into a domain combination list.

Abstract: Generating a contributor-based power abstract for a device, including: identifying a clock power component for each of a plurality of clock gating domains, identifying a switching characteristic for each of the clock gating domains, combining the switching characteristics for all of the clock gating domains into a domain combination list, performing a per-case simulation based at least on the domain combination list, calculating an effective capacitance for each of the clock gating domains based at least on the per-case simulation, and generating a power abstract for each of the clock gating domains based at least on the effective capacitance.

2 citations
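
The h-index of 3 and the total of 222 citations stated in the bio can be reproduced from the five citation counts listed above. The following is a minimal sketch; the sixth publication is not shown on this page, so its count of 0 is an assumption (it is consistent with the stated total), and the function name is illustrative.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Citation counts of the five papers listed on this page.
listed = [120, 53, 44, 3, 2]
# Assumption: the sixth, unlisted publication has no citations.
assumed_unlisted = [0]
counts = listed + assumed_unlisted

print(h_index(counts))  # 3
print(sum(counts))      # 222
```

The fourth-ranked paper has only 3 citations, which is fewer than its rank of 4, so the index stops at 3.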

Cited by

TILE64 - Processor: A 64-Core SoC with Mesh Interconnect

Shane L. Bell, Bruce S. Edwards, John Amann, Rich Conlin, Kevin Joyce, Vince Leung, John MacKay, Mike Reif, Liewei Bao, J.F. Brown, Matthew Mattina, Chyi-Chang Miao, Carl Ramey, David Wentzlaff, Walker Anderson, Ethan Berger, Nat Fairbanks, Durlov Khan, Froilan Montenegro, Jay Stickney, John Zook

01 Jan 2010

TL;DR: The TILE64™ processor is a multicore SoC targeting the high-performance demands of a wide range of embedded applications across networking and digital multimedia, with 64 tile processors arranged in an 8x8 array.

Abstract: The TILE64™ processor is a multicore SoC targeting the high-performance demands of a wide range of embedded applications across networking and digital multimedia applications. A figure shows a block diagram with 64 tile processors arranged in an 8x8 array. These tiles connect through a scalable 2D mesh network with high-speed I/Os on the periphery. Each general-purpose processor is identical and capable of running SMP Linux.

634 citations

Proceedings Article•DOI•

TILE64 - Processor: A 64-Core SoC with Mesh Interconnect

Shane L. Bell, Bruce S. Edwards, John Amann, Richard Conlin, Kevin Joyce, V. Leung, J. MacKay, M. Reif, Liewei Bao, J.F. Brown, Matthew Mattina, Chyi-Chang Miao, Carl Ramey, David Wentzlaff, W. Anderson, E. Berger, N. Fairbanks, D. Khan, F. Montenegro, J. Stickney, J. Zook

01 Feb 2008

TL;DR: The TILE64™ processor is a multicore SoC targeting the high-performance demands of a wide range of embedded applications across networking and digital multimedia applications.

587 citations

Proceedings Article•DOI•

Process Variation Tolerant 3T1D-Based Cache Architectures

Xiaoyao Liang¹, Ramon Canal¹, Gu-Yeon Wei¹, David Brooks²

Harvard University¹, Polytechnic University of Catalonia²

01 Dec 2007

TL;DR: A range of cache refresh and placement schemes that are sensitive to retention time are proposed, and it is shown that most of the retention time variations can be masked by the microarchitecture when using these schemes.

Abstract: Process variations will greatly impact the stability, leakage power consumption, and performance of future microprocessors. These variations are especially detrimental to 6T SRAM (6-transistor static memory) structures and will become critical with continued technology scaling. In this paper, we propose new on-chip memory architectures based on novel 3T1D DRAM (3-transistor, 1-diode dynamic memory) cells. We provide a detailed comparison between 6T and 3T1D designs in the context of an L1 data cache. The effects of physical device variation on a 3T1D cache can be lumped into variation of data retention times. This paper proposes a range of cache refresh and placement schemes that are sensitive to retention time, and we show that most of the retention time variations can be masked by the microarchitecture when using these schemes. We have performed detailed circuit and architectural simulations assuming different degrees of variability in advanced technology nodes, and we show that the resulting memory architecture can tolerate large process variations with little or even no impact on performance when compared to ideal 6T SRAM designs. Furthermore, these designs are robust to memory cell stability issues and can achieve large power savings. These advantages make the new memory architectures a promising choice for on-chip variation-tolerant cache structures required for next generation microprocessors.

359 citations

Journal Article•DOI•

Scale-out processors

Pejman Lotfi-Kamran¹, Boris Grot¹, Michael Ferdman², Stavros Volos¹, Onur Kocberber¹, Javier Picorel¹, Almutaz Adileh¹, Djordje Jevdjic¹, Sachin Satish Idgunji, Emre Ozer, Babak Falsafi¹

École Polytechnique Fédérale de Lausanne¹, Carnegie Mellon University²

09 Jun 2012

TL;DR: This work introduces a methodology for designing scalable and efficient scale-out server processors based on a metric of performance-density, and facilitates the design of optimal multi-core configurations, called pods.

Abstract: Scale-out datacenters mandate high per-server throughput to get the maximum benefit from the large TCO investment. Emerging applications (e.g., data serving and web search) that run in these datacenters operate on vast datasets that are not accommodated by on-die caches of existing server chips. Large caches reduce the die area available for cores and lower performance through long access latency when instructions are fetched. Performance on scale-out workloads is maximized through a modestly-sized last-level cache that captures the instruction footprint at the lowest possible access latency. In this work, we introduce a methodology for designing scalable and efficient scale-out server processors. Based on a metric of performance-density, we facilitate the design of optimal multi-core configurations, called pods. Each pod is a complete server that tightly couples a number of cores to a small last-level cache using a fast interconnect. Replicating the pod to fill the die area yields processors which have optimal performance density, leading to maximum per-chip throughput. Moreover, as each pod is a stand-alone server, scale-out processors avoid the expense of global (i.e., inter-pod) interconnect and coherence. These features synergistically maximize throughput, lower design complexity, and improve technology scalability. In 20nm technology, scale-out chips improve throughput by 5x-6.5x over conventional and by 1.6x-1.9x over emerging tiled organizations.

185 citations

Journal Article•DOI•

IBM POWER7 multicore server processor

Balaram Sinharoy¹, Ronald Nick Kalla¹, W. J. Starke¹, Hung Qui Le¹, Robert Alan Cargnoni¹, J. A. Van Norstrand¹, Bruce Joseph Ronchetti¹, Jeffrey A. Stuecheli¹, Jentje Leenstra¹, Guy Lynn Guthrie¹, Dung Quoc Nguyen¹, Bartholomew Blaner¹, Charles F. Marino¹, Eric E. Retter¹, Peter Williams¹

IBM¹

01 May 2011-IBM Journal of Research and Development

TL;DR: The processor core and caches of the POWER7 processor chip are significantly enhanced to boost the performance of both single-threaded response-time-oriented, as well as multithreaded, throughput-oriented applications.

...read moreread less

Abstract: The IBM POWER® processor is the dominant reduced instruction set computing microprocessor in the world today, with a rich history of implementation and innovation over the last 20 years. In this paper, we describe the key features of the POWER7® processor chip. On the chip is an eight-core processor, with each core capable of four-way simultaneous multithreaded operation. Fabricated in IBM's 45-nm silicon-on-insulator (SOI) technology with 11 levels of metal, the chip contains more than one billion transistors. The processor core and caches are significantly enhanced to boost the performance of both single-threaded response-time-oriented, as well as multithreaded, throughput-oriented applications. The memory subsystem contains three levels of on-chip cache, with SOI embedded dynamic random access memory (DRAM) devices used as the last level of cache. A new memory interface using buffered double-data-rate-three DRAM and improvements in reliability, availability, and serviceability are discussed.

167 citations
