Topic

Supercomputer

About: Supercomputer is a research topic. Over its lifetime, 9,990 publications have been published within this topic, receiving 150,873 citations. The topic is also known as high performance computing (or high-performance computing).

Papers

Journal Article

Quantum supremacy using a programmable superconducting processor

Frank Arute¹, Kunal Arya¹, Ryan Babbush¹, Dave Bacon¹, Joseph C. Bardin¹,², Rami Barends¹, Rupak Biswas³, Sergio Boixo¹, Fernando G. S. L. Brandão⁴,¹, David A. Buell¹, B. Burkett¹, Yu Chen¹, Zijun Chen¹, Ben Chiaro⁵, Roberto Collins¹, William Courtney¹, Andrew Dunsworth¹, Edward Farhi¹, Brooks Foxen⁵,¹, Austin G. Fowler¹, Craig Gidney¹, Marissa Giustina¹, R. Graff¹, Keith Guerin¹, Steve Habegger¹, Matthew P. Harrigan¹, Michael J. Hartmann⁶,¹, Alan Ho¹, Markus R. Hoffmann¹, Trent Huang¹, Travis S. Humble⁷, Sergei V. Isakov¹, Evan Jeffrey¹, Zhang Jiang¹, Dvir Kafri¹, Kostyantyn Kechedzhi¹, Julian Kelly¹, Paul V. Klimov¹, Sergey Knysh¹, Alexander N. Korotkov¹,⁸, Fedor Kostritsa¹, David Landhuis¹, Mike Lindmark¹, E. Lucero¹, Dmitry I. Lyakh⁷, Salvatore Mandrà³, Jarrod R. McClean¹, Matt McEwen⁵, Anthony Megrant¹, Xiao Mi¹, Kristel Michielsen⁹,¹⁰, Masoud Mohseni¹, Josh Mutus¹, Ofer Naaman¹, Matthew Neeley¹, Charles Neill¹, Murphy Yuezhen Niu¹, Eric Ostby¹, Andre Petukhov¹, John Platt¹, Chris Quintana¹, Eleanor Rieffel³, Pedram Roushan¹, Nicholas C. Rubin¹, Daniel Sank¹, Kevin J. Satzinger¹, Vadim Smelyanskiy¹, Kevin J. Sung¹¹,¹, Matthew D. Trevithick¹, Amit Vainsencher¹, Benjamin Villalonga¹²,¹, Theodore White¹, Z. Jamie Yao¹, Ping Yeh¹, Adam Zalcman¹, Hartmut Neven¹, John M. Martinis⁵,¹

Google¹, University of Massachusetts Amherst², Ames Research Center³, California Institute of Technology⁴, University of California, Santa Barbara⁵, University of Erlangen-Nuremberg⁶, Oak Ridge National Laboratory⁷, University of California, Riverside⁸, Forschungszentrum Jülich⁹, RWTH Aachen University¹⁰, University of Michigan¹¹, University of Illinois at Urbana–Champaign¹²

24 Oct 2019 - Nature

TL;DR: Quantum supremacy is demonstrated using a programmable superconducting processor known as Sycamore, taking approximately 200 seconds to sample one instance of a quantum circuit a million times, which would take a state-of-the-art supercomputer around ten thousand years to compute.

Abstract: The promise of quantum computers is that certain computational tasks might be executed exponentially faster on a quantum processor than on a classical processor [1]. A fundamental challenge is to build a high-fidelity processor capable of running quantum algorithms in an exponentially large computational space. Here we report the use of a processor with programmable superconducting qubits [2-7] to create quantum states on 53 qubits, corresponding to a computational state-space of dimension 2⁵³ (about 10¹⁶). Measurements from repeated experiments sample the resulting probability distribution, which we verify using classical simulations. Our Sycamore processor takes about 200 seconds to sample one instance of a quantum circuit a million times; our benchmarks currently indicate that the equivalent task for a state-of-the-art classical supercomputer would take approximately 10,000 years. This dramatic increase in speed compared to all known classical algorithms is an experimental realization of quantum supremacy [8-14] for this specific computational task, heralding a much-anticipated computing paradigm.
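
The figures quoted in this abstract can be sanity-checked with a few lines of arithmetic. The sketch below is not from the paper: it assumes a 365.25-day year and takes the abstract's "about 200 seconds" and "approximately 10,000 years" at face value. The function and constant names are illustrative only.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Assumptions (not from the paper): a 365.25-day year, and the abstract's
# approximate runtimes (200 seconds, 10,000 years) used as exact inputs.

QUBITS = 53
SYCAMORE_SECONDS = 200
CLASSICAL_YEARS = 10_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def state_space_dimension(qubits: int) -> int:
    """Number of basis states of an n-qubit register: 2**n."""
    return 2 ** qubits


def speedup(classical_years: float, quantum_seconds: float) -> float:
    """Ratio of the estimated classical runtime to the quantum runtime."""
    return classical_years * SECONDS_PER_YEAR / quantum_seconds


if __name__ == "__main__":
    dim = state_space_dimension(QUBITS)
    print(f"2^{QUBITS} = {dim} (about {dim:.1e})")
    ratio = speedup(CLASSICAL_YEARS, SYCAMORE_SECONDS)
    print(f"implied speedup: about {ratio:.1e}x")
```

Under these assumptions the state space has 9,007,199,254,740,992 basis states, consistent with "about 10¹⁶", and the quoted runtimes imply a speedup on the order of a billion for this one sampling task.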

2,527 citations

The Landscape of Parallel Computing Research: A View from Berkeley

Krste Asanovic, Ras Bodik, Bryan Catanzaro, Joseph Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Plishker, John Shalf, Samuel Williams, Katherine Yelick

18 Dec 2006

TL;DR: The parallel landscape is framed with seven questions, and the following are recommended to explore the design space rapidly: • The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems • The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.

Abstract: The recent switch to parallel microprocessors is a milestone in the history of computing. Industry has laid out a roadmap for multicore designs that preserves the programming paradigm of the past via binary compatibility and cache coherence. Conventional wisdom is now to double the number of cores on a chip with each silicon generation. A multidisciplinary group of Berkeley researchers met for nearly two years to discuss this change. Our view is that this evolutionary approach to parallel hardware and software may work from 2 or 8 processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized, just as returns fell with greater instruction-level parallelism. We believe that much can be learned by examining the success of parallelism at the extremes of the computing spectrum, namely embedded computing and high performance computing. This led us to frame the parallel landscape with seven questions, and to recommend the following:
• The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems.
• The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
• Instead of traditional benchmarks, use 13 “Dwarfs” to design and evaluate parallel programming models and architectures. (A dwarf is an algorithmic method that captures a pattern of computation and communication.)
• “Autotuners” should play a larger role than conventional compilers in translating parallel programs.
• To maximize programmer productivity, future programming models must be more human-centric than the conventional focus on hardware or applications.
• To be successful, programming models should be independent of the number of processors.
• To maximize application efficiency, programming models should support a wide range of data types and successful models of parallelism: task-level parallelism, word-level parallelism, and bit-level parallelism.
• Architects should not include features that significantly affect performance or energy if programmers cannot accurately measure their impact via performance counters and energy counters.
• Traditional operating systems will be deconstructed and operating system functionality will be orchestrated using libraries and virtual machines.
• To explore the design space rapidly, use system emulators based on Field Programmable Gate Arrays (FPGAs) that are highly scalable and low cost.
Since real world applications are naturally parallel and hardware is naturally parallel, what we need is a programming model, system software, and a supporting architecture that are naturally parallel. Researchers have the rare opportunity to re-invent these cornerstones of computing, provided they simplify the efficient programming of highly parallel systems.

2,262 citations

Proceedings Article

DaDianNao: A Machine-Learning Supercomputer

Yunji Chen¹, Luo Tao¹, Liu Shaoli¹, Zhang Shijin¹, Liqiang He², Jia Wang¹, Ling Li¹, Tianshi Chen¹, Zhiwei Xu¹, Ninghui Sun¹, Olivier Temam³

Chinese Academy of Sciences¹, Inner Mongolia University², French Institute for Research in Computer Science and Automation³

13 Dec 2014

TL;DR: This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.

Abstract: Many companies are deploying services, either for consumers or industry, which are largely based on machine-learning algorithms for sophisticated processing of large amounts of data. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be both computationally and memory intensive. A number of neural network accelerators have been recently proposed which can offer high computational capacity/area ratio, but which remain hampered by memory accesses. However, unlike the memory wall faced by processors on general-purpose workloads, the CNNs and DNNs memory footprint, while large, is not beyond the capability of the on chip storage of a multi-chip system. This property, combined with the CNN/DNN algorithmic characteristics, can lead to high internal bandwidth and low external communications, which can in turn enable high-degree parallelism at a reasonable area cost. In this article, we introduce a custom multi-chip machine-learning architecture along those lines. We show that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system. We implement the node down to the place and route at 28nm, containing a combination of custom storage and computational units, with industry-grade interconnects.

1,486 citations

Proceedings Article

GPFS: A Shared-Disk File System for Large Computing Clusters

Frank B. Schmuck¹, Roger L. Haskin¹

IBM¹

28 Jan 2002

TL;DR: GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. The paper describes GPFS and discusses how distributed locking and recovery techniques were extended to scale to large clusters.

Abstract: GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. GPFS is used on many of the largest supercomputers in the world. GPFS was built on many of the ideas that were developed in the academic community over the last several years, particularly distributed locking and recovery technology. To date it has been a matter of conjecture how well these ideas scale. We have had the opportunity to test those limits in the context of a product that runs on the largest systems in existence. While in many cases existing ideas scaled well, new approaches were necessary in many key areas. This paper describes GPFS, and discusses how distributed locking and recovery techniques were extended to scale to large clusters.

1,434 citations

Book

Fat-trees: universal networks for hardware-efficient supercomputing

Charles E. Leiserson

01 Jun 1994

TL;DR: The author presents a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer, and proves that a fat-tree of a given size is nearly the best routing network of that size.

Abstract: The author presents a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer. A fat-tree routing network is parameterized not only in the number of processors, but also in the amount of simultaneous communication it can support. Since communication can be scaled independently from the number of processors, substantial hardware can be saved for such applications as finite-element analysis without resorting to a special-purpose architecture. It is proved that a fat-tree of a given size is nearly the best routing network of that size. This universality theorem is established using a three-dimensional VLSI model that incorporates wiring as a direct cost. In this model, hardware size is measured as physical volume. It is proved that for any given amount of communications hardware, a fat-tree built from that amount of hardware can simulate every other network built from the same amount of hardware, using only slightly more time (a polylogarithmic factor greater).
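
This abstract gives no construction, so the following is only a toy sketch of why a tree network benefits from channels that grow "fatter" toward the root. It assumes a complete binary tree with processors at the leaves and messages routed through the lowest common ancestor of source and destination. It does not reproduce the book's capacity rule or its VLSI model, and all function names are hypothetical.

```python
from collections import Counter

# Toy model (an assumption, not the book's construction): a complete binary
# tree whose leaves are processors 0..n-1. A message climbs from its source
# leaf to the lowest common ancestor, then descends to the destination leaf.


def lca_height(src: int, dst: int) -> int:
    """Height of the lowest common ancestor of two leaves (leaves are at height 0)."""
    return (src ^ dst).bit_length()


def channel_loads(messages):
    """Count how many messages cross each channel.

    A channel is keyed by (direction, height, index): the link between node
    `index` at `height` and its parent, used in direction "up" or "down".
    """
    loads = Counter()
    for src, dst in messages:
        top = lca_height(src, dst)
        for h in range(top):
            loads[("up", h, src >> h)] += 1    # climbing from the source
            loads[("down", h, dst >> h)] += 1  # descending to the destination
    return loads


def peak_load_per_height(loads):
    """Largest load seen on any single channel at each height."""
    peaks = {}
    for (_direction, height, _index), count in loads.items():
        peaks[height] = max(peaks.get(height, 0), count)
    return dict(sorted(peaks.items()))


if __name__ == "__main__":
    n = 8
    # Every processor sends one message to the processor half the machine away,
    # so every message has to pass through the root.
    messages = [(i, (i + n // 2) % n) for i in range(n)]
    print(peak_load_per_height(channel_loads(messages)))
```

For this traffic pattern the busiest channel carries 1 message at the leaves, 2 one level up, and 4 just below the root. That illustrates, in this simplified model, why communication capacity is a parameter worth scaling separately from the processor count.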

1,227 citations

Metrics

Papers: 11,011

Citations: 165,696

No. of papers in the topic in previous years
Year	Papers
2023	304
2022	708
2021	328
2020	406
2019	459
2018	545
