Author

Ras Bodik

Bio: Ras Bodik is an academic researcher from the University of California, Berkeley. The author has contributed to research topics including Legacy system & Database theory, has an h-index of 5, and has co-authored 7 publications receiving 2,333 citations.

Papers
18 Dec 2006
TL;DR: The parallel landscape is framed with seven questions, and the recommendations include making it easy to write programs that execute efficiently on highly parallel computing systems and targeting 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
Abstract: Author(s): Asanovic, K; Bodik, R; Catanzaro, B; Gebis, J; Husbands, P; Keutzer, K; Patterson, D; Plishker, W; Shalf, J; Williams, SW | Abstract: The recent switch to parallel microprocessors is a milestone in the history of computing. Industry has laid out a roadmap for multicore designs that preserves the programming paradigm of the past via binary compatibility and cache coherence. Conventional wisdom is now to double the number of cores on a chip with each silicon generation. A multidisciplinary group of Berkeley researchers met nearly two years to discuss this change. Our view is that this evolutionary approach to parallel hardware and software may work from 2 or 8 processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized, just as returns fell with greater instruction-level parallelism. We believe that much can be learned by examining the success of parallelism at the extremes of the computing spectrum, namely embedded computing and high performance computing. This led us to frame the parallel landscape with seven questions, and to recommend the following:
• The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems.
• The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
• Instead of traditional benchmarks, use 13 “Dwarfs” to design and evaluate parallel programming models and architectures. (A dwarf is an algorithmic method that captures a pattern of computation and communication.)
• “Autotuners” should play a larger role than conventional compilers in translating parallel programs.
• To maximize programmer productivity, future programming models must be more human-centric than the conventional focus on hardware or applications.
• To be successful, programming models should be independent of the number of processors.
• To maximize application efficiency, programming models should support a wide range of data types and successful models of parallelism: task-level parallelism, word-level parallelism, and bit-level parallelism.
• Architects should not include features that significantly affect performance or energy if programmers cannot accurately measure their impact via performance counters and energy counters.
• Traditional operating systems will be deconstructed and operating system functionality will be orchestrated using libraries and virtual machines.
• To explore the design space rapidly, use system emulators based on Field Programmable Gate Arrays (FPGAs) that are highly scalable and low cost.
Since real world applications are naturally parallel and hardware is naturally parallel, what we need is a programming model, system software, and a supporting architecture that are naturally parallel. Researchers have the rare opportunity to re-invent these cornerstones of computing, provided they simplify the efficient programming of highly parallel systems.
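To make the "dwarf" idea above concrete, here is a minimal, hedged sketch of one such pattern, the structured-grid dwarf. The toy Jacobi relaxation below is our own illustration under simple assumptions, not code from the report: the computation at every point is uniform, and the communication pattern is nearest-neighbor, which is what makes this pattern a useful archetype for evaluating parallel hardware and programming models.

```python
# Illustrative structured-grid "dwarf": a 1-D three-point stencil.
# Each interior point is updated from its two neighbors (nearest-neighbor
# communication), and the same operation is applied everywhere (uniform compute).
import numpy as np

def jacobi_step(u):
    """One sweep of a 1-D stencil that averages each point's two neighbors."""
    v = u.copy()
    v[1:-1] = 0.5 * (u[:-2] + u[2:])   # interior points read only their neighbors
    return v

u = np.zeros(64)
u[0], u[-1] = 1.0, 0.0                 # fixed boundary values
for _ in range(200):
    u = jacobi_step(u)
print(u[:5])                           # interior values relax toward a linear profile
```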

2,262 citations

01 Jan 2008
TL;DR: This report is based on a proposal for creating a Universal Parallel Computing Research Center (UPCRC) that a technical committee from Intel and Microsoft unanimously selected as the top proposal in a competition with the top 25 computer science departments.
Abstract: In December 2006, we published a broad survey of the issues for the whole field concerning the multicore/manycore sea change (see view.eecs.berkeley.edu). (Asanovic, Bodik et al. 2006) We view the ultimate goal as the ability to create efficient and correct software productively that scales smoothly as the number of cores per chip doubles biennially. This much shorter report covers the specific research agenda that a large group of us at Berkeley is going to follow. This report is based on a proposal for creating a Universal Parallel Computing Research Center (UPCRC) that a technical committee from Intel and Microsoft unanimously selected as the top proposal in a competition with the top 25 computer science departments. The five-year, $10M UPCRC forms the foundation for the U.C. Berkeley Parallel Computing Laboratory, or Par Lab, a multidisciplinary research project exploring the future of parallel processing (see parlab.eecs.berkeley.edu). To take a fresh approach to the longstanding parallel computing problem, our research agenda will be driven by compelling applications developed by domain experts. Historically, past efforts to resolve these challenges have often been driven "bottom-up" from the hardware, with applications an afterthought. We will focus on exciting new applications that need much more computing horsepower to run well, rather than on legacy programs that already run well on today's computers. Our applications are in the areas of personal health, image retrieval, music, speech understanding, and web browsers. The development of parallel software is the heart of our research agenda. The task will be divided into two layers: an efficiency layer that aims at low overhead for 10 percent of the best programmers, and a productivity layer for the rest of the programming community--including domain experts--that reuses the parallel software developed at the efficiency layer. Key to this approach is a layer of libraries and programming frameworks centered on the 13 computational bottlenecks ("motifs") that we identified in the original Berkeley View report. (Asanovic, Bodik et al. 2006) We will also create a Composition and Coordination Language to make it easier to compose these components. Finally, we will rely on autotuning to map the software efficiently to a particular parallel computer. Past attempts have often relied on a single programming abstraction and language for all programmers and on automatically parallelizing compilers. The role of the operating system and the architecture in this project is to support software and applications in achieving the ultimate goal, rather than the conventional approach of fixing the environment in which parallel software must survive. Example innovations include very thin hypervisors, which allow user-level control of processor scheduling, and hardware support for partitioning and fast barrier synchronization. We will prototype the hardware of the future using field-programmable gate arrays (FPGAs), which we believe are fast enough to be interesting to parallel software researchers, yet flexible enough to "tape out" new designs every day, while being cheap enough that university researchers can afford to construct systems containing hundreds of processors. This prototyping infrastructure is called RAMP (Research Accelerator for Multiple Processors), and is being developed by a consortium of universities and companies (see ramp.eecs.berkeley.edu).
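The report above leans on autotuning to map software efficiently onto a particular parallel machine. The sketch below is an illustration of that idea only, not the Par Lab autotuners: generate candidate implementations of a kernel that differ in one tuning parameter (here, a block size, which is our own toy choice), time each on the target machine, and keep the fastest.

```python
# Minimal autotuning sketch: empirically pick the best block size for a kernel
# on the machine it will actually run on, instead of choosing it statically.
import time
import numpy as np

def blocked_sum(a, block):
    """Sum an array in chunks of `block` elements (the tunable parameter)."""
    total = 0.0
    for start in range(0, a.size, block):
        total += a[start:start + block].sum()
    return total

a = np.random.default_rng(0).random(2_000_000)
candidates = [1_000, 10_000, 100_000, 500_000]

timings = {}
for block in candidates:
    t0 = time.perf_counter()
    blocked_sum(a, block)              # a real autotuner would average several runs
    timings[block] = time.perf_counter() - t0

best = min(timings, key=timings.get)
print("best block size on this machine:", best)
```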

59 citations

Proceedings ArticleDOI
05 Oct 2014
TL;DR: This work presents Programming by Manipulation, a new programming methodology for specifying the layout of data visualizations, targeted at non-programmers, and suggests that the tool is five times more productive than direct programming with constraints.
Abstract: We present Programming by Manipulation, a new programming methodology for specifying the layout of data visualizations, targeted at non-programmers. We address the two central sources of bugs that arise when programming with constraints: ambiguities and conflicts (inconsistencies). We rule out conflicts by design and exploit ambiguity to explore possible layout designs. Our users design layouts by highlighting undesirable aspects of a current design, effectively breaking spurious constraints and introducing ambiguity by giving some elements freedom to move or resize. Subsequently, the tool indicates how the ambiguity can be removed, by computing how the free elements can be fixed with available constraints. To support this workflow, our tool computes the ambiguity and summarizes it visually. We evaluate our work with two user-studies demonstrating that both non-programmers and programmers can effectively use our prototype. Our results suggest that our tool is five times more productive than direct programming with constraints.
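To make the ambiguity idea concrete, here is a hedged sketch under our own simplifying assumptions, not the paper's tool: a 1-D layout is modeled as linear constraints over element positions, and the null space of the constraint matrix identifies which elements become free to move once the user breaks a constraint.

```python
# Toy constraint-based layout: three boxes of width 10 placed left to right.
# Variables x0, x1, x2 are the boxes' left edges; each row of A is one
# linear constraint (A @ x = b). The null space of A is the "ambiguity":
# directions in which the layout can move without violating any constraint.
import numpy as np

A = np.array([[1, 0, 0],     # x0 = 0
              [-1, 1, 0],    # x1 = x0 + 10
              [0, -1, 1]],   # x2 = x1 + 10
             dtype=float)
b = np.array([0.0, 10.0, 10.0])

def free_directions(A):
    """Null-space basis of the constraint matrix: the ambiguous movements."""
    _, s, vt = np.linalg.svd(A)
    rank = int((s > 1e-9).sum())
    return vt[rank:]

print("fully constrained:", free_directions(A))      # empty -> layout is unique
A_broken = A[:-1]                                     # user breaks the last constraint
print("after breaking one constraint:", free_directions(A_broken))  # one free direction
```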

33 citations

Journal ArticleDOI
TL;DR: A college education has two goals: to produce intellectually mature, sophisticated leaders who can think deeply and productively in a range of fields and contexts and to provide students with skills that they can apply successfully throughout a long career in their chosen profession or professions.
Abstract: A college education has two goals. First, to produce intellectually mature, sophisticated leaders who can think deeply and productively in a range of fields and contexts. Second, to provide students with skills that they can apply successfully throughout a long career in their chosen profession or professions. Programming language concepts and ways of thinking can be a critically important part of the successful education of virtually any college student, regardless of discipline. For computer scientists these concepts and ways of thinking are indispensable.

8 citations

01 Jan 2013
TL;DR: This paper presents Quicksilver, a programming-by-demonstration solution that derives queries from user inputs and is designed to be easy and intuitive for users who are not familiar with database theory.
Abstract: Relational data has become so widespread that even end-users such as secretaries and teachers frequently interact with it. However, finding the right query to retrieve the necessary data from complex databases can be very difficult for end-users. Many of these users can be seen voicing their confusion on several Excel help forums, receiving little help and waiting days for responses from experts. In this paper, we present Quicksilver, a programming-by-demonstration solution that derives queries from user inputs. It is designed to be easy and intuitive for users who are not familiar with database theory. We present Quicksilver’s interface designs and synthesis algorithms. We conclude with a user study designed to evaluate Quicksilver’s performance.
Index Terms: Programming-by-demonstration, Program Synthesis, Databases
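As an illustration of programming-by-demonstration for queries, the following toy sketch (hypothetical code, not Quicksilver's algorithm; the table and column names are made up) enumerates simple equality predicates and keeps those consistent with the rows the user demonstrated as the desired output.

```python
# Toy query synthesis by demonstration: the user highlights the rows they want,
# and we search for a filter that reproduces exactly that selection.
rows = [
    {"name": "Alice", "dept": "Sales", "year": 2012},
    {"name": "Bob",   "dept": "IT",    "year": 2013},
    {"name": "Carol", "dept": "Sales", "year": 2013},
]
selected = [rows[0], rows[2]]   # the demonstrated output rows

def consistent_predicates(rows, selected):
    """Return (column, value) equality filters that reproduce the selection."""
    preds = []
    for col in rows[0]:
        for val in {r[col] for r in rows}:
            if [r for r in rows if r[col] == val] == selected:
                preds.append((col, val))
    return preds

print(consistent_predicates(rows, selected))   # -> [('dept', 'Sales')]
```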

6 citations


Cited by
Proceedings ArticleDOI
04 Oct 2009
TL;DR: This characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.
Abstract: This paper presents and characterizes Rodinia, a benchmark suite for heterogeneous computing. To help architects study emerging platforms such as GPUs (Graphics Processing Units), Rodinia includes applications and kernels which target multi-core CPU and GPU platforms. The choice of applications is inspired by Berkeley's dwarf taxonomy. Our characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.

2,697 citations

18 Dec 2006
TL;DR: The parallel landscape is framed with seven questions, and the recommendations include making it easy to write programs that execute efficiently on highly parallel computing systems and targeting 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
Abstract: Author(s): Asanovic, K; Bodik, R; Catanzaro, B; Gebis, J; Husbands, P; Keutzer, K; Patterson, D; Plishker, W; Shalf, J; Williams, SW | Abstract: The recent switch to parallel microprocessors is a milestone in the history of computing. Industry has laid out a roadmap for multicore designs that preserves the programming paradigm of the past via binary compatibility and cache coherence. Conventional wisdom is now to double the number of cores on a chip with each silicon generation. A multidisciplinary group of Berkeley researchers met nearly two years to discuss this change. Our view is that this evolutionary approach to parallel hardware and software may work from 2 or 8 processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized, just as returns fell with greater instruction-level parallelism. We believe that much can be learned by examining the success of parallelism at the extremes of the computing spectrum, namely embedded computing and high performance computing. This led us to frame the parallel landscape with seven questions, and to recommend the following:
• The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems.
• The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
• Instead of traditional benchmarks, use 13 “Dwarfs” to design and evaluate parallel programming models and architectures. (A dwarf is an algorithmic method that captures a pattern of computation and communication.)
• “Autotuners” should play a larger role than conventional compilers in translating parallel programs.
• To maximize programmer productivity, future programming models must be more human-centric than the conventional focus on hardware or applications.
• To be successful, programming models should be independent of the number of processors.
• To maximize application efficiency, programming models should support a wide range of data types and successful models of parallelism: task-level parallelism, word-level parallelism, and bit-level parallelism.
• Architects should not include features that significantly affect performance or energy if programmers cannot accurately measure their impact via performance counters and energy counters.
• Traditional operating systems will be deconstructed and operating system functionality will be orchestrated using libraries and virtual machines.
• To explore the design space rapidly, use system emulators based on Field Programmable Gate Arrays (FPGAs) that are highly scalable and low cost.
Since real world applications are naturally parallel and hardware is naturally parallel, what we need is a programming model, system software, and a supporting architecture that are naturally parallel. Researchers have the rare opportunity to re-invent these cornerstones of computing, provided they simplify the efficient programming of highly parallel systems.

2,262 citations

Journal ArticleDOI
TL;DR: The Roofline model offers insight into how to improve the performance of software and hardware in the rapidly changing world of connected devices.
Abstract: The Roofline model offers insight on how to improve the performance of software and hardware.
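For readers unfamiliar with the model, the sketch below encodes our reading of its core formula (an illustration, not code from the paper): attainable performance for a kernel is capped either by the machine's peak compute rate or by its memory bandwidth times the kernel's arithmetic intensity, whichever is lower.

```python
# Roofline model sketch: attainable GFLOP/s = min(peak compute, bandwidth * intensity).
def roofline_gflops(peak_gflops, peak_bw_gb_s, intensity_flops_per_byte):
    """Attainable GFLOP/s for a kernel with the given arithmetic intensity."""
    return min(peak_gflops, peak_bw_gb_s * intensity_flops_per_byte)

# Hypothetical machine: 500 GFLOP/s peak compute, 50 GB/s memory bandwidth.
for ai in (0.25, 1.0, 4.0, 16.0):
    print(f"AI={ai:5.2f} flops/byte -> {roofline_gflops(500, 50, ai):6.1f} GFLOP/s")
# Low-intensity kernels sit under the bandwidth "roof"; high-intensity kernels
# hit the compute "roof", which tells you which resource to optimize for.
```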

2,181 citations

Proceedings Article
12 Dec 2011
TL;DR: In this paper, the authors present an update scheme called HOGWILD!, which allows processors access to shared memory with the possibility of overwriting each other's work, and which achieves a nearly optimal rate of convergence when the underlying optimization problem is sparse.
Abstract: Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implemented without any locking. We present an update scheme called HOGWILD! which allows processors access to shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, then HOGWILD! achieves a nearly optimal rate of convergence. We demonstrate experimentally that HOGWILD! outperforms alternative schemes that use locking by an order of magnitude.
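The following minimal sketch illustrates the lock-free update structure described above; it is our own toy example of the idea (a synthetic sparse least-squares problem with made-up sizes), not the authors' implementation. Several threads update a shared weight vector with no locks; because each sample touches only a few coordinates, collisions are rare. Note that CPython's global interpreter lock serializes much of this loop, so the sketch shows the update discipline rather than real speedup.

```python
# HOGWILD!-style lock-free SGD sketch on a sparse synthetic least-squares problem.
import threading
import numpy as np

rng = np.random.default_rng(0)
n_features, n_samples = 1000, 5000

# Each sample touches only ~5 random coordinates (sparsity is what makes
# unsynchronized updates mostly collision-free).
idx = [rng.choice(n_features, size=5, replace=False) for _ in range(n_samples)]
X_vals = [rng.normal(size=5) for _ in range(n_samples)]
true_w = rng.normal(size=n_features)
y = np.array([v @ true_w[i] for v, i in zip(X_vals, idx)])

w = np.zeros(n_features)          # shared state, deliberately unprotected
lr = 0.05

def worker(sample_ids):
    for s in sample_ids:
        i, v = idx[s], X_vals[s]
        err = v @ w[i] - y[s]     # read possibly stale coordinates
        w[i] -= lr * err * v      # racy in-place update, no lock taken

threads = [threading.Thread(target=worker, args=(range(t, n_samples, 4),))
           for t in range(4)]
for t in threads: t.start()
for t in threads: t.join()

mse = np.mean([(X_vals[s] @ w[idx[s]] - y[s]) ** 2 for s in range(n_samples)])
print("mean squared error after racy training:", mse)
```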

1,939 citations

Posted Content
TL;DR: This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implemented without any locking, and presents an update scheme called HOGWILD! which allows processors access to shared memory with the possibility of overwriting each other's work.
Abstract: Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implemented without any locking. We present an update scheme called HOGWILD! which allows processors access to shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, then HOGWILD! achieves a nearly optimal rate of convergence. We demonstrate experimentally that HOGWILD! outperforms alternative schemes that use locking by an order of magnitude.

1,413 citations