Author

Raghavan Raman

Bio: Raghavan Raman is an academic researcher from Rice University. The author has contributed to research in topics: Graph database & Programming paradigm. The author has an h-index of 12 and has co-authored 20 publications receiving 601 citations. Previous affiliations of Raghavan Raman include IBM & Oracle Corporation.

Papers
Proceedings ArticleDOI
23 May 2009
TL;DR: This paper introduces a new work-stealing scheduler with compiler support for async-finish task parallelism that can accommodate both work-first and help-first scheduling policies, and provides insights on scenarios in which the help-first policy yields better results than the work-first policy and vice versa.
Abstract: Multiple programming models are emerging to address an increased need for dynamic task parallelism in applications for multicore processors and shared-address-space parallel computing. Examples include OpenMP 3.0, Java Concurrency Utilities, Microsoft Task Parallel Library, Intel Thread Building Blocks, Cilk, X10, Chapel, and Fortress. Scheduling algorithms based on work stealing, as embodied in Cilk's implementation of dynamic spawn-sync parallelism, are gaining in popularity but also have inherent limitations. In this paper, we address the problem of efficient and scalable implementation of X10's async-finish task parallelism, which is more general than Cilk's spawn-sync parallelism. We introduce a new work-stealing scheduler with compiler support for async-finish task parallelism that can accommodate both work-first and help-first scheduling policies. Performance results on two different multicore SMP platforms show significant improvements due to our new work-stealing algorithm compared to the existing work-sharing scheduler for X10, and also provide insights on scenarios in which the help-first policy yields better results than the work-first policy and vice versa.
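As a rough illustration of the async-finish style of task parallelism discussed above, the sketch below expresses a divide-and-conquer computation on top of Java's ForkJoin framework, which also uses a work-stealing scheduler. It is a minimal sketch under stated assumptions, not the paper's X10 runtime: the FibTask class and the absence of a sequential cutoff are choices made only for brevity.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative sketch: async-finish style task parallelism expressed with Java's
// ForkJoin framework. fork() plays the role of async (spawn a stealable child task)
// and join() plays the role of finish (wait for the spawned child).
public class FibTask extends RecursiveTask<Long> {
    private final int n;

    FibTask(int n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n < 2) {
            return (long) n;
        }
        FibTask left = new FibTask(n - 1);
        left.fork();                                  // child becomes available for stealing
        long right = new FibTask(n - 2).compute();    // parent keeps executing its own work
        return left.join() + right;                   // wait for the spawned child
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();       // pool of work-stealing worker threads
        System.out.println(pool.invoke(new FibTask(30)));
    }
}
```

In this idiom the spawned child is made stealable while the parent continues its own work, which is closer in spirit to the help-first policy; under a work-first policy the worker would instead execute the spawned task immediately and leave the continuation to be stolen.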

174 citations

Proceedings ArticleDOI
11 Jun 2012
TL;DR: This work presents a new precise dynamic race detector that leverages structured parallelism to address limitations of existing dynamic race detectors; the algorithm requires constant space per memory location, works in parallel, and is efficient in practice.
Abstract: Existing dynamic race detectors suffer from at least one of the following three limitations: (i) space overhead per memory location grows linearly with the number of parallel threads [13], severely limiting the parallelism that the algorithm can handle; (ii) sequentialization: the parallel program must be processed in a sequential order, usually depth-first [12, 24]. This prevents the analysis from scaling with available hardware parallelism, inherently limiting its performance; (iii) inefficiency: even though race detectors with good theoretical complexity exist, they do not admit efficient implementations and are unsuitable for practical use [4, 18]. We present a new precise dynamic race detector that leverages structured parallelism in order to address these limitations. Our algorithm requires constant space per memory location, works in parallel, and is efficient in practice. We implemented and evaluated our algorithm on a set of 15 benchmarks. Our experimental results indicate an average (geometric mean) slowdown of 2.78x on a 16-core SMP system.
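To make the constant-space claim concrete, the following is a minimal sketch, under stated assumptions, of a shadow-memory check that keeps only a bounded record per monitored location instead of per-thread metadata; it is not the paper's actual algorithm. Task, ShadowRecord, mayHappenInParallel, and report are names invented for this sketch, and the may-happen-in-parallel oracle is a placeholder for the query a structured-parallelism detector would answer from the async/finish task structure.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: bounded shadow state per memory location (one writer slot and
// one reader slot), so space does not grow with the number of parallel workers.
final class Task {
    final int id;
    Task(int id) { this.id = id; }
}

final class ShadowRecord {
    volatile Task lastWriter;   // most recent writing task
    volatile Task lastReader;   // representative reading task
}

final class RaceChecker {
    private final ConcurrentHashMap<Long, ShadowRecord> shadow = new ConcurrentHashMap<>();

    void onRead(long address, Task reader) {
        ShadowRecord r = shadow.computeIfAbsent(address, a -> new ShadowRecord());
        Task w = r.lastWriter;
        if (w != null && mayHappenInParallel(w, reader)) {
            report("read-write race", address);
        }
        r.lastReader = reader;   // constant space: the single reader slot is overwritten
    }

    void onWrite(long address, Task writer) {
        ShadowRecord r = shadow.computeIfAbsent(address, a -> new ShadowRecord());
        if ((r.lastWriter != null && mayHappenInParallel(r.lastWriter, writer))
                || (r.lastReader != null && mayHappenInParallel(r.lastReader, writer))) {
            report("write race", address);
        }
        r.lastWriter = writer;
    }

    // Placeholder oracle: a real detector for structured parallelism would answer this
    // from the async-finish task tree rather than from vector clocks.
    private boolean mayHappenInParallel(Task a, Task b) {
        return a.id != b.id;
    }

    private void report(String kind, long address) {
        System.err.println("Potential " + kind + " at address " + address);
    }
}
```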

110 citations

Proceedings ArticleDOI
25 Oct 2009
TL;DR: The main components of Rice University's Habanero Multicore Software Research Project are described, which proposes a new approach to multicore software enablement based on a two-level programming model consisting of a higher-level coordination language for domain experts and a lower-level parallel language for programming experts.
Abstract: Multiple programming models are emerging to address an increased need for dynamic task parallelism in multicore shared-memory multiprocessors. This poster describes the main components of Rice University's Habanero Multicore Software Research Project, which proposes a new approach to multicore software enablement based on a two-level programming model consisting of a higher-level coordination language for domain experts and a lower-level parallel language for programming experts.

84 citations

Book ChapterDOI
Raghavan Raman, Jisheng Zhao, Vivek Sarkar, Martin Vechev, Eran Yahav
01 Nov 2010
TL;DR: An efficient dynamic race detector algorithm targeting the async-finish task-parallel programming model, whose async and finish constructs generalize the spawn-sync constructs used in Cilk while still ensuring that all computation graphs are deadlock-free.
Abstract: A major productivity hurdle for parallel programming is the presence of data races. Data races can lead to all kinds of harmful program behaviors, including determinism violations and corrupted memory. However, runtime overheads of current dynamic data race detectors are still prohibitively large (often incurring slowdowns of 10× or larger) for use in mainstream software development. In this paper, we present an efficient dynamic race detector algorithm targeting the async-finish task-parallel programming model. The async and finish constructs are at the core of languages such as X10 and Habanero Java (HJ). These constructs generalize the spawn-sync constructs used in Cilk, while still ensuring that all computation graphs are deadlock-free. We have implemented our algorithm in a tool called TASKCHECKER and evaluated it on a suite of 12 benchmarks. To reduce overhead of the dynamic analysis, we have also implemented various static optimizations in the tool. Our experimental results indicate that our approach performs well in practice, incurring an average slowdown of 3.05× compared to a serial execution in the optimized case.
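For readers unfamiliar with the constructs, the sketch below emulates async and finish on plain Java threads to show what it means for a finish scope to wait for all transitively spawned tasks, which is how these constructs generalize Cilk's spawn-sync. FinishScope, async, and the Phaser-based bookkeeping are assumptions made only for illustration; HJ and X10 provide async and finish as language constructs, not as a library like this.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;

// Illustrative sketch: a finish scope that waits for every task transitively spawned
// inside it, emulated with a Phaser. Each async registers with the enclosing scope
// before it starts and deregisters when it completes.
final class FinishScope {
    private static final ExecutorService POOL = Executors.newWorkStealingPool();
    private final Phaser phaser = new Phaser(1);      // one registration for the scope itself

    // async: spawn a child task that the enclosing finish will wait for; registration
    // happens on the spawning thread, so nested asyncs are always accounted for.
    void async(Runnable body) {
        phaser.register();
        POOL.submit(() -> {
            try {
                body.run();
            } finally {
                phaser.arriveAndDeregister();
            }
        });
    }

    // finish: block until every registered task in this scope has completed.
    void awaitAll() {
        phaser.arriveAndAwaitAdvance();
    }

    public static void main(String[] args) {
        FinishScope finish = new FinishScope();
        finish.async(() -> {
            System.out.println("outer task");
            finish.async(() -> System.out.println("nested task"));   // same finish scope
        });
        finish.awaitAll();                            // returns only after both tasks finish
        System.out.println("finish scope complete");
    }
}
```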

74 citations

Book ChapterDOI
14 Sep 2010
TL;DR: The main idea is to leverage the structure of the program to reduce determinism verification to an independence property that can be proved using a simple sequential analysis.
Abstract: We present a static analysis for automatically verifying determinism of structured parallel programs. The main idea is to leverage the structure of the program to reduce determinism verification to an independence property that can be proved using a simple sequential analysis. Given a task-parallel program, we identify program fragments that may execute in parallel and check that these fragments perform independent memory accesses using a sequential analysis. Since the parts that can execute in parallel are typically only a small fraction of the program, we can employ powerful numerical abstractions to establish that tasks executing in parallel only perform independent memory accesses. We have implemented our analysis in a tool called DICE and successfully applied it to verify determinism on a suite of benchmarks derived from those used in the high-performance computing community.
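A minimal sketch of the independence property such an analysis targets, using Java parallel streams as a stand-in for the task-parallel constructs; the array example and the rejected variant in the comment are invented here for illustration only.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Illustrative sketch: iterations that only touch memory through their own index are
// independent, so a sequential analysis of one iteration suffices to conclude that
// every parallel schedule produces the same result.
public class IndependenceExample {
    public static void main(String[] args) {
        int n = 8;
        double[] a = new double[n];

        // Deterministic: iteration i writes only a[i]; accesses of distinct iterations
        // are disjoint, so the parallel loop is verifiable as deterministic.
        IntStream.range(0, n).parallel().forEach(i -> a[i] = Math.sqrt(i));

        // Not verifiable: the variant below also writes a[(i + 1) % n], so two
        // iterations may write the same element and the result depends on scheduling.
        // IntStream.range(0, n).parallel().forEach(i -> a[(i + 1) % n] = a[i] + 1.0);

        System.out.println(Arrays.toString(a));
    }
}
```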

33 citations


Cited by

Proceedings ArticleDOI
23 Feb 2013
TL;DR: This paper presents a lightweight graph processing framework specific to shared-memory parallel/multicore machines that makes graph traversal algorithms easy to write, and shows that the resulting algorithms are significantly more efficient than previously reported results using graph frameworks on machines with many more cores.
Abstract: There has been significant recent interest in parallel frameworks for processing graphs due to their applicability in studying social networks, the Web graph, networks in biology, and unstructured meshes in scientific simulation. Due to the desire to process large graphs, these systems have emphasized the ability to run on distributed memory machines. Today, however, a single multicore server can support more than a terabyte of memory, which can fit graphs with tens or even hundreds of billions of edges. Furthermore, for graph algorithms, shared-memory multicores are generally significantly more efficient on a per core, per dollar, and per joule basis than distributed memory systems, and shared-memory algorithms tend to be simpler than their distributed counterparts. In this paper, we present a lightweight graph processing framework that is specific for shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write. The framework has two very simple routines, one for mapping over edges and one for mapping over vertices. Our routines can be applied to any subset of the vertices, which makes the framework useful for many graph traversal algorithms that operate on subsets of the vertices. Based on recent ideas used in a very fast algorithm for breadth-first search (BFS), our routines automatically adapt to the density of vertex sets. We implement several algorithms in this framework, including BFS, graph radii estimation, graph connectivity, betweenness centrality, PageRank and single-source shortest paths. Our algorithms expressed using this framework are very simple and concise, and perform almost as well as highly optimized code. Furthermore, they get good speedups on a 40-core machine and are significantly more efficient than previously reported results using graph frameworks on machines with many more cores.
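A minimal sketch, under stated assumptions, of the frontier-centric style that an edge-mapping routine supports, written against a plain adjacency list. The edgeMap helper, the parent array, and the compare-and-set claim below are simplified stand-ins invented for this sketch rather than the framework's actual API, and the real system additionally switches between sparse and dense frontier representations depending on frontier size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Illustrative sketch: level-synchronous BFS driven by an edge-map over the frontier.
public class BfsSketch {
    // edgeMap: visit every edge (u, v) with u in the frontier; v joins the next
    // frontier if this is the first time it is claimed.
    static List<Integer> edgeMap(int[][] adj, List<Integer> frontier, AtomicIntegerArray parent) {
        ConcurrentLinkedQueue<Integer> next = new ConcurrentLinkedQueue<>();
        frontier.parallelStream().forEach(u -> {
            for (int v : adj[u]) {
                if (parent.compareAndSet(v, -1, u)) {   // claim v exactly once
                    next.add(v);
                }
            }
        });
        return new ArrayList<>(next);
    }

    public static void main(String[] args) {
        int[][] adj = { {1, 2}, {0, 3}, {0, 3}, {1, 2} };   // small example graph
        AtomicIntegerArray parent = new AtomicIntegerArray(adj.length);
        for (int i = 0; i < adj.length; i++) parent.set(i, -1);

        int source = 0;
        parent.set(source, source);
        List<Integer> frontier = List.of(source);
        while (!frontier.isEmpty()) {
            frontier = edgeMap(adj, frontier, parent);       // expand one BFS level
        }
        System.out.println("BFS parents: " + parent);
    }
}
```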

816 citations

Proceedings ArticleDOI
14 Nov 2009
TL;DR: This work investigates the design and scalability of work stealing on modern distributed memory systems and demonstrates high efficiency and low overhead when scaling to 8,192 processors for three benchmark codes: a producer-consumer benchmark, the unbalanced tree search benchmark, and a multiresolution analysis kernel.
Abstract: Irregular and dynamic parallel applications pose significant challenges to achieving scalable performance on large-scale multicore clusters. These applications often require ongoing, dynamic load balancing in order to maintain efficiency. Scalable dynamic load balancing on large clusters is a challenging problem which can be addressed with distributed dynamic load balancing systems. Work stealing is a popular approach to distributed dynamic load balancing; however its performance on large-scale clusters is not well understood. Prior work on work stealing has largely focused on shared memory machines. In this work we investigate the design and scalability of work stealing on modern distributed memory systems. We demonstrate high efficiency and low overhead when scaling to 8,192 processors for three benchmark codes: a producer-consumer benchmark, the unbalanced tree search benchmark, and a multiresolution analysis kernel.
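To make the stealing mechanism concrete, here is a shared-memory sketch of the basic worker loop; the paper's setting is distributed memory, where a steal becomes an operation on a remote node's queue rather than a local deque access. The deque layout, random victim selection, and idle-round termination below are simplifying assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch: each worker pops local work from one end of its own deque and,
// when idle, steals from the opposite end of a randomly chosen victim's deque.
public class StealLoopSketch {
    static void workerLoop(int myId, List<ConcurrentLinkedDeque<Runnable>> deques) {
        ConcurrentLinkedDeque<Runnable> mine = deques.get(myId);
        int idleRounds = 0;
        while (idleRounds < 1000) {                          // crude termination heuristic
            Runnable task = mine.pollLast();                 // local work, LIFO order
            if (task == null) {
                int victim = ThreadLocalRandom.current().nextInt(deques.size());
                if (victim != myId) {
                    task = deques.get(victim).pollFirst();   // steal from the victim, FIFO order
                }
            }
            if (task != null) {
                task.run();
                idleRounds = 0;
            } else {
                idleRounds++;                                // back off when no work is found
                Thread.onSpinWait();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int workers = 4;
        List<ConcurrentLinkedDeque<Runnable>> deques = new ArrayList<>();
        for (int i = 0; i < workers; i++) deques.add(new ConcurrentLinkedDeque<>());
        for (int i = 0; i < 100; i++) {                      // all work starts on worker 0
            final int taskId = i;
            deques.get(0).addLast(() -> System.out.println("task " + taskId));
        }
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            final int id = i;
            threads[i] = new Thread(() -> workerLoop(id, deques));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
    }
}
```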

286 citations

Proceedings ArticleDOI
02 Jul 2018
TL;DR: An ITiCSE working group conducted a systematic review of the introductory programming literature to explore trends, highlight advances in knowledge over the past 15 years, and indicate possible directions for future research.
Abstract: As computing becomes a mainstream discipline embedded in the school curriculum and acts as an enabler for an increasing range of academic disciplines in higher education, the literature on introductory programming is growing. Although there have been several reviews that focus on specific aspects of introductory programming, there has been no broad overview of the literature exploring recent trends across the breadth of introductory programming. This paper is the report of an ITiCSE working group that conducted a systematic review in order to gain an overview of the introductory programming literature. Partitioning the literature into papers addressing the student, teaching, the curriculum, and assessment, we explore trends, highlight advances in knowledge over the past 15 years, and indicate possible directions for future research.

282 citations

Proceedings ArticleDOI
27 May 2018
TL;DR: This paper surveys a class of approaches, namely program repair techniques, whose key idea is to automatically repair software systems by producing an actual fix that can be validated by testers before it is finally accepted, or that is adapted to properly fit the system.
Abstract: Despite their growing complexity and increasing size, modern software applications must satisfy strict release requirements that impose short bug fixing and maintenance cycles, putting significant pressure on developers who are responsible for producing high-quality software in a timely manner. To reduce developers' workload, repairing and healing techniques have been extensively investigated in recent years as solutions for efficiently repairing and maintaining software. In particular, repairing solutions have been able to automatically produce useful fixes for several classes of bugs that might be present in software programs. A range of algorithms, techniques, and heuristics have been integrated, experimented with, and studied, producing a heterogeneous and articulated research framework where automatic repair techniques are proliferating. This paper organizes the knowledge in the area by surveying a body of 108 papers about automatic software repair techniques, illustrating the algorithms and the approaches, comparing them on representative examples, and discussing the open challenges and the empirical evidence reported so far.
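As a rough illustration of the generate-and-validate approach common to many of the surveyed techniques, here is a minimal sketch of the repair loop; the repair signature, the candidate generator, and the toy integer "program" in main are all invented for this sketch, whereas real tools mutate source code or ASTs and use the project's test suite as the validator.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Illustrative sketch: generate candidate patches for a buggy program and keep the
// first one that passes the validation oracle (typically the project's test suite).
public class GenerateAndValidate {
    static <P> Optional<P> repair(P buggyProgram,
                                  Function<P, List<P>> generateCandidates,
                                  Predicate<P> passesAllTests,
                                  int maxRounds) {
        for (int round = 0; round < maxRounds; round++) {
            // the generator is assumed stochastic (e.g., random mutations of suspicious
            // statements), so each round can explore a different batch of candidates
            for (P candidate : generateCandidates.apply(buggyProgram)) {
                if (passesAllTests.test(candidate)) {
                    return Optional.of(candidate);   // a plausible fix, handed to testers for review
                }
            }
        }
        return Optional.empty();                     // no candidate validated within the budget
    }

    public static void main(String[] args) {
        // toy usage: the "program" is an integer constant with an off-by-two bug, the
        // candidates are +/-2 mutations, and the "test suite" checks the expected value
        Optional<Integer> fix = repair(40, p -> List.of(p - 2, p + 2), p -> p == 42, 5);
        System.out.println(fix.map(v -> "fixed: " + v).orElse("no fix found"));
    }
}
```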

256 citations