Hassan Chafi

Other affiliations: Stanford University, Oracle Corporation

Bio: Hassan Chafi is an academic researcher from Business International Corporation. The author has contributed to research in the topics Graph (abstract data type) and Compiler. The author has an h-index of 31 and has co-authored 102 publications receiving 3,935 citations. Previous affiliations of Hassan Chafi include Stanford University and Oracle Corporation.

Papers published on a yearly basis (chart not reproduced; years shown: 2005 to 2007, 2010 to 2020, 2023)

Papers

Proceedings Article•DOI•

Green-Marl: a DSL for easy and efficient graph analysis

Sungpack Hong¹, Hassan Chafi¹, Eric Sedlar², Kunle Olukotun¹

Stanford University¹, Oracle Corporation²

03 Mar 2012

TL;DR: This paper describes Green-Marl, a domain-specific language (DSL) whose high level language constructs allow developers to describe their graph analysis algorithms intuitively, but expose the data-level parallelism inherent in the algorithms.

Abstract: The increasing importance of graph-data based applications is fueling the need for highly efficient and parallel implementations of graph analysis software. In this paper we describe Green-Marl, a domain-specific language (DSL) whose high level language constructs allow developers to describe their graph analysis algorithms intuitively, but expose the data-level parallelism inherent in the algorithms. We also present our Green-Marl compiler which translates high-level algorithmic descriptions written in Green-Marl into an efficient C++ implementation by exploiting this exposed data-level parallelism. Furthermore, our Green-Marl compiler applies a set of optimizations that take advantage of the high-level semantic knowledge encoded in the Green-Marl DSL. We demonstrate that graph analysis algorithms can be written very intuitively with Green-Marl through some examples, and our experimental results show that the implementations the compiler generates from such descriptions perform as well as or better than highly-tuned hand-coded implementations.

308 citations
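The abstract's central claim is that high-level graph constructs expose the data-level parallelism already present in an algorithm. As a rough illustration in plain Python (not Green-Marl syntax), the sketch below writes pull-based PageRank so that each node's update in an iteration reads only the previous iteration's ranks; the per-node updates are therefore independent and can be mapped over a thread pool. The function name, the `in_edges`/`out_degree` adjacency format, and the damping defaults are assumptions for this example, dangling nodes are not handled, and under CPython's GIL the pool shows the parallel structure rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def pagerank(in_edges, out_degree, damping=0.85, iterations=20):
    """Pull-based PageRank (hypothetical helper).

    in_edges[v] lists the nodes with an edge into v; out_degree[u] is u's
    out-degree. Assumes every node that appears in some in_edges list has
    out_degree > 0 (no dangling-node correction).
    """
    n = len(out_degree)
    rank = [1.0 / n] * n

    def update(v, old):
        # Reads only the previous iteration's ranks, so updates for
        # different v are independent: this is the data-level parallelism.
        s = sum(old[u] / out_degree[u] for u in in_edges[v])
        return (1.0 - damping) / n + damping * s

    with ThreadPoolExecutor() as pool:
        for _ in range(iterations):
            old = rank
            # One parallel "foreach node" per iteration.
            rank = list(pool.map(lambda v: update(v, old), range(n)))
    return rank
```

On a directed 3-cycle every node keeps rank 1/3, while a node with no incoming edges settles at `(1 - damping) / n`.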

Proceedings Article•DOI•

The LDBC Social Network Benchmark: Interactive Workload

Orri Erling¹, Alex Averbuch, Josep L. Larriba-Pey, Hassan Chafi², Andrey Gubichev³, Arnau Prat⁴, Minh-Duc Pham⁵, Peter Boncz

OpenLink Software¹, Oracle Corporation², Technische Universität München³, Polytechnic University of Catalonia⁴, VU University Amsterdam⁵

27 May 2015

TL;DR: This paper describes the LDBC Social Network Benchmark (SNB), and presents database benchmarking innovation in terms of graph query functionality tested, correlated graph generation techniques, as well as a scalable benchmark driver on a workload with complex graph dependencies.

Abstract: The Linked Data Benchmark Council (LDBC) is now two years underway and has gathered strong industrial participation for its mission to establish benchmarks, and benchmarking practices for evaluating graph data management systems. The LDBC introduced a new choke-point driven methodology for developing benchmark workloads, which combines user input with input from expert systems architects, which we outline. This paper describes the LDBC Social Network Benchmark (SNB), and presents database benchmarking innovation in terms of graph query functionality tested, correlated graph generation techniques, as well as a scalable benchmark driver on a workload with complex graph dependencies. SNB has three query workloads under development: Interactive, Business Intelligence, and Graph Algorithms. We describe the SNB Interactive Workload in detail and illustrate the workload with some early results, as well as the goals for the two other workloads.

262 citations

Proceedings Article•DOI•

A practical concurrent binary search tree

Nathan G. Bronson¹, Jared Casper¹, Hassan Chafi¹, Kunle Olukotun¹

Stanford University¹

09 Jan 2010

TL;DR: Experimental evidence shows that the proposed concurrent relaxed balance AVL tree algorithm outperforms a highly tuned concurrent skip list for many access patterns, with an average of 39% higher single-threaded throughput and 32% higher multi-threaded throughput over a range of contention levels and operation mixes.

Abstract: We propose a concurrent relaxed balance AVL tree algorithm that is fast, scales well, and tolerates contention. It is based on optimistic techniques adapted from software transactional memory, but takes advantage of specific knowledge of the algorithm to reduce overheads and avoid unnecessary retries. We extend our algorithm with a fast linearizable clone operation, which can be used for consistent iteration of the tree. Experimental evidence shows that our algorithm outperforms a highly tuned concurrent skip list for many access patterns, with an average of 39% higher single-threaded throughput and 32% higher multi-threaded throughput over a range of contention levels and operation mixes.

226 citations
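The tree's concurrency control is described as optimistic: readers proceed without locking and validate afterwards, retrying if a writer interfered. A minimal sketch of that validate-and-retry pattern, assuming a single value guarded by a version counter in the style of a seqlock, is shown below. The published algorithm keeps version information per tree node and validates during traversal; the `VersionedCell` class is hypothetical and only models the single-location case (it also does not model hardware memory ordering).

```python
import threading


class VersionedCell:
    """Hypothetical seqlock-style cell: even version = stable, odd = write in progress."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        self._write_lock = threading.Lock()  # writers are serialized

    def write(self, value):
        with self._write_lock:
            self._version += 1   # now odd: concurrent readers must not trust their read
            self._value = value
            self._version += 1   # even again: new stable state published

    def read(self):
        # Optimistic read: no lock taken; validate the version afterwards.
        while True:
            v1 = self._version
            if v1 % 2 == 1:          # a write is in progress, retry
                continue
            value = self._value
            if self._version == v1:  # no write overlapped this read
                return value
            # A write happened between the two version checks: retry.
```

Readers never block writers here; the cost of contention is paid as retries, which is the trade-off the abstract's "avoid unnecessary retries" refers to.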

Journal Article•DOI•

Architectural Semantics for Practical Transactional Memory

Austen McDonald¹, JaeWoong Chung¹, Brian D. Carlstrom¹, Chi Cao Minh¹, Hassan Chafi¹, Christos Kozyrakis¹, Kunle Olukotun¹

Stanford University¹

01 May 2006

TL;DR: This paper introduces three key mechanisms: two-phase commit; support for software handlers on commit, violation, and abort; and full support for open- and closed-nested transactions with independent rollback, which provide a flexible interface to implement programming language and operating system functionality.

Abstract: Transactional Memory (TM) simplifies parallel programming by allowing for parallel execution of atomic tasks. Thus far, TM systems have focused on implementing transactional state buffering and conflict resolution. Missing is a robust hardware/software interface, not limited to simplistic instructions defining transaction boundaries. Without rich semantics, current TM systems cannot support basic features of modern programming languages and operating systems such as transparent library calls, conditional synchronization, system calls, I/O, and runtime exceptions. This paper presents a comprehensive instruction set architecture (ISA) for TM systems. Our proposal introduces three key mechanisms: two-phase commit; support for software handlers on commit, violation, and abort; and full support for open- and closed-nested transactions with independent rollback. These mechanisms provide a flexible interface to implement programming language and operating system functionality. We also show that these mechanisms are practical to implement at the ISA and microarchitecture level for various TM systems. Using an execution-driven simulation, we demonstrate both the functionality (e.g., I/O and conditional scheduling within transactions) and performance potential (2.2× improvement for SPECjbb2000) of the proposed mechanisms. Overall, this paper establishes a rich and efficient interface to foster both hardware and software research on transactional memory.

221 citations
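To make the nesting semantics more concrete, here is a toy software model, assuming lazy versioning (writes are buffered until the outermost transaction commits). A closed-nested child merges its writes and handlers into its parent on commit, or discards only its own writes on abort (independent rollback), and commit/abort handlers are plain callbacks. This is not the paper's ISA: conflict detection, open nesting, violation handlers, and the separate validate and commit phases of two-phase commit are all omitted, and the `Transaction` class is hypothetical.

```python
class Transaction:
    """Hypothetical lazy-versioning transaction with closed nesting."""

    def __init__(self, memory, parent=None):
        self.memory = memory    # shared "memory": a dict from address to value
        self.parent = parent
        self.writes = {}        # this level's buffered writes
        self.on_commit = []     # handlers run when the outermost level commits
        self.on_abort = []      # handlers run when this level rolls back

    def read(self, addr):
        # The innermost buffered write wins; fall back to shared memory.
        tx = self
        while tx is not None:
            if addr in tx.writes:
                return tx.writes[addr]
            tx = tx.parent
        return self.memory[addr]

    def write(self, addr, value):
        self.writes[addr] = value

    def begin_nested(self):
        return Transaction(self.memory, parent=self)

    def commit(self):
        if self.parent is None:
            # Outermost commit: publish all buffered writes, then run handlers.
            self.memory.update(self.writes)
            for handler in self.on_commit:
                handler()
        else:
            # Closed-nested commit: results become part of the parent only.
            self.parent.writes.update(self.writes)
            self.parent.on_commit.extend(self.on_commit)
            self.parent.on_abort.extend(self.on_abort)

    def abort(self):
        # Independent rollback: only this level's writes are discarded.
        self.writes.clear()
        for handler in self.on_abort:
            handler()
```

For example, an inner transaction can fail and roll back while the outer one still commits its own writes.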

Proceedings Article•

OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning

Arvind K. Sujeeth¹, HyoukJoong Lee¹, Kevin J. Brown¹, Tiark Rompf², Hassan Chafi¹, Michael Wu¹, Anand R. Atreya¹, Martin Odersky², Kunle Olukotun¹

Stanford University¹, École Polytechnique Fédérale de Lausanne²

28 Jun 2011

TL;DR: OptiML is an implicitly parallel, expressive and high performance alternative to MATLAB and C++ and shows that OptiML outperforms explicitly parallelized MATLAB code in nearly all cases.

Abstract: As the size of datasets continues to grow, machine learning applications are becoming increasingly limited by the amount of available computational power. Taking advantage of modern hardware requires using multiple parallel programming models targeted at different devices (e.g. CPUs and GPUs). However, programming these devices to run efficiently and correctly is difficult, error-prone, and results in software that is harder to read and maintain. We present OptiML, a domain-specific language (DSL) for machine learning. OptiML is an implicitly parallel, expressive and high performance alternative to MATLAB and C++. OptiML performs domain-specific analyses and optimizations and automatically generates CUDA code for GPUs. We show that OptiML outperforms explicitly parallelized MATLAB code in nearly all cases.

210 citations

Cited by

Journal Article•

Model Driven Engineering

Stuart Kent

01 Jan 2002, Lecture Notes in Computer Science

TL;DR: A framework for model driven engineering is set out, which proposes an organisation of the modelling 'space' and how to locate models in that space, and identifies the need for defining families of languages and transformations, and for developing techniques for generating/configuring tools from such definitions.

Abstract: The Object Management Group's (OMG) Model Driven Architecture (MDA) strategy envisages a world where models play a more direct role in software production, being amenable to manipulation and transformation by machine. Model Driven Engineering (MDE) is wider in scope than MDA. MDE combines process and analysis with architecture. This article sets out a framework for model driven engineering, which can be used as a point of reference for activity in this area. It proposes an organisation of the modelling 'space' and how to locate models in that space. It discusses different kinds of mappings between models. It explains why process and architecture are tightly connected. It discusses the importance and nature of tools. It identifies the need for defining families of languages and transformations, and for developing techniques for generating/configuring tools from such definitions. It concludes with a call to align metamodelling with formal language engineering techniques.

1,476 citations

Proceedings Article•DOI•

Spark SQL: Relational Data Processing in Spark

Michael Armbrust, Reynold Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan¹, Michael J. Franklin¹, Ali Ghodsi, Matei Zaharia

University of California, Berkeley¹

27 May 2015

TL;DR: Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API, and includes a highly extensible optimizer, Catalyst, built using features of the Scala programming language.

Abstract: Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API. Built on our experience with Shark, Spark SQL lets Spark programmers leverage the benefits of relational processing (e.g. declarative queries and optimized storage), and lets SQL users call complex analytics libraries in Spark (e.g. machine learning). Compared to previous systems, Spark SQL makes two main additions. First, it offers much tighter integration between relational and procedural processing, through a declarative DataFrame API that integrates with procedural Spark code. Second, it includes a highly extensible optimizer, Catalyst, built using features of the Scala programming language, that makes it easy to add composable rules, control code generation, and define extension points. Using Catalyst, we have built a variety of features (e.g. schema inference for JSON, machine learning types, and query federation to external databases) tailored for the complex needs of modern data analysis. We see Spark SQL as an evolution of both SQL-on-Spark and of Spark itself, offering richer APIs and optimizations while keeping the benefits of the Spark programming model.

1,230 citations
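Catalyst's optimizer is built from composable rules that rewrite a tree; in Scala these are pattern-matching functions passed to a tree's `transform` methods. The Python sketch below imitates that style on a hypothetical three-node expression type: each rule rewrites a single node, `transform_up` applies a rule bottom-up, and `optimize` reruns a batch of rules until the tree stops changing. The node classes, the two rules, and the fixed-point loop are assumptions for illustration, not Catalyst's API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, Add]


def transform_up(expr, rule):
    """Apply `rule` to the children first, then to the node itself."""
    if isinstance(expr, Add):
        expr = Add(transform_up(expr.left, rule), transform_up(expr.right, rule))
    return rule(expr)


def constant_fold(expr):
    # Rule: Add(Lit(a), Lit(b)) -> Lit(a + b); any other node is unchanged.
    if isinstance(expr, Add) and isinstance(expr.left, Lit) and isinstance(expr.right, Lit):
        return Lit(expr.left.value + expr.right.value)
    return expr


def remove_add_zero(expr):
    # Rule: e + 0 -> e
    if isinstance(expr, Add) and expr.right == Lit(0):
        return expr.left
    return expr


def optimize(expr, rules):
    # Run the whole rule batch repeatedly until a fixed point is reached.
    while True:
        new = expr
        for rule in rules:
            new = transform_up(new, rule)
        if new == expr:
            return new
        expr = new
```

Because each rule only handles the node shapes it recognizes and leaves everything else untouched, new rules can be added to the batch without modifying existing ones, which is the extensibility the abstract emphasizes.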

What is Twitter

Rizal Setya Perdana

01 Jan 2013

1,098 citations

Proceedings Article•DOI•

Evaluating MapReduce for Multi-core and Multiprocessor Systems

C. Ranger¹, R. Raghuraman¹, A. Penmetsa¹, Gary Bradski¹, Christos Kozyrakis¹

Stanford University¹

10 Feb 2007

TL;DR: It is established that, given a careful implementation, MapReduce is a promising model for scalable performance on shared-memory systems with simple parallel code.

Abstract: This paper evaluates the suitability of the MapReduce model for multi-core and multi-processor systems. MapReduce was created by Google for application development on data-centers with thousands of servers. It allows programmers to write functional-style code that is automatically parallelized and scheduled in a distributed system. We describe Phoenix, an implementation of MapReduce for shared-memory systems that includes a programming API and an efficient runtime system. The Phoenix runtime automatically manages thread creation, dynamic task scheduling, data partitioning, and fault tolerance across processor nodes. We study Phoenix with multi-core and symmetric multiprocessor systems and evaluate its performance potential and error recovery features. We also compare MapReduce code to code written in lower-level APIs such as P-threads. Overall, we establish that, given a careful implementation, MapReduce is a promising model for scalable performance on shared-memory systems with simple parallel code.

1,058 citations
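Phoenix itself is a C/C++ runtime; as a rough picture of the programming model it implements, the sketch below runs word count on shared memory. The input is split into contiguous chunks, each worker maps its chunk into a private `Counter` (so workers never write to shared state), and a reduce step merges the partial results. The function names and the static, equal-size chunking are assumptions; the dynamic task scheduling and fault tolerance that the abstract lists as runtime features are not modeled.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    # Map phase: count words into a worker-private intermediate result.
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def word_count(lines, num_workers=4):
    # Static partitioning: about len(lines) / num_workers lines per map task.
    size = max(1, -(-len(lines) // num_workers))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Reduce phase: merge the per-worker counts by key.
    total = Counter()
    for part in partials:
        total.update(part)
    return dict(total)
```

The programmer writes only the per-chunk map logic and the merge; chunking and thread management stay inside `word_count`, mirroring how the Phoenix runtime hides thread creation and partitioning.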

Proceedings Article•DOI•

TVM: an automated end-to-end optimizing compiler for deep learning

Tianqi Chen¹, Thierry Moreau¹, Ziheng Jiang¹, Lianmin Zheng², Eddie Yan¹, Meghan Cowan¹, Haichen Shen¹, Leyuan Wang³, Yuwei Hu⁴, Luis Ceze¹, Carlos Guestrin¹, Arvind Krishnamurthy¹

University of Washington¹, Shanghai Jiao Tong University², University of California, Davis³, Cornell University⁴

08 Oct 2018

TL;DR: TVM is a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends, such as mobile phones, embedded devices, and accelerators.

Abstract: There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms - such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) - requires significant manual effort. We propose TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations. Experimental results show that TVM delivers performance across hardware back-ends that is competitive with state-of-the-art, hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPUs. We also demonstrate TVM's ability to target new accelerator back-ends, such as the FPGA-based generic deep learning accelerator. The system is open sourced and in production use inside several major companies.

991 citations
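Operator fusion, one of the optimizations the abstract names, combines several operators into a single kernel so that intermediate results are never written out and read back. A minimal sketch, assuming a hypothetical scale, bias, and ReLU chain over a Python list, contrasts an unfused version (one pass and one intermediate buffer per operator) with a fused one (a single pass, no intermediates). TVM applies fusion to its graph-level representation and then generates hardware-specific code; nothing below uses TVM's API.

```python
def unfused(x, scale, bias):
    # Three separate "operators", each materializing a full intermediate list.
    t1 = [v * scale for v in x]
    t2 = [v + bias for v in t1]
    return [max(v, 0.0) for v in t2]


def fused(x, scale, bias):
    # One fused "kernel": every element goes through all three steps in a
    # single pass, so no intermediate buffers are allocated or re-read.
    return [max(v * scale + bias, 0.0) for v in x]
```

Both functions perform the same arithmetic in the same order, so their results are identical; only the memory traffic differs, which is where fusion's benefit comes from on real hardware.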
