Author

Vijay Menon

Bio: Vijay Menon is an academic researcher from Intel. The author has contributed to research on transactional memory and software transactional memory, has an h-index of 19, and has co-authored 29 publications receiving 1625 citations. Previous affiliations of Vijay Menon include Cornell University.

Papers

Journal Article

Compiler and runtime support for efficient software transactional memory

Ali-Reza Adl-Tabatabai¹, Brian T. Lewis¹, Vijay Menon¹, Brian R. Murphy¹, Bratin Saha¹, Tatiana Shpeisman¹

Intel¹

11 Jun 2006

TL;DR: Presents a high-performance software transactional memory (STM) system integrated into a managed runtime environment, whose JIT compiler is the first to optimize the overheads of STM, and shows novel techniques for enabling JIT optimizations on STM operations.

Abstract: Programmers have traditionally used locks to synchronize concurrent access to shared data. Lock-based synchronization, however, has well-known pitfalls: using locks for fine-grain synchronization and composing code that already uses locks are both difficult and prone to deadlock. Transactional memory provides an alternate concurrency control mechanism that avoids these pitfalls and significantly eases concurrent programming. Transactional memory language constructs have recently been proposed as extensions to existing languages or included in new concurrent language specifications, opening the door for new compiler optimizations that target the overheads of transactional memory. This paper presents compiler and runtime optimizations for transactional memory language constructs. We present a high-performance software transactional memory system (STM) integrated into a managed runtime environment. Our system efficiently implements nested transactions that support both composition of transactions and partial roll back. Our JIT compiler is the first to optimize the overheads of STM, and we show novel techniques for enabling JIT optimizations on STM operations. We measure the performance of our optimizations on a 16-way SMP running multi-threaded transactional workloads. Our results show that these techniques enable transactional memory's performance to compete with that of well-tuned synchronization.

318 citations

Proceedings Article

Open nesting in software transactional memory

Yang Ni¹, Vijay Menon¹, Ali-Reza Adl-Tabatabai¹, Antony L. Hosking², Richard L. Hudson¹, J. Eliot B. Moss³, Bratin Saha¹, Tatiana Shpeisman¹

Intel¹, Purdue University², University of Massachusetts Amherst³

14 Mar 2007

TL;DR: Describes new language constructs to support open nesting in Java, shows how these constructs can be mapped efficiently to existing STM data structures, and evaluates how open nesting can enhance application scalability.

Abstract: Transactional memory (TM) promises to simplify concurrent programming while providing scalability competitive to fine-grained locking. Language-based constructs allow programmers to denote atomic regions declaratively and to rely on the underlying system to provide transactional guarantees along with concurrency. In contrast with fine-grained locking, TM allows programmers to write simpler programs that are composable and deadlock-free. TM implementations operate by tracking loads and stores to memory and by detecting concurrent conflicting accesses by different transactions. By automating this process, they greatly reduce the programmer's burden, but they also are forced to be conservative. In certain cases, conflicting memory accesses may not actually violate the higher-level semantics of a program, and a programmer may wish to allow seemingly conflicting transactions to execute concurrently. Open nested transactions enable expert programmers to differentiate between physical conflicts, at the level of memory, and logical conflicts that actually violate application semantics. A TM system with open nesting can permit physical conflicts that are not logical conflicts, and thus increase concurrency among application threads. Here we present an implementation of open nested transactions in a Java-based software transactional memory (STM) system. We describe new language constructs to support open nesting in Java, and we discuss new abstract locking mechanisms that a programmer can use to prevent logical conflicts. We demonstrate how these constructs can be mapped efficiently to existing STM data structures. Finally, we evaluate our system on a set of Java applications and data structures, demonstrating how open nesting can enhance application scalability.

212 citations

Proceedings Article

Enforcing isolation and ordering in STM

Tatiana Shpeisman¹, Vijay Menon¹, Ali-Reza Adl-Tabatabai¹, Steven Balensiefer², Dan Grossman², Richard L. Hudson¹, Katherine F. Moore², Bratin Saha¹

Intel¹, University of Washington²

10 Jun 2007

TL;DR: Introduces a dynamic escape analysis that differentiates private and public data at runtime to make barriers cheaper, and a static not-accessed-in-transaction analysis that removes many barriers completely; results on a set of Java programs show that strong atomicity can be implemented efficiently in a high-performance STM system.

Abstract: Transactional memory provides a new concurrency control mechanism that avoids many of the pitfalls of lock-based synchronization. High-performance software transactional memory (STM) implementations thus far provide weak atomicity: Accessing shared data both inside and outside a transaction can result in unexpected, implementation-dependent behavior. To guarantee isolation and consistent ordering in such a system, programmers are expected to enclose all shared-memory accesses inside transactions. A system that provides strong atomicity guarantees isolation even in the presence of threads that access shared data outside transactions. A strongly-atomic system also orders transactions with conflicting non-transactional memory operations in a consistent manner. In this paper, we discuss some surprising pitfalls of weak atomicity, and we present an STM system that avoids these problems via strong atomicity. We demonstrate how to implement non-transactional data accesses via efficient read and write barriers, and we present compiler optimizations that further reduce the overheads of these barriers. We introduce a dynamic escape analysis that differentiates private and public data at runtime to make barriers cheaper and a static not-accessed-in-transaction analysis that removes many barriers completely. Our results on a set of Java programs show that strong atomicity can be implemented efficiently in a high-performance STM system.

209 citations

Proceedings Article

Practical weak-atomicity semantics for Java STM

Vijay Menon¹, Steven Balensiefer², Tatiana Shpeisman¹, Ali-Reza Adl-Tabatabai¹, Richard L. Hudson¹, Bratin Saha¹, Adam Welc¹

Intel¹, University of Washington²

14 Jun 2008

TL;DR: A new weakly atomic Java STM implementation is described that provides single global lock semantics while permitting concurrent execution, but it is shown that this comes at a significant performance cost.

Abstract: As memory transactions have been proposed as a language-level replacement for locks, there is growing need for well-defined semantics. In contrast to database transactions, transactional memory (TM) semantics are complicated by the fact that programs may access the same memory locations both inside and outside transactions. Strongly atomic semantics, where non-transactional accesses are treated as implicit single-operation transactions, remain difficult to provide without specialized hardware support or significant performance overhead. As an alternative, many in the community have informally proposed that single global lock semantics [18,10], where transaction semantics are mapped to those of regions protected by a single global lock, provide an intuitive and efficiently implementable model for programmers. In this paper, we explore the implementation and performance implications of single global lock semantics in a weakly atomic STM from the perspective of Java, and we discuss why even recent STM implementations fall short of these semantics. We describe a new weakly atomic Java STM implementation that provides single global lock semantics while permitting concurrent execution, but we show that this comes at a significant performance cost. We also propose and implement various alternative semantics that loosen single lock requirements while still providing strong guarantees. We compare our new implementations to previous ones, including a strongly atomic STM [24].

133 citations

Proceedings Article

Enabling scalability and performance in a large scale CMP environment

Bratin Saha¹, Ali-Reza Adl-Tabatabai¹, Anwar Ghuloum¹, Mohan Rajagopalan¹, Richard L. Hudson¹, Leaf Petersen¹, Vijay Menon¹, Brian R. Murphy¹, Tatiana Shpeisman¹, Eric Sprangle¹, Anwar Rohillah¹, Doug Carmean¹, Jesse Fang¹

Intel¹

21 Mar 2007

TL;DR: This paper presents the architecture of McRT and discusses experiences with the system, including an experimental evaluation that led to several interesting, non-intuitive findings, providing key insights about the structure of the system stack at this scale.

Abstract: Hardware trends suggest that large-scale CMP architectures, with tens to hundreds of processing cores on a single piece of silicon, are imminent within the next decade. While existing CMP machines have traditionally been handled in the same way as SMPs, this magnitude of parallelism introduces several fundamental challenges at the architectural level and this, in turn, translates to novel challenges in the design of the software stack for these platforms. This paper presents the "Many Core Run Time" (McRT), a software prototype of an integrated language runtime that was designed to explore configurations of the software stack for enabling performance and scalability on large scale CMP platforms. This paper presents the architecture of McRT and discusses our experiences with the system, including experimental evaluation that led to several interesting, non-intuitive findings, providing key insights about the structure of the system stack at this scale. A key contribution of this paper is to demonstrate how McRT enables near linear improvements in performance and scalability for desktop workloads such as the popular XviD encoder and a set of RMS (recognition, mining, and synthesis) applications. Another key contribution of this work is its use of McRT to explore non-traditional system configurations such as a light-weight executive in which McRT runs on "bare metal" and replaces the traditional OS. Such configurations are becoming an increasingly attractive alternative to leverage heterogeneous computing units as seen in today's CPU-GPU configurations.

83 citations

Cited by

Journal Article

ScanImage: Flexible software for operating laser scanning microscopes

Thomas A. Pologruto¹,², Bernardo L. Sabatini¹,², Karel Svoboda¹

Howard Hughes Medical Institute¹, Harvard University²

17 May 2003 - Biomedical Engineering Online

TL;DR: This work describes a simple, software-based approach to operating a laser scanning microscope without the need for custom data acquisition hardware and quantifies the effectiveness of the data acquisition and signal conditioning algorithm under a variety of conditions.

Abstract: Background: Laser scanning microscopy is a powerful tool for analyzing the structure and function of biological specimens. Although numerous commercial laser scanning microscopes exist, some of the more interesting and challenging applications demand custom design. A major impediment to custom design is the difficulty of building custom data acquisition hardware and writing the complex software required to run the laser scanning microscope. Results: We describe a simple, software-based approach to operating a laser scanning microscope without the need for custom data acquisition hardware. Data acquisition and control of laser scanning are achieved through standard data acquisition boards. The entire burden of signal integration and image processing is placed on the CPU of the computer. We quantitate the effectiveness of our data acquisition and signal conditioning algorithm under a variety of conditions. We implement our approach in an open source software package (ScanImage) and describe its functionality. Conclusions: We present ScanImage, software to run a flexible laser scanning microscope that allows easy custom design.

1,223 citations

Proceedings Article

STAMP: Stanford Transactional Applications for Multi-Processing

Chi Cao Minh¹, JaeWoong Chung¹, Christos Kozyrakis¹, Kunle Olukotun¹

Stanford University¹

30 Sep 2008

TL;DR: This paper introduces the Stanford Transactional Application for Multi-Processing (STAMP), a comprehensive benchmark suite for evaluating TM systems and uses the suite to evaluate six different TM systems, identify their shortcomings, and motivate further research on their performance characteristics.

Abstract: Transactional Memory (TM) is emerging as a promising technology to simplify parallel programming. While several TM systems have been proposed in the research literature, we are still missing the tools and workloads necessary to analyze and compare the proposals. Most TM systems have been evaluated using microbenchmarks, which may not be representative of any real-world behavior, or individual applications, which do not stress a wide range of execution scenarios. We introduce the Stanford Transactional Application for Multi-Processing (STAMP), a comprehensive benchmark suite for evaluating TM systems. STAMP includes eight applications and thirty variants of input parameters and data sets in order to represent several application domains and cover a wide range of transactional execution cases (frequent or rare use of transactions, large or small transactions, high or low contention, etc.). Moreover, STAMP is portable across many types of TM systems, including hardware, software, and hybrid systems. In this paper, we provide descriptions and a detailed characterization of the applications in STAMP. We also use the suite to evaluate six different TM systems, identify their shortcomings, and motivate further research on their performance characteristics.

934 citations

Journal Article

Parallel Programmability and the Chapel Language

Bradford L. Chamberlain¹, David Callahan², Hans P. Zima³

Cray¹, Microsoft², University of Vienna³

01 Aug 2007

TL;DR: A candidate list of desirable qualities for a parallel programming language is offered, and how these qualities are addressed in the design of the Chapel language is described, providing an overview of Chapel's features and how they help address parallel productivity.

Abstract: In this paper we consider productivity challenges for parallel programmers and explore ways that parallel language design might help improve end-user productivity. We offer a candidate list of desirable qualities for a parallel programming language, and describe how these qualities are addressed in the design of the Chapel language. In doing so, we provide an overview of Chapel's features and how they help address parallel productivity. We also survey current techniques for parallel programming and describe ways in which we consider them to fall short of our idealized productive programming model.

905 citations

Proceedings Article

LogTM: log-based transactional memory

Kevin E. Moore¹, Jayaram Bobba¹, M.J. Moravan¹, Mark D. Hill¹, Darien Wood¹

University of Wisconsin-Madison¹

27 Feb 2006

TL;DR: This paper presents a new implementation of transactional memory, log-based transactional memory (LogTM), that makes commits fast by storing old values to a per-thread log in cacheable virtual memory and storing new values in place.

Abstract: Transactional memory (TM) simplifies parallel programming by guaranteeing that transactions appear to execute atomically and in isolation. Implementing these properties includes providing data version management for the simultaneous storage of both new (visible if the transaction commits) and old (retained if the transaction aborts) values. Most (hardware) TM systems leave old values "in place" (the target memory address) and buffer new values elsewhere until commit. This makes aborts fast, but penalizes (the much more frequent) commits. In this paper, we present a new implementation of transactional memory, log-based transactional memory (LogTM), that makes commits fast by storing old values to a per-thread log in cacheable virtual memory and storing new values in place. LogTM makes two additional contributions. First, LogTM extends a MOESI directory protocol to enable both fast conflict detection on evicted blocks and fast commit (using lazy cleanup). Second, LogTM handles aborts in (library) software with little performance penalty. Evaluations running micro- and SPLASH-2 benchmarks on a 32-way multiprocessor support our decision to optimize for commit by showing that only 1-2% of transactions abort.

724 citations
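
The version-management idea in the LogTM abstract above (new values written in place, old values saved to a per-thread log) can be illustrated with a small software sketch. This is only an analogy under stated assumptions: LogTM is a hardware design, and the class name `UndoLogTransaction`, the dict-based memory model, and the single-threaded usage are hypothetical choices made for illustration. Conflict detection and the directory protocol are not modeled.

```python
class UndoLogTransaction:
    """Toy undo-log transaction: write new values in place, log old values.

    Hypothetical illustration only; it does not model LogTM's hardware,
    its conflict detection, or concurrent threads.
    """

    _MISSING = object()  # marks an address that did not exist before the write

    def __init__(self, memory):
        self.memory = memory   # shared store, modeled as a plain dict
        self.undo_log = []     # (address, old_value) pairs in write order

    def write(self, address, value):
        # Save the old value first, then update memory in place.
        old = self.memory.get(address, self._MISSING)
        self.undo_log.append((address, old))
        self.memory[address] = value

    def commit(self):
        # New values are already in place, so commit only discards the log.
        self.undo_log.clear()

    def abort(self):
        # Walk the log backwards to restore the pre-transaction state.
        while self.undo_log:
            address, old = self.undo_log.pop()
            if old is self._MISSING:
                self.memory.pop(address, None)
            else:
                self.memory[address] = old


memory = {"x": 1}

tx = UndoLogTransaction(memory)
tx.write("x", 2)
tx.write("y", 3)
tx.abort()                    # restores x and removes y
after_abort = dict(memory)

tx = UndoLogTransaction(memory)
tx.write("x", 5)
tx.commit()                   # nothing to copy; the log is simply dropped
after_commit = dict(memory)
```

In this sketch commit does constant work while abort replays the log, which mirrors the trade-off the abstract describes: optimize for the common commit case and accept slower aborts.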

Proceedings Article

Omega: flexible, scalable schedulers for large compute clusters

Malte Schwarzkopf¹, Andy Konwinski², Michael Abd-El-Malek³, John Wilkes³

University of Cambridge¹, University of California, Berkeley², Google³

15 Apr 2013

TL;DR: This work presents a novel cluster scheduling approach that uses parallelism, shared state, and lock-free optimistic concurrency control to handle increasing scale and the need for rapid response to changing requirements, both of which are hard to meet with current monolithic cluster scheduler architectures.

Abstract: Increasing scale and the need for rapid response to changing requirements are hard to meet with current monolithic cluster scheduler architectures. This restricts the rate at which new features can be deployed, decreases efficiency and utilization, and will eventually limit cluster growth. We present a novel approach to address these needs using parallelism, shared state, and lock-free optimistic concurrency control. We compare this approach to existing cluster scheduler designs, evaluate how much interference between schedulers occurs and how much it matters in practice, present some techniques to alleviate it, and finally discuss a use case highlighting the advantages of our approach -- all driven by real-life Google production workloads.

710 citations
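
The combination of shared state and optimistic concurrency control named in the Omega abstract above can be sketched in a few lines. This is a minimal sketch under assumptions, not the paper's design: the `CellState` class, the per-machine version counters, and the CPU-only resource model are all hypothetical, and the abstract does not say at what granularity conflicts are detected. The example runs in a single thread, so it omits the atomicity a real concurrent commit would need.

```python
class CellState:
    """Toy shared cluster state with version-checked (optimistic) commits.

    Hypothetical illustration: names and the per-machine versioning scheme
    are assumptions, not taken from the Omega paper.
    """

    def __init__(self, free_cpus):
        self.free_cpus = dict(free_cpus)              # machine -> free CPUs
        self.versions = {m: 0 for m in free_cpus}     # machine -> version

    def snapshot(self):
        # Each scheduler plans against its own private copy of the state.
        return dict(self.free_cpus), dict(self.versions)

    def try_commit(self, claims, seen_versions):
        # Validate first: every claimed machine must be unchanged since the
        # snapshot and must still have enough free CPUs.
        for machine, cpus in claims.items():
            if self.versions[machine] != seen_versions[machine]:
                return False
            if self.free_cpus[machine] < cpus:
                return False
        # Apply the whole claim only after validation succeeded.
        for machine, cpus in claims.items():
            self.free_cpus[machine] -= cpus
            self.versions[machine] += 1
        return True


cell = CellState({"m1": 4, "m2": 4})

# Two schedulers take snapshots of the same state and both pick machine m1.
_, seen_a = cell.snapshot()
_, seen_b = cell.snapshot()
first = cell.try_commit({"m1": 3}, seen_a)    # succeeds
second = cell.try_commit({"m1": 3}, seen_b)   # conflict: m1 changed meanwhile

# The losing scheduler refreshes its view and retries on another machine.
_, fresh = cell.snapshot()
retry = cell.try_commit({"m2": 3}, fresh)
```

No scheduler holds a lock while planning; interference shows up only as a failed commit followed by a retry, which is the cost the abstract says the authors measure in practice.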
