Home
/
Authors
/
Sisira Weeratunga

Author

Sisira Weeratunga

Bio: Sisira Weeratunga is an academic researcher from Ames Research Center. The author has contributed to research in topics: Distributed memory & MIMD. The author has an hindex of 10, co-authored 16 publications receiving 2866 citations.

Topics: Distributed memory, MIMD, Intel iPSC, Computation, Supercomputer architecture ...read more

Papers

PDF

Open Access

More filters

Journal Article•DOI•

The Nas Parallel Benchmarks

[...]

David H. Bailey, Eric Barszcz, John T. Barton, D. S. Browning, Russell Carter, Leonardo Dagum, Rod Fatoohi, Paul O. Frederickson, T. A. Lasinski, Robert Schreiber, Horst D. Simon, V. Venkatakrishnan, Sisira Weeratunga - Show less +9 more

01 Sep 1991

TL;DR: A new set of benchmarks has been developed for the performance evaluation of highly parallel supercom puters that mimic the computation and data move ment characteristics of large-scale computational fluid dynamics applications.

...read moreread less

Abstract: A new set of benchmarks has been developed for the performance evaluation of highly parallel supercom puters. These consist of five "parallel kernel" bench marks and three "simulated application" benchmarks. Together they mimic the computation and data move ment characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their "pencil and paper" specification-all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional bench- marking approaches on highly parallel systems are avoided.

...read moreread less

2,246 citations

Proceedings Article•DOI•

The NAS parallel benchmarks—summary and preliminary results

[...]

David H. Bailey¹, Eric Barszcz¹, J. T. Barton¹, D. S. Browning¹, Russell Carter¹, L. Dagum¹, Rod Fatoohi¹, Paul O. Frederickson¹, T. A. Lasinski¹, Robert Schreiber¹, Horst D. Simon¹, V. Venkatakrishnan¹, Sisira Weeratunga¹ - Show less +9 more•Institutions (1)

Ames Research Center¹

01 Aug 1991

495 citations

PDE-based non-linear diffusion techniques for denoising scientific and industrial images: an empirical study

[...]

Sisira Weeratunga, Chandrika Kamath

20 Dec 2001

TL;DR: The empirical results show that, with an appropriate choice of parameters, diffusion-based schemes can be as effective as competitive techniques.

...read moreread less

Abstract: Removing noise from data is often the first step in data analysis. Denoising techniques should not only reduce the noise, but do so without blurring or changing the location of the edges. Many approaches have been proposed to accomplish this; in this paper, they focus on one such approach, namely the use of non-linear diffusion operators. This approach has been studied extensively from a theoretical viewpoint ever since the 1987 work of Perona and Malik showed that non-linear filters outperformed the more traditional linear Canny edge detector. They complement this theoretical work by investigating the performance of several isotropic diffusion operators on test images from scientific domains. They explore the effects of various parameters such as the choice of diffusivity function, explicit and implicit methods for the discretization of the PDE, and approaches for the spatial discretization of the non-linear operator etc. They also compare these schemes with simple spatial filters and the more complex wavelet-based shrinkage techniques. The empirical results show that, with an appropriate choice of parameters, diffusion-based schemes can be as effective as competitive techniques.

...read moreread less

37 citations

Proceedings Article•DOI•

Parallel computation of 3-D Navier-Stokes flowfields for supersonic vehicles

[...]

James Ryan¹, Sisira Weeratunga¹•Institutions (1)

Ames Research Center¹

01 Jan 1993

TL;DR: A new CFD module has been developed to combine the robust accuracy of conventional codes with the ability to run on parallel architectures, achieved by parallelizing the ARC3D algorithm, a central-differenced Navier-Stokes method, on the Intel iPSC/860.

...read moreread less

Abstract: Multidisciplinary design optimization of aircraft will require unprecedented capabilities of both analysis software and computer hardware. The speed and accuracy of the analysis will depend heavily on the computational fluid dynamics (CFD) module which is used. A new CFD module has been developed to combine the robust accuracy of conventional codes with the ability to run on parallel architectures. This is achieved by parallelizing the ARC3D algorithm, a central-differenced Navier-Stokes method, on the Intel iPSC/860. The computed solutions are identical to those from conventional machines. Computational speed on 64 processors is comparable to the rate on one Cray Y-MP processor and will increase as new generations of parallel computers become available.

...read moreread less

31 citations

Proceedings Article•DOI•

Aeroelastic computations for wings through direct coupling on distributed-memory MIMD parallel computers

[...]

Eddy Pramono, Sisira Weeratunga

10 Jan 1994

26 citations

1
2
3
4
…

Cited by

PDF

Open Access

More filters

Book•

Parallel Computer Architecture: A Hardware/Software Approach

[...]

David E. Culler, Anoop Gupta, Jaswinder Pal Singh

15 Aug 1998

TL;DR: This book explains the forces behind this convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures and provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions.

...read moreread less

Abstract: The most exciting development in parallel computer architecture is the convergence of traditionally disparate approaches on a common machine structure. This book explains the forces behind this convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures. It then examines the design issues that are critical to all parallel architecture across the full range of modern design, covering data access, communication performance, coordination of cooperative work, and correct implementation of useful semantics. It not only describes the hardware and software techniques for addressing each of these issues but also explores how these techniques interact in the same system. Examining architecture from an application-driven perspective, it provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions. * synthesizes a decade of research and development for practicing engineers, graduate students, and researchers in parallel computer architecture, system software, and applications development * presents in-depth application case studies from computer graphics, computational science and engineering, and data mining to demonstrate sound quantitative evaluation of design trade-offs * describes the process of programming for performance, including both the architecture-independent and architecture-dependent aspects, with examples and case-studies * illustrates bus-based and network-based parallel systems with case studies of more than a dozen important commercial designs Table of Contents 1 Introduction 2 Parallel Programs 3 Programming for Performance 4 Workload-Driven Evaluation 5 Shared Memory Multiprocessors 6 Snoop-based Multiprocessor Design 7 Scalable Multiprocessors 8 Directory-based Cache Coherence 9 Hardware-Software Tradeoffs 10 Interconnection Network Design 11 Latency Tolerance 12 Future Directions APPENDIX A Parallel Benchmark Suites

...read moreread less

1,571 citations

Proceedings Article•DOI•

Architecting phase change memory as a scalable dram alternative

[...]

Benjamin C. Lee¹, Engin Ipek¹, Onur Mutlu², Doug Burger¹•Institutions (2)

Microsoft¹, Carnegie Mellon University²

20 Jun 2009

TL;DR: This work proposes, crafted from a fundamental understanding of PCM technology parameters, area-neutral architectural enhancements that address these limitations and make PCM competitive with DRAM.

...read moreread less

Abstract: Memory scaling is in jeopardy as charge storage and sensing mechanisms become less reliable for prevalent memory technologies, such as DRAM. In contrast, phase change memory (PCM) storage relies on scalable current and thermal mechanisms. To exploit PCM's scalability as a DRAM alternative, PCM must be architected to address relatively long latencies, high energy writes, and finite endurance.We propose, crafted from a fundamental understanding of PCM technology parameters, area-neutral architectural enhancements that address these limitations and make PCM competitive with DRAM. A baseline PCM system is 1.6x slower and requires 2.2x more energy than a DRAM system. Buffer reorganizations reduce this delay and energy gap to 1.2x and 1.0x, using narrow rows to mitigate write energy and multiple rows to improve locality and write coalescing. Partial writes enhance memory endurance, providing 5.6 years of lifetime. Process scaling will further reduce PCM energy costs and improve endurance.

...read moreread less

1,568 citations

Book•

Parallel Programming in OpenMP

[...]

Rohit Chandra, Leonardo Dagum, Dave Kohr, Dror E. Maydan¹, Jeff McDonald, Ramesh Menon - Show less +2 more•Institutions (1)

Tensilica¹

11 Oct 2000

TL;DR: Aimed at the working researcher or scientific C/C++ or Fortran programmer, this text introduces the competent research programmer to a new vocabulary of idioms and techniques for parallelizing software using OpenMP.

...read moreread less

Abstract: Aimed at the working researcher or scientific C/C++ or Fortran programmer, this text introduces the competent research programmer to a new vocabulary of idioms and techniques for parallelizing software using OpenMP.

...read moreread less

1,253 citations

Book Chapter•DOI•

StreamIt: A Language for Streaming Applications

[...]

William Thies¹, Michal Karczmarek¹, Saman Amarasinghe¹•Institutions (1)

Massachusetts Institute of Technology¹

08 Apr 2002

TL;DR: The StreamIt language provides novel high-level representations to improve programmer productivity and program robustness within the streaming domain and the StreamIt compiler aims to improve the performance of streaming applications via stream-specific analyses and optimizations.

...read moreread less

Abstract: We characterize high-performance streaming applications as a new and distinct domain of programs that is becoming increasingly important. The StreamIt language provides novel high-level representations to improve programmer productivity and program robustness within the streaming domain. At the same time, the StreamIt compiler aims to improve the performance of streaming applications via stream-specific analyses and optimizations. In this paper, we motivate, describe and justify the language features of StreamIt, which include: a structured model of streams, a messaging system for control, a re-initialization mechanism, and a natural textual syntax.

...read moreread less

1,224 citations

Benchmarking modern multiprocessors

[...]

Kai Li¹, Christian Bienia¹•Institutions (1)

Princeton University¹

01 Jan 2011

TL;DR: A methodology to design effective benchmark suites is developed and its effectiveness is demonstrated by developing and deploying a benchmark suite for evaluating multiprocessors called PARSEC, which has been adopted by many architecture groups in both research and industry.

...read moreread less

Abstract: Benchmarking has become one of the most important methods for quantitative performance evaluation of processor and computer system designs. Benchmarking of modern multiprocessors such as chip multiprocessors is challenging because of their application domain, scalability and parallelism requirements. In my thesis, I have developed a methodology to design effective benchmark suites and demonstrated its effectiveness by developing and deploying a benchmark suite for evaluating multiprocessors. More specifically, this thesis includes several contributions. First, the thesis shows that a new benchmark suite for multiprocessors is needed because the behavior of modern parallel programs is significantly different from those represented by SPLASH-2, the most popular parallel benchmark suite developed over ten years ago. Second, the thesis quantitatively describes the requirements and characteristics of a set of multithreaded programs and their underlying technology trends. Third, the thesis presents a systematic approach to scale and select benchmark inputs with the goal of optimizing benchmarking accuracy subject to constrained execution or simulation time. Finally, the thesis describes a parallel benchmark suite called PARSEC for evaluating modern shared-memory multiprocessors. Since its initial release, PARSEC has been adopted by many architecture groups in both research and industry.

...read moreread less

1,043 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse