Author

David Kaeli

Bio: David Kaeli is an academic researcher from Northeastern University. The author has contributed to research in topics including Cache and Speedup. The author has an h-index of 42 and has co-authored 358 publications receiving 6,822 citations. Previous affiliations of David Kaeli include the Polytechnic University of Catalonia and Northeastern University (China).


Papers
Proceedings ArticleDOI
19 Sep 2012
TL;DR: This paper presents Multi2Sim, an open-source, modular, and fully configurable toolset that enables ISA-level simulation of an x86 CPU and an AMD Evergreen GPU, and addresses program emulation correctness, as well as architectural simulation accuracy, using AMD's OpenCL benchmark suite.
Abstract: Accurate simulation is essential for the proper design and evaluation of any computing platform. With the current move toward the CPU-GPU heterogeneous computing era, researchers need a simulation framework that can model both kinds of computing devices and their interaction. In this paper, we present Multi2Sim, an open-source, modular, and fully configurable toolset that enables ISA-level simulation of an x86 CPU and an AMD Evergreen GPU. Focusing on a model of the AMD Radeon 5870 GPU, we address program emulation correctness, as well as architectural simulation accuracy, using AMD's OpenCL benchmark suite. Simulation capabilities are demonstrated with a preliminary architectural exploration study, and workload characterization examples. The project source code, benchmark packages, and a detailed user's guide are publicly available at www.multi2sim.org.
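
The abstract's central term, ISA-level simulation, means that every machine instruction of the guest program is fetched, decoded, and executed against a modeled architectural state. The C sketch below is a conceptual illustration of such an emulation loop for an invented toy ISA; it is not Multi2Sim code, and the opcodes, state layout, and program are made up purely to make the term concrete.

/*
 * Conceptual sketch of an ISA-level emulation loop for a toy ISA.
 * This is NOT Multi2Sim source code; the instruction set, state layout,
 * and opcode names are invented for illustration only. Every instruction
 * is fetched, decoded, and executed against a modeled architectural state.
 */
#include <stdint.h>
#include <stdio.h>

enum opcode { OP_ADD, OP_LOAD, OP_STORE, OP_HALT };

struct insn { enum opcode op; uint8_t dst, src0, src1; uint32_t imm; };

struct cpu_state {
    uint32_t regs[16];
    uint32_t pc;
    uint8_t  mem[1024];
    int      halted;
};

static void step(struct cpu_state *s, const struct insn *prog)
{
    const struct insn *i = &prog[s->pc++];      /* fetch */
    switch (i->op) {                            /* decode + execute */
    case OP_ADD:
        s->regs[i->dst] = s->regs[i->src0] + s->regs[i->src1];
        break;
    case OP_LOAD:
        s->regs[i->dst] = s->mem[i->imm];
        break;
    case OP_STORE:
        s->mem[i->imm] = (uint8_t)s->regs[i->src0];
        break;
    case OP_HALT:
        s->halted = 1;
        break;
    }
}

int main(void)
{
    struct insn prog[] = {
        { OP_LOAD,  1, 0, 0, 0 },   /* r1 = mem[0] */
        { OP_LOAD,  2, 0, 0, 1 },   /* r2 = mem[1] */
        { OP_ADD,   3, 1, 2, 0 },   /* r3 = r1 + r2 */
        { OP_STORE, 0, 3, 0, 2 },   /* mem[2] = r3 */
        { OP_HALT,  0, 0, 0, 0 },
    };
    struct cpu_state s = { .pc = 0 };
    s.mem[0] = 2; s.mem[1] = 3;

    while (!s.halted)
        step(&s, prog);

    printf("mem[2] = %u\n", s.mem[2]);          /* prints 5 */
    return 0;
}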

440 citations

Book
31 Aug 2011
TL;DR: Heterogeneous Computing with OpenCL teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units such as AMD Fusion technology.
Abstract: Heterogeneous Computing with OpenCL teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units (APUs) such as AMD Fusion technology. Designed to work on multiple platforms and with wide industry support, OpenCL will help you more effectively program for a heterogeneous future. Written by leaders in the parallel computing and OpenCL communities, this book will give you hands-on OpenCL experience to address a range of fundamental parallel algorithms. The authors explore memory spaces, optimization techniques, graphics interoperability, extensions, and debugging and profiling. Intended to support a parallel programming course, Heterogeneous Computing with OpenCL includes detailed examples throughout, plus additional online exercises and other supporting materials. The book explains principles and strategies for learning parallel programming with OpenCL, from understanding the four abstraction models to thoroughly testing and debugging complete applications; covers image processing, web plugins, particle simulations, video editing, performance optimization, and more; shows how OpenCL maps to an example target architecture and explains some of the tradeoffs associated with mapping to various architectures; and addresses a range of fundamental programming techniques, with multiple examples and case studies that demonstrate OpenCL extensions for a variety of hardware platforms.
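
To give a flavor of the host-API and kernel-language material such a book covers, here is a minimal, self-contained C example that builds and runs a vector-addition kernel on the first available OpenCL device. It is an illustrative sketch, not an example taken from the book; error checking is mostly omitted, and the kernel name vadd and the problem size are arbitrary choices.

/* Minimal OpenCL vector addition in C: illustrative only, not from the book.
 * Error checking is largely omitted for brevity. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

static const char *kernel_src =
    "__kernel void vadd(__global const float *a,\n"
    "                   __global const float *b,\n"
    "                   __global float *c)\n"
    "{\n"
    "    int i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void)
{
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    cl_platform_id platform;
    cl_device_id device;
    cl_int err;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, &err);

    /* Device buffers initialized from the host arrays a and b. */
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof(a), a, &err);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof(b), b, &err);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof(c), NULL, &err);

    /* Build the kernel from source and set its arguments. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, &err);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", &err);
    clSetKernelArg(k, 0, sizeof(cl_mem), &da);
    clSetKernelArg(k, 1, sizeof(cl_mem), &db);
    clSetKernelArg(k, 2, sizeof(cl_mem), &dc);

    /* Launch one work-item per element, then read the result back. */
    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof(c), c, 0, NULL, NULL);

    printf("c[10] = %f (expected %f)\n", c[10], a[10] + b[10]);

    clReleaseMemObject(da); clReleaseMemObject(db); clReleaseMemObject(dc);
    clReleaseKernel(k); clReleaseProgram(prog);
    clReleaseCommandQueue(q); clReleaseContext(ctx);
    return 0;
}

Building this typically requires linking against the OpenCL runtime, e.g. with -lOpenCL.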

247 citations

Book ChapterDOI
01 Jan 2013
TL;DR: This chapter is an introduction to parallel programming to address the need for teaching parallel programming on current system architectures using OpenCL as the target language, and it includes examples for CPUs, GPUs, and their integration in the accelerated processing unit (APU).
Abstract: This chapter is an introduction to parallel programming. It is organized to address the need for teaching parallel programming on current system architectures using OpenCL as the target language, and it includes examples for CPUs, GPUs, and their integration in the accelerated processing unit (APU). Its other major goal is to provide a guide to programmers to develop well-designed programs in OpenCL targeting parallel systems. It leads the programmer through the various abstractions and features provided by the OpenCL programming environment. The examples range from a simple introduction to more complicated optimizations, and suggest further development and goals at which to aim.
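
As one concrete illustration of the memory-space abstraction such a chapter covers (not an example taken from the chapter), the following OpenCL C kernel stages data in __local memory and performs a cooperative tree reduction within each work-group, writing one partial sum per group back to __global memory. It assumes a power-of-two work-group size; the host passes the scratch buffer by calling clSetKernelArg with a byte size and a NULL pointer, and builds and launches the kernel the same way as the vadd example above.

/* Illustrative OpenCL kernel showing two memory spaces: each work-group
 * stages data in __local memory, reduces it cooperatively, and writes one
 * partial sum per group back to __global memory. */
__kernel void partial_sum(__global const float *in,
                          __global float *group_sums,
                          __local float *scratch)
{
    size_t gid = get_global_id(0);
    size_t lid = get_local_id(0);
    size_t lsz = get_local_size(0);

    scratch[lid] = in[gid];                 /* one element per work-item */
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Tree reduction within the work-group (local size must be a power of two). */
    for (size_t stride = lsz / 2; stride > 0; stride /= 2) {
        if (lid < stride)
            scratch[lid] += scratch[lid + stride];
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    if (lid == 0)
        group_sums[get_group_id(0)] = scratch[0];
}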

235 citations

Journal ArticleDOI
TL;DR: Techniques for enhancing the memory efficiency of applications on data-parallel architectures are presented, based on the analysis and characterization of memory access patterns in loop bodies; they target vectorization via data transformation to benefit vector-based architectures and algorithmic memory selection for scalar-based architectures.
Abstract: The introduction of General-Purpose computation on GPUs (GPGPUs) has changed the landscape for the future of parallel computing. At the core of this phenomenon are massively multithreaded, data-parallel architectures possessing impressive acceleration ratings, offering low-cost supercomputing together with attractive power budgets. Even given the numerous benefits provided by GPGPUs, there remain a number of barriers that delay wider adoption of these architectures. One major issue is the heterogeneous and distributed nature of the memory subsystem commonly found on data-parallel architectures. Application acceleration is highly dependent on being able to utilize the memory subsystem effectively so that all execution units remain busy. In this paper, we present techniques for enhancing the memory efficiency of applications on data-parallel architectures, based on the analysis and characterization of memory access patterns in loop bodies; we target vectorization via data transformation to benefit vector-based architectures (e.g., AMD GPUs) and algorithmic memory selection for scalar-based architectures (e.g., NVIDIA GPUs). We demonstrate the effectiveness of our proposed methods with kernels from a wide range of benchmark suites. For the benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11.4× and 13.5× over baseline GPU implementations on each platform, respectively) by applying our proposed methodology.
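
One widely used example of the kind of data transformation the abstract describes is converting an array-of-structures (AoS) layout into a structure-of-arrays (SoA) layout, so that consecutive vector lanes or work-items touch consecutive addresses. The C sketch below is a generic illustration of that idea, not the paper's actual transformation algorithm.

/* Generic AoS -> SoA illustration, not the paper's algorithm. With the AoS
 * layout, lane i of a vector load touches rgb[i].r, rgb[i].g, rgb[i].b, which
 * are strided in memory; with the SoA layout, consecutive lanes read
 * consecutive floats, the pattern vector-based and coalescing memory
 * subsystems prefer. */
#include <stdio.h>

#define N 8

struct pixel_aos { float r, g, b; };          /* array of structures */
struct image_soa { float r[N], g[N], b[N]; }; /* structure of arrays */

static void aos_to_soa(const struct pixel_aos *src, struct image_soa *dst)
{
    for (int i = 0; i < N; i++) {
        dst->r[i] = src[i].r;
        dst->g[i] = src[i].g;
        dst->b[i] = src[i].b;
    }
}

int main(void)
{
    struct pixel_aos img[N];
    struct image_soa soa;

    for (int i = 0; i < N; i++) {
        img[i].r = i + 0.1f; img[i].g = i + 0.2f; img[i].b = i + 0.3f;
    }
    aos_to_soa(img, &soa);

    /* After the transform, a "scale all red channels" loop walks a
     * contiguous array instead of striding through interleaved pixels. */
    for (int i = 0; i < N; i++)
        soa.r[i] *= 1.5f;

    printf("soa.r[3] = %f\n", soa.r[3]);
    return 0;
}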

204 citations

Patent
02 Feb 2009
TL;DR: In this paper, an intrusion detection system collects architectural level events from a Virtual Machine Monitor where the collected events represent operation of a corresponding Virtual Machine and the events are consolidated into features that are compared with features from a known normal operating system.
Abstract: An intrusion detection system collects architectural level events from a Virtual Machine Monitor where the collected events represent operation of a corresponding Virtual Machine. The events are consolidated into features that are compared with features from a known normal operating system. If an amount of any differences between the collected features and the normal features exceeds a threshold value, a compromised Virtual Machine may be indicated. The comparison thresholds are determined by training on normal and abnormal systems and analyzing the collected events with machine learning algorithms to arrive at a model of normal operation.
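
A minimal sketch of the comparison step the abstract describes might look like the following C program: per-interval features derived from architectural events are compared against a normal profile, and a deviation above a trained threshold flags the virtual machine. The feature names, distance measure, and threshold values are invented for illustration and do not come from the patent.

/* Conceptual sketch of the patent's comparison step: consolidated event
 * features are compared with a "normal" profile, and a deviation above a
 * trained threshold flags the VM as possibly compromised. All names and
 * numbers below are made up for illustration. */
#include <math.h>
#include <stdio.h>

#define NUM_FEATURES 4

/* Per-interval features derived from architectural events, e.g. rates of
 * system calls, page faults, context switches, I/O requests (illustrative). */
struct features { double f[NUM_FEATURES]; };

/* Normalized L1 distance from the normal profile. */
static double deviation(const struct features *obs,
                        const struct features *normal_mean,
                        const struct features *normal_scale)
{
    double d = 0.0;
    for (int i = 0; i < NUM_FEATURES; i++)
        d += fabs(obs->f[i] - normal_mean->f[i]) / normal_scale->f[i];
    return d;
}

int main(void)
{
    /* Profile that would come from training on a known-normal system. */
    struct features mean  = { { 120.0, 15.0, 300.0, 40.0 } };
    struct features scale = { {  20.0,  5.0,  50.0, 10.0 } };
    double threshold = 6.0;   /* would also be learned during training */

    /* Features observed from the monitored virtual machine. */
    struct features obs = { { 480.0, 90.0, 310.0, 42.0 } };

    double d = deviation(&obs, &mean, &scale);
    printf("deviation = %.2f -> %s\n", d,
           d > threshold ? "possible compromise" : "normal");
    return 0;
}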

192 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal ArticleDOI
TL;DR: The primary objective of this paper is to serve as a glossary for interested researchers, giving them an overall picture of current time series data mining development and helping them identify potential research directions for further investigation.

1,358 citations

Journal ArticleDOI
TL;DR: This paper surveys techniques for minimization, selection, and prioritization and discusses open problems and potential directions for future research.
Abstract: Regression testing is a testing activity that is performed to provide confidence that changes do not harm the existing behaviour of the software. Test suites tend to grow in size as software evolves, often making it too costly to execute entire test suites. A number of different approaches have been studied to maximize the value of the accrued test suite: minimization, selection and prioritization. Test suite minimization seeks to eliminate redundant test cases in order to reduce the number of tests to run. Test case selection seeks to identify the test cases that are relevant to some set of recent changes. Test case prioritization seeks to order test cases in such a way that early fault detection is maximized. This paper surveys techniques for minimization, selection, and prioritization and discusses open problems and potential directions for future research. Copyright © 2010 John Wiley & Sons, Ltd.
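
As a concrete example of one prioritization heuristic from this literature, the "additional greedy" strategy repeatedly selects the test case that covers the most statements not yet covered by the tests already ordered. The C sketch below illustrates it on a made-up coverage matrix; it is not code from the survey.

/* "Additional greedy" test case prioritization: repeatedly pick the test
 * covering the most statements not yet covered by already-ordered tests.
 * The coverage matrix is invented for illustration. */
#include <stdio.h>

#define NUM_TESTS 4
#define NUM_STMTS 6

static const int covers[NUM_TESTS][NUM_STMTS] = {
    { 1, 1, 0, 0, 0, 0 },   /* t0 */
    { 0, 1, 1, 1, 0, 0 },   /* t1 */
    { 0, 0, 0, 1, 1, 1 },   /* t2 */
    { 1, 0, 0, 0, 0, 1 },   /* t3 */
};

int main(void)
{
    int covered[NUM_STMTS] = { 0 };
    int used[NUM_TESTS] = { 0 };

    printf("priority order:");
    for (int k = 0; k < NUM_TESTS; k++) {
        int best = -1, best_gain = -1;
        for (int t = 0; t < NUM_TESTS; t++) {
            if (used[t]) continue;
            int gain = 0;                       /* newly covered statements */
            for (int s = 0; s < NUM_STMTS; s++)
                gain += covers[t][s] && !covered[s];
            if (gain > best_gain) { best_gain = gain; best = t; }
        }
        used[best] = 1;
        for (int s = 0; s < NUM_STMTS; s++)
            covered[s] |= covers[best][s];
        printf(" t%d", best);
    }
    printf("\n");                               /* prints: t1 t2 t0 t3 */
    return 0;
}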

1,276 citations

Journal Article
TL;DR: An independence criterion based on the eigen-spectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator, or HSIC, is proposed.
Abstract: We propose an independence criterion based on the eigen-spectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages, compared with previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on HSIC do not suffer from slow learning rates. Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.
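
For reference, the biased empirical estimator mentioned in the abstract has a standard closed form in the HSIC literature (stated here from that literature, not quoted verbatim from this paper). Given a sample (x_1, y_1), ..., (x_n, y_n) and kernels k and l, in LaTeX:

\[
K_{ij} = k(x_i, x_j), \qquad
L_{ij} = l(y_i, y_j), \qquad
H = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top},
\]
\[
\widehat{\mathrm{HSIC}}(X, Y) = \frac{1}{(n-1)^2}\,\operatorname{tr}\!\left(K H L H\right).
\]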

1,134 citations