Michel Dubois

Bio: Michel Dubois is an academic researcher from the University of Southern California. His research covers caches and shared memory. He has an h-index of 30 and has co-authored 114 publications receiving 2,678 citations.

Topics: Cache, Shared memory, Cache coherence, CPU cache, Cache coloring, ...

Papers published on a yearly basis (chart covering 1981 to 2020)

Papers

Journal Article•DOI•

Sequential hardware prefetching in shared-memory multiprocessors

Fredrik Dahlgren¹, Michel Dubois², Per Stenström¹•Institutions (2)

Lund University¹, University of Southern California²

01 Jul 1995-IEEE Transactions on Parallel and Distributed Systems

TL;DR: Simulations of this adaptive scheme show reductions of the number of read misses, the read penalty, and of the execution time by up to 78%, 58%, and 25% respectively.

Abstract: To offset the effect of read miss penalties on processor utilization in shared-memory multiprocessors, several software- and hardware-based data prefetching schemes have been proposed. A major advantage of hardware techniques is that they need no support from the programmer or compiler. Sequential prefetching is a simple hardware-controlled prefetching technique which relies on the automatic prefetch of consecutive blocks following the block that misses in the cache, thus exploiting spatial locality. In its simplest form, the number of prefetched blocks on each miss is fixed throughout the execution. However, since the prefetching efficiency varies during the execution of a program, we propose to adapt the number of prefetched blocks according to a dynamic measure of prefetching effectiveness. Simulations of this adaptive scheme show reductions of the number of read misses, the read penalty, and of the execution time by up to 78%, 58%, and 25% respectively.
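
A minimal Python sketch of the adaptive idea follows. It is not the authors' hardware design: the epoch length, the two thresholds, the maximum degree, the infinite cache, and the class name `AdaptiveSequentialPrefetcher` are all illustrative assumptions. On each miss it fetches the next K consecutive blocks, counts how many prefetched blocks are later referenced, and at the end of each epoch raises or lowers K depending on that fraction.

```python
class AdaptiveSequentialPrefetcher:
    """Sequential prefetching whose degree K adapts to measured usefulness.

    Hypothetical parameters: max_degree, window, high and low are illustrative
    values, not taken from the paper. The cache is modeled as infinite.
    """

    def __init__(self, max_degree=8, window=16, high=0.75, low=0.25):
        self.degree = 1              # current prefetch degree K
        self.max_degree = max_degree
        self.window = window         # prefetches per adaptation epoch
        self.high, self.low = high, low
        self.cache = set()           # blocks currently cached (no replacement)
        self.prefetched = set()      # prefetched blocks not yet referenced
        self.issued = 0              # prefetches issued in this epoch
        self.useful = 0              # prefetched blocks referenced in this epoch
        self.misses = 0

    def access(self, block):
        if block in self.cache:
            if block in self.prefetched:  # first use of a prefetched block
                self.prefetched.discard(block)
                self.useful += 1
            return
        self.misses += 1
        self.cache.add(block)
        # On a miss, fetch the next K consecutive blocks (spatial locality).
        for b in range(block + 1, block + 1 + self.degree):
            if b not in self.cache:
                self.cache.add(b)
                self.prefetched.add(b)
                self.issued += 1
        self._adapt()

    def _adapt(self):
        if self.issued < self.window:
            return
        ratio = self.useful / self.issued
        if ratio > self.high and self.degree < self.max_degree:
            self.degree += 1         # prefetches are paying off: fetch more
        elif ratio < self.low and self.degree > 1:
            self.degree -= 1         # prefetches are wasted: fetch fewer
        self.issued = self.useful = 0  # start a new epoch
```

A purely sequential scan pushes K upward and cuts misses, while a stride larger than K leaves every prefetch unused, so K falls back to its floor. The sketch keeps K at one or more so that it can always measure effectiveness; how the original scheme handles a zero degree is not modeled here.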

167 citations

Proceedings Article•DOI•

Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors

Fredrik Dahlgren¹, Michel Dubois², Per Stenström¹•Institutions (2)

Lund University¹, University of Southern California²

16 Aug 1993

TL;DR: This work proposes to adapt the number of prefetched blocks according to a dynamic measure of prefetching effectiveness, and shows significant reductions of the read penalty and of the overall execution time.

Abstract: To offset the effect of read miss penalties on processor utilization in shared-memory multiprocessors, several software- and hardware-based data prefetching schemes have been proposed. A major advantage of hardware techniques is that they need no support from the programmer or compiler. Sequential prefetching is a simple hardware-controlled prefetching technique which relies on the automatic prefetch of consecutive blocks following the block that misses in the cache. In its simplest form, the number of prefetched blocks on each miss is fixed throughout the execution. However, since the prefetching efficiency varies during the execution of a program, we propose to adapt the number of prefetched blocks according to a dynamic measure of prefetching effectiveness. Simulations of this adaptive scheme show significant reductions of the read penalty and of the overall execution time.

149 citations

Proceedings Article•DOI•

The detection and elimination of useless misses in multiprocessors

Michel Dubois¹, Jonas Skeppstedt¹, Livio Ricciulli¹, Krishnan Ramamurthy¹, Per Stenström¹•Institutions (1)

University of Southern California¹

01 May 1993

TL;DR: A new classification of misses in shared-memory multiprocessors based on interprocessor communication is introduced, which identifies the set of essential misses, i.e., the smallest set of misses necessary for correct execution.

Abstract: In this paper we introduce a new classification of misses in shared-memory multiprocessors based on interprocessor communication. We identify the set of essential misses, i.e., the smallest set of misses necessary for correct execution. Essential misses include cold misses and true sharing misses. All other misses are useless misses and can be ignored without affecting the correctness of program execution. Based on the new classification we compare the effectiveness of five different protocols which delay and combine invalidations leading to useless misses. In cache-based systems the protocols are very effective and have miss rates close to the essential miss rate. In virtual shared memory systems the techniques are also effective but leave room for improvements.
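
The distinction between essential and useless misses can be illustrated with a small trace-driven classifier. This is a simplified sketch, assuming infinite caches (so there are no replacement misses), a write-invalidate protocol, and word-addressed traces. Unlike the paper's definition, which considers the whole lifetime of a block in the cache, it decides at miss time using only the word that caused the miss. The function name `classify_misses` and the `(proc, op, word_addr)` trace format are hypothetical.

```python
from collections import defaultdict


def classify_misses(trace, words_per_block=4):
    """Classify misses in a trace of (proc, op, word_addr) tuples, op in {'r', 'w'}.

    Simplified, hypothetical model: infinite caches and write-invalidate, so every
    miss is either a cold miss or a coherence miss, judged by the missing word only.
    """
    sharers = defaultdict(set)      # block -> processors holding a valid copy
    ever_cached = defaultdict(set)  # block -> processors that have ever cached it
    # (proc, block) -> words written by other processors since proc last fetched it
    stale_words = defaultdict(set)
    counts = {"cold": 0, "true_sharing": 0, "useless": 0}

    for proc, op, addr in trace:
        block, word = divmod(addr, words_per_block)
        if proc not in sharers[block]:
            if proc not in ever_cached[block]:
                counts["cold"] += 1          # first reference: essential
            elif word in stale_words[(proc, block)]:
                counts["true_sharing"] += 1  # needs a value another processor wrote
            else:
                counts["useless"] += 1       # e.g. false sharing on another word
            sharers[block].add(proc)
            ever_cached[block].add(proc)
            stale_words[(proc, block)].clear()
        if op == "w":
            # Invalidate every other copy and remember which word changed.
            sharers[block] = {proc}
            for other in ever_cached[block] - {proc}:
                stale_words[(other, block)].add(word)
    return counts
```

With the trace `[(0, 'r', 0), (1, 'r', 1), (0, 'w', 0), (1, 'r', 1), (0, 'w', 0), (1, 'r', 0)]`, processor 1 first misses because processor 0 wrote a different word of the same block (a useless, false-sharing miss), and later misses on the word processor 0 actually wrote (an essential, true-sharing miss).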

121 citations

Journal Article•DOI•

Verification techniques for cache coherence protocols

Fong Pong¹, Michel Dubois²•Institutions (2)

Sun Microsystems¹, University of Southern California²

01 Mar 1997-ACM Computing Surveys

TL;DR: This article presents a comprehensive survey of various approaches for the verification of cache coherence protocols based on state enumeration, symbolic model checking, and symbolic state models, and discusses the efficiency and the limitations of each technique in terms of memory and computation time.

Abstract: In this article we present a comprehensive survey of various approaches for the verification of cache coherence protocols based on state enumeration, symbolic model checking, and symbolic state models. Since these techniques search the state space of the protocol exhaustively, the amount of memory required to manipulate that state information and the verification time grow very fast with the number of processors and the complexity of the protocol mechanisms. To be successful for systems of arbitrary complexity, a verification technique must solve this so-called state space explosion problem. The emphasis of our discussion is on the underlying theory in each method of handling the state space explosion problem, and on formulating and checking the safety properties (e.g., data consistency) and the liveness properties (absence of deadlock and livelock). We compare the efficiency and discuss the limitations of each technique in terms of memory and computation time. Also, we discuss issues of generality, applicability, automaticity, and amenability to existing tools in each class of methods. No method is truly superior because each method has its own strengths and weaknesses. Finally, refinements that can further reduce the verification time and/or the memory requirement are also discussed.
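
State enumeration in its simplest form is a breadth-first search over global protocol states that checks a safety property in every reachable state. The sketch below applies it to a hypothetical MSI protocol with atomic transitions (no transient states, messages, or data values), which is far simpler than the protocols the survey discusses. The functions `step`, `safe`, and `explore` are illustrative names, not part of any tool.

```python
from collections import deque


def step(state, i):
    """Yield successor global states when cache i acts (hypothetical atomic MSI)."""
    n = len(state)
    # Read miss: i obtains a shared copy; a modified copy elsewhere is downgraded to S.
    if state[i] == "I":
        yield tuple("S" if (j == i or s == "M") else s for j, s in enumerate(state))
    # Write: i obtains exclusive ownership; every other copy is invalidated.
    if state[i] != "M":
        yield tuple("M" if j == i else "I" for j in range(n))
    # Eviction: i drops its copy.
    if state[i] != "I":
        yield tuple("I" if j == i else s for j, s in enumerate(state))


def safe(state):
    """Safety property: a modified copy excludes every other valid copy."""
    m = state.count("M")
    return m == 0 or (m == 1 and state.count("S") == 0)


def explore(n):
    """Enumerate all reachable global states for n caches, checking safety in each."""
    start = ("I",) * n
    seen, frontier = {start}, deque([start])
    while frontier:
        state = frontier.popleft()
        if not safe(state):
            raise AssertionError(f"safety violated in {state}")
        for i in range(n):
            for nxt in step(state, i):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return len(seen)
```

In this toy model the reachable states are the 2^n combinations of S and I plus the n states with a single M copy, so `explore(n)` returns 2^n + n. The doubling with each added cache is a small instance of the state space explosion problem described above.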

111 citations

Journal Article•DOI•

Effects of cache coherency in multiprocessors

Michel Dubois, Fayé A. Briggs

01 Apr 1982

TL;DR: An in-depth analysis of the effects of cache coherency in multiprocessors is presented and a novel analytical model for the program behavior of a multitasked system is introduced.

Abstract: In many commercial multiprocessor systems, each processor accesses the memory through a private cache. One problem that could limit the extensibility of the system and its performance is the enforcement of cache coherence. A mechanism must exist which prevents the existence of several different copies of the same data block in different private caches. In this paper, we present an in-depth analysis of the effect of cache coherency in multiprocessors. A novel analytical model for the program behavior of a multitasked system is introduced. The model includes the behavior of each process and the interactions between processes with regard to the sharing of data blocks. An approximation is developed to derive the main effects of the cache coherency contributing to degradations in system performance.

109 citations

Cited by

Proceedings Article•DOI•

The SPLASH-2 programs: characterization and methodological considerations

Steven Cameron Woo¹, Moriyoshi Ohara¹, Evan Torrie¹, Jaswinder Pal Singh², Anoop Gupta¹•Institutions (2)

Stanford University¹, Princeton University²

01 May 1995

TL;DR: This paper quantitatively characterizes the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understand them well, including the computational load balance, communication to computation ratio and traffic needs, important working set sizes, and issues related to spatial locality.

Abstract: The SPLASH-2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed shared-address-space multiprocessors. In this context, this paper has two goals. One is to quantitatively characterize the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understand them well. The properties we study include the computational load balance, communication to computation ratio and traffic needs, important working set sizes, and issues related to spatial locality, as well as how these properties scale with problem size and the number of processors. The other, related goal is methodological: to assist people who will use the programs in architectural evaluations to prune the space of application and machine parameters in an informed and meaningful way. For example, by characterizing the working sets of the applications, we describe which operating points in terms of cache size and problem size are representative of realistic situations, which are not, and which are redundant. Using SPLASH-2 as an example, we hope to convey the importance of understanding the interplay of problem size, number of processors, and working sets in designing experiments and interpreting their results.
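
Working sets of this kind are usually read off a curve of misses versus cache size, where each knee marks a working set that has just started to fit. One standard way to obtain such a curve for every size in a single pass is the classic LRU stack-distance algorithm, sketched below for a fully associative LRU cache. This is an illustrative technique, not necessarily the simulation method used for SPLASH-2, and `lru_miss_curve` is a hypothetical helper.

```python
def lru_miss_curve(trace, sizes):
    """Miss counts of a fully associative LRU cache for each size in `sizes`.

    Uses LRU stack distances: a reference hits in a cache of C blocks exactly
    when fewer than C distinct blocks were touched since its previous use.
    """
    stack = []      # LRU stack, most recently used block at the end
    distances = []  # stack distance of each reference; None marks a first touch
    for block in trace:
        if block in stack:
            depth = len(stack) - stack.index(block)  # 1 = most recently used
            distances.append(depth)
            stack.remove(block)
        else:
            distances.append(None)
        stack.append(block)
    # Cold misses (None) miss at every size; others miss when depth exceeds C.
    return {c: sum(1 for d in distances if d is None or d > c) for c in sizes}
```

For a loop over three blocks, `lru_miss_curve([0, 1, 2] * 10, [1, 2, 3, 4])` has a single knee: caches of one or two blocks miss on every reference, while a three-block cache misses only on the first touch of each block.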

4,002 citations

Journal Article•DOI•

Cache Memories

Alan Jay Smith¹•Institutions (1)

University of California, Berkeley¹

01 Sep 1982-ACM Computing Surveys

TL;DR: Specific aspects of cache memories investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size.

Abstract: design issues. Specific aspects of cache memories that are investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size. Our discussion includes other aspects of memory system architecture, including translation lookaside buffers. Throughout the paper, we use as examples the implementation of the cache in the Amdahl 470V/6 and 470V/7, the IBM 3081, 3033, and 370/168, and the DEC VAX 11/780. An extensive bibliography is provided.
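
Several of the design parameters listed above (demand fetch, placement, replacement, line size, cache size) can be seen together in a small trace-driven simulator. The sketch below models a demand-fetch, set-associative cache with LRU replacement within each set; it ignores writes and I/O, and the class `SetAssociativeCache` is a hypothetical illustration, not the cache of any machine named in the survey.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Hypothetical demand-fetch cache with LRU replacement within each set."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set: keys are tags, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = self.misses = 0

    def access(self, addr):
        """Reference byte address `addr`; return True on a hit."""
        line = addr // self.line_bytes
        index = line % self.num_sets   # placement: low-order line bits pick the set
        tag = line // self.num_sets
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)         # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1               # demand fetch on a miss
        if len(s) >= self.ways:
            s.popitem(last=False)      # replacement: evict the least recently used line
        s[tag] = True
        return False
```

Two addresses that map to the same set, such as 0 and 64 in a 64-byte cache with 16-byte lines, evict each other in a direct-mapped configuration (`ways=1`) but coexist in a two-way cache of the same size.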

1,614 citations

Book•

Parallel Computer Architecture: A Hardware/Software Approach

David E. Culler, Anoop Gupta, Jaswinder Pal Singh

15 Aug 1998

TL;DR: This book explains the forces behind this convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures and provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions.

...read moreread less

Abstract: The most exciting development in parallel computer architecture is the convergence of traditionally disparate approaches on a common machine structure. This book explains the forces behind this convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures. It then examines the design issues that are critical to all parallel architecture across the full range of modern design, covering data access, communication performance, coordination of cooperative work, and correct implementation of useful semantics. It not only describes the hardware and software techniques for addressing each of these issues but also explores how these techniques interact in the same system. Examining architecture from an application-driven perspective, it provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions.

* synthesizes a decade of research and development for practicing engineers, graduate students, and researchers in parallel computer architecture, system software, and applications development
* presents in-depth application case studies from computer graphics, computational science and engineering, and data mining to demonstrate sound quantitative evaluation of design trade-offs
* describes the process of programming for performance, including both the architecture-independent and architecture-dependent aspects, with examples and case studies
* illustrates bus-based and network-based parallel systems with case studies of more than a dozen important commercial designs

Table of Contents
1 Introduction
2 Parallel Programs
3 Programming for Performance
4 Workload-Driven Evaluation
5 Shared Memory Multiprocessors
6 Snoop-based Multiprocessor Design
7 Scalable Multiprocessors
8 Directory-based Cache Coherence
9 Hardware-Software Tradeoffs
10 Interconnection Network Design
11 Latency Tolerance
12 Future Directions
Appendix A Parallel Benchmark Suites

1,571 citations

Journal Article•DOI•

A survey of wormhole routing techniques in direct networks

Lionel M. Ni¹, Philip K. McKinley¹•Institutions (1)

Michigan State University¹

01 Feb 1993-IEEE Computer

TL;DR: The properties of direct networks are reviewed, and the operation and characteristics of wormhole routing are discussed in detail, along with a technique that allows multiple virtual channels to share the same physical channel.

Abstract: Several research contributions and commercial ventures related to wormhole routing, a switching technique used in direct networks, are discussed. The properties of direct networks are reviewed, and the operation and characteristics of wormhole routing are discussed in detail. By its nature, wormhole routing is particularly susceptible to deadlock situations, in which two or more packets may block one another indefinitely. Several approaches to deadlock-free routing, along with a technique that allows multiple virtual channels to share the same physical channel, are described. In addition, several open issues related to wormhole routing are discussed.
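
Dimension-order (XY) routing is a classic example of deadlock-free routing in a two-dimensional mesh: a packet first travels along X until its column matches the destination, then along Y. Because no packet ever turns from a Y channel back into an X channel, the channel dependencies cannot form a cycle. The sketch below only computes the hop sequence; it assumes a mesh without wraparound links (in a torus the wraparound channels reintroduce cycles, which is one use of virtual channels), and `xy_route` is a hypothetical name.

```python
def xy_route(src, dst):
    """Return the (x, y) nodes visited under dimension-order routing in a 2D mesh.

    The packet corrects its X coordinate first, then its Y coordinate, and never
    turns from Y back to X; that ordering is what rules out cyclic channel waits.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # route in the X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then route in the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

For example, a packet from (0, 0) to (2, 1) visits (1, 0) and (2, 0) before its single Y hop to (2, 1).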

1,307 citations

Book•

Computer Science: ACM Computing Surveys

Kyoritsu Shuppan Co., Ltd.

01 Jan 1978

1,055 citations
