
Showing papers by "Jean-Luc Gaudiot published in 2002"


Journal ArticleDOI
TL;DR: This paper evaluates the silicon overhead of SMT by performing a transistor/interconnect-level analysis of the layout and shows how the Instruction Set Architecture (ISA) and microarchitecture can have a large effect on the SMT overhead and performance.
Abstract: Simultaneous Multi-Threading (SMT) is a hardware technique that increases processor throughput by issuing instructions simultaneously from multiple threads. However, while SMT can be added to an existing microarchitecture with relatively low overhead, the additional chip area could instead be used for other resources such as more functional units, larger caches, or better branch predictors. How large is the SMT overhead, and at what point does SMT no longer pay off for maximum throughput compared with adding other architectural features? This paper evaluates the silicon overhead of SMT by performing a transistor/interconnect-level analysis of the layout. We discuss microarchitecture issues that impact SMT implementations and show how the Instruction Set Architecture (ISA) and microarchitecture can have a large effect on SMT overhead and performance. Results show that SMT yields large performance gains with small to moderate area overhead.
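
To make the area-versus-throughput question concrete, here is a minimal back-of-envelope sketch; it is not taken from the paper, and every speedup and overhead figure in it is a hypothetical placeholder. It compares the throughput gained per unit of extra silicon area for SMT against spending the same area on another resource:

```python
def throughput_per_area(speedup, area_overhead):
    """Relative throughput gain divided by relative area cost."""
    return (speedup - 1.0) / area_overhead

# Hypothetical design points (name, speedup over baseline, fractional area overhead).
candidates = [
    ("SMT (2 threads)", 1.40, 0.05),   # assumed: 40% more throughput for 5% more area
    ("Larger L2 cache", 1.10, 0.05),   # assumed: same area spent on cache instead
]

for name, speedup, overhead in candidates:
    print(f"{name}: {throughput_per_area(speedup, overhead):.1f} gain per unit area")
```

Under these assumed numbers, SMT returns several times more throughput per unit of area than the alternative, which is the kind of comparison the paper's layout-level analysis makes rigorous.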

60 citations


DOI
01 Jan 2002
TL;DR: A flow-sensitive alias analysis algorithm that computes safe and efficient alias sets in Java is proposed, along with a references-set representation of aliased elements, its type table, and its propagation rules.
Abstract: We propose a flow-sensitive alias analysis algorithm that computes safe and efficient alias sets in Java. To this end, we propose a references-set representation of aliased elements, its type table, and its propagation rules. When building the control flow graph, we model exception constructs by considering try/catch/finally blocks as well as statement nodes that may raise exceptions. Finally, for safe alias computation on the control flow graph, we present a structural-order traversal of each block and node.
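
As a rough illustration of the idea, a references-set state (each set grouping variables that may refer to the same object) can be updated flow-sensitively as statements are traversed. This is a minimal sketch with simplified transfer rules of my own; it is not the paper's actual propagation rules or type table:

```python
def new_object(sets, var):
    """Transfer for 'var = new ...': var now refers to a fresh object alone."""
    sets = [s - {var} for s in sets]           # kill the old binding of var
    return [s for s in sets if s] + [{var}]

def assign(sets, lhs, rhs):
    """Transfer for 'lhs = rhs': lhs leaves its old set and joins rhs's set."""
    sets = [s - {lhs} for s in sets]           # kill
    sets = [s for s in sets if s]
    for s in sets:                             # gen: merge lhs into rhs's set
        if rhs in s:
            s.add(lhs)
            return sets
    return sets + [{lhs, rhs}]

state = new_object([], "a")        # a = new A()
state = assign(state, "b", "a")    # b = a       -> a and b may alias
state = new_object(state, "b")     # b = new A() -> the alias is killed
print(state)                       # [{'a'}, {'b'}]
```

Flow-sensitivity is what lets the final statement kill the earlier alias; a flow-insensitive analysis would conservatively keep a and b aliased.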

12 citations


Book ChapterDOI
TL;DR: A methodology for parallel programming, along with MPI performance measurement and prediction in a class of distributed computing environments, namely networks of workstations, is presented, based on a two-level model in which analytical models represent the execution behavior of parallel communications and code segments.
Abstract: We present a methodology for parallel programming, along with MPI performance measurement and prediction, in a class of distributed computing environments, namely networks of workstations. Our approach is based on a two-level model: at the top level, a new parallel version of the timing-graph representation makes explicit the parallel communications and code segments of a given parallel program, while at the bottom level, analytical models represent the execution behavior of those communications and code segments. Measured execution times, together with the problem size and the number of nodes, are input to the model, which allows us to predict the performance of similar cluster computing systems with a different number of nodes. The analytical model is validated through experiments on a homogeneous cluster of workstations. Final results show that our approach produces accurate predictions, within 5% of actual results.
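
As a sketch of the bottom-level idea, an analytical expression for a code segment plus its communication can be evaluated at a different node count. The model form and every coefficient below are hypothetical, not the paper's; in practice the coefficients would be calibrated from measured runs:

```python
def predict_time(n, p, t_comp, t_latency, t_byte):
    """T(n, p): evenly divided computation plus a gather-style exchange.
    n: problem size, p: number of nodes (all coefficients assumed)."""
    compute = t_comp * n / p
    communicate = (p - 1) * (t_latency + t_byte * n / p)
    return compute + communicate

for p in (4, 8, 16):
    t = predict_time(n=1_000_000, p=p, t_comp=4.0e-5, t_latency=1e-3, t_byte=1e-7)
    print(f"p={p:2d}: predicted {t:.2f} s")
```

Once calibrated on one cluster size, such a closed form is what lets the methodology extrapolate to a different number of nodes.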

10 citations


Proceedings ArticleDOI
16 Jun 2002
TL;DR: This work presents an experimental evaluation of thread migration's ability to reduce the impact of remote array accesses across distributed-memory computers and compares the alternatives using various array access patterns.
Abstract: Thread migration is one approach to remote memory access on distributed-memory parallel computers. In thread migration, threads of control migrate between processors to access data local to those processors, whereas conventional approaches move data to the threads that need it. Migration approaches enhance spatial locality by making large address spaces local, but are less adept at exploiting temporal locality. Data-moving approaches, such as cached remote memory fetches or distributed shared memory, can exploit both types of locality. We present an experimental evaluation of thread migration's ability to reduce the impact of remote array accesses across distributed-memory computers. Nomadic Threads uses compiler-generated fine-grain threads that either migrate to make data local or fetch cache lines, tolerating latency through multithreading. We compare these alternatives using various array access patterns.
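
A back-of-envelope cost model, assumed here purely for illustration and not drawn from Nomadic Threads itself, shows why migration can win when many consecutive accesses target the same remote node: one message moves the thread, versus one fetch per cache line touched. All byte counts and network parameters are hypothetical:

```python
import math

def migration_cost(state_bytes, t_latency, t_byte):
    """One message carrying the thread's continuation to the data's node."""
    return t_latency + t_byte * state_bytes

def fetch_cost(k_accesses, line_elems, elem_bytes, t_latency, t_byte):
    """One remote fetch per cache line touched."""
    lines = math.ceil(k_accesses / line_elems)
    return lines * (t_latency + t_byte * line_elems * elem_bytes)

net = dict(t_latency=5e-6, t_byte=1e-9)        # assumed network parameters
print("migrate once:", migration_cost(256, **net))
print("fetch lines: ", fetch_cost(1024, line_elems=16, elem_bytes=8, **net))
```

With these assumed numbers a single migration is far cheaper than fetching 64 cache lines, while for a few scattered accesses the inequality reverses, which is why the access pattern drives the comparison.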

9 citations


Journal ArticleDOI
TL;DR: Improved load balance clearly leads to improved execution time, and load balancing on computers with heterogeneous processing capacities is more challenging than in the homogeneous case.
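
As a minimal illustration of the heterogeneous case (the capacities and task count below are hypothetical), work can be assigned in proportion to each node's processing capacity so that all nodes finish at roughly the same time:

```python
def balance(total_tasks, capacities):
    """Split tasks in proportion to relative node speeds."""
    total = sum(capacities)
    shares = [round(total_tasks * c / total) for c in capacities]
    shares[-1] += total_tasks - sum(shares)    # absorb any rounding remainder
    return shares

print(balance(1000, [1.0, 1.0, 2.0, 4.0]))     # -> [125, 125, 250, 500]
```

An equal split of 250 tasks per node would leave the fastest node idle while the slowest ones lag, which is the imbalance the heterogeneous case must avoid.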

6 citations