Author

L. Piuel

Bio: L. Piuel is an academic researcher from Complutense University of Madrid. The author has contributed to research in topics: Memory management & Out-of-order execution. The author has an h-index of 1 and has co-authored 1 publication receiving 4 citations.

Papers
Journal Article
TL;DR: This paper introduces two new dependence checking schemes with different design tradeoffs, but both explicitly rely on timing information as a primary instrument to rule out dependence violation.
Abstract: One of the main challenges of modern processor design is the implementation of a scalable and efficient mechanism to detect memory access order violations as a result of out-of-order execution. Traditional age-ordered associative load queues are complex, inefficient, and power hungry. In this paper, we introduce two new dependence checking schemes with different design tradeoffs, but both explicitly rely on timing information as a primary instrument to rule out dependence violation. Our timing-centric designs operate at a fraction of the energy cost of an associative LQ and achieve the same functionality with an insignificant performance impact on average. Studies with parallel benchmarks also show that they are equally effective and efficient in a chip-multiprocessor environment.

4 citations
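
As background for the abstract above, the check that a conventional age-ordered associative load queue performs can be sketched as follows. This is a minimal illustration of the baseline structure only, not of the paper's timing-based schemes; the names `LoadEntry` and `find_violations` are hypothetical, and the sketch assumes whole-word accesses with no partial overlap.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoadEntry:
    age: int                       # program-order sequence number (smaller = older)
    address: Optional[int] = None  # None until the load has executed


def find_violations(load_queue: List[LoadEntry],
                    store_age: int,
                    store_address: int) -> List[LoadEntry]:
    """Associative search performed when a store resolves its address.

    A load violates memory order if it is younger than the store, has
    already executed, and read the address the store is about to write.
    Simplification (assumption): exact address match, no access sizes.
    """
    return [
        ld for ld in load_queue
        if ld.age > store_age
        and ld.address is not None
        and ld.address == store_address
    ]


# Hypothetical queue contents: loads with ages 1, 5, 7 and 9.
queue = [
    LoadEntry(age=1, address=0x100),  # older than the store: safe
    LoadEntry(age=5, address=0x100),  # younger, same address: violation
    LoadEntry(age=7, address=0x200),  # younger, different address: safe
    LoadEntry(age=9),                 # younger, not yet executed: safe
]
violating = find_violations(queue, store_age=3, store_address=0x100)
```

Every resolving store compares against every queue entry, which is the source of the complexity and energy cost the abstract attributes to associative load queues.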


Cited by
Journal Article
TL;DR: The most significant proposals on optimizing address-based memory disambiguation logic, namely the load-store queue, are reviewed, with a focus on the authors' own contributions.
Abstract: One of the main challenges of modern processor designs is the implementation of scalable and efficient mechanisms to detect memory access order violations as a result of out-of-order execution. Conventional structures performing this task are complex, inefficient and power-hungry. This fact has generated a large body of work on optimizing address-based memory disambiguation logic, namely the load-store queue. In this paper we review the most significant proposals in this research field, focusing on our own contributions.

11 citations

Journal Article
TL;DR: A straightforward filtering mechanism is introduced, which results in a more energy-efficient design than past techniques, using less and simpler hardware, and provides new opportunities for extra types of filtering, which lead to higher energy savings.
Abstract: In the last few years, many researchers have focused their efforts on low-power processor design. Several works in this area have dealt with the logic that enforces correct memory-based dependences, the load-store queue (LSQ), a rather energy-hungry structure since many of its accesses are performed associatively. Some of these proposals reduce the structure's energy consumption by avoiding unnecessary lookups. In this context, the authors introduce a straightforward filtering mechanism, which results in a more energy-efficient design than past techniques, using less and simpler hardware. In addition, both the new scheme and some previous approaches are evaluated on the widespread x86 architecture. This microarchitectural model provides new opportunities for extra types of filtering, which lead to higher energy savings. On average, the authors' proposal filters up to 75% of the associative accesses to the load queue, 56% of those to the store queue and 42% of those to the dependence predictor with a small amount of hardware (less than 100 bytes). According to their energy model, this means a dynamic energy saving of more than 39% over a conventional LSQ.

5 citations

Patent
15 Apr 1999
TL;DR: A communication network with agent capabilities is provided so that, for example, subscribers can learn of traffic congestion or identify the most appropriate local communication network.
Abstract: The aim is to provide a communication network with agent capabilities so that, for example, subscribers can learn of traffic congestion or identify the most appropriate local communication network. A data processing system comprises a terminal (122) and a communication network (100). The terminal (122) can communicate with the communication network (100), and the network (100) comprises a host platform (124) for receiving an agent (130) associated with the terminal (122), wherein the agent (130) is configured to communicate with at least one other agent in the platform (124).

2 citations

01 Jan 2011
TL;DR: This thesis shows that an explicitly-decoupled design, which separates performance goals from the correctness goal, allows significant optimization benefits and is much less sensitive to conservatism applied in the correctness domain.
Abstract: Optimizing the common case has been an adage in decades of processor design practices. However, as system complexity and the sophistication of optimization techniques have increased substantially, maintaining correctness under all situations, however unlikely, is contributing to the necessity of extra conservatism in all layers of the system design. The mounting process, voltage, and temperature variation concerns further add to the conservatism in setting operating parameters. Excessive conservatism in turn hurts performance and efficiency in the common case. However, much of the system’s complexity comes from advanced performance features and may not compromise the whole system’s functionality and correctness even if some components are imperfect and introduce occasional errors. In this thesis, we propose to separate performance goals from the correctness goal using an explicitly-decoupled architecture. As a proof-of-concept, we discuss two such incarnations for an out-of-order microprocessor. First, we discuss how explicitly-decoupled architecture can be used to implement an efficient mechanism to track and enforce memory dependences. Later, we discuss enhancements to improve traditional ILP (instruction-level parallelism). In both designs, a decoupled performance enhancement engine performs optimistic execution and helps an independent correctness engine by passing high-quality predictions. The lack of concern for correctness in the performance domain allows us to optimize its execution more effectively than is possible when optimizing a monolithic design with correctness requirements. In this thesis we show that such a decoupled design allows significant optimization benefits and is much less sensitive to conservatism applied in the correctness domain.

1 citation