Author

L. Piuel

Bio: L. Piuel is an academic researcher at the Complutense University of Madrid. The author has contributed to research on memory management and out-of-order execution. The author has an h-index of 1 and has co-authored 1 publication, which has received 4 citations.

Papers

Journal Article

Replacing Associative Load Queues: A Timing-Centric Approach

Fernando Castro¹, R. Noor², Alok Garg², Daniel Chaver¹, Michael C. Huang², L. Piuel¹, Manuel Prieto¹, Francisco Tirado¹

Complutense University of Madrid¹, University of Rochester²

01 Apr 2009 - IEEE Transactions on Computers

TL;DR: This paper introduces two new dependence checking schemes with different design tradeoffs, but both explicitly rely on timing information as a primary instrument to rule out dependence violation.

Abstract: One of the main challenges of modern processor design is the implementation of a scalable and efficient mechanism to detect memory access order violations as a result of out-of-order execution. Traditional age-ordered associative load queues are complex, inefficient, and power hungry. In this paper, we introduce two new dependence checking schemes with different design tradeoffs, but both explicitly rely on timing information as a primary instrument to rule out dependence violation. Our timing-centric designs operate at a fraction of the energy cost of an associative LQ and achieve the same functionality with an insignificant performance impact on average. Studies with parallel benchmarks also show that they are equally effective and efficient in a chip-multiprocessor environment.

4 citations

Cited by

Journal Article

Memory Disambiguation Hardware: a Review

Fernando Castro, Daniel Chaver, Luis Piñuel, Manuel Prieto, Francisco Tirado Fernández

01 Oct 2008 - Journal of Computer Science and Technology

TL;DR: The most significant proposals in this research field are reviewed, focusing on the authors' own contributions to optimizing address-based memory disambiguation logic, namely the load-store queue.

Abstract: One of the main challenges of modern processor designs is the implementation of scalable and efficient mechanisms to detect memory access order violations as a result of out-of-order execution. Conventional structures performing this task are complex, inefficient and power-hungry. This fact has generated a large body of work on optimizing address-based memory disambiguation logic, namely the load-store queue. In this paper we review the most significant proposals in this research field, focusing on our own contributions.

11 citations

Journal Article

Hybrid timing-address oriented load-store queue filtering for an x86 architecture

Rubén Apolloni¹, Daniel Chaver², Fernando Castro², Luis Piñuel², Manuel Prieto², Francisco Tirado²

National University of San Luis¹, Complutense University of Madrid²

10 Mar 2011 - IET Computers and Digital Techniques

TL;DR: A straightforward filtering mechanism is introduced that yields a more energy-efficient design than past techniques while using less and simpler hardware. Testing it on the x86 architecture provides new opportunities for extra types of filtering, which lead to higher energy savings.

Abstract: In the last few years, many researchers have focused their efforts on the field of low-power processor design. Several works in this area have dealt with the logic that enforces correct memory-based dependences, the load-store queue (LSQ), a highly energy-consuming structure since many accesses are performed in an associative fashion. Some of these proposals manage to reduce this resource's energy consumption by avoiding unnecessary lookups. In this context, the authors introduce a straightforward filtering mechanism, which results in a more energy-efficient design than past techniques, using less and simpler hardware. In addition, both the new scheme and some previous approaches are tested in the widespread x86 architecture. This microarchitectural model provides new opportunities for extra types of filtering, which lead to higher energy savings. On average, the authors' proposal filters up to 75% of the associative accesses to the load queue, 56% to the store queue and 42% to the dependence predictor with a reduced amount of hardware (less than 100 bytes). According to their energy model, this means a dynamic energy saving of more than 39% over a conventional LSQ.

5 citations

Patent

Data processing system and method for the

サートリ・フィリップ, デュボイ・ニコラス, バーグ・バーナード・ジョセフ, ベナマー・アブデルクリム, ロビンソン・ウィリアム・ニール

15 Apr 1999

TL;DR: The aim is to provide a communication network with agent capabilities so that, for example, subscribers can learn about traffic congestion or identify the most appropriate local communication network.

Abstract: The aim is, for example, to provide a communication network with agent capabilities so that subscribers can learn about traffic congestion or identify the most appropriate local communication network. A data processing system comprises a terminal (122) and a communication network (100). The terminal (122) can communicate with the communication network (100). The network (100) comprises a host platform (124) for receiving an agent (130) associated with the terminal (122), and the agent (130) is configured to communicate with at least one other agent on the platform (124).

2 citations

Exploring Performance-Correctness Explicitly-Decoupled Architectures

Alok Garg

01 Jan 2011

TL;DR: This thesis shows that an explicitly decoupled design, which separates performance goals from the correctness goal, allows significant optimization benefits and is much less sensitive to conservatism applied in the correctness domain.

Abstract: Optimizing the common case has been an adage in decades of processor design practice. However, as system complexity and the sophistication of optimization techniques have increased substantially, maintaining correctness under all situations, however unlikely, is contributing to the necessity of extra conservatism in all layers of the system design. The mounting process, voltage, and temperature variation concerns further add to the conservatism in setting operating parameters. Excessive conservatism in turn hurts performance and efficiency in the common case. However, much of the system’s complexity comes from advanced performance features and may not compromise the whole system’s functionality and correctness even if some components are imperfect and introduce occasional errors. In this thesis, we propose to separate performance goals from the correctness goal using an explicitly-decoupled architecture. As a proof-of-concept, we discuss two such incarnations for an out-of-order microprocessor. First, we discuss how an explicitly-decoupled architecture can be used to implement an efficient mechanism to track and enforce memory dependences. Later, we discuss enhancements to improve traditional ILP (instruction-level parallelism). In both designs, a decoupled performance enhancement engine performs optimistic execution and helps an independent correctness engine by passing high-quality predictions. The lack of concern for correctness in the performance domain allows us to optimize its execution more effectively than is possible when optimizing a monolithic design with correctness requirements. In this thesis we show that such a decoupled design allows significant optimization benefits and is much less sensitive to conservatism applied in the correctness domain.

1 citation