Author

Daniel Chaver

Bio: Daniel Chaver is an academic researcher from Complutense University of Madrid. The author has contributed to research in topics: Memory hierarchy & Cache. The author has an h-index of 12 and has co-authored 43 publications receiving 451 citations.

Papers
Proceedings ArticleDOI
25 Aug 2003
TL;DR: This work proposes a methodology to reduce the energy consumption of the branch predictor by characterizing prediction demand through profiling and dynamically adjusting predictor resources accordingly: components of the hybrid direction predictor are disabled and the branch target buffer is resized.
Abstract: High-end processors typically incorporate complex branch predictors consisting of many large structures that together consume a notable fraction of total chip power (more than 10% in some cases). Depending on the applications, some of these resources may remain underused for long periods of time. We propose a methodology to reduce the energy consumption of the branch predictor by characterizing prediction demand using profiling and dynamically adjusting predictor resources accordingly. Specifically, we disable components of the hybrid direction predictor and resize the branch target buffer. Detailed simulations show that this approach reduces the energy consumption in the branch predictor by an average of 72% and up to 89% with virtually no impact on prediction accuracy and performance.
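As a rough illustration of the general idea (a minimal C sketch, not the paper's actual mechanism; the profile structure, field names and thresholds are all assumptions), a profiled demand summary could drive which hybrid components remain enabled and how many BTB entries are provisioned:

/* Minimal sketch: choosing a branch predictor configuration from profiled
 * demand.  All structures, field names and thresholds are illustrative
 * assumptions, not the paper's actual algorithm. */
#include <stdio.h>

struct profile_stats {              /* hypothetical per-application profile */
    double local_pred_usefulness;   /* fraction of predictions provided by the local component */
    double global_pred_usefulness;  /* fraction provided by the global component */
    int    distinct_branch_targets; /* how many BTB entries the application really needs */
};

struct predictor_config {
    int enable_local;    /* power-gate the local component when 0 */
    int enable_global;   /* power-gate the global component when 0 */
    int btb_entries;     /* resized branch target buffer capacity */
};

static struct predictor_config tune_predictor(const struct profile_stats *p)
{
    struct predictor_config c;
    /* Disable a hybrid component whose contribution is marginal. */
    c.enable_local  = p->local_pred_usefulness  > 0.05;
    c.enable_global = p->global_pred_usefulness > 0.05;

    /* Resize the BTB to the smallest power of two covering the demand. */
    c.btb_entries = 256;
    while (c.btb_entries < p->distinct_branch_targets && c.btb_entries < 4096)
        c.btb_entries *= 2;
    return c;
}

int main(void)
{
    struct profile_stats p = { 0.03, 0.61, 900 };   /* example profile */
    struct predictor_config c = tune_predictor(&p);
    printf("local=%d global=%d btb=%d\n",
           c.enable_local, c.enable_global, c.btb_entries);
    return 0;
}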

47 citations

Journal Article
TL;DR: In this work, locality has been significantly improved by means of a novel approach called pipelined computation, which complements previous techniques based on loop tiling and non-linear layouts.
Abstract: This paper addresses the implementation of a 2-D Discrete Wavelet Transform on general-purpose microprocessors, focusing on both memory hierarchy and SIMD parallelization issues. The two topics are related, since SIMD extensions are only useful if the memory hierarchy is efficiently exploited. In this work, locality has been significantly improved by means of a novel approach called pipelined computation, which complements previous techniques based on loop tiling and non-linear layouts. As experimental platforms we have employed a Pentium-III (P-III) and a Pentium-4 (P-4) microprocessor. However, our SIMD-oriented tuning has been performed exclusively at the source-code level: we have reordered some loops and introduced modifications that allow automatic vectorization. Given the abstraction level at which the optimizations are carried out, the speedups obtained on the investigated platforms are quite satisfactory, although further improvement could be obtained by dropping to a lower level of abstraction (compiler intrinsics or assembly code).
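To make this kind of source-level transformation concrete, the following toy C sketch (an assumption-laden illustration, not the paper's code) writes a horizontal 5/3 lifting pass with unit-stride inner loops, the style of code that compilers can vectorize automatically:

/* Illustrative sketch only: a horizontal 5/3 lifting pass written with
 * unit-stride inner loops so the compiler can auto-vectorize it.
 * Array layout and boundary handling are simplified assumptions. */
#include <stdio.h>

#define W 16   /* row width (even); toy size for the example */

static void lifting_53_row(float *row)
{
    float even[W / 2], odd[W / 2];
    int i;

    /* Split into even/odd samples (unit-stride reads). */
    for (i = 0; i < W / 2; i++) {
        even[i] = row[2 * i];
        odd[i]  = row[2 * i + 1];
    }
    /* Predict step: high-pass coefficients. */
    for (i = 0; i < W / 2 - 1; i++)
        odd[i] -= 0.5f * (even[i] + even[i + 1]);
    odd[W / 2 - 1] -= even[W / 2 - 1];          /* symmetric border (simplified) */

    /* Update step: low-pass coefficients. */
    even[0] += 0.5f * odd[0];                    /* symmetric border */
    for (i = 1; i < W / 2; i++)
        even[i] += 0.25f * (odd[i - 1] + odd[i]);

    /* Store sub-bands contiguously: low-pass first, then high-pass. */
    for (i = 0; i < W / 2; i++) {
        row[i]         = even[i];
        row[W / 2 + i] = odd[i];
    }
}

int main(void)
{
    float row[W];
    for (int i = 0; i < W; i++) row[i] = (float)i;
    lifting_53_row(row);
    for (int i = 0; i < W; i++) printf("%.2f ", row[i]);
    printf("\n");
    return 0;
}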

36 citations

Proceedings ArticleDOI
18 Mar 2013
TL;DR: This work presents a behavior analysis of conventional cache replacement policies in terms of the number of writes to main memory, and proposes new last-level cache (LLC) replacement algorithms aimed at reducing the number of writes to PCM and hence increasing its lifetime, without significantly degrading system performance.
Abstract: Phase Change Memory (PCM) is currently postulated as the best alternative for replacing Dynamic Random Access Memory (DRAM) as the technology used for implementing main memories, thanks to significant advantages such as good scalability and low leakage. However, PCM also presents some drawbacks compared to DRAM, such as its lower endurance. This work presents a behavior analysis of conventional cache replacement policies in terms of the number of writes to main memory. In addition, new last-level cache (LLC) replacement algorithms are proposed, aimed at reducing the number of writes to PCM and hence increasing its lifetime, without significantly degrading system performance.
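The following C sketch illustrates one simple write-aware heuristic in the same spirit (not necessarily the algorithms proposed in the paper; the data structures are assumptions): among the lines of a set, prefer evicting the least-recently-used clean line, so fewer dirty evictions, and hence fewer PCM writes, occur:

/* Sketch of a write-aware victim selection heuristic.  A dirty eviction
 * forces a write-back to PCM, so clean victims are preferred and LRU is
 * only the fallback.  All data structures are illustrative assumptions. */
#include <stdio.h>

#define WAYS 8

struct cache_line {
    int valid;
    int dirty;      /* dirty lines cause a write to PCM on eviction */
    unsigned lru;   /* higher value = more recently used */
};

/* Return the way to evict from one set. */
static int choose_victim(const struct cache_line set[WAYS])
{
    int victim = -1, clean_victim = -1;
    unsigned best = ~0u, best_clean = ~0u;

    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid)
            return w;                       /* free way: no eviction needed */
        if (set[w].lru < best) { best = set[w].lru; victim = w; }
        if (!set[w].dirty && set[w].lru < best_clean) {
            best_clean = set[w].lru;
            clean_victim = w;
        }
    }
    /* Prefer a clean victim when one exists; otherwise fall back to LRU. */
    return clean_victim >= 0 ? clean_victim : victim;
}

int main(void)
{
    struct cache_line set[WAYS] = {
        {1, 1, 3}, {1, 0, 7}, {1, 1, 1}, {1, 0, 5},
        {1, 1, 6}, {1, 1, 2}, {1, 1, 4}, {1, 1, 0},
    };
    /* Prints way 3: the LRU clean line wins over the globally-LRU dirty line. */
    printf("evict way %d\n", choose_victim(set));
    return 0;
}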

33 citations

Proceedings ArticleDOI
15 Apr 2002
TL;DR: This paper discusses several issues relevant to the parallel implementation of a 2-D discrete wavelet transform (DWT) on general-purpose multiprocessors, paying particular attention to memory hierarchy exploitation.
Abstract: In this paper we discuss several issues relevant to the parallel implementation of a 2-D discrete wavelet transform (DWT) on general-purpose multiprocessors. Our interest in this transform is motivated by its use in an image fusion application that has to manage large image sizes, which makes parallel computing highly advisable. We have also paid particular attention to memory hierarchy exploitation, since it has a tremendous impact on performance due to the lack of spatial locality when the DWT processes image columns.
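As an assumption-based illustration of both points (not the paper's code), the sketch below parallelizes a vertical filtering pass over rows and keeps the innermost loop walking along a row, so the column-direction computation still enjoys unit-stride accesses:

/* Illustrative sketch: a vertical (column-direction) filtering pass
 * parallelized over rows, with the innermost loop traversing a row.
 * Consecutive accesses are then unit-stride, recovering the spatial
 * locality that a naive column-by-column traversal loses.  The 3-tap
 * filter and the image size are toy assumptions. */
#include <stdio.h>

#define ROWS 64
#define COLS 64

static float in[ROWS][COLS], out[ROWS][COLS];

static void vertical_filter(void)
{
    /* Rows can be split among processors; compile with -fopenmp to enable. */
    #pragma omp parallel for
    for (int r = 1; r < ROWS - 1; r++)
        for (int c = 0; c < COLS; c++)   /* unit stride: full cache-line reuse */
            out[r][c] = 0.25f * in[r - 1][c]
                      + 0.50f * in[r][c]
                      + 0.25f * in[r + 1][c];
}

int main(void)
{
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            in[r][c] = (float)r;
    vertical_filter();
    printf("out[10][5] = %.2f\n", out[10][5]);   /* 0.25*9 + 0.5*10 + 0.25*11 = 10 */
    return 0;
}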

30 citations

Proceedings ArticleDOI
22 Apr 2003
TL;DR: This paper addresses the vectorization of the lifting-based wavelet transform on general-purpose microprocessors in the context of JPEG2000, avoiding assembly-language programming in order to improve code portability and reduce development cost.
Abstract: This paper addresses the vectorization of the lifting-based wavelet transform on general-purpose microprocessors in the context of JPEG2000. Since SIMD exploitation strongly depends on efficient memory hierarchy usage, this research builds on previous work on cache-conscious DWT implementations. The experimental platform chosen to study the benefits of the SIMD extensions is an Intel Pentium-4 (P-4) based PC. However, unlike other authors, we have performed the vectorization while avoiding assembly-language programming, in order to improve code portability and reduce development cost.
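A hedged C sketch of the style of code this implies (not the paper's implementation; sizes and border handling are toy assumptions): a vertical 5/3 predict step whose innermost loop runs across columns with unit stride, the kind of loop an auto-vectorizer can map to SSE without intrinsics or assembly:

/* Sketch of a vertical lifting predict step applied to several columns at
 * once.  Keeping the column index in the innermost loop gives unit-stride
 * accesses that compilers can turn into SIMD code automatically, which is
 * the kind of source-level vectorization described above.  Not the
 * paper's code; sizes and borders are simplified assumptions. */
#include <stdio.h>

#define ROWS 8           /* must be even for this toy example */
#define COLS 8

/* One 5/3 predict step along columns: odd rows become high-pass values. */
static void vertical_predict(float img[ROWS][COLS])
{
    for (int r = 1; r < ROWS - 1; r += 2)
        for (int c = 0; c < COLS; c++)          /* unit stride: vectorizable */
            img[r][c] -= 0.5f * (img[r - 1][c] + img[r + 1][c]);
    /* Symmetric border for the last odd row. */
    for (int c = 0; c < COLS; c++)
        img[ROWS - 1][c] -= img[ROWS - 2][c];
}

int main(void)
{
    float img[ROWS][COLS];
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            img[r][c] = (float)r;
    vertical_predict(img);
    printf("img[3][0] = %.2f\n", img[3][0]);    /* 3 - 0.5*(2+4) = 0 */
    return 0;
}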

30 citations


Cited by
01 Jan 2016
Digital Design and Computer Architecture

246 citations

Patent
Navendu Jain, Ishai Menache
27 Jun 2011
TL;DR: A system is proposed for managing the allocation of resources based on service level agreements between application owners and cloud operators; under some agreements, the cloud operator may have responsibility for managing resource allocation to the software application and may manage it such that the application executes within an agreed performance level.
Abstract: A system for managing allocation of resources based on service level agreements between application owners and cloud operators. Under some service level agreements, the cloud operator may have responsibility for managing the allocation of resources to the software application and may manage the allocation such that the software application executes within an agreed performance level. Operating a cloud computing platform according to such a service level agreement may relieve application owners of the complexities of managing resource allocation and may give cloud operators greater flexibility in managing their cloud computing platforms.

199 citations

Journal ArticleDOI
TL;DR: The aim of this survey is to enable engineers and researchers to get insights into the techniques for improving cache power efficiency and motivate them to invent novel solutions for enabling low-power operation of caches.

125 citations

Journal ArticleDOI
TL;DR: This study indicates that FBS outperforms LS in current-generation GPUs, and design trends suggest higher gains in future-generation GPUs.
Abstract: The widespread usage of the discrete wavelet transform (DWT) has motivated the development of fast DWT algorithms and their tuning on all sorts of computer systems. Several studies have compared the performance of the most popular schemes, known as filter bank scheme (FBS) and lifting scheme (LS), and have always concluded that LS is the most efficient option. However, there is no such study on streaming processors such as modern Graphics Processing Units (GPUs). Current trends have transformed these devices into powerful stream processors with enough flexibility to perform intensive and complex floating-point calculations. The opportunities opened up by these platforms, as well as the growing popularity of the DWT within the computer graphics field, make a new performance comparison of great practical interest. Our study indicates that FBS outperforms LS in current-generation GPUs. In our experiments, the actual FBS gains range between 10 percent and 140 percent, depending on the problem size and the type and length of the wavelet filter. Moreover, design trends suggest higher gains in future-generation GPUs.
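A toy 1-D C sketch of the structural difference between the two schemes (illustrative filter values, not the paper's GPU kernels): FBS computes every output independently from a read-only input, which maps naturally onto stream processors, while LS needs fewer operations but updates the signal in place, creating dependences between steps:

/* Toy 1-D contrast between the filter bank scheme (FBS) and the lifting
 * scheme (LS).  Filter values and sizes are illustrative assumptions. */
#include <stdio.h>

#define N 16

/* Filter bank scheme: low-pass outputs by explicit convolution.
 * Every output depends only on the read-only input array. */
static void fbs_lowpass(const float *in, float *out)
{
    const float h[3] = { 0.25f, 0.5f, 0.25f };   /* toy analysis filter */
    for (int i = 2; i < N - 1; i += 2)           /* each output independent */
        out[i / 2] = h[0] * in[i - 1] + h[1] * in[i] + h[2] * in[i + 1];
}

/* Lifting scheme: predict then update, in place (sequential dependences). */
static void ls_lowpass(float *x)
{
    for (int i = 1; i < N - 1; i += 2)           /* predict (high-pass) */
        x[i] -= 0.5f * (x[i - 1] + x[i + 1]);
    for (int i = 2; i < N; i += 2)               /* update (low-pass) */
        x[i] += 0.25f * (x[i - 1] + x[i + 1]);
}

int main(void)
{
    float a[N], b[N], low[N / 2] = {0};
    for (int i = 0; i < N; i++) a[i] = b[i] = (float)i;
    fbs_lowpass(a, low);
    ls_lowpass(b);
    printf("FBS low[2]=%.2f  LS x[4]=%.2f\n", low[2], b[4]);
    return 0;
}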

116 citations

Patent
Navendu Jain
13 May 2010
TL;DR: An optimization framework for hosting sites is proposed that dynamically places application instances across multiple sites based on factors such as the energy cost and availability of energy at each site, application SLAs (service level agreements), and the cost of network bandwidth between sites.
Abstract: An optimization framework for hosting sites is described that dynamically places application instances across multiple hosting sites based on factors such as the energy cost and availability of energy at these sites, application SLAs (service level agreements), and the cost of network bandwidth between sites. The framework leverages a global network of hosting sites, possibly co-located with renewable and non-renewable energy sources, to dynamically determine the datacenter (site) best suited to place application instances to handle incoming workload at a given point in time. Application instances can be moved between datacenters subject to energy availability and dynamic power pricing, which can vary, for example, hourly in day-ahead markets and on a scale of minutes in real-time markets.
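A rough C sketch of the kind of placement decision described (the cost model, fields and numbers are purely illustrative assumptions, not the patent's method): pick the hosting site with the lowest combined energy-plus-bandwidth cost among those that can still meet the application's latency SLA:

/* Sketch of an energy- and SLA-aware site selection.  The struct fields,
 * cost model and example values are illustrative assumptions. */
#include <stdio.h>

struct site {
    const char *name;
    double energy_price;     /* price per kWh at this hour */
    double energy_available; /* kWh available (renewable + grid) */
    double bandwidth_cost;   /* cost to route the workload's traffic here */
    double latency_ms;       /* expected latency to the workload's users */
};

static int place_instance(const struct site *s, int n,
                          double energy_needed, double sla_latency_ms)
{
    int best = -1;
    double best_cost = 0.0;
    for (int i = 0; i < n; i++) {
        if (s[i].energy_available < energy_needed) continue;  /* no capacity */
        if (s[i].latency_ms > sla_latency_ms) continue;       /* violates SLA */
        double cost = s[i].energy_price * energy_needed + s[i].bandwidth_cost;
        if (best < 0 || cost < best_cost) { best = i; best_cost = cost; }
    }
    return best;   /* -1 if no site qualifies */
}

int main(void)
{
    struct site sites[] = {
        { "dc-west",  0.08, 500.0, 12.0, 40.0 },
        { "dc-east",  0.05, 300.0, 20.0, 55.0 },
        { "dc-north", 0.03,  50.0,  5.0, 70.0 },
    };
    int i = place_instance(sites, 3, 120.0, 60.0);
    printf("place on %s\n", i >= 0 ? sites[i].name : "none");
    return 0;
}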

90 citations