Author

Julio Sahuquillo

Bio: Julio Sahuquillo is an academic researcher from the Polytechnic University of Valencia. The author has contributed to research on the topics of caches and cache algorithms. The author has an h-index of 21 and has co-authored 171 publications receiving 1743 citations. Previous affiliations of Julio Sahuquillo include the Polytechnic University of Catalonia and the University of Valencia.


Papers
Proceedings Article
19 Nov 2007
TL;DR: The Multi2Sim simulation framework is presented, which models the major components of upcoming systems and is intended to cover the limitations of existing simulators.
Abstract: Current microprocessors are based on complex designs that integrate different components on a single chip, such as hardware threads, processor cores, memory hierarchy, or interconnection networks. The constant need to evaluate new designs for each of these components motivates the development of tools that simulate the system working as a whole. In this paper, we present the Multi2Sim simulation framework, which models the major components of upcoming systems and is intended to cover the limitations of existing simulators. A set of simulation examples is also included for illustrative purposes.

164 citations

Proceedings Article
22 Sep 2002
TL;DR: This work investigates the design of on-chip interconnection networks for clustered microarchitectures, proposes point-to-point interconnects together with an effective latency-aware instruction steering scheme, and shows that they achieve much better performance than bus-based interconnects.
Abstract: Clustering is an effective microarchitectural technique for reducing the impact of wire delays, the complexity, and the power requirements of microprocessors. In this work, we investigate the design of on-chip interconnection networks for clustered microarchitectures. This new class of interconnects has different demands and characteristics than traditional multiprocessor networks. In a clustered microarchitecture, a low inter-cluster communication latency is essential for high performance. We propose point-to-point interconnects together with an effective latency-aware instruction steering scheme and show that they achieve much better performance than bus-based interconnects. The results show that the connectivity of the network together with latency-aware steering schemes are key for high performance. We also show that these interconnects can be built with simple hardware and achieve a performance close to that of an idealized contention-free model.

154 citations

Proceedings Article
14 Apr 2008
TL;DR: This paper proposes a novel soft power-aware real-time scheduler for a state-of-the-art multicore multithreaded processor, which implements dynamic voltage scaling techniques, and shows that, using a fair scheduling policy, the proposed algorithm provides average energy savings ranging from 34% to 74%.
Abstract: High-performance microprocessors, e.g., multithreaded and multicore processors, are being implemented in embedded real-time systems because of the increasing computational requirements. These complex microprocessors have two major drawbacks when they are used for real-time purposes. First, their complexity complicates the calculation of the WCET (worst-case execution time). Second, power consumption requirements are much larger, which is a major concern in these systems. In this paper we propose a novel soft power-aware real-time scheduler for a state-of-the-art multicore multithreaded processor, which implements dynamic voltage scaling techniques. The proposed scheduler reduces the energy consumption while satisfying the constraints of soft real-time applications. Different scheduling alternatives have been evaluated, and experimental results show that using a fair scheduling policy, the proposed algorithm provides, on average, energy savings ranging from 34% to 74%.

60 citations

Proceedings Article
04 May 2005
TL;DR: A novel drowsy cache policy is proposed called Reuse Most Recently used On (RMRO), which makes use of reuse information to trade off performance versus energy consumption and improves the hit ratio for drowsy lines by about 67%, while reducing the power consumption by about 11.7%.
Abstract: Technology projections indicate that static power will become a major concern in future generations of high-performance microprocessors. Caches represent a significant percentage of the overall microprocessor die area. Therefore, recent research has concentrated on the reduction of leakage current dissipated by caches. The variety of techniques to control current leakage can be classified as non-state preserving or state preserving. Non-state preserving techniques power off selected cache lines, while state preserving techniques place selected lines into a low-power state. Drowsy caches are a recently proposed state-preserving technique. In order to introduce low performance overhead, drowsy caches must be very selective about which cache lines are moved to a drowsy state. Past research on cache organization has focused on how best to exploit the temporal locality present in the data stream. In this paper we propose a novel drowsy cache policy called Reuse Most Recently used On (RMRO), which makes use of reuse information to trade off performance versus energy consumption. Our proposal improves the hit ratio for drowsy lines by about 67%, while reducing the power consumption by about 11.7% (assuming 70 nm technology) with respect to previously proposed drowsy cache policies.

53 citations

Journal Article
TL;DR: This paper analyzes the perceived latency versus the traffic increase (both in bytes and in objects) to evaluate the benefits from the user's perspective and shows that higher algorithm complexity does not improve performance, that object-based algorithms outperform those based on pages, and that object-based algorithms show only minor differences in object traffic increase.

52 citations


Cited by
01 May 1993
TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.
Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently: those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90% and an 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.

29,323 citations

01 Nov 1997
TL;DR: Recognizing the mannerism ways to get this books computer organization and design the hardware software interface 4th fourth edition by patterson hennessy is additionally useful.
Abstract: Recognizing the mannerism ways to get this books computer organization and design the hardware software interface 4th fourth edition by patterson hennessy is additionally useful. You have remained in right site to begin getting this info. acquire the computer organization and design the hardware software interface 4th fourth edition by patterson hennessy join that we manage to pay for here and check out the link.

832 citations

01 Jan 2016
TL;DR: Thank you very much for downloading using mpi portable parallel programming with the message passing interface for reading a good book with a cup of coffee in the afternoon, instead they are facing with some malicious bugs inside their laptop.
Abstract: Thank you very much for downloading using mpi portable parallel programming with the message passing interface. As you may know, people have search hundreds times for their chosen novels like this using mpi portable parallel programming with the message passing interface, but end up in harmful downloads. Rather than reading a good book with a cup of coffee in the afternoon, instead they are facing with some malicious bugs inside their laptop.

593 citations

Journal Article
TL;DR: This paper describes the main properties that a network workload generator should have today, and presents a tool for the generation of realistic network workload that can be used for the study of emerging networking scenarios.

434 citations