Author

Manuel Prieto

Other affiliations: Universidad Pública de Navarra
Bio: Manuel Prieto is an academic researcher from Complutense University of Madrid. The author has contributed to research in topics: Scheduling (computing) & Multigrid method. The author has an h-index of 21 and has co-authored 111 publications receiving 1,763 citations. Previous affiliations of Manuel Prieto include Universidad Pública de Navarra.


Papers
Journal ArticleDOI
TL;DR: This survey explores how the energy-cognizant scheduler's role has been extended beyond simple energy minimization to related issues such as avoiding negative thermal effects and addressing asymmetric multicore architectures.
Abstract: Execution time is no longer the only metric by which computational systems are judged. In fact, explicitly sacrificing raw performance in exchange for energy savings is becoming a common trend in environments ranging from large server farms attempting to minimize cooling costs to mobile devices trying to prolong battery life. Hardware designers, well aware of these trends, include capabilities like DVFS (to throttle core frequency) into almost all modern systems. However, hardware capabilities on their own are insufficient and must be paired with other logic to decide if, when, and by how much to apply energy-minimizing techniques while still meeting performance goals. One obvious choice is to place this logic into the OS scheduler. This choice is particularly attractive due to the relative simplicity, low cost, and low risk associated with modifying only the scheduler part of the OS. Herein we survey the vast field of research on energy-cognizant schedulers. We discuss scheduling techniques to perform energy-efficient computation. We further explore how the energy-cognizant scheduler's role has been extended beyond simple energy minimization to also include related issues like the avoidance of negative thermal effects as well as addressing asymmetric multicore architectures.
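The "if, when, and by how much" logic described above can be made concrete with a small sketch. The Python snippet below illustrates a frequency-selection policy that throttles memory-bound tasks down and CPU-bound tasks up; the frequency table, thresholds, and counter names are hypothetical illustrations, not any real OS interface.

```python
# Minimal sketch of an energy-cognizant DVFS policy in the survey's
# "if, when, and by how much" framing. All names and thresholds are
# hypothetical illustrations, not any real OS interface.

FREQ_STEPS_MHZ = [800, 1200, 1600, 2000, 2400]

def pick_frequency(cycles_stalled, cycles_total, cur_idx):
    """Return a new index into FREQ_STEPS_MHZ based on memory-boundedness.

    A memory-bound task (high stall ratio) gains little from a fast clock,
    so dropping frequency saves energy at small performance cost; a
    CPU-bound task is kept at a high frequency to meet performance goals.
    """
    stall_ratio = cycles_stalled / max(cycles_total, 1)
    if stall_ratio > 0.6:                      # memory-bound: throttle down
        return max(cur_idx - 1, 0)
    if stall_ratio < 0.2:                      # CPU-bound: throttle up
        return min(cur_idx + 1, len(FREQ_STEPS_MHZ) - 1)
    return cur_idx                             # mixed: hold steady

# Example: a task stalled 70% of the time is stepped down from 2000 MHz.
idx = pick_frequency(cycles_stalled=7_000, cycles_total=10_000, cur_idx=3)
print(FREQ_STEPS_MHZ[idx])  # -> 1600
```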

161 citations

Journal ArticleDOI
TL;DR: This article surveys a multitude of new and exciting work exploring the diverse new roles the OS scheduler can successfully take on, focusing on solutions that exclusively make use of OS thread-level scheduling to achieve their goals.
Abstract: Chip multicore processors (CMPs) have emerged as the dominant architecture choice for modern computing platforms and will most likely continue to be dominant well into the foreseeable future. As with any system, CMPs offer a unique set of challenges. Chief among them is the shared resource contention that results because CMP cores are not independent processors but rather share common resources, such as the last-level cache (LLC). Shared resource contention can lead to severe and unpredictable performance impact on the threads running on the CMP. Conversely, CMPs offer tremendous opportunities for multithreaded applications, which can take advantage of simultaneous thread execution as well as fast inter-thread data sharing. Many solutions have been proposed to deal with the negative aspects of CMPs and take advantage of the positive. This survey focuses on the subset of these solutions that exclusively make use of OS thread-level scheduling to achieve their goals. These solutions are particularly attractive as they require no changes to hardware and minimal or no changes to the OS. The OS scheduler has expanded well beyond its original role of time-multiplexing threads on a single core into a complex and effective resource manager. This article surveys a multitude of new and exciting work that explores the diverse new roles the OS scheduler can successfully take on.
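As an illustration of the thread-level scheduling techniques this survey covers, the sketch below pairs cache-heavy threads with cache-light ones so that no shared LLC receives two heavy threads at once. The miss-rate figures and the two-threads-per-LLC topology are assumptions made for the example, not part of any specific surveyed system.

```python
# Sketch of contention-aware co-scheduling: pair cache-heavy threads with
# cache-light ones so that no shared LLC gets two heavy threads at once.
# Miss rates and the two-threads-per-LLC topology are illustrative only.

def pair_by_llc_pressure(threads):
    """threads: list of (name, llc_misses_per_kilo_instruction).

    Sort by LLC pressure and pair the heaviest with the lightest, the
    second heaviest with the second lightest, and so on. Each pair is
    meant to share one LLC domain.
    """
    ordered = sorted(threads, key=lambda t: t[1], reverse=True)
    pairs = []
    while len(ordered) > 1:
        heavy = ordered.pop(0)
        light = ordered.pop(-1)
        pairs.append((heavy[0], light[0]))
    if ordered:                      # odd thread count: last one runs alone
        pairs.append((ordered[0][0], None))
    return pairs

workload = [("streamcluster", 42.0), ("mcf", 38.5),
            ("povray", 0.4), ("namd", 1.1)]
print(pair_by_llc_pressure(workload))
# -> [('streamcluster', 'povray'), ('mcf', 'namd')]
```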

161 citations

Proceedings ArticleDOI
13 Apr 2010
TL;DR: CAMP, a comprehensive AMP scheduler that delivers both efficiency and TLP specialization, is proposed, implemented, and evaluated, along with a new lightweight technique for discovering which threads utilize fast cores most efficiently.
Abstract: Symmetric-ISA (instruction set architecture) asymmetric-performance multicore processors were shown to deliver higher performance per watt and area for applications with diverse architectural requirements, and so it is likely that future multicore processors will combine a few fast cores characterized by complex pipelines, high clock frequency, high area requirements and power consumption, and many slow ones, characterized by simple pipelines, low clock frequency, low area requirements and power consumption. Asymmetric multicore processors (AMP) derive their efficiency from core specialization. Efficiency specialization ensures that fast cores are used for "CPU-intensive" applications, which efficiently utilize these cores' "expensive" features, while slow cores would be used for "memory-intensive" applications, which utilize fast cores inefficiently. TLP (thread-level parallelism) specialization ensures that fast cores are used to accelerate sequential phases of parallel applications, while leaving slow cores for energy-efficient execution of parallel phases. Specialization is effected by an asymmetry-aware thread scheduler, which maps threads to cores in consideration of the properties of both. Previous asymmetry-aware schedulers employed one type of specialization (either efficiency or TLP), but not both. As a result, they were effective only for limited workload scenarios. We propose, implement, and evaluate CAMP, a Comprehensive AMP scheduler, which delivers both efficiency and TLP specialization. Furthermore, we propose a new lightweight technique for discovering which threads utilize fast cores most efficiently. Our evaluation in the OpenSolaris operating system demonstrates that CAMP accomplishes an efficient use of an AMP system for a variety of workloads, while existing asymmetry-aware schedulers were effective only in limited scenarios.
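A minimal sketch of the combined policy, in the spirit of CAMP but not the authors' implementation: threads in sequential phases get first claim on fast cores (TLP specialization), and the remaining fast-core slots go to the threads with the highest estimated fast-core speedup (efficiency specialization). The speedup factors and thread counts below are illustrative; the real scheduler estimates such quantities online.

```python
# Sketch of combined efficiency + TLP specialization in the spirit of CAMP.
# "Speedup factor" (fast-core vs. slow-core performance) and the runnable
# sibling counts are illustrative assumptions, not CAMP's actual estimator.

FAST_CORES = 2  # a few fast cores ...
SLOW_CORES = 6  # ... and many slow ones

def assign_cores(threads):
    """threads: list of (name, speedup_factor, runnable_siblings).

    A thread whose application currently has one runnable thread is in a
    sequential phase; boosting it helps the whole application (TLP
    specialization). Among the rest, prefer threads that use a fast core
    most efficiently (efficiency specialization).
    """
    def utility(t):
        name, speedup, siblings = t
        sequential_bonus = 10.0 if siblings == 1 else 0.0
        return sequential_bonus + speedup

    ranked = sorted(threads, key=utility, reverse=True)
    return {name: ("fast" if i < FAST_CORES else "slow")
            for i, (name, _, _) in enumerate(ranked)}

workload = [("seq_phase", 1.3, 1),     # sequential bottleneck
            ("cpu_bound", 1.8, 4),     # uses a fast core well
            ("mem_bound", 1.1, 4),     # barely benefits from a fast core
            ("worker",    1.2, 8)]
print(assign_cores(workload))
# fast cores go to 'seq_phase' and 'cpu_bound'
```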

144 citations

Journal ArticleDOI
TL;DR: This study indicates that FBS outperforms LS on current-generation GPUs, and design trends suggest higher gains on future-generation GPUs.
Abstract: The widespread usage of the discrete wavelet transform (DWT) has motivated the development of fast DWT algorithms and their tuning on all sorts of computer systems. Several studies have compared the performance of the most popular schemes, known as filter bank scheme (FBS) and lifting scheme (LS), and have always concluded that LS is the most efficient option. However, there is no such study on streaming processors such as modern Graphics Processing Units (GPUs). Current trends have transformed these devices into powerful stream processors with enough flexibility to perform intensive and complex floating-point calculations. The opportunities opened up by these platforms, as well as the growing popularity of the DWT within the computer graphics field, make a new performance comparison of great practical interest. Our study indicates that FBS outperforms LS in current-generation GPUs. In our experiments, the actual FBS gains range between 10 percent and 140 percent, depending on the problem size and the type and length of the wavelet filter. Moreover, design trends suggest higher gains in future-generation GPUs.
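To make the two schemes concrete, the sketch below computes one decomposition level of the Haar DWT both ways: FBS as a pair of convolutions followed by downsampling, and LS as in-place predict/update steps. The coefficients are identical; the schemes differ in operation count and memory-access pattern, which is what drives the GPU performance gap studied in the paper. This is an illustration using the simplest wavelet, not the paper's GPU code.

```python
# One decomposition level of the Haar DWT, written both ways the paper
# compares: the filter bank scheme (FBS) and the lifting scheme (LS).
import numpy as np

def haar_fbs(x):
    """Filter bank: convolve with low/high-pass filters, keep every other tap."""
    lo = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass (averaging) filter
    hi = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass (differencing) filter
    approx = np.convolve(x, lo)[1::2]
    detail = np.convolve(x, hi)[1::2]
    return approx, detail

def haar_ls(x):
    """Lifting: predict odd samples from even ones, then update the evens."""
    even = x[0::2].astype(float)
    odd = x[1::2].astype(float)
    d = odd - even               # predict step: detail = prediction error
    s = even + d / 2.0           # update step: preserve the signal mean
    return s * np.sqrt(2.0), d / np.sqrt(2.0)   # normalization

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
a1, d1 = haar_fbs(x)
a2, d2 = haar_ls(x)
assert np.allclose(a1, a2) and np.allclose(d1, d2)  # same coefficients
print(a1, d1)
```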

116 citations

Journal ArticleDOI
TL;DR: This letter proposes a GPU-based implementation of the automated morphological endmember extraction algorithm, used as a representative case study of joint spatial/spectral techniques for hyperspectral image processing.
Abstract: Spatial/spectral algorithms have been shown in previous work to be a promising approach to the problem of extracting image endmembers from remotely sensed hyperspectral data. Such algorithms map nicely on high-performance systems such as massively parallel clusters and networks of computers. Unfortunately, these systems are generally expensive and difficult to adapt to onboard data processing scenarios, in which low-weight and low-power integrated components are highly desirable to reduce mission payload. An exciting new development in this context is the emergence of graphics processing units (GPUs), which can now satisfy extremely high computational requirements at low cost. In this letter, we propose a GPU-based implementation of the automated morphological endmember extraction algorithm, which is used in this letter as a representative case study of joint spatial/spectral techniques for hyperspectral image processing. The proposed implementation is quantitatively assessed in terms of both endmember extraction accuracy and parallel efficiency, using two generations of commercial GPUs from NVIDIA. Combined, these results offer a thoughtful perspective on the potential and emerging challenges of implementing hyperspectral imaging algorithms on commodity graphics hardware.
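The inner loop of morphological endmember extraction is dominated by per-pixel spectral distance computations, which is the kind of data-parallel work that maps well onto GPUs. A common distance in this literature is the spectral angle; the NumPy sketch below illustrates that kernel on a toy data cube and is not the paper's GPU implementation.

```python
# Spectral angle between pixel vectors: the per-pixel distance kernel that
# dominates morphological endmember extraction. Illustrative sketch only.
import numpy as np

def spectral_angle(a, b):
    """Angle (radians) between two spectral signatures; 0 means identical
    spectral shape regardless of illumination-induced scaling."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

# Toy "image": 4 x 4 pixels, 50 spectral bands.
rng = np.random.default_rng(0)
cube = rng.random((4, 4, 50))
ref = cube[0, 0]

# Distance of every pixel to a reference signature (the kind of operation
# applied inside each spatial kernel of the morphological algorithm).
angles = np.array([[spectral_angle(cube[i, j], ref)
                    for j in range(4)] for i in range(4)])
print(np.round(angles, 3))
```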

74 citations


Cited by
Journal ArticleDOI
TL;DR: A tutorial/overview cross section of some relevant hyperspectral data analysis methods and algorithms, organized in six main topics: data fusion, unmixing, classification, target detection, physical parameter retrieval, and fast computing.
Abstract: Hyperspectral remote sensing technology has advanced significantly in the past two decades. Current sensors onboard airborne and spaceborne platforms cover large areas of the Earth's surface with unprecedented spectral, spatial, and temporal resolutions. These characteristics enable a myriad of applications requiring fine identification of materials or estimation of physical parameters. Very often, these applications rely on sophisticated and complex data analysis methods. The main sources of difficulty are the high dimensionality and size of the hyperspectral data, the spectral mixing (linear and nonlinear), and the degradation mechanisms associated with the measurement process, such as noise and atmospheric effects. This paper presents a tutorial/overview cross section of some relevant hyperspectral data analysis methods and algorithms, organized in six main topics: data fusion, unmixing, classification, target detection, physical parameter retrieval, and fast computing. In all topics, we describe the state-of-the-art, provide illustrative examples, and point to future challenges and research directions.

1,604 citations

01 Jan 2010
TL;DR: This journal special section will cover recent progress on parallel CAD research, including algorithm foundations, programming models, parallel architectural-specific optimization, and verification, as well as other topics relevant to the design of parallel CAD algorithms and software tools.
Abstract: High-performance parallel computer architecture and systems have improved at a phenomenal rate. In the meantime, VLSI computer-aided design (CAD) software for multibillion-transistor IC design has become increasingly complex and requires prohibitively high computational resources. Recent studies have shown that numerous CAD problems, with their high computational complexity, can greatly benefit from the fast-increasing parallel computation capabilities. However, parallel programming poses significant challenges for CAD applications. Fully exploiting the computational power of emerging general-purpose and domain-specific multicore/many-core processor systems calls for fundamental research and engineering practice across every stage of parallel CAD design, from algorithm exploration, programming models, and design-time and run-time environments to CAD applications such as verification, optimization, and simulation. This journal special section will cover recent progress on parallel CAD research, including algorithm foundations, programming models, parallel architecture-specific optimization, and verification. More specifically, papers with in-depth and extensive coverage of the following topics will be considered, as well as other topics relevant to the design of parallel CAD algorithms and software tools.
1. Parallel algorithm design and specification for CAD applications
2. Parallel programming models and languages of particular use in CAD
3. Runtime support and performance optimization for CAD applications
4. Parallel architecture-specific design and optimization for CAD applications
5. Parallel program debugging and verification techniques particularly relevant for CAD
The papers should be submitted via the Manuscript Central website and should adhere to standard ACM TODAES formatting requirements (http://todaes.acm.org/). The page count limit is 25.

459 citations

Journal ArticleDOI
TL;DR: This article provides an up-to-date review of IQA research, covering key properties of visual perception, image quality databases, and existing full-reference, no-reference, and reduced-reference IQA algorithms, and highlights several open challenges in the field.
Abstract: Image quality assessment (IQA) has been a topic of intense research over the last several decades. Each year brings an increasing number of new IQA algorithms, extensions of existing IQA algorithms, and applications of IQA to other disciplines. In this article, I first provide an up-to-date review of research in IQA, and then I highlight several open challenges in this field. The first half of this article discusses key properties of visual perception, image quality databases, and existing full-reference, no-reference, and reduced-reference IQA algorithms. Yet, despite the remarkable progress that has been made in IQA, many fundamental challenges remain largely unsolved. The second half of this article highlights some of these challenges. I specifically discuss challenges stemming from the lack of complete perceptual models for natural images, compound and suprathreshold distortions, and multiple distortions, as well as the interactive effects of these distortions on the images. I also discuss challenges related to IQA of images containing nontraditional distortions, and challenges related to computational efficiency. The goal of this article is not only to help practitioners and researchers keep abreast of the recent advances in IQA, but also to raise awareness of the key limitations of current IQA knowledge.
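To make the full-reference category concrete, the sketch below computes PSNR, the simplest full-reference metric: it requires the pristine reference image (unlike no-reference metrics) and captures none of the perceptual properties whose absence the article identifies as an open challenge.

```python
# PSNR: a minimal full-reference IQA metric, for illustration only.
import numpy as np

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two uint8-range images."""
    mse = np.mean((reference.astype(float) - distorted.astype(float)) ** 2)
    if mse == 0:
        return float("inf")        # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(1)
ref = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
noisy = np.clip(ref + rng.normal(0, 5, ref.shape), 0, 255).astype(np.uint8)
print(f"{psnr(ref, noisy):.1f} dB")   # roughly 34 dB for sigma = 5
```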

412 citations

Journal ArticleDOI
24 Jun 2019
TL;DR: The authors review state-of-the-art approaches to building edge computing systems for autonomous driving and explore potential solutions to the challenge of delivering enough computing power, redundancy, and security to guarantee the safety of autonomous vehicles.
Abstract: Safety is the most important requirement for autonomous vehicles; hence, the ultimate challenge of designing an edge computing ecosystem for autonomous vehicles is to deliver enough computing power, redundancy, and security so as to guarantee the safety of autonomous vehicles. Specifically, autonomous driving systems are extremely complex; they tightly integrate many technologies, including sensing, localization, perception, decision making, as well as the smooth interactions with cloud platforms for high-definition (HD) map generation and data storage. These complexities impose numerous challenges for the design of autonomous driving edge computing systems. First, edge computing systems for autonomous driving need to process an enormous amount of data in real time, and often the incoming data from different sensors are highly heterogeneous. Since autonomous driving edge computing systems are mobile, they often have very strict energy consumption restrictions. Thus, it is imperative to deliver sufficient computing power with reasonable energy consumption, to guarantee the safety of autonomous vehicles, even at high speed. Second, in addition to the edge system design, vehicle-to-everything (V2X) provides redundancy for autonomous driving workloads and alleviates stringent performance and energy constraints on the edge side. With V2X, more research is required to define how vehicles cooperate with each other and the infrastructure. Last, safety cannot be guaranteed when security is compromised. Thus, protecting autonomous driving edge computing systems against attacks at different layers of the sensing and computing stack is of paramount concern. In this paper, we review state-of-the-art approaches in these areas as well as explore potential solutions to address these challenges.

369 citations