Author

Julita Corbalan

Other affiliations: Barcelona Supercomputing Center
Bio: Julita Corbalan is an academic researcher from the Polytechnic University of Catalonia. The author has contributed to research in topics: Scheduling (computing) & Job scheduler. The author has an h-index of 20, co-authored 52 publications receiving 1210 citations. Previous affiliations of Julita Corbalan include Barcelona Supercomputing Center.


Papers
Book ChapterDOI
12 May 2008
TL;DR: Work-first schedulers are found to have the best performance, but because of the restrictions that OpenMP imposes, a breadth-first scheduler is a better default choice for an OpenMP runtime.
Abstract: OpenMP is in the process of adding a tasking model that allows the programmer to specify independent units of work, called tasks, but does not specify how the scheduling of these tasks should be done (although it imposes some restrictions). We have evaluated different scheduling strategies (schedulers and cut-offs) with several applications and found that work-first schedulers seem to have the best performance, but because of the restrictions that OpenMP imposes, a breadth-first scheduler is a better choice as the default for an OpenMP runtime.

133 citations

Proceedings ArticleDOI
15 Nov 2008
TL;DR: This work proposes a new cut-off technique that, using information from the application collected at runtime, decides which tasks should be pruned to improve the performance of the application.
Abstract: In task parallel languages, an important factor for achieving good performance is the use of a cut-off technique to reduce the number of tasks created. Using a cut-off to avoid an excessive number of tasks helps the runtime system to reduce the total overhead associated with task creation, particularly if the tasks are fine-grained. Unfortunately, the best cut-off technique is usually dependent on the application structure or even the input data of the application. We propose a new cut-off technique that, using information from the application collected at runtime, decides which tasks should be pruned to improve the performance of the application. This technique does not rely on the programmer to determine the cut-off technique that is best suited for the application. We have implemented this cut-off in the context of the new OpenMP tasking model. Our evaluation, with a variety of applications, shows that our adaptive cut-off is able to make good decisions and most of the time matches the optimal cut-off that could be set by hand by a programmer.

93 citations

Journal ArticleDOI
TL;DR: This work analyzes which application/platform characteristics are necessary for a successful energy-performance trade-off in large-scale parallel applications, and how cluster power-consumption characteristics, together with application sensitivity to frequency scaling, determine the energy effectiveness of the DVFS technique.

78 citations

Proceedings ArticleDOI
22 Oct 2000
TL;DR: In this article, the authors present the SelfAnalyzer, an approach to dynamically analyze the performance of applications (speedup, efficiency and execution time), and the Performance-Driven Processor Allocation (PDPA), a new scheduling policy that distributes processors considering both the global conditions of the system and the particular characteristics of running applications.
Abstract: This work is focused on processor allocation in shared-memory multiprocessor systems, where no knowledge of the application is available when applications are submitted. We perform the processor allocation taking into account the characteristics of the application measured at run-time. We want to demonstrate the importance of an accurate performance analysis and the criteria used to distribute the processors. With this aim, we present the SelfAnalyzer, an approach to dynamically analyzing the performance of applications (speedup, efficiency and execution time), and the Performance-Driven Processor Allocation (PDPA), a new scheduling policy that distributes processors considering both the global conditions of the system and the particular characteristics of running applications. This work also defends the importance of the interaction between the medium-term and the long-term scheduler to control the multiprogramming level in the case of the clairvoyant scheduling policies. We have implemented our proposal in an SGI Origin2000 with 64 processors and we have compared its performance with that of some scheduling policies proposed so far and with the native IRIX scheduling policy. Results show that the combination of the SelfAnalyzer+PDPA with the medium/long-term scheduling interaction outperforms the rest of the scheduling policies evaluated. The evaluation shows that in workloads where a simple equipartition performs well, the PDPA also performs well, and in extreme workloads where all the applications have a bad performance, our proposal can achieve a speedup of 3.9 with respect to an equipartition and 11.8 with respect to the native IRIX scheduling policy.

67 citations

Proceedings ArticleDOI
01 May 1999
TL;DR: This paper presents some techniques for efficient thread forking and joining in parallel execution environments, taking into consideration the physical structure of NUMA machines and the support for multi-level parallelization and processor grouping.
Abstract: This paper presents some techniques for efficient thread forking and joining in parallel execution environments, taking into consideration the physical structure of NUMA machines and the support for multi-level parallelization and processor grouping. Two work generation schemes and one join mechanism are designed, implemented, evaluated and compared with the ones used in the IRIX MP library, an efficient implementation which supports a single level of parallelism. Supporting multiple levels of parallelism is a current research goal, both in shared and distributed memory machines. Our proposals include a first work generation scheme (GWD, or global work descriptor) which supports multiple levels of parallelism, but not processor grouping. The second work generation scheme (LWD, or local work descriptor) has been designed to support multiple levels of parallelism and processor grouping. Processor grouping is needed to distribute processors among different parts of the computation and maintain the working set of each processor across different parallel constructs. The mechanisms are evaluated using synthetic benchmarks, two SPEC95fp applications and one NAS application. The performance evaluation concludes that: i) the overhead of the proposed mechanisms is similar to the overhead of the existing ones when exploiting a single level of parallelism, and ii) a remarkable improvement in performance is obtained for applications that have multiple levels of parallelism. The comparison with the traditional single-level parallelism exploitation gives an improvement in the range of 30-65% for these applications.

67 citations


Cited by
Book
01 Jan 2018
TL;DR: This handbook presents the systems, tools, and services of the leading providers of cloud computing; including Google, Yahoo, Amazon, IBM, and Microsoft.
Abstract: Cloud computing has become a significant technology trend. Experts believe cloud computing is currently reshaping information technology and the IT marketplace. The advantages of using cloud computing include cost savings, speed to market, access to greater computing resources, high availability, and scalability. Handbook of Cloud Computing includes contributions from world experts in the field of cloud computing from academia, research laboratories and private industry. This book presents the systems, tools, and services of the leading providers of cloud computing, including Google, Yahoo, Amazon, IBM, and Microsoft. The basic concepts of cloud computing and cloud computing applications are also introduced. Current and future technologies applied in cloud computing are also discussed. Case studies, examples, and exercises are provided throughout. Handbook of Cloud Computing is intended for advanced-level students and researchers in computer science and electrical engineering as a reference book. This handbook is also beneficial to computer and system infrastructure designers, developers, business managers, entrepreneurs and investors within the cloud computing related industry.

425 citations

01 Jan 2016

368 citations

Proceedings ArticleDOI
12 Nov 2011
TL;DR: It is concluded that green datacenters and green-energy-aware scheduling can have a significant role in building a more sustainable IT ecosystem.
Abstract: In this paper, we propose GreenSlot, a parallel batch job scheduler for a datacenter powered by a photovoltaic solar array and the electrical grid (as a backup). GreenSlot predicts the amount of solar energy that will be available in the near future, and schedules the workload to maximize the green energy consumption while meeting the jobs' deadlines. If grid energy must be used to avoid deadline violations, the scheduler selects times when it is cheap. Our results for production scientific workloads demonstrate that GreenSlot can increase green energy consumption by up to 117% and decrease energy cost by up to 39%, compared to a conventional scheduler. Based on these positive results, we conclude that green datacenters and green-energy-aware scheduling can have a significant role in building a more sustainable IT ecosystem.

319 citations

Proceedings ArticleDOI
22 Sep 2009
TL;DR: This paper presents the Barcelona OpenMP Tasks Suite (BOTS), a set of applications exploiting regular and irregular task-based parallelism, and evaluates the BOTS benchmarks on an Altix system.
Abstract: Traditional parallel applications have exploited regular parallelism, based on parallel loops. Only a few applications exploit sections parallelism. With the release of the new OpenMP specification (3.0), this programming model supports tasking. Parallel tasks allow the exploitation of irregular parallelism, but there is a lack of benchmarks exploiting tasks in OpenMP. With the current (and projected) multicore architectures that offer many more alternatives to execute parallel applications than traditional SMP machines, this kind of parallelism is increasingly important, and so is the need for a set of benchmarks to evaluate it. In this paper, we motivate the need for such a benchmark suite for irregular and/or recursive task parallelism. We present our proposal, the Barcelona OpenMP Tasks Suite (BOTS), a set of applications exploiting regular and irregular parallelism based on tasks. We present an overall evaluation of the BOTS benchmarks on an Altix system and discuss some of the different experiments that can be done with the different compilation and runtime alternatives of the benchmarks.

285 citations

Proceedings Article
15 Jun 2011
TL;DR: The effects on performance imposed by resource contention and remote access latency are quantified, and a new contention management algorithm is proposed and evaluated; it significantly outperforms a previously proposed NUMA-unaware algorithm as well as the default Linux scheduler.
Abstract: On multicore systems, contention for shared resources occurs when memory-intensive threads are co-scheduled on cores that share parts of the memory hierarchy, such as last-level caches and memory controllers. Previous work investigated how contention could be addressed via scheduling. A contention-aware scheduler separates competing threads onto separate memory hierarchy domains to eliminate resource sharing and, as a consequence, to mitigate contention. However, all previous work on contention-aware scheduling assumed that the underlying system is UMA (uniform memory access latencies, single memory controller). Modern multicore systems, however, are NUMA, which means that they feature non-uniform memory access latencies and multiple memory controllers. We discovered that state-of-the-art contention management algorithms fail to be effective on NUMA systems and may even hurt performance relative to a default OS scheduler. In this paper we investigate the causes for this behavior and design the first contention-aware algorithm for NUMA systems.

264 citations