Showing papers on "Scheduling (computing)" published in 2011


Journal ArticleDOI
01 Feb 2011
TL;DR: StarPU is a runtime system that provides a high-level, unified execution model, giving numerical kernel designers a convenient way to generate parallel tasks over heterogeneous hardware and to easily develop and tune powerful scheduling algorithms.
Abstract: In the field of HPC, the current hardware trend is to design multiprocessor architectures featuring heterogeneous technologies such as specialized coprocessors (e.g. Cell/BE) or data-parallel accelerators (e.g. GPUs). Approaching the theoretical performance of these architectures is a complex issue. Indeed, substantial efforts have already been devoted to efficiently offload parts of the computations. However, designing an execution model that unifies all computing units and associated embedded memory remains a main challenge. We therefore designed StarPU, an original runtime system providing a high-level, unified execution model tightly coupled with an expressive data management library. The main goal of StarPU is to provide numerical kernel designers with a convenient way to generate parallel tasks over heterogeneous hardware on the one hand, and easily develop and tune powerful scheduling algorithms on the other hand. We have developed several strategies that can be selected seamlessly at run-time, and we have analyzed their efficiency on several algorithms running simultaneously over multiple cores and a GPU. In addition to substantial improvements regarding execution times, we have obtained consistent superlinear parallelism by actually exploiting the heterogeneous nature of the machine. We eventually show that our dynamic approach competes with the highly optimized MAGMA library and overcomes the limitations of the corresponding static scheduling in a portable way. Copyright © 2010 John Wiley & Sons, Ltd.

1,116 citations
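
To make the scheduling idea concrete, here is a minimal Python sketch of heterogeneity-aware dispatch: each task goes to whichever worker is estimated to finish it earliest. This is an illustrative toy, not StarPU's API; the cost table, worker list, and greedy policy are assumptions for the example.

```python
# Toy heterogeneity-aware dispatch (not StarPU's API): per-device cost
# estimates drive a greedy earliest-finish-time assignment.

# Hypothetical cost model: seconds per task kind on each device type.
COST = {
    ("cpu", "dgemm"): 4.0,
    ("gpu", "dgemm"): 0.5,   # GPUs excel at dense kernels...
    ("cpu", "sparse"): 1.0,
    ("gpu", "sparse"): 2.5,  # ...but can lose on irregular ones.
}

def schedule(tasks, workers):
    """Assign each task to the worker with the earliest estimated finish."""
    ready_at = {w: 0.0 for w in workers}   # next free time per worker
    plan = []
    for kind in tasks:
        best = min(workers, key=lambda w: ready_at[w] + COST[(w[0], kind)])
        start = ready_at[best]
        ready_at[best] = start + COST[(best[0], kind)]
        plan.append((kind, best, start))
    return plan, max(ready_at.values())

workers = [("cpu", 0), ("cpu", 1), ("gpu", 0)]
tasks = ["dgemm"] * 4 + ["sparse"] * 4
plan, makespan = schedule(tasks, workers)
print(f"makespan: {makespan:.1f}s")  # dense work lands on the GPU, sparse on CPUs
```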


Journal ArticleDOI
TL;DR: The open-source framework LTE-Sim is presented to provide a complete performance verification of LTE networks and has been conceived to simulate uplink and downlink scheduling strategies in multicell/multiuser environments, taking into account user mobility, radio resource optimization, frequency reuse techniques, the adaptive modulation and coding module, and other aspects that are very relevant to the industrial and scientific communities.
Abstract: Long-term evolution (LTE) represents an emerging and promising technology for providing broadband ubiquitous Internet access. For this reason, several research groups are trying to optimize its performance. Unfortunately, at present, to the best of our knowledge, no open-source simulation platforms, which the scientific community can use to evaluate the performance of the entire LTE system, are freely available. The lack of a common reference simulator does not help the work of researchers and poses limitations on the comparison of results claimed by different research groups. To bridge this gap, herein, the open-source framework LTE-Sim is presented to provide a complete performance verification of LTE networks. LTE-Sim has been conceived to simulate uplink and downlink scheduling strategies in multicell/multiuser environments, taking into account user mobility, radio resource optimization, frequency reuse techniques, the adaptive modulation and coding module, and other aspects that are very relevant to the industrial and scientific communities. The effectiveness of the proposed simulator has been tested and verified considering 1) the software scalability test, which analyzes both memory and simulation time requirements; and 2) the performance evaluation of a realistic LTE network providing a comparison among well-known scheduling strategies.

685 citations


Proceedings ArticleDOI
15 Aug 2011
TL;DR: This work proposes a global management architecture and a set of algorithms that improve the transfer times of common communication patterns, such as broadcast and shuffle, and allow scheduling policies at the transfer level, such as prioritizing a transfer over other transfers.
Abstract: Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting for more than 50% of job completion times. Despite this impact, there has been relatively little work on optimizing the performance of these data transfers, with networking researchers traditionally focusing on per-flow traffic management. We address this limitation by proposing a global management architecture and a set of algorithms that (1) improve the transfer times of common communication patterns, such as broadcast and shuffle, and (2) allow scheduling policies at the transfer level, such as prioritizing a transfer over other transfers. Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5X compared to the status quo in Hadoop. We also show that transfer-level scheduling can reduce the completion time of high-priority transfers by 1.7X.

612 citations
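
The transfer-level idea, treating a whole broadcast or shuffle as the schedulable unit rather than individual flows, can be illustrated on a single bottleneck link. A toy sketch: the sizes, priorities, and run-to-completion policy below are assumptions, not the paper's mechanism.

```python
# Toy transfer-level scheduler on one bottleneck link: urgent transfers
# get the full link before others, instead of per-flow fair sharing.

def finish_times(transfers, link_gbps):
    """transfers: (name, size_gbits, priority); lower value = more urgent."""
    t, done = 0.0, {}
    for name, size, _prio in sorted(transfers, key=lambda x: x[2]):
        t += size / link_gbps      # each transfer runs alone at full rate
        done[name] = t
    return done

jobs = [("shuffle-A", 80, 2), ("broadcast-B", 10, 1), ("shuffle-C", 40, 2)]
print(finish_times(jobs, link_gbps=10))
# broadcast-B finishes at 1.0s instead of competing flow-by-flow with the shuffles.
```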


Proceedings ArticleDOI
12 Nov 2011
TL;DR: This paper presents an approach whereby the basic computing elements are virtual machines (VMs) of various sizes/costs, jobs are specified as workflows, users specify performance requirements by assigning (soft) deadlines to jobs, and the goal is to ensure all jobs are finished within their deadlines at minimum financial cost.
Abstract: A goal in cloud computing is to allocate (and thus pay for) only those cloud resources that are truly needed. To date, cloud practitioners have pursued schedule-based (e.g., time-of-day) and rule-based mechanisms to attempt to automate this matching between computing requirements and computing resources. However, most of these "auto-scaling" mechanisms only support simple resource utilization indicators and do not specifically consider both user performance requirements and budget concerns. In this paper, we present an approach whereby the basic computing elements are virtual machines (VMs) of various sizes/costs, jobs are specified as workflows, users specify performance requirements by assigning (soft) deadlines to jobs, and the goal is to ensure all jobs are finished within their deadlines at minimum financial cost. We accomplish our goal by dynamically allocating/deallocating VMs and scheduling tasks on the most cost-efficient instances. We evaluate our approach in four representative cloud workload patterns and show cost savings from 9.8% to 40.4% compared to other approaches.

556 citations
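
The core allocation decision the abstract describes can be sketched as deadline-aware instance selection. The VM catalog and the linear speed model below are assumptions for illustration, not the paper's algorithm.

```python
# Deadline-aware instance selection sketch (VM catalog and linear speed
# model are assumed): pick the cheapest type that still meets the deadline.

VM_TYPES = [           # (name, relative speed, $ per hour), illustrative
    ("small", 1.0, 0.10),
    ("medium", 2.0, 0.22),
    ("large", 4.0, 0.50),
]

def cheapest_meeting_deadline(work_hours_on_small, deadline_hours):
    feasible = []
    for name, speed, price in VM_TYPES:
        runtime = work_hours_on_small / speed
        if runtime <= deadline_hours:
            feasible.append((runtime * price, name, runtime))
    return min(feasible) if feasible else None   # else: scale out instead

print(cheapest_meeting_deadline(work_hours_on_small=8, deadline_hours=3))
# -> (1.0, 'large', 2.0): 'medium' would miss the 3h soft deadline, so the
# scheduler pays for the faster instance, mirroring the paper's trade-off.
```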


Proceedings ArticleDOI
14 Jun 2011
TL;DR: This work designs a MapReduce performance model and implements a novel SLO-based scheduler in Hadoop that determines job ordering and the amount of resources to allocate for meeting the job deadlines, and validates the approach using a set of realistic applications.
Abstract: MapReduce and Hadoop represent an economically compelling alternative for efficient large scale data processing and advanced analytics in the enterprise. A key challenge in shared MapReduce clusters is the ability to automatically tailor and control resource allocations to different applications for achieving their performance goals. Currently, there is no job scheduler for MapReduce environments that, given a job completion deadline, could allocate the appropriate amount of resources to the job so that it meets the required Service Level Objective (SLO). In this work, we propose a framework, called ARIA, to address this problem. It comprises three inter-related components. First, for a production job that is routinely executed on a new dataset, we build a job profile that compactly summarizes critical performance characteristics of the underlying application during the map and reduce stages. Second, we design a MapReduce performance model that, for a given job (with a known profile) and its SLO (soft deadline), estimates the amount of resources required for job completion within the deadline. Finally, we implement a novel SLO-based scheduler in Hadoop that determines job ordering and the amount of resources to allocate for meeting the job deadlines. We validate our approach using a set of realistic applications. The new scheduler effectively meets the jobs' SLOs until the job demands exceed the cluster resources. The results of the extensive simulation study are validated through detailed experiments on a 66-node Hadoop cluster.

494 citations
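
The resource-estimation step can be approximated with the classic greedy-scheduling makespan bounds that profile-based models of this kind build on. In the sketch below, the profile numbers are assumed, and sharing one slot pool across the map and reduce stages is a simplification of ARIA's actual model.

```python
# Sketch of a profile-driven resource estimate in the spirit of the
# abstract (numbers and profile fields are assumed). Uses the classic
# greedy makespan bounds for n independent tasks on k slots:
#   lower ~ n*avg/k,  upper ~ (n-1)*avg/k + max.

def stage_upper_bound(n_tasks, avg_dur, max_dur, slots):
    return (n_tasks - 1) * avg_dur / slots + max_dur

def min_slots_for_deadline(profile, deadline):
    """Smallest slot count whose upper-bound makespan meets the soft deadline."""
    for k in range(1, profile["n_map"] + profile["n_red"] + 1):
        total = (stage_upper_bound(profile["n_map"], profile["map_avg"],
                                   profile["map_max"], k)
                 + stage_upper_bound(profile["n_red"], profile["red_avg"],
                                     profile["red_max"], k))
        if total <= deadline:
            return k
    return None  # infeasible even with a slot per task: deadline too tight

job = {"n_map": 64, "map_avg": 30, "map_max": 45,   # seconds, assumed profile
       "n_red": 16, "red_avg": 60, "red_max": 90}
print(min_slots_for_deadline(job, deadline=600))     # -> 6 slots
```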


Journal ArticleDOI
TL;DR: This work proposes a new parallel bi-objective hybrid genetic algorithm that takes into account not only makespan but also energy consumption, focusing on the island parallel model and the multi-start parallel model.

327 citations


Proceedings ArticleDOI
12 Nov 2011
TL;DR: It is concluded that green datacenters and green-energy-aware scheduling can have a significant role in building a more sustainable IT ecosystem.
Abstract: In this paper, we propose GreenSlot, a parallel batch job scheduler for a datacenter powered by a photovoltaic solar array and the electrical grid (as a backup). GreenSlot predicts the amount of solar energy that will be available in the near future, and schedules the workload to maximize the green energy consumption while meeting the jobs' deadlines. If grid energy must be used to avoid deadline violations, the scheduler selects times when it is cheap. Our results for production scientific workloads demonstrate that GreenSlot can increase green energy consumption by up to 117% and decrease energy cost by up to 39%, compared to a conventional scheduler. Based on these positive results, we conclude that green datacenters and green-energy-aware scheduling can have a significant role in building a more sustainable IT ecosystem.

319 citations


Journal ArticleDOI
TL;DR: This work addresses the problem of scheduling precedence-constrained parallel applications on multiprocessor computer systems and presents two energy-conscious scheduling algorithms using dynamic voltage scaling (DVS), built on a novel objective function and a variant of it.
Abstract: Traditionally, the primary performance goal of computer systems has focused on reducing the execution time of applications while increasing throughput. This performance goal has been mostly achieved by the development of high-density computer systems. As witnessed recently, these systems provide very powerful processing capability and capacity. They often consist of tens or hundreds of thousands of processors and other resource-hungry devices. The energy consumption of these systems has become a major concern. In this paper, we address the problem of scheduling precedence-constrained parallel applications on multiprocessor computer systems and present two energy-conscious scheduling algorithms using dynamic voltage scaling (DVS). A number of recent commodity processors are capable of DVS, which enables processors to operate at different voltage supply levels at the expense of sacrificing clock frequencies. In the context of scheduling, this multiple voltage facility implies that there is a trade-off between the quality of schedules and energy consumption. To effectively balance these two performance goals, we have devised a novel objective function and a variant of it. The main difference between the two algorithms is in their measurement of energy consumption. The extensive comparative evaluations conducted as part of this work show that the performance of our algorithms is very compelling in terms of both application completion time and energy consumption.

306 citations
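
The underlying DVS trade-off is easy to see numerically: dynamic power scales roughly as C·V²·f, and attainable frequency falls with voltage, so lower levels stretch execution while cutting energy. A sketch with assumed voltage/frequency levels (not the paper's objective function):

```python
# Illustrative DVS arithmetic (not the paper's objective function):
# dynamic power ~ C * V^2 * f, and attainable frequency drops with V,
# so lower levels trade longer runtime for less energy.

LEVELS = [(1.2, 1.0), (1.0, 0.8), (0.8, 0.6)]  # (volts, relative freq), assumed

def evaluate(cycles, capacitance=1.0):
    for volts, rel_f in LEVELS:
        time = cycles / rel_f                        # slower clock -> longer run
        energy = capacitance * volts ** 2 * cycles   # energy per cycle ~ C*V^2
        print(f"V={volts:.1f}  time={time:6.1f}  energy={energy:6.1f}")

evaluate(cycles=100.0)
# A scheduler can then pick a level per task to balance schedule length
# against total energy, e.g. through a weighted objective.
```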


Journal ArticleDOI
TL;DR: Numerical results suggest that incorporating the statistical knowledge into the scheduling policies can result in significant savings, especially for short tasks, and it is demonstrated with real price data from Commonwealth Edison that scheduling with mismatched modeling and online parameter estimation can still provide significant economic advantages to consumers.
Abstract: The problem of causally scheduling power consumption to minimize the expected cost at the consumer side is considered. The price of electricity is assumed to be time-varying. The scheduler has access to past and current prices, but only statistical knowledge about future prices, which it uses to make an optimal decision in each time period. The scheduling problem is naturally cast as a Markov decision process. Algorithms to find decision thresholds for both noninterruptible and interruptible loads under a deadline constraint are then developed. Numerical results suggest that incorporating the statistical knowledge into the scheduling policies can result in significant savings, especially for short tasks. It is demonstrated with real price data from Commonwealth Edison that scheduling with mismatched modeling and online parameter estimation can still provide significant economic advantages to consumers.

304 citations
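
A toy version of the threshold structure, assuming i.i.d. prices on a small discrete support and a one-slot task (the paper's Markov price model and interruptible-load case are richer): backward induction yields a per-slot price threshold, and the task runs the first time the observed price falls below it.

```python
# Toy threshold policy (assumes i.i.d. prices on a discrete support and a
# one-slot task; the paper's Markov model and interruptible case are richer).

PRICES = [2, 4, 6, 8, 10]   # assumed price support, uniform

def thresholds(horizon):
    """Run the task at slot t iff the observed price <= theta[t]."""
    theta = [float("inf")] * horizon           # deadline slot: run regardless
    cost_to_go = sum(PRICES) / len(PRICES)     # expected cost if forced to run
    for t in range(horizon - 2, -1, -1):
        theta[t] = cost_to_go                  # run now iff price beats waiting
        cost_to_go = sum(min(p, cost_to_go) for p in PRICES) / len(PRICES)
    return theta

print(thresholds(5))   # -> [3.648, 4.08, 4.8, 6.0, inf]
# Thresholds tighten as more slack remains: with time to spare, only a
# genuinely cheap price is worth taking, which is where the savings come from.
```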


Proceedings ArticleDOI
10 Apr 2011
TL;DR: A taxonomy for future CPU/GPU comparisons is suggested, and it is argued that this is not only germane for reporting performance, but is important to heterogeneous scheduling research in general.
Abstract: General purpose GPU Computing (GPGPU) has taken off in the past few years, with great promises for increased desktop processing power due to the large number of fast computing cores on high-end graphics cards. Many publications have demonstrated phenomenal performance and have reported speedups of as much as 1000× over code running on multi-core CPUs. Other studies have claimed that well-tuned CPU code reduces the performance gap significantly. We demonstrate that this important discussion is missing a key aspect, specifically the question of where in the system data resides, and the overhead to move the data to where it will be used, and back again if necessary. We have benchmarked a broad set of GPU kernels on a number of platforms with different GPUs, and our results show that when memory transfer times are included, it can easily take between 2× and 50× longer to run a kernel than the GPU processing time alone. Therefore, it is necessary to either include memory transfer overhead when reporting GPU performance, or to explain why this is not relevant for the application in question. We suggest a taxonomy for future CPU/GPU comparisons, and we argue that this is not only germane for reporting performance, but is important to heterogeneous scheduling research in general.

303 citations
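
The paper's point reduces to simple arithmetic; the numbers below are illustrative, not from the study.

```python
# The paper's point as arithmetic (illustrative numbers): report speedup
# end-to-end, including transfer time, not kernel time alone.

def speedups(t_cpu, t_kernel, t_transfer):
    kernel_only = t_cpu / t_kernel
    end_to_end = t_cpu / (t_kernel + t_transfer)
    return kernel_only, end_to_end

raw, real = speedups(t_cpu=100.0, t_kernel=2.0, t_transfer=18.0)
print(f"kernel-only: {raw:.0f}x, end-to-end: {real:.0f}x")   # 50x vs 5x
# Here kernel+transfer takes 10x the kernel time alone, inside the 2-50x
# inflation range the study reports across platforms.
```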


Journal ArticleDOI
TL;DR: Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution.
Abstract: In recent years, ad hoc parallel data processing has emerged as one of the killer applications for Infrastructure-as-a-Service (IaaS) clouds. Major cloud computing companies have started to integrate frameworks for parallel data processing in their product portfolios, making it easy for customers to access these services and to deploy their programs. However, the processing frameworks which are currently used have been designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for big parts of the submitted job and unnecessarily increase processing time and cost. In this paper, we discuss the opportunities and challenges for efficient parallel data processing in clouds and present our research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines which are automatically instantiated and terminated during the job execution. Based on this new framework, we perform extended evaluations of MapReduce-inspired processing jobs on an IaaS cloud system and compare the results to the popular data processing framework Hadoop.

Journal ArticleDOI
TL;DR: This paper surveys single-project, single-objective, deterministic project scheduling problems in which activities can be processed using a finite or infinite number of modes concerning resources of various categories and types.

Proceedings Article
15 Jun 2011
TL;DR: TimeGraph, a real-time GPU scheduler at the device-driver level for protecting important GPU workloads from performance interference, is presented; it supports two priority-based scheduling policies to address the tradeoff between response times and throughput introduced by the asynchronous and non-preemptive nature of GPU processing.
Abstract: The Graphics Processing Unit (GPU) is now commonly used for graphics and data-parallel computing. As more and more applications tend to accelerate on the GPU in multi-tasking environments where multiple tasks access the GPU concurrently, operating systems must provide prioritization and isolation capabilities in GPU resource management, particularly in real-time setups. We present TimeGraph, a real-time GPU scheduler at the device-driver level for protecting important GPU workloads from performance interference. TimeGraph adopts a new event-driven model that synchronizes the GPU with the CPU to monitor GPU commands issued from the user space and control GPU resource usage in a responsive manner. TimeGraph supports two priority-based scheduling policies in order to address the tradeoff between response times and throughput introduced by the asynchronous and non-preemptive nature of GPU processing. Resource reservation mechanisms are also employed to account and enforce GPU resource usage, which prevent misbehaving tasks from exhausting GPU resources. Prediction of GPU command execution costs is further provided to enhance isolation. Our experiments using OpenGL graphics benchmarks demonstrate that TimeGraph maintains the frame-rates of primary GPU tasks at the desired level even in the face of extreme GPU workloads, whereas these tasks become nearly unresponsive without TimeGraph support. Our findings also include that the performance overhead imposed by TimeGraph can be limited to 4-10%, and its event-driven scheduler improves throughput by about 30 times over the existing tick-driven scheduler.

Proceedings Article
15 Jun 2011
TL;DR: The effects on performance imposed by resource contention and remote access latency are quantified, and a new contention management algorithm is proposed and evaluated that significantly outperforms a previously proposed NUMA-unaware algorithm as well as the default Linux scheduler.
Abstract: On multicore systems, contention for shared resources occurs when memory-intensive threads are co-scheduled on cores that share parts of the memory hierarchy, such as last-level caches and memory controllers. Previous work investigated how contention could be addressed via scheduling. A contention-aware scheduler separates competing threads onto separate memory hierarchy domains to eliminate resource sharing and, as a consequence, to mitigate contention. However, all previous work on contention-aware scheduling assumed that the underlying system is UMA (uniform memory access latencies, single memory controller). Modern multicore systems, however, are NUMA, which means that they feature non-uniform memory access latencies and multiple memory controllers. We discovered that state-of-the-art contention management algorithms fail to be effective on NUMA systems and may even hurt performance relative to a default OS scheduler. In this paper we investigate the causes for this behavior and design the first contention-aware algorithm for NUMA systems.

Journal ArticleDOI
TL;DR: This paper presents HCOC: The Hybrid Cloud Optimized Cost scheduling algorithm, which decides which resources should be leased from the public cloud and aggregated to the private cloud to provide sufficient processing power to execute a workflow within a given execution time.
Abstract: Workflows have been used to represent a variety of applications involving high processing and storage demands. As a solution to supply this necessity, the cloud computing paradigm has emerged as an on-demand resource provider. While public clouds charge users on a per-use basis, private clouds are owned by users and can be utilized with no charge. When a public cloud and a private cloud are merged, we have what we call a hybrid cloud. In a hybrid cloud, the user has elasticity provided by public cloud resources that can be aggregated to the private resources pool as necessary. One question faced by the users in such systems is: which are the best resources to request from a public cloud, based on the current demand and on resource costs? In this paper we deal with this problem, presenting HCOC: the Hybrid Cloud Optimized Cost scheduling algorithm. HCOC decides which resources should be leased from the public cloud and aggregated to the private cloud to provide sufficient processing power to execute a workflow within a given execution time. We present extensive experimental and simulation results which show that HCOC can reduce costs while achieving the established desired execution time.

Journal ArticleDOI
TL;DR: The design of a quality-of-service (QoS) aware packet scheduler for real-time downlink communications is considered, and a novel two-level scheduling algorithm is conceived based on discrete-time linear control theory.
Abstract: Long-term evolution represents an emerging technology that promises broadband and ubiquitous Internet access. But several aspects have to be considered for providing effective multimedia services to mobile users. In particular, in this work, we consider the design of a quality-of-service (QoS) aware packet scheduler for real-time downlink communications. To this aim, a novel two-level scheduling algorithm is conceived. The upper level exploits an innovative approach based on discrete-time linear control theory. At the lower level, a proportional fair scheduler has been properly tailored to our purposes. The performance and the complexity of the proposed scheme have been evaluated both theoretically and by using simulations. A comparison with recently proposed scheduling strategies has also been presented, considering several network conditions and real-time multimedia flows. Particular attention has been devoted to the evaluation of the quality of experience (QoE) provided to end users. Results have clearly shown that the proposed approach is able to greatly outperform the existing ones, especially in the presence of real-time video flows.
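The upper-level idea, driving each flow's queue toward a target with a discrete-time feedback law, can be sketched with a simple proportional controller. The gain, target, and queue model below are assumptions for illustration, not the paper's actual control design.

```python
# Toy discrete-time queue controller in the spirit of the upper level
# (gain, target, and queue model are assumptions, not the paper's design):
# each interval, the drain rate is set proportionally to the queue error.

target, gain = 50.0, 0.5          # target queue length and gain, assumed

queue = 120.0
for arrivals in [30, 35, 25, 40, 30, 20]:
    drain = max(0.0, gain * (queue - target) + arrivals)  # feedforward arrivals
    queue = max(0.0, queue + arrivals - drain)
    print(f"drain {drain:6.2f} -> queue {queue:6.2f}")
# The queue settles toward the 50-packet target within a few intervals,
# which is what lets the scheduler bound queueing delay for real-time flows.
```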

Proceedings ArticleDOI
01 Dec 2011
TL;DR: This paper considers the minimum electricity cost scheduling problem of smart home appliances, in which the optimal power profile signal minimizes cost while satisfying technical operation constraints and consumer preferences.
Abstract: This paper considers the minimum electricity cost scheduling problem of smart home appliances. Operation characteristics, such as expected duration and peak power consumption of the smart appliances, can be adjusted through a power profile signal. The optimal power profile signal minimizes cost, while satisfying technical operation constraints and consumer preferences. Constraints such as enforcing uninterruptible and sequential operations are modeled in the proposed framework using mixed integer linear programming (MILP). Several realistic scenarios based on actual spot price are considered, and the numerical results provide insight into tariff design. Computational issues and extensions of the proposed scheduling framework are also discussed.
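For the uninterruptible-operation case, a brute-force search over start slots is an illustrative stand-in for the MILP formulation; the hourly prices below are assumed, not the paper's spot-price data.

```python
# Toy uninterruptible-appliance scheduling (the paper uses MILP; this
# exhaustive search over start slots is an illustrative stand-in).

PRICES = [0.12, 0.10, 0.08, 0.07, 0.09, 0.15, 0.22, 0.25]  # $/kWh, assumed

def best_start(duration, kw, earliest, latest):
    """Cheapest contiguous run of `duration` slots finishing by `latest`."""
    best = None
    for s in range(earliest, latest - duration + 2):
        cost = kw * sum(PRICES[s:s + duration])
        if best is None or cost < best[0]:
            best = (cost, s)
    return best

cost, start = best_start(duration=3, kw=2.0, earliest=0, latest=7)
print(f"start at slot {start}, cost ${cost:.2f}")  # slots 2-4: cheapest window
```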

01 Jan 2011
TL;DR: The experiments show that partitioned earliest-deadline first (EDF) scheduling is generally preferable in a hard real-time setting, whereas global and clustered EDF scheduling are effective in a soft real-time setting.
Abstract: With the widespread adoption of multicore architectures, multiprocessors are now a standard deployment platform for (soft) real-time applications. This dissertation addresses two questions fundamental to the design of multicore-ready real-time operating systems: (1) Which scheduling policies offer the greatest flexibility in satisfying temporal constraints; and (2) which locking algorithms should be used to avoid unpredictable delays? With regard to Question 1, LITMUSRT, a real-time extension of the Linux kernel, is presented and its design is discussed in detail. Notably, LITMUSRT implements link-based scheduling, a novel approach to controlling blocking due to non-preemptive sections. Each implemented scheduler (22 configurations in total) is evaluated under consideration of overheads on a 24-core Intel Xeon platform. The experiments show that partitioned earliest-deadline first (EDF) scheduling is generally preferable in a hard real-time setting, whereas global and clustered EDF scheduling are effective in a soft real-time setting. With regard to Question 2, real-time locking protocols are required to ensure that the maximum delay due to priority inversion can be bounded a priori. Several spinlock- and semaphore-based multiprocessor real-time locking protocols for mutual exclusion (mutex), reader-writer (RW) exclusion, and k-exclusion are proposed and analyzed. A new category of RW locks suited to worst-case analysis, termed phase-fair locks, is proposed and three efficient phase-fair spinlock implementations are provided (one with few atomic operations, one with low space requirements, and one with constant RMR complexity). Maximum priority-inversion blocking is proposed as a natural complexity measure for semaphore protocols. It is shown that there are two classes of schedulability analysis, namely suspension-oblivious and suspension-aware analysis, that yield two different lower bounds on blocking. Five asymptotically optimal locking protocols are designed and analyzed: a family of mutex, RW, and k-exclusion protocols for global, partitioned, and clustered scheduling that are asymptotically optimal in the suspension-oblivious case, and a mutex protocol for partitioned scheduling that is asymptotically optimal in the suspension-aware case. A LITMUSRT-based empirical evaluation is presented that shows these protocols to be practical.
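At admission time, the partitioned-EDF configuration favored for hard real-time reduces to packing task utilizations onto cores so each core can run uniprocessor EDF. A minimal first-fit-decreasing sketch with assumed utilizations (not the dissertation's code):

```python
# Admission-time sketch of partitioned EDF (utilizations assumed): pack
# task utilizations onto cores first-fit-decreasing; each core then runs
# plain uniprocessor EDF, which is optimal up to utilization 1.0.

def partition_first_fit(tasks, n_cores):
    cores = [[] for _ in range(n_cores)]
    load = [0.0] * n_cores
    for name, util in sorted(tasks, key=lambda t: -t[1]):   # heaviest first
        for c in range(n_cores):
            if load[c] + util <= 1.0:
                cores[c].append(name)
                load[c] += util
                break
        else:
            return None   # packing failed; global/clustered EDF may still work
    return cores

tasks = [("a", 0.6), ("b", 0.5), ("c", 0.4), ("d", 0.3)]
print(partition_first_fit(tasks, n_cores=2))
# -> [['a', 'c'], ['b', 'd']]: core 0 packed to exactly 1.0, core 1 to 0.8
```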

Proceedings ArticleDOI
01 Dec 2011
TL;DR: A distributed suboptimal joint mode selection and resource allocation scheme is proposed that performs close to the optimal scheme both in terms of resource efficiency and user fairness and is benchmarked with respect to the centralized optimal solution.
Abstract: Device-to-device (D2D) communications underlaying a cellular infrastructure has recently been proposed as a means of increasing the cellular capacity, improving the user throughput and extending the battery lifetime of user equipments by facilitating the reuse of spectrum resources between D2D and cellular links. In network assisted D2D communications, when two devices are in the proximity of each other, the network can not only help the devices to set the appropriate transmit power and schedule time and frequency resources but also to determine whether communication should take place via the direct D2D link (D2D mode) or via the cellular base station (cellular mode). In this paper we formulate the joint mode selection, scheduling and power control task as an optimization problem that we first solve assuming the availability of a central entity. We also propose a distributed suboptimal joint mode selection and resource allocation scheme that we benchmark with respect to the centralized optimal solution. We find that the distributed scheme performs close to the optimal scheme both in terms of resource efficiency and user fairness.

Proceedings ArticleDOI
09 Oct 2011
TL;DR: Empirical evaluation shows that RT-Xen can provide effective real-time scheduling to guest Linux operating systems at a 1ms quantum, while incurring only moderate overhead for all the fixed-priority server algorithms.
Abstract: As system integration becomes an increasingly important challenge for complex real-time systems, there has been a significant demand for supporting real-time systems in virtualized environments. This paper presents RT-Xen, the first real-time hypervisor scheduling framework for Xen, the most widely used open-source virtual machine monitor (VMM). RT-Xen bridges the gap between real-time scheduling theory and Xen, whose wide-spread adoption makes it an attractive platform for integrating a broad range of real-time and embedded systems. Moreover, RT-Xen provides an open-source platform for researchers and integrators to develop and evaluate real-time scheduling techniques, which to date have been studied predominantly via analysis and simulations. Extensive experimental results demonstrate the feasibility, efficiency, and efficacy of fixed-priority hierarchical real-time scheduling in RT-Xen. RT-Xen instantiates a suite of fixed-priority servers (Deferrable Server, Periodic Server, Polling Server, and Sporadic Server). While the server algorithms are not new, this empirical study represents the first comprehensive experimental comparison of these algorithms within the same virtualization platform. Our empirical evaluation shows that RT-Xen can provide effective real-time scheduling to guest Linux operating systems at a 1ms quantum, while incurring only moderate overhead for all the fixed-priority server algorithms. While more complex algorithms such as Sporadic Server do incur higher overhead, none of the overhead differences among different server algorithms are significant. Deferrable Server generally delivers better soft real-time performance than the other server algorithms, while Periodic Server incurs high deadline miss ratios in overloaded situations.

Journal ArticleDOI
TL;DR: In this paper, the minimization of transmission completion time for a given number of bits per user in an energy harvesting communication system, where energy harvesting instants are known in an offline manner is considered.
Abstract: The minimization of transmission completion time for a given number of bits per user in an energy harvesting communication system, where energy harvesting instants are known in an offline manner is considered. An achievable rate region with structural properties satisfied by the 2-user AWGN Broadcast Channel capacity region is assumed. It is shown that even though all data are available at the beginning, a non-negative amount of energy from each energy harvest is deferred for later use such that the transmit power starts at its lowest value and rises as time progresses. The optimal scheduler ends the transmission to both users at the same time. Exploiting the special structure in the problem, the iterative offline algorithm, FlowRight, from earlier literature, is adapted and proved to solve this problem. The solution has polynomial complexity in the number of harvests used, and is observed to converge quickly on numerical examples.

Book ChapterDOI
01 Jan 2011
TL;DR: The Intelligent Randomization In Scheduling (IRIS) system, a software scheduling assistant for the Federal Air Marshals who provide law enforcement aboard U.S. commercial flights, is implemented; it models the problem as a Stackelberg game, with FAMS as leaders that commit to a flight coverage schedule and terrorists as followers that attempt to attack a flight.
Abstract: Security is a concern of major importance to governments and companies throughout the world. With limited resources, complete coverage of potential points of attack is not possible. Deterministic allocation of available law enforcement agents introduces predictable vulnerabilities that can be exploited by adversaries. Strategic randomization is a game-theoretic alternative that we implement in the Intelligent Randomization In Scheduling (IRIS) system, a software scheduling assistant for the Federal Air Marshals (FAMs) who provide law enforcement aboard U.S. commercial flights. In IRIS, we model the problem as a Stackelberg game, with FAMS as leaders that commit to a flight coverage schedule and terrorists as followers that attempt to attack a flight. The FAMS domain presents three challenges unique to transportation network security that we address in the implementation of IRIS. First, with tens of thousands of commercial flights per day, the size of the Stackelberg game we need to solve is tremendous. We use ERASER-C, the fastest known algorithm for solving this class of Stackelberg games. Second, creating the game itself becomes a challenge due to the number of payoffs we must enter for these large games. To address this, we create an attribute-based preference elicitation system to determine reward values. Third, the complex scheduling constraints in transportation networks make it computationally prohibitive to model the game by explicitly modeling all combinations of valid schedules. Instead, we model the leader's strategy space by incorporating a representation of the underlying scheduling constraints. The scheduling assistant has been delivered to the FAMS and is currently undergoing testing and review for possible incorporation into their scheduling practices. In this paper, we discuss the design choices and challenges encountered during the implementation of IRIS.

Proceedings ArticleDOI
14 Jun 2011
TL;DR: This paper rethinks resource allocation and job scheduling for a data analytics system in the cloud, embracing the heterogeneity of the underlying platforms and workloads, and proposes a metric of share in a heterogeneous cluster to realize a scheduling scheme that achieves both high performance and fairness.
Abstract: Data analytics are key applications running in the cloud computing environment. To improve performance and cost-effectiveness of a data analytics cluster in the cloud, the data analytics system should account for heterogeneity of the environment and workloads. In addition, it also needs to provide fairness among jobs when multiple jobs share the cluster. In this paper, we rethink resource allocation and job scheduling on a data analytics system in the cloud to embrace the heterogeneity of the underlying platforms and workloads. To that end, we suggest an architecture to allocate resources to a data analytics cluster in the cloud, and propose a metric of share in a heterogeneous cluster to realize a scheduling scheme that achieves high performance and fairness.

Journal ArticleDOI
TL;DR: An efficient distributed algorithm is proposed that produces a collision-free schedule for data aggregation in WSNs and it is theoretically proved that the delay of the aggregation schedule generated by the algorithm is at most 16R + Δ - 14 time slots.
Abstract: Data aggregation is a key functionality in wireless sensor networks (WSNs). This paper focuses on the data aggregation scheduling problem to minimize the delay (or latency). We propose an efficient distributed algorithm that produces a collision-free schedule for data aggregation in WSNs. We theoretically prove that the delay of the aggregation schedule generated by our algorithm is at most 16R + Δ - 14 time slots. Here, R is the network radius and Δ is the maximum node degree in the communication graph of the original network. Our algorithm significantly improves the previously known best data aggregation algorithm with an upper bound of delay of 24D + 6Δ + 16 time slots, where D is the network diameter (note that D can be as large as 2R). We conduct extensive simulations to study the practical performance of our proposed data aggregation algorithm. Our simulation results corroborate our theoretical results and show that our algorithms perform better in practice. We prove that the overall lower bound of delay for data aggregation under any interference model is max{log n, R}, where n is the network size. We provide an example to show that the lower bound is (approximately) tight under the protocol interference model when rI = r, where rI is the interference range and r is the transmission range. We also derive the lower bound of delay under the protocol interference model when r < rI < 3r and when rI ≥ 3r.
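Plugging sample values into the two bounds quoted above shows the size of the improvement; R and Δ are assumed, and D is set to its worst case of 2R, as the abstract notes.

```python
# Sample arithmetic on the two delay bounds (R and Δ assumed; D = 2R
# is the worst case relative to the radius).

R, delta = 10, 8
D = 2 * R

new_bound = 16 * R + delta - 14       # this paper's schedule
old_bound = 24 * D + 6 * delta + 16   # previously best known
print(new_bound, old_bound)           # -> 154 vs 544 time slots
```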

Proceedings ArticleDOI
29 Nov 2011
TL;DR: A new task decomposition method is proposed that decomposes each parallel task into a set of sequential tasks and achieves resource augmentation bounds of 2.62 and 3.42 when the decomposed tasks are scheduled using global EDF and partitioned deadline monotonic scheduling, respectively.
Abstract: Multi-core processors offer a significant performance increase over single-core processors. Therefore, they have the potential to enable computation-intensive real-time applications with stringent timing constraints that cannot be met on traditional single-core processors. However, most results in traditional multiprocessor real-time scheduling are limited to sequential programming models and ignore intra-task parallelism. In this paper, we address the problem of scheduling periodic parallel tasks with implicit deadlines on multi-core processors. We first consider a synchronous task model where each task consists of segments, each segment having an arbitrary number of parallel threads that synchronize at the end of the segment. We propose a new task decomposition method that decomposes each parallel task into a set of sequential tasks. We prove that our task decomposition achieves a resource augmentation bound of 2.62 and 3.42 when the decomposed tasks are scheduled using global EDF and partitioned deadline monotonic scheduling, respectively. Finally, we extend our analysis to directed acyclic graph (DAG) tasks. We show how these tasks can be converted into synchronous tasks such that the same transformation can be applied and the same augmentation bounds hold.

Journal ArticleDOI
TL;DR: A new routing/scheduling back-pressure algorithm that not only guarantees network stability (throughput optimality), but also adaptively selects a set of optimal routes based on shortest-path information in order to minimize average path lengths between each source and destination pair is proposed.
Abstract: Back-pressure-type algorithms based on the algorithm by Tassiulas and Ephremides have recently received much attention for jointly routing and scheduling over multihop wireless networks. However, this approach has a significant weakness in routing because the traditional back-pressure algorithm explores and exploits all feasible paths between each source and destination. While this extensive exploration is essential in order to maintain stability when the network is heavily loaded, under light or moderate loads, packets may be sent over unnecessarily long routes, and the algorithm could be very inefficient in terms of end-to-end delay and routing convergence times. This paper proposes a new routing/scheduling back-pressure algorithm that not only guarantees network stability (throughput optimality), but also adaptively selects a set of optimal routes based on shortest-path information in order to minimize average path lengths between each source and destination pair. Our results indicate that under the traditional back-pressure algorithm, the end-to-end packet delay first decreases and then increases as a function of the network load (arrival rate). This surprising low-load behavior is explained by the fact that the traditional back-pressure algorithm exploits all paths (including very long ones) even when the traffic load is light. On the other hand, the proposed algorithm adaptively selects a set of routes according to the traffic load, so that long paths are used only when necessary, thus resulting in much smaller end-to-end packet delays as compared to the traditional back-pressure algorithm.
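The baseline rule the paper refines is easy to state: each link serves the commodity with the largest queue-backlog differential, so packets flow "downhill" along queue gradients. A sketch with assumed queue values (the paper's contribution, biasing this choice with shortest-path information, is not reproduced here):

```python
# Classic back-pressure weight computation (queue backlogs are assumed).
# queues[node][commodity] = backlog; each link picks the commodity with
# the largest positive backlog differential.

queues = {"s": {"d1": 9, "d2": 4}, "a": {"d1": 5, "d2": 1},
          "b": {"d1": 2, "d2": 6}, "d1": {"d1": 0}, "d2": {"d2": 0}}
links = [("s", "a"), ("s", "b"), ("a", "d1"), ("b", "d2")]

def backpressure_weights():
    for u, v in links:
        best = max(queues[u],
                   key=lambda c: queues[u][c] - queues[v].get(c, 0))
        w = max(0, queues[u][best] - queues[v].get(best, 0))
        print(f"{u}->{v}: serve {best}, weight {w}")

backpressure_weights()
# Scheduling then activates a non-interfering link set maximizing total
# weight; with no path-length bias, long detours can carry traffic too.
```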

Journal ArticleDOI
TL;DR: A hybrid Pareto-based discrete artificial bee colony algorithm is presented for solving the multi-objective flexible job shop scheduling problem; comparisons with other recently published algorithms show its efficiency and effectiveness.
Abstract: This paper presents a hybrid Pareto-based discrete artificial bee colony algorithm for solving the multi-objective flexible job shop scheduling problem. In the hybrid algorithm, each solution corresponds to a food source, which is composed of two components, i.e., the routing component and the scheduling component. Each component is filled with discrete values. A crossover operator is developed for the employed bees to learn valuable information from each other. An external Pareto archive set is designed to record the non-dominated solutions found so far. A fast Pareto set update function is introduced in the algorithm. Several local search approaches are designed to balance the exploration and exploitation capability of the algorithm. Experimental results on the well-known benchmark instances and comparisons with other recently published algorithms show the efficiency and effectiveness of the proposed algorithm.
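The Pareto-archive bookkeeping the abstract mentions rests on the standard dominance test. A minimal sketch with assumed two-objective values (e.g., makespan and total workload, both minimized):

```python
# Sketch of Pareto-archive maintenance (objective vectors assumed; both
# objectives minimized, e.g. makespan and total workload).

def dominates(a, b):
    """a dominates b iff a is no worse in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, candidate):
    if candidate in archive or any(dominates(k, candidate) for k in archive):
        return archive                    # duplicate or dominated: discard
    archive = [s for s in archive if not dominates(candidate, s)]
    archive.append(candidate)             # archive stays mutually non-dominated
    return archive

archive = []
for sol in [(10, 7), (9, 9), (8, 8), (12, 5), (8, 8)]:
    archive = update_archive(archive, sol)
print(archive)   # -> [(10, 7), (8, 8), (12, 5)]: the non-dominated front so far
```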

Proceedings ArticleDOI
26 Oct 2011
TL;DR: This paper develops methodologies for incorporating task placement constraints and machine properties into performance benchmarks of large compute clusters, and provides a simple model of the performance impact of constraints in which task scheduling delays increase with the Utilization Multiplier (UM).
Abstract: Evaluating the performance of large compute clusters requires benchmarks with representative workloads. At Google, performance benchmarks are used to obtain performance metrics such as task scheduling delays and machine resource utilizations to assess changes in application codes, machine configurations, and scheduling algorithms. Existing approaches to workload characterization for high performance computing and grids focus on task resource requirements for CPU, memory, disk, I/O, network, etc. Such resource requirements address how much resource is consumed by a task. However, in addition to resource requirements, Google workloads commonly include task placement constraints that determine which machine resources are consumed by tasks. Task placement constraints arise because of task dependencies such as those related to hardware architecture and kernel version. This paper develops methodologies for incorporating task placement constraints and machine properties into performance benchmarks of large compute clusters. Our studies of Google compute clusters show that constraints increase average task scheduling delays by a factor of 2 to 6, which often results in tens of minutes of additional task wait time. To understand why, we extend the concept of resource utilization to include constraints by introducing a new metric, the Utilization Multiplier (UM). UM is the ratio of the resource utilization seen by tasks with a constraint to the average utilization of the resource. UM provides a simple model of the performance impact of constraints in that task scheduling delays increase with UM. Last, we describe how to synthesize representative task constraints and machine properties, and how to incorporate this synthesis into existing performance benchmarks. Using synthetic task constraints and machine properties generated by our methodology, we accurately reproduce performance metrics for benchmarks of Google compute clusters with a discrepancy of only 13% in task scheduling delay and 5% in resource utilization.
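The UM metric itself is a one-line computation, per the abstract's definition; the machine fleet and the "needs SSD" constraint below are assumed for illustration.

```python
# Utilization Multiplier per the abstract: UM = utilization of a resource
# as seen by tasks with a constraint / average utilization of the resource.
# Fleet data below is hypothetical.

machines = [  # (cpu_utilization, has_ssd)
    (0.90, True), (0.85, True), (0.40, False), (0.35, False), (0.50, False),
]

avg_util = sum(u for u, _ in machines) / len(machines)
eligible = [u for u, ssd in machines if ssd]   # machines satisfying "needs SSD"
um = (sum(eligible) / len(eligible)) / avg_util

print(f"average utilization {avg_util:.2f}, UM for 'needs SSD' = {um:.2f}")
# UM ~ 1.46 here: constrained tasks only see the hotter machines, so their
# scheduling delay grows with UM, matching the paper's observation.
```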

Proceedings ArticleDOI
10 Apr 2011
TL;DR: System-level simulation results show that coordination at the transmission strategy and resource allocation level can already significantly improve the overall network throughput as compared to a conventional network design with fixed transmit power and per-cell zero-forcing beamforming.
Abstract: The mitigation of intercell interference is a central issue for future-generation wireless cellular networks, where frequencies are reused aggressively and where hierarchical cellular structures may heavily overlap. The paper examines the benefit of coordinating transmission strategies and resource allocation schemes across multiple cells for interference mitigation. For a multicell network serving multiple users per cell sector and where both the base stations and the remote users are equipped with multiple antennas, this paper proposes a joint proportionally fair scheduling, spatial multiplexing, and power spectrum adaptation method that coordinates multiple base stations with an objective of optimizing the overall network utility. The proposed scheme optimizes the user schedule, transmit and receive beamforming vectors, and transmit power spectra jointly, while taking into consideration both the intercell and intracell interference and the fairness among the users. The proposed system is shown to significantly improve the overall network throughput while maintaining fairness, as compared to a conventional network with per-cell zero-forcing beamforming and a fixed transmit power spectrum. The proposed system goes toward the vision of a fully coordinated multicell network, whereby transmission strategies and resource allocation schemes (rather than transmit signals) are coordinated across the base stations as a first step.

Journal Article
TL;DR: A Load Balanced Min-Min (LBMM) algorithm is proposed that reduces the makespan and increases resource utilization in grid computing; the proposed method has two phases.
Abstract: Grid computing has become a real alternative to traditional supercomputing environments for developing parallel applications that harness massive computational resources. However, the complexity incurred in building such parallel Grid-aware applications is higher than in traditional parallel computing environments, raising issues such as resource discovery, heterogeneity, fault tolerance, and task scheduling. Load-balanced task scheduling is a very important problem in complex grid environments, and task scheduling, which is an NP-complete problem, has become a focus of research in the grid computing area. The traditional Min-Min algorithm is a simple algorithm that produces schedules with a smaller makespan than the other traditional algorithms in the literature, but it fails to produce a load-balanced schedule. In this paper, a Load Balanced Min-Min (LBMM) algorithm is proposed that reduces the makespan and increases resource utilization. The proposed method has two phases: in the first phase, the traditional Min-Min algorithm is executed, and in the second phase, the tasks are rescheduled to use the unutilized resources effectively.
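
A simplified sketch of the two phases; the ETC matrix and the single-move rebalancing rule are illustrative assumptions, not the paper's exact pseudocode.

```python
# Toy LBMM: phase 1 is classic Min-Min; phase 2 moves a task off the most
# loaded resource when that strictly lowers the makespan.

etc = [[2, 4], [2, 4], [2, 4], [2, 4], [10, 12]]  # etc[task][resource], assumed
n_res = 2

def min_min(etc):
    """Phase 1: classic Min-Min on expected completion times."""
    ready = [0.0] * n_res
    assign, todo = {}, set(range(len(etc)))
    while todo:
        # Pick the unscheduled task with the smallest minimum completion time.
        t, r, ct = min(((t, r, ready[r] + etc[t][r])
                        for t in todo for r in range(n_res)),
                       key=lambda x: x[2])
        assign[t], ready[r] = r, ct
        todo.remove(t)
    return assign, ready

def rebalance(etc, assign, ready):
    """Phase 2 (simplified): one improving move off the heaviest resource."""
    heavy = max(range(n_res), key=lambda r: ready[r])
    for t in [t for t, r in assign.items() if r == heavy]:
        for r in range(n_res):
            if r != heavy and ready[r] + etc[t][r] < ready[heavy]:
                ready[heavy] -= etc[t][heavy]
                ready[r] += etc[t][r]
                assign[t] = r
                return assign, ready
    return assign, ready

assign, ready = min_min(etc)
print("Min-Min makespan:", max(ready))   # 16: small tasks pile onto resource 0
assign, ready = rebalance(etc, assign, ready)
print("LBMM makespan   :", max(ready))   # 14: one small task moved to the idle one
```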