
Showing papers on "Scheduling (computing)" published in 2010


Book
20 Sep 2010
TL;DR: In this book, the authors present a modern theory of analysis, control, and optimization for dynamic networks, including wireless networks with time-varying channels, mobility, and randomly arriving traffic.
Abstract: This text presents a modern theory of analysis, control, and optimization for dynamic networks. Mathematical techniques of Lyapunov drift and Lyapunov optimization are developed and shown to enable constrained optimization of time averages in general stochastic systems. The focus is on communication and queueing systems, including wireless networks with time-varying channels, mobility, and randomly arriving traffic. A simple drift-plus-penalty framework is used to optimize time averages such as throughput, throughput-utility, power, and distortion. Explicit performance-delay tradeoffs are provided to illustrate the cost of approaching optimality. This theory is also applicable to problems in operations research and economics, where energy-efficient and profit-maximizing decisions must be made without knowing the future. Topics in the text include the following:
- Queue stability theory
- Backpressure, max-weight, and virtual queue methods
- Primal-dual methods for non-convex stochastic utility maximization
- Universal scheduling theory for arbitrary sample paths
- Approximate and randomized scheduling theory
- Optimization of renewal systems and Markov decision systems
Detailed examples and numerous problem set questions are provided to reinforce the main concepts. Table of Contents: Introduction / Introduction to Queues / Dynamic Scheduling Example / Optimizing Time Averages / Optimizing Functions of Time Averages / Approximate Scheduling / Optimization of Renewal Systems / Conclusions

1,781 citations
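To make the drift-plus-penalty idea concrete, the standard decision rule from this literature can be sketched as follows. Notation is assumed here (backlogs Q_i(t), penalty p(t), tradeoff weight V); this is a generic sketch, not an excerpt from the book:

```latex
% Drift-plus-penalty decision rule (standard form; a sketch).
L(t) = \tfrac{1}{2} \sum_i Q_i(t)^2, \qquad
\Delta(t) = \mathbb{E}\bigl[ L(t+1) - L(t) \,\big|\, Q(t) \bigr].
% Each slot, choose the control action that minimizes a bound on
\Delta(t) + V \, \mathbb{E}\bigl[ p(t) \,\big|\, Q(t) \bigr],
% where p(t) is the penalty (e.g., power) and V \ge 0 trades an O(1/V)
% optimality gap against O(V) average queue backlog.
```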


Journal ArticleDOI
TL;DR: A detailed and wide-ranging overview of recent operational research on operating room planning and scheduling is provided, organized to help readers identify the manuscripts most relevant to their specific interests.

1,099 citations


Journal ArticleDOI
TL;DR: An overview of various extensions of the basic RCPSP is given, including popular variants and extensions such as multiple modes, minimal and maximal time lags, and net present value-based objectives.

856 citations


Proceedings ArticleDOI
20 Apr 2010
TL;DR: This paper presents a particle swarm optimization (PSO) based heuristic that schedules applications to cloud resources while taking into account both computation cost and data transmission cost, and shows that PSO can achieve as much as three times the cost savings of BRS.
Abstract: Cloud computing environments facilitate applications by providing virtualized resources that can be provisioned dynamically. However, users are charged on a pay-per-use basis. User applications may incur large data retrieval and execution costs when they are scheduled taking into account only the ‘execution time’. In addition to optimizing execution time, the cost arising from data transfers between resources as well as execution costs must also be taken into account. In this paper, we present a particle swarm optimization (PSO) based heuristic to schedule applications to cloud resources that takes into account both computation cost and data transmission cost. We experiment with a workflow application by varying its computation and communication costs. We compare the cost savings when using PSO and existing ‘Best Resource Selection’ (BRS) algorithm. Our results show that PSO can achieve: a) as much as 3 times cost savings as compared to BRS, and b) good distribution of workload onto resources.

837 citations
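The following is a minimal, self-contained sketch of the kind of PSO-based task-to-resource mapping the abstract describes. All costs, matrices, and parameters are invented for illustration; the authors' actual heuristic and workflow model differ in detail.

```python
# PSO sketch: map N_TASKS workflow tasks onto N_RES cloud resources,
# minimizing computation cost plus inter-task data-transfer cost.
import random

N_TASKS, N_RES, N_PART, ITERS = 5, 3, 20, 100
random.seed(1)
compute = [[random.uniform(1, 10) for _ in range(N_RES)] for _ in range(N_TASKS)]
link = [[0.0 if i == j else random.uniform(0.1, 1.0) for j in range(N_RES)]
        for i in range(N_RES)]                      # per-unit transfer cost
data = [[random.uniform(0.0, 5.0) for _ in range(N_TASKS)] for _ in range(N_TASKS)]

def decode(pos):
    # Round each continuous coordinate to a resource index.
    return [min(N_RES - 1, max(0, int(round(x)))) for x in pos]

def cost(pos):
    a = decode(pos)
    c = sum(compute[t][a[t]] for t in range(N_TASKS))          # computation cost
    c += sum(data[i][j] * link[a[i]][a[j]]                     # transfer cost
             for i in range(N_TASKS) for j in range(N_TASKS))
    return c

# Standard PSO update: velocity blends inertia, personal best, and global best.
w, c1, c2 = 0.7, 1.5, 1.5
pos = [[random.uniform(0, N_RES - 1) for _ in range(N_TASKS)] for _ in range(N_PART)]
vel = [[0.0] * N_TASKS for _ in range(N_PART)]
pbest = [p[:] for p in pos]
gbest = min(pbest, key=cost)

for _ in range(ITERS):
    for p in range(N_PART):
        for d in range(N_TASKS):
            vel[p][d] = (w * vel[p][d]
                         + c1 * random.random() * (pbest[p][d] - pos[p][d])
                         + c2 * random.random() * (gbest[d] - pos[p][d]))
            pos[p][d] += vel[p][d]
        if cost(pos[p]) < cost(pbest[p]):
            pbest[p] = pos[p][:]
    gbest = min(pbest + [gbest], key=cost)

print("best mapping:", decode(gbest), "cost:", round(cost(gbest), 2))
```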


Journal ArticleDOI
TL;DR: This work improves the basic formulation of cooperative PSO by introducing stochastic repulsion among the particles, and uses the improved algorithm to schedule all DER simultaneously in order to investigate the potential consumer value added by coordinated DER scheduling.
Abstract: We describe algorithmic enhancements to a decision-support tool that residential consumers can utilize to optimize their acquisition of electrical energy services. The decision-support tool optimizes energy services provision by enabling end users to first assign values to desired energy services, and then scheduling their available distributed energy resources (DER) to maximize net benefits. We chose particle swarm optimization (PSO) to solve the corresponding optimization problem because of its straightforward implementation and demonstrated ability to generate near-optimal schedules within manageable computation times. We improve the basic formulation of cooperative PSO by introducing stochastic repulsion among the particles. The improved DER schedules are then used to investigate the potential consumer value added by coordinated DER scheduling. This is computed by comparing the end-user costs obtained with the enhanced algorithm simultaneously scheduling all DER, against the costs when each DER schedule is solved separately. This comparison enables the end users to determine whether their mix of energy service needs, available DER and electricity tariff arrangements might warrant solving the more complex coordinated scheduling problem, or instead, decomposing the problem into multiple simpler optimizations.

824 citations


Posted Content
TL;DR: This work develops optimal off-line scheduling policies which minimize the time by which all packets are delivered to the destination, under causality constraints on both data and energy arrivals.
Abstract: We consider the optimal packet scheduling problem in a single-user energy harvesting wireless communication system. In this system, both the data packets and the harvested energy are modeled to arrive at the source node randomly. Our goal is to adaptively change the transmission rate according to the traffic load and available energy, such that the time by which all packets are delivered is minimized. Under a deterministic system setting, we assume that the energy harvesting times and harvested energy amounts are known before the transmission starts. For the data traffic arrivals, we consider two different scenarios. In the first scenario, we assume that all bits have arrived and are ready at the transmitter before the transmission starts. In the second scenario, we consider the case where packets arrive during the transmissions, with known arrival times and sizes. We develop optimal off-line scheduling policies which minimize the time by which all packets are delivered to the destination, under causality constraints on both data and energy arrivals.

747 citations
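The causality constraints the abstract refers to can be stated compactly. The following is a sketch under assumed notation: energy amounts E_k harvested at times s_k, packet sizes B_j arriving at times a_j, transmission rate r(t), and transmit power p(r) an increasing function of rate.

```latex
\min_{r(\cdot)}\; T \quad \text{s.t.} \quad
\int_0^{t} p\bigl(r(\tau)\bigr)\, d\tau \;\le\; \sum_{k:\, s_k \le t} E_k
  \quad \forall t \quad \text{(energy causality)},
\qquad
\int_0^{t} r(\tau)\, d\tau \;\le\; \sum_{j:\, a_j \le t} B_j
  \quad \forall t \quad \text{(data causality)},
\qquad
\int_0^{T} r(\tau)\, d\tau \;=\; \sum_j B_j
  \quad \text{(all bits delivered by } T\text{)}.
```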


Journal ArticleDOI
01 Jun 2010
TL;DR: An adaptive carrier sense multiple access (CSMA) scheduling algorithm that can achieve the maximal throughput distributively and is combined with congestion control to achieve the optimal utility and fairness of competing flows.
Abstract: In multihop wireless networks, designing distributed scheduling algorithms to achieve the maximal throughput is a challenging problem because of the complex interference constraints among different links. Traditional maximal-weight scheduling (MWS), although throughput-optimal, is difficult to implement in distributed networks. On the other hand, a distributed greedy protocol similar to IEEE 802.11 does not guarantee the maximal throughput. In this paper, we introduce an adaptive carrier sense multiple access (CSMA) scheduling algorithm that can achieve the maximal throughput distributively. Some of the major advantages of the algorithm are that it applies to a very general interference model and that it is simple, distributed, and asynchronous. Furthermore, the algorithm is combined with congestion control to achieve the optimal utility and fairness of competing flows. Simulations verify the effectiveness of the algorithm. Also, the adaptive CSMA scheduling is a modular MAC-layer algorithm that can be combined with various protocols in the transport layer and network layer. Finally, the paper explores some implementation issues in the setting of 802.11 networks.

697 citations
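A toy discrete-time simulation conveys the flavor of queue-length-based adaptive CSMA (Glauber-dynamics style). The conflict graph, arrival rates, and the log(1 + queue) aggressiveness function below are illustrative assumptions, not the paper's exact algorithm.

```python
# Toy adaptive CSMA: one link re-decides per slot; it transmits with a
# probability that grows with its backlog, provided no neighbor is active.
import math, random

random.seed(0)
conflicts = {0: {1}, 1: {0, 2}, 2: {1}}     # hypothetical conflict graph: 0-1-2
arrival = {0: 0.3, 1: 0.2, 2: 0.3}          # packet arrival probability per slot
queues = {l: 0 for l in conflicts}
active = {l: False for l in conflicts}

for _ in range(100000):
    for l in queues:                         # Bernoulli arrivals
        if random.random() < arrival[l]:
            queues[l] += 1
    l = random.choice(list(conflicts))       # one link re-decides per slot
    if any(active[m] for m in conflicts[l]):
        active[l] = False                    # carrier sensed busy: stay silent
    else:
        w = math.log(1 + queues[l])          # aggressiveness grows with backlog
        active[l] = random.random() < math.exp(w) / (1 + math.exp(w))
    for m in conflicts:                      # active links each serve one packet
        if active[m] and queues[m] > 0:
            queues[m] -= 1

print(queues)                                # inspect backlogs after the run
```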


Journal ArticleDOI
13 Mar 2010
TL;DR: This study is the first to provide a comprehensive analysis of contention-mitigating techniques that use only scheduling; it identifies a classification scheme that addresses not only contention for cache space but also contention for other shared resources, such as the memory controller, memory bus, and prefetching hardware.
Abstract: Contention for shared resources on multicore processors remains an unsolved problem in existing systems despite significant research efforts dedicated to this problem in the past. Previous solutions focused primarily on hardware techniques and software page coloring to mitigate this problem. Our goal is to investigate how and to what extent contention for shared resources can be mitigated via thread scheduling. Scheduling is an attractive tool, because it does not require extra hardware and is relatively easy to integrate into the system. Our study is the first to provide a comprehensive analysis of contention-mitigating techniques that use only scheduling. The most difficult part of the problem is to find a classification scheme for threads, which would determine how they affect each other when competing for shared resources. We provide a comprehensive analysis of such classification schemes using a newly proposed methodology that makes it possible to evaluate these schemes separately from the scheduling algorithm itself and to compare them to the optimal. As a result of this analysis we discovered a classification scheme that addresses not only contention for cache space, but also contention for other shared resources, such as the memory controller, memory bus and prefetching hardware. To show the applicability of our analysis we design a new scheduling algorithm, which we prototype at user level, and demonstrate that it performs within 2% of the optimal. We also conclude that the highest impact of contention-aware scheduling techniques is not in improving performance of a workload as a whole but in improving quality of service or performance isolation for individual applications.

532 citations
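As a rough illustration of scheduling-only contention mitigation, the following sketch pairs memory-intensive threads with non-intensive ones, using last-level-cache miss rate as the classification metric. Thread names and miss rates are made up, and the schemes evaluated in the paper are more nuanced.

```python
# Miss-rate-based pairing heuristic: co-schedule heavy and light threads so
# each shared cache receives a roughly even total "intensity".
def pair_threads(miss_rates):
    ranked = sorted(miss_rates, key=miss_rates.get, reverse=True)
    # Pair heaviest with lightest, 2nd-heaviest with 2nd-lightest, etc.
    return [(ranked[i], ranked[-(i + 1)]) for i in range(len(ranked) // 2)]

threads = {"mcf": 42.1, "milc": 30.5, "povray": 0.3, "namd": 1.2}  # misses/1k instr.
print(pair_threads(threads))   # -> [('mcf', 'povray'), ('milc', 'namd')]
```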


Proceedings ArticleDOI
01 Apr 2010
TL;DR: It is shown that the implementation of least-attained-service thread prioritization reduces the time the cores spend stalling and significantly improves system throughput, and ATLAS's performance benefit increases as the number of cores increases.
Abstract: Modern chip multiprocessor (CMP) systems employ multiple memory controllers to control access to main memory. The scheduling algorithm employed by these memory controllers has a significant effect on system throughput, so choosing an efficient scheduling algorithm is important. The scheduling algorithm also needs to be scalable: as the number of cores increases, the number of memory controllers shared by the cores should also increase to provide sufficient bandwidth to feed the cores. Unfortunately, previous memory scheduling algorithms are inefficient with respect to system throughput and/or are designed for a single memory controller and do not scale well to multiple memory controllers, requiring significant fine-grained coordination among controllers. This paper proposes ATLAS (Adaptive per-Thread Least-Attained-Service memory scheduling), a fundamentally new memory scheduling technique that improves system throughput without requiring significant coordination among memory controllers. The key idea is to periodically order threads based on the service they have attained from the memory controllers so far, and prioritize those threads that have attained the least service over others in each period. The idea of favoring threads with least-attained-service is borrowed from the queueing theory literature, where, in the context of a single-server queue, it is known that least-attained-service optimally schedules jobs, assuming a Pareto (or any decreasing hazard rate) workload distribution. After verifying that our workloads have this characteristic, we show that our implementation of least-attained-service thread prioritization reduces the time the cores spend stalling and significantly improves system throughput. Furthermore, since the periods over which we accumulate the attained service are long, the controllers coordinate very infrequently to form the ordering of threads, thereby making ATLAS scalable to many controllers. We evaluate ATLAS on a wide variety of multiprogrammed SPEC 2006 workloads and systems with 4–32 cores and 1–16 memory controllers, and compare its performance to five previously proposed scheduling algorithms. Averaged over 32 workloads on a 24-core system with 4 controllers, ATLAS improves instruction throughput by 10.8%, and system throughput by 8.4%, compared to PAR-BS, the best previous CMP memory scheduling algorithm. ATLAS's performance benefit increases as the number of cores increases.

439 citations
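A minimal sketch of the least-attained-service ranking at the heart of ATLAS, assuming an exponentially smoothed attained-service counter per thread (the smoothing constant and interface are illustrative, not the paper's exact definitions):

```python
# Rank threads so the one with least attained service gets highest
# memory-request priority in the next quantum.
def atlas_rank(attained, quantum_service, alpha=0.875):
    for t, s in quantum_service.items():
        # Smooth each thread's total attained service across quanta.
        attained[t] = alpha * attained[t] + (1 - alpha) * s
    return sorted(attained, key=attained.get)   # least attained service first

attained = {"A": 0.0, "B": 0.0, "C": 0.0}
print(atlas_rank(attained, {"A": 120, "B": 15, "C": 60}))  # -> ['B', 'C', 'A']
```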


Journal ArticleDOI
TL;DR: A distributed algorithm based on distributed coloring of the nodes is proposed, which increases the delay by a factor of 10–70 over centralized algorithms for 1000 nodes; upper bounds for these schedules are obtained as a function of the total number of packets generated in the network.
Abstract: Algorithms for scheduling TDMA transmissions in multi-hop networks usually determine the smallest length conflict-free assignment of slots in which each link or node is activated at least once. This is based on the assumption that there are many independent point-to-point flows in the network. In sensor networks, however, data are often transferred from the sensor nodes to a few central data collectors. The scheduling problem is therefore to determine the smallest length conflict-free assignment of slots during which the packets generated at each node reach their destination. The conflicting node transmissions are determined based on an interference graph, which may be different from the connectivity graph due to the broadcast nature of wireless transmissions. We show that this problem is NP-complete. We first propose two centralized heuristic algorithms: one based on direct scheduling of the nodes, or node-based scheduling, which is adapted from classical multi-hop scheduling algorithms for general ad hoc networks, and the other based on scheduling the levels in the routing tree before scheduling the nodes, or level-based scheduling, which is a novel scheduling algorithm for many-to-one communication in sensor networks. The performance of these algorithms depends on the distribution of the nodes across the levels. We then propose a distributed algorithm based on the distributed coloring of the nodes, which increases the delay by a factor of 10–70 over centralized algorithms for 1000 nodes. We also obtain upper bounds for these schedules as a function of the total number of packets generated in the network.

381 citations
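A simple greedy, conflict-free slot assignment over an interference graph illustrates the basic building block behind such schedules. The graph and node ordering below are invented; the paper's node- and level-based heuristics add routing-tree structure on top.

```python
# Greedy slot assignment: give each node the smallest slot not already used
# by any interfering node (classic greedy graph coloring).
def assign_slots(interfere, order):
    slot = {}
    for v in order:
        used = {slot[u] for u in interfere[v] if u in slot}
        s = 0
        while s in used:
            s += 1
        slot[v] = s
    return slot

interfere = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(assign_slots(interfere, order=[0, 1, 2, 3]))  # -> {0: 0, 1: 1, 2: 2, 3: 0}
```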


Journal ArticleDOI
TL;DR: In this paper, a survey of deterministic scheduling problems with availability constraints motivated by preventive maintenance is presented; complexity results, exact algorithms, and approximation algorithms for single machine, parallel machine, flow shop, open shop, and job shop scheduling environments with different criteria are briefly surveyed.

Proceedings ArticleDOI
04 Dec 2010
TL;DR: This paper presents a new memory scheduling algorithm that addresses system throughput and fairness separately with the goal of achieving the best of both, and evaluates TCM on a wide variety of multiprogrammed workloads, comparing its performance to four previously proposed scheduling algorithms and finding that TCM achieves both the best system throughput and fairness.
Abstract: In a modern chip-multiprocessor system, memory is a shared resource among multiple concurrently executing threads. The memory scheduling algorithm should resolve memory contention by arbitrating memory access in such a way that competing threads progress at a relatively fast and even pace, resulting in high system throughput and fairness. Previously proposed memory scheduling algorithms are predominantly optimized for only one of these objectives: no scheduling algorithm provides the best system throughput and best fairness at the same time. This paper presents a new memory scheduling algorithm that addresses system throughput and fairness separately with the goal of achieving the best of both. The main idea is to divide threads into two separate clusters and employ different memory request scheduling policies in each cluster. Our proposal, Thread Cluster Memory scheduling (TCM), dynamically groups threads with similar memory access behavior into either the latency-sensitive (memory-non-intensive) or the bandwidth-sensitive (memory-intensive) cluster. TCM introduces three major ideas for prioritization: 1) we prioritize the latency-sensitive cluster over the bandwidth-sensitive cluster to improve system throughput, 2) we introduce a "niceness" metric that captures a thread's propensity to interfere with other threads, and 3) we use niceness to periodically shuffle the priority order of the threads in the bandwidth-sensitive cluster to provide fair access to each thread in a way that reduces inter-thread interference. On the one hand, prioritizing memory-non-intensive threads significantly improves system throughput without degrading fairness, because such "light" threads only use a small fraction of the total available memory bandwidth. On the other hand, shuffling the priority order of memory-intensive threads improves fairness because it ensures no thread is disproportionately slowed down or starved. We evaluate TCM on a wide variety of multiprogrammed workloads and compare its performance to four previously proposed scheduling algorithms, finding that TCM achieves both the best system throughput and fairness. Averaged over 96 workloads on a 24-core system with 4 memory channels, TCM improves system throughput and reduces maximum slowdown by 4.6%/38.6% compared to ATLAS (previous work providing the best system throughput) and 7.6%/4.6% compared to PAR-BS (previous work providing the best fairness).
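A sketch of TCM's clustering step might look as follows; the 10% threshold rule and MPKI-based intensity measure are illustrative assumptions rather than the paper's exact definitions.

```python
# Split threads into a latency-sensitive (memory-non-intensive) cluster and a
# bandwidth-sensitive (memory-intensive) cluster by cumulative intensity.
def cluster_threads(mpki, latency_share=0.10):
    total = sum(mpki.values())
    latency, bandwidth, acc = [], [], 0.0
    for t in sorted(mpki, key=mpki.get):        # lightest threads first
        acc += mpki[t]
        (latency if acc <= latency_share * total else bandwidth).append(t)
    return latency, bandwidth

mpki = {"gcc": 1.0, "mcf": 40.0, "lbm": 30.0, "povray": 0.2}
print(cluster_threads(mpki))   # -> (['povray', 'gcc'], ['lbm', 'mcf'])
```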

Journal ArticleDOI
TL;DR: The paper reveals the complexity of the scheduling problem in Computational Grids when compared to scheduling in classical parallel and distributed systems and shows the usefulness of heuristic and meta-heuristic approaches for the design of efficient Grid schedulers.

Proceedings ArticleDOI
05 Jul 2010
TL;DR: This work analyzes and proposes a binary integer program formulation of the scheduling problem and finds that this approach results in a tractable solution for scheduling applications in the public cloud, but that the same method becomes much less feasible in a hybrid cloud setting due to very high solve time variances.
Abstract: With the recent emergence of public cloud offerings, surge computing (outsourcing tasks from an internal data center to a cloud provider in times of heavy load) has become more accessible to a wide range of consumers. Deciding which workloads to outsource to what cloud provider in such a setting, however, is far from trivial. The objective of this decision is to maximize the utilization of the internal data center and to minimize the cost of running the outsourced tasks in the cloud, while fulfilling the applications' quality of service constraints. We examine this optimization problem in a multi-provider hybrid cloud setting with deadline-constrained and preemptible but non-provider-migratable workloads that are characterized by memory, CPU and data transmission requirements. Linear programming is a general technique to tackle such an optimization problem. At present, it is however unclear whether this technique is suitable for the problem at hand and what the performance implications of its use are. We therefore analyze and propose a binary integer program formulation of the scheduling problem and evaluate the computational costs of this technique with respect to the problem's key parameters. We found that this approach results in a tractable solution for scheduling applications in the public cloud, but that the same method becomes much less feasible in a hybrid cloud setting due to very high solve time variances.
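A hedged sketch of what such a binary integer program can look like, written with the PuLP modeling library. Task costs, CPU demands, and capacities are invented; the paper's formulation also covers deadlines, preemption, and data transmission.

```python
# Binary integer program: assign each task to the internal data center or a
# cloud provider, minimizing run cost subject to CPU capacity.
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary

tasks, providers = range(4), range(2)            # provider 0 = internal DC
cost = [[0, 5], [0, 7], [0, 4], [0, 6]]          # run cost per task/provider
cpu = [2, 3, 1, 2]                               # CPU demand per task
capacity = [4, 100]                              # internal DC is the bottleneck

prob = LpProblem("surge_scheduling", LpMinimize)
x = [[LpVariable(f"x_{t}_{p}", cat=LpBinary) for p in providers] for t in tasks]

prob += lpSum(cost[t][p] * x[t][p] for t in tasks for p in providers)
for t in tasks:                                  # every task runs exactly once
    prob += lpSum(x[t][p] for p in providers) == 1
for p in providers:                              # respect CPU capacity
    prob += lpSum(cpu[t] * x[t][p] for t in tasks) <= capacity[p]

prob.solve()
print([[int(x[t][p].value()) for p in providers] for t in tasks])
```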

Proceedings ArticleDOI
18 Dec 2010
TL;DR: Experimental results show that this genetic-algorithm-based strategy for load balancing of VM resources achieves load balancing and reasonable resource utilization under both stable and varying system load.
Abstract: Current virtual machine (VM) resource scheduling in cloud computing environments mainly considers the current state of the system but seldom considers system variation and historical data, which often leads to load imbalance of the system. In view of the load balancing problem in VM resource scheduling, this paper presents a scheduling strategy for load balancing of VM resources based on a genetic algorithm. According to historical data and the current state of the system, and through the genetic algorithm, this strategy computes ahead of time the influence that deploying the needed VM resources will have on the system, and then chooses the solution with the least adverse effect, through which it achieves the best load balancing and reduces or avoids dynamic migration. This strategy solves the problems of load imbalance and high migration cost incurred by traditional algorithms after scheduling. Experimental results show that this method is able to realize load balancing and reasonable resource utilization under both stable and varying system load.
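For flavor, here is a toy genetic algorithm that evolves VM-to-host placements and scores them by load imbalance; the loads, operators, and fitness function are invented stand-ins for the paper's strategy.

```python
# Toy GA: chromosomes map VMs to hosts; fitness penalizes load imbalance.
import random, statistics

random.seed(2)
VM_LOAD = [3, 1, 4, 2, 5, 2]     # predicted load of each VM to place
HOSTS = 3

def fitness(chrom):
    load = [0.0] * HOSTS
    for vm, host in enumerate(chrom):
        load[host] += VM_LOAD[vm]
    return -statistics.pstdev(load)             # lower imbalance = higher fitness

def evolve(pop_size=30, gens=60, pmut=0.1):
    pop = [[random.randrange(HOSTS) for _ in VM_LOAD] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(VM_LOAD))      # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < pmut:                   # point mutation
                child[random.randrange(len(child))] = random.randrange(HOSTS)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print("placement:", best, "load stdev:", round(-fitness(best), 3))
```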

Proceedings ArticleDOI
09 Jul 2010
TL;DR: A two-phase scheduling algorithm under a three-level cloud computing network is advanced, combining the OLB (Opportunistic Load Balancing) and LBMM (Load Balance Min-Min) scheduling algorithms to achieve better execution efficiency and maintain the load balancing of the system.
Abstract: Network bandwidth and hardware technology are developing rapidly, resulting in the vigorous development of the Internet. Cloud computing, a new concept, uses low-power hosts to achieve high reliability. Cloud computing, an Internet-based development in which dynamically scalable and often virtualized resources are provided as a service over the Internet, has become a significant topic. It refers to a class of systems and applications that employ distributed resources to perform a function in a decentralized manner. Cloud computing utilizes the computing resources (service nodes) on the network to facilitate the execution of complicated tasks that require large-scale computation. Thus, the selection of nodes for executing a task in cloud computing must be considered, and to exploit the effectiveness of the resources, they have to be properly selected according to the properties of the task. In this study, a two-phase scheduling algorithm under a three-level cloud computing network is advanced. The proposed scheduling algorithm combines the OLB (Opportunistic Load Balancing) and LBMM (Load Balance Min-Min) scheduling algorithms to achieve better execution efficiency while maintaining the load balancing of the system.
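The Min-Min step underlying LBMM is well defined and can be sketched directly. The execution-time matrix below is an invented example; the paper's three-level variant adds manager/service-node structure on top.

```python
# Classic Min-Min: repeatedly pick the task whose best achievable completion
# time is smallest, and assign it to that node.
def min_min(etc):
    """etc[t][n] = estimated execution time of task t on node n."""
    ready = {n: 0.0 for n in range(len(etc[0]))}      # node ready times
    unassigned, plan = set(range(len(etc))), []
    while unassigned:
        # For each task, its best node; then the task whose best is smallest.
        best = {t: min(ready, key=lambda n: ready[n] + etc[t][n])
                for t in unassigned}
        t = min(unassigned, key=lambda t: ready[best[t]] + etc[t][best[t]])
        n = best[t]
        ready[n] += etc[t][n]
        unassigned.remove(t)
        plan.append((t, n))
    return plan

print(min_min([[4, 6], [3, 8], [5, 2]]))   # -> [(2, 1), (1, 0), (0, 0)]
```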

Journal ArticleDOI
TL;DR: A bi-population genetic algorithm is applied, which makes use of two separate populations and extends the serial schedule generation scheme by introducing a mode improvement procedure; computational results reveal that this procedure is amongst the most competitive algorithms.

Proceedings ArticleDOI
19 Jun 2010
TL;DR: This paper introduces dynamic warp subdivision (DWS), which allows a single warp to occupy more than one slot in the scheduler without requiring extra register file space, and improves latency hiding and memory level parallelism (MLP).
Abstract: SIMD organizations amortize the area and power of fetch, decode, and issue logic across multiple processing units in order to maximize throughput for a given area and power budget. However, throughput is reduced when a set of threads operating in lockstep (a warp) are stalled due to long latency memory accesses. The resulting idle cycles are extremely costly. Multi-threading can hide latencies by interleaving the execution of multiple warps, but deep multi-threading using many warps dramatically increases the cost of the register files (multi-threading depth x SIMD width), and cache contention can make performance worse. Instead, intra-warp latency hiding should first be exploited. This allows threads that are ready but stalled by SIMD restrictions to use these idle cycles and reduces the need for multi-threading among warps. This paper introduces dynamic warp subdivision (DWS), which allows a single warp to occupy more than one slot in the scheduler without requiring extra register file space. Independent scheduling entities allow divergent branch paths to interleave their execution, and allow threads that hit to run ahead. The result is improved latency hiding and memory level parallelism (MLP). We evaluate the technique on a coherent cache hierarchy with private L1 caches and a shared L2 cache. With an area overhead of less than 1%, experiments with eight data-parallel benchmarks show our technique improves performance on average by 1.7X.

Proceedings ArticleDOI
20 Sep 2010
TL;DR: It is shown that location alone is not sufficient to predict signal strength, motivating the use of tracks to enable effective prediction; energy-aware scheduling algorithms for different workloads are developed and evaluated via simulation driven by traces obtained during actual drives, demonstrating energy savings of up to 60%.
Abstract: Cellular radios consume more power and suffer reduced data rate when the signal is weak. According to our measurements, the communication energy per bit can be as much as 6x higher when the signal is weak than when it is strong. To realize energy savings, applications must preferentially communicate when the signal is strong, either by deferring non-urgent communication or by advancing anticipated communication to coincide with periods of strong signal. Allowing applications to perform such scheduling requires predicting signal strength, so that opportunities for energy-efficient communication can be anticipated. Furthermore, such prediction must be performed at little energy cost. In this paper, we make several contributions towards a practical system for energy-aware cellular data scheduling called Bartendr. First, we establish, via measurements, the relationship between signal strength and power consumption. Second, we show that location alone is not sufficient to predict signal strength and motivate the use of tracks to enable effective prediction. Finally, we develop energy-aware scheduling algorithms for different workloads - syncing and streaming - and evaluate these via simulation driven by traces obtained during actual drives, demonstrating energy savings of up to 60%. Our experiments have been performed on four cellular networks across two large metropolitan areas, one in India and the other in the U.S.
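A toy version of the core scheduling decision: given a predicted signal-strength track along a route, start the transfer in the window with the lowest predicted energy. The power model and numbers below are invented stand-ins for Bartendr's measured models.

```python
# Pick the start slot for a fixed-length transfer that minimizes predicted
# energy before a deadline, given a predicted RSSI track along the drive.
def best_start(predicted_rssi, job_slots, deadline_slot):
    def slot_energy(rssi):
        # Assumed monotone model: weaker signal -> more energy per slot.
        return 1.0 + max(0, -50 - rssi) * 0.05
    candidates = range(0, deadline_slot - job_slots + 1)
    return min(candidates,
               key=lambda s: sum(slot_energy(r)
                                 for r in predicted_rssi[s:s + job_slots]))

track = [-95, -90, -70, -55, -60, -85, -92]     # dBm predicted along the route
print(best_start(track, job_slots=2, deadline_slot=7))  # -> 3 (strongest window)
```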

Proceedings ArticleDOI
30 Nov 2010
TL;DR: Extensive simulations based on both random topologies and real network topologies of a physical testbed demonstrate that C-LLF is highly effective in meeting end-to-end deadlines in WirelessHART networks, and significantly outperforms common real-time scheduling policies.
Abstract: WirelessHART is an open wireless sensor-actuator network standard for industrial process monitoring and control that requires real-time data communication between sensor and actuator devices. Salient features of a WirelessHART network include a centralized network management architecture, multi-channel TDMA transmission, redundant routes, and avoidance of spatial reuse of channels for enhanced reliability and real-time performance. This paper makes several key contributions to real-time transmission scheduling in WirelessHART networks: (1) formulation of the end-to-end real-time transmission scheduling problem based on the characteristics of WirelessHART, (2) proof of NP-hardness of the problem, (3) an optimal branch-and-bound scheduling algorithm based on a necessary condition for schedulability, and (4) an efficient and practical heuristic-based scheduling algorithm called Conflict-aware Least Laxity First (C-LLF). Extensive simulations based on both random topologies and real network topologies of a physical testbed demonstrate that C-LLF is highly effective in meeting end-to-end deadlines in WirelessHART networks, and significantly outperforms common real-time scheduling policies.
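The least-laxity baseline inside C-LLF can be sketched as follows; this omits the conflict-awareness term that gives C-LLF its edge, and all flow parameters are invented.

```python
# Least-laxity-first: schedule the transmission whose slack before its
# end-to-end deadline, given the hops it still needs, is smallest.
def least_laxity_first(flows, now):
    def laxity(f):
        return f["deadline"] - now - f["remaining_hops"]   # slots of slack
    return min(flows, key=laxity)

flows = [
    {"id": "A", "deadline": 20, "remaining_hops": 3},
    {"id": "B", "deadline": 12, "remaining_hops": 4},
    {"id": "C", "deadline": 15, "remaining_hops": 2},
]
print(least_laxity_first(flows, now=5)["id"])   # -> 'B' (laxity 3)
```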

Proceedings ArticleDOI
13 Apr 2010
TL;DR: Bias scheduling is implemented over the Linux scheduler on a real system that models microarchitectural differences accurately, and is found to improve system performance significantly, in proportion to the application bias diversity present in the workload.
Abstract: Heterogeneous architectures that integrate a mix of big and small cores are very attractive because they can achieve high single-threaded performance while enabling high performance thread-level parallelism with lower energy costs. Despite their benefits, they pose significant challenges to the operating system software. Thread scheduling is one of the most critical challenges. In this paper we propose bias scheduling for heterogeneous systems with cores that have different microarchitectures and performance. We identify key metrics that characterize an application bias, namely the core type that best suits its resource needs. By dynamically monitoring application bias, the operating system is able to match threads to the core type that can maximize system throughput. Bias scheduling takes advantage of this by influencing the existing scheduler to select the core type that best suits the application when performing load balancing operations. Bias scheduling can be implemented on top of most existing schedulers since its impact is limited to changes in the load balancing code. In particular, we implemented it over the Linux scheduler on a real system that models microarchitectural differences accurately and found that it can improve system performance significantly, and in proportion to the application bias diversity present in the workload. Unlike previous work, bias scheduling does not require sampling of CPI on all core types or offline profiling. We also expose the limits of dynamic voltage/frequency scaling as an evaluation vehicle for heterogeneous systems.

Proceedings ArticleDOI
04 Oct 2010
TL;DR: This paper presents FlexSC, an implementation of exception-less system calls in the Linux kernel, and an accompanying user-mode thread package (FlexSC-Threads), binary compatible with POSIX threads, that translates legacy synchronous system calls into exception-less ones transparently to applications.
Abstract: For the past 30+ years, system calls have been the de facto interface used by applications to request services from the operating system kernel. System calls have almost universally been implemented as a synchronous mechanism, where a special processor instruction is used to yield userspace execution to the kernel. In the first part of this paper, we evaluate the performance impact of traditional synchronous system calls on system intensive workloads. We show that synchronous system calls negatively affect performance in a significant way, primarily because of pipeline flushing and pollution of key processor structures (e.g., TLB, data and instruction caches, etc.). We propose a new mechanism for applications to request services from the operating system kernel: exception-less system calls. They improve processor efficiency by enabling flexibility in the scheduling of operating system work, which in turn can lead to significantly increased temporal and spatial locality of execution in both user and kernel space, thus reducing pollution effects on processor structures. Exception-less system calls are particularly effective on multicore processors. They primarily target highly threaded server applications, such as Web servers and database servers. We present FlexSC, an implementation of exception-less system calls in the Linux kernel, and an accompanying user-mode thread package (FlexSC-Threads), binary compatible with POSIX threads, that translates legacy synchronous system calls into exception-less ones transparently to applications. We show how FlexSC improves performance of Apache by up to 116%, MySQL by up to 40%, and BIND by up to 105% while requiring no modifications to the applications.

Proceedings ArticleDOI
12 Apr 2010
TL;DR: This paper proposes a formal model for representing mixed-criticality workloads and demonstrates the intractability of determining whether a system specified in this model can be scheduled to meet all its certification requirements.
Abstract: Many safety-critical embedded systems are subject to certification requirements; some systems may be required to meet multiple sets of certification requirements, from different certification authorities. Certification requirements in such "mixed-criticality" systems give rise to some interesting scheduling problems, that cannot be satisfactorily addressed using techniques from conventional scheduling theory. In this paper, we propose a formal model for representing such mixed-criticality workloads. We demonstrate the intractability of determining whether a system specified in this model can be scheduled to meet all its certification requirements. For dual-criticality systems -- systems subject to two sets of certification requirements -- we quantify, via the metric of processor speedup factor, the effectiveness of two techniques (reservation-based scheduling and priority-based scheduling) that are widely used in scheduling such mixed-criticality systems.
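The mixed-criticality job model used in this line of work can be sketched as follows (a paraphrase of the standard formulation, with L criticality levels assumed):

```latex
% Mixed-criticality job model (sketch): each job J_i is a 4-tuple
J_i = \bigl(A_i,\; D_i,\; \chi_i,\; C_i\bigr),
% with release time A_i, deadline D_i, criticality level
% \chi_i \in \{1, \dots, L\}, and a WCET vector C_i(\ell) that is
% monotone in the assurance level:
C_i(1) \le C_i(2) \le \cdots \le C_i(L).
% A schedule is correct at level \ell if every job J_i with \chi_i \ge \ell
% meets its deadline whenever all jobs execute for at most C_j(\ell).
```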

Proceedings ArticleDOI
30 Nov 2010
TL;DR: This paper provides a partitioned preemptive fixed-priority scheduling algorithm for periodic fork-join tasks under the fork-join structure used in OpenMP, and shows that any task set that is feasible on m unit-speed processors can be scheduled by the proposed algorithm on m processors that are 3.42 times faster.
Abstract: Massively multi-core processors are rapidly gaining market share with major chip vendors offering an ever increasing number of cores per processor. From a programming perspective, the sequential programming model does not scale very well for such multi-core systems. Parallel programming models such as OpenMP present promising solutions for more effectively using multiple processor cores. In this paper, we study the problem of scheduling periodic real-time tasks on multiprocessors under the fork-join structure used in OpenMP. We illustrate the theoretical best-case and worst-case periodic fork-join task sets from a processor utilization perspective. Based on our observations of these task sets, we provide a partitioned preemptive fixed-priority scheduling algorithm for periodic fork-join tasks. The proposed multiprocessor scheduling algorithm is shown to have a resource augmentation bound of 3.42, which implies that any task set that is feasible on m unit-speed processors can be scheduled by the proposed algorithm on m processors that are 3.42 times faster.

Proceedings ArticleDOI
18 Dec 2010
TL;DR: This paper underlines the role of the communication fabric in data center energy consumption and presents a scheduling approach, termed DENS, that combines energy efficiency and network awareness, balancing the energy consumption of a data center, individual job performance, and traffic demands.
Abstract: In modern data centers, energy consumption accounts for a considerably large slice of operational expenses. The state of the art in data center energy optimization is focusing only on job distribution between computing servers based on workload or thermal profiles. This paper underlines the role of communication fabric in data center energy consumption and presents a scheduling approach that combines energy efficiency and network awareness, termed DENS. The DENS methodology balances the energy consumption of a data center, individual job performance, and traffic demands. The proposed approach optimizes the tradeoff between job consolidation (to minimize the amount of computing servers) and distribution of traffic patterns (to avoid hotspots in the data center network).

Proceedings ArticleDOI
30 Nov 2010
TL;DR: The experiments using the YICES SMT solver show that the scheduling problem can be solved by YICES out-of-the-box for a few hundred random frame instances on the network.
Abstract: Networks for real-time systems have stringent end-to-end latency and jitter requirements. One cost-efficient way to meet these requirements is the time-triggered communication paradigm, which plans the transmission points in time of the frames off-line. This plan prevents contentions of frames on the network and is called a time-triggered schedule (tt-schedule). In general the tt-scheduling is a bin-packing problem, known to be NP-complete, where the complexity is mostly driven by the freedom in topology of the network, its associated hardware restrictions, and application-imposed constraints. Multi-hop networks, in particular, require the synthesis of path-dependent tt-schedules to maintain full determinism of time-triggered communication from sender to receiver. Our experiments using the YICES SMT solver show that the scheduling problem can be solved by YICES out-of-the-box for a few hundred random frame instances on the network. A customized tt-scheduler using YICES as a back-end solver makes it possible to increase this number of frame instances up to tens of thousands. In terms of scheduling quality, the synthesis produces up to ninety percent maximum utilization on a communication link, with schedule synthesis times of about half an hour for the biggest examples we have studied. As a nice side-effect, the YICES out-of-the-box approach is immediately applicable for the verification of existing (even large-scale) tt-schedules and for debugging more sophisticated tt-schedulers.
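A hedged sketch of the constraint structure, written with the Z3 SMT solver's Python bindings rather than YICES; the frames, durations, and the two-hop path below are invented for illustration.

```python
# tt-schedule constraints: frame f traverses link1 then link2; frame g uses
# only link1. Offsets are in macro-slots within one scheduling period.
from z3 import Ints, Solver, Or, sat

s = Solver()
f1, f2, g1 = Ints("f1 f2 g1")
DUR, PERIOD = 2, 20
for o in (f1, f2, g1):
    s.add(o >= 0, o + DUR <= PERIOD)             # fit inside the period
s.add(f2 >= f1 + DUR)                            # path dependence: hop 2 after hop 1
s.add(Or(f1 + DUR <= g1, g1 + DUR <= f1))        # no contention on shared link1

if s.check() == sat:
    m = s.model()
    print({str(v): m[v] for v in (f1, f2, g1)})
```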

Proceedings ArticleDOI
01 Mar 2010
TL;DR: This paper uses statistical models to predict resource requirements for Cloud computing applications and presents initial design of a workload generator that can be used to evaluate alternative configurations without the overhead of reproducing a real workload.
Abstract: A recent trend for data-intensive computations is to use pay-as-you-go execution environments that scale transparently to the user. However, providers of such environments must tackle the challenge of configuring their system to provide maximal performance while minimizing the cost of resources used. In this paper, we use statistical models to predict resource requirements for Cloud computing applications. Such a prediction framework can guide system design and deployment decisions such as scale, scheduling, and capacity. In addition, we present initial design of a workload generator that can be used to evaluate alternative configurations without the overhead of reproducing a real workload. This paper focuses on statistical modeling and its application to data-intensive workloads.

Proceedings ArticleDOI
13 Apr 2010
TL;DR: This work proposes a framework that provides an intelligent consolidation methodology using different techniques such as turning on/off machines, power-aware consolidation algorithms, and machine learning techniques to deal with uncertain information while maximizing performance in an energy-efficient data center.
Abstract: As energy-related costs have become a major economic factor for IT infrastructures and data-centers, companies and the research community are being challenged to find better and more efficient power-aware resource management strategies. There is a growing interest in "Green" IT and there is still a big gap in this area to be covered. In order to obtain an energy-efficient data center, we propose a framework that provides an intelligent consolidation methodology using different techniques such as turning on/off machines, power-aware consolidation algorithms, and machine learning techniques to deal with uncertain information while maximizing performance. For the machine learning approach, we use models learned from previous system behaviors in order to predict power consumption levels, CPU loads, and SLA timings, and improve scheduling decisions. Our framework is vertical, because it considers everything from watt consumption to workload features, and cross-disciplinary, as it uses a wide variety of techniques. We evaluate these techniques with a framework that covers the whole control cycle of a real scenario, using a simulation with representative heterogeneous workloads, and we measure the quality of the results according to a set of metrics focused toward our goals, besides traditional policies. The results obtained indicate that our approach is close to the optimal placement and behaves better when the level of uncertainty increases.

Journal ArticleDOI
TL;DR: This work considers the problem of throughput-optimal scheduling in wireless networks subject to interference constraints, and shows that a simple greedy algorithm can provide a 49-approximation and that the maximal matching scheduling policy achieves a guaranteed fraction of the capacity region for all K.
Abstract: We consider the problem of throughput-optimal scheduling in wireless networks subject to interference constraints. We model the interference using a family of K-hop interference models, under which no two links within a K-hop distance can successfully transmit at the same time. For a given K, we can obtain a throughput-optimal scheduling policy by solving the well-known maximum weighted matching problem. We show that for K > 1, the resulting problems are NP-Hard and cannot be approximated within a factor that grows polynomially with the number of nodes. Interestingly, for geometric unit-disk graphs that can be used to describe a wide range of wireless networks, the problems admit polynomial time approximation schemes within a factor arbitrarily close to 1. In these network settings, we also show that a simple greedy algorithm can provide a 49-approximation, and the maximal matching scheduling policy, which can be easily implemented in a distributed fashion, achieves a guaranteed fraction of the capacity region for all K. The geometric constraints are crucial to obtain these throughput guarantees. These results are encouraging as they suggest that one can develop low-complexity distributed algorithms to achieve near-optimal throughput for a wide range of wireless networks.
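The greedy policy analyzed in such settings can be sketched in a few lines: serve links in decreasing weight (e.g., queue length) order, skipping any link that conflicts with one already chosen. The conflict sets below are invented stand-ins for K-hop interference.

```python
# Greedy (maximal) scheduling: heaviest links first, conflicts excluded.
def greedy_schedule(weights, conflicts):
    chosen = []
    for link in sorted(weights, key=weights.get, reverse=True):
        if all(link not in conflicts[c] for c in chosen):
            chosen.append(link)
    return chosen

weights = {"a": 9, "b": 7, "c": 5, "d": 3}                       # queue lengths
conflicts = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(greedy_schedule(weights, conflicts))   # -> ['a', 'c']
```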

Journal ArticleDOI
TL;DR: An improved genetic algorithm for the distributed and flexible job-shop scheduling problem is proposed; it is compared with other algorithms for distributed scheduling and evaluated, with satisfactory results, on a large set of distributed and flexible scheduling problems derived from classical job-shop scheduling benchmarks.