Showing papers on "Scheduling (computing)" published in 2015


Proceedings ArticleDOI
01 Jan 2015
TL;DR: This work couples blocked algorithms with dynamic, memory-aware task scheduling to achieve a parallel and out-of-core NumPy clone, and shows how this extends the effective scale of modern hardware to larger datasets.
Abstract: Dask enables parallel and out-of-core computation. We couple blocked algorithms with dynamic, memory-aware task scheduling to achieve a parallel and out-of-core NumPy clone. We show how this extends the effective scale of modern hardware to larger datasets and discuss how these ideas can be more broadly applied to other parallel collections.
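
The blocked-algorithm idea in this abstract can be illustrated with a small sketch. This is not Dask's API; it is a plain-Python illustration that assumes a hypothetical `load_block` function standing in for reading one chunk from disk, and it computes a mean while holding only one block in memory at a time.

```python
from typing import Callable, Iterator, List


def iter_blocks(n: int, block_size: int) -> Iterator[range]:
    """Yield index ranges that cover 0..n-1 in fixed-size blocks."""
    for start in range(0, n, block_size):
        yield range(start, min(start + block_size, n))


def blocked_mean(load_block: Callable[[range], List[float]],
                 n: int, block_size: int) -> float:
    """Compute a mean while keeping only one block in memory at a time."""
    total = 0.0
    count = 0
    for idx in iter_blocks(n, block_size):
        block = load_block(idx)  # stand-in for reading one chunk from disk
        total += sum(block)      # per-block partial result
        count += len(block)
    return total / count


# Hypothetical data source: the values 0..9, "loaded" four at a time.
def load_block(idx: range) -> List[float]:
    return [float(i) for i in idx]


result = blocked_mean(load_block, n=10, block_size=4)
print(result)
```

Each block's partial sum is independent of the others, which is what would let a scheduler run the per-block work in parallel and combine the results afterwards.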

496 citations


01 Jan 2015
TL;DR: This review covers research on the topic of mixed criticality systems that has been published since Vestal’s 2007 paper and covers the period up to and including December 2015.
Abstract: This review covers research on the topic of mixed criticality systems that has been published since Vestal’s 2007 paper. It covers the period up to and including December 2015. The review is organised into the following topics: introduction and motivation, models, single processor analysis (including job-based, hard and soft tasks, fixed priority and EDF scheduling, shared resources and static and synchronous scheduling), multiprocessor analysis, related topics, realistic models, formal treatments, and systems issues. An appendix lists funded projects in the area of mixed criticality.

471 citations


Journal ArticleDOI
TL;DR: By analyzing the cloud computing architecture, this survey presents a two-level taxonomy of cloud resource scheduling, paints a landscape of the scheduling problem and its solutions, and systematically surveys state-of-the-art approaches.
Abstract: A disruptive technology fundamentally transforming the way that computing services are delivered, cloud computing offers information and communication technology users a new dimension of convenience by delivering resources as services via the Internet. Because the cloud provides a finite pool of virtualized on-demand resources, optimally scheduling them has become an essential and rewarding topic, where a trend of using Evolutionary Computation (EC) algorithms is emerging rapidly. Through analyzing the cloud computing architecture, this survey first presents a taxonomy at two levels of scheduling cloud resources. It then paints a landscape of the scheduling problem and solutions. According to the taxonomy, a comprehensive survey of state-of-the-art approaches is presented systematically. Looking forward, challenges and potential future research directions are investigated and invited, including real-time scheduling, adaptive dynamic scheduling, large-scale scheduling, multiobjective scheduling, and distributed and parallel scheduling. At the dawn of Industry 4.0, cloud computing scheduling for cyber-physical integration with the presence of big data is also discussed. Research in this area is only in its infancy, but with the rapid fusion of information and data technology, more exciting and agenda-setting topics are likely to emerge on the horizon.

416 citations


Journal ArticleDOI
Ali Allahverdi
TL;DR: This paper is the third comprehensive survey, providing an extensive review of about 500 papers that appeared from mid-2006 to the end of 2014. It covers static, dynamic, deterministic, and stochastic environments and classifies the literature by shop environment: single machine, parallel machine, flowshop, job shop, or open shop.

369 citations


Proceedings ArticleDOI
01 Nov 2015
TL;DR: This paper addresses the challenge of bringing TSCH (Time Slotted Channel Hopping MAC) to dynamic networks, focusing on low-power IPv6 and RPL networks. It introduces Orchestra, in which nodes autonomously compute their own local schedules, making it possible to build non-deterministic networks while exploiting the robustness of TSCH.
Abstract: Time slotted operation is a well-proven approach to achieve highly reliable low-power networking through scheduling and channel hopping. It is, however, difficult to apply time slotting to dynamic networks as envisioned in the Internet of Things. Commonly, these applications do not have pre-defined periodic traffic patterns and nodes can be added or removed dynamically. This paper addresses the challenge of bringing TSCH (Time Slotted Channel Hopping MAC) to such dynamic networks. We focus on low-power IPv6 and RPL networks, and introduce Orchestra. In Orchestra, nodes autonomously compute their own, local schedules. They maintain multiple schedules, each allocated to a particular traffic plane (application, routing, MAC), and updated automatically as the topology evolves. Orchestra (re)computes local schedules without signaling overhead, and does not require any central or distributed scheduler. Instead, it relies on the existing network stack information to maintain the schedules. This scheme allows Orchestra to build non-deterministic networks while exploiting the robustness of TSCH. We demonstrate the practicality of Orchestra and quantify its benefits through extensive evaluation in two testbeds, on two hardware platforms. Orchestra reduces, or even eliminates, network contention. In long-running experiments of up to 72 h we show that Orchestra achieves end-to-end delivery ratios of over 99.99%. Compared to RPL in asynchronous low-power listening networks, Orchestra improves reliability by two orders of magnitude, while achieving a similar latency-energy balance.

360 citations


Proceedings Article
08 Jul 2015
TL;DR: GridGraph can stream the edges and apply on-the-fly vertex updates, thus reducing the amount of I/O required for computation. It is even competitive with distributed systems and provides significant cost efficiency in cloud environments.
Abstract: In this paper, we present GridGraph, a system for processing large-scale graphs on a single machine. GridGraph breaks graphs into 1D-partitioned vertex chunks and 2D-partitioned edge blocks using a first fine-grained level partitioning in preprocessing. A second coarse-grained level partitioning is applied at runtime. Through a novel dual sliding windows method, GridGraph can stream the edges and apply on-the-fly vertex updates, thus reducing the amount of I/O required for computation. The partitioning of edges also enables selective scheduling so that some of the blocks can be skipped to reduce unnecessary I/O. This is very effective when the active vertex set shrinks with convergence. Our evaluation results show that GridGraph scales seamlessly with memory capacity and disk bandwidth, and outperforms state-of-the-art out-of-core systems, including GraphChi and X-Stream. Furthermore, we show that the performance of GridGraph is even competitive with distributed systems, and it also provides significant cost efficiency in cloud environments.
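
The partitioning and selective scheduling described above can be sketched roughly as follows. This is an illustrative sketch, not GridGraph's implementation: the function names are hypothetical, vertex chunks are assumed to be contiguous and equal-sized, and a block is assumed to be skippable when its source chunk contains no active vertex.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]
BlockId = Tuple[int, int]


def chunk_of(vertex: int, num_vertices: int, p: int) -> int:
    """Map a vertex to one of p contiguous chunks (1D vertex partition)."""
    chunk_size = -(-num_vertices // p)  # ceiling division
    return vertex // chunk_size


def build_grid(edges: List[Edge], num_vertices: int, p: int) -> Dict[BlockId, List[Edge]]:
    """Put each edge in block (source chunk, destination chunk) (2D edge partition)."""
    grid = defaultdict(list)
    for src, dst in edges:
        block = (chunk_of(src, num_vertices, p), chunk_of(dst, num_vertices, p))
        grid[block].append((src, dst))
    return dict(grid)


def blocks_to_stream(grid: Dict[BlockId, List[Edge]], active: Set[int],
                     num_vertices: int, p: int) -> List[BlockId]:
    """Selective scheduling: keep only blocks whose source chunk has an active vertex."""
    active_chunks = {chunk_of(v, num_vertices, p) for v in active}
    return sorted(block for block in grid if block[0] in active_chunks)


# Hypothetical 4-vertex cycle split into a 2 x 2 grid of edge blocks.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
grid = build_grid(edges, num_vertices=4, p=2)
to_stream = blocks_to_stream(grid, active={0}, num_vertices=4, p=2)
print(to_stream)
```

With only vertex 0 active, the two blocks whose source chunk is chunk 1 are never read, which is the kind of I/O saving the abstract attributes to a shrinking active set.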

310 citations


Proceedings ArticleDOI
13 Apr 2015
TL;DR: This paper formulates the online virtual function mapping and scheduling problem and proposes algorithms for solving it: three greedy algorithms and a tabu search-based heuristic.
Abstract: Network function virtualization has received attention from both academia and industry as an important shift in the deployment of telecommunication networks and services. It is being proposed as a path towards cost efficiency, reduced time-to-market, and enhanced innovativeness in telecommunication service provisioning. However, efficiently running virtualized services is not trivial as, among other initialization steps, it requires first mapping virtual networks onto physical networks, and thereafter mapping and scheduling virtual functions onto the virtual networks. This paper formulates the online virtual function mapping and scheduling problem and proposes a set of algorithms for solving it. Our main objective is to propose simple algorithms that may be used as a basis for future work in this area. To this end, we propose three greedy algorithms and a tabu search-based heuristic. We carry out evaluations of these algorithms considering parameters such as successful service mappings, total service processing times, revenue, and cost, under varying network conditions. Simulations show that the tabu search-based algorithm performs only slightly better than the best greedy algorithm.

287 citations


Journal ArticleDOI
TL;DR: Experimental results on four metrics (makespan, cost, deadline violation rate, and resource utilization) show that the proposed multi-objective optimization method outperforms other similar methods, with a gain of 56.6% in the best case scenario.
Abstract: For task-scheduling problems in cloud computing, a multi-objective optimization method is proposed here. First, with an aim toward the biodiversity of resources and tasks in cloud computing, we propose a resource cost model that defines the demand of tasks on resources with more details. This model reflects the relationship between the user’s resource costs and the budget costs. A multi-objective optimization scheduling method has been proposed based on this resource cost model. This method considers the makespan and the user’s budget costs as constraints of the optimization problem, achieving multi-objective optimization of both performance and cost. An improved ant colony algorithm has been proposed to solve this problem. Two constraint functions were used to evaluate and provide feedback regarding the performance and budget cost. These two constraint functions made the algorithm adjust the quality of the solution in a timely manner based on feedback in order to achieve the optimal solution. Some simulation experiments were designed to evaluate this method’s performance using four metrics: 1) the makespan; 2) cost; 3) deadline violation rate; and 4) resource utilization. Experimental results show that, based on these four metrics, the proposed multi-objective optimization method outperforms other similar methods, with a gain of 56.6% in the best case scenario.

265 citations


Proceedings ArticleDOI
17 Aug 2015
TL;DR: Aalo is a non-clairvoyant scheduler that efficiently schedules coflows without prior knowledge. Its performance is comparable to that of solutions using prior knowledge, and it outperforms them in the presence of cluster dynamics.
Abstract: Inter-coflow scheduling improves application-level communication performance in data-parallel clusters. However, existing efficient schedulers require a priori coflow information and ignore cluster dynamics like pipelining, task failures, and speculative executions, which limit their applicability. Schedulers without prior knowledge compromise on performance to avoid head-of-line blocking. In this paper, we present Aalo, which strikes a balance and efficiently schedules coflows without prior knowledge. Aalo employs Discretized Coflow-Aware Least-Attained Service (D-CLAS) to separate coflows into a small number of priority queues based on how much they have already sent across the cluster. By performing prioritization across queues and by scheduling coflows in FIFO order within each queue, Aalo's non-clairvoyant scheduler reduces coflow completion times while guaranteeing starvation freedom. EC2 deployments and trace-driven simulations show that communication stages complete 1.93X faster on average and 3.59X faster at the 95th percentile using Aalo in comparison to per-flow mechanisms. Aalo's performance is comparable to that of solutions using prior knowledge, and Aalo outperforms them in the presence of cluster dynamics.
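
The queueing discipline described in this abstract might be sketched as below. This is a simplified illustration, not Aalo's implementation: the `Coflow` type, the threshold values, and the function names are all assumptions, and strict priority across queues is used only for brevity (on its own it would not give the starvation freedom the abstract mentions).

```python
from typing import List, NamedTuple


class Coflow(NamedTuple):
    name: str
    arrival: int     # arrival order, used for FIFO within a queue
    bytes_sent: int  # bytes already sent across the cluster


def queue_index(bytes_sent: int, thresholds: List[int]) -> int:
    """Return the queue for a coflow; 0 is the highest priority."""
    for i, limit in enumerate(thresholds):
        if bytes_sent < limit:
            return i
    return len(thresholds)  # lowest-priority queue


def schedule_order(coflows: List[Coflow], thresholds: List[int]) -> List[str]:
    """Order coflows by queue priority first, then FIFO inside each queue."""
    ranked = sorted(
        coflows,
        key=lambda c: (queue_index(c.bytes_sent, thresholds), c.arrival),
    )
    return [c.name for c in ranked]


# Hypothetical thresholds (10 MB and 100 MB) that define three queues.
THRESHOLDS = [10_000_000, 100_000_000]
coflows = [
    Coflow("A", arrival=0, bytes_sent=500_000_000),
    Coflow("B", arrival=1, bytes_sent=2_000_000),
    Coflow("C", arrival=2, bytes_sent=50_000_000),
    Coflow("D", arrival=3, bytes_sent=1_000_000),
]
order = schedule_order(coflows, THRESHOLDS)
print(order)
```

Because a coflow's queue depends only on what it has already sent, no advance knowledge of its total size is needed; a coflow simply moves to a lower-priority queue as it crosses each threshold.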

264 citations


Journal ArticleDOI
TL;DR: A stochastic programming framework for conducting optimal 24-h scheduling of CHP-based MGs consisting of wind turbine, fuel cell, boiler, a typical power-only unit, and energy storage devices is presented.
Abstract: Microgrids (MGs) are considered as a key solution for integrating renewable and distributed energy resources, combined heat and power (CHP) systems, as well as distributed energy-storage systems. This paper presents a stochastic programming framework for conducting optimal 24-h scheduling of CHP-based MGs consisting of wind turbine, fuel cell, boiler, a typical power-only unit, and energy storage devices. The objective of scheduling is to find the optimal set points of energy resources for profit maximization considering demand response programs and uncertainties. The impact of the wind speed, market, and MG load uncertainties on the MG scheduling problem is characterized through a stochastic programming formulation. This paper studies three cases to confirm the performance of the proposed model. The effect of CHP-based MG scheduling in the islanded and grid-connected modes, as well as the effectiveness of applying the proposed DR program, is investigated in the case studies.

247 citations


Journal ArticleDOI
TL;DR: This paper addresses the problem of energy-efficient resource allocation in the downlink of a cellular orthogonal frequency division multiple access system and shows that the maximization of the energy efficiency is approximately equivalent to the maximization of the spectral efficiency for small values of the maximum transmit power.
Abstract: This paper addresses the problem of energy-efficient resource allocation in the downlink of a cellular orthogonal frequency division multiple access system. Three definitions of energy efficiency are considered for system design, accounting for both the radiated and the circuit power. User scheduling and power allocation are optimized across a cluster of coordinated base stations with a constraint on the maximum transmit power (either per subcarrier or per base station). The asymptotic noise-limited regime is discussed as a special case. Results show that the maximization of the energy efficiency is approximately equivalent to the maximization of the spectral efficiency for small values of the maximum transmit power, while there is a wide range of values of the maximum transmit power for which a moderate reduction of the data rate provides large savings in terms of dissipated energy. In addition, the performance gap among the considered resource allocation strategies is reduced as the out-of-cluster interference increases.

Journal Article
TL;DR: This article presents a cloud task scheduling policy based on the ant colony optimization algorithm and compares it with the FCFS and round-robin scheduling algorithms. The main goal of these algorithms is to minimize the makespan of a given task set.
Abstract: Cloud computing is the development of distributed computing, parallel computing and grid computing, or defined as the commercial implementation of these computer science concepts. One of the fundamental issues in this environment is related to task scheduling. Cloud task scheduling is an NP-hard optimization problem, and many meta-heuristic algorithms have been proposed to solve it. A good task scheduler should adapt its scheduling strategy to the changing environment and the types of tasks. In this paper, a cloud task scheduling policy based on the ant colony optimization algorithm is presented and compared with two other scheduling algorithms, FCFS and round-robin. The main goal of these algorithms is to minimize the makespan of a given task set. Ant colony optimization is a random optimization search approach that will be used for allocating the incoming jobs to the virtual machines. Algorithms have been simulated using the Cloudsim toolkit package. Experimental results showed that the ant colony optimization outperformed the FCFS and round-robin algorithms.

Journal ArticleDOI
TL;DR: This paper studies, for the first time, the multi-user computation partitioning problem (MCPP), which considers the partitioning of multiple users' computations together with the scheduling of offloaded computations on the cloud resources, and designs an offline heuristic algorithm, namely SearchAdjust, to solve MCPP.
Abstract: Elastic partitioning of computations between mobile devices and cloud is an important and challenging research topic for mobile cloud computing. Existing works focus on the single-user computation partitioning, which aims to optimize the application completion time for one particular single user. These works assume that the cloud always has enough resources to execute the computations immediately when they are offloaded to the cloud. However, this assumption does not hold for large scale mobile cloud applications. In these applications, due to the competition for cloud resources among a large number of users, the offloaded computations may be executed with certain scheduling delay on the cloud. Single user partitioning that does not take into account the scheduling delay on the cloud may yield significant performance degradation. In this paper, we study, for the first time, the multi-user computation partitioning problem (MCPP), which considers the partitioning of multiple users’ computations together with the scheduling of offloaded computations on the cloud resources. Instead of pursuing the minimum application completion time for every single user, we aim to achieve minimum average completion time for all the users, based on the number of provisioned resources on the cloud. We show that MCPP is different from and more difficult than the classical job scheduling problems. We design an offline heuristic algorithm, namely SearchAdjust, to solve MCPP. We demonstrate through benchmarks that SearchAdjust outperforms both the single user partitioning approaches and classical job scheduling approaches by 10 percent on average in terms of application delay. Based on SearchAdjust, we also design an online algorithm for MCPP that can be easily deployed in practical systems. We validate the effectiveness of our online algorithm using real world load traces.

Journal ArticleDOI
TL;DR: This work investigates the problem of scheduling tasks (which belong to the same or possibly different applications) in the MCC environment and presents a novel algorithm, which starts from a minimal-delay scheduling solution and performs energy reduction by migrating tasks among the local cores and the cloud and by applying the dynamic voltage and frequency scaling technique.
Abstract: Mobile cloud computing (MCC) offers significant opportunities in performance enhancement and energy saving for mobile, battery-powered devices. Applications running on mobile devices may be represented by task graphs. This work investigates the problem of scheduling tasks (which belong to the same or possibly different applications) in the MCC environment. More precisely, the scheduling problem involves the following steps: (i) determining the tasks to be offloaded onto the cloud, (ii) mapping the remaining tasks onto (potentially heterogeneous) local cores in the mobile device, (iii) determining the frequencies for executing local tasks, and (iv) scheduling tasks on the cores (for in-house tasks) and the wireless communication channels (for offloaded tasks) such that the task-precedence requirements and the application completion time constraint are satisfied while the total energy dissipation in the mobile device is minimized. A novel algorithm is presented, which starts from a minimal-delay scheduling solution and subsequently performs energy reduction by migrating tasks among the local cores and the cloud and by applying the dynamic voltage and frequency scaling technique. A linear-time rescheduling algorithm is proposed for the task migration. Simulation results demonstrate significant energy reduction with the application completion time constraint satisfied.

Journal ArticleDOI
TL;DR: It is found that the key factor determining the performance of an algorithm is its ability to decide which workflows in an ensemble to admit or reject for execution, and an admission procedure based on workflow structure and estimates of task runtimes can significantly improve the quality of solutions.

Journal ArticleDOI
TL;DR: In this article, a frame-based precoding problem is optimally solved using the principles of physical layer multicasting to multiple co-channel groups under per-antenna constraints, and a novel optimization problem that aims at maximizing the system sum rate under individual power constraints is proposed.
Abstract: The present work focuses on the forward link of a broadband multibeam satellite system that aggressively reuses the user link frequency resources. Two fundamental practical challenges, namely the need to frame multiple users per transmission and the per-antenna transmit power limitations, are addressed. To this end, the so-called frame-based precoding problem is optimally solved using the principles of physical layer multicasting to multiple co-channel groups under per-antenna constraints. In this context, a novel optimization problem that aims at maximizing the system sum rate under individual power constraints is proposed. Added to that, the formulation is further extended to include availability constraints. As a result, the high gains of the sum rate optimal design are traded off to satisfy the stringent availability requirements of satellite systems. Moreover, the throughput maximization with a granular spectral efficiency versus SINR function, is formulated and solved. Finally, a multicast-aware user scheduling policy, based on the channel state information, is developed. Thus, substantial multiuser diversity gains are gleaned. Numerical results over a realistic simulation environment exhibit as much as 30% gains over conventional systems, even for 7 users per frame, without modifying the framing structure of legacy communication standards.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: In this paper, the authors present pHost, a new transport design aimed at achieving both: the near-optimal performance of pFabric and the commodity network design of Fastpass.
Abstract: The importance of minimizing flow completion times (FCT) in datacenters has led to a growing literature on new network transport designs. Of particular note is pFabric, a protocol that achieves near-optimal FCTs. However, pFabric's performance comes at the cost of generality, since pFabric requires specialized hardware that embeds a specific scheduling policy within the network fabric, making it hard to meet diverse policy goals. Aiming for generality, the recent Fastpass proposal returns to a design based on commodity network hardware and instead relies on a centralized scheduler. Fastpass achieves generality, but (as we show) loses many of pFabric's performance benefits. We present pHost, a new transport design aimed at achieving both: the near-optimal performance of pFabric and the commodity network design of Fastpass. Similar to Fastpass, pHost keeps the network simple by decoupling the network fabric from scheduling decisions. However, pHost introduces a new distributed protocol that allows end-hosts to directly make scheduling decisions, thus avoiding the overheads of Fastpass's centralized scheduler architecture. We show that pHost achieves performance on par with pFabric (within 4% for typical conditions) and significantly outperforms Fastpass (by a factor of 3.8×) while relying only on commodity network hardware.

Journal ArticleDOI
TL;DR: This paper focuses on artificial intelligence approaches to the NP-hard job shop scheduling (JSS) problem and surveys successful applications of techniques such as neural networks, genetic algorithms, multi-agent systems, simulated annealing, bee colony optimization, ant colony optimization, and particle swarm algorithms.
Abstract: This paper focuses on artificial intelligence approaches to the NP-hard job shop scheduling (JSS) problem. In the literature, successful approaches of artificial intelligence techniques such as neural network, genetic algorithm, multi-agent systems, simulated annealing, bee colony optimization, ant colony optimization, particle swarm algorithm, etc. are presented as solution approaches to the job shop scheduling problem. These studies are surveyed and their successes are listed in this article.

Proceedings ArticleDOI
24 Nov 2015
TL;DR: R-Storm implements resource-aware scheduling within Storm. It can satisfy both soft and hard resource constraints while minimizing the network distance between components that communicate with each other, achieving 30-47% higher throughput and 69-350% better CPU utilization than default Storm on micro-benchmarks.
Abstract: The era of big data has led to the emergence of new systems for real-time distributed stream processing; Apache Storm, for example, is one of the most popular stream processing systems in industry today. However, Storm, like many other stream processing systems, lacks an intelligent scheduling mechanism. The default round-robin scheduling currently deployed in Storm disregards resource demands and availability, and can therefore be inefficient at times. We present R-Storm (Resource-Aware Storm), a system that implements resource-aware scheduling within Storm. R-Storm is designed to increase overall throughput by maximizing resource utilization while minimizing network latency. When scheduling tasks, R-Storm can satisfy both soft and hard resource constraints as well as minimize network distance between components that communicate with each other. We evaluate R-Storm on a set of micro-benchmark Storm applications as well as Storm applications used in production at Yahoo! Inc. From our experimental results we conclude that R-Storm achieves 30-47% higher throughput and 69-350% better CPU utilization than default Storm for the micro-benchmarks. For the Yahoo! Storm applications, R-Storm outperforms default Storm by around 50% based on overall throughput. We also demonstrate that R-Storm performs much better when scheduling multiple Storm applications than default Storm.

Proceedings ArticleDOI
24 Aug 2015
TL;DR: This work presents Rapier, a coflow-aware network optimization framework that seamlessly integrates routing and scheduling for better application performance, and demonstrates that Rapier significantly reduces the average coflow completion time.
Abstract: In the data flow models of today's data center applications such as MapReduce, Spark and Dryad, multiple flows can comprise a coflow group semantically. Only completing all flows in a coflow is meaningful to an application. To optimize application performance, routing and scheduling must be jointly considered at the level of a coflow rather than individual flows. However, prior solutions have a significant limitation: they only consider scheduling, which is insufficient. To this end, we present Rapier, a coflow-aware network optimization framework that seamlessly integrates routing and scheduling for better application performance. Using a small-scale testbed implementation and large-scale simulations, we demonstrate that Rapier significantly reduces the average coflow completion time (CCT) by up to 79.30% compared to the state-of-the-art scheduling-only solution, and it is readily implementable with existing commodity switches.

Journal ArticleDOI
TL;DR: This paper proposes a radio frequency identification-based intelligent decision support system architecture for production monitoring and scheduling in a distributed manufacturing environment. The architecture has good extensibility and scalability and can easily be integrated with production decision-making as well as production and logistics operations in the supply chain.

Proceedings ArticleDOI
17 Aug 2015
TL;DR: Corral is a scheduling framework that uses characteristics of future workloads to determine an offline schedule which jointly places data and compute to achieve better data locality, and isolates jobs both spatially (by scheduling them in different parts of the cluster) and temporally, improving their performance.
Abstract: To reduce the impact of network congestion on big data jobs, cluster management frameworks use various heuristics to schedule compute tasks and/or network flows. Most of these schedulers consider the job input data fixed and greedily schedule the tasks and flows that are ready to run. However, a large fraction of production jobs are recurring with predictable characteristics, which allows us to plan ahead for them. Coordinating the placement of data and tasks of these jobs allows for significantly improving their network locality and freeing up bandwidth, which can be used by other jobs running on the cluster. With this intuition, we develop Corral, a scheduling framework that uses characteristics of future workloads to determine an offline schedule which (i) jointly places data and compute to achieve better data locality, and (ii) isolates jobs both spatially (by scheduling them in different parts of the cluster) and temporally, improving their performance. We implement Corral on Apache Yarn, and evaluate it on a 210 machine cluster using production workloads. Compared to Yarn's capacity scheduler, Corral reduces the makespan of these workloads up to 33% and the median completion time up to 56%, with 20-90% reduction in data transferred across racks.

Journal ArticleDOI
TL;DR: It is proved that the expected makespan of scheduling stochastic tasks is greater than or equal to the makespan of scheduling deterministic tasks, where all processing times and communication times are replaced by their expected values.
Abstract: Generally, a parallel application consists of precedence constrained stochastic tasks, where task processing times and intertask communication times are random variables following certain probability distributions. Scheduling such precedence constrained stochastic tasks with communication times on a heterogeneous cluster system with processors of different computing capabilities to minimize a parallel application's expected completion time is an important but very difficult problem in parallel and distributed computing. In this paper, we present a model of scheduling stochastic parallel applications on heterogeneous cluster systems. We discuss stochastic scheduling attributes and methods to deal with various random variables in scheduling stochastic tasks. We prove that the expected makespan of scheduling stochastic tasks is greater than or equal to the makespan of scheduling deterministic tasks, where all processing times and communication times are replaced by their expected values. To solve the problem of scheduling precedence constrained stochastic tasks efficiently and effectively, we propose a stochastic dynamic level scheduling (SDLS) algorithm, which is based on stochastic bottom levels and stochastic dynamic levels. Our rigorous performance evaluation results clearly demonstrate that the proposed stochastic task scheduling algorithm significantly outperforms existing algorithms in terms of makespan, speedup, and makespan standard deviation.
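
The inequality between expected and deterministic makespan stated above can be checked on a toy case. The instance below is hypothetical and is not taken from the paper: two independent tasks run in parallel with no communication, and each takes 1 or 3 time units with equal probability. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Hypothetical instance: two independent tasks on two processors,
# each lasting 1 or 3 time units with probability 1/2.
outcomes = [1, 3]
prob = Fraction(1, 2)

# Expected makespan: average of max(t1, t2) over all outcome pairs.
expected_makespan = sum(
    prob * prob * max(t1, t2) for t1, t2 in product(outcomes, repeat=2)
)

# Deterministic makespan: replace each duration by its expected value first.
mean_duration = sum(prob * t for t in outcomes)
deterministic_makespan = max(mean_duration, mean_duration)

print(expected_makespan, deterministic_makespan)
```

Here the expected makespan is 5/2 while the deterministic one is 2: averaging after taking the maximum cannot give less than taking the maximum of the averages, which is why a schedule planned on expected values alone tends to be optimistic.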

Journal ArticleDOI
TL;DR: This paper presents three task scheduling algorithms, called MCC, MEMAX and CMMN, for heterogeneous multi-cloud environments, which aim to minimize the makespan and maximize the average cloud utilization.
Abstract: Cloud Computing has grown exponentially in the business and research community over the last few years. It is now an emerging field and becomes more popular due to recent advances in virtualization technology. In Cloud Computing, various applications are submitted to the datacenters to obtain some services on a pay-per-use basis. However, due to limited resources, some workloads are transferred to other data centers to handle peak client demands. Therefore, scheduling workloads in a heterogeneous multi-cloud environment is a hot topic and very challenging due to heterogeneity of the cloud resources with varying capacities and functionalities. In this paper, we present three task scheduling algorithms, called MCC, MEMAX and CMMN, for heterogeneous multi-cloud environments, which aim to minimize the makespan and maximize the average cloud utilization. The proposed MCC algorithm is a single-phase scheduling algorithm whereas the rest are two-phase scheduling algorithms. We perform rigorous experiments on the proposed algorithms using various benchmark as well as synthetic datasets. Their performances are evaluated in terms of makespan and average cloud utilization, and experimental results are compared with those of existing single-phase and two-phase scheduling algorithms to demonstrate the efficacy of the proposed algorithms.

Journal ArticleDOI
TL;DR: An event-driven model involving three types of events (departure events, arrival events, and passenger arrival rates change events) is proposed for solving the train scheduling problem for an urban rail transit network.
Abstract: This paper considers the train scheduling problem for an urban rail transit network. We propose an event-driven model that involves three types of events, i.e., departure events, arrival events, and passenger arrival rates change events. The routing of the arriving passengers at transfer stations is also included in the train scheduling model. Moreover, the passenger transfer behavior (i.e., walking times and transfer times of passengers) is also taken into account in the model formulation. The resulting optimization problem is a real-valued nonlinear nonconvex problem. Nonlinear programming approaches (e.g., sequential quadratic programming) and evolutionary algorithms (e.g., genetic algorithms) can be used to solve this train scheduling problem. The effectiveness of the event-driven model is evaluated through a case study.

Proceedings Article
08 Jul 2015
TL;DR: This paper addresses the problem of efficiently scheduling large clusters under high load and heterogeneous workloads by proposing a new hybrid centralized/distributed scheduler, called Hawk, together with a novel and efficient randomized work-stealing algorithm.
Abstract: This paper addresses the problem of efficient scheduling of large clusters under high load and heterogeneous workloads. A heterogeneous workload typically consists of many short jobs and a small number of large jobs that consume the bulk of the cluster's resources. Recent work advocates distributed scheduling to overcome the limitations of centralized schedulers for large clusters with many competing jobs. Such distributed schedulers are inherently scalable, but may make poor scheduling decisions because of limited visibility into the overall resource usage in the cluster. In particular, we demonstrate that under high load, short jobs can fare poorly with such a distributed scheduler. We propose instead a new hybrid centralized/distributed scheduler, called Hawk. In Hawk, long jobs are scheduled using a centralized scheduler, while short ones are scheduled in a fully distributed way. Moreover, a small portion of the cluster is reserved for the use of short jobs. In order to compensate for the occasional poor decisions made by the distributed scheduler, we propose a novel and efficient randomized work-stealing algorithm. We evaluate Hawk using a trace-driven simulation and a prototype implementation in Spark. In particular, using a Google trace, we show that under high load, compared to the purely distributed Sparrow scheduler, Hawk improves the 50th and 90th percentile runtimes by 80% and 90% for short jobs and by 35% and 10% for long jobs, respectively. Measurements of a prototype implementation using Spark on a 100-node cluster confirm the results of the simulation.
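The hybrid idea described above can be sketched in a few lines. This is a toy illustration and not Hawk's implementation: the runtime cutoff, the `Worker` class, the random placement used as a stand-in for the distributed path, and the rule that an idle worker steals short jobs queued behind a long one are all assumptions made for the example.

```python
import random
from collections import deque

CUTOFF = 10.0  # hypothetical runtime estimate separating short from long jobs


class Worker:
    def __init__(self, name):
        self.name = name
        self.queue = deque()  # entries are (job_id, estimated_runtime)

    def load(self):
        return sum(runtime for _, runtime in self.queue)


def place(job_id, runtime, general, reserved, rng):
    """Route one job: long jobs centrally, short jobs without global state."""
    if runtime >= CUTOFF:
        # Centralized path: full view of the general partition, least loaded wins.
        target = min(general, key=Worker.load)
    else:
        # Distributed path (stand-in): any worker, including the partition
        # reserved for short jobs, chosen at random.
        target = rng.choice(general + reserved)
    target.queue.append((job_id, runtime))
    return target


def steal(idle, workers, rng, attempts=3):
    """An idle worker probes random victims and takes short jobs that are
    stuck behind the first long job in the victim's queue."""
    candidates = [w for w in workers if w is not idle]
    for victim in rng.sample(candidates, min(attempts, len(candidates))):
        jobs = list(victim.queue)
        for i, (_, runtime) in enumerate(jobs):
            if runtime >= CUTOFF:
                stolen = [j for j in jobs[i + 1:] if j[1] < CUTOFF]
                if stolen:
                    victim.queue = deque(j for j in jobs if j not in stolen)
                    idle.queue.extend(stolen)
                    return stolen
                break  # nothing short behind the long job; try next victim
    return []
```

Stealing only the short jobs behind a long one targets the case the abstract highlights: short jobs that fare poorly because a placement made with limited visibility queued them behind long-running work.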

Proceedings ArticleDOI
27 Aug 2015
TL;DR: Tarcil, a distributed scheduler that targets both scheduling speed and quality, is presented; it uses an analytically derived sampling framework that adjusts the sample size based on load and provides statistical guarantees on the quality of allocated resources.
Abstract: Scheduling diverse applications in large, shared clusters is particularly challenging. Recent research on cluster scheduling focuses either on scheduling speed, using sampling to quickly assign resources to tasks, or on scheduling quality, using centralized algorithms that search for the resources that improve both task performance and cluster utilization. We present Tarcil, a distributed scheduler that targets both scheduling speed and quality. Tarcil uses an analytically derived sampling framework that adjusts the sample size based on load, and provides statistical guarantees on the quality of allocated resources. It also implements admission control when sampling is unlikely to find suitable resources. This makes it appropriate for large, shared clusters hosting short- and long-running jobs. We evaluate Tarcil on clusters with hundreds of servers on EC2. For highly-loaded clusters running short jobs, Tarcil improves task execution time by 41% over a distributed, sampling-based scheduler. For more general scenarios, Tarcil achieves near-optimal performance for 4× and 2× more jobs than sampling-based and centralized schedulers respectively.
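A minimal sketch of load-adaptive, sampling-based placement with admission control follows. It does not reproduce Tarcil's analytically derived framework: the `sample_size` rule, the quality scores, and the `min_quality` threshold are hypothetical choices made only to show the shape of the idea.

```python
import random


def sample_size(load, base=4, max_size=32):
    """Hypothetical rule: probe more servers as the cluster fills up.

    load is the fraction of busy servers, between 0 and 1.
    """
    if load >= 1.0:
        return max_size
    return min(max_size, max(base, round(base / (1.0 - load))))


def schedule(quality, free, load, rng, min_quality=0.5):
    """Pick a server for one task by sampling.

    quality[s] -- assumed score in [0, 1] of server s for this task
    free       -- set of idle server ids
    Returns the chosen server id, or None if the task should wait.
    """
    if not free:
        return None
    pool = sorted(free)
    d = min(len(pool), sample_size(load))
    candidates = rng.sample(pool, d)
    best = max(candidates, key=lambda s: quality[s])
    if quality[best] < min_quality:
        # Admission control: sampling did not find a suitable server.
        return None
    return best


rng = random.Random(1)
quality = {0: 0.2, 1: 0.9, 2: 0.6}
print(schedule(quality, {0, 1, 2}, load=0.0, rng=rng))
```

Sampling keeps each decision cheap because only a few servers are inspected, while enlarging the sample under load raises the chance that at least one inspected server is a good match.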

Journal ArticleDOI
TL;DR: A resource-aware hybrid scheduling algorithm is proposed for heterogeneous distributed computing, especially modern High-Performance Computing (HPC) systems in which applications are modeled with various requirements (both I/O-intensive and compute-intensive), with an emphasis on data from multimedia applications.

Posted Content
TL;DR: A quantum annealing solver for the renowned job-shop scheduling problem (JSP) is presented in detail and the results from the processor are compared against state-of-the-art global-optimum solvers.
Abstract: A quantum annealing solver for the renowned job-shop scheduling problem (JSP) is presented in detail. After formulating the problem as a time-indexed quadratic unconstrained binary optimization problem, several pre-processing and graph embedding strategies are employed to compile optimally parametrized families of the JSP for scheduling instances of up to six jobs and six machines on the D-Wave Systems Vesuvius processor. Problem simplifications and partitioning algorithms, including variable pruning and running strategies that consider tailored binary searches, are discussed and the results from the processor are compared against state-of-the-art global-optimum solvers.
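To make "time-indexed quadratic unconstrained binary optimization" concrete, the sketch below builds one penalty term of such a formulation: a binary variable for each (operation, start time) pair and a penalty that is zero only when every operation starts exactly once. It is an assumption-laden illustration, not the paper's full model; a complete job-shop QUBO would also need terms for precedence within a job and for conflicts on each machine, which are omitted here. Function and variable names are hypothetical.

```python
from itertools import combinations, product


def start_once_qubo(operations, horizon):
    """QUBO for the penalty sum_o (sum_t x[o, t] - 1)^2.

    Because x*x == x for binary x, each squared term expands to
    -sum_t x[o, t] + 2 * sum_{t < t'} x[o, t] * x[o, t'] + 1.
    Returns (Q, offset): Q maps variable pairs to coefficients.
    """
    Q = {}
    for op in operations:
        variables = [(op, t) for t in range(horizon)]
        for v in variables:
            Q[(v, v)] = Q.get((v, v), 0.0) - 1.0   # linear part on the diagonal
        for u, v in combinations(variables, 2):
            Q[(u, v)] = Q.get((u, v), 0.0) + 2.0   # pairwise penalty
    offset = float(len(operations))                # the constant +1 per operation
    return Q, offset


def energy(Q, offset, x):
    """Penalty value of a full 0/1 assignment x (dict: variable -> 0 or 1)."""
    return offset + sum(c * x[u] * x[v] for (u, v), c in Q.items())


# Brute-force check on a tiny instance: energy is 0 exactly for valid starts.
ops, horizon = ["o1", "o2"], 3
Q, offset = start_once_qubo(ops, horizon)
variables = [(o, t) for o in ops for t in range(horizon)]
valid = 0
for bits in product([0, 1], repeat=len(variables)):
    x = dict(zip(variables, bits))
    if energy(Q, offset, x) == 0:
        valid += 1
print(valid)
```

The number of binary variables grows with the product of operations and time slots, which is one reason pruning variables matters when the target hardware has few qubits.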

Journal ArticleDOI
TL;DR: This article addresses the distributed permutation flowshop scheduling problem, in which a set of jobs has to be scheduled over a number of identical factories, each one with its machines arranged as a flowshop.
Abstract: As the interest of practitioners and researchers in scheduling in a multi-factory environment is growing, there is an increasing need to provide efficient algorithms for this type of decision problem, characterised by simultaneously addressing the assignment of jobs to different factories/workshops and their subsequent scheduling. Here we address the so-called distributed permutation flowshop scheduling problem, in which a set of jobs has to be scheduled over a number of identical factories, each one with its machines arranged as a flowshop. Several heuristics have been designed for this problem, although there is no direct comparison among them. In this paper, we propose a new heuristic which exploits the specific structure of the problem. The computational experience carried out on a well-known testbed shows that the proposed heuristic outperforms existing state-of-the-art heuristics, being able to obtain better upper bounds for more than one quarter of the problems in the testbed.
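The problem structure (assign jobs to factories, then sequence them in each factory's flowshop) can be sketched as follows. The makespan recurrence is the standard one for permutation flowshops; the greedy assignment rule is a simple baseline chosen for illustration and is not the heuristic proposed in the paper. The processing-time matrix and function names are hypothetical.

```python
def flowshop_makespan(p, order):
    """Makespan of a permutation flowshop.

    p[j][k] -- processing time of job j on machine k
    order   -- sequence of job indices, identical on every machine
    """
    if not order:
        return 0
    num_machines = len(p[0])
    finish = [0] * num_machines  # completion time of the latest job per machine
    for j in order:
        prev = 0  # completion time of job j on the previous machine
        for k in range(num_machines):
            # A job starts on machine k once it has left machine k-1 and
            # machine k has finished the preceding job.
            prev = max(prev, finish[k]) + p[j][k]
            finish[k] = prev
    return finish[-1]


def greedy_assign(p, jobs, num_factories):
    """Baseline: append each job to the factory whose makespan grows least."""
    factories = [[] for _ in range(num_factories)]
    for j in jobs:
        best = min(
            range(num_factories),
            key=lambda f: flowshop_makespan(p, factories[f] + [j]),
        )
        factories[best].append(j)
    overall = max(flowshop_makespan(p, seq) for seq in factories)
    return factories, overall


# Three jobs, two machines per factory (illustrative numbers only).
p = [[3, 2], [1, 4], [2, 1]]
print(flowshop_makespan(p, [0, 1, 2]))
print(greedy_assign(p, [0, 1, 2], num_factories=2))
```

Because the factories are identical, the overall objective is the largest makespan among them, so a good heuristic has to balance the assignment and the per-factory sequencing together.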