Showing papers on "Scheduling (computing)" published in 1996


Journal ArticleDOI
TL;DR: It is shown that on real and synthetic applications, the “work” and “critical-path length” of a Cilk computation can be used to model performance accurately, and it is proved that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal.

1,688 citations


Journal ArticleDOI
TL;DR: This paper describes a new approximation of fair queuing that achieves nearly perfect fairness in terms of throughput, requires only O(1) work to process a packet, and is simple enough to implement in hardware.
Abstract: Fair queuing is a technique that allows each flow passing through a network device to have a fair share of network resources. Previous schemes for fair queuing that achieved nearly perfect fairness were expensive to implement; specifically, the work required to process a packet in these schemes was O(log(n)), where n is the number of active flows. This is expensive at high speeds. On the other hand, cheaper approximations of fair queuing reported in the literature exhibit unfair behavior. In this paper, we describe a new approximation of fair queuing, that we call deficit round-robin. Our scheme achieves nearly perfect fairness in terms of throughput, requires only O(1) work to process a packet, and is simple enough to implement in hardware. Deficit round-robin is also applicable to other scheduling problems where servicing cannot be broken up into smaller units (such as load balancing) and to distributed queues.
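The deficit round-robin mechanism the abstract describes can be sketched in a few lines (illustrative Python, not the authors' implementation; packet sizes are in bytes):

```python
from collections import deque

def deficit_round_robin(flows, quantum, rounds):
    """Serve a list of FIFO packet queues (each packet is a size in bytes)
    with deficit round-robin. Each flow keeps a deficit counter; every round
    it gains `quantum` bytes of credit and may send packets while the credit
    covers the packet at the head of its queue. Idle flows accumulate no
    credit. Returns (flow_index, packet_size) pairs in service order."""
    queues = [deque(f) for f in flows]
    deficit = [0] * len(flows)
    served = []
    for _ in range(rounds):
        for i, q in enumerate(queues):
            if not q:
                deficit[i] = 0  # an idle flow must not hoard credit
                continue
            deficit[i] += quantum
            while q and q[0] <= deficit[i]:
                pkt = q.popleft()
                deficit[i] -= pkt
                served.append((i, pkt))
    return served
```

Because the per-packet work is a constant number of counter updates, this is the O(1) processing the abstract contrasts with O(log(n)) schemes; a flow whose head packet exceeds the quantum simply accumulates credit across rounds.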

1,589 citations


Journal ArticleDOI
TL;DR: A static scheduling algorithm for allocating task graphs to fully connected multiprocessors which has admissible time complexity, is economical in terms of the number of processors used and is suitable for a wide range of graph structures.
Abstract: In this paper, we propose a static scheduling algorithm for allocating task graphs to fully connected multiprocessors. We discuss six recently reported scheduling algorithms and show that each possesses one drawback or another which can lead to poor performance. The proposed algorithm, which is called the Dynamic Critical-Path (DCP) scheduling algorithm, is different from the previously proposed algorithms in a number of ways. First, it determines the critical path of the task graph and selects the next node to be scheduled in a dynamic fashion. Second, it rearranges the schedule on each processor dynamically in the sense that the positions of the nodes in the partial schedules are not fixed until all nodes have been considered. Third, it selects a suitable processor for a node by looking ahead to the potential start times of the remaining nodes on that processor, and schedules relatively less important nodes to the processors already in use. A global as well as a pair-wise comparison is carried out for all seven algorithms under various scheduling conditions. The DCP algorithm outperforms the previous algorithms by a considerable margin. Despite having a number of new features, the DCP algorithm has admissible time complexity, is economical in terms of the number of processors used and is suitable for a wide range of graph structures.
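For flavor, a much simpler static list scheduler from the same critical-path family can be sketched as follows: priority is the bottom level (longest path from a node to an exit node), and each task goes to the processor that allows the earliest start. This is an illustrative baseline, not the DCP algorithm itself, and it ignores communication costs:

```python
def blevel_schedule(tasks, succ, nprocs):
    """List-schedule a task DAG on identical processors by descending
    bottom level. tasks: {name: weight}; succ: {name: [children]}.
    With positive weights, descending bottom level is a valid
    topological order. Returns {task: finish_time}."""
    memo = {}
    def blevel(t):  # weight of t plus longest path to an exit node
        if t not in memo:
            memo[t] = tasks[t] + max((blevel(c) for c in succ.get(t, [])),
                                     default=0)
        return memo[t]
    preds = {t: [] for t in tasks}
    for t, cs in succ.items():
        for c in cs:
            preds[c].append(t)
    proc_free = [0] * nprocs
    finish = {}
    for t in sorted(tasks, key=blevel, reverse=True):
        ready = max((finish[p] for p in preds[t]), default=0)
        p = min(range(nprocs), key=lambda i: max(proc_free[i], ready))
        start = max(proc_free[p], ready)
        finish[t] = start + tasks[t]
        proc_free[p] = finish[t]
    return finish
```

DCP improves on this kind of heuristic by recomputing the critical path dynamically and leaving partial-schedule positions open until the end.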

842 citations


Proceedings ArticleDOI
24 Mar 1996
TL;DR: This paper proves that if a suitable queueing policy and scheduling algorithm are used then it is possible to achieve 100% throughput for all independent arrival processes.
Abstract: It is well known that head-of-line (HOL) blocking limits the throughput of an input-queued switch with FIFO queues. Under certain conditions, the throughput can be shown to be limited to approximately 58%. It is also known that if non-FIFO queueing policies are used, the throughput can be increased. However it has not been previously shown that if a suitable queueing policy and scheduling algorithm are used then it is possible to achieve 100% throughput for all independent arrival processes. In this paper we prove this to be the case using a simple linear programming argument and quadratic Lyapunov function. In particular we assume that each input maintains a separate FIFO queue for each output and that the switch is scheduled using a maximum weight bipartite matching algorithm.
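The scheduling step the paper analyzes — a maximum weight bipartite matching with queue occupancies as weights — can be illustrated by brute force for a small switch (a sketch; practical implementations use polynomial-time matching algorithms rather than enumerating permutations):

```python
from itertools import permutations

def max_weight_matching(q):
    """Pick the input-output matching that maximizes total matched weight.
    q[i][j] is the occupancy of the virtual output queue at input i for
    output j. Returns (match, weight) where match[i] is the output granted
    to input i. Brute force over all n! permutations, fine for small n."""
    n = len(q)
    best, best_w = None, -1
    for perm in permutations(range(n)):
        w = sum(q[i][perm[i]] for i in range(n))
        if w > best_w:
            best, best_w = list(perm), w
    return best, best_w
```

Note how the per-output FIFO queues (virtual output queues) are what remove head-of-line blocking: the matching can always reach a non-empty queue for any free output.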

829 citations


Journal ArticleDOI
TL;DR: Several well-documented applications of no-wait and blocking scheduling models are described and some ways in which the increasing use of modern manufacturing methods gives rise to other applications are illustrated.
Abstract: An important class of machine scheduling problems is characterized by a no-wait or blocking production environment, where there is no intermediate buffer between machines. In a no-wait environment, a job must be processed from start to completion, without any interruption either on or between machines. Blocking occurs when a job, having completed processing on a machine, remains on the machine until a downstream machine becomes available for processing. A no-wait or blocking production environment typically arises from characteristics of the processing technology itself, or from the absence of storage capacity between operations of a job. In this review paper, we describe several well-documented applications of no-wait and blocking scheduling models and illustrate some ways in which the increasing use of modern manufacturing methods gives rise to other applications. We review the computational complexity of a wide variety of no-wait and blocking scheduling problems and describe several problems which remain open as to complexity. We study several deterministic flowshop, jobshop, and openshop problems and describe efficient and enumerative algorithms, as well as heuristics and results about their performance. The literature on stochastic no-wait and blocking scheduling problems is also reviewed. Finally, we provide some suggestions for future research directions.
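To make the no-wait constraint concrete, here is an illustrative Python sketch that computes the makespan of a fixed job sequence in a two-machine no-wait flowshop: each job's start on the first machine is delayed just enough that it flows to the second machine without waiting.

```python
def no_wait_makespan(jobs):
    """Makespan of a fixed job sequence in a two-machine no-wait flowshop.
    Each job (a, b) runs a time units on machine 1 and then immediately
    b units on machine 2. A job may start on machine 1 only when machine 1
    is free AND machine 2 will be free the moment the job arrives there."""
    c1 = c2 = 0.0  # completion times of the previous job on M1 and M2
    for a, b in jobs:
        start = max(c1, c2 - a)  # delay so the job never waits between machines
        c1 = start + a
        c2 = c1 + b
    return c2
```

The delayed starts are exactly what distinguishes no-wait from an ordinary flowshop, where a job could instead sit in a buffer between the machines.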

815 citations


Journal ArticleDOI
TL;DR: It is shown that the performance ranking of priority rules does not differ between single-pass scheduling and sampling, that sampling improves the performance of single-pass scheduling significantly, and that the parallel method cannot generally be considered superior.

685 citations


Proceedings ArticleDOI
28 Aug 1996
TL;DR: The Start-time Fair Queuing algorithm is presented; it is computationally efficient, achieves fairness regardless of variation in server capacity, and has the smallest fairness measure among all known fair scheduling algorithms.
Abstract: We present the Start-time Fair Queuing (SFQ) algorithm, which is computationally efficient, achieves fairness regardless of variation in server capacity, and has the smallest fairness measure among all known fair scheduling algorithms. We analyze its throughput, single server delay, and end-to-end delay guarantee for variable rate Fluctuation Constrained (FC) and Exponentially Bounded Fluctuation (EBF) servers. We show that SFQ is better suited than Weighted Fair Queuing for integrated services networks and that it is strictly better than Self Clocked Fair Queuing. To support heterogeneous services and multiple protocol families in integrated services networks, we present a hierarchical SFQ scheduler and derive its performance bounds. Our analysis demonstrates that SFQ is suitable for integrated services networks since it: (1) achieves low average as well as maximum delay for low-throughput applications (e.g., interactive audio, telnet, etc.); (2) provides fairness which is desirable for VBR video; (3) provides fairness, regardless of variation in server capacity, for throughput-intensive, flow-controlled data applications; (4) enables hierarchical link sharing which is desirable for managing heterogeneity; and (5) is computationally efficient.
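The core tagging rule of start-time fair queuing — serve the packet with the smallest virtual start tag, where a flow's next start tag is the maximum of the current virtual time and that flow's previous finish tag — can be sketched as follows (a simplified single-link model for illustration, not the paper's full algorithm):

```python
import heapq

class StartTimeFairQueue:
    """Minimal SFQ sketch. Each packet gets a start tag
    max(virtual_time, flow's last finish tag) and a finish tag
    start + length / weight; service order is by smallest start tag,
    and virtual time tracks the start tag of the packet in service."""

    def __init__(self):
        self.v = 0.0       # virtual time (0 while the link is idle)
        self.finish = {}   # last finish tag per flow
        self.heap = []     # (start_tag, seq, flow, length)
        self.seq = 0       # FIFO tie-breaker for equal start tags

    def enqueue(self, flow, length, weight):
        start = max(self.v, self.finish.get(flow, 0.0))
        self.finish[flow] = start + length / weight
        heapq.heappush(self.heap, (start, self.seq, flow, length))
        self.seq += 1

    def dequeue(self):
        start, _, flow, length = heapq.heappop(self.heap)
        self.v = start  # virtual time advances with the served packet
        return flow, length
```

With flows A (weight 1) and B (weight 2) each holding two unit-length packets, B is served twice in the first three slots, matching the 2:1 weights. Because virtual time is read off the packet in service rather than simulated, the cost per packet is a heap operation — the source of the computational efficiency claimed above.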

610 citations


Patent
13 Aug 1996
TL;DR: In this paper, an electronic Personal Information Manager (PIM) including a peer-to-peer group scheduling/calendar system is described, and the system generates a scheduling invitation incorporating different formats.
Abstract: An electronic Personal Information Manager (PIM) including a peer-to-peer group scheduling/calendar system is described. The group scheduling/calendar system provides methods for peer-to-peer group scheduling among users, including those users who only have simple e-mail support (i.e., do not have access to the group scheduling/calendar system itself). If a user is able to receive and respond to e-mail, he or she is able to participate in group scheduling in an automated fashion. Under user control, the system generates a scheduling invitation incorporating different formats. Each format includes, in order of increasing content richness, a simple text embedded scheduling invitation, an HTML (Hypertext Markup Language) form embedded scheduling invitation, and a proprietary binary "MIME" (Multipurpose Internet Mail Extensions) scheduling invitation. Each format is designed to transfer the highest degree of information content which a particular target client type can handle. A recipient of the scheduling message employs the messaging format best suited for his or her environment. Regardless of which format the recipient employs, the group scheduling system processes the reply message automatically, with the appropriate information automatically included in the user's group scheduling calendar. The system supports different levels of participation of various individuals throughout various stages of group scheduling, despite the fact that some of the individuals who need to participate might use other proprietary software and reside in other time zones.

603 citations


Proceedings ArticleDOI
24 Mar 1996
TL;DR: The theory of LR-servers enables computation of tight upper-bounds on end-to-end delay and buffer requirements in a network of servers in which the servers on a path may not all use the same scheduling algorithm.
Abstract: In this paper, we develop a general model, called latency-rate servers (LR-servers), for the analysis of traffic scheduling algorithms in broadband packet networks. The behavior of an LR scheduler is determined by two parameters-the latency and the allocated rate. We show that several well-known scheduling algorithms, such as weighted fair queueing, virtualclock, self-clocked fair queueing, weighted round robin, and deficit round robin, belong to the class of LR-servers. We derive tight upper bounds on the end-to-end delay, internal burstiness, and buffer requirements of individual sessions in an arbitrary network of LR-servers in terms of the latencies of the individual schedulers in the network, when the session traffic is shaped by a leaky bucket. Thus, the theory of LR-servers enables computation of tight upper-bounds on end-to-end delay and buffer requirements in a network of servers in which the servers on a path may not all use the same scheduling algorithm. We also define a self-contained approach to evaluate the fairness of LR-servers and use it to compare the fairness of many well-known scheduling algorithms.
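The additive structure of the bound described above is easy to evaluate: for a session shaped by a leaky bucket with burst sigma and rate rho, crossing a chain of LR-servers that each allocate at least rate rho, the end-to-end delay bound has the form of the burst-drain time plus the sum of the per-server latencies. The sketch below illustrates that form and deliberately omits the packet-size terms of the full result:

```python
def lr_delay_bound(sigma, rho, latencies):
    """Illustrative end-to-end delay bound (seconds) for a leaky-bucket
    (sigma bytes burst, rho bytes/s rate) session crossing LR-servers
    with the given latencies, each allocating rate >= rho. Packet-size
    terms in the exact bound are omitted for simplicity."""
    return sigma / rho + sum(latencies)

# Example: 10 kB burst at 1 MB/s across two servers with 1 ms and 2 ms latency.
bound = lr_delay_bound(10000, 1e6, [0.001, 0.002])  # 0.013 s
```

The practical point of the abstract is visible here: each server contributes only its own latency term, so the servers on a path need not run the same scheduling algorithm.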

493 citations


Proceedings ArticleDOI
17 Nov 1996
TL;DR: A set of principles underlying application-level scheduling is defined, and work in progress on building AppLeS (application-level scheduling) agents is described and illustrated with a detailed description and results for a distributed 2D Jacobi application on two production heterogeneous platforms.
Abstract: Heterogeneous networks are increasingly being used as platforms for resource-intensive distributed parallel applications. A critical contributor to the performance of such applications is the scheduling of constituent application tasks on the network. Since often the distributed resources cannot be brought under the control of a single global scheduler, the application must be scheduled by the user. To obtain the best performance, the user must take into account both application-specific and dynamic system information in developing a schedule which meets his or her performance criteria. In this paper, we define a set of principles underlying application-level scheduling and describe our work-in-progress building AppLeS (application-level scheduling) agents. We illustrate the application-level scheduling approach with a detailed description and results for a distributed 2D Jacobi application on two production heterogeneous platforms.

452 citations


Proceedings ArticleDOI
04 Dec 1996
TL;DR: An algorithm is presented that optimizes task frequencies and then schedules the resulting tasks with the limited computing resources available; the approach is also applicable to failure recovery and reconfiguration in real-time control systems.
Abstract: Most real-time computer-controlled systems are built in two separate steps, each in isolation: controller design and its digital implementation. Computational tasks that realize the control algorithms are usually scheduled by treating their execution times and periods as unchangeable parameters. Task scheduling therefore depends only on the limited computing resources available. On the other hand, controller design is primarily based on the continuous-time dynamics of the physical system being controlled. The set of tasks resulting from this controller design may not be schedulable with the limited computing resources available. Even if the given set of tasks is schedulable, the overall control performance may not be optimal in the sense that the tasks do not make full use of the computing resources. We propose an integrated approach to controller design and task scheduling. Specifically, task frequencies (or periods) are allowed to vary within a certain range as long as such a change does not affect critical control functions such as maintenance of system stability. We present an algorithm that optimizes task frequencies and then schedules the resulting tasks with the limited computing resources available. The proposed approach is also applicable to failure recovery and reconfiguration in real-time control systems.

Book
01 Sep 1996
TL;DR: The authors present the design and analysis of load distribution strategies for arbitrarily divisible loads in multiprocessor/multicomputer systems subject to the system constraints in the form of communication delays.
Abstract: From the Publisher: This book provides an in-depth study concerning a class of problems in the general area of load sharing and balancing in parallel and distributed systems. The authors present the design and analysis of load distribution strategies for arbitrarily divisible loads in multiprocessor/multicomputer systems subject to the system constraints in the form of communication delays. In particular, two system architectures - single-level tree or star network, and linear network - are thoroughly analyzed.

Proceedings ArticleDOI
13 May 1996
TL;DR: The proposed proportional share resource allocation algorithm provides support for dynamic operations, such as processes joining or leaving the competition, and for both fractional and non-uniform time quanta.
Abstract: We propose and analyze a proportional share resource allocation algorithm for realizing real-time performance in time-shared operating systems. Processes are assigned a weight which determines a share (percentage) of the resource they are to receive. The resource is then allocated in discrete-sized time quanta in such a manner that each process makes progress at a precise, uniform rate. Proportional share allocation algorithms are of interest because: they provide a natural means of seamlessly integrating real and non-real-time processing; they are easy to implement; they provide a simple and effective means of precisely controlling the real-time performance of a process; and they provide a natural means of policing so that processes that use more of a resource than they request have no ill-effect on well-behaved processes. We analyze our algorithm in the context of an idealized system in which a resource is assumed to be granted in arbitrarily small intervals of time and show that our algorithm guarantees that the difference between the service time that a process should receive and the service time it actually receives is optimally bounded by the size of a time quantum. In addition, the algorithm provides support for dynamic operations, such as processes joining or leaving the competition, and for both fractional and non-uniform time quanta. As a proof of concept we have implemented a prototype of a CPU scheduler under FreeBSD. The experimental results show that our implementation performs within the theoretical bounds and hence supports real-time execution in a general purpose operating system.
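The idea of allocating quanta so that each process progresses at a rate proportional to its weight can be sketched with a minimal virtual-time allocator (illustrative only, not the paper's algorithm; ties are broken by process index):

```python
def proportional_share(weights, quanta):
    """Allocate `quanta` fixed-size quanta among processes by always
    picking the process with the least virtual time, where a process's
    virtual time is its accumulated service divided by its weight.
    Returns the schedule as a list of process indices."""
    vtime = [0.0] * len(weights)
    schedule = []
    for _ in range(quanta):
        p = min(range(len(weights)), key=lambda i: (vtime[i], i))
        schedule.append(p)
        vtime[p] += 1.0 / weights[p]  # one quantum of service, weight-scaled
    return schedule
```

With weights 2:1 over six quanta the schedule is [0, 1, 0, 0, 1, 0], i.e. four quanta to the heavy process and two to the light one; at every prefix, each process's received service stays within about one quantum of its ideal share, which is the lag bound the abstract proves for the real algorithm.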

Journal ArticleDOI
TL;DR: Five new on-line algorithms for servicing soft aperiodic requests in real-time systems, where a set of hard periodic tasks is scheduled using the Earliest Deadline First (EDF) algorithm, can achieve full processor utilization and enhance aperiodic responsiveness.
Abstract: In this paper we present five new on-line algorithms for servicing soft aperiodic requests in real-time systems, where a set of hard periodic tasks is scheduled using the Earliest Deadline First (EDF) algorithm. All the proposed solutions can achieve full processor utilization and enhance aperiodic responsiveness while still guaranteeing the execution of the periodic tasks. Operation of the algorithms, performance, schedulability analysis, and implementation complexity are discussed and compared with classical alternative solutions, such as background and polling service. Extensive simulations show that algorithms with contained run-time overhead present nearly optimal responsiveness. A valuable contribution of this work is to provide the real-time system designer with a wide range of practical solutions which allow efficiency to be balanced against implementation complexity.
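One of the simplest mechanisms in this family is total-bandwidth-style deadline assignment for aperiodic requests: give the server a utilization budget Us and stamp each request with an EDF deadline that spends exactly its share of that budget. The sketch below illustrates the general idea and is not necessarily the paper's exact formulation:

```python
def tbs_deadlines(requests, Us):
    """Assign EDF deadlines to aperiodic requests in the style of a
    total bandwidth server with utilization Us. Each request (arrival r,
    execution time C) gets deadline d = max(r, previous deadline) + C / Us,
    so the aperiodic stream never demands more than a fraction Us of the
    processor and the hard periodic tasks remain guaranteed."""
    d_prev = 0.0
    deadlines = []
    for r, C in requests:  # requests sorted by arrival time
        d = max(r, d_prev) + C / Us
        deadlines.append(d)
        d_prev = d
    return deadlines
```

Running the stamped requests through the ordinary EDF queue alongside the periodic tasks is what lets such servers reach full processor utilization without a separate scheduling mechanism.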

Proceedings ArticleDOI
24 Mar 1996
TL;DR: This work compares a variety of channel state dependent packet (CSDP) scheduling methods with a view towards enhancing the performance of the transport layer sessions and indicates that by employing a CSDP scheduler at the wireless LAN device driver level, significant improvement in the channel utilization can be achieved in typical wireless LAN configurations.
Abstract: Unlike wired networks, packets transmitted on wireless channels are often subject to burst errors which cause back to back packet losses. Most wireless LAN link layer protocols recover from packet losses by retransmitting lost segments. When the wireless channel is in a burst error state, most retransmission attempts fail thereby causing poor utilization of the wireless channel. Furthermore, in the event of multiple sessions sharing a wireless link, FIFO packet scheduling can cause the HOL blocking effect, resulting in unfair sharing of the bandwidth. This observation leads to a new class of packet dispatching methods which explicitly take the wireless channel characteristics into consideration in making packet dispatching decisions. We compare a variety of channel state dependent packet (CSDP) scheduling methods with a view towards enhancing the performance of the transport layer sessions. Our results indicate that by employing a CSDP scheduler at the wireless LAN device driver level, significant improvement in the channel utilization can be achieved in typical wireless LAN configurations.

Journal ArticleDOI
TL;DR: This paper focuses on the class of rate-controlled service (RCS) disciplines, in which traffic from each connection is reshaped at every hop, develops end-to-end delay bounds for the general case where different reshapers are used at each hop, and establishes that these bounds can be achieved when the shapers at each hop have the same "minimal" envelope.
Abstract: This paper addresses the problem of providing per-connection end-to-end delay guarantees in a high-speed network. We consider a network comprised of store-and-forward packet switches, in which a packet scheduler is available at each output link. We assume that the network is connection oriented and enforces some admission control which ensures that the source traffic conforms to specified traffic characteristics. We concentrate on the class of rate-controlled service (RCS) disciplines, in which traffic from each connection is reshaped at every hop, and develop end-to-end delay bounds for the general case where different reshapers are used at each hop. In addition, we establish that these bounds can also be achieved when the shapers at each hop have the same "minimal" envelope. The main disadvantage of this class of service discipline is that the end-to-end delay guarantees are obtained as the sum of the worst-case delays at each node, but we show that this problem can be alleviated through "proper" reshaping of the traffic. We illustrate the impact of this reshaping by demonstrating its use in designing RCS disciplines that outperform service disciplines that are based on generalized processor sharing (GPS). Furthermore, we show that we can restrict the space of "good" shapers to a family which is characterized by only one parameter. We also describe extensions to the service discipline that make it work conserving and as a result reduce the average end-to-end delays.

Journal ArticleDOI
TL;DR: Exact schedulability conditions are presented for three packet scheduling methods: earliest-deadline-first (EDF), static-priority (SP), and a novel scheduling method, referred to as rotating-priority-queues (RPQ), by characterizing the worst-case traffic with general subadditive functions, which can be applied to a large class of traffic models.
Abstract: To support the requirements for the transmission of continuous media, such as audio and video, multiservice packet-switching networks must provide service guarantees to connections, including guarantees on throughput, network delays, and network delay variations. For the most demanding applications, the network must offer a service which provides deterministically bounded delay guarantees, referred to as "bounded delay service." The admission control functions in a network with a bounded delay service require 'schedulability conditions' that detect violations of delay guarantees in a network switch. Exact schedulability conditions are presented for three packet scheduling methods: earliest-deadline-first (EDF), static-priority (SP), and a novel scheduling method, referred to as rotating-priority-queues (RPQ). By characterizing the worst-case traffic with general subadditive functions, the presented schedulability conditions can be applied to a large class of traffic models. Examples, which include actual MPEG video traces, are presented to demonstrate the trade-offs involved in selecting a packet scheduling method for a bounded delay service.

Proceedings ArticleDOI
17 Jun 1996
TL;DR: This study proposes a batching policy that schedules the video with the maximum factored queue length and shows that MFQ yields excellent empirical results in terms of standard performance measures such as average latency time, defection rates and fairness.
Abstract: In a video-on-demand environment, batching of video requests is often used to reduce I/O demand and improve throughput. Since viewers may defect if they experience long waits, a good video scheduling policy needs to consider not only the batch size but also the viewer defection probabilities and wait times. Two conventional scheduling policies for batching are first-come-first-served (FCFS) and maximum queue length (MQL). Neither of these policies leads to entirely satisfactory results. MQL tends to be too aggressive in scheduling popular videos by only considering the queue length to maximize batch size, while FCFS has the opposite effect. We introduce the notion of factored queue length and propose a batching policy that schedules the video with the maximum factored queue length. We refer to this as the MFQ policy. The factored queue length is obtained by weighting each video queue length with a factor which is biased against the more popular videos. An optimization problem is formulated to solve for the best weighting factors for the various videos. A simulation is developed to compare the proposed MFQ policy with FCFS and MQL. Our study shows that MFQ yields excellent empirical results in terms of standard performance measures such as average latency time, defection rates and fairness.
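The selection rule can be sketched directly; the square-root damping factor below is purely illustrative, since the paper derives the actual weighting factors from an optimization problem:

```python
def mfq_select(queues, popularity):
    """Pick the video to batch next by maximum *factored* queue length:
    each queue length is divided by a factor that grows with the video's
    popularity, damping MQL's bias toward hot titles. The sqrt factor is
    an illustrative stand-in for the optimized weights of the paper."""
    def factored(v):
        return len(queues[v]) / (popularity[v] ** 0.5)
    return max(range(len(queues)), key=factored)
```

For example, a queue of 3 viewers for an unpopular title then beats a queue of 4 viewers for a title sixteen times as popular, whereas plain MQL would always pick the longer queue.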

Patent
14 Nov 1996
TL;DR: In this article, the authors propose a method to allocate bandwidth, fairly and dynamically, in a shared-media packet switched network to accommodate both elastic and inelastic applications by using a weighted fair queuing algorithm or a virtual clock algorithm to generate a sequence of upstream slot/transmission assignment grants.
Abstract: A method in accordance with the invention allocates bandwidth, fairly and dynamically, in a shared-media packet switched network to accommodate both elastic and inelastic applications. The method, executed by or in a head-end controller, allocates bandwidth transmission slots, converting requests for bandwidth into virtual scheduling times for granting access to the shared media. The method can use a weighted fair queuing algorithm or a virtual clock algorithm to generate a sequence of upstream slot/transmission assignment grants. The method supports multiple quality of service (QoS) classes via mechanisms which give highest priority to the service class with the most stringent QoS requirements.

01 Jan 1996
TL;DR: It appears that preemptive and non-preemptive scheduling are closely related and that the analysis of fixed versus dynamic scheduling might be unified according to the concept of higher priority busy period.
Abstract: Scheduling theory, as it applies to the hard real-time environment, has been widely studied in the last twenty years, and it can be difficult to find one's way through the plethora of available results. Our goal is first to collect in a single paper the results known for uniprocessor, non-idling, preemptive/non-preemptive, fixed/dynamic priority driven contexts, considering general task sets as a central figure for the description of possible processor loads, and second to establish new results when needed. In particular, optimality, feasibility conditions and worst-case response times are examined largely by utilizing the concepts of workload, processor demand and busy period. Some classic extensions such as jitter and resource sharing are also considered. Although this work is not oriented toward a formal comparison of these results, it appears that preemptive and non-preemptive scheduling are closely related and that the analysis of fixed versus dynamic scheduling might be unified according to the concept of the higher-priority busy period. In particular, we introduce the notion of the deadline-d busy period for EDF schedules, which we conjecture to be an interesting parallel of the level-i busy period, a concept already used in the analysis of fixed priority driven scheduling.
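The fixed-priority side of this analysis rests on the classical worst-case response-time recurrence, which iterates R = C_i + sum over higher-priority tasks j of ceil(R / T_j) * C_j to a fixed point. A minimal sketch (tasks indexed by descending priority, deadlines assumed at most the periods):

```python
import math

def response_time(C, T, i):
    """Worst-case response time of task i under fixed-priority preemptive
    scheduling. C[j], T[j] are execution time and period of task j, tasks
    indexed by descending priority. Iterates the standard recurrence to a
    fixed point; returns None if the iterate exceeds task i's period."""
    R = C[i]
    while True:
        nxt = C[i] + sum(math.ceil(R / T[j]) * C[j] for j in range(i))
        if nxt == R:
            return R          # fixed point reached: worst-case response time
        if nxt > T[i]:
            return None       # deadline (= period) cannot be met
        R = nxt
```

For C = [1, 2, 3] and T = [4, 6, 12] the fixed points are 1, 3 and 10, all within their periods. The paper's contribution is the corresponding computation for EDF, where no such per-priority-level recurrence was previously available.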

03 Oct 1996
TL;DR: A scheduling policy for complete, bounded execution of Kahn process network programs that operate on infinite streams of data and never terminate is presented, which can guarantee that programs execute forever with bounded buffering whenever possible.
Abstract: We present a scheduling policy for complete, bounded execution of Kahn process network programs. A program is a set of processes that communicate through a network of first-in first-out queues. In a complete execution, the program terminates if and only if all processes block attempting to consume data from empty communication channels. We are primarily interested in programs that operate on infinite streams of data and never terminate. In a bounded execution, the number of data elements buffered in each of the communication channels remains bounded. The Kahn process network model of computation is powerful enough that the questions of termination and bounded buffering are undecidable. No finite-time algorithm can decide these questions for all Kahn process network programs. Fortunately, because we are interested in programs that never terminate, our scheduler has infinite time and can guarantee that programs execute forever with bounded buffering whenever possible. Our scheduling policy has been implemented using Ptolemy, an object-oriented simulation and prototyping environment.

01 Jan 1996
TL;DR: A uniform, flexible approach is proposed for analysing the feasibility of deadline scheduled real-time systems and assumes sporadically periodic tasks with arbitrary deadlines, release jitter, and shared resources.
Abstract: A uniform, flexible approach is proposed for analysing the feasibility of deadline scheduled real-time systems. In its most general formulation, the analysis assumes sporadically periodic tasks with arbitrary deadlines, release jitter, and shared resources. System overheads of a tick driven scheduler implementation, and scheduling of soft aperiodic tasks, are also accounted for. A procedure for the computation of task worst-case response times is also described for the same model. While this problem has been largely studied in the context of fixed priority systems, we are not aware of other works that have proposed a solution to it when deadline scheduling is assumed. The worst-case response time evaluation is a fundamental tool for analysing end-to-end timing constraints in distributed systems [Ti94b].

Patent
28 Aug 1996
TL;DR: The content sources request the schedule from a network resource scheduler, and the scheduler determines at least a start time and a transfer rate for each content source that can be accommodated.
Abstract: The transmission of data (e.g., a computer file) from one or more content sources over a network to one or more replicated servers is scheduled and performed according to the schedule. The content sources request the schedule from a network resource scheduler. The scheduler receives the requests and determines if and how the various requests can be accommodated. The scheduler determines at least a start time and a transfer rate for each of the content sources that can be accommodated.

Journal ArticleDOI
TL;DR: This work presents the state of the art for multiprocessor task scheduling, shows the rationale behind the concept of multiprocessor tasks, and extends the standard three-field notation to accommodate multiprocessor tasks.

Journal ArticleDOI
TL;DR: Results obtained with the proposed model do not indicate exponential growth in the computational time required for larger problems, and the model is general enough to encompass both resource leveling and limited resource allocation problems, unlike existing methods, which are class-dependent.
Abstract: A new approach for resource scheduling using genetic algorithms (GAs) is presented here. The methodology does not depend on any set of heuristic rules. Instead, its strength lies in the selection and recombination tasks of the GA to learn the domain of the specific project network. By this it is able to evolve improved schedules with respect to the objective function. Further, the model is general enough to encompass both resource leveling and limited resource allocation problems unlike existing methods, which are class-dependent. In this paper, the design and mechanisms of the model are described. Case studies with standard test problems are presented to demonstrate the performance of the GA-scheduler when compared against heuristic methods under various resource availability profiles. Results obtained with the proposed model do not indicate an exponential growth in the computational time required for larger problems.

Journal ArticleDOI
TL;DR: A novel branch-and-bound algorithm that branches on both discrete and continuous variables is proposed to address the large integrality gap in the formulation of this mixed integer linear programming (MILP) problem.

Book ChapterDOI
C. Lund1, S. Phillips1, N. Reingold1
23 Apr 1996
TL;DR: This paper presents a new algorithm, Fair Arbitrated Round Robin (FARR), for scheduling the crossbar of a high-speed input-buffered switch, which respects virtual circuit (VC) priorities and has per-VC fairness properties that have previously only been achieved in output- Buffered switches.
Abstract: The rapid growth of inter-networking and the popularity of ATM have resulted in a need for high-speed low-cost network components. This paper presents a new algorithm, Fair Arbitrated Round Robin (FARR), for scheduling the crossbar of a high-speed input-buffered switch. FARR respects virtual circuit (VC) priorities and has per-VC fairness properties that have previously only been achieved in output-buffered switches. Input-buffering is more cost-effective than output-buffering at high speeds, due to much more lenient memory speed requirements. Simulations are presented using a variety of workloads, traffic types, and switch sizes. The simulations demonstrate the performance benefit of FARR over previous input-buffered switch algorithms and show that FARR performs similarly to Fair Prioritized Round Robin running on an output-buffered switch.
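
FARR's exact arbitration rules are not given in the abstract. The following is a generic request-grant-accept round-robin matcher for an input-buffered crossbar, sketched only to show the kind of rotating-pointer fairness such schedulers rely on; it omits VC priorities entirely and is not the FARR algorithm itself.

```python
def rr_match(requests, grant_ptr, accept_ptr, n):
    """One matching round for an n x n crossbar.
    requests[i] is the set of outputs input i has cells queued for;
    grant_ptr/accept_ptr are mutable round-robin pointers per output/input."""
    # Grant phase: each output offers itself to one requesting input,
    # starting the search at its round-robin pointer.
    grants = {}  # output -> input
    for out in range(n):
        reqs = [i for i in range(n) if out in requests[i]]
        if reqs:
            start = grant_ptr[out]
            grants[out] = min(reqs, key=lambda i: (i - start) % n)
    # Accept phase: each input accepts one of its grants, round-robin.
    granted = {}  # input -> list of granting outputs
    for out, inp in grants.items():
        granted.setdefault(inp, []).append(out)
    matches = {}  # input -> output
    for inp, outs in granted.items():
        start = accept_ptr[inp]
        out = min(outs, key=lambda o: (o - start) % n)
        matches[inp] = out
        # Advance pointers past the matched pair so service rotates fairly.
        accept_ptr[inp] = (out + 1) % n
        grant_ptr[out] = (inp + 1) % n
    return matches
```

Run twice on the same requests and the pointers rotate service: an input that monopolized an output in one round yields it in the next.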

Patent
06 May 1996
TL;DR: In this paper, the authors propose a dispatcher model that maintains a dispatch queue for each processor and a separate global dispatch queue for unbound higher-priority real-time threads in a multiprocessor system.
Abstract: The present invention provides a process scheduler or dispatcher for a multiprocessor system for real-time applications. This embodiment of the present invention proposes a dispatcher model that maintains a dispatch queue for each processor and a separate global dispatch queue for unbound higher-priority real-time threads. A processor has its own queue and a dispatcher. Each queue has a separate schedule lock associated with it to protect scheduling operations. A processor's dispatcher selects a thread for execution from one of the queues in the system as a candidate thread to execute. When a candidate thread is selected for execution, the processor proceeds to verify against threads in the global real-time queue and the processor's own dispatch queue to select the highest-priority runnable thread in the system. Thus, the present invention allows the dispatcher to prevent race conditions and minimize lock contention while assuring that high-priority threads are dispatched as quickly as possible. The present invention is implemented by a synchronization between the operations of dispatching a thread and making a thread runnable.
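
A minimal sketch of the queue structure described (one dispatch queue per processor plus a global queue for unbound real-time threads), with the per-queue locking elided. The class and method names are hypothetical, and lower numbers denote higher priority here.

```python
import heapq

class Dispatcher:
    """Per-processor dispatch queues plus a shared global queue for unbound
    real-time threads. Locks are omitted in this single-threaded sketch."""
    def __init__(self):
        self.global_rt = []  # shared min-heap of (priority, thread)
        self.local = {}      # cpu id -> min-heap of (priority, thread)

    def make_runnable(self, thread, prio, cpu=None):
        # cpu=None means an unbound real-time thread: it goes on the
        # global queue where any processor's dispatcher can claim it.
        q = self.global_rt if cpu is None else self.local.setdefault(cpu, [])
        heapq.heappush(q, (prio, thread))

    def dispatch(self, cpu):
        """Compare the heads of the global real-time queue and this
        processor's own queue; run the highest-priority runnable thread."""
        local = self.local.setdefault(cpu, [])
        if self.global_rt and (not local or self.global_rt[0] < local[0]):
            return heapq.heappop(self.global_rt)[1]
        if local:
            return heapq.heappop(local)[1]
        return None  # idle
```

In the patented design the candidate selection and the re-verification are separate steps guarded by schedule locks; the sketch collapses them into one comparison.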

Journal ArticleDOI
TL;DR: A simple greedy algorithm is presented for scheduling parallel programs, represented as directed acyclic task graphs, on distributed memory parallel architectures; it runs in O(n(n lg n+e)) time, which is n times faster than the best previously known algorithm for this problem.
Abstract: This paper addresses the problem of scheduling parallel programs represented as directed acyclic task graphs for execution on distributed memory parallel architectures. Because of the high communication overhead in existing parallel machines, a crucial step in scheduling is task clustering, the process of coalescing fine grain tasks into single coarser ones so that the overall execution time is minimized. The task clustering problem is NP-hard, even when the number of processors is unbounded and task duplication is allowed. A simple greedy algorithm is presented for this problem which, for a task graph with arbitrary granularity, produces a schedule whose makespan is at most twice optimal. Indeed, the quality of the schedule improves as the granularity of the task graph becomes larger. For example, if the granularity is at least 1/2, the makespan of the schedule is at most 5/3 times optimal. For a task graph with n tasks and e inter-task communication constraints, the algorithm runs in O(n(n lg n+e)) time, which is n times faster than the currently best known algorithm for this problem. Similar algorithms are developed that produce: (1) optimal schedules for coarse grain graphs; (2) 2-optimal schedules for trees with no task duplication; and (3) optimal schedules for coarse grain trees with no task duplication.
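
The paper's algorithm is not reproduced here, but the flavor of greedy clustering can be sketched: each task either starts a new cluster or joins the cluster of its critical parent, zeroing that communication edge, whichever yields the earlier start. The data layout and the toy graph in the test are assumptions of this sketch, not the paper's construction.

```python
def greedy_cluster(tasks, edges, topo):
    """tasks: {name: compute time}; edges: {(u, v): communication cost};
    topo: tasks in topological order. Tasks in a cluster run sequentially
    on one processor and communicate at zero cost. Returns (finish, makespan)."""
    parents = {v: [u for u in tasks if (u, v) in edges] for v in tasks}
    cluster_of, cluster_end, finish = {}, {}, {}
    next_cluster = 0
    for v in topo:
        ps = parents[v]
        arrive = {u: finish[u] + edges[(u, v)] for u in ps}  # cross-cluster
        start_new = max(arrive.values(), default=0)
        best = (start_new, None)  # (start time, cluster to join; None = new)
        if ps:
            crit = max(ps, key=lambda u: arrive[u])  # critical parent
            c = cluster_of[crit]
            # Joining crit's cluster zeroes that edge but serializes v
            # after everything already placed in the cluster.
            start_join = max([cluster_end[c]] +
                             [arrive[u] for u in ps if cluster_of[u] != c])
            if start_join < best[0]:
                best = (start_join, c)
        start, c = best
        if c is None:
            c, next_cluster = next_cluster, next_cluster + 1
        cluster_of[v] = c
        finish[v] = start + tasks[v]
        cluster_end[c] = finish[v]
    return finish, max(finish.values())
```

On a fork with expensive edges, both children join the parent's cluster rather than pay the communication cost, which is exactly the coalescing behavior the abstract describes.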

Book ChapterDOI
16 Apr 1996
TL;DR: It is argued that by identifying these assumptions explicitly it is possible to reach a level of convergence in the space of job schedulers for parallel supercomputers, for example by associating a suitable cost function with the execution of each job.
Abstract: The space of job schedulers for parallel supercomputers is rather fragmented, because different researchers tend to make different assumptions about the goals of the scheduler, the information that is available about the workload, and the operations that the scheduler may perform. We argue that by identifying these assumptions explicitly, it is possible to reach a level of convergence. For example, it is possible to unite most of the different assumptions into a common framework by associating a suitable cost function with the execution of each job. The cost function reflects knowledge about the job and the degree to which it fits the goals of the system. Given such cost functions, scheduling is done to maximize the system's profit.
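
A toy rendering of the proposal, assuming each job carries a value function of its completion time and the scheduler picks the order that maximizes total profit. Exhaustive search over a single machine is used purely for illustration; the function names and job data are invented.

```python
from itertools import permutations

def best_schedule(jobs):
    """jobs: {name: (runtime, value_fn)} where value_fn(finish_time) is the
    profit the system earns for completing that job at that time. Enumerate
    all orders on one machine and return (best profit, best order)."""
    best = (float("-inf"), None)
    for order in permutations(jobs):
        t, profit = 0, 0.0
        for name in order:
            runtime, value = jobs[name]
            t += runtime
            profit += value(t)  # profit depends on when the job finishes
        if profit > best[0]:
            best = (profit, order)
    return best
```

The cost functions encode the framework's unified goals: a steeply decaying value models an interactive job, a flat one a batch job, and the scheduler needs no other knowledge of either.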