
Showing papers on "Scheduling (computing)" published in 1994


Book
15 Jul 1994
TL;DR: Scheduling will serve as an essential reference for professionals working on scheduling problems in manufacturing and computing environments, and graduate students in operations management, operations research, industrial engineering and computer science will find the book to be an accessible and invaluable resource.
Abstract: This book on scheduling covers theoretical models as well as scheduling problems in the real world. Author Michael Pinedo also includes a CD that contains slide-shows from industry and movies dealing with implementations of scheduling systems. The book consists of three parts. The first part focuses on deterministic scheduling with the associated combinatorial problems. The second part covers probabilistic scheduling models. In this part it is assumed that processing times and other problem data are not known in advance. The third part deals with scheduling in practice. It covers heuristics that are popular with practitioners and discusses system design and development issues. Each chapter contains a series of computational and theoretical exercises. This book is of interest to theoreticians and practitioners alike. Graduate students in operations management, operations research, industrial engineering and computer science will find the book to be an accessible and invaluable resource. Scheduling will serve as an essential reference for professionals working on scheduling problems in manufacturing and computing environments. Michael Pinedo is the Julius Schlesinger Professor of Operations Management at New York University.

6,209 citations


Proceedings ArticleDOI
14 Nov 1994
TL;DR: This paper introduces a new metric for CPU energy performance, millions-of-instructions-per-joule (MIPJ), considers several methods for varying the clock speed dynamically under control of the operating system, and examines the performance of these methods against workstation traces.
Abstract: The energy usage of computer systems is becoming more important, especially for battery operated systems. Displays, disks, and cpus, in that order, use the most energy. Reducing the energy used by displays and disks has been studied elsewhere; this paper considers a new method for reducing the energy used by the cpu. We introduce a new metric for cpu energy performance, millions-of-instructions-per-joule (MIPJ). We examine a class of methods to reduce MIPJ that are characterized by dynamic control of system clock speed by the operating system scheduler. Reducing clock speed alone does not reduce MIPJ, since to do the same work the system must run longer. However, a number of methods are available for reducing energy with reduced clock-speed, such as reducing the voltage [Chandrakasan et al 1992][Horowitz 1993] or using reversible [Younis and Knight 1993] or adiabatic logic [Athas et al 1994]. What are the right scheduling algorithms for taking advantage of reduced clock-speed, especially in the presence of applications demanding ever more instructions-per-second? We consider several methods for varying the clock speed dynamically under control of the operating system, and examine the performance of these methods against workstation traces. The primary result is that by adjusting the clock speed at a fine grain, substantial CPU energy can be saved with a limited impact on performance.

1,225 citations
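
The MIPJ arithmetic in the abstract above can be illustrated with a minimal sketch. It assumes a simple CMOS model in which energy per cycle scales with the square of the supply voltage, and one instruction per cycle; the function names, the per-cycle energy constant, and the voltages are illustrative assumptions, not values taken from the paper.

```python
def mipj(instructions, energy_joules):
    """Millions of instructions per joule."""
    return instructions / 1e6 / energy_joules


def energy(cycles, volts, joules_per_cycle_at_1v=1e-9):
    # Assumed model: dynamic energy per cycle is proportional to V^2
    # and does not depend on how fast the clock runs.
    return cycles * joules_per_cycle_at_1v * volts ** 2


work = 100_000_000  # cycles; assumed equal to instructions executed

full_speed = mipj(work, energy(work, volts=5.0))
# Halving the clock at the same voltage: same cycles, same energy, same MIPJ.
half_clock_same_voltage = mipj(work, energy(work, volts=5.0))
# Halving the clock and the voltage together: energy per cycle drops by 4x.
half_clock_half_voltage = mipj(work, energy(work, volts=2.5))
```

Under these assumptions, slowing the clock alone leaves MIPJ unchanged, while lowering the voltage along with the clock raises it, which matches the abstract's remark that clock scaling only pays off when combined with a technique such as voltage reduction.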


Journal ArticleDOI
TL;DR: The paper illustrates how a window-based analysis technique can be used to find the worst-case response times of a distributed task set.

753 citations


Journal ArticleDOI
TL;DR: An efficient method based on genetic algorithms is developed to solve the multiprocessor scheduling problem and results comparing the proposed genetic algorithm, the list scheduling algorithm, and the optimal schedule using random task graphs, and a robot inverse dynamics computational task graph are presented.
Abstract: The problem of multiprocessor scheduling can be stated as finding a schedule for a general task graph to be executed on a multiprocessor system so that the schedule length can be minimized. This scheduling problem is known to be NP-hard, and methods based on heuristic search have been proposed to obtain optimal and suboptimal solutions. Genetic algorithms have recently received much attention as a class of robust stochastic search algorithms for various optimization problems. In this paper, an efficient method based on genetic algorithms is developed to solve the multiprocessor scheduling problem. The representation of the search node is based on the order of the tasks being executed in each individual processor. The genetic operator proposed is based on the precedence relations between the tasks in the task graph. Simulation results comparing the proposed genetic algorithm, the list scheduling algorithm, and the optimal schedule using random task graphs, and a robot inverse dynamics computational task graph are presented.

718 citations


Proceedings ArticleDOI
30 Nov 1994
TL;DR: This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models and characterizes the algorithm in terms of the quality of the generated schedules as well as the computational expense incurred.
Abstract: Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well as the computational expense incurred.

696 citations


Proceedings ArticleDOI
20 Nov 1994
TL;DR: This paper gives the first provably good work-stealing scheduler for multithreaded computations with dependencies, and shows that the expected time $T_P$ to execute a fully strict computation on $P$ processors using this work-stealing scheduler is $T_P = O(T_1/P + T_\infty)$, where $T_1$ is the minimum serial execution time of the multithreaded computation and $T_\infty$ is the minimum execution time with an infinite number of processors.
Abstract: This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is "work stealing," in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies. Specifically, our analysis shows that the expected time $T_P$ to execute a fully strict computation on $P$ processors using our work-stealing scheduler is $T_P = O(T_1/P + T_\infty)$, where $T_1$ is the minimum serial execution time of the multithreaded computation and $T_\infty$ is the minimum execution time with an infinite number of processors. Moreover, the space $S_P$ required by the execution satisfies $S_P \le S_1 P$. We also show that the expected total communication of the algorithm is at most $O(T_\infty S_{\max} P)$, where $S_{\max}$ is the size of the largest activation record of any thread, thereby justifying the folk wisdom that work-stealing schedulers are more communication efficient than their work-sharing counterparts. All three of these bounds are existentially optimal to within a constant factor.

660 citations
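
To make the two terms of the execution-time bound in the abstract above concrete, here is a minimal sketch that computes them for a small dependency graph. It assumes unit-time tasks, so the serial time equals the number of tasks and the infinite-processor time equals the longest dependency chain; the graph, the function names, and the omission of the constant hidden by the O-notation are all illustrative assumptions rather than anything specified in the paper.

```python
from functools import lru_cache


def work_and_span(dag):
    """dag maps each task to the tasks that depend on it (unit-time tasks)."""

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Length of the longest dependency chain starting at this task.
        return 1 + max((longest_from(s) for s in dag[node]), default=0)

    work = len(dag)                            # serial execution time
    span = max(longest_from(n) for n in dag)   # time with unlimited processors
    return work, span


def bound_terms(dag, processors):
    # work / P + span, i.e. the expression inside the O(...) of the bound.
    work, span = work_and_span(dag)
    return work / processors + span


# A diamond-shaped computation: a spawns b and c, which both feed d.
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
```

For the diamond graph the work is 4 and the span is 3, so adding processors can never bring the bound below the span term.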


Journal ArticleDOI
TL;DR: A fast branch and bound algorithm for the job-shop scheduling problem has been developed and it solves the 10 × 10 benchmark problem which has been open for more than 20 years.

463 citations


Proceedings ArticleDOI
15 May 1994
TL;DR: The authors designed a processor capacity reservation mechanism that isolates programs from the timing and execution characteristics of other programs in the same way that a memory protection system isolates them from outside memory accesses.
Abstract: Multimedia applications have timing requirements that cannot generally be satisfied using the time-sharing scheduling algorithms of general purpose operating systems. The authors provide the predictability of real-time systems while retaining the flexibility of a time-sharing system. They designed a processor capacity reservation mechanism that isolates programs from the timing and execution characteristics of other programs in the same way that a memory protection system isolates them from outside memory accesses. In the paper, they describe a scheduling framework that supports reservation and admission control, and introduce a novel reserve abstraction, specifically designed for the microkernel architecture, for measuring and controlling processor usage. The authors have implemented processor capacity reserves in Real-Time Mach, and they describe the performance of their system on several types of applications.

451 citations


Proceedings ArticleDOI
07 Dec 1994
TL;DR: An idealised scheduling analysis for the CAN real-time bus is derived, and two actual interface chips are studied to see how the analysis can be applied.
Abstract: The increasing use of communication networks in time-critical applications presents engineers with fundamental problems with the determination of response times of communicating distributed processes. Although there has been some work on the analysis of communication protocols, most of this is for idealised networks. Experience with single-processor scheduling analysis has shown that models which abstract away from implementation details are at best very pessimistic, and at worst lead to an unschedulable system being deemed schedulable. In this paper, we derive an idealised scheduling analysis for the CAN real-time bus, and then study two actual interface chips to see how the analysis can be applied.

449 citations


Journal ArticleDOI
01 Apr 1994
TL;DR: Petri net modeling combined with heuristic search provides a new scheduling method for flexible manufacturing systems that can handle features such as routing flexibility, shared resources, lot sizes and concurrency.
Abstract: Petri net modeling combined with heuristic search provides a new scheduling method for flexible manufacturing systems. The method formulates a scheduling problem with a Petri net model. Then, it generates and searches a partial reachability graph to find an optimal or near optimal feasible schedule in terms of the firing sequence of the transitions of the Petri net model. The method can handle features such as routing flexibility, shared resources, lot sizes and concurrency. By following the generated schedule, potential deadlocks in the Petri net model and the system can be avoided. Hence the analytical overhead to guarantee the liveness of the model and the system is eliminated. Some heuristic functions for efficient search are explored and the experimental results are presented.

401 citations


Journal ArticleDOI
TL;DR: A robust scheduling protocol is proposed which is unique in providing a topology transparent solution to scheduled access in multi-hop mobile radio networks and is robust in the presence of mobile nodes.
Abstract: Transmission scheduling is a key design problem in packet radio networks, relevant to TDMA and CDMA systems. A large number of topology-dependent scheduling algorithms are available, in which changes of topology inevitably require recomputation of transmission schedules. The need for constant adaptation of schedules to mobile topologies entails significant, sometimes insurmountable, problems. These are the protocol overhead due to schedule recomputation, the performance penalty due to suspension of transmissions during schedule reorganization, the exchange of control messages, and the broadcast of new schedules. Furthermore, if topology changes faster than the rate at which new schedules can be recomputed and distributed, the network can suffer a catastrophic failure. The authors propose a robust scheduling protocol which is unique in providing a topology transparent solution to scheduled access in multi-hop mobile radio networks. The proposed solution adds the main advantages of random access protocols to scheduled access. Similarly to random access it is robust in the presence of mobile nodes. Unlike random access, however, it does not suffer from inherent instability, and performance deterioration due to packet collisions. Unlike current scheduled access protocols, the transmission schedules of the proposed solution are independent of topology changes, and channel access is inherently fair and traffic adaptive.

Book
01 Oct 1994
TL;DR: Polynomial and exponential-time optimization algorithms as well as approximation and heuristic ones are presented using a Pascal-like notation, before being discussed in the light of particular problems.
Abstract: A theoretical and application-oriented analysis of deterministic scheduling problems arising in computer and manufacturing environments. The important classical results are surveyed with particular attention paid to single-processor scheduling, along with general models such as resource-constrained scheduling, flexible flow shops, dynamic job shops, and special flexible manufacturing systems. Polynomial and exponential-time optimization algorithms as well as approximation and heuristic ones are presented using a Pascal-like notation, before being discussed in the light of particular problems. Basic concepts from scheduling theory and related fields are described to assist less advanced readers.

Journal ArticleDOI
TL;DR: A fast parallel algorithm is given that provides good solutions to very large problems in a very short computation time and identifies a type of problem for which taboo search provides an optimal solution in a polynomial mean time in practice.
Abstract: We apply the global optimization technique called taboo search to the job shop scheduling problem and show that our method is typically more efficient than the shifting bottleneck procedure, and also more efficient than a recently proposed simulated annealing implementation. We also identify a type of problem for which taboo search provides an optimal solution in a polynomial mean time in practice, while an implementation of the shifting bottleneck procedure seems to take an exponential amount of computation time. Included are computational results that establish new best solutions for a number of benchmark problems from the literature. Finally, we give a fast parallel algorithm that provides good solutions to very large problems in a very short computation time. INFORMS Journal on Computing, ISSN 1091-9856, was published as ORSA Journal on Computing from 1989 to 1995 under ISSN 0899-1499.

Proceedings ArticleDOI
01 May 1994
TL;DR: This work examines the impact of complex logical-to-physical mappings and large prefetching caches on scheduling effectiveness and concludes that the cyclical scan algorithm, which always schedules requests in ascending logical order, achieves the highest performance among seek-reducing algorithms for such workloads.
Abstract: Disk subsystem performance can be dramatically improved by dynamically ordering, or scheduling, pending requests. Via strongly validated simulation, we examine the impact of complex logical-to-physical mappings and large prefetching caches on scheduling effectiveness. Using both synthetic workloads and traces captured from six different user environments, we arrive at three main conclusions: (1) Incorporating complex mapping information into the scheduler provides only a marginal (less than 2%) decrease in response times for seek-reducing algorithms. (2) Algorithms which effectively utilize prefetching disk caches provide significant performance improvements for workloads with read sequentiality. The cyclical scan algorithm (C-LOOK), which always schedules requests in ascending logical order, achieves the highest performance among seek-reducing algorithms for such workloads. (3) Algorithms that reduce overall positioning delays produce the highest performance provided that they recognize and exploit a prefetching cache.
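
The ordering rule of the cyclical scan (C-LOOK) algorithm named in the abstract above can be sketched as follows. This is a minimal illustration assuming requests are plain logical block numbers and ignoring caches, mapping, and arrival times; the function name and the sample request list are illustrative assumptions, not taken from the paper.

```python
def c_look(pending, head):
    """Order pending logical block numbers the way C-LOOK would serve them.

    Requests at or beyond the current head position are served in ascending
    order; the head then jumps back to the lowest pending request and
    continues ascending, so service order is always ascending within a sweep.
    """
    ahead = sorted(block for block in pending if block >= head)
    behind = sorted(block for block in pending if block < head)
    return ahead + behind


order = c_look([98, 183, 37, 122, 14, 124, 65, 67], head=53)
# order == [65, 67, 98, 122, 124, 183, 14, 37]
```

Because every sweep runs in ascending logical order, sequential reads tend to arrive at the disk in the order a prefetching cache expects, which is the property the abstract credits for C-LOOK's performance on such workloads.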

Journal ArticleDOI
A. L. Narasimha Reddy, Jim Wyllie
TL;DR: This work restricts itself to the support provided at the server, with special emphasis on two service phases: disk scheduling and SCSI bus contention, and analyzes how trade-offs that involve buffer space affect the performance of scheduling policies.
Abstract: In future computer system design, I/O systems will have to support continuous media such as video and audio, whose system demands are different from those of data such as text. Multimedia computing requires us to focus on designing I/O systems that can handle real-time demands. Video- and audio-stream playback and teleconferencing are real-time applications with different I/O demands. We primarily consider playback applications which require guaranteed real-time I/O throughput. In a multimedia server, different service phases of a real-time request are disk, small computer systems interface (SCSI) bus, and processor scheduling. Additional service might be needed if the request must be satisfied across a local area network. We restrict ourselves to the support provided at the server, with special emphasis on two service phases: disk scheduling and SCSI bus contention. When requests have to be satisfied within deadlines, traditional real-time systems use scheduling algorithms such as earliest deadline first (EDF) and least slack time first. However, EDF makes the assumption that disks are preemptable, and the seek-time overheads of its strict real-time scheduling result in poor disk utilization. We can provide the constant data rate necessary for real-time requests in various ways that require trade-offs. We analyze how trade-offs that involve buffer space affect the performance of scheduling policies. We also show that deferred deadlines, which increase buffer requirements, improve system performance significantly.

Journal ArticleDOI
TL;DR: Most classical n-job, non-preemptive, single machine scheduling models, i.e. makespan, flow-time, total tardiness, number of tardy jobs, etc, are studied and it is shown that all these models remain polynomially solvable.

Patent
26 Jan 1994
TL;DR: The thread group structure maintains collective timeslice and CPU accounting for all threads in the group, while each individual thread has a local scheduling priority for scheduling among the threads in its group.
Abstract: Closely related processing threads within a process in a multiprocessor system are collected into thread groups which are globally scheduled as a group based on the thread group structure's priority and scheduling parameters. The thread group structure maintains collective timeslice and CPU accounting for all threads in the group. Within each thread group, each individual thread has a local scheduling priority for scheduling among the threads in its group. The system utilizes a hierarchy of processing levels and run queues to facilitate affining thread groups with processors or groups of processors when possible. The system will tend to balance out the workload among system processors and will migrate threads groups up and down through processing levels to increase cache hits and overall performance. The system is periodically reset to avoid long term unbalanced operation conditions.

Journal ArticleDOI
TL;DR: An approach to designing systems that are capable of taking their own computational resources into consideration during planning and problem solving by using expectations about the performance of decision-making procedures and preferences over the outcomes resulting from applying those procedures.

Proceedings ArticleDOI
07 Dec 1994
TL;DR: Four new on-line algorithms for servicing soft aperiodic requests in real-time systems, where a set of hard periodic tasks is scheduled using the Earliest Deadline First (EDF) algorithm are presented.
Abstract: We present four new on-line algorithms for servicing soft aperiodic requests in real-time systems, where a set of hard periodic tasks is scheduled using the Earliest Deadline First (EDF) algorithm. All the proposed solutions can achieve full processor utilization and enhance aperiodic responsiveness, while still guaranteeing the execution of the periodic tasks. Operation of the algorithms, performance, schedulability analysis, and implementation complexity are discussed and compared with classical alternative solutions, such as background and polling service. Extensive simulations show that algorithms with contained run-time overhead present nearly optimal responsiveness. A valuable contribution of this work is to provide the real-time system designer with a wide range of practical solutions that allow efficiency to be balanced against implementation complexity.
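
The "full processor utilization" mentioned in the abstract above rests on the classical EDF result that periodic tasks with deadlines equal to their periods are schedulable exactly when total utilization does not exceed 1. The sketch below checks that condition and reports the fraction of the processor left over for aperiodic service. It is not one of the four algorithms from the paper; the task representation as (execution time, period) pairs and the function names are illustrative assumptions.

```python
from fractions import Fraction


def edf_utilization(tasks):
    """tasks is a list of (execution_time, period) pairs with integer values."""
    return sum(Fraction(c, t) for c, t in tasks)


def spare_bandwidth(tasks):
    """Fraction of the processor not claimed by the hard periodic tasks."""
    utilization = edf_utilization(tasks)
    if utilization > 1:
        raise ValueError("periodic task set is not schedulable under EDF")
    return 1 - utilization


periodic = [(1, 4), (2, 6), (1, 8)]   # utilization 1/4 + 1/3 + 1/8 = 17/24
spare = spare_bandwidth(periodic)      # 7/24 remains for aperiodic requests
```

Using exact fractions avoids floating-point rounding at the boundary case where utilization is exactly 1.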

Journal ArticleDOI
01 Jan 1994
TL;DR: It is shown how generalized rate-monotonic scheduling theory can be applied in practical system development, where special attention must be given to facilitate concurrent development by geographically distributed programming teams and the reuse of existing hardware and software components.
Abstract: Real-time computing systems are used to control telecommunication systems, defense systems, avionics, and modern factories. Generalized rate-monotonic scheduling theory is a recent development that has had large impact on the development of real-time systems and open standards. In this paper we provide an up-to-date and self-contained review of generalized rate-monotonic scheduling theory. We show how this theory can be applied in practical system development, where special attention must be given to facilitate concurrent development by geographically distributed programming teams and the reuse of existing hardware and software components.
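
As a small illustration of the kind of test rate-monotonic theory provides, the sketch below applies the classical Liu and Layland utilization bound, under which n independent periodic tasks with deadlines equal to periods are schedulable by rate-monotonic priorities if total utilization is at most n(2^(1/n) - 1). This covers only the basic sufficient test, not the generalized theory the paper reviews; the task representation and function names are illustrative assumptions.

```python
def rm_bound(n):
    """Liu and Layland utilization bound for n periodic tasks."""
    return n * (2 ** (1 / n) - 1)


def rm_sufficient(tasks):
    """tasks is a list of (execution_time, period) pairs.

    Returns True when the utilization bound guarantees schedulability.
    A False result is inconclusive: the test is sufficient, not necessary.
    """
    utilization = sum(c / t for c, t in tasks)
    return utilization <= rm_bound(len(tasks))


guaranteed = rm_sufficient([(1, 4), (2, 6), (1, 8)])   # U ~ 0.708, bound ~ 0.780
inconclusive = rm_sufficient([(2, 4), (3, 6)])         # U = 1.0, bound ~ 0.828
```

The bound decreases toward ln 2 (about 0.693) as the number of tasks grows, so task sets that fail this test may still be schedulable and need an exact analysis.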

Journal ArticleDOI
TL;DR: The authors propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data and conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.
Abstract: Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempted to achieve the minimum completion time by distributing the workload as evenly as possible while minimizing the number of synchronization operations required. The authors consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to nonlocal data. They show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. They propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. They compare the performance of this new algorithm to other known algorithms by using five representative kernel programs on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, a Sequent Symmetry, and a KSR-1, and show that the new algorithm offers substantial performance improvements, up to a factor of 4 in some cases. The authors conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.

Patent
28 Oct 1994
TL;DR: A scheduler with admission control in a continuous media file server is presented in this article, which is based on a combination of rate-monotonic and weighted round-robin scheduling schemes.
Abstract: A scheduler with admissions control in a continuous media file server is presented. The scheduler supports multiple classes of tasks with diverse performance requirements. The scheduler is based on a combination of rate-monotonic and weighted round-robin scheduling schemes. Scheduling is accomplished in a hierarchical manner. Isochronous tasks have the highest priority and are scheduled first followed by real-time and general-purpose tasks. Isochronous tasks run periodically and are invoked by a timer interrupt set for each task. After scheduling the isochronous tasks, the scheduler alternates between the real-time tasks and the general-purpose tasks using a weighted round-robin scheme.
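
The weighted round-robin step described at the end of the abstract above can be sketched as follows. This is a minimal illustration of weighted round-robin in general, not the patented scheduler: it omits the isochronous tier and timer interrupts, and the queue names, weights, and task labels are hypothetical.

```python
from collections import deque


def weighted_round_robin(queues, weights):
    """Yield tasks, serving up to weights[name] tasks per queue in each round.

    queues maps a class name to an ordered list of tasks; a class without an
    explicit weight is served one task per round.
    """
    pending = {name: deque(tasks) for name, tasks in queues.items()}
    while any(pending.values()):
        for name, queue in pending.items():
            quota = weights.get(name, 1)
            for _ in range(min(quota, len(queue))):
                yield queue.popleft()


schedule = list(weighted_round_robin(
    {"real_time": ["r1", "r2", "r3"], "general": ["g1", "g2"]},
    {"real_time": 2, "general": 1},
))
# schedule == ["r1", "r2", "g1", "r3", "g2"]
```

With these weights the real-time class gets two turns for every one given to the general-purpose class, while neither class is starved.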

Journal ArticleDOI
Qin Zheng, K.G. Shin
TL;DR: The goal of this paper is to lay a mathematical basis for the problem of establishing real-time channels by deriving a necessary and sufficient condition for the schedulability of a set of channels over a link, and developing an efficient method for calculating the minimum delay bound over a link for each channel.
Abstract: There are numerous applications which require packets to be delivered within pre-specified delay bounds in point-to-point packet-switched networks. To meet this requirement, we define a real-time channel as a unidirectional connection between two nodes in such a network that guarantees every packet to be delivered before a user-defined, end-to-end deadline. The goal of this paper is to lay a mathematical basis for the problem of establishing real-time channels by (i) deriving a necessary and sufficient condition for the schedulability of a set of channels over a link, and (ii) developing an efficient method for calculating the minimum delay bound over a link for each channel. Given the traffic characteristics of a channel, our results can be used to check whether or not every packet will be delivered within a pre-specified delay bound. The results are also applicable to a wide variety of real-time task scheduling problems.

Journal ArticleDOI
Claude Le Pape
TL;DR: It is hoped, and expected, that object-oriented constraint programming tools like SCHEDULE will enable the industry to make decisive steps toward the implementation of 'state-of-the-art' highly flexible, constraint-based scheduling applications.
Abstract: It has been argued that the use of constraint-based techniques and tools enables the implementation of precise, flexible, efficient and extensible scheduling systems: precise and flexible, as the system can take into account any constraint expressible in the constraint language; efficient, inasmuch as highly optimised constraint propagation procedures are now available; extensible, as the consideration of a new type of constraint may require (especially in an object-oriented framework) only an extension to the constraint system or, in the worst case, the implementation of additional decision-making modules (without need for modification of the existing code). The paper presents ILOG SCHEDULE, a C++ library enabling the representation of a wide collection of scheduling constraints in terms of 'resources' and 'activities'. ILOG SCHEDULE is based on SOLVER, the generic software tool for object-oriented constraint programming from ILOG. SOLVER variables and constraints can be accessed from SCHEDULE activities and resources. As a result, SCHEDULE users can make use of SOLVER to represent specific constraints, and implement and combine the specific problem-solving strategies that are the most appropriate for the scheduling application under consideration. It is hoped, and expected, that object-oriented constraint programming tools like SCHEDULE will enable the industry to make decisive steps toward the implementation of 'state-of-the-art' highly flexible, constraint-based scheduling applications.

Patent
Elizabeth M Sisley, John Collins
25 Feb 1994
TL;DR: In this paper, a modified "best-first" search technique that combines optimization, artificial intelligence, and constraint processing to arrive at near-optimal assignment and scheduling solutions is presented.
Abstract: A system and method for assigning and scheduling resource requests to resource providers use a modified "best-first" search technique that combines optimization, artificial intelligence, and constraint-processing to arrive at near-optimal assignment and scheduling solutions. In response to changes in a dynamic resource environment, potential changes to an existing assignment set are evaluated in a search for a better solution. New calls are assigned and scheduled as they are received, and the assignment set is readjusted as the field service environment changes, resulting in global optimization. Each search operation is in response to either an incremental change to the assignment set such as adding a new resource request, removing a pending resource request, reassigning a pending resource request, or to a request for further evaluation. Thus, the search technique assumes that the existing assignment set is already optimized, and limits the task only to evaluating the effects of the incremental change. In addition, each search operation produces a complete assignment and scheduling solution. Consequently, the search can be terminated to accept the best solution generated so far, making the technique an "anytime" search.

Journal ArticleDOI
TL;DR: A new technique for obtaining upper and lower bounds on the performance of Markovian queueing networks and scheduling policies is introduced, and analytic bounds which improve upon Kingman's bound (1970) for $E_2/M/1$ queues are obtained.
Abstract: Except for the class of queueing networks and scheduling policies admitting a product form solution for the steady-state distribution, little is known about the performance of such systems. For example, if the priority of a part depends on its class (e.g., the buffer that the part is located in), then there are no existing results on performance, or even stability. In most applications such as manufacturing systems, however, one has to choose a control or scheduling policy, i.e., a priority discipline, that optimizes a performance objective. In this paper the authors introduce a new technique for obtaining upper and lower bounds on the performance of Markovian queueing networks and scheduling policies. Assuming stability, and examining the consequence of a steady state for general quadratic forms, the authors obtain a set of linear equality constraints on the mean values of certain random variables that determine the performance of the system. Further, the conservation of time and material gives an augmenting set of linear equality and inequality constraints. Together, these allow the authors to bound the performance, either above or below, by solving a linear program. The authors illustrate this technique on several typical problems of interest in manufacturing systems. For an open re-entrant line modeling a semiconductor plant, the authors plot a bound on the mean delay (called cycle-time) as a function of line loading. It is shown that the last buffer first serve policy is almost optimal in light traffic. For another such line, it is shown that it dominates the first buffer first serve policy. For a set of open queueing networks, the authors compare their lower bounds with those obtained by another method of Ou and Wein (1992). For a closed queueing network, the authors bracket the performance of all buffer priority policies, including the suggested priority policy of Harrison and Wein (1990). 
The authors also study the asymptotic heavy traffic limits of the lower and upper bounds. For a manufacturing system with machine failures, it is shown how the performance changes with failure and repair rates. For systems with finite buffers, the authors show how to bound the throughput. Finally, the authors illustrate the application of their method to GI/GI/1 queues. The authors obtain analytic bounds which improve upon Kingman's bound (1970) for $E_2/M/1$ queues.

Proceedings ArticleDOI
07 Dec 1994
TL;DR: The objective of this paper is to develop a scheme that allows for the dynamic scheduling and guaranteeing of distributed processes communicating via synchronous primitives, and a combination of off-line and on-line scheduling is performed.
Abstract: Many distributed real-time applications are structured as a set of processes communicating through synchronous channels. Unfortunately, process interactions, and especially synchronous communications, make the problem of predictably scheduling the tasks more complex. In distributed systems the local and remote tasks, as well as the messages over the network, must be properly scheduled and synchronized to meet the deadlines of the application. Finding such a schedule is not easy; in fact, the problem is NP-complete even if one has complete knowledge of the future arrival times for all the processes in the system. The objective of this paper is to develop a scheme that allows for the dynamic scheduling and guaranteeing of distributed processes communicating via synchronous primitives. For efficiency reasons a combination of off-line and on-line scheduling is performed. Precedence and communication constraints are converted off-line into pseudo-deadlines for each task, enabling efficient on-line processing. The on-line scheduling operates in parallel at the sites involved in the distributed computation, which further improves efficiency. The overall end-to-end scheduling includes the joint and coordinated scheduling of tasks and messages in a reflective memory distributed architecture.
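
The off-line conversion of precedence and communication constraints into pseudo-deadlines can be illustrated with a generic backward pass over the task graph. This is a sketch of a common textbook approach, not necessarily the scheme used in the paper. The function `pseudo_deadlines`, its parameters, and the fixed per-edge communication delay are assumptions made for illustration.

```python
from collections import defaultdict


def pseudo_deadlines(wcet, deadline, edges, comm_delay=None):
    """Tighten task deadlines so that successors can still meet theirs.

    wcet:       dict task -> worst-case execution time (assumed known)
    deadline:   dict task -> original deadline
    edges:      list of (predecessor, successor) precedence pairs
    comm_delay: optional dict (predecessor, successor) -> message delay
    """
    comm_delay = comm_delay or {}
    preds = defaultdict(list)
    remaining_succs = {task: 0 for task in wcet}
    for u, v in edges:
        preds[v].append(u)
        remaining_succs[u] += 1

    result = dict(deadline)
    # Start from tasks without successors and walk the graph backwards.
    ready = [task for task in wcet if remaining_succs[task] == 0]
    processed = 0
    while ready:
        v = ready.pop()
        processed += 1
        for u in preds[v]:
            # u must finish early enough for the message and for v to run.
            latest_finish = result[v] - wcet[v] - comm_delay.get((u, v), 0)
            result[u] = min(result[u], latest_finish)
            remaining_succs[u] -= 1
            if remaining_succs[u] == 0:
                ready.append(u)
    if processed != len(wcet):
        raise ValueError("precedence constraints contain a cycle")
    return result


if __name__ == "__main__":
    wcet = {"a": 2, "b": 3, "c": 1}
    deadline = {"a": 20, "b": 20, "c": 10}
    edges = [("a", "b"), ("b", "c")]
    print(pseudo_deadlines(wcet, deadline, edges, {("a", "b"): 1}))
```

With pseudo-deadlines computed this way, an on-line scheduler at each site could order tasks by deadline without re-examining the precedence graph at run time.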

Patent
31 Oct 1994
TL;DR: In this paper, a method and apparatus for scheduling the transmission of a number of data streams over a common communications link is presented, where each of the data streams conforms to a corresponding set of flow control parameters.
Abstract: A method and apparatus for scheduling the transmission of a number of data streams over a common communications link, where each of the data streams conforms to a corresponding set of flow control parameters. Each of the data streams to be transmitted on the communications link is stored in a corresponding queue. The status of each queue is maintained, and a target transmission time is calculated for each queue. Signals are then generated for each queue at a time at least after the target transmission time, and these signals are used to indicate to a corresponding queue that it can transmit a cell on the link. Upon reception of a corresponding signal, a queue then transmits at least one cell onto the communications link.
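
The queue-and-signal mechanism described above can be sketched as a slotted simulation in which each queue carries a target transmission time and is signalled no earlier than that time. This is an illustrative model only, not the patented apparatus. The function `schedule_cells`, the slot-based clock, and the reduction of the flow control parameters to a single minimum spacing per queue are all assumptions.

```python
import heapq
from collections import deque


def schedule_cells(queues, intervals, horizon):
    """Simulate one link shared by several queues.

    queues:    dict name -> list of cells waiting to be sent
    intervals: dict name -> minimum spacing between cells, in slots
               (a stand-in for the stream's flow control parameters)
    horizon:   number of link slots to simulate
    Returns a list of (slot, queue name, cell) in transmission order.
    """
    pending = {name: deque(cells) for name, cells in queues.items()}
    # Heap of (target transmission time, queue name) for non-empty queues.
    heap = [(0, name) for name in sorted(pending) if pending[name]]
    heapq.heapify(heap)

    sent = []
    for slot in range(horizon):
        # Signal a queue only once its target time has been reached.
        if heap and heap[0][0] <= slot:
            _, name = heapq.heappop(heap)
            sent.append((slot, name, pending[name].popleft()))
            if pending[name]:
                # Next target time keeps the stream's minimum spacing.
                heapq.heappush(heap, (slot + intervals[name], name))
    return sent


if __name__ == "__main__":
    print(schedule_cells({"a": [1, 2], "b": [10]}, {"a": 2, "b": 3}, 5))
```

Only one cell is sent per slot, so a queue whose target time has passed may still wait when another queue has an earlier target; this matches the "at least after the target transmission time" wording.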

Patent
04 May 1994
TL;DR: In a parallel data processing system, very long instruction words (VLIW) define operations able to be executed in parallel, and the VLIWs corresponding to plural threads of computation are made available to the processing system simultaneously.
Abstract: In a parallel data processing system, very long instruction words (VLIW) define operations able to be executed in parallel. The VLIWs corresponding to plural threads of computation are made available to the processing system simultaneously. Each processing unit pipeline includes a synchronizer stage for selecting one of the plural threads of computation for execution in that unit. The synchronizers allow the plural units to select operations from different thread instruction words such that execution of VLIWs is interleaved across the plural units. The processors are grouped in clusters of processors which share register files. Cluster outputs may be stored directly in register files of other clusters through a cluster switch.

Journal ArticleDOI
01 Nov 1994
TL;DR: This paper examines the effects of OS scheduling and page migration policies on the performance of compute servers for multiprogramming and parallel application workloads, and suggests that policies based only on TLB miss information can be quite effective, and useful for addressing the data distribution problems of space-sharing schedulers.
Abstract: Several cache-coherent shared-memory multiprocessors have been developed that are scalable and offer a very tight coupling between the processing resources. They are therefore quite attractive for use as compute servers for multiprogramming and parallel application workloads. Process scheduling and memory management, however, remain challenging due to the distributed main memory found on such machines. This paper examines the effects of OS scheduling and page migration policies on the performance of such compute servers. Our experiments are done on the Stanford DASH, a distributed-memory cache-coherent multiprocessor. We show that for our multiprogramming workloads consisting of sequential jobs, the traditional Unix scheduling policy does very poorly. In contrast, a policy incorporating cluster and cache affinity along with a simple page-migration algorithm offers up to two-fold performance improvement. For our workloads consisting of multiple parallel applications, we compare space-sharing policies that divide the processors among the applications to time-slicing policies such as standard Unix or gang scheduling. We show that space-sharing policies can achieve better processor utilization due to the operating point effect, but time-slicing policies benefit strongly from user-level data distribution. Our initial experience with automatic page migration suggests that policies based only on TLB miss information can be quite effective, and useful for addressing the data distribution problems of space-sharing schedulers.