
Showing papers on "Scheduling (computing) published in 2008"


Journal ArticleDOI
TL;DR: An extensive review of the scheduling literature on models with setup times (costs), covering more than 300 papers, is provided; it classifies scheduling problems into those with batching and non-batching considerations, and with sequence-independent and sequence-dependent setup times.

1,264 citations


Journal ArticleDOI
Onur Mutlu1, Thomas Moscibroda1
01 Jun 2008
TL;DR: A parallelism-aware batch scheduler that seamlessly incorporates support for system-level thread priorities and can provide different service levels, including purely opportunistic service, to threads with different priorities, and is also simpler to implement than STFM.
Abstract: In a chip-multiprocessor (CMP) system, the DRAM system is shared among cores. In a shared DRAM system, requests from a thread can not only delay requests from other threads by causing bank/bus/row-buffer conflicts but they can also destroy other threads' DRAM-bank-level parallelism. Requests whose latencies would otherwise have been overlapped could effectively become serialized. As a result both fairness and system throughput degrade, and some threads can starve for long time periods. This paper proposes a fundamentally new approach to designing a shared DRAM controller that provides quality of service to threads, while also improving system throughput. Our parallelism-aware batch scheduler (PAR-BS) design is based on two key ideas. First, PAR-BS processes DRAM requests in batches to provide fairness and to avoid starvation of requests. Second, to optimize system throughput, PAR-BS employs a parallelism-aware DRAM scheduling policy that aims to process requests from a thread in parallel in the DRAM banks, thereby reducing the memory-related stall-time experienced by the thread. PAR-BS seamlessly incorporates support for system-level thread priorities and can provide different service levels, including purely opportunistic service, to threads with different priorities. We evaluate the design trade-offs involved in PAR-BS and compare it to four previously proposed DRAM scheduler designs on 4-, 8-, and 16-core systems. Our evaluations show that, averaged over 100 4-core workloads, PAR-BS improves fairness by 1.11X and system throughput by 8.3% compared to the best previous scheduling technique, Stall-Time Fair Memory (STFM) scheduling. Based on simple request prioritization rules, PAR-BS is also simpler to implement than STFM.
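The two key ideas above (batching plus thread ranking) can be sketched in a few lines. This is an illustrative toy, not the paper's controller: it assumes a DRAM with independent banks and unit service time, and uses a simplified rank rule (threads with the smallest maximum per-bank load first) so that a thread's requests land in parallel across banks.

```python
# Toy sketch of parallelism-aware batch scheduling: batch all outstanding
# requests, rank threads, then serve each bank's queue in rank order.
from collections import defaultdict

def par_bs_order(requests):
    """requests: list of (thread_id, bank_id). Returns service order per bank."""
    # 1) Batch: all outstanding requests form one batch; new arrivals
    #    would wait for the next batch, which prevents starvation.
    load = defaultdict(lambda: defaultdict(int))  # thread -> bank -> count
    for t, b in requests:
        load[t][b] += 1
    # 2) Rank threads: smallest max-bank-load first, so "short" threads
    #    finish quickly and each thread hits its banks in parallel.
    rank = sorted(load, key=lambda t: (max(load[t].values()), t))
    order = defaultdict(list)  # bank -> thread ids in service order
    for t in rank:
        for b, n in sorted(load[t].items()):
            order[b].extend([t] * n)
    return dict(order)

reqs = [(1, 0), (1, 1), (2, 0), (2, 0), (2, 1), (0, 1)]
print(par_bs_order(reqs))  # each bank serves higher-ranked threads first
```

Within a batch no thread can be delayed indefinitely, and because a thread occupies the same rank position at every bank, its requests across banks overlap in time.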

575 citations


Patent
Hiroyuki Ishii1
26 Feb 2008
TL;DR: In this paper, a base station apparatus is provided for performing time and frequency scheduling in uplink packet access, with an interference amount measurement part configured to measure an uplink interference amount for each measurement unit, which comprises a predetermined period and a predetermined number of frequency blocks.
Abstract: The object is achieved by providing a base station apparatus for performing time and frequency scheduling in uplink packet access with: an interference amount measurement part configured to measure an uplink interference amount for each interference amount measurement unit which comprises a predetermined period and a predetermined number of frequency blocks; an interference amount determination part configured to determine whether the uplink interference amount satisfies a predetermined condition; and an overload indicator reporting part configured to report an overload indicator to a neighboring cell when the predetermined condition is satisfied.

498 citations


Journal ArticleDOI
01 Jun 2008
TL;DR: This work proposes a new, self-optimizing memory controller design that operates using the principles of reinforcement learning (RL), and shows that an RL-based memory controller improves the performance of a set of parallel applications run on a 4-core CMP by 19% on average and it improves DRAM bandwidth utilization by 22% compared to a state-of-the-art controller.
Abstract: Efficiently utilizing off-chip DRAM bandwidth is a critical issue in designing cost-effective, high-performance chip multiprocessors (CMPs). Conventional memory controllers deliver relatively low performance in part because they often employ fixed, rigid access scheduling policies designed for average-case application behavior. As a result, they cannot learn and optimize the long-term performance impact of their scheduling decisions, and cannot adapt their scheduling policies to dynamic workload behavior. We propose a new, self-optimizing memory controller design that operates using the principles of reinforcement learning (RL) to overcome these limitations. Our RL-based memory controller observes the system state and estimates the long-term performance impact of each action it can take. In this way, the controller learns to optimize its scheduling policy on the fly to maximize long-term performance. Our results show that an RL-based memory controller improves the performance of a set of parallel applications run on a 4-core CMP by 19% on average (up to 33%), and it improves DRAM bandwidth utilization by 22% compared to a state-of-the-art controller.
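The core mechanism (observe state, take an action, update a long-term value estimate) is ordinary tabular Q-learning. The sketch below abstracts away all DRAM specifics; the states, the two actions, and the toy reward (favoring row-hit-first scheduling in the "busy" state) are illustrative assumptions, not the paper's feature set.

```python
# Toy Q-learning loop illustrating the self-optimizing controller idea:
# the scheduler learns which action has the best long-term value per state.
import random

def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One temporal-difference update toward r + gamma * best next value."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

actions = ["row_hit_first", "oldest_first"]
Q = {(s, a): 0.0 for s in range(3) for a in actions}
random.seed(0)
for _ in range(500):
    s = random.randrange(3)
    a = random.choice(actions)
    # Pretend environment: row-hit scheduling pays off in the busy state 2.
    r = 1.0 if (s == 2 and a == "row_hit_first") else 0.1
    q_update(Q, s, a, r, random.randrange(3), actions)

# After training, the learned values prefer row-hit-first in state 2.
print(Q[(2, "row_hit_first")], Q[(2, "oldest_first")])
```

In the paper the reward is tied to data-bus utilization and the state is built from request-queue features; the update rule itself is the same shape as above.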

484 citations


Proceedings ArticleDOI
19 May 2008
TL;DR: A novel dynamic greedy algorithm for the formation of the clusters of cooperating BSs is presented and it is shown that a dynamic clustering approach with a cluster consisting of 2 cells outperforms static coordination schemes with much larger cluster sizes.
Abstract: Multi-cell cooperative processing (MCP) has recently attracted a lot of attention because of its potential for co-channel interference (CCI) mitigation and spectral efficiency increase. MCP inevitably requires increased signaling overhead and inter-base communication. Therefore in practice, only a limited number of base stations (BSs) can cooperate in order for the overhead to be affordable. The intrinsic problem of which BSs shall cooperate in a realistic scenario has been only partially investigated. In this contribution linear beamforming has been considered for the sum-rate maximisation of the uplink. A novel dynamic greedy algorithm for the formation of the clusters of cooperating BSs is presented for a cellular network incorporating MCP. This approach is chosen to be evaluated under a fair MS scheduling scenario (round-robin). The objective of the clustering algorithm is sum-rate maximisation of the already selected MSs. The proposed cooperation scheme is compared with some fixed cooperation clustering schemes. It is shown that a dynamic clustering approach with a cluster consisting of 2 cells outperforms static coordination schemes with much larger cluster sizes.

434 citations


Proceedings ArticleDOI
05 Mar 2008
TL;DR: This paper is the first to study the impact of the VMM scheduler on performance using multiple guest domains concurrently running different types of applications, and offers insight into the key problems in VMM scheduling for I/O and motivates future innovation in this area.
Abstract: This paper explores the relationship between domain scheduling in a virtual machine monitor (VMM) and I/O performance. Traditionally, VMM schedulers have focused on fairly sharing the processor resources among domains while leaving the scheduling of I/O resources as a secondary concern. However, this can result in poor and/or unpredictable application performance, making virtualization less desirable for applications that require efficient and consistent I/O behavior. This paper is the first to study the impact of the VMM scheduler on performance using multiple guest domains concurrently running different types of applications. In particular, different combinations of processor-intensive, bandwidth-intensive, and latency-sensitive applications are run concurrently to quantify the impacts of different scheduler configurations on processor and I/O performance. These applications are evaluated on 11 different scheduler configurations within the Xen VMM. These configurations include a variety of scheduler extensions aimed at improving I/O performance. This cross product of scheduler configurations and application types offers insight into the key problems in VMM scheduling for I/O and motivates future innovation in this area.

378 citations


Journal ArticleDOI
TL;DR: A method to dynamically schedule patients with different priorities to a diagnostic facility in a public health-care setting and the form of the optimal linear value function approximation and the resulting policy is presented.
Abstract: We present a method to dynamically schedule patients with different priorities to a diagnostic facility in a public health-care setting. Rather than maximizing revenue, the challenge facing the resource manager is to dynamically allocate available capacity to incoming demand to achieve wait-time targets in a cost-effective manner. We model the scheduling process as a Markov decision process. Because the state space is too large for a direct solution, we solve the equivalent linear program through approximate dynamic programming. For a broad range of cost parameter values, we present analytical results that give the form of the optimal linear value function approximation and the resulting policy. We investigate the practical implications and the quality of the policy through simulation.

361 citations


01 Jan 2008
TL;DR: The Time Synchronized Mesh Protocol (TSMP) enables reliable, low power, secure communication in a managed wireless mesh network and is a medium access and networking protocol designed for the recently ratified Wireless HART standard in industrial automation.
Abstract: The Time Synchronized Mesh Protocol (TSMP) enables reliable, low power, secure communication in a managed wireless mesh network. TSMP is a medium access and networking protocol designed for the recently ratified Wireless HART standard in industrial automation. TSMP benefits from synchronization of nodes in a multi-hop network to within a few hundred microseconds, allowing scheduling of collision-free pair-wise and broadcast communication to meet the traffic needs of all nodes while cycling through all available channels. Latency and reliability guarantees can be traded off for energy use, though our focus has been on providing high reliability (>99.9%) networks at the lowest power possible. TSMP has been demonstrated in multi-hop networks exceeding 250 nodes per access point, thousands of nodes with multiple access points, radio duty cycles of 0.01%, and with devices at radically different temperatures and traffic levels. With the 802.15.4 physical layer and 10 ms time slots, TSMP can theoretically achieve a secure payload throughput of 76 kbps at a single egress point.
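TSMP's collision-free schedule can be pictured as a slot/channel matrix in which each link gets a cell, no node is active twice in the same slot, and channels are cycled. The greedy assignment below is a minimal sketch under assumed parameters (frame length and channel count are illustrative, not the WirelessHART values), and it stores a channel offset rather than the hopped physical channel.

```python
# Sketch of TSMP-style time/channel cell assignment: each link gets a
# (timeslot, channel offset) such that no node transmits or receives
# twice in one slot and no cell is double-booked.
def schedule_links(links, n_slots, n_channels):
    """links: list of (tx, rx) pairs. Greedy collision-free assignment."""
    busy = [set() for _ in range(n_slots)]  # nodes already active per slot
    used = [0] * n_slots                    # channel offsets used per slot
    cells = {}
    for link in links:
        tx, rx = link
        for t in range(n_slots):
            if tx not in busy[t] and rx not in busy[t] and used[t] < n_channels:
                # Actual channel would hop each frame, e.g. from the slot
                # number and this offset; we keep just the offset here.
                cells[link] = (t, used[t])
                busy[t] |= {tx, rx}
                used[t] += 1
                break
    return cells

links = [("A", "B"), ("C", "D"), ("B", "C"), ("A", "D")]
print(schedule_links(links, n_slots=4, n_channels=2))
```

Note how ("A","B") and ("C","D") share slot 0 on different channels, while ("B","C") must move to slot 1 because B is already busy; this is the pairwise, frequency-diverse scheduling the protocol relies on.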

356 citations


Journal ArticleDOI
TL;DR: A new surgical case scheduling approach is proposed which uses a novel extension of the Job Shop scheduling problem called multi-mode blocking job shop (MMBJS) as a mixed integer linear programming (MILP) problem and the use of the MMBJS model for scheduling elective and add-on cases is discussed.

338 citations


Book
17 Jan 2008
TL;DR: This work studies how protocol design for various functionalities within a communication network architecture can be viewed as a distributed resource allocation problem, and shows how to incorporate stability into protocols, and thus, prevent undesirable network behavior.
Abstract: We study how protocol design for various functionalities within a communication network architecture can be viewed as a distributed resource allocation problem. This involves understanding what resources are, how to allocate them fairly, and perhaps most importantly, how to achieve this goal in a distributed and stable fashion. We start with ideas of a centralized optimization framework and show how congestion control, routing and scheduling in wired and wireless networks can be thought of as fair resource allocation. We then move to the study of controllers that allow a decentralized solution of this problem. These controllers are the analytical equivalent of protocols in use on the Internet today, and we describe existing protocols as realizations of such controllers. The Internet is a dynamic system with feedback delays and flows that arrive and depart, which means that stability of the system cannot be taken for granted. We show how to incorporate stability into protocols, and thus, prevent undesirable network behavior. Finally, we consider a futuristic scenario where users are aware of the effects of their actions and try to game the system. We will see that the optimization framework is remarkably robust even to such gaming.
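The centralized starting point described above (fair resource allocation as an optimization) is usually written as network utility maximization: maximize the sum of log utilities subject to capacity. The sketch below is an assumed toy instance, two flows on one unit-capacity link, solved by gradient ascent with a penalty "price" for overload; the step sizes and penalty weight are illustrative choices, not from the book.

```python
# Sketch of network utility maximization: maximize sum_i log(x_i)
# s.t. sum_i x_i <= capacity, via gradient ascent with a congestion price.
# With identical log utilities, the fair split gives each flow ~1/2.
def num_rates(n_flows=2, capacity=1.0, steps=5000, lr=1e-3):
    x = [0.1] * n_flows
    for _ in range(steps):
        # Congestion price: grows linearly once the link is overloaded.
        price = max(0.0, sum(x) - capacity) * 100.0
        for i in range(n_flows):
            x[i] += lr * (1.0 / x[i] - price)  # gradient of log(x) - price*x
            x[i] = max(x[i], 1e-6)             # keep rates positive
    return x

rates = num_rates()
print(rates)  # both flows converge near the proportionally fair 0.5
```

The decentralized protocols in the monograph correspond to sources reacting to such a price (congestion signal) computed implicitly by the network, rather than to a central solver.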

327 citations


Journal ArticleDOI
TL;DR: In this article, a mathematical programming model for the combined vehicle routing and scheduling problem with time windows and additional temporal constraints is presented, which allows for imposing pairwise synchronization and pairwise temporal precedence between customer visits, independently of the vehicles.

Journal ArticleDOI
TL;DR: In this paper a stochastic overbooking model is formulated and an appointment scheduling policy is developed for outpatient clinics that captures patient waiting time, staff overtime and patient revenue.
Abstract: In this paper a stochastic overbooking model is formulated and an appointment scheduling policy is developed for outpatient clinics. The schedule is constructed for a single service period partitioned into time slots of equal length. A clinic scheduler assigns patients to slots through a sequential patient call-in process where the scheduler must provide each calling patient with an appointment time before the patient's call terminates. Once an appointment is added to the schedule, it cannot be changed. Each calling patient has a no-show probability, and overbooking is used to compensate for patient no-shows. The scheduling objective captures patient waiting time, staff overtime and patient revenue. Conditions under which the objective evolution is unimodal are derived and the behavior of the scheduling policy is investigated under a variety of conditions. Practical observations on the performance of the policy are presented.
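The overbooking trade-off can be seen already in a single slot: booking more patients raises expected revenue but also the expected overflow when more than one shows up. The numbers below (no-show probability, revenue and overtime weights, one patient served per slot) are illustrative assumptions; the paper's model covers a full multi-slot schedule built during sequential call-ins.

```python
# Sketch of the single-slot overbooking objective: expected revenue from
# shows minus an overtime cost on patients beyond the one the slot serves.
from math import comb

def slot_objective(k, p_no_show, revenue=1.0, overtime_cost=1.5):
    """Expected objective when k patients are booked into one slot."""
    p_show = 1.0 - p_no_show
    exp_shows = k * p_show
    # E[max(N - 1, 0)] where N ~ Binomial(k, p_show): patients past the
    # first one showing up generate waiting/overtime.
    exp_overflow = sum(
        (n - 1) * comb(k, n) * p_show**n * p_no_show**(k - n)
        for n in range(2, k + 1)
    )
    return revenue * exp_shows - overtime_cost * exp_overflow

best_k = max(range(1, 6), key=lambda k: slot_objective(k, p_no_show=0.4))
print(best_k)  # with a 40% no-show rate, double-booking the slot wins
```

The unimodality conditions derived in the paper are what justify stopping at the first k where the objective turns down.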

Proceedings ArticleDOI
13 Apr 2008
TL;DR: This work uses the technique of Lyapunov Optimization to design an online flow control, scheduling and resource allocation algorithm that meets the desired objectives and provides explicit performance guarantees.
Abstract: We develop opportunistic scheduling policies for cognitive radio networks that maximize the throughput utility of the secondary (unlicensed) users subject to maximum collision constraints with the primary (licensed) users. We consider a cognitive network with static primary users and potentially mobile secondary users. We use the technique of Lyapunov Optimization to design an online flow control, scheduling and resource allocation algorithm that meets the desired objectives and provides explicit performance guarantees.

Proceedings ArticleDOI
13 Apr 2008
TL;DR: A number of new analytic results characterizing the performance limits of greedy maximal scheduling are provided, including an equivalent characterization of the efficiency ratio of GMS through a topological property called the local-pooling factor of the network graph.
Abstract: In this paper, we characterize the performance of an important class of scheduling schemes, called greedy maximal scheduling (GMS), for multi-hop wireless networks. While a lower bound on the throughput performance of GMS is relatively well-known in the simple node-exclusive interference model, it has not been thoroughly explored in the more general K-hop interference model. Moreover, empirical observations suggest that the known bounds are quite loose, and that the performance of GMS is often close to optimal. In this paper, we provide a number of new analytic results characterizing the performance limits of GMS. We first provide an equivalent characterization of the efficiency ratio of GMS through a topological property called the local-pooling factor of the network graph. We then develop an iterative procedure to estimate the local-pooling factor under a large class of network topologies and interference models. We use these results to study the worst-case efficiency ratio of GMS on two classes of network topologies. First, we show how these results can be applied to tree networks to prove that GMS achieves the full capacity region in tree networks under the K-hop interference model. Second, we show that the worst-case efficiency ratio of GMS in geometric network graphs is between 1/6 and 1/3.
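Greedy maximal scheduling itself is simple to state: repeatedly activate the link with the longest queue whose activation conflicts with nothing already scheduled. The sketch below assumes an explicit conflict graph (which is how a K-hop interference model would be encoded); the example instance is made up.

```python
# Sketch of greedy maximal scheduling (GMS) over a conflict graph.
def gms(queues, conflicts):
    """queues: {link: backlog}; conflicts: {link: set of conflicting links}.
    Returns a maximal set of non-conflicting backlogged links, chosen
    greedily in decreasing order of queue length."""
    schedule = set()
    for link in sorted(queues, key=queues.get, reverse=True):
        if queues[link] > 0 and not (conflicts.get(link, set()) & schedule):
            schedule.add(link)
    return schedule

queues = {"e1": 5, "e2": 4, "e3": 3, "e4": 2}
conflicts = {"e1": {"e2"}, "e2": {"e1", "e3"}, "e3": {"e2"}, "e4": set()}
print(sorted(gms(queues, conflicts)))  # e2 loses to its neighbor e1
```

The paper's local-pooling factor bounds how far such greedy schedules can fall short of the optimal (max-weight) schedule on a given topology; on trees the bound is 1, i.e. no loss.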

Proceedings ArticleDOI
23 Jun 2008
TL;DR: A scheduling approach in which users request resource leases, where leases can request either as-soon-as-possible ("best-effort") or reservation start times, is described, and a VM-based approach can provide better performance than a scheduler that does not support task pre-emption.
Abstract: As cluster computers are used for a wider range of applications, we encounter the need to deliver resources at particular times, to meet particular deadlines, and/or at the same time as other resources are provided elsewhere. To address such requirements, we describe a scheduling approach in which users request resource leases, where leases can request either as-soon-as-possible ("best-effort") or reservation start times. We present the design of a lease management architecture, Haizea, that implements leases as virtual machines (VMs), leveraging their ability to suspend, migrate, and resume computations and to provide leased resources with customized application environments. We discuss methods to minimize the overhead introduced by having to deploy VM images before the start of a lease. We also present the results of simulation studies that compare alternative approaches. Using workloads with various mixes of best-effort and advance reservation requests, we compare the performance of our VM-based approach with that of non-VM-based schedulers. We find that a VM-based approach can provide better performance (measured in terms of both total execution time and average delay incurred by best-effort requests) than a scheduler that does not support task pre-emption, and only slightly worse performance than a scheduler that does support task pre-emption. We also compare the impact of different VM image popularity distributions and VM image caching strategies on performance. These results emphasize the importance of VM image caching for the workloads studied and quantify the sensitivity of scheduling performance to VM image popularity distribution.

Proceedings ArticleDOI
31 Oct 2008
TL;DR: A programming model for those environments based on automatic function level parallelism that strives to be easy, flexible, portable, and performant is presented and it is demonstrated that it offers reasonable performance without tuning, and that it can rival highly tuned libraries with minimal tuning effort.
Abstract: Parallel programming on SMP and multi-core architectures is hard. In this paper we present a programming model for those environments based on automatic function level parallelism that strives to be easy, flexible, portable, and performant. Its main trait is its ability to exploit task level parallelism by analyzing task dependencies at run time. We present the programming environment in the context of algorithms from several domains and pinpoint its benefits compared to other approaches. We discuss its execution model and its scheduler. Finally we analyze its performance and demonstrate that it offers reasonable performance without tuning, and that it can rival highly tuned libraries with minimal tuning effort.

Journal ArticleDOI
TL;DR: It is shown that a simple distributed scheduling strategy, maximal scheduling, attains a guaranteed fraction of the maximum throughput region in arbitrary wireless networks, which can be generalized to end-to-end multihop sessions.
Abstract: The question of providing throughput guarantees through distributed scheduling, which has remained an open problem for some time, is addressed in this paper. It is shown that a simple distributed scheduling strategy, maximal scheduling, attains a guaranteed fraction of the maximum throughput region in arbitrary wireless networks. The guaranteed fraction depends on the ldquointerference degreerdquo of the network, which is the maximum number of transmitter-receiver pairs that interfere with any given transmitter-receiver pair in the network and do not interfere with each other. Depending on the nature of communication, the transmission powers and the propagation models, the guaranteed fraction can be lower-bounded by the maximum link degrees in the underlying topology, or even by constants that are independent of the topology. The guarantees are tight in that they cannot be improved any further with maximal scheduling. The results can be generalized to end-to-end multihop sessions. Finally, enhancements to maximal scheduling that can guarantee fairness of rate allocation among different sessions, are discussed.

Journal ArticleDOI
TL;DR: This paper defines and emphasize the importance, applications, and benefits of explicitly considering setup times/costs in scheduling research, and a review of the latest research on scheduling problems with setup costs is provided.

Journal ArticleDOI
TL;DR: This paper introduces the periodic resource model to characterize resource allocations provided to a single component and presents exact schedulability conditions for the standard Liu and Layland periodic task model and the proposed periodic resource models under EDF and RM scheduling.
Abstract: It is desirable to develop large complex systems using components based on systematic abstraction and composition. Our goal is to develop a compositional real-time scheduling framework to support abstraction and composition techniques for real-time aspects of components. In this paper, we present a formal description of compositional real-time scheduling problems, which are the component abstraction and composition problems. We identify issues that need be addressed by solutions and provide our framework for the solutions, which is based on the periodic interface. Specifically, we introduce the periodic resource model to characterize resource allocations provided to a single component. We present exact schedulability conditions for the standard Liu and Layland periodic task model and the proposed periodic resource model under EDF and RM scheduling, and we show that the component abstraction and composition problems can be addressed with periodic interfaces through the exact schedulability conditions. We also provide the utilization bounds of a periodic task set over the periodic resource model and the abstraction bounds of periodic interfaces for a periodic task set under EDF and RM scheduling. We finally present the analytical bounds of overheads that our solution incurs in terms of resource utilization increase and evaluate the overheads through simulations.
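For the standard Liu and Layland task model on a dedicated resource, the classical utilization tests are one-liners: EDF is schedulable iff total utilization is at most 1, and RM is schedulable (sufficient condition only) if utilization is at most n(2^(1/n) - 1). The sketch below shows these baseline tests; the paper's contribution, extending them to a partial periodic resource supply (Theta, Pi), is not reproduced here.

```python
# Classical utilization-based schedulability tests for periodic tasks.
def utilization(tasks):
    """tasks: list of (execution_time, period) pairs."""
    return sum(c / t for c, t in tasks)

def edf_schedulable(tasks):
    # Exact for EDF on a dedicated resource with deadlines = periods.
    return utilization(tasks) <= 1.0

def rm_schedulable_llbound(tasks):
    # Liu & Layland bound: sufficient (not necessary) for rate monotonic.
    n = len(tasks)
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)

tasks = [(1, 4), (2, 6), (1, 8)]  # U ~= 0.708, under the n=3 bound ~0.780
print(edf_schedulable(tasks), rm_schedulable_llbound(tasks))
```

The compositional framework replaces the full-resource assumption in these tests with the worst-case supply of a periodic resource, which is what makes component interfaces composable.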

Journal ArticleDOI
Robert Knauerhase1, Paul Brett1, B. Hohlt1, Tong Li1, Scott D. Hahn1 
TL;DR: It is shown that the OS can use data obtained from dynamic runtime observation of task behavior to ameliorate performance variability and more effectively exploit multicore processor resources.
Abstract: Today's operating systems don't adequately handle the complexities of multicore processors. Architectural features confound existing OS techniques for task scheduling, load balancing, and power management. This article shows that the OS can use data obtained from dynamic runtime observation of task behavior to ameliorate performance variability and more effectively exploit multicore processor resources. The authors' research prototypes demonstrate the utility of observation-based policy.

Proceedings ArticleDOI
13 Apr 2008
TL;DR: These algorithms are the first approximation algorithms in the literature with a tight worst-case guarantee for the NP-hard problem and can obtain an aggregate throughput which can be as much as 2.3 times more than that of the max-min fair allocation in 802.11b.
Abstract: In multi-rate wireless LANs, throughput-based fair bandwidth allocation can lead to drastically reduced aggregate throughput. To balance aggregate throughput while serving users in a fair manner, proportional fair or time-based fair scheduling has been proposed to apply at each access point (AP). However, since a realistic deployment of wireless LANs can consist of a network of APs, this paper considers proportional fairness in this much wider setting. Our technique is to intelligently associate users with APs to achieve optimal proportional fairness in a network of APs. We propose two approximation algorithms for periodical offline optimization. Our algorithms are the first approximation algorithms in the literature with a tight worst-case guarantee for the NP-hard problem. Our simulation results demonstrate that our algorithms can obtain an aggregate throughput which can be as much as 2.3 times more than that of the max-min fair allocation in 802.11b. While maintaining aggregate throughput, our approximation algorithms outperform the default user-AP association method in the 802.11b standard significantly in terms of fairness.
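With time-based fairness at each AP, a user with PHY rate r sharing an AP with m users gets throughput r/m, and proportional fairness maximizes the sum of log throughputs over all user-to-AP associations. The brute-force sketch below makes that objective concrete on a made-up three-user, two-AP instance; the paper's contribution is approximation algorithms for the NP-hard general case, which this exhaustive search does not scale to.

```python
# Sketch of proportional-fair user-AP association by exhaustive search.
from itertools import product
from math import log

def pf_best_association(rates):
    """rates[u][a] = PHY rate of user u at AP a. Returns the association
    (one AP index per user) maximizing sum of log throughputs under
    time-based fair sharing at each AP."""
    n_users, n_aps = len(rates), len(rates[0])
    best, best_assoc = float("-inf"), None
    for assoc in product(range(n_aps), repeat=n_users):
        share = [assoc.count(a) for a in range(n_aps)]  # users per AP
        obj = sum(log(rates[u][assoc[u]] / share[assoc[u]])
                  for u in range(n_users))
        if obj > best:
            best, best_assoc = obj, assoc
    return best_assoc

# Illustrative 802.11b-style rates (Mb/s): user 2 is only close to AP 1.
rates = [[11.0, 1.0], [11.0, 5.5], [1.0, 11.0]]
print(pf_best_association(rates))  # fast users share AP 0, user 2 gets AP 1
```

Note the PF optimum keeps the slow-at-AP-0 user off AP 0 entirely, which is exactly the effect (avoiding rate anomaly) that throughput-based fairness destroys.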

Journal ArticleDOI
TL;DR: This work compile and classify the research work conducted for Ethernet passive optical networks, and examines PON architectures and dynamic bandwidth allocation algorithms, and further examines the topics of QoS support, as well as fair bandwidth allocation.
Abstract: We compile and classify the research work conducted for Ethernet passive optical networks. We examine PON architectures and dynamic bandwidth allocation algorithms. Our classifications provide meaningful and insightful presentations of the prior work on EPONs. The main branches of our classification of DBA are: grant sizing, grant scheduling, and optical network unit queue scheduling. We further examine the topics of QoS support, as well as fair bandwidth allocation. The presentation allows those interested in advancing EPON research to quickly understand what already was investigated and what requires further investigation. We summarize results where possible and explicitly point to future avenues of research.

Journal ArticleDOI
TL;DR: The LDCP algorithm provides a practical solution for scheduling parallel applications with high communication costs in HeDCSs and outperforms the HEFT and DLS algorithms in terms of schedule length and speedup.

Journal ArticleDOI
TL;DR: This paper introduces multiple algorithms to include time buffers in a given schedule while a predefined project due date remains respected and multiple efficient heuristic and meta-heuristic procedures are proposed to allocate buffers throughout the schedule.

Patent
21 Oct 2008
TL;DR: In this article, a method for optimizing individual and/or group task scheduling and time management is described, where a scheduling application may receive a list of tasks from a user, and apply a scheduling algorithm to organize the tasks into an ordered list.
Abstract: A method for optimizing individual and/or group task scheduling and time management is disclosed. A scheduling application may receive a list of tasks from a user, and apply a scheduling algorithm to organize the tasks into an ordered list. Once an ordered list of tasks has been created, the scheduling application may then fit the tasks from the task list into generated time bins.
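The flow the patent describes (order the task list, then fit tasks into generated time bins) can be sketched as a greedy first-fit pass. The ordering rule (earliest deadline first) and the fixed bin size below are illustrative assumptions; the patent does not commit to a specific algorithm here.

```python
# Sketch of ordering tasks and first-fitting them into time bins.
def fit_tasks(tasks, bin_capacity):
    """tasks: list of (name, duration, deadline). Returns bins as lists
    of task names, filled first-fit in deadline order."""
    bins = []  # each bin: [remaining_capacity, [task names]]
    for name, dur, _ in sorted(tasks, key=lambda t: t[2]):  # by deadline
        for b in bins:
            if b[0] >= dur:          # task fits in an existing bin
                b[0] -= dur
                b[1].append(name)
                break
        else:                        # no bin had room: open a new one
            bins.append([bin_capacity - dur, [name]])
    return [names for _, names in bins]

tasks = [("report", 2, 5), ("email", 1, 1), ("review", 3, 3), ("call", 1, 2)]
print(fit_tasks(tasks, bin_capacity=4))
```

First-fit in deadline order is a heuristic, not optimal bin packing, but it matches the sequential "fit the ordered list into bins" structure the abstract outlines.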

Journal ArticleDOI
TL;DR: In this paper, a Petri net (PN) model is developed for the system, which describes when the robot should wait and a robot wait is modeled as an event in an explicit way.
Abstract: With wafer residency time constraints for some wafer fabrication processes, such as low pressure chemical-vapor deposition, the schedulability and scheduling problems are still open. This paper aims to solve both problems. A Petri net (PN) model is developed for the system. This model describes when the robot should wait and a robot wait is modeled as an event in an explicit way. Thus, to schedule a single-arm cluster tool with wafer residency time constraint is to decide how long a robot wait should be. Based on this model, for the first time, we present the necessary and sufficient conditions under which a single-arm cluster tool with residency time constraints is schedulable, which can be checked analytically. Meanwhile, a closed form scheduling algorithm is developed to find an optimal periodic schedule if it is schedulable. Also, a simple method is presented for the implementation of the periodic schedule for steady state, which is not seen in any previous work.

Proceedings ArticleDOI
25 Oct 2008
TL;DR: Experiments on a wide variety of compute-intensive loops from the multimedia domain show that EMS improves throughput by 25% over traditional iterative modulo scheduling, and achieves 98% of the throughput of simulated annealing techniques at a fraction of the compilation time.
Abstract: Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs consist of an array of function units and register files often organized as a two dimensional grid. The most difficult challenge in deploying CGRAs is compiler scheduling technology that can efficiently map software implementations of compute intensive loops onto the array. Traditional schedulers focus on the placement of operations in time and space. With CGRAs, the challenge of placement is compounded by the need to explicitly route operands from producers to consumers. To systematically attack this problem, we take an edge-centric approach to modulo scheduling that focuses on the routing problem as its primary objective. With edge-centric modulo scheduling (EMS), placement is a by-product of the routing process, and the schedule is developed by routing each edge in the dataflow graph. Routing cost metrics provide the scheduler with a global perspective to guide selection. Experiments on a wide variety of compute-intensive loops from the multimedia domain show that EMS improves throughput by 25% over traditional iterative modulo scheduling, and achieves 98% of the throughput of simulated annealing techniques at a fraction of the compilation time.

Proceedings ArticleDOI
13 Apr 2008
TL;DR: This paper investigates how to design distributed algorithm for a future multi-hop CR network, with the objective of maximizing data rates for a set of user communication sessions via a cross-layer optimization approach.
Abstract: Cognitive radio (CR) is a revolution in radio technology and is viewed as an enabling technology for dynamic spectrum access. This paper investigates how to design distributed algorithm for a future multi-hop CR network, with the objective of maximizing data rates for a set of user communication sessions. We study this problem via a cross-layer optimization approach, with joint consideration of power control, scheduling, and routing. The main contribution of this paper is the development of a distributed optimization algorithm that iteratively increases data rates for user communication sessions. During each iteration, there are two separate processes, a Conservative Iterative Process (CIP) and an Aggressive Iterative Process (AIP). For both CIP and AIP, we describe our design of routing, minimalist scheduling, and power control/scheduling modules. To evaluate the performance of the distributed optimization algorithm, we compare it to an upper bound of the objective function, since the exact optimal solution to the objective function cannot be obtained via its mixed integer nonlinear programming (MINLP) formulation. Since the achievable performance via our distributed algorithm is close to the upper bound and the optimal solution (unknown) lies between the upper bound and the feasible solution obtained by our distributed algorithm, we conclude that the results obtained by our distributed algorithm are very close to the optimal solution.

Proceedings ArticleDOI
23 Jun 2008
TL;DR: Harmony, a runtime supported programming and execution model that provides semantics for simplifying parallelism management, dynamic scheduling of compute intensive kernels to heterogeneous processor resources, and online monitoring driven performance optimization for heterogeneous many core systems is proposed.
Abstract: The emergence of heterogeneous many core architectures presents a unique opportunity for delivering order of magnitude performance increases to high performance applications by matching certain classes of algorithms to specifically tailored architectures. Their ubiquitous adoption, however, has been limited by a lack of programming models and management frameworks designed to reduce the high degree of complexity of software development intrinsic to heterogeneous architectures. This paper proposes Harmony, a runtime supported programming and execution model that provides: (1) semantics for simplifying parallelism management, (2) dynamic scheduling of compute intensive kernels to heterogeneous processor resources, and (3) online monitoring driven performance optimization for heterogeneous many core systems. We are particularly concerned with simplifying development and ensuring binary portability and scalability across system configurations and sizes. Initial results from ongoing development demonstrate binary compatibility with a variable number of cores, as well as dynamic adaptation of schedules to data sets. We present preliminary results of key features for some benchmark applications.

Proceedings ArticleDOI
05 Nov 2008
TL;DR: This work performs extensive modeling and experimentation on two 20-node TelosB motes testbeds to compare a suite of interference models for their modeling accuracies and shows via solving the one shot scheduling problem, that the graded version can improve `expected throughput' over the thresholded version by scheduling imperfect links.
Abstract: Accurate interference models are important for use in transmission scheduling algorithms in wireless networks. In this work, we perform extensive modeling and experimentation on two 20-node TelosB mote testbeds -- one indoor and the other outdoor -- to compare a suite of interference models for their modeling accuracies. We first empirically build and validate the physical interference model via a packet reception rate vs. SINR relationship using a measurement-driven method. We then similarly instantiate other, simpler models, such as hop-based, range-based, protocol model, etc. The modeling accuracies are then evaluated on the two testbeds using transmission scheduling experiments. We observe that while the physical interference model is the most accurate, it is still far from perfect, with a 90th-percentile error of about 20-25% (and an 80th-percentile error of 7-12%), depending on the scenario. The accuracy of the other models is worse and scenario-specific. The second-best model trails the physical model by roughly 12-18 percentile points for similar accuracy targets. A somewhat similar throughput performance differential between models is also observed when they are used with greedy scheduling algorithms. Carrying on further, we look closely into the two incarnations of the physical model -- 'thresholded' (conservative, but typically considered in the literature) and 'graded' (more realistic). We show, via solving the one-shot scheduling problem, that the graded version can improve expected throughput over the thresholded version by scheduling imperfect links.
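The distinction between the two physical-model variants is easy to make concrete: both start from the same SINR computation, but the thresholded model maps it to a 0/1 link decision while the graded model maps it to a packet reception rate. The sigmoid PRR curve and all parameter values below are illustrative; the paper fits its curves from testbed measurements.

```python
# Sketch of the thresholded vs. graded physical interference models.
from math import exp, log10

def sinr_db(signal, interference, noise=1e-9):
    """Signal-to-interference-plus-noise ratio in dB (powers in watts)."""
    return 10 * log10(signal / (noise + sum(interference)))

def thresholded(sinr, threshold_db=10.0):
    # Link either works perfectly or not at all.
    return 1.0 if sinr >= threshold_db else 0.0

def graded(sinr, midpoint_db=10.0, steepness=1.0):
    # Packet reception rate rises smoothly with SINR (assumed sigmoid).
    return 1.0 / (1.0 + exp(-steepness * (sinr - midpoint_db)))

s = sinr_db(signal=1e-6, interference=[5e-8, 3e-8])
print(s, thresholded(s), graded(s))
```

A link sitting just above the threshold counts as perfect under the thresholded model but only ~70% reliable under the graded one; scheduling such imperfect links anyway is where the graded model's throughput gain comes from.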