
Showing papers on "Scheduling (computing)" published in 2013


Journal ArticleDOI
TL;DR: An overview of the key issues that arise in the design of a resource allocation algorithm for LTE networks is provided, intended for a wide range of readers as it covers the topic from basics to advanced aspects.
Abstract: Future generation cellular networks are expected to provide ubiquitous broadband access to a continuously growing number of mobile users. In this context, LTE systems represent an important milestone towards the so-called 4G cellular networks. A key feature of LTE is the adoption of advanced Radio Resource Management procedures in order to increase the system performance up to the Shannon limit. Packet scheduling mechanisms, in particular, play a fundamental role, because they are responsible for choosing, with fine time and frequency resolutions, how to distribute radio resources among different stations, taking into account channel condition and QoS requirements. This goal should be accomplished while providing an optimal trade-off between spectral efficiency and fairness. In this context, this paper provides an overview of the key issues that arise in the design of a resource allocation algorithm for LTE networks. It is intended for a wide range of readers as it covers the topic from basics to advanced aspects. The downlink channel under frequency division duplex configuration is considered as the object of our study, but most of the considerations are valid for other configurations as well. Moreover, a survey on the most recent techniques is reported, including a classification of the different approaches presented in the literature. Performance comparisons of the most well-known schemes, with particular focus on QoS provisioning capabilities, are also provided to complement the described concepts. Thus, this survey should be useful for readers interested in learning the basic concepts before going into the details of a particular scheduling strategy, as well as for researchers aiming to explore specific aspects in greater depth.

817 citations
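A canonical illustration of the spectral-efficiency/fairness trade-off this survey discusses is the proportional fair (PF) metric, one of the classic schemes such comparisons cover. The sketch below is a generic PF scheduler, not an algorithm from the paper; the EWMA weight beta and the per-TTI rate inputs are illustrative assumptions.

```python
# Generic proportional-fair (PF) downlink scheduler sketch (illustrative,
# not taken from the survey). Each TTI the resource goes to the user with
# the highest ratio of instantaneous achievable rate to average throughput.
def pf_schedule(inst_rates, avg_thr, beta=0.05):
    # inst_rates: user -> achievable rate this TTI (channel-dependent)
    # avg_thr:    user -> exponentially averaged past throughput
    chosen = max(inst_rates, key=lambda u: inst_rates[u] / max(avg_thr[u], 1e-9))
    for u in avg_thr:                      # EWMA update for every user
        served = inst_rates[u] if u == chosen else 0.0
        avg_thr[u] = (1 - beta) * avg_thr[u] + beta * served
    return chosen
```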


Proceedings ArticleDOI
27 Aug 2013
TL;DR: In this paper, a minimalistic datacenter transport design that provides near theoretically optimal flow completion times even at the 99th percentile for short flows, while still minimizing average flow completion time for long flows is presented.
Abstract: In this paper we present pFabric, a minimalistic datacenter transport design that provides near theoretically optimal flow completion times even at the 99th percentile for short flows, while still minimizing average flow completion time for long flows. Moreover, pFabric delivers this performance with a very simple design that is based on a key conceptual insight: datacenter transport should decouple flow scheduling from rate control. For flow scheduling, packets carry a single priority number set independently by each flow; switches have very small buffers and implement a very simple priority-based scheduling/dropping mechanism. Rate control is also correspondingly simpler; flows start at line rate and throttle back only under high and persistent packet loss. We provide theoretical intuition and show via extensive simulations that the combination of these two simple mechanisms is sufficient to provide near-optimal performance.

765 citations
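The switch behavior the abstract describes (tiny buffers, priority-based dequeue and drop) is simple enough to sketch. The toy model below is our reading of that mechanism, with an illustrative buffer size; in pFabric a lower priority number (e.g. remaining flow size) is better.

```python
class PFabricPort:
    """Toy model of a pFabric-style switch port: very small buffer, dequeue
    the packet with the best (lowest) priority number, drop the worst one
    on overflow. The buffer size here is an illustrative guess."""
    def __init__(self, capacity=24):
        self.capacity = capacity
        self.buf = []                        # (priority, arrival_seq, payload)
        self.seq = 0
    def enqueue(self, priority, payload):
        self.seq += 1
        self.buf.append((priority, self.seq, payload))
        if len(self.buf) > self.capacity:    # overflow: drop lowest priority
            self.buf.remove(max(self.buf))
    def dequeue(self):
        if not self.buf:
            return None
        best = min(self.buf)                 # smallest priority number first
        self.buf.remove(best)
        return best[2]
```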


Journal ArticleDOI
TL;DR: A theoretical framework of energy-optimal mobile cloud computing under a stochastic wireless channel is provided, and numerical results suggest that a significant amount of energy can be saved for the mobile device by optimally offloading mobile applications to the cloud in some cases.
Abstract: This paper provides a theoretical framework of energy-optimal mobile cloud computing under a stochastic wireless channel. Our objective is to conserve energy for the mobile device, by optimally executing mobile applications in the mobile device (i.e., mobile execution) or offloading them to the cloud (i.e., cloud execution). In the former case, one can sequentially reconfigure the CPU frequency; in the latter case, one can dynamically vary the data transmission rate to the cloud, in response to the stochastic channel condition. We formulate both scheduling problems as constrained optimization problems, and obtain closed-form solutions for optimal scheduling policies. Furthermore, for the energy-optimal execution strategy of applications with small output data (e.g., CloudAV), we derive a threshold policy, which states that the data consumption rate, defined as the ratio between the data size (L) and the delay constraint (T), is compared to a threshold which depends on both the energy consumption model and the wireless channel model. Finally, numerical results suggest that a significant amount of energy can be saved for the mobile device by optimally offloading mobile applications to the cloud in some cases. Our theoretical framework and numerical investigations will shed light on system implementation of mobile cloud computing under stochastic wireless channels.

754 citations
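The threshold policy lends itself to a compact sketch. The decision rule below is our paraphrase: the direction (offload when L/T is small, since little data relative to the deadline is cheap to transmit) is an assumption consistent with the abstract's intuition, and threshold_fn stands in for the model-dependent threshold the paper derives.

```python
def choose_execution(L_bits, T_seconds, threshold_fn):
    """Threshold-style offloading decision (sketch). The paper derives the
    threshold from the energy-consumption and wireless-channel models;
    threshold_fn is a placeholder for that computation. We assume cloud
    execution wins when the data consumption rate L/T is below it."""
    data_consumption_rate = L_bits / T_seconds
    return "cloud" if data_consumption_rate <= threshold_fn() else "mobile"
```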


Proceedings ArticleDOI
16 Mar 2013
TL;DR: Paragon is an online and scalable DC scheduler that is heterogeneity- and interference-aware; it is derived from robust analytical methods and uses collaborative filtering techniques to quickly and accurately classify an unknown, incoming workload by identifying similarities to previously scheduled applications.
Abstract: Large-scale datacenters (DCs) host tens of thousands of diverse applications each day. However, interference between colocated workloads and the difficulty of matching applications to one of the many hardware platforms available can degrade performance, violating the quality of service (QoS) guarantees that many cloud workloads require. While previous work has identified the impact of heterogeneity and interference, existing solutions are computationally intensive, cannot be applied online and do not scale beyond a few applications. We present Paragon, an online and scalable DC scheduler that is heterogeneity- and interference-aware. Paragon is derived from robust analytical methods and, instead of profiling each application in detail, it leverages information the system already has about applications it has previously seen. It uses collaborative filtering techniques to quickly and accurately classify an unknown, incoming workload with respect to heterogeneity and interference in multiple shared resources, by identifying similarities to previously scheduled applications. The classification allows Paragon to greedily schedule applications in a manner that minimizes interference and maximizes server utilization. Paragon scales to tens of thousands of servers with marginal scheduling overheads in terms of time or state. We evaluate Paragon with a wide range of workload scenarios, on both small and large-scale systems, including 1,000 servers on EC2. For a 2,500-workload scenario, Paragon enforces performance guarantees for 91% of applications, while significantly improving utilization. In comparison, heterogeneity-oblivious, interference-oblivious and least-loaded schedulers only provide similar guarantees for 14%, 11% and 3% of workloads. The differences are more striking in oversubscribed scenarios where resource efficiency is more critical.

709 citations
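Collaborative filtering here works like a recommender system: a sparse applications-by-configurations score matrix is completed from a low-rank model. The sketch below uses a plain truncated SVD on a mean-filled matrix as a stand-in; Paragon's actual pipeline (with separate heterogeneity and interference classifications) is more involved.

```python
import numpy as np

def complete_scores(utility, k=2):
    """Collaborative-filtering sketch in the spirit of Paragon: rows are
    applications, columns are hardware/interference configurations, and
    entries are performance scores (np.nan where never profiled). A rank-k
    SVD of the mean-filled matrix predicts the missing scores for a new
    application profiled on only a couple of configurations."""
    col_means = np.nanmean(utility, axis=0)
    filled = np.where(np.isnan(utility), col_means, utility)
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]        # low-rank reconstruction
    return np.where(np.isnan(utility), approx, utility)
```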


Proceedings ArticleDOI
03 Nov 2013
TL;DR: It is demonstrated that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design.
Abstract: Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for task schedulers, which will need to schedule millions of tasks per second on appropriate machines while offering millisecond-level latency and high availability. We demonstrate that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design. We implement and deploy our scheduler, Sparrow, on a 110-machine cluster and demonstrate that Sparrow performs within 12% of an ideal scheduler.

597 citations
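The core of Sparrow's approach is batch sampling: for an m-task job, probe a small multiple of m random workers and place tasks on the least-loaded probes (the real system refines this with late binding, where probed workers pull tasks only when they become free). A minimal sketch, with queue lengths standing in for probe replies:

```python
import random

def batch_sample_place(queue_len, num_tasks, probe_ratio=2):
    """Sparrow-style batch sampling sketch. queue_len: machine -> queued
    tasks; requires probe_ratio * num_tasks <= number of machines."""
    probes = random.sample(list(queue_len), probe_ratio * num_tasks)
    probes.sort(key=lambda m: queue_len[m])      # shortest queues first
    placement = probes[:num_tasks]
    for m in placement:                          # tasks join those queues
        queue_len[m] += 1
    return placement
```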


Journal ArticleDOI
01 May 2013
TL;DR: An algorithm named honey bee behavior inspired load balancing (HBB-LB) is proposed, which aims to achieve a well-balanced load across virtual machines for maximizing the throughput, and is compared with existing load balancing and scheduling algorithms.
Abstract: Scheduling of tasks in cloud computing is an NP-hard optimization problem. Load balancing of non-preemptive independent tasks on virtual machines (VMs) is an important aspect of task scheduling in clouds. Whenever certain VMs are overloaded and the remaining VMs are underloaded with tasks for processing, the load has to be balanced to achieve optimal machine utilization. In this paper, we propose an algorithm named honey bee behavior inspired load balancing (HBB-LB), which aims to achieve a well-balanced load across virtual machines for maximizing the throughput. The proposed algorithm also balances the priorities of tasks on the machines in such a way that the waiting time of the tasks in the queue is minimal. We have compared the proposed algorithm with existing load balancing and scheduling algorithms. The experimental results show that the algorithm is effective when compared with existing algorithms. Our approach illustrates that there is a significant improvement in average execution time and a reduction in the waiting time of tasks in the queue.

597 citations
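The mechanism is easy to picture: VMs are classified as overloaded or underloaded relative to the average load, and tasks removed from overloaded VMs are placed, honey-bee style, on underloaded ones, with task priority deciding who moves first. The sketch below is a simplified single-shot rebalance, not the paper's full algorithm.

```python
def rebalance(vm_tasks):
    """Honey-bee-style load balancing sketch. vm_tasks: vm -> list of
    (priority, task_id), lower number = higher priority. Tasks leaving
    overloaded VMs behave like scout bees: each picks the currently
    least-loaded underloaded VM."""
    avg = sum(len(t) for t in vm_tasks.values()) / len(vm_tasks)
    under = [v for v, t in vm_tasks.items() if len(t) < avg]
    for v, tasks in vm_tasks.items():
        tasks.sort(key=lambda t: t[0])            # high priority moves first
        while len(tasks) > avg and under:
            dest = min(under, key=lambda u: len(vm_tasks[u]))
            vm_tasks[dest].append(tasks.pop(0))
            if len(vm_tasks[dest]) >= avg:        # dest no longer underloaded
                under.remove(dest)
    return vm_tasks
```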


Journal ArticleDOI
TL;DR: Two workflow scheduling algorithms are proposed which aim to minimize the workflow execution cost while meeting a deadline, and which have a polynomial time complexity that makes them suitable options for scheduling large workflows in IaaS Clouds.

580 citations


Journal ArticleDOI
TL;DR: This paper proposes FlashLinQ, a synchronous peer-to-peer wireless PHY/MAC network architecture for distributed channel allocation, which develops an analog energy-level based signaling scheme that enables SIR (Signal to Interference Ratio) based distributed scheduling.
Abstract: This paper proposes FlashLinQ--a synchronous peer-to-peer wireless PHY/MAC network architecture. FlashLinQ leverages the fine-grained parallel channel access offered by OFDM and incorporates an analog energy-level-based signaling scheme that enables signal-to-interference ratio (SIR)-based distributed scheduling. This new signaling mechanism, and the concomitant scheduling algorithm, enables efficient channel-aware spatial resource allocation, leading to significant gains over a CSMA/CA system using RTS/CTS. FlashLinQ is a complete system architecture including: 1) timing and frequency synchronization derived from cellular spectrum; 2) peer discovery; 3) link management; and 4) channel-aware distributed power, data rate, and link scheduling. FlashLinQ has been implemented for operation over licensed spectrum on a digital signal processor/ field-programmable gate array (DSP/FPGA) platform. In this paper, we present FlashLinQ performance results derived from both measurements and simulations.

451 citations
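SIR-based distributed scheduling in this style can be sketched as a yielding protocol: links are examined in priority order, and a link transmits only if it both keeps its own SIR acceptable (receiver yielding) and does not push any higher-priority active link below the SIR target (transmitter yielding). The version below is a centralized, simplified rendering of that idea, not FlashLinQ's actual signaling mechanism.

```python
def sir_yielding_schedule(links, gain, sir_min, noise=1.0):
    """Simplified SIR-based link scheduling sketch. links: (tx, rx) pairs in
    priority order; gain[(a, b)] is the channel gain from transmitter a to
    receiver b. A link activates only if its own SIR and the SIRs of all
    already-active (higher-priority) links stay above sir_min."""
    active = []
    for tx, rx in links:
        own_sir = gain[(tx, rx)] / (noise + sum(gain[(t, rx)] for t, _ in active))
        def sir_with_us(t2, r2):
            interf = sum(gain[(t3, r2)] for t3, _ in active if t3 != t2)
            return gain[(t2, r2)] / (noise + interf + gain[(tx, r2)])
        if own_sir >= sir_min and all(sir_with_us(t2, r2) >= sir_min
                                      for t2, r2 in active):
            active.append((tx, rx))
    return active
```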


Journal ArticleDOI
TL;DR: The results from computational experiments indicate that the efficiency and effectiveness of the proposed MOACO are comparable to those of NSGA-II and SPEA2, and show that the durations of TOU periods and the processing speed of machines have a great influence on scheduling results, as longer off-peak periods and the use of faster machines provide more flexibility for shifting high-energy operations to off-peak periods.

315 citations


Journal ArticleDOI
TL;DR: A new demand side management technique, namely, a new energy efficient scheduling algorithm, is proposed to arrange the household appliances for operation such that the monetary expense of a customer is minimized based on the time-varying pricing model.
Abstract: High quality demand side management has become indispensable in the smart grid infrastructure for enhanced energy reduction and system control. In this paper, a new demand side management technique, namely, a new energy efficient scheduling algorithm, is proposed to arrange the household appliances for operation such that the monetary expense of a customer is minimized based on the time-varying pricing model. The proposed algorithm takes into account the uncertainties in household appliance operation time and intermittent renewable generation. Moreover, it considers the variable frequency drive and capacity-limited energy storage. Our technique first uses linear programming to efficiently compute a deterministic scheduling solution without considering uncertainties. To handle the uncertainties in household appliance operation time and energy consumption, a stochastic scheduling technique, which involves an energy consumption adaptation variable, is used to model the stochastic energy consumption patterns for various household appliances. To handle the intermittent behavior of the energy generated from the renewable resources, the offline static operation schedule is adapted to the runtime dynamic scheduling considering variations in renewable energy. The simulation results demonstrate the effectiveness of our approach. Compared to a traditional scheduling scheme which models typical household appliance operations in the traditional home scenario, the proposed deterministic linear programming based scheduling scheme achieves up to 45% monetary expense reduction, and the proposed stochastic design scheme achieves up to 41% monetary expense reduction. Compared to a worst case design where an appliance is assumed to consume the maximum amount of energy, the proposed stochastic design which considers the stochastic energy consumption patterns achieves up to 24% monetary expense reduction without violating the target trip rate of 0.5%. Furthermore, the proposed energy consumption scheduling algorithm can always generate the scheduling solution within 10 seconds, which is fast enough for household appliance applications.

312 citations
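The deterministic first stage (a cost-minimizing schedule under time-varying prices, no uncertainty) is a textbook linear program. Below is a minimal one-appliance instance with made-up prices, a total-energy requirement and a per-slot cap; the paper's formulation covers many appliances plus storage and variable frequency drives.

```python
import numpy as np
from scipy.optimize import linprog

# One-appliance LP sketch (illustrative numbers, 1-hour slots):
# minimize sum_t price[t] * x[t]  subject to  sum_t x[t] = total_energy,
#                                             0 <= x[t] <= p_max.
price = np.array([0.30, 0.28, 0.12, 0.10, 0.11, 0.25])   # $/kWh per slot
total_energy = 3.0                                        # kWh required
p_max = 1.5                                               # kWh cap per slot

res = linprog(c=price,
              A_eq=np.ones((1, len(price))), b_eq=[total_energy],
              bounds=[(0, p_max)] * len(price))
print(res.x)   # energy lands in the cheapest slots (indices 3 and 4 here)
```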


Journal ArticleDOI
TL;DR: A directional routing and scheduling scheme (DRSS) for green vehicular DTNs is presented, using a Nash Q-learning approach that can optimize the energy efficiency while accounting for congestion, buffer occupancy and delay.
Abstract: Vehicular delay tolerant networks (DTNs) enable opportunistic communication by exploiting the mobility of vehicles, with nodes using a delay-tolerant "carry and forward" mechanism to deliver packets. Designing routing schemes for vehicular networks is challenging because of the varying network environment. Most existing DTN routing schemes, including routing for vehicular DTNs, mainly focus on metrics such as delay, hop count and bandwidth. A new focus in green communications is saving energy by optimizing network performance, ultimately protecting the natural climate. Energy-efficient communication schemes designed for vehicular networks are urgently needed because of pollution, energy consumption and heat dissipation. In this paper, we present a directional routing and scheduling scheme (DRSS) for green vehicular DTNs using a Nash Q-learning approach that can optimize the energy efficiency while accounting for congestion, buffer occupancy and delay. Our scheme solves the routing and scheduling problem as a learning process by geographic routing and flow control toward the optimal direction. To speed up the learning process, our scheme uses a hybrid method with forwarding and replication according to the traffic pattern. The DRSS algorithm explores the possible strategies, and then exploits the knowledge obtained to adapt its strategy and achieve the desired overall objective when considering the stochastic non-cooperative game in online multi-commodity routing situations. The simulation results of a vehicular DTN with a predetermined mobility model show that DRSS achieves good energy efficiency with learning ability, which can guarantee the delivery ratio within the delay bound.

Proceedings ArticleDOI
16 Mar 2013
TL;DR: This paper presents a coordinated CTA-aware scheduling policy that utilizes four schemes to minimize the impact of long memory latencies, and indicates that the proposed mechanism can provide 33% average performance improvement compared to the commonly-employed round-robin warp scheduling policy.
Abstract: Emerging GPGPU architectures, along with programming models like CUDA and OpenCL, offer a cost-effective platform for many applications by providing high thread level parallelism at lower energy budgets. Unfortunately, for many general-purpose applications, available hardware resources of a GPGPU are not efficiently utilized, leading to lost opportunity in improving performance. A major cause of this is the inefficiency of current warp scheduling policies in tolerating long memory latencies. In this paper, we identify that the scheduling decisions made by such policies are agnostic to thread-block, or cooperative thread array (CTA), behavior, and as a result inefficient. We present a coordinated CTA-aware scheduling policy that utilizes four schemes to minimize the impact of long memory latencies. The first two schemes, CTA-aware two-level warp scheduling and locality aware warp scheduling, enhance per-core performance by effectively reducing cache contention and improving latency hiding capability. The third scheme, bank-level parallelism aware warp scheduling, improves overall GPGPU performance by enhancing DRAM bank-level parallelism. The fourth scheme employs opportunistic memory-side prefetching to further enhance performance by taking advantage of open DRAM rows. Evaluations on a 28-core GPGPU platform with highly memory-intensive applications indicate that our proposed mechanism can provide 33% average performance improvement compared to the commonly-employed round-robin warp scheduling policy.

Journal ArticleDOI
TL;DR: The hierarchical scheduling strategy is being implemented in the SwinDeW-C cloud workflow system and is demonstrating satisfactory performance, and the experimental results show that the overall performance of the ACO based scheduling algorithm is better than the others on three basic measurements: the optimisation rate on makespan, the optimisation rate on cost and the CPU time.
Abstract: A cloud workflow system is a type of platform service which facilitates the automation of distributed applications based on the novel cloud infrastructure. One of the most important aspects which differentiate a cloud workflow system from its other counterparts is the market-oriented business model. This is a significant innovation which brings many challenges to conventional workflow scheduling strategies. To investigate such an issue, this paper proposes a market-oriented hierarchical scheduling strategy in cloud workflow systems. Specifically, the service-level scheduling deals with the Task-to-Service assignment where tasks of individual workflow instances are mapped to cloud services in the global cloud markets based on their functional and non-functional QoS requirements; the task-level scheduling deals with the optimisation of the Task-to-VM (virtual machine) assignment in local cloud data centres where the overall running cost of cloud workflow systems will be minimised given the satisfaction of QoS constraints for individual tasks. Based on our hierarchical scheduling strategy, a package based random scheduling algorithm is presented as the candidate service-level scheduling algorithm and three representative metaheuristic based scheduling algorithms including genetic algorithm (GA), ant colony optimisation (ACO), and particle swarm optimisation (PSO) are adapted, implemented and analysed as the candidate task-level scheduling algorithms. The hierarchical scheduling strategy is being implemented in our SwinDeW-C cloud workflow system and is demonstrating satisfactory performance. Meanwhile, the experimental results show that the overall performance of the ACO based scheduling algorithm is better than the others on three basic measurements: the optimisation rate on makespan, the optimisation rate on cost and the CPU time.
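At the task level, the adapted metaheuristics search the Task-to-VM assignment space. The sketch below is a deliberately minimal ant colony optimisation for that assignment problem, with made-up parameter values; the paper's adapted ACO (and its GA and PSO counterparts) carries QoS constraints and richer heuristic information.

```python
import random

def aco_assign(cost, n_ants=20, n_iters=50, rho=0.1, q=1.0):
    """Minimal ACO sketch for Task-to-VM assignment. cost[i][j] is the
    running cost of task i on VM j; pheromone tau biases ants toward
    assignments seen in good solutions and evaporates at rate rho."""
    n_tasks, n_vms = len(cost), len(cost[0])
    tau = [[1.0] * n_vms for _ in range(n_tasks)]
    best, best_cost = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            assign = [random.choices(range(n_vms),
                          weights=[tau[i][j] / (1e-9 + cost[i][j])
                                   for j in range(n_vms)])[0]
                      for i in range(n_tasks)]
            c = sum(cost[i][assign[i]] for i in range(n_tasks))
            if c < best_cost:
                best, best_cost = assign, c
        tau = [[(1 - rho) * tau[i][j] + (q / best_cost if best[i] == j else 0.0)
                for j in range(n_vms)] for i in range(n_tasks)]
    return best, best_cost
```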

Journal ArticleDOI
TL;DR: A model for task-oriented resource allocation in a cloud computing environment where an induced bias matrix is used to identify the inconsistent elements and improve the consistency ratio when conflicting weights in various tasks are assigned is proposed.
Abstract: Resource allocation is a complicated task in a cloud computing environment because there are many alternative computers with varying capacities. The goal of this paper is to propose a model for task-oriented resource allocation in a cloud computing environment. Resource allocation tasks are ranked using the pairwise comparison matrix technique and the Analytic Hierarchy Process, given the available resources and user preferences. The computing resources can then be allocated according to the rank of tasks. Furthermore, an induced bias matrix is used to identify the inconsistent elements and improve the consistency ratio when conflicting weights in various tasks are assigned. Two illustrative examples are introduced to validate the proposed method.
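The AHP step is standard: weights come from the principal eigenvector of the pairwise comparison matrix, and the consistency ratio flags judgment matrices that are too contradictory to trust (CR above roughly 0.1 is the usual cutoff). A minimal sketch using Saaty's random-index table; the paper's induced-bias-matrix repair step is not shown.

```python
import numpy as np

def ahp_weights(pairwise):
    """Derive priority weights and the consistency ratio from a pairwise
    comparison matrix via its principal eigenvector (standard AHP)."""
    A = np.asarray(pairwise, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()
    ci = (eigvals[k].real - n) / (n - 1)                    # consistency index
    ri = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24}.get(n, 1.0)   # Saaty's RI
    return w, ci / ri

# Example: three tasks, the first judged 3x and 5x as important as the others.
w, cr = ahp_weights([[1, 3, 5], [1/3, 1, 2], [1/5, 1/2, 1]])
```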

Journal ArticleDOI
TL;DR: In the study, the IDEA shows its effectiveness in optimizing task scheduling and resource allocation compared with both the DEA and the NSGA-II, and is confirmed to find better Pareto-optimal solutions.

Proceedings ArticleDOI
29 Jun 2013
TL;DR: Two advanced generic schedulers for Storm are proposed that provide improved performance for a wide range of application topologies and can produce schedules that achieve significantly better performance than those produced by Storm's default scheduler.
Abstract: Today we are witnessing a dramatic shift toward a data-driven economy, where the ability to analyze huge amounts of data efficiently and in a timely manner marks the difference between industrial success stories and catastrophic failures. In this scenario Storm, an open source distributed realtime computation system, represents a disruptive technology that is quickly gaining the favor of big players like Twitter and Groupon. A Storm application is modeled as a topology, i.e. a graph where nodes are operators and edges represent data flows among such operators. A key aspect in tuning Storm performance lies in the strategy used to deploy a topology, i.e. how Storm schedules the execution of each topology component on the available computing infrastructure. In this paper we propose two advanced generic schedulers for Storm that provide improved performance for a wide range of application topologies. The first scheduler works offline by analyzing the topology structure and adapting the deployment to it; the second scheduler enhances the previous approach by continuously monitoring system performance and rescheduling the deployment at run-time to improve overall performance. Experimental results show that these algorithms can produce schedules that achieve significantly better performance compared to those produced by Storm's default scheduler.
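The offline scheduler's guiding idea, placing operators that exchange the most tuples on the same node so heavy streams avoid the network, can be sketched greedily. This is our illustrative rendering, not the paper's algorithm; slot bookkeeping is simplified.

```python
def topology_aware_place(edges, n_nodes, slots_per_node):
    """Greedy topology-aware placement sketch. edges: (traffic, op_a, op_b)
    tuples for the topology graph. Pairs are taken in decreasing traffic
    order and co-located while the chosen node has free executor slots."""
    placement, used = {}, [0] * n_nodes
    for traffic, a, b in sorted(edges, reverse=True):
        anchor = next((placement[o] for o in (a, b) if o in placement), None)
        for op in (a, b):
            if op in placement:
                continue
            if anchor is not None and used[anchor] < slots_per_node:
                node = anchor                      # co-locate with its peer
            else:
                node = min(range(n_nodes), key=used.__getitem__)
            placement[op] = node
            used[node] += 1
            anchor = node
    return placement
```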

Proceedings ArticleDOI
07 Oct 2013
TL;DR: To reduce resource contention, this paper proposes a dynamic CTA scheduling mechanism, called DYNCTA, which modulates the TLP by allocating an optimal number of CTAs based on application characteristics.
Abstract: General-purpose graphics processing units (GPGPUs) are at their best in accelerating computation by exploiting abundant thread-level parallelism (TLP) offered by many classes of HPC applications. To facilitate such high TLP, emerging programming models like CUDA and OpenCL allow programmers to create work abstractions in terms of smaller work units, called cooperative thread arrays (CTAs). CTAs are groups of threads and can be executed in any order, thereby providing ample opportunities for TLP. The state-of-the-art GPGPU schedulers allocate the maximum possible number of CTAs per core (limited by available on-chip resources) to enhance performance by exploiting TLP. However, we demonstrate in this paper that executing the maximum possible number of CTAs on a core is not always the optimal choice from the performance perspective. A high number of concurrently executing threads might cause more memory requests to be issued, and create contention in the caches, network and memory, leading to long stalls at the cores. To reduce resource contention, we propose a dynamic CTA scheduling mechanism, called DYNCTA, which modulates the TLP by allocating an optimal number of CTAs, based on application characteristics. To minimize resource contention, DYNCTA allocates fewer CTAs for applications suffering from high contention in the memory sub-system, compared to applications demonstrating high throughput. Simulation results on a 30-core GPGPU platform with 31 applications show that the proposed CTA scheduler provides 28% average improvement in performance compared to the existing CTA scheduler.
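The mechanism amounts to a feedback controller on the per-core CTA count. The toy version below captures that shape; the monitored metrics and thresholds are invented for illustration, not DYNCTA's actual quantities.

```python
def adjust_cta_count(cur, idle_cycles, mem_stall_cycles, window,
                     min_ctas=1, max_ctas=8):
    """Feedback-controller sketch in the spirit of DYNCTA (thresholds are
    illustrative): add a CTA when the core starves for work, remove one
    when most of the sampling window stalls on the memory sub-system."""
    if idle_cycles / window > 0.2 and cur < max_ctas:
        return cur + 1                 # not enough TLP: raise CTA count
    if mem_stall_cycles / window > 0.5 and cur > min_ctas:
        return cur - 1                 # memory contention: shed TLP
    return cur
```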

Journal ArticleDOI
TL;DR: An online algorithm, called the Energy-limited Scheduling Algorithm (ESA), is developed, which jointly manages the energy and makes power allocation decisions for packet transmissions and achieves a utility that is within O(ε) of the optimal, for any ε > 0, while ensuring that the network congestion and the required capacity of the energy storage devices are deterministically upper-bounded by bounds of size O(1/ε).
Abstract: In this paper, we show how to achieve close-to-optimal utility performance in energy-harvesting networks with only finite capacity energy storage devices. In these networks, nodes are capable of harvesting energy from the environment. The amount of energy that can be harvested is time-varying and evolves according to some probability law. We develop an online algorithm, called the Energy-limited Scheduling Algorithm (ESA), which jointly manages the energy and makes power allocation decisions for packet transmissions. ESA only has to keep track of the amount of energy left at the network nodes and does not require any knowledge of the harvestable energy process. We show that ESA achieves a utility that is within O(ε) of the optimal, for any ε > 0, while ensuring that the network congestion and the required capacity of the energy storage devices are deterministically upper-bounded by bounds of size O(1/ε). We then also develop the Modified-ESA (MESA) algorithm to achieve the same O(ε) close-to-optimal utility performance, with the average network congestion and the required capacity of the energy storage devices being only O([log(1/ε)]^2), which is close to the theoretical lower bound O(log(1/ε)).

Journal ArticleDOI
TL;DR: A new task decomposition method is proposed that decomposes each parallel task into a set of sequential tasks and achieves a resource augmentation bound of 4 and 5 when the decomposed tasks are scheduled using global EDF and partitioned deadline monotonic scheduling, respectively.
Abstract: Multi-core processors offer a significant performance increase over single-core processors. They have the potential to enable computation-intensive real-time applications with stringent timing constraints that cannot be met on traditional single-core processors. However, most results in traditional multiprocessor real-time scheduling are limited to sequential programming models and ignore intra-task parallelism. In this paper, we address the problem of scheduling periodic parallel tasks with implicit deadlines on multi-core processors. We first consider a synchronous task model where each task consists of segments, each segment having an arbitrary number of parallel threads that synchronize at the end of the segment. We propose a new task decomposition method that decomposes each parallel task into a set of sequential tasks. We prove that our task decomposition achieves a resource augmentation bound of 4 and 5 when the decomposed tasks are scheduled using global EDF and partitioned deadline monotonic scheduling, respectively. Finally, we extend our analysis to a directed acyclic graph (DAG) task model where each node in the DAG has a unit execution requirement. We show how these tasks can be converted into synchronous tasks such that the same decomposition can be applied and the same augmentation bounds hold. Simulations based on synthetic workload demonstrate that the derived resource augmentation bounds are safe and sufficient.

Proceedings ArticleDOI
13 May 2013
TL;DR: Experimental results show the benefits of combining the allocation and migration algorithms and demonstrate their ability to achieve significant energy savings while maintaining feasible convergence times when compared with the best fit heuristic.
Abstract: This paper presents two exact algorithms for energy-efficient scheduling of virtual machines (VMs) in cloud data centers. Modeling energy-aware allocation and consolidation to minimize overall energy consumption leads us to combine an optimal allocation algorithm with a consolidation algorithm relying on migration of VMs at service departures. The optimal allocation algorithm is solved as a bin packing problem with a minimum power consumption objective. It is compared with an energy-aware best fit algorithm. The exact migration algorithm results from a linear integer formulation of VM migration to adapt placement when resources are released. The proposed migration is general and goes beyond the current state of the art by minimizing both the number of migrations needed for consolidation and energy consumption in a single algorithm with a set of valid inequalities and conditions. Experimental results show the benefits of combining the allocation and migration algorithms and demonstrate their ability to achieve significant energy savings while maintaining feasible convergence times when compared with the best fit heuristic.
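The energy-aware best fit heuristic the exact algorithms are compared against is simple to sketch: pack each VM onto the tightest-fitting host that is already powered on, powering on a new host only when nothing fits, so idle hosts can stay off. This is our rendering of that baseline, not the paper's exact bin-packing or migration formulations.

```python
def energy_aware_best_fit(vm_demand, hosts):
    """Place one VM. hosts: list of [capacity, used] pairs (mutated in
    place). Prefer already-active hosts with the smallest leftover slack;
    activate an empty host only as a last resort. Returns host index."""
    active = [(cap - used, i) for i, (cap, used) in enumerate(hosts)
              if used > 0 and cap - used >= vm_demand]
    if not active:                        # must power on a fresh host
        active = [(cap - used, i) for i, (cap, used) in enumerate(hosts)
                  if cap - used >= vm_demand]
    if not active:
        return None                       # no host can fit this VM
    _, i = min(active)
    hosts[i][1] += vm_demand
    return i
```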

Journal ArticleDOI
TL;DR: This paper presents a survey of the existing approaches for reducing preemptions and compares them under different metrics, providing both qualitative and quantitative performance evaluations.
Abstract: The question whether preemptive algorithms are better than nonpreemptive ones for scheduling a set of real-time tasks has been debated for a long time in the research community. In fact, especially under fixed priority systems, each approach has advantages and disadvantages, and neither dominates the other when both predictability and efficiency have to be taken into account in the system design. Recently, limited preemption models have been proposed as a viable alternative between the two extreme cases of fully preemptive and nonpreemptive scheduling. This paper presents a survey of the existing approaches for reducing preemptions and compares them under different metrics, providing both qualitative and quantitative performance evaluations.

Proceedings ArticleDOI
07 Apr 2013
TL;DR: This paper develops a Stackelberg game framework in which a cellular UE and a D2D UE are grouped to form a leader-follower pair, and proposes an algorithm for joint scheduling and resource allocation.
Abstract: Device-to-device (D2D) communication as an underlay to cellular networks can bring significant benefits to users' throughput. However, as D2D user equipments (UEs) can cause interference to cellular UEs, the scheduling and allocation of channel resources and power to D2D communication need elaborate coordination. In this paper, we propose a joint scheduling and resource allocation scheme to improve the performance of D2D communication. We take network throughput and UEs' fairness into account by performing interference management. Specifically, we develop a Stackelberg game framework in which we group a cellular UE and a D2D UE to form a leader-follower pair. The cellular UE is the leader, and the D2D UE is the follower who buys channel resources from the leader. We analyze the equilibrium of the game, and propose an algorithm for joint scheduling and resource allocation. Finally, we perform computer simulations to study the performance of the proposed algorithm.
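A leader-follower pair of this kind is easy to sketch once a utility is fixed. Below we assume a log-rate utility for the follower (our modelling choice, not necessarily the paper's), which gives a closed-form best response, and let the leader pick its price by anticipating that response.

```python
import numpy as np

def follower_power(price, gain, noise):
    """D2D follower's best response under the assumed utility
    log(1 + gain*p/noise) - price*p, which is maximized at
    p* = max(0, 1/price - noise/gain)."""
    return max(0.0, 1.0 / price - noise / gain)

def leader_price(gain, noise, grid=np.linspace(0.01, 5.0, 500)):
    """Cellular leader: choose the price maximizing its revenue
    price * p*(price), anticipating the follower's reaction."""
    revenue = [c * follower_power(c, gain, noise) for c in grid]
    return float(grid[int(np.argmax(revenue))])
```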

Journal ArticleDOI
TL;DR: The physical-layer security against eavesdropping attacks in the cognitive radio network is investigated; a user scheduling scheme to achieve multiuser diversity for improving the security level of cognitive transmissions with a primary QoS constraint is proposed; and it is proved that full diversity is obtained by using the proposed multiuser scheduling.
Abstract: In this paper, we consider a cognitive radio network that consists of one cognitive base station (CBS) and multiple cognitive users (CUs) in the presence of multiple eavesdroppers, where CUs transmit their data packets to CBS under a primary user's quality of service (QoS) constraint while the eavesdroppers attempt to intercept the cognitive transmissions from CUs to CBS. We investigate the physical-layer security against eavesdropping attacks in the cognitive radio network and propose the user scheduling scheme to achieve multiuser diversity for improving the security level of cognitive transmissions with a primary QoS constraint. Specifically, a cognitive user (CU) that satisfies the primary QoS requirement and maximizes the achievable secrecy rate of cognitive transmissions is scheduled to transmit its data packet. For the comparison purpose, we also examine the traditional multiuser scheduling and the artificial noise schemes. We analyze the achievable secrecy rate and intercept probability of the traditional and proposed multiuser scheduling schemes as well as the artificial noise scheme in Rayleigh fading environments. Numerical results show that given a primary QoS constraint, the proposed multiuser scheduling scheme generally outperforms the traditional multiuser scheduling and the artificial noise schemes in terms of the achievable secrecy rate and intercept probability. In addition, we derive the diversity order of the proposed multiuser scheduling scheme through an asymptotic intercept probability analysis and prove that the full diversity is obtained by using the proposed multiuser scheduling.
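The scheduling rule itself is compact: among cognitive users whose transmission respects the primary user's QoS constraint, pick the one with the largest achievable secrecy rate. A sketch under the usual Gaussian-channel rate expressions (bit/s/Hz); the QoS check is left as a caller-supplied predicate.

```python
import math

def schedule_secure_cu(users, primary_qos_ok):
    """Proposed-style multiuser scheduling sketch. users: iterable of
    (cu_id, snr_main, snr_eve) where snr_main is the CU->CBS link SNR and
    snr_eve the CU->eavesdropper link SNR. The achievable secrecy rate is
    the positive gap between the two channel capacities."""
    def secrecy_rate(snr_main, snr_eve):
        return max(0.0, math.log2(1 + snr_main) - math.log2(1 + snr_eve))
    eligible = [(secrecy_rate(sm, se), cu)
                for cu, sm, se in users if primary_qos_ok(cu)]
    return max(eligible)[1] if eligible else None
```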

Proceedings ArticleDOI
23 Jun 2013
TL;DR: Techniques that coordinate the thread scheduling and prefetching decisions in a General Purpose Graphics Processing Unit (GPGPU) architecture to better tolerate long memory latencies are presented and a new prefetch-aware warp scheduling policy is proposed that overcomes problems with existing warp scheduling policies.
Abstract: In this paper, we present techniques that coordinate the thread scheduling and prefetching decisions in a General Purpose Graphics Processing Unit (GPGPU) architecture to better tolerate long memory latencies. We demonstrate that existing warp scheduling policies in GPGPU architectures are unable to effectively incorporate data prefetching. The main reason is that they schedule consecutive warps, which are likely to access nearby cache blocks and thus prefetch accurately for one another, back-to-back in consecutive cycles. This either 1) causes prefetches to be generated by a warp too close to the time their corresponding addresses are actually demanded by another warp, or 2) requires sophisticated prefetcher designs to correctly predict the addresses required by a future "far-ahead" warp while executing the current warp. We propose a new prefetch-aware warp scheduling policy that overcomes these problems. The key idea is to separate in time the scheduling of consecutive warps such that they are not executed back-to-back. We show that this policy not only enables a simple prefetcher to be effective in tolerating memory latencies but also improves memory bank parallelism, even when prefetching is not employed. Experimental evaluations across a diverse set of applications on a 30-core simulated GPGPU platform demonstrate that the prefetch-aware warp scheduler provides 25% and 7% average performance improvement over baselines that employ prefetching in conjunction with, respectively, the commonly-employed round-robin scheduler or the recently-proposed two-level warp scheduler. Moreover, when prefetching is not employed, the prefetch-aware warp scheduler provides higher performance than both of these baseline schedulers as it better exploits memory bank parallelism.
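The key idea, keeping consecutive warps out of back-to-back issue slots so one warp's accesses act as timely prefetches for its neighbor, can be shown with a trivial grouping scheme. The grouping below is illustrative only; the paper's policy integrates this separation with two-level scheduling.

```python
def prefetch_aware_order(warp_ids, n_groups=2):
    """Separate consecutive warps in time: warps with adjacent ids (which
    tend to touch nearby cache blocks) go to different fetch groups, and a
    whole group is drained before the next one is scheduled."""
    groups = [[] for _ in range(n_groups)]
    for w in warp_ids:                    # assumes integer warp ids
        groups[w % n_groups].append(w)
    order = []
    for g in groups:
        order.extend(g)                   # consecutive warps now far apart
    return order
```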

Journal ArticleDOI
TL;DR: Formal models are presented for precedence-constrained parallel tasks, DVFS-enabled clusters, and energy consumption, and scheduling heuristics are developed to reduce the energy consumption of a task's execution.

Proceedings ArticleDOI
16 Jun 2013
TL;DR: This paper proposes a method to jointly optimize the transmit power, the number of bits per symbol and the CPU cycles assigned to each application in order to minimize the power consumption at the mobile side, under an average latency constraint dictated by the application requirements.
Abstract: Mobile cloud computing is offering a very powerful storage and computational facility to enhance the capabilities of resource-constrained mobile handsets. However, full exploitation of the cloud computing capabilities can be achieved only if the allocation of radio and computational resources is performed jointly. In this paper, we propose a method to jointly optimize the transmit power, the number of bits per symbol and the CPU cycles assigned to each application in order to minimize the power consumption at the mobile side, under an average latency constraint dictated by the application requirements. We consider the case of a set of mobile handsets served by a single cloud and we show that the optimization leads to a one-to-one relationship between the transmit power and the percentage of CPU cycles assigned to each user. Based on our optimization, we then propose a computation scheduling technique and verify the stability of the computation queues. We also show how these queues are affected by the degrees of freedom of the channels between the mobile handsets and the server.

Proceedings ArticleDOI
20 May 2013
TL;DR: This work presents the XKaapi runtime system for data-flow task programming on multi-CPU and multi-GPU architectures, which supports a data-flow task model and a locality-aware work stealing scheduler, and shows performance results on two dense linear algebra kernels and a highly efficient Cholesky factorization.
Abstract: Most recent HPC platforms have heterogeneous nodes composed of multi-core CPUs and accelerators, like GPUs. Programming such nodes is typically based on a combination of OpenMP and CUDA/OpenCL codes; scheduling relies on a static partitioning and cost model. We present the XKaapi runtime system for data-flow task programming on multi-CPU and multi-GPU architectures, which supports a data-flow task model and a locality-aware work stealing scheduler. XKaapi enables task multi-implementation on CPU or GPU and multi-level parallelism with different grain sizes. We show performance results on two dense linear algebra kernels, matrix product (GEMM) and Cholesky factorization (POTRF), to evaluate XKaapi on a heterogeneous architecture composed of two hexa-core CPUs and eight NVIDIA Fermi GPUs. Our conclusion is two-fold. First, fine grained parallelism and online scheduling achieve performance results as good as static strategies, and in most cases outperform them. This is due to an improved work stealing strategy that includes locality information; a very light implementation of the tasks in XKaapi; and an optimized search for ready tasks. Next, the multi-level parallelism on multiple CPUs and GPUs enabled by XKaapi led to a highly efficient Cholesky factorization. Using eight NVIDIA Fermi GPUs and four CPUs, we measure up to 2.43 TFlop/s on double precision matrix product and 1.79 TFlop/s on Cholesky factorization; and respectively 5.09 TFlop/s and 3.92 TFlop/s in single precision.
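Locality-aware work stealing of the kind described can be sketched with per-worker deques and a victim order that prefers same-node workers. This single-threaded toy (inspired by, not taken from, XKaapi) ignores synchronization and data-flow dependencies.

```python
import collections
import random

class Worker:
    """Per-worker deque: own tasks pop LIFO (cache-warm), thieves steal
    FIFO from the opposite end. Tasks here are plain callables."""
    def __init__(self, node):
        self.node = node                  # locality domain (socket, GPU, ...)
        self.deque = collections.deque()

def run_one(me, workers):
    """Run one task for worker `me`, stealing with locality preference:
    same-node victims are tried before remote ones."""
    if me.deque:
        return me.deque.pop()()
    local = [w for w in workers if w is not me and w.node == me.node and w.deque]
    remote = [w for w in workers if w.node != me.node and w.deque]
    for victims in (local, remote):
        if victims:
            return random.choice(victims).deque.popleft()()
    return None                           # nothing to do anywhere
```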

Proceedings ArticleDOI
14 Apr 2013
TL;DR: Though L2DCT is deadline unaware, results indicate that, for typical data center traffic patterns and deadlines, and over a wide range of traffic load, its deadline miss rate is consistently smaller than that of existing deadline-driven data center transport protocols.
Abstract: For provisioning large-scale online applications such as web search, social networks and advertisement systems, data centers face extreme challenges in providing low latency for short flows (that result from end-user actions) and high throughput for background flows (that are needed to maintain data consistency and structure across massively distributed systems). We propose L2DCT, a practical data center transport protocol that targets a reduction in flow completion times for short flows by approximating the Least Attained Service (LAS) scheduling discipline, without requiring any changes in application software or router hardware, and without adversely affecting the long flows. L2DCT can co-exist with TCP and works by adapting flow rates to the extent of network congestion inferred via Explicit Congestion Notification (ECN) marking, a feature widely supported by the installed router base. Though L2DCT is deadline unaware, our results indicate that, for typical data center traffic patterns and deadlines, and over a wide range of traffic load, its deadline miss rate is consistently smaller than that of existing deadline-driven data center transport protocols. L2DCT reduces the mean flow completion time by up to 50% over DCTCP and by up to 95% over TCP. In addition, it reduces the completion time for 99th-percentile flows by 37% over DCTCP. We present the design and analysis of L2DCT, evaluate its performance, and discuss an implementation built upon the standard Linux protocol stack.
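Least Attained Service, the discipline L2DCT approximates end-to-end by modulating flow rates from ECN feedback, always serves the flow that has received the least service so far, which is why short flows finish quickly without flow sizes being known in advance. An idealized single-link sketch:

```python
import heapq

def las_order(flow_sizes, quantum=1):
    """Idealized LAS scheduler: repeatedly give one quantum of service to
    the flow with the least attained service. flow_sizes: flow -> size.
    Returns flows in completion order; short flows finish first."""
    remaining = dict(flow_sizes)
    heap = [(0, f) for f in remaining]       # (attained service, flow)
    heapq.heapify(heap)
    done = []
    while heap:
        attained, f = heapq.heappop(heap)
        remaining[f] -= quantum
        if remaining[f] <= 0:
            done.append(f)
        else:
            heapq.heappush(heap, (attained + quantum, f))
    return done
```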

Journal ArticleDOI
TL;DR: This paper proposes an opportunistic scheduling scheme based on the optimal stopping rule as a real-time distributed scheduling algorithm for smart appliances' automation control that determines the best time for appliances' operation to balance electricity bill reduction and inconvenience resulting from the operation delay.
Abstract: Demand response is a key feature of the smart grid. The addition of bidirectional communication to today's power grid can provide real-time pricing (RTP) to customers via smart meters. A growing number of appliance companies have started to design and produce smart appliances which embed intelligent control modules to implement residential demand response based on RTP. However, most of the current residential load scheduling schemes are centralized and based on either day-ahead pricing (DAP) or predicted prices, which can deviate significantly from the RTP. In this paper, we propose an opportunistic scheduling scheme based on the optimal stopping rule as a real-time distributed scheduling algorithm for smart appliances' automation control. It determines the best time for appliances' operation to balance electricity bill reduction and the inconvenience resulting from the operation delay. It is shown that our scheme is a distributed threshold policy when no constraint is considered. When a total power constraint exists, the proposed scheduling algorithm can be implemented in either a centralized or distributed fashion. Our scheme has low complexity and can be easily implemented. Simulation results validate that the proposed scheduling scheme shifts operation to off-peak times and consequently leads to significant electricity bill savings with a reasonable waiting time.
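For a single appliance facing i.i.d. prices (a simplifying assumption we make here; the paper handles richer settings and a total power constraint), the optimal stopping thresholds follow from backward induction: run now if the current price is no worse than the expected cost of waiting.

```python
import numpy as np

def stopping_thresholds(price_samples, horizon):
    """Backward-induction sketch of a threshold stopping rule. V[t] is the
    expected price paid when acting optimally with t future slots left;
    with t >= 1 slots left, run now iff current price <= V[t-1] (the value
    of waiting); with t = 0 the appliance must run."""
    p = np.asarray(price_samples, dtype=float)
    V = np.empty(horizon + 1)
    V[0] = p.mean()                              # forced to run in last slot
    for t in range(1, horizon + 1):
        V[t] = np.minimum(p, V[t - 1]).mean()    # min(accept now, keep waiting)
    return V
```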

Journal ArticleDOI
TL;DR: The genetic algorithm outperforms the classic decomposition approaches in the case of small-size instances and is able to generate relatively good solutions for instances with up to 50 jobs, 5 machines, and 10 vehicles.