
Showing papers by "Charles E. Leiserson published in 2006"


Journal ArticleDOI
TL;DR: A hardware implementation of unbounded transactional memory, called UTM, is described, which exploits the common case for performance without sacrificing correctness on transactions whose footprint can be nearly as large as virtual memory.
Abstract: This article advances the following thesis: transactional memory should be virtualized to support transactions of arbitrary footprint and duration. Such support should be provided through hardware and be made visible to software through the machine's instruction-set architecture. We call a transactional memory system unbounded if the system can handle transactions of arbitrary duration that have footprints nearly as big as the system's virtual memory. The primary goal of unbounded transactional memory is to make concurrent programming easier without incurring much implementation overhead. Unbounded transactional-memory architectures can achieve high performance in the common case of small transactions, without sacrificing correctness in large transactions.

295 citations


Patent
11 Dec 2006
TL;DR: In this patent, a CDN service provider shares its infrastructure with a network so that a network service provider (NSP) can offer a private-labeled network content delivery network (NCDN, or "private CDN") to participating content providers.
Abstract: A CDN service provider shares its CDN infrastructure with a network to enable a network service provider (NSP) to offer a private-labeled network content delivery network (NCDN or “private CDN”) to participating content providers. The CDNSP preferably provides the hardware, software and services required to build, deploy, operate and manage the CDN for the NCDN customer. Thus, the NCDN customer has access to and can make available to participating content providers one or more of the content delivery services (e.g., HTTP delivery, streaming media delivery, application delivery, and the like) available from the global CDN without having to provide the large capital investment, R&D expense and labor necessary to successfully deploy and operate the network itself. Rather, the global CDN service provider simply operates the private CDN for the network as a managed service.

269 citations


Proceedings ArticleDOI
29 Mar 2006
TL;DR: This paper presents an adaptive task scheduler that provides provably good history-based feedback about a job's parallelism without knowing the job's future, and develops the first nonclairvoyant scheduling algorithms to offer such guarantees.
Abstract: Multiprocessor scheduling in a shared multiprogramming environment is often structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level task scheduler schedules the work of a job on the allotted processors. In this context, the number of processors allotted to a particular job may vary during the job's execution, and the task scheduler must adapt to these changes in processor resources. For overall system efficiency, the task scheduler should also provide parallelism feedback to the job scheduler to avoid the situation where a job is allotted processors that it cannot use productively.

We present an adaptive task scheduler for multitasked jobs with dependencies that provides continual parallelism feedback to the job scheduler in the form of requests for processors. Our scheduler guarantees that a job completes near-optimally while utilizing at least a constant fraction of the allotted processor cycles. Our scheduler can be applied to schedule data-parallel programs, such as those written in High Performance Fortran (HPF), *Lisp, C*, NESL, and ZPL.

Our analysis models the job scheduler as the task scheduler's adversary, challenging the task scheduler to be robust to the system environment and the job scheduler's administrative policies. For example, the job scheduler can make available a huge number of processors exactly when the job has little use for them. To analyze the performance of our adaptive task scheduler under this stringent adversarial assumption, we introduce a new technique called "trim analysis," which allows us to prove that our task scheduler performs poorly on at most a small number of time steps, exhibiting near-optimal behavior on the vast majority.

To be precise, suppose that a job has work T1 and critical-path length T∞ and is running on a machine with P processors. Using trim analysis, we prove that our scheduler completes the job in O(T1/P̃ + T∞ + L lg P) time steps, where L is the length of a scheduling quantum and P̃ denotes the O(T∞ + L lg P)-trimmed availability. This quantity is the average of the processor availability over all time steps excluding the O(T∞ + L lg P) time steps with the highest processor availability. When T1/T∞ ≫ P̃ (the job's parallelism dominates the O(T∞ + L lg P)-trimmed availability), the job achieves nearly perfect linear speedup. Conversely, when T1/T∞ …
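The trimmed availability in the abstract's bound is simply an average that discards the time steps most favorable to the adversary. A minimal sketch of how such a quantity could be computed (the function name and signature are ours, for illustration, not from the paper):

```python
def trimmed_availability(avail, r):
    """R-trimmed availability: the mean processor availability over all
    time steps, excluding the r steps with the HIGHEST availability.

    avail -- processor availability at each time step
    r     -- number of high-availability steps to trim (the paper trims
             the O(T-inf + L lg P) highest steps)
    """
    if r >= len(avail):
        return 0.0
    kept = sorted(avail)[: len(avail) - r]  # drop the r largest values
    return sum(kept) / len(kept)

# An adversary granting a burst of processors the job cannot use
# barely moves the trimmed average:
print(trimmed_availability([2, 2, 2, 1000], 1))  # -> 2.0
```

This is why the adversary gains little by offering a huge allotment exactly when the job cannot use it: those steps fall into the trimmed portion and do not inflate the denominator of the speedup bound.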

73 citations


Journal ArticleDOI
TL;DR: It is shown how JCilk's linguistic mechanisms can be used to program the "queens" puzzle and a parallel alpha-beta search and the compiler implements continuations in a novel fashion by introducing goto statements into Java.

61 citations


Proceedings ArticleDOI
22 Oct 2006
TL;DR: A framework for defining and exploring the memory semantics of open nesting in a transactional-memory setting is offered, which allows the traditional model of serializability and two new transactional-memory models, race freedom and prefix race freedom, to be defined.
Abstract: Open nesting provides a loophole in the strict model of atomic transactions. Moss and Hosking suggested adapting open nesting for transactional memory, and Moss and a group at Stanford have proposed hardware schemes to support open nesting. Since these researchers have described their schemes using only operational definitions, however, the semantics of these systems have not been specified in an implementation-independent way. This paper offers a framework for defining and exploring the memory semantics of open nesting in a transactional-memory setting.

Our framework allows us to define the traditional model of serializability and two new transactional-memory models, race freedom and prefix race freedom. The weakest of these memory models, prefix race freedom, closely resembles the Stanford open-nesting model. We prove that these three memory models are equivalent for transactional-memory systems that support only closed nesting, as long as aborted transactions are "ignored." We prove that for systems that support open nesting, however, the models of serializability, race freedom, and prefix race freedom are distinct. We show that the Stanford TM system implements a model at least as strong as prefix race freedom and strictly weaker than race freedom. Thus, their model compromises serializability, the property traditionally used to reason about the correctness of transactions.

36 citations


Proceedings ArticleDOI
04 Jul 2006
TL;DR: This paper confirms with simulation studies that A-STEAL performs well when scheduling adaptively parallel work-stealing jobs on large-scale multiprocessors, and provides evidence that A-STEAL consistently achieves higher utilization than ABP for a variety of job mixes.
Abstract: A-STEAL is a provably good adaptive work-stealing thread scheduler that provides parallelism feedback to a multiprocessor job scheduler. A-STEAL uses a simple multiplicative-increase, multiplicative-decrease algorithm to provide continual parallelism feedback to the job scheduler in the form of processor requests. Although jobs scheduled by A-STEAL can be shown theoretically to complete in near-optimal time asymptotically while utilizing at least a constant fraction of the allotted processors, the constants in the analysis leave open whether A-STEAL works well in practice. This paper confirms with simulation studies that A-STEAL performs well when scheduling adaptively parallel work-stealing jobs on large-scale multiprocessors. Our studies monitored the behavior of A-STEAL on a simulated multiprocessor system using synthetic workloads. We measured the completion time and waste of A-STEAL on over 2300 job runs using a variety of processor availability profiles. Linear-regression analysis indicates that A-STEAL provides almost perfect linear speedup. In addition, A-STEAL typically wasted less than 20% of the processor cycles allotted to the job. We compared A-STEAL with the ABP algorithm, an adaptive work-stealing thread scheduler developed by Arora, Blumofe, and Plaxton, which does not employ parallelism feedback. On moderately to heavily loaded large machines with predetermined availability profiles, A-STEAL typically completed jobs more than twice as quickly, despite being allotted the same or fewer processors on every step, while wasting only 10% of the processor cycles wasted by ABP. We compared the utilization of A-STEAL and ABP when many jobs with varying characteristics are using the same multiprocessor. These experiments provide evidence that A-STEAL consistently provides higher utilization than ABP for a variety of job mixes.
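The multiplicative-increase, multiplicative-decrease feedback that the abstract describes can be sketched roughly as follows; the function name, the growth factor, and the utilization threshold are illustrative assumptions, not values taken from the paper:

```python
def next_desire(desire, allotted, usage, rho=2.0, delta=0.8):
    """Sketch of multiplicative-increase, multiplicative-decrease
    parallelism feedback: compute the processor desire (request) for
    the next scheduling quantum.

    desire   -- processors requested in the previous quantum
    allotted -- processors the job scheduler actually granted
    usage    -- fraction of the allotted cycles the job used
    rho      -- multiplicative growth/shrink factor (assumed value)
    delta    -- utilization threshold separating efficient from
                inefficient quanta (assumed value)
    """
    if usage < delta:                  # inefficient quantum: cycles wasted
        return max(1.0, desire / rho)  # shrink the request
    if allotted >= desire:             # efficient and request satisfied
        return desire * rho            # grow the request
    return desire                      # efficient but deprived: hold steady
```

After an inefficient quantum the request shrinks, after an efficient and satisfied quantum it grows, and after an efficient but deprived quantum (fewer processors granted than requested) it holds steady, so the feedback converges toward a request the job can actually exploit.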

35 citations


Book ChapterDOI
26 Jun 2006
TL;DR: Because the length of the scheduling quantum can be adjusted to amortize the cost of context-switching during processor reallocation, both two-level schedulers provide control over the scheduling overhead and ensure effective utilization of processors.
Abstract: Multiprocessor scheduling in a shared multiprogramming environment can be structured in two levels, where a kernel-level job scheduler allots processors to jobs and a user-level thread scheduler maps the ready threads of a job onto the allotted processors. This paper presents two-level scheduling schemes for scheduling "adaptive" multithreaded jobs whose parallelism can change during execution. The AGDEQ algorithm uses dynamic equipartitioning (DEQ) as a job-scheduling policy and an adaptive greedy algorithm (A-Greedy) as the thread scheduler. The ASDEQ algorithm uses DEQ for job scheduling and an adaptive work-stealing algorithm (A-Steal) as the thread scheduler. AGDEQ is suitable for scheduling in centralized scheduling environments, and ASDEQ is suitable for more decentralized settings. Both two-level schedulers achieve O(1)-competitiveness with respect to makespan for any set of multithreaded jobs with arbitrary release times. They are also O(1)-competitive for any batched jobs with respect to mean response time. Moreover, because the length of the scheduling quantum can be adjusted to amortize the cost of context-switching during processor reallocation, our schedulers provide control over the scheduling overhead and ensure effective utilization of processors.

30 citations