Journal ArticleDOI

Symbiotic jobscheduling for a simultaneous multithreaded processor

Allan Snavely, +1 more
Vol. 35, Iss. 11, pp. 234-244
TLDR
It is demonstrated that performance on a hardware multithreaded processor is sensitive to the set of jobs that are coscheduled by the operating system jobscheduler, and that a small sample of the possible schedules is sufficient to identify a good schedule quickly.
Abstract
Simultaneous multithreading (SMT) machines fetch and execute instructions from multiple instruction streams to increase system utilization and speed up the execution of jobs. When there are more jobs in the system than there is hardware to support simultaneous execution, the operating system scheduler must choose the set of jobs to coschedule. This paper demonstrates that performance on a hardware multithreaded processor is sensitive to the set of jobs that are coscheduled by the operating system jobscheduler. Thus, the full benefits of SMT hardware can only be achieved if the scheduler is aware of thread interactions. Here, a mechanism is presented that allows the scheduler to significantly raise the performance of SMT architectures. This is done without any advance knowledge of a workload's characteristics, using sampling to identify jobs which run well together. We demonstrate an SMT jobscheduler called SOS. SOS combines an overhead-free sample phase, which collects information about various possible schedules, with a symbiosis phase, which uses that information to predict which schedule will provide the best performance. We show that a small sample of the possible schedules is sufficient to identify a good schedule quickly. On a system with random job arrivals and departures, response time is improved by as much as 17% over a schedule which does not incorporate symbiosis.
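The abstract describes SOS's two phases but not their mechanics. The following is a minimal sketch, written in Python, of how a sample-then-symbiosis coscheduler could be organized; the job representation, the use of a measured aggregate throughput figure as the symbiosis metric, and all function names here are illustrative assumptions, not the paper's implementation.

# Minimal sketch of a sample-then-symbiosis ("SOS"-style) coscheduler.
# Illustrative assumptions: jobs are opaque handles, the hardware co-runs a
# fixed number of jobs at once, and run_and_measure(coschedule) co-runs the
# given jobs briefly and returns an aggregate throughput figure (e.g., IPC).
import itertools
import random

def sample_phase(jobs, slots, num_samples, run_and_measure):
    # Measure a small random subset of the possible coschedules.
    candidates = list(itertools.combinations(jobs, slots))
    sampled = random.sample(candidates, min(num_samples, len(candidates)))
    return {sched: run_and_measure(sched) for sched in sampled}

def symbiosis_phase(samples):
    # Predict the best coschedule from the sampled measurements.
    return max(samples, key=samples.get)

def schedule(jobs, slots, num_samples, run_and_measure):
    samples = sample_phase(jobs, slots, num_samples, run_and_measure)
    return symbiosis_phase(samples)  # run this set until arrivals/departures

The point carried over from the abstract is that num_samples can stay small: only a handful of sampled coschedules need to be measured before the symbiosis phase can pick a good one.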



Citations
Journal ArticleDOI

Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance

TL;DR: This paper examines two single-ISA heterogeneous multi-core architectures in detail, demonstrating dynamic core assignment policies that provide significant performance gains over naive assignment, and even outperform the best static assignment.
Proceedings ArticleDOI

Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors

TL;DR: This paper proposes a new memory access scheduler, called the Stall-Time Fair Memory scheduler (STFM), that provides quality of service to different threads sharing the DRAM memory system and shows that STFM significantly reduces the unfairness in the DRAM system while also improving system throughput on a wide variety of workloads and systems.
Journal ArticleDOI

Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems

TL;DR: A parallelism-aware batch scheduler is proposed that seamlessly incorporates support for system-level thread priorities, can provide different service levels (including purely opportunistic service) to threads with different priorities, and is simpler to implement than STFM.
Proceedings ArticleDOI

Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture

TL;DR: It is found that optimizing fairness usually increases throughput, while maximizing throughput does not necessarily improve fairness, and two algorithms are proposed that optimize fairness.
Proceedings ArticleDOI

Predicting inter-thread cache contention on a chip multi-processor architecture

TL;DR: Three performance models are proposed that predict the impact of cache sharing on co-scheduled threads; the most accurate, the inductive probability model, achieves an average error of only 3.9%.
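The entry above names the models only at a high level. As an illustration of the general idea behind reuse-distance-based contention prediction (a deliberately simplified stand-in, not the paper's inductive probability model), the sketch below estimates a thread's misses under cache sharing by shrinking its effective capacity in proportion to its share of the combined access stream; the histogram format and the proportional-capacity assumption are mine.

# Simplified illustration of reuse-distance-based contention prediction.
# Assumption (not from the paper): under sharing, a thread's effective cache
# capacity shrinks in proportion to its share of the combined access stream.

def misses(reuse_hist, capacity):
    # reuse_hist[d] = number of accesses whose reuse distance is d blocks.
    # In a fully associative LRU cache, an access misses iff d >= capacity.
    return sum(count for d, count in reuse_hist.items() if d >= capacity)

def predicted_shared_misses(reuse_hist, my_accesses, other_accesses, capacity):
    # Estimate this thread's misses when co-scheduled on a shared cache.
    share = my_accesses / (my_accesses + other_accesses)
    effective_capacity = max(1, int(capacity * share))
    return misses(reuse_hist, effective_capacity)

For example, a thread issuing one third of the combined accesses is modeled as seeing one third of the shared cache, and its reuse-distance histogram then yields the predicted miss count.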
References
Journal ArticleDOI

A Proof for the Queuing Formula: L = λW

TL;DR: In this paper, it was shown that if the three means are finite, the corresponding stochastic processes are strictly stationary, and the arrival process is metrically transitive with nonzero mean, then L = λW.
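As a quick numerical illustration of the formula (not drawn from the paper): if jobs arrive at an average rate of λ = 4 jobs per second and each job spends an average of W = 0.5 seconds in the system, then the time-average number of jobs in the system is L = λW = 4 × 0.5 = 2.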
Proceedings ArticleDOI

Simultaneous multithreading: maximizing on-chip parallelism

TL;DR: Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar and double that of fine-grain multithreading, making it an attractive alternative to single-chip multiprocessors.
Journal ArticleDOI

The UNIX time-sharing system

TL;DR: The nature and implementation of the file system and of the user command interface are discussed, including the ability to initiate asynchronous processes; the system offers over 100 subsystems, including a dozen languages.
Proceedings ArticleDOI

Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor

TL;DR: This paper presents an architecture for simultaneous multithreading that minimizes the architectural impact on the conventional superscalar design, has minimal performance impact on a single thread executing alone, and achieves significant throughput gains when running multiple threads.
Proceedings ArticleDOI

The Tera computer system

TL;DR: The Tera architecture was designed with several goals in mind; it needed to be suitable for very high speed implementations.