Journal ArticleDOI

Symbiotic jobscheduling for a simultaneous multithreaded processor

12 Nov 2000 - Vol. 35, Iss. 11, pp. 234-244
TL;DR: It is demonstrated that performance on a hardware multithreaded processor is sensitive to the set of jobs that are coscheduled by the operating system jobscheduler, and that a small sample of the possible schedules is sufficient to identify a good schedule quickly.
Abstract: Simultaneous Multithreading machines fetch and execute instructions from multiple instruction streams to increase system utilization and speed up the execution of jobs. When there are more jobs in the system than there is hardware to support simultaneous execution, the operating system scheduler must choose the set of jobs to coschedule.

This paper demonstrates that performance on a hardware multithreaded processor is sensitive to the set of jobs that are coscheduled by the operating system jobscheduler. Thus, the full benefits of SMT hardware can only be achieved if the scheduler is aware of thread interactions. Here, a mechanism is presented that allows the scheduler to significantly raise the performance of SMT architectures. This is done without any advance knowledge of a workload's characteristics, using sampling to identify jobs which run well together.

We demonstrate an SMT jobscheduler called SOS. SOS combines an overhead-free sample phase, which collects information about various possible schedules, with a symbiosis phase, which uses that information to predict which schedule will provide the best performance. We show that a small sample of the possible schedules is sufficient to identify a good schedule quickly. On a system with random job arrivals and departures, response time is improved by as much as 17% over a scheduler that does not incorporate symbiosis.
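
The sample-then-predict loop the abstract describes can be made concrete with a short sketch. This is a hypothetical illustration, not the paper's implementation: the measure_ipc hook, the random choice of candidate coschedules, and the summed-IPC symbiosis score are all assumptions standing in for the paper's hardware-counter-based machinery.

```python
import random

def sample_phase(jobs, contexts, num_samples, measure_ipc):
    """Run a few candidate coschedules and record per-job IPC in each.
    measure_ipc(coschedule) is an assumed hook that briefly runs the given
    jobs together and returns {job: observed_ipc}."""
    samples = []
    for _ in range(num_samples):
        coschedule = tuple(random.sample(jobs, contexts))
        samples.append((coschedule, measure_ipc(coschedule)))
    return samples

def symbiosis_phase(samples):
    """Predict the best coschedule from the sampled data; here the
    symbiosis estimate is simply the summed IPC of the coscheduled jobs."""
    return max(samples, key=lambda s: sum(s[1].values()))[0]
```

A real scheduler would rerun both phases whenever jobs arrive or depart, since the best pairing depends on the current job mix.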


Citations
Journal ArticleDOI
02 Mar 2004
TL;DR: This paper examines two single-ISA heterogeneous multi-core architectures in detail, demonstrating dynamic core assignment policies that provide significant performance gains over naive assignment, and even outperform the best static assignment.
Abstract: A single-ISA heterogeneous multi-core architecture is a chip multiprocessor composed of cores of varying size, performance, and complexity. This paper demonstrates that this architecture can provide significantly higher performance in the same area than a conventional chip multiprocessor. It does so by matching the various jobs of a diverse workload to the various cores. This type of architecture covers a spectrum of workloads particularly well, providing high single-thread performance when thread parallelism is low, and high throughput when thread parallelism is high.

This paper examines two such architectures in detail, demonstrating dynamic core assignment policies that provide significant performance gains over naive assignment, and even outperform the best static assignment. It examines policies for heterogeneous architectures both with and without multithreading cores. One heterogeneous architecture we examine outperforms the comparable-area homogeneous architecture by up to 63%, and our best core assignment strategy achieves up to 31% speedup over a naive policy.
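
The dynamic core-assignment idea lends itself to a toy sketch. The greedy big/little IPC-ratio rule and the sampled-IPC inputs below are assumptions for illustration, not the paper's exact policies:

```python
def assign_cores(jobs, num_big, num_little, ipc_big, ipc_little):
    """Greedy assignment sketch: jobs that benefit most from a big core
    (highest big/little IPC ratio, e.g. sampled by briefly running each job
    on each core type) get the big cores; the rest share the little cores.
    ipc_big and ipc_little map each job to its sampled IPC on that core type."""
    ranked = sorted(jobs, key=lambda j: ipc_big[j] / ipc_little[j], reverse=True)
    return {"big": ranked[:num_big],
            "little": ranked[num_big:num_big + num_little]}
```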

647 citations

Proceedings ArticleDOI
Onur Mutlu1, Thomas Moscibroda1
01 Dec 2007
TL;DR: This paper proposes a new memory access scheduler, called the Stall-Time Fair Memory scheduler (STFM), that provides quality of service to different threads sharing the DRAM memory system and shows that STFM significantly reduces the unfairness in the DRAM system while also improving system throughput on a wide variety of workloads and systems.
Abstract: DRAM memory is a major resource shared among cores in a chip multiprocessor (CMP) system. Memory requests from different threads can interfere with each other. Existing memory access scheduling techniques try to optimize the overall data throughput obtained from the DRAM and thus do not take into account inter-thread interference. Therefore, different threads running together on the same chip can experience extremely different memory system performance: one thread can experience a severe slowdown or starvation while another is unfairly prioritized by the memory scheduler. This paper proposes a new memory access scheduler, called the Stall-Time Fair Memory scheduler (STFM), that provides quality of service to different threads sharing the DRAM memory system. The goal of the proposed scheduler is to "equalize" the DRAM-related slowdown experienced by each thread due to interference from other threads, without hurting overall system performance. As such, STFM takes into account inherent memory characteristics of each thread and does not unfairly penalize threads that use the DRAM system without interfering with other threads. We show that STFM significantly reduces the unfairness in the DRAM system while also improving system throughput (i.e., weighted speedup of threads) on a wide variety of workloads and systems. For example, averaged over 32 different workloads running on an 8-core CMP, the ratio between the highest DRAM-related slowdown and the lowest DRAM-related slowdown reduces from 5.26X to 1.4X, while the average system throughput improves by 7.6%. We qualitatively and quantitatively compare STFM to one new and three previously proposed memory access scheduling algorithms, including network fair queueing. Our results show that STFM provides the best fairness, system throughput, and scalability.
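
The "equalize slowdowns" policy reduces to a compact arbitration rule. The sketch below is a simplified reading of STFM with an assumed slowdown estimator and an illustrative unfairness threshold; the real scheduler estimates slowdowns in hardware and layers this rule over row-hit-first (FR-FCFS) scheduling:

```python
def pick_next_request(pending, est_slowdown, alpha=1.10):
    """Toy STFM-style arbitration.
    pending:      list of (thread_id, arrival_time) DRAM requests
    est_slowdown: dict thread_id -> estimated T_shared / T_alone
    If estimated unfairness (max/min slowdown) exceeds alpha, service the
    most slowed-down thread first; otherwise fall back to a
    throughput-oriented order (oldest-first here)."""
    unfairness = max(est_slowdown.values()) / min(est_slowdown.values())
    if unfairness > alpha:
        victim = max(est_slowdown, key=est_slowdown.get)
        victim_reqs = [r for r in pending if r[0] == victim]
        if victim_reqs:
            return min(victim_reqs, key=lambda r: r[1])
    return min(pending, key=lambda r: r[1])
```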

584 citations


Cites background or methods from "Symbiotic jobscheduling for a simul..."

  • ...This metric should not be used to evaluate system throughput [28, 15] since even throughput-oriented realistic systems need to consider fairness and ensure forward progress of individual threads....

  • ...Fairness Issues in Multithreaded Systems: Although fairness issues have been studied in multithreaded systems, especially at the processor level [28, 15, 7], the DRAM subsystem has received significantly less attention....

  • ...Unfairness = max_i MemSlowdown_i / min_i MemSlowdown_i. We measure overall system throughput using the weighted speedup metric [28], defined as the sum of relative IPC performances of each thread in the evaluated workload: Weighted Speedup = Σ_i (IPC_i^shared / IPC_i^alone)....

Journal ArticleDOI
Onur Mutlu1, Thomas Moscibroda1
01 Jun 2008
TL;DR: A parallelism-aware batch scheduler that seamlessly incorporates support for system-level thread priorities and can provide different service levels, including purely opportunistic service, to threads with different priorities, and is also simpler to implement than STFM.
Abstract: In a chip-multiprocessor (CMP) system, the DRAM system is shared among cores. In a shared DRAM system, requests from a thread can not only delay requests from other threads by causing bank/bus/row-buffer conflicts but they can also destroy other threads' DRAM-bank-level parallelism. Requests whose latencies would otherwise have been overlapped could effectively become serialized. As a result both fairness and system throughput degrade, and some threads can starve for long time periods.

This paper proposes a fundamentally new approach to designing a shared DRAM controller that provides quality of service to threads, while also improving system throughput. Our parallelism-aware batch scheduler (PAR-BS) design is based on two key ideas. First, PAR-BS processes DRAM requests in batches to provide fairness and to avoid starvation of requests. Second, to optimize system throughput, PAR-BS employs a parallelism-aware DRAM scheduling policy that aims to process requests from a thread in parallel in the DRAM banks, thereby reducing the memory-related stall-time experienced by the thread. PAR-BS seamlessly incorporates support for system-level thread priorities and can provide different service levels, including purely opportunistic service, to threads with different priorities.

We evaluate the design trade-offs involved in PAR-BS and compare it to four previously proposed DRAM scheduler designs on 4-, 8-, and 16-core systems. Our evaluations show that, averaged over 100 4-core workloads, PAR-BS improves fairness by 1.11X and system throughput by 8.3% compared to the best previous scheduling technique, Stall-Time Fair Memory (STFM) scheduling. Based on simple request prioritization rules, PAR-BS is also simpler to implement than STFM.
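
The two key ideas, request batching and shortest-job-first thread ranking, can be sketched in a few lines. This is a simplification under stated assumptions: request objects with thread, bank, and arrival fields, a fixed marking cap, and ranking by total marked requests (the paper's ranking also weighs the maximum per-bank load):

```python
from collections import Counter

def form_batch(pending, cap=5):
    """Mark up to `cap` oldest requests per (thread, bank) group. Marked
    requests are serviced before any unmarked ones, so no thread starves."""
    marked, per_group = set(), Counter()
    for req in sorted(pending, key=lambda r: r.arrival):
        if per_group[(req.thread, req.bank)] < cap:
            per_group[(req.thread, req.bank)] += 1
            marked.add(req)
    return marked

def rank_threads(marked):
    """Shortest-job-first ranking: threads with fewer marked requests are
    ranked higher, so their requests finish in parallel across banks and
    their memory-related stall time shrinks."""
    load = Counter(req.thread for req in marked)
    return {thread: rank
            for rank, thread in enumerate(sorted(load, key=load.get))}
```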

575 citations


Cites methods from "Symbiotic jobscheduling for a simul..."

  • ...Evaluation Metrics: We measure fairness using the unfairness index proposed in [25, ...]: a thread's memory slowdown is its memory stall time per instruction (MCPI) when running together with other threads, divided by the MCPI it experiences when running alone on the same system. MemSlowdown_i = MCPI_i^shared / MCPI_i^alone, and Unfairness = max_i MemSlowdown_i / min_j MemSlowdown_j. We measure system throughput using Weighted-Speedup [37] and Hmean-Speedup [18], which balances fairness and throughput [18]: Weighted Speedup = Σ_i (IPC_i^shared / IPC_i^alone), and Hmean Speedup = NumThreads / Σ_i (IPC_i^alone / IPC_i^shared); these metrics are sketched in code after these excerpts....

  • ...We measure system throughput using Weighted-Speedup [37] and Hmean-Speedup [18], which balances fairness and throughput [18]....

  • ...(e.g., [37, 18, 8]) are complementary to our work and can be used in conjunction with PAR-BS....

  • ...Unfortunately, as shown in Figure 12 (middle) and (right), it penalizes memory-intensive threads too much by allowing requests... [figure: Case Study II results for Unfairness, Weighted-Speedup, and Hmean-Speedup under FR-FCFS, NFQ, STFM, and PAR-BS configurations (Figure 14)]

  • [figure: per-benchmark Memory Slowdown and overall Unfairness / Weighted-Speedup / Hmean-Speedup for cap values c = 4 through c = 20 and no-c (Figure 11)]
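
The metrics quoted in these excerpts translate directly into code. A minimal sketch, assuming per-thread MCPI and IPC values from the shared and alone runs are available as dicts:

```python
def mem_slowdowns(mcpi_shared, mcpi_alone):
    """MemSlowdown_i = MCPI_i^shared / MCPI_i^alone."""
    return {t: mcpi_shared[t] / mcpi_alone[t] for t in mcpi_shared}

def unfairness(slowdowns):
    """Unfairness = max_i MemSlowdown_i / min_j MemSlowdown_j."""
    return max(slowdowns.values()) / min(slowdowns.values())

def weighted_speedup(ipc_shared, ipc_alone):
    """Weighted Speedup = sum_i IPC_i^shared / IPC_i^alone."""
    return sum(ipc_shared[t] / ipc_alone[t] for t in ipc_shared)

def hmean_speedup(ipc_shared, ipc_alone):
    """Hmean Speedup = NumThreads / sum_i (IPC_i^alone / IPC_i^shared)."""
    return len(ipc_shared) / sum(ipc_alone[t] / ipc_shared[t] for t in ipc_shared)
```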

Proceedings ArticleDOI
29 Sep 2004
TL;DR: It is found that optimizing fairness usually increases throughput, while maximizing throughput does not necessarily improve fairness, and two algorithms are proposed that optimize fairness.
Abstract: This paper presents a detailed study of fairness in cache sharing between threads in a chip multiprocessor (CMP) architecture. Prior work in CMP architectures has only studied throughput optimization techniques for a shared cache. The issue of fairness in cache sharing, and its relation to throughput, has not been studied. Fairness is a critical issue because the operating system (OS) thread scheduler's effectiveness depends on the hardware to provide fair cache sharing to co-scheduled threads. Without such hardware, serious problems, such as thread starvation and priority inversion, can arise and render the OS scheduler ineffective. This paper makes several contributions. First, it proposes and evaluates five cache fairness metrics that measure the degree of fairness in cache sharing, and shows that two of them correlate very strongly with execution-time fairness. Execution-time fairness is defined as how uniformly the execution times of co-scheduled threads are changed, where each change is relative to the execution time of the same thread running alone. Secondly, using the metrics, the paper proposes static and dynamic L2 cache partitioning algorithms that optimize fairness. The dynamic partitioning algorithm is easy to implement, requires little or no profiling, has low overhead, and does not restrict the cache replacement algorithm to LRU. The static algorithm, although requiring the cache to maintain LRU stack information, can help the OS thread scheduler to avoid cache thrashing. Finally, this paper studies the relationship between fairness and throughput in detail. We found that optimizing fairness usually increases throughput, while maximizing throughput does not necessarily improve fairness. Using a set of co-scheduled pairs of benchmarks, on average our algorithms improve fairness by a factor of 4×, while increasing the throughput by 15%, compared to a nonpartitioned shared cache.
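
The dynamic partitioning loop described above can be approximated in a few lines. A sketch under assumptions: fairness is tracked by each thread's miss increase relative to running alone (one of the paper's correlating metrics), and repartitioning moves one cache way per measurement interval:

```python
def repartition_step(ways, misses_shared, misses_alone):
    """Move one L2 way from the least-hurt thread to the most-hurt thread,
    nudging per-thread miss ratios (shared vs. alone) toward equality.
    ways: dict thread -> ways currently allocated (modified in place)."""
    ratio = {t: misses_shared[t] / misses_alone[t] for t in ways}
    most_hurt = max(ratio, key=ratio.get)
    least_hurt = min(ratio, key=ratio.get)
    if most_hurt != least_hurt and ways[least_hurt] > 1:
        ways[least_hurt] -= 1
        ways[most_hurt] += 1
    return ways
```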

544 citations


Cites background from "Symbiotic jobscheduling for a simul..."

  • ...Some studies have proposed metrics that, if optimized, balance between throughput and fairness [12, 15]....

  • ...For example, a weighted speedup, which incorporates fairness to some extent, has been proposed by Snavely et al. [15]....

  • ...Even in SMT architectures, however, the studies have only focused on either improving throughput, or improving throughput without sacrificing fairness too much [8, 17, 15, 12]....

  • ...In Simultaneous Multi-Threaded (SMT) architectures, where typically the entire cache hierarchy and many processor resources are shared, it has been observed that throughput-optimizing policies tend to favor threads that naturally have high IPC [15], hence sacrificing fairness....

Proceedings ArticleDOI
12 Feb 2005
TL;DR: Three performance models are proposed that predict the impact of cache sharing on co-scheduled threads and the most accurate model, the inductive probability model, achieves an average error of only 3.9%.
Abstract: This paper studies the impact of L2 cache sharing on threads that simultaneously share the cache, on a chip multi-processor (CMP) architecture. Cache sharing impacts threads nonuniformly, where some threads may be slowed down significantly, while others are not. This may cause severe performance problems such as sub-optimal throughput, cache thrashing, and thread starvation for threads that fail to occupy sufficient cache space to make good progress. Unfortunately, there is no existing model that allows extensive investigation of the impact of cache sharing. To allow such a study, we propose three performance models that predict the impact of cache sharing on co-scheduled threads. The input to our models is the isolated L2 cache stack distance or circular sequence profile of each thread, which can be easily obtained on-line or off-line. The output of the models is the number of extra L2 cache misses for each thread due to cache sharing. The models differ by their complexity and prediction accuracy. We validate the models against a cycle-accurate simulation that implements a dual-core CMP architecture, on fourteen pairs of mostly SPEC benchmarks. The most accurate model, the inductive probability model, achieves an average error of only 3.9%. Finally, to demonstrate the usefulness and practicality of the model, a case study that details the relationship between an application's temporal reuse behavior and its cache sharing impact is presented.
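
To make the model inputs concrete: a stack distance profile for an A-way set-associative cache records, for each access, how many distinct lines in its set were touched since the last access to the same line; accesses with distance >= A miss even without sharing. The sketch below is only in the spirit of the paper's simpler frequency-based model (not the inductive probability model): each thread's effective share of the associativity is assumed proportional to its access frequency.

```python
def predicted_extra_misses(profiles, assoc):
    """profiles: dict thread -> histogram h of length assoc + 1, where h[d]
    counts accesses with stack distance d and h[assoc] counts accesses that
    miss even in the full cache. Returns estimated extra misses per thread."""
    accesses = {t: sum(h) for t, h in profiles.items()}
    total = sum(accesses.values())
    extra = {}
    for t, h in profiles.items():
        share = max(1, round(assoc * accesses[t] / total))  # effective ways
        misses_alone = sum(h[assoc:])   # distance >= assoc
        misses_shared = sum(h[share:])  # distance >= reduced share
        extra[t] = misses_shared - misses_alone
    return extra
```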

543 citations


Cites background from "Symbiotic jobscheduling for a simul..."

  • ...rely on discovering the interaction (symbiosis) between threads in an SMT system by profiling all possible co-schedules [17, 16]....

References
Journal ArticleDOI
TL;DR: In this paper, it was shown that if the three means are finite and the corresponding stochastic processes strictly stationary, and if the arrival process is metrically transitive with nonzero mean, then L = λW.
Abstract: In a queuing process, let 1/λ be the mean time between the arrivals of two consecutive units, L be the mean number of units in the system, and W be the mean time spent by a unit in the system. It is shown that, if the three means are finite and the corresponding stochastic processes strictly stationary, and, if the arrival process is metrically transitive with nonzero mean, then L = λW.
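
As a worked example of the law: if units arrive at rate λ = 20 per second and each spends W = 0.25 seconds in the system on average, then L = λW = 5 units are in the system on average. Notably, this holds regardless of the service discipline or the shape of the arrival and service distributions, given the stationarity conditions stated above.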

2,536 citations

Proceedings ArticleDOI
01 May 1995
TL;DR: Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multi-threading, and is an attractive alternative to single-chip multiprocessors.
Abstract: This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar's multiple functional units in a single cycle. We present several models of simultaneous multithreading and compare them with alternative organizations: a wide superscalar, a fine-grain multithreaded processor, and single-chip, multiple-issue multiprocessing architectures. Our results show that both (single-threaded) superscalar and fine-grain multithreaded architectures are limited in their ability to utilize the resources of a wide-issue processor. Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multithreading. We evaluate several cache configurations made possible by this type of organization and evaluate tradeoffs between them. We also show that simultaneous multithreading is an attractive alternative to single-chip multiprocessors; simultaneous multithreaded processors with a variety of organizations outperform corresponding conventional multiprocessors with similar execution resources. While simultaneous multithreading has excellent potential to increase processor utilization, it can add substantial complexity to the design. We examine many of these complexities and evaluate alternative organizations in the design space.

1,713 citations


"Symbiotic jobscheduling for a simul..." refers background in this paper

  • ...Simultaneous Multithreading (SMT) [32, 31, 18] architectures execute instructions from multiple streams of execution (threads) each cycle to increase instruction-level parallelism....

  • ...A simultaneous multithreading processor [32, 31, 18, 14, 35] holds the state of multiple threads (execution contexts) in hardware, allowing the execution of instructions from multiple threads each cycle on a wide superscalar processor....

Journal ArticleDOI
TL;DR: The nature and implementation of the file system and of the user command interface are discussed, including the ability to initiate asynchronous processes and over 100 subsystems including a dozen languages.
Abstract: UNIX is a general-purpose, multi-user, interactive operating system for the Digital Equipment Corporation PDP-11/40 and 11/45 computers. It offers a number of features seldom found even in larger operating systems, including: (1) a hierarchical file system incorporating demountable volumes; (2) compatible file, device, and inter-process I/O; (3) the ability to initiate asynchronous processes; (4) system command language selectable on a per-user basis; and (5) over 100 subsystems including a dozen languages. This paper discusses the nature and implementation of the file system and of the user command interface.

1,140 citations


"Symbiotic jobscheduling for a simul..." refers background in this paper

  • ...The scheduling discipline Multi-level Feedback, implemented in several flavors of Unix [28]....

Proceedings ArticleDOI
01 May 1996
TL;DR: This paper presents an architecture for simultaneous multithreading that minimizes the architectural impact on the conventional superscalar design, has minimal performance impact on a single thread executing alone, and achieves significant throughput gains when running multiple threads.
Abstract: Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized model. In this paper we show that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor for fetch and issue those threads most efficiently using the processor each cycle, thereby providing the "best" instructions to the processor.
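
The fetch-favoring mechanism mentioned in the last sentence is commonly known as the ICOUNT heuristic; a minimal sketch, assuming a per-thread count of in-flight instructions in the pre-issue pipeline stages is available:

```python
def choose_fetch_threads(inflight_count, k=2):
    """ICOUNT-style fetch choice: each cycle, fetch from the k threads with
    the fewest instructions in the decode/rename/queue stages. Threads that
    move instructions through quickly get fetch priority, and no single
    thread can clog the instruction queues.
    inflight_count: dict thread_id -> in-flight instruction count"""
    return sorted(inflight_count, key=inflight_count.get)[:k]
```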

827 citations


"Symbiotic jobscheduling for a simul..." refers background in this paper

  • ...This organization has the potential to more than double the throughput of the processor without excessive increases in hardware [31]....

  • ...Simultaneous Multithreading (SMT) [32, 31, 18] architectures execute instructions from multiple streams of execution (threads) each cycle to increase instruction-level parallelism....

  • ...A simultaneous multithreading processor [32, 31, 18, 14, 35] holds the state of multiple threads (execution contexts) in hardware, allowing the execution of instructions from multiple threads each cycle on a wide superscalar processor....

Proceedings ArticleDOI
01 Jun 1990
TL;DR: The Tera architecture was designed with several goals in mind; it needed to be suitable for very high speed implementations, i.e., admit a short clock period and be scalable to many processors.
Abstract: The Tera architecture was designed with several major goals in mind. First, it needed to be suitable for very high speed implementations, i.e., admit a short clock period and be scalable to many processors. This goal will be achieved; a maximum configuration of the first implementation of the architecture will have 256 processors, 512 memory units, 256 I/O cache units, 256 I/O processors, and 4096 interconnection network nodes and a clock period less than 3 nanoseconds. The abstract architecture is scalable essentially without limit (although a particular implementation is not, of course). The only requirement is that the number of instruction streams increase more rapidly than the number of physical processors. Although this means that speedup is sublinear in the number of instruction streams, it can still increase linearly with the number of physical processors. The price/performance ratio of the system is unmatched, and puts Tera's high performance within economic reach. Second, it was important that the architecture be applicable to a wide spectrum of problems. Programs that do not vectorize well, perhaps because of a preponderance of scalar operations or too-frequent conditional branches, will execute efficiently as long as there is sufficient parallelism to keep the processors busy. Virtually any parallelism available in the total computational workload can be turned into speed, from operation-level parallelism within program basic blocks to multiuser time- and space-sharing. The architecture...

797 citations


"Symbiotic jobscheduling for a simul..." refers background or methods in this paper

  • ...By contrast, the Tera MTA supercomputer [3], which features fine-grain multithreading, has fewer shared system resources and less intimate interactions between threads....

  • ...The techniques described here also apply to other multithreaded architectures [3, 11, 2]; however, the SMT architecture is most interesting because threads interact at such a fine granularity in the architecture, and because it is closest to widespread commercial use, having been announced for the next Alpha processor [10]....
