Journal ArticleDOI
Symbiotic jobscheduling for a simultaneous multithreaded processor
Allan Snavely,Dean M. Tullsen +1 more
- Vol. 35, Iss: 11, pp 234-244
TLDR
It is demonstrated that performance on a hardware multithreaded processor is sensitive to the set of jobs that are coscheduled by the operating system jobscheduler, and that a small sample of the possible schedules is sufficient to identify a good schedule quickly.
Abstract:
Simultaneous Multithreading machines fetch and execute instructions from multiple instruction streams to increase system utilization and speed up the execution of jobs. When there are more jobs in the system than there is hardware to support simultaneous execution, the operating system scheduler must choose the set of jobs to coschedule. This paper demonstrates that performance on a hardware multithreaded processor is sensitive to the set of jobs that are coscheduled by the operating system jobscheduler. Thus, the full benefits of SMT hardware can only be achieved if the scheduler is aware of thread interactions. Here, a mechanism is presented that allows the scheduler to significantly raise the performance of SMT architectures. This is done without any advance knowledge of a workload's characteristics, using sampling to identify jobs that run well together. We demonstrate an SMT jobscheduler called SOS. SOS combines an overhead-free sample phase, which collects information about various possible schedules, and a symbiosis phase, which uses that information to predict which schedule will provide the best performance. We show that a small sample of the possible schedules is sufficient to identify a good schedule quickly. On a system with random job arrivals and departures, response time is improved by as much as 17% over a schedule that does not incorporate symbiosis.
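The two phases described in the abstract can be illustrated with a minimal sketch. This is not the SOS implementation from the paper. The function names, the `demand` table, and the `toy_throughput` score are all hypothetical stand-ins for whatever the real scheduler measures during its sample phase. The sketch only shows the overall shape: draw a few random schedules, score each one, and keep the best.

```python
import random

def random_schedule(jobs, contexts, rng):
    """Split a shuffled job list into groups of `contexts` jobs that run together."""
    shuffled = list(jobs)
    rng.shuffle(shuffled)
    return [tuple(shuffled[i:i + contexts])
            for i in range(0, len(shuffled), contexts)]

def toy_throughput(schedule, demand, capacity=1.0):
    """Hypothetical score: jobs in a group slow down once their combined
    demand for one shared resource exceeds `capacity`. Demands must be > 0."""
    total = 0.0
    for group in schedule:
        load = sum(demand[job] for job in group)
        total += len(group) * min(1.0, capacity / load)
    return total / len(schedule)

def sos_sketch(jobs, contexts, demand, n_samples, seed=0):
    """Sample phase: draw n_samples random schedules.
    Symbiosis phase: keep the one with the best toy score."""
    rng = random.Random(seed)
    samples = [random_schedule(jobs, contexts, rng) for _ in range(n_samples)]
    return max(samples, key=lambda s: toy_throughput(s, demand))

# Assumed workload: two resource-heavy jobs and two light ones, two contexts.
demand = {"heavy1": 0.8, "heavy2": 0.8, "light1": 0.2, "light2": 0.2}
best = sos_sketch(list(demand), contexts=2, demand=demand, n_samples=10)
print(best, toy_throughput(best, demand))
```

Under this toy model, pairing each heavy job with a light one scores higher than running the two heavy jobs together, which is the kind of interaction a symbiosis-aware scheduler is meant to exploit.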
Citations
Journal ArticleDOI
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
TL;DR: This paper examines two single-ISA heterogeneous multi-core architectures in detail, demonstrating dynamic core assignment policies that provide significant performance gains over naive assignment, and even outperform the best static assignment.
Proceedings ArticleDOI
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors
Onur Mutlu,Thomas Moscibroda +1 more
TL;DR: This paper proposes a new memory access scheduler, called the Stall-Time Fair Memory scheduler (STFM), that provides quality of service to different threads sharing the DRAM memory system and shows that STFM significantly reduces the unfairness in the DRAM system while also improving system throughput on a wide variety of workloads and systems.
Journal ArticleDOI
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
Onur Mutlu,Thomas Moscibroda +1 more
TL;DR: A parallelism-aware batch scheduler that seamlessly incorporates support for system-level thread priorities and can provide different service levels, including purely opportunistic service, to threads with different priorities, and is also simpler to implement than STFM.
Proceedings ArticleDOI
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
TL;DR: It is found that optimizing fairness usually increases throughput, while maximizing throughput does not necessarily improve fairness, and two algorithms are proposed that optimize fairness.
Proceedings ArticleDOI
Predicting inter-thread cache contention on a chip multi-processor architecture
TL;DR: Three performance models are proposed that predict the impact of cache sharing on co-scheduled threads and the most accurate model, the inductive probability model, achieves an average error of only 3.9%.
References
Journal ArticleDOI
A Proof for the Queuing Formula: L = λW
TL;DR: In this paper, it was shown that if the three means are finite and the corresponding stochastic processes strictly stationary, and if the arrival process is metrically transitive with nonzero mean, then L = λW.
Proceedings ArticleDOI
Simultaneous multithreading: maximizing on-chip parallelism
TL;DR: Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multi-threading, and is an attractive alternative to single-chip multiprocessors.
Journal ArticleDOI
The UNIX time-sharing system
Dennis M. Ritchie,Ken Thompson +1 more
TL;DR: The nature and implementation of the file system and of the user command interface are discussed, including the ability to initiate asynchronous processes and over 100 subsystems including a dozen languages.
Proceedings ArticleDOI
Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor
TL;DR: This paper presents an architecture for simultaneous multithreading that minimizes the architectural impact on the conventional superscalar design, has minimal performance impact on a single thread executing alone, and achieves significant throughput gains when running multiple threads.
Proceedings ArticleDOI
The Tera computer system
Robert Alverson,David Callahan,Daniel Cummings,Brian D. Koblenz,Allan Porterfield,Burton Smith +5 more
TL;DR: The Tera architecture was designed with several goals in mind; it needed to be suitable for very high speed implementations, i.