scispace - formally typeset
Search or ask a question

Showing papers by "Per Stenström published in 2022"


Journal ArticleDOI
TL;DR: A general model of task-parallel applications under quality-of-service requirements on the completion time, called Task-RM, exploits the variance in task execution-times and imbalance between tasks to allocate just enough resources so that the application completes before the deadline.
Abstract: Improving energy efficiency is an important goal of computer system design. This article focuses on a general model of task-parallel applications under quality-of-service requirements on the completion time. Our technique, called Task-RM, exploits the variance in task execution-times and imbalance between tasks to allocate just enough resources in terms of voltage-frequency and core-allocation so that the application completes before the deadline. Moreover, we provide a solution that can harness additional energy savings with the availability of additional processors. We observe that, for the proposed run-time resource manager to allocate resources, it requires specification of the soft deadlines to the tasks. This is accomplished by analyzing the energy-saving scenarios offline and by providing Task-RM with the performance requirements of the tasks. The evaluation shows an energy saving of 33% compared to race-to-idle and 22% compared to dynamic slack allocation (DSA) with an overhead of less than 1%.

6 citations


Journal ArticleDOI
TL;DR: In this article , the authors identify the four root causes of timing-based side-channel attacks: determinism, sharing, access violation and information flow, through a systematic analysis.
Abstract: Microarchitectural optimizations are expected to play a crucial role in ensuring performance scalability in future technology nodes. However, recent attacks have demonstrated that microarchitectural optimizations, which were assumed to be secure, can be exploited. Moreover, new attacks surface at a rapid pace limiting the scope of existing defenses. These developments prompt the need to review microarchitectural optimizations with an emphasis on security, understand the attack landscape and the potential defense strategies. We analyze timing-based side-channel attacks targeting a diverse set of microarchitectural optimizations. We provide a framework for analysing non-transient and transient attacks, which highlights the similarities. We identify the four root causes of timing-based side-channel attacks: determinism, sharing, access violation and information flow, through our systematic analysis. Our key insight is that a subset (or all) of the root causes are exploited by attacks and eliminating any of the exploited root causes, in any attack step, is enough to provide protection. Leveraging our framework, we systematize existing defenses and show that they target these root causes in the different attack steps.

1 citations


Proceedings ArticleDOI
01 Apr 2022
TL;DR: A new compression method – Global Base Delta Immediate compression (GBDI) – that offers substantially higher memory bandwidth by, unlike prior art, selecting base values across memory blocks by using a novel clustering algorithm through data analysis in the background.
Abstract: Memory bandwidth is limiting performance for many emerging applications. While compression techniques can unlock a higher memory bandwidth, prior art offers only modestly better bandwidth. This paper contributes with a new compression method – Global Base Delta Immediate compression (GBDI) – that offers substantially higher memory bandwidth by, unlike prior art, selecting base values across memory blocks. GBDI uses a novel clustering algorithm through data analysis in the background. The presented accelerator infrastructure offers low area overhead and latency. This paper shows that GBDI offers a compression ratio of 2.3×, and yields 1.5× higher bandwidth and 1.1× higher performance compared with a baseline without compression support, on average, for SPEC2017 benchmarks requiring medium to high memory bandwidth.

1 citations


Journal ArticleDOI
TL;DR: Cooperative Slack Management (CSM) is presented, which can potentially save up to 41% of system energy in a scenario in which both prior art and an extended version with local slack management for each core are ineffective.
Abstract: Processor resources can be adapted at runtime according to the dynamic behavior of applications to reduce the energy consumption of multicore processors without affecting the Quality-of-Service (QoS). To achieve this, an online resource management scheme is needed to control processor configurations such as cache partitioning, dynamic voltage-frequency scaling, and dynamic adaptation of core resources. Prior State-of-the-art has shown the potential for reducing energy without any performance degradation by coordinating the control of different resources. However, in this article, we show that by allowing short-term variations in processing speed (e.g., instructions per second rate), in a controlled fashion, we can enable substantial improvements in energy savings while maintaining QoS. We keep track of such variations in the form of performance slack. Slack can be generated, at some energy cost, by processing faster than the performance target. On the other hand, it can be utilized to save energy by allowing a temporary relaxation in the performance target. Based on this insight, we present Cooperative Slack Management (CSM). During runtime, CSM finds opportunities to generate slack at low energy cost by estimating the performance and energy for different resource configurations using analytical models. This slack is used later when it enables larger energy savings. CSM performs such trade-offs across multiple applications, which means that the slack collected for one application can be used to reduce the energy consumption of another. This cooperative approach significantly increases the opportunities to reduce system energy compared with independent slack management for each application. For example, we show that CSM can potentially save up to 41% of system energy (on average, 25%) in a scenario in which both prior art and an extended version with local slack management for each core are ineffective.