Author

George Papadimitriou

Bio: George Papadimitriou is an academic researcher from the National and Kapodistrian University of Athens. The author has contributed to research in the topics of efficient energy use and multi-core processors. The author has an h-index of 7, has co-authored 8 publications, and has received 146 citations.

Papers
Proceedings ArticleDOI
14 Oct 2017
TL;DR: This paper presents the first automated system-level analysis of multicore CPUs based on ARMv8 64-bit architecture when pushed to operate in scaled voltage conditions and proposes a new composite metric (severity) that aggregates the behavior of cores when undervolted and can support system operation and design protection decisions.
Abstract: In this paper, we present the first automated system-level analysis of multicore CPUs based on the ARMv8 64-bit architecture (8-core, 28nm X-Gene 2 micro-server by AppliedMicro) when pushed to operate in scaled voltage conditions. We report detailed system-level effects including SDCs, corrected/uncorrected errors, and application/system crashes. Our study reveals large voltage margins (that can be harnessed for energy savings) as well as large $V_{min}$ variation among the 8 cores of the CPU chip, among 3 different chips (a nominal rated and two sigma chips), and among different benchmarks. Apart from the $V_{min}$ analysis, we propose a new composite metric (severity) that aggregates the behavior of cores when undervolted and can support system operation and design protection decisions. Our undervolting characterization findings are the first reported analysis for an enterprise-class 64-bit ARMv8 platform, and we highlight key differences with previous studies on x86 platforms. We utilize the results of the system characterization along with performance counter information to measure the accuracy of prediction models for the behavior of benchmarks running on particular cores. Finally, we discuss how the detailed characterization and the prediction results can be effectively used to support design and system software decisions to harness voltage margins for energy efficiency while preserving operation correctness. Our findings show that, on average, a 19.4% energy saving can be achieved without compromising performance, while with a 25% performance reduction, the energy saving rises to 38.8%.

CCS Concepts: • Hardware → Power and energy → Power estimation and optimization; • Hardware → Robustness → Hardware reliability → Process, voltage and temperature variations
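The severity metric is only summarized here at a high level; as a rough illustration of the idea, the sketch below aggregates each core's undervolted behavior (SDCs, corrected/uncorrected errors, crashes) into a single score. The fields and weights are hypothetical assumptions for illustration, not the paper's published definition.

```python
# Illustrative sketch only: the paper defines a composite "severity" metric
# that aggregates per-core undervolting outcomes (SDCs, corrected/uncorrected
# errors, crashes). The weights and aggregation below are hypothetical, not
# the authors' published formula.

from dataclasses import dataclass

@dataclass
class CoreObservation:
    core_id: int
    voltage_mv: int          # supply voltage at which the run executed
    sdc_count: int           # silent data corruptions observed
    corrected_errors: int    # ECC-corrected errors (recoverable)
    uncorrected_errors: int  # detected but uncorrected errors
    crashed: bool            # application or system crash

# Hypothetical weights: crashes and SDCs are penalized most heavily,
# corrected errors least, since the latter are recoverable.
WEIGHTS = {"sdc": 10.0, "uncorrected": 5.0, "corrected": 1.0, "crash": 20.0}

def severity(obs: CoreObservation) -> float:
    """Aggregate one core's undervolted behavior into a single score."""
    return (WEIGHTS["sdc"] * obs.sdc_count
            + WEIGHTS["uncorrected"] * obs.uncorrected_errors
            + WEIGHTS["corrected"] * obs.corrected_errors
            + WEIGHTS["crash"] * (1.0 if obs.crashed else 0.0))

# Rank cores by how badly they misbehave at a given undervolted point,
# e.g., to inform per-core guardband or protection decisions.
runs = [CoreObservation(0, 880, 0, 3, 0, False),
        CoreObservation(1, 880, 2, 7, 1, False),
        CoreObservation(2, 880, 0, 0, 0, True)]
for obs in sorted(runs, key=severity, reverse=True):
    print(f"core {obs.core_id}: severity={severity(obs):.1f}")
```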

59 citations

Proceedings ArticleDOI
01 Feb 2019
TL;DR: This paper presents a comprehensive exploration of how two server-grade systems behave in different frequency and core allocation configurations beyond nominal voltage operation, and integrates a lightweight online monitoring daemon which decides the optimal combination of voltage, core allocation, and clock frequency to achieve higher energy efficiency.
Abstract: Energy efficiency is a known major concern for computing system designers. Significant effort is devoted to power optimization of modern systems, especially in large-scale installations such as data centers, in which both high performance and energy efficiency are important. Power optimization can be achieved through different approaches, several of which focus on adaptive voltage regulation. In this paper, we present a comprehensive exploration of how two server-grade systems behave in different frequency and core allocation configurations beyond nominal voltage operation. Our analysis, which is built on top of two state-of-the-art ARMv8 microprocessor chips (Applied Micro's X-Gene 2 and X-Gene 3), aims (1) to identify the best performance-per-watt operation points when the servers are operating in various voltage/frequency combinations, (2) to reveal how and why the different core allocation options on the available cores of the microprocessor affect the energy consumption, and (3) to enhance the default Linux scheduler to take task allocation decisions for balanced performance and energy efficiency. Our findings, obtained on actual server hardware, have been integrated into a lightweight online monitoring daemon which decides the optimal combination of voltage, core allocation, and clock frequency to achieve higher energy efficiency. Our approach reduces energy on average by 25.2% on X-Gene 2 and 22.3% on X-Gene 3, with a minimal performance penalty of 3.2% on X-Gene 2 and 2.5% on X-Gene 3, compared to the default system configuration.

Keywords: energy efficiency; voltage and frequency scaling; power consumption; multicore characterization; micro-servers
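The daemon's internals are not spelled out in the abstract; the following minimal sketch shows the general shape such an online monitor could take, sweeping candidate voltage/frequency/core-allocation points and settling on the most efficient one. All configuration values and measurement helpers are placeholders, not the authors' implementation.

```python
# Minimal sketch of an online monitoring loop in the spirit of the daemon
# described above. The candidate configurations, measurement helpers, and the
# performance-per-watt objective are illustrative assumptions.

import itertools
import time

CANDIDATE_VOLTAGES_MV = [980, 940, 900]             # assumed undervolt steps
CANDIDATE_FREQS_MHZ = [2400, 1600]
CANDIDATE_CORE_SETS = [(0, 1, 2, 3), (0, 2, 4, 6)]  # e.g., packed vs. spread

def apply_config(voltage_mv, freq_mhz, cores):
    """Placeholder: would program the voltage regulator, cpufreq, and task
    affinity (e.g., via sysfs and sched_setaffinity)."""

def measure_perf_per_watt(window_s=5.0):
    """Placeholder: would sample performance counters and power sensors over
    a short window and return, e.g., instructions per joule."""
    return 0.0

def monitoring_daemon(period_s=60.0):
    while True:
        # Re-evaluate all candidate points periodically, since the optimal
        # configuration shifts with the running workload mix.
        best_score, best_cfg = float("-inf"), None
        for cfg in itertools.product(CANDIDATE_VOLTAGES_MV,
                                     CANDIDATE_FREQS_MHZ,
                                     CANDIDATE_CORE_SETS):
            apply_config(*cfg)
            score = measure_perf_per_watt()
            if score > best_score:
                best_score, best_cfg = score, cfg
        apply_config(*best_cfg)   # settle on the most efficient point
        time.sleep(period_s)
```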

36 citations

Proceedings ArticleDOI
03 Jul 2017
TL;DR: This paper explores the pessimistic voltage guardbands of two multicore x86-64 microprocessor chips from different microarchitectures (one ultra-low-power and one high-performance) when programs are executed on individual cores of the CPU chips.
Abstract: In this paper, we explore the pessimistic voltage guardbands of two multicore x86-64 microprocessor chips that belong to different microarchitectures (one ultra-low-power and one high-performance microprocessor) when programs are executed on individual cores of the CPU chips. We also examine the energy and temperature gains as positive effects of lowering the voltage in both chips while preserving the functional correctness of programs. The behavior of the cores was examined while executing 8 different workloads from the SPEC CPU2006 suite. Our differential experimental study is performed on two state-of-the-art x86-64 microprocessors: an ultra-low-power Intel Core i5-4200U and a high-performance Intel Core i7-3970X. Based on the results, the cores on each microprocessor chip behave differently for different workloads when undervolted, and the voltage guardbands extend more than 15% below the nominal voltage levels. We show that energy efficiency can be increased by a maximum of 20% and that the temperature reduction can be up to 25%.
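Characterization studies of this kind typically determine each core's minimum safe voltage by stepping the supply down until the first incorrect or crashed run. A minimal sketch of such a search loop follows, with the platform-specific voltage control and workload execution left as placeholders rather than a real API.

```python
# Illustrative Vmin-search loop: lower the voltage in small steps, run a
# workload with a known-good reference output, and record the lowest voltage
# that still produces correct results. The helpers are placeholders.

def set_core_voltage(core, offset_mv):
    """Placeholder: would apply a negative voltage offset for `core`
    through a vendor-specific interface."""

def run_workload(core):
    """Placeholder: would pin the benchmark to `core`, run it, and return
    (completed, output_checksum)."""
    return True, 0xDEADBEEF

REFERENCE_CHECKSUM = 0xDEADBEEF  # from a run at nominal voltage

def find_safe_offset(core, step_mv=5, max_offset_mv=300):
    """Return the largest safe undervolt offset found for one core."""
    safe_offset = 0
    for offset in range(0, max_offset_mv + step_mv, step_mv):
        set_core_voltage(core, -offset)
        completed, checksum = run_workload(core)
        if not completed or checksum != REFERENCE_CHECKSUM:
            # Crash or silent data corruption: the previous step was the
            # last reliable point for this core/workload pair.
            break
        safe_offset = offset
    set_core_voltage(core, 0)  # restore nominal voltage
    return safe_offset
```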

25 citations

Proceedings ArticleDOI
23 Jul 2018
TL;DR: This paper shows the overall energy savings that could be achieved by shaving the adopted guardbands in the cores and memories using various applications, demonstrating the potential to obtain up to 38.8% energy savings in the cores and up to 27.3% within the DRAMs.
Abstract: In this paper, we present the results of our comprehensive measurement study of the timing and voltage guardbands in the memories and cores of a commodity ARMv8-based micro-server. Using various synthetic micro-benchmarks, we reveal how the adopted voltage margins vary among the 8 cores of the CPU chip and among 3 different sigma chips, and we show how prone they are to worst-case voltage noise. In addition, we characterize the variation of 'weak' DRAM cells in terms of their retention time across 72 DRAM chips and evaluate the error-mitigation efficacy of the available error-correcting codes in the case of operation under aggressively relaxed refresh periods. Finally, we show the overall energy savings that could be achieved by shaving the adopted guardbands in the cores and memories using various applications. Our characterization results show the potential to obtain up to 38.8% energy savings in the cores and up to 27.3% within the DRAMs.
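The DRAM side of the study rests on retention-time testing: write a known pattern, relax refresh, wait, and observe which cells flip first. A toy sketch of that procedure is shown below; direct refresh control and raw DRAM access are platform-specific, so those helpers are placeholders.

```python
# Sketch of a classic retention-time test: write a pattern, extend the
# refresh interval progressively, and record which locations lose data.
# The refresh-control helper is a placeholder, not a real API.

import time

PATTERN = 0xAA  # alternating bits; studies also use 0x55, solid 0/1, etc.

def write_region(buf):
    # Fill a (physically contiguous, in a real setup) region with the pattern.
    for i in range(len(buf)):
        buf[i] = PATTERN

def set_refresh_interval_ms(ms):
    """Placeholder: would reprogram the memory controller's refresh period."""

def find_weak_cells(buf, intervals_ms=(64, 256, 1024, 4096)):
    """Map each failing byte offset to the first refresh interval at which
    it lost data (a proxy for its retention time)."""
    weak = {}
    for interval in intervals_ms:
        write_region(buf)
        set_refresh_interval_ms(interval)
        time.sleep(interval / 1000.0)      # let cells decay between refreshes
        for offset, value in enumerate(buf):
            if value != PATTERN and offset not in weak:
                weak[offset] = interval
    set_refresh_interval_ms(64)            # restore the standard 64 ms period
    return weak

cells = find_weak_cells(bytearray(4096))
print(f"{len(cells)} weak bytes found")
```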

17 citations

Journal ArticleDOI
TL;DR: This study presents a comprehensive statistical analysis of the behavior of ARMv8 64-bit cores that are part of the enterprise 8-core X-Gene 2 micro-server family when they operate in scaled voltage conditions and shows that the model is able to accurately predict safe voltage margins that provide up to 20.28% power savings.
Abstract: Designers try to reduce the voltage margins of CPU chips to gain energy without sacrificing reliable operation. Statistical analysis methods are appealing for predicting the safe operational margins at the system level, as they do not induce area overheads and they can be applied during manufacturing or after the chips' release to the market. In this study, we present a comprehensive statistical analysis of the behavior of ARMv8 64-bit cores that are part of the enterprise 8-core X-Gene 2 micro-server family when they operate in scaled voltage conditions. Our prediction schemes, which use real hardware performance counters as input, are based on linear regression models with several feature-selection techniques and aim to predict the safe voltage margin of any given workload when the cores operate in scaled voltage conditions. Our findings show that our model is able to accurately predict safe voltage margins that provide up to 20.28% power savings.
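As a concrete (if simplified) rendering of the approach described above, the sketch below trains a linear regression over performance-counter features with feature selection to predict a safe voltage margin. The counter layout and training data are synthetic stand-ins, not the paper's dataset or its exact feature-selection pipeline.

```python
# Simplified sketch: linear regression on hardware-counter features with
# recursive feature elimination, predicting a per-workload safe Vmin.
# Data and feature names are synthetic placeholders.

import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Assumed layout: one row per (workload, core) run; columns are normalized
# counter rates (e.g., IPC, cache misses, branch mispredicts, ...).
rng = np.random.default_rng(0)
X = rng.random((120, 12))                       # 120 runs, 12 counters
y = (0.85 + 0.10 * X[:, 0] - 0.05 * X[:, 3]
     + 0.01 * rng.standard_normal(120))         # Vmin as fraction of nominal

model = Pipeline([
    # RFE keeps only the counters most predictive of the safe margin.
    ("select", RFE(LinearRegression(), n_features_to_select=4)),
    ("regress", LinearRegression()),
])

scores = cross_val_score(model, X, y, cv=5,
                         scoring="neg_mean_absolute_error")
print(f"MAE: {-scores.mean():.4f} (fraction of nominal voltage)")

# In deployment, the predicted margin would be padded with a small safety
# offset before being used to set the operating voltage.
model.fit(X, y)
print("predicted safe Vmin:", model.predict(X[:1]))
```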

16 citations


Cited by
Proceedings Article
01 Jan 2020
TL;DR: V0LTpwn is a novel hardware-oriented but software-controlled attack that affects the integrity of computation in virtually any execution mode on modern x86 processors, and represents the first attack on x86 integrity from software.
Abstract: Fault-injection attacks have been proven in the past to be a reliable way of bypassing hardware-based security measures, such as cryptographic hashes, privilege and access permission enforcement, and trusted execution environments. However, traditional fault-injection attacks require physical presence, and hence, were often considered out of scope in many real-world adversary settings. In this paper, we show that this assumption may no longer be justified. We present V0LTpwn, a novel hardware-oriented but software-controlled attack that affects the integrity of computation in virtually any execution mode on modern x86 processors. To the best of our knowledge, this represents the first attack on x86 integrity from software. The key idea behind our attack is to undervolt a physical core to force non-recoverable hardware faults. Under a V0LTpwn attack, CPU instructions will continue to execute with erroneous results and without crashes, allowing for exploitation. In contrast to recently presented side-channel attacks that leverage vulnerable speculative execution, V0LTpwn is not limited to information disclosure, but allows adversaries to affect execution, and hence, effectively breaks the integrity goals of modern x86 platforms. In our detailed evaluation, we successfully launch software-based attacks against Intel SGX enclaves from a privileged process to demonstrate that a V0LTpwn attack can successfully change the results of computations within enclave execution across multiple CPU revisions.
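The attack's core primitive is that an undervolted core keeps executing while silently computing wrong results. The benign sketch below shows only the observation side of that primitive, detecting silent mismatches in a redundant computation; it performs no voltage manipulation and is not the paper's attack code.

```python
# Benign sketch of the fault-observation primitive: run a deterministic
# compute kernel redundantly and flag silent mismatches. On a healthy core
# every run matches; under aggressive undervolting, mismatches can appear
# without any crash. The workload is illustrative.

import hashlib

def stress_kernel(seed: int, rounds: int = 10000) -> bytes:
    """A deterministic compute kernel; any bit error changes the digest."""
    data = seed.to_bytes(8, "little")
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data

def detect_silent_errors(trials: int = 100) -> int:
    """Compare repeated runs against a reference computed once."""
    reference = stress_kernel(42)
    mismatches = 0
    for _ in range(trials):
        if stress_kernel(42) != reference:
            mismatches += 1
    return mismatches

print("silent mismatches:", detect_silent_errors())
```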

53 citations

Journal ArticleDOI
TL;DR: A comprehensive up-to-date survey identifies the main trade-offs and limitations of the existing hardware-accelerated platforms and infrastructures for NFs and outlines directions for future research.
Abstract: In order to facilitate flexible network service virtualization and migration, network functions (NFs) are increasingly executed by software modules as so-called “softwarized NFs” on General-Purpose Computing (GPC) platforms and infrastructures. GPC platforms are not specifically designed to efficiently execute NFs with their typically intense Input/Output (I/O) demands. Recently, numerous hardware-based accelerations have been developed to augment GPC platforms and infrastructures, e.g., the central processing unit (CPU) and memory, to efficiently execute NFs. This article comprehensively surveys hardware-accelerated platforms and infrastructures for executing softwarized NFs. This survey covers both commercial products, which we consider to be enabling technologies, as well as relevant research studies. We have organized the survey into the main categories of enabling technologies and research studies on hardware accelerations for the CPU, the memory, and the interconnects (e.g., between CPU and memory), as well as custom and dedicated hardware accelerators (that are embedded on the platforms); furthermore, we survey hardware-accelerated infrastructures that connect GPC platforms to networks (e.g., smart network interface cards). We find that the CPU hardware accelerations have mainly focused on extended instruction sets and CPU clock adjustments, as well as cache coherency. Hardware accelerated interconnects have been developed for on-chip and chip-to-chip connections. Our comprehensive up-to-date survey identifies the main trade-offs and limitations of the existing hardware-accelerated platforms and infrastructures for NFs and outlines directions for future research.

51 citations

Journal ArticleDOI
TL;DR: This paper provides a run-time simulation framework of both power delivery (PD) and architecture that captures their interactions, achieves less than 1% deviation from SPICE for an entire PD system simulation, and investigates the impact of dynamic noise on system-level oxide breakdown reliability.
Abstract: With the reduced noise margin brought by relentless technology scaling, power integrity assurance has become more challenging than ever. On the other hand, traditional design methodologies typically focus on a single design layer without much cross-layer interaction, potentially introducing unnecessary guardband and wasting significant design resources. Both issues imperatively call for a cross-layer framework for the co-exploration of power delivery (PD) and system architecture, especially in the early design stage, where there is larger design and optimization freedom. Unfortunately, such a framework does not yet exist in the literature. As a step forward, this paper provides a run-time simulation framework of both PD and architecture that captures their interactions. Enabled by the proposed recursive run-time PD model, it can achieve less than 1% deviation from SPICE for an entire PD system simulation. Moreover, with seamless interactions among the architecture, power, and PD simulators, it can simulate actual benchmarks within reasonable time. The experimental results of running the PARSEC suite demonstrate the framework's capability to discover the co-effects of PD and architecture for early-stage design optimization. The results also expose multiple sources of over-pessimism in traditional PD methodologies. Finally, the framework is able to investigate the impact of dynamic noise on system-level oxide breakdown reliability and shows 31%–92% lifetime estimation deviations from typical static analysis.
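To make the idea of a recursive, state-updating PD model concrete, here is a toy lumped-element version: an R-L supply path with on-die decoupling capacitance, stepped forward each cycle from a per-cycle current trace. The element values and the current step are illustrative assumptions, not the paper's calibrated model.

```python
# Toy lumped PDN model stepped recursively per cycle: supply VDD feeds the
# die through series R and L; on-die decap C supports the load current.
# Values are illustrative, not fitted to any real power delivery network.

R, L, C = 0.002, 2e-11, 1e-7     # ohms, henries, farads (assumed values)
VDD, DT = 1.0, 1e-10             # supply voltage; 0.1 ns timestep

def simulate_droop(load_current):
    """Forward-Euler update of the two PDN state variables per step:
    inductor current i_l and on-die node voltage v. Returns the voltage
    waveform seen by the core (droop and resonance included)."""
    v, i_l, trace = VDD, 0.0, []
    for i_load in load_current:
        di = (VDD - v - R * i_l) / L        # voltage across the inductor
        dv = (i_l - i_load) / C             # decap charges/discharges the node
        i_l += DT * di
        v += DT * dv
        trace.append(v)
    return trace

# A current step (e.g., many cores waking at once) excites the worst-case
# first droop that voltage guardbands must cover.
trace = simulate_droop([0.0] * 200 + [10.0] * 2000)
print(f"min voltage: {min(trace):.4f} V "
      f"(droop {(VDD - min(trace)) * 1000:.1f} mV)")
```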

45 citations

Proceedings ArticleDOI
20 Oct 2018
TL;DR: To attain power savings without NN accuracy loss, a novel technique is proposed that relies on the deterministic behavior of undervolting faults and can limit the accuracy loss to 0.1% without any timing-slack overhead.
Abstract: In this work, we evaluate aggressive undervolting, i.e., voltage scaling below the nominal level, to reduce the energy consumption of Field Programmable Gate Arrays (FPGAs). Usually, voltage guardbands are added by chip vendors to cover worst-case process and environmental scenarios. Through experimenting on several FPGA architectures, we measure this voltage guardband to be on average 39% of the nominal level, which, in turn, delivers more than an order of magnitude power savings. However, further undervolting below the voltage guardband may cause reliability issues as a result of increased circuit delay, i.e., faults start to appear. We extensively characterize the behavior of these faults in terms of their rate, location, and type, as well as their sensitivity to environmental temperature, with a concentration on on-chip memories, or Block RAMs (BRAMs). Finally, we evaluate a typical FPGA-based Neural Network (NN) accelerator under low-voltage BRAM operation. Consequently, the substantial NN energy savings come at the cost of NN accuracy loss. To attain power savings without NN accuracy loss, we propose a novel technique that relies on the deterministic behavior of undervolting faults and can limit the accuracy loss to 0.1% without any timing-slack overhead.
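The mitigation hinges on the faults being deterministic: a fault map measured once can steer parameter placement so that accuracy-critical weights never land on unreliable BRAM words. The sketch below illustrates that placement idea with a hypothetical magnitude-based sensitivity ranking; it is not the paper's actual design.

```python
# Illustrative placement scheme exploiting deterministic undervolting faults:
# given a measured fault map, put the most accuracy-sensitive NN weights in
# fault-free memory words and let the least sensitive absorb the faulty ones.
# The sensitivity proxy and memory layout are assumptions.

import numpy as np

def place_weights(weights, faulty_words, sensitivity):
    """Assign each weight a memory word so the most sensitive weights land
    on fault-free words. `faulty_words` is the measured set of unreliable
    word addresses at the reduced BRAM voltage."""
    n = len(weights)
    order = np.argsort(-np.asarray(sensitivity))    # most sensitive first
    healthy = [w for w in range(n) if w not in faulty_words]
    faulty = [w for w in range(n) if w in faulty_words]
    layout = np.empty(n, dtype=int)
    for idx, word in zip(order, healthy + faulty):
        layout[idx] = word
    return layout                                   # weight i -> word layout[i]

# Toy example: words 3 and 7 fail at the reduced voltage; weights with large
# magnitude are treated as more accuracy-critical.
w = np.array([0.9, -0.02, 0.5, 0.01, -0.7, 0.03, 0.2, -0.04])
layout = place_weights(w, faulty_words={3, 7}, sensitivity=np.abs(w))
print("faulty words host weights:",
      [int(np.where(layout == f)[0][0]) for f in (3, 7)])
```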

45 citations