Showing papers presented at "Virtual Execution Environments in 2018"

Proceedings Article•DOI•

Adam Ruprecht¹, Danny Jones¹, Dmitry Shiraev¹, Greg Harmon¹, Maya Spivak¹, Michael Krebs¹, Miche Baker-Harvey¹, Tyler Sanderson¹

Google¹

25 Mar 2018

TL;DR: This paper presents how to use VM live migration at scale to eliminate the disruption caused by infrastructure updates with minimal impact to the guest, performing over 1,000,000 migrations monthly in the authors' production fleet, with a 50 ms median blackout and a 300 ms 99th-percentile blackout.

Abstract: Uninterrupted uptime is a critical aspect of Virtual Machines (VMs) offered by cloud hosting providers. Google's VMs run on top of rapidly changing infrastructure: we regularly update hardware and host software, and we must quickly respond to failing hardware. Frequent change is critical to both development velocity (deploying new versions of services and infrastructure) and the ability to respond rapidly to defects, including critical security fixes. Typically these updates would be disruptive, resulting in VM termination or restart. In this paper we present how we use VM live migration at scale to eliminate this disruption with minimal impact to the guest, performing over 1,000,000 migrations monthly in our production fleet, with a 50 ms median blackout and a 300 ms 99th-percentile blackout.

36 citations

Proceedings Article•DOI•

An Analysis of x86-64 Inline Assembly in C Programs

Manuel Rigger¹, Stefan Marr², Stephen Kell³, David Leopoldseder¹, Hanspeter Mössenböck¹

Johannes Kepler University of Linz¹, University of Kent², University of Cambridge³

25 Mar 2018

TL;DR: The findings are intended to help developers of C-focused tools, those testing compilers, and language designers seeking to reduce the reliance on inline assembly, and they may also aid the design of tools focused on inline assembly itself.

Abstract: C codebases frequently embed nonportable and unstandardized elements such as inline assembly code. Such elements are not well understood, which poses a problem to tool developers who aspire to support C code. This paper investigates the use of x86-64 inline assembly in 1264 C projects from GitHub and combines qualitative and quantitative analyses to answer questions that tool authors may have. We found that 28.1% of the most popular projects contain inline assembly code, although the majority contain only a few fragments with just one or two instructions. The most popular instructions constitute a small subset concerned largely with multicore semantics, performance optimization, and hardware control. Our findings are intended to help developers of C-focused tools, those testing compilers, and language designers seeking to reduce the reliance on inline assembly. They may also aid the design of tools focused on inline assembly itself.

21 citations

Proceedings Article•DOI•

gMig: Efficient GPU Live Migration Optimized by Software Dirty Page for Full Virtualization

Jiacheng Ma¹, Xiao Zheng², Yaozu Dong², Wentai Li¹, Zhengwei Qi¹, Bingsheng He³, Haibing Guan¹

Shanghai Jiao Tong University¹, Intel², National University of Singapore³

25 Mar 2018

TL;DR: This paper introduces gMig, an open-source and practical GPU live migration solution for full virtualization that combines One-Shot Pre-Copy with the hashing-based Software Dirty Page technique to achieve efficient GPU live migration.

Abstract: This paper introduces gMig, an open-source and practical GPU live migration solution for full virtualization. By taking advantage of the dirty pattern of GPU workloads, gMig combines One-Shot Pre-Copy with the hashing-based Software Dirty Page technique to achieve efficient GPU live migration. In particular, we propose three approaches for gMig: 1) Dynamic Graphics Address Remapping, which parses and manipulates GPU commands to adjust the address mapping to a different environment after migration; 2) Software Dirty Page, which uses a hashing-based approach to detect page modification, overcomes the commodity GPU's hardware limitation, and speeds up the migration by sending only the dirtied pages; and 3) One-Shot Pre-Copy, which greatly reduces the number of pre-copy rounds for graphics memory. Our evaluation shows that gMig achieves GPU live migration with an average downtime of 302 ms on Windows and 119 ms on Linux. With the help of Software Dirty Page, the number of GPU pages transferred during the downtime is reduced by 80.0%.
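
The hashing-based dirty-page idea described in the abstract can be illustrated with a minimal Python sketch. It is not gMig's implementation: the 4 KiB page size, the use of SHA-1, and the function names `page_hashes` and `dirty_pages` are all assumptions chosen for illustration. The sketch records one digest per page, then reports the pages whose digest has changed, so that only those pages would need to be sent.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size, for illustration only


def page_hashes(memory: bytes, page_size: int = PAGE_SIZE) -> list:
    """Return one SHA-1 digest per page of the memory snapshot."""
    return [
        hashlib.sha1(memory[off:off + page_size]).digest()
        for off in range(0, len(memory), page_size)
    ]


def dirty_pages(old_hashes: list, memory: bytes, page_size: int = PAGE_SIZE) -> list:
    """Return the indices of pages whose digest differs from the recorded one."""
    new_hashes = page_hashes(memory, page_size)
    return [
        index
        for index, (old, new) in enumerate(zip(old_hashes, new_hashes))
        if old != new
    ]


# Usage: snapshot four pages, modify one byte in page 1, then detect it.
memory = bytearray(4 * PAGE_SIZE)
baseline = page_hashes(bytes(memory))
memory[PAGE_SIZE + 10] = 0xFF
changed = dirty_pages(baseline, bytes(memory))
```

Comparing digests trades CPU time for the absence of hardware dirty-bit tracking, which is the limitation the abstract mentions for commodity GPUs.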

11 citations

Proceedings Article•DOI•

Improving Dynamically-Generated Code Performance on Dynamic Binary Translators

Wenwen Wang¹, Jiacheng Wu², Xiaoli Gong², Tao Li², Pen-Chung Yew¹

University of Minnesota¹, Nankai University²

25 Mar 2018

TL;DR: This paper presents a novel approach to optimizing DBT systems for guest applications with dynamically-generated code, maximizing the reuse of previously translated host code to mitigate the re-translation overhead.

Abstract: The recent transition in the software industry toward dynamically generated code poses a new challenge to existing dynamic binary translation (DBT) systems. Keeping the dynamically-generated guest code consistent with the corresponding translated host code can introduce a significant re-translation overhead. To address this issue, this paper presents a novel approach to optimize DBT systems for guest applications with dynamically-generated code. The proposed approach can maximize the reuse of previously translated host code to mitigate the re-translation overhead. A prototype based on this approach has been implemented on an existing DBT system, HQEMU. Experimental results on a set of JavaScript applications show that it can achieve a 1.24x performance speedup on average compared to the original HQEMU.

7 citations

Proceedings Article•DOI•

Fast PokeEMU: Scaling Generated Instruction Tests Using Aggregation and State Chaining

Qiuchen Yan¹, Stephen McCamant¹

University of Minnesota¹

25 Mar 2018

TL;DR: This work explores techniques for combining many instruction tests into one program to amortize overheads such as booting an emulator and adopts the "Feistel network" construction from cryptography so that each step is invertible.

Abstract: Software that emulates a CPU has many applications, but is difficult to implement correctly and requires extensive testing. Since a large number of test cases are required for full coverage, it is important that the tests execute efficiently. We explore techniques for combining many instruction tests into one program to amortize overheads such as booting an emulator. To ensure the results of each test are reflected in a final result, we use the outputs of one instruction test as an input to the next, and adopt the "Feistel network" construction from cryptography so that each step is invertible. We evaluate this approach by applying it to PokeEMU, a tool that generates emulator tests using symbolic execution. The combined tests run much faster, but still reveal most of the same behavior differences as when run individually.
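
The state-chaining idea can be sketched with a Feistel-style step in Python. This is a hedged illustration rather than the Fast PokeEMU implementation: the 32-bit state halves, the helper names `feistel_step`, `feistel_unstep`, and `run_chain`, and the toy "test" functions are all assumptions. Each step mixes the output of a possibly non-invertible test function `f` into the state in a way that can be undone, so no earlier result is lost before the final value is checked.

```python
from typing import Callable, Iterable, Tuple

MASK = 0xFFFFFFFF  # assumed 32-bit state halves

State = Tuple[int, int]
TestFn = Callable[[int], int]


def feistel_step(state: State, f: TestFn) -> State:
    """One invertible step: (L, R) -> (R, L xor f(R))."""
    left, right = state
    return right, (left ^ f(right)) & MASK


def feistel_unstep(state: State, f: TestFn) -> State:
    """Inverse of feistel_step for the same f."""
    left, right = state
    return (right ^ f(left)) & MASK, left


def run_chain(state: State, tests: Iterable[TestFn]) -> State:
    """Feed each test's output into the next test via Feistel steps."""
    for f in tests:
        state = feistel_step(state, f)
    return state


# Hypothetical stand-ins for instruction tests; none is invertible on its own.
def shift_right_4(x: int) -> int:
    return x >> 4


def shift_right_5(x: int) -> int:  # models an emulator computing the wrong result
    return x >> 5


def low_byte(x: int) -> int:
    return x & 0xFF


start = (0x12345678, 0x9ABCDEF0)
reference = run_chain(start, [low_byte, shift_right_4, low_byte])
divergent = run_chain(start, [low_byte, shift_right_5, low_byte])
```

Because every step is a bijection on the state, a difference introduced by one test cannot be cancelled by later identical steps, which is why comparing only the final state is enough in this sketch.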

5 citations

Proceedings Article•DOI•

Hop, Skip, & Jump: Practical On-Stack Replacement for a Cross-Platform Language-Neutral VM

Kunshan Wang¹, Stephen M. Blackburn¹, Antony L. Hosking¹, Michael Norrish²

Australian National University¹, Commonwealth Scientific and Industrial Research Organisation²

25 Mar 2018

TL;DR: This paper presents an OSR abstraction based on Swapstack, materialized as the API for a low-level virtual machine, and shows how the abstraction of resumption protocols facilitates an elegant implementation of this API on real hardware.

Abstract: On-stack replacement (OSR) is a performance-critical technology for many languages, especially dynamic languages. Conventional wisdom, apparent in JavaScript engines such as V8 and SpiderMonkey, is that OSR must be implemented in a low-level (i.e., in assembly) and language-specific way. This paper presents an OSR abstraction based on Swapstack, materialized as the API for a low-level virtual machine, and shows how the abstraction of resumption protocols facilitates an elegant implementation of this API on real hardware. Using an experimental JavaScript implementation, we demonstrate that this API enables the language implementation to perform OSR without the need to deal with machine-level details. We also show that the API itself is implementable on concrete hardware. This work helps crystallize OSR abstractions and, by providing a reusable implementation, brings OSR within reach for more language implementers.

5 citations

Proceedings Article•DOI•

Demon: An Efficient Solution for on-Device MMU Virtualization in Mediated Pass-Through

Yu Xu¹, Jianguo Yao¹, Yaozu Dong², Kun Tian², Xiao Zheng², Haibing Guan¹

Shanghai Jiao Tong University¹, Intel²

25 Mar 2018

TL;DR: Demon, an efficient solution for on-DEvice MMU virtualizatiON in mediated pass-through, takes advantage of IOMMU to construct a two-dimensional address translation and dynamically switches the 2nd-dimensional page table to a proper candidate when the device owner switches.

Abstract: Memory Management Units (MMUs) for on-device address translation are widely used in modern devices. However, conventional solutions for on-device MMU virtualization, such as the shadow page table implemented in mediated pass-through, still suffer from high complexity and low performance. We present Demon, an efficient solution for on-DEvice MMU virtualizatiON in mediated pass-through. The key insight is that Demon takes advantage of the IOMMU to construct a two-dimensional address translation and dynamically switches the 2nd-dimensional page table to a proper candidate when the device owner switches. In order to support fine-grained parallelism for devices with multiple engines, we put forward a hardware proposal that separates the address space of each engine and enables simultaneous device address remapping for multiple virtual machines (VMs). We implement Demon with a prototype named gDemon, which virtualizes the Intel GPU MMU. Nonetheless, Demon is not limited to this particular case. Evaluations show that gDemon provides up to 19.73x better performance in the media transcoding workloads and achieves performance improvements of up to 17.09% and 13.73% in the 2D benchmarks and 3D benchmarks, respectively, compared with gVirt. The current release of gDemon scales up to 6 VMs with moderate performance in our experiments. In addition, gDemon simplifies the implementation of GPU MMU virtualization with a 37% code reduction.
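
The two-dimensional translation with a switchable second dimension can be modelled in a few lines of Python. This is only a conceptual sketch under stated assumptions, not gDemon's design: it assumes the first dimension is a guest-managed on-device page table (device page to guest page) and the second is a per-VM table (guest page to host page), and the class `TwoDimTranslator` and its methods are hypothetical names.

```python
from typing import Dict, Optional


class TwoDimTranslator:
    """Toy model of two-dimensional translation with a switchable 2nd dimension."""

    def __init__(self, second_dim_tables: Dict[str, Dict[int, int]]) -> None:
        # vm_id -> {guest page number: host page number}
        self._tables = second_dim_tables
        self._active: Optional[Dict[int, int]] = None

    def switch_owner(self, vm_id: str) -> None:
        """Select the 2nd-dimension table of the VM that now owns the device."""
        self._active = self._tables[vm_id]

    def translate(self, first_dim: Dict[int, int], device_page: int) -> int:
        """Walk both dimensions: device page -> guest page -> host page."""
        if self._active is None:
            raise RuntimeError("no device owner selected")
        guest_page = first_dim[device_page]
        return self._active[guest_page]


# Usage: two VMs use the same guest page numbers but map to different host pages.
translator = TwoDimTranslator({
    "vm1": {0: 100, 1: 101},
    "vm2": {0: 200, 1: 201},
})
guest_table = {7: 1}  # first dimension: device page 7 -> guest page 1

translator.switch_owner("vm1")
host_for_vm1 = translator.translate(guest_table, 7)

translator.switch_owner("vm2")
host_for_vm2 = translator.translate(guest_table, 7)
```

In this model, changing the device owner costs a single table-pointer swap, with no rebuilding of a combined shadow table.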

4 citations