scispace - formally typeset
Search or ask a question

Showing papers presented at "Virtual Execution Environments in 2018"


Proceedings ArticleDOI
25 Mar 2018
TL;DR: This paper presents how to use VM live migration at scale to eliminate this disruption with minimal impact to the guest, performing over 1,000,0001migrations monthly in the authors' production fleet, with 50ms median blackout, 300ms 99th percentile blackout.
Abstract: Uninterrupted uptime is a critical aspect of Virtual Machines (VMs) offered by cloud hosting providers. Google's VMs run on top of rapidly changing infrastructure: we regularly update hardware and host software, and we must quickly respond to failing hardware. Frequent change is critical to both development velocity---deploying new versions of services and infrastructure---and the ability to respond rapidly to defects, including critical security fixes. Typically these updates would be disruptive, resulting in VM termination or restart. In this paper we present how we use VM live migration at scale to eliminate this disruption with minimal impact to the guest, performing over 1,000,0001migrations monthly in our production fleet, with 50ms median blackout, 300ms 99th percentile blackout.

36 citations


Proceedings ArticleDOI
25 Mar 2018
TL;DR: The findings are intended to help developers of C-focused tools, those testing compilers, and language designers seeking to reduce the reliance on inline assembly, and they may also aid the design of tools focused oninline assembly itself.
Abstract: C codebases frequently embed nonportable and unstandardized elements such as inline assembly code. Such elements are not well understood, which poses a problem to tool developers who aspire to support C code. This paper investigates the use of x86-64 inline assembly in 1264 C projects from GitHub and combines qualitative and quantitative analyses to answer questions that tool authors may have. We found that 28.1% of the most popular projects contain inline assembly code, although the majority contain only a few fragments with just one or two instructions. The most popular instructions constitute a small subset concerned largely with multicore semantics, performance optimization, and hardware control. Our findings are intended to help developers of C-focused tools, those testing compilers, and language designers seeking to reduce the reliance on inline assembly. They may also aid the design of tools focused on inline assembly itself.

21 citations


Proceedings ArticleDOI
25 Mar 2018
TL;DR: This paper introduces gMig, an open-source and practical GPU live migration solution for full virtualization that presents the One-Shot Pre-Copy combined with the hashing based Software Dirty Page technique to achieve efficientGPU live migration.
Abstract: This paper introduces gMig, an open-source and practical GPU live migration solution for full virtualization. By taking advantage of the dirty pattern of GPU workloads, gMig presents the One-Shot Pre-Copy combined with the hashing based Software Dirty Page technique to achieve efficient GPU live migration. Particularly, we propose three approaches for gMig: 1) Dynamic Graphics Address Remapping, which parses and manipulates GPU commands to adjust the address mapping to adapt to a different environment after migration, 2) Software Dirty Page, which utilizes a hashing based approach to detect page modification, overcomes the commodity GPU's hardware limitation, and speeds up the migration by only sending the dirtied pages, 3) One-Shot Pre-Copy, which greatly reduces the rounds of pre-copy of graphics memory. Our evaluation shows that gMig achieves GPU live migration with an average downtime of 302 ms on Windows and 119 ms on Linux. With the help of Software Dirty Page, the number of GPU pages transferred during the downtime is effectively reduced by 80.0%.

11 citations


Proceedings ArticleDOI
25 Mar 2018
TL;DR: A novel approach to optimize DBT systems for guest applications with dynamically-generated code that can maximize the reuse of previously translated host code to mitigate the re-translation overhead.
Abstract: The recent transition in the software industry toward dynamically generated code poses a new challenge to existing dynamic binary translation (DBT) systems. A significant re-translation overhead could be introduced due to the maintenance of the consistency between the dynamically-generated guest code and the corresponding translated host code. To address this issue, this paper presents a novel approach to optimize DBT systems for guest applications with dynamically-generated code. The proposed approach can maximize the reuse of previously translated host code to mitigate the re-translation overhead. A prototype based on such an approach has been implemented on an existing DBT system HQEMU. Experimental results on a set of JavaScript applications show that it can achieve a 1.24X performance speedup on average compared to the original HQEMU.

7 citations


Proceedings ArticleDOI
25 Mar 2018
TL;DR: This work explores techniques for combining many instruction tests into one program to amortize overheads such as booting an emulator and adopts the "Feistel network" construction from cryptography so that each step is invertible.
Abstract: Software that emulates a CPU has many applications, but is difficult to implement correctly and requires extensive testing. Since a large number of test cases are required for full coverage, it is important that the tests execute efficiently. We explore techniques for combining many instruction tests into one program to amortize overheads such as booting an emulator. To ensure the results of each test are reflected in a final result, we use the outputs of one instruction test as an input to the next, and adopt the "Feistel network" construction from cryptography so that each step is invertible. We evaluate this approach by applying it to PokeEMU, a tool that generates emulator tests using symbolic execution. The combined tests run much faster, but still reveal most of the same behavior differences as when run individually.

5 citations


Proceedings ArticleDOI
25 Mar 2018
TL;DR: This paper presents an OSR abstraction based on Swapstack, materialized as the API for a low-level virtual machine, and shows how the abstraction of resumption protocols facilitates an elegant implementation of this API on real hardware.
Abstract: On-stack replacement (OSR) is a performance-critical technology for many languages, especially dynamic languages. Conventional wisdom, apparent in JavaScript engines such as V8 and SpiderMonkey, is that OSR must be implemented in a low-level (i.e., in assembly) and language-specific way.This paper presents an OSR abstraction based on Swapstack, materialized as the API for a low-level virtual machine, and shows how the abstraction of resumption protocols facilitates an elegant implementation of this API on real hardware. Using an experimental JavaScript implementation, we demonstrate that this API enables the language implementation to perform OSR without the need to deal with machine-level details. We also show that the API itself is implementable on concrete hardware. This work helps crystallize OSR abstractions and, by providing a reusable implementation, brings OSR within reach for more language implementers.

5 citations


Proceedings ArticleDOI
Yu Xu1, Jianguo Yao1, Yaozu Dong2, Kun Tian2, Xiao Zheng2, Haibing Guan1 
25 Mar 2018
TL;DR: Demon, an efficient solution for on-DEvice MMU virtualizatiON in mediated pass-through, takes advantage of IOMMU to construct a two-dimensional address translation and dynamically switches the 2nd-dimensional page table to a proper candidate when the device owner switches.
Abstract: Memory Management Units (MMUs) for on-device address translation are widely used in modern devices. However, conventional solutions for on-device MMU virtualization, such as shadow page table implemented in mediated pass-through, still suffer from high complexity and low performance.We present Demon, an efficient solution for on-DEvice MMU virtualizatiON in mediated pass-through. The key insight is that Demon takes advantage of IOMMU to construct a two-dimensional address translation and dynamically switches the 2nd-dimensional page table to a proper candidate when the device owner switches. In order to support fine-grained parallelism for the device with multiple engines, we put forward a hardware proposal that separates the address space of each engine and enables simultaneous device address remapping for multiple virtual machines (VMs). We implement Demon with a prototype named gDemon which virtualizes Intel GPU MMU. Nonetheless, Demon is not limited to this particular case. Evaluations show that gDemon provides up to 19.73x better performance in the media transcoding workloads and achieves performance improvement of up to 17.09% and 13.73% in the 2D benchmarks and 3D benchmarks, respectively, compared with gVirt. The current release of gDemon scales up to 6 VMs with moderate performance in our experiments. In addition, gDemon simplifies the implementation of GPU MMU virtualization with 37% code reduction.

4 citations