Showing papers presented at "Virtual Execution Environments in 2011"

PDF

Open Access

Proceedings Article•DOI•

CloudNet: dynamic pooling of cloud resources by live WAN migration of virtual machines

[...]

Timothy Wood¹, Kadangode K. Ramakrishnan², Prashant Shenoy¹, Jacobus Van der Merwe²•Institutions (2)

University of Massachusetts Amherst¹, AT&T²

09 Mar 2011

TL;DR: The CloudNet architecure is presented as a cloud framework consisting of cloud computing platforms linked with a VPN based network infrastructure to provide seamless and secure connectivity between enterprise and cloud data center sites to realize the vision of efficiently pooling geographically distributed data center resources.

...read moreread less

Abstract: Virtual machine technology and the ease with which VMs can be migrated within the LAN, has changed the scope of resource management from allocating resources on a single server to manipulating pools of resources within a data center. We expect WAN migration of virtual machines to likewise transform the scope of provisioning compute resources from a single data center to multiple data centers spread across the country or around the world. In this paper we present the CloudNet architecure as a cloud framework consisting of cloud computing platforms linked with a VPN based network infrastructure to provide seamless and secure connectivity between enterprise and cloud data center sites. To realize our vision of efficiently pooling geographically distributed data center resources, CloudNet provides optimized support for live WAN migration of virtual machines. Specifically, we present a set of optimizations that minimize the cost of transferring storage and virtual machine memory during migrations over low bandwidth and high latency Internet links. We evaluate our system on an operational cloud platform distributed across the continental US. During simultaneous migrations of four VMs between data centers in Texas and Illinois, CloudNet's optimizations reduce memory migration time by 65% and lower bandwidth consumption for the storage and memory transfer by 19GB, a 50% reduction.

...read moreread less

317 citations

Proceedings Article•DOI•

Evaluation of delta compression techniques for efficient live migration of large virtual machines

[...]

Petter Svärd¹, Benoit Hudzia, Johan Tordsson¹, Erik Elmroth¹•Institutions (1)

Umeå University¹

09 Mar 2011

TL;DR: This contribution studies the application of delta compression during the transfer of memory pages in order to increase migration throughput and thus reduce downtime and discusses some general effects ofDelta compression on live migration and analyze when it is beneficial to use this technique.

...read moreread less

Abstract: Despite the widespread support for live migration of Virtual Machines (VMs) in current hypervisors, these have significant shortcomings when it comes to migration of certain types of VMs. More specifically, with existing algorithms, there is a high risk of service interruption when migrating VMs with high workloads and/or over low-bandwidth networks. In these cases, VM memory pages are dirtied faster than they can be transferred over the network, which leads to extended migration downtime. In this contribution, we study the application of delta compression during the transfer of memory pages in order to increase migration throughput and thus reduce downtime. The delta compression live migration algorithm is implemented as a modification to the KVM hypervisor. Its performance is evaluated by migrating VMs running different type of workloads and the evaluation demonstrates a significant decrease in migration downtime in all test cases. In a benchmark scenario the downtime is reduced by a factor of 100. In another scenario a streaming video server is live migrated with no perceivable downtime to the clients while the picture is frozen for eight seconds using standard approaches. Finally, in an enterprise application scenario, the delta compression algorithm successfully live migrates a very large system that fails after migration using the standard algorithm. Finally, we discuss some general effects of delta compression on live migration and analyze when it is beneficial to use this technique.

...read moreread less

191 citations

Proceedings Article•DOI•

Overdriver: handling memory overload in an oversubscribed cloud

[...]

Dan Williams¹, Hani Jamjoom², Yew-Huey Liu², Hakim Weatherspoon¹•Institutions (2)

Cornell University¹, IBM²

09 Mar 2011

TL;DR: Overdriver is presented, a system that adaptively takes advantage of mitigation approaches also as a continuum, complete with tradeoffs with respect to application performance and data center overhead, mitigating all overloads within 8% of well-provisioned performance.

...read moreread less

Abstract: With the intense competition between cloud providers, oversubscription is increasingly important to maintain profitability. Oversubscribing physical resources is not without consequences: it increases the likelihood of overload. Memory overload is particularly damaging. Contrary to traditional views, we analyze current data center logs and realistic Web workloads to show that overload is largely transient: up to 88.1% of overloads last for less than 2 minutes. Regarding overload as a continuum that includes both transient and sustained overloads of various durations points us to consider mitigation approaches also as a continuum, complete with tradeoffs with respect to application performance and data center overhead. In particular, heavyweight techniques, like VM migration, are better suited to sustained overloads, whereas lightweight approaches, like network memory, are better suited to transient overloads. We present Overdriver, a system that adaptively takes advantage of these tradeoffs, mitigating all overloads within 8% of well-provisioned performance. Furthermore, under reasonable oversubscription ratios, where transient overload constitutes the vast majority of overloads, Overdriver requires 15% of the excess space and generates a factor of four less network traffic than a migration-only approach.

...read moreread less

90 citations

Proceedings Article•DOI•

Dynamic cache contention detection in multi-threaded applications

[...]

Qin Zhao¹, David Koh¹, Syed Raza¹, Derek Bruening², Weng-Fai Wong³, Saman Amarasinghe¹ - Show less +2 more•Institutions (3)

Massachusetts Institute of Technology¹, Google², National University of Singapore³

09 Mar 2011

TL;DR: A novel approach is presented that efficiently analyzes interactions between threads to determine thread correlation and detect true and false sharing, and is able to improve the performance of some applications up to a factor of 12x and shed light on the obstacles that prevent their performance from scaling to many cores.

...read moreread less

Abstract: In today's multi-core systems, cache contention due to true and false sharing can cause unexpected and significant performance degradation. A detailed understanding of a given multi-threaded application's behavior is required to precisely identify such performance bottlenecks. Traditionally, however, such diagnostic information can only be obtained after lengthy simulation of the memory hierarchy.In this paper, we present a novel approach that efficiently analyzes interactions between threads to determine thread correlation and detect true and false sharing. It is based on the following key insight: although the slowdown caused by cache contention depends on factors including the thread-to-core binding and parameters of the memory hierarchy, the amount of data sharing is primarily a function of the cache line size and application behavior. Using memory shadowing and dynamic instrumentation, we implemented a tool that obtains detailed sharing information between threads without simulating the full complexity of the memory hierarchy. The runtime overhead of our approach --- a 5x slowdown on average relative to native execution --- is significantly less than that of detailed cache simulation. The information collected allows programmers to identify the degree of cache contention in an application, the correlation among its threads, and the sources of significant false sharing. Using our approach, we were able to improve the performance of some applications up to a factor of 12x. For other contention-intensive applications, we were able to shed light on the obstacles that prevent their performance from scaling to many cores.

...read moreread less

86 citations

Proceedings Article•DOI•

Workload-aware live storage migration for clouds

[...]

Jie Zheng¹, Tze Sing Eugene Ng¹, Kunwadee Sripanidkulchai•Institutions (1)

Rice University¹

09 Mar 2011

TL;DR: This paper presents a novel storage migration scheduling algorithm that can greatly improve storage I/O performance during wide-area migration and shows that the algorithm provides large performance benefits across a wide range of popular virtual machine workloads.

...read moreread less

Abstract: The emerging open cloud computing model will provide users with great freedom to dynamically migrate virtualized computing services to, from, and between clouds over the wide-area. While this freedom leads to many potential benefits, the running services must be minimally disrupted by the migration. Unfortunately, current solutions for wide-area migration incur too much disruption as they will significantly slow down storage I/O operations during migration. The resulting increase in service latency could be very costly to a business. This paper presents a novel storage migration scheduling algorithm that can greatly improve storage I/O performance during wide-area migration. Our algorithm is unique in that it considers individual virtual machine's storage I/O workload such as temporal locality, spatial locality and popularity characteristics to compute an efficient data transfer schedule. Using a fully implemented system on KVM and a trace-driven framework, we show that our algorithm provides large performance benefits across a wide range of popular virtual machine workloads.

...read moreread less

82 citations

Proceedings Article•DOI•

Fast and space-efficient virtual machine checkpointing

[...]

Eunbyung Park¹, Bernhard Egger¹, Jaejin Lee¹•Institutions (1)

Seoul National University¹

09 Mar 2011

TL;DR: This work presents a technique for fast and space-efficient checkpointing of virtual machines, which transparently tracks I/O operations of the guest to external storage and maintains a list of memory pages whose contents are duplicated on non-volatile storage at a checkpoint.

...read moreread less

Abstract: Checkpointing, i.e., recording the volatile state of a virtual machine (VM) running as a guest in a virtual machine monitor (VMM) for later restoration, includes storing the memory available to the VM. Typically, a full image of the VM's memory along with processor and device states are recorded. With guest memory sizes of up to several gigabytes, the size of the checkpoint images becomes more and more of a concern.In this work we present a technique for fast and space-efficient checkpointing of virtual machines. In contrast to existing methods, our technique eliminates redundant data and stores only a subset of the VM's memory pages. Our technique transparently tracks I/O operations of the guest to external storage and maintains a list of memory pages whose contents are duplicated on non-volatile storage. At a checkpoint, these pages are excluded from the checkpoint image.We have implemented the proposed technique for paravirtualized as well as fully-virtualized guests in the Xen VMM. Our experiments with a paravirtualized guest (Linux) and two fullyvirtualized guests (Linux, Windows) show a significant reduction in the size of the checkpoint image as well as the time required to complete the checkpoint. Compared to the current Xen implementation, we achieve, on average, an 81% reduction in the stored data and a 74% reduction in the time required to take a checkpoint for the paravirtualized Linux guest. In a fully-virtualized environment runningWindows and Linux guests, we achieve a 64% reduction of the image size along with a 62% reduction in checkpointing time.

...read moreread less

78 citations

Proceedings Article•DOI•

Virtual WiFi: bring virtualization from wired to wireless

[...]

Lei Xia¹, Sanjeev Kumar², Xue Yang², Praveen Gopalakrishnan², York Liu², Sebastian Schoenberg², Xingang Guo² - Show less +3 more•Institutions (2)

Northwestern University¹, Intel²

09 Mar 2011

TL;DR: The results show that with conventional virtualization overhead mitigation mechanisms, the proposed approach can support fully functional wireless functions inside VM, and achieve close to native performance of Wireless LAN with moderately increased CPU utilization.

...read moreread less

Abstract: As virtualization trend is moving towards "client virtualization", wireless virtualization remains to be one of the technology gaps that haven't been addressed satisfactorily. Today's approaches are mainly developed for wired network, and are not suitable for virtualizing wireless network interface due to the fundamental differences between wireless and wired LAN devices that we will elaborate in this paper. We propose a wireless LAN virtualization approach named virtual WiFi that addresses the technology gap. With our proposed solution, the full wireless LAN functionalities are supported inside virtual machines; each virtual machine can establish its own connection with self-supplied credentials; and multiple separate wireless LAN connections are supported through one physical wireless LAN network interface. We designed and implemented a prototype for our proposed virtual WiFi approach, and conducted detailed performance study. Our results show that with conventional virtualization overhead mitigation mechanisms, our proposed approach can support fully functional wireless functions inside VM, and achieve close to native performance of Wireless LAN with moderately increased CPU utilization.

...read moreread less

77 citations

Proceedings Article•DOI•

Minimal-overhead virtualization of a large scale supercomputer

[...]

John R. Lange¹, Kevin Pedretti², Peter A. Dinda³, Patrick G. Bridges⁴, Chang Bae³, Philip Soltero⁴, Alexander Merritt⁵ - Show less +3 more•Institutions (5)

University of Pittsburgh¹, Sandia National Laboratories², Northwestern University³, University of New Mexico⁴, Georgia Institute of Technology⁵

09 Mar 2011

TL;DR: This paper shows how careful use of hardware and VMM features enables the virtualization of a large-scale HPC system, specifically a Cray XT4 machine, with < = 5% overhead on key HPC applications, microbenchmarks, and guests at scales of up to 4096 nodes.

...read moreread less

Abstract: Virtualization has the potential to dramatically increase the usability and reliability of high performance computing (HPC) systems. However, this potential will remain unrealized unless overheads can be minimized. This is particularly challenging on large scale machines that run carefully crafted HPC OSes supporting tightly-coupled, parallel applications. In this paper, we show how careful use of hardware and VMM features enables the virtualization of a large-scale HPC system, specifically a Cray XT4 machine, with

...read moreread less

77 citations

Proceedings Article•DOI•

Fine-grained user-space security through virtualization

[...]

Mathias Payer¹, Thomas R. Gross¹•Institutions (1)

ETH Zurich¹

09 Mar 2011

TL;DR: An in-depth analysis of the different security guarantees and a performance analysis of libdetox, a prototype of the full protection platform are offered.

...read moreread less

Abstract: This paper presents an approach to the safe execution of applications based on software-based fault isolation and policy-based system call authorization. A running application is encapsulated in an additional layer of protection using dynamic binary translation in user-space. This virtualization layer dynamically recompiles the machine code and adds multiple dynamic security guards that verify the running code to protect and contain the application.The binary translation system redirects all system calls to a policy-based system call authorization framework. This interposition framework validates every system call based on the given arguments and the location of the system call. Depending on the user-loadable policy and an extensible handler mechanism the framework decides whether a system call is allowed, rejected, or redirect to a specific user-space handler in the virtualization layer.This paper offers an in-depth analysis of the different security guarantees and a performance analysis of libdetox, a prototype of the full protection platform. The combination of software-based fault isolation and policy-based system call authorization imposes only low overhead and is therefore an attractive option to encapsulate and sandbox applications to improve host security.

...read moreread less

77 citations

Proceedings Article•DOI•

Dolly: virtualization-driven database provisioning for the cloud

[...]

Emmanuel Cecchet¹, Rahul Singh¹, Upendra Sharma¹, Prashant Shenoy¹•Institutions (1)

University of Massachusetts Amherst¹

09 Mar 2011

TL;DR: It is argued that being able to determine state replication time is crucial for provisioning databases and shown that VM cloning provides this property and proposed Dolly, a database provisioning system based on VM cloning and cost models to adapt the provisioning policy to the cloud infrastructure specifics and application requirements.

...read moreread less

Abstract: Cloud computing platforms are becoming increasingly popular for e-commerce applications that can be scaled on-demand in a very cost effective way. Dynamic provisioning is used to autonomously add capacity in multi-tier cloud-based applications that see workload increases. While many solutions exist to provision tiers with little or no state in applications, the database tier remains problematic for dynamic provisioning due to the need to replicate its large disk state. In this paper, we explore virtual machine (VM) cloning techniques to spawn database replicas and address the challenges of provisioning shared-nothing replicated databases in the cloud. We argue that being able to determine state replication time is crucial for provisioning databases and show that VM cloning provides this property. We propose Dolly, a database provisioning system based on VM cloning and cost models to adapt the provisioning policy to the cloud infrastructure specifics and application requirements. We present an implementation of Dolly in a commercial-grade replication middleware and evaluate database provisioning strategies for a TPC-W workload on a private cloud and on Amazon EC2. By being aware of VM-based state replication cost, Dolly can solve the challenge of automated provisioning for replicated databases on cloud platforms.

...read moreread less

63 citations

Proceedings Article•DOI•

Performance profiling of virtual machines

[...]

Jiaqing Du¹, Nipun Sehrawat², Willy Zwaenepoel¹•Institutions (2)

École Polytechnique Fédérale de Lausanne¹, University of Illinois at Urbana–Champaign²

09 Mar 2011

TL;DR: This paper implements both guest-wide and system-wide profiling for a VMM based on the x86 hardware virtualization extensions and system -wide profilingFor a V MM based on binary translation, demonstrating that these profilers provide good accuracy with only limited overhead.

...read moreread less

Abstract: Profilers based on hardware performance counters are indispensable for performance debugging of complex software systems. All modern processors feature hardware performance counters, but current virtual machine monitors (VMMs) do not properly expose them to the guest operating systems. Existing profiling tools require privileged access to the VMM to profile the guest and are only available for VMMs based on paravirtualization. Diagnosing performance problems of software running in a virtualized environment is therefore quite difficult.This paper describes how to extend VMMs to support performance profiling. We present two types of profiling in a virtualized environment: guest-wide profiling and system-wide profiling. Guest-wide profiling shows the runtime behavior of a guest. The profiler runs in the guest and does not require privileged access to the VMM. System-wide profiling exposes the runtime behavior of both the VMM and any number of guests. It requires profilers both in the VMM and in those guests.Not every VMM has the right architecture to support both types of profiling. We determine the requirements for each of them, and explore the possibilities for their implementation in virtual machines using hardware assistance, paravirtualization, and binary translation.We implement both guest-wide and system-wide profiling for a VMM based on the x86 hardware virtualization extensions and system-wide profiling for a VMM based on binary translation. We demonstrate that these profilers provide good accuracy with only limited overhead.

...read moreread less

Proceedings Article•DOI•

Selective hardware/software memory virtualization

[...]

Xiaolin Wang¹, Jiarui Zang¹, Zhenlin Wang², Yingwei Luo¹, Xiaoming Li¹ - Show less +1 more•Institutions (2)

Peking University¹, Michigan Technological University²

09 Mar 2011

TL;DR: A dynamic switching mechanism is proposed that monitors TLB misses and guest page faults on the fly, and dynam-ically switches between the two paging modes, and can match and, sometimes, even beat the better performance of HAP and SP.

...read moreread less

Abstract: As virtualization becomes a key technique for supporting cloud computing, much effort has been made to reduce virtualization overhead, so a virtualized system can match its native performance. One major overhead is due to memory or page table virtualization. Conventional virtual machines rely on a shadow mechanism to manage page tables, where a shadow page table maintained by the VMM (Virtual Machine Monitor) maps virtual addresses to machine addresses while a guest maintains its own virtual to physical page table. This shadow mechanism will result in expensive VM exits whenever there is a page fault that requires synchronization between the two page tables. To avoid this cost, both Intel and AMD provide hardware assists, EPT (extended page table) and NPT (nested page table), to facilitate address translation. With the hardware assists, the MMU (Memory Management Unit) maintains an ordinary guest page table that translates virtual addresses to guest physical addresses. In addition, the extended page table as provided by EPT translates from guest physical addresses to host physical or machine addresses. NPT works in a similar style. With EPT or NPT, a guest page fault can be handled by the guest itself without triggering VM exits. However, the hardware assists do have their disadvantage compared to the conventional shadow mechanism -- the page walk yields more memory accesses and thus longer latency. Our experimental results show that neither hardware-assisted paging (HAP) nor shadow paging (SP) can be a definite winner. Despite the fact that in over half of the cases, there is no noticeable gap between the two mechanisms, an up to 34% performance gap exists for a few benchmarks. We propose a dynamic switching mechanism that monitors TLB misses and guest page faults on the fly, and dynam-ically switches between the two paging modes. Our experiments show that this new mechanism can match and, sometimes, even beat the better performance of HAP and SP.

...read moreread less

Proceedings Article•DOI•

Fast restore of checkpointed memory using working set estimation

[...]

Irene Zhang¹, Alex Garthwaite¹, Yury Baskakov¹, Kenneth C. Barr¹•Institutions (1)

VMware¹

09 Mar 2011

TL;DR: The time-to-responsiveness metric is introduced, which better characterizes user experience while restoring a saved VM by measuring the time until there is no longer a noticeable performance impact on the restoring VM.

...read moreread less

Abstract: In order to make save and restore features practical, saved virtual machines (VMs) must be able to quickly restore to normal operation. Unfortunately, fetching a saved memory image from persistent storage can be slow, especially as VMs grow in memory size. One possible solution for reducing this time is to lazily restore memory after the VM starts. However, accesses to unrestored memory after the VM starts can degrade performance, sometimes rendering the VM unusable for even longer. Existing performance metrics do not account for performance degradation after the VM starts, making it difficult to compare lazily restoring memory against other approaches. In this paper, we propose both a better metric for evaluating the performance of different restore techniques and a better scheme for restoring saved VMs.Existing performance metrics do not reflect what is really important to the user -- the time until the VM returns to normal operation. We introduce the time-to-responsiveness metric, which better characterizes user experience while restoring a saved VM by measuring the time until there is no longer a noticeable performance impact on the restoring VM. We propose a new lazy restore technique, called working set restore, that minimizes performance degradation after the VM starts by prefetching the working set. We also introduce a novel working set estimator based on memory tracing that we use to test working set restore, along with an estimator that uses access-bit scanning. We show that working set restore can improve the performance of restoring a saved VM by more than 89% for some workloads.

...read moreread less

Proceedings Article•DOI•

Perfctr-Xen: a framework for performance counter virtualization

[...]

Ruslan Nikolaev¹, Godmar Back¹•Institutions (1)

Virginia Tech¹

09 Mar 2011

TL;DR: The challenges of performance monitoring inherent to virtualized environments are discussed and a technique to virtualize access to low-level performance counters on a per-thread basis is introduced, implemented in perfctr-xen, a framework for the Xen hypervisor that provides an infrastructure for higher-level profilers.

...read moreread less

Abstract: Virtualization is a powerful technique used for variety of application domains, including emerging cloud environments that provide access to virtual machines as a service. Because of the interaction of virtual machines with multiple underlying software and hardware layers, the analysis of the performance of applications running in virtualized environments has been difficult. Moreover, performance analysis tools commonly used in native environments were not available in virtualized environments, a gap which our work closes.This paper discusses the challenges of performance monitoring inherent to virtualized environments and introduces a technique to virtualize access to low-level performance counters on a per-thread basis. The technique was implemented in perfctr-xen, a framework for the Xen hypervisor that provides an infrastructure for higher-level profilers. This framework supports both accumulative event counts and interrupt-driven event sampling. It is light-weight, providing direct user mode access to logical counter values. perfctr-xen supports multiple modes of virtualization, including paravirtualization and hardware-assisted virtualization. perfctr-xen applies guest kernel-hypervisor coordination techniques to reduce virtualization overhead. We present experimental results based on microbenchmarks and SPEC CPU2006 macrobenchmarks that show the accuracy and usability of the obtained measurements when compared to native execution.

...read moreread less

Proceedings Article•DOI•

Rethink the virtual machine template

[...]

Kun Wang¹, Jia Rao¹, Cheng-Zhong Xu¹•Institutions (1)

Wayne State University¹

09 Mar 2011

TL;DR: The abstraction leverages an arrange of techniques, including VM miniaturization, generalization, clone and migration, storage copy-on-write, and on-the-fly resource configuration, for rapid deployment of VMs and VM clusters on demand.

...read moreread less

Abstract: Server virtualization technology facilitates the creation of an elastic computing infrastructure on demand. There are cloud applications like server-based computing and virtual desktop that concern startup latency and require impromptu requests for VM creation in a real-time manner. Conventional template-based VM creation is a time consuming process and lacks flexibility for the deployment of stateful VMs. In this paper, we present an abstraction of VM substrate to represent generic VM instances in miniature. Unlike templates that are stored as an image file in disk, VM substrates are docked in memory in a designated VM pool. They can be activated into stateful VMs without machine booting and application initialization. The abstraction leverages an arrange of techniques, including VM miniaturization, generalization, clone and migration, storage copy-on-write, and on-the-fly resource configuration, for rapid deployment of VMs and VM clusters on demand. We implement a prototype on a Xen platform and show that a server with typical configuration of TB disk and GB memory can accommodate more substrates in memory than templates in disk and stateful VMs can be created from the same or different substrates and deployed on to the same or different physical hosts in a cluster without causing any configuration conflicts. Experimental results show that general purpose VMs or a VM cluster for parallel computing can be deployed in a few seconds. We demonstrate the usage of VM substrates in a mobile gaming application.

...read moreread less

Proceedings Article•DOI•

ReHype: enabling VM survival across hypervisor failures

[...]

Michael Le¹, Yuval Tamir¹•Institutions (1)

University of California, Los Angeles¹

09 Mar 2011

TL;DR: The experimental results show that the ReHype prototype can successfully recover from over 90% of detected hypervisor failures, and the implementation was done incrementally, using results from fault injection experiments to identify the sources of dangerous state corruption and inconsistencies.

...read moreread less

Abstract: With existing virtualized systems, hypervisor failures lead to overall system failure and the loss of all the work in progress of virtual machines (VMs) running on the system. We introduce ReHype, a mechanism for recovery from hypervisor failures by booting a new instance of the hypervisor while preserving the state of running VMs. VMs are stalled during the hypervisor reboot and resume normal execution once the new hypervisor instance is running. Hypervisor failures can lead to arbitrary state corruption and inconsistencies throughout the system. ReHype deals with the challenge of protecting the recovered hypervisor instance from such corrupted state and resolving inconsistencies between different parts of hypervisor state as well as between the hypervisor and VMs and between the hypervisor and the hardware. We have implemented ReHype for the Xen hypervisor. The implementation was done incrementally, using results from fault injection experiments to identify the sources of dangerous state corruption and inconsistencies. The implementation of ReHype involved only 880 LOC added or modified in Xen. The memory space overhead of ReHype is only 2.1MB for a pristine copy of the hypervisor code and static data plus a small reserved memory area. The fault injection campaigns used to evaluate the effectiveness of ReHype involved a system with multiple VMs running I/O and hypercall-intensive benchmarks. Our experimental results show that the ReHype prototype can successfully recover from over 90% of detected hypervisor failures.

...read moreread less

Proceedings Article•DOI•

SymCall: symbiotic virtualization through VMM-to-guest upcalls

[...]

John R. Lange¹, Peter A. Dinda²•Institutions (2)

University of Pittsburgh¹, Northwestern University²

09 Mar 2011

TL;DR: The design and implementation of the SymCall symbiotic virtualization interface is described in the authors' publicly available Palacios VMM for modern x86 machines and the implementation of SwapBypass, a VMM service based on SymCall that reconsiders swap decisions made by a symbiotic Linux guest are described.

...read moreread less

Abstract: Symbiotic virtualization is a new approach to system virtualization in which a guest OS targets the native hardware interface as in full system virtualization, but also optionally exposes a software interface that can be used by a VMM, if present, to increase performance and functionality. Neither the VMM nor the OS needs to support the symbiotic virtualization interface to function together, but if both do, both benefit. We describe the design and implementation of the SymCall symbiotic virtualization interface in our publicly available Palacios VMM for modern x86 machines. SymCall makes it possible for Palacios to make clean synchronous upcalls into a symbiotic guest, much like system calls. One use of symcalls is to allow synchronous collection of semantically rich guest data during exit handling in order to enable new VMM features. We describe the implementation of SwapBypass, a VMM service based on SymCall that reconsiders swap decisions made by a symbiotic Linux guest. Finally, we present a detailed performance evaluation of both SwapBypass and SymCall.

...read moreread less

Proceedings Article•DOI•

Hybrid binary rewriting for memory access instrumentation

[...]

Amitabha Roy¹, Steven Hand¹, Tim Harris²•Institutions (2)

University of Cambridge¹, Microsoft²

09 Mar 2011

TL;DR: Hybrid binary rewriting is proposed, which aims to automatically instrument all shared memory accesses in critical sections of x86 binaries, while achieving overhead close to that obtained when performing manual instrumentation at the source code level.

...read moreread less

Abstract: Memory access instrumentation is fundamental to many applications such as software transactional memory systems, profiling tools and race detectors. We examine the problem of efficiently instrumenting memory accesses in x86 machine code to support software transactional memory and profiling. We aim to automatically instrument all shared memory accesses in critical sections of x86 binaries, while achieving overhead close to that obtained when performing manual instrumentation at the source code level.The two primary options in building such an instrumentation system are static and dynamic binary rewriting: the former instruments binaries at link time before execution, while the latter binary rewriting instruments binaries at runtime. Static binary rewriting offers extremely low overhead but is hampered by the limits of static analysis. Dynamic binary rewriting is able to use runtime information but typically incurs higher overhead. This paper proposes an alternative: hybrid binary rewriting. Hybrid binary rewriting is built around the idea of a persistent instrumentation cache (PIC) that is associated with a binary and contains instrumented code from it. It supports two execution modes when using instrumentation: active and passive modes. In the active execution mode, a dynamic binary rewriting engine (PIN) is used to intercept execution, and generate instrumentation into the PIC, which is an on-disk file. This execution mode can take full advantage of runtime information. Later, passive execution can be used where instrumented code is executed out of the PIC. This allows us to attain overheads similar to those incurred with static binary rewriting.This instrumentation methodology enables a variety of static and dynamic techniques to be applied. For example, in passive mode, execution occurs directly from the original executable save for regions that require instrumentation. This has allowed us to build a low-overhead transactional memory profiler. We also demonstrate how we can use the combination of static and dynamic techniques to eliminate instrumentation for accesses to locations that are thread-private.

...read moreread less

Proceedings Article•DOI•

Patch auditing in infrastructure as a service clouds

[...]

Lionel Litty¹, David Lie²•Institutions (2)

VMware¹, University of Toronto²

09 Mar 2011

TL;DR: P2 is proposed, a hypervisor-based patch audit solution that audits VMs and detects the execution of unpatched binary and non-binary files in an accurate, continuous and OSagnostic manner and implements a novel algorithm that identifies binaries in mid-execution.

...read moreread less

Abstract: A basic requirement of a secure computer system is that it be up to date with regard to software security patches. Unfortunately, Infrastructure as a Service (IaaS) clouds make this difficult. They leverage virtualization, which provides functionality that causes traditional security patch update systems to fail. In addition, the diversity of operating systems and the distributed nature of administration in the cloud compound the problem of identifying unpatched machines.In this work, we propose P2, a hypervisor-based patch audit solution. P2 audits VMs and detects the execution of unpatched binary and non-binary files in an accurate, continuous and OSagnostic manner. Two key innovations make P2 possible. First, P2 uses efficient information flow tracking to identify the use of unpatched non-binary files in a vulnerable way.We performed a patch survey and discover that 64% of files modified by security updates do not contain binary code, making the audit of non-binary files crucial. Second, P2 implements a novel algorithm that identifies binaries in mid-execution to allow handling of VMs resumed from a checkpoint or migrated into the cloud. We have implemented a prototype of P2 and and our experiments show that it accurately reports the execution of unpatched code while imposing performance overhead of 4%.

...read moreread less

Proceedings Article•DOI•

Fast and correct performance recovery of operating systems using a virtual machine monitor

[...]

Kenichi Kourai¹•Institutions (1)

Kyushu Institute of Technology¹

09 Mar 2011

TL;DR: The experimental results showed that the warm-cache reboot decreased performance degradation just after the reboot, and it was confirmed that the file cache corrupted by faults was not reused and the overheads for maintaining cache consistency were not usually large.

...read moreread less

Abstract: Rebooting an operating system is a final but effective recovery technique. However, the system performance largely degrades just after the reboot due to the page cache being lost in the main memory. For fast performance recovery, we propose a new reboot mechanism called the warm-cache reboot. The warm-cache reboot preserves the page cache during the reboot and enables an operating system to restore it after the reboot, with the help of a virtual machine monitor (VMM). To perform correct recovery, the VMM guarantees that the reused page cache is consistent with the corresponding files on disks. We have implemented the warm-cache reboot mechanism in the Xen VMM and the Linux operating system. Our experimental results showed that the warm-cache reboot decreased performance degradation just after the reboot. In addition, we confirmed that the file cache corrupted by faults was not reused. The overheads for maintaining cache consistency were not usually large.

...read moreread less

Proceedings Article•DOI•

Virtualization in the age of heterogeneous machines

[...]

David F. Bacon¹•Institutions (1)

IBM¹

09 Mar 2011

TL;DR: It is contended that multi-core chips are a fatally flawed approach - instead, maximum performance will be achieved by using heterogeneous chips and systems that combine customized and customizable computational substrates that achieve very high performance by closely matching the computational and communications structures of the application at hand.

...read moreread less

Abstract: Since their invention over 40 years ago, virtual machines have been used to virtualize one or more von Neumann processors and their associated peripherals. System virtual machines provide the illusion that the user has their own instance of a physical machine with a given instruction set architecture (ISA). Process virtual machines provide the illusion of running on a synthetic architecture independent of the underlying ISA, generally for the purpose of supporting a high-level language.To continue the historical trend of exponential increase in computational power in the face of limits on clock frequency scaling, we must find ways to harness the inherent parallelism of billions of transistors. I contend that multi-core chips are a fatally flawed approach - instead, maximum performance will be achieved by using heterogeneous chips and systems that combine customized and customizable computational substrates that achieve very high performance by closely matching the computational and communications structures of the application at hand. Such chips might look like a mashup of a conventional multicore, a GPU, an FPGA, some ASICs, and a DSP. But programming them with current technologies would be nightmarishly complex, portability would be lost, and innovation between chip generations would be severely limited.The answer (of course) is virtualization, and at both the device level and the language level. In this talk I will illustrate some challenges and potential solutions in the context of IBM's Liquid Metal project, in which we are designing a new high-level language (Lime) and compiler/runtime technology to virtualize the underlying computational devices by providing a uniform semantic model. I will also discuss problems (and opportunities) that this raises at the operating system and data center levels, particularly with computational elements like FPGAs for which "context switching" is currently either extremely expensive or simply impossible.

...read moreread less

Proceedings Article•

Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments

[...]

Erez Petrank¹, Doug Lea²•Institutions (2)

Technion – Israel Institute of Technology¹, State University of New York at Oswego²

09 Mar 2011

TL;DR: The 7th ACM SIGPLAN/SIGOPS Conference on Virtual Execution Environments (VEE'11) as discussed by the authors is the leading conference for presentation of research results on all aspects of virtualization, bringing together researchers representing a diverse set of interests.

...read moreread less

Abstract: It is our pleasure to welcome you to the 7th ACM SIGPLAN/SIGOPS Conference on Virtual Execution Environments (VEE'11). As the leading conference for presentation of research results on all aspects of virtualization, VEE brings together researchers representing a diverse set of interests. This year, we received 84 abstracts, 68 full submissions, and selected 20 papers for presentation at the conference. In selecting papers, the program committee placed high priority on work that is broadly informative and applicable to both researchers and practitioners. We are confident these papers will make for an interesting conference and a valuable contribution to the study and practice of virtualization. Additionally, the program includes a keynote presentation by David Bacon on virtualizing new forms of devices such as FPGAs. VEE'11 is again co-located with the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). Our authors, program committee, sponsors, and supporters all span the boundaries between operating systems and programming language implementation, and reflect equally strong academic and industrial interests in the field.

...read moreread less