Author

Robert Bradford

Other affiliations: Deutsche Telekom
Bio: Robert Bradford is an academic researcher from Technical University of Berlin. The author has contributed to research in topics: Virtual machine & Web server. The author has an h-index of 2 and has co-authored 3 publications receiving 681 citations. Previous affiliations of Robert Bradford include Deutsche Telekom.

Papers
Proceedings ArticleDOI
13 Jun 2007
TL;DR: By combining a block-level solution with pre-copying and write throttling, it is shown that an entire running web server can be transferred, including its local persistent state, with minimal disruption.
Abstract: So far virtual machine (VM) migration has focused on transferring the run-time memory state of the VMs in local area networks (LAN). However, for wide-area network (WAN) migration it is crucial to transfer not just the VM's image but also its local persistent state (its file system) and its ongoing network connections. In this paper we address both: by combining a block-level solution with pre-copying and write throttling we show that we can transfer an entire running web server, including its local persistent state, with minimal disruption (three seconds in the LAN and 68 seconds in the WAN); by combining dynDNS with tunneling, existing connections can continue transparently while new ones are redirected to the new network location. Thus we show experimentally that by combining well-known techniques in a novel manner we can provide system support for migrating virtual execution environments in the wide area.
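
As a rough illustration of the block-level pre-copy and write-throttling idea described in this abstract, the following Python sketch simulates iterative copying of a disk while a guest keeps writing. It is a toy model under stated assumptions, not the authors' implementation: the names `migrate_disk` and `make_workload`, the round limit, the stop threshold, and the rule of cutting the write rate to a quarter once throttling starts are all hypothetical.

```python
import random


def make_workload(num_blocks, writes_per_round, seed=0):
    """Return a hypothetical guest workload that writes random blocks each round."""
    rng = random.Random(seed)
    counter = 0

    def workload(throttled):
        nonlocal counter
        # Assumption: throttling cuts the guest's write rate to a quarter.
        n = writes_per_round // 4 if throttled else writes_per_round
        writes = []
        for _ in range(n):
            counter += 1
            writes.append((rng.randrange(num_blocks), counter))
        return writes

    return workload


def migrate_disk(source, workload, max_rounds=10, stop_threshold=2, throttle_after=3):
    """Simulate iterative pre-copy of a block device with dirty-block tracking.

    source is a list of block contents that the workload mutates in place.
    Returns (destination blocks, rounds run, blocks sent during the final pause).
    """
    dest = [None] * len(source)
    dirty = set(range(len(source)))  # the first round copies every block
    rounds = 0
    throttled = False
    while rounds < max_rounds and len(dirty) > stop_threshold:
        to_send, dirty = dirty, set()
        for i in to_send:
            dest[i] = source[i]
        # The guest keeps running, so its writes dirty blocks again.
        for idx, value in workload(throttled):
            source[idx] = value
            dirty.add(idx)
        rounds += 1
        if rounds >= throttle_after:
            throttled = True  # slow the writer so the dirty set can shrink
    # Stop-and-copy: the guest is paused and the remaining dirty blocks are sent.
    final_dirty = len(dirty)
    for i in dirty:
        dest[i] = source[i]
    return dest, rounds, final_dirty


if __name__ == "__main__":
    disk = [0] * 64
    copy, rounds, final_dirty = migrate_disk(disk, make_workload(64, 16))
    print(copy == disk, rounds, final_dirty)
```

In this model the size of the final dirty set stands in for the disruption window: the fewer blocks left when the guest is paused, the shorter the pause.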

469 citations

Patent
20 Feb 2008
TL;DR: The patent describes a method for transferring the storage data of a virtual machine being migrated from a first host device to a second host device via a communication network: the virtual machine runs on the first host device, which stores the disk image used by the virtual machine on a local storage device; changes made to the disk image are detected while the virtual machine is running; and the disk image and the detected changes are transferred to the second host device.
Abstract: Method for transferring storage data of a virtual machine to be migrated from a first host device to a second host device via a communication network, including: running the virtual machine on the first host device; storing, on a local storage device of the first host device, a disk image used by the virtual machine; detecting, while the virtual machine is running on the first host device, any changes made to the disk image used by the virtual machine; establishing a connection over the communication network from the first host device to the second host device; transferring, to the second host device while the virtual machine is running on the first host device, the disk image used by the virtual machine and any detected changes; modifying the disk image transferred to the second host device in response to the detected changes transferred to the second host device; and starting, using the modified disk image, a migrated virtual machine on the second host device at a current state of the virtual machine running on the first host device.

221 citations


Cited by
Proceedings Article
16 Apr 2008
TL;DR: Remus is a high availability service that protects existing, unmodified software from the failure of the physical machine on which it runs by encapsulating the protected software in a virtual machine and asynchronously propagating changed state to a backup host at frequencies as high as forty times a second.
Abstract: Allowing applications to survive hardware failure is an expensive undertaking, which generally involves reengineering software to include complicated recovery logic as well as deploying special-purpose hardware; this represents a severe barrier to improving the dependability of large or legacy applications. We describe the construction of a general and transparent high availability service that allows existing, unmodified software to be protected from the failure of the physical machine on which it runs. Remus provides an extremely high degree of fault tolerance, to the point that a running system can transparently continue execution on an alternate physical host in the face of failure with only seconds of downtime, while completely preserving host state such as active network connections. Our approach encapsulates protected software in a virtual machine, asynchronously propagates changed state to a backup host at frequencies as high as forty times a second, and uses speculative execution to concurrently run the active VM slightly ahead of the replicated system state.

715 citations

Proceedings ArticleDOI
11 Mar 2009
TL;DR: The design, implementation, and evaluation of post-copy based live migration for virtual machines (VMs) across a Gigabit LAN are presented and improvements in several migration metrics including pages transferred, total migration time and network overhead are shown.
Abstract: We present the design, implementation, and evaluation of post-copy based live migration for virtual machines (VMs) across a Gigabit LAN. Live migration is an indispensable feature in today's virtualization technologies. Post-copy migration defers the transfer of a VM's memory contents until after its processor state has been sent to the target host. This deferral is in contrast to the traditional pre-copy approach, which first copies the memory state over multiple iterations followed by a final transfer of the processor state. The post-copy strategy can provide a "win-win" by reducing total migration time closer to its equivalent time achieved by non-live VM migration. This is done while maintaining the liveness benefits of the pre-copy approach. We compare post-copy extensively against the traditional pre-copy approach on top of the Xen Hypervisor. Using a range of VM workloads we show improvements in several migration metrics including pages transferred, total migration time and network overhead. We facilitate the use of post-copy with adaptive pre-paging in order to eliminate all duplicate page transmissions. Our implementation is able to reduce the number of network-bound page faults to within 21% of the VM's working set for large workloads. Finally, we eliminate the transfer of free memory pages in both migration schemes through a dynamic self-ballooning (DSB) mechanism. DSB periodically releases free pages in a guest VM back to the hypervisor and significantly speeds up migration with negligible performance degradation.
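
The contrast between pre-copy and post-copy described in this abstract can be sketched with a small Python simulation. It is only an illustration under stated assumptions: the function names are hypothetical, and the fixed-size window around each faulting page is a simple stand-in for pre-paging, not the adaptive strategy evaluated in the paper.

```python
def precopy_pages_sent(num_pages, dirtied_per_round, rounds):
    """Pre-copy: send every page once, then resend the pages dirtied in each round."""
    sent = num_pages
    for _ in range(rounds):
        sent += min(dirtied_per_round, num_pages)
    return sent


def postcopy(num_pages, access_trace, prepage_window=0):
    """Post-copy: the VM already runs on the target and pulls missing pages on demand.

    Each network fault fetches the faulting page plus up to prepage_window
    neighbours on either side (a fixed-window stand-in for pre-paging).
    Returns (network faults, total pages transferred).
    """
    resident = set()
    faults = 0
    transferred = 0
    for page in access_trace:
        if page in resident:
            continue
        faults += 1
        lo = max(0, page - prepage_window)
        hi = min(num_pages - 1, page + prepage_window)
        for p in range(lo, hi + 1):
            if p not in resident:
                resident.add(p)
                transferred += 1
    # Pages never touched by the trace are pushed in the background, once each.
    transferred += num_pages - len(resident)
    return faults, transferred


if __name__ == "__main__":
    trace = list(range(100))
    print(precopy_pages_sent(100, 30, 3))
    print(postcopy(100, trace, prepage_window=0))
    print(postcopy(100, trace, prepage_window=4))
```

In this toy model post-copy never sends a page twice, while pre-copy resends whatever was dirtied between rounds; widening the window lowers the number of network faults without changing the total number of pages transferred.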

454 citations

Patent
15 Dec 2009
TL;DR: The patent describes a method to communicatively couple virtual private networks to virtual machines within distributive computing networks by specifying an address space within the router associated with at least one of the virtual machine or the virtual private network.
Abstract: Methods and apparatus to communicatively couple virtual private networks to virtual machines within distributive computing networks are disclosed. A disclosed example method includes receiving a request to provision a virtual machine from a virtual private network, determining a host for the virtual machine within a distributive computing network, creating the virtual machine within the host, communicatively coupling the virtual machine to a virtual local area network switch within the distributive computing network, configuring a portion of a router to be communicatively coupled to the virtual machine via the virtual local area network switch by specifying an address space within the router associated with at least one of the virtual machine or the virtual private network communicatively coupled to the router, and communicatively coupling the portion of the router to the virtual private network.

373 citations

Journal ArticleDOI
TL;DR: Using a range of VM workloads, post-copy improves several metrics including pages transferred, total migration time, and network overhead and is facilitated with adaptive prepaging techniques to minimize the number of page faults across the network.
Abstract: We present the design, implementation, and evaluation of post-copy based live migration for virtual machines (VMs) across a Gigabit LAN. Post-copy migration defers the transfer of a VM's memory contents until after its processor state has been sent to the target host. This deferral is in contrast to the traditional pre-copy approach, which first copies the memory state over multiple iterations followed by a final transfer of the processor state. The post-copy strategy can provide a "win-win" by reducing total migration time while maintaining the liveness of the VM during migration. We compare post-copy extensively against the traditional pre-copy approach on the Xen Hypervisor. Using a range of VM workloads we show that post-copy improves several metrics including pages transferred, total migration time, and network overhead. We facilitate the use of post-copy with adaptive prepaging techniques to minimize the number of page faults across the network. We propose different prepaging strategies and quantitatively compare their effectiveness in reducing network-bound page faults. Finally, we eliminate the transfer of free memory pages in both pre-copy and post-copy through a dynamic self-ballooning (DSB) mechanism. DSB periodically reclaims free pages from a VM and significantly speeds up migration with negligible performance impact on VM workload.

358 citations

Journal ArticleDOI
TL;DR: The way LM and DR are currently being performed and their operation in long-distance networking environments are presented, discussing related issues and bottlenecks and surveying other works.
Abstract: We study the virtual machine live migration (LM) and disaster recovery (DR) from a networking perspective, considering long-distance networks, for example, between data centers. These networks are usually constrained by limited available bandwidth, increased latency and congestion, or high cost of use when dedicated network resources are used, while their exact characteristics cannot be controlled. LM and DR present several challenges due to the large amounts of data that need to be transferred over long-distance networks, which increase with the number of migrated or protected resources. In this context, our work presents the way LM and DR are currently being performed and their operation in long-distance networking environments, discussing related issues and bottlenecks and surveying other works. We also present the way networks are evolving today and the new technologies and protocols (e.g., software-defined networking, or SDN, and flexible optical networks) that can be used to boost the efficiency of LM and DR over long distances. Traffic redirection in a long-distance environment is also an important part of the whole equation, since it directly affects the transparency of LM and DR. Related works and solutions both from academia and the industry are presented.

331 citations