
Showing papers in "Operating Systems Review in 2010"


Journal ArticleDOI
TL;DR: Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure.
Abstract: Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure. Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different data centers). At this scale, small and large components fail continuously. The way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. While in many ways Cassandra resembles a database and shares many design and implementation strategies therewith, Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format. The Cassandra system was designed to run on cheap commodity hardware and handle high write throughput while not sacrificing read efficiency.
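
The "simple data model" described above is a partitioned, multi-dimensional map rather than a relational schema. The sketch below (plain Python, not Cassandra's actual API; all names are illustrative) shows the general shape: a column family maps row keys to dynamically named columns, with per-column timestamps used for last-write-wins reconciliation.

```python
import time
from collections import defaultdict

class ColumnFamilyStore:
    """Toy in-memory model of a column-family layout:
    column family -> row key -> {column name: (value, timestamp)}."""

    def __init__(self):
        self.data = defaultdict(lambda: defaultdict(dict))

    def insert(self, cf, row_key, column, value):
        # Last-write-wins per column, resolved by timestamp.
        ts = time.time()
        current = self.data[cf][row_key].get(column)
        if current is None or current[1] <= ts:
            self.data[cf][row_key][column] = (value, ts)

    def get_slice(self, cf, row_key, start=None, finish=None):
        # Columns are dynamically named per row; clients read sorted ranges,
        # which is what gives them control over data layout and format.
        row = self.data[cf].get(row_key, {})
        names = sorted(n for n in row
                       if (start is None or n >= start) and (finish is None or n <= finish))
        return [(n, row[n][0]) for n in names]

store = ColumnFamilyStore()
store.insert("UserActivity", "alice", "2010-03-01:login", "ok")
store.insert("UserActivity", "alice", "2010-03-02:login", "ok")
print(store.get_slice("UserActivity", "alice", start="2010-03-01", finish="2010-03-31"))
```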

2,870 citations


Journal ArticleDOI
TL;DR: An overview of the components and capabilities of the Akamai platform is given, and some insight into its architecture, design principles, operation, and management is offered.
Abstract: Comprising more than 61,000 servers located across nearly 1,000 networks in 70 countries worldwide, the Akamai platform delivers hundreds of billions of Internet interactions daily, helping thousands of enterprises boost the performance and reliability of their Internet applications. In this paper, we give an overview of the components and capabilities of this large-scale distributed computing platform, and offer some insight into its architecture, design principles, operation, and management.

769 citations


Journal ArticleDOI
TL;DR: This paper argues for a new approach to datacenter storage called RAMCloud, where information is kept entirely in DRAM and large-scale systems are created by aggregating the main memories of thousands of commodity servers.
Abstract: Disk-oriented approaches to online storage are becoming increasingly problematic: they do not scale gracefully to meet the needs of large-scale Web applications, and improvements in disk capacity have far outstripped improvements in access latency and bandwidth. This paper argues for a new approach to datacenter storage called RAMCloud, where information is kept entirely in DRAM and large-scale systems are created by aggregating the main memories of thousands of commodity servers. We believe that RAMClouds can provide durable and available storage with 100-1000x the throughput of disk-based systems and 100-1000x lower access latency. The combination of low latency and large scale will enable a new breed of data-intensive applications.
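
The latency gap behind this argument can be checked with rough, assumed figures (not measurements from the paper): even after adding a datacenter network round trip, DRAM on a remote server is several orders of magnitude faster to reach than a disk seek.

```python
# Back-of-envelope numbers behind the RAMCloud argument (assumed, illustrative).
disk_random_access_s = 5e-3     # ~5 ms for a disk seek plus rotation
dram_access_s        = 100e-9   # ~100 ns local DRAM access
network_rtt_s        = 5e-6     # assumed round trip to a remote server in the datacenter

remote_dram_s = network_rtt_s + dram_access_s
print(f"disk / remote-DRAM latency: {disk_random_access_s / remote_dram_s:,.0f}x")

# Aggregate capacity from pooling commodity servers' memories.
servers, dram_per_server_gb = 2000, 64
print(f"aggregate DRAM across the cluster: {servers * dram_per_server_gb / 1024:.0f} TB")
```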

558 citations


Journal ArticleDOI
TL;DR: To the authors' knowledge, Cloud9 is the first symbolic execution engine that scales to large clusters of machines, thus enabling thorough automated testing of real software in conveniently short amounts of time.
Abstract: Cloud9 aims to reduce the resource-intensive and labor-intensive nature of high-quality software testing. First, Cloud9 parallelizes symbolic execution (an effective, but still poorly scalable test automation technique) to large shared-nothing clusters. To our knowledge, Cloud9 is the first symbolic execution engine that scales to large clusters of machines, thus enabling thorough automated testing of real software in conveniently short amounts of time. Preliminary results indicate one to two orders of magnitude speedup over a state-of-the-art symbolic execution engine. Second, Cloud9 is an on-demand software testing service: it runs on compute clouds, like Amazon EC2, and scales its use of resources over a wide dynamic range, proportionally with the testing task at hand.

209 citations


Journal ArticleDOI
TL;DR: This paper proposes that the cloud should be made accountable to both the customer and the provider, and outlines the technical requirements for an accountable cloud and describes several challenges that are not yet met by current accountability techniques.
Abstract: For many companies, clouds are becoming an interesting alternative to a dedicated IT infrastructure. However, cloud computing also carries certain risks for both the customer and the cloud provider. The customer places his computation and data on machines he cannot directly control; the provider agrees to run a service whose details he does not know. If something goes wrong - for example, data leaks to a competitor, or the computation returns incorrect results - it can be difficult for customer and provider to determine which of them has caused the problem, and, in the absence of solid evidence, it is nearly impossible for them to hold each other responsible for the problem if a dispute arises. In this paper, we propose that the cloud should be made accountable to both the customer and the provider. Both parties should be able to check whether the cloud is running the service as agreed. If a problem appears, they should be able to determine which of them is responsible, and to prove the presence of the problem to a third party, such as an arbitrator or a judge. We outline the technical requirements for an accountable cloud, and we describe several challenges that are not yet met by current accountability techniques.

190 citations


Journal ArticleDOI
TL;DR: The use case behind MVP, a novel system architecture for mobile virtualization, and key aspects of both core and platform virtualization on mobile devices are described.
Abstract: The virtualization of mobile devices such as smartphones, tablets, netbooks, and MIDs offers significant potential in addressing the mobile manageability, security, cost, compliance, application development and deployment challenges that exist in the enterprise today. Advances in mobile processor performance, memory and storage capacities have led to the availability of many of the virtualization techniques that have previously been applied in the desktop and server domains. Leveraging these opportunities, VMware's Mobile Virtualization Platform (MVP) makes use of system virtualization to deliver an end-to-end solution for facilitating employee-owned mobile phones in the enterprise. In this paper, we describe the use case behind MVP, and provide an overview of the hypervisor's design and implementation. We present a novel system architecture for mobile virtualization and describe key aspects of both core and platform virtualization on mobile devices.

160 citations


Journal ArticleDOI
Satyam B. Vaghani1
TL;DR: The VMFS architecture and its evolution over the years are presented and changes enable the file system to implement a hardware accelerated data mover and lock manager, among other things.
Abstract: The Virtual Machine File System (VMFS) is a scalable and high performance symmetric clustered file system for hosting virtual machines (VMs) on shared block storage. It implements a clustered locking protocol exclusively using storage links, and does not require network-based inter-node communication between hosts participating in a VMFS cluster. VMFS layout and IO algorithms are optimized towards providing raw device speed IO throughput to VMs. An adaptive IO mechanism masks errors on the physical fabric using contextual information from the fabric. The VMFS lock service forms the basis of VMware's clustered applications such as vMotion, Storage vMotion, Distributed Resource Scheduling, High Availability, and Fault Tolerance. Virtual machine metadata is serialized to files and VMFS provides a POSIX interface for cluster-safe virtual machine management operations. It also contains a pipelined data mover for bulk data initialization and movement. In recent years, VMFS has inspired changes to disk-array firmware and the SCSI protocol. These changes enable the file system to implement a hardware accelerated data mover and lock manager, among other things. In this paper, we present the VMFS architecture and its evolution over the years.

116 citations


Journal ArticleDOI
TL;DR: Four popular consumer cloud storage offerings - Mozy, Carbonite, Dropbox, and CrashPlan are evaluated to determine if they live up to the benefits users expect and derive a set of lessons and recommendations that if followed more uniformly, could substantially improve the cloud storage experience for many consumers.
Abstract: The promise of automatic data backup into the cloud is alluring. Off-site backup offers protection against a whole class of catastrophic risks (fire, flood, etc.) that on-site backup solutions cannot. Data can be backed up into the cloud automatically with little or no user involvement. Incremental backup software detects the latest changes, encrypts the data, and sends it into the cloud. Files can be restored on demand, and some services allow copies of files to be downloaded through a web interface to other machines, providing a form of file sharing. With costs dropping to ∼$60-$100 per year for unlimited storage, it is not surprising that many home and small business users are signing up. In this paper, we evaluate four popular consumer cloud storage offerings - Mozy, Carbonite, Dropbox, and CrashPlan - to determine if they live up to the benefits users expect. We document wide variations in backup and restore performance, the type of data that is backed up, no liability for data loss, and problems with data privacy. From our experiments, we derive a set of lessons and recommendations for consumer cloud storage that, if followed more uniformly, could substantially improve the cloud storage experience for many consumers.

102 citations


Journal ArticleDOI
TL;DR: This work motivates providers to treat provenance as first-class data in the cloud and, based on experience with provenance in a local storage system, suggests a set of requirements that make provenance feasible and attractive.
Abstract: Digital provenance is meta-data that describes the ancestry or history of a digital object. Most work on provenance focuses on how provenance increases the value of data to consumers. However, provenance is also valuable to storage providers. For example, provenance can provide hints on access patterns, detect anomalous behavior, and provide enhanced user search capabilities. As the next-generation storage providers, cloud vendors are in a unique position to capitalize on this opportunity and incorporate provenance as a fundamental storage system primitive. To date, cloud offerings have not yet done so. We provide motivation for providers to treat provenance as first-class data in the cloud and, based on our experience with provenance in a local storage system, suggest a set of requirements that make provenance feasible and attractive.

100 citations


Journal ArticleDOI
TL;DR: An approach to mine intercomponent dependencies from unstructured logs that requires neither additional system instrumentation nor any application-specific knowledge and successfully identifies the dependencies among the distributed system components.
Abstract: Dependencies among system components are crucial to locating root errors in a distributed system. In this paper, we propose an approach to mine intercomponent dependencies from unstructured logs. The technique requires neither additional system instrumentation nor any application-specific knowledge. In the approach, we first parse each log message into its log key and parameters. Then, we find dependent log key pairs belonging to different components by leveraging co-occurrence analysis and parameter correspondence. After that, we use Bayesian decision theory to estimate the dependency direction of each dependent log key pair. We further apply time delay consistency to remove false positive detections. Case studies on Hadoop show that the technique successfully identifies the dependencies among the distributed system components.
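
A minimal sketch of the first two steps, assuming nothing about the paper's actual implementation: a heuristic tokenizer turns each message into a log key plus parameters, and co-occurrence within a time window suggests candidate dependencies between components. The example messages, window size, and component names are made up.

```python
import re
from collections import Counter
from itertools import combinations

def log_key(message):
    """Replace variable tokens (numbers, hex ids, paths) with placeholders so that
    messages produced by the same logging statement share one key. Heuristic only."""
    msg = re.sub(r"0x[0-9a-fA-F]+|\d+", "*", message)
    return re.sub(r"/[\w/.\-]+", "/PATH", msg)

# (timestamp in seconds, component, raw message) -- toy input
events = [
    (10.0, "JobTracker",  "launching task 42 on node-7"),
    (10.2, "TaskTracker", "task 42 started, pid 9911"),
    (31.0, "JobTracker",  "launching task 43 on node-2"),
    (31.3, "TaskTracker", "task 43 started, pid 9954"),
]

window = 5.0          # co-occurrence window in seconds (assumed)
pairs = Counter()
for (t1, c1, m1), (t2, c2, m2) in combinations(events, 2):
    if c1 != c2 and abs(t1 - t2) <= window:
        pairs[(c1, log_key(m1), c2, log_key(m2))] += 1

for (c1, k1, c2, k2), n in pairs.most_common(3):
    print(f"{n}x  {c1}:'{k1}'  ~  {c2}:'{k2}'")
```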

89 citations


Journal ArticleDOI
TL;DR: Read, program, and erase latency were found to align closely with manufacturers' specified "typical" values in almost all cases, while program/erase endurance was found to exceed specified minimum values, often by as much as a factor of 100.
Abstract: Reports of NAND flash device testing in the literature have for the most part been limited to examination of circuit-level parameters on raw flash chips or prototypes, and system-level parameters on entire storage subsystems. However, there has been little examination of system-level parameters of raw devices, such as mean latency and endurance values. We report the results of such tests on a variety of devices. Read, program, and erase latency were found to align closely with manufacturers' specified "typical" values in almost all cases. Program/erase endurance, however, was found to exceed specified minimum values, often by as much as a factor of 100. In addition, significant performance changes were found with wear. These changes may be used to track wear, and in addition have significant implications for system performance over the lifespan of a device. Finally, random write patterns which incur performance penalties on current flash-based memory systems were found to incur no overhead on the devices themselves.

Journal ArticleDOI
TL;DR: An easy-to-use, commercial system that automatically restores redundancy after failure requires many additional components beyond replicated VM execution, and this work has designed and implemented these extra components and addressed many practical issues encountered in supporting VMs running enterprise applications.
Abstract: We have implemented a commercial enterprise-grade system for providing fault-tolerant virtual machines, based on the approach of replicating the execution of a primary virtual machine (VM) via a backup virtual machine on another server. We have designed a complete system in VMware vSphere 4.0 that is easy to use, runs on commodity servers, and typically reduces performance of real applications by less than 10%. In addition, the data bandwidth needed to keep the primary and secondary VM executing in lockstep is less than 20 Mbit/s for several real applications, which allows for the possibility of implementing fault tolerance over longer distances. An easy-to-use, commercial system that automatically restores redundancy after failure requires many additional components beyond replicated VM execution. We have designed and implemented these extra components and addressed many practical issues encountered in supporting VMs running enterprise applications. In this paper, we describe our basic design, discuss alternate design choices and a number of the implementation details, and provide performance results for both micro-benchmarks and real applications.

Journal ArticleDOI
TL;DR: This paper highlights a new trend in the industry to virtualize netsec functions inside security virtual appliances (SVAs), which can then be placed on hosts, and offer distributed security functions for network flows across the cluster.
Abstract: Virtualization of computer workloads onto powerful x86 multicore platforms is leading to a massive transformation in the way services are produced by next generation data centers. Simultaneously, cloud computing principles are compelling a rethink in the way enterprises are beginning to consume such services. In this paper, we present the need for network and security (netsec) functions, which are currently realized in hardware appliances, to significantly evolve to keep pace with these new trends, and to provide "disruptively simplified" security that was not possible earlier. With server consolidation and desktop virtualization, significantly more traffic remains within the data center racks, leading to blind spots for "in network" security appliances. Current netsec devices, which are architected based on "scale up" principles, cannot keep pace with the increased bandwidth driven to the servers and the ever increasing volume of threats at all layers of the network stack. Also, highly mobile workloads and increasing intelligence in the virtual/hypervisor layer make it increasingly hard for static network devices to interlock with dynamic policy changes and on-the-fly re-purposing of resources to serve different workloads, applications, or users. This paper highlights a new trend in the industry to virtualize netsec functions inside security virtual appliances (SVAs), which can then be placed on hosts and offer distributed security functions for network flows across the cluster. We analyze this trend in detail using the VMware vShield product line as an example. The approach replaces single choke-point based physical security devices like firewalls, IP Address Management (IPAM), flow monitoring, and data leakage prevention (DLP) with distributed virtual counterparts running on slices of x86 co-located with compute workloads, with the ability to tap into traffic going in and out of virtual machines (VMs). vShield's distributed scale-out architecture means performance can scale up or down linearly as new SVAs are added, while simplifying the lifecycle management of these SVAs, including installs, upgrades, the ability to debug, and reliability, by leveraging underlying virtualization primitives of VM cloning, deploy from template, and VM high availability and fault tolerance. Interactions with features like live migration (vMotion) of guest VMs and distributed power management of host servers introduce new aspects of appliance management that were not possible in the physical world. The paper analyzes these aspects of SVA management in depth. Our measurements of the security inspection throughput for given vCPUs and memory indicate it is comparable to that of physical counterparts, with the additional flexibility of a scale-out deployment. Further, we demonstrate that with this approach a virtual datacenter (VDC) in the cloud can be deployed in minutes, compared to days or weeks with physical datacenters. Finally, we present the additional security inspections that can be performed in the virtual world that were not possible in the physical world. The ability of SVAs to introspect traffic into and out of VMs implies they can perform checks for MAC spoofing, IP spoofing [6], and ARP filtering at the source.
Furthermore, based on security analysis, if a VM is deemed suspect it can be quickly quarantined. Concepts such as flow introspection, automated insertion of SVAs into flows at VM ingress/egress, a distributed scale-out architecture across a cluster of hosts, encapsulation of secure VDCs, and programmability of security policies via RESTful interfaces represent a significant architectural change, with wide applicability in enterprise data centers and private/public cloud environments.

Journal ArticleDOI
TL;DR: This paper describes how virtualization performance at all of these levels has progressed with advances in software and hardware and discusses some of the challenges and opportunities that lie ahead as the era of cloud computing begins.
Abstract: Performance is a central requirement for the widespread adoption of virtualization. To deliver on the promise of simplifying IT via virtualization, the virtualization platform must provide excellent performance with minimal effort. Virtualization performance encompasses several different dimensions. An application running in a virtual machine must perform on par with the same application running natively. Multiple virtual machines running on the same host must scale well and share resources effectively. In this paper, we describe how virtualization performance at all of these levels has progressed with advances in software and hardware. We then discuss some of the challenges and opportunities that lie ahead as we move into the era of cloud computing.

Journal ArticleDOI
TL;DR: Diff-RAID is presented, a new RAID variant that distributes parity unevenly across SSDs to create age disparities within arrays and provides much greater reliability for SSDs compared to RAID-4 and RAID-5 for the same space overhead, and offers a trade-off curve between throughput and reliability.
Abstract: Deployment of SSDs in enterprise settings is limited by the low erase cycles available on commodity devices. Redundancy solutions such as RAID can potentially be used to protect against the high Bit Error Rate (BER) of aging SSDs. Unfortunately, such solutions wear out redundant devices at similar rates, inducing correlated failures as arrays age in unison. We present Diff-RAID, a new RAID variant that distributes parity unevenly across SSDs to create age disparities within arrays. By doing so, Diff-RAID balances the high BER of old SSDs against the low BER of young SSDs. Diff-RAID provides much greater reliability for SSDs compared to RAID-4 and RAID-5 for the same space overhead, and offers a trade-off curve between throughput and reliability.
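
A toy calculation of the idea, under a simplified workload model (small random writes, each updating one data block plus its stripe's parity block; the parity split shown is illustrative, not from the paper): skewing parity toward one device makes it age faster than its peers, which is exactly the disparity Diff-RAID wants.

```python
def wear_rates(parity_fractions):
    """Relative write load per device under small random writes, assuming each
    logical write updates one data block plus its stripe's parity block."""
    n = len(parity_fractions)
    assert abs(sum(parity_fractions) - 1.0) < 1e-9
    return [p + (1 - p) / (n - 1) for p in parity_fractions]

raid5     = [0.25, 0.25, 0.25, 0.25]   # parity spread evenly: devices age in lockstep
diff_raid = [0.70, 0.10, 0.10, 0.10]   # skewed split (illustrative, not from the paper)

for name, split in [("RAID-5", raid5), ("Diff-RAID", diff_raid)]:
    rates = wear_rates(split)
    print(name, [round(r / min(rates), 2) for r in rates])   # Diff-RAID: one device ages ~2x faster
```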

Journal ArticleDOI
TL;DR: This paper presents a novel concurrent hash table implementation which supports wait-free, near-linearly scalable lookup, even in the presence of concurrent modifications, and uses a new concurrent programming technique known as relativistic programming.
Abstract: This paper presents a novel concurrent hash table implementation which supports wait-free, near-linearly scalable lookup, even in the presence of concurrent modifications. In particular, this hash table implementation supports concurrent moves of hash table elements between buckets, for purposes such as renames. Implementation of this algorithm in the Linux kernel demonstrates its performance and scalability. Benchmarks on a 64-way POWER system showed a 6x scalability improvement versus fine-grained locking, and a 1.5x improvement versus the current state of the art in Linux. To achieve these scalability improvements, the hash table implementation uses a new concurrent programming technique known as relativistic programming. This approach uses a copy-based update strategy to allow readers and writers to run concurrently without conflicts, avoiding many of the non-scalable costs of synchronization, inter-processor communication, and cache coherence. New techniques such as the proposed hash-table move algorithm allow readers to tolerate the resulting weak memory-ordering behavior that arises from allowing one version of a structure to be read concurrently with updates to a different version of the same structure. Relativistic programming techniques provide performance and scalability advantages over traditional synchronization, as demonstrated through several benchmarks.
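
The sketch below shows only the general shape of a copy-based update strategy: readers traverse a published bucket without locking, while a writer builds a new copy and publishes it with a single reference swap. It does not capture the paper's cross-bucket move algorithm or the memory-ordering issues it addresses (and CPython's GIL hides them anyway); it is a minimal illustration, not the kernel implementation.

```python
import threading

class RelativisticMap:
    """Sketch of an RCU-like, copy-based hash table: lock-free readers,
    serialized writers that publish whole new buckets."""

    def __init__(self, nbuckets=8):
        self._nbuckets = nbuckets
        self._buckets = [()] * nbuckets        # each bucket is an immutable tuple of (key, value)
        self._writer_lock = threading.Lock()   # writers are serialized; readers never lock

    def lookup(self, key):
        bucket = self._buckets[hash(key) % self._nbuckets]  # single read of the published bucket
        for k, v in bucket:
            if k == key:
                return v
        return None

    def insert(self, key, value):
        with self._writer_lock:
            i = hash(key) % self._nbuckets
            old = self._buckets[i]
            new = tuple((k, v) for k, v in old if k != key) + ((key, value),)
            self._buckets[i] = new             # publish: readers see old or new, never a mix

m = RelativisticMap()
m.insert("vm-42", "host-a")
print(m.lookup("vm-42"))
```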

Journal ArticleDOI
TL;DR: This paper identifies where existing Distributed Transactional Memory platforms still fail to meet the requirements of the cloud and of its users, and points out several open research problems whose solution is deemed essential to materialize the Cloud-TM vision.
Abstract: One of the main challenges to harness the potential of Cloud computing is the design of programming models that simplify the development of large-scale parallel applications and that allow ordinary programmers to take full advantage of the computing power and the storage provided by the Cloud, both of which are made available, on demand, in a pay-only-for-what-you-use pricing model. In this paper, we discuss the use of the Transactional Memory programming model in the context of the cloud computing paradigm, which we refer to as Cloud-TM. We identify where existing Distributed Transactional Memory platforms still fail to meet the requirements of the cloud and of its users, and we point out several open research problems whose solution we deem essential to materialize the Cloud-TM vision.

Journal ArticleDOI
TL;DR: Current DISC systems have repeated an old mistake, focusing on scalability without considering efficiency; there is a pressing need to rethink the design of future data-intensive computing systems and to carefully consider the direction of future research.
Abstract: Current data intensive scalable computing (DISC) systems, although scalable, achieve embarrassingly low rates of processing per node. We feel that current DISC systems have repeated a mistake of old high-performance systems: focusing on scalability without considering efficiency. This poor efficiency comes with issues in reliability, energy, and cost. As the gap between theoretical performance and what is actually achieved has become glaringly large, we feel there is a pressing need to rethink the design of future data-intensive computing and carefully consider the direction of future research.

Journal ArticleDOI
Kunwadee Sripanidkulchai1, Sambit Sahu1, Yaoping Ruan1, Anees Shaikh1, Chitra Dorai1 
TL;DR: The initial findings indicate that while clouds are ready to support usage scenarios for individual users, there are still rich areas of future research to be explored to enable clouds to support large distributed applications such as those found in enterprise.
Abstract: Cloud computing carries the promise of providing powerful new models and abstractions that could transform the way IT services are delivered today. In order to establish the readiness of clouds to deliver meaningful enterprise-class IT services, we identify three key issues that ought to be addressed as first priority from the perspective of potential cloud users: how to deploy large-scale distributed services, how to deliver high availability services, and how to perform problem resolution on the cloud. We analyze multiple sources of publicly available data to establish cloud user expectations and compare against the current state of cloud offerings, with a focus on contrasting the different requirements from two classes of users -- the individual and the enterprise. Through this process, our initial findings indicate that while clouds are ready to support usage scenarios for individual users, there are still rich areas of future research to be explored to enable clouds to support large distributed applications such as those found in enterprise.

Journal ArticleDOI
TL;DR: This work introduces an efficient online technique for estimating the cache occupancies of software threads, and derives an analytical model that considers the impact of set-associativity, line replacement policy, and memory locality effects.
Abstract: Modern chip-level multiprocessors (CMPs) contain multiple processor cores sharing a common last-level cache, memory interconnects, and other hardware resources. Workloads running on separate cores compete for these resources, often resulting in highly variable performance. It is generally desirable to co-schedule workloads that have minimal resource contention, in order to improve both performance and fairness. Unfortunately, commodity processors expose only limited information about the state of shared resources such as caches to the software responsible for scheduling workloads that execute concurrently. To make informed resource-management decisions, it is important to obtain accurate measurements of per-workload cache occupancies and their impact on performance, often summarized by utility functions such as miss-ratio curves (MRCs). In this paper, we first introduce an efficient online technique for estimating the cache occupancy of individual software threads using only commonly-available hardware performance counters. We derive an analytical model as the basis of our occupancy estimation, and extend it for improved accuracy on modern cache configurations, considering the impact of set-associativity, line replacement policy, and memory locality effects. We demonstrate the effectiveness of occupancy estimation with a series of CMP simulations in which SPEC benchmarks execute concurrently on multiple cores. Leveraging our occupancy estimation technique, we also introduce a lightweight approach for online MRC construction, and demonstrate its effectiveness using a prototype implementation in the VMware ESX Server hypervisor. We present a series of experiments involving SPEC benchmarks, comparing the MRCs we construct online with MRCs generated offline in which various cache sizes are enforced via static page coloring.
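
A toy, discrete-time model illustrates the kind of counter-driven occupancy estimation the abstract refers to; the simple linear form below is an assumption for illustration only, whereas the paper's model additionally accounts for set-associativity, replacement policy, and locality. The intuition: each miss by a thread inserts a line, and the evicted line belongs to that thread with probability proportional to its current occupancy.

```python
def update_occupancy(E, self_misses, other_misses, cache_lines):
    """One step of a simple linear occupancy model: every miss inserts a line,
    and the victim belongs to this thread with probability E / cache_lines.
    A toy stand-in for the paper's analytical model, not a reproduction of it."""
    gain = self_misses * (1 - E / cache_lines)
    loss = other_misses * (E / cache_lines)
    return min(float(cache_lines), max(0.0, E + gain - loss))

cache_lines = 8192              # e.g. a 512 KB cache with 64-byte lines
E = 0.0
for _ in range(50):             # periodic samples of per-thread miss counters (fabricated)
    E = update_occupancy(E, self_misses=300, other_misses=900, cache_lines=cache_lines)
print(f"estimated occupancy: {E:.0f} of {cache_lines} lines")
```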

Journal ArticleDOI
TL;DR: This work reviews how the x86 architecture was originally virtualized in the days of the Pentium II (1998), and follows the evolution of the virtual machine monitor forward through the introduction of virtual SMP, 64 bit (x64), and hardware support for virtualization to finish with a contemporary challenge, nested virtualization.
Abstract: Twelve years have passed since VMware engineers first virtualized the x86 architecture. This technological breakthrough kicked off a transformation of an entire industry, and virtualization is now (once again) a thriving business with a wide range of solutions being deployed, developed and proposed. But at the base of it all, the fundamental quest is still the same: running virtual machines as well as we possibly can on top of a virtual machine monitor. We review how the x86 architecture was originally virtualized in the days of the Pentium II (1998), and follow the evolution of the virtual machine monitor forward through the introduction of virtual SMP, 64 bit (x64), and hardware support for virtualization to finish with a contemporary challenge, nested virtualization.

Journal ArticleDOI
TL;DR: The need for high-performance, robust management tools that scale from a few hosts to cloud scale poses interesting challenges for the management software; this paper presents some of the techniques used to address these challenges.
Abstract: Virtualization drives higher resource utilization and makes provisioning new systems very easy and cheap. This combination has led to an ever-increasing number of virtual machines: the largest data centers will likely have more than 100K in a few years, and many deployments will span multiple data centers. Virtual machines are also getting increasingly more capable, consisting of more vCPUs, more memory, and higher-bandwidth virtual I/O devices with a variety of capabilities like bandwidth throttling and traffic mirroring. To reduce the work for IT administrators managing these environments, VMware and other companies provide several monitoring, automation, and policy-driven tools. These tools require a lot of information about various aspects of each VM and other objects in the system, such as physical hosts, storage infrastructure, and networking. To support these tools and the hundreds of simultaneous users who manage the environment, the management software needs to provide secure access to the data in real time with some degree of consistency and backward compatibility, and very high availability under a variety of failures and planned maintenance. Such software must satisfy a continuum of designs: it must perform well at large scale to accommodate the largest datacenters, but it must also accommodate smaller deployments by limiting its resource consumption and overhead according to demand. The need for high-performance, robust management tools that scale from a few hosts to cloud scale poses interesting challenges for the management software. This paper presents some of the techniques we have employed to address these challenges.

Journal ArticleDOI
TL;DR: A vision of a marketplace of clouds is described, what is needed to make this vision a reality is discussed, and what VMware is doing to help enable this marketplace model of cloud computing is described.
Abstract: Cloud computing promises to bring about a fundamental shift in the computer industry where consumers of IT enjoy on-demand access to massive compute capacity and producers of IT benefit from economies of scale and automation. We believe that the advantages of cloud computing will be best realized if there is a highly competitive marketplace. We describe our vision of a marketplace of clouds, discuss what is needed to make this vision a reality, and then describe what VMware is doing to help enable this marketplace model of cloud computing.

Journal ArticleDOI
TL;DR: It is found that the Paxos algorithm is easily translated to declarative logic, in large part because the primitives used in consensus protocol specifications map directly to simple Overlog constructs such as aggregation and selection.
Abstract: The Paxos consensus protocol can be specified concisely, but is notoriously difficult to implement in practice. We recount our experience building Paxos in Overlog, a distributed declarative programming language. We found that the Paxos algorithm is easily translated to declarative logic, in large part because the primitives used in consensus protocol specifications map directly to simple Overlog constructs such as aggregation and selection. We discuss the programming idioms that appear frequently in our implementation, and the applicability of declarative programming to related application domains.
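
The observation that consensus primitives reduce to aggregation and selection can be illustrated outside Overlog. The sketch below (plain Python over relation-like tuples, not Overlog syntax) expresses the proposer's handling of promise messages as a count aggregate (the quorum test) plus a max selection (choosing the value carried by the highest accepted ballot).

```python
# Promises received by a proposer: (acceptor, promised_ballot, accepted_ballot, accepted_value)
promises = [
    ("a1", 7, 5, "x"),
    ("a2", 7, None, None),
    ("a3", 7, 6, "y"),
    ("a4", 6, None, None),
]
ACCEPTORS = 5
ballot = 7

# Aggregation: count promises for our ballot and test for a majority quorum.
quorum = [p for p in promises if p[1] == ballot]
have_quorum = len(quorum) > ACCEPTORS // 2

# Selection: among promises that carry an accepted value, take the one with the
# highest accepted ballot; otherwise the proposer may choose its own value.
carried = [p for p in quorum if p[2] is not None]
value = max(carried, key=lambda p: p[2])[3] if carried else "free-choice"

print(have_quorum, value)   # True, 'y'
```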

Proceedings ArticleDOI
TL;DR: This paper presents detailed models derived from experiments on a blade enclosure system that can be representative of a data center, discusses the optimization opportunities for coordinated power and cooling management and the challenges for controller design, and proposes a few design principles and examples.
Abstract: Independent optimization for workload and power management, and active cooling control, have been studied extensively to improve data center energy efficiency. Recently, proposals have started to advocate unified workload, power, and cooling management for further energy savings. In this paper, we study this problem with the objectives of both saving energy and capping power. We present the detailed models derived in our previous work from experiments on a blade enclosure system that can be representative of a data center, discuss the optimization opportunities for coordinated power and cooling management, and the challenges for controller design. We then propose a few design principles and examples for unified workload management, power minimization, and power capping. Our simulation-based evaluation shows that the controllers can cap the total power consumption while maintaining the thermal conditions and improve the overall energy efficiency. We argue that the same opportunities, challenges, and designs are also generally applicable to data center level management.
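
As a simplified illustration of the power-capping side only (a single proportional knob, whereas the paper's controllers coordinate workload placement, server power states, and cooling), a control loop can scale an allowed performance state toward the point where measured power meets the cap. All numbers below are fabricated.

```python
def power_cap_step(measured_watts, cap_watts, pstate, gain=0.002, pmin=0.2, pmax=1.0):
    """One iteration of a toy proportional controller: lower the allowed
    performance state when over the cap, raise it when under."""
    error = cap_watts - measured_watts
    return max(pmin, min(pmax, pstate + gain * error))

pstate, cap = 1.0, 400.0
for measured in [480, 455, 430, 410, 398, 396]:   # fabricated enclosure power readings (W)
    pstate = power_cap_step(measured, cap, pstate)
    print(f"measured={measured} W -> performance state={pstate:.2f}")
```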

Journal ArticleDOI
TL;DR: It is shown how a single object model which encompasses every aspect of a typical experimentation workflow can be used to completely describe experiments to be run within very different experimentation environments.
Abstract: Evaluating new network protocols, applications, and architectures uses many kinds of experimentation environments: simulators, emulators, testbeds, and sometimes, combinations of these. As the functionality and complexity of these tools increase, mastering and efficiently using each of them is becoming increasingly difficult. In this paper, we consider how to make it easier to use multiple tools separately and together to improve the productivity of network researchers. We show how a single object model which encompasses every aspect of a typical experimentation workflow can be used to completely describe experiments to be run within very different experimentation environments. Although Nepi is still at an early design and prototyping stage, we expect that its ability to describe and easily automate complex mixed experiments will enable further experimentation with heterogeneous networks.
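
A rough sketch of what a backend-neutral experiment object model can look like (class and attribute names here are invented for illustration and do not reflect NEPI's real API): the same description is built once and handed to whichever backend, simulator, emulator, or testbed driver, interprets it.

```python
class Box:
    """One element of the experiment description (node, link, or application)."""
    def __init__(self, kind, **attrs):
        self.kind, self.attrs, self.connections = kind, attrs, []

    def connect(self, other):
        self.connections.append(other)
        other.connections.append(self)

class Experiment:
    def __init__(self):
        self.boxes = []

    def add(self, kind, **attrs):
        box = Box(kind, **attrs)
        self.boxes.append(box)
        return box

    def deploy(self, backend):
        # Each backend (simulator, emulator, testbed driver) interprets the
        # same environment-neutral description in its own terms.
        return backend(self.boxes)

exp = Experiment()
n1, n2 = exp.add("node", os="linux"), exp.add("node", os="linux")
n1.connect(n2)
exp.add("application", command="ping -c 3 10.0.0.2").connect(n1)

def summary_backend(boxes):   # stand-in for a real deployment backend
    return [f"{b.kind} {b.attrs}" for b in boxes]

print(exp.deploy(summary_backend))
```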

Journal ArticleDOI
TL;DR: An incremental algorithm is presented that automatically infers the format of system log files; from the resulting format descriptions, a suite of data processing tools is generated automatically, and the system can handle large-scale data sources whose formats evolve over time.
Abstract: System logs come in a large and evolving variety of formats, many of which are semi-structured and/or non-standard. As a consequence, off-the-shelf tools for processing such logs often do not exist, forcing analysts to develop their own tools, which is costly and time-consuming. In this paper, we present an incremental algorithm that automatically infers the format of system log files. From the resulting format descriptions, we can generate a suite of data processing tools automatically. The system can handle large-scale data sources whose formats evolve over time. Furthermore, it allows analysts to modify inferred descriptions as desired and incorporates those changes in future revisions.
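
A toy version of the incremental idea, using a deliberately tiny type system over whitespace-delimited fields (the actual system infers far richer format descriptions): keep a per-field type guess and widen it whenever a new record no longer fits.

```python
import re

def field_type(token):
    """Classify one token with a deliberately tiny type system (illustrative)."""
    if re.fullmatch(r"-?\d+", token):
        return "int"
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", token):
        return "ip"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}", token):
        return "timestamp"
    return "string"

def refine(current, observed):
    # Incremental step: keep the guess while it still fits; otherwise fall back
    # to the most general type. A real system maintains a richer description.
    if current is None or current == observed:
        return observed
    return "string"

schema = None
for line in [
    "2010-06-01T12:00:01 10.0.0.7 200 GET",
    "2010-06-01T12:00:02 10.0.0.9 404 POST",
    "2010-06-01T12:00:05 proxy-3 200 GET",     # second field forces ip -> string
]:
    tokens = line.split()
    if schema is None:
        schema = [None] * len(tokens)
    schema = [refine(c, field_type(t)) for c, t in zip(schema, tokens)]

print(schema)   # ['timestamp', 'string', 'int', 'string']
```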

Journal ArticleDOI
Jacob Gorm Hansen1, Eric Jul2
TL;DR: The design of Lithium borrows techniques from Byzantine Fault Tolerance, stream processing, and distributed version control software, and demonstrates their practical applicability to the performance-sensitive task of virtual machine storage.
Abstract: In virtualized data centers, storage systems have traditionally been treated as black boxes administered separately from the compute nodes. Direct-attached storage is often left unused, so that VM availability does not depend on individual hosts. Our work aims to integrate storage and compute, addressing the fundamental limitations of contemporary centralized storage solutions. We are building Lithium, a distributed storage system designed specifically for virtualization workloads running in large-scale data centers and clouds. Lithium aims to be scalable, highly available, and compatible with commodity hardware and existing application software. The design of Lithium borrows techniques from Byzantine Fault Tolerance, stream processing, and distributed version control software, and demonstrates their practical applicability to the performance-sensitive task of virtual machine storage.

Journal ArticleDOI
TL;DR: This work presents an offline trace synchronization algorithm that is directly applicable to pairs of nodes and that can report approximate bounds on accuracy over short tracing durations and an efficient implementation of this algorithm and an experimental study of parameters that affect synchronization accuracy.
Abstract: Tracing has proven to be a valuable tool for identifying functional and performance problems. In order to use it on distributed nodes, the timestamps in the traces need to be precisely synchronized. The objective of this work is to improve synchronization of traces recorded on distributed nodes. We aim for high precision and low intrusiveness. In this paper, we present an offline trace synchronization algorithm that is directly applicable to pairs of nodes and that can report approximate bounds on accuracy over short tracing durations. We also present an efficient implementation of this algorithm and an experimental study of parameters that affect synchronization accuracy.
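
A small sketch of the underlying constraint, as a constant-offset toy with fabricated timestamps (the paper's algorithm also handles clock drift and reports accuracy bounds over short tracing durations): because network delay cannot be negative, every message exchanged between the two nodes bounds the clock offset from one side.

```python
def offset_bounds(a_to_b, b_to_a):
    """Bound the clock offset o = clock_B - clock_A from matched message
    timestamps, using only the fact that network delay is non-negative:
      A->B: recv_B = send_A + o + delay  =>  o <= recv_B - send_A
      B->A: recv_A = send_B - o + delay  =>  o >= send_B - recv_A
    A constant-offset toy; handling drift over long traces needs a linear fit
    rather than a single interval."""
    upper = min(recv_b - send_a for send_a, recv_b in a_to_b)
    lower = max(send_b - recv_a for send_b, recv_a in b_to_a)
    return lower, upper

# (send, receive) timestamp pairs extracted from the two nodes' traces (fabricated)
a_to_b = [(10.000, 10.512), (11.000, 11.505), (12.000, 12.503)]
b_to_a = [(10.600, 10.104), (11.700, 11.203), (12.800, 12.302)]

lo, hi = offset_bounds(a_to_b, b_to_a)
print(f"offset in [{lo:.3f}, {hi:.3f}] s, midpoint {((lo + hi) / 2):.3f} s")
```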

Journal ArticleDOI
David Belson1
TL;DR: Data gathered across Akamai's global server network about attack traffic, Internet and broadband penetration, and mobile connectivity, as well as trends seen in this data over time are reviewed.
Abstract: In this paper, we review data gathered across Akamai's global server network about attack traffic, Internet and broadband penetration, and mobile connectivity, as well as trends seen in this data over time.