
Showing papers on "Data center published in 2013"


Journal ArticleDOI
TL;DR: This paper presents a system that uses virtualization technology to allocate data center resources dynamically based on application demands and support green computing by optimizing the number of servers in use, and develops a set of heuristics that effectively prevent overload while saving energy.
Abstract: Cloud computing allows business customers to scale up and down their resource usage based on needs. Many of the touted gains in the cloud model come from resource multiplexing through virtualization technology. In this paper, we present a system that uses virtualization technology to allocate data center resources dynamically based on application demands and support green computing by optimizing the number of servers in use. We introduce the concept of "skewness" to measure the unevenness in the multidimensional resource utilization of a server. By minimizing skewness, we can combine different types of workloads nicely and improve the overall utilization of server resources. We develop a set of heuristics that effectively prevent overload in the system while saving energy. Trace-driven simulation and experiment results demonstrate that our algorithm achieves good performance.
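The skewness idea above can be sketched in a few lines, assuming the natural definition (root-sum-square deviation of each resource's utilization from the server's mean utilization); the paper's exact formula may differ in detail:

```python
import math

def skewness(utilizations):
    """Unevenness of a server's multi-dimensional resource usage.

    utilizations: per-resource fractions, e.g. [cpu, ram, net].
    Returns 0.0 when all resources are equally loaded; larger
    values mean more skewed (unbalanced) usage.
    """
    avg = sum(utilizations) / len(utilizations)
    if avg == 0:
        return 0.0
    return math.sqrt(sum((u / avg - 1.0) ** 2 for u in utilizations))
```

A balanced server scores 0, while a CPU-hot, RAM-cold server scores high, making it a good candidate to receive a RAM-heavy VM.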

859 citations


Journal ArticleDOI
TL;DR: A survey of the current state-of-the-art in data center network virtualization, and a detailed comparison of the surveyed proposals are presented.
Abstract: With the growth of data volumes and variety of Internet applications, data centers (DCs) have become an efficient and promising infrastructure for supporting data storage and providing the platform for the deployment of diversified network services and applications (e.g., video streaming, cloud computing). These applications and services often impose multifarious resource demands (storage, compute power, bandwidth, latency) on the underlying infrastructure. Existing data center architectures lack the flexibility to effectively support these applications, which results in poor support of QoS, deployability, manageability, and defence against security attacks. Data center network virtualization is a promising solution to address these problems. Virtualized data centers are envisioned to provide better management flexibility, lower cost, scalability, better resource utilization, and energy efficiency. In this paper, we present a survey of the current state-of-the-art in data center network virtualization, and provide a detailed comparison of the surveyed proposals. We discuss the key challenges for future research and point out some potential directions for tackling the problems related to data center design.

633 citations


Journal ArticleDOI
TL;DR: A very general model is proposed and it is proved that the optimal offline algorithm for dynamic right-sizing has a simple structure when viewed in reverse time, and this structure is exploited to develop a new “lazy” online algorithm, which is proven to be 3-competitive.
Abstract: Power consumption imposes a significant cost for data centers implementing cloud services, yet much of that power is used to maintain excess service capacity during periods of low load. This paper investigates how much can be saved by dynamically "right-sizing" the data center by turning off servers during such periods and how to achieve that saving via an online algorithm. We propose a very general model and prove that the optimal offline algorithm for dynamic right-sizing has a simple structure when viewed in reverse time, and this structure is exploited to develop a new "lazy" online algorithm, which is proven to be 3-competitive. We validate the algorithm using traces from two real data-center workloads and show that significant cost savings are possible. Additionally, we contrast this new algorithm with the more traditional approach of receding horizon control.
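The "lazy" flavor of the algorithm can be illustrated with a toy heuristic. This is not the paper's 3-competitive algorithm, just the underlying intuition: scale up instantly to meet demand, but delay scale-downs until the idle time roughly justifies the toggling cost:

```python
def lazy_capacity(loads, beta):
    """Toy 'lazy' right-sizing (an illustration of the idea, not the
    paper's 3-competitive algorithm): scale capacity up instantly to
    meet demand, but retire an idle server only after demand has
    stayed below capacity for `beta` consecutive slots, a rough
    break-even point between idle energy and switching cost.

    loads: servers needed per time slot. Returns servers kept on per slot.
    """
    x, low_streak, plan = 0, 0, []
    for need in loads:
        if need >= x:
            x = max(x, need)       # scale up immediately
            low_streak = 0
        else:
            low_streak += 1
            if low_streak >= beta:
                x -= 1             # retire one idle server
                low_streak = 0
        plan.append(x)
    return plan
```

With `beta=2`, a brief one-slot dip in load never powers servers down, avoiding wasteful on/off cycling.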

324 citations


Proceedings ArticleDOI
03 Nov 2013
TL;DR: SPANStore is presented, a key-value store that exports a unified view of storage services in geographically distributed data centers that can lower costs by over 10x in several scenarios, in comparison with alternative solutions that either use a single storage provider or replicate every object to every data center from which it is accessed.
Abstract: By offering storage services in several geographically distributed data centers, cloud computing platforms enable applications to offer low latency access to user data. However, application developers are left to deal with the complexities associated with choosing the storage services at which any object is replicated and maintaining consistency across these replicas. In this paper, we present SPANStore, a key-value store that exports a unified view of storage services in geographically distributed data centers. To minimize an application provider's cost, we combine three key principles. First, SPANStore spans multiple cloud providers to increase the geographical density of data centers and to minimize cost by exploiting pricing discrepancies across providers. Second, by estimating application workload at the right granularity, SPANStore judiciously trades off greater geo-distributed replication necessary to satisfy latency goals with the higher storage and data propagation costs this entails in order to satisfy fault tolerance and consistency requirements. Finally, SPANStore minimizes the use of compute resources to implement tasks such as two-phase locking and data propagation, which are necessary to offer a global view of the storage services that it builds upon. Our evaluation of SPANStore shows that it can lower costs by over 10x in several scenarios, in comparison with alternative solutions that either use a single storage provider or replicate every object to every data center from which it is accessed.
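The core cost/latency tradeoff can be sketched as a greedy replica choice. The provider names, prices, and latencies below are hypothetical, and SPANStore's actual formulation also accounts for consistency, fault tolerance, and propagation costs:

```python
def choose_replicas(prices, latency, clients, slo_ms):
    """For each client location, pick the cheapest data center within
    the latency SLO; the union of picks is the replica set — often far
    smaller (and cheaper) than replicating everywhere.

    prices: {dc: storage price}; latency: {(client, dc): ms}.
    """
    chosen = set()
    for client in clients:
        reachable = [dc for dc in prices if latency[(client, dc)] <= slo_ms]
        chosen.add(min(reachable, key=lambda dc: prices[dc]))
    return chosen
```

With a loose SLO, one well-placed cheap data center can serve every client; tightening the SLO forces more (and pricier) replicas.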

251 citations


Journal ArticleDOI
TL;DR: This paper proposes a systematic approach to maximize a green data center's profit, i.e., revenue minus cost, and proposes a novel optimization-based profit maximization strategy, which significantly outperforms two comparable energy and performance management algorithms recently proposed in the literature.
Abstract: While a large body of work has recently focused on reducing data centers' energy expenses, there exists no prior work on investigating the trade-off between minimizing a data center's energy expenditure and maximizing its revenue for the various Internet and cloud computing services that it may offer. In this paper, we seek to tackle this shortcoming by proposing a systematic approach to maximize a green data center's profit, i.e., revenue minus cost. In this regard, we explicitly take into account the practical service-level agreements (SLAs) that currently exist between data centers and their customers. Our model also incorporates various other factors, such as the availability of local renewable power generation at data centers and the stochastic nature of data centers' workload. Furthermore, we propose a novel optimization-based profit maximization strategy for data centers for two different cases, without and with behind-the-meter renewable generators. We show that the formulated optimization problems in both cases are convex programs; therefore, they are tractable and appropriate for practical implementation. Using various experimental data and via computer simulations, we assess the performance of the proposed optimization-based profit maximization strategy and show that it significantly outperforms two comparable energy and performance management algorithms recently proposed in the literature.
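The revenue-minus-cost tradeoff at the heart of the paper can be illustrated by brute force over the number of active servers. This toy stands in for the paper's convex programs, and all numbers are hypothetical:

```python
def best_server_count(demand, revenue_per_req, power_cost, cap_per_server, max_servers):
    """Enumerate the profit tradeoff: each active server serves up to
    cap_per_server requests (earning revenue) but draws power (a cost);
    profit = revenue - energy cost peaks at an interior point, which the
    paper finds via convex programming rather than enumeration."""
    def profit(n):
        served = min(n * cap_per_server, demand)
        return served * revenue_per_req - n * power_cost
    return max(range(max_servers + 1), key=profit)
```

Past the point where demand is fully served, every extra server only adds energy cost, so profit is maximized exactly where capacity meets demand in this toy setting.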

234 citations


Journal ArticleDOI
TL;DR: The basic idea of VMPlanner is to optimize both virtual machine placement and traffic flow routing so as to turn off as many unneeded network elements as possible for power saving in virtualization-based data centers.

228 citations


Journal ArticleDOI
TL;DR: An update on recent developments in the field of ultra-high-capacity optical interconnects for intra-DCN communication is provided.
Abstract: Warehouse-scale data center operators need much-higher-bandwidth intra-data center networks (DCNs) to sustain the increase of network traffic due to cloud computing and other emerging web applications. Current DCNs based on commodity switches require excessive amounts of power to face this traffic increase. Optical intra-DCN interconnection networks have recently emerged as a promising solution that can provide higher throughput while consuming less power. This article provides an update on recent developments in the field of ultra-high-capacity optical interconnects for intra-DCN communication. Several recently proposed architectures and technologies are examined and compared, while future trends and research challenges are outlined.

202 citations


Journal ArticleDOI
01 Jul 2013
TL;DR: In this article, a self-organizing and adaptive approach for the consolidation of VMs on two resources, namely CPU and RAM, is presented; its decisions are based exclusively on local information, which makes the approach very simple to implement.
Abstract: Power efficiency is one of the main issues that will drive the design of data centers, especially of those devoted to providing Cloud computing services. In virtualized data centers, consolidation of Virtual Machines (VMs) on the minimum number of physical servers has been recognized as a very efficient approach, as this allows unloaded servers to be switched off or used to accommodate more load, which is clearly a cheaper alternative to buying more resources. The consolidation problem must be solved on multiple dimensions, since in modern data centers CPU is not the only critical resource: depending on the characteristics of the workload, other resources, for example RAM and bandwidth, can become the bottleneck. The problem is so complex that centralized and deterministic solutions are practically useless in large data centers with hundreds or thousands of servers. This paper presents ecoCloud, a self-organizing and adaptive approach for the consolidation of VMs on two resources, namely CPU and RAM. Decisions on the assignment and migration of VMs are driven by probabilistic processes and are based exclusively on local information, which makes the approach very simple to implement. Both a fluid-like mathematical model and experiments on a real data center show that the approach rapidly consolidates the workload, and that CPU-bound and RAM-bound VMs are balanced, so that both resources are exploited efficiently.
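The probabilistic assignment can be sketched as a bid function of a server's current utilization: lightly loaded servers rarely bid (so they can drain and switch off) and nearly full servers never bid (to avoid overload). The polynomial shape follows the paper's idea, but the constants here are illustrative rather than ecoCloud's calibrated values:

```python
def accept_probability(u, T=0.9, p=3):
    """Probability that a server bids to host a new VM given its
    current utilization u in [0, 1]. Zero below and above the useful
    range, peaking at mid-high utilization; T is the overload
    threshold, p controls how strongly consolidation is favored."""
    if not 0.0 <= u < T:
        return 0.0
    u_star = p * T / (p + 1)            # utilization that maximizes the bid
    m = (u_star ** p) * (T - u_star)    # normalization so the peak is 1.0
    return (u ** p) * (T - u) / m
```

Each server then runs an independent Bernoulli trial (`random.random() < accept_probability(u)`), which is what makes the scheme decentralized: no global view is needed.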

189 citations


Journal ArticleDOI
TL;DR: A virtual machine placement algorithm, EAGLE, is proposed, which can balance the utilization of multi-dimensional resources, reduce the number of running physical machines (PMs), and thus lower energy consumption; it is evaluated via extensive simulations and experiments on real traces.

185 citations


Journal ArticleDOI
TL;DR: This paper is devoted to the categorization of green computing performance metrics in data centers, covering basic metrics such as power and thermal metrics as well as extended performance metrics, i.e., multiple-data-center indicators.
Abstract: Data centers now play an important role in modern IT infrastructures. Although much research effort has been made in the field of green data center computing, performance metrics for green data centers have been largely ignored. This paper is devoted to the categorization of green computing performance metrics in data centers, covering basic metrics such as power and thermal metrics as well as extended performance metrics, i.e., multiple-data-center indicators. Based on a taxonomy of performance metrics, this paper summarizes the features of currently available metrics and presents insights for the study of green data center computing.
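Two of the basic power metrics in such taxonomies are directly computable:

```python
def pue(total_facility_kw, it_kw):
    """Power Usage Effectiveness: total facility power (IT plus
    cooling, power distribution, lighting) divided by IT equipment
    power. 1.0 is the ideal lower bound; typical 2013-era facilities
    ran well above it."""
    return total_facility_kw / it_kw

def dcie(total_facility_kw, it_kw):
    """Data Center infrastructure Efficiency: the reciprocal of PUE,
    expressed as a percentage of power reaching IT equipment."""
    return 100.0 * it_kw / total_facility_kw
```

A facility drawing 2 MW to power 1 MW of IT load has a PUE of 2.0, i.e. a DCiE of 50%.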

182 citations


Journal ArticleDOI
TL;DR: This paper studies state-of-the-art techniques and research related to power saving in the IaaS of a cloud computing system, which consumes a huge part of the total energy in a cloud computing system.
Abstract: Although cloud computing has rapidly emerged as a widely accepted computing paradigm, the research on cloud computing is still at an early stage. Cloud computing suffers from different challenging issues related to security, software frameworks, quality of service, standardization, and power consumption. Efficient energy management is one of the most challenging research issues. The core services in cloud computing system are the SaaS (Software as a Service), PaaS (Platform as a Service), and IaaS (Infrastructure as a Service). In this paper, we study state-of-the-art techniques and research related to power saving in the IaaS of a cloud computing system, which consumes a huge part of total energy in a cloud computing system. At the end, some feasible solutions for building green cloud computing are proposed. Our aim is to provide a better understanding of the design challenges of energy management in the IaaS of a cloud computing system.

Journal ArticleDOI
TL;DR: This work is the first to explore the problem of electricity cost saving using energy storage in multiple data centers by considering both the spatial and temporal variations in wholesale electricity prices and workload arrival processes.
Abstract: Electricity expenditure comprises a significant fraction of the total operating cost in data centers. Hence, cloud service providers are required to reduce electricity cost as much as possible. In this paper, we consider utilizing existing energy storage capabilities in data centers to reduce electricity cost under wholesale electricity markets, where the electricity price exhibits both temporal and spatial variations. A stochastic program is formulated by integrating the center-level load balancing, the server-level configuration, and the battery management while at the same time guaranteeing the quality-of-service experience by end users. We use the Lyapunov optimization technique to design an online algorithm that achieves an explicit tradeoff between cost saving and energy storage capacity. We demonstrate the effectiveness of our proposed algorithm through extensive numerical evaluations based on real-world workload and electricity price data sets. As far as we know, our work is the first to explore the problem of electricity cost saving using energy storage in multiple data centers by considering both the spatial and temporal variations in wholesale electricity prices and workload arrival processes.
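The battery's role in exploiting temporal price variation can be shown with a toy price-threshold policy. This is NOT the paper's Lyapunov-based algorithm, just the arbitrage intuition it formalizes, with all prices and sizes hypothetical:

```python
def threshold_battery(prices, demand, capacity, low, high):
    """Buy extra energy to charge the battery when the spot price is
    cheap (<= low) and serve demand from the battery when it is
    expensive (>= high). Charging is capped at `demand` kWh per slot
    as a crude rate limit. Returns total electricity cost."""
    soc, cost = 0.0, 0.0
    for price in prices:
        draw = demand                          # grid energy bought this slot
        if price >= high and soc > 0:
            used = min(soc, demand)            # discharge to cover demand
            soc -= used
            draw -= used
        elif price <= low and soc < capacity:
            charge = min(capacity - soc, demand)
            soc += charge
            draw += charge
        cost += draw * price
    return cost
```

Charging during the cheap slots and discharging during the price spike shifts purchases in time, which is exactly the temporal variation the paper exploits (its algorithm also load-balances across data centers to exploit spatial variation).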

Proceedings ArticleDOI
08 Jul 2013
TL;DR: Harmony is a heterogeneity-aware dynamic capacity provisioning scheme for cloud data centers that uses the K-means clustering algorithm to divide the workload into distinct task classes with similar characteristics in terms of resource and performance requirements, and can reduce energy consumption by up to 28 percent compared to heterogeneity-oblivious solutions.
Abstract: Data centers today consume tremendous amounts of energy in terms of power distribution and cooling. Dynamic capacity provisioning is a promising approach for reducing energy consumption by dynamically adjusting the number of active machines to match resource demands. However, despite extensive studies of the problem, existing solutions for dynamic capacity provisioning have not fully considered the heterogeneity of both workload and machine hardware found in production environments. In particular, production data centers often comprise several generations of machines with different capacities, capabilities, and energy consumption characteristics. Meanwhile, the workloads running in these data centers typically consist of a wide variety of applications with different priorities, performance objectives, and resource requirements. Failure to consider these heterogeneous characteristics will lead to both suboptimal energy savings and long scheduling delays, due to incompatibility between workload requirements and the resources offered by the provisioned machines. To address this limitation, in this paper we present HARMONY, a Heterogeneity-Aware Resource Management System for dynamic capacity provisioning in cloud computing environments. Specifically, we first use the K-means clustering algorithm to divide the workload into distinct task classes with similar characteristics in terms of resource and performance requirements. Then we present a novel technique for dynamically adjusting the number of machines of each type to minimize total energy consumption and performance penalty in terms of scheduling delay. Through simulations using real traces from Google's compute clusters, we found that our approach can improve data center energy efficiency by up to 28% compared to heterogeneity-oblivious solutions.
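The first step — clustering tasks into classes — is plain K-means over per-task resource vectors. A minimal Lloyd's-algorithm sketch (the paper additionally tunes k and feeds the classes into machine-type sizing):

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two resource vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Cluster task resource vectors, e.g. [cpu, mem] per task, into
    k classes. Returns (centroids, labels)."""
    rng = random.Random(seed)
    cents = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, cents[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # recompute centroid as the mean of its members
                cents[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return cents, labels
```

Each resulting class (e.g. "small CPU-light tasks" vs "large memory-heavy tasks") can then be matched to the machine generation that serves it most efficiently.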

Proceedings ArticleDOI
08 Jul 2013
TL;DR: The Glasgow Raspberry Pi Cloud (PiCloud), a scale model of a DC composed of clusters of Raspberry Pi devices, emulates every layer of a Cloud stack, ranging from resource virtualisation to network behaviour, providing a full-featured Cloud Computing research and educational environment.
Abstract: Data Centers (DC) used to support Cloud services often consist of tens of thousands of networked machines under a single roof. The significant capital outlay required to replicate such infrastructures constitutes a major obstacle to practical implementation and evaluation of research in this domain. Currently, most research into Cloud computing relies on either limited software simulation or the use of testbed environments with a handful of machines. The recent introduction of the Raspberry Pi, a low-cost, low-power single-board computer, has made the construction of miniature Cloud DCs more affordable. In this paper, we present the Glasgow Raspberry Pi Cloud (PiCloud), a scale model of a DC composed of clusters of Raspberry Pi devices. The PiCloud emulates every layer of a Cloud stack, ranging from resource virtualisation to network behaviour, providing a full-featured Cloud Computing research and educational environment.

Proceedings ArticleDOI
05 May 2013
TL;DR: Facebook's current data center network architecture is reviewed and some alternative architectures are explored.
Abstract: We review Facebook's current data center network architecture and explore some alternative architectures.

Book ChapterDOI
26 Aug 2013
TL;DR: This paper proposes a novel VM placement algorithm to increase environmental sustainability by taking into account distributed data centers with different carbon footprint rates and PUEs; results show that the proposed algorithm reduces CO2 emissions and power consumption while maintaining the same level of quality of service.
Abstract: Due to the increasing use of Cloud computing services and the amount of energy used by data centers, there is a growing interest in reducing the energy consumption and carbon footprint of data centers. Cloud data centers use virtualization technology to host multiple virtual machines (VMs) on a single physical server. By applying efficient VM placement algorithms, Cloud providers are able to enhance energy efficiency and reduce carbon footprint. Previous works have focused on reducing the energy used within a single data center or multiple data centers without considering their energy sources and Power Usage Effectiveness (PUE). In contrast, this paper proposes a novel VM placement algorithm to increase environmental sustainability by taking into account distributed data centers with different carbon footprint rates and PUEs. Simulation results show that the proposed algorithm reduces CO2 emissions and power consumption, while it maintains the same level of quality of service compared to other competitive algorithms.
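The selection criterion implied by the abstract — weigh each site by both its grid's carbon intensity and its PUE — can be sketched as follows. Site names and numbers are hypothetical, and the paper's algorithm additionally respects QoS and capacity constraints:

```python
def greenest_site(sites, vm_power_kw):
    """Pick the data center minimizing CO2 for hosting a VM:
    emissions = VM power * site PUE * grid carbon intensity.
    sites: {name: (pue, kg_co2_per_kwh)}."""
    def footprint(name):
        pue, intensity = sites[name]
        return vm_power_kw * pue * intensity
    return min(sites, key=footprint)
```

Note that neither factor alone decides: a renewable-powered but cooling-inefficient site can lose to one that balances a clean grid with a low PUE.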

Proceedings Article
27 May 2013
TL;DR: VDC Planner is proposed, a migration-aware dynamic virtual data center embedding framework that aims at achieving high revenue while minimizing the total energy cost over time; it achieves both higher revenue and lower average scheduling delay compared to existing migration-oblivious solutions.
Abstract: Cloud computing promises to provide computing resources to a large number of service applications in an on-demand manner. Traditionally, cloud providers such as Amazon only provide guaranteed allocation for compute and storage resources, and fail to support bandwidth requirements and performance isolation among these applications. To address this limitation, a number of recent proposals advocate providing both guaranteed server and network resources in the form of Virtual Data Centers (VDCs). This raises the problem of optimally allocating both servers and data center networks to multiple VDCs in order to maximize the total revenue, while minimizing the total energy consumption in the data center. However, despite recent studies on this problem, none of the existing solutions have considered the possibility of using VM migration to dynamically adjust the resource allocation in order to meet the fluctuating resource demand of VDCs. In this paper, we propose VDC Planner, a migration-aware dynamic virtual data center embedding framework that aims at achieving high revenue while minimizing the total energy cost over time. Our framework supports various usage scenarios, including VDC embedding, VDC scaling, as well as dynamic VDC consolidation. Through experiments using realistic workload traces, we show that our proposed approach achieves both higher revenue and lower average scheduling delay compared to existing migration-oblivious solutions.

Journal ArticleDOI
TL;DR: This work studies timely, cost-minimizing upload of massive, dynamically-generated, geo-dispersed data into the cloud, for processing using a MapReduce-like framework, and proposes two online algorithms: an online lazy migration (OLM) algorithm and a randomized fixed horizon control (RFHC) algorithm.
Abstract: Cloud computing, rapidly emerging as a new computation paradigm, provides agile and scalable resource access in a utility-like fashion, especially for the processing of big data. An important open issue here is how to efficiently move the data, from different geographical locations over time, into a cloud for effective processing. The de facto approach of hard drive shipping is not flexible or secure. This work studies timely, cost-minimizing upload of massive, dynamically generated, geo-dispersed data into the cloud, for processing using a MapReduce-like framework. Targeting a cloud encompassing disparate data centers, we model a cost-minimizing data migration problem, and propose two online algorithms: an online lazy migration (OLM) algorithm and a randomized fixed horizon control (RFHC) algorithm, for optimizing at any given time the choice of the data center for data aggregation and processing, as well as the routes for transmitting data there. Careful comparisons among these online and offline algorithms in realistic settings are conducted through extensive experiments, which demonstrate close-to-offline-optimum performance of the online algorithms.

Journal ArticleDOI
19 Sep 2013
TL;DR: This study analyzes robustness of the state-of-the-art Data Center Network (DCN) and presents multi-layered graph modeling of various DCNs, and proposes new procedures to quantify the DCN robustness.
Abstract: Data centers, being an architectural and functional block of cloud computing, are integral to the Information and Communication Technology (ICT) sector. Cloud computing is rigorously utilized by various domains, such as agriculture, nuclear science, smart grids, healthcare, and search engines for research, data storage, and analysis. A Data Center Network (DCN) constitutes the communicational backbone of a data center, ascertaining the performance boundaries for cloud infrastructure. The DCN needs to be robust to failures and uncertainties to deliver the required Quality of Service (QoS) level and satisfy Service Level Agreements (SLAs). In this paper, we analyze the robustness of the state-of-the-art DCNs. Our major contributions are: (a) we present multi-layered graph modeling of various DCNs; (b) we study the classical robustness metrics considering various failure scenarios to perform a comparative analysis; (c) we present the inadequacy of the classical network robustness metrics to appropriately evaluate the DCN robustness; and (d) we propose new procedures to quantify the DCN robustness. Currently, there is no detailed study available centering on DCN robustness. Therefore, we believe that this study will lay a firm foundation for future DCN robustness research.
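One classical robustness metric of the kind the paper evaluates — the fraction of nodes remaining in the largest connected component after failures — can be probed with a small BFS routine (a toy; real DCN studies run this over large multi-layered topology graphs):

```python
from collections import deque

def largest_cc_fraction(adj, removed=()):
    """Fraction of surviving nodes that sit in the largest connected
    component of the DCN graph after removing failed nodes.
    adj: {node: [neighbors]}; removed: failed switches/servers."""
    nodes = [n for n in adj if n not in removed]
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        comp, queue = 0, deque([start])
        seen.add(start)
        while queue:          # BFS over one connected component
            n = queue.popleft()
            comp += 1
            for m in adj[n]:
                if m not in seen and m not in removed:
                    seen.add(m)
                    queue.append(m)
        best = max(best, comp)
    return best / len(nodes) if nodes else 0.0
```

On a star topology the metric exposes a single point of failure: removing the hub switch shatters the network into isolated hosts.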

Proceedings ArticleDOI
30 Jul 2013
TL;DR: The study on the workloads reveals that data analysis applications share many inherent characteristics, which place them in a different class from desktop, HPC, and service workloads, including traditional server workloads (SPECweb2005) and scale-out service workloads (four among six benchmarks in CloudSuite), and accordingly the authors give several recommendations for architecture and system optimizations.
Abstract: As the amount of data explodes rapidly, more and more corporations are using data centers to make effective decisions and gain a competitive edge. Data analysis applications play a significant role in data centers, and hence it has become increasingly important to understand their behaviors in order to further improve the performance of data center computer systems. In this paper, after investigating the three most important application domains in terms of page views and daily visitors, we choose eleven representative data analysis workloads and characterize their micro-architectural characteristics by using hardware performance counters, in order to understand the impacts and implications of data analysis workloads on systems equipped with modern superscalar out-of-order processors. Our study on the workloads reveals that data analysis applications share many inherent characteristics, which place them in a different class from desktop (SPEC CPU2006), HPC (HPCC), and service workloads, including traditional server workloads (SPECweb2005) and scale-out service workloads (four among six benchmarks in CloudSuite), and accordingly we give several recommendations for architecture and system optimizations. On the basis of our workload characterization work, we released a benchmark suite named DCBench for typical datacenter workloads, including data analysis and service workloads, with an open-source license on our project home page at http://prof.ict.ac.cn/DCBench. We hope that DCBench is helpful for performing architecture and small- to medium-scale system research for datacenter computing.
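The counter-based characterization step boils down to deriving ratio metrics from raw counter deltas, along these lines (counter names here are illustrative; actual event names depend on the CPU and measurement tool, e.g. perf):

```python
def derive_microarch_metrics(counters):
    """Turn raw hardware performance-counter deltas into the usual
    characterization metrics: instructions per cycle (IPC),
    last-level-cache misses per kilo-instruction (MPKI), and branch
    misprediction rate."""
    return {
        "ipc": counters["instructions"] / counters["cycles"],
        "llc_mpki": 1000.0 * counters["llc_misses"] / counters["instructions"],
        "branch_miss_rate": counters["branch_misses"] / counters["branches"],
    }
```

Comparing such ratios across workload suites is what lets a study place data analysis applications in a different class from desktop, HPC, and service workloads.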

Journal ArticleDOI
01 Jan 2013
TL;DR: Greenhead, a holistic resource management framework for embedding VDCs across geographically distributed data centers connected through a backbone network, is proposed to maximize the cloud provider's revenue while ensuring that the infrastructure is as environment-friendly as possible.
Abstract: Cloud computing promises to provide on-demand computing, storage, and networking resources. However, most cloud providers simply offer virtual machines (VMs) without bandwidth and delay guarantees, which may hurt the performance of the deployed services. Recently, some proposals suggested remedying this limitation by offering virtual data centers (VDCs) instead of VMs only. However, they have only considered the case where VDCs are embedded within a single data center. In practice, infrastructure providers should have the ability to provision requested VDCs across their distributed infrastructure to achieve multiple goals, including revenue maximization, operational cost reduction, energy efficiency, and green IT, or to simply satisfy geographic location constraints of the VDCs. In this paper, we propose Greenhead, a holistic resource management framework for embedding VDCs across geographically distributed data centers connected through a backbone network. The goal of Greenhead is to maximize the cloud provider's revenue while ensuring that the infrastructure is as environment-friendly as possible. To evaluate the effectiveness of our proposal, we conducted extensive simulations of four data centers connected through the NSFNet topology. Results show that Greenhead improves requests' acceptance ratio and revenue by up to 40 percent while ensuring high usage of renewable energy and a minimal carbon footprint.

Patent
01 Nov 2013
TL;DR: In this article, the authors propose CDN load balancing in the cloud, where server resources are allocated at an edge data center of a content delivery network to properties that are being serviced by that edge data center.
Abstract: CDN load balancing in the cloud. Server resources are allocated at an edge data center of a content delivery network to properties that are being serviced by the edge data center. Based on near-real-time data, properties are sorted by trending traffic at the edge data center. Server resources are allocated for at least one property of the sorted properties at the edge data center. The server resources are allocated based on rules developed from long-term trends. The resource allocation includes calculating server needs for the property in a partition at the edge data center, and allocating the server needs for the property to available servers in the partition.
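The sort-then-allocate step described in the claim can be sketched as a proportional split of the edge server pool (a toy reading of the abstract, not the patent's actual rules; real allocation rules derived from long-term trends would be richer):

```python
def allocate_servers(traffic, total_servers, floor=1):
    """Sort properties by trending traffic at the edge data center and
    split the server pool proportionally, with a per-property floor.
    Rounding can leave a few servers unassigned; real rules would
    redistribute them. traffic: {property: recent request rate}."""
    total = sum(traffic.values())
    alloc, remaining = {}, total_servers
    for prop, rate in sorted(traffic.items(), key=lambda kv: -kv[1]):
        share = max(floor, round(total_servers * rate / total))
        alloc[prop] = min(share, remaining)
        remaining -= alloc[prop]
    return alloc
```

Processing the trending (highest-traffic) properties first means that, when the pool runs short, it is the long tail that gets squeezed rather than the hot content.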

Journal ArticleDOI
TL;DR: This paper implements and simulates the state-of-the-art DCN models, namely: (a) the legacy DCN architecture, (b) switch-based, and (c) hybrid models, and compares their effectiveness by monitoring network throughput and average packet delay.
Abstract: Data centers are experiencing a remarkable growth in the number of interconnected servers. Being one of the foremost data center design concerns, network infrastructure plays a pivotal role in the initial capital investment and in ascertaining the performance parameters for the data center. Legacy data center network (DCN) infrastructure lacks the inherent capability to meet the data centers' growth trend and aggregate bandwidth demands. Deployment of even the highest-end enterprise network equipment only delivers around 50% of the aggregate bandwidth at the edge of the network. The vital challenges faced by the legacy DCN architecture trigger the need for new DCN architectures to accommodate the growing demands of the 'cloud computing' paradigm. We have implemented and simulated the state-of-the-art DCN models in this paper, namely: (a) the legacy DCN architecture, (b) switch-based, and (c) hybrid models, and compared their effectiveness by monitoring the network's (a) throughput and (b) average packet delay. The presented analysis may be perceived as a background benchmarking study for further research on the simulation and implementation of DCN-customized topologies and customized addressing protocols in large-scale data centers. We have performed extensive simulations under various network traffic patterns to ascertain the strengths and inadequacies of the different DCN architectures. Moreover, we provide a firm foundation for further research and enhancement in DCN architectures. Copyright © 2012 John Wiley & Sons, Ltd.

Proceedings ArticleDOI
01 Oct 2013
TL;DR: This paper characterize Google applications, based on a one-month Google trace with over 650k jobs running across over 12,000 heterogeneous hosts from a Google data center, via a K-means clustering algorithm with optimized number of sets,based on task events and resource usage.
Abstract: In this paper, we characterize Google applications, based on a one-month Google trace with over 650k jobs running across over 12,000 heterogeneous hosts from a Google data center. On one hand, we carefully compute valuable statistics about task events and resource utilization for Google applications, based on various types of resources (such as CPU and memory) and execution types (e.g., whether they can run batch tasks or not). Resource utilization per application is observed to follow the Pareto principle closely. On the other hand, we classify applications via a K-means clustering algorithm with an optimized number of sets, based on task events and resource usage. The number of applications in the K-means clustering sets follows a Pareto-similar distribution. We believe our work is interesting and valuable for the further investigation of Cloud environments.
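The Pareto observation can be checked on any usage trace with a one-liner style computation (the classic 80/20 reading: the top ~20% of applications account for most of the usage):

```python
def pareto_share(usages, top_fraction=0.2):
    """Share of total resource usage consumed by the top
    `top_fraction` of applications; values near 0.8 and above for
    top_fraction=0.2 are the classic Pareto signature such trace
    studies report."""
    ranked = sorted(usages, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)
```

A perfectly uniform workload scores exactly `top_fraction`; the further the score exceeds it, the more a handful of heavy applications dominate the cluster.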

Proceedings ArticleDOI
30 Sep 2013
TL;DR: An adaptive anomaly identification mechanism that explores the most relevant principal components of different failure types in cloud computing infrastructures and integrates the cloud performance metric analysis with filtering techniques to achieve automated, efficient, and accurate anomaly identification.
Abstract: Cloud computing has become increasingly popular by obviating the need for users to own and maintain complex computing infrastructures. However, due to their inherent complexity and large scale, production cloud computing systems are prone to various runtime problems caused by hardware and software faults and environmental factors. Autonomic anomaly detection is a crucial technique for understanding emergent, cloud-wide phenomena and self-managing cloud resources for system-level dependability assurance. To detect anomalous cloud behaviors, we need to monitor the cloud execution and collect runtime cloud performance data. These data consist of values of performance metrics for different types of failures, which display different correlations with the performance metrics. In this paper, we present an adaptive anomaly identification mechanism that explores the most relevant principal components of different failure types in cloud computing infrastructures. It integrates the cloud performance metric analysis with filtering techniques to achieve automated, efficient, and accurate anomaly identification. The proposed mechanism adapts itself by recursively learning from the newly verified detection results to refine future detections. We have implemented a prototype of the anomaly identification system and conducted experiments in an on-campus cloud computing environment and by using the Google data center traces. Our experimental results show that our mechanism can achieve more efficient and accurate anomaly detection than other existing schemes.
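The general idea of scoring anomalies against the most relevant principal components can be illustrated with a pure-Python sketch: power iteration extracts the top component of healthy metric data, and samples are scored by their reconstruction error. The paper's mechanism additionally selects components per failure type and adapts recursively; none of that is shown here.

```python
import math

def top_component(X, iters=100):
    """Leading principal component of mean-centred rows X via power iteration."""
    d = len(X[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [0.0] * d
        for x in X:                      # w = (X^T X) v
            s = sum(xi * vi for xi, vi in zip(x, v))
            for i in range(d):
                w[i] += x[i] * s
        norm = math.sqrt(sum(wi * wi for wi in w)) or 1.0
        v = [wi / norm for wi in w]
    return v

def anomaly_score(x, mean, v):
    """Reconstruction error of x after projecting onto the top component."""
    c = [xi - mi for xi, mi in zip(x, mean)]
    t = sum(ci * vi for ci, vi in zip(c, v))
    return math.sqrt(sum((ci - t * vi) ** 2 for ci, vi in zip(c, v)))

# Healthy samples: two performance metrics that move together.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
mean = tuple(sum(col) / len(samples) for col in zip(*samples))
centred = [tuple(xi - mi for xi, mi in zip(x, mean)) for x in samples]
v = top_component(centred)
```

A sample that follows the learned correlation scores near zero, while one that breaks it (e.g., CPU metric normal but memory metric collapsed) scores high and would be flagged for verification.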

Proceedings ArticleDOI
12 Feb 2013
TL;DR: Shroud is presented, a general storage system that hides data access patterns from the servers running it, protecting user privacy, and shows, via new techniques such as oblivious aggregation, how to securely use many inexpensive secure coprocessors acting in parallel to improve request latency.
Abstract: Recent events have shown online service providers the perils of possessing private information about users. Encrypting data mitigates but does not eliminate this threat: the pattern of data accesses still reveals information. Thus, we present Shroud, a general storage system that hides data access patterns from the servers running it, protecting user privacy. Shroud functions as a virtual disk with a new privacy guarantee: the user can look up a block without revealing the block's address. Such a virtual disk can be used for many purposes, including map lookup, microblog search, and social networking. Shroud aggressively targets hiding accesses among hundreds of terabytes of data. We achieve our goals by adapting oblivious RAM algorithms to enable large-scale parallelization. Specifically, we show, via new techniques such as oblivious aggregation, how to securely use many inexpensive secure coprocessors acting in parallel to improve request latency. Our evaluation combines large-scale emulation with an implementation on secure coprocessors and suggests that these adaptations bring private data access closer to practicality.
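The core idea behind oblivious RAM, the primitive Shroud builds on, can be shown with the simplest (and slowest) scheme: a linear scan that touches every block per request, so the observed access pattern is identical regardless of which block was wanted. This is only a didactic baseline; Shroud's contribution is parallelizing far more efficient ORAM constructions across secure coprocessors.

```python
def oblivious_read(blocks, wanted):
    """Read every block so an observer of the access pattern learns nothing
    about which index was requested. Cost is O(n) per lookup -- exactly the
    overhead that real ORAM schemes work to reduce to polylog(n)."""
    result = None
    for i, block in enumerate(blocks):
        value = block            # identical access pattern for every request
        if i == wanted:
            result = value
    return result

store = [b"alice", b"bob", b"carol"]
```

A server watching this loop sees the same sequence of reads whether the client wanted block 0 or block 2, which is the privacy guarantee; the research problem is getting it without scanning everything.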

Journal ArticleDOI
TL;DR: This article presents a survey on enabling DCN technologies for future cloud infrastructures through which the huge amount of resources in data centers can be efficiently managed.
Abstract: The increasing adoption of cloud services is demanding the deployment of more data centers. Data centers typically house a huge amount of storage and computing resources, in turn dictating better networking technologies to connect the large number of computing and storage nodes. Data center networking (DCN) is an emerging field to study networking challenges in data centers. In this article, we present a survey on enabling DCN technologies for future cloud infrastructures through which the huge amount of resources in data centers can be efficiently managed. Specifically, we start with a detailed investigation of the architecture, technologies, and design principles for future DCN. Following that, we highlight some of the design challenges and open issues that should be addressed for future DCN to improve its energy efficiency and increase its throughput while lowering its cost.

Journal ArticleDOI
25 Nov 2013
TL;DR: In this article, the authors present a planning problem and an extremely efficient tabu search heuristic for optimizing the locations of cloud data centers and software components while simultaneously finding the information routing and network link capacities.
Abstract: The ubiquity of cloud applications requires the meticulous design of cloud networks with high quality of service, low costs, and low CO2 emissions. This paper presents a planning problem and an extremely efficient tabu search heuristic for optimizing the locations of cloud data centers and software components while simultaneously finding the information routing and network link capacities. The objectives are to optimize the network performance, the CO2 emissions, the capital expenditures (CAPEX), and the operational expenditures (OPEX). The problem is modeled using a mixed-integer programming model and solved with both an optimization solver and a tabu search heuristic. A case study of a web search engine is presented to explain and optimize the different aspects, showing how planners can use the model to direct the optimization and find the best solutions. The efficiency of the tabu search algorithm is presented for networks with up to 500 access nodes and 1,000 potential data center locations distributed around the globe.
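The tabu search idea can be illustrated on a toy one-dimensional facility-location instance: toggle moves open or close candidate data center sites, a fixed tenure forbids immediately undoing recent moves, and an aspiration rule readmits a tabu move that beats the global best. The paper's heuristic additionally handles routing, link capacities, CO2 and CAPEX/OPEX objectives; all positions and costs below are illustrative.

```python
import random

def total_cost(open_sites, sites, nodes, open_cost):
    """Serving distance plus a fixed cost per opened data center."""
    if not open_sites:
        return float("inf")
    serve = sum(min(abs(n - sites[s]) for s in open_sites) for n in nodes)
    return serve + open_cost * len(open_sites)

def tabu_search(sites, nodes, open_cost, iters=200, tenure=5, seed=0):
    rng = random.Random(seed)
    current = {rng.randrange(len(sites))}
    best = set(current)
    best_cost = total_cost(current, sites, nodes, open_cost)
    tabu = {}                      # site index -> iteration until which it is tabu
    for it in range(iters):
        moves = []
        for s in range(len(sites)):
            cand = current ^ {s}   # toggle site s open/closed
            c = total_cost(cand, sites, nodes, open_cost)
            # aspiration: a tabu move is allowed if it beats the global best
            if tabu.get(s, -1) < it or c < best_cost:
                moves.append((c, s, cand))
        c, s, cand = min(moves)    # best admissible neighbour (may be uphill)
        current, tabu[s] = cand, it + tenure
        if c < best_cost:
            best, best_cost = set(cand), c
    return best, best_cost

# Candidate locations and access-node positions on a line (illustrative).
sites = [0, 10, 20, 30, 40, 50]
nodes = [1, 2, 11, 41, 42]
best, cost = tabu_search(sites, nodes, open_cost=5)
```

Accepting the best admissible neighbour even when it is uphill, while remembering recent moves in the tabu list, is what lets the search escape local optima that a pure greedy descent would get stuck in.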

Proceedings ArticleDOI
25 Mar 2013
TL;DR: In this paper, the authors present quantitative evidence that this crucial design consideration to meet interactive performance criteria limits data center consolidation and describe an architectural solution that is a seamless extension of today's cloud computing infrastructure.
Abstract: The convergence of mobile computing and cloud computing enables new multimedia applications that are both resource-intensive and interaction-intensive. For these applications, end-to-end network bandwidth and latency matter greatly when cloud resources are used to augment the computational power and battery life of a mobile device. We first present quantitative evidence that this crucial design consideration to meet interactive performance criteria limits data center consolidation. We then describe an architectural solution that is a seamless extension of today's cloud computing infrastructure.

Proceedings Article
27 May 2013
TL;DR: A new embedding solution for data centers that, in addition to virtual machine placement, explicitly considers the relation between switches and links, allows multiple resources of the same request to be mapped to a single physical resource, and reduces resource fragmentation in terms of CPU is proposed.
Abstract: Virtualizing data center networks has been considered a feasible alternative to satisfy the requirements of advanced cloud services. Proper mapping of virtual data center (VDC) resources to their physical counterparts, also known as virtual data center embedding, can impact the revenue of cloud providers. Similar to virtual networks, the problem of mapping virtual requests to physical infrastructures is known to be NP-hard. Although some proposals have come up with heuristics to cope with the complexity of the embedding process focusing on virtual machine placement, these solutions ignore the correlation among other data center resources, such as switches and storage. In this paper, we propose a new embedding solution for data centers that, in addition to virtual machine placement, explicitly considers the relation between switches and links, allows multiple resources of the same request to be mapped to a single physical resource, and reduces resource fragmentation in terms of CPU. Simulations show that our solution results in a high acceptance ratio of VDC requests, improves utilization of the physical substrate, and generates increased revenue for infrastructure providers.
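The flavour of such an embedding heuristic can be sketched with a best-fit-decreasing placement that packs virtual machines by CPU demand to limit fragmentation, and that naturally maps multiple resources of one request onto the same physical host. This is an illustrative baseline only, not the switch- and link-aware algorithm proposed in the paper.

```python
def embed(vms, hosts):
    """Map each VM (name -> CPU demand) onto a host (name -> CPU capacity).
    Best-fit decreasing: place big demands first, each on the host whose
    free capacity it fills most tightly, to limit CPU fragmentation."""
    free = dict(hosts)                       # remaining CPU per host
    placement = {}
    for vm, demand in sorted(vms.items(), key=lambda kv: -kv[1]):
        feasible = [h for h, f in free.items() if f >= demand]
        if not feasible:
            return None                      # request rejected
        host = min(feasible, key=lambda h: free[h] - demand)  # best fit
        placement[vm] = host
        free[host] -= demand
    return placement

# One VDC request against two physical hosts (illustrative figures).
request = {"web": 4, "db": 8, "cache": 2}
hosts = {"h1": 10, "h2": 6}
```

Here "db" and "cache" end up co-located on the same host, showing how consolidating resources of one request onto one physical node keeps the substrate less fragmented and leaves room to accept further VDC requests.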