
Showing papers on "Cloud computing" published in 2008


Proceedings ArticleDOI
01 Nov 2008
TL;DR: In this article, the authors compare and contrast cloud computing with grid computing from various angles and give insights into the essential characteristics of both technologies.
Abstract: Cloud computing has become another buzzword after Web 2.0. However, there are dozens of different definitions for cloud computing and there seems to be no consensus on what a cloud is. On the other hand, cloud computing is not a completely new concept; it has an intricate connection to the relatively new but thirteen-year-established grid computing paradigm, and to other relevant technologies such as utility computing, cluster computing, and distributed systems in general. This paper strives to compare and contrast cloud computing with grid computing from various angles and give insights into the essential characteristics of both.

3,132 citations


Journal ArticleDOI
31 Dec 2008
TL;DR: The concept of Cloud Computing is discussed to achieve a complete definition of what a Cloud is, using the main characteristics typically associated with this paradigm in the literature.
Abstract: This paper discusses the concept of Cloud Computing to achieve a complete definition of what a Cloud is, using the main characteristics typically associated with this paradigm in the literature. More than 20 definitions have been studied allowing for the extraction of a consensus definition as well as a minimum definition containing the essential characteristics. This paper pays much attention to the Grid paradigm, as it is often confused with Cloud technologies. We also describe the relationships and distinctions between the Grid and Cloud approaches.

2,518 citations


Journal ArticleDOI
01 Jul 2008
TL;DR: As software migrates from local PCs to distant Internet servers, users and developers alike go along for the ride.
Abstract: As software migrates from local PCs to distant Internet servers, users and developers alike go along for the ride.

2,265 citations


Book
Luiz Andre Barroso1, Urs Hoelzle1
01 Jan 2008
TL;DR: The architecture of WSCs is described, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base are described.
Abstract: As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope it will be useful to architects and programmers of today's WSCs, as well as those of future many-core platforms which may one day implement the equivalent of today's WSCs on a single board. Table of Contents: Introduction / Workloads and Software Infrastructure / Hardware Building Blocks / Datacenter Basics / Energy and Power Efficiency / Modeling Costs / Dealing with Failures and Repairs / Closing Remarks

1,938 citations


Proceedings ArticleDOI
25 Sep 2008
TL;DR: The paper concludes with the need for convergence of competing IT paradigms to deliver the 21st century vision of computing.
Abstract: This keynote paper: presents a 21st century vision of computing; identifies various computing paradigms promising to deliver the vision of computing utilities; defines Cloud computing and provides the architecture for creating market-oriented Clouds by leveraging technologies such as VMs; provides thoughts on market-based resource management strategies that encompass both customer-driven service management and computational risk management to sustain SLA-oriented resource allocation; presents some representative Cloud platforms especially those developed in industries along with our current work towards realising market-oriented resource allocation of Clouds by leveraging the 3rd generation Aneka enterprise Grid technology; reveals our early thoughts on interconnecting Clouds for dynamically creating an atmospheric computing environment along with pointers to future community research; and concludes with the need for convergence of competing IT paradigms for delivering our 21st century vision.

1,827 citations


Proceedings ArticleDOI
08 Dec 2008
TL;DR: A new scheduling algorithm, Longest Approximate Time to End (LATE), is proposed that is highly robust to heterogeneity and can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.
Abstract: MapReduce is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. Hadoop is an open-source implementation of MapReduce enjoying wide adoption and is often used for short jobs where low response time is critical. Hadoop's performance is closely tied to its task scheduler, which implicitly assumes that cluster nodes are homogeneous and tasks make progress linearly, and uses these assumptions to decide when to speculatively re-execute tasks that appear to be stragglers. In practice, the homogeneity assumptions do not always hold. An especially compelling setting where this occurs is a virtualized data center, such as Amazon's Elastic Compute Cloud (EC2). We show that Hadoop's scheduler can cause severe performance degradation in heterogeneous environments. We design a new scheduling algorithm, Longest Approximate Time to End (LATE), that is highly robust to heterogeneity. LATE can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.

1,801 citations
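
The scheduling idea summarized in the entry above can be made concrete with a small sketch. LATE estimates each running task's time to end as (1 - progress) / progress_rate and speculatively re-executes the straggler with the longest estimate. The code below is a minimal, illustrative rendering of that heuristic under assumed names and a simple quartile-based slow-task cutoff; it is not the Hadoop implementation, which also caps the number of speculative copies and avoids slow nodes.

```python
import time
from dataclasses import dataclass

@dataclass
class RunningTask:
    task_id: str
    progress: float      # fraction of work completed, 0.0 .. 1.0
    start_time: float    # wall-clock time this attempt started

def estimated_time_to_end(task: RunningTask, now: float) -> float:
    """LATE's core estimate: remaining work divided by the observed progress rate."""
    elapsed = max(now - task.start_time, 1e-9)
    rate = max(task.progress / elapsed, 1e-9)          # progress per second
    return (1.0 - task.progress) / rate

def pick_speculative_candidate(tasks, now=None, slow_task_fraction=0.25):
    """Return the task with the longest estimated time to end, restricted to the
    slowest fraction of tasks by progress rate (an illustrative stand-in for
    LATE's SlowTaskThreshold)."""
    now = time.time() if now is None else now
    if not tasks:
        return None
    rate = lambda t: t.progress / max(now - t.start_time, 1e-9)
    cutoff = sorted(rate(t) for t in tasks)[max(int(len(tasks) * slow_task_fraction) - 1, 0)]
    slow = [t for t in tasks if rate(t) <= cutoff]
    return max(slow, key=lambda t: estimated_time_to_end(t, now), default=None)

# Example: task "b" progresses slowly and is far from done, so it is the one to speculate.
tasks = [RunningTask("a", 0.9, 0.0), RunningTask("b", 0.2, 0.0), RunningTask("c", 0.8, 0.0)]
print(pick_speculative_candidate(tasks, now=100.0).task_id)   # -> "b"
```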


Journal ArticleDOI
31 Dec 2008
TL;DR: This work examines the costs of cloud service data centers today and proposes (1) joint optimization of network and data center resources, and (2) new systems and mechanisms for geo-distributing state.
Abstract: The data centers used to create cloud services represent a significant investment in capital outlay and ongoing costs. Accordingly, we first examine the costs of cloud service data centers today. The cost breakdown reveals the importance of optimizing work completed per dollar invested. Unfortunately, the resources inside the data centers often operate at low utilization due to resource stranding and fragmentation. To attack this first problem, we propose (1) increasing network agility, and (2) providing appropriate incentives to shape resource consumption. Second, we note that cloud service providers are building out geo-distributed networks of data centers. Geo-diversity lowers latency to users and increases reliability in the presence of an outage taking out an entire site. However, without appropriate design and management, these geo-diverse data center networks can raise the cost of providing service. Moreover, leveraging geo-diversity requires services be designed to benefit from it. To attack this problem, we propose (1) joint optimization of network and data center resources, and (2) new systems and mechanisms for geo-distributing state.

1,756 citations


Proceedings ArticleDOI
01 Nov 2008
TL;DR: An ontology of this area is proposed which demonstrates a dissection of the cloud into five main layers, and illustrates their interrelations as well as their inter-dependency on preceding technologies.
Abstract: Progress of research efforts in a novel technology is contingent on having a rigorous organization of its knowledge domain and a comprehensive understanding of all the relevant components of this technology and their relationships. Cloud computing is one contemporary technology upon which the research community has recently embarked. Manifesting itself as the descendant of several other computing research areas such as service-oriented architecture, distributed and grid computing, and virtualization, cloud computing inherits their advancements and limitations. Towards the end-goal of a thorough comprehension of the field of cloud computing, and a more rapid adoption from the scientific community, we propose in this paper an ontology of this area which demonstrates a dissection of the cloud into five main layers, and illustrates their interrelations as well as their inter-dependency on preceding technologies. The contribution of this paper lies in being one of the first attempts to establish a detailed ontology of the cloud. Better comprehension of the technology would enable the community to design more efficient portals and gateways for the cloud, and facilitate the adoption of this novel computing approach in scientific environments. In turn, this will assist the scientific community to expedite its contributions and insights into this evolving computing field.

1,014 citations


Journal ArticleDOI
30 Dec 2008
TL;DR: The concept of “cloud” computing, some of the issues it tries to address, related research topics, and a “cloud” implementation available today are discussed.
Abstract: “Cloud” computing, a relatively recent term, builds on decades of research in virtualization, distributed computing, utility computing, and more recently networking, web and software services. It implies a service oriented architecture, reduced information technology overhead for the end-user, great flexibility, reduced total cost of ownership, on-demand services and many other things. This paper discusses the concept of “cloud” computing, some of the issues it tries to address, related research topics, and a “cloud” implementation available today.

945 citations


Proceedings Article
07 Dec 2008
TL;DR: The study reveals the energy-performance trade-offs for consolidation, shows that optimal operating points exist, and outlines the challenges in finding effective solutions to the consolidation problem.
Abstract: Consolidation of applications in cloud computing environments presents a significant opportunity for energy optimization. As a first step toward enabling energy-efficient consolidation, we study the inter-relationships between energy consumption, resource utilization, and performance of consolidated workloads. The study reveals the energy-performance trade-offs for consolidation and shows that optimal operating points exist. We model the consolidation problem as a modified bin packing problem and illustrate it with an example. Finally, we outline the challenges in finding effective solutions to the consolidation problem.
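
The abstract models consolidation as a modified bin packing problem. As a point of reference, the sketch below shows the classic first-fit-decreasing heuristic for packing workload demands onto identical servers; the paper's formulation additionally accounts for the energy and performance trade-offs discussed above, which this toy version deliberately leaves out.

```python
def first_fit_decreasing(demands, capacity):
    """Pack workload demands (e.g., normalized CPU utilization) onto as few
    servers of identical capacity as the heuristic manages to use."""
    servers = []                                  # each server: list of demands placed on it
    for d in sorted(demands, reverse=True):       # largest demands first
        for s in servers:
            if sum(s) + d <= capacity:
                s.append(d)
                break
        else:
            servers.append([d])                   # no existing server fits: open a new one
    return servers

# Six workloads consolidated onto servers of capacity 1.0.
print(first_fit_decreasing([0.5, 0.7, 0.2, 0.4, 0.1, 0.6], capacity=1.0))
# -> [[0.7, 0.2, 0.1], [0.6, 0.4], [0.5]]
```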

Proceedings ArticleDOI
15 Nov 2008
TL;DR: Using the Amazon cloud fee structure and a real-life astronomy application, the cost-performance trade-offs of different execution and resource provisioning plans are studied and it is shown that by provisioning the right amount of storage and compute resources, cost can be significantly reduced with no significant impact on application performance.
Abstract: Utility grids such as the Amazon EC2 cloud and Amazon S3 offer computational and storage resources that can be used on-demand for a fee by compute and data-intensive applications. The cost of running an application on such a cloud depends on the compute, storage and communication resources it will provision and consume. Different execution plans of the same application may result in significantly different costs. Using the Amazon cloud fee structure and a real-life astronomy application, we study via simulation the cost-performance trade-offs of different execution and resource provisioning plans. We also study these trade-offs in the context of the storage and communication fees of Amazon S3 when used for long-term application data archival. Our results show that by provisioning the right amount of storage and compute resources, cost can be significantly reduced with no significant impact on application performance.
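
The kind of trade-off the abstract studies can be sketched as a simple utility-style cost model: the cost of an execution plan is the sum of its compute, storage, and data-transfer charges, so plans that recompute intermediate data can be compared directly against plans that keep it in cloud storage. The rates and workload numbers below are illustrative placeholders, not figures from the paper or Amazon's 2008 price list.

```python
def plan_cost(cpu_hours, storage_gb_months, gb_transferred,
              price_cpu_hour, price_gb_month, price_gb_transfer):
    """Total cost of one execution/provisioning plan: compute + storage + transfer."""
    return (cpu_hours * price_cpu_hour
            + storage_gb_months * price_gb_month
            + gb_transferred * price_gb_transfer)

rates = dict(price_cpu_hour=0.10, price_gb_month=0.15, price_gb_transfer=0.17)  # placeholders

# Hypothetical plans for the same workflow: recompute intermediates vs. archive them.
recompute = plan_cost(cpu_hours=120, storage_gb_months=5, gb_transferred=20, **rates)
archive = plan_cost(cpu_hours=40, storage_gb_months=60, gb_transferred=20, **rates)
print(f"recompute: ${recompute:.2f}  archive: ${archive:.2f}")   # recompute: $16.15  archive: $16.40
```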

Proceedings ArticleDOI
23 Jun 2008
TL;DR: This paper discusses the concept of “cloud” computing, issues it tries to address, related research topics, and a “cloud” implementation available today.
Abstract: “Cloud” computing, a relatively recent term, builds on decades of research in virtualization, distributed computing, utility computing, and more recently networking, web and software services. It implies a service oriented architecture, reduced information technology overhead for the end-user, great flexibility, reduced total cost of ownership, on-demand services and many other things. This paper discusses the concept of “cloud” computing, issues it tries to address, related research topics, and a “cloud” implementation available today.

Proceedings ArticleDOI
25 Sep 2008
TL;DR: This paper reviews recent advances in Cloud computing, identifies the concepts and characteristics of scientific Clouds, and finally presents an example of a scientific Cloud for data centers.
Abstract: Cloud computing emerges as a new computing paradigm which aims to provide reliable, customized and QoS-guaranteed dynamic computing environments for end-users. This paper reviews recent advances in Cloud computing, identifies the concepts and characteristics of scientific Clouds, and finally presents an example of a scientific Cloud for data centers.

Patent
12 Aug 2008
TL;DR: In this article, the authors present a multi-cloud management module having a plurality of cloud adapters; each cloud adapter converts non-cloud-specific commands to cloud-specific provisioning commands for the cloud with which it is associated.
Abstract: In one embodiment the present invention includes a multi-cloud management module having a plurality of cloud adapters. The multi-cloud management module provides a unified administrative interface for provisioning cloud-based resources on any one of several clouds for which a cloud adapter is configured for use with the multi-cloud management module. Each cloud adapter converts non-cloud-specific commands to cloud-specific provisioning commands for the cloud with which it is associated.
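
The mechanism described in the patent is essentially the adapter pattern: a unified interface accepts provider-neutral provisioning requests, and each per-cloud adapter rewrites them into that provider's own commands. The sketch below uses two entirely fictional providers and command syntaxes; it illustrates the structure, not any real cloud API.

```python
from abc import ABC, abstractmethod

class CloudAdapter(ABC):
    """Translates non-cloud-specific provisioning requests for one provider."""
    @abstractmethod
    def provision(self, cpus: int, memory_gb: int) -> str: ...

class AcmeCloudAdapter(CloudAdapter):          # hypothetical provider
    def provision(self, cpus, memory_gb):
        return f"acme create-instance --vcpus {cpus} --ram {memory_gb}G"

class ExampleCloudAdapter(CloudAdapter):       # another hypothetical provider
    def provision(self, cpus, memory_gb):
        return f"examplecloud run --cpu={cpus} --mem={memory_gb}"

class MultiCloudManager:
    """Unified administrative interface: one provisioning call, many clouds."""
    def __init__(self):
        self._adapters = {}
    def register(self, name, adapter):
        self._adapters[name] = adapter
    def provision(self, cloud, cpus, memory_gb):
        return self._adapters[cloud].provision(cpus, memory_gb)

mgr = MultiCloudManager()
mgr.register("acme", AcmeCloudAdapter())
mgr.register("example", ExampleCloudAdapter())
print(mgr.provision("acme", cpus=2, memory_gb=4))
print(mgr.provision("example", cpus=2, memory_gb=4))
```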

Book
11 Aug 2008
TL;DR: This book teaches how to use web-based applications to collaborate on reports and presentations, share online calendars and to-do lists, manage large projects, and edit and store digital photographs.
Abstract: Cloud Computing: Web-Based Applications That Change the Way You Work and Collaborate On-Line. Computing as you know it has changed. No longer are you tied to using expensive programs stored on your computer. No longer will you be able to only access your data from one computer. No longer will you be tied to doing work only from your work computer or playing only from your personal computer. Enter cloud computing: an exciting new way to work with programs and data, collaborate with friends and family, share ideas with coworkers and friends, and most of all, be more productive! The cloud consists of thousands of computers and servers, all linked and accessible to you via the Internet. With cloud computing, everything you do is now web-based instead of being desktop-based; you can access all your programs and documents from any computer that's connected to the Internet. Whether you want to share photographs with your family, coordinate volunteers for a community organization, or manage a multi-faceted project in a large organization, cloud computing can help you do it more easily than ever before. Trust us. If you need to collaborate, cloud computing is the way to do it. Learn what cloud computing is, how it works, who should use it, and why it's the wave of the future. Explore the practical benefits of cloud computing, from saving money on expensive programs to accessing your documents ANYWHERE. See just how easy it is to manage work and personal schedules, share documents with coworkers and friends, edit digital photos, and much more! Learn how to use web-based applications to collaborate on reports and presentations, share online calendars and to-do lists, manage large projects, and edit and store digital photographs. Michael Miller is known for his casual, easy-to-read writing style and his ability to explain a wide variety of complex topics to an everyday audience. Mr. Miller has written more than 80 nonfiction books over the past two decades, with more than a million copies in print. His books for Que include Absolute Beginner's Guide to Computer Basics, Googlepedia: The Ultimate Google Resource, and Is It Safe?: Protecting Your Computer, Your Business, and Yourself Online. His website is located at www.molehillgroup.com. Covers the most popular cloud-based applications, including the following: Adobe Photoshop Express, Apple MobileMe, Glide OS, Google Docs, Microsoft Office Live Workspace, and Zoho Office. CATEGORY: Web Applications. COVERS: Cloud Computing. USER LEVEL: Beginner-Intermediate.

Patent
James Michael Ferris1
28 May 2008
TL;DR: In this article, the authors present a cloud management system configured to monitor and allocate the resources of a cloud computing environment, determining the cloud's current resource usage and available resources in order to allocate resources to a requested virtual machine.
Abstract: A cloud management system can be configured to monitor and allocate resources of a cloud computing environment. The cloud management system can be configured to receive a request to instantiate a virtual machine. In order to instantiate the virtual machine, the cloud management system can be configured to determine the current resource usage and available resources of the cloud in order to allocate resources to the requested virtual machine. The cloud management system can be configured to scale the resources of the cloud in the event that resources are not available for a requested virtual machine.
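
A minimal sketch of the decision flow described here, assuming a toy two-resource model (CPUs and memory): check current usage against capacity, allocate to the requested virtual machine when there is headroom, and otherwise signal that the cloud should be scaled. All names and the resource model are illustrative, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class CloudState:
    total_cpus: int
    total_memory_gb: int
    used_cpus: int = 0
    used_memory_gb: int = 0

def handle_instantiation_request(cloud: CloudState, req_cpus: int, req_memory_gb: int) -> str:
    """Allocate resources for a requested VM if available; otherwise ask for scaling."""
    free_cpus = cloud.total_cpus - cloud.used_cpus
    free_mem = cloud.total_memory_gb - cloud.used_memory_gb
    if req_cpus <= free_cpus and req_memory_gb <= free_mem:
        cloud.used_cpus += req_cpus
        cloud.used_memory_gb += req_memory_gb
        return "instantiated"
    return "scale_cloud"       # insufficient capacity: scale the cloud, then retry

cloud = CloudState(total_cpus=16, total_memory_gb=64)
print(handle_instantiation_request(cloud, req_cpus=4, req_memory_gb=8))    # -> instantiated
print(handle_instantiation_request(cloud, req_cpus=32, req_memory_gb=8))   # -> scale_cloud
```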

Proceedings ArticleDOI
07 Dec 2008
TL;DR: The results show that for Montage, a workflow with short job runtimes, the virtual environment can provide good compute time performance but it can suffer from resource scheduling delays and wide-area communications.
Abstract: This paper explores the use of cloud computing for scientific workflows, focusing on a widely used astronomy application, Montage. The approach is to evaluate, from the point of view of a scientific workflow, the tradeoffs between running in a local environment, if such is available, and running in a virtual environment via remote, wide-area network resource access. Our results show that for Montage, a workflow with short job runtimes, the virtual environment can provide good compute time performance but it can suffer from resource scheduling delays and wide-area communications.

Proceedings ArticleDOI
15 Nov 2008
TL;DR: The design of an agile data center with integrated server and storage virtualization technologies is described and a novel load balancing algorithm called VectorDot is proposed for handling the hierarchical and multi-dimensional resource constraints in such systems.
Abstract: We describe the design of an agile data center with integrated server and storage virtualization technologies. Such data centers form a key building block for new cloud computing architectures. We also show how to leverage this integrated agility for non-disruptive load balancing in data centers across multiple resource layers - servers, switches, and storage. We propose a novel load balancing algorithm called VectorDot for handling the hierarchical and multi-dimensional resource constraints in such systems. The algorithm, inspired by the successful Toyoda method for multi-dimensional knapsacks, is the first of its kind. We evaluate our system on a range of synthetic and real data center testbeds comprising VMware ESX servers, IBM SAN Volume Controller, and Cisco and Brocade switches. Experiments under varied conditions demonstrate the end-to-end validity of our system and the ability of VectorDot to efficiently remove overloads on server, switch and storage nodes.
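
The core of VectorDot, as the abstract presents it, is easy to illustrate: score each candidate node by the dot product of its per-resource utilization vector with the item's per-resource demand vector, so that heavily loaded dimensions repel items that stress those same dimensions, and place the item on the lowest-scoring node. The sketch below is only that core idea; it ignores the hierarchical server/switch/storage path constraints and the Toyoda-style normalization the paper handles.

```python
def vectordot_score(node_utilization, item_demand):
    """High where a loaded resource dimension meets a demanding item, low otherwise."""
    return sum(u * d for u, d in zip(node_utilization, item_demand))

def place(item_demand, nodes):
    """Pick the node (name -> utilization vector) with the lowest score."""
    return min(nodes, key=lambda name: vectordot_score(nodes[name], item_demand))

# Resource dimensions: (cpu, memory, network, storage I/O) as fractions of capacity.
nodes = {
    "host-a": (0.8, 0.3, 0.2, 0.1),   # CPU-hot
    "host-b": (0.2, 0.4, 0.3, 0.2),   # lightly loaded overall
}
cpu_heavy_vm = (0.5, 0.1, 0.1, 0.0)
print(place(cpu_heavy_vm, nodes))     # -> "host-b": the CPU-heavy VM avoids the CPU-hot host
```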

Journal ArticleDOI
TL;DR: An experiment, giving participants the option of using a tag cloud or a traditional search interface to answer various questions, found that where the information-seeking task required specific information, participants preferred the search interface.
Abstract: The weighted list, known popularly as a 'tag cloud', has appeared on many popular folksonomy-based web-sites. Flickr, Delicious, Technorati and many others have all featured a tag cloud at some point in their history. However, it is unclear whether the tag cloud is actually useful as an aid to finding information. We conducted an experiment, giving participants the option of using a tag cloud or a traditional search interface to answer various questions. We found that where the information-seeking task required specific information, participants preferred the search interface. Conversely, where the information-seeking task was more general, participants preferred the tag cloud. While the tag cloud is not without value, it is not sufficient as the sole means of navigation for a folksonomy-based dataset.

Journal ArticleDOI
Werner Vogels1
TL;DR: At the foundation of Amazon’s cloud computing are infrastructure services such as Amazon's S3 (Simple Storage Service), SimpleDB, and EC2 (Elastic Compute Cloud) that provide the resources for constructing Internet-scale computing platforms and a great variety of applications.
Abstract: At the foundation of Amazon’s cloud computing are infrastructure services such as Amazon’s S3 (Simple Storage Service), SimpleDB, and EC2 (Elastic Compute Cloud) that provide the resources for constructing Internet-scale computing platforms and a great variety of applications. The requirements placed on these infrastructure services are very strict; they need to score high marks in the areas of security, scalability, availability, performance, and cost effectiveness, and they need to meet these requirements while serving millions of customers around the globe, continuously.

Proceedings ArticleDOI
07 Dec 2008
TL;DR: The proposed approach uses the MapReduce paradigm to parallelize tools and manage their execution, machine virtualization to encapsulate their execution environments and commonly used data sets into flexibly deployable virtual machines, and networkvirtualization to connect resources behind firewalls/NATs while preserving the necessary performance and the communication environment.
Abstract: This paper proposes and evaluates an approach to the parallelization, deployment and management of bioinformatics applications that integrates several emerging technologies for distributed computing. The proposed approach uses the MapReduce paradigm to parallelize tools and manage their execution, machine virtualization to encapsulate their execution environments and commonly used data sets into flexibly deployable virtual machines, and network virtualization to connect resources behind firewalls/NATs while preserving the necessary performance and the communication environment. An implementation of this approach is described and used to demonstrate and evaluate the proposed approach. The implementation integrates Hadoop, Virtual Workspaces, and ViNe as the MapReduce, virtual machine and virtual network technologies, respectively, to deploy the commonly used bioinformatics tool NCBI BLAST on a WAN-based test bed consisting of clusters at two distinct locations, the University of Florida and the University of Chicago. This WAN-based implementation, called CloudBLAST, was evaluated against both non-virtualized and LAN-based implementations in order to assess the overheads of machine and network virtualization, which were shown to be insignificant. To compare the proposed approach against an MPI-based solution, CloudBLAST performance was experimentally contrasted against the publicly available mpiBLAST on the same WAN-based test bed. Both versions demonstrated performance gains as the number of available processors increased, with CloudBLAST delivering a speedup of 57 versus 52.4 for the MPI version when 64 processors on 2 sites were used. The results encourage the use of the proposed approach for the execution of large-scale bioinformatics applications on emerging distributed environments that provide access to computing resources as a service.
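
To put the reported numbers in perspective, the parallel efficiency implied by the speedups quoted above (64 processors across 2 sites) can be computed directly; the paper reports speedups, so the efficiencies below are derived, not quoted.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Parallel efficiency = achieved speedup / number of processors."""
    return speedup / processors

for name, speedup in [("CloudBLAST", 57.0), ("mpiBLAST", 52.4)]:
    print(f"{name}: speedup {speedup} on 64 processors -> "
          f"efficiency {parallel_efficiency(speedup, 64):.0%}")
# CloudBLAST: speedup 57.0 on 64 processors -> efficiency 89%
# mpiBLAST: speedup 52.4 on 64 processors -> efficiency 82%
```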

Proceedings ArticleDOI
Deelman, Singh, Livny, Berriman, Good 
01 Jan 2008

Patent
James Michael Ferris1
22 Aug 2008
TL;DR: In this patent, a cloud marketplace system determines the resource and service data for multiple cloud computing environments and selects a set of resource servers for instantiating virtual machines based on the specifications of the virtual machines and the parameters of the instantiation.
Abstract: A cloud marketplace system can be configured to communicate with multiple cloud computing environments in order to ascertain the details of the resources and services provided by the cloud computing environments for optimizing resources utilized by virtual machines. The cloud marketplace system can be configured to determine the resource and service data for the cloud computing environments and select a set of resource servers for instantiating the virtual machines based on specifications of the virtual machines and parameters of the instantiation. The cloud marketplace system can be configured to periodically monitor the cloud's resources and migrate the virtual machines if resources become available that more closely match the parameters of the virtual machines.
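
One way to read the selection and migration logic is as a matching problem over advertised offers. The sketch below assumes, for illustration only, that the "parameters of the instantiation" reduce to a price criterion among offers that satisfy the VM's CPU and memory specification; the patent leaves the criteria open, and all provider names are fictional.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ResourceOffer:
    provider: str
    cpus: int
    memory_gb: int
    price_per_hour: float

def best_match(offers, req_cpus, req_memory_gb) -> Optional[ResourceOffer]:
    """Among offers that satisfy the VM's specification, pick the cheapest."""
    feasible = [o for o in offers if o.cpus >= req_cpus and o.memory_gb >= req_memory_gb]
    return min(feasible, key=lambda o: o.price_per_hour, default=None)

def should_migrate(current, offers, req_cpus, req_memory_gb) -> bool:
    """Periodic check: migrate if a strictly cheaper feasible offer has appeared."""
    candidate = best_match(offers, req_cpus, req_memory_gb)
    return candidate is not None and candidate.price_per_hour < current.price_per_hour

offers = [ResourceOffer("cloud-x", 4, 8, 0.40), ResourceOffer("cloud-y", 8, 16, 0.35)]
chosen = best_match(offers, req_cpus=4, req_memory_gb=8)
print(chosen.provider)                                                                 # -> cloud-y
print(should_migrate(chosen, offers + [ResourceOffer("cloud-z", 4, 8, 0.20)], 4, 8))   # -> True
```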

01 Jan 2008
TL;DR: Examining the performance of Amazon EC2 for high-performance scientific applications reveals a significant performance gap that system builders, computational scientists, and commercial cloud computing vendors need to be aware of.
Abstract: How effective are commercial cloud computers for high-performance scientific computing compared to currently available alternatives? I aim to answer a specific instance of this question by examining the performance of Amazon EC2 for high-performance scientific applications. I used macro and micro benchmarks to study the performance of a cluster composed of EC2 high-CPU compute nodes and compared this against the performance of a cluster composed of equivalent processors available to the open scientific research community. My results show a significant performance gap in the examined clusters that system builders, computational scientists, and commercial cloud computing vendors need to be aware of.

Proceedings ArticleDOI
09 Jun 2008
TL;DR: The purpose of this paper is to demonstrate the opportunities and limitations of using S3 as a storage system for general-purpose database applications which involve small objects and frequent updates.
Abstract: There has been a great deal of hype about Amazon's simple storage service (S3). S3 provides infinite scalability and high availability at low cost. Currently, S3 is used mostly to store multi-media documents (videos, photos, audio) which are shared by a community of people and rarely updated. The purpose of this paper is to demonstrate the opportunities and limitations of using S3 as a storage system for general-purpose database applications which involve small objects and frequent updates. Read, write, and commit protocols are presented. Furthermore, the cost ($), performance, and consistency properties of such a storage system are studied.

01 Jan 2008
TL;DR: The Science Clouds provide EC2-style cycles to scientific projects; this document describes the enabling technologies and gives an early summary of the project's experiences.
Abstract: The Science Clouds provide EC2-style cycles to scientific projects. This document contains a description of technologies enabling this project and an early summary of its experiences.

Journal ArticleDOI
TL;DR: The nature and potential of cloud computing, the policy issues raised, and research questions related to cloud computing and policy are examined as a part of larger issues of public policy attempting to respond to rapid technological evolution.
Abstract: Cloud computing is a computing platform that resides in a large data center and is able to dynamically provide servers with the ability to address a wide range of needs, from scientific research to e-commerce. The provision of computing resources as if it were a utility such as electricity, while potentially revolutionary as a computing service, presents many major problems of information policy, including issues of privacy, security, reliability, access, and regulation. This article explores the nature and potential of cloud computing, the policy issues raised, and research questions related to cloud computing and policy. Ultimately, the policy issues raised by cloud computing are examined as a part of larger issues of public policy attempting to respond to rapid technological evolution.

Patent
31 Dec 2008
TL;DR: In this paper, the authors present a computer-implemented method comprising specifying configuration information for creating one or more software servers as images on a cloud computing system, specifying a processing load threshold, and continuously monitoring the processing load on one or multiple software servers.
Abstract: In one embodiment the present invention includes a computer-implemented method comprising specifying configuration information for creating one or more software servers as images on a cloud computing system, specifying a processing load threshold, and continuously monitoring a processing load on one or more software servers. If the monitored load exceeds the processing load threshold, a request to the cloud computing system may be generated to instantiate an instance of one of said images. The method further includes creating a server instance on the cloud in response to the request, distributing the processing load across the one or more servers and the server instance, and monitoring the processing load on the one or more servers and the server instance.
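
A compact sketch of the monitoring loop the patent describes, under the simplifying assumptions that the "processing load" is a single utilization number per server and that scaling adds one image instance at a time; scaling back down and redistributing the load to the new instance are left out.

```python
import statistics

def autoscale_step(current_servers: int, recent_loads, load_threshold: float) -> int:
    """One monitoring cycle: if the observed average load exceeds the configured
    threshold, request one more server instance from the cloud."""
    if statistics.mean(recent_loads) > load_threshold:
        return current_servers + 1      # generate a request to instantiate an image
    return current_servers

servers = 2
for loads in ([0.55, 0.60], [0.85, 0.90], [0.40, 0.45]):   # simulated load samples per cycle
    servers = autoscale_step(servers, loads, load_threshold=0.75)
    print(servers)   # -> 2, then 3, then 3
```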

Proceedings ArticleDOI
09 Dec 2008
TL;DR: This paper compares cloud computing with service computing and pervasive computing based on the classic model of computer architecture, and draws up a series of research questions in cloud computing for future exploration.
Abstract: Cloud computing is an emerging computing paradigm. It aims to share data, calculations, and services transparently among users of a massive grid. Although the industry has started selling cloud-computing products, research challenges in various areas, such as UI design, task decomposition, task distribution, and task coordination, are still unclear. Therefore, we study the methods to reason and model cloud computing as a step toward identifying fundamental research questions in this paradigm. In this paper, we compare cloud computing with service computing and pervasive computing. Both the industry and research community have actively examined these three computing paradigms. We draw a qualitative comparison among them based on the classic model of computer architecture. We finally evaluate the comparison results and draw up a series of research questions in cloud computing for future exploration.