Toward Secure and Dependable Storage Services in Cloud Computing

Cong Wang1, Qian Wang1, Kui Ren1, Ning Cao, Wenjing Lou 
01 Jan 2012-IEEE Transactions on Services Computing (IEEE Computer Society)-Vol. 5, Iss: 2, pp 220-232
TL;DR: This paper proposes a flexible distributed storage integrity auditing mechanism, utilizing the homomorphic token and distributed erasure-coded data, which is highly efficient and resilient against Byzantine failure, malicious data modification attack, and even server colluding attacks.
Abstract: Cloud storage enables users to remotely store their data and enjoy the on-demand high quality cloud applications without the burden of local hardware and software management. Though the benefits are clear, such a service is also relinquishing users' physical possession of their outsourced data, which inevitably poses new security risks toward the correctness of the data in cloud. In order to address this new problem and further achieve a secure and dependable cloud storage service, we propose in this paper a flexible distributed storage integrity auditing mechanism, utilizing the homomorphic token and distributed erasure-coded data. The proposed design allows users to audit the cloud storage with very lightweight communication and computation cost. The auditing result not only ensures strong cloud storage correctness guarantee, but also simultaneously achieves fast data error localization, i.e., the identification of misbehaving server. Considering the cloud data are dynamic in nature, the proposed design further supports secure and efficient dynamic operations on outsourced data, including block modification, deletion, and append. Analysis shows the proposed scheme is highly efficient and resilient against Byzantine failure, malicious data modification attack, and even server colluding attacks.

Summary (7 min read)

1 INTRODUCTION

  • Several trends are opening up the era of cloud computing, which is an Internet-based development and use of computer technology.
  • As a result, users are at the mercy of their cloud service providers (CSP) for the availability and integrity of their data [3], [4].
  • As a complementary approach, researchers have also proposed distributed protocols [23], [24], [25] for ensuring storage correctness across multiple servers or peers.
  • Section 5 gives the security analysis and performance evaluations, followed by Section 6 which overviews the related work.

2.1 System Model

  • A representative network architecture for cloud storage service is illustrated in Fig. 1.
  • In cloud data storage, a user stores his data through a CSP into a set of cloud servers, which are running in a simultaneous, cooperated, and distributed manner.
  • The most general forms of these operations the authors consider are block update, delete, insert, and append.
  • As users no longer possess their data locally, it is of critical importance to ensure users that their data are being correctly stored and maintained.
  • In case that users do not necessarily have the time, feasibility or resources to monitor their data online, they can delegate the data auditing tasks to an optional trusted TPA of their respective choices.

2.2 Adversary Model

  • From the user’s perspective, the adversary model has to capture all kinds of threats toward his cloud data integrity.
  • Because cloud data do not reside at the user’s local site but within the CSP’s address domain, these threats can come from two different sources: internal and external attacks.
  • Not only does it desire to move data that has not been or is rarely accessed to a lower tier of storage than agreed for monetary reasons, but it may also attempt to hide a data loss incident due to management errors, Byzantine failures, and so on.
  • Therefore, the authors consider that the adversary in their model has the following capabilities, which capture both external and internal threats toward cloud data integrity.
  • Specifically, the adversary is interested in continuously corrupting the user’s data files stored on individual servers.

2.3 Design Goals

  • To ensure the security and dependability of cloud data storage under the aforementioned adversary model, the authors aim to design efficient mechanisms for dynamic data verification and operation that achieve the following goals:
  • Storage correctness: to ensure users that their data are indeed stored appropriately and kept intact at all times in the cloud.
  • Fast localization of data error: to effectively locate the malfunctioning server when data corruption has been detected.
  • Dependability: to enhance data availability against Byzantine failures, malicious data modification, and server colluding attacks, i.e., to minimize the effect of data errors or server failures.
  • Lightweight: to enable users to perform storage correctness checks with minimum overhead.

2.4 Notation and Preliminaries

  • A — the dispersal matrix used for Reed-Solomon coding.
  • ver — a version number bound with the index of individual blocks, which records the number of times the block has been modified.
  • s_ij^ver — the seed for the PRF, which depends on the file name, the block index i, the server position j, as well as the optional block version number ver.

3 ENSURING CLOUD DATA STORAGE

  • In cloud data storage system, users store their data in the cloud and no longer possess the data locally.
  • Thus, the correctness and availability of the data files being stored on the distributed cloud servers must be guaranteed.
  • Besides, in the distributed case, when such inconsistencies are successfully detected, finding which server the data error lies in is also of great significance, since it is always the first step toward quickly recovering the storage errors and/or identifying potential threats from external attacks.
  • Finally, the authors describe how to extend their scheme to third party auditing with only slight modification of the main design.

3.1 File Distribution Preparation

  • It is well known that erasure-correcting code may be used to tolerate multiple failures in distributed storage systems.
  • For support of efficient sequential I/O to the original file, their file layout is systematic, i.e., the unmodified m data file vectors together with the k parity vectors are distributed across m + k different servers.
  • All these blocks are elements of GF(2^p).

3.2 Challenge Token Precomputation

  • In order to achieve assurance of data storage correctness and data error localization simultaneously, their scheme entirely relies on the precomputed verification tokens.
  • Later, when the user wants to verify the storage correctness of the data in the cloud, he challenges the cloud servers with a set of randomly generated block indices.
  • In their case here, the user stores them locally to obviate the need for encryption and lower the bandwidth overhead during dynamic data operation which will be discussed shortly.
  • The details of token generation are shown in Algorithm 1 (Token Precomputation).
  • The authors will discuss the necessity of using blinded parities in detail in Section 5.2.

3.3 Correctness Verification and Error Localization

  • Error localization is a key prerequisite for eliminating errors in storage systems.
  • Many previous schemes [23], [24] do not explicitly consider the problem of data error localization, thus only providing binary results for the storage verification.
  • Specifically, the procedure of the ith challenge-response for a cross-check over the n servers is described as follows: 1. The user reveals the challenge value α_i as well as the ith permutation key k_prp^(i) to each server.
  • Otherwise, it indicates that among those specified rows, there exist file block corruptions.
  • That is, their approach can identify any number of misbehaving servers for b ≤ (m + k).

3.4 File Retrieval and Error Recovery

  • Since their layout of file matrix is systematic, the user can reconstruct the original file by downloading the data vectors from the first m servers, assuming that they return the correct response values.
  • Notice that their verification scheme is based on random spot-checking, so the storage correctness assurance is a probabilistic one.
  • By choosing the system parameters (e.g., r, l, t) appropriately and conducting verification often enough, the authors can guarantee successful file retrieval with high probability.
  • On the other hand, whenever the data corruption is detected, the comparison of precomputed tokens and received response values can guarantee the identification of misbehaving server(s) (again with high probability), which will be discussed shortly.
  • Algorithm 3 (error recovery): assume block corruptions have been detected among the specified r rows, and that s ≤ k servers have been identified as misbehaving; download the r rows of blocks from the servers; treat the s servers as erasures and recover the blocks (a minimal recovery sketch follows this list).
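The following is a minimal sketch of that recovery step under illustrative assumptions: the prime field GF(65537) stands in for the GF(2^16) used in the paper, blinding is assumed already removed from any parity blocks that are used, and `A` is the dispersal matrix from the file distribution preparation. Once s ≤ k servers are flagged, their columns are treated as erasures and each row is recovered from any m healthy servers by inverting the corresponding m x m submatrix of A. This is a sketch, not the authors' implementation.

```python
# Illustrative sketch: recover data blocks by treating flagged servers as erasures.
Q = 65537  # prime stand-in field for GF(2^16)

def invert(M):
    """Invert an m x m matrix over GF(Q) by Gauss-Jordan elimination."""
    m = len(M)
    A = [row[:] + [int(r == c) for c in range(m)] for r, row in enumerate(M)]
    for col in range(m):
        piv = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], -1, Q)
        A[col] = [(x * inv) % Q for x in A[col]]
        for r in range(m):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(a - f * b) % Q for a, b in zip(A[r], A[col])]
    return [row[m:] for row in A]

def recover_row(G_row, A, good, m):
    """Recover one row of F from any m trustworthy servers.

    G_row: the n received (unblinded) blocks of this row.
    good : indices of m servers not flagged as misbehaving.
    Since G_row = F_row * A, restricting to the 'good' columns gives
    F_row = G_row[good] * inverse(A[:, good])."""
    sub = [[A[i][j] for j in good] for i in range(m)]      # m x m submatrix of A
    sub_inv = invert(sub)
    return [sum(G_row[good[a]] * sub_inv[a][i] for a in range(m)) % Q
            for i in range(m)]
```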

3.5 Toward Third Party Auditing

  • As discussed in their architecture, in case the user does not have the time, feasibility, or resources to perform the storage correctness verification, he can optionally delegate this task to an independent third-party auditor, making the cloud storage publicly verifiable.
  • As pointed out by the recent work [30], [31], to securely introduce an effective TPA, the auditing process should bring in no new vulnerabilities toward user data privacy.
  • Now the authors show that with only slight modification, their protocol can support privacy-preserving third party auditing.
  • The new design is based on the observation of linear property of the parity vector blinding process.
  • Thus, the overall computation overhead and communication overhead remain roughly the same.

4 PROVIDING DYNAMIC DATA OPERATION SUPPORT

  • This largely static data model may fit some application scenarios, such as libraries and scientific data sets.
  • Therefore, it is crucial to consider the dynamic case, where a user may wish to perform various block-level operations of update, delete, and append to modify the data file while maintaining the storage correctness assurance.
  • On the other hand, users need to ensure that all dynamic data operation requests have been faithfully processed by the CSP.
  • Only with the accordingly updated storage verification tokens can the previously discussed challenge-response protocol be carried out successfully even after data dynamics.

4.1 Update Operation

  • In cloud data storage, a user may need to modify some data block(s) stored in the cloud, from its current value f_ij to a new one, f_ij + Δf_ij.
  • Fig. 2 gives the high level logical representation of data block update.
  • In other words, for all the unused tokens, the user needs to exclude every occurrence of the old data block and replace it with the new one.
  • Thanks to the homomorphic construction of their verification token, the user can perform the token update efficiently (a minimal sketch follows this list).
  • Note that by using the new seed s_ij^ver for the PRF every time (for a block update operation), the authors can ensure the freshness of the random value embedded into the parity blocks.
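Because each token is a linear combination of the blocks it covers, an unused token can absorb a block update without being recomputed from scratch. A hedged sketch of this refresh, with illustrative names and the prime field GF(65537) standing in for GF(2^16):

```python
# Illustrative sketch: homomorphic refresh of an unused token after a block update.
Q = 65537  # prime stand-in field for GF(2^16)

def refresh_token(v_old, alpha, sampled_indices, block_index, delta_f):
    """
    v_old           : old token v_i^(j) over GF(Q)
    alpha           : challenge value alpha_i used when the token was precomputed
    sampled_indices : the r block indices I_1..I_r covered by this token
    block_index     : index of the updated block
    delta_f         : new_block - old_block (mod Q)
    Tokens are linear in the blocks, so only the term covering the updated
    block has to change: v_new = v_old + alpha^q * delta_f.
    """
    if block_index not in sampled_indices:
        return v_old                      # token does not cover the updated block
    q = sampled_indices.index(block_index) + 1
    return (v_old + pow(alpha, q, Q) * delta_f) % Q
```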

4.2 Delete Operation

  • Sometimes, after being stored in the cloud, certain data blocks may need to be deleted.
  • The delete operation the authors consider is a general one, in which the user replaces the data block with zero or some special reserved data symbol.
  • From this point of view, the delete operation is actually a special case of the data update operation, where the original data blocks can be replaced with zeros or some predetermined special blocks.
  • In practice, it is possible that only a fraction of tokens need amendment, since the updated blocks may not be covered by all the tokens.
  • Also, all the affected tokens have to be modified and the updated parity information has to be blinded using the same method specified in an update operation.

4.3 Append Operation

  • In some cases, the user may want to increase the size of his stored data by adding blocks at the end of the data file, which the authors refer to as data append.
  • This idea of supporting block append was first suggested by Ateniese et al. [14] in a single-server setting, and it relies on both an initial budget for the maximum anticipated data size l_max in each encoded data vector and the system parameter r_max = ⌈r · (l_max / l)⌉ for each precomputed challenge-response token (a small worked illustration follows this list).
  • Because the cloud servers and the user agree on the number of existing blocks in each vector G^(j), the servers will follow exactly the above procedure when recomputing the token values upon receiving the user’s challenge request.
  • Now when the user is ready to append new blocks, i.e., both the file blocks and the corresponding parity blocks are generated, the total length of each vector G^(j) will increase and fall into the range [l, l_max].
  • The parity blinding is similar to that introduced in the update operation, and thus is omitted here.
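A quick worked illustration of the append budget r_max = ⌈r · (l_max / l)⌉. The concrete values of l, l_max, and r below are our own illustrative choices, not values from the paper:

```python
# Worked illustration of the append budget: r_max = ceil(r * (l_max / l)).
from math import ceil

l, l_max, r = 1_000_000, 2_000_000, 460   # illustrative values only
r_max = ceil(r * l_max / l)
print(r_max)   # 920 indices budgeted per token to also cover future appended blocks
```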

4.4 Insert Operation

  • An insert operation to the data file refers to an append operation at the desired index position while maintaining the same data block structure for the whole data file, i.e., inserting a block F[j] corresponds to shifting all blocks starting with index j+1 by one slot.
  • Thus, an insert operation may affect many rows in the logical data file matrix F, and a substantial number of computations are required to renumber all the subsequent blocks as well as recompute the challenge-response tokens.
  • In order to fully support the block insertion operation, recent work [15], [16] suggests utilizing an additional data structure (for example, a Merkle Hash Tree [32]) to maintain and enforce the block index information (a rough sketch follows this list).
  • On the other hand, using the block index mapping information, the user can still access or retrieve the file as it is.
  • Note that as a tradeoff, the extra data structure information has to be maintained locally on the user side.
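As a rough illustration of the kind of index-binding structure mentioned above (not the exact construction of [15], [16], or [32]), the sketch below computes a Merkle root over the blocks; the user would keep only the root locally and could later verify any block's value and position against it:

```python
# Illustrative Merkle-root computation over data blocks (index/content binding).
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Return the Merkle root of a list of byte-string blocks, duplicating the
    last node at odd levels (a common convention; illustrative only)."""
    level = [_h(b) for b in blocks]
    if not level:
        return _h(b"")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```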

5 SECURITY ANALYSIS AND PERFORMANCE EVALUATION

  • The authors analyze their proposed scheme in terms of correctness, security, and efficiency.
  • The authors’ security analysis focuses on the adversary model defined in Section 2.
  • The authors also evaluate the efficiency of their scheme via implementation of both file distribution preparation and verification token precomputation.

5.1 Correctness Analysis

  • First, the authors analyze the correctness of the verification procedure.
  • Thus, as long as each server operates on the same specified subset of rows, the above checking equation will always hold.

5.2.1 Detection Probability against Data Modification

  • In their scheme, servers are required to operate only on specified rows in each challenge-response protocol execution.
  • Suppose n_c servers are misbehaving due to possible compromise or Byzantine failure.
  • The authors defer the explanation of their scheme’s collusion resistance against this worst case scenario to a later section.
  • Next, the authors study the probability of a false negative result, i.e., that at least one invalid response is calculated from those specified r rows, but the checking equation still holds.
  • From the figure the authors can see that if more than a certain fraction of the data file is corrupted, then it suffices to challenge a small constant number of rows in order to achieve detection with high probability (a back-of-the-envelope illustration follows this list).
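A back-of-the-envelope illustration of this effect (an approximation, not the paper's exact expression): if a fraction z of the l rows is corrupted and r rows are sampled uniformly at random per challenge, the chance of hitting at least one corrupted row is roughly 1 − (1 − z)^r, so a small constant r already yields high detection probability once z is non-negligible.

```python
# Rough illustration: detection probability of random spot-checking.
# If a fraction z of rows is corrupted and r rows are sampled per challenge,
# P(detect) ~ 1 - (1 - z)^r  (approximation; the paper derives the exact bound).
def detection_probability(z: float, r: int) -> float:
    return 1.0 - (1.0 - z) ** r

for z in (0.01, 0.05, 0.10):
    for r in (100, 300, 500):
        print(f"z={z:.2f} r={r}: P ~ {detection_probability(z, r):.4f}")
```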

5.2.2 Identification Probability for Misbehaving Servers

  • The authors have shown that, if the adversary modifies the data blocks among any of the data storage servers, their sampling checking scheme can successfully detect the attack with high probability.
  • As long as the data modification is caught, the user will further determine which server is malfunctioning.
  • The identification probability is the product of the matching probability of the sampling check and the probability of the complementary event of the false negative result.
  • Note that if the number of detected misbehaving servers is less than the number of parity vectors k, the authors can use the erasure-correcting code to recover the corrupted data, achieving storage dependability as shown in Section 3.4 and Algorithm 3.

5.2.3 Security Strength against Worst Case Scenario

  • The authors now explain why it is a must to blind the parity blocks and how their proposed schemes achieve collusion resistance against the worst case scenario in the adversary model.
  • Recall that in the file distribution preparation, the redundancy parity vectors are calculated via multiplying the file matrix F by P, where P is the secret parity generation matrix the authors later rely on for storage correctness assurance.
  • Once they have the knowledge of P, those malicious servers can consequently modify any part of the data blocks and calculate the corresponding parity blocks, and vice versa, making their codeword relationship always consistent.
  • Therefore, their storage correctness challenge scheme would be undermined—even if those modified blocks are covered by the specified rows, the storage correctness check equation would always hold.
  • By blinding each parity block with random perturbation, the malicious servers no longer have all the necessary information to build up the correct linear equation groups and therefore cannot derive the secret matrix P.

5.3 Performance Evaluation

  • The authors now assess the performance of the proposed storage auditing scheme.
  • Algorithms are implemented using open-source erasure coding library Jerasure [34] written in C.
  • All results represent the mean of 20 trials.

5.3.1 File Distribution Preparation

  • As discussed, file distribution preparation includes the generation of parity vectors (the encoding part) as well as the corresponding parity blinding part.
  • The authors consider two sets of different parameters for the (m, k) Reed-Solomon encoding, both of which work over GF(2^16).
  • Fig. 4 shows the total cost for preparing a 1 GB file before outsourcing.
  • From the figure, the authors can see the number k is the dominant factor for the cost of both parity generation and parity blinding.
  • For the same reason, the two-layer coding structure makes the solution in [23] more suitable for static data only, as any change to the contents of file F must propagate through the two-layer error-correcting code, which entails both high communication and computation complexity.

5.3.2 Challenge Token Computation

  • When t is selected to be 7,300 and 14,600, the data file can be verified every day for the next 20 years and 40 years, respectively, which should be of enough use in practice.
  • With the Jerasure library [34], the multiplication over GF(2^16) in their experiment is based on discrete logarithms.
  • Other parameters follow the file distribution preparation setting.
  • The authors’ implementation shows that the average token precomputation cost is about 0.4 ms.
  • Note that each token is only an element of the field GF(2^16); the extra storage for those precomputed tokens is less than 1 MB and thus can be neglected (a quick worked check follows this list).
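A quick worked check of that storage claim, under our own illustrative assumption of n = m + k = 14 encoded vectors (the paper does not fix n here):

```python
# Worked check of the "< 1 MB" token-storage claim (n = 14 is an illustrative choice).
t = 7300          # tokens per vector: one challenge per day for roughly 20 years
n = 14            # number of encoded vectors (m + k); assumed for illustration
token_bytes = 2   # each token is a single GF(2^16) element
total = t * n * token_bytes
print(total, "bytes =", round(total / 2**20, 3), "MiB")   # ~0.195 MiB, well under 1 MB
```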

7 CONCLUSION

  • The authors investigate the problem of data security in cloud data storage, which is essentially a distributed storage system.
  • The authors rely on erasure-correcting code in the file distribution preparation to provide redundancy parity vectors and guarantee the data dependability.
  • By utilizing the homomorphic token with distributed verification of erasure-coded data, their scheme achieves the integration of storage correctness insurance and data error localization, i.e., whenever data corruption has been detected during the storage correctness verification across the distributed servers, the authors can almost guarantee the simultaneous identification of the misbehaving server(s).
  • Considering the time, computation resources, and even the related online burden of users, the authors also provide an extension of the proposed main scheme to support third-party auditing, where users can safely delegate the integrity checking tasks to third-party auditors and be worry-free in using the cloud storage services.
  • Through detailed security analysis and extensive experiment results, the authors show that their scheme is highly efficient and resilient to Byzantine failure, malicious data modification attack, and even server colluding attacks.


Toward Secure and Dependable
Storage Services in Cloud Computing
Cong Wang, Student Member, IEEE, Qian Wang, Student Member, IEEE,
Kui Ren, Senior Member, IEEE, Ning Cao, and Wenjing Lou, Senior Member, IEEE
Abstract—Cloud storage enables users to remotely store their data and enjoy the on-demand high quality cloud applications without
the burden of local hardware and software management. Though the benefits are clear, such a service is also relinquishing users’
physical possession of their outsourced data, which inevitably poses new security risks toward the correctness of the data in cloud. In
order to address this new problem and further achieve a secure and dependable cloud storage service, we propose in this paper a
flexible distributed storage integrity auditing mechanism, utilizing the homomorphic token and distributed erasure-coded data. The
proposed design allows users to audit the cloud storage with very lightweight communication and computation cost. The auditing result
not only ensures strong cloud storage correctness guarantee, but also simultaneously achieves fast data error localization, i.e., the
identification of misbehaving server. Considering the cloud data are dynamic in nature, the proposed design further supports secure
and efficient dynamic operations on outsourced data, including block modification, deletion, and append. Analysis shows the proposed
scheme is highly efficient and resilient against Byzantine failure, malicious data modification attack, and even server colluding attacks.
Index Terms—Data integrity, dependable distributed storage, error localization, data dynamics, cloud computing.
1 INTRODUCTION
Several trends are opening up the era of cloud computing,
which is an Internet-based development and use of
computer technology. The ever cheaper and more powerful
processors, together with the Software as a Service (SaaS)
computing architecture, are transforming data centers into
pools of computing service on a huge scale. The increasing
network bandwidth and reliable yet flexible network
connections make it even possible that users can now
subscribe high quality services from data and software that
reside solely on remote data centers.
Moving data into the cloud offers great convenience to
users since they don’t have to care about the complexities
of direct hardware management. The pioneer of cloud
computing vendors, Amazon Simple Storage Service (S3),
and Amazon Elastic Compute Cloud (EC2) [2] are both
well-known examples. While these internet-based online
services do provide huge amounts of storage space
and customizable computing resources, this computing
platform shift, however, is eliminating the responsibility of
local machines for data maintenance at the same time. As a
result, users are at the mercy of their cloud service
providers (CSP) for the availability and integrity of their
data [3], [4]. On the one hand, although the cloud
infrastructures are much more powerful and reliable than
personal computing devices, a broad range of both internal
and external threats to data integrity still exists. Examples
of outages and data loss incidents of noteworthy cloud
storage services appear from time to time [5], [6], [7], [8],
[9]. On the other hand, since users may not retain a local
copy of outsourced data, there exist various incentives for
CSP to behave unfaithfully toward the cloud users
regarding the status of their outsourced data. For example,
to increase the profit margin by reducing cost, it is possible
for CSP to discard rarely accessed data without being
detected in a timely fashion [10]. Similarly, CSP may even
attempt to hide data loss incidents so as to maintain a
reputation [11], [12], [13]. Therefore, although outsourcing
data into the cloud is economically attractive for the cost
and complexity of long-term large-scale data storage, its
lack of strong assurance of data integrity and
availability may impede its wide adoption by both
enterprise and individual cloud users.
In order to achieve the assurances of cloud data integrity
and availability and enforce the quality of cloud storage
service, efficient methods that enable on-demand data
correctness verification on behalf of cloud users have to
be designed. However, the fact that users no longer have
physical possession of data in the cloud prohibits the direct
adoption of traditional cryptographic primitives for the
purpose of data integrity protection. Hence, the verification
of cloud storage correctness must be conducted without
explicit knowledge of the whole data files [10], [11], [12],
[13]. Meanwhile, cloud storage is not just a third party data
warehouse. The data stored in the cloud may not only be
. C. Wang is with the Department of Electrical and Computer Engineering,
Illinois Institute of Technology, 1451 East 55th St., Apt. 1017 N, Chicago,
IL 60616. E-mail: cwang55@iit.edu.
. Q. Wang is with the Department of Electrical and Computer Engineering,
Illinois Institute of Technology, 500 East 33rd St., Apt. 602, Chicago, IL
60616. E-mail: qwang38@iit.edu.
. K. Ren is with the Department of Electrical and Computer Engineering,
Illinois Institute of Technology, 3301 Dearborn St., Siegel Hall 319,
Chicago, IL 60616. E-mail: kren@ece.iit.edu.
. N. Cao is with the Department of Electrical and Computer Engineering,
Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA
01609. E-mail: ncao@wpi.edu.
. W. Lou is with the Department of Computer Science, Virginia Polytechnic
Institute and State University, Falls Church, VA 22043.
E-mail: wjlou@vt.edu.
Manuscript received 4 Apr. 2010; revised 14 Sept. 2010; accepted 25 Dec.
2010; published online 6 May 2011.
For information on obtaining reprints of this article, please send e-mail to:
tsc@computer.org and reference IEEECS Log Number TSCSI-2010-04-0033.
Digital Object Identifier no. 10.1109/TSC.2011.24.
1939-1374/12/$31.00 © 2012 IEEE. Published by the IEEE Computer Society.

accessed but also be frequently updated by the users [14],
[15], [16], including insertion, deletion, modification, ap-
pending, etc. Thus, it is also imperative to support the
integration of this dynamic feature into the cloud storage
correctness assurance, which makes the system design even
more challenging. Last but not least, the deployment of
cloud computing is powered by data centers running in a
simultaneous, cooperated, and distributed manner [3]. It is
more advantageous for individual users to store their data
redundantly across multiple physical servers so as to
reduce the data integrity and availability threats. Thus,
distributed protocols for storage correctness assurance will
be of most importance in achieving robust and secure cloud
storage systems. However, this important area remains to
be fully explored in the literature.
Recently, the importance of ensuring the remote data
integrity has been highlighted by the following research
works under different system and security models [10], [11],
[12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22]. These
techniques, while useful for ensuring storage correctness
without requiring users to possess the data locally, all focus
on the single-server scenario. They may be useful for
quality-of-service testing [23], but do not guarantee the
data availability in case of server failures. Although directly
applying these techniques to distributed storage (multiple
servers) could be straightforward, the resulting storage
verification overhead would be linear in the number of
servers. As a complementary approach, researchers have
also proposed distributed protocols [23], [24], [25] for
ensuring storage correctness across multiple servers or peers.
However, while providing efficient cross-server storage
verification and data availability insurance, these schemes
all focus on static or archival data. As a result, their
capabilities of handling dynamic data remain unclear,
which inevitably limits their full applicability in cloud
storage scenarios.
In this paper, we propose an effective and flexible
distributed storage verification scheme with explicit dy-
namic data support to ensure the correctness and avail-
ability of users’ data in the cloud. We rely on erasure-
correcting code in the file distribution preparation to
provide redundancies and guarantee the data dependability
against Byzantine servers [26], where a storage server may
fail in arbitrary ways. This construction drastically reduces
the communication and storage overhead as compared to
the traditional replication-based file distribution techniques.
By utilizing the homomorphic token with distributed
verification of erasure-coded data, our scheme achieves
the storage correctness insurance as well as data error
localization: whenever data corruption has been detected
during the storage correctness verification, our scheme can
almost guarantee the simultaneous localization of data
errors, i.e., the identification of the misbehaving server(s). In
order to strike a good balance between error resilience and
data dynamics, we further explore the algebraic property of
our token computation and erasure-coded data, and
demonstrate how to efficiently support dynamic operation
on data blocks, while maintaining the same level of storage
correctness assurance. In order to save the time, computa-
tion resources, and even the related online burden of users,
we also provide the extension of the proposed main scheme
to support third-party auditing, where users can safely
delegate the integrity checking tasks to third-party auditors
(TPA) and be worry-free to use the cloud storage services.
Our work is among the first few ones in this field to
consider distributed data storage security in cloud comput-
ing. Our contribution can be summarized as the following
three aspects: 1) Compared to many of its predecessors,
which only provide binary results about the storage status
across the distributed servers, the proposed scheme
achieves the integration of storage correctness insurance
and data error localization, i.e., the identification of
misbehaving server(s). 2) Unlike most prior works for
ensuring remote data integrity, the new scheme further
supports secure and efficient dynamic operations on data
blocks, including: update, delete, and append. 3) The
experiment results demonstrate the proposed scheme is
highly efficient. Extensive security analysis shows our
scheme is resilient against Byzantine failure, malicious data
modification attack, and even server colluding attacks.
The rest of the paper is organized as follows: Section 2
introduces the system model, adversary model, our design
goal, and notations. Then we provide the detailed
description of our scheme in Sections 3 and 4. Section 5
gives the security analysis and performance evaluations,
followed by Section 6 which overviews the related work.
Finally, Section 7 concludes the whole paper.
2 PROBLEM STATEMENT
2.1 System Model
A representative network architecture for cloud storage
service architecture is illustrated in Fig. 1. Three different
network entities can be identified as follows:
. User: an entity, who has data to be stored in the
cloud and relies on the cloud for data storage and
computation, can be either enterprise or individual
customers.
. Cloud Server (CS): an entity, which is managed by
cloud service provider (CSP) to provide data storage
service and has significant storage space and
computation resources (we will not differentiate CS
and CSP hereafter).
. Third-Party Auditor: an optional TPA, who has
expertise and capabilities that users may not have, is
trusted to assess and expose risk of cloud storage
services on behalf of the users upon request.
In cloud data storage, a user stores his data through a
CSP into a set of cloud servers, which are running in a
Fig. 1. Cloud storage service architecture.

simultaneous, cooperated, and distributed manner. Data
redundancy can be employed with a technique of erasure-
correcting code to further tolerate faults or server crash as
user’s data grow in size and importance. Thereafter, for
application purposes, the user interacts with the cloud
servers via CSP to access or retrieve his data. In some cases,
the user may need to perform block level operations on his
data. The most general forms of these operations we are
considering are block update, delete, insert, and append.
Note that in this paper, we put more focus on the support of
file-oriented cloud applications rather than nonfile applica-
tion data, such as social networking data. In other words,
the cloud data we are considering are not expected to be
rapidly changing in a relatively short period.
As users no longer possess their data locally, it is of
critical importance to ensure users that their data are being
correctly stored and maintained. That is, users should be
equipped with security means so that they can make
continuous correctness assurance (to enforce cloud storage
service-level agreement) of their stored data even without
the existence of local copies. In case that users do not
necessarily have the time, feasibility or resources to monitor
their data online, they can delegate the data auditing tasks
to an optional trusted TPA of their respective choices.
However, to securely introduce such a TPA, any possible
leakage of user’s outsourced data toward TPA through the
auditing protocol should be prohibited.
In our model, we assume that the point-to-point
communication channels between each cloud server and
the user are authenticated and reliable, which can be
achieved in practice with little overhead. These authentica-
tion handshakes are omitted in the following presentation.
2.2 Adversary Model
From user’s perspective, the adversary model has to capture
all kinds of threats toward his cloud data integrity. Because
cloud data do not reside at user’s local site but at CSP’s
address domain, these threats can come from two different
sources: internal and external attacks. For internal attacks, a
CSP can be self-interested, untrusted, and possibly mal-
icious. Not only does it desire to move data that has not
been or is rarely accessed to a lower tier of storage than
agreed for monetary reasons, but it may also attempt to hide
a data loss incident due to management errors, Byzantine
failures, and so on. For external attacks, data integrity
threats may come from outsiders who are beyond the
control domain of CSP, for example, the economically
motivated attackers. They may compromise a number of
cloud data storage servers in different time intervals and
subsequently be able to modify or delete users’ data while
remaining undetected by CSP.
Therefore, we consider the adversary in our model has
the following capabilities, which captures both external and
internal threats toward the cloud data integrity. Specifically,
the adversary is interested in continuously corrupting the
user’s data files stored on individual servers. Once a server
is compromised, an adversary can pollute the original data files
by modifying or introducing its own fraudulent data to
prevent the original data from being retrieved by the user.
This corresponds to the threats from external attacks. In the
worst case scenario, the adversary can compromise all the
storage servers so that he can intentionally modify the data
files as long as they are internally consistent. In fact, this is
equivalent to internal attack case where all servers are
assumed colluding together from the early stages of
application or service deployment to hide a data loss or
corruption incident.
2.3 Design Goals
To ensure the security and dependability for cloud data
storage under the aforementioned adversary model, we aim
to design efficient mechanisms for dynamic data verifica-
tion and operation and achieve the following goals:
1. Storage correctness: to ensure users that their data
are indeed stored appropriately and kept intact all
the time in the cloud.
2. Fast localization of data error: to effectively locate
the malfunctioning server when data corruption has
been detected.
3. Dynamic data support: to maintain the same level of
storage correctness assurance even if users modify,
delete, or append their data files in the cloud.
4. Dependability: to enhance data availability against
Byzantine failures, malicious data modification and
server colluding attacks, i.e., minimizing the effect
brought by data errors or server failures.
5. Lightweight: to enable users to perform storage
correctness checks with minimum overhead.
2.4 Notation and Preliminaries
. $F$ — the data file to be stored. We assume that $F$ can be
denoted as a matrix of $m$ equal-sized data vectors, each
consisting of $l$ blocks. Data blocks are all well represented
as elements in Galois field $GF(2^p)$ for $p = 8$ or $16$.
. $A$ — the dispersal matrix used for Reed-Solomon coding.
. $G$ — the encoded file matrix, which includes a set of
$n = m + k$ vectors, each consisting of $l$ blocks.
. $f_{key}(\cdot)$ — pseudorandom function (PRF), which is defined
as $f : \{0,1\}^{*} \times key \to GF(2^p)$.
. $\phi_{key}(\cdot)$ — pseudorandom permutation (PRP), which is
defined as $\phi : \{0,1\}^{\log_2(l)} \times key \to \{0,1\}^{\log_2(l)}$.
. $ver$ — a version number bound with the index for
individual blocks, which records the times the block has
been modified. Initially we assume $ver$ is 0 for all data blocks.
. $s_{ij}^{ver}$ — the seed for the PRF, which depends on the file
name, the block index $i$, the server position $j$, as well as the
optional block version number $ver$.
3 ENSURING CLOUD DATA STORAGE
In cloud data storage system, users store their data in the
cloud and no longer possess the data locally. Thus, the
correctness and availability of the data files being stored on
the distributed cloud servers must be guaranteed. One of
the key issues is to effectively detect any unauthorized data
modification and corruption, possibly due to server
compromise and/or random Byzantine failures. Besides,
in the distributed case when such inconsistencies are

successfully detected, finding which server the data error
lies in is also of great significance, since it is always the
first step to quickly recovering the storage errors and/or
identifying potential threats of external attacks.
To address these problems, our main scheme for ensuring
cloud data storage is presented in this section. The first part
of the section is devoted to a review of basic tools from
coding theory that is needed in our scheme for file
distribution across cloud servers. Then, the homomorphic
token is introduced. The token computation function we are
considering belongs to a family of universal hash function
[27], chosen to preserve the homomorphic properties, which
can be perfectly integrated with the verification of erasure-
coded data [24], [28]. Subsequently, it is shown how to
derive a challenge-response protocol for verifying the
storage correctness as well as identifying misbehaving
servers. The procedure for file retrieval and error recovery
based on erasure-correcting code is also outlined. Finally,
we describe how to extend our scheme to third party
auditing with only slight modification of the main design.
3.1 File Distribution Preparation
It is well known that erasure-correcting code may be used to
tolerate multiple failures in distributed storage systems. In
cloud data storage, we rely on this technique to disperse the
data file F redundantly across a set of n ¼ m þ k distributed
servers. An ðm; kÞ Reed-Solomon erasure-correcting code is
used to create k redundancy parity vectors from m data
vectors in such a way that the original m data vectors can be
reconstructed from any m out of the m þ k data and parity
vectors. By placing each of the m þ k vectors on a different
server, the original data file can survive the failure of any
k of the m þ k servers without any data loss, with a space
overhead of k/m. For support of efficient sequential I/O to
the original file, our file layout is systematic, i.e., the
unmodified m data file vectors together with k parity
vectors is distributed across m þ k different servers.
Let $F = (F_1, F_2, \ldots, F_m)$ and $F_i = (f_{1i}, f_{2i}, \ldots, f_{li})^T$
$(i \in \{1, \ldots, m\})$. Here, $T$ (shorthand for transpose) denotes
that each $F_i$ is represented as a column vector, and $l$ denotes
the data vector size in blocks. All these blocks are elements of
$GF(2^p)$. The systematic layout with parity vectors is
achieved with the information dispersal matrix $A$, derived
from an $m \times (m + k)$ Vandermonde matrix [29]:

$$
\begin{pmatrix}
1 & 1 & \ldots & 1 & 1 & \ldots & 1 \\
\beta_1 & \beta_2 & \ldots & \beta_m & \beta_{m+1} & \ldots & \beta_n \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\beta_1^{m-1} & \beta_2^{m-1} & \ldots & \beta_m^{m-1} & \beta_{m+1}^{m-1} & \ldots & \beta_n^{m-1}
\end{pmatrix},
$$

where $\beta_j$ $(j \in \{1, \ldots, n\})$ are distinct elements randomly
picked from $GF(2^p)$.
After a sequence of elementary row transformations, the
desired matrix A can be written as
$$
A = (I \mid P) =
\begin{pmatrix}
1 & 0 & \ldots & 0 & p_{11} & p_{12} & \ldots & p_{1k} \\
0 & 1 & \ldots & 0 & p_{21} & p_{22} & \ldots & p_{2k} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & 1 & p_{m1} & p_{m2} & \ldots & p_{mk}
\end{pmatrix},
$$

where $I$ is an $m \times m$ identity matrix and $P$ is the secret parity
generation matrix with size $m \times k$. Note that $A$ is derived
from a Vandermonde matrix, thus it has the property that
any $m$ out of the $m + k$ columns form an invertible matrix.
By multiplying $F$ by $A$, the user obtains the encoded file

$$
G = F \cdot A = (G^{(1)}, G^{(2)}, \ldots, G^{(m)}, G^{(m+1)}, \ldots, G^{(n)})
  = (F_1, F_2, \ldots, F_m, G^{(m+1)}, \ldots, G^{(n)}),
$$

where $G^{(j)} = (g^{(j)}_1, g^{(j)}_2, \ldots, g^{(j)}_l)^T$ $(j \in \{1, \ldots, n\})$. As noticed,
the multiplication reproduces the original data file vectors
of $F$, and the remaining part $(G^{(m+1)}, \ldots, G^{(n)})$ consists of
$k$ parity vectors generated based on $F$.
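As an illustration of the file distribution preparation just described, the sketch below builds a systematic dispersal matrix from a Vandermonde matrix and encodes a file matrix with it. For readability it works over the prime field GF(q) with q = 2^16 + 1 = 65537 as a stand-in for the binary extension field GF(2^16) used in the paper (a real deployment would use an erasure coding library such as Jerasure); the names m, k, l, F, A, G mirror the notation above, and the code is a sketch under these assumptions, not the authors' implementation.

```python
# Illustrative sketch of Section 3.1 (file distribution preparation).
# Works over the prime field GF(Q), Q = 2^16 + 1, as a stand-in for GF(2^16).
import random

Q = 65537  # prime, so every nonzero element has an inverse via pow(x, -1, Q)

def vandermonde(m, n, betas):
    """m x n Vandermonde matrix: row i holds beta_j^i for each column j."""
    return [[pow(b, i, Q) for b in betas] for i in range(m)]

def systematic_dispersal(m, k):
    """Derive A = (I | P) from an m x (m+k) Vandermonde matrix by row reduction."""
    n = m + k
    betas = random.sample(range(1, Q), n)      # distinct nonzero field elements
    V = vandermonde(m, n, betas)
    # Gauss-Jordan elimination on the first m columns to reach (I | P).
    for col in range(m):
        piv = next(r for r in range(col, m) if V[r][col] != 0)
        V[col], V[piv] = V[piv], V[col]
        inv = pow(V[col][col], -1, Q)
        V[col] = [(x * inv) % Q for x in V[col]]
        for r in range(m):
            if r != col and V[r][col]:
                f = V[r][col]
                V[r] = [(a - f * b) % Q for a, b in zip(V[r], V[col])]
    return V  # A: m x n, whose first m columns form the identity

def encode(F, A, m, k, l):
    """G = F * A. F is l x m (rows = blocks); returns the l x (m+k) encoded matrix."""
    n = m + k
    return [[sum(F[row][i] * A[i][j] for i in range(m)) % Q for j in range(n)]
            for row in range(l)]

if __name__ == "__main__":
    m, k, l = 4, 2, 8
    F = [[random.randrange(Q) for _ in range(m)] for _ in range(l)]
    A = systematic_dispersal(m, k)
    G = encode(F, A, m, k, l)
    # Systematic layout: the first m columns of G reproduce the data vectors of F.
    assert all(G[row][:m] == F[row] for row in range(l))
```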
3.2 Challenge Token Precomputation
In order to achieve assurance of data storage correctness
and data error localization simultaneously, our scheme
entirely relies on the precomputed verification tokens. The
main idea is as follows: before file distribution the user
precomputes a certain number of short verification tokens
on individual vector $G^{(j)}$ $(j \in \{1, \ldots, n\})$, each token
covering a random subset of data blocks. Later, when the user
wants to make sure the storage correctness for the data in
the cloud, he challenges the cloud servers with a set of
randomly generated block indices. Upon receiving chal-
lenge, each cloud server computes a short “signature” over
the specified blocks and returns them to the user. The
values of these signatures should match the corresponding
tokens precomputed by the user. Meanwhile, as all servers
operate over the same subset of the indices, the requested
response values for integrity check must also be a valid
codeword determined by the secret matrix P.
Suppose the user wants to challenge the cloud servers $t$
times to ensure the correctness of data storage. Then, he must
precompute $t$ verification tokens for each $G^{(j)}$ $(j \in \{1, \ldots, n\})$,
using a PRF $f(\cdot)$, a PRP $\phi(\cdot)$, a challenge key $k_{chal}$, and a
master permutation key $K_{PRP}$. Specifically, to generate the
$i$th token for server $j$, the user acts as follows:
1. Derive a random challenge value $\alpha_i$ of $GF(2^p)$ by
   $\alpha_i = f_{k_{chal}}(i)$ and a permutation key $k^{(i)}_{prp}$ based on $K_{PRP}$.
2. Compute the set of $r$ randomly chosen indices
   $\{I_q \in [1, \ldots, l] \mid 1 \le q \le r\}$, where $I_q = \phi_{k^{(i)}_{prp}}(q)$.
3. Calculate the token as
   $$v^{(j)}_i = \sum_{q=1}^{r} \alpha_i^{q} \, G^{(j)}[I_q], \quad \text{where } G^{(j)}[I_q] = g^{(j)}_{I_q}.$$
Note that $v^{(j)}_i$, which is an element of $GF(2^p)$ with small
size, is the response the user expects to receive from server $j$
when he challenges it on the specified data blocks.
After token generation, the user has the choice of either
keeping the precomputed tokens locally or storing them in
encrypted form on the cloud servers. In our case here, the
user stores them locally to obviate the need for encryption
and lower the bandwidth overhead during dynamic data
operation which will be discussed shortly. The details of
token generation are shown in Algorithm 1.

Algorithm 1. Token Precomputation.
1: procedure
2:   Choose parameters $l$, $n$ and functions $f$, $\phi$;
3:   Choose the number $t$ of tokens;
4:   Choose the number $r$ of indices per verification;
5:   Generate master key $K_{PRP}$ and challenge key $k_{chal}$;
6:   for vector $G^{(j)}$, $j \leftarrow 1, n$ do
7:     for round $i \leftarrow 1, t$ do
8:       Derive $\alpha_i = f_{k_{chal}}(i)$ and $k^{(i)}_{prp}$ from $K_{PRP}$.
9:       Compute $v^{(j)}_i = \sum_{q=1}^{r} \alpha_i^{q} \, G^{(j)}[\phi_{k^{(i)}_{prp}}(q)]$
10:    end for
11:  end for
12:  Store all the $v^{(j)}_i$'s locally.
13: end procedure
Once all tokens are computed, the final step before file
distribution is to blind each parity block $g^{(j)}_i$ in
$(G^{(m+1)}, \ldots, G^{(n)})$ by
$$g^{(j)}_i \leftarrow g^{(j)}_i + f_{k_j}(s_{ij}), \quad i \in \{1, \ldots, l\},$$
where $k_j$ is the secret key for parity vector $G^{(j)}$
$(j \in \{m+1, \ldots, n\})$. This is for protection of the secret matrix
$P$. We will discuss the necessity of using blinded parities in
detail in Section 5.2. After blinding the parity information,
the user disperses all the $n$ encoded vectors $G^{(j)}$
$(j \in \{1, \ldots, n\})$ across the cloud servers $S_1, S_2, \ldots, S_n$.
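Below is a minimal sketch of Algorithm 1 and the parity blinding step, under the same illustrative assumptions as the earlier encoding sketch: the prime field GF(65537) stands in for GF(2^16), HMAC-SHA256 stands in for the PRF f, and a keyed shuffle stands in for the PRP φ. These stand-ins are our choices for illustration, not the paper's concrete instantiations.

```python
# Illustrative sketch of Algorithm 1 (token precomputation) and parity blinding.
import hmac, hashlib, random

Q = 65537  # prime stand-in field for GF(2^16)

def prf(key: bytes, msg: bytes) -> int:
    """Pseudorandom function f_key(.) -> GF(Q); HMAC-SHA256 used as a stand-in."""
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big") % Q

def prp_indices(key: bytes, l: int, r: int):
    """First r outputs of a keyed permutation over block indices 0..l-1 (PRP stand-in)."""
    rng = random.Random(key)
    idx = list(range(l))
    rng.shuffle(idx)
    return idx[:r]

def precompute_tokens(G, n, l, t, r, k_chal: bytes, K_prp: bytes):
    """tokens[i][j] = v_i^(j), the i-th verification token for server j (Algorithm 1).
    G is laid out as G[block][server], with l blocks and n = m + k servers."""
    tokens = [[0] * n for _ in range(t)]
    for i in range(t):
        alpha = prf(k_chal, i.to_bytes(4, "big"))                # challenge value alpha_i
        k_i = hmac.new(K_prp, i.to_bytes(4, "big"), hashlib.sha256).digest()
        I = prp_indices(k_i, l, r)                               # sampled block indices
        for j in range(n):
            tokens[i][j] = sum(pow(alpha, q + 1, Q) * G[I[q]][j]
                               for q in range(r)) % Q
    return tokens

def blind_parities(G, m, n, l, parity_keys, file_name: bytes):
    """g_i^(j) <- g_i^(j) + f_{k_j}(s_ij) for the parity vectors j = m..n-1.
    parity_keys: per-server secret keys k_j, indexable by j; the seed s_ij is
    built from the file name, version number (0 initially), block index i, and
    server position j, mirroring the description above."""
    for j in range(m, n):
        for i in range(l):
            s_ij = file_name + b"|0|" + i.to_bytes(4, "big") + j.to_bytes(2, "big")
            G[i][j] = (G[i][j] + prf(parity_keys[j], s_ij)) % Q
```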
3.3 Correctness Verification and Error Localization
Error localization is a key prerequisite for eliminating errors
in storage systems. It is also of critical importance to
identify potential threats from external attacks. However,
many previous schemes [23], [24] do not explicitly consider
the problem of data error localization, thus only providing
binary results for the storage verification. Our scheme
outperforms those by integrating the correctness verifica-
tion and error localization (misbehaving server identifica-
tion) in our challenge-response protocol: the response
values from servers for each challenge not only determine
the correctness of the distributed storage, but also contain
information to locate potential data error(s).
Specifically, the procedure of the ith challenge-response
for a cross-check over the n servers is described as follows:
1. The user reveals the $\alpha_i$ as well as the $i$th permutation
   key $k^{(i)}_{prp}$ to each server.
2. The server storing vector $G^{(j)}$ $(j \in \{1, \ldots, n\})$ aggregates
   those $r$ rows specified by index $k^{(i)}_{prp}$ into a linear
   combination
   $$R^{(j)}_i = \sum_{q=1}^{r} \alpha_i^{q} \, G^{(j)}[\phi_{k^{(i)}_{prp}}(q)],$$
   and sends back $R^{(j)}_i$ $(j \in \{1, \ldots, n\})$.
3. Upon receiving the $R^{(j)}_i$'s from all the servers, the user
   takes away the blind values in $R^{(j)}_i$ $(j \in \{m+1, \ldots, n\})$ by
   $$R^{(j)}_i \leftarrow R^{(j)}_i - \sum_{q=1}^{r} f_{k_j}(s_{I_q,j}) \cdot \alpha_i^{q},
     \quad \text{where } I_q = \phi_{k^{(i)}_{prp}}(q).$$
4. Then, the user verifies whether the received values
   remain a valid codeword determined by the secret
   matrix $P$:
   $$(R^{(1)}_i, \ldots, R^{(m)}_i) \cdot P \stackrel{?}{=} (R^{(m+1)}_i, \ldots, R^{(n)}_i).$$
Because all the servers operate over the same subset of
indices, the linear aggregation of these $r$ specified rows,
$(R^{(1)}_i, \ldots, R^{(n)}_i)$, has to be a codeword in the encoded file
matrix (See Section 5.1 for the correctness analysis). If the
above equation holds, the challenge is passed. Otherwise, it
indicates that among those specified rows, there exist file
block corruptions.
Once the inconsistency among the storage has been
successfully detected, we can rely on the precomputed
verification tokens to further determine where the potential
data error(s) lie. Note that each response $R^{(j)}_i$ is
computed in exactly the same way as token $v^{(j)}_i$, thus the
user can simply find which server is misbehaving by
verifying the following $n$ equations:
$$R^{(j)}_i \stackrel{?}{=} v^{(j)}_i, \quad j \in \{1, \ldots, n\}.$$
Algorithm 2 gives the details of correctness verification and
error localization.
Algorithm 2. Correctness Verification and Error Localization.
1: procedure CHALLENGE($i$)
2:   Recompute $\alpha_i = f_{k_{chal}}(i)$ and $k^{(i)}_{prp}$ from $K_{PRP}$;
3:   Send $\{\alpha_i, k^{(i)}_{prp}\}$ to all the cloud servers;
4:   Receive from servers:
     $\{R^{(j)}_i = \sum_{q=1}^{r} \alpha_i^{q} G^{(j)}[\phi_{k^{(i)}_{prp}}(q)] \mid 1 \le j \le n\}$
5:   for $(j \leftarrow m + 1, n)$ do
6:     $R^{(j)}_i \leftarrow R^{(j)}_i - \sum_{q=1}^{r} f_{k_j}(s_{I_q,j}) \cdot \alpha_i^{q}$, where $I_q = \phi_{k^{(i)}_{prp}}(q)$
7:   end for
8:   if $((R^{(1)}_i, \ldots, R^{(m)}_i) \cdot P == (R^{(m+1)}_i, \ldots, R^{(n)}_i))$ then
9:     Accept and ready for the next challenge.
10:  else
11:    for $(j \leftarrow 1, n)$ do
12:      if $(R^{(j)}_i \ne v^{(j)}_i)$ then
13:        return server $j$ is misbehaving.
14:      end if
15:    end for
16:  end if
17: end procedure
Discussion. Previous work [23], [24] has suggested using
the decoding capability of error-correcting code to treat data
errors. But such an approach imposes a bound on the number
of misbehaving servers $b$ by $b \le \lfloor k/2 \rfloor$. Namely, they cannot
identify misbehaving servers when $b > \lfloor k/2 \rfloor$.¹ However,
our token-based approach, while allowing efficient storage
correctness validation, does not have this limitation on the
number of misbehaving servers. That is, our approach can
identify any number of misbehaving servers for $b \le (m + k)$.
Also note that, for every challenge, each server only needs to
send back an aggregated value over the specified blocks.

¹ In [23], the authors also suggest using brute-force decoding when their
dispersal code is an erasure code. However, such a brute-force method is
asymptotically inefficient, and still cannot guarantee identification of all
misbehaving servers.
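To round out Section 3.3, here is a hedged sketch of the codeword check and misbehaving-server lookup of Algorithm 2, assuming the responses have already been unblinded and reusing the GF(65537) stand-in field from the earlier sketches; P denotes the m x k parity-generation submatrix of A = (I | P), and the precomputed tokens follow the layout of the token sketch above.

```python
# Illustrative sketch of Algorithm 2: codeword check plus misbehaving-server lookup.
Q = 65537  # prime stand-in field for GF(2^16)

def verify_and_localize(responses, P, tokens_i, m, k):
    """
    responses : list of n = m + k unblinded values R_i^(1..n) for challenge i
    P         : m x k secret parity generation matrix (from A = (I | P))
    tokens_i  : precomputed tokens v_i^(1..n) for the same challenge i
    Returns (True, []) if the challenge passes, otherwise (False, misbehaving servers).
    """
    n = m + k
    # Codeword check: (R^(1), ..., R^(m)) * P =? (R^(m+1), ..., R^(n)).
    expected_parity = [sum(responses[a] * P[a][b] for a in range(m)) % Q
                       for b in range(k)]
    if expected_parity == [r % Q for r in responses[m:n]]:
        return True, []
    # Error localization: compare each response against its precomputed token.
    bad = [j + 1 for j in range(n) if responses[j] % Q != tokens_i[j] % Q]
    return False, bad   # server numbers reported 1-based, matching the paper
```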

Citations
More filters
Posted Content
TL;DR: This paper defines and explores proofs of retrievability (PORs), a POR scheme that enables an archive or back-up service to produce a concise proof that a user can retrieve a target file F, that is, that the archive retains and reliably transmits file data sufficient for the user to recover F in its entirety.
Abstract: In this paper, we define and explore proofs of retrievability (PORs). A POR scheme enables an archive or back-up service (prover) to produce a concise proof that a user (verifier) can retrieve a target file F, that is, that the archive retains and reliably transmits file data sufficient for the user to recover F in its entirety.A POR may be viewed as a kind of cryptographic proof of knowledge (POK), but one specially designed to handle a large file (or bitstring) F. We explore POR protocols here in which the communication costs, number of memory accesses for the prover, and storage requirements of the user (verifier) are small parameters essentially independent of the length of F. In addition to proposing new, practical POR constructions, we explore implementation considerations and optimizations that bear on previously explored, related schemes.In a POR, unlike a POK, neither the prover nor the verifier need actually have knowledge of F. PORs give rise to a new and unusual security definition whose formulation is another contribution of our work.We view PORs as an important tool for semi-trusted online archives. Existing cryptographic techniques help users ensure the privacy and integrity of files they retrieve. It is also natural, however, for users to want to verify that archives do not delete or modify files prior to retrieval. The goal of a POR is to accomplish these checks without users having to download the files themselves. A POR can also provide quality-of-service guarantees, i.e., show that a file is retrievable within a certain time bound.

1,783 citations

Journal ArticleDOI
TL;DR: This paper proposes a mechanism that combines data deduplication with dynamic data operations in the privacy preserving public auditing for secure cloud storage and shows that the proposed mechanism is highly efficient and provably secure.
Abstract: Using cloud storage, users can remotely store their data and enjoy the on-demand high-quality applications and services from a shared pool of configurable computing resources, without the burden of local data storage and maintenance. However, the fact that users no longer have physical possession of the outsourced data makes the data integrity protection in cloud computing a formidable task, especially for users with constrained computing resources. Moreover, users should be able to just use the cloud storage as if it is local, without worrying about the need to verify its integrity. Thus, enabling public auditability for cloud storage is of critical importance so that users can resort to a third-party auditor (TPA) to check the integrity of outsourced data and be worry free. To securely introduce an effective TPA, the auditing process should bring in no new vulnerabilities toward user data privacy, and introduce no additional online burden to user. In this paper, we propose a secure cloud storage system supporting privacy-preserving public auditing. We further extend our result to enable the TPA to perform audits for multiple users simultaneously and efficiently. Extensive security and performance analysis show the proposed schemes are provably secure and highly efficient. Our preliminary experiment conducted on Amazon EC2 instance further demonstrates the fast performance of the design.

982 citations


Cites background from "Toward Secure and Dependable Storag..."

  • ...In [22], Wang et al....

    [...]

  • ...In Cloud Computing, outsourced data might not only be accessed but also updated frequently by users for various application purposes [21], [8], [22], [23]....

    [...]

Journal ArticleDOI
TL;DR: This paper proposes a basic idea for the MRSE based on secure inner product computation, and gives two significantly improved MRSE schemes to achieve various stringent privacy requirements in two different threat models and further extends these two schemes to support more search semantics.
Abstract: With the advent of cloud computing, data owners are motivated to outsource their complex data management systems from local sites to the commercial public cloud for great flexibility and economic savings. But for protecting data privacy, sensitive data have to be encrypted before outsourcing, which obsoletes traditional data utilization based on plaintext keyword search. Thus, enabling an encrypted cloud data search service is of paramount importance. Considering the large number of data users and documents in the cloud, it is necessary to allow multiple keywords in the search request and return documents in the order of their relevance to these keywords. Related works on searchable encryption focus on single keyword search or Boolean keyword search, and rarely sort the search results. In this paper, for the first time, we define and solve the challenging problem of privacy-preserving multi-keyword ranked search over encrypted data in cloud computing (MRSE). We establish a set of strict privacy requirements for such a secure cloud data utilization system. Among various multi-keyword semantics, we choose the efficient similarity measure of "coordinate matching," i.e., as many matches as possible, to capture the relevance of data documents to the search query. We further use "inner product similarity" to quantitatively evaluate such similarity measure. We first propose a basic idea for the MRSE based on secure inner product computation, and then give two significantly improved MRSE schemes to achieve various stringent privacy requirements in two different threat models. To improve search experience of the data search service, we further extend these two schemes to support more search semantics. Thorough analysis investigating privacy and efficiency guarantees of proposed schemes is given. Experiments on the real-world data set further show proposed schemes indeed introduce low overhead on computation and communication.

979 citations

Journal ArticleDOI
TL;DR: The security issues that arise due to the very nature of cloud computing are detailed and the recent solutions presented in the literature to counter the security issues are presented.

694 citations


Cites background or methods from "Toward Secure and Dependable Storag..."

  • ...To ensure the quality of the cloud storage, integrity and availability of data in the cloud, authors in [110] proposed effectual methodology that supports on-demand data correctness verification....

    [...]

  • ...The data in the cloud is much more vulnerable to risks in terms of confidentiality, integrity, and availability in comparison to the conventional computing model [110]....

    [...]

Journal ArticleDOI
TL;DR: This paper defines and solves the problem of secure ranked keyword search over encrypted cloud data, and explores the statistical measure approach from information retrieval to build a secure searchable index, and develops a one-to-many order-preserving mapping technique to properly protect those sensitive score information.
Abstract: Cloud computing economically enables the paradigm of data service outsourcing. However, to protect data privacy, sensitive cloud data have to be encrypted before outsourced to the commercial public cloud, which makes effective data utilization service a very challenging task. Although traditional searchable encryption techniques allow users to securely search over encrypted data through keywords, they support only Boolean search and are not yet sufficient to meet the effective data utilization need that is inherently demanded by large number of users and huge amount of data files in cloud. In this paper, we define and solve the problem of secure ranked keyword search over encrypted cloud data. Ranked search greatly enhances system usability by enabling search result relevance ranking instead of sending undifferentiated results, and further ensures the file retrieval accuracy. Specifically, we explore the statistical measure approach, i.e., relevance score, from information retrieval to build a secure searchable index, and develop a one-to-many order-preserving mapping technique to properly protect those sensitive score information. The resulting design is able to facilitate efficient server-side ranking without losing keyword privacy. Thorough analysis shows that our proposed solution enjoys “as-strong-as-possible” security guarantee compared to previous searchable encryption schemes, while correctly realizing the goal of ranked keyword search. Extensive experimental results demonstrate the efficiency of the proposed solution.
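The sketch below is a loose, plaintext illustration of the two ingredients named in the abstract: a relevance score from information retrieval and a monotone, randomized encoding that lets the server rank results without seeing raw scores. It is not the paper's one-to-many order-preserving mapping; the scoring formula, bucket width, and all names are assumptions made for the example.

```python
import math
import random

# Toy sketch: rank files by an obfuscated relevance score. This is NOT the
# cited one-to-many order-preserving mapping; it only shows that a
# monotone, randomized encoding preserves ranking while hiding raw scores.

def relevance_score(term_freq, doc_len, num_docs, docs_with_term):
    # A simple TF-IDF-style score, as used in information retrieval.
    return (term_freq / doc_len) * math.log(1 + num_docs / docs_with_term)

BUCKET = 1_000  # width of each disjoint output bucket (assumed constant)

def obfuscate(score):
    # Quantize the score, then pick a random point inside its own bucket:
    # order across distinct scores is preserved, while equal scores may
    # map to different values (the "one-to-many" flavour).
    q = int(round(score * 10_000))
    return q * BUCKET + random.randrange(BUCKET)

scores = {
    "f1": relevance_score(term_freq=3, doc_len=100, num_docs=50, docs_with_term=10),
    "f2": relevance_score(term_freq=1, doc_len=80, num_docs=50, docs_with_term=10),
}
encoded = {f: obfuscate(s) for f, s in scores.items()}    # stored in the secure index
ranking = sorted(encoded, key=encoded.get, reverse=True)  # server-side ranking
print(ranking)  # ['f1', 'f2']
```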

526 citations


Additional excerpts

  • ...application purposes (see [19], [20], [21], for example)....

    [...]

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
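For readers unfamiliar with the term, the brute-force toy below computes a maximal-segment-pair style score: the best-scoring pair of equal-length, ungapped segments from two sequences. BLAST itself only approximates this search with word seeding and extension, and the scoring constants here are arbitrary assumptions.

```python
# Brute-force toy of the maximal segment pair (MSP) score: the highest
# scoring pair of equal-length, ungapped segments from two sequences.
# BLAST approximates this search; the +5/-4 scoring is an assumption.

def msp_score(a, b, match=5, mismatch=-4):
    best = 0
    for i in range(len(a)):
        for j in range(len(b)):
            running = 0
            for k in range(min(len(a) - i, len(b) - j)):
                running += match if a[i + k] == b[j + k] else mismatch
                best = max(best, running)
    return best

print(msp_score("ACGTTGCA", "TCGTTGAA"))  # score of the best ungapped local match
```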

88,255 citations

Book
01 Jan 1994
TL;DR: The book is an introduction to the idea of design patterns in software engineering and a catalog of twenty-three common patterns; most experienced OOP designers will find that they have known about these patterns all along.
Abstract: The book is an introduction to the idea of design patterns in software engineering, and a catalog of twenty-three common patterns. The nice thing is, most experienced OOP designers will find out they've known about patterns all along. It's just that they've never considered them as such, or tried to centralize the idea behind a given pattern so that it will be easily reusable.

22,762 citations

Book ChapterDOI
TL;DR: Pastry, as mentioned in this paper, is a scalable, distributed object location and routing substrate for wide-area peer-to-peer applications, which performs application-level routing and object location in a potentially very large overlay network of nodes connected via the Internet.
Abstract: This paper presents the design and evaluation of Pastry, a scalable, distributed object location and routing substrate for wide-area peer-to-peer applications. Pastry performs application-level routing and object location in a potentially very large overlay network of nodes connected via the Internet. It can be used to support a variety of peer-to-peer applications, including global data storage, data sharing, group communication and naming. Each node in the Pastry network has a unique identifier (nodeId). When presented with a message and a key, a Pastry node efficiently routes the message to the node with a nodeId that is numerically closest to the key, among all currently live Pastry nodes. Each Pastry node keeps track of its immediate neighbors in the nodeId space, and notifies applications of new node arrivals, node failures and recoveries. Pastry takes into account network locality; it seeks to minimize the distance messages travel, according to a scalar proximity metric such as the number of IP routing hops. Pastry is completely decentralized, scalable, and self-organizing; it automatically adapts to the arrival, departure and failure of nodes. Experimental results obtained with a prototype implementation on an emulated network of up to 100,000 nodes confirm Pastry's scalability and efficiency, its ability to self-organize and adapt to node failures, and its good network locality properties.
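As a minimal sketch of the routing outcome described above, the snippet below resolves a key to the live node whose nodeId is numerically closest to it on a toy circular id space. Real Pastry reaches that node hop by hop via prefix-matching routing tables and leaf sets; the flat lookup, the 16-bit id space, and the node ids here are assumptions.

```python
# Sketch of Pastry's delivery rule: a message keyed by `key` ends up at the
# live node whose nodeId is numerically closest to the key. Real Pastry
# routes hop by hop via prefix matching; this flat lookup is illustrative.

ID_SPACE = 2 ** 16  # toy identifier space (Pastry uses 128-bit ids)

def circular_distance(a, b):
    d = abs(a - b)
    return min(d, ID_SPACE - d)

def route(key, live_node_ids):
    # Deliver to the node numerically closest to the key.
    return min(live_node_ids, key=lambda n: circular_distance(n, key))

nodes = [102, 8_000, 30_500, 61_000]
print(route(31_000, nodes))  # 30500
print(route(65_000, nodes))  # 102 (wraps around the circular id space)
```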

7,423 citations

Journal ArticleDOI
Leslie Lamport1
TL;DR: In this article, the concept of one event happening before another in a distributed system is examined, and a distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events.
Abstract: The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.
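The abstract's logical-clock idea fits in a few lines; the sketch below keeps a per-process counter, advances it on local and send events, and fast-forwards past incoming timestamps so that a receive is always ordered after its send. Class and method names are illustrative, not taken from the paper.

```python
# Minimal logical clock: increment on every event, and on receipt jump
# past both the local clock and the message's timestamp.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event.
        self.time += 1
        return self.time

    def stamp_send(self):
        # Sending is an event; the new value is attached to the message.
        self.time += 1
        return self.time

    def on_receive(self, msg_time):
        # Ensure the receive event is ordered after the send event.
        self.time = max(self.time, msg_time) + 1
        return self.time

p, q = LamportClock(), LamportClock()
t_send = p.stamp_send()        # p sends at logical time 1
q.tick()                       # unrelated local work at q
t_recv = q.on_receive(t_send)  # q receives at logical time 2
assert t_send < t_recv         # send "happened before" receive
```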

6,804 citations

Journal ArticleDOI
01 Aug 2001
TL;DR: The authors present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing.
Abstract: "Grid" computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high performance orientation. In this article, the authors define this new field. First, they review the "Grid problem," which is defined as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources--what is referred to as virtual organizations. In such settings, unique authentication, authorization, resource access, resource discovery, and other challenges are encountered. It is this class of problem that is addressed by Grid technologies. Next, the authors present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing. The authors describe requirements that they believe any such mechanisms must satisfy and discuss the importance of defining a compact set of intergrid protocols to enable interoperability among different Grid systems. Finally, the authors discuss how Grid technologies relate to other contemporary technologies, including enterprise integration, application service provider, storage service provider, and peer-to-peer computing. They maintain that Grid concepts and technologies complement and have much to contribute to these other approaches.

6,716 citations

Frequently Asked Questions (1)
Q1. What are the contributions in "Toward secure and dependable storage services in cloud computing"?

In order to address this new problem and further achieve a secure and dependable cloud storage service, the authors propose in this paper a flexible distributed storage integrity auditing mechanism that utilizes homomorphic tokens and distributed erasure-coded data. Considering that cloud data are dynamic in nature, the proposed design further supports secure and efficient dynamic operations on outsourced data, including block modification, deletion, and append.
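As a rough illustration of the token-based spot checking mentioned in the answer, the sketch below precomputes a verification token as a pseudorandom linear combination of a few data blocks and later asks the server to recompute it. This is a deliberate simplification, not the authors' homomorphic-token construction over erasure-coded vectors; the field size, the SHA-256-based PRF, and all names are assumptions.

```python
import hashlib
import random

# Toy spot check in the spirit of token-based auditing. Not the paper's
# scheme: erasure coding, per-server tokens, and error localization are
# all omitted; constants and names are assumptions for illustration.

P = 2_147_483_647  # prime modulus of the toy field

def prf(label: str, j: int) -> int:
    # Pseudorandom function instantiated with SHA-256 for this sketch.
    digest = hashlib.sha256(f"{label}:{j}".encode()).digest()
    return int.from_bytes(digest, "big")

def compute_token(blocks, challenge_seed, r=4):
    # Linear combination of r pseudorandomly chosen blocks with
    # pseudorandom coefficients, all derived from the challenge seed.
    token = 0
    for j in range(r):
        idx = prf(f"idx-{challenge_seed}", j) % len(blocks)
        coeff = prf(f"coef-{challenge_seed}", j) % P
        token = (token + coeff * blocks[idx]) % P
    return token

blocks = [random.randrange(P) for _ in range(64)]  # outsourced data blocks
seed = "audit-round-0"
expected = compute_token(blocks, seed)             # precomputed by the user

# Later, during an audit, an honest server recomputes the combination on
# demand and the user compares it with the stored token.
assert compute_token(blocks, seed) == expected
```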