Toward Secure and Dependable Storage Services in Cloud Computing
Summary (7 min read)
1 INTRODUCTION
- SEVERAL trends are opening up the era of cloud computing, which is an Internet-based development and use of computer technology.
- As a result, users are at the mercy of their cloud service providers (CSP) for the availability and integrity of their data [3], [4].
- As a complementary approach, researchers have also proposed distributed protocols [23], [24], [25] for ensuring storage correctness across multiple servers or peers.
- Section 5 gives the security analysis and performance evaluation, followed by Section 6, which overviews related work.
2.1 System Model
- A representative network architecture for cloud data storage service is illustrated in Fig. 1.
- In cloud data storage, a user stores his data through a CSP into a set of cloud servers, which run in a simultaneous, cooperative, and distributed manner.
- The most general forms of these operations considered are block update, delete, insert, and append.
- As users no longer possess their data locally, it is of critical importance to ensure users that their data are being correctly stored and maintained.
- Since users may not have the time, feasibility, or resources to monitor their data online, they can delegate the data auditing tasks to an optional trusted TPA of their choice.
2.2 Adversary Model
- From the user's perspective, the adversary model has to capture all kinds of threats toward his cloud data integrity.
- Because cloud data do not reside at the user's local site but in the CSP's address domain, these threats can come from two different sources: internal and external attacks.
- Not only may the CSP move data that is rarely or never accessed to a lower tier of storage than agreed upon for monetary reasons, but it may also attempt to hide a data loss incident caused by management errors, Byzantine failures, and so on.
- Therefore, the adversary in their model is assumed to have the following capabilities, which capture both external and internal threats toward cloud data integrity.
- Specifically, the adversary is interested in continuously corrupting the user’s data files stored on individual servers.
2.3 Design Goals
- To ensure the security and dependability of cloud data storage under the aforementioned adversary model, the authors aim to design efficient mechanisms for dynamic data verification and operation that achieve the following goals:
- 1. Storage correctness: to ensure users that their data are indeed stored appropriately and kept intact at all times in the cloud.
- 2. Fast localization of data error: to effectively locate the malfunctioning server when data corruption has been detected.
- 3. Dynamic data support: to maintain the same level of storage correctness assurance even if users modify, delete, or append their data files in the cloud.
- 4. Dependability: to enhance data availability against Byzantine failures, malicious data modification, and server colluding attacks, i.e., to minimize the effect of data errors or server failures.
- 5. Lightweight: to enable users to perform storage correctness checks with minimum overhead.
2.4 Notation and Preliminaries
- A — the dispersal matrix used for Reed-Solomon coding.
- f_key(·) — a pseudorandom function (PRF) indexed under key, mapping strings to elements of GF(2^p).
- φ_key(·) — a pseudorandom permutation (PRP) over {0, 1}^{log2(l)}, used to permute block indices.
- ver — a version number bound to the index of an individual block, recording how many times the block has been modified.
- s_ij^ver — the seed for the PRF, which depends on the file name, the block index i, the server position j, and the optional block version number ver.
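The notation above leaves the PRF and PRP unspecified; the following is a minimal sketch of how they might be instantiated (HMAC-SHA256 and a toy prime field are illustrative assumptions, not the paper's choices):

```python
import hmac, hashlib

P_FIELD = 257   # toy prime field standing in for the paper's GF(2^p)

def prf(key: bytes, msg: bytes) -> int:
    """f_key: maps an arbitrary string to a field element (the PRF)."""
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % P_FIELD

def prp(key: bytes, size: int) -> list:
    """phi_key: a keyed permutation of {0, ..., size - 1}, realized here by
    sorting indices under a keyed hash (illustrative, not a standard PRP)."""
    rank = lambda q: hmac.new(key, q.to_bytes(4, "big"), hashlib.sha256).digest()
    return sorted(range(size), key=rank)

def seed(fname: str, i: int, j: int, ver: int) -> bytes:
    """The seed s_ij^ver: file name, block index i, server j, version ver."""
    return f"{fname}|{i}|{j}|{ver}".encode()
```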
3 ENSURING CLOUD DATA STORAGE
- In a cloud data storage system, users store their data in the cloud and no longer possess it locally.
- Thus, the correctness and availability of the data files being stored on the distributed cloud servers must be guaranteed.
- Besides, when such inconsistencies are successfully detected in the distributed case, finding which server the data error lies in is also of great significance, since it is always the first step toward fast recovery from storage errors and identification of potential external attacks.
- Finally, the authors describe how to extend their scheme to third party auditing with only slight modification of the main design.
3.1 File Distribution Preparation
- It is well known that erasure-correcting code may be used to tolerate multiple failures in distributed storage systems.
- To support efficient sequential I/O to the original file, their file layout is systematic, i.e., the unmodified m data file vectors together with k parity vectors are distributed across m + k different servers.
- All these blocks are elements of GF(2^p).
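As a concrete illustration of this layout, here is a minimal encoding sketch. It continues the Section 2.4 snippet (reusing P_FIELD = 257 in place of GF(2^16)) and uses a simple Vandermonde-style parity part rather than the paper's row-reduced dispersal matrix; all parameter values are hypothetical:

```python
import random

m, k, ell = 4, 2, 6      # m data vectors, k parity vectors, ell blocks each

def parity_matrix(m, k):
    """An m x k parity part of the dispersal matrix A = (I | P); a simple
    Vandermonde-style choice here, whereas the paper derives A from a
    Vandermonde matrix via elementary row transformations."""
    return [[pow(i + 2, j + 1, P_FIELD) for j in range(k)] for i in range(m)]

def encode(F):
    """F: ell x m matrix of file blocks. Returns G = F * A (ell x (m + k)):
    the first m columns are the unmodified data vectors (systematic layout),
    the last k are parity vectors; column j is stored on server j."""
    Pmat = parity_matrix(m, k)
    return [row + [sum(row[i] * Pmat[i][j] for i in range(m)) % P_FIELD
                   for j in range(k)]
            for row in F]

F = [[random.randrange(P_FIELD) for _ in range(m)] for _ in range(ell)]
G = encode(F)
```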
3.2 Challenge Token Precomputation
- In order to achieve assurance of data storage correctness and data error localization simultaneously, their scheme entirely relies on the precomputed verification tokens.
- Later, when the user wants to verify the storage correctness of the data in the cloud, he challenges the cloud servers with a set of randomly generated block indices.
- In the scheme described here, the user stores the tokens locally, which obviates the need for encryption and lowers the bandwidth overhead during the dynamic data operations discussed shortly.
- The details of token generation are shown in Algorithm 1 (Token Precomputation).
- The authors will discuss the necessity of using blinded parities in detail in Section 5.2.
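A sketch of this precomputation, continuing the snippets above (prf, prp, P_FIELD, ell, m, k, and G are reused; the keys and the parameters r, t are hypothetical):

```python
K_CHAL, K_PRP = b"challenge-key", b"perm-master-key"
r, t = 3, 4                            # blocks sampled per round, number of rounds

def precompute_token(i: int, column: list) -> int:
    """Round-i token for one server's vector: v_i = sum_{q=1}^{r} alpha_i^q *
    column[phi_i(q)], with alpha_i = f_{k_chal}(i), in the style of Algorithm 1."""
    alpha = prf(K_CHAL, i.to_bytes(4, "big"))
    perm = prp(K_PRP + i.to_bytes(4, "big"), ell)
    return sum(pow(alpha, q, P_FIELD) * column[perm[q - 1]]
               for q in range(1, r + 1)) % P_FIELD

# Precompute t rounds of tokens for all n servers; the user keeps these locally.
n = m + k
tokens = [[precompute_token(i, [G[row][j] for row in range(ell)])
           for j in range(n)] for i in range(t)]
```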
3.3 Correctness Verification and Error Localization
- Error localization is a key prerequisite for eliminating errors in storage systems.
- Many previous schemes [23], [24] do not explicitly consider the problem of data error localization, thus only providing binary results for the storage verification.
- Specifically, the procedure of the ith challenge-response for a cross-check over the n servers is described as follows: 1. The user reveals the index i as well as the ith permutation key k_prp^(i) to each server.
- Otherwise, it indicates that among those specified rows, there exist file block corruptions.
- That is, their approach can identify any number b of misbehaving servers for b ≤ (m + k).
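Continuing the same sketch, one audit round can be expressed as a per-server comparison of responses against the stored tokens, which yields detection and error localization at once:

```python
def server_response(i: int, stored_column: list) -> int:
    # An honest server, given (i, permutation key), aggregates the same r
    # permuted blocks of its vector and returns a single field element.
    return precompute_token(i, stored_column)

def audit_round(i: int, round_tokens: list, server_columns: list) -> list:
    """Return the indices of servers whose responses disagree with the
    precomputed tokens: detection and localization in one cross-check."""
    return [j for j, col in enumerate(server_columns)
            if server_response(i, col) != round_tokens[j]]

columns = [[G[row][j] for row in range(ell)] for j in range(n)]
columns[1][0] = (columns[1][0] + 5) % P_FIELD   # simulate a corrupted block
print(audit_round(0, tokens[0], columns))       # flags server 1 iff row 0 is sampled
```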
3.4 File Retrieval and Error Recovery
- Since their file matrix layout is systematic, the user can reconstruct the original file by downloading the data vectors from the first m servers, assuming that they return the correct response values.
- Notice that their verification scheme is based on random spot-checking, so the storage correctness assurance is a probabilistic one.
- By choosing the system parameters (e.g., r, l, t) appropriately and conducting enough rounds of verification, one can guarantee successful file retrieval with high probability.
- On the other hand, whenever the data corruption is detected, the comparison of precomputed tokens and received response values can guarantee the identification of misbehaving server(s) (again with high probability), which will be discussed shortly.
- Algorithm 3 (Error Recovery):
  1: % Assume the block corruptions have been detected among the specified r rows;
     % Assume s ≤ k servers have been identified as misbehaving
  2: Download r rows of blocks from the servers;
  3: Treat the s servers as erasures and recover the blocks.
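For intuition, a single-erasure recovery over the systematic sketch from Section 3.1 looks as follows (a faithful implementation would solve an s × s linear subsystem for up to s ≤ k erased servers; names are reused from the earlier snippets):

```python
Pmat = parity_matrix(m, k)

def recover_block(G_row: list, erased: int) -> int:
    """Recover the block of erased data server `erased` in one row, using
    parity column 0: parity = sum_i F_row[i] * Pmat[i][0] (mod P_FIELD)."""
    parity = G_row[m]                                  # first parity column
    partial = sum(G_row[i] * Pmat[i][0]
                  for i in range(m) if i != erased) % P_FIELD
    return (parity - partial) * pow(Pmat[erased][0], -1, P_FIELD) % P_FIELD

assert recover_block(G[0], erased=2) == F[0][2]        # row 0, data server 2
```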
3.5 Toward Third Party Auditing
- As discussed in their architecture, in case the user does not have the time, feasibility, or resources to perform the storage correctness verification, he can optionally delegate this task to an independent third-party auditor, making the cloud storage publicly verifiable.
- As pointed out by the recent work [30], [31], to securely introduce an effective TPA, the auditing process should bring in no new vulnerabilities toward user data privacy.
- Now the authors show that with only slight modification, their protocol can support privacy-preserving third party auditing.
- The new design is based on the observed linear property of the parity vector blinding process.
- Thus, the overall computation and communication overheads remain roughly the same.
4 PROVIDING DYNAMIC DATA OPERATION SUPPORT
- The static data model assumed so far may fit some application scenarios, such as libraries and scientific data sets.
- Therefore, it is crucial to consider the dynamic case, where a user may wish to perform various block-level operations of update, delete, and append to modify the data file while maintaining the storage correctness assurance.
- On the other hand, users need to ensure that every dynamic data operation request has been faithfully processed by the CSP.
- Only if the storage verification tokens are changed accordingly can the previously discussed challenge-response protocol be carried out successfully even after data dynamics.
4.1 Update Operation
- In cloud data storage, a user may need to modify some data block(s) stored in the cloud, from its current value f_ij to a new one, f_ij + Δf_ij.
- Fig. 2 gives the high level logical representation of data block update.
- In other words, for all the unused tokens, the user needs to exclude every occurrence of the old data block and replace it with the new one.
- Thanks to the homomorphic construction of their verification token, the user can perform the token update efficiently.
- Note that by using the new seed s_ij^ver for the PRF every time a block is updated, the authors can ensure the freshness of the random value embedded into the parity blocks.
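Continuing the token sketch, the homomorphic patch for one server's unused tokens might look like this (only tokens whose sampled rows cover the updated position s change, and each changes by exactly α_i^q · Δ):

```python
def update_tokens(server_tokens: list, s: int, delta: int) -> list:
    """Patch unused tokens in place after block s of a vector changes by
    delta; the permutation and the other r - 1 sampled blocks are untouched."""
    for i in range(len(server_tokens)):
        alpha = prf(K_CHAL, i.to_bytes(4, "big"))
        perm = prp(K_PRP + i.to_bytes(4, "big"), ell)
        if s in perm[:r]:                     # round i samples block s
            q = perm[:r].index(s) + 1         # its position in the aggregate
            server_tokens[i] = (server_tokens[i]
                                + pow(alpha, q, P_FIELD) * delta) % P_FIELD
    return server_tokens
```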
4.2 Delete Operation
- Sometimes, after being stored in the cloud, certain data blocks may need to be deleted.
- The delete operation the authors consider is a general one, in which the user replaces the data block with zero or some special reserved data symbol.
- From this point of view, the delete operation is actually a special case of the data update operation, where the original data blocks can be replaced with zeros or some predetermined special blocks.
- In practice, it is possible that only a fraction of tokens need amendment, since the updated blocks may not be covered by all the tokens.
- Also, all the affected tokens have to be modified and the updated parity information has to be blinded using the same method specified in an update operation.
4.3 Append Operation
- In some cases, the user may want to increase the size of his stored data by adding blocks at the end of the data file, which the authors refer to as data append.
- This idea of supporting block append was first suggested by Ateniese et al. [14] in a single-server setting, and it relies on both an initial budget for the maximum anticipated data size l_max in each encoded data vector and the system parameter r_max = ⌈r · (l_max / l)⌉ for each precomputed challenge-response token (for instance, hypothetical values l = 10,000, l_max = 20,000, and r = 460 give r_max = 920).
- Because the cloud servers and the user agree on the number of existing blocks in each vector G^(j), the servers will follow exactly the above procedure when recomputing the token values upon receiving the user's challenge request.
- Now when the user is ready to append new blocks, i.e., both the file blocks and the corresponding parity blocks have been generated, the total length of each vector G^(j) will increase and fall into the range [l, l_max].
- The parity blinding is similar to that introduced in the update operation, and is thus omitted here.
4.4 Insert Operation
- An insert operation on the data file refers to an append operation at the desired index position while maintaining the same data block structure for the whole file, i.e., inserting a block F[j] corresponds to shifting all blocks starting with index j + 1 by one slot.
- Thus, an insert operation may affect many rows in the logical data file matrix F, and a substantial number of computations are required to renumber all the subsequent blocks as well as recompute the challenge-response tokens.
- In order to fully support the block insertion operation, recent work [15], [16] suggests utilizing an additional data structure (for example, a Merkle Hash Tree [32]) to maintain and enforce the block index information; a minimal sketch follows this list.
- On the other hand, using the block index mapping information, the user can still access or retrieve the file as it is.
- Note that as a tradeoff, the extra data structure information has to be maintained locally on the user side.
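A hypothetical sketch of such a block-index mapping is shown below: the user keeps a small local table from logical to physical block positions, so an insert is physically an append while logical order is preserved. The Merkle Hash Tree authentication used in [15], [16], [32] is deliberately omitted; a plain list stands in purely for illustration:

```python
class IndexMap:
    def __init__(self, num_blocks: int):
        self.order = list(range(num_blocks))   # logical order -> physical slot

    def insert(self, logical_pos: int) -> int:
        """Insert a block at a logical position; physically it is appended."""
        physical = len(self.order)             # next free physical slot
        self.order.insert(logical_pos, physical)
        return physical                        # slot the new block occupies

    def physical(self, logical_pos: int) -> int:
        return self.order[logical_pos]         # lookup for access/retrieval

idx = IndexMap(5)
slot = idx.insert(2)      # logically block 2, physically stored at slot 5
```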
5 SECURITY ANALYSIS AND PERFORMANCE EVALUATION
- The authors analyze their proposed scheme in terms of correctness, security, and efficiency.
- The authors' security analysis focuses on the adversary model defined in Section 2.
- The authors also evaluate the efficiency of their scheme via implementation of both file distribution preparation and verification token precomputation.
5.1 Correctness Analysis
- First, the authors analyze the correctness of the verification procedure.
- Thus, as long as each server operates on the same specified subset of rows, the above checking equation always holds.
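The linearity argument behind this claim can be made explicit. The following sketch uses the paper's notation and omits the parity blinding terms, which the user can regenerate from the seeds s_ij^ver and subtract before checking:

```latex
% Each response is the same fixed linear combination of rows of G^{(j)}:
R_i^{(j)} \;=\; \sum_{q=1}^{r} \alpha_i^{q}\,
                G^{(j)}\bigl[\phi_{k_{prp}^{(i)}}(q)\bigr].
% Every row (G^{(1)}[s], \ldots, G^{(n)}[s]) of G = F \cdot A is a codeword of
% the code generated by A, and a linear combination of codewords is again a
% codeword, hence the responses satisfy the same parity-check relation:
\bigl(R_i^{(1)}, \ldots, R_i^{(m)}\bigr) \cdot \mathbf{P}
  \;=\; \bigl(R_i^{(m+1)}, \ldots, R_i^{(n)}\bigr).
```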
5.2.1 Detection Probability against Data Modification
- In their scheme, servers are required to operate only on specified rows in each challenge-response protocol execution.
- Suppose n_c servers are misbehaving due to possible compromise or Byzantine failure.
- The explanation of their scheme's collusion resistance against this worst-case scenario is deferred to a later section.
- Next, the authors study the probability of a false negative result: at least one response computed over the specified r rows is invalid, yet the checking equation still holds.
- From the figure, the authors can see that if more than a small fraction of the data file is corrupted, then it suffices to challenge a small constant number of rows in order to achieve detection with high probability.
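The underlying sampling bound is easy to reproduce; the following sketch computes the detection probability for hypothetical parameters:

```python
# If z of the l rows are corrupted and a challenge touches r rows chosen at
# random, the probability that at least one corrupted row is sampled is
# 1 - C(l - z, r) / C(l, r).
from math import comb

def detection_probability(l: int, z: int, r: int) -> float:
    return 1 - comb(l - z, r) / comb(l, r)

# Hypothetical parameters: with l = 10,000 rows and 1% of them corrupted,
# challenging r = 460 rows already detects corruption with ~99% probability.
print(detection_probability(10_000, 100, 460))   # ~0.99
```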
5.2.2 Identification Probability for Misbehaving Servers
- The authors have shown that, if the adversary modifies the data blocks among any of the data storage servers, their sampling checking scheme can successfully detect the attack with high probability.
- Once the data modification is caught, the user will further determine which server is malfunctioning.
- This identification probability is the product of the matching probability of the sampling check and the probability of the complementary event of the false negative result.
- Note that if the number of detected misbehaving servers is smaller than the number of parity vectors k, erasure-correcting code can be used to recover the corrupted data, achieving storage dependability as shown in Section 3.4 and Algorithm 3.
5.2.3 Security Strength against Worst Case Scenario
- The authors now explain why blinding the parity blocks is necessary and how their proposed scheme achieves collusion resistance against the worst-case scenario in the adversary model.
- Recall that in the file distribution preparation, the redundancy parity vectors are calculated via multiplying the file matrix F by P, where P is the secret parity generation matrix the authors later rely on for storage correctness assurance.
- Once they have the knowledge of P, those malicious servers can consequently modify any part of the data blocks and calculate the corresponding parity blocks, and vice versa, making their codeword relationship always consistent.
- Therefore, their storage correctness challenge scheme would be undermined—even if those modified blocks are covered by the specified rows, the storage correctness check equation would always hold.
- By blinding each parity block with random perturbation, the malicious servers no longer have all the necessary information to build up the correct linear equation groups and therefore cannot derive the secret matrix P.
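A sketch of that blinding step, reusing the PRF and seed format from the Section 2.4 snippet (the per-server key k_j is a hypothetical name):

```python
def blind_parity_column(parity_column: list, j: int, k_j: bytes,
                        fname: str = "file", ver: int = 0) -> list:
    """G^(j)[i] <- g^(j)[i] + f_{k_j}(s_ij^ver) for parity servers j: each
    parity block is perturbed with a fresh PRF output, so colluding servers
    cannot assemble the linear equations that would reveal the secret P."""
    return [(block + prf(k_j, seed(fname, i, j, ver))) % P_FIELD
            for i, block in enumerate(parity_column)]
```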
5.3 Performance Evaluation
- The authors now assess the performance of the proposed storage auditing scheme.
- The algorithms are implemented using the open-source erasure coding library Jerasure [34], written in C.
- All results represent the mean of 20 trials.
5.3.1 File Distribution Preparation
- As discussed, file distribution preparation includes the generation of parity vectors (the encoding part) as well as the corresponding parity blinding part.
- The authors consider two sets of different parameters for the (m, k) Reed-Solomon encoding, both of which work over GF(2^16).
- Fig. 4 shows the total cost for preparing a 1 GB file before outsourcing.
- From the figure, the authors can see that the parameter k is the dominant factor in the cost of both parity generation and parity blinding.
- For the same reason, the two-layer coding structure makes the solution in [23] more suitable for static data only, as any change to the contents of file F must propagate through the two-layer error-correcting code, which entails both high communication and computation complexity.
5.3.2 Challenge Token Computation
- When t is selected to be 7,300 or 14,600 (i.e., 365 × 20 or 365 × 40), the data file can be verified every day for the next 20 or 40 years, respectively, which should suffice in practice.
- With the Jerasure library [34], the multiplication over GF(2^16) in their experiment is based on discrete logarithms.
- The other parameters follow those used in the file distribution preparation.
- The authors' implementation shows that the average token precomputation cost is about 0.4 ms.
- Note that each token is only an element of the field GF(2^16); the extra storage for the precomputed tokens is less than 1 MB and can thus be neglected.
7 CONCLUSION
- The authors investigate the problem of data security in cloud data storage, which is essentially a distributed storage system.
- The authors rely on erasure-correcting code in the file distribution preparation to provide redundancy parity vectors and guarantee the data dependability.
- By utilizing the homomorphic token with distributed verification of erasure-coded data, their scheme achieves the integration of storage correctness insurance and data error localization, i.e., whenever data corruption is detected during the storage correctness verification across the distributed servers, the simultaneous identification of the misbehaving server(s) is almost guaranteed.
- Considering the time, computation resources, and the related online burden of users, the authors also extend the proposed main scheme to support third-party auditing, where users can safely delegate the integrity-checking tasks to third-party auditors and be worry-free in using the cloud storage services.
- Through detailed security analysis and extensive experimental results, the authors show that their scheme is highly efficient and resilient to Byzantine failure, malicious data modification attacks, and even server colluding attacks.