
Identity-Based Remote Data Integrity Checking With Perfect Data Privacy Preserving for Cloud Storage

TL;DR: This paper proposes a new construction of identity-based (ID-based) RDIC protocol that makes use of a key-homomorphic cryptographic primitive to reduce the system complexity and the cost of establishing and managing the public key authentication framework required in PKI-based RDIC schemes.
Abstract: Remote data integrity checking (RDIC) enables a data storage server, say a cloud server, to prove to a verifier that it is actually storing a data owner's data honestly. To date, a number of RDIC protocols have been proposed in the literature, but most of the constructions suffer from the issue of complex key management, that is, they rely on the expensive public key infrastructure (PKI), which might hinder the deployment of RDIC in practice. In this paper, we propose a new construction of identity-based (ID-based) RDIC protocol by making use of a key-homomorphic cryptographic primitive to reduce the system complexity and the cost of establishing and managing the public key authentication framework in PKI-based RDIC schemes. We formalize ID-based RDIC and its security model, including security against a malicious cloud server and zero-knowledge privacy against a third party verifier. The proposed ID-based RDIC protocol leaks no information about the stored data to the verifier during the RDIC process. The new construction is proven secure against a malicious server in the generic group model and achieves zero-knowledge privacy against a verifier. Extensive security analysis and implementation results demonstrate that the proposed protocol is provably secure and practical in real-world applications.

Summary (2 min read)

Introduction

  • Keywords: Cloud storage, data integrity, privacy preserving, identity-based cryptography.
  • Cloud computing brings a number of advantages for cloud users.
  • Recently, remote data integrity checking has become increasingly significant due to the development of distributed storage systems and online storage systems.
  • The authors formally prove the correctness, soundness and zero-knowledge privacy of their ID-based RDIC protocol in Section V.

C. ID-based Signature

  • An identity-based signature (IDS) scheme [38], [39] consists of four polynomial-time, probabilistic algorithms described below.
  • Setup: this algorithm takes as input the security parameter k and outputs the master secret key msk and the master public key mpk.
  • Extract: this algorithm takes as input a user's identity ID and the master secret key msk, and generates a secret key usk for the user.

III. SYSTEM MODEL AND SECURITY MODEL

  • The authors describe the system model and security model of identity-based RDIC protocols.
  • The RDIC protocols with public verifiability enable anyone to audit the integrity of the outsourced data.
  • Four different entities namely the KGC, the cloud user, the cloud server and the TPA are involved in the system.
  • Each entity has its own obligations and benefits.
  • The TPA's job is to perform the data integrity checking on behalf of the cloud user, but the TPA is also curious, in the sense that it is willing to learn some information about the users' data during the data integrity checking procedure.

B. System Components and their Security

  • Six algorithms namely Setup, Extract, TagGen, Challenge, ProofGen and ProofCheck are involved in an identity-based RDIC system.
  • It takes the system parameters param, the master secret key msk and a user's identity ID ∈ {0, 1}∗ as input, and outputs the secret key sk_ID that corresponds to the identity ID.
  • The authors consider three security properties, namely completeness, security against a malicious server, and privacy against the TPA (perfect data privacy), in identity-based remote data integrity checking protocols.
  • For a file F of which a TagGen query has been made, the adversary can undertake executions of the ProofGen algorithm by specifying an identity ID of the data owner and the file name Fn.
  • The challenger plays the role of the TPA and the adversary A behaves as the prover during the proof generation.

IV. OUR CONSTRUCTION

  • The authors provide a concrete construction of secure identity-based remote data integrity checking protocol supporting perfect data privacy protection.
  • In the proof generation, the cloud server computes a response using the challenged blocks, obtains the corresponding plaintext and forwards it to the TPA.
  • If the equality holds, the verifier accepts the proof; otherwise, the proof is invalid.

V. SECURITY ANALYSIS OF THE NEW PROTOCOL

  • The authors show that the proposed scheme achieves the properties of completeness, soundness and perfect data privacy preserving.
  • Completeness guarantees the correctness of the protocol while soundness shows that the protocol is secure against an untrusted server.
  • Perfect data privacy states that the protocol leaks no information of the stored files to the verifier.

C. Perfect Data Privacy Preserving

  • To prove that the scheme preserves data privacy, the authors show how to construct a simulator S that, having black-box access to the verifier V, can simulate the remote data integrity checking protocol without knowledge of the data file blocks {mi} or their corresponding tags {σi} (the tags {σi} also contain information about the file blocks mi).
  • Next, S extracts the value ρ from V.

A. Numerical Analysis

  • The authors provide a numerical analysis of costs regarding computation, communication and storage of the proposed protocol in this part.
  • The authors present the computation cost from the viewpoint of the KGC, the data owner, the cloud server and the verifier (TPA).
  • This implies that the timing results for Setup, Extract and TagGen steps are constant for this part.
  • The authors can see that it costs the verifier only about 3.0 seconds to verify a response and the server 0.7 seconds to generate a response when challenging 460 blocks.
  • In the second part, the authors test the most expensive algorithm of the protocol, TagGen, by increasing the file size from 200 KB to 2 MB, that is, from 10,000 blocks to 100,000 blocks accordingly, and record the time for TagGen.

VII. CONCLUSION

  • The authors investigated a new primitive called identity-based remote data integrity checking for secure cloud storage.
  • The authors formalized the security model of two important properties of this primitive, namely, soundness and perfect data privacy.
  • The authors provided a new construction of this primitive and showed that it achieves soundness and perfect data privacy.
  • Both the numerical analysis and the implementation demonstrated that the proposed protocol is efficient and practical.


ORE Open Research Exeter deposit: published in IEEE Transactions on Information Forensics and Security; deposited in ORE 14 February 2017; this version available at http://hdl.handle.net/10871/25833. Open Research Exeter makes this work available in accordance with publisher policies. The version presented here may differ from the published version; if citing, consult the published version for pagination, volume/issue and date of publication.

Identity-based Remote Data Integrity Checking with Perfect Data Privacy Preserving for Cloud Storage

Yong Yu, Man Ho Au, Giuseppe Ateniese, Xinyi Huang, Yuanshun Dai, Willy Susilo and Geyong Min
Abstract: Remote data integrity checking (RDIC) enables a data storage server, such as a cloud server, to prove to a verifier that it is actually storing a data owner's data honestly. To date, a number of RDIC protocols have been proposed in the literature, but almost all the constructions suffer from the issue of complex key management, that is, they rely on the expensive public key infrastructure (PKI), which might hinder the deployment of RDIC in practice. In this paper, we propose a new construction of identity-based (ID-based) RDIC protocol by making use of a key-homomorphic cryptographic primitive to reduce the system complexity and the cost of establishing and managing the public key authentication framework in PKI-based RDIC schemes. We formalize ID-based RDIC and its security model, including security against a malicious cloud server and zero-knowledge privacy against a third party verifier. We then provide a concrete construction of an ID-based RDIC scheme which leaks no information about the stored files to the verifier during the RDIC process. The new construction is proven secure against a malicious server in the generic group model and achieves zero-knowledge privacy against a verifier. Extensive security analysis and implementation results demonstrate that the proposed protocol is provably secure and practical in real-world applications.

Keywords: Cloud storage, data integrity, privacy preserving, identity-based cryptography.
I. INTRODUCTION
Cloud computing [1], which has received considerable attention from research communities in academia as well as industry, is a distributed computation model over a large pool of shared, virtualized computing resources, such as storage, processing power, applications and services. Cloud users are provisioned and de-provisioned resources on demand in the cloud computing environment. This new kind of computing represents a vision of providing computing services as public utilities, like water and electricity. Cloud computing brings a number of advantages for cloud users: (1) users can avoid capital expenditure on hardware, software and services, because they pay only for what they use; (2) users can enjoy low management overhead and immediate access to a wide range of applications; and (3) users can access their data wherever they are, rather than having to stay close to their computers.

(Corresponding Author. Yong Yu, Yuanshun Dai and Geyong Min are with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China. Email: yuyong@uestc.edu.cn. Yong Yu and Xinyi Huang are with the Fujian Provincial Key Laboratory of Network Security and Cryptology, Fujian Normal University, Fuzhou 350007, China. Man Ho Au is with the Department of Computing, The Hong Kong Polytechnic University, China. Giuseppe Ateniese is with the Department of Computer Science, Sapienza University of Roma, Italy. Willy Susilo is with the Center for Computer and Information Security Research, School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia.)
However, a variety of barriers remain before cloud computing can be widely deployed. A recent survey by Oracle, citing data from the International Data Corporation (IDC) enterprise panel, showed that security represents 87% of users' cloud fears (IDC Enterprise Panel, 2010). One of the major security concerns of cloud users is the integrity of their outsourced files, since they no longer physically possess their data and thus lose control over it. Moreover, the cloud server is not fully trusted, and it is not mandatory for the cloud server to report data loss incidents. Indeed, to ascertain cloud computing reliability, the Cloud Security Alliance (CSA) published an analysis of cloud vulnerability incidents. The investigation [2] revealed that the incident category "Data Loss & Leakage" accounted for 25% of all incidents, ranked second only to "Insecure Interfaces & APIs". Take Amazon's cloud crash disaster as an example (http://www.businessinsider.com/amazon-lost-data-2011-4): in 2011, a crash of Amazon's huge EC2 cloud service permanently destroyed some data of cloud users. The data loss was apparently small relative to the total data stored, but anyone who runs a website can immediately understand how terrifying a prospect any data loss is. Sometimes it is insufficient to detect data corruption only when accessing the data, because it might be too late to recover the corrupted data. As a result, it is necessary for cloud users to frequently check whether their outsourced data are stored properly.
The size of the cloud data is huge, so downloading the entire file to check its integrity might be prohibitive in terms of bandwidth cost and, hence, very impractical. Moreover, traditional cryptographic primitives for data integrity checking, such as hash functions and message authentication codes (MACs), cannot be applied here directly, because verification would require a copy of the original file. In conclusion, remote data integrity checking for secure cloud storage is a highly desirable as well as challenging research topic.
Blum first addressed the auditing problem of enabling data owners to verify the integrity of remote data without explicit knowledge of the entire data [3]. Recently, remote data integrity checking has become increasingly significant due to the development of distributed storage systems and online storage systems. Provable data possession (PDP) [4], [5] at untrusted stores, introduced by Ateniese et al., is a novel technique for "blockless validating" of data integrity over remote servers. In PDP, the data owner generates some metadata for a file, then sends the data file together with the metadata to a remote server and deletes the file from local storage. To generate a proof that the server stores the original file correctly, the server computes a response to a challenge from the verifier. The verifier can check whether the file remains unchanged by checking the correctness of the response. PDP is a practical approach to checking the integrity of cloud data since it adopts a spot-checking technique. Specifically, a file is divided into blocks and a verifier only challenges a small set of randomly chosen blocks for integrity checking. According to the example given by Ateniese et al. [4], for a file with 10,000 blocks, if the server has deleted 1% of the blocks, then a verifier can detect the server's misbehavior with probability greater than 99% by asking for proofs of possession of only 460 randomly selected blocks (a quick numerical check is sketched after this paragraph). Ateniese et al. proposed two concrete PDP constructions by making use of RSA-based homomorphic linear authenticators. Due to its necessity and practicability, remote data integrity checking has attracted extensive research interest [7]–[10], [12]–[15], [17]–[20], [24], [25] in recent years. Shacham and Waters [7] proposed the notion of compact proofs of retrievability by making use of publicly verifiable homomorphic authenticators built from the BLS signature [36]. This scheme also relies on homomorphic properties to aggregate a proof into a small authenticator value, and as a result, public retrievability can be achieved.
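To make the 99% figure above concrete, the following self-contained Python sketch computes the exact hypergeometric detection probability of spot-checking. The function and parameter names are ours for illustration, not from the paper's implementation.

```python
# Detection probability of spot-checking: if t of n blocks are corrupted
# and c distinct blocks are challenged uniformly at random, the server
# escapes detection only if every challenged block is intact:
#   P[miss] = C(n-t, c) / C(n, c) = prod_{i=0}^{c-1} (n-t-i) / (n-i)
def detection_probability(n: int, t: int, c: int) -> float:
    miss = 1.0
    for i in range(c):
        miss *= (n - t - i) / (n - i)
    return 1.0 - miss

# The example from Ateniese et al. [4]: n = 10,000 blocks, 1% deleted,
# c = 460 challenged blocks -> detection probability around 0.99.
print(detection_probability(10_000, 100, 460))
```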
Ateniese et al. [21] considered a dynamic PDP scheme for the first time, based on hash functions and symmetric key encryption, in which the data owner can dynamically update their files after storing their data on the cloud server. The dynamic operations involve data insertion, modification, deletion and appending. This scheme [21] is efficient but supports only a limited number of queries, and block insertion cannot be explicitly supported. Erway et al. [23] extended the PDP model to a dynamic PDP model by utilizing rank-based authenticated skip lists. Wang et al. [15] improved the previous PDP models by employing the Merkle Hash Tree (MHT) for block tag authentication. A recent work due to Liu et al. [24] showed that the MHT itself is not enough to verify the block indices, which may lead to a replace attack. They gave a top-down levelled multi-replica MHT-based data auditing scheme for dynamic big data storage on the cloud.
In data integrity checking with public verifiability, an external auditor (or anyone) is able to verify the integrity of the cloud data. In this scenario, data privacy against the third party verifier is highly essential, since cloud users may store confidential or sensitive files, say business contracts or medical records, on the cloud. However, this issue has not been fully investigated. The "privacy" definition in the previous privacy-preserving public auditing scheme [13] requires only that the verifier cannot recover the whole blocks from the responses generated by the cloud server. This definition is not strong enough; for example, it is vulnerable to dictionary attacks. Wang et al. [25] proposed the notion of "zero-knowledge public auditing" to resist off-line guessing attacks. However, a formal security model is not provided in this work. Yu et al. [26] recently enhanced the privacy of remote data integrity checking protocols for secure cloud storage, but their model works only in the public key infrastructure (PKI) based scenario instead of the identity-based framework. Encrypting the file before outsourcing can partially address the data privacy issue, but it reduces the flexibility of the protocols, since privacy-preserving RDIC protocols can be used as a building block for other primitives. For example, Ateniese et al. [27] proposed a framework for building leakage-resilient identification protocols in the bounded retrieval model from publicly verifiable proofs of storage that are computationally zero-knowledge; in those identification schemes, even the encrypted files must stay private.
Currently, a majority of the existing RDIC constructions rely on PKI, where a digital certificate is used to guarantee the authenticity of a user's public key. These constructions incur complex key management procedures, since certificate generation, certificate storage, certificate update and certificate revocation are time-consuming and expensive. There is a variety of standards, say the Internet X.509 PKI certificate policy and certification practices framework (RFC 2527), that cover aspects of PKI; however, no predominant governing body exists to enforce these standards. Although a certificate authority (CA) is often regarded as trusted, flaws in the security procedures of various CAs have jeopardized trust in the entire PKI on which the Internet depends. For instance, after more than 500 fake certificates were discovered, web browser vendors were forced to blacklist all certificates issued by DigiNotar, a Dutch CA, in 2011. An alternative approach to using a certificate to authenticate a public key is identity-based cryptography [28], in which the public key of a user is simply his identity, say his name, email or IP address. A trusted key generation center (KGC) generates a secret key for each user corresponding to his identity. When all users have their secret keys issued by the same KGC, individual public keys become obsolete, thus removing the need for explicit certification and all associated costs. These features make the identity-based paradigm particularly appealing for use in conjunction with organization-oriented PDP. For example, suppose a university purchases a cloud storage service for its staff and students, who have valid e-mail addresses issued by the IT department of the university. All members of the university can obtain a secret key from the KGC, say the IT department, and can store their data together with the metadata of the file on the cloud. To ensure the data are stored properly, an auditor, say a member of the IT department staff, can check the integrity for any member using only his e-mail address, which relieves the complex key management caused by PKI. The first ID-based PDP was proposed in [29], which converted the ID-based aggregate signature due to Gentry [30] into an ID-based PDP protocol. Wang [31] proposed another identity-based provable data possession scheme for multi-cloud storage. However, their security model, called unforgeability for identity-based PDP, is not strong enough to capture the property of soundness, in the sense that the challenged data blocks are not allowed in TagGen queries in this model, which indicates that the adversary cannot access the tags of those blocks. This is clearly not consistent with real cloud storage, where the cloud server is, in fact, storing the tags of all data blocks. Moreover, the concrete identity-based PDP protocol in [31] fails to achieve soundness, a basic security requirement of PDP schemes. The reason is that the hash value of each block is used for generating the tag of the block; as a consequence, a malicious cloud server can keep only the hash values of the blocks to generate a valid response to a challenge.
Our Contributions. The contributions of this paper are summarized as follows.

• In an ID-based signature scheme, anyone with access to the signer's identity can verify a signature of the signer. Similarly, in ID-based RDIC protocols, anyone knowing a cloud user's identity, say a third party auditor (TPA), is able to check the data integrity on behalf of the cloud user. Thus, public verifiability is more desirable than private verification in ID-based RDIC, especially for resource-constrained cloud users. In this case, the property of zero-knowledge privacy is highly essential for data confidentiality in ID-based RDIC protocols. Our first contribution is to formalize the security model of zero-knowledge privacy against the TPA in ID-based RDIC protocols for the first time.

• We fill the gap that no secure and novel ID-based RDIC scheme exists to date. Specifically, we propose a concrete ID-based RDIC protocol, a novel construction that differs from the previous ones, by making use of the idea of a new primitive called asymmetric group key agreement [32], [33]. To be more specific, our challenge-response protocol is a two-party key agreement between the TPA and the cloud server, in which the challenged blocks must be used by the cloud server when generating the shared key, which serves as the response to a challenge from the TPA.

• We provide detailed security proofs of the new protocol, including the soundness and the zero-knowledge privacy of the stored data. Our security proofs are carried out in the generic group model [34]. This is the first correct security proof of an ID-based RDIC protocol; thus, the new security proof method itself may be of independent interest.

• We show the practicality of the proposal by developing a prototype implementation of the protocol.
Organization: The rest of the paper is organized as follows. In Section II, we review some preliminaries used in our ID-based RDIC construction. In Section III, we formalize the system model and security model of ID-based RDIC protocols. We describe our concrete construction of an ID-based RDIC protocol in Section IV. We formally prove the correctness, soundness and zero-knowledge privacy of our ID-based RDIC protocol in Section V. We report the performance and implementation results in Section VI. Section VII concludes our paper.
II. PRELIMINARIES
In this section, we review some preliminary knowledge used in this paper, including bilinear pairings, zero-knowledge proofs of equality of discrete logarithms, and identity-based signatures.
A. Bilinear Pairing
A bilinear pairing [28] maps a pair of group elements to another group element. Specifically, let G1, G2 be two cyclic groups of order p, and let g1 and g2 denote generators of G1 and G2 respectively. A function e : G1 × G1 → G2 is called a bilinear pairing if it has the following properties:

Bilinearity. For all u, v ∈ G1 and x, y ∈ Zp, e(u^x, v^y) = e(u, v)^{xy} holds.

Non-Degeneracy. e(g1, g1) ≠ 1_{G2}, where 1_{G2} is the identity element of G2.

Efficient Computation. e(u, v) can be computed efficiently (in polynomial time) for all u, v ∈ G1.
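The bilinearity law can be checked numerically. The sketch below is ours and relies on the third-party py_ecc library (pip install py_ecc), which implements an asymmetric pairing e : G1 × G2 → GT on the BN254 ("bn128") curve; the description above uses a symmetric pairing e : G1 × G1 → G2, but the algebraic law being exercised is the same.

```python
# Check e(u^x, v^y) = e(u, v)^{xy} on the bn128/BN254 pairing.
# Group operations are written additively in py_ecc, so u = g1^x
# becomes multiply(G1, x).
from py_ecc.bn128 import G1, G2, multiply, pairing

x, y = 6, 11                       # toy exponents
u = multiply(G1, x)                # u = g1^x
v = multiply(G2, y)                # v = g2^y

lhs = pairing(v, u)                # e(u, v); py_ecc takes the G2 point first
rhs = pairing(G2, G1) ** (x * y)   # e(g1, g2)^{xy}
assert lhs == rhs                  # bilinearity
print("bilinearity holds")
```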
B. Equality of Discrete Logarithm
Let G be a finite cyclic group such that |G| = q for some prime q, and let g1, g2 be generators of G. The following protocol [37] enables a prover P to prove to a verifier V that two elements Y1, Y2 have equal discrete logarithms to the bases g1 and g2 respectively, i.e., that Y1 = g1^x and Y2 = g2^x for some x known to P.

Commitment. P randomly chooses ρ ∈ Zq, computes T1 = g1^ρ, T2 = g2^ρ and sends T1, T2 to V.

Challenge. V randomly chooses a challenge c ∈ {0, 1}^λ and sends c back to P.

Response. P computes z = ρ − cx (mod q) and returns z to V.

Verify. V accepts the proof if and only if T1 = g1^z · Y1^c and T2 = g2^z · Y2^c hold.

This protocol can be converted into a more efficient non-interactive version, denoted POK{(x) : Y1 = g1^x ∧ Y2 = g2^x}, by replacing the challenge with the hash of the commitment, that is, c = H(T1 || T2), where H is a secure hash function.
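To make the protocol concrete, here is a self-contained Python sketch of the non-interactive variant with Fiat-Shamir challenge c = H(T1 || T2). The tiny 11-bit group and all parameter choices are ours for illustration and offer no security.

```python
# Non-interactive proof of equality of discrete logarithms over a toy
# Schnorr group: p = 2q + 1 with p, q prime; g1, g2 are squares mod p,
# hence generators of the order-q subgroup of Z_p^*.
import hashlib
import secrets

p, q = 2039, 1019          # toy safe prime and subgroup order; NOT secure
g1, g2 = 4, 9

x = secrets.randbelow(q)   # the common discrete logarithm
Y1, Y2 = pow(g1, x, p), pow(g2, x, p)

# Prover: commitment, Fiat-Shamir challenge (reduced mod q for
# convenience), and response z = rho - c*x (mod q).
rho = secrets.randbelow(q)
T1, T2 = pow(g1, rho, p), pow(g2, rho, p)
c = int.from_bytes(hashlib.sha256(f"{T1}||{T2}".encode()).digest(), "big") % q
z = (rho - c * x) % q

# Verifier: recompute c, then check T1 = g1^z * Y1^c and T2 = g2^z * Y2^c.
c2 = int.from_bytes(hashlib.sha256(f"{T1}||{T2}".encode()).digest(), "big") % q
ok = (T1 == pow(g1, z, p) * pow(Y1, c2, p) % p and
      T2 == pow(g2, z, p) * pow(Y2, c2, p) % p)
print("proof verifies:", ok)
```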
C. ID-based Signature
An identity-based signature (IDS) scheme [38], [39] consists of four polynomial-time, probabilistic algorithms described below.

Setup(k). This algorithm takes as input the security parameter k and outputs the master secret key msk and the master public key mpk.

Extract(msk, ID). This algorithm takes as input a user's identity ID and the master secret key msk, and generates a secret key usk for the user.

Sign(ID, usk, m). This algorithm takes as input a user's identity ID, a message m and the user's secret key usk, and generates a signature σ on the message m.

Verify(ID, m, σ, mpk). This algorithm takes as input a signature σ, a message m, an identity ID and the master public key mpk, and outputs whether the signature is valid.
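As a reading aid, the four algorithms can be summarized as a typed interface. This sketch is ours; the names and types are illustrative and not tied to any concrete IDS scheme from [38], [39].

```python
# The Setup/Extract/Sign/Verify interface of an identity-based signature
# scheme, expressed as a Python Protocol (structural interface).
from typing import Protocol, Tuple

class IdentityBasedSignature(Protocol):
    def setup(self, k: int) -> Tuple[bytes, bytes]:
        """Security parameter k -> (master secret key msk, master public key mpk)."""

    def extract(self, msk: bytes, identity: str) -> bytes:
        """Identity ID and msk -> user secret key usk."""

    def sign(self, identity: str, usk: bytes, message: bytes) -> bytes:
        """Identity, usk and message m -> signature sigma."""

    def verify(self, identity: str, message: bytes, sigma: bytes, mpk: bytes) -> bool:
        """True iff sigma is a valid signature on m for ID under mpk."""
```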
III. SYSTEM MODEL AND SECURITY MODEL
In this section, we describe the system model and security
model of identity-based RDIC protocols.

A. ID-based RDIC System

Usually, data owners themselves can check the integrity of their cloud data by running a two-party RDIC protocol. However, the auditing result from either the data owner or the cloud server might be regarded as biased in a two-party scenario. RDIC protocols with public verifiability enable anyone to audit the integrity of the outsourced data. To make the description of publicly verifiable RDIC protocols clear, we assume there exists a third party auditor (TPA) who has the expertise and capabilities to do the verification work. With this in mind, the ID-based RDIC architecture is illustrated in Fig. 1. Four different entities, namely the KGC, the cloud user, the cloud server and the TPA, are involved in the system. The KGC generates secret keys for all users according to their identities. The cloud user has a large amount of files to be stored on the cloud without keeping a local copy, and the cloud server has significant storage space and computation resources and provides data storage services for cloud users. The TPA has expertise and capabilities that cloud users do not have and is trusted to check the integrity of the cloud data on behalf of the cloud user upon request. Each entity has its own obligations and benefits. The cloud server could be self-interested and, for its own benefit, such as maintaining a good reputation, might even decide to hide data corruption incidents from cloud users. However, we assume that the cloud server has no incentive to reveal the hosted data to the TPA, because of regulations and financial incentives. The TPA's job is to perform the data integrity checking on behalf of the cloud user, but the TPA is also curious, in the sense that it is willing to learn some information about the users' data during the data integrity checking procedure.
Fig. 1. The system model of identity-based RDIC. (Figure omitted: it depicts the KGC issuing identity-based private keys to data owners, shared data flowing from data owners to the cloud server, security enforced against the cloud server, and privacy preserved against the third party auditor.)
B. System Components and their Security

Six algorithms, namely Setup, Extract, TagGen, Challenge, ProofGen and ProofCheck, are involved in an identity-based RDIC system.

Setup(1^k) is a probabilistic algorithm run by the KGC. It takes a security parameter k as input and outputs the system parameters param and the master secret key msk.

Extract(param, msk, ID) is a probabilistic algorithm run by the KGC. It takes the system parameters param, the master secret key msk and a user's identity ID ∈ {0, 1}∗ as input, and outputs the secret key sk_ID that corresponds to the identity ID.

TagGen(param, F, sk_ID) is a probabilistic algorithm run by the data owner with identity ID. It takes the system parameters param, the secret key of the user sk_ID and a file F ∈ {0, 1}∗ to store as input, and outputs the tags σ = (σ1, ..., σn) of the file blocks mi, which will be stored on the cloud together with the file F.

Challenge(param, Fn, ID) is a randomized algorithm run by the TPA. It takes the system parameters param, the data owner's identity ID, and a unique file name Fn as input, and outputs a challenge chal for the file named Fn on behalf of the user ID.

ProofGen(param, ID, chal, F, σ) is a probabilistic algorithm run by the cloud server. It takes the system parameters param, the challenge chal, the data owner's identity ID, the tags σ, the file F and its name Fn as input, and outputs a data possession proof P of the challenged blocks.

ProofCheck(param, ID, chal, P, Fn) is a deterministic algorithm run by the TPA. It takes the system parameters param, the challenge chal, the data owner's identity ID, the file name Fn and an alleged data possession proof P as input, and outputs 1 or 0 to indicate whether the file F remains intact.
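The division of labour among the entities can be summarized by the following illustrative driver, which strings the six algorithms together in the order they are invoked. All identifiers are placeholders of ours; the concrete algebra is deferred to Section IV.

```python
# End-to-end flow of an ID-based RDIC session: which party runs which
# of the six algorithms, and what flows between them.
def rdic_session(scheme, identity: str, file_bytes: bytes, file_name: str) -> bool:
    param, msk = scheme.setup(k=128)                 # KGC, run once
    sk_id = scheme.extract(param, msk, identity)     # KGC -> data owner
    tags = scheme.tag_gen(param, file_bytes, sk_id)  # data owner; the file and
                                                     # tags are uploaded to the cloud
    chal = scheme.challenge(param, file_name, identity)                # TPA
    proof = scheme.proof_gen(param, identity, chal, file_bytes, tags)  # cloud server
    return scheme.proof_check(param, identity, chal, proof, file_name) # TPA: 1 or 0
```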
We consider three security properties, namely completeness, security against a malicious server (soundness), and privacy against the TPA (perfect data privacy), in identity-based remote data integrity checking protocols. Following the security notions due to Shacham and Waters [7], an identity-based RDIC scheme is called secure against a server if there exists no polynomial-time algorithm that can cheat the TPA with non-negligible probability, and there exists a polynomial-time extractor that can recover the file by running the challenge-response protocol multiple times. Completeness states that when interacting with a valid cloud server, the ProofCheck algorithm will accept the proof. Soundness says that a cheating prover who can convince the TPA that it is storing the data file is actually storing that file. We now formalize the security model of soundness for identity-based remote data integrity checking below, in which an adversary plays the role of the untrusted server and a challenger represents a data owner.

Security against the Server. This security game captures the requirement that an adversary cannot successfully generate a valid proof for a user ID corresponding to a given challenge without possessing all the challenged blocks, unless it guesses all the challenged blocks. The game consists of the following phases [35].

Setup: The challenger runs the Setup algorithm to obtain the system parameters param and the master secret key msk, and forwards param to the adversary, while keeping msk confidential.

Citations

Journal ArticleDOI
TL;DR: The architecture and features of fog computing are reviewed and critical roles of fog nodes are studied, including real-time services, transient storage, data dissemination and decentralized computation, which are expected to draw more attention and efforts into this new architecture.
Abstract: Internet of Things (IoT) allows billions of physical objects to be connected to collect and exchange data for offering various applications, such as environmental monitoring, infrastructure management, and home automation. On the other hand, IoT has unsupported features (e.g., low latency, location awareness, and geographic distribution) that are critical for some IoT applications, including smart traffic lights, home energy management and augmented reality. To support these features, fog computing is integrated into IoT to extend computing, storage and networking resources to the network edge. Unfortunately, it is confronted with various security and privacy risks, which raise serious concerns towards users. In this survey, we review the architecture and features of fog computing and study critical roles of fog nodes, including real-time services, transient storage, data dissemination and decentralized computation. We also examine fog-assisted IoT applications based on different roles of fog nodes. Then, we present security and privacy threats towards IoT applications and discuss the security and privacy requirements in fog computing. Further, we demonstrate potential challenges to secure fog computing and review the state-of-the-art solutions used to address security and privacy issues in fog computing for IoT applications. Finally, by defining several open research issues, it is expected to draw more attention and efforts into this new architecture.

499 citations

Journal ArticleDOI
TL;DR: A sanitizer is used to sanitize the data blocks corresponding to the sensitive information of the file and transforms these data blocks’ signatures into valid ones for the sanitized file, which makes the file stored in the cloud able to be shared and used by others on the condition that thesensitive information is hidden, while the remote data integrity auditing is still able to been efficiently executed.
Abstract: With cloud storage services, users can remotely store their data to the cloud and realize the data sharing with others. Remote data integrity auditing is proposed to guarantee the integrity of the data stored in the cloud. In some common cloud storage systems such as the electronic health records system, the cloud file might contain some sensitive information. The sensitive information should not be exposed to others when the cloud file is shared. Encrypting the whole shared file can realize the sensitive information hiding, but will make this shared file unable to be used by others. How to realize data sharing with sensitive information hiding in remote data integrity auditing still has not been explored up to now. In order to address this problem, we propose a remote data integrity auditing scheme that realizes data sharing with sensitive information hiding in this paper. In this scheme, a sanitizer is used to sanitize the data blocks corresponding to the sensitive information of the file and transforms these data blocks’ signatures into valid ones for the sanitized file. These signatures are used to verify the integrity of the sanitized file in the phase of integrity auditing. As a result, our scheme makes the file stored in the cloud able to be shared and used by others on the condition that the sensitive information is hidden, while the remote data integrity auditing is still able to be efficiently executed. Meanwhile, the proposed scheme is based on identity-based cryptography, which simplifies the complicated certificate management. The security analysis and the performance evaluation show that the proposed scheme is secure and efficient.

182 citations


Cites background from "Identity-Based Remote Data Integrity...":

  • "...[24] constructed a remote data integrity auditing scheme with perfect data privacy preserving in identity-based cryptosystems...."

Journal ArticleDOI
TL;DR: A blockchain-based security architecture for distributed cloud storage, where users can divide their own files into encrypted data chunks, and upload those data chunks randomly into the P2P network nodes that provide free storage capacity is proposed.

155 citations

Journal ArticleDOI
TL;DR: Blockchain techniques are utilized to develop a novel public auditing scheme for verifying data integrity in cloud storage that, unlike existing works involving three participatory entities, removes the third party auditor and is shown to defend against malicious entities and the 51% attack.
Abstract: Cloud storage enables applications to efficiently manage their remote data but facing the risk of being tampered with. This paper utilizes blockchain technique to develop a novel public auditing scheme for verifying data integrity in cloud storage. In the proposed scheme, different from the existing works that involve three participatory entities, only two predefined entities (i.e. data owner and cloud service provider) who may not trust each other are involved, and the third party auditor for data auditing is removed. Specifically, data owners store the lightweight verification tags on the blockchain and generate a proof by constructing the Merkle Hash Tree using the hashtags to reduce the overhead of computation and communication for integrity verification. Besides, this work is able to achieve 100% confidence of auditing theoretically, as the hashtag of each data block is utilized to build the Merkle Hash Tree for the data integrity verification. Security analysis shows that the proposed scheme can defend against malicious entities and the 51% attack. Experimental results demonstrate the significant improvements on computation and communication.

123 citations

References
Book ChapterDOI
19 Aug 2001
TL;DR: This work proposes a fully functional identity-based encryption scheme (IBE) based on the Weil pairing that has chosen ciphertext security in the random oracle model assuming an elliptic curve variant of the computational Diffie-Hellman problem.
Abstract: We propose a fully functional identity-based encryption scheme (IBE). The scheme has chosen ciphertext security in the random oracle model assuming an elliptic curve variant of the computational Diffie-Hellman problem. Our system is based on the Weil pairing. We give precise definitions for secure identity based encryption schemes and give several applications for such systems.

7,083 citations

Book ChapterDOI
09 Dec 2001
TL;DR: A short signature scheme based on the Computational Diffie-Hellman assumption on certain elliptic and hyperelliptic curves is introduced, designed for systems where signatures are typed in by a human or signatures are sent over a low-bandwidth channel.
Abstract: We introduce a short signature scheme based on the Computational Diffie-Hellman assumption on certain elliptic and hyperelliptic curves. The signature length is half the size of a DSA signature for a similar level of security. Our short signature scheme is designed for systems where signatures are typed in by a human or signatures are sent over a low-bandwidth channel.

3,697 citations


"Identity-Based Remote Data Integrit..." refers background in this paper

  • ...Shacham and Waters [7] proposed the notion of compact proofs of retrievability by making use of publicly verifiable homomorphic authenticators from BLS signature [36]....

    [...]

  • ...[36] to sign a user’s identity ID ∈ {0, 1}∗ and obtain the user’s secret key....

    [...]

Proceedings ArticleDOI
28 Oct 2007
TL;DR: The provable data possession (PDP) model as discussed by the authors allows a client that has stored data at an untrusted server to verify that the server possesses the original data without retrieving it.
Abstract: We introduce a model for provable data possession (PDP) that allows a client that has stored data at an untrusted server to verify that the server possesses the original data without retrieving it. The model generates probabilistic proofs of possession by sampling random sets of blocks from the server, which drastically reduces I/O costs. The client maintains a constant amount of metadata to verify the proof. The challenge/response protocol transmits a small, constant amount of data, which minimizes network communication. Thus, the PDP model for remote data checking supports large data sets in widely-distributed storage systems. We present two provably-secure PDP schemes that are more efficient than previous solutions, even when compared with schemes that achieve weaker guarantees. In particular, the overhead at the server is low (or even constant), as opposed to linear in the size of the data. Experiments using our implementation verify the practicality of PDP and reveal that the performance of PDP is bounded by disk I/O and not by cryptographic computation.

2,238 citations


Posted Content
TL;DR: This paper defines and explores proofs of retrievability (PORs): a POR scheme enables an archive or back-up service to produce a concise proof that a user can retrieve a target file F, that is, that the archive retains and reliably transmits file data sufficient for the user to recover F in its entirety.
Abstract: In this paper, we define and explore proofs of retrievability (PORs). A POR scheme enables an archive or back-up service (prover) to produce a concise proof that a user (verifier) can retrieve a target file F, that is, that the archive retains and reliably transmits file data sufficient for the user to recover F in its entirety. A POR may be viewed as a kind of cryptographic proof of knowledge (POK), but one specially designed to handle a large file (or bitstring) F. We explore POR protocols here in which the communication costs, number of memory accesses for the prover, and storage requirements of the user (verifier) are small parameters essentially independent of the length of F. In addition to proposing new, practical POR constructions, we explore implementation considerations and optimizations that bear on previously explored, related schemes. In a POR, unlike a POK, neither the prover nor the verifier need actually have knowledge of F. PORs give rise to a new and unusual security definition whose formulation is another contribution of our work. We view PORs as an important tool for semi-trusted online archives. Existing cryptographic techniques help users ensure the privacy and integrity of files they retrieve. It is also natural, however, for users to want to verify that archives do not delete or modify files prior to retrieval. The goal of a POR is to accomplish these checks without users having to download the files themselves. A POR can also provide quality-of-service guarantees, i.e., show that a file is retrievable within a certain time bound.

1,783 citations

Frequently Asked Questions (17)
Q1. What are the contributions mentioned in the paper "Identity-based remote data integrity checking with perfect data privacy preserving for cloud storage" ?

In this paper, the authors propose a new construction of identity-based (ID-based) RDIC protocol by making use of a key-homomorphic cryptographic primitive to reduce the system complexity and the cost of establishing and managing the public key authentication framework in PKI-based RDIC schemes. The authors then provide a concrete construction of an ID-based RDIC scheme which leaks no information about the stored files to the verifier during the RDIC process.

The verifier needs to perform 1 pairing operation and 6 exponentiations in G1 to generate a challenge when using the proof of equality of discrete logarithms given in [37].

The Setup algorithm picks some random values and computes a modular exponentiation in G1, which costs 4.8 ms, and the Extract algorithm needs to perform one modular exponentiation in G1 for generating the private key of a cloud user, which costs 0.1 ms.

The main computation cost of generating a proof by the cloud server is calculating the aggregation of the σi, that is, σ = ∏_{i∈I} σi^{vi}, and the total cost is 2P + (2c−1)M_{G1} + E_{G2} + M_{G2}.

The time cost of the off-line computation of generating tags for a 1 MB file is 241.9 seconds, while the on-line time cost is 20.3 seconds.

An ID-RDIC scheme is called ε-sound if there exists an extraction algorithm Extr such that, for every adversary A, whenever A, playing the soundness game, outputs an ε-admissible cheating prover P† on identity ID† and file name Fn†, Extr recovers F† from P†, i.e., Extr(param, ID†, Fn†, P†) = F†, except possibly with negligible probability.

In the random oracle model, the only way for A to successfully return m′ = H3(∏_{i∈I} e(H2(fname||i)^{vi}, r^ρ)) is to make a query to H3 with an element ξ in the group G2.

The challenger runs the Setup algorithm to obtain the system parameters param and the master secret key msk, and forwards param to the adversary, while keeping msk confidential.

The dominant computation of the data owner is generating the tags for the file blocks as σi = s^{mi} · H2(fname||i)^η, which is the most expensive operation in the protocol, but fortunately it can be done offline.

It takes the system parameters param, the challenge chal, the data owner's identity ID, the file name Fn and an alleged data possession proof P as input, and outputs 1 or 0 to indicate whether the file F remains intact.

The TagGen algorithm is expensive, and the authors show that the TagGen timing result consists of two phases: an off-line phase, where the data owner can preprocess H2(fname||i)^η without knowing the actual data; and an on-line phase, where the data owner needs to compute s^{mi} for each data block.

The authors can see that it costs the verifier only about 3.0 seconds to verify a response and the server 0.7 seconds to generate a response when challenging 460 blocks. 

Due to the unforgeability of the identity-based signature, the authors can safely assume that r (represented by the element ξ^η used in the verification) is the one given to A during the TagGen query.

In the second part, the authors test the most expensive algorithm of the protocol, TagGen, by increasing the file size from 200 KB to 2 MB, that is, from 10,000 blocks to 100,000 blocks accordingly, and record the time for TagGen.

Usually, data owners themselves can check the integrity of their cloud data by running a two-party RDIC protocol.

To prove that the scheme preserves data privacy, the authors show how to construct a simulator S that, having black-box access to the verifier V, can simulate the remote data integrity checking protocol without knowledge of the data file blocks {mi} or their corresponding tags {σi} (the tags {σi} also contain information about the file blocks mi).

The implementation shows that generating tags is more expensive than the other parts, but fortunately, computing the tags for a file is a one-time task, as compared to challenging the outsourced data, which will be done repeatedly.
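The off-line/on-line split of TagGen mentioned above (σi = s^{mi} · H2(fname||i)^η) can be sketched as follows. This is our toy illustration: a multiplicative group modulo a small prime stands in for the pairing group G1, and H2 is modeled by hashing followed by squaring; none of the parameters come from the paper's implementation.

```python
# Toy sketch of TagGen's two phases: the hash part H2(fname||i)^eta can
# be precomputed off-line, and only s^{m_i} remains once the blocks arrive.
import hashlib

p = 2039                          # toy prime standing in for the group order

def H2(data: bytes) -> int:
    # Toy hash-to-group: hash to an integer, then square mod p.
    h = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return pow(h % p, 2, p) or 1  # avoid 0 in the degenerate case

def taggen_offline(fname: str, n: int, eta: int) -> list:
    # Off-line: precompute H2(fname||i)^eta before the data is known.
    return [pow(H2(f"{fname}||{i}".encode()), eta, p) for i in range(1, n + 1)]

def taggen_online(pre: list, blocks: list, s: int) -> list:
    # On-line: one exponentiation s^{m_i} per block finishes each tag.
    return [(pow(s, m_i, p) * h_eta) % p for m_i, h_eta in zip(blocks, pre)]

tags = taggen_online(taggen_offline("report.bin", 4, eta=77), [5, 9, 2, 13], s=10)
print(tags)
```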