Secure kNN Computation on Encrypted Databases
W. K. Wong, The University of Hong Kong, wkwong2@cs.hku.hk
David W. Cheung, The University of Hong Kong, dcheung@cs.hku.hk
Ben Kao, The University of Hong Kong, kao@cs.hku.hk
Nikos Mamoulis, The University of Hong Kong, nikos@cs.hku.hk
ABSTRACT
Service providers like Google and Amazon are moving into the SaaS (Software as a Service) business. They turn their huge infrastructure into a cloud-computing environment and aggressively recruit businesses to run applications on their platforms. To enforce security and privacy on such a service model, we need to protect the data running on the platform. Unfortunately, traditional encryption methods that aim at providing “unbreakable” protection are often not adequate because they do not support the execution of applications such as database queries on the encrypted data. In this paper we discuss the general problem of secure computation on an encrypted database and propose a SCONEDB (Secure Computation ON an Encrypted DataBase) model, which captures the execution and security requirements. As a case study, we focus on the problem of k-nearest neighbor (kNN) computation on an encrypted database. We develop a new asymmetric scalar-product-preserving encryption (ASPE) that preserves a special type of scalar product. We use ASPE to construct two secure schemes that support kNN computation on encrypted data; each of these schemes is shown to resist practical attacks of a different background knowledge level, at a different overhead cost. Extensive performance studies are carried out to evaluate the overhead and the efficiency of the schemes.
Categories and Subject Descriptors
H.2.7 [Database Administration]: Security, integrity, and
protection
General Terms
Algorithms, Security
Keywords
Security, kNN, Encryption
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
SIGMOD’09, June 29–July 2, 2009, Providence, Rhode Island, USA.
Copyright 2009 ACM 978-1-60558-551-2/09/06 ...$5.00.
1. INTRODUCTION
Emerging computing paradigms such as database service outsourcing and utility computing (a.k.a. cloud computing) offer attractive financial and technological advantages. They are drawing the interest of enterprises in migrating their computing operations, including DBMSs, to service providers. Nevertheless, many vocal consultants, including Gartner [7], have issued warnings about the security threats in the cloud computing model. Private information, which includes both customer data and business information, should not be revealed to unauthorized parties. In this paper we address an important security problem in service outsourcing: the elements of an encryption scheme and the execution protocol for encrypted query processing. More specifically, we study how sensitive data and queries should be transformed in an encrypted database environment and how a service provider processes encrypted queries on an encrypted database without revealing the plain data. We call our model of secure query processing SCONEDB (for Secure Computation ON an Encrypted DataBase).
The conventional way to deal with security threats is to encrypt the plain data and to allow only authorized parties to perform decryption. Unauthorized parties, including the service provider, should not be able to recover the plain data even if they can access the encrypted database. Some previous works [2, 10, 11] have studied this encryption problem in the outsourced database (ODB) model. However, these studies are restricted to simple SQL operations, e.g., exact matching of attribute values in point queries [12] and comparisons between numeric values in range queries [2]. In practice, users often interact with a database via applications whose queries are not easily expressible in SQL.
Moreover, most of the previous methods were engineered to work against one specific attack model. However, the problem should be studied with respect to various security requirements, considering different attacker capabilities. In this paper we focus on k-nearest neighbor (kNN) queries and show how various encryption schemes can be designed to support secure kNN query processing under different attacker capabilities. The kNN query is an important database analysis operation, used either as a standalone query (e.g., in similarity search applications on top of multimedia databases) or as a core module of common data mining tasks (e.g., classification and clustering).

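[Figure: Player 1 holds DB and the key K, encrypts each tuple t with E_T() and exports E(DB) to the EDBMS. Player 2 holds K, submits an encrypted query E_Q(q) to the EDBMS query processor and applies D() to the encrypted result R. The query processor works on E(DB) with the auxiliary operators Aux. Player 3 (attacker) performs cryptanalysis on the EDBMS using knowledge H to obtain DB_A.]
Figure 1: The SCONEDB Model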
1.1 The SCONEDB Model
Figure 1 shows our SCONEDB model for secure computation on an encrypted database. In our model, player 1 is the owner of a database DB on which player 2 wants to execute certain queries. To take advantage of the computational resources of a Service Provider (SP), instead of processing queries locally, player 1 exports DB to an Encrypted Database Management System (EDBMS) at which queries submitted by player 2 are processed. (In some applications, player 1 may not even store DB and may rely on the EDBMS as the sole repository of the data.) For security, all the tuples in DB are encrypted by player 1 before they are exported to the EDBMS. The EDBMS maintains and processes the encrypted database, E(DB). Likewise, queries submitted by player 2 are also encrypted. The EDBMS executes an encrypted query on the encrypted database and returns to player 2 an encrypted result R (e.g., R is a set of encrypted tuples that answer a kNN query). Player 2 applies a decryption function D to R to obtain the plain results.
Under our SCONEDB model, players 1 and 2 have to
agree on a security protocol. In particular, they choose a
common encryption scheme. In SCONEDB, an encryption
scheme consists of the following components:
A secret key K. A key K is required as a parameter to the encryption and decryption processes (note that a key may contain a number of components, e.g., RSA requires a pair of numbers as the key). In our model, the key is kept private to players 1 and 2.
A database encryption function $E_T()$. The encrypted database E(DB) is obtained by encrypting each tuple t in DB by $E_T(t, K)$.
A query encryption function $E_Q()$. Each query q is encrypted by $E_Q(q, K)$ before it is submitted to the EDBMS.
A result decryption function D(). Each tuple $\hat{t}$ in the encrypted result R is decrypted by $D(\hat{t}, K)$.
A set of auxiliary operators Aux. An auxiliary operator $A_e$ in Aux operates on the encrypted database to obtain information for the purpose of answering queries. Note that $A_e$ operates without knowing the secret key K and hence has no access to the plain tuples. For example, for kNN queries, $A_e$ may return the Euclidean distance between an encrypted database tuple p and an encrypted query tuple q.
The main goal in the SCONEDB model is to design an
encryption scheme in which Aux can operate on E(DB) to
support query processing.
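To make the division of labour in Figure 1 concrete, here is a minimal Python sketch of the message flow. The class `EncryptionScheme`, the function `edbms_knn`, and the translation-based placeholder scheme are hypothetical illustrations, not the paper's construction; the placeholder is deliberately insecure and only shows which player calls which component.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class EncryptionScheme:
    """Hypothetical container for the SCONEDB components of one scheme."""
    key: object                                      # secret key K
    encrypt_tuple: Callable[[Point, object], Point]  # E_T(t, K)
    encrypt_query: Callable[[Point, object], Point]  # E_Q(q, K)
    decrypt: Callable[[Point, object], Point]        # D(t_hat, K)
    aux: Callable[[Point, Point], float]             # A_e: sees ciphertexts only, never K


def edbms_knn(enc_db: Sequence[Point], enc_q: Point, k: int,
              aux: Callable[[Point, Point], float]) -> List[Point]:
    """EDBMS side: rank encrypted tuples using only the auxiliary operator."""
    return sorted(enc_db, key=lambda t: aux(t, enc_q))[:k]


# Placeholder scheme (translation by a secret vector). It is NOT secure;
# it only exercises the message flow of Figure 1.
def shift(p: Point, key) -> Point:
    return tuple(a + b for a, b in zip(p, key))


def unshift(p: Point, key) -> Point:
    return tuple(a - b for a, b in zip(p, key))


def euclid(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


toy = EncryptionScheme(key=(3.0, -7.0), encrypt_tuple=shift,
                       encrypt_query=shift, decrypt=unshift, aux=euclid)

db = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, 0.0)]
enc_db = [toy.encrypt_tuple(t, toy.key) for t in db]    # player 1 exports E(DB)
enc_q = toy.encrypt_query((1.5, 0.2), toy.key)          # player 2 sends E_Q(q, K)
R = edbms_knn(enc_db, enc_q, k=2, aux=toy.aux)          # EDBMS returns R
answer = sorted(toy.decrypt(t, toy.key) for t in R)     # player 2 applies D
print(answer)  # [(1.0, 1.0), (2.0, 0.0)]
```

Only `edbms_knn` runs at the untrusted site, and it never receives `toy.key`; the rest of the paper is about replacing the placeholder with schemes whose `aux` leaks less.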
1.2 Attack models
In our SCONEDB model, we assume that the EDBMS, which is possibly located at a third party (e.g., a service provider in the cloud), is not secure. Therefore, we assume that an attacker (player 3) sees the environment of the EDBMS. In particular, the attacker has access to the encrypted database, the encrypted queries, and the encrypted results. We also assume that the attacker knows which encryption scheme is being used. That is, he knows all the components of the scheme except the key. These include the encryption and decryption procedures ($E_T()$, $E_Q()$ and D()) and the set of auxiliary operators Aux. We assume that the attacker’s objective is to recover a plain database $DB_A \subseteq DB$, and that the attacker is capable of executing cryptanalysis algorithms that run in polynomial time (PTIME) with respect to the size of the encrypted database. Our objective is to prevent the attacker from obtaining $DB_A$. Apart from E(DB), the attacker may possess additional knowledge about the original data. To better evaluate the strength of an encryption scheme, we classify attackers into different levels based on the knowledge H they possess.
Level 1: The attacker observes only the encrypted database E(DB), i.e., $H = \langle E(DB) \rangle$. This corresponds to the ciphertext-only attack (COA) in cryptography [5]. In practice, there are applications accessed by secluded users, for which others can hardly observe any information other than the encrypted data.
Level 2: Apart from E(DB), the attacker knows a set of plain tuples P in DB, but he does not know the corresponding encrypted values of those tuples in E(DB), i.e., $H = \langle E(DB), P \rangle$ where $P \subset DB$. This corresponds to the known-sample attack in the database literature [15]. For example, if the attacker observes the encrypted database of a bank and some of his sources are customers of the bank, he then knows the values of several tuples in the plain database.
Level 3: Apart from E(DB), the attacker observes a set of tuples P in DB and knows the corresponding encrypted values of those tuples, i.e., $H = \langle E(DB), P, I \rangle$, where $P \subset DB$ and $I(t) = E_T(t, K)$ for all $t \in P$. This corresponds to the known-plaintext attack (KPA) in cryptography [5], or the known input-output attack in the database literature [15]. For example, if the attacker opens a new account at the bank and observes exactly one new encrypted tuple afterwards, he can associate the new account’s information (unencrypted) with the encrypted value of the new tuple.

A higher-level attack is more powerful than a lower-level one: if an encryption scheme resists a higher-level attack, it resists a lower-level one as well. Among the three attack levels defined, we remark that level-2 attacks capture practical scenarios, because in some applications it is not difficult to observe a small number of plain database tuples (e.g., by artificially inserting “spy” tuples in DB). In all cases, we assume that the attacker cannot observe the plain queries. In particular, we do not allow the attacker to disguise himself as player 2 and submit queries to the database. Note that level-3 attacks are rare in practice, since it is not easy for someone who does not hold the encryption key to associate known plain tuples with their encrypted values.
1.3 kNN on SCONEDB model
In this paper we focus on kNN queries and illustrate how an encryption scheme (which includes the above five components) can be developed to securely support kNN applications under the SCONEDB model. A kNN query searches for the k points in a database that are nearest to a given query point q. Note that each database tuple can be modeled as a multi-dimensional point if we consider some of its attributes as dimensions and their values as coordinates. One approach to securely supporting kNN is to use a distance-preserving transformation (DPT) to encrypt data points [20], so that the distance between any two encrypted points in E(DB) is the same as that between the corresponding original points in DB. Given this property, kNN can be computed on the encrypted database. Unfortunately, such a transformation has been shown to be insecure in practice: if an attacker can access the DPT-encrypted database E(DB) and knows a few points in the plain database DB, he can recover DB entirely [15].
A similar problem of kNN computation on an untrusted platform has been studied for location-based services (LBS) [8, 17, 9, 13], where users submit queries to an untrusted server that holds the data. The focus of such applications is on protecting the privacy of users (query content), since the database is assumed to be owned by the server [9]. While some LBS studies also address the privacy of records in the database, they adopt k-anonymity as the standard to protect the database [8, 17]. We remark that k-anonymity has a different security goal from our model: k-anonymity aims at preventing an attacker from identifying an individual in the database, but the content of the database may still be exposed. In addition, most of these models require the existence of a trusted intermediate party (a location anonymizer), which handles the data and query transformation. Besides being a single point of attack, this party compromises performance, as every query and result has to pass through it. In this paper we seek alternative encryption schemes that protect data security while returning accurate kNN results to users.
In the rest of the paper, we study various encryption schemes and analyze their vulnerability to different levels of attacks. In Section 2, we show that if the distance between any two points in the plain database DB can be determined from the points’ encrypted values in E(DB) (a property we call distance recoverability), then the encryption scheme is vulnerable to level-2 attacks. This observation leads us to develop an asymmetric scalar-product-preserving encryption (ASPE) that is not distance-recoverable (Section 3). ASPE can be used to construct a scheme that supports kNN computation and resists level-2 attacks. In Section 4, we describe how to extend this scheme to resist level-3 attacks, albeit with additional overhead. Section 5 empirically evaluates the proposed schemes. In Section 6, we briefly discuss how our encryption scheme effectively transforms the kNN problem into a top-k problem, and how that leads to efficient solutions to the problem of secure kNN computation. Section 7 reviews related work. Finally, Section 8 further discusses our SCONEDB model and concludes the paper.
2. DISTANCE-RECOVERABLE ENCRYPTION
In kNN computation, the distances from database points to a query point are computed to find the nearest neighbors of the query point. To solve the secure kNN problem, it is natural to consider an encryption scheme that allows the system to compute $d(p_1, p_2)$ on E(DB) for database points $p_1$ and $p_2$ in DB; kNN can then be computed efficiently under such a scheme. However, we show in this section that no encryption scheme that allows such distance computation is secure against level-2 attacks. We start with the definition of distance recoverability.
Definition 1. (Distance-recoverable encryption (DRE)) Given an encryption function E and a key K, let E(p, K) be the encrypted value of a point p in DB. E is distance-recoverable if and only if there exists a computational procedure f such that $\forall p_1, p_2, K:\ f(E(p_1, K), E(p_2, K)) = d(p_1, p_2)$.
A DPT [20] is an example of a DRE, because by definition a DPT preserves distances in the transformed space. Hence, if E is a DPT, we have $d(E(p_1, K), E(p_2, K)) = d(p_1, p_2)$, so f is simply the Euclidean distance. A DPT transforms the space by rotations and translations. For a point p in DB represented as a column vector, the encrypted value E(p, K) of p w.r.t. a DPT E can be expressed as $Np + t$, where N is a d × d orthogonal matrix and t is a d-dimensional column vector. Here, N and t together form the encryption key K. Since distances are preserved, a DPT supports efficient kNN computation. Regarding level-1 attacks, the attacker cannot recover DB since he knows neither N nor t [20]; so DPT is a scheme that resists level-1 attacks. Note that in our model we assume that the attacker knows E. Therefore, if E is a DRE, we assume that the attacker knows f as well. For example, if E is a DPT, the attacker knows that f is the Euclidean distance function d(). However, we will show that DRE, and hence DPT, is not secure under level-2 or level-3 attacks. We first show how to attack DRE at level 3.
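A small Python sketch of a DPT in two dimensions, assuming N is a rotation by a secret angle and t a secret translation (the key values are made up for the example). It shows why f is just the Euclidean distance: the distance between ciphertexts equals the distance between plaintexts.

```python
import math


def dpt_encrypt(p, theta, t):
    """E(p, K) = N p + t, with N a 2-D rotation by theta (orthogonal) and t a translation."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = p
    return (c * x - s * y + t[0], s * x + c * y + t[1])


key = (0.7, (10.0, -4.0))          # hypothetical K = (theta, t)
p1, p2 = (1.0, 2.0), (4.0, 6.0)
e1, e2 = dpt_encrypt(p1, *key), dpt_encrypt(p2, *key)

# f is just the Euclidean distance: both values equal 5 up to rounding.
print(math.dist(p1, p2), math.dist(e1, e2))
```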
Theorem 1. Assume a DRE E is used to encrypt DB to get E(DB). A level-3 attacker with $H = \langle E(DB), P, I \rangle$ can recover DB if P contains at least d + 1 points $x_i$ ($1 \le i \le d+1$) such that the vectors $\{x_j - x_1 \mid 2 \le j \le d+1\}$ are linearly independent.
Proof. Since the encryption is a DRE, the distance between any two points p and q, d(p, q), can be computed by the attacker using $f(E(p, K), E(q, K))$. Suppose the attacker wants to find the original value of an encrypted point $y' \in E(DB)$. Let the set of known points in P be $\{x_1, x_2, \ldots, x_{d+1}\}$ and let y be the original value of $y'$ before encryption. The attacker can set up d + 1 equations: $d(x_i, y) = f(I(x_i), y')$ for $i = 1, \ldots, d+1$. Note that the right-hand sides of these equations are numeric values known to the attacker. Each equation thus represents a d-dimensional hypersphere, and y lies on the intersection of the hyperspheres. Since y exists in the database, a solution must exist. We can show that if the vectors $\{x_j - x_1 \mid 2 \le j \le d+1\}$ are linearly independent, the d + 1 hyperspheres intersect at exactly one point (see Appendix A). So, y can be uniquely determined. Hence the attacker can recover the entire database.
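The intersection in the proof can be computed without any geometry library: subtracting the first sphere equation $\|x_1 - y\|^2 = s_1$ from the others cancels $\|y\|^2$ and leaves the linear system $2(x_i - x_1) \cdot y = \|x_i\|^2 - \|x_1\|^2 - s_i + s_1$. This Python sketch assumes the attacker has squared the distances returned by f (the sketch takes $s_i = d(x_i, y)^2$ directly) and uses exact rational arithmetic; `solve` and `recover_point` are hypothetical helper names, and the points are made up.

```python
from fractions import Fraction


def solve(A, b):
    """Gauss-Jordan elimination over Fractions; A must be square and invertible."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def recover_point(known, sq_dists):
    """known: d+1 plain points x_1..x_{d+1}; sq_dists[i] = d(x_i, y)^2 obtained via f."""
    x1, s1 = known[0], sq_dists[0]
    n1 = sum(v * v for v in x1)
    A, b = [], []
    for xi, si in zip(known[1:], sq_dists[1:]):
        # i-th sphere minus the first one: 2 (x_i - x_1) . y = ||x_i||^2 - ||x_1||^2 - s_i + s_1
        A.append([2 * (a - c) for a, c in zip(xi, x1)])
        b.append(sum(v * v for v in xi) - n1 - si + s1)
    return solve(A, b)


y = (3, -1, 2)                                           # hidden plain point
known = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 5)]     # d + 1 = 4 points of P
sq = [sum((a - c) ** 2 for a, c in zip(x, y)) for x in known]
print(recover_point(known, sq))  # [Fraction(3, 1), Fraction(-1, 1), Fraction(2, 1)]
```

The linear-independence condition of Theorem 1 is exactly what makes the matrix `A` invertible.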
The level-3 attack shown above is independent of the implementation of DRE, so no DRE (e.g., DPT) can survive it. Furthermore, we can show that DRE has poor resistance to level-2 attacks by showing that the attacker can “upgrade” his level-2 knowledge to level 3 using a signature linking attack.
We now explain the signature linking attack. At level 2, $H = \langle E(DB), P \rangle$, and the attacker constructs the signature of P from the pairwise distances between every two points in P. Suppose the points in P are ordered and $P = \{x_1, x_2, \ldots, x_{|P|}\}$. The signature of P, sig(P), is a vector of size $\binom{|P|}{2}$ of the form
$$sig(P) = (d(x_1, x_2), d(x_1, x_3), \ldots, d(x_1, x_{|P|}), d(x_2, x_3), \ldots, d(x_{|P|-1}, x_{|P|})).$$
The attacker tries to find an ordered set of encrypted points Q in E(DB) such that |Q| = |P| and Q gives the same signature as P. Let $Q = \{x'_1, x'_2, \ldots, x'_{|P|}\}$. Then
$$sig(Q) = (f(x'_1, x'_2), f(x'_1, x'_3), \ldots, f(x'_1, x'_{|P|}), f(x'_2, x'_3), \ldots, f(x'_{|P|-1}, x'_{|P|})).$$
If there is only one set Q with a matching signature, the attacker can conclude that $x'_i$ is the encrypted value of $x_i$, i.e., he can construct I with $I(x_i) = x'_i$ for all $x_i \in P$. With this I, $H = \langle E(DB), P, I \rangle$, and the attacker can now carry out a level-3 attack.
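The search for Q can be sketched as a backtracking procedure in Python. The sketch below is one possible implementation, not necessarily the one used in the paper's experiments: it extends a partial ordered Q one ciphertext at a time and prunes a candidate as soon as one of its pairwise f values disagrees with the corresponding entry of sig(P). A rotation-plus-translation DPT with a made-up key stands in for the DRE, so f is the Euclidean distance; `link` and `dpt` are hypothetical names.

```python
import math


def link(P, enc_db, f, tol=1e-9):
    """Backtracking search for every ordered Q in enc_db with sig(Q) == sig(P)."""
    # target[i][j] = d(x_i, x_j): the entries of sig(P) arranged as a matrix.
    target = [[math.dist(a, b) for b in P] for a in P]
    matches = []

    def extend(Q, used):
        m = len(Q)
        if m == len(P):
            matches.append(list(Q))
            return
        for idx, cand in enumerate(enc_db):
            if idx in used:
                continue
            # Prune: cand must reproduce d(x_j, x_m) with every point already chosen.
            if all(math.isclose(f(Q[j], cand), target[j][m], abs_tol=tol) for j in range(m)):
                Q.append(cand)
                used.add(idx)
                extend(Q, used)
                Q.pop()
                used.discard(idx)

    extend([], set())
    return matches


def dpt(p, theta=0.9, t=(5.0, -2.0)):
    """Stand-in DRE for the demo: rotation plus translation (hypothetical key)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])


db = [(0, 0), (3, 0), (0, 4), (7, 1), (2, 9), (5, 5)]
enc_db = [dpt(p) for p in db]
P = [(0, 0), (3, 0), (0, 4)]                 # level-2 knowledge: plain tuples only
matches = link(P, enc_db, f=math.dist)       # for a DPT, f is the Euclidean distance
I = dict(zip(P, matches[0])) if len(matches) == 1 else None   # unique match => I
```

With a unique match the attacker holds the level-3 mapping I and can run the attack of Theorem 1.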
The success of the signature linking attack rests on two issues: (1) Is it easy to find Q? (2) Is it likely that another set, say $Q'$, which is not the transformed set of the points in P, happens to give the same signature? (We call this a signature collision.) For the first issue, we remark that although the search space is huge¹, it can be pruned very effectively. For example, given two encrypted points $x'_1$ and $x'_2$ such that $f(x'_1, x'_2) \notin sig(P)$, we know that Q cannot contain both $x'_1$ and $x'_2$. For the second issue, we can show that the probability of signature collision is generally very small (see Appendix B). Also, if multiple Qs are found with the same signature as P, the attacker can increase the size of P to lower the likelihood of collision and repeat the attack. In Section 5, we will report experiments evaluating the feasibility of the signature linking attack in terms of the number of points in P required and its computation cost. The results confirm that the attacker can easily recover the database at an affordable cost.
¹ If there are n points in the database, there are $^{n}P_{|P|} = n!/(n-|P|)!$ candidate Q sets to examine.

3. ASYMMETRIC SCALAR-PRODUCT-PRESERVING ENCRYPTION
From our discussion, we observe that the weakness of DRE comes from the fact that the attacker is able to recover distance information from the encrypted database. More specifically, given any two points $p_1$ and $p_2$ in DB, their distance $d(p_1, p_2)$ can be determined from their encrypted values $E_T(p_1, K)$ and $E_T(p_2, K)$. These distances allow the attacker to compute signatures and thus to apply the signature linking attack. To resist level-2 attacks, we need an encryption function that does not reveal distance information. For kNN search, we observe that exact distance computation is not necessary. Rather, we only need a distance comparison operation: given two points $p_1, p_2$ in DB, we must decide which of the two points is nearer to a query point q. Note that
$$d(p_1, q) \le d(p_2, q) \iff \sqrt{\|p_1\|^2 - 2p_1 \cdot q + \|q\|^2} \le \sqrt{\|p_2\|^2 - 2p_2 \cdot q + \|q\|^2} \iff \|p_1\|^2 - \|p_2\|^2 + 2(p_2 - p_1) \cdot q \le 0 \qquad (1)$$
where $\|p\|$ represents the Euclidean norm of p, and · represents the scalar product. Since $\|p\|^2$ can be written as $p \cdot p$, the inequality decomposes into a number of scalar product computations. This suggests using a scalar-product-preserving encryption $E_{spp}$, i.e., $\forall p_1, p_2 \in DB:\ p_1 \cdot p_2 = E_{spp}(p_1, K) \cdot E_{spp}(p_2, K)$, for kNN computation. Unfortunately, a scalar-product-preserving encryption is also distance-recoverable and hence is not secure against level-2 attacks.
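Eq. (1) can be checked numerically. This Python sketch (the helper names are hypothetical) decides "is $p_1$ at least as near to q as $p_2$?" using only the scalar products on the right-hand side of Eq. (1), and compares the answer with a direct squared-distance comparison on random integer points, so the arithmetic is exact.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nearer_or_equal(p1, p2, q):
    """Eq. (1): d(p1,q) <= d(p2,q)  iff  ||p1||^2 - ||p2||^2 + 2 (p2 - p1) . q <= 0."""
    diff = [b - a for a, b in zip(p1, p2)]   # p2 - p1
    return dot(p1, p1) - dot(p2, p2) + 2 * dot(diff, q) <= 0


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


rng = random.Random(0)
for _ in range(1000):
    p1, p2, q = ([rng.randint(-50, 50) for _ in range(3)] for _ in range(3))
    # Only type-1 (p.p) and type-2 (p.q) products are needed; the ||q||^2 terms cancel.
    assert nearer_or_equal(p1, p2, q) == (sq_dist(p1, q) <= sq_dist(p2, q))
```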
Theorem 2. Scalar-product-preserving encryption is distance-recoverable.
Proof. Let $p'_1$ (resp. $p'_2$) be the encrypted point of $p_1$ (resp. $p_2$) in DB. We define the function f by
$$f(p'_1, p'_2) = \sqrt{p'_1 \cdot p'_1 - 2(p'_1 \cdot p'_2) + p'_2 \cdot p'_2} \qquad (2)$$
Since the encryption preserves scalar products, the RHS equals $\sqrt{p_1 \cdot p_1 - 2(p_1 \cdot p_2) + p_2 \cdot p_2} = d(p_1, p_2)$.
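A quick Python illustration of Theorem 2, assuming for the example that $E_{spp}$ is a secret rotation about the origin (any orthogonal matrix preserves every scalar product; the angle is made up). The attacker evaluates Eq. (2) on ciphertexts alone and obtains the plaintext distance.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rotate(p, theta):
    """Stand-in E_spp: an orthogonal map, so every scalar product is preserved."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def f(e1, e2):
    """Eq. (2), evaluated on ciphertexts only."""
    return math.sqrt(dot(e1, e1) - 2 * dot(e1, e2) + dot(e2, e2))


theta = 1.234                          # hypothetical secret key
p1, p2 = (2.0, -1.0), (5.0, 3.0)
recovered = f(rotate(p1, theta), rotate(p2, theta))
# recovered equals d(p1, p2) = 5 up to floating-point rounding
```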
There are three types of scalar products: (type-1) the scalar product of a database point with itself (e.g., $\|p\|^2$); (type-2) the scalar product of a database point with the query point; (type-3) the scalar product of two different database points $p_1$ and $p_2$. We observe that Eq. 1 consists of type-1 and type-2 products but not type-3 products, which are essential in Eq. 2. If an encryption preserves only type-1 and type-2 products but not type-3 products, then we can compare the distances $d(p_1, q)$ and $d(p_2, q)$ by Eq. 1, but the attacker cannot recover distances using Eq. 2. Furthermore, for each point p in the database, its type-1 scalar product, $\|p\|^2$, is fixed. Hence, if player 1 pre-computes all type-1 scalar products and makes them available (e.g., by storing them in the database) for kNN query processing, then the encryption need not preserve type-1 products either. In summary, we need an encryption that preserves type-2 but not type-1 or type-3 scalar products. With the pre-computed type-1 scalar products, we can verify the inequality of Eq. 1 on the encrypted database to implement the distance comparison operation. Since the encryption does not preserve type-1 or type-3 products, it is not distance-recoverable by design. The encryption is thus resilient to level-2 attacks.
Definition 2. (Asymmetric scalar-product-preserving encryption (ASPE)) Let E be an encryption function and E(p, K) be the encrypted value of a point p given a key K. E is an ASPE if and only if E preserves type-2 scalar products but not the other two types, i.e.,
(i) $p_i \cdot q = E(p_i, K) \cdot E(q, K)$ for any $p_i$ in DB and any query point q, and
(ii) $p_i \cdot p_j \ne E(p_i, K) \cdot E(p_j, K)$ for any $p_i$ and $p_j$ in DB.

In Definition 2, we require that the encrypted value of a query q not be equal to that of any point $p_j$ in DB, even when $q = p_j$. This suggests that query points and database points should be encrypted differently; that is, the encryption functions $E_T()$ and $E_Q()$ in the encryption scheme should be different.
The scalar product of p and q (represented by column vectors) can be written as $p^T I q$, where $p^T$ is the transpose of p and I is the d × d identity matrix. I can be decomposed into $M M^{-1}$ for any invertible matrix M, i.e., $p^T q = (p^T M)(M^{-1} q)$. If we set $p' = E_T(p, K) = M^T p$ (resp. $q' = E_Q(q, K) = M^{-1} q$), it is not possible to determine the value of p (resp. q) from $p'$ (resp. $q'$) without knowing M. Also, $p'^T q' = p^T M M^{-1} q = p^T q$, i.e., type-2 scalar products are preserved. Suppose $p'_1$ and $p'_2$ are the encrypted points of $p_1$ and $p_2$ in DB, respectively; then $p_1'^T p'_2 = p_1^T M M^T p_2$, which in general is not equal to $p_1^T p_2$ (equality for all points would require $M M^T = I$, i.e., an orthogonal M). Type-1 and type-3 scalar products are therefore not preserved. So, we can implement ASPE by using $M^T$ and $M^{-1}$ as the transformations for database points and queries, respectively.
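A minimal Python sketch of this basic ASPE, assuming a small hand-picked invertible integer matrix as the key and exact rational arithmetic for $M^{-1}$; the matrix and the helper names are illustrative only. It shows a type-2 product surviving encryption and a type-3 product not surviving it.

```python
from fractions import Fraction


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inverse over Fractions (A must be invertible)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


M = [[2, 1, 0], [0, 3, 1], [1, 0, 4]]      # hypothetical key, det(M) = 25
M_T, M_inv = transpose(M), inverse(M)


def enc_tuple(p):
    return matvec(M_T, p)       # p' = M^T p


def enc_query(q):
    return matvec(M_inv, q)     # q' = M^{-1} q


p1, p2, q = [1, 2, 3], [4, 0, -1], [2, -1, 5]
print(dot(enc_tuple(p1), enc_query(q)) == dot(p1, q))   # True: type-2 preserved
print(dot(enc_tuple(p1), enc_tuple(p2)), dot(p1, p2))   # 7 1: type-3 not preserved
```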
3.1 A Secure Scheme Against Level-2 Attacks
We have described a special encryption function, ASPE, that preserves type-2 scalar products; together with the pre-computed type-1 scalar products, we can perform distance comparisons to find the neighbors of a query point. However, if the type-1 product $\|p\|^2$ is revealed to the attacker, he knows that p lies on a hypersphere centered at the origin with radius $\|p\|$. Although the exact location of p is unknown, the revealed information partially compromises security. In this section, we show how to hide this information by “encrypting” $\|p\|^2$ and how the EDBMS computes kNN on such encrypted data.
Our idea is to treat the pre-computed type-1 scalar product $\|p\|^2$ as the (d+1)-st dimension of the point p. More specifically, given a (d-dimensional) database point p, we create a (d+1)-dimensional point $\hat{p}$. The first d dimensions of $\hat{p}$ are those of p, and the (d+1)-st dimension of $\hat{p}$ is set to $-0.5\|p\|^2$. (We multiply $\|p\|^2$ by this factor to facilitate distance comparisons, as shown later by Theorem 3.) The extended database points are then transformed (encrypted) using ASPE.
Similarly, we need to extend a query q to a (d+1)-dimensional point $\hat{q}$ before applying ASPE. The simplest way is to set the (d+1)-st dimension of $\hat{q}$ to 1. The weakness of this simple method is that the unencrypted query points $\hat{q}$ all lie on a d-dimensional hyperplane whose normal is the unit vector along the (d+1)-st dimension. Since ASPE is a linear transformation, the encrypted query points also all lie on a d-dimensional hyperplane in the transformed space, and the attacker can determine the normal of that hyperplane. By considering the normal in the original space and the normal in the transformed space, the attacker obtains some level-3-like information, which is undesirable.
To avoid this problem, we introduce a random factor. For each query q, we generate a random number r > 0 and scale $\hat{q}$ by r, i.e., $\hat{q} = r(q^T, 1)^T$. We will show in Theorem 3 that this scaling does not affect the correctness of the distance comparison operation.
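The leak and its fix can be seen in the smallest case, d = 1, where the "hyperplane" is a line and an observer can test collinearity on ciphertexts alone. In this Python sketch the 2 × 2 key is hypothetical, and the per-query values of r are fixed only to make the output reproducible; in the scheme r is drawn at random for each query.

```python
from fractions import Fraction

M = [[3, 1], [2, 1]]                 # hypothetical secret key, d = 1, det(M) = 1
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
M_inv = [[Fraction(M[1][1], det), Fraction(-M[0][1], det)],
         [Fraction(-M[1][0], det), Fraction(M[0][0], det)]]


def enc_query(q, r):
    """q' = M^{-1} q_hat with q_hat = r * (q, 1) for a 1-D query value q."""
    q_hat = (r * q, r)
    return tuple(row[0] * q_hat[0] + row[1] * q_hat[1] for row in M_inv)


def collinear(a, b, c):
    """Observable on ciphertexts alone: do a, b, c lie on one line?"""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]) == 0


queries = [0, 1, 3]
fixed = [enc_query(q, 1) for q in queries]                      # no random factor
scaled = [enc_query(q, r) for q, r in zip(queries, [1, 2, 5])]  # one r > 0 per query
print(collinear(*fixed), collinear(*scaled))   # True False
```

Without r, every encrypted query satisfies the same linear equation, so the line (in general, the hyperplane) is exposed after a handful of queries; with a fresh r per query the encrypted queries no longer share it.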
We summarize in Scheme 1 the procedures of the encryption scheme using ASPE.²

Key: a (d + 1) × (d + 1) invertible matrix M.
Tuple encryption function $E_T$: Consider a database point p. (1) Create a (d+1)-dimensional point $\hat{p} = (p^T, -0.5\|p\|^2)^T$. (2) The encrypted point is $p' = M^T \hat{p}$.
Query encryption function $E_Q$: Consider a query point q. (1) Generate a random number r > 0 and create a (d+1)-dimensional point $\hat{q} = r(q^T, 1)^T$. (2) The encrypted query point is $q' = M^{-1} \hat{q}$.
Distance comparison operator $A_e$: Let $p'_1$ and $p'_2$ be the encrypted points of $p_1$ and $p_2$, respectively. To determine whether $p_1$ is nearer to a query point q than $p_2$ is, the system checks whether $(p'_1 - p'_2) \cdot q' > 0$, where $q'$ is the encrypted point of q.
Decryption function D: Consider an encrypted point $p'$. The original point is $p = \pi_d (M^T)^{-1} p'$, where $\pi_d = (I_d, 0)$ is the d × (d+1) matrix that projects onto the first d dimensions and $I_d$ is the d × d identity matrix.
Scheme 1. ASPE
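The five procedures of Scheme 1 fit in a short Python sketch. It makes several assumptions of its own: the key is a random invertible integer matrix with entries in [-9, 9], r is a random integer in [1, 1000], and all arithmetic uses exact fractions. These choices only keep the toy small and checkable; they are not a secure parameterization. `encrypted_knn` plays the EDBMS and orders ciphertexts using nothing but the comparison operator $A_e$.

```python
import functools
import random
from fractions import Fraction


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inverse over Fractions; returns None if A is singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def keygen(d, rng):
    """Key K: a random invertible (d+1) x (d+1) integer matrix M, plus its inverse."""
    while True:
        M = [[rng.randint(-9, 9) for _ in range(d + 1)] for _ in range(d + 1)]
        M_inv = inverse(M)
        if M_inv is not None:
            return M, M_inv


def enc_tuple(p, key):
    """E_T: p_hat = (p^T, -0.5||p||^2)^T, then p' = M^T p_hat."""
    M, _ = key
    p_hat = list(p) + [Fraction(-1, 2) * sum(x * x for x in p)]
    return matvec(transpose(M), p_hat)


def enc_query(q, key, rng):
    """E_Q: draw r > 0, q_hat = r (q^T, 1)^T, then q' = M^{-1} q_hat."""
    _, M_inv = key
    r = rng.randint(1, 1000)
    return matvec(M_inv, [r * x for x in q] + [r])


def nearer(p1_enc, p2_enc, q_enc):
    """A_e: True iff p1 is nearer to q than p2, i.e. (p1' - p2') . q' > 0."""
    return sum((a - b) * c for a, b, c in zip(p1_enc, p2_enc, q_enc)) > 0


def decrypt(p_enc, key):
    """D: p_hat = (M^T)^{-1} p'; pi_d keeps the first d coordinates."""
    M, _ = key
    return matvec(inverse(transpose(M)), p_enc)[:-1]


def encrypted_knn(enc_db, q_enc, k):
    """EDBMS side: order ciphertexts with the comparison operator only."""
    def cmp(a, b):
        if nearer(a, b, q_enc):
            return -1
        return 1 if nearer(b, a, q_enc) else 0
    return sorted(enc_db, key=functools.cmp_to_key(cmp))[:k]


rng = random.Random(7)
key = keygen(2, rng)
db = [(1, 2), (8, 1), (3, 3), (-4, 5), (6, 6)]
enc_db = [enc_tuple(p, key) for p in db]                  # player 1
q_enc = enc_query((4, 2), key, rng)                       # player 2
answer = [tuple(decrypt(c, key)) for c in encrypted_knn(enc_db, q_enc, k=2)]
print(answer == [(3, 3), (1, 2)])   # True: same 2-NN as on the plain data
```

Sorting with a pairwise comparator is the simplest way to show that $A_e$ alone suffices for kNN; it is not meant as an efficient query-processing strategy.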
Theorem 3. Suppose $p'_1$, $p'_2$ and $q'$ are the encrypted points of the database points $p_1$, $p_2$ and the query point q, respectively. Scheme 1 correctly determines whether $p_1$ is closer to q than $p_2$ is by evaluating $(p'_1 - p'_2) \cdot q' > 0$.
Proof. Note that
$$(p'_1 - p'_2) \cdot q' = (p'_1 - p'_2)^T q' = (M^T \hat{p}_1 - M^T \hat{p}_2)^T M^{-1} \hat{q} = (\hat{p}_1 - \hat{p}_2)^T \hat{q}.$$
The scalar product of these two (d+1)-dimensional points can be written as
$$(p_1 - p_2)^T (rq) + (-0.5\|p_1\|^2 + 0.5\|p_2\|^2)\, r = 0.5r\left(\|p_2\|^2 - \|p_1\|^2 + 2(p_1 - p_2)^T q\right) = 0.5r\left(d(p_2, q)^2 - d(p_1, q)^2\right).$$
Since r > 0 and distances are non-negative, the condition is equivalent to
$$0.5r\left(d(p_2, q)^2 - d(p_1, q)^2\right) > 0 \iff d(p_2, q) > d(p_1, q).$$
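The proof shows that M cancels, so the heart of Theorem 3 is the identity $(\hat{p}_1 - \hat{p}_2)^T \hat{q} = 0.5r(d(p_2, q)^2 - d(p_1, q)^2)$ on the extended vectors. This Python sketch checks it exactly with rational arithmetic on random integer inputs; `theorem3_sides` is a hypothetical helper name.

```python
import random
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def theorem3_sides(p1, p2, q, r):
    """Return ((p1_hat - p2_hat) . q_hat, 0.5 r (d(p2,q)^2 - d(p1,q)^2))."""
    p1_hat = list(p1) + [Fraction(-1, 2) * dot(p1, p1)]
    p2_hat = list(p2) + [Fraction(-1, 2) * dot(p2, p2)]
    q_hat = [r * x for x in q] + [r]
    lhs = dot([a - b for a, b in zip(p1_hat, p2_hat)], q_hat)
    rhs = Fraction(r, 2) * (sq_dist(p2, q) - sq_dist(p1, q))
    return lhs, rhs


rng = random.Random(1)
for _ in range(500):
    d = rng.randint(1, 5)
    p1, p2, q = ([rng.randint(-20, 20) for _ in range(d)] for _ in range(3))
    lhs, rhs = theorem3_sides(p1, p2, q, r=rng.randint(1, 100))
    assert lhs == rhs   # exact equality, so the sign test in A_e is correct
```

Because r only scales both sides by a positive factor, the sign of the left-hand side, which is all $A_e$ inspects, is independent of r.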
3.2 Cost and security analysis
In this section, we analyze the cost of Scheme 1 and study whether the scheme can resist level-2 and level-3 attacks. First, the cost:
Encryption and decryption: To encrypt and decrypt, we perform two kinds of operations: (1) multiplication of an O(d) × O(d) matrix and an O(d)-dimensional point, which takes $O(d^2)$ time, and (2) computation of the Euclidean norm of an O(d)-dimensional point, which takes O(d) time. Computing $E_T()$ requires both
² There is a special case: if an encrypted point is the origin of the transformed space, the corresponding unencrypted point is the origin of the original space. To avoid this inference, we can perform a translation before applying Scheme 1, so that the origin is translated to a random point $O'$. This translation does not affect the correctness of the scheme.

Q1. What are the contributions of "Secure kNN computation on encrypted databases"?

In this paper, the authors discuss the general problem of secure computation on an encrypted database and propose the SCONEDB (Secure Computation ON an Encrypted DataBase) model, which captures the execution and security requirements. As a case study, the authors focus on the problem of k-nearest neighbor (kNN) computation on an encrypted database. The authors use ASPE (asymmetric scalar-product-preserving encryption) to construct two secure schemes that support kNN computation on encrypted data. Each scheme is shown to resist practical attacks at a different background knowledge level, at a different overhead cost.

The possibility of integrating different schemes in the SCONEDB model to support a wide range of applications makes an EDBMS a practical solution for service outsourcing. A future research issue is the systematic study of the different operators that can be supported on an encrypted database with respect to different security levels and goals. The attack model could also be extended to include other aspects, e.g., the amount of available computational power. How to include security goals as another component of the SCONEDB model is a subject for future work.

If there are $d+1$ points $x_i$ ($1 \le i \le d+1$) in $P$ such that the vectors $(x_i, -0.5\|x_i\|^2)$ are linearly independent, then the attacker can recover $DB$ from $E(DB)$.

The conventional way to deal with security threats is to apply encryption to the plain data and to allow only authorized parties to perform decryption.

Since $P = \{x_1, x_2, \ldots, x_{d+1}\}$ and the corresponding encrypted values $I(x_i)$ are known, the following equations can be set up to solve for $M$: $M\hat{x}_i = I(x_i)$, where $\hat{x}_i = (x_i, -0.5\|x_i\|^2)^T$, for $i = 1, \ldots, d+1$.
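The following is a minimal NumPy sketch of these equations. It is an illustration under stated assumptions, not the paper's code. The key matrix, the data sizes, and the choice of which $d+1$ points the attacker knows are all hypothetical. The encryption is written as $I(x) = M\hat{x}$ to match the notation of the equations. The sketch stacks the $d+1$ known pairs into one linear system, solves it for $M$, and then decrypts every ciphertext.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 100


def lift(x):
    """x_hat = (x, -0.5 * ||x||^2), the (d+1)-dimensional vector that Scheme 1 encrypts."""
    return np.append(x, -0.5 * np.dot(x, x))


# Hypothetical secret key: a random (d+1) x (d+1) matrix (invertible with probability 1).
M_secret = rng.standard_normal((d + 1, d + 1))
DB = rng.uniform(-10, 10, size=(n, d))
E_DB = np.array([M_secret @ lift(x) for x in DB])  # row i is I(x_i) = M x_hat_i

# Level-3 attacker: knows the plaintexts of the first d+1 points and their ciphertexts.
X_hat = np.column_stack([lift(DB[i]) for i in range(d + 1)])  # columns x_hat_i
C = np.column_stack([E_DB[i] for i in range(d + 1)])          # columns I(x_i)

# M X_hat = C  <=>  X_hat^T M^T = C^T; uniquely solvable when the x_hat_i are independent.
assert np.linalg.matrix_rank(X_hat) == d + 1
M_recovered = np.linalg.solve(X_hat.T, C.T).T

# Decrypt everything: x_hat = M^{-1} I(x), and the first d coordinates of x_hat are x.
recovered = np.linalg.solve(M_recovered, E_DB.T).T[:, :d]
print(np.allclose(recovered, DB))
```

Using `np.linalg.solve` rather than forming an explicit inverse is a numerical-stability choice. Any method that solves the $(d+1) \times (d+1)$ system would do.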

5.2.4 Query encryption and result decryption

[Figure: encryption time plotted against d']

For each query, player 2 needs to perform one encryption and k decryptions.

A weakness of Scheme 1 is that, given enough points in $P$, a level-3 attacker can set up enough equations to solve for the unknowns in the transformation matrix $M$.

The authors evaluate the performance of the schemes on four tasks: (i) key generation; (ii) database encryption; (iii) kNN computation; and (iv) query encryption and result decryption.

Emerging computing paradigms such as database service outsourcing and utility computing (a.k.a. cloud computing) offer attractive financial and technological advantages. 

A key K is required as a parameter to the encryption and decryption processes (note that a key may contain a number of components, e.g., RSA requires a pair of numbers as the key). 

The scalar product of $p$ and $q$ (represented as column vectors) can be written as $p^T I q$, where $p^T$ is the transpose of $p$ and $I$ is a $d \times d$ identity matrix.
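Inserting an invertible matrix leaves the scalar product unchanged, because $I = MM^{-1}$ for any invertible $d \times d$ matrix $M$. For this illustration only, assume a point is encrypted as $p' = M^T p$ and a query as $q' = M^{-1} q$. The identity is then:

```latex
p^{T} q \;=\; p^{T} I q \;=\; p^{T} M M^{-1} q \;=\; \left(M^{T} p\right)^{T} \left(M^{-1} q\right) \;=\; p'^{T} q'
```

The two sides use different transformations ($M^T$ versus $M^{-1}$), which is what makes the encryption asymmetric. This shows only the core idea. The schemes also lift points and queries to $(d+1)$ dimensions before encrypting.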

The weakness of this simple method is that the unencrypted query points $\hat{q}$ all lie on a $d$-dimensional hyperplane whose normal is the unit vector in the $(d+1)$-st dimension.
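This weakness survives encryption, as the minimal NumPy sketch below illustrates. It assumes the simple method sets $\hat{q} = (q, 1)$ and encrypts it as $q' = M^{-1}\hat{q}$ with a hypothetical key $M$; both forms are assumptions, not taken from the text. Since $Mq' = \hat{q}$, every encrypted query satisfies $n^T q' = 1$, where $n$ is the last row of $M$. An attacker who observes $d+1$ encrypted queries can therefore solve for $n$. Scaling $\hat{q}$ by a fresh random $r > 0$, as in the $(rq, r)$ form of the distance comparison, breaks this particular relation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
M = rng.standard_normal((d + 1, d + 1))  # hypothetical secret key
M_inv = np.linalg.inv(M)


def encrypt_query(q, r=1.0):
    """q' = M^{-1} (r*q, r); r = 1 corresponds to the simple method (assumed form)."""
    return M_inv @ (r * np.append(q, 1.0))


def random_query():
    return rng.uniform(-10, 10, d)


# The attacker sees d+1 encrypted queries from the simple method and solves n^T q'_j = 1.
observed = np.array([encrypt_query(random_query()) for _ in range(d + 1)])
n = np.linalg.solve(observed, np.ones(d + 1))

# Every later encrypted query lies on the same hyperplane n^T q' = 1 ...
fresh = np.array([encrypt_query(random_query()) for _ in range(20)])
print(np.allclose(fresh @ n, 1.0))

# ... but not once each query is scaled by a fresh random r > 0.
scaled = np.array([encrypt_query(random_query(), rng.uniform(1.0, 10.0)) for _ in range(20)])
print(np.allclose(scaled @ n, 1.0))
```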

Given a set $P = \{x_1, x_2, \ldots, x_{|P|}\} \subset DB$ in a level-2 attacker's knowledge $H$, the goal is to find a unique ordered set $Q \subset E(DB)$ such that $\mathrm{sig}(Q) = \mathrm{sig}(P)$.

The equations for solving the transformation matrices are $M_1^T \hat{p}_a = p'_a$ and $M_2^T \hat{p}_b = p'_b$, where $M_1$ and $M_2$ are two $d' \times d'$ unknown matrices.

There is a tradeoff between Scheme 2, which is resilient to level-3 attacks, and Scheme 1, which allows more efficient query processing.

It can be shown that DRE has poor resistance to level-2 attacks: using a signature linking attack, the attacker can “upgrade” his level-2 knowledge to level-3.

The attacker can set up equations to solve for $M$ and use $P_v$ to verify the hypothesis: if the recovered database contains $P_v$, the hypothesis may be correct; otherwise, it cannot be true.

The authors have shown that the signature linking attack requires only a small number of known points in $P$ to break a DRE, and that the attack is inexpensive.

The scalar product of these two $(d+1)$-dimensional points can be written as $(p_1 - p_2)^T (rq) + (-0.5\|p_1\|^2 + 0.5\|p_2\|^2)\, r = 0.5r\left(\|p_2\|^2 - \|p_1\|^2 + 2(p_1 - p_2)^T q\right) = 0.5r\left(d(p_2, q) - d(p_1, q)\right)$. So, since $r > 0$, the condition is equivalent to $0.5r\left(d(p_2, q) - d(p_1, q)\right) > 0 \Leftrightarrow d(p_2, q) > d(p_1, q)$.
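The comparison above can be sketched end to end in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation:

- A database point $p$ is encrypted as $p' = M^T\hat{p}$ with $\hat{p} = (p, -0.5\|p\|^2)$. This borrows the $M^T$ form of the Scheme 2 equations and is an assumption for Scheme 1.
- A query is encrypted as $q' = M^{-1}(rq, r)$ with a fresh random $r > 0$.
- The key $M$ is a hypothetical random invertible matrix.

Since $p'^T q' = \hat{p}^T \hat{q}$, ranking points by $p'^T q'$ in descending order matches ranking by distance to $q$. The identity above holds with $d(\cdot,\cdot)$ read as squared Euclidean distance, which orders points the same way as distance. The comments mark the O(d) squared-norm and $O(d^2)$ matrix-vector costs from the cost analysis.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, k = 5, 200, 10
M = rng.standard_normal((d + 1, d + 1))  # hypothetical secret key (invertible w.p. 1)
M_inv = np.linalg.inv(M)


def encrypt_point(p):
    p_hat = np.append(p, -0.5 * np.dot(p, p))  # squared Euclidean norm: O(d)
    return M.T @ p_hat                          # matrix-vector product: O(d^2)


def encrypt_query(q):
    r = rng.uniform(0.1, 10.0)                  # fresh random r > 0 for every query
    return M_inv @ (r * np.append(q, 1.0))      # matrix-vector product: O(d^2)


def closer(e1, e2, eq):
    """True iff p1 is closer to q than p2, decided from ciphertexts only."""
    return (e1 - e2) @ eq > 0


def knn_encrypted(enc_db, eq, k):
    scores = enc_db @ eq                        # score_i = r * (p_i^T q - 0.5 * ||p_i||^2)
    return np.argsort(-scores)[:k].tolist()     # largest score = nearest point


DB = rng.uniform(-10, 10, size=(n, d))
q = rng.uniform(-10, 10, d)
enc_db = np.array([encrypt_point(p) for p in DB])
eq = encrypt_query(q)

# Reference answer computed on plaintext squared distances.
plain = np.argsort(np.sum((DB - q) ** 2, axis=1))[:k].tolist()
print(knn_encrypted(enc_db, eq, k) == plain)
```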