What are the future works mentioned in the paper "Secure knn computation on encrypted databases" ?

The possibility of integrating different schemes in the SCONEDB model to support a wide range of applications makes EDBMS a practical solution to service outsourcing. A future research issue is the systematic study of the operators that can be supported on an encrypted database w.r.t. different security levels and goals. It is possible to extend the attack model to include other aspects, e.g., the amount of available computational power. How to include the security goal as another component of the SCONEDB model is a subject for future work.

How many d+1 points can be recovered from E(DB)?

If there are d+1 points $x_i$ (1 ≤ i ≤ d+1) in P such that the vectors $(x_i, -0.5\|x_i\|^2)$ are linearly independent, then the attacker can recover DB from E(DB).

How many kNN equations can be used to solve M?

Since the authors know $P = \{x_1, x_2, ..., x_{d+1}\}$ and the corresponding encrypted values $I(x_i)$, the authors can set up the following equations to solve for M: $M \hat{x}_i = I(x_i)$, where $\hat{x}_i = (x_i, -0.5\|x_i\|^2)^T$, for i = 1 to d + 1.

How many queries do players need to perform to break a DRE?

Section 5.2.4 (Query encryption and result decryption): For each query, player 2 needs to perform one encryption and k decryptions.

what is the weakness of the scheme?

A weakness of Scheme 1 is that, given enough points in P, a level-3 attacker can set up enough equations to solve for the unknowns in the transformation matrix M.

How many tasks are used to evaluate the performance of the schemes?

The authors evaluate the performance of the schemes under 4 tasks: (i) key generation; (ii) database encryption; (iii) kNN computation and (iv) query encryption and result decryption.

What is the scalar product of p and q?

The scalar product of p and q (represented by column vectors) can be represented as $p^T I q$, where $p^T$ is the transpose of p and I is a d × d identity matrix.

What is the weakness of the encryp-?

The weakness of this simple method is that the unencrypted query points q̂’s all lie on a d-dimensional hyperplane with the unit vector in the (d+1)-st dimension being the normal of the hyperplane.

What is the way to find a unique ordered set of Q?

Given a set $P = \{x_1, x_2, ..., x_{|P|}\} \subset DB$ in a level-2 attacker’s knowledge H, the authors want to find a unique ordered set $Q \subset E(DB)$ such that sig(Q) = sig(P).

What is the simplest way to solve the transformation matrices?

The equations for solving the transformation matrices are $M_1^T \hat{p}_a = p'_a$ and $M_2^T \hat{p}_b = p'_b$, where $M_1$ and $M_2$ are two d′ × d′ unknown matrices.

what is the tradeoff between the two proposed schemes?

There is a tradeoff between Scheme 2, which is resilient to level-3 attacks, and Scheme 1, which allows more efficient query processing.

how to set up equations to solve for M?

The attacker can set up equations to solve for M and use Pv to verify the hypothesis: if the recovered database contains Pv, the hypothesis may be correct; otherwise, the hypothesis cannot be true.

How many known points in P can be broken to break a DRE?

The authors have shown that the signature linking attack requires only a small number of known points in P to break a DRE and that the attack is inexpensive.

what is the scalar product of these two (d+1)-dimensional points?

The scalar product of these two (d+1)-dimensional points can be represented as $(p_1 - p_2)^T (rq) + (-0.5\|p_1\|^2 + 0.5\|p_2\|^2) r = 0.5r(\|p_2\|^2 - \|p_1\|^2 + 2(p_1 - p_2)^T q) = 0.5r(d(p_2, q)^2 - d(p_1, q)^2)$. Since r > 0 and distances are non-negative, the condition is equivalent to $0.5r(d(p_2, q)^2 - d(p_1, q)^2) > 0 \Leftrightarrow d(p_2, q) > d(p_1, q)$.

(Open Access) Secure kNN computation on encrypted databases (2009) | Wai Kit Wong

Q: What are the contributions in "Secure knn computation on encrypted databases" ?

In this paper the authors discuss the general problem of secure computation on an encrypted database and propose a SCONEDB (Secure Computation ON an Encrypted DataBase) model, which captures the execution and security requirements. As a case study, the authors focus on the problem of k-nearest neighbor (kNN) computation on an encrypted database. The authors use ASPE to construct two secure schemes that support kNN computation on encrypted data; each of these schemes is shown to resist practical attacks of a different background knowledge level, at a different overhead cost.

Secure kNN Computation on Encrypted Databases

W. K. Wong, The University of Hong Kong, wkwong2@cs.hku.hk

David W. Cheung, The University of Hong Kong, dcheung@cs.hku.hk

Ben Kao, The University of Hong Kong, kao@cs.hku.hk

Nikos Mamoulis, The University of Hong Kong, nikos@cs.hku.hk

ABSTRACT

Service providers like Google and Amazon are moving into the SaaS (Software as a Service) business. They turn their huge infrastructure into a cloud-computing environment and aggressively recruit businesses to run applications on their platforms. To enforce security and privacy on such a service model, we need to protect the data running on the platform. Unfortunately, traditional encryption methods that aim at providing “unbreakable” protection are often not adequate because they do not support the execution of applications such as database queries on the encrypted data. In this paper we discuss the general problem of secure computation on an encrypted database and propose a SCONEDB (Secure Computation ON an Encrypted DataBase) model, which captures the execution and security requirements. As a case study, we focus on the problem of k-nearest neighbor (kNN) computation on an encrypted database. We develop a new asymmetric scalar-product-preserving encryption (ASPE) that preserves a special type of scalar product. We use ASPE to construct two secure schemes that support kNN computation on encrypted data; each of these schemes is shown to resist practical attacks of a different background knowledge level, at a different overhead cost. Extensive performance studies are carried out to evaluate the overhead and the efficiency of the schemes.

Categories and Subject Descriptors

H.2.7 [Database Administration]: Security, integrity, and protection

General Terms

Algorithms, Security

Keywords

Security, kNN, Encryption

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

SIGMOD’09, June 29–July 2, 2009, Providence, Rhode Island, USA.

1. INTRODUCTION

Emerging computing paradigms such as database service outsourcing and utility computing (a.k.a. cloud computing) offer attractive financial and technological advantages. These are drawing the interest of enterprises in migrating their computing operations, including DBMSs, to service providers. Nevertheless, many vocal consultants, including Gartner [7], have issued warnings on the security threats in the cloud computing model. Private information, which includes both customer data and business information, should not be revealed to unauthorized parties. In this paper we address a very important problem of security in services outsourcing: the elements of an encryption scheme and the execution protocol for encrypted query processing. More specifically, we study how sensitive data and queries should be transformed in an encrypted database environment and how a service provider processes encrypted queries on an encrypted database without the plain data being revealed. We call our model of secure query processing SCONEDB (for Secure Computation ON an Encrypted DataBase).

The conventional way to deal with security threats is to apply encryption on the plain data and to allow only authorized parties to perform decryption. Unauthorized parties, including the service provider, should not be able to recover the plain data even if they can access the encrypted database. Some previous works [2, 10, 11] have studied this encryption problem in the outsourced database (ODB) model. However, these studies are restricted to simple SQL operations, e.g., exact match of attribute value in point query [12]; comparisons between numeric values in range query [2]. In practice, users often interact with a database via applications in which queries are not easily expressible in SQL.

Moreover, most of the previous methods were specially engineered to work against one specific attack model. However, the problem should be studied with respect to various security requirements, considering different attacker capabilities. In this paper we focus on k-nearest neighbor (kNN) queries and show how various encryption schemes are designed to support secure kNN query processing under different attacker capabilities. The kNN query is an important database analysis operation, used as a standalone query (e.g., in similarity search applications on top of multimedia databases) or as a core module of common data mining tasks (e.g., classification and clustering).

Figure 1: The SCONEDB Model (diagram not reproduced; its labels are Player 1, Player 2, EDBMS, E(DB), Query Processor, Aux, D(R), Cryptanalysis, and Player 3 (attacker))

1.1 The SCONEDB Model

Figure 1 shows our SCONEDB model for secure encrypted database computation. In our model, player 1 is the owner of a database DB on which player 2 wants to execute certain queries. To take advantage of the computational resources of a Service Provider (SP), instead of processing queries locally, player 1 exports DB to an Encrypted Database Management System (EDBMS) at which queries submitted by player 2 are processed. (In some applications, player 1 may not even store DB and rely on the EDBMS as the sole repository of the data.) For security, all the tuples in DB are encrypted by player 1 before they are exported to the EDBMS. The EDBMS maintains and processes the encrypted database, E(DB). Likewise, queries submitted by player 2 are also encrypted. EDBMS executes an encrypted query on the encrypted database and returns to player 2 an encrypted result R (e.g., R is a set of encrypted tuples which are the answer of a kNN query). Player 2 applies a decryption function D on R to obtain the plain results.

Under our SCONEDB model, players 1 and 2 have to agree on a security protocol. In particular, they choose a common encryption scheme. In SCONEDB, an encryption scheme consists of the following components:

• A secret key K. A key K is required as a parameter to the encryption and decryption processes (note that a key may contain a number of components, e.g., RSA requires a pair of numbers as the key). In our model, the key is kept private to players 1 and 2.

• A database encryption function $E_T()$. The encrypted database E(DB) is obtained by encrypting each tuple t in DB by $E_T(t, K)$.

• A query encryption function $E_Q()$. Each query q is encrypted by $E_Q(q, K)$ before it is submitted to the EDBMS.

• A result decryption function D(). Each tuple $t'$ in the encrypted result R is decrypted by $D(t', K)$.

• A set of auxiliary operators Aux. An auxiliary operator A in Aux operates on the encrypted database to obtain information for the purpose of answering queries. Note that A operates without knowing the secret key K and hence has no access to the plain tuples. For example, for kNN queries, A may return the Euclidean distance between an encrypted database tuple p and an encrypted query tuple q.

The main goal in the SCONEDB model is to design an encryption scheme in which Aux can operate on E(DB) to support query processing.
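The five components above can be pictured as a small interface. The following is only a sketch: the class name, field names, and the toy exact-match scheme are hypothetical and are not taken from the paper, and the toy "encryption" (adding the key to an integer) is deliberately insecure. It only illustrates which player calls which component, and that Aux runs on ciphertexts without the key.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class EncryptionScheme:
    """Hypothetical container for the five SCONEDB components."""
    key: Any                                  # secret key K, shared by players 1 and 2
    encrypt_tuple: Callable[[Any, Any], Any]  # E_T(t, K), run by player 1
    encrypt_query: Callable[[Any, Any], Any]  # E_Q(q, K), run by player 2
    decrypt: Callable[[Any, Any], Any]        # D(t', K), run by player 2
    aux: Dict[str, Callable[..., Any]]        # keyless operators, run by the EDBMS


def toy_exact_match_scheme(key: int) -> EncryptionScheme:
    # Toy and insecure (assumption for illustration): shift integers by the key.
    return EncryptionScheme(
        key=key,
        encrypt_tuple=lambda t, k: t + k,
        encrypt_query=lambda q, k: q + k,
        decrypt=lambda t_enc, k: t_enc - k,
        aux={"equals": lambda t_enc, q_enc: t_enc == q_enc},
    )


def run_query(aux: Dict[str, Callable[..., Any]],
              encrypted_db: List[Any], encrypted_query: Any) -> List[Any]:
    """EDBMS side: evaluate Aux on ciphertexts only; the key is never passed in."""
    return [t for t in encrypted_db if aux["equals"](t, encrypted_query)]


scheme = toy_exact_match_scheme(key=7)
encrypted_db = [scheme.encrypt_tuple(t, scheme.key) for t in [3, 5, 8]]
encrypted_query = scheme.encrypt_query(5, scheme.key)
result = [scheme.decrypt(t, scheme.key)
          for t in run_query(scheme.aux, encrypted_db, encrypted_query)]
```

Passing only `scheme.aux` to `run_query` mirrors the requirement that the service provider never sees K.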

1.2 Attack models

In our SCONEDB model, we assume that the EDBMS, which is possibly located at a third party (e.g., a service provider in the cloud), is not secure. Therefore, we assume that an attacker (player 3) sees the environment of the EDBMS. In particular, the attacker has access to the encrypted database, the encrypted queries, and the encrypted results. Also, we assume that the attacker knows what encryption scheme is being used. That is, he knows all the components of the scheme except the key. These include the encryption and decryption procedures ($E_T()$, $E_Q()$ and D()) and the set of auxiliary operators Aux. We assume that the attacker’s objective is to recover plain data, i.e., some or all of the tuples in DB. We assume the attacker is capable of executing PTIME cryptanalysis algorithms with respect to the size of the encrypted database. Our objective is to prevent the attacker from obtaining this plain data. Apart from E(DB), the attacker may possess additional knowledge about the original data. To better evaluate the strength of an encryption scheme, we classify attackers into different levels based on the knowledge H they possess.

• Level 1: The attacker observes only the encrypted database E(DB), i.e., H = ⟨E(DB)⟩. This corresponds to the ciphertext-only attack (COA) in cryptography [5]. In practice, there are applications accessed by secluded users, for which others can hardly observe any information other than the encrypted data.

• Level 2: Apart from E(DB), the attacker knows a set of plain tuples P in DB but he does not know the corresponding encrypted values of those tuples in E(DB), i.e., H = ⟨E(DB), P⟩ where P ⊂ DB. This corresponds to the known-sample attack in database literature [15]. For example, if the attacker observes the encrypted database of a bank and some of his sources are customers of the bank, he then knows the values of several tuples in the plain database.

• Level 3: Apart from E(DB), the attacker observes a set of tuples P in DB and he knows the corresponding encrypted values of those tuples, i.e., H = ⟨E(DB), P, I⟩, where P ⊂ DB and $I(t) = E_T(t, K)$ for all t ∈ P. This corresponds to the known-plaintext attack (KPA) in cryptography [5] or known input-output attack in database literature [15]. For example, if the attacker opens a new account at the bank and observes only one new encrypted tuple afterwards, he can associate the new account’s information (unencrypted) with the encrypted value of the new tuple.

A higher-level attack is more powerful than a lower-level one. If an encryption scheme resists a higher-level attack, it resists a lower-level one as well. Among the 3 attack levels defined, we remark that level-2 attacks capture practical scenarios. This is because in some applications, it is not difficult to observe a small number of plain database tuples (e.g., by artificially inserting “spy” tuples in DB).

We assume the attacker cannot observe the plain queries in all cases. In particular, we do not allow the attacker to disguise himself as player 2 and submit queries to the database. Note that level-3 attacks are rare in practice, since it is not easy for someone who does not hold the encryption key to associate known plain tuples with their encrypted values.

1.3 kNN on SCONEDB model

In this paper we focus on kNN queries and illustrate how an encryption scheme (which includes the above five components) can be developed to securely support kNN applications under the SCONEDB model. A kNN query searches for k points in a database that are the nearest to a given query point q. Note that each database tuple can be modeled as a multi-dimensional point, if we consider some of its attributes as dimensions and their values as its coordinates. One approach to securely support kNN is to use distance-preserving transformation (DPT) to encrypt data points [20] so that the distance between any two encrypted points in E(DB) is the same as that between the corresponding original points in DB. Given this property, kNN can be computed on the encrypted database. Unfortunately, such a transformation has been shown to be insecure in practice. If an attacker can access the DPT-encrypted database E(DB) and knows a few points in the plain database DB, he can recover DB entirely [15].

A similar problem on kNN computation at an untrusted platform is studied for location based services (LBS) [8, 17, 9, 13], where users submit queries to an untrusted server which holds the data. The focus of such applications is on protecting the privacy of users (query content), since the database is assumed to be owned by the server [9]. While some studies in LBS also address the privacy of records in the database, k-anonymity is adopted as the standard to protect the database [8, 17]. We remark that k-anonymity has a different security goal compared to our model; k-anonymity aims at preventing an attacker from identifying an individual from the database, but the content in the database may be exposed. In addition, most of these models require the existence of a trusted intermediate party (location anonymizer), which handles the data and query transformation. This party, apart from being a single point of attack, compromises performance as every query and result has to pass through it. In this paper we seek alternative encryption schemes that protect data security and at the same time return accurate kNN results to users.

In the rest of the paper, we study various encryption schemes and analyze their vulnerability to different levels of attacks. In Section 2, we show that if the distance between any two points in the plain database DB can be determined from the points’ encrypted values in E(DB) (a property we call distance recoverability), then the encryption scheme is vulnerable to level-2 attacks. This observation leads us to develop an asymmetric scalar-product-preserving encryption (ASPE) that is not distance-recoverable (Section 3). ASPE can be used to construct a scheme to support kNN computation that resists level-2 attacks. In Section 4, we describe how we can extend this scheme to resist level-3 attacks, however, with additional overhead. Section 5 empirically evaluates the proposed schemes. In Section 6, we briefly discuss how our encryption scheme effectively transforms the kNN problem into a top-k problem, and how that leads to efficient solutions to the problem of secure kNN computation. Section 7 reviews related work. Finally, Section 8 further discusses our SCONEDB model and concludes the paper.

2. DISTANCE-RECOVERABLE ENCRYPTION

In kNN computation, distances from database points to a query point are computed for finding the nearer neighbors to the query point. To solve the secure kNN problem, it is natural to consider adopting an encryption scheme that allows the system to compute $d(p_1, p_2)$ on E(DB) for database points $p_1$ and $p_2$ in DB. kNN can then be computed efficiently w.r.t. such a scheme. However, we show in this section that no encryption scheme is secure against level-2 attacks if it allows distance computation as suggested above. We start with the definition of distance recoverability.

Definition 1. (Distance-recoverable encryption (DRE)) Given an encryption function E and a key K, let E(p, K) be the encrypted value of a point p in DB. E is distance-recoverable if and only if there exists a computational procedure f such that $\forall p_1, p_2, K$: $f(E(p_1, K), E(p_2, K)) = d(p_1, p_2)$.

A DPT [20] is an example of DRE. This is because, by definition, a DPT preserves distances in the transformed space. Hence, if E is a DPT, we have $d(E(p_1, K), E(p_2, K)) = d(p_1, p_2)$. So, f is simply the Euclidean distance. A DPT transforms the space by rotations and translations. For a point p in DB represented as a column vector, the encrypted value E(p, K) of p w.r.t. a DPT E can be expressed as Np + t, where N is a d × d orthogonal matrix and t is a d-dimensional column vector. Distance between points is preserved, i.e., $d(p_1, p_2) = d(E(p_1, K), E(p_2, K))$. So, DPT supports efficient kNN computations. Here, N and t together form the encryption key K. Regarding level-1 attacks, the attacker cannot recover DB since he does not know N or t [20]. So, DPT is a scheme that resists level-1 attacks. Note that in our model, we assume that the attacker knows E. Therefore, if E is a DRE, we assume that the attacker knows f as well. For example, if E is a DPT, the attacker knows that f is the Euclidean distance function d(). However, we will show that DRE, and hence DPT, is not secure under level-2 or level-3 attacks. We first show how to attack DRE at level-3.

Theorem 1. Assume a DRE E is used to encrypt DB to get E(DB). A level-3 attacker with H = ⟨E(DB), P, I⟩ can recover DB if P contains at least d + 1 points $x_i$ (1 ≤ i ≤ d + 1) such that the set of vectors $\{x_j - x_1 \mid 2 \le j \le d + 1\}$ is linearly independent.

Proof. Since the encryption is a DRE, the distance between any two points p and q, d(p, q), can be computed by the attacker using f(E(p, K), E(q, K)). Suppose the attacker wants to find the original value of an encrypted point $y' \in E(DB)$. Let the set of known points in P be $\{x_1, x_2, ..., x_{d+1}\}$ and y be the original value of $y'$ before encryption. He can set up d + 1 equations: $d(x_i, y) = f(I(x_i), y')$ for i = 1 to d + 1. Note that the right-hand sides of the equations are numeric values known to the attacker. Each equation thus represents a d-dimensional hypersphere. The solution for y lies on the intersection of the hyperspheres. Since y exists in the database, a solution must exist. We can show that if the set of vectors $\{x_j - x_1 \mid 2 \le j \le d + 1\}$ is linearly independent, the d + 1 hyperspheres intersect at exactly one point (see Appendix A). So, y can be uniquely determined. Applying the same procedure to every encrypted point, the attacker can recover the entire database.
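The attack in this proof can be tried out on a toy DPT. The sketch below is an illustration under assumptions, not the authors' code: the DRE is a 2-D rotation plus translation, the function names (`dpt_encrypt`, `recover_point`, `solve_linear`) are hypothetical, and the intersection of the hyperspheres is found by subtracting the first sphere equation from the others, which gives the linear system $2(x_j - x_1) \cdot y = \|x_j\|^2 - \|x_1\|^2 - r_j^2 + r_1^2$ with $r_i = f(I(x_i), y')$.

```python
import math


def dpt_encrypt(p, theta, t):
    """Toy 2-D DPT: rotate by theta, then translate by t (the key is (theta, t))."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])


def solve_linear(A, b):
    """Solve A y = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("the vectors x_j - x_1 are not linearly independent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def recover_point(known_plain, known_cipher, target_cipher, f=math.dist):
    """Level-3 attack: recover y from d+1 known (x_i, I(x_i)) pairs and y'."""
    x1 = known_plain[0]
    r1 = f(known_cipher[0], target_cipher)
    A, b = [], []
    for xj, cj in zip(known_plain[1:], known_cipher[1:]):
        rj = f(cj, target_cipher)
        A.append([2 * (a - c) for a, c in zip(xj, x1)])
        b.append(sum(a * a for a in xj) - sum(a * a for a in x1) - rj ** 2 + r1 ** 2)
    return solve_linear(A, b)


theta, t = 0.7, (5.0, -2.0)                      # secret key, unknown to the attacker
known = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]     # P: d + 1 = 3 known plain points
known_enc = [dpt_encrypt(x, theta, t) for x in known]
secret_enc = dpt_encrypt((3.0, 4.0), theta, t)   # y' observed in E(DB)
recovered = recover_point(known, known_enc, secret_enc)  # approximately [3.0, 4.0]
```

Note that `recover_point` never uses `theta` or `t`; it only needs the distance procedure f, which for a DPT is the Euclidean distance.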

The level-3 attack shown above is independent of the implementation of DRE. So, no DRE (e.g., DPT) can survive this level-3 attack. Furthermore, we can show that DRE has poor resistance to level-2 attacks by showing that the attacker can “upgrade” his level-2 knowledge to level-3 using the signature linking attack.

Let us explain the signature linking attack. At level-2, with H = ⟨E(DB), P⟩, the attacker constructs the signature of P from the pairwise distances between every two points in P. Suppose the points in P are ordered and $P = \{x_1, x_2, ..., x_{|P|}\}$. The signature of P, sig(P), is a vector of size $\binom{|P|}{2}$ of the form $(d(x_1, x_2), d(x_1, x_3), ..., d(x_1, x_{|P|}), d(x_2, x_3), ..., d(x_{|P|-1}, x_{|P|}))$. The attacker tries to find an ordered set of encrypted points Q in E(DB), such that |Q| = |P| and Q gives the same signature as P. Let $Q = \{x'_1, x'_2, ..., x'_{|P|}\}$; sig(Q) is $(f(x'_1, x'_2), f(x'_1, x'_3), ..., f(x'_1, x'_{|P|}), f(x'_2, x'_3), ..., f(x'_{|P|-1}, x'_{|P|}))$. If there is only one set Q with a matching signature, the attacker can conclude that $x'_i$ is the encrypted value of $x_i$, i.e., he can construct I with $I(x_i) = x'_i$ for all $x_i \in P$. With this I, H = ⟨E(DB), P, I⟩, and the attacker can now carry out a level-3 attack.
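A small backtracking search makes the linking step concrete. This is a sketch under assumptions: the DRE is again a toy 2-D rotation plus translation, the helper names (`dpt`, `link_signatures`) are hypothetical, and distances are compared with a small tolerance because of floating-point rounding. Each candidate is checked against the points already placed, so partial sets Q that cannot match sig(P) are discarded early.

```python
import math


def dpt(p, theta, t):
    """Toy 2-D DPT (rotation by theta, then translation by t), standing in for any DRE."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])


def link_signatures(P, encrypted_db, f=math.dist, tol=1e-9):
    """Return every ordered Q drawn from E(DB) whose signature matches sig(P)."""
    matches = []

    def extend(Q):
        i = len(Q)
        if i == len(P):
            matches.append(tuple(Q))
            return
        for cand in encrypted_db:
            if cand in Q:
                continue
            # Prune early: cand must reproduce d(x_j, x_i) for every x_j already placed.
            if all(abs(f(Q[j], cand) - math.dist(P[j], P[i])) <= tol for j in range(i)):
                extend(Q + [cand])

    extend([])
    return matches


db = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (7.0, 1.0)]
encrypted_db = [dpt(p, 0.7, (5.0, -2.0)) for p in reversed(db)]  # storage order gives no hint
P = [(1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]                          # attacker's known plain tuples
matches = link_signatures(P, encrypted_db)  # a single match yields I(x_i) = x'_i
```

If `matches` contains exactly one ordered set, the attacker has the mapping I and holds level-3 knowledge; if it contains several, the text's remedy is to enlarge P and search again.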

The success of the signature linking attack rests on two issues: (1) Is it easy to find Q? (2) Is it likely that another set, say $Q'$, which is not the transformed set of the points in P, happens to give the same signature? (We call this a signature collision.) For the first issue, we remark that although the search space is huge, it can be pruned very effectively. For example, given two encrypted points $x'_i$ and $x'_j$ such that $f(x'_i, x'_j) \notin sig(P)$, we know that Q cannot contain both $x'_i$ and $x'_j$. For the second issue, we can show that the probability of signature collision is generally very small (see Appendix B). Also, if multiple Qs are found with the same signature as P, the attacker can increase the size of P to lower the likelihood of collision and repeat the attack. In Section 5, we will report experiments for evaluating the feasibility of the signature linking attack in terms of the number of points in P required and its computation cost. The results confirm that the attacker can easily recover the database at an affordable cost.

Note on the search space of the signature linking attack: if there are n points in the database, there are on the order of $n^{|P|}$ candidate Q sets to examine.

3. ASYMMETRIC SCALAR-PRODUCT-PRESERVING ENCRYPTION

From our discussion, we observe that the weakness of DRE comes from the fact that the attacker is able to recover distance information from the encrypted database. More specifically, given any two points $p_1$ and $p_2$ in DB, their distance $d(p_1, p_2)$ can be determined from their encrypted values $E(p_1, K)$ and $E(p_2, K)$. These distances allow the attacker to compute signatures and thus to apply the signature linking attack. To resist level-2 attacks, we need an encryption function that does not reveal distance information. For kNN search, we observe that exact distance computation is not necessary. Rather, we only need a distance comparison operation. Given two points $p_1$, $p_2$ in DB, we must decide which of the two points is nearer to a query point q. Note that,

$$
\begin{aligned}
& d(p_1, q) \ge d(p_2, q) \\
\Leftrightarrow\ & \|p_1\|^2 - 2 p_1 \cdot q + \|q\|^2 \ge \|p_2\|^2 - 2 p_2 \cdot q + \|q\|^2 \\
\Leftrightarrow\ & \|p_1\|^2 - \|p_2\|^2 + 2 (p_2 - p_1) \cdot q \ge 0 \qquad (1)
\end{aligned}
$$

where $\|p\|$ represents the Euclidean norm of p, and · represents scalar product. $\|p\|^2$ can be represented by $p \cdot p$. So, the inequality is decomposed into a number of scalar product computations. This suggests a scalar-product-preserving encryption $E_{spp}$, i.e., $\forall p_1, p_2 \in DB$, $p_1 \cdot p_2 = E_{spp}(p_1, K) \cdot E_{spp}(p_2, K)$, for kNN computation. Unfortunately, a scalar-product-preserving encryption is also distance-recoverable and hence is not secure against level-2 attacks.
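Eq. 1 can be checked numerically with nothing but scalar products. The snippet below is a minimal sketch; the function names `dot` and `p1_not_nearer` are hypothetical, and plain Python sequences stand in for the points.

```python
def dot(a, b):
    """Scalar product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def p1_not_nearer(p1, p2, q):
    """Eq. 1: d(p1, q) >= d(p2, q) iff ||p1||^2 - ||p2||^2 + 2 (p2 - p1) . q >= 0."""
    diff = [b - a for a, b in zip(p1, p2)]   # p2 - p1
    return dot(p1, p1) - dot(p2, p2) + 2 * dot(diff, q) >= 0


# (3, 4) is at distance 5 from the origin, (1, 0) at distance 1.
far_first = p1_not_nearer((3, 4), (1, 0), (0, 0))
near_first = p1_not_nearer((1, 0), (3, 4), (0, 0))
```

No square root or explicit distance is computed, which is the property the rest of the section builds on.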

Theorem 2. Scalar-product-preserving encryption is distance-recoverable.

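Proof. Let $p'_1$ (resp. $p'_2$) be the encrypted point of $p_1$ (resp. $p_2$) in DB. We define the function f by

$$ f(p'_1, p'_2) = \sqrt{p'_1 \cdot p'_1 - 2 (p'_1 \cdot p'_2) + p'_2 \cdot p'_2} \qquad (2) $$

Since the encryption preserves scalar product, we have RHS $= \sqrt{p_1 \cdot p_1 - 2 (p_1 \cdot p_2) + p_2 \cdot p_2} = d(p_1, p_2)$.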
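To see Theorem 2 at work, one can take any transformation that preserves all scalar products and apply the function f of Eq. 2 to ciphertexts. The sketch below assumes a 2-D rotation as the scalar-product-preserving "encryption"; the names `rotate` and `f` are illustrative only.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rotate(p, theta):
    """A scalar-product-preserving transformation: multiply by a 2-D rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def f(c1, c2):
    """Eq. 2: distance recovered from three scalar products of ciphertexts."""
    # max(0.0, ...) guards against a tiny negative value caused by rounding.
    return math.sqrt(max(0.0, dot(c1, c1) - 2 * dot(c1, c2) + dot(c2, c2)))


p1, p2 = (3.0, 4.0), (0.0, 0.0)
recovered = f(rotate(p1, 1.1), rotate(p2, 1.1))   # close to d(p1, p2) = 5
```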

There are three types of scalar products: (type-1) scalar product of a database point with itself (e.g., $\|p\|^2$); (type-2) scalar product of a database point with the query point; (type-3) scalar product of two different database points $p_1$ and $p_2$. We observe that Eq. 1 consists of type-1 and type-2 products but not type-3 products, which are essential in Eq. 2. If an encryption preserves only type-1 and type-2 products but not type-3 products, then we can compare the distances $d(p_1, q)$ and $d(p_2, q)$ by Eq. 1, but the attacker cannot recover distances using Eq. 2. Furthermore, for each point p in the database, its corresponding type-1 scalar product, $\|p\|^2$, is fixed. Hence, if player 1 pre-computes all type-1 scalar products and makes them available (e.g., by storing them in the database) for kNN query processing, then the encryption need not preserve type-1 products either. In summary, we need an encryption that preserves type-2 but not type-1 or type-3 scalar products. With the pre-computed type-1 scalar products, we can verify the inequality of Eq. 1 on the encrypted database to implement the distance comparison operation. Since the encryption does not preserve type-1 or type-3 products, it is not distance-recoverable by design. The encryption is thus resilient to level-2 attacks.

Definition 2. (Asymmetric scalar-product-preserving encryption (ASPE)) Let E be an encryption function and E(p, K) be the encrypted value of a point p given a key K. E is an ASPE if and only if E preserves type-2 scalar products but not the other two types, i.e.,

(i) $p_i \cdot q = E(p_i, K) \cdot E(q, K)$ for any $p_i$ in DB and any query point q, and

(ii) $p_i \cdot p_j \ne E(p_i, K) \cdot E(p_j, K)$ for any $p_i$ and $p_j$ in DB.

In Definition 2, we require that the encrypted value of a query q not be equal to that of any point $p_i$ in DB, even when $q = p_i$. This suggests that query points and database points should be encrypted differently. That is, the encryption functions $E_T()$ and $E_Q()$ in the encryption scheme should be different.

The scalar product of p and q (represented by column vectors) can be represented as $p^T I q$, where $p^T$ is the transpose of p and I is a d × d identity matrix. I can be decomposed to $M M^{-1}$ for any invertible matrix M, i.e., $p^T q = (p^T M)(M^{-1} q)$. If we set $p' = E_T(p, K) = M^T p$ (resp. $q' = E_Q(q, K) = M^{-1} q$), it is not possible for one to determine the value of p (resp. q) from $p'$ (resp. $q'$) without knowing M. Also, $p'^T q' = p^T M M^{-1} q = p^T q$, i.e., type-2 scalar product is preserved. Suppose $p'_1$ and $p'_2$ are the encrypted points of $p_1$ and $p_2$ in DB respectively, then $p'^T_1 p'_2 = p_1^T M M^T p_2$, which is not equal to $p_1^T p_2$ in general. Type-1 and type-3 scalar products are therefore not preserved. So, we can implement ASPE by using $M^T$ and $M^{-1}$ as the transformations for database points and queries, respectively.
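A tiny integer example shows the asymmetry. This is a sketch under assumptions: the 2 × 2 matrix M below is an arbitrary invertible matrix chosen for illustration (its inverse is written out by hand), and the helper names are hypothetical.

```python
M = [[2, 1],
     [3, 2]]            # det = 1, so the inverse has integer entries
M_INV = [[2, -1],
         [-3, 2]]       # M * M_INV = identity


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def encrypt_point(p):
    return mat_vec(transpose(M), p)   # p' = M^T p


def encrypt_query(q):
    return mat_vec(M_INV, q)          # q' = M^{-1} q


p1, p2, q = [3, 4], [1, 2], [5, 6]
p1_enc, p2_enc, q_enc = encrypt_point(p1), encrypt_point(p2), encrypt_query(q)

type2_preserved = dot(p1_enc, q_enc) == dot(p1, q)        # point vs. query
type3_preserved = dot(p1_enc, p2_enc) == dot(p1, p2)      # point vs. point
```

Here `type2_preserved` is true while `type3_preserved` is false: the product of two encrypted database points equals $p_1^T M M^T p_2$, which differs from $p_1^T p_2$ for this M.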

3.1 A Secure Scheme Against Level-2 Attacks

We have described a special encryption function ASPE that preserves type-2 scalar products; together with the pre-computed type-1 scalar products, we can perform distance comparisons to find the neighbors of a query point. However, if the type-1 product $\|p\|^2$ is revealed to the attacker, he knows that p lies on a hypersphere that is centered at the origin with a radius $\|p\|$. Although the exact location of p is unknown, the information revealed partially compromises security. In this section, we show how we hide this information by “encrypting” $\|p\|^2$ and how the EDBMS computes kNN on such encrypted data.

Our idea is to treat a pre-computed type-1 scalar product $\|p\|^2$ as the (d+1)-st dimension of the point p. More specifically, given a (d-dimensional) database point p, we create a (d+1)-dimensional point $\hat{p}$. The first d dimensions of $\hat{p}$ are those of p, and the (d+1)-st dimension of $\hat{p}$ is set to $-0.5\|p\|^2$. (We multiply $\|p\|^2$ with this factor to facilitate distance comparisons, as shown later by Theorem 3.) The extended database points are then transformed (encrypted) using ASPE.

Similarly, we need to extend a query q to a (d+1)-dimensional point $\hat{q}$ before applying ASPE. The simplest way is to set the (d+1)-st dimension of $\hat{q}$ to 1. The weakness of this simple method is that the unencrypted query points $\hat{q}$ all lie on a d-dimensional hyperplane with the unit vector in the (d+1)-st dimension being the normal of the hyperplane. Since ASPE is a linear transformation, the encrypted query points all lie on a d-dimensional hyperplane in the transformed space as well. The attacker can determine the normal of that hyperplane in the transformed space. By considering the normal in the original space and the normal in the transformed space, the attacker obtains some level-3-like information, which is undesirable.

To avoid this problem, we introduce a random factor. For each query q, we generate a random number r > 0 and scale $\hat{q}$ by r, i.e., $\hat{q} = r(q^T, 1)^T$. We will show in Theorem 3 that this scaling does not affect the correctness of the distance comparison operation.

We summarize in Scheme 1 the procedures of the encryption scheme using ASPE.

• Key: a (d + 1) × (d + 1) invertible matrix M.

• Tuple encryption function $E_T$: Consider a database point p. (1) Create a (d+1)-dimensional point $\hat{p} = (p^T, -0.5\|p\|^2)^T$. (2) The encrypted point $p' = M^T \hat{p}$.

• Query encryption function $E_Q$: Consider a query point q. (1) Generate a random number r > 0. (2) Create a (d+1)-dimensional point $\hat{q} = r(q^T, 1)^T$. (3) The encrypted query point $q' = M^{-1} \hat{q}$.

• Distance comparison operator A: Let $p'_1$ and $p'_2$ be the encrypted points of $p_1$ and $p_2$ respectively. To determine whether $p_1$ is nearer to a query point q than $p_2$ is, the system checks whether $(p'_1 - p'_2) \cdot q' > 0$, where $q'$ is the encrypted point of q.

• Decryption function D: Consider an encrypted point $p'$. The original point $p = \pi (M^T)^{-1} p'$, where $\pi$ is a d × (d+1) matrix which projects on the first d dimensions, i.e., $\pi = (I_d, 0)$ where $I_d$ is the d × d identity matrix.

Scheme 1. ASPE
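The components of Scheme 1 fit in a short program. The following is a sketch under assumptions rather than the authors' implementation: it uses exact `Fraction` arithmetic and small random integer matrices for readability, the function names are hypothetical, and the way the key and the factor r are drawn is for illustration only and is not a secure key-generation procedure. The server-side `knn` ranks ciphertexts by $p' \cdot q'$, which orders them exactly as repeated use of the comparison $(p'_1 - p'_2) \cdot q' > 0$ would.

```python
import random
from fractions import Fraction


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def invert(A):
    """Gauss-Jordan inverse over exact fractions; raises ValueError if A is singular."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = 1 / aug[col][col]
        aug[col] = [x * scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def generate_key(d, rng):
    """Draw random (d+1) x (d+1) integer matrices until one is invertible."""
    while True:
        M = [[rng.randint(-9, 9) for _ in range(d + 1)] for _ in range(d + 1)]
        try:
            return M, invert(M)
        except ValueError:
            continue


def encrypt_tuple(p, M):
    p_hat = list(p) + [Fraction(-1, 2) * dot(p, p)]   # (p^T, -0.5 ||p||^2)^T
    return mat_vec(transpose(M), p_hat)               # p' = M^T p_hat


def encrypt_query(q, M_inv, rng):
    r = Fraction(rng.randint(1, 1000), 7)             # random r > 0
    q_hat = [r * x for x in q] + [r]                  # r (q^T, 1)^T
    return mat_vec(M_inv, q_hat)                      # q' = M^{-1} q_hat


def nearer(p1_enc, p2_enc, q_enc):
    """Operator A: True iff p1 is strictly nearer to q than p2 (no key needed)."""
    return dot([a - b for a, b in zip(p1_enc, p2_enc)], q_enc) > 0


def decrypt(p_enc, M_inv):
    p_hat = mat_vec(transpose(M_inv), p_enc)          # (M^T)^{-1} p'
    return p_hat[:-1]                                 # project onto the first d dimensions


def knn(encrypted_db, q_enc, k):
    """EDBMS side: a larger p' . q' means a nearer point, so take the top k."""
    return sorted(encrypted_db, key=lambda c: dot(c, q_enc), reverse=True)[:k]


rng = random.Random(0)
M, M_inv = generate_key(2, rng)
db = [(0, 0), (1, 1), (5, 5), (2, 3), (-4, 1)]
encrypted_db = [encrypt_tuple(p, M) for p in db]
q_enc = encrypt_query((1, 2), M_inv, rng)
answer = [tuple(decrypt(c, M_inv)) for c in knn(encrypted_db, q_enc, 2)]
```

Players 1 and 2 hold `M` and `M_inv`; `knn` and `nearer` receive ciphertexts only.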

Theorem 3. Suppose $p'_1$, $p'_2$ and $q'$ are the encrypted points of the database points $p_1$, $p_2$ and the query point q, respectively. Scheme 1 correctly determines whether $p_1$ is closer to q than $p_2$ is by evaluating $(p'_1 - p'_2) \cdot q' > 0$.

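Proof. Note that

$$ (p'_1 - p'_2) \cdot q' = (p'_1 - p'_2)^T q' = (M^T \hat{p}_1 - M^T \hat{p}_2)^T M^{-1} \hat{q} = (\hat{p}_1 - \hat{p}_2)^T \hat{q}. $$

The scalar product of these two (d+1)-dimensional points can be represented as

$$
\begin{aligned}
& (p_1 - p_2)^T (rq) + (-0.5\|p_1\|^2 + 0.5\|p_2\|^2) r \\
=\ & 0.5r(\|p_2\|^2 - \|p_1\|^2 + 2(p_1 - p_2)^T q) \\
=\ & 0.5r(d(p_2, q)^2 - d(p_1, q)^2)
\end{aligned}
$$

So, since r > 0 and distances are non-negative, the condition is equivalent to

$$ 0.5r(d(p_2, q)^2 - d(p_1, q)^2) > 0 \Leftrightarrow d(p_2, q) > d(p_1, q). $$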
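As a sanity check of the identity used in this proof, here is a small worked instance. The numbers are chosen purely for illustration: d = 2, $p_1 = (1, 0)$, $p_2 = (3, 4)$, $q = (0, 0)$ and r = 2, so that $d(p_1, q) = 1$ and $d(p_2, q) = 5$.

```latex
\begin{aligned}
\hat{p}_1 &= (1,\ 0,\ -0.5)^T, \qquad \hat{p}_2 = (3,\ 4,\ -12.5)^T, \qquad \hat{q} = 2\,(0,\ 0,\ 1)^T = (0,\ 0,\ 2)^T \\
(\hat{p}_1 - \hat{p}_2)^T \hat{q} &= (-2)(0) + (-4)(0) + (12)(2) = 24 \\
0.5\, r \left( d(p_2, q)^2 - d(p_1, q)^2 \right) &= 0.5 \cdot 2 \cdot (25 - 1) = 24
\end{aligned}
```

Both sides equal 24 > 0, so the test reports that $p_1$ is nearer to q than $p_2$, as expected. The key M does not appear because $M^T$ and $M^{-1}$ cancel in the product.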

3.2 Cost and security analysis

In this section, we analyze the cost of Scheme 1 and study whether the scheme can resist level-2 and level-3 attacks. First, the cost:

• Encryption and decryption: To encrypt and decrypt, we perform two kinds of operations: (1) multiplication of an O(d) × O(d) matrix and an O(d)-dimensional point, which takes $O(d^2)$ time, and (2) computation of the Euclidean norm of an O(d)-dimensional point, which takes O(d) time. Computing $E_T()$ requires both

Footnote (special case for Scheme 1): If an encrypted point is the origin of the transformed space, the corresponding unencrypted point is the origin of the original space. In order to avoid this special inference, we can perform a translation before applying Scheme 1. In that case, the origin is translated to a random point $O'$. This translation does not affect the correctness of the scheme.


References

UCI Machine Learning Repository

k -anonymity: a model for protecting privacy

L-diversity: Privacy beyond k-anonymity

t-Closeness: Privacy Beyond k-Anonymity and l-Diversity

Privacy-preserving data mining
