Proceedings Article

Practical techniques for searches on encrypted data

Dawn Song, David Wagner, Adrian Perrig

14 May 2000, pp. 44-55

TL;DR: This work describes cryptographic schemes for the problem of searching on encrypted data, provides proofs of security for the resulting crypto systems, and presents simple, fast algorithms that are practical to use today.


Abstract: It is desirable to store data on data storage servers such as mail servers and file servers in encrypted form to reduce security and privacy risks. But this usually implies that one has to sacrifice functionality for security. For example, if a client wishes to retrieve only documents containing certain words, it was not previously known how to let the data storage server perform the search and answer the query, without loss of data confidentiality. We describe our cryptographic schemes for the problem of searching on encrypted data and provide proofs of security for the resulting crypto systems. Our techniques have a number of crucial advantages. They are provably secure: they provide provable secrecy for encryption, in the sense that the untrusted server cannot learn anything about the plaintext when only given the ciphertext; they provide query isolation for searches, meaning that the untrusted server cannot learn anything more about the plaintext than the search result; they provide controlled searching, so that the untrusted server cannot search for an arbitrary word without the user's authorization; they also support hidden queries, so that the user may ask the untrusted server to search for a secret word without revealing the word to the server. The algorithms presented are simple, fast (for a document of length n, the encryption and search algorithms only need O(n) stream cipher and block cipher operations), and introduce almost no space and communication overhead, and hence are practical to use today.

Summary

1 Introduction

Today’s mail servers such as IMAP servers [11], file servers and other data storage servers typically must be fully trusted—they have access to the data, and hence must be trusted not to reveal it without authorization—which introduces undesirable security and privacy risks in applications.
The authors show how to support searching functionality without any loss of data confidentiality.
The techniques provide provable secrecy for encryption, in the sense that the untrusted server cannot learn anything about the plaintext given only the ciphertext.
The algorithms the authors present are simple and fast.
The authors may control the number of errors by adjusting a parameter $m$ in the encryption algorithm; each wrong position will be returned with probability about $1/2^m$, so for an $\ell$-word document, they expect to see about $\ell/2^m$ false matches.

2 Searching on Encrypted Data

The authors first define the problem of searching on encrypted data.
Each document can be divided up into ‘words’.
So the approach of using an index is more suitable for mostly-read-only data.
The authors adopt the standard definitions of security from the provable security literature [2], and they measure the strength of the cryptographic primitives in terms of the resources needed to break them.
The authors say that $G$ is a $(t, e)$-secure pseudorandom generator if every algorithm $A$ with running time at most $t$ has advantage $\mathrm{Adv}\,A < e$.

4 Our Solution with Sequential Scan

The authors introduce their solution for searching with sequential scan.
The authors first start with a basic scheme and show that its encryption algorithm provides provable secrecy.
The authors then show how they can extend the first scheme to handle controlled searching and hidden searches.
At the end, the authors describe their final scheme, which satisfies all the properties mentioned earlier, including query isolation.

4.1 Scheme I: The Basic Scheme

Alice wants to encrypt a document which contains the sequence of words $W_1, \ldots, W_\ell$.
More specifically, the basic scheme is as follows.
Alice generates a sequence of pseudorandom values $S_1, \ldots, S_\ell$ using some stream cipher (namely, the pseudorandom generator $G$), where each $S_i$ is $n - m$ bits long.
Another alternative is to choose a new key $k_i$ for each position independent of all other keys.
The basic scheme supports searches over the ciphertext in the following way: if Alice wants to search the word $W$, she can tell Bob $W$ and the $k_i$ corresponding to each location $i$ where a word $W$ may occur.

4.2 Scheme II: Controlled Searching

Let $f : K_F \times \{0,1\}^* \to K_F$ be an additional pseudorandom function, which will be keyed independently of $F$.
Suppose $F$ is a $(t, \ell, e_F)$-secure pseudorandom function, $f$ is a $(t, \ell, e_f)$-secure pseudorandom function, and $G$ is a $(t, e_G)$-secure pseudorandom generator.
The authors can take this idea even further by using a hierarchical key management scheme.
Then she can reveal either (1) $f_{f_{k'}(\langle 1, W \rangle)}(\langle 0, C \rangle)$ for each chapter of interest or (2) $f_{k'}(\langle 1, W \rangle)$ itself if she wishes to succinctly authorize Bob to search for $W$ in all the chapters.

4.3 Scheme III: Support for Hidden Searches

Suppose Alice would now like to ask Bob to search for a word $W$ but she is not willing to reveal $W$ to Bob.
The authors propose a simple extension to the above scheme to support this goal.
Note that $E$ is not allowed to use any randomness, and the computation of $E_{k''}(x)$ may depend only on $x$ and must not depend on the position $i$ in the document where $x$ is found.
After the pre-encryption phase, Alice has a sequence of $E$-encrypted words $X_1, \ldots, X_\ell$.
Note that this allows Bob to search for $W$ without revealing $W$ itself.

4.4 Scheme IV: The Final Scheme

Careful readers may have noticed that Scheme III actually suffers from a small inadequacy: if Alice generates keys as $k_i := f_{k'}(E_{k''}(W_i))$ then Alice can no longer recover the plaintext from just the ciphertext because she would need to know $E_{k''}(W_i)$ (more precisely, the last $m$ bits of $E_{k''}(W_i)$) before she can decrypt.
(Scheme II also has a similar inadequacy, but as the authors will show below, the best way to fix it is to introduce pre-encryption as in Scheme III.)
In the fixed scheme, the authors split the pre-encrypted word $X_i = E_{k''}(W_i)$ into two parts, $X_i = \langle L_i, R_i \rangle$, where $L_i$ (respectively $R_i$) denotes the first $n - m$ bits (resp. last $m$ bits) of $X_i$.
Finally, knowledge of $L_i$ allows Alice to compute $k_i$ and thus finish the decryption.
Moreover, if the authors disclose one key $k_i$ and consider the reduced sequence obtained by discarding all the values at positions that use that key, then they still obtain a secure pseudorandom generator; the precise security parameters are stated in the corresponding theorem of the paper.

5.1 Other Practical Considerations

The authors can see that updates in this scheme are easy.
If Alice wants to add a new document into Bob’s data storage, she can simply encrypt it in the appropriate way and instruct Bob to append it to the already-stored ciphertext.
Moreover, since the keys can be generated hierarchically from a master key, key storage and management is also very convenient: Alice only needs to remember one password, the master key.
The underlying technique of embedding information in pseudorandom bit streams may also be of independent interest: the authors speculate that this simple trick might prove useful for other applications, too.

5.2 Supporting More Advanced Search Queries

The schemes the authors presented earlier only address the problem of searching for a single word.
The authors show several examples to illustrate that it is relatively easy to implement more advanced searching functionality using their scheme as a fundamental building block.
The authors can also support searches if the query is given as a regular expression using, e.g., wildcards in a limited form.
For many applications the purpose of the search is to find documents which contain a specific word, where the position or the number of occurrences are not relevant.
The authors add a count to each word, which counts how many times that word occurs previously in that document.

5.3 Dealing with Variable-Length Words

In their scheme, the minimal unit the authors can search for is an individual word.
One possibility is to pick a fixed-size block that is long enough to contain most words.
Such a padding scheme would introduce space inefficiency.
When words lengths may vary, it is important to hide the length information from the server, because revealing the length of each word might allow for statistical attacks.
Fortunately, in this case the server does not need to know the lengths to perform a search: he may just scan through the file and check for a match at each possible bit boundary.

5.4 Searching with an Encrypted Index

Sequential scan may not be efficient enough when the data size is large.
The interesting question is how to encrypt the index.
Alice may decrypt the encrypted entries and send Bob another request to retrieve the relevant documents.
Note that by keeping the lists of pointers in a fixed-size list, the authors are mainly preventing Bob from learning statistical information on the key words that he has not searched.
Note that a general disadvantage for index search is that whenever Alice changes her documents, she must update the index.

5.5 More Security Issues

In all their schemes, by allowing Bob to search for a word $W$ the authors effectively disclose to him a list of potential locations where $W$ might occur.
If the authors allow Bob to search for too many words, he may be able to use statistical techniques to start learning important information about the documents.
One possible defense is to decrease $m$ (so that false matches are more prevalent and thus Bob’s information about the plaintext is ‘noisy’), but the authors have not analyzed the cost-effectiveness of this tradeoff in any detail.
In all the schemes the authors have discussed so far, they must trust Bob to return all the search results.
Even when this type of attack is present, it is possible to combine their scheme with hash tree techniques [17] to ensure the integrity of the data and detect such attacks, although a full description of this countermeasure is out of the scope of the paper.

7 Conclusion

The authors have described new techniques for remote searching on encrypted data using an untrusted server and provided proofs of security for the resulting crypto systems.
The techniques have a number of crucial advantages: they are provably secure; they support controlled and hidden search and query isolation; they are simple and fast (more specifically, for a document of length $n$, the encryption and search algorithms only need $O(n)$ stream cipher and block cipher operations); and they introduce almost no space and communication overhead.
The authors' scheme is also very flexible, and it can easily be extended to support more advanced search queries.
The authors conclude that this provides a powerful new building block for the construction of secure services in the untrusted infrastructure.


Practical Techniques for Searches on Encrypted Data



Dawn Xiaodong Song, David Wagner, Adrian Perrig

{dawnsong, daw, perrig}@cs.berkeley.edu

University of California, Berkeley

Abstract

It is desirable to store data on data storage servers such as mail servers and file servers in encrypted form to reduce security and privacy risks. But this usually implies that one has to sacrifice functionality for security. For example, if a client wishes to retrieve only documents containing certain words, it was not previously known how to let the data storage server perform the search and answer the query without loss of data confidentiality.

In this paper, we describe our cryptographic schemes for the problem of searching on encrypted data and provide proofs of security for the resulting crypto systems. Our techniques have a number of crucial advantages. They are provably secure: they provide provable secrecy for encryption, in the sense that the untrusted server cannot learn anything about the plaintext when only given the ciphertext; they provide query isolation for searches, meaning that the untrusted server cannot learn anything more about the plaintext than the search result; they provide controlled searching, so that the untrusted server cannot search for an arbitrary word without the user’s authorization; they also support hidden queries, so that the user may ask the untrusted server to search for a secret word without revealing the word to the server. The algorithms we present are simple, fast (for a document of length $n$, the encryption and search algorithms only need $O(n)$ stream cipher and block cipher operations), and introduce almost no space and communication overhead, and hence are practical to use today.



We gratefully acknowledge support for this research from several US government agencies. This research was supported in part by the Defense Advanced Research Projects Agency under DARPA contract N6601-99-28913 (under supervision of the Space and Naval Warfare Systems Center San Diego), by the National Science Foundation under grant FD99-79852, and by the United States Postal Service under grant USPS 1025 90-98-C-3513. Views and conclusions contained in this document are those of the authors and do not necessarily represent the official opinion or policies, either expressed or implied, of the US government or any of its agencies, DARPA, NSF, USPS.

1 Introduction

Today’s mail servers such as IMAP servers [11], file servers and other data storage servers typically must be fully trusted (they have access to the data, and hence must be trusted not to reveal it without authorization), which introduces undesirable security and privacy risks in applications. Previous work shows how to build encrypted file systems and secure mail servers, but typically one must sacrifice functionality to ensure security. The fundamental problem is that moving the computation to the data storage seems very difficult when the data is encrypted, and many computation problems over encrypted data previously had no practical solutions.

In this paper, we show how to support searching functionality without any loss of data confidentiality. An example is where a mobile user with limited bandwidth wants to retrieve all email containing the word “Urgent” from an untrusted mail-storage server in the infrastructure. This is trivial to do when the server knows the content of the data, but how can we support search queries if we do not wish to reveal all our email to the server?

Our answer is to present cryptographic schemes that enable searching on encrypted data without leaking any information to the untrusted server.



Our techniques are provably secure. The techniques provide provable secrecy for encryption, in the sense that the untrusted server cannot learn anything about the plaintext given only the ciphertext. The techniques provide controlled searching, so that the untrusted server cannot search for a word without the user’s authorization. The techniques support hidden queries, so that the user may ask the untrusted server to search for a secret word without revealing the word to the server. The techniques also support query isolation, meaning that the untrusted server learns nothing more than the search result about the plaintext.

Our schemes are efficient and practical. The algorithms we present are simple and fast. More specifically, for a document of length $n$, the encryption and search algorithms only need $O(n)$ stream cipher and block cipher operations. Our schemes introduce essentially no space and communication overhead. They are also flexible and can be easily extended to support more advanced searches.

Our schemes all take the form of probabilistic searching: a search for the word $W$ returns all the positions where $W$ occurs in the plaintext, as well as possibly some other erroneous positions. We may control the number of errors by adjusting a parameter $m$ in the encryption algorithm; each wrong position will be returned with probability about $1/2^m$, so for an $\ell$-word document, we expect to see about $\ell/2^m$ false matches. The user will be able to eliminate all the false matches (by decrypting), so in remote searching applications, false matches should not be a problem so long as they are not so common that they overwhelm the communication channel between the user and the server.
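As a quick numeric illustration of this trade-off, here is a minimal sketch. It assumes each non-matching position is reported independently with probability $1/2^m$, so an $\ell$-word document yields about $\ell/2^m$ false matches per query. The function name and the sample values are illustrative and do not come from the paper.

```python
def expected_false_matches(num_words: int, m_bits: int) -> float:
    """Expected number of spurious positions returned for one query.

    Assumes each non-matching position passes the server's check
    independently with probability 2**-m_bits.
    """
    return num_words / 2 ** m_bits


# Illustrative values only: a one-million-word document.
for m in (8, 16, 32):
    print(m, expected_false_matches(1_000_000, m))
```

Larger values of the parameter reduce the number of spurious results the user has to download and discard, at the cost of giving the server a less noisy view of where a searched word occurs.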

This paper is structured as follows. We first introduce the problem of searching on encrypted data in Section 2 and briefly review some important background in Section 3. We then describe our solution for the case of searching with sequential scan in Section 4. We discuss further issues such as advanced search and search with index in Section 5. We discuss related work in Section 6 and finally we conclude in Section 7. Appendix A presents the proofs of security for these schemes.

2 Searching on Encrypted Data

We first define the problem of searching on encrypted data.

Assume Alice has a set of documents and stores them on an untrusted server Bob. For example, Alice could be a mobile user who stores her email messages on an untrusted mail server. Because Bob is untrusted, Alice wishes to encrypt her documents and only store the ciphertext on Bob. Each document can be divided up into ‘words’. Each ‘word’ may be any token; it may be a 64-bit block, an English word, a sentence, or some other atomic quantity, according to the application domain of interest. For simplicity, we typically assume these ‘words’ have the same length (otherwise we can either pad the shorter ‘words’ or split longer ‘words’ to make all the ‘words’ have equal length, or use some simple extensions for variable length ‘words’; see also Section 5.3). Because Alice may have only a low-bandwidth network connection to the server Bob, she wishes to only retrieve the documents which contain the word $W$. In order to achieve this goal, we need to design a scheme so that after performing certain computations over the ciphertext, Bob can determine with some probability whether each document contains the word $W$ without learning anything else.

There seem to be two types of approaches. One possibility is to build up an index that, for each word $W$ of interest, lists the documents that contain $W$. An alternative is to perform a sequential scan without an index. The advantage of using an index is that it may be faster than the sequential scan when the documents are large. The disadvantage of using an index is that storing and updating the index can be of substantial overhead. So the approach of using an index is more suitable for mostly-read-only data.

We first describe our scheme for searching on encrypted data without an index. Since the index-based schemes seem to require less sophisticated constructions, we will defer discussion of searching with an index until the end of the paper (see Section 5.4).

3 Background and Definitions

Our scheme requires several fundamental primitives from classical symmetric-key cryptography. Because we will prove our scheme secure, we use only primitives with a well-defined notion of security. We will list here the required primitives, as well as reviewing the standard definitions of security for them. The definitions may be skipped on first reading for those uninterested in our theoretical proofs of security.

We adopt the standard definitions of security from the provable security literature [2], and we measure the strength of the cryptographic primitives in terms of the resources needed to break them. We will say that an attack $R$-breaks a cryptographic primitive if the attack algorithm succeeds in breaking the primitive with resources specified by $R$, and we say that a crypto primitive is $R$-secure if there is no algorithm that can $R$-break it. Let $A : \{0,1\}^n \to \{0,1\}$ be an arbitrary algorithm and let $X$ and $Y$ be random variables distributed on $\{0,1\}^n$. The distinguishing probability of $A$ (sometimes called the advantage of $A$) for $X$ and $Y$ is

$$\mathrm{Adv}\,A = \left| \Pr[A(X) = 1] - \Pr[A(Y) = 1] \right|.$$

With this background, our list of required primitives is as follows:

1. A pseudorandom generator $G$, i.e., a stream cipher. We say that $G : K_G \to S$ is a $(t, e)$-secure pseudorandom generator if every algorithm $A$ with running time at most $t$ has advantage $\mathrm{Adv}\,A < e$. The advantage of an adversary $A$ is defined as $\mathrm{Adv}\,A = \left| \Pr[A(G(U_{K_G})) = 1] - \Pr[A(U_S) = 1] \right|$, where $U_{K_G}, U_S$ are random variables distributed uniformly on $K_G, S$.

2. A pseudorandom function $F$. We say that $F : K_F \times X \to Y$ is a $(t, q, e)$-secure pseudorandom function if every oracle algorithm $A$ making at most $q$ oracle queries and with running time at most $t$ has advantage $\mathrm{Adv}\,A < e$. The advantage is defined as $\mathrm{Adv}\,A = \left| \Pr[A^{F_k} = 1] - \Pr[A^{g} = 1] \right|$ where $g$ represents a random function selected uniformly from the set of all maps from $X$ to $Y$, and where the probabilities are taken over the choice of $k$ and $g$.

3. A pseudorandom permutation $E$, i.e., a block cipher. We say that $E : K_E \times Z \to Z$ is a $(t, q, e)$-secure pseudorandom permutation if every oracle algorithm $A$ making at most $q$ oracle queries and with running time at most $t$ has advantage $\mathrm{Adv}\,A < e$. The advantage is defined as $\mathrm{Adv}\,A = \left| \Pr[A^{E_k, E_k^{-1}} = 1] - \Pr[A^{\pi, \pi^{-1}} = 1] \right|$ where $\pi$ represents a random permutation selected uniformly from the set of all bijections on $Z$, and where the probabilities are taken over the choice of $k$ and $\pi$. Notice that the adversary is given an oracle for encryption as well as for decryption; this corresponds to the adaptive chosen-plaintext/ciphertext attack model.

In general, the intuition is that $(t, q, e)$-security represents resistance to attacks that use at most $t$ offline work and at most $q$ adaptive chosen-text queries.

There is of course no fundamental need for three separate primitives, since in practice all three may be built out of just one off-the-shelf primitive. For instance, given any block cipher, we may build a pseudorandom generator using the counter mode [3] or a pseudorandom function using the CBC-MAC [4].
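The remark about deriving several primitives from one can be illustrated with a short sketch. This is not the paper's construction: the Python standard library has no block cipher, so HMAC-SHA256 is swapped in as the single off-the-shelf primitive. It is used directly as a pseudorandom function and in counter mode as a pseudorandom generator. The names `prf` and `prg` and the output lengths are assumptions made for illustration.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes, out_len: int = 16) -> bytes:
    """Keyed function F_key(data): HMAC-SHA256 truncated to out_len bytes."""
    if not 0 < out_len <= 32:
        raise ValueError("out_len must be between 1 and 32 bytes")
    return hmac.new(key, data, hashlib.sha256).digest()[:out_len]


def prg(seed: bytes, num_blocks: int, block_len: int = 12) -> list:
    """Counter-mode generator: block i is the PRF applied to the counter i."""
    return [prf(seed, i.to_bytes(8, "big"), block_len) for i in range(num_blocks)]


# The same seed always reproduces the same stream, which is what lets
# the key holder regenerate it later for decryption.
stream = prg(b"illustrative seed", 4)
print([block.hex() for block in stream])
```

A pseudorandom permutation needs an invertible construction, so it is left out of this sketch.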

We rely on the following notation. If $f : K_F \times X \to Y$ represents a pseudorandom function or permutation, we write $f_k(x)$ for the result of applying $f$ to input $x$ with key $k \in K_F$. We write $\langle x, y \rangle$ for the concatenation of $x$ and $y$ and $x \oplus y$ for the bitwise XOR of $x$ and $y$. For the remainder of the paper, we let $G : K_G \to X^\ell$ be a pseudorandom generator for some $\ell$, $F : K_F \times X \to Y$ be a pseudorandom function, and $E : K_E \times Z \to Z$ be a pseudorandom permutation. Typically we will have $X = \{0,1\}^{n-m}$, $Y = \{0,1\}^m$, and $Z = X \times Y = \{0,1\}^n$.

4 Our Solution with Sequential Scan

In this section, we introduce our solution for searching with sequential scan. We first start with a basic scheme and show that its encryption algorithm provides provable secrecy. We then show how we can extend the first scheme to handle controlled searching and hidden searches. At the end, we describe our final scheme, which satisfies all the properties we mentioned earlier, including query isolation.

4.1 Scheme I: The Basic Scheme

Alice wants to encrypt a document which contains the sequence of words $W_1, \ldots, W_\ell$. Intuitively, the scheme works by computing the bitwise exclusive or (XOR) of the clear text with a sequence of pseudorandom bits which have a special structure. This structure will allow searching on the data without revealing anything else about the clear text.

More specifically, the basic scheme is as follows. Alice generates a sequence of pseudorandom values $S_1, \ldots, S_\ell$ using some stream cipher (namely, the pseudorandom generator $G$), where each $S_i$ is $n - m$ bits long. To encrypt an $n$-bit word $W_i$ that appears in position $i$, Alice takes the pseudorandom bits $S_i$, sets $T_i := \langle S_i, F_{k_i}(S_i) \rangle$, and outputs the ciphertext $C_i := W_i \oplus T_i$. Note that only Alice can generate the pseudorandom stream $T_1, \ldots, T_\ell$ so no one else can decrypt. Of course, encryption can be done on-line, so that we encrypt each word as it becomes available.
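The encryption step of the basic scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: truncated HMAC-SHA256 stands in for both the generator $G$ (in counter mode) and the function $F$, words are zero-padded to a fixed length, and the sizes `N` and `M` and every function name are hypothetical.

```python
import hashlib
import hmac

N = 16  # word length n, in bytes (illustrative choice)
M = 4   # check length m, in bytes (illustrative choice)


def prf(key, data, out_len):
    """Stand-in for both G (counter mode) and F: truncated HMAC-SHA256."""
    return hmac.new(key, data, hashlib.sha256).digest()[:out_len]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def pad(word):
    """Encode a word as exactly N bytes (zero padding; the word must fit)."""
    raw = word.encode()
    if len(raw) > N:
        raise ValueError("word longer than N bytes")
    return raw.ljust(N, b"\0")


def mask_stream(seed, keys):
    """T_i = <S_i, F_{k_i}(S_i)> for each position i."""
    stream = []
    for i, k_i in enumerate(keys):
        s_i = prf(seed, i.to_bytes(8, "big"), N - M)  # S_i: n - m bits
        stream.append(s_i + prf(k_i, s_i, M))         # append the m check bits
    return stream


def encrypt(words, seed, keys):
    """C_i = W_i xor T_i."""
    return [xor(pad(w), t) for w, t in zip(words, mask_stream(seed, keys))]


def decrypt(cipher, seed, keys):
    plain = [xor(c, t) for c, t in zip(cipher, mask_stream(seed, keys))]
    return [p.rstrip(b"\0").decode() for p in plain]


words = ["send", "the", "urgent", "report"]
keys = [b"same key k"] * len(words)  # one option: the same key at every position
cipher = encrypt(words, b"seed known only to Alice", keys)
print(decrypt(cipher, b"seed known only to Alice", keys))
```

Because each position depends only on its own index, the loop can process words as they arrive, which matches the on-line encryption noted in the text.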

There is some flexibility in how the keys $k_i$ may be chosen. One possibility is to use the same key $k$ at every position in the document. Another alternative is to choose a new key $k_i$ for each position independent of all other keys. More generally, at each position, Alice can either (a) choose $k_i$ to be the same as some previous $k_j$ ($j < i$), or (b) choose $k_i$ independently of all the previous keys. We shall see later how this flexibility allows us to support a variety of interesting features.

The basic scheme provides provable secrecy if the pseudorandom function $F$ and the pseudorandom generator $G$ are secure. By this, we mean that, at each position where $k_i$ is unknown, the values $T_i$ are indistinguishable from truly random bits for any computationally-bounded adversary. We formalize the theorem as below.

Theorem 4.1. If $F$ is a $(t, \ell, e_F)$-secure pseudorandom function and $G$ is a $(t, e_G)$-secure pseudorandom generator, and if the key material is chosen as described above, then the algorithm described above for generating the sequence $\langle T_1, \ldots, T_\ell \rangle$ is a $(t - \epsilon, e_H)$-secure pseudorandom generator, where $e_H = \ell \cdot e_F + e_G + \ell(\ell - 1)/(2|X|)$ and the constant $\epsilon$ is negligible compared to $t$.

In other words, we expect the basic scheme to be good for encrypting up to about $\ell = O(1/\max\{e_F, 1/\sqrt{|X|}\})$ words, if the pseudorandom function and pseudorandom generator are adequately secure. See Appendix A for a slightly more precise statement of the theorem and for a full proof.

The basic scheme supports searches over the ciphertext in the following way: if Alice wants to search the word $W$, she can tell Bob $W$ and the $k_i$ corresponding to each location $i$ where a word $W$ may occur. Bob can then search for $W$ in the ciphertext by checking whether $C_i \oplus W$ is of the form $\langle s, F_{k_i}(s) \rangle$ for some $s$. Such a search can be performed in linear time. At the positions where Bob does not know $k_i$, Bob learns nothing about the plaintext. Thus, the scheme allows a limited form of control: if Alice only wants Bob to be able to search over the first half of the ciphertext, Alice should reveal only the $k_i$ corresponding to those locations and none of the $k_i$ used in the second half of the ciphertext.

[Figure 1. The Basic Scheme (diagram labels: Plaintext, Stream Cipher, Ciphertext)]
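The server-side check can be sketched as below. This is a minimal illustration that assumes the same stand-ins as before: truncated HMAC-SHA256 for both the generator and $F$, zero-padded fixed-length words, and hypothetical names and sizes. Bob XORs each ciphertext block with the candidate word and tests whether the last $m$ bits equal $F_{k_i}$ applied to the first $n - m$ bits.

```python
import hashlib
import hmac

N, M = 16, 4  # word and check lengths in bytes (illustrative)


def prf(key, data, out_len):
    return hmac.new(key, data, hashlib.sha256).digest()[:out_len]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def pad(word):
    return word.encode().ljust(N, b"\0")


def encrypt(words, seed, keys):
    out = []
    for i, (w, k_i) in enumerate(zip(words, keys)):
        s_i = prf(seed, i.to_bytes(8, "big"), N - M)
        out.append(xor(pad(w), s_i + prf(k_i, s_i, M)))
    return out


def is_match(c_i, word, k_i):
    """Bob's test: is C_i xor W of the form <s, F_{k_i}(s)>?"""
    t = xor(c_i, pad(word))
    s, check = t[:N - M], t[N - M:]
    return hmac.compare_digest(prf(k_i, s, M), check)


def search(cipher, word, revealed_keys):
    """revealed_keys maps position -> k_i; all other positions stay opaque."""
    return [i for i, k_i in sorted(revealed_keys.items())
            if is_match(cipher[i], word, k_i)]


words = ["urgent", "hello", "urgent", "bye"]
keys = [b"key-%d" % i for i in range(len(words))]  # independent per-position keys
cipher = encrypt(words, b"seed", keys)

# Alice authorizes a search over the first half of the document only.
first_half = {0: keys[0], 1: keys[1]}
print(search(cipher, "urgent", first_half))
```

With per-position keys, Bob can test only the positions whose keys he was given, so the occurrence in the second half stays hidden from this query.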

As described so far, the basic scheme is not terribly satisfying: if Alice wants to help Bob search for a word $W$, either Alice must reveal all the $k_i$ (thus potentially revealing the entire document), or Alice must know in advance which locations $W$ may appear at (which seems to defeat the purpose of remote searching). However, we shall see next how to take care of this difficulty.

4.2 Scheme II: Controlled Searching

Let $f : K_F \times \{0,1\}^* \to K_F$ be an additional pseudorandom function, which will be keyed independently of $F$. The main idea is to choose our keys as $k_i := f_{k'}(W_i)$. We require that $k'$ be chosen uniformly randomly in $K_F$ by Alice and never be revealed. Then, if Alice wishes to allow Bob to search for the word $W$, she reveals $f_{k'}(W)$ and $W$ to him. This allows Bob to identify all the locations where $W$ might occur, but reveals absolutely nothing on the locations $i$ where $W_i \neq W$. This attains our desired goal of controlled searching. We show the correctness of this approach in the following theorem.

Theorem 4.2. Suppose $F$ is a $(t, \ell, e_F)$-secure pseudorandom function, $f$ is a $(t, \ell, e_f)$-secure pseudorandom function, and $G$ is a $(t, e_G)$-secure pseudorandom generator. If the key material is chosen as described above, then the algorithm described above for generating the sequence $\langle T_1, \ldots, T_\ell \rangle$ will be a $(t - \epsilon, e_H)$-secure pseudorandom generator, where $e_H = \ell \cdot e_F + e_f + e_G + \ell(\ell - 1)/(2|X|)$.

This shows that our scheme for controlled searching is about as good as the basic scheme, if the underlying primitives are secure. See Appendix A for a proof as well as a more precise formulation.
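A minimal sketch of controlled searching follows, assuming the per-position key is derived from the word itself with a second keyed function whose key $k'$ only Alice holds. As in any such illustration, truncated HMAC-SHA256 stands in for all the pseudorandom primitives, and the names, sizes and padding rule are assumptions rather than the paper's choices.

```python
import hashlib
import hmac

N, M = 16, 4  # word and check lengths in bytes (illustrative)


def prf(key, data, out_len):
    return hmac.new(key, data, hashlib.sha256).digest()[:out_len]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def pad(word):
    return word.encode().ljust(N, b"\0")


def word_key(k_prime, word):
    """k_i := f_{k'}(W_i): the key depends on the word, not on the position."""
    return prf(k_prime, pad(word), 16)


def encrypt(words, seed, k_prime):
    out = []
    for i, w in enumerate(words):
        k_i = word_key(k_prime, w)
        s_i = prf(seed, i.to_bytes(8, "big"), N - M)
        out.append(xor(pad(w), s_i + prf(k_i, s_i, M)))
    return out


def server_search(cipher, word, k):
    """Bob's scan, given only the pair (W, f_{k'}(W))."""
    hits = []
    for i, c_i in enumerate(cipher):
        t = xor(c_i, pad(word))
        if hmac.compare_digest(prf(k, t[:N - M], M), t[N - M:]):
            hits.append(i)
    return hits


words = ["urgent", "hello", "urgent", "bye"]
k_prime = b"k' kept secret by Alice"
cipher = encrypt(words, b"seed", k_prime)

trapdoor = word_key(k_prime, "urgent")  # revealed to Bob together with "urgent"
print(server_search(cipher, "urgent", trapdoor))
print(server_search(cipher, "hello", trapdoor))  # key for a different word
```

The second call shows the control property in miniature: the key released for one word does not let Bob test for another word, except for chance false matches.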

Various extensions of this idea are possible. If the document to be encrypted consists of a series of chapters, an alternative approach is to generate the key $k_i$ for the word $W_i$ in chapter $C$ as $k_i := f_{k'}(\langle C, W_i \rangle)$. This allows Alice to control which chapters Bob may search in as well as controlling which words Bob may search for.

We can take this idea even further by using a hierarchical key management scheme. Alice sets $k_i := f_{k_{W_i}}(\langle 0, C \rangle)$ and $k_{W_i} := f_{k'}(\langle 1, W_i \rangle)$. Then she can reveal either (1) $f_{f_{k'}(\langle 1, W \rangle)}(\langle 0, C \rangle)$ for each chapter of interest or (2) $f_{k'}(\langle 1, W \rangle)$ itself if she wishes to succinctly authorize Bob to search for $W$ in all the chapters.
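The two-level derivation can be sketched as follows. It assumes a per-word key derived from Alice's secret, and per-chapter keys derived in turn from that per-word key. The domain-separating tag bytes, the chapter encoding, the function names and the use of truncated HMAC-SHA256 for $f$ are all assumptions made for illustration.

```python
import hashlib
import hmac


def f(key, data):
    """Stand-in for the keyed function f: truncated HMAC-SHA256."""
    return hmac.new(key, data, hashlib.sha256).digest()[:16]


def word_master_key(k_prime, word):
    """Per-word key; holding it authorizes searching for `word` everywhere."""
    return f(k_prime, b"\x01" + word.encode())


def chapter_word_key(k_word, chapter):
    """Key for one word within one chapter, derived from the per-word key."""
    return f(k_word, b"\x00" + chapter.to_bytes(4, "big"))


k_prime = b"master key held by Alice"
k_urgent = word_master_key(k_prime, "urgent")

# Option (1): reveal keys for selected chapters only.
per_chapter = {c: chapter_word_key(k_urgent, c) for c in (2, 5)}

# Option (2): reveal k_urgent itself; Bob derives any chapter key on his own.
bob_derived = chapter_word_key(k_urgent, 5)
print(bob_derived == per_chapter[5])
```

Under option (1) Bob cannot work backwards from a chapter key to the per-word key, as long as the keyed function is one-way; that is what keeps the other chapters closed to him.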

This scheme still does not support hidden search queries: in order to let Bob search for the location where the word $W$ appears, Alice has to reveal $W$ to Bob. We shall see next that this problem can be easily fixed.

4.3 Scheme III: Support for Hidden Searches

Suppose Alice would now like to ask Bob to search for a word $W$ but she is not willing to reveal $W$ to Bob. We propose a simple extension to the above scheme to support this goal.

Alice should merely pre-encrypt each word $W$ of the clear text separately using a deterministic encryption algorithm $E_{k''}$. Note that $E$ is not allowed to use any randomness, and the computation of $E_{k''}(x)$ may depend only on $x$ and must not depend on the position $i$ in the document where $x$ is found. So we may think of this pre-encryption step as ECB encryption of the words of the document using some block cipher. (Of course, if the word is very long, internally the map $E_{k''}$ may be implemented by CBC-encrypting $W_i$ with a constant IV, or some such, but the point is that this process must be the same at every position of the document.) We let $X_i := E_{k''}(W_i)$.

After the pre-encryption phase, Alice has a sequence of $E$-encrypted words $X_1, \ldots, X_\ell$. Now she post-encrypts that sequence using the stream cipher construction described above to obtain $C_i := X_i \oplus T_i$, where $X_i := E_{k''}(W_i)$ and $T_i := \langle S_i, F_{k_i}(S_i) \rangle$.

To search for a word $W$, Alice computes $X := E_{k''}(W)$ and $k := f_{k'}(X)$, and sends $\langle X, k \rangle$ to Bob. Note that this allows Bob to search for $W$ without revealing $W$ itself. It is easy to see that this scheme satisfies the hidden search property as long as the pre-encryption $E$ is secure.

[Figure 2. The Scheme for Hidden Search (diagram labels: Plaintext, Stream Cipher, Ciphertext)]
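A minimal sketch of the hidden-search flow is given below. The deterministic pre-encryption is modelled by a small four-round Feistel network built from HMAC-SHA256. This is a stand-in chosen only because the Python standard library has no block cipher; it is not the paper's choice of $E$. The key names `k1` (for $k'$) and `k2` (for $k''$), the sizes and the padding rule are likewise assumptions.

```python
import hashlib
import hmac

N, M = 16, 4   # word and check lengths in bytes (illustrative)
HALF = N // 2  # Feistel half-block size


def prf(key, data, out_len):
    return hmac.new(key, data, hashlib.sha256).digest()[:out_len]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def pad(word):
    return word.encode().ljust(N, b"\0")


def pre_encrypt(k2, block):
    """Deterministic E_{k''}: the same input always maps to the same output."""
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, xor(left, prf(k2, bytes([rnd]) + right, HALF))
    return left + right


def encrypt(words, seed, k1, k2):
    out = []
    for i, w in enumerate(words):
        x_i = pre_encrypt(k2, pad(w))                 # X_i = E_{k''}(W_i)
        k_i = prf(k1, x_i, 16)                        # k_i = f_{k'}(X_i)
        s_i = prf(seed, i.to_bytes(8, "big"), N - M)
        out.append(xor(x_i, s_i + prf(k_i, s_i, M)))  # C_i = X_i xor T_i
    return out


def trapdoor(word, k1, k2):
    """What Alice sends to Bob: <X, k>, never the word itself."""
    x = pre_encrypt(k2, pad(word))
    return x, prf(k1, x, 16)


def server_search(cipher, x, k):
    hits = []
    for i, c_i in enumerate(cipher):
        t = xor(c_i, x)
        if hmac.compare_digest(prf(k, t[:N - M], M), t[N - M:]):
            hits.append(i)
    return hits


words = ["urgent", "hello", "urgent", "bye"]
k1, k2 = b"k' (key derivation)", b"k'' (pre-encryption)"
cipher = encrypt(words, b"seed", k1, k2)
x, k = trapdoor("urgent", k1, k2)
print(server_search(cipher, x, k))
```

Bob's scan is the same as before; the only change is that he compares against the pre-encrypted value rather than the clear word.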

4.4 Scheme IV: The Final Scheme

Careful readers may have noticed that Scheme III actually suffers from a small inadequacy: if Alice generates keys $k_i := f_{k'}(E_{k''}(W_i))$ then Alice can no longer recover the plaintext from just the ciphertext because she would need to know $E_{k''}(W_i)$ (more precisely, the last $m$ bits of $E_{k''}(W_i)$) before she can decrypt. This defeats the purpose of an encryption scheme, because even legitimate principals with access to the decryption keys will be unable to decrypt. (Scheme II also has a similar inadequacy, but as we will show below, the best way to fix it is to introduce pre-encryption as in Scheme III.)

We now show a simple fix for this problem. In the fixed scheme, we split the pre-encrypted word $X_i = E_{k''}(W_i)$ into two parts, $X_i = \langle L_i, R_i \rangle$, where $L_i$ (respectively $R_i$) denotes the first $n - m$ bits (resp. last $m$ bits) of $X_i$. Instead of generating $k_i := f_{k'}(E_{k''}(W_i))$, Alice should generate $k_i := f_{k'}(L_i)$. To decrypt, Alice can generate $S_i$ using the pseudorandom generator (since Alice knows the seed), and with $S_i$ she can recover $L_i$ by XORing $S_i$ against the first $n - m$ bits of $C_i$. Finally, knowledge of $L_i$ allows Alice to compute $k_i$ and thus finish the decryption.
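A minimal sketch of the fixed scheme's encryption and decryption follows, written under explicit assumptions rather than as the authors' construction. The pre-encryption $E$ must be invertible here, so a toy 4-round Feistel permutation built from HMAC-SHA256 stands in for a real block cipher. HMAC-SHA256 also stands in for $f$, $F$ and $G$, words are zero-padded to `N` bytes, and the sizes and all function names are hypothetical.

```python
import hashlib
import hmac

N, M = 16, 4     # n and m in bytes (assumed sizes)
HALF = N // 2


def prf(key: bytes, data: bytes, out_len: int) -> bytes:
    """HMAC-SHA256 truncated to out_len bytes; stand-in for f, F and G."""
    return hmac.new(key, data, hashlib.sha256).digest()[:out_len]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def E(k2: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation on N bytes, standing in for E_{k''}."""
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, xor(left, prf(k2, bytes([rnd]) + right, HALF))
    return left + right


def E_inv(k2: bytes, block: bytes) -> bytes:
    """Inverse of E: undo the Feistel rounds in reverse order."""
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = xor(right, prf(k2, bytes([rnd]) + left, HALF)), left
    return left + right


def stream_block(seed: bytes, i: int) -> bytes:
    """S_i: the i-th (n-m)-byte block of the pseudorandom stream."""
    return prf(seed, b"G|" + i.to_bytes(8, "big"), N - M)


def encrypt_word(word: str, i: int, seed: bytes, k1: bytes, k2: bytes) -> bytes:
    raw = word.encode()
    if len(raw) > N:
        raise ValueError("word does not fit in one block")
    x = E(k2, raw.ljust(N, b"\0"))        # X_i = E_{k''}(W_i)
    k_i = prf(k1, x[:N - M], 32)          # k_i = f_{k'}(L_i), not f_{k'}(X_i)
    s = stream_block(seed, i)             # S_i
    return xor(x, s + prf(k_i, s, M))     # C_i = X_i xor <S_i, F_{k_i}(S_i)>


def decrypt_word(c: bytes, i: int, seed: bytes, k1: bytes, k2: bytes) -> str:
    s = stream_block(seed, i)                  # regenerate S_i from the seed
    left = xor(c[:N - M], s)                   # recover L_i
    k_i = prf(k1, left, 32)                    # recompute k_i from L_i
    right = xor(c[N - M:], prf(k_i, s, M))     # recover R_i
    return E_inv(k2, left + right).rstrip(b"\0").decode()


seed, k1, k2 = b"stream-seed", b"key-k-prime", b"key-k-double-prime"
words = ["search", "encrypted", "data", "search"]
cts = [encrypt_word(w, i, seed, k1, k2) for i, w in enumerate(words)]
recovered = [decrypt_word(c, i, seed, k1, k2) for i, c in enumerate(cts)]
```

Decryption works because $k_i$ depends only on $L_i$, which Alice can recover from the ciphertext and the stream alone. With $k_i := f_{k'}(X_i)$ the last step would be circular, since $R_i$ is needed to form $X_i$ and $k_i$ is needed to unmask $R_i$.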

This fix is not secure if the $W_i$'s are not encrypted, since it might be very likely in some cases that different words have the same first $n - m$ bits. Pre-encryption will eliminate this problem, since with high probability all the $L_i$'s are distinct. (Assuming that the pre-encryption $E$ is a pseudorandom permutation, then due to the birthday paradox [15], the probability that at least one collision happens after encrypting $\ell$ words is at most $\ell(\ell - 1)/2^{n-m+1}$.)
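To get a feel for the size of the birthday bound $\ell(\ell - 1)/2^{n-m+1}$, it can be evaluated exactly for concrete parameters. The helper name and the example values ($\ell = 10^6$ distinct words, $n = 128$, $m = 32$) are assumptions chosen only for illustration.

```python
from fractions import Fraction


def collision_bound(num_words: int, n_bits: int, m_bits: int) -> Fraction:
    """Bound l(l-1)/2^(n-m+1) on the chance that two of l distinct
    pre-encrypted words agree on their first n-m bits."""
    return Fraction(num_words * (num_words - 1), 2 ** (n_bits - m_bits + 1))


# One million distinct words, 128-bit blocks, 32-bit check (assumed values).
bound = collision_bound(10**6, 128, 32)
print(float(bound))
```

The bound grows quadratically in $\ell$ and halves with every extra bit of $n - m$, so a larger $m$ (fewer false matches during search) leaves fewer bits in $L_i$ and weakens the collision bound.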

With this fix, the resulting scheme is provably secure, and in fact we can also show that it provides query isolation, meaning that even when a single key $k_i$ is revealed, no extra information is leaked beyond the ability to identify the positions where the corresponding word $W_i$ occurs.

Theorem 4.3. Suppose $E$ is a $(t, \ell, e_E)$-secure pseudorandom permutation, $F$ is a $(t, \ell, e_F)$-secure pseudorandom function, $f$ is a $(t, \ell, e_f)$-secure pseudorandom function, $G$ is a $(t, e_G)$-secure pseudorandom generator, and we choose the key material as described above. Then the algorithm described above for generating the sequence $\langle T_1, \ldots, T_\ell \rangle$ will be a $(t - \epsilon, e_T)$-secure pseudorandom generator, where [the expression for $e_T$ is not recoverable from this copy].

Moreover, if we disclose one $k_i$ and consider the reduced sequence $T'$ obtained by discarding all the $T_j$ values at positions where $L_j = L_i$, then we obtain a $(t - \epsilon, e_{T'})$-secure pseudorandom generator, where [the expression for $e_{T'}$ is not recoverable from this copy].

Strictly speaking, the proof of the theorem does not actually require $E$ to be a pseudorandom permutation: if we consider the (keyed) map sending $x$ to the first $n - m$ bits of $E_{k''}(x)$, then we can make do with the much weaker assumption that collisions in this map should be rare. As a special case, if the first $n - m$ bits of $E_{k''}(x)$ can be shown to be a pseudorandom function, then this map will necessarily have the required property, and we will be able to prove a result analogous to Theorem 3. This suggests that for pre-encryption


Frequently Asked Questions (13)

Q1. What contributions have the authors mentioned in the paper "Practical techniques for searches on encrypted data" ?

In this paper, the authors describe their cryptographic schemes for the problem of searching on encrypted data and provide proofs of security for the resulting crypto systems. The techniques are provably secure: they provide provable secrecy for encryption, in the sense that the untrusted server cannot learn anything about the plaintext when only given the ciphertext; they provide query isolation for searches, meaning that the untrusted server cannot learn anything more about the plaintext than the search result; they provide controlled searching, so that the untrusted server cannot search for an arbitrary word without the user's authorization; they also support hidden queries, so that the user may ask the untrusted server to search for a secret word without revealing the word to the server. The algorithms the authors present are simple, fast (for a document of length $n$, the encryption and search algorithms only need $O(n)$ stream cipher and block cipher operations), and introduce almost no space and communication overhead, and hence are practical to use today. The authors gratefully acknowledge support for this research from several US government agencies. This research was supported in part by the Defense Advanced Research Projects Agency under DARPA contract N6601-9928913 (under supervision of the Space and Naval Warfare Systems Center San Diego), by the National Science Foundation under grant FD99-79852, and by the United States Postal Service under grant USPS 1025 90-98-C3513. Views and conclusions contained in this document are those of the authors and do not necessarily represent the official opinion or policies, either expressed or implied, of the US government or any of its agencies, DARPA, NSF, USPS.

Q2. What are the advantages of their techniques?

Their techniques have a number of crucial advantages: they are provably secure; they support controlled and hidden search and query isolation; they are simple and fast (more specifically, for a document of length $n$, the encryption and search algorithms only need $O(n)$ stream cipher and block cipher operations); and they introduce almost no space and communication overhead.

Q3. What is the way to build a pseudorandom generator?

For instance, given any block cipher, the authors may build a pseudorandom generator using the counter mode [3] or a pseudorandom function using the CBC-MAC [4].

Q4. Why does Alice want to retrieve only the documents which contain the word?

Because Alice may have only a low-bandwidth network connection to the server Bob, she wishes to retrieve only the documents which contain the word $W$.

Q5. What is the way to hide the length information from the server?

When word lengths may vary, it is important to hide the length information from the server, because revealing the length of each word might allow for statistical attacks.

Q6. How do the authors control the number of errors in the encryption algorithm?

The authors may control the number of errors by adjusting a parameter $m$ in the encryption algorithm; each wrong position will be returned with probability about $1/2^m$, so for an $\ell$-word document, the authors expect to see about $\ell/2^m$ false matches.

Q7. What is the main problem with the encryption of mail servers?

Today’s mail servers such as IMAP servers [11], file servers and other data storage servers typically must be fully trusted—they have access to the data, and hence must be trusted not to reveal it without authorization—which introduces undesirable security and privacy risks in applications.

Q8. What is the way to store the length field before each word in the file?

One natural approach is to store the length field before each word in the file, and to glue the length field and word together as one word to perform encryption and search using their standard schemes.

Q9. What is the purpose of the techniques?

The techniques provide controlled searching, so that the untrusted server cannot search for a word without the user’s authorization.

Q10. What is the way to prevent Bob from doing statistical analysis on the index?

In order to prevent Bob from doing statistical analysis on the index, it is better to keep the lists of pointers in a fixed-size list.

Q11. What is the disadvantage of keeping the lists of pointers in a fixed-size list?

Note that by keeping the lists of pointers in a fixed-size list, the authors are mainly preventing Bob from learning statistical information on the key words that he has not searched.

Q12. What is the way to keep the list of document pointers in a fixed size?

Alice can split the long list into several lists with the fixed size; then, to search for such a word, Alice will need to ask Bob to perform and merge several search queries in parallel.

Q13. What is the cost of each scan?

In this case, the cost of each scan is increased, because the number of operations is determined by the bit-length of the document rather than by the number of blocks in the document.
