
Space-efficient private search with applications to rateless codes

Q: What are the contributions in "Space-efficient private search with applications to rateless codes" ?

The authors improve the space efficiency of the Ostrovsky et al. Private Search [9] scheme by describing methods that require considerably shorter buffers for returning the results of the search. Finally, the authors note the similarity between their decoding techniques and the ones used to decode rateless codes, and show how such codes can be extracted from encrypted documents.

George Danezis¹, Claudia Diaz¹•Institutions (1)

Katholieke Universiteit Leuven¹

12 Feb 2007-Vol. 4886, pp 148-162

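TL;DR: This work improves the space efficiency of the Ostrovsky et al. Private Search scheme by describing methods that require considerably shorter buffers for returning the results of the search.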


Abstract: Private keyword search is a technique that allows for searching and retrieving documents matching certain keywords without revealing the search criteria. We improve the space efficiency of the Ostrovsky et al. Private Search [9] scheme, by describing methods that require considerably shorter buffers for returning the results of the search. Our basic decoding scheme, recursive extraction, requires buffers of length less than twice the number of returned results and is still simple and highly efficient. Our extended decoding schemes rely on solving systems of simultaneous equations, and in special cases can uncover documents in buffers that are close to 95% full. Finally we note the similarity between our decoding techniques and the ones used to decode rateless codes, and show how such codes can be extracted from encrypted documents.


Summary

1 Introduction

Private search allows for keyword searching on a stream of documents (typical of online environments) without revealing the search criteria.
Financial applications that can benefit from this technique are, for example, corporate searches on a patents database, searches for financial transactions meeting specific but private criteria and periodic updates of filtered financial news or stock values.
The authors extend the Ostrovsky et al. scheme to improve the space-efficiency of the returned results considerably by using more efficient coding and decoding techniques.
The authors' first key contribution is a method called recursive extraction for efficiently decoding encrypted buffers resulting from the Ostrovsky et al. scheme.
Solving the remaining equations at colliding buffer positions allows for even more documents to be retrieved from short buffers, and in the special case of documents only matching one keyword, the authors can decode buffers that are only 10% longer than the expected matches, with high probability.

3 Private Search

The Private Search scheme proposed by Ostrovsky et al. [9] is based on the properties of the homomorphic Paillier public key cryptosystem [10], in which the multiplication of two ciphertexts leads to the encryption of the sum of the corresponding plaintexts (E(x) · E(y) = E(x + y)).
Constructions with El-Gamal [6] are also possible but do not allow for full recovery of documents.
All buffer positions are initialized with tuples (E(0), E(0)).
Documents that do not match any of the keywords do not change the contents of these buffer positions (since zero is being added to the plaintexts), but the matched documents do.
Collisions will occur when two matching documents are inserted at the same position in the buffer.

4 Modifications to the Original Scheme

A prerequisite for more efficient decoding schemes is to reduce the uncertainty of the party that performs the decoding.
At the same time, the party performing the search should gain no additional information with respect to the original scheme.
The position $p_{ij}$ is then represented by the q most significant bits of the result of the hash.
The extension requires that the total number N of searched documents is known to the decoder, and that the positions of all (not just matched) searched documents are known by the decoder.
With respect to the original Ostrovsky scheme, the authors' basic algorithm only requires replacing the random function U[0, b−1], used to select the buffer positions for the document copies, with a pseudorandom function that depends on the document and the copy number and that can be computed by the decoder.

5 Basic Decoding Algorithm: Recursive Extraction

Given the minor modifications above, the authors note that much more efficient decoding algorithms can be used, that would allow the use of significantly smaller buffers for the same recovery probability.
While collisions are ignored in the original Ostrovsky scheme, the authors' key intuition is that collisions do not in fact destroy all information, but merely add together the encrypted plaintexts.
The decoder decrypts the buffer, and thanks to the redundancy included in the documents, it can discern three states of a particular buffer position: whether it is empty, contains a single document, or contains a collision.
In the example of Figure 2(a), documents ‘3’, ‘5’, ‘7’ and ‘8’ can be trivially recovered (note that these four documents would be the only ones recovered in the original scheme).
Once they are removed from the buffer, document ‘6’ can also be retrieved.

6 Extended Decoding Algorithm: Solving Equations

The authors' basic decoding algorithm may terminate without recovering all matching documents if it runs into a situation where a group of documents is copied to the same set of buffer positions.
The authors' key observation is that, by expressing these buffer positions as linear equations, the colliding documents (3 of them in the example of Figure 3) can still be retrieved.
Making the positions of the document copies in the buffer predictable further reduces uncertainty and further improves the decoding efficiency.
Each document that has a copy in this buffer position is a variable in the equation, and the sum of the actual matched documents equals the value of the bucket.
As such, solving equations is complementary to the first decoding technique, and it is always applied after recursive extraction.
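The situation of Figure 3 (2 copies of 3 documents colliding in 3 buffer positions) can be written as three linear equations in three unknowns. The following is only a minimal sketch of that idea, not the authors' implementation: the function name `solve`, the layout matrix `A` and the document values are illustrative assumptions, and the arithmetic is done over the rationals on small integers rather than on decrypted Paillier plaintexts.

```python
from fractions import Fraction

def solve(A, y):
    """Gauss-Jordan elimination over the rationals for a square system A x = y.
    Returns the solution as a list, or None if the system is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, y)]
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[-1] for row in M]

# Assumed layout: row k lists which documents have a copy in buffer position k.
A = [[1, 0, 1],   # position 0 holds d1 + d3
     [1, 1, 0],   # position 1 holds d1 + d2
     [0, 1, 1]]   # position 2 holds d2 + d3
buckets = [444, 333, 555]          # decrypted sums found in the three positions
recovered = solve(A, buckets)
print([int(v) for v in recovered])  # prints [111, 222, 333]
```

No position holds a single document here, so recursive extraction alone cannot start, while the linear system still has a unique solution.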

6.1 Special Case: Searching for One Keyword

In some applications, the decoder may be interested in searching only one keyword in the documents.
When retrieving pseudonymous email [11], the decoder would provide his email address as the only keyword for searching in the documents.
The technique works as follows: the serial number is appended to the lower end of the document, leaving enough space to accommodate the sum of serial numbers of documents present in the bucket.
Note that a lower number of bits between log2(N) and 2 log2(N) may be sufficient, since the average number of matched documents in a buffer position is generally much lower than the worst case.
Note that in the general case of searching for K keywords, $y_i$ takes values between zero and K.

6.2 Tight Packing of Encrypted Lists and Bit Fields

In the previous section the authors described how they can use a single Paillier ciphertext to encode, space permitting, both the serial number of the document and the document itself.
In many cases (e.g., the first element of each buffer position in the Ostrovsky scheme, the representation of the serial number, or the Bloom filter entry) only a small plaintext is to be represented.
The authors can do this by using only one Paillier ciphertext and packing as many elements as possible into it.
Consider that the authors have two ciphertexts E(i) and E(j), representing two fields.
The authors assume that the cryptographic protocols they shall perform will never result in a sum of those fields being greater than b bits long.
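One way to read the packing idea is as fixed-width slots inside a single plaintext. The sketch below works on plaintext integers only and is an illustration under assumptions: the helper names `pack` and `unpack` and the parameter `width` (the slot size in bits) are hypothetical. With an additively homomorphic cipher, multiplying ciphertexts adds plaintexts and raising a ciphertext to the power 2^width shifts its plaintext by one slot, so the same arithmetic would carry over to E(i) · E(j)^(2^width).

```python
def pack(fields, width):
    """Place each small non-negative integer in its own `width`-bit slot."""
    packed = 0
    for k, value in enumerate(fields):
        if not 0 <= value < (1 << width):
            raise ValueError("field does not fit in its slot")
        packed |= value << (k * width)
    return packed

def unpack(packed, width, count):
    """Read back `count` slots of `width` bits each, lowest slot first."""
    mask = (1 << width) - 1
    return [(packed >> (k * width)) & mask for k in range(count)]

a = pack([3, 5], width=8)
c = pack([10, 20], width=8)
# Adding the packed integers adds the fields slot by slot, as long as no
# slot overflows its width (the assumption stated in the text above).
print(unpack(a + c, width=8, count=2))  # prints [13, 25]
```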

7 Experimental Results

The authors present in Figures 5 and 6 simulation results illustrating the performance of recursive extraction and solving equations for different sets of parameters.
Figures 5(b) and 5(d) show the probability of success of their techniques based on solving equations, for buffers of length 100 and 1000, respectively.
Since solving equations can only be done after recursive extraction, these probabilities of success have to be seen as providing the possibility to get all documents in cases where recursive extraction alone has not recovered them.
The authors observe that both too few and too many copies reduce the recovery rate of documents.
This is only the case if one assumes a buffer of infinite size – and therefore an optimal parameter for the number of copies has to be calculated for each set of practical values of buffer size and expected matches.

8 Applications to Rateless Codes

Maymounkov and Mazières in [8] introduce “rateless codes”, a method for erasure resistant, multi-source coding.
These codes have been designed to be used in a peer-to-peer context, where Alice may be downloading the same file from multiple sources, with no coordination between sources.
Bob sends check blocks to Alice, who decodes them.
The mapping between auxiliary blocks and check blocks is determined only by the pseudo-random number sequences for which the seeds are known, and therefore Alice is able to tell when a decryption would be successful.
As a result of this property of El-Gamal ciphertexts, not only messages encrypted blockwise using this cipher can be expanded into rateless codes served from multiple sources, but also the receiver of these blocks can perform the decoding and reconstruct a valid El-Gamal encrypted representation of the original message.

9 Conclusions

The authors have presented in this paper efficient decoding mechanisms for private search.
The size of the returned buffer with matched documents is the key to the success of private search schemes.
If this buffer is too long, any scheme can simply be reduced to transmitting back all the documents, which would save in complexity and cryptographic costs.
The proposed decoding methods reduce by a significant constant factor the buffer sizes required by the Ostrovsky et al. scheme.
Recursive extraction and solving equations are complementary and can be applied sequentially to extract a maximum number of documents from short buffers.


Figures (6)

Fig. 3. Example of buffer with 2 copies of 3 documents colliding in 3 buffer positions

Fig. 5. Performance evaluation of our techniques and comparison with the original scheme

Fig. 6. The effect of the number of copies used on the performance of all techniques

Fig. 4. Example of buffer with 2 copies of 5 documents colliding in 3 buffer positions

Fig. 1. Function to determine the position of a document copy $d_{ij}$


Space-Efficient Private Search with Applications to Rateless Codes

George Danezis and Claudia Diaz

K.U. Leuven, ESAT/COSIC, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium
{george.danezis, claudia.diaz}@esat.kuleuven.be

Abstract. Private keyword search is a technique that allows for searching and retrieving documents matching certain keywords without revealing the search criteria. We improve the space efficiency of the Ostrovsky et al. Private Search [9] scheme, by describing methods that require considerably shorter buffers for returning the results of the search. Our basic decoding scheme, recursive extraction, requires buffers of length less than twice the number of returned results and is still simple and highly efficient. Our extended decoding schemes rely on solving systems of simultaneous equations, and in special cases can uncover documents in buffers that are close to 95% full. Finally we note the similarity between our decoding techniques and the ones used to decode rateless codes, and show how such codes can be extracted from encrypted documents.

1 Introduction

Private search allows for keyword searching on a stream of documents (typical of online environments) without revealing the search criteria. Its applications include intelligence gathering, medical privacy, private information retrieval and financial applications. Financial applications that can benefit from this technique are, for example, corporate searches on a patents database, searches for financial transactions meeting specific but private criteria and periodic updates of filtered financial news or stock values.

Rafail Ostrovsky et al. presented in [9] a scheme that allows a server to filter a stream of documents, based on matching keywords, and only return the relevant documents without gaining any information about the query string. This allows searching to be outsourced, and only relevant results to be returned, economising on communications costs. The authors of [9] show that the communication cost is linear in the number of results expected. We extend their scheme to improve the space-efficiency of the returned results considerably by using more efficient coding and decoding techniques.

Our first key contribution is a method called recursive extraction for efficiently decoding encrypted buffers resulting from the Ostrovsky et al. scheme. The second method, based on solving systems of linear equations, is applied after recursive extraction and allows for the recovery of extra matching documents from the encrypted buffers. Recursive extraction results in the full decoding of buffers of length twice the size of the expected number of matches, and has a linear time-complexity. Shorter buffers can also be decrypted with high probability. Solving the remaining equations at colliding buffer positions allows for even more documents to be retrieved from short buffers, and in the special case of documents only matching one keyword, we can decode buffers that are only 10% longer than the expected matches, with high probability. We present simulations to assess the decoding performance of our techniques, and estimate optimal parameters for our schemes.

S. Dietrich and R. Dhamija (Eds.): FC 2007 and USEC 2007, LNCS 4886, pp. 148–162, 2007. © IFCA/Springer-Verlag Berlin Heidelberg 2007

In this work we also present some observations that may be of general interest beyond the context of private search. We show how arrays of small integers can be represented in a space efficient manner using Paillier ciphertexts, while maintaining the homomorphic properties of the scheme. These techniques can be used to make private search more space-efficient, but also to implement other data structures like Bloom filters, or vectors, in a compact way. Finally we show how rateless codes, block based erasure resistant multi-source codes, can be extracted from encrypted documents, while maintaining all their desirable properties.

This paper is structured as follows: We introduce the related work in Section 2; present in more detail in Section 3 the original Ostrovsky scheme whose efficiency we are trying to improve; and explain in Section 4 the required modifications. Sections 5 and 6 present the proposed efficient decoding techniques, which are evaluated in Section 7. In Section 8 we explain how our techniques can be applied to rateless codes; and we present our conclusions in Section 9.

2 Related Work

Our results can be applied to improve the decoding efficiency of the Private Search scheme proposed by Rafail Ostrovsky et al. in [9]. This scheme is described in detail in Section 3. Danezis and Diaz proposed in [5] some preliminary ideas on how to improve the decoding efficiency of the Ostrovsky Private Search scheme, which are elaborated in this paper.

Bethencourt et al. [1,2] have independently proposed several modifications to the Ostrovsky private search scheme which include solving a system of linear equations to recover the documents. As such, the time complexity of their approach is $O(n^3)$, while our base technique, recursive extraction, is $O(n)$. Their technique also requires some changes to the original scheme [9], such as the addition of an encrypted buffer that acts as a Bloom filter [3]. This buffer by itself increases by 50% the data returned. Some of our techniques presented in Section 6.2, that allow for efficient space representation of concatenated data, are complementary to their work, and would greatly benefit the efficiency of their techniques.

The rateless codes for big downloads proposed by Maymounkov and Mazières in [8] use a technique similar to ours for efficient decoding, indicating that our ideas can be applied beyond private search applications. We explore further this relation in Section 8, where we show how homomorphic encryption can be used to create rateless codes for encrypted data.

Pfitzmann and Wainer [13] also notice that collisions in DC networks [4] do not destroy all information transmitted. They use this observation to allow n messages to be transmitted in n steps despite collisions.

3 Private Search

The Private Search scheme proposed by Ostrovsky et al. [9] is based on the properties of the homomorphic Paillier public key cryptosystem [10], in which the multiplication of two ciphertexts leads to the encryption of the sum of the corresponding plaintexts ($E(x) \cdot E(y) = E(x + y)$). Constructions with El-Gamal [6] are also possible but do not allow for full recovery of documents.

The searching party provides a dictionary of terms and a corresponding Paillier ciphertext, that is the encryption of one ($t_k = E(1)$), if the term is to be matched, or the encryption of zero ($t_k = E(0)$) if the term is of no interest. Because of the semantic security properties of the Paillier cryptosystem this leaks no information about the matching criteria.

The dictionary ciphertexts corresponding to the terms in the document $d_i$ are multiplied together to form $g_i = \prod_k t_k = E(m_i)$, where $m_i$ is the number of matching words in document $d_i$. A tuple $(g_i, g_i^{d_i})$ is then computed. The second term will be an encryption of zero ($E(0)$) if there has been no match, and the encryption $E(m_i d_i)$ otherwise. Note that repeated words in the document are not taken into account, meaning that each matching word is counted only once, and $m_i$ represents the number of different matching words found in a document.

Each document tuple is then multiplied into a set of l random positions in a buffer of size b (smaller than the total number of searched documents, but bigger than the number of matching documents). All buffer positions are initialized with tuples $(E(0), E(0))$. The documents that do not match any of the keywords do not change the contents of these positions in the buffer (since zero is being added to the plaintexts), but the matched documents do.

Collisions will occur when two matching documents are inserted at the same position in the buffer. These collisions can be detected by adding some redundancy to the documents. The color survival theorem [9] can be used to show that the probability that all copies of a single document are overwritten becomes negligibly small as the number of copies l and the size of the buffer b increase (the suggested buffer length is $b = 2 \cdot l \cdot M$, where M is the expected number of matching documents). The searcher can decode all positions, ignoring the collisions, and dividing the second term of the tuples by the first term to retrieve the documents.
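The homomorphic steps of this section can be tried out with a toy Paillier instance. The sketch below is only an illustration under loud assumptions: the primes `P` and `Q` are tiny and insecure, the generator is fixed to N + 1 for simplicity, and the helper names `E` and `D` and the document value are hypothetical, not part of the scheme in [9].

```python
import math
import random

# Toy parameters (assumption): far too small for any real use.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is fixed to N + 1

def E(m):
    """Encrypt 0 <= m < N with fresh randomness."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def D(c):
    """Decrypt a ciphertext back to its plaintext modulo N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

# Multiplying ciphertexts adds the plaintexts.
assert D(E(3) * E(4) % N2) == 7

doc = 4242                        # a document encoded as a small integer
g = E(1) * E(1) * E(0) % N2       # two matching terms, one non-matching: E(2)
slot = (g, pow(g, doc, N2))       # the tuple (E(m), E(m * doc))
m, md = D(slot[0]), D(slot[1])
print(md // m)                    # prints 4242

g0 = E(0) * E(0) % N2             # a document with no matching term
assert D(pow(g0, doc, N2)) == 0   # contributes zero to any buffer position
```

Dividing the second decrypted term by the first recovers the document, and a non-matching document leaves a buffer position unchanged because it only ever adds zero.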

4 Modifications to the Original Scheme

A prerequisite for more efficient decoding schemes is to reduce the uncertainty of the party that performs the decoding. At the same time, the party performing the search should gain no additional information with respect to the original scheme. In order to make sure of this, we note that the modifications to the original scheme involve only information flows from the searching (encoding) party back to the matching (decoding) party, and therefore cannot introduce any additional vulnerabilities in this respect.

Our basic decoding algorithm (presented in Section 5) only requires that the document copies are stored in buffer positions known to the decoder. In practice, the mapping of documents to buffer positions can be done using a good hash function H(·) that can be agreed by both parties or fixed by the protocol. We give an example of how this function can be constructed.

Notation:

– l is the total number of copies stored per document;
– $d_{ij}$ is the j-th copy of document $d_i$ ($j = 1 \ldots l$) – note that all copies of $d_i$ are equal;
– b is the size of the buffer;
– q is the number of bits needed to represent b ($2^{q-1} < b \le 2^q$);
– $p_{ij}$ is the position of document copy $d_{ij}$ in the buffer ($0 \le p_{ij} < b$).

The hash function is applied to the sum of the document $d_i$ and the copy number j, $H(d_i + j)$. The position $p_{ij}$ is then represented by the q most significant bits of the result of the hash. If there is index overflow (i.e., $b \le p_{ij}$), then we apply the hash function again ($H(H(d_i + j))$) and repeat the process, until we obtain a result $p_{ij} < b$. This is illustrated in Figure 1(a).

With this method, once the decoding party sees a copy of a matched document, $d_i$, it can compute the positions of the buffer where all l copies of $d_i$ have been stored (and thus extract them from those positions) by applying the function to $d_i + j$, with $j = 1 \ldots l$.
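The position function described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the construction fixed by the paper: SHA-256 stands in for the agreed hash function H, documents and serial numbers are modelled as Python integers, and the byte encoding (including the `|` separator used for concatenation in the serial-number variant introduced just below) is an arbitrary choice. The helper names are hypothetical.

```python
import hashlib

def _h(data: bytes) -> bytes:
    # SHA-256 is an assumed stand-in for the agreed hash function H.
    return hashlib.sha256(data).digest()

def _to_bytes(x: int) -> bytes:
    return x.to_bytes(max(1, (x.bit_length() + 7) // 8), "big")

def position(digest: bytes, b: int) -> int:
    """Take the q most significant bits; rehash on index overflow."""
    q = (b - 1).bit_length()          # 2^(q-1) < b <= 2^q
    while True:
        p = int.from_bytes(digest, "big") >> (256 - q)
        if p < b:
            return p
        digest = _h(digest)           # H(H(...)) until the result fits

def basic_positions(doc: int, l: int, b: int):
    """Positions of the l copies of a document, from H(d_i + j)."""
    return [position(_h(_to_bytes(doc + j)), b) for j in range(1, l + 1)]

def extended_positions(serial: int, l: int, b: int):
    """Positions derived from the serial number and copy number, H(s_i || j)."""
    return [position(_h(_to_bytes(serial) + b"|" + _to_bytes(j)), b)
            for j in range(1, l + 1)]

print(basic_positions(4242, l=3, b=24))
print(extended_positions(7, l=3, b=24))
```

Because the function is deterministic, a decoder that has recovered a document (or knows a serial number) can recompute exactly the positions the encoder used.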

We present in Section 6 an extension to our decoding algorithm that further improves its decoding efficiency. The extension requires that the total number N of searched documents is known to the decoder, and that the positions of all (not just matched) searched documents are known by the decoder. This can be achieved by adding a serial number $s_i$ to the documents, and then deriving the position $p_{ij}$ of the document copies as a function of the document serial number and the number of the copy, $H(s_i || j)$, as shown in Figure 1(b). We then take the q most significant bits of the result and proceed as in the previous case.

With respect to the original Ostrovsky scheme, our basic algorithm only requires the substitution of the random function U[0, b−1] used to select the buffer positions for the document copies by a pseudorandom function dependent on the document and the copy number, that can be computed by the decoder.

The extension requires that the encoder transmits to the decoder the total number N of documents searched. The encoder should also append a serial number to the documents (before encrypting them). We assume that the serial numbers take values between 1 and N (i.e., $s_i = i$).

Fig. 1. Function to determine the position of a document copy $d_{ij}$: (a) in the basic algorithm; (b) in the extended algorithm

5 Basic Decoding Algorithm: Recursive Extraction

Given the minor modifications above, we note that much more efficient decoding algorithms can be used, that would allow the use of significantly smaller buffers for the same recovery probability.

While collisions are ignored in the original Ostrovsky scheme, our key intuition is that collisions are in fact not destroying all information, but merely adding together the encrypted plaintexts. This property can be used to recover a plaintext if the values of the other plaintexts with which it collides are known.

The decoder decrypts the buffer, and thanks to the redundancy included in the documents, it can discern three states of a particular buffer position: whether it is empty, contains a single document, or contains a collision.

In this basic scheme, the empty buffer positions are of no interest to the decoder (they do provide useful information in the extended algorithm, as we shall see in the next section). If a position contains a single document $d_i$, then the document can be recovered. By applying the hash function as described in Section 4 to $d_i + j$ with $j = 1 \ldots l$, the decoder can locate all the other copies of $d_i$ and extract them from the buffer. This hopefully uncovers some new buffer positions containing only one document. This simple algorithm is repeated multiple times until all documents are recovered or no more progress can be made.
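The peeling loop described above can be sketched on an already decrypted buffer. This is a simplified model under stated assumptions, not the authors' code: each buffer position is represented as a pair `[count, sum]`, where `count` stands in for the redundancy check that tells the decoder whether a position is empty, single or a collision; documents are small integers; and the hand-written layouts below replace the hash-based position function so that the behaviour is easy to follow. All names are hypothetical.

```python
def encode(layout, b):
    """Build a decrypted buffer of size b from {document: [positions]}."""
    buf = [[0, 0] for _ in range(b)]
    for doc, slots in layout.items():
        for p in slots:
            buf[p][0] += 1      # how many document copies landed here
            buf[p][1] += doc    # collisions simply add the plaintexts
    return buf

def recursive_extraction(buf, positions_of):
    """Repeatedly recover documents that sit alone in a position and
    subtract all their copies, which may free further positions."""
    recovered = set()
    pending = [p for p, (count, _) in enumerate(buf) if count == 1]
    while pending:
        p = pending.pop()
        if buf[p][0] != 1:      # already emptied by an earlier extraction
            continue
        doc = buf[p][1]
        recovered.add(doc)
        for r in positions_of(doc):
            buf[r][0] -= 1
            buf[r][1] -= doc
            if buf[r][0] == 1:  # a collision just became a single document
                pending.append(r)
    return recovered

# A chain of collisions: only 10 and 40 sit alone at first.
chain = {10: [0, 1], 20: [1, 2], 30: [2, 3], 40: [3, 4]}
print(sorted(recursive_extraction(encode(chain, 6), chain.__getitem__)))
# prints [10, 20, 30, 40]

# Three documents colliding pairwise in three positions: nothing to peel.
cycle = {10: [0, 1], 20: [1, 2], 30: [0, 2]}
print(sorted(recursive_extraction(encode(cycle, 3), cycle.__getitem__)))
# prints []
```

In the first layout a decoder that ignores collisions would recover only documents 10 and 40, while peeling recovers all four. The second layout is the kind of situation where this basic algorithm stops without progress.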

In the example shown in Figure 2(a), we match 9 documents and store 3 copies of each in a buffer of size 24. Documents ‘3’, ‘5’, ‘7’ and ‘8’ can be trivially recovered (note that these four documents would be the only ones recovered in the original scheme). All copies of these documents are located and extracted
