# Space-efficient private search with applications to rateless codes

TL;DR: This work improves the space efficiency of the Ostrovsky et al. Private Search scheme.

Abstract: Private keyword search is a technique that allows for searching and retrieving documents matching certain keywords without revealing the search criteria. We improve the space efficiency of the Ostrovsky et al. Private Search [9] scheme by describing methods that require considerably shorter buffers for returning the results of the search. Our basic decoding scheme, recursive extraction, requires buffers of length less than twice the number of returned results and is still simple and highly efficient. Our extended decoding schemes rely on solving systems of simultaneous equations, and in special cases can uncover documents in buffers that are close to 95% full. Finally, we note the similarity between our decoding techniques and the ones used to decode rateless codes, and show how such codes can be extracted from encrypted documents.

## Summary

### 1 Introduction

- Private search allows for keyword searching on a stream of documents (typical of online environments) without revealing the search criteria.
- Financial applications that can benefit from this technique include corporate searches on a patent database, searches for financial transactions meeting specific but private criteria, and periodic updates of filtered financial news or stock values.
- The authors extend the Ostrovsky et al. scheme to considerably improve the space efficiency of the returned results, using more efficient coding and decoding techniques.
- The authors' first key contribution is a method called recursive extraction for efficiently decoding the encrypted buffers produced by the Ostrovsky et al. scheme.
- Solving the remaining equations at colliding buffer positions allows even more documents to be retrieved from short buffers; in the special case of a single-keyword search, the authors can decode, with high probability, buffers that are only 10% longer than the expected number of matches.

### 3 Private Search

- The Private Search scheme proposed by Ostrovsky et al. [9] is based on the properties of the homomorphic Paillier public key cryptosystem [10], in which the multiplication of two ciphertexts leads to the encryption of the sum of the corresponding plaintexts (E(x) · E(y) = E(x + y)).
- Constructions with El-Gamal [6] are also possible but do not allow for full recovery of documents.
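- All buffer positions are initialized with tuples (E(0), E(0)); in the Ostrovsky et al. construction, each position holds a pair of ciphertexts encrypting the number c of matched keywords and the product c·m of that number with the document m.
- Documents that match none of the keywords do not change the contents of these positions (they add an encryption of zero to both plaintexts), whereas matched documents do.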
- Collisions will occur when two matching documents are inserted at the same position in the buffer.
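The additive homomorphism and the way a buffer position accumulates matched documents can be sketched with a toy Paillier implementation. This is a minimal sketch under stated assumptions, not the authors' code: the primes are tiny and insecure, `DICTIONARY`, `new_buffer` and `insert` are hypothetical names, and each position is assumed to hold the pair (E(c), E(c·m)), where c counts the matched keywords and m is the document encoded as an integer.

```python
import math
import random

# Toy Paillier parameters: far too small to be secure, for illustration only.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)          # valid because the generator is g = N + 1

def enc(m: int) -> int:
    """Paillier encryption E(m) = (1 + N)^m * r^N mod N^2 with random r coprime to N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def dec(c: int) -> int:
    return (pow(c, LAM, N2) - 1) // N * MU % N

# Hypothetical encrypted dictionary: E(1) for the secret keyword, E(0) otherwise.
DICTIONARY = {w: enc(1 if w == "patent" else 0) for w in ("patent", "stock", "news")}

def new_buffer(size: int) -> list[tuple[int, int]]:
    """Every position starts as the pair (E(0), E(0))."""
    return [(enc(0), enc(0)) for _ in range(size)]

def insert(buffer, words, value: int, positions) -> None:
    """Multiply a document's encrypted contribution into each of its copy positions."""
    c = enc(0)
    for w in set(words):
        if w in DICTIONARY:
            c = c * DICTIONARY[w] % N2        # E(number of matched keywords)
    e = pow(c, value, N2)                     # E(matches * value)
    for p in positions:
        a, b = buffer[p]
        buffer[p] = (a * c % N2, b * e % N2)  # homomorphic addition into the cell

# Example: only "patent" is a secret keyword.
buf = new_buffer(4)
insert(buf, ["patent", "news"], 7, [0, 2])    # matches
insert(buf, ["stock"], 9, [0, 1])             # does not match
insert(buf, ["patent"], 5, [2, 3])            # matches
print([(dec(a), dec(b)) for a, b in buf])     # [(1, 7), (0, 0), (2, 12), (1, 5)]
```

Positions 0 and 3 each decrypt to a single matched document, position 1 was touched only by a non-matching document and still decrypts to (0, 0), and position 2 is a collision in which the two matched documents have simply been added together.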

### 4 Modifications to the Original Scheme

- A prerequisite for more efficient decoding schemes is to reduce the uncertainty of the party that performs the decoding.
- At the same time, the party performing the search should gain no additional information with respect to the original scheme.
- The position p_ij of the j-th copy of document i is then given by the q most significant bits of a hash that depends on the document and the copy number, so positions range over 2^q buffer cells.
- The extension requires that the decoder knows the total number N of searched documents and the buffer positions of all searched documents, not just the matched ones.
- Compared with the original Ostrovsky scheme, the basic algorithm only requires replacing the random function U[0, b−1], which selects the buffer positions of the document copies, with a pseudorandom function of the document and the copy number that the decoder can also compute.
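The position derivation can be sketched as follows. It assumes SHA-256 as the hash, a byte-string document identifier and a 4-byte big-endian copy counter; these choices and the names `copy_position` and `copy_positions` are hypothetical, and the summary does not specify the paper's exact construction (for instance, whether the hash is keyed).

```python
import hashlib

def copy_position(doc_id: bytes, copy: int, q: int) -> int:
    """Buffer position of one copy: the q most significant bits of SHA-256(doc_id || copy)."""
    digest = hashlib.sha256(doc_id + copy.to_bytes(4, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - q)

def copy_positions(doc_id: bytes, copies: int, q: int) -> list[int]:
    return [copy_position(doc_id, j, q) for j in range(copies)]

# The decoder can recompute the layout of every searched document, matched or not.
searched = [b"doc-0", b"doc-1", b"doc-2"]
layout = {d: copy_positions(d, copies=3, q=10) for d in searched}
print(layout)
```

Because the positions depend only on the document and the copy number, the searcher and the decoder obtain the same layout without any extra communication.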

### 5 Basic Decoding Algorithm: Recursive Extraction

- With these minor modifications, much more efficient decoding algorithms become possible, allowing significantly smaller buffers for the same recovery probability.
- While collisions are ignored in the original Ostrovsky scheme, the authors' key intuition is that collisions do not destroy all information: they merely add the encrypted plaintexts together.
- The decoder decrypts the buffer, and thanks to the redundancy included in the documents, it can discern three states of a particular buffer position: whether it is empty, contains a single document, or contains a collision.
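- In the paper's worked example, documents ‘3’, ‘5’, ‘7’ and ‘8’ each occupy some buffer position on their own and can be trivially recovered (these four documents would be the only ones recovered in the original scheme).
- Once they are subtracted from the buffer positions they share with other documents, document ‘6’ is left alone in a position and can also be retrieved.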
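Recursive extraction behaves like a peeling decoder. The sketch below runs it on a simulated decrypted buffer, modeling each position as [count, sum of document identifiers, sum of payloads]. This cell model, the hand-made `layout` (which in the modified scheme the decoder would compute from the pseudorandom position function for all N searched documents) and the function names are hypothetical, and the example layout is illustrative rather than the one in the paper's figure.

```python
from collections import deque

def build_buffer(matched: dict[int, int], layout: dict[int, list[int]], b: int) -> list[list[int]]:
    """Simulated decrypted buffer: each cell is [count, id_sum, payload_sum]."""
    buf = [[0, 0, 0] for _ in range(b)]
    for doc_id, payload in matched.items():
        for p in layout[doc_id]:
            buf[p][0] += 1
            buf[p][1] += doc_id
            buf[p][2] += payload
    return buf

def recursive_extraction(buf, layout):
    """Repeatedly recover documents that sit alone in a cell and subtract them everywhere."""
    buf = [cell[:] for cell in buf]              # work on a copy
    recovered = {}
    queue = deque(i for i, cell in enumerate(buf) if cell[0] == 1)
    while queue:
        count, doc_id, payload = buf[queue.popleft()]
        if count != 1:
            continue                             # cell changed since it was queued
        recovered[doc_id] = payload
        for p in layout[doc_id]:                 # remove every copy of the document
            buf[p][0] -= 1
            buf[p][1] -= doc_id
            buf[p][2] -= payload
            if buf[p][0] == 1:
                queue.append(p)
    return recovered, buf

# Hypothetical layout: documents 3, 5, 7 and 8 start alone somewhere; 6 only after peeling.
layout = {3: [0, 1], 5: [1, 2], 6: [1, 3], 7: [3, 4], 8: [5, 3]}
matched = {3: 30, 5: 50, 6: 60, 7: 70, 8: 80}
recovered, residual = recursive_extraction(build_buffer(matched, layout, 6), layout)
print(sorted(recovered))                         # [3, 5, 6, 7, 8]
```

If two matched documents land on exactly the same cells, neither is ever alone, and both stay in the residual buffer that the function returns.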

### 6 Extended Decoding Algorithm: Solving Equations

- The basic decoding algorithm may terminate without recovering all matching documents when a group of documents is copied to the same set of buffer positions, so that none of them ever occupies a position alone.
- The authors' key observation is that by expressing these buffer positions as linear equations, the colliding documents (three in the paper's example) can still be retrieved.
- Making the positions of the document copies predictable further reduces uncertainty and allows the decoding efficiency to be improved further.
- Each buffer position yields one equation: every document with a copy in that position is a variable, and the sum of the documents that actually matched equals the value of the bucket.
- As such, solving equations is complementary to the first decoding technique, and it is always applied after recursive extraction.
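The equation-solving step can be sketched with exact Gaussian elimination over the residual cells left by recursive extraction. Assumptions: the same hypothetical [count, id_sum, payload_sum] cell model, exact rational arithmetic instead of arithmetic modulo the Paillier modulus, and a pruning rule (a document with a copy in an empty cell cannot have matched) chosen for illustration; `solve_residual` is a hypothetical name.

```python
from fractions import Fraction

def solve_residual(residual, layout, recovered):
    """Solve the linear system left after recursive extraction.

    Returns {doc_id: payload} if the system has a unique solution, else None.
    """
    # A document is a variable only if every one of its copies lands on a non-empty cell.
    cands = [d for d, ps in layout.items()
             if d not in recovered and all(residual[p][0] > 0 for p in ps)]
    rows = sorted({p for d in cands for p in layout[d]})
    # One equation per position: sum_d (copies of d at p) * x_d = payload_sum[p]
    m = [[Fraction(layout[d].count(p)) for d in cands] + [Fraction(residual[p][2])]
         for p in rows]
    for col in range(len(cands)):
        piv = next((i for i in range(col, len(m)) if m[i][col] != 0), None)
        if piv is None:
            return None                          # underdetermined: no unique solution
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for i in range(len(m)):
            if i != col and m[i][col] != 0:
                f = m[i][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[col])]
    if any(row[-1] != 0 for row in m[len(cands):]):
        return None                              # inconsistent system
    return {d: m[i][-1] for i, d in enumerate(cands)}

# Hypothetical stuck group: documents 1, 2 and 4 pairwise share cells 0, 1 and 2;
# document 9 also touches the empty cell 3, so it is excluded as a variable.
layout = {1: [0, 1], 2: [1, 2], 4: [0, 2], 9: [0, 3]}
residual = [[2, 5, 50], [2, 3, 30], [2, 6, 60], [0, 0, 0]]
solution = solve_residual(residual, layout, recovered={})
print({d: int(v) for d, v in solution.items()})  # {1: 10, 2: 20, 4: 40}
```

When two documents share exactly the same positions, their equations coincide and the sketch returns None, since the system then has no unique solution.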

### 6.1 Special Case: Searching for One Keyword

- In some applications, the decoder may be interested in searching for only one keyword.
- For example, when retrieving pseudonymous email [11], the decoder would provide their email address as the only keyword.
- In this case, each document's serial number is appended to the lower (least significant) end of the document, leaving enough space to accommodate the sum of the serial numbers of all documents present in the bucket.
- The sum of up to N serial numbers, each smaller than N, is below N², so 2 log2(N) bits always suffice; fewer bits, between log2(N) and 2 log2(N), may be enough in practice, since the average number of matched documents in a buffer position is generally much lower than the worst case.
- In the general case of searching for K keywords, y_i (the number of keywords matched by document i) takes values between zero and K.
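The serial-number packing and the bit-budget argument can be checked at the plaintext level, where adding ciphertexts corresponds to adding these packed integers. `serial_bits`, `pack` and `unpack_bucket` are hypothetical helpers; the sketch assumes serial numbers 0 to N − 1 and reserves the worst-case 2⌈log2(N)⌉ bits.

```python
import math

def serial_bits(n_docs: int) -> int:
    """Worst case: the sum of all serial numbers 0..N-1 is below N**2."""
    return 2 * max(1, math.ceil(math.log2(n_docs)))

def pack(payload: int, serial: int, s: int) -> int:
    """Document payload in the high bits, serial number in the low s bits."""
    return (payload << s) | serial

def unpack_bucket(value: int, s: int) -> tuple[int, int]:
    """Split a bucket sum into (sum of payloads, sum of serial numbers)."""
    return value >> s, value & ((1 << s) - 1)

s = serial_bits(100)                               # 14 bits, since 2**14 >= 100**2
bucket = sum(pack(m, sn, s) for m, sn in [(1234, 17), (999, 42), (5, 99)])
print(unpack_bucket(bucket, s))                    # (2238, 158)
```

As long as the serial-number sum fits in the low s bits, it never carries into the payload bits, so both sums can be read back separately.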

### 6.2 Tight Packing of Encrypted Lists and Bit Fields

- In the previous section, the authors described how a single Paillier ciphertext can encode, space permitting, both the serial number of the document and the document itself.
- In many cases (e.g., the first element of each buffer position in the Ostrovsky scheme, the representation of the serial number, or the Bloom filter entry), only a small plaintext needs to be represented, so a full Paillier ciphertext per element wastes space.
- Space can be saved by packing as many such elements as possible into a single Paillier ciphertext.
- Consider two ciphertexts E(i) and E(j) representing two fields.
- The authors assume that the cryptographic protocols to be performed will never produce a sum in either field longer than b bits, so that field sums never overflow into the neighboring field.
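The packing step can be sketched with a self-contained toy Paillier implementation: raising a ciphertext to the power 2^b shifts its plaintext left by b bits, and multiplying in the next ciphertext adds the next field. The parameters are tiny and insecure, `pack_ciphertexts` and `unpack_fields` are hypothetical names, and the sketch assumes each field sum fits in b bits and the whole packed value stays below the modulus N.

```python
import math
import random

P, Q = 1009, 1013                   # toy primes, insecure
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def enc(m: int) -> int:
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def dec(c: int) -> int:
    return (pow(c, LAM, N2) - 1) // N * MU % N

def pack_ciphertexts(cts: list[int], b: int) -> int:
    """Combine E(f_0), ..., E(f_{k-1}) into one ciphertext of sum_t f_t * 2^(b*(k-1-t))."""
    acc = enc(0)
    for c in cts:
        acc = pow(acc, 1 << b, N2) * c % N2   # E(x)^(2^b) * E(f) = E(x * 2^b + f)
    return acc

def unpack_fields(m: int, b: int, k: int) -> list[int]:
    mask = (1 << b) - 1
    return [(m >> (b * (k - 1 - t))) & mask for t in range(k)]

b = 6
x = pack_ciphertexts([enc(1), enc(2), enc(3)], b)
y = pack_ciphertexts([enc(4), enc(5), enc(6)], b)
fields = unpack_fields(dec(x * y % N2), b, 3)     # homomorphic field-wise addition
print(fields)                                      # [5, 7, 9]
```

A single ciphertext multiplication now adds three fields at once, which is where the space saving comes from.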

### 7 Experimental Results

- The authors present in Figures 5 and 6 simulation results illustrating the performance of recursive extraction and solving equations for different sets of parameters.
- Figures 5(b) and 5(d) show the probability of success of the techniques based on solving equations, for buffers of length 100 and 1000, respectively.
- Since solving equations can only be applied after recursive extraction, these success probabilities describe the chance of retrieving all documents with both techniques combined, compared with recursive extraction alone (Figures 5(a) and 5(c)).
- The authors observe that both too few and too many copies reduce the document recovery rate.
- More copies would only keep improving recovery if the buffer were infinitely large; an optimal number of copies therefore has to be calculated for each set of practical values of buffer size and expected matches.
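The trade-off in the number of copies can be explored with a small Monte Carlo simulation of recursive extraction. This sketch works on document identities only, assumes each document's copies go to distinct, uniformly random positions (a simplification), and uses hypothetical names `extraction_succeeds` and `success_rate`; it does not reproduce the paper's figures.

```python
import random

def extraction_succeeds(matches: int, b: int, copies: int, rng: random.Random) -> bool:
    """One trial: place copies, run recursive extraction on identities, report full recovery."""
    layout = {d: rng.sample(range(b), copies) for d in range(matches)}  # distinct cells per document
    cells = [set() for _ in range(b)]
    for d, ps in layout.items():
        for p in ps:
            cells[p].add(d)
    recovered = set()
    stack = [p for p in range(b) if len(cells[p]) == 1]
    while stack:
        p = stack.pop()
        if len(cells[p]) != 1:
            continue                              # cell changed since it was pushed
        (d,) = cells[p]
        recovered.add(d)
        for p2 in layout[d]:                      # remove the document from all its cells
            cells[p2].discard(d)
            if len(cells[p2]) == 1:
                stack.append(p2)
    return len(recovered) == matches

def success_rate(matches: int, b: int, copies: int, trials: int = 1000, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(extraction_succeeds(matches, b, copies, rng) for _ in range(trials)) / trials

# Sweep the number of copies for a fixed buffer size and number of matches.
for copies in range(1, 7):
    print(copies, success_rate(matches=40, b=100, copies=copies, trials=200))
```

With one copy, any two documents sharing a cell are lost for good; with many copies, the buffer fills up and fewer cells hold a single document, which is the tension the sweep exposes.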

### 8 Applications to Rateless Codes

- Maymounkov and Mazières [8] introduce “rateless codes”, a method for erasure-resistant, multi-source coding.
- These codes have been designed for a peer-to-peer context, where Alice may be downloading the same file from multiple sources, with no coordination between sources.
- A source, Bob, sends check blocks to Alice, who decodes them.
- The mapping between auxiliary blocks and check blocks is determined only by pseudorandom number sequences whose seeds are known, so Alice can tell in advance when decoding will succeed.
- Thanks to the multiplicative homomorphism of El-Gamal ciphertexts (the product of two ciphertexts encrypts the product of the plaintexts), messages encrypted blockwise with this cipher can be expanded into rateless codes served from multiple sources, and the receiver can also perform the decoding and reconstruct a valid El-Gamal encrypted representation of the original message.
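The El-Gamal variant can be sketched as follows: check blocks are products of encrypted message blocks chosen by a seeded pseudorandom generator, and the receiver peels them using only ciphertext multiplication and inversion, so the recovered blocks are still El-Gamal encryptions. This is a simplified, hypothetical sketch: it omits the auxiliary-block (outer code) layer of rateless codes, uses the toy modulus 2^31 − 1 with base 7, which is not a secure group, and all function names are illustrative.

```python
import random

P = 2 ** 31 - 1        # toy prime modulus; not a secure choice
G = 7                  # toy base element

def keygen(rng: random.Random):
    x = rng.randrange(2, P - 1)
    return x, pow(G, x, P)

def enc(m: int, h: int, rng: random.Random):
    r = rng.randrange(2, P - 1)
    return (pow(G, r, P), m * pow(h, r, P) % P)

def dec(c, x: int) -> int:
    a, b = c
    return b * pow(a, P - 1 - x, P) % P            # b / a^x

def mul(c1, c2):
    return (c1[0] * c2[0] % P, c1[1] * c2[1] % P)  # E(m1) * E(m2) = E(m1 * m2)

def inv(c):
    return (pow(c[0], P - 2, P), pow(c[1], P - 2, P))  # E(m)^-1 = E(1/m)

def indices(seed: int, degree: int, n: int) -> set[int]:
    """Blocks combined into a check block, derived only from its seed."""
    return set(random.Random(seed).sample(range(n), degree))

def check_block(enc_blocks, seed: int, degree: int):
    acc = (1, 1)                                   # trivial encryption of 1
    for i in indices(seed, degree, len(enc_blocks)):
        acc = mul(acc, enc_blocks[i])
    return seed, degree, acc

def peel(checks, n: int) -> dict:
    """Recover El-Gamal encryptions of the original blocks without decrypting."""
    pending = [(indices(s, d, n), c) for s, d, c in checks]
    known = {}
    progress = True
    while progress:
        progress = False
        for k, (idx, c) in enumerate(pending):
            for i in [i for i in idx if i in known]:
                c = mul(c, inv(known[i]))          # divide out an already-recovered block
                idx.discard(i)
            pending[k] = (idx, c)
            if len(idx) == 1:
                known[idx.pop()] = c
                progress = True
    return known

rng = random.Random(2024)
x, h = keygen(rng)
blocks = [11, 22, 33, 44]
encrypted = [enc(m, h, rng) for m in blocks]
checks = [check_block(encrypted, s, d) for s, d in [(1, 1), (2, 2), (3, 2), (4, 3), (5, 1), (6, 2)]]
recovered = peel(checks, len(blocks))
print({i: dec(c, x) for i, c in sorted(recovered.items())})  # decrypted only to check the result
```

The decoder never uses the secret key x: decryption appears only in the final line, to confirm that the peeled ciphertexts encrypt the original blocks.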

### 9 Conclusions

- The authors have presented in this paper efficient decoding mechanisms for private search.
- The size of the returned buffer of matched documents is key to the success of private search schemes.
- If this buffer is too long, any scheme could be replaced by simply sending back all the documents, which would save on complexity and cryptographic cost.
- The proposed decoding methods reduce the buffer sizes required by the Ostrovsky et al. scheme by a significant constant factor.
- Recursive extraction and solving equations are complementary and can be applied sequentially to extract a maximum number of documents from short buffers.
