Practical techniques for searches on encrypted data
Summary
1 Introduction
- Today’s mail servers such as IMAP servers [11], file servers and other data storage servers typically must be fully trusted—they have access to the data, and hence must be trusted not to reveal it without authorization—which introduces undesirable security and privacy risks in applications.
- The authors show how to support searching functionality without any loss of data confidentiality.
- The techniques provide provable secrecy for encryption, in the sense that the untrusted server cannot learn anything about the plaintext given only the ciphertext.
- The algorithms the authors present are simple and fast.
- The authors may control the number of errors by adjusting a parameter m in the encryption algorithm; each wrong position will be returned with probability about 1/2^m, so for an ℓ-word document, they expect to see about ℓ/2^m false matches.
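The false-match estimate above is simple arithmetic and can be checked with a short sketch. The function name and the sample sizes are illustrative assumptions, not taken from the paper; m is read here as the width in bits of the check value, so each non-matching position is assumed to match with probability about 2^-m.

```python
def expected_false_matches(num_words: int, m: int) -> float:
    """Expected false matches when each wrong position matches with probability ~ 2**-m."""
    if num_words < 0 or m < 0:
        raise ValueError("num_words and m must be non-negative")
    return num_words / 2 ** m


if __name__ == "__main__":
    # Hypothetical document of one million words, for three choices of m.
    for m in (8, 16, 32):
        estimate = expected_false_matches(1_000_000, m)
        print(f"m={m:2d}: about {estimate:.6f} false matches")
```

Larger m makes false matches rarer, at the cost of fewer bits left for the pseudorandom part of each block.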
2 Searching on Encrypted Data
- The authors first define the problem of searching on encrypted data.
- Each document can be divided up into ‘words’.
- So the approach of using an index is more suitable for mostly-read-only data.
- The authors adopt the standard definitions of security from the provable security literature [2], and they measure the strength of the cryptographic primitives in terms of the resources needed to break them.
- The authors say that G is a (t, e)-secure pseudorandom generator if every algorithm A with running time at most t has advantage Adv A < e.
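In symbols, the definition in the last bullet can be written as follows. This is a sketch in standard provable-security notation; the names G, A, t and e, the key space K_G, the output space S, and the uniform distribution U are notational assumptions rather than text recovered from this summary.

```latex
\[
\mathrm{Adv}\,A \;=\; \Bigl|\, \Pr\bigl[A\bigl(G(U_{\mathcal{K}_G})\bigr) = 1\bigr]
  \;-\; \Pr\bigl[A(U_{\mathcal{S}}) = 1\bigr] \,\Bigr|,
\qquad
G \text{ is } (t, e)\text{-secure}
  \iff \mathrm{Adv}\,A < e \ \text{ for every } A \text{ running in time at most } t.
\]
```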
4 Our Solution with Sequential Scan
- The authors introduce their solution for searching with sequential scan.
- The authors first start with a basic scheme and show that its encryption algorithm provides provable secrecy.
- The authors then show how they can extend the first scheme to handle controlled searching and hidden searches.
- Finally, the authors describe their final scheme, which satisfies all the properties mentioned earlier, including query isolation.
4.1 Scheme I: The Basic Scheme
- Alice wants to encrypt a document which contains the sequence of words W_1, ..., W_ℓ.
- More specifically, the basic scheme is as follows.
- Alice generates a sequence of pseudorandom values S_1, ..., S_ℓ using some stream cipher (namely, the pseudorandom generator G), where each S_i is n - m bits long (n being the word length in bits).
- Another alternative is to choose a new key for each position independent of all other keys.
- The basic scheme supports searches over the ciphertext in the following way: if Alice wants to search for the word W, she can tell Bob W and the key k_i corresponding to each location i where the word W may occur.
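A minimal sketch of the basic scheme, under stated assumptions: each word is padded to a fixed block, the mask for position i is the pseudorandom value for that position followed by a keyed check of it, and the ciphertext is the word XORed with the mask. HMAC-SHA256 stands in for both the stream cipher and the keyed function, lengths are counted in bytes rather than bits, and all function names are hypothetical.

```python
import hashlib
import hmac

N = 16  # assumed word-block length in bytes
M = 4   # assumed check length in bytes


def prf(key: bytes, data: bytes, out_len: int) -> bytes:
    """Stand-in keyed function: HMAC-SHA256 truncated to out_len bytes."""
    return hmac.new(key, data, hashlib.sha256).digest()[:out_len]


def stream(seed: bytes, i: int) -> bytes:
    """Stand-in stream cipher: the (N - M)-byte pseudorandom value for position i."""
    return prf(seed, b"stream" + i.to_bytes(8, "big"), N - M)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pad(word: str) -> bytes:
    raw = word.encode("utf-8")
    if len(raw) > N:
        raise ValueError("word longer than the block size")
    return raw.ljust(N, b"\x00")


def encrypt(words, seed: bytes, keys):
    """Ciphertext block i = word i XOR <S_i, check of S_i under keys[i]>."""
    blocks = []
    for i, word in enumerate(words):
        s = stream(seed, i)
        blocks.append(xor(pad(word), s + prf(keys[i], s, M)))
    return blocks


def matches(block: bytes, word: str, key: bytes) -> bool:
    """Bob's test: does block XOR word have the form <s, check of s under key>?"""
    t = xor(block, pad(word))
    return hmac.compare_digest(t[N - M:], prf(key, t[:N - M], M))


if __name__ == "__main__":
    seed, k = b"seed-for-G", b"key-for-F"
    words = ["alice", "bob", "carol", "bob"]
    ct = encrypt(words, seed, [k] * len(words))  # same key at every position
    print([i for i, c in enumerate(ct) if matches(c, "bob", k)])
```

Using one key for every position keeps the example short; with that choice, revealing the key lets Bob test any word he likes, which is the limitation that controlled searching addresses.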
4.2 Scheme II: Controlled Searching
- Let f be an additional pseudorandom function, which will be keyed independently of F.
- The security proof assumes that F and f are secure pseudorandom functions and that G is a secure pseudorandom generator.
- The authors can take this idea even further by using a hierarchical key management scheme.
- Then she can reveal either (1) a chapter-specific search key for each chapter of interest or (2) the word-level key itself if she wishes to succinctly authorize Bob to search for W in all the chapters.
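The hierarchical idea can be sketched as a two-level key derivation: a word-level key derived from Alice's secret, and per-chapter keys derived from the word-level key. The exact derivation inputs are not recoverable from this summary, so the layout below (labels, encodings, HMAC-SHA256 as the pseudorandom function, all function names) is an assumption chosen only to match the two disclosure options described above.

```python
import hashlib
import hmac


def f(key: bytes, data: bytes) -> bytes:
    """Stand-in for the additional pseudorandom function (HMAC-SHA256, an assumption)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def word_key(k_prime: bytes, word: str) -> bytes:
    """Word-level key: whoever holds it can derive the key for every chapter."""
    return f(k_prime, b"word:" + word.encode("utf-8"))


def chapter_word_key(k_word: bytes, chapter: int) -> bytes:
    """Per-chapter key for one word, derived from the word-level key."""
    return f(k_word, b"chapter:" + str(chapter).encode("ascii"))


if __name__ == "__main__":
    k_prime = b"alice-secret"  # hypothetical secret held only by Alice
    k_w = word_key(k_prime, "budget")
    # Option (1): reveal keys for chosen chapters only.
    print(chapter_word_key(k_w, 3).hex())
    # Option (2): reveal k_w itself; Bob can then derive every chapter key.
    print(k_w.hex())
```

Because the derivation is one-way, a chapter-specific key tells Bob nothing useful about the word-level key or about keys for other words.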
4.4 Scheme IV: The Final Scheme
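- Careful readers may have noticed that Scheme III actually suffers from a small inadequacy: if Alice generates keys k_i as k_i := f_{k'}(E_{k''}(W_i)), then Alice can no longer recover the plaintext from just the ciphertext, because she would need to know E_{k''}(W_i) (more precisely, the last m bits of E_{k''}(W_i)) before she can decrypt.
- (Scheme II also has a similar inadequacy, but as the authors will show below, the best way to fix it is to introduce pre-encryption as in Scheme III.)
- In the fixed scheme, the authors split the pre-encrypted word X_i := E_{k''}(W_i) into two parts, X_i = <L_i, R_i>, where L_i (respectively R_i) denotes the first n - m bits (respectively the last m bits) of X_i, and Alice generates k_i from L_i alone as k_i := f_{k'}(L_i).
- To decrypt, Alice regenerates S_i and recovers L_i from the first n - m bits of the ciphertext; knowledge of L_i allows Alice to compute k_i and thus finish the decryption.
- Moreover, if the authors disclose one key k_i and consider the reduced sequence obtained by discarding all the values T_j at positions j where k_j = k_i, then they still obtain a secure pseudorandom generator.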
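The final scheme can be sketched end to end as follows. This is an illustration under stated assumptions, not the authors' implementation: lengths are in bytes, HMAC-SHA256 stands in for the pseudorandom functions and the stream cipher, and the deterministic pre-encryption E is replaced by a four-round Feistel permutation built from HMAC, since the Python standard library has no block cipher. All function names are hypothetical.

```python
import hashlib
import hmac

N, M = 16, 4  # assumed block and check lengths, in bytes


def prf(key: bytes, data: bytes, out_len: int = 32) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()[:out_len]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pad(word: str) -> bytes:
    raw = word.encode("utf-8")
    if len(raw) > N:
        raise ValueError("word longer than the block size")
    return raw.ljust(N, b"\x00")


def feistel(key: bytes, block: bytes, decrypt: bool = False) -> bytes:
    """Stand-in for E / D: an invertible keyed permutation on N-byte blocks."""
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for r in (reversed(range(4)) if decrypt else range(4)):
        if decrypt:
            left, right = xor(right, prf(key, bytes([r]) + left, half)), left
        else:
            left, right = right, xor(left, prf(key, bytes([r]) + right, half))
    return left + right


def stream(seed: bytes, i: int) -> bytes:
    return prf(seed, i.to_bytes(8, "big"), N - M)


def mask(seed: bytes, k1: bytes, i: int, left_part: bytes) -> bytes:
    """T_i = <S_i, F_{k_i}(S_i)> with k_i derived from L_i only."""
    s = stream(seed, i)
    return s + prf(prf(k1, left_part), s, M)


def encrypt(words, seed: bytes, k1: bytes, k2: bytes):
    blocks = []
    for i, word in enumerate(words):
        x = feistel(k2, pad(word))                      # X_i = E(W_i)
        blocks.append(xor(x, mask(seed, k1, i, x[:N - M])))
    return blocks


def decrypt(blocks, seed: bytes, k1: bytes, k2: bytes):
    words = []
    for i, c in enumerate(blocks):
        left_part = xor(c[:N - M], stream(seed, i))     # recover L_i first
        x = xor(c, mask(seed, k1, i, left_part))
        words.append(feistel(k2, x, decrypt=True).rstrip(b"\x00").decode("utf-8"))
    return words


def trapdoor(word: str, k1: bytes, k2: bytes):
    """What Alice sends Bob: the pre-encrypted word and its key."""
    x = feistel(k2, pad(word))
    return x, prf(k1, x[:N - M])


def search(blocks, x: bytes, k: bytes):
    hits = []
    for i, c in enumerate(blocks):
        t = xor(c, x)
        if hmac.compare_digest(t[N - M:], prf(k, t[:N - M], M)):
            hits.append(i)
    return hits
```

Bob sees only the pre-encrypted word and a key that works for that word alone, so the query hides the plaintext word and does not help him search for any other word.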
5.1 Other Practical Considerations
- The authors can see that updates in this scheme are easy.
- If Alice wants to add a new document into Bob’s data storage, she can simply encrypt it in the appropriate way and instruct Bob to append it to the already-stored ciphertext.
- Moreover, since the keys can be generated hierarchically from a master key, the key storage and management is also very convenient: Alice only needs to remember one password, the master key.
- The underlying technique of embedding information in pseudorandom bit streams may also be of independent interest: the authors speculate that this simple trick might prove useful for other applications, too.
5.2 Supporting More Advanced Search Queries
- The schemes the authors presented earlier only address the problem of searching for a single word.
- The authors show several examples to illustrate that it is relatively easy to implement more advanced searching functionality using their scheme as a fundamental building block.
- The authors can also support searches if the query is given as a regular expression using, e.g., wildcards in a limited form.
- For many applications the purpose of the search is to find documents which contain a specific word, where the position or the number of occurrences are not relevant.
- The authors add a count to each word, which counts how many times that word occurs previously in that document.
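The counting step in the last bullet can be sketched as a preprocessing pass that pairs each word with the number of times it has already appeared in the document. The function name is hypothetical, and the way the tagged pairs are then encoded and encrypted is left open here.

```python
from collections import defaultdict


def tag_with_counts(words):
    """Pair each word with how many times it occurred earlier in the same document."""
    seen = defaultdict(int)
    tagged = []
    for word in words:
        tagged.append((word, seen[word]))
        seen[word] += 1
    return tagged


if __name__ == "__main__":
    print(tag_with_counts(["to", "be", "or", "not", "to", "be"]))
```

Under this reading, every (word, count) pair is unique within a document, so a query for a word with count 0 matches at most once per document regardless of how often the word occurs.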
5.3 Dealing with Variable-Length Words
- In their scheme, the minimal unit the authors can search for is an individual word.
- One possibility is to pick a fixed-size block that is long enough to contain most words.
- Such a padding scheme would introduce space inefficiency.
- When word lengths may vary, it is important to hide the length information from the server, because revealing the length of each word might allow for statistical attacks.
- Fortunately, in this case the server does not need to know the lengths to perform a search: he may just scan through the file and check for a match at each possible bit boundary.
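The fixed-size block option and its space cost can be illustrated with a short sketch. The zero-byte padding convention and the function names are assumptions made for the example.

```python
def pad_words(words, block_size: int):
    """Pad every word with zero bytes to one fixed block size."""
    blocks = []
    for word in words:
        raw = word.encode("utf-8")
        if len(raw) > block_size:
            raise ValueError(f"{word!r} does not fit in {block_size} bytes")
        blocks.append(raw.ljust(block_size, b"\x00"))
    return blocks


def padding_overhead(words, block_size: int) -> float:
    """Fraction of the stored bytes that is padding rather than word content."""
    if not words:
        return 0.0
    used = sum(len(w.encode("utf-8")) for w in words)
    return 1 - used / (block_size * len(words))


if __name__ == "__main__":
    sample = ["practical", "search", "on", "encrypted", "data"]
    print(pad_words(sample, 16)[2])
    print(f"{padding_overhead(sample, 16):.0%} of the space is padding")
```

Short words dominate typical text, so a block long enough for most words wastes a large share of the space, which is the inefficiency noted above.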
5.4 Searching with an Encrypted Index
- Sequential scan may not be efficient enough when the data size is large.
- The interesting question is how to encrypt the index.
- Alice may decrypt the encrypted entries and send Bob another request to retrieve the relevant documents.
- Note that by keeping the lists of pointers in a fixed-size list, the authors are mainly preventing Bob from learning statistical information on the key words that he has not searched.
- Note that a general disadvantage for index search is that whenever Alice changes her documents, she must update the index.
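The fixed-size list idea can be sketched at the data-structure level: each keyword maps to one or more lists of document pointers, every list has the same length, and short lists are padded. Encryption of the keywords and entries is omitted, and the dummy pointer value and function names are assumptions.

```python
from collections import defaultdict

DUMMY = -1  # hypothetical padding pointer; a real index would encrypt every entry


def build_index(docs: dict[int, list[str]], list_size: int) -> dict[str, list[list[int]]]:
    """Map each word to fixed-size lists of document ids, splitting long lists."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        for word in sorted(set(docs[doc_id])):
            postings[word].append(doc_id)
    index = {}
    for word, ids in postings.items():
        chunks = [ids[i:i + list_size] for i in range(0, len(ids), list_size)]
        index[word] = [chunk + [DUMMY] * (list_size - len(chunk)) for chunk in chunks]
    return index


def lookup(index: dict[str, list[list[int]]], word: str) -> list[int]:
    """Merge all fixed-size lists for a word and drop the padding."""
    return [d for chunk in index.get(word, []) for d in chunk if d != DUMMY]


if __name__ == "__main__":
    docs = {1: ["mail", "server"], 2: ["mail"], 3: ["mail", "index"]}
    index = build_index(docs, list_size=2)
    print(index["mail"], lookup(index, "mail"))
```

Any change to a document means rebuilding the affected entries, which is the update cost mentioned in the last bullet.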
5.5 More Security Issues
- In all their schemes, by allowing Bob to search for a word W the authors effectively disclose to him a list of potential locations where W might occur.
- If the authors allow Bob to search for too many words, he may be able to use statistical techniques to start learning important information about the documents.
- One possible defense is to decrease m (so that false matches are more prevalent and thus Bob’s information about the plaintext is ‘noisy’), but the authors have not analyzed the cost-effectiveness of this tradeoff in any detail.
- In all the schemes the authors have discussed so far, they must trust Bob to return all the search results.
- Even when this type of attack is present, it is possible to combine their scheme with hash tree techniques [17] to ensure the integrity of the data and detect such attacks, although a full description of this countermeasure is out of the scope of the paper.
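The summary does not describe the hash tree countermeasure, so the following is only a generic sketch of the technique: Alice computes a hash-tree root over the ciphertext blocks and keeps it, and any dropped or altered block changes the root. The SHA-256 choice, the leaf and node prefixes, and the handling of odd levels are assumptions; membership proofs for partial results are omitted.

```python
import hashlib


def merkle_root(blocks) -> bytes:
    """Root of a binary hash tree over the given byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    # Distinct prefixes keep leaf hashes and inner-node hashes apart.
    level = [hashlib.sha256(b"\x00" + b).digest() for b in blocks]
    while len(level) > 1:
        nxt = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired node moves up unchanged
        level = nxt
    return level[0]


if __name__ == "__main__":
    stored = [b"block-1", b"block-2", b"block-3"]
    root = merkle_root(stored)
    print(merkle_root(stored) == root)
    print(merkle_root([b"block-1", b"block-3"]) == root)
```

Alice needs to remember only the 32-byte root to detect tampering with the stored ciphertext.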
7 Conclusion
- The authors have described new techniques for remote searching on encrypted data using an untrusted server and provided proofs of security for the resulting crypto systems.
- The techniques have a number of crucial advantages: they are provably secure; they support controlled and hidden search and query isolation; they are simple and fast (more specifically, for a document of length n, the encryption and search algorithms only need O(n) stream cipher and block cipher operations); and they introduce almost no space and communication overhead.
- The authors' scheme is also very flexible, and it can easily be extended to support more advanced search queries.
- The authors conclude that this provides a powerful new building block for the construction of secure services in the untrusted infrastructure.
Frequently Asked Questions (13)
Q2. What are the advantages of their techniques?
Their techniques have a number of crucial advantages: they are provably secure; they support controlled and hidden search and query isolation; they are simple and fast (more specifically, for a document of length n, the encryption and search algorithms only need O(n) stream cipher and block cipher operations); and they introduce almost no space and communication overhead.
Q3. How can a pseudorandom generator or a pseudorandom function be built?
For instance, given any block cipher, the authors may build a pseudorandom generator using the counter mode [3] or a pseudorandom function using the CBC-MAC [4].
Q4. Why does Alice want to retrieve only the documents which contain the word?
Because Alice may have only a low-bandwidth network connection to the server Bob, she wishes to only retrieve the documents which contain the word W.
Q5. Why should the length information be hidden from the server?
When word lengths may vary, it is important to hide the length information from the server, because revealing the length of each word might allow for statistical attacks.
Q6. How do the authors control the number of errors in the encryption algorithm?
The authors may control the number of errors by adjusting a parameter m in the encryption algorithm; each wrong position will be returned with probability about 1/2^m, so for an ℓ-word document, the authors expect to see about ℓ/2^m false matches.
Q7. What is the main problem with today's mail and file servers?
Today’s mail servers such as IMAP servers [11], file servers and other data storage servers typically must be fully trusted—they have access to the data, and hence must be trusted not to reveal it without authorization—which introduces undesirable security and privacy risks in applications.
Q8. How can variable-length words be handled with a length field?
One natural approach is to store the length field before each word in the file, and to glue the length field and word together as one word to perform encryption and search using their standard schemes.
Q9. What does controlled searching provide?
The techniques provide controlled searching, so that the untrusted server cannot search for a word without the user’s authorization.
Q10. How can Bob be prevented from doing statistical analysis on the index?
In order to prevent Bob from doing statistical analysis on the index, it is better to keep the lists of pointers in a fixed-size list.
Q11. What does keeping the lists of pointers in a fixed-size list protect against?
Note that by keeping the lists of pointers in a fixed-size list, the authors are mainly preventing Bob from learning statistical information on the key words that he has not searched.
Q12. How can a long list of document pointers be kept at a fixed size?
Alice can split the long list into several lists with the fixed size; then, to search for such a word, Alice will need to ask Bob to perform and merge several search queries in parallel.
Q13. How does checking for a match at every bit boundary affect the cost of each scan?
When the server checks for a match at each possible bit boundary, the cost of each scan is increased, because the number of operations is determined by the bit-length of the document rather than by the number of blocks in the document.