Searchable Symmetric Encryption:
Improved Definitions and Efficient Constructions
Reza Curtmola
NJIT
Juan Garay
AT&T Labs Research
Seny Kamara
Microsoft Research
Rafail Ostrovsky
UCLA
Abstract
Searchable symmetric encryption (SSE) allows a party to outsource the storage of his data to
another party in a private manner, while maintaining the ability to selectively search over it. This
problem has been the focus of active research and several security definitions and constructions
have been proposed. In this paper we begin by reviewing existing notions of security and propose
new and stronger security definitions. We then present two constructions that we show secure
under our new definitions. Interestingly, in addition to satisfying stronger security guarantees, our
constructions are more efficient than all previous constructions.
Further, prior work on SSE considered only the setting where the data owner is the only party
capable of submitting search queries. We consider the natural extension where an arbitrary group
of parties other than the owner can submit search queries. We formally define SSE in this multi-user
setting, and present an efficient construction.
1 Introduction
Private-key storage outsourcing [30, 4, 33] allows clients with either limited resources or limited expertise
to store and distribute large amounts of symmetrically encrypted data at low cost. Since regular
private-key encryption prevents one from searching over encrypted data, clients also lose the ability
to selectively retrieve segments of their data. To address this, several techniques have been proposed
for provisioning symmetric encryption with search capabilities [40, 23, 10, 18]; the resulting construct
is typically called searchable encryption. The area of searchable encryption has been identified by
DARPA as one of the technical advances that can be used to balance the need for both privacy and
national security in information aggregation systems [1].
One approach to provisioning symmetric encryption with search capabilities is with a so-called
secure index [23]. An index is a data structure that stores document collections while supporting
efficient keyword search, i.e., given a keyword, the index returns pointers to the documents that
contain it. Informally, an index is “secure” if the search operation for a keyword w can only be
performed by users that possess a “trapdoor” for w and if the trapdoor can only be generated with
a secret key. Without knowledge of trapdoors, the index leaks no information about its contents.
A preliminary version of this article appeared in the 13th ACM Conference on Computer and Communications
Security (CCS ’06) [20].
crix@njit.edu. Work done in part while at Bell Labs and Johns Hopkins University.
garay@research.att.com. Work done in part while at Bell Labs.
senyk@microsoft.com. Work done in part while at Johns Hopkins University.
rafail@cs.ucla.edu.

As shown by Goh in [23], one can build a symmetric searchable encryption scheme from a secure
index as follows: the client indexes and encrypts its document collection and sends the secure index
together with the encrypted data to the server. To search for a keyword w, the client generates and
sends a trapdoor for w which the server uses to run the search operation and recover pointers to the
appropriate (encrypted) documents.
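The workflow above can be made concrete with a toy secure index in Python, built as an encrypted inverted index. This is a minimal sketch under stated assumptions. It is neither the construction of [23] nor SSE-1 or SSE-2. The function names (build_index, trapdoor, search) are hypothetical. HMAC-SHA256 stands in for a pseudo-random function, and a PRF-based keystream stands in for a symmetric cipher. Unlike the schemes in this paper, the sketch does not pad the index, so it leaks the number of distinct keywords and the length of each result list.

```python
import hashlib
import hmac
import json
import os


def prf(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256, assumed here to act as a pseudo-random function."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with PRF(key, nonce || counter) blocks."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = prf(key, nonce + i.to_bytes(8, "big"))
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)


def trapdoor(master_key: bytes, word: str) -> tuple[bytes, bytes]:
    """Only the holder of master_key can compute (label, list key) for a word."""
    w = word.encode()
    return prf(master_key, b"label|" + w), prf(master_key, b"key|" + w)


def build_index(master_key: bytes, docs: dict[str, set[str]]) -> dict:
    """Client side: map each keyword label to its encrypted list of document ids."""
    inverted: dict[str, list[str]] = {}
    for doc_id, words in docs.items():
        for w in words:
            inverted.setdefault(w, []).append(doc_id)
    index = {}
    for w, ids in inverted.items():
        label, list_key = trapdoor(master_key, w)
        nonce = os.urandom(16)
        index[label] = (nonce, xor_stream(list_key, nonce, json.dumps(sorted(ids)).encode()))
    return index


def search(index: dict, td: tuple[bytes, bytes]) -> list[str]:
    """Server side: one table lookup, then decrypt only the matching list."""
    label, list_key = td
    if label not in index:
        return []
    nonce, ct = index[label]
    return json.loads(xor_stream(list_key, nonce, ct))


key = os.urandom(32)
idx = build_index(key, {"d1": {"cloud", "crypto"}, "d2": {"crypto"}, "d3": {"storage"}})
print(search(idx, trapdoor(key, "crypto")))  # ['d1', 'd2']
```

A trapdoor computed under a different key yields a label that, with overwhelming probability, is absent from the index, so that search returns nothing.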
Symmetric searchable encryption can be achieved in its full generality and with optimal security
using the work of Ostrovsky and Goldreich on oblivious RAMs [35, 25]. More precisely, using these
techniques any type of search query can be achieved (e.g., conjunctions or disjunctions of keywords)
without leaking any information to the server, not even the “access pattern” (i.e., which documents
contain the keyword). This strong privacy guarantee, however, comes at the cost of a logarithmic (in
the number of documents) number of rounds of interaction for each read and write. In the same paper,
the authors show a 2-round solution, but with considerably larger square-root overhead. Therefore,
the previously mentioned work on searchable encryption [40, 23, 10, 18] tries to achieve more efficient
solutions (typically in one or two rounds) by weakening the privacy guarantees.
1.1 Our contributions
We now give an overview of the contributions of this work.
Revisiting previous definitions. We review existing security definitions for secure indexes, including
indistinguishability against chosen-keyword attacks (IND2-CKA) [23] and the simulation-based
definition in [18], and highlight some of their limitations. Specifically, we recall that IND2-CKA does
not guarantee the privacy of user queries (and is therefore not an adequate notion of security for
constructing SSE schemes) and then highlight (and fix) technical issues with the simulation-based
definition of [18]. We address both these issues by proposing new game-based and simulation-based
definitions that provide security for both indexes and trapdoors.
New definitions. We introduce new adversarial models for SSE. The first, which we refer to as non-
adaptive, only considers adversaries that make their search queries without taking into account the
trapdoors and search outcomes of previous searches. The second—adaptive—considers adversaries
that choose their queries as a function of previously obtained trapdoors and search outcomes. All
previous work on SSE (with the exception of oblivious RAMs) falls within the non-adaptive setting.
The implication is that, contrary to the natural use of searchable encryption described in [40, 23, 18],
these definitions only guarantee security for users that perform all their searches at once. We address
this by introducing game-based and simulation-based definitions in the adaptive setting.
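The two adversarial models differ in information flow. The Python sketch below illustrates this difference for trapdoors only. A full definition also gives the adversary the index and the search outcomes, which are omitted here for brevity, and all helper names are hypothetical. A non-adaptive adversary fixes its entire query list in advance. An adaptive one chooses each keyword after seeing the trapdoors produced so far.

```python
import hashlib
import hmac
from typing import Callable

Trapdoor = bytes


def non_adaptive_run(queries: list[str],
                     trapdoor: Callable[[str], Trapdoor]) -> list[Trapdoor]:
    """Every keyword is fixed before the adversary sees a single trapdoor."""
    return [trapdoor(w) for w in queries]


def adaptive_run(choose_next: Callable[[list[Trapdoor]], str],
                 trapdoor: Callable[[str], Trapdoor], q: int) -> list[Trapdoor]:
    """Query i is chosen as a function of the trapdoors for queries 1..i-1."""
    transcript: list[Trapdoor] = []
    for _ in range(q):
        w = choose_next(list(transcript))  # adversary inspects what it has so far
        transcript.append(trapdoor(w))
    return transcript


def hmac_trapdoor(w: str) -> Trapdoor:
    """Keyed HMAC standing in for trapdoor generation (fixed demo key)."""
    return hmac.new(bytes(32), w.encode(), hashlib.sha256).digest()


def branching_adversary(prev: list[Trapdoor]) -> str:
    """Branches on the latest trapdoor, which a non-adaptive adversary cannot do."""
    return "a" if not prev or prev[-1][0] % 2 == 0 else "b"


print(len(adaptive_run(branching_adversary, hmac_trapdoor, 3)))  # 3
```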
New constructions. We present two constructions which we prove secure under our new definitions.
Our first scheme is only secure in the non-adaptive setting, but is the most efficient SSE construction
to date. In fact, it achieves searches in one communication round, requires an amount of work from
the server that is linear in the number of documents that contain the keyword (which is optimal),
requires constant storage on the client, and linear (in the size of the document collection) storage on
the server. While the construction in [23] also performs searches in one round, it can induce false
positives, which is not the case for our construction. Additionally, all the constructions in [23, 18]
require the server to perform an amount of work that is linear in the total number of documents in
the collection.
Our second construction is secure against an adaptive adversary, but at the price of requiring
a higher communication overhead per query and more storage at the server (comparable with the
storage required in [23]). While our adaptive scheme is conceptually simple, we note that constructing
efficient and provably secure adaptive SSE schemes is a non-trivial task. The main challenge lies in
proving such constructions secure in the simulation paradigm, since the simulator requires the ability
to commit to a correct index before the adversary has even chosen its search queries; in other words,
the simulator needs to commit to an index and then be able to perform some form of equivocation.

Properties            [35, 25]      [35, 25]-light  [40]   [23]   [18]   SSE-1  SSE-2
hides access pattern  yes           yes             no     no     no     no     no
server computation    O(log^3 n)    O(√n)           O(n)   O(n)   O(n)   O(1)   O(1)
server storage        O(n · log n)  O(n)            O(n)   O(n)   O(n)   O(n)   O(n)
number of rounds      log n         2               1      1      1      1      1
communication         O(log^3 n)    O(√n)           O(1)   O(1)   O(1)   O(1)   O(1)
adaptive adversaries  yes           yes             no     no     no     no     yes
Table 1: Properties and performance (per query) of various SSE schemes. n denotes the number of documents
in the collection. For communication costs, we consider only the overhead and omit the size of the retrieved
documents, which is the same for all schemes. For server computation, we show the costs per returned document.
For simplicity, the security parameter is not included as a factor for the relevant costs.
Table 1 compares our constructions (SSE-1 and SSE-2) with the previous SSE schemes. To make
the comparison easier, we assume that each document in the collection has the same (constant) size
(otherwise, some of the costs have to be scaled by the document size). The server computation row
shows the costs per returned document for a query. Note that all previous work requires an amount
of server computation at least linear in the number of documents in the collection, even if only
one document matches a query. In contrast, in our constructions the server computation is constant
per document that matches a query, and the overall computation per query is proportional to
the number of documents that match the query. In all the considered schemes, the computation and
storage at the user are O(1).
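The gap in per-query server work can be illustrated by counting index probes. The Python sketch below works on plaintext and ignores all cryptographic costs. It assumes, for illustration, that a per-document scheme tests one document index at a time, while an inverted-list scheme visits one entry per matching document. The function names are hypothetical.

```python
def per_document_search(doc_indexes: dict[str, set[str]], word: str) -> tuple[list[str], int]:
    """Test every document's own index; the probe count grows with n."""
    hits, probes = [], 0
    for doc_id, words in doc_indexes.items():
        probes += 1  # one membership test per document, matched or not
        if word in words:
            hits.append(doc_id)
    return hits, probes


def inverted_search(inverted: dict[str, list[str]], word: str) -> tuple[list[str], int]:
    """Jump to the keyword's list; one visited entry per returned document."""
    hits = list(inverted.get(word, []))
    return hits, len(hits)


docs = {f"d{i}": {"common"} for i in range(1000)}
docs["d7"] = {"common", "rare"}  # only one document matches "rare"
inverted: dict[str, list[str]] = {}
for doc_id, words in docs.items():
    for w in words:
        inverted.setdefault(w, []).append(doc_id)

print(per_document_search(docs, "rare"))  # (['d7'], 1000)
print(inverted_search(inverted, "rare"))  # (['d7'], 1)
```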
We remark that as an additional benefit, our constructions can also handle updates to the document
collection in the sense of [18]. We point out an optimization which lowers the communication
complexity per query from linear to logarithmic in the number of updates.
Multi-user SSE. Previous work on searchable encryption only considered the single-user setting.
We also consider a natural extension of this setting, namely, the multi-user setting, where a user owns
the data, but an arbitrary group of users can submit queries to search his document collection. The
owner can control the search access by granting and revoking searching privileges to other users. We
formally define searchable encryption in the multi-user setting, and present an efficient construction
that does not require authentication, thus achieving better performance than simply using access
control mechanisms.
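Owner-controlled access without per-query authentication could be realized as follows, assumed here for illustration. The owner shares a secret r with the currently authorized users. A user wraps its ordinary trapdoor with a keyed permutation φ_r, and the server unwraps it with φ_r^{-1} before searching. To revoke a user, the owner chooses a fresh r that the revoked user never learns, after which that user's wrapped queries no longer unwrap to a valid trapdoor. The Python sketch below builds φ from a 4-round Feistel network with an HMAC-SHA256 round function (the Luby-Rackoff approach). The Owner class and all names are hypothetical, and the sketch does not model how the new r reaches the remaining users.

```python
import hashlib
import hmac
import os

HALF = 16  # the permutation acts on 32-byte blocks split into two halves


def _round(key: bytes, i: int, half: bytes) -> bytes:
    """Feistel round function: HMAC-SHA256 truncated to one half-block."""
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def phi(r: bytes, block: bytes) -> bytes:
    """Keyed permutation on 32-byte blocks: a 4-round Feistel network keyed by r."""
    left, right = block[:HALF], block[HALF:]
    for i in range(4):
        left, right = right, _xor(left, _round(r, i, right))
    return left + right


def phi_inv(r: bytes, block: bytes) -> bytes:
    """Inverse of phi: undo the rounds in reverse order."""
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(r, i, left)), left
    return left + right


class Owner:
    """Hypothetical owner state: the current secret r and the authorized users."""

    def __init__(self) -> None:
        self.r = os.urandom(32)
        self.members: set[str] = set()

    def grant(self, user: str) -> bytes:
        self.members.add(user)
        return self.r

    def revoke(self, user: str) -> dict[str, bytes]:
        """Pick a fresh r and hand it only to the users who remain authorized."""
        self.members.discard(user)
        self.r = os.urandom(32)
        return {u: self.r for u in self.members}


owner = Owner()
alice_r, bob_r = owner.grant("alice"), owner.grant("bob")
td = os.urandom(32)  # stand-in for a 32-byte single-user trapdoor
alice_r = owner.revoke("bob")["alice"]
server_r = owner.r  # the server is assumed to learn the current r from the owner
print(phi_inv(server_r, phi(alice_r, td)) == td)  # True
print(phi_inv(server_r, phi(bob_r, td)) == td)    # False (with overwhelming probability)
```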
Finally, we note that in most of the works mentioned above the server is assumed to be honest-
but-curious. However, using techniques for memory checking [14] and universal arguments [7] one can
make those solutions robust against malicious servers at the price of additional overhead. We restrict
our attention to honest-but-curious servers as well.
1.2 On different models for private search
Before providing a detailed comparison to existing work, we put our work in context by providing
a classification of the various models for privacy-preserving search. In recent years, there has been
some confusion regarding three distinct models: searching on private-key encrypted data (which is the
subject of this work); searching on public-key encrypted data; and single-database private information
retrieval (PIR).
Common to all three models is a server (sometimes called the “database”) that stores data, and a
user that wishes to access, search, or modify the data while revealing as little as possible to the server.
There are, however, important differences between these three settings.
Private-key searchable encryption. In the setting of searching on private-key-encrypted data,
the user himself encrypts the data, so he can organize it in an arbitrary way (before encryption) and
include additional data structures to allow for efficient access to relevant data. The data and the
additional data structures can then be encrypted and stored on the server so that only someone with
the private key can access them. In this setting, the initial work for the user (i.e., for preprocessing the
data) is at least as large as the data, but subsequent work (i.e., for accessing the data) is very small
relative to the size of the data for both the user and the server. Furthermore, everything about the
user’s access pattern can be hidden [35, 25].
Public-key searchable encryption. In the setting of searching on public-key-encrypted data,
users who encrypt the data (and send it to the server) can be different from the owner of the decryption
key. In a typical application, a user publishes a public key while multiple senders send e-mails to the
mail server [15, 2]. Anyone with access to the public key can add words to the index, but only the
owner of the private key can generate “trapdoors” to test for the occurrence of a keyword. Although
the original work on public-key encryption with keyword search (PEKS) by Boneh, di Crescenzo,
Ostrovsky and Persiano [15] reveals the user’s access pattern, Boneh, Kushilevitz, Ostrovsky and
Skeith [16] have shown how to build a public-key encryption scheme that hides even the access pattern.
This construction, however, has an overhead in search time that is proportional to the square root of
the database size, which is far less efficient than the best private-key solutions.
Recently, Bellare, Boldyreva and O’Neill [8] introduced the notion of public key efficiently searchable
encryption (ESE) and proposed constructions in the random oracle model. Unlike PEKS, ESE
schemes allow anyone with access to a user’s public key to add words to the index and to generate
trapdoors to search. While ESE schemes achieve optimal search time (the same as our constructions;
see below), they are inherently deterministic and therefore provide security guarantees that are weaker
than the ones considered in this work.
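The cost of determinism can be shown with a symmetric stand-in. The Python sketch below is not the public-key construction of [8]. It assumes a deterministic keyed tag (HMAC-SHA256) per keyword to show two effects: equal tags allow an ordinary hash-table lookup, but they also reveal which stored items are equal, and therefore their frequency profile.

```python
import hashlib
import hmac
from collections import Counter


def det_tag(key: bytes, word: str) -> bytes:
    """Deterministic keyword tag: the same word always maps to the same tag."""
    return hmac.new(key, word.encode(), hashlib.sha256).digest()


key = bytes(32)  # fixed all-zero demo key; never do this in practice
words = ["tax", "audit", "tax", "tax", "memo"]
tags = [det_tag(key, w) for w in words]

# Efficiency: the server can index tags in an ordinary hash table,
# so a lookup costs O(1) plus one step per match.
index: dict[bytes, list[int]] = {}
for pos, t in enumerate(tags):
    index.setdefault(t, []).append(pos)
print(index[det_tag(key, "tax")])  # [0, 2, 3]

# Leakage: without the key, the server still sees which stored items are equal.
print(sorted(Counter(tags).values(), reverse=True))  # [3, 1, 1]
```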
Single-database PIR. In single-database private information retrieval (or PIR), introduced by
Kushilevitz and Ostrovsky [31], a user can retrieve data from a server containing unencrypted data
without revealing the access pattern and with total communication less than the data size. This was
extended to keyword searching, including searching on streaming data [36]. We note, however, that
since the data in PIR is always unencrypted, any scheme that tries to hide the access pattern must
touch all data items. Otherwise, the server learns information: namely, that the untouched item was
not of interest to the user. Thus, PIR schemes require work which is linear in the database size. Of
course, one can amortize this work over multiple queries and multiple users to reduce the database's
work per query, as shown in [27, 28], but the key feature of all PIR schemes is that the data
is always unencrypted, unlike the previous two settings, which search over encrypted data.
1.3 Versions of this paper
This is the full version of [20] and includes all omitted proofs and several improvements. Following [19],
the definition of SSE used in this version explicitly captures the encryptions of the documents. Using
the terminology of [19], we consider pointer-output SSE schemes as opposed to [20] which considered
structure-only schemes. While most previous work on SSE considers only the latter (ignoring how
the documents are encrypted), we prefer the former definition of SSE. Another difference with [20]
is in our treatment of multi-user SSE. Here, we describe the algorithms of a multi-user SSE scheme
as stateful, which allows us to provide a “cleaner” description of our construction. Finally, we note
that the simulation-based definitions used in this work (i.e., Definitions 4.8 and 4.11) differ from the
definitions that appeared in a preliminary full version of this paper (i.e., Definitions 3.6 and 3.9 in
[21]). We believe that the formulations provided here are easier to work with and intuitively more
appealing.
2 Related Work
We already mentioned the work on oblivious RAMs [35, 25]. In an effort to reduce the round complexity
associated with oblivious RAMs, Song, Wagner and Perrig [40] showed that a solution for searchable
encryption was possible for a weaker security model. Specifically, they achieve searchable encryption
by crafting, for each word, a special two-layered encryption construct. Given a trapdoor, the server
can strip the outer layer and check whether the inner layer is of the correct form. This construction,
however, has some limitations: while the construction is proven to be a secure encryption scheme, it is
not proven to be a secure searchable encryption scheme; the distribution of the underlying plaintexts
is vulnerable to statistical attacks; and searching is linear in the length of the document collection.
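The Python sketch below illustrates this kind of two-layered construction, loosely modeled on the final scheme of [40]. It is an illustration under stated assumptions, not a faithful reimplementation. HMAC-SHA256 replaces both the block cipher that produces the inner layer and the pseudo-random generator, so the sketch supports searching but not decryption. The sizes N and M are arbitrary, and all function names are hypothetical. The search loop also shows why searching is linear in the length of the collection.

```python
import hashlib
import hmac

N, M = 32, 8  # word block size and check-tag size in bytes (illustrative choices)


def f(key: bytes, msg: bytes, length: int) -> bytes:
    """HMAC-SHA256 truncated to `length` bytes, used as a PRF."""
    return hmac.new(key, msg, hashlib.sha256).digest()[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pre_encrypt(k_pre: bytes, word: str) -> bytes:
    """Inner layer X = E(W); HMAC stands in for a block cipher (no decryption)."""
    return f(k_pre, word.encode(), N)


def word_key(k_key: bytes, x: bytes) -> bytes:
    """Per-word key derived from the left part of X."""
    return f(k_key, x[: N - M], 32)


def encrypt_doc(k_pre, k_key, k_stream, doc_id: str, words: list[str]) -> list[bytes]:
    cts = []
    for i, w in enumerate(words):
        x = pre_encrypt(k_pre, w)
        s = f(k_stream, doc_id.encode() + b"|" + i.to_bytes(8, "big"), N - M)  # S_i
        t = s + f(word_key(k_key, x), s, M)  # T_i = S_i || F_k(S_i)
        cts.append(xor(x, t))                # outer layer: C_i = X_i xor T_i
    return cts


def trapdoor(k_pre, k_key, word: str) -> tuple[bytes, bytes]:
    x = pre_encrypt(k_pre, word)
    return x, word_key(k_key, x)


def search(cts: list[bytes], td: tuple[bytes, bytes]) -> list[int]:
    x, k = td
    hits = []
    for i, c in enumerate(cts):  # linear scan over every stored word
        t = xor(c, x)            # strip the outer layer
        s, tag = t[: N - M], t[N - M:]
        if f(k, s, M) == tag:    # is the inner layer of the correct form?
            hits.append(i)
    return hits
```

Two occurrences of the same word produce different ciphertexts because each position uses a fresh S_i. A non-matching word passes the check only with probability about 2^-64 per position.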
The above limitations are addressed by the works of Goh [23] and of Chang and Mitzenmacher [18],
who propose constructions that associate an “index” to each document in a collection. As a result, the
server has to search each of these indexes, and the amount of work required for a query is proportional
to the number of documents in the collection. Goh introduces a notion of security for indexes (IND-
CKA and the slightly stronger IND2-CKA), and puts forth a construction based on Bloom filters [13]
and pseudo-random functions. Chang and Mitzenmacher achieve a notion of security similar to
IND2-CKA, except that it also tries to guarantee that the trapdoors do not leak any information about the
words being queried. We discuss these security definitions and their limitations in more detail in
Section 4 and Appendix B.
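The per-document index approach of [23] can be sketched in Python, assuming a simplified version of a Bloom-filter index. A trapdoor consists of R pseudo-random values of the keyword. Each value is bound to a document identifier before being mapped to a position in that document's filter. The filter size M_BITS, the number of hash functions R, and all function names are illustrative assumptions. The blinding step of [23], which inserts random bits to hide the number of words per document, is omitted. Searching tests every document's filter, so the work grows with the number of documents. A filter can also report a match for a word it does not contain (a false positive).

```python
import hashlib
import hmac

M_BITS, R = 256, 3  # Bloom filter size and number of hash functions (illustrative)


def f(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def trapdoor(keys: list[bytes], word: str) -> list[bytes]:
    """R pseudo-random values of the word, one per secret key."""
    return [f(k, word.encode()) for k in keys]


def positions(td: list[bytes], doc_id: str) -> list[int]:
    """Bind each trapdoor value to the document id, then map it to a filter bit."""
    return [int.from_bytes(f(x, doc_id.encode()), "big") % M_BITS for x in td]


def build_filter(keys: list[bytes], doc_id: str, words: list[str]) -> int:
    bits = 0  # a Python int used as an M_BITS-wide bit array
    for w in words:
        for p in positions(trapdoor(keys, w), doc_id):
            bits |= 1 << p
    return bits


def search(filters: dict[str, int], td: list[bytes]) -> list[str]:
    hits = []
    for doc_id, bits in filters.items():  # one filter test per document: O(n)
        if all((bits >> p) & 1 for p in positions(td, doc_id)):
            hits.append(doc_id)
    return hits
```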
As mentioned above, encryption with keyword search has also been considered in the public-key
setting [15, 2], where anyone with access to a user’s public-key can add words to an index, but
only the owner of the private-key can generate trapdoors to test for the occurrence of a keyword.
While related, the public-key solutions are suitable for different applications and are not as efficient
as private-key solutions, which is the main subject of this work. Public key efficiently searchable
encryption (ESE) [8] achieves efficiency comparable to ours, but at the price of providing weaker
security guarantees. The notion of ESE, originally proposed in a public key setting, was extended to
the symmetric key setting [5], which views the outsourced data as a relational database and seeks
to achieve query-processing efficiency comparable to that for unencrypted databases. These schemes
sacrifice security in order to preserve general efficiency and functionality. Similar to our work, the
efficiency of operations on encrypted and unencrypted databases is comparable; unlike our work,
this comes at the cost of weakening the security definition (in addition to revealing the user’s query
access pattern, the frequency distribution of the plaintext data is also revealed to the server prior to
any client queries). Further, we also note that the notion of multi-user SSE—which we introduce in
this work—combined with a classical public-key encryption scheme, achieves a functionality similar
to that of public key ESE, with the added benefit of allowing the owner to revoke search privileges.
Whereas this work focuses on the case of single-keyword equality queries, we note that more
complex queries have also been considered. This includes conjunctive queries in the symmetric key
setting [26, 6]; it also includes conjunctive queries [37, 17], comparison and subset queries [17], and
range queries [39] in the public-key setting.
Unlike the above-mentioned work on searchable encryption that relies on computational assumptions,
Sedghi et al. [38] propose a model that targets an information-theoretic security analysis.
Naturally, SSE can also be viewed as an instance of secure two-party/multi-party computation [41,
24, 11]. However, the weakening and refinement of the privacy requirement (more on this below) as