Journal ArticleDOI

5PM: Secure pattern matching

01 Sep 2013-Journal of Computer Security (IOS Press)-Vol. 21, Iss: 5, pp 601-625
TL;DR: This paper considers secure pattern matching with single-character wildcards and substring matching in the malicious stand-alone setting, and presents the first secure expressive pattern matching protocol designed to optimize round complexity by carefully specifying the entire protocol round by round.
Abstract: In this paper we consider the problem of secure pattern matching that allows single-character wildcards and substring matching in the malicious stand-alone setting. Our protocol, called 5PM, is executed between two parties: Server, holding a text of length n, and Client, holding a pattern of length m to be matched against the text, where our notion of matching is more general than traditionally considered and includes non-binary alphabets, non-binary Hamming distance and non-binary substring matching. 5PM is the first secure expressive pattern matching protocol designed to optimize round complexity by carefully specifying the entire protocol round by round. 5PM requires only eight rounds in the malicious static corruptions model. In the malicious model, 5PM requires $O((m+n)k^2)$ communication complexity and $O(m+n)$ encryptions, where m is the pattern length and n is the text length. Further, 5PM can hide pattern size with no asymptotic additional costs in either computation or bandwidth.

Summary (5 min read)

1 Introduction

  • Pattern matching is fundamental to computer science.
  • The work was also supported by the OKAWA Foundation Research Award, IBM Faculty Research Award, Xerox Faculty Research Award, B. John Garrick Foundation Award, Teradata Research Award and Lockheed-Martin Corporation Research Award.
  • A secure version of pattern matching has many applications, for example securing databases that contain medical information such as DNA records.

1.1 Our Contributions

  • This paper presents 5ecure Pattern Matching (or 5PM), a new protocol for arbitrary alphabets that addresses, in addition to exact matching, more expressive search queries including single-character wildcards and substring pattern matching, and also provides the ability to hide pattern length.
  • 5PM has communication complexity sublinear in circuit size (as opposed to general MPC, which has communication complexity linear in circuit size) to securely compute non-binary substring matching in the malicious model.
  • The authors' malicious model protocol requires $O((m+n)k^2)$ bandwidth complexity.
  • Here and throughout, the authors use the DNA alphabet (Σ = {A, C, G, T}) for examples.

1.2 Comparison to Previous Work

  • In the exact pattern matching setting, the algorithm of Freedman, Ishai, Pinkas and Reingold [13] achieves polylogarithmic overhead in m and n and polynomial overhead in security parameters in the honest-but-curious setting.
  • Recently, Vergnaud [14] built on the work of Hazay and Toft [16] to construct an efficient secure pattern matching scheme for wildcard matching and substring matching (requiring t runs over the preliminary matching result to search for t different Hamming distance values, which is also required by 5PM) in the malicious adversary model.
  • By contrast, 5PM has the same overhead except for O(nm) exponentiations (see Table 2).
  • The second is that their techniques are of independent interest and may be extended to additional functionalities.
  • Jarrous and Pinkas [15] gave the first construction of a secure protocol for computing non-binary Hamming distances.

2 Preliminaries

  • The rationale behind the authors' secure 5PM protocol is based on a modification of an insecure pattern matching algorithm (IPM) [29] that can perform exact matching, exact matching with single-character wildcards, and substring matching within the same algorithm.
  • In Section 3.1, the authors show how their modified algorithm can be reduced to basic linear operations whose secure and efficient evaluation yields their 5PM protocol.

2.1 Insecure Pattern Matching (IPM) Algorithm

  • To illustrate how their modified algorithm works, the authors begin by describing how it performs exact matching; they then show how it handles single-character wildcards and substring matching.
  • IPM involves the following steps: a. Inputs: an alphabet Σ, a text T ∈ Σ^n and a pattern p ∈ Σ^m.
  • When a text character matches a pattern character, the algorithm adds a 1 at the position in the activation vector several steps ahead, where it would expect the pattern to end (if the character appears in multiple positions in the pattern, it adds a 1 to all the corresponding positions where the pattern might end).
  • The activation vector will be initialized to all zeros.
  • This operation does not incur any false positives for the same reason that the exact matching IPM algorithm does not: there, for each pattern p, there is only one encoding into CDVs and only one sequence of adding CDVs as one moves along the text that could add up to m.

2.2 Preliminary Cryptographic Tools

  • This section outlines preliminary cryptographic tools required for their protocols.
  • The authors make use of additively homomorphic semantically secure encryption schemes.
  • For concreteness, in the rest of this paper the authors concentrate on the additively homomorphic ElGamal encryption scheme whose security depends on the Decisional Diffie-Hellman (DDH) computational hardness assumption.
  • While the authors use threshold ElGamal, in practice, any scheme is acceptable if it satisfies the required properties and supports the needed zero-knowledge arguments.
  • For the malicious model protocol, the authors will make use of perfectly hiding, computationally binding commitment schemes (for further discussion, see [33]).

2.3 Computing Linear Operations Using Additively Homomorphic Encryption Schemes.

  • The authors' secure pattern-matching protocol relies on the following observations about linear operations and additively homomorphic encryption schemes.
  • In what follows, let E be the encryption algorithm for an additively homomorphic encryption scheme for key pair (pk, sk).
  • Suppose that P1 possesses pk, Epk(A), the entry-wise encryption of A, and also the unencrypted matrix B. Then P1 can compute Epk(A · B), the encryption of the product of A and B under the same pk.
  • More specifically, an affine hash function Zklq →.
  • Only with probability 1/q will the decryptions equal each other when A ≠ B because the hash function is chosen uniformly at random.
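
The observation that a party holding Epk(A) and a plaintext matrix B can compute Epk(A · B) can be sketched as follows. This is a minimal illustration, assuming a toy "exponential" ElGamal scheme over a tiny group; the parameters `P`, `Q`, `G` and all function names are hypothetical choices for this example, are far too small to be secure, and are not taken from the paper.

```python
import secrets

# Toy group parameters (assumption for illustration; far too small to be secure).
P = 2039   # safe prime, P = 2*Q + 1
Q = 1019   # prime order of the subgroup used below
G = 4      # generator of the order-Q subgroup of squares mod P


def keygen():
    """Return (sk, pk) with pk = G^sk mod P."""
    sk = secrets.randbelow(Q - 1) + 1
    return sk, pow(G, sk, P)


def encrypt(pk, m):
    """Exponential ElGamal: E(m) = (G^r, G^m * pk^r)."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), pow(G, m, P) * pow(pk, r, P) % P


def add(c1, c2):
    """Multiplying ciphertexts component-wise adds the plaintexts."""
    return c1[0] * c2[0] % P, c1[1] * c2[1] % P


def scalar_mul(c, k):
    """Raising a ciphertext to a known integer k multiplies the plaintext by k."""
    return pow(c[0], k, P), pow(c[1], k, P)


def decrypt(sk, c, bound=200):
    """Recover G^m, then find m by brute force (only works for small m)."""
    g_m = c[1] * pow(c[0], Q - sk, P) % P   # c[0]^(Q - sk) equals c[0]^(-sk)
    for m in range(bound):
        if pow(G, m, P) == g_m:
            return m
    raise ValueError("plaintext outside the search bound")


def encrypted_matmul(enc_a, b):
    """Given entry-wise E(A) and plaintext B, return entry-wise E(A * B)."""
    out = []
    for enc_row in enc_a:
        row = []
        for j in range(len(b[0])):
            acc = (1, 1)   # trivial encryption of 0
            for k, c in enumerate(enc_row):
                acc = add(acc, scalar_mul(c, b[k][j]))
            row.append(acc)
        out.append(row)
    return out
```

Only homomorphic additions and multiplications by known constants are used, which is why an additively homomorphic scheme suffices for this step.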

3 5PM Protocol

  • This section uses the above observations and cryptographic tools to construct the secure pattern-matching protocol (5PM).
  • The authors develop πH5PM for the honest-but-curious adversary model and πM5PM for the malicious (static corruption) adversary model.

3.1 Converting IPM to Linear Operations.

  • In reality, since MT and MCDV are 0/1 matrices, multiplication is more computationally expensive than necessary, and vectors can simply be selected (as shown in the IPM description in Section 2.1).
  • This transformation, together with the previous step, constructs a matrix of CDVs whose ith row contains only CDV(Ti), starting at the ith position of that row (this sets up step d in Section 2.1.1).
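
The conversion described above can be sketched in plaintext as follows. This is a hedged reading of the summary, not the paper's exact construction: it assumes MT is the one-hot encoding of the text (one row per text character), MCDV has one row per alphabet character holding that character's CDV, and the activation vector is obtained by shifting row i of MT · MCDV to start at position i and summing columns. All function names are hypothetical, and only exact matching (no wildcard) is covered.

```python
def text_matrix(text, alphabet):
    """M_T: one row per text character, one-hot over the alphabet."""
    return [[1 if ch == a else 0 for a in alphabet] for ch in text]


def cdv_matrix(pattern, alphabet):
    """M_CDV: one row per alphabet character, holding that character's CDV."""
    m = len(pattern)
    rows = {a: [0] * m for a in alphabet}
    for i, ch in enumerate(pattern, start=1):   # i is 1-indexed, delay is m - i
        rows[ch][m - i] = 1
    return [rows[a] for a in alphabet]


def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def activation_vector(text, pattern, alphabet="ACGT"):
    n = len(text)
    # Row i of the product is CDV(T_i): the one-hot row selects one CDV.
    per_char = matmul(text_matrix(text, alphabet), cdv_matrix(pattern, alphabet))
    # Shift row i so that it starts in column i, then sum each column.
    shifted = [[0] * i + row + [0] * (n - 1 - i) for i, row in enumerate(per_char)]
    return [sum(shifted[i][j] for i in range(n)) for j in range(n)]
```

Because MT is a 0/1 matrix with a single 1 per row, the product only ever selects rows of MCDV, which matches the remark that vectors "can simply be selected" instead of multiplied.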

3.2 Honest-But-Curious (HBC) 5PM Protocol

  • The authors begin by describing the intuition behind required modifications to secure IPM in the HBC adversary model.
  • When Client sends Server E(MCDV ), by the reasoning of Sections 2.3 and 3.1, Server can compute E(AV ), an encrypted activation vector, using only MT and E(MCDV ).
  • The authors refer the reader to Sections 3.1 and 2.3.2 for the notation used here.
  • The protocol operation is as follows: a. Client computes (sk, pk) ← Key(1^k) using the key generation algorithm of an additively homomorphic encryption scheme, E. b. Client computes MCDV ← GenCDV(p).
  • In particular, πH5PM does not require multiple independent protocol executions to compute substring matching for a range of substring length values.
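
A two-message flow in the spirit of the description above can be sketched as follows. This is an illustration under stated assumptions, not the paper's πH5PM specification: it uses a toy exponential ElGamal scheme with insecure parameters, it assumes Server knows the pattern length m, and the blinding step (subtract m, then multiply by a random nonzero scalar so that only "equal to m or not" survives decryption) is an assumed, commonly used technique. All names are hypothetical.

```python
import secrets

P, Q, G = 2039, 1019, 4   # toy group (assumption): P = 2*Q + 1, G has order Q


def keygen():
    sk = secrets.randbelow(Q - 1) + 1
    return sk, pow(G, sk, P)


def encrypt(pk, m):
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), pow(G, m, P) * pow(pk, r, P) % P


def add(c1, c2):
    return c1[0] * c2[0] % P, c1[1] * c2[1] % P


def scalar_mul(c, k):
    return pow(c[0], k, P), pow(c[1], k, P)


def decrypts_to_zero(sk, c):
    return c[1] * pow(c[0], Q - sk, P) % P == 1   # G^0 = 1


def client_message(pattern, alphabet="ACGT"):
    """Client: generate keys and send the entry-wise encrypted CDVs."""
    sk, pk = keygen()
    m = len(pattern)
    cdvs = {a: [0] * m for a in alphabet}
    for i, ch in enumerate(pattern, start=1):
        cdvs[ch][m - i] = 1
    enc_cdvs = {a: [encrypt(pk, bit) for bit in cdvs[a]] for a in alphabet}
    return sk, pk, enc_cdvs


def server_message(text, pk, enc_cdvs, m):
    """Server: build E(AV) by adding selected encrypted CDVs, then blind it."""
    n = len(text)
    enc_av = [(1, 1)] * n                      # trivial encryptions of 0
    for j, ch in enumerate(text):
        for d, c in enumerate(enc_cdvs[ch]):
            if j + d < n:
                enc_av[j + d] = add(enc_av[j + d], c)
    blinded = []
    for c in enc_av:
        shifted = add(c, encrypt(pk, Q - m))   # E(AV_j - m), with -m taken mod Q
        blinded.append(scalar_mul(shifted, secrets.randbelow(Q - 1) + 1))
    return blinded


def client_output(sk, blinded, m):
    """Client: positions (1-indexed starts) whose blinded value decrypts to 0."""
    return [j - m + 2 for j, c in enumerate(blinded) if decrypts_to_zero(sk, c)]
```

In this sketch Server never decrypts anything and Client sees only random-looking values at non-matching positions, which is the intuition the HBC protocol formalizes.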

3.3 Malicious Model 5PM Protocol

  • The authors describe an instantiation of πM5PM based on additively homomorphic threshold ElGamal encryption (see Section 2.2) for concreteness; generalization to other encryption schemes follows provided they have efficient Σ protocols for the statements required here.
  • Second, the authors give interactive zero-knowledge consistency arguments that will be required.
  • Finally, the authors divide πM5PM into six subprotocols and describe their construction and how they are combined into the final protocol πM5PM .
  • ΠS,AV is a two-party protocol executed between Client and Server which outputs to Server an encrypted activation vector corresponding to matching Client’s p against Server’s T .

4.1 Definitions

  • The authors consider interactive protocols that have the following specification: a. P sends message a, |a| ∈ poly(|x|).
  • Σ protocols that only have standard soundness will not always satisfy the lemma.
  • For all x such that there does not exist a w with (x,w) ∈ R, V will only accept with negligible probability.
  • The authors first construct an extractable equivocable commitment scheme and use this scheme together with the Σ protocol specification for the ZK-AoK construction.

4.2 Extractable Equivocable Commitment Schemes

  • Such a scheme is an interactive protocol between a PPT committer C and a PPT receiver R consisting of three functions: EComSet instantiates the commitment scheme, com computes the commitment, and EComVer verifies that decommitment is valid.
  • For pk correctly constructed and any messages s and s′, the distributions of com(s, r, pk) and com(s′, r′, pk) are statistically indistinguishable over the choice of random input (e.g., r and r′).
  • The above EP protocol has bandwidth complexity O(k^2) and computational complexity O(k^2 log2 k).
  • Just like Pedersen commitments, this commitment scheme is statistically hiding and computationally binding.
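
Since the text compares the scheme to Pedersen commitments, a small sketch of a standard Pedersen commitment may help fix ideas. This is not the paper's EP scheme: the group parameters are toy values, and the trapdoor `TRAPDOOR` (the discrete logarithm of `H`) is exposed here only to demonstrate why knowing it makes the commitment equivocable. All names are hypothetical.

```python
import secrets

P, Q, G = 2039, 1019, 4     # toy group (assumption): P = 2*Q + 1, G has order Q
TRAPDOOR = 123              # log_G(H); must stay unknown to the committer in real use
H = pow(G, TRAPDOOR, P)


def commit(s, r=None):
    """Return (commitment, randomness) for message s: c = G^s * H^r mod P."""
    if r is None:
        r = secrets.randbelow(Q)
    return pow(G, s % Q, P) * pow(H, r, P) % P, r


def verify(c, s, r):
    """Check that (s, r) opens commitment c."""
    return c == pow(G, s % Q, P) * pow(H, r, P) % P


def equivocate(s, r, s_new):
    """With the trapdoor, find r_new so the same commitment opens to s_new."""
    inv = pow(TRAPDOOR, Q - 2, Q)           # inverse of the trapdoor mod prime Q
    return (r + (s - s_new) * inv) % Q
```

Without the trapdoor, producing two openings of one commitment would reveal log_G(H), which is the usual argument for computational binding; the uniformly random r is what gives statistical hiding.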

4.3 Construction of a ZK-AoK from Σ Protocols

  • The authors give a construction for how to transform a three-move Σ argument of knowledge Σrel for a binary relation Rrel into a five-move ZK argument of knowledge πrel for Rrel using the extractable equivocable commitment scheme EP described in Section 4.2.
  • Then it follows that Σrel has a verifier V that accepts a transcript with non-negligible probability for the same x.
  • This implies that there are at least two distinct challenges e and e′ such that P can produce accepting transcripts (a, e, z) and (a, e′, z′) for Σrel within πrel (in fact, there must be a non-negligible number of such challenges).
  • Note that in particular, the fact that Σ protocols are special honest verifier zero knowledge is important, as it implies the ability to construct correct transcripts for arbitrary (pre-selected) distributions of verifier messages.
  • EP then rewinds to rel-4, after P has already instantiated the commitment scheme and sent its initial message a for Σrel, and changes its challenge for Σrel according to the specification of Erel,P .
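
The role of two accepting transcripts (a, e, z) and (a, e′, z′) can be illustrated with Schnorr's protocol for knowledge of a discrete logarithm, used here as a stand-in for Σrel. This is a sketch under assumptions: the group is a toy one, the function names are hypothetical, and the paper's actual relations and its five-move wrapper are not reproduced.

```python
import secrets

P, Q, G = 2039, 1019, 4   # toy group (assumption): P = 2*Q + 1, G has order Q


def prover_commit():
    """First move: prover picks r and sends a = G^r."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def prover_respond(r, e, w):
    """Third move: z = r + e*w mod Q, where w is the witness (x = G^w)."""
    return (r + e * w) % Q


def verify(x, a, e, z):
    """Verifier accepts iff G^z == a * x^e."""
    return pow(G, z, P) == a * pow(x, e, P) % P


def extract(e1, z1, e2, z2):
    """Special soundness: two transcripts sharing a, with e1 != e2, yield w."""
    inv = pow((e1 - e2) % Q, Q - 2, Q)   # inverse mod the prime Q
    return (z1 - z2) * inv % Q
```

A knowledge extractor that rewinds the prover to just after its first message and issues a different challenge obtains exactly such a pair of transcripts.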

6 Detailed πM5PM Specification

  • The authors provide here the detailed protocol specification of the malicious model version of 5PM , πM5PM .
  • First, the authors must specify the various zero-knowledge arguments of consistency that are required.

6.1 Arguments of Knowledge of Consistency

  • The authors first describe five required interactive arguments which they rely on to prove statements required in the πM5PM protocol.
  • They are designed for use with the specified threshold ElGamal encryption scheme (Section 2.2).
  • The five required interactive arguments are: AM01, an AoK of Consistency for Matrix Formation 0/1: APD, an AoK of Consistency for Partial Decryption: ARand, an AoK of Consistency for Randomization: AFD, an AoK of Consistency for Final Decryption:.
  • The authors denote by AFD the five-move interactive argument where P proves, using l parallel instantiations of πfin, that either the l encryptions (xi, yi) have been partially decrypted correctly or that P knows the discrete logarithm of gw.

6.2 πM5PM Protocol Specification

  • The eight round protocol for the malicious model, πM5PM , consists of the following six subprotocols: (a) πencr: initializes an additively homomorphic threshold encryption scheme.
  • (c) πC,AV: allows Client to also construct an encrypted activation vector for Client’s pattern and Server’s encrypted text.
  • Client input is pattern p, MCDV for p, and pt, the matching threshold.
  • This subprotocol starts at global round 3 and ends at global round 5, with ZK preprocessing occurring during global rounds 1 and 2. – Client also sends A P,2 M01 to prove that E(MCDV ) is formatted correctly, where A P,1 M01 and A V,1 M01 occur during global rounds 1 and 2, respectively.
  • Server also sends the message comm(A P,2 FD), where AFD is the argument to prove that either DS(E(AV r S )) was obtained correctly or that Server knows s∗ (for h∗ sent by Client in the first global round during πencr), whereas A P,1 FD and AV,1FD are sent during global rounds 2 and 3, respectively.

7.1 Adversarial Model

  • The authors refer the reader to [33,38] for further discussion of the definitions given here.
  • Note that parties can be defined via their next message functions; see, for example, [39].
  • In particular, the corrupted party may choose to abort and to not complete the protocol at all.

7.2 Simulator Constructions and Security for πH5PM

  • The authors provide, for each admissible pair in the real world, an admissible pair in the ideal world such that REAL πH5PM P̄ (x̄, ȳ, r̄) and IDEAL πH5PM P̄ ′ (x̄, ȳ, r̄) are computationally indistinguishable.
  • Consider the admissible pair P̄ = (Client, Server) in the real world.
  • Note that SS has oracle access to real-world Server.
  • The authors assume that the encryption scheme (Key,E,D) is fixed.
  • The authors construct SS for an admissible pair P̄ ′ =(SC , Server) in the ideal world (where Server behaves honestly in both cases) such that REAL πH5PM P̄ (x̄, ȳ, r̄) and IDEAL πH5PM P̄ ′ (x̄, ȳ, r̄) are computationally indistinguishable.

7.3 Simulator Constructions and Security for πM5PM

  • The authors provide, for each admissible pair in the real world, an admissible pair in the ideal world such that REAL πM5PM P̄ (x̄, ȳ, r̄) and IDEAL πM5PM P̄ ′ (x̄, ȳ, r̄) are computationally indistinguishable.
  • Server also sends the message comm(A P,2 FD), where AFD is the argument to prove that either DS(E(AV r S )) was obtained correctly or that Server knows s∗ (for h∗ sent by SS in the first global round during πencr), where AP,1FD and AV,1FD are sent during global rounds 2 and 3, respectively.
  • Therefore, the zero knowledge distinguisher DZK distinguishes the two cases of VZK ’s interaction with non-negligible probability by running D internally, which will distinguish the two views of the ZK execution with non-negligible probability, which is a contradiction.
  • Once the two interactions are done, VZK completes the internal execution of πM5PM .
  • Renc encrypts with the Client’s sC (which it obtains by running the knowledge extractor; as in hybrid H1, this does not affect transcript indistinguishability), and uses this final encryption as the Server-side encryption; this final encryption corresponds to encryption with the secret key sC + sS .

8 Detailed Performance Results of 5PM Implementation

  • The authors' experiments were performed on an Intel dual quad-core 2.93GHz machine with 8GB of memory running Ubuntu Linux version 10.10.
  • The authors used fast-decryption Paillier [40] from the Self-Certifying File System (SFS) library [41], and used alphabets of sizes 4 (DNA) and 36.
  • The authors' implementation results in Table 13 show that on average, 95% of the total online runtime was spent in three components of the protocol, two at Server and one at Client.
  • The first is searching the text at Server by adding CDVs, which correspond to pattern characters, to the activation vector; the second is blinding elements of the activation vector at the Server; the third is decrypting the blinded activation vector at Client.


5PM: Secure Pattern Matching*
Joshua Baron (2), Karim El Defrawy (2), Kirill Minkovich (2), Rafail Ostrovsky (1), and Eric Tressler (2)
(1) Departments of Mathematics and Computer Science, UCLA, Los Angeles, CA, USA 90095
(2) Information and System Sciences Laboratory, HRL Laboratories, LLC, Malibu, CA, USA, 90265
{jwbaron,kmeldefrawy,kminkovich,eptressler}@hrl.com, rafail@cs.ucla.edu
Abstract. In this paper we consider the problem of secure pattern matching that allows single-character wildcards and substring matching in the malicious (stand-alone) setting. Our protocol, called 5PM, is executed between two parties: Server, holding a text of length $n$, and Client, holding a pattern of length $m$ to be matched against the text, where our notion of matching is more general and includes non-binary alphabets, non-binary Hamming distance and non-binary substring matching.
5PM is the first secure expressive pattern matching protocol designed to optimize round complexity by carefully specifying the entire protocol round by round. In the malicious model, 5PM requires $O((m+n)k^2)$ bandwidth and $O(m+n)$ encryptions, where $m$ is the pattern length and $n$ is the text length. Further, 5PM can hide pattern size with no asymptotic additional costs in either computation or bandwidth. Finally, 5PM requires only two rounds of communication in the honest-but-curious model and eight rounds in the malicious model. Our techniques reduce pattern matching and generalized Hamming distance problems to a novel linear algebra formulation that allows for generic solutions based on any additively homomorphic encryption. We believe our efficient algebraic techniques are of independent interest.
1 Introduction
Pattern matching is fundamental to computer science. It is used in many areas, including text processing, database search [1], networking and security applications [2] and recently in the context of bioinformatics and DNA analysis [3,4,5]. It is a problem that has been extensively studied, resulting in several efficient (although insecure) techniques to solve its many variations, e.g., [6,7,8,9].
The most common interpretation of the pattern matching problem is the following: given a finite alphabet $\Sigma$, a text $T \in \Sigma^n$ and a pattern $p \in \Sigma^m$, the exact pattern matching decision problem requires one to decide whether or not a pattern appears in the text. The exact pattern matching search problem requires finding all indices $i$ of $T$ (if any) where $p$ occurs as a substring starting at position $i$. If we denote by $T_i$ the $i$th character of $T$, the output should be the set of matching positions $MP := \{ i \mid p \text{ matches } T \text{ beginning at } T_i \}$. The following generalizations of the exact matching problem are often encountered, where the output in all cases is the set $MP$:
  • Pattern matching with single-character wildcards (footnote 1): There is a special character $\ast \notin \Sigma$ that matches any single character of the alphabet, where $p \in (\Sigma \cup \{\ast\})^m$ and $T \in \Sigma^n$. Using such a “wildcard” character allows one pattern to be specified that could match several sequences of characters. For example, the pattern $TA\ast$ would match any of the following character sequences in a text (footnote 2): $TAA$, $TAC$, $TAG$, and $TAT$.
  • Substring pattern matching: Fix some $l \le m$; a match for $p$ is found whenever there exists in $T$ an $m$-length string that differs in $l$ characters from $p$ (i.e., has Hamming distance $l$ from $p$). For example, the pattern $TAC$ has $m = 3$. If $l = 1$, then any of the following words will match: $\ast AC$, $T\ast C$, or $TA\ast$; note that this is an example of non-binary substring matching.

* This work was done while the first author was at UCLA. The work of the first and fourth author is supported in part by NSF grants CCF-0916574, IIS-1065276, CCF-1016540, CNS-1118126, CNS-1136174, and by US-Israel BSF grant 2008411. It was also supported by the OKAWA Foundation Research Award, IBM Faculty Research Award, Xerox Faculty Research Award, B. John Garrick Foundation Award, Teradata Research Award and Lockheed-Martin Corporation Research Award. The material contained herein is also based upon work supported by the Defense Advanced Research Projects Agency through the U.S. Office of Naval Research under Contract N00014-11-1-0392. The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government. The authors would like to thank Jonathan Katz, Sky Faber and Matt Cheung for helpful discussions and comments.
Footnote 1: Such wildcards are also called “do not cares” and “mismatches” in the literature.
© 2011 HRL Laboratories, LLC. All Rights Reserved.

| Paper | NB Hamming Distance | Exact Matching | Wildcard Matching | NB Substring Matching | Security |
|---|---|---|---|---|---|
| [13] | No | Yes | No | No | HBC/M |
| [14] | Yes* | Yes | Yes | Yes* | HBC/M |
| [15] | Yes | No** | No** | No** | HBC |
| 5PM | Yes | Yes | Yes | Yes | HBC/M |

Table 1. Comparison of previous protocol functionality. NB = non-binary, HBC = honest-but-curious, M = malicious, * = using unary encoding and additional tools, ** = can be extended.
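The three notions of matching defined above (exact, single-character wildcard, and substring) can be made concrete with a naive, insecure reference implementation. This sketch is only meant to pin down the expected output set MP; it assumes that substring matching with parameter l accepts windows with at most l mismatches, that `*` is the wildcard symbol, and that positions are reported 1-indexed. The function names are hypothetical.

```python
WILDCARD = "*"


def mismatches(window, pattern):
    """Count positions where window and pattern differ; wildcards never differ."""
    return sum(1 for w, p in zip(window, pattern) if p != WILDCARD and w != p)


def matching_positions(text, pattern, max_mismatches=0):
    """Return MP: 1-indexed starts i where pattern matches text beginning at T_i.

    max_mismatches = 0 gives exact (or wildcard) matching; a positive value
    gives substring matching with that Hamming distance budget.
    """
    m = len(pattern)
    return [i + 1
            for i in range(len(text) - m + 1)
            if mismatches(text[i:i + m], pattern) <= max_mismatches]


if __name__ == "__main__":
    text = "GTACTAGTAT"
    print(matching_positions(text, "TAC"))                    # exact matching
    print(matching_positions(text, "TA*"))                    # wildcard matching
    print(matching_positions(text, "TAC", max_mismatches=1))  # substring matching
```

This direct scan takes O(nm) character comparisons and reveals both inputs to whoever runs it, which is the baseline a secure protocol has to improve on.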
A secure version of pattern matching has many applications. For example, secure pattern matching can help secure databases that contain medical information such as DNA records, while still allowing one to perform pattern matching operations on such data. The need for privacy-preserving DNA matching has been highlighted in recent papers [10,11,12]. In addition to the case of DNA matching, where substring matching may be particularly useful, Hamming distance-based approximate matching has also been demonstrated in the case of secure facial recognition [3]. We note that both of these settings require computation over non-binary alphabets.
1.1 Our Contributions
This paper presents 5ecure Pattern Matching (or 5PM), a new protocol for arbitrary alphabets that addresses, in addition to exact matching, more expressive search queries including single-character wildcards and substring pattern matching, and also provides the ability to hide pattern length.
5PM has communication complexity sublinear in circuit size (as opposed to general MPC, which has communication complexity linear in circuit size) to securely compute non-binary substring matching in the malicious model. In addition, our extension of Hamming distance computation to substring matching has minimal overhead; our protocol makes a single computation pass per text element, even for multiple Hamming distance values, and therefore is able to securely compute non-binary substring matching efficiently (see Table 1 for a comparison of protocol functionality and Tables 2 and 3 for a comparison of protocol overhead).
5PM performs exact, single-character wildcard, and substring pattern matching in the honest-but-curious and malicious (static corruption) models. Our malicious model protocol requires $O((m+n)k^2)$ bandwidth complexity. Further, our protocol can be specified to require two (one-way) rounds of communication in the semi-honest model and eight (one-way) rounds of communication in the malicious model.
We construct our protocols by reducing the problems of Hamming distance and pattern matching, including single-character wildcards and substring matching, to a sequence of linear operations.
Footnote 2: Here and throughout, we use the DNA alphabet ($\Sigma = \{A, C, G, T\}$) for examples.

| Paper | Encryptions | Exponentiations | Multiplications | Bandwidth | Rds |
|---|---|---|---|---|---|
| [16] | $O(mn)$ | $O(mn)$ | $O(mn)$ | $O(mnk^2)$ | $O(1)$ |
| [14] | $O(n+m)$ | $O(n \log m)$ | $O(nm)$ | $O((n+m)k^2)$ | $O(1)$ |
| 5PM | $O(n+m)$ | $O(nm)$ | $O(nm)$ | $O((n+m)k^2)$ | 8 |

Table 2. Detailed comparison with [14] and [16] for single-character wildcards and substring matching in the malicious model, with text length $n$, pattern length $m$, security parameter $k$, and number of rounds Rds.

| Paper | Encryptions | Exponentiations | Multiplications | Bandwidth | Rds |
|---|---|---|---|---|---|
| [15] | $O(n+m)$ | $O(nm)$ | $O(nm)$ | $O((nm)k)$ | $O(1)$ |
| 5PM | $O(n+m)$ | $O(n+m)$ | $O(nm)$ | $O((n+m)k)$ | 2 |

Table 3. Detailed comparison with [15] for non-binary substring matching in the HBC model, with text length $n$, pattern length $m$, security parameter $k$, and number of rounds Rds.
We then rely on the observation that these linear operations, such as inner products and matrix multiplication, can be efficiently computed in the malicious model using additively homomorphic encryption schemes.
The security requirements (informally) dictate that the party holding the text learns nothing except the upper bound on the length of the pattern, while the party holding the pattern only learns either a binary (yes/no) answer for the decision problem or the matching positions (if any), and nothing else.
1.2 Comparison to Previous Work
Exact Matching. In the exact pattern matching setting, the algorithm of Freedman, Ishai, Pinkas and Reingold [13] achieves polylogarithmic overhead in $m$ and $n$ and polynomial overhead in security parameters in the honest-but-curious setting. Using efficient arguments [17,18] with the modern probabilistically checkable proofs (PCPs) of proximity [19], one can extend (at least asymptotically) their results to the malicious (static corruption) model. However, the protocol in [13] works only for exact matching and does not address more general problems, including single-character wildcards and substring matching, which are the main focus of our work. Other protocols that address secure exact matching (and not wildcard or substring matching) are [12,20,21,22,23,11]; of these, only [22] obtains (full) security in the malicious setting. We note that [23] is more efficient than [13], but only in the random oracle model; here, we are interested in standard security models.
Single-Character Wildcards and Substring Matching. Recently, Vergnaud [14] built on the work of Hazay and Toft [16] to construct an efficient secure pattern matching scheme for wildcard matching and substring matching (requiring $t$ runs over the preliminary matching result to search for $t$ different Hamming distance values, which is also required by 5PM) in the malicious adversary model. More specifically, [14,16] take advantage of the fact that $(p_i - t_i)^2$ equals 0 if binary values $p_i$ and $t_i$ are equal and 1 if they are not equal; therefore, binary Hamming distance can essentially be computed by counting the number of 1s in a particular polynomial-based computation. However, when $p_i$ and $t_i$ are non-binary, it is unknown how to obtain 0 when $p_i$ and $t_i$ are equal, and 1 (or some other fixed value) when they are not equal using oblivious polynomial evaluations.
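The binary identity used above, and the reason it does not carry over directly, can be written out as a short worked derivation. The numeric encoding $A, C, G, T \mapsto 0, 1, 2, 3$ in the second part is an assumption made only for this illustration.

```latex
% Binary case: for p_i, t_i in {0,1} we have p_i^2 = p_i and t_i^2 = t_i, so
\[
  (p_i - t_i)^2 = p_i + t_i - 2\,p_i t_i =
  \begin{cases}
    0 & \text{if } p_i = t_i,\\
    1 & \text{if } p_i \neq t_i,
  \end{cases}
  \qquad\text{hence}\qquad
  d_H(p, t) = \sum_{i=1}^{m} (p_i - t_i)^2 .
\]
% Non-binary case (assumed encoding A,C,G,T -> 0,1,2,3):
\[
  (T - C)^2 = (3 - 1)^2 = 4,
  \qquad
  (C - A)^2 = (1 - 0)^2 = 1,
\]
% so the squared difference of unequal symbols is not a fixed value,
% and summing it no longer counts mismatches.
```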
However, non-binary elements can be computed by unary encoding; that is, an element $\alpha \in \Sigma$ can be encoded as an element $\alpha' \in \{0, 1\}^{|\Sigma|}$ with all 0s except for a single 1 in the place representing $\alpha$ (lexicographically). There are two subtleties of such an approach. The first is that if $\alpha \neq \beta$, then $\alpha'$ and $\beta'$ will have Hamming distance 2 instead of 1; the second is that, in the malicious case, zero knowledge proofs are needed to demonstrate that $\alpha'$ is well formed.
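The first subtlety can be checked with a few lines of code. This is a small sketch with hypothetical function names, using the DNA alphabet from the paper's examples; it shows that the binary Hamming distance between unary encodings is exactly twice the non-binary Hamming distance of the original strings.

```python
def unary(ch, alphabet="ACGT"):
    """Unary (one-hot) encoding: a single 1 in the slot for ch."""
    return [1 if ch == a else 0 for a in alphabet]


def binary_hamming(u, v):
    """Binary Hamming distance as a sum of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nonbinary_hamming(s, t):
    """Number of positions where two equal-length strings differ."""
    return sum(1 for a, b in zip(s, t) if a != b)


def unary_hamming(s, t, alphabet="ACGT"):
    """Binary Hamming distance between the concatenated unary encodings."""
    def encode(word):
        return [bit for ch in word for bit in unary(ch, alphabet)]
    return binary_hamming(encode(s), encode(t))
```

Each mismatching symbol contributes two differing bits (one slot turns off and another turns on), so a protocol built on this encoding has to halve the count or compare against twice the threshold.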
[14] requires $O(m + n)$ encryptions, $O(n \log m)$ exponentiations, $O(nm)$ multiplications (of encrypted elements), and $O(n + m)$ bandwidth, all in a constant number of rounds. By contrast, 5PM has the same overhead except for $O(nm)$ exponentiations (see Table 2). However, our work is of interest for several reasons. The first is that we have implemented our protocol and believe it to be more efficient (additional work is needed on this front). The second is that our techniques are of independent interest and may be extended to additional functionalities. Finally, the protocol presented here is fully specified; by contrast, additional work is needed to transform the work of [14] into a protocol that can support non-binary alphabets for substring matching or to calculate Hamming distance in the malicious case.
Non-binary Hamming Distance. Jarrous and Pinkas [15] gave the first construction of a secure protocol for computing non-binary Hamming distances. In order to count the non-binary mismatches, they leverage 1-out-of-2 oblivious transfers. 5PM can also compute non-binary Hamming distance even when the text and pattern have the same length (and where the output is not blinded to only reveal whether or not a pattern match occurred). We note that [15] can be used to implement exact and substring matching with additional tools to blind Hamming distance output (for instance, see [14]). To compare two strings of length $n$, [15] requires $O(n)$ 1-out-of-2 OTs, $O(n)$ multiplications of encryptions and $O(nk)$ bandwidth, while 5PM requires $O(n)$ exponentiations (which require less computation than OTs), $O(n^2)$ multiplications, and $O(nk)$ bandwidth. The advantage of 5PM over [15] is twofold: the first is that 5PM is proven secure in the malicious model while [15] is not; the second is that 5PM, in both the honest-but-curious and malicious models, amortizes well in the substring matching setting, while [15] does not amortize because it cannot reuse OT outputs to compute substring matching (see Table 3).
Other Techniques. In the most general case, secure exact, approximate and single-character wildcard pattern matching is an instance of general secure two-party computation techniques (for instance, [24,25,26,27]). All of these schemes have bandwidth and computational complexity at best linear in the circuit size. For instance, a naive implementation of Yao [24] requires bandwidth $O(mn)$ in the security parameter. In contrast, we aim for a protocol where circuit size is $O(mn)$, yet we achieve communication complexity of $O(m + n)$.
Finally, we observe that with the construction of fully homomorphic encryption (FHE) schemes [28], the following “folklore” construction can be executed for any pattern matching algorithm: Client encrypts its pattern using an FHE scheme and sends it to Server. Server applies the appropriate pattern matching circuit to the encrypted pattern (where the circuit output is a yes/no indicating whether a match exists or not), and sends the FHE circuit output to Client. Client decrypts to obtain the answer. Such a scheme requires $O(m)$ bandwidth, but since FHE schemes are not yet practical, we view the 5PM protocol outlined here as an efficient and practical solution to secure pattern matching with single-character wildcards and substring matching.

2 Preliminaries
The rationale behind our secure 5PM protocol is based on a modification of an insecure pattern matching algorithm (IPM) [29] that can perform exact matching, exact matching with single-character wildcards, and substring matching within the same algorithm. In Section 3.1, we show how our modified algorithm can be reduced to basic linear operations whose secure and efficient evaluation allows us to obtain our 5PM protocol.
2.1 Insecure Pattern Matching (IPM) Algorithm
To illustrate how our modified algorithm works, we begin by describing how it performs exact
matching; we then show how it handles single-character wildcards and substring matching.
2.1.1 Exact Matching. IPM involves the following steps:
a. Inputs: An alphabet Σ, a text T Σ
n
and a pattern p Σ
m
.
b. Initialization: For each character in Σ, the algorithm constructs a vector, here termed a
Character Delay Vector (CDV ), of length equal to the pattern length, m. These vectors
are initialized with zeros. For example, if the pattern is: T ACT over Σ = {A, C, G, T }, then
the CDV s will be initialized to: CDV (A) = [0, 0, 0, 0], CDV (C) = [0, 0, 0, 0], CDV (G) =
[0, 0, 0, 0] and CDV (T ) = [0, 0, 0, 0].
c. Pattern preprocessing: For each pattern character $p_i$ ($i \in \{1, ..., m\}$), a delay value, $d^r_{p_i}$, is
computed to be the number of characters from $p_i$ to the end of the pattern, i.e., $d^r_{p_i} = m - i$
for the rth occurrence of $p_i$ in p. The $d^r_{p_i}$th position of CDV($p_i$) is set to 1. For example, the
CDVs of TACT would be:
CDV(A) = [0, 0, 1, 0] because $d^1_A = 4 - 2 = 2$
CDV(C) = [0, 1, 0, 0] because $d^1_C = 4 - 3 = 1$
CDV(G) = [0, 0, 0, 0] because $G \notin p$
CDV(T) = [1, 0, 0, 1] because $d^1_T = 4 - 4 = 0$ and $d^2_T = 4 - 1 = 3$
d. Matching pass and comparison with pattern length: A vector of length n called the Activation
Vector (AV) is constructed, and its elements are initialized with zeros. For each input
text character $T_j$, CDV($T_j$) is added element-wise to the AV from position j to position
$\min(n, j + m - 1)$. To determine if there was a pattern match in the text, after these operations
the algorithm checks (when $j \ge m$) if $AV_j = m$. If so, then the match started at position
$j - m + 1$. The value $j - m + 1$ is added to the set of matching positions (MP). Note that
$m - AV_j$ is the non-binary Hamming distance between the pattern and the text window starting at
position $j - m + 1$.
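Steps a to d can be sketched in a few lines of Python. This is only an illustrative, insecure sketch under stated assumptions: the function names `build_cdvs` and `ipm_exact` are hypothetical, text positions are 1-based as in the description, CDV positions are 0-based (as in the TACT example), and every text character is assumed to belong to the alphabet.

```python
from typing import Dict, List, Tuple


def build_cdvs(pattern: str, alphabet: str) -> Dict[str, List[int]]:
    """Steps b and c: one Character Delay Vector (CDV) per alphabet symbol."""
    m = len(pattern)
    cdvs = {c: [0] * m for c in alphabet}      # step b: all-zero vectors
    for i, ch in enumerate(pattern, start=1):  # i is the 1-based pattern index
        cdvs[ch][m - i] = 1                    # step c: delay m - i as a 0-based position
    return cdvs


def ipm_exact(text: str, pattern: str, alphabet: str) -> Tuple[List[int], List[int]]:
    """Step d: return (AV, MP), both using 1-based text positions."""
    n, m = len(text), len(pattern)
    cdvs = build_cdvs(pattern, alphabet)
    av = [0] * (n + 1)  # av[0] is unused so that av[j] corresponds to AV_j
    matches = []
    for j in range(1, n + 1):
        cdv = cdvs[text[j - 1]]
        # Add CDV(T_j) element-wise to AV_j .. AV_min(n, j + m - 1).
        for d in range(min(m, n - j + 1)):
            av[j + d] += cdv[d]
        # AV_j is final at this point: later characters only touch positions > j.
        if j >= m and av[j] == m:
            matches.append(j - m + 1)
    return av[1:], matches


if __name__ == "__main__":
    av, mp = ipm_exact("GTACTACT", "TACT", "ACGT")
    print(av, mp)
```

For the text GTACTACT and the pattern TACT, the activation vector reaches the value 4 at positions 5 and 8, so the reported match positions are 2 and 5.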
The intuition behind the algorithm is that when an input text character matches a character
in the pattern, the algorithm optimistically assumes that the following characters will correspond
to the rest of the pattern characters. It then adds a 1 at the position in the activation vector
several steps ahead, where it would expect the pattern to end (if the character appears in multiple
positions in the pattern, it adds a 1 to all the corresponding positions where the pattern might
end). If all subsequent characters are indeed characters in the pattern, then at the position where
a pattern would end the number of added 1s will sum up to the pattern length; otherwise the sum
will be strictly less than the pattern length. This algorithm does not incur false positives and always
indicates when (and where) a pattern occurs if it exists, as shown in [29].
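The intuition can also be written as a short derivation. The bracket notation $[\cdot]$ (1 if the condition holds, 0 otherwise) and the 0-based indexing of CDV entries are assumptions introduced here for illustration; they are not notation from the paper. For $j \ge m$, collecting every contribution that lands on position j gives:

```latex
\begin{align*}
AV_j &= \sum_{d=0}^{m-1} CDV(T_{j-d})[d]
      = \sum_{d=0}^{m-1} \bigl[\, p_{m-d} = T_{j-d} \,\bigr] \\
     &= \sum_{i=1}^{m} \bigl[\, p_i = T_{j-m+i} \,\bigr]
      \qquad \text{(substituting } i = m - d \text{)}
\end{align*}
```

Under these assumptions, $AV_j$ counts the positions at which the pattern agrees with the text window $T_{j-m+1}, \ldots, T_j$. It equals m exactly when the whole window matches, and $m - AV_j$ is the number of mismatching positions.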

Citations
Proceedings ArticleDOI
08 Nov 2013
TL;DR: This paper makes use of the somewhat homomorphic encryption scheme presented by Lauter, Naehrig and Vaikuntanathan (ACM CCSW 2011), which supports a limited number of both additions and multiplications on encrypted data and proposes a new packing method suitable for an efficient computation of multiple Hamming distance values on encrypted data.
Abstract: The basic pattern matching problem is to find the locations where a pattern occurs in a text. Recently, secure pattern matching has received much attention in various areas, including privacy-preserving DNA matching and secure biometric authentication. The aim of this paper is to give a practical solution for this problem using homomorphic encryption, which is public key encryption supporting some operations on encrypted data. In this paper, we make use of the somewhat homomorphic encryption scheme presented by Lauter, Naehrig and Vaikuntanathan (ACM CCSW 2011), which supports a limited number of both additions and multiplications on encrypted data. In their work, some message encoding techniques are also presented for enabling us to efficiently compute sums and products over the integers. Based on their techniques, we propose a new packing method suitable for an efficient computation of multiple Hamming distance values on encrypted data. Our main extension gives two types of packed ciphertexts, and a linear computation over packed ciphertexts gives our desired results. We implemented the scheme with our packing method. Our experiments ran in an Intel Xeon at 3.07 GHz with our software library using inline assembly language in C programs. Our optimized implementation shows that the packed encryption of a text or a pattern, the computation of multiple Hamming distance values over packed ciphertexts, and the decryption respectively take about 3.65 milliseconds (ms), 5.31 ms, and 3.47 ms for secure exact and approximate pattern matching of a binary text of length 2048. The total time is about 12.43 ms, which would give the practical performance in real life. Our method gives both faster performance and lower communication than the state-of-the-art work for a binary text of several thousand bits in length.

130 citations


Cites methods from "5PM: Secure pattern matching"

  • ...5PM [3] O((k + )λ) 2 (semi-honest) practical in using exact, approximate, (MPC-based) O((k + )λ(2)) 8 (malicious) k ≤ 1000 ∼ 10000 wildcards, substring FHE O( ) 2 impractical any Our work O( /n ) 2 faster than 5PM exact, approximate (SHE scheme)...

    [...]

  • ...in [3] presented an efficient twoparties protocol of secure pattern matching for more expressive search queries including single character wildcards and substring pattern matching....

    [...]

Journal ArticleDOI
01 Jun 2015
TL;DR: It is proved security of the substring-searchable encryption scheme against malicious adversaries, where the query protocol leaks limited information about memory access patterns through the suffix tree of the encrypted string.
Abstract: In this paper, we consider a setting where a client wants to outsource storage of a large amount of private data and then perform substring search queries on the data – given a data string s and a search string p, find all occurrences of p as a substring of s. First, we formalize an encryption paradigm that we call queryable encryption, which generalizes searchable symmetric encryption (SSE) and structured encryption. Then, we construct a queryable encryption scheme for substring queries. Our construction uses suffix trees and achieves asymptotic efficiency comparable to that of unencrypted suffix trees. Encryption of a string of length n takes O(λn) time and produces a ciphertext of size O(λn), and querying for a substring of length m that occurs k times takes O(λm + k) time and three rounds of communication. Our security definition guarantees correctness of query results and privacy of data and queries against a malicious adversary. Following the line of work started by Curtmola et al. (ACM CCS 2006), in order to construct more efficient schemes we allow the query protocol to leak some limited information that is captured precisely in the definition. We prove security of our substring-searchable encryption scheme against malicious adversaries, where the query protocol leaks limited information about memory access patterns through the suffix tree of the encrypted string.

77 citations


Cites background from "5PM: Secure pattern matching"

  • ...These works take the approach of considering a specific type of query and identifying a data structure that allows efficient evaluation of those queries in an unencrypted setting....

    [...]

Journal ArticleDOI
TL;DR: The construction guarantees full simulation in the presence of malicious, polynomial-time adversaries (assuming the hardness of DDH assumption) and exhibits computation and communication costs of O(n+m) group elements in a constant round complexity.
Abstract: We propose a protocol for the problem of secure two-party pattern matching, where Alice holds a text $t \in \{0,1\}^*$ of length n, while Bob has a pattern $p \in \{0,1\}^*$ of length m. The goal is for Bob to (only) learn where his pattern occurs in Alice's text, while Alice learns nothing. Private pattern matching is an important problem that has many applications in the area of DNA search, computational biology and more. Our construction guarantees full simulation in the presence of malicious, polynomial-time adversaries (assuming the hardness of DDH assumption) and exhibits computation and communication costs of O(n+m) group elements in a constant round complexity. This improves over previous work by Gennaro et al. (Public Key Cryptography, pp. 145-160, 2010) whose solution requires overhead of O(nm) group elements and exponentiations in O(m) rounds. In addition to the above, we propose a collection of protocols for important variations of the secure pattern matching problem that are significantly more efficient than the current state of art solutions: First, we deal with secure pattern matching with wildcards. In this variant the pattern may contain wildcards that match both 0 and 1. Our protocol requires O(n+m) communication and O(1) rounds using O(nm) computation. Then we treat secure approximate pattern matching. In this variant the matches may be approximated, i.e., have Hamming distance less than some threshold, $\tau$. Our protocol requires O(n$\tau$) communication in O(1) rounds using O(nm) computation. Third, we have secure pattern matching with hidden pattern length. Here, the length, m, of Bob's pattern remains a secret. Our protocol requires O(n+M) communication in O(1) rounds using O(n+M) computation, where M is an upper bound on m. Finally, we have secure pattern matching with hidden text length. In this variant the length, n, of Alice's text remains a secret. Our protocol requires O(N+m) communication in O(1) rounds using O(N+m) computation, where N is an upper bound on n.

67 citations


Cites background or result from "5PM: Secure pattern matching"

  • ...Finally, the work of [6] studies pattern matching with wildcards in the malicious setting and achieves similar costs to our protocols but for larger alphabets....

    [...]

  • ...[6] studies the problem of pattern matching with wildcards in a more general sense of non-binary alphabet, implementing a different algorithm based on linear algebra formulation and additive homomorphic encryption....

    [...]

Journal ArticleDOI
TL;DR: This paper presents two types of packed ciphertexts, one of which is based on the message encoding technique proposed by Brakerski and Vaikuntanathan, and enables efficient secure computation of more complex functionalities such as multiple inner products and multiple Hamming distances.
Abstract: Somewhat homomorphic encryption is public key encryption supporting a limited number of additions and multiplications on encrypted data. This encryption gives a powerful tool in performing meaningful computations with protecting data confidentiality, whose property is suitable mainly in cloud computing. In this paper, we focus on the scheme proposed by Brakerski and Vaikuntanathan, and present two types of packed ciphertexts in order to improve performance and reduce size of the encrypted data. One type of our packed ciphertexts is based on the message encoding technique proposed by Lauter, Naehrig and Vaikuntanathan. While their technique empowers efficient secure computation of sums and products over the integers, our second type of packed ciphertexts enables efficient secure computation of more complex functionalities such as multiple inner products and multiple Hamming distances. We apply our packing method to construct several protocols for secure biometric authentication and secure pattern matching computations. Our implementation shows that our method gives faster performance than the state-of-the-art work in such applications. Copyright © 2015 John Wiley & Sons, Ltd.

35 citations


Cites background or methods from "5PM: Secure pattern matching"

  • ...(A) integer comparison based pattern matching [41], (B) fast Fourier transform based protocol [48], (C) matrix multiplication-based pattern matching [49] and (D) garbled circuit-based text processing [43]....

    [...]

  • ...The authors in [49] call their work on (C) “5PM (5ecure Pattern Matching),” and they propose the notion of a Character Delay Vector (CDV), which enables to efficiently compute the position where a matching pattern could possibly end....

    [...]

  • ...5PM [49] Paillier scheme O((k + `) ) bandwidth 670 ms for k = 1, 000 and ` = 100 Our work Polynomial LWE scheme O(d`/ne) bandwidth 12....

    [...]

  • ...Unfortunately, only implementation results of H 5PM has been reported in the full version paper of [49]....

    [...]

  • ...The protocol H 5PM requires only two rounds of communication between two parties (see Table 5 in the full version paper of [49]), which is the same as ours in the matching phase of Section 5....

    [...]

Proceedings ArticleDOI
24 Aug 2015
TL;DR: This paper proposes a scheme for Generalized Pattern-matching String-search on Encrypted data (GPSE) in cloud systems and implements two most commonly used pattern matching search functions on encrypted data, the substring matching and the longest-prefix-first matching.
Abstract: Searchable encryption is an important and challenging issue. It allows people to search on encrypted data. This is a very useful function when more and more people choose to host their data in the cloud and the cloud server is not fully trustable. Existing solutions for searchable encryption are only limited to some simple functions of search, such as boolean search or similarity search. In this paper, we propose a scheme for Generalized Pattern-matching String-search on Encrypted data (GPSE) in cloud systems. GPSE allows users to specify their search queries by using generalized wildcard-based string patterns (such as SQL-like patterns). It gives users great expressive power in specifying highly targeted search queries. In the framework of GPSE, we particularly implemented two most commonly used pattern matching search functions on encrypted data, the substring matching and the longest-prefix-first matching. We also prove that GPSE is secure under the known-plaintext model. Experiments over real data sets show that GPSE achieves high search accuracy.

31 citations


Cites background or methods from "5PM: Secure pattern matching"

  • ...To evaluate the matching degree between 𝑠𝑖 and 𝑠𝑝, we introduce a metric called weighted Euclidean distance that can compare the fingerprint of two strings....

    [...]

  • ...There are some works [9]–[15] that use secure multi-party computation to achieve string pattern matching without revealing each party’s own information to others....

    [...]

References
Book
01 Jan 2000
TL;DR: This book presents a rigorous and systematic treatment of the foundational issues of cryptography: defining cryptographic tasks and solving new cryptographic problems using existing tools, focusing on the basic mathematical tools: computational difficulty, pseudorandomness and zero-knowledge proofs.
Abstract: From the Publisher: This book presents a rigorous and systematic treatment of the foundational issues of cryptography: defining cryptographic tasks and solving new cryptographic problems using existing tools. It focuses on the basic mathematical tools: computational difficulty (one-way functions), pseudorandomness and zero-knowledge proofs. Rather than describing ad hoc approaches, this book emphasizes the clarification of fundamental concepts and the demonstration of the feasibility of solving cryptographic problems.

1,226 citations

Proceedings Article
01 Jul 1989
TL;DR: In this paper, practical non-interactive public key systems are proposed which allow the reuse of the shared secret key since the key is not revealed either to insiders or to outsiders.
Abstract: In a society oriented cryptography it is better to have a public key for the company (organization) than having one for each individual employee [Des88]. Certainly in emergency situations, power is shared in many organizations. Solutions to this problem were presented [Des88], based on [GMW87], but are completely impractical and interactive. In this paper practical non-interactive public key systems are proposed which allow the reuse of the shared secret key since the key is not revealed either to insiders or to outsiders.

1,088 citations

Book ChapterDOI
11 May 1997
TL;DR: A new multi-authority secret-ballot election scheme that guarantees privacy, universal verifiability, and robustness is presented, and is the first scheme for which the performance is optimal in the sense that time and communication complexity is minimal both for the individual voters and the authorities.
Abstract: In this paper we present a new multi-authority secret-ballot election scheme that guarantees privacy, universal verifiability, and robustness. It is the first scheme for which the performance is optimal in the sense that time and communication complexity is minimal both for the individual voters and the authorities. An interesting property of the scheme is that the time and communication complexity for the voter is independent of the number of authorities. A voter simply posts a single encrypted message accompanied by a compact proof that it contains a valid vote. Our result is complementary to the result by Cramer, Franklin, Schoenmakers, and Yung in the sense that in their scheme the work for voters is linear in the number of authorities but can be instantiated to yield information-theoretic privacy, while in our scheme the voter's effort is independent of the number of authorities but always provides computational privacy-protection. We will also point out that the majority of proposed voting schemes provide computational privacy only (often without even considering the lack of information-theoretic privacy), and that our new scheme is by far superior to those schemes.

897 citations

Proceedings ArticleDOI
07 Aug 2002
TL;DR: It is shown that while, in some cases, tree-merge algorithms can have performance comparable to stack-tree algorithms, in many cases they are considerably worse, and this behavior is explained by analytical results that demonstrate that, on sorted inputs, the stack-tree algorithms have worst-case I/O and CPU complexities linear in the sum of the sizes of inputs and output, while the tree-merge algorithms do not have the same guarantee.
Abstract: XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree structured relationships. The primitive tree structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these relationships in an XML database is a core operation for XML query processing. We develop two families of structural join algorithms for this task: tree-merge and stack-tree. The tree-merge algorithms are a natural extension of traditional merge joins and the multi-predicate merge joins, while the stack-tree algorithms have no counterpart in traditional relational join processing. We present experimental results on a range of data and queries using the TIMBER native XML query engine built on top of SHORE. We show that while, in some cases, tree-merge algorithms can have performance comparable to stack-tree algorithms, in many cases they are considerably worse. This behavior is explained by analytical results that demonstrate that, on sorted inputs, the stack-tree algorithms have worst-case I/O and CPU complexities linear in the sum of the sizes of inputs and output, while the tree-merge algorithms do not have the same guarantee.

895 citations

Book ChapterDOI
17 Aug 2008
TL;DR: A simple and efficient compiler is presented for transforming secure multi-party computation protocols that enjoy security only with an honest majority into MPC protocols that guarantee security with no honest majority, in the oblivious-transfer (OT) hybrid model.
Abstract: We present a simple and efficient compiler for transforming secure multi-party computation (MPC) protocols that enjoy security only with an honest majority into MPC protocols that guarantee security with no honest majority, in the oblivious-transfer (OT) hybrid model. Our technique works by combining a secure protocol in the honest majority setting with a protocol achieving only security against semi-honest parties in the setting of no honest majority. Applying our compiler to variants of protocols from the literature, we get several applications for secure two-party computation and for MPC with no honest majority. These include: Constant-rate two-party computation in the OT-hybrid model. We obtain a statistically UC-secure two-party protocol in the OT-hybrid model that can evaluate a general circuit C of size s and depth d with a total communication complexity of O(s) + poly(k, d, log s) and O(d) rounds. The above result generalizes to a constant number of parties. Extending OTs in the malicious model. We obtain a computationally efficient protocol for generating many string OTs from few string OTs with only a constant amortized communication overhead compared to the total length of the string OTs. Black-box constructions for constant-round MPC with no honest majority. We obtain general computationally UC-secure MPC protocols in the OT-hybrid model that use only a constant number of rounds, and only make a black-box access to a pseudorandom generator. This gives the first constant-round protocols for three or more parties that only make a black-box use of cryptographic primitives (and avoid expensive zero-knowledge proofs).

635 citations