# 5PM: Secure pattern matching

TL;DR: This paper considers secure pattern matching with single-character wildcards and substring matching in the malicious stand-alone setting, and presents 5PM, the first secure expressive pattern matching protocol designed to optimize round complexity by carefully specifying the entire protocol round by round.

Abstract: In this paper we consider the problem of secure pattern matching that allows single-character wildcards and substring matching in the malicious stand-alone setting. Our protocol, called 5PM, is executed between two parties: Server, holding a text of length n, and Client, holding a pattern of length m to be matched against the text, where our notion of matching is more general than traditionally considered and includes non-binary alphabets, non-binary Hamming distance and non-binary substring matching. 5PM is the first secure expressive pattern matching protocol designed to optimize round complexity by carefully specifying the entire protocol round by round. 5PM requires only eight rounds in the malicious static corruptions model. In the malicious model, 5PM requires O((m+n)k^2) communication complexity and O(m+n) encryptions, where m is the pattern length and n is the text length. Further, 5PM can hide pattern size with no asymptotic additional costs in either computation or bandwidth.

## Summary

### 1 Introduction

- Pattern matching is fundamental to computer science.
- The work was also supported by the OKAWA Foundation Research Award, IBM Faculty Research Award, Xerox Faculty Research Award, B. John Garrick Foundation Award, Teradata Research Award and Lockheed-Martin Corporation Research Award.
- A secure version of pattern matching has many applications; one of the variants considered is substring pattern matching.

### 1.1 Our Contributions

- This paper presents 5ecure Pattern Matching (or 5PM), a new protocol for arbitrary alphabets that addresses, in addition to exact matching, more expressive search queries including single-character wildcards and substring pattern matching, and also provides the ability to hide pattern length.
- 5PM has communication complexity sublinear in circuit size (as opposed to general MPC, which has communication complexity linear in circuit size) to securely compute non-binary substring matching in the malicious model.
- The authors' malicious-model protocol requires O((m+n)k^2) bandwidth complexity.
- Here and throughout, the authors use the DNA alphabet (Σ = {A, C, G, T}) for examples.

### 1.2 Comparison to Previous Work

- In the exact pattern matching setting, the algorithm of Freedman, Ishai, Pinkas and Reingold [13] achieves polylogarithmic overhead in m and n and polynomial overhead in security parameters in the honest-but-curious setting.
- Recently, Vergnaud [14] built on the work of Hazay and Toft [16] to construct an efficient secure pattern matching scheme for wildcard matching and substring matching (requiring t runs over the preliminary matching result to search for t different Hamming distance values, which is also required by 5PM) in the malicious adversary model.
- By contrast, 5PM has the same overhead except for O(nm) exponentiations (see Table 2).
- The second is that their techniques are of independent interest and may be extended to additional functionalities.
- Jarrous and Pinkas [15] gave the first construction of a secure protocol for computing non-binary Hamming distances.

### 2 Preliminaries

- The rationale behind their secure 5PM protocol is based on a modification of an insecure pattern matching algorithm (IPM) [29] that can perform exact matching, exact matching with single-character wildcards, and substring matching within the same algorithm.
- In Section 3.1, the authors show how their modified algorithm can be reduced to basic linear operations whose secure and efficient evaluation allows us to obtain their 5PM protocol.

### 2.1 Insecure Pattern Matching (IPM) Algorithm

- To illustrate how their modified algorithm works, the authors begin by describing how it performs exact matching; they then show how it handles single-character wildcards and substring matching.
- IPM involves the following steps: a. Inputs:.
- It then adds a 1 at the position in the activation vector several steps ahead, where it would expect the pattern to end (if the character appears in multiple positions in the pattern, it adds a 1 to all the corresponding positions where the pattern might end).
- The activation vector will be initialized to all zeros.
- This operation does not incur any false positives for the same reason that the exact matching IPM algorithm does not: there, for each pattern p, there is only one encoding into CDVs and only one sequence of adding CDVs as one moves along the text that could add up to m.
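
The exact-matching, wildcard, and substring behaviour described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' reference code: the names `gen_cdv` and `ipm`, the `*` wildcard symbol, the 0-indexed delay convention (pattern position `i` gets a 1 at delay `m - 1 - i`), and the "at least `threshold` matching positions" rule for substring matching are all choices made here for illustration.

```python
def gen_cdv(pattern, wildcard="*"):
    """Build one Character Delay Vector (CDV) per pattern character.

    cdv[c][d] == 1 means: if c is seen in the text now, the pattern
    could end d positions later. Wildcards get no entry (assumption).
    """
    m = len(pattern)
    cdv = {}
    for i, ch in enumerate(pattern):
        if ch != wildcard:
            cdv.setdefault(ch, [0] * m)[m - 1 - i] = 1
    return cdv


def ipm(text, pattern, threshold=None, wildcard="*"):
    """Return 0-indexed start positions where the pattern matches.

    threshold defaults to the number of non-wildcard characters,
    which gives exact matching with wildcards.
    """
    m, n = len(pattern), len(text)
    if threshold is None:
        threshold = sum(ch != wildcard for ch in pattern)
    cdv = gen_cdv(pattern, wildcard)
    av = [0] * (n + m - 1)  # activation vector, initialized to zeros
    for j, ch in enumerate(text):
        vec = cdv.get(ch)
        if vec is None:
            continue  # character does not occur in the pattern
        for d, bit in enumerate(vec):
            av[j + d] += bit  # add the CDV starting at text position j
    # a pattern ending at position e started at e - m + 1
    return [e - m + 1 for e in range(m - 1, n) if av[e] >= threshold]
```

For example, `ipm("GATTACA", "TTA")` reports a single match starting at index 2, and lowering `threshold` turns the same scan into an approximate search that tolerates mismatches.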

### 2.2 Preliminary Cryptographic Tools

- This section outlines preliminary cryptographic tools required for their protocols.
- The authors make use of additively homomorphic semantically secure encryption schemes.
- For concreteness, in the rest of this paper the authors concentrate on the additively homomorphic ElGamal encryption scheme whose security depends on the Decisional Diffie-Hellman (DDH) computational hardness assumption.
- While the authors use threshold ElGamal, in practice, any scheme is acceptable if it satisfies the required properties and supports the needed zero-knowledge arguments.
- For the malicious model protocol, the authors will make use of perfectly hiding, computationally binding commitment schemes (for further discussion, see [33]).
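
As a rough illustration of the kind of scheme this section refers to, the following sketch implements additively homomorphic ("exponential") ElGamal with the secret key split additively between two parties. It is a toy under loudly stated assumptions: the 11-bit safe prime `P = 2039`, the generator `G = 4`, and all function names are hypothetical choices for readability, Python's `random` is not a cryptographic RNG, and the brute-force decoding only works for small plaintexts. It is not the paper's instantiation.

```python
import random

# Toy group (assumption): P = 2*Q + 1 with Q prime, G generates the order-Q subgroup.
P, Q, G = 2039, 1019, 4


def share_keygen():
    """Each party picks a secret share s and publishes g^s."""
    s = random.randrange(1, Q)
    return s, pow(G, s, P)


def joint_pk(h_client, h_server):
    """Joint public key g^(sC + sS)."""
    return h_client * h_server % P


def enc(pk, m):
    """Encrypt m 'in the exponent': (g^r, g^m * pk^r)."""
    r = random.randrange(1, Q)
    return pow(G, r, P), pow(G, m % Q, P) * pow(pk, r, P) % P


def add(c, d):
    """Homomorphic addition: multiply ciphertexts component-wise."""
    return c[0] * d[0] % P, c[1] * d[1] % P


def partial_dec(share, c):
    """Strip one party's share: divide c2 by c1^share (inverse via Fermat)."""
    c1, c2 = c
    return c1, c2 * pow(pow(c1, share, P), P - 2, P) % P


def decode_small(c, max_m=100):
    """After both partial decryptions c2 == g^m; recover small m by search."""
    for m in range(max_m + 1):
        if pow(G, m, P) == c[1]:
            return m
    raise ValueError("plaintext out of range")
```

A usage pattern would be: both parties run `share_keygen`, combine the public parts with `joint_pk`, and a ciphertext is only readable after `partial_dec` has been applied with both shares.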

### 2.3 Computing Linear Operations Using Additively Homomorphic Encryption Schemes.

- The authors' secure pattern-matching protocol relies on the following observations about linear operations and additively homomorphic encryption schemes.
- In what follows, let E be the encryption algorithm for an additively homomorphic encryption scheme for key pair (pk, sk).
- Suppose that P1 possesses pk, Epk(A) (the entry-wise encryption of A), and the unencrypted matrix B. Then P1 can compute Epk(A·B), the encryption of the product of A and B under the same pk.
- More specifically, an affine hash function Zklq →.
- Only with probability 1/q will the decryptions equal each other when A ≠ B because the hash function is chosen uniformly at random.
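
The observation that a party holding pk, an entry-wise encryption of A, and a plaintext matrix B can compute an encryption of A·B can be sketched as follows. This is a toy under assumptions: it uses exponential ElGamal over a tiny group (`P = 2039`, `G = 4`) purely so the example runs with the standard library, and every function name is hypothetical. Only ciphertext addition and multiplication by a known scalar are needed.

```python
import random

# Toy parameters (assumption): safe prime P = 2*Q + 1, G of prime order Q.
P, Q, G = 2039, 1019, 4


def keygen():
    sk = random.randrange(1, Q)
    return sk, pow(G, sk, P)


def enc(pk, m):
    r = random.randrange(1, Q)
    return pow(G, r, P), pow(G, m % Q, P) * pow(pk, r, P) % P


def add(c, d):
    """E(x) * E(y) = E(x + y)."""
    return c[0] * d[0] % P, c[1] * d[1] % P


def scale(c, k):
    """E(x)^k = E(k * x) for a plaintext scalar k >= 0."""
    return pow(c[0], k, P), pow(c[1], k, P)


def dec(sk, c, max_m=200):
    """Recover a small plaintext by searching for the exponent."""
    gm = c[1] * pow(pow(c[0], sk, P), P - 2, P) % P
    for m in range(max_m + 1):
        if pow(G, m, P) == gm:
            return m
    raise ValueError("plaintext out of range")


def enc_matmul(enc_a, b):
    """Compute E(A * B) from entry-wise E(A) and plaintext B, without sk."""
    out = []
    for enc_row in enc_a:
        row = []
        for j in range(len(b[0])):
            acc = (1, 1)  # trivial encryption of 0
            for t, c in enumerate(enc_row):
                acc = add(acc, scale(c, b[t][j]))
            row.append(acc)
        out.append(row)
    return out
```

Decrypting each entry of `enc_matmul(E(A), B)` with `dec` gives the ordinary matrix product, as long as the entries stay within the small range the toy decoder searches.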

### 3 5PM Protocol

- This section uses the above observations and cryptographic tools to construct the secure pattern-matching protocol (5PM).
- The authors develop πH5PM for the honest-but-curious adversary model and πM5PM for the malicious (static corruption) adversary model.

### 3.1 Converting IPM to Linear Operations.

- In reality, since MT and MCDV are 0/1 matrices, multiplication is more computationally expensive than necessary, and vectors can simply be selected (as shown in the IPM description in Section 2.1).
- This transformation, jointly with the previous step, constructs a matrix of CDVs where the ith row contains only CDV(Ti), which starts in the ith position in the ith row (sets up step d in Section 2.1.1).
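
The linear-algebra view of IPM can be sketched with plain Python lists. This is an illustrative reading of the summary, not the paper's exact definitions: the helper names, the one-hot layout of MT (one row per text character), the layout of MCDV (one row per alphabet symbol), and the final column sum are assumptions made here.

```python
def one_hot_text(text, alphabet):
    """MT: n x |alphabet| 0/1 matrix, row i selects the symbol text[i]."""
    return [[1 if ch == a else 0 for a in alphabet] for ch in text]


def cdv_matrix(pattern, alphabet):
    """MCDV: |alphabet| x m 0/1 matrix, row for symbol a is CDV(a)."""
    m = len(pattern)
    return [[1 if pattern[m - 1 - d] == a else 0 for d in range(m)]
            for a in alphabet]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def activation_vector(text, pattern, alphabet):
    n = len(text)
    # Row i of the product is CDV(text[i]).
    prod = matmul(one_hot_text(text, alphabet), cdv_matrix(pattern, alphabet))
    # Shift row i so that its CDV starts in position i.
    shifted = [[0] * i + row + [0] * (n - 1 - i) for i, row in enumerate(prod)]
    # Summing the columns yields the activation vector.
    return [sum(col) for col in zip(*shifted)]
```

With the text `GATTACA` and pattern `TTA`, the entry at index 4 of the activation vector equals the pattern length 3, which marks a match ending at text position 4.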

### 3.2 Honest-But-Curious (HBC) 5PM Protocol

- The authors begin by describing the intuition behind required modifications to secure IPM in the HBC adversary model.
- When Client sends Server E(MCDV), by the reasoning of Sections 2.3 and 3.1, Server can compute E(AV), an encrypted activation vector, using only MT and E(MCDV).
- The authors refer the reader to Sections 3.1 and 2.3.2 for the notation used here.
- The protocol operation is as follows: (a) Client computes (sk, pk) ← Key(1^k) using the key generation algorithm of an additively homomorphic encryption scheme, E. (b) Client computes MCDV ← GenCDV(p).
- In particular, πH5PM does not require multiple independent protocol executions to compute substring matching for a range of substring length values.
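
The two-message flow of the honest-but-curious protocol can be sketched end to end. This is a simplified, hypothetical rendering: the toy exponential-ElGamal group (`P = 2039`, `G = 4`), the function names, and in particular the blinding rule (subtract the threshold, then multiply by a random nonzero scalar so that only exact hits decrypt to zero) are assumptions made for illustration and are not taken from the protocol specification.

```python
import random

P, Q, G = 2039, 1019, 4  # toy group (assumption), Q prime, pattern length < Q


def keygen():
    sk = random.randrange(1, Q)
    return sk, pow(G, sk, P)


def enc(pk, m):
    r = random.randrange(1, Q)
    return pow(G, r, P), pow(G, m % Q, P) * pow(pk, r, P) % P


def add(c, d):
    return c[0] * d[0] % P, c[1] * d[1] % P


def scale(c, k):
    return pow(c[0], k, P), pow(c[1], k, P)


def is_zero(sk, c):
    """Client-side check: the ciphertext encrypts 0 iff c2 == c1^sk."""
    return c[1] == pow(c[0], sk, P)


def client_encrypt_cdvs(pk, pattern, alphabet):
    """Message 1: entry-wise encryption of one CDV per alphabet symbol."""
    m = len(pattern)
    return {a: [enc(pk, int(pattern[m - 1 - d] == a)) for d in range(m)]
            for a in alphabet}


def server_search(pk, enc_cdv, text, m, threshold):
    """Message 2: encrypted, blinded activation vector (Server never decrypts)."""
    n = len(text)
    av = [(1, 1)] * (n + m - 1)  # trivial encryptions of 0
    for j, ch in enumerate(text):
        for d, c in enumerate(enc_cdv[ch]):  # select the CDV for text[j]
            av[j + d] = add(av[j + d], c)
    minus_t = enc(pk, -threshold)
    return [scale(add(av[e], minus_t), random.randrange(1, Q))
            for e in range(m - 1, n)]


def client_decode(sk, blinded):
    """Zero entries mark match start positions; other entries look random."""
    return [s for s, c in enumerate(blinded) if is_zero(sk, c)]
```

In this sketch Server learns nothing about the pattern beyond its length, and Client sees only which positions hit the threshold, because every non-matching entry has been multiplied by a fresh random scalar.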

### 3.3 Malicious Model 5PM Protocol

- The authors describe an instantiation of πM5PM based on additively homomorphic threshold ElGamal encryption (see Section 2.2) for concreteness; generalization to other encryption schemes follows provided they have efficient Σ protocols for the statements required here.
- Second, the authors give interactive zero-knowledge consistency arguments that will be required.
- Finally, the authors divide πM5PM into six subprotocols and describe their construction and how they are combined into the final protocol πM5PM.
- ΠS,AV is a two-party protocol executed between Client and Server which outputs to Server an encrypted activation vector corresponding to matching Client’s p against Server’s T.

### 4.1 Definitions

- The authors consider interactive protocols that have the following specification: a. P sends message a, |a| ∈ poly(|x|).
- Σ protocols that only have standard soundness will not always satisfy the lemma.
- For all x such that there does not exist a w with (x,w) ∈ R, V will only accept with negligible probability.
- The authors first construct an extractable equivocable commitment scheme and use this scheme together with the Σ protocol specification for the ZK-AoK construction.

### 4.2 Extractable Equivocable Commitment Schemes

- Such a scheme is an interactive protocol between a PPT committer C and a PPT receiver R consisting of three functions: EComSet instantiates the commitment scheme, com computes the commitment, and EComVer verifies that decommitment is valid.
- For pk correctly constructed and any messages s and s′, the distributions of com(s, r, pk) and com(s′, r′, pk) are statistically indistinguishable over the choice of random input (e.g., r and r′).
- The above EP protocol has bandwidth complexity O(k^2) and computational complexity O(k^2 log^2 k).
- Just like Pedersen commitments, this commitment scheme is statistically hiding and computationally binding.
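
Since the scheme is compared to Pedersen commitments, a small Pedersen sketch may help make "hiding", "binding", and "equivocable" concrete. This is not the paper's EP scheme: it is a plain Pedersen commitment over a toy group (`P = 2039`, `G = 4`, both assumptions), with hypothetical function names, showing how a party that knows the trapdoor x (where h = g^x) can open one commitment to a different message.

```python
import random

P, Q, G = 2039, 1019, 4  # toy group (assumption): G has prime order Q mod P


def setup():
    """Return (h, x) with h = g^x; x is the equivocation trapdoor."""
    x = random.randrange(1, Q)
    return pow(G, x, P), x


def commit(h, s, r):
    """Pedersen commitment g^s * h^r."""
    return pow(G, s % Q, P) * pow(h, r % Q, P) % P


def verify(h, c, s, r):
    """Check that (s, r) is a valid opening of c."""
    return c == commit(h, s, r)


def equivocate(x, s, r, s_new):
    """With the trapdoor, find r_new so that (s_new, r_new) opens the same c.

    Solves s + x*r = s_new + x*r_new (mod Q); x^-1 via Fermat since Q is prime.
    """
    return (r + (s - s_new) * pow(x, Q - 2, Q)) % Q
```

Without x, finding two openings of the same commitment would amount to computing a discrete logarithm, which is the computational binding property; the uniformly random r is what makes the commitment statistically hiding.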

### 4.3 Construction of a ZK-AoK from Σ Protocols

- The authors give a construction for how to transform a three-move Σ argument of knowledge Σrel for a binary relation Rrel into a five-move ZK argument of knowledge πrel for Rrel using the extractable equivocable commitment scheme EP described in Section 4.2.
- Then it follows that Σrel has a verifier V that accepts a transcript with non-negligible probability for the same x.
- This implies that there are at least two distinct challenges e and e′ such that P can produce accepting transcripts (a, e, z) and (a, e′, z′) for Σrel within πrel (in fact, there must be a non-negligible number of such challenges).
- Note that in particular, the fact that Σ protocols are special honest verifier zero knowledge is important, as it implies the ability to construct correct transcripts for arbitrary (pre-selected) distributions of verifier messages.
- EP then rewinds to rel-4, after P has already instantiated the commitment scheme and sent its initial message a for Σrel, and changes its challenge for Σrel according to the specification of Erel,P .
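
The two properties of Σ protocols used in this argument, extraction from two accepting transcripts (a, e, z) and (a, e′, z′) and simulation for a pre-selected challenge, can be illustrated with Schnorr's protocol for knowledge of a discrete logarithm. This is a generic textbook example under assumptions (toy group `P = 2039`, `G = 4`, hypothetical function names); it is not the relation Rrel or the five-move construction from the paper.

```python
import random

P, Q, G = 2039, 1019, 4  # toy group (assumption): G has prime order Q mod P


def prover_commit():
    """Move 1: prover picks t and sends a = g^t."""
    t = random.randrange(Q)
    return t, pow(G, t, P)


def prover_respond(w, t, e):
    """Move 3: response z = t + e*w (mod Q) for witness w and challenge e."""
    return (t + e * w) % Q


def verify(y, a, e, z):
    """Verifier accepts iff g^z == a * y^e."""
    return pow(G, z, P) == a * pow(y, e, P) % P


def extract(e1, z1, e2, z2):
    """Special soundness: same a, distinct challenges => recover the witness."""
    return (z1 - z2) * pow((e1 - e2) % Q, Q - 2, Q) % Q


def simulate(y, e):
    """Special honest-verifier ZK: build an accepting transcript for a
    challenge e chosen in advance, without knowing the witness."""
    z = random.randrange(Q)
    a = pow(G, z, P) * pow(pow(y, e, P), P - 2, P) % P
    return a, e, z
```

Rewinding a prover to obtain a second response for the same first message is exactly what makes `extract` applicable, and `simulate` shows why transcripts for arbitrary pre-selected challenges can be produced without the witness.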

### 6 Detailed πM5PM Specification

- The authors provide here the detailed protocol specification of the malicious model version of 5PM, πM5PM.
- First, the authors must specify the various zero-knowledge arguments of consistency that are required.

### 6.1 Arguments of Knowledge of Consistency

- The authors first describe five required interactive arguments which they rely on to prove statements required in the πM5PM protocol.
- They are designed for use with the specified threshold ElGamal encryption scheme (Section 2.2).
- The five required interactive arguments include: AM01, an AoK of Consistency for Matrix Formation 0/1; APD, an AoK of Consistency for Partial Decryption; ARand, an AoK of Consistency for Randomization; and AFD, an AoK of Consistency for Final Decryption.
- The authors denote by AFD the five-move interactive argument where P proves, using l parallel instantiations of πfin, that either the l encryptions (xi, yi) have been partially decrypted correctly or that P knows the discrete logarithm of gw.

### 6.2 πM5PM Protocol Specification

- The eight-round protocol for the malicious model, πM5PM, consists of the following six subprotocols: (a) πencr: initializes an additively homomorphic threshold encryption scheme.
- (c) πC,AV: allows Client to also construct an encrypted activation vector for Client’s pattern and Server’s encrypted text.
- Client input is pattern p, MCDV for p, and pt, the matching threshold.
- This subprotocol starts at global round 3 and ends at global round 5, with ZK preprocessing occurring during global rounds 1 and 2.
- Client also sends A^{P,2}_{M01} to prove that E(MCDV) is formatted correctly, where A^{P,1}_{M01} and A^{V,1}_{M01} occur during global rounds 1 and 2, respectively.
- Server also sends the message comm(A^{P,2}_{FD}), where AFD is the argument to prove that either D_S(E(AV^r_S)) was obtained correctly or that Server knows s∗ (for h∗ sent by Client in the first global round during πencr), whereas A^{P,1}_{FD} and A^{V,1}_{FD} are sent during global rounds 2 and 3, respectively.

### 7.1 Adversarial Model

- The authors refer the reader to [33,38] for further discussion of the definitions given here.
- Note that parties can be defined via their next message functions; see, for example, [39].
- In particular, the corrupted party may choose to abort and to not complete the protocol at all.

### 7.2 Simulator Constructions and Security for πH5PM

- The authors provide, for each admissible pair in the real world, an admissible pair in the ideal world such that REAL_{πH5PM, P̄}(x̄, ȳ, r̄) and IDEAL_{πH5PM, P̄′}(x̄, ȳ, r̄) are computationally indistinguishable.
- Consider the admissible pair P̄ = (Client, Server) in the real world.
- Note that SS has oracle access to real-world Server.
- The authors assume that the encryption scheme (Key,E,D) is fixed.
- The authors construct SS for an admissible pair P̄′ = (SC, Server) in the ideal world (where Server behaves honestly in both cases) such that REAL_{πH5PM, P̄}(x̄, ȳ, r̄) and IDEAL_{πH5PM, P̄′}(x̄, ȳ, r̄) are computationally indistinguishable.

### 7.3 Simulator Constructions and Security for πM5PM

- The authors provide, for each admissible pair in the real world, an admissible pair in the ideal world such that REAL_{πM5PM, P̄}(x̄, ȳ, r̄) and IDEAL_{πM5PM, P̄′}(x̄, ȳ, r̄) are computationally indistinguishable.
- Server also sends the message comm(A^{P,2}_{FD}), where AFD is the argument to prove that either D_S(E(AV^r_S)) was obtained correctly or that Server knows s∗ (for h∗ sent by SS in the first global round during πencr), where A^{P,1}_{FD} and A^{V,1}_{FD} are sent during global rounds 2 and 3, respectively.
- Therefore, the zero knowledge distinguisher DZK distinguishes the two cases of VZK’s interaction with non-negligible probability by running D internally, which will distinguish the two views of the ZK execution with non-negligible probability, which is a contradiction.
- Once the two interactions are done, VZK completes the internal execution of πM5PM.
- Renc encrypts with the Client’s sC (which it obtains by running the knowledge extractor; as in hybrid H1, this does not affect transcript indistinguishability), and uses this final encryption as the Server-side encryption; this final encryption corresponds to encryption with the secret key sC + sS.

### 8 Detailed Performance Results of 5PM Implementation

- The authors' experiments were performed on an Intel dual quad-core 2.93GHz machine with 8GB of memory running Ubuntu Linux version 10.10.
- The authors used fast-decryption Paillier [40] from the Self-Certifying File System (SFS) library [41], and used alphabets of sizes 4 (DNA) and 36.
- The authors' implementation results in Table 13 show that on average, 95% of the total online runtime was spent in three components of the protocol, two at Server and one at Client.
- The first is searching the text at Server by adding CDVs, which correspond to pattern characters, to the activation vector; the second is blinding elements of the activation vector at the Server; the third is decrypting the blinded activation vector at Client.

##### Citations

114 citations

### Cites methods from "5PM: Secure pattern matching"

...5PM [3] O((k + )λ) 2 (semi-honest) practical in using exact, approximate, (MPC-based) O((k + )λ(2)) 8 (malicious) k ≤ 1000 ∼ 10000 wildcards, substring FHE O( ) 2 impractical any Our work O( /n ) 2 faster than 5PM exact, approximate (SHE scheme)...

[...]

...in [3] presented an efficient twoparties protocol of secure pattern matching for more expressive search queries including single character wildcards and substring pattern matching....

[...]

67 citations

### Cites background from "5PM: Secure pattern matching"

...These works take the approach of considering a specific type of query and identifying a data structure that allows efficient evaluation of those queries in an unencrypted setting....

[...]

59 citations

### Cites background or result from "5PM: Secure pattern matching"

...Finally, the work of [6] studies pattern matching with wildcards in the malicious setting and achieves similar costs to our protocols but for larger alphabets....

[...]

...[6] studies the problem of pattern matching with wildcards in a more general sense of non-binary alphabet, implementing a different algorithm based on linear algebra formulation and additive homomorphic encryption....

[...]

32 citations

### Cites background or methods from "5PM: Secure pattern matching"

...(A) integer comparison based pattern matching [41], (B) fast Fourier transform based protocol [48], (C) matrix multiplication-based pattern matching [49] and (D) garbled circuit-based text processing [43]....

[...]

...The authors in [49] call their work on (C) “5PM (5ecure Pattern Matching),” and they propose the notion of a Character Delay Vector (CDV), which enables to efficiently compute the position where a matching pattern could possibly end....

[...]

...5PM [49] Paillier scheme O((k + ℓ) ) bandwidth 670 ms for k = 1,000 and ℓ = 100 Our work Polynomial LWE scheme O(⌈ℓ/n⌉) bandwidth 12....

[...]

...Unfortunately, only implementation results of H 5PM has been reported in the full version paper of [49]....

[...]

...The protocol H 5PM requires only two rounds of communication between two parties (see Table 5 in the full version paper of [49]), which is the same as ours in the matching phase of Section 5....

[...]

27 citations

### Cites background or methods from "5PM: Secure pattern matching"

...To evaluate the matching degree between 𝑠𝑖 and 𝑠𝑝, we introduce a metric called weighted Euclidean distance that can compare the fingerprint of two strings....

[...]

...There are some works [9]–[15] that use secure multi-party computation to achieve string pattern matching without revealing each party’s own information to others....

[...]
