# 5PM: Secure pattern matching

## Summary (5 min read)

### 1 Introduction

- Pattern matching is fundamental to computer science.
- This work was also supported by the OKAWA Foundation Research Award, IBM Faculty Research Award, Xerox Faculty Research Award, B. John Garrick Foundation Award, Teradata Research Award and Lockheed-Martin Corporation Research Award.
- A secure version of pattern matching has many applications, including variants such as substring pattern matching.

### 1.1 Our Contributions

- This paper presents 5ecure Pattern Matching (or 5PM), a new protocol for arbitrary alphabets that addresses, in addition to exact matching, more expressive search queries including single-character wildcards and substring pattern matching, and also provides the ability to hide pattern length.
- 5PM has communication complexity sublinear in circuit size (as opposed to general MPC, which has communication complexity linear in circuit size) to securely compute non-binary substring matching in the malicious model.
- The authors' malicious-model protocol requires O((m + n)k^2) bandwidth complexity.
- Here and throughout, the authors use the DNA alphabet (Σ = {A, C, G, T}) for examples.

### 1.2 Comparison to Previous Work

- In the exact pattern matching setting, the algorithm of Freedman, Ishai, Pinkas and Reingold [13] achieves polylogarithmic overhead in m and n and polynomial overhead in security parameters in the honest-but-curious setting.
- Recently, Vergnaud [14] built on the work of Hazay and Toft [16] to construct an efficient secure pattern matching scheme for wildcard matching and substring matching (requiring t runs over the preliminary matching result to search for t different Hamming distance values, which is also required by 5PM) in the malicious adversary model.
- By contrast, 5PM has the same overhead except for O(nm) exponentiations (see Table 2).
- The second is that their techniques are of independent interest and may be extended to additional functionalities.
- Jarrous and Pinkas [15] gave the first construction of a secure protocol for computing non-binary Hamming distances.

### 2 Preliminaries

- The rationale behind their secure 5PM protocol is based on a modification of an insecure pattern matching algorithm (IPM) [29] that can perform exact matching, exact matching with single-character wildcards, and substring matching within the same algorithm.
- In Section 3.1, the authors show how their modified algorithm can be reduced to basic linear operations whose secure and efficient evaluation allows them to obtain their 5PM protocol.

### 2.1 Insecure Pattern Matching (IPM) Algorithm

- To illustrate how their modified algorithm works, the authors begin by describing how it performs exact matching; they then show how it handles single-character wildcards and substring matching.
- IPM involves the following steps: a. Inputs:.
- It then adds a 1 at the position in the activation vector several steps ahead, where it would expect the pattern to end (if the character appears in multiple positions in the pattern, it adds a 1 to all the corresponding positions where the pattern might end).
- The activation vector will be initialized to all zeros.
- This operation does not incur any false positives for the same reason that the exact matching IPM algorithm does not: there, for each pattern p, there is only one encoding into CDVs and only one sequence of adding CDVs as one moves along the text that could add up to m.
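
The activation-vector idea described above can be sketched in plain (insecure) Python. This is a minimal illustration, not the authors' code: the function names, the `*` wildcard symbol, and the `threshold` parameter are assumptions introduced here. Each text character adds its Character Delay Vector (CDV) to the activation vector, and a position whose count reaches the threshold marks where a match ends.

```python
from typing import Dict, List, Optional

WILDCARD = "*"  # hypothetical wildcard symbol chosen for this sketch


def build_cdvs(pattern: str, alphabet: str) -> Dict[str, List[int]]:
    """Build one Character Delay Vector per alphabet symbol.

    cdvs[c][d] == 1 means that c occurs in the pattern d positions
    before the pattern's last character.
    """
    m = len(pattern)
    cdvs = {c: [0] * m for c in alphabet}
    for j, ch in enumerate(pattern):
        if ch != WILDCARD:  # wildcard positions contribute nothing
            cdvs[ch][m - 1 - j] = 1
    return cdvs


def ipm_search(text: str, pattern: str, alphabet: str = "ACGT",
               threshold: Optional[int] = None) -> List[int]:
    """Return the start positions whose activation count reaches the threshold."""
    m, n = len(pattern), len(text)
    if threshold is None:
        # Exact matching: every non-wildcard pattern character must agree.
        threshold = m - pattern.count(WILDCARD)
    cdvs = build_cdvs(pattern, alphabet)
    av = [0] * n  # activation vector, initialized to all zeros
    for i, ch in enumerate(text):
        # Add CDV(text[i]) starting at position i: each 1 marks a position
        # where the pattern would end if text[i] were part of a match.
        for d, bit in enumerate(cdvs[ch]):
            if bit and i + d < n:
                av[i + d] += 1
    return [k - m + 1 for k in range(m - 1, n) if av[k] >= threshold]


print(ipm_search("ACGTACG", "ACG"))               # exact matching
print(ipm_search("ACGTATG", "A*G"))               # single-character wildcard
print(ipm_search("ACGTAGG", "ACG", threshold=2))  # tolerate one mismatch
```

Lowering `threshold` below the pattern length is how this sketch models substring (Hamming-distance) matching; the exact thresholding rule in the paper may differ in detail.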

### 2.2 Preliminary Cryptographic Tools

- This section outlines preliminary cryptographic tools required for their protocols.
- The authors make use of additively homomorphic semantically secure encryption schemes.
- For concreteness, in the rest of this paper the authors concentrate on the additively homomorphic ElGamal encryption scheme whose security depends on the Decisional Diffie-Hellman (DDH) computational hardness assumption.
- While the authors use threshold ElGamal, in practice, any scheme is acceptable if it satisfies the required properties and supports the needed zero-knowledge arguments.
- For the malicious model protocol, the authors will make use of perfectly hiding, computationally binding commitment schemes (for further discussion, see [33]).
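
To make the encryption tool concrete, here is a toy sketch of additively homomorphic ("exponential") ElGamal with the secret key split additively between two parties. The group parameters (`P = 1019`, `Q = 509`, `G = 4`) and all function names are assumptions chosen for readability; they are far too small for real security, and the paper's actual threshold scheme and its zero-knowledge arguments are not reproduced.

```python
import secrets

# Toy parameters (assumption): P = 2*Q + 1 is a safe prime and G generates
# the subgroup of prime order Q. Far too small for any real use.
P, Q, G = 1019, 509, 4


def keygen_share():
    """One party's key share: secret s and public g^s."""
    s = secrets.randbelow(Q - 1) + 1
    return s, pow(G, s, P)


def encrypt(h, m):
    """Exponential ElGamal: (g^r, g^m * h^r)."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), (pow(G, m % Q, P) * pow(h, r, P)) % P


def add(c1, c2):
    """Multiplying ciphertexts component-wise adds the plaintexts."""
    return (c1[0] * c2[0]) % P, (c1[1] * c2[1]) % P


def scalar_mul(c, k):
    """Raising a ciphertext to a known scalar k multiplies the plaintext by k."""
    return pow(c[0], k, P), pow(c[1], k, P)


def partial_decrypt(c, s):
    """Strip one party's key share from the ciphertext: y / x^s."""
    x, y = c
    return x, (y * pow(x, Q - s, P)) % P


def recover_small(c):
    """After all shares are stripped, y = g^m; brute-force the small m."""
    _, y = c
    for m in range(Q):
        if pow(G, m, P) == y:
            return m
    raise ValueError("plaintext not found")


# Joint public key h = g^(sC + sS); neither party alone can decrypt.
s_client, h_client = keygen_share()
s_server, h_server = keygen_share()
h = (h_client * h_server) % P

c = add(encrypt(h, 3), scalar_mul(encrypt(h, 4), 5))  # encrypts 3 + 5*4
c = partial_decrypt(c, s_client)
c = partial_decrypt(c, s_server)
print(recover_small(c))
```

Because the plaintext sits in the exponent, decryption only recovers small values by search, which suffices when the protocol only needs to test small counters such as activation-vector entries.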

### 2.3 Computing Linear Operations Using Additively Homomorphic Encryption Schemes.

- The authors' secure pattern-matching protocol relies on the following observations about linear operations and additively homomorphic encryption schemes.
- In what follows, let E be the encryption algorithm for an additively homomorphic encryption scheme for key pair (pk, sk).
- Suppose that P1 possesses pk, Epk(A) (the entry-wise encryption of A), and the unencrypted matrix B. Then P1 can compute Epk(A · B), the encryption of the product of A and B under the same pk.
- More specifically, an affine hash function Zklq →.
- Because the hash function is chosen uniformly at random, the decryptions will equal each other with probability only 1/q when A ≠ B.
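
The matrix-product observation can be written out entry by entry. The following is a sketch under the assumption of a multiplicative ciphertext group (as in ElGamal), where multiplying ciphertexts adds plaintexts and raising a ciphertext to a known scalar multiplies its plaintext; the decryption algorithm $D_{sk}$ and the index $t$ are notation introduced here.

```latex
\[
D_{sk}\!\left( \prod_{t} E_{pk}(A_{it})^{\,B_{tj}} \right)
  \;=\; \sum_{t} A_{it}\, B_{tj}
  \;=\; (A \cdot B)_{ij}
\]
```

So P1 obtains a valid encryption of each entry of A · B using only ciphertext multiplications and exponentiations by the known entries of B, without ever decrypting A.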

### 3 5PM Protocol

- This section uses the above observations and cryptographic tools to construct the secure pattern-matching protocol (5PM).
- The authors develop πH5PM for the honest-but-curious adversary model and πM5PM for the malicious (static corruption) adversary model.

### 3.1 Converting IPM to Linear Operations.

- In reality, since MT and MCDV are 0/1 matrices, multiplication is more computationally expensive than necessary, and vectors can simply be selected (as shown in the IPM description in Section 2.1).
- This transformation, jointly with the previous step, constructs a matrix of CDVs where the ith row contains only CDV(Ti), which starts in the ith position in the ith row (this sets up step d in Section 2.1.1).
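
The reduction to linear operations can be illustrated with unencrypted matrices. In this sketch the helper names are assumptions, and the shift of row i to column i is done with simple list padding rather than the paper's matrix operators: the product MT · MCDV selects CDV(Ti) for each text position, the shift aligns each row with its position, and the column sums give the activation vector.

```python
def one_hot_text(text, alphabet):
    """MT: n x |alphabet| 0/1 matrix, row i selects the character text[i]."""
    return [[1 if ch == c else 0 for c in alphabet] for ch in text]


def cdv_matrix(pattern, alphabet):
    """MCDV: |alphabet| x m matrix, one CDV per alphabet character."""
    m = len(pattern)
    return [[1 if pattern[m - 1 - d] == c else 0 for d in range(m)]
            for c in alphabet]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def activation_vector(text, pattern, alphabet="ACGT"):
    n = len(text)
    # Row i of the product is CDV(text[i]).
    rows = matmul(one_hot_text(text, alphabet), cdv_matrix(pattern, alphabet))
    # Shift row i so that its CDV starts in column i (zero padding elsewhere).
    shifted = [[0] * i + row + [0] * (n - 1 - i) for i, row in enumerate(rows)]
    # Column sums accumulate the CDVs; keep only the first n positions.
    return [sum(col) for col in zip(*shifted)][:n]


print(activation_vector("ACGTACG", "ACG"))
```

An entry equal to the pattern length marks the text position where an exact match ends.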

### 3.2 Honest-But-Curious (HBC) 5PM Protocol

- The authors begin by describing the intuition behind required modifications to secure IPM in the HBC adversary model.
- When Client sends Server E(MCDV), by the reasoning of Sections 2.3 and 3.1, Server can compute E(AV), an encrypted activation vector, using only MT and E(MCDV).
- The authors refer the reader to Sections 3.1 and 2.3.2 for the notation used here.
- The protocol operation is as follows: (a) Client computes (sk, pk) ← Key(1^k) using the key generation algorithm of an additively homomorphic encryption scheme, E; (b) Client computes MCDV ← GenCDV(p).
- In particular, πH5PM does not require multiple independent protocol executions to compute substring matching for a range of substring length values.
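
The overall message flow in the honest-but-curious setting can be sketched end to end with a toy exponential ElGamal scheme. Everything below is an illustrative assumption rather than the paper's specification: the tiny group parameters, the function names, the choice to have Client send an encrypted negated threshold, and the blinding rule (Server multiplies each encrypted difference by a random nonzero scalar so that only zero versus nonzero survives decryption). Pattern-length hiding is not modeled.

```python
import secrets

P, Q, G = 1019, 509, 4  # toy group (assumption), far too small for real use


def keygen():
    sk = secrets.randbelow(Q - 1) + 1
    return sk, pow(G, sk, P)


def enc(pk, m):
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), pow(G, m % Q, P) * pow(pk, r, P) % P


def add(a, b):  # homomorphic addition of plaintexts
    return a[0] * b[0] % P, a[1] * b[1] % P


def scale(c, k):  # homomorphic multiplication by a known scalar
    return pow(c[0], k, P), pow(c[1], k, P)


def is_zero(sk, c):  # decrypts to g^m and tests whether m == 0
    return c[1] * pow(c[0], Q - sk, P) % P == 1


def client_setup(pattern, alphabet):
    """Client: generate keys and encrypt the CDV matrix entry-wise."""
    sk, pk = keygen()
    m = len(pattern)
    enc_cdv = {c: [enc(pk, int(pattern[m - 1 - d] == c)) for d in range(m)]
               for c in alphabet}
    return sk, pk, enc_cdv, enc(pk, -m)  # last item: encrypted -threshold


def server_search(text, pk, enc_cdv, enc_neg_thr):
    """Server: build E(AV) from its plaintext text, then blind every entry."""
    n = len(text)
    av = [enc(pk, 0) for _ in range(n)]
    for i, ch in enumerate(text):
        for d, c in enumerate(enc_cdv[ch]):  # encrypted zeros are added too
            if i + d < n:
                av[i + d] = add(av[i + d], c)
    # Blind: E(r * (AV_k - threshold)) is an encryption of 0 exactly at matches.
    return [scale(add(c, enc_neg_thr), secrets.randbelow(Q - 1) + 1) for c in av]


def client_output(sk, blinded, m):
    """Client: positions that decrypt to zero are match end points."""
    return [k - m + 1 for k, c in enumerate(blinded) if is_zero(sk, c)]


sk, pk, enc_cdv, enc_neg_thr = client_setup("ACG", "ACGT")
blinded = server_search("ACGTACG", pk, enc_cdv, enc_neg_thr)
print(client_output(sk, blinded, 3))
```

In this sketch Server never sees the pattern in the clear and Client learns only which positions matched, which mirrors the intent of the protocol; the security argument itself is given in the paper, not here.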

### 3.3 Malicious Model 5PM Protocol

- The authors describe an instantiation of πM5PM based on additively homomorphic threshold ElGamal encryption (see Section 2.2) for concreteness; generalization to other encryption schemes follows provided they have efficient Σ protocols for the statements required here.
- Second, the authors give interactive zero-knowledge consistency arguments that will be required.
- Finally, the authors divide πM5PM into six subprotocols and describe their construction and how they are combined into the final protocol πM5PM.
- ΠS,AV is a two-party protocol executed between Client and Server which outputs to Server an encrypted activation vector corresponding to matching Client’s p against Server’s T.

### 4.1 Definitions

- The authors consider interactive protocols that have the following specification: a. P sends message a, |a| ∈ poly(|x|).
- Σ protocols that only have standard soundness will not always satisfy the lemma.
- For all x such that there does not exist a w with (x,w) ∈ R, V will only accept with negligible probability.
- The authors first construct an extractable equivocable commitment scheme and use this scheme together with the Σ protocol specification for the ZK-AoK construction.

### 4.2 Extractable Equivocable Commitment Schemes

- Such a scheme is an interactive protocol between a PPT committer C and a PPT receiver R consisting of three functions: EComSet instantiates the commitment scheme, com computes the commitment, and EComVer verifies that decommitment is valid.
- For pk correctly constructed and any messages s and s′, the distributions of com(s, r, pk) and com(s′, r′, pk) are statistically indistinguishable over the choice of random input (e.g., r and r′).
- The above EP protocol has bandwidth complexity O(k^2) and computational complexity O(k^2 log2 k).
- Just like Pedersen commitments, this commitment scheme is statistically hiding and computationally binding.
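
Since the scheme is compared to Pedersen commitments, a toy Pedersen commitment may help fix ideas. This is the standard Pedersen construction, not the paper's extractable equivocable scheme EP; the group parameters and function names are assumptions. The `equivocate` helper shows why a trapdoor holder can open a commitment to any value, which is the property the "equivocable" part of the name refers to.

```python
import secrets

P, Q, G = 1019, 509, 4  # toy group (assumption), far too small for real use


def setup():
    """Receiver picks trapdoor t and publishes h = g^t (committer must not know t)."""
    t = secrets.randbelow(Q - 1) + 1
    return t, pow(G, t, P)


def commit(h, s, r=None):
    """Commit to s as g^s * h^r with fresh randomness r."""
    if r is None:
        r = secrets.randbelow(Q)
    return pow(G, s % Q, P) * pow(h, r, P) % P, r


def verify(h, c, s, r):
    """Check that (s, r) opens the commitment c."""
    return c == pow(G, s % Q, P) * pow(h, r % Q, P) % P


def equivocate(t, s, r, s_new):
    """With trapdoor t, find r_new such that s + t*r == s_new + t*r_new (mod Q)."""
    return (r + (s - s_new) * pow(t, Q - 2, Q)) % Q


t, h = setup()
c, r = commit(h, 42)
print(verify(h, c, 42, r))                         # honest opening
print(verify(h, c, 7, equivocate(t, 42, r, 7)))    # trapdoor opening to 7
```

Hiding follows because h^r is uniform in the subgroup for random r; binding holds only for parties who cannot compute the discrete logarithm t.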

### 4.3 Construction of a ZK-AoK from Σ Protocols

- The authors give a construction for how to transform a three-move Σ argument of knowledge Σrel for a binary relation Rrel into a five-move ZK argument of knowledge πrel for Rrel using the extractable equivocable commitment scheme EP described in Section 4.2.
- Then it follows that Σrel has a verifier V that accepts a transcript with non-negligible probability for the same x.
- This implies that there are at least two distinct challenges e and e′ such that P can produce accepting transcripts (a, e, z) and (a, e′, z′) for Σrel within πrel (in fact, there must be a non-negligible number of such challenges).
- Note that in particular, the fact that Σ protocols are special honest verifier zero knowledge is important, as it implies the ability to construct correct transcripts for arbitrary (pre-selected) distributions of verifier messages.
- EP then rewinds to rel-4, after P has already instantiated the commitment scheme and sent its initial message a for Σrel, and changes its challenge for Σrel according to the specification of Erel,P.
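
The two properties used in this argument, extraction from two accepting transcripts with the same first message and simulation for a pre-selected challenge, can be seen in the simplest Σ protocol, Schnorr's proof of knowledge of a discrete logarithm. The sketch below shows only that plain three-move protocol with toy parameters and assumed function names; it is not the five-move πrel construction.

```python
import secrets

P, Q, G = 1019, 509, 4  # toy group (assumption), far too small for real use


def commit_msg():
    """Prover's first message a = g^r for random r."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def respond(w, r, e):
    """Prover's answer z = r + e*w (mod Q) to challenge e."""
    return (r + e * w) % Q


def verify(h, a, e, z):
    """Verifier accepts iff g^z == a * h^e."""
    return pow(G, z, P) == a * pow(h, e, P) % P


def extract(e1, z1, e2, z2):
    """Special soundness: two accepting answers to distinct challenges for
    the same first message reveal w = (z1 - z2) / (e1 - e2) mod Q."""
    return (z1 - z2) * pow((e1 - e2) % Q, Q - 2, Q) % Q


def simulate(h, e):
    """Special honest-verifier ZK: for a pre-selected challenge e, pick z
    first and solve for a, producing an accepting transcript without w."""
    z = secrets.randbelow(Q)
    a = pow(G, z, P) * pow(h, (Q - e) % Q, P) % P
    return a, e, z


w = 123                    # witness: discrete log of h
h = pow(G, w, P)
r, a = commit_msg()
z1, z2 = respond(w, r, 5), respond(w, r, 9)   # same a, two challenges
print(extract(5, z1, 9, z2) == w)
print(verify(h, *simulate(h, 17)))
```

Rewinding the prover to obtain the second transcript is exactly what the extractor in the text does; the commitment scheme in πrel is what makes this possible against a malicious verifier as well.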

### 6 Detailed πM5PM Specification

- The authors provide here the detailed protocol specification of the malicious model version of 5PM, πM5PM.
- First, the authors must specify the various zero-knowledge arguments of consistency that are required.

### 6.1 Arguments of Knowledge of Consistency

- The authors first describe five required interactive arguments which they rely on to prove statements required in the πM5PM protocol.
- They are designed for use with the specified threshold ElGamal encryption scheme (Section 2.2).
- These interactive arguments include: AM01, an AoK of Consistency for Matrix Formation 0/1; APD, an AoK of Consistency for Partial Decryption; ARand, an AoK of Consistency for Randomization; and AFD, an AoK of Consistency for Final Decryption.
- The authors denote by AFD the five-move interactive argument where P proves, using l parallel instantiations of πfin, that either the l encryptions (xi, yi) have been partially decrypted correctly or that P knows the discrete logarithm of gw.

### 6.2 πM5PM Protocol Specification

- The eight-round protocol for the malicious model, πM5PM, consists of the following six subprotocols: (a) πencr: initializes an additively homomorphic threshold encryption scheme.
- (c) πC,AV: allows Client to also construct an encrypted activation vector for Client’s pattern and Server’s encrypted text.
- Client input is pattern p, MCDV for p, and pt, the matching threshold.
- This subprotocol starts at global round 3 and ends at global round 5, with ZK preprocessing occurring during global rounds 1 and 2.
- Client also sends $A^{P,2}_{M01}$ to prove that E(MCDV) is formatted correctly, where $A^{P,1}_{M01}$ and $A^{V,1}_{M01}$ occur during global rounds 1 and 2, respectively.
- Server also sends the message comm($A^{P,2}_{FD}$), where AFD is the argument to prove that either $D_S(E(AV^{r}_{S}))$ was obtained correctly or that Server knows s∗ (for h∗ sent by Client in the first global round during πencr), whereas $A^{P,1}_{FD}$ and $A^{V,1}_{FD}$ are sent during global rounds 2 and 3, respectively.

### 7.1 Adversarial Model

- The authors refer the reader to [33,38] for further discussion of the definitions given here.
- Note that parties can be defined via their next message functions; see, for example, [39].
- In particular, the corrupted party may choose to abort and to not complete the protocol at all.

### 7.2 Simulator Constructions and Security for πH5PM

- The authors provide, for each admissible pair in the real world, an admissible pair in the ideal world such that REAL πH5PM P̄ (x̄, ȳ, r̄) and IDEAL πH5PM P̄ ′ (x̄, ȳ, r̄) are computationally indistinguishable.
- Consider the admissible pair P̄ = (Client, Server) in the real world.
- Note that SS has oracle access to real-world Server.
- The authors assume that the encryption scheme (Key,E,D) is fixed.
- The authors construct SC for an admissible pair P̄′ = (SC, Server) in the ideal world (where Server behaves honestly in both cases) such that REAL πH5PM P̄ (x̄, ȳ, r̄) and IDEAL πH5PM P̄′ (x̄, ȳ, r̄) are computationally indistinguishable.

### 7.3 Simulator Constructions and Security for πM5PM

- The authors provide, for each admissible pair in the real world, an admissible pair in the ideal world such that REAL πM5PM P̄ (x̄, ȳ, r̄) and IDEAL πM5PM P̄ ′ (x̄, ȳ, r̄) are computationally indistinguishable.
- Server also sends the message comm($A^{P,2}_{FD}$), where AFD is the argument to prove that either $D_S(E(AV^{r}_{S}))$ was obtained correctly or that Server knows s∗ (for h∗ sent by SS in the first global round during πencr), where $A^{P,1}_{FD}$ and $A^{V,1}_{FD}$ are sent during global rounds 2 and 3, respectively.
- Therefore, the zero-knowledge distinguisher DZK, by running D internally, distinguishes the two cases of VZK’s interaction (the two views of the ZK execution) with non-negligible probability, which is a contradiction.
- Once the two interactions are done, VZK completes the internal execution of πM5PM .
- Renc encrypts with the Client’s sC (which it obtains by running the knowledge extractor; as in hybrid H1, this does not affect transcript indistinguishability), and uses this final encryption as the Server-side encryption; this final encryption corresponds to encryption with the secret key sC + sS .

### 8 Detailed Performance Results of 5PM Implementation

- The authors' experiments were performed on an Intel dual quad-core 2.93GHz machine with 8GB of memory running Ubuntu Linux version 10.10.
- The authors used fast-decryption Paillier [40] from the Self-Certifying File System (SFS) library [41], and used alphabets of sizes 4 (DNA) and 36.
- The authors' implementation results in Table 13 show that on average, 95% of the total online runtime was spent in three components of the protocol, two at Server and one at Client.
- The first is searching the text at Server by adding CDVs, which correspond to pattern characters, to the activation vector; the second is blinding elements of the activation vector at the Server; the third is decrypting the blinded activation vector at Client.


##### Citations

130 citations

### Cites methods from "5PM: Secure pattern matching"

...5PM [3] O((k + )λ) 2 (semi-honest) practical in using exact, approximate, (MPC-based) O((k + )λ(2)) 8 (malicious) k ≤ 1000 ∼ 10000 wildcards, substring FHE O( ) 2 impractical any Our work O( /n ) 2 faster than 5PM exact, approximate (SHE scheme)...

[...]

...in [3] presented an efficient twoparties protocol of secure pattern matching for more expressive search queries including single character wildcards and substring pattern matching....

[...]

77 citations

### Cites background from "5PM: Secure pattern matching"

...These works take the approach of considering a specific type of query and identifying a data structure that allows efficient evaluation of those queries in an unencrypted setting....

[...]

67 citations

### Cites background or result from "5PM: Secure pattern matching"

...Finally, the work of [6] studies pattern matching with wildcards in the malicious setting and achieves similar costs to our protocols but for larger alphabets....

[...]

...[6] studies the problem of pattern matching with wildcards in a more general sense of non-binary alphabet, implementing a different algorithm based on linear algebra formulation and additive homomorphic encryption....

[...]

35 citations

### Cites background or methods from "5PM: Secure pattern matching"

...(A) integer comparison based pattern matching [41], (B) fast Fourier transform based protocol [48], (C) matrix multiplication-based pattern matching [49] and (D) garbled circuit-based text processing [43]....

[...]

...The authors in [49] call their work on (C) “5PM (5ecure Pattern Matching),” and they propose the notion of a Character Delay Vector (CDV), which enables to efficiently compute the position where a matching pattern could possibly end....

[...]

...5PM [49] Paillier scheme O((k + `) ) bandwidth 670 ms for k = 1, 000 and ` = 100 Our work Polynomial LWE scheme O(d`/ne) bandwidth 12....

[...]

...Unfortunately, only implementation results of πH5PM has been reported in the full version paper of [49]....

[...]

...The protocol πH5PM requires only two rounds of communication between two parties (see Table 5 in the full version paper of [49]), which is the same as ours in the matching phase of Section 5....

[...]

31 citations

### Cites background or methods from "5PM: Secure pattern matching"

...To evaluate the matching degree between 𝑠𝑖 and 𝑠𝑝, we introduce a metric called weighted Euclidean distance that can compare the fingerprint of two strings....

[...]

...There are some works [9]–[15] that use secure multi-party computation to achieve string pattern matching without revealing each party’s own information to others....

[...]
