Newcastle University ePrints - eprint.ncl.ac.uk
Hao F, Clarke D, Zorzo AF. Deleting Secret Data with Public Verifiability. IEEE
Transactions on Dependable and Secure Computing 2015. DOI:
10.1109/TDSC.2015.2423684
Copyright:
© 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all
other uses, in any current or future media, including reprinting/republishing this material for advertising
or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or
reuse of any copyrighted component of this work in other works.
DOI link to article:
http://dx.doi.org/10.1109/TDSC.2015.2423684
Date deposited:
16/05/2016

Deleting Secret Data with Public Verifiability
Feng Hao, Member, IEEE, Dylan Clarke, Avelino Francisco Zorzo
Abstract—Existing software-based data erasure programs can be summarized as following the same one-bit-return protocol: the
deletion program performs data erasure and returns either success or failure. However, such a one-bit-return protocol turns the
data deletion system into a black box: the user has to trust the outcome but cannot easily verify it. This is especially problematic
when the deletion program is encapsulated within a Trusted Platform Module (TPM), and the user has no access to the code
inside.
In this paper, we present a cryptographic solution that aims to make the data deletion process more transparent and verifiable.
In contrast to the conventional black/white assumptions about TPM (i.e., either completely trust or distrust), we introduce a third
assumption that sits in between: namely, “trust-but-verify”. Our solution enables a user to verify the correct implementation of two
important operations inside a TPM without accessing its source code: i.e., the correct encryption of data and the faithful deletion
of the key. Finally, we present a proof-of-concept implementation of the proposed Secure Storage and Erasure (SSE) system on a resource-constrained Java card to
demonstrate its practical feasibility. To our knowledge, this is the first systematic solution to the secure data deletion problem
based on a “trust-but-verify” paradigm, together with a concrete prototype implementation.
1 INTRODUCTION
Secure data erasure requires permanently deleting
digital data from a physical medium such that the
data is irrecoverable [13]. This requirement plays a
critical role in all practical data management systems,
and in satisfying several government regulations on
data protection [25]. For the past two decades, this
subject has been extensively studied by researchers
in both academia and industry, resulting in a rich
body of literature [5], [7], [8], [13], [14], [17], [23], [25],
[26], [28], [33], [35]. A recent survey on this topic is
published in [27].
1.1 One-bit return
To delete data securely is a non-trivial problem. It
has been generally agreed that no existing software-
based solutions can guarantee the complete removal
of data from the storage medium [27]. To explain the
context of this field, we will abstract away implementation details of existing solutions, and focus on a higher, more intuitive protocol level. Existing
deletion methods can be described using essentially
the same protocol, which we call the “one-bit-return”
protocol. In this protocol, the user sends a command, usually through a host computer, to delete data from a storage system, and receives a one-bit reply
indicating the status of the operation. The process can
be summarized as follows.
User → Storage : Delete data
Storage → User : Success/Failure (1 bit)
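To make the abstraction concrete, the following sketch (an editor's illustration, not from the paper; the class and method names are hypothetical) shows why a one-bit result is unverifiable: an implementation that merely unlinks and one that does nothing at all return the same bit to the caller.

```java
// Hypothetical illustration of the one-bit-return protocol.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

interface OneBitDeleter {
    /** Returns a single bit: true = "Success", false = "Failure". */
    boolean delete(Path file);
}

// "Deletion by unlinking": removes the directory entry, leaves the content on disk.
class UnlinkDeleter implements OneBitDeleter {
    public boolean delete(Path file) {
        try {
            return Files.deleteIfExists(file);   // content may remain recoverable
        } catch (IOException e) {
            return false;
        }
    }
}

// A dishonest implementation: does nothing at all, yet returns the same bit.
class LyingDeleter implements OneBitDeleter {
    public boolean delete(Path file) {
        return true;                             // the caller cannot tell the difference
    }
}
```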
F. Hao and D. Clarke are with the School of Computing Science,
Newcastle University, UK. Email: {Feng.Hao, Dylan.Clarke}@ncl.ac.uk.
A.F. Zorzo is with Pontifical Catholic University of RS, Brazil. Email:
avelino.zorzo@pucrs.br. The first author would like to acknowledge the
support of EPSRC First Grant EP/J011541/1 and ERC Starting Grant
No. 106591.
Deletion by unlinking. Take the deletion in the
Windows operating system as an example. When the
user wishes to delete a file (say by hitting the “delete”
button), the operating system removes the link of the
file from the underlying file system, and returns one
bit to the user: Success. However, the return of the
“Success” bit can be misleading. Although the link
of the file has been removed, the content of the file
remains on the disk. An attacker with a forensic tool
can easily recover the deleted file by scanning the disk
[12]. The same problem also applies to the default
deletion program bundled in other operating systems
(e.g., Apple and Linux).
Deletion by overwriting. Obviously, merely unlink-
ing the file is not sufficient. In addition, the content of
the file should be overwritten with random data. This
has been proposed in several papers [5], [13], [14] and
specified in various standards (e.g., [18]). However,
one inherent limitation with the overwriting methods
is that they cannot guarantee the complete removal of
data. As concluded in [13]: “it is effectively impossible
to sanitize storage locations by simply overwriting
them, no matter how many overwrite passes are made
or what data patterns are written.” The conclusion
holds for not only magnetic drives [13], but also tapes
[7], optical disks [14] and flash-based solid state drives
[33]. In all these cases, an attacker, equipped with
advanced microscopy tools, may recover overwritten
data based on the physical remanence of the deleted
data left on the storage medium. Therefore, although
overwriting data makes the recovery harder, it does
not change the basic one-bit-return protocol. Same as
before, the return of “Success” cannot guarantee the
actual deletion of data.
Deletion by cryptography. Boneh and Lipton [7]
were among the first in proposing the use of cryp-
tography to address the secure data erasure problem,
with a number of follow-up works [17], [20], [21],
[24]–[26], [35]. In general, a cryptography-based so-
lution works by encrypting all data before saving it
to the disk, and later deleting the data by discard-
ing the decryption key. This approach is especially
desirable when duplicate copies of data are backed
up in distributed locations so it becomes impossible
to overwrite every copy [7]. The use of cryptography
essentially changes the problem of deleting a large
amount of data to that of deleting a short key (say
a 128-bit AES key). Still, the fundamental question
remains: how to securely delete the key?
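As a minimal sketch of this idea (an editor's illustration using the standard javax.crypto API, not the paper's SSE construction), the data is encrypted under a random AES key before it reaches the disk, and "deleted" by destroying that key:

```java
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class CryptoErasureSketch {
    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16];                       // 128-bit AES key
        byte[] iv  = new byte[12];                       // GCM nonce
        SecureRandom rnd = new SecureRandom();
        rnd.nextBytes(key);
        rnd.nextBytes(iv);

        // Encrypt the data before it ever reaches the disk.
        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                 new GCMParameterSpec(128, iv));
        byte[] ciphertext = enc.doFinal("sensitive data".getBytes("UTF-8"));
        // ... write ciphertext (and iv) to disk, possibly replicated in backups ...

        // "Delete" the data by destroying the key: deleting a large amount of
        // data is reduced to deleting 16 bytes.
        Arrays.fill(key, (byte) 0);
        // Note: copies of the key may still linger elsewhere in memory, which is
        // precisely why securely deleting the key (and verifying it) is the hard part.
    }
}
```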
1.2 Key management
When cryptography is used to address the data era-
sure problem, the key management becomes critically
important. There are several approaches proposed in
the past literature to manage cryptographic keys.
The first method is to simply save the key on the
disk, alongside the encrypted data (typically as part of the metadata in the file header) [17], [20], [25],
[26]. Deleting the data involves overwriting the disk
location where the key is stored. Once the key is
erased, the ciphertext immediately becomes useless
[7]. This has the advantage of quickly erasing data
since only a small block of data (16 bytes for AES-128)
needs to be overwritten. However, if the key is saved
on the disk, cryptography may not add much security
in ensuring data deletion [16]. On the contrary, it may even degrade security if not handled properly: instead of recovering a large amount of overwritten
data, the attacker now just needs to recover a short
128-bit key. This may significantly increase the chance
of a total recovery. Once the key is restored, the
deleted data will be fully recovered. (We assume the
ciphertext is available to the attacker, which is usually
the case.)
The second method is to use a user-defined pass-
word as the encryption key [35]. The key is derived on
the fly in RAM upon the user’s entry of the password
so it is never saved on the disk. However, passwords
are naturally bounded by low entropy (typically 20-
30 bits) [3]. Hence, cryptographic keys derived from
passwords are subject to brute-force attacks. As soon
as the attacker has access to ciphertext data, the
ciphertext becomes an oracle, against which the at-
tacker can recover the key through exhaustive
search. Instead of directly using a password-derived
encryption key, Lee et al. proposed to first generate a
random AES key for encrypting data and then use the
password to wrap the AES key and store the wrapped
key on the disk [21]. This is essentially equivalent to
deriving the key from the password. The wrapped key
now becomes an oracle, against which the attacker can
run the exhaustive search.
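The brute-force risk can be illustrated with a short sketch (an editor's illustration; the simple hash-based key derivation and the dictionary are hypothetical): once the attacker holds the wrapped key, every password guess can be tested offline against it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class WrappedKeyOracleSketch {

    // Password-derived key-encryption key (a deliberately simple KDF for the sketch).
    static byte[] kdf(String password) throws Exception {
        return Arrays.copyOf(
            MessageDigest.getInstance("SHA-256")
                         .digest(password.getBytes(StandardCharsets.UTF_8)), 16);
    }

    public static void main(String[] args) throws Exception {
        SecureRandom rnd = new SecureRandom();
        byte[] dataKey = new byte[16];                   // the real AES data key
        byte[] iv = new byte[12];
        rnd.nextBytes(dataKey);
        rnd.nextBytes(iv);

        // Wrap (encrypt) the data key under a low-entropy password; store it on disk.
        Cipher wrap = Cipher.getInstance("AES/GCM/NoPadding");
        wrap.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(kdf("letmein1"), "AES"),
                  new GCMParameterSpec(128, iv));
        byte[] wrappedKey = wrap.doFinal(dataKey);

        // The attacker who obtains wrappedKey can test guesses offline:
        String[] dictionary = {"password", "123456", "qwerty", "letmein1"};
        for (String guess : dictionary) {
            try {
                Cipher unwrap = Cipher.getInstance("AES/GCM/NoPadding");
                unwrap.init(Cipher.DECRYPT_MODE, new SecretKeySpec(kdf(guess), "AES"),
                            new GCMParameterSpec(128, iv));
                byte[] recovered = unwrap.doFinal(wrappedKey);  // succeeds only for the right guess
                System.out.println("Password found: " + guess
                        + ", data key recovered: " + Arrays.equals(recovered, dataKey));
                break;
            } catch (Exception wrongGuess) {
                // The GCM tag check fails for wrong passwords; keep searching.
            }
        }
    }
}
```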
The third method is to store the key in a decentralized network. Along this line, Geambasu et al. propose a solution called Vanish, which generates a random key to encrypt the user's data locally and then distributes shares of the key, using Shamir's secret sharing scheme, to a global-scale, peer-to-peer distributed hash table (DHT). The shares of the key naturally disappear (vanish), because the DHT is constantly changing. However, Wolchok et al. [32] subsequently show two Sybil attacks that work by continuously crawling the DHT and recovering the stored key shares before they vanish. They conclude that the original Vanish scheme cannot guarantee the secure deletion of the key.
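For intuition, the sketch below (an editor's toy (k, n) = (3, 5) Shamir split over a prime field, not the Vanish implementation) shows how key shares are produced for distribution into a DHT and why any k of them suffice to rebuild the key, which is exactly what a Sybil crawler exploits.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

public class ShamirSketch {
    static final SecureRandom RND = new SecureRandom();
    static final BigInteger P = BigInteger.probablePrime(160, RND);  // field modulus, larger than the key

    public static void main(String[] args) {
        BigInteger key = new BigInteger(128, RND);        // the data-encryption key to protect
        int n = 5, k = 3;                                 // 5 shares, any 3 reconstruct

        // Random polynomial of degree k-1 with constant term = key.
        BigInteger[] coeff = new BigInteger[k];
        coeff[0] = key;
        for (int i = 1; i < k; i++) coeff[i] = new BigInteger(P.bitLength() - 1, RND);

        // Shares are points (x, f(x)); in Vanish these would be pushed into the DHT.
        BigInteger[][] shares = new BigInteger[n][2];
        for (int x = 1; x <= n; x++) {
            BigInteger y = BigInteger.ZERO;
            for (int i = k - 1; i >= 0; i--)
                y = y.multiply(BigInteger.valueOf(x)).add(coeff[i]).mod(P);
            shares[x - 1] = new BigInteger[]{BigInteger.valueOf(x), y};
        }

        // Any k shares (here the first three) recover the key by Lagrange interpolation at 0.
        BigInteger recovered = BigInteger.ZERO;
        for (int j = 0; j < k; j++) {
            BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
            for (int m = 0; m < k; m++) {
                if (m == j) continue;
                num = num.multiply(shares[m][0].negate()).mod(P);
                den = den.multiply(shares[j][0].subtract(shares[m][0])).mod(P);
            }
            recovered = recovered.add(shares[j][1].multiply(num).multiply(den.modInverse(P))).mod(P);
        }
        System.out.println("Recovered == key: " + recovered.equals(key));
    }
}
```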
The fourth method is to store the key in a tamper
resistant hardware module (e.g., TPM) and define the
Application Programming Interface (API) to manage
the stored keys. This is in line with the standard
practice employed in the financial industry for key man-
agement [3]. In this paper, we will adopt the same
TPM-based approach. However, the main difficulty
with the TPM lies in how the API should be defined.
In 2005, Perlman first proposed to use a TPM for
assured data deletion [24]. In her solution, data is
always encrypted before being saved onto the disk.
All decryption keys are stored in a tamper resistant
module and do not live outside the module. Erasing
the keys will effectively delete the data. To delete
a key, the user simply sends a delete command to
the module with a reference to that key and receives
a one-bit confirmation if the operation is successful.
Clearly, this design still follows the one-bit return
protocol, which assumes complete trust in the correct
implementation of the software inside the module.
1.3 Motivation for public verifiability
There are similar examples of black-box systems in
security. For instance, as explained in [19], the Direct
Recording Electronic (DRE) e-voting machines, widely
used in the US between 2000 and 2004, worked like
a black box. The system returned a tally at the end of the election, which the voters had to trust but could not easily verify. The lack of verifiability raised widespread suspicion about the integrity of the software
inside the voting machine and hence the integrity of
the election, eventually forcing several states in the US
to abandon DRE machines. Today, the importance of
having public verifiability in any e-voting system has
been commonly acknowledged and progress is being
made in deploying verifiable e-voting in real-world
elections [2], [6].
Unfortunately, the need for public verifiability has
been almost entirely neglected in the secure data
erasure field. This is an important omission that we
aim to address in this research work.
When a TPM is used for key management, the
trust assumption about the TPM becomes a critical
question. In the past literature [3], there exist two
disparate assumptions about TPM: either completely
trust or totally distrust. However, we find that neither of these black/white assumptions adequately captures reality. On one hand, the fact that a TPM stores
cryptographic keys implies an inherent trust. But on
the other hand, the encapsulated nature of a TPM
prevents users from verifying the internal software,
which inevitably adds distrust. These seemingly con-
tradictory dual-facets are echoes of similar problems
in e-voting, where a DRE machine is used as a trusted
device to record votes, but the public have no access to
its internal code. The established solution to address
this dilemma is “trust-but-verify” [2], [6], [15]: i.e.,
demanding the voting machine to produce additional
cryptographic proofs such that by verifying the cor-
rectness of those proofs a voter can gain confidence
about the integrity of the internal software (this is also
succinctly summarized by Ron Rivest and John Wack
as the “software independence” principle).
Summary of main idea. The main idea of this work
follows the same design principle based on “trust-but-
verify”. By applying cryptographic techniques, we
allow an end user to verify the correct implementation
of two important operations inside a TPM: encryption
and deletion.
First, the user is able to explicitly verify that the en-
cryption follows the correct procedure (i.e., the ciphertext is free of any trap-door block). By
contrast, previous cryptography-based data deletion
solutions only provide implicit assurance: by checking
if the decryption produces the same original plaintext,
one gains implicit assurance about the correctness
of the encryption. However, we argue that such an
implicit assurance is inadequate (in light of Snowden
revelations [40]): a TPM manufacturer might be co-
erced by a state-funded adversary to compress a trap-
door block into the ciphertext so as to keep the output
length the same. The user will not be able to notice
any difference and the decryption can still produce
the original plaintext (we will explain more details
in Section 6.2.2). This issue will be addressed in our
solution through the Audit function.
Second, the user is able to verify the outcome of a
deletion process. Obviously, because using software
means can never guarantee the complete deletion
of data, verifying the successful erasure of data ap-
pears intuitively impossible. However, “you normally
change the problem if you can’t solve it.” (David
Wheeler [31]). Here, we slightly change the problem by shifting from verifying the successful deletion of data to verifying the failure of that operation. The
deletion process returns a digital signature, which
cryptographically binds the deletion program’s com-
mitment of deleting a secret key to the outcome of
that operation. In case the supposedly deleted key is
recovered later, the signature can serve as publicly
verifiable evidence to prove the vendor’s liability.
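The gist can be sketched as follows (an editor's simplification with a hypothetical statement format and key names; the paper's actual protocol is specified in Section 4): the TPM signs a statement committing to the deletion of a particular key, and anyone can later check that signature against the TPM's public key.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

public class DeletionReceiptSketch {
    public static void main(String[] args) throws Exception {
        // The TPM's long-term signing key pair; the public key is known to verifiers.
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("EC");
        kpg.initialize(256);
        KeyPair tpmKey = kpg.generateKeyPair();

        // Hypothetical statement binding the device to the claimed deletion.
        String statement = "DELETED key-id=42 at=2015-04-01T12:00Z";

        // The TPM signs the statement when it deletes the key...
        Signature signer = Signature.getInstance("SHA256withECDSA");
        signer.initSign(tpmKey.getPrivate());
        signer.update(statement.getBytes(StandardCharsets.UTF_8));
        byte[] receipt = signer.sign();

        // ...and anyone can later verify the receipt. If the "deleted" key ever
        // resurfaces, the receipt is publicly verifiable evidence of liability.
        Signature verifier = Signature.getInstance("SHA256withECDSA");
        verifier.initVerify(tpmKey.getPublic());
        verifier.update(statement.getBytes(StandardCharsets.UTF_8));
        System.out.println("Receipt valid: " + verifier.verify(receipt));
    }
}
```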
More technical details will be explained in Section 4
after we cover the related work in Section 2 and the
relevant cryptographic primitives in Section 3. Sec-
tion 5 explains the proof-of-concept implementation
with detailed performance measurements, followed
by security analysis in Section 6. Finally, Section 7
concludes the paper.
2 RELATED WORK
In this section, we review related works that discuss
the importance of verifiability for secure data deletion.
In 2010, Paul and Saxena [22] aimed to give users the ability to verify the outcome of secure data deletion. They proposed a scheme called the “Proof of Erasability” (PoE), in which a host program deletes
data by overwriting the disk with random patterns
and the disk must return the same patterns as the
proof of erasability. Clearly, this so-called proof is
not cryptographically binding, nor publicly verifiable,
since the data storage system may cheat by echoing
the received patterns without actually overwriting the
disk.
In ESORICS’10, Perito and Tsudik [23] study how
to securely erase memory in an embedded device,
as a preparatory step for updating the firmware in
the device. They propose a protocol called Proofs of
Secure Erasure (PoSE-s). In this protocol, the host
program sends a string of random patterns to the
embedded device. To prove that the memory has been
securely erased, the embedded device should return
the same string of patterns. It is assumed that the
embedded device has limited memory - just enough
to hold the received random patterns. This protocol
works essentially the same way as the PoE in [22], but
with an additional assumption of bounded storage.
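A compressed sketch of the echo idea behind PoE/PoSE-s follows (an editor's illustration; in the real protocols the challenge fills the device's entire memory, which is what the bounded-storage assumption relies on).

```java
import java.security.SecureRandom;
import java.util.Arrays;

public class ProofOfErasureSketch {

    // Models an embedded device whose whole memory is overwritten by the challenge.
    static class Device {
        byte[] memory;
        byte[] receive(byte[] challenge) {
            memory = challenge.clone();   // storing the challenge displaces all old data
            return memory;                // echo it back as the "proof"
        }
    }

    public static void main(String[] args) {
        byte[] challenge = new byte[1 << 20];        // sized to the device's memory
        new SecureRandom().nextBytes(challenge);

        byte[] echoed = new Device().receive(challenge);
        boolean accepted = Arrays.equals(challenge, echoed);

        // The check only holds if the device truly cannot store both the old data
        // and the challenge; a roomier (or dishonest) device can echo without erasing.
        System.out.println("Proof accepted: " + accepted);
    }
}
```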
Finally, in 2012, Swanson and Wei [34] investigated the effectiveness of the built-in data erasure mechanisms in several commercial Solid State Drives (SSDs). They discovered that the built-in “sanitize” methods in several SSDs were completely ineffective due to
software bugs. Based on this discovery, they stress the
importance of being able to independently verify the
data deletion outcome. They propose a verification
method that works as follows. First of all, a series
of recognizable patterns are written to the entire
drive. Then, the drive is erased by calling the built-
in “sanitize” command. Next, the drive is manually
dismantled and a custom-built probing tool (made by
the authors) is used to read raw bits from the memory
in search of any unerased data. This approach can
be useful for factory testing. However, it may prove
difficult for ordinary users to perform.
In summary, several researchers have recognized
the importance of verifiability in the secure data dele-
tion process and proposed some solutions. But none
of those solutions have used any cryptography. Our
work differs from theirs in that we aim to provide
public verifiability for a secure data deletion system by
adopting public key cryptography.

3 CRYPTOGRAPHIC PRIMITIVES
In this section, we explain two relevant cryptographic
primitives: the Diffie-Hellman Integrated Encryption
Scheme (DHIES) and Chaum-Pedersen Zero Knowl-
edge Proof.
3.1 DHIES
DHIES is a public key encryption scheme adapted from the Diffie-Hellman key exchange protocol and has been included in the draft standards of ANSI X9.63 and IEEE P1363a [1]. The scheme is designed to
provide security against chosen ciphertext attacks. It
makes use of a finite cyclic group, which for example
can be the same cyclic group used in DSA or ECDSA
[29]. Here, we use an ECDSA-like group for illustration. Let E be an underlying elliptic curve for ECDSA and G be a base point on the curve of prime order n.
Assume the user’s private key is v, which is chosen at random from [1, n−1]. The corresponding public key is Q_v = v · G. The encryption in DHIES works as follows. The program first generates an ephemeral public key Q_u = u · G, where u ∈_R [1, n−1]. It then derives a shared secret following the Diffie-Hellman protocol: S = u · Q_v. The shared secret is then hashed through a cryptographic hash function H, and the output is split into two keys: encKey and macKey. First, the encKey key is used to encrypt a message to obtain encM. Then, the macKey key is used to compute a MAC tag from the encrypted message encM. The final ciphertext consists of the ephemeral key Q_u, the MAC tag, and the encrypted message encM. This encryption process is summarized in Figure 1.
The decryption procedure starts with checking whether the ephemeral public key Q_u is a valid element in the designated group, a step commonly known as “public key validation” (see footnote 1). Next, it derives the same shared secret value following the Diffie-Hellman protocol. Based on the shared secret, a hash function is applied to derive encKey and macKey, according to Figure 1. Upon successful validation of the MAC tag using the macKey, the encrypted message is decrypted using the encKey. More details about DHIES can be found in [1].
It is worth noting that DHIES is essentially built on
the Diffie-Hellman key exchange protocol, but with
adaptations to make it suitable for a secure data
storage application. For example, Alice can encrypt a
message under her own public key using DHIES, so
that only she can decrypt the message at a later time.
1. The original DHIES paper [1] does not explicitly mandate
public key validation on the ephemeral public key, but as explained
by Antipa et al. in [4], the security proofs in DHIES [1] implicitly
assume the received points must be on the valid elliptic curve;
otherwise, the scheme may be subject to invalid-curve attacks.
In our specification, we regard such public key validation as a
mandatory step.
Figure 1: Encrypting with DHIES [1]. The symmetric encryption algorithm is denoted as E, the MAC algorithm as T, and the hash function as H. The shaded rectangles constitute the ciphertext.
In some sense, it is like Alice securely communicating
with herself in the future.
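The following sketch (an editor's illustration using the JDK's EC and AES primitives over curve secp256r1; the paper's prototype uses its own parameters) traces the steps in Figure 1: generate an ephemeral key pair, derive the Diffie-Hellman secret, hash it into encKey and macKey, encrypt, then MAC the result.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.spec.ECGenParameterSpec;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyAgreement;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class DhiesSketch {
    public static void main(String[] args) throws Exception {
        // User's long-term key pair: private v, public Q_v = v*G.
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("EC");
        kpg.initialize(new ECGenParameterSpec("secp256r1"));
        KeyPair user = kpg.generateKeyPair();

        // Ephemeral key pair: u and Q_u = u*G.
        KeyPair eph = kpg.generateKeyPair();

        // Shared secret S = u * Q_v (Diffie-Hellman).
        KeyAgreement ka = KeyAgreement.getInstance("ECDH");
        ka.init(eph.getPrivate());
        ka.doPhase(user.getPublic(), true);
        byte[] s = ka.generateSecret();

        // Hash S and split the output into encKey || macKey.
        byte[] h = MessageDigest.getInstance("SHA-256").digest(s);
        SecretKeySpec encKey = new SecretKeySpec(Arrays.copyOfRange(h, 0, 16), "AES");
        SecretKeySpec macKey = new SecretKeySpec(Arrays.copyOfRange(h, 16, 32), "HmacSHA256");

        // encM = E_encKey(message); here E is AES in CTR mode (the key is fresh
        // per message, so a fixed counter IV is tolerable in this sketch).
        Cipher e = Cipher.getInstance("AES/CTR/NoPadding");
        e.init(Cipher.ENCRYPT_MODE, encKey, new IvParameterSpec(new byte[16]));
        byte[] encM = e.doFinal("message to be stored".getBytes("UTF-8"));

        // tag = T_macKey(encM); T is HMAC-SHA256.
        Mac t = Mac.getInstance("HmacSHA256");
        t.init(macKey);
        byte[] tag = t.doFinal(encM);

        // Ciphertext = (Q_u, tag, encM); Q_u is stored in encoded form.
        byte[] qU = eph.getPublic().getEncoded();
        System.out.printf("ciphertext = Q_u (%d bytes) || tag (%d bytes) || encM (%d bytes)%n",
                          qU.length, tag.length, encM.length);
    }
}
```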
For any key exchange protocol, there is always
a key confirmation step, which is either implicit or
explicit [29]. The original DHIES scheme is designed
to provide only implicit key confirmation: the key is
implicitly confirmed by checking the MAC tag. How-
ever, there are two drawbacks with this approach.
First, it does not distinguish two different failure
modes in case the MAC verification is unsuccessful.
In the first mode, wrong session keys may have been derived from the key exchange process; for example, the message had been encrypted by a different key v′ · G, where v′ ≠ v. In the second mode, the encrypted
message encM may have been corrupted (due to
storage errors or malicious tampering). It is sometimes
useful for an application to be able to distinguish the
two modes and handle the failure accordingly, but
this is not possible in the original DHIES. The second
drawback is performance. In DHIES, the latency for
performing implicit key confirmation (through checking the MAC) is always linear in the size of the ciphertext.
However, this linear time complexity O(n) can prove
unnecessarily inefficient if the MAC failure was due to
the derivation of wrong session keys. (We will explain
more on this after we describe the Audit function in
Section 4.)
We address both limitations by adding an explicit
key confirmation step to DHIES. This change provides
explicit assurance on the correct derivation of the
session keys. It is consistent with the common under-
standing that in key exchange protocols, explicit key
confirmation is generally considered more desirable
than implicit key confirmation [29]. We will explain
the modified DHIES in detail in Section 4.
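One simple way to realize explicit key confirmation, shown below as an editor's illustration rather than the construction given in Section 4, is to store a short key-confirmation string derived from the session keys next to the ciphertext; on decryption this constant-size value is checked first, so a wrong-key failure is detected in O(1) time and is distinguishable from a corrupted encM (whose MAC check would fail later).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class KeyConfirmationSketch {

    // Key-confirmation string: a hash bound to the derived session keys only,
    // independent of the (possibly very long) encrypted message.
    static byte[] keyConfirmation(byte[] encKey, byte[] macKey) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update("key-confirmation".getBytes(StandardCharsets.UTF_8));  // domain separation
        md.update(encKey);
        md.update(macKey);
        return md.digest();
    }

    // At decryption time, check the stored confirmation string before touching encM.
    static void decrypt(byte[] storedKc, byte[] encKey, byte[] macKey, byte[] encM)
            throws Exception {
        if (!Arrays.equals(storedKc, keyConfirmation(encKey, macKey))) {
            throw new IllegalStateException("wrong session keys");      // O(1), failure mode 1
        }
        // ... only now verify the MAC over encM and decrypt it; a failure here
        // indicates a corrupted ciphertext (failure mode 2) ...
    }
}
```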
3.2 Chaum-Pedersen protocol
Assume the same elliptic curve setting (E, G, n) as above. Given a tuple (G, X, R, Z) = (G, x·G, r·G, x·r·G), where x, r ∈_R [1, n−1], the Chaum-Pedersen protocol is an honest-verifier Zero-Knowledge Proof (ZKP) that the tuple is a Diffie-Hellman tuple, i.e., that the same secret exponent x links X to G and Z to R.
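Since the protocol description is cut off in this excerpt, the following non-interactive sketch (an editor's illustration over a multiplicative Schnorr group with a Fiat-Shamir hash, rather than the paper's elliptic-curve setting) shows the standard Chaum-Pedersen flow: commit with a fresh witness, derive the challenge by hashing, and verify two linked equations.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.SecureRandom;

public class ChaumPedersenSketch {
    static final SecureRandom RND = new SecureRandom();

    static BigInteger hash(BigInteger... vals) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        for (BigInteger v : vals) md.update(v.toByteArray());
        return new BigInteger(1, md.digest());
    }

    public static void main(String[] args) throws Exception {
        // Toy Schnorr group: p = 2q + 1 with q prime; g generates the order-q subgroup.
        BigInteger q, p;
        do {
            q = BigInteger.probablePrime(160, RND);
            p = q.shiftLeft(1).add(BigInteger.ONE);
        } while (!p.isProbablePrime(40));
        BigInteger g = new BigInteger(p.bitLength() - 1, RND).modPow(BigInteger.valueOf(2), p);

        // The tuple (g, X, R, Z) = (g, g^x, g^r, g^(x*r)); the prover knows x.
        BigInteger x = new BigInteger(q.bitLength() - 1, RND);
        BigInteger r = new BigInteger(q.bitLength() - 1, RND);
        BigInteger X = g.modPow(x, p);
        BigInteger R = g.modPow(r, p);
        BigInteger Z = R.modPow(x, p);

        // Prover: commit with a fresh witness w, then respond to the hashed challenge.
        BigInteger w = new BigInteger(q.bitLength() - 1, RND);
        BigInteger A = g.modPow(w, p);
        BigInteger B = R.modPow(w, p);
        BigInteger c = hash(g, X, R, Z, A, B).mod(q);       // Fiat-Shamir challenge
        BigInteger s = w.add(c.multiply(x)).mod(q);

        // Verifier: the same response s must explain both commitments.
        boolean ok = g.modPow(s, p).equals(A.multiply(X.modPow(c, p)).mod(p))
                  && R.modPow(s, p).equals(B.multiply(Z.modPow(c, p)).mod(p));
        System.out.println("Proof verifies: " + ok);
    }
}
```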
