
Measuring Information Leakage using Generalized Gain Functions
Mário S. Alvim, Konstantinos Chatzikokolakis, Catuscia Palamidessi, and Geoffrey Smith

University of Pennsylvania, USA, msalvim@sas.upenn.edu
INRIA, CNRS and LIX, École Polytechnique, France, {kostas,catuscia}@lix.polytechnique.fr
Florida International University, USA, smithg@cis.fiu.edu
Abstract—This paper introduces g-leakage, a rich generalization of the min-entropy model of quantitative information flow. In g-leakage, the benefit that an adversary derives from a certain guess about a secret is specified using a gain function g. Gain functions allow a wide variety of operational scenarios to be modeled, including those where the adversary benefits from guessing a value close to the secret, guessing a part of the secret, guessing a property of the secret, or guessing the secret within some number of tries. We prove important properties of g-leakage, including bounds between min-capacity, g-capacity, and Shannon capacity. We also show a deep connection between a strong leakage ordering on two channels, C1 and C2, and the possibility of factoring C1 into C2C3, for some C3. Based on this connection, we propose a generalization of the Lattice of Information from deterministic to probabilistic channels.
I. INTRODUCTION
A fundamental concern in computer security is to control
information flow, whether to protect confidential information
from being leaked, or to protect trusted information from
being tainted. In view of the pragmatic difficulty of prevent-
ing undesirable flows completely, there is now much interest
in theories that allow information flow to be quantified, so
that “small” leaks can be tolerated. (See, for example, [1],
[2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12].) For
any leakage measure, a key challenge is to establish its
operational significance, so that a certain amount of leakage
implies a definite security guarantee.
Min-entropy leakage [10], [13] is a leakage measure based on the amount by which a channel increases the vulnerability of a secret to being guessed correctly in one try by an adversary (the precise definition is reviewed in Section II). This clear operational significance is a strength
of min-entropy, but it also leads to questions about whether
min-entropy leakage is relevant across the wide range of
possible application scenarios. For instance, what if the
adversary is allowed to make multiple guesses? Or what if
the adversary could gain some benefit by guessing the secret
only partially or approximately?
With respect to guessing the secret partially, we can note that we could in fact analyze a sub-channel that models the processing of whatever piece of a larger secret we wish to consider. While this can be useful, it is clumsy to
need to analyze multiple sub-channels of the same channel.
Also, such an analysis is misleading in the case of a channel
that poses little threat to any particular piece of the secret,
yet is very likely to leak some piece of the secret. To
illustrate, suppose that the secret is an array X containing
10-bit, uniformly-distributed passwords for 1000 users. Now
consider the following probabilistic channel, which leaks
some randomly-chosen user’s password:
    u ?← {0..999};
    Y = (u, X[u]);                          (Ex1)
If we analyze the min-entropy leakage of (Ex1), we find that the prior vulnerability is 2^{-10000}, since there are 10000 completely unknown bits, while the posterior vulnerability is 2^{-9990}, since Y reveals 10 of the bits. The min-entropy leakage is the logarithm of the ratio of the posterior and prior vulnerabilities:

$$ L = \log \frac{2^{-9990}}{2^{-10000}} = \log 2^{10} = 10 \text{ bits.} $$
If we instead analyze the sub-channel focused on any particular user i's password, the prior vulnerability is 2^{-10}, and the posterior vulnerability is 0.001 · 1 + 0.999 · 2^{-10} ≈ 0.00198, since with probability 0.001 the adversary learns user i's password from Y, and with probability 0.999 he must still make a blind guess. Thus the min-entropy leakage of the sub-channel is log 2.023 ≈ 1.016 bits. Hence we see that the threat of (Ex1) is not well described by min-entropy leakage: the whole channel leaks just 10 bits out of 10000, and the sub-channel just 1.016 bits out of 10, even though some user's password is always leaked completely.
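These figures can be checked with a minimal Python sketch (ours); exact rational arithmetic avoids floating-point underflow on the 2^{-10000} terms:

    from fractions import Fraction
    from math import log2

    # Whole channel: work directly with exponents, since 2**-10000
    # underflows a float. log2(2**-9990 / 2**-10000) = -9990 + 10000.
    leak_whole = -9990 + 10000            # 10 bits

    # Sub-channel for one fixed user i, in exact arithmetic.
    prior = Fraction(1, 2**10)
    post = Fraction(1, 1000) + Fraction(999, 1000) * Fraction(1, 2**10)
    leak_user = log2(post / prior)        # ratio is exactly 2.023, ~1.016 bits

    print(leak_whole, float(post), leak_user)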
In light of the wide range of possible operational threat
scenarios, there is growing appreciation that no single leak-
age measure is likely to be appropriate in all cases. For
this reason, in this paper we introduce a generalization of
min-entropy leakage, called g-leakage. The key idea is to
generalize the notion of vulnerability to incorporate what
we call a gain function g that models the benefit that the
adversary gains by making a certain guess about the secret. If
the adversary makes guess w when the secret’s actual value
is x, then g(w, x) models the benefit that the adversary gains
from this guess, ranging from 0 (if w has no value at all) to 1 (if w is ideal). Given gain function g, g-vulnerability is
defined as the maximum expected gain over all guesses.
As we will see in Section III, gain functions let us
model a wide variety of scenarios, including those where
the adversary benefits from guessing a value close to the
secret, guessing a part of the secret, guessing a property of
the secret, or guessing the secret within k tries. We can also
model the case when there is a penalty for incorrect guesses.
Thus g-leakage seems fruitful in addressing a great number
of practical situations.
In addition to introducing the new concept of g-leakage,
we also make significant technical contributions, principally
in Sections V and VI.
In Section V, we establish important bounds on capacity,
the maximum leakage over all prior distributions. We prove
that min-capacity is an upper bound on g-capacity, for any
gain function g—this means that a channel with small min-
capacity is (in a sense) safe in every possible scenario.
Moreover, we prove that min-capacity is also an upper bound
on Shannon capacity, settling a conjecture in [14].
In Section VI, we consider the problem of comparing two channels, C1 and C2, asking whether on every prior the leakage of C1 is less than or equal to that of C2. Yasuoka and Terauchi [15] and Malacaria [16] recently explored this strong ordering in the case where C1 and C2 are deterministic, focusing on the fact that deterministic channels induce partitions on the space of secrets. They showed that the orderings produced by min-entropy leakage and Shannon leakage are the same and, moreover, they coincide with the partition refinement ordering in the Lattice of Information [17]. Since partition refinement applies only to deterministic channels but leakage ordering makes sense for any channels, this equivalence suggests an approach to generalizing the Lattice of Information to probabilistic channels.
Our first result in Section VI identifies a promising generalization of partition refinement. We show that on deterministic channels, C1 ⊑ C2 iff there exists a factorization of C1 into a cascade: C1 = C2C3, for some channel C3. In this case we say that C1 is composition refined by C2, written C1 ⊑∘ C2. In the most technically challenging part of our paper, we show a deep connection between ⊑∘ and leakage ordering. We show first in Theorem 6.2 that C1 ⊑∘ C2 implies that C1's g-leakage is less than or equal to C2's, for every prior and every g; we denote this by C1 ≤G C2. We conjecture that the converse implication, C1 ≤G C2 implies C1 ⊑∘ C2, is also true, but it turns out to be extremely subtle and we have been unable so far to prove it in full generality. We have proved it in important special cases (e.g. when C2's columns are linearly independent), even limiting to a very restricted kind of gain function; we have also shown that the unproved case is inherently harder, in that much richer gain functions are required.
The rest of the paper is structured as follows. Sections II,
III, and IV present preliminaries, define g-leakage, and show
its basic properties. Sections V and VI present our results on
capacity and on comparing channels. Finally, Sections VII
and VIII discuss related work and conclude.
II. PRELIMINARIES
In this section, we briefly recall the basic definitions of
information-theoretic channels [18], vulnerability, and min-
entropy leakage [10], introducing the non-standard notation
that we will use.
A channel is a triple (X, Y, C), where X and Y are finite
sets (of secret input values and observable output values) and
C is a channel matrix, an |X|×|Y| matrix whose entries are
between 0 and 1 and whose rows each sum to 1; the intent
is that C[x, y] is the probability of getting output y when
the input is x. Channel C is deterministic if each entry of C
is either 0 or 1, implying that each row contains exactly one
1, which means that each input produces a unique output.
Given a prior distribution π on X, we can define a joint distribution p on X × Y by p(x, y) = π[x]C[x, y]. This gives jointly distributed random variables X and Y with marginal probabilities p(x) = Σ_y p(x, y), conditional probabilities p(y|x) = p(x, y)/p(x) (if p(x) is nonzero), and similarly p(y) and p(x|y). As shown in [19], p is the unique joint distribution that recovers π and C, in that p(x) = π[x] and p(y|x) = C[x, y] (if p(x) is nonzero).
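These constructions translate directly into a few lines of NumPy; the following is a minimal sketch (ours), assuming an invented 3×2 channel for illustration:

    import numpy as np

    # An invented channel matrix: 3 secrets, 2 observations; rows sum to 1.
    C = np.array([[1.0, 0.0],
                  [0.5, 0.5],
                  [0.0, 1.0]])
    pi = np.array([0.5, 0.25, 0.25])      # prior distribution on X

    p = pi[:, None] * C                   # joint: p(x, y) = pi[x] * C[x, y]
    p_x = p.sum(axis=1)                   # marginal p(x): recovers pi
    p_y = p.sum(axis=0)                   # marginal p(y)
    print(np.allclose(p_x, pi))           # True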
We now define vulnerability, introducing a new notation. (We deviate from the standard notation V(X) and V(X|Y) used in [14] and elsewhere, because we wish to express explicitly the dependence on X's prior distribution.)

Definition 2.1: Given prior π and channel C, the prior vulnerability is given by

$$ V(\pi) = \max_{x \in X} \pi[x], $$

and the posterior vulnerability is given by

$$ V(\pi, C) = \sum_{y \in Y} \max_{x \in X} \pi[x]\, C[x, y]. $$
We assume in this paper that the prior distribution π and channel C are known to the adversary A. Then V(π) is the prior probability that A could guess the value of X correctly in one try. To understand posterior vulnerability, note that

$$ V(\pi, C) = \sum_y \max_x p(x, y) = \sum_y p(y) \max_x p(x|y) = \sum_y p(y)\, V(p_{X|y}), $$

making it the (weighted) average of the vulnerabilities of the posterior distributions p_{X|y}.
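Both quantities are one-liners in the NumPy sketch above (ours; same invented channel):

    import numpy as np

    def V_prior(pi):
        return pi.max()                   # V(pi) = max_x pi[x]

    def V_post(pi, C):
        # V(pi, C) = sum over y of max over x of pi[x] * C[x, y]
        return (pi[:, None] * C).max(axis=0).sum()

    C = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
    pi = np.array([0.5, 0.25, 0.25])
    print(V_prior(pi), V_post(pi, C))     # 0.5 and 0.75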
We convert from vulnerability to min-entropy by taking the negative logarithm (to base 2):

Definition 2.2:

$$ H_\infty(\pi) = -\log V(\pi), \qquad H_\infty(\pi, C) = -\log V(\pi, C). $$

Note that vulnerability is a probability, while min-entropy
is a measure of bits of uncertainty.
Next we define min-entropy leakage L(π, C) and min-capacity ML(C):

Definition 2.3:

$$ L(\pi, C) = H_\infty(\pi) - H_\infty(\pi, C) = \log \frac{V(\pi, C)}{V(\pi)}, \qquad ML(C) = \sup_\pi L(\pi, C). $$
The min-entropy leakage L(π, C) is the amount by which
channel C decreases the uncertainty about the secret; equiv-
alently, it is the logarithm of the factor by which C increases
the vulnerability. The min-capacity ML(C) is the maximum
min-entropy leakage over all priors π; it can be seen as the
worst-case leakage of C.
Finally, we recall [13] that the min-capacity of C is easy
to calculate, as it is simply the logarithm of the sum of the
column maximums of C:
Theorem 2.1: ML(C) = log Σ_y max_x C[x, y], and it is realized on a uniform prior π.
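Theorem 2.1 gives an immediate algorithm: sum the column maxima of C and take the logarithm. A minimal sketch (ours), reusing the invented channel from above:

    import numpy as np
    from math import log2

    def min_capacity(C):
        # ML(C) = log2 of the sum of the column maxima (Theorem 2.1)
        return log2(C.max(axis=0).sum())

    C = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
    print(min_capacity(C))                # log2(1.0 + 1.0) = 1.0 bit

    # Check: on the uniform prior, L(pi, C) attains this bound.
    pi_u = np.ones(3) / 3
    V, VC = pi_u.max(), (pi_u[:, None] * C).max(axis=0).sum()
    print(log2(VC / V))                   # 1.0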
III. GAIN FUNCTIONS, g-VULNERABILITY, AND g-LEAKAGE
We now develop the theory of gain functions and the
leakage measures that they give.
Implicit in the definition of prior and posterior vulnerabil-
ity V (π) and V (π , C) is the assumption that the adversary
benefits only by guessing the entire secret exactly. But,
as motivated in Section I, there are certainly situations
where this assumption is not appropriate. This leads us to
introduce what we call gain functions as abstract models of
the particular operational scenario. The idea is that in any
such scenario, there will be some set of guesses that the
adversary could make about the secret, and for any guess w
and secret value x, there will be some gain that the adversary
gets by choosing w when the secret’s actual value is x. A
gain function g will specify this gain as g(w, x), using scores
that range from 0 to 1.
A first question, however, is what should be the set of
allowable guesses. One might be tempted to assume that this
should just be X, the set of possible values of the secret.
But given our desire to model scenarios where the adversary
gains by guessing a piece of the secret, or a value close to
the secret, or some property of the secret, we instead let a
gain function use an arbitrary set W of allowable guesses.
Definition 3.1: Given a set X of possible secrets and a finite, nonempty set W of allowable guesses, a gain function is a function g : W × X → [0, 1].
Sometimes it is convenient to represent a gain function g
as a |W|×|X| matrix G, where G[w, x] = g(w, x); the rows
of G correspond to guesses and the columns to secrets.
We now adapt the definition of vulnerability to take account of the gain function:

Definition 3.2: Given gain function g and prior π, the prior g-vulnerability is

$$ V_g(\pi) = \max_{w \in W} \sum_{x \in X} \pi[x]\, g(w, x). $$

The idea is that adversary A should make a guess w that maximizes the expected gain; we therefore take the weighted average of g(w, x), for every possible value x of X. (We remark that our assumption that gain values are between 0 and 1 is unimportant: allowing g to return a value in [0, a], for some constant a, just scales all g-vulnerabilities by a factor of a and therefore has no effect on g-leakage.)
Definition 3.3: Given gain function g, prior π, and channel C, the posterior g-vulnerability is

$$ V_g(\pi, C) = \sum_{y \in Y} \max_{w \in W} \sum_{x \in X} \pi[x]\, C[x, y]\, g(w, x) = \sum_{y \in Y} \max_{w \in W} \sum_{x \in X} p(x, y)\, g(w, x) = \sum_{y \in Y} p(y)\, V_g(p_{X|y}). $$
Now we define g-entropy, g-leakage, and g-capacity in exactly the same way as in Section II:

Definition 3.4:

$$ H_g(\pi) = -\log V_g(\pi), \qquad H_g(\pi, C) = -\log V_g(\pi, C), $$
$$ L_g(\pi, C) = H_g(\pi) - H_g(\pi, C) = \log \frac{V_g(\pi, C)}{V_g(\pi)}, \qquad ML_g(C) = \sup_\pi L_g(\pi, C). $$
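These definitions are also directly computable: a gain function over guesses W and secrets X is just a |W| × |X| matrix. A minimal sketch (ours; the function names and the two-guess gain matrix are invented for illustration, and the channel is the one from Section II):

    import numpy as np
    from math import log2

    def Vg_prior(pi, G):
        # V_g(pi) = max_w sum_x pi[x] * g(w, x)
        return (G @ pi).max()

    def Vg_post(pi, C, G):
        # V_g(pi, C) = sum_y max_w sum_x pi[x] * C[x, y] * g(w, x)
        return (G @ (pi[:, None] * C)).max(axis=0).sum()

    def g_leakage(pi, C, G):
        return log2(Vg_post(pi, C, G) / Vg_prior(pi, G))

    C = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
    pi = np.array([0.5, 0.25, 0.25])
    # Two guesses: w0 pays off on secrets x0 and x1, w1 on secret x2.
    G = np.array([[1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
    print(g_leakage(pi, C, G))            # log2(0.875 / 0.75) ~ 0.222 bits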
In Section IV, we will explore the mathematical properties
of g-leakage. But first we present a number of example gain
functions that illustrate the usefulness of g-leakage.
A. The identity gain function
One obvious (and often appropriate) gain function is the
one that says that a correct guess is worth 1 and an incorrect
guess is worth 0:
Definition 3.5: The identity gain function g_id : X × X → [0, 1] is given by

$$ g_{id}(w, x) = \begin{cases} 1, & \text{if } w = x, \\ 0, & \text{if } w \neq x. \end{cases} $$

Note that for g_id we assume that W = X, since there is no gain to be had from a guess outside of X. In terms of representing a gain function as a matrix, g_id corresponds to the identity matrix I_{|X|}. Also notice that g_id is the Kronecker delta, since g_id(w, x) = δ_{wx}.
Now we can show that g-vulnerability is a generalization of ordinary vulnerability:

Proposition 3.1: Vulnerability under g_id coincides with vulnerability:

$$ V_{g_{id}}(\pi) = V(\pi). $$

Proof: Note that for any w, Σ_x π[x] g_id(w, x) = π[w]. So

$$ V_{g_{id}}(\pi) = \max_w \pi[w] = V(\pi). $$

This means that g_id-leakage coincides with min-entropy leakage.
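In matrix terms, Proposition 3.1 just says that taking G to be the identity matrix makes V_g collapse to V; a one-line check with the sketch above (ours):

    import numpy as np

    pi = np.array([0.5, 0.25, 0.25])
    G_id = np.eye(3)                      # g_id as the identity matrix
    print((G_id @ pi).max() == pi.max())  # True: V_gid(pi) = V(pi)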
B. Gain functions induced from metrics or other distance functions

Exploring other gain functions, one quite natural kind of structure that X may exhibit is a notion of distance between secrets. That is, there may be a metric d on X, which is a function

$$ d : X \times X \to [0, \infty) $$

satisfying the properties

(identity of indiscernibles) d(x1, x2) = 0 iff x1 = x2,
(symmetry) d(x1, x2) = d(x2, x1), and
(triangle inequality) d(x1, x3) ≤ d(x1, x2) + d(x2, x3).

Given a metric d, we can first form a normalized metric d̄ by dividing all distances by the maximum value of d, and then we can define a gain function g_d by

$$ g_d(w, x) = 1 - \bar{d}(w, x). $$

(Note that here we are taking W = X.) In this case we say that g_d is the gain function induced from metric d. (It is also rather natural to define a gain function from a metric by g(w, x) = e^{-d(w, x)}; note that here we would actually want d to be an extended metric, so that a gain of 0 becomes possible.)
Metrics induce a large class of gain functions—note in
particular that the identity gain function is induced by the
discrete metric, which assigns distance 1 to any two distinct
values. However, there are several reasons why it is useful
to allow more generality.
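As a small illustration (ours; the one-dimensional secret space and the absolute-difference metric are assumptions for the example), the induced gain function g_d is easy to tabulate as a matrix:

    import numpy as np

    X = np.array([0.0, 1.0, 3.0])         # invented secrets on a line
    d = np.abs(X[:, None] - X[None, :])   # absolute-difference metric
    G = 1.0 - d / d.max()                 # g_d(w, x) = 1 - normalized d(w, x)
    print(G)                              # 1s on the diagonal, 0 at max distance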
For one thing, it may make sense to generalize to a metric on a set W that is a superset of X. To see why, suppose that the space of secrets is the set of corner points of a unit square: X = {(0, 0), (0, 1), (1, 0), (1, 1)}. Suppose that we use the gain function g(w, x) = 1 − d̄(w, x), where the metric d̄ is the normalized Euclidean distance:

$$ \bar{d}((x_1, y_1), (x_2, y_2)) = \sqrt{\frac{(x_1 - x_2)^2 + (y_1 - y_2)^2}{2}}. $$

Now,

$$ V_{g_d}(\pi) = \max_w \sum_x \pi[x] (1 - \bar{d}(w, x)), $$

and if π is uniform, then it is easy to see that any of the four corner points is an equally good guess, giving

$$ V_{g_d}(\pi) = \frac{1}{4}\left(1 + 2\left(1 - \frac{1}{\sqrt{2}}\right) + 0\right) \approx 0.3964. $$

But the adversary could actually do better by guessing (1/2, 1/2), a value that is not in X, since that guess has normalized distance 1/2 from each of the four corner points, giving V_{g_d}(π) = 1/2, which is larger than the previous vulnerability.
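Both vulnerabilities in this example are easy to verify numerically; a short check (ours):

    import numpy as np

    corners = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

    def d_bar(w, x):
        # normalized Euclidean distance on the unit square
        return np.sqrt(((w - x) ** 2).sum() / 2)

    def expected_gain(w):                 # uniform prior over the corners
        return sum(0.25 * (1 - d_bar(w, x)) for x in corners)

    print(expected_gain(corners[0]))              # ~0.3964 for any corner
    print(expected_gain(np.array([0.5, 0.5])))    # 0.5 for the center guess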
Moreover, the assumption of symmetry is sometimes inappropriate. Suppose that the secret is the time (rounded to the nearest minute) that the last RER B train will depart from Lozère back to Paris. (It is well known that RATP uses sophisticated techniques, such as the droit de retrait, to make this time as unpredictable as possible.) The adversary (i.e. the weary traveler) wants to guess this time as accurately as possible, but note that guessing 23:44 when the actual time is 23:47 is completely different from guessing 23:47 when the actual time is 23:44! If we normalize so that a wait of an hour or more is considered intolerable, then we would want the distance function

$$ d(w, x) = \begin{cases} \frac{x - w}{60} & \text{if } x - 60 < w \le x, \\ 1 & \text{otherwise,} \end{cases} $$

and the gain function

$$ g(w, x) = 1 - d(w, x). $$

But d(w, x) is not a metric, because it is not symmetric. (Such a function is sometimes called a quasimetric.)
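A tiny check of the asymmetry (ours, with times in minutes since midnight):

    def d(w, x):
        # waiting-time distance, normalized so an hour's wait counts as 1
        return (x - w) / 60 if x - 60 < w <= x else 1

    print(d(23*60 + 44, 23*60 + 47))      # guess 23:44, actual 23:47: 3/60 = 0.05
    print(d(23*60 + 47, 23*60 + 44))      # guess 23:47, actual 23:44: missed, 1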
C. Binary gain functions

The family of gain functions that return either 0 or 1 (and no values in between) is of particular interest, since we can characterize such functions concretely: given such a gain function, each guess corresponds exactly to the subset of X for which that guess gives gain 1. (Moreover, we can assume without loss of generality that no two guesses correspond to the same subset of X, since such guesses may as well be merged into one.) Hence we can use the subsets themselves as the guesses, leading to the following definition:

Definition 3.6: Given W ⊆ 2^X, W nonempty, the binary gain function g_W is given by

$$ g_W(W, x) = \begin{cases} 1, & \text{if } x \in W, \\ 0, & \text{otherwise.} \end{cases} $$
Now we can identify a number of interesting gain functions by considering different choices of W.

1) 2-block gain functions: If W = {W, X \ W} then we can see W as a property that the secret X might or might not satisfy, and g_W is the gain function corresponding to an adversary that just wants to decide whether or not X satisfies that property.

Such 2-block gain functions are reminiscent of the cryptographic notion of indistinguishability, which demands that from a ciphertext an adversary should not be able to decide any property of the corresponding plaintext.

2) Partition gain functions: More generally, W could be any partition of X into one or more disjoint blocks, where the adversary just wants to determine which block the secret belongs to. This is equivalent to saying that W = X/≡, where ≡ is an equivalence relation on X.
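A partition gain function is likewise easy to tabulate as a matrix, with one row per block and a 1 in the columns of the secrets that block contains. A short sketch (ours, with an invented partition of four secrets):

    import numpy as np

    blocks = [[0, 1], [2, 3]]             # a partition of X = {x0, x1, x2, x3}
    G = np.zeros((len(blocks), 4))
    for w, block in enumerate(blocks):
        G[w, block] = 1.0                 # g_W(W, x) = 1 iff x is in block W
    print(G)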
