Author

Samson Zhou

Bio: Samson Zhou is an academic researcher from Carnegie Mellon University. The author has contributed to research in topics: computer science & streaming algorithms. The author has an h-index of 11 and has co-authored 73 publications receiving 401 citations. Previous affiliations of Samson Zhou include Purdue University & University of Haifa.


Papers
Proceedings ArticleDOI
01 May 2018
TL;DR: It is advocated that password hashing standards should be updated to require the use of memory hard functions for password hashing and to disallow the use of non-memory hard functions such as BCRYPT or PBKDF2.
Abstract: We develop an economic model of an offline password cracker which allows us to make quantitative predictions about the fraction of accounts that a rational password attacker would crack in the event of an authentication server breach. We apply our economic model to analyze recent massive password breaches at Yahoo!, Dropbox, LastPass and AshleyMadison. All four organizations were using key-stretching to protect user passwords. In fact, LastPass' use of PBKDF2-SHA256 with 10^5 hash iterations exceeds the 2017 NIST minimum recommendation by an order of magnitude. Nevertheless, our analysis paints a bleak picture: the adopted key-stretching levels provide insufficient protection for user passwords. In particular, we present strong evidence that most user passwords follow a Zipf's law distribution, and characterize the behavior of a rational attacker when user passwords are selected from a Zipf's law distribution. We show that there is a finite threshold which depends on the Zipf's law parameters that characterizes the behavior of a rational attacker: if the value of a cracked password (normalized by the cost of computing the password hash function) exceeds this threshold then the adversary's optimal strategy is always to continue attacking until each user password has been cracked. In all cases (Yahoo!, Dropbox, LastPass and AshleyMadison) we find that the value of a cracked password almost certainly exceeds this threshold, meaning that a rational attacker would crack all passwords that are selected from the Zipf's law distribution (i.e., most user passwords). This prediction holds even if we incorporate an aggressive model of diminishing returns for the attacker (e.g., the total value of 500 million cracked passwords is less than 100 times the total value of 5 million passwords). On a positive note, our analysis demonstrates that memory hard functions (MHFs) such as SCRYPT or Argon2i can significantly reduce the damage of an offline attack. In particular, we find that because MHFs substantially increase guessing costs, a rational attacker will give up well before cracking most user passwords, and this prediction holds even if the attacker does not encounter diminishing returns for additional cracked passwords. Based on our analysis we advocate that password hashing standards should be updated to require the use of memory hard functions for password hashing and disallow the use of non-memory hard functions such as BCRYPT or PBKDF2.
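The threshold behavior described in the abstract can be illustrated with a toy calculation. The sketch below is illustrative only and is not the paper's model: it assumes password probabilities follow a simple Zipf law $p_i \propto i^{-s}$, a fixed cost $k$ per hash evaluation, and a fixed value $v$ per cracked password; the function name and all parameter values are made up for this sketch. It computes the fraction of accounts a greedy rational attacker cracks as a function of the ratio $v/k$.

```python
import numpy as np

def fraction_cracked(v_over_k, n_passwords=10**6, s=0.8):
    """Greedy-attacker sketch: passwords follow a Zipf distribution
    p_i ~ i^(-s); each additional guess costs one hash evaluation (cost k)
    and a cracked password is worth v. The attacker guesses in
    decreasing-probability order while the marginal expected gain
    v * p_i exceeds the marginal cost k, i.e. while p_i > k / v."""
    ranks = np.arange(1, n_passwords + 1)
    p = ranks ** (-s)
    p /= p.sum()
    cracked = p > 1.0 / v_over_k   # guess password i iff v * p_i > k
    return p[cracked].sum()        # expected fraction of accounts cracked

# Illustrative only: raising the per-guess cost k (e.g. via a memory hard
# function) lowers v/k and sharply reduces the fraction cracked.
for v_over_k in (10**3, 10**5, 10**7):
    print(v_over_k, round(fraction_cracked(v_over_k), 3))
```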

57 citations

Posted Content
TL;DR: The first difference estimators for a wide range of streaming problems are developed, and the results show that there is no separation between the sliding window model and the standard data stream model in terms of the dependence on the approximation factor.
Abstract: We introduce difference estimators for data stream computation, which provide approximations to $F(v)-F(u)$ for frequency vectors $v\succeq u$ and a given function $F$. We show how to use such estimators to carefully trade error for memory in an iterative manner. The function $F$ is generally non-linear, and we give the first difference estimators for the frequency moments $F_p$ for $p\in[0,2]$, as well as for integers $p>2$. Using these, we resolve a number of central open questions in adversarially robust streaming and sliding window models. For adversarially robust streams, we obtain a $(1+\epsilon)$-approximation to $F_p$ using $\tilde{\mathcal{O}}\left(\frac{\log n}{\epsilon^2}\right)$ bits of space for $p\in[0,2]$, and using $\tilde{\mathcal{O}}\left(\frac{1}{\epsilon^2}n^{1-2/p}\right)$ bits of space for integers $p>2$. We also obtain an adversarially robust algorithm for the $L_2$-heavy hitters problem using $\mathcal{O}\left(\frac{\log n}{\epsilon^2}\right)$ bits of space. Our bounds are optimal up to $\text{poly}(\log\log n + \log(1/\epsilon))$ factors, and improve the $\frac{1}{\epsilon^3}$ dependence of Ben-Eliezer et al. (PODS 2020, best paper award) and the $\frac{1}{\epsilon^{2.5}}$ dependence of Hassidim et al. (NeurIPS 2020, oral presentation). For sliding windows, we obtain a $(1+\epsilon)$-approximation to $F_p$ using $\tilde{\mathcal{O}}\left(\frac{\log^2 n}{\epsilon^2}\right)$ bits of space for $p\in(0,2]$, resolving a longstanding question of Braverman and Ostrovsky (FOCS 2007). For example, for $p = 2$ we improve the dependence on $\epsilon$ from $\frac{1}{\epsilon^4}$ to an optimal $\frac{1}{\epsilon^2}$. For both models, our dependence on $\epsilon$ shows, up to $\log\frac{1}{\epsilon}$ factors, that there is no overhead over the standard insertion-only data stream model for any of these problems.
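For intuition about why a difference can be cheaper to estimate than the function value itself, consider the case $F = F_2$. Since $F_2(x) = \|x\|_2^2$, elementary algebra gives $F_2(v) - F_2(u) = 2\langle u,\, v-u\rangle + \|v-u\|_2^2$, so when the update $v - u$ is small relative to $v$, both terms are small compared to $F_2(v)$ and intuitively should require less space to approximate to a fixed additive error than a fresh $(1+\epsilon)$-approximation of $F_2(v)$ would. This identity is stated here only as orientation; the estimators constructed in the paper are substantially more involved.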

37 citations

Proceedings Article
30 Apr 2020
TL;DR: This work proposes the first efficient, data-independent neural pruning algorithm with a provable trade-off between its compression rate and the approximation error for any future test sample, and demonstrates the effectiveness of the method on popular network architectures.
Abstract: Previous work showed empirically that large neural networks can be significantly reduced in size while preserving their accuracy. Model compression became a central research topic, as it is crucial for deployment of neural networks on devices with limited computational and memory resources. The majority of the compression methods are based on heuristics and offer no worst-case guarantees on the trade-off between the compression rate and the approximation error for an arbitrary new sample. We propose the first efficient, data-independent neural pruning algorithm with a provable trade-off between its compression rate and the approximation error for any future test sample. Our method is based on the coreset framework, which finds a small weighted subset of points that provably approximates the original inputs. Specifically, we approximate the output of a layer of neurons by a coreset of neurons in the previous layer and discard the rest. We apply this framework in a layer-by-layer fashion from the top to the bottom. Unlike previous works, our coreset is data independent, meaning that it provably guarantees the accuracy of the function for any input $x\in \mathbb{R}^d$, including an adversarial one. We demonstrate the effectiveness of our method on popular network architectures. In particular, our coresets yield 90% compression of the LeNet-300-100 architecture on MNIST while improving the accuracy.
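The core idea, approximating a layer's output with a reweighted subset of the previous layer's neurons, can be illustrated with a toy importance-sampling computation. The sketch below is a crude, data-dependent stand-in (sampling columns by norm), not the paper's provable, data-independent coreset construction; all sizes and names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer: output = W @ a, where a are activations of the previous layer.
d, m = 512, 128                     # previous-layer width, current-layer width
W = rng.normal(size=(m, d))
a = rng.normal(size=d)

# Importance-sample previous-layer neurons with probability proportional to
# their column norms (a crude stand-in for the sensitivities used by coresets),
# and reweight the kept neurons so the estimate is unbiased.
k = 64                              # number of neurons kept
scores = np.linalg.norm(W, axis=0)
probs = scores / scores.sum()
idx = rng.choice(d, size=k, replace=True, p=probs)
weights = 1.0 / (k * probs[idx])    # inverse-probability reweighting

approx = (W[:, idx] * weights) @ a[idx]
exact = W @ a
print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```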

36 citations

Book ChapterDOI
12 Nov 2017
TL;DR: The cumulative memory cost of computing Argon2i is analyzed: a lower bound for Argon2i is provided, together with an improved attack demonstrating that the lower bound is nearly tight.
Abstract: Argon2i is a data-independent memory hard function that won the Password Hashing Competition. The password hashing algorithm has already been incorporated into several open source crypto libraries such as libsodium. In this paper we analyze the cumulative memory cost of computing Argon2i. On the positive side, we provide a lower bound for Argon2i. On the negative side, we exhibit an improved attack against Argon2i which demonstrates that our lower bound is nearly tight.
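For context, the cumulative memory cost (often called cumulative memory complexity) of an evaluation algorithm is commonly defined as the sum of its memory usage over all steps of the computation, $\mathrm{cmc} = \sum_{t} |\sigma_t|$, where $\sigma_t$ is the state retained at step $t$; it is intended to capture the amortized cost to an attacker who evaluates many password guesses in parallel on custom hardware.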

29 citations

Journal ArticleDOI
TL;DR: This work studies group testing models in which the testing procedure is constrained to be “sparse,” and provides information-theoretic lower bounds on the number of tests required to guarantee high probability recovery.
Abstract: Group testing is the process of pooling arbitrary subsets from a set of $n$ items so as to identify, with a minimal number of tests, a "small" subset of $d$ defective items. In "classical" non-adaptive group testing, it is known that when $d$ is substantially smaller than $n$, $\Theta(d\log n)$ tests are both information-theoretically necessary and sufficient to guarantee recovery with high probability. Group testing schemes in the literature that meet this bound require most items to be tested $\Omega(\log n)$ times, and most tests to incorporate $\Omega(n/d)$ items. Motivated by physical considerations, we study group testing models in which the testing procedure is constrained to be "sparse." Specifically, we consider (separately) scenarios in which 1) items are finitely divisible and hence may participate in at most $\gamma \in o(\log n)$ tests; or 2) tests are size-constrained to pool no more than $\rho \in o(n/d)$ items per test. For both scenarios, we provide information-theoretic lower bounds on the number of tests required to guarantee high probability recovery. In particular, one of our main results shows that $\gamma$-finite divisibility of items forces any non-adaptive group testing algorithm with probability of recovery error at most $\epsilon$ to perform at least $\gamma d (n/d)^{(1-5\epsilon)/\gamma}$ tests. Analogously, for $\rho$-size constrained tests, we show an information-theoretic lower bound of $\Omega(n/\rho)$ tests for high-probability recovery; hence in both settings the number of tests required grows dramatically (relative to the classical setting) as a function of $n$. In both scenarios, we provide both randomized constructions and explicit constructions of designs with computationally efficient reconstruction algorithms that require a number of tests that is optimal up to constant or small polynomial factors in some regimes of $n$, $d$, $\gamma$, and $\rho$. The randomized design/reconstruction algorithm in the $\rho$-sized test scenario is universal, i.e., independent of the value of $d$, as long as $\rho \in o(n/d)$. We also investigate the effect of unreliability/noise in test outcomes, and show that whereas the impact of noise in test outcomes can be obviated with a small (constant factor) penalty in the number of tests in the $\rho$-sized tests scenario, there is no group-testing procedure, regardless of the number of tests, that can combat noise in the $\gamma$-divisible scenario.
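For background on the unconstrained non-adaptive setting, the sketch below runs the textbook COMP decoder on a random Bernoulli design: a test is positive iff it contains a defective item, and any item appearing in a negative test is declared non-defective. This is standard group testing shown only for orientation, not one of the paper's sparsity-constrained constructions; the parameter values are illustrative, and note that each item here participates in far more than $o(\log n)$ tests, i.e. outside the $\gamma$-divisible regime.

```python
import numpy as np

rng = np.random.default_rng(1)

n, d, T = 1000, 10, 200                       # items, defectives, tests
defective = np.zeros(n, dtype=bool)
defective[rng.choice(n, size=d, replace=False)] = True

# Random Bernoulli design: each item joins each test independently with
# probability ~ 1/d.
design = rng.random((T, n)) < 1.0 / d
outcome = (design & defective).any(axis=1)    # test positive iff it hits a defective

# COMP decoding: any item that appears in a negative test is non-defective.
in_negative_test = design[~outcome].any(axis=0)
estimate = ~in_negative_test
print("declared defective:", estimate.sum(), "errors:", (estimate != defective).sum())
```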

29 citations


Cited by
Journal Article
TL;DR: In this paper, the authors consider the question of determining whether a function f has property P or is ε-far from any function with property P. In some cases, it is also allowed to query f on instances of its choice.
Abstract: In this paper, we consider the question of determining whether a function f has property P or is ε-far from any function with property P. A property testing algorithm is given a sample of the value of f on instances drawn according to some distribution. In some cases, it is also allowed to query f on instances of its choice. We study this question for different properties and establish some connections to problems in learning theory and approximation. In particular, we focus our attention on testing graph properties. Given access to a graph G in the form of being able to query whether an edge exists or not between a pair of vertices, we devise algorithms to test whether the underlying graph has properties such as being bipartite, k-Colorable, or having a ρ-Clique (clique of density ρ with respect to the vertex set). Our graph property testing algorithms are probabilistic and make assertions that are correct with high probability, while making a number of queries that is independent of the size of the graph. Moreover, the property testing algorithms can be used to efficiently (i.e., in time linear in the number of vertices) construct partitions of the graph that correspond to the property being tested, if it holds for the input graph.
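As a concrete illustration of a tester whose query count is independent of the graph size, the sketch below follows the general shape of the bipartiteness tester: sample a set of vertices whose size depends only on the proximity parameter, query all pairs, and accept iff the induced subgraph is bipartite. The function names test_bipartite and is_bipartite are hypothetical, and the sample size used is a placeholder, not the bound established in the paper.

```python
import random
from collections import deque

def is_bipartite(adj):
    """Check 2-colorability of a small graph given as {v: set(neighbors)}."""
    color = {}
    for s in adj:
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return False
    return True

def test_bipartite(n, edge_query, eps, samples=None):
    """Property-testing sketch: sample a set of vertices whose size depends
    only on eps (not on n), query all pairs via edge_query, and accept iff
    the induced subgraph is bipartite. The sample size is illustrative."""
    s = samples or int(4 / eps ** 2)
    S = random.sample(range(n), min(s, n))
    adj = {v: set() for v in S}
    for i, u in enumerate(S):
        for w in S[i + 1:]:
            if edge_query(u, w):          # adjacency-matrix style query
                adj[u].add(w)
                adj[w].add(u)
    return is_bipartite(adj)

# Toy usage: a 6-cycle (bipartite) queried via its edge set.
n = 6
edges = {(i, (i + 1) % n) for i in range(n)}
query = lambda u, w: (u, w) in edges or (w, u) in edges
print(test_bipartite(n, query, eps=0.5))  # expected: True
```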

870 citations

Book ChapterDOI
01 Jan 1996

378 citations

Proceedings Article
01 Jan 1991
TL;DR: The proofs are based on the recent characterization of the power of multiprover interactive protocols and on random self-reducibility via low degree polynomials and exhibit an interplay between Boolean circuit simulation, interactive proofs and classical complexity classes.
Abstract: It is shown that BPP can be simulated in subexponential time for infinitely many input lengths unless exponential time collapses to the second level of the polynomial-time hierarchy, has polynomial-size circuits, and has publishable proofs (EXPTIME=MA). It is also shown that BPP is contained in subexponential time unless exponential time has publishable proofs for infinitely many input lengths. In addition, it is shown that BPP can be simulated in subexponential time for infinitely many input lengths unless there exist unary languages in MA/P. The proofs are based on the recent characterization of the power of multiprover interactive protocols and on random self-reducibility via low degree polynomials. They exhibit an interplay between Boolean circuit simulation, interactive proofs and classical complexity classes. An important feature of this proof is that it does not relativize.

255 citations

Proceedings Article
21 Nov 2020
TL;DR: This paper introduces a novel algorithm, called FetchSGD, which compresses model updates using a Count Sketch, and then takes advantage of the mergeability of sketches to combine model updates from many workers.
Abstract: Existing approaches to federated learning suffer from a communication bottleneck as well as convergence issues due to sparse client participation. In this paper we introduce a novel algorithm, called FetchSGD, to overcome these challenges. FetchSGD compresses model updates using a Count Sketch, and then takes advantage of the mergeability of sketches to combine model updates from many workers. A key insight in the design of FetchSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch. This allows the algorithm to move momentum and error accumulation from clients to the central aggregator, overcoming the challenges of sparse client participation while still achieving high compression rates and good convergence. We prove that FetchSGD has favorable convergence guarantees, and we demonstrate its empirical effectiveness by training two residual networks and a transformer model.
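The mergeability FetchSGD relies on comes from the linearity of the Count Sketch: sketching two updates separately and adding the tables gives the same result as sketching their sum, which is what lets the server combine sketched updates from many clients and carry momentum and error accumulation on the sketches themselves. The following is a minimal, simplified Count Sketch written only to demonstrate that property (class name, shared seeded hashes, and dense vectors are all assumptions of this sketch, not the paper's implementation).

```python
import numpy as np

class CountSketch:
    """Minimal Count Sketch over dense vectors. Linearity: sketching u and v
    separately and adding the tables equals sketching u + v directly."""
    def __init__(self, rows, cols, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.buckets = rng.integers(0, cols, size=(rows, dim))   # h_i(j)
        self.signs = rng.choice([-1.0, 1.0], size=(rows, dim))   # s_i(j)
        self.table = np.zeros((rows, cols))

    def update(self, vec):
        # Add s_i(j) * vec[j] into bucket h_i(j) of every row i.
        for i in range(self.table.shape[0]):
            np.add.at(self.table[i], self.buckets[i], self.signs[i] * vec)

    def query(self, j):
        # Median-of-rows estimate of coordinate j.
        rows = np.arange(self.table.shape[0])
        return np.median(self.table[rows, self.buckets[:, j]] * self.signs[:, j])

# Linearity / mergeability check on toy "gradients".
dim = 1000
g1, g2 = np.random.randn(dim), np.random.randn(dim)
a, b, c = (CountSketch(5, 200, dim) for _ in range(3))
a.update(g1); b.update(g2); c.update(g1 + g2)
merged = a.table + b.table
print(np.allclose(merged, c.table))   # True: sum of sketches = sketch of sum
```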

169 citations

Journal ArticleDOI
TL;DR: A survey of recent developments in the group testing problem from an information-theoretic perspective can be found in this article, where the authors assess the theoretical guarantees in terms of scaling laws and constant factors.
Abstract: The group testing problem concerns discovering a small number of defective items within a large population by performing tests on pools of items. A test is positive if the pool contains at least one defective, and negative if it contains no defectives. This is a sparse inference problem with a combinatorial flavour, with applications in medical testing, biology, telecommunications, information technology, data science, and more. In this monograph, we survey recent developments in the group testing problem from an information-theoretic perspective. We cover several related developments: efficient algorithms with practical storage and computation requirements, achievability bounds for optimal decoding methods, and algorithm-independent converse bounds. We assess the theoretical guarantees not only in terms of scaling laws, but also in terms of the constant factors, leading to the notion of the rate of group testing, indicating the amount of information learned per test. Considering both noiseless and noisy settings, we identify several regimes where existing algorithms are provably optimal or near-optimal, as well as regimes where there remains greater potential for improvement. In addition, we survey results concerning a number of variations on the standard group testing problem, including partial recovery criteria, adaptive algorithms with a limited number of stages, constrained test designs, and sublinear-time algorithms.
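Concretely, for $n$ items containing $d$ defectives identified using $T$ tests, the rate is typically defined as $\log_2\binom{n}{d}/T$, the number of bits of information about the defective set learned per test; a rate approaching 1 means each test contributes nearly a full bit.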

110 citations