Author

Arya Mazumdar

Bio: Arya Mazumdar is an academic researcher at the University of California, San Diego. His research topics include block codes and upper and lower bounds. He has an h-index of 25 and has co-authored 173 publications receiving 2,312 citations. Previous affiliations of Arya Mazumdar include the Massachusetts Institute of Technology and the University of Minnesota.


Papers
Journal ArticleDOI
TL;DR: In this paper, the authors derived several lower and upper bounds on the size of codes for rank modulation and showed that for any fixed number of errors, there are codes whose size is within a constant factor of the sphere packing bound.
Abstract: Codes for rank modulation have been recently proposed as a means of protecting flash memory devices from errors. We study basic coding theoretic problems for such codes, representing them as subsets of the set of permutations of n elements equipped with the Kendall tau distance. We derive several lower and upper bounds on the size of codes. These bounds enable us to establish the exact scaling of the size of optimal codes for large values of n. We also show the existence of codes whose size is within a constant factor of the sphere packing bound for any fixed number of errors.
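For readers unfamiliar with the metric, the following is a minimal illustration (not code from the paper) of the Kendall tau distance: it counts the pairs whose relative order disagrees between two permutations, which equals the minimum number of adjacent transpositions needed to turn one ranking into the other.

```python
from itertools import combinations

def kendall_tau_distance(p, q):
    """Kendall tau distance between permutations p and q: the number of
    pairs whose relative order differs (equivalently, the minimum number
    of adjacent transpositions turning p into q)."""
    pos_in_q = {v: i for i, v in enumerate(q)}
    return sum(1 for a, b in combinations(p, 2)   # a precedes b in p...
               if pos_in_q[a] > pos_in_q[b])      # ...but follows b in q

# One adjacent swap has distance 1; reversing the whole ranking gives the
# maximum possible distance, C(n, 2).
assert kendall_tau_distance([1, 2, 3, 4], [1, 3, 2, 4]) == 1
assert kendall_tau_distance([1, 2, 3, 4], [4, 3, 2, 1]) == 6
```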

174 citations

Journal ArticleDOI
TL;DR: The binary Simplex codes meet the derived bound on minimum distance, in terms of length, size, and locality, with equality, making them the first example of an optimal binary locally repairable code family.
Abstract: In a locally recoverable or repairable code, any symbol of a codeword can be recovered by reading only a small (constant) number of other symbols. The notion of local recoverability is important in the area of distributed storage, where the most frequent error event is a single storage node failure (erasure). A common objective is to repair the node by downloading data from as few other storage nodes as possible. In this paper, we bound the minimum distance of a code in terms of its length, size, and locality. Unlike previous bounds, our bound follows from a significantly simpler analysis and depends on the size of the alphabet being used. It turns out that the binary Simplex codes satisfy our bound with equality; hence, the Simplex codes are the first example of an optimal binary locally repairable code family. We also provide achievability results, based on random coding and concatenated codes, that are numerically verified to be close to our bounds.
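As a concrete illustration of locality (a sketch of the standard Simplex construction, not code taken from the paper): in the binary Simplex code of dimension k, each coordinate is indexed by a nonzero vector x in F_2^k and stores the parity <m, x>, so linearity gives <m, x> = <m, y> XOR <m, x XOR y>, and every erased symbol can be repaired by reading just two others.

```python
from itertools import product

def simplex_codeword(m):
    """Binary Simplex codeword of message m in F_2^k: one coordinate
    <m, x> mod 2 for every nonzero x in F_2^k (length 2^k - 1)."""
    k = len(m)
    return {x: sum(mi & xi for mi, xi in zip(m, x)) % 2
            for x in product((0, 1), repeat=k) if any(x)}

def repair(coords, x):
    """Recover the symbol indexed by x by reading only two other symbols:
    for any other index y, c[x] = c[y] XOR c[x XOR y]."""
    y = next(v for v in coords if v != x)
    z = tuple(a ^ b for a, b in zip(x, y))
    return coords[y] ^ coords[z]

c = simplex_codeword((1, 0, 1))        # k = 3: length 7, locality 2
assert all(repair(c, x) == c[x] for x in c)
```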

172 citations

Journal ArticleDOI
TL;DR: This work derives several lower and upper bounds on the size of codes for rank modulation, and shows the existence of codes whose size is within a constant factor of the sphere packing bound for any fixed number of errors.
Abstract: Codes for rank modulation have been recently proposed as a means of protecting flash memory devices from errors. We study basic coding theoretic problems for such codes, representing them as subsets of the set of permutations of $n$ elements equipped with the Kendall tau distance. We derive several lower and upper bounds on the size of codes. These bounds enable us to establish the exact scaling of the size of optimal codes for large values of $n$. We also show the existence of codes whose size is within a constant factor of the sphere packing bound for any fixed number of errors.
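For reference, the sphere-packing bound mentioned here takes the usual form for this metric space; writing it out (a paraphrase, not a formula quoted from the paper), with $B_K(n, r)$ denoting the number of permutations of $n$ elements within Kendall tau distance $r$ of a fixed permutation:

```latex
% Sphere-packing bound for a code C \subseteq S_n with minimum Kendall tau
% distance d (so it corrects t = floor((d-1)/2) errors):
|C| \;\le\; \frac{n!}{B_K\!\left(n, \left\lfloor \frac{d-1}{2} \right\rfloor\right)}
```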

154 citations

Proceedings ArticleDOI
07 Jun 2013
TL;DR: This paper bounds the minimum distance of a code in terms of its length, size, and locality; the bound follows from a significantly simpler analysis and depends on the size of the alphabet being used.
Abstract: In a locally recoverable or repairable code, any symbol of a codeword can be recovered by reading only a small (constant) number of other symbols. The notion of local recoverability is important in the area of distributed storage, where the most frequent error event is a single storage node failure (erasure). A common objective is to repair the node by downloading data from as few other storage nodes as possible. In this paper, we bound the minimum distance of a code in terms of its length, size, and locality. Unlike previous bounds, our bound follows from a significantly simpler analysis and depends on the size of the alphabet being used.
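The bound in question is usually stated in the following form (reconstructed from memory as a sketch, so treat the exact statement as an assumption rather than a quotation): for a code of length $n$, dimension $k$, minimum distance $d$, and locality $r$ over an alphabet of size $q$,

```latex
% k_opt^{(q)}(n, d): the largest dimension of any length-n, minimum-distance-d
% code over an alphabet of size q; the minimum runs over nonnegative integers t
% for which the argument n - t(r+1) remains valid.
k \;\le\; \min_{t \ge 0} \left[\, t\,r \;+\; k_{\mathrm{opt}}^{(q)}\bigl(n - t(r+1),\, d\bigr) \right]
```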

111 citations

Posted Content
TL;DR: A family of vector quantization schemes, vqSGD, is presented that provides an asymptotic reduction in communication cost with convergence guarantees for first-order distributed optimization; vqSGD is also shown to offer strong privacy guarantees.
Abstract: In this work, we present a family of vector quantization schemes, vqSGD (Vector-Quantized Stochastic Gradient Descent), that provide an asymptotic reduction in the communication cost with convergence guarantees in first-order distributed optimization. In the process we derive the following fundamental information-theoretic fact: $\Theta(\frac{d}{R^2})$ bits are necessary and sufficient to describe an unbiased estimator $\hat{g}(g)$ for any $g$ in the $d$-dimensional unit sphere, under the constraint that $\|\hat{g}(g)\|_2 \le R$ almost surely. In particular, we consider a randomized scheme based on the convex hull of a point set that returns an unbiased estimator of a $d$-dimensional gradient vector with almost surely bounded norm. We provide multiple efficient instances of our scheme that are near optimal and require only $o(d)$ bits of communication, at the expense of a tolerable increase in error. The instances of our quantization scheme are obtained using the properties of binary error-correcting codes and provide a smooth tradeoff between the communication cost and the estimation error of quantization. Furthermore, we show that vqSGD also offers strong privacy guarantees.
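To make the convex-hull idea concrete, here is a minimal sketch (an illustration under assumptions, not the paper's actual instances or constants): a gradient in the unit $\ell_2$ ball is expressed as a convex combination of the $2d$ cross-polytope vertices $\pm\sqrt{d}\,e_i$, and the quantizer transmits a single sampled vertex index, about $\log_2(2d)$ bits, whose expectation equals the gradient.

```python
import numpy as np

def cross_polytope_quantize(g, rng):
    """Pick one of the 2d points {+c*e_i, -c*e_i}, c = sqrt(d), with
    probabilities chosen so the picked point is an unbiased estimator of g.
    Requires ||g||_2 <= 1 (which implies ||g||_1 <= sqrt(d))."""
    d = g.size
    c = np.sqrt(d)
    w = np.abs(g) / c                    # mass on the vertex sign(g_i)*c*e_i
    slack = 1.0 - w.sum()                # leftover mass, split so it cancels out
    probs = np.concatenate([np.where(g >= 0, w, 0.0) + slack / (2 * d),
                            np.where(g < 0,  w, 0.0) + slack / (2 * d)])
    return rng.choice(2 * d, p=probs)    # one index: about log2(2d) bits

def dequantize(idx, d):
    """Map a transmitted index back to its cross-polytope vertex."""
    v = np.zeros(d)
    v[idx % d] = np.sqrt(d) * (1.0 if idx < d else -1.0)
    return v

rng = np.random.default_rng(0)
g = rng.standard_normal(16)
g /= 2 * np.linalg.norm(g)               # ensure ||g||_2 <= 1
mean_est = np.mean([dequantize(cross_polytope_quantize(g, rng), g.size)
                    for _ in range(100_000)], axis=0)
print(np.linalg.norm(mean_est - g))      # small: the estimator is unbiased
```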

90 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are presented in this book, along with neural networks, kernel methods, graphical models, and a discussion of combining models in the context of machine learning.
Abstract: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

Proceedings ArticleDOI
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, covering algorithmic and structural questions and touching on newer models, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Journal ArticleDOI
TL;DR: Expander graphs were first defined by Bassalygo and Pinsker, and their existence was first proved by Pinsker in the early 1970s, as discussed by the authors.
Abstract: A major consideration we had in writing this survey was to make it accessible to mathematicians as well as to computer scientists, since expander graphs, the protagonists of our story, come up in numerous and often surprising contexts in both fields. But, perhaps, we should start with a few words about graphs in general. They are, of course, one of the prime objects of study in Discrete Mathematics. However, graphs are among the most ubiquitous models of both natural and human-made structures. In the natural and social sciences they model relations among species, societies, companies, etc. In computer science, they represent networks of communication, data organization, computational devices as well as the flow of computation, and more. In mathematics, Cayley graphs are useful in Group Theory. Graphs carry a natural metric and are therefore useful in Geometry, and though they are "just" one-dimensional complexes, they are useful in certain parts of Topology, e.g. Knot Theory. In statistical physics, graphs can represent local connections between interacting parts of a system, as well as the dynamics of a physical process on such systems. The study of these models calls, then, for the comprehension of the significant structural properties of the relevant graphs. But are there nontrivial structural properties which are universally important? Expansion of a graph requires that it is simultaneously sparse and highly connected. Expander graphs were first defined by Bassalygo and Pinsker, and their existence first proved by Pinsker in the early '70s. The property of being an expander seems significant in many of these mathematical, computational and physical contexts. It is not surprising that expanders are useful in the design and analysis of communication networks. What is less obvious is that expanders have surprising utility in other computational settings such as in the theory of error correcting codes and the theory of pseudorandomness. In mathematics, we will encounter e.g. their role in the study of metric embeddings, and in particular in work around the Baum-Connes Conjecture. Expansion is closely related to the convergence rates of Markov Chains, and so they play a key role in the study of Monte-Carlo algorithms in statistical mechanics and in a host of practical computational applications. The list of such interesting and fruitful connections goes on and on, with so many applications we will not even…

2,037 citations

01 Mar 2001
TL;DR: Using singular value decomposition in transforming genome-wide expression data from genes x arrays space to reduced diagonalized "eigengenes" x "eigenarrays" space gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype.
Abstract: We describe the use of singular value decomposition in transforming genome-wide expression data from genes × arrays space to reduced diagonalized "eigengenes" × "eigenarrays" space, where the eigengenes (or eigenarrays) are unique orthonormal superpositions of the genes (or arrays). Normalizing the data by filtering out the eigengenes (and eigenarrays) that are inferred to represent noise or experimental artifacts enables meaningful comparison of the expression of different genes across different arrays in different experiments. Sorting the data according to the eigengenes and eigenarrays gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype, respectively. After normalization and sorting, the significant eigengenes and eigenarrays can be associated with observed genome-wide effects of regulators, or with measured samples, in which these regulators are overactive or underactive, respectively.
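A minimal numpy sketch of the pipeline described above (illustrative only; the matrix shape and the noise-filtering threshold are assumptions, not values from the paper):

```python
import numpy as np

# Hypothetical genes-by-arrays expression matrix (rows: genes, columns: arrays).
X = np.random.default_rng(1).standard_normal((500, 12))

# SVD: columns of U are "eigenarrays", rows of Vt are "eigengenes";
# s[i]**2 / sum(s**2) is eigengene i's fraction of the overall expression.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
frac = s**2 / np.sum(s**2)

# Normalize by filtering out eigengenes/eigenarrays inferred to be noise
# (here, illustratively, those contributing under 1% of the expression).
keep = frac > 0.01
X_filtered = (U[:, keep] * s[keep]) @ Vt[keep, :]

# Sort genes and arrays by their weight on the leading eigengene/eigenarray
# to expose groups of similar regulation or similar cellular state.
gene_order = np.argsort(U[:, 0])
array_order = np.argsort(Vt[0, :])
X_sorted = X_filtered[np.ix_(gene_order, array_order)]
```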

1,815 citations