
Showing papers in "IEEE Transactions on Information Theory in 2020"


Journal ArticleDOI
TL;DR: Novel coded computation strategies for distributed matrix–matrix products are provided that outperform the recent “Polynomial code” constructions in recovery threshold, i.e., the required number of successful workers.
Abstract: We provide novel coded computation strategies for distributed matrix–matrix products that outperform the recent “Polynomial code” constructions in recovery threshold, i.e., the required number of successful workers. When a fixed $1/m$ fraction of each matrix can be stored at each worker node, Polynomial codes require $m^{2}$ successful workers, while our MatDot codes only require $2m-1$ successful workers. However, MatDot codes have higher computation cost per worker and higher communication cost from each worker to the fusion node. We also provide a systematic construction of MatDot codes. Furthermore, we propose “PolyDot” coding that interpolates between Polynomial codes and MatDot codes to trade off computation/communication costs and recovery thresholds. Finally, we demonstrate a novel coding technique for multiplying $n$ matrices ( $n \geq 3$ ) using ideas from MatDot and PolyDot codes.
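As a concrete illustration of the construction above, the following sketch (ours, not the authors' code) instantiates MatDot for $m=2$ in Python: $A$ is split into $m$ column blocks and $B$ into $m$ row blocks, each worker multiplies two polynomial evaluations, and the fusion node recovers $AB$ as the coefficient of $x^{m-1}$ by interpolating from any $2m-1=3$ worker outputs. All sizes and evaluation points are arbitrary illustrative choices.

```python
import numpy as np

m, n, workers = 2, 4, 5                       # storage fraction 1/m, matrix dimension, number of workers
rng = np.random.default_rng(0)
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))

A_blocks = np.hsplit(A, m)                    # A = [A_0  A_1]   (column blocks)
B_blocks = np.vsplit(B, m)                    # B = [B_0; B_1]   (row blocks)
xs = np.arange(1.0, workers + 1)              # distinct evaluation points, one per worker

def worker_output(x):
    # Worker at point x computes p_A(x) @ p_B(x), with
    # p_A(x) = sum_j A_j x^j and p_B(x) = sum_j B_j x^(m-1-j).
    pA = sum(A_blocks[j] * x**j for j in range(m))
    pB = sum(B_blocks[j] * x**(m - 1 - j) for j in range(m))
    return pA @ pB

outputs = [worker_output(x) for x in xs]

# Fusion node: the product polynomial has degree 2m - 2, so any 2m - 1 = 3
# worker results determine its coefficients; A @ B is the coefficient of x^(m-1).
survivors = [0, 2, 4]                         # pretend the remaining workers straggled
V = np.vander(xs[survivors], 2 * m - 1, increasing=True)
coeffs = np.einsum('jk,kab->jab', np.linalg.inv(V),
                   np.stack([outputs[i] for i in survivors]))
assert np.allclose(coeffs[m - 1], A @ B)
```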

217 citations


Journal ArticleDOI
TL;DR: While evaluating bilinear complexity is a well-known challenging problem, it is shown that the optimal recovery threshold for linear coding strategies can be approximated within a factor of 2 of this fundamental quantity.
Abstract: We consider the problem of massive matrix multiplication, which underlies many data analytic applications, in a large-scale distributed system comprising a group of worker nodes. We target the stragglers’ delay performance bottleneck, which is due to the unpredictable latency in waiting for slowest nodes (or stragglers) to finish their tasks. We propose a novel coding strategy, named entangled polynomial code , for designing the intermediate computations at the worker nodes in order to minimize the recovery threshold (i.e., the number of workers that we need to wait for in order to compute the final output). We demonstrate the optimality of entangled polynomial code in several cases, and show that it provides orderwise improvement over the conventional schemes for straggler mitigation. Furthermore, we characterize the optimal recovery threshold among all linear coding strategies within a factor of 2 using bilinear complexity , by developing an improved version of the entangled polynomial code. In particular, while evaluating bilinear complexity is a well-known challenging problem, we show that optimal recovery threshold for linear coding strategies can be approximated within a factor of 2 of this fundamental quantity. On the other hand, the improved version of the entangled polynomial code enables further and orderwise reduction in the recovery threshold, compared to its basic version. Finally, we show that the techniques developed in this paper can also be extended to several other problems such as coded convolution and fault-tolerant computing, leading to tight characterizations.

180 citations


Journal ArticleDOI
Roy D. Yates
TL;DR: In this article, the authors derived first order linear differential equations for the temporal evolution of both the moments and the moment generating function (MGF) of the AoI vector components and showed that the existence of a non-negative fixed point for the first moment is sufficient to guarantee convergence of all higher order moments as well as a region of convergence for the stationary MGF vector of the age.
Abstract: A source provides status updates to monitors through a network with state defined by a continuous-time finite Markov chain. An age of information (AoI) metric is used to characterize timeliness by the vector of ages tracked by the monitors. Based on a stochastic hybrid systems (SHS) approach, first order linear differential equations are derived for the temporal evolution of both the moments and the moment generating function (MGF) of the age vector components. It is shown that the existence of a non-negative fixed point for the first moment is sufficient to guarantee convergence of all higher order moments as well as a region of convergence for the stationary MGF vector of the age. The stationary MGF vector is then found for the age on a line network of preemptive memoryless servers. From this MGF, it is found that the age at a node is identical in distribution to the sum of independent exponential service times. This observation is then generalized to linear status sampling networks in which each node receives samples of the update process at each preceding node according to a renewal point process. For each node in the line, the age is shown to be identical in distribution to a sum of independent renewal process age random variables.

174 citations


Journal ArticleDOI
TL;DR: In this article, an approximate variant of the gradient coding problem is introduced, in which one settles for approximate gradient computation instead of the exact one; this enables graceful degradation, i.e., the error of the approximate gradient is a decreasing function of the number of stragglers.
Abstract: Gradient coding is a technique for straggler mitigation in distributed learning. In this paper we design novel gradient codes using tools from classical coding theory, namely, cyclic MDS codes, which compare favorably with existing solutions, both in the applicable range of parameters and in the complexity of the involved algorithms. Second, we introduce an approximate variant of the gradient coding problem, in which we settle for approximate gradient computation instead of the exact one. This approach enables graceful degradation, i.e., the $\ell _{2}$ error of the approximate gradient is a decreasing function of the number of stragglers. Our main result is that normalized adjacency matrices of expander graphs yield excellent approximate gradient codes, which enable significantly less computation compared to exact gradient coding, and guarantee faster convergence than trivial solutions under standard assumptions. We experimentally test our approach on Amazon EC2, and show that the generalization error of approximate gradient coding is very close to the full gradient while requiring significantly less computation from the workers.
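The toy sketch below (ours; it uses a simple circulant $d$-regular graph as a stand-in for the expander graphs of the paper, and none of the paper's constants) illustrates the approximate gradient coding idea: the encoding matrix is a normalized adjacency matrix, each worker returns one weighted sum of per-part gradients, and the master simply rescales and sums whatever arrives, tolerating stragglers at the cost of a small $\ell _{2}$ error.

```python
import numpy as np

rng = np.random.default_rng(1)
n_workers = n_parts = 20
d = 4                                            # graph degree

# Circulant d-regular adjacency matrix over workers/parts (a toy stand-in for an expander).
A = np.zeros((n_workers, n_parts))
for i in range(n_workers):
    for k in range(1, d // 2 + 1):
        A[i, (i + k) % n_parts] = A[i, (i - k) % n_parts] = 1.0
B = A / d                                        # normalized adjacency = encoding matrix

# Per-part gradients of a least-squares objective (one data point per part, for simplicity).
X, w, y = rng.standard_normal((n_parts, 5)), rng.standard_normal(5), rng.standard_normal(n_parts)
part_grads = np.array([2 * (X[j] @ w - y[j]) * X[j] for j in range(n_parts)])
full_grad = part_grads.sum(axis=0)

worker_msgs = B @ part_grads                     # worker i sends sum_j B[i, j] * grad_j

stragglers = rng.choice(n_workers, size=5, replace=False)
alive = np.setdiff1d(np.arange(n_workers), stragglers)

# Decode by rescaled summation of the surviving messages; exact if nobody straggles.
approx_grad = (n_workers / len(alive)) * worker_msgs[alive].sum(axis=0)
rel_err = np.linalg.norm(approx_grad - full_grad) / np.linalg.norm(full_grad)
print(f"relative l2 error with {len(stragglers)} stragglers: {rel_err:.3f}")
```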

149 citations


Journal ArticleDOI
TL;DR: In this article, the problem of PIR in the presence of prior side information was studied, where the side information can be obtained opportunistically from other users or consists of messages previously downloaded using classical PIR schemes.
Abstract: We study the problem of Private Information Retrieval (PIR) in the presence of prior side information. The problem setup includes a database of $K$ independent messages possibly replicated on several servers, and a user that needs to retrieve one of these messages. In addition, the user has some prior side information in the form of a subset of $M$ messages, not containing the desired message and unknown to the servers. This problem is motivated by practical settings in which the user can obtain side information opportunistically from other users or has previously downloaded some messages using classical PIR schemes. The objective of the user is to retrieve the required message while downloading the minimum amount of data from the servers and achieving information-theoretic privacy in one of the following two scenarios: (i) the user wants to protect jointly the identities of the demand and the side information; (ii) the user wants to protect only the identity of the demand, but not necessarily the side information. To highlight the role of side information, we focus first on the case of a single server (single database). In the first scenario, we prove that the minimum download cost is $K-M$ messages, and in the second scenario it is $\lceil K/(M+1)\rceil$ messages, which should be compared to $K$ messages, the minimum download cost in the case of no side information. Then, we extend some of our results to the case of the database replicated on multiple servers. Our proof techniques relate PIR with side information to the index coding problem. We leverage this connection to prove converse results, as well as to design achievability schemes.
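For scenario (ii) with a single server, a partition-and-code style scheme consistent with the $\lceil K/(M+1)\rceil$ download cost can be sketched as follows (our hedged illustration; the randomization required to make the query provably private is handled more carefully in the paper): the user hides its demand inside a partition of the $K$ messages into groups of size $M+1$, one group being exactly the demand together with the side information, and asks the server for the XOR of each group.

```python
import secrets

K, M, msg_len = 8, 1, 16
server_msgs = [secrets.token_bytes(msg_len) for _ in range(K)]     # the server's database

demand, side_info = 3, {6}                                         # M side-information messages
user_side = {i: server_msgs[i] for i in side_info}                 # obtained earlier, e.g. opportunistically

# User: build a partition into groups of size M + 1 whose first group is
# {demand} | side_info, then shuffle everything before sending the query.
rest = [i for i in range(K) if i != demand and i not in side_info]
secrets.SystemRandom().shuffle(rest)
groups = [sorted({demand, *side_info})] + [sorted(rest[i:i + M + 1]) for i in range(0, len(rest), M + 1)]
secrets.SystemRandom().shuffle(groups)                             # the query is just this list of groups

def xor_bytes(blocks):
    out = bytes(msg_len)
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

# Server: one XOR per group -> ceil(K / (M + 1)) downloaded coded messages.
answers = [xor_bytes([server_msgs[i] for i in g]) for g in groups]

# User: cancel the side information out of the group containing the demand.
g_idx = next(i for i, g in enumerate(groups) if demand in g)
recovered = xor_bytes([answers[g_idx]] + [user_side[i] for i in groups[g_idx] if i in side_info])
assert recovered == server_msgs[demand]
print(f"downloaded {len(answers)} coded messages instead of K = {K}")
```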

144 citations


Journal ArticleDOI
TL;DR: The proposed divide-and-conquer approach leverages recent advances in compressed sensing and forward error correction to produce a novel uncoordinated access paradigm, along with a computationally efficient decoding algorithm.
Abstract: This article introduces a novel scheme, termed coded compressed sensing, for unsourced multiple-access communication. The proposed divide-and-conquer approach leverages recent advances in compressed sensing and forward error correction to produce a novel uncoordinated access paradigm, along with a computationally efficient decoding algorithm. Within this framework, every active device partitions its data into several sub-blocks and, subsequently, adds redundancy using a systematic linear block code. Compressed sensing techniques are then employed to recover sub-blocks up to a permutation of their order, and the original messages are obtained by stitching fragments together using a tree-based algorithm. The error probability and computational complexity of this access paradigm are characterized. An optimization framework, which exploits the tradeoff between performance and computational complexity, is developed to assign parity-check bits to each sub-block. In addition, two emblematic parity bit allocation strategies are examined and their performances are analyzed in the limit as the number of active users and their corresponding payloads tend to infinity. The number of channel uses needed and the computational complexity associated with these allocation strategies are established for various scaling regimes. Numerical results demonstrate that coded compressed sensing outperforms other existing practical access strategies over a range of operational scenarios.
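The sketch below (ours, with arbitrary toy sizes) illustrates only the stitching step of the scheme described above, taking the per-slot compressed-sensing recovery as already done: parity bits in slot $t$ are a random binary linear function of the information bits of earlier slots, and a tree decoder keeps exactly those fragment paths whose parities check.

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, n_slots, info_bits, parity_bits = 3, 3, 6, 8

# Random parity-generator matrices shared by all users (slot 0 carries no usable parity here).
G = [rng.integers(0, 2, size=(t * info_bits, parity_bits)) for t in range(n_slots)]

def encode(msg_bits):
    frags, history = [], np.empty(0, dtype=int)
    for t in range(n_slots):
        info = msg_bits[t * info_bits:(t + 1) * info_bits]
        parity = (history @ G[t]) % 2 if t else np.zeros(parity_bits, dtype=int)
        frags.append(np.concatenate([info, parity]))
        history = np.concatenate([history, info])
    return frags

messages = [rng.integers(0, 2, size=n_slots * info_bits) for _ in range(n_users)]
# After per-slot CS recovery, each slot holds an *unordered* list of fragments.
slots = [sorted(tuple(encode(msg)[t]) for msg in messages) for t in range(n_slots)]

# Tree decoder: extend each partial path with every slot-t fragment whose parity matches
# the path's accumulated information bits; surviving leaves are the stitched messages.
paths = [np.array(f, dtype=int)[:info_bits] for f in slots[0]]
for t in range(1, n_slots):
    new_paths = []
    for hist in paths:
        for f in slots[t]:
            f = np.array(f, dtype=int)
            if np.array_equal(f[info_bits:], (hist @ G[t]) % 2):
                new_paths.append(np.concatenate([hist, f[:info_bits]]))
    paths = new_paths

recovered = {tuple(p) for p in paths}
assert {tuple(msg) for msg in messages} <= recovered      # all true messages survive stitching
print(f"stitched {len(recovered)} candidate messages from {n_users} active users")
```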

129 citations


Journal ArticleDOI
TL;DR: In this article, the authors considered an energy-harvesting sensor node that is sending status updates to a destination, and the goal is to design status update transmission times (policy) such that the long term average AoI is minimized.
Abstract: An energy-harvesting sensor node that is sending status updates to a destination is considered. The sensor is equipped with a battery of finite size to save its incoming energy, and consumes one unit of energy per status update transmission, which is delivered to the destination instantly over an error-free channel. The setting is online, in which the harvested energy is revealed to the sensor causally over time after it arrives, and the goal is to design status update transmission times (policy) such that the long term average age of information (AoI) is minimized. The AoI is defined as the time elapsed since the latest update has reached the destination. Two energy arrival models are considered: a random battery recharge (RBR) model, and an incremental battery recharge (IBR) model. In both models, energy arrives according to a Poisson process with unit rate, with values that completely fill up the battery in the RBR model, and with values that fill up the battery incrementally in a unit-by-unit fashion in the IBR model. The key approach to characterizing the optimal status update policy for both models is showing the optimality of renewal policies, in which the inter-update times follow a renewal process in a certain manner that depends on the energy arrival model and the battery size. It is then shown that the optimal renewal policy has an energy-dependent threshold structure, in which the sensor sends a status update only if the AoI grows above a certain threshold that depends on the energy available in its battery. For both the random and the incremental battery recharge models, the optimal energy-dependent thresholds are characterized explicitly, i.e., in closed form, in terms of the optimal long term average AoI. It is also shown that the optimal thresholds are monotonically decreasing in the energy available in the battery, and that the smallest threshold, which comes into effect when the battery is full, is equal to the optimal long term average AoI.
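Below is a hedged simulation sketch (ours) of an energy-dependent threshold policy of the kind shown optimal above: the sensor transmits only when the AoI exceeds a threshold indexed by its current battery level. The thresholds used are arbitrary illustrative values that decrease with the available energy, not the paper's optimal closed-form ones, and energy arrives unit-by-unit as in the IBR model.

```python
import numpy as np

rng = np.random.default_rng(3)
B = 4                                                   # battery size (units of energy)
thresholds = np.array([np.inf, 1.6, 1.2, 0.9, 0.7])     # index = current battery level 0..B
T_sim = 200_000.0

t = age = area = 0.0                                    # area accumulates the integral of AoI
battery = B

while t < T_sim:
    dt_energy = rng.exponential(1.0)                    # next arrival of a unit-rate Poisson process
    # Earliest time the policy would transmit, given the current battery level.
    dt_update = max(thresholds[battery] - age, 0.0) if battery > 0 else np.inf
    dt = min(dt_energy, dt_update)
    area += age * dt + dt * dt / 2                      # AoI grows linearly between events
    t, age = t + dt, age + dt
    if dt_update <= dt_energy:                          # transmit: instant, error-free delivery
        age = 0.0
        battery -= 1
    else:                                               # harvest one unit of energy (IBR-style)
        battery = min(B, battery + 1)

print(f"long-term average AoI under this threshold policy: {area / t:.3f}")
```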

129 citations


Journal ArticleDOI
TL;DR: The capacity of TPIR-PSI is characterized, and the problem of symmetric TPIR with private side information (STPIR-PSI), where the answers from all databases reveal no information about any other message besides the desired message, is also considered; its capacity is shown to be $1-\frac {T}{N}$ with sufficient common randomness and zero otherwise.
Abstract: We consider the problem of $T$ -Private Information Retrieval with private side information (TPIR-PSI). In this problem, $N$ replicated databases store $K$ independent messages, and a user, equipped with a local cache that holds $M$ messages as side information, wishes to retrieve one of the other $K-M$ messages. The desired message index and the side information must remain jointly private even if any $T$ of the $N$ databases collude. We show that the capacity of TPIR-PSI is $\left ({1+\frac {T}{N}+\cdots +\left ({\frac {T}{N}}\right)^{K-M-1}}\right)^{-1}$ . As a special case obtained by setting $T=1$ , this result settles the capacity of PIR-PSI, an open problem previously noted by Kadhe et al. We also consider the problem of symmetric-TPIR with private side information (STPIR-PSI), where the answers from all $N$ databases reveal no information about any other message besides the desired message. We show that the capacity of STPIR-PSI is $1-\frac {T}{N}$ if the databases have access to common randomness (not available to the user) that is independent of the messages, in an amount that is at least $\frac {T}{N-T}$ bits per desired message bit. Otherwise, the capacity of STPIR-PSI is zero.
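A small helper (ours) that simply evaluates the capacity expression stated above, with a numerical example for the $T=1$ special case:

```python
def tpir_psi_capacity(N: int, K: int, M: int, T: int = 1) -> float:
    """Capacity of TPIR-PSI as stated in the abstract: (1 + T/N + ... + (T/N)^(K-M-1))^(-1)."""
    r = T / N
    return 1.0 / sum(r**i for i in range(K - M))

# Example: N = 2 databases, K = 4 messages, M = 1 side-information message, T = 1.
print(tpir_psi_capacity(N=2, K=4, M=1))   # 1 / (1 + 1/2 + 1/4) = 4/7 ≈ 0.571
```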

120 citations


Journal ArticleDOI
TL;DR: The framework used to define maximal leakage is used to give operational interpretations of commonly used leakage measures, such as Shannon capacity, maximal correlation, and local differential privacy.
Abstract: Given two random variables $X$ and $Y$ , an operational approach is undertaken to quantify the “leakage” of information from $X$ to $Y$ . The resulting measure $\mathcal {L}\left ({X \!\! \to \!\! Y}\right)$ is called maximal leakage , and is defined as the multiplicative increase, upon observing $Y$ , of the probability of correctly guessing a randomized function of $X$ , maximized over all such randomized functions. A closed-form expression for $\mathcal {L}\left ({X \!\! \to \!\! Y}\right)$ is given for discrete $X$ and $Y$ , and it is subsequently generalized to handle a large class of random variables. The resulting properties are shown to be consistent with an axiomatic view of a leakage measure, and the definition is shown to be robust to variations in the setup. Moreover, a variant of the Shannon cipher system is studied, in which performance of an encryption scheme is measured using maximal leakage. A single-letter characterization of the optimal limit of (normalized) maximal leakage is derived and asymptotically-optimal encryption schemes are demonstrated. Furthermore, the sample complexity of estimating maximal leakage from data is characterized up to subpolynomial factors. Finally, the guessing framework used to define maximal leakage is used to give operational interpretations of commonly used leakage measures, such as Shannon capacity, maximal correlation, and local differential privacy.
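For the finite-alphabet case, the closed-form expression, as we understand it from this line of work, is $\mathcal {L}\left ({X \to Y}\right) = \log \sum _{y} \max _{x:P_X(x)>0} P_{Y|X}(y|x)$; the snippet below (ours) evaluates it, and the example channel and input are arbitrary illustrative choices.

```python
import numpy as np

def maximal_leakage(p_x: np.ndarray, p_y_given_x: np.ndarray) -> float:
    """p_x: shape (|X|,); p_y_given_x: shape (|X|, |Y|), rows sum to 1. Returns nats."""
    support = p_x > 0                                   # only x with positive probability matter
    return float(np.log(p_y_given_x[support].max(axis=0).sum()))

# Example: a binary symmetric channel with crossover 0.1 leaks log(1.8) ≈ 0.588 nats,
# for any full-support input distribution.
bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
print(maximal_leakage(np.array([0.5, 0.5]), bsc))
```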

111 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider a problem of sampling a Wiener process, with samples forwarded to a remote estimator over a channel that is modeled as a queue, and study the optimal online sampling strategy that minimizes the mean square estimation error subject to a sampling rate constraint.
Abstract: In this paper, we consider a problem of sampling a Wiener process, with samples forwarded to a remote estimator over a channel that is modeled as a queue. The estimator reconstructs an estimate of the real-time signal value from causally received samples. We study the optimal online sampling strategy that minimizes the mean square estimation error subject to a sampling rate constraint. We prove that the optimal sampling strategy is a threshold policy, and find the optimal threshold. This threshold is determined by how much the Wiener process varies during the random service time and the maximum allowed sampling rate. Further, if the sampling times are independent of the observed Wiener process, the above sampling problem for minimizing the estimation error is equivalent to a sampling problem for minimizing the age of information. This reveals an interesting connection between the age of information and remote estimation error. Our comparisons show that the estimation error achieved by the optimal sampling policy can be much smaller than those of age-optimal sampling, zero-wait sampling, and periodic sampling.

108 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examine the theoretical properties of enforcing priors provided by generative deep neural networks via empirical risk minimization and show that with high probability, at any point away from small neighborhoods around two scalar multiples of the desired solution, there are no local minima, saddle points, or other stationary points outside these neighborhoods.
Abstract: We examine the theoretical properties of enforcing priors provided by generative deep neural networks via empirical risk minimization. In particular, we consider two models, one in which the task is to invert a generative neural network given access to its last layer and another in which the task is to invert a generative neural network given only compressive linear observations of its last layer. We establish that in both cases, in suitable regimes of network layer sizes and under a randomness assumption on the network weights, the non-convex objective function given by empirical risk minimization does not have any spurious stationary points. That is, we establish that with high probability, at any point away from small neighborhoods around two scalar multiples of the desired solution, there is a descent direction. Hence, there are no local minima, saddle points, or other stationary points outside these neighborhoods. These results constitute the first theoretical guarantees which establish the favorable global geometry of these non-convex optimization problems, and they bridge the gap between the empirical success of enforcing deep generative priors and a rigorous understanding of non-linear inverse problems.
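A toy experiment (ours) in the spirit of the first model above: invert a small random-weight ReLU generator $G(z) = W_2\,\mathrm{relu}(W_1 z)$ from a direct observation of its last layer by plain gradient descent on the empirical risk. Layer sizes, step size, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
k, h, n = 5, 50, 200                            # latent, hidden, output dimensions (expansive)
W1 = rng.standard_normal((h, k)) / np.sqrt(k)   # random Gaussian weights, as in the theory
W2 = rng.standard_normal((n, h)) / np.sqrt(h)

G = lambda z: W2 @ np.maximum(W1 @ z, 0.0)      # two-layer ReLU generator
z_star = rng.standard_normal(k)
y = G(z_star)                                   # noiseless observation of the last layer

z = rng.standard_normal(k)                      # random initialization
for _ in range(10_000):
    pre = W1 @ z
    r = W2 @ np.maximum(pre, 0.0) - y           # residual G(z) - y
    grad = W1.T @ ((pre > 0).astype(float) * (W2.T @ r))
    z -= 0.002 * grad                           # gradient step on 0.5 * ||G(z) - y||^2

# With expansive random layers, the risk is typically driven toward zero and z approaches
# z_star, consistent with there being no spurious stationary points away from scalar
# multiples of the solution.
print(np.linalg.norm(G(z) - y), np.linalg.norm(z - z_star))
```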

Journal ArticleDOI
TL;DR: In this article, the authors propose a general information usage framework to quantify and provably bound the bias and other error metrics of an arbitrary exploratory analysis, and prove that their mutual information based bound is tight in natural settings, and then use it to give rigorous insights into when commonly used procedures do or do not lead to substantially biased estimation.
Abstract: Modern data is messy and high-dimensional, and it is often not clear a priori what are the right questions to ask. Instead, the analyst typically needs to use the data to search for interesting analyses to perform and hypotheses to test. This is an adaptive process, where the choice of analysis to be performed next depends on the results of the previous analyses on the same data. Ultimately, which results are reported can be heavily influenced by the data. It is widely recognized that this process, even if well-intentioned, can lead to biases and false discoveries, contributing to the crisis of reproducibility in science. But while any data-exploration renders standard statistical theory invalid, experience suggests that different types of exploratory analysis can lead to disparate levels of bias, and the degree of bias also depends on the particulars of the data set. In this paper, we propose a general information usage framework to quantify and provably bound the bias and other error metrics of an arbitrary exploratory analysis. We prove that our mutual information based bound is tight in natural settings, and then use it to give rigorous insights into when commonly used procedures do or do not lead to substantially biased estimation. Through the lens of information usage, we analyze the bias of specific exploration procedures such as filtering, rank selection and clustering. Our general framework also naturally motivates randomization techniques that provably reduce exploration bias while preserving the utility of the data analysis. We discuss the connections between our approach and related ideas from differential privacy and blinded data analysis, and supplement our results with illustrative simulations.

Journal ArticleDOI
TL;DR: The work derives the exact optimal worst-case delay and DoF, for a broad range of user-to-cache association profiles where each such profile describes how many users are helped by each cache.
Abstract: The work explores the fundamental limits of coded caching in the setting where a transmitter with potentially multiple ( $N_{0}$ ) antennas serves different users that are assisted by a smaller number of caches. Under the assumption of uncoded cache placement, the work derives the exact optimal worst-case delay and DoF, for a broad range of user-to-cache association profiles where each such profile describes how many users are helped by each cache. This is achieved by presenting an information-theoretic converse based on index coding that succinctly captures the impact of the user-to-cache association, as well as by presenting a coded caching scheme that optimally adapts to the association profile by exploiting the benefits of encoding across users that share the same cache. The work reveals a powerful interplay between shared caches and multiple senders/antennas, where we can now draw the striking conclusion that, as long as each cache serves at least $N_{0}$ users, adding a single degree of cache-redundancy can yield a DoF increase equal to $N_{0}$ , while at the same time — irrespective of the profile — going from 1 to $N_{0}$ antennas reduces the delivery time by a factor of $N_{0}$ . Finally some conclusions are also drawn for the related problem of coded caching with multiple file requests.

Journal ArticleDOI
TL;DR: GASP Codes are shown to outperform all previously known polynomial codes for secure distributed matrix multiplication in terms of download rate.
Abstract: We consider the problem of secure distributed matrix multiplication (SDMM) in which a user wishes to compute the product of two matrices with the assistance of honest but curious servers. We construct polynomial codes for SDMM by studying a combinatorial problem on a special type of addition table, which we call the degree table. The codes are based on arithmetic progressions, and are thus named GASP (Gap Additive Secure Polynomial) Codes. GASP Codes are shown to outperform all previously known polynomial codes for secure distributed matrix multiplication in terms of download rate.

Journal ArticleDOI
TL;DR: In this article, the authors consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections and show that, for spatially coupled systems, approximate message-passing always reaches the minimal-mean-square error.
Abstract: We consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections. A few examples where this problem is relevant are compressed sensing, sparse superposition codes, and code division multiple access. There have been a number of works considering the mutual information for this problem using the replica method from statistical physics. Here we put these considerations on a firm rigorous basis. First, we show, using a Guerra-Toninelli type interpolation, that the replica formula yields an upper bound to the exact mutual information. Secondly, for many relevant practical cases, we present a converse lower bound via a method that uses spatial coupling, state evolution analysis and the I-MMSE theorem. This yields a single letter formula for the mutual information and the minimal-mean-square error for random Gaussian linear estimation of all discrete bounded signals. In addition, we prove that the low complexity approximate message-passing algorithm is optimal outside of the so-called hard phase, in the sense that it asymptotically reaches the minimal-mean-square error. In this work, spatial coupling is used primarily as a proof technique. However, our results also prove two important features of spatially coupled noisy linear random Gaussian estimation. First, there is no algorithmically hard phase. This means that for such systems approximate message-passing always reaches the minimal-mean-square error. Secondly, in the limit of an infinitely long coupled chain, the mutual information associated with spatially coupled systems is the same as the one of uncoupled linear random Gaussian estimation.

Journal ArticleDOI
TL;DR: Lower bounds for the sample complexity of learning and testing discrete distributions in this information-constrained setting are derived from a characterization of the contraction in chi-square distance between the observed distributions of the samples when information constraints are placed.
Abstract: Multiple players are each given one independent sample, about which they can only provide limited information to a central referee. Each player is allowed to describe its observed sample to the referee using a channel from a family of channels $\mathcal {W}$ , which can be instantiated to capture, among others, both the communication- and privacy-constrained settings. The referee uses the players’ messages to solve an inference problem on the unknown distribution that generated the samples. We derive lower bounds for the sample complexity of learning and testing discrete distributions in this information-constrained setting. Underlying our bounds is a characterization of the contraction in chi-square distance between the observed distributions of the samples when information constraints are placed. This contraction is captured in a local neighborhood in terms of chi-square and decoupled chi-square fluctuations of a given channel, two quantities we introduce. The former captures the average distance between distributions of channel output for two product distributions on the input, and the latter for a product distribution and a mixture of product distribution on the input. Our bounds are tight for both public- and private-coin protocols. Interestingly, the sample complexity of testing is order-wise higher when restricted to private-coin protocols.

Journal ArticleDOI
TL;DR: This paper provides an explicit error bound on the support matching distance of ESPRIT in terms of the minimum singular value of Vandermonde matrices and establishes the near-optimality of ESPRIT.
Abstract: The problem of imaging point objects can be formulated as estimation of an unknown atomic measure from its ${M}+1$ consecutive noisy Fourier coefficients. The standard resolution of this inverse problem is $1/{M}$ and super-resolution refers to the capability of resolving atoms at a higher resolution. When any two atoms are less than $1/{M}$ apart, this recovery problem is highly challenging and many existing algorithms either cannot deal with this situation or require restrictive assumptions on the sign of the measure. ESPRIT is an efficient method which does not depend on the sign of the measure. This paper provides an explicit error bound on the support matching distance of ESPRIT in terms of the minimum singular value of Vandermonde matrices. When the support consists of multiple well-separated clumps and noise is sufficiently small, the support error by ESPRIT scales like $\text {SRF}^{2\lambda -2} \times \text {Noise}$ , where the Super-Resolution Factor (SRF) governs the difficulty of the problem and $\lambda $ is the cardinality of the largest clump. Our error bound matches the min-max rate of a special model with one clump of closely spaced atoms up to a factor of $M$ in the small noise regime, and therefore establishes the near-optimality of ESPRIT. Our theory is validated by numerical experiments.
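For concreteness, the sketch below (ours) runs the standard ESPRIT steps analyzed above on synthetic data: build a Hankel matrix from the $M+1$ noisy Fourier coefficients, extract the $s$-dimensional signal subspace, and read the atom locations off the eigenvalues of the shift-invariance equation. The sizes, atom locations, and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
M, s, sigma = 64, 3, 1e-4
t_true = np.array([0.12, 0.13, 0.70])            # atom locations in [0, 1); two form a close clump
a_true = np.array([1.0, 0.8, 1.5])               # amplitudes

m = np.arange(M + 1)
noise = sigma * (rng.standard_normal(M + 1) + 1j * rng.standard_normal(M + 1))
y = np.exp(-2j * np.pi * np.outer(m, t_true)) @ a_true + noise   # M + 1 noisy Fourier coefficients

L = (M + 1) // 2                                                 # roughly square Hankel matrix
H = np.array([y[p:p + (M + 2 - L)] for p in range(L)])

U = np.linalg.svd(H, full_matrices=False)[0][:, :s]              # signal subspace
Psi = np.linalg.lstsq(U[:-1], U[1:], rcond=None)[0]              # shift invariance: U[1:] ≈ U[:-1] @ Psi
t_hat = np.sort(np.mod(-np.angle(np.linalg.eigvals(Psi)) / (2 * np.pi), 1.0))

print("estimated:", np.round(t_hat, 4), "true:", np.sort(t_true))
```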

Journal ArticleDOI
TL;DR: In this article, the authors derived fundamental performance limits for the caching problem by using tools for the index coding problem that were either known or are newly developed in this work, and proposed a new index coding achievable scheme based on distributed source coding.
Abstract: Caching is an efficient way to reduce network traffic congestion during peak hours, by storing some content at the user’s local cache memory, even without knowledge of the users’ later demands. Maddah-Ali and Niesen proposed a two-phase (placement phase and delivery phase) coded caching strategy for broadcast channels with cache-aided users. This paper investigates the same model under the constraint that content is placed uncoded within the caches, that is, when bits of the files are simply copied within the caches. When the cache contents are uncoded and the users’ demands are revealed, the caching problem can be connected to an index coding problem. This paper focuses on deriving fundamental performance limits for the caching problem by using tools for the index coding problem that were either known or are newly developed in this work. First, a converse bound for the caching problem under the constraint of uncoded cache placement is proposed based on the “acyclic index coding converse bound.” This converse bound is proved to be achievable by Maddah-Ali and Niesen’s scheme when the number of files is not less than the number of users, and by a newly derived index coding achievable scheme otherwise. The proposed index coding achievable scheme is based on distributed source coding and strictly improves on the widely used “composite (index) coding” achievable bound and its improvements, and is of independent interest. An important consequence of the findings of this paper is that advancements on the coded caching problem posed by Maddah-Ali and Niesen are thus only possible by considering strategies with a coded placement phase. A recent work by Yu et al. has, however, shown that coded cache placement can at most halve the network load compared to the results presented in this paper.

Journal ArticleDOI
TL;DR: A fast iterative algorithm, called Tubal-AltMin, that is inspired by a similar approach for low-rank matrix completion is proposed that improves the recovery error by several orders of magnitude and is faster than TNN-ADMM by a factor of 5.
Abstract: The low-tubal-rank tensor model has been recently proposed for real-world multidimensional data. In this paper, we study the low-tubal-rank tensor completion problem, i.e., to recover a third-order tensor by observing a subset of its elements selected uniformly at random. We propose a fast iterative algorithm, called Tubal-AltMin, that is inspired by a similar approach for low-rank matrix completion. The unknown low-tubal-rank tensor is represented as the product of two much smaller tensors with the low-tubal-rank property being automatically incorporated, and Tubal-AltMin alternates between estimating those two tensors using tensor least squares minimization. First, we note that tensor least squares minimization is different from its matrix counterpart and nontrivial as the circular convolution operator of the low-tubal-rank tensor model is intertwined with the sub-sampling operator. Secondly, the theoretical performance guarantee is challenging since Tubal-AltMin is iterative and nonconvex. We prove that 1) Tubal-AltMin generates a best rank-$r$ approximation up to any predefined accuracy $\epsilon$ at an exponential rate, and 2) for an $n \times n \times k$ tensor $\mathcal {M}$ with tubal-rank $r \ll n$, the required sampling complexity is $O((nr^{2}k ||\mathcal {M}||_{F}^{2} \log ^{3}~n) / \overline {\sigma }_{rk}^{2})$, where $\overline {\sigma }_{rk}$ is the $rk$-th singular value of the block diagonal matrix representation of $\mathcal {M}$ in the frequency domain, and the computational complexity is $O(n^{2}r^{2}k^{3} \log n \log (n/\epsilon))$. Finally, on both synthetic data and real-world video data, evaluation results show that compared with tensor-nuclear norm minimization using alternating direction method of multipliers (TNN-ADMM), Tubal-AltMin-Simple (a simplified implementation of Tubal-AltMin) improves the recovery error by several orders of magnitude. In experiments, Tubal-AltMin-Simple is faster than TNN-ADMM by a factor of 5 for a $200 \times 200 \times 20$ tensor.

Journal ArticleDOI
TL;DR: In this article, leave-one-out analysis is used to obtain fine-grained entry-wise bounds for low-rank matrix completion algorithms in the presence of probabilistic dependency.
Abstract: In this paper, we introduce a powerful technique based on Leave-One-Out analysis to the study of low-rank matrix completion problems. Using this technique, we develop a general approach for obtaining fine-grained, entrywise bounds for iterative stochastic procedures in the presence of probabilistic dependency. We demonstrate the power of this approach in analyzing two of the most important algorithms for matrix completion: (i) the non-convex approach based on Projected Gradient Descent (PGD) for a rank-constrained formulation, also known as the Singular Value Projection algorithm, and (ii) the convex relaxation approach based on nuclear norm minimization (NNM). Using this approach, we establish the first convergence guarantee for the original form of PGD without regularization or sample splitting, and in particular show that it converges linearly in the infinity norm. For NNM, we use this approach to study a fictitious iterative procedure that arises in the dual analysis. Our results show that NNM recovers a $d$-by-$d$ rank-$r$ matrix with $\mathcal {O}(\mu r \log (\mu r) d\log d)$ observed entries. This bound has optimal dependence on the matrix dimension and is independent of the condition number. To the best of our knowledge, none of the previous sample complexity results for tractable matrix completion algorithms satisfies these two properties simultaneously.
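A minimal sketch (ours) of the Singular Value Projection / projected gradient iteration analyzed above, in its original form with no regularization or sample splitting; dimensions, sampling probability, and step size are illustrative choices, and the error is reported in the Frobenius norm rather than the entrywise norm studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(6)
d, r, p = 100, 2, 0.4                                  # dimension, rank, sampling probability
M = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))   # ground-truth rank-r matrix
mask = rng.random((d, d)) < p                          # Omega: the set of observed entries
eta = 1.0 / p                                          # a common step-size choice for Bernoulli sampling

def rank_r_projection(X, r):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]                 # best rank-r approximation

# SVP iteration:  X <- P_r( X + eta * P_Omega(M - X) )
X = np.zeros((d, d))
for _ in range(200):
    X = rank_r_projection(X + eta * mask * (M - X), r)

rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
print(f"relative Frobenius error after 200 SVP steps: {rel_err:.2e}")
```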

Journal ArticleDOI
TL;DR: In this paper, an optimization framework for coded caching that accounts for various heterogeneous aspects of practical systems is provided, and the optimization framework is used to develop a coded caching scheme capable of handling simultaneous non-uniform file length, file popularity, and user cache size.
Abstract: This paper aims to provide an optimization framework for coded caching that accounts for various heterogeneous aspects of practical systems. An optimization theoretic perspective on the seminal work on the fundamental limits of caching by Maddah-Ali and Niesen is first developed, whereas it is proved that the coded caching scheme presented in that work is the optimal scheme among a large, non-trivial family of possible caching schemes. The optimization framework is then used to develop a coded caching scheme capable of handling simultaneous non-uniform file length, non-uniform file popularity, and non-uniform user cache size. Although the resulting full optimization problem scales exponentially with the problem size, this paper shows that tractable simplifications of the problem that scale as a polynomial function of the problem size can still perform well compared to the original problem. By considering these heterogeneities both individually and in conjunction with one another, evidence of the effect of their interactions and influence on optimal cache content is obtained.

Journal ArticleDOI
TL;DR: In this article, a lower bound of the achievable rate for private information retrieval from MDS coded storage is proved by presenting a novel scheme based on cross-subspace alignment and a successive decoding with interference cancellation strategy.
Abstract: The problem of $X$ -secure $T$ -private information retrieval from MDS coded storage is studied in this paper, where the user wishes to privately retrieve one out of $K$ independent messages that are distributed over $N$ servers according to an MDS code. It is guaranteed that any group of up to $X$ colluding servers learn nothing about the messages and that any group of up to $T$ colluding servers learn nothing about the identity of the desired message. A lower bound of achievable rates is proved by presenting a novel scheme based on cross-subspace alignment and a successive decoding with interference cancellation strategy. For a large number of messages $(K\rightarrow \infty)$, the achieved rate, which we conjecture to be optimal, improves upon the best known rates previously reported in the literature by Raviv and Karpuk, and generalizes an achievable rate for MDS-TPIR previously found by Freij-Hollanti et al. that is also conjectured to be asymptotically optimal. The setting is then expanded to allow unresponsive and Byzantine servers. Finally, the scheme is applied to find a new lower convex hull of (download, upload) pairs of secure and private distributed matrix multiplication that generalizes, and in certain asymptotic settings strictly improves upon, the best known previous results.

Journal ArticleDOI
TL;DR: In this article, the authors considered the problem of PIR from uncoded storage constrained databases, where each database has a storage capacity of $\mu KL$ bits, where $L$ is the size of each message in bits and $\mu \in [{1/N, 1}]$ is the normalized storage.
Abstract: Private information retrieval (PIR) allows a user to retrieve a desired message from a set of databases without revealing the identity of the desired message. The replicated database scenario, where $N$ databases store each of the $K$ messages was considered by Sun and Jafar, and the optimal download cost was characterized as $\left ({1+ \frac {1}{N}+ \frac {1}{N^{2}}+ \cdots + \frac {1}{N^{K-1}}}\right)$ . In this work, we consider the problem of PIR from uncoded storage constrained databases. Each database has a storage capacity of $\mu KL$ bits, where $L$ is the size of each message in bits, and $\mu \in [{1/N, 1}]$ is the normalized storage. The novel aspect of this work is to characterize the optimum download cost of PIR from uncoded storage constrained databases for any “normalized storage” value in the range $\mu \in [{1/N, 1}]$ . In particular, for any $(N,K)$ , we show that the optimal trade-off between normalized storage, $\mu $ , and the download cost, $D(\mu)$ , is a piece-wise linear function given by the lower convex hull of the $N$ pairs $\left ({\frac {t}{N}, \left ({1+ \frac {1}{t}+ \frac {1}{t^{2}}+ \cdots + \frac {1}{t^{K-1}}}\right)}\right)$ for $t=1,2,\ldots, N$ . To prove this result, we first present a storage constrained PIR scheme for any $(N,K)$ . Next, we obtain a general lower bound on the download cost for PIR, which is valid for any arbitrary storage architecture. The uncoded storage assumption is then applied which allows us to express the lower bound as a linear program (LP). Finally, we solve the LP to obtain tight lower bounds on the download cost for different regimes of storage, which match the proposed storage constrained PIR scheme.
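A small helper (ours) that evaluates the optimal storage/download trade-off stated above by linear interpolation between the $N$ corner points; since the corner costs are convex and decreasing in $t$, interpolating consecutive points traces exactly the lower convex hull.

```python
import numpy as np

def download_cost(mu: float, N: int, K: int) -> float:
    """Optimal download cost D(mu) for uncoded storage-constrained PIR, mu in [1/N, 1]."""
    t = np.arange(1, N + 1)
    storage = t / N                                              # corner-point normalized storage t/N
    cost = np.array([sum(1.0 / tt**i for i in range(K)) for tt in t])
    return float(np.interp(mu, storage, cost))                   # piecewise-linear lower convex hull

# Example: N = 4 databases, K = 3 messages.
for mu in (0.25, 0.5, 0.75, 1.0):
    print(mu, round(download_cost(mu, N=4, K=3), 4))
```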

Journal ArticleDOI
TL;DR: A general upper bound for the problem in the form of a max-min optimization problem, which extends the converse proof of the PIR problem under asymmetric traffic constraints, and proposes an achievability scheme that satisfies the security constraint by encoding a secret key, which is generated securely at each database, into an artificial noise vector using an MDS code.
Abstract: We consider the problem of private information retrieval through wiretap channel II (PIR-WTC-II). In PIR-WTC-II, a user wants to retrieve a single message (file) privately out of M messages, which are stored in N replicated and non-communicating databases. An external eavesdropper observes a fraction $\mu _{\text {n}}$ (of its choice) of the traffic exchanged between the nth database and the user. In addition to the privacy constraint, the databases should encode the returned answer strings such that the eavesdropper learns absolutely nothing about the contents of the databases. We aim at characterizing the capacity of the PIR-WTC-II under the combined privacy and security constraints. We obtain a general upper bound for the problem in the form of a max-min optimization problem, which extends the converse proof of the PIR problem under asymmetric traffic constraints. We propose an achievability scheme that satisfies the security constraint by encoding a secret key, which is generated securely at each database, into an artificial noise vector using an MDS code. The user and the databases operate at one of the corner points of the achievable scheme for the PIR under asymmetric traffic constraints such that the retrieval rate is maximized under the imposed security constraint. The upper bound and the lower bound match for the case of $\text {M}=2$ and $\text {M}=3$ messages, for any N, and any $\boldsymbol {\mu }=(\mu _{1}, \cdots, \mu _{\text {N}})$ .

Journal ArticleDOI
TL;DR: Exact single-letter characterization of the same is established for the special case of testing against conditional independence, and it is shown to be achieved by the separate HT and channel coding scheme.
Abstract: A distributed binary hypothesis testing (HT) problem involving two parties, one referred to as the observer and the other as the detector is studied. The observer observes a discrete memoryless source (DMS) and communicates its observations to the detector over a discrete memoryless channel (DMC). The detector observes another DMS correlated with that at the observer, and performs a binary hypothesis test on the joint distribution of the two DMS’s using its own observed data and the information received from the observer. The trade-off between the type I error probability and the type II error-exponent of the HT is explored. Single-letter lower bounds on the optimal type II error-exponent are obtained by using two different coding schemes, a separate HT and channel coding scheme and a joint HT and channel coding scheme based on hybrid coding for the matched bandwidth case. Exact single-letter characterization of the same is established for the special case of testing against conditional independence, and it is shown to be achieved by the separate HT and channel coding scheme. An example is provided where the joint scheme achieves a strictly better performance than the separation based scheme.

Journal ArticleDOI
TL;DR: Generalizing a construction method of Tang et al., three-weight linear codes are constructed from weakly regular plateaued functions based on the second generic construction and their weight distributions are determined; the codes can be directly employed to obtain (democratic) secret sharing schemes, which have diverse applications in industry.
Abstract: Minimal linear codes have significant applications in secret sharing schemes and secure two-party computation. There are several methods to construct linear codes, one of which is based on functions over finite fields. Recently, many construction methods for linear codes from functions have been proposed in the literature. In this paper, we generalize the recent construction methods given by Tang et al. in [IEEE Transactions on Information Theory, 62(3), 1166-1176, 2016] to weakly regular plateaued functions over finite fields of odd characteristic. We first construct three-weight linear codes from weakly regular plateaued functions based on the second generic construction and then determine their weight distributions. We also give a punctured version and subcode of each constructed code. We note that they may be (almost) optimal codes and can be directly employed to obtain (democratic) secret sharing schemes, which have diverse applications in the industry. We next observe that the constructed codes are minimal for almost all cases and finally describe the access structures of the secret sharing schemes based on their dual codes.

Journal ArticleDOI
TL;DR: In this paper, the convergence of empirical measures smoothed by a Gaussian kernel was studied in terms of the Wasserstein distance, total variation (TV), Kullback-Leibler (KL) divergence, and divergence divergence.
Abstract: This paper studies convergence of empirical measures smoothed by a Gaussian kernel. Specifically, consider approximating $P\ast \mathcal {N}_\sigma $ , for $\mathcal {N}_\sigma \triangleq \mathcal {N}(0,\sigma ^{2} \mathrm {I}_{d})$ , by $\hat {P}_{n}\ast \mathcal {N}_\sigma $ under different statistical distances, where $\hat {P}_{n}$ is the empirical measure. We examine the convergence in terms of the Wasserstein distance, total variation (TV), Kullback-Leibler (KL) divergence, and $\chi ^{2}$ -divergence. We show that the approximation error under the TV distance and 1-Wasserstein distance ( $\mathsf {W}_{1}$ ) converges at the rate $e^{O(d)} n^{-1/2}$ in remarkable contrast to a (typical) $n^{-\frac {1}{d}}$ rate for unsmoothed $\mathsf {W}_{1}$ (and $d\ge 3$ ). Similarly, for the KL divergence, squared 2-Wasserstein distance ( $\mathsf {W}_{2}^{2}$ ), and $\chi ^{2}$ -divergence, the convergence rate is $e^{O(d)} n^{-1}$ , but only if $P$ achieves finite input-output $\chi ^{2}$ mutual information across the additive white Gaussian noise (AWGN) channel. If the latter condition is not met, the rate changes to $\omega \left ({n^{-1}}\right)$ for the KL divergence and $\mathsf {W}_{2}^{2}$ , while the $\chi ^{2}$ -divergence becomes infinite – a curious dichotomy. As an application we consider estimating the differential entropy $h(S+Z)$ , where $S\sim P$ and $Z\sim \mathcal {N}_\sigma $ are independent $d$ -dimensional random variables. The distribution $P$ is unknown and belongs to some nonparametric class, but n independently and identically distributed (i.i.d) samples from it are available. Despite the regularizing effect of noise, we first show that any good estimator (within an additive gap) for this problem must have a sample complexity that is exponential in d . We then leverage the above empirical approximation results to show that the absolute-error risk of the plug-in estimator converges as $e^{O(d)} n^{-1/2}$ , thus attaining the parametric rate in n . This establishes the plug-in estimator as minimax rate-optimal for the considered problem, with sharp dependence of the convergence rate both in n and d . We provide numerical results comparing the performance of the plug-in estimator to that of general-purpose (unstructured) differential entropy estimators (based on kernel density estimation (KDE) or k nearest neighbors (kNN) techniques) applied to samples of $S+Z$ . These results reveal a significant empirical superiority of the plug-in to state-of-the-art KDE and kNN methods. As a motivating utilization of the plug-in approach, we estimate information flows in deep neural networks and discuss Tishby’s Information Bottleneck and the compression conjecture, among others.
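The sketch below (ours) illustrates the plug-in estimator discussed above: the differential entropy of the Gaussian mixture $\hat {P}_{n}\ast \mathcal {N}_\sigma$ is approximated by Monte Carlo. The distribution $P$, the noise level $\sigma$, and all sample sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
d, n, sigma = 2, 2000, 0.5
S = rng.uniform(-1.0, 1.0, size=(n, d))         # n i.i.d. samples from the unknown P (here: uniform)

def plugin_entropy(samples, sigma, n_mc=20_000, block=1000):
    n, d = samples.shape
    # Draw Monte Carlo points from the mixture P_hat_n * N_sigma: pick a sample, add noise.
    X = samples[rng.integers(0, n, size=n_mc)] + sigma * rng.standard_normal((n_mc, d))
    log_q = np.empty(n_mc)
    for start in range(0, n_mc, block):
        blk = X[start:start + block]
        sq = ((blk[:, None, :] - samples[None, :, :]) ** 2).sum(-1)        # squared distances to centers
        log_kernel = -sq / (2 * sigma**2) - 0.5 * d * np.log(2 * np.pi * sigma**2)
        log_q[start:start + block] = np.logaddexp.reduce(log_kernel, axis=1) - np.log(n)
    return -log_q.mean()                         # Monte Carlo estimate of -E[log q] = h(P_hat_n * N_sigma)

print(f"plug-in estimate of h(S + Z): {plugin_entropy(S, sigma):.3f} nats")
```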

Journal ArticleDOI
TL;DR: In this paper, MDS codes are constructed whose Euclidean or Hermitian hulls have dimensions that can take all or almost all possible values, yielding new families of MDS EAQECCs whose required number of maximally entangled states can likewise take all or almost all possible values.
Abstract: In this paper, we construct several classes of maximum distance separable (MDS) codes via generalized Reed-Solomon (GRS) codes and extended GRS codes, where we can determine the dimensions of their Euclidean hulls or Hermitian hulls. It turns out that the dimensions of Euclidean hulls or Hermitian hulls of the codes in our constructions can take all or almost all possible values. As a consequence, we can apply our results to entanglement-assisted quantum error-correcting codes (EAQECCs) and obtain several new families of MDS EAQECCs with flexible parameters. The required number of maximally entangled states of these MDS EAQECCs can take all or almost all possible values. Moreover, several new classes of q-ary MDS EAQECCs of length $ {n} > {q}+1$ are also obtained.

Journal ArticleDOI
TL;DR: It is shown that the minimum message size (sometimes also referred to as the sub-packetization factor) is significantly, in fact exponentially, lower than previously believed.
Abstract: We consider constructing capacity-achieving linear codes with minimum message size for private information retrieval (PIR) from N non-colluding databases, where each message is coded using maximum distance separable (MDS) codes, such that it can be recovered from accessing the contents of any T databases. It is shown that the minimum message size (sometimes also referred to as the sub-packetization factor) is significantly, in fact exponentially, lower than previously believed. More precisely, when ${K}> {T}/\text{gcd}({N},{T})$ where K is the total number of messages in the system and $\gcd (\cdot,\cdot)$ means the greatest common divisor, we establish, by providing both novel code constructions and a matching converse, the minimum message size as ${{\textrm {lcm}}}({N}-{T},{T})$ , where ${{\textrm {lcm}}}(\cdot,\cdot)$ means the least common multiple. On the other hand, when $K$ is small, we show that it is in fact possible to design codes with a message size even smaller than ${{\textrm {lcm}}}({N}-{T},{T})$ .
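A tiny helper (ours) that evaluates the minimum message size stated above in the large-$K$ regime:

```python
from math import gcd, lcm

def min_message_size(N: int, T: int, K: int) -> int:
    assert K > T // gcd(N, T), "the lcm(N - T, T) formula is stated for K > T / gcd(N, T)"
    return lcm(N - T, T)

print(min_message_size(N=5, T=2, K=10))   # lcm(3, 2) = 6
```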

Journal ArticleDOI
TL;DR: It is shown that the NFDM AIR is greater than the WDM AIR subject to a bandwidth and average power constraint in a representative system with one symbol per user; the improvement results from nonlinear signal multiplexing.
Abstract: Two signal multiplexing schemes for optical fiber communication are considered: Wavelength-division multiplexing (WDM) and nonlinear frequency-division multiplexing (NFDM), based on the nonlinear Fourier transform. Achievable information rates (AIRs) of NFDM and WDM are compared in a network scenario with an ideal lossless model of the optical fiber in the defocusing regime. It is shown that the NFDM AIR is greater than the WDM AIR subject to a bandwidth and average power constraint, in a representative system with one symbol per user. The improvement results from nonlinear signal multiplexing.