
Showing papers in "IEEE Transactions on Information Theory in 2015"


Journal ArticleDOI
TL;DR: Simulations show that the resulting performance is very close to that of maximum-likelihood decoding, even for moderate values of L, and it is shown that such a genie can be easily implemented using simple CRC precoding.
Abstract: We describe a successive-cancellation list decoder for polar codes, which is a generalization of the classic successive-cancellation decoder of Arikan. In the proposed list decoder, $L$ decoding paths are considered concurrently at each decoding stage, where $L$ is an integer parameter. At the end of the decoding process, the most likely among the $L$ paths is selected as the single codeword at the decoder output. Simulations show that the resulting performance is very close to that of maximum-likelihood decoding, even for moderate values of $L$ . Alternatively, if a genie is allowed to pick the transmitted codeword from the list, the results are comparable with the performance of current state-of-the-art LDPC codes. We show that such a genie can be easily implemented using simple CRC precoding. The specific list-decoding algorithm that achieves this performance doubles the number of decoding paths for each information bit, and then uses a pruning procedure to discard all but the $L$ most likely paths. However, straightforward implementation of this algorithm requires $\Omega (L n^{2})$ time, which is in stark contrast with the $O(n \log n)$ complexity of the original successive-cancellation decoder. In this paper, we utilize the structure of polar codes along with certain algorithmic transformations in order to overcome this problem: we devise an efficient, numerically stable, implementation of the proposed list decoder that takes only $O(L n \log n)$ time and $O(L n)$ space.
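
The heart of the list decoder is the duplicate-and-prune step described above: at each information bit, every surviving decoding path is extended by both bit values, and only the $L$ most likely extensions are kept. A minimal Python sketch of this bookkeeping (path metrics only; the hypothetical `branch_metric` callback and the flat path copies stand in for the paper's $O(Ln\log n)$ lazy-copy data structures):

```python
import heapq

def list_decode_step(paths, branch_metric, L):
    """One list-decoding stage: double every path, keep the L most likely.

    paths: list of (metric, bits) pairs, where a larger metric means more likely.
    branch_metric(bits, b): metric increment for appending bit b (hypothetical).
    """
    candidates = []
    for metric, bits in paths:
        for b in (0, 1):  # duplicate the path with both bit hypotheses
            candidates.append((metric + branch_metric(bits, b), bits + (b,)))
    # pruning procedure: discard all but the L most likely paths
    return heapq.nlargest(L, candidates, key=lambda c: c[0])
```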

1,263 citations


Journal ArticleDOI
TL;DR: In this article, a nonconvex formulation of the phase retrieval problem is proposed together with a concrete solution algorithm; the main contribution is showing that this algorithm rigorously allows exact retrieval of phase information from a nearly minimal number of random measurements.
Abstract: We study the problem of recovering the phase from magnitude measurements; specifically, we wish to reconstruct a complex-valued signal $ \boldsymbol {x}\in \mathbb {C}^{n}$ about which we have phaseless samples of the form $y_{r} = \left |{\langle \boldsymbol {a}_{r}, \boldsymbol {x} \rangle }\right |^{2}$ , $r = 1,\ldots , m$ (knowledge of the phase of these samples would yield a linear system). This paper develops a nonconvex formulation of the phase retrieval problem as well as a concrete solution algorithm. In a nutshell, this algorithm starts with a careful initialization obtained by means of a spectral method, and then refines this initial estimate by iteratively applying novel update rules, which have low computational complexity, much like in a gradient descent scheme. The main contribution is that this algorithm is shown to rigorously allow the exact retrieval of phase information from a nearly minimal number of random measurements. Indeed, the sequence of successive iterates provably converges to the solution at a geometric rate so that the proposed scheme is efficient both in terms of computational and data resources. In theory, a variation on this scheme leads to a near-linear time algorithm for a physically realizable model based on coded diffraction patterns. We illustrate the effectiveness of our methods with various experiments on image data. Underlying our analysis are insights for the analysis of nonconvex optimization schemes that may have implications for computational problems beyond phase retrieval.
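
As a rough illustration, the scheme can be sketched in a few lines for real-valued Gaussian measurements. This is a simplified sketch under assumed step-size and iteration parameters, not the paper's exact (complex-valued, carefully tuned) algorithm:

```python
import numpy as np

def phase_retrieval(A, y, iters=500, mu=0.2):
    """Spectral initialization + gradient-style refinement for y = (A @ x)**2."""
    m, n = A.shape
    # Spectral initialization: leading eigenvector of (1/m) * sum_r y_r a_r a_r^T
    Y = (A * y[:, None]).T @ A / m
    _, V = np.linalg.eigh(Y)
    z = V[:, -1] * np.sqrt(y.mean())          # scale to the estimated signal energy
    for _ in range(iters):
        r = (A @ z) ** 2 - y                  # residual of the phaseless measurements
        grad = A.T @ (r * (A @ z)) / m        # gradient of (1/4m) * sum_r r_r^2
        z -= (mu / np.linalg.norm(z) ** 2) * grad
    return z                                  # recovers x up to a global sign
```

Under the paper's assumptions, the iterates converge geometrically to the signal, up to a global sign here (a global phase in the complex case).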

1,096 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider derivative-free algorithms for stochastic and nonstochastic convex optimization problems that use only function values rather than gradients, and show that if pairs of function values are available, algorithms for $d$ -dimensional optimization that use gradient estimates based on random perturbations suffer a factor of at most $\sqrt {d}$ in convergence rate over traditional stochastic gradient methods.
Abstract: We consider derivative-free algorithms for stochastic and nonstochastic convex optimization problems that use only function values rather than gradients. Focusing on nonasymptotic bounds on convergence rates, we show that if pairs of function values are available, algorithms for $d$ -dimensional optimization that use gradient estimates based on random perturbations suffer a factor of at most $\sqrt {d}$ in convergence rate over traditional stochastic gradient methods. We establish such results for both smooth and nonsmooth cases, sharpening previous analyses that suggested a worse dimension dependence, and extend our results to the case of multiple ( $ {m}\ge 2$ ) evaluations. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, establishing the sharpness of our achievable results up to constant (sometimes logarithmic) factors.
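
The two-point gradient estimate at the heart of such schemes is easy to state concretely. A minimal sketch follows; the smoothing parameter `delta` and the plain SGD loop are illustrative choices, not the paper's tuned constructions:

```python
import numpy as np

def two_point_gradient(f, x, delta=1e-4, rng=np.random.default_rng()):
    """Gradient estimate from one pair of function values along a random direction."""
    u = rng.standard_normal(x.shape)
    return ((f(x + delta * u) - f(x - delta * u)) / (2 * delta)) * u

def derivative_free_sgd(f, x0, steps=1000, lr=0.01):
    """Stochastic gradient descent driven only by function evaluations."""
    x = x0.copy()
    for _ in range(steps):
        x -= lr * two_point_gradient(f, x)
    return x
```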

342 citations


Journal ArticleDOI
TL;DR: In this article, a class of two-weight and three-weight linear codes over GF(p) was constructed, and their application in secret sharing was investigated, and some of the linear codes obtained are optimal in the sense that they meet certain bounds on linear codes.
Abstract: In this paper, a class of two-weight and three-weight linear codes over $ {\mathrm {GF}}(p)$ is constructed, and their application in secret sharing is investigated. Some of the linear codes obtained are optimal in the sense that they meet certain bounds on linear codes. These codes also have applications in authentication codes, association schemes, and strongly regular graphs, in addition to their applications in consumer electronics, communication, and data storage systems.

284 citations


Journal ArticleDOI
TL;DR: This framework applies to arbitrary structure-inducing norms as well as to a wide range of measurement ensembles, and allows us to give sample complexity bounds for problems such as sparse phase retrieval and low-rank tensor completion.
Abstract: Recovering structured models (e.g., sparse or group-sparse vectors, low-rank matrices) given a few linear observations has been well studied recently. In various applications in signal processing and machine learning, the model of interest is structured in several ways, for example, a matrix that is simultaneously sparse and low rank. Often norms that promote the individual structures are known, and allow for recovery using an orderwise optimal number of measurements (e.g., $\ell _{1}$ norm for sparsity, nuclear norm for matrix rank). Hence, it is reasonable to minimize a combination of such norms. We show that, surprisingly, using multiobjective optimization with these norms can do no better, orderwise, than exploiting only one of the structures, thus revealing a fundamental limitation in sample complexity. This result suggests that to fully exploit the multiple structures, we need an entirely new convex relaxation. Further, specializing our results to the case of sparse and low-rank matrices, we show that a nonconvex formulation recovers the model from very few measurements (on the order of the degrees of freedom), whereas the convex problem combining the $\ell _{1}$ and nuclear norms requires many more measurements, illustrating a gap between the performance of the convex and nonconvex recovery problems. Our framework applies to arbitrary structure-inducing norms as well as to a wide range of measurement ensembles. This allows us to give sample complexity bounds for problems such as sparse phase retrieval and low-rank tensor completion.

263 citations


Journal ArticleDOI
TL;DR: This paper constructs protograph-based spatially coupled low-density parity-check codes by coupling together a series of L disjoint, or uncoupled, LDPC code Tanner graphs into a single coupled chain, and obtains sequences of asymptotically good LDPC codes with fast convergence rates and BP thresholds close to the Shannon limit.
Abstract: In this paper, we construct protograph-based spatially coupled low-density parity-check (LDPC) codes by coupling together a series of $L$ disjoint, or uncoupled, LDPC code Tanner graphs into a single coupled chain. By varying $L$ , we obtain a flexible family of code ensembles with varying rates and frame lengths that can share the same encoding and decoding architecture for arbitrary $L$ . We demonstrate that the resulting codes combine the best features of optimized irregular and regular codes in one design: capacity-approaching iterative belief propagation (BP) decoding thresholds and linear growth of minimum distance with block length. In particular, we show that, for sufficiently large $L$ , the BP thresholds on both the binary erasure channel and the binary-input additive white Gaussian noise channel saturate to a particular value significantly better than the BP decoding threshold and numerically indistinguishable from the optimal maximum a posteriori decoding threshold of the uncoupled LDPC code. When all variable nodes in the coupled chain have degree greater than two, asymptotically the error probability converges at least doubly exponentially with decoding iterations and we obtain sequences of asymptotically good LDPC codes with fast convergence rates and BP thresholds close to the Shannon limit. Further, the gap to capacity decreases as the density of the graph increases, opening up a new way to construct capacity-achieving codes on memoryless binary-input symmetric-output channels with low-complexity BP decoding.
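
The coupling itself can be sketched as "edge spreading" of a protograph base matrix: a base matrix $B$ is split into components $B_{0},\dots ,B_{w}$ with $\sum _{i} B_{i} = B$ , and the components are placed band-diagonally along a chain of $L$ positions. A small Python sketch under these assumptions (the $(3,6)$ -regular split shown is an illustrative example):

```python
import numpy as np

def couple_protograph(components, L):
    """Build the base matrix of a spatially coupled chain of L positions.

    components: base matrices B_0, ..., B_w with sum(components) = B.
    """
    w = len(components) - 1
    rows, cols = components[0].shape
    coupled = np.zeros(((L + w) * rows, L * cols), dtype=int)
    for pos in range(L):                      # copy of the protograph at position pos
        for i, Bi in enumerate(components):   # spread its edges over positions pos..pos+w
            coupled[(pos + i) * rows:(pos + i + 1) * rows,
                    pos * cols:(pos + 1) * cols] = Bi
    return coupled

# (3,6)-regular base matrix B = [[3, 3]] spread over w + 1 = 3 components
B_parts = [np.array([[1, 1]])] * 3
H_base = couple_protograph(B_parts, L=10)     # shape (12, 20); termination lowers the rate
```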

237 citations


Journal ArticleDOI
TL;DR: In this article, a different method of constructing linear codes using specific classes of 2-designs is studied, and linear codes with a few weights are obtained from almost difference sets, difference sets and a type of 2 -designs associated to semibent functions.
Abstract: A classical method of constructing a linear code over $ {\mathrm {GF}}(q)$ with a $t$ -design is to use the incidence matrix of the $t$ -design as a generator matrix over $ {\mathrm {GF}}(q)$ of the code. This approach has been extensively investigated in the literature. In this paper, a different method of constructing linear codes using specific classes of 2-designs is studied, and linear codes with a few weights are obtained from almost difference sets, difference sets, and a type of 2-designs associated to semibent functions. Two families of the codes obtained in this paper are optimal. The linear codes presented in this paper have applications in secret sharing and authentication schemes, in addition to their applications in consumer electronics, communication and data storage systems. A coding-theory approach to the characterization of highly nonlinear Boolean functions is presented.

230 citations


Journal ArticleDOI
TL;DR: A bipartite graph representation of the SIC process, resembling iterative decoding of generalized low-density parity-check codes over the erasure channel, is exploited to optimize the selection probabilities of the component erasure correcting codes through a density evolution analysis.
Abstract: In this paper, a random access scheme is introduced, which relies on the combination of packet erasure correcting codes and successive interference cancellation (SIC). The scheme is named coded slotted ALOHA. A bipartite graph representation of the SIC process, resembling iterative decoding of generalized low-density parity-check codes over the erasure channel, is exploited to optimize the selection probabilities of the component erasure correcting codes through a density evolution analysis. The capacity (in packets per slot) of the scheme is then analyzed in the context of the collision channel without feedback. Moreover, a capacity bound is developed, and component code distributions tightly approaching the bound are derived.
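
For the special case of repetition codes, the SIC process reduces to iterative peeling on the user/slot bipartite graph, exactly like erasure decoding of an LDPC code. A minimal simulation sketch under idealized collision-channel assumptions (perfect cancellation, no noise; the degree distribution used in the example is illustrative):

```python
import random

def sic_peeling(n_users, n_slots, degree_dist):
    """One frame of repetition-based coded slotted ALOHA with ideal SIC.

    degree_dist: list of (degree, probability) pairs for the repetition degree.
    Returns the number of resolved users.
    """
    slots = [set() for _ in range(n_slots)]
    for u in range(n_users):
        d = random.choices([d for d, _ in degree_dist],
                           [p for _, p in degree_dist])[0]
        for s in random.sample(range(n_slots), d):
            slots[s].add(u)                      # user u places a replica in slot s
    resolved = set()
    progress = True
    while progress:                              # peel singleton slots, cancel replicas
        progress = False
        for s in slots:
            if len(s) == 1:
                u = next(iter(s))
                resolved.add(u)
                for t in slots:
                    t.discard(u)                 # interference cancellation of all replicas
                progress = True
    return len(resolved)

# e.g., 80 users in 100 slots with degrees drawn from {2, 3, 8}
print(sic_peeling(80, 100, [(2, 0.5), (3, 0.28), (8, 0.22)]))
```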

212 citations


Journal ArticleDOI
TL;DR: This paper explores a quadratic (or rank-one) measurement model which imposes minimal memory requirements and low computational complexity during the sampling process, and is shown to be optimal in preserving various low-dimensional covariance structures.
Abstract: Statistical inference and information processing of high-dimensional data often require an efficient and accurate estimation of their second-order statistics. With rapidly changing data, limited processing power and storage at the acquisition devices, it is desirable to extract the covariance structure from a single pass over the data and a small number of stored measurements. In this paper, we explore a quadratic (or rank-one) measurement model which imposes minimal memory requirements and low computational complexity during the sampling process, and is shown to be optimal in preserving various low-dimensional covariance structures. Specifically, four popular structural assumptions of covariance matrices, namely, low rank, Toeplitz low rank, sparsity, jointly rank-one and sparse structure, are investigated, while recovery is achieved via convex relaxation paradigms for the respective structure. The proposed quadratic sampling framework has a variety of potential applications, including streaming data processing, high-frequency wireless communication, phase space tomography and phase retrieval in optics, and noncoherent subspace detection. Our method admits universally accurate covariance estimation in the absence of noise, as soon as the number of measurements exceeds the information theoretic limits. We also demonstrate the robustness of this approach against noise and imperfect structural assumptions. Our analysis is established upon a novel notion called the mixed-norm restricted isometry property (RIP- $\ell _{2}/\ell _{1}$ ), as well as the conventional RIP- $\ell _{2}/\ell _{2}$ for near-isotropic and bounded measurements. In addition, our results improve upon the best-known phase retrieval (including both dense and sparse signals) guarantees using PhaseLift with a significantly simpler approach.
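
For the plain low-rank case, the convex relaxation is trace minimization over the PSD cone subject to the rank-one measurement constraints. A noiseless sketch with cvxpy (a sketch of that one relaxation, not the paper's full framework; it assumes an SDP-capable solver such as the bundled SCS is installed):

```python
import numpy as np
import cvxpy as cp

def covariance_from_rank_one_sketches(A, b):
    """Recover a low-rank PSD matrix Sigma from b_i = a_i^T Sigma a_i
    by trace minimization over the PSD cone (noiseless convex relaxation)."""
    m, n = A.shape
    M = cp.Variable((n, n), PSD=True)
    # each quadratic sketch is a linear constraint trace(a_i a_i^T M) = b_i
    constraints = [cp.sum(cp.multiply(np.outer(A[i], A[i]), M)) == b[i]
                   for i in range(m)]
    cp.Problem(cp.Minimize(cp.trace(M)), constraints).solve()
    return M.value
```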

198 citations


Journal ArticleDOI
TL;DR: In this paper, it was shown that it is not necessary to assume joint incoherence, which is a standard but unintuitive and restrictive condition that is imposed by previous studies, and thus it is possible to obtain a sample complexity bound that is orderwise optimal with respect to the incoherence parameter (up to a log factor).
Abstract: This paper considers the matrix completion problem. We show that it is not necessary to assume joint incoherence, which is a standard but unintuitive and restrictive condition that is imposed by previous studies. This leads to a sample complexity bound that is orderwise optimal with respect to the incoherence parameter (as well as to the rank $r$ and the matrix dimension $n$ up to a log factor). As a consequence, we improve the sample complexity of recovering a semidefinite matrix from $O(nr^{2}\log ^{2}n)$ to $O(nr\log ^{2}n)$ , and the highest allowable rank from $\Theta (\sqrt {n}/\log n)$ to $\Theta (n/\log ^{2}n)$ . The key step in the proof is to obtain new bounds in terms of the $\ell _{\infty ,2}$ -norm, defined as the maximum of the row and column norms of a matrix. To illustrate the applicability of our techniques, we discuss extensions to singular value decomposition projection, structured matrix completion and semisupervised clustering, for which we provide orderwise improvements over existing results. Finally, we turn to the closely related problem of low-rank-plus-sparse matrix decomposition. We show that the joint incoherence condition is unavoidable here for polynomial-time algorithms conditioned on the planted clique conjecture. This means it is intractable in general to separate a rank- $\omega (\sqrt {n})$ positive semidefinite matrix and a sparse matrix. Interestingly, our results show that the standard and joint incoherence conditions are associated, respectively, with the information (statistical) and computational aspects of the matrix decomposition problem.

187 citations


Journal ArticleDOI
TL;DR: In this article, a general methodology is proposed for the construction and analysis of essentially minimax estimators for a wide class of functionals of finite dimensional parameters, with elaboration on the case of discrete distributions, where the support size $S$ is unknown and may be comparable with or even much larger than the number of observations $n$.
Abstract: We propose a general methodology for the construction and analysis of essentially minimax estimators for a wide class of functionals of finite dimensional parameters, and elaborate on the case of discrete distributions, where the support size $S$ is unknown and may be comparable with or even much larger than the number of observations $n$ . We treat the respective regions where the functional is nonsmooth and smooth separately. In the nonsmooth regime, we apply an unbiased estimator for the best polynomial approximation of the functional whereas, in the smooth regime, we apply a bias-corrected version of the maximum likelihood estimator (MLE). We illustrate the merit of this approach by thoroughly analyzing the performance of the resulting schemes for estimating two important information measures: 1) the entropy $H(P) = \sum _{i = 1}^{S} -p_{i} \ln p_{i}$ and 2) $F_\alpha (P) = \sum _{i = 1}^{S} p_{i}^\alpha ,\alpha >0$ . We obtain the minimax $L_{2}$ rates for estimating these functionals. In particular, we demonstrate that our estimator achieves the optimal sample complexity $n \asymp S/\ln S$ for entropy estimation. We also demonstrate that the sample complexity for estimating $F_\alpha (P)$ , $0<\alpha <1$ , is $n\asymp S^{1/\alpha }/ \ln S$ , which can be achieved by our estimator but not the MLE. For $1<\alpha <3/2$ , we show the minimax $L_{2}$ rate for estimating $F_\alpha (P)$ is $(n\ln n)^{-2(\alpha -1)}$ for infinite support size, while the maximum $L_{2}$ rate for the MLE is $n^{-2(\alpha -1)}$ . For all the above cases, the behavior of the minimax rate-optimal estimators with $n$ samples is essentially that of the MLE (plug-in rule) with $n\ln n$ samples, which we term “effective sample size enlargement.” We highlight the practical advantages of our schemes for the estimation of entropy and mutual information. We compare our performance with various existing approaches, and demonstrate that our approach reduces running time and boosts the accuracy. Moreover, we show that the minimax rate-optimal mutual information estimator yielded by our framework leads to significant performance boosts over the Chow–Liu algorithm in learning graphical models. The wide use of information measure estimation suggests that the insights and estimators obtained in this paper could be broadly applicable.
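
For intuition, compare the plug-in MLE with a simple first-order bias correction (Miller–Madow); the paper's minimax estimator instead uses an unbiased estimate of the best polynomial approximation in the nonsmooth regime, which is what buys the extra $\ln n$ factor. A sketch, where Miller–Madow is only an illustrative baseline and not the paper's construction:

```python
import numpy as np
from collections import Counter

def entropy_mle(samples):
    """Plug-in (maximum likelihood) entropy estimate in nats."""
    n = len(samples)
    counts = np.array(list(Counter(samples).values()))
    p = counts / n
    return float(-np.sum(p * np.log(p)))

def entropy_miller_madow(samples):
    """MLE plus the classical (K - 1)/(2n) bias correction (illustrative baseline)."""
    n = len(samples)
    k_observed = len(set(samples))
    return entropy_mle(samples) + (k_observed - 1) / (2 * n)
```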

Journal ArticleDOI
TL;DR: In this article, the authors presented the first provably accurate feature selection method for $k$ -means clustering and in addition, they presented two feature extraction methods for clustering.
Abstract: We study the topic of dimensionality reduction for $k$ -means clustering. Dimensionality reduction encompasses the union of two approaches: 1) feature selection and 2) feature extraction. A feature selection-based algorithm for $k$ -means clustering selects a small subset of the input features and then applies $k$ -means clustering on the selected features. A feature extraction-based algorithm for $k$ -means clustering constructs a small set of new artificial features and then applies $k$ -means clustering on the constructed features. Despite the significance of $k$ -means clustering as well as the wealth of heuristic methods addressing it, provably accurate feature selection methods for $k$ -means clustering are not known. On the other hand, two provably accurate feature extraction methods for $k$ -means clustering are known in the literature; one is based on random projections and the other is based on the singular value decomposition (SVD). This paper makes further progress toward a better understanding of dimensionality reduction for $k$ -means clustering. Namely, we present the first provably accurate feature selection method for $k$ -means clustering and, in addition, we present two feature extraction methods. The first feature extraction method is based on random projections and it improves upon the existing results in terms of time complexity and number of features needed to be extracted. The second feature extraction method is based on fast approximate SVD factorizations and it also improves upon the existing results in terms of time complexity. The proposed algorithms are randomized and provide constant-factor approximation guarantees with respect to the optimal $k$ -means objective value.
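
The random-projection feature extraction route is particularly simple to state: project the points to a low dimension with a random Gaussian map and run $k$ -means there. A sketch under assumed parameters (per the paper, the target dimension needed for a constant-factor guarantee scales with $k$ , not with the input dimension):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_after_projection(X, k, target_dim, seed=0):
    """Cluster the n rows of X after a random Gaussian projection."""
    rng = np.random.default_rng(seed)
    n, D = X.shape
    R = rng.standard_normal((D, target_dim)) / np.sqrt(target_dim)  # scaled Gaussian map
    return KMeans(n_clusters=k, n_init=10).fit_predict(X @ R)
```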

Journal ArticleDOI
TL;DR: In this article, a polyhedral relaxation of the generalized degrees of freedom (GDoF) region achieved by treating interference as noise (TIN) is derived, a dual characterization of this polyhedral region is given via potential functions, and the region is shown to be optimal in the identified regime.
Abstract: It is shown that in the $K$ -user interference channel, if for each user the desired signal strength is no less than the sum of the strengths of the strongest interference from this user and the strongest interference to this user (all values in decibel scale), then the simple scheme of using point-to-point Gaussian codebooks with appropriate power levels at each transmitter and treating interference as noise (TIN) at every receiver (in short, TIN scheme) achieves all points in the capacity region to within a constant gap. The generalized degrees of freedom (GDoF) region under this condition is a polyhedron, which is shown to be fully achieved by the same scheme, without the need for time-sharing. The results are proved by first deriving a polyhedral relaxation of the GDoF region achieved by TIN, and then providing a dual characterization of this polyhedral region via the use of potential functions, and finally proving the optimality of this region in the desired regime.
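
The TIN-optimality condition is a simple pointwise check on the (dB-scale) channel strengths. A sketch, with `alpha[i][j]` denoting the assumed strength of the link from transmitter $j$ to receiver $i$ :

```python
import numpy as np

def tin_condition_holds(alpha):
    """Check: each desired signal is at least as strong (dB scale) as the sum of
    the strongest interference caused by, plus received by, that user."""
    K = alpha.shape[0]
    for i in range(K):
        caused = max(alpha[j, i] for j in range(K) if j != i)    # strongest interference user i causes
        received = max(alpha[i, j] for j in range(K) if j != i)  # strongest interference user i receives
        if alpha[i, i] < caused + received:
            return False
    return True
```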

Journal ArticleDOI
TL;DR: A simple low-complexity subspace clustering algorithm is proposed, which applies spectral clustering to an adjacency matrix obtained by thresholding the correlations between data points, and the results reveal an explicit tradeoff between the affinity of the subspaces and the tolerable noise level.
Abstract: The problem of clustering noisy and incompletely observed high-dimensional data points into a union of low-dimensional subspaces and a set of outliers is considered. The number of subspaces, their dimensions, and their orientations are assumed unknown. We propose a simple low-complexity subspace clustering algorithm, which applies spectral clustering to an adjacency matrix obtained by thresholding the correlations between data points. In other words, the adjacency matrix is constructed from the nearest neighbors of each data point in spherical distance. A statistical performance analysis shows that the algorithm exhibits robustness to additive noise and succeeds even when the subspaces intersect. Specifically, our results reveal an explicit tradeoff between the affinity of the subspaces and the tolerable noise level. We furthermore prove that the algorithm succeeds even when the data points are incompletely observed with the number of missing entries allowed to be (up to a log-factor) linear in the ambient dimension. We also propose a simple scheme that provably detects outliers, and we present numerical results on real and synthetic data.
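
The algorithm itself is short: normalize the points, keep the $q$ largest correlations per point as graph edges, and run spectral clustering on the resulting adjacency matrix. A sketch following the thresholding idea in the abstract (the choice of edge weights and of $q$ here is illustrative):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def threshold_subspace_clustering(X, n_clusters, q):
    """X: d x N matrix of data points (columns); q: neighbors kept per point."""
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)   # project points onto the sphere
    C = np.abs(Xn.T @ Xn)                               # pairwise |correlations|
    np.fill_diagonal(C, 0.0)
    A = np.zeros_like(C)
    for i in range(C.shape[0]):
        nbrs = np.argsort(C[i])[-q:]                    # q nearest neighbors in spherical distance
        A[i, nbrs] = C[i, nbrs]
    A = np.maximum(A, A.T)                              # symmetrize the thresholded adjacency
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed").fit_predict(A)
```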

Journal ArticleDOI
TL;DR: It turns out that the binary Simplex codes meet the new bound on the minimum distance of a code in terms of its length, size, and locality with equality; hence, they are the first example of an optimal binary locally repairable code family.
Abstract: In a locally recoverable or repairable code, any symbol of a codeword can be recovered by reading only a small (constant) number of other symbols. The notion of local recoverability is important in the area of distributed storage where a most frequent error-event is a single storage node failure (erasure). A common objective is to repair the node by downloading data from as few other storage nodes as possible. In this paper, we bound the minimum distance of a code in terms of its length, size, and locality. Unlike the previous bounds, our bound follows from a significantly simple analysis and depends on the size of the alphabet being used. It turns out that the binary Simplex codes satisfy our bound with equality; hence, the Simplex codes are the first example of an optimal binary locally repairable code family. We also provide achievability results based on random coding and concatenated codes that are numerically verified to be close to our bounds.
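
The binary Simplex code is concrete enough to write down: the generator matrix of the $[2^{k}-1, k, 2^{k-1}]$ code has all nonzero binary $k$ -tuples as columns, and since every column is the XOR of two other columns, every code symbol can be recovered from just two others (locality 2). A small sketch:

```python
import numpy as np
from itertools import product

def simplex_generator(k):
    """Generator matrix of the binary [2^k - 1, k, 2^(k-1)] Simplex code:
    its columns are all nonzero binary k-tuples."""
    cols = [c for c in product([0, 1], repeat=k) if any(c)]
    return np.array(cols, dtype=int).T

# Locality 2: any column g splits as g = g1 XOR g2 for two other columns,
# so any code symbol is the XOR of the corresponding two symbols.
G = simplex_generator(3)          # 3 x 7 generator matrix
m = np.array([1, 0, 1])
c = (m @ G) % 2                   # one codeword of the [7, 3, 4] Simplex code
```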

Journal ArticleDOI
TL;DR: In this article, a nearly optimal algorithm for denoising a mixture of sinusoids from noisy equispaced samples was derived by viewing line spectral estimation as a sparse recovery problem with a continuous, infinite dictionary.
Abstract: This paper establishes a nearly optimal algorithm for denoising a mixture of sinusoids from noisy equispaced samples. We derive our algorithm by viewing line spectral estimation as a sparse recovery problem with a continuous, infinite dictionary. We show how to compute the estimator via semidefinite programming and provide guarantees on its mean-squared error rate. We derive a complementary minimax lower bound on this estimation rate, demonstrating that our approach nearly achieves the best possible estimation error. Furthermore, we establish bounds on how well our estimator localizes the frequencies in the signal, showing that the localization error tends to zero as the number of samples grows. We verify our theoretical results in an array of numerical experiments, demonstrating that the semidefinite programming approach outperforms three classical spectral estimation techniques.

Journal ArticleDOI
TL;DR: A detailed transient analysis of the learning behavior of multiagent networks reveals how combination policies influence the learning process of networked agents, and how these policies can steer the convergence point toward any of many possible Pareto optimal solutions.
Abstract: This paper carries out a detailed transient analysis of the learning behavior of multiagent networks, and reveals interesting results about the learning abilities of distributed strategies. Among other results, the analysis reveals how combination policies influence the learning process of networked agents, and how these policies can steer the convergence point toward any of many possible Pareto optimal solutions. The results also establish that the learning process of an adaptive network undergoes three (rather than two) well-defined stages of evolution with distinctive convergence rates during the first two stages, while attaining a finite mean-square-error level in the last stage. The analysis reveals what aspects of the network topology influence performance directly and suggests design procedures that can optimize performance by adjusting the relevant topology parameters. Interestingly, it is further shown that, in the adaptation regime, each agent in a sparsely connected network is able to achieve the same performance level as that of a centralized stochastic-gradient strategy even for left-stochastic combination strategies. These results lead to a deeper understanding and useful insights on the convergence behavior of coupled distributed learners. The results also lead to effective design mechanisms to help diffuse information more thoroughly over networks.
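
A canonical instance of such distributed strategies is adapt-then-combine diffusion LMS: each agent takes a local stochastic-gradient step, then convexly combines its neighbors' intermediate estimates through a left-stochastic combination matrix. A minimal sketch under assumed linear-regression data:

```python
import numpy as np

def atc_diffusion_lms(A, U, d, mu=0.01):
    """Adapt-then-combine diffusion LMS.

    A: N x N left-stochastic combination matrix (columns sum to 1); A[l, k] is
       the weight agent k assigns to neighbor l.
    U: regressors of shape (N agents, T time steps, M features); d: targets (N, T).
    """
    N, T, M = U.shape
    W = np.zeros((N, M))
    for t in range(T):
        # adaptation: local instantaneous-gradient update at every agent
        psi = np.array([W[k] + mu * U[k, t] * (d[k, t] - U[k, t] @ W[k])
                        for k in range(N)])
        # combination: mix neighbors' intermediate estimates
        W = A.T @ psi
    return W
```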

Journal ArticleDOI
TL;DR: In this article, the authors consider a D2D network where the nodes have precached information from a library of available files and characterize the optimal throughput-outage tradeoff in terms of tight scaling laws for various regimes of the system parameters.
Abstract: We consider a wireless device-to-device (D2D) network where the nodes have precached information from a library of available files. Nodes request files at random. If the requested file is not in the on-board cache, then it is downloaded from some neighboring node via one-hop local communication. An outage event occurs when a requested file is not found in the neighborhood of the requesting node, or if the network admission control policy decides not to serve the request. We characterize the optimal throughput-outage tradeoff in terms of tight scaling laws for various regimes of the system parameters, when both the number of nodes and the number of files in the library grow to infinity. Our analysis is based on the protocol model of Gupta and Kumar for the underlying D2D wireless network, widely used in the literature on capacity scaling laws of wireless networks without caching. Our results show that the combination of D2D spectrum reuse and caching at the user nodes yields a per-user throughput independent of the number of users, for any fixed outage probability in (0, 1). This implies that the D2D caching network is scalable: even though the number of users increases, each user achieves constant throughput. This behavior is very different from the classical Gupta and Kumar result on ad hoc wireless networks, for which the per-user throughput vanishes as the number of users increases. Furthermore, we show that the user throughput is directly proportional to the fraction of cached information over the whole file library size. Therefore, we can conclude that D2D caching networks can turn memory into bandwidth (i.e., doubling the on-board cache memory on the user devices yields a 100% increase of the user throughput).

Journal ArticleDOI
TL;DR: Four classes of dual-containing constacyclic MDS codes are constructed and their parameters are computed; the quantum MDS codes derived from them have larger minimum distance than the ones available in the literature.
Abstract: Quantum maximum-distance-separable (MDS) codes form an important class of quantum codes. To get $q$ -ary quantum MDS codes, one of the effective ways is to find linear MDS codes $C$ over $ {\mathrm {GF}}(q^{2})$ satisfying $C^{\perp _{H}} \subseteq C$ , where $C^{\perp _{H}}$ denotes the Hermitian dual code of $C$ . For a linear code $C$ of length $n$ over $ {\mathrm {GF}}(q^{2})$ , we say that $C$ is a dual-containing code if $C^{\perp _{H}} \subseteq C$ and $C \neq {\mathrm {GF}}(q^{2})^{n}$ . Several classes of new quantum MDS codes with relatively large minimum distance have been produced through dual-containing constacyclic MDS codes. These works motivate us to make a careful study on the existence conditions for dual-containing constacyclic codes. We obtain necessary and sufficient conditions for the existence of dual-containing constacyclic codes. Four classes of dual-containing constacyclic MDS codes are constructed and their parameters are computed. Consequently, the quantum MDS codes are derived from these parameters. The quantum MDS codes exhibited here have minimum distance bigger than the ones available in the literature.

Journal ArticleDOI
TL;DR: This work analyzes RP-based approximations of convex programs, in which the original optimization problem is approximated by solving a lower dimensional problem, and proves that the approximation ratio of this procedure can be bounded in terms of the geometry of the constraint set.
Abstract: Random projection (RP) is a classical technique for reducing storage and computational costs. We analyze RP-based approximations of convex programs, in which the original optimization problem is approximated by solving a lower dimensional problem. Such dimensionality reduction is essential in computation-limited settings, since the complexity of general convex programming can be quite high (e.g., cubic for quadratic programs, and substantially higher for semidefinite programs). In addition to computational savings, RP is also useful for reducing memory usage, and has useful properties for privacy-preserving optimization. We prove that the approximation ratio of this procedure can be bounded in terms of the geometry of the constraint set. For a broad class of RPs, including those based on various sub-Gaussian distributions as well as randomized Hadamard and Fourier transforms, the data matrix defining the cost function can be projected to a dimension proportional to the squared Gaussian width of the tangent cone of the constraint set at the original solution. This effective dimension of the convex program is often substantially smaller than the original dimension. We illustrate consequences of our theory for various cases, including unconstrained and $\ell _{1}$ -constrained least squares, support vector machines, low-rank matrix estimation, and discuss implications for privacy-preserving optimization, as well as connections with denoising and compressed sensing.
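
The simplest instance is sketched least squares: apply an $m \times n$ random projection to both the data matrix and the observations and solve the smaller problem. A sketch with a Gaussian projection (the theory covers sub-Gaussian and randomized Fourier/Hadamard sketches as well):

```python
import numpy as np

def sketched_least_squares(A, b, m, seed=0):
    """Solve min_x ||S(Ax - b)||_2 for a random m x n Gaussian sketch S."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x
```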

Journal ArticleDOI
TL;DR: It is shown that the network coding and index coding problems are equivalent, and that one can determine the capacity region of a given network coding instance with colocated sources by studying the capacity region of a corresponding index coding instance.
Abstract: We show that the network coding and index coding problems are equivalent. This equivalence holds in the general setting which includes linear and nonlinear codes. Specifically, we present a reduction that maps a network coding instance to an index coding instance while preserving feasibility, i.e., the network coding instance has a feasible solution if and only if the corresponding index coding instance is feasible. In addition, we show that one can determine the capacity region of a given network coding instance with colocated sources by studying the capacity region of a corresponding index coding instance. Previous connections between network and index coding were restricted to the linear case.

Journal ArticleDOI
TL;DR: This paper shows that, under the average error probability formalism, the third-order term in the normal approximation for the additive white Gaussian noise channel with a maximal or equal power constraint is at least $(1/2)\log n + O(1)$ .
Abstract: This paper shows that, under the average error probability formalism, the third-order term in the normal approximation for the additive white Gaussian noise channel with a maximal or equal power constraint is at least $({1}/{2})\log n\,+\,O(1)$ . This improves on the lower bound by Polyanskiy–Poor–Verdú (2010) and matches the upper bound proved by the same authors.
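
In standard finite-blocklength notation, with $M^{*}(n,\varepsilon )$ the maximum code size at block length $n$ and average error probability $\varepsilon$ , and with $C$ and $V$ the capacity and dispersion of the AWGN channel, the result pins down the third-order term of the normal approximation:

$$\log M^{*}(n,\varepsilon ) = nC - \sqrt{nV}\,Q^{-1}(\varepsilon ) + \frac{1}{2}\log n + O(1),$$

where $Q^{-1}$ denotes the inverse of the Gaussian tail function; the paper contributes the lower bound on the $\frac{1}{2}\log n$ term, matching the known upper bound.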

Journal ArticleDOI
TL;DR: In this article, the authors propose a graphical model for representing networks of stochastic processes, the minimal generative model graph, which is based on reduced factorizations of the joint distribution over time.
Abstract: We propose a graphical model for representing networks of stochastic processes, the minimal generative model graph. It is based on reduced factorizations of the joint distribution over time. We show that under appropriate conditions, it is unique and consistent with another type of graphical model, the directed information graph, which is based on a generalization of Granger causality. We demonstrate how directed information quantifies Granger causality in a particular sequential prediction setting. We also develop efficient methods to estimate the topological structure from data that obviate estimating the joint statistics. One algorithm assumes upper bounds on the degrees and uses the minimal dimension statistics necessary. In the event that the upper bounds are not valid, the resulting graph is nonetheless an optimal approximation in terms of Kullback-Leibler (KL) divergence. Another algorithm uses near-minimal dimension statistics when no bounds are known, but the distribution satisfies a certain criterion. Analogous to how structure learning algorithms for undirected graphical models use mutual information estimates, these algorithms use directed information estimates. We characterize the sample-complexity of two plug-in directed information estimators and obtain confidence intervals. For the setting when point estimates are unreliable, we propose an algorithm that uses confidence intervals to identify the best approximation that is robust to estimation error. Last, we demonstrate the effectiveness of the proposed algorithms through the analysis of both synthetic data and real data from the Twitter network. In the latter case, we identify which news sources influence users in the network by merely analyzing tweet times.

Journal ArticleDOI
TL;DR: Two fast numerical methods for computing the nonlinear Fourier transform with respect to the NSE are presented; the first achieves a runtime of $O(D^{2})$ floating point operations, where $D$ is the number of sample points.
Abstract: The nonlinear Fourier transform, which is also known as the forward scattering transform, decomposes a periodic signal into nonlinearly interacting waves. In contrast to the common Fourier transform, these waves no longer have to be sinusoidal. Physically relevant waveforms are often available for the analysis instead. The details of the transform depend on the waveforms underlying the analysis, which in turn are specified through the implicit assumption that the signal is governed by a certain evolution equation. For example, water waves generated by the Korteweg–de Vries equation can be expressed in terms of cnoidal waves. Light waves in optical fiber governed by the nonlinear Schrodinger equation (NSE) are another example. Nonlinear analogs of classic problems such as spectral analysis and filtering arise in many applications, with information transmission in optical fiber, as proposed by Yousefi and Kschischang, being a very recent one. The nonlinear Fourier transform is eminently suited to address them—at least from a theoretical point of view. Although numerical algorithms are available for computing the transform, a fast nonlinear Fourier transform that is similarly effective as the fast Fourier transform is for computing the common Fourier transform has not been available so far. The goal of this paper is to address this problem. Two fast numerical methods for computing the nonlinear Fourier transform with respect to the NSE are presented. The first method achieves a runtime of $O(D^{2})$ floating point operations, where $D$ is the number of sample points. The second method applies only to the case where the NSE is defocusing, but it achieves an $O(D\log ^{2}D)$ runtime. Extensions of the results to other evolution equations are discussed as well.

Journal ArticleDOI
TL;DR: Polar codes are introduced for discrete memoryless broadcast channels; for $m$ -user deterministic broadcast channels, polarization is applied to map uniformly random message bits from $m$ independent messages to one codeword while satisfying broadcast constraints.
Abstract: Polar codes are introduced for discrete memoryless broadcast channels. For $m$ -user deterministic broadcast channels, polarization is applied to map uniformly random message bits from $m$ independent messages to one codeword while satisfying broadcast constraints. The polarization-based codes achieve rates on the boundary of the private-message capacity region. For two-user noisy broadcast channels, polar implementations are presented for two information-theoretic schemes: 1) Cover’s superposition codes and 2) Marton’s codes. Due to the structure of polarization, constraints on the auxiliary and channel-input distributions are identified to ensure proper alignment of polarization indices in the multiuser setting. The codes achieve rates on the capacity boundary of a few classes of broadcast channels (e.g., binary-input stochastically degraded). The complexity of encoding and decoding is $ O(n \log n)$ , where $n$ is the block length. In addition, polar code sequences obtain a stretched-exponential decay of $ O(2^{-n^{\beta }})$ of the average block error probability, where $0 < \beta < 1/2$ . Reproducible experiments for finite block lengths $n = 512, 1024, 2048$ corroborate the theory.

Journal ArticleDOI
TL;DR: In this paper, the authors examined the mean-square stability and convergence of the learning process of distributed strategies over graphs and identified conditions on the network topology, utilities, and data in order to ensure stability; the results also identified three distinct stages in the learning behavior of multiagent networks related to transient phases I and II and the steady state phase.
Abstract: Part I of this paper examined the mean-square stability and convergence of the learning process of distributed strategies over graphs. The results identified conditions on the network topology, utilities, and data in order to ensure stability; the results also identified three distinct stages in the learning behavior of multiagent networks related to transient phases I and II and the steady-state phase. This Part II examines the steady-state phase of distributed learning by networked agents. Apart from characterizing the performance of the individual agents, it is shown that the network induces a useful equalization effect across all agents. In this way, the performance of noisier agents is enhanced to the same level as the performance of agents with less noisy data. It is further shown that in the small step-size regime, each agent in the network is able to achieve the same performance level as that of a centralized strategy corresponding to a fully connected network. The results in this part reveal explicitly which aspects of the network topology and operation influence performance and provide important insights into the design of effective mechanisms for the processing and diffusion of information over networks.

Journal ArticleDOI
TL;DR: This paper considers the case of overcomplete dictionaries, noisy signals, and possible outliers, thus extending previous work limited to noiseless settings and/or undercomplete dictionaries, and shows that, with high probability, sparse coding admits a local minimum around the reference dictionary generating the signals.
Abstract: A popular approach within the signal processing and machine learning communities consists in modeling signals as sparse linear combinations of atoms selected from a learned dictionary. While this paradigm has led to numerous empirical successes in various fields ranging from image to audio processing, there have only been a few theoretical arguments supporting this empirical evidence. In particular, sparse coding, or sparse dictionary learning, relies on a nonconvex procedure whose local minima have not been fully analyzed yet. In this paper, we consider a probabilistic model of sparse signals, and show that, with high probability, sparse coding admits a local minimum around the reference dictionary generating the signals. This paper considers the case of overcomplete dictionaries, noisy signals, and possible outliers, thus extending the previous work limited to noiseless settings and/or undercomplete dictionaries. The analysis we conduct is nonasymptotic and makes it possible to understand how the key quantities of the problem, such as the coherence or the level of noise, can scale with respect to the dimension of the signals, the number of atoms, the sparsity, and the number of observations.

Journal ArticleDOI
TL;DR: A regret minimization algorithm for setting the reserve price in a sequence of second-price auctions, under the assumption that all bids are independently drawn from the same unknown and arbitrary distribution, achieves a regret of O(√T) in a sequence of T auctions.
Abstract: We show a regret minimization algorithm for setting the reserve price in a sequence of second-price auctions, under the assumption that all bids are independently drawn from the same unknown and arbitrary distribution. Our algorithm is computationally efficient, and achieves a regret of O(√T) in a sequence of T auctions. This holds even when the number of bidders is stochastic with a known distribution.
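
One simple way to realize such a scheme is to discretize the reserve price and treat each candidate as an arm of a bandit. The paper's algorithm exploits the specific structure of second-price revenue, but a plain UCB sketch over a grid conveys the setup (grid size and exploration constant are illustrative, and this is not the paper's algorithm):

```python
import numpy as np

def revenue(bids, reserve):
    """Seller revenue of a second-price auction with a reserve price."""
    top = np.sort(bids)[::-1]
    if len(top) > 1 and top[1] >= reserve:
        return top[1]                    # second-highest bid clears the reserve
    if top[0] >= reserve:
        return reserve                   # only the winner clears it: pays the reserve
    return 0.0                           # no sale

def ucb_reserve(auctions, grid, c=1.0):
    """UCB over a grid of reserve prices; `auctions` yields one bid vector per round."""
    n = np.zeros(len(grid))              # pulls per candidate reserve
    s = np.zeros(len(grid))              # cumulative revenue per candidate reserve
    for t, bids in enumerate(auctions, start=1):
        means = np.where(n > 0, s / np.maximum(n, 1), np.inf)  # try unplayed arms first
        i = int(np.argmax(means + c * np.sqrt(np.log(t) / np.maximum(n, 1))))
        r = revenue(np.asarray(bids), grid[i])
        n[i] += 1
        s[i] += r
    return grid[int(np.argmax(s / np.maximum(n, 1)))]
```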

Journal ArticleDOI
TL;DR: It is established that the Cauchy-Schwarz divergence between the probability densities of two Poisson point processes is half the squared L2-distance between their intensity functions.
Abstract: In this paper, we extend the notion of Cauchy–Schwarz divergence to point processes and establish that the Cauchy–Schwarz divergence between the probability densities of two Poisson point processes is half the squared ${L^{2}}$ -distance between their intensity functions. Extension of this result to mixtures of Poisson point processes and, in the case where the intensity functions are Gaussian mixtures, closed form expressions for the Cauchy–Schwarz divergence are presented. Our result also implies that the Bhattacharyya distance between the probability distributions of two Poisson point processes is equal to the square of the Hellinger distance between their intensity measures. We illustrate the result via a sensor management application where the system states are modeled as point processes.
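
Concretely, writing the Cauchy–Schwarz divergence between densities $f_{1}$ and $f_{2}$ in its usual form, the paper's identity for two Poisson point processes with intensity functions $\lambda _{1}$ and $\lambda _{2}$ (in the paper's unit convention) reads:

$$D_{CS}(f_{1},f_{2}) = -\ln \frac{\int f_{1}f_{2}}{\sqrt{\int f_{1}^{2}\,\int f_{2}^{2}}}, \qquad D_{CS} = \frac{1}{2}\int \left(\lambda _{1}(x)-\lambda _{2}(x)\right)^{2}dx = \frac{1}{2}\left\Vert \lambda _{1}-\lambda _{2}\right\Vert _{L^{2}}^{2}.$$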

Journal ArticleDOI
TL;DR: In this article, secret-key capacity-achieving polar coding schemes are proposed for the degraded binary memoryless source (DBMS) model with rate-unlimited public communication, the DBMS model with one-way rate-limited public communication, the 1-to-m broadcast model, and the Markov tree model with uniform marginals.
Abstract: Practical implementations of secret-key generation are often based on sequential strategies, which handle reliability and secrecy in two successive steps, called reconciliation and privacy amplification. In this paper, we propose an alternative approach based on polar codes that jointly deals with reliability and secrecy. Specifically, we propose secret-key capacity-achieving polar coding schemes for the following models: (i) the degraded binary memoryless source (DBMS) model with rate-unlimited public communication, (ii) the DBMS model with one-way rate-limited public communication, (iii) the 1-to-m broadcast model and (iv) the Markov tree model with uniform marginals. For models (i) and (ii) our coding schemes remain valid for non-degraded sources, although they may not achieve the secret-key capacity. For models (i), (ii) and (iii), our schemes rely on a pre-shared secret seed of negligible rate; however, we provide special cases of these models for which no seed is required. Finally, we show an application of our results to secrecy and privacy for biometric systems. We thus provide the first examples of low-complexity secret-key capacity-achieving schemes that are able to handle vector quantization for model (ii), or multiterminal communication for models (iii) and (iv).