
Showing papers in "Information and Inference: A Journal of the IMA", 2023


Journal ArticleDOI
TL;DR: In this article, the authors provide scaling conditions on the signal-to-noise ratio under which classical multidimensional scaling followed by a distance-based clustering algorithm can recover the cluster labels of all samples.
Abstract: Classical multidimensional scaling is a widely used dimension reduction technique. Yet few theoretical results characterizing its statistical performance exist. This paper provides a theoretical framework for analyzing the quality of embedded samples produced by classical multidimensional scaling. This lays a foundation for various downstream statistical analyses, and we focus on clustering noisy data. Our results provide scaling conditions on the signal-to-noise ratio under which classical multidimensional scaling followed by a distance-based clustering algorithm can recover the cluster labels of all samples. Simulation studies confirm these scaling conditions are sharp. Applications to cancer gene-expression data, single-cell RNA sequencing data and natural language data lend strong support to the methodology and theory.

3 citations
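The pipeline this paper analyzes is easy to state concretely. Below is a minimal sketch (function names are mine, not from the paper): classical multidimensional scaling via double-centering and eigendecomposition of the squared-distance matrix, followed by a distance-based clustering step, here k-means.

```python
# A minimal sketch of the analyzed pipeline: classical MDS (double-centering
# plus eigendecomposition) followed by distance-based clustering (k-means).
import numpy as np
from sklearn.cluster import KMeans

def classical_mds(D, k):
    """Embed points into R^k from an n x n matrix of pairwise distances D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]             # top-k eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# Toy example: two noisy clusters in high dimension (a signal-plus-noise model).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 200)), rng.normal(3, 1, (50, 200))])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
emb = classical_mds(D, k=2)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
```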


Journal ArticleDOI
TL;DR: In this article, the authors studied the decoding problem with pairwise Markov models (PMMs) and showed that the hybrid path is a Viterbi path whenever $C$ is big enough.
Abstract: The article studies the decoding problem (also known as the classification or segmentation problem) with pairwise Markov models (PMMs). A PMM is a process where the observation process and the underlying state sequence form a two-dimensional Markov chain, a natural generalization of the hidden Markov model. The standard solutions to the decoding problem are the so-called Viterbi path—a sequence with maximum state path probability given the observations—and the pointwise maximum a posteriori (PMAP) path, which maximizes the expected number of correctly classified entries. When the goal is to simultaneously maximize both criteria—conditional probability (corresponding to the Viterbi path) and pointwise conditional probability (corresponding to the PMAP path)—they are combined into one single criterion via the regularization parameter $C$. The main objective of the article is to study the behaviour of the solution—called the hybrid path—as $C$ grows. Increasing $C$ increases the conditional probability of the hybrid path, and when $C$ is big enough, every hybrid path is a Viterbi path. We show that hybrid paths also approach the Viterbi path locally: we define $m$-locally Viterbi paths and show that the hybrid path is $m$-locally Viterbi whenever $C$ is big enough. All this might lead to the impression that when $C$ is relatively big, any hybrid path that is not yet Viterbi differs from the Viterbi path by only a few single entries. We argue that this intuition is wrong: when a hybrid path is unique and $m$-locally Viterbi, different hybrid paths differ by at least $m$ entries. Thus, as $C$ increases, the different hybrid paths tend to differ from each other by larger and larger intervals. Hence the hybrid paths might offer a variety of rather different solutions to the decoding problem.

1 citation
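For the HMM special case of a PMM, the hybrid criterion can be optimized with a Viterbi-style dynamic program whose per-step score adds the log pointwise posterior to $C$ times the log transition and emission terms. The sketch below is my own illustration (unscaled forward-backward, so suitable only for short sequences, and it assumes strictly positive model parameters; all names are hypothetical): it recovers the PMAP path at $C = 0$ and approaches the Viterbi path as $C$ grows.

```python
# A toy hybrid decoder for an HMM: maximize
#   sum_t log P(Y_t = y_t | x)  +  C * log P(y | x)
# by a Viterbi-style dynamic program on the combined score.
import numpy as np

def forward_backward(pi, A, B):
    """Pointwise posteriors gamma[t, j] = P(Y_t = j | x); unscaled, short T only."""
    T, K = B.shape
    alpha = np.zeros((T, K)); beta = np.zeros((T, K))
    alpha[0] = pi * B[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def hybrid_path(pi, A, B, C):
    """B[t, j] = emission likelihood of observation t under state j."""
    T, K = B.shape
    lg = np.log(np.maximum(forward_backward(pi, A, B), 1e-300))
    lA, lB, lpi = np.log(A), np.log(B), np.log(pi)
    delta = lg[0] + C * (lpi + lB[0])
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + C * lA          # best predecessor per state
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + lg[t] + C * lB[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]                             # C = 0: PMAP; C large: Viterbi
```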


Journal ArticleDOI
TL;DR: In this paper, the authors proposed two-sample tests based on the integral probability metric (IPM) for high-dimensional samples supported on a low-dimensional manifold, which are adaptive to low-dimensional geometric structure because their performance crucially depends on the intrinsic dimension instead of the data dimension.
Abstract: Two-sample testing is an important problem that aims to determine whether two collections of observations follow the same distribution or not. We propose two-sample tests based on the integral probability metric (IPM) for high-dimensional samples supported on a low-dimensional manifold. We characterize the properties of the proposed tests with respect to the number of samples $n$ and the structure of the manifold with intrinsic dimension $d$. When an atlas is given, we propose a two-step test to identify the difference between general distributions, which achieves the type-II risk in the order of $n^{-1/\max \{d,2\}}$. When an atlas is not given, we propose a Hölder IPM test that applies to data distributions with $(s,\beta)$-Hölder densities, which achieves the type-II risk in the order of $n^{-(s+\beta)/d}$. To mitigate the heavy computational burden of evaluating the Hölder IPM, we approximate the Hölder function class using neural networks. Based on the approximation theory of neural networks, we show that the neural network IPM test has type-II risk in the order of $n^{-(s+\beta)/d}$, the same order as that of the Hölder IPM test. Our proposed tests are adaptive to low-dimensional geometric structure because their performance crucially depends on the intrinsic dimension instead of the data dimension.
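As a loose illustration of the neural-network IPM idea, one can train a small network to maximize the empirical mean discrepancy between the two samples. The sketch below is my own, not the paper's construction: weight clipping stands in crudely for the Hölder-ball constraint, and calibration is left to a permutation procedure.

```python
# A toy neural-network IPM statistic: train f to maximize
#   mean f(X) - mean f(Y),
# a rough surrogate for the supremum over a (Hölder-type) function ball.
import torch
import torch.nn as nn

def nn_ipm_statistic(X, Y, steps=200, clip=0.1):
    """X, Y: (n, dim) float tensors; returns the maximized mean discrepancy."""
    f = nn.Sequential(nn.Linear(X.shape[1], 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(f.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = -(f(X).mean() - f(Y).mean())       # ascend the discrepancy
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():                      # crude function-class bound
            for p in f.parameters():
                p.clamp_(-clip, clip)
    with torch.no_grad():
        return (f(X).mean() - f(Y).mean()).item()

# Calibrate by recomputing the statistic on label-permuted pooled samples and
# rejecting equality of distributions when the original value is extreme.
```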

Journal ArticleDOI
TL;DR: In this article, the authors show the important roles of sharp minima and strong minima for robust recovery in group-sparsity and low-rank convex regularized optimization problems.
Abstract: In this paper, we show the important roles of sharp minima and strong minima for robust recovery. We also obtain several characterizations of sharp minima for convex regularized optimization problems. Our characterizations are quantitative and verifiable, especially for the case of decomposable norm regularized problems including sparsity, group-sparsity and low-rank convex problems. For group-sparsity optimization problems, we show that a unique solution is a strong solution and obtain quantitative characterizations of solution uniqueness.

Journal ArticleDOI
TL;DR: In this article, the authors present bounds on the rate-distortion function and quantization error of random variables taking values in general measurable spaces such as, e.g. manifolds and fractal sets.
Abstract: This paper is concerned with the lossy compression of general random variables, specifically with rate-distortion theory and quantization of random variables taking values in general measurable spaces such as, e.g. manifolds and fractal sets. Manifold structures are prevalent in data science, e.g. in compressed sensing, machine learning, image processing and handwritten digit recognition. Fractal sets find application in image compression and in the modeling of Ethernet traffic. Our main contributions are bounds on the rate-distortion function and the quantization error. These bounds are very general and essentially only require the existence of reference measures satisfying certain regularity conditions in terms of small ball probabilities. To illustrate the wide applicability of our results, we particularize them to random variables taking values in (i) manifolds, namely, hyperspheres and Grassmannians and (ii) self-similar sets characterized by iterated function systems satisfying the weak separation property.

Journal ArticleDOI
TL;DR: In this article, the authors show that those embeddings which successfully retain the database topology coincide in persistent homology, by introducing two dilation-invariant comparative measures to capture this effect.
Abstract: Appropriately representing elements in a database so that queries may be accurately matched is a central task in information retrieval; recently, this has been achieved by embedding the graphical structure of the database into a manifold in a hierarchy-preserving manner using a variety of metrics. Persistent homology is a tool commonly used in topological data analysis that is able to rigorously characterize a database in terms of both its hierarchy and connectivity structure. Computing persistent homology on a variety of embedded datasets reveals that some commonly used embeddings fail to preserve the connectivity. We show that those embeddings which successfully retain the database topology coincide in persistent homology by introducing two dilation-invariant comparative measures to capture this effect: in particular, they address the issue of metric distortion on manifolds. We provide an algorithm for their computation that exhibits greatly reduced time complexity over existing methods. We use these measures to perform the first instance of topology-based information retrieval and demonstrate its increased performance over the standard bottleneck distance for persistent homology. We showcase our approach on databases of different data varieties including text, videos and medical images.

Journal ArticleDOI
TL;DR: In this paper, it was shown that the $k$-NN classifier of probability measures under the Wasserstein distance is not universally consistent on the space of measures supported in $(0,1)$, and hence one should not expect to obtain universal consistency without some restriction on the base metric space.
Abstract: We study the $k$-nearest neighbour classifier ($k$-NN) of probability measures under the Wasserstein distance. We show that the $k$-NN classifier is not universally consistent on the space of measures supported in $(0,1)$. As any Euclidean ball contains a copy of $(0,1)$, one should not expect to obtain universal consistency without some restriction on the base metric space, or the Wasserstein space itself. To this end, via the notion of $\sigma $-finite metric dimension, we show that the $k$-NN classifier is universally consistent on spaces of discrete measures (and more generally, $\sigma $-finite uniformly discrete measures) with rational mass. In addition, by studying the geodesic structures of the Wasserstein spaces for $p=1$ and $p=2$, we show that the $k$-NN classifier is universally consistent on spaces of measures supported on a finite set, the space of Gaussian measures and spaces of measures with finite wavelet series densities.
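For measures on the real line, the Wasserstein distance between two empirical measures with equally many atoms has a closed form via sorted samples (quantile coupling), which makes the classifier studied here easy to sketch. The code below is my own illustration, with hypothetical names; each "data point" is a sample of atoms from a measure on $(0,1)$.

```python
# k-NN classification of empirical measures under the 1-D Wasserstein distance.
import numpy as np

def wasserstein_1d(a, b, p=2):
    """W_p between empirical measures with equally many atoms (quantile coupling)."""
    return np.linalg.norm(np.sort(a) - np.sort(b), ord=p) / len(a) ** (1 / p)

def knn_predict(train_measures, train_labels, test_measure, k=3):
    """Majority vote among the k training measures nearest in W_p."""
    dists = [wasserstein_1d(m, test_measure) for m in train_measures]
    nearest = np.argsort(dists)[:k]
    votes = np.asarray(train_labels)[nearest]      # labels: non-negative ints
    return int(np.bincount(votes).argmax())
```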

Journal ArticleDOI
TL;DR: HadRGD, as discussed by the authors, converts the standard simplex to the unit sphere, transforming the corresponding constrained optimization problem into an optimization problem on a simple, smooth manifold, on which several simple, efficient and projection-free algorithms are proposed.
Abstract: The standard simplex in $\mathbb{R}^{n}$, also known as the probability simplex, is the set of nonnegative vectors whose entries sum up to 1. It frequently appears as a constraint in optimization problems that arise in machine learning, statistics, data science, operations research and beyond. We convert the standard simplex to the unit sphere and thus transform the corresponding constrained optimization problem into an optimization problem on a simple, smooth manifold. We show that Karush-Kuhn-Tucker points and strict-saddle points of the minimization problem on the standard simplex all correspond to those of the transformed problem, and vice versa. So, solving one problem is equivalent to solving the other problem. Then, we propose several simple, efficient and projection-free algorithms using the manifold structure. The equivalence and the proposed algorithms can be extended to optimization problems with unit simplex, weighted probability simplex or $\ell _{1}$-norm sphere constraints. Numerical experiments between the new algorithms and existing ones show the advantages of the new approach. Open source code is available at https://github.com/DanielMckenzie/HadRGD.
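The core transformation is the Hadamard parametrization $x = w \circ w$: if $\|w\|_2 = 1$, then $x$ lies on the probability simplex, so simplex-constrained minimization becomes Riemannian gradient descent on the sphere. Below is a stripped-down sketch with a fixed step size; the released HadRGD code uses more refined updates, and all names here are mine.

```python
# Simplex-constrained minimization via the Hadamard parametrization x = w * w
# with ||w||_2 = 1, solved by projection-free Riemannian gradient descent.
import numpy as np

def hadrgd_sketch(grad_f, w, step=0.1, iters=500):
    """Minimize f(x) over the simplex, given grad_f(x) and a unit-norm start w."""
    for _ in range(iters):
        g = 2 * w * grad_f(w * w)                 # Euclidean gradient in w
        g -= (g @ w) * w                          # project onto the tangent space
        w = w - step * g
        w /= np.linalg.norm(w)                    # retract back to the sphere
    return w * w                                  # a point on the simplex

# Example: projecting a vector y onto the simplex, f(x) = 0.5 * ||x - y||^2.
rng = np.random.default_rng(1)
y = rng.normal(size=10)
x = hadrgd_sketch(lambda x: x - y, w=np.ones(10) / np.sqrt(10))
```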

Journal ArticleDOI
TL;DR: In this paper, a tensor-norm-constrained estimator for low-rank matrix models was proposed for matrix completion and decentralized sketching, based on tensor products of Banach spaces.
Abstract: Low-rank matrix models have been universally useful for numerous applications, from classical system identification to more modern matrix completion in signal processing and statistics. The nuclear norm has been employed as a convex surrogate of the low-rankness since it induces a low-rank solution to inverse problems. While the nuclear norm for low-rankness has an excellent analogy with the $\ell _1$ norm for sparsity through the singular value decomposition, other matrix norms also induce low-rankness. Particularly as one interprets a matrix as a linear operator between Banach spaces, various tensor product norms generalize the role of the nuclear norm. We provide a tensor-norm-constrained estimator for the recovery of approximately low-rank matrices from local measurements corrupted with noise. A tensor-norm regularizer is designed to adapt to the local structure. We derive statistical analysis of the estimator over matrix completion and decentralized sketching by applying Maurey’s empirical method to tensor products of Banach spaces. The estimator provides a near-optimal error bound in a minimax sense and admits a polynomial-time algorithm for these applications.

Journal ArticleDOI
TL;DR: In this article, the authors derive sharp and uniform non-asymptotic expansions for both estimators in the sparsest possible regime (up to some poly-logarithmic factors) of the underlying comparison graph.
Abstract: The Bradley–Terry–Luce (BTL) model is a benchmark model for pairwise comparisons between individuals. Despite recent progress on the first-order asymptotics of several popular procedures, the understanding of uncertainty quantification in the BTL model remains largely incomplete, especially when the underlying comparison graph is sparse. In this paper, we fill this gap by focusing on two estimators that have received much recent attention: the maximum likelihood estimator (MLE) and the spectral estimator. Using a unified proof strategy, we derive sharp and uniform non-asymptotic expansions for both estimators in the sparsest possible regime (up to some poly-logarithmic factors) of the underlying comparison graph. These expansions allow us to obtain: (i) finite-dimensional central limit theorems for both estimators; (ii) construction of confidence intervals for individual ranks; (iii) optimal constant of $\ell _2$ estimation, which is achieved by the MLE but not by the spectral estimator. Our proof is based on a self-consistent equation of the second-order remainder vector and a novel leave-two-out analysis.
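For intuition, the spectral estimator referred to here is of the rank-centrality type: pairwise win frequencies define a Markov chain whose stationary distribution estimates the BTL scores up to scaling. The sketch below is my own simplification (names hypothetical; `d` is assumed to upper-bound the comparison-graph degree).

```python
# A toy rank-centrality-type spectral estimator for the BTL model.
import numpy as np

def spectral_btl(wins, comparisons, d):
    """wins[i, j]: times i beat j; comparisons: symmetric counts; d >= max degree."""
    frac = np.where(comparisons > 0, wins / np.maximum(comparisons, 1), 0.0)
    P = frac.T / d                                 # move i -> j w.p. (rate j beats i) / d
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))       # lazy walk: remaining mass stays put
    vals, vecs = np.linalg.eig(P.T)                # stationary distribution of P
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return np.abs(pi) / np.abs(pi).sum()           # estimated scores, up to scale
```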

Journal ArticleDOI
TL;DR: In this article, the authors propose a metric-independent method to compute the Fréchet mean for sets of graphs, applicable for instance to the Hamming distance or to a distance defined by the difference between the spectra of the adjacency matrices of the graphs.
Abstract: To characterize the location (mean, median) of a set of graphs, one needs a notion of centrality that has been adapted to metric spaces. A standard approach is to consider the Fréchet mean. In practice, computing the Fréchet mean for sets of large graphs presents many computational issues. In this work, we suggest a metric-independent method that may be used to compute the Fréchet mean for sets of graphs. We show that the proposed technique can be used to determine the Fréchet mean when considering the Hamming distance or a distance defined by the difference between the spectra of the adjacency matrices of the graphs.
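The Hamming-distance case admits a particularly simple illustration: over graphs on a shared vertex set, the sum of Hamming distances decouples edge by edge, so its minimizer is an entrywise majority vote. The sketch below is mine and only covers this special case, not the paper's general metric-independent method.

```python
# Minimizer of the sum of Hamming distances over a set of graphs on a shared
# vertex set: majority vote on each edge of the binary adjacency matrices.
import numpy as np

def hamming_center(adjacency_matrices):
    """adjacency_matrices: list of n x n binary arrays; returns their majority graph."""
    stack = np.stack(adjacency_matrices)
    return (stack.mean(axis=0) > 0.5).astype(int)  # edge present iff most graphs have it
```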

Journal ArticleDOI
TL;DR: In this article, the authors studied the minimum error probability of list hypothesis testing, where an error is defined as the event that the true hypothesis is not in the list output by the test.
Abstract: We study a variation of Bayesian $M$-ary hypothesis testing in which the test outputs a list of $L$ candidates out of the $M$ possible upon processing the observation. We study the minimum error probability of list hypothesis testing, where an error is defined as the event where the true hypothesis is not in the list output by the test. We derive two exact expressions of the minimum probability of error. The first is expressed as the error probability of a certain non-Bayesian binary hypothesis test and is reminiscent of the meta-converse bound by Polyanskiy, Poor and Verdú (2010). The second is expressed as the tail probability of the likelihood ratio between the two distributions involved in the aforementioned non-Bayesian binary hypothesis test. Index terms: hypothesis testing, error probability, information theory.
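The optimal list test has a transparent form: output the $L$ hypotheses with the largest posterior probabilities, so the minimum error probability equals one minus the expected top-$L$ posterior mass. Here is a small numerical check of that identity for finite observation alphabets (my own illustration, names hypothetical).

```python
# Minimum list-hypothesis-testing error: 1 - E[ sum of the top-L posteriors ].
import numpy as np

def min_list_error(prior, likelihoods, L):
    """prior[m] = P(m); likelihoods[y, m] = P(observation y | hypothesis m)."""
    joint = likelihoods * prior                     # P(y, m) by broadcasting
    p_y = joint.sum(axis=1, keepdims=True)          # marginal P(y)
    post = joint / p_y                              # posterior P(m | y)
    top_L = np.sort(post, axis=1)[:, -L:].sum(axis=1)
    return float((p_y[:, 0] * (1 - top_L)).sum())   # average over observations
```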

Journal ArticleDOI
Li Hu
TL;DR: In this paper, the authors established an equivalence between adversarial training problems for nonparametric binary classification and a family of regularized risk minimization problems where the regularizer is a nonlocal perimeter functional.
Abstract: We establish an equivalence between a family of adversarial training problems for non-parametric binary classification and a family of regularized risk minimization problems where the regularizer is a nonlocal perimeter functional. The resulting regularized risk minimization problems admit exact convex relaxations of the type $L^1+\text{(nonlocal)}\operatorname{TV}$, a form frequently studied in image analysis and graph-based learning. A rich geometric structure is revealed by this reformulation which in turn allows us to establish a series of properties of optimal solutions of the original problem, including the existence of minimal and maximal solutions (interpreted in a suitable sense) and the existence of regular solutions (also interpreted in a suitable sense). In addition, we highlight how the connection between adversarial training and perimeter minimization problems provides a novel, directly interpretable, statistical motivation for a family of regularized risk minimization problems involving perimeter/total variation. The majority of our theoretical results are independent of the distance used to define adversarial attacks.

Journal ArticleDOI
TL;DR: In this paper, the authors study linear non-Gaussian graphical models from the perspective of algebraic statistics and show that when the graph is a polytree, these relations form a toric ideal.
Abstract: In this paper, we study linear non-Gaussian graphical models from the perspective of algebraic statistics. These are acyclic causal models in which each variable is a linear combination of its direct causes and independent noise. The underlying directed causal graph can be identified uniquely via the set of second and third-order moments of all random vectors that lie in the corresponding model. Our focus is on finding the algebraic relations among these moments for a given graph. We show that when the graph is a polytree, these relations form a toric ideal. We construct explicit trek-matrices associated with 2-treks and 3-treks in the graph. Their entries are covariances and third-order moments and their 2-minors define our model set-theoretically. Furthermore, we prove that their 2-minors also generate the vanishing ideal of the model. Finally, we describe the polytopes of third-order moments and the ideals for models with hidden variables.

Journal ArticleDOI
TL;DR: In this article, the authors introduce an overparameterized square loss functional for reconstructing a vector from underdetermined linear measurements, in which the vector to be reconstructed is deeply factorized into several vectors.
Abstract: In deep learning, it is common to overparameterize neural networks, that is, to use more parameters than training samples. Quite surprisingly, training the neural network via (stochastic) gradient descent leads to models that generalize very well, while classical statistics would suggest overfitting. In order to gain understanding of this implicit bias phenomenon, we study the special case of sparse recovery (compressed sensing) which is of interest on its own. More precisely, in order to reconstruct a vector from underdetermined linear measurements, we introduce a corresponding overparameterized square loss functional, where the vector to be reconstructed is deeply factorized into several vectors. We show that, if there exists an exact solution, vanilla gradient flow for the overparameterized loss functional converges to a good approximation of the solution of minimal $\ell _1$-norm. The latter is well-known to promote sparse solutions. As a by-product, our results significantly improve the sample complexity for compressed sensing via gradient flow/descent on overparameterized models derived in previous works. The theory accurately predicts the recovery rate in numerical experiments. Our proof relies on analyzing a certain Bregman divergence of the flow. This bypasses the obstacles caused by non-convexity and should be of independent interest.
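The phenomenon is easy to reproduce numerically. In the depth-two sketch below (my own toy version; the paper treats deeper factorizations and gradient flow rather than discrete steps), the unknown vector is factorized as $x = u \circ v$ and plain gradient descent from a small initialization converges to a nearly $\ell_1$-minimal, hence sparse, solution.

```python
# Implicit bias of gradient descent on a factorized (overparameterized) loss:
# minimize 0.5 * ||A (u * v) - b||^2 over u, v from a small initialization.
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 200, 60, 5                               # ambient dim, measurements, sparsity
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.normal(size=s)
b = A @ x_true

alpha, step = 1e-3, 0.02                           # small init drives the l1 bias
u = alpha * np.ones(n)
v = alpha * np.ones(n)
for _ in range(100_000):                           # hyperparameters are illustrative
    g = A.T @ (A @ (u * v) - b)                    # gradient of the residual term
    u, v = u - step * v * g, v - step * u * g      # plain GD in both factors
print(np.linalg.norm(u * v - x_true))              # small when recovery succeeds
```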

Journal ArticleDOI
TL;DR: In this article, the authors introduced a new model, i.e. the weighted $\ell _{1}/\ell _{2}$ minimization incorporating partial support information, to recover sparse signals from linear measurements in both noiseless and noisy cases.
Abstract: The ratio of $\ell _{1}$ and $\ell _{2}$ norms, denoted as $\ell _{1}/\ell _{2}$, has demonstrated prominent performance in promoting sparsity. By adding partial support information to the standard $\ell _{1}/\ell _{2}$ minimization, in this paper we introduce a novel model, i.e. the weighted $\ell _{1}/\ell _{2}$ minimization, to recover sparse signals from the linear measurements. The restricted isometry property based conditions for sparse signal recovery in both noiseless and noisy cases through the weighted $\ell _{1}/\ell _{2}$ minimization are established. We show that the proposed conditions are weaker than the analogous conditions for standard $\ell _{1}/\ell _{2}$ minimization when the accuracy of the partial support information is at least $50\%$. Moreover, we develop effective algorithms and illustrate our results via extensive numerical experiments on synthetic data in both noiseless and noisy cases.
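To make the objective concrete, the sketch below minimizes a smoothed, penalized surrogate of the weighted $\ell_1/\ell_2$ model with a generic quasi-Newton solver. This only illustrates the objective, not the effective algorithms developed in the paper; all names, constants and the warm start are my own choices.

```python
# A smoothed penalized surrogate of weighted l1/l2 minimization:
#   minimize  sum(w * |x|) / ||x||_2  +  lam * ||A x - b||^2,
# where w downweights entries believed to be in the support.
import numpy as np
from scipy.optimize import minimize

def weighted_l1_l2(A, b, w, lam=1e3, eps=1e-8):
    def obj(x):
        num = np.sum(w * np.sqrt(x ** 2 + eps))    # smoothed weighted l1 term
        den = np.sqrt(np.sum(x ** 2) + eps)        # smoothed l2 norm
        return num / den + lam * np.sum((A @ x - b) ** 2)
    x0 = np.linalg.lstsq(A, b, rcond=None)[0]      # least-squares warm start
    return minimize(obj, x0, method="L-BFGS-B").x  # gradient by finite differences
```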

Journal ArticleDOI
TL;DR: In this paper, the authors studied the convergence behaviour of dictionary learning via the Iterative Thresholding and K-residual Means (ITKrM) algorithm and proposed an adaptive dictionary learning algorithm based on an analysis of the residuals at bad (coherent) dictionaries.
Abstract: This paper studies the convergence behaviour of dictionary learning via the Iterative Thresholding and K-residual Means (ITKrM) algorithm. On one hand, it is proved that ITKrM is a contraction under much more relaxed conditions than previously necessary. On the other hand, it is shown that there seem to exist stable fixed points that do not correspond to the generating dictionary, which can be characterised as very coherent. Based on an analysis of the residuals using these bad dictionaries, replacing coherent atoms with carefully designed replacement candidates is proposed. In experiments on synthetic data, this outperforms random or no replacement and always leads to full dictionary recovery. Finally, the question of how to learn dictionaries without knowledge of the correct dictionary size and sparsity level is addressed. Decoupling the replacement strategy of coherent or unused atoms into pruning and adding, and slowly and carefully increasing the sparsity level, leads to an adaptive version of ITKrM. In several experiments, this adaptive dictionary learning algorithm is shown to recover a generating dictionary from randomly initialized dictionaries of various sizes on synthetic data and to learn meaningful dictionaries on image data.
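One ITKrM iteration is compact enough to sketch: threshold each signal against the current dictionary to find its support, then average signed residuals per atom and renormalize. The code below is my own condensed rendering of that iteration (the paper's adaptive variant additionally replaces, prunes and adds atoms).

```python
# One iteration of ITKrM (Iterative Thresholding and K-residual Means).
import numpy as np

def itkrm_step(D, Y, S):
    """D: d x K dictionary with unit-norm atoms, Y: d x N signals, S: sparsity."""
    corr = D.T @ Y                                 # K x N inner products
    D_new = np.zeros_like(D)
    for n in range(Y.shape[1]):
        y = Y[:, n]
        I = np.argsort(np.abs(corr[:, n]))[-S:]    # thresholding: top-S atoms
        DI = D[:, I]
        proj = DI @ (np.linalg.pinv(DI) @ y)       # projection onto span(D_I)
        for k in I:
            res = y - proj + D[:, k] * (D[:, k] @ y)   # residual plus own-atom part
            D_new[:, k] += np.sign(corr[k, n]) * res   # signed residual mean
    norms = np.linalg.norm(D_new, axis=0)
    return D_new / np.maximum(norms, 1e-12)        # renormalize the atoms
```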

Journal ArticleDOI
TL;DR: In this paper, the authors propose a group explainability formalism for trained machine learning decision rules, based on their response to the variability of the input variables' distribution, which quantifies the influence of all input-output observations based on entropic projections.
Abstract: In this paper, we present a new explainability formalism designed to shed light on how each input variable of a test set impacts the predictions of machine learning models. Hence, we propose a group explainability formalism for trained machine learning decision rules, based on their response to the variability of the input variables' distribution. In order to emphasize the impact of each input variable, this formalism uses an information theory framework that quantifies the influence of all input–output observations based on entropic projections. This is thus the first unified and model-agnostic formalism enabling data scientists to interpret the dependence between the input variables, their impact on the prediction errors and their influence on the output predictions. Convergence rates of the entropic projections are provided in the large sample case. Most importantly, we prove that computing an explanation in our framework has a low algorithmic complexity, making it scalable to real-life large datasets. We illustrate our strategy by explaining complex decision rules learned using XGBoost, Random Forest or Deep Neural Network classifiers on various datasets such as Adult Income, MNIST, CelebA, Boston Housing, Iris, as well as synthetic ones. We finally make clear its differences with the explainability strategies LIME and SHAP, which are based on single observations. Results can be reproduced using the freely distributed Python toolbox https://gems-ai.aniti.fr/.

Journal ArticleDOI
TL;DR: Wang et al., as mentioned in this paper, proposed a tensor robust principal component analysis (RPCA) algorithm to recover a low-rank tensor from its observations contaminated by sparse corruptions, under the Tucker decomposition.
Abstract: An increasing number of data science and machine learning problems rely on computation with tensors, which better capture the multi-way relationships and interactions of data than matrices. When tapping into this critical advantage, a key challenge is to develop computationally efficient and provably correct algorithms for extracting useful information from tensor data that are simultaneously robust to corruptions and ill-conditioning. This paper tackles tensor robust principal component analysis (RPCA), which aims to recover a low-rank tensor from its observations contaminated by sparse corruptions, under the Tucker decomposition. To minimize the computation and memory footprints, we propose to directly recover the low-dimensional tensor factors—starting from a tailored spectral initialization—via scaled gradient descent (ScaledGD), coupled with an iteration-varying thresholding operation to adaptively remove the impact of corruptions. Theoretically, we establish that the proposed algorithm converges linearly to the true low-rank tensor at a constant rate that is independent of its condition number, as long as the level of corruptions is not too large. Empirically, we demonstrate that the proposed algorithm achieves better and more scalable performance than state-of-the-art tensor RPCA algorithms through synthetic experiments and real-world applications.
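To convey the algorithmic idea without Tucker-tensor machinery, here is a matrix analogue of the approach; this is plainly a simplification of mine, not the paper's tensor algorithm. It runs ScaledGD on the low-rank factors, interleaved with hard-thresholding of the residual to estimate the sparse corruption (step size and threshold are illustrative, and the paper's threshold varies across iterations).

```python
# A matrix analogue of ScaledGD-based robust PCA: Y = low-rank + sparse.
import numpy as np

def scaled_gd_rpca(Y, r, eta=0.5, zeta=0.1, iters=100):
    """Recover a rank-r component L @ R.T and a sparse component S from Y."""
    S = np.where(np.abs(Y) > zeta, Y, 0.0)          # crude sparse initialization
    U, s, Vt = np.linalg.svd(Y - S, full_matrices=False)
    L = U[:, :r] * np.sqrt(s[:r])                   # spectral initialization
    R = Vt[:r].T * np.sqrt(s[:r])
    for _ in range(iters):
        residual = L @ R.T + S - Y
        # Preconditioning by (R^T R)^{-1}, (L^T L)^{-1} is what makes the rate
        # of ScaledGD independent of the condition number.
        L_new = L - eta * residual @ R @ np.linalg.inv(R.T @ R)
        R_new = R - eta * residual.T @ L @ np.linalg.inv(L.T @ L)
        L, R = L_new, R_new
        S = np.where(np.abs(Y - L @ R.T) > zeta, Y - L @ R.T, 0.0)
    return L @ R.T, S
```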

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a new framework to construct confidence sets for a $d$-dimensional unknown sparse parameter under the normal mean model, called the sparse confidence set.
Abstract: In this paper, we propose a new framework to construct confidence sets for a $d$-dimensional unknown sparse parameter ${\boldsymbol \theta }$ under the normal mean model ${\boldsymbol X}\sim N({\boldsymbol \theta },\sigma ^{2}\mathbf{I})$. A key feature of the proposed confidence set is its capability to account for the sparsity of ${\boldsymbol \theta }$, thus named the sparse confidence set. This is in sharp contrast with the classical methods, such as the Bonferroni confidence intervals and other resampling-based procedures, where the sparsity of ${\boldsymbol \theta }$ is often ignored. Specifically, we require the desired sparse confidence set to satisfy the following two conditions: (i) uniformly over the parameter space, the coverage probability for ${\boldsymbol \theta }$ is above a pre-specified level; (ii) there exists a random subset $S$ of $\{1,\ldots,d\}$ such that $S$ guarantees the pre-specified true negative rate for detecting non-zero $\theta _{j}$’s. To exploit the sparsity of ${\boldsymbol \theta }$, we allow the confidence interval for $\theta _{j}$ to degenerate to a single point 0 for any $j \notin S$. Under this new framework, we first consider whether there exist sparse confidence sets that satisfy the above two conditions. To address this question, we establish a non-asymptotic minimax lower bound for the non-coverage probability over a suitable class of sparse confidence sets. The lower bound deciphers the role of sparsity and minimum signal-to-noise ratio (SNR) in the construction of sparse confidence sets. Furthermore, under suitable conditions on the SNR, a two-stage procedure is proposed to construct a sparse confidence set. To evaluate the optimality, the proposed sparse confidence set is shown to attain a minimax lower bound of some properly defined risk function up to a constant factor. Finally, we develop an adaptive procedure to the unknown sparsity. Numerical studies are conducted to verify the theoretical results.
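A toy two-stage construction conveys the flavor of the framework: screen coordinates whose observations exceed a universal threshold, then place Bonferroni-style intervals on the screened set and the degenerate interval $\{0\}$ elsewhere. The thresholds below are illustrative choices of mine, not the calibrated ones analyzed in the paper.

```python
# A toy two-stage sparse confidence set under the normal mean model.
import numpy as np
from scipy.stats import norm

def sparse_confidence_set(X, sigma, alpha=0.05):
    """X: observed d-vector; returns the screened set S and interval endpoints."""
    d = len(X)
    tau = sigma * np.sqrt(2 * np.log(d))            # universal screening level
    S = np.where(np.abs(X) > tau)[0]                # stage 1: screen coordinates
    z = norm.ppf(1 - alpha / (2 * max(len(S), 1)))  # stage 2: Bonferroni over |S|
    lo, hi = np.zeros(d), np.zeros(d)               # degenerate {0} off S
    lo[S], hi[S] = X[S] - z * sigma, X[S] + z * sigma
    return S, lo, hi
```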

Journal ArticleDOI
TL;DR: In this article, the authors considered non-adaptive threshold group testing with consecutive positives, in which the items are linearly ordered and the positives are consecutive in that order, and showed that by designing deterministic and strongly explicit measurement matrices, they can significantly improve on the state-of-the-art performance.
Abstract: Given up to $d$ positive items in a large population of $n$ items ($d \ll n$), the goal of threshold group testing is to efficiently identify the positives via tests, where a test on a subset of items is positive if the subset contains at least $u$ positive items, negative if it contains up to $\ell $ positive items and arbitrary (either positive or negative) otherwise. The parameter $g = u - \ell - 1$ is called the gap. In non-adaptive strategies, all tests are fixed in advance and can be represented as a measurement matrix, in which each row and column represent a test and an item, respectively. In this paper, we consider non-adaptive threshold group testing with consecutive positives in which the items are linearly ordered and the positives are consecutive in that order. We show that by designing deterministic and strongly explicit measurement matrices, $\lceil \log _{2}{\lceil \frac {n}{d} \rceil } \rceil + 2d + 3$ (respectively, $\lceil \log _{2}{\lceil \frac {n}{d} \rceil } \rceil + 3d$) tests suffice to identify the positives in $O \left ( \log _{2}{\frac {n}{d}} + d \right )$ time when $g = 0$ (respectively, $g > 0$). The results significantly improve the state-of-the-art scheme that needs $15 \lceil \log _{2}{\lceil \frac {n}{d} \rceil } \rceil + 4d + 71$ tests to identify the positives in $O \left ( \frac {n}{d} \log _{2}{\frac {n}{d}} + ud^{2} \right )$ time, and whose associated measurement matrices are random and (non-strongly) explicit.

Journal ArticleDOI
TL;DR: In this paper, the authors develop deterministic perturbation bounds for singular values and vectors of orthogonally decomposable tensors, in a spirit similar to classical results for matrices such as those due to Weyl, Davis, Kahan and Wedin.
Abstract: We develop deterministic perturbation bounds for singular values and vectors of orthogonally decomposable tensors, in a spirit similar to classical results for matrices such as those due to Weyl, Davis, Kahan and Wedin. Our bounds demonstrate intriguing differences between matrices and higher order tensors. Most notably, they indicate that for higher order tensors perturbation affects each essential singular value/vector in isolation, and its effect on an essential singular vector does not depend on the multiplicity of its corresponding singular value or its distance from other singular values. Our results can be readily applied and provide a unified treatment to many different problems involving higher order orthogonally decomposable tensors. In particular, we illustrate the implications of our bounds through connected yet seemingly different high-dimensional data analysis tasks: the unsupervised learning scenario of tensor SVD and the supervised task of tensor regression, leading to new insights in both of these settings.

Journal ArticleDOI
TL;DR: In this article, the authors propose a convex optimization program whose optimal solution is the true measure in the noiseless case and approximates the true measure well in the noisy case with respect to the generalized Wasserstein distance.
Abstract: In this paper, we study the high-dimensional super-resolution imaging problem. Here, we are given an image of a number of point sources of light whose locations and intensities are unknown. The image is pixelized and is blurred by a known point-spread function arising from the imaging device. We encode the unknown point sources and their intensities via a non-negative measure and we propose a convex optimization program to find it. Assuming the device’s point-spread function is componentwise decomposable, we show that the optimal solution is the true measure in the noiseless case, and it approximates the true measure well in the noisy case with respect to the generalized Wasserstein distance. Our main assumption is that the components of the point-spread function form a Tchebychev system ($T$-system) in the noiseless case and a $T^{*}$-system in the noisy case, mild conditions that are satisfied by Gaussian point-spread functions. Our work is a generalization to all dimensions of the work [14], where the same analysis is carried out in two dimensions. We also extend results in [27] to the high-dimensional case when the point-spread function decomposes.
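A discretized surrogate of the convex program is easy to experiment with: place the candidate sources on a fine grid, build the blur-and-pixelize forward map from a Gaussian point-spread function (which, per the paper, satisfies the $T$-system condition), and fit the observed image by non-negative least squares. This grid version is my own illustration and only approximates the continuous program over measures analyzed in the paper.

```python
# A one-dimensional, discretized sketch of super-resolution recovery of a
# non-negative measure from a blurred, pixelized image.
import numpy as np
from scipy.optimize import nnls

def psf(t):
    """Gaussian point-spread function of the imaging device."""
    return np.exp(-t ** 2 / (2 * 0.05 ** 2))

grid = np.linspace(0, 1, 200)                      # candidate source locations
pixels = np.linspace(0, 1, 40)                     # pixel (sample) centers
Phi = psf(pixels[:, None] - grid[None, :])         # forward map: blur + pixelize

b = 1.0 * psf(pixels - 0.3) + 0.6 * psf(pixels - 0.7)   # image of two sources
weights, _ = nnls(Phi, b)                          # recovered non-negative measure
```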

Journal ArticleDOI
TL;DR: In this paper, the authors propose a fast splitting algorithm for group testing under sparsity constraints, where the testing procedure is subject to one of the following two constraints: items are finitely divisible and thus may participate in at most $\gamma $ tests; or tests are size-constrained to pool no more than $\rho $ items per test. They also study a noisy version of the problem, where each test outcome is independently flipped with some constant probability.
Abstract: In group testing, the goal is to identify a subset of defective items within a larger set of items based on tests whose outcomes indicate whether at least one defective item is present. This problem is relevant in areas such as medical testing, DNA sequencing, communication protocols and many more. In this paper, we study (i) a sparsity-constrained version of the problem, in which the testing procedure is subjected to one of the following two constraints: items are finitely divisible and thus may participate in at most $\gamma $ tests; or tests are size-constrained to pool no more than $\rho $ items per test; and (ii) a noisy version of the problem, where each test outcome is independently flipped with some constant probability. Under each of these settings, considering the for-each recovery guarantee with asymptotically vanishing error probability, we introduce a fast splitting algorithm and establish its near-optimality not only in terms of the number of tests, but also in terms of the decoding time. While the most basic formulations of our algorithms require $\varOmega (n)$ storage, we also provide low-storage variants based on hashing, with similar recovery guarantees.

Journal ArticleDOI
TL;DR: In this article, the authors obtained concentration and large deviation results for sums of independent and identically distributed random variables with heavy-tailed distributions, where the concentration results concern random variables whose distributions satisfy $P(X>t) \leq \mathrm{e}^{-I(t)}$ for an increasing function $I$.
Abstract: We obtain concentration and large deviation results for sums of independent and identically distributed random variables with heavy-tailed distributions. Our concentration results are concerned with random variables whose distributions satisfy $P(X>t) \leq \mathrm{e}^{-I(t)}$, where $I: \mathbb{R} \rightarrow \mathbb{R}$ is an increasing function and $I(t)/t \rightarrow \alpha \in [0, \infty )$ as $t \rightarrow \infty $. Our main theorem can not only recover some of the existing results, such as the concentration of the sum of sub-Weibull random variables, but it can also produce new results for sums of random variables with heavier tails. We show that the concentration inequalities we obtain are sharp enough to offer large deviation results for sums of independent random variables as well. Our analyses, which are based on standard truncation arguments, simplify, unify and generalize the existing results on the concentration and large deviation of heavy-tailed random variables.
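The regime studied here can be probed by simulation. Weibull-type variables with $P(X>t)=\mathrm{e}^{-t^{\beta}}$ fall under the stated tail condition with $I(t)=t^{\beta}$, and for $\beta<1$ the sample mean exhibits the heavier-than-sub-exponential deviations the paper quantifies. A toy Monte Carlo of my own:

```python
# Empirical deviation probability of the sample mean of heavy-tailed
# (Weibull, beta < 1) random variables.
import numpy as np

rng = np.random.default_rng(0)
beta, n, reps, t = 0.5, 100, 50_000, 1.0
X = rng.weibull(beta, size=(reps, n))              # P(X > x) = exp(-x**beta)
mu = X.mean()                                      # proxy for E[X] = Gamma(1 + 1/beta)
deviation_prob = np.mean(X.mean(axis=1) - mu > t)  # P(S_n / n - E[X] > t)
print(deviation_prob)                              # decay in t is tail-driven
```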