Posted Content

Phase retrieval with random Gaussian sensing vectors by alternating projections

TL;DR: It is shown that the classical algorithm of alternating projections (Gerchberg–Saxton) succeeds with high probability when carefully initialized, and it is conjectured that this result remains true when no special initialization procedure is used.
Abstract: We consider a phase retrieval problem, where we want to reconstruct an $n$-dimensional vector from its phaseless scalar products with $m$ sensing vectors. We assume the sensing vectors to be independently sampled from complex normal distributions. We propose to solve this problem with the classical non-convex method of alternating projections. We show that, when $m\geq Cn$ for $C$ large enough, alternating projections succeed with high probability, provided that they are carefully initialized. We also show that there is a regime in which the stagnation points of the alternating projections method disappear, and the initialization procedure becomes useless. However, in this regime, $m$ has to be of the order of $n^2$. Finally, we conjecture from our numerical experiments that, in the regime $m=O(n)$, there are stagnation points, but the size of their attraction basin is small if $m/n$ is large enough, so alternating projections can succeed with probability close to $1$ even with no special initialization.
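The alternating-projections method described above can be sketched in a few lines of NumPy. This is a hypothetical minimal implementation, not the authors' code: the spectral initialization, the oversampling ratio $m/n = 12$, and the iteration count are illustrative choices.

```python
import numpy as np

def alternating_projections(A, b, x0, n_iter=1000):
    """Alternate between imposing the measured magnitudes b = |Ax| and
    projecting back onto the range of A (a least-squares step)."""
    A_pinv = np.linalg.pinv(A)             # projection onto range(A)
    x = x0
    for _ in range(n_iter):
        z = A @ x
        z = b * np.exp(1j * np.angle(z))   # keep phases, impose magnitudes
        x = A_pinv @ z                     # back-project onto range(A)
    return x

def spectral_init(A, b):
    """Careful initialization: principal eigenvector of a weighted
    covariance matrix, scaled by an estimate of the signal norm."""
    m, n = A.shape
    Y = (A.conj().T * (b ** 2)) @ A / m
    _, V = np.linalg.eigh(Y)               # eigenvalues in ascending order
    return np.sqrt(np.mean(b ** 2)) * V[:, -1]

# Synthetic experiment with complex Gaussian sensing vectors, m = 12n.
rng = np.random.default_rng(0)
n, m = 20, 240
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = np.abs(A @ x_true)
x_hat = alternating_projections(A, b, spectral_init(A, b))
# Error up to the inherent global-phase ambiguity.
phase = np.exp(1j * np.angle(np.vdot(x_hat, x_true)))
rel_err = np.linalg.norm(x_hat * phase - x_true) / np.linalg.norm(x_true)
print(rel_err)
```

Success is always measured up to a global phase, since $x$ and $e^{i\theta}x$ produce identical measurements.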
Citations
Journal ArticleDOI
TL;DR: This tutorial-style overview highlights the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees and reviews two contrasting approaches: two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and global landscape analysis and initialization-free algorithms.
Abstract: Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, and robust principal component analysis. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated consideration of optimization and statistics leads to fruitful research findings.

369 citations


Cites methods from "Phase retrieval with random Gaussia..."

  • ...The theoretical guarantee for the original sample-reuse version was derived by [103]....


  • ...Therefore, by applying AltMin to the loss function f(b,x), we obtain the following update rule [102], [103]: for each...


  • ...Theorem 17 (AltMin (ER) for phase retrieval [103]): Consider the problem (25)....


Journal ArticleDOI
TL;DR: It is proved that when the measurement vectors are generic, with high probability, a natural least-squares formulation for GPR has the following benign geometric structure: (1) There are no spurious local minimizers, and all global minimizers are equal to the target signal, up to a global phase, and (2) the objective function has a negative directional curvature around each saddle point.
Abstract: Can we recover a complex signal from its Fourier magnitudes? More generally, given a set of $m$ measurements, $y_k = |\mathbf a_k^* \mathbf x|$ for $k = 1, \dots, m$, is it possible to recover $\mathbf x \in \mathbb{C}^n$ (i.e., length-$n$ complex vector)? This **generalized phase retrieval** (GPR) problem is a fundamental task in various disciplines, and has been the subject of much recent investigation. Natural nonconvex heuristics often work remarkably well for GPR in practice, but lack clear theoretical explanations. In this paper, we take a step towards bridging this gap. We prove that when the measurement vectors $\mathbf a_k$'s are generic (i.i.d. complex Gaussian) and the number of measurements is large enough ($m \ge C n \log^3 n$), with high probability, a natural least-squares formulation for GPR has the following benign geometric structure: (1) there are no spurious local minimizers, and all global minimizers are equal to the target signal $\mathbf x$, up to a global phase; and (2) the objective function has a negative curvature around each saddle point. This structure allows a number of iterative optimization methods to efficiently find a global minimizer, without special initialization. To corroborate the claim, we describe and analyze a second-order trust-region algorithm.
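As a concrete illustration of the claimed benign landscape, the following NumPy sketch runs plain gradient descent from a random starting point on the least-squares objective $f(x) = \frac{1}{2m}\sum_k (y_k^2 - |a_k^* x|^2)^2$. It is not the authors' trust-region code; the step size, sample sizes, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 2000                       # heavily oversampled for a quick demo
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y2 = np.abs(A @ x_true) ** 2          # intensity measurements y_k^2

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # random init
step = 0.1 / np.mean(y2)              # heuristic step size (an assumption)
for _ in range(5000):
    z = A @ x
    # Wirtinger gradient of f: (1/m) A^H [ (|z|^2 - y^2) .* z ]
    grad = (A.conj().T @ ((np.abs(z) ** 2 - y2) * z)) / m
    x = x - step * grad

# Distance to the truth up to a global phase.
phase = np.exp(1j * np.angle(np.vdot(x, x_true)))
dist = np.linalg.norm(x * phase - x_true) / np.linalg.norm(x_true)
print(dist)
```

With no spurious local minimizers and negative curvature at saddles, the iterates reach a global minimizer, i.e. the target signal up to phase, without any special initialization.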

354 citations

Posted Content
TL;DR: This work proposes the Iterative Federated Clustering Algorithm (IFCA), which alternately estimates the cluster identities of the users and optimizes model parameters for the user clusters via gradient descent, and analyzes its convergence rate, first in a linear model with squared loss and then for generic strongly convex and smooth loss functions.
Abstract: We address the problem of federated learning (FL) where users are distributed and partitioned into clusters. This setup captures settings where different groups of users have their own objectives (learning tasks) but by aggregating their data with others in the same cluster (same learning task), they can leverage the strength in numbers in order to perform more efficient federated learning. For this new framework of clustered federated learning, we propose the Iterative Federated Clustering Algorithm (IFCA), which alternately estimates the cluster identities of the users and optimizes model parameters for the user clusters via gradient descent. We analyze the convergence rate of this algorithm first in a linear model with squared loss and then for generic strongly convex and smooth loss functions. We show that in both settings, with good initialization, IFCA is guaranteed to converge, and discuss the optimality of the statistical error rate. In particular, for the linear model with two clusters, we can guarantee that our algorithm converges as long as the initialization is slightly better than random. When the clustering structure is ambiguous, we propose to train the models by combining IFCA with the weight sharing technique in multi-task learning. In the experiments, we show that our algorithm can succeed even if we relax the requirements on initialization with random initialization and multiple restarts. We also present experimental results showing that our algorithm is efficient in non-convex problems such as neural networks. We demonstrate the benefits of IFCA over the baselines on several clustered FL benchmarks.
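The alternating structure of IFCA can be sketched on a toy linear model with squared loss and two clusters. This is an illustrative simplification, not the authors' implementation; the "slightly better than random" initialization below mirrors the paper's assumption for the two-cluster linear case, and all sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 2
theta_true = [rng.standard_normal(d), rng.standard_normal(d)]
# 20 users per cluster, 50 samples each, linear model with squared loss.
data = []
for c in range(k):
    for _ in range(20):
        X = rng.standard_normal((50, d))
        y = X @ theta_true[c] + 0.01 * rng.standard_normal(50)
        data.append((X, y))

# Initialization slightly better than random: small perturbations of truth.
thetas = [t + 0.3 * rng.standard_normal(d) for t in theta_true]
for _ in range(200):
    grads, counts = [np.zeros(d) for _ in range(k)], [0] * k
    for X, y in data:
        # Step 1: each user estimates its cluster identity (lowest loss).
        j = min(range(k), key=lambda c: np.mean((X @ thetas[c] - y) ** 2))
        # Step 2: accumulate that user's gradient for its cluster's model.
        grads[j] += X.T @ (X @ thetas[j] - y) / len(y)
        counts[j] += 1
    for c in range(k):
        if counts[c]:
            thetas[c] -= 0.1 * grads[c] / counts[c]

# Each learned model should match one of the true cluster models.
errs = [min(np.linalg.norm(th - tt) for tt in theta_true) for th in thetas]
print(errs)
```

In a real federated deployment the per-user gradient steps run locally and only model updates are aggregated; the loop above collapses that into one process for clarity.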

252 citations


Cites background from "Phase retrieval with random Gaussia..."

  • ...In recent years, some progress has been made towards understanding the convergence of EM and AM in the centralized setting [31, 5, 47, 1, 42]....


Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of recovering a one-dimensional signal from its Fourier transform magnitude, called Fourier phase retrieval, which is ill-posed in most cases.
Abstract: The problem of recovering a one-dimensional signal from its Fourier transform magnitude, called Fourier phase retrieval, is ill-posed in most cases. We consider the closely-related problem of recovering a signal from its phaseless short-time Fourier transform (STFT) measurements. This problem arises naturally in several applications, such as ultra-short laser pulse characterization and ptychography. The redundancy offered by the STFT enables unique recovery under mild conditions. We show that in some cases the unique solution can be obtained by the principal eigenvector of a matrix, constructed as the solution of a simple least-squares problem. When these conditions are not met, we suggest using the principal eigenvector of this matrix to initialize non-convex local optimization algorithms and propose two such methods. The first is based on minimizing the empirical risk loss function, while the second maximizes a quadratic function on the manifold of phases. We prove that under appropriate conditions, the proposed initialization is close to the underlying signal. We then analyze the geometry of the empirical risk loss function and show numerically that both gradient algorithms converge to the underlying signal even with small redundancy in the measurements. In addition, the algorithms are robust to noise.
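Both proposed methods share a principal-eigenvector initialization. The paper's matrix comes from an STFT-specific least-squares problem; as a generic stand-in, the sketch below extracts the principal eigenvector of a Hermitian positive semidefinite matrix by plain power iteration and checks it against a direct eigendecomposition. All names and sizes are illustrative.

```python
import numpy as np

def principal_eigvec(M, n_iter=500, seed=0):
    """Power iteration for the principal eigenvector of a Hermitian PSD
    matrix -- the workhorse behind eigenvector-based initializations."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0]) + 1j * rng.standard_normal(M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = M @ v
        v /= np.linalg.norm(v)       # renormalize each step
    return v

# Check against numpy's eigendecomposition on a random PSD matrix.
rng = np.random.default_rng(1)
B = rng.standard_normal((30, 30)) + 1j * rng.standard_normal((30, 30))
M = B @ B.conj().T                    # Hermitian positive semidefinite
v = principal_eigvec(M)
_, V = np.linalg.eigh(M)              # eigenvalues ascending; top is last
align = np.abs(np.vdot(v, V[:, -1])) # ~1 means aligned up to global phase
print(align)
```

Power iteration is attractive here because the initialization matrix never needs to be formed explicitly; only matrix-vector products are required.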

119 citations

Journal ArticleDOI
TL;DR: This paper develops SPARTA, a novel algorithm to reconstruct a sparse signal from a small number of magnitude-only measurements; it is a simple yet effective, scalable, and fast sparse PR solver that is robust against additive noise of bounded support.
Abstract: This paper develops a novel algorithm, termed SPARse Truncated Amplitude flow (SPARTA), to reconstruct a sparse signal from a small number of magnitude-only measurements. It deals with what is also known as sparse phase retrieval (PR), which is NP-hard in general and emerges in many science and engineering applications. Upon formulating sparse PR as an amplitude-based nonconvex optimization task, SPARTA works iteratively in two stages: In stage one, the support of the underlying sparse signal is recovered using an analytically well-justified rule, and subsequently a sparse orthogonality-promoting initialization is obtained via power iterations restricted on the support; and in the second stage, the initialization is successively refined by means of hard thresholding based gradient-type iterations. SPARTA is a simple yet effective, scalable, and fast sparse PR solver. On the theoretical side, for any $n$ -dimensional $k$ -sparse ( $k\ll n$ ) signal $\boldsymbol {x}$ with minimum (in modulus) nonzero entries on the order of $(1/\sqrt{k})\Vert \boldsymbol {x}\Vert _2$ , SPARTA recovers the signal exactly (up to a global unimodular constant) from about $k^2\log n$ random Gaussian measurements with high probability. Furthermore, SPARTA incurs computational complexity on the order of $k^2n\log n$ with total runtime proportional to the time required to read the data, which improves upon the state of the art by at least a factor of $k$ . Finally, SPARTA is robust against additive noise of bounded support. Extensive numerical tests corroborate markedly improved recovery performance and speedups of SPARTA relative to existing alternatives.
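Stage one of a SPARTA-style solver can be sketched as follows (real-valued for brevity). The support-recovery score and the restricted weighted-spectral initialization below are simplified stand-ins for the paper's exact rule and orthogonality-promoting power iterations, so treat this as an assumption-laden sketch; stage two would then refine `x0` with hard-thresholding-based gradient iterations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 100, 5, 5000                # k-sparse signal, m >~ k^2 log n
supp = np.sort(rng.choice(n, size=k, replace=False))
x_true = np.zeros(n)
x_true[supp] = rng.choice([-1.0, 1.0], size=k)  # min modulus = ||x||/sqrt(k)
A = rng.standard_normal((m, n))
y = np.abs(A @ x_true)                # magnitude-only measurements

# Stage one, part (a): support recovery. For Gaussian sensing,
# E[y_i^2 A_ij^2] = ||x||^2 + 2 x_j^2, so the k largest scores flag the support.
scores = (y ** 2) @ (A ** 2) / m
supp_hat = np.sort(np.argsort(scores)[-k:])

# Stage one, part (b): spectral-type initialization restricted to the
# estimated support (a simplified variant, not the paper's exact matrix).
As = A[:, supp_hat]
Y = (As.T * (y ** 2)) @ As / m
_, V = np.linalg.eigh(Y)
x0 = np.zeros(n)
x0[supp_hat] = np.sqrt(np.mean(y ** 2)) * V[:, -1]

# Correlation with the truth, up to the global sign ambiguity.
corr = abs(x0 @ x_true) / (np.linalg.norm(x0) * np.linalg.norm(x_true))
print(list(supp_hat), corr)
```

Restricting the eigenvector computation to the estimated support is what keeps the sample complexity at roughly $k^2\log n$ rather than scaling with $n$.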

116 citations


Cites methods from "Phase retrieval with random Gaussia..."

  • ...A popular class of nonconvex approaches is based on alternating projections including the seminal works by Gerchberg-Saxton [13] and Fienup [5], [14], [15], alternating minimization with re-sampling (AltMinPhase) [6], (stochastic) truncated amplitude flow (TAF) [10], [16]–[19] and the Wirtinger flow (WF) variants [8], [9], [20], [21], trust-region [22], proximal linear algorithms [23]....


References
Book ChapterDOI


01 Jan 2012

139,059 citations


"Phase retrieval with random Gaussia..." refers background or methods in this paper

  • ...The proof technique used by [21] collapses when gradient descent is replaced with alternating projections, as detailed in a companion study of this article, available on arXiv [22]....


  • ...This fact is related to the observations of [21], who proved that, at least in the regime m ≥ O(n log^3 n) and for a specific cost function, the initialization part of the two-step scheme is not necessary in order for the algorithm to converge....


  • ...The closest one to our conjecture is [21], that also considers phase retrieval with Gaussian sensing vectors: in the almost optimal regime m = O(n log^3 n), it shows that gradient descent over a (specific) non-convex function succeeds with high probability, even when randomly initialized....


Journal ArticleDOI
TL;DR: Iterative algorithms for phase retrieval from intensity data are compared to gradient search methods, and it is shown that both the error-reduction algorithm for the problem of a single intensity measurement and the Gerchberg-Saxton algorithm for the problem of two intensity measurements converge.
Abstract: Iterative algorithms for phase retrieval from intensity data are compared to gradient search methods. Both the problem of phase retrieval from two intensity measurements (in electron microscopy or wave front sensing) and the problem of phase retrieval from a single intensity measurement plus a non-negativity constraint (in astronomy) are considered, with emphasis on the latter. It is shown that both the error-reduction algorithm for the problem of a single intensity measurement and the Gerchberg-Saxton algorithm for the problem of two intensity measurements converge. The error-reduction algorithm is also shown to be closely related to the steepest-descent method. Other algorithms, including the input-output algorithm and the conjugate-gradient method, are shown to converge in practice much faster than the error-reduction algorithm. Examples are shown.
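The convergence of the error-reduction algorithm can be checked numerically. The toy sketch below (illustrative sizes and constraints, not taken from the paper) alternates between imposing the measured Fourier magnitudes and the object-domain support/non-negativity constraints, and records the Fourier-domain error, which is nonincreasing for this iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s = 64, 20
x_true = np.zeros(n)
x_true[:s] = rng.random(s)             # nonnegative, supported on [0, s)
mag = np.abs(np.fft.fft(x_true))       # single intensity measurement

x = np.zeros(n)
x[:s] = rng.random(s)                  # start inside the object constraints
errors = []
for _ in range(300):
    Z = np.fft.fft(x)
    errors.append(np.linalg.norm(np.abs(Z) - mag))  # Fourier-domain error
    Z = mag * np.exp(1j * np.angle(Z))  # impose the measured magnitudes
    x = np.fft.ifft(Z).real             # back to the object domain
    x[s:] = 0.0                         # support constraint
    x[x < 0] = 0.0                      # non-negativity constraint
print(errors[0], errors[-1])
```

Both steps are orthogonal projections onto their constraint sets, which is exactly why the error cannot increase; the input-output and conjugate-gradient variants trade this monotonicity for speed.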

5,210 citations


"Phase retrieval with random Gaussia..." refers methods in this paper

  • ...The oldest reconstruction algorithms [2], [3] were iterative: they started from a random initial guess of x0, and tried to iteratively refine it by various heuristics....


Journal Article
01 Jan 1972-Optik
TL;DR: In this article, an algorithm is presented for the rapid determination of the phase of the complete wave function whose intensity in the diffraction and imaging planes of an imaging system is known.

5,197 citations

Book ChapterDOI
01 May 2012
TL;DR: This is a tutorial on some basic non-asymptotic methods and concepts in random matrix theory, particularly for the problem of estimating covariance matrices in statistics and for validating probabilistic constructions of measurement matrices in compressed sensing.
Abstract: This is a tutorial on some basic non-asymptotic methods and concepts in random matrix theory. The reader will learn several tools for the analysis of the extreme singular values of random matrices with independent rows or columns. Many of these methods sprung off from the development of geometric functional analysis since the 1970's. They have applications in several fields, most notably in theoretical computer science, statistics and signal processing. A few basic applications are covered in this text, particularly for the problem of estimating covariance matrices in statistics and for validating probabilistic constructions of measurement matrices in compressed sensing. These notes are written particularly for graduate students and beginning researchers in different areas, including functional analysts, probabilists, theoretical statisticians, electrical engineers, and theoretical computer scientists.
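The tutorial's basic bounds are easy to observe numerically: for an m × n matrix with independent standard normal entries, all singular values concentrate near the interval [√m − √n, √m + √n], which in turn controls how fast a sample covariance matrix approaches the identity. A small sketch (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4000, 100
G = rng.standard_normal((m, n))        # tall matrix with independent rows
s = np.linalg.svd(G, compute_uv=False)
lo, hi = np.sqrt(m) - np.sqrt(n), np.sqrt(m) + np.sqrt(n)
print(s.min(), s.max(), (lo, hi))      # extremes near the predicted interval

# Consequence for covariance estimation: the sample covariance of m
# isotropic n-dimensional samples is close to identity in operator norm,
# with deviation on the order of sqrt(n/m).
dev = np.linalg.norm(G.T @ G / m - np.eye(n), 2)
print(dev)
```

The same two-sided singular-value bound is what validates Gaussian measurement matrices for compressed sensing: it gives a restricted-isometry-type control on how the matrix distorts norms.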

2,780 citations


"Phase retrieval with random Gaussia..." refers background in this paper

  • ...the event $\{\forall x,\ \operatorname{Card} I_x < nm^{1/8}\}$ has probability at least $1 - C_1\exp(-C_2 m/8)$. Proof of Lemma C.11. Let $M \geq 1$ be temporarily fixed. For any $n, m$, let $\mathcal{N}_{n,m}$ be a $\frac{1}{Mm^2}$-net of the unit sphere of $\mathbb{C}^n$. From [Vershynin, 2012, Lemma 5.2], there is one of cardinality at most $\left(1 + 4Mm^2\right)^{2n} \leq \left(5Mm^2\right)^{2n}$. We define two events: $E_1 = \left\{\forall x \in \mathcal{N}_{n,m},\ \operatorname{Card}\left\{i,\ |(Ax)_i|^2 \leq \epsilon/m^2\right\} < nm^{1/8}\right\}$; $E_2 = \left\{\forall i \in \{1,\dots,m\},\ \|a_i\| \leq M\right\}$. (We recall that ...


  • ...ere of dimension $n$, we can construct $\mathcal{M}^k_n$ as $\mathcal{M}^k_n = \{P_{E_n}(y),\ y \in \mathcal{V}^k_n\}$, where $\mathcal{V}^k_n$ is a $2^{-(k+1)}$-net of the unit sphere, and, for any $y$, $P_{E_n}(y)$ is a point in $E_n$ whose distance to $y$ is minimal. From [Vershynin, 2012, Lemma 5.2], this implies that we can choose $\mathcal{M}^k_n$ such that $\operatorname{Card}\mathcal{M}^k_n \leq \left(1 + 2 \cdot 2^{k+1}\right)^{2n} \leq 2^{2n(k+3)}$ (35). For any $x \in \mathbb{C}^n$, we set $F(x) = \mathbb{E}\left(\langle Ax_0,\ b\,\operatorname{phase}(Ax)\rangle\right)$ (where the expectation denotes the expect...


Journal ArticleDOI
TL;DR: It is shown that in some instances, the combinatorial phase retrieval problem can be solved by convex programming techniques, and it is proved that the methodology is robust vis‐à‐vis additive noise.
Abstract: Suppose we wish to recover a signal $x \in \mathbb{C}^n$ from $m$ intensity measurements of the form $|\langle x, z_i\rangle|^2$, $i = 1, \dots, m$; that is, from data in which phase information is missing. We prove that if the vectors $z_i$ are sampled independently and uniformly at random on the unit sphere, then the signal $x$ can be recovered exactly (up to a global phase factor) by solving a convenient semidefinite program, a trace-norm minimization problem; this holds with large probability provided that $m$ is on the order of $n \log n$, and without any assumption about the signal whatsoever. This novel result demonstrates that in some instances, the combinatorial phase retrieval problem can be solved by convex programming techniques. Finally, we also prove that our methodology is robust vis-à-vis additive noise. © 2012 Wiley Periodicals, Inc.

1,190 citations


"Phase retrieval with random Gaussia..." refers methods in this paper

  • ...To overcome convergence problems, convexification methods have been introduced [4], [5]....
