Showing papers by "Richard Zhang published in 2018"


Posted Content
TL;DR: A new dataset of human perceptual similarity judgments is introduced, and it is found that deep features outperform all previous metrics by large margins on this dataset, suggesting that perceptual similarity is an emergent property shared across deep visual representations.
Abstract: While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.

3,838 citations
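
A minimal sketch of the comparison the abstract describes, contrasting a deep-feature distance with PSNR and SSIM. It assumes the `lpips` PyPI package (the reference implementation released alongside this paper) and scikit-image; the stand-in images and exact call details are illustrative rather than taken from the paper.

```python
# Compare shallow metrics (PSNR, SSIM) with a deep-feature perceptual distance.
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def to_tensor(img_uint8):
    """HWC uint8 image -> 1x3xHxW float tensor scaled to [-1, 1], as LPIPS expects."""
    x = torch.from_numpy(img_uint8).float().permute(2, 0, 1).unsqueeze(0)
    return x / 127.5 - 1.0

# Two images to compare (stand-ins here: a random image and a noisy copy of it).
img0 = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
img1 = np.clip(img0 + np.random.randint(-20, 20, img0.shape), 0, 255).astype(np.uint8)

# Shallow metrics (channel_axis may be `multichannel` in older scikit-image versions).
psnr = peak_signal_noise_ratio(img0, img1)
ssim = structural_similarity(img0, img1, channel_axis=2)

# Deep-feature distance: distances between network activations averaged over layers;
# lower means "more perceptually similar".
metric = lpips.LPIPS(net='alex')
with torch.no_grad():
    d = metric(to_tensor(img0), to_tensor(img1)).item()

print(f"PSNR={psnr:.2f} dB  SSIM={ssim:.3f}  LPIPS={d:.3f}")
```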


Proceedings ArticleDOI
11 Jan 2018
TL;DR: In this paper, the authors introduce a new dataset of human perceptual similarity judgments, and systematically evaluate deep features across different architectures and tasks and compare them with classic metrics, finding that deep features outperform all previous metrics by large margins on their dataset.
Abstract: While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.

3,322 citations


Posted Content
TL;DR: This work shows that latent variational variable models that explicitly model underlying stochasticity and adversarially-trained models that aim to produce naturalistic images are in fact complementary and combines the two to produce predictions that look more realistic to human raters and better cover the range of possible futures.
Abstract: Being able to predict what may happen in the future requires an in-depth understanding of the physical and causal rules that govern the world. A model that is able to do so has a number of appealing applications, from robotic planning to representation learning. However, learning to predict raw future observations, such as frames in a video, is exceedingly challenging -- the ambiguous nature of the problem can cause a naively designed model to average together possible futures into a single, blurry prediction. Recently, this has been addressed by two distinct approaches: (a) latent variational variable models that explicitly model underlying stochasticity and (b) adversarially-trained models that aim to produce naturalistic images. However, a standard latent variable model can struggle to produce realistic results, and a standard adversarially-trained model underutilizes latent variables and fails to produce diverse predictions. We show that these distinct methods are in fact complementary. Combining the two produces predictions that look more realistic to human raters and better cover the range of possible futures. Our method outperforms prior and concurrent work in these aspects.

398 citations
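
The abstract's claim is that the variational and adversarial training signals are complementary. A rough sketch of how such a combined objective might be written for a single predicted frame follows; the loss weights, discriminator logits, and encoder outputs (`mu`, `logvar`) are hypothetical placeholders, not the paper's architecture or exact formulation.

```python
# Sketch of a combined variational + adversarial training loss for one predicted frame.
import torch
import torch.nn.functional as F

def combined_loss(pred_frame, true_frame, mu, logvar, disc_logit_fake,
                  lambda_rec=100.0, lambda_kl=1.0, lambda_gan=1.0):
    # Reconstruction: keep the prediction close to the observed future frame.
    rec = F.l1_loss(pred_frame, true_frame)
    # KL term: keep the inferred latent distribution close to the prior N(0, I),
    # so the latent variable actually captures the stochasticity of the future.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Adversarial term: push predictions toward frames the discriminator rates as real.
    gan = F.binary_cross_entropy_with_logits(
        disc_logit_fake, torch.ones_like(disc_logit_fake))
    return lambda_rec * rec + lambda_kl * kl + lambda_gan * gan
```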


Posted Content
TL;DR: This work shows that a class of non-differentiable nonconvex optimization problems arising in tensor decomposition applications are global functions, the first result concerning nonconvex methods for nonsmooth objective functions, and provides a theoretical guarantee for the widely-used $\ell_1$ norm to avoid outliers in nonconvex optimization.
Abstract: We study the set of continuous functions that admit no spurious local optima (i.e. local minima that are not global minima) which we term \textit{global functions}. They satisfy various powerful properties for analyzing nonconvex and nonsmooth optimization problems. For instance, they satisfy a theorem akin to the fundamental uniform limit theorem in the analysis regarding continuous functions. Global functions are also endowed with useful properties regarding the composition of functions and change of variables. Using these new results, we show that a class of nonconvex and nonsmooth optimization problems arising in tensor decomposition applications are global functions. This is the first result concerning nonconvex methods for nonsmooth objective functions. Our result provides a theoretical guarantee for the widely-used $\ell_1$ norm to avoid outliers in nonconvex optimization.

28 citations
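
Restating the parenthetical definition from the abstract as a formula (notation ours, not the paper's): a continuous $f$ is a global function when every local minimum attains the globally minimal value.

```latex
% f : X -> R continuous is a *global function* if it admits no spurious local minima:
\Bigl(\exists\,\varepsilon>0:\ f(x^\star) \le f(x)\ \ \forall x \in X,\ \|x-x^\star\|<\varepsilon\Bigr)
\;\Longrightarrow\;
f(x^\star) = \min_{x\in X} f(x).
```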


Proceedings Article
25 May 2018
TL;DR: It is shown that moderate RIP is not enough to eliminate spurious local minima, so existing results can only hold for near-perfect RIP, and arguments based solely on norm preservation will only be applicable to a narrow set of nearly-isotropic instances.
Abstract: When the linear measurements of an instance of low-rank matrix recovery satisfy a restricted isometry property (RIP) --- i.e. they are approximately norm-preserving --- the problem is known to contain no spurious local minima, so exact recovery is guaranteed. In this paper, we show that moderate RIP is not enough to eliminate spurious local minima, so existing results can only hold for near-perfect RIP. In fact, counterexamples are ubiquitous: every $x$ is the spurious local minimum of a rank-1 instance of matrix recovery that satisfies RIP. One specific counterexample has RIP constant $\delta=1/2$, but causes randomly initialized stochastic gradient descent (SGD) to fail 12% of the time. SGD is frequently able to avoid and escape spurious local minima, but this empirical result shows that it can occasionally be defeated by their existence. Hence, while exact recovery guarantees will likely require a proof of no spurious local minima, arguments based solely on norm preservation will only be applicable to a narrow set of nearly-isotropic instances.

24 citations
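
For reference, the restricted isometry property the abstract invokes is the standard requirement that the measurement operator $\mathcal{A}$ be approximately norm-preserving on low-rank matrices (stated here in generic notation, not copied from the paper):

```latex
% Rank-r RIP with constant \delta \in [0, 1):
(1-\delta)\,\|X\|_F^2 \;\le\; \|\mathcal{A}(X)\|^2 \;\le\; (1+\delta)\,\|X\|_F^2
\qquad \text{for all } X \text{ with } \operatorname{rank}(X) \le r.
```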


Posted Content
TL;DR: In this article, the authors show that moderate RIP is not enough to eliminate spurious local minima, so existing results can only hold for near-perfect RIP, and they prove that every x is the spurious local minimum of a rank-1 instance of matrix recovery that satisfies RIP.
Abstract: When the linear measurements of an instance of low-rank matrix recovery satisfy a restricted isometry property (RIP)---i.e. they are approximately norm-preserving---the problem is known to contain no spurious local minima, so exact recovery is guaranteed. In this paper, we show that moderate RIP is not enough to eliminate spurious local minima, so existing results can only hold for near-perfect RIP. In fact, counterexamples are ubiquitous: we prove that every x is the spurious local minimum of a rank-1 instance of matrix recovery that satisfies RIP. One specific counterexample has RIP constant $\delta=1/2$, but causes randomly initialized stochastic gradient descent (SGD) to fail 12% of the time. SGD is frequently able to avoid and escape spurious local minima, but this empirical result shows that it can occasionally be defeated by their existence. Hence, while exact recovery guarantees will likely require a proof of no spurious local minima, arguments based solely on norm preservation will only be applicable to a narrow set of nearly-isotropic instances.

24 citations


Proceedings Article
01 May 2018
TL;DR: The set of continuous functions that admit no spurious local optima (i.e., local minima that are not global minima), termed global functions, is studied in this article, where it is shown that a class of non-differentiable nonconvex optimization problems arising in tensor decomposition applications are global functions.
Abstract: We study the set of continuous functions that admit no spurious local optima (i.e. local minima that are not global minima) which we term global functions. They satisfy various powerful properties for analyzing nonconvex and nonsmooth optimization problems. For instance, they satisfy a theorem akin to the fundamental uniform limit theorem in the analysis regarding continuous functions. Global functions are also endowed with useful properties regarding the composition of functions and change of variables. Using these new results, we show that a class of non-differentiable nonconvex optimization problems arising in tensor decomposition applications are global functions. This is the first result concerning nonconvex methods for nonsmooth objective functions. Our result provides a theoretical guarantee for the widely-used $\ell_1$ norm to avoid outliers in nonconvex optimization.

22 citations


Proceedings Article
03 Jul 2018
TL;DR: In this paper, a Newton-CG algorithm is proposed to solve the maximum determinant matrix completion (MDMC) problem arising in sparse inverse covariance estimation; assuming the soft-thresholded sample covariance matrix is sparse with a sparse Cholesky factorization, the algorithm converges to an $\epsilon$-accurate solution in $O(n\log(1/\epsilon))$ time and $O(n)$ memory.
Abstract: The sparse inverse covariance estimation problem is commonly solved using an $\ell_{1}$-regularized Gaussian maximum likelihood estimator known as "graphical lasso", but its computational cost becomes prohibitive for large data sets. A recent line of results showed--under mild assumptions--that the graphical lasso estimator can be retrieved by soft-thresholding the sample covariance matrix and solving a maximum determinant matrix completion (MDMC) problem. This paper proves an extension of this result, and describes a Newton-CG algorithm to efficiently solve the MDMC problem. Assuming that the thresholded sample covariance matrix is sparse with a sparse Cholesky factorization, we prove that the algorithm converges to an $\epsilon$-accurate solution in $O(n\log(1/\epsilon))$ time and $O(n)$ memory. The algorithm is highly efficient in practice: we solve the associated MDMC problems with as many as 200,000 variables to 7-9 digits of accuracy in less than an hour on a standard laptop computer running MATLAB.

21 citations
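
A minimal sketch of the preprocessing step the abstract builds on: soft-thresholding the sample covariance matrix to obtain the sparsity pattern handed to the MDMC solver. Only the thresholding is shown; the Newton-CG MDMC solver itself is not reproduced, and the exact thresholding rule (e.g., treatment of the diagonal) should be taken from the paper rather than from this illustration.

```python
# Soft-threshold a sample covariance matrix and report the resulting sparsity pattern.
import numpy as np

def soft_threshold_covariance(S, lam):
    """Soft-threshold the off-diagonal entries of a sample covariance matrix S."""
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    np.fill_diagonal(T, np.diag(S))          # keep the diagonal untouched
    return T

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))           # 200 samples, 50 variables (toy data)
S = np.cov(X, rowvar=False)

T = soft_threshold_covariance(S, lam=0.2)
pattern = T != 0                             # sparsity pattern passed to the MDMC step
print("nonzero fraction:", pattern.mean())
```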


Journal ArticleDOI
TL;DR: Theoretical justification and numerical evidence are given that the GMRES-accelerated variant consistently solves the same problem in $O(\kappa^{1/4})$ iterations, an order-of-magnitude reduction, despite a worst-case bound of $O(\sqrt{\kappa})$ iterations.
Abstract: We consider the sequence acceleration problem for the alternating direction method of multipliers (ADMM) applied to a class of equality-constrained problems with strongly convex quadratic objectives, which frequently arise as the Newton subproblem of interior-point methods. Within this context, the ADMM update equations are linear, the iterates are confined within a Krylov subspace, and the general minimum residual (GMRES) algorithm is optimal in its ability to accelerate convergence. The basic ADMM method solves a $\kappa$-conditioned problem in $O(\sqrt{\kappa})$ iterations. We give theoretical justification and numerical evidence that the GMRES-accelerated variant consistently solves the same problem in $O(\kappa^{1/4})$ iterations for an order-of-magnitude reduction in iterations, despite a worst-case bound of $O(\sqrt{\kappa})$ iterations. The method is shown to be competitive against standard preconditioned Krylov subspace methods for saddle-point problems. The method is embedded within SeDuMi, a po...

11 citations
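
A small sketch of the underlying idea, not of the paper's ADMM operator: when an update is linear, x_{k+1} = M x_k + b, its fixed point solves (I - M) x = b, and GMRES can attack that system directly in the same Krylov subspace the plain iteration explores. The matrix M below is a generic stand-in, and tolerances are left at SciPy defaults.

```python
# Replace a linear fixed-point iteration by GMRES on its fixed-point equation.
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(1)
n = 500
M = 0.9 * rng.standard_normal((n, n)) / np.sqrt(n)   # contraction-like stand-in
b = rng.standard_normal(n)

# Plain fixed-point iteration x_{k+1} = M x_k + b.
x = np.zeros(n)
for _ in range(200):
    x = M @ x + b

# GMRES on (I - M) x = b, using only matrix-vector products with M.
A = LinearOperator((n, n), matvec=lambda v: v - M @ v)
x_gmres, info = gmres(A, b)

print("difference between the two solutions:", np.linalg.norm(x - x_gmres))
```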


Proceedings ArticleDOI
03 Jan 2018
TL;DR: This paper derives a bound for the distance between the true solution and the nearest spurious local minimum and uses the bound to show that critical points of the nonconvex least squares objective become increasingly rare and far away from the true solution with the addition of redundant information.
Abstract: The power systems state estimation problem computes the set of complex voltage phasors given quadratic measurements using nonlinear least squares (NLS). This is a nonconvex optimization problem, so even in the absence of measurement errors, local search algorithms like Newton / Gauss–Newton can become “stuck” at local minima, which correspond to nonsensical estimations. In this paper, we observe that local minima cease to be an issue as redundant measurements are added. Posing state estimation as an instance of the quadratic recovery problem, we derive a bound for the distance between the true solution and the nearest spurious local minimum. We use the bound to show that critical points of the nonconvex least squares objective become increasingly rare and far away from the true solution with the addition of redundant information.

9 citations
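
A toy, real-valued sketch of the quadratic recovery setup the abstract refers to: fit x from measurements y_i = (a_i^T x)^2 by local nonlinear least squares, with many redundant measurements. The sizes, data, and solver choice are illustrative only and are not the paper's power-system formulation.

```python
# Quadratic recovery as nonlinear least squares, solved by local search.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(2)
n, m = 10, 80                                # state dimension, number of measurements
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
y = (A @ x_true) ** 2                        # noiseless quadratic measurements

def residuals(x):
    return (A @ x) ** 2 - y

x0 = rng.standard_normal(n)                  # random initialization
sol = least_squares(residuals, x0)           # SciPy's trust-region local NLS solver

# The global sign of x is unrecoverable from squared measurements, so compare up to sign.
err = min(np.linalg.norm(sol.x - x_true), np.linalg.norm(sol.x + x_true))
print("recovery error (up to global sign):", err)
```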


Proceedings ArticleDOI
01 Dec 2018
TL;DR: In this paper, a modification of chordal conversion is proposed to solve a certain class of sparse SDPs with a guaranteed complexity of $O(n^{1.5}L)$ time and $O(n)$ memory.
Abstract: Some of the strongest polynomial-time relaxations to NP-hard combinatorial optimization problems are semidefinite programs (SDPs), but their solution complexity of up to $O(n^{6.5}L)$ time and $O(n^{4})$ memory for $L$ accurate digits limits their use in all but the smallest problems. Given that combinatorial SDP relaxations are often sparse, a technique known as chordal conversion can sometimes reduce complexity substantially. In this paper, we describe a modification of chordal conversion that allows any general-purpose interior-point method to solve a certain class of sparse SDPs with a guaranteed complexity of $O(n^{1.5}L)$ time and $O(n)$ memory. To illustrate the use of this technique, we solve the MAX $k$-CUT relaxation and the Lovász theta problem on power system models with up to $n = 13659$ nodes in 5 minutes, using SeDuMi v1.32 on a 1.7 GHz CPU with 16 GB of RAM. The empirical time complexity for attaining $L$ decimal digits of accuracy is $\approx 0.001\,n^{1.1}L$ seconds.
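
To make the kind of combinatorial SDP relaxation concrete, here is the standard MAX-CUT (k = 2) relaxation on a toy graph, written with CVXPY purely to show the problem structure; the paper's chordal-conversion machinery and problem sizes go far beyond this sketch.

```python
# MAX-CUT SDP relaxation: maximize (1/4) * sum_ij W_ij (1 - X_ij), diag(X) = 1, X PSD.
import cvxpy as cp
import numpy as np

# Small weighted graph: symmetric weight matrix W with zero diagonal.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
n = W.shape[0]

X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0, cp.diag(X) == 1]      # PSD matrix with unit diagonal
objective = cp.Maximize(0.25 * cp.sum(cp.multiply(W, 1 - X)))
prob = cp.Problem(objective, constraints)
prob.solve()

print("SDP upper bound on the max cut:", prob.value)
```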

Proceedings ArticleDOI
12 Jun 2018
TL;DR: In this article, a recursive closed-form solution to the Graphical Lasso problem is derived for the case where the thresholded sample covariance matrix has a chordal structure; for large-scale problems with up to 450 million variables, the proposed method solves the problem in less than 2 minutes, while state-of-the-art algorithms take more than 2 hours.
Abstract: In this paper, we consider the Graphical Lasso (GL), a popular optimization problem for learning the sparse representations of high-dimensional datasets, which is well-known to be computationally expensive for large-scale problems. Recently, we have shown that the sparsity pattern of the optimal solution of GL is equivalent to the one obtained from simply thresholding the sample covariance matrix, for sparse graphs under different conditions. We have also derived a closed-form solution that is optimal when the thresholded sample covariance matrix has an acyclic structure. As a major generalization of the previous result, in this paper we derive a closed-form solution for the GL for graphs with chordal structures. We show that the GL and thresholding equivalence conditions can be significantly simplified and are expected to hold for high-dimensional problems if the thresholded sample covariance matrix has a chordal structure. We then show that the GL and thresholding equivalence is enough to reduce the GL to a maximum determinant matrix completion problem and derive a recursive closed-form solution for the GL when the thresholded sample covariance matrix has a chordal structure. For large-scale problems with up to 450 million variables, the proposed method can solve the GL problem in less than 2 minutes, while the state-of-the-art methods converge in more than 2 hours.

Proceedings ArticleDOI
01 Dec 2018
TL;DR: This paper describes a Newton-PCG algorithm to efficiently solve large-and-sparse LMI feasibility problems, based on efficient log-det barriers for sparse matrices, and proves that the algorithm converges in linear $O(n)$ time and memory.
Abstract: Linear matrix inequalities (LMIs) play a fundamental role in robust and optimal control theory. However, their practical use remains limited, in part because their solution complexities of $O(n^{6.5})$ time and $O(n^{4})$ memory limit their applicability to systems containing no more than a few hundred state variables. This paper describes a Newton-PCG algorithm to efficiently solve large-and-sparse LMI feasibility problems, based on efficient log-det barriers for sparse matrices. Assuming that the data matrices share a sparsity pattern that admits a sparse Cholesky factorization, we prove that the algorithm converges in linear $O(n)$ time and memory. The algorithm is highly efficient in practice: we solve LMI feasibility problems over power system models with as many as $n=5738$ state variables in 2 minutes on a standard workstation running MATLAB.
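
A dense toy sketch of the log-det barrier that interior-point LMI solvers (including the Newton-PCG method described here) are built around; the paper's contribution is evaluating such barriers via sparse Cholesky factorizations, which this NumPy illustration does not attempt.

```python
# Log-det barrier for the LMI F(x) = F0 + sum_i x_i * F_i > 0 (positive definite).
import numpy as np

def logdet_barrier(F0, Fs, x):
    """Return phi(x) = -log det F(x) and its gradient, or None if F(x) is infeasible."""
    F = F0 + sum(xi * Fi for xi, Fi in zip(x, Fs))
    try:
        L = np.linalg.cholesky(F)            # fails iff F(x) is not positive definite
    except np.linalg.LinAlgError:
        return None
    phi = -2.0 * np.sum(np.log(np.diag(L)))
    Finv = np.linalg.inv(F)                  # fine at toy size; avoid at scale
    grad = np.array([-np.trace(Finv @ Fi) for Fi in Fs])   # d/dx_i of -log det F(x)
    return phi, grad

# Tiny feasibility check: evaluate the barrier at one candidate x.
F0 = np.eye(3)
Fs = [np.diag([1.0, -1.0, 0.5]), 0.1 * np.ones((3, 3))]
print(logdet_barrier(F0, Fs, x=np.array([0.2, 0.1])))
```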

Posted Content
TL;DR: This paper proves an extension of a recent line of results, and describes a Newton-CG algorithm to efficiently solve the MDMC problem, and proves that the algorithm converges to an $\epsilon$-accurate solution in $O(n\log(1/\epsilon))$ time and $O(n)$ memory.
Abstract: The sparse inverse covariance estimation problem is commonly solved using an $\ell_{1}$-regularized Gaussian maximum likelihood estimator known as "graphical lasso", but its computational cost becomes prohibitive for large data sets. A recent line of results showed--under mild assumptions--that the graphical lasso estimator can be retrieved by soft-thresholding the sample covariance matrix and solving a maximum determinant matrix completion (MDMC) problem. This paper proves an extension of this result, and describes a Newton-CG algorithm to efficiently solve the MDMC problem. Assuming that the thresholded sample covariance matrix is sparse with a sparse Cholesky factorization, we prove that the algorithm converges to an $\epsilon$-accurate solution in $O(n\log(1/\epsilon))$ time and $O(n)$ memory. The algorithm is highly efficient in practice: we solve the associated MDMC problems with as many as 200,000 variables to 7-9 digits of accuracy in less than an hour on a standard laptop computer running MATLAB.

Proceedings ArticleDOI
27 Jun 2018
TL;DR: This tutorial paper offers a detailed overview of some major advances in this area, namely conic optimization and its emerging applications, and explains seminal results on the design of hierarchies of convex relaxations for a wide range of nonconvex problems.
Abstract: Optimization is at the core of control theory and appears in several areas of this field, such as optimal control, distributed control, system identification, robust control, state estimation, model predictive control and dynamic programming. The recent advances in various topics of modern optimization have also been revamping the area of machine learning. Motivated by the crucial role of optimization theory in the design, analysis, control and operation of real-world systems, this tutorial paper offers a detailed overview of some major advances in this area, namely conic optimization and its emerging applications. First, we discuss the importance of conic optimization in different areas. Then, we explain seminal results on the design of hierarchies of convex relaxations for a wide range of nonconvex problems. Finally, we study different numerical algorithms for large-scale conic optimization problems.

Proceedings ArticleDOI
01 Dec 2018
TL;DR: Under the practical assumption that a real-world traffic network has a bounded treewidth, it is shown that the complexity of the overall algorithm scales near-linearly with the number of intersections, and has a linear empirical time complexity.
Abstract: We consider the offset optimization problem that coordinates the offsets of signalized intersections to reduce vehicle queues in large-scale signalized traffic networks. We adopt a recent approach that transforms the offset optimization problem into a complex-valued quadratically-constrained quadratic program (QCQP). Using the special structure of the QCQP, we provide a π/4-approximation algorithm to find a near-global solution based on the optimal solution of a semidefinite program (SDP) relaxation. Although large-scale SDPs are generally hard to solve, we exploit sparsity structures of traffic networks to propose a numerical algorithm that is able to efficiently solve the SDP relaxation of the offset optimization problem. The developed algorithm relies on a tree decomposition to reformulate the large-scale problem into a reduced-complexity SDP. Under the practical assumption that a real-world traffic network has a bounded treewidth, we show that the complexity of the overall algorithm scales near-linearly with the number of intersections. The results of this work, including the bounded treewidth property, are demonstrated on the Berkeley, Manhattan, and Los Angeles networks. From numerical experiments it is observed that the algorithm has a linear empirical time complexity, and the solutions of all cases achieve a near-globally optimal guarantee of more than 0.99.

Posted Content
14 Feb 2018
TL;DR: In this paper, the authors prove an extension of a recent line of results showing that the graphical lasso estimator can be retrieved by soft-thresholding the sample covariance matrix and solving a maximum determinant matrix completion (MDMC) problem, and describe a Newton-CG algorithm to efficiently solve the MDMC problem.
Abstract: The sparse inverse covariance estimation problem is commonly solved using an $\ell_{1}$-regularized Gaussian maximum likelihood estimator known as "graphical lasso", but its computational cost becomes prohibitive for large data sets. A recent line of results showed--under mild assumptions--that the graphical lasso estimator can be retrieved by soft-thresholding the sample covariance matrix and solving a maximum determinant matrix completion (MDMC) problem. This paper proves an extension of this result, and describes a Newton-CG algorithm to efficiently solve the MDMC problem. Assuming that the thresholded sample covariance matrix is sparse with a sparse Cholesky factorization, we prove that the algorithm converges to an $\epsilon$-accurate solution in $O(n\log(1/\epsilon))$ time and $O(n)$ memory. The algorithm is highly efficient in practice: we solve the associated MDMC problems with as many as 200,000 variables to 7-9 digits of accuracy in less than an hour on a standard laptop computer running MATLAB.

Posted Content
21 May 2018
TL;DR: The set of continuous functions that admit no spurious local optima (i.e., local minima that are not global minima), referred to as global functions, is studied in this paper.
Abstract: We study the set of continuous functions that admit no spurious local optima (i.e. local minima that are not global minima) which we term \textit{global functions}. They satisfy various powerful properties for analyzing nonconvex and nonsmooth optimization problems. For instance, they satisfy a theorem akin to the fundamental uniform limit theorem in the analysis regarding continuous functions. Global functions are also endowed with useful properties regarding the composition of functions and change of variables. Using these new results, we show that a class of nonconvex and nonsmooth optimization problems arising in tensor decomposition applications are global functions. This is the first result concerning nonconvex methods for nonsmooth objective functions. Our result provides a theoretical guarantee for the widely-used $\ell_1$ norm to avoid outliers in nonconvex optimization.