Top 21 papers published by Richard Zhang from Adobe Systems in 2020

Proceedings Article•DOI•

CNN-Generated Images Are Surprisingly Easy to Spot… for Now

Sheng-Yu Wang¹, Oliver Wang², Richard Zhang², Andrew Owens³, Alexei A. Efros¹

University of California, Berkeley¹, Adobe Systems², University of Michigan³

14 Jun 2020

TL;DR: In this article, the authors show that a standard image classifier trained on only one specific CNN generator is able to generalize surprisingly well to unseen architectures, datasets, and training methods.

Abstract: In this work we ask whether it is possible to create a "universal" detector for telling apart real images from those generated by a CNN, regardless of architecture or dataset used. To test this, we collect a dataset consisting of fake images generated by 11 different CNN-based image generator models, chosen to span the space of commonly used architectures today (ProGAN, StyleGAN, BigGAN, CycleGAN, StarGAN, GauGAN, DeepFakes, cascaded refinement networks, implicit maximum likelihood estimation, second-order attention super-resolution, seeing-in-the-dark). We demonstrate that, with careful pre- and post-processing and data augmentation, a standard image classifier trained on only one specific CNN generator (ProGAN) is able to generalize surprisingly well to unseen architectures, datasets, and training methods (including the just-released StyleGAN2). Our findings suggest the intriguing possibility that today's CNN-generated images share some common systematic flaws, preventing them from achieving realistic image synthesis.
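
One ingredient credited above is data augmentation during training. A minimal NumPy sketch of a random-blur augmentation follows; the probability p = 0.5, the range σ ∈ [0, 3], and the function names are illustrative assumptions rather than details stated in this abstract, and JPEG re-compression (another common augmentation) is left out because it needs an image codec.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel with radius ceil(3 * sigma)."""
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur of an (H, W, C) image with edge padding."""
    if sigma <= 0:
        return image.astype(np.float64)
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    out = image.astype(np.float64)
    for axis in (0, 1):                       # blur rows, then columns
        n = out.shape[axis]
        pad = [(0, 0)] * out.ndim
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="edge")
        out = sum(k[i] * np.take(padded, np.arange(i, i + n), axis=axis)
                  for i in range(len(k)))
    return out

def random_blur_augment(image, rng, p=0.5, sigma_max=3.0):
    """Assumed augmentation: with probability p, blur with sigma ~ U[0, sigma_max]."""
    if rng.random() < p:
        return gaussian_blur(image, rng.uniform(0.0, sigma_max))
    return image.astype(np.float64)
```

Under this sketch's assumptions, each training image would pass through `random_blur_augment(img, np.random.default_rng(0))` before reaching the real-vs-fake classifier, so the classifier sees both sharp and blurred versions of real and generated images.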

497 citations

Posted Content•

Contrastive Learning for Unpaired Image-to-Image Translation

Taesung Park¹, Alexei A. Efros¹, Richard Zhang², Jun-Yan Zhu²

University of California, Berkeley¹, Adobe Systems²

30 Jul 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: The framework enables one-sided translation in the unpaired image-to-image translation setting, while improving quality and reducing training time, and can be extended to the training setting where each "domain" is only a single image.

Abstract: In image-to-image translation, each patch in the output should reflect the content of the corresponding patch in the input, independent of domain. We propose a straightforward method for doing so -- maximizing mutual information between the two, using a framework based on contrastive learning. The method encourages two elements (corresponding patches) to map to a similar point in a learned feature space, relative to other elements (other patches) in the dataset, referred to as negatives. We explore several critical design choices for making contrastive learning effective in the image synthesis setting. Notably, we use a multilayer, patch-based approach, rather than operate on entire images. Furthermore, we draw negatives from within the input image itself, rather than from the rest of the dataset. We demonstrate that our framework enables one-sided translation in the unpaired image-to-image translation setting, while improving quality and reducing training time. In addition, our method can even be extended to the training setting where each "domain" is only a single image.
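
The abstract describes maximizing mutual information between corresponding output and input patches with a contrastive objective, drawing negatives from the same input image. A minimal NumPy sketch of such a patch-wise InfoNCE loss for a single feature layer; the temperature 0.07, the L2 normalization, and the random stand-in features are assumptions of this sketch (in practice the features would come from encoder layers, and the method sums over several layers).

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def patch_nce_loss(feat_out, feat_in, tau=0.07):
    """InfoNCE over N patches sampled at matching locations.

    feat_out, feat_in: (N, C). Row i of feat_out is a query; row i of feat_in
    is its positive, and the other N - 1 rows of feat_in (patches from the
    same input image) act as negatives.
    """
    q = l2_normalize(feat_out)
    k = l2_normalize(feat_in)
    logits = q @ k.T / tau                           # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))        # cross-entropy; positives on the diagonal
```

Because negatives come from the same image, the loss is low only when each output patch is most similar to the input patch at its own location, rather than to other parts of that image.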

420 citations

Book Chapter•DOI•

Contrastive Learning for Unpaired Image-to-Image Translation

Taesung Park¹, Alexei A. Efros¹, Richard Zhang², Jun-Yan Zhu²

University of California, Berkeley¹, Adobe Systems²

23 Aug 2020

TL;DR: Using contrastive learning, the method maps two elements (corresponding patches) to a similar point in a learned feature space, relative to other elements in the dataset, referred to as negatives.

Abstract: In image-to-image translation, each patch in the output should reflect the content of the corresponding patch in the input, independent of domain. We propose a straightforward method for doing so – maximizing mutual information between the two, using a framework based on contrastive learning. The method encourages two elements (corresponding patches) to map to a similar point in a learned feature space, relative to other elements (other patches) in the dataset, referred to as negatives. We explore several critical design choices for making contrastive learning effective in the image synthesis setting. Notably, we use a multilayer, patch-based approach, rather than operate on entire images. Furthermore, we draw negatives from within the input image itself, rather than from the rest of the dataset. We demonstrate that our framework enables one-sided translation in the unpaired image-to-image translation setting, while improving quality and reducing training time. In addition, our method can even be extended to the training setting where each “domain” is only a single image.

316 citations

Posted Content•

Swapping Autoencoder for Deep Image Manipulation

Taesung Park¹, Jun-Yan Zhu², Oliver Wang², Jingwan Lu², Eli Shechtman², Alexei A. Efros¹, Richard Zhang²

University of California, Berkeley¹, Adobe Systems²

01 Jul 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: The Swapping Autoencoder is proposed, a deep model designed specifically for image manipulation, rather than random sampling, that can be used to manipulate real input images in various ways, including texture swapping, local and global editing, and latent code vector arithmetic.

Abstract: Deep generative models have become increasingly effective at producing realistic images from randomly sampled seeds, but using such models for controllable manipulation of existing images remains challenging. We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation, rather than random sampling. The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image. In particular, we encourage the components to represent structure and texture, by enforcing one component to encode co-occurrent patch statistics across different parts of an image. As our method is trained with an encoder, finding the latent codes for a new input image becomes trivial, rather than cumbersome. As a result, it can be used to manipulate real input images in various ways, including texture swapping, local and global editing, and latent code vector arithmetic. Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.
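
To make the swap idea concrete, here is a deliberately simplified toy that is not the paper's model: the "texture" code is taken to be the per-channel mean and standard deviation of a feature map, the "structure" code is the channel-normalized map, and decoding a swapped pair is an AdaIN-style re-normalization. The learned encoder and decoder, and the co-occurrent patch statistics described above, are not modeled; `encode` and `decode` are hypothetical names.

```python
import numpy as np

def encode(feat, eps=1e-5):
    """Toy split of an (H, W, C) feature map into (structure, texture) codes."""
    mean = feat.mean(axis=(0, 1))                # per-channel, location-free statistics
    std = feat.std(axis=(0, 1)) + eps
    structure = (feat - mean) / std              # keeps the spatial layout
    return structure, (mean, std)

def decode(structure, texture):
    """Re-apply texture statistics to a structure code (AdaIN-style toy)."""
    mean, std = texture
    return structure * std + mean

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(16, 16, 3))
b = rng.normal(5.0, 2.0, size=(16, 16, 3))
structure_a, texture_a = encode(a)
structure_b, texture_b = encode(b)
hybrid = decode(structure_a, texture_b)          # layout of a, statistics of b
```

Decoding a matching pair reconstructs the input exactly, while a swapped pair keeps the spatial arrangement of one map and the global statistics of the other, which is the property the real model enforces with learned components.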

146 citations

Proceedings Article•

Few-shot Image Generation with Elastic Weight Consolidation

Yijun Li¹, Richard Zhang², Jingwan Lu¹, Eli Shechtman¹

Adobe Systems¹, University of Illinois at Urbana–Champaign²

04 Dec 2020

TL;DR: This work adapts a pretrained model, without introducing any additional parameters, to the few examples of the target domain, in order to best preserve the information of the source dataset, while fitting the target.

Abstract: Few-shot image generation seeks to generate more data of a given domain, with only a few available training examples. As it is unreasonable to expect to fully infer the distribution from just a few observations (e.g., emojis), we seek to leverage a large, related source domain as pretraining (e.g., human faces). Thus, we wish to preserve the diversity of the source domain, while adapting to the appearance of the target. We adapt a pretrained model, without introducing any additional parameters, to the few examples of the target domain. Crucially, we regularize the changes of the weights during this adaptation, in order to best preserve the information of the source dataset, while fitting the target. We demonstrate the effectiveness of our algorithm by generating high-quality results of different target domains, including those with extremely few examples (e.g., <10). We also analyze the performance of our method with respect to some important factors, such as the number of examples and the dissimilarity between the source and target domain.
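
The weight-change regularizer described above is named in the title as elastic weight consolidation (EWC). A minimal sketch of an EWC-style penalty, (λ/2) Σᵢ Fᵢ (θᵢ − θ_S,ᵢ)², assuming a diagonal Fisher information estimate obtained by averaging squared per-sample gradients of the source model; the weight λ, flattened parameter vectors, and function names are illustrative assumptions.

```python
import numpy as np

def diagonal_fisher(per_sample_grads):
    """Diagonal Fisher estimate: mean of squared per-sample gradients, shape (P,)."""
    g = np.asarray(per_sample_grads, dtype=np.float64)   # (num_samples, P)
    return np.mean(g ** 2, axis=0)

def ewc_penalty(theta, theta_source, fisher, lam):
    """(lam / 2) * sum_i F_i * (theta_i - theta_source_i)^2"""
    diff = np.asarray(theta, dtype=np.float64) - np.asarray(theta_source, dtype=np.float64)
    return 0.5 * lam * float(np.sum(fisher * diff ** 2))

def ewc_grad(theta, theta_source, fisher, lam):
    """Gradient of the penalty w.r.t. theta; add it to the target-loss gradient."""
    diff = np.asarray(theta, dtype=np.float64) - np.asarray(theta_source, dtype=np.float64)
    return lam * fisher * diff
```

Parameters with a large Fᵢ (important to the source model) are pulled back toward their pretrained values, while parameters with a small Fᵢ are left comparatively free to fit the few target examples.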

79 citations

Book Chapter•DOI•

Transforming and Projecting Images into Class-Conditional Generative Networks

Minyoung Huh¹ ², Richard Zhang², Jun-Yan Zhu², Sylvain Paris², Aaron Hertzmann²

Massachusetts Institute of Technology¹, Adobe Systems²

23 Aug 2020

TL;DR: It is demonstrated that one can solve for image translation, scale, and global color transformation, during the projection optimization to address the object-center bias and color bias of a Generative Adversarial Network.

Abstract: We present a method for projecting an input image into the space of a class-conditional generative neural network. We propose a method that optimizes for transformation to counteract the model biases in generative neural networks. Specifically, we demonstrate that one can solve for image translation, scale, and global color transformation, during the projection optimization to address the object-center bias and color bias of a Generative Adversarial Network. This projection process poses a difficult optimization problem, and purely gradient-based optimizations fail to find good solutions. We describe a hybrid optimization strategy that finds good projections by estimating transformations and class parameters. We show the effectiveness of our method on real images and further demonstrate how the corresponding projections lead to better editability of these images. The project page and the code are available at https://minyoungg.github.io/GAN-Transform-and-Project/.
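
Why purely gradient-based optimization can fail, and why estimating transformations first helps, can be illustrated on a toy 1-D problem that is an assumption of this sketch, not the paper's setup: recovering the translation of a narrow Gaussian bump. Far from the answer the bump and the target barely overlap, so the gradient is essentially zero; a coarse search over candidate shifts followed by gradient refinement recovers it. The grid, bump width, and function names are hypothetical.

```python
import numpy as np

X = np.linspace(-10.0, 10.0, 401)     # toy 1-D "image" grid, spacing 0.05
SIGMA = 0.5                           # narrow bump: loss is flat far from the answer

def bump(shift):
    return np.exp(-0.5 * ((X - shift) / SIGMA) ** 2)

def loss_and_grad(shift, target):
    """Squared error between the shifted bump and the target, and d(loss)/d(shift)."""
    g = bump(shift)
    resid = g - target
    dg_dshift = (X - shift) / SIGMA ** 2 * g
    return float(np.sum(resid ** 2)), float(np.sum(2.0 * resid * dg_dshift))

def gradient_descent(shift, target, lr=0.01, steps=500):
    for _ in range(steps):
        _, grad = loss_and_grad(shift, target)
        shift -= lr * grad
    return shift

def hybrid_search(target, candidates, **gd_kwargs):
    # 1) coarse search: pick the candidate transformation with the lowest loss
    best = min(candidates, key=lambda s: loss_and_grad(s, target)[0])
    # 2) local gradient refinement starting from that candidate
    return gradient_descent(float(best), target, **gd_kwargs)

target = bump(3.3)                                 # ground-truth shift is 3.3
stuck = gradient_descent(-6.0, target)             # stays at about -6.0
found = hybrid_search(target, np.arange(-9, 10))   # converges to about 3.3
```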

70 citations

Proceedings Article•

Swapping Autoencoder for Deep Image Manipulation

Taesung Park¹, Jun-Yan Zhu², Oliver Wang², Jingwan Lu², Eli Shechtman², Alexei A. Efros¹, Richard Zhang²

University of California, Berkeley¹, Adobe Systems²

01 Jul 2020

TL;DR: The Swapping Autoencoder encodes an image with two independent components, one of which is encouraged to capture co-occurrent patch statistics across different parts of the image, and enforces that any swapped combination maps to a realistic image.

Abstract: Deep generative models have become increasingly effective at producing realistic images from randomly sampled seeds, but using such models for controllable manipulation of existing images remains challenging. We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation, rather than random sampling. The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image. In particular, we encourage the components to represent structure and texture, by enforcing one component to encode co-occurrent patch statistics across different parts of an image. As our method is trained with an encoder, finding the latent codes for a new input image becomes trivial, rather than cumbersome. As a result, it can be used to manipulate real input images in various ways, including texture swapping, local and global editing, and latent code vector arithmetic. Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.

63 citations

Proceedings Article•DOI•

A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences

Pranay Manocha¹, Adam Finkelstein¹, Richard Zhang², Nicholas J. Bryan², Gautham J. Mysore², Zeyu Jin²

Princeton University¹, Adobe Systems²

13 Jan 2020

TL;DR: In this paper, the authors construct a metric by fitting a deep neural network to a new large dataset of crowdsourced human judgments, where subjects are prompted to answer a straightforward, objective question: are two recordings identical or not?

Abstract: Many audio processing tasks require perceptual assessment. The “gold standard” of obtaining human judgments is time-consuming, expensive, and cannot be used as an optimization criterion. On the other hand, automated metrics are efficient to compute but often correlate poorly with human judgment, particularly for audio differences at the threshold of human detection. In this work, we construct a metric by fitting a deep neural network to a new large dataset of crowdsourced human judgments. Subjects are prompted to answer a straightforward, objective question: are two recordings identical or not? These pairs are algorithmically generated under a variety of perturbations, including noise, reverb, and compression artifacts; the perturbation space is probed with the goal of efficiently identifying the just-noticeable difference (JND) level of the subject. We show that the resulting learned metric is well-calibrated with human judgments, outperforming baseline methods. Since it is a deep network, the metric is differentiable, making it suitable as a loss function for other tasks. Thus, simply replacing an existing loss (e.g., deep feature loss) with our metric yields significant improvement in a denoising network, as measured by subjective pairwise comparison.
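
The abstract says the perturbation space is probed to efficiently find each subject's just-noticeable difference (JND). One simple way to do this, assumed here for illustration and not necessarily the paper's protocol, is bisection over perturbation strength driven by same/different answers. `simulated_listener` is a hypothetical deterministic stand-in for a crowdworker, and the threshold 0.37 is made up.

```python
def find_jnd(hears_difference, lo=0.0, hi=1.0, tol=1e-3):
    """Bisect perturbation strength in [lo, hi] for the smallest audible level.

    Assumes answers are monotone: "identical" below the JND, "different" at or above it.
    Returns (estimated JND, number of questions asked).
    """
    queries = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        queries += 1
        if hears_difference(mid):   # subject answers "the recordings differ"
            hi = mid
        else:                       # subject answers "the recordings are identical"
            lo = mid
    return hi, queries

def simulated_listener(level):
    """Hypothetical listener whose true threshold is a perturbation level of 0.37."""
    return level >= 0.37

level, queries = find_jnd(simulated_listener)   # level in [0.37, 0.371], 10 questions
```

Each answer halves the uncertain interval, so reaching a resolution of 0.001 takes about log₂(1000) ≈ 10 questions; real human responses are noisy, which this sketch ignores.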

49 citations

Proceedings Article•DOI•

Deep Parametric Shape Predictions Using Distance Fields

Dmitriy Smirnov¹, Matthew Fisher², Vladimir G. Kim², Richard Zhang², Justin Solomon¹

Massachusetts Institute of Technology¹, Adobe Systems²

14 Jun 2020

TL;DR: In this article, the authors use distance fields to transition between shape parameters like control points and input data on a pixel grid, and demonstrate efficacy on 2D and 3D tasks, including font vectorization and surface abstraction.

Abstract: Many tasks in graphics and vision demand machinery for converting shapes into consistent representations with sparse sets of parameters; these representations facilitate rendering, editing, and storage. When the source data is noisy or ambiguous, however, artists and engineers often manually construct such representations, a tedious and potentially time-consuming process. While advances in deep learning have been successfully applied to noisy geometric data, the task of generating parametric shapes has so far been difficult for these methods. Hence, we propose a new framework for predicting parametric shape primitives using deep learning. We use distance fields to transition between shape parameters like control points and input data on a pixel grid. We demonstrate efficacy on 2D and 3D tasks, including font vectorization and surface abstraction.
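
Using distance fields to connect shape parameters with a pixel grid can be sketched by rasterizing the distance from every pixel center to a parametric primitive. Here a line segment defined by two control points is assumed as the primitive (the paper also targets curves and 3D surfaces); the grid size, the unit-square domain, and the squared-error comparison against a target field are illustrative assumptions.

```python
import numpy as np

def segment_distance_field(p0, p1, size):
    """Distance from each pixel center of a size x size grid over [0, 1]^2 to segment p0-p1."""
    coords = (np.arange(size) + 0.5) / size
    xs, ys = np.meshgrid(coords, coords, indexing="xy")
    pts = np.stack([xs, ys], axis=-1)                       # (size, size, 2)
    a = np.asarray(p0, dtype=np.float64)
    b = np.asarray(p1, dtype=np.float64)
    ab = b - a                                              # assumes p0 != p1
    # Closest point on the segment: project onto the line, clamp to [0, 1].
    t = np.clip(((pts - a) @ ab) / (ab @ ab), 0.0, 1.0)
    closest = a + t[..., None] * ab
    return np.linalg.norm(pts - closest, axis=-1)

def distance_field_loss(params, target_field):
    """Mean squared difference between predicted and target distance fields."""
    p0, p1 = params
    pred = segment_distance_field(p0, p1, target_field.shape[0])
    return float(np.mean((pred - target_field) ** 2))
```

The field depends smoothly on the control points almost everywhere, which is what lets a network that predicts `p0` and `p1` be trained with a pixel-grid loss.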

43 citations

Posted Content•

Spatially-Adaptive Pixelwise Networks for Fast Image Translation

Tamar Rott Shaham¹, Michaël Gharbi², Richard Zhang², Eli Shechtman², Tomer Michaeli¹

Technion – Israel Institute of Technology¹, Adobe Systems²

05 Dec 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: A new generator architecture for fast and efficient high-resolution image-to-image translation that uses spatially varying pixel-wise networks, with a sinusoidal encoding of spatial coordinates providing an effective inductive bias for generating realistic novel high-frequency image content.

Abstract: We introduce a new generator architecture, aimed at fast and efficient high-resolution image-to-image translation. We design the generator to be an extremely lightweight function of the full-resolution image. In fact, we use pixel-wise networks; that is, each pixel is processed independently of others, through a composition of simple affine transformations and nonlinearities. We take three important steps to equip such a seemingly simple function with adequate expressivity. First, the parameters of the pixel-wise networks are spatially varying so they can represent a broader function class than simple 1x1 convolutions. Second, these parameters are predicted by a fast convolutional network that processes an aggressively low-resolution representation of the input. Third, we augment the input image with a sinusoidal encoding of spatial coordinates, which provides an effective inductive bias for generating realistic novel high-frequency image content. As a result, our model is up to 18x faster than state-of-the-art baselines. We achieve this speedup while generating comparable visual quality across different image resolutions and translation domains.
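
The three steps above can be sketched together in NumPy: a low-resolution grid of per-pixel MLP parameters (random here, standing in for the output of the fast convolutional network), nearest-neighbor upsampling so the parameters vary across space, and a sinusoidal encoding of pixel coordinates appended to the input. The two-layer MLP shape, the number of frequencies, and the upsampling scheme are assumptions of this sketch.

```python
import numpy as np

def sinusoidal_encoding(h, w, num_freqs=4):
    """sin/cos of normalized (x, y) pixel coordinates at octave frequencies."""
    ys, xs = np.meshgrid(np.linspace(0.0, 1.0, h), np.linspace(0.0, 1.0, w), indexing="ij")
    feats = []
    for k in range(num_freqs):
        for c in (xs, ys):
            feats += [np.sin(2 ** k * np.pi * c), np.cos(2 ** k * np.pi * c)]
    return np.stack(feats, axis=-1)                    # (h, w, 4 * num_freqs)

def num_params(in_ch, hidden, out_ch):
    """Parameter count of one 2-layer per-pixel MLP (weights and biases)."""
    return in_ch * hidden + hidden + hidden * out_ch + out_ch

def pixelwise_mlp(image, low_res_params, hidden, out_ch, num_freqs=4):
    """Spatially varying per-pixel MLP on an (H, W, C) image.

    low_res_params: (lh, lw, P) grid of MLP parameters (a stand-in for the
    prediction of a network run on a low-resolution copy of the input).
    """
    h, w, _ = image.shape
    x = np.concatenate([image, sinusoidal_encoding(h, w, num_freqs)], axis=-1)
    in_ch = x.shape[-1]
    lh, lw, n_par = low_res_params.shape
    if n_par != num_params(in_ch, hidden, out_ch):
        raise ValueError("parameter grid does not match the MLP shape")
    # Nearest-neighbor upsampling: each pixel uses the parameters of its low-res cell.
    rows = np.arange(h) * lh // h
    cols = np.arange(w) * lw // w
    p = low_res_params[rows][:, cols]                  # (h, w, P)
    o1 = in_ch * hidden
    o2 = o1 + hidden
    o3 = o2 + hidden * out_ch
    w1 = p[..., :o1].reshape(h, w, in_ch, hidden)
    b1 = p[..., o1:o2]
    w2 = p[..., o2:o3].reshape(h, w, hidden, out_ch)
    b2 = p[..., o3:]
    # Each pixel is processed independently: affine, ReLU, affine.
    z = np.maximum(np.einsum("hwi,hwij->hwj", x, w1) + b1, 0.0)
    return np.einsum("hwi,hwij->hwj", z, w2) + b2

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
P = num_params(3 + 4 * 4, hidden=8, out_ch=3)
params = rng.normal(0.0, 0.1, size=(4, 4, P))         # 4x4 grid of MLPs for a 64x64 image
out = pixelwise_mlp(img, params, hidden=8, out_ch=3)  # shape (64, 64, 3)
```

All heavy computation that depends on image content happens at low resolution; at full resolution each pixel only runs a tiny MLP, which is where the speed advantage described above would come from.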

38 citations

Posted Content•

A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences

Pranay Manocha¹, Adam Finkelstein¹, Richard Zhang², Nicholas J. Bryan², Gautham J. Mysore², Zeyu Jin²

Princeton University¹, Adobe Systems²

13 Jan 2020-arXiv: Audio and Speech Processing

TL;DR: This work constructs a metric by fitting a deep neural network to a new large dataset of crowdsourced human judgments and shows that the resulting learned metric is well-calibrated with human judgments, outperforming baseline methods.

Abstract: Many audio processing tasks require perceptual assessment. The “gold standard” of obtaining human judgments is time-consuming, expensive, and cannot be used as an optimization criterion. On the other hand, automated metrics are efficient to compute but often correlate poorly with human judgment, particularly for audio differences at the threshold of human detection. In this work, we construct a metric by fitting a deep neural network to a new large dataset of crowdsourced human judgments. Subjects are prompted to answer a straightforward, objective question: are two recordings identical or not? These pairs are algorithmically generated under a variety of perturbations, including noise, reverb, and compression artifacts; the perturbation space is probed with the goal of efficiently identifying the just-noticeable difference (JND) level of the subject. We show that the resulting learned metric is well-calibrated with human judgments, outperforming baseline methods. Since it is a deep network, the metric is differentiable, making it suitable as a loss function for other tasks. Thus, simply replacing an existing loss (e.g., deep feature loss) with our metric yields significant improvement in a denoising network, as measured by subjective pairwise comparison.

Proceedings Article•

How many samples is a good initial point worth in Low-rank Matrix Recovery?

Jialun Zhang, Richard Zhang¹

University of Illinois at Urbana–Champaign¹

01 Jan 2020

Journal Article•DOI•

Image Morphing with Perceptual Constraints and STN Alignment

Noa Fish¹, Richard Zhang², Lilach Perry¹, Daniel Cohen-Or¹, Eli Shechtman², Connelly Barnes²

Tel Aviv University¹, Adobe Systems²

29 Apr 2020-arXiv: Graphics

TL;DR: A conditional generative adversarial network (GAN) morphing framework operating on a pair of input images is proposed, trained to synthesize frames corresponding to temporal samples along the transformation, and learns a proper shape prior that enhances the plausibility of intermediate frames.

Abstract: In image morphing, a sequence of plausible frames is synthesized and composited together to form a smooth transformation between given instances. Intermediates must remain faithful to the input, stand on their own as members of the set, and maintain a well-paced visual transition from one to the next. In this paper, we propose a conditional GAN morphing framework operating on a pair of input images. The network is trained to synthesize frames corresponding to temporal samples along the transformation, and learns a proper shape prior that enhances the plausibility of intermediate frames. While individual frame plausibility is boosted by the adversarial setup, a special training protocol producing sequences of frames, combined with a perceptual similarity loss, promotes smooth transformation over time. Explicit stating of correspondences is replaced with a grid-based freeform deformation spatial transformer that predicts the geometric warp between the inputs, instituting the smooth geometric effect by bringing the shapes into an initial alignment. We provide comparisons to classic as well as latent space morphing techniques, and demonstrate that, given a set of images for self-supervision, our network learns to generate visually pleasing morphing effects featuring believable in-betweens, with robustness to changes in shape and texture, requiring no correspondence annotation.

Proceedings Article•

On the Tightness of Semidefinite Relaxations for Certifying Robustness to Adversarial Examples

Richard Zhang¹

University of Illinois at Urbana–Champaign¹

01 Jan 2020

TL;DR: A geometric technique is described that proves that the robustness certificate is exact over a single hidden layer under mild assumptions, and explains why it is usually conservative for several hidden layers.

Abstract: The robustness of a neural network to adversarial examples can be provably certified by solving a convex relaxation. If the relaxation is loose, however, then the resulting certificate can be too conservative to be practically useful. Recently, a less conservative robustness certificate was proposed, based on a semidefinite programming (SDP) relaxation of the ReLU activation function. In this paper, we describe a geometric technique that determines whether this SDP certificate is exact, meaning whether it provides both a lower bound on the size of the smallest adversarial perturbation, as well as a globally optimal perturbation that attains the lower bound. Concretely, we show, for a least-squares restriction of the usual adversarial attack problem, that the SDP relaxation amounts to the nonconvex projection of a point onto a hyperbola. The resulting SDP certificate is exact if and only if the projection of the point lies on the major axis of the hyperbola. Using this geometric technique, we prove that the certificate is exact over a single hidden layer under mild assumptions, and explain why it is usually conservative for several hidden layers. We experimentally confirm our theoretical insights using a general-purpose interior-point method and a custom rank-2 Burer-Monteiro algorithm.
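
The geometric picture above (a nonconvex projection onto a hyperbola, exact if and only if the projection lies on the major axis) can be explored numerically. This toy sketch only brute-forces the projection of a point onto one branch of x²/a² − y²/b² = 1 and checks whether the nearest point is the vertex (a, 0); it does not construct the SDP relaxation or the paper's reduction from the least-squares attack problem, and a, b, and the sample points are illustrative.

```python
import numpy as np

def project_onto_hyperbola(p, q, a=1.0, b=1.0, t_max=5.0, num=200001):
    """Brute-force nearest point to (p, q) on the branch x = a*cosh(t), y = b*sinh(t)."""
    t = np.linspace(-t_max, t_max, num)        # odd num puts t = 0 (the vertex) on the grid
    xs, ys = a * np.cosh(t), b * np.sinh(t)
    i = int(np.argmin((xs - p) ** 2 + (ys - q) ** 2))
    return xs[i], ys[i]

def projects_onto_major_axis(p, q, a=1.0, b=1.0, tol=1e-6):
    """True if the nearest point is the vertex (a, 0), i.e. lies on the major axis."""
    _, y = project_onto_hyperbola(p, q, a, b)
    return abs(y) < tol

# For on-axis points (p, 0) with p > 0, the vertex is nearest exactly when
# p <= (a**2 + b**2) / a, which is 2.0 for a = b = 1.
on_axis_near = projects_onto_major_axis(1.5, 0.0)   # True
on_axis_far = projects_onto_major_axis(3.0, 0.0)    # False
```

In this toy, even a point sitting on the axis can have an off-axis projection once it is far enough from the vertex, which gives some intuition for how a certificate built on such a projection can fail to be exact.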

Posted Content•

How Many Samples is a Good Initial Point Worth in Low-rank Matrix Recovery?

Gavin Zhang¹, Richard Zhang¹

University of Illinois at Urbana–Champaign¹

12 Jun 2020-arXiv: Learning

TL;DR: In this paper, the authors quantify the relationship between the quality of the initial guess and the corresponding reduction in data requirements, and compute a sharp threshold number of samples needed to prevent each specific point on the optimization landscape from becoming a spurious local minimum.

Abstract: Given a sufficiently large amount of labeled data, the non-convex low-rank matrix recovery problem contains no spurious local minima, so a local optimization algorithm is guaranteed to converge to a global minimum starting from any initial guess. However, the actual amount of data needed by this theoretical guarantee is very pessimistic, as it must prevent spurious local minima from existing anywhere, including at adversarial locations. In contrast, prior work based on good initial guesses has more realistic data requirements, because it allows spurious local minima to exist outside of a neighborhood of the solution. In this paper, we quantify the relationship between the quality of the initial guess and the corresponding reduction in data requirements. Using the restricted isometry constant as a surrogate for sample complexity, we compute a sharp threshold number of samples needed to prevent each specific point on the optimization landscape from becoming a spurious local minimum. Optimizing the threshold over regions of the landscape, we see that for initial points around the ground truth, a linear improvement in the quality of the initial guess amounts to a constant factor improvement in the sample complexity.
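
For readers unfamiliar with the setup, the non-convex problem referred to above is commonly written in factored form, f(X) = Σₖ (⟨Aₖ, XXᵀ⟩ − bₖ)², minimized over a low-rank factor X. The sketch below assumes a symmetric rank-r ground truth, Gaussian sensing matrices, plain gradient descent, and a start near the ground truth; all sizes and the step size are illustrative, and it does not compute restricted isometry constants or the paper's sample-complexity thresholds.

```python
import numpy as np

def make_problem(n=10, r=2, m=400, seed=0):
    """Rank-r ground truth M* = Z Z^T observed through m Gaussian measurements."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, r))
    A = rng.normal(size=(m, n, n)) / np.sqrt(m)
    A = 0.5 * (A + A.transpose(0, 2, 1))          # symmetric sensing matrices
    b = np.einsum("kij,ij->k", A, z @ z.T)         # b_k = <A_k, M*>
    return A, b, z

def loss_and_grad(x, A, b):
    """f(X) = sum_k (<A_k, X X^T> - b_k)^2 and its gradient 4 * sum_k r_k A_k X."""
    resid = np.einsum("kij,ij->k", A, x @ x.T) - b
    grad = 4.0 * np.einsum("k,kij->ij", resid, A) @ x
    return float(resid @ resid), grad

def gradient_descent(x0, A, b, lr=2e-3, steps=3000):
    x = x0.copy()
    for _ in range(steps):
        _, g = loss_and_grad(x, A, b)
        x -= lr * g
    return x

def relative_error(x, z):
    """||X X^T - Z Z^T||_F / ||Z Z^T||_F (invariant to rotations of X)."""
    return np.linalg.norm(x @ x.T - z @ z.T) / np.linalg.norm(z @ z.T)

A, b, z = make_problem()
rng = np.random.default_rng(1)
x0 = z + 0.1 * rng.normal(size=z.shape)            # a "good" initial point
x = gradient_descent(x0, A, b)
```

The error is measured on XXᵀ rather than X because the factorization is only identifiable up to an orthogonal rotation of the columns of X.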

Posted Content•

Transforming and Projecting Images into Class-conditional Generative Networks

Minyoung Huh¹ ², Richard Zhang², Jun-Yan Zhu², Sylvain Paris², Aaron Hertzmann²

Massachusetts Institute of Technology¹, Adobe Systems²

04 May 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes a method for projecting an input image into the space of a class-conditional generative neural network while optimizing for transformations that counteract the model biases of generative adversarial networks (GANs).

Abstract: We present a method for projecting an input image into the space of a class-conditional generative neural network. We propose a method that optimizes for transformation to counteract the model biases in generative neural networks. Specifically, we demonstrate that one can solve for image translation, scale, and global color transformation, during the projection optimization to address the object-center bias and color bias of a Generative Adversarial Network. This projection process poses a difficult optimization problem, and purely gradient-based optimizations fail to find good solutions. We describe a hybrid optimization strategy that finds good projections by estimating transformations and class parameters. We show the effectiveness of our method on real images and further demonstrate how the corresponding projections lead to better editability of these images.

Posted Content•

On the Tightness of Semidefinite Relaxations for Certifying Robustness to Adversarial Examples

Richard Zhang¹

University of Illinois at Urbana–Champaign¹

11 Jun 2020-arXiv: Optimization and Control

TL;DR: The robustness of a neural network to adversarial examples can be provably certified by solving a convex relaxation; this paper describes a geometric technique that determines whether a certificate based on a semidefinite programming (SDP) relaxation of the ReLU activation function is exact.

Abstract: The robustness of a neural network to adversarial examples can be provably certified by solving a convex relaxation. If the relaxation is loose, however, then the resulting certificate can be too conservative to be practically useful. Recently, a less conservative robustness certificate was proposed, based on a semidefinite programming (SDP) relaxation of the ReLU activation function. In this paper, we describe a geometric technique that determines whether this SDP certificate is exact, meaning whether it provides both a lower bound on the size of the smallest adversarial perturbation, as well as a globally optimal perturbation that attains the lower bound. Concretely, we show, for a least-squares restriction of the usual adversarial attack problem, that the SDP relaxation amounts to the nonconvex projection of a point onto a hyperbola. The resulting SDP certificate is exact if and only if the projection of the point lies on the major axis of the hyperbola. Using this geometric technique, we prove that the certificate is exact over a single hidden layer under mild assumptions, and explain why it is usually conservative for several hidden layers. We experimentally confirm our theoretical insights using a general-purpose interior-point method and a custom rank-2 Burer-Monteiro algorithm.

Journal Article•DOI•

Large-Scale Traffic Signal Offset Optimization

Yi Ouyang, Richard Zhang¹, Javad Lavaei, Pravin Varaiya²

University of Illinois at Urbana–Champaign¹, University of California, Berkeley²

01 Jan 2020-IEEE Transactions on Control of Network Systems

TL;DR: To coordinate and synchronize the timing of traffic signals, enhancing traffic flow and reducing stops and delays, the authors solve a convex relaxation of a sinusoidal offset optimization formulation using a tree decomposition reduction and recover a near-global solution by randomized rounding.

Abstract: The offset optimization problem seeks to coordinate and synchronize the timing of traffic signals throughout a network in order to enhance traffic flow and reduce stops and delays. Recently, offset optimization was formulated into a continuous optimization problem without integer variables by modeling traffic flow as sinusoidal. In this article, we present a novel algorithm to solve this new formulation to near-global optimality on a large scale. Specifically, we solve a convex relaxation of the nonconvex problem using a tree decomposition reduction, and use randomized rounding to recover a near-global solution. We prove that the algorithm always delivers solutions of expected value at least 0.785 times the globally optimal value. Moreover, assuming that the topology of the traffic network is “tree-like,” we prove that the algorithm has near-linear time complexity with respect to the number of intersections. These theoretical guarantees are experimentally validated on the Berkeley, Manhattan, and Los Angeles traffic networks. In our numerical results, the empirical time complexity of the algorithm is linear, and the solutions have objectives within 0.99 times the globally optimal value.
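
The randomized rounding step can be sketched under assumptions that this abstract does not state: that the convex relaxation yields one complex unit vector per intersection (the rows of `V`), that rounding keeps the phase of each vector's projection onto a random complex Gaussian direction, and that the sinusoidal objective has the form Σ w·cos(θᵢ − θⱼ − φᵢⱼ) over hypothetical edge tuples `(i, j, w, phi)`. Solving the relaxation and the tree decomposition reduction are not shown.

```python
import numpy as np

def round_offsets(V, rng):
    """Round relaxed complex vectors (rows of V, shape (n, d)) to angles in [0, 2*pi)."""
    d = V.shape[1]
    g = (rng.normal(size=d) + 1j * rng.normal(size=d)) / np.sqrt(2.0)  # random complex Gaussian
    z = V @ np.conj(g)                        # project every row onto the same direction
    return np.mod(np.angle(z), 2.0 * np.pi)   # keep only the phase of each projection

def offset_objective(theta, edges):
    """Assumed sinusoidal objective: sum of w * cos(theta_i - theta_j - phi)."""
    return sum(w * np.cos(theta[i] - theta[j] - phi) for i, j, w, phi in edges)

def best_of_rounds(V, edges, rng, trials=20):
    """Repeat the random rounding and keep the best (value, angles) pair."""
    best = None
    for _ in range(trials):
        theta = round_offsets(V, rng)
        value = offset_objective(theta, edges)
        if best is None or value > best[0]:
            best = (value, theta)
    return best
```

If the relaxation happens to return rank-one vectors, rounding recovers the relative phases exactly (up to a common rotation); otherwise each trial gives a random feasible solution, and repeating the rounding and keeping the best is a common way to use such a guarantee on expected value.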

Journal Article•DOI•

Image Morphing With Perceptual Constraints and STN Alignment

Noa Fish¹, Richard Zhang², Lilach Perry¹, Daniel Cohen-Or¹, Eli Shechtman², Connelly Barnes²

Tel Aviv University¹, Adobe Systems²

01 Sep 2020-Computer Graphics Forum

TL;DR: In this paper, a conditional generative adversarial network (GAN) is trained to synthesize frames corresponding to temporal samples along the transformation, learning a proper shape prior that enhances the plausibility of intermediate frames.

Abstract: In image morphing, a sequence of plausible frames is synthesized and composited together to form a smooth transformation between given instances. Intermediates must remain faithful to the input, stand on their own as members of the set, and maintain a well-paced visual transition from one to the next. In this paper, we propose a conditional GAN morphing framework operating on a pair of input images. The network is trained to synthesize frames corresponding to temporal samples along the transformation, and learns a proper shape prior that enhances the plausibility of intermediate frames. While individual frame plausibility is boosted by the adversarial setup, a special training protocol producing sequences of frames, combined with a perceptual similarity loss, promotes smooth transformation over time. Explicit stating of correspondences is replaced with a grid-based freeform deformation spatial transformer that predicts the geometric warp between the inputs, instituting the smooth geometric effect by bringing the shapes into an initial alignment. We provide comparisons to classic as well as latent space morphing techniques, and demonstrate that, given a set of images for self-supervision, our network learns to generate visually pleasing morphing effects featuring believable in-betweens, with robustness to changes in shape and texture, requiring no correspondence annotation.

Posted Content•

Few-shot Image Generation with Elastic Weight Consolidation

Yijun Li¹, Richard Zhang², Jingwan Lu¹, Eli Shechtman¹

Adobe Systems¹, University of Illinois at Urbana–Champaign²

04 Dec 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, a pretrained model is adapted to the target domain by regularizing the changes of the weights in order to best preserve the information of the source dataset, while fitting the target.

Abstract: Few-shot image generation seeks to generate more data of a given domain, with only a few available training examples. As it is unreasonable to expect to fully infer the distribution from just a few observations (e.g., emojis), we seek to leverage a large, related source domain as pretraining (e.g., human faces). Thus, we wish to preserve the diversity of the source domain, while adapting to the appearance of the target. We adapt a pretrained model, without introducing any additional parameters, to the few examples of the target domain. Crucially, we regularize the changes of the weights during this adaptation, in order to best preserve the information of the source dataset, while fitting the target. We demonstrate the effectiveness of our algorithm by generating high-quality results of different target domains, including those with extremely few examples (e.g., <10). We also analyze the performance of our method with respect to some important factors, such as the number of examples and the dissimilarity between the source and target domain.

Posted Content•

How Many Samples is a Good Initial Point Worth

Gavin Zhang, Richard Zhang

12 Jun 2020

TL;DR: This paper quantifies the relationship between the quality of the initial guess and the corresponding reduction in data requirements, using the restricted isometry constant as a surrogate for sample complexity, and calculates a sharp threshold number of samples needed to prevent each specific point on the optimization landscape from becoming a spurious local minimum.

Abstract: Given a sufficiently large amount of labeled data, the non-convex low-rank matrix recovery problem contains no spurious local minima, so a local optimization algorithm is guaranteed to converge to a global minimum starting from any initial guess. However, the actual amount of data needed by this theoretical guarantee is very pessimistic, as it must prevent spurious local minima from existing anywhere, including at adversarial locations. In contrast, prior work based on good initial guesses has more realistic data requirements, because it allows spurious local minima to exist outside of a neighborhood of the solution. In this paper, we quantify the relationship between the quality of the initial guess and the corresponding reduction in data requirements. Using the restricted isometry constant as a surrogate for sample complexity, we compute a sharp threshold number of samples needed to prevent each specific point on the optimization landscape from becoming a spurious local minimum. Optimizing the threshold over regions of the landscape, we see that, for initial points not too close to the ground truth, a linear improvement in the quality of the initial guess amounts to a constant factor improvement in the sample complexity.
