
Showing papers by "Richard Zhang" published in 2021


Proceedings ArticleDOI
13 Apr 2021
TL;DR: In this paper, a novel cross-domain distance consistency loss is proposed to preserve the relative similarities and differences between instances in the source domain, along with an anchor-based strategy to encourage different levels of realism over different regions in the latent space.
Abstract: Training generative models, such as GANs, on a target domain containing limited examples (e.g., 10) can easily result in overfitting. In this work, we seek to utilize a large source domain for pretraining and transfer the diversity information from source to target. We propose to preserve the relative similarities and differences between instances in the source via a novel cross-domain distance consistency loss. To further reduce overfitting, we present an anchor-based strategy to encourage different levels of realism over different regions in the latent space. With extensive results in both photorealistic and non-photorealistic domains, we demonstrate qualitatively and quantitatively that our few-shot model automatically discovers correspondences between source and target domains and generates more diverse and realistic images than previous methods.
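The cross-domain distance consistency idea can be summarized in a short sketch. Below is a minimal, illustrative PyTorch version, assuming G_src and G_tgt return comparable features for the same batch of latent codes; the paper's actual formulation may differ in detail:

    import torch
    import torch.nn.functional as F

    def distance_consistency_loss(feat_src, feat_tgt):
        # feat_src, feat_tgt: (N, D) features from the source and target
        # generators for the same batch of latent codes.
        def pairwise(feat):
            feat = F.normalize(feat.flatten(1), dim=1)
            sim = feat @ feat.t()                            # cosine similarities
            mask = ~torch.eye(len(feat), dtype=torch.bool, device=feat.device)
            return sim[mask].view(len(feat), -1)             # drop self-similarities
        p_src = F.softmax(pairwise(feat_src).detach(), dim=1)
        log_p_tgt = F.log_softmax(pairwise(feat_tgt), dim=1)
        # Match the target batch's similarity structure to the frozen source.
        return F.kl_div(log_p_tgt, p_src, reduction="batchmean")

    # Hypothetical usage:
    # z = torch.randn(8, 512)
    # loss = distance_consistency_loss(G_src(z), G_tgt(z))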

136 citations


Proceedings ArticleDOI
04 Mar 2021
TL;DR: In this paper, Anycost GAN is proposed for interactive natural image editing, using sampling-based multi-resolution training, adaptive-channel training, and a generator-conditioned discriminator.
Abstract: Generative adversarial networks (GANs) have enabled photorealistic image synthesis and editing. However, due to the high computational cost of large-scale generators (e.g., StyleGAN2), it usually takes seconds to see the results of a single edit on edge devices, prohibiting interactive user experience. In this paper, inspired by quick preview features in modern rendering software, we propose Anycost GAN for interactive natural image editing. We train the Anycost GAN to support elastic resolutions and channels for faster image generation at versatile speeds. Running subsets of the full generator produces outputs that are perceptually similar to the full generator, making them a good proxy for quick preview. By using sampling-based multi-resolution training, adaptive-channel training, and a generator-conditioned discriminator, the anycost generator can be evaluated at various configurations while achieving better image quality compared to separately trained models. Furthermore, we develop new encoder training and latent code optimization techniques to encourage consistency between the different sub-generators during image projection. Anycost GAN can be executed at various cost budgets (up to 10× computation reduction) and adapt to a wide range of hardware and latency requirements. When deployed on desktop CPUs and edge devices, our model can provide perceptually similar previews at 6-12× speedup, enabling interactive image editing. The code and demo are publicly available.
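The elastic-channel part of this idea can be illustrated with a small sketch: a convolution whose weights can be sliced so that a narrower sub-generator shares parameters with the full one. This is only a toy illustration under that assumption, not the actual Anycost GAN architecture:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SlimmableConv2d(nn.Conv2d):
        # A conv layer that can run with only the first fraction of its channels.
        def forward(self, x, width_ratio=1.0):
            out_ch = max(1, int(self.out_channels * width_ratio))
            in_ch = x.shape[1]                          # accept an already-slimmed input
            weight = self.weight[:out_ch, :in_ch]
            bias = self.bias[:out_ch] if self.bias is not None else None
            return F.conv2d(x, weight, bias, self.stride, self.padding)

    conv = SlimmableConv2d(64, 128, 3, padding=1)
    x = torch.randn(1, 64, 32, 32)
    full_pass = conv(x, width_ratio=1.0)                # (1, 128, 32, 32): full quality
    preview = conv(x, width_ratio=0.5)                  # (1, 64, 32, 32): cheaper preview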

56 citations


Proceedings ArticleDOI
29 Apr 2021
TL;DR: In this paper, the latent code corresponding to a given real input image is found using a pre-trained generator, and perturbations of this code are used to generate natural variations of the image for test-time ensembling.
Abstract: Recent generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose, simply by learning from unlabeled image collections. Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification. Using a pre-trained generator, we first find the latent code corresponding to a given real input image. Applying perturbations to the code creates natural variations of the image, which can then be ensembled together at test-time. We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars. Critically, we find that several design decisions are required towards making this process work; the perturbation procedure, weighting between the augmentations and original image, and training the classifier on synthesized images can all impact the result. Currently, we find that while test-time ensembling with GAN-based augmentations can offer some small improvements, the remaining bottlenecks are the efficiency and accuracy of the GAN reconstructions, coupled with classifier sensitivities to artifacts in GAN-generated images.
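As a rough sketch of the test-time procedure described above (the names encode and G, and the weighting scheme, are illustrative assumptions rather than the paper's exact design):

    import torch

    def ensemble_predict(image, encode, G, classifier, n_views=8, sigma=0.1, alpha=0.5):
        # encode: GAN inversion of the real image into a latent code for generator G.
        w = encode(image)
        logits = alpha * classifier(image.unsqueeze(0))          # weight on the original image
        for _ in range(n_views):
            view = G(w + sigma * torch.randn_like(w))            # perturbed "view" of the image
            logits = logits + (1 - alpha) / n_views * classifier(view)
        return logits.softmax(dim=-1)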

48 citations


Posted Content
TL;DR: CDPAM is introduced, a metric that builds on and advances DPAM, and it is shown that adding this metric to existing speech synthesis and enhancement methods yields significant improvement, as measured by objective and subjective tests.
Abstract: Many speech processing methods based on deep learning require an automatic and differentiable audio metric for the loss function. The DPAM approach of Manocha et al. learns a full-reference metric trained directly on human judgments, and thus correlates well with human perception. However, it requires a large number of human annotations and does not generalize well outside the range of perturbations on which it was trained. This paper introduces CDPAM, a metric that builds on and advances DPAM. The primary improvement is to combine contrastive learning and multi-dimensional representations to build robust models from limited data. In addition, we collect human judgments on triplet comparisons to improve generalization to a broader range of audio perturbations. CDPAM correlates well with human responses across nine varied datasets. We also show that adding this metric to existing speech synthesis and enhancement methods yields significant improvement, as measured by objective and subjective tests.
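A minimal sketch of the kind of contrastive/triplet training signal such a learned audio metric uses (illustrative only; CDPAM's multi-dimensional representation and training details are not reproduced here):

    import torch
    import torch.nn.functional as F

    def perceptual_distance(encoder, ref, deg):
        # Distance between embeddings of a reference and a degraded recording.
        return (encoder(ref) - encoder(deg)).pow(2).mean(dim=-1)

    def triplet_metric_loss(encoder, ref, closer, farther, margin=0.1):
        # Human judgment: `closer` sounds more like `ref` than `farther` does.
        d_close = perceptual_distance(encoder, ref, closer)
        d_far = perceptual_distance(encoder, ref, farther)
        return F.relu(d_close - d_far + margin).mean()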

33 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this paper, a pixel-wise network is proposed for image-to-image translation, where each pixel is processed independently of others, through a composition of simple affine transformations and nonlinearities.
Abstract: We introduce a new generator architecture, aimed at fast and efficient high-resolution image-to-image translation. We design the generator to be an extremely lightweight function of the full-resolution image. In fact, we use pixel-wise networks; that is, each pixel is processed independently of others, through a composition of simple affine transformations and nonlinearities. We take three important steps to equip such a seemingly simple function with adequate expressivity. First, the parameters of the pixel-wise networks are spatially varying, so they can represent a broader function class than simple 1 × 1 convolutions. Second, these parameters are predicted by a fast convolutional network that processes an aggressively low-resolution representation of the input. Third, we augment the input image by concatenating a sinusoidal encoding of spatial coordinates, which provides an effective inductive bias for generating realistic novel high-frequency image content. As a result, our model is up to 18× faster than state-of-the-art baselines. We achieve this speedup while generating comparable visual quality across different image resolutions and translation domains.
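Two of the ingredients above, the sinusoidal coordinate encoding and the pixel-wise (1x1) network, can be sketched as follows; the spatially varying parameters predicted from a low-resolution stream are omitted, so this is only an illustrative simplification:

    import math
    import torch
    import torch.nn as nn

    def coord_encoding(h, w, n_freqs=6):
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        coords = torch.stack([xs, ys])                                    # (2, H, W)
        feats = [torch.sin(coords * math.pi * 2 ** k) for k in range(n_freqs)]
        feats += [torch.cos(coords * math.pi * 2 ** k) for k in range(n_freqs)]
        return torch.cat(feats)                                           # (4 * n_freqs, H, W)

    h, w = 256, 256
    image = torch.rand(1, 3, h, w)
    enc = coord_encoding(h, w).unsqueeze(0)
    pixelwise_net = nn.Sequential(                                        # 1x1 convs = per-pixel MLP
        nn.Conv2d(3 + enc.shape[1], 64, kernel_size=1), nn.ReLU(),
        nn.Conv2d(64, 3, kernel_size=1),
    )
    out = pixelwise_net(torch.cat([image, enc], dim=1))                   # (1, 3, 256, 256)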

27 citations


Journal ArticleDOI
TL;DR: The DREAM challenge described in this paper used in vitro experimental intMEMOIR recordings and in silico data for a C. elegans lineage tree of about 1,000 cells and a Mus musculus tree of 10,000 cells.
Abstract: The recent advent of CRISPR and other molecular tools enabled the reconstruction of cell lineages based on induced DNA mutations and promises to do the same for more complex organisms. To date, no lineage reconstruction algorithms have been rigorously examined for their performance and robustness across dataset types and numbers of cells. To benchmark such methods, we decided to organize a DREAM challenge using in vitro experimental intMEMOIR recordings and in silico data for a C. elegans lineage tree of about 1,000 cells and a Mus musculus tree of 10,000 cells. Some of the 22 approaches submitted had excellent performance, but structural features of the trees prevented optimal reconstructions. Using smaller sub-trees as training sets proved to be a good approach for tuning algorithms to reconstruct larger trees. The simulation and reconstruction methods generated here delineate a potential way forward for solving larger cell lineage trees such as that of the mouse.

26 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that the coupling due to overlap constraints is guaranteed to be sparse over dense blocks, with a block sparsity pattern that coincides with the adjacency matrix of a tree.
Abstract: Clique tree conversion solves large-scale semidefinite programs by splitting an $n\times n$ matrix variable into up to n smaller matrix variables, each representing a principal submatrix of up to $\omega \times \omega$. Its fundamental weakness is the need to introduce overlap constraints that enforce agreement between different matrix variables, because these can result in dense coupling. In this paper, we show that by dualizing the clique tree conversion, the coupling due to the overlap constraints is guaranteed to be sparse over dense blocks, with a block sparsity pattern that coincides with the adjacency matrix of a tree. We consider two classes of semidefinite programs with favorable sparsity patterns that encompass the MAXCUT and MAX k-CUT relaxations, the Lovász theta problem, and the AC optimal power flow relaxation. Assuming that $\omega \ll n$, we prove that the per-iteration cost of an interior-point method is linear $O(n)$ time and memory, so an $\epsilon$-accurate and $\epsilon$-feasible iterate is obtained after $O(\sqrt{n}\log(1/\epsilon))$ iterations in near-linear $O(n^{1.5}\log(1/\epsilon))$ time. We confirm our theoretical insights with numerical results on semidefinite programs as large as $n=13{,}659$.
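The overall running time quoted above follows directly from combining the two stated bounds; as a quick check:

    O(\sqrt{n}\log(1/\epsilon)) \text{ iterations} \times O(n) \text{ per iteration} = O(n^{1.5}\log(1/\epsilon)) \text{ time, assuming } \omega \ll n.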

21 citations


Proceedings ArticleDOI
06 Jun 2021
TL;DR: In this paper, a metric called CDPAM is proposed that combines contrastive learning and multi-dimensional representations to build robust models from limited data and to improve generalization to a broader range of audio perturbations.
Abstract: Many speech processing methods based on deep learning require an automatic and differentiable audio metric for the loss function. The DPAM approach of Manocha et al. [1] learns a full-reference metric trained directly on human judgments, and thus correlates well with human perception. However, it requires a large number of human annotations and does not generalize well outside the range of perturbations on which it was trained. This paper introduces CDPAM, a metric that builds on and advances DPAM. The primary improvement is to combine contrastive learning and multi-dimensional representations to build robust models from limited data. In addition, we collect human judgments on triplet comparisons to improve generalization to a broader range of audio perturbations. CDPAM correlates well with human responses across nine varied datasets. We also show that adding this metric to existing speech synthesis and enhancement methods yields significant improvement, as measured by objective and subjective tests.

14 citations


Posted Content
TL;DR: In this paper, the authors investigate the sensitivity of the Fréchet Inception Distance (FID) score to inconsistent and often incorrect implementations across different image processing libraries and provide recommendations for computing the FID score accurately.
Abstract: We investigate the sensitivity of the Fréchet Inception Distance (FID) score to inconsistent and often incorrect implementations across different image processing libraries. The FID score is widely used to evaluate generative models, but each FID implementation uses a different low-level image processing pipeline. Image resizing functions in commonly used deep learning libraries often introduce aliasing artifacts. We observe that numerous subtle choices need to be made for FID calculation, and a lack of consistency in these choices can lead to vastly different FID scores. In particular, we show that the following choices are significant: (1) which image resizing library to use, (2) which interpolation kernel to use, and (3) which encoding to use when representing images. We additionally outline numerous common pitfalls that should be avoided and provide recommendations for computing the FID score accurately. We provide an easy-to-use optimized implementation of our proposed recommendations in the accompanying code.
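The resizing pitfall can be demonstrated in a few lines: different libraries return different downsampled pixels for the same image, which shifts downstream FID. This is an illustrative sketch only; the paper's accompanying code contains the recommended pipeline:

    import numpy as np
    from PIL import Image
    import torch
    import torch.nn.functional as F

    img = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)

    # PIL bicubic resize (antialiased)
    pil_small = np.asarray(Image.fromarray(img).resize((64, 64), Image.BICUBIC))

    # PyTorch bicubic resize (no antialiasing by default)
    t = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0)
    torch_small = F.interpolate(t, size=(64, 64), mode="bicubic", align_corners=False)
    torch_small = torch_small.squeeze(0).permute(1, 2, 0).clamp(0, 255).byte().numpy()

    # Nonzero mean absolute difference between the two "equivalent" resizes.
    print(np.abs(pil_small.astype(int) - torch_small.astype(int)).mean())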

13 citations


Posted Content
TL;DR: In this article, the authors make a series of empirical observations that investigate the hypothesis that deeper networks are inductively biased to find solutions with lower rank embeddings and conjecture that this bias exists because the volume of functions that maps to low-rank embedding increases with depth.
Abstract: Modern deep neural networks are highly over-parameterized compared to the data on which they are trained, yet they often generalize remarkably well. A flurry of recent work has asked: why do deep networks not overfit to their training data? In this work, we make a series of empirical observations that investigate the hypothesis that deeper networks are inductively biased to find solutions with lower rank embeddings. We conjecture that this bias exists because the volume of functions that maps to low-rank embedding increases with depth. We show empirically that our claim holds true on finite width linear and non-linear models and show that these are the solutions that generalize well. We then show that the low-rank simplicity bias exists even after training, using a wide variety of commonly used optimizers. We found this phenomenon to be resilient to initialization, hyper-parameters, and learning methods. We further demonstrate how linear over-parameterization of deep non-linear models can be used to induce low-rank bias, improving generalization performance without changing the effective model capacity. Practically, we demonstrate that simply linearly over-parameterizing standard models at training time can improve performance on image classification tasks, including ImageNet.
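The linear over-parameterization trick mentioned at the end is simple to state in code: a single linear layer is replaced by a product of linear layers with no nonlinearity in between, which leaves the representable function class unchanged. A minimal sketch, with illustrative layer sizes:

    import torch.nn as nn

    def overparameterize(in_dim, out_dim, hidden=1024, depth=2):
        layers = [nn.Linear(in_dim, hidden)]
        layers += [nn.Linear(hidden, hidden) for _ in range(depth - 2)]
        layers += [nn.Linear(hidden, out_dim)]
        return nn.Sequential(*layers)          # still a single linear map, just factored

    standard = nn.Linear(512, 10)              # baseline classifier head
    expanded = overparameterize(512, 10)       # trained in place of `standard`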

10 citations


Journal ArticleDOI
TL;DR: It is proved that the $P-\Theta$ power flow problem has at most one solution for any acyclic or GSP graph, and it is shown that multiple distinct solutions cannot exist under the assumption that angle differences across the lines are bounded by some limit related to the maximal girth of the network.
Abstract: This article establishes sufficient conditions for the uniqueness of AC power flow solutions via the monotonic relationship between real power flow and the phase angle difference. More specifically, we prove that the $P-\Theta$ power flow problem has at most one solution for any acyclic or GSP graph. In addition, for arbitrary cyclic power networks, we show that multiple distinct solutions cannot exist under the assumption that angle differences across the lines are bounded by some limit related to the maximal girth of the network. In these cases, a vector of voltage phase angles can be uniquely determined (up to an absolute phase shift) given a vector of real power injections within the realizable range. The implication of this result for the classical power flow analysis is that under the conditions specified above, the problem has a unique physically realizable solution if the phasor voltage magnitudes are fixed. We also introduce a series–parallel operator and show that this operator obtains a reduced and easier-to-analyze model for the power system without changing the uniqueness of power flow solutions.
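The monotonic relationship underlying these results can be sketched for the standard lossless line model (an illustrative simplification of the conditions actually used in the article): the real power flow across a line from bus i to bus j is

    P_{ij} = \frac{V_i V_j}{x_{ij}} \sin(\theta_i - \theta_j),

which is strictly increasing in the angle difference $\theta_i - \theta_j$ whenever $|\theta_i - \theta_j| < \pi/2$, so with fixed voltage magnitudes the line flow determines the angle difference within that range.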

Posted Content
TL;DR: In this article, it was shown that for nonconvex low-rank matrix recovery, the restricted isometry property (RIP) is sufficient for the absence of spurious local minima.
Abstract: We prove that it is possible for nonconvex low-rank matrix recovery to contain no spurious local minima when the rank of the unknown ground truth $r^{\star}

Posted ContentDOI
30 May 2021-bioRxiv
TL;DR: TreeVAE as discussed by the authors uses a variational autoencoder (VAE) to model the observed transcriptomic data while accounting for the phylogenetic relationships between cells, and it outperforms benchmarks in reconstructing ancestral states on several metrics.
Abstract: Novel experimental assays now simultaneously measure lineage relationships and transcriptomic states from single cells, thanks to CRISPR/Cas9-based genome engineering. These multimodal measurements allow researchers not only to build comprehensive phylogenetic models relating all cells but also to infer transcriptomic determinants of consequential subclonal behavior. The gene expression data, however, is limited to cells that are currently present ("leaves" of the phylogeny). As a consequence, researchers cannot form hypotheses about unobserved, or "ancestral", states that gave rise to the observed population. To address this, we introduce TreeVAE: a probabilistic framework for estimating ancestral transcriptional states. TreeVAE uses a variational autoencoder (VAE) to model the observed transcriptomic data while accounting for the phylogenetic relationships between cells. Using simulations, we demonstrate that TreeVAE outperforms benchmarks in reconstructing ancestral states on several metrics. TreeVAE also provides a measure of uncertainty, which we demonstrate to correlate well with its prediction accuracy. This estimate therefore potentially provides a data-driven way to estimate how far back in the ancestor chain predictions could be made. Finally, using real data from lung cancer metastasis, we show that accounting for the phylogenetic relationships between cells improves goodness of fit. Together, TreeVAE provides a principled framework for reconstructing unobserved cellular states from single-cell lineage tracing data.

Posted Content
TL;DR: In this paper, the authors propose a method for propagating coarse 2D user scribbles to the 3D space to modify the color or shape of a local region, which can be used for editing the appearance and shape of real photographs.
Abstract: A neural radiance field (NeRF) is a scene model supporting high-quality view synthesis, optimized per scene. In this paper, we explore enabling user editing of a category-level NeRF - also known as a conditional radiance field - trained on a shape category. Specifically, we introduce a method for propagating coarse 2D user scribbles to the 3D space, to modify the color or shape of a local region. First, we propose a conditional radiance field that incorporates new modular network components, including a shape branch that is shared across object instances. Observing multiple instances of the same category, our model learns underlying part semantics without any supervision, thereby allowing the propagation of coarse 2D user scribbles to the entire 3D region (e.g., chair seat). Next, we propose a hybrid network update strategy that targets specific network components, which balances efficiency and accuracy. During user interaction, we formulate an optimization problem that both satisfies the user's constraints and preserves the original object structure. We demonstrate our approach on various editing tasks over three shape datasets and show that it outperforms prior neural editing approaches. Finally, we edit the appearance and shape of a real photograph and show that the edit propagates to extrapolated novel views.

Posted Content
TL;DR: In this article, Anycost GAN, a generative adversarial network (GAN) supporting elastic resolutions and channels, is proposed for interactive natural image editing; deployed on desktop CPUs and edge devices, it provides perceptually similar previews at 6-12x speedup.
Abstract: Generative adversarial networks (GANs) have enabled photorealistic image synthesis and editing. However, due to the high computational cost of large-scale generators (e.g., StyleGAN2), it usually takes seconds to see the results of a single edit on edge devices, prohibiting interactive user experience. In this paper, we take inspiration from modern rendering software and propose Anycost GAN for interactive natural image editing. We train the Anycost GAN to support elastic resolutions and channels for faster image generation at versatile speeds. Running subsets of the full generator produces outputs that are perceptually similar to the full generator, making them a good proxy for preview. By using sampling-based multi-resolution training, adaptive-channel training, and a generator-conditioned discriminator, the anycost generator can be evaluated at various configurations while achieving better image quality compared to separately trained models. Furthermore, we develop new encoder training and latent code optimization techniques to encourage consistency between the different sub-generators during image projection. Anycost GAN can be executed at various cost budgets (up to 10x computation reduction) and adapt to a wide range of hardware and latency requirements. When deployed on desktop CPUs and edge devices, our model can provide perceptually similar previews at 6-12x speedup, enabling interactive image editing. The code and demo are publicly available: this https URL.

Posted ContentDOI
22 Nov 2021-bioRxiv
TL;DR: In this paper, two theoretically grounded algorithms for reconstructing the underlying phylogenetic tree are presented, along with asymptotic bounds on the number of recording sites necessary for exact recapitulation of the ground-truth phylogeny with high probability.
Abstract: CRISPR-Cas9 lineage tracing technologies have emerged as a powerful tool for investigating development in single-cell contexts, but exact reconstruction of the underlying clonal relationships in experiments is plagued by data-related complications. These complications are functions of the experimental parameters in these systems, such as the Cas9 cutting rate, the diversity of indel outcomes, and the rate of missing data. In this paper, we develop two theoretically grounded algorithms for reconstruction of the underlying phylogenetic tree, as well as asymptotic bounds for the number of recording sites necessary for exact recapitulation of the ground truth phylogeny with high probability. In doing so, we explore the relationship between the problem difficulty and the experimental parameters, with implications for experimental design. Lastly, we provide simulations validating these bounds and showing the empirical performance of these algorithms. Overall, this work provides a first theoretical analysis of phylogenetic reconstruction in the CRISPR-Cas9 lineage tracing technology.

Posted Content
TL;DR: In this article, the authors use StyleGAN2 as the source of GAN-based augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars, and find that while test-time ensembling with GAN-based augmentations can offer some small improvements, the remaining bottlenecks are the efficiency and accuracy of the GAN reconstructions, coupled with classifier sensitivities to artifacts in GAN-generated images.
Abstract: Recent generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose, simply by learning from unlabeled image collections. Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification. Using a pretrained generator, we first find the latent code corresponding to a given real input image. Applying perturbations to the code creates natural variations of the image, which can then be ensembled together at test-time. We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars. Critically, we find that several design decisions are required towards making this process work; the perturbation procedure, weighting between the augmentations and original image, and training the classifier on synthesized images can all impact the result. Currently, we find that while test-time ensembling with GAN-based augmentations can offer some small improvements, the remaining bottlenecks are the efficiency and accuracy of the GAN reconstructions, coupled with classifier sensitivities to artifacts in GAN-generated images.

Proceedings Article
13 May 2021
TL;DR: In this article, the authors propose a method for propagating coarse 2D user scribbles to the 3D space to modify the color or shape of a local region, which can be used for editing the appearance and shape of real photographs.
Abstract: A neural radiance field (NeRF) is a scene model supporting high-quality view synthesis, optimized per scene. In this paper, we explore enabling user editing of a category-level NeRF - also known as a conditional radiance field - trained on a shape category. Specifically, we introduce a method for propagating coarse 2D user scribbles to the 3D space, to modify the color or shape of a local region. First, we propose a conditional radiance field that incorporates new modular network components, including a shape branch that is shared across object instances. Observing multiple instances of the same category, our model learns underlying part semantics without any supervision, thereby allowing the propagation of coarse 2D user scribbles to the entire 3D region (e.g., chair seat). Next, we propose a hybrid network update strategy that targets specific network components, which balances efficiency and accuracy. During user interaction, we formulate an optimization problem that both satisfies the user's constraints and preserves the original object structure. We demonstrate our approach on various editing tasks over three shape datasets and show that it outperforms prior neural editing approaches. Finally, we edit the appearance and shape of a real photograph and show that the edit propagates to extrapolated novel views.

Posted Content
TL;DR: In this article, a novel cross-domain distance consistency loss is proposed to preserve the relative similarities and differences between instances in the source domain, along with an anchor-based strategy to encourage different levels of realism over different regions in the latent space.
Abstract: Training generative models, such as GANs, on a target domain containing limited examples (e.g., 10) can easily result in overfitting. In this work, we seek to utilize a large source domain for pretraining and transfer the diversity information from source to target. We propose to preserve the relative similarities and differences between instances in the source via a novel cross-domain distance consistency loss. To further reduce overfitting, we present an anchor-based strategy to encourage different levels of realism over different regions in the latent space. With extensive results in both photorealistic and non-photorealistic domains, we demonstrate qualitatively and quantitatively that our few-shot model automatically discovers correspondences between source and target domains and generates more diverse and realistic images than previous methods.

Patent
04 May 2021
TL;DR: In this paper, an edge prediction neural network and an edge-guided colorization neural network are used to transform grayscale digital images into colorized digital images; the predicted color edge map can be presented to a user via a colorization graphical user interface, which receives color points and color edge modifications.
Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for utilizing an edge prediction neural network and edge-guided colorization neural network to transform grayscale digital images into colorized digital images. In one or more embodiments, the disclosed systems apply a color edge prediction neural network to a grayscale image to generate a color edge map indicating predicted chrominance edges. The disclosed systems can present the color edge map to a user via a colorization graphical user interface and receive user color points and color edge modifications. The disclosed systems can apply a second neural network, an edge-guided colorization neural network, to the color edge map or a modified edge map, user color points, and the grayscale image to generate an edge-constrained colorized digital image.

Posted Content
TL;DR: In this article, the authors introduce an information-theoretic approach to measuring similarity between two images, which enables learning a lightweight critic to calibrate a feature space in a contrastive manner, such that reconstructions of corresponding spatial patches are brought together, while other patches are repulsed.
Abstract: Training supervised image synthesis models requires a critic to compare two images: the ground truth to the result. Yet, this basic functionality remains an open problem. A popular line of approaches uses the L1 (mean absolute error) loss, either in the pixel or the feature space of pretrained deep networks. However, we observe that these losses tend to produce overly blurry and grey images, and other techniques such as GANs need to be employed to fight these artifacts. In this work, we introduce an information theory based approach to measuring similarity between two images. We argue that a good reconstruction should have high mutual information with the ground truth. This view enables learning a lightweight critic to "calibrate" a feature space in a contrastive manner, such that reconstructions of corresponding spatial patches are brought together, while other patches are repulsed. We show that our formulation immediately boosts the perceptual realism of output images when used as a drop-in replacement for the L1 loss, with or without an additional GAN loss.
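A minimal sketch of the contrastive patch objective described above (the learned lightweight critic and its feature space are omitted; this is an InfoNCE-style illustration, not the paper's exact loss):

    import torch
    import torch.nn.functional as F

    def contrastive_patch_loss(feat_rec, feat_gt, temperature=0.07):
        # feat_rec, feat_gt: (N, D) features of N corresponding spatial patches
        # from the reconstruction and the ground truth, respectively.
        feat_rec = F.normalize(feat_rec, dim=1)
        feat_gt = F.normalize(feat_gt, dim=1)
        logits = feat_rec @ feat_gt.t() / temperature        # (N, N) similarities
        targets = torch.arange(len(feat_rec))                # patch i matches patch i
        return F.cross_entropy(logits, targets)              # pull matches together, push others apart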