Home
/
Authors
/
Or Patashnik

Author

Or Patashnik

Bio: Or Patashnik is an academic researcher from Tel Aviv University. The author has contributed to research in topics: Computer science & Encoder. The author has an hindex of 7, co-authored 14 publications receiving 208 citations.

Topics: Computer science, Encoder, Real image, Image editing, Image translation ...read more

Papers

PDF

Open Access

More filters

Posted Content•

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

[...]

Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan¹, Yaniv Azar, Stav Shapiro, Daniel Cohen-Or¹ - Show less +3 more•Institutions (1)

Tel Aviv University¹

03 Aug 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work presents a generic image-to-image translation framework, pixel2style2pixel (pSp), based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended latent space.

...read moreread less

Abstract: We present a generic image-to-image translation framework, Pixel2Style2Pixel (pSp). Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended W+ latent space. We first show that our encoder can directly embed real images into W+, with no additional optimization. We further introduce a dedicated identity loss which is shown to achieve improved performance in the reconstruction of an input image. We demonstrate pSp to be a simple architecture that, by leveraging a well-trained, fixed generator network, can be easily applied on a wide-range of image-to-image translation tasks. Solving these tasks through the style representation results in a global approach that does not rely on a local pixel-to-pixel correspondence and further supports multi-modal synthesis via the resampling of styles. Notably, we demonstrate that pSp can be trained to align a face image to a frontal pose without any labeled data, generate multi-modal results for ambiguous tasks such as conditional face generation from segmentation maps, and construct high-resolution images from corresponding low-resolution images.

...read moreread less

504 citations

Proceedings Article•DOI•

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

[...]

Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan¹, Yaniv Azar, Stav Shapiro, Daniel Cohen-Or¹ - Show less +3 more•Institutions (1)

Tel Aviv University¹

04 May 2021

TL;DR: The pixel2style2pixel (pSp) as discussed by the authors framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended $\mathcal{W} + $ latent space.

...read moreread less

Abstract: We present a generic image-to-image translation framework, pixel2style2pixel (pSp). Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended $\mathcal{W} + $ latent space. We first show that our encoder can directly embed real images into $\mathcal{W} + $, with no additional optimization. Next, we propose utilizing our encoder to directly solve image-to-image translation tasks, defining them as encoding problems from some input domain into the latent domain. By deviating from the standard "invert first, edit later" methodology used with previous StyleGAN encoders, our approach can handle a variety of tasks even when the input image is not represented in the StyleGAN domain. We show that solving translation tasks through StyleGAN significantly simplifies the training process, as no adversary is required, has better support for solving tasks without pixel-to-pixel correspondence, and inherently supports multi-modal synthesis via the resampling of styles. Finally, we demonstrate the potential of our framework on a variety of facial image-to-image translation tasks, even when compared to state-of-the-art solutions designed specifically for a single task, and further show that it can be extended beyond the human facial domain. Code is available at https://github.com/eladrich/pixel2style2pixel.

...read moreread less

301 citations

Proceedings Article•DOI•

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

[...]

Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit Bermano, G. Chechik, Daniel Cohen-Or - Show less +3 more

02 Aug 2022

TL;DR: This work uses only 3 - 5 images of a user-provided concept to represent it through new “words” in the embedding space of a frozen text-to-image model, which can be composed into natural language sentences, guiding personalized creation in an intuitive way.

...read moreread less

Abstract: Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes. In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new"words"in the embedding space of a frozen text-to-image model. These"words"can be composed into natural language sentences, guiding personalized creation in an intuitive way. Notably, we find evidence that a single word embedding is sufficient for capturing unique and varied concepts. We compare our approach to a wide range of baselines, and demonstrate that it can more faithfully portray the concepts across a range of applications and tasks. Our code, data and new words will be available at: https://textual-inversion.github.io

...read moreread less

291 citations

Journal Article•DOI•

Designing an encoder for StyleGAN image manipulation

[...]

Omer Tov¹, Yuval Alaluf¹, Yotam Nitzan¹, Or Patashnik¹, Daniel Cohen-Or¹ - Show less +1 more•Institutions (1)

Tel Aviv University¹

17 Jul 2021-ACM Transactions on Graphics

TL;DR: In this article, the authors identify and analyze the existence of a distortioneditability tradeoff and a distortionperception tradeoff within the StyleGAN latent space, and suggest two principles for designing encoders in a manner that allows one to control the proximity of the inversions to regions that StyleGAN was originally trained on.

...read moreread less

Abstract: Recently, there has been a surge of diverse methods for performing image editing by employing pre-trained unconditional generators. Applying these methods on real images, however, remains a challenge, as it necessarily requires the inversion of the images into their latent space. To successfully invert a real image, one needs to find a latent code that reconstructs the input image accurately, and more importantly, allows for its meaningful manipulation. In this paper, we carefully study the latent space of StyleGAN, the state-of-the-art unconditional generator. We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space. We then suggest two principles for designing encoders in a manner that allows one to control the proximity of the inversions to regions that StyleGAN was originally trained on. We present an encoder based on our two principles that is specifically designed for facilitating editing on real images by balancing these tradeoffs. By evaluating its performance qualitatively and quantitatively on numerous challenging domains, including cars and horses, we show that our inversion method, followed by common editing techniques, achieves superior real-image editing quality, with only a small reconstruction accuracy drop.

...read moreread less

158 citations

Journal Article•DOI•

Only a matter of style: age transformation using a style-based regression model

[...]

Yuval Alaluf¹, Or Patashnik¹, Daniel Cohen-Or¹•Institutions (1)

Tel Aviv University¹

17 Jul 2021-ACM Transactions on Graphics

TL;DR: In this article, an image-to-image translation method that learns to directly encode real facial images into the latent space of a pre-trained unconditional GAN (e.g., StyleGAN) subject to a given aging shift is presented.

...read moreread less

Abstract: The task of age transformation illustrates the change of an individual's appearance over time. Accurately modeling this complex transformation over an input facial image is extremely challenging as it requires making convincing, possibly large changes to facial features and head shape, while still preserving the input identity. In this work, we present an image-to-image translation method that learns to directly encode real facial images into the latent space of a pre-trained unconditional GAN (e.g., StyleGAN) subject to a given aging shift. We employ a pre-trained age regression network to explicitly guide the encoder in generating the latent codes corresponding to the desired age. In this formulation, our method approaches the continuous aging process as a regression task between the input age and desired target age, providing fine-grained control over the generated image. Moreover, unlike approaches that operate solely in the latent space using a prior on the path controlling age, our method learns a more disentangled, non-linear path. Finally, we demonstrate that the end-to-end nature of our approach, coupled with the rich semantic latent space of StyleGAN, allows for further editing of the generated images. Qualitative and quantitative evaluations show the advantages of our method compared to state-of-the-art approaches. Code is available at our project page: https://yuval-alaluf.github.io/SAM.

...read moreread less

87 citations

1
2
3
4
…
5

Cited by

PDF

Open Access

More filters

Journal Article•

“Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告

[...]

杉山拓海

12 Sep 2017-Computers & Graphics

3,940 citations

Posted Content•

Alias-Free Generative Adversarial Networks

[...]

Tero Karras¹, Miika Aittala¹, Samuli Laine¹, Erik Härkönen², Janne Hellsten¹, Jaakko Lehtinen², Timo Aila¹ - Show less +3 more•Institutions (2)

Nvidia¹, Aalto University²

23 Jun 2021-arXiv: Computer Vision and Pattern Recognition

TL;DR: It is observed that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner, and small architectural changes are derived that guarantee that unwanted information cannot leak into the hierarchical synthesis process.

...read moreread less

Abstract: We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the generator network. Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation.

...read moreread less

621 citations

Journal Article•DOI•

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

[...]

Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman - Show less +2 more

25 Aug 2022-arXiv.org

TL;DR: A new approach for “personalization” of text-to-image diffusion models (specializing them to users’ needs) and leveraging the semantic prior embedded in the model with a new autogenous class-speciﬁc prior preservation loss enables synthesizing the subject in diverse scenes, poses, views, and lighting conditions that do not appear in the reference images.

...read moreread less

Abstract: Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for"personalization"of text-to-image diffusion models. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can be used to synthesize novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, and artistic rendering, all while preserving the subject's key features. We also provide a new dataset and evaluation protocol for this new task of subject-driven generation. Project page: https://dreambooth.github.io/

...read moreread less

381 citations

Proceedings Article•DOI•

RePaint: Inpainting using Denoising Diffusion Probabilistic Models

[...]

Andreas Lugmayr, Martin Danelljan, Andrés Romero, Fisher Yu, Radu Timofte, Luc Van Gool - Show less +2 more

24 Jan 2022

TL;DR: This work proposes RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks and outperforms state-of-the-art Autoregressive, and GAN approaches for at least five out of six mask distributions.

...read moreread less

Abstract: Free-form inpainting is the task of adding new content to an image in the regions specified by an arbitrary binary mask. Most existing approaches train for a certain distribution of masks, which limits their generalization capabilities to unseen mask types. Furthermore, training with pixel-wise and perceptual losses often leads to simple textural extensions towards the missing areas instead of semantically meaningful generation. In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks. We employ a pretrained unconditional DDPM as the generative prior. To condition the generation process, we only alter the reverse diffusion iterations by sampling the unmasked regions using the given image infor-mation. Since this technique does not modify or condition the original DDPM network itself, the model produces high-quality and diverse output images for any inpainting form. We validate our method for both faces and general-purpose image inpainting using standard and extreme masks. Re-Paint outperforms state-of-the-art Autoregressive, and GAN approaches for at least five out of six mask distributions. Github Repository: git.io/RePaint

...read moreread less

292 citations

Proceedings Article•DOI•

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

[...]

Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit Bermano, G. Chechik, Daniel Cohen-Or - Show less +3 more

02 Aug 2022

...read moreread less

291 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse