Top 23 papers published by Jimei Yang from Adobe Systems in 2018

Proceedings Article•DOI•

Generative Image Inpainting with Contextual Attention

Jiahui Yu¹, Zhe Lin², Jimei Yang², Xiaohui Shen², Xin Lu², Thomas S. Huang¹

University of Illinois at Urbana–Champaign¹, Adobe Systems²

18 Jun 2018

TL;DR: The authors propose a deep generative model-based approach that not only synthesizes novel image structures but also explicitly uses surrounding image features as references during network training to make better predictions.

Abstract: Recent deep learning based approaches have shown promising results for the challenging task of inpainting large missing regions in an image. These methods can generate visually plausible image structures and textures, but often create distorted structures or blurry textures inconsistent with surrounding areas. This is mainly due to ineffectiveness of convolutional neural networks in explicitly borrowing or copying information from distant spatial locations. On the other hand, traditional texture and patch synthesis approaches are particularly suitable when it needs to borrow textures from the surrounding regions. Motivated by these observations, we propose a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions. The model is a feedforward, fully convolutional neural network which can process images with multiple holes at arbitrary locations and with variable sizes during the test time. Experiments on multiple datasets including faces (CelebA, CelebA-HQ), textures (DTD) and natural images (ImageNet, Places2) demonstrate that our proposed approach generates higher-quality inpainting results than existing ones. Code, demo and models are available at: https://github.com/JiahuiYu/generative_inpainting.

1,397 citations

Posted Content•

Generative Image Inpainting with Contextual Attention

Jiahui Yu¹, Zhe Lin², Jimei Yang², Xiaohui Shen², Xin Lu², Thomas S. Huang¹

University of Illinois at Urbana–Champaign¹, Adobe Systems²

24 Jan 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, a new deep generative model-based approach is proposed which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions.

Abstract: Recent deep learning based approaches have shown promising results for the challenging task of inpainting large missing regions in an image. These methods can generate visually plausible image structures and textures, but often create distorted structures or blurry textures inconsistent with surrounding areas. This is mainly due to ineffectiveness of convolutional neural networks in explicitly borrowing or copying information from distant spatial locations. On the other hand, traditional texture and patch synthesis approaches are particularly suitable when it needs to borrow textures from the surrounding regions. Motivated by these observations, we propose a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions. The model is a feed-forward, fully convolutional neural network which can process images with multiple holes at arbitrary locations and with variable sizes during the test time. Experiments on multiple datasets including faces (CelebA, CelebA-HQ), textures (DTD) and natural images (ImageNet, Places2) demonstrate that our proposed approach generates higher-quality inpainting results than existing ones. Code, demo and models are available at: this https URL.

1,333 citations

Posted Content•

Free-Form Image Inpainting with Gated Convolution

Jiahui Yu¹, Zhe Lin², Jimei Yang², Xiaohui Shen, Xin Lu², Thomas S. Huang¹

University of Illinois at Urbana–Champaign¹, Adobe Systems²

10 Jun 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: The proposed gated convolution solves the issue of vanilla convolution treating all input pixels as valid ones and generalizes partial convolution by providing a learnable dynamic feature selection mechanism for each channel at each spatial location across all layers.

Abstract: We present a generative image inpainting system to complete images with free-form mask and guidance. The system is based on gated convolutions learned from millions of images without additional labelling efforts. The proposed gated convolution solves the issue of vanilla convolution that treats all input pixels as valid ones, and generalizes partial convolution by providing a learnable dynamic feature selection mechanism for each channel at each spatial location across all layers. Moreover, as free-form masks may appear anywhere in images with any shape, global and local GANs designed for a single rectangular mask are not applicable. Thus, we also present a patch-based GAN loss, named SN-PatchGAN, by applying a spectral-normalized discriminator on dense image patches. SN-PatchGAN is simple in formulation, fast and stable in training. Results on automatic image inpainting and user-guided extension demonstrate that our system generates higher-quality and more flexible results than previous methods. Our system helps users quickly remove distracting objects, modify image layouts, clear watermarks and edit faces. Code, demo and models are available at: this https URL
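The gating idea in the abstract can be sketched in a few lines of plain Python. This is a minimal single-channel illustration, not the authors' implementation: it assumes the usual formulation in which one convolution produces features and a second convolution produces a gate that is squashed by a sigmoid and multiplied in element-wise. The helper names (conv2d_valid, gated_conv2d) and the choice of tanh as the feature activation are assumptions made for the example.

```python
import math


def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation without padding (single channel)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_conv2d(image, feature_kernel, gate_kernel, activation=math.tanh):
    """Hypothetical gated convolution: activation(features) * sigmoid(gates).

    The gate lies in (0, 1) at every output location, so the layer can learn
    to suppress responses that come from invalid (masked) pixels.
    """
    features = conv2d_valid(image, feature_kernel)
    gates = conv2d_valid(image, gate_kernel)
    return [
        [activation(f) * sigmoid(g) for f, g in zip(f_row, g_row)]
        for f_row, g_row in zip(features, gates)
    ]


image = [[1.0, 1.0, 1.0] for _ in range(3)]
feature_kernel = [[0.5, 0.5], [0.5, 0.5]]
half_open = gated_conv2d(image, feature_kernel, [[0.0, 0.0], [0.0, 0.0]])
closed = gated_conv2d(image, feature_kernel, [[-10.0, -10.0], [-10.0, -10.0]])
```

With a zero gate kernel every gate equals 0.5, so the features pass through at half strength; with a strongly negative gate kernel the output is driven towards zero, which is the soft, learnable counterpart of a hard validity mask.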

848 citations

Proceedings Article•DOI•

MAttNet: Modular Attention Network for Referring Expression Comprehension

Licheng Yu¹, Zhe Lin², Xiaohui Shen², Jimei Yang², Xin Lu², Mohit Bansal¹, Tamara L. Berg¹

University of North Carolina at Chapel Hill¹, Adobe Systems²

01 Jun 2018

TL;DR: The authors decompose expressions into three modular components related to subject appearance, location, and relationship to other objects in an end-to-end framework, which allows the model to adapt flexibly to expressions containing different types of information.

Abstract: In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression. While most recent work treats expressions as a single unit, we propose to decompose them into three modular components related to subject appearance, location, and relationship to other objects. This allows us to flexibly adapt to expressions containing different types of information in an end-to-end framework. In our model, which we call the Modular Attention Network (MAttNet), two types of attention are utilized: language-based attention that learns the module weights as well as the word/phrase attention that each module should focus on; and visual attention that allows the subject and relationship modules to focus on relevant image components. Module weights combine scores from all three modules dynamically to output an overall score. Experiments show that MAttNet outperforms previous state-of-the-art methods by a large margin on both bounding-box-level and pixel-level comprehension tasks. Demo and code are provided.
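The sentence about module weights combining the three module scores can be illustrated with a small sketch. It assumes the weights come from a softmax over three expression-dependent logits and that the overall score is their weighted sum; the function names and the example numbers are hypothetical and are not taken from the paper's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def overall_score(module_scores, weight_logits):
    """Weighted sum of (subject, location, relationship) matching scores.

    weight_logits stands in for the output of the language attention; here
    it is just a list of three numbers supplied by the caller.
    """
    weights = softmax(weight_logits)
    return sum(w * s for w, s in zip(weights, module_scores))


def best_region(candidate_scores, weight_logits):
    """Index of the candidate region with the highest overall score."""
    return max(
        range(len(candidate_scores)),
        key=lambda i: overall_score(candidate_scores[i], weight_logits),
    )


# Two candidate regions, each with (subject, location, relationship) scores.
candidates = [[0.9, 0.1, 0.1], [0.2, 0.9, 0.2]]
# An expression such as "the one on the left" would mostly activate location.
location_heavy = [0.0, 5.0, 0.0]
chosen = best_region(candidates, location_heavy)
```

Because the weights depend on the expression, the same pair of regions can be ranked differently for an appearance-driven phrase than for a location-driven one.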

626 citations

Posted Content•

On the Continuity of Rotation Representations in Neural Networks

Yi Zhou¹, Connelly Barnes², Jingwan Lu³, Jimei Yang³, Hao Li¹

University of Southern California¹, University of Virginia², Adobe Systems³

17 Dec 2018-arXiv: Learning

TL;DR: This paper advances a definition of a continuous representation, which can be helpful for training deep neural networks, and relates it to topological concepts such as homeomorphism and embedding; results show that continuous rotation representations outperform discontinuous ones for several practical problems in graphics and vision.

Abstract: In neural networks, it is often desirable to work with various representations of the same space. For example, 3D rotations can be represented with quaternions or Euler angles. In this paper, we advance a definition of a continuous representation, which can be helpful for training deep neural networks. We relate this to topological concepts such as homeomorphism and embedding. We then investigate what are continuous and discontinuous representations for 2D, 3D, and n-dimensional rotations. We demonstrate that for 3D rotations, all representations are discontinuous in the real Euclidean spaces of four or fewer dimensions. Thus, widely used representations such as quaternions and Euler angles are discontinuous and difficult for neural networks to learn. We show that the 3D rotations have continuous representations in 5D and 6D, which are more suitable for learning. We also present continuous representations for the general case of the n-dimensional rotation group SO(n). While our main focus is on rotations, we also show that our constructions apply to other groups such as the orthogonal group and similarity transforms. We finally present empirical results, which show that our continuous rotation representations outperform discontinuous ones for several practical problems in graphics and vision, including a simple autoencoder sanity test, a rotation estimator for 3D point clouds, and an inverse kinematics solver for 3D human poses.
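As a concrete illustration of a 6D rotation representation, the sketch below maps six numbers to a 3x3 rotation matrix by Gram-Schmidt orthogonalization of two 3D vectors, with the third column given by their cross product. This is the construction commonly associated with this paper, but the code is an independent sketch: the function names are hypothetical, and it assumes the two input vectors are non-zero and not parallel.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_from_6d(rep):
    """Map a 6-vector (two stacked 3D vectors) to a rotation matrix.

    Assumes the two vectors are non-zero and not parallel. The result is a
    list of three rows whose columns are the orthonormal basis b1, b2, b3.
    """
    a1, a2 = rep[:3], rep[3:]
    b1 = _normalize(a1)
    projection = _dot(b1, a2)
    b2 = _normalize([x - projection * y for x, y in zip(a2, b1)])
    b3 = _cross(b1, b2)
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


def rotation_to_6d(matrix):
    """Keep the first two columns of a rotation matrix as a 6-vector."""
    return [matrix[i][0] for i in range(3)] + [matrix[i][1] for i in range(3)]


# Any non-degenerate 6-vector yields a valid rotation, even an unnormalized one.
R = rotation_from_6d([1.0, 2.0, 3.0, -1.0, 0.5, 2.0])
```

A network can therefore regress the six numbers freely and obtain a valid rotation after this mapping, without the wrap-around or sign ambiguities that Euler angles and quaternions introduce.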

464 citations

Book Chapter•DOI•

BodyNet: Volumetric Inference of 3D Human Body Shapes

Gül Varol, Duygu Ceylan¹, Bryan Russell¹, Jimei Yang¹, Ersin Yumer¹, Ivan Laptev, Cordelia Schmid

Adobe Systems¹

08 Sep 2018

TL;DR: BodyNet is an end-to-end trainable network that benefits from a volumetric 3D loss, a multi-view re-projection loss, and intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose.

Abstract: Human shape estimation is an important task for video editing, animation and fashion industry. Predicting 3D human body shape from natural images, however, is highly challenging due to factors such as variation in human bodies, clothing and viewpoint. Prior methods addressing this problem typically attempt to fit parametric body models with certain priors on pose and shape. In this work we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image. BodyNet is an end-to-end trainable network that benefits from (i) a volumetric 3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of them results in performance improvement as demonstrated by our experiments. To evaluate the method, we fit the SMPL model to our network output and show state-of-the-art results on the SURREAL and Unite the People datasets, outperforming recent approaches. Besides achieving state-of-the-art performance, our method also enables volumetric body-part segmentation.

385 citations

Proceedings Article•DOI•

PlaneNet: Piece-Wise Planar Reconstruction from a Single RGB Image

Chen Liu¹, Jimei Yang², Duygu Ceylan², Ersin Yumer, Yasutaka Furukawa³

Washington University in St. Louis¹, Adobe Systems², Simon Fraser University³

18 Jun 2018

TL;DR: This paper presents the first end-to-end neural architecture for piece-wise planar reconstruction from a single RGB image, and outperforms baseline methods in terms of both plane segmentation and depth estimation accuracy.

Abstract: This paper proposes a deep neural network (DNN) for piece-wise planar depthmap reconstruction from a single RGB image. While DNNs have brought remarkable progress to single-image depth prediction, piece-wise planar depthmap reconstruction requires a structured geometry representation, and has been a difficult task to master even for DNNs. The proposed end-to-end DNN learns to directly infer a set of plane parameters and corresponding plane segmentation masks from a single RGB image. We have generated more than 50,000 piece-wise planar depthmaps for training and testing from ScanNet, a large-scale RGBD video database. Our qualitative and quantitative evaluations demonstrate that the proposed approach outperforms baseline methods in terms of both plane segmentation and depth estimation accuracy. To the best of our knowledge, this paper presents the first end-to-end neural architecture for piece-wise planar reconstruction from a single RGB image. Code and data are available at https://github.com/art-programmer/PlaneNet.
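To make the link between plane parameters, segmentation masks and a depthmap concrete, here is a small geometric sketch. It is not PlaneNet's code: it assumes a pinhole camera with hypothetical intrinsics (fx, fy, cx, cy) and a plane written as normal . X = offset in camera coordinates, which may differ from the parameterization used in the paper.

```python
def plane_depth(u, v, normal, offset, fx, fy, cx, cy):
    """Depth (z) at pixel (u, v) of the plane normal . X = offset.

    The pixel is back-projected to the ray ((u - cx) / fx, (v - cy) / fy, 1);
    scaling that ray by z gives a 3D point, so z = offset / (normal . ray).
    """
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-9:
        return float("inf")  # viewing ray is parallel to the plane
    return offset / denom


def depthmap_from_planes(plane_index, planes, fx, fy, cx, cy):
    """Assemble a depthmap from a per-pixel plane index and plane parameters.

    plane_index[v][u] selects which (normal, offset) pair covers the pixel,
    playing the role of the segmentation masks.
    """
    return [
        [
            plane_depth(u, v, planes[idx][0], planes[idx][1], fx, fy, cx, cy)
            for u, idx in enumerate(row)
        ]
        for v, row in enumerate(plane_index)
    ]


planes = [((0.0, 0.0, 1.0), 2.0), ((0.0, 0.6, 0.8), 1.0)]
plane_index = [[0, 0], [1, 1]]
depth = depthmap_from_planes(plane_index, planes, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

A fronto-parallel plane gives constant depth across its mask, while a tilted plane gives depth that varies smoothly with the pixel position, which is what makes the reconstruction piece-wise planar.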

183 citations

Proceedings Article•DOI•

Neural Kinematic Networks for Unsupervised Motion Retargetting

Ruben Villegas¹, Jimei Yang², Duygu Ceylan², Honglak Lee¹

University of Michigan¹, Adobe Systems²

16 Apr 2018

TL;DR: In this paper, a recurrent neural network architecture with a Forward Kinematics layer and cycle consistency based adversarial training objective is proposed for unsupervised motion retargeting, which works online and adapts the motion sequence on-the-fly as new frames are received.

Abstract: We propose a recurrent neural network architecture with a Forward Kinematics layer and cycle consistency based adversarial training objective for unsupervised motion retargetting. Our network captures the high-level properties of an input motion by the forward kinematics layer, and adapts them to a target character with different skeleton bone lengths (e.g., shorter, longer arms etc.). Collecting paired motion training sequences from different characters is expensive. Instead, our network utilizes cycle consistency to learn to solve the Inverse Kinematics problem in an unsupervised manner. Our method works online, i.e., it adapts the motion sequence on-the-fly as new frames are received. In our experiments, we use the Mixamo animation data to test our method for a variety of motions and characters and achieve state-of-the-art results. We also demonstrate motion retargetting from monocular human videos to 3D characters using an off-the-shelf 3D pose estimator.
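The role of forward kinematics in retargetting can be shown with a deliberately simplified sketch. The paper works with 3D joint rotations on a full skeleton; the example below assumes a planar chain of bones with one relative angle per joint, and the function name is hypothetical. The point is only that joint positions are a deterministic function of joint rotations and bone lengths.

```python
import math


def forward_kinematics_2d(bone_lengths, joint_angles, root=(0.0, 0.0)):
    """Joint positions of a planar chain from relative joint angles.

    Each angle (in radians) is relative to the parent bone, so the absolute
    heading is accumulated while walking down the chain.
    """
    x, y = root
    heading = 0.0
    positions = [(x, y)]
    for length, angle in zip(bone_lengths, joint_angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions


# The same joint angles applied to two characters with different arm lengths.
angles = [math.pi / 2, -math.pi / 2]
short_arm = forward_kinematics_2d([1.0, 1.0], angles)
long_arm = forward_kinematics_2d([1.5, 2.0], angles)
```

Reusing the rotations while swapping the bone lengths yields a pose with the same shape on a differently proportioned skeleton, which is why predicting rotations and passing them through a kinematics layer suits retargetting.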

166 citations

Posted Content•

BodyNet: Volumetric Inference of 3D Human Body Shapes

Gül Varol, Duygu Ceylan¹, Bryan Russell¹, Jimei Yang¹, Ersin Yumer¹, Ivan Laptev, Cordelia Schmid

Adobe Systems¹

13 Apr 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: BodyNet is an end-to-end trainable network that benefits from a volumetric 3D loss, a multi-view re-projection loss, and intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose and achieves state-of-the-art performance.

Abstract: Human shape estimation is an important task for video editing, animation and fashion industry. Predicting 3D human body shape from natural images, however, is highly challenging due to factors such as variation in human bodies, clothing and viewpoint. Prior methods addressing this problem typically attempt to fit parametric body models with certain priors on pose and shape. In this work we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image. BodyNet is an end-to-end trainable network that benefits from (i) a volumetric 3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of them results in performance improvement as demonstrated by our experiments. To evaluate the method, we fit the SMPL model to our network output and show state-of-the-art results on the SURREAL and Unite the People datasets, outperforming recent approaches. Besides achieving state-of-the-art performance, our method also enables volumetric body-part segmentation.

150 citations

Book Chapter•DOI•

Flow-Grounded Spatial-Temporal Video Prediction from Still Images

Yijun Li¹, Chen Fang², Jimei Yang², Zhaowen Wang², Xin Lu², Ming-Hsuan Yang¹

University of California, Merced¹, Adobe Systems²

08 Sep 2018

TL;DR: The authors formulate the multi-frame prediction task as a multiple time step flow (multi-flow) prediction phase followed by a flow-to-frame synthesis phase.

Abstract: Existing video prediction methods mainly rely on observing multiple historical frames or focus on predicting the next one-frame. In this work, we study the problem of generating consecutive multiple future frames by observing one single still image only. We formulate the multi-frame prediction task as a multiple time step flow (multi-flow) prediction phase followed by a flow-to-frame synthesis phase. The multi-flow prediction is modeled in a variational probabilistic manner with spatial-temporal relationships learned through 3D convolutions. The flow-to-frame synthesis is modeled as a generative process in order to keep the predicted results lying closer to the manifold shape of real video sequence. Such a two-phase design prevents the model from directly looking at the high-dimensional pixel space of the frame sequence and is demonstrated to be more effective in predicting better and diverse results. Extensive experimental results on videos with different types of motion show that the proposed algorithm performs favorably against existing methods in terms of quality, diversity and human perceptual evaluation.

117 citations

Proceedings Article•

LayoutGAN: Generating Graphic Layouts with Wireframe Discriminators

Jianan Li¹, Jimei Yang², Aaron Hertzmann³, Jianming Zhang², Tingfa Xu¹

Beijing Institute of Technology¹, Adobe Systems², University of Toronto³

27 Sep 2018

TL;DR: A novel differentiable wireframe rendering layer is proposed that maps the generated layout to a wireframe image, upon which a CNN-based discriminator is used to optimize the layouts in image space.

Abstract: Layout is important for graphic design and scene generation. We propose a novel Generative Adversarial Network, called LayoutGAN, that synthesizes layouts by modeling geometric relations of different types of 2D elements. The generator of LayoutGAN takes as input a set of randomly-placed 2D graphic elements and uses self-attention modules to refine their labels and geometric parameters jointly to produce a realistic layout. Accurate alignment is critical for good layouts. We thus propose a novel differentiable wireframe rendering layer that maps the generated layout to a wireframe image, upon which a CNN-based discriminator is used to optimize the layouts in image space. We validate the effectiveness of LayoutGAN in various experiments including MNIST digit generation, document layout generation, clipart abstract scene generation and tangram graphic design.

Posted Content•

PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image

Chen Liu¹, Jimei Yang², Duygu Ceylan², Ersin Yumer, Yasutaka Furukawa³

Washington University in St. Louis¹, Adobe Systems², Simon Fraser University³

17 Apr 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, the authors proposed an end-to-end neural architecture for piece-wise planar reconstruction from a single RGB image, which can directly infer a set of plane parameters and corresponding plane segmentation masks.

Abstract: This paper proposes a deep neural network (DNN) for piece-wise planar depthmap reconstruction from a single RGB image. While DNNs have brought remarkable progress to single-image depth prediction, piece-wise planar depthmap reconstruction requires a structured geometry representation, and has been a difficult task to master even for DNNs. The proposed end-to-end DNN learns to directly infer a set of plane parameters and corresponding plane segmentation masks from a single RGB image. We have generated more than 50,000 piece-wise planar depthmaps for training and testing from ScanNet, a large-scale RGBD video database. Our qualitative and quantitative evaluations demonstrate that the proposed approach outperforms baseline methods in terms of both plane segmentation and depth estimation accuracy. To the best of our knowledge, this paper presents the first end-to-end neural architecture for piece-wise planar reconstruction from a single RGB image. Code and data are available at this https URL.

Patent•

Utilizing deep learning for boundary-aware image segmentation

Zhe Lin¹, Yibing Song¹, Xin Lu¹, Xiaohui Shen¹, Jimei Yang¹

Adobe Systems¹

15 May 2018

TL;DR: The disclosed systems and methods use a first neural network and a second neural network to generate image information, which is then used to generate a segmentation mask that corresponds to the object portrayed in the digital image.

Abstract: Systems and methods are disclosed for segmenting a digital image to identify an object portrayed in the digital image from background pixels in the digital image. In particular, in one or more embodiments, the disclosed systems and methods use a first neural network and a second neural network to generate image information used to generate a segmentation mask that corresponds to the object portrayed in the digital image. Specifically, in one or more embodiments, the disclosed systems and methods optimize a fit between a mask boundary of the segmentation mask to edges of the object portrayed in the digital image to accurately segment the object within the digital image.

Posted Content•

MAttNet: Modular Attention Network for Referring Expression Comprehension

Licheng Yu¹, Zhe Lin², Xiaohui Shen², Jimei Yang², Xin Lu², Mohit Bansal¹, Tamara L. Berg¹

University of North Carolina at Chapel Hill¹, Adobe Systems²

24 Jan 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: The authors decompose expressions into three modular components related to subject appearance, location, and relationship to other objects in an end-to-end framework, which allows the model to adapt flexibly to expressions containing different types of information.

Abstract: In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression. While most recent work treats expressions as a single unit, we propose to decompose them into three modular components related to subject appearance, location, and relationship to other objects. This allows us to flexibly adapt to expressions containing different types of information in an end-to-end framework. In our model, which we call the Modular Attention Network (MAttNet), two types of attention are utilized: language-based attention that learns the module weights as well as the word/phrase attention that each module should focus on; and visual attention that allows the subject and relationship modules to focus on relevant image components. Module weights combine scores from all three modules dynamically to output an overall score. Experiments show that MAttNet outperforms previous state-of-the-art methods by a large margin on both bounding-box-level and pixel-level comprehension tasks. Demo and code are provided.

Posted Content•

Flow-Grounded Spatial-Temporal Video Prediction from Still Images

Yijun Li¹, Chen Fang², Jimei Yang², Zhaowen Wang², Xin Lu², Ming-Hsuan Yang¹

University of California, Merced¹, Adobe Systems²

25 Jul 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work formulates the multi-frame prediction task as a multiple time step flow (multi-flow) prediction phase followed by a flow-to-frame synthesis phase, which prevents the model from directly looking at the high-dimensional pixel space of the frame sequence and is demonstrated to be more effective in predicting better and diverse results.

Abstract: Existing video prediction methods mainly rely on observing multiple historical frames or focus on predicting the next one-frame. In this work, we study the problem of generating consecutive multiple future frames by observing one single still image only. We formulate the multi-frame prediction task as a multiple time step flow (multi-flow) prediction phase followed by a flow-to-frame synthesis phase. The multi-flow prediction is modeled in a variational probabilistic manner with spatial-temporal relationships learned through 3D convolutions. The flow-to-frame synthesis is modeled as a generative process in order to keep the predicted results lying closer to the manifold shape of real video sequence. Such a two-phase design prevents the model from directly looking at the high-dimensional pixel space of the frame sequence and is demonstrated to be more effective in predicting better and diverse results. Extensive experimental results on videos with different types of motion show that the proposed algorithm performs favorably against existing methods in terms of quality, diversity and human perceptual evaluation.

Patent•

Automatically segmenting images based on natural language phrases

Lin Zhe¹, Lu Xin¹, Shen Xiaohui¹, Jimei Yang¹, Chenxi Liu¹

Adobe Systems¹

19 Jan 2018

TL;DR: A fully convolutional neural network identifies and encodes the image features, a word embedding model generates the token vectors, and a recurrent neural network (RNN) iteratively updates a segmentation map based on combinations of the image feature encoding and the token vectors.

Abstract: The invention is directed towards segmenting images based on natural language phrases. An image and an n-gram, including a sequence of tokens, are received. An encoding of image features and a sequence of token vectors are generated. A fully convolutional neural network identifies and encodes the image features. A word embedding model generates the token vectors. A recurrent neural network (RNN) iteratively updates a segmentation map based on combinations of the image feature encoding and the token vectors. The segmentation map identifies which pixels are included in an image region referenced by the n-gram. A segmented image is generated based on the segmentation map. The RNN may be a convolutional multimodal RNN. A separate RNN, such as a long short-term memory network, may iteratively update an encoding of semantic features based on the order of tokens. The first RNN may update the segmentation map based on the semantic feature encoding.

Posted Content•

Neural Kinematic Networks for Unsupervised Motion Retargetting

Ruben Villegas¹, Jimei Yang², Duygu Ceylan², Honglak Lee¹

University of Michigan¹, Adobe Systems²

16 Apr 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, a recurrent neural network architecture with a Forward Kinematics layer and cycle consistency based adversarial training objective is proposed for unsupervised motion retargeting, which works online and adapts the motion sequence on-the-fly as new frames are received.

Abstract: We propose a recurrent neural network architecture with a Forward Kinematics layer and cycle consistency based adversarial training objective for unsupervised motion retargetting. Our network captures the high-level properties of an input motion by the forward kinematics layer, and adapts them to a target character with different skeleton bone lengths (e.g., shorter, longer arms etc.). Collecting paired motion training sequences from different characters is expensive. Instead, our network utilizes cycle consistency to learn to solve the Inverse Kinematics problem in an unsupervised manner. Our method works online, i.e., it adapts the motion sequence on-the-fly as new frames are received. In our experiments, we use the Mixamo animation data to test our method for a variety of motions and characters and achieve state-of-the-art results. We also demonstrate motion retargetting from monocular human videos to 3D characters using an off-the-shelf 3D pose estimator.

Posted Content•

Learning to Sketch with Deep Q Networks and Demonstrated Strokes

Tao Zhou, Chen Fang, Zhaowen Wang, Jimei Yang, Byungmoon Kim, Zhili Chen, Jonathan Brandt, Demetri Terzopoulos

14 Oct 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: A two-stage learning framework to teach a machine to doodle in a simulated painting environment via Stroke Demonstration and deep Q-learning (SDQ), which generates a sequence of pen actions to reproduce a reference drawing and mimics the behavior of human painters.

Abstract: Doodling is a useful and common intelligent skill that people can learn and master. In this work, we propose a two-stage learning framework to teach a machine to doodle in a simulated painting environment via Stroke Demonstration and deep Q-learning (SDQ). The developed system, Doodle-SDQ, generates a sequence of pen actions to reproduce a reference drawing and mimics the behavior of human painters. In the first stage, it learns to draw simple strokes by imitating in supervised fashion from a set of stroke-action pairs collected from artist paintings. In the second stage, it is challenged to draw real and more complex doodles without ground truth actions; thus, it is trained with Q-learning. Our experiments confirm that (1) doodling can be learned without direct step-by-step action supervision and (2) pretraining with stroke demonstration via supervised learning is important to improve performance. We further show that Doodle-SDQ is effective at producing plausible drawings in different media types, including sketch and watercolor.

Patent•

Deep high-resolution style synthesis

Yijun Li¹, Chen Fang¹, Jimei Yang¹, Zhaowen Wang¹, Xin Lu¹

Adobe Systems¹

23 Aug 2018

TL;DR: Techniques are described for synthesizing an image style based on a plurality of neural networks, in which a computer system selects a style image based on user input that identifies the style image.

Abstract: In some embodiments, techniques for synthesizing an image style based on a plurality of neural networks are described. A computer system selects a style image based on user input that identifies the style image. The computer system generates an image based on a generator neural network and a loss neural network. The generator neural network outputs the synthesized image based on a noise vector and the style image and is trained based on style features generated from the loss neural network. The loss neural network outputs the style features based on a training image. The training image and the style image have a same resolution. The style features are generated at different resolutions of the training image. The computer system provides the synthesized image to a user device in response to the user input.

Proceedings Article•DOI•

Brush stroke synthesis with a generative adversarial network driven by physically based simulation

[...]

Rundong Wu¹, Zhili Chen², Zhaowen Wang², Jimei Yang², Steve Marschner¹•Institutions (2)

Cornell University¹, Adobe Systems²

17 Aug 2018

TL;DR: A novel approach that uses a generative adversarial network (GAN) to synthesize realistic oil painting brush strokes, where the network is trained with data generated by a high-fidelity simulator to replace the expensive fluid simulation with a neural network.

Abstract: We introduce a novel approach that uses a generative adversarial network (GAN) to synthesize realistic oil painting brush strokes, where the network is trained with data generated by a high-fidelity simulator. Among approaches to digitally synthesizing natural media painting strokes, physically based simulation produces by far the most realistic visual results and allows the most intuitive control of stroke variations. However, accurate physics simulations are known to be computationally expensive and often cannot meet the performance requirements of painting applications. In our work, we propose to replace the expensive fluid simulation with a neural network. The network takes the existing canvas and a new stroke trajectory as input and produces the height and color of the new stroke as output. We train the network with a dataset generated with a high-quality offline simulator. The network is able to produce visual quality comparable to the offline simulator with better performance than the existing real-time oil painting simulator. Finally, we implement a real-time painting system using the trained network.

Patent•

Editing digital images utilizing a neural network with an in-network rendering layer

[...]

Mehmet Ersin Yumer¹, Jimei Yang¹, Guilin Liu¹, Aksit Duygu Ceylan¹•Institutions (1)

Adobe Systems¹

06 Sep 2018

TL;DR: A neural network is trained to decompose an input digital image into intrinsic physical properties (e.g., material, illumination, and shape); one property is replaced with a target property, and a rendering layer generates a modified digital image based on the target property and the remaining (unsubstituted) intrinsic physical properties.

Abstract: The present disclosure includes methods and systems for generating modified digital images utilizing a neural network that includes a rendering layer. In particular, the disclosed systems and methods can train a neural network to decompose an input digital image into intrinsic physical properties (e.g., material, illumination, and shape). Moreover, the systems and methods can replace one of the intrinsic physical properties with a target property (e.g., a modified material, illumination, or shape). The systems and methods can utilize a rendering layer trained to synthesize a digital image to generate a modified digital image based on the target property and the remaining (unsubstituted) intrinsic physical properties. Systems and methods can increase the accuracy of modified digital images by generating modified digital images that realistically reflect a confluence of intrinsic physical properties of an input digital image and target (i.e., modified) properties.

Patent•

Generating novel views of a three-dimensional object based on a single two-dimensional image

[...]

Jimei Yang¹, Aksit Duygu Ceylan¹, Mehmet Ersin Yumer¹, Eunbyung Park¹•Institutions (1)

Adobe Systems¹

21 Dec 2018

TL;DR: A source image, taken from a source viewpoint and including a common portion of a 3D object, is encoded in 2D data, and an intermediate image that includes an intermediate view of the object is generated based on the data.

Abstract: Embodiments are directed towards providing a target view, from a target viewpoint, of a 3D object. A source image, from a source viewpoint and including a common portion of the object, is encoded in 2D data. An intermediate image that includes an intermediate view of the object is generated based on the data. The intermediate view is from the target viewpoint and includes the common portion of the object and a disoccluded portion of the object not visible in the source image. The intermediate image includes a common region and a disoccluded region corresponding to the disoccluded portion of the object. The disoccluded region is updated to include a visual representation of a prediction of the disoccluded portion of the object. The prediction is based on a trained image completion model. The target view is based on the common region and the updated disoccluded region of the intermediate image.

Proceedings Article•

Learning to Doodle with Stroke Demonstrations and Deep Q-Networks.

[...]

Tao Zhou¹, Chen Fang², Zhaowen Wang², Jimei Yang², Byungmoon Kim², Zhili Chen², Jonathan Brandt², Demetri Terzopoulos³•Institutions (3)

University of California, Los Angeles¹, Adobe Systems², University of California³

01 Jan 2018

TL;DR: A two-stage learning framework to teach a machine to doodle in a simulated painting environment via Stroke Demonstration and deep Q-learning (SDQ), which generates a sequence of pen actions to reproduce a reference drawing and mimics the behavior of human painters.

Abstract: Doodling is a useful and common intelligent skill that people can learn and master. In this work, we propose a two-stage learning framework to teach a machine to doodle in a simulated painting environment via Stroke Demonstration and deep Q-learning (SDQ). The developed system, Doodle-SDQ, generates a sequence of pen actions to reproduce a reference drawing and mimics the behavior of human painters. In the first stage, it learns to draw simple strokes by imitating in supervised fashion from a set of stroke-action pairs collected from artist paintings. In the second stage, it is challenged to draw real and more complex doodles without ground truth actions; thus, it is trained with Q-learning. Our experiments confirm that (1) doodling can be learned without direct step-by-step action supervision and (2) pretraining with stroke demonstration via supervised learning is important to improve performance. We further show that Doodle-SDQ is effective at producing plausible drawings in different media types, including sketch and watercolor. A short video can be found at https://www.youtube.com/watch?v=-5FVUQFQTaE.
