
Showing papers by "Jimei Yang" published in 2020


Book ChapterDOI
23 Aug 2020
TL;DR: A deep generative model is introduced that outputs not only an inpainting result but also a corresponding confidence map; using this map as feedback, it progressively fills the hole by trusting only high-confidence pixels at each iteration and focusing on the remaining pixels in the next iteration.
Abstract: Existing image inpainting methods often produce artifacts when dealing with large holes in real applications. To address this challenge, we propose an iterative inpainting method with a feedback mechanism. Specifically, we introduce a deep generative model which not only outputs an inpainting result but also a corresponding confidence map. Using this map as feedback, it progressively fills the hole by trusting only high-confidence pixels inside the hole at each iteration and focuses on the remaining pixels in the next iteration. As it reuses partial predictions from the previous iterations as known pixels, this process gradually improves the result. In addition, we propose a guided upsampling network to enable generation of high-resolution inpainting results. We achieve this by extending the Contextual Attention module [39] to borrow high-resolution feature patches in the input image. Furthermore, to mimic real object removal scenarios, we collect a large object mask dataset and synthesize more realistic training data that better simulates user inputs. Experiments show that our method significantly outperforms existing methods in both quantitative and qualitative evaluations. More results and Web APP are available at https://zengxianyu.github.io/iic.

125 citations
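
For illustration, here is a minimal Python sketch of the confidence-feedback loop this abstract describes; inpaint_step is a hypothetical stand-in for the paper's generative model (it simply mean-fills the hole and reports a flat confidence), and the iteration count and threshold are assumed values.

import numpy as np

def inpaint_step(image, mask):
    # Placeholder "model": mean-fill the hole and report a flat confidence there.
    filled = image.copy()
    filled[mask] = image[~mask].mean(axis=0)
    confidence = np.where(mask, 0.6, 1.0)
    return filled, confidence

def iterative_inpaint(image, mask, iterations=4, threshold=0.5):
    # Progressively trust only high-confidence pixels inside the hole; pixels
    # accepted in one pass are treated as known pixels in the next pass.
    result, remaining = image.copy(), mask.copy()
    for step in range(iterations):
        if not remaining.any():
            break
        filled, confidence = inpaint_step(result, remaining)
        last_pass = step == iterations - 1
        trusted = remaining & ((confidence >= threshold) | last_pass)
        result[trusted] = filled[trusted]
        remaining = remaining & ~trusted
    return result

image = np.random.rand(64, 64, 3)
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:40] = True                 # the hole to be filled
completed = iterative_inpaint(image, mask)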


Book ChapterDOI
23 Aug 2020
TL;DR: A physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input and produces motions that are significantly more realistic than those from purely kinematic methods, substantially improving quantitative measures of both kinematic and dynamic plausibility.
Abstract: Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors that violate physical constraints, such as feet penetrating the ground and bodies leaning at extreme angles. In this paper, we present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input. We first estimate ground contact timings with a novel prediction network which is trained without hand-labeled data. A physics-based trajectory optimization then solves for a physically-plausible motion, based on the inputs. We show this process produces motions that are significantly more realistic than those from purely kinematic methods, substantially improving quantitative measures of both kinematic and dynamic plausibility. We demonstrate our method on character animation and pose estimation tasks on dynamic motions of dancing and sports with complex contact patterns.

60 citations
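
As a toy illustration of the kind of contact-aware trajectory optimization the abstract describes (not the paper's actual formulation), the sketch below refines a noisy 1-D foot-height trajectory: it stays close to the kinematic estimate, pins contact frames to the ground, and penalizes acceleration. The contact labels, weights, and 1-D setup are assumptions; in the paper, contacts come from a learned prediction network.

import numpy as np

def refine_foot_height(z, contact, w_data=1.0, w_contact=10.0, w_smooth=4.0):
    # Stack weighted least-squares residuals and solve for the refined trajectory.
    T = len(z)
    rows, rhs = [], []
    for t in range(T):                       # data term: stay near the kinematic estimate
        e = np.zeros(T); e[t] = w_data
        rows.append(e); rhs.append(w_data * z[t])
    for t in np.where(contact)[0]:           # contact term: foot height -> 0
        e = np.zeros(T); e[t] = w_contact
        rows.append(e); rhs.append(0.0)
    for t in range(1, T - 1):                # smoothness term: small acceleration
        e = np.zeros(T); e[[t - 1, t, t + 1]] = w_smooth * np.array([1.0, -2.0, 1.0])
        rows.append(e); rhs.append(0.0)
    A, b = np.vstack(rows), np.array(rhs)
    return np.linalg.lstsq(A, b, rcond=None)[0]

kinematic = 0.02 * np.random.randn(60) + np.r_[np.zeros(30), 0.2 * np.ones(30)]
contact = np.r_[np.ones(30, dtype=bool), np.zeros(30, dtype=bool)]
refined = refine_foot_height(kinematic, contact)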


Posted Content
TL;DR: Attribute-conditioned Layout GAN is introduced to incorporate the attributes of design elements for graphic layout generation by forcing both the generator and the discriminator to meet attribute conditions.
Abstract: Modeling layout is an important first step for graphic design. Recently, methods for generating graphic layouts have progressed, particularly with Generative Adversarial Networks (GANs). However, the problem of specifying the locations and sizes of design elements usually involves constraints with respect to element attributes, such as area, aspect ratio and reading-order. Automating attribute conditional graphic layouts remains a complex and unsolved problem. In this paper, we introduce Attribute-conditioned Layout GAN to incorporate the attributes of design elements for graphic layout generation by forcing both the generator and the discriminator to meet attribute conditions. Due to the complexity of graphic designs, we further propose an element dropout method to make the discriminator look at partial lists of elements and learn their local patterns. In addition, we introduce various loss designs following different design principles for layout optimization. We demonstrate that the proposed method can synthesize graphic layouts conditioned on different element attributes. It can also adjust well-designed layouts to new sizes while retaining elements' original reading-orders. The effectiveness of our method is validated through a user study.

30 citations
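
A minimal PyTorch sketch of the attribute-conditioned generator/discriminator pattern and the element-dropout idea described above; the element count, attribute encoding, and layer sizes are illustrative assumptions rather than the paper's architecture.

import torch
import torch.nn as nn

N_ELEM, ATTR_DIM, NOISE_DIM = 8, 4, 16     # elements per layout, attributes per element

class LayoutGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + ATTR_DIM, 64), nn.ReLU(),
            nn.Linear(64, 4), nn.Sigmoid())            # (x, y, w, h) per element
    def forward(self, noise, attrs):                    # (B, N, NOISE_DIM), (B, N, ATTR_DIM)
        return self.net(torch.cat([noise, attrs], dim=-1))

class LayoutDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.elem = nn.Sequential(nn.Linear(4 + ATTR_DIM, 64), nn.ReLU())
        self.head = nn.Linear(64, 1)
    def forward(self, boxes, attrs, keep_prob=0.7):
        h = self.elem(torch.cat([boxes, attrs], dim=-1))            # per-element features
        keep = (torch.rand(h.shape[:2], device=h.device) < keep_prob).float()
        h = h * keep.unsqueeze(-1)                                  # element dropout
        return self.head(h.sum(1) / keep.sum(1, keepdim=True).clamp(min=1))

G, D = LayoutGenerator(), LayoutDiscriminator()
attrs = torch.rand(2, N_ELEM, ATTR_DIM)                 # e.g. class, area, aspect ratio, order
boxes = G(torch.randn(2, N_ELEM, NOISE_DIM), attrs)
realism = D(boxes, attrs)                               # both G and D see the attribute conditions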


Proceedings ArticleDOI
01 Mar 2020
TL;DR: A neural network based detector for localizing ground contact events of human feet is presented and used to impose a physical constraint for optimization of the whole human dynamics in a video.
Abstract: In this paper, we aim to reduce the footskate artifacts when reconstructing human dynamics from monocular RGB videos. Recent work has made substantial progress in improving the temporal smoothness of the reconstructed motion trajectories. Their results, however, still suffer from severe foot skating and slippage artifacts. To tackle this issue, we present a neural network based detector for localizing ground contact events of human feet and use it to impose a physical constraint for optimization of the whole human dynamics in a video. We present a detailed study on the proposed ground contact detector and demonstrate high-quality human motion reconstruction results in various videos.

23 citations
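
In the spirit of the ground contact detector described above, the sketch below is a small temporal CNN that maps a window of 2D keypoints to per-frame, per-foot contact probabilities; the joint count, clip length, and architecture are assumptions, not the paper's network.

import torch
import torch.nn as nn

N_JOINTS, N_FEET = 17, 2                      # e.g. COCO-style 2D keypoints, two feet

class ContactDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(N_JOINTS * 2, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, N_FEET, kernel_size=1))
    def forward(self, keypoints):             # (batch, frames, joints, 2), normalized coordinates
        b, t, j, c = keypoints.shape
        x = keypoints.reshape(b, t, j * c).transpose(1, 2)    # (B, J*2, T)
        return torch.sigmoid(self.net(x)).transpose(1, 2)     # (B, T, N_FEET)

detector = ContactDetector()
probs = detector(torch.rand(1, 120, N_JOINTS, 2))             # a 120-frame clip
contacts = probs > 0.5                        # binary labels used as optimization constraints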


Posted Content
TL;DR: This work introduces a biomechanically constrained generative adversarial network that performs long-term inbetweening of human motions, conditioned on keyframe constraints.
Abstract: The ability to generate complex and realistic human body animations at scale, while following specific artistic constraints, has been a fundamental goal for the game and animation industry for decades. Popular techniques include key-framing, physics-based simulation, and database methods via motion graphs. Recently, motion generators based on deep learning have been introduced. Although these learning models can automatically generate highly intricate stylized motions of arbitrary length, they still lack user control. To this end, we introduce the problem of long-term inbetweening, which involves automatically synthesizing complex motions over a long time interval given very sparse keyframes by users. We identify a number of challenges related to this problem, including maintaining biomechanical and keyframe constraints, preserving natural motions, and designing the entire motion sequence holistically while considering all constraints. We introduce a biomechanically constrained generative adversarial network that performs long-term inbetweening of human motions, conditioned on keyframe constraints. This network uses a novel two-stage approach where it first predicts local motion in the form of joint angles, and then predicts global motion, i.e. the global path that the character follows. Since there are typically a number of possible motions that could satisfy the given user constraints, we also enable our network to generate a variety of outputs with a scheme that we call Motion DNA. This approach allows the user to manipulate and influence the output content by feeding seed motions (DNA) to the network. Trained with 79 classes of captured motion data, our network performs robustly on a variety of highly complex motion styles.

17 citations
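
A schematic PyTorch sketch of the two-stage idea described above, where local joint angles are predicted first and the global root path second, with a "Motion DNA" seed modeled as a single conditioning vector; the GRU backbone and all sizes are assumptions for illustration.

import torch
import torch.nn as nn

N_JOINTS, DNA_DIM, HID = 24, 32, 128

class TwoStageInbetweener(nn.Module):
    def __init__(self):
        super().__init__()
        self.local = nn.GRU(N_JOINTS * 3 + DNA_DIM, HID, batch_first=True)
        self.local_out = nn.Linear(HID, N_JOINTS * 3)    # stage 1: per-frame joint angles
        self.globl = nn.GRU(N_JOINTS * 3, HID, batch_first=True)
        self.globl_out = nn.Linear(HID, 3)               # stage 2: per-frame root translation
    def forward(self, keyframe_track, dna):
        # keyframe_track: (B, T, N_JOINTS*3), a naive interpolation of the sparse user keyframes
        # dna: (B, DNA_DIM), a seed code that steers the style of the filled-in motion
        dna_t = dna.unsqueeze(1).expand(-1, keyframe_track.size(1), -1)
        h, _ = self.local(torch.cat([keyframe_track, dna_t], dim=-1))
        angles = self.local_out(h)
        g, _ = self.globl(angles)
        return angles, self.globl_out(g)

net = TwoStageInbetweener()
angles, root_path = net(torch.randn(1, 240, N_JOINTS * 3), torch.randn(1, DNA_DIM))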


Book ChapterDOI
23 Aug 2020
TL;DR: A review of the AIM 2020 Extreme Image Inpainting Challenge, which covered two tracks, classical image inpainting and semantically guided image inpainting, with the goal of inpainting large parts of the image.
Abstract: This paper reviews the AIM 2020 challenge on extreme image inpainting. This report focuses on proposed solutions and results for two different tracks on extreme image inpainting: classical image inpainting and semantically guided image inpainting. The goal of track 1 is to inpaint a large part of the image with no supervision. Similarly, the goal of track 2 is to inpaint the image by having access to the entire semantic segmentation map of the input. The challenge had 88 and 74 participants, respectively. 11 and 6 teams competed in the final phase of the challenge, respectively. This report gauges current solutions and sets a benchmark for future extreme image inpainting methods.

15 citations


Journal ArticleDOI
06 Oct 2020
TL;DR: This paper proposes a motion synthesis technique that can rapidly generate animated motion for characters engaged in two-party conversations, synthesizing gestures and other body motions that synchronize with novel input audio clips.
Abstract: Plausible conversations among characters are required to generate the ambiance of social settings such as a restaurant, hotel lobby, or cocktail party. In this paper, we propose a motion synthesis technique that can rapidly generate animated motion for characters engaged in two-party conversations. Our system synthesizes gestures and other body motions for dyadic conversations that synchronize with novel input audio clips. Human conversations feature many different forms of coordination and synchronization. For example, speakers use hand gestures to emphasize important points, and listeners often nod in agreement or acknowledgment. To achieve the desired degree of realism, our method first constructs a motion graph that preserves the statistics of a database of recorded conversations performed by a pair of actors. This graph is then used to search for a motion sequence that respects three forms of audio-motion coordination in human conversations: coordination to phonemic clause, listener response, and partner's hesitation pause. We assess the quality of the generated animations through a user study that compares them to the originally recorded motion and evaluate the effects of each type of audio-motion coordination via ablation studies.

12 citations
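
As a toy illustration of searching a motion graph under audio-motion coordination (a much simplified stand-in for the paper's method), the sketch below walks a tiny hand-made graph and prefers gesture clips near stressed phonemic-clause times for the speaker and nods for the listener; the graph, labels, and timings are made up.

import random

motion_graph = {
    "idle_a":    {"label": "idle",    "next": ["idle_b", "gesture_a", "nod_a"]},
    "idle_b":    {"label": "idle",    "next": ["idle_a", "gesture_b", "nod_a"]},
    "gesture_a": {"label": "gesture", "next": ["idle_a", "idle_b"]},
    "gesture_b": {"label": "gesture", "next": ["idle_a", "idle_b"]},
    "nod_a":     {"label": "nod",     "next": ["idle_a", "idle_b"]},
}

def synthesize(role, clause_times, n_clips, clip_len=1.0, start="idle_a"):
    # Greedy walk: pick a clip whose label matches what the audio asks for at that time.
    wanted = "gesture" if role == "speaker" else "nod"
    sequence, current, t = [start], start, 0.0
    for _ in range(n_clips - 1):
        t += clip_len
        near_clause = any(abs(t - c) < clip_len for c in clause_times)
        options = motion_graph[current]["next"]
        preferred = [n for n in options if motion_graph[n]["label"] == wanted]
        current = random.choice(preferred if near_clause and preferred else options)
        sequence.append(current)
    return sequence

clauses = [2.0, 5.0, 7.5]                                     # stressed phonemic-clause times (s)
speaker_motion = synthesize("speaker", clauses, n_clips=10)
listener_motion = synthesize("listener", [c + 0.5 for c in clauses], n_clips=10)  # delayed nods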


Journal ArticleDOI
TL;DR: A system that synthesizes realistic, simulation-ready liquid splashes from simple user sketch input, using a conditional generative adversarial network trained with physics-based simulation data followed by model refinement.
Abstract: Splashing is one of the most fascinating liquid phenomena in the real world and it is favored by artists to create stunning visual effects, both statically and dynamically. Unfortunately, the generation of complex and specialized liquid splashes is a challenging task and often requires considerable time and effort. In this paper, we present a novel system that synthesizes realistic liquid splashes from simple user sketch input. Our system adopts a conditional generative adversarial network (cGAN) trained with physics-based simulation data to produce raw liquid splash models from input sketches, and then applies model refinement processes to further improve their small-scale details. The system considers not only the trajectory of every user stroke, but also its speed, which makes the splash model simulation-ready with its underlying 3D flow. Compared with simulation-based modeling techniques through trials and errors, our system offers flexibility, convenience and intuition in liquid splash design and editing. We evaluate the usability and the efficiency of our system in an immersive virtual reality environment. Thanks to this system, an amateur user can now generate a variety of realistic liquid splashes in just a few minutes.

7 citations
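
For illustration, a small sketch of how a user stroke might be encoded before conditioning a generator of this kind: the stroke trajectory fills one channel of a grid and its per-point speed another, so both shape and speed inform the output. The grid size and encoding are assumptions, not the system's actual input representation.

import numpy as np

def encode_stroke(points, speeds, size=64):
    # points: (N, 2) positions in [0, 1]^2; speeds: (N,) stroke speed at each point.
    grid = np.zeros((2, size, size), dtype=np.float32)
    for (x, y), s in zip(points, speeds):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[0, row, col] = 1.0                          # trajectory occupancy channel
        grid[1, row, col] = max(grid[1, row, col], s)    # speed channel
    return grid

t = np.linspace(0.0, 1.0, 200)
points = np.stack([t, 0.5 + 0.3 * np.sin(4 * np.pi * t)], axis=1)   # a wavy stroke
speeds = np.linalg.norm(np.gradient(points, axis=0), axis=1) * len(t)
condition = encode_stroke(points, speeds)    # would be fed to the conditional generator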


Patent
Zhe Lin, Xin Lu, Xiaohui Shen, Jimei Yang, Jiahui Yu
25 Aug 2020
TL;DR: Digital image completion by jointly learning generation and patch matching: a digital image having at least one hole is provided as input to an image completer built on a dual-stage framework that combines a coarse image neural network and an image refinement network.
Abstract: Digital image completion by learning generation and patch matching jointly is described. Initially, a digital image having at least one hole is received. This holey digital image is provided as input to an image completer formed with a dual-stage framework that combines a coarse image neural network and an image refinement network. The coarse image neural network generates a coarse prediction of imagery for filling the holes of the holey digital image. The image refinement network receives the coarse prediction as input, refines the coarse prediction, and outputs a filled digital image having refined imagery that fills these holes. The image refinement network generates refined imagery using a patch matching technique, which includes leveraging information corresponding to patches of known pixels for filtering patches generated based on the coarse prediction. Based on this, the image completer outputs the filled digital image with the refined imagery.

6 citations
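
A naive numpy version of the patch-matching idea in the refinement stage: each patch centered inside the hole of the coarse prediction is compared against patches from the known region and borrows the value of its best match. This is only a sketch of the principle, not the patented network; the patch size and similarity measure are assumptions.

import numpy as np

def match_patches(coarse, mask, patch=3):
    # coarse: (H, W) grayscale coarse prediction; mask: (H, W) bool, True inside the hole.
    h, w = coarse.shape
    r = patch // 2
    known, holes = [], []
    for y in range(r, h - r):
        for x in range(r, w - r):
            win = mask[y - r:y + r + 1, x - r:x + r + 1]
            (holes if win.any() else known).append((y, x))
    known_feats = np.stack([coarse[y - r:y + r + 1, x - r:x + r + 1].ravel() for y, x in known])
    out = coarse.copy()
    for y, x in holes:
        q = coarse[y - r:y + r + 1, x - r:x + r + 1].ravel()
        sims = known_feats @ q / (np.linalg.norm(known_feats, axis=1) * np.linalg.norm(q) + 1e-8)
        by, bx = known[int(sims.argmax())]
        out[y, x] = coarse[by, bx]          # borrow the center of the best-matching known patch
    return out

coarse = np.random.rand(32, 32)              # stand-in for the coarse network's prediction
mask = np.zeros((32, 32), dtype=bool)
mask[12:20, 12:20] = True
refined = match_patches(coarse, mask)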


Patent
Zhe Lin, Xin Lu, Xiaohui Shen, Jimei Yang, Jiahui Yu
02 Jun 2020
TL;DR: An offset prediction neural network predicts patch displacement maps whose offset vectors represent a displacement of pixels of the digital image to different locations for performing an image editing operation; the pixel values of the affected pixels are then set according to this mapping.
Abstract: Predicting patch displacement maps using a neural network is described. Initially, a digital image on which an image editing operation is to be performed is provided as input to a patch matcher having an offset prediction neural network. From this image and based on the image editing operation for which this network is trained, the offset prediction neural network generates an offset prediction formed as a displacement map, which has offset vectors that represent a displacement of pixels of the digital image to different locations for performing the image editing operation. Pixel values of the digital image are copied to the image pixels affected by the operation by: determining the offset vectors that correspond to the image pixels affected by the image editing operation and mapping the pixel values of the image pixels represented by the determined offset vectors to the affected pixels. According to this mapping, the pixel values of the affected pixels are set, effective to perform the image editing operation.

2 citations
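
The core of applying a predicted displacement map can be illustrated in a few lines: each affected pixel's offset vector says where in the image its value is copied from. The offsets below are hand-made; in the patent they come from the trained offset prediction network.

import numpy as np

def apply_displacement(image, offsets, affected):
    # image: (H, W, 3); offsets: (H, W, 2) integer (dy, dx); affected: (H, W) bool.
    h, w = affected.shape
    out = image.copy()
    ys, xs = np.nonzero(affected)
    src_y = np.clip(ys + offsets[ys, xs, 0], 0, h - 1)
    src_x = np.clip(xs + offsets[ys, xs, 1], 0, w - 1)
    out[ys, xs] = image[src_y, src_x]        # map source pixel values onto the affected pixels
    return out

image = np.random.rand(48, 48, 3)
affected = np.zeros((48, 48), dtype=bool)
affected[10:20, 10:20] = True                # pixels touched by the editing operation
offsets = np.zeros((48, 48, 2), dtype=int)
offsets[affected] = [0, 25]                  # toy offsets: copy from 25 pixels to the right
edited = apply_displacement(image, offsets, affected)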


Patent
Chen Fang, Zhe Lin, Zhaowen Wang, Zhang Yulun, Yilin Wang, Jimei Yang
13 Aug 2020
TL;DR: In this paper, a style of a digital image is transferred to another digital image of arbitrary resolution using a patch-by-patch style transfer process at several increasing resolutions, or scale levels, of both content and style images.
Abstract: A style of a digital image is transferred to another digital image of arbitrary resolution. A high-resolution (HR) content image is segmented into several low-resolution (LR) patches. The resolution of a style image is matched to have the same resolution as the LR content image patches. Style transfer is then performed on a patch-by-patch basis using, for example, a pair of feature transforms—whitening and coloring. The patch-by-patch style transfer process is then repeated at several increasing resolutions, or scale levels, of both the content and style images. The results of the style transfer at each scale level are incorporated into successive scale levels up to and including the original HR scale. As a result, style transfer can be performed with images having arbitrary resolutions to produce visually pleasing results with good spatial consistency.
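
For reference, a compact numpy sketch of the whitening and coloring transform (WCT) that is applied patch by patch at each scale level; the encoder/decoder that would extract and invert these features is omitted, and the feature dimensions are assumptions.

import numpy as np

def wct(content_feat, style_feat, eps=1e-5):
    # content_feat, style_feat: (C, N) feature matrices (channels x spatial positions).
    def cov_power(feat, power):
        centered = feat - feat.mean(axis=1, keepdims=True)
        cov = centered @ centered.T / (feat.shape[1] - 1) + eps * np.eye(feat.shape[0])
        vals, vecs = np.linalg.eigh(cov)
        return vecs @ np.diag(vals ** power) @ vecs.T, centered
    whiten, content_centered = cov_power(content_feat, -0.5)   # covariance^(-1/2)
    color, _ = cov_power(style_feat, 0.5)                      # covariance^(+1/2)
    out = color @ (whiten @ content_centered)                  # whiten, then re-color
    return out + style_feat.mean(axis=1, keepdims=True)

content = np.random.rand(64, 32 * 32)    # features of one low-resolution content patch
style = np.random.rand(64, 32 * 32)      # style features matched to the same resolution
stylized = wct(content, style)           # repeated per patch and per scale level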

Patent
14 May 2020
TL;DR: A GAN system trains a generator module to refine digital image layouts; a wireframe rendering discriminator module rasterizes each refined layout into a wireframe image, which is then compared with at least one ground truth digital image layout using a loss function as part of machine learning.
Abstract: Digital image layout training is described using wireframe rendering within a generative adversarial network (GAN) system. A GAN system is employed to train the generator module to refine digital image layouts. To do so, a wireframe rendering discriminator module rasterizes a refined training digital image layout received from a generator module into a wireframe digital image layout. The wireframe digital image layout is then compared with at least one ground truth digital image layout using a loss function as part of machine learning by the wireframe discriminator module. The generator module is then trained by backpropagating a result of the comparison.
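
A small sketch of the wireframe-rendering step described above: a layout, given as element bounding boxes, is rasterized into a wireframe image that an image discriminator can compare against ground-truth layouts. This numpy version only illustrates the rendering; a differentiable rasterizer would be needed for end-to-end training, and the canvas size and example boxes are assumptions.

import numpy as np

def render_wireframe(boxes, size=128):
    # boxes: list of (x, y, w, h) element bounding boxes in [0, 1] layout coordinates.
    canvas = np.zeros((size, size), dtype=np.float32)
    for x, y, w, h in boxes:
        x0, y0 = int(x * size), int(y * size)
        x1 = min(int((x + w) * size), size - 1)
        y1 = min(int((y + h) * size), size - 1)
        canvas[y0:y1 + 1, [x0, x1]] = 1.0    # vertical edges of the element's box
        canvas[[y0, y1], x0:x1 + 1] = 1.0    # horizontal edges of the element's box
    return canvas

layout = [(0.10, 0.05, 0.80, 0.15),          # header
          (0.10, 0.25, 0.50, 0.60),          # text block
          (0.65, 0.25, 0.25, 0.30)]          # image
wireframe = render_wireframe(layout)         # consumed by the wireframe discriminator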

Posted Content
TL;DR: In this article, a physics-based method for inferring 3D human motion from video sequences is presented, which takes initial 2D and 3D pose estimates as input and solves for a physically plausible motion, based on the inputs.
Abstract: Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors that violate physical constraints, such as feet penetrating the ground and bodies leaning at extreme angles. In this paper, we present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input. We first estimate ground contact timings with a novel prediction network which is trained without hand-labeled data. A physics-based trajectory optimization then solves for a physically-plausible motion, based on the inputs. We show this process produces motions that are significantly more realistic than those from purely kinematic methods, substantially improving quantitative measures of both kinematic and dynamic plausibility. We demonstrate our method on character animation and pose estimation tasks on dynamic motions of dancing and sports with complex contact patterns.

Patent
Chen Fang, Zhe Lin, Zhaowen Wang, Zhang Yulun, Wang Yilin, Jimei Yang
16 Jul 2020
TL;DR: In this article, a feature transfer module iteratively transfers style features to the coarse feature map and generates a fine feature map, and a decoder generates an output image with content of the content image in a style of the style image from the fused features.
Abstract: In implementations of transferring image style to content of a digital image, an image editing system includes an encoder that extracts features from a content image and features from a style image. A whitening and color transform generates coarse features from the content and style features extracted by the encoder for one pass of encoding and decoding. Hence, the processing delay and memory requirements are low. A feature transfer module iteratively transfers style features to the coarse feature map and generates a fine feature map. The image editing system fuses the fine features with the coarse features, and a decoder generates an output image with content of the content image in a style of the style image from the fused features. Accordingly, the image editing system efficiently transfers an image style to image content in real-time, without undesirable artifacts in the output image.

Patent
Xin Sun, Zhili Chen, Nathan A. Carr, Murria Julio Marco, Jimei Yang
12 Mar 2020
TL;DR: For each pixel of a painting stroke input on a canvas, a neighborhood patch of pixels is selected and input into a neural network, which outputs a shading function; the painting stroke is then rendered on the canvas using that shading function.
Abstract: According to one general aspect, systems and techniques for rendering a painting stroke of a three-dimensional digital painting include receiving a painting stroke input on a canvas, where the painting stroke includes a plurality of pixels. For each of the pixels in the plurality of pixels, a neighborhood patch of pixels is selected and input into a neural network and a shading function is output from the neural network. The painting stroke is rendered on the canvas using the shading function.
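
The per-pixel procedure in this abstract can be sketched directly: for every pixel on the stroke, a neighborhood patch is cropped and passed to a small network whose output shades that pixel. The patch size, network, and single-channel canvas below are assumptions for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F

PATCH = 9
shading_net = nn.Sequential(                 # stand-in for the trained shading network
    nn.Linear(PATCH * PATCH, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid())

def shade_stroke(canvas, stroke_pixels):
    # canvas: (H, W) single-channel painting state; stroke_pixels: list of (row, col).
    r = PATCH // 2
    padded = F.pad(canvas, (r, r, r, r))     # zero-pad so border pixels get full patches
    shaded = canvas.clone()
    with torch.no_grad():
        for row, col in stroke_pixels:
            patch = padded[row:row + PATCH, col:col + PATCH].reshape(1, -1)
            shaded[row, col] = shading_net(patch).squeeze()
    return shaded

canvas = torch.rand(64, 64)
stroke = [(32, c) for c in range(10, 50)]    # pixels of a horizontal painting stroke
rendered = shade_stroke(canvas, stroke)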

Posted Content
TL;DR: An iterative inpainting method with a feedback mechanism is proposed: a deep generative model outputs not only an inpainting result but also a corresponding confidence map, progressively filling the hole by trusting only high-confidence pixels inside the hole at each iteration and focusing on the remaining pixels in the next iteration.
Abstract: Existing image inpainting methods often produce artifacts when dealing with large holes in real applications. To address this challenge, we propose an iterative inpainting method with a feedback mechanism. Specifically, we introduce a deep generative model which not only outputs an inpainting result but also a corresponding confidence map. Using this map as feedback, it progressively fills the hole by trusting only high-confidence pixels inside the hole at each iteration and focuses on the remaining pixels in the next iteration. As it reuses partial predictions from the previous iterations as known pixels, this process gradually improves the result. In addition, we propose a guided upsampling network to enable generation of high-resolution inpainting results. We achieve this by extending the Contextual Attention module to borrow high-resolution feature patches in the input image. Furthermore, to mimic real object removal scenarios, we collect a large object mask dataset and synthesize more realistic training data that better simulates user inputs. Experiments show that our method significantly outperforms existing methods in both quantitative and qualitative evaluations. More results and Web APP are available at https://zengxianyu.github.io/iic.


Patent
22 Oct 2020
TL;DR: A 3D motion effect is generated from a 2D image by building a global point cloud through inpainting occlusion gaps in one or more extremal views, generating intermediate views from the point cloud and a camera path, and combining the extremal and intermediate views.
Abstract: Systems and methods are described for generating a three dimensional (3D) effect from a two dimensional (2D) image. The methods may include generating a depth map based on a 2D image, identifying a camera path, generating one or more extremal views based on the 2D image and the camera path, generating a global point cloud by inpainting occlusion gaps in the one or more extremal views, generating one or more intermediate views based on the global point cloud and the camera path, and combining the one or more extremal views and the one or more intermediate views to produce a 3D motion effect.
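
A simplified numpy sketch of the view-synthesis core this pipeline relies on: pixels are lifted into a 3D point cloud using the depth map and re-projected for a camera moved along the path, leaving gaps where occluded content would need inpainting. The pinhole intrinsics, depth values, and camera shift are toy assumptions.

import numpy as np

def render_from_depth(image, depth, camera_shift, focal=100.0):
    h, w, _ = image.shape
    cx, cy = w / 2.0, h / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Back-project every pixel into 3D camera space (the global point cloud).
    X = (xs - cx) * depth / focal
    Y = (ys - cy) * depth / focal
    Z = depth
    # Re-project for a camera translated by camera_shift = (tx, ty, tz).
    tx, ty, tz = camera_shift
    u = np.round(focal * (X - tx) / (Z - tz) + cx).astype(int)
    v = np.round(focal * (Y - ty) / (Z - tz) + cy).astype(int)
    new_view = np.zeros_like(image)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    new_view[v[valid], u[valid]] = image[ys[valid], xs[valid]]   # naive splat, no z-ordering
    return new_view           # the remaining black gaps are what inpainting must fill

image = np.random.rand(96, 128, 3)
depth = 2.0 + np.random.rand(96, 128)        # stand-in for the predicted depth map
extremal_view = render_from_depth(image, depth, camera_shift=(0.1, 0.0, 0.2))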

Patent
Brian Price, Ning Xu, Naoto Inoue, Jimei Yang, Ito Daicho
19 Nov 2020
TL;DR: A two-tone digital line drawing is automatically generated from a photograph using a generator network trained on human-generated line drawings, where the background of the image is one tone and the contents of the input photograph are represented by lines drawn in the second tone.
Abstract: Computing systems and computer-implemented methods can be used for automatically generating a digital line drawing of the contents of a photograph. In various examples, these techniques include use of a neural network, referred to as a generator network, that is trained on a dataset of photographs and human-generated line drawings of the photographs. The training data set teaches the neural network to trace the edges and features of objects in the photographs, as well as which edges or features can be ignored. The output of the generator network is a two-tone digital image, where the background of the image is one tone, and the contents in the input photographs are represented by lines drawn in the second tone. In some examples, a second neural network, referred to as a restorer network, can further process the output of the generator network, and remove visual artifacts and clean up the lines.
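
Schematically, the two-network pipeline reads as below: a generator maps the photograph to a rough line drawing, a restorer cleans it up, and the result is thresholded to two tones. The tiny convolutional stacks and the threshold are placeholders for the patent's trained networks.

import torch
import torch.nn as nn

def conv_stack():                            # placeholder image-to-image network
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())

generator, restorer = conv_stack(), conv_stack()

def photo_to_line_drawing(photo, threshold=0.5):
    with torch.no_grad():
        rough = generator(photo)             # traced edges, possibly with artifacts
        clean = restorer(rough)              # artifacts removed, lines cleaned up
    gray = clean.mean(dim=1, keepdim=True)
    return (gray > threshold).float()        # two tones: background vs. drawn lines

photo = torch.rand(1, 3, 128, 128)
line_drawing = photo_to_line_drawing(photo)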