Author

Yajie Zhao

Bio: Yajie Zhao is an academic researcher from Wenzhou Medical College. The author has contributed to research in topics including Rendering (computer graphics) and Computer science. The author has an h-index of 11 and has co-authored 31 publications receiving 410 citations. Previous affiliations of Yajie Zhao include the Institute for Creative Technologies and the University of Kentucky.

Papers
Journal ArticleDOI
TL;DR: A deep learning-based technique to infer high-quality facial reflectance and geometry given a single unconstrained image of the subject, which may contain partial occlusions and arbitrary illumination conditions, and demonstrates the rendering of high-fidelity 3D avatars from a variety of subjects captured under different lighting conditions.
Abstract: We present a deep learning-based technique to infer high-quality facial reflectance and geometry given a single unconstrained image of the subject, which may contain partial occlusions and arbitrary illumination conditions. The reconstructed high-resolution textures, which are generated in only a few seconds, include high-resolution skin surface reflectance maps, representing both the diffuse and specular albedo, and medium- and high-frequency displacement maps, thereby allowing us to render compelling digital avatars under novel lighting conditions. To extract this data, we train our deep neural networks with a high-quality skin reflectance and geometry database created with a state-of-the-art multi-view photometric stereo system using polarized gradient illumination. Given the raw facial texture map extracted from the input image, our neural networks synthesize complete reflectance and displacement maps, as well as complete missing regions caused by occlusions. The completed textures exhibit consistent quality throughout the face due to our network architecture, which propagates texture features from the visible region, resulting in high-fidelity details that are consistent with those seen in visible regions. We describe how this highly underconstrained problem is made tractable by dividing the full inference into smaller tasks, which are addressed by dedicated neural networks. We demonstrate the effectiveness of our network design with robust texture completion from images of faces that are largely occluded. With the inferred reflectance and geometry data, we demonstrate the rendering of high-fidelity 3D avatars from a variety of subjects captured under different lighting conditions. In addition, we perform evaluations demonstrating that our method can infer plausible facial reflectance and geometric details comparable to those obtained from high-end capture devices, and outperform alternative approaches that require only a single unconstrained input image.

139 citations
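
The abstract above describes a divide-and-conquer design: dedicated networks first complete the partially visible texture, then infer each reflectance and displacement map from the completed texture. The sketch below illustrates only that structure, assuming PyTorch; the architecture, layer sizes, and all names (TinyUNet, ReflectanceInference) are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch of a divide-and-conquer inference pipeline: one
# dedicated network per sub-task, as the abstract describes. All modules
# and sizes are stand-ins, not the paper's actual architecture.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    """Stand-in for each dedicated network in the pipeline."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.net = nn.Sequential(conv_block(c_in, 32), conv_block(32, 32),
                                 nn.Conv2d(32, c_out, 3, padding=1))
    def forward(self, x):
        return self.net(x)

class ReflectanceInference(nn.Module):
    def __init__(self):
        super().__init__()
        self.completion = TinyUNet(4, 3)   # partial texture + mask -> full texture
        self.diffuse    = TinyUNet(3, 3)   # full texture -> diffuse albedo
        self.specular   = TinyUNet(3, 1)   # full texture -> specular albedo
        self.displace   = TinyUNet(3, 1)   # full texture -> displacement

    def forward(self, partial_texture, visibility_mask):
        x = torch.cat([partial_texture, visibility_mask], dim=1)
        full = self.completion(x)          # fill occluded regions first
        return full, self.diffuse(full), self.specular(full), self.displace(full)

model = ReflectanceInference()
tex  = torch.rand(1, 3, 256, 256)          # raw texture from the input image
mask = torch.rand(1, 1, 256, 256)          # 1 = visible, 0 = occluded
full, albedo, spec, disp = model(tex, mask)
```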

Book ChapterDOI
08 Sep 2018
TL;DR: This work focuses on the task of template-free, per-frame 3D surface reconstruction from as few as three RGB sensors, for which conventional visual hull or multi-view stereo methods fail to generate plausible results.
Abstract: We present a deep learning based volumetric approach for performance capture using a passive and highly sparse multi-view capture system. State-of-the-art performance capture systems require either pre-scanned actors, a large number of cameras, or active sensors. In this work, we focus on the task of template-free, per-frame 3D surface reconstruction from as few as three RGB sensors, for which conventional visual hull or multi-view stereo methods fail to generate plausible results. We introduce a novel multi-view Convolutional Neural Network (CNN) that maps 2D images to a 3D volumetric field, and we use this field to encode the probabilistic distribution of surface points of the captured subject. By querying the resulting field, we can instantiate the clothed human body at arbitrary resolutions. Our approach scales to different numbers of input images, yielding increased reconstruction quality when more views are used. Although only trained on synthetic data, our network can generalize to handle real footage from body performance capture. Our method is suitable for high-quality, low-cost full body volumetric capture solutions, which are gaining popularity for VR and AR content creation. Experimental results demonstrate that our method is significantly more robust and accurate than existing techniques when only very sparse views are available.

127 citations
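
The key claim above, that the clothed body can be instantiated at arbitrary resolutions by querying the predicted volumetric field, reduces to resampling a probability volume at arbitrary points. A minimal sketch, assuming PyTorch, with a random coarse field standing in for the (omitted) multi-view CNN's output:

```python
# Query a volumetric field of surface probabilities at arbitrary points via
# trilinear interpolation. The coarse volume is a stand-in for the network
# output described in the abstract; names are illustrative.
import torch
import torch.nn.functional as F

def query_field(prob_volume, points):
    """prob_volume: (1, 1, D, H, W) surface probabilities in [0, 1].
    points: (N, 3) query coordinates normalized to [-1, 1], in (x, y, z) order."""
    grid = points.view(1, -1, 1, 1, 3)                     # grid_sample layout
    vals = F.grid_sample(prob_volume, grid, align_corners=True)
    return vals.view(-1)                                   # (N,) probabilities

# A coarse field, queried on a 2x-resolution lattice.
coarse = torch.rand(1, 1, 64, 64, 64)
axis = torch.linspace(-1, 1, 128)
zz, yy, xx = torch.meshgrid(axis, axis, axis, indexing="ij")
pts = torch.stack([xx, yy, zz], dim=-1).view(-1, 3)
probs = query_field(coarse, pts)            # threshold at 0.5 for the surface
```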

Journal ArticleDOI
TL;DR: Infection assays demonstrated high mortality in the Galleria mellonella model, with even the highest LD50 values, observed for three isolates, below 10^5 CFU/mL, demonstrating the degree of hypervirulence of these CR-hvKP isolates, which is discussed relative to previous outbreaks.
Abstract: Carbapenem-resistant, hypervirulent Klebsiella pneumoniae (CR-hvKP) has recently emerged as a significant threat to public health. In this study, 29 Klebsiella pneumoniae isolates were obtained from eight patients admitted to the intensive care unit (ICU) of a comprehensive teaching hospital in China from March 2017 to January 2018. Clinical information from the patients formed the basis for further analyses of the isolates, including antimicrobial susceptibility testing, identification of antibiotic resistance and virulence gene determinants, multilocus sequence typing (MLST), and XbaI macrorestriction by pulsed-field gel electrophoresis (PFGE). Selected isolates representing distinct resistance profiles and virulence phenotypes were screened for hypervirulence in a Galleria mellonella larvae infection model. Over the course of the outbreak, the overall mortality rate of patients was 100% (n=8), attributed to complications arising from CR-hvKP infections. All isolates except one (28/29, 96.6%) were resistant to multiple antimicrobial agents and harbored diverse resistance determinants, including the globally prevalent carbapenemase blaKPC-2. Most isolates had hypervirulent genotypes, being positive for nineteen virulence-associated genes, including iutA (25/29, 86.2%), rmpA (27/29, 93.1%), ybtA (27/29, 93.1%), entB (29/29, 100%), fimH (29/29, 100%) and mrkD (29/29, 100%). MLST assigned the majority of isolates to ST11 (26/29, 89.7%). Infection assays demonstrated high mortality in the Galleria mellonella model, with even the highest LD50 values, observed for three isolates, below 10^5 CFU/mL, demonstrating the degree of hypervirulence of these CR-hvKP isolates, which is discussed relative to previous outbreaks of CR-hvKP.

55 citations
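
For readers unfamiliar with the LD50 figures quoted above: the median lethal dose is the dose at which 50% of inoculated larvae die, commonly estimated by interpolating mortality against log dose. A minimal sketch with hypothetical dose-response data; the study's actual data and estimation method are not reproduced here.

```python
# Estimate LD50 by linear interpolation of mortality against log10 dose,
# a standard simplification of the Reed-Muench method. The dose-response
# series below is hypothetical, not the study's data.
import math

def ld50(doses_cfu_per_ml, mortality):
    """Doses in ascending order; mortality as fractions in [0, 1]."""
    logs = [math.log10(d) for d in doses_cfu_per_ml]
    for i in range(len(mortality) - 1):
        lo, hi = mortality[i], mortality[i + 1]
        if lo <= 0.5 <= hi:                      # bracket the 50% point
            t = (0.5 - lo) / (hi - lo)
            return 10 ** (logs[i] + t * (logs[i + 1] - logs[i]))
    raise ValueError("50% mortality not bracketed by the dose series")

# Hypothetical Galleria mellonella dose-response series:
print(ld50([1e3, 1e4, 1e5, 1e6], [0.1, 0.3, 0.7, 1.0]))  # ~3.2e4 CFU/mL
```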

Proceedings ArticleDOI
14 Jun 2020
TL;DR: In this article, a non-linear morphable face model is proposed to generate multifarious face geometry of pore-level resolution, coupled with material attributes for use in physically-based rendering.
Abstract: Based on a combined data set of 4,000 high-resolution facial scans, we introduce a non-linear morphable face model capable of producing multifarious face geometry at pore-level resolution, coupled with material attributes for use in physically-based rendering. We aim to maximize the variety of the participants' face identities while increasing the robustness of correspondence between unique components, including middle-frequency geometry, albedo maps, specular intensity maps, and high-frequency displacement details. Our deep learning based generative model learns to correlate albedo and geometry, which ensures the anatomical correctness of the generated assets. We demonstrate potential uses of our generative model for novel identity generation, model fitting, interpolation, animation, high-fidelity data visualization, and low-to-high resolution data domain transfer. We hope the release of this generative model will encourage further cooperation between all graphics, vision, and data-focused professionals, while demonstrating the cumulative value of every individual's complete biometric profile.

41 citations
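
The central design point of this paper is that geometry and albedo are decoded from one shared identity code, so sampled or interpolated identities keep the two correlated. A minimal sketch of that coupling, assuming PyTorch; the decoder, its layer sizes, and the vertex count are illustrative assumptions, not the released model.

```python
# One latent identity code decoded into both geometry and albedo, keeping
# them correlated. Architecture and dimensions are illustrative stand-ins.
import torch
import torch.nn as nn

class JointFaceDecoder(nn.Module):
    def __init__(self, z_dim=128, n_verts=5023):
        super().__init__()
        self.n_verts = n_verts
        self.trunk = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU())
        self.geometry = nn.Linear(256, n_verts * 3)       # vertex positions
        self.albedo   = nn.Linear(256, 3 * 32 * 32)       # tiny albedo map

    def forward(self, z):
        h = self.trunk(z)                                  # shared identity code
        verts = self.geometry(h).view(-1, self.n_verts, 3)
        tex   = self.albedo(h).view(-1, 3, 32, 32)
        return verts, tex                                  # correlated outputs

dec = JointFaceDecoder()
z0, z1 = torch.randn(1, 128), torch.randn(1, 128)
for t in (0.0, 0.5, 1.0):                # interpolate between two identities
    verts, tex = dec((1 - t) * z0 + t * z1)
```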

Cited by
Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this paper, an implicit field is used to assign a value to each point in 3D space, so that a shape can be extracted as an iso-surface, and a binary classifier is trained to perform this assignment.
Abstract: We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder, called IM-NET, for shape generation, aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. IM-NET is trained to perform this assignment by means of a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value which indicates whether the point is outside the shape or not. By replacing conventional decoders by our implicit decoder for representation learning (via IM-AE) and shape generation (via IM-GAN), we demonstrate superior results for tasks such as generative shape modeling, interpolation, and single-view 3D reconstruction, particularly in terms of visual quality. Code and supplementary material are available at https://github.com/czq142857/implicit-decoder.

1,261 citations
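
The decoder interface described above is concrete enough to sketch: an MLP receives a 3D point concatenated with a shape feature vector and outputs an inside/outside probability, whose 0.5 level set is the shape. A minimal sketch, assuming PyTorch; layer sizes are illustrative (see the linked repository for the authors' implementation).

```python
# An implicit field decoder in the spirit of IM-NET: point + shape feature
# in, inside/outside probability out. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + feat_dim, 512), nn.LeakyReLU(),
            nn.Linear(512, 512), nn.LeakyReLU(),
            nn.Linear(512, 1), nn.Sigmoid())   # in/out probability

    def forward(self, points, shape_feat):
        # points: (B, N, 3); shape_feat: (B, feat_dim), shared by all points
        f = shape_feat.unsqueeze(1).expand(-1, points.shape[1], -1)
        return self.net(torch.cat([points, f], dim=-1)).squeeze(-1)

dec = ImplicitDecoder()
occ = dec(torch.rand(2, 1024, 3) * 2 - 1, torch.randn(2, 256))
# Extract the shape as the 0.5 iso-surface of `occ` over a dense grid.
```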

Journal ArticleDOI
TL;DR: This paper presents a comprehensive review of recent progress in deep learning methods for point clouds, covering three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation.
Abstract: Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems. However, deep learning on point clouds is still in its infancy due to the unique challenges faced by the processing of point clouds with deep neural networks. Recently, deep learning on point clouds has become increasingly thriving, with numerous methods being proposed to address different problems in this area. To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds. It covers three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation. It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions.

1,021 citations

Proceedings ArticleDOI
13 May 2019
TL;DR: Pixel-aligned Implicit Function (PIFu), as described in this paper, aligns pixels of 2D images with the global context of their corresponding 3D object to produce high-resolution surfaces, including largely unseen regions such as the back of a person.
Abstract: We introduce Pixel-aligned Implicit Function (PIFu), an implicit representation that locally aligns pixels of 2D images with the global context of their corresponding 3D object. Using PIFu, we propose an end-to-end deep learning method for digitizing highly detailed clothed humans that can infer both 3D surface and texture from a single image, and optionally, multiple input images. Highly intricate shapes, such as hairstyles and clothing, as well as their variations and deformations, can be digitized in a unified way. Compared to existing representations used for 3D deep learning, PIFu produces high-resolution surfaces including largely unseen regions such as the back of a person. In particular, it is memory efficient unlike the voxel representation, can handle arbitrary topology, and the resulting surface is spatially aligned with the input image. Furthermore, while previous techniques are designed to process either a single image or multiple views, PIFu extends naturally to an arbitrary number of views. We demonstrate high-resolution and robust reconstructions on real world images from the DeepFashion dataset, which contains a variety of challenging clothing types. Our method achieves state-of-the-art performance on a public benchmark and outperforms the prior work for clothed human digitization from a single image.

907 citations
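
The pixel-aligned idea can be sketched directly: project each 3D query point into the image, bilinearly sample a feature at that pixel, and feed the feature together with the point's depth to an MLP that predicts occupancy. A minimal sketch, assuming PyTorch, an orthographic camera, and an illustrative feature map standing in for the (omitted) image encoder.

```python
# Pixel-aligned implicit function in the spirit of PIFu, simplified to an
# orthographic projection. Names, sizes, and the feature map are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAlignedField(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim + 1, 256), nn.ReLU(),
                                 nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, feat_map, points):
        # feat_map: (B, C, H, W) image features; points: (B, N, 3) in [-1, 1]
        xy = points[..., :2].unsqueeze(2)                    # (B, N, 1, 2)
        f = F.grid_sample(feat_map, xy, align_corners=True)  # (B, C, N, 1)
        f = f.squeeze(-1).transpose(1, 2)                    # (B, N, C)
        z = points[..., 2:]                                  # depth along the ray
        return self.mlp(torch.cat([f, z], dim=-1)).squeeze(-1)

field = PixelAlignedField()
occ = field(torch.rand(1, 64, 128, 128), torch.rand(1, 4096, 3) * 2 - 1)
```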