Author

Jun Xing

Bio: Jun Xing is an academic researcher from the Institute for Creative Technologies. The author has contributed to research in topics including Deep learning & Rendering (computer graphics). The author has an h-index of 14 and has co-authored 30 publications receiving 732 citations. Previous affiliations of Jun Xing include the University of Science and Technology of China and Adobe Systems.

Papers
Proceedings ArticleDOI
01 Jun 2019
TL;DR: This paper provides a simple and uniform way to quantize both weights and activations by formulating quantization as a differentiable non-linear function, shedding new light on the interpretation of neural network quantization.
Abstract: Although deep neural networks are highly effective, their high computational and memory costs severely hinder their application to portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network into a low-bitwidth integer version, has been an active and promising research topic. Existing methods formulate the low-bit quantization of networks as an approximation or optimization problem. Approximation-based methods confront the gradient mismatch problem, while optimization-based methods are only suitable for quantizing weights and can introduce high computational cost during the training stage. In this paper, we provide a simple and uniform way to quantize both weights and activations by formulating quantization as a differentiable non-linear function. The quantization function is represented as a linear combination of several Sigmoid functions with learnable biases and scales that can be learned in a lossless and end-to-end manner via continuous relaxation of the steepness of the Sigmoid functions. Extensive experiments on image classification and object detection tasks show that our quantization networks outperform state-of-the-art methods. We believe that the proposed method will shed new light on the interpretation of neural network quantization.
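
A minimal sketch of the core idea described above, assuming a simple PyTorch module with illustrative level counts and initial biases (not the authors' released code): the quantization function is a sum of shifted, scaled sigmoids whose steepness is controlled by a temperature, so it stays differentiable during training and approaches a hard staircase as the temperature grows.

```python
import torch
import torch.nn as nn

class SigmoidQuantizer(nn.Module):
    """Illustrative sketch, not the paper's code: a quantization function built
    as a linear combination of sigmoids with learnable biases and scales."""
    def __init__(self, num_levels=4):
        super().__init__()
        # one soft "step" per transition between adjacent quantization levels
        self.scales = nn.Parameter(torch.ones(num_levels - 1))
        self.biases = nn.Parameter(torch.linspace(-1.0, 1.0, num_levels - 1))

    def forward(self, x, temperature=1.0):
        # sum of scaled, shifted sigmoids; differentiable for any finite temperature,
        # approaching a hard staircase as the temperature increases
        steps = torch.sigmoid(temperature * (x.unsqueeze(-1) - self.biases))
        return (self.scales * steps).sum(dim=-1)

x = torch.randn(8)
q = SigmoidQuantizer()(x, temperature=10.0)  # soft-quantized values of x
```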

224 citations

Journal ArticleDOI
TL;DR: This work produces state-of-the-art quality image and video synthesis, and is, to our knowledge, the first able to generate a dynamically textured avatar with a mouth interior, all from a single image.
Abstract: With the rising interest in personalized VR and gaming experiences comes the need to create high quality 3D avatars that are both low-cost and variegated. Due to this, building dynamic avatars from a single unconstrained input image is becoming a popular application. While previous techniques that attempt this require multiple input images or rely on transferring dynamic facial appearance from a source actor, we are able to do so using only one 2D input image without any form of transfer from a source image. We achieve this using a new conditional Generative Adversarial Network design that allows fine-scale manipulation of any facial input image into a new expression while preserving its identity. Our photoreal avatar GAN (paGAN) can also synthesize the unseen mouth interior and control the eye-gaze direction of the output, as well as produce the final image from a novel viewpoint. The method is even capable of generating fully-controllable temporally stable video sequences, despite not using temporal information during training. After training, we can use our network to produce dynamic image-based avatars that are controllable on mobile devices in real time. To do this, we compute a fixed set of output images that correspond to key blendshapes, from which we extract textures in UV space. Using a subject's expression blendshapes at run-time, we can linearly blend these key textures together to achieve the desired appearance. Furthermore, we can use the mouth interior and eye textures produced by our network to synthesize on-the-fly avatar animations for those regions. Our work produces state-of-the-art quality image and video synthesis, and is the first to our knowledge that is able to generate a dynamically textured avatar with a mouth interior, all from a single image.
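
The runtime blending step described above can be sketched roughly as follows, assuming NumPy, hypothetical texture shapes, and a simple weight normalization that the paper does not necessarily use:

```python
import numpy as np

def blend_avatar_texture(key_textures, blend_weights):
    """Sketch of the runtime step: linearly blend K precomputed key textures
    (K x H x W x 3, UV space) by the subject's current blendshape weights.
    The normalization below is an illustrative choice, not from the paper."""
    w = np.asarray(blend_weights, dtype=np.float32)
    w = w / max(w.sum(), 1e-8)
    return np.tensordot(w, key_textures, axes=1)  # (K,) . (K,H,W,3) -> (H,W,3)

key_textures = np.random.rand(4, 256, 256, 3).astype(np.float32)  # hypothetical data
frame_texture = blend_avatar_texture(key_textures, [0.1, 0.6, 0.2, 0.1])
```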

184 citations

Book ChapterDOI
08 Sep 2018
TL;DR: This work focuses on the task of template-free, per-frame 3D surface reconstruction from as few as three RGB sensors, for which conventional visual hull or multi-view stereo methods fail to generate plausible results.
Abstract: We present a deep learning based volumetric approach for performance capture using a passive and highly sparse multi-view capture system. State-of-the-art performance capture systems require either pre-scanned actors, a large number of cameras, or active sensors. In this work, we focus on the task of template-free, per-frame 3D surface reconstruction from as few as three RGB sensors, for which conventional visual hull or multi-view stereo methods fail to generate plausible results. We introduce a novel multi-view Convolutional Neural Network (CNN) that maps 2D images to a 3D volumetric field, and we use this field to encode the probabilistic distribution of surface points of the captured subject. By querying the resulting field, we can instantiate the clothed human body at arbitrary resolutions. Our approach scales to different numbers of input images, yielding increased reconstruction quality when more views are used. Although only trained on synthetic data, our network can generalize to handle real footage from body performance capture. Our method is suitable for high-quality, low-cost full-body volumetric capture solutions, which are gaining popularity for VR and AR content creation. Experimental results demonstrate that our method is significantly more robust and accurate than existing techniques when only very sparse views are available.
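
To illustrate the "query the field at arbitrary resolution" step, here is a rough sketch assuming a placeholder predict_prob callable standing in for the multi-view CNN; the resulting dense probability grid could then be meshed with a standard iso-surface extractor:

```python
import numpy as np

def query_occupancy_grid(predict_prob, resolution=128, bounds=1.0):
    """Evaluate a learned surface-probability field on a dense grid so the clothed
    body can be instantiated at any chosen resolution. predict_prob is a
    placeholder mapping (M, 3) query points to (M,) probabilities."""
    axis = np.linspace(-bounds, bounds, resolution, dtype=np.float32)
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    points = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)
    probs = predict_prob(points)
    return probs.reshape(resolution, resolution, resolution)

# Dummy field (a sphere of radius 0.5) just to exercise the query path:
grid = query_occupancy_grid(lambda p: (np.linalg.norm(p, axis=1) < 0.5).astype(np.float32), 64)
# An iso-surface at 0.5 could then be extracted, e.g. with skimage.measure.marching_cubes.
```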

127 citations

Proceedings ArticleDOI
01 Jun 2018
TL;DR: This work proposes to encode fine details in high-resolution displacement maps, which are learned through a hybrid network combining a state-of-the-art image-to-image translation network and a super-resolution network, enabling the full range of facial detail to be modeled.
Abstract: We present a learning-based approach for synthesizing facial geometry at medium and fine scales from diffusely-lit facial texture maps. When applied to an image sequence, the synthesized detail is temporally coherent. Unlike current state-of-the-art methods [17, 5], which assume "dark is deep", our model is trained with measured facial detail collected using polarized gradient illumination in a Light Stage [20]. This enables us to produce plausible facial detail across the entire face, including where previous approaches may incorrectly interpret dark features as concavities, such as at moles, hair stubble, and occluded pores. Instead of directly inferring 3D geometry, we propose to encode fine details in high-resolution displacement maps, which are learned through a hybrid network adopting the state-of-the-art image-to-image translation network [29] and super-resolution network [43]. To effectively capture geometric detail at both mid and high frequencies, we factorize the learning into two separate sub-networks, enabling the full range of facial detail to be modeled. Results from our learning-based approach compare favorably with a high-quality active facial scanning technique, and require only a single passive lighting condition without a complex scanning setup.
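
A loose sketch of the two-branch factorization described above, with hypothetical mid_net and high_net standing in for the translation and super-resolution sub-networks; the way the branches are actually combined follows the paper, not this sketch, which only illustrates the mid/high frequency split:

```python
import torch
import torch.nn.functional as F

def synthesize_displacement(texture, mid_net, high_net):
    """texture: (N, C, H, W) diffusely-lit texture map. mid_net and high_net are
    hypothetical stand-ins for the two sub-networks: the mid branch predicts
    coarse displacement at reduced resolution, the high branch adds fine detail
    (pores, stubble) at full resolution, and the two are summed."""
    low = F.interpolate(texture, scale_factor=0.25, mode="bilinear", align_corners=False)
    mid = F.interpolate(mid_net(low), size=texture.shape[-2:], mode="bilinear", align_corners=False)
    high = high_net(texture)
    return mid + high
```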

82 citations

Journal ArticleDOI
26 Oct 2015
TL;DR: The key idea is to extend the local similarity method in [Xing et al. 2014], which handles only low-level spatial repetitions such as hatches within a single frame, to a global similarity that can capture high-level structures across multiple frames such as dynamic objects.
Abstract: Hand-drawn animation is a major art form and communication medium, but can be challenging to produce. We present a system to help people create frame-by-frame animations through manual sketches. We design our interface to be minimalistic: it contains only a canvas and a few controls. When users draw on the canvas, our system silently analyzes all past sketches and predicts what might be drawn in the future across spatial locations and temporal frames. The interface also offers suggestions to beautify existing drawings. Our system can reduce manual workload and improve output quality without compromising natural drawing flow and control: users can accept, ignore, or modify such predictions visualized on the canvas by simple gestures. Our key idea is to extend the local similarity method in [Xing et al. 2014], which handles only low-level spatial repetitions such as hatches within a single frame, to a global similarity that can capture high-level structures across multiple frames such as dynamic objects. We evaluate our system through a preliminary user study and confirm that it can enhance both users' objective performance and subjective satisfaction.
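
As a very loose illustration of the prediction idea, and not the paper's actual similarity algorithm, one could imagine matching the layout of the last few strokes against the earlier stroke history and proposing the stroke that followed the best match:

```python
import numpy as np

def suggest_next_stroke(history, context_len=3):
    """Illustrative sketch only: treat the centroids of the last few strokes as a
    context, find the most similar earlier context, and suggest the stroke that
    followed it, translated to the current position. Each stroke is an (N, 2) array."""
    if len(history) < context_len + 2:
        return None
    centroids = np.array([s.mean(axis=0) for s in history])
    context = centroids[-context_len:] - centroids[-1]   # recent layout, position-invariant
    best, best_dist = None, np.inf
    for i in range(context_len - 1, len(history) - 1):
        past = centroids[i - context_len + 1:i + 1] - centroids[i]
        dist = np.linalg.norm(past - context)
        if dist < best_dist:
            best, best_dist = i, dist
    # propose the stroke that followed the best-matching context, shifted to "now"
    return history[best + 1] + (centroids[-1] - centroids[best])
```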

66 citations


Cited by

Book
01 Dec 1988
TL;DR: In this paper, the spectral energy distribution of the reflected light from an object made of a specific real material is obtained and a procedure for accurately reproducing the color associated with the spectrum is discussed.
Abstract: This paper presents a new reflectance model for rendering computer synthesized images. The model accounts for the relative brightness of different materials and light sources in the same scene. It describes the directional distribution of the reflected light and a color shift that occurs as the reflectance changes with incidence angle. The paper presents a method for obtaining the spectral energy distribution of the light reflected from an object made of a specific real material and discusses a procedure for accurately reproducing the color associated with the spectral energy distribution. The model is applied to the simulation of a metal and a plastic.
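
For reference, the specular term commonly associated with this reflectance model combines a microfacet distribution D, a Fresnel term F, and a geometric attenuation factor G as D·F·G / (4 (N·L)(N·V)); the sketch below uses standard stand-ins (Beckmann D, Schlick F) rather than a transcription of the paper:

```python
import numpy as np

def cook_torrance_specular(n, l, v, roughness, f0):
    """Sketch of the commonly cited specular term; n, l, v are unit vectors,
    f0 is the normal-incidence reflectance, roughness the Beckmann parameter."""
    h = (l + v) / np.linalg.norm(l + v)                                # half vector
    nl, nv, nh, vh = np.dot(n, l), np.dot(n, v), np.dot(n, h), np.dot(v, h)
    m2 = roughness ** 2
    d = np.exp((nh ** 2 - 1.0) / (m2 * nh ** 2)) / (np.pi * m2 * nh ** 4)  # Beckmann distribution
    f = f0 + (1.0 - f0) * (1.0 - vh) ** 5                              # Schlick Fresnel approximation
    g = min(1.0, 2.0 * nh * nv / vh, 2.0 * nh * nl / vh)               # geometric attenuation
    return d * f * g / max(4.0 * nl * nv, 1e-8)

n = np.array([0.0, 0.0, 1.0])
l = np.array([0.0, 0.6, 0.8])     # unit light direction
v = np.array([0.0, -0.6, 0.8])    # unit view direction
spec = cook_torrance_specular(n, l, v, roughness=0.3, f0=0.9)
```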

1,401 citations

Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this paper, an implicit field is used to assign a value to each point in 3D space, so that a shape can be extracted as an iso-surface, and a binary classifier is trained to perform this assignment.
Abstract: We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder, called IM-NET, for shape generation, aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. IM-NET is trained to perform this assignment by means of a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value which indicates whether the point is outside the shape or not. By replacing conventional decoders by our implicit decoder for representation learning (via IM-AE) and shape generation (via IM-GAN), we demonstrate superior results for tasks such as generative shape modeling, interpolation, and single-view 3D reconstruction, particularly in terms of visual quality. Code and supplementary material are available at https://github.com/czq142857/implicit-decoder.
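
A minimal sketch of an implicit-field decoder in this spirit, with illustrative layer sizes (see the linked repository for the authors' implementation): a point coordinate is concatenated with a shape feature vector, and a small MLP classifies whether the point lies inside the shape.

```python
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    """Sketch of an implicit-field decoder: concatenate a 3D point with a shape
    feature vector and predict an inside/outside probability for that point."""
    def __init__(self, feature_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + feature_dim, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),   # in/out probability
        )

    def forward(self, points, feature):
        # points: (N, 3); feature: (feature_dim,) broadcast to every query point
        f = feature.unsqueeze(0).expand(points.shape[0], -1)
        return self.net(torch.cat([points, f], dim=-1)).squeeze(-1)

decoder = ImplicitDecoder()
occ = decoder(torch.rand(1024, 3) * 2 - 1, torch.randn(128))
# A mesh could then be extracted from the predicted field as the 0.5 iso-surface.
```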

1,261 citations

Journal ArticleDOI
TL;DR: This paper presents a comprehensive review of recent progress in deep learning methods for point clouds, covering three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation.
Abstract: Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems. However, deep learning on point clouds is still in its infancy due to the unique challenges faced by the processing of point clouds with deep neural networks. Recently, deep learning on point clouds has become increasingly thriving, with numerous methods being proposed to address different problems in this area. To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds. It covers three major tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation. It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions.

1,021 citations