Author

Zexiang Xu

Other affiliations: University of California, Beihang University, University of California, San Diego

Bio: Zexiang Xu is an academic researcher at Adobe Systems. The author has contributed to research on the topics Rendering (computer graphics) and Computer science, has an h-index of 13, and has co-authored 38 publications that have received 791 citations. Previous affiliations of Zexiang Xu include University of California and Beihang University.

Papers published on a yearly basis (years shown): 2014, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023

Papers

Journal Article

Learning to reconstruct shape and spatially-varying reflectance from a single image

Zhengqin Li¹, Zexiang Xu¹, Ravi Ramamoorthi¹, Kalyan Sunkavalli², Manmohan Chandraker¹

University of California¹, Adobe Systems²

04 Dec 2018-ACM Transactions on Graphics

TL;DR: This work demonstrates that it can recover non-Lambertian, spatially-varying BRDFs and complex geometry belonging to any arbitrary shape class, from a single RGB image captured under a combination of unknown environment illumination and flash lighting.

Abstract: Reconstructing shape and reflectance properties from images is a highly under-constrained problem, and has previously been addressed by using specialized hardware to capture calibrated data or by assuming known (or highly constrained) shape or reflectance. In contrast, we demonstrate that we can recover non-Lambertian, spatially-varying BRDFs and complex geometry belonging to any arbitrary shape class, from a single RGB image captured under a combination of unknown environment illumination and flash lighting. We achieve this by training a deep neural network to regress shape and reflectance from the image. Our network is able to address this problem because of three novel contributions: first, we build a large-scale dataset of procedurally generated shapes and real-world complex SVBRDFs that approximate real world appearance well. Second, single image inverse rendering requires reasoning at multiple scales, and we propose a cascade network structure that allows this in a tractable manner. Finally, we incorporate an in-network rendering layer that aids the reconstruction task by handling global illumination effects that are important for real-world scenes. Together, these contributions allow us to tackle the entire inverse rendering problem in a holistic manner and produce state-of-the-art results on both synthetic and real data.

244 citations

Proceedings Article

TensoRF: Tensorial Radiance Fields

Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, Hao Su

17 Mar 2022

TL;DR: TensoRF is presented, a novel approach to model and reconstruct radiance fields as a 4D tensor, which represents a 3D voxel grid with per-voxel multi-channel features, and a novel vector-matrix (VM) decomposition that relaxes the low-rank constraints for two modes of a tensor and factorizes tensors into compact vector and matrix factors.

Abstract: We present TensoRF, a novel approach to model and reconstruct radiance fields. Unlike NeRF that purely uses MLPs, we model the radiance field of a scene as a 4D tensor, which represents a 3D voxel grid with per-voxel multi-channel features. Our central idea is to factorize the 4D scene tensor into multiple compact low-rank tensor components. We demonstrate that applying traditional CP decomposition -- that factorizes tensors into rank-one components with compact vectors -- in our framework leads to improvements over vanilla NeRF. To further boost performance, we introduce a novel vector-matrix (VM) decomposition that relaxes the low-rank constraints for two modes of a tensor and factorizes tensors into compact vector and matrix factors. Beyond superior rendering quality, our models with CP and VM decompositions lead to a significantly lower memory footprint in comparison to previous and concurrent works that directly optimize per-voxel features. Experimentally, we demonstrate that TensoRF with CP decomposition achieves fast reconstruction (<30 min) with better rendering quality and even a smaller model size (<4 MB) compared to NeRF. Moreover, TensoRF with VM decomposition further boosts rendering quality and outperforms previous state-of-the-art methods, while reducing the reconstruction time (<10 min) and retaining a compact model size (<75 MB).

241 citations
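
The CP and VM factorizations named in the TensoRF abstract can be illustrated with a minimal sketch. It assumes a single-channel cubic grid of side n (the per-voxel feature channel is left out for brevity), and all function and variable names are hypothetical, not taken from the authors' code. In CP each rank-one component is an outer product of three vectors; in VM each component pairs a vector along one axis with a matrix over the other two axes.

```python
def cp_value(vx, vy, vz, i, j, k):
    """Evaluate a CP-factorized 3D grid at voxel (i, j, k).

    vx, vy, vz are lists of R vectors (one per rank-one component).
    """
    return sum(vx[r][i] * vy[r][j] * vz[r][k] for r in range(len(vx)))


def vm_value(vecs, mats, i, j, k):
    """Evaluate a VM-factorized 3D grid at voxel (i, j, k).

    vecs = (vx, vy, vz): vectors along each axis.
    mats = (myz, mxz, mxy): matrices over the two complementary axes.
    """
    (vx, vy, vz), (myz, mxz, mxy) = vecs, mats
    return sum(
        vx[r][i] * myz[r][j][k]
        + vy[r][j] * mxz[r][i][k]
        + vz[r][k] * mxy[r][i][j]
        for r in range(len(vx))
    )


def dense_params(n):
    """Number of values stored by a dense n x n x n grid."""
    return n ** 3


def cp_params(n, rank):
    """Three length-n vectors per rank-one component."""
    return 3 * rank * n


def vm_params(n, rank):
    """One length-n vector and one n x n matrix per axis and component."""
    return 3 * rank * (n + n * n)
```

For an illustrative grid of side 300 and 16 components, the dense grid stores 27,000,000 values, CP stores 14,400, and VM stores 4,334,400. This is only a parameter count under the assumptions above; it says nothing about the model sizes reported in the abstract, which also depend on feature channels and other components.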

Proceedings Article

Deep Stereo Using Adaptive Thin Volume Representation With Uncertainty Awareness

Shuo Cheng¹, Zexiang Xu¹, Shilin Zhu¹, Zhuwen Li, Li Erran Li², Ravi Ramamoorthi¹, Hao Su¹

University of California, San Diego¹, Columbia University²

14 Jun 2020

TL;DR: The proposed ATV consists of only a small number of planes with low memory and computation costs; yet, it efficiently partitions local depth ranges within learned small uncertainty intervals, which enables reconstruction with high completeness and accuracy in a coarse-to-fine fashion.

Abstract: We present Uncertainty-aware Cascaded Stereo Network (UCS-Net) for 3D reconstruction from multiple RGB images. Multi-view stereo (MVS) aims to reconstruct fine-grained scene geometry from multi-view images. Previous learning-based MVS methods estimate per-view depth using plane sweep volumes (PSVs) with a fixed depth hypothesis at each plane; this requires densely sampled planes for high accuracy, which is impractical for high-resolution depth because of limited memory. In contrast, we propose adaptive thin volumes (ATVs); in an ATV, the depth hypothesis of each plane is spatially varying, which adapts to the uncertainties of previous per-pixel depth predictions. Our UCS-Net has three stages: the first stage processes a small PSV to predict low-resolution depth; two ATVs are then used in the following stages to refine the depth with higher resolution and higher accuracy. Our ATV consists of only a small number of planes with low memory and computation costs; yet, it efficiently partitions local depth ranges within learned small uncertainty intervals. We propose to use variance-based uncertainty estimates to adaptively construct ATVs; this differentiable process leads to reasonable and fine-grained spatial partitioning. Our multi-stage framework progressively sub-divides the vast scene space with increasing depth resolution and precision, which enables reconstruction with high completeness and accuracy in a coarse-to-fine fashion. We demonstrate that our method achieves superior performance compared with other learning-based MVS methods on various challenging datasets.

181 citations
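
The variance-based construction of adaptive thin volumes described in the UCS-Net abstract can be sketched for a single pixel. This is an illustration under stated assumptions, not the authors' implementation: the function names, the `scale` factor on the standard deviation, and the uniform spacing of planes inside the interval are all hypothetical choices. Given the previous stage's depth hypotheses and their probabilities, the sketch computes the expected depth and its variance, then places the next stage's planes inside an interval centered on the expected depth.

```python
import math


def uncertainty_interval(depths, probs, scale=1.5):
    """Return (low, high) depth bounds from a per-pixel depth distribution.

    depths: depth hypotheses of the previous stage.
    probs:  matching probabilities (assumed to sum to 1).
    scale:  hypothetical multiplier on the standard deviation.
    """
    mean = sum(p * d for d, p in zip(depths, probs))
    var = sum(p * (d - mean) ** 2 for d, p in zip(depths, probs))
    sigma = math.sqrt(max(var, 0.0))
    return mean - scale * sigma, mean + scale * sigma


def next_stage_hypotheses(depths, probs, num_planes, scale=1.5):
    """Spread num_planes depth hypotheses uniformly over the interval."""
    low, high = uncertainty_interval(depths, probs, scale)
    if num_planes == 1:
        return [(low + high) / 2.0]
    step = (high - low) / (num_planes - 1)
    return [low + step * t for t in range(num_planes)]
```

Because the interval is computed per pixel, a confident prediction (low variance) yields tightly spaced planes while an uncertain one yields a wider search range, which is the spatially varying behavior the abstract attributes to ATVs.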

Journal Article

Single image portrait relighting

Tiancheng Sun¹, Jonathan T. Barron², Yun-Ta Tsai², Zexiang Xu¹, Xueming Yu², Graham Fyffe², Christoph Rhemann², Jay Busch², Paul Debevec², Ravi Ramamoorthi¹

University of California¹, Google²

12 Jul 2019-ACM Transactions on Graphics

TL;DR: In this paper, a neural network is trained on a small database of 18 individuals captured under different directional light sources in a controlled light stage setup consisting of a densely sampled sphere of lights.

Abstract: Lighting plays a central role in conveying the essence and depth of the subject in a portrait photograph. Professional photographers will carefully control the lighting in their studio to manipulate the appearance of their subject, while consumer photographers are usually constrained to the illumination of their environment. Though prior works have explored techniques for relighting an image, their utility is usually limited due to requirements of specialized hardware, multiple images of the subject under controlled or known illuminations, or accurate models of geometry and reflectance. To this end, we present a system for portrait relighting: a neural network that takes as input a single RGB image of a portrait taken with a standard cellphone camera in an unconstrained environment, and from that image produces a relit image of that subject as though it were illuminated according to any provided environment map. Our method is trained on a small database of 18 individuals captured under different directional light sources in a controlled light stage setup consisting of a densely sampled sphere of lights. Our proposed technique produces quantitatively superior results on our dataset's validation set compared to prior works, and produces convincing qualitative relighting results on a dataset of hundreds of real-world cellphone portraits. Because our technique can produce a 640 × 640 image in only 160 milliseconds, it may enable interactive user-facing photographic applications in the future.

179 citations

Proceedings Article

Advances in neural rendering

Ayush Tewari, Ohad Fried¹, Justus Thies², Vincent Sitzmann³, Stephen Lombardi⁴, Zexiang Xu⁵, Tomas Simon⁴, Matthias Nießner⁶, Edgar Tretschk, Lingjie Liu, Ben Mildenhall⁷, Pratul P. Srinivasan⁷, Rohit Pandey⁷, Sergio Orts-Escolano⁷, Sean Fanello⁷, M. Guo⁸, Gordon Wetzstein⁸, Jun-Yan Zhu⁹, Christian Theobalt, Maneesh Agrawala⁸, Dan B. Goldman⁷, Michael Zollhöfer⁴

Interdisciplinary Center Herzliya¹, Max Planck Society², Massachusetts Institute of Technology³, Facebook⁴, Adobe Systems⁵, Technische Universität München⁶, Google⁷, Stanford University⁸, Carnegie Mellon University⁹

09 Aug 2021

TL;DR: Loss functions for Neural Rendering Jun-Yan Zhu shows the importance of knowing the number of neurons in the system and how many neurons are firing at the same time.

Abstract: Loss functions for Neural Rendering Jun-Yan Zhu

174 citations

Cited by

Book

A reflectance model for computer graphics

Robert L. Cook, Kenneth E. Torrance

01 Dec 1988

TL;DR: In this paper, the spectral energy distribution of the reflected light from an object made of a specific real material is obtained and a procedure for accurately reproducing the color associated with the spectrum is discussed.

Abstract: This paper presents a new reflectance model for rendering computer synthesized images. The model accounts for the relative brightness of different materials and light sources in the same scene. It describes the directional distribution of the reflected light and a color shift that occurs as the reflectance changes with incidence angle. The paper presents a method for obtaining the spectral energy distribution of the light reflected from an object made of a specific real material and discusses a procedure for accurately reproducing the color associated with the spectral energy distribution. The model is applied to the simulation of a metal and a plastic.

1,401 citations

Journal Article

A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges

Moloud Abdar¹, Farhad Pourpanah², Sadiq Hussain³, Dana Rezazadegan⁴, Li Liu⁵, Mohammad Ghavamzadeh⁶, Paul Fieguth⁷, Xiaochun Cao⁸, Abbas Khosravi¹, U. Rajendra Acharya⁹,¹⁰,¹¹, Vladimir Makarenkov¹², Saeid Nahavandi¹

Deakin University¹, Shenzhen University², Dibrugarh University³, Swinburne University of Technology⁴, University of Oulu⁵, Google⁶, University of Waterloo⁷, Chinese Academy of Sciences⁸, National University of Singapore⁹, Asia University (Taiwan)¹⁰, Ngee Ann Polytechnic¹¹, Université du Québec¹²

12 Nov 2020-arXiv: Learning

TL;DR: This study reviews recent advances in UQ methods used in deep learning, investigates the application of these methods in reinforcement learning (RL), and outlines a few important applications of UQ methods.

Abstract: Uncertainty quantification (UQ) plays a pivotal role in reduction of uncertainties during both optimization and decision making processes. It can be applied to solve a variety of real-world applications in science and engineering. Bayesian approximation and ensemble learning techniques are two most widely-used UQ methods in the literature. In this regard, researchers have proposed different UQ methods and examined their performance in a variety of applications such as computer vision (e.g., self-driving cars and object detection), image processing (e.g., image restoration), medical image analysis (e.g., medical image classification and segmentation), natural language processing (e.g., text classification, social media texts and recidivism risk-scoring), bioinformatics, etc. This study reviews recent advances in UQ methods used in deep learning. Moreover, we also investigate the application of these methods in reinforcement learning (RL). Then, we outline a few important applications of UQ methods. Finally, we briefly highlight the fundamental research challenges faced by UQ methods and discuss the future research directions in this field.

809 citations

Proceedings Article

IBRNet: Learning Multi-View Image-Based Rendering

Qianqian Wang¹, Zhicheng Wang¹, Kyle Genova¹, Pratul P. Srinivasan¹, Howard Zhou¹, Jonathan T. Barron¹, Ricardo Martin-Brualla¹, Noah Snavely¹, Thomas Funkhouser¹

Google¹

20 Jun 2021

TL;DR: A method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views using a network architecture that includes a multilayer perceptron and a ray transformer that estimates radiance and volume density at continuous 5D locations.

Abstract: We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a network architecture that includes a multilayer perceptron and a ray transformer that estimates radiance and volume density at continuous 5D locations (3D spatial locations and 2D viewing directions), drawing appearance information on the fly from multiple source views. By drawing on source views at render time, our method hearkens back to classic work on image-based rendering (IBR), and allows us to render high-resolution imagery. Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes. We render images using classic volume rendering, which is fully differentiable and allows us to train using only multi-view posed images as supervision. Experiments show that our method outperforms recent novel view synthesis methods that also seek to generalize to novel scenes. Further, if fine-tuned on each scene, our method is competitive with state-of-the-art single-scene neural rendering methods.

402 citations

Journal Article

Neural volumes: learning dynamic renderable volumes from images

Stephen Lombardi¹, Tomas Simon¹, Jason Saragih¹, Gabriel Schwartz¹, Andreas M. Lehrmann¹, Yaser Sheikh¹

Facebook¹

12 Jul 2019-ACM Transactions on Graphics

TL;DR: This work presents a learning-based approach to representing dynamic objects inspired by the integral projection model used in tomographic imaging, and learns a latent representation of a dynamic scene that enables us to produce novel content sequences not seen during training.

Abstract: Modeling and rendering of dynamic scenes is challenging, as natural scenes often contain complex phenomena such as thin structures, evolving topology, translucency, scattering, occlusion, and biological motion. Mesh-based reconstruction and tracking often fail in these cases, and other approaches (e.g., light field video) typically rely on constrained viewing conditions, which limit interactivity. We circumvent these difficulties by presenting a learning-based approach to representing dynamic objects inspired by the integral projection model used in tomographic imaging. The approach is supervised directly from 2D images in a multi-view capture setting and does not require explicit reconstruction or tracking of the object. Our method has two primary components: an encoder-decoder network that transforms input images into a 3D volume representation, and a differentiable ray-marching operation that enables end-to-end training. By virtue of its 3D representation, our construction extrapolates better to novel viewpoints compared to screen-space rendering techniques. The encoder-decoder architecture learns a latent representation of a dynamic scene that enables us to produce novel content sequences not seen during training. To overcome memory limitations of voxel-based representations, we learn a dynamic irregular grid structure implemented with a warp field during ray-marching. This structure greatly improves the apparent resolution and reduces grid-like artifacts and jagged motion. Finally, we demonstrate how to incorporate surface-based representations into our volumetric-learning framework for applications where the highest resolution is required, using facial performance capture as a case in point.

333 citations

Proceedings Article

DreamFusion: Text-to-3D using 2D Diffusion

Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall

29 Sep 2022

TL;DR: This work introduces a loss based on probability density distillation that enables the use of a 2D diffusion model as a prior for optimization of a parametric image generator, and optimizes a randomly-initialized 3D model via gradient descent such that its 2D renderings from random angles achieve a low loss.

Abstract: Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs. Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D data and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. We introduce a loss based on probability density distillation that enables the use of a 2D diffusion model as a prior for optimization of a parametric image generator. Using this loss in a DeepDream-like procedure, we optimize a randomly-initialized 3D model (a Neural Radiance Field, or NeRF) via gradient descent such that its 2D renderings from random angles achieve a low loss. The resulting 3D model of the given text can be viewed from any angle, relit by arbitrary illumination, or composited into any 3D environment. Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.

316 citations
