Showing papers by "Jingyi Yu" published in 2020


Proceedings ArticleDOI
14 Jun 2020
TL;DR: Comprehensive experiments show NHR significantly outperforms the state-of-the-art neural and image-based rendering techniques, especially on hands, hair, nose, foot, etc.
Abstract: We present an end-to-end Neural Human Renderer (NHR) for dynamic human captures under the multi-view setting. NHR adopts PointNet++ for feature extraction (FE) to enable robust 3D correspondence matching on low quality, dynamic 3D reconstructions. To render new views, we map 3D features onto the target camera as a 2D feature map and employ an anti-aliased CNN to handle holes and noise. Newly synthesized views from NHR can be further used to construct visual hulls to handle textureless and/or dark regions such as black clothing. Comprehensive experiments show NHR significantly outperforms the state-of-the-art neural and image-based rendering techniques, especially on hands, hair, nose, foot, etc.
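
As a rough illustration of the feature-projection step described above, the sketch below (hypothetical shapes and names, not the authors' NHR code) splats per-point features onto the target camera to form a 2D feature map that a rendering CNN could then translate into an image.

```python
# Hypothetical sketch of projecting 3D point features into a 2D feature map,
# in the spirit of point-based neural rendering (not the official NHR code).
import torch

def splat_point_features(xyz, feats, K, R, t, H, W):
    """xyz: (N,3) points, feats: (N,C) features, K: (3,3) intrinsics,
    R: (3,3), t: (3,) extrinsics. Returns a (C,H,W) feature map."""
    cam = R @ xyz.T + t[:, None]                  # (3,N) points in camera frame
    uvw = K @ cam                                 # project with intrinsics
    z = uvw[2]
    u = (uvw[0] / z.clamp(min=1e-8)).round().long()
    v = (uvw[1] / z.clamp(min=1e-8)).round().long()
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    fmap = torch.zeros(feats.shape[1], H, W)
    # nearest-point splatting; a real renderer would also z-buffer overlapping points
    fmap[:, v[valid], u[valid]] = feats[valid].T
    return fmap
```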

74 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: A novel learning-based depth estimation framework that leverages the geometric structure of a scene to conduct depth estimation and demonstrates that the method can be applied to counterfactual depth.
Abstract: Motivated by the correlation between the depth and the geometric structure of a 360° indoor image, we propose a novel learning-based depth estimation framework that leverages the geometric structure of a scene to conduct depth estimation. Specifically, we represent the geometric structure of an indoor scene as a collection of corners, boundaries and planes. On the one hand, once a depth map is estimated, this geometric structure can be inferred from the estimated depth map; thus, the geometric structure functions as a regularizer for depth estimation. On the other hand, this estimation also benefits from the geometric structure of a scene estimated from an image, where the structure functions as a prior. However, furniture in indoor scenes makes it challenging to infer geometric structure from depth or image data. An attention map is inferred to facilitate both depth estimation from features of the geometric structure and geometric inference from the estimated depth map. To validate the effectiveness of each component in our framework under controlled conditions, we render a synthetic dataset, the Shanghaitech-Kujiale Indoor 360° dataset, with 3550 360° indoor images. Extensive experiments on popular datasets validate the effectiveness of our solution. We also demonstrate that our method can be applied to counterfactual depth.
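
A highly simplified sketch of the structure-as-regularizer idea (hypothetical tensor names and loss weight; not the paper's implementation): a structure map inferred from the estimated depth is penalized against a structure map inferred from the image, with the attention map down-weighting regions cluttered by furniture.

```python
# Hypothetical sketch of a structure-as-regularizer training loss (not the paper's code).
import torch
import torch.nn.functional as F

def total_loss(depth_pred, depth_gt, struct_from_depth, struct_from_image, attention, w=0.1):
    """struct_* are dense maps of corners/boundaries/planes predicted by two heads;
    attention down-weights regions occluded by furniture."""
    depth_loss = F.l1_loss(depth_pred, depth_gt)
    # consistency between structure inferred from depth and from the image,
    # masked by the attention map
    struct_loss = (attention * (struct_from_depth - struct_from_image).abs()).mean()
    return depth_loss + w * struct_loss
```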

70 citations


Book ChapterDOI
23 Aug 2020
TL;DR: Experimental results demonstrate the superiority of LF-InterNet over the state-of-the-art methods, i.e., the method can achieve high PSNR and SSIM scores with low computational cost, and recover faithful details in the reconstructed images.
Abstract: Light field (LF) cameras record both intensity and directions of light rays, and capture scenes from a number of viewpoints. Information both within each perspective (i.e., spatial information) and among different perspectives (i.e., angular information) is beneficial to image super-resolution (SR). In this paper, we propose a spatial-angular interactive network (namely, LF-InterNet) for LF image SR. Specifically, spatial and angular features are first separately extracted from input LFs, and then repetitively interacted to progressively incorporate spatial and angular information. Finally, the interacted features are fused to super-resolve each sub-aperture image. Experimental results demonstrate the superiority of LF-InterNet over the state-of-the-art methods, i.e., our method can achieve high PSNR and SSIM scores with low computational cost, and recover faithful details in the reconstructed images.
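
One common way to separate the two kinds of features is to operate on a macro-pixel light field image in which the A x A views are interleaved: a convolution with kernel size and stride A samples one pixel per view (angular features), while a dilated convolution with dilation A only mixes same-view neighbors (spatial features). The sketch below illustrates this layout under assumed sizes; it is not the released LF-InterNet code.

```python
# Sketch of separate spatial / angular feature extraction from a macro-pixel
# light field image (assumed A x A views interleaved per macro-pixel, 1 channel).
import torch
import torch.nn as nn

A = 5  # angular resolution (assumed)

angular_conv = nn.Conv2d(1, 64, kernel_size=A, stride=A)                # one sample per view
spatial_conv = nn.Conv2d(1, 64, kernel_size=3, padding=A, dilation=A)   # same-view neighbors only

macro_pixel_lf = torch.randn(1, 1, 32 * A, 32 * A)  # toy 32x32 spatial, A x A angular
ang_feat = angular_conv(macro_pixel_lf)   # (1, 64, 32, 32)
spa_feat = spatial_conv(macro_pixel_lf)   # (1, 64, 32*A, 32*A)
```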

58 citations


Posted ContentDOI
TL;DR: The authors propose to separate the latent space of portrait images into two subspaces: a geometry space and a texture space, whose latent codes are fed to two network branches separately, one to generate the 3D geometry of portraits with canonical pose, and the other to generate textures.
Abstract: Recently, Generative Adversarial Networks (GANs) have been widely used for portrait image generation. However, in the latent space learned by GANs, different attributes, such as pose, shape, and texture style, are generally entangled, making the explicit control of specific attributes difficult. To address this issue, we propose a SofGAN image generator to decouple the latent space of portraits into two subspaces: a geometry space and a texture space. The latent codes sampled from the two subspaces are fed to two network branches separately, one to generate the 3D geometry of portraits with canonical pose, and the other to generate textures. The aligned 3D geometries also come with semantic part segmentation, encoded as a semantic occupancy field (SOF). The SOF allows the rendering of consistent 2D semantic segmentation maps at arbitrary views, which are then fused with the generated texture maps and stylized to a portrait photo using our semantic instance-wise (SIW) module. Through extensive experiments, we show that our system can generate high quality portrait images with independently controllable geometry and texture attributes. The method also generalizes well in various applications such as appearance-consistent facial animation and dynamic styling.

44 citations


Journal ArticleDOI
TL;DR: Experiments show the proposed method is robust to a wide range of challenging scenes and outperforms the state-of-the-art 2D/3D/4D (light-field) saliency detection approaches.
Abstract: Incorrect saliency detection, such as false alarms and missed alarms, may lead to potentially severe consequences in various application areas. Effective separation of salient objects in complex scenes is a major challenge in saliency detection. In this paper, we propose a new method for saliency detection on light fields to improve saliency detection in challenging scenes. Using abundant light field cues, we construct an object-guided depth map, which acts as an inducer to efficiently incorporate the relations among the light field cues. Furthermore, we enforce spatial consistency by constructing an optimization model, named Depth-induced Cellular Automata (DCA), in which the saliency value of each superpixel is updated by exploiting the intrinsic relevance of its similar regions. Additionally, the proposed DCA model enables inaccurate saliency maps to achieve a high level of accuracy. We analyze our approach on one publicly available dataset. Experiments show the proposed method is robust to a wide range of challenging scenes and outperforms the state-of-the-art 2D/3D/4D (light-field) saliency detection approaches.
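
The cellular-automata update can be pictured as each superpixel pulling its saliency toward that of its most similar regions; the sketch below uses an assumed Gaussian affinity and update weights, and is only illustrative of the idea, not the exact DCA formulation.

```python
# Sketch of a similarity-weighted, cellular-automata style saliency update over
# superpixels (illustrative only; affinity and update rule are assumptions).
import numpy as np

def dca_step(saliency, feature, sigma=0.1, alpha=0.6):
    """saliency: (N,) per-superpixel saliency; feature: (N,D) descriptors
    (e.g., color + object-guided depth)."""
    d2 = ((feature[:, None, :] - feature[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    W /= W.sum(1, keepdims=True)             # row-normalized affinity
    # each superpixel moves toward the saliency of its most similar regions
    return alpha * saliency + (1 - alpha) * W @ saliency

s = np.random.rand(200)
f = np.random.rand(200, 4)
for _ in range(10):
    s = dca_step(s, f)
```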

42 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: Comprehensive experiments show that RNR provides a practical and effective solution for conducting free-viewpoint relighting and improves the quality of view synthesis.
Abstract: We present a novel Relightable Neural Renderer (RNR) for simultaneous view synthesis and relighting using multi-view image inputs. Existing neural rendering (NR) does not explicitly model the physical rendering process and hence has limited capabilities on relighting. RNR instead models image formation in terms of environment lighting, object intrinsic attributes, and light transport function (LTF), each corresponding to a learnable component. In particular, the incorporation of a physically based rendering process not only enables relighting but also improves the quality of view synthesis. Comprehensive experiments on synthetic and real data show that RNR provides a practical and effective solution for conducting free-viewpoint relighting.
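
Schematically, the image formation model factors per-pixel outgoing radiance into environment lighting modulated by a learned light transport term and an intrinsic color; a toy discretized version (names and discretization assumed, not the RNR implementation) could look like:

```python
# Toy discretized rendering model: outgoing radiance = sum over sampled incident
# directions of environment light * learned light transport (illustrative sketch).
import torch

def render_pixel(env_light, ltf, albedo):
    """env_light: (L, 3) radiance of L sampled light directions,
    ltf: (L,) learned transport weight per direction for this surface point,
    albedo: (3,) intrinsic color."""
    return albedo * (ltf[:, None] * env_light).sum(0)
```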

39 citations


Journal ArticleDOI
TL;DR: A novel learning-based method is proposed, which accepts sparsely-sampled LFs with irregular structures and produces densely-sampled LFs with arbitrary angular resolution accurately and efficiently, along with a simple yet effective method for optimizing the sampling pattern.
Abstract: A densely-sampled light field (LF) is highly desirable in various applications. However, it is costly to acquire such data. Although many computational methods have been proposed to reconstruct a densely-sampled LF from a sparsely-sampled one, they still suffer from either low reconstruction quality, low computational efficiency, or the restriction on the regularity of the sampling pattern. To this end, we propose a novel learning-based method, which accepts sparsely-sampled LFs with irregular structures, and produces densely-sampled LFs with arbitrary angular resolution accurately and efficiently. We also propose a simple yet effective method for optimizing the sampling pattern. Our proposed method, an end-to-end trainable network, reconstructs a densely-sampled LF in a coarse-to-fine manner. Specifically, the coarse sub-aperture image (SAI) synthesis module first explores the scene geometry from an unstructured sparsely-sampled LF and leverages it to independently synthesize novel SAIs, in which a confidence-based blending strategy is proposed to fuse the information from different input SAIs, giving an intermediate densely-sampled LF. Then, the efficient LF refinement module learns the angular relationship within the intermediate result to recover the LF parallax structure. Comprehensive experimental evaluations demonstrate the superiority of our method on both real-world and synthetic LF images when compared with state-of-the-art methods.
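
The confidence-based blending step can be sketched as a per-pixel weighted average of the novel-view candidates synthesized from each input SAI (tensor names assumed; not the paper's code):

```python
# Sketch of confidence-based blending of novel views synthesized independently
# from each input sub-aperture image (illustrative only).
import torch

def blend_views(candidates, confidences):
    """candidates: (K, 3, H, W) novel-view estimates warped from K input SAIs,
    confidences: (K, 1, H, W) predicted per-pixel confidence for each estimate."""
    w = torch.softmax(confidences, dim=0)      # normalize confidences across inputs
    return (w * candidates).sum(dim=0)         # (3, H, W) blended intermediate view
```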

35 citations


Journal ArticleDOI
TL;DR: This paper presents a fully automatic framework for extracting editable 3D objects directly from a single photograph, and builds a novel instance-aware segmentation network for accurate part separation.
Abstract: This paper presents a fully automatic framework for extracting editable 3D objects directly from a single photograph. Unlike previous methods which recover either depth maps, point clouds, or mesh surfaces, we aim to recover 3D objects that have semantic parts and can be directly edited. We base our work on the assumption that most human-made objects are composed of parts, and that these parts can be well represented by generalized primitives. Our work makes an attempt towards recovering two types of primitive-shaped objects, namely, generalized cuboids and generalized cylinders. To this end, we build a novel instance-aware segmentation network for accurate part separation. Our GeoNet outputs a set of smooth part-level masks labeled as profiles and bodies. Then, in a key stage, we simultaneously identify profile-body relations and recover 3D parts by sweeping the recognized profiles along their body contours, jointly optimizing the geometry to align with the recovered masks. Qualitative and quantitative experiments show that our algorithm can recover high-quality 3D models and outperforms existing methods in both instance segmentation and 3D reconstruction.
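
As a toy example of the sweeping operation, the snippet below builds the vertex grid of a generalized cylinder by sweeping a circular profile whose radius varies along the body axis (illustrative only; the paper sweeps recognized 2D profiles along detected body contours):

```python
# Toy example of "sweeping" a 2D profile along an axis to build a generalized
# cylinder vertex grid (illustrative of the primitive-sweeping idea only).
import numpy as np

def sweep_profile(radii, heights, n_sides=32):
    """radii[i] is the profile radius at height heights[i]; returns (M, 3) vertices."""
    theta = np.linspace(0, 2 * np.pi, n_sides, endpoint=False)
    rings = []
    for r, h in zip(radii, heights):
        ring = np.stack([r * np.cos(theta), r * np.sin(theta), np.full_like(theta, h)], axis=1)
        rings.append(ring)
    return np.concatenate(rings, axis=0)

verts = sweep_profile(radii=[1.0, 0.8, 1.2], heights=[0.0, 0.5, 1.0])
```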

20 citations


Journal ArticleDOI
TL;DR: In this paper, the Orthogonal Softmax Layer (OSL) is proposed to make the weight vectors in the classification layer remain orthogonal during both the training and test processes.
Abstract: A deep neural network of multiple nonlinear layers forms a large function space, which can easily lead to overfitting when it encounters small-sample data. To mitigate overfitting in small-sample classification, learning more discriminative features from small-sample data is becoming a new trend. To this end, this paper aims to find a subspace of neural networks that can facilitate a large decision margin. Specifically, we propose the Orthogonal Softmax Layer (OSL), which makes the weight vectors in the classification layer remain orthogonal during both the training and test processes. The Rademacher complexity of a network using the OSL is only $\frac{1}{K}$ of that of a network using a fully connected classification layer, where $K$ is the number of classes, leading to a tighter generalization error bound. Experimental results demonstrate that the proposed OSL outperforms the comparison methods on four small-sample benchmark datasets, as well as its applicability to large-sample datasets. Code is available at https://github.com/dongliangchang/OSLNet.
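
One simple way to keep the class weight vectors orthogonal throughout training is to restrict each class to a disjoint block of feature dimensions with a fixed binary mask, as sketched below (dimensions assumed; an illustration of the orthogonality idea rather than the exact OSL layer):

```python
# Sketch: a classification layer whose per-class weight vectors stay orthogonal
# because each class only uses a disjoint block of feature dimensions.
import torch
import torch.nn as nn

class BlockOrthogonalClassifier(nn.Module):
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        assert feat_dim % num_classes == 0
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        mask = torch.zeros(num_classes, feat_dim)
        block = feat_dim // num_classes
        for k in range(num_classes):
            mask[k, k * block:(k + 1) * block] = 1.0   # class k sees only its own block
        self.register_buffer("mask", mask)

    def forward(self, x):                              # x: (B, feat_dim)
        # masked rows have disjoint supports, hence remain mutually orthogonal
        return x @ (self.weight * self.mask).t()       # logits: (B, num_classes)

logits = BlockOrthogonalClassifier(feat_dim=512, num_classes=4)(torch.randn(8, 512))
```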

20 citations


Proceedings ArticleDOI
12 Oct 2020
TL;DR: This paper proposes a novel end-to-end learning-based approach that comprehensively utilizes the specific characteristics of the input from two complementary and parallel perspectives to reconstruct high-resolution light field images from hybrid lenses.
Abstract: This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives. Specifically, one module regresses a spatially consistent intermediate estimation by learning a deep multidimensional and cross-domain feature representation; the other one constructs another intermediate estimation, which maintains the high-frequency textures, by propagating the information of the high-resolution view. We finally leverage the advantages of the two intermediate estimations via the learned attention maps, leading to the final high-resolution LF image. Extensive experiments demonstrate the significant superiority of our approach over state-of-the-art ones. That is, our method not only improves the PSNR by more than 2 dB, but also preserves the LF structure much better. To the best of our knowledge, this is the first end-to-end deep learning method for reconstructing a high-resolution LF image with a hybrid input. We believe our framework could potentially decrease the cost of high-resolution LF data acquisition and also be beneficial to LF data storage and transmission. The code is available at https://github.com/jingjin25/LFhybridSR-Fusion.
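
The final fusion step can be sketched as an attention-weighted combination of the two intermediate estimations (tensor names assumed; not the released code):

```python
# Sketch of fusing two intermediate high-resolution estimates with a learned
# attention map (illustrative of the fusion step only).
import torch

def fuse_estimates(est_consistent, est_detail, attention):
    """est_*: (B, 3, H, W) intermediate LF view estimates from the two branches,
    attention: (B, 1, H, W) learned map in [0, 1] favoring one branch per pixel."""
    return attention * est_detail + (1 - attention) * est_consistent
```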

20 citations


Journal ArticleDOI
TL;DR: A generic multiview tracking (GMT) framework that allows camera movement, while requiring neither specific object model nor camera calibration, is proposed, which shows clear advantages in terms of robustness over state-of-the-art ones.
Abstract: Recent progress in visual tracking has greatly improved tracking performance. However, challenges such as occlusion and view change remain obstacles in real-world deployment. A natural solution to these challenges is to use multiple cameras with multiview inputs, though existing systems are mostly limited to specific targets (e.g., humans), static cameras, and/or require camera calibration. To break through these limitations, we propose a generic multiview tracking (GMT) framework that allows camera movement, while requiring neither a specific object model nor camera calibration. A key innovation in our framework is a cross-camera trajectory prediction network (TPN), which implicitly and dynamically encodes camera geometric relations, and hence addresses missing-target issues such as occlusion. Moreover, during tracking, we assemble information across different cameras to dynamically update a novel collaborative correlation filter (CCF), which is shared among cameras to achieve robustness against view change. The two components are integrated into a correlation filter tracking framework, where features are trained offline using existing single-view tracking datasets. For evaluation, we first contribute a new generic multiview tracking dataset (GMTD) with careful annotations, and then run experiments on the GMTD and CAMPUS datasets. The proposed GMT algorithm shows clear advantages in terms of robustness over state-of-the-art ones.

Journal ArticleDOI
TL;DR: The authors introduce a novel concentric multi-spectral light field (CMSLF) design that is able to recover the shape and reflectance of surfaces of various materials in one shot.
Abstract: Recovering the shape and reflectance of non-Lambertian surfaces remains a challenging problem in computer vision since the view-dependent appearance invalidates traditional photo-consistency constraint. In this paper, we introduce a novel concentric multi-spectral light field (CMSLF) design that is able to recover the shape and reflectance of surfaces of various materials in one shot. Our CMSLF system consists of an array of cameras arranged on concentric circles where each ring captures a specific spectrum. Coupled with a multi-spectral ring light, we are able to sample viewpoint and lighting variations in a single shot via spectral multiplexing. We further show that our concentric camera and light source setting results in a unique single-peak pattern in specularity variations across viewpoints. This property enables robust depth estimation for specular points. To estimate depth and multi-spectral reflectance map, we formulate a physics-based reflectance model for the CMSLF under the surface camera (S-Cam) representation. Extensive synthetic and real experiments show that our method outperforms the state-of-the-art shape reconstruction methods, especially for non-Lambertian surfaces.

Journal ArticleDOI
TL;DR: Comprehensive experiments show the novel and practical neural rendering technique called neural opacity point cloud (NOPC) can produce photorealistic rendering on inputs from multi-view setups such as a turntable system for hair and furry toy captures.
Abstract: Fuzzy objects composed of hair, fur, or feather are impossible to scan even with the latest active or passive 3D scanners. We present a novel and practical neural rendering (NR) technique called neural opacity point cloud (NOPC) to allow high quality rendering of such fuzzy objects at any viewpoint. NOPC employs a learning-based scheme to extract geometric and appearance features on 3D point clouds including their opacity. It then maps the 3D features onto virtual viewpoints where a new U-Net based NR manages to handle noisy and incomplete geometry while maintaining translation equivariance. Comprehensive experiments on existing and new datasets show our NOPC can produce photorealistic rendering on inputs from multi-view setups such as a turntable system for hair and furry toy captures.
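
Using per-point opacity during rendering can be pictured as front-to-back alpha compositing of point features along each pixel ray, as in the sketch below (illustrative only; not the NOPC implementation):

```python
# Sketch of opacity-aware accumulation of point features along one pixel ray
# (front-to-back alpha compositing).
import torch

def composite(features, alphas):
    """features: (M, C) features of points hit by one pixel ray, sorted near-to-far;
    alphas: (M,) learned per-point opacities in [0, 1]."""
    out = torch.zeros(features.shape[1])
    transmittance = 1.0
    for f, a in zip(features, alphas):
        out += transmittance * a * f     # contribution attenuated by what is in front
        transmittance *= (1.0 - a)
    return out
```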

Posted Content
TL;DR: This work first encodes portrait scans with a semantic occupancy field (SOF), which represents semantic-embedded geometric structure and outputs free-viewpoint semantic segmentation maps, and then designs a semantic instance-wise (SIW) StyleGAN to regionally style the segmentation maps.
Abstract: Generating portrait images from a single latent space faces the problem of entangled attributes, making it difficult to explicitly adjust the generation of specific attributes, e.g., contour and viewpoint control or dynamic styling. Therefore, we propose to decompose the generation space into two subspaces: a geometric space and a texture space. We first encode portrait scans with a semantic occupancy field (SOF), which represents semantic-embedded geometric structure and outputs free-viewpoint semantic segmentation maps. Then we design a semantic instance-wise (SIW) StyleGAN to regionally style the segmentation maps. We capture 664 3D portrait scans for SOF training and use real captured photos (FFHQ and CelebA-HQ) for SIW StyleGAN training. Extensive experiments show that our representations enable appearance-consistent control of shape, pose, and regional styles, achieve state-of-the-art results, and generalize well in various application scenarios.

Proceedings ArticleDOI
12 Oct 2020
TL;DR: This paper proposes Neural3D: a novel neural human portrait scanning system using only a single RGB camera, and proposes a context-aware correspondence learning approach which jointly models the appearance, spatial and motion information between feature pairs.
Abstract: Reconstructing a human portrait in a realistic and convenient manner is critical for human modeling and understanding. Aiming at light-weight and realistic human portrait reconstruction, in this paper we propose Neural3D: a novel neural human portrait scanning system using only a single RGB camera. In our system, to enable accurate pose estimation, we propose a context-aware correspondence learning approach which jointly models the appearance, spatial and motion information between feature pairs. To enable realistic reconstruction and suppress geometry error, we further adopt a point-based neural rendering scheme to generate realistic and immersive portrait visualization from arbitrary virtual viewpoints. By introducing these learning-based technical components into the pure RGB-based human modeling framework, we achieve both accurate camera pose estimation and realistic free-viewpoint rendering of the reconstructed human portrait. Extensive experiments on a variety of challenging capture scenarios demonstrate the robustness and effectiveness of our approach.

Proceedings ArticleDOI
Quan Meng, Jiakai Zhang, Qiang Hu, Xuming He, Jingyi Yu
12 Oct 2020
TL;DR: The authors propose the Line Graph Neural Network (LGNN), which employs a deep convolutional neural network (DCNN) for proposing line segments directly, with a graph neural network (GNN) module for reasoning about their connectivities.
Abstract: We present a novel real-time line segment detection scheme called Line Graph Neural Network (LGNN). Existing approaches require a computationally expensive verification or postprocessing step. Our LGNN employs a deep convolutional neural network (DCNN) for proposing line segments directly, with a graph neural network (GNN) module for reasoning about their connectivities. Specifically, LGNN exploits a new quadruplet representation for each line segment, where the GNN module takes the predicted candidates as vertexes and constructs a sparse graph to enforce structural context. Compared with the state-of-the-art, LGNN achieves near real-time performance without compromising accuracy. LGNN further enables time-sensitive 3D applications. When a 3D point cloud is accessible, we present a multi-modal line segment classification technique for extracting a 3D wireframe of the environment robustly and efficiently.
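
The graph-construction idea can be sketched as follows: each candidate is a quadruplet of endpoint coordinates, and two candidates become connected vertices when their endpoints nearly coincide, giving a sparse graph for the GNN to reason over (a brute-force illustrative sketch with an assumed distance threshold, not the paper's implementation):

```python
# Sketch of building a sparse graph over line-segment candidates, where each
# candidate is a quadruplet (x1, y1, x2, y2).
import numpy as np

def build_segment_graph(segments, radius=5.0):
    """segments: (N, 4) array of endpoint quadruplets. Returns a list of edges (i, j)."""
    ends = segments.reshape(-1, 2, 2)                       # (N, 2, 2) endpoints
    edges = []
    n = len(segments)
    for i in range(n):                                      # O(N^2); fine for a sketch
        for j in range(i + 1, n):
            d = np.linalg.norm(ends[i][:, None, :] - ends[j][None, :, :], axis=-1)
            if d.min() < radius:                            # endpoints nearly coincide
                edges.append((i, j))
    return edges
```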

Book ChapterDOI
23 Aug 2020
TL;DR: A PIV solution that uses a compact lenslet-based light field camera to track dense particles floating in the fluid and reconstruct the 3D fluid flow, together with a motion-constrained optical flow estimation algorithm that enforces local motion rigidity and the Navier-Stokes fluid constraint.
Abstract: Particle Imaging Velocimetry (PIV) estimates the fluid flow by analyzing the motion of injected particles. The problem is challenging as the particles lie at different depths but have similar appearances. Tracking a large number of moving particles is particularly difficult due to the heavy occlusion. In this paper, we present a PIV solution that uses a compact lenslet-based light field camera to track dense particles floating in the fluid and reconstruct the 3D fluid flow. We exploit the focal symmetry property in the light field focal stacks for recovering the depths of similar-looking particles. We further develop a motion-constrained optical flow estimation algorithm by enforcing the local motion rigidity and the Navier-Stokes fluid constraint. Finally, the estimated particle motion trajectory is used to visualize the 3D fluid flow. Comprehensive experiments on both synthetic and real data show that using a compact light field camera, our technique can recover dense and accurate 3D fluid flow.
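
A toy version of the motion-constrained objective (assumed weight; not the paper's formulation) combines a data term with a penalty on the divergence of the velocity field, a common proxy for the incompressibility implied by the Navier-Stokes constraint:

```python
# Sketch of a motion-constrained flow objective: data term plus a penalty that
# discourages divergence of the velocity field (purely illustrative).
import numpy as np

def flow_energy(flow, data_residual, lam=0.1):
    """flow: (H, W, 2) velocity field (u, v); data_residual: (H, W) matching error."""
    u, v = flow[..., 0], flow[..., 1]
    du_dx = np.gradient(u, axis=1)
    dv_dy = np.gradient(v, axis=0)
    divergence = du_dx + dv_dy
    return data_residual.sum() + lam * (divergence ** 2).sum()
```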

Journal ArticleDOI
TL;DR: A novel ray-space epipolar geometry is developed which intrinsically encapsulates the complete projective relationship between two light fields, while the generalized epipolar geometry, which describes the relationship between normalized light fields, is the specialization of the proposed model to calibrated cameras.
Abstract: Light field essentially represents rays in space. The epipolar geometry between two light fields is an important relationship that captures ray-ray correspondences and the relative configuration of two views. Unfortunately, so far little work has been done in deriving a formal epipolar geometry model that is specifically tailored for light field cameras. This is primarily due to the high-dimensional nature of the ray sampling process with a light field camera. This paper fills in this gap by developing a novel ray-space epipolar geometry which intrinsically encapsulates the complete projective relationship between two light fields, while the generalized epipolar geometry, which describes the relationship between normalized light fields, is the specialization of the proposed model to calibrated cameras. With Plücker parameterization, we propose the ray-space projection model involving a 6×6 ray-space intrinsic matrix for the ray sampling of a light field camera. The ray-space fundamental matrix and its properties are then derived to constrain ray-ray correspondences for general and special motions. Finally, based on ray-space epipolar geometry, we present two novel algorithms, one for fundamental matrix estimation, and the other for calibration. Experiments on synthetic and real data have validated the effectiveness of ray-space epipolar geometry in solving 3D computer vision tasks with light field cameras.
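
For reference, the Plücker parameterization mentioned above represents a ray by its direction and moment, and two rays can correspond to the same 3D point only if they are coplanar, which is the kind of bilinear constraint a ray-space fundamental matrix encodes. The sketch below shows these standard geometric facts only; it does not reproduce the paper's 6×6 ray-space intrinsic or fundamental matrices.

```python
# Plücker parameterization of rays and the coplanarity (intersection) test that
# underlies ray-ray correspondence constraints.
import numpy as np

def plucker(p, d):
    """Ray through point p with direction d -> 6-vector (d, m), with moment m = p x d."""
    d = d / np.linalg.norm(d)
    return np.concatenate([d, np.cross(p, d)])

def reciprocal_product(L1, L2):
    """Zero iff the two lines are coplanar (corresponding rays must intersect)."""
    d1, m1 = L1[:3], L1[3:]
    d2, m2 = L2[:3], L2[3:]
    return d1 @ m2 + d2 @ m1

r1 = plucker(np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]))
r2 = plucker(np.array([1.0, 0.0, 0.0]), np.array([-1.0, 0.0, 1.0]))  # meets r1 at (0, 0, 1)
assert abs(reciprocal_product(r1, r2)) < 1e-9
```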

Proceedings ArticleDOI
01 Apr 2020
TL;DR: In this article, a color photometric stereo (CPS) method was proposed to recover high quality, detailed 3D face geometry in a single shot using three uncalibrated near point lights of different colors and a single camera.
Abstract: We present a new color photometric stereo (CPS) method that recovers high quality, detailed 3D face geometry in a single shot. Our system uses three uncalibrated near point lights of different colors and a single camera. For robust self-calibration of the light sources, we use 3D morphable model (3DMM) [1] and semantic segmentation of facial parts. For reconstruction, we address the inherent spectral ambiguity in color photometric stereo by incorporating albedo consensus, albedo similarity, and proxy prior into a unified framework. In this way, we jointly exploit multiple cues to resolve under-determinedness, without the need for spatial constancy of albedo. Experiments show that our new approach produces state-of-the-art results from a single image with high-fidelity geometry that includes details such as wrinkles.
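
For context, the classical color photometric stereo formation model under a Lambertian surface with spectrally uniform albedo reduces normal recovery to a 3x3 linear solve per pixel; the paper's contribution lies in resolving the spectral ambiguity when that uniformity assumption fails. The sketch below (assumed light directions and intensities) illustrates only the basic model.

```python
# Basic color photometric stereo model with spectrally uniform albedo:
# I_c = rho * (n . l_c) for each color channel c, so the scaled normal rho*n
# follows from a 3x3 linear solve per pixel (illustration only).
import numpy as np

L = np.array([[0.3, 0.2, 0.93],     # assumed direction of the red light
              [-0.3, 0.2, 0.93],    # green light
              [0.0, -0.35, 0.94]])  # blue light
I = np.array([0.7, 0.65, 0.6])      # observed R, G, B intensities at one pixel

b = np.linalg.solve(L, I)           # b = rho * n (scaled normal)
albedo = np.linalg.norm(b)
normal = b / albedo
```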

Journal ArticleDOI
Peihong Yu, Cen Wang, Zhirui Wang, Jingyi Yu, Laurent Kneip
TL;DR: A novel stereo trifocal tensor solver is presented and the camera matrix’s ability to continuously and robustly bootstrap visual motion estimation pipelines via integration into a robust, purely line-based visual odometry pipeline is outlined.
Abstract: While most monocular structure-from-motion frameworks rely on sparse keypoints, it has long been acknowledged that lines represent an alternative, higher-order feature with high accuracy, repeatability, and abundant availability in man-made environments. Its exclusive use, however, is severely complicated by its inability to resolve the common bootstrapping scenario of two-view geometry. Even with stereo cameras, a one-dimensional disparity space, as well as ill-posed triangulations of horizontal lines make the realization of purely line-based tracking pipelines difficult. The present paper successfully leverages the redundancy in camera matrices to alleviate this shortcoming. We present a novel stereo trifocal tensor solver and extend it to the case of two camera matrix view-points. Our experiments demonstrate superior behavior with respect to both 2D-2D and 3D-3D alternatives. We furthermore outline the camera matrix’s ability to continuously and robustly bootstrap visual motion estimation pipelines via integration into a robust, purely line-based visual odometry pipeline. The result leads to state-of-the-art tracking accuracy comparable to what is achieved by point-based stereo or even dense depth camera alternatives.

Patent
05 Mar 2020
TL;DR: A method of generating a three-dimensional model of an object from a plurality of light field images captured at a plurality of viewpoints with a single light field camera is described.
Abstract: A method of generating a three-dimensional model of an object is disclosed. The method may use a light field camera to capture a plurality of light field images at a plurality of viewpoints. The method may include capturing a first light field image at a first viewpoint; capturing a second light field image at a second viewpoint; estimating a rotation and a translation of a light field from the first viewpoint to the second viewpoint; obtaining a disparity map from each of the plurality of light field images; and computing a three-dimensional point cloud by optimizing the rotation and translation of the light field and the disparity maps. The first light field image may include a first plurality of subaperture images and the second light field image may include a second plurality of subaperture images.

Patent
28 Jul 2020
TL;DR: In this paper, a method of detecting and recognizing faces using a light field camera array, comprising capturing multi-view color images using the LF camera array; obtaining a depth map; conducting light field rendering using a weight function comprising a depth component and a sematic component.
Abstract: A method of detecting and recognizing faces using a light field camera array, comprising capturing multi-view color images using the light field camera array; obtaining a depth map; conducting light field rendering using a weight function comprising a depth component and a semantic component, where the weight function assigns a weight to each ray in the light field; and detecting and recognizing a face.
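
A minimal sketch of such a weight function (functional forms, parameters, and names are assumptions for illustration; the patent does not specify them here) might combine a depth-consistency term with a semantic term:

```python
# Hypothetical per-ray weight combining a depth term and a semantic term,
# in the spirit of the weight function described above (not the patented method).
import numpy as np

def ray_weight(ray_depth, target_depth, semantic_prob_face, sigma=0.05, w_sem=0.5):
    """Higher weight for rays consistent with the target depth and likely on a face."""
    depth_term = np.exp(-((ray_depth - target_depth) ** 2) / (2 * sigma ** 2))
    semantic_term = semantic_prob_face        # probability the ray hits a face region
    return (1 - w_sem) * depth_term + w_sem * semantic_term
```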