Showing papers by "Jingyi Yu" published in 2020


Proceedings ArticleDOI
14 Jun 2020
TL;DR: Comprehensive experiments show NHR significantly outperforms the state-of-the-art neural and image-based rendering techniques, especially on hands, hair, nose, foot, etc.
Abstract: We present an end-to-end Neural Human Renderer (NHR) for dynamic human captures under the multi-view setting. NHR adopts PointNet++ for feature extraction (FE) to enable robust 3D correspondence matching on low quality, dynamic 3D reconstructions. To render new views, we map 3D features onto the target camera as a 2D feature map and employ an anti-aliased CNN to handle holes and noise. Newly synthesized views from NHR can be further used to construct visual hulls to handle textureless and/or dark regions such as black clothing. Comprehensive experiments show NHR significantly outperforms the state-of-the-art neural and image-based rendering techniques, especially on hands, hair, nose, foot, etc.
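
As a rough illustration of the feature-projection step described above, the sketch below (hypothetical shapes and names, not the authors' NHR code) splats per-point features onto the target camera to form a 2D feature map that a rendering CNN could then translate into an image.

```python
# Hypothetical sketch of projecting 3D point features into a 2D feature map,
# in the spirit of point-based neural rendering (not the official NHR code).
import torch

def splat_point_features(xyz, feats, K, R, t, H, W):
    """xyz: (N,3) points, feats: (N,C) features, K: (3,3) intrinsics,
    R: (3,3), t: (3,) extrinsics. Returns a (C,H,W) feature map."""
    cam = R @ xyz.T + t[:, None]                  # (3,N) points in camera frame
    uvw = K @ cam                                 # project with intrinsics
    z = uvw[2]
    u = (uvw[0] / z.clamp(min=1e-8)).round().long()
    v = (uvw[1] / z.clamp(min=1e-8)).round().long()
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    fmap = torch.zeros(feats.shape[1], H, W)
    # nearest-point splatting; a real renderer would also z-buffer overlapping points
    fmap[:, v[valid], u[valid]] = feats[valid].T
    return fmap
```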

74 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: A novel learning-based depth estimation framework that leverages the geometric structure of a scene to conduct depth estimation and demonstrates that the method can be applied to counterfactual depth.
Abstract: Motivated by the correlation between the depth and the geometric structure of a 360° indoor image, we propose a novel learning-based depth estimation framework that leverages the geometric structure of a scene to conduct depth estimation. Specifically, we represent the geometric structure of an indoor scene as a collection of corners, boundaries and planes. On the one hand, once a depth map is estimated, this geometric structure can be inferred from the estimated depth map; thus, the geometric structure functions as a regularizer for depth estimation. On the other hand, this estimation also benefits from the geometric structure of a scene estimated from an image, where the structure functions as a prior. However, furniture in indoor scenes makes it challenging to infer geometric structure from depth or image data. An attention map is inferred to facilitate both depth estimation from features of the geometric structure and geometric inference from the estimated depth map. To validate the effectiveness of each component in our framework under controlled conditions, we render a synthetic dataset, the Shanghaitech-Kujiale Indoor 360° dataset, with 3550 360° indoor images. Extensive experiments on popular datasets validate the effectiveness of our solution. We also demonstrate that our method can be applied to counterfactual depth.
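
A highly simplified sketch of the structure-as-regularizer idea (hypothetical tensor names and loss weight; not the paper's implementation): a structure map inferred from the estimated depth is penalized against a structure map inferred from the image, with the attention map down-weighting regions cluttered by furniture.

```python
# Hypothetical sketch of a structure-as-regularizer training loss (not the paper's code).
import torch
import torch.nn.functional as F

def total_loss(depth_pred, depth_gt, struct_from_depth, struct_from_image, attention, w=0.1):
    """struct_* are dense maps of corners/boundaries/planes predicted by two heads;
    attention down-weights regions occluded by furniture."""
    depth_loss = F.l1_loss(depth_pred, depth_gt)
    # consistency between structure inferred from depth and from the image,
    # masked by the attention map
    struct_loss = (attention * (struct_from_depth - struct_from_image).abs()).mean()
    return depth_loss + w * struct_loss
```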

70 citations


Book ChapterDOI
23 Aug 2020
TL;DR: Experimental results demonstrate the superiority of LF-InterNet over the state-of-the-art methods, i.e., the method can achieve high PSNR and SSIM scores with low computational cost, and recover faithful details in the reconstructed images.
Abstract: Light field (LF) cameras record both intensity and directions of light rays, and capture scenes from a number of viewpoints. Information both within each perspective (i.e., spatial information) and among different perspectives (i.e., angular information) is beneficial to image super-resolution (SR). In this paper, we propose a spatial-angular interactive network (namely, LF-InterNet) for LF image SR. Specifically, spatial and angular features are first separately extracted from input LFs, and then repetitively interacted to progressively incorporate spatial and angular information. Finally, the interacted features are fused to super-resolve each sub-aperture image. Experimental results demonstrate the superiority of LF-InterNet over the state-of-the-art methods, i.e., our method can achieve high PSNR and SSIM scores with low computational cost, and recover faithful details in the reconstructed images.
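
One common way to separate the two kinds of features is to operate on a macro-pixel light field image in which the A x A views are interleaved: a convolution with kernel size and stride A samples one pixel per view (angular features), while a dilated convolution with dilation A only mixes same-view neighbors (spatial features). The sketch below illustrates this layout under assumed sizes; it is not the released LF-InterNet code.

```python
# Sketch of separate spatial / angular feature extraction from a macro-pixel
# light field image (assumed A x A views interleaved per macro-pixel, 1 channel).
import torch
import torch.nn as nn

A = 5  # angular resolution (assumed)

angular_conv = nn.Conv2d(1, 64, kernel_size=A, stride=A)                # one sample per view
spatial_conv = nn.Conv2d(1, 64, kernel_size=3, padding=A, dilation=A)   # same-view neighbors only

macro_pixel_lf = torch.randn(1, 1, 32 * A, 32 * A)  # toy 32x32 spatial, A x A angular
ang_feat = angular_conv(macro_pixel_lf)   # (1, 64, 32, 32)
spa_feat = spatial_conv(macro_pixel_lf)   # (1, 64, 32*A, 32*A)
```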

58 citations


Posted ContentDOI
TL;DR: The authors propose to separate the latent space of portrait images into two subspaces: a geometry space and a texture space, whose latent codes are fed to two network branches separately, one to generate the 3D geometry of portraits with canonical pose, and the other to generate textures.
Abstract: Recently, Generative Adversarial Networks (GANs) have been widely used for portrait image generation. However, in the latent space learned by GANs, different attributes, such as pose, shape, and texture style, are generally entangled, making the explicit control of specific attributes difficult. To address this issue, we propose a SofGAN image generator to decouple the latent space of portraits into two subspaces: a geometry space and a texture space. The latent codes sampled from the two subspaces are fed to two network branches separately, one to generate the 3D geometry of portraits with canonical pose, and the other to generate textures. The aligned 3D geometries also come with semantic part segmentation, encoded as a semantic occupancy field (SOF). The SOF allows the rendering of consistent 2D semantic segmentation maps at arbitrary views, which are then fused with the generated texture maps and stylized to a portrait photo using our semantic instance-wise (SIW) module. Through extensive experiments, we show that our system can generate high quality portrait images with independently controllable geometry and texture attributes. The method also generalizes well in various applications such as appearance-consistent facial animation and dynamic styling.

44 citations


Journal ArticleDOI
TL;DR: Experiments show the proposed method is robust to a wide range of challenging scenes and outperforms the state-of-the-art 2D/3D/4D (light-field) saliency detection approaches.
Abstract: Incorrect saliency detection, such as false alarms and missed alarms, may lead to potentially severe consequences in various application areas. Effective separation of salient objects in complex scenes is a major challenge in saliency detection. In this paper, we propose a new method for saliency detection on light fields to improve saliency detection in challenging scenes. Using abundant light field cues, we construct an object-guided depth map, which acts as an inducer to efficiently incorporate the relations among the light field cues. Furthermore, we enforce spatial consistency by constructing an optimization model, named Depth-induced Cellular Automata (DCA), in which the saliency value of each superpixel is updated by exploiting the intrinsic relevance of its similar regions. Additionally, the proposed DCA model enables inaccurate saliency maps to achieve a high level of accuracy. We analyze our approach on one publicly available dataset. Experiments show the proposed method is robust to a wide range of challenging scenes and outperforms the state-of-the-art 2D/3D/4D (light-field) saliency detection approaches.
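
The cellular-automata update can be pictured as each superpixel pulling its saliency toward that of its most similar regions; the sketch below uses an assumed Gaussian affinity and update weights, and is only illustrative of the idea, not the exact DCA formulation.

```python
# Sketch of a similarity-weighted, cellular-automata style saliency update over
# superpixels (illustrative only; affinity and update rule are assumptions).
import numpy as np

def dca_step(saliency, feature, sigma=0.1, alpha=0.6):
    """saliency: (N,) per-superpixel saliency; feature: (N,D) descriptors
    (e.g., color + object-guided depth)."""
    d2 = ((feature[:, None, :] - feature[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    W /= W.sum(1, keepdims=True)             # row-normalized affinity
    # each superpixel moves toward the saliency of its most similar regions
    return alpha * saliency + (1 - alpha) * W @ saliency

s = np.random.rand(200)
f = np.random.rand(200, 4)
for _ in range(10):
    s = dca_step(s, f)
```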

42 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: Comprehensive experiments show that RNR provides a practical and effective solution for conducting free-viewpoint relighting and improves the quality of view synthesis.
Abstract: We present a novel Relightable Neural Renderer (RNR) for simultaneous view synthesis and relighting using multi-view image inputs. Existing neural rendering (NR) does not explicitly model the physical rendering process and hence has limited capabilities on relighting. RNR instead models image formation in terms of environment lighting, object intrinsic attributes, and light transport function (LTF), each corresponding to a learnable component. In particular, the incorporation of a physically based rendering process not only enables relighting but also improves the quality of view synthesis. Comprehensive experiments on synthetic and real data show that RNR provides a practical and effective solution for conducting free-viewpoint relighting.
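
Schematically, the image formation model factors per-pixel outgoing radiance into environment lighting modulated by a learned light transport term and an intrinsic color; a toy discretized version (names and discretization assumed, not the RNR implementation) could look like:

```python
# Toy discretized rendering model: outgoing radiance = sum over sampled incident
# directions of environment light * learned light transport (illustrative sketch).
import torch

def render_pixel(env_light, ltf, albedo):
    """env_light: (L, 3) radiance of L sampled light directions,
    ltf: (L,) learned transport weight per direction for this surface point,
    albedo: (3,) intrinsic color."""
    return albedo * (ltf[:, None] * env_light).sum(0)
```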

39 citations


Journal ArticleDOI
TL;DR: A novel learning-based method is proposed, which accepts sparsely-sampled LFs with irregular structures and produces densely-sampled LFs with arbitrary angular resolution accurately and efficiently, along with a simple yet effective method for optimizing the sampling pattern.
Abstract: A densely-sampled light field (LF) is highly desirable in various applications. However, it is costly to acquire such data. Although many computational methods have been proposed to reconstruct a densely-sampled LF from a sparsely-sampled one, they still suffer from either low reconstruction quality, low computational efficiency, or the restriction on the regularity of the sampling pattern. To this end, we propose a novel learning-based method, which accepts sparsely-sampled LFs with irregular structures, and produces densely-sampled LFs with arbitrary angular resolution accurately and efficiently. We also propose a simple yet effective method for optimizing the sampling pattern. Our proposed method, an end-to-end trainable network, reconstructs a densely-sampled LF in a coarse-to-fine manner. Specifically, the coarse sub-aperture image (SAI) synthesis module first explores the scene geometry from an unstructured sparsely-sampled LF and leverages it to independently synthesize novel SAIs, in which a confidence-based blending strategy is proposed to fuse the information from different input SAIs, giving an intermediate densely-sampled LF. Then, the efficient LF refinement module learns the angular relationship within the intermediate result to recover the LF parallax structure. Comprehensive experimental evaluations demonstrate the superiority of our method on both real-world and synthetic LF images when compared with state-of-the-art methods.
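
The confidence-based blending step can be sketched as a per-pixel weighted average of the novel-view candidates synthesized from each input SAI (tensor names assumed; not the paper's code):

```python
# Sketch of confidence-based blending of novel views synthesized independently
# from each input sub-aperture image (illustrative only).
import torch

def blend_views(candidates, confidences):
    """candidates: (K, 3, H, W) novel-view estimates warped from K input SAIs,
    confidences: (K, 1, H, W) predicted per-pixel confidence for each estimate."""
    w = torch.softmax(confidences, dim=0)      # normalize confidences across inputs
    return (w * candidates).sum(dim=0)         # (3, H, W) blended intermediate view
```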

35 citations


Journal ArticleDOI
TL;DR: This paper presents a fully automatic framework for extracting editable 3D objects directly from a single photograph, and builds a novel instance-aware segmentation network for accurate part separation.
Abstract: This paper presents a fully automatic framework for extracting editable 3D objects directly from a single photograph. Unlike previous methods which recover either depth maps, point clouds, or mesh surfaces, we aim to recover 3D objects that have semantic parts and can be directly edited. We base our work on the assumption that most human-made objects are composed of parts, and that these parts can be well represented by generalized primitives. Our work makes an attempt towards recovering two types of primitive-shaped objects, namely, generalized cuboids and generalized cylinders. To this end, we build a novel instance-aware segmentation network for accurate part separation. Our GeoNet outputs a set of smooth part-level masks labeled as profiles and bodies. Then, in a key stage, we simultaneously identify profile-body relations and recover 3D parts by sweeping the recognized profiles along their body contours, jointly optimizing the geometry to align with the recovered masks. Qualitative and quantitative experiments show that our algorithm can recover high-quality 3D models and outperforms existing methods in both instance segmentation and 3D reconstruction.
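
As a toy example of the sweeping operation, the snippet below builds the vertex grid of a generalized cylinder by sweeping a circular profile whose radius varies along the body axis (illustrative only; the paper sweeps recognized 2D profiles along detected body contours):

```python
# Toy example of "sweeping" a 2D profile along an axis to build a generalized
# cylinder vertex grid (illustrative of the primitive-sweeping idea only).
import numpy as np

def sweep_profile(radii, heights, n_sides=32):
    """radii[i] is the profile radius at height heights[i]; returns (M, 3) vertices."""
    theta = np.linspace(0, 2 * np.pi, n_sides, endpoint=False)
    rings = []
    for r, h in zip(radii, heights):
        ring = np.stack([r * np.cos(theta), r * np.sin(theta), np.full_like(theta, h)], axis=1)
        rings.append(ring)
    return np.concatenate(rings, axis=0)

verts = sweep_profile(radii=[1.0, 0.8, 1.2], heights=[0.0, 0.5, 1.0])
```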

20 citations


Journal ArticleDOI
TL;DR: In this paper, the Orthogonal Softmax Layer (OSL) is proposed to make the weight vectors in the classification layer remain orthogonal during both the training and test processes.
Abstract: A deep neural network of multiple nonlinear layers forms a large function space, which can easily lead to overfitting when it encounters small-sample data. To mitigate overfitting in small-sample classification, learning more discriminative features from small-sample data is becoming a new trend. To this end, this paper aims to find a subspace of neural networks that can facilitate a large decision margin. Specifically, we propose the Orthogonal Softmax Layer (OSL), which makes the weight vectors in the classification layer remain orthogonal during both the training and test processes. The Rademacher complexity of a network using the OSL is only $\frac{1}{K}$ of that of a network using a fully connected classification layer, where $K$ is the number of classes, leading to a tighter generalization error bound. Experimental results demonstrate that the proposed OSL outperforms the comparison methods on four small-sample benchmark datasets, as well as its applicability to large-sample datasets. Code is available at https://github.com/dongliangchang/OSLNet.
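
One simple way to keep the class weight vectors orthogonal throughout training is to restrict each class to a disjoint block of feature dimensions with a fixed binary mask, as sketched below (dimensions assumed; an illustration of the orthogonality idea rather than the exact OSL layer):

```python
# Sketch: a classification layer whose per-class weight vectors stay orthogonal
# because each class only uses a disjoint block of feature dimensions.
import torch
import torch.nn as nn

class BlockOrthogonalClassifier(nn.Module):
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        assert feat_dim % num_classes == 0
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        mask = torch.zeros(num_classes, feat_dim)
        block = feat_dim // num_classes
        for k in range(num_classes):
            mask[k, k * block:(k + 1) * block] = 1.0   # class k sees only its own block
        self.register_buffer("mask", mask)

    def forward(self, x):                              # x: (B, feat_dim)
        # masked rows have disjoint supports, hence remain mutually orthogonal
        return x @ (self.weight * self.mask).t()       # logits: (B, num_classes)

logits = BlockOrthogonalClassifier(feat_dim=512, num_classes=4)(torch.randn(8, 512))
```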

20 citations


Proceedings ArticleDOI
12 Oct 2020
TL;DR: This paper proposes a novel end-to-end learning-based approach that comprehensively utilizes the specific characteristics of the input from two complementary and parallel perspectives to reconstruct high-resolution light field images from hybrid lenses.
Abstract: This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives. Specifically, one module regresses a spatially consistent intermediate estimation by learning a deep multidimensional and cross-domain feature representation; the other one constructs another intermediate estimation, which maintains the high-frequency textures, by propagating the information of the high-resolution view. We finally leverage the advantages of the two intermediate estimations via the learned attention maps, leading to the final high-resolution LF image. Extensive experiments demonstrate the significant superiority of our approach over state-of-the-art ones. That is, our method not only improves the PSNR by more than 2 dB, but also preserves the LF structure much better. To the best of our knowledge, this is the first end-to-end deep learning method for reconstructing a high-resolution LF image with a hybrid input. We believe our framework could potentially decrease the cost of high-resolution LF data acquisition and also be beneficial to LF data storage and transmission. The code is available at https://github.com/jingjin25/LFhybridSR-Fusion.
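
The final fusion step can be sketched as an attention-weighted combination of the two intermediate estimations (tensor names assumed; not the released code):

```python
# Sketch of fusing two intermediate high-resolution estimates with a learned
# attention map (illustrative of the fusion step only).
import torch

def fuse_estimates(est_consistent, est_detail, attention):
    """est_*: (B, 3, H, W) intermediate LF view estimates from the two branches,
    attention: (B, 1, H, W) learned map in [0, 1] favoring one branch per pixel."""
    return attention * est_detail + (1 - attention) * est_consistent
```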

20 citations


Journal ArticleDOI
TL;DR: A generic multiview tracking (GMT) framework that allows camera movement, while requiring neither specific object model nor camera calibration, is proposed, which shows clear advantages in terms of robustness over state-of-the-art ones.
Abstract: Recent progress in visual tracking has greatly improved tracking performance. However, challenges such as occlusion and view change remain obstacles in real-world deployment. A natural solution to these challenges is to use multiple cameras with multiview inputs, though existing systems are mostly limited to specific targets (e.g., humans), static cameras, and/or require camera calibration. To break through these limitations, we propose a generic multiview tracking (GMT) framework that allows camera movement, while requiring neither a specific object model nor camera calibration. A key innovation in our framework is a cross-camera trajectory prediction network (TPN), which implicitly and dynamically encodes camera geometric relations, and hence addresses missing-target issues such as occlusion. Moreover, during tracking, we assemble information across different cameras to dynamically update a novel collaborative correlation filter (CCF), which is shared among cameras to achieve robustness against view change. The two components are integrated into a correlation filter tracking framework, where features are trained offline using existing single-view tracking datasets. For evaluation, we first contribute a new generic multiview tracking dataset (GMTD) with careful annotations, and then run experiments on the GMTD and CAMPUS datasets. The proposed GMT algorithm shows clear advantages in terms of robustness over state-of-the-art ones.

Journal ArticleDOI
TL;DR: The authors introduce a novel concentric multi-spectral light field (CMSLF) design that is able to recover the shape and reflectance of surfaces of various materials in one shot.
Abstract: Recovering the shape and reflectance of non-Lambertian surfaces remains a challenging problem in computer vision since the view-dependent appearance invalidates traditional photo-consistency constraint. In this paper, we introduce a novel concentric multi-spectral light field (CMSLF) design that is able to recover the shape and reflectance of surfaces of various materials in one shot. Our CMSLF system consists of an array of cameras arranged on concentric circles where each ring captures a specific spectrum. Coupled with a multi-spectral ring light, we are able to sample viewpoint and lighting variations in a single shot via spectral multiplexing. We further show that our concentric camera and light source setting results in a unique single-peak pattern in specularity variations across viewpoints. This property enables robust depth estimation for specular points. To estimate depth and multi-spectral reflectance map, we formulate a physics-based reflectance model for the CMSLF under the surface camera (S-Cam) representation. Extensive synthetic and real experiments show that our method outperforms the state-of-the-art shape reconstruction methods, especially for non-Lambertian surfaces.

Journal ArticleDOI
TL;DR: Comprehensive experiments show the novel and practical neural rendering technique called neural opacity point cloud (NOPC) can produce photorealistic rendering on inputs from multi-view setups such as a turntable system for hair and furry toy captures.
Abstract: Fuzzy objects composed of hair, fur, or feather are impossible to scan even with the latest active or passive 3D scanners. We present a novel and practical neural rendering (NR) technique called neural opacity point cloud (NOPC) to allow high quality rendering of such fuzzy objects at any viewpoint. NOPC employs a learning-based scheme to extract geometric and appearance features on 3D point clouds including their opacity. It then maps the 3D features onto virtual viewpoints where a new U-Net based NR manages to handle noisy and incomplete geometry while maintaining translation equivariance. Comprehensive experiments on existing and new datasets show our NOPC can produce photorealistic rendering on inputs from multi-view setups such as a turntable system for hair and furry toy captures.
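
Using per-point opacity during rendering can be pictured as front-to-back alpha compositing of point features along each pixel ray, as in the sketch below (illustrative only; not the NOPC implementation):

```python
# Sketch of opacity-aware accumulation of point features along one pixel ray
# (front-to-back alpha compositing).
import torch

def composite(features, alphas):
    """features: (M, C) features of points hit by one pixel ray, sorted near-to-far;
    alphas: (M,) learned per-point opacities in [0, 1]."""
    out = torch.zeros(features.shape[1])
    transmittance = 1.0
    for f, a in zip(features, alphas):
        out += transmittance * a * f     # contribution attenuated by what is in front
        transmittance *= (1.0 - a)
    return out
```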

Posted Content
TL;DR: This work first encodes portrait scans with a semantic occupancy field (SOF), which represents semantic-embedded geometric structure and outputs free-viewpoint semantic segmentation maps, and then designs a semantic instance-wise (SIW) StyleGAN to regionally style the segmentation maps.
Abstract: Generating portrait images from a single latent space faces the problem of entangled attributes, making it difficult to explicitly adjust the generation of specific attributes, e.g., contour and viewpoint control or dynamic styling. Therefore, we propose to decompose the generation space into two subspaces: a geometric space and a texture space. We first encode portrait scans with a semantic occupancy field (SOF), which represents semantic-embedded geometric structure and outputs free-viewpoint semantic segmentation maps. Then we design a semantic instance-wise (SIW) StyleGAN to regionally style the segmentation maps. We capture 664 3D portrait scans for SOF training and use real captured photos (FFHQ and CelebA-HQ) for SIW StyleGAN training. Extensive experiments show that our representations enable appearance-consistent control of shape, pose, and regional styles, achieve state-of-the-art results, and generalize well in various application scenarios.

Proceedings ArticleDOI
12 Oct 2020
TL;DR: This paper proposes Neural3D: a novel neural human portrait scanning system using only a single RGB camera, and proposes a context-aware correspondence learning approach which jointly models the appearance, spatial and motion information between feature pairs.
Abstract: Reconstructing a human portrait in a realistic and convenient manner is critical for human modeling and understanding. Aiming at light-weight and realistic human portrait reconstruction, in this paper we propose Neural3D: a novel neural human portrait scanning system using only a single RGB camera. In our system, to enable accurate pose estimation, we propose a context-aware correspondence learning approach which jointly models the appearance, spatial and motion information between feature pairs. To enable realistic reconstruction and suppress geometry error, we further adopt a point-based neural rendering scheme to generate realistic and immersive portrait visualization from arbitrary virtual viewpoints. By introducing these learning-based technical components into the pure RGB-based human modeling framework, we achieve both accurate camera pose estimation and realistic free-viewpoint rendering of the reconstructed human portrait. Extensive experiments on a variety of challenging capture scenarios demonstrate the robustness and effectiveness of our approach.

Proceedings ArticleDOI
Quan Meng, Jiakai Zhang, Qiang Hu, Xuming He, Jingyi Yu
12 Oct 2020
TL;DR: The authors propose the Line Graph Neural Network (LGNN), which employs a deep convolutional neural network (DCNN) for proposing line segments directly, with a graph neural network (GNN) module for reasoning about their connectivities.
Abstract: We present a novel real-time line segment detection scheme called Line Graph Neural Network (LGNN). Existing approaches require a computationally expensive verification or postprocessing step. Our LGNN employs a deep convolutional neural network (DCNN) for proposing line segments directly, with a graph neural network (GNN) module for reasoning about their connectivities. Specifically, LGNN exploits a new quadruplet representation for each line segment, where the GNN module takes the predicted candidates as vertexes and constructs a sparse graph to enforce structural context. Compared with the state-of-the-art, LGNN achieves near real-time performance without compromising accuracy. LGNN further enables time-sensitive 3D applications. When a 3D point cloud is accessible, we present a multi-modal line segment classification technique for extracting a 3D wireframe of the environment robustly and efficiently.
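
The graph-construction idea can be sketched as follows: each candidate is a quadruplet of endpoint coordinates, and two candidates become connected vertices when their endpoints nearly coincide, giving a sparse graph for the GNN to reason over (a brute-force illustrative sketch with an assumed distance threshold, not the paper's implementation):

```python
# Sketch of building a sparse graph over line-segment candidates, where each
# candidate is a quadruplet (x1, y1, x2, y2).
import numpy as np

def build_segment_graph(segments, radius=5.0):
    """segments: (N, 4) array of endpoint quadruplets. Returns a list of edges (i, j)."""
    ends = segments.reshape(-1, 2, 2)                       # (N, 2, 2) endpoints
    edges = []
    n = len(segments)
    for i in range(n):                                      # O(N^2); fine for a sketch
        for j in range(i + 1, n):
            d = np.linalg.norm(ends[i][:, None, :] - ends[j][None, :, :], axis=-1)
            if d.min() < radius:                            # endpoints nearly coincide
                edges.append((i, j))
    return edges
```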

Book ChapterDOI
23 Aug 2020
TL;DR: A PIV solution that uses a compact lenslet-based light field camera to track dense particles floating in the fluid and reconstruct the 3D fluid flow, together with a motion-constrained optical flow estimation algorithm that enforces local motion rigidity and the Navier-Stokes fluid constraint.
Abstract: Particle Imaging Velocimetry (PIV) estimates the fluid flow by analyzing the motion of injected particles. The problem is challenging as the particles lie at different depths but have similar appearances. Tracking a large number of moving particles is particularly difficult due to the heavy occlusion. In this paper, we present a PIV solution that uses a compact lenslet-based light field camera to track dense particles floating in the fluid and reconstruct the 3D fluid flow. We exploit the focal symmetry property in the light field focal stacks for recovering the depths of similar-looking particles. We further develop a motion-constrained optical flow estimation algorithm by enforcing the local motion rigidity and the Navier-Stokes fluid constraint. Finally, the estimated particle motion trajectory is used to visualize the 3D fluid flow. Comprehensive experiments on both synthetic and real data show that using a compact light field camera, our technique can recover dense and accurate 3D fluid flow.
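
A toy version of the motion-constrained objective (assumed weight; not the paper's formulation) combines a data term with a penalty on the divergence of the velocity field, a common proxy for the incompressibility implied by the Navier-Stokes constraint:

```python
# Sketch of a motion-constrained flow objective: data term plus a penalty that
# discourages divergence of the velocity field (purely illustrative).
import numpy as np

def flow_energy(flow, data_residual, lam=0.1):
    """flow: (H, W, 2) velocity field (u, v); data_residual: (H, W) matching error."""
    u, v = flow[..., 0], flow[..., 1]
    du_dx = np.gradient(u, axis=1)
    dv_dy = np.gradient(v, axis=0)
    divergence = du_dx + dv_dy
    return data_residual.sum() + lam * (divergence ** 2).sum()
```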

Journal ArticleDOI
TL;DR: A novel ray-space epipolar geometry is developed which intrinsically encapsulates the complete projective relationship between two light fields, while the generalized epipolar geometry, which describes the relationship between normalized light fields, is the specialization of the proposed model to calibrated cameras.
Abstract: Light field essentially represents rays in space. The epipolar geometry between two light fields is an important relationship that captures ray-ray correspondences and the relative configuration of two views. Unfortunately, so far little work has been done in deriving a formal epipolar geometry model that is specifically tailored for light field cameras. This is primarily due to the high-dimensional nature of the ray sampling process with a light field camera. This paper fills in this gap by developing a novel ray-space epipolar geometry which intrinsically encapsulates the complete projective relationship between two light fields, while the generalized epipolar geometry, which describes the relationship between normalized light fields, is the specialization of the proposed model to calibrated cameras. With Plücker parameterization, we propose the ray-space projection model involving a 6×6 ray-space intrinsic matrix for the ray sampling of a light field camera. The ray-space fundamental matrix and its properties are then derived to constrain ray-ray correspondences for general and special motions. Finally, based on ray-space epipolar geometry, we present two novel algorithms, one for fundamental matrix estimation, and the other for calibration. Experiments on synthetic and real data have validated the effectiveness of ray-space epipolar geometry in solving 3D computer vision tasks with light field cameras.
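
For reference, the Plücker parameterization mentioned above represents a ray by its direction and moment, and two rays can correspond to the same 3D point only if they are coplanar, which is the kind of bilinear constraint a ray-space fundamental matrix encodes. The sketch below shows these standard geometric facts only; it does not reproduce the paper's 6×6 ray-space intrinsic or fundamental matrices.

```python
# Plücker parameterization of rays and the coplanarity (intersection) test that
# underlies ray-ray correspondence constraints.
import numpy as np

def plucker(p, d):
    """Ray through point p with direction d -> 6-vector (d, m), with moment m = p x d."""
    d = d / np.linalg.norm(d)
    return np.concatenate([d, np.cross(p, d)])

def reciprocal_product(L1, L2):
    """Zero iff the two lines are coplanar (corresponding rays must intersect)."""
    d1, m1 = L1[:3], L1[3:]
    d2, m2 = L2[:3], L2[3:]
    return d1 @ m2 + d2 @ m1

r1 = plucker(np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]))
r2 = plucker(np.array([1.0, 0.0, 0.0]), np.array([-1.0, 0.0, 1.0]))  # meets r1 at (0, 0, 1)
assert abs(reciprocal_product(r1, r2)) < 1e-9
```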

Proceedings ArticleDOI
01 Apr 2020
TL;DR: In this article, a color photometric stereo (CPS) method was proposed to recover high quality, detailed 3D face geometry in a single shot using three uncalibrated near point lights of different colors and a single camera.
Abstract: We present a new color photometric stereo (CPS) method that recovers high quality, detailed 3D face geometry in a single shot. Our system uses three uncalibrated near point lights of different colors and a single camera. For robust self-calibration of the light sources, we use 3D morphable model (3DMM) [1] and semantic segmentation of facial parts. For reconstruction, we address the inherent spectral ambiguity in color photometric stereo by incorporating albedo consensus, albedo similarity, and proxy prior into a unified framework. In this way, we jointly exploit multiple cues to resolve under-determinedness, without the need for spatial constancy of albedo. Experiments show that our new approach produces state-of-the-art results from a single image with high-fidelity geometry that includes details such as wrinkles.
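
For context, the classical color photometric stereo formation model under a Lambertian surface with spectrally uniform albedo reduces normal recovery to a 3x3 linear solve per pixel; the paper's contribution lies in resolving the spectral ambiguity when that uniformity assumption fails. The sketch below (assumed light directions and intensities) illustrates only the basic model.

```python
# Basic color photometric stereo model with spectrally uniform albedo:
# I_c = rho * (n . l_c) for each color channel c, so the scaled normal rho*n
# follows from a 3x3 linear solve per pixel (illustration only).
import numpy as np

L = np.array([[0.3, 0.2, 0.93],     # assumed direction of the red light
              [-0.3, 0.2, 0.93],    # green light
              [0.0, -0.35, 0.94]])  # blue light
I = np.array([0.7, 0.65, 0.6])      # observed R, G, B intensities at one pixel

b = np.linalg.solve(L, I)           # b = rho * n (scaled normal)
albedo = np.linalg.norm(b)
normal = b / albedo
```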

Journal ArticleDOI
Peihong Yu, Cen Wang, Zhirui Wang, Jingyi Yu, Laurent Kneip
TL;DR: A novel stereo trifocal tensor solver is presented and the camera matrix’s ability to continuously and robustly bootstrap visual motion estimation pipelines via integration into a robust, purely line-based visual odometry pipeline is outlined.
Abstract: While most monocular structure-from-motion frameworks rely on sparse keypoints, it has long been acknowledged that lines represent an alternative, higher-order feature with high accuracy, repeatability, and abundant availability in man-made environments. Its exclusive use, however, is severely complicated by its inability to resolve the common bootstrapping scenario of two-view geometry. Even with stereo cameras, a one-dimensional disparity space, as well as ill-posed triangulations of horizontal lines make the realization of purely line-based tracking pipelines difficult. The present paper successfully leverages the redundancy in camera matrices to alleviate this shortcoming. We present a novel stereo trifocal tensor solver and extend it to the case of two camera matrix view-points. Our experiments demonstrate superior behavior with respect to both 2D-2D and 3D-3D alternatives. We furthermore outline the camera matrix’s ability to continuously and robustly bootstrap visual motion estimation pipelines via integration into a robust, purely line-based visual odometry pipeline. The result leads to state-of-the-art tracking accuracy comparable to what is achieved by point-based stereo or even dense depth camera alternatives.

Patent
05 Mar 2020
TL;DR: A method of generating a three-dimensional model of an object from a plurality of light field images captured at a plurality of viewpoints with a single light field camera is described.
Abstract: A method of generating a three-dimensional model of an object is disclosed. The method may use a light field camera to capture a plurality of light field images at a plurality of viewpoints. The method may include capturing a first light field image at a first viewpoint; capturing a second light field image at a second viewpoint; estimating a rotation and a translation of a light field from the first viewpoint to the second viewpoint; obtaining a disparity map from each of the plurality of light field images; and computing a three-dimensional point cloud by optimizing the rotation and translation of the light field and the disparity maps. The first light field image may include a first plurality of subaperture images and the second light field image may include a second plurality of subaperture images.

Patent
28 Jul 2020
TL;DR: In this paper, a method of detecting and recognizing faces using a light field camera array, comprising capturing multi-view color images using the LF camera array; obtaining a depth map; conducting light field rendering using a weight function comprising a depth component and a sematic component.
Abstract: A method of detecting and recognizing faces using a light field camera array, comprising capturing multi-view color images using the light field camera array; obtaining a depth map; conducting light field rendering using a weight function comprising a depth component and a semantic component, where the weight function assigns a weight to each ray in the light field; and detecting and recognizing a face.
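
A minimal sketch of such a weight function (functional forms, parameters, and names are assumptions for illustration; the patent does not specify them here) might combine a depth-consistency term with a semantic term:

```python
# Hypothetical per-ray weight combining a depth term and a semantic term,
# in the spirit of the weight function described above (not the patented method).
import numpy as np

def ray_weight(ray_depth, target_depth, semantic_prob_face, sigma=0.05, w_sem=0.5):
    """Higher weight for rays consistent with the target depth and likely on a face."""
    depth_term = np.exp(-((ray_depth - target_depth) ** 2) / (2 * sigma ** 2))
    semantic_term = semantic_prob_face        # probability the ray hits a face region
    return (1 - w_sem) * depth_term + w_sem * semantic_term
```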