Author

Lam C. Tran

Bio: Lam C. Tran is an academic researcher from the University of California, San Diego. The author has contributed to research in topics: View synthesis & Pixel. The author has an h-index of 4 and has co-authored 4 publications receiving 95 citations.

Papers
Journal ArticleDOI
TL;DR: A new method called sparse variational message passing is developed that can reduce inference time by an order of magnitude with negligible loss in quality, and that allows efficient learning over large data sets when energy functions violate the constraints imposed by graph cuts.
Abstract: Until recently, the lack of ground truth data has hindered the application of discriminative structured prediction techniques to the stereo problem. In this paper we use ground truth data sets that we have recently constructed to explore different model structures and parameter learning techniques. To estimate parameters in Markov random fields (MRFs) via maximum likelihood one usually needs to perform approximate probabilistic inference. Conditional random fields (CRFs) are discriminative versions of traditional MRFs. We explore a number of novel CRF model structures including a CRF for stereo matching with an explicit occlusion model. CRFs require expensive inference steps for each iteration of optimization and inference is particularly slow when there are many discrete states. We explore belief propagation, variational message passing and graph cuts as inference methods during learning and compare with learning via pseudolikelihood. To accelerate approximate inference we have developed a new method called sparse variational message passing which can reduce inference time by an order of magnitude with negligible loss in quality. Learning using sparse variational message passing improves upon previous approaches using graph cuts and allows efficient learning over large data sets when energy functions violate the constraints imposed by graph cuts.
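The abstract does not spell out the sparsification step, but the underlying idea is simple to illustrate: each per-pixel belief or message over disparity states is truncated to the few states that carry almost all of the probability mass, so subsequent message updates touch far fewer states. The Python sketch below shows that truncation in isolation; the `sparsify` helper and its 99% mass threshold are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of message/belief sparsification, not the paper's code.
import numpy as np

def sparsify(belief, mass=0.99):
    """Keep the smallest set of states covering `mass` probability;
    zero out the rest and renormalize."""
    order = np.argsort(belief)[::-1]               # states, most probable first
    cum = np.cumsum(belief[order])
    keep = order[:np.searchsorted(cum, mass) + 1]  # minimal covering prefix
    sparse = np.zeros_like(belief)
    sparse[keep] = belief[keep]
    return sparse / sparse.sum(), keep

rng = np.random.default_rng(0)
belief = rng.dirichlet(np.full(64, 0.1))           # peaked belief over 64 disparity states
sparse_belief, active = sparsify(belief)
print(f"active states: {active.size} of {belief.size}")
```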

33 citations

Proceedings ArticleDOI
22 May 2011
TL;DR: This paper presents a new method for view synthesis that is both fast and accurate and has applications in free-viewpoint television, angular scalability for 3D video coding/decoding, and stereo-to-multiview conversion.
Abstract: From a rectified stereo image pair, the task of view synthesis is to generate images from any viewpoint along the baseline. The main difficulty of the problem is how to fill occluded regions. In this paper, we present a new method for view synthesis that is both fast and accurate. Occlusions are filled using color and disparity information to produce consistent pixel estimates. Results are comparable to current state-of-the-art methods in terms of objective measures while computation time is drastically reduced. This work has applications in free-viewpoint television, angular scalability for 3D video coding/decoding, and stereo-to-multiview conversion.
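As a rough sketch of the warping step that underlies this kind of view synthesis (occlusion filling, the paper's actual contribution, is reduced here to a z-buffer and a hole mask), the following Python function forward-warps the left image to a fractional baseline position. The function name and conventions are assumptions for illustration.

```python
# Generic forward warping for rectified stereo; not the paper's method.
import numpy as np

def warp_view(left, disp, alpha):
    """Forward-warp `left` (H x W x 3 float) to baseline position
    `alpha` in [0, 1] using its disparity map; pixels that receive no
    source remain flagged in the returned hole mask."""
    h, w = disp.shape
    virt = np.zeros_like(left)
    zbuf = np.full((h, w), -np.inf)            # keep the nearest surface
    hole = np.ones((h, w), dtype=bool)
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.round(xs - alpha * disp).astype(int)
    valid = (xt >= 0) & (xt < w)
    for y, x, x2 in zip(ys[valid], xs[valid], xt[valid]):
        if disp[y, x] > zbuf[y, x2]:           # larger disparity = closer
            zbuf[y, x2] = disp[y, x]
            virt[y, x2] = left[y, x]
            hole[y, x2] = False
    return virt, hole
```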

27 citations

Journal ArticleDOI
TL;DR: A novel stereo view synthesis algorithm that is highly accurate with respect to inter-view consistency, thus enabling stereo content to be viewed on autostereoscopic displays, along with a simplified GPU-accelerated implementation of the approach in CUDA.
Abstract: In this paper we present a novel stereo view synthesis algorithm that is highly accurate with respect to inter-view consistency, thus enabling stereo content to be viewed on autostereoscopic displays. The algorithm finds identical occluded regions within each virtual view and aligns them together to extract a surrounding background layer. The background layer for each occluded region is then used with an exemplar-based inpainting method to synthesize all virtual views simultaneously. Our algorithm requires the alignment and extraction of background layers for each occluded region; however, these two steps are done efficiently, with lower computational complexity than previous approaches that use exemplar-based inpainting algorithms. Thus, it is more efficient than existing algorithms that synthesize one virtual view at a time. This paper also describes a simplified GPU-accelerated version of the approach and its implementation in CUDA. Our CUDA method has sublinear complexity in the number of views that need to be generated, which makes it especially useful for generating content for autostereoscopic displays that require many views to operate. An objective of our work is to allow the user to change depth and viewing perspective on the fly. Therefore, to further accelerate the CUDA variant of our approach, we present a modified version of our method that warps background pixels from the reference views to a middle view in order to recover background pixels. We then use an exemplar-based inpainting method to fill in the occluded regions. We use warping of the foreground from the reference images and of the background from the filled regions to synthesize new virtual views on the fly. Our experimental results indicate that the simplified CUDA implementation decreases running time by orders of magnitude with negligible loss in quality.
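The sublinear scaling in the number of views follows from the structure of the modified method: the expensive background recovery and inpainting happen once in the middle view, while each additional view only pays for a cheap warp and composite. A minimal Python sketch of that composite step, with illustrative names, might look like this:

```python
# Illustrative per-view composite; the warp and inpainting stages are
# assumed to have run already (once, in the shared middle view).
import numpy as np

def composite(foreground, holes, background):
    """Fill the hole pixels of a warped foreground view (H x W x 3)
    from a shared background layer inpainted once in the middle view.
    `holes` is a boolean H x W mask of disoccluded pixels."""
    out = foreground.copy()
    out[holes] = background[holes]      # per-view cost is O(pixels)
    return out
```

Generating N views then costs one inpainting pass plus N such warp-and-composite steps, rather than N inpainting passes.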

21 citations

Proceedings ArticleDOI
03 Dec 2010
TL;DR: A novel method is proposed to synthesize intermediate views from two stereo images and disparity maps that is robust to errors in the disparity maps, together with an explicit probabilistic model that efficiently selects the best candidate for each disoccluded pixel using Conditional Random Fields and graph cuts.
Abstract: We propose a novel method to synthesize intermediate views from two stereo images and disparity maps that is robust to errors in disparity maps. The proposed method computes a placement matrix from each disparity map that can be used to correct errors when warping pixels from reference view to virtual view. The second contribution is a new hole filling method that uses depth, edge, and segmentation information to aid the process of filling disoccluded pixels. The proposed method selects pixels from segmented regions that are connected to the disoccluded region as candidates to fill the disoccluded pixels. We also provide an explicit probabilistic model to select the best candidate for each disoccluded pixel efficiently with Conditional Random Fields (CRFs) and graph-cuts.
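The paper couples the candidate choices through a CRF solved with graph cuts; as a unary-only toy version of the candidate-selection idea, the sketch below scores each segment adjacent to a disoccluded region by color consistency with the nearest visible pixel and by a preference for deeper (background) segments. The cost weighting and all names are assumptions for illustration, not the paper's model.

```python
# Toy unary candidate scoring; the actual method adds pairwise CRF terms.
import numpy as np

def pick_candidate(cand_colors, cand_disp, ref_color, depth_weight=10.0):
    """cand_colors: (K, 3) mean colors of K segments adjacent to the
    disoccluded region; cand_disp: (K,) mean disparities; ref_color:
    (3,) color of the nearest visible pixel. Returns the index of the
    lowest-cost candidate segment."""
    color_cost = np.linalg.norm(cand_colors - ref_color, axis=1)
    depth_cost = cand_disp - cand_disp.min()   # penalize foreground segments
    return int(np.argmin(color_cost + depth_weight * depth_cost))
```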

14 citations


Cited by
Proceedings ArticleDOI
21 Jul 2017
TL;DR: This benchmark is the first to cover the important use case of hand-held mobile devices while also providing high-resolution DSLR camera images, and it provides data at significantly higher temporal and spatial resolution.
Abstract: Motivated by the limitations of existing multi-view stereo benchmarks, we present a novel dataset for this task. Towards this goal, we recorded a variety of indoor and outdoor scenes using a high-precision laser scanner and captured both high-resolution DSLR imagery as well as synchronized low-resolution stereo videos with varying fields-of-view. To align the images with the laser scans, we propose a robust technique which minimizes photometric errors conditioned on the geometry. In contrast to previous datasets, our benchmark provides novel challenges and covers a diverse set of viewpoints and scene types, ranging from natural scenes to man-made indoor and outdoor environments. Furthermore, we provide data at significantly higher temporal and spatial resolution. Our benchmark is the first to cover the important use case of hand-held mobile devices while also providing high-resolution DSLR camera images. We make our datasets and an online evaluation server available at http://www.eth3d.net.

537 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: A novel SGM parameterization, which deploys different penalties depending on either positive or negative disparity changes in order to represent the object structures more discriminatively, is proposed.
Abstract: This paper deals with deep neural networks for predicting accurate dense disparity maps with semi-global matching (SGM). SGM is a widely used regularization method for real scenes because of its high accuracy and fast computation speed. Even though SGM can obtain accurate results, tuning SGM's penalty parameters, which control the smoothness and discontinuity of a disparity map, is not easy, and empirical methods have been proposed. We propose a learning-based penalty estimation method, which we call SGM-Nets, consisting of convolutional neural networks. A small image patch and its position are input into SGM-Nets to predict the penalties for the 3D object structures. In order to train the networks, we introduce a novel loss function which is able to use sparsely annotated disparity maps, such as those captured by a LiDAR sensor in real environments. Moreover, we propose a novel SGM parameterization, which deploys different penalties depending on either positive or negative disparity changes in order to represent the object structures more discriminatively. Our SGM-Nets outperformed the state of the art in accuracy on the KITTI benchmark datasets.
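For context, the following is a compact NumPy version of the standard SGM cost aggregation along one path direction, the recurrence into which the penalties plug. In SGM-Nets, P1 and P2 are produced per pixel by a CNN; here they are fixed scalars, and the function is a generic sketch rather than the paper's code.

```python
# Standard SGM aggregation along the left-to-right path; illustrative only.
import numpy as np

def aggregate_left_to_right(cost, p1=0.5, p2=2.0):
    """cost: (H, W, D) float matching cost volume. Returns the
    aggregated cost for the left-to-right path direction."""
    h, w, d = cost.shape
    agg = cost.copy()
    for x in range(1, w):
        prev = agg[:, x - 1, :]                                    # (H, D)
        # shifted copies: disparity d-1 and d+1 at the previous pixel
        plus = np.pad(prev, ((0, 0), (1, 0)), constant_values=np.inf)[:, :d]
        minus = np.pad(prev, ((0, 0), (0, 1)), constant_values=np.inf)[:, 1:]
        jump = prev.min(axis=1, keepdims=True) + p2                # any larger jump
        best = np.minimum(np.minimum(prev, plus + p1),
                          np.minimum(minus + p1, jump))
        # subtract the previous minimum to keep values bounded (standard SGM)
        agg[:, x, :] = cost[:, x, :] + best - prev.min(axis=1, keepdims=True)
    return agg
```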

272 citations

Proceedings Article
16 Jun 2013
TL;DR: It is shown that gradients of a variety of loss functions over the mean field marginals can be computed efficiently and the resulting algorithm learns parameters that directly optimize the performance of mean field inference in the model.
Abstract: Dense random fields are models in which all pairs of variables are directly connected by pairwise potentials. It has recently been shown that mean field inference in dense random fields can be performed efficiently and that these models enable significant accuracy gains in computer vision applications. However, parameter estimation for dense random fields is still poorly understood. In this paper, we present an efficient algorithm for learning parameters in dense random fields. All parameters are estimated jointly, thus capturing dependencies between them. We show that gradients of a variety of loss functions over the mean field marginals can be computed efficiently. The resulting algorithm learns parameters that directly optimize the performance of mean field inference in the model. As a supporting result, we present an efficient inference algorithm for dense random fields that is guaranteed to converge.
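To make the inference routine concrete, the sketch below implements the mean field fixed-point update for a fully connected CRF in its naive O(N^2) form; the learning algorithm differentiates losses through exactly these marginals, while the paper's efficiency comes from replacing the dense kernel product with fast Gaussian filtering. Kernel form and parameter values here are placeholders.

```python
# Naive dense-CRF mean field inference; quadratic in the number of pixels.
import numpy as np

def mean_field(unary, feats, mu, weight=1.0, bandwidth=1.0, iters=5):
    """unary: (N, L) negative log unary potentials; feats: (N, F) pixel
    features; mu: (L, L) label compatibility matrix. Returns the
    approximate marginals Q of shape (N, L)."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    k = weight * np.exp(-d2 / (2 * bandwidth ** 2))   # dense Gaussian kernel
    np.fill_diagonal(k, 0.0)                          # no self-message
    q = np.exp(-unary)
    q /= q.sum(1, keepdims=True)
    for _ in range(iters):
        msg = k @ q                                   # messages from all pixels
        logq = -unary - msg @ mu.T                    # compatibility transform
        q = np.exp(logq - logq.max(1, keepdims=True)) # stable renormalization
        q /= q.sum(1, keepdims=True)
    return q
```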

224 citations

Posted Content
TL;DR: A simple convolutional neural network architecture that is able to learn to compute dense disparity maps directly from the stereo inputs, outperforming many state-of-the-art stereo matching methods by a margin while being significantly faster.
Abstract: Existing deep-learning based dense stereo matching methods often rely on ground-truth disparity maps as the training signal, which is however not always available in many situations. In this paper, we design a simple convolutional neural network architecture that is able to learn to compute dense disparity maps directly from the stereo inputs. Training is performed in an end-to-end fashion without the need for ground-truth disparity maps. The idea is to use the image warping error (instead of disparity-map residuals) as the loss function to drive the learning process, aiming to find a depth map that minimizes the warping error. While this is a simple concept well known in stereo matching, making it work in a deep-learning framework requires overcoming many non-trivial challenges, and in this work we provide effective solutions. Our network is self-adaptive to different unseen imagery as well as to different camera settings. Experiments on the KITTI and Middlebury stereo benchmark datasets show that our method outperforms many state-of-the-art stereo matching methods by a margin while being significantly faster.
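The warping loss itself is easy to state: sample the right image at positions shifted by the predicted disparity and penalize the photometric difference to the left image. A bare-bones NumPy version is below; real implementations use a differentiable bilinear sampler inside the network, and the names here are illustrative.

```python
# Photometric warping loss for self-supervised stereo; illustrative only.
import numpy as np

def warping_loss(left, right, disp):
    """left, right: (H, W) grayscale images; disp: (H, W) predicted
    disparity for the left view. Returns the mean absolute warping error."""
    h, w = left.shape
    xs = np.arange(w)[None, :] - disp              # sample right at x - d
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    frac = np.clip(xs - x0, 0.0, 1.0)
    rows = np.arange(h)[:, None]
    # linear interpolation along the scanline (rectified stereo)
    warped = (1 - frac) * right[rows, x0] + frac * right[rows, x0 + 1]
    return np.abs(left - warped).mean()
```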

155 citations

Journal ArticleDOI
TL;DR: A novel framework for single depth image super-resolution is proposed that is guided by a high-resolution edge map, which is constructed from the edges of the low-resolution depth image through a Markov random field optimization in a patch-synthesis-based manner.
Abstract: Recently, consumer depth cameras have gained significant popularity due to their affordable cost. However, the limited resolution and quality of the depth maps generated by these cameras are still problematic for several applications. In this paper, a novel framework for single depth image super-resolution is proposed. In our framework, the upscaling of a single depth image is guided by a high-resolution edge map, which is constructed from the edges of the low-resolution depth image through a Markov random field optimization in a patch-synthesis-based manner. We also exploit the self-similarity of patches during the edge construction stage, when limited training data are available. With the guidance of the high-resolution edge map, we propose upsampling to the high-resolution depth image through a modified joint bilateral filter. The edge-based guidance not only helps avoid artifacts introduced by direct texture prediction, but also reduces jagged artifacts and preserves sharp edges. Experimental results demonstrate the effectiveness of our method both qualitatively and quantitatively compared with state-of-the-art methods.
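The joint bilateral filter at the core of the upsampling step weights each neighbor by spatial distance and by similarity in a high-resolution guidance signal rather than in the noisy depth itself. The following NumPy sketch shows a generic version of such a filter; the parameter values are placeholders and this is not the paper's modified variant.

```python
# Generic joint bilateral filtering with an external guidance image.
import numpy as np

def joint_bilateral(depth, guide, radius=3, sigma_s=2.0, sigma_r=0.1):
    """depth: (H, W) coarsely upsampled depth; guide: (H, W)
    high-resolution guidance image with values in [0, 1].
    Returns the guided, edge-preserving filtered depth."""
    h, w = depth.shape
    dpad = np.pad(depth, radius, mode='edge')
    gpad = np.pad(guide, radius, mode='edge')
    num = np.zeros_like(depth)
    den = np.zeros_like(depth)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            w_s = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
            d_sh = dpad[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            g_sh = gpad[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            # range weight from the guidance signal, not the depth
            w_r = np.exp(-((guide - g_sh) ** 2) / (2 * sigma_r ** 2))
            num += w_s * w_r * d_sh
            den += w_s * w_r
    return num / den
```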

145 citations