Author

Li Guan

Bio: Li Guan is an academic researcher from General Electric. The author has contributed to research in topics: Silhouette & Object detection. The author has an h-index of 13 and has co-authored 30 publications receiving 484 citations. Previous affiliations of Li Guan include ETH Zurich & University of North Carolina at Chapel Hill.

Papers
Book ChapterDOI
05 Nov 2012
TL;DR: This work proposes to extract high-level primitives, planes, from an RGB-D camera, in addition to low-level image features, to better constrain the problem and improve indoor 3D reconstruction, and demonstrates with real datasets that the method with plane constraints achieves more accurate and more appealing results compared with other state-of-the-art scene reconstruction algorithms.
Abstract: Given a hand-held RGB-D camera (e.g. Kinect), methods such as Structure from Motion (SfM) and Iterative Closest Point (ICP) perform poorly when reconstructing indoor scenes with few image features or little geometric structure information. In this paper, we propose to extract high-level primitives, planes, from an RGB-D camera, in addition to low-level image features (e.g. SIFT), to better constrain the problem and help improve indoor 3D reconstruction. Our work has two major contributions: first, for frame-to-frame matching, we propose a new scheme which takes into account both low-level appearance feature correspondences in the RGB image and high-level plane correspondences in the depth image. Second, in the global bundle adjustment step, we formulate a novel error measurement that takes into account not only the traditional 3D point re-projection errors but also the planar surface alignment errors. We demonstrate with real datasets that our method with plane constraints achieves more accurate and more appealing results compared with other state-of-the-art scene reconstruction algorithms in the aforementioned challenging indoor scenarios.
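The combined objective described in the abstract can be pictured with a short sketch: a standard point re-projection term plus a planar-alignment term measuring how far plane points, transformed by each frame's pose, sit from their corresponding world plane. All names and the simple quadratic weighting below are assumptions for illustration, not the authors' actual implementation.

```python
import numpy as np

def point_reprojection_error(K, R, t, X, x_obs):
    """Pixel residual of a 3D point X projected into a camera with intrinsics K
    and pose (R, t)."""
    x_cam = R @ X + t
    x_proj = K @ x_cam
    return x_proj[:2] / x_proj[2] - x_obs

def plane_alignment_error(R, t, plane_world, plane_points_local):
    """Signed distances of points measured on a plane in one frame to the
    corresponding world plane (unit normal n, offset d), after applying the
    frame's pose."""
    n, d = plane_world
    X_world = (R @ plane_points_local.T).T + t
    return X_world @ n + d

def total_cost(cameras, points_3d, point_obs, planes, plane_obs, w_plane=1.0):
    """Hypothetical combined objective: classic re-projection errors plus a
    weighted planar-alignment term, in the spirit of the paper."""
    cost = 0.0
    for cam_idx, pt_idx, x_obs in point_obs:
        K, R, t = cameras[cam_idx]
        r = point_reprojection_error(K, R, t, points_3d[pt_idx], x_obs)
        cost += float(r @ r)
    for cam_idx, plane_idx, pts_local in plane_obs:
        _, R, t = cameras[cam_idx]
        r = plane_alignment_error(R, t, planes[plane_idx], pts_local)
        cost += w_plane * float(r @ r)
    return cost
```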

66 citations

Proceedings ArticleDOI
17 Jun 2007
TL;DR: Results show that the shape of static occluders can be robustly recovered from pure dynamic object motion, and that this information can be used for online self-correction and consolidation of dynamic object shape reconstruction.
Abstract: We consider the problem of detecting and accounting for the presence of occluders in a 3D scene based on silhouette cues in video streams obtained from multiple, calibrated views. While well studied and robust in controlled environments, silhouette-based reconstruction of dynamic objects fails in general environments where uncontrolled occlusions are commonplace, due to inherent silhouette corruption by occluders. We show that occluders in the interaction space of dynamic objects can be detected and their 3D shape fully recovered as a byproduct of shape-from-silhouette analysis. We provide a Bayesian sensor fusion formulation to process all occlusion cues occurring in a multi-view sequence. Results show that the shape of static occluders can be robustly recovered from pure dynamic object motion, and that this information can be used for online self-correction and consolidation of dynamic object shape reconstruction.
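A minimal sketch of the kind of per-voxel Bayesian accumulation the abstract describes: whenever a dynamic object is known to lie behind a voxel along a viewing ray but its silhouette is unexpectedly absent at the corresponding pixel, the voxel's occluder belief increases. The log-odds form, the two noise probabilities and all names are assumptions, not the authors' exact sensor model.

```python
import numpy as np

def update_occluder_belief(logodds, dynamic_behind, silhouette_seen,
                           p_hide=0.9, p_miss=0.1):
    """One per-view Bayesian update of static-occluder log-odds for N voxels.

    logodds:         (N,) current log-odds that each voxel holds a static occluder
    dynamic_behind:  (N,) bool, a dynamic object lies behind the voxel along this
                     view's ray, so its silhouette should be visible here
    silhouette_seen: (N,) bool, the corresponding pixel is inside the silhouette
    p_hide:          assumed P(silhouette missing | occluder present)
    p_miss:          assumed P(silhouette missing | no occluder), e.g. segmentation noise
    """
    missing = dynamic_behind & ~silhouette_seen   # silhouette unexpectedly absent
    present = dynamic_behind & silhouette_seen    # silhouette visible as expected
    out = logodds.copy()
    out[missing] += np.log(p_hide / p_miss)                   # evidence for an occluder
    out[present] += np.log((1.0 - p_hide) / (1.0 - p_miss))   # evidence against
    return out
```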

54 citations

Proceedings ArticleDOI
14 Jun 2006
TL;DR: This paper theoretically proves that the proposed visual hull algorithm deterministically computes the tightest correct visual hull in the presence of occlusion, and shows that the new algorithm stays within the time complexity of the traditional method.
Abstract: In this paper, we propose a visual hull algorithm which guarantees a correct construction even in the presence of partial occlusion, where "correct" means that the real shape is located inside the visual hull. The algorithm is based on a new idea, the "extended silhouette", which requires the silhouette from background subtraction and the "occlusion mask" of the same view. In order to prepare the occlusion mask, we also propose a novel concept, the "effective boundary" of moving foreground objects in a video obtained from a static camera. The accumulation of the effective boundary through time automatically gives robust occluder boundaries. We theoretically prove that our algorithm deterministically computes the tightest correct visual hull in the presence of occlusion. Both synthetic and real examples are given as a demonstration of the correctness of the algorithm. Finally, we show that the new algorithm stays within the time complexity of the traditional method.
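A hedged sketch of the core construction: the extended silhouette is the union of the background-subtraction silhouette and the occlusion mask, and a voxel is kept only if it projects inside the extended silhouette of every view (a view in which it falls outside the image leaves it unconstrained). The voxel representation and function names below are illustrative only.

```python
import numpy as np

def extended_silhouette(silhouette, occlusion_mask):
    """Union of the observed silhouette and the per-view occlusion mask, so that
    carving stays conservative where the object may be hidden by an occluder."""
    return silhouette | occlusion_mask

def carve_visual_hull(voxel_proj, ext_silhouettes):
    """Keep a voxel only if every view either sees it inside its extended
    silhouette or cannot see it at all (projection outside the image).

    voxel_proj:      list over views of (N, 2) integer pixel coordinates (u, v)
    ext_silhouettes: list over views of (H, W) boolean masks
    """
    n_voxels = voxel_proj[0].shape[0]
    keep = np.ones(n_voxels, dtype=bool)
    for proj, mask in zip(voxel_proj, ext_silhouettes):
        u, v = proj[:, 0], proj[:, 1]
        h, w = mask.shape
        in_img = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.ones(n_voxels, dtype=bool)        # unconstrained outside the image
        hit[in_img] = mask[v[in_img], u[in_img]]
        keep &= hit
    return keep
```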

46 citations

Proceedings Article
18 Jun 2008
TL;DR: This paper reconstructs 3D objects with a heterogeneous sensor network of Range Imaging (RIM) sensors and high-resolution camcorders, and proposes a sensor fusion framework so that the computation is general, simple and scalable.
Abstract: In this paper, we reconstruct 3D objects with a heterogeneous sensor network of Range Imaging (RIM) sensors and high-resolution camcorders. With this setup, we first carry out a simple but effective depth calibration for the RIM cameras. We then combine the camcorder silhouette cues and the RIM camera depth information for the reconstruction. Our main contribution is a sensor fusion framework in which the computation is general, simple and scalable. Although we only discuss camcorders and RIM cameras in this paper, the proposed framework can be applied to any type of vision sensor. It uses a space occupancy grid as a probabilistic 3D representation of scene contents. After defining a sensing model for each type of sensor, the reconstruction is simply a Bayesian inference problem and can be solved robustly. The experiments show that the recovered full 3D closed shapes substantially improve the quality of the noisy RIM sensor measurements.
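Under the conditional-independence assumption that usually underlies occupancy-grid fusion, combining the per-sensor models reduces to summing log-likelihood ratios per voxel. The sketch below illustrates only that fusion step and is an assumption-laden simplification, not the paper's exact formulation.

```python
import numpy as np

def fuse_occupancy(silhouette_loglik_ratio, depth_loglik_ratio, prior_logodds=0.0):
    """Per-voxel Bayesian fusion on a space occupancy grid, assuming the sensor
    observations are conditionally independent given occupancy.

    silhouette_loglik_ratio, depth_loglik_ratio: (N,) arrays of
        log P(observation | occupied) - log P(observation | empty)
    Returns the posterior occupancy probability per voxel.
    """
    logodds = prior_logodds + silhouette_loglik_ratio + depth_loglik_ratio
    return 1.0 / (1.0 + np.exp(-logodds))
```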

42 citations

Proceedings ArticleDOI
23 Jun 2008
TL;DR: This paper proposes a new algorithm to automatically detect and reconstruct scenes with a variable number of dynamic objects, and distinguishes between m different shapes in the scene by using automatically learnt view-specific appearance models, eliminating the color calibration requirement.
Abstract: This paper deals with 3D shape estimation from silhouette cues of multiple moving objects in general indoor or outdoor 3D scenes with potential static obstacles, using multiple calibrated video streams. Most shape-from-silhouette techniques use a two-label classification of space occupancy and silhouettes, based on image regions that match or disagree with a static background appearance model. Binary silhouette information becomes insufficient to unambiguously carve 3D space regions as the number and density of dynamic objects increases. In such difficult scenes, multi-view stereo methods suffer from visibility problems and rely on color calibration procedures that are tedious to achieve outdoors. We propose a new algorithm to automatically detect and reconstruct scenes with a variable number of dynamic objects. Our formulation distinguishes between m different shapes in the scene by using automatically learnt view-specific appearance models, eliminating the color calibration requirement. Bayesian reasoning is then applied to solve the m-shape occupancy problem, with m updated as objects enter or leave the scene. Results show that this method yields multi-silhouette-based estimates that drastically improve scene reconstructions over traditional two-label silhouette scene analysis. This also enables the method to deal efficiently with multi-person tracking problems.
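One way to picture the view-specific appearance models mentioned above: for each pixel of a view, evaluate the background model and the m object models learnt for that view, then normalise into a posterior over m+1 labels. The Gaussian colour model and all names below are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def pixel_label_posterior(pixel_rgb, bg_model, object_models, priors=None):
    """Posterior over m+1 labels (background plus m dynamic shapes) for one pixel.

    bg_model and each entry of object_models are (mean, var) of a diagonal RGB
    Gaussian learnt for this particular view.
    """
    models = [bg_model] + list(object_models)
    if priors is None:
        priors = np.full(len(models), 1.0 / len(models))
    logpost = np.empty(len(models))
    for k, (mean, var) in enumerate(models):
        mean, var = np.asarray(mean, float), np.asarray(var, float)
        diff = np.asarray(pixel_rgb, float) - mean
        loglik = (-0.5 * np.sum(diff * diff / var)
                  - 0.5 * np.sum(np.log(2.0 * np.pi * var)))
        logpost[k] = np.log(priors[k]) + loglik
    logpost -= logpost.max()          # numerical stability before exponentiating
    post = np.exp(logpost)
    return post / post.sum()
```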

31 citations


Cited by
Journal ArticleDOI
TL;DR: A growing number of applications depend on accurate and fast 3D scene analysis, and the estimation of a range map by image analysis or laser scan techniques is still a time‐consuming and expensive part of such systems.
Abstract: A growing number of applications depend on accurate and fast 3D scene analysis. Examples are model and lightfield acquisition, collision prevention, mixed reality and gesture recognition. The estimation of a range map by image analysis or laser scan techniques is still a time-consuming and expensive part of such systems. A lower-priced, fast and robust alternative for distance measurements are time-of-flight (ToF) cameras. Recently, significant advances have been made in producing low-cost and compact ToF devices, which have the potential to revolutionize many fields of research, including computer graphics, computer vision and human machine interaction (HMI). These technologies are starting to have an impact on research and commercial applications. The upcoming generation of ToF sensors, however, will be even more powerful and will have the potential to become ‘ubiquitous real-time geometry devices’ for gaming, web-conferencing, and numerous other applications. This paper gives an account of recent developments in ToF technology and discusses the current state of the integration of this technology into various graphics-related applications.

289 citations

Proceedings ArticleDOI
01 Jan 2009
TL;DR: This STAR gives an account of recent developments in ToF-technology and discusses the current state of the integration of this technology into various graphics-related applications.
Abstract: A growing number of applications depend on accurate and fast 3D scene analysis. Examples are model and lightfield acquisition, collision prevention, mixed reality, and gesture recognition. The estimation of a range map by image analysis or laser scan techniques is still a time-consuming and expensive part of such systems. A lower-priced, fast and robust alternative for distance measurements are Time-of-Flight (ToF) cameras. Recently, significant advances have been made in producing low-cost and compact ToF-devices, which have the potential to revolutionize many fields of research, including Computer Graphics, Computer Vision and Human Machine Interaction (HMI). These technologies are starting to have an impact on research and commercial applications. The upcoming generation of ToF sensors, however, will be even more powerful and will have the potential to become “ubiquitous real-time geometry devices” for gaming, web-conferencing, and numerous other applications. This STAR gives an account of recent developments in ToF-technology and discusses the current state of the integration of this technology into various graphics-related applications.

234 citations

Proceedings ArticleDOI
10 Sep 2014
TL;DR: This work presents an efficient new real-time approach which densely maps an environment using bounded planes and surfels extracted from depth images (like those produced by RGB-D sensors or dense multi-view stereo reconstruction) to take advantage of the planarity of many parts of real-world scenes.
Abstract: Using higher-level entities during mapping has the potential to improve camera localisation performance and give substantial perception capabilities to real-time 3D SLAM systems. We present an efficient new real-time approach which densely maps an environment using bounded planes and surfels extracted from depth images (like those produced by RGB-D sensors or dense multi-view stereo reconstruction). Our method offers the every-pixel descriptive power of the latest dense SLAM approaches, but takes advantage directly of the planarity of many parts of real-world scenes via a data-driven process to directly regularize planar regions and represent their accurate extent efficiently using an occupancy approach with on-line compression. Large areas can be mapped efficiently and with useful semantic planar structure which enables intuitive and useful AR applications such as using any wall or other planar surface in a scene to display a user's content.
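A minimal sketch of the plane-extraction side of such a system, assuming a plain least-squares fit and a distance threshold for the plane's extent; the paper's data-driven regularisation and on-line occupancy compression are not reproduced here.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane (unit normal n, offset d with n.p + d ~ 0) fitted to
    an (N, 3) array of 3D points via SVD of the centred points."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]                      # direction of smallest variance
    d = -float(n @ centroid)
    return n, d

def planar_inliers(points, n, d, tol=0.01):
    """Boolean mask of points within tol metres of the plane; a bounded-plane
    entity could store this extent, e.g. on a 2D occupancy grid in the plane."""
    return np.abs(points @ n + d) < tol
```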

197 citations

Proceedings ArticleDOI
01 Sep 2009
TL;DR: This work proposes an integrated multi-view sensor fusion approach that combines information from multiple color cameras and multiple ToF depth sensors to obtain high quality dense and detailed 3D models of scenes challenging for stereo alone, while simultaneously reducing complex noise of ToF sensors.
Abstract: Multi-view stereo methods frequently fail to properly reconstruct 3D scene geometry if visible texture is sparse or the scene exhibits difficult self-occlusions. Time-of-Flight (ToF) depth sensors can provide 3D information regardless of texture but with only limited resolution and accuracy. To find an optimal reconstruction, we propose an integrated multi-view sensor fusion approach that combines information from multiple color cameras and multiple ToF depth sensors. First, multi-view ToF sensor measurements are combined to obtain a coarse but complete model. Then, the initial model is refined by means of a probabilistic multi-view fusion framework, optimizing over an energy function that aggregates ToF depth sensor information with multi-view stereo and silhouette constraints. We obtain high quality dense and detailed 3D models of scenes challenging for stereo alone, while simultaneously reducing complex noise of ToF sensors.
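The energy mentioned in the abstract aggregates three cues. A toy per-pixel version might look like the sketch below, where the quadratic data terms, the silhouette penalty and the weights are assumptions rather than the paper's actual energy function.

```python
import numpy as np

def fusion_energy(depth, tof_depth, stereo_depth, inside_silhouette,
                  w_tof=1.0, w_stereo=1.0, w_sil=10.0):
    """Per-pixel data energy of a candidate depth map against three cues.

    depth:             candidate depth map being evaluated, shape (H, W)
    tof_depth:         registered ToF depth, NaN where unmeasured
    stereo_depth:      multi-view stereo depth, NaN where unreliable
    inside_silhouette: bool (H, W), candidate surface projects inside all silhouettes
    """
    e = np.zeros_like(depth, dtype=float)
    tof_ok = ~np.isnan(tof_depth)
    stereo_ok = ~np.isnan(stereo_depth)
    e[tof_ok] += w_tof * (depth[tof_ok] - tof_depth[tof_ok]) ** 2
    e[stereo_ok] += w_stereo * (depth[stereo_ok] - stereo_depth[stereo_ok]) ** 2
    e += w_sil * (~inside_silhouette)   # penalise surfaces carving outside silhouettes
    return float(e.sum())
```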

174 citations

Journal ArticleDOI
TL;DR: A method is introduced that substantially improves upon the manufacturer's calibration of time-of-flight range sensors and leads to improved accuracy and robustness on an extensive set of experimental results.
Abstract: Time-of-flight range sensors have error characteristics which are complementary to passive stereo. They provide real-time depth estimates in conditions where passive stereo does not work well, such as on white walls. In contrast, these sensors are noisy and often perform poorly on the textured scenes where stereo excels. We explore their complementary characteristics and introduce a method for combining the results from both that achieves better accuracy than either alone. In our fusion framework, the depth probability distribution functions from each of these sensor modalities are formulated and optimized. Robust and adaptive fusion is built on a pixel-wise reliability weighting function calculated for each method. In addition, since time-of-flight devices have primarily been used as individual sensors, they are typically poorly calibrated. We introduce a method that substantially improves upon the manufacturer's calibration. We demonstrate that our proposed techniques lead to improved accuracy and robustness on an extensive set of experimental results.
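The pixel-wise reliability weighting can be illustrated with a simple point-estimate version: each pixel's fused depth is a convex combination of the ToF and stereo depths weighted by their per-pixel reliabilities. The paper formulates and optimizes full depth probability distributions; the sketch below is only a simplification with assumed names.

```python
import numpy as np

def fuse_depths(tof_depth, tof_reliability, stereo_depth, stereo_reliability):
    """Reliability-weighted fusion of two depth maps; all inputs are (H, W)
    arrays and the reliabilities are non-negative per-pixel weights."""
    w_sum = tof_reliability + stereo_reliability
    w_sum = np.where(w_sum > 0.0, w_sum, 1.0)   # avoid division by zero
    return (tof_reliability * tof_depth + stereo_reliability * stereo_depth) / w_sum
```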

159 citations