Journal ArticleDOI

Heterogeneous Multi-View Information Fusion: Review of 3-D Reconstruction Methods and a New Registration with Uncertainty Modeling

TL;DR: A framework is proposed to incorporate measurement uncertainties into the registered imagery; handling such uncertainties is critical to the robustness of these applications but is often not addressed.
Abstract: We consider a multisensor network fusion framework for 3-D data registration using inertial planes, the underlying geometric relations, and transformation model uncertainties. We present a comprehensive review of 3-D reconstruction methods and registration techniques in terms of the underlying geometric relations and associated uncertainties in the registered images. The 3-D data registration and scene reconstruction task using a set of multiview images is an essential goal of structure-from-motion algorithms that still remains challenging for many applications, such as surveillance, human motion and behavior modeling, virtual reality, smart rooms, health care, teleconferencing, games, human–robot interaction, medical imaging, and scene understanding. We propose a framework to incorporate measurement uncertainties in the registered imagery, which is a critical issue to ensure the robustness of these applications but is often not addressed. In our test bed environment, a network of sensors is used where each physical node consists of a coupled camera and associated inertial sensor (IS)/inertial measurement unit. Each camera-IS node can be considered as a hybrid sensor or fusion-based virtual camera. The 3-D scene information is registered onto a set of virtual planes defined by the IS. The virtual registrations are based on the homography calculated from 3-D orientation data provided by the IS. The uncertainty associated with each 3-D point projected onto the virtual planes is modeled using statistical geometry methods. Experimental results demonstrate the feasibility and effectiveness of the proposed approach for multiview reconstruction with sensor fusion.
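The registration onto IS-defined virtual planes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the intrinsic matrix, the roll/pitch/yaw convention, and the point covariance are assumed values, and the rotation-induced homography H = K R K⁻¹ together with first-order (Jacobian-based) covariance propagation stands in for the paper's statistical geometry methods.

```python
import numpy as np

# Illustrative intrinsics; not values from the paper.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def rotation_from_rpy(roll, pitch, yaw):
    """Rotation matrix from IS roll/pitch/yaw (radians), Z-Y-X convention."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def virtual_plane_homography(R):
    """Rotation-induced homography H = K R K^-1 mapping image points
    into the IS-levelled virtual plane."""
    return K @ R @ np.linalg.inv(K)

def propagate_uncertainty(H, x, Sigma):
    """First-order propagation of a 2-D point covariance through H.

    x is a pixel (u, v); Sigma is its 2x2 covariance. Returns the warped
    point and the covariance J Sigma J^T, with J the Jacobian of the
    projective map at x.
    """
    p = H @ np.array([x[0], x[1], 1.0])
    w = p[2]
    u, v = p[0] / w, p[1] / w
    # Jacobian of (u, v) with respect to the input pixel coordinates
    J = np.array([
        [(H[0, 0] - u * H[2, 0]) / w, (H[0, 1] - u * H[2, 1]) / w],
        [(H[1, 0] - v * H[2, 0]) / w, (H[1, 1] - v * H[2, 1]) / w],
    ])
    return (u, v), J @ Sigma @ J.T
```

With the identity orientation the homography is the identity and the covariance passes through unchanged, which is a quick sanity check on the Jacobian.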
Citations
Journal ArticleDOI
TL;DR: This survey will directly help researchers understand the research developments of MSIF under RST and provide a state-of-the-art understanding of the specialized literature, as well as clarify the approaches and applications of MSIF in the RST research community.

105 citations

Journal ArticleDOI
TL;DR: The proposed approach recovers accurate camera pose and (sparse) 3-D structure using bundle adjustment for sequential imagery (BA4S) and then stabilize the video from the moving platform by analytically solving for the image-plane-to-ground-plane homography transformation.
Abstract: We describe a fast and efficient camera pose refinement and Structure from Motion (SfM) method for sequential aerial imagery with applications to georegistration and 3-D reconstruction. Inputs to the system are 2-D images combined with initial noisy camera metadata measurements, available from on-board sensors (e.g., camera, global positioning system, and inertial measurement unit). Georegistration is required to stabilize the ground-plane motion, separating camera-induced motion from object motion to support vehicle tracking in aerial imagery. In the proposed approach, we recover accurate camera pose and (sparse) 3-D structure using bundle adjustment for sequential imagery (BA4S) and then stabilize the video from the moving platform by analytically solving for the image-plane-to-ground-plane homography transformation. Using this approach, we avoid relying upon image-to-image registration, which requires estimating feature correspondences (i.e., matching) followed by warping between images (in a 2-D space), an error-prone process for complex scenes with parallax, appearance, and illumination changes. Both our SfM (BA4S) and our analytical ground-plane georegistration method avoid the use of iterative consensus combinatorial methods like RANdom SAmple Consensus (RANSAC), which is a core part of many published approaches. BA4S is very efficient for long sequential imagery and is more than 130 times faster than VisualSfM, 35 times faster than MavMap, and about 274 times faster than Pix4D. Various experimental results demonstrate the efficiency and robustness of the proposed pipeline for the refinement of camera parameters in sequential aerial imagery and georegistration.
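The analytical image-plane-to-ground-plane mapping mentioned above follows from standard projective geometry: for the world plane z = 0, the projection P = K [R | t] collapses to the homography H = K [r1 r2 t] (the first two columns of R plus the translation). The sketch below illustrates that relation with invented camera values, not the paper's BA4S outputs.

```python
import numpy as np

def ground_plane_homography(K, R, t):
    """Homography from the ground plane (X, Y, 0) to the image:
    H = K [r1 r2 t], with r1, r2 the first two columns of R."""
    return K @ np.column_stack((R[:, 0], R[:, 1], t))

def image_to_ground(H, uv):
    """Back-project a pixel to ground-plane coordinates via H^-1."""
    p = np.linalg.inv(H) @ np.array([uv[0], uv[1], 1.0])
    return p[:2] / p[2]
```

Once the pose is refined, every frame's H can be computed in closed form, which is why no per-frame image-to-image matching is needed for stabilization.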

19 citations

Journal ArticleDOI
Yejian Zhou1, Lei Zhang1, Chao Xing1, Pengfei Xie1, Yunhe Cao1 
TL;DR: A new reconstruction algorithm is proposed to reconstruct the 3D surface of a stable-attitude target from its multi-view radar image sequence, achieving a dramatic performance enhancement of the reconstruction under this condition.
Abstract: Target three-dimensional (3D) reconstruction is a hot topic and also a challenge in remote sensing applications. In this paper, a new reconstruction algorithm is proposed to reconstruct the 3D surface of a stable-attitude target from its multi-view radar image sequence. A uniform explicit expression of the radar and optical imaging geometries is derived to bridge the 3D target structure and these two sorts of observation images. In this way, the visual hull of the target is reconstructed by applying multi-view stereo techniques to the silhouette information extracted from the radar image sequence. Meanwhile, the target's absolute attitude is also determined. Furthermore, we analyze the primary difficulty of the method, induced by the limited radar observation view, in a typical application: the 3D reconstruction of an in-orbit satellite. Then, an extended algorithm is proposed with feature fusion of the radar and optical images to achieve a dramatic performance enhancement of the reconstruction under this condition. The feasibility of the proposed algorithm is confirmed in the experimental section, and some conclusions are drawn to guide future work on extended applications of the proposed algorithm.
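The visual-hull step rests on shape-from-silhouette: a voxel belongs to the hull only if its projection falls inside every view's silhouette. The following is a toy voxel-carving sketch of that idea; the projection matrices and silhouette masks are invented for illustration and have nothing to do with the paper's radar imaging geometry.

```python
import numpy as np

def carve_visual_hull(voxels, views):
    """Keep voxels whose projection falls inside every silhouette.

    voxels: (N, 3) array of candidate 3-D points.
    views:  list of (P, silhouette) pairs, P a 3x4 projection matrix and
            silhouette a boolean (H, W) mask.
    """
    keep = np.ones(len(voxels), dtype=bool)
    homog = np.hstack([voxels, np.ones((len(voxels), 1))])
    for P, sil in views:
        p = homog @ P.T
        u = np.round(p[:, 0] / p[:, 2]).astype(int)
        v = np.round(p[:, 1] / p[:, 2]).astype(int)
        h, w = sil.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(voxels), dtype=bool)
        hit[inside] = sil[v[inside], u[inside]]
        keep &= hit  # carve away voxels outside any silhouette
    return voxels[keep]
```

Because carving only ever removes voxels, the result is an outer bound on the true shape, which is why a limited range of radar views (as in the satellite case above) degrades the hull and motivates the paper's radar-optical feature fusion.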

14 citations

Journal ArticleDOI
TL;DR: A gesture recognition algorithm based on image information fusion in virtual reality is proposed; built on a multi-sensor information fusion model of the virtual environment, it achieves the highest recognition success rate and outperforms several comparison machine learning methods in recognition time.
Abstract: Combining image information fusion theory with machine learning for biometric recognition has been an important field of computer vision research in recent years. On this basis, a gesture recognition algorithm based on image information fusion in virtual reality is proposed. Firstly, the paper introduces the basic concepts and principles of virtual reality and information fusion technology, analyzes the characteristics and basic components of virtual environment systems, and discusses the relationship between humans and the virtual environment and the environment's impact on users. Then, a multi-sensor information fusion model of the virtual environment for gesture recognition is proposed. Membership degrees and a template matching algorithm are selected for data association and gesture recognition in the fusion model. Finally, a comparison experiment verifies the proposed method. The results show that the proposed multi-sensor information fusion model in the interactive virtual environment achieves the highest recognition success rate, 96.17%, and outperforms several comparison machine learning methods in recognition time.
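A minimal sketch of the membership-degree plus template-matching idea described above: each sensor's feature vector is scored against gesture templates, scores are turned into fuzzy membership degrees, and the degrees are fused across sensors before picking a gesture. The distance-based membership function, the templates, and the sensor weights are all invented here; the paper does not specify these details.

```python
import numpy as np

def membership(feature, template):
    """Membership degree in (0, 1] from Euclidean distance (assumed form)."""
    d = np.linalg.norm(feature - template)
    return 1.0 / (1.0 + d)

def fuse_and_classify(sensor_features, templates, sensor_weights):
    """Weighted-sum fusion of per-sensor membership degrees.

    sensor_features: list of feature vectors, one per sensor.
    templates:       dict mapping gesture name -> template vector.
    sensor_weights:  per-sensor reliability weights summing to 1.
    Returns the winning gesture and the fused score per gesture.
    """
    fused = {}
    for name, tpl in templates.items():
        degrees = [membership(f, tpl) for f in sensor_features]
        fused[name] = float(np.dot(sensor_weights, degrees))
    return max(fused, key=fused.get), fused
```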

14 citations

Posted Content
TL;DR: The experimental results show that the proposed approach outperforms state-of-the-art multi-camera pedestrian detectors, even some specifically trained on the target scenario, signifying the versatility and robustness of the proposed method without requiring ad hoc annotations nor human-guided configuration.
Abstract: In the current worldwide situation, pedestrian detection has reemerged as a pivotal tool for intelligent video-based systems aiming to solve tasks such as pedestrian tracking, social distancing monitoring or pedestrian mass counting. Pedestrian detection methods, even the top performing ones, are highly sensitive to occlusions among pedestrians, which dramatically degrades their performance in crowded scenarios. The generalization of multi-camera set-ups makes it possible to better handle occlusions by combining information from different viewpoints. In this paper, we present a multi-camera approach to globally combine pedestrian detections leveraging automatically extracted scene context. Contrary to the majority of state-of-the-art methods, the proposed approach is scene-agnostic, not requiring a tailored adaptation to the target scenario, e.g., via fine-tuning. This noteworthy attribute does not require ad hoc training with labelled data, expediting the deployment of the proposed method in real-world situations. Context information, obtained via semantic segmentation, is used 1) to automatically generate a common Area of Interest for the scene and all the cameras, avoiding the usual need to define it manually; and 2) to obtain detections for each camera by solving a global optimization problem that maximizes coherence of detections both in each 2D image and in the 3D scene. This process yields tightly-fitted bounding boxes that circumvent occlusions or misdetections. Experimental results on five publicly available datasets show that the proposed approach outperforms state-of-the-art multi-camera pedestrian detectors, even some specifically trained on the target scenario, signifying the versatility and robustness of the proposed method without requiring ad hoc annotations nor human-guided configuration.
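The core multi-camera step can be caricatured as follows: each camera's bounding-box foot points are mapped to a common ground plane with that camera's homography and nearby projections are merged into single pedestrians. This is only a greedy stand-in for the paper's global optimization; the homographies, the Area of Interest handling, and the merge radius are all assumptions.

```python
import numpy as np

def to_ground(H, foot_uv):
    """Map a bounding-box foot point to ground-plane coordinates."""
    p = H @ np.array([foot_uv[0], foot_uv[1], 1.0])
    return p[:2] / p[2]

def fuse_detections(per_camera, radius=0.5):
    """Greedy merge of ground-plane points closer than `radius` metres.

    per_camera: list of (H, [foot points]) pairs, one per camera.
    Returns fused ground-plane positions (cluster means).
    """
    pts = [to_ground(H, uv) for H, uvs in per_camera for uv in uvs]
    clusters = []
    for p in pts:
        for c in clusters:
            if np.linalg.norm(np.mean(c, axis=0) - p) < radius:
                c.append(p)  # same pedestrian seen from another camera
                break
        else:
            clusters.append([p])  # new pedestrian
    return [np.mean(c, axis=0) for c in clusters]
```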

13 citations

References
Book
01 Apr 1996
TL;DR: Theorems and statistical properties of least squares solutions are explained and basic numerical methods for solving least squares problems are described.
Abstract: Preface 1. Mathematical and statistical properties of least squares solutions 2. Basic numerical methods 3. Modified least squares problems 4. Generalized least squares problems 5. Constrained least squares problems 6. Direct methods for sparse problems 7. Iterative methods for least squares problems 8. Least squares problems with special bases 9. Nonlinear least squares problems Bibliography Index.
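A tiny illustration of the book's central topic: solving an overdetermined linear least squares problem min ‖Ax − b‖₂ via a QR factorization, which is numerically preferable to forming the normal equations AᵀA. The line-fitting data below is invented for the example.

```python
import numpy as np

def lstsq_qr(A, b):
    """Solve min ||Ax - b||_2 via the thin QR factorization A = QR
    (A is m x n with m >= n and full column rank)."""
    Q, R = np.linalg.qr(A)
    return np.linalg.solve(R, Q.T @ b)  # triangular system R x = Q^T b

# Fit a line y = c0 + c1 * t to noise-free data: exact recovery expected.
t = np.array([0.0, 1.0, 2.0, 3.0])
A = np.column_stack([np.ones_like(t), t])
b = 1.0 + 2.0 * t
coeffs = lstsq_qr(A, b)
```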

3,405 citations

Journal ArticleDOI
01 Jul 2006
TL;DR: This work presents a system for interactively browsing and exploring large unstructured collections of photographs of a scene using a novel 3D interface, consisting of an image-based modeling front end that automatically computes the viewpoint of each photograph, a sparse 3D model of the scene, and image-to-model correspondences.
Abstract: We present a system for interactively browsing and exploring large unstructured collections of photographs of a scene using a novel 3D interface. Our system consists of an image-based modeling front end that automatically computes the viewpoint of each photograph as well as a sparse 3D model of the scene and image-to-model correspondences. Our photo explorer uses image-based rendering techniques to smoothly transition between photographs, while also enabling full 3D navigation and exploration of the set of images and world geometry, along with auxiliary information such as overhead maps. Our system also makes it easy to construct photo tours of scenic or historic locations, and to annotate image details, which are automatically transferred to other relevant images. We demonstrate our system on several large personal photo collections as well as images gathered from Internet photo sharing sites.

3,398 citations


"Heterogeneous Multi-View Informatio..." refers background in this paper

  • ...based on mined photographs from web users, see also [3]....


Proceedings ArticleDOI
17 Jun 2006
TL;DR: This paper first surveys multi-view stereo algorithms and compares them qualitatively using a taxonomy that differentiates their key properties, then describes the process for acquiring and calibrating multi-view image datasets with high-accuracy ground truth and introduces the evaluation methodology.
Abstract: This paper presents a quantitative comparison of several multi-view stereo reconstruction algorithms. Until now, the lack of suitable calibrated multi-view image datasets with known ground truth (3D shape models) has prevented such direct comparisons. In this paper, we first survey multi-view stereo algorithms and compare them qualitatively using a taxonomy that differentiates their key properties. We then describe our process for acquiring and calibrating multiview image datasets with high-accuracy ground truth and introduce our evaluation methodology. Finally, we present the results of our quantitative comparison of state-of-the-art multi-view stereo reconstruction algorithms on six benchmark datasets. The datasets, evaluation details, and instructions for submitting new models are available online at http://vision.middlebury.edu/mview.
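The quantitative comparison rests on two complementary measures that the Middlebury benchmark popularized: accuracy (the distance d such that a given fraction of reconstructed points lie within d of the ground truth) and completeness (the fraction of ground-truth points lying within a threshold of the reconstruction). The sketch below uses a brute-force nearest-neighbour search and common default parameters, not the benchmark's exact implementation.

```python
import numpy as np

def nearest_dists(src, dst):
    """Distance from each point in src to its nearest neighbour in dst."""
    return np.array([np.min(np.linalg.norm(dst - p, axis=1)) for p in src])

def accuracy(recon, gt, fraction=0.9):
    """Distance within which `fraction` of reconstructed points hit gt."""
    return float(np.quantile(nearest_dists(recon, gt), fraction))

def completeness(gt, recon, threshold=0.01):
    """Fraction of ground-truth points within `threshold` of the model."""
    return float(np.mean(nearest_dists(gt, recon) <= threshold))
```

A perfect reconstruction scores accuracy 0 and completeness 1; the two measures trade off, since an over-dense model boosts completeness while hurting accuracy.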

2,556 citations


Additional excerpts

  • ...[19]....


Journal ArticleDOI
TL;DR: A system that can match and reconstruct 3D scenes from extremely large collections of photographs such as those found by searching for a given city on Internet photo sharing sites and is designed to scale gracefully with both the size of the problem and the amount of available computation.
Abstract: We present a system that can reconstruct 3D geometry from large, unorganized collections of photographs such as those found by searching for a given city (e.g., Rome) on Internet photo-sharing sites. Our system is built on a set of new, distributed computer vision algorithms for image matching and 3D reconstruction, designed to maximize parallelism at each stage of the pipeline and to scale gracefully with both the size of the problem and the amount of available computation. Our experimental results demonstrate that it is now possible to reconstruct city-scale image collections with more than a hundred thousand images in less than a day.

1,307 citations


Additional excerpts

  • ...[2] presented a city-scale 3D reconstruction...


Book
19 Nov 1993

1,260 citations