
Showing papers on "Homography (computer vision) published in 2020"


Posted Content
TL;DR: This work presents Patch2Pix, a novel refinement network that refines match proposals by regressing pixel-level matches from the local regions defined by those proposals and jointly rejecting outlier matches with confidence scores.
Abstract: The classical matching pipeline used for visual localization typically involves three steps: (i) local feature detection and description, (ii) feature matching, and (iii) outlier rejection. Recently emerged correspondence networks propose to perform those steps inside a single network but suffer from low matching resolution due to the memory bottleneck. In this work, we propose a new perspective to estimate correspondences in a detect-to-refine manner, where we first predict patch-level match proposals and then refine them. We present Patch2Pix, a novel refinement network that refines match proposals by regressing pixel-level matches from the local regions defined by those proposals and jointly rejecting outlier matches with confidence scores. Patch2Pix is weakly supervised to learn correspondences that are consistent with the epipolar geometry of an input image pair. We show that our refinement network significantly improves the performance of correspondence networks on image matching, homography estimation, and localization tasks. In addition, we show that our learned refinement generalizes to fully-supervised methods without re-training, which leads us to state-of-the-art localization performance. The code is available at this https URL.
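
For intuition, the epipolar consistency that weakly supervises Patch2Pix is typically measured with a Sampson-style distance; below is a minimal NumPy sketch (not the paper's released code), assuming the fundamental matrix F and the matched points are given:

import numpy as np

def sampson_distance(pts1, pts2, F):
    """Sampson (first-order epipolar) distance for N matches.

    pts1, pts2: (N, 2) pixel coordinates in the two images.
    F: (3, 3) fundamental matrix mapping image-1 points to epilines in image 2.
    """
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous, (N, 3)
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    Fx1 = x1 @ F.T           # epipolar lines in image 2, (N, 3)
    Ftx2 = x2 @ F            # epipolar lines in image 1, (N, 3)
    num = np.sum(x2 * Fx1, axis=1) ** 2              # (x2^T F x1)^2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return num / den

# A weakly supervised loss could penalize matches that violate the epipolar
# geometry, e.g.: loss = (confidence * sampson_distance(p1, p2, F)).mean()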

64 citations


Book ChapterDOI
23 Aug 2020
TL;DR: Zhang et al. propose an unsupervised deep homography method with a new architecture design, which learns an outlier mask to select only reliable regions for homography estimation.
Abstract: Homography estimation is a basic image alignment method in many applications. It is usually conducted by extracting and matching sparse feature points, which are error-prone in low-light and low-texture images. On the other hand, previous deep homography approaches use either synthetic images for supervised learning or aerial images for unsupervised learning, both ignoring the importance of handling depth disparities and moving objects in real-world applications. To overcome these problems, in this work we propose an unsupervised deep homography method with a new architecture design. In the spirit of the RANSAC procedure in traditional methods, we specifically learn an outlier mask to select only reliable regions for homography estimation. We calculate loss with respect to our learned deep features instead of directly comparing image content as was done previously. To achieve unsupervised training, we also formulate a novel triplet loss customized for our network. We verify our method by conducting comprehensive comparisons on a new dataset that covers a wide range of scenes with varying degrees of difficulty for the task. Experimental results reveal that our method outperforms the state of the art, including deep solutions and feature-based solutions.
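
For contrast, the sparse feature pipeline that the abstract calls error-prone in low-light, low-texture images looks roughly like this in OpenCV; this is a generic sketch of the traditional baseline, not the paper's method:

import cv2
import numpy as np

def feature_based_homography(img1, img2):
    """Classic sparse pipeline: detect, match, then RANSAC out the outliers."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC plays the role that the paper's learned outlier mask replaces.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H, inlier_mask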

62 citations


Journal ArticleDOI
TL;DR: In this article, a dynamics-level finite-time fuzzy monocular visual servo (DFFMVS) scheme is created for regulating an unmanned surface vehicle (USV) to the desired pose.
Abstract: In this article, in the presence of completely unknown dynamics and unmeasurable velocities, a dynamics-level finite-time fuzzy monocular visual servo (DFFMVS) scheme is created for regulating an unmanned surface vehicle (USV) to the desired pose. The main contributions are as follows: first, with the aid of homography decomposition, a novel homography-based visual servo structure for a USV with both kinematics and dynamics is established such that complex unknowns, including unmeasurable poses and velocities, image depth, system dynamics, and time-varying inertia, are sufficiently encapsulated; second, using the finite-time observer technique, finite-time velocity observer (FVO) based visual-servo error dynamics are formulated, thereby facilitating backstepping synthesis; third, by virtue of the FVO, an adaptive fuzzy dynamics approximator together with adaptive residual feedback is deployed to compensate for complex unknowns, thereby contributing to accurate regulation of pose errors; and fourth, a completely model-free monocular visual servo approach using only a camera is obtained. Simulation studies on a benchmark prototype USV demonstrate that the proposed DFFMVS scheme has remarkable performance, with significant superiority in both visual servo and unknowns observation.
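
The homography decomposition that the servo structure builds on is a standard geometric operation, recovering candidate rotation/translation/plane-normal triples from H; a hedged OpenCV sketch, where the intrinsic matrix K is an assumed placeholder:

import cv2
import numpy as np

K = np.array([[800.0, 0.0, 320.0],   # assumed pinhole intrinsics
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def decompose(H):
    """Recover candidate (R, t/d, n) triples from a Euclidean homography.

    Up to four solutions are returned; visibility (cheirality) constraints
    are normally used to prune them down to the physical one.
    """
    n_solutions, Rs, ts, normals = cv2.decomposeHomographyMat(H, K)
    return list(zip(Rs, ts, normals))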

60 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: This paper proposes an image stitching algorithm robust to large parallax based on the novel concept of warping residuals; it partitions input images into superpixels and warps each superpixel adaptively according to an optimal homography computed by minimizing the error of feature matches weighted by the warping residuals.
Abstract: Image stitching techniques align two images captured at different viewing positions onto a single wider image. When the captured 3D scene is not planar and the camera baseline is large, two images exhibit parallax where the relative positions of scene structures are quite different from each view. The existing image stitching methods often fail to work on the images with large parallax. In this paper, we propose an image stitching algorithm robust to large parallax based on the novel concept of warping residuals. We first estimate multiple homographies and find their inlier feature matches between two images. Then we evaluate warping residual for each feature match with respect to the multiple homographies. To alleviate the parallax artifacts, we partition input images into superpixels and warp each superpixel adaptively according to an optimal homography which is computed by minimizing the error of feature matches weighted by the warping residuals. Experimental results demonstrate that the proposed algorithm provides accurate stitching results for images with large parallax, and outperforms the existing methods qualitatively and quantitatively.
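
One plausible reading of the warping residual, sketched below with assumed inputs: each match's residual is its transfer error under the candidate homographies (taken here as the minimum over the set; the paper's exact definition may differ):

import cv2
import numpy as np

def warping_residuals(pts_src, pts_dst, homographies):
    """Per-match residual with respect to a set of homographies.

    pts_src, pts_dst: (N, 2) matched points; homographies: list of 3x3 arrays.
    Returns the (N,) minimum transfer error over all candidate warps.
    """
    pts = np.float32(pts_src).reshape(-1, 1, 2)
    errors = []
    for H in homographies:
        warped = cv2.perspectiveTransform(pts, H).reshape(-1, 2)
        errors.append(np.linalg.norm(warped - pts_dst, axis=1))
    return np.min(np.stack(errors, axis=0), axis=0)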

52 citations


Journal ArticleDOI
Lang Nie, Chunyu Lin, Kang Liao, Meiqin Liu, Yao Zhao
TL;DR: A cascaded view-free image stitching network based on a global homography that, compared with traditional methods, can eliminate almost 100% of artifacts in overlapping areas at the cost of acceptably slight distortions in non-overlapping areas.

41 citations


Posted Content
TL;DR: In this article, a multi-scale neural network is proposed to estimate the homography of a dynamic scene in a more principled way, by identifying the dynamic content and following multi-task learning principles to jointly estimate the dynamics masks and homographies.
Abstract: Homography estimation is an important step in many computer vision problems. Recently, deep neural network methods have shown to be favorable for this problem when compared to traditional methods. However, these new methods do not consider dynamic content in input images. They train neural networks with only image pairs that can be perfectly aligned using homographies. This paper investigates and discusses how to design and train a deep neural network that handles dynamic scenes. We first collect a large video dataset with dynamic content. We then develop a multi-scale neural network and show that when properly trained using our new dataset, this neural network can already handle dynamic scenes to some extent. To estimate a homography of a dynamic scene in a more principled way, we need to identify the dynamic content. Since dynamic content detection and homography estimation are two tightly coupled tasks, we follow the multi-task learning principles and augment our multi-scale network such that it jointly estimates the dynamics masks and homographies. Our experiments show that our method can robustly estimate homography for challenging scenarios with dynamic scenes, blur artifacts, or lack of textures.
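
Networks in this family commonly regress the four corner displacements of a patch rather than the nine homography entries (the 4-point parameterization popularized by DeTone et al.); a sketch of converting such an output back to a 3x3 matrix, assuming that parameterization is used here:

import cv2
import numpy as np

def four_point_to_homography(corner_offsets, patch_size=128):
    """Convert predicted displacements of the 4 patch corners, shape (4, 2),
    into the corresponding 3x3 homography."""
    s = float(patch_size)
    corners = np.float32([[0, 0], [s, 0], [s, s], [0, s]])
    moved = corners + np.float32(corner_offsets)
    return cv2.getPerspectiveTransform(corners, moved)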

39 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: A mod for GTA V has been developed and used to record a simulated MTMCT dataset called Multi Camera Track Auto (MTA), which contains over 2,800 person identities, 6 cameras, and a video length of over 100 minutes per camera.
Abstract: Existing multi target multi camera tracking (MTMCT) datasets are small in terms of the number of identities and video length. The creation of new real-world datasets is hard, as privacy has to be guaranteed and the labeling is tedious. Therefore, in the scope of this work, a mod for GTA V to record an MTMCT dataset has been developed and used to record a simulated MTMCT dataset called Multi Camera Track Auto (MTA). The MTA dataset contains over 2,800 person identities, 6 cameras, and a video length of over 100 minutes per camera. Additionally, an MTMCT system has been implemented to provide a baseline for the created dataset. The system's pipeline consists of stages for person detection, person re-identification, single camera multi target tracking, track distance calculation, and track association. The track distance calculation comprises a weighted aggregation of the following distances: a single camera time constraint, a multi camera time constraint using overlapping camera areas, an appearance feature distance, a homography matching with pairwise camera homographies, and a linear prediction based on the velocity and the time difference of tracks. When using all partial distances, we were able to surpass the results of state-of-the-art single camera trackers by +13% IDF1 score. The MTA dataset, code, and baselines are available at github.com/schuar-iosb/mta-dataset.
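
The homography-matching distance can be pictured as projecting a track's ground-contact point from one camera into another through the pairwise homography and measuring the gap; a hedged sketch with assumed inputs:

import cv2
import numpy as np

def homography_match_distance(foot_a, foot_b, H_ab):
    """Distance between a track's foot point projected from camera A into
    camera B and the corresponding track's foot point observed in B.

    foot_a, foot_b: (2,) pixel coordinates; H_ab: 3x3 ground-plane homography.
    """
    p = np.float32([[foot_a]])                      # shape (1, 1, 2)
    projected = cv2.perspectiveTransform(p, H_ab)[0, 0]
    return float(np.linalg.norm(projected - np.asarray(foot_b)))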

36 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: This paper proposes an end-to-end approach for single moving camera calibration across challenging scenarios in sports with three key modules: area-based court segmentation, camera pose estimation with embedded templates, and homography prediction via a spatial transform network (STN).
Abstract: The increasing number of vision-based tracking systems deployed in production have necessitated fast, robust camera calibration. In the domain of sport, the majority of current work focuses on sports where lines and intersections are easy to extract, and appearance is relatively consistent across venues. However, for more challenging sports like basketball, those techniques are not sufficient. In this paper, we propose an end-to-end approach for single moving camera calibration across challenging scenarios in sports. Our method contains three key modules: 1) area-based court segmentation, 2) camera pose estimation with embedded templates, and 3) homography prediction via a spatial transform network (STN). All three modules are connected, enabling end-to-end training. We evaluate our method on a new college basketball dataset and demonstrate state-of-the-art performance in variable and dynamic environments. We also validate our method on the World Cup 2014 dataset to show its competitive performance against the state-of-the-art methods. Lastly, we show that our method is two orders of magnitude faster than the previous state of the art on both datasets.
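
Once the STN predicts the frame-to-template homography, positions transfer between the broadcast frame and the court template with a single projective map; a small illustrative sketch (the homography and template size are assumed inputs):

import cv2
import numpy as np

def frame_to_court(points_px, H_frame_to_template):
    """Map pixel positions in the broadcast frame onto court-template
    coordinates using the predicted homography."""
    pts = np.float32(points_px).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H_frame_to_template).reshape(-1, 2)

# The whole frame can likewise be warped onto the template for inspection:
# top_view = cv2.warpPerspective(frame, H_frame_to_template, (template_w, template_h))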

31 citations


Journal ArticleDOI
TL;DR: The proposed mosaicking framework outperformed existing methods and generated meaningful mosaics while reducing the accumulated drift, even in the presence of visual challenges such as specular highlights, reflection, texture paucity, and low video resolution.
Abstract: Fetoscopic laser photocoagulation is a minimally invasive surgical procedure used to treat twin-to-twin transfusion syndrome (TTTS), which involves localization and ablation of abnormal vascular connections on the placenta to regulate the blood flow in both fetuses. This procedure is particularly challenging due to the limited field of view, poor visibility, occasional bleeding, and poor image quality. Fetoscopic mosaicking can help in creating an image with an expanded field of view, which could facilitate the clinicians during the TTTS procedure. We propose a deep learning-based mosaicking framework for diverse fetoscopic videos captured from different settings such as simulation, phantoms, ex vivo, and in vivo environments. The proposed mosaicking framework extends an existing deep image homography model to handle video data by introducing controlled data generation and consistent homography estimation modules. Training is performed on a small subset of fetoscopic images which are independent of the testing videos. We perform both quantitative and qualitative evaluations on 5 diverse fetoscopic videos (2400 frames) that captured different environments. To demonstrate the robustness of the proposed framework, a comparison is performed with the existing feature-based and deep image homography methods. The proposed mosaicking framework outperformed existing methods and generated meaningful mosaics while reducing the accumulated drift, even in the presence of visual challenges such as specular highlights, reflection, texture paucity, and low video resolution.
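
Mosaicking of this kind chains pairwise homographies into frame-to-reference warps, which is also why per-pair errors accumulate as drift; a minimal sketch, assuming the pairwise estimates come from a deep homography model like the one described:

import cv2
import numpy as np

def build_mosaic(frames, pairwise_H, canvas_size):
    """Chain pairwise homographies pairwise_H[i] (frame i+1 -> frame i) into
    frame-to-reference warps and paint frames onto one canvas.
    Drift appears because errors in each pairwise estimate compound
    along the chain."""
    mosaic = np.zeros((canvas_size[1], canvas_size[0], 3), np.uint8)
    H_to_ref = np.eye(3)
    for i, frame in enumerate(frames):
        if i > 0:
            H_to_ref = H_to_ref @ pairwise_H[i - 1]    # accumulate the chain
        warped = cv2.warpPerspective(frame, H_to_ref, canvas_size)
        mosaic = np.where(warped > 0, warped, mosaic)  # naive overwrite blend
    return mosaic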

20 citations


Journal ArticleDOI
03 Jun 2020-Energies
TL;DR: An image processing technique is introduced that automatically identifies the module generating hot spots in a solar array; it derives the area of interest with the inRange function, keyed to the blue color of the PV module.
Abstract: Several factors cause output degradation of the photovoltaic (PV) module. The main contributing elements are higher PV module temperature, shaded cells, shorted or conducting bypass diodes, and a soiled or degraded PV array. In this paper, we introduce an image processing technique that automatically identifies the module generating hot spots in the solar array. To extract feature points, we use the maximally stable extremal regions (MSER) method, which derives the area of interest with the inRange function, using the blue color of the PV module. We propose an effective matching method for feature points and a homography translation technique. The temperature data derivation method and the normal/abnormal decision method are described in order to enhance the performance. The effectiveness of the proposed system was evaluated through experiments. Finally, a thermal image analysis of approximately 240 modules was confirmed to be 97% consistent with the visual evaluation in the experimental results.
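
Both primitives named in the abstract exist in OpenCV, so the region-of-interest step can be approximated as below; the HSV thresholds for "blue" are assumptions, not values from the paper:

import cv2

def pv_module_regions(bgr_image):
    """Mask the blue PV module with inRange, then detect stable regions
    with MSER inside that area of interest."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    blue_mask = cv2.inRange(hsv, (90, 60, 40), (130, 255, 255))  # assumed range
    roi = cv2.bitwise_and(bgr_image, bgr_image, mask=blue_mask)
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    mser = cv2.MSER_create()
    regions, bboxes = mser.detectRegions(gray)
    return blue_mask, regions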

19 citations


Proceedings ArticleDOI
20 Apr 2020
TL;DR: This work proposes a deep network for exposure fusion that takes two images, an underexposed image and an overexposed one, and integrates homography estimation for compensating camera motion, an attention mechanism for correcting remaining misalignment and moving pixels, and adversarial learning for alleviating other remaining artifacts.
Abstract: Modern cameras have limited dynamic ranges and often produce images with saturated or dark regions using a single exposure. Although the problem could be addressed by taking multiple images with different exposures, exposure fusion methods need to deal with ghosting artifacts and detail loss caused by camera motion or moving objects. This paper proposes a deep network for exposure fusion. For reducing the potential ghosting problem, our network only takes two images, an underexposed image and an overexposed one. Our network integrates homography estimation for compensating camera motion, an attention mechanism for correcting remaining misalignment and moving pixels, and adversarial learning for alleviating other remaining artifacts. Experiments on real-world photos taken using handheld mobile phones show that the proposed method can generate high-quality images with faithful detail and vivid color rendition in both dark and bright areas.
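
Absent the learned module, the homography-based motion-compensation stage can be approximated with OpenCV's ECC alignment (OpenCV >= 4.1 signature); note that ECC assumes roughly constant brightness, so strongly different exposures may need normalization first. A generic sketch, not the paper's network:

import cv2
import numpy as np

def align_exposures(under, over):
    """Estimate a homography between the two exposures with ECC and warp
    the overexposed image onto the underexposed one."""
    g1 = cv2.cvtColor(under, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(over, cv2.COLOR_BGR2GRAY)
    warp = np.eye(3, dtype=np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 100, 1e-6)
    _, warp = cv2.findTransformECC(g1, g2, warp, cv2.MOTION_HOMOGRAPHY,
                                   criteria, None, 5)
    h, w = under.shape[:2]
    return cv2.warpPerspective(over, warp, (w, h),
                               flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)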

Posted Content
TL;DR: A novel monocular camera-only holistic end-to-end trajectory planning network with a Bird-Eye-View (BEV) intermediate representation that comes in the form of binary Occupancy Grid Maps (OGMs) to ease the prediction of OGMs in BEV from camera images.
Abstract: Camera-based end-to-end driving neural networks bring the promise of a low-cost system that maps camera images to driving control commands. These networks are appealing because they replace laborious hand-engineered building blocks, but their black-box nature makes them difficult to delve into in case of failure. Recent works have shown the importance of using an explicit intermediate representation that has the benefits of increasing both the interpretability and the accuracy of networks' decisions. Nonetheless, these camera-based networks reason in camera view, where scale is not homogeneous and hence not directly suitable for motion forecasting. In this paper, we introduce a novel monocular camera-only holistic end-to-end trajectory planning network with a Bird-Eye-View (BEV) intermediate representation that comes in the form of binary Occupancy Grid Maps (OGMs). To ease the prediction of OGMs in BEV from camera images, we introduce a novel scheme where the OGMs are first predicted as semantic masks in camera view and then warped into BEV using the homography between the two planes. The key element allowing this transformation to be applied to 3D objects such as vehicles consists of predicting solely their footprint in camera view, hence respecting the flat-world hypothesis implied by the homography.
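
The camera-view-to-BEV step is a plain perspective warp once the homography between the image ground plane and the BEV grid is known; a sketch with an assumed pre-computed homography:

import cv2

def camera_mask_to_bev(footprint_mask, H_img_to_bev, bev_size=(256, 256)):
    """Warp a camera-view footprint mask into the BEV occupancy grid.

    Nearest-neighbor interpolation keeps the mask binary. Because a
    homography only relates two planes, only ground-contact (footprint)
    pixels transfer correctly, which is the flat-world assumption the
    abstract mentions.
    """
    return cv2.warpPerspective(footprint_mask, H_img_to_bev, bev_size,
                               flags=cv2.INTER_NEAREST)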

Journal ArticleDOI
TL;DR: A disparity consistency seam-cutting and blending method is presented to determine the optimal seam and conduct stereoscopic image stitching, which achieves competitive performance compared with other state-of-the-art methods.
Abstract: As a significant branch of virtual reality, stereoscopic image stitching aims to generate wide perspectives and natural-looking scenes. Existing 2D image stitching methods cannot be successfully applied to stereoscopic images without considering the disparity consistency of stereoscopic images. To address this issue, this paper presents a stereoscopic image stitching method based on disparity-constrained warping and blending, which could avoid visual distortion and preserve disparity consistency. First, a point-line-driven homography-based disparity minimization method is designed to pre-align the left and right images and reduce vertical disparity. Afterwards, a multi-constraint warping is proposed to further align the left and right images, where the initial disparity map is introduced to control the consistency of disparities. Finally, a disparity consistency seam-cutting and blending method is presented to determine the optimal seam and conduct stereoscopic image stitching. Experimental results demonstrate that the proposed method achieves competitive performance compared with other state-of-the-art methods.

Journal ArticleDOI
TL;DR: The primary idea is that blending two images in the deep-feature-domain is effective for synthesizing multi-exposure images that are structurally aligned to the reference, resulting in better-aligned images than the pixel-domain blending or geometric transformation methods.
Abstract: This paper presents a deep end-to-end network for high dynamic range (HDR) imaging of dynamic scenes with background and foreground motions. Generating an HDR image from a sequence of multi-exposure images is a challenging process when the images have misalignments from being taken in a dynamic situation. Hence, recent methods first align the multi-exposure images to the reference by using patch matching, optical flow, homography transformation, or an attention module before the merging. In this paper, we propose a deep network that synthesizes the aligned images as a result of blending the information from multi-exposure images, because explicitly aligning photos with different exposures is inherently a difficult problem. Specifically, the proposed network generates under/over-exposure images that are structurally aligned to the reference by blending all the information from the dynamic multi-exposure images. Our primary idea is that blending two images in the deep-feature domain is effective for synthesizing multi-exposure images that are structurally aligned to the reference, resulting in better-aligned images than pixel-domain blending or geometric transformation methods. Specifically, our alignment network consists of a two-way encoder for extracting features from two images separately, several convolution layers for blending deep features, and a decoder for constructing the aligned images. The proposed network is shown to generate the aligned images with a wide range of exposure differences very well and thus can be effectively used for the HDR imaging of dynamic scenes. Moreover, by adding a simple merging network after the alignment network and training the overall system end-to-end, we obtain a performance gain compared to the recent state-of-the-art methods.

Journal ArticleDOI
TL;DR: A two-step calibration based on soft-constraint optimization, motivated by the "no free lunch" theorem and error analysis, is presented; it outperforms Zhang's algorithm in terms of success ratio, accuracy, and precision.
Abstract: Camera calibration is a basic and crucial problem in photogrammetry and computer vision. Although existing calibration techniques exhibit excellent precision and flexibility in classical cases, most of them need from 5 to 10 calibration images. Unfortunately, only a limited number of calibration images and control points are available in many application fields such as criminal investigation, industrial robotics, and augmented reality. For these cases, this paper presents a two-step calibration based on soft-constraint optimization, motivated by the "no free lunch" theorem and error analysis. The key steps include (1) homography estimation with a weighting function, (2) initialization based on a simplified model, and (3) soft-constraint optimization in terms of reprojection error. The proposed method provides direct access to geometric information of the object from very few images. After extensive experiments, the results demonstrate that the proposed algorithm outperforms Zhang's algorithm in terms of success ratio, accuracy, and precision.
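
For context, the homography-estimation step in such calibration pipelines is classically the normalized DLT, into which a per-point weighting function can enter directly; the sketch below shows that mechanism with generic weights (the paper's specific weighting function is not reproduced here):

import numpy as np

def normalize(pts):
    """Similarity normalization: zero centroid, mean distance sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1.0]])
    return (pts - c) * s, T

def weighted_dlt_homography(src, dst, weights=None):
    """Estimate H (dst ~ H src) by weighted DLT on N >= 4 correspondences."""
    n = len(src)
    w = np.ones(n) if weights is None else np.asarray(weights, float)
    s, Ts = normalize(np.asarray(src, float))
    d, Td = normalize(np.asarray(dst, float))
    A = []
    for (x, y), (u, v), wi in zip(s, d, w):
        A.append(wi * np.array([-x, -y, -1, 0, 0, 0, u * x, u * y, u]))
        A.append(wi * np.array([0, 0, 0, -x, -y, -1, v * x, v * y, v]))
    _, _, Vt = np.linalg.svd(np.asarray(A))
    Hn = Vt[-1].reshape(3, 3)
    H = np.linalg.inv(Td) @ Hn @ Ts      # undo the normalization
    return H / H[2, 2]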

Book ChapterDOI
23 Aug 2020
TL;DR: This paper proposes a method for refining the local feature geometries by symmetric intensity-based matching, combines uncertainty propagation inside RANSAC with preemptive model verification, shows a general scheme for computing the uncertainty of minimal-solver results, and adapts the sample cheirality check for homography estimation.
Abstract: Local features, e.g. SIFT and its affine and learned variants, provide region-to-region rather than point-to-point correspondences. This has recently been exploited to create new minimal solvers for classical problems such as homography, essential and fundamental matrix estimation. The main advantage of such solvers is that their sample size is smaller, e.g., only two instead of four matches are required to estimate a homography. Works proposing such solvers often claim a significant improvement in run-time thanks to fewer RANSAC iterations. We show that this argument is not valid in practice if the solvers are used naively. To overcome this, we propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline. We propose a method for refining the local feature geometries by symmetric intensity-based matching, combine uncertainty propagation inside RANSAC with preemptive model verification, show a general scheme for computing the uncertainty of minimal solvers' results, and adapt the sample cheirality check for homography estimation. Our experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times when following our guidelines. We make code available at https://github.com/danini/affine-correspondences-for-camera-geometry.
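
The two-match claim follows because each affine correspondence yields six linear constraints on the 8-DoF homography: two from the point transfer and four from equating the 2x2 affine map with the Jacobian of the projective warp. A hedged NumPy sketch of that linear solver (Hartley-style normalization omitted for brevity):

import numpy as np

def homography_from_two_acs(acs):
    """Estimate H from >= 2 affine correspondences.

    acs: iterable of ((x, y), (u, v), A) with matched points and A the
    (2, 2) local affine map dx2/dx1. With u = (h1x+h2y+h3)/w,
    v = (h4x+h5y+h6)/w, w = h7x+h8y+h9, the point transfer and the four
    Jacobian entries (e.g. du/dx = (h1 - u*h7)/w = A[0,0]) are each
    homogeneous and linear in the entries of H.
    """
    M = []
    for (x, y), (u, v), A in acs:
        M.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        M.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
        a11, a12, a21, a22 = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
        M.append([1, 0, 0, 0, 0, 0, -(u + a11 * x), -a11 * y, -a11])
        M.append([0, 1, 0, 0, 0, 0, -a12 * x, -(u + a12 * y), -a12])
        M.append([0, 0, 0, 1, 0, 0, -(v + a21 * x), -a21 * y, -a21])
        M.append([0, 0, 0, 0, 1, 0, -a22 * x, -(v + a22 * y), -a22])
    _, _, Vt = np.linalg.svd(np.asarray(M, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]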

Journal ArticleDOI
TL;DR: An analytical expression of the optimal camera trajectory in projective homography space is proposed in this article; it is totally free of camera parameters and corresponds to the camera's shortest path in 3-D space, with a straight path in translation and a minimal geodesic in rotation.
Abstract: In order to improve the robustness and optimize the trajectory of projective homography-based uncalibrated visual servoing (PHUVS) proposed in our previous work, an analytical expression of the optimal camera trajectory in projective homography space is proposed in this article; it is totally free of camera parameters and corresponds to the camera's shortest path in 3-D space, with a straight path in translation and a minimal geodesic in rotation. The projective homography is computed without scale ambiguity in both the planning and tracking stages. The PHUVS controller is modified correspondingly to track the planned trajectory in projective homography space while maintaining the superior characteristics of PHUVS under the uncalibrated scenario. The simulation and experimental results reveal the effectiveness and necessity of the proposed trajectory optimization method in the presence of large initial errors. Note to Practitioners—State-of-the-art vision-guided robotic technology in industry usually requires system calibration, which is often costly, vulnerable, and challenging for ordinary workers. In our previous work, we offered an uncalibrated visual servo method based on projective homography, named projective homography-based uncalibrated visual servoing (PHUVS), which is suitable for plug-and-play application in eye-in-hand robot visual servo tasks. However, PHUVS suffers from some defects, including undesirable 3-D space motion and local convergence. In this article, we propose a trajectory planning method along with a modified PHUVS controller to address these disadvantages. This planning method is also calibration-free. With pure image information, a straight-line path in translational motion along with a minimal geodesic in rotary motion can be achieved. This approach is capable of extending the range of applications of uncalibrated visual servo technology in robotic tasks such as assembling, painting, and robotic machining.

Journal ArticleDOI
10 Feb 2020
TL;DR: An image-based approach that enables an aerial robot with a cable-suspended mechanism to find a payload with unknown mass, pick it up, transport it, and then put it down at a new location is presented.
Abstract: This letter presents an image-based approach that enables an aerial robot with a cable-suspended mechanism to find a payload with unknown mass, pick it up, transport it, and then put it down at a new location. The pick-and-place task is performed with minimum cable-swing angle. A new process is presented for state estimation, trajectory planning, and dynamic control using a single onboard camera and inertial measurement unit (IMU). Specifically, a new nonlinear observer, which does not require localization information from external sensors (such as a motion capture system), is created for controlling the robot's velocity. The outer position-control loop is designed in terms of the invariant image feature which is decoupled from the robot's attitude and is proved to be the flat output of the system. The combined estimation and control scheme is shown to be asymptotically stable in the Lyapunov sense, where damping due to air contributes to stabilizing the rates of the swing angle. The optimal trajectory of the Euclidean homography is efficiently generated to obtain dynamically-feasible trajectories for the image features. A least-square identification technique is employed to estimate the mass of each payload, and the input shaping technique is implemented to reduce payload swing motion. Finally, a practical package-delivery task is performed to validate the process, where it is shown that the robot can effectively pick up and deliver payloads on command with minimum cable-swing angle.

Posted Content
TL;DR: This paper proposes a deep network for exposure fusion that only takes two images, an underexposed image and an overexposed one, and integrates together homography estimation for compensating camera motion, attention mechanism for correcting remaining misalignment and moving pixels, and adversarial learning for alleviating other remaining artifacts.
Abstract: Modern cameras have limited dynamic ranges and often produce images with saturated or dark regions using a single exposure. Although the problem could be addressed by taking multiple images with different exposures, exposure fusion methods need to deal with ghosting artifacts and detail loss caused by camera motion or moving objects. This paper proposes a deep network for exposure fusion. For reducing the potential ghosting problem, our network only takes two images, an underexposed image and an overexposed one. Our network integrates together homography estimation for compensating camera motion, attention mechanism for correcting remaining misalignment and moving pixels, and adversarial learning for alleviating other remaining artifacts. Experiments on real-world photos taken using handheld mobile phones show that the proposed method can generate high-quality images with faithful detail and vivid color rendition in both dark and bright areas.

Book ChapterDOI
01 Jan 2020
TL;DR: An analytic image stabilization approach is described in this chapter where pixel information from the focal plane of the camera is stabilized and georegistered in a global reference frame to derive a direct closed-form analytic expression from 3D camera poses that is robust even in the presence of significant scene parallax.
Abstract: Aerial video captured from an airborne platform has an expanding range of applications including scene understanding, photogrammetry, surveying and mapping, traffic monitoring, bridge and civil infrastructure inspection, architecture and construction, delivery, disaster and emergency response, news and film, precision agriculture, and environmental monitoring and conservation. Some of the challenges in analyzing aerial video to track pedestrians, vehicles, and objects include small object size, relative motion of the object and platform, sensor jitter, and quality of imaging optics. An analytic image stabilization approach is described in this chapter where pixel information from the focal plane of the camera is stabilized and georegistered in a global reference frame. The aerial video is stabilized to maintain a fixed relative displacement between the moving platform and the scene. The proposed algorithm can be used to stabilize aerial imagery even when the available GPS and IMU measurements from the platform and sensor are inaccurate and noisy. Camera 3D poses are optimized using a homography-based robust cost function, but unlike most existing methods, the homography transformations are estimated without using any image-to-image estimation techniques. We derive a direct closed-form analytic expression from 3D camera poses that is robust even in the presence of significant scene parallax (i.e., very tall 3D buildings and man-made or natural structures). A robust non-linear least squares cost function is used to deal with outliers and speeds up computation by avoiding the use of RANdom SAmple Consensus (RANSAC). The proposed method and its efficiency are validated using several datasets and scenarios, including DARPA Video and Image Retrieval and Analysis Tool (VIRAT) and high-resolution wide area motion imagery (WAMI).
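
The closed-form expression referred to is, in its standard form, the plane-induced homography between two calibrated views; a sketch of that formula (sign conventions for the plane vary between texts):

import numpy as np

def homography_from_poses(K1, K2, R, t, n, d):
    """Plane-induced homography between two views with known geometry.

    R, t: rotation and translation taking camera-1 coordinates to camera 2.
    n: unit normal of the scene plane in camera-1 coordinates.
    d: distance from camera 1 to the plane (points satisfy n^T X = d).
    """
    H = K2 @ (R - np.outer(t, n) / d) @ np.linalg.inv(K1)
    return H / H[2, 2]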

Book ChapterDOI
23 Aug 2020
TL;DR: The authors derive a new differential homography that can account for the scanline-varying camera poses in Rolling Shutter (RS) cameras, and demonstrate its application to carry out RS-aware image stitching and rectification at one stroke.
Abstract: In this paper, we derive a new differential homography that can account for the scanline-varying camera poses in Rolling Shutter (RS) cameras, and demonstrate its application to carry out RS-aware image stitching and rectification at one stroke. Despite the high complexity of RS geometry, we focus in this paper on a special yet common input—two consecutive frames from a video stream, wherein the inter-frame motion is restricted from being arbitrarily large. This allows us to adopt a simpler differential motion model, leading to a straightforward and practical minimal solver. To deal with non-planar scenes and camera parallax in stitching, we further propose an RS-aware spatially-varying homography field in the principle of As-Projective-As-Possible (APAP). We show superior performance over state-of-the-art methods both in RS image stitching and rectification, especially for images captured by hand-held shaking cameras.

Patent
06 May 2020
TL;DR: In this paper, a neural network is trained by generating a plurality of points, determining a 3D trajectory, sampling the 3D trajectories to obtain camera poses viewing the points, projecting the points onto 2D planes, comparing a generated homography using the projected points to the ground-truth homography and modifying the neural network based on the comparison.
Abstract: Augmented reality devices and methods for computing a homography based on two images. One method may include receiving a first image based on a first camera pose and a second image based on a second camera pose, generating a first point cloud based on the first image and a second point cloud based on the second image, providing the first point cloud and the second point cloud to a neural network, and generating, by the neural network, the homography based on the first point cloud and the second point cloud. The neural network may be trained by generating a plurality of points, determining a 3D trajectory, sampling the 3D trajectory to obtain camera poses viewing the points, projecting the points onto 2D planes, comparing a generated homography using the projected points to the ground-truth homography and modifying the neural network based on the comparison.

Journal ArticleDOI
TL;DR: The experimental results show that the proposed video error concealment method gives significantly improved objective and subjective video quality compared with the traditional reconstruction methods and is feasible for practical real-time applications.
Abstract: This paper proposes a novel video error concealment method for the reconstruction of all unknown regions in frames. The proposed method first performs temporal error concealment (TEC) sequentially in the forward direction with the propagation of source information for the unknown regions of frames in a group-of-frames (GOF). Then, it performs the TEC sequentially in the backward direction for the remaining unknown regions in the GOF. For the reconstruction of unknown regions in each target frame in the forward and backward TEC stages, we propose an adaptive homography-based registration of the reference frame. The proposed method adaptively selects global or local homography-based registration for the reconstruction of each unknown region. After the backward-direction TEC is performed in each frame, spatial error concealment is applied to completely reconstruct the remaining unknown regions with relatively small sizes. The experimental results show that the proposed method gives significantly improved objective and subjective video quality compared with the traditional reconstruction methods. The proposed method also gives comparably good reconstruction results for large unknown regions and is feasible for practical real-time applications.

Journal ArticleDOI
TL;DR: FADE (feature aggregation for depth estimation) treats spatial and context information separately and focuses on aggregating features for efficient learning of the MVS problem, achieving state-of-the-art accuracy while requiring the fewest model parameters.
Abstract: Both structural and contextual information is essential and widely used in image analysis. However, current multi-view stereo (MVS) approaches usually use a single common pre-trained model as a pixel descriptor to extract features, which mixes structural and contextual information together and thus increases the difficulty of matching correspondences. In this paper, we propose FADE (feature aggregation for depth estimation), which treats spatial and context information separately and focuses on aggregating features for efficient learning of the MVS problem. Spatial information includes image details such as edges and corners, whereas context information comprises object features such as shapes and traits. To aggregate these multi-level features, we use an attention mechanism to select important features for matching. We then build a plane sweep volume by using a homography backward warping method to generate match candidates. Furthermore, we propose a novel cost volume regularization network that aims to minimize the noise in the matching candidates. Finally, we take advantage of a 3D stacked hourglass and regression to produce high-quality depth maps. With these well-aggregated features, FADE can efficiently perform dense depth reconstruction, achieving state-of-the-art accuracy while requiring the fewest model parameters.
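
Homography backward warping in a plane sweep maps each depth hypothesis to a warp of the source features; a hedged sketch with fronto-parallel planes and assumed calibration (real feature maps with many channels would be warped per channel or with grid sampling):

import cv2
import numpy as np

def plane_sweep_volume(src_feat, K_ref, K_src, R, t, depths):
    """Backward-warp source features into the reference view for each
    depth hypothesis, using the plane-induced homography with
    fronto-parallel planes n = (0, 0, 1) in the reference frame.
    R, t take reference-camera coordinates to the source camera."""
    n = np.array([0.0, 0.0, 1.0])
    h, w = src_feat.shape[:2]          # reference and source assumed same size
    volume = []
    for d in depths:
        H_ref_to_src = K_src @ (R - np.outer(t, n) / d) @ np.linalg.inv(K_ref)
        warped = cv2.warpPerspective(src_feat, H_ref_to_src, (w, h),
                                     flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
        volume.append(warped)
    return np.stack(volume, axis=0)    # (num_depths, h, w, channels)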

Posted Content
TL;DR: A new differential homography that can account for the scanline-varying camera poses in Rolling Shutter (RS) cameras is derived, and its application to carry out RS-aware image stitching and rectification at one stroke is demonstrated.
Abstract: In this paper, we derive a new differential homography that can account for the scanline-varying camera poses in Rolling Shutter (RS) cameras, and demonstrate its application to carry out RS-aware image stitching and rectification at one stroke. Despite the high complexity of RS geometry, we focus in this paper on a special yet common input -- two consecutive frames from a video stream, wherein the inter-frame motion is restricted from being arbitrarily large. This allows us to adopt simpler differential motion model, leading to a straightforward and practical minimal solver. To deal with non-planar scene and camera parallax in stitching, we further propose an RS-aware spatially-varying homography field in the principle of As-Projective-As-Possible (APAP). We show superior performance over state-of-the-art methods both in RS image stitching and rectification, especially for images captured by hand-held shaking cameras.

Journal ArticleDOI
TL;DR: A novel nonlinear observer on the Special Linear group SL(3), applied to homography estimation, is developed, with a formulation of observer innovation that directly exploits point and line correspondences as input, without requiring prior algebraic reconstruction of individual homographies.
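
SL(3) here is the group of 3x3 real matrices with unit determinant; since a homography is only defined up to scale, it can be normalized onto SL(3) before being fed to such an observer:

import numpy as np

def project_to_sl3(H):
    """Scale a homography so that det(H) = 1, fixing the arbitrary
    projective scale. The real cube root handles either sign of det."""
    return H / np.cbrt(np.linalg.det(H))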

Journal ArticleDOI
TL;DR: An original autonomous correction algorithm for a UAV navigation system is considered, based on comparison between terrain images obtained by the onboard machine vision system and vector topographic map images.
Abstract: The paper considers an original autonomous correction algorithm for a UAV navigation system based on comparison between terrain images obtained by the onboard machine vision system and vector topographic map images. The comparison is performed by calculating the homography between vision-system images segmented using a convolutional neural network and the vector map images. The presented results of mathematical and flight experiments confirm the algorithm's effectiveness for navigation applications.

Journal ArticleDOI
TL;DR: The results show that the ground plane can be successfully detected, if visible, regardless of camera orientation, ground plane size, and movement speed of the human, and has broad application in conditions where environments are dynamic and cluttered.
Abstract: Identifying the orientation and location of a camera placed arbitrarily in a room is a challenging problem. Existing approaches impose common assumptions (e.g. the ground plane is the largest plane in the scene, or the camera roll angle is zero). We present a method for estimating the ground plane and camera orientation in an unknown indoor environment given RGB-D data (colour and depth) from a camera with arbitrary orientation and location, assuming that at least one person can be seen moving smoothly within the camera field of view with their body perpendicular to the ground plane. From a set of RGB-D data trials captured using a Kinect sensor, we develop an approach to identify potential ground planes, cluster objects in the scenes and find 2D Scale-Invariant Feature Transform (SIFT) keypoints for those objects, and then build a motion sequence for each object by evaluating the intersection of each object's histogram in three dimensions across frames. After finding a reliable homography for all objects, we identify the moving human object by checking the change in the histogram intersection, the object dimensions, and the trajectory vector of the homography decomposition. We then estimate the ground plane from the potential planes using the normal vector of the homography decomposition, the trajectory vector, and the spatial relationship of the planes to the other objects in the scene. Our results show that the ground plane can be successfully detected, if visible, regardless of camera orientation, ground plane size, and movement speed of the human. We evaluated our approach on our own data and on three public datasets, robustly estimating the ground plane in all indoor scenarios. Our successful approach substantially reduces restrictions on prior knowledge of the ground plane, and has broad application in conditions where environments are dynamic and cluttered, as well as fields such as automated robotics, localization and mapping.

Proceedings ArticleDOI
01 May 2020
TL;DR: A novel initialization method for monocular SLAM based on planar features that fully exploits the plane information from multiple frames and avoids the ambiguities in homography decomposition is proposed.
Abstract: Initialization is essential to monocular Simultaneous Localization and Mapping (SLAM) problems. This paper focuses on a novel initialization method for monocular SLAM based on planar features. The algorithm starts by homography estimation in a sliding window. It then proceeds to a global plane optimization (GPO) to obtain camera poses and the plane normal. 3D points can be recovered using planar constraints without triangulation. The proposed method fully exploits the plane information from multiple frames and avoids the ambiguities in homography decomposition. We validate our algorithm on the collected chessboard dataset against baseline implementations and present extensive analysis. Experimental results show that our method outperforms the fine-tuned baselines in both accuracy and real-time performance.
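
"Recovered using planar constraints without triangulation" amounts to intersecting each pixel ray with the estimated plane; a minimal sketch:

import numpy as np

def backproject_to_plane(pixel, K, n, d):
    """Intersect the camera ray through `pixel` with the plane n^T X = d
    (camera frame). No second view is needed once the plane is known."""
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    scale = d / (n @ ray)        # lambda such that X = lambda * ray lies on the plane
    return scale * ray           # 3D point in the camera frame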

Proceedings ArticleDOI
24 Oct 2020
TL;DR: The approach provides a means to perform a dense 3D plane reconstruction from two RGB images only, without relying on RGB-D inputs or strong priors such as Manhattan assumptions, and can be extended to handle sequences of images.
Abstract: This paper proposes a novel method to simultaneously perform relative camera pose estimation and planar reconstruction of a scene from two RGB images. We start by extracting and matching superpixel information from both images and rely on a novel multi-model RANSAC approach to estimate multiple homographies from superpixels and identify matching planes. Ambiguity issues when performing homography decomposition are handled by proposing a voting system to more reliably estimate relative camera pose and plane parameters. A non-linear optimization process is also proposed to perform bundle adjustment that exploits a joint representation of homographies and works both for image pairs and whole sequences of images (vSLAM). As a result, the approach provides a means to perform a dense 3D plane reconstruction from two RGB images only, without relying on RGB-D inputs or strong priors such as Manhattan assumptions, and can be extended to handle sequences of images. Our results compete with keypoint-based techniques such as ORB-SLAM while providing a dense representation, and are more precise than the direct and semi-direct pose estimation techniques used in LSD-SLAM or DPPTAM.
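
The paper's multi-model RANSAC over superpixels is more elaborate, but the underlying idea can be approximated by sequential RANSAC: fit a homography, peel off its inliers, and repeat. A generic sketch:

import cv2
import numpy as np

def sequential_homographies(src, dst, min_inliers=12, max_models=6):
    """Greedy multi-homography extraction from matched points, shape (N, 2)."""
    remaining = np.arange(len(src))
    models = []
    while len(remaining) >= min_inliers and len(models) < max_models:
        s = np.float32(src[remaining]).reshape(-1, 1, 2)
        d = np.float32(dst[remaining]).reshape(-1, 1, 2)
        H, mask = cv2.findHomography(s, d, cv2.RANSAC, 3.0)
        if H is None or mask.sum() < min_inliers:
            break
        inliers = mask.ravel().astype(bool)
        models.append((H, remaining[inliers]))
        remaining = remaining[~inliers]   # peel off this plane's support
    return models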