
Showing papers on "Homography (computer vision)" published in 2021


Proceedings ArticleDOI
01 Jun 2021
TL;DR: This paper proposes a novel implicit method to learn spatial correspondence among blurry frames in the feature space by building a correlation volume pyramid over all pixel pairs between neighboring frames.
Abstract: Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions. In order to utilize neighboring sharp patches, typical methods rely mainly on homography or optical flows to spatially align neighboring blurry frames. However, such explicit approaches are less effective in the presence of fast motions with large pixel displacements. In this work, we propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space. To construct distant pixel correspondences, our model builds a correlation volume pyramid among all the pixel-pairs between neighboring frames. To enhance the features of the reference frame, we design a correlative aggregation module that maximizes the pixel-pair correlations with its neighbors based on the volume pyramid. Finally, we feed the aggregated features into a reconstruction module to obtain the restored frame. We design a generative adversarial paradigm to optimize the model progressively. Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) Dataset for Video Deblurring (HFR-DVD). Quantitative and qualitative experiments show that our model performs favorably on both datasets against previous state-of-the-art methods, confirming the benefit of modeling all-range spatial correspondence for video deblurring.

59 citations
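The correlation volume pyramid mentioned in the video deblurring abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the scaling by the square root of the channel count, and the choice of 2x average pooling for the coarser levels are all assumptions made for illustration.

```python
import numpy as np

def correlation_volume(f_ref, f_nbr):
    """All-pairs correlation between two feature maps of shape (C, H, W).

    Returns an array of shape (H, W, H, W) where entry [i, j, u, v] is the
    scaled dot product of reference feature (i, j) and neighbor feature (u, v).
    """
    c, h, w = f_ref.shape
    a = f_ref.reshape(c, h * w)
    b = f_nbr.reshape(c, h * w)
    corr = a.T @ b / np.sqrt(c)  # (H*W, H*W); the scaling is an assumption
    return corr.reshape(h, w, h, w)

def build_pyramid(corr, levels=3):
    """Average-pool the two neighbor-frame axes by a factor of 2 per level."""
    pyramid = [corr]
    for _ in range(levels - 1):
        h, w, u, v = pyramid[-1].shape
        u2, v2 = u // 2, v // 2
        cropped = pyramid[-1][:, :, : u2 * 2, : v2 * 2]
        pooled = cropped.reshape(h, w, u2, 2, v2, 2).mean(axis=(3, 5))
        pyramid.append(pooled)
    return pyramid
```

Each coarser level summarizes correlations over a larger neighborhood of the neighboring frame, which is one way to cover large pixel displacements at modest cost.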


Journal ArticleDOI
TL;DR: This paper proposes an unsupervised deep image stitching framework with two stages: unsupervised coarse image alignment, which uses an ablation-based loss and a transformer layer to warp the input images in the stitching-domain space, and unsupervised image reconstruction.
Abstract: Traditional feature-based image stitching technologies rely heavily on feature detection quality, often failing to stitch images with few features or low resolution. The learning-based image stitching solutions are rarely studied due to the lack of labeled data, making the supervised methods unreliable. To address the above limitations, we propose an unsupervised deep image stitching framework consisting of two stages: unsupervised coarse image alignment and unsupervised image reconstruction. In the first stage, we design an ablation-based loss to constrain an unsupervised homography network, which is more suitable for large-baseline scenes. Moreover, a transformer layer is introduced to warp the input images in the stitching-domain space. In the second stage, motivated by the insight that the misalignments in pixel-level can be eliminated to a certain extent in feature-level, we design an unsupervised image reconstruction network to eliminate the artifacts from features to pixels. Specifically, the reconstruction network can be implemented by a low-resolution deformation branch and a high-resolution refined branch, learning the deformation rules of image stitching and enhancing the resolution simultaneously. To establish an evaluation benchmark and train the learning framework, a comprehensive real-world image dataset for unsupervised deep image stitching is presented and released. Extensive experiments well demonstrate the superiority of our method over other state-of-the-art solutions. Even compared with the supervised solutions, our image stitching quality is still preferred by users.

48 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: Patch2Pix estimates correspondences in a detect-to-refine manner: it first predicts patch-level match proposals and then refines them.
Abstract: The classical matching pipeline used for visual localization typically involves three steps: (i) local feature detection and description, (ii) feature matching, and (iii) outlier rejection. Recently emerged correspondence networks propose to perform those steps inside a single network but suffer from low matching resolution due to the memory bottleneck. In this work, we propose a new perspective to estimate correspondences in a detect-to-refine manner, where we first predict patch-level match proposals and then refine them. We present Patch2Pix, a novel refinement network that refines match proposals by regressing pixel-level matches from the local regions defined by those proposals and jointly rejecting outlier matches with confidence scores. Patch2Pix is weakly supervised to learn correspondences that are consistent with the epipolar geometry of an input image pair. We show that our refinement network significantly improves the performance of correspondence networks on image matching, homography estimation, and localization tasks. In addition, we show that our learned refinement generalizes to fully-supervised methods without retraining, which leads us to state-of-the-art localization performance. The code is available at https://github.com/GrumpyZhou/patch2pix.

43 citations


Journal ArticleDOI
TL;DR: A homography-based method that combines an unmanned aerial vehicle (UAV) with digital image correlation (DIC) is proposed for vibration measurement of a bridge model, and its effectiveness is validated against the fixed-camera-based method.

43 citations


Proceedings ArticleDOI
20 Jun 2021
TL;DR: In this article, a projective invariant, Characteristic Number, is used to match co-planar local sub-regions for input images, which produces consistent line and point pairs, suppressing artifacts in overlapping areas.
Abstract: Generating high-quality stitched images with natural structures is a challenging task in computer vision. In this paper, we succeed in preserving both local and global geometric structures for wide parallax images, while reducing artifacts and distortions. A projective invariant, Characteristic Number, is used to match co-planar local sub-regions for input images. The homography between these well-matched sub-regions produces consistent line and point pairs, suppressing artifacts in overlapping areas. We explore and introduce global collinear structures into an objective function to specify and balance the desired characters for image warping, which can preserve both local and global structures while alleviating distortions. We also develop comprehensive measures for stitching quality to quantify the collinearity of points and the discrepancy of matched line pairs by considering the sensitivity to linear structures for human vision. Extensive experiments demonstrate the superior performance of the proposed method over the state-of-the-art by presenting sharp textures and preserving prominent natural structures in stitched images. Especially, our method not only exhibits lower errors but also the least divergence across all test images. Code is available at https://github.com/dut-media-lab/Image-Stitching.

41 citations


Journal ArticleDOI
TL;DR: In this article, a vision-based approach using unmanned aerial vehicles (UAVs) mounted with high-resolution cameras is considered for measuring structural displacement, an important quantity for assessing the health of civil infrastructure.
Abstract: Structural displacement is an important quantity to assess the health of civil infrastructure. Vision‐based approaches using unmanned aerial vehicles (UAV) mounted with high‐resolution cam...

41 citations


Journal ArticleDOI
TL;DR: This paper proposes an unsupervised deep image stitching framework with two stages: unsupervised coarse image alignment, which uses an ablation-based loss and a transformer layer to warp the input images in the stitching-domain space, and unsupervised image reconstruction.
Abstract: Traditional feature-based image stitching technologies rely heavily on feature detection quality, often failing to stitch images with few features or low resolution. The learning-based image stitching solutions are rarely studied due to the lack of labeled data, making the supervised methods unreliable. To address the above limitations, we propose an unsupervised deep image stitching framework consisting of two stages: unsupervised coarse image alignment and unsupervised image reconstruction. In the first stage, we design an ablation-based loss to constrain an unsupervised homography network, which is more suitable for large-baseline scenes. Moreover, a transformer layer is introduced to warp the input images in the stitching-domain space. In the second stage, motivated by the insight that the misalignments in pixel-level can be eliminated to a certain extent in feature-level, we design an unsupervised image reconstruction network to eliminate the artifacts from features to pixels. Specifically, the reconstruction network can be implemented by a low-resolution deformation branch and a high-resolution refined branch, learning the deformation rules of image stitching and enhancing the resolution simultaneously. To establish an evaluation benchmark and train the learning framework, a comprehensive real-world image dataset for unsupervised deep image stitching is presented and released. Extensive experiments well demonstrate the superiority of our method over other state-of-the-art solutions. Even compared with the supervised solutions, our image stitching quality is still preferred by users.

32 citations


Journal ArticleDOI
TL;DR: A novel application of the image-to-world homography which gives the monocular vision system the efficacy of counting vehicles by lane and estimating vehicle length and speed in real-world units.
Abstract: Cameras have been widely used in traffic operations. While many technologically smart camera solutions in the market can be integrated into Intelligent Transport Systems (ITS) for automated detection, monitoring and data generation, many Network Operations (a.k.a Traffic Control) Centres still use legacy camera systems as manual surveillance devices. In this paper, we demonstrate effective use of these older assets by applying computer vision techniques to extract traffic data from videos captured by legacy cameras. In our proposed vision-based pipeline, we adopt recent state-of-the-art object detectors and transfer-learning to detect vehicles, pedestrians, and cyclists from monocular videos. By weakly calibrating the camera, we demonstrate a novel application of the image-to-world homography which gives our monocular vision system the efficacy of counting vehicles by lane and estimating vehicle length and speed in real-world units. Our pipeline also includes a module which combines a convolutional neural network (CNN) classifier with projective geometry information to classify vehicles. We have tested it on videos captured at several sites with different traffic flow conditions and compared the results with the data collected by piezoelectric sensors. Our experimental results show that the proposed pipeline can process 60 frames per second for pre-recorded videos and yield high-quality metadata for further traffic analysis.

30 citations
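The image-to-world homography used in the traffic monitoring abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: `H` is taken to be an already calibrated 3x3 matrix mapping pixels to road-plane coordinates in metres, and the function names `image_to_world` and `speed_kmh` are hypothetical, not taken from the paper.

```python
import numpy as np

def image_to_world(H, pts_px):
    """Map (N, 2) pixel coordinates to road-plane coordinates via homography H."""
    pts = np.asarray(pts_px, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 3) homogeneous points
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by the projective scale

def speed_kmh(H, p_start_px, p_end_px, n_frames, fps):
    """Speed of a tracked ground-contact point observed n_frames apart."""
    a, b = image_to_world(H, [p_start_px, p_end_px])
    metres = np.linalg.norm(b - a)
    seconds = n_frames / fps
    return metres / seconds * 3.6

# Toy calibration (assumed): every pixel spans 5 cm on the road plane.
H_toy = np.diag([0.05, 0.05, 1.0])
print(speed_kmh(H_toy, (100, 200), (100, 400), n_frames=25, fps=25))  # 36.0
```

The mapping is only valid for points that lie on the road plane, so in practice the tracked point would be a ground-contact point of the vehicle rather than, say, its roof.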


Proceedings ArticleDOI
22 Apr 2021
TL;DR: The deep Lucas-Kanade feature map (DLKFM) is proposed to align multimodal image pairs by extending the traditional Lucas-Kanade algorithm with networks; the learned feature map can spontaneously recognize invariant features under various appearance-changing conditions.
Abstract: Estimating homography to align image pairs captured by different sensors or image pairs with large appearance changes is an important and general challenge for many computer vision applications. In contrast to others, we propose a generic solution to pixel-wise align multimodal image pairs by extending the traditional Lucas-Kanade algorithm with networks. The key contribution in our method is how we construct feature maps, named the deep Lucas-Kanade feature map (DLKFM). The learned DLKFM can spontaneously recognize invariant features under various appearance-changing conditions. It also has two nice properties for the Lucas-Kanade algorithm: (1) The template feature map keeps brightness consistency with the input feature map, thus the color difference is very small while they are well-aligned. (2) The Lucas-Kanade objective function built on DLKFM has a smooth landscape around ground truth homography parameters, so the iterative solution of the Lucas-Kanade can easily converge to the ground truth. With those properties, directly updating the Lucas-Kanade algorithm on our feature maps will precisely align image pairs with large appearance changes. We share the datasets, code, and demo video online.

27 citations
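For orientation, the Lucas-Kanade objective referred to in the DLKFM abstract above is commonly written in the following form. The symbols are assumptions for illustration: $F_T$ and $F_I$ denote the template and input feature maps, and $W(\mathbf{x};\mathbf{p})$ is the homography warp with parameters $\mathbf{p}$; the paper's exact formulation may differ.

```latex
\min_{\mathbf{p}} \; \sum_{\mathbf{x}}
  \left\| F_I\big(W(\mathbf{x};\mathbf{p})\big) - F_T(\mathbf{x}) \right\|_2^2
```

The two properties listed in the abstract correspond to a small residual at the correct alignment and a smooth landscape of this sum around the ground-truth $\mathbf{p}$.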


Journal ArticleDOI
Qiang Zhao1, Yike Ma1, Chen Zhu1, Chunfeng Yao2, Bailan Feng2, Feng Dai1 
TL;DR: In this article, a deep neural network that estimates homography accurately enough for image stitching of images with small parallax is presented, where the key components of the network are feature maps with progressively increased resolution and matching cost volumes constructed in a hybrid manner.

22 citations


Proceedings ArticleDOI
20 Apr 2021
TL;DR: In this article, a bidirectional implicit homography estimation (biHomE) loss is proposed to minimize the distance in the feature space between warped images from the source viewpoint and the corresponding image from the target viewpoint.
Abstract: Homography estimation is often an indispensable step in many computer vision tasks. The existing approaches, however, are not robust to illumination and/or larger viewpoint changes. In this paper, we propose bidirectional implicit Homography Estimation (biHomE) loss for unsupervised homography estimation. biHomE minimizes the distance in the feature space between the warped image from the source viewpoint and the corresponding image from the target viewpoint. Since we use a fixed pre-trained feature extractor and the only learnable component of our framework is the homography network, we effectively decouple the homography estimation from representation learning. We use an additional photometric distortion step in the synthetic COCO dataset generation to better represent the illumination variation of the real-world scenarios. We show that biHomE achieves state-of-the-art performance on the synthetic COCO dataset, which is also comparable or better compared to supervised approaches. Furthermore, the empirical results demonstrate the robustness of our approach to illumination variation compared to existing methods.

Posted Content
TL;DR: 3D vehicle detection is conducted by estimating rotated bounding boxes (r-boxes) in bird's eye view (BEV) images generated from inverse perspective mapping; a new regression target called tailed r-box and a dual-view network architecture boost the detection accuracy on warped BEV images.
Abstract: This paper proposes a method to extract the position and pose of vehicles in the 3D world from a single traffic camera. Most previous monocular 3D vehicle detection algorithms focused on cameras on vehicles from the perspective of a driver, and assumed known intrinsic and extrinsic calibration. On the contrary, this paper focuses on the same task using uncalibrated monocular traffic cameras. We observe that the homography between the road plane and the image plane is essential to 3D vehicle detection and the data synthesis for this task, and the homography can be estimated without the camera intrinsics and extrinsics. We conduct 3D vehicle detection by estimating the rotated bounding boxes (r-boxes) in the bird's eye view (BEV) images generated from inverse perspective mapping. We propose a new regression target called tailed r-box and a dual-view network architecture which boosts the detection accuracy on warped BEV images. Experiments show that the proposed method can generalize to new camera and environment setups despite not seeing images from them during training.

Journal ArticleDOI
TL;DR: The accuracy and efficiency of HTSM are the highest among the three methods if the object has a planar surface, and the efficiency of SRSM is higher than that of CSM.

Proceedings Article
01 Jan 2021
TL;DR: A homography flow representation is proposed, which can be estimated by a weighted sum of 8 pre-defined homography flow bases, together with a Low Rank Representation (LRR) block that reduces the feature rank so that features corresponding to the dominant motions are retained while others are rejected.
Abstract: In this paper, we introduce a new framework for unsupervised deep homography estimation. Our contributions are threefold. First, unlike previous methods that regress 4 offsets for a homography, we propose a homography flow representation, which can be estimated by a weighted sum of 8 pre-defined homography flow bases. Second, considering that a homography contains 8 degrees of freedom (DOFs), far fewer than the rank of the network features, we propose a Low Rank Representation (LRR) block that reduces the feature rank, so that features corresponding to the dominant motions are retained while others are rejected. Last, we propose a Feature Identity Loss (FIL) to enforce that the learned image features are warp-equivariant, meaning that the result should be identical if the order of warp operation and feature extraction is swapped. With this constraint, the unsupervised optimization is achieved more effectively and more stable features are learned. Extensive experiments are conducted to demonstrate the effectiveness of all the newly proposed components, and results show that our approach outperforms the state-of-the-art on the homography benchmark datasets both qualitatively and quantitatively. Code is available at this https URL.
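The warp-equivariance constraint behind the Feature Identity Loss can be written compactly. The notation here is an assumption for illustration: $F$ is the feature extractor, $\mathcal{W}_H$ is warping by the homography $H$, $I$ is the input image, and the choice of the $\ell_1$ norm is a guess, since the abstract does not state which distance is used.

```latex
\mathcal{L}_{\mathrm{FIL}}
  = \left\| \mathcal{W}_H\big(F(I)\big) - F\big(\mathcal{W}_H(I)\big) \right\|_1
```

The loss is zero exactly when warping and feature extraction commute, which is the property the abstract describes.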

Journal ArticleDOI
TL;DR: This paper proposes a CNN-based moiré removal method for recaptured screen images, using a convolutional neural network with Additive and Multiplicative modules (termed AMNet) to transfer the low-light moiré image to the bright moiré-free image.
Abstract: In many situations, such as transferring data between devices and recording precious moments, we would like to capture the contents on screens using digital cameras for convenience. These recaptured screen images and videos suffer from a special type of degradation called “moire pattern”, which is caused by the aliasing between the grid of display screen and the array of camera sensor. However, few works are proposed to tackle this problem. Considering the great success of convolutional neural networks (CNNs) in image restoration, we propose a CNN-based moire removal method for recaptured screen images. There are mainly two contributions in this paper. First, for the generation of training data, we propose an image registration algorithm via global homography transform and local patch matching to compensate the significant viewpoint disparity between the recaptured screen image and the moire-free image obtained via screenshot. We construct a moire removal and brightness improvement (MRBI) database with aligned moire-free and moire images. Second, we propose a convolutional neural Network with Additive and Multiplicative modules (termed as AMNet) to transfer the low light moire image to the bright moire-free image. The proposed network is trained with pixel-wise loss, perceptual loss, and adversarial loss. Extensive experiments on 340 test images demonstrate that the proposed method outperforms state-of-the-art moire removal methods.

Journal ArticleDOI
TL;DR: The 3D reconstruction method can effectively overcome the limitation of the homography-based method that the fixed reference points and the target points must be coplanar.
Abstract: This paper presents a measurement method of bridge vibration based on three-dimensional (3D) reconstruction. A video of bridge model vibration is recorded by an unmanned aerial vehicle (UAV), and the displacement of target points on the bridge model is tracked by the digital image correlation (DIC) method. Due to the UAV motion, the DIC-tracked displacement of the bridge model includes the absolute displacement caused by the excitation and the false displacement induced by the UAV motion. Therefore, the UAV motion must be corrected to measure the real displacement. Using four corner points on a fixed object plane as the reference points, the projection matrix for each frame of images can be estimated by the UAV camera calibration, and then the 3D world coordinates of the target points on the bridge model can be recovered. After that, the real displacement of the target points can be obtained. To verify the correctness of the results, the operational modal analysis (OMA) method is used to extract the natural frequencies of the bridge model. The results show that the first natural frequency obtained from the proposed method is consistent with the one obtained from the homography-based method. By further comparing with the homography-based correction method, it is found that the 3D reconstruction method can effectively overcome the limitation of the homography-based method that the fixed reference points and the target points must be coplanar.

Proceedings ArticleDOI
01 Jan 2021
TL;DR: In this paper, a grid of key-points distributed uniformly on the entire field instead of using only sparse local corners and line intersections is used to extend the keypoint coverage to the textureless parts of the field as well.
Abstract: We propose a novel framework to register sports-fields as they appear in broadcast sports videos. Unlike previous approaches, we particularly address the challenge of field-registration when: (a) there are not enough distinguishable features on the field, and (b) no prior knowledge is available about the camera. To this end, we detect a grid of keypoints distributed uniformly on the entire field instead of using only sparse local corners and line intersections, thereby extending the keypoint coverage to the texture-less parts of the field as well. To further improve the keypoint-based homography estimate, we differentiably warp and align it with a set of dense field-features defined as a normalized distance-map of pixels to their nearest lines and key-regions. We predict the keypoints and dense field-features simultaneously using a multi-task deep network to achieve computational efficiency. To have a comprehensive evaluation, we have compiled a new dataset called SportsFields which is collected from 192 video-clips from 5 different sports covering large environmental and camera variations. We empirically demonstrate that our algorithm not only achieves state of the art field-registration accuracy but also runs in real-time for HD resolution videos using commodity hardware.

Journal ArticleDOI
TL;DR: A mobile projective AR framework in which the AR device is detached from human workers and carried by one or more mobile collaborative robots (co-robots) is proposed, which achieves glassless AR that is visible to the naked eye using a camera-projector system to superimpose virtual 3D information onto planar or non-planar physical surfaces.


Journal ArticleDOI
TL;DR: In this paper, a method which detects the pollution levels of transport vehicles from the images of IP cameras by means of computer vision techniques and neural networks is proposed, and the trajectory of each vehicle is computed by applying convolutional neural networks for object detection and tracking algorithms.

Journal ArticleDOI
TL;DR: This work is the first to directly infer Manhattan-aligned outputs, and introduces the geodesic heatmaps and loss and a boundary-aware center of mass calculation that facilitate higher quality keypoint estimation in the spherical domain.

Journal ArticleDOI
TL;DR: In this article, the authors developed an automated system to estimate the size and count the number of steel rebars in bale packing using computer vision techniques based on a convolutional neural network (CNN).
Abstract: Conventionally, the number of steel rebars at construction sites is manually counted by workers. However, this practice gives rise to several problems: it is slow, human-resource-intensive, time-consuming, error-prone, and not very accurate. Consequently, a new method of quickly and accurately counting steel rebars with a minimal number of workers needs to be developed to enhance work efficiency and reduce labor costs at construction sites. In this study, the authors developed an automated system to estimate the size and count the number of steel rebars in bale packing using computer vision techniques based on a convolutional neural network (CNN). A dataset containing 622 images of rebars with a total of 186,522 rebar cross sections and 409 poly tags was established for segmentation rebars and poly tags in images. The images were collected in a full HD resolution of 1920 × 1080 pixels and then center-cropped to 512 × 512 pixels. Moreover, data augmentation was carried out to create 4668 images for the training dataset. Based on the training dataset, YOLACT-based steel bar size estimation and a counting model with a Box and Mask of over 30 mAP was generated to satisfy the aim of this study. The proposed method, which is a CNN model combined with homography, can estimate the size and count the number of steel rebars in an image quickly and accurately, and the developed method can be applied to real construction sites to efficiently manage the stock of steel rebars.

Journal ArticleDOI
TL;DR: The proposed RSF map can be applied to semi-open scenarios in practice to provide a reliable basis for IV localization; experiments show that the mean error of the nodes between the created and actual maps was 2.7 cm.
Abstract: In order to pursue high-accuracy localization for intelligent vehicles (IVs) in semi-open scenarios, this study proposes a new map creation method based on a multi-sensor fusion technique. In this new method, the road scenario fingerprint (RSF) was employed to fuse the visual features, three-dimensional (3D) data and trajectories in the multi-view and multi-sensor information fusion process. The visual features were collected in the front and downward views of the IVs; the 3D data were collected by the laser scanner and the downward camera and a homography method was proposed to reconstruct the monocular 3D data; the trajectories were computed from the 3D data in the downward view. Moreover, a new plane-corresponding calibration strategy was developed to ensure the fusion quality of sensory measurements of the camera and laser. In order to evaluate the proposed method, experimental tests were carried out in a 5 km semi-open ring route. A series of nodes were found to construct the RSF map. The experimental results demonstrate that the mean error of the nodes between the created and actual maps was 2.7 cm, the standard deviation of the nodes was 2.1 cm and the max error was 11.8 cm. The localization error of the IV was 10.8 cm. Hence, the proposed RSF map can be applied to semi-open scenarios in practice to provide a reliable basis for IV localization.

Proceedings ArticleDOI
15 Jun 2021
TL;DR: In this article, the visual tracking of an evading UAV using a pursuer-UAV is examined, which combines principles of deep learning, optical flow, intra-frame homography and correlation based tracking.
Abstract: In this work the visual tracking of an evading UAV using a pursuer-UAV is examined. The developed method combines principles of deep learning, optical flow, intra-frame homography and correlation based tracking. A Yolo tracker for short term tracking is employed, complemented by optical flow and homography techniques. In case there is no detected evader-UAV, the MOSSE tracking algorithm re-initializes the search and the PTZ-camera zooms out to cover a wider Field of View. The camera's controller adjusts the pan and tilt angles so that the evader-UAV is as close to the center of view as possible, while its zoom is commanded so that the evader-UAV bounding box covers as much of the captured frame as possible. Experimental studies are offered to highlight the algorithm's principle and evaluate its performance.

Journal ArticleDOI
Chong Wei1, Shurong Li1, Kai Wu2, Zhang Zijian1, Ying Wang1 
TL;DR: The experimental results confirm that the proposed computer vision-based damage inspection system for road markings is effective in automating the inspection of road markings and producing objective damage assessments that should significantly assist road managers in prioritizing maintenance operations.

Proceedings ArticleDOI
01 Jan 2021
TL;DR: In this article, a novel monocular camera-only holistic end-to-end trajectory planning network with a Bird-Eye-View (BEV) intermediate representation that comes in the form of binary Occupancy Grid Maps (OGMs) is proposed.
Abstract: Camera-based end-to-end driving neural networks bring the promise of a low-cost system that maps camera images to driving control commands. These networks are appealing because they replace laborious hand engineered building blocks but their black-box nature makes them difficult to delve in case of failure. Recent works have shown the importance of using an explicit intermediate representation that has the benefits of increasing both the interpretability and the accuracy of networks’ decisions. Nonetheless, these camera-based networks reason in camera view where scale is not homogeneous and hence not directly suitable for motion forecasting. In this paper, we introduce a novel monocular camera-only holistic end-to-end trajectory planning network with a Bird-Eye-View (BEV) intermediate representation that comes in the form of binary Occupancy Grid Maps (OGMs). To ease the prediction of OGMs in BEV from camera images, we introduce a novel scheme where the OGMs are first predicted as semantic masks in camera view and then warped in BEV using the homography between the two planes. The key element allowing this transformation to be applied to 3D objects such as vehicles, consists in predicting solely their footprint in camera-view, hence respecting the flat world hypothesis implied by the homography.
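Warping a camera-view mask into BEV with a plane-to-plane homography, as described above, can be sketched with inverse mapping and nearest-neighbor sampling. This is a minimal sketch, not the paper's pipeline: the function name `warp_mask_to_bev` is hypothetical and `H_cam_to_bev` is assumed to be a known 3x3 matrix mapping camera pixels to BEV grid cells.

```python
import numpy as np

def warp_mask_to_bev(mask, H_cam_to_bev, bev_shape):
    """Inverse-warp a binary camera-view mask onto a BEV grid.

    For every BEV cell, the inverse homography gives the camera pixel that
    lands there; the mask value is copied if that pixel is inside the image.
    """
    bev_h, bev_w = bev_shape
    ys, xs = np.mgrid[0:bev_h, 0:bev_w]
    bev_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(bev_h * bev_w)])  # (3, N)
    cam = np.linalg.inv(H_cam_to_bev) @ bev_pts
    w = cam[2]
    ok = np.abs(w) > 1e-12  # skip cells that map to points at infinity
    u = np.full(w.shape, -1.0)
    v = np.full(w.shape, -1.0)
    u[ok] = cam[0, ok] / w[ok]
    v[ok] = cam[1, ok] / w[ok]
    ui = np.rint(u).astype(int)
    vi = np.rint(v).astype(int)
    inside = ok & (ui >= 0) & (ui < mask.shape[1]) & (vi >= 0) & (vi < mask.shape[0])
    bev = np.zeros(bev_h * bev_w, dtype=mask.dtype)
    bev[inside] = mask[vi[inside], ui[inside]]
    return bev.reshape(bev_h, bev_w)
```

Because the homography only relates points on the ground plane, the mask should contain object footprints rather than full silhouettes, which matches the flat world hypothesis noted in the abstract.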

Journal ArticleDOI
TL;DR: A drone image stitching method based on mesh-guided deformation and a ground constraint is introduced, which can closely match the characteristics of the images, achieve precise registration, and produce a good stitching result.
Abstract: This article introduces a drone image stitching method based on mesh-guided deformation and a ground constraint, which can closely match the characteristics of the images, achieve precise registration, and produce a good stitching result. Traditional methods use the homography model to align the images, which causes artifacts when stitching images with parallax. To overcome this, the image is divided into meshes and the mesh vertices of the target image are used to guide the warping. A new energy function is designed to represent the deformation characteristics of the image. We propose a new alignment term using local homography and a local scale term using the edge information of the mesh. The established mesh-guided deformation model can overcome image parallax caused by some external factors and eliminate ghosting in the result. Moreover, the imaged scene is not effectively planar and some fluctuations exist in the scene, which will distort the stitching result. We propose a ground constraint with the ground plane as the main plane to reduce projection distortions in non-overlapping areas between images. Finally, a method of creating ground truth is proposed, which can evaluate the naturalness of results and make comparison more reasonable. Several sets of challenging drone images are tested, and the experimental results show that our stitching system has good results.

Journal ArticleDOI
TL;DR: This work proposes deep neural network models that can locate four corner plate positions, which can then be used to perform the perspective transformation that can be used to rectify plates.
Abstract: Skewness and obliqueness of vehicle plate images influence license plate recognition. The more tilted plate images are, the harder the recognition task is. To this end, if plate images are preprocessed to be aligned and rectified, the recognition performance would improve. We propose deep neural network models that can locate four corner plate positions, which can then be used to perform the perspective transformation that can be used to rectify plates. Such a transformation is called homography. The models consist of two sequential parts: a feature extraction part having convolution and a regression part with fully connected layers. The models are open in the sense that the feature extraction part can host other well-known models such as Mobilenet as long as they have the feature capture capability. We devise a loss function as the sum of Euclidean distance between predicted coordinates and ground truth and discuss image augmentation schemes. The experiment results show that the models with well-known object detection models are able to predict corner positions with relatively high precision.
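Once four plate corners are predicted, the rectifying homography can be recovered with the direct linear transform (DLT). The sketch below is a generic illustration, not the paper's code: the function names are hypothetical, and `corner_loss` is one reading of the "sum of Euclidean distance" loss described in the abstract.

```python
import numpy as np

def homography_from_corners(src, dst):
    """DLT: find H such that dst ~ H @ src for four (x, y) point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)  # null-space vector of the constraint matrix
    return H / H[2, 2]        # assumes H[2, 2] is not (near) zero

def corner_loss(pred, gt):
    """Sum of Euclidean distances between predicted and true corners."""
    diff = np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float)
    return np.linalg.norm(diff, axis=1).sum()

# Usage: map a tilted plate (corner values assumed) to a 200 x 50 rectangle.
src = [(120, 80), (310, 95), (300, 170), (115, 150)]
dst = [(0, 0), (200, 0), (200, 50), (0, 50)]
H = homography_from_corners(src, dst)
```

Four corners give eight equations, which is exactly enough to fix the eight degrees of freedom of a homography, provided no three corners are collinear.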

Journal ArticleDOI
Chunhui Zhao1, Bin Fan1, Jinwen Hu1, Quan Pan1, Zhao Xu1 
TL;DR: Zhao et al. statistically optimize the solution for the homography-based relative pose estimation problem; assuming a known gravity direction and a dominant ground plane, the homography representation in the normalized image plane enables a least squares pose estimation between two views.
Abstract: Relative pose estimation has become a fundamental and important problem in visual simultaneous localization and mapping. This paper statistically optimizes the solution for the homography-based relative pose estimation problem. Assuming a known gravity direction and a dominant ground plane, the homography representation in the normalized image plane enables a least squares pose estimation between two views. Furthermore, an iterative estimation method of the camera trajectory is developed for visual odometry. The accuracy and robustness of the proposed algorithm are experimentally tested on synthetic and real data in indoor and outdoor environments. Various metrics confirm the effectiveness of the proposed method in practical applications.
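As background for the homography-based pose formulation above, the homography induced by a plane between two calibrated views is commonly written as follows. The convention is an assumption for illustration: points transform as $\mathbf{X}_2 = R\,\mathbf{X}_1 + \mathbf{t}$, the plane satisfies $\mathbf{n}^\top \mathbf{X}_1 = d$ in the first view, and $\mathbf{x}_1, \mathbf{x}_2$ are normalized image coordinates. Other sign conventions exist, and the paper's parametrization may differ.

```latex
\mathbf{x}_2 \simeq H\,\mathbf{x}_1,
\qquad
H = R + \frac{1}{d}\,\mathbf{t}\,\mathbf{n}^{\top}
```

With a known gravity direction, both views can be aligned so that $\mathbf{n}$ is fixed and $R$ reduces to a single rotation about the gravity axis, which leaves the yaw angle and the scaled translation $\mathbf{t}/d$ as unknowns.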

Journal ArticleDOI
TL;DR: A homography-based dynamic control approach applied to station keeping of autonomous underwater vehicles (AUVs) without relying on linear velocity measurements is proposed, which is robust with respect to model uncertainties and unknown currents.
Abstract: A homography-based dynamic control approach applied to station keeping of autonomous underwater vehicles (AUVs) without relying on linear velocity measurements is proposed. The homography estimated from images of a planar target scene captured by a downward-looking camera is directly used as feedback information. The full dynamics of the AUV are exploited in a hierarchical control design with inner and outer loop architectures. Enhanced by integral compensation actions and disturbance torque estimation, the proposed controller is robust with respect to model uncertainties and unknown currents. The performance of the proposed control approach is illustrated using both comparative simulation results conducted on a realistic AUV model and experimental validations on an in-house AUV.