
Showing papers on "Monocular vision" published in 2021


Journal ArticleDOI
TL;DR: A novel application of the image-to-world homography which gives the monocular vision system the efficacy of counting vehicles by lane and estimating vehicle length and speed in real-world units.
Abstract: Cameras have been widely used in traffic operations. While many technologically smart camera solutions in the market can be integrated into Intelligent Transport Systems (ITS) for automated detection, monitoring and data generation, many Network Operations (a.k.a Traffic Control) Centres still use legacy camera systems as manual surveillance devices. In this paper, we demonstrate effective use of these older assets by applying computer vision techniques to extract traffic data from videos captured by legacy cameras. In our proposed vision-based pipeline, we adopt recent state-of-the-art object detectors and transfer-learning to detect vehicles, pedestrians, and cyclists from monocular videos. By weakly calibrating the camera, we demonstrate a novel application of the image-to-world homography which gives our monocular vision system the efficacy of counting vehicles by lane and estimating vehicle length and speed in real-world units. Our pipeline also includes a module which combines a convolutional neural network (CNN) classifier with projective geometry information to classify vehicles. We have tested it on videos captured at several sites with different traffic flow conditions and compared the results with the data collected by piezoelectric sensors. Our experimental results show that the proposed pipeline can process 60 frames per second for pre-recorded videos and yield high-quality metadata for further traffic analysis.
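To make the image-to-world homography step above concrete, the sketch below shows how four road-plane correspondences can be used to map detected vehicle pixels into metric road coordinates and derive a lane index and speed. All point coordinates, the lane width, and the timings are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: map image pixels to road-plane coordinates with a homography
# (illustrative values; the paper's calibration procedure is not reproduced here).
import numpy as np
import cv2

# Four image points (pixels) and their known road-plane positions (metres),
# e.g. lane-marking corners measured on site -- hypothetical values.
img_pts = np.array([[412, 710], [880, 705], [600, 380], [660, 378]], dtype=np.float32)
world_pts = np.array([[0.0, 0.0], [3.5, 0.0], [0.0, 40.0], [3.5, 40.0]], dtype=np.float32)

H, _ = cv2.findHomography(img_pts, world_pts)  # image -> world (road plane)

def to_world(pixel_xy):
    """Project a pixel onto the road plane (metres)."""
    p = np.array([[pixel_xy]], dtype=np.float32)          # shape (1, 1, 2)
    return cv2.perspectiveTransform(p, H)[0, 0]

# Speed estimate from two detections of the same vehicle, dt seconds apart.
p1, p2, dt = to_world((640, 600)), to_world((638, 520)), 0.4
speed_kmh = np.linalg.norm(p2 - p1) / dt * 3.6
print(f"estimated speed: {speed_kmh:.1f} km/h")

# Lane index from the world x-coordinate, assuming 3.5 m lanes (assumption).
lane = int(to_world((640, 600))[0] // 3.5)
print("lane:", lane)
```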

30 citations


Journal ArticleDOI
30 Jun 2021-Sensors
TL;DR: In this paper, an approach using a monocular camera and a marker was proposed to estimate the pose parameters (including orientation and position) of the excavator manipulator; the results showed that the maximum detectable depth of the system is greater than 11 m, the orientation error is less than 8.5°, and the position error is less than 22 mm.
Abstract: Excavation is one of the broadest activities in the construction industry, and it is often affected by safety and productivity problems. To address these problems, construction sites need to automatically monitor the poses of excavator manipulators in real time. Based on computer vision (CV) technology, an approach using a monocular camera and a marker was proposed to estimate the pose parameters (including orientation and position) of the excavator manipulator. To simulate the pose estimation process, a measurement system was established with a common camera and marker. Comprehensive experiments and error analysis showed that the maximum detectable depth of the system is greater than 11 m, the orientation error is less than 8.5°, and the position error is less than 22 mm. A prototype of the system was tested, proving the feasibility of the proposed method. Furthermore, this study provides an alternative CV technology for monitoring construction machines.
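The marker-based pose estimation described above boils down to a perspective-n-point solve from the marker's known geometry to its detected image corners. The sketch below uses OpenCV's solvePnP for a square marker; the marker size, camera intrinsics, and corner pixels are illustrative assumptions rather than the paper's measurement setup.

```python
# Minimal sketch of marker-based pose estimation with a monocular camera,
# in the spirit of the approach above (values and corner order are illustrative).
import numpy as np
import cv2

marker_size = 0.20  # marker edge length in metres (assumption)
# 3D corners of a square marker in its own frame (z = 0 plane).
obj_pts = np.array([
    [-marker_size / 2,  marker_size / 2, 0],
    [ marker_size / 2,  marker_size / 2, 0],
    [ marker_size / 2, -marker_size / 2, 0],
    [-marker_size / 2, -marker_size / 2, 0],
], dtype=np.float32)

# Detected corner pixels (e.g. from an ArUco detector) -- hypothetical values.
img_pts = np.array([[512, 300], [640, 305], [635, 430], [508, 425]], dtype=np.float32)

K = np.array([[1200, 0, 640], [0, 1200, 360], [0, 0, 1]], dtype=np.float64)  # intrinsics (assumed)
dist = np.zeros(5)  # assume negligible lens distortion

ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist, flags=cv2.SOLVEPNP_IPPE_SQUARE)
R, _ = cv2.Rodrigues(rvec)            # orientation of marker in camera frame
print("position (m):", tvec.ravel())  # translation = marker origin in camera frame
print("depth along optical axis (m):", tvec.ravel()[2])
```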

14 citations


Journal ArticleDOI
TL;DR: A monocular vision-based calibration method for low-frequency vibration sensors, in which a sub-pixel edge detection method based on Gaussian curve fitting is applied to extract the edges of motion sequence images so that the excitation acceleration of the sensors can be measured accurately.
Abstract: Calibration is required to determine the frequency characteristics of vibration sensors to ensure their measurement accuracy in engineering applications. Thus, a monocular vision-based calibration method for low-frequency vibration sensors is investigated in this study. A sub-pixel edge detection method based on Gaussian curve fitting is applied to extract the edges of motion sequence images in order to accurately measure the excitation acceleration of the sensors. Because the motion sequence images and the output signal of the sensors cannot be collected synchronously, it is very difficult to align the excitation acceleration signal obtained from the images with the output signal in the time domain. Although the misalignment has only a negligible effect on the magnitude frequency characteristic calibration, it dramatically decreases the calibration accuracy of the phase frequency characteristic, especially as the frequency increases. A time-spatial synchronization technique is proposed to accurately calibrate the phase frequency characteristic by determining the phase of the excitation acceleration signal at a specific spatial position and that of the output signal at the corresponding time. Finally, both the magnitude and phase frequency characteristics are simultaneously calibrated by the investigated method with a flexible and low-cost vision system. The experimental results, compared with laser interferometry, show that the investigated method accomplishes high-accuracy magnitude and phase frequency characteristic calibration in a broad low-frequency range. Its calibration accuracy is superior to that of laser interferometry when the frequency is less than 0.3 Hz, and is equivalent at other frequencies.
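One common way to realise sub-pixel edge detection by Gaussian curve fitting, as named above, is to fit a Gaussian to the intensity-gradient profile across the edge and take its mean as the edge position. The sketch below illustrates that idea on a synthetic 1-D profile; it is not the paper's exact formulation.

```python
# Minimal sketch of sub-pixel edge localisation by Gaussian fitting of the
# intensity-gradient profile across an edge (a common reading of the idea above).
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, mu, sigma):
    return a * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Synthetic 1-D intensity profile across an edge located at x = 10.3 pixels.
x = np.arange(20, dtype=float)
true_edge = 10.3
profile = 50 + 150 / (1 + np.exp(-(x - true_edge) / 0.8))   # smooth step edge
profile += np.random.default_rng(0).normal(0, 1.0, x.size)  # sensor noise

grad = np.gradient(profile)                    # gradient peaks at the edge
p0 = [grad.max(), float(np.argmax(grad)), 1.0] # initial guess from the pixel-level peak
(a, mu, sigma), _ = curve_fit(gaussian, x, grad, p0=p0)

print(f"pixel-level edge: {np.argmax(grad)}  sub-pixel edge: {mu:.2f} (true {true_edge})")
```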

14 citations


Journal ArticleDOI
TL;DR: A 6-D pose estimation method based on monocular vision is proposed, comprising a feature detection method and a pose calculation method improved by a weighting coefficient; the method was applied to the assembly of large gear structures.

12 citations


Journal ArticleDOI
TL;DR: A novel learning-based framework that enables the quadrotor to realize autonomous obstacle avoidance without any prior environment information or labeled datasets for training; its model can be easily updated when facing new application scenarios.

12 citations


Proceedings ArticleDOI
17 Oct 2021
TL;DR: Chen et al. proposed a neighbor-voting method that incorporates neighbor predictions to ameliorate object detection from severely deformed pseudo-LiDAR point clouds.
Abstract: As cameras are increasingly deployed in new application domains such as autonomous driving, performing 3D object detection on monocular images becomes an important task for visual scene understanding. Recent advances in monocular 3D object detection mainly rely on "pseudo-LiDAR" generation, which performs monocular depth estimation and lifts the 2D pixels to pseudo 3D points. However, depth estimation from monocular images, due to its poor accuracy, leads to inevitable position shift of pseudo-LiDAR points within the object. Therefore, the predicted bounding boxes may suffer from inaccurate location and deformed shape. In this paper, we present a novel neighbor-voting method that incorporates neighbor predictions to ameliorate object detection from severely deformed pseudo-LiDAR point clouds. Specifically, each feature point around the object forms its own prediction, and then a "consensus" is achieved through voting. In this way, we can effectively combine the neighbors' predictions with the local prediction and achieve more accurate 3D detection. To further enlarge the difference between the foreground region of interest (ROI) pseudo-LiDAR points and the background points, we also encode the ROI prediction scores of 2D foreground pixels into the corresponding pseudo-LiDAR points. We conduct extensive experiments on the KITTI benchmark to validate the merits of our proposed method. Our results on bird's eye view detection outperform the state-of-the-art performance, especially for the "hard" level detection. The code is available at https://github.com/cxmomo/Neighbor-Vote.
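The "pseudo-LiDAR" generation step referred to above lifts each pixel with an estimated depth into a 3-D point using the camera intrinsics. A minimal sketch of that lifting (with KITTI-like, assumed intrinsics and a placeholder depth map) is shown below; the neighbor-voting detector itself is not reproduced.

```python
# Minimal sketch of the pseudo-LiDAR lifting step: back-project every pixel with an
# estimated depth into a 3D point cloud using camera intrinsics (illustrative values).
import numpy as np

def lift_to_pseudo_lidar(depth, K):
    """depth: (H, W) metric depth map; K: 3x3 intrinsics. Returns (H*W, 3) points."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

K = np.array([[721.5, 0, 609.6], [0, 721.5, 172.9], [0, 0, 1.0]])  # KITTI-like intrinsics
depth = np.full((375, 1242), 20.0)             # placeholder depth map (20 m everywhere)
points = lift_to_pseudo_lidar(depth, K)
print(points.shape)                            # (465750, 3)
```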

12 citations


Journal ArticleDOI
TL;DR: In this paper, a new initialization scheme that can calculate the acceleration bias as a variable during the initialization process so that it can be applied to low-cost IMU sensors is proposed.
Abstract: Simultaneous Localization and Mapping (SLAM) has been a focus of robot navigation for many decades and has become a research hotspot in recent years. A SLAM system based on a vision sensor is vulnerable to environmental illumination and texture, and the problem of initial scale ambiguity still exists in a monocular SLAM system. The fusion of a monocular camera and an inertial measurement unit (IMU) can effectively solve the scale ambiguity problem, improve the robustness of the system, and achieve higher positioning accuracy. Based on the monocular visual-inertial navigation system (VINS-mono), a state-of-the-art fusion of monocular vision and IMU, this paper designs a new initialization scheme that calculates the acceleration bias as a variable during the initialization process so that it can be applied to low-cost IMU sensors. Besides, in order to obtain better initialization accuracy, a visual matching positioning method based on feature points is used to assist the initialization process. After initialization, the system switches to an optical-flow-tracking visual positioning mode to reduce computational complexity. By using the proposed method, the advantages of the feature point method and the optical flow method can be fused. The proposed system, the first to use both the feature point method and the optical flow method, achieves better overall positioning accuracy and robustness with low-cost sensors. Experiments conducted with the EuRoc dataset and in a campus environment show that the initial values obtained through the initialization process can be efficiently used to launch the nonlinear visual-inertial state estimator, and that the positioning accuracy of the improved VINS-mono is about 10% better than that of VINS-mono.

11 citations


Proceedings ArticleDOI
05 Jul 2021
TL;DR: A large-scale indoor robotics stereo (IRS) dataset with over 100K stereo images and high-quality surface normal and disparity maps is introduced, and DTN-Net, a two-stage deep model for surface normal estimation is presented.
Abstract: Indoor robotics localization, navigation, and interaction heavily rely on scene understanding and reconstruction. Compared to the monocular vision which usually does not explicitly introduce any geometrical constraint, stereo vision-based schemes are more promising and robust to produce accurate geometrical information, such as surface normal and depth/disparity. Besides, deep learning models trained with large-scale datasets have shown their superior performance in many stereo vision tasks. However, existing stereo datasets rarely contain the high-quality surface normal and disparity ground truth, which hardly satisfies the demand of training a prospective deep model for indoor scenes. To this end, we introduce a large-scale synthetic but naturalistic indoor robotics stereo (IRS) dataset with over 100K stereo RGB images and high-quality surface normal and disparity maps. Leveraging the advanced rendering techniques of our customized rendering engine, the dataset is considerably close to the real-world captured images and covers several visual effects, such as brightness changes, light reflection/transmission, lens flare, vivid shadow, etc. We compare the data distribution of IRS with existing stereo datasets to illustrate the typical visual attributes of indoor scenes. Besides, we present DTN-Net, a two-stage deep model for surface normal estimation. Extensive experiments show the advantages and effectiveness of IRS in training deep models for disparity estimation, and DTN-Net provides state-of-the-art results for normal estimation compared to existing methods.
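As a companion to the surface-normal ground truth discussed above, the sketch below shows one standard way to derive per-pixel normals from a depth map: back-project to 3-D and take the cross product of local tangent vectors. The intrinsics and the synthetic depth map are assumptions for illustration; this is not DTN-Net.

```python
# Minimal sketch of deriving per-pixel surface normals from a depth map,
# the kind of geometric ground truth a dataset like IRS provides.
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """Back-project to 3D, then take the cross product of local tangent vectors."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    P = np.dstack([X, Y, depth])
    dPdu = np.gradient(P, axis=1)      # tangent along image columns
    dPdv = np.gradient(P, axis=0)      # tangent along image rows
    n = np.cross(dPdu, dPdv)
    n /= np.linalg.norm(n, axis=2, keepdims=True) + 1e-9
    return n                            # (H, W, 3) unit normals

depth = 2.0 + 0.001 * np.arange(480)[:, None] * np.ones((480, 640))  # a gently tilted plane
normals = normals_from_depth(depth, fx=600, fy=600, cx=320, cy=240)
print(normals[240, 320])  # roughly constant normal for a planar surface
```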

11 citations


Posted Content
TL;DR: In this paper, a comprehensive review of recent progress in object pose detection and tracking that belongs to the deep learning technical route is presented, together with insightful observations and inspiring future research directions.
Abstract: Object pose detection and tracking has recently attracted increasing attention due to its wide applications in many areas, such as autonomous driving, robotics, and augmented reality. Among methods for object pose detection and tracking, deep learning is the most promising one and has shown better performance than others. However, there is a lack of survey studies on the latest developments of deep-learning-based methods. Therefore, this paper presents a comprehensive review of recent progress in object pose detection and tracking along the deep learning technical route. To achieve a more thorough introduction, the scope of this paper is limited to methods taking monocular RGB/RGBD data as input, covering three kinds of major tasks: instance-level monocular object pose detection, category-level monocular object pose detection, and monocular object pose tracking. In our work, metrics, datasets, and methods for both detection and tracking are presented in detail. Comparative results of current state-of-the-art methods on several publicly available datasets are also presented, together with insightful observations and inspiring future research directions.

10 citations


Journal ArticleDOI
TL;DR: In this paper, a method is proposed for an AUV to reconstruct the docking ring autonomously under the guidance of visual information, based on the knowledge of self-reconfiguration of two AUVs in an underwater environment.
Abstract: In this work, we propose a method for an AUV to reconstruct the docking ring autonomously under the guidance of visual information. The proposed method is based on the knowledge of self-reconfiguration of two AUVs in an underwater environment. In order to enable long-distance performance, we make use of blue-green light beacons. The light beacons are detected using the YOLO V3 algorithm. In addition, the P4P algorithm is applied to resolve the relative poses of the AUVs. At short distances, we use an ArUco marker to precisely guide the AUV towards the docking ring. We also use the KCF algorithm to shorten the recognition time, and a Kalman filter to eliminate the interference of occlusions and jitter. The feasibility of the proposed method is verified by multiple experiments. Under a reasonable control flow and visual algorithm, the proposed solution avoids the situation where long-distance monocular recognition has low accuracy and is prone to misjudgment. The underwater multi-target integrated guidance method based on monocular vision is convenient for the recovery of underwater AUVs, and can effectively combine the advantages of a wide effective guidance range and the high precision of close-range guidance.
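The Kalman filtering step mentioned above can be illustrated with a constant-velocity filter on the beacon/marker image position, which carries the estimate through brief occlusions and suppresses jitter. The noise levels, frame rate, and measurements below are assumptions; the KCF tracker and YOLO detector are not shown.

```python
# Minimal sketch: smooth a detected beacon/marker image position with a
# constant-velocity Kalman filter so occlusions and jitter do not break guidance.
import numpy as np

dt = 1 / 30
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], float)
Hm = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
Q, R = np.eye(4) * 1e-2, np.eye(2) * 4.0      # process / measurement noise (assumed)

x, P = np.array([320.0, 240.0, 0.0, 0.0]), np.eye(4) * 10.0

def kf_step(x, P, z=None):
    x, P = F @ x, F @ P @ F.T + Q              # predict
    if z is not None:                          # update only when the detector fires
        S = Hm @ P @ Hm.T + R
        K = P @ Hm.T @ np.linalg.inv(S)
        x = x + K @ (z - Hm @ x)
        P = (np.eye(4) - K @ Hm) @ P
    return x, P

for z in [np.array([322, 243]), None, None, np.array([330, 250])]:  # None = occluded frame
    x, P = kf_step(x, P, z)
print(x[:2])   # smoothed pixel position carried through the occlusion
```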

9 citations


Journal ArticleDOI
TL;DR: DeepFoveaNet is a convolutional neural network model for detecting moving objects in video sequences; through its deep fovea model it can detect very small moving objects that other algorithms cannot.
Abstract: Birds of prey, especially eagles and hawks, have visual acuity two to five times better than that of humans. Among the peculiar characteristics of their biological vision is that they have two types of foveae: a shallow fovea used in their binocular vision, and a deep fovea for monocular vision. The deep fovea allows these birds to see objects at long distances and to identify them as possible prey. Inspired by the biological functioning of the deep fovea, a model called DeepFoveaNet is proposed in this paper. DeepFoveaNet is a convolutional neural network model to detect moving objects in video sequences. DeepFoveaNet emulates the monocular vision of birds of prey through two encoder-decoder convolutional neural network modules. This model combines the magnification capacity of the deep fovea and the context information of peripheral vision. Unlike the moving-object detection algorithms ranked in the first places of the Change Detection database (CDnet14), DeepFoveaNet depends neither on previously trained neural networks nor on a huge number of training images. Besides, its architecture allows it to learn spatiotemporal information of the video. DeepFoveaNet was evaluated on the CDnet14 database, achieved high performance, and was ranked as one of the ten best algorithms. The characteristics and results of DeepFoveaNet demonstrate that the model is comparable to state-of-the-art moving-object detection algorithms, and it can detect very small moving objects through its deep fovea model that other algorithms cannot detect.

Journal ArticleDOI
16 Sep 2021-Sensors
TL;DR: In this paper, the authors used a model-based enhancement scheme to improve the quality and brightness of onboard captured images, then presented a hierarchical-based method consisting of a decision tree with an associated light-weight convolutional neural network (CNN) for coarse-to-fine landing marker localization, where the key information of the marker is extracted and reserved for post-processing.
Abstract: Landing an unmanned aerial vehicle (UAV) autonomously and safely is a challenging task. Although the existing approaches have resolved the problem of precise landing by identifying a specific landing marker using the UAV’s onboard vision system, the vast majority of these works are conducted in either daytime or well-illuminated laboratory environments. In contrast, very few researchers have investigated the possibility of landing in low-illumination conditions by employing various active light sources to lighten the markers. In this paper, a novel vision system design is proposed to tackle UAV landing in outdoor extreme low-illumination environments without the need to apply an active light source to the marker. We use a model-based enhancement scheme to improve the quality and brightness of the onboard captured images, then present a hierarchical-based method consisting of a decision tree with an associated light-weight convolutional neural network (CNN) for coarse-to-fine landing marker localization, where the key information of the marker is extracted and reserved for post-processing, such as pose estimation and landing control. Extensive evaluations have been conducted to demonstrate the robustness, accuracy, and real-time performance of the proposed vision system. Field experiments across a variety of outdoor nighttime scenarios with an average luminance of 5 lx at the marker locations have proven the feasibility and practicability of the system.
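As a stand-in for the model-based enhancement scheme mentioned above, the sketch below applies a plain gamma brightening before marker detection. It only illustrates the brighten-then-detect idea; the paper's enhancement model and the hierarchical marker localizer are not reproduced.

```python
# A simple stand-in for low-light enhancement (plain gamma correction); the paper's
# model-based scheme is more elaborate, so treat this only as an illustration.
import numpy as np
import cv2

def brighten(img_bgr, gamma=2.2):
    """Apply inverse-gamma brightening to an 8-bit BGR image via a lookup table."""
    lut = ((np.arange(256) / 255.0) ** (1.0 / gamma) * 255).astype(np.uint8)
    return cv2.LUT(img_bgr, lut)

dark = (np.random.default_rng(0).random((360, 640, 3)) * 30).astype(np.uint8)  # synthetic dark frame
bright = brighten(dark, gamma=2.5)
print(dark.mean(), "->", bright.mean())   # mean intensity increases after enhancement
```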


Journal ArticleDOI
TL;DR: In this article, a calibration-free monocular vision-based robot manipulation approach is proposed based on domain randomization and deep reinforcement learning (DRL) to estimate the spatial information of the target from a single monocular camera arbitrarily mounted in the manipulation environment.
Abstract: Vision-based manipulation has been widely used in various robot applications. Normally, in order to obtain the spatial information of the operated target, a carefully calibrated stereo vision system is required. However, this limits the application of robots in unstructured environments, where both the number and the pose of the cameras are restricted. In this study, a calibration-free monocular vision-based robot manipulation approach is proposed based on domain randomization and deep reinforcement learning (DRL). Firstly, a learning strategy combined with domain randomization is developed to estimate the spatial information of the target from a single monocular camera arbitrarily mounted in a large area of the manipulation environment. Secondly, to address the monocular occlusion problem which regularly happens during robot manipulation, an occlusion-aware DRL policy has been designed to control the robot to actively avoid occlusions in the manipulation tasks. The performance of our method has been evaluated on two common manipulation tasks, reaching and lifting of a target building block, which demonstrate the efficiency and effectiveness of our proposed approach.


Journal ArticleDOI
TL;DR: In this paper, a research method that fuses image enhancement with robot monocular vision so that the robot can adapt to various levels of illumination running along the transmission line is presented.
Abstract: Obstacle distance measurement is one of the key technologies for autonomous navigation of high-voltage transmission line inspection robots. To address the robustness of obstacle distance measurement under varying illumination conditions, this article develops a research method that fuses image enhancement with robot monocular vision so that the robot can adapt to various levels of illumination while running along the transmission line. For inspection of high-voltage transmission lines in an overexposed (excessively bright) environment, a specular highlight suppression method is proposed to suppress the specular reflections in an image; when scene illumination is insufficient, a robust low-light image enhancement method based on a tone mapping algorithm with weighted guided filtering is presented. Based on the monocular vision measurement principle, the error generation mechanism is analyzed through experiments, and we introduce a parameter modification mechanism. The two proposed image enhancement methods outperform other state-of-the-art enhancement algorithms in qualitative and quantitative analyses. The experimental results show that the measurement error is less than 3% for static distance measurements and less than 5% for dynamic distance measurements within 6 m. The proposed method can meet the requirements of high-accuracy positioning, real-time performance, and strong robustness. This method greatly contributes to the sustainable development of inspection robots in the power industry.
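The monocular distance measurement principle underlying the method above can be written in one line under a pinhole model: an object of known width W metres spanning w pixels at focal length f pixels lies at roughly d = f·W / w. The numbers in the sketch below are illustrative assumptions.

```python
# Pinhole ranging principle: distance = focal_length_px * real_width_m / pixel_width.
def monocular_distance(focal_px, real_width_m, pixel_width):
    return focal_px * real_width_m / pixel_width

# e.g. an obstacle 0.30 m wide imaged over 60 px with a 1400 px focal length (assumed values)
print(f"{monocular_distance(1400, 0.30, 60):.2f} m")   # 7.00 m
```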

Journal ArticleDOI
Feng Gao, Fang Deng, Linhan Li, Zhang Lele, Jiaqi Zhu, Chengpu Yu
TL;DR: In this paper, a camera pose correction method via pixel mapping is designed to correct the pose of the camera, and anchor-based methods are then used to improve the detection ability for long-range targets with small image regions.
Abstract: Traditional monocular vision localization methods are usually suitable for short-range area and indoor relative positioning tasks. This paper presents MGG, a novel monocular global geolocation method for outdoor long-range targets. This method takes a single RGB image combined with necessary navigation parameters as input and outputs targets’ GPS information under the Global Navigation Satellite System (GNSS). In MGG, we first design a camera pose correction method via pixel mapping to correct the pose of the camera. Then, we use anchor-based methods to improve the detection ability for long-range targets with small image regions. Next, the local monocular vision model (LMVM) with a local structure coefficient is proposed to establish an accurate 2D-to-3D mapping relationship. Subsequently, a soft correspondence constraint (SCC) is presented to solve the local structure coefficient, which can weaken the coupling degree between detection and localization. Finally, targets can be geolocated through optimization theory-based methods and a series of coordinate transformations. Furthermore, we demonstrate the importance of focal length on solving the error explosion problem in locating long-range targets with monocular vision. Extensive experiments on the challenging KITTI dataset as well as applications in outdoor environments with targets located at a long range of up to 150 meters show the superiority of our method.

Proceedings ArticleDOI
15 Jun 2021
TL;DR: In this article, the authors implemented, integrated and evaluated the effectiveness of using a low cost, wide angle monocular camera with real-time computer vision algorithms to detect and track other UAVs in local airspace and perform collision avoidance.
Abstract: The use of unmanned aerial vehicles (UAVs), or drones, has become ubiquitous in recent years. Collision avoidance is a critical component of path planning, allowing multi-agent networks of cooperative UAVs to work together towards common objectives while avoiding each other. We implemented, integrated, and evaluated the effectiveness of using a low-cost, wide-angle monocular camera with real-time computer vision algorithms to detect and track other UAVs in local airspace and perform collision avoidance in the event of communications degradation or the presence of non-cooperative adversaries. The system was evaluated through experimental flight tests in which the UAVs were set on collision courses.

Journal ArticleDOI
01 Feb 2021
TL;DR: Binocular structured light vision is used for 3D reconstruction, obtaining 3D coordinates based on the triangulation principle that there is only one intersection in space between the camera's line equation and the projector's plane equation.
Abstract: There are many studies on 3D reconstruction based on monocular vision, but for parts with complex surfaces, contour occlusion problems occur, which requires binocular or multi-view vision for 3D reconstruction. This paper mainly uses binocular structured light vision to study 3D reconstruction. The specific methods are: use structured light coding to calibrate the binocular camera, the projector, and the left and right cameras to obtain the calibration parameters, and then obtain the 3D coordinates based on the triangulation principle that there is only one intersection in space between the camera's line equation and the projector's plane equation. Because binocular structured light is used in this article, the point clouds of the left and right cameras need to be merged to complete the stitching of the two point clouds. Verification of this method shows that the stitching error is small, and the point cloud can be streamlined later to improve the point cloud registration rate.
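The triangulation principle stated above, that a camera ray and a projector stripe plane intersect in exactly one point, can be sketched in a few lines of linear algebra. The intrinsics and plane parameters below are illustrative; decoding the structured-light pattern and the two-camera stitching are not shown.

```python
# Minimal sketch of ray-plane triangulation: a camera pixel defines a ray, the decoded
# structured-light stripe defines a projector plane, and their intersection is the 3D point.
import numpy as np

def intersect_ray_plane(ray_dir, plane_n, plane_d):
    """Camera ray X = t * ray_dir (camera at origin); plane n.X + d = 0."""
    t = -plane_d / (plane_n @ ray_dir)
    return t * ray_dir

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])   # camera intrinsics (assumed)
u, v = 350.0, 260.0
ray = np.linalg.inv(K) @ np.array([u, v, 1.0])        # ray direction in camera frame

plane_n = np.array([0.9, 0.0, -0.4])                   # projector stripe plane (camera frame, assumed)
plane_d = 0.5
print(intersect_ray_plane(ray, plane_n, plane_d))      # 3D point on the surface
```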

Journal ArticleDOI
27 Jul 2021
TL;DR: An adaptive dynamic controller based on monocular vision for the tracking of objects with a three-degrees of freedom (DOF) Scara robot manipulator that considers the robot dynamics, the depth of the moving object, and the mounting of the fixed camera to be unknown.
Abstract: In the present work, we develop an adaptive dynamic controller based on monocular vision for the tracking of objects with a three-degrees of freedom (DOF) Scara robot manipulator. The main characteristic of the proposed control scheme is that it considers the robot dynamics, the depth of the moving object, and the mounting of the fixed camera to be unknown. The design of the control algorithm is based on an adaptive kinematic visual servo controller whose objective is the tracking of moving objects even with uncertainties in the parameters of the camera and its mounting. The design also includes a dynamic controller in cascade with the former one whose objective is to compensate the dynamics of the manipulator by generating the final control actions to the robot even with uncertainties in the parameters of its dynamic model. Using Lyapunov’s theory, we analyze the two proposed adaptive controllers for stability properties, and, through simulations, the performance of the complete control scheme is shown.

Proceedings ArticleDOI
01 Jul 2021
TL;DR: In this paper, a robust approach for obstacle detection and avoidance algorithm using a single camera was proposed, which is able to use edges as keypoints along with pixel gradient and achieved promising results.
Abstract: This paper proposes a robust obstacle detection and avoidance algorithm using a single camera. Monocular vision with a single-camera architecture cannot identify depth from a single image and thus depends on pixel gradients or keypoint extractors to identify traversable paths and obstacles. Pixel gradients do not work well where there are shadows and sharp illumination changes, and keypoint extractors do not work well in the absence of dense texture. In this paper we propose an algorithm that is able to use edges as keypoints along with pixel gradients. The entire algorithm was successfully tested on the Sphero RVR rover platform, which uses a Raspberry Pi and a color camera with IR. The proposed method performs well in obstacle detection and obstacle avoidance and is potentially an alternative to a binocular solution.
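A minimal sketch of the combined edge-plus-gradient cue described above is given below: Canny edges supply keypoints in low-texture regions while the gradient magnitude handles textured ground, and the image half with fewer flagged pixels is preferred. The thresholds, morphology, and steering rule are illustrative assumptions, not the paper's tuned pipeline.

```python
# Minimal sketch of combining edges with pixel gradients to flag obstacle pixels.
import numpy as np
import cv2

def obstacle_mask(frame_bgr, grad_thresh=40):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)                          # edges as "keypoints"
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    grad = cv2.magnitude(gx, gy)                              # pixel-gradient cue
    mask = ((grad > grad_thresh) | (edges > 0)).astype(np.uint8) * 255
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))

frame = (np.random.default_rng(0).random((240, 320, 3)) * 255).astype(np.uint8)  # placeholder frame
mask = obstacle_mask(frame)
# A simple steering rule: turn toward the image half with fewer flagged pixels.
left, right = mask[:, :160].mean(), mask[:, 160:].mean()
print("steer", "left" if left < right else "right")
```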

Journal ArticleDOI
TL;DR: A stereo-vision based pedestrian detection and collision avoidance system for AVs that uses two cameras fixed at a specific distance apart to scan the environment and is promising in terms of prediction accuracy and minimizing fatalities.

Journal ArticleDOI
Qiang Lu, Haibo Zhou, Zhiqiang Li, Xia Ju, Shuaixia Tan, Ji’an Duan
TL;DR: A low-cost pose measurement method based on monocular vision, which can accurately determine the pose in the environment even with image shadow and noise, is proposed, together with an improved method combining pose measurement and kinematic parameter identification to calibrate a five-axis motion platform.
Abstract: In order to solve the problem of high measurement cost and complex operation of position-independent geometric errors (PIGEs) calibration on a five-axis motion platform, this paper first proposes a low-cost pose measurement method, based on monocular vision, which can accurately determine the pose in the environment, even with image shadow and noise. Next, an improved method, combining pose measurement and kinematic parameters identification, is proposed to calibrate a five-axis motion platform. The kinematic error model of the platform and the pose planning of automatic image acquisition are established, providing the pose data and motor position data, required for calibration. Combined with the kinematic loop method, the kinematic parameters of the five-axis motion platform are identified, while the geometric structure parameters are accurately calibrated. Before and after calibration, a circular trajectory of the target coordinate system (TCS) origin, relative to the camera coordinate system (CCS), is used to test the comprehensive accuracy evolution of the five-axis motion platform, by comparing the position and orientation errors of the theoretical circle trajectory to the actual one. The experimental data show that, before and after calibration, the average position error of the five-axis motion platform is reduced by 79.46%, while the average direction error is reduced by 86.53%. The experimental results clearly demonstrate that the proposed calibration method significantly improves the comprehensive motion accuracy of the five-axis motion platform, and they verify the practical value and effectiveness of the calibration scheme.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the relationship between conscious perception and the generation and amplitude of perceptual echoes (PE) and found that the alpha power of the PE generated by the consciously perceived stimulus was comparable with that of the PE generated during monocular vision (control condition) and higher than that of the PE induced by the suppressed stimulus.
Abstract: Alpha rhythms (∼10Hz) in the human brain are classically associated with idling activities, being predominantly observed during quiet restfulness with closed eyes. However, recent studies demonstrated that alpha (∼10Hz) rhythms can directly relate to visual stimulation, resulting in oscillations, which can last for as long as one second. This alpha reverberation, dubbed perceptual echoes (PE), suggests that the visual system actively samples and processes visual information within the alpha-band frequency. Although PE have been linked to various visual functions, their underlying mechanisms and functional role are not completely understood. In this study, we investigated the relationship between conscious perception and the generation and the amplitude of PE. Specifically, we displayed two coloured Gabor patches with different orientations on opposite sides of the screen, and using a set of dichoptic mirrors, we induced a binocular rivalry between the two stimuli. We asked participants to continuously report which one of two Gabor patches they consciously perceived, while recording their EEG signals. Importantly, the luminance of each patch fluctuated randomly over time, generating random sequences from which we estimated two impulse-response functions (IRFs) reflecting the PE generated by the perceived (dominant) and non-perceived (suppressed) stimulus, respectively. We found that the alpha power of the PE generated by the consciously perceived stimulus was comparable with that of the PE generated during monocular vision (control condition) and higher than the PE induced by the suppressed stimulus. Moreover, confirming previous findings, we found that all PEs propagated as a travelling wave from posterior to frontal brain regions, irrespective of conscious perception. All in all our results demonstrate a correlation between conscious perception and PE, suggesting that the synchronization of neural activity plays an important role in visual sampling and conscious perception.
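The impulse-response functions (perceptual echoes) mentioned above are estimated from the random luminance sequences; for a white input, the IRF is proportional to the input-output cross-correlation. The sketch below demonstrates that estimator on synthetic data and is not the study's analysis pipeline.

```python
# Minimal sketch: estimate an impulse-response function (perceptual echo) by
# cross-correlating a random luminance sequence with a synthetic EEG signal.
import numpy as np

fs, dur, lags = 160, 300, 160                   # sample rate (Hz), seconds, IRF length (assumed)
rng = np.random.default_rng(1)
lum = rng.standard_normal(fs * dur)             # random (white) luminance sequence

t = np.arange(lags) / fs
true_irf = np.exp(-t / 0.3) * np.sin(2 * np.pi * 10 * t)   # a decaying ~10 Hz echo
eeg = np.convolve(lum, true_irf)[: lum.size] + 0.5 * rng.standard_normal(lum.size)

# For a white input, the IRF is proportional to the input-output cross-correlation.
irf_hat = np.array([np.dot(lum[: lum.size - k], eeg[k:]) for k in range(lags)]) / lum.size
print(np.corrcoef(irf_hat, true_irf)[0, 1])     # close to 1: the echo is recovered
```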

Journal ArticleDOI
02 Sep 2021-Sensors
TL;DR: In this paper, a real-time 3D reconstruction method based on monocular vision is proposed, in which a single RGB-D camera is used to collect visual information in real time and the YOLACT++ network is used to identify and segment the visual information to extract part of the important visual information.
Abstract: Real-time 3D reconstruction is one of the current popular research directions of computer vision, and it has become a core technology in the fields of virtual reality, industrialized automatic systems, and mobile robot path planning. Currently, there are three main problems in the real-time 3D reconstruction field. Firstly, it is expensive: it requires a variety of sensors, so it is less convenient. Secondly, the reconstruction speed is slow, and the 3D model cannot be established accurately in real time. Thirdly, the reconstruction error is large, which cannot meet the accuracy requirements of such scenes. For this reason, we propose a real-time 3D reconstruction method based on monocular vision in this paper. Firstly, a single RGB-D camera is used to collect visual information in real time, and the YOLACT++ network is used to identify and segment the visual information to extract part of the important visual information. Secondly, we combine the three stages of depth recovery, depth optimization, and deep fusion to propose a three-dimensional position estimation method based on deep learning for joint coding of visual information. It can reduce the depth error caused by the depth measurement process, and the accurate 3D point values of the segmented image can be obtained directly. Finally, we propose a method based on limited outlier adjustment of the cluster center distance to optimize the three-dimensional point values obtained above. It improves the real-time reconstruction accuracy and obtains the three-dimensional model of the object in real time. Experimental results show that this method only needs a single RGB-D camera, which is not only low cost and convenient to use, but also significantly improves the speed and accuracy of 3D reconstruction.
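The final optimisation step described above, limiting outliers by their distance to the cluster centre, can be sketched as a simple distance filter. The threshold rule (a multiple of the median distance to the centroid) and the synthetic point cloud are assumptions for illustration.

```python
# Minimal sketch: discard 3D points whose distance to the cluster centre exceeds a bound.
import numpy as np

def limit_outliers(points, k=2.5):
    """Keep points within k * (median distance to the centroid)."""
    center = points.mean(axis=0)
    d = np.linalg.norm(points - center, axis=1)
    return points[d <= k * np.median(d)]

rng = np.random.default_rng(0)
cloud = rng.normal(0, 0.02, (500, 3)) + [0.4, 0.1, 1.2]        # points on an object
cloud = np.vstack([cloud, [[2.0, 2.0, 5.0], [-3.0, 1.0, 9.0]]]) # two gross depth errors
print(len(cloud), "->", len(limit_outliers(cloud)))             # outliers removed
```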

Journal ArticleDOI
TL;DR: A robust tracking system with the fusion of MMW radars and cameras is proposed, which is cost-effective and requires no additional tags with people, and shows good application prospects in human following robots.
Abstract: Purpose This paper aims to develop a robust person tracking method for human following robots. The tracking system adopts the multimodal fusion results of millimeter wave (MMW) radars and monocular cameras for perception. A prototype of human following robot is developed and evaluated by using the proposed tracking system. Design/methodology/approach Limited by angular resolution, point clouds from MMW radars are too sparse to form features for human detection. Monocular cameras can provide semantic information for objects in view, but cannot provide spatial locations. Considering the complementarity of the two sensors, a sensor fusion algorithm based on multimodal data combination is proposed to identify and localize the target person under challenging conditions. In addition, a closed-loop controller is designed for the robot to follow the target person with expected distance. Findings A series of experiments under different circumstances are carried out to validate the fusion-based tracking method. Experimental results show that the average tracking errors are around 0.1 m. It is also found that the robot can handle different situations and overcome short-term interference, continually track and follow the target person. Originality/value This paper proposed a robust tracking system with the fusion of MMW radars and cameras. Interference such as occlusion and overlapping are well handled with the help of the velocity information from the radars. Compared to other state-of-the-art plans, the sensor fusion method is cost-effective and requires no additional tags with people. Its stable performance shows good application prospects in human following robots.
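The closed-loop following behaviour described above can be illustrated with a simple proportional controller on the fused range and bearing of the tracked person; the gains and target distance below are illustrative assumptions, not the paper's tuned values.

```python
# Minimal sketch of a distance-keeping follow controller driven by the fused track.
def follow_cmd(range_m, bearing_rad, target_range=1.2, kv=0.8, kw=1.5):
    v = kv * (range_m - target_range)     # forward speed: close the distance gap
    w = kw * bearing_rad                   # turn rate: keep the person centred
    return v, w

print(follow_cmd(2.0, 0.15))              # -> (0.64, 0.225): move forward, turn slightly
```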

Journal ArticleDOI
TL;DR: In this article, the authors investigated monocular information for the continuous online guidance of reach-to-grasp and presented a dynamical control model thereof using optical texture projected from a support surface (i.e., a table) over which the participants reached to grasp target objects sitting on the table surface at different distances.
Abstract: We investigated monocular information for the continuous online guidance of reaches-to-grasp and present a dynamical control model thereof. We defined an information variable using optical texture projected from a support surface (i.e., a table) over which the participants reached to grasp target objects sitting on the table surface at different distances. Using either binocular or monocular vision in the dark, participants rapidly reached to grasp a phosphorescent square target object with visibly phosphorescent thumb and index finger. Targets were one of three sizes. The target either sat flat on the support surface or was suspended a few centimeters above the surface at a slant. The latter condition perturbed the visible relation of the target to the support surface. The support surface was either invisible in the dark or covered with a visible phosphorescent checkerboard texture. Reach-to-grasp trajectories were recorded, and Maximum Grasp Apertures (MGA), Movement Times (MT), Times of MGA (TMGA), and Times of Peak Velocities (TPV) were analyzed. These measures were selected as most indicative of the participant’s certainty about the relation of hand to target object during the reaches. The findings were that, in general, monocular reaches in particular were less certain (slower, earlier TMGA and TPV) than binocular reaches, except with the target flat on the visible support surface, where performance with monocular and binocular vision was equivalent. The hypothesized information was the difference in image width of optical texture (equivalent to density of optical texture) at the hand versus the target. A control dynamic equation was formulated representing proportional rate control of the reaches-to-grasp (akin to the model using binocular disparity formulated by Anderson and Bingham (Exp Brain Res 205: 291–306, 2010)). Simulations were performed and presented using this model. Simulated performance was compared to actual performance and found to replicate it. To our knowledge, this is the first study of monocular information used for continuous online guidance of reaches-to-grasp, complete with a control dynamic model.

Journal ArticleDOI
TL;DR: In this article, a novel monocular camera and 1D laser rangefinder (LRF) fusion strategy is proposed to overcome this weakness and design a remote and ultra-high precision cooperative targets 6-DOF pose estimation sensor.
Abstract: Monocular vision is one of the most commonly used noncontact six-degrees-of-freedom (6-DOF) pose estimation methods. However, the large translational DOF measurement error along the optical axis of the camera is one of its main weaknesses, which greatly limits the measurement accuracy of monocular vision measurement. In this paper, we propose a novel monocular camera and 1D laser rangefinder (LRF) fusion strategy to overcome this weakness and design a remote and ultra-high precision cooperative targets 6-DOF pose estimation sensor. Our approach consists of two modules: (1) a feature fusion module that precisely fuses the initial pose estimated from the camera and the depth information obtained by the LRF. (2) An optimization module that optimizes pose and system parameters. The performance of our proposed 6-DOF pose estimation method is validated using simulations and real-world experiments. The experimental results show that our fusion strategy can accurately integrate the information of the camera and the LRF. Further optimization carried out on this basis effectively reduces the measurement error of monocular vision 6-DOF pose measurement. The experimental results obtained from a prototype show that its translational and rotational DOF measurement accuracy can reach up to 0.02 mm and 15″, respectively, at a distance of 10 m.
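One simple way to picture the camera/LRF fusion idea above is to keep the bearing of the vision-estimated translation while rescaling its length to the laser-measured range, directly attacking the weak along-axis depth of monocular pose estimation. This is only an intuition-level sketch under that assumption; the paper's feature-fusion and optimisation modules are more elaborate.

```python
# Intuition-level sketch (assumption, not the paper's algorithm): keep the direction of
# the monocular translation estimate, rescale its length to the laser-measured range.
import numpy as np

t_cam = np.array([0.12, -0.03, 9.87])    # translation from monocular PnP (metres, illustrative)
lrf_range = 10.002                        # 1-D laser rangefinder reading (metres, illustrative)

t_fused = t_cam * (lrf_range / np.linalg.norm(t_cam))
print(t_fused)                            # same direction, laser-accurate distance
```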

Proceedings ArticleDOI
15 Jun 2021
TL;DR: In this paper, a monocular vision-based reactive planner for obstacle avoidance is proposed, which is structured around a Convolutional Neural Network (CNN) for object detection and classification, used to identify the bounding box of the objects of interest in the image plane.
Abstract: One of the challenges in deploying Micro Aerial Vehicles (MAVs) in unknown environments is the need to secure collision-free paths with respect to static and dynamic obstacles. This article proposes a monocular vision-based reactive planner for MAV obstacle avoidance. The avoidance scheme is structured around a Convolutional Neural Network (CNN) for object detection and classification (You Only Look Once (YOLO)), used to identify the bounding box of the objects of interest in the image plane. Moreover, the YOLO is combined with a Kalman filter to robustify the object tracking, in case the bounding boxes are lost, by estimating their position and providing a fixed-rate estimate. Since MAVs are fast and agile platforms, the object tracking should be performed in real time for collision avoidance. By processing the information of the bounding boxes together with the image field of view and applying trigonometric operations, the pixel coordinates of the object are translated into heading commands, which result in a collision-free maneuver. The efficacy of the proposed scheme has been extensively evaluated in the Gazebo simulation environment, as well as in experimental evaluations with an MAV equipped with a monocular camera.
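The bounding-box-to-heading translation described above uses the horizontal offset of the box centre and the camera field of view. The sketch below computes that bearing with basic trigonometry and applies a simple avoidance rule; the field of view, image size, gain, and the avoidance rule itself are illustrative assumptions.

```python
# Minimal sketch: turn the detected box centre into a bearing, then into a yaw command.
import math

def heading_from_bbox(cx_px, image_width_px=640, hfov_deg=90.0, gain=1.0):
    """Bearing (deg) of the box centre relative to the optical axis, scaled by a gain."""
    half_w = image_width_px / 2
    offset = (cx_px - half_w) / half_w                    # -1 (left edge) .. +1 (right edge)
    bearing = math.degrees(math.atan(offset * math.tan(math.radians(hfov_deg / 2))))
    return gain * bearing

# Obstacle detected slightly right of centre: command a turn away from it.
bearing = heading_from_bbox(420)
yaw_cmd = -math.copysign(45.0 - abs(bearing), bearing)    # simple avoidance rule (assumption)
print(f"obstacle bearing {bearing:+.1f} deg -> yaw command {yaw_cmd:+.1f} deg")
```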

Journal ArticleDOI
26 Sep 2021
TL;DR: The experimental results show that an unmanned wheeled robot with a bionic transfer-convolution neural network as the control command output can realize autonomous obstacle avoidance in complex indoor scenes.
Abstract: The overall safety of a building can be effectively evaluated through regular inspection of the indoor walls by unmanned ground vehicles (UGVs). However, when the UGV performs line patrol inspections along a specified path, it is easily affected by obstacles. This paper presents an obstacle avoidance strategy for unmanned ground vehicles in indoor environments. The proposed method is based on monocular vision. From the environmental information obtained in front of the unmanned vehicle, the obstacle orientation is determined, and the moving direction and speed of the mobile robot are determined based on the neural network output and confidence. This paper also innovatively adopts a method of collecting indoor environment images based on a camera array and realizes automatic classification of the data sets by arranging cameras with different directions and focal lengths. In the training of a transfer neural network, to address the difficulty of setting the learning rate factor of the new layer, an improved bat algorithm is used to find the optimal learning rate factor on a small-sample data set. The simulation results show that the accuracy can reach 94.84%. Single-frame evaluation and continuous obstacle avoidance evaluation are used to verify the effectiveness of the obstacle avoidance algorithm. The experimental results show that an unmanned wheeled robot with a bionic transfer-convolution neural network as the control command output can realize autonomous obstacle avoidance in complex indoor scenes.