
Showing papers on "Monocular vision" published in 2022


Journal ArticleDOI
TL;DR: In this paper, a monocular vision-based method is investigated to measure plane motion, which can obtain the displacement and angle as well as the orbit simultaneously by using the Zernike moment method with sub-pixel accuracy and a decoupling model.

13 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed an obstacle avoidance strategy for small multi-rotor drones with a monocular camera using deep reinforcement learning, which consists of two steps: depth estimation and navigation decision making.
Abstract: This paper proposes an obstacle avoidance strategy for small multi-rotor drones with a monocular camera using deep reinforcement learning. The proposed method is composed of two steps: depth estimation and navigation decision making. In the depth estimation step, a pre-trained depth estimation algorithm based on a convolutional neural network is used. In the navigation decision-making step, a dueling double deep Q-network is employed with a well-designed reward function. The network is trained using the robot operating system and the Gazebo simulation environment. To validate the performance and robustness of the proposed approach, simulations and real experiments have been carried out using a Parrot Bebop2 drone in various complex indoor environments. We demonstrate that the proposed algorithm successfully travels along narrow corridors with texture-free walls, people, and boxes.

11 citations
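
A minimal sketch of the dueling Q-network behind the navigation decision-making step described above, assuming a compact state distilled from the estimated depth map and a small discrete action set (the state dimension, action count, and layer widths are illustrative, not taken from the paper):

    import torch
    import torch.nn as nn

    class DuelingQNet(nn.Module):
        def __init__(self, state_dim: int = 10, n_actions: int = 5):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
            self.value = nn.Linear(128, 1)              # state-value stream V(s)
            self.advantage = nn.Linear(128, n_actions)  # advantage stream A(s, a)

        def forward(self, state: torch.Tensor) -> torch.Tensor:
            h = self.backbone(state)
            v, a = self.value(h), self.advantage(h)
            # dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
            return v + a - a.mean(dim=-1, keepdim=True)

    net = DuelingQNet()
    q = net(torch.randn(1, 10))   # greedy action: q.argmax(dim=-1)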


Journal ArticleDOI
TL;DR: A comprehensive review of recent progress in object pose detection and tracking that belongs to the deep learning technical route is presented in this article, where the authors take monocular RGB/RGBD data as input and cover three major tasks: instance-level object pose detection, category-level object pose detection, and object pose tracking.
Abstract: Object pose detection and tracking has recently attracted increasing attention due to its wide applications in many areas, such as autonomous driving, robotics, and augmented reality. Among methods for object pose detection and tracking, deep learning is the most promising one and has shown better performance than others. However, a survey of the latest developments in deep learning-based methods is lacking. Therefore, this study presents a comprehensive review of recent progress in object pose detection and tracking that belongs to the deep learning technical route. To achieve a more thorough introduction, the scope of this study is limited to methods taking monocular RGB/RGBD data as input and covering three kinds of major tasks: instance-level monocular object pose detection, category-level monocular object pose detection, and monocular object pose tracking. In our work, metrics, datasets, and methods of both detection and tracking are presented in detail. Comparative results of current state-of-the-art methods on several publicly available datasets are also presented, together with insightful observations and inspiring future research directions.

8 citations



Journal ArticleDOI
TL;DR: In this paper, a stereo-vision based pedestrian detection and collision avoidance system for AVs is proposed, which uses two cameras fixed at a specific distance apart to scan the environment.

4 citations


Book ChapterDOI
TL;DR: Zhang et al. as mentioned in this paper employed the front-view features from LiDAR models to help the monocular image backbone embed depth cues into image feature maps, which significantly boosted monocular 3D object detection performance without introducing any extra cost in the inference phase.
Abstract: Detecting 3D objects from monocular RGB images is an ill-posed task owing to the lack of depth knowledge, and monocular-based 3D detection methods perform poorly compared with LiDAR-based 3D detection methods. Some bird’s-eye-view-based monocular 3D detection methods transform front-view image feature maps into bird’s-eye-view feature maps and then use LiDAR 3D detection heads to detect objects. These methods achieve relatively high performance. However, there is still a large gap between monocular and LiDAR bird’s-eye-view feature maps. Based on the fact that LiDAR bird’s-eye-view feature maps are markedly better than monocular ones, to bridge their gap and boost monocular 3D detection performance, on the one hand, we directly employ the bird’s-eye-view features from LiDAR models to guide the training of monocular detection models; on the other hand, we employ the front-view features from LiDAR models to help the monocular image backbone embed depth cues into image feature maps. Experimental results show that our method significantly boosts monocular 3D object detection performance without introducing any extra cost in the inference phase.

3 citations
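
The guidance idea lends itself to a simple distillation-style objective. Below is a hedged sketch, not the paper's actual loss: an L2 penalty pulls the monocular bird's-eye-view features toward the frozen LiDAR model's features, masked to regions of interest (the tensor shapes and the foreground mask are assumptions):

    import torch

    def bev_guidance_loss(mono_bev: torch.Tensor,
                          lidar_bev: torch.Tensor,
                          fg_mask: torch.Tensor) -> torch.Tensor:
        """mono_bev, lidar_bev: (B, C, H, W); fg_mask: (B, 1, H, W) in {0, 1}."""
        lidar_bev = lidar_bev.detach()                  # the LiDAR teacher is frozen
        sq_err = (mono_bev - lidar_bev) ** 2 * fg_mask  # penalize masked regions only
        return sq_err.sum() / fg_mask.sum().clamp(min=1.0)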


Journal ArticleDOI
TL;DR: In this article, a monocular 3D object detection method that utilizes discrete depth and orientation representations is proposed to predict object locations in 3D space using keypoint detection on the object's center point.
Abstract: On-road object detection is a critical component in an autonomous driving system. The safety of the vehicle can only be as good as the reliability of the on-road object detection system. Thus, developing a fast and robust object detection algorithm has been the primary goal of many automotive industries and institutes. In recent years, multi-purpose vision-based driver assistance systems have gained popularity with the emergence of deep neural networks. A monocular camera has been developed to locate an object in the image plane and estimate the distance of that object in the real world or the vehicle plane. In this work, we present a monocular 3D object detection method that utilizes discrete depth and orientation representations. Our proposed method strives to predict object locations in 3D space utilizing keypoint detection on the object’s center point. To improve the point detection, we employ center regression on the object’s segmentation mask, reducing the detection offset significantly. The simplicity of our proposed network architecture and its one-stage approach allows our algorithm to achieve competitive speed compared with prior methods. Our proposed method is able to achieve a 26.93% detection score on the Cityscapes 3D object detection dataset, outperforming the preceding monocular method by a margin of 2.8 points.

3 citations
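
For intuition, center-point detectors of this kind typically read object centers off a keypoint heatmap using max-pooling non-maximum suppression. The sketch below illustrates that generic decoding step under assumed shapes and a top-k cutoff; it is not the paper's exact implementation:

    import torch
    import torch.nn.functional as F

    def decode_centers(heatmap: torch.Tensor, k: int = 50):
        """heatmap: (B, C, H, W) of per-class center scores in [0, 1]."""
        h, w = heatmap.shape[-2:]
        # 3x3 max-pool NMS: keep only local maxima of the heatmap
        peaks = F.max_pool2d(heatmap, 3, stride=1, padding=1)
        heatmap = heatmap * (peaks == heatmap).float()
        scores, idx = heatmap.flatten(1).topk(k)   # top-k over classes*H*W
        cls = idx // (h * w)                       # class of each peak
        ys, xs = (idx % (h * w)) // w, idx % w     # feature-map coordinates
        return scores, cls, xs, ys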


Journal ArticleDOI
TL;DR: In this paper, a measurement method using a camera (monocular or dual) was used to evaluate autonomous vehicles, and integrated scenarios were proposed wherein the scenarios proceed continuously; the precision of the autonomous vehicle safety evaluation method using cameras was verified via comparisons and analyses with the results of real vehicle tests.
Abstract: Currently, the stage of technological development for commercialization of autonomous driving level 3 has been achieved. However, the legal and institutional bases and traffic safety facilities for safe driving on actual roads in autonomous driving mode are insufficient. Therefore, in this study, a measurement method using a camera (monocular or dual) was used to evaluate autonomous vehicles. In addition, integrated scenarios were proposed wherein the scenarios proceed continuously. The precision of the autonomous vehicle safety evaluation method using cameras was verified via comparisons and analyses with the results of real vehicle tests. As a result of the test, the difference in the average error rate of inter-vehicle distance between the monocular camera and the dual camera was 0.34%. The difference in the average error rate of the distance to the lane was 0.3 to 0.5%, showing similar results. It is judged that the shortcomings of monocular and dual cameras can be mutually compensated if the two are used together rather than independently.

3 citations


Journal ArticleDOI
TL;DR: In this paper, a relative distance measurement adaptive to rough roads, using monocular vision and the features from connected vehicles, is proposed, where a connected-vehicle network can provide the fixed topology of feature points on the target vehicles for position estimation.
Abstract: Assessing the relative position of surrounding vehicles is a core requirement of autonomous vehicles to make decisions and plan driving trajectories. Many approaches based on monocular vision are designed for flat roads but contain unpredictable errors in some corner cases, such as sloping and uneven roads. In this article, the proposed method focuses on a relative-distance measurement adaptive to rough roads, using monocular vision and the features from connected vehicles. The proposed model takes the perspective-n-point approach as a basic framework. A connected-vehicle network can provide the fixed topology of feature points on the target vehicles for position estimation. The fixed topology replaces the requirement of cameras’ extrinsic parameters, which may change unpredictably on rough roads. The proposed approach is implemented on real vehicles driving on sloping and uneven roads. The results are compared with Mobileye, a widely used product for relative positioning. The experiments take the real-time kinematic GPS as ground truth and show that the proposed method achieves decimeter-level measurements and outperforms Mobileye on sloping and uneven roads. This article also shows great potential to improve environment perception using the connected-vehicle network in the future.

3 citations
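
The perspective-n-point core of such a method can be sketched with OpenCV's solvePnP: the fixed 3D topology of feature points shared over the connected-vehicle network plays the role of the object model, and their detected 2D image positions give the correspondences. All coordinate values below are made-up placeholders:

    import cv2
    import numpy as np

    # known 3D feature topology on the target vehicle (metres), shared via V2X
    object_pts = np.array([[-0.7, 0.0, 0.0], [0.7, 0.0, 0.0],
                           [-0.5, 0.6, 0.1], [0.5, 0.6, 0.1]])
    # detected 2D positions of those features in the current frame (pixels)
    image_pts = np.array([[410., 300.], [520., 298.],
                          [428., 251.], [505., 250.]])
    K = np.array([[700., 0., 640.], [0., 700., 360.], [0., 0., 1.]])  # intrinsics

    ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
    if ok:
        rel_distance = float(np.linalg.norm(tvec))  # relative distance in metres

Because the feature topology is transmitted rather than inferred, no camera extrinsics relative to the road are needed, which is what makes the approach robust on sloping and uneven roads.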


Journal ArticleDOI
TL;DR: In this paper , a multi-object relative position detection algorithm (Deep-YOLO) is proposed to detect the relative positions of people and lights from the video and control the lights, thereby reducing the complexity of calibration and deployment.

3 citations


Journal ArticleDOI
19 May 2022-Machines
TL;DR: A multi-cue fusion monocular velocity and ranging framework is proposed to improve the accuracy of monocular ranging and velocity measurement, using an attention mechanism to fuse different feature information.
Abstract: Many consumers and scholars currently focus on driving assistance systems (DAS) and intelligent transportation technologies. The distance and speed measurement technology of the vehicle ahead is an important part of the DAS. Existing vehicle distance and speed estimation algorithms based on monocular cameras still have limitations, such as ignoring the relationship between the underlying features of vehicle speed and distance. A multi-cue fusion monocular velocity and ranging framework is proposed to improve the accuracy of monocular ranging and velocity measurement. We use the attention mechanism to fuse different feature information. The network is jointly trained through a distance-velocity regression loss function, with the depth loss as an auxiliary loss function. Finally, experimental validation is performed on the TuSimple dataset and the KITTI dataset. On the TuSimple dataset, the average speed mean square error of the proposed method is less than 0.496 m²/s², and the average mean square error of the distance is 5.695 m². On the KITTI dataset, the average velocity mean square error of our method is less than 0.40 m²/s². In addition, we test in different scenarios and confirm the effectiveness of the network.
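
A hedged sketch of the joint objective described above (a distance-velocity regression loss plus depth as an auxiliary loss); the exact loss terms and the weighting factor are assumptions, not the paper's:

    import torch.nn.functional as F

    def joint_loss(pred_dist, pred_vel, pred_depth,
                   gt_dist, gt_vel, gt_depth, depth_weight: float = 0.1):
        reg = F.mse_loss(pred_dist, gt_dist) + F.mse_loss(pred_vel, gt_vel)
        aux = F.l1_loss(pred_depth, gt_depth)   # auxiliary depth supervision
        return reg + depth_weight * aux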

Journal ArticleDOI
TL;DR: In this article, a point cloud confidence sampling strategy based on a 3D Gaussian distribution is proposed to assign low confidence to points with large depth-estimation errors and filter them out according to the confidence.


Journal ArticleDOI
TL;DR: In this article , a hybrid relative navigation algorithm is proposed based on the data fusing principle of monocular cameras and Lidar sensors for a large-scale free tumbling non-cooperative target.


Journal ArticleDOI
Ming Xing Liu, Shuang Yue, Shu Li, Yuying Du, Bin Li 
TL;DR: Liu et al. as mentioned in this paper proposed an intelligent detection method for the concrete aggregate level based on monocular imaging, which uses a monocular camera installed at a 45-degree angle, constructs a specific projection model of the camera and storage bin to establish a mapping relationship between the image coordinates and the actual imaging angle, and then combines peak and valley positioning information derived from YOLOv5 to find the height of the aggregate level.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a monocular vision pose determination-based large rigid-body automatic docking method (MVPDLD) in which targets with circular feature points are attached to the surfaces of both mobile and fixed rigid bodies.

Journal ArticleDOI
TL;DR: In this paper, an implantable metaverse featuring retinal prostheses in association with bionic vision processing is presented, where the electrodes are rearranged to match the distribution of ganglion cells.
Abstract: We present an implantable metaverse featuring retinal prostheses in association with bionic vision processing. Unlike conventional retinal prostheses, whose electrodes are spaced equidistantly, our solution is to rearrange the electrodes to match the distribution of ganglion cells. To naturally imitate the human vision, a scheme of bionic vision processing is developed. On top of a three-dimensional eye model, our bionic vision processing is able to visualize the monocular image, binocular image fusion, and parallax-induced depth map.

Proceedings ArticleDOI
23 May 2022
TL;DR: In this paper, a method for AUVs to autonomously navigate and achieve diver-relative positioning to begin interaction with humans is presented; the method is based only on monocular vision, requires no global localization, and is computationally efficient.
Abstract: Direct communication between humans and autonomous underwater vehicles (AUVs) is a relatively under-explored area in human-robot interaction research, although many tasks (e.g., surveillance, inspection, and search-and-rescue) require close diver-robot collaboration. Suboptimal AUV positioning relative to its human collaborators can lead to poor-quality interaction and excessive cognitive and physical load for divers. In this paper, we introduce a novel method for AUVs to autonomously navigate and achieve diver-relative positioning to begin interaction. Our method is based only on monocular vision, requires no global localization, and is computationally efficient. We present our algorithm and its implementation on board a physical AUV, performing extensive evaluations in the form of closed-water tests in a controlled pool. Our results show that the proposed monocular vision-based algorithm performs reliably and efficiently, operating entirely on-board the AUV.

Journal ArticleDOI
TL;DR: In this article, a cascaded convolutional neural network (CNN) method for robotic grasping based on monocular vision and a small data set of scattered parts is proposed, which can be divided into three steps: object detection, monocular depth estimation and keypoint estimation.
Abstract: Purpose Scattered parts are laid randomly during the manufacturing process and are difficult to recognize and manipulate. This study aims to complete the grasp of the scattered parts by a manipulator with a camera and learning method. Design/methodology/approach In this paper, a cascaded convolutional neural network (CNN) method for robotic grasping based on monocular vision and a small data set of scattered parts is proposed. This method can be divided into three steps: object detection, monocular depth estimation and keypoint estimation. In the first stage, an object detection network is improved to effectively locate the candidate parts. Then, it contains a neural network structure and corresponding training method to learn and reason over high-resolution input images to obtain depth estimation. The keypoint estimation in the third step is expressed as a cumulative form of multi-scale prediction from a network, using a red-green-blue-depth (RGBD) map that is acquired from the object detection and depth map estimation. Finally, a grasping strategy is studied to achieve successful and continuous grasping. In the experiments, different workpieces are used to validate the proposed method. The best grasping success rate is more than 80%. Findings By using the CNN-based method to extract the key points of the scattered parts and calculating the possibility of grasp, the success rate is increased. Practical implications This method and robotic systems can be used in picking and placing in most industrial automatic manufacturing or assembly processes. Originality/value Unlike standard parts, scattered parts are randomly laid and are difficult for the robot to recognize and grasp. This study uses a cascaded CNN network to extract the keypoints of the scattered parts, which are also labeled with the possibility of successful grasping. Experiments are conducted to demonstrate the grasping of those scattered parts.
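
The three-stage cascade can be summarized as a data-flow sketch, with placeholder callables standing in for the detection, depth, and keypoint networks; only the staging follows the description above:

    import numpy as np

    def cascaded_grasp(rgb: np.ndarray, detector, depth_net, keypoint_net):
        boxes = detector(rgb)                  # stage 1: candidate part boxes
        depth = depth_net(rgb)                 # stage 2: monocular depth map
        grasps = []
        for (x0, y0, x1, y1) in boxes:
            # fuse the RGB crop with its depth crop into an RGBD patch
            rgbd = np.dstack([rgb[y0:y1, x0:x1], depth[y0:y1, x0:x1]])
            kpts, score = keypoint_net(rgbd)   # stage 3: keypoints + grasp score
            grasps.append((score, kpts))
        return max(grasps, key=lambda g: g[0])  # execute the most promising grasp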

Journal ArticleDOI
01 Oct 2022-Optik
TL;DR: In this paper , the authors describe an internal analysis algorithm for mobile robots that combines various convolutional neural network (CNN) layers with the decision-making process in a hierarchical way.

Journal ArticleDOI
TL;DR: In this paper, the distances between the buoys and the camera were calculated based on monocular and stereo vision using the detected image coordinates and compared with those from a laser distance sensor and radar.
Abstract: Aqua farms will be the most frequently encountered obstacle when autonomous ships sail along the coastal area of Korea. We used YOLOv5 to create a model that detects aquaculture buoys. The distances between the buoys and the camera were calculated based on monocular and stereo vision using the detected image coordinates and compared with those from a laser distance sensor and radar. A dataset containing 2700 images of aquaculture buoys was divided between training and testing data in the ratio of 8:2. The trained model had precision, recall, and mAP of 0.936, 0.903, and 94.3%, respectively. Monocular vision calculates the distance based on camera position estimation and water surface coordinates of maritime objects, while stereo vision calculates the distance by finding corresponding points using SSD, NCC, and ORB and then calculating the disparity. Stereo vision had error rates of −3.16% for short distances (NCC) and −14.81% for long distances (ORB); large errors were detected for objects located at a far distance. Monocular vision had error rates of 2.86% and −4.00% for short and long distances, respectively. Monocular vision is more effective than stereo vision for detecting maritime obstacles and can be employed as auxiliary sailing equipment along with radar.
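
The two range models being compared reduce to simple pinhole-camera formulas; the numbers below are illustrative examples, not values from the paper:

    f_px = 1400.0         # focal length in pixels
    baseline_m = 0.4      # stereo baseline (m)
    disparity_px = 18.0   # disparity of matched points (SSD / NCC / ORB)
    stereo_range = f_px * baseline_m / disparity_px  # Z = f*B/d, about 31.1 m

    cam_height_m = 5.0    # camera height above the water surface (m)
    dy_px = 220.0         # rows between principal point and buoy waterline
    mono_range = f_px * cam_height_m / dy_px         # flat-water model, about 31.8 m

The monocular formula assumes the buoy sits on a flat water plane at known camera height, which is why it degrades gracefully at range while stereo suffers from vanishing disparity.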

Proceedings ArticleDOI
08 Apr 2022
TL;DR: Zhang et al. as discussed by the authors proposed a monocular vision-based perception system for nighttime driving, which can detect vehicles and pedestrians on the road, together with their distances, in a nighttime driving dataset.
Abstract: The perception of objects around the vehicle is important for both advanced driving assistance system (ADAS) and autonomous driving systems. However, it is a huge challenge to recognize objects in a low-light environment. In this paper, we propose a monocular vision based perception system for nighttime driving. First, the transnational nighttime driving videos are collected, which are further processed into a low-light enhancement dataset and a nighttime object detection dataset for deep learning. Then, the GAN-based EnlightenGAN is trained for enhancing the low-light image. The CNN-based YOLOX is trained and run at inference time to detect objects. Next, to obtain reliable pixel points on objects, dense feature points are output on the enhanced image by LoFTR, which extracts global features through the transformer. On the other hand, the deep learning-based monodepth2 is used to generate a depth map from the enhanced image. Finally, the relative distance of the object is obtained by projecting and filtering the feature points on the depth map. The experiments show that this system can robustly detect vehicles and pedestrians on the road, together with their distances, in the nighttime driving dataset, which can provide effective information for vehicles and robots in nighttime scenes.
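
A data-flow sketch of the four-stage pipeline described above; the stage functions stand in for EnlightenGAN, YOLOX, LoFTR, and monodepth2, and their interfaces (as well as the median filtering) are illustrative assumptions:

    def nighttime_perception(frame, enhance, detect, match_features, estimate_depth):
        bright = enhance(frame)          # low-light enhancement (EnlightenGAN)
        boxes = detect(bright)           # object detection (YOLOX)
        points = match_features(bright)  # dense, reliable pixel points (LoFTR)
        depth = estimate_depth(bright)   # per-pixel depth map (monodepth2)
        ranges = []
        for box in boxes:
            # keep feature points inside the box, read depth there, filter outliers
            ds = sorted(depth[p.y, p.x] for p in points if box.contains(p))
            ranges.append(ds[len(ds) // 2] if ds else None)  # median distance
        return boxes, ranges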


Journal ArticleDOI
TL;DR: This work proposes an effective monocular vision-assisted method to measure the depth of an Unmanned Aerial Vehicle (UAV) from an impending frontal obstacle, followed by collision-free navigation in unknown GPS-denied environments.
Abstract: Monocular vision-based 3D scene understanding has been an integral part of many machine vision applications. The objective is always to measure depth using a single RGB camera, on par with depth cameras. In this regard, monocular vision-guided autonomous navigation of robots is rapidly gaining popularity among the research community. We propose an effective monocular vision-assisted method to measure the depth of an Unmanned Aerial Vehicle (UAV) from an impending frontal obstacle. This is followed by collision-free navigation in unknown GPS-denied environments. Our approach builds upon the fundamental principle of perspective vision that the size of an object relative to its field of view (FoV) increases as the center of projection moves closer towards the object. Our contribution involves modeling the depth followed by its realization through scale-invariant SURF features. Noisy depth measurements, arising due to external wind or turbulence in the UAV, are rectified by employing a constant-velocity Kalman filter model. Necessary control commands are then designed based on the rectified depth value to avoid the obstacle before collision. Rigorous experiments with SURF scale-invariant features reveal an overall accuracy of 88.6% with varying obstacles, in both indoor and outdoor environments.
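
The constant-velocity Kalman filter used to rectify the noisy depth measurements can be sketched as follows, with the state holding depth and depth rate; the noise covariances and frame interval are assumed tuning values, not the paper's:

    import numpy as np

    dt = 0.05                               # frame interval (s), assumed
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity transition
    H = np.array([[1.0, 0.0]])              # only depth is measured
    Q = np.diag([1e-3, 1e-2])               # process noise (assumed tuning)
    R = np.array([[0.25]])                  # measurement noise (assumed tuning)

    def kf_step(x, P, z):
        x, P = F @ x, F @ P @ F.T + Q                  # predict
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
        x = x + K @ (np.array([[z]]) - H @ x)          # update with measured depth z
        P = (np.eye(2) - K @ H) @ P
        return x, P

    x, P = np.array([[10.0], [0.0]]), np.eye(2)        # depth 10 m, zero rate
    x, P = kf_step(x, P, z=9.4)                        # one rectified depth estimate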

Journal ArticleDOI
01 Jan 2022-Sensors
TL;DR: A monocular visual pose measurement method is proposed, consisting of the high-precision optimal solution to the PnP problem (OPnP) method and the highly robust distance matching (DM) method, which can meet the measurement accuracy requirements of the vision system of the disc cutter changing robot in practical engineering applications.
Abstract: The visual measurement system plays a vital role in the disc cutter changing robot of the shield machine, and its accuracy directly determines the success rate of disc cutter grasping. However, the actual industrial environment, with its strong noise, poses a great challenge to pose measurement methods. The existing methods can hardly meet the required accuracy of machine vision-based pose measurement under disc cutter changing conditions. To solve this problem, we propose a monocular visual pose measurement method consisting of the high-precision optimal solution to the PnP problem (OPnP) method and the highly robust distance matching (DM) method. First, the OPnP method is used to calculate the rough pose of the shield machine’s cutter holder, and then the DM method is used to measure its pose accurately. Simulation results show that the proposed monocular measurement method has better accuracy and robustness than several mainstream PnP methods. The experimental results also show that the maximum error of the proposed method is 0.28° in the direction of rotation and 0.32 mm in the direction of translation, which can meet the measurement accuracy requirements of the vision system of the disc cutter changing robot in practical engineering applications.

Journal ArticleDOI
TL;DR: The experimental results demonstrated that the improved VM obstacle detection method could identify and eliminate unknown obstacles and is more accurate than other advanced detection methods.
Abstract: An obstacle detection method based on VM (VIDAR and machine learning joint detection model) is proposed to improve the monocular vision system's identification accuracy. When VIDAR (Vision-IMU-based detection and range method) detects unknown obstacles in a reflective environment, the reflections of the obstacles are identified as obstacles, reducing the accuracy of obstacle identification. We propose an obstacle detection method called improved VM to avoid this situation. The experimental results demonstrated that the improved VM could identify and eliminate unknown obstacles. Compared with more advanced detection methods, the improved VM obstacle detection method is more accurate. It can detect unknown obstacles in reflective road environments.

Proceedings ArticleDOI
15 Aug 2022
TL;DR: Zhang et al. as discussed by the authors presented a coordinate system calibration algorithm to obtain the relative pose relationship between the monocular camera and the laser transmitter, and designed a modified YOLO-v4 algorithm to further improve the speed and accuracy of target detection.
Abstract: With the overall development of artificial intelligence technology, the autonomous application of robotic arms has been significantly improved, which places extremely high requirements on the reliability of algorithms such as target detection and target positioning. In this paper, based on a 6-DOF collaborative manipulator, we propose a 3D target aiming algorithm with high real-time performance through monocular vision feed-forward control. First, we present a coordinate system calibration algorithm to obtain the relative pose relationship between the monocular camera and the laser transmitter. Second, a modified YOLO-v4 algorithm is designed to further improve the speed and accuracy of target detection. Third, we propose a monocular aiming algorithm, which can directly aim at a 3D target based on the target’s 2D pixel deviation. Through experimental verification, the single-frame detection time of our algorithm can be reduced to 14.12 ms, and the response time of the monocular aiming algorithm is only 0.61 s.
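
The monocular aiming step can be illustrated as a proportional feed-forward command driven by the target's 2D pixel deviation; the gain and the pixels-to-pan/tilt mapping below are assumptions, not the paper's controller:

    def aim_step(target_px, laser_px, k_gain=0.002):
        du = target_px[0] - laser_px[0]   # horizontal pixel deviation
        dv = target_px[1] - laser_px[1]   # vertical pixel deviation
        # proportional feed-forward command in pan/tilt space; a real system
        # would use the camera-laser extrinsics from the calibration step
        return k_gain * du, k_gain * dv

    pan_rate, tilt_rate = aim_step(target_px=(652, 344), laser_px=(640, 360))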

Journal ArticleDOI
TL;DR: In this paper, the authors proposed MLP-Depth, a lightweight monocular depth prediction method based on hierarchical multi-stage MLPs, which utilizes depth-wise convolution to improve local modeling capabilities and reduce parameters and computational costs.
Abstract: Depth prediction from monocular sensors has received continuous attention in recent years because of its wide application in autonomous driving, intelligent system navigation and other fields. Convolutional neural networks have dominated monocular depth prediction for a long time, and the recent introduction of Transformer-based and MLP-based architectures in the field of computer vision has provided some new ideas for monocular depth prediction. However, they all have a series of problems such as high computational complexity and excessive parameters. In this paper, we propose MLP-Depth, which is a lightweight monocular depth prediction method based on hierarchical multi-stage MLP, and utilizes depth-wise convolution to improve local modeling capabilities and reduce parameters and computational costs. In addition, we also design a multi-scale inverse attention mechanism to implicitly improve the global expressiveness of MLP-Depth. Our method effectively reduces the number of parameters of monocular depth prediction networks using transformer-like architectures, and extensive experiments show that MLP-Depth can achieve competitive results with fewer parameters on challenging outdoor and indoor datasets.
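
One stage of the kind of architecture described, channel-mixing MLP layers (1x1 convolutions) paired with a 3x3 depth-wise convolution for cheap local modeling, can be sketched in PyTorch as follows; the layer sizes are illustrative, not MLP-Depth's actual configuration:

    import torch
    import torch.nn as nn

    class DWMLPBlock(nn.Module):
        def __init__(self, channels: int = 64, expansion: int = 4):
            super().__init__()
            self.dw = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                                groups=channels)   # depth-wise: local mixing
            self.norm = nn.BatchNorm2d(channels)
            self.mlp = nn.Sequential(              # 1x1 convs = per-pixel MLP
                nn.Conv2d(channels, channels * expansion, 1), nn.GELU(),
                nn.Conv2d(channels * expansion, channels, 1))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = x + self.dw(x)                     # residual local modeling
            return x + self.mlp(self.norm(x))      # residual channel mixing

    feats = DWMLPBlock()(torch.randn(1, 64, 48, 160))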

Journal ArticleDOI
TL;DR: The results show that the method proposed in this paper has good real-time performance and can estimate the relative pose of non-cooperative space targets accurately and robustly.
Abstract: In many space missions such as fly-around observing and approaching the targets, the relative pose estimation of non-cooperative space targets is one of the key technologies. In this paper, a relative pose estimation method of non-cooperative space targets based on a monocular camera and a laser rangefinder is proposed. The monocular camera is used to obtain the sequence image of the target. The laser rangefinder is used to solve the scale fuzziness problem of the monocular camera and construct the world coordinate system in real scale during initialization. The camera data and the laser rangefinder data are fused in a tightly coupled form to optimize the estimated pose in continuous pose estimation. The non-cooperative space target images generated by Blender are used for simulation. The results show that the method proposed in this paper has good real-time performance and can estimate the relative pose of non-cooperative space targets accurately and robustly. Compared with the existing methods based on monocular vision, the proposed method does not require the initial pose assumption and can effectively improve the accuracy of the pose estimation.
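
The rangefinder's role in fixing the monocular scale ambiguity at initialization reduces to a one-line ratio, sketched below with illustrative values:

    import numpy as np

    def recover_scale(laser_range_m: float, point_up_to_scale: np.ndarray) -> float:
        """point_up_to_scale: the lased point triangulated in the unscaled frame."""
        return laser_range_m / float(np.linalg.norm(point_up_to_scale))

    scale = recover_scale(12.6, np.array([0.3, -0.1, 1.0]))
    # subsequent monocular translations are multiplied by `scale` to get metres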