
Showing papers on "Monocular vision" published in 2021


Journal ArticleDOI
TL;DR: A novel application of the image-to-world homography which gives the monocular vision system the efficacy of counting vehicles by lane and estimating vehicle length and speed in real-world units.
Abstract: Cameras have been widely used in traffic operations. While many technologically smart camera solutions in the market can be integrated into Intelligent Transport Systems (ITS) for automated detection, monitoring and data generation, many Network Operations (a.k.a Traffic Control) Centres still use legacy camera systems as manual surveillance devices. In this paper, we demonstrate effective use of these older assets by applying computer vision techniques to extract traffic data from videos captured by legacy cameras. In our proposed vision-based pipeline, we adopt recent state-of-the-art object detectors and transfer-learning to detect vehicles, pedestrians, and cyclists from monocular videos. By weakly calibrating the camera, we demonstrate a novel application of the image-to-world homography which gives our monocular vision system the efficacy of counting vehicles by lane and estimating vehicle length and speed in real-world units. Our pipeline also includes a module which combines a convolutional neural network (CNN) classifier with projective geometry information to classify vehicles. We have tested it on videos captured at several sites with different traffic flow conditions and compared the results with the data collected by piezoelectric sensors. Our experimental results show that the proposed pipeline can process 60 frames per second for pre-recorded videos and yield high-quality metadata for further traffic analysis.
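To make the image-to-world homography step above concrete, the sketch below shows how four road-plane correspondences can be used to map detected vehicle pixels into metric road coordinates and derive a lane index and speed. All point coordinates, the lane width, and the timings are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: map image pixels to road-plane coordinates with a homography
# (illustrative values; the paper's calibration procedure is not reproduced here).
import numpy as np
import cv2

# Four image points (pixels) and their known road-plane positions (metres),
# e.g. lane-marking corners measured on site -- hypothetical values.
img_pts = np.array([[412, 710], [880, 705], [600, 380], [660, 378]], dtype=np.float32)
world_pts = np.array([[0.0, 0.0], [3.5, 0.0], [0.0, 40.0], [3.5, 40.0]], dtype=np.float32)

H, _ = cv2.findHomography(img_pts, world_pts)  # image -> world (road plane)

def to_world(pixel_xy):
    """Project a pixel onto the road plane (metres)."""
    p = np.array([[pixel_xy]], dtype=np.float32)          # shape (1, 1, 2)
    return cv2.perspectiveTransform(p, H)[0, 0]

# Speed estimate from two detections of the same vehicle, dt seconds apart.
p1, p2, dt = to_world((640, 600)), to_world((638, 520)), 0.4
speed_kmh = np.linalg.norm(p2 - p1) / dt * 3.6
print(f"estimated speed: {speed_kmh:.1f} km/h")

# Lane index from the world x-coordinate, assuming 3.5 m lanes (assumption).
lane = int(to_world((640, 600))[0] // 3.5)
print("lane:", lane)
```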

30 citations


Journal ArticleDOI
30 Jun 2021-Sensors
TL;DR: In this paper, an approach using a monocular camera and a marker was proposed to estimate the pose parameters (including orientation and position) of the excavator manipulator; the results showed that the maximum detectable depth of the system is greater than 11 m, the orientation error is less than 8.5°, and the position error is less than 22 mm.
Abstract: Excavation is one of the broadest activities in the construction industry, and it is often affected by safety and productivity problems. To address these problems, construction sites need to automatically monitor the poses of excavator manipulators in real time. Based on computer vision (CV) technology, an approach using a monocular camera and a marker was proposed to estimate the pose parameters (including orientation and position) of the excavator manipulator. To simulate the pose estimation process, a measurement system was established with a common camera and marker. Comprehensive experiments and error analysis showed that the maximum detectable depth of the system is greater than 11 m, the orientation error is less than 8.5°, and the position error is less than 22 mm. A prototype of the system was tested, proving the feasibility of the proposed method. Furthermore, this study provides an alternative CV technology for monitoring construction machines.
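The marker-based pose estimation described above boils down to a perspective-n-point solve from the marker's known geometry to its detected image corners. The sketch below uses OpenCV's solvePnP for a square marker; the marker size, camera intrinsics, and corner pixels are illustrative assumptions rather than the paper's measurement setup.

```python
# Minimal sketch of marker-based pose estimation with a monocular camera,
# in the spirit of the approach above (values and corner order are illustrative).
import numpy as np
import cv2

marker_size = 0.20  # marker edge length in metres (assumption)
# 3D corners of a square marker in its own frame (z = 0 plane).
obj_pts = np.array([
    [-marker_size / 2,  marker_size / 2, 0],
    [ marker_size / 2,  marker_size / 2, 0],
    [ marker_size / 2, -marker_size / 2, 0],
    [-marker_size / 2, -marker_size / 2, 0],
], dtype=np.float32)

# Detected corner pixels (e.g. from an ArUco detector) -- hypothetical values.
img_pts = np.array([[512, 300], [640, 305], [635, 430], [508, 425]], dtype=np.float32)

K = np.array([[1200, 0, 640], [0, 1200, 360], [0, 0, 1]], dtype=np.float64)  # intrinsics (assumed)
dist = np.zeros(5)  # assume negligible lens distortion

ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist, flags=cv2.SOLVEPNP_IPPE_SQUARE)
R, _ = cv2.Rodrigues(rvec)            # orientation of marker in camera frame
print("position (m):", tvec.ravel())  # translation = marker origin in camera frame
print("depth along optical axis (m):", tvec.ravel()[2])
```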

14 citations


Journal ArticleDOI
TL;DR: A monocular vision-based calibration method for low-frequency vibration sensors, in which a sub-pixel edge detection method based on Gaussian curve fitting is applied to extract the edges of motion sequence images so that the excitation acceleration of the sensors can be measured accurately.
Abstract: Calibration is required to determine the frequency characteristics of vibration sensors to ensure their measurement accuracy in engineering applications. Thus, a monocular vision-based calibration method for low-frequency vibration sensors is investigated in this study. A sub-pixel edge detection method based on Gaussian curve fitting is applied to extract the edges of motion sequence images in order to accurately measure the excitation acceleration of the sensors. Because the motion sequence images and the output signal of the sensors cannot be collected synchronously, it is very difficult to align the excitation acceleration signal obtained from the images with the output signal in the time domain. Although the misalignment has only a negligible effect on the magnitude frequency characteristic calibration, it dramatically decreases the calibration accuracy of the phase frequency characteristic, especially as the frequency increases. A time-spatial synchronization technique is proposed to accurately calibrate the phase frequency characteristic by determining the phase of the excitation acceleration signal at a specific spatial position and that of the output signal at the corresponding time. Finally, both the magnitude and phase frequency characteristics are simultaneously calibrated by the investigated method with a flexible and low-cost vision system. The experimental results, compared with laser interferometry, show that the investigated method accomplishes high-accuracy magnitude and phase frequency characteristic calibration in a broad low-frequency range. Its calibration accuracy is superior to that of laser interferometry when the frequency is less than 0.3 Hz, and is equivalent at other frequencies.
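One common way to realise sub-pixel edge detection by Gaussian curve fitting, as named above, is to fit a Gaussian to the intensity-gradient profile across the edge and take its mean as the edge position. The sketch below illustrates that idea on a synthetic 1-D profile; it is not the paper's exact formulation.

```python
# Minimal sketch of sub-pixel edge localisation by Gaussian fitting of the
# intensity-gradient profile across an edge (a common reading of the idea above).
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, mu, sigma):
    return a * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Synthetic 1-D intensity profile across an edge located at x = 10.3 pixels.
x = np.arange(20, dtype=float)
true_edge = 10.3
profile = 50 + 150 / (1 + np.exp(-(x - true_edge) / 0.8))   # smooth step edge
profile += np.random.default_rng(0).normal(0, 1.0, x.size)  # sensor noise

grad = np.gradient(profile)                    # gradient peaks at the edge
p0 = [grad.max(), float(np.argmax(grad)), 1.0] # initial guess from the pixel-level peak
(a, mu, sigma), _ = curve_fit(gaussian, x, grad, p0=p0)

print(f"pixel-level edge: {np.argmax(grad)}  sub-pixel edge: {mu:.2f} (true {true_edge})")
```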

14 citations


Journal ArticleDOI
TL;DR: A 6-D pose estimation method based on monocular vision is proposed, comprising a feature detection method and a pose calculation method improved by a weighting coefficient; the method was applied to the assembly of large gear structures.

12 citations


Journal ArticleDOI
TL;DR: A novel learning-based framework that enables the quadrotor to realize autonomous obstacle avoidance without any prior environment information or labeled datasets for training; its model can be easily updated when facing new application scenarios.

12 citations


Proceedings ArticleDOI
17 Oct 2021
TL;DR: Chen et al. proposed a neighbor-voting method that incorporates neighbor predictions to ameliorate object detection from severely deformed pseudo-LiDAR point clouds.
Abstract: As cameras are increasingly deployed in new application domains such as autonomous driving, performing 3D object detection on monocular images becomes an important task for visual scene understanding. Recent advances in monocular 3D object detection mainly rely on "pseudo-LiDAR" generation, which performs monocular depth estimation and lifts the 2D pixels to pseudo 3D points. However, depth estimation from monocular images, due to its poor accuracy, leads to inevitable position shift of pseudo-LiDAR points within the object. Therefore, the predicted bounding boxes may suffer from inaccurate location and deformed shape. In this paper, we present a novel neighbor-voting method that incorporates neighbor predictions to ameliorate object detection from severely deformed pseudo-LiDAR point clouds. Specifically, each feature point around the object forms its own prediction, and then a "consensus" is achieved through voting. In this way, we can effectively combine the neighbors' predictions with the local prediction and achieve more accurate 3D detection. To further enlarge the difference between the foreground region of interest (ROI) pseudo-LiDAR points and the background points, we also encode the ROI prediction scores of 2D foreground pixels into the corresponding pseudo-LiDAR points. We conduct extensive experiments on the KITTI benchmark to validate the merits of our proposed method. Our results on bird's eye view detection outperform the state-of-the-art performance, especially for the "hard" level detection. The code is available at https://github.com/cxmomo/Neighbor-Vote.
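The "pseudo-LiDAR" generation step referred to above lifts each pixel with an estimated depth into a 3-D point using the camera intrinsics. A minimal sketch of that lifting (with KITTI-like, assumed intrinsics and a placeholder depth map) is shown below; the neighbor-voting detector itself is not reproduced.

```python
# Minimal sketch of the pseudo-LiDAR lifting step: back-project every pixel with an
# estimated depth into a 3D point cloud using camera intrinsics (illustrative values).
import numpy as np

def lift_to_pseudo_lidar(depth, K):
    """depth: (H, W) metric depth map; K: 3x3 intrinsics. Returns (H*W, 3) points."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

K = np.array([[721.5, 0, 609.6], [0, 721.5, 172.9], [0, 0, 1.0]])  # KITTI-like intrinsics
depth = np.full((375, 1242), 20.0)             # placeholder depth map (20 m everywhere)
points = lift_to_pseudo_lidar(depth, K)
print(points.shape)                            # (465750, 3)
```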

12 citations


Journal ArticleDOI
TL;DR: In this paper, a new initialization scheme that can calculate the acceleration bias as a variable during the initialization process so that it can be applied to low-cost IMU sensors is proposed.
Abstract: Simultaneous Localization and Mapping (SLAM) has been a focus of robot navigation for many decades and has become a research hotspot in recent years. A SLAM system based on a vision sensor is vulnerable to environmental illumination and texture, and the problem of initial scale ambiguity still exists in a monocular SLAM system. The fusion of a monocular camera and an inertial measurement unit (IMU) can effectively solve the scale ambiguity problem, improve the robustness of the system, and achieve higher positioning accuracy. Based on the monocular visual-inertial navigation system (VINS-mono), a state-of-the-art fusion of monocular vision and IMU, this paper designs a new initialization scheme that calculates the acceleration bias as a variable during the initialization process so that it can be applied to low-cost IMU sensors. Besides, in order to obtain better initialization accuracy, a visual matching positioning method based on feature points is used to assist the initialization process. After initialization, the system switches to an optical-flow-tracking visual positioning mode to reduce computational complexity. By using the proposed method, the advantages of the feature point method and the optical flow method can be fused. The proposed system, the first to use both the feature point method and the optical flow method, achieves better overall positioning accuracy and robustness with low-cost sensors. Experiments conducted with the EuRoc dataset and in a campus environment show that the initial values obtained through the initialization process can be efficiently used to launch the nonlinear visual-inertial state estimator, and that the positioning accuracy of the improved VINS-mono is about 10% better than that of VINS-mono.

11 citations


Proceedings ArticleDOI
05 Jul 2021
TL;DR: A large-scale indoor robotics stereo (IRS) dataset with over 100K stereo images and high-quality surface normal and disparity maps is introduced, and DTN-Net, a two-stage deep model for surface normal estimation is presented.
Abstract: Indoor robotics localization, navigation, and interaction heavily rely on scene understanding and reconstruction. Compared to the monocular vision which usually does not explicitly introduce any geometrical constraint, stereo vision-based schemes are more promising and robust to produce accurate geometrical information, such as surface normal and depth/disparity. Besides, deep learning models trained with large-scale datasets have shown their superior performance in many stereo vision tasks. However, existing stereo datasets rarely contain the high-quality surface normal and disparity ground truth, which hardly satisfies the demand of training a prospective deep model for indoor scenes. To this end, we introduce a large-scale synthetic but naturalistic indoor robotics stereo (IRS) dataset with over 100K stereo RGB images and high-quality surface normal and disparity maps. Leveraging the advanced rendering techniques of our customized rendering engine, the dataset is considerably close to the real-world captured images and covers several visual effects, such as brightness changes, light reflection/transmission, lens flare, vivid shadow, etc. We compare the data distribution of IRS with existing stereo datasets to illustrate the typical visual attributes of indoor scenes. Besides, we present DTN-Net, a two-stage deep model for surface normal estimation. Extensive experiments show the advantages and effectiveness of IRS in training deep models for disparity estimation, and DTN-Net provides state-of-the-art results for normal estimation compared to existing methods.
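As a companion to the surface-normal ground truth discussed above, the sketch below shows one standard way to derive per-pixel normals from a depth map: back-project to 3-D and take the cross product of local tangent vectors. The intrinsics and the synthetic depth map are assumptions for illustration; this is not DTN-Net.

```python
# Minimal sketch of deriving per-pixel surface normals from a depth map,
# the kind of geometric ground truth a dataset like IRS provides.
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """Back-project to 3D, then take the cross product of local tangent vectors."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    P = np.dstack([X, Y, depth])
    dPdu = np.gradient(P, axis=1)      # tangent along image columns
    dPdv = np.gradient(P, axis=0)      # tangent along image rows
    n = np.cross(dPdu, dPdv)
    n /= np.linalg.norm(n, axis=2, keepdims=True) + 1e-9
    return n                            # (H, W, 3) unit normals

depth = 2.0 + 0.001 * np.arange(480)[:, None] * np.ones((480, 640))  # a gently tilted plane
normals = normals_from_depth(depth, fx=600, fy=600, cx=320, cy=240)
print(normals[240, 320])  # roughly constant normal for a planar surface
```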

11 citations


Posted Content
TL;DR: In this paper, a comprehensive review of recent progress in object pose detection and tracking that belongs to the deep learning technical route is presented, together with insightful observations and inspiring future research directions.
Abstract: Object pose detection and tracking has recently attracted increasing attention due to its wide applications in many areas, such as autonomous driving, robotics, and augmented reality. Among methods for object pose detection and tracking, deep learning is the most promising one and has shown better performance than others. However, there is a lack of survey studies on the latest developments of deep-learning-based methods. Therefore, this paper presents a comprehensive review of recent progress in object pose detection and tracking along the deep learning technical route. To achieve a more thorough introduction, the scope of this paper is limited to methods taking monocular RGB/RGBD data as input, covering three kinds of major tasks: instance-level monocular object pose detection, category-level monocular object pose detection, and monocular object pose tracking. In our work, metrics, datasets, and methods for both detection and tracking are presented in detail. Comparative results of current state-of-the-art methods on several publicly available datasets are also presented, together with insightful observations and inspiring future research directions.

10 citations


Journal ArticleDOI
TL;DR: In this paper, a method is proposed for an AUV to reconstruct the docking ring autonomously under the guidance of visual information, based on the knowledge of self-reconfiguration of two AUVs in an underwater environment.
Abstract: In this work, we propose a method for an AUV to reconstruct the docking ring autonomously under the guidance of visual information. The proposed method is based on the knowledge of self-reconfiguration of two AUVs in an underwater environment. In order to enable long-distance performance, we make use of blue-green light beacons. The light beacons are detected using the YOLO V3 algorithm. In addition, the P4P algorithm is applied to resolve the relative poses of the AUVs. At short distances, we use an ArUco marker to precisely guide the AUV towards the docking ring. We also use the KCF algorithm to shorten the recognition time, and a Kalman filter to eliminate the interference of occlusions and jitter. The feasibility of the proposed method is verified by multiple experiments. Under a reasonable control flow and visual algorithm, the proposed solution avoids the situation where long-distance monocular recognition has low accuracy and is prone to misjudgment. The underwater multi-target integrated guidance method based on monocular vision is convenient for the recovery of underwater AUVs, and can effectively combine the advantages of a wide effective guidance range and the high precision of close-range guidance.
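The Kalman filtering step mentioned above can be illustrated with a constant-velocity filter on the beacon/marker image position, which carries the estimate through brief occlusions and suppresses jitter. The noise levels, frame rate, and measurements below are assumptions; the KCF tracker and YOLO detector are not shown.

```python
# Minimal sketch: smooth a detected beacon/marker image position with a
# constant-velocity Kalman filter so occlusions and jitter do not break guidance.
import numpy as np

dt = 1 / 30
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], float)
Hm = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
Q, R = np.eye(4) * 1e-2, np.eye(2) * 4.0      # process / measurement noise (assumed)

x, P = np.array([320.0, 240.0, 0.0, 0.0]), np.eye(4) * 10.0

def kf_step(x, P, z=None):
    x, P = F @ x, F @ P @ F.T + Q              # predict
    if z is not None:                          # update only when the detector fires
        S = Hm @ P @ Hm.T + R
        K = P @ Hm.T @ np.linalg.inv(S)
        x = x + K @ (z - Hm @ x)
        P = (np.eye(4) - K @ Hm) @ P
    return x, P

for z in [np.array([322, 243]), None, None, np.array([330, 250])]:  # None = occluded frame
    x, P = kf_step(x, P, z)
print(x[:2])   # smoothed pixel position carried through the occlusion
```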

9 citations


Journal ArticleDOI
TL;DR: DeepFoveaNet is a convolutional neural network model for detecting moving objects in video sequences; through its deep fovea model it can detect very small moving objects that other algorithms cannot.
Abstract: Birds of prey, especially eagles and hawks, have visual acuity two to five times better than that of humans. Among the peculiar characteristics of their biological vision is that they have two types of foveae: a shallow fovea used in their binocular vision, and a deep fovea for monocular vision. The deep fovea allows these birds to see objects at long distances and to identify them as possible prey. Inspired by the biological functioning of the deep fovea, a model called DeepFoveaNet is proposed in this paper. DeepFoveaNet is a convolutional neural network model to detect moving objects in video sequences. DeepFoveaNet emulates the monocular vision of birds of prey through two encoder-decoder convolutional neural network modules. This model combines the magnification capacity of the deep fovea and the context information of peripheral vision. Unlike the moving-object detection algorithms ranked in the first places of the Change Detection database (CDnet14), DeepFoveaNet depends neither on previously trained neural networks nor on a huge number of training images. Besides, its architecture allows it to learn spatiotemporal information of the video. DeepFoveaNet was evaluated on the CDnet14 database, achieved high performance, and was ranked as one of the ten best algorithms. The characteristics and results of DeepFoveaNet demonstrate that the model is comparable to state-of-the-art moving-object detection algorithms, and it can detect very small moving objects through its deep fovea model that other algorithms cannot detect.

Journal ArticleDOI
16 Sep 2021-Sensors
TL;DR: In this paper, the authors used a model-based enhancement scheme to improve the quality and brightness of onboard captured images, then presented a hierarchical-based method consisting of a decision tree with an associated light-weight convolutional neural network (CNN) for coarse-to-fine landing marker localization, where the key information of the marker is extracted and reserved for post-processing.
Abstract: Landing an unmanned aerial vehicle (UAV) autonomously and safely is a challenging task. Although the existing approaches have resolved the problem of precise landing by identifying a specific landing marker using the UAV’s onboard vision system, the vast majority of these works are conducted in either daytime or well-illuminated laboratory environments. In contrast, very few researchers have investigated the possibility of landing in low-illumination conditions by employing various active light sources to lighten the markers. In this paper, a novel vision system design is proposed to tackle UAV landing in outdoor extreme low-illumination environments without the need to apply an active light source to the marker. We use a model-based enhancement scheme to improve the quality and brightness of the onboard captured images, then present a hierarchical-based method consisting of a decision tree with an associated light-weight convolutional neural network (CNN) for coarse-to-fine landing marker localization, where the key information of the marker is extracted and reserved for post-processing, such as pose estimation and landing control. Extensive evaluations have been conducted to demonstrate the robustness, accuracy, and real-time performance of the proposed vision system. Field experiments across a variety of outdoor nighttime scenarios with an average luminance of 5 lx at the marker locations have proven the feasibility and practicability of the system.
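As a stand-in for the model-based enhancement scheme mentioned above, the sketch below applies a plain gamma brightening before marker detection. It only illustrates the brighten-then-detect idea; the paper's enhancement model and the hierarchical marker localizer are not reproduced.

```python
# A simple stand-in for low-light enhancement (plain gamma correction); the paper's
# model-based scheme is more elaborate, so treat this only as an illustration.
import numpy as np
import cv2

def brighten(img_bgr, gamma=2.2):
    """Apply inverse-gamma brightening to an 8-bit BGR image via a lookup table."""
    lut = ((np.arange(256) / 255.0) ** (1.0 / gamma) * 255).astype(np.uint8)
    return cv2.LUT(img_bgr, lut)

dark = (np.random.default_rng(0).random((360, 640, 3)) * 30).astype(np.uint8)  # synthetic dark frame
bright = brighten(dark, gamma=2.5)
print(dark.mean(), "->", bright.mean())   # mean intensity increases after enhancement
```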


Journal ArticleDOI
TL;DR: In this article, a calibration-free monocular vision-based robot manipulation approach is proposed based on domain randomization and deep reinforcement learning (DRL) to estimate the spatial information of the target from a single monocular camera arbitrarily mounted in the manipulation environment.
Abstract: Vision-based manipulation has been widely used in various robot applications. Normally, in order to obtain the spatial information of the operated target, a carefully calibrated stereo vision system is required. However, this limits the application of robots in unstructured environments, where both the number and the pose of the cameras are restricted. In this study, a calibration-free monocular vision-based robot manipulation approach is proposed based on domain randomization and deep reinforcement learning (DRL). Firstly, a learning strategy combined with domain randomization is developed to estimate the spatial information of the target from a single monocular camera arbitrarily mounted in a large area of the manipulation environment. Secondly, to address the monocular occlusion problem which regularly happens during robot manipulation, an occlusion-aware DRL policy has been designed to control the robot to actively avoid occlusions in the manipulation tasks. The performance of our method has been evaluated on two common manipulation tasks, reaching and lifting of a target building block, which demonstrate the efficiency and effectiveness of our proposed approach.


Journal ArticleDOI
TL;DR: In this paper, a research method that fuses image enhancement with robot monocular vision so that the robot can adapt to various levels of illumination running along the transmission line is presented.
Abstract: Obstacle distance measurement is one of the key technologies for autonomous navigation of high-voltage transmission line inspection robots. To address the robustness of obstacle distance measurement under varying illumination conditions, this article develops a research method that fuses image enhancement with robot monocular vision so that the robot can adapt to various levels of illumination while running along the transmission line. For inspection of high-voltage transmission lines in an overexposed (excessively bright) environment, a specular highlight suppression method is proposed to suppress the specular reflections in an image; when scene illumination is insufficient, a robust low-light image enhancement method based on a tone mapping algorithm with weighted guided filtering is presented. Based on the monocular vision measurement principle, the error generation mechanism is analyzed through experiments, and we introduce a parameter modification mechanism. The two proposed image enhancement methods outperform other state-of-the-art enhancement algorithms in qualitative and quantitative analyses. The experimental results show that the measurement error is less than 3% for static distance measurements and less than 5% for dynamic distance measurements within 6 m. The proposed method can meet the requirements of high-accuracy positioning, real-time performance, and strong robustness. This method greatly contributes to the sustainable development of inspection robots in the power industry.
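The monocular distance measurement principle underlying the method above can be written in one line under a pinhole model: an object of known width W metres spanning w pixels at focal length f pixels lies at roughly d = f·W / w. The numbers in the sketch below are illustrative assumptions.

```python
# Pinhole ranging principle: distance = focal_length_px * real_width_m / pixel_width.
def monocular_distance(focal_px, real_width_m, pixel_width):
    return focal_px * real_width_m / pixel_width

# e.g. an obstacle 0.30 m wide imaged over 60 px with a 1400 px focal length (assumed values)
print(f"{monocular_distance(1400, 0.30, 60):.2f} m")   # 7.00 m
```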

Journal ArticleDOI
Feng Gao, Fang Deng, Linhan Li, Zhang Lele, Jiaqi Zhu, Chengpu Yu
TL;DR: In this paper, a camera pose correction method via pixel mapping is designed to correct the pose of the camera, and anchor-based methods are then used to improve the detection ability for long-range targets with small image regions.
Abstract: Traditional monocular vision localization methods are usually suitable for short-range area and indoor relative positioning tasks. This paper presents MGG, a novel monocular global geolocation method for outdoor long-range targets. This method takes a single RGB image combined with necessary navigation parameters as input and outputs targets’ GPS information under the Global Navigation Satellite System (GNSS). In MGG, we first design a camera pose correction method via pixel mapping to correct the pose of the camera. Then, we use anchor-based methods to improve the detection ability for long-range targets with small image regions. Next, the local monocular vision model (LMVM) with a local structure coefficient is proposed to establish an accurate 2D-to-3D mapping relationship. Subsequently, a soft correspondence constraint (SCC) is presented to solve the local structure coefficient, which can weaken the coupling degree between detection and localization. Finally, targets can be geolocated through optimization theory-based methods and a series of coordinate transformations. Furthermore, we demonstrate the importance of focal length on solving the error explosion problem in locating long-range targets with monocular vision. Extensive experiments on the challenging KITTI dataset as well as applications in outdoor environments with targets located at a long range of up to 150 meters show the superiority of our method.

Proceedings ArticleDOI
15 Jun 2021
TL;DR: In this article, the authors implemented, integrated and evaluated the effectiveness of using a low cost, wide angle monocular camera with real-time computer vision algorithms to detect and track other UAVs in local airspace and perform collision avoidance.
Abstract: The use of unmanned aerial vehicles (UAVs), or drones, has become ubiquitous in recent years. Collision avoidance is a critical component of path planning, allowing multi-agent networks of cooperative UAVs to work together towards common objectives while avoiding each other. We implemented, integrated, and evaluated the effectiveness of using a low-cost, wide-angle monocular camera with real-time computer vision algorithms to detect and track other UAVs in local airspace and perform collision avoidance in the event of communications degradation or the presence of non-cooperative adversaries. The system was evaluated through experimental flight tests in which the UAVs were set on collision courses.

Journal ArticleDOI
01 Feb 2021
TL;DR: Binocular structured light vision is used for 3D reconstruction, obtaining 3D coordinates based on the triangulation principle that there is only one intersection in space between the camera's line equation and the projector's plane equation.
Abstract: There are many studies on 3D reconstruction based on monocular vision, but for parts with complex surfaces, contour occlusion problems occur, which requires binocular or multi-view vision for 3D reconstruction. This paper mainly uses binocular structured light vision to study 3D reconstruction. The specific methods are: use structured light coding to calibrate the binocular camera, the projector, and the left and right cameras to obtain the calibration parameters, and then obtain the 3D coordinates based on the triangulation principle that there is only one intersection in space between the camera's line equation and the projector's plane equation. Because binocular structured light is used in this article, the point clouds of the left and right cameras need to be merged to complete the stitching of the two point clouds. Verification of this method shows that the stitching error is small, and the point cloud can be streamlined later to improve the point cloud registration rate.
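The triangulation principle stated above, that a camera ray and a projector stripe plane intersect in exactly one point, can be sketched in a few lines of linear algebra. The intrinsics and plane parameters below are illustrative; decoding the structured-light pattern and the two-camera stitching are not shown.

```python
# Minimal sketch of ray-plane triangulation: a camera pixel defines a ray, the decoded
# structured-light stripe defines a projector plane, and their intersection is the 3D point.
import numpy as np

def intersect_ray_plane(ray_dir, plane_n, plane_d):
    """Camera ray X = t * ray_dir (camera at origin); plane n.X + d = 0."""
    t = -plane_d / (plane_n @ ray_dir)
    return t * ray_dir

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])   # camera intrinsics (assumed)
u, v = 350.0, 260.0
ray = np.linalg.inv(K) @ np.array([u, v, 1.0])        # ray direction in camera frame

plane_n = np.array([0.9, 0.0, -0.4])                   # projector stripe plane (camera frame, assumed)
plane_d = 0.5
print(intersect_ray_plane(ray, plane_n, plane_d))      # 3D point on the surface
```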

Journal ArticleDOI
27 Jul 2021
TL;DR: An adaptive dynamic controller based on monocular vision for the tracking of objects with a three-degrees of freedom (DOF) Scara robot manipulator that considers the robot dynamics, the depth of the moving object, and the mounting of the fixed camera to be unknown.
Abstract: In the present work, we develop an adaptive dynamic controller based on monocular vision for the tracking of objects with a three-degrees of freedom (DOF) Scara robot manipulator. The main characteristic of the proposed control scheme is that it considers the robot dynamics, the depth of the moving object, and the mounting of the fixed camera to be unknown. The design of the control algorithm is based on an adaptive kinematic visual servo controller whose objective is the tracking of moving objects even with uncertainties in the parameters of the camera and its mounting. The design also includes a dynamic controller in cascade with the former one whose objective is to compensate the dynamics of the manipulator by generating the final control actions to the robot even with uncertainties in the parameters of its dynamic model. Using Lyapunov’s theory, we analyze the two proposed adaptive controllers for stability properties, and, through simulations, the performance of the complete control scheme is shown.

Proceedings ArticleDOI
01 Jul 2021
TL;DR: In this paper, a robust approach for obstacle detection and avoidance algorithm using a single camera was proposed, which is able to use edges as keypoints along with pixel gradient and achieved promising results.
Abstract: This paper proposes a robust obstacle detection and avoidance algorithm using a single camera. Monocular vision with a single-camera architecture cannot identify depth from a single image and thus depends on pixel gradients or keypoint extractors to identify traversable paths and obstacles. Pixel gradients do not work well where there are shadows and sharp illumination changes, and keypoint extractors do not work well in the absence of dense texture. In this paper we propose an algorithm that is able to use edges as keypoints along with pixel gradients. The entire algorithm was successfully tested on the Sphero RVR rover platform, which uses a Raspberry Pi and a color camera with IR. The proposed method performs well in obstacle detection and obstacle avoidance and is potentially an alternative to a binocular solution.
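A minimal sketch of the combined edge-plus-gradient cue described above is given below: Canny edges supply keypoints in low-texture regions while the gradient magnitude handles textured ground, and the image half with fewer flagged pixels is preferred. The thresholds, morphology, and steering rule are illustrative assumptions, not the paper's tuned pipeline.

```python
# Minimal sketch of combining edges with pixel gradients to flag obstacle pixels.
import numpy as np
import cv2

def obstacle_mask(frame_bgr, grad_thresh=40):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)                          # edges as "keypoints"
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    grad = cv2.magnitude(gx, gy)                              # pixel-gradient cue
    mask = ((grad > grad_thresh) | (edges > 0)).astype(np.uint8) * 255
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))

frame = (np.random.default_rng(0).random((240, 320, 3)) * 255).astype(np.uint8)  # placeholder frame
mask = obstacle_mask(frame)
# A simple steering rule: turn toward the image half with fewer flagged pixels.
left, right = mask[:, :160].mean(), mask[:, 160:].mean()
print("steer", "left" if left < right else "right")
```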

Journal ArticleDOI
TL;DR: A stereo-vision based pedestrian detection and collision avoidance system for AVs that uses two cameras fixed at a specific distance apart to scan the environment and is promising in terms of prediction accuracy and minimizing fatalities.

Journal ArticleDOI
Qiang Lu, Haibo Zhou, Zhiqiang Li, Xia Ju, Shuaixia Tan, Ji’an Duan
TL;DR: A low-cost pose measurement method based on monocular vision, which can accurately determine the pose in the environment even with image shadow and noise, is proposed, together with an improved method combining pose measurement and kinematic parameter identification to calibrate a five-axis motion platform.
Abstract: In order to solve the problem of high measurement cost and complex operation of position-independent geometric errors (PIGEs) calibration on a five-axis motion platform, this paper first proposes a low-cost pose measurement method, based on monocular vision, which can accurately determine the pose in the environment, even with image shadow and noise. Next, an improved method, combining pose measurement and kinematic parameters identification, is proposed to calibrate a five-axis motion platform. The kinematic error model of the platform and the pose planning of automatic image acquisition are established, providing the pose data and motor position data, required for calibration. Combined with the kinematic loop method, the kinematic parameters of the five-axis motion platform are identified, while the geometric structure parameters are accurately calibrated. Before and after calibration, a circular trajectory of the target coordinate system (TCS) origin, relative to the camera coordinate system (CCS), is used to test the comprehensive accuracy evolution of the five-axis motion platform, by comparing the position and orientation errors of the theoretical circle trajectory to the actual one. The experimental data show that, before and after calibration, the average position error of the five-axis motion platform is reduced by 79.46%, while the average direction error is reduced by 86.53%. The experimental results clearly demonstrate that the proposed calibration method significantly improves the comprehensive motion accuracy of the five-axis motion platform, and they verify the practical value and effectiveness of the calibration scheme.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the relationship between conscious perception and the generation and amplitude of perceptual echoes (PE) and found that the alpha power of the PE generated by the consciously perceived stimulus was comparable with that of the PE generated during monocular vision (control condition) and higher than that of the PE induced by the suppressed stimulus.
Abstract: Alpha rhythms (∼10Hz) in the human brain are classically associated with idling activities, being predominantly observed during quiet restfulness with closed eyes. However, recent studies demonstrated that alpha (∼10Hz) rhythms can directly relate to visual stimulation, resulting in oscillations, which can last for as long as one second. This alpha reverberation, dubbed perceptual echoes (PE), suggests that the visual system actively samples and processes visual information within the alpha-band frequency. Although PE have been linked to various visual functions, their underlying mechanisms and functional role are not completely understood. In this study, we investigated the relationship between conscious perception and the generation and the amplitude of PE. Specifically, we displayed two coloured Gabor patches with different orientations on opposite sides of the screen, and using a set of dichoptic mirrors, we induced a binocular rivalry between the two stimuli. We asked participants to continuously report which one of two Gabor patches they consciously perceived, while recording their EEG signals. Importantly, the luminance of each patch fluctuated randomly over time, generating random sequences from which we estimated two impulse-response functions (IRFs) reflecting the PE generated by the perceived (dominant) and non-perceived (suppressed) stimulus, respectively. We found that the alpha power of the PE generated by the consciously perceived stimulus was comparable with that of the PE generated during monocular vision (control condition) and higher than the PE induced by the suppressed stimulus. Moreover, confirming previous findings, we found that all PEs propagated as a travelling wave from posterior to frontal brain regions, irrespective of conscious perception. All in all our results demonstrate a correlation between conscious perception and PE, suggesting that the synchronization of neural activity plays an important role in visual sampling and conscious perception.
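The impulse-response functions (perceptual echoes) mentioned above are estimated from the random luminance sequences; for a white input, the IRF is proportional to the input-output cross-correlation. The sketch below demonstrates that estimator on synthetic data and is not the study's analysis pipeline.

```python
# Minimal sketch: estimate an impulse-response function (perceptual echo) by
# cross-correlating a random luminance sequence with a synthetic EEG signal.
import numpy as np

fs, dur, lags = 160, 300, 160                   # sample rate (Hz), seconds, IRF length (assumed)
rng = np.random.default_rng(1)
lum = rng.standard_normal(fs * dur)             # random (white) luminance sequence

t = np.arange(lags) / fs
true_irf = np.exp(-t / 0.3) * np.sin(2 * np.pi * 10 * t)   # a decaying ~10 Hz echo
eeg = np.convolve(lum, true_irf)[: lum.size] + 0.5 * rng.standard_normal(lum.size)

# For a white input, the IRF is proportional to the input-output cross-correlation.
irf_hat = np.array([np.dot(lum[: lum.size - k], eeg[k:]) for k in range(lags)]) / lum.size
print(np.corrcoef(irf_hat, true_irf)[0, 1])     # close to 1: the echo is recovered
```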

Journal ArticleDOI
02 Sep 2021-Sensors
TL;DR: In this paper, a real-time 3D reconstruction method based on monocular vision is proposed, in which a single RGB-D camera is used to collect visual information in real time and the YOLACT++ network is used to identify and segment the visual information to extract part of the important visual information.
Abstract: Real-time 3D reconstruction is one of the current popular research directions of computer vision, and it has become a core technology in the fields of virtual reality, industrialized automatic systems, and mobile robot path planning. Currently, there are three main problems in the real-time 3D reconstruction field. Firstly, it is expensive: it requires a variety of sensors, so it is less convenient. Secondly, the reconstruction speed is slow, and the 3D model cannot be established accurately in real time. Thirdly, the reconstruction error is large, which cannot meet the accuracy requirements of such scenes. For this reason, we propose a real-time 3D reconstruction method based on monocular vision in this paper. Firstly, a single RGB-D camera is used to collect visual information in real time, and the YOLACT++ network is used to identify and segment the visual information to extract part of the important visual information. Secondly, we combine the three stages of depth recovery, depth optimization, and deep fusion to propose a three-dimensional position estimation method based on deep learning for joint coding of visual information. It can reduce the depth error caused by the depth measurement process, and the accurate 3D point values of the segmented image can be obtained directly. Finally, we propose a method based on limited outlier adjustment of the cluster center distance to optimize the three-dimensional point values obtained above. It improves the real-time reconstruction accuracy and obtains the three-dimensional model of the object in real time. Experimental results show that this method only needs a single RGB-D camera, which is not only low cost and convenient to use, but also significantly improves the speed and accuracy of 3D reconstruction.
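The final optimisation step described above, limiting outliers by their distance to the cluster centre, can be sketched as a simple distance filter. The threshold rule (a multiple of the median distance to the centroid) and the synthetic point cloud are assumptions for illustration.

```python
# Minimal sketch: discard 3D points whose distance to the cluster centre exceeds a bound.
import numpy as np

def limit_outliers(points, k=2.5):
    """Keep points within k * (median distance to the centroid)."""
    center = points.mean(axis=0)
    d = np.linalg.norm(points - center, axis=1)
    return points[d <= k * np.median(d)]

rng = np.random.default_rng(0)
cloud = rng.normal(0, 0.02, (500, 3)) + [0.4, 0.1, 1.2]        # points on an object
cloud = np.vstack([cloud, [[2.0, 2.0, 5.0], [-3.0, 1.0, 9.0]]]) # two gross depth errors
print(len(cloud), "->", len(limit_outliers(cloud)))             # outliers removed
```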

Journal ArticleDOI
TL;DR: A robust tracking system with the fusion of MMW radars and cameras is proposed, which is cost-effective and requires no additional tags with people, and shows good application prospects in human following robots.
Abstract: Purpose This paper aims to develop a robust person tracking method for human following robots. The tracking system adopts the multimodal fusion results of millimeter wave (MMW) radars and monocular cameras for perception. A prototype of human following robot is developed and evaluated by using the proposed tracking system. Design/methodology/approach Limited by angular resolution, point clouds from MMW radars are too sparse to form features for human detection. Monocular cameras can provide semantic information for objects in view, but cannot provide spatial locations. Considering the complementarity of the two sensors, a sensor fusion algorithm based on multimodal data combination is proposed to identify and localize the target person under challenging conditions. In addition, a closed-loop controller is designed for the robot to follow the target person with expected distance. Findings A series of experiments under different circumstances are carried out to validate the fusion-based tracking method. Experimental results show that the average tracking errors are around 0.1 m. It is also found that the robot can handle different situations and overcome short-term interference, continually track and follow the target person. Originality/value This paper proposed a robust tracking system with the fusion of MMW radars and cameras. Interference such as occlusion and overlapping are well handled with the help of the velocity information from the radars. Compared to other state-of-the-art plans, the sensor fusion method is cost-effective and requires no additional tags with people. Its stable performance shows good application prospects in human following robots.
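The closed-loop following behaviour described above can be illustrated with a simple proportional controller on the fused range and bearing of the tracked person; the gains and target distance below are illustrative assumptions, not the paper's tuned values.

```python
# Minimal sketch of a distance-keeping follow controller driven by the fused track.
def follow_cmd(range_m, bearing_rad, target_range=1.2, kv=0.8, kw=1.5):
    v = kv * (range_m - target_range)     # forward speed: close the distance gap
    w = kw * bearing_rad                   # turn rate: keep the person centred
    return v, w

print(follow_cmd(2.0, 0.15))              # -> (0.64, 0.225): move forward, turn slightly
```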

Journal ArticleDOI
TL;DR: In this article, the authors investigated monocular information for the continuous online guidance of reach-to-grasp and presented a dynamical control model thereof using optical texture projected from a support surface (i.e., a table) over which the participants reached to grasp target objects sitting on the table surface at different distances.
Abstract: We investigated monocular information for the continuous online guidance of reaches-to-grasp and present a dynamical control model thereof. We defined an information variable using optical texture projected from a support surface (i.e., a table) over which the participants reached to grasp target objects sitting on the table surface at different distances. Using either binocular or monocular vision in the dark, participants rapidly reached to grasp a phosphorescent square target object with visibly phosphorescent thumb and index finger. Targets were one of three sizes. The target either sat flat on the support surface or was suspended a few centimeters above the surface at a slant. The latter condition perturbed the visible relation of the target to the support surface. The support surface was either invisible in the dark or covered with a visible phosphorescent checkerboard texture. Reach-to-grasp trajectories were recorded, and Maximum Grasp Apertures (MGA), Movement Times (MT), Times of MGA (TMGA), and Times of Peak Velocities (TPV) were analyzed. These measures were selected as most indicative of the participant’s certainty about the relation of hand to target object during the reaches. The findings were that, in general, monocular reaches in particular were less certain (slower, earlier TMGA and TPV) than binocular reaches, except with the target flat on the visible support surface, where performance with monocular and binocular vision was equivalent. The hypothesized information was the difference in image width of optical texture (equivalent to density of optical texture) at the hand versus the target. A control dynamic equation was formulated representing proportional rate control of the reaches-to-grasp (akin to the model using binocular disparity formulated by Anderson and Bingham (Exp Brain Res 205: 291–306, 2010)). Simulations were performed and presented using this model. Simulated performance was compared to actual performance and found to replicate it. To our knowledge, this is the first study of monocular information used for continuous online guidance of reaches-to-grasp, complete with a control dynamic model.

Journal ArticleDOI
TL;DR: In this article, a novel monocular camera and 1D laser rangefinder (LRF) fusion strategy is proposed to overcome this weakness and design a remote and ultra-high precision cooperative targets 6-DOF pose estimation sensor.
Abstract: Monocular vision is one of the most commonly used noncontact six-degrees-of-freedom (6-DOF) pose estimation methods. However, the large translational DOF measurement error along the optical axis of the camera is one of its main weaknesses, which greatly limits the measurement accuracy of monocular vision measurement. In this paper, we propose a novel monocular camera and 1D laser rangefinder (LRF) fusion strategy to overcome this weakness and design a remote and ultra-high precision cooperative targets 6-DOF pose estimation sensor. Our approach consists of two modules: (1) a feature fusion module that precisely fuses the initial pose estimated from the camera and the depth information obtained by the LRF. (2) An optimization module that optimizes pose and system parameters. The performance of our proposed 6-DOF pose estimation method is validated using simulations and real-world experiments. The experimental results show that our fusion strategy can accurately integrate the information of the camera and the LRF. Further optimization carried out on this basis effectively reduces the measurement error of monocular vision 6-DOF pose measurement. The experimental results obtained from a prototype show that its translational and rotational DOF measurement accuracy can reach up to 0.02 mm and 15″, respectively, at a distance of 10 m.
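One simple way to picture the camera/LRF fusion idea above is to keep the bearing of the vision-estimated translation while rescaling its length to the laser-measured range, directly attacking the weak along-axis depth of monocular pose estimation. This is only an intuition-level sketch under that assumption; the paper's feature-fusion and optimisation modules are more elaborate.

```python
# Intuition-level sketch (assumption, not the paper's algorithm): keep the direction of
# the monocular translation estimate, rescale its length to the laser-measured range.
import numpy as np

t_cam = np.array([0.12, -0.03, 9.87])    # translation from monocular PnP (metres, illustrative)
lrf_range = 10.002                        # 1-D laser rangefinder reading (metres, illustrative)

t_fused = t_cam * (lrf_range / np.linalg.norm(t_cam))
print(t_fused)                            # same direction, laser-accurate distance
```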

Proceedings ArticleDOI
15 Jun 2021
TL;DR: In this paper, a monocular vision-based reactive planner for obstacle avoidance is proposed, which is structured around a Convolutional Neural Network (CNN) for object detection and classification, used to identify the bounding box of the objects of interest in the image plane.
Abstract: One of the challenges in deploying Micro Aerial Vehicles (MAVs) in unknown environments is the need to secure collision-free paths with respect to static and dynamic obstacles. This article proposes a monocular vision-based reactive planner for MAV obstacle avoidance. The avoidance scheme is structured around a Convolutional Neural Network (CNN) for object detection and classification (You Only Look Once (YOLO)), used to identify the bounding box of the objects of interest in the image plane. Moreover, the YOLO is combined with a Kalman filter to robustify the object tracking, in case the bounding boxes are lost, by estimating their position and providing a fixed-rate estimate. Since MAVs are fast and agile platforms, the object tracking should be performed in real time for collision avoidance. By processing the information of the bounding boxes together with the image field of view and applying trigonometric operations, the pixel coordinates of the object are translated into heading commands, which result in a collision-free maneuver. The efficacy of the proposed scheme has been extensively evaluated in the Gazebo simulation environment, as well as in experimental evaluations with an MAV equipped with a monocular camera.
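The bounding-box-to-heading translation described above uses the horizontal offset of the box centre and the camera field of view. The sketch below computes that bearing with basic trigonometry and applies a simple avoidance rule; the field of view, image size, gain, and the avoidance rule itself are illustrative assumptions.

```python
# Minimal sketch: turn the detected box centre into a bearing, then into a yaw command.
import math

def heading_from_bbox(cx_px, image_width_px=640, hfov_deg=90.0, gain=1.0):
    """Bearing (deg) of the box centre relative to the optical axis, scaled by a gain."""
    half_w = image_width_px / 2
    offset = (cx_px - half_w) / half_w                    # -1 (left edge) .. +1 (right edge)
    bearing = math.degrees(math.atan(offset * math.tan(math.radians(hfov_deg / 2))))
    return gain * bearing

# Obstacle detected slightly right of centre: command a turn away from it.
bearing = heading_from_bbox(420)
yaw_cmd = -math.copysign(45.0 - abs(bearing), bearing)    # simple avoidance rule (assumption)
print(f"obstacle bearing {bearing:+.1f} deg -> yaw command {yaw_cmd:+.1f} deg")
```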

Journal ArticleDOI
26 Sep 2021
TL;DR: The experimental results show that an unmanned wheeled robot with a bionic transfer-convolution neural network as the control command output can realize autonomous obstacle avoidance in complex indoor scenes.
Abstract: The overall safety of a building can be effectively evaluated through regular inspection of the indoor walls by unmanned ground vehicles (UGVs). However, when the UGV performs line patrol inspections along a specified path, it is easily affected by obstacles. This paper presents an obstacle avoidance strategy for unmanned ground vehicles in indoor environments. The proposed method is based on monocular vision. From the environmental information obtained in front of the unmanned vehicle, the obstacle orientation is determined, and the moving direction and speed of the mobile robot are determined based on the neural network output and confidence. This paper also innovatively adopts a method of collecting indoor environment images based on a camera array and realizes automatic classification of the data sets by arranging cameras with different directions and focal lengths. In the training of a transfer neural network, to address the difficulty of setting the learning rate factor of the new layer, an improved bat algorithm is used to find the optimal learning rate factor on a small-sample data set. The simulation results show that the accuracy can reach 94.84%. Single-frame evaluation and continuous obstacle avoidance evaluation are used to verify the effectiveness of the obstacle avoidance algorithm. The experimental results show that an unmanned wheeled robot with a bionic transfer-convolution neural network as the control command output can realize autonomous obstacle avoidance in complex indoor scenes.