
Showing papers on "Monocular vision" published in 2011


Journal ArticleDOI
TL;DR: The here‐presented work describes the first aerial vehicle that uses onboard monocular vision as a main sensor to navigate through an unknown GPS‐denied environment and independently of any external artificial aids.
Abstract: Autonomous micro aerial vehicles (MAVs) will soon play a major role in tasks such as search and rescue, environment monitoring, surveillance, and inspection. They allow us to easily access environments that no humans or other vehicles can reach. This reduces the risk for both people and the environment. For the above applications, however, the vehicle must be able to navigate without using GPS, without relying on a preexisting map, and without specific assumptions about the environment. This allows operations in unstructured, unknown, and GPS-denied environments. We present a novel solution for the task of autonomous navigation of a micro helicopter through a completely unknown environment using solely a single camera and inertial sensors onboard. Many existing solutions suffer from drift in the xy plane or from dependency on a clean GPS signal. The novelty of the approach presented here is to use a monocular simultaneous localization and mapping (SLAM) framework to stabilize the vehicle in six degrees of freedom. This way, we overcome the problems of both drift and GPS dependency. The pose estimated by the visual SLAM algorithm is used in a linear optimal controller that allows us to perform all basic maneuvers such as hovering, set point and trajectory following, vertical takeoff, and landing. All calculations, including SLAM and control, run in real time and online while the helicopter is flying. No offline processing or preprocessing is done. We show real experiments demonstrating that the vehicle can fly autonomously in an unknown and unstructured environment. To the best of our knowledge, the work presented here describes the first aerial vehicle that uses onboard monocular vision as a main sensor to navigate through an unknown GPS-denied environment, independently of any external artificial aids. © 2011 Wiley Periodicals, Inc.
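To make the control pipeline concrete, below is a minimal sketch of how a 6-DoF pose estimate from monocular SLAM might drive a linear position controller for hovering and set-point following. The gains, mass, and small-angle attitude mapping are illustrative assumptions, not values or structure taken from the paper.

```python
import numpy as np

# Minimal PD position controller driven by a pose estimate from visual SLAM
# (position p, yaw psi) and a filtered velocity v.
# Gains and the small-angle mapping to roll/pitch/thrust are illustrative only.
KP, KD = 1.5, 1.8          # proportional / derivative gains (assumed)
G = 9.81                   # gravity [m/s^2]
MASS = 0.5                 # vehicle mass [kg] (assumed)

def position_controller(p_est, v_est, yaw, p_ref, v_ref=np.zeros(3)):
    """Return (roll_cmd, pitch_cmd, thrust_cmd) from SLAM pose feedback."""
    a_des = KP * (p_ref - p_est) + KD * (v_ref - v_est)    # desired accel (world frame)
    a_des[2] += G                                          # compensate gravity
    # Rotate the horizontal acceleration commands into the body-yaw frame.
    c, s = np.cos(yaw), np.sin(yaw)
    ax_b = c * a_des[0] + s * a_des[1]
    ay_b = -s * a_des[0] + c * a_des[1]
    thrust = MASS * a_des[2]
    pitch_cmd = np.arctan2(ax_b, a_des[2])    # small-angle attitude commands
    roll_cmd = -np.arctan2(ay_b, a_des[2])
    return roll_cmd, pitch_cmd, thrust

# Example: hover set point 1 m above the current SLAM position estimate.
p = np.array([0.2, -0.1, 1.0]); v = np.zeros(3)
print(position_controller(p, v, yaw=0.0, p_ref=np.array([0.2, -0.1, 2.0])))
```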

422 citations


Proceedings ArticleDOI
09 May 2011
TL;DR: The latest achievements towards the goal of autonomous flight of an MAV in unknown environments, using only a monocular camera as exteroceptive sensor, are presented, together with a solution to overcome the issue of low-frequency onboard visual pose updates versus the high agility of an MAV.
Abstract: In this paper, we present our latest achievements towards the goal of autonomous flights of an MAV in unknown environments, having only a monocular camera as exteroceptive sensor. As MAVs are highly agile, it is not sufficient to directly use the visual input for position control at the framerates that can be achieved with small onboard computers. Our contributions in this work are twofold. First, we present a solution to overcome the issue of a low-frequency onboard visual pose update versus the high agility of an MAV. This is solved by filtering visual information with inputs from inertial sensors. Second, as our system is based on monocular vision, we present a solution to estimate the metric visual scale with the aid of an air pressure sensor. All computation runs onboard and is tightly integrated on the MAV to avoid jitter and latencies. This framework enables stable flights indoors and outdoors even under windy conditions.
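The scale-estimation idea can be illustrated with a short sketch: the metric scale of the monocular frame is recovered by comparing up-to-scale visual height changes against metric height changes from the air pressure sensor. The least-squares formulation, sampling assumptions, and variable names below are illustrative, not the paper's filter.

```python
import numpy as np

def estimate_visual_scale(z_visual, z_baro):
    """Estimate the scale factor s such that s * delta_z_visual ~ delta_z_baro.

    z_visual: up-to-scale heights from the monocular SLAM/odometry frame
    z_baro:   metric heights derived from the air pressure sensor
    Both are 1-D arrays sampled at the same instants (illustrative assumption).
    """
    dz_vis = np.diff(np.asarray(z_visual, dtype=float))
    dz_bar = np.diff(np.asarray(z_baro, dtype=float))
    # Least-squares solution of s * dz_vis = dz_bar (height excursions must be non-trivial).
    return float(np.dot(dz_vis, dz_bar) / np.dot(dz_vis, dz_vis))

# Example: the visual map is 2.5x smaller than reality.
z_vis = np.array([0.0, 0.4, 0.9, 1.2])
z_bar = 2.5 * z_vis + 0.02 * np.random.randn(4)   # barometer noise
print(estimate_visual_scale(z_vis, z_bar))        # ~2.5
```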

313 citations


Proceedings ArticleDOI
09 May 2011
TL;DR: This paper presents a solution to tackle the lack of metric scale in monocular vision pose estimation by adding an inertial sensor equipped with a three-axis accelerometer and gyroscope and shows how to detect failures and estimate drifts in it.
Abstract: Single camera solutions - such as monocular visual odometry or mono SLAM approaches - have attracted wide attention in the community. All monocular approaches, however, suffer from the lack of metric scale. In this paper, we present a solution to tackle this issue by adding an inertial sensor equipped with a three-axis accelerometer and gyroscope. In contrast to previous approaches, our solution is independent of the underlying vision algorithm which estimates the camera poses. As a direct consequence, the algorithm presented here operates at a constant computational complexity in real time. We treat the visual framework as a black box, and thus the approach is modular and widely applicable to existing monocular solutions. It can be used with any pose estimation algorithm such as visual odometry, visual SLAM, monocular or stereo setups, or even GPS solutions with gravity and compass attitude estimation. In this paper, we show the thorough development of the metric state estimation based on an Extended Kalman Filter. Furthermore, even though we treat the visual framework as a black box, we show how to detect failures and estimate drifts in it. We implement our solution on a monocular vision pose estimation framework and show the results both in simulation and on real data.
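As a rough illustration of estimating metric scale with an EKF while treating the vision module as a black box, here is a deliberately simplified 1-D filter whose state contains position, velocity, and the visual scale factor; the vision measurement is the scaled position. The state layout, noise values, and measurement model are didactic assumptions, not the paper's full formulation.

```python
import numpy as np

# Simplified 1-D illustration: state x = [position, velocity, scale lambda].
# The IMU acceleration drives the prediction; the black-box vision module
# measures z = lambda * position. Didactic reduction, not the paper's filter.
dt = 0.01

def predict(x, P, acc, q=1e-3):
    F = np.array([[1, dt, 0],
                  [0,  1, 0],
                  [0,  0, 1]], dtype=float)
    x = F @ x + np.array([0.5 * dt**2 * acc, dt * acc, 0.0])
    P = F @ P @ F.T + q * np.eye(3)
    return x, P

def update_vision(x, P, z, r=1e-2):
    # Measurement model h(x) = lambda * p  ->  Jacobian H = [lambda, 0, p]
    p, _, lam = x
    H = np.array([[lam, 0.0, p]])
    y = z - lam * p
    S = H @ P @ H.T + r
    K = (P @ H.T) / S
    x = x + (K * y).ravel()
    P = (np.eye(3) - K @ H) @ P
    return x, P

x = np.array([0.0, 0.0, 1.0])     # initial scale guess of 1
P = np.diag([1.0, 1.0, 4.0])
# One cycle: x, P = predict(x, P, acc=0.2); x, P = update_vision(x, P, z=0.15)
```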

231 citations


Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper presents a graphical model that relates photometric cues learned from labeled data, stereo photo-consistency across multiple views, and depth cues derived from structure-from-motion point clouds, allowing exact, global inference in ∼100 ms (in addition to feature computation of under one second) without using specialized hardware.
Abstract: This paper addresses scene understanding in the context of a moving camera, integrating semantic reasoning ideas from monocular vision with 3D information available through structure-from-motion. We combine geometric and photometric cues in a Bayesian framework, building on recent successes leveraging the indoor Manhattan assumption in monocular vision. We focus on indoor environments and show how to extract key boundaries while ignoring clutter and decorations. To achieve this we present a graphical model that relates photometric cues learned from labeled data, stereo photo-consistency across multiple views, and depth cues derived from structure-from-motion point clouds. We show how to solve MAP inference using dynamic programming, allowing exact, global inference in ∼100 ms (in addition to feature computation of under one second) without using specialized hardware. Experiments show our system out-performing the state-of-the-art.

153 citations


Journal ArticleDOI
TL;DR: It is concluded that some human adults deprived of normal binocular vision can recover stereopsis at least partially through perceptual learning, the repetitive practice of a demanding visual task.
Abstract: Stereopsis, the perception of depth based on the disparity of the images projected to the retinas of the two eyes, is an important process in our three-dimensional world; however, 3–5% of the population is stereoblind or has seriously impaired stereovision. Here we provide evidence for the recovery of stereopsis through perceptual learning, the repetitive practice of a demanding visual task, in human adults long deprived of normal binocular vision. We used a training paradigm that combines monocular cues that were correlated perfectly with the disparity cues. Following perceptual learning (thousands of trials) with stereoscopic gratings, five adults who initially were stereoblind or stereoanomalous showed substantial recovery of stereopsis, both on psychophysical tests with stimuli that contained no monocular cues and on clinical testing. They reported that depth “popped out” in daily life, and enjoyed 3D movies for the first time. After training, stereo tests with dynamic random-dot stereograms and band-pass noise revealed the properties of the recovered stereopsis: It has reduced resolution and precision, although it is based on perceiving depth by detecting binocular disparity. We conclude that some human adults deprived of normal binocular vision can recover stereopsis at least partially.

123 citations


Journal ArticleDOI
21 Sep 2011-Sensors
TL;DR: A three-level fusion strategy based on visual attention mechanism and driver’s visual consciousness is provided for MMW radar and monocular vision fusion so as to obtain better comprehensive performance.
Abstract: This paper presents a systematic scheme for fusing millimeter wave (MMW) radar and a monocular vision sensor for on-road obstacle detection. As a whole, a three-level fusion strategy based on a visual attention mechanism and the driver's visual consciousness is provided for MMW radar and monocular vision fusion so as to obtain better comprehensive performance. Then an experimental method for radar-vision point alignment is put forward that is easy to operate and requires neither radar reflection intensity nor special tools. Furthermore, a region searching approach for potential target detection is derived in order to decrease the image processing time. An adaptive thresholding algorithm based on a new understanding of shadows in the image is adopted for obstacle detection, and edge detection is used to assist in determining the boundary of obstacles. The proposed fusion approach is verified through real experimental examples of on-road vehicle/pedestrian detection. The experimental results show that the proposed method is simple and feasible.
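A small sketch of the radar-to-vision hand-off: a radar target is projected into the image with a calibrated pinhole model and a rectangular search region is clamped around it for subsequent obstacle verification. The intrinsics, extrinsics, and region size below are placeholder values, not the calibration or region-searching rule from the paper.

```python
import numpy as np

def radar_to_image(pt_radar, K, R, t):
    """Project a 3-D radar target (radar frame) into image pixel coordinates.

    K    : 3x3 camera intrinsic matrix (from calibration)
    R, t : rotation/translation taking radar coordinates into the camera frame
    All matrices below are placeholder values for illustration.
    """
    p_cam = R @ np.asarray(pt_radar, dtype=float) + t
    u, v, w = K @ p_cam
    return u / w, v / w

def search_region(u, v, width=120, height=90, img_w=640, img_h=480):
    """Clamp a rectangular region of interest around the projected point."""
    x0 = int(max(0, u - width // 2));  x1 = int(min(img_w, u + width // 2))
    y0 = int(max(0, v - height // 2)); y1 = int(min(img_h, v + height // 2))
    return x0, y0, x1, y1

K = np.array([[700, 0, 320], [0, 700, 240], [0, 0, 1]], dtype=float)
R, t = np.eye(3), np.array([0.0, 0.3, 0.0])          # assumed extrinsics
print(search_region(*radar_to_image([2.0, 0.0, 20.0], K, R, t)))
```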

114 citations


Journal ArticleDOI
TL;DR: A novel approach to monocular SLAM using corner, lamp, and door features simultaneously to achieve stable navigation in various environments is proposed, and experiments showed that the proposed scheme resulted in dependable navigation.
Abstract: We examine monocular vision-based simultaneous localization and mapping (SLAM) of a mobile robot using an upward-looking camera. Although a monocular camera looking up toward the ceiling can provide a low-cost solution to indoor SLAM, this approach is often unable to achieve dependable navigation due to a lack of reliable visual features on the ceiling. We propose a novel approach to monocular SLAM using corner, lamp, and door features simultaneously to achieve stable navigation in various environments. We use the corner features and the circular-shaped brightest parts of the ceiling image for detection of lamp features. Furthermore, vertical and horizontal lines are combined to robustly detect line-based door features to reduce the problem that line features can be easily misidentified due to nearby edges. The use of these three types of features as landmarks increases our ability to observe the features in various environments and maintains the stability of the SLAM process. A series of experiments in indoor environments showed that the proposed scheme resulted in dependable navigation.
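The lamp-feature idea can be sketched with OpenCV by thresholding the ceiling image for its brightest regions and keeping roughly circular blobs as lamp candidates. The threshold, area, and circularity values are illustrative assumptions rather than the paper's parameters.

```python
import cv2
import numpy as np

def detect_lamp_candidates(gray, bright_thresh=230, min_area=80, min_circ=0.6):
    """Return centroids of bright, roughly circular blobs in a ceiling image.

    Threshold, area, and circularity values are illustrative assumptions.
    """
    _, mask = cv2.threshold(gray, bright_thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    lamps = []
    for c in contours:
        area = cv2.contourArea(c)
        if area < min_area:
            continue
        perimeter = cv2.arcLength(c, True)
        circularity = 4 * np.pi * area / (perimeter ** 2 + 1e-9)
        if circularity >= min_circ:
            m = cv2.moments(c)
            lamps.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    return lamps

# Usage: gray = cv2.cvtColor(cv2.imread("ceiling.png"), cv2.COLOR_BGR2GRAY)
#        print(detect_lamp_candidates(gray))
```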

108 citations


Journal ArticleDOI
TL;DR: This letter proposes a binocular JND (BJND) model based on psychophysical experiments conducted to model the basic binocular vision properties in response to asymmetric noises in a pair of stereoscopic images, and develops a BJND model that measures the perceptible distortion of binocular vision for stereoscopic images.
Abstract: Conventional 2-D Just-Noticeable-Difference (JND) models measure the perceptible distortion of visual signal based on monocular vision properties by presenting a single image for both eyes. However, they are not applicable for stereoscopic displays in which a pair of stereoscopic images is presented to a viewer's left and right eyes, respectively. Some unique binocular vision properties, e.g., binocular combination and rivalry, need to be considered in the development of a JND model for stereoscopic images. In this letter, we propose a binocular JND (BJND) model based on psychophysical experiments which are conducted to model the basic binocular vision properties in response to asymmetric noises in a pair of stereoscopic images. The first experiment exploits the joint visibility thresholds according to the luminance masking effect and the binocular combination of noises. The second experiment examines the reduction of visual sensitivity in binocular vision due to the contrast masking effect. Based on these experiments, the developed BJND model measures the perceptible distortion of binocular vision for stereoscopic images. Subjective evaluations on stereoscopic images validate the proposed BJND model.

100 citations


Proceedings ArticleDOI
05 Jun 2011
TL;DR: The paper furthermore quantifies the benefit of stereo vision for ROI generation and localization; at equal detection rates, false positives are reduced by a factor of 4–5 with stereo over mono, using the same HOG/linSVM classification component.
Abstract: Pedestrian detection is a rapidly evolving area in the intelligent vehicles domain. Stereo vision is an attractive sensor for this purpose. But unlike for monocular vision, there are no realistic, large scale benchmarks available for stereo-based pedestrian detection, to provide a common point of reference for evaluation. This paper introduces the Daimler Stereo-Vision Pedestrian Detection benchmark, which consists of several thousands of pedestrians in the training set, and a 27-min test drive through urban environment and associated vehicle data. The data, including ground truth, is made publicly available for non-commercial purposes. The paper furthermore quantifies the benefit of stereo vision for ROI generation and localization; at equal detection rates, false positives are reduced by a factor of 4–5 with stereo over mono, using the same HOG/linSVM classification component.
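For reference, OpenCV ships a HOG descriptor with a bundled linear-SVM people detector, which can illustrate the HOG/linSVM classification component the benchmark refers to; it is not the detector trained by the authors on the Daimler data.

```python
import cv2

# OpenCV's stock HOG descriptor with the bundled linear-SVM people detector,
# used here only to illustrate the HOG/linSVM classification component the
# benchmark refers to; it is not the model trained by the authors.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_pedestrians(frame):
    """Return bounding boxes (x, y, w, h) and SVM scores for one frame."""
    boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8),
                                          padding=(8, 8), scale=1.05)
    return list(zip(boxes.tolist(), weights.ravel().tolist())) if len(boxes) else []

# Usage: detections = detect_pedestrians(cv2.imread("urban_frame.png"))
```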

82 citations


Proceedings ArticleDOI
05 Dec 2011
TL;DR: The objective of this paper is the full 6D relative localization of mobile devices, and direct robot-robot localization in particular, with a novel relative localization system that consists of two complementary modules: a monocular vision module and a target module with four active or passive markers.
Abstract: The objective of this paper is the full 6D relative localization of mobile devices, and direct robot-robot localization in particular. We present a novel relative localization system that consists of two complementary modules: a monocular vision module and a target module with four active or passive markers. The core localization algorithm running on the modules determines the marker positions in the camera image and derives the relative robot pose in 3D space. The system is supported by a prediction mechanism based on regression. The modules are tested successfully in experiments with a quadrotor helicopter as well as on a team of two e-puck robots performing a coverage task. The relative localization system provides accuracies of a few centimeters in position and up to a few degrees in orientation. Furthermore, the system is lightweight, with low complexity and system requirements, which enables its application to a wide range of mobile robot platforms.
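With four known marker positions on the target module, the relative 6-DoF pose can be recovered with a standard PnP solver, as in the sketch below; the marker geometry and camera intrinsics are assumed values, and the paper's own algorithm may differ in detail.

```python
import cv2
import numpy as np

# Known 3-D marker positions on the target module, in the target frame (metres).
# The square side length and camera intrinsics below are illustrative values.
MARKERS_3D = np.array([[-0.10, -0.10, 0.0],
                       [ 0.10, -0.10, 0.0],
                       [ 0.10,  0.10, 0.0],
                       [-0.10,  0.10, 0.0]], dtype=np.float32)
K = np.array([[600, 0, 320], [0, 600, 240], [0, 0, 1]], dtype=np.float32)

def relative_pose(marker_pixels):
    """6-DoF pose of the target module in the camera frame from 4 detected markers."""
    ok, rvec, tvec = cv2.solvePnP(MARKERS_3D,
                                  np.asarray(marker_pixels, dtype=np.float32),
                                  K, None, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)          # rotation matrix
    return R, tvec.ravel()              # translation in metres
```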

52 citations


Proceedings ArticleDOI
05 Dec 2011
TL;DR: A novel, deterministic closed-form solution for computing the scale factor and the gravity direction of a moving, loosely-coupled, monocular vision-inertial system, using only a single inertial integration period and requiring no absolute orientation information.
Abstract: In this work, we present a novel, deterministic closed-form solution for computing the scale factor and the gravity direction of a moving, loosely-coupled, and monocular vision-inertial system. The methodology is based on analysing delta-velocities. On one hand, they are obtained from a differentiation of the up-to-scale camera pose computation by a visual odometry or visual SLAM algorithm. On the other hand, they can also be retrieved from the gravity-affected short-term integration of acceleration signals. We derive a method for separating the gravity contribution and recovering the metric scale factor of the vision algorithm. The method thus also recovers the offset in roll and pitch angles of the vision reference frame with respect to the direction of the gravity vector. It uses only a single inertial integration period, and no absolute orientation information is required. For optimal sensor-fusion and metric scale-estimation filters in the loosely-coupled case, it has been shown that the convergence of the fusion of an up-to-scale pose information with inertial measurements largely depends on the availability of a good initial value for the scale factor. We show how this problem can be tackled by applying the method presented in this paper. Finally, we present results in simulation and on real data, demonstrating the suitability of the method in real scenarios.
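The delta-velocity idea lends itself to a compact least-squares sketch: over several intervals, the integrated accelerometer signal is matched against the scaled visual velocity change plus a gravity term, and the scale factor and gravity vector are solved jointly. The sign convention and frame assumptions below are one plausible choice, not necessarily the paper's exact derivation.

```python
import numpy as np

def scale_and_gravity(dv_vis, dv_imu, dts):
    """Scale and gravity from delta-velocities (illustrative sketch).

    dv_vis : N x 3 up-to-scale velocity changes differentiated from the vision poses
    dv_imu : N x 3 integrated accelerometer specific force over the same intervals,
             expressed in the same (vision) reference frame
    dts    : N interval durations [s]

    Model per interval (one possible sign convention):
        s * dv_vis_i - dt_i * g = dv_imu_i
    Stacked as a linear system A x = b with x = [s, gx, gy, gz].
    """
    dv_vis, dv_imu = np.asarray(dv_vis, float), np.asarray(dv_imu, float)
    dts = np.asarray(dts, float)
    n = len(dts)
    A = np.zeros((3 * n, 4))
    b = dv_imu.reshape(-1)
    for i in range(n):
        A[3 * i:3 * i + 3, 0] = dv_vis[i]
        A[3 * i:3 * i + 3, 1:] = -dts[i] * np.eye(3)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[0], x[1:]                  # metric scale and gravity vector estimate

# Example with a true scale of 3 and gravity along -z:
rng = np.random.default_rng(0)
true_g = np.array([0, 0, -9.81]); dts = np.full(5, 0.1)
dv_true = rng.normal(size=(5, 3))
dv_vis = dv_true / 3.0
dv_imu = dv_true - dts[:, None] * true_g
print(scale_and_gravity(dv_vis, dv_imu, dts))   # ~ (3.0, [0, 0, -9.81])
```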

Proceedings ArticleDOI
01 Nov 2011
TL;DR: Preliminary experimental results using real-world multi-lane highways show the basic promise of this approach, and a data-driven learning framework has been applied to automatically learn surround behaviors.
Abstract: Safe operation of a motor vehicle requires awareness of the current traffic situation as well as the ability to predict future maneuvers. In order to provide an intelligent vehicle the ability to make predictions, this work proposes a framework for understanding the driving situation based on vehicle mounted vision sensors. Vehicles are tracked using Kalman filtering based on a vision-based system that detects and tracks using a combination of monocular and stereo-vision. The vehicles' full trajectories are recorded, and a data-driven learning framework has been applied to automatically learn surround behaviors. By learning based on observations, the ADAS system is being trained by experience. Learned trajectories have been compared between dense and free-flowing traffic conditions. Preliminary experimental results using real-world multi-lane highways show the basic promise of this approach. Future research directions are discussed.
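The tracking step can be illustrated with a generic constant-velocity Kalman filter fed by vision-based position measurements of a surrounding vehicle; the state layout and noise values are illustrative, not the parameters used in the paper.

```python
import numpy as np

# Constant-velocity Kalman filter for tracking a vehicle's position on the road
# plane. Process/measurement noise values are illustrative assumptions.
dt = 0.1
F = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])         # state: [x, y, vx, vy]
H = np.hstack([np.eye(2), np.zeros((2, 2))])          # we observe position only
Q = 0.05 * np.eye(4)
R = 0.5 * np.eye(2)

def kf_step(x, P, z):
    # Predict with the constant-velocity model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the vision-based position measurement z = [x, y].
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.zeros(4), np.eye(4)
for z in [np.array([10.0, 3.5]), np.array([10.9, 3.5]), np.array([12.1, 3.6])]:
    x, P = kf_step(x, P, z)
print(x)     # smoothed position and estimated velocity
```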

Proceedings ArticleDOI
18 Nov 2011
TL;DR: The proposed approach combines monocular detection with stereo-vision for on-road vehicle localization and tracking for driver assistance, fusing information from both the monocular and stereo modalities.
Abstract: In this paper, we introduce a novel stereo-monocular fusion approach to on-road localization and tracking of vehicles. Utilizing a calibrated stereo-vision rig, the proposed approach combines monocular detection with stereo-vision for on-road vehicle localization and tracking for driver assistance. The system initially acquires synchronized monocular frames and calculates depth maps from the stereo rig. The system then detects vehicles in the image plane using an active learning-based monocular vision approach. Using the image coordinates of detected vehicles, the system then localizes the vehicles in real-world coordinates using the calculated depth map. The vehicles are tracked both in the image plane, and in real-world coordinates, fusing information from both the monocular and stereo modalities. Vehicles' states are estimated and tracked using Kalman filtering. Quantitative analysis of tracks is provided. The full system takes 46ms to process a single frame.
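A minimal sketch of the localization step: a monocular detection's bounding box is lifted to 3-D by taking a robust depth over the box from the stereo depth map and back-projecting its center through the intrinsics. The median rule and variable names are assumptions, not necessarily the fusion rule used in the paper.

```python
import numpy as np

def localize_detection(bbox, depth_map, K):
    """Back-project a detected vehicle's bounding-box centre into 3-D.

    bbox      : (x, y, w, h) detection in image coordinates
    depth_map : per-pixel metric depth from the calibrated stereo rig
    K         : 3x3 camera intrinsic matrix
    The median over the box is a simple robust depth estimate (our assumption).
    """
    x, y, w, h = bbox
    patch = depth_map[y:y + h, x:x + w]
    z = float(np.median(patch[np.isfinite(patch) & (patch > 0)]))
    u, v = x + w / 2.0, y + h / 2.0
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    X = (u - cx) * z / fx
    Y = (v - cy) * z / fy
    return np.array([X, Y, z])        # vehicle position in camera coordinates
```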


Journal ArticleDOI
TL;DR: To alleviate the effects of corresponding-feature-point matching and the extraction error of a single feature point, a monocular measurement method using a single camera is presented, based on image processing.
Abstract: To alleviate the effects of corresponding-feature-point matching and the extraction error of a single feature point, a monocular measurement method using a single camera is presented based on image processing. Firstly, the paper sets up the mapping relationship between an image point and a target point and establishes the pinhole imaging model. Secondly, it describes the mapping relationship between the object area and its image area using image analysis and establishes a model of distance measurement along the optical direction. Then, the principle of distance measurement between the optical center and a feature point is proposed after image processing and feature-point extraction are carried out. Finally, verification experiments are conducted and the cause of the error increasing with distance is analyzed. After analyzing the experimental data, it is concluded that the error is related to the optical-axis deviation. With a maximum relative error of 1.68% for the revised data, the remarkable improvement proves the feasibility and effectiveness of the proposed principle.
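The area-based distance model reduces to a simple pinhole relation, sketched below; the focal length in pixels and the known physical area of the target are assumed inputs.

```python
import numpy as np

def distance_from_image_area(area_px, area_real_m2, focal_px):
    """Distance along the optical axis from the pinhole area relation.

    For a fronto-parallel target, the pinhole model gives
        area_px = (focal_px / Z)**2 * area_real,
    so Z = focal_px * sqrt(area_real / area_px). The focal length in pixels and
    the known physical area of the target are assumed inputs.
    """
    return focal_px * np.sqrt(area_real_m2 / area_px)

# Example: a 0.2 m x 0.2 m marker imaged as ~900 px^2 with f = 800 px
print(distance_from_image_area(900.0, 0.04, 800.0))   # ~5.3 m
```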

Proceedings ArticleDOI
09 May 2011
TL;DR: This paper presents an approach to obstacle detection for collision-free, efficient humanoid robot navigation based on monocular images and sparse laser range data, and demonstrates how this approach enables the robot to reliably avoid obstacles during navigation.
Abstract: In this paper, we present an approach to obstacle detection for collision-free, efficient humanoid robot navigation based on monocular images and sparse laser range data. To detect arbitrary obstacles in the surroundings of the robot, we analyze 3D data points obtained from a 2D laser range finder installed in the robot's head. Relying only on this laser data, however, can be problematic. While the robot is walking, the floor close to its feet is not observable by the laser sensor, which inherently increases the risk of collisions, especially in nonstatic scenes. Furthermore, it is time-consuming to frequently stop walking and tilt the head to obtain reliable information about close obstacles. We therefore present a technique to train obstacle detectors for images obtained from a monocular camera also located in the robot's head. The training is done online based on sparse laser data in a self-supervised fashion. Our approach projects the obstacles identified from the laser data into the camera image and learns a classifier that considers color and texture information. While the robot is walking, it then applies the learned classifiers to the images to decide which areas are traversable. As we illustrate in experiments with a real humanoid, our approach enables the robot to reliably avoid obstacles during navigation. Furthermore, the results show that our technique leads to significantly more efficient navigation compared to extracting obstacles solely based on 3D laser range data acquired while the robot is standing at certain intervals.
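The self-supervised labeling idea can be sketched as follows: laser-classified obstacle and floor points are projected into the camera image, small patches around them provide color features, and a simple classifier is fitted to those automatically generated labels. The color-mean feature and logistic-regression classifier are stand-ins for the color/texture features and classifier described in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def project_points(points_3d, K, R, t):
    """Project laser points (robot frame) into pixel coordinates with a pinhole model."""
    p_cam = (R @ points_3d.T + t.reshape(3, 1)).T
    uv = (K @ p_cam.T).T
    return (uv[:, :2] / uv[:, 2:3]).astype(int)

def harvest_training_set(image, laser_pts, labels, K, R, t, half=4):
    """Collect mean-colour features of small patches around projected laser points."""
    feats, ys = [], []
    for (u, v), lab in zip(project_points(laser_pts, K, R, t), labels):
        if half <= u < image.shape[1] - half and half <= v < image.shape[0] - half:
            patch = image[v - half:v + half, u - half:u + half].reshape(-1, 3)
            feats.append(patch.mean(axis=0))      # simple colour feature (assumption)
            ys.append(lab)                        # 1 = obstacle, 0 = floor (from laser)
    return np.array(feats), np.array(ys)

def train_obstacle_classifier(feats, ys):
    return LogisticRegression(max_iter=1000).fit(feats, ys)

# At run time, the classifier would be applied to patch features across the image
# to mark non-traversable areas while the robot walks.
```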

Proceedings ArticleDOI
15 Nov 2011
TL;DR: The main topic is the method used in the monocular vision system, the Hough transform, and how it operates; the basic mathematical morphology implemented with the Hough transform for image processing is also shown.
Abstract: A mobile robot with vision can be useful for many applications and purposes. However, the vision system needs to be robust, effective, and fast to achieve its goal, and estimating the depth of an object would normally require a stereo vision system. In this paper, a monocular vision system is introduced to the mobile robot to enhance its capability of approximately calculating the distance or depth. The main topic is the method used in the monocular vision system, the Hough transform, and how this method operates. This paper also shows the basic mathematical morphology which is implemented with the Hough transform for image processing.
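As a minimal illustration of combining mathematical morphology with the Hough transform, the sketch below cleans an edge image with a morphological closing (to bridge small gaps) and then extracts line segments; kernel size and Hough parameters are illustrative.

```python
import cv2
import numpy as np

def detect_lines(gray):
    """Morphological cleanup followed by probabilistic Hough line extraction.

    Kernel size and Hough parameters are illustrative choices, not the paper's.
    """
    edges = cv2.Canny(gray, 50, 150)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    cleaned = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)   # bridge small gaps
    segs = cv2.HoughLinesP(cleaned, 1, np.pi / 180, threshold=50,
                           minLineLength=40, maxLineGap=5)
    return [] if segs is None else [tuple(s) for s in segs[:, 0]]

# Usage: lines = detect_lines(cv2.cvtColor(cv2.imread("scene.png"), cv2.COLOR_BGR2GRAY))
```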

Journal ArticleDOI
TL;DR: Both subjective and electrophysiological results show that binocular vision ameliorates the effect of defocus, and the increased binocular facilitation observed with retinal blur may be due to the activation of a larger population of neurons at close-to-threshold detection under binocular stimulation.
Abstract: PURPOSE. To assess whether there are any advantages of binocular over monocular vision under blur conditions. METHODS. The effect of defocus, induced by positive lenses, was measured on the pattern reversal visual evoked potential (VEP) and on visual acuity (VA). Monocular (dominant eye) and binocular VEPs were recorded from 13 volunteers (average age, 28 ± 5 years; average spherical equivalent, -0.25 ± 0.73 D) for defocus up to 2.00 D using positive powered lenses. VEPs were elicited using reversing 10 arcmin checks (4 reversals/s). The stimulus subtended a circular field of 7° with 100% contrast and mean luminance 30 cd/m². VA was measured under the same conditions using ETDRS charts. All measurements were performed at 1 m viewing distance with best spectacle sphero-cylindrical correction and natural pupils. RESULTS. With binocular stimulation, amplitudes and implicit times of the P100 component of the VEPs were greater and shorter, respectively, in all cases than for monocular stimulation. Mean binocular enhancement ratio in the P100 amplitude was 2.1 in focus, increasing linearly with defocus to be 3.1 at +2.00 D defocus. Mean peak latency was 2.9 ms shorter in focus with binocular than for monocular stimulation, with the difference increasing with defocus to 8.8 ms at +2.00 D. As for the VEP amplitude, VA was always better with binocular than with monocular vision, with the difference being greater for higher retinal blur. CONCLUSIONS. Both subjective and electrophysiological results show that binocular vision ameliorates the effect of defocus. The increased binocular facilitation observed with retinal blur may be due to the activation of a larger population of neurons at close-to-threshold detection under binocular stimulation. © 2011 The Association for Research in Vision and Ophthalmology, Inc.

Journal ArticleDOI
TL;DR: This paper presents an attention-driven method that focuses the feature selection on image areas where the obstacle situation is unclear and where a more detailed scene reconstruction is necessary, allowing the autonomous use of mobile robots in complex public and home environments.

Proceedings ArticleDOI
09 May 2011
TL;DR: This work proposes a particle filter-based algorithm for monocular vision-aided odometry for mobile robot localization that fuses information from odometry with observations of naturally occurring static point features in the environment and develops a novel approach for computing the particle weights which does not require including the feature positions in the state vector.
Abstract: We propose a particle filter-based algorithm for monocular vision-aided odometry for mobile robot localization. The algorithm fuses information from odometry with observations of naturally occurring static point features in the environment. A key contribution of this work is a novel approach for computing the particle weights, which does not require including the feature positions in the state vector. As a result, the computational and sample complexities of the algorithm remain low even in feature-dense environments. We validate the effectiveness of the approach extensively with both simulations as well as real-world data, and compare its performance against that of the extended Kalman filter (EKF) and FastSLAM. Results from the simulation tests show that the particle filter approach is better than these competing approaches in terms of the RMS error. Moreover, the experiments demonstrate that the approach is capable of achieving good localization accuracy in complex environments.
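For orientation, here is a bare-bones particle filter skeleton with odometry propagation, generic measurement weighting, and resampling; it does not reproduce the paper's novel weight computation that avoids putting feature positions in the state.

```python
import numpy as np

rng = np.random.default_rng(1)

def propagate(particles, odom, noise=(0.02, 0.02, 0.01)):
    """Apply an odometry increment (dx, dy, dtheta) with added noise to all particles."""
    return particles + odom + rng.normal(0.0, noise, size=particles.shape)

def resample(particles, weights):
    """Resample particle indices proportionally to the weights."""
    idx = rng.choice(len(particles), size=len(particles), p=weights / weights.sum())
    return particles[idx]

def pf_step(particles, odom, likelihood_fn):
    """One predict/update cycle; likelihood_fn maps a particle pose to p(z | pose)."""
    particles = propagate(particles, odom)
    weights = np.array([likelihood_fn(p) for p in particles]) + 1e-12
    return resample(particles, weights)

# particles: N x 3 array of (x, y, theta) hypotheses, initialised around the start pose.
particles = rng.normal(0.0, 0.1, size=(500, 3))
```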

Proceedings ArticleDOI
18 Nov 2011
TL;DR: A robust algorithm that finds the horizon line was proposed and applied to generate the navigable area, allowing only a small portion of the image (the road ahead of the vehicle) to be investigated dynamically.
Abstract: Navigation of an autonomous vehicle is based on its interaction with the environment, through information acquired by sensors. The perception of the environment is a major issue in autonomous and (semi-)autonomous systems. This work addresses the embedded real-time visual perception problem applied to an experimental platform. A robust horizon-finding algorithm was proposed and applied to generate the navigable area, which permits investigating dynamically only a small portion of the image (the road) ahead of the vehicle. Based on a dynamic threshold search method using Otsu segmentation and the Hough transform, the system is robust to illumination changes and does not need any contrast adjustments.
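A compact OpenCV sketch of the Otsu-plus-Hough idea for horizon finding is given below; the blur, Canny, and Hough thresholds, and the choice of a single near-horizontal line, are illustrative simplifications.

```python
import cv2
import numpy as np

def find_horizon(bgr):
    """Estimate the horizon as a dominant Hough line of the Otsu sky/ground mask.

    Returns (rho, theta) in the usual Hough parameterisation, or None.
    Thresholds and the single-line choice are illustrative simplifications.
    """
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    edges = cv2.Canny(mask, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=120)
    if lines is None:
        return None
    # Keep the first roughly horizontal candidate as the horizon line.
    for rho, theta in lines[:, 0]:
        if abs(theta - np.pi / 2) < np.deg2rad(20):
            return float(rho), float(theta)
    return None

# Usage: horizon = find_horizon(cv2.imread("road_frame.png"))
```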

Journal ArticleDOI
TL;DR: This paper presents an approach that uses planar information (a homography matrix) to build a visual 2D occupancy grid map from monocular vision, helping classify parts of the image into floor or non-floor.

Book ChapterDOI
14 Dec 2011
TL;DR: The main advantages of monocular vision based obstacle avoidance techniques are their ease of implementation and their suitability for real-time applications; the main challenges are the illumination conditions that vary with time, the computational complexity of the avoidance algorithms, and the cost of the sensors.
Abstract: Vision is one of the most powerful and popular sensing methods used for autonomous navigation. Compared with other on-board sensing techniques, vision based approaches to navigation continue to demand a lot of attention from the mobile robot research community. This is largely due to their ability to provide detailed information about the environment, which may not be available using combinations of other types of sensors. One of the key research problems in mobile robot navigation is the focus on obstacle avoidance methods. In order to cope with this problem, most autonomous navigation systems rely on range data for obstacle detection. Ultrasonic sensors, laser rangefinders and stereo vision techniques are widely used for estimating the range data. However, all of these have drawbacks. Ultrasonic sensors suffer from poor angular resolution. Laser range finders and stereo vision systems are quite expensive, and the computational complexity of stereo vision systems is another key challenge (Saitoh et al., 2009). In addition to their individual shortcomings, range sensors are also unable to distinguish between different types of ground surfaces; for example, they cannot differentiate between the sidewalk pavement and adjacent flat grassy areas. The computational complexity of the avoidance algorithms and the cost of the sensors are the most critical aspects for real time applications. Monocular vision based systems avoid these problems and are able to provide an appropriate solution to the obstacle avoidance problem. There are two fundamental groups of monocular vision based obstacle avoidance techniques: those that compute the apparent motion, and those that rely on the appearance of individual pixels. The first group comprises optical flow based techniques; the main idea behind them is to control the robot using optical flow, from which the heading of the observer and time-to-contact values are obtained (Guzel & Bicker, 2010). One way of controlling the robot using these values is by acting to achieve a certain type of flow. For instance, to maintain ambient orientation, the type of optic flow required is no flow at all. If some flow is detected, then the robot should change the forces produced by its effectors so as to minimize this flow, based on a law of control (Contreras, 2007). The second group, appearance based methods, relies on basic image processing techniques and consists of detecting pixels different in appearance from the ground and classifying them as obstacles. The algorithm performs in real time, provides a high-resolution obstacle image, and operates in a variety of environments (DeSouza & Kak, 2002). The main advantages of these two conventional methods are their ease of implementation and high availability for real time applications. However, optical flow based methods suffer from two major problems: the illumination conditions, which vary with time, and the heavy computational cost of estimating the flow.
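A minimal sketch of the optical-flow "balance" strategy mentioned above: dense flow is computed between consecutive frames and the robot steers toward the image half with less flow, since stronger flow suggests nearer obstacles. The Farneback parameters and the gain are illustrative choices, not taken from the chapter.

```python
import cv2
import numpy as np

def steering_from_flow(prev_gray, curr_gray, gain=1.0):
    """Balance-strategy steering command from dense optical flow.

    Larger flow on one side suggests nearer obstacles there, so the robot turns
    toward the side with less flow. The Farneback parameters and the gain are
    illustrative, not taken from the chapter.
    """
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    half = mag.shape[1] // 2
    left, right = mag[:, :half].mean(), mag[:, half:].mean()
    # Positive command = turn left (away from stronger right-side flow).
    return gain * (right - left) / (right + left + 1e-9)

# Usage with consecutive grayscale frames f0, f1:
#   cmd = steering_from_flow(f0, f1)
```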

Journal ArticleDOI
TL;DR: A method for navigation of a small unmanned rotorcraft through an unsurveyed environment consisting of forest and urban canyons and results of simulations in a two-dimensional environment using a potential field obstacle avoidance routine are presented.
Abstract: We present a method for navigation of a small unmanned rotorcraft through an unsurveyed environment consisting of forest and urban canyons. Optical flow measurements obtained from a vision system are fused with measurements of vehicle velocity to compute estimates of range to obstacles. These estimates are used to populate a local occupancy grid which is fixed to the vehicle. This local occupancy grid allows modeling of complex environments and is suitable for use by generic trajectory planners. Results of simulations in a two-dimensional environment using a potential field obstacle avoidance routine are presented.
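The flow-to-range step can be written out for the simplest case of pure forward translation: a feature at radial distance r from the focus of expansion with radial flow r_dot has depth Z = V * r / r_dot, where V is the metric speed from the vehicle state. The sketch below assumes the focus of expansion and speed are already available.

```python
import numpy as np

def depth_from_radial_flow(pixel, flow_vec, foe, speed_per_frame):
    """Depth estimate for one feature under pure forward translation.

    For translation along the optical axis with speed V, a feature at radial
    image distance r from the focus of expansion (FOE) expands with radial
    flow r_dot = r * V / Z, so Z = V * r / r_dot. The FOE position and the
    metric speed (expressed per frame) are assumed to be available.
    """
    offset = np.asarray(pixel, float) - np.asarray(foe, float)
    r = np.linalg.norm(offset)
    r_dot = np.dot(np.asarray(flow_vec, float), offset / (r + 1e-9))
    if r_dot <= 0:
        return np.inf            # feature not expanding: no reliable range
    return speed_per_frame * r / r_dot

# Example: a feature 100 px from the FOE expanding at ~5 px/frame, with the
# vehicle flying at 3 m/s and a 30 fps camera -> roughly 2 m range.
print(depth_from_radial_flow((420, 240), (4.8, 1.4), (320, 240), 3.0 / 30.0))
```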

Proceedings ArticleDOI
29 Mar 2011
TL;DR: In this paper, a monocular vision strategy incorporating image segmentation and epipolar geometry is proposed to extend the capability of the ranging method to unknown outdoor environments, and the validity of the proposed method is verified through experiments in a river-like environment.
Abstract: This paper presents a new method to estimate the range and bearing of landmarks and solve the simultaneous localization and mapping (SLAM) problem. The proposed ranging and SLAM algorithms have application to a micro aerial vehicle (MAV) flying through riverine environments which occasionally involve heavy foliage and forest canopy. Monocular vision navigation has merits in MAV applications since it is lightweight and provides abundant visual cues of the environment in comparison to other ranging methods. In this paper, we suggest a monocular vision strategy incorporating image segmentation and epipolar geometry to extend the capability of the ranging method to unknown outdoor environments. The validity of our proposed method is verified through experiments in a river-like environment.

Journal ArticleDOI
TL;DR: Binocular cues from the finger are critical to effective online control of hand movements in depth; an optimal feedback controller that takes into account the low peripheral stereoacuity and the inherent ambiguity in cast shadows can explain the difference in response time in the binocular conditions and the lack of response in monocular conditions.
Abstract: Previous work has shown that humans continuously use visual feedback of the hand to control goal-directed movements online. In most studies, visual error signals were predominantly in the image plane and thus were available in an observer's retinal image. We investigate how humans use visual feedback about finger depth provided by binocular and monocular depth cues to control pointing movements. When binocularly viewing a scene in which the hand movement was made in free space, subjects were about 60 ms slower in responding to perturbations in depth than in the image plane. When monocularly viewing a scene designed to maximize the available monocular cues to finger depth (motion, changing size and cast shadows), subjects showed no response to perturbations in depth. Thus, binocular cues from the finger are critical to effective online control of hand movements in depth. An optimal feedback controller that takes into account the low peripheral stereoacuity and the inherent ambiguity in cast shadows can explain the difference in response time in the binocular conditions and the lack of response in monocular conditions.

Journal ArticleDOI
TL;DR: A solution for indoor mobile robot navigation and obstacle distance detection based on monocular vision is described; it is relatively simple, with high real-time ability and robustness.
Abstract: To achieve autonomous indoor mobile robot navigation, the robot needs to identify the direction, obstacle positions and other information. This paper describes a solution for indoor mobile robot navigation and obstacle distance detection based on monocular vision. We use the Hough transform to find the straight lines in the corridor environment, which are then used to find the vanishing point that serves as the navigation direction for the mobile robot. At the same time, we use prior knowledge of the corridor environment and the pinhole camera model to calculate the distance to the obstacle. Test results on our robot system show that the algorithm presented in this paper can detect the direction of the corridor and estimate the obstacle distance accurately. The algorithm is relatively simple, with high real-time ability and robustness.
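Both steps admit short sketches: corridor lines are extracted with the probabilistic Hough transform and intersected in a least-squares sense to estimate the vanishing point, and the obstacle distance follows from a flat-floor pinhole relation given the camera height. The thresholds, the flat-floor assumption, and the required camera parameters are illustrative assumptions.

```python
import cv2
import numpy as np

def vanishing_point(gray):
    """Intersect probabilistic Hough line segments to estimate the vanishing point."""
    edges = cv2.Canny(gray, 60, 180)
    segs = cv2.HoughLinesP(edges, 1, np.pi / 180, 60, minLineLength=60, maxLineGap=10)
    if segs is None or len(segs) < 2:
        return None
    A, b = [], []
    for x1, y1, x2, y2 in segs[:, 0]:
        # Line through the segment: n . p = c with normal n = (dy, -dx).
        n = np.array([y2 - y1, x1 - x2], float)
        A.append(n)
        b.append(n @ np.array([x1, y1], float))
    vp, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return vp                                    # least-squares intersection (u, v)

def obstacle_distance(v_pixel, cam_height, focal_px, cy):
    """Flat-floor pinhole distance to an obstacle's ground contact at image row v."""
    return focal_px * cam_height / (v_pixel - cy + 1e-9)
```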

Patent
27 Jul 2011
TL;DR: In this article, a monocular vision technique based fire monitor control method for adjusting the relative positions of a fire point and a water-drop point was proposed, which belongs to the technical field of fire monitoring and self-extinguishing.
Abstract: The invention relates to a monocular vision technique based fire monitor control method for adjusting the relative positions of a fire point and a water-drop point, which belongs to the technical field of fire monitoring and self-extinguishing. By using a method for controlling a pitching angle and a horizontal angle several times in sequence, the fire monitor control method solves the problem of adjusting the relative positions of the fire point and the water-drop point in three-dimensional space using a two-dimensional image, so that the water-drop point of the fire monitor can track changes in the position of the fire point in real time, providing a condition for self-extinguishing in a large space.

Dissertation
01 Jan 2011
TL;DR: This thesis implements a visual SLAM system using a stereo camera as the only sensor, which makes it possible to obtain accurate 3D reconstructions of the environment, and investigates a novel family of appearance descriptors known as Gauge-Speeded Up Robust Features (G-SURF).
Abstract: 3D applications have recently become a more and more popular topic in robotics, computer vision and augmented reality. By means of cameras and computer vision techniques, it is possible to obtain accurate 3D models of large-scale environments such as cities. In addition, cameras are low-cost, non-intrusive sensors compared to other sensors such as laser scanners. Furthermore, cameras also offer rich information about the environment. One application of great interest is vision-based localization in a prior 3D map. Robots need to perform tasks in the environment autonomously, and for this purpose it is very important to know precisely the location of the robot in the map. In the same way, providing accurate information about the location and spatial orientation of the user in a large-scale environment can be of benefit to those who suffer from visual impairment. Safe and autonomous navigation in unknown or known environments can be a great challenge for those who are blind or visually impaired. Most of the commercial solutions for visually impaired localization and navigation assistance are based on the satellite Global Positioning System (GPS). However, these solutions are not suitable enough for the visually impaired community in urban environments. The errors are on the order of several meters, and there are also other problems such as GPS signal loss or line-of-sight restrictions. In addition, GPS does not work if an insufficient number of satellites are directly visible and therefore cannot be used in indoor environments. Thus, it is important to do further research on new, more robust and accurate localization systems. In this thesis we propose several algorithms in order to obtain accurate real-time vision-based localization from a prior 3D map. For that purpose, it is necessary to compute a 3D map of the environment beforehand. For computing that 3D map, we employ well-known techniques such as Simultaneous Localization and Mapping (SLAM) or Structure from Motion (SfM). In this thesis, we implement a visual SLAM system using a stereo camera as the only sensor, which allows us to obtain accurate 3D reconstructions of the environment. The proposed SLAM system is also capable of detecting moving objects, especially in a close range to the camera of up to approximately 5 meters, thanks to a moving-objects detection module. This is possible thanks to a dense scene flow representation of the environment, which allows the 3D motion of the world points to be obtained. This moving-objects detection module seems to be very effective in highly crowded and dynamic environments, where there are a huge number of dynamic objects such as pedestrians. By means of the moving-objects detection module we avoid adding erroneous 3D points into the SLAM process, yielding much better and more consistent 3D reconstruction results. To the best of our knowledge, this is the first time that dense scene flow and the derived detection of moving objects have been applied in the context of visual SLAM for challenging crowded and dynamic environments, such as the ones presented in this thesis. In SLAM and vision-based localization approaches, 3D map points are usually described by means of appearance descriptors. By means of these appearance descriptors, the data association between 3D map elements and perceived 2D image features can be performed. In this thesis we have investigated a novel family of appearance descriptors known as Gauge-Speeded Up Robust Features (G-SURF).
Those descriptors are based on the use of gauge coordinates. By means of these coordinates, every pixel in the image is fixed separately in its own local coordinate frame defined by the local structure itself, consisting of the gradient vector and its perpendicular direction. We have carried out an extensive experimental evaluation on different applications such as image matching, visual object categorization and 3D SfM, which shows the usefulness and improved results of G-SURF descriptors compared with other state-of-the-art descriptors such as the Scale Invariant Feature Transform (SIFT) or SURF. In vision-based localization applications, one of the most expensive computational steps is the data association between a large map of 3D points and the perceived 2D features in the image. Traditional approaches often rely purely on appearance information for solving the data association step. These algorithms can have a high computational demand, and for environments with highly repetitive textures, such as cities, this data association can lead to erroneous results due to the ambiguities introduced by visually similar features. In this thesis we have developed an algorithm for predicting the visibility of 3D points by means of a memory-based learning approach from a prior 3D reconstruction. Thanks to this learning approach, we can speed up the data association step by predicting the visible 3D points given a prior camera pose. We have implemented and evaluated visual SLAM and vision-based localization algorithms for two different applications of great interest: humanoid robots and visually impaired people. Regarding humanoid robots, a monocular vision-based localization algorithm with visibility prediction has been evaluated under different scenarios and different types of sequences, such as square and circular trajectories, sequences with moving objects, changes in lighting, etc. A comparison of the localization and mapping error has been made with respect to a precise motion capture system, yielding errors on the order of a few cm. Furthermore, we also compared our vision-based localization system with the Parallel Tracking and Mapping (PTAM) approach, obtaining much better results with our localization algorithm. With respect to the vision-based localization approach for the visually impaired, we have evaluated the vision-based localization system in indoor and cluttered office-like environments. In addition, we have evaluated the visual SLAM algorithm with moving-objects detection in tests with real visually impaired users in very dynamic environments such as inside the Atocha railway station (Madrid, Spain) and in the city center of Alcala de Henares (Madrid, Spain). The obtained results highlight the potential benefits of our approach for the localization of the visually impaired in large and cluttered environments.

Journal ArticleDOI
TL;DR: It is suggested that binocular vision is not necessary for experts to perform at their best; however, eliminating binocular vision could be part of an optimization strategy for apprentices, which could in turn be transferred to new training programs.
Abstract: The goal of this study was to investigate the role of binocular and monocular vision in 16 gymnasts as they perform a handspring on vault. In particular, we reasoned that if binocular visual information is eliminated while experts and apprentices perform a handspring on vault, and their performance level changes or is maintained, then such information must or must not be necessary for their best performance. If the elimination of binocular vision leads to differences in gaze behavior in either experts or apprentices, this would answer the question of an adaptive gaze behavior, and thus whether it is a function of expertise level or not. Gaze behavior was measured using a portable and wireless eye-tracking system in combination with a movement-analysis system. Results revealed that gaze behavior differed between experts and apprentices in the binocular and monocular conditions. In particular, apprentices showed fewer fixations of longer duration in the monocular condition as compared to experts and the binocular condition. Apprentices showed longer blink duration than experts in both the monocular and binocular conditions. Eliminating binocular vision led to a shorter repulsion phase and a longer second flight phase in apprentices. Experts exhibited no differences in phase durations between binocular and monocular conditions. Findings suggest that experts may not rely on binocular vision when performing handsprings, and that movement performance may be influenced in apprentices when binocular vision is eliminated. We conclude that knowledge about gaze-movement relationships may be beneficial for coaches when teaching the handspring on vault in gymnastics. Key points: Skills in gymnastics are quite complex, and the athlete has to meet temporal and spatial constraints to perform these skills adequately. Visual information pickup is thought to be integral in complex skill performance; however, there is no compelling evidence on the role of binocular vision in complex skill performance. The study reveals that apprentices optimize their gaze behavior and their movement behavior when binocular vision is eliminated, whereas experts' gaze behavior and movement behavior are uninfluenced by eliminating binocular vision. We state that binocular vision is not necessary for experts to perform at their best. However, eliminating binocular vision could be part of an optimization strategy for apprentices, which could in turn be transferred to new training programs.