
Monocular vision

About: Monocular vision is a research topic. Over the lifetime, 2667 publications have been published within this topic receiving 48827 citations.


Papers
Proceedings ArticleDOI
07 Aug 2005
TL;DR: An approach is presented in which supervised learning is first used to estimate depths from single monocular images; the resulting algorithm learns monocular vision cues that accurately estimate the relative depths of obstacles in a scene.
Abstract: We consider the task of driving a remote control car at high speeds through unstructured outdoor environments. We present an approach in which supervised learning is first used to estimate depths from single monocular images. The learning algorithm can be trained either on real camera images labeled with ground-truth distances to the closest obstacles, or on a training set consisting of synthetic graphics images. The resulting algorithm is able to learn monocular vision cues that accurately estimate the relative depths of obstacles in a scene. Reinforcement learning/policy search is then applied within a simulator that renders synthetic scenes. This learns a control policy that selects a steering direction as a function of the vision system's output. We present results evaluating the predictive ability of the algorithm both on held-out test data and in actual autonomous driving experiments.

435 citations
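Below is a minimal sketch of the supervised depth-estimation idea described in the abstract above: crude per-stripe texture statistics fed to a linear regressor. The stripe features and the ridge regressor are illustrative assumptions, not the paper's actual feature set or learning algorithm.

import numpy as np
from sklearn.linear_model import Ridge

def stripe_features(img, n_stripes=16):
    # Split a grayscale image into vertical stripes and compute crude texture cues
    # (mean, spread, vertical gradient energy) as stand-ins for real monocular depth cues.
    feats = []
    for cols in np.array_split(np.arange(img.shape[1]), n_stripes):
        stripe = img[:, cols].astype(float)
        grad_y = np.abs(np.diff(stripe, axis=0))
        feats.extend([stripe.mean(), stripe.std(), grad_y.mean()])
    return np.array(feats)

def train_depth_regressor(images, closest_obstacle_dist):
    # images: list of 2D grayscale arrays; closest_obstacle_dist: ground-truth
    # distances (e.g. metres) to the nearest obstacle, one per image.
    X = np.stack([stripe_features(im) for im in images])
    y = np.asarray(closest_obstacle_dist)
    return Ridge(alpha=1.0).fit(X, y)

def predict_closest_distance(model, img):
    return float(model.predict(stripe_features(img)[None, :])[0])

A model of this form could then be queried inside a simulator to drive a steering policy, which is the role reinforcement learning/policy search plays in the paper.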

Journal ArticleDOI
TL;DR: This work describes the first aerial vehicle that uses onboard monocular vision as its main sensor to navigate through an unknown, GPS-denied environment, independently of any external artificial aids.
Abstract: Autonomous micro aerial vehicles (MAVs) will soon play a major role in tasks such as search and rescue, environment monitoring, surveillance, and inspection. They allow us to easily access environments to which no humans or other vehicles can get access. This reduces the risk for both the people and the environment. For the above applications, it is, however, a requirement that the vehicle is able to navigate without using GPS, without relying on a preexisting map, and without specific assumptions about the environment. This will allow operations in unstructured, unknown, and GPS-denied environments. We present a novel solution for the task of autonomous navigation of a micro helicopter through a completely unknown environment by using solely a single camera and inertial sensors onboard. Many existing solutions suffer from the problem of drift in the xy plane or from the dependency on a clean GPS signal. The novelty of the approach presented here is to use a monocular simultaneous localization and mapping (SLAM) framework to stabilize the vehicle in six degrees of freedom. This way, we overcome the problem of both the drift and the GPS dependency. The pose estimated by the visual SLAM algorithm is used in a linear optimal controller that allows us to perform all basic maneuvers such as hovering, set-point and trajectory following, vertical takeoff, and landing. All calculations, including SLAM and the controller, run in real time and online while the helicopter is flying. No offline processing or preprocessing is done. We show real experiments that demonstrate that the vehicle can fly autonomously in an unknown and unstructured environment. To the best of our knowledge, this work describes the first aerial vehicle that uses onboard monocular vision as its main sensor to navigate through an unknown, GPS-denied environment, independently of any external artificial aids. © 2011 Wiley Periodicals, Inc.

422 citations
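As a rough illustration of how a monocular-SLAM pose estimate can feed a linear position controller for hovering and set-point following, here is a sketch with an assumed PD structure and gains; the paper's actual optimal-controller design differs.

import numpy as np

class HoverController:
    # A PD position controller acting on the SLAM pose estimate (map frame).
    def __init__(self, kp=1.2, kd=0.8):
        self.kp, self.kd = kp, kd

    def command(self, pose_est, vel_est, setpoint):
        # Returns a desired acceleration vector that a lower-level
        # attitude/thrust loop is assumed to track.
        pos_err = np.asarray(setpoint, float) - np.asarray(pose_est, float)
        return self.kp * pos_err - self.kd * np.asarray(vel_est, float)

# Example: hold a 1 m hover above the map origin.
ctrl = HoverController()
accel_cmd = ctrl.command(pose_est=[0.1, -0.2, 1.05],
                         vel_est=[0.0, 0.05, 0.0],
                         setpoint=[0.0, 0.0, 1.0])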

Journal ArticleDOI
TL;DR: A new real-time localization system for a mobile robot is presented, showing that autonomous navigation is possible in outdoor settings with a single camera and natural landmarks, using a three-step approach.
Abstract: This paper presents a new real-time localization system for a mobile robot. We show that autonomous navigation is possible in outdoor situations with the use of a single camera and natural landmarks. To do that, we use a three-step approach. In a learning step, the robot is manually guided along a path and a video sequence is recorded with a front-looking camera. Then a structure-from-motion algorithm is used to build a 3D map from this learning sequence. Finally, in the navigation step, the robot uses this map to compute its localization in real time, and it follows the learned path or a slightly different path if desired. The vision algorithms used for map building and localization are first detailed. Then a large part of the paper is dedicated to the experimental evaluation of the accuracy and robustness of our algorithms, based on experimental data collected over two years in various environments.

361 citations
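A minimal sketch of the navigation step, assuming a keyframe map with 3D landmarks already built offline by structure from motion; the ORB features, the keyframe data layout, and the PnP-based localization are illustrative assumptions, not the paper's exact algorithms.

import cv2
import numpy as np

def localize_against_keyframe(frame_gray, keyframe, K):
    # keyframe: dict with 'descriptors' (N x 32 ORB descriptors) and 'points3d'
    # (N x 3 landmark positions), assumed to come from the offline SfM map.
    # K: 3x3 camera intrinsics. Returns (rvec, tvec) of the camera pose or None.
    orb = cv2.ORB_create(1000)
    kps, desc = orb.detectAndCompute(frame_gray, None)
    if desc is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc, keyframe["descriptors"])
    if len(matches) < 6:
        return None
    pts2d = np.float32([kps[m.queryIdx].pt for m in matches])
    pts3d = np.float32([keyframe["points3d"][m.trainIdx] for m in matches])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    return (rvec, tvec) if ok else None

The estimated pose relative to the mapped path could then drive a path-following controller, which is the role of the navigation step in the paper.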

Journal ArticleDOI
TL;DR: A real-time visual processing theory is developed to explain how three-dimensional form, color, and brightness percepts are coherently synthesized and how boundary completion and segmentation processes become binocular at an earlier processing stage than do color and brightness perception processes.
Abstract: A real-time visual processing theory is developed to explain how three-dimensional form, color, and brightness percepts are coherently synthesized. The theory describes how several fundamental uncertainty principles which limit the computation of visual information at individual processing stages are resolved through parallel and hierarchical interactions among several processing stages. The theory hereby provides a unified analysis and many predictions of data about stereopsis, binocular rivalry, hyperacuity, McCollough effect, textural grouping, border distinctness, surface perception, monocular and binocular brightness percepts, filling-in, metacontrast, transparency, figural aftereffects, lateral inhibition within spatial frequency channels, proximity-luminance covariance, tissue contrast, motion segmentation, and illusory figures, as well as about reciprocal interactions among the hypercolumns, blobs, and stripes of cortical areas V1, V2, and V4. Monocular and binocular interactions between a Boundary Contour (BC) System and a Feature Contour (FC) System are developed. The BC System, defined by a hierarchy of oriented interactions, synthesizes an emergent and coherent binocular boundary segmentation from combinations of unoriented and oriented scenic elements. These BC System interactions instantiate a new theory of stereopsis and of how mechanisms of stereopsis are related to mechanisms of boundary segmentation. Interactions between the BC System and the FC System explain why boundary completion and segmentation processes become binocular at an earlier processing stage than do color and brightness perception processes. The new stereopsis theory includes a new model of how chromatically broadband cortical complex cells can be adaptively tuned to multiplex information about position, orientation, spatial frequency, positional disparity, and orientational disparity. These binocular cells input to spatially short-range competitive interactions (within orientations and between positions, followed by between orientations and within positions) that initiate suppression of binocular double images as they complete boundaries at scenic line ends and corners. The competitive interactions interact via both feedforward and feedback pathways with spatially long-range oriented cooperative gating interactions that generate a coherent, multiple-scale, three-dimensional boundary segmentation as they complete the suppression of double-image boundaries. The completed BC System boundary segmentation generates output signals, called filling-in generators (FIGs) and filling-in barriers (FIBs), along parallel pathways to two successive FC System stages: the monocular syncytium and the binocular syncytium. FIB signals at the monocular syncytium suppress monocular color and brightness signals that are binocularly inconsistent and select binocularly consistent, monocular FC signals as outputs to the binocular syncytium. Binocular matching of these FC signals further suppresses binocularly inconsistent color and brightness signals. Binocular FC contour signals that survive these multiple suppressive events interact with FIB signals at the binocular syncytium to fill in a multiple-scale representation of form-and-color-in-depth. To achieve these properties, distinct syncytia correspond to each spatial scale of the BC System. Each syncytium is composed of opponent subsyncytia that generate output signals through a network of double-opponent cells. Although composed of unoriented wavelength-sensitive cells, double-opponent networks detect oriented properties of form when they interact with FIG signals, yet also generate nonselective properties of binocular rivalry. Electrotonic and chemical transmitter interactions within the syncytia are formally akin to interactions in H1 horizontal cells of the turtle retina. The cortical syncytia are hypothesized to be encephalizations of ancestral retinal syncytia. In addition to double-opponent-cell networks, electrotonic syncytial interactions, and resistive gating signals due to BC System outputs, the FC System processes also include habituative transmitters and non-Hebbian adaptive filters that maintain the positional and chromatic selectivity of FC interactions. Alternative perceptual theories are evaluated in light of these results. The theoretical circuits provide qualitatively new design principles and architectures for computer vision applications.

336 citations
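As a toy illustration of the filling-in-barrier idea in the abstract above (a feature signal spreading laterally until blocked by boundary signals), the following sketch diffuses a brightness map gated by a boundary map; the update rule is a simplification for illustration, not the theory's actual equations.

import numpy as np

def fill_in(feature, boundary, n_iter=200, rate=0.2):
    # feature: 2D array of brightness signals to be spread laterally.
    # boundary: 2D array in [0, 1]; high values act as filling-in barriers.
    f = feature.astype(float).copy()
    permeability = 1.0 - np.clip(boundary, 0.0, 1.0)
    for _ in range(n_iter):
        up, down = np.roll(f, -1, axis=0), np.roll(f, 1, axis=0)
        left, right = np.roll(f, -1, axis=1), np.roll(f, 1, axis=1)
        laplacian = up + down + left + right - 4.0 * f
        # Diffusion proceeds only where the boundary signal is weak.
        f += rate * permeability * laplacian
    return f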

Proceedings ArticleDOI
09 May 2011
TL;DR: The latest achievements towards the goal of autonomous flight of an MAV in unknown environments, with only a monocular camera as exteroceptive sensor, are presented, together with a solution to the issue of a low-frequency onboard visual pose update versus the high agility of an MAV.
Abstract: In this paper, we present our latest achievements towards the goal of autonomous flights of an MAV in unknown environments, having only a monocular camera as exteroceptive sensor. As MAVs are highly agile, it is not sufficient to directly use the visual input for position control at the framerates that can be achieved with small onboard computers. Our contributions in this work are twofold. First, we present a solution to overcome the issue of having a low-frequency onboard visual pose update versus the high agility of an MAV. This is solved by filtering visual information with inputs from inertial sensors. Second, as our system is based on monocular vision, we present a solution to estimate the metric visual scale with the aid of an air pressure sensor. All computation runs onboard and is tightly integrated on the MAV to avoid jitter and latencies. This framework enables stable flights indoors and outdoors even under windy conditions.

313 citations
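A minimal sketch of the two ideas highlighted in the abstract above: blending low-rate visual pose updates with high-rate inertial prediction, and estimating the metric scale of the monocular estimate from a barometric altitude reference. The complementary-filter structure, gains, and scale update are assumptions, not the paper's actual filter.

class VisualInertialAltitude:
    # Fuses high-rate IMU prediction with low-rate (and initially unscaled)
    # visual altitude, and tracks the metric scale using barometric changes.
    def __init__(self, blend=0.05):
        self.z = 0.0        # fused metric altitude estimate
        self.vz = 0.0       # fused vertical velocity
        self.scale = 1.0    # metres per visual-SLAM unit (unknown for monocular vision)
        self.blend = blend  # correction gain for the vision update

    def predict(self, accel_z, dt):
        # High-rate IMU step: integrate gravity-compensated vertical acceleration.
        self.vz += accel_z * dt
        self.z += self.vz * dt

    def correct(self, z_visual_unscaled):
        # Low-rate vision step: pull the drifting inertial estimate towards
        # the scaled visual altitude.
        self.z += self.blend * (self.scale * z_visual_unscaled - self.z)

    def update_scale(self, dz_baro, dz_visual_unscaled, eps=1e-3):
        # Estimate metric scale from matched altitude changes (barometer vs. vision),
        # smoothed with a slow running average.
        if abs(dz_visual_unscaled) > eps:
            self.scale = 0.9 * self.scale + 0.1 * (dz_baro / dz_visual_unscaled)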


Network Information
Related Topics (5)
Visual perception: 20.8K papers, 997.2K citations, 77% related
Facial recognition system: 38.7K papers, 883.4K citations, 76% related
Visual cortex: 18.8K papers, 1.2M citations, 73% related
Lens (optics): 156.4K papers, 1.2M citations, 72% related
Body movement: 14.6K papers, 804.3K citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    58
2022    126
2021    92
2020    163
2019    208
2018    170