Author

Maxime Meilland

Bio: Maxime Meilland is an academic researcher from the University of Nice Sophia Antipolis. The author has contributed to research on the topics of pose estimation and visual servoing, has an h-index of 13, and has co-authored 23 publications receiving 628 citations. Previous affiliations of Maxime Meilland include the French Institute for Research in Computer Science and Automation and the Centre national de la recherche scientifique.

Papers
Proceedings ArticleDOI
01 Nov 2013
TL;DR: An approach to real-time dense localisation and mapping that unifies two different representations commonly used to define dense models, and is able to perform accurate large-scale reconstruction at the scale of mapping a building.
Abstract: This paper proposes an approach to real-time dense localisation and mapping that aims at unifying two different representations commonly used to define dense models. On one hand, much research has looked at dense model representations using 3D voxel grids. On the other hand, image-based key-frame representations for dense environment mapping have been developed. Both techniques have their relative advantages and disadvantages, which will be analysed in this paper. In particular, the representations will be compared in terms of their memory requirements, effective resolution, computational efficiency, accuracy and robustness. This paper then proposes a new model which unifies various concepts and exhibits the main advantages of each approach within a common framework. One of the main results of the proposed approach is its ability to perform accurate large-scale reconstruction at the scale of mapping a building.

117 citations
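
The unified model itself is not spelled out in this listing, but the key-frame side of the comparison can be pictured with a small sketch. Below is a hypothetical Python fragment (names such as Keyframe and select_reference are illustrative, not from the paper) of an image-based key-frame map: each node stores an RGB-D image with a 6-DoF pose, and dense registration is performed against the key-frame closest to the current pose estimate.

# Hypothetical sketch of an image-based key-frame map; not the paper's implementation.
from dataclasses import dataclass
import numpy as np

@dataclass
class Keyframe:
    pose: np.ndarray       # 4x4 camera-to-world transform
    intensity: np.ndarray  # HxW grayscale image
    depth: np.ndarray      # HxW depth map, in metres

def select_reference(keyframes, current_pose):
    """Return the key-frame whose optical centre is closest to the current camera."""
    centres = np.stack([kf.pose[:3, 3] for kf in keyframes])
    distances = np.linalg.norm(centres - current_pose[:3, 3], axis=1)
    return keyframes[int(np.argmin(distances))]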

Proceedings ArticleDOI
23 Dec 2013
TL;DR: Augmented reality results are provided which demonstrate real-time 3D HDR mapping, virtual light-probe synthesis and light source detection for rendering reflective objects with shadows seamlessly into the real video stream.
Abstract: Acquiring High Dynamic Range (HDR) light-fields from several images with different exposures (sensor integration periods) has been widely considered for static camera positions. In this paper a new approach is proposed that enables 3D HDR environment maps to be acquired directly from a dynamic set of images in real-time. In particular a method will be proposed to use an RGB-D camera as a dynamic light-field sensor, based on a dense real-time 3D tracking and mapping approach, that avoids the need for a light-probe or the observation of reflective surfaces. The 6dof pose and dense scene structure will be estimated simultaneously with the observed dynamic range so as to compute the radiance map of the scene and fuse a stream of low dynamic range images (LDR) into an HDR image. This will then be used to create an arbitrary number of virtual omni-directional light-probes that will be placed at the positions where virtual augmented objects will be rendered. In addition, a solution is provided for the problem of automatic shutter variations in visual SLAM. Augmented reality results are provided which demonstrate real-time 3D HDR mapping, virtual light-probe synthesis and light source detection for rendering reflective objects with shadows seamlessly with the real video stream in real-time.

69 citations
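
As a rough illustration of the LDR-to-HDR fusion step described above, the sketch below combines differently exposed images into a radiance map under the simplifying assumption of a linear camera response and known exposure times. The function name fuse_ldr_to_hdr and the hat-shaped weighting are illustrative choices, not the authors' exact formulation.

# Illustrative sketch only: assumes a linear camera response and known exposure times.
import numpy as np

def fuse_ldr_to_hdr(images, exposures):
    """images: list of HxW arrays with values in [0, 1]; exposures: shutter times in seconds."""
    radiance = np.zeros_like(images[0], dtype=np.float64)
    weights = np.zeros_like(images[0], dtype=np.float64)
    for img, t in zip(images, exposures):
        w = 1.0 - np.abs(2.0 * img - 1.0)  # trust mid-range pixels, down-weight under/over-exposed ones
        radiance += w * img / t            # linear response: radiance is proportional to value / exposure
        weights += w
    return radiance / np.maximum(weights, 1e-6)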

Proceedings ArticleDOI
03 Dec 2010
TL;DR: This paper describes a generic method for vision-based navigation in real urban environments based on spherical images augmented with depth information and a spherical saliency map, both constructed in a learning phase.
Abstract: This paper describes a generic method for vision-based navigation in real urban environments. The proposed approach relies on a representation of the scene based on spherical images augmented with depth information and a spherical saliency map, both constructed in a learning phase. Saliency maps are built by selecting the points which best condition the spherical projection constraints in the image. During navigation, an image-based registration technique combined with robust outlier rejection is used to precisely locate the vehicle. The main objective of this work is to improve computational time by better representing and selecting information from the reference sphere and the current image without degrading matching. It will be shown that by using this pre-learned global spherical memory, no error is accumulated along the trajectory and the vehicle can be precisely located without drift.

64 citations
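
The saliency-based pixel selection can be pictured with a simple gradient criterion: pixels with strong intensity gradients constrain a photometric registration best. The sketch below keeps the top fraction of such pixels; it is only an illustration of the general idea, not the spherical criterion used in the paper.

# Illustration of gradient-based pixel selection; not the paper's spherical saliency criterion.
import numpy as np

def saliency_mask(intensity, keep_ratio=0.1):
    """Keep the keep_ratio fraction of pixels with the largest intensity gradients."""
    gy, gx = np.gradient(intensity.astype(np.float64))
    strength = np.hypot(gx, gy)
    threshold = np.quantile(strength, 1.0 - keep_ratio)
    return strength >= threshold  # boolean mask of the most informative pixels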

Book ChapterDOI
07 Oct 2012
TL;DR: This article proposes a new direct visual tracking method based on the normalized cross correlation (NCC), together with an efficient Newton-style optimization procedure that does not require explicit computation of the Hessian.
Abstract: Direct visual tracking can be impaired by changes in illumination if the right choice of similarity function and photometric model is not made. Tracking using the sum of squared differences, for instance, often needs to be coupled with a photometric model to mitigate illumination changes. More sophisticated similarities, e.g. mutual information and cross cumulative residual entropy, however, can cope with complex illumination variations at the cost of a reduction of the convergence radius, and an increase of the computational effort. In this context, the normalized cross correlation (NCC) represents an interesting alternative. The NCC is intrinsically invariant to affine illumination changes, and also presents low computational cost. This article proposes a new direct visual tracking method based on the NCC. Two techniques have been developed to improve the robustness to complex illumination variations and partial occlusions. These techniques are based on subregion clusterization, and weighting by a residue invariant to affine illumination changes. The last contribution is an efficient Newton-style optimization procedure that does not require the explicit computation of the Hessian. The proposed method is compared against the state of the art using a benchmark database with ground-truth, as well as real-world sequences.

49 citations
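
A minimal version of the similarity at the core of this method is the zero-mean normalized cross correlation between a template and a candidate patch. Because the NCC subtracts the means and divides by the norms, it is unchanged by an affine intensity change a*I + b, which is what makes it attractive for illumination-robust direct tracking. The helper below is an illustrative sketch, not the paper's optimized implementation.

# Illustrative helper; the paper additionally uses subregion clustering, weighting and a Newton-style solver.
import numpy as np

def ncc(template, patch, eps=1e-12):
    """Zero-mean normalized cross correlation between two equally sized patches."""
    t = template.astype(np.float64) - template.mean()
    p = patch.astype(np.float64) - patch.mean()
    return float(np.sum(t * p) / (np.linalg.norm(t) * np.linalg.norm(p) + eps))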


Cited by
Proceedings ArticleDOI
24 Dec 2012
TL;DR: A large set of image sequences from a Microsoft Kinect with highly accurate and time-synchronized ground truth camera poses from a motion capture system is recorded for the evaluation of RGB-D SLAM systems.
Abstract: In this paper, we present a novel benchmark for the evaluation of RGB-D SLAM systems. We recorded a large set of image sequences from a Microsoft Kinect with highly accurate and time-synchronized ground truth camera poses from a motion capture system. The sequences contain both the color and depth images in full sensor resolution (640 × 480) at video frame rate (30 Hz). The ground-truth trajectory was obtained from a motion-capture system with eight high-speed tracking cameras (100 Hz). The dataset consists of 39 sequences that were recorded in an office environment and an industrial hall. The dataset covers a large variety of scenes and camera motions. We provide sequences for debugging with slow motions as well as longer trajectories with and without loop closures. Most sequences were recorded from a handheld Kinect with unconstrained 6-DOF motions but we also provide sequences from a Kinect mounted on a Pioneer 3 robot that was manually navigated through a cluttered indoor environment. To stimulate the comparison of different approaches, we provide automatic evaluation tools both for the evaluation of drift of visual odometry systems and the global pose error of SLAM systems. The benchmark website [1] contains all data, detailed descriptions of the scenes, specifications of the data formats, sample code, and evaluation tools.

3,050 citations
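
The drift and global-error metrics provided by the benchmark boil down to comparing time-associated estimated and ground-truth poses. As a simplified sketch, the absolute trajectory error (ATE) can be computed as the RMSE of the translational differences; the official evaluation tool additionally aligns the two trajectories with a rigid-body transform before computing this value, a step omitted here.

# Simplified ATE computation; the benchmark's official tool also aligns the trajectories first.
import numpy as np

def ate_rmse(estimated_xyz, groundtruth_xyz):
    """Both arguments: Nx3 arrays of time-associated camera positions (already aligned)."""
    diff = np.asarray(estimated_xyz, dtype=np.float64) - np.asarray(groundtruth_xyz, dtype=np.float64)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))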

Reference EntryDOI
15 Oct 2004

2,118 citations

Proceedings ArticleDOI
29 Sep 2014
TL;DR: A semi-direct monocular visual odometry algorithm that is precise, robust, and faster than current state-of-the-art methods and applied to micro-aerial-vehicle state-estimation in GPS-denied environments is proposed.
Abstract: We propose a semi-direct monocular visual odometry algorithm that is precise, robust, and faster than current state-of-the-art methods. The semi-direct approach eliminates the need for costly feature extraction and robust matching techniques for motion estimation. Our algorithm operates directly on pixel intensities, which results in subpixel precision at high frame-rates. A probabilistic mapping method that explicitly models outlier measurements is used to estimate 3D points, which results in fewer outliers and more reliable points. Precise and high frame-rate motion estimation brings increased robustness in scenes of little, repetitive, and high-frequency texture. The algorithm is applied to micro-aerial-vehicle state-estimation in GPS-denied environments and runs at 55 frames per second on the onboard embedded computer and at more than 300 frames per second on a consumer laptop. We call our approach SVO (Semi-direct Visual Odometry) and release our implementation as open-source software.

1,814 citations
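
The core of the semi-direct alignment step can be sketched as a sparse photometric cost: 3D points observed in a reference frame are reprojected into the current frame under a candidate pose, and intensity differences are accumulated. The real system minimizes this cost over small patches with Gauss-Newton; the single-pixel, nearest-neighbour version below (with assumed names such as photometric_cost) is only meant to show the shape of the residual.

# Illustrative single-pixel version of a sparse photometric cost; not SVO's implementation.
import numpy as np

def photometric_cost(ref_img, cur_img, points_ref, T_cur_ref, K):
    """points_ref: Nx3 points in the reference camera frame; T_cur_ref: 4x4 pose; K: 3x3 intrinsics."""
    R, t = T_cur_ref[:3, :3], T_cur_ref[:3, 3]
    cost = 0.0
    for p_ref in points_ref:
        u_ref = K @ (p_ref / p_ref[2])        # pixel of the point in the reference image
        p_cur = R @ p_ref + t                 # point expressed in the current camera frame
        if p_cur[2] <= 0:
            continue                          # point is behind the current camera
        u_cur = K @ (p_cur / p_cur[2])        # pixel of the point in the current image
        r0, c0 = int(round(u_ref[1])), int(round(u_ref[0]))
        r1, c1 = int(round(u_cur[1])), int(round(u_cur[0]))
        if (0 <= r0 < ref_img.shape[0] and 0 <= c0 < ref_img.shape[1]
                and 0 <= r1 < cur_img.shape[0] and 0 <= c1 < cur_img.shape[1]):
            cost += float(ref_img[r0, c0] - cur_img[r1, c1]) ** 2
    return cost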

01 Jan 1979
TL;DR: This special issue gathers recent advances in learning-with-shared-information methods and their applications in computer vision and multimedia analysis, with an emphasis on real-world applications.
Abstract: In the real world, a realistic setting for computer vision or multimedia recognition problems is that some classes have plenty of training data while many classes have only a small amount. How to use the frequent classes to help learn rare classes, for which training data is harder to collect, is therefore an open question. Learning with shared information is an emerging topic in machine learning, computer vision and multimedia analysis. Different levels of components can be shared during concept modelling and learning, such as generic object parts, attributes, transformations, regularization parameters and training examples. Regarding specific methods, multi-task learning, transfer learning and deep learning can be seen as different strategies for sharing information. These methods are very effective in solving real-world large-scale problems. This special issue aims at gathering the recent advances in learning with shared information and their applications in computer vision and multimedia analysis. Both state-of-the-art work and literature reviews are welcome for submission; papers addressing interesting real-world computer vision and multimedia applications are especially encouraged. Topics of interest include, but are not limited to:
• Multi-task learning or transfer learning for large-scale computer vision and multimedia analysis
• Deep learning for large-scale computer vision and multimedia analysis
• Multi-modal approaches for large-scale computer vision and multimedia analysis
• Different sharing strategies, e.g. sharing generic object parts, attributes, transformations, regularization parameters and training examples
• Real-world computer vision and multimedia applications based on learning with shared information, e.g. event detection, object recognition, object detection, action recognition, human head pose estimation, object tracking, location-based services, semantic indexing
• New datasets and metrics to evaluate the benefit of the proposed sharing ability for the specific computer vision or multimedia problem
• Survey papers on learning with shared information
Authors who are unsure whether their planned submission is in scope may contact the guest editors prior to the submission deadline with an abstract in order to receive feedback.

1,758 citations