scispace - formally typeset
Search or ask a question
Author

Patrick Bouthemy

Bio: Patrick Bouthemy is an academic researcher from French Institute for Research in Computer Science and Automation. The author has contributed to research in topics: Motion estimation & Image segmentation. The author has an hindex of 41, co-authored 189 publications receiving 7619 citations. Previous affiliations of Patrick Bouthemy include Institut de Recherche en Informatique et Systèmes Aléatoires.


Papers
More filters
Journal ArticleDOI
TL;DR: Numerical results support this approach, as validated by the use of these algorithms on complex sequences, and two robust estimators in a multi-resolution framework are developed.

673 citations

Proceedings ArticleDOI
23 Jun 2013
TL;DR: It is established that adequately decomposing visual motion into dominant and residual motions, both in the extraction of the space-time trajectories and for the computation of descriptors, significantly improves action recognition algorithms.
Abstract: Several recent works on action recognition have attested the importance of explicitly integrating motion characteristics in the video description. This paper establishes that adequately decomposing visual motion into dominant and residual motions, both in the extraction of the space-time trajectories and for the computation of descriptors, significantly improves action recognition algorithms. Then, we design a new motion descriptor, the DCS descriptor, based on differential motion scalar quantities, divergence, curl and shear features. It captures additional information on the local motion patterns enhancing results. Finally, applying the recent VLAD coding technique proposed in image retrieval provides a substantial improvement for action recognition. Our three contributions are complementary and lead to outperform all reported results by a significant margin on three challenging datasets, namely Hollywood 2, HMDB51 and Olympic Sports.

436 citations

Journal ArticleDOI
TL;DR: A survey of optical flow estimation classifying the main principles elaborated during this evolution, with a particular concern given to recent developments is proposed.

368 citations

Journal ArticleDOI
TL;DR: It is shown that multiple constraints can provide more accurate flow estimation in a wide range of circumstances and is presented a multimodal approach to the problem of motion estimation in which the computation of visual motion is based on several complementary constraints.
Abstract: The estimation of dense velocity fields from image sequences is basically an ill-posed problem, primarily because the data only partially constrain the solution. It is rendered especially difficult by the presence of motion boundaries and occlusion regions which are not taken into account by standard regularization approaches. In this paper, the authors present a multimodal approach to the problem of motion estimation in which the computation of visual motion is based on several complementary constraints. It is shown that multiple constraints can provide more accurate flow estimation in a wide range of circumstances. The theoretical framework relies on Bayesian estimation associated with global statistical models, namely, Markov random fields. The constraints introduced here aim to address the following issues: optical flow estimation while preserving motion boundaries, processing of occlusion regions, fusion between gradient and feature-based motion constraint equations. Deterministic relaxation algorithms are used to merge information and to provide a solution to the maximum a posteriori estimation of the unknown dense motion field. The algorithm is well suited to a multiresolution implementation which brings an appreciable speed-up as well as a significant improvement of estimation when large displacements are present in the scene. Experiments on synthetic and real world image sequences are reported. >

322 citations

Proceedings ArticleDOI
23 Jun 2014
TL;DR: This paper introduces a sampling strategy to produce 2D+t sequences of bounding boxes, called tubelets, that significantly outperforms the state-of-the-art on both datasets, while restricting the search of actions to a fraction of possible bounding box sequences.
Abstract: This paper considers the problem of action localization, where the objective is to determine when and where certain actions appear We introduce a sampling strategy to produce 2D+t sequences of bounding boxes, called tubelets Compared to state-of-the-art alternatives, this drastically reduces the number of hypotheses that are likely to include the action of interest Our method is inspired by a recent technique introduced in the context of image localization Beyond considering this technique for the first time for videos, we revisit this strategy for 2D+t sequences obtained from super-voxels Our sampling strategy advantageously exploits a criterion that reflects how action related motion deviates from background motion We demonstrate the interest of our approach by extensive experiments on two public datasets: UCF Sports and MSR-II Our approach significantly outperforms the state-of-the-art on both datasets, while restricting the search of actions to a fraction of possible bounding box sequences

317 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: This paper has designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can easily be extended to include new algorithms.
Abstract: Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, relatively little work has been done on characterizing their performance. In this paper, we present a taxonomy of dense, two-frame stereo methods designed to assess the different components and design decisions made in individual stereo algorithms. Using this taxonomy, we compare existing stereo methods and present experiments evaluating the performance of many different variants. In order to establish a common software platform and a collection of data sets for easy evaluation, we have designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can be easily extended to include new algorithms. We have also produced several new multiframe stereo data sets with ground truth, and are making both the code and data sets available on the Web.

7,458 citations

Proceedings Article
08 Dec 2014
TL;DR: This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
Abstract: We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for action recognition in video. The challenge is to capture the complementary information on appearance from still frames and motion between frames. We also aim to generalise the best performing hand-crafted features within a data-driven learning framework. Our contribution is three-fold. First, we propose a two-stream ConvNet architecture which incorporates spatial and temporal networks. Second, we demonstrate that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data. Finally, we show that multitask learning, applied to two different action classification datasets, can be used to increase the amount of training data and improve the performance on both. Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art. It also exceeds by a large margin previous attempts to use deep nets for video classification.

6,397 citations

Book
30 Sep 2010
TL;DR: Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images and takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene.
Abstract: Humans perceive the three-dimensional structure of the world with apparent ease. However, despite all of the recent advances in computer vision research, the dream of having a computer interpret an image at the same level as a two-year old remains elusive. Why is computer vision such a challenging problem and what is the current state of the art? Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos. More than just a source of recipes, this exceptionally authoritative and comprehensive textbook/reference also takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene. These problems are also analyzed using statistical models and solved using rigorous engineering techniques Topics and features: structured to support active curricula and project-oriented courses, with tips in the Introduction for using the book in a variety of customized courses; presents exercises at the end of each chapter with a heavy emphasis on testing algorithms and containing numerous suggestions for small mid-term projects; provides additional material and more detailed mathematical topics in the Appendices, which cover linear algebra, numerical techniques, and Bayesian estimation theory; suggests additional reading at the end of each chapter, including the latest research in each sub-field, in addition to a full Bibliography at the end of the book; supplies supplementary course material for students at the associated website, http://szeliski.org/Book/. Suitable for an upper-level undergraduate or graduate-level course in computer science or engineering, this textbook focuses on basic techniques that work under real-world conditions and encourages students to push their creative boundaries. Its design and exposition also make it eminently suitable as a unique reference to the fundamental techniques and current research literature in computer vision.

4,146 citations

Proceedings ArticleDOI
01 Dec 2013
TL;DR: Dense trajectories were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets are improved by taking into account camera motion to correct them.
Abstract: Recently dense trajectories were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets. This paper improves their performance by taking into account camera motion to correct them. To estimate camera motion, we match feature points between frames using SURF descriptors and dense optical flow, which are shown to be complementary. These matches are, then, used to robustly estimate a homography with RANSAC. Human motion is in general different from camera motion and generates inconsistent matches. To improve the estimation, a human detector is employed to remove these matches. Given the estimated camera motion, we remove trajectories consistent with it. We also use this estimation to cancel out camera motion from the optical flow. This significantly improves motion-based descriptors, such as HOF and MBH. Experimental results on four challenging action datasets (i.e., Hollywood2, HMDB51, Olympic Sports and UCF50) significantly outperform the current state of the art.

3,487 citations