Author

Mark R. Stevens

Bio: Mark R. Stevens is an academic researcher from Charles River Laboratories. The author has contributed to research on automatic target recognition and mobile robots. The author has an h-index of 7 and has co-authored 30 publications receiving 259 citations.

Papers
Journal ArticleDOI
TL;DR: A visibility approach that uses all possible color information from the photographs during reconstruction, photo-consistency measures that are more robust and/or require less manual intervention, and a volumetric warping method for application of these reconstruction methods to large-scale scenes are described.
Abstract: In this paper, we present methods for 3D volumetric reconstruction of visual scenes photographed by multiple calibrated cameras placed at arbitrary viewpoints. Our goal is to generate a 3D model that can be rendered to synthesize new photo-realistic views of the scene. We improve upon existing voxel coloring/space carving approaches by introducing new ways to compute visibility and photo-consistency, as well as model infinitely large scenes. In particular, we describe a visibility approach that uses all possible color information from the photographs during reconstruction, photo-consistency measures that are more robust and/or require less manual intervention, and a volumetric warping method for application of these reconstruction methods to large-scale scenes.

130 citations
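
The abstract above refers to photo-consistency measures without spelling one out. As a point of reference only, the following minimal sketch shows the classic variance-threshold test used in voxel coloring and space carving; it is not the improved measure the paper proposes. The function name `is_photo_consistent`, the default `threshold`, and the handling of voxels seen by fewer than two cameras are all assumptions made for illustration.

```python
from statistics import pvariance


def is_photo_consistent(samples, threshold=100.0):
    """Baseline photo-consistency test for one voxel.

    samples: list of (r, g, b) tuples, one per camera in which the voxel
             is visible (hypothetical input format).
    threshold: hand-tuned limit on the summed per-channel variance
               (an assumed value, not taken from the paper).
    """
    if len(samples) < 2:
        # Assumption: a voxel seen by fewer than two views cannot be refuted.
        return True
    # zip(*samples) regroups the tuples into one sequence per color channel.
    total_variance = sum(pvariance(channel) for channel in zip(*samples))
    return total_variance <= threshold


if __name__ == "__main__":
    similar = [(200, 10, 10), (205, 12, 9), (198, 11, 11)]
    different = [(200, 10, 10), (10, 200, 10)]
    print(is_photo_consistent(similar))
    print(is_photo_consistent(different))
```

The hand-tuned `threshold` is the kind of manual intervention that the abstract says its measures aim to reduce.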

Proceedings ArticleDOI
19 Oct 2005
TL;DR: A performance predictor using a trained classifier ATD that was constructed using GENIE, a tool developed at Los Alamos National Laboratory is presented.
Abstract: Automatic target detection (ATD) systems process imagery to detect and locate targets in imagery in support of a variety of military missions. Accurate prediction of ATD performance would assist in system design and trade studies, collection management, and mission planning. A need exists for ATD performance prediction based exclusively on information available from the imagery and its associated metadata. We present a predictor based on image measures quantifying the intrinsic ATD difficulty on an image. The modeling effort consists of two phases: a learning phase, where image measures are computed for a set of test images, the ATD performance is measured, and a prediction model is developed; and a second phase to test and validate performance prediction. The learning phase produces a mapping, valid across various ATR algorithms, which is even applicable when no image truth is available (e.g., when evaluating denied area imagery). The testbed has plug-in capability to allow rapid evaluation of new ATR algorithms. The image measures employed in the model include: statistics derived from a constant false alarm rate (CFAR) processor, the Power Spectrum Signature, and others. We present a performance predictor using a trained classifier ATD that was constructed using GENIE, a tool developed at Los Alamos National Laboratory. The paper concludes with a discussion of future research.

20 citations

Proceedings ArticleDOI
20 Jun 2005
TL;DR: This work presents a single-camera stereo system that incorporates a Levenberg-Marquardt minimization of rectification parameters to bring the rectified images into alignment.
Abstract: Mobile robot designers frequently look to computer vision to solve navigation, obstacle avoidance, and object detection problems such as those encountered in parking lot surveillance. Stereo reconstruction is a useful technique in this domain. The advantage of a single-camera stereo method versus a stereo rig is the flexibility to change the baseline distance to best match each scenario. This directly increases the robustness of the stereo algorithm and increases the effective range of the system. The challenge comes from accurately rectifying the images into an ideal stereo pair. Structure from motion (SFM) can be used to compute the camera motion between the two images, but its accuracy is limited and small errors can cause rectified images to be misaligned. We present a single-camera stereo system that incorporates a Levenberg-Marquardt minimization of rectification parameters to bring the rectified images into alignment.

12 citations
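
The abstract above states that a variable baseline increases the effective range of the system. A minimal sketch of the standard rectified-stereo relations makes that concrete: depth is Z = f·B/d, and a disparity error δd maps to a depth error of roughly Z²/(f·B)·δd. The function names and the example numbers are assumptions for illustration and do not come from the paper.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for an ideal rectified stereo pair.

    focal_px: focal length in pixels; baseline_m: baseline in meters;
    disparity_px: disparity in pixels (must be positive).
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_uncertainty(focal_px, baseline_m, depth_m, disparity_err_px=1.0):
    """First-order depth error: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


if __name__ == "__main__":
    focal = 800.0  # assumed focal length in pixels
    for baseline in (0.1, 0.5):  # assumed baselines in meters
        err = depth_uncertainty(focal, baseline, depth_m=20.0)
        print(f"baseline {baseline} m -> depth error at 20 m: {err:.2f} m")
```

Because the depth error shrinks in proportion to the baseline, a longer baseline keeps distant targets measurable, while a shorter one keeps nearby objects within the disparity search range.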

Proceedings ArticleDOI
25 Jul 2002
TL;DR: An ATR algorithm that operates on data collected from a set of accelerometers is presented, along with a set of features useful for discriminating the three target categories and classification results based on these features.
Abstract: Automatic Target Recognition (ATR) algorithm performance is sensitive to variability in the observed target signature. Algorithms are developed and tested under a specific set of operating conditions and then are often required to perform well under very different conditions (referred to as Extended Operating Conditions, or EOCs). The stability of the target signature as the operating conditions change dictates the success or failure of the recognition algorithm. Laser vibrometry is a promising sensor modality for vehicle identification because target signatures tend to remain stable under a variety of EOCs. A micro-doppler vibrometry sensor measures surface deflection at a very high frequency, thus enabling the surface vibrations of a vehicle to be sensed from afar. Vehicle identification is possible since most vehicles with running engines have a unique vibration signature defined by the engine type. In this paper, we present an ATR algorithm that operates over data collected from a set of accelerometers. These contact accelerometers were placed at a variety of locations on three target vehicles to emulate an ideal laser vibrometer. We discuss a set of features that are useful for discrimination of the three different target categories. We also present classification results based on these features.

11 citations

Proceedings ArticleDOI
04 Jan 2006
TL;DR: This paper addresses the practical considerations associated with scale space representations and makes explicit how a scale space is constructed, thereby increasing the accessibility of this powerful representation to developers of computer vision systems.
Abstract: Over the last 30 years, scale space representations have emerged as a fundamental tool for allowing systems to become increasingly robust against changes in camera viewpoint. Unfortunately, the implementation details that are required to properly construct a scale space representation are not published in the literature. Incorrectly implementing these details will lead to extremely poor system performance. In this paper, we address the practical considerations associated with scale space representations. Our focus is to make explicit how a scale space is constructed, thereby increasing the accessibility of this powerful representation to developers of computer vision systems.

10 citations
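
The abstract above stresses that construction details matter but does not list them. One widely known detail is that successive Gaussian blurs compose in variance, so reaching a target scale from an already blurred level needs an incremental blur of sqrt(σ_target² − σ_prev²) rather than σ_target. The sketch below shows this on a 1D signal; it is not the paper's procedure, and `sigma0`, `levels`, `scales_per_octave`, the 3σ kernel radius, and the clamped borders are all assumptions. It also assumes the input signal carries no prior blur.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 sigma (assumed cutoff)."""
    radius = max(1, int(math.ceil(3.0 * sigma)))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Convolve a 1D signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def build_scale_space(signal, sigma0=1.0, levels=4, scales_per_octave=2):
    """Return (sigmas, blurred_levels) using incremental blurring."""
    step = 2.0 ** (1.0 / scales_per_octave)
    sigmas = [sigma0 * step ** i for i in range(levels)]
    current = blur(signal, sigmas[0])
    space = [current]
    for prev, target in zip(sigmas, sigmas[1:]):
        # Variances add under repeated Gaussian blur, so only the
        # difference in variance is applied at each step.
        increment = math.sqrt(target * target - prev * prev)
        current = blur(current, increment)
        space.append(current)
    return sigmas, space


if __name__ == "__main__":
    impulse = [0.0] * 41
    impulse[20] = 1.0
    sigmas, space = build_scale_space(impulse)
    for s, level in zip(sigmas, space):
        print(f"sigma {s:.3f}: peak {level[20]:.4f}")
```

Blurring each level directly by its target sigma on top of the previous level would over-smooth every level after the first, which is one way a scale space can be built incorrectly.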


Cited by
Book
30 Sep 2010
TL;DR: Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images and takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene.
Abstract: Humans perceive the three-dimensional structure of the world with apparent ease. However, despite all of the recent advances in computer vision research, the dream of having a computer interpret an image at the same level as a two-year-old remains elusive. Why is computer vision such a challenging problem and what is the current state of the art? Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos. More than just a source of recipes, this exceptionally authoritative and comprehensive textbook/reference also takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene. These problems are also analyzed using statistical models and solved using rigorous engineering techniques. Topics and features: structured to support active curricula and project-oriented courses, with tips in the Introduction for using the book in a variety of customized courses; presents exercises at the end of each chapter with a heavy emphasis on testing algorithms and containing numerous suggestions for small mid-term projects; provides additional material and more detailed mathematical topics in the Appendices, which cover linear algebra, numerical techniques, and Bayesian estimation theory; suggests additional reading at the end of each chapter, including the latest research in each sub-field, in addition to a full Bibliography at the end of the book; supplies supplementary course material for students at the associated website, http://szeliski.org/Book/.
Suitable for an upper-level undergraduate or graduate-level course in computer science or engineering, this textbook focuses on basic techniques that work under real-world conditions and encourages students to push their creative boundaries. Its design and exposition also make it eminently suitable as a unique reference to the fundamental techniques and current research literature in computer vision.

4,146 citations

Proceedings ArticleDOI
17 Jun 2006
TL;DR: This paper first surveys multi-view stereo algorithms and compares them qualitatively using a taxonomy that differentiates their key properties, then describes the process for acquiring and calibrating multi-view image datasets with high-accuracy ground truth and introduces the evaluation methodology.
Abstract: This paper presents a quantitative comparison of several multi-view stereo reconstruction algorithms. Until now, the lack of suitable calibrated multi-view image datasets with known ground truth (3D shape models) has prevented such direct comparisons. In this paper, we first survey multi-view stereo algorithms and compare them qualitatively using a taxonomy that differentiates their key properties. We then describe our process for acquiring and calibrating multiview image datasets with high-accuracy ground truth and introduce our evaluation methodology. Finally, we present the results of our quantitative comparison of state-of-the-art multi-view stereo reconstruction algorithms on six benchmark datasets. The datasets, evaluation details, and instructions for submitting new models are available online at http://vision.middlebury.edu/mview.

2,556 citations

Journal ArticleDOI
TL;DR: For a broad family of features, this work finds that features computed at octave-spaced scale intervals are sufficient to approximate features on a finely-sampled pyramid, and this approximation yields considerable speedups with negligible loss in detection accuracy.
Abstract: Multi-resolution image features may be approximated via extrapolation from nearby scales, rather than being computed explicitly. This fundamental insight allows us to design object detection algorithms that are as accurate, and considerably faster, than the state-of-the-art. The computational bottleneck of many modern detectors is the computation of features at every scale of a finely-sampled image pyramid. Our key insight is that one may compute finely sampled feature pyramids at a fraction of the cost, without sacrificing performance: for a broad family of features we find that features computed at octave-spaced scale intervals are sufficient to approximate features on a finely-sampled pyramid. Extrapolation is inexpensive as compared to direct feature computation. As a result, our approximation yields considerable speedups with negligible loss in detection accuracy. We modify three diverse visual recognition systems to use fast feature pyramids and show results on both pedestrian detection (measured on the Caltech, INRIA, TUD-Brussels and ETH data sets) and general object detection (measured on the PASCAL VOC). The approach is general and is widely applicable to vision algorithms requiring fine-grained multi-scale analysis. Our approximation is valid for images with broad spectra (most natural images) and fails for images with narrow band-pass spectra (e.g., periodic textures).

2,000 citations

Book ChapterDOI
Christopher Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, Silvio Savarese
08 Oct 2016
TL;DR: The authors propose 3D-R2N2, a 3D Recurrent Reconstruction Neural Network that learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data.
Abstract: Inspired by the recent success of methods that employ shape priors to achieve robust 3D reconstructions, we propose a novel recurrent neural network architecture that we call the 3D Recurrent Reconstruction Neural Network (3D-R2N2). The network learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data [13]. Our network takes in one or more images of an object instance from arbitrary viewpoints and outputs a reconstruction of the object in the form of a 3D occupancy grid. Unlike most of the previous works, our network does not require any image annotations or object class labels for training or testing. Our extensive experimental analysis shows that our reconstruction framework (i) outperforms the state-of-the-art methods for single view reconstruction, and (ii) enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).

1,336 citations

Proceedings ArticleDOI
01 Sep 2010
TL;DR: A technique to avoid constructing such a finely sampled image pyramid without sacrificing performance is proposed, and for a broad family of features, including gradient histograms, the feature responses computed at a single scale can be used to approximate feature responses at nearby scales.
Abstract: We demonstrate a multiscale pedestrian detector operating in near real time (6 fps on 640x480 images) with state-of-the-art detection performance. The computational bottleneck of many modern detectors is the construction of an image pyramid, typically sampled at 8-16 scales per octave, and associated feature computations at each scale. We propose a technique to avoid constructing such a finely sampled image pyramid without sacrificing performance: our key insight is that for a broad family of features, including gradient histograms, the feature responses computed at a single scale can be used to approximate feature responses at nearby scales. The approximation is accurate within an entire scale octave. This allows us to decouple the sampling of the image pyramid from the sampling of detection scales. Overall, our approximation yields a speedup of 10-100 times over competing methods with only a minor loss in detection accuracy of about 1-2% on the Caltech Pedestrian dataset across a wide range of evaluation settings. The results are confirmed on three additional datasets (INRIA, ETH, and TUD-Brussels) where our method always scores within a few percent of the state-of-the-art while being 1-2 orders of magnitude faster. The approach is general and should be widely applicable.

680 citations