
Showing papers by "David G. Lowe" published in 2012


Book
14 Feb 2012
TL;DR: Spatial organization and recognition are shown to be a practical basis for current systems and to provide a promising path for further development of improved visual capabilities.
Abstract: A computational model is presented for the visual recognition of three-dimensional objects based upon their spatial correspondence with two-dimensional features in an image. A number of components of this model are developed in further detail and implemented as computer algorithms. At the highest level, a verification process has been developed which can determine exact values of viewpoint and object parameters from hypothesized matches between three-dimensional object features and two-dimensional image features. This provides a reliable quantitative procedure for evaluating the correctness of an interpretation, even in the presence of noise or occlusion. Given a reliable method for final evaluation of correspondence, the remaining components of the system are aimed at reducing the size of the search space which must be covered. Unlike many previous approaches, this recognition process does not assume that it is possible to directly derive depth information from the image. Instead, the primary descriptive component is a process of perceptual organization, in which spatial relations are detected directly among two-dimensional image features. A basic requirement of the recognition process is that perceptual organization should accurately distinguish meaningful groupings from those which arise by accident of viewpoint or position. This requirement is used to derive a number of further constraints which must be satisfied by algorithms for perceptual grouping. A specific algorithm is presented for the problem of segmenting curves into natural descriptions. Methods are also presented for using the viewpoint-invariance properties of the perceptual groupings to infer three-dimensional relations directly from the image. The search process itself is described, both for covering the range of possible viewpoints and the range of possible objects. A method is presented for using evidential reasoning to combine information from multiple sources to determine the most efficient ordering for the search. This use of evidential reasoning allows a system to automatically improve its performance as it gains visual experience. In summary, spatial organization and recognition are shown to be a practical basis for current systems and to provide a promising path for further development of improved visual capabilities.
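The verification step described above can be illustrated with a small sketch: given hypothesized matches between 3-D model points and 2-D image features, solve for the viewpoint that minimizes reprojection error and accept the hypothesis only if the residuals stay small. This is a simplified illustration under stated assumptions (a known focal length, a Rodrigues rotation vector, and SciPy's generic least-squares solver in place of the book's Newton iteration), not the original implementation.

```python
# Hypothetical sketch (not Lowe's original implementation): verify a
# hypothesized model-to-image match by solving for viewpoint parameters
# that minimize 2-D reprojection error, then checking the residuals.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def reprojection_residuals(params, model_pts, image_pts, focal=800.0):
    """params = [rx, ry, rz, tx, ty, tz]: rotation vector and translation."""
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    t = params[3:]
    cam_pts = model_pts @ R.T + t                    # 3-D points in camera frame
    proj = focal * cam_pts[:, :2] / cam_pts[:, 2:3]  # perspective projection
    return (proj - image_pts).ravel()


def verify_match(model_pts, image_pts, inlier_thresh=5.0):
    """Accept the hypothesis if the best-fit viewpoint reprojects every
    matched model feature close to its image feature."""
    x0 = np.array([0, 0, 0, 0, 0, 10.0])             # initial viewpoint guess
    fit = least_squares(reprojection_residuals, x0,
                        args=(model_pts, image_pts))
    errors = np.linalg.norm(fit.fun.reshape(-1, 2), axis=1)
    return np.all(errors < inlier_thresh), fit.x
```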

1,263 citations


Proceedings ArticleDOI
28 May 2012
TL;DR: This paper introduces a new algorithm for approximate matching of binary features, based on priority search of multiple hierarchical clustering trees, and shows that it performs well for large datasets, both in terms of speed and memory efficiency.
Abstract: There has been growing interest in the use of binary-valued features, such as BRIEF, ORB, and BRISK for efficient local feature matching. These binary features have several advantages over vector-based features as they can be faster to compute, more compact to store, and more efficient to compare. Although it is fast to compute the Hamming distance between pairs of binary features, particularly on modern architectures, it can still be too slow to use linear search in the case of large datasets. For vector-based features, such as SIFT and SURF, the solution has been to use approximate nearest-neighbor search, but these existing algorithms are not suitable for binary features. In this paper we introduce a new algorithm for approximate matching of binary features, based on priority search of multiple hierarchical clustering trees. We compare this to existing alternatives, and show that it performs well for large datasets, both in terms of speed and memory efficiency.
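A minimal sketch of the idea follows, assuming a single tree, randomly chosen cluster centers, packed uint8 binary descriptors, and small illustrative parameters; the paper builds several trees and integrates the method with the FLANN library, so treat this as a simplification rather than the published algorithm.

```python
# Approximate nearest-neighbor search for binary descriptors using a
# hierarchical clustering tree searched in best-first (priority) order.
import heapq
import numpy as np


def hamming(a, b):
    # a, b: packed uint8 descriptor arrays (e.g. 32 bytes for a 256-bit BRIEF)
    return int(np.unpackbits(a ^ b).sum())


class ClusterTree:
    def __init__(self, data, branching=8, leaf_size=16, rng=None):
        self.data = data
        self.rng = rng or np.random.default_rng(0)
        self.root = self._build(np.arange(len(data)), branching, leaf_size)

    def _build(self, idx, branching, leaf_size):
        if len(idx) <= leaf_size:
            return ("leaf", idx)
        # Pick random points as cluster centers and assign by Hamming distance.
        centers = self.rng.choice(idx, size=branching, replace=False)
        dists = np.array([[hamming(self.data[i], self.data[c]) for c in centers]
                          for i in idx])
        assign = dists.argmin(axis=1)
        if len(set(assign)) == 1:        # degenerate split: stop recursing
            return ("leaf", idx)
        children = []
        for k, c in enumerate(centers):
            members = idx[assign == k]
            if len(members):
                children.append((c, self._build(members, branching, leaf_size)))
        return ("node", children)

    def search(self, query, k=2, max_checks=64):
        heap = [(0, 0, self.root)]       # (priority, tiebreak, node)
        results, checks, tie = [], 0, 1
        while heap and checks < max_checks:
            _, _, node = heapq.heappop(heap)
            if node[0] == "leaf":
                for i in node[1]:
                    heapq.heappush(results, (-hamming(query, self.data[i]), i))
                    if len(results) > k:
                        heapq.heappop(results)
                    checks += 1
            else:
                # Push children ordered by distance to their cluster center.
                for center, child in node[1]:
                    heapq.heappush(heap, (hamming(query, self.data[center]), tie, child))
                    tie += 1
        return sorted((-d, i) for d, i in results)   # (distance, index) pairs
```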

312 citations


Proceedings ArticleDOI
16 Jun 2012
TL;DR: Local Naive Bayes Nearest Neighbor (Local NBNN) as discussed by the authors is an improvement to the NBNN image classification algorithm that increases classification accuracy and improves its ability to scale to large numbers of object classes.
Abstract: We present Local Naive Bayes Nearest Neighbor, an improvement to the NBNN image classification algorithm that increases classification accuracy and improves its ability to scale to large numbers of object classes. The key observation is that only the classes represented in the local neighborhood of a descriptor contribute significantly and reliably to its posterior probability estimates. Instead of maintaining a separate search structure for each class's training descriptors, we merge all of the reference data together into one search structure, allowing quick identification of a descriptor's local neighborhood. We show an increase in classification accuracy when we ignore adjustments to the more distant classes, and show that the run time grows with the log of the number of classes rather than linearly, as in the original algorithm. Local NBNN gives a 100× speed-up over the original NBNN on the Caltech 256 dataset. We also provide the first head-to-head comparison of NBNN against spatial pyramid methods using a common set of input features. We show that local NBNN outperforms all previous NBNN-based methods and the original spatial pyramid model. However, we find that local NBNN, while competitive, does not beat state-of-the-art spatial pyramid methods that use local soft assignment and max-pooling.
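The scoring rule described above can be sketched as follows; the k-d tree index, the value of k, and the assumption of integer class labels are illustrative choices, not the paper's exact configuration. All classes' training descriptors go into one index; for each query descriptor, only the classes appearing among its k nearest neighbors are updated, relative to the (k+1)-th neighbor's distance, which acts as the background distance.

```python
# Hedged sketch of Local NBNN scoring with a single merged search structure.
import numpy as np
from scipy.spatial import cKDTree


def build_index(train_descs_by_class):
    """train_descs_by_class: dict class_id -> (n_i, d) descriptor array."""
    descs = np.vstack(list(train_descs_by_class.values()))
    labels = np.concatenate([np.full(len(v), c)
                             for c, v in train_descs_by_class.items()])
    return cKDTree(descs), labels


def local_nbnn_classify(query_descs, tree, labels, n_classes, k=10):
    totals = np.zeros(n_classes)
    for d in query_descs:
        dists, idx = tree.query(d, k=k + 1)        # k neighbors + 1 background
        background = dists[-1] ** 2
        neighbor_labels = labels[idx[:-1]]
        for c in np.unique(neighbor_labels):       # only locally present classes
            closest = dists[:-1][neighbor_labels == c].min() ** 2
            totals[c] += closest - background      # negative = supports class c
    return int(np.argmin(totals))
```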

198 citations


Journal ArticleDOI
TL;DR: An integrated framework for accurately tracking tissue in surgical stereo-cameras at real-time speeds is presented and the salient feature framework is extended to support region tracking in order to maintain the spatial correspondence of a tracked region of tissue or a medical image registration to the surrounding tissue.
Abstract: Vision-based tracking of tissue is a key component to enable augmented reality during a surgical operation. Conventional tracking techniques in computer vision rely on identifying strong edge features or distinctive textures in a well-lit environment; however, endoscopic tissue images do not have strong edge features, are poorly lit, and exhibit a high degree of specular reflection. Therefore, prior work in achieving densely populated 3-D features for describing tissue surface profiles has required complex image processing techniques and has been limited in providing stable, long-term tracking or real-time processing. In this paper, we present an integrated framework for accurately tracking tissue in surgical stereo-cameras at real-time speeds. We use a combination of the STAR feature detector and binary robust independent elementary features (BRIEF) to acquire salient features that can be persistently tracked at high frame rates. The features are then used to acquire a densely populated map of the deformations of the tissue surface in 3-D. We evaluate the method against popular feature algorithms on in vivo animal study video sequences, and we also apply the proposed method to human partial nephrectomy video sequences. We extend the salient feature framework to support region tracking in order to maintain the spatial correspondence of a tracked region of tissue, or of a medical image registration, to the surrounding tissue. In vitro tissue studies show registration accuracies of 1.3–3.3 mm using a rigid-body transformation method.
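A rough sketch of the detection and stereo-matching stage is given below, using OpenCV's STAR (CenSurE) detector and BRIEF extractor from the opencv-contrib package as stand-ins for the paper's implementation; the Hamming threshold and the assumption of rectified grayscale frames are illustrative, not values from the paper.

```python
# Detect STAR keypoints, describe them with BRIEF, and match across a
# rectified stereo pair; the matched point pairs could then be triangulated
# into a 3-D tissue surface map.
import cv2
import numpy as np

star = cv2.xfeatures2d.StarDetector_create()
brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)


def detect_and_describe(gray):
    keypoints = star.detect(gray, None)
    keypoints, descriptors = brief.compute(gray, keypoints)
    return keypoints, descriptors


def stereo_match(left_gray, right_gray, max_hamming=40):
    """Match salient features between rectified stereo frames."""
    kp_l, des_l = detect_and_describe(left_gray)
    kp_r, des_r = detect_and_describe(right_gray)
    matches = [m for m in matcher.match(des_l, des_r) if m.distance < max_hamming]
    pts_l = np.float32([kp_l[m.queryIdx].pt for m in matches])
    pts_r = np.float32([kp_r[m.trainIdx].pt for m in matches])
    return pts_l, pts_r
```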

75 citations


Book ChapterDOI
05 Nov 2012
TL;DR: This work introduces spatially local coding, an alternative way to include spatial information in the image model that performs better than all previous single-feature methods when tested on the Caltech 101 and 256 object recognition datasets.
Abstract: The spatial pyramid and its variants have been among the most popular and successful models for object recognition. In these models, local visual features are coded across elements of a visual vocabulary, and then these codes are pooled into histograms at several spatial granularities. We introduce spatially local coding, an alternative way to include spatial information in the image model. Instead of only coding visual appearance and leaving the spatial coherence to be represented by the pooling stage, we include location as part of the coding step. This is a more flexible spatial representation than the fixed grids used in spatial pyramid models, and it allows a simple, whole-image region to be used during the pooling stage. We demonstrate that combining features with multiple levels of spatial locality performs better than using just a single level. Our model performs better than all previous single-feature methods when tested on the Caltech 101 and 256 object recognition datasets.
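The coding step can be sketched by appending a weighted, normalized image location to each descriptor before vocabulary learning and assignment, so that the learned codewords become spatially localized and a single whole-image pooling region suffices. The vocabulary size, the location weight, and the use of plain k-means with hard assignment are assumptions for illustration, not the paper's exact setup.

```python
# Illustrative sketch of location-augmented coding with whole-image pooling.
import numpy as np
from sklearn.cluster import KMeans


def augment(descriptors, positions, image_size, lam=1.5):
    """Append normalized (x, y) location, scaled by lam, to each descriptor,
    so that clustering and coding become spatially local."""
    xy = positions / np.asarray(image_size, dtype=float)   # normalize to [0, 1]
    return np.hstack([descriptors, lam * xy])


def build_vocabulary(aug_descriptors, n_words=1024):
    return KMeans(n_clusters=n_words, n_init=4).fit(aug_descriptors)


def encode_image(aug_descriptors, vocab):
    """Hard-assign each augmented descriptor to its nearest word and pool
    into a single whole-image histogram (no spatial pyramid needed)."""
    words = vocab.predict(aug_descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```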

55 citations


Book ChapterDOI
27 Jun 2012
TL;DR: A novel framework is developed to enable long-term tracking of image features, and two fast and robust feature algorithms, STAR and BRIEF, are applied to endoscopic images, acquiring dense sets of salient features at real-time speeds.
Abstract: Salient feature tracking for endoscopic images has been investigated in the past for 3D reconstruction of endoscopic scenes as well as tracking of tissue through a video sequence. Recent work in the field has shown success in acquiring dense salient feature profiling of the scene. However, there has been relatively little work in performing long-term feature tracking for capturing tissue deformation. In addition, real-time solutions for tracking tissue features result in sparse densities, rely on restrictive scene and camera assumptions, or are limited in feature distinctiveness. In this paper, we develop a novel framework to enable long-term tracking of image features. We implement two fast and robust feature algorithms, STAR and BRIEF, for application to endoscopic images. We show that we are able to acquire dense sets of salient features at real-time speeds, and are able to track their positions for long periods of time.
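One way to picture the long-term tracking bookkeeping (a hedged sketch, not the paper's implementation) is a store of tracks, each holding its latest BRIEF descriptor and position history, updated by matching each new frame's features against it within a Hamming threshold. The brute-force inner loop is for clarity; a real-time system would replace it with an approximate index such as the hierarchical clustering trees sketched earlier.

```python
# Maintain long-term feature tracks by frame-to-frame Hamming matching of
# packed binary (e.g. BRIEF) descriptors; threshold is an assumed value.
import numpy as np


def hamming(a, b):
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


class TrackStore:
    def __init__(self, max_hamming=35):
        self.max_hamming = max_hamming
        self.tracks = []   # each: {"desc": uint8 array, "path": [(x, y), ...]}

    def update(self, keypoint_positions, descriptors):
        """Extend existing tracks with matching features; start new tracks
        for features that match nothing."""
        matched = set()
        for pos, desc in zip(keypoint_positions, descriptors):
            best, best_d = None, self.max_hamming + 1
            for i, tr in enumerate(self.tracks):
                if i in matched:
                    continue
                d = hamming(desc, tr["desc"])
                if d < best_d:
                    best, best_d = i, d
            if best is not None:
                self.tracks[best]["desc"] = desc          # refresh appearance
                self.tracks[best]["path"].append(tuple(pos))
                matched.add(best)
            else:
                self.tracks.append({"desc": desc, "path": [tuple(pos)]})
```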

21 citations