Author

David G. Lowe

Bio: David G. Lowe is an academic researcher from the University of British Columbia. The author has contributed to research topics including the cognitive neuroscience of visual object recognition and feature detection (computer vision). The author has an h-index of 52 and has co-authored 108 publications receiving 83,353 citations. Previous affiliations of David G. Lowe include the Courant Institute of Mathematical Sciences and Google.


Papers
Book Chapter
07 May 2006
TL;DR: This work proposes a new model representation with a less restrictive prior on the geometry and number of local features, in which the geometry of each local feature is influenced by its k closest neighbors, together with a novel unsupervised on-line learning algorithm that estimates the model parameters efficiently and accurately.
Abstract: In recent years there has been growing interest in recognition models using local image features, for applications ranging from long-range motion matching to object class recognition systems. Currently, many state-of-the-art approaches have models involving very restrictive priors in terms of the number of local features and their spatial relations. The adoption of such priors in those models is necessary to simplify both the learning and inference tasks. Also, most state-of-the-art learning approaches are semi-supervised batch processes, which considerably reduces their suitability in dynamic environments, where unannotated new images are continuously presented to the learning system. In this work we propose: 1) a new model representation that has a less restrictive prior on the geometry and number of local features, where the geometry of each local feature is influenced by its k closest neighbors and models may contain hundreds of features; and 2) a novel unsupervised on-line learning algorithm that is capable of estimating the model parameters efficiently and accurately. We implement a visual class recognition system using the new model and learning method proposed here, and demonstrate that our system produces competitive classification and localization results compared to state-of-the-art methods. Moreover, we show that the learning algorithm is able to model not only classes with consistent texture (e.g., faces), but also classes with shape only (e.g., leaves), classes with a common shape but great variability in internal texture (e.g., cups), and classes of flexible objects (e.g., snakes).
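
As a rough illustration of the neighborhood structure described above, the sketch below computes, for each local feature, the indices of its k closest neighbors. This is not the authors' implementation; the synthetic feature positions, the choice k = 5, and the helper name knn_neighbors are assumptions for illustration.

```python
# Minimal sketch (assumed, not the paper's code): build the k-nearest-
# neighbor structure over local feature positions that the model's
# geometry prior is described as depending on.
import numpy as np
from scipy.spatial import cKDTree

def knn_neighbors(positions: np.ndarray, k: int = 5) -> np.ndarray:
    """Return, for each feature, the indices of its k closest neighbors."""
    tree = cKDTree(positions)
    # Query k+1 neighbors because each point is its own nearest neighbor.
    _, idx = tree.query(positions, k=k + 1)
    return idx[:, 1:]  # drop the self-match in column 0

pts = np.random.rand(200, 2) * 640  # 200 synthetic feature locations
print(knn_neighbors(pts).shape)     # (200, 5)
```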

47 citations

01 Jan 2003
TL;DR: This paper addresses the problem of automatically computing homographies between successive frames in image sequences and compensating for the panning, tilting and zooming of the cameras by combining elements of two previous approaches.
Abstract: This paper addresses the problem of automatically computing homographies between successive frames in image sequences and compensating for the panning, tilting and zooming of the cameras. A homography is a projective mapping between two image planes and describes the transformation created by a fixed camera as it pans, tilts, rotates, and zooms around its optical centre. Our algorithm achieves improved robustness for large motions by combining elements of two previous approaches: it first computes the local displacements of image features using the Kanade-Lucas-Tomasi (KLT) tracker and determines local matches. The majority of these features are selected by RANSAC and give the initial estimate of the homography. Our model-based correction system then compensates for remaining projection errors in the image-to-rink mapping. The system is demonstrated on a digitized sequence of an NHL hockey game, and it is capable of analyzing long sequences of consecutive frames from broadcast video by mapping them into rink coordinates.
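
The pipeline the abstract outlines, KLT feature tracking followed by RANSAC homography estimation, can be sketched with OpenCV as below. This is a hedged approximation rather than the paper's code; the frame file names and parameter values are assumptions, and the model-based correction step is omitted.

```python
# Sketch: track features between consecutive frames with the KLT
# tracker, then fit a homography with RANSAC (assumed parameters).
import cv2

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Select corners in the first frame, then track them into the second.
p0 = cv2.goodFeaturesToTrack(prev, maxCorners=500, qualityLevel=0.01,
                             minDistance=7)
p1, status, _ = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None)

ok = status.ravel() == 1
H, inliers = cv2.findHomography(p0[ok], p1[ok], cv2.RANSAC, 3.0)
print("homography:\n", H, "\ninlier ratio:", inliers.mean())
```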

44 citations

Proceedings Article
20 Apr 1997
TL;DR: An implemented model-based telerobotic system designed to investigate assembly and other tasks involving contact and manipulation of known objects is described, together with a task-centric operator interface and experimental results that include performing assembly-like tasks over the Internet.
Abstract: We describe an implemented model-based telerobotic system designed to investigate assembly and other tasks involving contact and manipulation of known objects. Key features of our system include ease of maintaining a world model at the operator site and a task-centric operator interface. Our system incorporates gray-scale model-based vision to assist in building and maintaining the local model. The local model is used to provide a task-centric operator interface, emphasizing the natural and direct manipulation of objects, with the robot's presence indicated in a more abstract fashion. The operator interface is designed to work with widely available and inexpensive desktop computers with low DOF input devices (such as a mouse). We also describe experimental results to date, which include performing assembly-like tasks over the Internet.

36 citations

Book Chapter
13 Apr 1996
TL;DR: Experiments show the method capable of learning to recognize complex objects in cluttered images, acquiring models that represent those objects using relatively few views.
Abstract: We describe how to model the appearance of an object using multiple views, learn such a model from training images, and recognize objects with it. The model uses probability distributions to characterize the significance, position, and intrinsic measurements of various discrete features of appearance; it also describes topological relations among features. The features and their distributions are learned from training images depicting the modeled object. A matching procedure, combining qualities of both alignment and graph subisomorphism methods, uses feature uncertainty information recorded by the model to guide the search for a match between model and image. Experiments show the method capable of learning to recognize complex objects in cluttered images, acquiring models that represent those objects using relatively few views.

36 citations

Journal Article
TL;DR: Improvements are achieved in both path-tracking accuracy and slippage control for a tracked mobile robot (an excavator).
Abstract: This paper describes a vision-based control system for a tracked mobile robot (an excavator). The system includes several controllers that collaborate to move the mobile vehicle from a starting position to a goal position. First, the path planner designs an optimum path using a predefined elevation map of the work space. Second, a fuzzy logic path-tracking controller estimates the rotational and translational velocities for the vehicle to move along the predesigned path. Third, a cross-coupling controller corrects any orientation error that may occur when moving along the path; a motor controller then converts the track velocities to the corresponding rotational wheel velocities. Fourth, a vision-based motion tracking system finds the three-dimensional (3-D) motion of the vehicle as it moves in the work space. Finally, a specially designed slippage controller detects slippage by comparing motion estimates derived from flowmeter readings with those from the vision system. If slippage has occurred, the remaining path is corrected within the path-tracking controller so the vehicle stops at the goal position. Experiments are conducted to test and verify the presented control system; an analysis of the results shows improvement in both path-tracking accuracy and slippage control.
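
A minimal sketch of the slippage test described above: compare the displacement implied by the flowmeter (track) readings against the displacement reported by the vision-based motion tracker, and flag slippage when they disagree. The function name and the threshold value are illustrative assumptions, not the paper's.

```python
# Hedged sketch of slippage detection by dead-reckoning/vision
# disagreement; the 0.05 m threshold is an assumed placeholder.
import numpy as np

def slippage_detected(flowmeter_disp, vision_disp, threshold_m=0.05):
    """True if dead-reckoned and vision-measured motion disagree."""
    return float(np.linalg.norm(np.asarray(flowmeter_disp) -
                                np.asarray(vision_disp))) > threshold_m

# Example: tracks report 1.0 m forward, vision sees only 0.85 m.
print(slippage_detected([1.0, 0.0], [0.85, 0.0]))  # True
```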

35 citations


Cited by
Proceedings Article
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei
20 Jun 2009
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.
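
As a small illustration of the WordNet backbone the abstract builds on, the sketch below enumerates the hyponym subtree under a single synset, the kind of semantic hierarchy ImageNet attaches images to. It uses NLTK's WordNet interface (run nltk.download('wordnet') once beforehand) and is an aside for intuition, not part of the ImageNet pipeline.

```python
# Enumerate the hyponym subtree under one WordNet synset; the choice
# of "dog.n.01" as the root is an arbitrary illustrative assumption.
from nltk.corpus import wordnet as wn

root = wn.synset("dog.n.01")
subtree = list(root.closure(lambda s: s.hyponyms()))
print(len(subtree), "synsets under", root.name())
for s in subtree[:5]:
    print(" ", s.name(), "-", s.definition())
```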

49,639 citations

Journal Article
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
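
The matching stage described above, nearest-neighbor descriptor matching followed by Lowe's ratio test, can be sketched with OpenCV's SIFT implementation as below. The image file names are assumptions, and the paper's Hough-transform clustering and least-squares pose verification are omitted for brevity; the 0.8 ratio threshold is the value the paper reports as a good trade-off.

```python
# SIFT detection, 2-NN descriptor matching, and the ratio test.
import cv2

img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # assumed file
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # assumed file

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Keep a match only if it is clearly better than the second-best one.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.8 * n.distance]
print(len(good), "matches survive the ratio test")
```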

46,906 citations

Journal Article
TL;DR: Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis that facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system.
Abstract: Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis. Fiji uses modern software engineering practices to combine powerful software libraries with a broad range of scripting languages to enable rapid prototyping of image-processing algorithms. Fiji facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system. We propose Fiji as a platform for productive collaboration between computer science and biology research communities.
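
As a small usage sketch of the scripting workflow the abstract describes, the Jython snippet below opens an image and applies a filter from Fiji's script editor. The input path is an assumption, and the script must run inside Fiji (which bundles Jython), not a standalone Python interpreter.

```python
# Run inside Fiji's script editor (language: Python/Jython).
from ij import IJ

imp = IJ.openImage("/path/to/cells.tif")    # assumed input path
IJ.run(imp, "Gaussian Blur...", "sigma=2")  # smooth before analysis
imp.show()
```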

43,540 citations

Proceedings Article
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition, adopting linear SVM-based human detection as a test case. After reviewing existing edge- and gradient-based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
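
OpenCV ships a HOG descriptor paired with a pre-trained linear SVM pedestrian detector in the spirit of this paper's method; the short sketch below shows typical usage. The image file name and the window-stride and scale parameters are illustrative assumptions.

```python
# Pedestrian detection with OpenCV's default HOG + linear SVM model.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("street.png")  # assumed input image
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.png", img)
```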

31,952 citations

Journal Article
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection spanning hundreds of object categories and millions of images; it has been run annually from 2010 to the present, attracting participation from more than fifty institutions.
Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the 5 years of the challenge, and propose future directions and improvements.

30,811 citations