
Showing papers on "Scale-invariant feature transform published in 2005"


Proceedings ArticleDOI
18 Apr 2005
TL;DR: A vision-based approach to self-localization that uses a novel scheme to integrate feature-based matching of panoramic images with Monte Carlo localization and a specially modified version of Lowe’s SIFT algorithm is used.
Abstract: In this paper we present a vision-based approach to self-localization that uses a novel scheme to integrate feature-based matching of panoramic images with Monte Carlo localization. A specially modified version of Lowe’s SIFT algorithm is used to match features extracted from local interest points in the image, rather than using global features calculated from the whole image. Experiments conducted in a large, populated indoor environment (up to 5 persons visible) over a period of several months demonstrate the robustness of the approach, including robustness to kidnapping and to occlusion of up to 90% of the robot’s field of view.
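The particle-filter measurement update described in this abstract can be sketched as follows. This is a minimal sketch, not the paper's implementation: `observation_matches` (a pose-to-match-count scorer) and the simple `1 + matches` likelihood are hypothetical stand-ins for the actual sensor model built on SIFT matching of panoramic images.

```python
import random

def measurement_update(particles, weights, observation_matches):
    """Reweight particles by feature-match likelihood, then resample.

    particles           -- list of (x, y, theta) pose hypotheses
    weights             -- prior weight per particle
    observation_matches -- hypothetical callable: pose -> number of SIFT
                           matches between the current panorama and the
                           map image closest to that pose
    """
    # Likelihood grows with the number of matched features (assumed model).
    new_w = [w * (1.0 + observation_matches(p)) for p, w in zip(particles, weights)]
    total = sum(new_w)
    new_w = [w / total for w in new_w]

    # Systematic resampling keeps the particle count constant.
    n = len(particles)
    u = random.random() / n
    positions = [u + i / n for i in range(n)]
    cumulative, j, resampled = new_w[0], 0, []
    for pos in positions:
        while pos > cumulative:
            j += 1
            cumulative += new_w[j]
        resampled.append(particles[j])
    return resampled, [1.0 / n] * n
```

Particles whose predicted panorama matches many observed features survive resampling; the rest die out, which is what lets the filter recover from kidnapping.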

124 citations


Proceedings Article
01 Jan 2005
TL;DR: The Scale Invariant Feature Transform (SIFT) has been successfully applied to robot localization, but the number of features extracted with this approach is immense.
Abstract: The Scale Invariant Feature Transform, SIFT, has been successfully applied to robot localization. Still, the number of features extracted with this approach is immense, especially when dealing with ...

88 citations


Proceedings ArticleDOI
15 Oct 2005
TL;DR: This paper presents a novel feature-based object representation, the attributed relational graph (ARG), for reliable object tracking, and adopts a competitive and efficient dynamic model to adaptively update the object model by adding new stable features and deleting inactive ones.
Abstract: Two major problems for model-based object tracking are: 1) how to represent an object so that it can effectively be discriminated from the background and other objects; 2) how to dynamically update the model to accommodate changes in object appearance and structure. Traditional appearance-based representations (like the color histogram) fail when the object has rich texture. In this paper, we present a novel feature-based object representation, the attributed relational graph (ARG), for reliable object tracking. The object is modeled with invariant features (SIFT), and their relationship is encoded in the form of an ARG that can effectively distinguish the object from the background and other objects. We adopt a competitive and efficient dynamic model to adaptively update the object model by adding new stable features and deleting inactive ones. A relaxation labeling method is used to match the model graph with the observation to get the best object position. Experiments show that our method tracks reliably even under dramatic appearance changes, occlusions, etc.
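The dynamic model update described above (adding stable features, deleting inactive ones) might look like the following simplified sketch. The dict layout and the `max_inactive` threshold are assumptions for illustration, not the paper's competitive update rule.

```python
def update_model(model, matched_ids, new_features, max_inactive=5):
    """Dynamic model update: keep stable features, drop inactive ones.

    model        -- dict feature_id -> {"desc": ..., "inactive": count}
    matched_ids  -- set of feature ids matched in the current frame
    new_features -- dict of candidate id -> descriptor to add as stable
    (a simplified stand-in for the paper's competitive update rule)
    """
    for fid in list(model):
        if fid in matched_ids:
            model[fid]["inactive"] = 0          # seen again: reset counter
        else:
            model[fid]["inactive"] += 1
            if model[fid]["inactive"] > max_inactive:
                del model[fid]                  # feature no longer supported
    for fid, desc in new_features.items():
        model.setdefault(fid, {"desc": desc, "inactive": 0})
    return model
```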

59 citations


Proceedings ArticleDOI
05 Jan 2005
TL;DR: Improvements to the popular scale invariant feature transform (SIFT) are suggested which incorporate local object boundary information and the resulting feature detection and descriptor creation processes are invariant to changes in background.
Abstract: Current feature-based object recognition methods use information derived from local image patches. For robustness, features are engineered for invariance to various transformations, such as rotation, scaling, or affine warping. When patches overlap object boundaries, however, errors in both detection and matching will almost certainly occur due to inclusion of unwanted background pixels. This is common in real images, which often contain significant background clutter, objects which are not heavily textured, or objects which occupy a relatively small portion of the image. We suggest improvements to the popular scale invariant feature transform (SIFT) which incorporate local object boundary information. The resulting feature detection and descriptor creation processes are invariant to changes in background. We call this method the background and scale invariant feature transform (BSIFT). We demonstrate BSIFT's superior performance in feature detection and matching on synthetic and natural images.
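One way to make a descriptor insensitive to background changes, in the spirit of BSIFT, is to let only gradients computed entirely from object pixels contribute. This toy orientation histogram is an illustration of that idea under an assumed binary object mask, not the paper's actual descriptor.

```python
import math

def masked_orientation_histogram(patch, mask, bins=8):
    """Gradient-orientation histogram that ignores background pixels.

    patch -- 2D list of intensities; mask -- same-shape 2D list where 1
    marks object pixels and 0 marks background. Only gradients whose
    finite-difference stencil lies entirely on object pixels contribute,
    so the histogram does not change when pixels behind the boundary do.
    """
    h = [0.0] * bins
    rows, cols = len(patch), len(patch[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Skip any gradient whose stencil touches background.
            if not (mask[y][x] and mask[y][x - 1] and mask[y][x + 1]
                    and mask[y - 1][x] and mask[y + 1][x]):
                continue
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            h[int(angle / (2 * math.pi) * bins) % bins] += mag
    return h
```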

49 citations


Proceedings ArticleDOI
16 Sep 2005
TL;DR: The combination of edges and interest points brings efficient feature detection and high recognition ratio to the image retrieval system.
Abstract: This paper presents a novel approach using combined features to retrieve images containing specific objects, scenes or buildings. The content of an image is characterized by two kinds of features: Harris-Laplace interest points described by the SIFT descriptor, and edges described by the edge color histogram. Edges and corners contain the maximal amount of information necessary for image retrieval. Feature detection in this work is an integrated process: edges are detected directly based on the Harris function; Harris interest points are detected at several scales; and Harris-Laplace interest points are found using the Laplace function. The combination of edges and interest points brings efficient feature detection and a high recognition ratio to the image retrieval system. Experimental results show that this system has good performance.
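A minimal sketch of fusing the two cues at query time; the equal weighting and the histogram-intersection similarity are illustrative assumptions, not the paper's exact fusion rule.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def combined_score(match_ratio, query_hist, img_hist, w_points=0.5):
    """Fuse two retrieval cues for one database image.

    match_ratio -- fraction of query SIFT-described interest points
                   matched in the image, in [0, 1]
    query_hist, img_hist -- normalized edge color histograms
    w_points    -- assumed fixed fusion weight
    """
    return (w_points * match_ratio
            + (1 - w_points) * histogram_intersection(query_hist, img_hist))
```

Ranking the database by `combined_score` lets the edge histogram disambiguate images that have similar interest-point matches, and vice versa.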

46 citations


Proceedings ArticleDOI
06 Jun 2005
TL;DR: This paper suggests deferring a large portion of the image processing onto a field-programmable gate array (FPGA), since most operations can be heavily parallelized, and presents a system that computes the computationally intensive parts of SIFT (Gaussian pyramid, Sobel filtering, etc.) on an FPGA.
Abstract: Online stereo calibration is useful in many situations where the cameras are moving relative to each other. The motion can either be intentional, as in an active stereo head, or due to vibrations, heat etc. which is commonly found in automotive applications. However, most approaches for finding the essential matrix relating the two cameras are computationally very expensive and, hence, this problem must be addressed. In this paper, we suggest deferring a large portion of the image processing onto a field programmable gate array (FPGA) since most operations can be heavily parallelized. The specific algorithm chosen to find point correspondences between the left and the right images is SIFT, which has the advantage of producing a very small number of outliers. Having few outliers is important as computing the essential matrix from point correspondences is an inherently unstable problem, particularly in the case where the cameras are nearly parallel. We present a system, which computes the computationally intensive parts of SIFT (Gaussian pyramid, Sobel etc) using an FPGA. The host computer then uses the resulting point correspondences to estimate the essential matrix with the help of a reduced model of the camera setup. On-line stereo calibration at frame rate (60Hz) is then possible without excessively loading the host computer.
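The host-side step, estimating the epipolar geometry from SIFT correspondences, can be sketched with the classic linear eight-point algorithm. This is a generic sketch, not the paper's reduced camera model; Hartley normalization is also omitted for brevity.

```python
import numpy as np

def eight_point(x1, x2):
    """Linear eight-point estimate of the fundamental/essential matrix.

    x1, x2 -- (N, 2) arrays of corresponding image points (N >= 8).
    Returns the 3x3 matrix F with x2_h^T F x1_h ~= 0 for homogeneous
    points; for calibrated (normalized) coordinates this is the
    essential matrix.
    """
    n = x1.shape[0]
    A = np.zeros((n, 9))
    for i in range(n):
        u1, v1 = x1[i]
        u2, v2 = x2[i]
        A[i] = [u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, 1.0]
    # Homogeneous least squares: right singular vector of smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```

As the abstract notes, this estimate is unstable for nearly parallel cameras, which is why the paper emphasizes having very few outliers among the SIFT correspondences.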

42 citations


Proceedings ArticleDOI
20 Jun 2005
TL;DR: In this paper, the authors investigate rapid pruning of the recognition search space using the already-computed low-level features that guide attention, which improves the speed of object recognition in complex natural scenes.
Abstract: Bottom-up visual attention allows primates to quickly select regions of an image that contain salient objects. In artificial systems, restricting the task of object recognition to these regions allows faster recognition and unsupervised learning of multiple objects in cluttered scenes. A problem is that objects superficially dissimilar to the target are given the same consideration in recognition as similar objects. Here we investigate rapid pruning of the recognition search space using the already-computed low-level features that guide attention. Itti and Koch’s bottom-up visual attention algorithm selects salient locations based on low-level features such as contrast, orientation, color, and intensity. Lowe’s SIFT recognition algorithm then extracts a signature of the attended object, for comparison with the object database. The database search is prioritized for objects which better match the low-level features used to guide attention to the current candidate for recognition. The SIFT signatures of prioritized database objects are then checked for match against the attended candidate. By comparing performance of Lowe’s recognition algorithm and Itti and Koch’s bottom-up attention model with or without search space pruning, we demonstrate that our pruning approach improves the speed of object recognition in complex natural scenes.
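The pruning idea, checking expensive SIFT signatures in order of cheap low-level similarity, can be sketched as below. The data layout and the two scoring callables are hypothetical; the paper's actual feature vectors come from Itti and Koch's saliency channels.

```python
def prioritized_search(candidate_features, candidate_signature, database,
                       signature_match, match_threshold=0.8):
    """Search the object database in order of low-level feature similarity.

    database        -- list of (name, features, signature) triples, where
                       `features` is the low-level vector (contrast,
                       orientation, color, intensity) and `signature`
                       the stored SIFT signature (layout is assumed)
    signature_match -- callable scoring two signatures in [0, 1]
    Returns the first object whose signature score clears the threshold.
    """
    def feature_distance(f):
        return sum((a - b) ** 2 for a, b in zip(candidate_features, f))

    # Cheap pruning: best low-level matches are checked first, so the
    # expensive signature comparison usually terminates early.
    for name, feats, sig in sorted(database, key=lambda e: feature_distance(e[1])):
        if signature_match(candidate_signature, sig) >= match_threshold:
            return name
    return None
```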

30 citations


Proceedings ArticleDOI
09 May 2005
TL;DR: This paper describes a view-based method for object recognition and estimation of object pose from a single image based on feature vector matching and clustering and the patch-duplet feature is compared to the SIFT feature.
Abstract: This paper describes a view-based method for object recognition and estimation of object pose from a single image. The method is based on feature vector matching and clustering. A set of interest points is detected and combined into pairs. A pair of patches, centered around each point in the pair, is extracted from a local orientation image. The patch orientation and size depends on the relative positions of the points, which make them invariant to translation, rotation, and locally invariant to scale. Each pair of patches constitutes a feature vector. The method is demonstrated on a number of real images and the patch-duplet feature is compared to the SIFT feature.

29 citations


Proceedings ArticleDOI
18 Apr 2005
TL;DR: This work presents a new method – the informative local features approach – based on an information theoretic saliency measure that is rapidly derived from a local Parzen window density estimation in feature subspace that enables attentive access to discriminative information and thereby significantly speeds up the recognition process.
Abstract: Autonomous mobile agents require object recognition for high level interpretation and localization in complex scenes. In urban environments, recognition of buildings might play a dominant role in robotic systems that need object based navigation, that take advantage of visual feedback and multimodal information for self-localization, or that enable association to related information from the identified semantics. We present a new method – the informative local features approach – based on an information theoretic saliency measure that is rapidly derived from a local Parzen window density estimation in feature subspace. From the learning of a decision tree based mapping to informative features, it enables attentive access to discriminative information and thereby significantly speeds up the recognition process. This approach is highly robust with respect to severe degrees of partial occlusion and noise, and tolerant to some changes in scale and illumination. We present a performance evaluation on our publicly available reference object database (TSG-20) that demonstrates the efficiency of this approach, in some cases even outperforming the SIFT feature approach [1]. Building recognition will be advantageous in various application domains, such as mobile mapping, unmanned vehicle navigation, and systems for car driver assistance.

27 citations


Proceedings ArticleDOI
20 Jun 2005
TL;DR: A new method is introduced that characterizes typical local image features in terms of their distinctiveness, detectability, and robustness to image deformations in order to reduce the recognition time, improve the recognition accuracy, and increase the scalability of the recognition system given the smaller number of features per model.
Abstract: We introduce a new method that characterizes typical local image features (e.g., SIFT, phase feature) in terms of their distinctiveness, detectability, and robustness to image deformations. This is useful for the task of classifying local image features in terms of those three properties. The importance of this classification process for a recognition system using local features is as follows: a) reduce the recognition time due to a smaller number of features present in the test image and in the database of model features; b) improve the recognition accuracy since only the most useful features for the recognition task are kept in the model database; and c) increase the scalability of the recognition system given the smaller number of features per model. A discriminant classifier is trained to select well behaved feature points. A regression network is then trained to provide quantitative models of the detection distributions for each selected feature point. It is important to note that both the classifier and the regression network use image data alone as their input. Experimental results show that the use of these trained networks not only improves the performance of our recognition system, but it also significantly reduces the computation time for the recognition process.

27 citations



Book ChapterDOI
07 Jun 2005
TL;DR: This paper describes a new corner detection algorithm based on the Hough Transform that was tested on various test images, and the results are compared with well-known algorithms.
Abstract: This paper describes a new corner detection algorithm based on the Hough Transform. The basic idea is to find the straight lines in the images and then search for their intersections, which are the corner points of the objects in the images. The Hough Transform is used for detecting the straight lines and the inverse Hough Transform is used for locating the intersection points among the straight lines, and hence determine the corner points. The algorithm was tested on various test images, and the results are compared with well-known algorithms.
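The core geometric step, intersecting Hough-detected lines to obtain corner candidates, can be sketched as follows (a minimal version; the paper's inverse-Hough localization is replaced here by direct analytic intersection).

```python
import math

def line_intersection(l1, l2, eps=1e-9):
    """Intersect two lines given in Hough normal form (rho, theta).

    Each line satisfies x*cos(theta) + y*sin(theta) = rho. Returns the
    intersection point, or None for (near-)parallel lines.
    """
    r1, t1 = l1
    r2, t2 = l2
    a1, b1 = math.cos(t1), math.sin(t1)
    a2, b2 = math.cos(t2), math.sin(t2)
    det = a1 * b2 - a2 * b1          # zero when the lines are parallel
    if abs(det) < eps:
        return None
    x = (r1 * b2 - r2 * b1) / det    # Cramer's rule on the 2x2 system
    y = (a1 * r2 - a2 * r1) / det
    return (x, y)

def corners_from_lines(lines):
    """Corner candidates: pairwise intersections of detected Hough lines."""
    pts = []
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            p = line_intersection(lines[i], lines[j])
            if p is not None:
                pts.append(p)
    return pts
```

In a full detector these candidates would still be filtered against the image, since two lines can intersect far from any actual object corner.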

Proceedings ArticleDOI
09 May 2005
TL;DR: A method is presented to recover 3D scene structure and camera motion from a sequence of multiple images captured by an omnidirectional catadioptric camera, and this 3D model is then used to localize other panoramic images taken in the vicinity.
Abstract: Omni-directional sensors are useful in obtaining a 360/spl deg/ field of view of a scene for robot navigation, scene modeling, and telepresence. A method is presented to recover 3D scene structure and camera motion from a sequence of multiple images captured by an omnidirectional catadioptric camera. This 3D model is then used to localize other panoramic images taken in the vicinity. This goal is achieved by tracking the trajectories of SIFT keypoints, and finding the path they travel by utilizing a Hough transform technique modified for panoramic imagery. This technique is applied to spatio-temporal feature extraction in the three-dimensional space of an image sequence, as that scene points trace a horizontal line trajectory relative to the camera. SIFT (scale invariant feature transform) keypoints are distinctive image features which can be identified between images invariant to scale and rotation. Together these methods are applied to reconstruct a three-dimensional model from a sequence of panoramic images, where the panoramic camera was translating in a straight line horizontal path. Only the camera/mirror geometry is known a priori. The camera positions and the world model is determined, up to a scale factor. Experimental results of model building and camera localization using this model are shown.

Proceedings ArticleDOI
09 May 2005
TL;DR: An improved technique based on the observation of an additive property of the Hough transform, which results in improved efficiency in finding endpoints of line segments and improved robustness and reliability in extracting lines in noisy situations, at a slightly increased cost of memory.
Abstract: In the field of image processing, it is a common problem to search for edges within an image, typically using the Hough transform, and attempt to extract the end points of those edges. This paper discusses an improved technique for accomplishing this task. The idea is based on the observation of an additive property of the Hough transform. That is, the global Hough Transform can be obtained by the summation of local Hough transforms of disjoint sub-regions. The method discussed involves the recursive subdivision of the image into sub-images, each with their own parameter space, and organized in a quadtree structure, which allows for implicit storage of arbitrary parameter space manifolds. This method results in improved efficiency in finding endpoints of line segments and improved robustness and reliability in extracting lines in noisy situations, at a slightly increased cost of memory. The new algorithm is presented in detail, along with a discussion of time and space complexities. The paper is concluded with proposed future research in this direction.
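The additive property is easy to state in code: the accumulator of a point set equals the cell-wise sum of the accumulators of any disjoint partition of it. A minimal line-Hough sketch (bin counts and ranges are arbitrary choices for illustration):

```python
import math

def hough_accumulator(points, n_theta=18, n_rho=20, rho_max=10.0):
    """Straight-line Hough accumulator for a set of (x, y) points."""
    acc = [[0] * n_rho for _ in range(n_theta)]
    for x, y in points:
        for ti in range(n_theta):
            theta = ti * math.pi / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            ri = int((rho + rho_max) / (2 * rho_max) * n_rho)
            if 0 <= ri < n_rho:
                acc[ti][ri] += 1
    return acc

def add_accumulators(a, b):
    """Cell-wise sum: the Hough transform is additive over disjoint point sets."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

This is the property the quadtree scheme exploits: each sub-image votes into its own local parameter space, and parents are recovered by summation.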

Book ChapterDOI
09 Nov 2005
TL;DR: This work describes a method whereby the feature sets may be summarized using the stable bounded canonical set (SBCS), thus allowing the efficient computation of point correspondences between large feature sets.
Abstract: A common approach to the image matching problem is representing images as sets of features in some feature space followed by establishing correspondences among the features. Previous work by Huttenlocher and Ullman [1] shows how a similarity transformation – rotation, translation, and scaling – between two images may be determined assuming that three corresponding image points are known. While robust, such methods suffer from computational inefficiencies for general feature sets. We describe a method whereby the feature sets may be summarized using the stable bounded canonical set (SBCS), thus allowing the efficient computation of point correspondences between large feature sets. We use a notion of stability to influence the set summarization such that stable image features are preferred.

Journal Article
TL;DR: Application in the full-vision mounter SMT2505 demonstrates that the average center-detection error is less than 1 pixel and the execution time is less than 50 ms; the method maintains the advantages of the conventional Hough transform, such as high accuracy and good noise robustness, while improving processing speed by 2 to 3 orders of magnitude.
Abstract: A method using the Hough transform is proposed for the detection of circular PCB marks. After a threshold transform of the test image, the mark region is separated from background and noise by area segmentation. By computing the center of the mark region, the accumulation range of the circle center in the Hough transform is restricted to a neighborhood of this center. By calculating the radius of the mark on the test image from the physical mark size, the Hough transform accumulator is reduced from three dimensions to two. After edge detection of the test image using the Canny operator, the Hough transform detection proceeds through a rough-and-fine accumulation strategy, which first uses a large accumulation step and then a small one. Application in the full-vision mounter SMT2505 demonstrates that the average center-detection error is less than 1 pixel and the execution time is less than 50 ms; the method thus maintains the advantages of the conventional Hough transform, such as high accuracy and good noise robustness, while improving processing speed by 2 to 3 orders of magnitude.
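Reducing the accumulator from 3D (cx, cy, r) to 2D (cx, cy) once the radius is known can be sketched as below. This is a plain single-step accumulation for illustration; the paper's rough-and-fine strategy, area segmentation, and Canny edge detection are omitted.

```python
import math

def circle_center(edge_points, radius, grid, n_dirs=72):
    """Vote for circle centers when the radius is known in advance.

    With the radius fixed from the physical mark size, every edge point
    votes for all centers lying `radius` away from it, on a 2D integer
    accumulator of size grid x grid (1-pixel cells).
    """
    acc = [[0] * grid for _ in range(grid)]
    for x, y in edge_points:
        for k in range(n_dirs):
            a = 2 * math.pi * k / n_dirs
            cx = int(round(x + radius * math.cos(a)))
            cy = int(round(y + radius * math.sin(a)))
            if 0 <= cx < grid and 0 <= cy < grid:
                acc[cy][cx] += 1
    # The true center is where the voting rings of all edge points overlap.
    best = max((acc[cy][cx], cx, cy)
               for cy in range(grid) for cx in range(grid))
    return best[1], best[2]
```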

Book ChapterDOI
31 Aug 2005
TL;DR: This paper introduces a new operator to characterize a point in an image in a distinctive and invariant way and demonstrates how a point can be recognized in spite of significant image noise, inhomogeneous change in illumination and altered perspective.
Abstract: This paper introduces a new operator to characterize a point in an image in a distinctive and invariant way. The robust recognition of points is a key technique in computer vision: algorithms for stereo correspondence, motion tracking and object recognition rely heavily on this type of operator. The goal in this paper is to describe the salient point to be characterized by a constellation of surrounding anchor points. Salient points are the most reliably localized points extracted by an interest point operator. The anchor points are multiple interest points in a visually homogenous segment surrounding the salient point. Because of its appearance, this constellation is called a spider. With a prototype of the spider operator, results in this paper demonstrate how a point can be recognized in spite of significant image noise, inhomogeneous change in illumination and altered perspective. For an example that requires a high performance close to object / background boundaries, the prototype yields better results than David Lowe’s SIFT operator.

Proceedings ArticleDOI
03 Oct 2005
TL;DR: The use of the scale invariant feature transform to match areas between stereo images and the spatial relationship between pairs of objects is described linguistically using a system of fuzzy rules.
Abstract: In this paper, we discuss the use of the scale invariant feature transform to match areas between stereo images. The three dimensional locations of matched points are then computed. Each matched area is further matched to a database of known objects. After projecting the three dimensional locations onto a horizontal plane, the spatial relationships between pairs of objects are then described linguistically using a system of fuzzy rules. We are exploring the technique to facilitate human-like communication with a robot.
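A toy version of fuzzy directional relations between the projected object positions. The clipped-cosine membership is one common choice in the literature and is only an assumption about the paper's rule base.

```python
import math

def spatial_memberships(obj, ref):
    """Fuzzy memberships of four directional relations between centroids.

    obj, ref -- (x, y) positions projected onto the horizontal plane.
    Each membership is a clipped cosine of the angle between the
    ref->obj direction and the relation's reference direction.
    """
    theta = math.atan2(obj[1] - ref[1], obj[0] - ref[0])
    directions = {"right of": 0.0, "behind": math.pi / 2,
                  "left of": math.pi, "in front of": -math.pi / 2}
    return {name: max(0.0, math.cos(theta - d))
            for name, d in directions.items()}

def describe(obj_name, ref_name, obj, ref):
    """Linguistic description using the strongest relation."""
    m = spatial_memberships(obj, ref)
    best = max(m, key=m.get)
    return "%s is %s %s (degree %.2f)" % (obj_name, best, ref_name, m[best])
```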

Proceedings ArticleDOI
05 Dec 2005
TL;DR: This work addresses the problem of globally consistent estimation of the trajectory of a robot arm moving in three dimensional space based on a sequence of binocular stereo images from a stereo camera mounted on the tip of the arm, and compares three different methods for solving this estimation problem.
Abstract: We address the problem of globally consistent estimation of the trajectory of a robot arm moving in three dimensional space based on a sequence of binocular stereo images from a stereo camera mounted on the tip of the arm. Correspondence between 3D points from successive stereo camera positions is established through matching of 2D SIFT features in the images. We compare three different methods for solving this estimation problem, based on three distance measures between 3D points: Euclidean distance, Mahalanobis distance, and a distance measure defined by a maximum likelihood formulation. Theoretical analysis and experimental results demonstrate that the maximum likelihood formulation is the most accurate. If the measurement error is guaranteed to be small, then Euclidean distance is the fastest, without significantly compromising accuracy, and therefore it is best for on-line robot navigation.

Journal Article
TL;DR: The paper designs a new fast Hough transform for line detection based on an analysis of existing modified Hough transforms, and validates by comparison with those methods that the new algorithm has better real-time performance.
Abstract: The Hough transform is the classical method for curve detection, but the conventional Hough transform is inefficient and cannot run in real time. Based on an analysis of existing modified Hough transforms, this paper designs a new fast Hough transform for line detection, and then validates, by comparison with the existing modified methods, that the new algorithm has better real-time performance.

Book ChapterDOI
Seungdo Jeong1, Jonglyul Chung1, Sanghoon Lee1, Il Hong Suh1, Byung-Uk Choi1 
14 Sep 2005
TL;DR: The Harris corner detector and pyramid Lucas-Kanade optical flow are combined for robot localization, and SIFT keypoints and their descriptors, together with model-based object recognition and a stereo vision technique, are applied to spatial context recognition.
Abstract: In this work, we propose a simultaneous mobile robot localization and spatial context recognition system. The Harris corner detector and pyramid Lucas-Kanade optical flow are combined for robot localization. SIFT keypoints and their descriptors, used for model-based object recognition, together with a stereo vision technique, are applied to spatial context recognition. The effectiveness of our proposed method is verified by experiments.

Journal ArticleDOI
05 Jan 2005
TL;DR: Experiments show that the proposed method can stably detect two-dimensional rectangular objects in the target image and achieves practical processing times.
Abstract: We propose a method for detecting two-dimensional rectangular objects in a target image. Vertex candidates of a rectangular object are obtained by a combinatorial Hough transform based on geometric features of edge-pixel pairs. Because the positions of the vertex candidates are detected from the result of voting into the parameter space, high reliability is obtained. Experiments show that the proposed method can stably detect various rectangular objects and achieves practical processing times.


Book ChapterDOI
20 Sep 2005
TL;DR: The presented method is based on the Hough transform for irregular objects, with a parameter space defined by translation, rotation and scaling operations, and may be used in a robotic system, identification system or for image analysis, directly on grey-level images.
Abstract: This paper presents an application of the Hough transform to the tasks of identifying irregular patterns. The presented method is based on the Hough transform for irregular objects, with a parameter space defined by translation, rotation and scaling operations. The technique may be used in a robotic system, identification system or for image analysis, directly on grey-level images. An example application of the Hough transform to a robot monitoring within computer vision systems is presented. A hardware implementation of the Hough technique is introduced which accelerates the calculations considerably.

Book ChapterDOI
05 Sep 2005
TL;DR: A robust algorithm based on three-point Hough transform and the segmentation of points on the dual sphere using the metric defined on a sphere is proposed.
Abstract: We propose an algorithm for detecting great circles in images on a sphere using the Hough transform. Since our Hough transform for images on a sphere is derived on the basis of duality, it employs a dual sphere as the accumulator of the voting procedure. Furthermore, we propose a robust algorithm based on a three-point Hough transform and the segmentation of points on the dual sphere using the metric defined on a sphere.
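The duality-based voting can be sketched with pairwise cross products: each pair of sphere points votes for the normal of the great circle through them, accumulated on a binned dual sphere. The spherical-coordinate binning here is an illustrative choice, and the paper's three-point variant and metric-based segmentation are omitted.

```python
import math
import numpy as np

def great_circle_normals(points, bin_deg=5.0, min_votes=3):
    """Detect great circles on the sphere by voting on the dual sphere.

    points -- (N, 3) array of unit vectors on the sphere. A great circle
    is the set {x : n . x = 0}; each pair of points votes for the normal
    n = p x q. Normals are binned by spherical coordinates; bins with
    enough votes are returned as (theta_bin, phi_bin) keys.
    """
    votes = {}
    n_pts = len(points)
    for i in range(n_pts):
        for j in range(i + 1, n_pts):
            n = np.cross(points[i], points[j])
            norm = np.linalg.norm(n)
            if norm < 1e-9:          # identical/antipodal pair: no unique circle
                continue
            n /= norm
            if n[2] < 0:             # fold antipodal normals together
                n = -n
            theta = math.degrees(math.acos(max(-1.0, min(1.0, n[2]))))
            phi = math.degrees(math.atan2(n[1], n[0])) % 360.0
            key = (int(theta // bin_deg), int(phi // bin_deg))
            votes[key] = votes.get(key, 0) + 1
    return [k for k, v in votes.items() if v >= min_votes]
```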

Journal Article
LI Cui-hua1
TL;DR: In this article, the authors take building detection and recognition as an instance: using an improved Hough transform, several line-analysis strategies and measures to remove spurious targets are presented, and the experimental results show that the approach obtains acceptable results under scaling, different viewing angles, various sunlight conditions, and mosaic phenomena.
Abstract: In this paper, taking building detection and recognition as an instance, several line-analysis strategies and measures to remove spurious targets are presented, based on an improved Hough transform. The experimental results show that the approach with these methods obtains acceptable results when the image processing involves scaling, different angles of view, various conditions of sunlight, and mosaic phenomena.

01 Jan 2005
TL;DR: A robust contour descriptor is introduced which, it is hoped, can be combined with texture-based features to obtain object recognition systems that work in a wider range of situations; the features are also applied in a robotic setting where object appearances are learned by manipulating the objects.
Abstract: This report introduces a robust contour descriptor for view-based object recognition. In recent years great progress has been made in the field of view-based object recognition, mainly due to the introduction of texture-based features such as SIFT and MSER. Although these are remarkably successful for textured objects, they have problems with man-made objects with little or no texture. For such objects, either explicit geometrical models or contour- and shading-based features are also needed. This report introduces a robust contour descriptor which we hope can be combined with texture-based features to obtain object recognition systems that work in a wider range of situations. Each detected contour is described as a sequence of line and ellipse segments, both of which have well-defined geometrical transformations to other views. The feature detector is also quite fast, mainly due to the idea of first detecting chains of contour points; these chains are then split into line segments, which are later either kept, grouped into ellipses, or discarded. We demonstrate the robustness of the feature detector with a repeatability test under general homography transformations of a planar scene. Through the repeatability test, we find that using ellipse segments instead of lines, where appropriate, improves repeatability. We also apply the features in a robotic setting where object appearances are learned by manipulating the objects.

Journal ArticleDOI
TL;DR: The definition of the Continuous Kernel Hough transform, together with image processing, system identification, and the basics of parameter estimation, is presented.
Abstract: The paper deals with a new modification of the Hough transform: the Continuous Kernel Hough transform. The definition of the Continuous Kernel Hough transform, image processing, system identification, and the basics of parameter estimation are presented.

01 Jan 2005
TL;DR: In this paper, a keypoint descriptor is added to the feature detection phase to cope with large scale and view-point changes, and the descriptor is included in the equations of the proximity matrix that is central to the SVD-matching.
Abstract: The paper tackles the problem of matching feature points between a pair of images of the same scene. This is a key problem in computer vision. The method we discuss here is a version of the SVD-matching proposed by Scott and Longuet-Higgins, and later modified by Pilu, which we elaborate in order to cope with large scale variations. To this end we add to the feature detection phase a keypoint descriptor that is robust to large scale and view-point changes. Furthermore, we include this descriptor in the equations of the proximity matrix that is central to the SVD-matching. At the same time we remove from the proximity matrix all the information about point locations in the image, which is the source of mismatches when the amount of scene variation increases. The main contribution of this work is in showing that this compact and easy algorithm can be used for severe scene variations. We present experimental evidence of the improved performance with respect to the previous versions of the algorithm.
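A compact sketch of SVD-matching driven purely by descriptor proximities (no image positions), in the spirit of the modification described above; the Gaussian proximity and the sigma value are standard choices from the Scott and Longuet-Higgins formulation, not necessarily the paper's exact settings.

```python
import numpy as np

def svd_matching(desc1, desc2, sigma=1.0):
    """Descriptor-only SVD matching (Scott/Longuet-Higgins style).

    desc1 -- (M, D) and desc2 -- (N, D) arrays of keypoint descriptors.
    The proximity matrix uses descriptor distances only. Replacing the
    singular values with ones gives an orthogonal "pairing" matrix P;
    (i, j) is accepted when P[i, j] dominates both its row and column.
    """
    d2 = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-d2 / (2.0 * sigma ** 2))        # Gaussian proximity matrix
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    P = U @ Vt                                   # singular values -> ones
    matches = []
    for i in range(P.shape[0]):
        j = int(np.argmax(P[i]))
        if int(np.argmax(P[:, j])) == i:         # mutual row/column maximum
            matches.append((i, j))
    return matches
```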

Proceedings ArticleDOI
01 Jul 2005
TL;DR: A novel image registration method using a multi-resolution Hough transform that combines the robustness of the general Hough transform with the computational efficiency of the iterative Hough transform to achieve more accurate parameters.
Abstract: A novel image registration method using multi-resolution based Hough transform is presented in this paper. Hough transform is a classic method for parameter finding, but its huge space complexity and time complexity obstruct its using in image registration. Therefore multi-resolution decomposition is combined to reduce the computation. General Hough transform, which searches parameter in a 4D space, is first used in largest scale level to get initial transform parameters with less control points. After that, Iterative Hough transform, which searches parameter in 2D space, is further utilized in other scale levels in order to achieve more accurate parameters. The proposed method combines the robustness of General Hough transform and the computation efficiency of iterative Hough transform. At the same time, the huge complexity of General Hough transform and fragility of Iterative Hough transform are abstained by multi-resolution decomposition. The experiments also show the performance of proposed method.