
Showing papers on "Scale-invariant feature transform published in 2007"


Proceedings ArticleDOI
29 Sep 2007
TL;DR: This paper uses a bag of words approach to represent videos, and presents a method to discover relationships between spatio-temporal words in order to better describe the video data.
Abstract: In this paper we introduce a 3-dimensional (3D) SIFT descriptor for video or 3D imagery such as MRI data. We also show how this new descriptor is able to better represent the 3D nature of video data in the application of action recognition. This paper will show how 3D SIFT is able to outperform previously used description methods in an elegant and efficient manner. We use a bag of words approach to represent videos, and present a method to discover relationships between spatio-temporal words in order to better describe the video data.

1,757 citations
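The bag-of-words representation mentioned above quantizes each local descriptor to its nearest entry in a codebook and histograms the assignments. A minimal sketch with a hypothetical toy codebook (real 3D SIFT descriptors would be high-dimensional and the codebook learned by clustering):

```python
# Illustrative sketch (not the paper's implementation): quantizing local
# descriptors into a bag-of-words histogram with a fixed toy codebook.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def bag_of_words(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and count occurrences."""
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda i: euclidean(d, codebook[i]))
        hist[nearest] += 1
    return hist

codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]          # toy "visual words"
descriptors = [(0.1, 0.1), (0.9, 1.0), (0.2, 0.0), (0.1, 0.9)]
histogram = bag_of_words(descriptors, codebook)           # [2, 1, 1]
```

Two videos can then be compared by comparing their histograms, independently of where in the video each spatio-temporal word occurs.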


Journal ArticleDOI
TL;DR: A fully automatic face recognition algorithm that is multimodal (2D and 3D) and performs hybrid (feature based and holistic) matching in order to achieve efficiency and robustness to facial expressions is presented.
Abstract: We present a fully automatic face recognition algorithm and demonstrate its performance on the FRGC v2.0 data. Our algorithm is multimodal (2D and 3D) and performs hybrid (feature based and holistic) matching in order to achieve efficiency and robustness to facial expressions. The pose of a 3D face along with its texture is automatically corrected using a novel approach based on a single automatically detected point and the Hotelling transform. A novel 3D spherical face representation (SFR) is used in conjunction with the scale-invariant feature transform (SIFT) descriptor to form a rejection classifier, which quickly eliminates a large number of candidate faces at an early stage for efficient recognition in case of large galleries. The remaining faces are then verified using a novel region-based matching approach, which is robust to facial expressions. This approach automatically segments the eyes-forehead and the nose regions, which are relatively less sensitive to expressions, and matches them separately using a modified iterative closest point (ICP) algorithm. The results of all the matching engines are fused at the metric level to achieve higher accuracy. We use the FRGC benchmark to compare our results to other algorithms that used the same database. Our multimodal hybrid algorithm performed better than others by achieving 99.74 percent and 98.31 percent verification rates at a 0.001 false acceptance rate (FAR) and identification rates of 99.02 percent and 95.37 percent for probes with a neutral and a nonneutral expression, respectively.

495 citations



Proceedings ArticleDOI
Simon Winder1, Matthew Brown1
17 Jun 2007
TL;DR: The best descriptors were those with log polar histogramming regions and feature vectors constructed from rectified outputs of steerable quadrature filters, which gave one third of the incorrect matches produced by SIFT.
Abstract: In this paper we study interest point descriptors for image matching and 3D reconstruction. We examine the building blocks of descriptor algorithms and evaluate numerous combinations of components. Various published descriptors such as SIFT, GLOH, and Spin images can be cast into our framework. For each candidate algorithm we learn good choices for parameters using a training set consisting of patches from a multi-image 3D reconstruction where accurate ground-truth matches are known. The best descriptors were those with log polar histogramming regions and feature vectors constructed from rectified outputs of steerable quadrature filters. At a 95% detection rate these gave one third of the incorrect matches produced by SIFT.

433 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: An affine invariant shape descriptor for maximally stable extremal regions (MSER) is introduced that uses only the shape of the detected MSER itself and can achieve the best performance under a range of imaging conditions by matching both the texture and shape descriptors.
Abstract: This paper introduces an affine invariant shape descriptor for maximally stable extremal regions (MSER). Affine invariant feature descriptors are normally computed by sampling the original grey-scale image in an invariant frame defined from each detected feature, but we instead use only the shape of the detected MSER itself. This has the advantage that features can be reliably matched regardless of the appearance of the surroundings of the actual region. The descriptor is computed using the scale invariant feature transform (SIFT), with the resampled MSER binary mask as input. We also show that the original MSER detector can be modified to achieve better scale invariance by detecting MSERs in a scale pyramid. We make extensive comparisons of the proposed feature against a SIFT descriptor computed on grey-scale patches, and also explore the possibility of grouping the shape descriptors into pairs to incorporate more context. While the descriptor does not perform as well on planar scenes, we demonstrate various categories of full 3D scenes where it outperforms the SIFT descriptor computed on grey-scale patches. The shape descriptor is also shown to be more robust to changes in illumination. We show that a system can achieve the best performance under a range of imaging conditions by matching both the texture and shape descriptors.

245 citations


Proceedings ArticleDOI
10 Apr 2007
TL;DR: The use of a recently developed feature, SURF, is proposed to improve the performance of appearance-based localization methods that perform image retrieval in large data sets, showing the use of SURF as the best compromise between efficiency and accuracy in the results.
Abstract: Many robotic applications work with visual reference maps, which usually consist of sets of more or less organized images. In these applications, there is a compromise between the density of reference data stored and the capacity to later localize the robot when it is not exactly at the same position as one of the reference views. Here we propose the use of a recently developed feature, SURF, to improve the performance of appearance-based localization methods that perform image retrieval in large data sets. This feature is integrated with a vision-based algorithm that allows both topological and metric localization using omnidirectional images in a hierarchical approach. It uses pyramidal kernels for the topological localization and three-view geometric constraints for the metric one. Experiments with several omnidirectional image sets are shown, including comparisons with other typically used features (radial lines and SIFT). The advantages of this approach are demonstrated, showing SURF to be the best compromise between efficiency and accuracy in the results.

243 citations


Proceedings ArticleDOI
09 Jul 2007
TL;DR: Two novel schemes for near duplicate image and video-shot detection based on global hierarchical colour histograms, using Locality Sensitive Hashing for fast retrieval and local feature descriptors, are proposed and compared.
Abstract: This paper proposes and compares two novel schemes for near duplicate image and video-shot detection. The first approach is based on global hierarchical colour histograms, using Locality Sensitive Hashing for fast retrieval. The second approach uses local feature descriptors (SIFT) and for retrieval exploits techniques used in the information retrieval community to compute approximate set intersections between documents using a min-Hash algorithm. The requirements for near-duplicate images vary according to the application, and we address two types of near-duplicate definition: (i) being perceptually identical (e.g. up to noise, discretization effects, small photometric distortions etc.); and (ii) being images of the same 3D scene (so allowing for viewpoint changes and partial occlusion). We define two shots to be near-duplicates if they share a large percentage of near-duplicate frames. We focus primarily on scalability to very large image and video databases, where fast query processing is necessary. Both methods are designed so that only a small amount of data need be stored for each image. In the case of near-duplicate shot detection it is shown that a weak approximation to histogram matching, consuming substantially less storage, is sufficient for good results. We demonstrate our methods on the TRECVID 2006 data set which contains approximately 165 hours of video (about 17.8M frames with 146K key frames), and also on feature films and pop videos.

237 citations
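The min-Hash trick used here rests on a simple fact: under a random hash of the element universe, two sets attain the same minimum with probability equal to their Jaccard similarity, so agreement across many independent hashes estimates set overlap without computing the intersection. A minimal sketch with hypothetical visual-word id sets:

```python
# Illustrative min-Hash sketch (toy data, universal hashing): the fraction
# of hash functions on which two sets agree in their minimum estimates
# their Jaccard similarity.
import random

random.seed(0)
P = 4294967311                      # prime larger than 2^32
N_HASHES = 200
coeffs = [(random.randrange(1, P), random.randrange(0, P)) for _ in range(N_HASHES)]

def minhash_signature(items):
    return [min((a * x + b) % P for x in items) for a, b in coeffs]

def estimated_jaccard(sig_a, sig_b):
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

img_a = set(range(0, 100))          # visual-word ids of one frame
img_b = set(range(50, 150))         # shares 50 of 150 distinct words
est = estimated_jaccard(minhash_signature(img_a), minhash_signature(img_b))
true_jaccard = len(img_a & img_b) / len(img_a | img_b)   # 1/3
```

The signature (200 integers here) is all that needs to be stored per image, which is what makes the approach scale to very large collections.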


Proceedings ArticleDOI
15 Apr 2007
TL;DR: Person-specific SIFT features and a simple non-statistical matching strategy, combining local and global similarity on key-point clusters, are used to solve face recognition problems; the experimental results demonstrate the robustness of SIFT features to expression, accessory and pose variations.
Abstract: Scale invariant feature transform (SIFT) proposed by Lowe has been widely and successfully applied to object detection and recognition. However, the representation ability of SIFT features in face recognition has rarely been investigated systematically. In this paper, we propose to use person-specific SIFT features and a simple non-statistical matching strategy combined with local and global similarity on key-point clusters to solve face recognition problems. Large scale experiments on the FERET and CAS-PEAL face databases using only one training sample per person have been carried out to compare it with other non person-specific features such as the Gabor wavelet feature and the local binary pattern feature. The experimental results demonstrate the robustness of SIFT features to expression, accessory and pose variations.

225 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: This paper formulates descriptor design as a non-parametric dimensionality reduction problem, and adopts a discriminative approach that can exceed the performance of current state-of-the-art techniques such as SIFT with far fewer dimensions, and with virtually no parameters to be tuned by hand.
Abstract: Invariant feature descriptors such as SIFT and GLOH have been demonstrated to be very robust for image matching and visual recognition. However, such descriptors are generally parameterised in very high dimensional spaces, e.g. 128 dimensions in the case of SIFT. This limits the performance of feature matching techniques in terms of speed and scalability. Furthermore, these descriptors have traditionally been carefully hand crafted by manually tuning many parameters. In this paper, we tackle both of these problems by formulating descriptor design as a non-parametric dimensionality reduction problem. In contrast to previous approaches that use only the global statistics of the inputs, we adopt a discriminative approach. Starting from a large training set of labelled match/non-match pairs, we pursue lower dimensional embeddings that are optimised for their discriminative power. Extensive comparative experiments demonstrate that we can exceed the performance of current state-of-the-art techniques such as SIFT with far fewer dimensions, and with virtually no parameters to be tuned by hand.

211 citations


Proceedings Article
01 Jan 2007
TL;DR: This paper addresses the issues of outdoor appearance-based topological localization for a mobile robot over time and shows that two variants of SURF, called U-SURF and SURF-128, outperform the other algorithms in terms of accuracy and speed.
Abstract: Local feature matching has become a commonly used method to compare images. For mobile robots, a reliable method for comparing images can constitute a key component for localization and loop closing tasks. In this paper, we address the issues of outdoor appearance-based topological localization for a mobile robot over time. Our data sets, each consisting of a large number of panoramic images, have been acquired over a period of nine months with large seasonal changes (snow-covered ground, bare trees, autumn leaves, dense foliage, etc.). Two different types of image feature algorithms, SIFT and the more recent SURF, have been used to compare the images. We show that two variants of SURF, called U-SURF and SURF-128, outperform the other algorithms in terms of accuracy and speed.

175 citations


Proceedings ArticleDOI
12 Apr 2007
TL;DR: This method extends the concepts used in the computer vision SIFT technique for extracting and matching distinctive scale invariant features in 2D scalar images to scalar images of arbitrary dimensionality by using hyperspherical coordinates for gradients and multidimensional histograms to create the feature vectors.
Abstract: We present a fully automated multimodal medical image matching technique. Our method extends the concepts used in the computer vision SIFT technique for extracting and matching distinctive scale invariant features in 2D scalar images to scalar images of arbitrary dimensionality. This extension involves using hyperspherical coordinates for gradients and multidimensional histograms to create the feature vectors. These features were successfully applied to determine accurate feature point correspondence between pairs of medical images (3D) and dynamic volumetric data (3D+time).

Journal ArticleDOI
TL;DR: In this paper, the authors compare and evaluate how well different implementations of SIFT and SURF perform in terms of invariancy and runtime efficiency for object detection and object recognition.

Proceedings ArticleDOI
03 Dec 2007
TL;DR: A 3D interest point detector that is based on SURF and a 3D descriptor that extends SIFT are proposed that are applied to the problem of detecting repeated structure in range images, and promising results are reported.
Abstract: This paper presents a method for describing and recognising local structure in 3D images. The method extends proven techniques for 2D object recognition in images. In particular, we propose a 3D interest point detector that is based on SURF, and a 3D descriptor that extends SIFT. The method is applied to the problem of detecting repeated structure in range images, and promising results are reported.

Proceedings ArticleDOI
26 Dec 2007
TL;DR: It is shown experimentally that the transformation allows a significant dimensionality reduction and improves matching performance of a state-of-the-art SIFT descriptor, with consistent improvement in precision-recall and in the speed of fast matching in tree structures, at the expense of little overhead for projecting the descriptors into the transformed space.
Abstract: In this paper we propose to transform an image descriptor so that nearest neighbor (NN) search for correspondences becomes the optimal matching strategy under the assumption that inter-image deviations of corresponding descriptors have Gaussian distribution. The Euclidean NN in the transformed domain corresponds to the NN according to a truncated Mahalanobis metric in the original descriptor space. We provide theoretical justification for the proposed approach and show experimentally that the transformation allows a significant dimensionality reduction and improves matching performance of a state-of-the-art SIFT descriptor. We observe consistent improvement in precision-recall and speed of fast matching in tree structures at the expense of little overhead for projecting the descriptors into the transformed space. In the context of SIFT vs. transformed M-SIFT comparison, tree search structures are evaluated according to different criteria and query types. All search tree experiments confirm that transformed M-SIFT performs better than the original SIFT.
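The whitening idea can be illustrated in a toy setting: scaling each descriptor dimension by the inverse of its deviation (the diagonal-covariance special case of such a transform) makes plain Euclidean distance coincide with the Mahalanobis distance. A sketch under that simplifying assumption, with hypothetical toy vectors:

```python
# Illustrative sketch (toy 2D data, diagonal covariance for simplicity):
# dividing each dimension by its standard deviation makes Euclidean
# distance in the transformed space equal the Mahalanobis distance in
# the original space.

def transform(v, stds):
    return [x / s for x, s in zip(v, stds)]

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def mahalanobis_diag(a, b, stds):
    return sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, stds)) ** 0.5

stds = [2.0, 0.5]                   # per-dimension deviation of matching pairs
a, b = [1.0, 1.0], [3.0, 1.5]
d_maha = mahalanobis_diag(a, b, stds)
d_eucl = euclidean(transform(a, stds), transform(b, stds))   # identical
```

The practical payoff is exactly what the abstract states: after a one-off projection, ordinary Euclidean tree-search structures perform the statistically better matching for free.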

Proceedings ArticleDOI
28 May 2007
TL;DR: This paper reports on the implementation of a GPU-based, real-time eye blink detector on very low contrast images acquired under near-infrared illumination that is part of a multi-sensor data acquisition and analysis system for driver performance assessment and training.
Abstract: This paper reports on the implementation of a GPU-based, real-time eye blink detector on very low contrast images acquired under near-infrared illumination. This detector is part of a multi-sensor data acquisition and analysis system for driver performance assessment and training. Eye blinks are detected inside regions of interest that are aligned with the subject's eyes at initialization. Alignment is maintained through time by tracking SIFT feature points that are used to estimate the affine transformation between the initial face pose and the pose in subsequent frames. The GPU implementation of the SIFT feature point extraction algorithm ensures real-time processing. An eye blink detection rate of 97% is obtained on a video dataset of 33,000 frames showing 237 blinks from 22 subjects.
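The affine alignment step described above is determined by six parameters, so three tracked point pairs fix it exactly (in practice more pairs and least squares would be used). A minimal sketch with hypothetical points, solved per output row with Cramer's rule:

```python
# Illustrative sketch (toy points, not the paper's GPU pipeline): an
# affine transform x' = A x + t has six parameters, determined exactly
# by three point correspondences.

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(m)
    sol = []
    for c in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][c] = rhs[r]
        sol.append(det3(mc) / d)
    return sol

def affine_from_pairs(src, dst):
    m = [[x, y, 1.0] for x, y in src]
    row_x = solve3(m, [p[0] for p in dst])   # a, b, tx
    row_y = solve3(m, [p[1] for p in dst])   # c, d, ty
    return row_x, row_y

def apply_affine(rows, p):
    (a, b, tx), (c, d, ty) = rows
    return (a * p[0] + b * p[1] + tx, c * p[0] + d * p[1] + ty)

src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(2.0, 3.0), (4.0, 3.0), (2.0, 5.0)]   # scale 2 plus translation (2, 3)
rows = affine_from_pairs(src, dst)
mapped = apply_affine(rows, (1.0, 1.0))
```

Once the affine transform is known, the initial regions of interest around the eyes can be warped to the current frame, which is how alignment is maintained over time.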

Book ChapterDOI
01 Jan 2007
TL;DR: A hand posture recognition system using the discrete Adaboost learning algorithm with Lowe’s scale invariant feature transform (SIFT) features is proposed to tackle the degraded performance due to background noise in training images and the in-plane rotation variant detection.
Abstract: Hand posture understanding is essential to human robot interaction. The existing hand detection approaches using a Viola-Jones detector have two fundamental issues, the degraded performance due to background noise in training images and the in-plane rotation variant detection. In this paper, a hand posture recognition system using the discrete Adaboost learning algorithm with Lowe’s scale invariant feature transform (SIFT) features is proposed to tackle these issues simultaneously. In addition, we apply a sharing feature concept to increase the accuracy of multi-class hand posture recognition. The experimental results demonstrate that the proposed approach successfully recognizes three hand posture classes and can deal with the background noise issues. Our detector is in-plane rotation invariant, and achieves satisfactory multi-view hand detection.

Proceedings ArticleDOI
12 Nov 2007
TL;DR: A novel ellipse detection algorithm which retains the original advantages of the Hough Transform while minimizing the storage and computation complexity and uses an accumulator that is only one dimensional.
Abstract: The main advantage of using the Hough Transform to detect ellipses is its robustness against missing data points. However, the storage and computational requirements of the Hough Transform preclude practical applications. Although there are many modifications to the Hough Transform, these modifications still demand significant storage requirement. In this paper, we present a novel ellipse detection algorithm which retains the original advantages of the Hough Transform while minimizing the storage and computation complexity. More specifically, we use an accumulator that is only one dimensional. As such, our algorithm is more effective in terms of storage requirement. In addition, our algorithm can be easily parallelized to achieve good execution time. Experimental results on both synthetic and real images demonstrate the robustness and effectiveness of our algorithm in which both complete and incomplete ellipses can be extracted.

Proceedings Article
30 Mar 2007
TL;DR: This work shows that for this application domain, the SIFT interest points can be dramatically pruned to effect large reductions in both memory requirements and query run-time, with almost negligible loss in effectiveness.
Abstract: The detection of image versions from large image collections is a formidable task, as two images are rarely identical. Geometric variations such as cropping and rotation, and slight photometric alterations, make content-based retrieval techniques unsuitable, whereas digital watermarking techniques have limited application for practical retrieval. Recently, the application of Scale Invariant Feature Transform (SIFT) interest points to this domain has shown high effectiveness, but scalability remains a problem due to the large number of features generated for each image. In this work, we show that for this application domain, the SIFT interest points can be dramatically pruned to effect large reductions in both memory requirements and query run-time, with almost negligible loss in effectiveness. We demonstrate that, unlike the original SIFT features, the pruned features scale better for collections containing hundreds of thousands of images.
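Pruning interest points can be as simple as ranking them by detector response and keeping only the strongest per image; the paper's actual pruning criteria are more refined, so the following is only an illustrative sketch with hypothetical keypoint records:

```python
# Illustrative sketch (hypothetical keypoints, not the paper's pruning
# rules): keep the top-k interest points per image by detector response,
# shrinking both the index and query time.

def prune_keypoints(keypoints, keep=2):
    """keypoints: list of (response, descriptor) pairs."""
    return sorted(keypoints, key=lambda kp: kp[0], reverse=True)[:keep]

keypoints = [(0.9, "d1"), (0.1, "d2"), (0.5, "d3"), (0.05, "d4")]
pruned = prune_keypoints(keypoints, keep=2)
```

Because low-response points are also the least repeatable, discarding them costs little effectiveness while cutting storage roughly in proportion to the pruning ratio.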

Journal ArticleDOI
TL;DR: A hierarchical approach for building recognition using a method for selecting discriminative SIFT features and a simple probabilistic model for integration of the evidence from individual matches based on the match quality is proposed.

Proceedings ArticleDOI
25 Jun 2007
TL;DR: An approach for classifying images of charts based on the shape and spatial relationships of their primitives and two novel features to represent the structural information based on region segmentation and curve saliency are introduced.
Abstract: We present an approach for classifying images of charts based on the shape and spatial relationships of their primitives. Five categories are considered: bar-charts, curve-plots, pie-charts, scatter-plots and surface-plots. We introduce two novel features to represent the structural information based on (a) region segmentation and (b) curve saliency. The local shape is characterized using the Histograms of Oriented Gradients (HOG) and the Scale Invariant Feature Transform (SIFT) descriptors. Each image is represented by sets of feature vectors of each modality. The similarity between two images is measured by the overlap in the distribution of the features, measured using the Pyramid Match algorithm. A test image is classified based on its similarity with training images from the categories. The approach is tested with a database of images collected from the Internet.

Proceedings ArticleDOI
28 May 2007
TL;DR: A performance evaluation framework for visual feature extraction and matching in the visual simultaneous localization and mapping (SLAM) context is presented and shows that all methods can be made to perform well, although it is possible to distinguish between the three.
Abstract: We present a performance evaluation framework for visual feature extraction and matching in the visual simultaneous localization and mapping (SLAM) context. Although feature extraction is a crucial component, no qualitative study comparing different techniques from the visual SLAM perspective exists. We extend previous image pair evaluation methods to handle non-planar scenes and the multiple image sequence requirements of our application, and compare three popular feature extractors used in visual SLAM: the Harris corner detector, the Kanade-Lucas-Tomasi tracker (KLT), and the scale-invariant feature transform (SIFT). We present results from a typical indoor environment in the form of recall/precision curves, and also investigate the effect of increasing distance between image viewpoints on extractor performance. Our results show that all methods can be made to perform well, although it is possible to distinguish between the three. We conclude by presenting guidelines for selecting a feature extractor for visual SLAM based on our experiments.

Patent
01 Aug 2007
TL;DR: In this paper, the authors proposed a method for object recognition based on nearest neighbor search of local descriptors such as SIFT, which is based on the observation that the level of accuracy of nearest neighbour search for correct recognition depends on images to be recognized.
Abstract: For object recognition based on nearest neighbor search of local descriptors such as SIFT, it is important to keep the nearest neighbor search efficient to deal with a huge number of descriptors. The present invention provides methods of efficient recognition. In one embodiment, the method is based on the observation that the level of accuracy of nearest neighbor search required for correct recognition depends on the images to be recognized. The method is characterized by a mechanism in which multiple recognizers with approximate nearest neighbor search are cascaded in order of the level of approximation, so as to improve efficiency by adaptively controlling the level applied depending on the images. In another embodiment, the method is characterized by excluding local descriptors with low discriminability when a large number of local descriptors are present in the vicinity and many distance calculations are required.
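The cascade idea can be sketched abstractly: a cheap approximate matcher answers a query only when its top two candidates are well separated, and ambiguous queries fall through to exact (expensive) search. The distance functions and threshold below are hypothetical stand-ins, not the patent's recognizers:

```python
# Illustrative cascade sketch (hypothetical stages): answer cheaply when
# the approximate stage is confident, otherwise fall back to exact search.

def cheap_match(query, db):
    # coarse stage: compare only the first descriptor component
    scored = sorted(db, key=lambda v: abs(v[0] - query[0]))
    best, second = scored[0], scored[1]
    confident = abs(second[0] - query[0]) - abs(best[0] - query[0]) > 1.0
    return best, confident

def exact_match(query, db):
    return min(db, key=lambda v: sum((a - b) ** 2 for a, b in zip(v, query)))

def cascade(query, db):
    best, confident = cheap_match(query, db)
    return best if confident else exact_match(query, db)

db = [(0.0, 0.0), (5.0, 5.0), (5.2, 0.0)]
easy = cascade((0.1, 0.2), db)       # coarse stage is decisive
hard = cascade((5.1, 4.0), db)       # ambiguous, falls through to exact search
```

The adaptive control described in the abstract amounts to choosing, per query, how far down such a cascade to descend.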

Proceedings ArticleDOI
12 Dec 2007
TL;DR: A new model-based approach is proposed that capitalizes on explicit structure and is robust to noise and occlusion; it achieves an encouraging recognition rate on an image database selected from the XM2VTS database.
Abstract: Ears are a new biometric with the major advantage that they appear to maintain their structure with increasing age. Most current approaches are holistic and describe the ear by its general properties. We propose a new model-based approach, capitalizing on explicit structure and with the advantage of being robust to noise and occlusion. Our model is a constellation of generalized ear parts, which is learned off-line using an unsupervised learning algorithm over an enrolled training set of 63 ear images. The Scale Invariant Feature Transform (SIFT) is used to detect the features within the ear images. In recognition, given a profile image of the human head, the ear is enrolled and recognised from the parts selected via the model. We achieve an encouraging recognition rate on an image database selected from the XM2VTS database. A head-to-head comparison with PCA is also presented to show the advantage derived by the use of the model in successful occlusion handling.

Proceedings ArticleDOI
29 Sep 2007
TL;DR: A novel tracking method to handle the problem of large motion by using Scale Invariant Feature Transform (SIFT) based registration algorithm that shows an accurate pose recovery when the head has large motion, even with movement along the Z axis.
Abstract: Although there exist dozens of vision based 3D head tracking methods, none of them considers the problem of large motion, especially movement along the Z axis. In this paper we propose a novel tracking method to handle this problem by using a Scale Invariant Feature Transform (SIFT) based registration algorithm. Salient SIFT features are first detected and tracked between two images, and then the 3D points corresponding to these features are obtained from a stereo camera. With these 3D points, a registration algorithm in a RANSAC framework is employed to detect the outliers and estimate the head pose. Performance evaluation shows an accurate pose recovery (3° RMS) when the head has large motion, even when movement along the Z axis is as large as about 150 cm.
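RANSAC, as used here for registration, repeatedly fits a model to a random minimal sample, counts the correspondences the model explains, and keeps the best one, which makes it robust to outlier matches. A minimal sketch on a toy 2D translation problem (the paper estimates a full 3D pose, for which the same loop applies with a rigid-transform fit):

```python
# Illustrative RANSAC sketch on toy data: estimate a 2D translation from
# point correspondences contaminated by an outlier.
import random

random.seed(1)

def ransac_translation(src, dst, iters=50, tol=0.5):
    best_t, best_inliers = None, -1
    for _ in range(iters):
        i = random.randrange(len(src))                  # minimal sample: 1 pair
        t = (dst[i][0] - src[i][0], dst[i][1] - src[i][1])
        inliers = sum(
            1 for (sx, sy), (dx, dy) in zip(src, dst)
            if abs(sx + t[0] - dx) < tol and abs(sy + t[1] - dy) < tol
        )
        if inliers > best_inliers:
            best_t, best_inliers = t, inliers
    return best_t, best_inliers

src = [(0, 0), (1, 0), (0, 1), (2, 2), (5, 5)]
dst = [(3, 4), (4, 4), (3, 5), (5, 6), (9, 9)]   # last pair is a bad match
t, n_inliers = ransac_translation(src, dst)       # recovers (3, 4)
```

Any sample drawn from a correct correspondence yields the true translation, so the loop converges quickly; the mismatched pair is simply never counted as an inlier.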

Proceedings ArticleDOI
26 Dec 2007
TL;DR: This work extends the successful 2D robust feature concept into the third dimension in that it produces a descriptor for a reconstructed 3D surface region that is perspectively invariant if the region can locally be approximated well by a plane.
Abstract: We extend the successful 2D robust feature concept into the third dimension in that we produce a descriptor for a reconstructed 3D surface region. The descriptor is perspectively invariant if the region can locally be approximated well by a plane. We exploit depth and texture information, which is nowadays available in real-time from video of moving cameras, from stereo systems or PMD cameras (photonic mixer devices). By computing a normal view onto the surface we keep the descriptiveness of similarity invariant features like SIFT while achieving invariance against perspective distortions, whereas descriptiveness typically suffers when using affine invariant features. Our approach can be exploited for structure-from-motion, for stereo or PMD cameras, alignment of large scale reconstructions or improved video registration.

Proceedings ArticleDOI
01 Oct 2007
TL;DR: A projective transformation for evaluating visual references is obtained using a version of the RANSAC algorithm, in which matched key-point pairs that fulfill the transformation equations are selected and corrupted data are rejected; the results presented are promising for use as a reference generator for the control system.
Abstract: This paper explores the possibility of using robust object tracking algorithms based on visual model features as generators of visual references for UAV control. A scale invariant feature transform (SIFT) algorithm is used for detecting the salient points in every processed image; then a projective transformation for evaluating the visual references is obtained using a version of the RANSAC algorithm, in which matched key-point pairs that fulfill the transformation equations are selected, while corrupted data are rejected. The system has been tested using diverse image sequences, showing its capability to track objects significantly changed in scale, position and rotation, while generating velocity references to the UAV flight controller. The robustness of our approach has also been validated using images taken from real flights showing noise and lighting distortions. The results presented are promising for use as a reference generator for the control system.

Proceedings ArticleDOI
22 Oct 2007
TL;DR: This paper proposes multiple-vehicle detection by quad-tree segmentation and a tracking method using the scale invariant feature transform to improve tracking performance for extracting traffic parameters such as vehicle count, speed and class.
Abstract: To monitor the road situation, CCTV footage is more useful than data from GPS or loop detectors because it can give the whole picture of the two-dimensional traffic situation. This paper proposes multiple-vehicle detection by quad-tree segmentation and a tracking method using the scale invariant feature transform to improve tracking performance for extracting traffic parameters such as vehicle count, speed, class, and so on. The experimental results show that the proposed method is effective and robust for vehicle detection and tracking, especially in cases where a vehicle changes lanes, vehicles occlude one another, or the affine shape of a vehicle changes due to its movement.

Journal ArticleDOI
TL;DR: A new simultaneous localization and mapping (SLAM) algorithm for building dense three‐dimensional maps using information acquired from a range imager and a conventional camera, for robotic search and rescue in unstructured indoor environments.
Abstract: The main contribution of this paper is a new simultaneous localization and mapping (SLAM) algorithm for building dense three-dimensional maps using information acquired from a range imager and a conventional camera, for robotic search and rescue in unstructured indoor environments. A key challenge in this scenario is that the robot moves in 6D and no odometry information is available. An extended information filter (EIF) is used to estimate the state vector containing the sequence of camera poses and some selected 3D point features in the environment. Data association is performed using a combination of scale invariant feature transformation (SIFT) feature detection and matching, random sampling consensus (RANSAC), and least square 3D point sets fitting. Experimental results are provided to demonstrate the effectiveness of the techniques developed. © 2007 Wiley Periodicals, Inc.

Book ChapterDOI
18 Nov 2007
TL;DR: The method of Canonical Correlation Analysis is combined with discriminant functions and the Scale-Invariant Feature Transform (SIFT) to obtain discriminative spatiotemporal features for robust gesture recognition.
Abstract: This paper addresses gesture recognition under small sample size, where direct use of traditional classifiers is difficult due to the high dimensionality of the input space. We propose a pairwise feature extraction method on video volumes for classification. The method of Canonical Correlation Analysis is combined with discriminant functions and the Scale-Invariant Feature Transform (SIFT) to obtain discriminative spatiotemporal features for robust gesture recognition. The proposed method is practically favorable as it works well with a small amount of training samples, involves few parameters, and is computationally efficient. In experiments using 900 videos of 9 hand gesture classes, the proposed method notably outperformed classifiers such as the Support Vector Machine and Relevance Vector Machine, achieving 85% accuracy.