scispace - formally typeset
Search or ask a question
Author

Yan Ke

Other affiliations: Microsoft
Bio: Yan Ke is an academic researcher from Carnegie Mellon University. The author has contributed to research in topics: Image retrieval & The Internet. The author has an hindex of 20, co-authored 29 publications receiving 7043 citations. Previous affiliations of Yan Ke include Microsoft.

Papers
More filters
Proceedings ArticleDOI
27 Jun 2004
TL;DR: This paper examines (and improves upon) the local image descriptor used by SIFT, and demonstrates that the PCA-based local descriptors are more distinctive, more robust to image deformations, and more compact than the standard SIFT representation.
Abstract: Stable local feature detection and representation is a fundamental component of many image registration and object recognition algorithms. Mikolajczyk and Schmid (June 2003) recently evaluated a variety of approaches and identified the SIFT [D. G. Lowe, 1999] algorithm as being the most resistant to common image deformations. This paper examines (and improves upon) the local image descriptor used by SIFT. Like SIFT, our descriptors encode the salient aspects of the image gradient in the feature point's neighborhood; however, instead of using SIFT's smoothed weighted histograms, we apply principal components analysis (PCA) to the normalized gradient patch. Our experiments demonstrate that the PCA-based local descriptors are more distinctive, more robust to image deformations, and more compact than the standard SIFT representation. We also present results showing that using these descriptors in an image retrieval application results in increased accuracy and faster matching.

3,325 citations

Proceedings ArticleDOI
Yan Ke1, Xiaoou Tang1, Feng Jing1
17 Jun 2006
TL;DR: A principled method for designing high level features for photo quality assessment is proposed and the resulting system can classify between high quality professional photos and low quality snapshots.
Abstract: We propose a principled method for designing high level features forphoto quality assessment. Our resulting system can classify between high quality professional photos and low quality snapshots. Instead of using the bag of low-level features approach, we first determine the perceptual factors that distinguish between professional photos and snapshots. Then, we design high level semantic features to measure the perceptual differences. We test our features on a large and diverse dataset and our system is able to achieve a classification rate of 72% on this difficult task. Since our system is able to achieve a precision of over 90% in low recall scenarios, we show excellent results in a web image search application.

695 citations

Proceedings ArticleDOI
17 Oct 2005
TL;DR: This paper constructs a realtime event detector for each action of interest by learning a cascade of filters based on volumetric features that efficiently scans video sequences in space and time and confirms that it achieves performance comparable to a current interest point based human activity recognizer on a standard database of human activities.
Abstract: This paper studies the use of volumetric features as an alternative to popular local descriptor approaches for event detection in video sequences. Motivated by the recent success of similar ideas in object detection on static images, we generalize the notion of 2D box features to 3D spatio-temporal volumetric features. This general framework enables us to do real-time video analysis. We construct a realtime event detector for each action of interest by learning a cascade of filters based on volumetric features that efficiently scans video sequences in space and time. This event detector recognizes actions that are traditionally problematic for interest point methods - such as smooth motions where insufficient space-time interest points are available. Our experiments demonstrate that the technique accurately detects actions on real-world sequences and is robust to changes in viewpoint, scale and action speed. We also adapt our technique to the related task of human action classification and confirm that it achieves performance comparable to a current interest point based human activity recognizer on a standard database of human activities.

616 citations

Journal ArticleDOI
TL;DR: An architecture and system are designed and built that enable easy deployment of wide-area sensing services and provide the missing software components for realizing a worldwide sensor Web.
Abstract: We discuss about the IrisNet (Internet-scale resource-intensive sensor network services) project at Intel Research, we design an architecture and build a system that enable easy deployment of such wide-area sensing services. We aim to provide the missing software components for realizing a worldwide sensor Web. Wide-area architectures for pervasive sensing enable a new generation of powerful distributed sensing services. A worldwide sensor Web, in which users can query, as a single unit, vast quantities of data from thousands or even millions of widely distributed, heterogeneous sensors. Internet-connected PCs that source sensor feeds and cooperate to answer users' queries will form the global sensor Web's backbone. Developers of wide-area sensing services (service authors) deploy the services on this distributed infrastructure.

493 citations

Proceedings ArticleDOI
26 Dec 2007
TL;DR: This work proposes a technique for event recognition in crowded videos that reliably identifies actions in the presence of partial occlusion and background clutter, enabling robustness against occlusions and actor variability.
Abstract: Real-world actions occur often in crowded, dynamic environments. This poses a difficult challenge for current approaches to video event detection because it is difficult to segment the actor from the background due to distracting motion from other objects in the scene. We propose a technique for event recognition in crowded videos that reliably identifies actions in the presence of partial occlusion and background clutter. Our approach is based on three key ideas: (1) we efficiently match the volumetric representation of an event against oversegmented spatio-temporal video volumes; (2) we augment our shape-based features using flow; (3) rather than treating an event template as an atomic entity, we separately match by parts (both in space and time), enabling robustness against occlusions and actor variability. Our experiments on human actions, such as picking up a dropped object or waving in a crowd show reliable detection with few false positives.

436 citations


Cited by
More filters
Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.

31,952 citations

Book ChapterDOI
07 May 2006
TL;DR: A novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
Abstract: In this paper, we present a novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features). It approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (in casu, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper presents experimental results on a standard evaluation set, as well as on imagery obtained in the context of a real-life object recognition application. Both show SURF's strong performance.

13,011 citations

Journal ArticleDOI
TL;DR: A novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.

12,449 citations

Journal ArticleDOI
TL;DR: An object detection system based on mixtures of multiscale deformable part models that is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges is described.
Abstract: We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL data sets. Our system relies on new methods for discriminative training with partially labeled data. We combine a margin-sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM is a reformulation of MI--SVM in terms of latent variables. A latent SVM is semiconvex, and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative training algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function.

10,501 citations

Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper proposes a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise, and demonstrates through experiments how ORB is at two orders of magnitude faster than SIFT, while performing as well in many situations.
Abstract: Feature matching is at the base of many computer vision problems, such as object recognition or structure from motion. Current methods rely on costly descriptors for detection and matching. In this paper, we propose a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise. We demonstrate through experiments how ORB is at two orders of magnitude faster than SIFT, while performing as well in many situations. The efficiency is tested on several real-world applications, including object detection and patch-tracking on a smart phone.

8,702 citations