Author

Navneet Dalal

Bio: Navneet Dalal is an academic researcher at Google. His research focuses on object detection and optical flow. He has an h-index of 17 and has co-authored 31 publications receiving 32,776 citations. His previous affiliations include Siemens and the French Institute for Research in Computer Science and Automation (INRIA).

Papers
Patent
13 Jun 2011
TL;DR: This patent presents a method and system for detecting user interface gestures: obtaining an image from an imaging unit, identifying an object search area of the image, detecting at least a first gesture object in the search area at a first instance, detecting at least a second gesture object at a later instance, and determining an input gesture from the occurrence of the two gesture objects.
Abstract: A method and system for detecting user interface gestures that includes obtaining an image from an imaging unit; identifying an object search area of the images; detecting at least a first gesture object in the search area of an image of a first instance; detecting at least a second gesture object in the search area of an image of at least a second instance; and determining an input gesture from an occurrence of the first gesture object and the at least second gesture object.
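
As a rough illustration of the two-stage detection the claims describe, a minimal sketch follows; the detector callable, the gesture labels ("open_palm", "closed_fist"), and the inferred gesture are hypothetical placeholders, not terms from the patent.

```python
# Minimal sketch of a two-stage gesture pipeline: a first gesture object in
# one frame followed by a second gesture object in a later frame yields an
# input gesture. All names here are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str      # e.g. "open_palm", "closed_fist" (hypothetical labels)
    box: tuple      # (x, y, w, h) within the object search area

def detect_gesture(frames, detect_objects, search_area):
    """Return an input gesture if a first gesture object is followed by a
    second gesture object in a later frame, else None."""
    first = None
    for frame in frames:
        detections = detect_objects(frame, search_area)
        if first is None:
            first = next((d for d in detections if d.label == "open_palm"), None)
        else:
            second = next((d for d in detections if d.label == "closed_fist"), None)
            if second:
                return "grab"   # gesture inferred from the first/second pair
    return None
```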

75 citations

Patent
12 Mar 2013
TL;DR: This patent presents a system and method that detects an application change within a multi-application operating framework, updates an application hierarchy model for gesture-to-action responses with the detected change, detects a gesture, maps the detected gesture to an action of an application according to the hierarchy model, and triggers that action.
Abstract: A system and method that includes detecting an application change within a multi-application operating framework; updating an application hierarchy model for gesture-to-action responses with the detected application change; detecting a gesture; according to the hierarchy model, mapping the detected gesture to an action of an application; and triggering the action.
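
A minimal sketch of how such a hierarchy model might route gestures, assuming a simple priority-ordered list of applications; the class, method names, and routing policy are illustrative, not taken from the patent.

```python
# Sketch: an application hierarchy model for gesture-to-action mapping.
# The front of the list stands in for the foreground application.
class GestureRouter:
    def __init__(self):
        self.hierarchy = []

    def on_app_change(self, app):
        """Update the hierarchy model when the active application changes."""
        if app in self.hierarchy:
            self.hierarchy.remove(app)
        self.hierarchy.insert(0, app)

    def dispatch(self, gesture, bindings):
        """Map a detected gesture to the first app in the hierarchy that
        binds it, and return the bound action to trigger."""
        for app in self.hierarchy:
            action = bindings.get(app, {}).get(gesture)
            if action:
                return app, action
        return None

router = GestureRouter()
router.on_app_change("browser")
router.on_app_change("media_player")
bindings = {"media_player": {"swipe_right": "next_track"},
            "browser": {"swipe_right": "history_forward"}}
print(router.dispatch("swipe_right", bindings))  # ('media_player', 'next_track')
```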

53 citations

Proceedings ArticleDOI
05 Dec 2002
TL;DR: This work proposes an integrated framework, based on a mixed feature-based and direct approach, for constructing what the authors call motion panoramas; it is evaluated on sports videos in which a moving, deforming athlete is visible in every frame of the sequence, making the alignment task tricky.
Abstract: We address the problem of constructing mosaics from video sequences taken by rotating cameras. In particular we investigate the widespread case where the scene is not only static but may also contain large dynamic areas, induced by moving or deforming objects. Most of the existing techniques fail to produce reliable results on such video sequences. For such alignment purposes, two classes of techniques may be used: feature-based and direct methods. We derive both of them in a unified statistical manner and propose an integrated framework to construct what we call motion panoramas, based on a mixed feature-based and direct approach. Experimental results are provided on large image sequences. In particular we consider sport videos where the moving and deforming athlete is visible in every frame of the sequence, thereby making the alignment task tricky.
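
The feature-based half of such an alignment can be sketched with OpenCV as below; this is a generic ORB-plus-RANSAC homography estimate between two frames of a rotating camera, not the paper's own statistical estimator or its direct-method refinement.

```python
# Sketch: feature-based alignment of two frames via a robustly estimated
# homography, the building block of a rotating-camera mosaic.
import cv2
import numpy as np

def align_pair(img1, img2):
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC discards matches on moving/deforming foreground (the athlete),
    # which would otherwise bias the background homography.
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H
```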

35 citations

Patent
Salih Burak Gokturk, Danny Yang, Navneet Dalal, Munjal Shah, Marissa Goodman
14 Jul 2009
TL;DR: In this patent, highly relevant supplemental content may be determined from criteria that are generated from, or associated with, the image content (and text and/or metadata) of seed supplemental content.
Abstract: Supplemental content, such as advertisement media and promotional content, may be presented as seeds from which additional, highly relevant supplemental content may be provided to the user. The highly relevant supplemental content may be determined from criteria that are generated from, or associated with, image content (and text and/or metadata) of the seed supplemental content.
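
A minimal sketch of the seed-driven expansion, assuming each piece of supplemental content is reduced to a feature vector derived from its image content and metadata; the vector representation and cosine scoring are assumptions, not the patent's method.

```python
# Sketch: rank candidate supplemental content by similarity to the criteria
# vector of the seed content. Feature extraction is assumed to happen upstream.
import numpy as np

def expand_from_seed(seed_vec, candidates, top_k=3):
    """Return the top_k candidate names most similar to the seed criteria."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = sorted(candidates.items(),
                    key=lambda kv: cos(seed_vec, kv[1]), reverse=True)
    return [name for name, _ in scored[:top_k]]
```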

30 citations

Patent
17 Dec 2010
TL;DR: In this paper, a system and method for recommending clothing or apparel to a user is presented, where a user's activity is detected in order to identify a set of items that are of interest to the user.
Abstract: A system and method for recommending clothing or apparel to a user. Activity of a user is detected in order to identify a set of items that are of interest to the user. One or more recommendation parameters may be determined for the user based at least in part on the individual items of clothing/apparel that are of interest to the user. Clothing/apparel content is selected for display to the user based on the recommendation parameters.
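
As a hedged illustration, recommendation parameters could be derived from viewed items roughly as follows; the item attributes and the aggregation rule are assumptions, not the patent's actual method.

```python
# Sketch: aggregate attributes of items a user has shown interest in into
# simple recommendation parameters. Attributes here are illustrative.
from collections import Counter

def recommendation_parameters(items_of_interest):
    styles = Counter(item["style"] for item in items_of_interest)
    prices = [item["price"] for item in items_of_interest]
    return {
        "preferred_style": styles.most_common(1)[0][0],
        "price_range": (min(prices), max(prices)),
    }

viewed = [{"style": "casual", "price": 40}, {"style": "casual", "price": 60}]
print(recommendation_parameters(viewed))
# {'preferred_style': 'casual', 'price_range': (40, 60)}
```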

28 citations


Cited by
Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception, as presented in this paper, is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22-layer-deep network, the quality of which is assessed in the context of classification and detection.
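
A sketch of a single Inception module as the abstract describes it: parallel 1x1, 3x3, and 5x5 convolution branches plus a pooling branch, with 1x1 reductions, concatenated along the channel axis. Written in PyTorch for illustration; the branch widths roughly match the paper's inception (3a) block but are otherwise illustrative.

```python
# Sketch: one Inception module (parallel multi-scale branches, concatenated).
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, 1)                       # 1x1 branch
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 96, 1),        # 1x1 reduce
                                nn.Conv2d(96, 128, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),        # 1x1 reduce
                                nn.Conv2d(16, 32, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1))        # pool proj
    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
print(InceptionModule(192)(x).shape)  # torch.Size([1, 256, 28, 28])
```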

40,257 citations

Book ChapterDOI
06 Sep 2014
TL;DR: This paper presents a new dataset aimed at advancing the state of the art in object recognition by placing object recognition in the context of the broader question of scene understanding, achieved by gathering images of complex everyday scenes containing common objects in their natural context.
Abstract: We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 object types that would be easily recognizable by a 4-year-old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.
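
The per-instance annotations can be read with the dataset's pycocotools API; a short sketch follows (the annotation file path is a placeholder).

```python
# Sketch: load COCO instance annotations and iterate per-instance labels.
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val.json")   # placeholder path
cat_ids = coco.getCatIds(catNms=["person"])
img_ids = coco.getImgIds(catIds=cat_ids)
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[:1], catIds=cat_ids))
for ann in anns:
    # each instance carries a bounding box, area, and a segmentation mask
    print(ann["bbox"], ann["area"])
```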

30,462 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
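
A sketch of decoding that single-network output, following the paper's S x S grid layout with B boxes of (x, y, w, h, confidence) per cell plus C shared class probabilities; non-maximum suppression is omitted here for brevity.

```python
# Sketch: decode a YOLO-style prediction tensor into scored boxes.
import numpy as np

S, B, C = 7, 2, 20                       # grid size, boxes/cell, classes
pred = np.random.rand(S, S, B * 5 + C)   # stand-in for network output

def decode(pred, threshold=0.5):
    boxes = []
    for i in range(S):
        for j in range(S):
            cls = int(np.argmax(pred[i, j, B * 5:]))
            for b in range(B):
                x, y, w, h, conf = pred[i, j, b * 5:b * 5 + 5]
                # class-specific confidence = box confidence * class prob
                score = conf * pred[i, j, B * 5 + cls]
                if score > threshold:
                    # (x, y) are offsets within cell (j, i); w, h are
                    # relative to the whole image
                    boxes.append(((j + x) / S, (i + y) / S, w, h, cls, score))
    return boxes

print(len(decode(pred)))
```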

27,256 citations

Proceedings ArticleDOI
23 Jun 2014
TL;DR: R-CNN, as presented in this paper, combines CNNs with bottom-up region proposals to localize and segment objects; when labeled training data is scarce, supervised pre-training for an auxiliary task followed by domain-specific fine-tuning yields a significant performance boost.
Abstract: Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also present experiments that provide insight into what the network learns, revealing a rich hierarchy of image features. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
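
The pipeline the abstract describes can be outlined as below; the proposal, feature, classifier, and NMS functions are placeholders standing in for selective search, the CNN, per-class SVMs, and suppression, not the released implementation.

```python
# Sketch: the R-CNN stages as a composable pipeline of placeholder functions.
def rcnn_detect(image, propose_regions, cnn_features, svm_scores, nms):
    proposals = propose_regions(image)       # bottom-up region proposals
    detections = []
    for box in proposals:
        crop = image.crop(box)               # warp region to CNN input size
        feat = cnn_features(crop)            # fixed-length CNN feature vector
        for cls, score in svm_scores(feat):  # per-class linear SVMs
            detections.append((box, cls, score))
    return nms(detections)                   # suppress overlapping boxes
```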

21,729 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
Abstract: Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But pyramid representations have been avoided in recent object detectors that are based on deep convolutional networks, partially because they are slow to compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.
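
A sketch of the top-down architecture with lateral connections, written in PyTorch under assumed ResNet-style channel counts: 1x1 convolutions project each backbone map to a common width, coarser maps are upsampled and summed in, and a 3x3 convolution smooths each level.

```python
# Sketch: FPN top-down pathway with lateral connections.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), width=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, width, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(width, width, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):                 # feats ordered fine -> coarse
        lat = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(lat) - 2, -1, -1): # top-down: merge coarse into fine
            lat[i] = lat[i] + F.interpolate(lat[i + 1], scale_factor=2,
                                            mode="nearest")
        return [s(p) for s, p in zip(self.smooth, lat)]

feats = [torch.randn(1, c, 64 // 2**i, 64 // 2**i)
         for i, c in enumerate((256, 512, 1024, 2048))]
print([p.shape for p in FPN()(feats)])
```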

16,727 citations