Author

Shahram Izadi

Other affiliations: PARC, Xerox, Microsoft
Bio: Shahram Izadi is an academic researcher from Google. His research focuses on augmented reality and depth maps. He has an h-index of 82 and has co-authored 304 publications receiving 25,952 citations. Previous affiliations of Shahram Izadi include PARC and Xerox.


Papers
Proceedings ArticleDOI
26 Oct 2011
TL;DR: A system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware, which fuses all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real-time.
Abstract: We present a system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware. We fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real-time. The current sensor pose is simultaneously obtained by tracking the live depth frame relative to the global model using a coarse-to-fine iterative closest point (ICP) algorithm, which uses all of the observed depth data available. We demonstrate the advantages of tracking against the growing full surface model compared with frame-to-frame tracking, obtaining tracking and mapping results in constant time within room sized scenes with limited drift and high accuracy. We also show both qualitative and quantitative results relating to various aspects of our tracking and mapping system. Modelling of natural scenes, in real-time with only commodity sensor and GPU hardware, promises an exciting step forward in augmented reality (AR), in particular, it allows dense surfaces to be reconstructed in real-time, with a level of detail and robustness beyond any solution yet presented using passive computer vision.
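To make the tracking step concrete, here is a minimal NumPy sketch of one linearized point-to-plane ICP update, the kind of solve a coarse-to-fine tracker of this sort iterates. The function name is illustrative, and correspondences are assumed to be given already (the system finds them against the global model rather than frame-to-frame):

```python
import numpy as np

def point_to_plane_icp_step(src_pts, dst_pts, dst_normals):
    """One linearized point-to-plane ICP update (small-angle approximation).

    src_pts:     (N, 3) points from the live depth frame, already matched
    dst_pts:     (N, 3) corresponding points on the global surface model
    dst_normals: (N, 3) surface normals at dst_pts
    Returns a 4x4 rigid transform moving src_pts toward dst_pts.
    """
    # Linearize R ~ I + [r]_x and solve the least-squares system
    # A x = b for x = (rx, ry, rz, tx, ty, tz).
    c = np.cross(src_pts, dst_normals)      # rotational part of the Jacobian
    A = np.hstack([c, dst_normals])         # (N, 6)
    b = np.sum(dst_normals * (dst_pts - src_pts), axis=1)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)

    rx, ry, rz, tx, ty, tz = x
    T = np.eye(4)
    T[:3, :3] = np.array([[1.0, -rz,  ry],
                          [ rz, 1.0, -rx],
                          [-ry,  rx, 1.0]])  # small-angle rotation
    T[:3, 3] = [tx, ty, tz]
    return T
```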

4,184 citations

Proceedings ArticleDOI
16 Oct 2011
TL;DR: Novel extensions to the core GPU pipeline demonstrate object segmentation and user interaction directly in front of the sensor, without degrading camera tracking or reconstruction, enabling real-time multi-touch interactions anywhere.
Abstract: KinectFusion enables a user holding and moving a standard Kinect camera to rapidly create detailed 3D reconstructions of an indoor scene. Only the depth data from Kinect is used to track the 3D pose of the sensor and reconstruct, geometrically precise, 3D models of the physical scene in real-time. The capabilities of KinectFusion, as well as the novel GPU-based pipeline are described in full. Uses of the core system for low-cost handheld scanning, and geometry-aware augmented reality and physics-based interactions are shown. Novel extensions to the core GPU pipeline demonstrate object segmentation and user interaction directly in front of the sensor, without degrading camera tracking or reconstruction. These extensions are used to enable real-time multi-touch interactions anywhere, allowing any planar or non-planar reconstructed physical surface to be appropriated for touch.
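The reconstruction half of such a pipeline folds each registered depth map into the volume as a weighted running average of truncated signed distances, following the volumetric method of Curless and Levoy that this line of work builds on. A minimal CPU sketch, with array names and parameter values chosen for illustration:

```python
import numpy as np

def fuse_depth_into_tsdf(tsdf, weights, voxel_pts, depth, K, cam_pose,
                         trunc=0.03, max_weight=128):
    """Fold one registered depth map into a truncated signed distance
    volume, updating tsdf and weights in place.

    tsdf, weights: flat (V,) arrays of current SDF values and fusion weights
    voxel_pts:     (V, 3) world-space voxel centers
    depth:         (H, W) depth image in meters
    K:             3x3 camera intrinsics; cam_pose: 4x4 world-from-camera
    """
    # Transform voxel centers into the camera frame and project them.
    world_to_cam = np.linalg.inv(cam_pose)
    pts_cam = voxel_pts @ world_to_cam[:3, :3].T + world_to_cam[:3, 3]
    z = pts_cam[:, 2]
    z_safe = np.where(z > 1e-6, z, 1.0)
    uv = pts_cam @ K.T
    u = np.round(uv[:, 0] / z_safe).astype(int)
    v = np.round(uv[:, 1] / z_safe).astype(int)

    H, W = depth.shape
    valid = (z > 1e-6) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d = np.zeros_like(z)
    d[valid] = depth[v[valid], u[valid]]
    valid &= d > 0

    # Signed distance along the ray, truncated to [-trunc, trunc];
    # voxels far behind the observed surface are left untouched.
    sdf = np.clip(d - z, -trunc, trunc)
    update = valid & (d - z >= -trunc)

    # Weighted running average of the signed distances.
    w_new = weights[update] + 1.0
    tsdf[update] = (tsdf[update] * weights[update] + sdf[update]) / w_new
    weights[update] = np.minimum(w_new, max_weight)
```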

2,373 citations

Journal ArticleDOI
01 Nov 2013
TL;DR: An online system for large- and fine-scale volumetric reconstruction based on a memory- and speed-efficient data structure that compresses space and allows real-time access and updates of implicit surface data, without the need for a regular or hierarchical grid data structure.
Abstract: Online 3D reconstruction is gaining newfound interest due to the availability of real-time consumer depth cameras. The basic problem takes live overlapping depth maps as input and incrementally fuses these into a single 3D model. This is challenging particularly when real-time performance is desired without trading quality or scale. We contribute an online system for large and fine scale volumetric reconstruction based on a memory and speed efficient data structure. Our system uses a simple spatial hashing scheme that compresses space, and allows for real-time access and updates of implicit surface data, without the need for a regular or hierarchical grid data structure. Surface data is only stored densely where measurements are observed. Additionally, data can be streamed efficiently in or out of the hash table, allowing for further scalability during sensor motion. We show interactive reconstructions of a variety of scenes, reconstructing both fine-grained details and large scale environments. We illustrate how all parts of our pipeline from depth map pre-processing, camera pose estimation, depth map fusion, and surface rendering are performed at real-time rates on commodity graphics hardware. We conclude with a comparison to current state-of-the-art online systems, illustrating improved performance and reconstruction quality.
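The spatial hashing idea can be sketched in a few lines: integer voxel-block coordinates are hashed with three large primes (the hash of Teschner et al., commonly used in this line of work), and only blocks touched by measurements are ever allocated. The bucket count, voxel size, and block size below are illustrative assumptions, not values prescribed by the abstract:

```python
from collections import defaultdict
import numpy as np

P1, P2, P3 = 73856093, 19349669, 83492791   # primes from Teschner et al. 2003
NUM_BUCKETS = 2 ** 21
VOXEL_SIZE, BLOCK_DIM = 0.004, 8            # illustrative values

def block_hash(bx, by, bz):
    """Hash integer voxel-block coordinates into a bucket index."""
    return ((bx * P1) ^ (by * P2) ^ (bz * P3)) % NUM_BUCKETS

# Each bucket chains (block_coord, dense TSDF chunk) pairs, so hash
# collisions are resolved and only observed blocks get allocated.
buckets = defaultdict(list)

def get_or_allocate_block(p):
    """Return the BLOCK_DIM^3 TSDF chunk covering world-space point p."""
    coord = tuple(int(c) for c in
                  np.floor(np.asarray(p) / (VOXEL_SIZE * BLOCK_DIM)))
    bucket = buckets[block_hash(*coord)]
    for stored_coord, chunk in bucket:
        if stored_coord == coord:
            return chunk
    chunk = np.full((BLOCK_DIM,) * 3, np.nan, dtype=np.float32)  # unobserved
    bucket.append((coord, chunk))
    return chunk
```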

940 citations

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work addresses the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image, and employs a regression forest capable of inferring an estimate of each pixel's correspondence to 3D points in the scene's world coordinate frame.
Abstract: We address the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image. Our approach employs a regression forest that is capable of inferring an estimate of each pixel's correspondence to 3D points in the scene's world coordinate frame. The forest uses only simple depth and RGB pixel comparison features, and does not require the computation of feature descriptors. The forest is trained to be capable of predicting correspondences at any pixel, so no interest point detectors are required. The camera pose is inferred using a robust optimization scheme. This starts with an initial set of hypothesized camera poses, constructed by applying the forest at a small fraction of image pixels. Preemptive RANSAC then iterates sampling more pixels at which to evaluate the forest, counting inliers, and refining the hypothesized poses. We evaluate on several varied scenes captured with an RGB-D camera and observe that the proposed technique achieves highly accurate relocalization and substantially out-performs two state of the art baselines.
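A pose hypothesis in this kind of scheme is a rigid transform computed from a handful of forest-predicted 3D-3D correspondences (camera-space points from depth, scene coordinates from the forest), then scored by its inlier count. The forest itself is omitted here; a minimal sketch using the standard Kabsch/SVD solution, with the inlier threshold as an illustrative assumption:

```python
import numpy as np

def kabsch_pose(cam_pts, scene_pts):
    """Rigid (R, t) mapping camera-space points onto predicted scene
    coordinates, via SVD (the Kabsch / Procrustes solution).

    cam_pts, scene_pts: (N, 3) with N >= 3; a minimal RANSAC hypothesis
    would use N = 3 correspondences sampled from the forest's predictions.
    """
    mu_c, mu_s = cam_pts.mean(axis=0), scene_pts.mean(axis=0)
    H = (cam_pts - mu_c).T @ (scene_pts - mu_s)
    U, _, Vt = np.linalg.svd(H)
    # Correct the sign to avoid reflections.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_s - R @ mu_c
    return R, t

def count_inliers(R, t, cam_pts, scene_pts, thresh=0.05):
    """Score a pose hypothesis by correspondences within thresh meters."""
    residual = np.linalg.norm(cam_pts @ R.T + t - scene_pts, axis=1)
    return int(np.sum(residual < thresh))
```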

796 citations

Book ChapterDOI
17 Sep 2006
TL;DR: The results of this initial evaluation of the SenseCam are extremely promising; periodic review of images of events recorded by SenseCam results in significant recall of those events by the patient, which was previously impossible.
Abstract: This paper presents a novel ubiquitous computing device, the SenseCam, a sensor-augmented wearable stills camera. SenseCam is designed to capture a digital record of the wearer's day by recording a series of images and capturing a log of sensor data. We believe that reviewing this information will help the wearer recollect aspects of earlier experiences that have subsequently been forgotten, and thereby form a powerful retrospective memory aid. In this paper we review existing work on memory aids and conclude that there is scope for an improved device. We then report on the design of SenseCam in some detail for the first time. We explain the details of a first in-depth user study of this device, a 12-month clinical trial with a patient suffering from amnesia. The results of this initial evaluation are extremely promising; periodic review of images of events recorded by SenseCam results in significant recall of those events by the patient, which was previously impossible. We end the paper with a discussion of future work, including the application of SenseCam to a wider audience, such as those with neurodegenerative conditions like Alzheimer's disease.

753 citations


Cited by
Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
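The building block behind GoogLeNet is the Inception module: parallel 1x1, 3x3, and 5x5 convolutions plus a pooled branch, concatenated channel-wise, with 1x1 bottlenecks keeping the computational budget in check. A minimal PyTorch sketch (activations omitted for brevity; the channel counts are illustrative, not the published configuration):

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """One Inception block in the spirit of GoogLeNet."""

    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.b3 = nn.Sequential(                       # 1x1 bottleneck, then 3x3
            nn.Conv2d(in_ch, c3_red, kernel_size=1),
            nn.Conv2d(c3_red, c3, kernel_size=3, padding=1),
        )
        self.b5 = nn.Sequential(                       # 1x1 bottleneck, then 5x5
            nn.Conv2d(in_ch, c5_red, kernel_size=1),
            nn.Conv2d(c5_red, c5, kernel_size=5, padding=2),
        )
        self.bp = nn.Sequential(                       # pooled branch
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1),
        )

    def forward(self, x):
        # All branches preserve spatial size, so outputs concatenate cleanly.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

# e.g. a 192-channel feature map in, 64 + 128 + 32 + 32 = 256 channels out
block = InceptionModule(192, 64, 96, 128, 16, 32, 32)
out = block(torch.randn(1, 192, 28, 28))   # -> (1, 256, 28, 28)
```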

40,257 citations

Proceedings ArticleDOI
01 Jun 2016
TL;DR: This work introduces Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling, and exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity.
Abstract: Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no current dataset adequately captures the complexity of real-world urban scenes. To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. Cityscapes comprises a large, diverse set of stereo video sequences recorded in the streets of 50 different cities. 5,000 of these images have high-quality pixel-level annotations; 20,000 additional images have coarse annotations to enable methods that leverage large volumes of weakly labeled data. Crucially, our effort exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity. Our accompanying empirical study provides an in-depth analysis of the dataset characteristics, as well as a performance evaluation of several state-of-the-art approaches based on our benchmark.
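Benchmarks of this kind are typically scored by per-class intersection-over-union over dense pixel labels. A minimal sketch of that metric (the ignore-label value is an assumption, not taken from the abstract):

```python
import numpy as np

def per_class_iou(pred, gt, num_classes, ignore_label=255):
    """Per-class intersection-over-union from dense label maps, the usual
    pixel-level metric for semantic labeling benchmarks.

    pred, gt: integer label arrays of identical shape.
    """
    mask = gt != ignore_label
    # Confusion matrix via bincount over combined (gt, pred) indices:
    # conf[i, j] counts pixels with ground truth i predicted as j.
    combined = num_classes * gt[mask].astype(np.int64) + pred[mask]
    conf = np.bincount(combined, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    return inter / np.maximum(union, 1)   # averaging this gives "mean IoU"
```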

7,547 citations

Journal ArticleDOI
TL;DR: As an example of how the current "war on terrorism" could generate a durable civic renewal, Putnam points to the burst in civic practices that occurred during and after World War II, which he says "permanently marked" the generation that lived through it and had a "terrific effect on American public life over the last half-century."
Abstract: The present historical moment may seem a particularly inopportune time to review Bowling Alone, Robert Putnam's latest exploration of civic decline in America. After all, the outpouring of volunteerism, solidarity, patriotism, and self-sacrifice displayed by Americans in the wake of the September 11 terrorist attacks appears to fly in the face of Putnam's central argument: that "social capital", defined as "social networks and the norms of reciprocity and trustworthiness that arise from them" (p. 19), has declined to dangerously low levels in America over the last three decades. However, Putnam is not fazed in the least by the recent effusion of solidarity. Quite the contrary, he sees in it the potential to "reverse what has been a 30- to 40-year steady decline in most measures of connectedness or community." As an example of how the current "war on terrorism" could generate a durable civic renewal, Putnam points to the burst in civic practices that occurred during and after World War II, which he says "permanently marked" the generation that lived through it and had a "terrific effect on American public life over the last half-century." If Americans can follow this example and channel their current civic …

5,309 citations
