Author
Szymon Rusinkiewicz
Other affiliations: Massachusetts Institute of Technology, IEEE Computer Society, Stanford University ...read more
Bio: Szymon Rusinkiewicz is an academic researcher from Princeton University. The author has contributed to research in topics: Rendering (computer graphics) & Computer science. The author has an hindex of 57, co-authored 137 publications receiving 18760 citations. Previous affiliations of Szymon Rusinkiewicz include Massachusetts Institute of Technology & IEEE Computer Society.
Papers published on a yearly basis
Papers
More filters
01 May 2001
TL;DR: An implementation is demonstrated that is able to align two range images in a few tens of milliseconds, assuming a good initial guess, and has potential application to real-time 3D model acquisition and model-based tracking.
Abstract: The ICP (Iterative Closest Point) algorithm is widely used for geometric alignment of three-dimensional models when an initial estimate of the relative pose is known. Many variants of ICP have been proposed, affecting all phases of the algorithm from the selection and matching of points to the minimization strategy. We enumerate and classify many of these variants, and evaluate their effect on the speed with which the correct alignment is reached. In order to improve convergence for nearly-flat meshes with small features, such as inscribed surfaces, we introduce a new variant based on uniform sampling of the space of normals. We conclude by proposing a combination of ICP variants optimized for high speed. We demonstrate an implementation that is able to align two range images in a few tens of milliseconds, assuming a good initial guess. This capability has potential application to real-time 3D model acquisition and model-based tracking.
4,059 citations
01 Jul 2000
TL;DR: A hardware and software system for digitizing the shape and color of large fragile objects under non-laboratory conditions and the largest single dataset is of the David - 2 billion polygons and 7,000 color images.
Abstract: We describe a hardware and software system for digitizing the shape and color of large fragile objects under non-laboratory conditions Our system employs laser triangulation rangefinders, laser time-of-flight rangefinders, digital still cameras, and a suite of software for acquiring, aligning, merging, and viewing scanned data As a demonstration of this system, we digitized 10 statues by Michelangelo, including the well-known figure of David, two building interiors, and all 1,163 extant fragments of the Forma Urbis Romae, a giant marble map of ancient Rome Our largest single dataset is of the David - 2 billion polygons and 7,000 color images In this paper, we discuss the challenges we faced in building this system, the solutions we employed, and the lessons we learned We focus in particular on the unusual design of our laser triangulation scanner and on the algorithms and software we developed for handling very large scanned models
1,675 citations
23 Jun 2003
TL;DR: The limitations of canonical alignment are described and an alternate method, based on spherical harmonics, for obtaining rotation invariant representations is discussed, which reduces the dimensionality of the descriptor, providing a more compact representation, which in turn makes comparing two models more efficient.
Abstract: One of the challenges in 3D shape matching arises from the fact that in many applications, models should be considered to be the same if they differ by a rotation. Consequently, when comparing two models, a similarity metric implicitly provides the measure of similarity at the optimal alignment. Explicitly solving for the optimal alignment is usually impractical. So, two general methods have been proposed for addressing this issue: (1) Every model is represented using rotation invariant descriptors. (2) Every model is described by a rotation dependent descriptor that is aligned into a canonical coordinate system defined by the model. In this paper, we describe the limitations of canonical alignment and discuss an alternate method, based on spherical harmonics, for obtaining rotation invariant representations. We describe the properties of this tool and show how it can be applied to a number of existing, orientation dependent descriptors to improve their matching performance. The advantages of this tool are two-fold: First, it improves the matching performance of many descriptors. Second, it reduces the dimensionality of the descriptor, providing a more compact representation, which in turn makes comparing two models more efficient.
1,424 citations
01 Jul 2000
TL;DR: This work describes a system for representing and progressively displaying meshes that combines a multiresolution hierarchy based on bounding spheres with a rendering system based on points, which is suitable for large data sets.
Abstract: Advances in 3D scanning technologies have enabled the practical creation of meshes with hundreds of millions of polygons. Traditional algorithms for display, simplification, and progressive transmission of meshes are impractical for data sets of this size. We describe a system for representing and progressively displaying these meshes that combines a multiresolution hierarchy based on bounding spheres with a rendering system based on points. A single data structure is used for view frustum culling, backface culling, level-of-detail selection, and rendering. The representation is compact and can be computed quickly, making it suitable for large data sets. Our implementation, written for use in a large-scale 3D digitization project, launches quickly, maintains a user-settable interactive frame rate regardless of object complexity or camera position, yields reasonable image quality during motion, and refines progressively when idle to a high final image quality. We have demonstrated the system on scanned models containing hundreds of millions of samples.
1,157 citations
01 Jul 2002
TL;DR: A new 3D model acquisition system that permits the user to rotate an object by hand and see a continuously-updated model as the object is scanned, demonstrating the ability of the prototype to scan objects faster and with greater ease than conventional model acquisition pipelines.
Abstract: The digitization of the 3D shape of real objects is a rapidly expanding field, with applications in entertainment, design, and archaeology. We propose a new 3D model acquisition system that permits the user to rotate an object by hand and see a continuously-updated model as the object is scanned. This tight feedback loop allows the user to find and fill holes in the model in real time, and determine when the object has been completely covered. Our system is based on a 60 Hz. structured-light rangefinder, a real-time variant of ICP (iterative closest points) for alignment, and point-based merging and rendering algorithms. We demonstrate the ability of our prototype to scan objects faster and with greater ease than conventional model acquisition pipelines.
752 citations
Cited by
More filters
01 Jan 2006
TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.
10,141 citations
21 Jul 2017
TL;DR: This paper designs a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input and provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing.
Abstract: Point cloud is an important type of geometric data structure. Due to its irregular format, most researchers transform such data to regular 3D voxel grids or collections of images. This, however, renders data unnecessarily voluminous and causes issues. In this paper, we design a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input. Our network, named PointNet, provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing. Though simple, PointNet is highly efficient and effective. Empirically, it shows strong performance on par or even better than state of the art. Theoretically, we provide analysis towards understanding of what the network has learnt and why the network is robust with respect to input perturbation and corruption.
9,457 citations
01 Jan 2006
TL;DR: This coherent and comprehensive book unifies material from several sources, including robotics, control theory, artificial intelligence, and algorithms, into planning under differential constraints that arise when automating the motions of virtually any mechanical system.
Abstract: Planning algorithms are impacting technical disciplines and industries around the world, including robotics, computer-aided design, manufacturing, computer graphics, aerospace applications, drug design, and protein folding. This coherent and comprehensive book unifies material from several sources, including robotics, control theory, artificial intelligence, and algorithms. The treatment is centered on robot motion planning but integrates material on planning in discrete spaces. A major part of the book is devoted to planning under uncertainty, including decision theory, Markov decision processes, and information spaces, which are the “configuration spaces” of all sensor-based planning problems. The last part of the book delves into planning under differential constraints that arise when automating the motions of virtually any mechanical system. Developed from courses taught by the author, the book is intended for students, engineers, and researchers in robotics, artificial intelligence, and control theory as well as computer graphics, algorithms, and computational biology.
6,340 citations
TL;DR: The guided filter is a novel explicit image filter derived from a local linear model that can be used as an edge-preserving smoothing operator like the popular bilateral filter, but it has better behaviors near edges.
Abstract: In this paper, we propose a novel explicit image filter called guided filter. Derived from a local linear model, the guided filter computes the filtering output by considering the content of a guidance image, which can be the input image itself or another different image. The guided filter can be used as an edge-preserving smoothing operator like the popular bilateral filter [1], but it has better behaviors near edges. The guided filter is also a more generic concept beyond smoothing: It can transfer the structures of the guidance image to the filtering output, enabling new filtering applications like dehazing and guided feathering. Moreover, the guided filter naturally has a fast and nonapproximate linear time algorithm, regardless of the kernel size and the intensity range. Currently, it is one of the fastest edge-preserving filters. Experiments show that the guided filter is both effective and efficient in a great variety of computer vision and computer graphics applications, including edge-aware smoothing, detail enhancement, HDR compression, image matting/feathering, dehazing, joint upsampling, etc.
4,730 citations
07 Jun 2015
TL;DR: This work proposes to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network, and shows that this 3D deep representation enables significant performance improvement over the-state-of-the-arts in a variety of tasks.
Abstract: 3D shape is a crucial but heavily underutilized cue in today's computer vision systems, mostly due to the lack of a good generic shape representation. With the recent availability of inexpensive 2.5D depth sensors (e.g. Microsoft Kinect), it is becoming increasingly important to have a powerful 3D shape representation in the loop. Apart from category recognition, recovering full 3D shapes from view-based 2.5D depth maps is also a critical part of visual understanding. To this end, we propose to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network. Our model, 3D ShapeNets, learns the distribution of complex 3D shapes across different object categories and arbitrary poses from raw CAD data, and discovers hierarchical compositional part representation automatically. It naturally supports joint object recognition and shape completion from 2.5D depth maps, and it enables active object recognition through view planning. To train our 3D deep learning model, we construct ModelNet - a large-scale 3D CAD model dataset. Extensive experiments show that our 3D deep representation enables significant performance improvement over the-state-of-the-arts in a variety of tasks.
4,266 citations