
Showing papers by "Hao Su" published in 2016


Proceedings Article • DOI
Charles R. Qi, Hao Su, Matthias Nießner, Angela Dai, Mengyuan Yan, Leonidas J. Guibas
27 Jun 2016
TL;DR: Two distinct volumetric CNN architectures are introduced, and multi-view CNNs are extended with multi-resolution filtering in 3D; extensive experiments evaluate the underlying design choices.
Abstract: 3D shape models are becoming widely available and easier to capture, making available 3D information crucial for progress in object classification. Current state-of-the-art methods rely on CNNs to address this problem. Recently, we have witnessed two types of CNNs being developed: CNNs based upon volumetric representations versus CNNs based upon multi-view representations. Empirical results from these two types of CNNs exhibit a large gap, indicating that existing volumetric CNN architectures and approaches are unable to fully exploit the power of 3D representations. In this paper, we aim to improve both volumetric CNNs and multi-view CNNs according to extensive analysis of existing approaches. To this end, we introduce two distinct network architectures of volumetric CNNs. In addition, we examine multi-view CNNs, where we introduce multi-resolution filtering in 3D. Overall, we are able to outperform current state-of-the-art methods for both volumetric CNNs and multi-view CNNs. We provide extensive experiments designed to evaluate underlying design choices, thus providing a better understanding of the space of methods available for object classification on 3D data.

1,488 citations
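
One of the volumetric designs explored in this line of work probes all the way through the voxel grid with elongated kernels, collapsing the 3D volume into a 2D feature map that a standard image CNN can classify. Below is a minimal, hedged sketch of that idea in PyTorch; the layer sizes and the `AnisotropicProbe` name are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of "anisotropic probing": kernels of shape (D, 1, 1) aggregate
# information along the z-axis at each (x, y) location, turning the
# occupancy volume into a 2D feature map for an ordinary 2D CNN.
import torch
import torch.nn as nn

class AnisotropicProbe(nn.Module):
    def __init__(self, grid=30, n_classes=40):
        super().__init__()
        self.probe = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(grid, 1, 1)), nn.ReLU(),
        )
        self.cnn2d = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_classes),
        )

    def forward(self, vox):          # vox: (B, 1, D, H, W) occupancy grid
        feat = self.probe(vox)       # -> (B, 32, 1, H, W)
        return self.cnn2d(feat.squeeze(2))

logits = AnisotropicProbe()(torch.rand(2, 1, 30, 30, 30))
```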


Posted Content
TL;DR: The authors address the problem of 3D reconstruction from a single image, generating a straightforward form of output: point cloud coordinates. Because the ground-truth shape for an input image may be ambiguous, they design an architecture, loss function, and learning paradigm that are novel and effective.
Abstract: Generation of 3D data by deep neural network has been attracting increasing attention in the research community. The majority of extant works resort to regular representations such as volumetric grids or collections of images; however, these representations obscure the natural invariance of 3D shapes under geometric transformations and also suffer from a number of other issues. In this paper we address the problem of 3D reconstruction from a single image, generating a straightforward form of output -- point cloud coordinates. Along with this problem arises a unique and interesting issue: the ground-truth shape for an input image may be ambiguous. Driven by this unorthodox output form and the inherent ambiguity in ground truth, we design an architecture, loss function and learning paradigm that are novel and effective. Our final solution is a conditional shape sampler, capable of predicting multiple plausible 3D point clouds from an input image. In experiments, not only can our system outperform state-of-the-art methods on single-image-based 3D reconstruction benchmarks, but it also shows strong performance for 3D shape completion and a promising ability to make multiple plausible predictions.

1,194 citations
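
Training a network to emit point coordinates requires a loss defined directly on unordered point sets. As a hedged illustration, here is the symmetric Chamfer distance, a standard point-set loss for this kind of system, in plain NumPy; the paper's full loss and its conditional sampler are more involved.

```python
# Symmetric Chamfer distance between two point sets: each point is
# matched to its nearest neighbor in the other set, in both directions.
import numpy as np

def chamfer_distance(P, Q):
    """P: (n, 3) predicted points, Q: (m, 3) ground-truth points."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # (n, m)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

pred = np.random.rand(1024, 3)
gt = np.random.rand(1024, 3)
print(chamfer_distance(pred, gt))
```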


Posted Content
TL;DR: PointNet, a novel type of neural network that directly consumes point clouds and respects the permutation invariance of points in the input, provides a unified architecture for applications ranging from object classification and part segmentation to scene semantic parsing.
Abstract: Point cloud is an important type of geometric data structure. Due to its irregular format, most researchers transform such data to regular 3D voxel grids or collections of images. This, however, renders data unnecessarily voluminous and causes issues. In this paper, we design a novel type of neural network that directly consumes point clouds and well respects the permutation invariance of points in the input. Our network, named PointNet, provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing. Though simple, PointNet is highly efficient and effective. Empirically, it shows strong performance on par or even better than state of the art. Theoretically, we provide analysis towards understanding of what the network has learnt and why the network is robust with respect to input perturbation and corruption.

1,156 citations
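
The permutation invariance the abstract emphasizes can be obtained by applying the same MLP to every point and then aggregating with a symmetric function such as max pooling. A minimal sketch of that core idea follows; the input/feature transforms and segmentation branch of the full PointNet are omitted, and layer sizes are illustrative.

```python
# Shared per-point MLP + max pooling: the global feature is identical
# for any reordering of the input points.
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, n_classes=40):
        super().__init__()
        self.mlp = nn.Sequential(      # applied identically to every point
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 1024), nn.ReLU(),
        )
        self.head = nn.Linear(1024, n_classes)

    def forward(self, pts):            # pts: (B, N, 3)
        f = self.mlp(pts)              # (B, N, 1024), order-equivariant
        g = f.max(dim=1).values        # (B, 1024), order-invariant
        return self.head(g)

net = TinyPointNet()
x = torch.rand(4, 2048, 3)
perm = torch.randperm(2048)
assert torch.allclose(net(x), net(x[:, perm]), atol=1e-5)
```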


Journal Article • DOI
11 Nov 2016
TL;DR: This work proposes a novel active learning method capable of enriching massive geometric datasets with accurate semantic region annotations, and demonstrates that incorporating verification of all produced labelings within this unified objective improves both accuracy and efficiency of the active learning procedure.
Abstract: Large repositories of 3D shapes provide valuable input for data-driven analysis and modeling tools. They are especially powerful once annotated with semantic information such as salient regions and functional parts. We propose a novel active learning method capable of enriching massive geometric datasets with accurate semantic region annotations. Given a shape collection and a user-specified region label our goal is to correctly demarcate the corresponding regions with minimal manual work. Our active framework achieves this goal by cycling between manually annotating the regions, automatically propagating these annotations across the rest of the shapes, manually verifying both human and automatic annotations, and learning from the verification results to improve the automatic propagation algorithm. We use a unified utility function that explicitly models the time cost of human input across all steps of our method. This allows us to jointly optimize for the set of models to annotate and for the set of models to verify based on the predicted impact of these actions on the human efficiency. We demonstrate that incorporating verification of all produced labelings within this unified objective improves both accuracy and efficiency of the active learning procedure. We automatically propagate human labels across a dynamic shape network using a conditional random field (CRF) framework, taking advantage of global shape-to-shape similarities, local feature similarities, and point-to-point correspondences. By combining these diverse cues we achieve higher accuracy than existing alternatives. We validate our framework on existing benchmarks demonstrating it to be significantly more efficient at using human input compared to previous techniques. We further validate its efficiency and robustness by annotating a massive shape dataset, labeling over 93,000 shape parts, across multiple model classes, and providing a labeled part collection more than one order of magnitude larger than existing ones.

959 citations
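
The unified utility function described above trades predicted benefit against human time cost when choosing the next action. A hedged toy sketch of that selection rule, with placeholder gain/cost estimators standing in for the paper's CRF-driven predictions:

```python
# Pick the next human action (annotate vs. verify a shape) by predicted
# accuracy gain per second of human effort. Gain and cost functions here
# are illustrative stand-ins.
def next_action(candidates, predicted_gain, predicted_cost):
    """candidates: iterable of (action, shape_id) pairs."""
    return max(candidates,
               key=lambda c: predicted_gain(*c) / predicted_cost(*c))

actions = [("annotate", 7), ("verify", 3), ("verify", 12)]
gain = lambda a, s: {"annotate": 1.0, "verify": 0.4}[a]
cost = lambda a, s: {"annotate": 30.0, "verify": 5.0}[a]   # seconds
print(next_action(actions, gain, cost))   # -> ('verify', 3)
```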


Book Chapter • DOI
08 Oct 2016
TL;DR: ObjectNet3D, a large-scale database for 3D object recognition consisting of 100 categories, 90,127 images, 201,888 objects in these images, and 44,147 3D shapes, is contributed; it is useful for recognizing the 3D pose and 3D shape of objects from 2D images.
Abstract: We contribute a large scale database for 3D object recognition, named ObjectNet3D, that consists of 100 categories, 90,127 images, 201,888 objects in these images and 44,147 3D shapes. Objects in the 2D images in our database are aligned with the 3D shapes, and the alignment provides both accurate 3D pose annotation and the closest 3D shape annotation for each 2D object. Consequently, our database is useful for recognizing the 3D pose and 3D shape of objects from 2D images. We also provide baseline experiments on four tasks: region proposal generation, 2D object detection, joint 2D detection and 3D object pose estimation, and image-based 3D shape retrieval, which can serve as baselines for future research using our database. Our database is available online at http://cvgl.stanford.edu/projects/objectnet3d.

309 citations


Proceedings Article • DOI
01 Oct 2016
TL;DR: A fully automatic, scalable approach is presented that samples the human pose space to guide the synthesis procedure and extracts clothing textures from real images, addressing the scarcity of annotated training data for 3D pose estimation, where 3D poses are much harder to annotate than 2D ones.
Abstract: Human 3D pose estimation from a single image is a challenging task with numerous applications. Convolutional Neural Networks (CNNs) have recently achieved superior performance on the task of 2D pose estimation from a single image, by training on images with 2D annotations collected by crowd sourcing. This suggests that similar success could be achieved for direct estimation of 3D poses. However, 3D poses are much harder to annotate, and the lack of suitable annotated training images hinders attempts towards end-to-end solutions. To address this issue, we opt to automatically synthesize training images with ground truth pose annotations. Our work is a systematic study along this road. We find that pose space coverage and texture diversity are the key ingredients for the effectiveness of synthetic training data. We present a fully automatic, scalable approach that samples the human pose space for guiding the synthesis procedure and extracts clothing textures from real images. Furthermore, we explore domain adaptation for bridging the gap between our synthetic training images and real testing photos. We demonstrate that CNNs trained with our synthetic images out-perform those trained with real photos on 3D pose estimation tasks.

240 citations


Posted Content
TL;DR: This work represents 3D spaces as volumetric fields and proposes a novel design that employs field probing filters to efficiently extract features from them, showing that field probing is significantly more efficient than 3D CNNs while providing state-of-the-art performance on classification tasks for 3D object recognition benchmark datasets.
Abstract: Building discriminative representations for 3D data has been an important task in computer graphics and computer vision research. Convolutional Neural Networks (CNNs) have been shown to operate on 2D images with great success for a variety of tasks. Lifting convolution operators to 3D (3D CNNs) seems like a plausible and promising next step. Unfortunately, the computational complexity of 3D CNNs grows cubically with respect to voxel resolution. Moreover, since most 3D geometry representations are boundary based, occupied regions do not increase proportionately with the size of the discretization, resulting in wasted computation. In this work, we represent 3D spaces as volumetric fields, and propose a novel design that employs field probing filters to efficiently extract features from them. Each field probing filter is a set of probing points --- sensors that perceive the space. Our learning algorithm optimizes not only the weights associated with the probing points, but also their locations, which deforms the shape of the probing filters and adaptively distributes them in 3D space. The optimized probing points sense the 3D space "intelligently", rather than operating blindly over the entire domain. We show that field probing is significantly more efficient than 3D CNNs, while providing state-of-the-art performance, on classification tasks for 3D object recognition benchmark datasets.

233 citations
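
The probing points are trainable in both weight and location, which in a modern framework can be expressed by sampling the volumetric field at learnable coordinates via differentiable trilinear interpolation. A minimal sketch under those assumptions; the sizes and the single-linear-head readout are illustrative, not the paper's design.

```python
# Learnable probing points sample a volumetric field; gradients flow to
# both the per-point weights and the point locations themselves.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FieldProbing(nn.Module):
    def __init__(self, n_points=512, n_classes=40):
        super().__init__()
        self.loc = nn.Parameter(torch.rand(n_points, 3) * 2 - 1)  # in [-1, 1]^3
        self.w = nn.Parameter(torch.randn(n_points))
        self.head = nn.Linear(n_points, n_classes)

    def forward(self, field):                 # field: (B, 1, D, H, W)
        B = field.shape[0]
        grid = self.loc.view(1, -1, 1, 1, 3).expand(B, -1, -1, -1, -1)
        # grid_sample trilinearly interpolates the field at each point
        s = F.grid_sample(field, grid, align_corners=True)  # (B, 1, N, 1, 1)
        return self.head(s.view(B, -1) * self.w)

logits = FieldProbing()(torch.rand(2, 1, 32, 32, 32))
```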


Posted Content
TL;DR: Two distinct volumetric CNN architectures are introduced and multi-view CNNs are improved with multi-resolution filtering in 3D, outperforming current state-of-the-art methods of both kinds.
Abstract: 3D shape models are becoming widely available and easier to capture, making available 3D information crucial for progress in object classification. Current state-of-the-art methods rely on CNNs to address this problem. Recently, we have witnessed two types of CNNs being developed: CNNs based upon volumetric representations versus CNNs based upon multi-view representations. Empirical results from these two types of CNNs exhibit a large gap, indicating that existing volumetric CNN architectures and approaches are unable to fully exploit the power of 3D representations. In this paper, we aim to improve both volumetric CNNs and multi-view CNNs according to extensive analysis of existing approaches. To this end, we introduce two distinct network architectures of volumetric CNNs. In addition, we examine multi-view CNNs, where we introduce multi-resolution filtering in 3D. Overall, we are able to outperform current state-of-the-art methods for both volumetric CNNs and multi-view CNNs. We provide extensive experiments designed to evaluate underlying design choices, thus providing a better understanding of the space of methods available for object classification on 3D data.

219 citations


Proceedings Article
20 May 2016
TL;DR: In this article, the authors represent 3D spaces as volumetric fields, and propose a novel design that employs field probing filters to efficiently extract features from them, where each field probing filter is a set of probing points.
Abstract: Building discriminative representations for 3D data has been an important task in computer graphics and computer vision research. Convolutional Neural Networks (CNNs) have been shown to operate on 2D images with great success for a variety of tasks. Lifting convolution operators to 3D (3D CNNs) seems like a plausible and promising next step. Unfortunately, the computational complexity of 3D CNNs grows cubically with respect to voxel resolution. Moreover, since most 3D geometry representations are boundary based, occupied regions do not increase proportionately with the size of the discretization, resulting in wasted computation. In this work, we represent 3D spaces as volumetric fields, and propose a novel design that employs field probing filters to efficiently extract features from them. Each field probing filter is a set of probing points -- sensors that perceive the space. Our learning algorithm optimizes not only the weights associated with the probing points, but also their locations, which deforms the shape of the probing filters and adaptively distributes them in 3D space. The optimized probing points sense the 3D space "intelligently", rather than operating blindly over the entire domain. We show that field probing is significantly more efficient than 3D CNNs, while providing state-of-the-art performance, on classification tasks for 3D object recognition benchmark datasets.

129 citations


Posted Content
TL;DR: A learning framework for abstracting complex shapes by learning to assemble objects using 3D volumetric primitives that allows predicting shape representations which can be leveraged for obtaining a consistent parsing across the instances of a shape collection and constructing an interpretable shape similarity measure.
Abstract: We present a learning framework for abstracting complex shapes by learning to assemble objects using 3D volumetric primitives. In addition to generating simple and geometrically interpretable explanations of 3D objects, our framework also allows us to automatically discover and exploit consistent structure in the data. We demonstrate that using our method allows predicting shape representations which can be leveraged for obtaining a consistent parsing across the instances of a shape collection and constructing an interpretable shape similarity measure. We also examine applications for image-based prediction as well as shape manipulation.

83 citations
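
Abstracting a shape as a small set of volumetric primitives amounts to regressing, per primitive, a size, rotation, and translation, and scoring how well points on the predicted primitives cover the target shape. A hedged sketch of such a parameterization with a plain Chamfer-style coverage term standing in for the paper's actual losses (rotation omitted for brevity):

```python
# Sample points on axis-aligned cuboid surfaces and measure how well
# they cover a target point cloud. Illustrative stand-in, not the
# paper's training objective.
import torch

def cuboid_surface_points(size, trans, n=64):
    """size, trans: (K, 3). Returns (K*n, 3) surface samples."""
    u = torch.rand(size.shape[0], n, 3) * 2 - 1           # in [-1, 1]^3
    # push each sample onto a random face so it lies on the surface
    ax = torch.randint(0, 3, (size.shape[0], n))
    u.scatter_(2, ax.unsqueeze(-1),
               torch.sign(torch.rand(size.shape[0], n, 1) - 0.5))
    return (u * size[:, None] + trans[:, None]).reshape(-1, 3)

def coverage_loss(primitive_pts, shape_pts):
    d = torch.cdist(shape_pts, primitive_pts)             # (M, K*n)
    return d.min(dim=1).values.mean()                     # shape -> primitives

pts = cuboid_surface_points(torch.rand(4, 3), torch.randn(4, 3))
print(coverage_loss(pts, torch.randn(500, 3)))
```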


Journal Article • DOI
TL;DR: A unique synthetic methodology to prepare a library of giant molecules with multiple, precisely arranged nano building blocks and the influence of minute structural differences on their self-assembly behaviors is introduced.
Abstract: Herein we introduce a unique synthetic methodology to prepare a library of giant molecules with multiple, precisely arranged nano building blocks, and illustrate the influence of minute structural differences on their self-assembly behaviors. The T8 polyhedral oligomeric silsesquioxane (POSS) nanoparticles are orthogonally functionalized and sequentially attached onto the end of a hydrophobic polymer chain in either linear or branched configuration. The heterogeneity of primary chemical structure in terms of composition, surface functionality, sequence, and topology can be precisely controlled and is reflected in the self-assembled supramolecular structures of these giant molecules in the condensed state. This strategy offers promising opportunities to manipulate the hierarchical heterogeneities of giant molecules via precise and modular assemblies of various nano building blocks.

Posted Content
TL;DR: It is shown that pose space coverage and texture diversity are the key ingredients for the effectiveness of synthetic training data, and that CNNs trained with the authors' synthetic images out-perform those trained with real photos on 3D pose estimation tasks.
Abstract: Human 3D pose estimation from a single image is a challenging task with numerous applications. Convolutional Neural Networks (CNNs) have recently achieved superior performance on the task of 2D pose estimation from a single image, by training on images with 2D annotations collected by crowd sourcing. This suggests that similar success could be achieved for direct estimation of 3D poses. However, 3D poses are much harder to annotate, and the lack of suitable annotated training images hinders attempts towards end-to-end solutions. To address this issue, we opt to automatically synthesize training images with ground truth pose annotations. Our work is a systematic study along this road. We find that pose space coverage and texture diversity are the key ingredients for the effectiveness of synthetic training data. We present a fully automatic, scalable approach that samples the human pose space for guiding the synthesis procedure and extracts clothing textures from real images. Furthermore, we explore domain adaptation for bridging the gap between our synthetic training images and real testing photos. We demonstrate that CNNs trained with our synthetic images out-perform those trained with real photos on 3D pose estimation tasks.

Proceedings Article • DOI
08 May 2016
TL;DR: This track aims to provide a benchmark to evaluate large-scale shape retrieval based on the ShapeNet dataset, using ShapeNet Core55, which provides more than 50 thousand models over 55 common categories in total for training and evaluating several algorithms.
Abstract: With the advent of commodity 3D capturing devices and better 3D modeling tools, 3D shape content is becoming increasingly prevalent. Therefore, the need for shape retrieval algorithms to handle large-scale shape repositories is increasingly important. This track aims to provide a benchmark to evaluate large-scale shape retrieval based on the ShapeNet dataset. We use ShapeNet Core55, which provides more than 50 thousand models over 55 common categories in total for training and evaluating several algorithms. Five participating teams have submitted a variety of retrieval methods which were evaluated on several standard information retrieval performance metrics. We find the submitted methods work reasonably well on the track benchmark, but we also see significant space for improvement by future algorithms. We release all the data, results, and evaluation code for the benefit of the community.
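
For a sense of the standard information retrieval metrics such a track computes, here is a minimal average-precision routine over a ranked list of retrieved models; the benchmark's actual evaluation also covers NDCG and other measures.

```python
# Average precision over a ranked retrieval list: precision is sampled
# at every rank where a relevant (same-category) model appears.
def average_precision(ranked_relevant):
    """ranked_relevant: list of bools, True where the retrieved model
    shares the query's category, in rank order."""
    hits, score = 0, 0.0
    for i, rel in enumerate(ranked_relevant, 1):
        if rel:
            hits += 1
            score += hits / i
    return score / max(hits, 1)

print(average_precision([True, False, True, True]))  # 0.8055...
```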

Journal Article • DOI
TL;DR: It is found that camptothecin (CPT) prodrugs created by conjugating two CPT molecules onto a hydrophilic segment can associate into filamentous nanostructures in water and exhibit much greater efficacy against primary brain cancer cells relative to that of irinotecan, a clinically used CPT prodrug.
Abstract: Chemical modification of small molecule hydrophobic drugs is a clinically proven strategy to devise prodrugs with enhanced treatment efficacy. While this prodrug strategy improves the parent drug's water solubility and pharmacokinetic profile, it typically compromises the drug's potency against cancer cells due to the retarded drug release rate and reduced cellular uptake efficiency. Here we report on the supramolecular design of self-assembling prodrugs (SAPD) with much improved water solubility while maintaining high potency against cancer cells. We found that camptothecin (CPT) prodrugs created by conjugating two CPT molecules onto a hydrophilic segment can associate into filamentous nanostructures in water. Our results suggest that these SAPD exhibit much greater efficacy against primary brain cancer cells relative to that of irinotecan, a clinically used CPT prodrug. We believe these findings open a new avenue for rational design of supramolecular prodrugs for cancer treatment.

Journal Article • DOI
11 Nov 2016
TL;DR: A 3D Attention Model is developed that selects the best views to scan from, as well as the most informative regions in each view to focus on, to achieve efficient object recognition; the region-level attention leads to focus-driven features which are quite robust against object occlusion.
Abstract: We address the problem of autonomously exploring unknown objects in a scene by consecutive depth acquisitions. The goal is to reconstruct the scene while online identifying the objects from among a large collection of 3D shapes. Fine-grained shape identification demands a meticulous series of observations attending to varying views and parts of the object of interest. Inspired by the recent success of attention-based models for 2D recognition, we develop a 3D Attention Model that selects the best views to scan from, as well as the most informative regions in each view to focus on, to achieve efficient object recognition. The region-level attention leads to focus-driven features which are quite robust against object occlusion. The attention model, trained with the 3D shape collection, encodes the temporal dependencies among consecutive views with deep recurrent networks. This facilitates order-aware view planning accounting for robot movement cost. In achieving instance identification, the shape collection is organized into a hierarchy, associated with pre-trained hierarchical classifiers. The effectiveness of our method is demonstrated on an autonomous robot (PR) that explores a scene and identifies the objects to construct a 3D scene model.

Proceedings Article • DOI
01 Jun 2016
TL;DR: A multilinear hyperplane hashing that generates a hash bit using multiple linear projections with strong locality sensitivity to hyperplane queries is proposed, and an angular quantization based learning framework for compact multilinear hashing is introduced, which considerably boosts the search performance with fewer hash bits.
Abstract: Hashing has become an increasingly popular technique for fast nearest neighbor search. Despite its successful progress in classic point-to-point search, there are few studies regarding point-to-hyperplane search, which has strong practical capabilities of scaling up applications like active learning with SVMs. Existing hyperplane hashing methods enable fast search based on randomly generated hash codes, but still suffer from a low collision probability and thus usually require long codes for satisfying performance. To overcome this problem, this paper proposes a multilinear hyperplane hashing that generates a hash bit using multiple linear projections. Our theoretical analysis shows that with an even number of random linear projections, the multilinear hash function possesses strong locality sensitivity to hyperplane queries. To leverage its sensitivity to the angle distance, we further introduce an angular quantization based learning framework for compact multilinear hashing, which considerably boosts the search performance with fewer hash bits. Experiments with applications to large-scale (up to one million) active learning on two datasets demonstrate the overall superiority of the proposed approach.
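
The multilinear hash bit described above is the sign of a product of several linear projections, h(x) = sgn(∏ᵢ wᵢᵀx), with an even number of projections giving the locality sensitivity to hyperplane queries. A compact NumPy sketch of generating such codes; the learned angular quantization stage is not reproduced here.

```python
# Each of m hash bits is the sign of the product of k linear
# projections of the input (k even for hyperplane queries).
import numpy as np

def multilinear_hash(X, W):
    """X: (n, d) data; W: (m, k, d), m bits each from k projections."""
    proj = np.einsum('mkd,nd->nmk', W, X)        # (n, m, k)
    return np.sign(proj.prod(axis=2))            # (n, m) in {-1, +1}

rng = np.random.default_rng(0)
codes = multilinear_hash(rng.normal(size=(5, 16)),
                         rng.normal(size=(32, 2, 16)))   # k = 2 (even)
print(codes.shape)   # (5, 32)
```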

Journal Article • DOI
11 Nov 2016
TL;DR: The method allows for large-scale unsupervised production of richly textured 3D models directly from image data, providing high quality virtual objects for 3D scene design or photo editing applications, as well as a wealth of data for training machine learning algorithms for various inference tasks in graphics and vision.
Abstract: Large 3D model repositories of common objects are now ubiquitous and are increasingly being used in computer graphics and computer vision for both analysis and synthesis tasks. However, images of objects in the real world have a richness of appearance that these repositories do not capture, largely because most existing 3D models are untextured. In this work we develop an automated pipeline capable of transporting texture information from images of real objects to 3D models of similar objects. This is a challenging problem, as an object's texture as seen in a photograph is distorted by many factors, including pose, geometry, and illumination. These geometric and photometric distortions must be undone in order to transfer the pure underlying texture to a new object --- the 3D model. Instead of using problematic dense correspondences, we factorize the problem into the reconstruction of a set of base textures (materials) and an illumination model for the object in the image. By exploiting the geometry of the similar 3D model, we reconstruct certain reliable texture regions and correct for the illumination, from which a full texture map can be recovered and applied to the model. Our method allows for large-scale unsupervised production of richly textured 3D models directly from image data, providing high quality virtual objects for 3D scene design or photo editing applications, as well as a wealth of data for training machine learning algorithms for various inference tasks in graphics and vision.

Journal Article • DOI
TL;DR: As the first of its kind, MRI-guided targeted concentric tube needle placements with ex vivo porcine liver are demonstrated with 4.64 mm RMS error through closed-loop control of the piezoelectrically-actuated robot.
Abstract: This paper presents the design, modeling and experimental evaluation of a magnetic resonance imaging (MRI)-compatible concentric tube continuum robotic system. This system enables MRI-guided deployment of a precurved and steerable concentric tube continuum mechanism, and is suitable for clinical applications where a curved trajectory is needed. This compact 6 degree-of-freedom (DOF) robotic system is piezoelectrically-actuated, and allows simultaneous robot motion and imaging with no visually observable image artifact. The targeting accuracy is evaluated with optical tracking system and gelatin phantom under live MRI-guidance with Root Mean Square (RMS) errors of 1.94 and 2.17 mm respectively. Furthermore, we demonstrate that the robot has kinematic redundancy to reach the same target through different paths. This was evaluated in both free space and MRI-guided gelatin phantom trails, with RMS errors of 0.48 and 0.59 mm respectively. As the first of its kind, MRI-guided targeted concentric tube needle placements with ex vivo porcine liver are demonstrated with 4.64 mm RMS error through closed-loop control of the piezoelectrically-actuated robot.

Posted Content
TL;DR: SyncSpecCNN, a spectral convolutional neural network for 3D shape part segmentation and keypoint prediction, enables weight sharing by parameterizing kernels in the spectral domain spanned by graph Laplacian eigenbases.
Abstract: In this paper, we study the problem of semantic annotation on 3D models that are represented as shape graphs. A functional view is taken to represent localized information on graphs, so that annotations such as part segments or keypoints are nothing but 0-1 indicator vertex functions. Compared with images that are 2D grids, shape graphs are irregular and non-isomorphic data structures. To enable the prediction of vertex functions on them by convolutional neural networks, we resort to the spectral CNN method that enables weight sharing by parameterizing kernels in the spectral domain spanned by graph Laplacian eigenbases. Under this setting, our network, named SyncSpecCNN, strives to overcome two key challenges: how to share coefficients and conduct multi-scale analysis in different parts of the graph for a single shape, and how to share information across related but different shapes that may be represented by very different graphs. Towards these goals, we introduce a spectral parameterization of dilated convolutional kernels and a spectral transformer network. Experimentally we tested our SyncSpecCNN on various tasks, including 3D shape part segmentation and 3D keypoint prediction. State-of-the-art performance has been achieved on all benchmark datasets.
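
The spectral parameterization rests on filtering a vertex function through the graph Laplacian eigenbasis, y = U diag(g) Uᵀ x, so that learnable kernels live in the spectral domain. A minimal NumPy sketch of that building block; SyncSpecCNN's dilated spectral kernels and spectral transformer network are beyond this illustration.

```python
# Spectral convolution on a graph: project the vertex function onto the
# Laplacian eigenbasis, modulate by a kernel g, and project back.
import numpy as np

def spectral_conv(adj, x, g):
    """adj: (n, n) adjacency; x: (n,) vertex function; g: (n,) kernel."""
    deg = np.diag(adj.sum(axis=1))
    L = deg - adj                          # combinatorial graph Laplacian
    lam, U = np.linalg.eigh(L)             # eigenbasis of L
    return U @ (g * (U.T @ x))             # filter in the spectral domain

adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
print(spectral_conv(adj, np.array([1.0, 0.0, 0.0]), np.ones(3)))
```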

Proceedings Article
01 Jan 2016
TL;DR: This work proposes an adaptive binary quantization method that learns a discriminative hash function with prototypes correspondingly associated with small unique binary codes; it significantly outperforms state-of-the-art hashing methods.
Abstract: Hashing has proved to be an attractive technique for fast nearest neighbor search over big data. Compared to projection based hashing methods, prototype based ones have a stronger capability of generating discriminative binary codes for data with complex inherent structure. However, our observation indicates that they still suffer from insufficient coding, which usually utilizes the complete binary codes in a hypercube. To address this problem, we propose an adaptive binary quantization method that learns a discriminative hash function with prototypes correspondingly associated with small unique binary codes. Our alternating optimization adaptively discovers the prototype set and the code set of a varying size in an efficient way, which together robustly approximate the data relations. Our method can be naturally generalized to the product space for long hash codes. We believe that our idea serves as a very helpful insight to hashing research. The extensive experiments on four large-scale (up to 80 million) datasets demonstrate that our method significantly outperforms state-of-the-art hashing methods, with relative performance gains of up to 58.84%.
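
At quantization time, the prototype-based scheme framed in the abstract assigns each data point the short binary code of its nearest prototype. A hedged sketch with the prototypes and codes given; learning them jointly is the contribution of the paper.

```python
# Assign each point the binary code of its nearest prototype.
import numpy as np

def quantize(X, prototypes, codes):
    """X: (n, d); prototypes: (p, d); codes: (p, b) in {0, 1}."""
    d = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)  # (n, p)
    return codes[d.argmin(axis=1)]                               # (n, b)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 8))
protos = rng.normal(size=(4, 8))
codes = rng.integers(0, 2, size=(4, 3))
print(quantize(X, protos, codes))
```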

Journal Article • DOI
TL;DR: This Review highlights recent examples of peptide-based targeting ligands that have been exploited to selectively deliver a chemotherapeutic payload to specific tumor-associated sites such as the vasculature, lymphatics, or cell surface.
Abstract: Chemotherapeutic treatment of cancers is a challenging endeavor, hindered by poor selectivity towards tumorous tissues over healthy ones. Preferentially delivering a given drug to tumor sites necessitates the use of targeting elements, of which there are a wide range in development. In this Review, we highlight recent examples of peptide-based targeting ligands that have been exploited to selectively deliver a chemotherapeutic payload to specific tumor-associated sites such as the vasculature, lymphatics, or cell surface. The advantages and limitations of such approaches will be discussed with a view to potential future development. Additionally, we will also examine how peptide-based ligands can be used diagnostically in the detection and characterization of cancers through their incorporation into imaging agents.

Patent
Fei-Fei Li, Jia Li, Hao Su
22 Jan 2016
TL;DR: In this article, an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task.
Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meaning. For high-level visual tasks, such low-level image representations are potentially not enough. The present invention provides a high-level image representation where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging this representation, superior performance on high-level visual recognition tasks is achieved with relatively simple classifiers such as logistic regression and linear SVM classifiers.
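
The claimed representation turns an image into the pooled responses of many pre-trained object detectors run at several scales, which then feed a simple classifier. A hedged sketch with stubbed detector callables; the patent's detectors and spatial pooling scheme are richer than this.

```python
# Build a feature vector from the max-pooled response maps of many
# object detectors across scales; detectors are stand-in callables.
import numpy as np

def object_bank_feature(image, detectors, scales=(1.0, 0.5)):
    feats = []
    for det in detectors:
        for s in scales:
            resp = det(image, s)            # 2D response map at scale s
            feats.append(resp.max())        # scale-invariant max pooling
    return np.array(feats)                  # feed to logistic reg / SVM

fake_det = lambda img, s: np.random.rand(int(64 * s), int(64 * s))
x = object_bank_feature(np.zeros((64, 64)), [fake_det] * 10)
print(x.shape)   # (20,)
```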

Posted Content
TL;DR: This paper addresses the problem of inferring rich semantics imparted by an object part in still images by proposing to tokenize the semantic space as a discrete set of part states, and presents a part state dataset which contains rich part-level semantic annotations.
Abstract: Important high-level vision tasks such as human-object interaction, image captioning and robotic manipulation require rich semantic descriptions of objects at part level. Based upon previous work on part localization, in this paper, we address the problem of inferring rich semantics imparted by an object part in still images. We propose to tokenize the semantic space as a discrete set of part states. Our modeling of part state is spatially localized, therefore, we formulate the part state inference problem as a pixel-wise annotation problem. An iterative part-state inference neural network is specifically designed for this task, which is efficient in time and accurate in performance. Extensive experiments demonstrate that the proposed method can effectively predict the semantic states of parts and simultaneously correct localization errors, thus benefiting a few visual understanding applications. The other contribution of this paper is our part state dataset which contains rich part-level semantic annotations.

Ziyan Guo, T.L.T. Lun, Yu Chen, Hao Su, D.T.M. Chan, Ka-Wai Kwok
01 Jan 2016
TL;DR: A novel MR-safe pneumatic stepper motor is proposed whose design can be relatively compact and flexibly customized for various actuation requirements; it has the potential to be incorporated into an MRI-compatible robot for needle manipulation during intra-operative procedures, e.g. prostate surgery.
Abstract: The superiority of magnetic resonance imaging (MRI) is well known: it provides non-ionizing, noninvasive and high-contrast imaging, in particular for soft tissues [1]. These advantages have prompted the use of MRI in various surgical interventions ranging from neurosurgery and cardiac ablation to prostate biopsies. However, MR-safe mechatronics is still confronted by a fundamental challenge, namely to maintain zero interference with imaging during MRI-guided navigation. Currently, several types of MRI-compatible actuation have been explored under different MR-safety conditions [2]: 1) electric actuators, e.g. piezoelectric motors and ultrasonic motors; 2) fluid-power motors; 3) MR-powered actuators. In terms of adaptability to a general hospital setup, image quality and MR-safety, pneumatic actuators are advantageous in both material and energetic considerations. The material of pneumatic actuators can be non-magnetic and non-conducting, minimizing effects on the homogeneity of the magnetic field. Pressurized clean air as a power supply is commonly available in MRI scanner rooms. This ensures zero image artifacts caused by electromagnetic (EM) interference from electricity. Pneumatic stepper actuators capable of generating accurate stepwise motion have been introduced recently. Stoianovici et al. [3] invented the first MR-safe pneumatic stepper motor, which comprises three actuated diaphragms driving a hoop gear. Several pneumatic stepper motors [4-6] have subsequently been developed and their performance (e.g. torque-speed) has been demonstrated. However, many technical challenges remain unaddressed in these motors, e.g. typically large motor size, the high cost of complicated fabrication and sterilization, unsatisfactory signal-to-noise ratio (SNR), and image distortion induced by the proximal electronics and valves of motor drivers. In this paper, we propose a novel MR-safe pneumatic stepper motor whose design can be relatively compact and flexibly customized for various actuation requirements. Such a motor can be made of a homogeneous material for ease of miniaturization and reconfiguration. One set of design parameters is selected for the experimental evaluation. Self-locking and high speed (up to 160 RPM) are achieved in both rotation directions. Steady torque within a wide range of speeds can also be preserved. Low imaging interference has been experimentally demonstrated while operating the motor inside the MRI scanner. Given these specifications, this motor has the potential to be incorporated into an MRI-compatible robot for needle manipulation during intra-operative procedures, e.g. prostate surgery.

Posted Contentā€¢
TL;DR: This work focuses on the non-Lambertian object-level intrinsic problem of recovering diffuse albedo, shading, and specular highlights from a single image of an object, and shows that feature learning at the encoder stage is more crucial for developing a universal representation across categories.
Abstract: We consider the non-Lambertian object intrinsic problem of recovering diffuse albedo, shading, and specular highlights from a single image of an object. We build a large-scale object intrinsics database based on existing 3D models in the ShapeNet database. Rendered with realistic environment maps, millions of synthetic images of objects and their corresponding albedo, shading, and specular ground-truth images are used to train an encoder-decoder CNN. Once trained, the network can decompose an image into the product of albedo and shading components, along with an additive specular component. Our CNN delivers accurate and sharp results in this classical inverse problem of computer vision, with the sharp details attributed to skip-layer connections at corresponding resolutions from the encoder to the decoder. Benchmarked on our ShapeNet and MIT intrinsics datasets, our model consistently outperforms the state-of-the-art by a large margin. We train and test our CNN on different object categories. Perhaps surprisingly, especially from the CNN classification perspective, our intrinsics CNN generalizes very well across categories. Our analysis shows that feature learning at the encoder stage is more crucial for developing a universal representation across categories. We apply our synthetic-data-trained model to images and videos downloaded from the internet, and observe robust and realistic intrinsics results. Quality non-Lambertian intrinsics could open up many interesting applications such as image-based albedo and specular editing.
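
The decomposition being learned can be written as I = albedo × shading + specular. A hedged sketch of the corresponding reconstruction-plus-supervision loss for an encoder-decoder that predicts the three components; the paper's training objective is richer than this.

```python
# Non-Lambertian intrinsic decomposition loss: the predicted components
# must recompose the input image, and each component is also supervised
# against its ground-truth map.
import torch

def intrinsics_loss(img, albedo, shading, specular, gt):
    recon = albedo * shading + specular          # non-Lambertian model
    return (torch.nn.functional.mse_loss(recon, img)
            + sum(torch.nn.functional.mse_loss(p, g)
                  for p, g in zip((albedo, shading, specular), gt)))

B, H, W = 2, 64, 64
img = torch.rand(B, 3, H, W)
preds = [torch.rand(B, 3, H, W, requires_grad=True) for _ in range(3)]
loss = intrinsics_loss(img, *preds, gt=[torch.rand(B, 3, H, W)] * 3)
loss.backward()
```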