
Showing papers by "Hao Su" published in 2018


Proceedings ArticleDOI
18 Jun 2018
TL;DR: This work directly operates on raw point clouds by popping up RGB-D scans and leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects.
Abstract: In this work, we study 3D object detection from RGB-D data in both indoor and outdoor scenes. While previous methods focus on images or 3D voxels, often obscuring natural 3D patterns and invariances of 3D data, we directly operate on raw point clouds by popping up RGB-D scans. However, a key challenge of this approach is how to efficiently localize objects in point clouds of large-scale scenes (region proposal). Instead of solely relying on 3D proposals, our method leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects. Benefiting from learning directly in raw point clouds, our method is also able to precisely estimate 3D bounding boxes even under strong occlusion or with very sparse points. Evaluated on the KITTI and SUN RGB-D 3D detection benchmarks, our method outperforms the state of the art by remarkable margins while having real-time capability.
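The region-proposal trick is easy to picture in code. Below is a minimal sketch of lifting a 2D detection into a frustum of 3D points; the function name and the PointNet stage mentioned in the comment are illustrative placeholders, not the authors' released implementation.

```python
import numpy as np

def frustum_points(points_cam, box2d, K):
    """Keep the 3D points whose image projection falls inside a 2D box.

    points_cam: (N, 3) points in the camera frame (popped-up RGB-D / LiDAR).
    box2d:      (x1, y1, x2, y2) from any mature 2D object detector.
    K:          (3, 3) camera intrinsic matrix.
    """
    uvw = points_cam @ K.T                       # project onto the image plane
    uv = uvw[:, :2] / uvw[:, 2:3]
    x1, y1, x2, y2 = box2d
    keep = ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
            (uv[:, 1] >= y1) & (uv[:, 1] <= y2) &
            (points_cam[:, 2] > 0))              # in front of the camera
    return points_cam[keep]

# Each frustum point cloud is then fed to a point-based network
# (PointNet-style) that segments object points and regresses the 3D box.
```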

1,947 citations


Journal ArticleDOI
TL;DR: This study establishes a new generic design paradigm of next-generation high-performance soft robots that are applicable for multifunctionality, different actuation methods, and materials at multiscales.
Abstract: Soft machines typically exhibit slow locomotion speed and low manipulation strength because of intrinsic limitations of soft materials. Here, we present a generic design principle that harnesses mechanical instability for a variety of spine-inspired fast and strong soft machines. Unlike most current soft robots that are designed as inherently and unimodally stable, our design leverages tunable snap-through bistability to fully explore the ability of soft robots to rapidly store and release energy within tens of milliseconds. We demonstrate this generic design principle with three high-performance soft machines: high-speed cheetah-like galloping crawlers with locomotion speeds of 2.68 body length/s, high-speed underwater swimmers (0.78 body length/s), and tunable low-to-high-force soft grippers with 1 to 10^3 stiffness modulation (maximum load capacity is 11.4 kg). Our study establishes a new generic design paradigm of next-generation high-performance soft robots that are applicable for multifunctionality, different actuation methods, and materials at multiscales.
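The role of bistability can be pictured with a toy double-well elastic energy; this is only a conceptual illustration of snap-through, not the authors' mechanical model.

```python
import numpy as np

# Toy double-well energy U(x) = a*x^4 - b*x^2: two stable equilibria at
# x = ±sqrt(b/(2a)) separated by a barrier at x = 0. Snap-through means
# jumping between the wells, releasing the stored elastic energy in one
# burst rather than deforming quasi-statically.
a, b = 1.0, 2.0
x_star = np.sqrt(b / (2 * a))                 # the two stable states
released = -(a * x_star**4 - b * x_star**2)   # energy released per snap
print(f"stable states at ±{x_star:.2f}, energy released per snap-through: {released:.2f}")
```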

177 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: Geometry is explored as a new type of auxiliary supervision for the self-supervised learning of video representations, and it is found that convolutional neural networks pre-trained on geometry cues can be effectively adapted to semantic video understanding tasks.
Abstract: It is often laborious and costly to manually annotate videos for training high-quality video recognition models, so there has been some work and interest in exploring alternative, cheap, and yet often noisy and indirect training signals for learning the video representations. However, these signals are still coarse, supplying supervision at the whole video-frame level, and subtle, sometimes forcing the learning agent to solve problems that are even hard for humans. In this paper, we instead explore geometry, a new type of auxiliary supervision for the self-supervised learning of video representations. In particular, we extract pixel-wise geometry information as flow fields and disparity maps from synthetic imagery and real 3D movies, respectively. Although geometry and high-level semantics are seemingly distant topics, surprisingly, we find that convolutional neural networks pre-trained on the geometry cues can be effectively adapted to semantic video understanding tasks. In addition, we also find that a progressive training strategy can foster a better neural network for the video recognition task than blindly pooling the distinct sources of geometry cues together. Extensive results on video dynamic scene recognition and action recognition tasks show that our geometry-guided networks significantly outperform the competing methods that are trained with other types of labeling-free supervision signals.
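In code, the recipe is a standard pretrain-then-adapt loop; the tiny backbone below is a placeholder for the paper's actual networks, sketched here only to show the two phases.

```python
import torch
import torch.nn as nn

# Phase 1: pre-train a backbone on pixel-wise geometry targets (flow fields
# from synthetic imagery, disparity maps from 3D movies) -- no human labels.
backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
geometry_head = nn.Conv2d(64, 2, 1)  # 2 channels for flow (u, v); 1 for disparity

def pretrain_step(frames, flow_gt, optimizer):
    loss = nn.functional.l1_loss(geometry_head(backbone(frames)), flow_gt)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# Phase 2: keep `backbone`, attach a classification head, and fine-tune on
# the action / dynamic-scene recognition task -- introducing the distinct
# geometry cues progressively rather than pooling them blindly.
```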

153 citations


Posted Content
TL;DR: This work presents PartNet, a consistent, large-scale dataset of 3D objects annotated with fine-grained, instance-level, and hierarchical 3D part information, and proposes a baseline method for part instance segmentation that achieves superior performance over existing methods.
Abstract: We present PartNet: a consistent, large-scale dataset of 3D objects annotated with fine-grained, instance-level, and hierarchical 3D part information. Our dataset consists of 573,585 part instances over 26,671 3D models covering 24 object categories. This dataset enables and serves as a catalyst for many tasks such as shape analysis, dynamic 3D scene modeling and simulation, affordance analysis, and others. Using our dataset, we establish three benchmarking tasks for evaluating 3D part recognition: fine-grained semantic segmentation, hierarchical semantic segmentation, and instance segmentation. We benchmark four state-of-the-art 3D deep learning algorithms for fine-grained semantic segmentation and three baseline methods for hierarchical semantic segmentation. We also propose a novel method for part instance segmentation and demonstrate its superior performance over existing methods.
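Because the annotations are hierarchical part trees, a small traversal helper conveys the data structure; the field names below are illustrative, not the dataset's exact JSON schema.

```python
def iter_parts(node, depth=0):
    """Yield (depth, name) for every part in a hierarchical annotation."""
    yield depth, node["name"]
    for child in node.get("children", []):
        yield from iter_parts(child, depth + 1)

chair = {"name": "chair", "children": [
    {"name": "back"},
    {"name": "base", "children": [{"name": "leg"}, {"name": "leg"}]},
]}
for depth, name in iter_parts(chair):
    print("  " * depth + name)   # finer-grained parts appear at deeper levels
```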

129 citations


Posted Content
TL;DR: A robust algorithm for 2-manifold generation of various kinds of ShapeNet models that can be applied efficiently to all ShapeNet models with a guarantee of correct 2-manifold topology.
Abstract: In this paper, we describe a robust algorithm for 2-manifold generation for various kinds of ShapeNet models. The input of our pipeline is a triangle mesh with a set of vertices and triangular faces. The output of our pipeline is a 2-manifold with vertices roughly uniformly distributed on the geometry surface. Our algorithm uses an octree to represent the original mesh and constructs the surface by isosurface extraction. Finally, we project the vertices onto the original mesh to achieve high precision. As a result, our method can be applied efficiently to all ShapeNet models with a guarantee of correct 2-manifold topology.
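A rough approximation of this pipeline can be assembled from off-the-shelf tools; the sketch below substitutes a dense occupancy grid for the paper's octree and a nearest-vertex snap for true point-to-surface projection, so it conveys the stages rather than the exact method.

```python
import numpy as np
from scipy.spatial import cKDTree
from skimage import measure

def remesh_to_manifold(orig_vertices, occupancy, spacing=(1.0, 1.0, 1.0)):
    """Voxelize -> isosurface extraction -> project back (sketch).

    occupancy: (D, H, W) inside/outside scalar field sampled from the mesh.
    """
    # Marching cubes on a well-formed scalar field yields a watertight
    # 2-manifold triangle mesh with roughly uniform vertex spacing.
    verts, faces, _, _ = measure.marching_cubes(occupancy, level=0.5,
                                                spacing=spacing)
    # Pull each new vertex toward the original geometry for precision
    # (assumes verts and orig_vertices share a coordinate frame).
    _, nearest = cKDTree(orig_vertices).query(verts)
    verts = 0.5 * (verts + orig_vertices[nearest])
    return verts, faces
```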

91 citations


Journal ArticleDOI
08 Aug 2018
TL;DR: Kinematic simulations demonstrate that misalignment between the robot joint and knee joint can be reduced by 74% at maximum knee flexion, and a low-impedance mechanical transmission reduces the reflected inertia and damping of the actuator to the human, making the exoskeleton highly backdrivable.
Abstract: This letter presents design principles for comfort-centered wearable robots and their application in a lightweight and backdrivable knee exoskeleton. The mitigation of discomfort is treated as a mechanical design and control issue, and three solutions are proposed in this letter: 1) a new wearable structure optimizes the strap attachment configuration and suit layout to ameliorate the excessive shear forces of conventional wearable structure designs; 2) rolling knee joint and double-hinge mechanisms reduce the misalignment in the sagittal and frontal planes, without increasing the mechanical complexity and inertia, respectively; 3) a low-impedance mechanical transmission reduces the reflected inertia and damping of the actuator to the human, so the exoskeleton is highly backdrivable. Kinematic simulations demonstrate that misalignment between the robot joint and knee joint can be reduced by 74% at maximum knee flexion. In experiments, the exoskeleton in the unpowered mode exhibits a low resistive torque of 1.03 Nm root mean square (RMS). Torque control experiments demonstrate 0.31 Nm RMS torque tracking error in three human subjects.
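For reference, the quoted figures are root-mean-square values, computed as below (the sample numbers are illustrative, not from the paper's data).

```python
import numpy as np

def rms(x):
    """Root mean square, as used for the resistive-torque (1.03 Nm) and
    torque-tracking-error (0.31 Nm) figures quoted above."""
    return float(np.sqrt(np.mean(np.square(x))))

torque_error = np.array([0.25, -0.40, 0.35, -0.20])  # illustrative samples, Nm
print(f"RMS torque tracking error: {rms(torque_error):.2f} Nm")
```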

79 citations


Journal ArticleDOI
TL;DR: In this article, the authors explore how the observation of different articulation states provides evidence for part structure and motion of 3D objects and propose a neural network architecture with three modules that respectively propose correspondences, estimate 3D deformation flows, and perform segmentation.
Abstract: Object functionality is often expressed through part articulation - as when the two rigid parts of a scissor pivot against each other to perform the cutting function. Such articulations are often similar across objects within the same functional category. In this paper we explore how the observation of different articulation states provides evidence for part structure and motion of 3D objects. Our method takes as input a pair of unsegmented shapes representing two different articulation states of two functionally related objects, and induces their common parts along with their underlying rigid motion. This is a challenging setting, as we assume no prior shape structure, no prior shape category information, no consistent shape orientation, the articulation states may belong to objects of different geometry, plus we allow inputs to be noisy and partial scans, or point clouds lifted from RGB images. Our method learns a neural network architecture with three modules that respectively propose correspondences, estimate 3D deformation flows, and perform segmentation. To achieve optimal performance, our architecture alternates between correspondence, deformation flow, and segmentation prediction iteratively in an ICP-like fashion. Our results demonstrate that our method significantly outperforms state-of-the-art techniques in the task of discovering articulated parts of objects. In addition, our part induction is object-class agnostic and successfully generalizes to new and unseen objects.
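The ICP-like alternation reduces to a short loop; the three networks below are callable placeholders standing in for the paper's correspondence, deformation-flow, and segmentation modules.

```python
def induce_parts(shape_a, shape_b, corr_net, flow_net, seg_net, n_iters=5):
    """Alternate the three modules until the part hypothesis stabilizes.

    shape_a, shape_b: unsegmented point clouds in two articulation states.
    """
    seg, flow = None, None
    for _ in range(n_iters):
        corr = corr_net(shape_a, shape_b, seg)   # propose point correspondences
        flow = flow_net(shape_a, shape_b, corr)  # per-point 3D deformation flow
        seg = seg_net(shape_a, flow)             # group points that move rigidly
    return seg, flow
```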

77 citations


Posted Content
TL;DR: This work proposes the Binary Ensemble Neural Network (BENN), which leverages ensemble methods to improve the performance of BNNs at limited efficiency cost and can even surpass the accuracy of a full-precision floating-point network with the same architecture.
Abstract: Binary neural networks (BNN) have been studied extensively since they run dramatically faster at lower memory and power consumption than floating-point networks, thanks to the efficiency of bit operations. However, contemporary BNNs whose weights and activations are both single bits suffer from severe accuracy degradation. To understand why, we investigate the representation ability, speed and bias/variance of BNNs through extensive experiments. We conclude that the error of BNNs is predominantly caused by the intrinsic instability (training time) and non-robustness (train & test time). Inspired by this investigation, we propose the Binary Ensemble Neural Network (BENN) which leverages ensemble methods to improve the performance of BNNs with limited efficiency cost. While ensemble techniques have been broadly believed to be only marginally helpful for strong classifiers such as deep neural networks, our analyses and experiments show that they are naturally a perfect fit to boost BNNs. We find that our BENN, which is faster and much more robust than state-of-the-art binary networks, can even surpass the accuracy of the full-precision floating number network with the same architecture.
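The two ingredients, weight binarization with a straight-through estimator and logit aggregation across an ensemble, can be sketched in a few lines of PyTorch. This is a schematic of the idea, not the paper's exact training procedure (which also studies bagging and boosting variants).

```python
import torch
import torch.nn as nn

class BinaryLinear(nn.Linear):
    """Linear layer with sign-binarized weights; the straight-through
    estimator lets gradients flow to the latent full-precision weights."""
    def forward(self, x):
        w = self.weight.sign().detach() + self.weight - self.weight.detach()
        return nn.functional.linear(x, w, self.bias)

def ensemble_logits(bnn_models, x):
    """BENN-style inference: aggregate the logits of K independently
    trained binary networks to stabilize the noisy single-BNN prediction."""
    return torch.stack([m(x) for m in bnn_models]).mean(dim=0)
```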

61 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: A novel deep-learning-based pipeline is proposed that explicitly estimates and leverages the geometry of the underlying human body; this design factors out the space of data variation and makes learning at each step much easier.
Abstract: We study how to synthesize novel views of human body from a single image. Though recent deep learning based methods work well for rigid objects, they often fail on objects with large articulation, like human bodies. The core step of existing methods is to fit a map from the observable views to novel views by CNNs; however, the rich articulation modes of human body make it rather challenging for CNNs to memorize and interpolate the data well. To address the problem, we propose a novel deep learning based pipeline that explicitly estimates and leverages the geometry of the underlying human body. Our new pipeline is a composition of a shape estimation network and an image generation network, and at the interface a perspective transformation is applied to generate a forward flow for pixel value transportation. Our design is able to factor out the space of data variation and makes learning at each step much easier. Empirically, we show that the performance for pose-varying objects can be improved dramatically. Our method can also be applied on real data captured by 3D sensors, and the flow generated by our methods can be used for generating high quality results in higher resolution.
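The interface between the two networks, a dense flow field used to transport pixel values, is the key design choice. A sketch of differentiable flow-based warping follows, written as backward warping with grid_sample, which is a common differentiable stand-in for the paper's forward-flow transport.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(src, flow):
    """Transport pixel values with a dense flow field.

    src:  (B, C, H, W) observed view.
    flow: (B, 2, H, W) per-pixel displacement in pixels.
    """
    b, _, h, w = src.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=src.dtype),
                            torch.arange(w, dtype=src.dtype), indexing="ij")
    gx = 2 * (xs + flow[:, 0]) / (w - 1) - 1   # normalize to [-1, 1]
    gy = 2 * (ys + flow[:, 1]) / (h - 1) - 1
    grid = torch.stack((gx, gy), dim=-1)       # (B, H, W, 2)
    return F.grid_sample(src, grid, align_corners=True)
```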

47 citations


Journal ArticleDOI
TL;DR: To improve the surgical workflow and achieve greater clinical penetration, three key enabling techniques are proposed, with emphasis on their current status, limitations, and future trends.

40 citations


Proceedings ArticleDOI
01 Jun 2018
TL;DR: This paper proposes to tokenize the semantic space as a discrete set of part states and formulates the part-state inference problem as a pixel-wise annotation problem.
Abstract: Important high-level vision tasks require rich semantic descriptions of objects at the part level. Building upon previous work on part localization, in this paper we address the problem of inferring rich semantics imparted by an object part in still images. Specifically, we propose to tokenize the semantic space as a discrete set of part states. Our modeling of part state is spatially localized; therefore, we formulate the part-state inference problem as a pixel-wise annotation problem. An iterative part-state inference neural network that is efficient in time and accurate in performance is specifically designed for this task. Extensive experiments demonstrate that the proposed method can effectively predict the semantic states of parts and simultaneously improve part segmentation, thus benefiting a number of visual understanding applications. The other contribution of this paper is our part-state dataset, which contains rich part-level semantic annotations.

Posted Content
TL;DR: In this article, a neural network is trained on point cloud representations of shape geometry and associated semantic functions on that point cloud to generate latent bases, in a consistent order, that reflect the shared semantic structure among the shapes.
Abstract: Various 3D semantic attributes such as segmentation masks, geometric features, keypoints, and materials can be encoded as per-point probe functions on 3D geometries. Given a collection of related 3D shapes, we consider how to jointly analyze such probe functions over different shapes, and how to discover common latent structures using a neural network, even in the absence of any correspondence information. Our network is trained on point cloud representations of shape geometry and associated semantic functions on that point cloud. These functions express a shared semantic understanding of the shapes but are not coordinated in any way. For example, in a segmentation task, the functions can be indicator functions of arbitrary sets of shape parts, with the particular combination involved not known to the network. Our network is able to produce a small dictionary of basis functions for each shape, a dictionary whose span includes the semantic functions provided for that shape. Even though our shapes have independent discretizations and no functional correspondences are provided, the network is able to generate latent bases, in a consistent order, that reflect the shared semantic structure among the shapes. We demonstrate the effectiveness of our technique in various segmentation and keypoint selection applications.
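The training signal is essentially a span-projection residual: each provided probe function should lie in the span of the network's predicted dictionary. A minimal sketch of that loss follows; the actual method constrains the coefficients further, so this conveys only the core idea.

```python
import torch

def span_loss(A, f):
    """Residual of projecting probe function f onto the dictionary's span.

    A: (N, K) predicted basis functions sampled at N points of one shape.
    f: (N,)   a provided semantic function (e.g., a part indicator).
    """
    coeffs = torch.linalg.lstsq(A, f.unsqueeze(1)).solution  # best fit in span(A)
    return torch.mean((A @ coeffs - f.unsqueeze(1)) ** 2)
```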

Journal ArticleDOI
02 Nov 2018
TL;DR: This article describes a gel-based device, formed by the self-assembly of a naphthalene diimide, that is both photo- and electrochromic.
Abstract: Smart windows in which the transmittance can be controlled on demand are a promising solution for the reduction of energy use in buildings. Windows are often the most energy-inefficient part of a building, so controlling the transmittance has the potential to significantly improve heating costs. Whilst numerous approaches exist, many suitable materials are costly to manufacture and process, and so new materials could have a significant impact. Here we describe a gel-based device which is both photo- and electrochromic. The gel matrix is formed by the self-assembly of a naphthalene diimide. The radical anion of the naphthalene diimide can be formed photochemically or electrochemically, and leads to a desirable transition from transparent to black. The speed of response, the low potential needed to generate the radical anion, the cyclability of the system, temperature stability, and low cost mean these devices may be suitable for applications in smart windows.

Proceedings ArticleDOI
24 Apr 2018
TL;DR: This paper presents the design and evaluation of a soft hand exo-sheath integrated with a soft fabric electromyography (EMG) sensor for rehabilitation and activities of daily living (ADL) assistance of stroke and spinal cord injury patients.
Abstract: This paper presents the design and evaluation of a soft hand exo-sheath integrated with a soft fabric electromyography (EMG) sensor for rehabilitation and activities of daily living (ADL) assistance of stroke and spinal cord injury (SCI) patients. This wearable robot addresses the limitations of existing soft robotic gloves with design considerations grounded in ergonomics and clinical practice. The exo-sheath is electrically actuated and designed to be compact and portable. It reduces shear forces and avoids the kinematic singularities of tendon-driven soft gloves, whose tendon routings typically run in parallel with individual fingers. Unlike conventional robotic gloves, this design optimizes a bio-inspired fin-ray structure to enhance hand proprioception, as the palm is not covered by wearable structures. With a novel self-fastening finger clasp design, wearers can don and doff the exoskeleton device themselves, simplifying ADL assistance. To develop a more intuitive control interface, a soft fabric EMG sensor has been developed to detect human intentions. The functionality of this soft robot has been demonstrated with experimental results covering low-level position control, kinematics evaluation, and reliable EMG measurements.
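Raw EMG is typically rectified and low-pass filtered into an activation envelope before it can drive intent detection; the snippet below is that standard preprocessing, not necessarily the authors' exact pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def emg_envelope(emg, fs=1000.0, cutoff=6.0):
    """Remove DC offset, rectify, then low-pass filter to obtain a smooth
    muscle-activation envelope from one raw EMG channel sampled at fs Hz."""
    b, a = butter(2, cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, np.abs(emg - np.mean(emg)))
```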

Posted Content
TL;DR: Two types of fast and energy-efficient architectures for BNN inference are proposed, together with analysis and insights for picking the better of the two strategies for different datasets and network models.
Abstract: Binarized Neural Networks (BNNs) remove bitwidth redundancy in classical CNNs by using a single bit (-1/+1) for network parameters and intermediate representations, which greatly reduces off-chip data transfer and storage overhead. However, a large amount of computation redundancy still exists in BNN inference. By analyzing local properties of images and the learned BNN kernel weights, we observe an average of ~78% input similarity and ~59% weight similarity among weight kernels, measured by our proposed metric in common network architectures. Thus there does exist redundancy that can be exploited to further reduce the amount of on-chip computation. Motivated by this observation, in this paper we propose two types of fast and energy-efficient architectures for BNN inference. We also provide analysis and insights for picking the better of these two strategies for different datasets and network models. By reusing results from previous computations, many cycles of data buffer access and computation can be skipped. In experiments, we demonstrate that 80% of the computation and 40% of the buffer accesses can be skipped by exploiting BNN similarity. Thus, our design achieves a 17% reduction in total power consumption, a 54% reduction in on-chip power consumption, and a 2.4× maximum speedup, compared to the baseline without our reuse technique. Our design is also 1.9× more area-efficient than the state-of-the-art BNN inference design. We believe our deployment of BNNs on FPGAs points to a promising future of running deep learning models on mobile devices.
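The similarity being exploited is bit-level agreement between binarized kernels; below is a toy measurement of it (an illustrative metric, not necessarily the paper's exact definition).

```python
import numpy as np

def weight_similarity(w_bin):
    """Average fraction of identical bits between consecutive kernels.

    w_bin: (K, n) array of {-1, +1} flattened weight kernels. High agreement
    means a kernel's partial sums can reuse the previous kernel's results,
    so those computations and buffer accesses can be skipped.
    """
    return float((w_bin[1:] == w_bin[:-1]).mean())

rng = np.random.default_rng(0)
w = np.sign(rng.standard_normal((64, 3 * 3 * 128)))
print(f"similarity: {weight_similarity(w):.0%}")  # ~50% even for random bits
```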

Posted Content
TL;DR: The filter-level pruning problem for binary neural networks, which cannot be solved by simply migrating existing structural pruning methods for full-precision models, is defined for the first time, and a novel learning-based approach is proposed to prune filters in a main/subsidiary network framework.
Abstract: To reduce memory footprint and run-time latency, techniques such as neural network pruning and binarization have been explored separately. However, it is unclear how to combine the best of the two worlds to get extremely small and efficient models. In this paper, we, for the first time, define the filter-level pruning problem for binary neural networks, which cannot be solved by simply migrating existing structural pruning methods for full-precision models. A novel learning-based approach is proposed to prune filters in our main/subsidiary network framework, where the main network is responsible for learning representative features to optimize the prediction performance, and the subsidiary component works as a filter selector on the main network. To avoid gradient mismatch when training the subsidiary component, we propose a layer-wise and bottom-up scheme. We also provide the theoretical and experimental comparison between our learning-based and greedy rule-based methods. Finally, we empirically demonstrate the effectiveness of our approach applied on several binary models, including binarized NIN, VGG-11, and ResNet-18, on various image classification datasets.
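The main/subsidiary split can be sketched as a learned gate per output filter wrapped around each conv layer of the main (binary) network; this is a schematic of the idea, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class FilterSelector(nn.Module):
    """Subsidiary component: one learnable gate per output filter of the
    main network's conv layer. Gates are trained layer-wise, bottom-up,
    to sidestep the gradient-mismatch problem noted above."""
    def __init__(self, conv):
        super().__init__()
        self.conv = conv
        self.logit = nn.Parameter(torch.zeros(conv.out_channels))

    def forward(self, x):
        gate = torch.sigmoid(self.logit).view(1, -1, 1, 1)  # soft keep/drop
        return self.conv(x) * gate

    def filters_to_prune(self, thresh=0.5):
        return (torch.sigmoid(self.logit) < thresh).nonzero().flatten()
```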

Posted Content
TL;DR: This paper proposes a novel type of aesthetic QR code, the Stylize aEsthEtic (SEE) QR code, and a three-stage approach to automatically produce such robust style-oriented codes; it also designs a module-based robustness-optimization mechanism that ensures robust performance by balancing two competing terms: visual quality and readability.
Abstract: With the continued proliferation of smart mobile devices, the Quick Response (QR) code has become one of the most-used types of two-dimensional code in the world. Aiming at beautifying the visually unpleasant appearance of QR codes, existing works have developed a series of techniques. However, these works still leave much to be desired in areas such as personalization, artistry, and robustness. To address these issues, in this paper we propose a novel type of aesthetic QR code, the SEE (Stylize aEsthEtic) QR code, and a three-stage approach to automatically produce such robust style-oriented codes. Specifically, in the first stage, we propose a method to generate an optimized baseline aesthetic QR code, which reduces the visual contrast between the noise-like black/white modules and the blended image. In the second stage, to obtain an art-style QR code, we tailor an appropriate neural style transformation network to endow the baseline aesthetic QR code with artistic elements. In the third stage, we design an error-correction mechanism that balances two competing terms, visual quality and readability, to ensure robust performance. Extensive experiments demonstrate that the SEE QR code has high quality in terms of both visual appearance and robustness, and also offers a greater variety of personalized choices to users.
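The readability half of that visual-quality/readability trade-off can be probed module by module: downsample the stylized render to the code's module grid and count bit disagreements, which must stay within the QR error-correction capacity. A toy check (our construction, not the paper's exact mechanism):

```python
import numpy as np

def module_error_rate(stylized_gray, code_bits):
    """Fraction of QR modules whose average intensity decodes to the wrong
    bit. stylized_gray: (H, W) in [0, 1]; code_bits: (m, m) matrix of 0/1."""
    m = code_bits.shape[0]
    h, w = stylized_gray.shape
    crop = stylized_gray[: h - h % m, : w - w % m]
    modules = crop.reshape(m, h // m, m, w // m).mean(axis=(1, 3))
    decoded = (modules < 0.5).astype(int)         # dark module -> bit 1
    return float((decoded != code_bits).mean())   # keep under EC capacity
```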

Posted Content
TL;DR: This work constructs a stratified low-dimensional quasi-natural image space that faithfully approximates the natural image space while also removing adversarial perturbations and introduces a novel Sparse Transformation Layer (STL) in between the input image and the first layer of the neural network to efficiently project images into the quasi-Natural image space.
Abstract: We propose an adversarial defense method that achieves state-of-the-art performance among attack-agnostic adversarial defense methods while also maintaining robustness to input resolution, scale of adversarial perturbation, and scale of dataset size. Based on convolutional sparse coding, we construct a stratified low-dimensional quasi-natural image space that faithfully approximates the natural image space while also removing adversarial perturbations. We introduce a novel Sparse Transformation Layer (STL) in between the input image and the first layer of the neural network to efficiently project images into our quasi-natural image space. Our experiments show state-of-the-art performance of our method compared to other attack-agnostic adversarial defense methods in various adversarial settings.
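Projecting into the span of a sparse-coding dictionary is classically done with ISTA; the sketch below shows that projection for a fixed dictionary D, as a generic stand-in to convey what a Sparse Transformation Layer computes, not the paper's implementation.

```python
import numpy as np

def ista_project(D, x, lam=0.1, n_iters=100):
    """Solve min_z 0.5*||D z - x||^2 + lam*||z||_1, then reconstruct.

    The sparse reconstruction D @ z lies in the low-dimensional
    "quasi-natural" span of the dictionary, shedding the off-manifold
    components that adversarial perturbations tend to occupy.
    """
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iters):
        g = z - D.T @ (D @ z - x) / L  # gradient step on the data term
        z = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return D @ z
```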

Journal ArticleDOI
TL;DR: This paper builds a framework to detect primitives from images in a layered manner by modifying the YOLO network, and uses an RNN with a novel loss function to equip this network with the capability to predict primitives with a variable number of parameters.
Abstract: The perception of the visual world through basic building blocks, such as cubes, spheres, and cones, gives human beings a parsimonious understanding of the visual world. Thus, efforts to find primitive-based geometric interpretations of visual data date back to studies of visual media in the 1970s. However, due to the difficulty of primitive fitting in the pre-deep-learning age, this research approach faded from the main stage, and the vision community turned primarily to semantic image understanding. In this paper, we revisit the classical problem of building geometric interpretations of images, using supervised deep learning tools. We build a framework to detect primitives from images in a layered manner by modifying the YOLO network; an RNN with a novel loss function is then used to equip this network with the capability to predict primitives with a variable number of parameters. We compare our pipeline to traditional and other baseline learning methods, demonstrating that our layered detection model achieves higher accuracy and performs better reconstruction.

Proceedings ArticleDOI
14 Jun 2018
TL;DR: This paper presents a new approach to tackling the joint misalignment problems of conventional rigid exoskeletons, which existing solutions address with complex mechanisms (e.g., cam mechanisms or five-bar linkages) or with control algorithms that compensate for inertia and render low impedance to the wearers.
Abstract: Roboticists have developed a diverse array of powered exoskeletons for human augmentation and rehabilitation over the last few decades. One of the key design objectives is to minimize discomfort to enhance the user experience. The high inertia and joint misalignment of conventional rigid exoskeletons are two key factors that cause these problems. Different types of control algorithms have been developed to compensate for the inertia and render low impedance to the wearers [1-2]. In addition to the high inertia, the misalignment between exoskeleton joints and the musculoskeletal joints of wearers can cause detrimental forces [3-4]. Conventionally, the mechanical knee joints of rigid knee exoskeletons are treated as a simple 1-degree-of-freedom (DOF) hinge mechanism, but the biological knee possesses complex kinematic characteristics. When this kind of 1-DOF exoskeleton and the wearer's limb form a closed kinematic chain, both kinematic and kinetic interference will inevitably occur. There are two existing solutions to the joint misalignment problem. One method aims to use complex mechanisms (e.g., cam mechanisms or five-bar linkages) to

Posted Content
TL;DR: This paper contributes the first large-scale database suitable for 3D car instance understanding, ApolloCar3D, and builds various baseline algorithms with state-of-the-art deep convolutional neural networks.
Abstract: Autonomous driving has attracted remarkable attention from both industry and academia. An important task is to estimate 3D properties (e.g., translation, rotation, and shape) of a moving or parked vehicle on the road. This task, while critical, is still under-researched in the computer vision community, partially owing to the lack of a large-scale, fully-annotated 3D car database suitable for autonomous driving research. In this paper, we contribute the first large-scale database suitable for 3D car instance understanding, ApolloCar3D. The dataset contains 5,277 driving images and over 60K car instances, where each car is fitted with an industry-grade 3D CAD model with absolute model size and semantically labelled keypoints. This dataset is more than 20 times larger than PASCAL3D+ and KITTI, the current state of the art. To enable efficient labelling in 3D, we build a pipeline by considering 2D-3D keypoint correspondences for a single instance and 3D relationships among multiple instances. Equipped with this dataset, we build various baseline algorithms with state-of-the-art deep convolutional neural networks. Specifically, we first segment each car with a pre-trained Mask R-CNN, and then regress towards its 3D pose and shape based on a deformable 3D car model, with or without using semantic keypoints. We show that using keypoints significantly improves fitting performance. Finally, we develop a new 3D metric jointly considering 3D pose and 3D shape, allowing for comprehensive evaluation and ablation study. By comparing with human performance, we suggest several future directions for further improvement.
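With 2D-3D keypoint correspondences and known camera intrinsics, pose fitting is a PnP problem; a minimal OpenCV version follows, as a generic baseline in the spirit of the keypoint-based fitting described above rather than the paper's exact method.

```python
import cv2
import numpy as np

def fit_car_pose(kps_3d, kps_2d, K):
    """Recover rotation and translation of a car CAD model from its
    labelled semantic keypoints via RANSAC-robustified PnP.

    kps_3d: (N, 3) keypoints on the CAD model (N >= 4).
    kps_2d: (N, 2) annotated image locations of the same keypoints.
    K:      (3, 3) camera intrinsic matrix.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        kps_3d.astype(np.float32), kps_2d.astype(np.float32), K, None)
    R, _ = cv2.Rodrigues(rvec)   # axis-angle -> 3x3 rotation matrix
    return R, tvec, inliers
```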


Proceedings ArticleDOI
14 Jun 2018
TL;DR: The contribution of this work is the development of a custom high-torque density electric actuator and its application.
Abstract: BACKGROUND The human hand has extraordinary dexterity, with more than 20 degrees of freedom (DOF) actuated by lightweight and efficient biological actuators (i.e., muscles). The average weight of the human hand is only 400 g [1]. Over the last few decades, research and commercialization efforts have been dedicated to the development of novel robotic hands for humanoid or prosthetic applications with dexterous and biomimetic designs [2]. However, due to the limitations of existing electric motors in terms of torque density and energy efficiency, the design of humanoid hands has had to compromise between dexterity and weight. For example, the commercial prosthetic terminal devices i-Limb [3] and Bebionic [4] prioritize the lightweight need (450 g) and use 5 motor DOFs to under-actuate 11 joints, which can realize only a few basic grasp postures. On the other hand, some humanoid robot hand devices like the DLR-HIT I & II hands [5] prioritize the dexterity need (15 DOF) but weigh more than four times as much as their biological counterparts (2200 g and 1500 g, respectively). The contribution of this work is the development of a custom high-torque-density electric actuator and its application.

Book ChapterDOI
TL;DR: This work describes the detailed procedure for synthesis of an ABC Mikto-arm star peptide conjugate in which two immiscible entities are conjugated onto a short β-sheet forming peptide sequence, GNNQQNY, derived from the Sup35 prion, through a lysine junction.
Abstract: Mikto-arm star peptide conjugates are an emerging class of self-assembling peptide-based structural units that contain three or more auxiliary segments of different chemical compositions and/or functionalities. This group of molecules exhibits interesting self-assembly behavior in solution due to its chemically asymmetric topology. Here we describe the detailed procedure for the synthesis of an ABC Mikto-arm star peptide conjugate in which two immiscible entities (a saturated hydrocarbon and a hydrophobic and lipophobic fluorocarbon) are conjugated onto a short β-sheet-forming peptide sequence, GNNQQNY, derived from the Sup35 prion, through a lysine junction. Automated and manual Fmoc solid-phase synthesis techniques are used to synthesize the Mikto-arm star peptide conjugates, followed by HPLC purification. We envision that this set of protocols can afford a versatile platform for synthesizing a new class of peptidic building units for diverse applications.