
Showing papers by "Hao Su" published in 2018


Proceedings ArticleDOI
18 Jun 2018
TL;DR: This work directly operates on raw point clouds by popping up RGB-D scans and leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects.
Abstract: In this work, we study 3D object detection from RGB-D data in both indoor and outdoor scenes. While previous methods focus on images or 3D voxels, often obscuring natural 3D patterns and invariances of 3D data, we directly operate on raw point clouds by popping up RGB-D scans. However, a key challenge of this approach is how to efficiently localize objects in point clouds of large-scale scenes (region proposal). Instead of solely relying on 3D proposals, our method leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects. Benefiting from learning directly in raw point clouds, our method is also able to precisely estimate 3D bounding boxes even under strong occlusion or with very sparse points. Evaluated on the KITTI and SUN RGB-D 3D detection benchmarks, our method outperforms the state of the art by remarkable margins while having real-time capability.
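The region-proposal trick is easy to picture in code. Below is a minimal sketch of lifting a 2D detection into a frustum of 3D points; the function name and the PointNet stage mentioned in the comment are illustrative placeholders, not the authors' released implementation.

```python
import numpy as np

def frustum_points(points_cam, box2d, K):
    """Keep the 3D points whose image projection falls inside a 2D box.

    points_cam: (N, 3) points in the camera frame (popped-up RGB-D / LiDAR).
    box2d:      (x1, y1, x2, y2) from any mature 2D object detector.
    K:          (3, 3) camera intrinsic matrix.
    """
    uvw = points_cam @ K.T                       # project onto the image plane
    uv = uvw[:, :2] / uvw[:, 2:3]
    x1, y1, x2, y2 = box2d
    keep = ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
            (uv[:, 1] >= y1) & (uv[:, 1] <= y2) &
            (points_cam[:, 2] > 0))              # in front of the camera
    return points_cam[keep]

# Each frustum point cloud is then fed to a point-based network
# (PointNet-style) that segments object points and regresses the 3D box.
```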

1,947 citations


Journal ArticleDOI
TL;DR: This study establishes a new generic design paradigm of next-generation high-performance soft robots that are applicable for multifunctionality, different actuation methods, and materials at multiscales.
Abstract: Soft machines typically exhibit slow locomotion speed and low manipulation strength because of intrinsic limitations of soft materials. Here, we present a generic design principle that harnesses mechanical instability for a variety of spine-inspired fast and strong soft machines. Unlike most current soft robots that are designed as inherently and unimodally stable, our design leverages tunable snap-through bistability to fully explore the ability of soft robots to rapidly store and release energy within tens of milliseconds. We demonstrate this generic design principle with three high-performance soft machines: high-speed cheetah-like galloping crawlers with locomotion speeds of 2.68 body length/s, high-speed underwater swimmers (0.78 body length/s), and tunable low-to-high-force soft grippers with 1 to 10^3 stiffness modulation (maximum load capacity is 11.4 kg). Our study establishes a new generic design paradigm of next-generation high-performance soft robots that are applicable for multifunctionality, different actuation methods, and materials at multiscales.
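The role of bistability can be pictured with a toy double-well elastic energy; this is only a conceptual illustration of snap-through, not the authors' mechanical model.

```python
import numpy as np

# Toy double-well energy U(x) = a*x^4 - b*x^2: two stable equilibria at
# x = ±sqrt(b/(2a)) separated by a barrier at x = 0. Snap-through means
# jumping between the wells, releasing the stored elastic energy in one
# burst rather than deforming quasi-statically.
a, b = 1.0, 2.0
x_star = np.sqrt(b / (2 * a))                 # the two stable states
released = -(a * x_star**4 - b * x_star**2)   # energy released per snap
print(f"stable states at ±{x_star:.2f}, energy released per snap-through: {released:.2f}")
```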

177 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: Geometry is explored as a new type of auxiliary supervision for the self-supervised learning of video representations, and it is found that convolutional neural networks pre-trained on geometry cues can be effectively adapted to semantic video understanding tasks.
Abstract: It is often laborious and costly to manually annotate videos for training high-quality video recognition models, so there has been some work and interest in exploring alternative, cheap, and yet often noisy and indirect training signals for learning the video representations. However, these signals are still coarse, supplying supervision at the whole video-frame level, and subtle, sometimes forcing the learning agent to solve problems that are even hard for humans. In this paper, we instead explore geometry, a new type of auxiliary supervision for the self-supervised learning of video representations. In particular, we extract pixel-wise geometry information as flow fields and disparity maps from synthetic imagery and real 3D movies, respectively. Although geometry and high-level semantics are seemingly distant topics, surprisingly, we find that convolutional neural networks pre-trained on the geometry cues can be effectively adapted to semantic video understanding tasks. In addition, we also find that a progressive training strategy can foster a better neural network for the video recognition task than blindly pooling the distinct sources of geometry cues together. Extensive results on video dynamic scene recognition and action recognition tasks show that our geometry-guided networks significantly outperform the competing methods that are trained with other types of labeling-free supervision signals.
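In code, the recipe is a standard pretrain-then-adapt loop; the tiny backbone below is a placeholder for the paper's actual networks, sketched here only to show the two phases.

```python
import torch
import torch.nn as nn

# Phase 1: pre-train a backbone on pixel-wise geometry targets (flow fields
# from synthetic imagery, disparity maps from 3D movies) -- no human labels.
backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
geometry_head = nn.Conv2d(64, 2, 1)  # 2 channels for flow (u, v); 1 for disparity

def pretrain_step(frames, flow_gt, optimizer):
    loss = nn.functional.l1_loss(geometry_head(backbone(frames)), flow_gt)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# Phase 2: keep `backbone`, attach a classification head, and fine-tune on
# the action / dynamic-scene recognition task -- introducing the distinct
# geometry cues progressively rather than pooling them blindly.
```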

153 citations


Posted Content
TL;DR: This work presents PartNet, a consistent, large-scale dataset of 3D objects annotated with fine-grained, instance-level, and hierarchical 3D part information, and proposes a baseline method for part instance segmentation that achieves superior performance over existing methods.
Abstract: We present PartNet: a consistent, large-scale dataset of 3D objects annotated with fine-grained, instance-level, and hierarchical 3D part information. Our dataset consists of 573,585 part instances over 26,671 3D models covering 24 object categories. This dataset enables and serves as a catalyst for many tasks such as shape analysis, dynamic 3D scene modeling and simulation, affordance analysis, and others. Using our dataset, we establish three benchmarking tasks for evaluating 3D part recognition: fine-grained semantic segmentation, hierarchical semantic segmentation, and instance segmentation. We benchmark four state-of-the-art 3D deep learning algorithms for fine-grained semantic segmentation and three baseline methods for hierarchical semantic segmentation. We also propose a novel method for part instance segmentation and demonstrate its superior performance over existing methods.
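Because the annotations are hierarchical part trees, a small traversal helper conveys the data structure; the field names below are illustrative, not the dataset's exact JSON schema.

```python
def iter_parts(node, depth=0):
    """Yield (depth, name) for every part in a hierarchical annotation."""
    yield depth, node["name"]
    for child in node.get("children", []):
        yield from iter_parts(child, depth + 1)

chair = {"name": "chair", "children": [
    {"name": "back"},
    {"name": "base", "children": [{"name": "leg"}, {"name": "leg"}]},
]}
for depth, name in iter_parts(chair):
    print("  " * depth + name)   # finer-grained parts appear at deeper levels
```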

129 citations


Posted Content
TL;DR: A robust algorithm for 2-manifold generation of various kinds of ShapeNet models that can be applied efficiently to all ShapeNet models with a guarantee of correct 2-manifold topology.
Abstract: In this paper, we describe a robust algorithm for 2-manifold generation for various kinds of ShapeNet models. The input of our pipeline is a triangle mesh with a set of vertices and triangular faces. The output of our pipeline is a 2-manifold with vertices roughly uniformly distributed on the geometry surface. Our algorithm uses an octree to represent the original mesh and constructs the surface by isosurface extraction. Finally, we project the vertices onto the original mesh to achieve high precision. As a result, our method can be applied efficiently to all ShapeNet models with a guarantee of correct 2-manifold topology.
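A rough approximation of this pipeline can be assembled from off-the-shelf tools; the sketch below substitutes a dense occupancy grid for the paper's octree and a nearest-vertex snap for true point-to-surface projection, so it conveys the stages rather than the exact method.

```python
import numpy as np
from scipy.spatial import cKDTree
from skimage import measure

def remesh_to_manifold(orig_vertices, occupancy, spacing=(1.0, 1.0, 1.0)):
    """Voxelize -> isosurface extraction -> project back (sketch).

    occupancy: (D, H, W) inside/outside scalar field sampled from the mesh.
    """
    # Marching cubes on a well-formed scalar field yields a watertight
    # 2-manifold triangle mesh with roughly uniform vertex spacing.
    verts, faces, _, _ = measure.marching_cubes(occupancy, level=0.5,
                                                spacing=spacing)
    # Pull each new vertex toward the original geometry for precision
    # (assumes verts and orig_vertices share a coordinate frame).
    _, nearest = cKDTree(orig_vertices).query(verts)
    verts = 0.5 * (verts + orig_vertices[nearest])
    return verts, faces
```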

91 citations


Journal ArticleDOI
08 Aug 2018
TL;DR: Kinematic simulations demonstrate that misalignment between the robot joint and knee joint can be reduced by 74% at maximum knee flexion, and a low-impedance mechanical transmission reduces the reflected inertia and damping of the actuator to the human, making the exoskeleton highly backdrivable.
Abstract: This letter presents design principles for comfort-centered wearable robots and their application in a lightweight and backdrivable knee exoskeleton. The mitigation of discomfort is treated as a mechanical design and control issue, and three solutions are proposed in this letter: 1) a new wearable structure optimizes the strap attachment configuration and suit layout to ameliorate the excessive shear forces of conventional wearable structure designs; 2) rolling knee joint and double-hinge mechanisms reduce the misalignment in the sagittal and frontal planes, without increasing the mechanical complexity and inertia, respectively; 3) a low-impedance mechanical transmission reduces the reflected inertia and damping of the actuator to the human, so the exoskeleton is highly backdrivable. Kinematic simulations demonstrate that misalignment between the robot joint and knee joint can be reduced by 74% at maximum knee flexion. In experiments, the exoskeleton in the unpowered mode exhibits a low resistive torque of 1.03 Nm root mean square (RMS). Torque control experiments demonstrate 0.31 Nm RMS torque tracking error in three human subjects.
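For reference, the quoted figures are root-mean-square values, computed as below (the sample numbers are illustrative, not from the paper's data).

```python
import numpy as np

def rms(x):
    """Root mean square, as used for the resistive-torque (1.03 Nm) and
    torque-tracking-error (0.31 Nm) figures quoted above."""
    return float(np.sqrt(np.mean(np.square(x))))

torque_error = np.array([0.25, -0.40, 0.35, -0.20])  # illustrative samples, Nm
print(f"RMS torque tracking error: {rms(torque_error):.2f} Nm")
```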

79 citations


Journal ArticleDOI
TL;DR: In this article, the authors explore how the observation of different articulation states provides evidence for part structure and motion of 3D objects and propose a neural network architecture with three modules that respectively propose correspondences, estimate 3D deformation flows, and perform segmentation.
Abstract: Object functionality is often expressed through part articulation - as when the two rigid parts of a scissor pivot against each other to perform the cutting function. Such articulations are often similar across objects within the same functional category. In this paper we explore how the observation of different articulation states provides evidence for part structure and motion of 3D objects. Our method takes as input a pair of unsegmented shapes representing two different articulation states of two functionally related objects, and induces their common parts along with their underlying rigid motion. This is a challenging setting, as we assume no prior shape structure, no prior shape category information, no consistent shape orientation, the articulation states may belong to objects of different geometry, plus we allow inputs to be noisy and partial scans, or point clouds lifted from RGB images. Our method learns a neural network architecture with three modules that respectively propose correspondences, estimate 3D deformation flows, and perform segmentation. To achieve optimal performance, our architecture alternates between correspondence, deformation flow, and segmentation prediction iteratively in an ICP-like fashion. Our results demonstrate that our method significantly outperforms state-of-the-art techniques in the task of discovering articulated parts of objects. In addition, our part induction is object-class agnostic and successfully generalizes to new and unseen objects.
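The ICP-like alternation reduces to a short loop; the three networks below are callable placeholders standing in for the paper's correspondence, deformation-flow, and segmentation modules.

```python
def induce_parts(shape_a, shape_b, corr_net, flow_net, seg_net, n_iters=5):
    """Alternate the three modules until the part hypothesis stabilizes.

    shape_a, shape_b: unsegmented point clouds in two articulation states.
    """
    seg, flow = None, None
    for _ in range(n_iters):
        corr = corr_net(shape_a, shape_b, seg)   # propose point correspondences
        flow = flow_net(shape_a, shape_b, corr)  # per-point 3D deformation flow
        seg = seg_net(shape_a, flow)             # group points that move rigidly
    return seg, flow
```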

77 citations


Posted Content
TL;DR: This work proposes the Binary Ensemble Neural Network (BENN), which leverages ensemble methods to improve the performance of BNNs at limited efficiency cost and can even surpass the accuracy of a full-precision floating-point network with the same architecture.
Abstract: Binary neural networks (BNN) have been studied extensively since they run dramatically faster at lower memory and power consumption than floating-point networks, thanks to the efficiency of bit operations. However, contemporary BNNs whose weights and activations are both single bits suffer from severe accuracy degradation. To understand why, we investigate the representation ability, speed and bias/variance of BNNs through extensive experiments. We conclude that the error of BNNs is predominantly caused by the intrinsic instability (training time) and non-robustness (train & test time). Inspired by this investigation, we propose the Binary Ensemble Neural Network (BENN) which leverages ensemble methods to improve the performance of BNNs with limited efficiency cost. While ensemble techniques have been broadly believed to be only marginally helpful for strong classifiers such as deep neural networks, our analyses and experiments show that they are naturally a perfect fit to boost BNNs. We find that our BENN, which is faster and much more robust than state-of-the-art binary networks, can even surpass the accuracy of the full-precision floating number network with the same architecture.
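The two ingredients, weight binarization with a straight-through estimator and logit aggregation across an ensemble, can be sketched in a few lines of PyTorch. This is a schematic of the idea, not the paper's exact training procedure (which also studies bagging and boosting variants).

```python
import torch
import torch.nn as nn

class BinaryLinear(nn.Linear):
    """Linear layer with sign-binarized weights; the straight-through
    estimator lets gradients flow to the latent full-precision weights."""
    def forward(self, x):
        w = self.weight.sign().detach() + self.weight - self.weight.detach()
        return nn.functional.linear(x, w, self.bias)

def ensemble_logits(bnn_models, x):
    """BENN-style inference: aggregate the logits of K independently
    trained binary networks to stabilize the noisy single-BNN prediction."""
    return torch.stack([m(x) for m in bnn_models]).mean(dim=0)
```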

61 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: A novel deep-learning-based pipeline is proposed that explicitly estimates and leverages the geometry of the underlying human body; this design factors out the space of data variation and makes learning at each step much easier.
Abstract: We study how to synthesize novel views of human body from a single image. Though recent deep learning based methods work well for rigid objects, they often fail on objects with large articulation, like human bodies. The core step of existing methods is to fit a map from the observable views to novel views by CNNs; however, the rich articulation modes of human body make it rather challenging for CNNs to memorize and interpolate the data well. To address the problem, we propose a novel deep learning based pipeline that explicitly estimates and leverages the geometry of the underlying human body. Our new pipeline is a composition of a shape estimation network and an image generation network, and at the interface a perspective transformation is applied to generate a forward flow for pixel value transportation. Our design is able to factor out the space of data variation and makes learning at each step much easier. Empirically, we show that the performance for pose-varying objects can be improved dramatically. Our method can also be applied on real data captured by 3D sensors, and the flow generated by our methods can be used for generating high quality results in higher resolution.
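The interface between the two networks, a dense flow field used to transport pixel values, is the key design choice. A sketch of differentiable flow-based warping follows, written as backward warping with grid_sample, which is a common differentiable stand-in for the paper's forward-flow transport.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(src, flow):
    """Transport pixel values with a dense flow field.

    src:  (B, C, H, W) observed view.
    flow: (B, 2, H, W) per-pixel displacement in pixels.
    """
    b, _, h, w = src.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=src.dtype),
                            torch.arange(w, dtype=src.dtype), indexing="ij")
    gx = 2 * (xs + flow[:, 0]) / (w - 1) - 1   # normalize to [-1, 1]
    gy = 2 * (ys + flow[:, 1]) / (h - 1) - 1
    grid = torch.stack((gx, gy), dim=-1)       # (B, H, W, 2)
    return F.grid_sample(src, grid, align_corners=True)
```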

47 citations


Journal ArticleDOI
TL;DR: To improve the surgical workflow and achieve greater clinical penetration, three key enabling techniques are proposed, with emphasis on their current status, limitations, and future trends.

40 citations


Proceedings ArticleDOI
01 Jun 2018
TL;DR: This paper proposes to tokenize the semantic space as a discrete set of part states and formulates the part-state inference problem as a pixel-wise annotation problem.
Abstract: Important high-level vision tasks require rich semantic descriptions of objects at the part level. Building upon previous work on part localization, in this paper we address the problem of inferring rich semantics imparted by an object part in still images. Specifically, we propose to tokenize the semantic space as a discrete set of part states. Our modeling of part state is spatially localized; therefore, we formulate the part-state inference problem as a pixel-wise annotation problem. An iterative part-state inference neural network that is efficient in time and accurate in performance is specifically designed for this task. Extensive experiments demonstrate that the proposed method can effectively predict the semantic states of parts and simultaneously improve part segmentation, thus benefiting a number of visual understanding applications. The other contribution of this paper is our part-state dataset, which contains rich part-level semantic annotations.

Posted Content
TL;DR: In this article, a neural network is trained on point cloud representations of shape geometry and associated semantic functions on that point cloud to generate latent bases, in a consistent order, that reflect the shared semantic structure among the shapes.
Abstract: Various 3D semantic attributes such as segmentation masks, geometric features, keypoints, and materials can be encoded as per-point probe functions on 3D geometries. Given a collection of related 3D shapes, we consider how to jointly analyze such probe functions over different shapes, and how to discover common latent structures using a neural network, even in the absence of any correspondence information. Our network is trained on point cloud representations of shape geometry and associated semantic functions on that point cloud. These functions express a shared semantic understanding of the shapes but are not coordinated in any way. For example, in a segmentation task, the functions can be indicator functions of arbitrary sets of shape parts, with the particular combination involved not known to the network. Our network is able to produce a small dictionary of basis functions for each shape, a dictionary whose span includes the semantic functions provided for that shape. Even though our shapes have independent discretizations and no functional correspondences are provided, the network is able to generate latent bases, in a consistent order, that reflect the shared semantic structure among the shapes. We demonstrate the effectiveness of our technique in various segmentation and keypoint selection applications.
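The training signal is essentially a span-projection residual: each provided probe function should lie in the span of the network's predicted dictionary. A minimal sketch of that loss follows; the actual method constrains the coefficients further, so this conveys only the core idea.

```python
import torch

def span_loss(A, f):
    """Residual of projecting probe function f onto the dictionary's span.

    A: (N, K) predicted basis functions sampled at N points of one shape.
    f: (N,)   a provided semantic function (e.g., a part indicator).
    """
    coeffs = torch.linalg.lstsq(A, f.unsqueeze(1)).solution  # best fit in span(A)
    return torch.mean((A @ coeffs - f.unsqueeze(1)) ** 2)
```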

Journal ArticleDOI
02 Nov 2018
TL;DR: This article describes a gel-based device, formed by the self-assembly of a naphthalene diimide, that is both photo- and electrochromic.
Abstract: Smart windows in which the transmittance can be controlled on demand are a promising solution for the reduction of energy use in buildings. Windows are often the most energy-inefficient part of a building, so controlling the transmittance has the potential to significantly improve heating costs. Whilst numerous approaches exist, many suitable materials are costly to manufacture and process, and so new materials could have a significant impact. Here we describe a gel-based device which is both photo- and electrochromic. The gel matrix is formed by the self-assembly of a naphthalene diimide. The radical anion of the naphthalene diimide can be formed photochemically or electrochemically, and leads to a desirable transition from transparent to black. The speed of response, the low potential needed to generate the radical anion, the cyclability of the system, temperature stability, and low cost mean these devices may be suitable for applications in smart windows.

Proceedings ArticleDOI
24 Apr 2018
TL;DR: This paper presents the design and evaluation of a soft hand exo-sheath integrated with a soft fabric electromyography (EMG) sensor for rehabilitation and activities of daily living (ADL) assistance of stroke and spinal cord injury patients.
Abstract: This paper presents the design and evaluation of a soft hand exo-sheath integrated with a soft fabric electromyography (EMG) sensor for rehabilitation and activities of daily living (ADL) assistance of stroke and spinal cord injury (SCI) patients. This wearable robot addresses the limitations of existing soft robotic gloves with design considerations grounded in ergonomics and clinical practice. The exo-sheath is electrically actuated and designed to be compact and portable. It reduces shear forces and avoids the kinematic singularities of tendon-driven soft gloves, whose tendon routings typically run in parallel with individual fingers. Unlike conventional robotic gloves, this design optimizes a bio-inspired fin-ray structure to enhance hand proprioception, as the palm is not covered by wearable structures. With a novel self-fastening finger clasp design, wearers can don and doff the exoskeleton device themselves, simplifying ADL assistance. To develop a more intuitive control interface, a soft fabric EMG sensor has been developed to detect human intentions. The functionality of this soft robot has been demonstrated with experimental results covering low-level position control, kinematics evaluation, and reliable EMG measurements.
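Raw EMG is typically rectified and low-pass filtered into an activation envelope before it can drive intent detection; the snippet below is that standard preprocessing, not necessarily the authors' exact pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def emg_envelope(emg, fs=1000.0, cutoff=6.0):
    """Remove DC offset, rectify, then low-pass filter to obtain a smooth
    muscle-activation envelope from one raw EMG channel sampled at fs Hz."""
    b, a = butter(2, cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, np.abs(emg - np.mean(emg)))
```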

Posted Content
TL;DR: Two types of fast and energy-efficient architectures for BNN inference are proposed, together with analysis and insights for picking the better of the two strategies for different datasets and network models.
Abstract: Binarized Neural Networks (BNNs) remove bitwidth redundancy in classical CNNs by using a single bit (-1/+1) for network parameters and intermediate representations, which greatly reduces off-chip data transfer and storage overhead. However, a large amount of computation redundancy still exists in BNN inference. By analyzing local properties of images and the learned BNN kernel weights, we observe an average of ~78% input similarity and ~59% weight similarity among weight kernels, measured by our proposed metric in common network architectures. Thus there does exist redundancy that can be exploited to further reduce the amount of on-chip computation. Motivated by this observation, in this paper we propose two types of fast and energy-efficient architectures for BNN inference. We also provide analysis and insights for picking the better of these two strategies for different datasets and network models. By reusing results from previous computations, many cycles of data buffer access and computation can be skipped. In experiments, we demonstrate that 80% of the computation and 40% of the buffer accesses can be skipped by exploiting BNN similarity. Thus, our design achieves a 17% reduction in total power consumption, a 54% reduction in on-chip power consumption, and a 2.4× maximum speedup, compared to the baseline without our reuse technique. Our design is also 1.9× more area-efficient than the state-of-the-art BNN inference design. We believe our deployment of BNNs on FPGAs points to a promising future of running deep learning models on mobile devices.
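The similarity being exploited is bit-level agreement between binarized kernels; below is a toy measurement of it (an illustrative metric, not necessarily the paper's exact definition).

```python
import numpy as np

def weight_similarity(w_bin):
    """Average fraction of identical bits between consecutive kernels.

    w_bin: (K, n) array of {-1, +1} flattened weight kernels. High agreement
    means a kernel's partial sums can reuse the previous kernel's results,
    so those computations and buffer accesses can be skipped.
    """
    return float((w_bin[1:] == w_bin[:-1]).mean())

rng = np.random.default_rng(0)
w = np.sign(rng.standard_normal((64, 3 * 3 * 128)))
print(f"similarity: {weight_similarity(w):.0%}")  # ~50% even for random bits
```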

Posted Content
TL;DR: The filter-level pruning problem for binary neural networks, which cannot be solved by simply migrating existing structural pruning methods for full-precision models, is defined for the first time, and a novel learning-based approach is proposed to prune filters in a main/subsidiary network framework.
Abstract: To reduce memory footprint and run-time latency, techniques such as neural network pruning and binarization have been explored separately. However, it is unclear how to combine the best of the two worlds to get extremely small and efficient models. In this paper, we, for the first time, define the filter-level pruning problem for binary neural networks, which cannot be solved by simply migrating existing structural pruning methods for full-precision models. A novel learning-based approach is proposed to prune filters in our main/subsidiary network framework, where the main network is responsible for learning representative features to optimize the prediction performance, and the subsidiary component works as a filter selector on the main network. To avoid gradient mismatch when training the subsidiary component, we propose a layer-wise and bottom-up scheme. We also provide the theoretical and experimental comparison between our learning-based and greedy rule-based methods. Finally, we empirically demonstrate the effectiveness of our approach applied on several binary models, including binarized NIN, VGG-11, and ResNet-18, on various image classification datasets.
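The main/subsidiary split can be sketched as a learned gate per output filter wrapped around each conv layer of the main (binary) network; this is a schematic of the idea, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class FilterSelector(nn.Module):
    """Subsidiary component: one learnable gate per output filter of the
    main network's conv layer. Gates are trained layer-wise, bottom-up,
    to sidestep the gradient-mismatch problem noted above."""
    def __init__(self, conv):
        super().__init__()
        self.conv = conv
        self.logit = nn.Parameter(torch.zeros(conv.out_channels))

    def forward(self, x):
        gate = torch.sigmoid(self.logit).view(1, -1, 1, 1)  # soft keep/drop
        return self.conv(x) * gate

    def filters_to_prune(self, thresh=0.5):
        return (torch.sigmoid(self.logit) < thresh).nonzero().flatten()
```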

Posted Content
TL;DR: This paper proposes a novel type of aesthetic QR code, the Stylize aEsthEtic (SEE) QR code, and a three-stage approach to automatically produce such robust style-oriented codes; it also designs a module-based robustness-optimization mechanism that ensures robust performance by balancing two competing terms: visual quality and readability.
Abstract: With the continued proliferation of smart mobile devices, the Quick Response (QR) code has become one of the most-used types of two-dimensional code in the world. Aiming at beautifying the visually unpleasant appearance of QR codes, existing works have developed a series of techniques. However, these works still leave much to be desired in areas such as personalization, artistry, and robustness. To address these issues, in this paper we propose a novel type of aesthetic QR code, the SEE (Stylize aEsthEtic) QR code, and a three-stage approach to automatically produce such robust style-oriented codes. Specifically, in the first stage, we propose a method to generate an optimized baseline aesthetic QR code, which reduces the visual contrast between the noise-like black/white modules and the blended image. In the second stage, to obtain an art-style QR code, we tailor an appropriate neural style transformation network to endow the baseline aesthetic QR code with artistic elements. In the third stage, we design an error-correction mechanism that balances two competing terms, visual quality and readability, to ensure robust performance. Extensive experiments demonstrate that the SEE QR code has high quality in terms of both visual appearance and robustness, and also offers a greater variety of personalized choices to users.
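The readability half of that visual-quality/readability trade-off can be probed module by module: downsample the stylized render to the code's module grid and count bit disagreements, which must stay within the QR error-correction capacity. A toy check (our construction, not the paper's exact mechanism):

```python
import numpy as np

def module_error_rate(stylized_gray, code_bits):
    """Fraction of QR modules whose average intensity decodes to the wrong
    bit. stylized_gray: (H, W) in [0, 1]; code_bits: (m, m) matrix of 0/1."""
    m = code_bits.shape[0]
    h, w = stylized_gray.shape
    crop = stylized_gray[: h - h % m, : w - w % m]
    modules = crop.reshape(m, h // m, m, w // m).mean(axis=(1, 3))
    decoded = (modules < 0.5).astype(int)         # dark module -> bit 1
    return float((decoded != code_bits).mean())   # keep under EC capacity
```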

Posted Content
TL;DR: This work constructs a stratified low-dimensional quasi-natural image space that faithfully approximates the natural image space while also removing adversarial perturbations and introduces a novel Sparse Transformation Layer (STL) in between the input image and the first layer of the neural network to efficiently project images into the quasi-Natural image space.
Abstract: We propose an adversarial defense method that achieves state-of-the-art performance among attack-agnostic adversarial defense methods while also maintaining robustness to input resolution, scale of adversarial perturbation, and scale of dataset size. Based on convolutional sparse coding, we construct a stratified low-dimensional quasi-natural image space that faithfully approximates the natural image space while also removing adversarial perturbations. We introduce a novel Sparse Transformation Layer (STL) in between the input image and the first layer of the neural network to efficiently project images into our quasi-natural image space. Our experiments show state-of-the-art performance of our method compared to other attack-agnostic adversarial defense methods in various adversarial settings.
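Projecting into the span of a sparse-coding dictionary is classically done with ISTA; the sketch below shows that projection for a fixed dictionary D, as a generic stand-in to convey what a Sparse Transformation Layer computes, not the paper's implementation.

```python
import numpy as np

def ista_project(D, x, lam=0.1, n_iters=100):
    """Solve min_z 0.5*||D z - x||^2 + lam*||z||_1, then reconstruct.

    The sparse reconstruction D @ z lies in the low-dimensional
    "quasi-natural" span of the dictionary, shedding the off-manifold
    components that adversarial perturbations tend to occupy.
    """
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iters):
        g = z - D.T @ (D @ z - x) / L  # gradient step on the data term
        z = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return D @ z
```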

Journal ArticleDOI
TL;DR: This paper builds a framework to detect primitives from images in a layered manner by modifying the YOLO network, and uses an RNN with a novel loss function to equip this network with the capability to predict primitives with a variable number of parameters.
Abstract: The perception of the visual world through basic building blocks, such as cubes, spheres, and cones, gives human beings a parsimonious understanding of the visual world. Thus, efforts to find primitive-based geometric interpretations of visual data date back to studies of visual media in the 1970s. However, due to the difficulty of primitive fitting in the pre-deep-learning age, this research approach faded from the main stage, and the vision community turned primarily to semantic image understanding. In this paper, we revisit the classical problem of building geometric interpretations of images, using supervised deep learning tools. We build a framework to detect primitives from images in a layered manner by modifying the YOLO network; an RNN with a novel loss function is then used to equip this network with the capability to predict primitives with a variable number of parameters. We compare our pipeline to traditional and other baseline learning methods, demonstrating that our layered detection model achieves higher accuracy and performs better reconstruction.

Proceedings ArticleDOI
14 Jun 2018
TL;DR: This paper presents a new approach to tackling the joint misalignment problems of conventional rigid exoskeletons, which existing solutions address with complex mechanisms (e.g., cam mechanisms or five-bar linkages) or with control algorithms that compensate for inertia and render low impedance to the wearers.
Abstract: Roboticists have developed a diverse array of powered exoskeletons for human augmentation and rehabilitation over the last few decades. One of the key design objectives is to minimize discomfort to enhance the user experience. The high inertia and joint misalignment of conventional rigid exoskeletons are two key factors that cause these problems. Different types of control algorithms have been developed to compensate for the inertia and render low impedance to the wearers [1-2]. In addition to the high inertia, the misalignment between exoskeleton joints and the musculoskeletal joints of wearers can cause detrimental forces [3-4]. Conventionally, the mechanical knee joints of rigid knee exoskeletons are treated as a simple 1-degree-of-freedom (DOF) hinge mechanism, but the biological knee possesses complex kinematic characteristics. When this kind of 1-DOF exoskeleton and the wearer's limb form a closed kinematic chain, both kinematic and kinetic interference will inevitably occur. There are two existing solutions to the joint misalignment problem. One method aims to use complex mechanisms (e.g., cam mechanisms or five-bar linkages) to

Posted Content
TL;DR: This paper contributes the first large-scale database suitable for 3D car instance understanding, ApolloCar3D, and builds various baseline algorithms with state-of-the-art deep convolutional neural networks.
Abstract: Autonomous driving has attracted remarkable attention from both industry and academia. An important task is to estimate 3D properties (e.g., translation, rotation, and shape) of a moving or parked vehicle on the road. This task, while critical, is still under-researched in the computer vision community, partially owing to the lack of a large-scale, fully-annotated 3D car database suitable for autonomous driving research. In this paper, we contribute the first large-scale database suitable for 3D car instance understanding, ApolloCar3D. The dataset contains 5,277 driving images and over 60K car instances, where each car is fitted with an industry-grade 3D CAD model with absolute model size and semantically labelled keypoints. This dataset is more than 20 times larger than PASCAL3D+ and KITTI, the current state of the art. To enable efficient labelling in 3D, we build a pipeline by considering 2D-3D keypoint correspondences for a single instance and 3D relationships among multiple instances. Equipped with this dataset, we build various baseline algorithms with state-of-the-art deep convolutional neural networks. Specifically, we first segment each car with a pre-trained Mask R-CNN, and then regress towards its 3D pose and shape based on a deformable 3D car model, with or without using semantic keypoints. We show that using keypoints significantly improves fitting performance. Finally, we develop a new 3D metric jointly considering 3D pose and 3D shape, allowing for comprehensive evaluation and ablation study. By comparing with human performance, we suggest several future directions for further improvement.
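With 2D-3D keypoint correspondences and known camera intrinsics, pose fitting is a PnP problem; a minimal OpenCV version follows, as a generic baseline in the spirit of the keypoint-based fitting described above rather than the paper's exact method.

```python
import cv2
import numpy as np

def fit_car_pose(kps_3d, kps_2d, K):
    """Recover rotation and translation of a car CAD model from its
    labelled semantic keypoints via RANSAC-robustified PnP.

    kps_3d: (N, 3) keypoints on the CAD model (N >= 4).
    kps_2d: (N, 2) annotated image locations of the same keypoints.
    K:      (3, 3) camera intrinsic matrix.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        kps_3d.astype(np.float32), kps_2d.astype(np.float32), K, None)
    R, _ = cv2.Rodrigues(rvec)   # axis-angle -> 3x3 rotation matrix
    return R, tvec, inliers
```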


Proceedings ArticleDOI
14 Jun 2018
TL;DR: The contribution of this work is the development of a custom high-torque density electric actuator and its application.
Abstract: BACKGROUND The human hand has extraordinary dexterity, with more than 20 degrees of freedom (DOF) actuated by lightweight and efficient biological actuators (i.e., muscles). The average weight of the human hand is only 400 g [1]. Over the last few decades, research and commercialization efforts have been dedicated to the development of novel robotic hands for humanoid or prosthetic applications with dexterous and biomimetic designs [2]. However, due to the limitations of existing electric motors in terms of torque density and energy efficiency, the design of humanoid hands has had to compromise between dexterity and weight. For example, the commercial prosthetic terminal devices i-Limb [3] and Bebionic [4] prioritize the lightweight need (450 g) and use 5 motor DOFs to under-actuate 11 joints, which can realize only a few basic grasp postures. On the other hand, some humanoid robot hand devices like the DLR-HIT I & II hands [5] prioritize the dexterity need (15 DOF) but weigh more than four times as much as their biological counterparts (2200 g and 1500 g, respectively). The contribution of this work is the development of a custom high-torque-density electric actuator and its application.

Book ChapterDOI
TL;DR: This work describes the detailed procedure for synthesis of an ABC Mikto-arm star peptide conjugate in which two immiscible entities are conjugated onto a short β-sheet forming peptide sequence, GNNQQNY, derived from the Sup35 prion, through a lysine junction.
Abstract: Mikto-arm star peptide conjugates are an emerging class of self-assembling peptide-based structural units that contain three or more auxiliary segments of different chemical compositions and/or functionalities. This group of molecules exhibits interesting self-assembly behavior in solution due to its chemically asymmetric topology. Here we describe the detailed procedure for the synthesis of an ABC Mikto-arm star peptide conjugate in which two immiscible entities (a saturated hydrocarbon and a hydrophobic and lipophobic fluorocarbon) are conjugated onto a short β-sheet-forming peptide sequence, GNNQQNY, derived from the Sup35 prion, through a lysine junction. Automated and manual Fmoc solid-phase synthesis techniques are used to synthesize the Mikto-arm star peptide conjugates, followed by HPLC purification. We envision that this set of protocols can afford a versatile platform for synthesizing a new class of peptidic building units for diverse applications.