
Showing papers by "Hao Su published in 2019"


Proceedings ArticleDOI
15 Jun 2019
TL;DR: PartNet as discussed by the authors is a large-scale dataset of 3D objects annotated with fine-grained, instance-level, and hierarchical 3D part information, consisting of 573,585 part instances over 26,671 3D models.
Abstract: We present PartNet: a consistent, large-scale dataset of 3D objects annotated with fine-grained, instance-level, and hierarchical 3D part information. Our dataset consists of 573,585 part instances over 26,671 3D models covering 24 object categories. This dataset enables and serves as a catalyst for many tasks such as shape analysis, dynamic 3D scene modeling and simulation, affordance analysis, and others. Using our dataset, we establish three benchmarking tasks for evaluating 3D part recognition: fine-grained semantic segmentation, hierarchical semantic segmentation, and instance segmentation. We benchmark four state-of-the-art 3D deep learning algorithms for fine-grained semantic segmentation and three baseline methods for hierarchical semantic segmentation. We also propose a baseline method for part instance segmentation and demonstrate its superior performance over existing methods.

487 citations
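As a rough illustration of the kind of metric the fine-grained semantic segmentation benchmark implies, here is a minimal per-part-category mean-IoU sketch in NumPy; the function and array names are illustrative assumptions, not PartNet's official evaluation code:

```python
import numpy as np

def part_miou(pred, gt, num_parts):
    """Mean IoU over part categories for one point cloud.

    pred, gt: integer arrays of shape (N,) with per-point part labels.
    Part categories absent from both prediction and ground truth are skipped.
    """
    ious = []
    for part in range(num_parts):
        p, g = pred == part, gt == part
        union = np.logical_or(p, g).sum()
        if union == 0:          # part does not occur in this shape; skip it
            continue
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else 0.0

# Toy usage with random labels over 1000 points and 4 part categories.
rng = np.random.default_rng(0)
print(part_miou(rng.integers(0, 4, 1000), rng.integers(0, 4, 1000), 4))
```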


Proceedings ArticleDOI
01 Oct 2019
TL;DR: Point-MVSNet as discussed by the authors predicts the depth in a coarse-to-fine manner by generating a coarse depth map, converting it into a point cloud and refining the point cloud iteratively by estimating the residual between the depth of the current iteration and the ground truth.
Abstract: We introduce Point-MVSNet, a novel point-based deep framework for multi-view stereo (MVS). Distinct from existing cost volume approaches, our method directly processes the target scene as point clouds. More specifically, our method predicts the depth in a coarse-to-fine manner. We first generate a coarse depth map, convert it into a point cloud and refine the point cloud iteratively by estimating the residual between the depth of the current iteration and that of the ground truth. Our network leverages 3D geometry priors and 2D texture information jointly and effectively by fusing them into a feature-augmented point cloud, and processes the point cloud to estimate the 3D flow for each point. This point-based architecture allows higher accuracy, more computational efficiency and more flexibility than cost-volume-based counterparts. Experimental results show that our approach achieves a significant improvement in reconstruction quality compared with state-of-the-art methods on the DTU and the Tanks and Temples dataset. Our source code and trained models are available at https://github.com/callmeray/PointMVSNet.

246 citations
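A minimal sketch of the coarse-to-fine loop described above: unproject the current depth map to a point cloud, estimate a per-point depth residual, and update. `predict_residual` is a hypothetical stand-in for the paper's learned, feature-augmented 3D flow network:

```python
import numpy as np

def unproject(depth, K):
    """Lift a depth map (H, W) to camera-space 3D points via intrinsics K."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    return (np.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)

def refine_depth(coarse_depth, K, predict_residual, num_iters=2):
    """Coarse-to-fine refinement: the residual plays the role of the estimated
    difference between the current depth and the (unknown) ground truth."""
    depth = coarse_depth.copy()
    for _ in range(num_iters):
        points = unproject(depth, K)            # the point cloud being refined
        residual = predict_residual(points)     # learned per-point 3D flow
        depth = depth + residual.reshape(depth.shape)
    return depth

# Toy usage with a dummy "network" that nudges every point outward by 1 cm.
K = np.array([[500., 0., 64.], [0., 500., 48.], [0., 0., 1.]])
coarse = np.full((96, 128), 2.0)
refined = refine_depth(coarse, K, lambda pts: np.full(len(pts), 0.01))
print(refined.mean())   # ~2.02 after two refinement iterations
```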


Journal ArticleDOI
TL;DR: In this paper, a hierarchical graph network is proposed to encode shapes represented as n-ary graphs, which can be robustly trained on large and complex shape families, and can be used to generate a great diversity of realistic structured shape geometries.
Abstract: The ability to generate novel, diverse, and realistic 3D shapes along with associated part semantics and structure is central to many applications requiring high-quality 3D assets or large volumes of realistic training data. A key challenge towards this goal is how to accommodate diverse shape variations, including both continuous deformations of parts as well as structural or discrete alterations which add to, remove from, or modify the shape constituents and compositional structure. Such object structure can typically be organized into a hierarchy of constituent object parts and relationships, represented as a hierarchy of n-ary graphs. We introduce StructureNet, a hierarchical graph network which (i) can directly encode shapes represented as such n-ary graphs; (ii) can be robustly trained on large and complex shape families; and (iii) can be used to generate a great diversity of realistic structured shape geometries. Technically, we accomplish this by drawing inspiration from recent advances in graph neural networks to propose an order-invariant encoding of n-ary graphs, considering jointly both part geometry and inter-part relations during network training. We extensively evaluate the quality of the learned latent spaces for various shape families and show significant advantages over baseline and competing methods. The learned latent spaces enable several structure-aware geometry processing applications, including shape generation and interpolation, shape editing, or shape structure discovery directly from un-annotated images, point clouds, or partial scans.

146 citations
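To make the order-invariant encoding concrete, here is a minimal PyTorch sketch of a recursive encoder over an n-ary part hierarchy that max-pools child codes (pooling is what buys invariance to child order). It is a toy under stated assumptions: nodes are plain dicts, part geometry is a fixed-size feature, and the paper's inter-part relation edges and decoder are omitted:

```python
import torch
import torch.nn as nn

class HierEncoder(nn.Module):
    """Recursive encoder: child codes are max-pooled, so the encoding is
    invariant to the order of a node's children (n-ary, any fan-out)."""
    def __init__(self, geo_dim=32, code_dim=64):
        super().__init__()
        self.leaf = nn.Linear(geo_dim, code_dim)            # encode part geometry
        self.merge = nn.Linear(code_dim + geo_dim, code_dim)

    def forward(self, node):
        if not node["children"]:
            return torch.relu(self.leaf(node["geo"]))
        child_codes = torch.stack([self.forward(c) for c in node["children"]])
        pooled = child_codes.max(dim=0).values              # permutation-invariant
        return torch.relu(self.merge(torch.cat([pooled, node["geo"]])))

# Toy hierarchy: a "chair" node with two unordered child parts.
g = lambda: torch.randn(32)
chair = {"geo": g(), "children": [{"geo": g(), "children": []},
                                  {"geo": g(), "children": []}]}
print(HierEncoder()(chair).shape)   # torch.Size([64])
```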


Posted Content
TL;DR: StructureNet is introduced, a hierarchical graph network which can directly encode shapes represented as such n-ary graphs, and can be robustly trained on large and complex shape families and used to generate a great diversity of realistic structured shape geometries.
Abstract: The ability to generate novel, diverse, and realistic 3D shapes along with associated part semantics and structure is central to many applications requiring high-quality 3D assets or large volumes of realistic training data. A key challenge towards this goal is how to accommodate diverse shape variations, including both continuous deformations of parts as well as structural or discrete alterations which add to, remove from, or modify the shape constituents and compositional structure. Such object structure can typically be organized into a hierarchy of constituent object parts and relationships, represented as a hierarchy of n-ary graphs. We introduce StructureNet, a hierarchical graph network which (i) can directly encode shapes represented as such n-ary graphs; (ii) can be robustly trained on large and complex shape families; and (iii) can be used to generate a great diversity of realistic structured shape geometries. Technically, we accomplish this by drawing inspiration from recent advances in graph neural networks to propose an order-invariant encoding of n-ary graphs, considering jointly both part geometry and inter-part relations during network training. We extensively evaluate the quality of the learned latent spaces for various shape families and show significant advantages over baseline and competing methods. The learned latent spaces enable several structure-aware geometry processing applications, including shape generation and interpolation, shape editing, or shape structure discovery directly from un-annotated images, point clouds, or partial scans.

134 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: The ApolloCar3D dataset as discussed by the authors contains 5,277 driving images and over 60k car instances, where each car is fitted with an industry-grade 3D CAD model with absolute model size and semantically labeled keypoints.
Abstract: Autonomous driving has attracted remarkable attention from both industry and academia. An important task is to estimate 3D properties (e.g. translation, rotation and shape) of a moving or parked vehicle on the road. This task, while critical, is still under-researched in the computer vision community – partially owing to the lack of a large-scale, fully-annotated 3D car database suitable for autonomous driving research. In this paper, we contribute the first large-scale database suitable for 3D car instance understanding – ApolloCar3D. The dataset contains 5,277 driving images and over 60K car instances, where each car is fitted with an industry-grade 3D CAD model with absolute model size and semantically labelled keypoints. This dataset is over 20× larger than PASCAL3D+ and KITTI, the current state of the art. To enable efficient labelling in 3D, we build a pipeline by considering 2D-3D keypoint correspondences for a single instance and 3D relationships among multiple instances. Equipped with such a dataset, we build various baseline algorithms with state-of-the-art deep convolutional neural networks. Specifically, we first segment each car with a pre-trained Mask R-CNN, and then regress towards its 3D pose and shape based on a deformable 3D car model with or without using semantic keypoints. We show that using keypoints significantly improves fitting performance. Finally, we develop a new 3D metric jointly considering 3D pose and 3D shape, allowing for comprehensive evaluation and ablation study.

129 citations
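The 2D-3D keypoint correspondences at the heart of the labelling pipeline and the keypoint-based baselines boil down to a PnP problem. A minimal sketch with OpenCV, where the model keypoints, image keypoints, and intrinsics are all made-up numbers, not ApolloCar3D data:

```python
import numpy as np
import cv2

# Hypothetical inputs: 3D keypoints on a CAD car model (model frame, meters)
# and their detected 2D image locations; K is the camera intrinsic matrix.
model_kps = np.array([[1.9, 0.6, 0.4], [-1.9, 0.6, 0.4],
                      [1.9, -0.6, 0.4], [-1.9, -0.6, 0.4],
                      [0.0, 0.0, 1.3], [0.0, 0.8, 0.9]], dtype=np.float64)
image_kps = np.array([[640, 420], [510, 430], [655, 455],
                      [520, 470], [585, 380], [605, 400]], dtype=np.float64)
K = np.array([[1400, 0, 960], [0, 1400, 540], [0, 0, 1]], dtype=np.float64)

# Solve for the car's 6-DoF pose from the 2D-3D correspondences.
ok, rvec, tvec = cv2.solvePnP(model_kps, image_kps, K, distCoeffs=None)
R, _ = cv2.Rodrigues(rvec)   # 3x3 rotation; tvec is the car's translation
print(ok, tvec.ravel())
```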


Proceedings ArticleDOI
01 Jun 2019
TL;DR: The Binary Ensemble Neural Network (BENN) is proposed, which leverages ensemble methods to improve the performance of BNNs with limited efficiency cost and can even surpass the accuracy of the full-precision floating number network with the same architecture.
Abstract: Binary neural networks (BNN) have been studied extensively since they run dramatically faster at lower memory and power consumption than floating-point networks, thanks to the efficiency of bit operations. However, contemporary BNNs whose weights and activations are both single bits suffer from severe accuracy degradation. To understand why, we investigate the representation ability, speed and bias/variance of BNNs through extensive experiments. We conclude that the error of BNNs is predominantly caused by intrinsic instability (training time) and non-robustness (train & test time). Inspired by this investigation, we propose the Binary Ensemble Neural Network (BENN) which leverages ensemble methods to improve the performance of BNNs with limited efficiency cost. While ensemble techniques have been broadly believed to be only marginally helpful for strong classifiers such as deep neural networks, our analysis and experiments show that they are naturally a perfect fit to boost BNNs. We find that our BENN, which is faster and more robust than state-of-the-art binary networks, can even surpass the accuracy of the full-precision floating number network with the same architecture.

121 citations
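A minimal sketch of the idea: each ensemble member is a network whose weights and activations are binarized to ±1, and the ensemble averages the members' logits. This is an inference-only toy (no straight-through-estimator training, no boosting), so it only gestures at BENN's actual training procedure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def binarize(t):
    """Deterministic sign binarization to +/-1."""
    return torch.where(t >= 0, torch.ones_like(t), -torch.ones_like(t))

class BinaryMLP(nn.Module):
    """Tiny BNN: both inputs/activations and weights are single-bit (+/-1)."""
    def __init__(self, d_in=784, d_hid=256, n_cls=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hid, bias=False)
        self.fc2 = nn.Linear(d_hid, n_cls, bias=False)

    def forward(self, x):
        h = binarize(F.linear(binarize(x), binarize(self.fc1.weight)))
        return F.linear(h, binarize(self.fc2.weight))   # real-valued logits

def benn_predict(members, x):
    """Bagging-style ensemble: average all members' logits, then argmax."""
    return torch.stack([m(x) for m in members]).mean(dim=0).argmax(dim=1)

members = [BinaryMLP() for _ in range(5)]   # independently trained in practice
print(benn_predict(members, torch.randn(8, 784)).shape)   # torch.Size([8])
```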


Proceedings ArticleDOI
27 Oct 2019
TL;DR: MVPNet (Multi-View PointNet), as discussed by the authors, aggregates 2D multi-view image features into 3D point clouds, and then uses a point-based network to fuse the features in 3D canonical space to predict 3D semantic labels.
Abstract: Fusion of 2D images and 3D point clouds is important because information from dense images can enhance sparse point clouds. However, fusion is challenging because 2D and 3D data live in different spaces. In this work, we propose MVPNet (Multi-View PointNet), where we aggregate 2D multi-view image features into 3D point clouds, and then use a point-based network to fuse the features in 3D canonical space to predict 3D semantic labels. To this end, we introduce view selection along with a 2D-3D feature aggregation module. Extensive experiments show the benefit of leveraging features from dense images and reveal superior robustness to varying point cloud density compared to 3D-only methods. On the ScanNetV2 benchmark, our MVPNet significantly outperforms prior point cloud based approaches on the task of 3D Semantic Segmentation. It is much faster to train than the large networks of the sparse voxel approach. We provide solid ablation studies to ease the future design of 2D-3D fusion methods and their extension to other tasks, as we showcase for 3D instance segmentation.

116 citations
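A minimal NumPy sketch of the 2D-3D feature aggregation step: project each 3D point into every selected view and gather the CNN feature at the nearest pixel, producing per-point features that a point-based network can consume. Real MVPNet details (view selection, interpolation, the fusion network) are omitted, and all shapes and names are assumptions:

```python
import numpy as np

def aggregate_2d_features(points, feat_maps, Ks, Rts):
    """Lift per-view CNN features onto 3D points by projection (nearest pixel).

    points:    (N, 3) world-space points, assumed in front of every camera
    feat_maps: list of (C, H, W) feature maps, one per selected view
    Ks, Rts:   per-view intrinsics (3, 3) and extrinsics (3, 4), world -> camera
    Returns (N, C * num_views) point features for a point-based fusion network.
    """
    N = points.shape[0]
    homog = np.concatenate([points, np.ones((N, 1))], axis=1)  # (N, 4)
    per_view = []
    for feat, K, Rt in zip(feat_maps, Ks, Rts):
        cam = (Rt @ homog.T).T                     # (N, 3) camera coordinates
        uvw = (K @ cam.T).T
        u = np.clip((uvw[:, 0] / uvw[:, 2]).round().astype(int), 0, feat.shape[2] - 1)
        v = np.clip((uvw[:, 1] / uvw[:, 2]).round().astype(int), 0, feat.shape[1] - 1)
        per_view.append(feat[:, v, u].T)           # (N, C) sampled features
    return np.concatenate(per_view, axis=1)
```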


Journal ArticleDOI
TL;DR: This paper synthesizes novel viewpoints across a wide range of viewing directions (covering a 60° cone) from a sparse set of just six viewing directions, based on a deep convolutional network trained to directly synthesize new views from the six input views.
Abstract: The goal of light transport acquisition is to take images from a sparse set of lighting and viewing directions, and combine them to enable arbitrary relighting with changing view. While relighting from sparse images has received significant attention, there has been relatively little progress on view synthesis from a sparse set of "photometric" images: images captured under controlled conditions, lit by a single directional source; we use a spherical gantry to position the camera on a sphere surrounding the object. In this paper, we synthesize novel viewpoints across a wide range of viewing directions (covering a 60° cone) from a sparse set of just six viewing directions. While our approach relates to previous view synthesis and image-based rendering techniques, those methods are usually restricted to much smaller baselines and are captured under environment illumination. At our baselines, input images have few correspondences and large occlusions; however, we benefit from structured photometric images. Our method is based on a deep convolutional network trained to directly synthesize new views from the six input views. This network combines 3D convolutions on a plane sweep volume with a novel per-view per-depth plane attention map prediction network to effectively aggregate multi-view appearance. We train our network with a large-scale synthetic dataset of 1000 scenes with complex geometry and material properties. In practice, it is able to synthesize novel viewpoints for captured real data and reproduces complex appearance effects like occlusions, view-dependent specularities and hard shadows. Moreover, the method can also be combined with previous relighting techniques to enable changing both lighting and view, and applied to computer vision problems like multiview stereo from sparse image sets.

109 citations
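One slice of the plane sweep volume mentioned above can be built by homography warping; here is a minimal NumPy sketch (nearest-neighbor sampling, fronto-parallel planes, invented camera parameters). The paper's network stacks such slices over many candidate depths and predicts per-view, per-depth attention to fuse them:

```python
import numpy as np

def warp_to_ref(src_img, K, R, t, depth):
    """Inverse-warp a source image onto the reference view assuming the scene
    lies on a fronto-parallel plane at the given depth (one PSV slice).
    R, t map reference-camera coordinates to source-camera coordinates."""
    n = np.array([[0.0, 0.0, 1.0]])                       # plane normal, ref frame
    H = K @ (R + t.reshape(3, 1) @ n / depth) @ np.linalg.inv(K)
    h, w = src_img.shape[:2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    ref_pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T
    src_pix = H @ ref_pix                                  # ref pixel -> src pixel
    su = np.clip((src_pix[0] / src_pix[2]).round().astype(int), 0, w - 1)
    sv = np.clip((src_pix[1] / src_pix[2]).round().astype(int), 0, h - 1)
    return src_img[sv, su].reshape(src_img.shape)

# Sanity check: identity pose returns the image unchanged at any depth.
img = np.random.rand(48, 64)
K = np.array([[60., 0., 32.], [0., 60., 24.], [0., 0., 1.]])
print(np.allclose(warp_to_ref(img, K, np.eye(3), np.zeros(3), 2.0), img))  # True
```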


Posted Content
TL;DR: Overall, it is found that 3D point cloud classifiers are vulnerable to adversarial attacks, but they are also more easily defensible compared to 2D image classifiers.
Abstract: 3D object classification and segmentation using deep neural networks has been extremely successful. As the problem of identifying 3D objects has many safety-critical applications, the neural networks have to be robust against adversarial changes to the input data set. There is a growing body of research on generating human-imperceptible adversarial attacks and defenses against them in the 2D image classification domain. However, 3D objects differ from 2D images in various ways, and this specific domain has not been rigorously studied so far. We present a preliminary evaluation of adversarial attacks on deep 3D point cloud classifiers, namely PointNet and PointNet++, by evaluating both white-box and black-box adversarial attacks that were proposed for 2D images and extending those attacks to reduce the perceptibility of the perturbations in 3D space. We also show the high effectiveness of simple defenses against those attacks by proposing new defenses that exploit the unique structure of 3D point clouds. Finally, we attempt to explain the effectiveness of the defenses through the intrinsic structures of both the point clouds and the neural network architectures. Overall, we find that networks that process 3D point cloud data are vulnerable to adversarial attacks, but they are also more easily defensible compared to 2D image classifiers. Our investigation will provide the groundwork for future studies on improving the robustness of deep neural networks that handle 3D data.

76 citations
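A minimal sketch of one point-removal-style defense of the kind the paper motivates: drop points whose k-NN distances are statistical outliers, since adversarial points pushed off the surface tend to be isolated from their neighbors. This is a generic illustration, not the paper's exact defense:

```python
import numpy as np

def outlier_removal_defense(points, k=10, std_factor=1.0):
    """Drop points whose mean k-NN distance is unusually large: a simple
    point-removal defense exploiting the structure of 3D point clouds."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    knn = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)   # skip self-distance
    keep = knn < knn.mean() + std_factor * knn.std()
    return points[keep]

cloud = np.random.randn(1024, 3) * 0.1
cloud[:5] += 2.0                  # a few far-away "adversarial" points
print(outlier_removal_defense(cloud).shape)   # the 5 outliers are removed
```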


Proceedings ArticleDOI
01 Sep 2019
TL;DR: In this article, a preliminary evaluation of adversarial attacks on 3D point cloud classifiers was conducted by evaluating adversarial attacks originally proposed for 2D images and extending them to reduce the perceptibility of the perturbations in 3D space.
Abstract: 3D object classification using deep neural networks has been extremely successful. As the problem of identifying 3D objects has many safety-critical applications, the neural networks have to be robust against adversarial changes to the input data set. We present a preliminary evaluation of adversarial attacks on 3D point cloud classifiers by evaluating adversarial attacks that were proposed for 2D images, and extending those attacks to reduce the perceptibility of the perturbations in 3D space. We also show the effectiveness of simple defenses against those attacks. Finally, we attempt to explain the effectiveness of the defenses through the intrinsic structures of both the point clouds and the neural networks. Overall, we find that 3D point cloud classifiers are vulnerable to adversarial attacks, but they are also more easily defensible compared to 2D image classifiers. Our investigation will provide the groundwork for future studies on improving the robustness of deep neural networks that handle 3D data.

76 citations


Posted Content
Yuzhe Qin, Rui Chen, Hao Zhu, Meng Song, Jing Xu, Hao Su 
TL;DR: This paper studies the problem of 6-DoF grasping by a parallel gripper in a cluttered scene captured using a commodity depth sensor from a single viewpoint and proposes a single-shot grasp proposal network, trained with synthetic data and tested in real-world scenarios.
Abstract: Grasping is among the most fundamental and long-lasting problems in robotics study. This paper studies the problem of 6-DoF (degree of freedom) grasping by a parallel gripper in a cluttered scene captured using a commodity depth sensor from a single viewpoint. We address the problem in a learning-based framework. At the high level, we rely on a single-shot grasp proposal network, trained with synthetic data and tested in real-world scenarios. Our single-shot neural network architecture can predict amodal grasp proposals efficiently and effectively. Our training data synthesis pipeline can generate scenes of complex object configuration and leverage an innovative gripper contact model to create dense and high-quality grasp annotations. Experiments in synthetic and real environments have demonstrated that the proposed approach can outperform state-of-the-art methods by a large margin.
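As an illustration of what a gripper contact model must check when annotating grasps, here is the classical antipodal/force-closure test for a parallel-jaw contact pair in NumPy. The paper's contact model for annotation synthesis is richer, so treat this as a simplified sketch:

```python
import numpy as np

def antipodal(p1, n1, p2, n2, mu=0.4):
    """Force-closure check for a two-finger (parallel-jaw) contact pair:
    the line between the contacts must lie inside both friction cones.
    p*: contact points, n*: inward surface normals, mu: friction coefficient."""
    axis = p2 - p1
    axis = axis / np.linalg.norm(axis)
    half_angle = np.arctan(mu)                       # friction cone half-angle
    ang1 = np.arccos(np.clip(np.dot(axis, n1), -1, 1))
    ang2 = np.arccos(np.clip(np.dot(-axis, n2), -1, 1))
    return ang1 <= half_angle and ang2 <= half_angle

# Opposing contacts on a 4 cm-wide object, normals pointing at each other.
print(antipodal(np.array([0, -0.02, 0]), np.array([0, 1.0, 0]),
                np.array([0, 0.02, 0]), np.array([0, -1.0, 0])))   # True
```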

Proceedings ArticleDOI
01 Jun 2019
TL;DR: The authors construct a stratified low-dimensional quasi-natural image space that faithfully approximates the natural image space while removing adversarial perturbations, and introduce a novel Sparse Transformation Layer (STL) between the input image and the first layer of the neural network.
Abstract: We propose an adversarial defense method that achieves state-of-the-art performance among attack-agnostic adversarial defense methods while also maintaining robustness to input resolution, scale of adversarial perturbation, and scale of dataset size. Based on convolutional sparse coding, we construct a stratified low-dimensional quasi-natural image space that faithfully approximates the natural image space while also removing adversarial perturbations. We introduce a novel Sparse Transformation Layer (STL) between the input image and the first layer of the neural network to efficiently project images into our quasi-natural image space. Our experiments show state-of-the-art performance of our method compared to other attack-agnostic adversarial defense methods in various adversarial settings.
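A minimal sketch of the sparse-coding projection idea underlying such a layer, using plain (non-convolutional) ISTA: encode the input against a dictionary with an L1 penalty, then return the reconstruction, which tends to strip small, dense perturbations. The random dictionary stands in for a learned one, and this is only the vector analogue of the paper's convolutional formulation:

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_project(x, D, lam=0.1, iters=100):
    """ISTA for min_z 0.5 * ||x - D z||^2 + lam * ||z||_1, then reconstruct.

    D: dictionary of shape (d, m). Returns D @ z, the projection of x onto
    the quasi-natural space spanned by sparse codes over D."""
    eta = 1.0 / np.linalg.norm(D, 2) ** 2        # step size from spectral norm
    z = np.zeros(D.shape[1])
    for _ in range(iters):
        z = soft_threshold(z + eta * D.T @ (x - D @ z), eta * lam)
    return D @ z

D = np.random.randn(64, 256) / 8.0               # stand-in for a learned dictionary
x = np.random.randn(64)
x_hat = sparse_project(x, D)
print(np.linalg.norm(x - x_hat))                 # residual after sparse projection
```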

Journal ArticleDOI
TL;DR: It is found that Spax monomers alone in water self-assemble into spherical micelles of approximately 6.5 nm in diameter but, in the presence of free PTX, undergo a supramolecular polymerization process to form filamentous assemblies of several micrometers in length.
Abstract: Spontaneous association above a threshold concentration is a hallmark of supramolecular polymerization, in which monomeric units self-assemble into polymeric aggregates through noncovalent interactions. This self-initiated supramolecular process differs from the conventional covalent chain-growth polymerization in that the latter often involves the use of a different chemical entity as an initiator to trigger/control the polymerization process. We report here the use of a small molecule hydrophobe, paclitaxel (PTX), as an effective promoter to induce the supramolecular polymerization of a peptide-paclitaxel conjugate, Spheropax (Spax). We found that Spax monomers alone in water self-assemble into spherical micelles of approximately 6.5 nm in diameter but, in the presence of free PTX, undergo a supramolecular polymerization process to form filamentous assemblies of several micrometers in length. Increasing the ratio of promoter to monomer (PTX/Spax) induces Spax's directional polymerization and expedites its kinetic process. We believe these findings provide important insight into the initiator-controlled supramolecular polymerization process.

Journal ArticleDOI
26 Jul 2019
TL;DR: This letter presents design and control innovations of wearable robots that tackle two barriers to widespread adoption of powered exoskeletons: restriction of human movement and versatile control of wearable co-robot systems.
Abstract: This letter presents design and control innovations of wearable robots that tackle two barriers to the widespread adoption of powered exoskeletons: restriction of human movement and versatile control of wearable co-robot systems. First, the proposed high-torque-density actuation, comprised of our customized high-torque-density motors and a low-ratio transmission mechanism, significantly reduces the mass of the robot and produces high backdrivability. Second, we derive a biomechanics model-based control that generates an assistive torque profile for versatile control of both squat and stoop lifting assistance. The control algorithm detects lifting postures using compact inertial measurement unit (IMU) sensors to generate an assistive profile that is proportional to the human joint torque produced from our model. Experimental results demonstrate that the robot exhibits low mechanical impedance (1.5 Nm backdrive torque) when it is unpowered and 0.5 Nm backdrive torque with zero-torque tracking control. The root mean square (RMS) error of torque tracking is less than 0.29 Nm (1.21% error of 24 Nm peak torque). Compared with squatting without the exoskeleton, the controller reduces the average peak EMG of the three knee extensor muscles (across 3 healthy subjects) by 87.5%, 80%, and 75% during squatting with 50% human joint torque assistance.
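A toy sketch of the control idea: estimate the gravity-induced joint torque from IMU-measured segment angles with a crude static model, classify the lifting posture, and command a fixed fraction of the estimated torque. Every number, the posture threshold, and the model itself are illustrative assumptions, not the paper's calibrated biomechanics model:

```python
import numpy as np

def assistive_torque(trunk_angle, thigh_angle, assist_ratio=0.5,
                     torso_mass=40.0, com_dist=0.25, g=9.81):
    """Biomechanics-model-based assistance, heavily simplified: a static
    gravity-torque estimate from segment angles, scaled by the assist ratio."""
    # Crude static estimate: torque needed to hold the trunk against gravity.
    human_torque = torso_mass * g * com_dist * np.sin(trunk_angle)
    # Hypothetical posture detection from the IMU-measured thigh angle.
    posture = "squat" if thigh_angle > np.radians(60) else "stoop"
    return assist_ratio * human_torque, posture

tau, mode = assistive_torque(np.radians(30), np.radians(70))
print(f"{mode}: command {tau:.1f} N·m")   # assist proportional to model torque
```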

Journal ArticleDOI
22 May 2019-ACS Nano
TL;DR: This work is able to convert an anticancer drug, paclitaxel (PTX), to a class of prodrug hydrogelators with varying critical gelation concentrations with effective cytotoxicity against glioblastoma cell lines and also primary brain cancer cells derived from patients and show enhanced tumor penetration in a cancer spheroid model.
Abstract: One key design feature in the development of any local drug delivery system is the controlled release of therapeutic agents over a certain period of time. In this context, we report the characteristic feature of a supramolecular filament hydrogel system that enables a linear and sustainable drug release over the period of several months. Through covalent linkage with a short peptide sequence, we are able to convert an anticancer drug, paclitaxel (PTX), to a class of prodrug hydrogelators with varying critical gelation concentrations. These self-assembling PTX prodrugs associate into filamentous nanostructures in aqueous conditions and consequently percolate into a supramolecular filament network in the presence of appropriate counterions. The intriguing linear drug release profile is rooted in the supramolecular nature of the self-assembling filaments which maintain a constant monomer concentration at the gelation conditions. We found that molecular engineering of the prodrug design, such as varying the number of oppositely charged amino acids or through the incorporation of hydrophobic segments, allows for the fine-tuning of the PTX linear release rate. In cell studies, these PTX prodrugs can exert effective cytotoxicity against glioblastoma cell lines and also primary brain cancer cells derived from patients and show enhanced tumor penetration in a cancer spheroid model. We believe this drug-bearing hydrogel platform offers an exciting opportunity for the local treatment of human diseases.

Posted Content
TL;DR: This work introduces Point-MVSNet, a novel point-based deep framework for multi-view stereo (MVS), which directly processes the target scene as point clouds and allows higher accuracy, more computational efficiency and more flexibility than cost-volume-based counterparts.
Abstract: We introduce Point-MVSNet, a novel point-based deep framework for multi-view stereo (MVS). Distinct from existing cost volume approaches, our method directly processes the target scene as point clouds. More specifically, our method predicts the depth in a coarse-to-fine manner. We first generate a coarse depth map, convert it into a point cloud and refine the point cloud iteratively by estimating the residual between the depth of the current iteration and that of the ground truth. Our network leverages 3D geometry priors and 2D texture information jointly and effectively by fusing them into a feature-augmented point cloud, and processes the point cloud to estimate the 3D flow for each point. This point-based architecture allows higher accuracy, more computational efficiency and more flexibility than cost-volume-based counterparts. Experimental results show that our approach achieves a significant improvement in reconstruction quality compared with state-of-the-art methods on the DTU and the Tanks and Temples dataset. Our source code and trained models are available at this https URL .

Journal ArticleDOI
14 Aug 2019
TL;DR: In this article, a spine-inspired wearable exoskeleton is presented that assists both squatting and stooping while not impeding walking motion, and that can reduce multiple types of forces along the human spine.
Abstract: Back injuries are the most prevalent work-related musculoskeletal disorders and represent a major cause of disability. Although innovations in wearable robots aim to alleviate this hazard, the majority of existing exoskeletons are obtrusive because the rigid linkage design limits natural movement, thus causing ergonomic risk. Moreover, these existing systems are typically only suitable for one type of movement assistance, not ubiquitous for a wide variety of activities. To fill this gap, this letter presents a new wearable robot design approach: the continuum soft exoskeleton. This spine-inspired wearable robot is unobtrusive and assists both squatting and stooping while not impeding walking motion. To tackle the challenge posed by the unique anatomy of the spine, which cannot appropriately be simplified as a single-degree-of-freedom joint, our robot conforms to the human anatomy and can reduce multiple types of forces along the human spine, such as the spinae muscle force and the shear and compression forces on the lumbar vertebrae. We derived kinematics and kinetics models of this mechanism and established an analytical biomechanics model of human-robot interaction. Quantitative analysis of disc compression force, disc shear force, and muscle force was performed in simulation. We further developed a virtual impedance control strategy to deliver force control and compensate for the hysteresis of the Bowden cable transmission. The feasibility of the prototype was experimentally tested on three healthy subjects. The root mean square error of force tracking is 6.63 N (3.3% of the 200 N peak force), and the controller demonstrated that it can actively regulate stiffness to the desired value. This continuum soft exoskeleton represents a feasible solution with the potential to reduce back pain across multiple activities and to relieve multiple forces along the human spine.
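The virtual impedance control strategy can be sketched as a spring-damper law rendered at the attachment point; the commanded force would then be tracked by an inner force loop through the Bowden cable. The gains are invented and the hysteresis compensation is omitted, so this is only a sketch of the control structure:

```python
def impedance_force(k_virtual, b_virtual, x_des, x, x_dot):
    """Virtual impedance law: command the force of a virtual spring-damper
    between the desired and measured cable/attachment displacement. An inner
    force loop (not shown) tracks this command through the Bowden cable."""
    return k_virtual * (x_des - x) - b_virtual * x_dot

# 2000 N/m virtual stiffness, light damping, 1 cm displacement error, at rest:
print(impedance_force(2000.0, 5.0, 0.01, 0.0, 0.0))   # 20.0 N command
```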

Posted Content
TL;DR: The authors propose a state alignment-based imitation learning method that trains the imitator to follow the state sequences in expert demonstrations as much as possible; the state alignment comes from both local and global perspectives, and the two are combined into a reinforcement learning framework via a regularized policy update objective.
Abstract: Consider an imitation learning problem in which the imitator and the expert have different dynamics models. Most current imitation learning methods fail here because they focus on imitating actions. We propose a novel state alignment-based imitation learning method to train the imitator to follow the state sequences in expert demonstrations as much as possible. The state alignment comes from both local and global perspectives, and we combine them into a reinforcement learning framework via a regularized policy update objective. We show the superiority of our method on standard imitation learning settings as well as settings where the expert and imitator have different dynamics models.
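A minimal sketch of a local state-alignment signal: reward the imitator for visiting states close to the expert's state sequence, a term that could then be mixed into a regularized policy update. The nearest-neighbor matching and exponential shaping are illustrative choices, not the paper's exact objective:

```python
import numpy as np

def alignment_reward(imitator_state, expert_states, beta=5.0):
    """Reward the imitator for reaching states near the expert trajectory,
    measured by nearest-neighbor distance (a stand-in for the paper's
    local alignment term; the global term is omitted)."""
    dists = np.linalg.norm(expert_states - imitator_state, axis=1)
    return float(np.exp(-beta * dists.min()))

# Toy expert trajectory: a random walk in a 3-D state space.
expert_traj = np.cumsum(np.random.randn(100, 3) * 0.05, axis=0)
print(alignment_reward(expert_traj[42] + 0.01, expert_traj))   # close to 1.0
```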

Journal ArticleDOI
TL;DR: To enhance the PN process via nitrifying bacteria enrichment/out-selection within a psychrophilic environment, a novel pH-DO (dissolved oxygen) control strategy was proposed, and the response of PN, kinetics, AOB enrichment, and NOB out-selection efficiency was investigated during start-up and long-term operation.

Posted Content
TL;DR: This work proposes a deep learning architecture that adapts to perform spline fitting tasks, providing results complementary to the aforementioned traditional methods.
Abstract: Reconstruction of geometry based on different input modes, such as images or point clouds, has been instrumental in the development of computer-aided design and computer graphics. Optimal implementations of these applications have traditionally involved the use of spline-based representations at their core. Most such methods attempt to solve optimization problems that minimize an output-target mismatch. However, these optimization techniques require an initialization that is close enough, as they are local methods by nature. We propose a deep learning architecture that adapts to perform spline fitting tasks, providing results complementary to the aforementioned traditional methods. We showcase the performance of our approach by reconstructing spline curves and surfaces based on input images or point clouds.
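A minimal differentiable spline-fitting sketch in PyTorch: evaluate a cubic Bezier curve from control points and regress the control points against target samples by minimizing the output-target mismatch. In the paper a network would predict the control points from an image or point cloud; plain gradient descent stands in for that learned predictor here:

```python
import torch

def bezier(ctrl, t):
    """Evaluate a cubic Bezier curve at parameters t from 4 control points."""
    b = torch.stack([(1 - t) ** 3, 3 * t * (1 - t) ** 2,
                     3 * t ** 2 * (1 - t), t ** 3], dim=1)      # (T, 4) basis
    return b @ ctrl                                             # (T, 2) curve

# Target samples on a toy 2D curve; fit control points by gradient descent.
t = torch.linspace(0, 1, 64)
target = torch.stack([t, torch.sin(3.0 * t)], dim=1)
ctrl = torch.zeros(4, 2, requires_grad=True)
opt = torch.optim.Adam([ctrl], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = ((bezier(ctrl, t) - target) ** 2).mean()             # output-target mismatch
    loss.backward()
    opt.step()
print(loss.item())   # small residual after fitting
```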

Journal ArticleDOI
TL;DR: Different self-assembled structures can be formed by varying the chirality of a functionalised dipeptide allowing gels with different properties to be prepared.
Abstract: Most low molecular weight gelators are chiral, with racemic mixtures often unable to form gels. Here, we show an example where all enantiomers, diastereomers and racemates of a single functionalized dipeptide can form gels. At high pH, different self-assembled aggregates are formed and these directly template the structures formed in the gel. Hence, solutions and gels with different properties can be accessed simply by varying the chirality. This opens up new design rules for the field.

Journal ArticleDOI
TL;DR: This review aims to provide an update on the potential and continued growth of MRI-guided stereotactic neurosurgical techniques by describing the state of the art in MR conditional stereotactic devices, both manual and robotic-assisted.
Abstract: Recent technological developments in magnetic resonance imaging (MRI) and stereotactic techniques have significantly improved surgical outcomes. Despite the advantages offered by conventional MRI-guided stereotactic neurosurgery, the robotic-assisted stereotactic approach has the potential to further improve the safety and accuracy of neurosurgeries. This review aims to provide an update on the potential and continued growth of MRI-guided stereotactic neurosurgical techniques by describing the state of the art in MR conditional stereotactic devices, both manual and robotic-assisted. The paper also presents a detailed overview of MRI-guided stereotactic devices, and of MR conditional actuators and encoders used in MR conditional robotic-assisted stereotactic devices. The review concludes with several research challenges and future perspectives, including actuator and sensor techniques, MR image guidance, and robot design issues.

Journal ArticleDOI
TL;DR: The robust tubular assembly of camptothecin analogues into functional SPs is reported, which act as universal dispersing agents in water for low-molecular-weight hydrophobes and effectively suppresses tumor growth.
Abstract: Nanostructured supramolecular polymers (SPs) are filamentous assemblies possessing a high degree of internal order and have important uses in regenerative medicine, drug delivery, and soft matter electronics. Despite recent advances in functional SPs, a challenging topic is the development of robust assembly protocols enabling the incorporation of various functional units without altering its supramolecular architecture. We report here the robust tubular assembly of camptothecin (CPT) analogues into functional SPs. Covalent linkage of two CPT moieties to various short hydrophilic segments (e.g., nonionic, cationic, anionic, and zwitterionic) leads to a class of CPT analogues that self-assemble in water into tubular SPs. Systemic administration of nonionic SPs effectively suppresses tumor growth. Furthermore, these tubular SPs act as universal dispersing agents in water for low-molecular-weight hydrophobes.

Proceedings Article
01 Jan 2019
TL;DR: The method explicitly models the environment in a hierarchical manner, with a high-level dynamic landmark-based map abstracting the visited state space, and a low-level value network to derive precise local decisions that enable the agent to reach long-range goals at the early training stage.
Abstract: An agent that has well understood the environment should be able to apply its skills to any given goal, leading to the fundamental problem of learning the Universal Value Function Approximator (UVFA). A UVFA learns to predict the cumulative rewards between all state-goal pairs. However, empirically, the value function for long-range goals is always hard to estimate and may consequently result in a failed policy. This has presented challenges to the learning process and the capability of neural networks. We propose a method to address this issue in large MDPs with sparse rewards, in which exploration and routing across remote states are both extremely challenging. Our method explicitly models the environment in a hierarchical manner, with a high-level dynamic landmark-based map abstracting the visited state space, and a low-level value network to derive precise local decisions. We use farthest point sampling to select landmark states from past experience, which improves exploration compared with simple uniform sampling. Experimentally, we show that our method enables the agent to reach long-range goals at an early training stage and achieves better performance than standard RL algorithms on a number of challenging tasks.
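The farthest point sampling step is easy to make concrete. A minimal NumPy sketch that selects well-spread landmark states from a buffer of visited states (the buffer here is random toy data standing in for replay experience):

```python
import numpy as np

def farthest_point_sampling(states, k):
    """Pick k landmark states covering the visited state space: each new
    landmark is the state farthest from all landmarks chosen so far."""
    idx = [0]                                   # arbitrary seed state
    d = np.linalg.norm(states - states[0], axis=1)
    for _ in range(k - 1):
        idx.append(int(d.argmax()))             # farthest remaining state
        d = np.minimum(d, np.linalg.norm(states - states[idx[-1]], axis=1))
    return states[idx]

visited = np.random.rand(5000, 2)               # toy stand-in for buffer states
landmarks = farthest_point_sampling(visited, 16)
print(landmarks.shape)                          # (16, 2) well-spread landmarks
```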

Yuzhe Qin, Rui Chen, Hao Zhu, Meng Song, Jing Xu, Hao Su 
01 Jan 2019
TL;DR: In this article, a single-shot grasp proposal network was proposed to predict amodal grasp proposals efficiently and effectively; its training data synthesis pipeline generates scenes of complex object configuration and leverages an innovative gripper contact model to create dense, high-quality grasp annotations.
Abstract: Grasping is among the most fundamental and long-lasting problems in robotics study. This paper studies the problem of 6-DoF (degree of freedom) grasping by a parallel gripper in a cluttered scene captured using a commodity depth sensor from a single viewpoint. We address the problem in a learning-based framework. At the high level, we rely on a single-shot grasp proposal network, trained with synthetic data and tested in real-world scenarios. Our single-shot neural network architecture can predict amodal grasp proposals efficiently and effectively. Our training data synthesis pipeline can generate scenes of complex object configuration and leverage an innovative gripper contact model to create dense and high-quality grasp annotations. Experiments in synthetic and real environments have demonstrated that the proposed approach can outperform state-of-the-art methods by a large margin.

Posted Content
TL;DR: This extended abstract demonstrates that the choice of distribution plays a major role in the performance of the trained policies in the real world, and that the parameters of this distribution can be optimized to maximize real-world performance.
Abstract: Recently, reinforcement learning (RL) algorithms have demonstrated remarkable success in learning complicated behaviors from minimally processed input. However, most of this success is limited to simulation. While there are promising successes in applying RL algorithms directly on real systems, their performance on more complex systems remains bottlenecked by the relative data inefficiency of RL algorithms. Domain randomization is a promising direction of research that has demonstrated impressive results using RL algorithms to control real robots. At a high level, domain randomization works by training a policy on a distribution of environmental conditions in simulation. If the environments are diverse enough, then the policy trained on this distribution will plausibly generalize to the real world. A human-specified design choice in domain randomization is the form and parameters of the distribution of simulated environments. It is unclear how best to pick the form and parameters of this distribution, and prior work uses hand-tuned distributions. This extended abstract demonstrates that the choice of the distribution plays a major role in the performance of the trained policies in the real world, and that the parameters of this distribution can be optimized to maximize the performance of the trained policies in the real world.
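A minimal sketch of optimizing the randomization distribution itself with a cross-entropy method: candidates are parameters of the environment distribution (here a single Gaussian mean over, say, friction), each scored by downstream policy performance. The `real_score` callback stands in for the expensive train-in-simulation / evaluate-in-the-real-world loop, and the whole parameterization is an assumption for illustration:

```python
import numpy as np

def optimize_randomization(real_score, mu=1.0, sigma=0.5, iters=20, pop=32):
    """Cross-entropy search over the *parameters of the environment
    distribution*: sample candidate means, score each by real-world policy
    performance, and refit the search distribution to the elite candidates."""
    for _ in range(iters):
        cand = np.random.randn(pop) * sigma + mu          # candidate dist. means
        scores = np.array([real_score(c) for c in cand])
        elite = cand[np.argsort(scores)[-pop // 4:]]      # keep the top quarter
        mu, sigma = elite.mean(), elite.std() + 1e-3
    return mu

# Toy objective: policies trained with friction mean near 0.3 transfer best.
print(optimize_randomization(lambda m: -(m - 0.3) ** 2))  # converges near 0.3
```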

Journal ArticleDOI
TL;DR: Although these two macromolecules possess identical compositions as "sequence isomers", the distinctly arranged POSS sequences lead to different molecular packing conformations, and further induce distinguished self-assembly behaviors in DMF/water solutions.

Posted Content
TL;DR: This work proposes MVPNet (Multi-View PointNet), where 2D multi-view image features are aggregated into 3D point clouds, and then a point-based network is used to fuse the features in 3D canonical space to predict 3D semantic labels.
Abstract: Fusion of 2D images and 3D point clouds is important because information from dense images can enhance sparse point clouds. However, fusion is challenging because 2D and 3D data live in different spaces. In this work, we propose MVPNet (Multi-View PointNet), where we aggregate 2D multi-view image features into 3D point clouds, and then use a point-based network to fuse the features in 3D canonical space to predict 3D semantic labels. To this end, we introduce view selection along with a 2D-3D feature aggregation module. Extensive experiments show the benefit of leveraging features from dense images and reveal superior robustness to varying point cloud density compared to 3D-only methods. On the ScanNetV2 benchmark, our MVPNet significantly outperforms prior point cloud based approaches on the task of 3D Semantic Segmentation. It is much faster to train than the large networks of the sparse voxel approach. We provide solid ablation studies to ease the future design of 2D-3D fusion methods and their extension to other tasks, as we showcase for 3D instance segmentation.

Proceedings ArticleDOI
20 Feb 2019
TL;DR: In this paper, the authors proposed two types of fast and energy-efficient architectures for BNN inference on FPGA and demonstrated that 80% of the computation and 40% of buffer access can be skipped by exploiting BNN similarity.
Abstract: Binarized Neural Network (BNN) removes bitwidth redundancy in classical CNNs by using a single bit (-1/+1) for network parameters and intermediate representations, which greatly reduces off-chip data transfer and storage overhead. However, a large amount of computation redundancy still exists in BNN inference. By analyzing local properties of images and the learned BNN kernel weights, we observe an average of ~78% input similarity and ~59% weight similarity among weight kernels, measured by our proposed metric in common network architectures. Thus there does exist redundancy that can be exploited to further reduce the amount of on-chip computation. Motivated by this observation, in this paper we propose two types of fast and energy-efficient architectures for BNN inference. We also provide analysis and insights for picking the better of these two strategies for different datasets and network models. By reusing the results from previous computations, many cycles of data buffer access and computation can be skipped. In experiments, we demonstrate that 80% of the computation and 40% of the buffer accesses can be skipped by exploiting BNN similarity. Thus, our design can achieve a 17% reduction in total power consumption, a 54% reduction in on-chip power consumption, and a 2.4× maximum speedup, compared to the baseline without applying our reuse technique. Our design also shows 1.9× more area-efficiency compared to the state-of-the-art BNN inference design. We believe our deployment of BNN on FPGA leads to a promising future of running deep learning models on mobile devices.
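The computation reuse behind these savings is easy to demonstrate with ±1 arithmetic: the dot product of an input with a new kernel equals the previous kernel's result plus a correction over only the positions where the two kernels differ, which is what lets similar kernels skip most buffer reads and popcounts in hardware. A NumPy sketch of that identity (the FPGA datapath itself is, of course, not shown):

```python
import numpy as np

def reuse_dot(x, w_prev, out_prev, w_next):
    """Reuse a previous binary dot product: with +/-1 values, the new result
    differs from the old one only at positions where the kernels disagree,
    and each disagreement contributes 2 * w_next[i] * x[i]."""
    diff = np.nonzero(w_prev != w_next)[0]
    # dot(w_next, x) = dot(w_prev, x) + sum over diff of (w_next - w_prev) * x
    return out_prev + 2 * np.sum(w_next[diff] * x[diff])

rng = np.random.default_rng(1)
x = rng.choice([-1, 1], 256)
w1 = rng.choice([-1, 1], 256)
w2 = w1.copy()
w2[rng.choice(256, 16, replace=False)] *= -1     # ~94% similar to w1
assert reuse_dot(x, w1, int(w1 @ x), w2) == w2 @ x
print("recomputed only", np.sum(w1 != w2), "of 256 positions")
```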

Posted Content
TL;DR: Three possible shape attacks on 3D point cloud classification are explored, and it is shown that some remain effective even against preprocessing steps, like the previously proposed point-removal defenses.
Abstract: The importance of training robust neural network grows as 3D data is increasingly utilized in deep learning for vision tasks in robotics, drone control, and autonomous driving. One commonly used 3D data type is 3D point clouds, which describe shape information. We examine the problem of creating robust models from the perspective of the attacker, which is necessary in understanding how 3D neural networks can be exploited. We explore two categories of attacks: distributional attacks that involve imperceptible perturbations to the distribution of points, and shape attacks that involve deforming the shape represented by a point cloud. We explore three possible shape attacks for attacking 3D point cloud classification and show that some of them are able to be effective even against preprocessing steps, like the previously proposed point-removal defenses.
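A minimal sketch of the distributional side of this taxonomy: a one-step gradient (FGSM-style) perturbation applied directly to point coordinates. The classifier here is a made-up linear stand-in for PointNet-like models, so this only illustrates the attack mechanics, not the paper's shape attacks:

```python
import torch

def fgsm_points(model, points, label, eps=0.01):
    """One-step gradient attack on point coordinates: move every point a
    small step in the direction that increases the classification loss."""
    pts = points.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(pts), label)
    loss.backward()
    return (pts + eps * pts.grad.sign()).detach()

# Hypothetical stand-in classifier over (B, N, 3) clouds -> 10 class logits.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(1024 * 3, 10))
adv = fgsm_points(model, torch.randn(1, 1024, 3), torch.tensor([2]))
print(adv.shape)   # torch.Size([1, 1024, 3]) perturbed point cloud
```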