
Showing papers by "Hao Su published in 2020"


Proceedings ArticleDOI
14 Jun 2020
TL;DR: The proposed ATV consists of only a small number of planes with low memory and computation costs; yet, it efficiently partitions local depth ranges within learned small uncertainty intervals, which enables reconstruction with high completeness and accuracy in a coarse-to-fine fashion.
Abstract: We present Uncertainty-aware Cascaded Stereo Network (UCS-Net) for 3D reconstruction from multiple RGB images. Multi-view stereo (MVS) aims to reconstruct fine-grained scene geometry from multi-view images. Previous learning-based MVS methods estimate per-view depth using plane sweep volumes (PSVs) with a fixed depth hypothesis at each plane; this requires densely sampled planes for high accuracy, which is impractical for high-resolution depth because of limited memory. In contrast, we propose adaptive thin volumes (ATVs); in an ATV, the depth hypothesis of each plane is spatially varying, which adapts to the uncertainties of previous per-pixel depth predictions. Our UCS-Net has three stages: the first stage processes a small PSV to predict low-resolution depth; two ATVs are then used in the following stages to refine the depth with higher resolution and higher accuracy. Our ATV consists of only a small number of planes with low memory and computation costs; yet, it efficiently partitions local depth ranges within learned small uncertainty intervals. We propose to use variance-based uncertainty estimates to adaptively construct ATVs; this differentiable process leads to reasonable and fine-grained spatial partitioning. Our multi-stage framework progressively sub-divides the vast scene space with increasing depth resolution and precision, which enables reconstruction with high completeness and accuracy in a coarse-to-fine fashion. We demonstrate that our method achieves superior performance compared with other learning-based MVS methods on various challenging datasets.
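The variance-based uncertainty construction described above can be illustrated with a minimal sketch (not the authors' implementation; the array shapes and the interval width `k` are assumptions): given a softmax probability volume over depth planes, compute a per-pixel expected depth and standard deviation, then propose a narrower depth range for the next, finer stage.

```python
import numpy as np

def next_stage_depth_range(prob_volume, depth_hypotheses, k=1.5):
    """Illustrative sketch: derive a per-pixel depth estimate and a
    variance-based uncertainty interval from a probability volume, then
    propose a narrower depth range for the next (finer) stage.

    prob_volume:       (D, H, W) softmax probabilities over depth planes
    depth_hypotheses:  (D,) depth value of each plane
    k:                 interval half-width in standard deviations (assumed)
    """
    d = depth_hypotheses[:, None, None]               # (D, 1, 1)
    mean = (prob_volume * d).sum(axis=0)              # expected depth, (H, W)
    var = (prob_volume * (d - mean) ** 2).sum(axis=0)
    sigma = np.sqrt(var)
    # The next stage samples its (few) planes only inside this interval.
    return mean - k * sigma, mean + k * sigma

# Toy example: 8 planes between 1 m and 8 m, probability peaked near 4 m.
D, H, W = 8, 2, 2
depths = np.linspace(1.0, 8.0, D)
logits = -((depths - 4.0) ** 2)[:, None, None] * np.ones((D, H, W))
probs = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
lo, hi = next_stage_depth_range(probs, depths)
```

Because the interval adapts per pixel, confident pixels get a tight range while uncertain pixels keep a wider one, which is the intuition behind the adaptive thin volumes.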

181 citations


Journal ArticleDOI
TL;DR: In multiple mouse models of murine tumours, a single low dose of the STING agonist led to tumour regression and increased animal survival, and to long-term immunological memory and systemic immune surveillance, which protected the mice against tumour recurrence and the formation of metastases.
Abstract: Tumours with an immunosuppressive microenvironment respond poorly to therapy. Activation of the stimulator of interferon genes (STING) pathway can enhance intratumoural immune activation, but STING agonists are associated with high toxicity and degrade prematurely, which limits their effectiveness. Here, we show that the extended intratumoural release of the STING agonist cyclic di-AMP transforms the tumour microenvironment from immunosuppressive to immunostimulatory, increasing the efficacy of antitumour therapies. The STING agonist was electrostatically complexed with nanotubes comprising a peptide-drug conjugate (a peptide that binds to the protein neuropilin-1, which is highly expressed in tumours, and the chemotherapeutic agent camptothecin) that self-assemble in situ into a supramolecular hydrogel. In multiple mouse models of murine tumours, a single low dose of the STING agonist led to tumour regression and increased animal survival, and to long-term immunological memory and systemic immune surveillance, which protected the mice against tumour recurrence and the formation of metastases. Locally delivered STING agonists could help to reduce tumour immunosuppression and enhance the efficacy of a wide range of cancer therapies.

140 citations


Posted Content
TL;DR: SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects; it enables various robotic vision and interaction tasks that require detailed part-level understanding and is expected to open research directions yet to be explored.
Abstract: Building home assistant robots has long been a pursuit for vision and robotics researchers. To achieve this task, a simulated environment with physically realistic simulation, sufficient articulated objects, and transferability to the real robot is indispensable. Existing environments achieve these requirements for robotics simulation with different levels of simplification and focus. We take one step further in constructing an environment that supports household tasks for training robot learning algorithms. Our work, SAPIEN, is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects. Our SAPIEN enables various robotic vision and interaction tasks that require detailed part-level understanding. We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks using heuristic approaches and reinforcement learning algorithms. We hope that our SAPIEN can open many research directions yet to be explored, including learning cognition through interaction, part motion discovery, and construction of robotics-ready simulated game environments.

137 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: SAPIEN, as mentioned in this paper, is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects; it supports part detection and motion attribute recognition, and demonstrates robotic interaction tasks using heuristic approaches and reinforcement learning algorithms.
Abstract: Building home assistant robots has long been a goal for vision and robotics researchers. To achieve this task, a simulated environment with physically realistic simulation, sufficient articulated objects, and transferability to the real robot is indispensable. Existing environments achieve these requirements for robotics simulation with different levels of simplification and focus. We take one step further in constructing an environment that supports household tasks for training robot learning algorithms. Our work, SAPIEN, is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects. SAPIEN enables various robotic vision and interaction tasks that require detailed part-level understanding. We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks using heuristic approaches and reinforcement learning algorithms. We hope that SAPIEN will open research directions yet to be explored, including learning cognition through interaction, part motion discovery, and construction of robotics-ready simulated game environments.

130 citations


Posted Content
TL;DR: A framework for research and evaluation in Embodied AI is described, based on a canonical task: Rearrangement, that can focus the development of new techniques and serve as a source of trained models that can be transferred to other settings.
Abstract: We describe a framework for research and evaluation in Embodied AI. Our proposal is based on a canonical task: Rearrangement. A standard task can focus the development of new techniques and serve as a source of trained models that can be transferred to other settings. In the rearrangement task, the goal is to bring a given physical environment into a specified state. The goal state can be specified by object poses, by images, by a description in language, or by letting the agent experience the environment in the goal state. We characterize rearrangement scenarios along different axes and describe metrics for benchmarking rearrangement performance. To facilitate research and exploration, we present experimental testbeds of rearrangement scenarios in four different simulation environments. We anticipate that other datasets will be released and new simulation platforms will be built to support training of rearrangement agents and their deployment on physical systems.
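As one concrete, hypothetical example of the kind of rearrangement metric the abstract mentions, here is a minimal sketch: an episode succeeds when every object is within a position and orientation tolerance of its goal pose. The pose representation (xyz plus yaw) and the tolerance values are illustrative assumptions, not the framework's actual definitions.

```python
import numpy as np

def rearrangement_success(object_poses, goal_poses, pos_tol=0.05, ang_tol=0.17):
    """Illustrative sketch of a rearrangement success metric: the episode
    counts as successful when every object sits within a position tolerance
    (metres) and a yaw tolerance (radians) of its goal pose.

    object_poses, goal_poses: dicts mapping object id -> (xyz, yaw)
    """
    for obj_id, (pos, yaw) in object_poses.items():
        goal_pos, goal_yaw = goal_poses[obj_id]
        if np.linalg.norm(np.asarray(pos) - np.asarray(goal_pos)) > pos_tol:
            return False
        # Wrap the yaw difference into [-pi, pi] before comparing.
        dyaw = (yaw - goal_yaw + np.pi) % (2 * np.pi) - np.pi
        if abs(dyaw) > ang_tol:
            return False
    return True

current = {"mug": ((0.02, 0.0, 0.0), 0.1), "book": ((1.0, 0.5, 0.0), 0.0)}
goal    = {"mug": ((0.0, 0.0, 0.0), 0.0), "book": ((1.0, 0.5, 0.0), 0.05)}
```

A goal state specified by object poses, as in the paper, reduces naturally to a check of this shape; image- or language-specified goals need different evaluators.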

111 citations


Journal ArticleDOI
TL;DR: This work found that this carrier-free therapeutic system can serve as a reservoir for extended tumoral release of camptothecin and aPD1 antibody, resulting in an immune-stimulating tumor microenvironment for boosted PD-1 blockade immune response.
Abstract: Immune checkpoint blockers (ICBs) have shown great promise at harnessing the immune system to combat cancer. However, only a fraction of patients can directly benefit from anti-programmed cell death protein 1 (aPD1) therapy, and the treatment often leads to immune-related adverse effects. In this context, we developed a prodrug hydrogelator for local delivery of ICBs to boost the host's immune system against tumors. We found that this carrier-free therapeutic system can serve as a reservoir for extended tumoral release of camptothecin and aPD1 antibody, resulting in an immune-stimulating tumor microenvironment for a boosted PD-1 blockade immune response. Our in vivo results revealed that this combination chemoimmunotherapy elicits robust and durable systemic anticancer immunity, inducing tumor regression and inhibiting tumor recurrence and metastasis. This work sheds important light on the use of small-molecule prodrugs as both a chemotherapeutic and a carrier to awaken and enhance the antitumor immune system for improved ICB therapy.

85 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: A novel consistency loss trains an independent consistency module that refines depths from depth/normal pairs; joint learning improves both normal and depth prediction, and enforcing the consistency further improves accuracy and smoothness.
Abstract: Accurate stereo depth estimation plays a critical role in various 3D tasks in both indoor and outdoor environments. Recently, learning-based multi-view stereo methods have demonstrated competitive performance with a limited number of views. However, in challenging scenarios, especially when building cross-view correspondences is hard, these methods still cannot produce satisfactory results. In this paper, we study how to enforce the consistency between surface normal and depth at training time to improve the performance. We couple the learning of a multi-view normal estimation module and a multi-view depth estimation module. In addition, we propose a novel consistency loss to train an independent consistency module that refines the depths from depth/normal pairs. We find that the joint learning can improve both the prediction of normal and depth, and the accuracy and smoothness can be further improved by enforcing the consistency. Experiments on MVS, SUN3D, RGBD and Scenes11 demonstrate the effectiveness of our method and state-of-the-art performance.

62 citations


Journal ArticleDOI
TL;DR: A generic design principle that harnesses mechanical instability for a variety of spine-inspired fast and strong soft machines. Unlike most current soft robots, which are designed as inherently and unimodally stable, the design leverages tunable snap-through bistability to fully explore the ability of soft robots to rapidly store and release energy within tens of milliseconds.
Abstract: Soft machines typically exhibit slow locomotion speed and low manipulation strength because of intrinsic limitations of soft materials. Here, we present a generic design principle that harnesses mechanical instability for a variety of spine-inspired fast and strong soft machines. Unlike most current soft robots that are designed as inherently and unimodally stable, our design leverages tunable snap-through bistability to fully explore the ability of soft robots to rapidly store and release energy within tens of milliseconds. We demonstrate this generic design principle with three high-performance soft machines: high-speed cheetah-like galloping crawlers with locomotion speeds of 2.68 body length/s, high-speed underwater swimmers (0.78 body length/s), and tunable low-to-high-force soft grippers with over 1 to 10³ stiffness modulation (maximum load capacity is 11.4 kg). Our study establishes a new generic design paradigm of next-generation high-performance soft robots that are applicable for multifunctionality, different actuation methods, and materials at multiple scales.

55 citations


Journal ArticleDOI
TL;DR: The design and human–robot interaction modeling of a portable hip exoskeleton based on a custom quasi-direct drive actuation with performance improvement compared with state-of-the-art exoskeletons is described and demonstrated.
Abstract: High-performance actuators are crucial to enable mechanical versatility of wearable robots, which are required to be lightweight, highly backdrivable, and with high bandwidth. State-of-the-art actuators, e.g., series elastic actuators, have to compromise bandwidth to improve compliance (i.e., backdrivability). In this article, we describe the design and human–robot interaction modeling of a portable hip exoskeleton based on our custom quasi-direct drive actuation (i.e., a high torque density motor with low ratio gear). We also present a model-based performance benchmark comparison of representative actuators in terms of torque capability, control bandwidth, backdrivability, and force tracking accuracy. This article aims to corroborate the underlying philosophy of “design for control,” namely meticulous robot design can simplify control algorithms while ensuring high performance. Following this idea, we create a lightweight bilateral hip exoskeleton to reduce joint loadings during normal activities, including walking and squatting. Experiments indicate that the exoskeleton is able to produce high nominal torque (17.5 Nm), high backdrivability (0.4 Nm backdrive torque), high bandwidth (62.4 Hz), and high control accuracy (1.09 Nm root mean square tracking error, 5.4% of the desired peak torque). Its controller is versatile to assist walking at different speeds and squatting. This article demonstrates performance improvement compared with state-of-the-art exoskeletons.

53 citations


Journal ArticleDOI
TL;DR: The design of a series of self-assembling prodrugs (SAPDs) that spontaneously associate in aqueous solution into supramolecular polymers (SPs) with varying CMCs are reported, finding that the lower the CMC, the higher the maximum tolerated dose (MTD) in rodents.
Abstract: The inception and development of supramolecular chemistry have provided a vast library of supramolecular structures and materials for improved practice of medicine. In the context of therapeutic delivery, while supramolecular nanostructures offer a wide variety of morphologies as drug carriers for optimized targeting and controlled release, concerns are often raised as to how their morphological stability and structural integrity impact their in vivo performance. After intravenous (i.v.) administration, the intrinsic reversible and dynamic feature of supramolecular assemblies may lead them to dissociate upon plasma dilution to a concentration below their critical micellization concentration (CMC). As such, CMC represents an important characteristic for supramolecular biomaterials design, but its pharmaceutical role remains elusive. Here, we report the design of a series of self-assembling prodrugs (SAPDs) that spontaneously associate in aqueous solution into supramolecular polymers (SPs) with varying CMCs. Two hydrophobic camptothecin (CPT) molecules were conjugated onto oligoethylene-glycol (OEG)-decorated segments with various OEG repeat numbers (2, 4, 6, 8). Our studies show that the lower the CMC, the lower the maximum tolerated dose (MTD) in rodents. When administrated at the same dosage of 10 mg/kg (CPT equivalent), SAPD 1, the one with the lowest CMC, shows the best efficacy in tumor suppression. These observations can be explained by the circulation and dissociation of SAPD SPs and the difference in molecular and supramolecular distribution between excretion and organ uptake. We believe these findings offer important insight into the role of supramolecular stability in determining their therapeutic index and in vivo efficacy.

48 citations


Journal ArticleDOI
TL;DR: The in vivo studies with a resection and recurrence mouse model suggest that this prodrug hydrogel can release cancer therapeutics into brain parenchyma over a long period of time, suppressing tumor recurrence and leading to prolonged survival.

Journal ArticleDOI
04 Mar 2020
TL;DR: In this paper, a combination of small-angle X-ray and small-angle neutron scattering with selectively deuterated molecules was used to understand the packing in the pre-gelled aggregates and in the gel state.
Abstract: Small molecules can self-assemble into one-dimensional structures to give self-supporting gels. Such gels have a wide range of uses, including tissue engineering, drug delivery, and catalysis. It is difficult to understand how the molecules are packed in these structures, but this is hugely important if we are going to be able to learn from and design such materials. Here, we use a combination of small-angle X-ray and small-angle neutron scattering with selectively deuterated molecules to understand the packing in the pre-gelled aggregates and in the gel state. We also use kinetic measurements to understand the transition between these aggregates. Our data show that there is a lack of order in the gel state, correlating with the limited predictive design rules in this field and with the importance of kinetics in forming the gel state. This approach allows us to understand our specific systems but represents a general approach that could be taken with different classes of gelator.

Posted ContentDOI
TL;DR: Wang et al. as discussed by the authors propose to separate the latent space of portrait images into two subspaces: a geometry space and a texture space, which are then fed to two network branches separately, one to generate the 3D geometry of portraits with canonical pose, and the other to generate textures.
Abstract: Recently, Generative Adversarial Networks (GANs) have been widely used for portrait image generation. However, in the latent space learned by GANs, different attributes, such as pose, shape, and texture style, are generally entangled, making the explicit control of specific attributes difficult. To address this issue, we propose a SofGAN image generator to decouple the latent space of portraits into two subspaces: a geometry space and a texture space. The latent codes sampled from the two subspaces are fed to two network branches separately, one to generate the 3D geometry of portraits with canonical pose, and the other to generate textures. The aligned 3D geometries also come with semantic part segmentation, encoded as a semantic occupancy field (SOF). The SOF allows the rendering of consistent 2D semantic segmentation maps at arbitrary views, which are then fused with the generated texture maps and stylized to a portrait photo using our semantic instance-wise (SIW) module. Through extensive experiments, we show that our system can generate high quality portrait images with independently controllable geometry and texture attributes. The method also generalizes well in various applications such as appearance-consistent facial animation and dynamic styling.

Journal ArticleDOI
28 Jul 2020-ACS Nano
TL;DR: In vitro and in vivo studies reveal that these dual drug-bearing supramolecular hydrogels enhance tumor retention and penetration, serving as a local therapeutic depot for potent tumor regression, inhibition of tumor metastasis and recurrence, and mitigation of the off-target side effects.
Abstract: Local chemotherapy is a clinically proven strategy in treating malignant brain tumors. Its benefits, however, are largely limited by the rapid release and clearance of therapeutic agents and the inability to penetrate through tumor tissues. We report here on a supramolecular tubustecan (TT) hydrogel as both a therapeutic and drug carrier that enables long-term, sustained drug release and improved tumor tissue penetration. Covalent linkage of a tissue penetrating cyclic peptide to two camptothecin drug units creates a TT prodrug amphiphile that can associate into tubular supramolecular polymers and subsequently form a well-defined sphere-shaped hydrogel after injection into tumor tissues. The hollow nature of the resultant tubular assemblies allows for encapsulation of doxorubicin or curcumin for combination therapy. Our in vitro and in vivo studies reveal that these dual drug-bearing supramolecular hydrogels enhance tumor retention and penetration, serving as a local therapeutic depot for potent tumor regression, inhibition of tumor metastasis and recurrence, and mitigation of the off-target side effects.

Posted Content
TL;DR: In this paper, the authors describe the design and human-robot interaction modeling of a portable hip exoskeleton based on a custom quasi-direct drive (QDD) actuation (i.e., a high torque density motor with low ratio gear).
Abstract: High-performance actuators are crucial to enable mechanical versatility of lower-limb wearable robots, which are required to be lightweight, highly backdrivable, and with high bandwidth. State-of-the-art actuators, e.g., series elastic actuators (SEAs), have to compromise bandwidth to improve compliance (i.e., backdrivability). In this paper, we describe the design and human-robot interaction modeling of a portable hip exoskeleton based on our custom quasi-direct drive (QDD) actuation (i.e., a high torque density motor with low ratio gear). We also present a model-based performance benchmark comparison of representative actuators in terms of torque capability, control bandwidth, backdrivability, and force tracking accuracy. This paper aims to corroborate the underlying philosophy of "design for control", namely meticulous robot design can simplify control algorithms while ensuring high performance. Following this idea, we create a lightweight bilateral hip exoskeleton (overall mass is 3.4 kg) to reduce joint loadings during normal activities, including walking and squatting. Experimental results indicate that the exoskeleton is able to produce high nominal torque (17.5 Nm), high backdrivability (0.4 Nm backdrive torque), high bandwidth (62.4 Hz), and high control accuracy (1.09 Nm root mean square tracking error, i.e., 5.4% of the desired peak torque). Its controller is versatile to assist walking at different speeds (0.8-1.4 m/s) and squatting at 2 s cadence. This work demonstrates significant improvement in backdrivability and control bandwidth compared with state-of-the-art exoskeletons powered by the conventional actuation or SEA.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work demonstrates that a separate encoding of shape deltas or differences provides a principled way to deal with inhomogeneities in the shape space due to different combinatorial part structures, while also allowing for compactness in the representation, as well as edit abstraction and transfer.
Abstract: Learning to encode differences in the geometry and (topological) structure of the shapes of ordinary objects is key to generating semantically plausible variations of a given shape, transferring edits from one shape to another, and for many other applications in 3D content creation. The common approach of encoding shapes as points in a high-dimensional latent feature space suggests treating shape differences as vectors in that space. Instead, we treat shape differences as primary objects in their own right and propose to encode them in their own latent space. In a setting where the shapes themselves are encoded in terms of fine-grained part hierarchies, we demonstrate that a separate encoding of shape deltas or differences provides a principled way to deal with inhomogeneities in the shape space due to different combinatorial part structures, while also allowing for compactness in the representation, as well as edit abstraction and transfer. Our approach is based on a conditional variational autoencoder for encoding and decoding shape deltas, conditioned on a source shape. We demonstrate the effectiveness and robustness of our approach in multiple shape modification and generation tasks, and provide comparison and ablation studies on the PartNet dataset, one of the largest publicly available 3D datasets.
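The conditional variational autoencoder mentioned above relies, like any VAE, on the standard reparameterization trick to sample a latent delta code while keeping the sampling step differentiable. A minimal sketch of that generic machinery (not the paper's architecture; the shapes are illustrative):

```python
import numpy as np

def encode_delta(mu, log_var, rng):
    """Illustrative sketch of the VAE reparameterization step used when
    encoding a shape delta: sample z = mu + sigma * eps with eps ~ N(0, I),
    so gradients can flow through mu and log_var during training."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
mu = np.array([0.2, -0.5, 1.0])          # predicted mean of the delta code
z = encode_delta(mu, np.full(3, -20.0), rng)  # tiny variance -> z is near mu
```

In the paper's setting the decoder would additionally be conditioned on the source shape's encoding, so the same `z` denotes an edit relative to that source rather than an absolute shape.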


Posted Content
TL;DR: A weakly-supervised method jointly optimizes canonical shapes and poses with multi-view geometry constraints during training; experiments show that it is feasible and promising to learn 3D shape completion through large-scale data without shape and pose supervision.
Abstract: 3D shape completion for real data is important but challenging, since partial point clouds acquired by real-world sensors are usually sparse, noisy and unaligned. Different from previous methods, we address the problem of learning 3D complete shape from unaligned and real-world partial point clouds. To this end, we propose a weakly-supervised method to estimate both 3D canonical shape and 6-DoF pose for alignment, given multiple partial observations associated with the same instance. The network jointly optimizes canonical shapes and poses with multi-view geometry constraints during training, and can infer the complete shape given a single partial point cloud. Moreover, learned pose estimation can facilitate partial point cloud registration. Experiments on both synthetic and real data show that it is feasible and promising to learn 3D shape completion through large-scale data without shape and pose supervision.
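A common building block for the kind of geometric consistency objective the abstract alludes to is the symmetric Chamfer distance between point clouds; the paper's exact multi-view constraints may differ. A minimal sketch:

```python
import numpy as np

def chamfer_distance(a, b):
    """Illustrative sketch: symmetric Chamfer distance between two point
    clouds a (N, 3) and b (M, 3) -- for each point, the distance to its
    nearest neighbour in the other cloud, averaged in both directions."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

cloud = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
shifted = cloud + np.array([0.0, 0.1, 0.0])
```

In a multi-view setup, each partial observation would first be mapped into the canonical frame by its estimated 6-DoF pose, and a loss of this form would tie the transformed partials to the shared canonical shape.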

Journal ArticleDOI
TL;DR: In this paper, single-stage nitrogen removal using anammox and partial-nitritation (SNAP) process was proposed and employed to treat leachate under microaerobic condition in an upflow sludge blanket.
Abstract: Nitrogen levels in landfill leachate (LL) could potentially pollute water bodies and surrounding groundwater when discharged untreated. Against this backdrop, pollutant removal from leachate generated at landfill sites is essential. However, a cost-efficient approach that could concurrently remove various pollutants within LL via a single-stage process, under microaerobic conditions, has not been fully explored. In this study, a single-stage nitrogen removal using anammox and partial-nitritation (SNAP) process was proposed and employed to treat LL under microaerobic conditions in an upflow sludge blanket. When a reactor dissolved oxygen (DO) of 0.2 mg/L was implemented along with a nitrogen loading rate (NLR) ranging from 0.31 to 1.84 kg/m³·d, 99.5% NH4+-N, 94.3% TN and 31.04% chemical oxygen demand (COD) removals were concurrently achieved. Conversely, when DO was elevated to 0.6 mg/L, nitrogen removal was curtailed to

Proceedings Article
30 Apr 2020
TL;DR: This work proposes a novel state alignment-based imitation learning method to train the imitator by following the state sequences in the expert demonstrations as much as possible, and combines them into a reinforcement learning framework by a regularized policy update objective.
Abstract: Consider an imitation learning problem in which the imitator and the expert have different dynamics models. Most existing imitation learning methods fail in this setting because they focus on imitating actions. We propose a novel state alignment-based imitation learning method to train the imitator by following the state sequences in the expert demonstrations as much as possible. The alignment of states comes from both local and global perspectives. We combine them into a reinforcement learning framework by a regularized policy update objective. We show the superiority of our method on standard imitation learning settings as well as the challenging settings in which the expert and the imitator have different dynamics models.
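To make the state-alignment idea concrete, here is a minimal sketch of a dense reward that tracks the expert's *states* rather than its actions, which is what makes the approach robust to dynamics mismatch. The Gaussian kernel and its bandwidth are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def state_alignment_reward(state, expert_states, sigma=1.0):
    """Illustrative sketch: reward the imitator for staying close to the
    expert's state sequence. A Gaussian kernel on the distance to the
    nearest expert state gives a dense signal; 'sigma' is a hypothetical
    bandwidth, not a parameter from the paper."""
    d = np.linalg.norm(expert_states - state, axis=1).min()
    return float(np.exp(-d ** 2 / (2 * sigma ** 2)))

expert_states = np.array([[0.0, 0.0], [1.0, 1.0]])  # toy demonstration
```

A reward of this shape can be maximized by any RL algorithm even when the imitator's dynamics make the expert's exact actions infeasible; the paper additionally combines local and global alignment terms in a regularized policy update.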

Posted Content
TL;DR: This paper designs an end-to-end trainable framework consisting of learnable modules for detection, feature extraction, matching and outlier rejection, while directly optimizing for the geometric pose objective, and shows both quantitatively and qualitatively that pose estimation performance may be achieved on par with the classic pipeline.
Abstract: Estimating relative camera poses from consecutive frames is a fundamental problem in visual odometry (VO) and simultaneous localization and mapping (SLAM), where classic methods consisting of hand-crafted features and sampling-based outlier rejection have been a dominant choice for over a decade. Although multiple works propose to replace these modules with learning-based counterparts, most have not yet been as accurate, robust and generalizable as conventional methods. In this paper, we design an end-to-end trainable framework consisting of learnable modules for detection, feature extraction, matching and outlier rejection, while directly optimizing for the geometric pose objective. We show both quantitatively and qualitatively that pose estimation performance may be achieved on par with the classic pipeline. Moreover, we show that, through end-to-end training, the key components of the pipeline can be significantly improved, which leads to better generalizability to unseen datasets compared to existing learning-based methods.
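Once correspondences and inliers are fixed, the geometric pose objective at the end of such a pipeline has a classical closed-form solution. As an illustration of that final step only (the classic geometry, not the paper's learned modules), a minimal sketch of the Kabsch/Procrustes solution for matched 3D points:

```python
import numpy as np

def relative_pose(src, dst):
    """Illustrative sketch: closed-form rigid transform (R, t) minimizing
    sum ||dst_i - (R @ src_i + t)||^2 over matched 3D points (Kabsch)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)          # cross-covariance, (3, 3)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Recover a known rotation about z and a translation from noiseless matches.
rng = np.random.default_rng(1)
pts = rng.standard_normal((20, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -0.2, 0.1])
moved = pts @ R_true.T + t_true
R_est, t_est = relative_pose(pts, moved)
```

In the learned pipeline, the detection, matching, and outlier-rejection modules that produce `src`/`dst` are trained end-to-end against this kind of geometric objective rather than hand-crafted.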

Journal ArticleDOI
TL;DR: A strong correlation between ammonium oxidation rate (AOR) and Y(III) dosage was revealed, and SEM-EDS showed that the content of extracellular polymeric substances (EPS) increased with increasing Y(III) dosage, an indication that the dosage of Y(III) could affect the partial-nitritation process.

Posted Content
TL;DR: This work discovers that the immense performance drop of binarized models for point clouds is caused by two main challenges: aggregation-induced feature homogenization that leads to a degradation of information entropy, and scale distortion that hinders optimization and invalidates scale-sensitive structures.
Abstract: To alleviate the resource constraint for real-time point cloud applications that run on edge devices, in this paper we present BiPointNet, the first model binarization approach for efficient deep learning on point clouds. We discover that the immense performance drop of binarized models for point clouds mainly stems from two challenges: aggregation-induced feature homogenization that leads to a degradation of information entropy, and scale distortion that hinders optimization and invalidates scale-sensitive structures. With theoretical justifications and in-depth analysis, our BiPointNet introduces Entropy-Maximizing Aggregation (EMA) to modulate the distribution before aggregation for the maximum information entropy, and Layer-wise Scale Recovery (LSR) to efficiently restore feature representation capacity. Extensive experiments show that BiPointNet outperforms existing binarization methods by convincing margins, at the level even comparable with the full precision counterpart. We highlight that our techniques are generic, guaranteeing significant improvements on various fundamental tasks and mainstream backbones. Moreover, BiPointNet gives an impressive 14.7x speedup and 18.9x storage saving on real-world resource-constrained devices.
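The spirit of scale recovery can be illustrated with the standard L1-optimal scaling for binarized weights: alpha = mean(|w|) minimizes ||w − alpha·sign(w)||², so multiplying the {−1, +1} tensor by alpha restores the layer's average magnitude. This is a generic sketch in that spirit; BiPointNet's layer-wise scheme differs in detail.

```python
import numpy as np

def binarize_with_scale_recovery(w):
    """Illustrative sketch: binarize a weight tensor to {-1, +1} and recover
    a per-layer scale alpha = mean(|w|), which is the least-squares-optimal
    scale for sign(w). Not BiPointNet's exact LSR formulation."""
    alpha = np.abs(w).mean()
    return alpha * np.sign(w), alpha

w = np.array([0.5, -1.5, 1.0])
w_bin, alpha = binarize_with_scale_recovery(w)   # alpha restores magnitude
```

Without the scale, every binarized layer would output values of unit magnitude regardless of the original weights, which is one way the feature representation capacity degrades.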

Proceedings Article
01 Jan 2020
TL;DR: A novel application of the triplet loss is proposed to robustify task inference and relabel the transitions from the training tasks by approximating their reward functions, which leads to significantly faster convergence compared to randomly initialized policies.
Abstract: We tackle the Multi-task Batch Reinforcement Learning problem. Given multiple datasets collected from different tasks, we train a multi-task policy to perform well in unseen tasks sampled from the same distribution. The task identities of the unseen tasks are not provided. To perform well, the policy must infer the task identity from collected transitions by modelling its dependency on states, actions and rewards. Because the different datasets may have state-action distributions with large divergence, the task inference module can learn to ignore the rewards and spuriously correlate only state-action pairs to the task identity, leading to poor test time performance. To robustify task inference, we propose a novel application of the triplet loss. To mine hard negative examples, we relabel the transitions from the training tasks by approximating their reward functions. When we allow further training on the unseen tasks, using the trained policy as an initialization leads to significantly faster convergence compared to randomly initialized policies (up to 80% improvement and across 5 different Mujoco task distributions). We name our method MBML (Multi-task Batch RL with Metric Learning).
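The triplet loss referenced above has a standard form; a minimal sketch with Euclidean embeddings (the embedding network and the paper's reward-relabelling hard-negative mining are omitted):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Illustrative sketch: embeddings of transitions from the same task
    (anchor, positive) are pulled together, while embeddings from a
    different task (negative) must be at least 'margin' further away."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return float(max(0.0, d_pos - d_neg + margin))
```

In the paper's setting, the anchor and positive would be embeddings of transitions from the same training task, and the negative a relabelled transition from another task, which forces the task-inference module to attend to rewards rather than spurious state-action correlations.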

Posted Content
TL;DR: A learning-based iterative grouping framework that learns a policy to progressively merge small part proposals into bigger ones in a bottom-up fashion, encouraging generalization to novel categories.
Abstract: We address the problem of discovering 3D parts for objects in unseen categories. Learning the geometric prior of parts and transferring this prior to unseen categories pose fundamental challenges for data-driven shape segmentation approaches. Formulating the task as a contextual bandit problem, we propose a learning-based agglomerative clustering framework which learns a grouping policy to progressively group small part proposals into bigger ones in a bottom-up fashion. At the core of our approach is restricting the local context for extracting part-level features, which encourages generalizability to unseen categories. On the large-scale fine-grained 3D part dataset PartNet, we demonstrate that our method can transfer knowledge of parts learned from 3 training categories to 21 unseen testing categories without seeing any annotated samples. Quantitative comparisons against four shape segmentation baselines show that our approach achieves state-of-the-art performance.
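The bottom-up grouping loop can be sketched independently of the learned policy: repeatedly merge the highest-scoring pair of current proposals until no pair clears a threshold. Here `score_fn` is a stand-in for the learned grouping policy; all names and the threshold are illustrative, not the paper's implementation.

```python
def greedy_agglomerative_grouping(proposals, score_fn, threshold=0.5):
    """Bottom-up part grouping sketch. `proposals` are sets of
    point/element ids; `score_fn(a, b)` stands in for the learned
    policy that scores whether two groups belong to one part."""
    groups = [frozenset(p) for p in proposals]
    while len(groups) > 1:
        # Exhaustively score all current pairs (a learned policy
        # would restrict this to local candidates).
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                s = score_fn(groups[i], groups[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        if best[0] < threshold:
            break  # no pair is confidently mergeable
        _, i, j = best
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)]
        groups.append(merged)
    return groups
```

The key design point from the abstract is that `score_fn` sees only the local context of the two candidate groups, which is what allows the policy to transfer across categories.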

Journal ArticleDOI
TL;DR: A deep neural network-based machine learning technique was used to mitigate the scattering effect, where the NN was employed to study the highly sophisticated relationship between the input digital masks and their corresponding output 3D printed structures.
Abstract: When using light-based three-dimensional (3D) printing methods to fabricate functional micro-devices, unwanted light scattering during the printing process is a significant challenge to achieve high-resolution fabrication. We report the use of a deep neural network (NN)-based machine learning (ML) technique to mitigate the scattering effect, where our NN was employed to study the highly sophisticated relationship between the input digital masks and their corresponding output 3D printed structures. Furthermore, the NN was used to model an inverse 3D printing process, where it took desired printed structures as inputs and subsequently generated grayscale digital masks that optimized the light exposure dose according to the desired structures’ local features. Verification results showed that using NN-generated digital masks yielded significant improvements in printing fidelity when compared with using masks identical to the desired structures.

Posted Content
TL;DR: This work proposes to leverage the input point cloud as much as possible, by only adding connectivity information to existing points, and demonstrates through experiments on synthetic and real data that the method not only preserves details and handles ambiguous structures, but also generalizes strongly to unseen categories.
Abstract: We are interested in reconstructing the mesh representation of object surfaces from point clouds. Surface reconstruction is a prerequisite for downstream applications such as rendering, collision avoidance for planning, animation, etc. However, the task is challenging if the input point cloud has a low resolution, which is common in real-world scenarios (e.g., from LiDAR or Kinect sensors). Existing learning-based mesh generative methods mostly predict the surface by first building a shape embedding at the whole-object level, a design that causes issues in generating fine-grained details and generalizing to unseen categories. Instead, we propose to leverage the input point cloud as much as possible, by only adding connectivity information to existing points. In particular, we predict which triplets of points should form faces. Our key innovation is a surrogate of local connectivity, calculated by comparing the intrinsic/extrinsic metrics. We learn to predict this surrogate using a deep point cloud network and then feed it to an efficient post-processing module for high-quality mesh generation. We demonstrate through experiments on synthetic and real data that our method not only preserves details and handles ambiguous structures, but also generalizes strongly to unseen categories. The code is available at this https URL.
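The intrinsic/extrinsic comparison can be illustrated with a small sketch: approximate the intrinsic (on-surface) distance by a shortest path over a neighbor graph, and reject candidate connections whose intrinsic distance far exceeds the straight-line (extrinsic) distance. Function names, the graph construction, and the ratio threshold are illustrative, not the paper's implementation.

```python
import heapq
import numpy as np

def graph_geodesic(points, edges, src, dst):
    """Dijkstra over a sparse neighbor graph with Euclidean edge
    weights; approximates the intrinsic distance between samples."""
    adj = {i: [] for i in range(len(points))}
    for i, j in edges:
        w = float(np.linalg.norm(points[i] - points[j]))
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float("inf")

def connectable(points, edges, i, j, ratio_thresh=2.0):
    """If the intrinsic (geodesic) distance greatly exceeds the
    extrinsic (Euclidean) one, the points likely lie on different
    surface sheets and should not be joined by a face."""
    extrinsic = float(np.linalg.norm(points[i] - points[j]))
    intrinsic = graph_geodesic(points, edges, i, j)
    return intrinsic <= ratio_thresh * extrinsic
```

This is exactly the ambiguity the surrogate resolves: two points on opposite walls of a thin gap are close extrinsically but far intrinsically, so no face should span the gap.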

Book ChapterDOI
23 Aug 2020
TL;DR: In this paper, a weakly supervised method is proposed to estimate both 3D canonical shape and 6-DoF pose for alignment, given multiple partial observations associated with the same instance.
Abstract: 3D shape completion for real data is important but challenging, since partial point clouds acquired by real-world sensors are usually sparse, noisy and unaligned. Different from previous methods, we address the problem of learning 3D complete shape from unaligned and real-world partial point clouds. To this end, we propose a weakly-supervised method to estimate both 3D canonical shape and 6-DoF pose for alignment, given multiple partial observations associated with the same instance. The network jointly optimizes canonical shapes and poses with multi-view geometry constraints during training, and can infer the complete shape given a single partial point cloud. Moreover, learned pose estimation can facilitate partial point cloud registration. Experiments on both synthetic and real data show that it is feasible and promising to learn 3D shape completion through large-scale data without shape and pose supervision.
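The multi-view geometry constraint can be sketched as a loss: each partial cloud, transformed into the canonical frame by its estimated 6-DoF pose, should agree with the predicted canonical shape under a Chamfer-style distance. This is a plain-NumPy illustration with our own names; in the paper both the shape and the poses are network outputs optimized jointly.

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between two point sets (N,3), (M,3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def multiview_consistency_loss(canonical, partials, poses):
    """Average Chamfer distance between the canonical shape and each
    partial observation aligned by its estimated pose (R, t).
    Minimizing this jointly over shape and poses is the weakly
    supervised signal; no ground-truth shape or pose is needed."""
    loss = 0.0
    for pts, (R, t) in zip(partials, poses):
        aligned = pts @ R.T + t  # transform partial into canonical frame
        loss += chamfer(aligned, canonical)
    return loss / len(partials)
```

A one-directional partial-to-canonical term is also a reasonable variant here, since a partial scan need only be covered by, not cover, the complete shape.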

Book ChapterDOI
23 Aug 2020
TL;DR: In this paper, a surrogate of local connectivity is calculated by comparing the intrinsic/extrinsic metrics, which is then used to predict which triplets of points should form faces.
Abstract: We are interested in reconstructing the mesh representation of object surfaces from point clouds. Surface reconstruction is a prerequisite for downstream applications such as rendering, collision avoidance for planning, animation, etc. However, the task is challenging if the input point cloud has a low resolution, which is common in real-world scenarios (e.g., from LiDAR or Kinect sensors). Existing learning-based mesh generative methods mostly predict the surface by first building a shape embedding at the whole-object level, a design that causes issues in generating fine-grained details and generalizing to unseen categories. Instead, we propose to leverage the input point cloud as much as possible, by only adding connectivity information to existing points. In particular, we predict which triplets of points should form faces. Our key innovation is a surrogate of local connectivity, calculated by comparing the intrinsic/extrinsic metrics. We learn to predict this surrogate using a deep point cloud network and then feed it to an efficient post-processing module for high-quality mesh generation. We demonstrate through experiments on synthetic and real data that our method not only preserves details and handles ambiguous structures, but also generalizes strongly to unseen categories.

Proceedings ArticleDOI
24 Oct 2020
TL;DR: In this article, an end-to-end trainable framework consisting of learnable modules for detection, feature extraction, matching and outlier rejection, while directly optimizing for the geometric pose objective is proposed.
Abstract: Estimating relative camera poses from consecutive frames is a fundamental problem in visual odometry (VO) and simultaneous localization and mapping (SLAM), where classic methods consisting of hand-crafted features and sampling-based outlier rejection have been the dominant choice for over a decade. Although multiple works propose to replace these modules with learning-based counterparts, most have not yet been as accurate, robust and generalizable as conventional methods. In this paper, we design an end-to-end trainable framework consisting of learnable modules for detection, feature extraction, matching and outlier rejection, while directly optimizing for the geometric pose objective. We show both quantitatively and qualitatively that pose estimation performance on par with the classic pipeline can be achieved. Moreover, we show that end-to-end training significantly improves the key components of the pipeline, leading to better generalizability to unseen datasets than existing learning-based methods.