
Showing papers by "Hao Su" published in 2021


Proceedings Article
01 Jan 2021
TL;DR: MVSNeRF as discussed by the authors proposes a generic deep neural network that can reconstruct radiance fields from only three nearby input views via fast network inference, leveraging plane-swept cost volumes (widely used in multi-view stereo) for geometry-aware scene reasoning.
Abstract: We present MVSNeRF, a novel neural rendering approach that can efficiently reconstruct neural radiance fields for view synthesis. Unlike prior works on neural radiance fields that consider per-scene optimization on densely captured images, we propose a generic deep neural network that can reconstruct radiance fields from only three nearby input views via fast network inference. Our approach leverages plane-swept cost volumes (widely used in multi-view stereo) for geometry-aware scene reasoning, and combines this with physically based volume rendering for neural radiance field reconstruction. We train our network on real objects in the DTU dataset, and test it on three different datasets to evaluate its effectiveness and generalizability. Our approach can generalize across scenes (even indoor scenes, completely different from our training scenes of objects) and generate realistic view synthesis results using only three input images, significantly outperforming concurrent works on generalizable radiance field reconstruction. Moreover, if dense images are captured, our estimated radiance field representation can be easily fine-tuned; this leads to fast per-scene reconstruction with higher rendering quality and substantially less optimization time than NeRF.
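
As background for the cost-volume step, the sketch below builds a variance-based plane-swept cost volume by warping source-view features onto fronto-parallel depth planes, in the MVSNet style that MVSNeRF builds on. The projection-matrix convention and the canonical reference camera (K = I) are simplifying assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def plane_sweep_cost_volume(ref_feat, src_feats, proj_mats, depth_values):
    """Variance-based plane-swept cost volume (MVSNet-style sketch).

    ref_feat:     (C, H, W) reference-view feature map
    src_feats:    list of (C, H, W) source-view feature maps
    proj_mats:    list of (3, 4) matrices mapping reference-frame homogeneous
                  3D points into each source image (assumed convention; the
                  reference camera is taken as canonical, K = I)
    depth_values: (D,) fronto-parallel depth hypotheses
    """
    C, H, W = ref_feat.shape
    D = depth_values.shape[0]
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32),
                            indexing="ij")
    pix = torch.stack([xs.reshape(-1), ys.reshape(-1),
                       torch.ones(H * W)], dim=0)            # (3, H*W)

    volumes = [ref_feat.unsqueeze(1).expand(C, D, H, W)]
    for feat, P in zip(src_feats, proj_mats):
        warped = []
        for d in depth_values:
            # Back-project pixels onto the depth plane, project to source view.
            pts = torch.cat([pix * d, torch.ones(1, H * W)], dim=0)  # (4, H*W)
            uvw = P @ pts
            uv = uvw[:2] / uvw[2:].clamp(min=1e-6)
            grid = torch.stack([uv[0] / (W - 1) * 2 - 1,     # to [-1, 1]
                                uv[1] / (H - 1) * 2 - 1], dim=-1)
            warped.append(F.grid_sample(feat.unsqueeze(0),
                                        grid.view(1, H, W, 2),
                                        align_corners=True)[0])
        volumes.append(torch.stack(warped, dim=1))           # (C, D, H, W)
    # Feature variance across views encodes multi-view photo-consistency.
    return torch.stack(volumes, dim=0).var(dim=0)            # (C, D, H, W)

cv = plane_sweep_cost_volume(torch.rand(8, 16, 16), [torch.rand(8, 16, 16)],
                             [torch.eye(3, 4)], torch.linspace(0.5, 2.0, 4))
print(cv.shape)  # torch.Size([8, 4, 16, 16])
```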

94 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this paper, a 3D-to-2D texture mapping network is introduced into volumetric representations to explicitly disentangle geometry, represented as a continuous 3D volume, from appearance, represented as a continuous 2D texture map.
Abstract: Recent work [28], [5] has demonstrated that volumetric scene representations combined with differentiable volume rendering can enable photo-realistic rendering for challenging scenes that mesh reconstruction fails on. However, these methods entangle geometry and appearance in a "black-box" volume that cannot be edited. Instead, we present an approach that explicitly disentangles geometry—represented as a continuous 3D volume—from appearance—represented as a continuous 2D texture map. We achieve this by introducing a 3D-to-2D texture mapping (or surface parameterization) network into volumetric representations. We constrain this texture mapping network using an additional 2D-to-3D inverse mapping network and a novel cycle consistency loss to make 3D surface points map to 2D texture points that map back to the original 3D points. We demonstrate that this representation can be reconstructed using only multi-view image supervision and generates high-quality rendering results. More importantly, by separating geometry and texture, we allow users to edit appearance by simply editing 2D texture maps.
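
A minimal sketch of the cycle-consistency constraint between the 3D-to-2D texture mapping and its 2D-to-3D inverse is shown below; the MLP sizes and loss form are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CycleMapping(nn.Module):
    """Toy 3D->2D texture mapping with a 2D->3D inverse and a cycle loss."""
    def __init__(self, hidden=128):
        super().__init__()
        self.to_uv = nn.Sequential(            # surface parameterization
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, 2), nn.Tanh())   # uv in [-1, 1]^2
        self.to_xyz = nn.Sequential(           # inverse mapping
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))

    def cycle_loss(self, surface_pts):
        uv = self.to_uv(surface_pts)           # (N, 2)
        xyz = self.to_xyz(uv)                  # (N, 3)
        # 3D -> 2D -> 3D should land back on the original surface points.
        return (xyz - surface_pts).pow(2).sum(-1).mean()

pts = torch.rand(1024, 3) * 2 - 1              # stand-in surface samples
print(float(CycleMapping().cycle_loss(pts)))
```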

71 citations


Journal ArticleDOI
TL;DR: In this article, the authors provide an overview of the current state of the art, highlighting the enabling technologies and unmet needs for prospective technological advances within the next five to 10 years, and identify key research and knowledge barriers that need to be addressed in developing effective and flexible solutions to ensure preparedness for rapid and scalable deployment to combat infectious diseases.
Abstract: Medical robots can play an important role in mitigating the spread of infectious diseases and delivering quality care to patients during the COVID-19 pandemic. Methods and procedures involving medical robots in the continuum of care, ranging from disease prevention, screening, diagnosis, treatment, and home care, have been extensively deployed and also present incredible opportunities for future development. This article provides an overview of the current state of the art, highlighting the enabling technologies and unmet needs for prospective technological advances within the next five to 10 years. We also identify key research and knowledge barriers that need to be addressed in developing effective and flexible solutions to ensure preparedness for rapid and scalable deployment to combat infectious diseases.

39 citations


Journal ArticleDOI
31 Mar 2021
TL;DR: In this paper, the authors review the fundamental requirements for robotics for infectious disease management and outline how robotic technologies can be used in different scenarios, including disease prevention and monitoring, clinical care, laboratory automation, logistics, and maintenance of socioeconomic activities.
Abstract: The world was unprepared for the COVID-19 pandemic, and recovery is likely to be a long process. Robots have long been heralded to take on dangerous, dull, and dirty jobs, often in environments that are unsuitable for humans. Could robots be used to fight future pandemics? We review the fundamental requirements for robotics for infectious disease management and outline how robotic technologies can be used in different scenarios, including disease prevention and monitoring, clinical care, laboratory automation, logistics, and maintenance of socioeconomic activities. We also address some of the open challenges for developing advanced robots that are application oriented, reliable, safe, and rapidly deployable when needed. Last, we look at the ethical use of robots and call for globally sustained efforts in order for robots to be ready for future outbreaks.

37 citations


Journal ArticleDOI
TL;DR: In this paper, a machine learning framework leveraging existing convolutional neural network architectures and model interpretation techniques was presented to identify and interpret sequence context features most important for predicting whether a particular motif instance will be bound.
Abstract: Transcription factors (TFs) bind DNA by recognizing specific sequence motifs, typically of length 6-12bp. A motif can occur many thousands of times in the human genome, but only a subset of those sites are actually bound. Here we present a machine learning framework leveraging existing convolutional neural network architectures and model interpretation techniques to identify and interpret sequence context features most important for predicting whether a particular motif instance will be bound. We apply our framework to predict binding at motifs for 38 TFs in a lymphoblastoid cell line, score the importance of context sequences at base-pair resolution, and characterize context features most predictive of binding. We find that the choice of training data heavily influences classification accuracy and the relative importance of features such as open chromatin. Overall, our framework enables novel insights into features predictive of TF binding and is likely to inform future deep learning applications to interpret non-coding genetic variants.
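
For intuition, a minimal sketch of this kind of classifier follows: a 1D CNN over a one-hot-encoded sequence context window that outputs a bound/unbound logit for a motif instance. The layer sizes and window length are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

def one_hot(seq):
    """One-hot encode a DNA sequence over the (A, C, G, T) alphabet."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    x = torch.zeros(4, len(seq))
    for i, base in enumerate(seq):
        x[idx[base], i] = 1.0
    return x

# Minimal 1D CNN for bound/unbound classification of a motif instance.
model = nn.Sequential(
    nn.Conv1d(4, 32, kernel_size=12), nn.ReLU(),  # motif-scale filters
    nn.AdaptiveMaxPool1d(1), nn.Flatten(),
    nn.Linear(32, 1))                             # logit: P(bound)

x = one_hot("ACGT" * 50).unsqueeze(0)             # (1, 4, 200) context window
print(torch.sigmoid(model(x)))
```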

36 citations


Journal ArticleDOI
TL;DR: This work introduces VA-Point-MVSNet, a novel visibility-aware point-based deep framework for multi-view stereo (MVS), which directly processes the target scene as point clouds and allows higher accuracy, greater computational efficiency, and more flexibility than cost-volume-based counterparts.
Abstract: We introduce VA-Point-MVSNet, a novel visibility-aware point-based deep framework for multi-view stereo (MVS). Distinct from existing cost volume approaches, our method directly processes the target scene as point clouds. More specifically, our method predicts the depth in a coarse-to-fine manner. We first generate a coarse depth map, convert it into a point cloud, and refine the point cloud iteratively by estimating the residual between the depth of the current iteration and that of the ground truth. Our network leverages 3D geometry priors and 2D texture information jointly and effectively by fusing them into a feature-augmented point cloud, and processes the point cloud to estimate the 3D flow for each point. This point-based architecture allows higher accuracy, greater computational efficiency, and more flexibility than cost-volume-based counterparts. Furthermore, our visibility-aware multi-view feature aggregation allows the network to aggregate multi-view appearance cues while taking into account visibility. Experimental results show that our approach achieves a significant improvement in reconstruction quality compared with state-of-the-art methods on the DTU and the Tanks and Temples datasets. The code of VA-Point-MVSNet proposed in this work will be released at https://github.com/callmeray/PointMVSNet.
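
The coarse-to-fine refinement loop can be sketched as follows: unproject the current depth map to a point cloud, ask a learned module for per-point depth residuals, and update. The `flow_predictor` callable is a hypothetical placeholder standing in for the paper's learned point-flow module.

```python
import torch

def refine_depth(depth, flow_predictor, K_inv, iters=2):
    """Iteratively refine a depth map by regressing per-point residuals.

    depth:          (H, W) coarse depth map
    flow_predictor: callable mapping an (N, 3) point cloud to (N,) depth
                    residuals (stand-in for the learned module)
    K_inv:          (3, 3) inverse camera intrinsics
    """
    H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32),
                            indexing="ij")
    pix = torch.stack([xs.reshape(-1), ys.reshape(-1),
                       torch.ones(H * W)], 0)        # (3, H*W) homogeneous
    for _ in range(iters):
        # Unproject the current depth map into a point cloud.
        pts = (K_inv @ pix) * depth.reshape(1, -1)   # (3, H*W)
        residual = flow_predictor(pts.T)             # (H*W,)
        depth = depth + residual.reshape(H, W)       # move points along rays
    return depth

# Toy usage with a dummy residual predictor.
dummy = lambda p: torch.zeros(p.shape[0])
print(refine_depth(torch.full((4, 4), 2.0), dummy, torch.eye(3)))
```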

32 citations


Posted Content
TL;DR: A cloud-based benchmark for robotic grasping and manipulation, called the OCRTOC benchmark, is proposed; it focuses on the object rearrangement problem, specifically table organization tasks, and was used to hold a competition at the 2020 International Conference on Intelligent Robots and Systems.
Abstract: In this paper, we propose a cloud-based benchmark for robotic grasping and manipulation, called the OCRTOC benchmark. The benchmark focuses on the object rearrangement problem, specifically table organization tasks. We provide a set of identical real robot setups and facilitate remote experiments of standardized table organization scenarios in varying difficulties. In this workflow, users upload their solutions to our remote server and their code is executed on the real robot setups and scored automatically. After each execution, the OCRTOC team resets the experimental setup manually. We also provide a simulation environment that researchers can use to develop and test their solutions. With the OCRTOC benchmark, we aim to lower the barrier of conducting reproducible research on robotic grasping and manipulation and accelerate progress in this field. Executing standardized scenarios on identical real robot setups allows us to quantify algorithm performances and achieve fair comparisons. Using this benchmark we held a competition in the 2020 International Conference on Intelligent Robots and Systems (IROS 2020). In total, 59 teams took part in this competition worldwide. We present the results and our observations of the 2020 competition, and discuss our adjustments and improvements for the upcoming OCRTOC 2021 competition. The homepage of the OCRTOC competition is www.ocrtoc.org, and the OCRTOC software package is available at https://github.com/OCRTOC/OCRTOC_software_package.

24 citations


Posted Content
TL;DR: In this article, a unified framework is proposed to handle 9DoF pose tracking for novel rigid object instances as well as per-part pose tracking of articulated objects from known categories, where the 9DoF pose, comprising 6D pose and 3D size, is equivalent to a 3D amodal bounding box representation with free 6D pose.
Abstract: In this work, we tackle the problem of category-level online pose tracking of objects from point cloud sequences. For the first time, we propose a unified framework that can handle 9DoF pose tracking for novel rigid object instances as well as per-part pose tracking for articulated objects from known categories. Here the 9DoF pose, comprising 6D pose and 3D size, is equivalent to a 3D amodal bounding box representation with free 6D pose. Given the depth point cloud at the current frame and the estimated pose from the last frame, our novel end-to-end pipeline learns to accurately update the pose. Our pipeline is composed of three modules: 1) a pose canonicalization module that normalizes the pose of the input depth point cloud; 2) RotationNet, a module that directly regresses small interframe delta rotations; and 3) CoordinateNet, a module that predicts the normalized coordinates and segmentation, enabling analytical computation of the 3D size and translation. Leveraging the small pose regime in the pose-canonicalized point clouds, our method integrates the best of both worlds by combining dense coordinate prediction and direct rotation regression, thus yielding an end-to-end differentiable pipeline optimized for 9DoF pose accuracy (without using non-differentiable RANSAC). Our extensive experiments demonstrate that our method achieves new state-of-the-art performance on category-level rigid object pose (NOCS-REAL275) and articulated object pose benchmarks (SAPIEN, BMVC) at the fastest FPS of ~12.
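
The analytic size/translation step can be illustrated with a least-squares derivation: given observed points, predicted normalized coordinates, and a rotation, scale and translation follow in closed form under the model obs ≈ s·R·nocs + t. This is a sketch of the general idea, not the paper's exact estimator.

```python
import numpy as np

def solve_scale_translation(obs_pts, nocs_pts, R):
    """Closed-form scale and translation given predicted normalized
    coordinates, assuming obs ~ s * R @ nocs + t (illustrative derivation).
    """
    rotated = nocs_pts @ R.T
    mu_o, mu_r = obs_pts.mean(0), rotated.mean(0)
    oc, rc = obs_pts - mu_o, rotated - mu_r
    s = (oc * rc).sum() / (rc * rc).sum()   # least-squares scale
    t = mu_o - s * mu_r
    return s, t

# Sanity check on synthetic data.
rng = np.random.default_rng(0)
nocs = rng.uniform(-0.5, 0.5, (100, 3))
R = np.eye(3)
obs = 2.0 * nocs @ R.T + np.array([0.1, -0.2, 0.3])
print(solve_scale_translation(obs, nocs, R))  # ~ (2.0, [0.1, -0.2, 0.3])
```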

22 citations


Journal ArticleDOI
TL;DR: In this article, the effects of La(III) on the anammox process in terms of performance, microbial community structure, metabolic function, and microbial co-occurrence networks were investigated.

22 citations


Journal ArticleDOI
TL;DR: This work unravels a strategic approach and insight into how Nitrosomonas and other functional genera regulate and control their activities against La(III) toxicity and subsequently enhance the PN process.

18 citations


Journal ArticleDOI
TL;DR: The COVID-19 pandemic has highlighted key challenges for patient care and health provider safety; adaptable robotic systems with enhanced sensing, manipulation, and autonomy capabilities could help address these challenges.
Abstract: The COVID-19 pandemic has highlighted key challenges for patient care and health provider safety. Adaptable robotic systems, with enhanced sensing, manipulation and autonomy capabilities could help address these challenges in future infectious disease outbreaks.

Journal ArticleDOI
TL;DR: The development of a fully actuated robotic assistant for magnetic resonance imaging (MRI) guided precision conformal ablation of brain tumors using an interstitial high-intensity needle-based therapeutic ultrasound ablator probe is reported.
Abstract: This article reports the development of a fully actuated robotic assistant for magnetic resonance imaging (MRI) guided precision conformal ablation of brain tumors using an interstitial high-intensity needle-based therapeutic ultrasound ablator probe. The robot is designed with an eight degree-of-freedom (8-DOF) remote center of motion manipulator driven by piezoelectric actuators, five for aligning the ultrasound thermal ablator to the target lesions, and three for inserting and orienting the ablator and its cannula to generate a desired ablation profile. The 8-DOF fully actuated robot can be operated in the scanner bore during imaging, thus alleviating the need for moving the patient in or out of the scanner during the procedure, and therefore potentially reducing the procedure time and streamlining the workflow. The free space positioning accuracy of the system is evaluated with the OptiTrack motion capture system, demonstrating the root-mean-square (RMS) error of the tip position to be 1.11 ± 0.43 mm. The system targeting accuracy in MRI is assessed with phantom studies, indicating the RMS errors of the tip position to be 1.45 ± 0.66 mm and orientation to be 1.53 ± 0.69°. The feasibility of the system to perform thermal ablation is validated through a preliminary ex-vivo tissue study with position error less than 4.3 mm and orientation error less than 4.3°.

Proceedings ArticleDOI
20 Jun 2021
TL;DR: DeepMetaHandles as discussed by the authors learns a set of meta-handles for each shape, represented as combinations of the given deformation handles; the disentangled meta-handles factorize all plausible deformations of the shape, while each of them corresponds to an intuitive deformation.
Abstract: We propose DeepMetaHandles, a 3D conditional generative model based on mesh deformation. Given a collection of 3D meshes of a category and their deformation handles (control points), our method learns a set of meta-handles for each shape, which are represented as combinations of the given handles. The disentangled meta-handles factorize all the plausible deformations of the shape, while each of them corresponds to an intuitive deformation. A new deformation can then be generated by sampling the coefficients of the meta-handles in a specific range. We employ biharmonic coordinates as the deformation function, which can smoothly propagate the control points' translations to the entire mesh. To avoid learning zero deformation as meta-handles, we incorporate a target-fitting module which deforms the input mesh to match a random target. To enhance deformations' plausibility, we employ a soft-rasterizer-based discriminator that projects the meshes to a 2D space. Our experiments demonstrate the superiority of the generated deformations as well as the interpretability and consistency of the learned meta-handles. The code is available at https://github.com/Colin97/DeepMetaHandles.
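
A sketch of how a sampled meta-handle deformation could be applied follows: coefficients combine meta-handles into per-handle translations, which precomputed biharmonic weights then propagate smoothly over the mesh. Shapes and names here are illustrative assumptions.

```python
import numpy as np

def deform_with_meta_handles(verts, W, meta_handles, coeffs):
    """Apply a sampled meta-handle deformation (illustrative sketch).

    verts:        (V, 3) mesh vertices
    W:            (V, H) precomputed biharmonic weights: how each control
                  handle's translation propagates to each vertex
    meta_handles: (M, H, 3) each meta-handle combines per-handle directions
    coeffs:       (M,) sampled coefficients, one per meta-handle
    """
    # Aggregate the meta-handles into per-handle translations...
    handle_offsets = np.einsum("m,mhk->hk", coeffs, meta_handles)  # (H, 3)
    # ...then smoothly propagate them to the whole mesh.
    return verts + W @ handle_offsets

V, H, M = 100, 8, 4
rng = np.random.default_rng(1)
verts = rng.normal(size=(V, 3))
W = np.abs(rng.normal(size=(V, H)))
W /= W.sum(1, keepdims=True)                  # toy stand-in for biharmonics
metas = rng.normal(size=(M, H, 3))
coeffs = rng.uniform(-0.5, 0.5, M)            # sample within a fixed range
print(deform_with_meta_handles(verts, W, metas, coeffs).shape)
```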

Journal ArticleDOI
TL;DR: The proposed system provides an enabling intervention to potentially reduce musculoskeletal injury risks of construction workers by controlling the assistive knee joint torque provided by lightweight exoskeletons with powerful quasi-direct drive actuation.
Abstract: Construction workers regularly perform tasks that require kneeling, crawling, and squatting. Working in awkward kneeling postures for prolonged time periods can lead to knee pain, injuries, and osteoarthritis. In this article, we present lightweight, wearable sensing, and knee assistive devices for construction workers during kneeling and squatting tasks. Analysis of kneeling on level and sloped surfaces (0°, 10°, and 20°) is performed for single- and double-leg kneeling tasks. Measurements from the integrated inertial measurement units are used for real-time gait detection and lower limb pose estimation. Detected gait events and pose estimation are used to control the assistive knee joint torque provided by lightweight exoskeletons with powerful quasi-direct drive actuation. Human subject experiments are conducted to validate the effectiveness of the proposed analysis and control design. The results show reduction in knee extension/flexion muscle activation (up to 39%) during stand-to-kneel and kneel-to-stand tasks. Knee-ground contact forces/pressures are also reduced (up to 15%) under robotic assistance during single-leg kneeling. Increasing assistive knee torque shows redistribution of the subject's weight from the knee in contact with the ground to both supporting feet. The proposed system provides an enabling intervention to potentially reduce musculoskeletal injury risks of construction workers.
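
As a generic illustration of IMU-based pose estimation (the abstract does not specify the estimator), the sketch below fuses gyroscope rate with an accelerometer-derived inclination using a standard complementary filter:

```python
import numpy as np

def complementary_filter(gyro, accel_angle, dt=0.01, alpha=0.98):
    """Estimate a joint angle by fusing gyroscope rate with an
    accelerometer-derived inclination (generic IMU fusion sketch).

    gyro:        (T,) angular rate about the joint axis [rad/s]
    accel_angle: (T,) inclination angle from the accelerometer [rad]
    """
    angle = np.empty_like(accel_angle)
    angle[0] = accel_angle[0]
    for t in range(1, len(gyro)):
        # Trust the integrated gyro short-term, the accelerometer long-term.
        angle[t] = alpha * (angle[t - 1] + gyro[t] * dt) \
                   + (1 - alpha) * accel_angle[t]
    return angle

t = np.linspace(0, 2, 200)
true = 0.5 * np.sin(2 * np.pi * t)                        # toy joint motion
gyro = np.gradient(true, t) + np.random.normal(0, 0.05, t.size)
acc = true + np.random.normal(0, 0.1, t.size)
print(complementary_filter(gyro, acc)[:5])
```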

Journal ArticleDOI
TL;DR: In this article, a localized chemo-immunotherapy system was developed using an anti-cancer drug-based supramolecular polymer (SP) hydrogel to re-edit the host's immune system to combat cancer.

Proceedings Article
03 May 2021
TL;DR: BiPointNet as discussed by the authors introduces Entropy-Maximizing Aggregation (EMA) to modulate the distribution before aggregation for the maximum information entropy, and Layer-wise Scale Recovery (LSR) to efficiently restore feature representation capacity.
Abstract: To alleviate the resource constraint for real-time point cloud applications that run on edge devices, in this paper we present BiPointNet, the first model binarization approach for efficient deep learning on point clouds. We discover that the immense performance drop of binarized models for point clouds mainly stems from two challenges: aggregation-induced feature homogenization that leads to a degradation of information entropy, and scale distortion that hinders optimization and invalidates scale-sensitive structures. With theoretical justifications and in-depth analysis, our BiPointNet introduces Entropy-Maximizing Aggregation (EMA) to modulate the distribution before aggregation for the maximum information entropy, and Layer-wise Scale Recovery (LSR) to efficiently restore feature representation capacity. Extensive experiments show that BiPointNet outperforms existing binarization methods by convincing margins, at the level even comparable with the full precision counterpart. We highlight that our techniques are generic, guaranteeing significant improvements on various fundamental tasks and mainstream backbones. Moreover, BiPointNet gives an impressive 14.7× speedup and 18.9× storage saving on real-world resource-constrained devices.
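
The two techniques can be sketched roughly as follows: a binarized linear layer with a learnable layer-wise scale (LSR) and a distribution shift before max-pooling (EMA). This is a simplified reading of the abstract, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLinearLSR(nn.Module):
    """Binarized linear layer with layer-wise scale recovery (LSR sketch).

    Straight-through estimation and a learnable per-output scale are
    standard binarization ingredients; the paper's formulation may differ.
    """
    def __init__(self, in_f, out_f):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.1)
        self.scale = nn.Parameter(torch.ones(out_f))   # recovers magnitude

    def forward(self, x):
        # Binarize forward, keep identity gradients backward (STE).
        bx = x + (torch.sign(x) - x).detach()
        bw = self.weight + (torch.sign(self.weight) - self.weight).detach()
        return F.linear(bx, bw) * self.scale

def ema_max_pool(feat, delta):
    """Entropy-Maximizing Aggregation (sketch): shift the pre-aggregation
    distribution by an offset before max-pooling over the point axis."""
    return (feat - delta).max(dim=1).values            # feat: (B, N, C)

x = torch.randn(2, 64, 16)                             # per-point features
print(ema_max_pool(BiLinearLSR(16, 32)(x), delta=0.5).shape)  # (2, 32)
```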

Journal ArticleDOI
19 Jul 2021
TL;DR: In this article, a reinforcement learning-based motion controller for a lower extremity rehabilitation exoskeleton is proposed to perform collaborative squatting exercises with efficiency, stability, and strong robustness.
Abstract: A significant challenge for the control of a robotic lower extremity rehabilitation exoskeleton is to ensure stability and robustness during programmed tasks or motions, which is crucial for the safety of the mobility-impaired user. Due to various levels of the user's disability, the human-exoskeleton interaction forces and external perturbations are unpredictable and could vary substantially and cause conventional motion controllers to behave unreliably or the robot to fall down. In this work, we propose a new, reinforcement learning-based, motion controller for a lower extremity rehabilitation exoskeleton, aiming to perform collaborative squatting exercises with efficiency, stability, and strong robustness. Unlike most existing rehabilitation exoskeletons, our exoskeleton has ankle actuation in both the sagittal and frontal planes and is equipped with multiple foot force sensors to estimate the center of pressure (CoP), an important indicator of system balance. This proposed motion controller takes advantage of the CoP information by incorporating it in the state input of the control policy network and adding it to the reward during learning to maintain a well-balanced system state during motions. In addition, we use dynamics randomization and adversarial force perturbations, including large human interaction forces, during training to further improve control robustness. To evaluate the effectiveness of the learning controller, we conduct numerical experiments with different settings to demonstrate its remarkable ability to control the exoskeleton to repetitively perform well-balanced and robust squatting motions under strong perturbations and realistic human interaction forces.
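
A toy version of a CoP-aware reward term might look like the following; the weights and functional form are illustrative assumptions, since the abstract only states that the CoP enters the state and reward.

```python
import numpy as np

def balance_reward(q, q_ref, cop, foot_center, half_len,
                   w_track=1.0, w_cop=0.5):
    """Squat-tracking reward with a center-of-pressure balance term
    (illustrative weights and form; the paper's reward is richer).

    q, q_ref:    joint angles and their reference trajectory
    cop:         2D center of pressure estimated from foot force sensors
    foot_center: 2D center of the support region
    half_len:    half-extent of the support region, used to normalize
    """
    track = np.exp(-np.sum((q - q_ref) ** 2))          # follow the squat
    # Penalize CoP drift away from the middle of the support polygon.
    cop_err = np.linalg.norm((cop - foot_center) / half_len)
    balance = np.exp(-cop_err ** 2)
    return w_track * track + w_cop * balance

print(balance_reward(np.zeros(6), np.zeros(6),
                     np.array([0.02, 0.0]), np.zeros(2), 0.1))
```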

Proceedings Article
03 May 2021
TL;DR: PlasticineLab as mentioned in this paper is a differentiable physics benchmark for soft body manipulation tasks, where an agent uses manipulators to deform the plasticine into a desired configuration; existing simulated environments, by contrast, usually do not provide gradients that might be useful for planning and control optimizations.
Abstract: Simulated virtual environments serve as one of the main driving forces behind developing and evaluating skill learning algorithms. However, existing environments typically only simulate rigid body physics. Additionally, the simulation process usually does not provide gradients that might be useful for planning and control optimizations. We introduce a new differentiable physics benchmark called PlasticineLab, which includes a diverse collection of soft body manipulation tasks. In each task, the agent uses manipulators to deform the plasticine into a desired configuration. The underlying physics engine supports differentiable elastic and plastic deformation using the DiffTaichi system, posing many under-explored challenges to robotic agents. We evaluate several existing reinforcement learning (RL) methods and gradient-based methods on this benchmark. Experimental results suggest that 1) RL-based approaches struggle to solve most of the tasks efficiently; 2) gradient-based approaches, by optimizing open-loop control sequences with the built-in differentiable physics engine, can rapidly find a solution within tens of iterations, but still fall short on multi-stage tasks that require long-term planning. We expect that PlasticineLab will encourage the development of novel algorithms that combine differentiable physics and RL for more complex physics-based skill learning tasks. PlasticineLab will be made publicly available.
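
The gradient-based result can be reproduced in miniature with any differentiable dynamics: optimize an open-loop action sequence by backpropagating through the rollout. The 1-D point mass below is a toy stand-in for PlasticineLab's soft-body engine.

```python
import torch

def rollout(actions, x0=0.0, dt=0.1):
    """Toy differentiable 'physics': a 1-D point mass pushed by actions."""
    x = torch.tensor(x0)
    v = torch.tensor(0.0)
    for a in actions:
        v = v + a * dt       # gradients flow through the dynamics
        x = x + v * dt
    return x

target = 1.0
actions = torch.zeros(20, requires_grad=True)   # open-loop control sequence
opt = torch.optim.Adam([actions], lr=0.1)
for _ in range(50):
    loss = (rollout(actions) - target) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))   # typically near zero within tens of iterations
```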

Proceedings Article
29 Mar 2021
TL;DR: GNeRF as discussed by the authors marries Generative Adversarial Networks (GAN) with Neural Radiance Field (NeRF) reconstruction for complex scenarios with unknown and even randomly initialized camera poses.
Abstract: We introduce GNeRF, a framework to marry Generative Adversarial Networks (GAN) with Neural Radiance Field (NeRF) reconstruction for complex scenarios with unknown and even randomly initialized camera poses. Recent NeRF-based advances have gained popularity for remarkably realistic novel view synthesis. However, most of them heavily rely on accurate camera pose estimation, while a few recent methods can only optimize the unknown camera poses in roughly forward-facing scenes with relatively short camera trajectories and require rough camera pose initialization. Differently, our GNeRF only utilizes randomly initialized poses for complex outside-in scenarios. We propose a novel two-phase end-to-end framework. The first phase takes the use of GANs into the new realm of optimizing coarse camera poses and radiance fields jointly, while the second phase refines them with an additional photometric loss. We overcome local minima using a hybrid and iterative optimization scheme. Extensive experiments on a variety of synthetic and natural scenes demonstrate the effectiveness of GNeRF. More impressively, our approach outperforms the baselines favorably in those scenes with repeated patterns or even low textures, which were previously regarded as extremely challenging.
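
Phase two can be sketched as a joint photometric refinement of poses and the radiance field; the toy "renderer" below is a stand-in for NeRF volume rendering and is purely illustrative.

```python
import torch

# Toy joint refinement: a linear "renderer" stands in for NeRF rendering,
# and each pose is a 6-D vector. Purely illustrative.
render = torch.nn.Linear(6, 3)                  # pose -> predicted pixel RGB
poses = torch.nn.Parameter(torch.rand(10, 6))   # coarse poses from phase one
images = torch.rand(10, 3)                      # observed pixels (toy data)

opt = torch.optim.Adam([poses] + list(render.parameters()), lr=1e-2)
for _ in range(200):
    loss = (render(poses) - images).pow(2).mean()   # photometric loss
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))   # both the poses and the "field" have been refined
```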

Journal ArticleDOI
TL;DR: In this paper, a coarse-to-fine approach is proposed to generate robust aesthetic QR codes based on a module-based scanning probability estimation model that can effectively balance the tradeoff between visual quality and scanning robustness.
Abstract: Quick response (QR) codes are usually scanned in different environments, so they must be robust to variations in illumination, scale, coverage, and camera angles. Aesthetic QR codes improve the visual quality, but subtle changes in their appearance may cause scanning failure. In this article, a new method to generate scanning-robust aesthetic QR codes is proposed, which is based on a module-based scanning probability estimation model that can effectively balance the tradeoff between visual quality and scanning robustness. Our method locally adjusts the luminance of each module by estimating the probability of successful sampling. The approach adopts a hierarchical, coarse-to-fine strategy to enhance the visual quality of aesthetic QR codes, sequentially generating the following three codes: a binary aesthetic QR code, a grayscale aesthetic QR code, and the final color aesthetic QR code. Our approach can also be used to create QR codes with different visual styles by adjusting some initialization parameters. User surveys and decoding experiments were adopted to evaluate our method compared with state-of-the-art algorithms, indicating that the proposed approach has excellent performance in terms of both visual quality and scanning robustness.
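
A toy version of the per-module luminance adjustment follows: nudge a module toward its required polarity until an estimated sampling-success probability clears a threshold. The sigmoid noise model is an assumption standing in for the paper's learned probability estimator.

```python
import numpy as np

def adjust_module_luminance(lum, is_dark, p_target=0.95, step=0.05):
    """Nudge a module's luminance until an estimated sampling-success
    probability clears a threshold (toy stand-in for the paper's
    module-based scanning probability model).

    lum:     current luminance in [0, 1] from the blended aesthetic image
    is_dark: True if the QR spec requires this module to read as dark
    """
    def p_success(l):
        # Probability a binarizer (threshold 0.5) reads the intended value,
        # modeled as a sigmoid of the margin -- an assumed noise model.
        margin = (0.5 - l) if is_dark else (l - 0.5)
        return 1.0 / (1.0 + np.exp(-12.0 * margin))

    while p_success(lum) < p_target:
        lum += -step if is_dark else step     # darken or lighten minimally
        lum = float(np.clip(lum, 0.0, 1.0))
        if lum in (0.0, 1.0):
            break
    return lum

print(adjust_module_luminance(0.45, is_dark=True))   # darkened slightly
```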

Proceedings ArticleDOI
23 Apr 2021
TL;DR: The OCRTOC benchmark as mentioned in this paper provides a set of identical real robot setups and facilitates remote experiments of standardized table organization scenarios in varying difficulties, and users upload their solutions to a remote server and their code is executed on the real robot setup and scored automatically.
Abstract: In this paper, we propose a cloud-based benchmark for robotic grasping and manipulation, called the OCRTOC benchmark. The benchmark focuses on the object rearrangement problem, specifically table organization tasks. We provide a set of identical real robot setups and facilitate remote experiments of standardized table organization scenarios in varying difficulties. In this workflow, users upload their solutions to our remote server and their code is executed on the real robot setups and scored automatically. After each execution, the OCRTOC team resets the experimental setup manually. We also provide a simulation environment that researchers can use to develop and test their solutions. With the OCRTOC benchmark, we aim to lower the barrier of conducting reproducible research on robotic grasping and manipulation and accelerate progress in this field. Executing standardized scenarios on identical real robot setups allows us to quantify algorithm performances and achieve fair comparisons. Using this benchmark we held a competition in the 2020 International Conference on Intelligent Robots and Systems (IROS 2020). In total, 59 teams took part in this competition worldwide. We present the results and our observations of the 2020 competition, and discuss our adjustments and improvements for the upcoming OCRTOC 2021 competition. The homepage of the OCRTOC competition is www.ocrtoc.org, and the OCRTOC software package is available at https://github.com/OCRTOC/OCRTOC_software_package.

Journal ArticleDOI
TL;DR: It was found that the effect of dosing Ce(III) in the PN system correlated strongly with the AOR, and two-dimensional correlation infrared spectroscopy (2DCOS-IR) revealed the ester group (uronic acid) as a major organic functional group that promoted Ce(III) removal.

Journal ArticleDOI
TL;DR: In this article, a self-limiting supramolecular polymerization (SPZ) of a series of multiarmed amphiphiles with propagation-attenuated reactivities that can automatically terminate the polymerization process is described.
Abstract: A fundamental goal in the noncovalent synthesis of ordered supramolecular polymers (SPs) is to achieve precise control over their size and size distribution; however, the reversible nature of noncovalent interactions often results in formation of living SPs with high dispersity in length. We report here on the self-limiting supramolecular polymerization (SPZ) of a series of multiarmed amphiphiles with propagation-attenuated reactivities that can automatically terminate the polymerization process, enabling effective control in both lengths and polydispersity. Through incorporating multiarmed oligoethylene-glycol (OEG) onto a quadratic aromatic segment, the lengths of the resultant SPs can be tuned from ∼1 μm to 130 and 50 nm with a polydispersity index of ∼1.2 for the last two SPs. We believe that the level of chain frustration of the multiarmed OEG segments, determined by both the number of arms and the degree of polymerization, poses physical and entropic constraints for supramolecular propagation to exceed a threshold length.

Posted Content
TL;DR: MVSNeRF as discussed by the authors proposes a generic deep neural network that can reconstruct radiance fields from only three nearby input views via fast network inference, leveraging plane-swept cost volumes (widely used in multi-view stereo) for geometry-aware scene reasoning.
Abstract: We present MVSNeRF, a novel neural rendering approach that can efficiently reconstruct neural radiance fields for view synthesis. Unlike prior works on neural radiance fields that consider per-scene optimization on densely captured images, we propose a generic deep neural network that can reconstruct radiance fields from only three nearby input views via fast network inference. Our approach leverages plane-swept cost volumes (widely used in multi-view stereo) for geometry-aware scene reasoning, and combines this with physically based volume rendering for neural radiance field reconstruction. We train our network on real objects in the DTU dataset, and test it on three different datasets to evaluate its effectiveness and generalizability. Our approach can generalize across scenes (even indoor scenes, completely different from our training scenes of objects) and generate realistic view synthesis results using only three input images, significantly outperforming concurrent works on generalizable radiance field reconstruction. Moreover, if dense images are captured, our estimated radiance field representation can be easily fine-tuned; this leads to fast per-scene reconstruction with higher rendering quality and substantially less optimization time than NeRF.

Journal Article
TL;DR: Guided by the proposed sampling spectrum, a middle-point sampling-aware baseline discriminator, PointNet-Mix, is discovered, which improves all existing point cloud generators by a large margin on sampling-related metrics.
Abstract: In this paper, we examine the long-neglected yet important effects of point sampling patterns in point cloud GANs. Through extensive experiments, we show that sampling-insensitive discriminators (e.g. PointNet-Max) produce shape point clouds with point clustering artifacts while sampling-oversensitive discriminators (e.g. PointNet++, DGCNN, PointConv, KPConv) fail to guide valid shape generation. We propose the concept of sampling spectrum to depict the different sampling sensitivities of discriminators. We further study how different evaluation metrics weigh the sampling pattern against the geometry and propose several perceptual metrics forming a sampling spectrum of metrics. Guided by the proposed sampling spectrum, we discover a middle-point sampling-aware baseline discriminator, PointNet-Mix, which improves all existing point cloud generators by a large margin on sampling-related metrics. We point out that, given that recent research has been focused on the generator design, the discriminator design needs more attention. Our work provides both suggestions and tools for building future discriminators. We will release the code to facilitate future research.
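
The PointNet-Mix idea reduces to concatenating max- and average-pooled features, as in this sketch (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class PointNetMix(nn.Module):
    """Discriminator backbone with mixed max + average pooling, sketching
    the "PointNet-Mix" idea (layer sizes here are illustrative)."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(2 * feat_dim, 1)   # real/fake logit

    def forward(self, pts):                      # pts: (B, N, 3)
        f = self.mlp(pts)                        # (B, N, C)
        # Max pooling is sampling-insensitive; average pooling reflects
        # point density. Concatenating both gives a discriminator that is
        # aware of, but not oversensitive to, sampling patterns.
        mixed = torch.cat([f.max(1).values, f.mean(1)], dim=-1)
        return self.head(mixed)

print(PointNetMix()(torch.rand(4, 2048, 3)).shape)  # (4, 1)
```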

Proceedings Article
01 Jan 2021
TL;DR: This article proposes to enforce the translated outputs to be semantically invariant w.r.t. small perceptual variations of the inputs, a property the authors call "semantic robustness".
Abstract: Many applications of unpaired image-to-image translation require the input contents to be preserved semantically during translations. Unaware of the inherently unmatched semantics distributions between source and target domains, existing distribution matching methods (i.e., GAN-based) can give undesired solutions. In particular, although producing visually reasonable outputs, the learned models usually flip the semantics of the inputs. To tackle this without using extra supervision, we propose to enforce the translated outputs to be semantically invariant w.r.t. small perceptual variations of the inputs, a property we call "semantic robustness". By optimizing a robustness loss w.r.t. multi-scale feature space perturbations of the inputs, our method effectively reduces semantics flipping and produces translations that outperform existing methods both quantitatively and qualitatively.
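
A simplified version of the robustness loss is sketched below; for brevity it perturbs the input image directly, whereas the paper perturbs multi-scale feature spaces.

```python
import torch

def semantic_robustness_loss(G, x, scales=(0.01, 0.05), n=2):
    """Penalize output changes under small input perturbations
    (a simplified take on the paper's multi-scale feature-space
    perturbations: here the image itself is perturbed for brevity).

    G: image-to-image generator, x: input batch (B, C, H, W)
    """
    y = G(x)
    loss = 0.0
    for s in scales:
        for _ in range(n):
            y_pert = G(x + s * torch.randn_like(x))
            # Translations of nearby inputs should stay semantically close.
            loss = loss + (y_pert - y).abs().mean()
    return loss / (len(scales) * n)

G = torch.nn.Conv2d(3, 3, 3, padding=1)   # stand-in generator
print(float(semantic_robustness_loss(G, torch.rand(2, 3, 32, 32))))
```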

Proceedings Article
Hao Su, Jianwei Niu, Xuefeng Liu, Qingfeng Li, Jiahe Cui, Ji Wan
18 May 2021
TL;DR: MangaGAN as discussed by the authors generates geometric features and converts each facial region into the manga domain with a tailored multi-GANs architecture, producing high-quality manga faces that preserve both facial similarity and manga style and outperforming other reference methods.
Abstract: Manga is a world-popular comic form that originated in Japan, which typically employs black-and-white stroke lines and geometric exaggeration to describe humans' appearances, poses, and actions. In this paper, we propose MangaGAN, the first method based on Generative Adversarial Networks (GANs) for unpaired photo-to-manga translation. Inspired by the drawing process of experienced manga artists, MangaGAN generates geometric features and converts each facial region into the manga domain with a tailored multi-GANs architecture. For training MangaGAN, we collect a new dataset from a popular manga work with extensive features. To produce high-quality manga faces, we propose a structural smoothing loss to smooth stroke lines and avoid noisy pixels, and a similarity preserving module to improve the similarity between the photo and manga domains. Extensive experiments show that MangaGAN can produce high-quality manga faces preserving both the facial similarity and manga style, and outperforms other reference methods.

Journal ArticleDOI
TL;DR: It is found that under physiological conditions the DOTA-conjugated CPT prodrug can self-assemble into tubular supramolecular polymers (SPs) with a length of several micrometers, and it is believed that the design and optimization of self-assembling theranostic conjugates could provide a robust yet simple platform for the development of new imaging-guided drug delivery systems.
Abstract: Therapeutic constructs with imaging modalities hold great promise for improving the treatment efficacy for cancer and many other diseases. We report here the design and synthesis of a self-assembling prodrug (SAPD) by the direct linkage of camptothecin (CPT), an anticancer drug, to a metal-chelating agent, DOTA. We found that under physiological conditions the DOTA-conjugated CPT prodrug can self-assemble into tubular supramolecular polymers (SPs) with a length of several micrometers. Our studies also suggest that the resultant assemblies were stable in biological environments and exhibited a fast drug release rate in the presence of intracellular glutathione. Furthermore, the SAPD exhibited remarkable in vitro efficacy against various cancer cell lines and effectively inhibited the growth of tumor spheroids. We believe that the design and optimization of self-assembling theranostic conjugates could provide a robust yet simple platform for the development of new imaging-guided drug delivery systems.

Posted Content
TL;DR: In this article, the authors introduce a differentiable physics benchmark called PlasticineLab, which includes a diverse collection of soft body manipulation tasks, and evaluate several existing reinforcement learning (RL) methods and gradient-based methods on this benchmark.
Abstract: Simulated virtual environments serve as one of the main driving forces behind developing and evaluating skill learning algorithms. However, existing environments typically only simulate rigid body physics. Additionally, the simulation process usually does not provide gradients that might be useful for planning and control optimizations. We introduce a new differentiable physics benchmark called PlasticineLab, which includes a diverse collection of soft body manipulation tasks. In each task, the agent uses manipulators to deform the plasticine into the desired configuration. The underlying physics engine supports differentiable elastic and plastic deformation using the DiffTaichi system, posing many under-explored challenges to robotic agents. We evaluate several existing reinforcement learning (RL) methods and gradient-based methods on this benchmark. Experimental results suggest that 1) RL-based approaches struggle to solve most of the tasks efficiently; 2) gradient-based approaches, by optimizing open-loop control sequences with the built-in differentiable physics engine, can rapidly find a solution within tens of iterations, but still fall short on multi-stage tasks that require long-term planning. We expect that PlasticineLab will encourage the development of novel algorithms that combine differentiable physics and RL for more complex physics-based skill learning tasks.

Posted Content
TL;DR: In this paper, the authors investigate causes of instability when using data augmentation in common off-policy RL algorithms and propose a simple yet effective technique for stabilizing this class of algorithms under augmentation.
Abstract: While agents trained by Reinforcement Learning (RL) can solve increasingly challenging tasks directly from visual observations, generalizing learned skills to novel environments remains very challenging. Extensive use of data augmentation is a promising technique for improving generalization in RL, but it is often found to decrease sample efficiency and can even lead to divergence. In this paper, we investigate causes of instability when using data augmentation in common off-policy RL algorithms. We identify two problems, both rooted in high-variance Q-targets. Based on our findings, we propose a simple yet effective technique for stabilizing this class of algorithms under augmentation. We perform extensive empirical evaluation of image-based RL using both ConvNets and Vision Transformers (ViT) on a family of benchmarks based on DeepMind Control Suite, as well as in robotic manipulation tasks. Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL. We further show that our method scales to RL with ViT-based architectures, and that data augmentation may be especially important in this setting.
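
The proposed stabilization can be sketched in the following spirit: compute the bootstrap Q-target from unaugmented observations only, and apply the prediction loss to both clean and augmented views. Function names and the toy setup are illustrative assumptions, not the paper's exact algorithm.

```python
import torch

def stabilized_aug_loss(Q, Q_target, obs, obs_aug, action, reward, next_obs,
                        gamma=0.99):
    """Q-update that avoids high-variance targets under augmentation:
    the bootstrap target uses only the clean observation, while the
    prediction loss covers both clean and augmented views (sketch).
    """
    with torch.no_grad():
        # Target computed from unaugmented data only -> low variance.
        y = reward + gamma * Q_target(next_obs).max(dim=-1).values

    q_clean = Q(obs).gather(-1, action.unsqueeze(-1)).squeeze(-1)
    q_aug = Q(obs_aug).gather(-1, action.unsqueeze(-1)).squeeze(-1)
    return ((q_clean - y) ** 2 + (q_aug - y) ** 2).mean()

# Toy usage with linear Q-networks over flat observations.
Q = torch.nn.Linear(8, 4)
Q_target = torch.nn.Linear(8, 4)
obs = torch.rand(16, 8)
loss = stabilized_aug_loss(Q, Q_target, obs,
                           obs + 0.1 * torch.randn_like(obs),   # "augmented"
                           torch.randint(0, 4, (16,)),
                           torch.rand(16), torch.rand(16, 8))
print(float(loss))
```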