Author

Hao Su

Bio: Hao Su is an academic researcher from the University of California, San Diego. The author has contributed to research in topics including Computer science and Point cloud. The author has an h-index of 57 and has co-authored 302 publications receiving 55,902 citations. Previous affiliations of Hao Su include Philips and Jiangxi University of Science and Technology.


Papers
Journal ArticleDOI
TL;DR: This paper synthesizes novel viewpoints across a wide range of viewing directions (covering a 60° cone) from a sparse set of just six input views, using a deep convolutional network trained to directly synthesize new views from those six inputs.
Abstract: The goal of light transport acquisition is to take images from a sparse set of lighting and viewing directions, and combine them to enable arbitrary relighting with changing view. While relighting from sparse images has received significant attention, there has been relatively less progress on view synthesis from a sparse set of "photometric" images---images captured under controlled conditions, lit by a single directional source; we use a spherical gantry to position the camera on a sphere surrounding the object. In this paper, we synthesize novel viewpoints across a wide range of viewing directions (covering a 60° cone) from a sparse set of just six viewing directions. While our approach relates to previous view synthesis and image-based rendering techniques, those methods are usually restricted to much smaller baselines, and are captured under environment illumination. At our baselines, input images have few correspondences and large occlusions; however we benefit from structured photometric images. Our method is based on a deep convolutional network trained to directly synthesize new views from the six input views. This network combines 3D convolutions on a plane sweep volume with a novel per-view per-depth plane attention map prediction network to effectively aggregate multi-view appearance. We train our network with a large-scale synthetic dataset of 1000 scenes with complex geometry and material properties. In practice, it is able to synthesize novel viewpoints for captured real data and reproduces complex appearance effects like occlusions, view-dependent specularities and hard shadows. Moreover, the method can also be combined with previous relighting techniques to enable changing both lighting and view, and applied to computer vision problems like multiview stereo from sparse image sets.

109 citations
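The network above aggregates the six warped input views with per-view, per-depth-plane attention maps. A minimal sketch of that aggregation idea, not the authors' code, with illustrative module names and tensor shapes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionAggregation(nn.Module):
    """Illustrative aggregation of V warped views over D depth planes.

    Input:  feats  (B, V, C, D, H, W)  features warped onto a plane sweep volume
    Output: fused  (B, C, D, H, W)     attention-weighted combination over views
    """
    def __init__(self, channels):
        super().__init__()
        # Predict one attention logit per view, per depth plane, per pixel.
        self.attn = nn.Conv3d(channels, 1, kernel_size=3, padding=1)

    def forward(self, feats):
        b, v, c, d, h, w = feats.shape
        logits = self.attn(feats.reshape(b * v, c, d, h, w)).reshape(b, v, 1, d, h, w)
        weights = F.softmax(logits, dim=1)   # normalize attention across views
        return (weights * feats).sum(dim=1)  # (B, C, D, H, W)

# Example: 6 input views, 32 feature channels, 48 depth planes, 64x64 resolution.
fused = AttentionAggregation(32)(torch.randn(1, 6, 32, 48, 64, 64))
print(fused.shape)  # torch.Size([1, 32, 48, 64, 64])
```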

Proceedings ArticleDOI
14 May 2012
TL;DR: This paper presents a versatile magnetic resonance imaging (MRI)-compatible concentric tube continuum robotic system that enables MR image-guided placement of a curved, steerable active cannula and allows simultaneous robotic motion and imaging with no image quality degradation.
Abstract: This paper presents a versatile magnetic resonance imaging (MRI)-compatible concentric tube continuum robotic system. The system enables MR image-guided placement of a curved, steerable active cannula. It is suitable for a variety of clinical applications, including image-guided neurosurgery and percutaneous interventions, along with procedures that involve accessing a desired image target through a curved trajectory. This 6-degree-of-freedom (DOF) robotic device is piezoelectrically actuated to provide precise motion, with joint-level precision of better than 0.03 mm, and is fully MRI-compatible, allowing simultaneous robotic motion and imaging with no image quality degradation. The MRI compatibility of the robot has been evaluated in a 3 Tesla scanner using standard prostate imaging sequences, with an average signal-to-noise ratio loss of less than 2% during actuator motion. The accuracy of active cannula control was evaluated in benchtop trials using an external optical tracking system, with an RMS tip placement error of 1.00 mm. Preliminary phantom trials of three active cannula placements in the MRI scanner showed cannula trajectories that agree with our kinematic model, with an RMS tip placement error of 0.61–2.24 mm.

109 citations
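The tip placement accuracy above is reported as an RMS error between commanded and tracked cannula tip positions. A small generic sketch of how such a metric is computed, using made-up positions rather than data from the paper:

```python
import numpy as np

def rms_error(targets_mm, measured_mm):
    """Root-mean-square Euclidean distance between target and measured tip positions."""
    diffs = np.linalg.norm(np.asarray(targets_mm) - np.asarray(measured_mm), axis=1)
    return np.sqrt(np.mean(diffs ** 2))

# Hypothetical benchtop trial: three commanded vs. optically tracked tip positions (mm).
targets  = [[10.0, 20.0, 30.0], [12.0, 18.0, 29.0], [8.0, 22.0, 31.0]]
measured = [[10.6, 20.5, 30.4], [12.4, 17.3, 29.6], [8.5, 22.8, 30.5]]
print(f"RMS tip placement error: {rms_error(targets, measured):.2f} mm")
```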

Journal ArticleDOI
TL;DR: A magnetic resonance imaging (MRI)-guided, robotically actuated stereotactic neural intervention system for the deep brain stimulation procedure, which offers the potential of reducing procedure duration while improving targeting accuracy and enhancing safety.
Abstract: Stereotaxy is a neurosurgical technique that can take several hours to reach a specific target, typically utilizing a mechanical frame and guided by preoperative imaging. An error in any one of the numerous steps, or deviation of the target anatomy from the preoperative plan such as brain shift (up to 20 mm), may affect the targeting accuracy and thus the treatment effectiveness. Moreover, because the procedure is typically performed through a small burr-hole opening in the skull that prevents tissue visualization, the intervention is essentially "blind" for the operator, with limited means of intraoperative confirmation, which may result in reduced accuracy and safety. The presented system is intended to address the clinical need for enhanced efficiency, accuracy, and safety of image-guided stereotactic neurosurgery for deep brain stimulation lead placement. The study describes a magnetic resonance imaging (MRI)-guided, robotically actuated stereotactic neural intervention system for the deep brain stimulation procedure, which offers the potential of reducing procedure duration while improving targeting accuracy and enhancing safety. This is achieved through simultaneous robotic manipulation of the instrument and interactively updated in situ MRI guidance that enables visualization of the anatomy and the interventional instrument. During simultaneous actuation and imaging, the system demonstrated less than 15% signal-to-noise ratio variation and less than 0.20% geometric distortion artifact, without affecting the usability of the imaging to visualize and guide the procedure. Optical tracking and MRI phantom experiments streamlined the clinical workflow of the prototype system and corroborated its targeting accuracy, with a three-axis root-mean-square error of 1.38 ± 0.45 mm in tip position and 2.03 ± 0.58° in insertion angle.

107 citations
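The insertion-angle accuracy above compares planned and achieved needle axes. A generic sketch of that angular-error computation, with hypothetical direction vectors:

```python
import numpy as np

def insertion_angle_error_deg(planned_axis, achieved_axis):
    """Angle in degrees between two 3-D insertion direction vectors."""
    a = np.asarray(planned_axis, dtype=float)
    b = np.asarray(achieved_axis, dtype=float)
    cos_angle = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# Hypothetical planned vs. achieved insertion axes.
print(f"{insertion_angle_error_deg([0, 0, 1], [0.02, 0.03, 1.0]):.2f} deg")
```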

Journal ArticleDOI
TL;DR: This paper presents a fully actuated robotic system for percutaneous prostate therapy under continuously acquired live magnetic resonance imaging (MRI) guidance and develops a 6-degree-of-freedom needle placement robot for transperineal prostate interventions.
Abstract: This paper presents a fully actuated robotic system for percutaneous prostate therapy under continuously acquired live magnetic resonance imaging (MRI) guidance. The system is composed of modular hardware and software to support the surgical workflow of intraoperative MRI-guided surgical procedures. We present the development of a 6-degree-of-freedom (DOF) needle placement robot for transperineal prostate interventions. The robot consists of a 3-DOF needle driver module and a 3-DOF Cartesian motion module. The needle driver provides needle cannula translation and rotation (2-DOF) and stylet translation (1-DOF). A custom robot controller consisting of multiple piezoelectric motor drivers provides precision closed-loop control of piezoelectric motors and enables simultaneous robot motion and MR imaging. The developed modular robot control interface software performs image-based registration and kinematics calculation, and exchanges robot commands and coordinates between the navigation software and the robot controller with a new implementation of the open network communication protocol OpenIGTLink. Comprehensive compatibility of the robot is evaluated inside a 3-T MRI scanner using standard imaging sequences, and the signal-to-noise ratio loss is limited to 15%. Image deterioration due to the presence and motion of the robot is not observable. Twenty-five targeted needle placements inside gelatin phantoms using an 18-gauge ceramic needle demonstrated a 0.87-mm root-mean-square (RMS) error in 3-D Euclidean distance, based on MRI volume segmentation of the image-guided robotic needle placement procedure.

105 citations
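The control software above performs image-based registration between the robot and scanner coordinate frames. One standard way to do this is rigid point-set registration of fiducial locations via the Kabsch/SVD method; the sketch below is a generic illustration with hypothetical fiducial coordinates, not the authors' implementation:

```python
import numpy as np

def rigid_register(src, dst):
    """Least-squares rigid transform (R, t) mapping src points onto dst (Kabsch/SVD)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t

# Hypothetical fiducial positions in the robot frame vs. the MRI image frame (mm).
robot_pts = np.array([[0, 0, 0], [50, 0, 0], [0, 50, 0], [0, 0, 50]], float)
true_R = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)  # 90° rotation about z
image_pts = robot_pts @ true_R.T + [100.0, 20.0, -5.0]
R, t = rigid_register(robot_pts, image_pts)
print(np.allclose(robot_pts @ R.T + t, image_pts))  # True
```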

Proceedings Article
01 Jan 2021
TL;DR: MVSNeRF proposes a generic deep neural network that can reconstruct radiance fields from only three nearby input views via fast network inference, leveraging plane-swept cost volumes (widely used in multi-view stereo) for geometry-aware scene reasoning.
Abstract: We present MVSNeRF, a novel neural rendering approach that can efficiently reconstruct neural radiance fields for view synthesis. Unlike prior works on neural radiance fields that consider per-scene optimization on densely captured images, we propose a generic deep neural network that can reconstruct radiance fields from only three nearby input views via fast network inference. Our approach leverages plane-swept cost volumes (widely used in multi-view stereo) for geometry-aware scene reasoning, and combines this with physically based volume rendering for neural radiance field reconstruction. We train our network on real objects in the DTU dataset, and test it on three different datasets to evaluate its effectiveness and generalizability. Our approach can generalize across scenes (even indoor scenes, completely different from our training scenes of objects) and generate realistic view synthesis results using only three input images, significantly outperforming concurrent works on generalizable radiance field reconstruction. Moreover, if dense images are captured, our estimated radiance field representation can be easily fine-tuned; this leads to fast per-scene reconstruction with higher rendering quality and substantially less optimization time than NeRF.

94 citations
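MVSNeRF combines its cost-volume-based scene encoding with physically based volume rendering. A minimal sketch of the standard volume-rendering compositing step along a single ray (generic NeRF-style alpha compositing, not the authors' code; sample counts and spacings are illustrative):

```python
import torch

def composite_ray(sigmas, colors, deltas):
    """Alpha-composite per-sample densities and colors along one ray.

    sigmas: (N,) volume densities, colors: (N, 3) RGB, deltas: (N,) sample spacings.
    """
    alphas = 1.0 - torch.exp(-sigmas * deltas)  # opacity of each sample
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alphas + 1e-10])[:-1], dim=0)
    weights = alphas * trans                    # contribution of each sample
    return (weights.unsqueeze(-1) * colors).sum(dim=0)  # rendered RGB

# 64 samples along a ray with uniform 0.02 spacing.
rgb = composite_ray(torch.rand(64), torch.rand(64, 3), torch.full((64,), 0.02))
print(rgb)
```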


Cited by
Proceedings ArticleDOI
27 Jun 2016
TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; the resulting models won first place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations
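The residual reformulation above amounts to learning a residual function F(x) and outputting F(x) + x through a shortcut connection. A minimal PyTorch sketch of a basic residual block with an identity shortcut (an illustrative configuration, not the paper's full architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Two 3x3 convolutions with an identity shortcut: output = F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # residual connection: learn F(x), add the input back

print(BasicBlock(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```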

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

55,235 citations
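The key design choice above is stacking very small 3x3 convolution filters to increase depth. A minimal sketch of one VGG-style stage built that way (an illustrative configuration, not the released models):

```python
import torch
import torch.nn as nn

def vgg_stage(in_ch, out_ch, num_convs):
    """A VGG-style stage: repeated 3x3 conv + ReLU layers followed by 2x2 max pooling."""
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

stage = vgg_stage(3, 64, 2)                      # e.g., the first stage of a 16-layer config
print(stage(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 64, 112, 112])
```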

Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

49,914 citations

Posted Content
TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

44,703 citations

Book
18 Nov 2016
TL;DR: Deep learning, as presented in this book, is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations