Author

Roland Hu

Bio: Roland Hu is an academic researcher from Zhejiang University. He has contributed to research in the topics of segmentation and image segmentation, has an h-index of 12, and has co-authored 42 publications receiving 494 citations. His previous affiliations include the University of Southampton and the Université catholique de Louvain.

Papers
Proceedings ArticleDOI
23 Jun 2013
TL;DR: A new shape-driven approach to object segmentation that uses a deep Boltzmann machine to learn a hierarchical architecture of shape priors, which is then applied in data-driven variational methods to extract objects from corrupted data based on a probabilistic shape representation.
Abstract: In this paper we introduce a new shape-driven approach for object segmentation. Given a training set of shapes, we first use a deep Boltzmann machine to learn the hierarchical architecture of shape priors. This learned hierarchical architecture is then used to model shape variations of global and local structures in an energetic form. Finally, it is applied to data-driven variational methods to perform object extraction of corrupted data based on a shape probabilistic representation. Experiments demonstrate that our model can be applied to datasets of arbitrary prior shapes, and can cope with image noise and clutter, as well as partial occlusions.
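A minimal sketch of the energy idea behind such a prior, assuming a single-layer restricted Boltzmann machine rather than the paper's full deep Boltzmann machine (all sizes and parameters below are illustrative placeholders): a learned model assigns low free energy to plausible shapes, and that energy can serve as the shape term in a variational functional.

```python
import numpy as np

def rbm_free_energy(v, W, b, c):
    """Free energy F(v) of a binary RBM: lower F means a more plausible shape.

    v : (D,) binary visible vector (a flattened shape mask)
    W : (D, H) visible-hidden weights; b : (D,) visible bias; c : (H,) hidden bias
    """
    # F(v) = -b.v - sum_j log(1 + exp(c_j + (v.W)_j))
    return -v @ b - np.sum(np.logaddexp(0.0, c + v @ W))

rng = np.random.default_rng(0)
D, H = 16, 8                               # toy shape dimension / hidden units
W = rng.normal(0, 0.1, (D, H))
b = rng.normal(0, 0.1, D)
c = rng.normal(0, 0.1, H)

shape = (rng.random(D) < 0.5).astype(float)  # a random toy "shape"
print(rbm_free_energy(shape, W, b, c))       # scalar shape-prior energy
```

In the paper the hierarchy is deeper and the energy enters a data-driven variational segmentation; the sketch only shows how a learned model scores a candidate shape.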

88 citations

Posted Content
TL;DR: A novel progressive parameter pruning method, named Structured Probabilistic Pruning (SPP), which effectively prunes weights of convolutional layers in a probabilistic manner and can be directly applied to accelerate multi-branch CNN networks, such as ResNet, without specific adaptations.
Abstract: In this paper, we propose a novel progressive parameter pruning method for Convolutional Neural Network acceleration, named Structured Probabilistic Pruning (SPP), which effectively prunes weights of convolutional layers in a probabilistic manner. Unlike existing deterministic pruning approaches, where unimportant weights are permanently eliminated, SPP introduces a pruning probability for each weight, and pruning is guided by sampling from the pruning probabilities. A mechanism is designed to increase and decrease pruning probabilities based on importance criteria in the training process. Experiments show that, with 4x speedup, SPP can accelerate AlexNet with only 0.3% loss of top-5 accuracy and VGG-16 with 0.8% loss of top-5 accuracy in ImageNet classification. Moreover, SPP can be directly applied to accelerate multi-branch CNN networks, such as ResNet, without specific adaptations. Our 2x speedup ResNet-50 only suffers 0.8% loss of top-5 accuracy on ImageNet. We further show the effectiveness of SPP on transfer learning tasks.
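The probabilistic pruning step can be sketched as follows. This is a toy NumPy version: the layer size, target sparsity, update step, and the magnitude-based importance criterion are illustrative stand-ins, not the paper's exact choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "layer": 100 weights, target sparsity 50%.
w = rng.normal(size=100)
p = np.full(100, 0.5)                       # per-weight pruning probability (toy init)

for step in range(20):
    # Importance criterion: weight magnitude (a common choice; the paper's
    # exact criterion and probability-update rule differ in detail).
    rank = np.argsort(np.abs(w))            # least important weights first
    delta = np.zeros(100)
    delta[rank[:50]] += 0.02                # push unimportant weights toward pruning
    delta[rank[50:]] -= 0.02                # let important weights recover
    p = np.clip(p + delta, 0.0, 1.0)

    # Sample a binary mask from the pruning probabilities and apply it.
    mask = rng.random(100) >= p             # keep each weight with prob (1 - p)
    w_pruned = w * mask
    # ... a real pipeline would run a training step with w_pruned here ...

print(p[np.argsort(np.abs(w))[:50]].mean())  # pruning prob of the 50 smallest weights
```

The key contrast with deterministic pruning is visible in the loop: no weight is permanently removed; its pruning probability merely rises or falls with its importance, so a weight can recover during training.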

62 citations

Journal ArticleDOI
TL;DR: Zhang et al. utilize attribute detection to generate corresponding attribute-part detectors, whose invariance to influences such as pose and camera view can be guaranteed.

50 citations

Journal ArticleDOI
TL;DR: A novel variational model based on prior shapes for simultaneous object classification and segmentation is proposed, and a sparse linear combination of training shapes in a low-dimensional representation is used to regularize the target shape in variational image segmentation.
Abstract: In this paper, a novel variational model based on prior shapes for simultaneous object classification and segmentation is proposed. Given a set of training shapes of multiple object classes, a sparse linear combination of training shapes in a low-dimensional representation is used to regularize the target shape in variational image segmentation. By minimizing the proposed variational functional, the model is able to automatically select, by sparse recovery, the reference shapes that best represent the object, and to accurately segment the image, taking into account both the image information and the shape priors. For applications with a suitably sized training set, the proposed model allows the training set to be enlarged artificially with transformed shapes to achieve transformation invariance, while remaining jointly convex; it can also handle overlapping or multiple objects present in an image within a small range. Numerical experiments show promising results and the potential of the method for object classification and segmentation.
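The sparse-recovery ingredient can be sketched with ISTA, a standard solver for the lasso problem. The shapes and sizes below are toy placeholders, and the paper's full functional also couples this term with image data; the sketch only shows how sparsity selects the best-matching reference shape.

```python
import numpy as np

def ista(D, y, lam=0.1, n_iter=200):
    """Sparse coding min_x 0.5*||Dx - y||^2 + lam*||x||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ x - y)               # gradient of the quadratic term
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(1)
# Five "training shapes" of dimension 20; the target is a noisy copy of shape #2.
shapes = rng.normal(size=(20, 5))
target = shapes[:, 2] + 0.01 * rng.normal(size=20)

coef = ista(shapes, target)
print(np.argmax(np.abs(coef)))              # the dominant coefficient selects shape #2
```

Selecting which training shapes receive non-zero coefficients is exactly the "automatic selection of reference shapes" the abstract describes, here in its simplest form.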

41 citations

Proceedings ArticleDOI
19 Apr 2009
TL;DR: A blind and robust watermarking method for 3D polygonal meshes is proposed by minimising the mean square error between the original mesh and the watermarked mesh under several constraints.
Abstract: In this paper, we propose a blind and robust watermarking method for 3D polygonal meshes that minimises the mean square error between the original mesh and the watermarked mesh under several constraints. We formulate the problem of assigning distortions to the points of a 3D mesh as a quadratic programming problem, so it can be solved reliably and efficiently. Compared with similar approaches [1], experiments indicate the advantage of our method in resisting Gaussian noise.
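The embedding step can be illustrated as an equality-constrained least-squares QP: choose per-vertex distortions of minimum energy subject to linear constraints. Here `A` and `t` are random placeholders standing in for the paper's watermark constraints, and the problem is solved via its KKT system.

```python
import numpy as np

# Toy version of the embedding QP: choose distortions d minimising ||d||^2
# subject to linear watermark constraints A d = t (A, t are illustrative
# placeholders for the constraints that encode the watermark bits).
rng = np.random.default_rng(0)
n, m = 12, 3                                # 12 coordinates, 3 constraints
A = rng.normal(size=(m, n))
t = rng.normal(size=m)

# KKT system for min 0.5*d'd  s.t.  A d = t:
#   [ I  A' ] [d]   [0]
#   [ A  0  ] [u] = [t]
K = np.block([[np.eye(n), A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(n), t]))
d = sol[:n]                                  # minimum-distortion embedding

print(np.allclose(A @ d, t))                 # True: constraints satisfied
```

For this pure equality-constrained case the KKT solution coincides with the pseudo-inverse solution `pinv(A) @ t`; the paper's QP adds further constraints, which is why a general QP solver is needed there.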

32 citations


Cited by
Journal ArticleDOI
TL;DR: It is shown that local combination strategies outperform global methods in segmenting high-contrast structures, while global techniques are less sensitive to noise when contrast between neighboring structures is low.
Abstract: It has been shown that employing multiple atlas images improves segmentation accuracy in atlas-based medical image segmentation. Each atlas image is registered to the target image independently and the calculated transformation is applied to the segmentation of the atlas image to obtain a segmented version of the target image. Several independent candidate segmentations result from the process, which must be somehow combined into a single final segmentation. Majority voting is the generally used rule to fuse the segmentations, but more sophisticated methods have also been proposed. In this paper, we show that the use of global weights to weight candidate segmentations has a major limitation. As a means to improve segmentation accuracy, we propose the generalized local weighted voting method. Namely, the fusion weights adapt voxel-by-voxel according to a local estimation of segmentation performance. Using digital phantoms and MR images of the human brain, we demonstrate that the performance of each combination technique depends on the gray level contrast characteristics of the segmented region, and that no fusion method yields better results than the others for all the regions. In particular, we show that local combination strategies outperform global methods in segmenting high-contrast structures, while global techniques are less sensitive to noise when contrast between neighboring structures is low. We conclude that, in order to achieve the highest overall segmentation accuracy, the best combination method for each particular structure must be selected.
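A minimal sketch of local weighted voting, assuming a Gaussian intensity-similarity weight as the local performance estimate (one plausible choice; not necessarily the paper's exact estimator). Each atlas votes at each voxel with a weight reflecting how well its registered image matches the target there.

```python
import numpy as np

def local_weighted_vote(target, atlas_imgs, atlas_segs, sigma=0.1):
    """Fuse candidate segmentations with voxel-wise weights based on local
    intensity agreement between each registered atlas image and the target.

    target     : (V,)   target image intensities
    atlas_imgs : (K, V) registered atlas image intensities
    atlas_segs : (K, V) candidate binary segmentations
    """
    # Weight each atlas at each voxel by its intensity match to the target.
    w = np.exp(-((atlas_imgs - target) ** 2) / (2 * sigma ** 2))  # (K, V)
    w /= w.sum(axis=0, keepdims=True)          # normalise over atlases
    fused = (w * atlas_segs).sum(axis=0)       # weighted vote in [0, 1]
    return (fused > 0.5).astype(int)

rng = np.random.default_rng(0)
truth = (rng.random(50) > 0.5).astype(int)
target = truth + 0.05 * rng.normal(size=50)
# Three atlases: two accurate, one badly misregistered.
atlas_imgs = np.stack([target + 0.05 * rng.normal(size=50) for _ in range(2)]
                      + [rng.random(50)])
atlas_segs = np.stack([truth, truth, (rng.random(50) > 0.5).astype(int)])

print((local_weighted_vote(target, atlas_imgs, atlas_segs) == truth).mean())
```

With global weights the misregistered atlas would drag every voxel equally; the voxel-wise weights instead suppress it exactly where its intensities disagree with the target.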

546 citations

Posted Content
TL;DR: This paper proposes AutoML for Model Compression (AMC), which leverages reinforcement learning to learn the model compression policy; the learned policy outperforms conventional rule-based policies with a higher compression ratio, better preserved accuracy, and less human labor.
Abstract: Model compression is a critical technique to efficiently deploy neural network models on mobile devices which have limited computation resources and tight power budgets. Conventional model compression techniques rely on hand-crafted heuristics and rule-based policies that require domain experts to explore the large design space trading off among model size, speed, and accuracy, which is usually sub-optimal and time-consuming. In this paper, we propose AutoML for Model Compression (AMC), which leverages reinforcement learning to provide the model compression policy. This learning-based compression policy outperforms the conventional rule-based compression policy, achieving a higher compression ratio, better preserving accuracy, and freeing human labor. Under 4x FLOPs reduction, we achieved 2.7% better accuracy than the handcrafted model compression policy for VGG-16 on ImageNet. We applied this automated, push-the-button compression pipeline to MobileNet and achieved 1.81x speedup of measured inference latency on an Android phone and 1.43x speedup on the Titan XP GPU, with only 0.1% loss of ImageNet Top-1 accuracy.
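AMC's agent is a DDPG policy; a toy random search over per-layer pruning ratios is enough to sketch the search-based (versus rule-based) idea. All numbers below, including the per-layer FLOPs, the accuracy proxy, and the penalty, are made-up placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

LAYERS = 4
FLOPS = np.array([100., 200., 400., 100.])   # per-layer FLOPs (made up)
BUDGET = 0.5 * FLOPS.sum()                   # 2x FLOPs-reduction target

def reward(ratios):
    """Toy stand-in for 'evaluate the pruned net': an accuracy proxy that
    penalises aggressive pruning, minus a penalty for exceeding the budget."""
    flops = ((1 - ratios) * FLOPS).sum()
    acc_proxy = 1.0 - (ratios ** 2).mean()   # pruning hurts, quadratically
    return acc_proxy - 10.0 * max(0.0, (flops - BUDGET) / BUDGET)

# Random search stands in for the DDPG agent: propose per-layer pruning
# ratios, score them, keep the best.
best, best_r = None, -np.inf
for _ in range(2000):
    ratios = rng.uniform(0.0, 0.9, LAYERS)
    r = reward(ratios)
    if r > best_r:
        best, best_r = ratios, r

print(best.round(2), round(best_r, 3))
```

The point the abstract makes survives even in this caricature: the search discovers an uneven per-layer allocation under the budget, rather than applying one hand-crafted ratio everywhere.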

538 citations

Journal ArticleDOI
TL;DR: In this article, a generic training strategy incorporates anatomical prior knowledge into CNNs through a new regularisation model trained end-to-end; the framework encourages models to follow the global anatomical properties of the underlying anatomy via learnt non-linear representations of shape.
Abstract: Incorporation of prior knowledge about organ shape and location is key to improve performance of image analysis approaches. In particular, priors can be useful in cases where images are corrupted and contain artefacts due to limitations in image acquisition. The highly constrained nature of anatomical objects can be well captured with learning-based techniques. However, in most recent and promising techniques such as CNN-based segmentation it is not obvious how to incorporate such prior knowledge. State-of-the-art methods operate as pixel-wise classifiers where the training objectives do not incorporate the structure and inter-dependencies of the output. To overcome this limitation, we propose a generic training strategy that incorporates anatomical prior knowledge into CNNs through a new regularisation model, which is trained end-to-end. The new framework encourages models to follow the global anatomical properties of the underlying anatomy (e.g. shape, label structure) via learnt non-linear representations of the shape. We show that the proposed approach can be easily adapted to different analysis tasks (e.g. image enhancement, segmentation) and improve the prediction accuracy of the state-of-the-art models. The applicability of our approach is shown on multi-modal cardiac data sets and public benchmarks. In addition, we demonstrate how the learnt deep models of 3-D shapes can be interpreted and used as biomarkers for classification of cardiac pathologies.

529 citations

Journal ArticleDOI
TL;DR: This work proposes a generic training strategy that incorporates anatomical prior knowledge into CNNs through a new regularisation model, which is trained end-to-end and demonstrates how the learnt deep models of 3-D shapes can be interpreted and used as biomarkers for classification of cardiac pathologies.
Abstract: Incorporation of prior knowledge about organ shape and location is key to improve performance of image analysis approaches. In particular, priors can be useful in cases where images are corrupted and contain artefacts due to limitations in image acquisition. The highly constrained nature of anatomical objects can be well captured with learning based techniques. However, in most recent and promising techniques such as CNN based segmentation it is not obvious how to incorporate such prior knowledge. State-of-the-art methods operate as pixel-wise classifiers where the training objectives do not incorporate the structure and inter-dependencies of the output. To overcome this limitation, we propose a generic training strategy that incorporates anatomical prior knowledge into CNNs through a new regularisation model, which is trained end-to-end. The new framework encourages models to follow the global anatomical properties of the underlying anatomy (e.g. shape, label structure) via learned non-linear representations of the shape. We show that the proposed approach can be easily adapted to different analysis tasks (e.g. image enhancement, segmentation) and improve the prediction accuracy of the state-of-the-art models. The applicability of our approach is shown on multi-modal cardiac datasets and public benchmarks. Additionally, we demonstrate how the learned deep models of 3D shapes can be interpreted and used as biomarkers for classification of cardiac pathologies.
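The regularisation term can be sketched as a loss that compares predicted and ground-truth segmentations in a learned shape code. Here a fixed random linear projection is a hypothetical stand-in for the paper's trained autoencoder encoder, and all sizes are toy values; the sketch only shows where the anatomical term enters the objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "shape encoder": in the paper this is a learned autoencoder
# mapping a segmentation to a compact non-linear shape code; a fixed random
# linear projection stands in here purely for illustration.
ENC = rng.normal(size=(8, 64))
encode = lambda s: ENC @ s.ravel()

def anatomical_loss(pred, gt, lam=0.5, eps=1e-8):
    """Pixel-wise cross-entropy + latent shape regularisation."""
    ce = -np.mean(gt * np.log(pred + eps) + (1 - gt) * np.log(1 - pred + eps))
    shape_term = np.sum((encode(pred) - encode(gt)) ** 2)  # global shape match
    return ce + lam * shape_term

gt = (rng.random((8, 8)) > 0.5).astype(float)
good = np.clip(gt, 0.05, 0.95)               # prediction close to ground truth
bad = np.clip(1 - gt, 0.05, 0.95)            # anatomically implausible prediction
print(anatomical_loss(good, gt) < anatomical_loss(bad, gt))  # True
```

Unlike the per-pixel term, the shape term scores the segmentation as a whole, which is how the framework injects the output structure and inter-dependencies that pixel-wise objectives miss.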

482 citations

Journal ArticleDOI
Lin Wang1, Kuk-Jin Yoon1
TL;DR: This paper provides a comprehensive survey on the recent progress of KD methods together with S-T frameworks typically used for vision tasks and systematically analyzes the research status of KD in vision applications.
Abstract: Deep neural models, in recent years, have been successful in almost every field. However, these models are huge, demanding heavy computation power. Moreover, the performance boost is highly dependent on abundant labeled data. To achieve faster speeds and to handle the problems caused by the lack of labeled data, knowledge distillation (KD) has been proposed to transfer information learned from one model to another. KD is often characterized by the so-called 'Student-Teacher' (S-T) learning framework and has been broadly applied in model compression and knowledge transfer. This paper is about KD and S-T learning, which have been actively studied in recent years. First, we aim to provide explanations of what KD is and how/why it works. Then, we provide a comprehensive survey on the recent progress of KD methods together with S-T frameworks typically used for vision tasks. In general, we investigate the fundamental questions that have been driving this research area and thoroughly generalize the research progress and technical details. Additionally, we systematically analyze the research status of KD in vision applications. Finally, we discuss the potential and open challenges of existing methods and prospect the future directions of KD and S-T learning.
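The classic S-T objective the survey builds on can be sketched as Hinton-style distillation: a temperature-softened KL term toward the teacher plus a cross-entropy term on the hard labels. The temperature `T` and mixing weight `alpha` below are typical illustrative values.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(labels)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    return np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce)

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 10))           # teacher logits for 4 samples
labels = teacher.argmax(axis=1)
aligned = teacher + 0.1 * rng.normal(size=(4, 10))   # student mimics teacher
random_s = rng.normal(size=(4, 10))                  # student ignores teacher

print(distill_loss(aligned, teacher, labels)
      < distill_loss(random_s, teacher, labels))
```

The `T ** 2` factor keeps the soft-target gradient magnitude comparable to the hard-label term as the temperature rises, which is why it appears in the standard formulation.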

254 citations