Author

Haoyu Ma

Other affiliations: Southeast University, Tencent
Bio: Haoyu Ma is an academic researcher from the University of California, Irvine. The author has contributed to research in the topics of Computer Science and Pose, has an h-index of 6, and has co-authored 20 publications receiving 83 citations. Previous affiliations of Haoyu Ma include Southeast University and Tencent.

Papers
Proceedings ArticleDOI
Fan Xu, Haoyu Ma, Junxiao Sun, Rui Wu, Xu Liu, Youyong Kong
05 Jul 2019
TL;DR: This work proposes LSTM multi-modal UNet, an architecture for brain tumor segmentation in multi-modal magnetic resonance images (MRI), and shows that it outperforms state-of-the-art biomedical segmentation approaches.
Abstract: Deep learning models such as convolutional neural networks have been widely used in 3D biomedical image segmentation. However, most of them neither consider the correlations between different modalities nor fully exploit depth information. To better leverage multi-modality and depth information, we propose an architecture for brain tumor segmentation in multi-modal magnetic resonance images (MRI), named LSTM multi-modal UNet. Experimental results on BRATS-2015 show that our method outperforms state-of-the-art biomedical segmentation approaches.
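The abstract describes extracting convolutional features per modality and propagating information along the depth (slice) axis with an LSTM. Below is a minimal, hedged sketch of that general idea in PyTorch; the module names, feature sizes, and the simple concatenation-based fusion are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch only: per-modality conv features fused slice-by-slice with a ConvLSTM.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Convolutional LSTM cell operating on 2D feature maps."""
    def __init__(self, in_ch, hid_ch):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size=3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

class MultiModalSliceSegmenter(nn.Module):
    """Encode each modality separately, concatenate, run a ConvLSTM across slices."""
    def __init__(self, n_modalities=4, feat=16, n_classes=2):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(1, feat, 3, padding=1), nn.ReLU())
             for _ in range(n_modalities)]
        )
        self.lstm = ConvLSTMCell(feat * n_modalities, feat)
        self.head = nn.Conv2d(feat, n_classes, 1)

    def forward(self, volume):                  # volume: (B, M, D, H, W)
        b, m, d, h, w = volume.shape
        hid = volume.new_zeros(b, self.lstm.hid_ch, h, w)
        cell = torch.zeros_like(hid)
        outputs = []
        for z in range(d):                      # iterate over the depth axis
            feats = [enc(volume[:, i, z:z + 1]) for i, enc in enumerate(self.encoders)]
            hid, cell = self.lstm(torch.cat(feats, dim=1), (hid, cell))
            outputs.append(self.head(hid))
        return torch.stack(outputs, dim=2)      # (B, n_classes, D, H, W)

# Example: four MRI modalities (e.g. T1, T1c, T2, FLAIR), 8 slices of 64x64.
seg = MultiModalSliceSegmenter()(torch.randn(1, 4, 8, 64, 64))
```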

47 citations

Proceedings ArticleDOI
01 Mar 2020
TL;DR: In this paper, a nonparametric structure regularization machine (NSRM) is proposed to learn hand structure and keypoint representations jointly, guided by synthetic hand mask representations, and further strengthened by a novel probabilistic representation of hand limbs and an anatomically inspired composition strategy of mask synthesis.
Abstract: Hand pose estimation is more challenging than body pose estimation due to severe articulation, self-occlusion and high dexterity of the hand. Current approaches often rely on a popular body pose algorithm, such as the Convolutional Pose Machine (CPM), to learn 2D keypoint features. These algorithms cannot adequately address the unique challenges of hand pose estimation, because they are trained solely on keypoint positions without seeking to explicitly model the structural relationships between them. We propose a novel Nonparametric Structure Regularization Machine (NSRM) for 2D hand pose estimation, adopting a cascade multi-task architecture to learn hand structure and keypoint representations jointly. The structure learning is guided by synthetic hand mask representations, which are computed directly from keypoint positions, and is further strengthened by a novel probabilistic representation of hand limbs and an anatomically inspired composition strategy for mask synthesis. We conduct extensive studies on two public datasets, OneHand10K and CMU Panoptic Hand. Experimental results demonstrate that explicitly enforcing structure learning consistently improves the pose estimation accuracy of CPM baseline models, by 1.17% on the first dataset and 4.01% on the second. The implementation and experiment code is freely available online. Our proposal of incorporating structural learning into hand pose estimation requires no additional training information, and can be a generic add-on module to other pose estimation models.
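The key supervision signal described above is a hand mask synthesized directly from keypoint positions. As a hedged illustration (my own minimal sketch, assuming a Gaussian fall-off around each bone rather than the paper's exact formulation), here is how a soft "limb mask" between two keypoints could be rendered:

```python
# Sketch only: a probabilistic limb map computed from two keypoint positions.
import numpy as np

def limb_mask(p1, p2, height, width, sigma=3.0):
    """Soft mask whose value decays with distance from the segment p1-p2."""
    ys, xs = np.mgrid[0:height, 0:width]
    pts = np.stack([xs, ys], axis=-1).astype(np.float32)       # (H, W, 2), (x, y) per pixel
    p1, p2 = np.asarray(p1, np.float32), np.asarray(p2, np.float32)
    seg = p2 - p1
    seg_len2 = max(float(seg @ seg), 1e-6)
    # Project every pixel onto the segment and clamp to its endpoints.
    t = np.clip(((pts - p1) @ seg) / seg_len2, 0.0, 1.0)
    nearest = p1 + t[..., None] * seg
    dist2 = np.sum((pts - nearest) ** 2, axis=-1)
    return np.exp(-dist2 / (2.0 * sigma ** 2))                  # values in (0, 1]

# Example: a mask for the limb between two (x, y) keypoints on a 64x64 canvas.
mask = limb_mask((10, 12), (40, 50), 64, 64)
```

Masks like this can be composed per limb into a full hand-structure target, which is what allows the structure branch to be supervised without any annotation beyond the keypoints themselves.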

34 citations

Posted Content
TL;DR: In this article, Axial Fusion Transformer UNet (AFTer-UNet) is proposed, which combines the advantage of convolutional layers in extracting detailed features with the strength of transformers in long-sequence modeling.
Abstract: Recent advances in transformer-based models have drawn attention to exploring these techniques in medical image segmentation, especially in conjunction with the U-Net model (or its variants), which has shown great success in medical image segmentation under both 2D and 3D settings. Current 2D-based methods either directly replace convolutional layers with pure transformers or treat a transformer as an additional intermediate encoder between the encoder and decoder of U-Net. However, these approaches only consider attention encoding within a single slice and do not utilize the axial-axis information naturally provided by a 3D volume. In the 3D setting, both convolution on volumetric data and transformers consume large amounts of GPU memory. One has to either downsample the image or use cropped local patches to reduce GPU memory usage, which limits performance. In this paper, we propose Axial Fusion Transformer UNet (AFTer-UNet), which combines the advantage of convolutional layers in extracting detailed features with the strength of transformers in long-sequence modeling. It considers both intra-slice and inter-slice long-range cues to guide the segmentation. Meanwhile, it has fewer parameters and takes less GPU memory to train than previous transformer-based models. Extensive experiments on three multi-organ segmentation datasets demonstrate that our method outperforms current state-of-the-art methods.
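The central mechanism is attending within each slice (intra-slice) and then along the slice axis (inter-slice). A minimal, hedged PyTorch sketch of such axial fusion over encoder features follows; the tensor layout, dimensions, and plain residual attention blocks are my assumptions, not the released AFTer-UNet implementation.

```python
# Sketch only: intra-slice attention followed by inter-slice (axial) attention.
import torch
import torch.nn as nn

class AxialFusion(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.intra = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats):                   # feats: (B, D, N, C), N = H*W tokens per slice
        b, d, n, c = feats.shape
        # Intra-slice attention: every slice attends over its own spatial tokens.
        x = feats.reshape(b * d, n, c)
        x = x + self.intra(x, x, x, need_weights=False)[0]
        # Inter-slice (axial) attention: each spatial location attends across slices.
        x = x.reshape(b, d, n, c).permute(0, 2, 1, 3).reshape(b * n, d, c)
        x = x + self.inter(x, x, x, need_weights=False)[0]
        return x.reshape(b, n, d, c).permute(0, 2, 1, 3)        # back to (B, D, N, C)

# Example: 8 slices, 16x16 spatial tokens per slice, 64 channels.
fused = AxialFusion()(torch.randn(2, 8, 256, 64))
```

Because the attention is factorized per axis, the sequence lengths stay short (N within a slice, D across slices), which is consistent with the abstract's claim of lower GPU memory than attending over a full 3D volume.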

30 citations

Proceedings Article
20 Feb 2022
TL;DR: Two alternatives for sparse adversarial training are introduced: static sparsity, which identifies critical sparse subnetworks arising from early training, and dynamic sparsity, which allows the sparse subnetwork to adaptively adjust its connectivity pattern (while keeping the same sparsity ratio) throughout training.
Abstract: Recent studies demonstrate that deep networks, even robustified by state-of-the-art adversarial training (AT), still suffer from large robust generalization gaps, in addition to much higher training costs than standard training. In this paper, we investigate this intriguing problem from a new perspective, i.e., injecting appropriate forms of sparsity during adversarial training. We introduce two alternatives for sparse adversarial training: (i) static sparsity, which leverages recent results from the lottery ticket hypothesis to identify critical sparse subnetworks arising from early training; (ii) dynamic sparsity, which allows the sparse subnetwork to adaptively adjust its connectivity pattern (while keeping the same sparsity ratio) throughout training. We find that both static and dynamic sparse methods yield a win-win: substantially shrinking the robust generalization gap and alleviating robust overfitting, while significantly saving training and inference FLOPs. Extensive experiments validate our proposals with multiple network architectures on diverse datasets, including CIFAR-10/100 and Tiny-ImageNet. For example, our methods reduce the robust generalization gap and overfitting by 34.44% and 4.02%, with comparable robust/standard accuracy boosts and 87.83%/87.82% training/inference FLOPs savings on CIFAR-100 with ResNet-18. Besides, our approaches can be organically combined with existing regularizers, establishing new state-of-the-art results in AT. Code is available at https://github.com/VITA-Group/Sparsity-Win-Robust-Generalization.
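To make the "dynamic sparsity" alternative concrete, here is a minimal, hedged sketch of the general dynamic sparse training recipe: keep a fixed fraction of weights active, and periodically drop the smallest-magnitude active weights while regrowing the same number elsewhere. The random regrowth rule and all thresholds below are assumptions for illustration, not the repository's exact procedure.

```python
# Sketch only: mask construction and a prune-and-regrow connectivity update.
import torch

def magnitude_mask(weight, sparsity):
    """Boolean mask keeping the (1 - sparsity) fraction of largest-magnitude weights."""
    k = int(weight.numel() * (1.0 - sparsity))
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return weight.abs() >= threshold

def prune_and_regrow(weight, mask, drop_fraction=0.3):
    """Drop a fraction of active weights, regrow the same number at new positions."""
    active = int(mask.sum())
    n_drop = int(active * drop_fraction)
    # Drop: deactivate the n_drop smallest-magnitude weights among the active ones.
    magnitudes = torch.where(mask, weight.abs(), torch.full_like(weight, float("inf")))
    drop_idx = magnitudes.flatten().argsort()[:n_drop]
    new_mask = mask.clone().flatten()
    new_mask[drop_idx] = False
    # Regrow: re-activate n_drop inactive positions (same overall sparsity ratio).
    inactive_idx = (~new_mask).nonzero(as_tuple=True)[0]
    grow_idx = inactive_idx[torch.randperm(inactive_idx.numel())[:n_drop]]
    new_mask[grow_idx] = True
    return new_mask.reshape(mask.shape)

# Example: a 90%-sparse layer whose connectivity is updated every few epochs.
w = torch.randn(256, 256)
mask = magnitude_mask(w, sparsity=0.9)
mask = prune_and_regrow(w * mask, mask)
```

The static alternative would simply freeze the initial mask for the whole adversarial training run instead of calling the update step.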

26 citations

Proceedings Article
01 Jan 2019
TL;DR: A new architecture called Adaptive Graphical Model Network (AGMN) is proposed to tackle the task of 2D hand pose estimation from a monocular RGB image; it outperforms the state-of-the-art method for 2D hand keypoint estimation by a notable margin on two public datasets.
Abstract: In this paper, we propose a new architecture called Adaptive Graphical Model Network (AGMN) to tackle the task of 2D hand pose estimation from a monocular RGB image. The AGMN consists of two branches of deep convolutional neural networks for calculating unary and pairwise potential functions, followed by a graphical model inference module for integrating unary and pairwise potentials. Unlike existing architectures proposed to combine DCNNs with graphical models, our AGMN is novel in that the parameters of its graphical model are conditioned on and fully adaptive to individual input images. Experiments show that our approach outperforms the state-of-the-art method for 2D hand keypoint estimation by a notable margin on two public datasets.
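To illustrate what the graphical model inference module does with the two branches' outputs, here is a hedged toy sketch of sum-product message passing along a chain of keypoints, where unary scores and pairwise compatibilities are taken as given. The flattened-heatmap representation and the chain topology are simplifying assumptions, not the AGMN paper's exact inference scheme.

```python
# Sketch only: belief propagation combining unary and pairwise potentials on a chain.
import torch

def chain_sum_product(unary, pairwise):
    """
    unary:    (K, N)      per-keypoint scores over N discretized positions
    pairwise: (K-1, N, N) per-image compatibility between neighboring keypoints
    Returns normalized marginals of shape (K, N).
    """
    K, N = unary.shape
    # Forward messages: parent -> child along the chain.
    fwd = [torch.ones(N)]
    for k in range(1, K):
        fwd.append(pairwise[k - 1].T @ (unary[k - 1] * fwd[k - 1]))
    # Backward messages: child -> parent.
    bwd = [torch.ones(N) for _ in range(K)]
    for k in range(K - 2, -1, -1):
        bwd[k] = pairwise[k] @ (unary[k + 1] * bwd[k + 1])
    marginals = torch.stack([unary[k] * fwd[k] * bwd[k] for k in range(K)])
    return marginals / marginals.sum(dim=1, keepdim=True)

# Example: 5 keypoints on a chain, 16x16 heatmaps flattened to 256 positions.
beliefs = chain_sum_product(torch.rand(5, 256), torch.rand(4, 256, 256))
```

The point of the paper's "adaptive" design is that the pairwise tensor above is predicted per input image by a second CNN branch rather than being a fixed learned table.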

16 citations


Cited by
Journal ArticleDOI
TL;DR: A novel model based on a 3D fully convolutional network is proposed that applies a multi-pathway architecture to effectively extract features from multi-modal MRI images.

70 citations

Posted Content
TL;DR: This work proposes to complete existing databases by generating new database entries, synthesizing data in the skeleton space (instead of the depth-map space), which enables an easy and intuitive way of manipulating data entries.
Abstract: Crucial to the success of training a depth-based 3D hand pose estimator (HPE) is the availability of comprehensive datasets covering diverse camera perspectives, shapes, and pose variations. However, collecting such annotated datasets is challenging. We propose to complete existing databases by generating new database entries. The key idea is to synthesize data in the skeleton space (instead of doing so in the depth-map space), which enables an easy and intuitive way of manipulating data entries. Since the skeleton entries generated in this way do not have corresponding depth map entries, we exploit them by training a separate hand pose generator (HPG) which synthesizes the depth map from the skeleton entries. By training the HPG and HPE in a single unified optimization framework enforcing that 1) the HPE agrees with the paired depth and skeleton entries; and 2) the HPG-HPE combination satisfies the cyclic consistency (both the input and the output of HPG-HPE are skeletons) observed via the newly generated unpaired skeletons, our algorithm constructs an HPE which is robust to variations that go beyond the coverage of the existing database. Our training algorithm adopts the generative adversarial networks (GAN) training process. As a by-product, we obtain a hand pose discriminator (HPD) that is capable of picking out realistic hand poses. Our algorithm exploits this capability to refine the initial skeleton estimates in testing, further improving the accuracy. We test our algorithm on four challenging benchmark datasets (ICVL, MSRA, NYU and BigHand2.2M) and demonstrate that our approach outperforms or is on par with state-of-the-art methods quantitatively and qualitatively.
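The core training constraint described above is the skeleton -> depth -> skeleton cycle on newly synthesized skeletons, combined with ordinary supervision on paired data. Below is a minimal, hedged sketch of just that loss structure; the tiny MLP stand-ins for HPG/HPE, the MSE losses, and the omission of the adversarial (GAN) term are deliberate simplifications, not the authors' implementation.

```python
# Sketch only: paired supervision plus cyclic consistency on unpaired skeletons.
import torch
import torch.nn as nn

skeleton_dim, depth_pixels = 21 * 3, 64 * 64
hpg = nn.Sequential(nn.Linear(skeleton_dim, 256), nn.ReLU(), nn.Linear(256, depth_pixels))
hpe = nn.Sequential(nn.Linear(depth_pixels, 256), nn.ReLU(), nn.Linear(256, skeleton_dim))
opt = torch.optim.Adam(list(hpg.parameters()) + list(hpe.parameters()), lr=1e-4)

paired_depth, paired_skel = torch.randn(8, depth_pixels), torch.randn(8, skeleton_dim)
synth_skel = torch.randn(8, skeleton_dim)        # new entries generated in skeleton space

# 1) The HPE must agree with paired depth/skeleton entries.
loss_pair = nn.functional.mse_loss(hpe(paired_depth), paired_skel)
# 2) The HPG-HPE composition must be cyclically consistent on unpaired synthetic skeletons.
loss_cycle = nn.functional.mse_loss(hpe(hpg(synth_skel)), synth_skel)
(loss_pair + loss_cycle).backward()
opt.step()
```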

56 citations

Journal ArticleDOI
TL;DR: In this article, the authors focus on linking risk-of-bias (RoB) to different AI-based architectures in the DL framework, and present a set of three primary and six secondary recommendations for lowering the RoB.

38 citations

Journal ArticleDOI
TL;DR: This paper provides a comprehensive survey of three major brain tumor segmentation and classification techniques, namely region growing, shallow machine learning, and deep learning.
Abstract: A brain magnetic resonance imaging (MRI) scan of a single individual consists of several slices across the 3D anatomical view. Therefore, manual segmentation of brain tumors from magnetic resonance (MR) images is a challenging and time-consuming task. In addition, automated brain tumor classification from an MRI scan is non-invasive, so it avoids biopsy and makes the diagnosis process safer. Since the late nineties and the beginning of this millennium, the research community's effort to come up with automatic brain tumor segmentation and classification methods has been tremendous. As a result, there is ample literature in the area focusing on segmentation using region growing, traditional machine learning, and deep learning methods. Similarly, a number of works have addressed brain tumor classification into the respective histological types, and impressive performance results have been obtained. Considering state-of-the-art methods and their performance, the purpose of this paper is to provide a comprehensive survey of three recently proposed, major brain tumor segmentation and classification techniques, namely region growing, shallow machine learning, and deep learning. The works included in this survey also cover technical aspects such as the strengths and weaknesses of different approaches, pre- and post-processing techniques, feature extraction, datasets, and models’ performance evaluation metrics.

37 citations

Proceedings Article
28 Feb 2022
TL;DR: This work proposes a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted, and demonstrates state-of-the-art test accuracy against label noise on a variety of real datasets.
Abstract: Recently, over-parameterized deep networks, with increasingly more network parameters than training samples, have dominated the performances of modern machine learning. However, when the training data is corrupted, it is well known that over-parameterized networks tend to overfit and do not generalize. In this work, we propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted. The main idea is simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data. Specifically, we model the label noise via another sparse over-parameterization term, and exploit implicit algorithmic regularizations to recover and separate the underlying corruptions. Remarkably, when trained using such a simple method in practice, we demonstrate state-of-the-art test accuracy against label noise on a variety of real datasets. Furthermore, our experimental results are corroborated by theory on simplified linear models, showing that exact separation between sparse noise and low-rank data can be achieved under incoherent conditions. The work opens many interesting directions for improving over-parameterized models by using sparse over-parameterization and implicit regularization.
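As a hedged reconstruction of the "sparse over-parameterization term" idea from the abstract alone (not the paper's exact code), one way to realize it is to give each training sample extra parameters whose elementwise difference of squares acts as a per-sample noise estimate added to the network output; the multiplicative parameterization implicitly biases that estimate toward sparsity. All names, dimensions, and the MSE loss below are assumptions for illustration.

```python
# Sketch only: per-sample sparse noise parameters trained jointly with the network.
import torch
import torch.nn as nn

n_samples, n_classes, feat_dim = 1000, 10, 32
net = nn.Linear(feat_dim, n_classes)                 # stand-in for a deep network
u = nn.Parameter(1e-3 * torch.randn(n_samples, n_classes))
v = nn.Parameter(1e-3 * torch.randn(n_samples, n_classes))
opt = torch.optim.SGD(list(net.parameters()) + [u, v], lr=0.1)

def training_step(x, one_hot_labels, idx):
    noise = u[idx] * u[idx] - v[idx] * v[idx]        # per-sample sparse noise estimate
    pred = torch.softmax(net(x), dim=1) + noise      # noise term explains the corrupted label
    loss = nn.functional.mse_loss(pred, one_hot_labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example step on a random mini-batch of 16 samples with one-hot (possibly noisy) labels.
idx = torch.randint(0, n_samples, (16,))
labels = torch.eye(n_classes)[torch.randint(0, n_classes, (16,))]
training_step(torch.randn(16, feat_dim), labels, idx)
```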

34 citations