SciSpace - Formally Typeset
Author

Canqun Xiang

Bio: Canqun Xiang is an academic researcher at Shenzhen University whose work focuses on object detection and feature learning. The author has an h-index of 3 and has co-authored 12 publications receiving 123 citations.

Papers
Journal ArticleDOI
TL;DR: A multi-scale capsule network is proposed that is more robust and efficient for feature representation in image classification, achieving competitive performance on the FashionMNIST and CIFAR10 datasets.
Abstract: The capsule network is a novel architecture that encodes the properties and spatial relationships of features in an image, and it shows encouraging results on image classification. However, the original capsule network is not suitable for classification tasks in which the target objects have complex internal representations. Hence, we propose a multi-scale capsule network that is more robust and efficient for feature representation in image classification. The proposed network consists of two stages. In the first stage, structural and semantic information is obtained by multi-scale feature extraction. In the second stage, the hierarchy of features is encoded into multi-dimensional primary capsules. Moreover, we propose an improved dropout to enhance the robustness of the capsule network. Experimental results show that our method achieves competitive performance on the FashionMNIST and CIFAR10 datasets.
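The two-stage idea above can be sketched in a few lines. This is a minimal illustrative NumPy sketch, not the authors' implementation: the standard capsule "squash" nonlinearity is applied to capsules grouped from hypothetical multi-scale feature maps (the shapes and capsule dimension are assumptions for illustration).

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Capsule "squash" nonlinearity: shrinks short vectors toward zero and
    # long vectors toward unit length, preserving direction.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def primary_capsules(feature_maps, capsule_dim=8):
    # feature_maps: list of per-scale tensors (H, W, C) with C divisible by
    # capsule_dim; each is grouped into capsule_dim-dimensional capsules,
    # squashed, and concatenated across scales.
    caps = []
    for fm in feature_maps:
        caps.append(squash(fm.reshape(-1, capsule_dim)))
    return np.concatenate(caps, axis=0)

# Two hypothetical scales of a feature pyramid.
fine = np.random.randn(8, 8, 16)
coarse = np.random.randn(4, 4, 16)
caps = primary_capsules([fine, coarse])  # (160, 8): 128 + 32 capsules
```

Because of the squash, every capsule's length lies strictly below 1 and can be read as the probability that the feature it represents is present.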

181 citations

Proceedings ArticleDOI
16 Jun 2019
TL;DR: The proposed 6D-VNet extends Mask R-CNN by adding customised heads for predicting vehicle's finer class, rotation and translation, and takes the spatial neighbouring information into consideration whilst counteracting the effect of extreme gradient values.
Abstract: We present a conceptually simple framework for 6DoF object pose estimation, especially for autonomous driving scenarios. Our approach efficiently detects traffic participants in a monocular RGB image while simultaneously regressing their 3D translation and rotation vectors. The method, called 6D-VNet, extends Mask R-CNN by adding customised heads for predicting each vehicle's finer class, rotation and translation. Unlike previous methods, 6D-VNet is trained end-to-end. Furthermore, we show that the inclusion of translational regression in the joint losses is crucial for the 6DoF pose estimation task, where object translation distance along the longitudinal axis varies significantly, e.g., in autonomous driving scenarios. Additionally, we incorporate the mutual information between traffic participants via a modified non-local block. As opposed to the original non-local block implementation, the proposed weighting modification takes spatial neighbouring information into consideration whilst counteracting the effect of extreme gradient values. Our 6D-VNet reaches 1st place in the ApolloScape challenge 3D Car Instance task. Code has been made available at: https://github.com/stevenwudi/6DVNET .
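The joint rotation-plus-translation loss described above can be sketched as follows. The concrete loss forms here are assumptions for illustration (the paper's exact losses may differ): a sign-invariant quaternion distance for rotation, and a Huber (smooth-L1) term for translation, which dampens the extreme gradients that large longitudinal distances would otherwise produce.

```python
import numpy as np

def rotation_loss(q_pred, q_true):
    # Sign-invariant quaternion distance: q and -q encode the same rotation,
    # so the absolute dot product is used.
    q_pred = q_pred / np.linalg.norm(q_pred)
    q_true = q_true / np.linalg.norm(q_true)
    return 1.0 - abs(float(np.dot(q_pred, q_true)))

def translation_loss(t_pred, t_true):
    # Huber (smooth-L1) loss: quadratic near zero, linear for large errors,
    # counteracting extreme gradient values from distant objects.
    d = np.abs(t_pred - t_true)
    return float(np.sum(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)))

def joint_loss(q_pred, q_true, t_pred, t_true, w_t=1.0):
    # Joint loss: including the translational term is crucial when depth
    # along the longitudinal axis varies strongly across objects.
    return rotation_loss(q_pred, q_true) + w_t * translation_loss(t_pred, t_true)
```

A perfect prediction yields zero loss, and so does predicting the negated quaternion, as the two represent identical rotations.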

31 citations

Journal ArticleDOI
TL;DR: The authors propose 6D-VNet, which extends Mask R-CNN by adding customised heads for predicting each vehicle's finer class, rotation and translation.
Abstract: We present a conceptually simple framework for 6DoF object pose estimation, especially for autonomous driving scenarios. Our approach can efficiently detect the traffic participants from a monocular RGB image while simultaneously regressing their 3D translation and rotation vectors. The proposed method, 6D-VNet, extends Mask R-CNN by adding customised heads for predicting each vehicle's finer class, rotation and translation. Unlike previous methods, it is trained end-to-end. Furthermore, we show that the inclusion of translational regression in the joint losses is crucial for the 6DoF pose estimation task, where object translation distance along the longitudinal axis varies significantly, e.g., in autonomous driving scenarios. Additionally, we incorporate the mutual information between traffic participants via a modified non-local block to capture the spatial dependencies among the detected objects. As opposed to the original non-local block implementation, the proposed weighting modification takes spatial neighbouring information into consideration whilst counteracting the effect of extreme gradient values. We evaluate our method on the challenging real-world Pascal3D+ dataset, and our 6D-VNet reaches 1st place in the ApolloScape challenge 3D Car Instance task (Apolloscape, 2018), (Huang et al., 2018).

13 citations

Journal ArticleDOI
Canqun Xiang, Zhennan Wang, Shishun Tian, Jianxin Liao, Wenbin Zou, Chen Xu
TL;DR: A matrix capsule convolution projection (MCCP) module is proposed that replaces the feature vector with a feature matrix whose columns each represent a local feature, and CapDetNet is designed to explore the structural information encoding of the MCCP module on the object detection task.
Abstract: The capsule projection network (CapProNet) has shown its ability to obtain semantic and spatial structural information from raw images. However, the vector capsule of CapProNet is limited in representing semantic information because it ignores local information. Besides, the number of trainable parameters also increases greatly with the dimension of the feature vector. To that end, we propose a matrix capsule convolution projection (MCCP) module that replaces the feature vector with a feature matrix, each column of which represents a local feature. The feature matrix is then convolved column-wise into capsule subspaces, which effectively decreases the number of trainable parameters. Furthermore, CapDetNet is designed to explore the structural information encoding of the MCCP module on the object detection task. Experimental results demonstrate that the proposed MCCP outperforms the baselines in image classification, and that CapDetNet achieves a 2.3% performance gain in object detection.
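The core subspace-projection operation that CapProNet-style methods build on can be sketched as follows. This is a hedged NumPy illustration, not the authors' MCCP code: each column of a feature matrix is orthogonally projected onto a low-dimensional capsule subspace spanned by a learned basis (shapes are assumptions for illustration).

```python
import numpy as np

def capsule_subspace_projection(F, W):
    # F: feature matrix (d, n_cols), each column a local feature.
    # W: learned basis (d, c) spanning a capsule subspace with c << d.
    # Orthogonal projection onto span(W): P = W (W^T W)^{-1} W^T.
    P = W @ np.linalg.inv(W.T @ W) @ W.T
    return P @ F

rng = np.random.default_rng(0)
d, c, n = 16, 4, 6
W = rng.standard_normal((d, c))   # one capsule subspace basis
F = rng.standard_normal((d, n))   # six local features as columns
Fp = capsule_subspace_projection(F, W)  # (16, 6)
```

Processing a matrix of n local columns through one shared subspace, rather than one long concatenated vector of dimension d*n, is what keeps the trainable parameter count from growing with the feature dimension. The projection is idempotent: projecting an already-projected matrix changes nothing.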

5 citations

Journal ArticleDOI
TL;DR: The authors propose a two-stage strong correlation learning framework, abbreviated as SC-RPN, which aims to set up a stronger relationship among the different modules in the region proposal task.
Abstract: Current state-of-the-art two-stage detectors rely heavily on region proposals to guide accurate object detection. In previous region proposal approaches, the interaction between different functional modules is only weakly correlated, which limits or decreases their performance. In this paper, we propose a novel two-stage strong correlation learning framework, abbreviated as SC-RPN, which aims to set up a stronger relationship among the different modules in the region proposal task. First, we propose a lightweight IoU-Mask branch that predicts an intersection-over-union (IoU) mask and refines region classification scores, preventing high-quality region proposals from being filtered out. Furthermore, a sampling strategy named Size-Aware Dynamic Sampling (SADS) is proposed to ensure sampling consistency between different stages. In addition, a point-based representation is exploited to generate region proposals with stronger fitting ability. Without bells and whistles, SC-RPN achieves an AR1000 14.5% higher than that of the Region Proposal Network (RPN), surpassing all existing region proposal approaches. We also integrate SC-RPN into Fast R-CNN and Faster R-CNN to test its effectiveness on the object detection task; the experiments show gains of 3.2% and 3.8% in mAP over the original detectors.
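The idea of rescuing well-localised proposals can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the SC-RPN implementation: `rescore` is a hypothetical blend of classification score and predicted localisation quality, so a proposal with a mediocre class score but high predicted IoU is not filtered out.

```python
import numpy as np

def box_iou(a, b):
    # Boxes as (x1, y1, x2, y2); returns intersection over union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def rescore(cls_score, pred_iou, alpha=0.5):
    # Hypothetical refinement: geometric blend of classification score and
    # predicted IoU, so localisation quality influences score-based filtering.
    return cls_score ** (1 - alpha) * pred_iou ** alpha
```

For example, a proposal with class score 0.3 but predicted IoU 0.9 would be ranked above one with score 0.4 and predicted IoU 0.2, which plain score-based filtering would get backwards.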

5 citations


Cited by
Journal ArticleDOI
TL;DR: An effective remote sensing image scene classification architecture named CNN-CapsNet is proposed to make full use of the merits of the two models, CNN and CapsNet, achieving competitive classification performance compared with state-of-the-art methods.
Abstract: Remote sensing image scene classification is one of the most challenging problems in understanding high-resolution remote sensing images. Deep learning techniques, especially the convolutional neural network (CNN), have improved the performance of remote sensing image scene classification due to their powerful feature learning and reasoning. However, several fully connected layers are always added to the end of CNN models, which is not efficient in capturing the hierarchical structure of the entities in the images and does not fully consider the spatial information that is important to classification. Fortunately, the capsule network (CapsNet), a novel network architecture that uses a group of neurons as a capsule or vector to replace the neuron of the traditional neural network, and which can encode the properties and spatial information of features in an image to achieve equivariance, has become an active area in the classification field in the past two years. Motivated by this idea, this paper proposes an effective remote sensing image scene classification architecture named CNN-CapsNet to make full use of the merits of the two models. First, a CNN without fully connected layers is used as the initial feature-map extractor; specifically, a deep CNN model pretrained on the ImageNet dataset is selected as the feature extractor in this paper. Then, the initial feature maps are fed into a newly designed CapsNet to obtain the final classification result. The proposed architecture is extensively evaluated on three public challenging benchmark remote sensing image datasets: the UC Merced Land-Use dataset with 21 scene categories, the AID dataset with 30 scene categories, and the NWPU-RESISC45 dataset with 45 challenging scene categories. The experimental results demonstrate that the proposed method achieves competitive classification performance compared with state-of-the-art methods.
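The CNN-trunk-plus-capsule-head pipeline can be sketched schematically. This is a toy NumPy sketch, not the paper's network: `W` stands in for the learned transformation (routing and weights collapsed into one hypothetical linear map), and class prediction follows the standard CapsNet convention that the longest class capsule wins.

```python
import numpy as np

def squash(s, eps=1e-8):
    # Standard capsule nonlinearity; output length lies in [0, 1).
    n2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def classify(feature_maps, W):
    # feature_maps: (H, W, C) output of a pretrained CNN trunk with its
    # fully connected layers removed.
    # W: (n_classes, dim, H*W*C) hypothetical map producing one class
    # capsule per scene category.
    v = feature_maps.reshape(-1)
    caps = squash(W @ v)                       # (n_classes, dim)
    lengths = np.linalg.norm(caps, axis=-1)    # capsule length = presence
    return int(np.argmax(lengths))

# Toy trunk output and 21 classes (UC Merced Land-Use has 21 categories).
fm = np.random.default_rng(1).standard_normal((4, 4, 8))
W = np.random.default_rng(2).standard_normal((21, 16, fm.size))
pred = classify(fm, W)
```

Reading the class from capsule *length* rather than from a softmax over scalars is the design choice that lets each capsule's direction keep encoding the instantiation parameters of the scene entity.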

254 citations

Proceedings ArticleDOI
24 Feb 2021
TL;DR: GDR-Net is a geometry-guided direct regression network that learns the 6D pose in an end-to-end manner from dense correspondence-based intermediate geometric representations.
Abstract: 6D pose estimation from a single RGB image is a fundamental task in computer vision. The current top-performing deep learning-based methods rely on an indirect strategy, i.e., first establishing 2D-3D correspondences between the coordinates in the image plane and the object coordinate system, and then applying a variant of the PnP/RANSAC algorithm. However, this two-stage pipeline is not end-to-end trainable and is thus hard to employ for tasks requiring differentiable poses. On the other hand, methods based on direct regression are currently inferior to geometry-based methods. In this work, we perform an in-depth investigation of both direct and indirect methods, and propose a simple yet effective Geometry-guided Direct Regression Network (GDR-Net) to learn the 6D pose in an end-to-end manner from dense correspondence-based intermediate geometric representations. Extensive experiments show that our approach remarkably outperforms state-of-the-art methods on the LM, LM-O and YCB-V datasets. Code is available at https://git.io/GDR-Net.
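One ingredient that makes direct rotation regression trainable is a continuous rotation parameterisation. The sketch below shows the common 6D representation (two raw 3D vectors orthonormalised by Gram-Schmidt); it is included as background for direct regression methods in general, not as a claim about GDR-Net's exact head.

```python
import numpy as np

def rot6d_to_matrix(x):
    # x: 6 raw numbers from a regression head, read as two 3D vectors.
    # Gram-Schmidt yields an orthonormal, right-handed rotation matrix;
    # unlike quaternions or Euler angles, this map has no discontinuities,
    # which makes it friendlier to gradient-based training.
    a1, a2 = x[:3], x[3:]
    b1 = a1 / np.linalg.norm(a1)
    b2 = a2 - np.dot(b1, a2) * b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)
```

Any generic 6-vector maps to a valid rotation, so the network can regress unconstrained outputs and still always produce a proper, differentiable pose.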

147 citations

Journal ArticleDOI
TL;DR: A comprehensive review of state-of-the-art architectures, tools and methodologies in existing implementations of capsule networks, highlighting successes, failures and opportunities for further research, to motivate researchers and industry players to exploit the full potential of this new field.

135 citations

Proceedings ArticleDOI
12 May 2019
TL;DR: In this article, a modified CapsNet architecture is proposed for brain tumor classification, which takes the tumor coarse boundaries as extra inputs within its pipeline to increase the CapsNet's focus.
Abstract: According to official statistics, cancer is the second leading cause of human fatalities. Among different types of cancer, brain tumors are seen as one of the deadliest forms due to their aggressive nature, heterogeneous characteristics, and low relative survival rate. Determining the type of brain tumor has a significant impact on the treatment choice and the patient's survival. Human-centered diagnosis is typically error-prone and unreliable, resulting in a recent surge of interest in automating this process using convolutional neural networks (CNNs). CNNs, however, fail to fully utilize spatial relations, which is particularly harmful for tumor classification, as the relation between a tumor and its surrounding tissue is a critical indicator of the tumor's type. In our recent work, we incorporated the newly developed CapsNets to overcome this shortcoming. CapsNets are, however, highly sensitive to the miscellaneous image background. This paper addresses that gap. The main contribution is to give CapsNet access to the tissues surrounding the tumor without distracting it from the main target. A modified CapsNet architecture is therefore proposed for brain tumor classification, which takes the tumor's coarse boundaries as extra inputs within its pipeline to increase the CapsNet's focus. The proposed approach noticeably outperforms its counterparts.
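The "extra input" mechanism described above can be illustrated very simply. This is a hedged sketch of one plausible way to feed coarse boundaries into the pipeline (the authors' exact wiring may differ): the coarse tumor-boundary mask is stacked as an additional input channel alongside the image.

```python
import numpy as np

def add_boundary_channel(image, coarse_mask):
    # Stack the coarse tumor-boundary mask as an extra input channel so the
    # network can focus on the tumor while still seeing surrounding tissue.
    if image.ndim == 2:
        image = image[..., None]
    return np.concatenate([image, coarse_mask[..., None]], axis=-1)

# Toy MRI slice with a hypothetical 20x20 coarse tumor region.
img = np.zeros((64, 64))
mask = np.zeros((64, 64))
mask[20:40, 20:40] = 1.0
x = add_boundary_channel(img, mask)  # (64, 64, 2)
```

The downstream CapsNet then receives both channels, so the boundary guides its attention without cropping away the contextual tissue that signals the tumor type.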

129 citations

Posted Content
TL;DR: A modified CapsNet architecture is proposed for brain tumor classification, which takes the tumor coarse boundaries as extra inputs within its pipeline to increase the CapsNet’s focus, and noticeably outperforms its counterparts.
Abstract: According to official statistics, cancer is the second leading cause of human fatalities. Among different types of cancer, brain tumors are seen as one of the deadliest forms due to their aggressive nature, heterogeneous characteristics, and low relative survival rate. Determining the type of brain tumor has a significant impact on the treatment choice and the patient's survival. Human-centered diagnosis is typically error-prone and unreliable, resulting in a recent surge of interest in automating this process using convolutional neural networks (CNNs). CNNs, however, fail to fully utilize spatial relations, which is particularly harmful for tumor classification, as the relation between a tumor and its surrounding tissue is a critical indicator of the tumor's type. In our recent work, we incorporated the newly developed CapsNets to overcome this shortcoming. CapsNets are, however, highly sensitive to the miscellaneous image background. This paper addresses that gap. The main contribution is to give CapsNet access to the tissues surrounding the tumor without distracting it from the main target. A modified CapsNet architecture is therefore proposed for brain tumor classification, which takes the tumor's coarse boundaries as extra inputs within its pipeline to increase the CapsNet's focus. The proposed approach noticeably outperforms its counterparts.

126 citations