PCT: Point cloud transformer
TLDR
Point Cloud Transformer (PCT) is built on the Transformer, which is inherently permutation invariant when processing a sequence of points, making it well suited to point cloud learning.
Abstract
The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on Transformer, which has achieved huge success in natural language processing and displays great potential in image processing. It is inherently permutation invariant for processing a sequence of points, making it well-suited for point cloud learning. To better capture local context within the point cloud, we enhance input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that PCT achieves state-of-the-art performance on shape classification, part segmentation, semantic segmentation, and normal estimation tasks.
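The abstract's two key ingredients, permutation-invariant attention over a point sequence and farthest point sampling for the input embedding, can be sketched in a few lines. This is a generic illustration rather than the paper's actual offset-attention or embedding pipeline; all shapes and weights below are invented for the demo.

```python
import numpy as np

def self_attention(points, Wq, Wk, Wv):
    # Scaled dot-product self-attention over a set of N point features.
    # Permuting the input rows permutes the output rows identically, so a
    # symmetric pooling (e.g. max) over the output is permutation invariant.
    Q, K, V = points @ Wq, points @ Wk, points @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)          # row-wise softmax
    return w @ V

def farthest_point_sampling(points, m):
    # Greedy FPS: repeatedly add the point farthest from those already chosen.
    chosen = [0]
    dists = np.linalg.norm(points - points[0], axis=1)
    for _ in range(m - 1):
        nxt = int(dists.argmax())
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]

rng = np.random.default_rng(0)
pts = rng.normal(size=(8, 3))
Wq, Wk, Wv = (rng.normal(size=(3, 4)) for _ in range(3))
out = self_attention(pts, Wq, Wk, Wv)
out_perm = self_attention(pts[rng.permutation(8)], Wq, Wk, Wv)
# Max-pooled global features agree regardless of point ordering.
pooled, pooled_perm = out.max(axis=0), out_perm.max(axis=0)
```

Max-pooling the attended features gives a global descriptor that is unchanged under any reordering of the input points, which is the property the TL;DR highlights.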
Citations
Journal Article
Attention mechanisms in computer vision: A survey
TL;DR: Guo et al. provide a comprehensive review of attention mechanisms in computer vision, categorizing them by approach: channel attention, spatial attention, temporal attention, and branch attention.
Posted Content
Attention Mechanisms in Computer Vision: A Survey.
Meng-Hao Guo, Tian-Xing Xu, Jiangjiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R. Martin, Ming-Ming Cheng, Shi-Min Hu, +9 more
TL;DR: A comprehensive review of attention mechanisms in computer vision, categorizing them by approach: channel attention, spatial attention, temporal attention, and branch attention.
Journal Article
A Survey on Vision Transformer
TL;DR: The Transformer is a type of deep neural network based mainly on the self-attention mechanism. Originally applied to natural language processing, it has been shown to perform comparably to or better than other network types such as convolutional and recurrent neural networks.
Journal Article
PRA-Net: Point Relation-Aware Network for 3D Point Cloud Analysis
TL;DR: Chen et al. propose the Point Relation-Aware Network (PRA-Net), composed of an Intra-region Structure Learning (ISL) module and an Inter-region Relation Learning (IRL) module. The ISL module dynamically integrates local structural information into point features, while the IRL module captures inter-region relations adaptively and efficiently via a differentiable region-partition scheme and a representative-point-based strategy.
Posted Content
Perceiver: General Perception with Iterative Attention
TL;DR: The Perceiver builds upon Transformers and hence makes few architectural assumptions about the relationships among its inputs, yet it also scales to hundreds of thousands of inputs, like ConvNets.
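The scaling idea described in the TL;DR, a small fixed set of latents cross-attending to a very large input array, can be sketched as follows. This is a minimal illustration of the mechanism, not the Perceiver's actual architecture; all names and shapes are invented for the demo.

```python
import numpy as np

def cross_attend(latents, inputs, Wq, Wk, Wv):
    # M latents query N inputs, so the cost is O(M * N) rather than the
    # O(N^2) of full self-attention over the inputs themselves.
    Q = latents @ Wq                       # (M, d) queries from latents
    K, V = inputs @ Wk, inputs @ Wv        # (N, d) keys/values from inputs
    scores = Q @ K.T / np.sqrt(K.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)      # softmax over the N inputs
    return w @ V                           # (M, d): inputs distilled into latents

rng = np.random.default_rng(0)
latents = rng.normal(size=(8, 16))         # M = 8 learned latents (toy size)
inputs = rng.normal(size=(50_000, 3))      # N = 50,000 raw inputs
Wq = rng.normal(size=(16, 32))
Wk, Wv = rng.normal(size=(3, 32)), rng.normal(size=(3, 32))
distilled = cross_attend(latents, inputs, Wq, Wk, Wv)
```

Because the latent array stays small regardless of input size, stacking such layers keeps cost roughly linear in the number of inputs.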
References
Proceedings Article
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Proceedings Article
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: The authors conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and propose to extend it by allowing the model to automatically (soft-)search for the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as hard segments explicitly.
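The soft-search idea can be sketched as additive (Bahdanau-style) attention over encoder states. This is a minimal single-query illustration with invented shapes, not the paper's full encoder-decoder model.

```python
import numpy as np

def additive_attention(query, keys, Wq, Wk, v):
    # Score every source position with a small tanh MLP, softmax the
    # scores, and return a weighted context vector: a soft search over
    # the source instead of one fixed-length summary vector.
    scores = np.tanh(query @ Wq + keys @ Wk) @ v   # (N,) alignment energies
    w = np.exp(scores - scores.max())
    w /= w.sum()                                   # attention weights sum to 1
    return w @ keys, w                             # context (d,), weights (N,)

rng = np.random.default_rng(0)
keys = rng.normal(size=(12, 6))     # 12 encoder states of dimension 6
query = rng.normal(size=5)          # one decoder state of dimension 5
Wq, Wk = rng.normal(size=(5, 8)), rng.normal(size=(6, 8))
v = rng.normal(size=8)
context, weights = additive_attention(query, keys, Wq, Wk, v)
```

The context vector is recomputed for every target word, so no single fixed-length vector has to summarize the whole source sentence.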
Journal Article
Squeeze-and-Excitation Networks
TL;DR: This work proposes a novel architectural unit, termed the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels; SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.
Posted Content
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby, +11 more
TL;DR: Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
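The tokenization behind the title, splitting an image into 16x16 patches that play the role of words, can be sketched as follows. This is a shape-only illustration; ViT additionally applies a learned linear projection and position embeddings to the flattened patches.

```python
import numpy as np

def image_to_patch_tokens(img, patch=16):
    # Split an (H, W, C) image into non-overlapping patch x patch tiles
    # and flatten each tile into one token ("word") vector.
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0
    tiles = img.reshape(H // patch, patch, W // patch, patch, C)
    tiles = tiles.transpose(0, 2, 1, 3, 4)        # (H/p, W/p, p, p, C)
    return tiles.reshape(-1, patch * patch * C)   # (num_tokens, p*p*C)

# A 224x224x3 image yields 14 * 14 = 196 tokens of dimension 16 * 16 * 3 = 768.
tokens = image_to_patch_tokens(np.zeros((224, 224, 3)))
```

Once the image is a sequence of patch tokens, a standard Transformer encoder can process it exactly as it would a sentence.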
Proceedings Article
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
TL;DR: This paper designs a novel type of neural network that directly consumes point clouds, respects the permutation invariance of the input points, and provides a unified architecture for applications ranging from object classification and part segmentation to scene semantic parsing.
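PointNet's core permutation-invariance trick, a shared per-point MLP followed by a symmetric aggregation, can be sketched as below. This is a minimal sketch with random weights, omitting PointNet's T-Net alignment modules and task heads.

```python
import numpy as np

def pointnet_global_feature(points, W1, W2):
    # The same MLP weights are applied to every point independently, then
    # a symmetric max-pool aggregates them, so the result does not depend
    # on the order in which the points are listed.
    h = np.maximum(points @ W1, 0.0)   # shared MLP layer 1 (per point)
    h = np.maximum(h @ W2, 0.0)        # shared MLP layer 2 (per point)
    return h.max(axis=0)               # symmetric aggregation over points

rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3))        # a toy cloud of 100 xyz points
W1, W2 = rng.normal(size=(3, 16)), rng.normal(size=(16, 32))
feat = pointnet_global_feature(pts, W1, W2)
feat_shuffled = pointnet_global_feature(pts[rng.permutation(100)], W1, W2)
```

Because max is order-independent, `feat` and `feat_shuffled` are identical, which is exactly the permutation invariance the TL;DR refers to.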