Open Access · Journal ArticleDOI

PCT: Point cloud transformer

TL;DR
Point Cloud Transformer (PCT) builds on the Transformer, which is inherently permutation invariant when processing a sequence of points, making it well suited to point cloud learning.
Abstract
The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on the Transformer, which has achieved great success in natural language processing and shows great potential in image processing. It is inherently permutation invariant for processing a sequence of points, making it well suited for point cloud learning. To better capture local context within the point cloud, we enhance the input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that PCT achieves state-of-the-art performance on shape classification, part segmentation, semantic segmentation, and normal estimation tasks.
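The neighbor embedding mentioned in the abstract rests on two classical building blocks: farthest point sampling to choose well-spread anchor points, and nearest-neighbor search to gather a local patch around each anchor. Below is a minimal sketch of that sampling-and-grouping step in PyTorch; the point count, sample count, and k are illustrative choices, not values from the paper.

```python
import torch

def farthest_point_sample(xyz: torch.Tensor, n_samples: int) -> torch.Tensor:
    """Greedily pick points that maximize distance to those already chosen.
    xyz: (N, 3) coordinates. Returns indices of shape (n_samples,)."""
    n = xyz.shape[0]
    chosen = torch.zeros(n_samples, dtype=torch.long)
    dist = torch.full((n,), float("inf"))          # distance to nearest chosen point
    farthest = int(torch.randint(0, n, (1,)))      # random starting point
    for i in range(n_samples):
        chosen[i] = farthest
        d = ((xyz - xyz[farthest]) ** 2).sum(dim=1)
        dist = torch.minimum(dist, d)              # update nearest-chosen distances
        farthest = int(torch.argmax(dist))         # next pick: the most isolated point
    return chosen

def knn_group(xyz: torch.Tensor, centers: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k nearest neighbors of each sampled center."""
    d = torch.cdist(xyz[centers], xyz)             # (n_samples, N) pairwise distances
    return d.topk(k, largest=False).indices        # (n_samples, k)

pts = torch.rand(1024, 3)                          # toy point cloud
centers = farthest_point_sample(pts, 256)
groups = knn_group(pts, centers, k=32)             # local patches to embed
```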



Citations
Journal ArticleDOI

Attention mechanisms in computer vision: A survey

TL;DR: Guo et al. provide a comprehensive review of various attention mechanisms in computer vision, categorizing them by approach: channel attention, spatial attention, temporal attention, and branch attention.
Journal ArticleDOI

A Survey on Vision Transformer

TL;DR: The Transformer is a type of deep neural network based mainly on the self-attention mechanism. Originally applied to natural language processing, it has been shown to perform similarly to or better than other types of networks, such as convolutional and recurrent neural networks.
Journal ArticleDOI

PRA-Net: Point Relation-Aware Network for 3D Point Cloud Analysis

TL;DR: Chen et al. propose the Point Relation-Aware Network (PRA-Net), composed of an Intra-region Structure Learning (ISL) module and an Inter-region Relation Learning (IRL) module. The ISL module dynamically integrates local structural information into the point features, while the IRL module captures inter-region relations adaptively and efficiently via a differentiable region-partition scheme and a representative-point-based strategy.
Posted Content

Perceiver: General Perception with Iterative Attention

TL;DR: The Perceiver builds upon Transformers, and hence makes few architectural assumptions about the relationships between its inputs, yet it also scales to hundreds of thousands of inputs, like ConvNets.
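The scaling trick the Perceiver summary alludes to is a cross-attention bottleneck: a small, learned latent array queries the (possibly huge) input, so attention cost grows with the latent size rather than the input size. A minimal sketch of that idea, with all dimensions chosen for illustration:

```python
import torch
import torch.nn as nn

class LatentCrossAttention(nn.Module):
    """A fixed-size latent array cross-attends to an arbitrarily large input,
    so the cost scales with the latent count rather than the input length."""
    def __init__(self, latent_dim=128, input_dim=64, n_latents=32, n_heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, latent_dim))
        self.attn = nn.MultiheadAttention(latent_dim, n_heads,
                                          kdim=input_dim, vdim=input_dim,
                                          batch_first=True)

    def forward(self, x):                      # x: (batch, n_inputs, input_dim)
        q = self.latents.expand(x.shape[0], -1, -1)
        out, _ = self.attn(q, x, x)            # queries come from the latents
        return out                             # (batch, n_latents, latent_dim)

x = torch.rand(2, 10_000, 64)                  # a long input sequence
print(LatentCrossAttention()(x).shape)         # torch.Size([2, 32, 128])
```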
References
Proceedings ArticleDOI

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
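The bidirectional conditioning above comes from the masked-language-model objective: hide random tokens and train the model to recover each one from the context on both sides. A simplified sketch of the masking step (the paper additionally keeps or randomizes some selected tokens; the mask id here is a placeholder):

```python
import torch

def mask_tokens(ids: torch.Tensor, mask_id: int, p: float = 0.15):
    """Replace a random fraction of token ids with a mask id; return the
    corrupted ids plus labels that score only the masked positions."""
    labels = ids.clone()
    picked = torch.rand(ids.shape) < p
    labels[~picked] = -100                 # ignored by PyTorch's cross-entropy loss
    corrupted = ids.clone()
    corrupted[picked] = mask_id
    return corrupted, labels

ids = torch.randint(5, 30_000, (2, 16))    # a toy batch of token ids
masked, labels = mask_tokens(ids, mask_id=0)
```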
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: The authors conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and propose to extend it by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts explicitly as a hard segment.
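The (soft-)search described above is commonly implemented as additive attention: score every source position against the current decoder state, normalize the scores, and replace the fixed-length vector with a weighted sum. A minimal sketch, with hypothetical layer sizes:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Soft-search: score every source position against the decoder state,
    then take an expectation instead of selecting a hard segment."""
    def __init__(self, dec_dim=256, enc_dim=512, attn_dim=128):
        super().__init__()
        self.W = nn.Linear(dec_dim, attn_dim, bias=False)
        self.U = nn.Linear(enc_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(self.W(dec_state).unsqueeze(1) + self.U(enc_states)))
        weights = torch.softmax(scores, dim=1)       # (batch, src_len, 1)
        context = (weights * enc_states).sum(dim=1)  # weighted sum replaces the fixed vector
        return context, weights.squeeze(-1)
```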
Journal ArticleDOI

Squeeze-and-Excitation Networks

TL;DR: This work proposes a novel architectural unit termed the "Squeeze-and-Excitation" (SE) block, which adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.
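As a reading aid, here is a minimal sketch of the SE block as described above: a squeeze (global average pooling per channel) followed by an excitation gate (a two-layer bottleneck ending in a sigmoid) that rescales each channel:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze: pool each channel to one statistic. Excite: a gated two-layer
    bottleneck rescales channels by their learned importance."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (batch, channels, H, W)
        s = x.mean(dim=(2, 3))                   # squeeze to per-channel statistics
        w = self.fc(s).unsqueeze(-1).unsqueeze(-1)
        return x * w                             # recalibrated feature map
```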
Posted Content

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

TL;DR: Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
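ViT's core move is to treat an image as a sequence of 16x16 patch "words": each patch is flattened and linearly projected into a token, and a standard Transformer encoder processes the resulting sequence. A minimal sketch of the patch embedding (class token and position embeddings omitted; sizes are illustrative):

```python
import torch
import torch.nn as nn

# One token per 16x16 patch: a strided convolution is equivalent to
# flattening each patch and applying a shared linear projection.
patchify = nn.Conv2d(3, 768, kernel_size=16, stride=16)

img = torch.rand(1, 3, 224, 224)
tokens = patchify(img).flatten(2).transpose(1, 2)   # (1, 196, 768)
```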
Proceedings ArticleDOI

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

TL;DR: This paper designs a novel type of neural network that directly consumes point clouds, respecting the permutation invariance of points in the input, and provides a unified architecture for applications ranging from object classification and part segmentation to scene semantic parsing.
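The permutation invariance noted above comes from a symmetric aggregation: per-point features are computed independently and then max-pooled, so reordering the input cannot change the output. A minimal sketch, with illustrative sizes:

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Per-point MLP followed by a symmetric max-pool, so the output is
    unchanged under any permutation of the input points."""
    def __init__(self, out_dim: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, out_dim))

    def forward(self, pts):                       # pts: (batch, n_points, 3)
        return self.mlp(pts).max(dim=1).values    # max over points = order invariant

pts = torch.rand(4, 1024, 3)
perm = pts[:, torch.randperm(1024)]               # shuffle the points
net = TinyPointNet()
assert torch.allclose(net(pts), net(perm))        # same embedding either way
```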