Open Access · Journal ArticleDOI

PCT: Point cloud transformer

TL;DR
Point Cloud Transformer (PCT) builds on the Transformer, which is inherently permutation invariant when processing a sequence of points, making it well suited to point cloud learning.
Abstract
The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on the Transformer, which has achieved great success in natural language processing and shows great potential in image processing. It is inherently permutation invariant for processing a sequence of points, making it well suited for point cloud learning. To better capture local context within the point cloud, we enhance the input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that PCT achieves state-of-the-art performance on shape classification, part segmentation, semantic segmentation, and normal estimation tasks.
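The neighbor embedding mentioned in the abstract rests on two classical building blocks: farthest point sampling to choose well-spread anchor points, and nearest-neighbor search to gather a local patch around each anchor. Below is a minimal sketch of that sampling-and-grouping step in PyTorch; the point count, sample count, and k are illustrative choices, not values from the paper.

```python
import torch

def farthest_point_sample(xyz: torch.Tensor, n_samples: int) -> torch.Tensor:
    """Greedily pick points that maximize distance to those already chosen.
    xyz: (N, 3) coordinates. Returns indices of shape (n_samples,)."""
    n = xyz.shape[0]
    chosen = torch.zeros(n_samples, dtype=torch.long)
    dist = torch.full((n,), float("inf"))          # distance to nearest chosen point
    farthest = int(torch.randint(0, n, (1,)))      # random starting point
    for i in range(n_samples):
        chosen[i] = farthest
        d = ((xyz - xyz[farthest]) ** 2).sum(dim=1)
        dist = torch.minimum(dist, d)              # update nearest-chosen distances
        farthest = int(torch.argmax(dist))         # next pick: the most isolated point
    return chosen

def knn_group(xyz: torch.Tensor, centers: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k nearest neighbors of each sampled center."""
    d = torch.cdist(xyz[centers], xyz)             # (n_samples, N) pairwise distances
    return d.topk(k, largest=False).indices        # (n_samples, k)

pts = torch.rand(1024, 3)                          # toy point cloud
centers = farthest_point_sample(pts, 256)
groups = knn_group(pts, centers, k=32)             # local patches to embed
```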



Citations
Journal ArticleDOI

Attention mechanisms in computer vision: A survey

TL;DR: Guo et al. provide a comprehensive review of various attention mechanisms in computer vision, categorizing them by approach: channel attention, spatial attention, temporal attention, and branch attention.
Journal ArticleDOI

A Survey on Vision Transformer

TL;DR: The Transformer is a type of deep neural network based mainly on the self-attention mechanism. Originally applied to natural language processing, it has been shown to perform similarly to or better than other types of networks, such as convolutional and recurrent neural networks.
Journal ArticleDOI

PRA-Net: Point Relation-Aware Network for 3D Point Cloud Analysis

TL;DR: Chen et al. propose the Point Relation-Aware Network (PRA-Net), composed of an Intra-region Structure Learning (ISL) module and an Inter-region Relation Learning (IRL) module. The ISL module dynamically integrates local structural information into the point features, while the IRL module captures inter-region relations adaptively and efficiently via a differentiable region-partition scheme and a representative-point-based strategy.
Posted Content

Perceiver: General Perception with Iterative Attention

TL;DR: The Perceiver builds upon Transformers, and hence makes few architectural assumptions about the relationships between its inputs, yet it also scales to hundreds of thousands of inputs, like ConvNets.
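The scaling trick the Perceiver summary alludes to is a cross-attention bottleneck: a small, learned latent array queries the (possibly huge) input, so attention cost grows with the latent size rather than the input size. A minimal sketch of that idea, with all dimensions chosen for illustration:

```python
import torch
import torch.nn as nn

class LatentCrossAttention(nn.Module):
    """A fixed-size latent array cross-attends to an arbitrarily large input,
    so the cost scales with the latent count rather than the input length."""
    def __init__(self, latent_dim=128, input_dim=64, n_latents=32, n_heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, latent_dim))
        self.attn = nn.MultiheadAttention(latent_dim, n_heads,
                                          kdim=input_dim, vdim=input_dim,
                                          batch_first=True)

    def forward(self, x):                      # x: (batch, n_inputs, input_dim)
        q = self.latents.expand(x.shape[0], -1, -1)
        out, _ = self.attn(q, x, x)            # queries come from the latents
        return out                             # (batch, n_latents, latent_dim)

x = torch.rand(2, 10_000, 64)                  # a long input sequence
print(LatentCrossAttention()(x).shape)         # torch.Size([2, 32, 128])
```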
References
Proceedings ArticleDOI

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
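The bidirectional conditioning above comes from the masked-language-model objective: hide random tokens and train the model to recover each one from the context on both sides. A simplified sketch of the masking step (the paper additionally keeps or randomizes some selected tokens; the mask id here is a placeholder):

```python
import torch

def mask_tokens(ids: torch.Tensor, mask_id: int, p: float = 0.15):
    """Replace a random fraction of token ids with a mask id; return the
    corrupted ids plus labels that score only the masked positions."""
    labels = ids.clone()
    picked = torch.rand(ids.shape) < p
    labels[~picked] = -100                 # ignored by PyTorch's cross-entropy loss
    corrupted = ids.clone()
    corrupted[picked] = mask_id
    return corrupted, labels

ids = torch.randint(5, 30_000, (2, 16))    # a toy batch of token ids
masked, labels = mask_tokens(ids, mask_id=0)
```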
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: The authors conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and propose to extend it by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts explicitly as a hard segment.
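The (soft-)search described above is commonly implemented as additive attention: score every source position against the current decoder state, normalize the scores, and replace the fixed-length vector with a weighted sum. A minimal sketch, with hypothetical layer sizes:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Soft-search: score every source position against the decoder state,
    then take an expectation instead of selecting a hard segment."""
    def __init__(self, dec_dim=256, enc_dim=512, attn_dim=128):
        super().__init__()
        self.W = nn.Linear(dec_dim, attn_dim, bias=False)
        self.U = nn.Linear(enc_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(self.W(dec_state).unsqueeze(1) + self.U(enc_states)))
        weights = torch.softmax(scores, dim=1)       # (batch, src_len, 1)
        context = (weights * enc_states).sum(dim=1)  # weighted sum replaces the fixed vector
        return context, weights.squeeze(-1)
```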
Journal ArticleDOI

Squeeze-and-Excitation Networks

TL;DR: This work proposes a novel architectural unit termed the "Squeeze-and-Excitation" (SE) block, which adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.
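As a reading aid, here is a minimal sketch of the SE block as described above: a squeeze (global average pooling per channel) followed by an excitation gate (a two-layer bottleneck ending in a sigmoid) that rescales each channel:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze: pool each channel to one statistic. Excite: a gated two-layer
    bottleneck rescales channels by their learned importance."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (batch, channels, H, W)
        s = x.mean(dim=(2, 3))                   # squeeze to per-channel statistics
        w = self.fc(s).unsqueeze(-1).unsqueeze(-1)
        return x * w                             # recalibrated feature map
```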
Posted Content

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

TL;DR: Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
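ViT's core move is to treat an image as a sequence of 16x16 patch "words": each patch is flattened and linearly projected into a token, and a standard Transformer encoder processes the resulting sequence. A minimal sketch of the patch embedding (class token and position embeddings omitted; sizes are illustrative):

```python
import torch
import torch.nn as nn

# One token per 16x16 patch: a strided convolution is equivalent to
# flattening each patch and applying a shared linear projection.
patchify = nn.Conv2d(3, 768, kernel_size=16, stride=16)

img = torch.rand(1, 3, 224, 224)
tokens = patchify(img).flatten(2).transpose(1, 2)   # (1, 196, 768)
```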
Proceedings ArticleDOI

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

TL;DR: This paper designs a novel type of neural network that directly consumes point clouds, respecting the permutation invariance of points in the input, and provides a unified architecture for applications ranging from object classification and part segmentation to scene semantic parsing.
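The permutation invariance noted above comes from a symmetric aggregation: per-point features are computed independently and then max-pooled, so reordering the input cannot change the output. A minimal sketch, with illustrative sizes:

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Per-point MLP followed by a symmetric max-pool, so the output is
    unchanged under any permutation of the input points."""
    def __init__(self, out_dim: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, out_dim))

    def forward(self, pts):                       # pts: (batch, n_points, 3)
        return self.mlp(pts).max(dim=1).values    # max over points = order invariant

pts = torch.rand(4, 1024, 3)
perm = pts[:, torch.randperm(1024)]               # shuffle the points
net = TinyPointNet()
assert torch.allclose(net(pts), net(perm))        # same embedding either way
```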