Ze Liu
Researcher at Microsoft
Publications - 12
Citations - 3807
Ze Liu is an academic researcher from Microsoft. The author has contributed to research in topics including computer science and object detection. The author has an h-index of 6 and has co-authored 9 publications receiving 512 citations. Previous affiliations of Ze Liu include the University of Science and Technology of China.
Papers
Posted Content
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.
TL;DR: The authors propose a new vision Transformer, called Swin Transformer, whose self-attention is computed within shifted windows to address differences between the language and vision domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text.
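The shifted-window scheme described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the feature-map size, window size, and function names are assumptions, and the cyclic shift by half a window is shown with a plain NumPy roll.

```python
import numpy as np

def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping win x win windows."""
    H, W, C = x.shape
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, C)

def shifted_windows(x, win):
    """Cyclically shift the map by win // 2 so this layer's windows straddle
    the previous layer's window boundaries (illustrative sketch)."""
    shifted = np.roll(x, shift=(-(win // 2), -(win // 2)), axis=(0, 1))
    return window_partition(shifted, win)

# Toy 8x8 single-channel feature map, 4x4 windows (hypothetical sizes).
feat = np.arange(8 * 8).reshape(8, 8, 1).astype(np.float32)
regular = window_partition(feat, 4)   # 4 windows of shape (4, 4, 1)
shifted = shifted_windows(feat, 4)    # same shapes, different groupings
print(regular.shape, shifted.shape)   # (4, 4, 4, 1) (4, 4, 4, 1)
```

Self-attention would then be computed independently within each window, so the cost grows linearly with image size rather than quadratically.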
Proceedings Article
Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
TL;DR: The authors propose a new vision Transformer, called Swin Transformer, whose self-attention is computed within shifted windows to address differences between the language and vision domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text.
Posted Content
A Closer Look at Local Aggregation Operators in Point Cloud Analysis
TL;DR: A simple local aggregation operator without learnable weights is proposed, named Position Pooling (PosPool), which performs similarly to or slightly better than existing sophisticated operators, and outperforms the previous state-of-the-art methods on the challenging PartNet datasets by a large margin.
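A position-pooling style aggregation without learnable weights can be sketched as below. This is an assumed, simplified form (function name, the channel-tiling of offsets, and mean pooling are illustrative choices), not the paper's exact operator: each neighbor's feature is modulated by its relative xyz offset and the results are averaged.

```python
import numpy as np

def pospool(points, feats, neighbor_idx):
    """Illustrative position-pooling aggregation (assumed form):
    neighbor features are modulated by their relative xyz offsets,
    then mean-pooled -- no learnable weights are involved."""
    N, K = neighbor_idx.shape
    C = feats.shape[1]
    rel = points[neighbor_idx] - points[:, None, :]   # (N, K, 3) relative offsets
    # Tile the 3 offset coordinates across channels (requires C % 3 == 0).
    pos = np.repeat(rel, C // 3, axis=2)              # (N, K, C)
    return (feats[neighbor_idx] * pos).mean(axis=1)   # (N, C) aggregated features

# Toy point cloud: 100 points, 6 feature channels, 8 neighbors each.
rng = np.random.default_rng(0)
pts = rng.random((100, 3)).astype(np.float32)
fts = rng.random((100, 6)).astype(np.float32)
idx = rng.integers(0, 100, size=(100, 8))
out = pospool(pts, fts, idx)
print(out.shape)  # (100, 6)
```

The point of the paper's comparison is that even such a weight-free operator, placed in a strong backbone, matches operators with many learned parameters.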
Book ChapterDOI
A Closer Look at Local Aggregation Operators in Point Cloud Analysis
TL;DR: The authors propose a simple local aggregation operator without learnable weights, named Position Pooling (PosPool), which performs similarly to or slightly better than existing sophisticated operators.
Posted Content
Video Swin Transformer.
TL;DR: In this article, the authors advocate an inductive bias of locality in video Transformers, which leads to a better speed-accuracy trade-off than previous approaches that compute self-attention globally even with spatiotemporal factorization.