scispace - formally typeset

Ze Liu

Researcher at Microsoft

Publications - 12
Citations - 3807

Ze Liu is an academic researcher from Microsoft. The author has contributed to research in the topics of computer science and object detection. The author has an h-index of 6 and has co-authored 9 publications receiving 512 citations. Previous affiliations of Ze Liu include the University of Science and Technology of China.

Papers
Posted Content

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.

TL;DR: This paper proposes a new vision Transformer, called Swin Transformer, whose representation is computed with shifted windows to address the differences between the language and vision domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text.
Proceedings Article

Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows

TL;DR: The authors propose a new vision Transformer, called Swin Transformer, whose representation is computed with shifted windows to address the differences between the language and vision domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text.
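The shifted-window idea in the summaries above can be illustrated with a minimal NumPy sketch: features are split into non-overlapping local windows, and alternate layers cyclically shift the map by half a window so that windows in the next layer straddle the previous layer's window boundaries. This is only a sketch of the partitioning under assumed shapes, not the paper's actual attention implementation; the function names are illustrative.

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping windows."""
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, C)

def shift_windows(x, window_size):
    """Cyclically shift the map by half a window, so the next layer's
    windows cross the previous layer's window boundaries."""
    shift = window_size // 2
    return np.roll(x, shift=(-shift, -shift), axis=(0, 1))

# Toy 8x8 feature map with one channel, partitioned into 4x4 windows.
x = np.arange(64, dtype=np.float32).reshape(8, 8, 1)
regular = window_partition(x, 4)                     # 4 windows of shape (4, 4, 1)
shifted = window_partition(shift_windows(x, 4), 4)   # same windows, shifted by 2
```

Self-attention would then be computed independently within each window, which keeps the cost linear in image size rather than quadratic.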
Posted Content

A Closer Look at Local Aggregation Operators in Point Cloud Analysis

TL;DR: A simple local aggregation operator without learnable weights, named Position Pooling (PosPool), is proposed; it performs comparably to or slightly better than existing sophisticated operators and outperforms previous state-of-the-art methods on the challenging PartNet dataset by a large margin.
Book ChapterDOI

A Closer Look at Local Aggregation Operators in Point Cloud Analysis

TL;DR: The authors propose a simple local aggregation operator without learnable weights, named Position Pooling (PosPool), which performs comparably to or slightly better than existing sophisticated operators.
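The parameter-free flavor of PosPool described above can be sketched as follows: each neighbor's feature vector is modulated elementwise by its relative position to the center point, then averaged over the neighborhood. This is a minimal sketch of the idea under assumed shapes, not the paper's exact operator (which also considers other position-encoding variants); the function name and arguments are illustrative.

```python
import numpy as np

def pos_pool(center, neighbors, neighbor_feats):
    """Parameter-free local aggregation in the spirit of PosPool.

    center:         (3,)   xyz of the center point
    neighbors:      (K, 3) xyz of the K neighbor points
    neighbor_feats: (K, C) neighbor features, with C divisible by 3
    """
    K, C = neighbor_feats.shape
    rel = neighbors - center                         # (K, 3) relative positions
    # Tile the 3-dim relative position across the C feature channels.
    weights = np.repeat(rel, C // 3, axis=1)         # (K, C), no learnable weights
    return (weights * neighbor_feats).mean(axis=0)   # (C,) aggregated feature
```

Because the aggregation has no learnable parameters, any performance must come from the surrounding network, which is the point the paper makes about sophisticated local operators.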
Posted Content

Video Swin Transformer.

TL;DR: In this article, the authors advocate an inductive bias of locality in video Transformers, which leads to a better speed-accuracy trade-off than previous approaches that compute self-attention globally, even with spatiotemporal factorization.
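The locality bias described above amounts to extending the 2D window partition to 3D spatiotemporal windows, so self-attention is restricted to small video cubes rather than computed globally. A minimal NumPy sketch of that partition, with assumed shapes and illustrative names:

```python
import numpy as np

def window_partition_3d(x, wt, wh, ww):
    """Split a (T, H, W, C) video feature map into non-overlapping
    3D (time x height x width) local windows."""
    T, H, W, C = x.shape
    x = x.reshape(T // wt, wt, H // wh, wh, W // ww, ww, C)
    return x.transpose(0, 2, 4, 1, 3, 5, 6).reshape(-1, wt, wh, ww, C)

# Toy clip: 4 frames of 8x8 features with 3 channels, 2x4x4 windows.
x = np.zeros((4, 8, 8, 3), dtype=np.float32)
windows = window_partition_3d(x, 2, 4, 4)   # 8 windows of shape (2, 4, 4, 3)
```

Attention within each (2, 4, 4) cube scales linearly with clip length, rather than quadratically as global spatiotemporal attention would.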