Ziwei Liu
Publications - 17
Citations - 215
Ziwei Liu is an academic researcher whose work spans the topics of Computer science and the Transformer (machine learning model). The author has an h-index of 2 and has co-authored 3 publications receiving 43 citations.
Papers
Posted Content
Incorporating Convolution Designs into Visual Transformers.
TL;DR: CeiT, as discussed by the authors, combines the advantages of CNNs in extracting low-level features and strengthening locality with the advantage of Transformers in establishing long-range dependencies, which significantly reduces the training cost.
Journal ArticleDOI
Neural Prompt Search
TL;DR: This paper proposes Neural prOmpt seArcH (NOAH), a novel approach that learns, for large vision models, the optimal design of prompt modules through a neural architecture search algorithm, specifically for each downstream dataset.
Proceedings ArticleDOI
StyleSwap: Style-Based Generator Empowers Robust Face Swapping
Zhiliang Xu, Hang Zhou, Zhibin Hong, Ziwei Liu, Jiaming Liu, Zhizhi Guo, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang, et al.
TL;DR: The core idea is to leverage a style-based generator to empower high-fidelity and robust face swapping, thus the generator’s advantage can be adopted for optimizing identity similarity.
Journal ArticleDOI
StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3
TL;DR: A principled framework named StyleFaceV is proposed, which produces high-fidelity, identity-preserving face videos with vivid movements; a temporal-dependent model is built upon the decomposed latent features and samples sequences of motions capable of generating realistic and temporally coherent face videos.
Proceedings ArticleDOI
Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers
Yasheng Sun, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Zhibin Hong, Jingtuo Liu, Errui Ding, Jingdong Wang, Ziwei Liu, Koike Hideki, et al.
TL;DR: In this paper, a convolution-transformer hybrid backbone is proposed to fuse the textural information on the unmasked regions and the reference frame, and a refinement network with audio injection improves both image and lip-sync quality.