Xingchen Song
Researcher at Tsinghua University
Publications - 16
Citations - 133
Xingchen Song is an academic researcher from Tsinghua University. The author has contributed to research in the topics of Computer Science and Engineering. The author has an h-index of 4 and has co-authored 6 publications receiving 57 citations.
Papers
Proceedings ArticleDOI
Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks
TL;DR: This paper proposes Speech-XLNet, an XLNet-like pretraining scheme for unsupervised acoustic model pretraining that learns speech representations with self-attention networks; the pretrained model is finetuned under the hybrid SAN/HMM framework.
Proceedings ArticleDOI
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
Binbin Zhang,Di Wu,Zhendong Peng,Xingchen Song,Zhuoyuan Yao,Hang Lv,Lei Xie,Chao Yang,Fuping Pan,Jianwei Niu +9 more
TL;DR: The brand-new WeNet 2.0 achieves up to 10% relative recognition performance improvement over the original WeNet on various corpora and makes available several important production-oriented features.
Posted Content
Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks
TL;DR: Speech-XLNet, an XLNet-like pretraining scheme for unsupervised acoustic model pretraining that learns speech representations with self-attention networks (SANs), greatly improves SAN/HMM performance in both convergence speed and recognition accuracy compared to a model trained from randomly initialized weights.
Posted Content
Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input
TL;DR: This work proposes a CTC-enhanced NAR transformer, which generates the target sequence by refining predictions of the CTC module and achieves 50x faster decoding than a strong AR baseline with only 0.3 absolute CER degradation on the AISHELL-1 and AISHELL-2 datasets.
Proceedings ArticleDOI
Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input
TL;DR: This paper proposes a CTC-enhanced NAR transformer, which generates the target sequence by refining predictions of the CTC module and achieves 50x faster decoding than a strong AR baseline with only 0.0∼0.3 absolute CER degradation on the AISHELL-1 and AISHELL-2 datasets.