Xingchen Song

Researcher at Tsinghua University

Publications: 16
Citations: 133

Xingchen Song is an academic researcher at Tsinghua University. The author has contributed to research in the areas of computer science and engineering, has an h-index of 4, and has co-authored 6 publications receiving 57 citations.

Papers
Proceedings Article

Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks

TL;DR: Speech-XLNet proposes an XLNet-like pretraining scheme for unsupervised acoustic model pretraining that learns speech representations with a self-attention network (SAN), which is then fine-tuned under the hybrid SAN/HMM framework.
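
A minimal sketch of the pretraining idea summarized above, assuming a PyTorch-style self-attention encoder: frames are fed in a random order under a causal mask and the network regresses the next frame in that permuted order, a simplified stand-in for the permutation-based objective. The names FrameEncoder and pretrain_step, the L1 reconstruction loss, and all hyperparameters are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    # Self-attention network (SAN) that maps log-Mel frames back to frame predictions.
    def __init__(self, n_mels=80, d_model=256, n_layers=4):
        super().__init__()
        self.proj_in = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.proj_out = nn.Linear(d_model, n_mels)

    def forward(self, feats, attn_mask=None):
        h = self.encoder(self.proj_in(feats), mask=attn_mask)
        return self.proj_out(h)

def pretrain_step(model, feats, optimizer):
    # One unsupervised step: shuffle the frame order, apply a causal mask over the
    # permuted sequence, and regress the next frame in permuted order with an L1
    # loss, so different permutations expose different contexts to the SAN.
    T = feats.size(1)
    perm = torch.randperm(T)
    x = feats[:, perm, :]
    causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    pred = model(x, attn_mask=causal)
    loss = nn.functional.l1_loss(pred[:, :-1, :], x[:, 1:, :])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example: one pretraining step on a random batch of 80-dim filterbank features.
model = FrameEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss = pretrain_step(model, torch.randn(8, 200, 80), opt)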
Proceedings Article

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

TL;DR: The brand-new WeNet 2.0 achieves up to a 10% relative recognition performance improvement over the original WeNet on various corpora and makes several important production-oriented features available.
Posted Content

Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks

TL;DR: Speech-XLNet, an XLNet-like scheme for unsupervised acoustic model pretraining that learns speech representations with a SAN, greatly improves SAN/HMM performance in both convergence speed and recognition accuracy compared to training from randomly initialized weights.
Posted Content

Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input

TL;DR: This work proposes a CTC-enhanced NAR transformer, which generates the target sequence by refining the predictions of the CTC module and achieves 50x faster decoding than a strong AR baseline with only 0.3 absolute CER degradation on the AISHELL-1 and AISHELL-2 datasets.
Proceedings Article

Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input

TL;DR: This paper proposes a CTC-enhanced NAR transformer, which generates the target sequence by refining the predictions of the CTC module and achieves 50x faster decoding than a strong AR baseline with only 0.0~0.3 absolute CER degradation on the AISHELL-1 and AISHELL-2 datasets.
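
A minimal sketch of the decoding flow described above, assuming generic PyTorch encoder/decoder modules: greedy CTC output is collapsed into an initial hypothesis and then refined in a single parallel (non-autoregressive) decoder pass. The component names encoder, ctc_head, decoder, and blank_id are assumptions for illustration, not the paper's code.

import torch

@torch.no_grad()
def nar_decode(encoder, ctc_head, decoder, feats, blank_id=0):
    # 1) Encode acoustic features and take greedy CTC predictions per frame.
    enc = encoder(feats)                       # (B, T, d_model)
    greedy = ctc_head(enc).argmax(dim=-1)      # (B, T) frame-level token ids

    # 2) Collapse repeated labels and drop blanks to build the initial hypotheses.
    hyps = []
    for seq in greedy:
        seq = torch.unique_consecutive(seq)
        hyps.append(seq[seq != blank_id])

    # 3) Pad the CTC hypotheses and refine them with one parallel
    #    (non-autoregressive) decoder pass conditioned on the encoder output.
    dec_in = torch.nn.utils.rnn.pad_sequence(hyps, batch_first=True,
                                             padding_value=blank_id)
    refined = decoder(dec_in, enc)             # (B, L, vocab), no causal mask
    return refined.argmax(dim=-1)              # refined token sequences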