Yuchen Hu
Publications - 9
Citations - 78
Yuchen Hu is an academic researcher whose work spans engineering and computer science. The author has an h-index of 6 and has co-authored 9 publications receiving 78 citations.
Papers
Proceedings ArticleDOI
Noise-Robust Speech Recognition With 10 Minutes Unparalleled In-Domain Data
TL;DR: Proposes SimuGAN, a generative adversarial network that simulates noisy spectra from clean spectra, requiring only 10 minutes of unparalleled in-domain noisy speech data as labels, together with a dual-path speech recognition system that improves robustness under noisy conditions.
Proceedings ArticleDOI
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning
TL;DR: Proposes a noise-robust data2vec for self-supervised speech representation learning that jointly optimizes contrastive learning and regression tasks in the pre-training stage, mitigating the model collapse that can occur when training the regression task alone.
Proceedings ArticleDOI
Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning
TL;DR: Proposes CLIP-AAC, a novel automated audio captioning (AAC) system that learns interactive cross-modality representations from both acoustic and textual information; experiments indicate that both the pre-trained model and contrastive learning contribute to the performance gain of the AAC model.
Journal ArticleDOI
Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning
TL;DR: Proposes MSRL, a reinforcement learning-based framework in which an agent dynamically harmonizes modality-invariant and modality-specific representations during the auto-regressive decoding process.
Journal ArticleDOI
Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition
TL;DR: Proposes DPSL-ASR, which employs a consistency loss to minimize the distance between ASR outputs in two paths for improved noise robustness; visualizations of intermediate embeddings indicate that it can recover abundant over-suppressed information in enhanced speech.