Yuchen Hu

Publications: 9
Citations: 78

Yuchen Hu is an academic researcher whose work spans Engineering and Computer Science. The author has an h-index of 6 and has co-authored 9 publications receiving 78 citations.

Papers
Proceedings Article

Noise-Robust Speech Recognition With 10 Minutes Unparalleled In-Domain Data

TL;DR: This paper proposes SimuGAN, a generative adversarial network that simulates noisy spectra from clean spectra using only 10 minutes of unparalleled in-domain noisy speech data as labels, together with a dual-path speech recognition system that improves robustness under noisy conditions.
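The core idea, as described, is standard adversarial training: a generator maps clean spectra to simulated noisy spectra, and a discriminator is trained against the small amount of real in-domain noisy speech. The sketch below illustrates that loop in PyTorch; the network shapes, losses, and training details are illustrative assumptions, not the paper's actual configuration, and the dual-path recognition system is omitted.

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, n_bins=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, 256), nn.ReLU(),
            nn.Linear(256, n_bins),
        )

    def forward(self, clean_spec):                 # (batch, frames, n_bins)
        return clean_spec + self.net(clean_spec)   # predict an additive "noise" term

class Discriminator(nn.Module):
    def __init__(self, n_bins=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, spec):
        return self.net(spec).mean(dim=1)          # pool frame-level scores per utterance

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

clean = torch.randn(4, 100, 80)   # placeholder clean spectra
noisy = torch.randn(4, 100, 80)   # placeholder real in-domain noisy spectra

# Discriminator step: real in-domain noisy vs. simulated noisy
fake = G(clean).detach()
loss_d = bce(D(noisy), torch.ones(4, 1)) + bce(D(fake), torch.zeros(4, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to fool the discriminator
loss_g = bce(D(G(clean)), torch.ones(4, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()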
Proceedings Article

Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning

TL;DR: A noise-robust data2vec for self-supervised speech representation learning is proposed, jointly optimizing contrastive learning and regression tasks in the pre-training stage, which helps avoid model collapse to some extent compared to training the regression task alone.
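The combination described in the TL;DR can be pictured as a weighted sum of a frame-wise regression loss on masked positions and an InfoNCE-style contrastive loss. The sketch below is an assumption about how such a joint objective might look; the temperature, weighting, and target construction are placeholders rather than the paper's exact formulation.

import torch
import torch.nn.functional as F

def regression_loss(student, teacher):
    # data2vec-style target regression at masked positions
    return F.mse_loss(student, teacher)

def contrastive_loss(student, teacher, temperature=0.1):
    # InfoNCE: each student frame should match its own teacher frame
    s = F.normalize(student, dim=-1)      # (frames, dim)
    t = F.normalize(teacher, dim=-1)
    logits = s @ t.T / temperature        # (frames, frames) similarity matrix
    targets = torch.arange(s.size(0))
    return F.cross_entropy(logits, targets)

student = torch.randn(64, 768)   # student outputs at masked frames (noisy input), placeholder
teacher = torch.randn(64, 768)   # teacher targets (clean input), detached in practice

alpha = 1.0                      # assumed weighting between the two tasks
loss = regression_loss(student, teacher) + alpha * contrastive_loss(student, teacher)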
Proceedings Article

Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning

TL;DR: This work proposes a novel AAC system called CLIP-AAC that learns interactive cross-modality representations from both acoustic and textual information; results indicate that both the pre-trained model and contrastive learning contribute to the performance gain of the AAC model.
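Contrastive audio-text learning of the kind referenced here is typically a CLIP-style symmetric cross-entropy over a similarity matrix of paired audio and caption embeddings. The following is a minimal sketch under that assumption; the encoders and projection dimension are placeholders, not the paper's actual setup.

import torch
import torch.nn.functional as F

def clip_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    a = F.normalize(audio_emb, dim=-1)    # (batch, dim)
    t = F.normalize(text_emb, dim=-1)
    logits = a @ t.T / temperature        # (batch, batch) similarity matrix
    labels = torch.arange(a.size(0))
    # symmetric loss: audio-to-text and text-to-audio directions
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))

audio_emb = torch.randn(8, 512)   # from an audio encoder (placeholder)
text_emb = torch.randn(8, 512)    # from a text encoder over the captions (placeholder)
loss = clip_contrastive_loss(audio_emb, text_emb)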
Journal Article

Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning

TL;DR: In this paper, a reinforcement learning-based framework called MSRL is proposed, where the agent dynamically harmonizes modality-invariant and modality-specific representations in the auto-regressive decoding process.
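A very rough way to picture the described setup: a policy network chooses, at each decoding step, how to combine the modality-invariant and modality-specific streams, and is updated with a policy-gradient (REINFORCE-style) signal from a task reward. The sketch below is purely illustrative; the policy form, fusion rule, and reward are assumptions, not the paper's actual method.

import torch
import torch.nn as nn

dim = 256
policy = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

invariant = torch.randn(1, dim)   # modality-invariant feature at one decoding step (placeholder)
specific = torch.randn(1, dim)    # modality-specific feature at the same step (placeholder)

logits = policy(torch.cat([invariant, specific], dim=-1))
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()            # 0 -> favour invariant stream, 1 -> favour specific stream
fused = invariant if action.item() == 0 else specific   # would feed the decoder (omitted here)

reward = torch.tensor(1.0)        # placeholder reward, e.g. derived from recognition quality
loss = (-dist.log_prob(action) * reward).mean()          # REINFORCE objective
optimizer.zero_grad(); loss.backward(); optimizer.step()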
Journal Article

Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition

TL;DR: DPSL-ASR employs a consistency loss to minimize the distance between the ASR outputs of its two paths, improving noise robustness; visualizations of intermediate embeddings indicate that it can recover abundant over-suppressed information in enhanced speech.
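The consistency loss mentioned in the TL;DR can be pictured as a divergence between the output token distributions of the two paths; a symmetric KL divergence, as sketched below, is one common choice and is an assumption here rather than the paper's exact loss.

import torch
import torch.nn.functional as F

def consistency_loss(logits_path1, logits_path2):
    p = F.log_softmax(logits_path1, dim=-1)
    q = F.log_softmax(logits_path2, dim=-1)
    # symmetric KL between the two paths' token distributions
    kl_pq = F.kl_div(q, p, log_target=True, reduction="batchmean")
    kl_qp = F.kl_div(p, q, log_target=True, reduction="batchmean")
    return 0.5 * (kl_pq + kl_qp)

logits1 = torch.randn(4, 50, 1000)   # path 1 outputs: (batch, tokens, vocab), placeholder
logits2 = torch.randn(4, 50, 1000)   # path 2 outputs
loss = consistency_loss(logits1, logits2)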