
Showing papers by "Tim Fingscheidt published in 2023"


05 Jun 2023
TL;DR: In this paper, the authors propose an efficient fully convolutional recurrent neural network (FCRN15) and a new family of efficient convolutional recurrent neural networks (EffCRN23, EffCRN23lite) for speech enhancement.
Abstract: Fully convolutional recurrent neural networks (FCRNs) have shown state-of-the-art performance in single-channel speech enhancement. However, the number of parameters and the FLOPs/second of the original FCRN are restrictively high. A further important class of efficient networks is the CRUSE topology, serving as reference in our work. By applying a number of topological changes at once, we propose both an efficient FCRN (FCRN15) and a new family of efficient convolutional recurrent neural networks (EffCRN23, EffCRN23lite). We show that our FCRN15 (875K parameters) and EffCRN23lite (396K) outperform the already efficient CRUSE5 (85M) and CRUSE4 (7.2M) networks, respectively, w.r.t. PESQ, DNSMOS, and DeltaSNR, while requiring about 94% fewer parameters and about 20% fewer FLOPs/frame. Thereby, according to these metrics, the FCRN/EffCRN class of networks provides new best-in-class network topologies for speech enhancement.
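To make the conv-recurrent pattern behind such networks concrete, here is a minimal PyTorch sketch of a single-channel mask-based enhancement model; it is an illustrative stand-in, not the authors' FCRN15 or EffCRN23 topology, and all layer sizes are assumptions:

    import torch
    import torch.nn as nn

    class TinyConvRecurrentNet(nn.Module):
        # Illustrative conv-recurrent enhancement net; NOT the paper's topology.
        def __init__(self, freq_bins=257, channels=32, hidden=256):
            super().__init__()
            # Convolution along the frequency axis extracts local spectral features
            self.enc = nn.Conv2d(1, channels, kernel_size=(5, 1), padding=(2, 0))
            # Recurrent core models temporal context frame by frame
            self.gru = nn.GRU(channels * freq_bins, hidden, batch_first=True)
            # Project back to one mask value per frequency bin
            self.dec = nn.Linear(hidden, freq_bins)

        def forward(self, spec):  # spec: (batch, 1, freq_bins, frames)
            b, _, f, t = spec.shape
            h = torch.relu(self.enc(spec))                    # (B, C, F, T)
            h = h.permute(0, 3, 1, 2).reshape(b, t, -1)       # (B, T, C*F)
            h, _ = self.gru(h)                                # (B, T, hidden)
            mask = torch.sigmoid(self.dec(h))                 # (B, T, F)
            return mask.permute(0, 2, 1).unsqueeze(1) * spec  # masked magnitudes

    # Parameter budgets like the 875K/396K quoted above can be checked with:
    print(sum(p.numel() for p in TinyConvRecurrentNet().parameters()))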

Proceedings ArticleDOI
09 Jan 2023
TL;DR: This paper applies the regularized dropout (R-Drop) method to transformer-based lip-reading to improve training-inference consistency, and applies the relaxed attention technique during training for better external language model integration.
Abstract: End-to-end automatic lip-reading usually comprises an encoder-decoder model and an optional external language model. In this work, we introduce two regularization methods to the field of lip-reading: First, we apply the regularized dropout (R-Drop) method to transformer-based lip-reading to improve training-inference consistency. Second, the relaxed attention technique is applied during training for better external language model integration. We are the first to show that these two complementary approaches yield particularly strong performance if combined in the right manner. In particular, by adding an additional R-Drop loss and smoothing the attention weights in cross multi-head attention during training only, we achieve a new state of the art with a word error rate of 22.2% on Lip Reading Sentences 2 (LRS2). On LRS3, we rank 2nd with 25.5% WER using only 1,759 h of training data, while the 1st rank uses about 90,000 h. Our code is available at GitHub: https://github.com/ifnspaml/Lipreading-RDrop-RA
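As a rough illustration of the two regularizers named above, the following PyTorch sketch shows an R-Drop loss (two stochastic forward passes tied together by a symmetric KL term) and relaxed attention (smoothing attention weights toward a uniform distribution during training); the loss weight alpha and smoothing coefficient gamma are illustrative defaults, not the paper's tuned values:

    import torch
    import torch.nn.functional as F

    def r_drop_loss(model, x, y, alpha=1.0):
        # Two forward passes with dropout active yield two sub-model predictions
        logits1, logits2 = model(x), model(x)
        ce = F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y)
        log_p1 = F.log_softmax(logits1, dim=-1)
        log_p2 = F.log_softmax(logits2, dim=-1)
        # Symmetric KL divergence enforces training-inference consistency
        kl = 0.5 * (F.kl_div(log_p1, log_p2, log_target=True, reduction="batchmean")
                    + F.kl_div(log_p2, log_p1, log_target=True, reduction="batchmean"))
        return ce + alpha * kl

    def relaxed_attention(scores, gamma=0.1):
        # Blend softmax attention weights with a uniform distribution over
        # the source positions (applied in cross attention, training only)
        weights = torch.softmax(scores, dim=-1)
        uniform = torch.full_like(weights, 1.0 / weights.size(-1))
        return (1.0 - gamma) * weights + gamma * uniform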

Journal ArticleDOI
TL;DR: In this paper, the authors propose a non-intrusive neural-network-based model (PESQ-DNN) that predicts the well-known PESQ metric for wideband-coded speech without requiring a reference signal.
Abstract: Wideband codecs such as AMR-WB or EVS are widely used in (mobile) speech communication. Evaluation of coded speech quality is often performed subjectively by an absolute category rating (ACR) listening test. However, the ACR test is impractical for online monitoring of speech communication networks. Perceptual evaluation of speech quality (PESQ) is one of the widely used metrics instrumentally predicting the results of an ACR test. However, the PESQ algorithm requires an original reference signal, which is usually unavailable in network monitoring, thus limiting its applicability. NISQA is a new non-intrusive neural-network-based speech quality measure, focusing on super-wideband speech signals. In this work, however, we aim at predicting the well-known PESQ metric using a non-intrusive PESQ-DNN model. We illustrate the potential of this model by predicting the PESQ scores of wideband-coded speech obtained from AMR-WB or EVS codecs operating at different bitrates in noisy, tandeming, and error-prone transmission conditions. We compare our method with the state-of-the-art network topologies of QualityNet, WaweNet, and DNSMOS -- all applied to PESQ prediction -- by measuring the mean absolute error (MAE) and the linear correlation coefficient (LCC). The proposed PESQ-DNN offers the best total MAE and LCC of 0.11 and 0.92, respectively, in conditions without frame loss, and remains best when frame loss is included. Note that our model could similarly be used to non-intrusively predict POLQA or other (intrusive) metrics. Upon article acceptance, code will be provided at GitHub.
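The two reported figures of merit are straightforward to reproduce; a minimal NumPy sketch with toy values (not the paper's data) follows:

    import numpy as np

    def mae(pesq_true, pesq_pred):
        # Mean absolute error between true and predicted PESQ scores
        return np.mean(np.abs(np.asarray(pesq_true) - np.asarray(pesq_pred)))

    def lcc(pesq_true, pesq_pred):
        # Pearson linear correlation coefficient
        return np.corrcoef(pesq_true, pesq_pred)[0, 1]

    pesq_true = [1.8, 2.4, 3.1, 3.9, 4.2]  # toy values for illustration only
    pesq_pred = [1.9, 2.3, 3.3, 3.7, 4.3]
    print(f"MAE = {mae(pesq_true, pesq_pred):.2f}, LCC = {lcc(pesq_true, pesq_pred):.2f}")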

Journal ArticleDOI
TL;DR: The authors survey unsupervised domain adaptation (UDA) for automated driving, presenting an overview of the current state of the art in this research field.
Abstract: Deep neural networks (DNNs) have proven their capabilities in the past years and play a significant role in environment perception for the challenging application of automated driving. They are employed for tasks such as detection, semantic segmentation, and sensor fusion. Despite tremendous research efforts, several issues still need to be addressed that limit the applicability of DNNs in automated driving. The poor generalization of DNNs to unseen domains is a major problem on the way to a safe, large-scale application, because manual annotation of new domains is costly, particularly for semantic segmentation. For this reason, methods are required to adapt DNNs to new domains without labeling effort. This task is termed unsupervised domain adaptation (UDA). While several different domain shifts challenge DNNs, the shift between synthetic and real data is of particular importance for automated driving, as it allows the use of simulation environments for DNN training. We present an overview of the current state of the art in this research field. We categorize and explain the different approaches for UDA. The number of considered publications is larger than in any other survey on this topic. We also go far beyond the description of the UDA state of the art, as we present a quantitative comparison of approaches and point out the latest trends in this field. We conduct a critical analysis of the state of the art and highlight promising future research directions. With this survey, we aim to facilitate UDA research further and encourage scientists to exploit novel research directions.
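One representative UDA ingredient covered by such surveys is self-training with pseudo-labels on unlabeled target-domain images; the following PyTorch sketch is a generic illustration under assumed tensor shapes and an assumed confidence threshold, not a specific method from the survey:

    import torch
    import torch.nn.functional as F

    def pseudo_label_loss(model, target_images, threshold=0.9):
        # Generate pseudo-labels from the model's own confident predictions
        with torch.no_grad():
            probs = torch.softmax(model(target_images), dim=1)  # (B, C, H, W)
            conf, pseudo = probs.max(dim=1)                     # both (B, H, W)
            pseudo[conf < threshold] = 255                      # mask out uncertain pixels
        # Second pass with gradients, supervised by the confident pseudo-labels
        logits = model(target_images)
        return F.cross_entropy(logits, pseudo, ignore_index=255)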