Author

Tim Fingscheidt

Other affiliations: Siemens, AT&T Labs
Bio: Tim Fingscheidt is an academic researcher from Braunschweig University of Technology. The author has contributed to research in topics: Speech enhancement & Speech coding. The author has an h-index of 22, has co-authored 231 publications, and has received 2140 citations. Previous affiliations of Tim Fingscheidt include Siemens & AT&T Labs.


Papers
Book ChapterDOI
23 Aug 2020
TL;DR: A new self-supervised semantically-guided depth estimation (SGDepth) method is presented to deal with moving dynamic-class (DC) objects, such as moving cars and pedestrians, which violate the static-world assumptions typically made during training of such models.
Abstract: Self-supervised monocular depth estimation presents a powerful method to obtain 3D scene information from single camera images, which is trainable on arbitrary image sequences without requiring depth labels, e.g., from a LiDAR sensor. In this work we present a new self-supervised semantically-guided depth estimation (SGDepth) method to deal with moving dynamic-class (DC) objects, such as moving cars and pedestrians, which violate the static-world assumptions typically made during training of such models. Specifically, we propose (i) mutually beneficial cross-domain training of (supervised) semantic segmentation and self-supervised depth estimation with task-specific network heads, (ii) a semantic masking scheme providing guidance to prevent moving DC objects from contaminating the photometric loss, and (iii) a detection method for frames with non-moving DC objects, from which the depth of DC objects can be learned. We demonstrate the performance of our method on several benchmarks, in particular on the Eigen split, where we exceed all baselines without test-time refinement.
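The semantic masking scheme in (ii) can be illustrated with a minimal sketch: pixels that a segmentation head labels as dynamic-class are simply excluded when averaging the photometric error. This is an illustration only, not the authors' implementation; the class-ID set and array shapes are assumed for the example.

```python
import numpy as np

# Hypothetical dynamic-class IDs (e.g., car, pedestrian) in a segmentation map.
DYNAMIC_CLASSES = [11, 12]

def masked_photometric_loss(photometric_error, seg_map):
    """Average the per-pixel photometric error, excluding pixels that
    belong to dynamic-class objects according to the segmentation map."""
    mask = ~np.isin(seg_map, DYNAMIC_CLASSES)  # True = static-world pixel
    if mask.sum() == 0:
        return 0.0  # frame fully covered by dynamic objects
    return float(photometric_error[mask].mean())

err = np.array([[1.0, 2.0], [3.0, 4.0]])
seg = np.array([[0, 11], [0, 12]])  # two dynamic-class pixels
loss = masked_photometric_loss(err, seg)  # averages 1.0 and 3.0 -> 2.0
```

In the paper the mask additionally interacts with the detection of frames containing non-moving DC objects (iii), so that depth for those objects can still be learned; the sketch above shows only the basic masking step.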

217 citations

Journal ArticleDOI
TL;DR: A new and generalizing approach to error concealment is described as part of a modified robust speech decoder that can be applied to any speech codec standard and preserves bit exactness in the case of an error-free channel.
Abstract: In digital speech communication over noisy channels there is the need for reducing the subjective effects of residual bit errors which have not been eliminated by channel decoding. This task is usually called error concealment. We describe a new and generalizing approach to error concealment as part of a modified robust speech decoder. It can be applied to any speech codec standard and preserves bit exactness in the case of an error-free channel. The proposed method requires bit reliability information provided by the demodulator, by the equalizer, or specifically by the channel decoder, and can additionally exploit a priori knowledge about codec parameters. We apply our algorithms to PCM, ADPCM, and GSM full-rate speech coding using AWGN, fading, and GSM channel models, respectively. It turns out that the speech quality is significantly enhanced, showing the desired inherent muting mechanism or graceful degradation behavior in the case of extreme adverse transmission conditions.
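The core soft-decision idea of exploiting bit reliability information can be sketched as follows: instead of hard-decoding the received bit pattern, every candidate pattern is weighted by its posterior probability and the decoded parameter values are averaged. This is a minimal illustration assuming independent bit errors and equiprobable parameter values, not the paper's exact algorithm.

```python
import itertools

def soft_parameter_estimate(received_bits, bit_reliabilities, decode):
    """MMSE-style soft estimate of a codec parameter: weight every possible
    bit pattern by its posterior probability (derived from per-bit
    reliabilities) and average the decoded parameter values."""
    n = len(received_bits)
    estimate, total = 0.0, 0.0
    for pattern in itertools.product([0, 1], repeat=n):
        p = 1.0
        for b, r, q in zip(pattern, received_bits, bit_reliabilities):
            p *= q if b == r else (1.0 - q)  # prob. this pattern was sent
        estimate += p * decode(pattern)
        total += p
    return estimate / total

# Toy 2-bit uniform quantizer: decode bits to a value in {0, 1, 2, 3}.
decode = lambda bits: bits[0] * 2 + bits[1]
est = soft_parameter_estimate((1, 0), (1.0, 1.0), decode)  # reliable bits -> 2.0
```

With fully reliable bits the estimate collapses to hard decoding; with reliability 0.5 on every bit it converges to the parameter mean, which is exactly the inherent muting / graceful-degradation behavior described in the abstract.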

192 citations

Journal ArticleDOI
TL;DR: An urn-ball paradigm is introduced to relate event-related potentials (ERPs) such as the P300 wave to Bayesian inference. The results indicate that the three components of the late positive complex reflect distinct neural computations, which are consistent with the Bayesian brain hypothesis but seem to be subject to nonlinear probability weighting.

86 citations

Journal ArticleDOI
TL;DR: A data-driven approach to a priori SNR estimation is presented, which reduces speech distortion, particularly in speech onset, while retaining a high level of noise attenuation in speech absence.
Abstract: The a priori signal-to-noise ratio (SNR) plays an important role in many speech enhancement algorithms. In this paper, we present a data-driven approach to a priori SNR estimation. It may be used with a wide range of speech enhancement techniques, such as the minimum mean square error (MMSE) (log) spectral amplitude estimator, the super-Gaussian joint maximum a posteriori (JMAP) estimator, or the Wiener filter. The proposed SNR estimator employs two trained artificial neural networks, one for speech presence and one for speech absence. The classical decision-directed a priori SNR estimator by Ephraim and Malah is broken down into its two additive components, which now represent the two input signals to the neural networks. Both output nodes are combined to represent the new a priori SNR estimate. As an alternative to the neural networks, simple lookup tables are also investigated. Employment of these data-driven nonlinear a priori SNR estimators reduces speech distortion, particularly in speech onset, while retaining a high level of noise attenuation in speech absence.
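The two additive components mentioned above come from the classical decision-directed estimator, which combines the previous frame's estimated clean-speech power (normalized by the noise PSD) with a maximum-likelihood term from the current a posteriori SNR. A sketch of that classical baseline (per frequency bin; the smoothing constant is an assumed typical value):

```python
import numpy as np

def decision_directed_snr(prev_clean_mag2, noise_psd, gamma, alpha=0.98):
    """Classical decision-directed a priori SNR estimate (Ephraim/Malah):
    weighted sum of the previous frame's estimated clean-speech power over
    the noise PSD and the current maximum-likelihood term max(gamma-1, 0)."""
    comp_past = prev_clean_mag2 / noise_psd   # first additive component
    comp_ml = np.maximum(gamma - 1.0, 0.0)    # second additive component
    return alpha * comp_past + (1.0 - alpha) * comp_ml

xi = decision_directed_snr(prev_clean_mag2=np.array([2.0]),
                           noise_psd=np.array([1.0]),
                           gamma=np.array([3.0]),
                           alpha=0.9)
# 0.9 * 2.0 + 0.1 * 2.0 = 2.0
```

In the paper's data-driven variant, `comp_past` and `comp_ml` are not summed with a fixed weight but fed separately into the two trained neural networks (or lookup tables), whose outputs are combined into the final a priori SNR estimate.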

78 citations

Proceedings ArticleDOI
09 Jun 2019
TL;DR: This paper provides a formal definition of a corner case and proposes a system framework for both the online and the offline use case that can handle video signals from front cameras of a naturally moving vehicle and can output a corner case score.
Abstract: The progress in autonomous driving is also due to the increased availability of vast amounts of training data for the underlying machine learning approaches. Machine learning systems are generally known to lack robustness, e.g., if the training data rarely or never covered critical situations. The challenging task of corner case detection in video, which is closely related to unusual event or anomaly detection, aims at detecting these unusual situations, which could become critical, and at communicating this to the autonomous driving system (online use case). Such a system, however, could also be used in offline mode to screen vast amounts of data and select only the relevant situations for storing and (re)training machine learning algorithms. So far, the approaches for corner case detection have been limited to videos recorded from a fixed camera, mostly for security surveillance. In this paper, we provide a formal definition of a corner case and propose a system framework for both the online and the offline use case that can handle video signals from front cameras of a naturally moving vehicle and can output a corner case score.

72 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: A textbook covering probability distributions and linear models for regression and classification, along with neural networks, kernel methods, graphical models, approximate inference, sampling methods, and the combination of models in the context of machine learning.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
01 Oct 1980

1,565 citations

Patent
11 Jan 2011
TL;DR: In this article, an intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.
Abstract: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionality powered by external services with which the system can interact.

1,462 citations

Posted Content
TL;DR: This work proposes the Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities, and performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques.
Abstract: When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques and performs similarly to multitask learning that uses the original task data we assume to be unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning with similar old and new task datasets for improved new task performance.
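The Learning without Forgetting objective combines a standard new-task loss with a distillation term that keeps the old-task head close to responses recorded before training starts. A numerical sketch of that combined loss (the balance weight and temperature are assumed hyperparameters, not values from the paper):

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def lwf_loss(new_logits, new_label, old_logits, recorded_old_logits,
             lam=1.0, T=2.0):
    """Learning-without-Forgetting style objective: cross-entropy on the
    new task plus a distillation term that keeps the old-task outputs
    close to the responses recorded before training (temperature T)."""
    ce_new = -np.log(softmax(new_logits)[new_label])   # new-task loss
    p_old = softmax(old_logits, T)                     # current old head
    q_old = softmax(recorded_old_logits, T)            # fixed soft targets
    distill = -np.sum(q_old * np.log(p_old))           # cross-entropy H(q, p)
    return ce_new + lam * distill

# Distillation penalizes drift of the old-task head away from its
# recorded responses, while the new-task term fits the new labels.
l_match = lwf_loss([2.0, 0.0], 0, [1.0, 0.0], [1.0, 0.0])
l_drift = lwf_loss([2.0, 0.0], 0, [0.0, 1.0], [1.0, 0.0])
```

`l_match < l_drift`: keeping the old head's outputs unchanged yields the lower loss, which is what preserves the original capabilities without the original data.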

1,037 citations