
Showing papers by "Tim Fingscheidt" published in 2014


Proceedings ArticleDOI
24 Aug 2014
TL;DR: This paper presents several feature extraction and classification approaches for the identification of writers in historical Arabic manuscripts; the approaches successfully identify the writers of multipage documents.
Abstract: Identification of the writers of handwritten historical documents is an important and challenging task. In this paper, we present several feature extraction and classification approaches for the identification of writers in historical Arabic manuscripts. The approaches are able to successfully identify the writers of multipage documents. The feature extraction methods rely on different principles, such as contour-, textural-, and key-point-based features, and the classification schemes are based on averaging and voting. For all experiments, a dedicated data set based on a publicly available database is used. The experiments show promising results, and the best performance was achieved using a novel feature extraction method based on key point descriptors.
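
As a rough illustration of the classification schemes mentioned above, the following Python sketch combines per-page writer scores by voting and by averaging. All names and scores are invented; the paper's actual features and classifiers are not reproduced here.

from collections import Counter

def identify_writer(page_scores):
    """page_scores: one dict per page of the query manuscript,
    mapping candidate writer ID -> similarity score."""
    # Voting: each page votes for its best-scoring writer.
    votes = Counter(max(s, key=s.get) for s in page_scores)
    by_vote = votes.most_common(1)[0][0]
    # Averaging: average each writer's score over all pages.
    writers = set().union(*page_scores)
    avg = {w: sum(s.get(w, 0.0) for s in page_scores) / len(page_scores)
           for w in writers}
    return by_vote, max(avg, key=avg.get)

# Three pages, two hypothetical candidate writers:
pages = [{"A": 0.9, "B": 0.2}, {"A": 0.6, "B": 0.7}, {"A": 0.8, "B": 0.3}]
print(identify_writer(pages))  # ('A', 'A')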

25 citations


Proceedings ArticleDOI
20 Nov 2014
TL;DR: Two neural networks are employed to support an HMM-based ABE: the first one detects /s, z/ phonemes to assist the estimation process, while the second one corrects the estimated high-band energy.
Abstract: In telephony applications, artificial bandwidth extension (ABE) can be applied to narrowband (NB) calls for speech quality and intelligibility enhancement. However, high-band extension is challenging due to insufficient mutual information between the lower and upper frequency band in speech. Estimation errors, particularly for the fricatives /s, z/, are the consequence, leading to annoying artifacts such as lisping. In this paper, two neural networks are employed to support an HMM-based ABE: The first one detects /s, z/ phonemes to assist the estimation process, while the second one corrects the estimated high-band energy. In an absolute category rating test, the proposed ABE attains a significantly improved speech quality vs. NB speech. This is confirmed by a comparison category rating test, indicating a speech quality gain of 1.0 CMOS points over NB speech.
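
The following Python sketch only illustrates how a fricative detector and an energy-correction network could be wired around an HMM-based high-band estimator; the networks and the estimator are hypothetical stand-ins, not the paper's architectures.

import numpy as np

def extend_highband(frame_feats, hmm_estimate, net1, net2):
    """Sketch of an assisted HMM-based high-band estimate."""
    p_sz = float(net1(frame_feats))                # NN 1: P(frame is /s/ or /z/)
    env, energy = hmm_estimate(frame_feats, p_sz)  # estimation assisted by p_sz
    energy += float(net2(np.append(frame_feats, p_sz)))  # NN 2: energy correction
    return env, energy

# Dummy stand-ins so the sketch runs end to end:
net1 = lambda x: 1.0 / (1.0 + np.exp(-x.mean()))
net2 = lambda x: 0.1 * x.mean()
hmm_estimate = lambda x, p: (np.abs(x), float(x.sum()))
print(extend_highband(np.random.randn(10), hmm_estimate, net1, net2)[1])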

19 citations


Proceedings ArticleDOI
15 Dec 2014
TL;DR: A state-of-the-art word spotter is presented in its original and in a new, extended implementation; the extended version outperforms the original in terms of mean average precision, both on the new Arabic dataset and on the widely used George Washington dataset.
Abstract: In this paper, we present a new and freely available dataset comprising 80 pages of a historical handwritten Arabic document, in conjunction with a detailed ground truth for the development and evaluation of segmentation-free word spotting approaches. Besides information on the underlying manuscript and technical details, we introduce a comprehensive list of tags that each word is labeled with. These tags can be used for research on specific issues, such as dealing with text in different colors. For the comparison of different word spotters, a fixed set of 25 keywords with different properties is included. Furthermore, some specifics of spotting on Arabic manuscripts are discussed. As an example, we present a state-of-the-art word spotting algorithm in its original and in a new, extended implementation, and evaluate both approaches on the new dataset. For comparison, they are also tested on the widely used George Washington dataset. It is shown that the extended word spotter outperforms the original version in terms of mean average precision on both datasets.
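
Mean average precision, the metric used above to compare the two word spotters, can be computed as in the following Python sketch (the ranked relevance flags are invented, and all relevant words are assumed to appear in each ranked list):

def average_precision(ranked_relevance):
    """ranked_relevance: 0/1 flags of the ranked retrieval list for one query."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at each correct hit
    return sum(precisions) / max(hits, 1)

def mean_average_precision(queries):
    return sum(average_precision(q) for q in queries) / len(queries)

# Two hypothetical keyword queries (1 = correct word retrieved at that rank):
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1]]))  # about 0.708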

17 citations


Proceedings ArticleDOI
04 May 2014
TL;DR: This paper investigates the relevance of instrumental and subjective assessment methods for ABE systems, and compares and discusses the results of an ACR and a CCR test.
Abstract: During the transition to wideband speech telephony, artificial bandwidth extension (ABE) could help to preserve customer satisfaction by enhancing speech quality in the case of narrowband (NB) calls. However, the assessment of speech quality for ABE systems is still an open question. In the literature, instrumental measures are often used to judge the quality of ABE solutions. When subjective listening tests are considered, they most often use a comparison category rating (CCR) scale and, more rarely, an absolute category rating (ACR) scale. This paper investigates the relevance of instrumental and subjective assessment methods for ABE systems. An ACR test and a CCR test are conducted, and their results are compared and discussed. Discrepancies between these two tests open the discussion on the design of a proper subjective listening test for ABE systems. Some instrumental measures are also evaluated; a poor correlation between these measures and the subjective results is observed.
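
The correlation check reported at the end of the abstract amounts to something like the following Python sketch (all per-condition scores here are invented):

import numpy as np

instrumental = np.array([2.1, 2.8, 3.0, 3.4, 2.5])  # instrumental quality scores
subjective = np.array([3.2, 2.9, 3.8, 3.1, 2.7])    # e.g., ACR mean opinion scores
r = np.corrcoef(instrumental, subjective)[0, 1]     # Pearson correlation
print(f"r = {r:.2f}")  # small |r| would indicate poor agreement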

16 citations


Proceedings ArticleDOI
15 Dec 2014
TL;DR: The main contribution is a learning-based rejection strategy which utilizes writer retrieval and support vector machines for rejecting a decision if no corresponding writer can be found for a query manuscript.
Abstract: Determining the individuality of handwriting in ancient manuscripts is an important aspect of the manuscript analysis process. Automatic identification of writers in historical manuscripts can help historians gain insights into manuscripts with missing metadata such as writer name, period, and origin. In this paper, writer classification and retrieval approaches for multi-page documents in the context of historical manuscripts are presented. The main contribution is a learning-based rejection strategy which utilizes writer retrieval and support vector machines to reject a decision if no corresponding writer can be found for a query manuscript. Experiments using different feature extraction methods demonstrate the abilities of our proposed methods. A dedicated data set based on a publicly available database of historical Arabic manuscripts was used, and the experiments show promising results.
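
A minimal Python sketch of such a learning-based rejection stage, assuming (hypothetically) that the SVM decides from retrieval-derived features such as the top-1 similarity and the top-1/top-2 score ratio:

import numpy as np
from sklearn.svm import SVC

# Hypothetical training data: [top-1 similarity, top-1/top-2 score ratio];
# label 1 = writer present in the database, 0 = should be rejected.
X = np.array([[0.9, 1.8], [0.8, 1.5], [0.4, 1.1], [0.3, 1.0]])
y = np.array([1, 1, 0, 0])
reject_svm = SVC(kernel="rbf").fit(X, y)

def classify_with_rejection(query_features, top_writer):
    if reject_svm.predict([query_features])[0] == 0:
        return None  # no corresponding writer found: reject the decision
    return top_writer

print(classify_with_rejection([0.85, 1.6], "writer_17"))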

14 citations


13 Nov 2014
TL;DR: An improved state-space frequency-domain acoustic echo canceler (AEC) is presented, which makes use of Kalman filtering theory to achieve very good convergence performance, particularly in double talk.
Abstract: We present an improved state-space frequency-domain acoustic echo canceler (AEC), which makes use of Kalman filtering theory to achieve very good convergence performance, particularly in double talk. Our contribution can be considered threefold: First, the proposed approach is designed to suit an automotive wideband overlap-save (OLS) setup, to operate best in this distinctive use case. Second, we provide a temporal smoothing and overestimation approach for two particular noise covariance matrices to improve echo return loss enhancement (ERLE) performance. Third, we integrate an adapted perceptually transparent decorrelation preprocessor, which makes use of human insensitivity to appropriately chosen frequency-selective phase modulation, to improve robustness against far-end impulse response changes.
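
A toy per-bin Python sketch in the spirit of a state-space frequency-domain AEC with Kalman updates; it uses a scalar echo path per bin, invented signals, and fixed noise powers (whereas the paper smooths and overestimates the corresponding covariances):

import numpy as np

rng = np.random.default_rng(0)
F = 8                       # frequency bins (toy size)
W = np.zeros(F, complex)    # echo path estimate per bin (the state)
P = np.ones(F)              # state error covariance per bin
A = 0.999                   # state transition: echo path nearly static
psi_w, psi_s = 1e-4, 1e-2   # assumed process / observation noise powers

H = np.full(F, 0.5 + 0.2j)  # "true" echo path (invented)
for frame in range(200):
    X = rng.standard_normal(F) + 1j * rng.standard_normal(F)  # far-end spectrum
    Y = H * X + 0.1 * (rng.standard_normal(F) + 1j * rng.standard_normal(F))
    # Prediction step:
    W, P = A * W, A**2 * P + psi_w
    # Update step with per-bin Kalman gain:
    E = Y - W * X                                   # echo-canceled error spectrum
    K = P * np.conj(X) / (P * np.abs(X)**2 + psi_s)
    W, P = W + K * E, ((1 - K * X) * P).real
print(np.round(W[:3], 2))  # estimates approach 0.5+0.2j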

13 citations


Proceedings ArticleDOI
13 Nov 2014
TL;DR: A novel scalar decoding approach utilizing the correlation of input signals is proposed in this paper; a distinct improvement over standard Lloyd-Max quantization is achieved with the receiver in error-free and error-prone transmission conditions, both with hard-decision and soft-decision decoding.
Abstract: Lloyd-Max quantization (LMQ) is a widely used scalar non-uniform quantization approach targeting the minimum mean squared error (MMSE). Once designed, the quantizer codebook is fixed over time and does not take advantage of possible correlations in the input signals. Correlation could be exploited in scalar quantization by predictive quantization, however, at the price of a higher bit error sensitivity. In order to improve the Lloyd-Max quantizer performance for correlated processes without encoder-sided prediction, a novel scalar decoding approach utilizing the correlation of input signals is proposed in this paper. Based on previously received samples, the current sample can be predicted a priori. Thereafter, a quantization codebook adapted over time is generated according to the prediction error probability density function. Compared to the standard LMQ, a distinct improvement is achieved with our receiver in error-free and error-prone transmission conditions, both with hard-decision and soft-decision decoding.
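
For reference, the baseline Lloyd-Max codebook can be designed with the Lloyd algorithm on training samples, as in this Python sketch for a 3-bit quantizer of a unit-variance Gaussian source (the decoder-side codebook adaptation proposed in the paper is not shown):

import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)      # training samples of the source
levels = np.linspace(-1.5, 1.5, 8)    # initial 3-bit codebook

for _ in range(50):
    # Nearest-neighbor partition, then centroid (MMSE) level update:
    idx = np.argmin(np.abs(x[:, None] - levels[None, :]), axis=1)
    levels = np.array([x[idx == k].mean() if np.any(idx == k) else levels[k]
                       for k in range(8)])
print(np.round(levels, 2))  # approaches the classic 8-level Lloyd-Max codebook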

6 citations



Proceedings ArticleDOI
04 May 2014
TL;DR: This work, based on the trellis representation for VLCs and the BCJR algorithm, presents a variable-length soft-decision decoder utilizing bit-wise channel reliability information and achieving better error robustness than hard-decision decoding.
Abstract: Variable-length codes (VLCs) are widely used in media transmission. Compared to fixed-length codes (FLCs), VLCs can represent the same message with a lower bit rate, thus having a better compression performance. But inevitably, VLCs are very sensitive to transmission errors. In this work, based on the trellis representation for VLCs and the BCJR algorithm, we present a variable-length soft-decision decoder utilizing bit-wise channel reliability information and achieving better error robustness than hard-decision decoding. For the application of VLCs in audio coding, which exhibits both source correlation and variable block lengths, a strong dependency of performance on both factors is observed. Therefore, we point out tradeoffs of (soft-decision) decoded FLCs and VLCs depending on quantization bit rate, source correlation, and block length. We find that VLCs over AWGN channels are only recommended for very low source correlation in combination with very short block lengths and soft-decision decoding.
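
The error sensitivity of VLCs mentioned above can be seen in a toy Python sketch: a single flipped bit desynchronizes all subsequent variable-length codewords, whereas an FLC would confine the damage to one symbol (the code table and the flipped position are invented):

vlc = {"a": "0", "b": "10", "c": "110", "d": "111"}  # a prefix-free toy VLC
inv = {bits: sym for sym, bits in vlc.items()}

def vlc_decode(bitstring):
    out, buf = [], ""
    for b in bitstring:
        buf += b
        if buf in inv:           # a codeword is complete
            out.append(inv[buf])
            buf = ""
    return out

bits = "".join(vlc[s] for s in "abacad")
corrupt = bits[:2] + ("1" if bits[2] == "0" else "0") + bits[3:]  # flip bit 2
print(vlc_decode(bits))     # ['a', 'b', 'a', 'c', 'a', 'd']
print(vlc_decode(corrupt))  # desynchronized: wrong symbols after the error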

5 citations



Proceedings ArticleDOI
04 May 2014
TL;DR: A compact formulation of turbo automatic speech recognition is introduced, along with a shape-based visual feature extraction algorithm without any learning paradigms; on an audio-visual task, the proposed method clearly outperforms the iterative approach introduced by Shivappa et al.
Abstract: Since most automatic speech recognition (ASR) systems still suffer from adverse acoustic conditions and insufficient acoustic modeling, recognition robustness can be improved by integrating further information sources such as additional acoustic channels, modalities, or models. Considering the question of information fusion, interesting parallels to problems in digital communications can be observed, where the turbo principle revolutionized reliable communication. In this paper, we provide new perspectives on turbo ASR: First, we introduce a compact formulation of turbo automatic speech recognition; second, we present a shape-based visual feature extraction algorithm without any learning paradigms. Third, we show an application to an audio-visual speech recognition task on a large data set, where our proposed method clearly outperforms the iterative approach introduced by Shivappa et al. as well as a conventional coupled-hidden-Markov-model approach by up to 23.8% relative reduction in word error rate.
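
As a highly simplified Python sketch of the underlying turbo idea, two streams can fuse their state information by exchanging likelihood-only ("extrinsic") terms. For a single frame this collapses to a normalized likelihood product; the iterative exchange only pays off once forward-backward recursions spread information across time. All numbers are invented:

import numpy as np

lik_audio = np.array([0.5, 0.3, 0.2])    # audio state likelihoods, one frame
lik_video = np.array([0.25, 0.45, 0.3])  # video state likelihoods, one frame

# Each stream passes its extrinsic information to the other as a prior;
# per frame this amounts to a normalized likelihood product.
posterior = lik_audio * lik_video
posterior /= posterior.sum()
print(np.round(posterior, 3))  # fused state posterior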

Proceedings ArticleDOI
13 Nov 2014
TL;DR: It turns out that decimation and interpolation techniques, reducing the bandwidth mismatch between the NB speech material in training and the WB speech data to be recognized, do not succeed in outperforming the pure NB ASR baseline, but true WB ASR training supported by artificial bandwidth extension (ABE) reveals a performance gain.
Abstract: Automatic speech recognition (ASR) for wideband (WB) telephone speech services must cope with a lack of matching speech databases for acoustic model training. This paper investigates the impact of mixing insufficient WB and additional narrowband (NB) speech training data. It turns out that decimation and interpolation techniques, which reduce the bandwidth mismatch between the NB speech material in training and the WB speech data to be recognized, do not succeed in outperforming the pure NB ASR baseline. However, true WB ASR training supported by artificial bandwidth extension (ABE) reveals a performance gain. A new ABE approach that makes use of robust dynamic features and a Viterbi path decoder exploiting phonetic a priori knowledge proves to be superior. It yields a word error rate reduction of 1.9 % relative to the NB ASR baseline and of 9.3 % relative to a WB ASR experiment trained on only a limited amount of WB speech data.
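
The decimation and interpolation steps mentioned above amount to standard resampling, as in this Python sketch on a synthetic signal (a 16 kHz/8 kHz sampling rate pair is assumed):

import numpy as np
from scipy.signal import resample_poly

fs_wb = 16000
t = np.arange(fs_wb) / fs_wb
wb = np.sin(2 * np.pi * 440 * t)        # 1 s of "wideband" audio (toy)

nb = resample_poly(wb, 1, 2)            # decimation: 16 kHz -> 8 kHz
wb_like = resample_poly(nb, 2, 1)       # interpolation: 8 kHz -> 16 kHz,
                                        # but without any high-band content
print(len(wb), len(nb), len(wb_like))   # 16000 8000 16000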

Journal ArticleDOI
TL;DR: A new experimental design was introduced to relate electrophysiological measures to Bayesian inference, and an urns-and-balls paradigm was used to study neural underpinnings of probabilistic inverse inference.
Abstract: Empirical support for the Bayesian brain hypothesis, although of major theoretical importance for cognitive neuroscience, is surprisingly scarce. The literature still lacks definitive functional neuroimaging evidence that neural activities code and compute Bayesian probabilities. Here, we introduce a new experimental design to relate electrophysiological measures to Bayesian inference. Specifically, an urns-and-balls paradigm was used to study the neural underpinnings of probabilistic inverse inference. Event-related potentials (ERPs) were recorded from human participants who performed the urns-and-balls paradigm, and computational modeling was conducted on trial-by-trial electrophysiological signals. Five computational models were compared with respect to their capacity to predict the electrophysiological measures. One Bayesian model (BAY) was compared with another Bayesian model (BAYS), which takes potential effects of non-linear probability weighting into account. A predictive surprise model (TOPS) of sequential probability revisions was derived from the Bayesian models. A comparison was made with two published models of surprise (DIF [1] and OST [2]). Subsets of the trial-by-trial electrophysiological signals were differentially sensitive to the model predictors: The anteriorly distributed N250 was best fit by the DIF model, the BAYS model provided the best fit to the anteriorly distributed P3a, whereas the posteriorly distributed P3b and Slow Wave were best fit by the TOPS model.
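
The probabilistic inverse inference in an urns-and-balls task reduces to a Bayesian posterior update over the hidden urns after each drawn ball, as in this Python sketch (the two urn compositions are invented):

import numpy as np

p_red = np.array([0.8, 0.2])       # P(red ball | urn): urn 0 mostly red
posterior = np.array([0.5, 0.5])   # uniform prior over the two urns

for ball in ["red", "red", "blue", "red"]:
    likelihood = p_red if ball == "red" else 1 - p_red
    posterior = likelihood * posterior   # Bayes' rule, unnormalized
    posterior /= posterior.sum()
    print(ball, np.round(posterior, 3))  # inverse inference on the urn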

Book ChapterDOI
01 Jan 2014
TL;DR: This chapter presents a wideband hands-free system for automotive telephony applications with a synchronously adapted acoustic echo canceller and postfilter; it is based on a frequency-domain adaptive filter approach and Kalman filter theory and makes use of a generalized Wiener postfilter for residual echo suppression and noise reduction in a consistent way.
Abstract: Wideband mobile telephony, supporting a speech bandwidth from 50 to 7,000 Hz, is being deployed more and more widely. These so-called mobile HD Voice services consequently find their way into automotive applications. In this chapter, we present a wideband hands-free system for automotive telephony applications with a synchronously adapted acoustic echo canceller and postfilter. It is based on a frequency-domain adaptive filter approach and Kalman filter theory and makes use of a generalized Wiener postfilter for residual echo suppression and noise reduction in a consistent way. To provide a high convergence rate in the case of time-variant echo paths, the echo canceller, which has very robust double-talk performance, is supported by a fast-converging shadow filter that allows for good tracking performance. A decimation approach is used to decrease algorithmic delay and computational complexity without loss of quality. Experimental results with car cabin impulse responses show good echo cancellation capabilities with fast convergence times, along with extraordinary full-duplex performance, while still keeping the speech component almost untouched in the converged state.
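
A per-bin Python sketch of a generalized Wiener-type postfilter gain that treats residual echo and background noise as one interference term (all PSD values and the spectral floor are illustrative):

import numpy as np

S_speech = np.array([1.0, 0.5, 0.1, 0.05])     # speech PSD estimate per bin
S_resid = np.array([0.1, 0.2, 0.05, 0.02])     # residual echo PSD estimate
S_noise = np.array([0.05, 0.05, 0.05, 0.05])   # background noise PSD estimate

gain = S_speech / (S_speech + S_resid + S_noise)  # Wiener-type gain
gain = np.maximum(gain, 0.1)                      # spectral floor vs. musical noise
print(np.round(gain, 2))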