
Showing papers by "Tim Fingscheidt" published in 2014


Proceedings ArticleDOI
24 Aug 2014
TL;DR: This paper presents several feature extraction and classification approaches for the identification of writers in historical Arabic manuscripts; the approaches successfully identify the writers of multipage documents.
Abstract: Identification of the writers of handwritten historical documents is an important and challenging task. In this paper, we present several feature extraction and classification approaches for the identification of writers in historical Arabic manuscripts. The approaches are able to successfully identify the writers of multipage documents. The feature extraction methods rely on different principles, such as contour-, textural-, and key-point-based features, and the classification schemes are based on averaging and voting. For all experiments, a dedicated data set based on a publicly available database is used. The experiments show promising results, and the best performance was achieved using a novel feature extraction method based on key point descriptors.
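
As a rough illustration of the classification schemes mentioned above, the following Python sketch combines per-page writer scores by voting and by averaging. All names and scores are invented; the paper's actual features and classifiers are not reproduced here.

from collections import Counter

def identify_writer(page_scores):
    """page_scores: one dict per page of the query manuscript,
    mapping candidate writer ID -> similarity score."""
    # Voting: each page votes for its best-scoring writer.
    votes = Counter(max(s, key=s.get) for s in page_scores)
    by_vote = votes.most_common(1)[0][0]
    # Averaging: average each writer's score over all pages.
    writers = set().union(*page_scores)
    avg = {w: sum(s.get(w, 0.0) for s in page_scores) / len(page_scores)
           for w in writers}
    return by_vote, max(avg, key=avg.get)

# Three pages, two hypothetical candidate writers:
pages = [{"A": 0.9, "B": 0.2}, {"A": 0.6, "B": 0.7}, {"A": 0.8, "B": 0.3}]
print(identify_writer(pages))  # ('A', 'A')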

25 citations


Proceedings ArticleDOI
20 Nov 2014
TL;DR: Two neural networks are employed to support an HMM-based ABE: the first one detects /s, z/ phonemes to assist the estimation process, while the second one corrects the estimated high-band energy.
Abstract: In telephony applications, artificial bandwidth extension (ABE) can be applied to narrowband (NB) calls for speech quality and intelligibility enhancement. However, high-band extension is challenging due to insufficient mutual information between the lower and upper frequency band in speech. Estimation errors, particularly for the fricatives /s, z/, are the consequence, leading to annoying artifacts such as lisping. In this paper, two neural networks are employed to support an HMM-based ABE: The first one detects /s, z/ phonemes to assist the estimation process, while the second one corrects the estimated high-band energy. In an absolute category rating test, the proposed ABE attains a significantly improved speech quality vs. NB speech. This is confirmed by a comparison category rating test, indicating a speech quality gain of 1.0 CMOS points over NB speech.
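
The following Python sketch only illustrates how a fricative detector and an energy-correction network could be wired around an HMM-based high-band estimator; the networks and the estimator are hypothetical stand-ins, not the paper's architectures.

import numpy as np

def extend_highband(frame_feats, hmm_estimate, net1, net2):
    """Sketch of an assisted HMM-based high-band estimate."""
    p_sz = float(net1(frame_feats))                # NN 1: P(frame is /s/ or /z/)
    env, energy = hmm_estimate(frame_feats, p_sz)  # estimation assisted by p_sz
    energy += float(net2(np.append(frame_feats, p_sz)))  # NN 2: energy correction
    return env, energy

# Dummy stand-ins so the sketch runs end to end:
net1 = lambda x: 1.0 / (1.0 + np.exp(-x.mean()))
net2 = lambda x: 0.1 * x.mean()
hmm_estimate = lambda x, p: (np.abs(x), float(x.sum()))
print(extend_highband(np.random.randn(10), hmm_estimate, net1, net2)[1])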

19 citations


Proceedings ArticleDOI
15 Dec 2014
TL;DR: A state-of-the-art word spotter is presented in its original and in a new, extended implementation; the extended version outperforms the original in terms of mean average precision, both on the new Arabic dataset and on the widely used George Washington dataset.
Abstract: In this paper, we present a new and freely available dataset comprising 80 pages of a historical handwritten Arabic document, in conjunction with a detailed ground truth for the development and evaluation of segmentation-free word spotting approaches. Besides information on the underlying manuscript and technical details, we introduce a comprehensive list of tags that each word is labeled with. These tags can be used for research on specific issues, such as dealing with text in different colors. For the comparison of different word spotters, a fixed set of 25 keywords with different properties is included. Furthermore, some specifics of spotting on Arabic manuscripts are discussed. As an example, we present a state-of-the-art word spotting algorithm in its original and in a new, extended implementation, and evaluate both approaches on the new dataset. For comparison, they are also tested on the widely used George Washington dataset. It is shown that the extended word spotter outperforms the original version in terms of mean average precision on both datasets.
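
Mean average precision, the metric used above to compare the two word spotters, can be computed as in the following Python sketch (the ranked relevance flags are invented, and all relevant words are assumed to appear in each ranked list):

def average_precision(ranked_relevance):
    """ranked_relevance: 0/1 flags of the ranked retrieval list for one query."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at each correct hit
    return sum(precisions) / max(hits, 1)

def mean_average_precision(queries):
    return sum(average_precision(q) for q in queries) / len(queries)

# Two hypothetical keyword queries (1 = correct word retrieved at that rank):
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1]]))  # about 0.708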

17 citations


Proceedings ArticleDOI
04 May 2014
TL;DR: This paper investigates the relevance of instrumental and subjective assessment methods for ABE systems, and compares and discusses the results of an ACR and a CCR test.
Abstract: During the transition to wideband speech telephony, artificial bandwidth extension (ABE) could help to preserve customer satisfaction by enhancing speech quality in the case of narrowband (NB) calls. However, the assessment of speech quality for ABE systems is still an open question. In the literature, instrumental measures are often used to judge the quality of ABE solutions. When subjective listening tests are considered, they most often use a comparison category rating (CCR) scale and, more rarely, an absolute category rating (ACR) scale. This paper investigates the relevance of instrumental and subjective assessment methods for ABE systems. An ACR test and a CCR test are conducted, and their results are compared and discussed. Discrepancies between these two tests open the discussion on the design of a proper subjective listening test for ABE systems. Some instrumental measures are also evaluated; a poor correlation between these measures and the subjective results is observed.
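
The correlation check reported at the end of the abstract amounts to something like the following Python sketch (all per-condition scores here are invented):

import numpy as np

instrumental = np.array([2.1, 2.8, 3.0, 3.4, 2.5])  # instrumental quality scores
subjective = np.array([3.2, 2.9, 3.8, 3.1, 2.7])    # e.g., ACR mean opinion scores
r = np.corrcoef(instrumental, subjective)[0, 1]     # Pearson correlation
print(f"r = {r:.2f}")  # small |r| would indicate poor agreement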

16 citations


Proceedings ArticleDOI
15 Dec 2014
TL;DR: The main contribution is a learning-based rejection strategy which utilizes writer retrieval and support vector machines for rejecting a decision if no corresponding writer can be found for a query manuscript.
Abstract: Determining the individuality of handwriting in ancient manuscripts is an important aspect of the manuscript analysis process. Automatic identification of writers in historical manuscripts can help historians gain insights into manuscripts with missing metadata such as writer name, period, and origin. In this paper, writer classification and retrieval approaches for multi-page documents in the context of historical manuscripts are presented. The main contribution is a learning-based rejection strategy which utilizes writer retrieval and support vector machines to reject a decision if no corresponding writer can be found for a query manuscript. Experiments using different feature extraction methods demonstrate the abilities of our proposed methods. A dedicated data set based on a publicly available database of historical Arabic manuscripts was used, and the experiments show promising results.
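
A minimal Python sketch of such a learning-based rejection stage, assuming (hypothetically) that the SVM decides from retrieval-derived features such as the top-1 similarity and the top-1/top-2 score ratio:

import numpy as np
from sklearn.svm import SVC

# Hypothetical training data: [top-1 similarity, top-1/top-2 score ratio];
# label 1 = writer present in the database, 0 = should be rejected.
X = np.array([[0.9, 1.8], [0.8, 1.5], [0.4, 1.1], [0.3, 1.0]])
y = np.array([1, 1, 0, 0])
reject_svm = SVC(kernel="rbf").fit(X, y)

def classify_with_rejection(query_features, top_writer):
    if reject_svm.predict([query_features])[0] == 0:
        return None  # no corresponding writer found: reject the decision
    return top_writer

print(classify_with_rejection([0.85, 1.6], "writer_17"))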

14 citations


13 Nov 2014
TL;DR: An improved state-space frequency-domain acoustic echo canceler (AEC) is presented, which makes use of Kalman filtering theory to achieve very good convergence performance, particularly in double talk.
Abstract: We present an improved state-space frequency-domain acoustic echo canceler (AEC), which makes use of Kalman filtering theory to achieve very good convergence performance, particularly in double talk. Our contribution can be considered threefold: First, the proposed approach is designed to suit an automotive wideband overlap-save (OLS) setup, to operate best in this distinctive use case. Second, we provide a temporal smoothing and overestimation approach for two particular noise covariance matrices to improve echo return loss enhancement (ERLE) performance. Third, we integrate an adapted perceptually transparent decorrelation preprocessor, which makes use of human insensitivity to appropriately chosen frequency-selective phase modulation, to improve robustness against far-end impulse response changes.
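
A toy per-bin Python sketch in the spirit of a state-space frequency-domain AEC with Kalman updates; it uses a scalar echo path per bin, invented signals, and fixed noise powers (whereas the paper smooths and overestimates the corresponding covariances):

import numpy as np

rng = np.random.default_rng(0)
F = 8                       # frequency bins (toy size)
W = np.zeros(F, complex)    # echo path estimate per bin (the state)
P = np.ones(F)              # state error covariance per bin
A = 0.999                   # state transition: echo path nearly static
psi_w, psi_s = 1e-4, 1e-2   # assumed process / observation noise powers

H = np.full(F, 0.5 + 0.2j)  # "true" echo path (invented)
for frame in range(200):
    X = rng.standard_normal(F) + 1j * rng.standard_normal(F)  # far-end spectrum
    Y = H * X + 0.1 * (rng.standard_normal(F) + 1j * rng.standard_normal(F))
    # Prediction step:
    W, P = A * W, A**2 * P + psi_w
    # Update step with per-bin Kalman gain:
    E = Y - W * X                                   # echo-canceled error spectrum
    K = P * np.conj(X) / (P * np.abs(X)**2 + psi_s)
    W, P = W + K * E, ((1 - K * X) * P).real
print(np.round(W[:3], 2))  # estimates approach 0.5+0.2j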

13 citations


Proceedings ArticleDOI
13 Nov 2014
TL;DR: A novel scalar decoding approach utilizing the correlation of input signals is proposed in this paper; a distinct improvement over standard Lloyd-Max quantization is achieved with the receiver in error-free and error-prone transmission conditions, both with hard-decision and soft-decision decoding.
Abstract: Lloyd-Max quantization (LMQ) is a widely used scalar non-uniform quantization approach targeting the minimum mean squared error (MMSE). Once designed, the quantizer codebook is fixed over time and does not take advantage of possible correlations in the input signals. Correlation could be exploited in scalar quantization by predictive quantization, however, at the price of a higher bit error sensitivity. In order to improve the Lloyd-Max quantizer performance for correlated processes without encoder-sided prediction, a novel scalar decoding approach utilizing the correlation of input signals is proposed in this paper. Based on previously received samples, the current sample can be predicted a priori. Thereafter, a quantization codebook adapted over time is generated according to the prediction error probability density function. Compared to the standard LMQ, a distinct improvement is achieved with our receiver in error-free and error-prone transmission conditions, both with hard-decision and soft-decision decoding.
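
For reference, the baseline Lloyd-Max codebook can be designed with the Lloyd algorithm on training samples, as in this Python sketch for a 3-bit quantizer of a unit-variance Gaussian source (the decoder-side codebook adaptation proposed in the paper is not shown):

import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)      # training samples of the source
levels = np.linspace(-1.5, 1.5, 8)    # initial 3-bit codebook

for _ in range(50):
    # Nearest-neighbor partition, then centroid (MMSE) level update:
    idx = np.argmin(np.abs(x[:, None] - levels[None, :]), axis=1)
    levels = np.array([x[idx == k].mean() if np.any(idx == k) else levels[k]
                       for k in range(8)])
print(np.round(levels, 2))  # approaches the classic 8-level Lloyd-Max codebook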

6 citations



Proceedings ArticleDOI
04 May 2014
TL;DR: This work, based on the trellis representation for VLCs and the BCJR algorithm, presents a variable-length soft-decision decoder utilizing bit-wise channel reliability information and achieving better error robustness than hard-decision decoding.
Abstract: Variable-length codes (VLCs) are widely used in media transmission. Compared to fixed-length codes (FLCs), VLCs can represent the same message with a lower bit rate, thus having a better compression performance. But inevitably, VLCs are very sensitive to transmission errors. In this work, based on the trellis representation for VLCs and the BCJR algorithm, we present a variable-length soft-decision decoder utilizing bit-wise channel reliability information and achieving better error robustness than hard-decision decoding. For the application of VLCs in audio coding, which exhibits both source correlation and variable block lengths, a strong dependency of performance on both factors is observed. Therefore, we point out tradeoffs of (soft-decision) decoded FLCs and VLCs depending on quantization bit rate, source correlation, and block length. We find that VLCs over AWGN channels are only recommended for very low source correlation in combination with very short block lengths and soft-decision decoding.
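
The error sensitivity of VLCs mentioned above can be seen in a toy Python sketch: a single flipped bit desynchronizes all subsequent variable-length codewords, whereas an FLC would confine the damage to one symbol (the code table and the flipped position are invented):

vlc = {"a": "0", "b": "10", "c": "110", "d": "111"}  # a prefix-free toy VLC
inv = {bits: sym for sym, bits in vlc.items()}

def vlc_decode(bitstring):
    out, buf = [], ""
    for b in bitstring:
        buf += b
        if buf in inv:           # a codeword is complete
            out.append(inv[buf])
            buf = ""
    return out

bits = "".join(vlc[s] for s in "abacad")
corrupt = bits[:2] + ("1" if bits[2] == "0" else "0") + bits[3:]  # flip bit 2
print(vlc_decode(bits))     # ['a', 'b', 'a', 'c', 'a', 'd']
print(vlc_decode(corrupt))  # desynchronized: wrong symbols after the error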

5 citations



Proceedings ArticleDOI
04 May 2014
TL;DR: A compact formulation of turbo automatic speech recognition is introduced, along with a shape-based visual feature extraction algorithm without any learning paradigms; on an audio-visual task, the proposed method clearly outperforms the iterative approach introduced by Shivappa et al.
Abstract: Since most automatic speech recognition (ASR) systems still suffer from adverse acoustic conditions and insufficient acoustic modeling, recognition robustness can be improved by integrating further information sources such as additional acoustic channels, modalities, or models. Considering the question of information fusion, interesting parallels to problems in digital communications can be observed, where the turbo principle revolutionized reliable communication. In this paper, we provide new perspectives on turbo ASR: First, we introduce a compact formulation of turbo automatic speech recognition; second, we present a shape-based visual feature extraction algorithm without any learning paradigms. Third, we show an application to an audio-visual speech recognition task on a large data set, where our proposed method clearly outperforms the iterative approach introduced by Shivappa et al. as well as a conventional coupled-hidden-Markov-model approach by up to 23.8% relative reduction in word error rate.
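
As a highly simplified Python sketch of the underlying turbo idea, two streams can fuse their state information by exchanging likelihood-only ("extrinsic") terms. For a single frame this collapses to a normalized likelihood product; the iterative exchange only pays off once forward-backward recursions spread information across time. All numbers are invented:

import numpy as np

lik_audio = np.array([0.5, 0.3, 0.2])    # audio state likelihoods, one frame
lik_video = np.array([0.25, 0.45, 0.3])  # video state likelihoods, one frame

# Each stream passes its extrinsic information to the other as a prior;
# per frame this amounts to a normalized likelihood product.
posterior = lik_audio * lik_video
posterior /= posterior.sum()
print(np.round(posterior, 3))  # fused state posterior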

Proceedings ArticleDOI
13 Nov 2014
TL;DR: It turns out that decimation and interpolation techniques, reducing the bandwidth mismatch between the NB speech material in training and the WB speech data to be recognized, do not succeed in outperforming the pure NB ASR baseline, but true WB ASR training supported by artificial bandwidth extension (ABE) reveals a performance gain.
Abstract: Automatic speech recognition (ASR) for wideband (WB) telephone speech services must cope with a lack of matching speech databases for acoustic model training. This paper investigates the impact of mixing insufficient WB and additional narrowband (NB) speech training data. It turns out that decimation and interpolation techniques, which reduce the bandwidth mismatch between the NB speech material in training and the WB speech data to be recognized, do not succeed in outperforming the pure NB ASR baseline. However, true WB ASR training supported by artificial bandwidth extension (ABE) reveals a performance gain. A new ABE approach that makes use of robust dynamic features and a Viterbi path decoder exploiting phonetic a priori knowledge proves to be superior. It yields a word error rate reduction of 1.9 % relative to the NB ASR baseline and of 9.3 % relative to a WB ASR experiment trained on only a limited amount of WB speech data.
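
The decimation and interpolation steps mentioned above amount to standard resampling, as in this Python sketch on a synthetic signal (a 16 kHz/8 kHz sampling rate pair is assumed):

import numpy as np
from scipy.signal import resample_poly

fs_wb = 16000
t = np.arange(fs_wb) / fs_wb
wb = np.sin(2 * np.pi * 440 * t)        # 1 s of "wideband" audio (toy)

nb = resample_poly(wb, 1, 2)            # decimation: 16 kHz -> 8 kHz
wb_like = resample_poly(nb, 2, 1)       # interpolation: 8 kHz -> 16 kHz,
                                        # but without any high-band content
print(len(wb), len(nb), len(wb_like))   # 16000 8000 16000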

Journal ArticleDOI
TL;DR: A new experimental design was introduced to relate electrophysiological measures to Bayesian inference, and an urns-and-balls paradigm was used to study neural underpinnings of probabilistic inverse inference.
Abstract: Empirical support for the Bayesian brain hypothesis, although of major theoretical importance for cognitive neuroscience, is surprisingly scarce. The literature still lacks definitive functional neuroimaging evidence that neural activities code and compute Bayesian probabilities. Here, we introduce a new experimental design to relate electrophysiological measures to Bayesian inference. Specifically, an urns-and-balls paradigm was used to study the neural underpinnings of probabilistic inverse inference. Event-related potentials (ERPs) were recorded from human participants who performed the urns-and-balls paradigm, and computational modeling was conducted on trial-by-trial electrophysiological signals. Five computational models were compared with respect to their capacity to predict the electrophysiological measures. One Bayesian model (BAY) was compared with another Bayesian model (BAYS), which takes potential effects of non-linear probability weighting into account. A predictive surprise model (TOPS) of sequential probability revisions was derived from the Bayesian models. A comparison was made with two published models of surprise (DIF [1] and OST [2]). Subsets of the trial-by-trial electrophysiological signals were differentially sensitive to the model predictors: The anteriorly distributed N250 was best fit by the DIF model, the BAYS model provided the best fit to the anteriorly distributed P3a, whereas the posteriorly distributed P3b and Slow Wave were best fit by the TOPS model.
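
The probabilistic inverse inference in an urns-and-balls task reduces to a Bayesian posterior update over the hidden urns after each drawn ball, as in this Python sketch (the two urn compositions are invented):

import numpy as np

p_red = np.array([0.8, 0.2])       # P(red ball | urn): urn 0 mostly red
posterior = np.array([0.5, 0.5])   # uniform prior over the two urns

for ball in ["red", "red", "blue", "red"]:
    likelihood = p_red if ball == "red" else 1 - p_red
    posterior = likelihood * posterior   # Bayes' rule, unnormalized
    posterior /= posterior.sum()
    print(ball, np.round(posterior, 3))  # inverse inference on the urn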

Book ChapterDOI
01 Jan 2014
TL;DR: This chapter presents a wideband hands-free system for automotive telephony applications with a synchronously adapted acoustic echo canceller and postfilter; it is based on a frequency-domain adaptive filter approach and Kalman filter theory and makes use of a generalized Wiener postfilter for residual echo suppression and noise reduction in a consistent way.
Abstract: Wideband mobile telephony, supporting a speech bandwidth from 50 to 7,000 Hz, is being deployed more and more widely. These so-called mobile HD Voice services consequently find their way into automotive applications. In this chapter, we present a wideband hands-free system for automotive telephony applications with a synchronously adapted acoustic echo canceller and postfilter. It is based on a frequency-domain adaptive filter approach and Kalman filter theory and makes use of a generalized Wiener postfilter for residual echo suppression and noise reduction in a consistent way. To provide a high convergence rate in the case of time-variant echo paths, the echo canceller, which has very robust double-talk performance, is supported by a fast-converging shadow filter that allows for good tracking performance. A decimation approach is used to decrease algorithmic delay and computational complexity without loss of quality. Experimental results with car cabin impulse responses show good echo cancellation capabilities with fast convergence times, along with extraordinary full-duplex performance, while still keeping the speech component almost untouched in the converged state.
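
A per-bin Python sketch of a generalized Wiener-type postfilter gain that treats residual echo and background noise as one interference term (all PSD values and the spectral floor are illustrative):

import numpy as np

S_speech = np.array([1.0, 0.5, 0.1, 0.05])     # speech PSD estimate per bin
S_resid = np.array([0.1, 0.2, 0.05, 0.02])     # residual echo PSD estimate
S_noise = np.array([0.05, 0.05, 0.05, 0.05])   # background noise PSD estimate

gain = S_speech / (S_speech + S_resid + S_noise)  # Wiener-type gain
gain = np.maximum(gain, 0.1)                      # spectral floor vs. musical noise
print(np.round(gain, 2))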