Author

András Zolnay

Bio: András Zolnay is an academic researcher from RWTH Aachen University. The author has contributed to research in the topics of linear discriminant analysis and word error rate, has an h-index of 9, and has co-authored 9 publications receiving 295 citations.

Papers
Proceedings ArticleDOI
18 Mar 2005
TL;DR: Experiments performed on the large-vocabulary task VerbMobil II (German conversational speech) show that the accuracy of automatic speech recognition systems can be improved by the combination of different acoustic features.
Abstract: In this paper, we consider the use of multiple acoustic features of the speech signal for robust speech recognition. We investigate the combination of various auditory based (mel frequency cepstrum coefficients, perceptual linear prediction, etc.) and articulatory based (voicedness) features. Features are combined by linear discriminant analysis and log-linear model combination based techniques. We describe the two feature combination techniques and compare the experimental results. Experiments performed on the large-vocabulary task VerbMobil II (German conversational speech) show that the accuracy of automatic speech recognition systems can be improved by the combination of different acoustic features.
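As a loose illustration of the log-linear model combination technique mentioned above, the following Python sketch combines per-state scores from two hypothetical acoustic models with fixed interpolation weights and renormalizes. The weights, state count, and scores are invented for illustration; in practice such weights would be optimized, not hand-set:

```python
import numpy as np

def log_linear_combine(log_probs_a, log_probs_b, lam_a=0.6, lam_b=0.4):
    """Log-linear combination of two acoustic models' per-state scores:
    log p(s|x) = lam_a * log p_a(s|x) + lam_b * log p_b(s|x) - log Z.
    Weights here are illustrative, not trained values."""
    score = lam_a * log_probs_a + lam_b * log_probs_b
    # Renormalize over states via log-sum-exp for numerical stability
    score -= np.logaddexp.reduce(score, axis=-1, keepdims=True)
    return score

# Two stand-in models (e.g. MFCC-based and PLP-based) scoring 4 HMM states
lp_mfcc = np.log(np.array([0.7, 0.1, 0.1, 0.1]))
lp_plp = np.log(np.array([0.5, 0.3, 0.1, 0.1]))
combined = log_linear_combine(lp_mfcc, lp_plp)
print(np.exp(combined))  # combined posterior over the 4 states
```

The renormalization makes the combined scores a proper distribution again, which is what allows the combined model to be used in place of a single acoustic model during decoding.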

78 citations

Proceedings Article
01 Jan 2002
TL;DR: A voiced-unvoiced measure was combined with the standard Mel Frequency Cepstral Coefficients using linear discriminant analysis (LDA) to choose the most relevant features for continuous speech recognition.
Abstract: In this paper, a voiced-unvoiced measure is used as an acoustic feature for continuous speech recognition. The voiced-unvoiced measure was combined with the standard Mel Frequency Cepstral Coefficients (MFCC) using linear discriminant analysis (LDA) to choose the most relevant features. Experiments were performed on the SieTill (German digit strings recorded over telephone lines) and on the SPINE (English spontaneous speech under different simulated noisy environments) corpora. The additional voiced-unvoiced measure results in improvements in word error rate (WER) of up to 11% relative to using MFCC alone with the same overall number of parameters in the system.
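A crude voicedness feature of this kind can be sketched as the peak of a frame's normalized autocorrelation within the plausible pitch-lag range. This is illustrative only, not the paper's exact voiced-unvoiced measure; the sampling rate and pitch range below are assumptions:

```python
import numpy as np

def voicedness(frame, fs=8000, fmin=50, fmax=400):
    """Peak of the normalized autocorrelation in the pitch-lag range.
    Periodic (voiced) frames score near 1, noise-like (unvoiced) frames low."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return 0.0
    ac = ac / ac[0]                      # normalize by frame energy
    lo, hi = fs // fmax, fs // fmin      # lags for the 50-400 Hz pitch range
    return float(ac[lo:hi].max())

fs = 8000
t = np.arange(int(0.032 * fs)) / fs      # one 32 ms frame
voiced = np.sin(2 * np.pi * 120 * t)     # periodic, pitch-like signal
unvoiced = np.random.default_rng(1).normal(size=t.size)  # noise-like signal
print(voicedness(voiced, fs), voicedness(unvoiced, fs))
```

Appending one such scalar per frame to the MFCC vector, then letting LDA pick discriminative directions, mirrors the combination scheme described in the abstract.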

43 citations

Proceedings ArticleDOI
04 Sep 2005
TL;DR: The proposed method exploits the bandlimited interpolation idea (in the frequency domain) to do the necessary frequency-warping and yields exact results as long as the cepstral coefficients are quefrency-limited.
Abstract: In this paper, we show that frequency-warping (including VTLN) can be implemented through linear transformation of conventional MFCC. Unlike the Pitz-Ney [1] continuous-domain approach, we directly determine the relation between frequency-warping and the linear transformation in the discrete domain. The advantage of such an approach is that it can be applied to any frequency-warping and is not limited to cases where an analytical closed-form solution can be found. The proposed method exploits the bandlimited interpolation idea (in the frequency domain) to do the necessary frequency-warping and yields exact results as long as the cepstral coefficients are quefrency-limited. This idea of quefrency-limitedness shows the importance of the filter-bank smoothing of the spectra, which has been ignored in [1, 2]. Furthermore, unlike [1], since we operate in the discrete domain, we can also apply the usual discrete cosine transform (i.e., DCT-II) on the logarithm of the filter-bank output to get conventional MFCC features. Therefore, using our proposed method, we can linearly transform conventional MFCC cepstra to do VTLN and we do not require any recomputation of the warped features. We provide experimental results in support of this approach.
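The key claim, that frequency-warping acts as a single linear transformation on the cepstra, can be sketched numerically. The toy below uses a simple piecewise-linear warp with linear interpolation rather than the paper's bandlimited interpolation, and plain DCT-II cepstra in place of full MFCC, so it illustrates only the linearity, not the exact method:

```python
import numpy as np

N = 20                                    # cepstral coefficients = spectral bins
n = np.arange(N)
# Orthonormal DCT-II matrix: maps log-spectrum samples -> cepstra
C = np.cos(np.pi * np.outer(n, n + 0.5) / N)
C[0] *= np.sqrt(1.0 / N)
C[1:] *= np.sqrt(2.0 / N)
C_inv = C.T                               # orthonormal, so inverse = transpose

def warp_matrix(alpha):
    """Resampling matrix for the warp omega -> alpha*omega (clipped at the
    band edge), using linear interpolation between spectral bins.
    A hypothetical simple warp; the paper uses bandlimited interpolation."""
    W = np.zeros((N, N))
    for k in range(N):
        x = min(alpha * k, N - 1)         # warped, possibly fractional, bin
        i = int(np.floor(x))
        f = x - i
        W[k, i] += 1.0 - f
        if i + 1 < N:
            W[k, i + 1] += f
    return W

alpha = 0.9
A = C @ warp_matrix(alpha) @ C_inv        # cepstral-domain VTLN matrix

cep = np.random.default_rng(0).normal(size=N)
logspec = C_inv @ cep                                # cepstra -> log-spectrum
warped_direct = C @ (warp_matrix(alpha) @ logspec)   # warp spectrum, re-cepstrum
warped_linear = A @ cep                              # one matrix multiply
print(np.allclose(warped_direct, warped_linear))
```

Because the warp is a fixed matrix in the spectral domain, conjugating it by the DCT gives a fixed matrix A in the cepstral domain, so warped features never need to be recomputed from the spectrum.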

35 citations

Journal ArticleDOI
TL;DR: The results show that the accuracy of automatic speech recognition systems can be significantly improved by the combination of auditory and articulatory motivated features.

33 citations

Proceedings Article
01 Jan 2006
TL;DR: It is shown that the combination of acoustic features using LDA does not consistently lead to improvements in word error rate; relative improvements in word error rate of up to 5% were observed for LDA-based combination of multiple acoustic features.
Abstract: In this paper, Linear Discriminant Analysis (LDA) is investigated with respect to the combination of different acoustic features for automatic speech recognition. It is shown that the combination of acoustic features using LDA does not consistently lead to improvements in word error rate. A detailed analysis of the recognition results on the Verbmobil (VM II) and on the English portion of the European Parliament Plenary Sessions (EPPS) corpus is given. This includes an independent analysis of the effect of the dimension of the input to LDA, the effect of strongly correlated input features, as well as a detailed numerical analysis of the generalized eigenvalue problem underlying LDA. Relative improvements in word error rate of up to 5% were observed for LDA-based combination of multiple acoustic features.
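The generalized eigenvalue problem underlying LDA that the analysis refers to can be sketched on toy data. Dimensions, class counts, and separations below are invented, not the paper's setup; the point is that LDA reduces to solving Sb v = lambda Sw v, with at most C-1 informative directions:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
C_cls, n_per, d = 3, 100, 6
# Toy stand-in for stacked acoustic feature vectors with 3 frame classes
X = rng.normal(size=(C_cls * n_per, d))
y = np.repeat(np.arange(C_cls), n_per)
for c in range(C_cls):
    X[y == c, c] += 3.0                  # shift each class along its own axis

mean = X.mean(axis=0)
Sw = np.zeros((d, d))                    # within-class scatter
Sb = np.zeros((d, d))                    # between-class scatter
for c in range(C_cls):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)
    Sb += len(Xc) * np.outer(mc - mean, mc - mean)

# LDA projection directions solve the generalized eigenproblem Sb v = lambda Sw v
eigvals, eigvecs = eigh(Sb, Sw)          # eigenvalues in ascending order
print(np.sum(eigvals > 1e-6))            # rank of Sb: at most C_cls - 1 = 2
```

The near-zero eigenvalues are exactly the degeneracy the paper's numerical analysis examines: strongly correlated input features push Sw toward singularity and make these eigenpairs ill-conditioned.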

31 citations


Cited by
Journal ArticleDOI
01 Oct 1980

1,565 citations

Journal ArticleDOI
TL;DR: Current advances in automatic speech recognition (ASR) and spoken language systems are outlined, along with deficiencies in dealing with the variation naturally present in speech.

507 citations

01 Jan 2001
TL;DR: The probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the value of the thing expected upon its happening.
Abstract: Problem: Given the number of times in which an unknown event has happened and failed: Required the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named. SECTION 1. Definitions: 1. Several events are inconsistent, when if one of them happens, none of the rest can. 2. Two events are contrary when one, or other of them must; and both together cannot happen. 3. An event is said to fail, when it cannot happen; or, which comes to the same thing, when its contrary has happened. 4. An event is said to be determined when it has either happened or failed. 5. The probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the value of the thing expected upon its happening.
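In modern notation, the quantity the essay asks for is, under a uniform prior, the posterior probability that the unknown chance $x$ of the event lies between two bounds $a$ and $b$ after $p$ occurrences and $q$ failures:

```latex
\Pr(a \le x \le b \mid p, q)
  = \frac{\int_a^b x^p (1-x)^q \, dx}{\int_0^1 x^p (1-x)^q \, dx}
```

This is the cumulative probability of a $\mathrm{Beta}(p+1,\, q+1)$ posterior distribution, the result now known as Bayes' theorem in its original form.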

368 citations

Journal ArticleDOI
TL;DR: Alignment templates, phrase-based models, and stochastic finite-state transducers are used to develop computer-assisted translation systems in a European project in two real tasks.
Abstract: Current machine translation (MT) systems are still not perfect. In practice, the output from these systems needs to be edited to correct errors. A way of increasing the productivity of the whole translation process (MT plus human work) is to incorporate the human correction activities within the translation process itself, thereby shifting the MT paradigm to that of computer-assisted translation. This model entails an iterative process in which the human translator activity is included in the loop: In each iteration, a prefix of the translation is validated (accepted or amended) by the human and the system computes its best (or n-best) translation suffix hypothesis to complete this prefix. A successful framework for MT is the so-called statistical (or pattern recognition) framework. Interestingly, within this framework, the adaptation of MT systems to the interactive scenario affects mainly the search process, allowing a great reuse of successful techniques and models. In this article, alignment templates, phrase-based models, and stochastic finite-state transducers are used to develop computer-assisted translation systems. These systems were assessed in a European project (TransType2) in two real tasks: the translation of printer manuals and the translation of the Bulletin of the European Union. In each task, the following three pairs of languages were involved (in both translation directions): English-Spanish, English-German, and English-French.

238 citations

Journal ArticleDOI
TL;DR: This paper expands T-F unit features to include gammatone frequency cepstral coefficients (GFCC), mel-frequency cepstral coefficients, relative spectral transform (RASTA) and perceptual linear prediction (PLP), and proposes to use a group Lasso approach to select complementary features in a principled way.
Abstract: Monaural speech segregation has been a very challenging problem for decades. By casting speech segregation as a binary classification problem, recent advances have been made in computational auditory scene analysis on segregation of both voiced and unvoiced speech. So far, pitch and amplitude modulation spectrogram have been used as two main kinds of time-frequency (T-F) unit level features in classification. In this paper, we expand T-F unit features to include gammatone frequency cepstral coefficients (GFCC), mel-frequency cepstral coefficients, relative spectral transform (RASTA) and perceptual linear prediction (PLP). Comprehensive comparisons are performed in order to identify effective features for classification-based speech segregation. Our experiments in matched and unmatched test conditions show that these newly included features significantly improve speech segregation performance. Specifically, GFCC and RASTA-PLP are the best single features in matched-noise and unmatched-noise test conditions, respectively. We also find that pitch-based features are crucial for good generalization to unseen environments. To further explore complementarity in terms of discriminative power, we propose to use a group Lasso approach to select complementary features in a principled way. The final combined feature set yields promising results in both matched and unmatched test conditions.
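A minimal sketch of group-Lasso feature-group selection, assuming a plain least-squares objective solved by proximal gradient descent. The paper's actual solver, objective, and features are not specified here; the group sizes, regularization strength, and synthetic data below are invented:

```python
import numpy as np

def group_lasso(X, y, groups, lam=0.1, lr=None, n_iter=500):
    """Proximal gradient descent for
    min_w 0.5/n ||y - Xw||^2 + lam * sum_g ||w_g||_2.
    The group penalty zeroes out whole feature groups at once.
    `groups` maps each column of X to a group id."""
    n, d = X.shape
    if lr is None:
        lr = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)  # 1 / Lipschitz constant
    w = np.zeros(d)
    for _ in range(n_iter):
        w = w - lr * (X.T @ (X @ w - y) / n)        # gradient step
        for g in np.unique(groups):                 # group soft-thresholding
            idx = groups == g
            norm = np.linalg.norm(w[idx])
            w[idx] = 0.0 if norm <= lr * lam else w[idx] * (1 - lr * lam / norm)
    return w

rng = np.random.default_rng(0)
n, d = 200, 9
X = rng.normal(size=(n, d))
groups = np.repeat([0, 1, 2], 3)            # three feature groups of size 3
w_true = np.concatenate([[1.0, -1.0, 0.5], np.zeros(6)])  # only group 0 matters
y = X @ w_true + 0.01 * rng.normal(size=n)

w = group_lasso(X, y, groups, lam=0.05)
norms = [float(np.linalg.norm(w[groups == g])) for g in np.unique(groups)]
print([nrm > 1e-6 for nrm in norms])        # which groups were kept
```

Selecting complementary feature groups (rather than individual coefficients) is the property that makes group Lasso a natural fit for choosing among whole feature families like GFCC, RASTA-PLP, and pitch-based features.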

192 citations