Journal ArticleDOI

Evaluating the intelligibility benefit of speech modifications in known noise conditions

TL;DR: The current study compares the benefits of speech modification algorithms in a large-scale speech intelligibility evaluation and quantifies the equivalent intensity change, defined as the amount in decibels that unmodified speech would need to be adjusted by in order to achieve the same intelligibility as modified speech.
About: This article was published in Speech Communication on 2013-05-01. It has received 115 citations to date. The article focuses on the topics: Intelligibility (communication) & Voice activity detection.
Citations
Journal ArticleDOI
TL;DR: This review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise.

107 citations


Cites background from "Evaluating the intelligibility bene..."

  • ...9), did produce sparse representations which improved speech intelligibility by around 2 dB in a large-scale evaluation [60]....

    [...]

Proceedings ArticleDOI
25 Aug 2013
TL;DR: Surprisingly, for most conditions the largest gains were observed for noise-independent algorithms, suggesting that performance in this task can be further improved by exploiting information in the masking signal.
Abstract: Speech output is used extensively, including in situations where correct message reception is threatened by adverse listening conditions. Recently, there has been a growing interest in algorithmic modifications that aim to increase the intelligibility of both natural and synthetic speech when presented in noise. The Hurricane Challenge is the first large-scale open evaluation of algorithms designed to enhance speech intelligibility. Eighteen systems operating on a common data set were subjected to extensive listening tests and compared to unmodified natural and text-to-speech (TTS) baselines. The best-performing systems achieved gains over unmodified natural speech of 4.4 and 5.1 dB in competing speaker and stationary noise respectively, while TTS systems made gains of 5.6 and 5.1 dB over their baseline. Surprisingly, for most conditions the largest gains were observed for noise-independent algorithms, suggesting that performance in this task can be further improved by exploiting information in the masking signal.

73 citations


Cites methods or result from "Evaluating the intelligibility bene..."

  • ...See [16] for more details....

    [...]

  • ...Note that this is not the same as the OptSII system described in [16]....

    [...]

  • ...Gains are expressed as equivalent intensity changes (EICs) computed by mapping scores to psychometric curves previously obtained for each masker using Plain speech (see [16] for details)....

    [...]

  • ...That study [16] compared 7 speech modification algorithms against read and Lombard speech and an unmodified TTS system....

    [...]
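
The equivalent intensity change (EIC) described in the snippets above can be illustrated with a short sketch. It assumes a logistic psychometric function for unmodified (Plain) speech. The function names and the `midpoint_db` and `slope` parameters are hypothetical and are not taken from the cited papers, whose fitted curves are not reproduced on this page.

```python
import math


def psychometric(snr_db, midpoint_db, slope):
    """Proportion correct for unmodified speech at a given SNR.

    Assumption: a logistic curve with hypothetical parameters
    `midpoint_db` (SNR giving 50% correct) and `slope` (per dB).
    """
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def inverse_psychometric(score, midpoint_db, slope):
    """SNR at which unmodified speech would reach `score` (0 < score < 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must lie strictly between 0 and 1")
    return midpoint_db + math.log(score / (1.0 - score)) / slope


def equivalent_intensity_change(score_modified, snr_db, midpoint_db, slope):
    """dB change unmodified speech would need to match the modified score.

    `score_modified` is the proportion correct measured for modified
    speech presented at `snr_db`.
    """
    return inverse_psychometric(score_modified, midpoint_db, slope) - snr_db
```

For instance, with a hypothetical curve of midpoint -5 dB and slope 0.5 per dB, a modified-speech score equal to `psychometric(-2, -5, 0.5)` measured at -5 dB SNR corresponds to an EIC of about 3 dB. Scores of exactly 0 or 1 have no finite inverse under this form, so the sketch rejects them.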

Journal ArticleDOI
TL;DR: Two-phase features are suggested to represent the phase of the harmonic model in a uniform way, without voicing decision, and the synthesis quality of the resulting vocoder has been evaluated, using subjective listening tests, in the context of resynthesis, pitch scaling, and Hidden Markov Model (HMM)-based synthesis.
Abstract: Feature-based vocoders, e.g., STRAIGHT, offer a way to manipulate the perceived characteristics of the speech signal in speech transformation and synthesis. For the harmonic model, which provides excellent perceived quality, features for the amplitude parameters already exist (e.g., Line Spectral Frequencies (LSF), Mel-Frequency Cepstral Coefficients (MFCC)). However, because of the wrapping of the phase parameters, phase features are more difficult to design. To randomize the phase of the harmonic model during synthesis, a voicing feature is commonly used, which distinguishes voiced and unvoiced segments. However, voice production allows smooth transitions between voiced/unvoiced states, which makes voicing segmentation sometimes tricky to estimate. In this article, two-phase features are suggested to represent the phase of the harmonic model in a uniform way, without voicing decision. The synthesis quality of the resulting vocoder has been evaluated, using subjective listening tests, in the context of resynthesis, pitch scaling, and Hidden Markov Model (HMM)-based synthesis. The experiments show that the suggested signal model is comparable to STRAIGHT or even better in some scenarios. They also reveal some limitations of the harmonic framework itself in the case of high fundamental frequencies.

59 citations


Cites background from "Evaluating the intelligibility bene..."

  • ...8K utterances, respectively [77,78], all with fs = 16 kHz)....

    [...]

Journal ArticleDOI
TL;DR: The latter part of this work focuses mainly on a novel frequency warping technique that is shown to achieve vowel space expansion, incorporated into an established Lombard-inspired Spectral Shaping method that pairs with dynamic range compression to maximize speech audibility (SSDRC).

47 citations


Cites background or methods from "Evaluating the intelligibility bene..."

  • ...Moreover, this SS fixed filter has proven effective in an extensive evaluation of speech intelligibility enhancement modifications (Cooke et al., 2013)....

    [...]

  • ...Finally, an extensive evaluation of the intelligibility of a variety of methods was recently carried out and described in Cooke et al. (2013)....

    [...]

  • ...In addition to the fixed filter, the SS described in Zorila et al. (2012) and evaluated in Cooke et al. (2013) also incorporates adaptive components, including peak-sharpening (Hs(f)) and pre-emphasis (Hp(f)) filters....

    [...]

Journal ArticleDOI
TL;DR: This work describes methods for intelligibility enhancement from a unified vantage point, including speech intelligibility index (SII)-based systems and systems aimed at enhancing the sound-field where it is perceived by the listener.
Abstract: Modern communication technology facilitates communication from anywhere to anywhere. As a result, low speech intelligibility has become a common problem, which is exacerbated by the lack of feedback to the talker about the rendering environment. In recent years, a range of algorithms has been developed to enhance the intelligibility of speech rendered in a noisy environment. We describe methods for intelligibility enhancement from a unified vantage point. Before one defines a measure of intelligibility, the level of abstraction of the representation must be selected. For example, intelligibility can be measured on the message, the sequence of words spoken, the sequence of sounds, or a sequence of states of the auditory system. Natural measures of intelligibility defined at the message level are mutual information and the hit-or-miss criterion. The direct evaluation of high-level measures requires quantitative knowledge of human cognitive processing. Lower-level measures can be derived from higher-level measures by making restrictive assumptions. We discuss the implementation and performance of some specific enhancement systems in detail, including speech intelligibility index (SII)-based systems and systems aimed at enhancing the sound-field where it is perceived by the listener. We conclude with a discussion of the current state of the field and open problems.

47 citations

References
Book
01 Jan 1975
TL;DR: Name of founding work in the area of adaptation and modification, which aims to mimic biological optimization, and some (non-GA) branches of AI.
Abstract: Name of founding work in the area. Adaptation is key to survival and evolution. Evolution implicitly optimizes organisms. AI wants to mimic biological optimization: survival of the fittest; exploration and exploitation; niche finding; robust across changing environments (mammals v. dinos); self-regulation, -repair and -reproduction. Artificial Intelligence, some definitions: "Making computers do what they do in the movies"; "Making computers do what humans (currently) do best"; "Giving computers common sense; letting them make simple decisions" (do as I want, not what I say); "Anything too new to be pigeonholed". Adaptation and modification is root of intelligence. Some (non-GA) branches of AI: Expert Systems (rule-based deduction)

32,573 citations


"Evaluating the intelligibility bene..." refers background or methods in this paper

  • ...The technique learns frequency band weights which maximise objective intelligibility using a genetic algorithm optimisation technique (Holland, 1975), with glimpse proportion (Cooke, 2006) as an objective intelligibility metric....

    [...]

  • ...Lombard sentences came from the same subset of the Harvard corpus as the plain material and were spoken by the same talker (see Section 3.1)....

    [...]

01 Jan 2002

8,837 citations


"Evaluating the intelligibility bene..." refers background or methods in this paper

  • ...Lombard sentences came from the same subset of the Harvard corpus as the plain material and were spoken by the same talker (see Section 3.1)....

    [...]

  • ...After recording, all speech was downsampled from 96 to 16 kHz using Praat (Boersma, 2001), manually endpointed to remove leading and trailing silence and high-pass filtered with a cut-off frequency of 100 Hz to remove low-frequency artefacts....

    [...]
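
The downsampling and high-pass filtering steps quoted above can be sketched as follows. The source states that Praat was used; this sketch swaps in SciPy instead, and the function name `preprocess`, the fourth-order Butterworth design, and the zero-phase filtering are assumptions, since the snippet gives only the sample rates and the 100 Hz cut-off. Manual endpointing is not reproduced.

```python
import numpy as np
from scipy.signal import butter, resample_poly, sosfiltfilt


def preprocess(recording, fs_in=96_000, fs_out=16_000, cutoff_hz=100.0, order=4):
    """Downsample a mono recording and high-pass filter it.

    Assumptions (not from the source): polyphase resampling, a
    Butterworth high-pass of the given order, zero-phase filtering.
    """
    if fs_in % fs_out != 0:
        raise ValueError("fs_in must be an integer multiple of fs_out")
    # 96 kHz -> 16 kHz is an integer decimation by 6; resample_poly
    # applies an anti-aliasing low-pass filter before decimating.
    downsampled = resample_poly(recording, up=1, down=fs_in // fs_out)
    # Second-order sections keep the low cut-off numerically stable.
    sos = butter(order, cutoff_hz, btype="highpass", fs=fs_out, output="sos")
    return sosfiltfilt(sos, downsampled)
```

A one-second 96 kHz input yields 16,000 output samples; energy well below 100 Hz is strongly attenuated while the speech band is left largely intact.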

Proceedings ArticleDOI
07 May 2001
TL;DR: A new model has been developed for use across a wider range of network conditions, including analogue connections, codecs, packet loss and variable delay, known as perceptual evaluation of speech quality (PESQ).
Abstract: Previous objective speech quality assessment models, such as bark spectral distortion (BSD), the perceptual speech quality measure (PSQM), and measuring normalizing blocks (MNB), have been found to be suitable for assessing only a limited range of distortions. A new model has therefore been developed for use across a wider range of network conditions, including analogue connections, codecs, packet loss and variable delay. Known as perceptual evaluation of speech quality (PESQ), it is the result of integration of the perceptual analysis measurement system (PAMS) and PSQM99, an enhanced version of PSQM. PESQ is expected to become a new ITU-T recommendation P.862, replacing P.861 which specified PSQM and MNB.

2,169 citations

Journal ArticleDOI
TL;DR: A short-time objective intelligibility measure (STOI) is presented, which shows high correlation with the intelligibility of noisy and time-frequency weighted noisy speech (e.g., resulting from noise reduction) in three different listening experiments and shows better correlation with speech intelligibility than five other reference objective intelligibility models.
Abstract: In the development process of noise-reduction algorithms, an objective machine-driven intelligibility measure which shows high correlation with speech intelligibility is of great interest. Besides reducing time and costs compared to real listening experiments, an objective intelligibility measure could also help provide answers on how to improve the intelligibility of noisy unprocessed speech. In this paper, a short-time objective intelligibility measure (STOI) is presented, which shows high correlation with the intelligibility of noisy and time-frequency weighted noisy speech (e.g., resulting from noise reduction) of three different listening experiments. In general, STOI showed better correlation with speech intelligibility compared to five other reference objective intelligibility models. In contrast to other conventional intelligibility models which tend to rely on global statistics across entire sentences, STOI is based on shorter time segments (386 ms). Experiments indeed show that it is beneficial to take segment lengths of this order into account. In addition, a free Matlab implementation is provided.

1,847 citations


"Evaluating the intelligibility bene..." refers methods in this paper

  • ...Optimisation using other objective intelligibility or quality models (e.g. Christiansen et al., 2010; Taal et al., 2011; Rix et al., 2001) is likely to result in different modifications....

    [...]

Journal ArticleDOI
TL;DR: A set of simple new procedures has been developed to enable the real-time manipulation of speech parameters by using pitch-adaptive spectral analysis combined with a surface reconstruction method in the time–frequency region.

1,741 citations


"Evaluating the intelligibility bene..." refers methods in this paper

  • ...The following parameters were used to train, adapt and generate speech: 59 Mel cepstral coefficients, Mel scale F0, and 25 aperiodicity energy bands extracted using STRAIGHT (Kawahara et al., 1999)....

    [...]