Showing papers in "Speech Communication in 1989"
••
TL;DR: After an overview of the literature, this paper describes a robust new algorithm for accurate endpointing of speech signals that uses simple measures based on energy and zero-crossing rate for speech/silence detection.
113 citations
••
TL;DR: A four-parameter model of the glottis with similar kinematic parameters is described to complement this approach; it provides an alternative to flow pulse modeling because it can include some source-system interactions with relatively little computational overhead.
70 citations
••
TL;DR: Anticipatory effects appear to be more tightly controlled than carryover effects presumably because of phonemic preplanning, and gestural antagonism in the contextual phonemes affects the two coarticulatory types differently.
64 citations
••
TL;DR: In articulatory phonetics, speech is described as a sequence of distinct articulatory gestures, each of which produces an acoustic event that should approximate a phonetic target, but due to the overlap of the gestures these phonetic targets are often only partly realized.
51 citations
••
TL;DR: This large-scale (3–3.5 Bark) spectral integration theory, derived from the work of Chistovich and colleagues and supposed to provide a basis for the computation of the F2 parameter, is not in fact supported by an actual proof, since all presumed evidence can be understood without this theory.
50 citations
••
TL;DR: The magnitudes of the male-female differences are similar to those observed for the creaky-normal voicing differences and breathy-normal differences, and may arise from a combination of biological, sociological and acoustical effects.
47 citations
••
TL;DR: The model presented here shows that syntax-driven and rhythm-driven strategies could be extreme cases of a more complex model which integrates both syntactic and rhythmic constraints.
42 citations
••
TL;DR: It is concluded that the UP strongly mediates the recognition of spoken words with early UP, and the shadowing of late-UP items is best predicted by word length in slower, and by word frequency in faster subjects; this suggests the intervention of different mechanisms.
34 citations
••
TL;DR: No intrinsic superiority in the discrimination performance of connected speech as opposed to sustained vowels could be found; in the case of running speech, absolute microperturbation values appeared to be higher during inter-segment transitions and during voice onset and offset.
29 citations
••
TL;DR: A tentative conclusion from these experiments is that it is easier for the perceptual system to compensate for the effects of a transmission channel if it only changes the relative amplitudes of formants than if it changes estimated formant frequencies.
24 citations
••
TL;DR: A two-channel approach to speech analysis is recommended to aid the automatic processing of speech, where one channel is the conventional acoustic signal, while the other channel is the electroglottogram (EGG).
••
TL;DR: This paper compares the results obtained with nine different sets of speech parameters, including log-area parameters, formants, reflection coefficients and band-filter parameters, and concludes that log-area parameters form the most suitable parameter set available for temporal decomposition.
••
TL;DR: The algorithm is based on the iterative use of a linear filter with zero phase and monotonically decreasing frequency response, providing an estimate for the locations of the closure and opening of the vocal cords.
••
TL;DR: The use of the quadratic classifier together with the individual feature space is shown to drastically improve recognition accuracy while the added memory requirements are shown to be negligible.
••
TL;DR: In a paired comparison task, two factors appeared to affect the tempo judgements to a certain extent: the response category to be used by the listeners and the position of the stimulus with standard tempo.
••
TL;DR: The LSP representation is studied for speech recognition, and the weighted LSP distance measure is found to perform significantly better than popular LP distance measures.
••
TL;DR: A clustering algorithm based on the standard KMEANS procedure that generates reference models for continuous density Hidden Markov Model (HMM) based systems by simultaneously considering spectral and duration information is introduced.
••
TL;DR: The results indicated that the effects of the parameters are additive and that, although presence/absence of periodicity (VOT and VTT) is the most important determinant of perceived voicing, perception is also to a large extent affected by “C2”-duration and “preceding vowel” duration.
••
TL;DR: A CELP speech coding algorithm in which the coder parameters are jointly optimized is described, and the relation between pitch period, pitch predictor coefficient, codebook entry and scaling factor is derived.
••
TL;DR: Investigations on a population of 22 speakers showed that the elimination of the time-invariant spectral components from the speech features, taking place when performing cepstral normalization or computing first-order orthogonal coefficients, brings a substantial reliability improvement.
••
TL;DR: A Partial Connection Multilayered Network (PCMN), based on a technique of partial connection between layers, is presented; it permits the efficient treatment of temporal information, which is very important in speech processing, unlike image processing.
••
TL;DR: The Dempster-Shafer formalism is applied in order to combine information in the lexicon, using a frequency distribution as the basis for evidence evaluation; it has suitable properties in the case of an oral dialogue system, as it preserves module autonomy and allows backtracking at any time during the recognition process.
••
TL;DR: This work used both a more conventional articulation test and a monosyllabic adaptive speech interference test to evaluate the intelligibility of nine different speech-coding techniques, and found different patterns of responses.
••
TL;DR: It is proposed that the study of lexical stress in continuous speech be accompanied by the study of prosodics and their general use in sentences, to avoid the problem of syllable segmentation.
••
TL;DR: An experimental Dutch keyboard-to-speech system, using diphones and a formant synthesizer chip for speech synthesis, has been developed to explore the possibilities and limitations of Dutch speech synthesis in a communication aid for the speech impaired.
••
TL;DR: Discrete power spectrum features, i.e. the sign and rank-order functions of a bandpass filter output, are analyzed together with more standard features such as LPC coefficients and the short-time spectrum measured by means of a bandpass filter bank.
••
TL;DR: It is concluded that phonetic and psycholinguistic feature representations need not match.
••
TL;DR: The gain portion of a shape-gain quantizer is made adaptive, yielding a vector quantizer that can adjust itself to the time-varying amplitude of a speech signal.