Topic
Voice
About: Voice is a research topic. Over its lifetime, 2,393 publications have appeared on this topic, receiving 56,637 citations.
Papers published on a yearly basis
Papers
21 Dec 2000
TL;DR: In this paper, a voicing determination algorithm for classifying a speech signal segment as voiced or unvoiced is presented. The algorithm is based on a normalized autocorrelation in which the length of the window is proportional to the pitch period.
Abstract: This invention presents a voicing determination algorithm for classification of a speech signal segment as voiced or unvoiced. The algorithm is based on a normalized autocorrelation where the length of the window is proportional to the pitch period. The speech segment to be classified is further divided into a number of sub-segments, and the normalized autocorrelation is calculated for each sub-segment. If a certain number of the normalized autocorrelation values is above a predetermined threshold, the speech segment is classified as voiced. To improve the performance of the voicing determination algorithm on unvoiced-to-voiced transients, the normalized autocorrelations of the last sub-segments are emphasized. The performance of the voicing decision algorithm can be further enhanced by also utilizing any available lookahead information.
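The voiced/unvoiced decision described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented algorithm: the function names, the sub-segment count, the threshold, and the voting rule are assumptions, and the described emphasis on the last sub-segments and the use of lookahead are omitted for brevity.

```python
import numpy as np

def normalized_autocorrelation(segment, lag):
    """Normalized autocorrelation of a segment at a given lag (the pitch period)."""
    x = segment[: len(segment) - lag]
    y = segment[lag:]
    denom = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    return float(np.sum(x * y) / denom) if denom > 0 else 0.0

def is_voiced(segment, pitch_period, n_sub=4, threshold=0.5, min_votes=3):
    """Classify a segment as voiced when enough sub-segments have a
    normalized autocorrelation above the threshold (simple voting rule)."""
    sub_len = len(segment) // n_sub
    votes = 0
    for i in range(n_sub):
        # Extend each sub-segment by one pitch period so the lag fits inside it.
        sub = segment[i * sub_len : (i + 1) * sub_len + pitch_period]
        if len(sub) > pitch_period:
            if normalized_autocorrelation(sub, pitch_period) > threshold:
                votes += 1
    return votes >= min_votes
```

A periodic signal with period equal to the assumed pitch period yields autocorrelation values near 1 and is classified voiced, while white noise yields values near 0 and is classified unvoiced.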
14 citations
TL;DR: The sensitivity of the P3a to the stress manipulation suggests that prosodic rather than temporal salience captures attention in unattended speech sounds.
Abstract: This study addressed whether temporally salient (e.g., word onset) or prosodically salient (e.g., stressed syllables) information serves as a cue to capture attention in speech sound analysis. In an auditory oddball paradigm, 16 native English speakers were asked to ignore binaurally presented disyllabic speech sounds and watch a silent movie while ERPs were recorded. Four types of phonetic deviants were employed: a deviant syllable that was either stressed or unstressed and that occurred in either the first or second temporal position. The nature of the phonetic change (a change from a voiced consonant to its corresponding unvoiced consonant) was kept constant. MMNs were observed for all deviants. In contrast, the P3a was only seen when the deviance occurred on stressed syllables. The sensitivity of the P3a to the stress manipulation suggests that prosodic rather than temporal salience captures attention in unattended speech sounds.
14 citations
01 Apr 1987
TL;DR: A two-channel, speech and electroglottograph (EGG) approach to speech analysis is suggested to aid the automatic processing of speech.
Abstract: Attempts to measure the synthetic quality of speech usually consider two factors, intelligibility and naturalness, each involving subjective and objective characteristics. To generate high-quality synthetic speech, spectral distortion should be avoided, and spectral continuity and formant tracking should be handled well. Glottal-related factors are discussed, including proper modeling of (1) glottal excitation waveforms and (2) the effects of source-tract interaction in synthesizers. Accurate detection of voiced/unvoiced/silent segments in the speech waveform and of the fundamental frequency of voicing are also major concerns. We present both formal and informal listener evaluations of three synthesizers: LPC, formant, and articulatory. Finally, we suggest a two-channel, speech and electroglottograph (EGG) approach to speech analysis to aid the automatic processing of speech.
14 citations
22 May 2011
TL;DR: Limited acoustic-phonetic information derived primarily by processing the excitation source information in the speech signal is used to improve the performance of detection of manner of articulation from a baseline phone recognition system.
Abstract: Reliable acoustic-phonetic (AP) information derived from the speech signal can be used to detect and correct errors in the output of a phone recognizer. In this paper, limited acoustic-phonetic information derived primarily by processing the excitation source information in the speech signal is used to improve the performance of detection of manner of articulation from a baseline phone recognition system. A context-independent HMM-based monophone system without any language information is used as the baseline system for this purpose. The performance of the phone recognizer in terms of its ability to detect the manners of articulation is studied. The errors in the hypothesis of the manner of articulation of phones are corrected using AP information such as voicing, voice bar and frication. It is shown that significant improvement can be achieved by using simple or limited AP information.
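The error-correction idea in the abstract, overriding a phone recognizer's manner-of-articulation hypothesis when it contradicts acoustic-phonetic evidence such as voicing and frication, can be illustrated with a toy rule-based sketch. The manner labels, cue names, and correction rules below are illustrative assumptions, not the rules used in the paper.

```python
def correct_manner(hypotheses, ap_cues):
    """Per-frame correction of manner hypotheses using acoustic-phonetic cues.

    hypotheses: list of manner labels from the baseline recognizer.
    ap_cues: list of dicts with boolean 'voicing' and 'frication' evidence.
    """
    corrected = []
    for manner, cues in zip(hypotheses, ap_cues):
        if manner == "vowel" and not cues["voicing"]:
            # A vowel hypothesis without voicing evidence is implausible:
            # reassign based on the frication cue.
            manner = "fricative" if cues["frication"] else "silence"
        elif manner == "fricative" and cues["voicing"] and not cues["frication"]:
            # Voiced and non-fricated frames are better explained as sonorants.
            manner = "sonorant"
        corrected.append(manner)
    return corrected
```

For example, a frame hypothesized as a fricative but showing voicing without frication would be relabeled as a sonorant, while the recognizer's hypothesis is kept whenever the cues are consistent with it.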
13 citations
TL;DR: This article exploited the McGurk effect to examine whether visual information for place of articulation also shifts the best-exemplar range for voiceless stop consonants, following Green and Kuhl's (1989) demonstration of effects of visual place of articulation on the location of voicing boundaries.
Abstract: Previous work has demonstrated that the graded internal structure of phonetic categories is sensitive to a variety of contextual factors. One such factor is place of articulation: The best exemplars of voiceless stop consonants along auditory bilabial and velar voice onset time (VOT) continua occur over different ranges of VOTs (Volaitis & Miller, 1992). In the present study, we exploited the McGurk effect to examine whether visual information for place of articulation also shifts the best-exemplar range for voiceless consonants, following Green and Kuhl's (1989) demonstration of effects of visual place of articulation on the location of voicing boundaries. In Experiment 1, we established that /p/ and /t/ have different best-exemplar ranges along auditory bilabial and alveolar VOT continua. We then found, in Experiment 2, a similar shift in the best-exemplar range for /t/ relative to that for /p/ when there was a change in visual place of articulation, with auditory place of articulation held constant. These findings indicate that the perceptual mechanisms that determine internal phonetic category structure are sensitive to visual, as well as to auditory, information.
13 citations