
Showing papers on "Voice" published in 1996


Journal ArticleDOI
TL;DR: In this article, a case study of a child acquiring English and Spanish in England between the ages of 1;7 and 2;3 is presented. VOT measurements of utterance-initial stops were made from productions at ages 1;7, 1;11, and 2;3.

87 citations


Journal ArticleDOI
TL;DR: This article investigated the role of universals in Korean native speakers' acquisition of English obstruent voicing contrasts, finding that interlanguage systems behave according to the markedness universals of natural languages.
Abstract: This study of five native speakers of Korean investigates the role of universals in their acquisition of English obstruent voicing contrasts. The data were gathered from a passage and a word list containing voiced and voiceless obstruents in initial, medial, and final word positions. Results reflected principles of markedness universals of L1 acquisition and adult natural languages, suggesting that interlanguage systems behave according to universals of natural languages.

78 citations


Journal ArticleDOI
TL;DR: The results of the experiment show that the relative importance of burst and transitions for the perception of place of articulation to a great extent depends on place and voicing of the stop consonant and on the vowel context.
Abstract: The purpose of the study presented in this paper and the accompanying paper [Smits et al., J. Acoust. Soc. Am. 100, 3865–3881 (1996)] is to evaluate whether detailed or gross time‐frequency structures are more relevant for the perception of place of articulation of prevocalic stop consonants. To this end, first a perception experiment is carried out with ‘‘burst‐spliced’’ stop‐vowel utterances, containing the Dutch stops /b,d,p,t/ and /k/. From the utterances burst‐only, burstless, and cross‐spliced stimuli were created and presented to listeners. The results of the experiment show that the relative importance of burst and transitions for the perception of place of articulation to a great extent depends on place and voicing of the stop consonant and on the vowel context. Velar bursts are generally more effective in cueing place of articulation than other bursts. There is no significant difference in the effectiveness of /p/, /t/, and /k/ transitions, while /b/ transitions are more effective than /d/ transitions. The release burst dominates the perception of place of articulation in front‐vowel contexts, while the formant transitions are generally dominant in nonfront vowel contexts. The bursts of unvoiced stops are perceptually more important than the bursts of voiced stops.

57 citations


01 Jan 1996
TL;DR: In this article, it is argued that a segment phonetically realized as a voiced bilabial stop ([b]) may arise from several distinct phonological representations; for example, it may be a laryngeally voiced sound, involving the activation of the feature [voice], a dependent of the Laryngeal node in the segmental tree.
Abstract: In this thesis the representation of voicing contrasts is explored. The central claim of the thesis is that voicing contrasts that are phonetically similar can arise from several distinct phonological representations. For example, it is argued that a segment phonetically realized as a voiced bilabial stop ([b]) may arise from several distinct phonological representations. This segment may be a laryngeally voiced sound, involving the activation of the feature [voice], a dependent of the Laryngeal node in the segmental tree. On the other hand, it may also be a sonorant sound, resulting from the activation of the SV (Sonorant Voicing) node in the segmental tree, a node that is argued to be distinct from the Laryngeal node. As well, there is a third possible representational source for this segment: it may be the result of what I term 'contextual voicing'. A contextually voiced segment is one that has neither a Laryngeal node nor an SV node. Such segments generally surface as voiced when surrounded by other voiced sounds. In the first two chapters, I outline the theory of SV sounds and a theory of enhancement that allows for minimal specification at the underlying level. In chapter three, I present a typology of laryngeal systems, arguing that the number of different laryngeal systems found in the languages of the world is constrained by the organization of features under the Laryngeal node. In chapters 4 and 5, I present empirical support for the model of segment structure presented in the first three chapters. A variety of languages that show within-sonorant assimilations are analyzed. As well, some problematic laryngeal systems (in particular, Dutch and Turkish) are explored, and it is shown that the theory allows for fresh insights into systems in which the stops and the fricatives behave differently with respect to processes involving voicing.
In the concluding chapter an analysis of a variety of consonantal alternations in Northern Turkic languages is presented. Here we see strong support for all aspects of the theory presented in the preceding chapters.

53 citations


Journal ArticleDOI
TL;DR: The authors examined the acquisition of the glottal and supra-glottal timing patterns of English initial stops by French speakers and found that French speakers have greater difficulty reducing the duration of pulsing for English /bdg/ than producing long-lag voice onset time (VOT) and concurrent reduction in closure duration for /ptk/.
Abstract: This study examines the acquisition of the glottal and supra-glottal timing patterns of English initial stops by French speakers. French speakers have greater difficulty reducing the duration of glottal pulsing for English /bdg/ than producing long-lag voice onset time (VOT) and the concurrent reduction in closure duration for /ptk/. This is ascribed to the greater articulatory complexity of English /bdg/ for French speakers, the lower degree of perceptual salience of differences in glottal pulsing, compared to differences in VOT, the existence in English of a competing pattern of more fully voiced /bdg/, and the existence in French of aspirated contextual variants of the lingual stops before high vowels. The latter also explains why long-lag VOT is more difficult to acquire for the labial than for the lingual stops. Acquisition of English-like VOT appears to proceed from (1) positive transfer from French of long voicing lag for lingual stops in high vowel contexts, to (2) distributional extension to nonhigh vowel contexts and to the labial stop, and to (3) gradual stretching of the lag to better approximate the English norm. The acquisition of English-like voicing appears to proceed in a similar fashion.

38 citations


Journal ArticleDOI
TL;DR: The somewhat enlarged voicing contrast during SC was consistent with previous findings regarding the influence of rate changes on the temporal fine structure of speech and was similar to the voicing contrast results reported for clear speech by Picheny, Durlach, and Braida (1986).
Abstract: This study investigated speaking rate and voice onset time (VOT) in speech produced during simultaneous communication (SC) by speakers with normal hearing. Stimulus words initiated with voiced and voiceless plosives were embedded in a sentence that was spoken and produced with SC. VOT measures were calculated from acoustic recordings and results indicated significant differences between speech-only and SC conditions, with speech produced during SC demonstrating both slower speaking rate and increased VOT of voiceless consonants. VOTs produced during both SC and speech-only conditions followed English voicing rules and varied appropriately with place of articulation. The somewhat enlarged voicing contrast during SC was consistent with previous findings regarding the influence of rate changes on the temporal fine structure of speech (Miller, 1987) and was similar to the voicing contrast results reported for clear speech by Picheny, Durlach, and Braida (1986).

30 citations
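VOT figures like the ones reported in this study are conventionally measured as the interval between the stop release burst and the onset of voicing. A minimal sketch of that arithmetic, assuming hand-labeled burst and voicing landmarks in samples (the function names and the 30-ms short-lag boundary are illustrative conventions, not taken from the paper):

```python
def vot_ms(burst_sample, voicing_onset_sample, sample_rate_hz):
    """Voice onset time: interval from stop release burst to voicing onset.

    Positive values mean voicing lags the release (short- or long-lag stops);
    negative values mean prevoicing (voicing begins before the release).
    """
    return 1000.0 * (voicing_onset_sample - burst_sample) / sample_rate_hz

def lag_category(vot, short_lag_max_ms=30.0):
    """Crude three-way split often used in VOT studies of English-type stops."""
    if vot < 0:
        return "prevoiced"
    return "short-lag" if vot <= short_lag_max_ms else "long-lag"

# Example: burst at sample 1000, voicing begins at sample 1480, 16 kHz audio.
vot = vot_ms(1000, 1480, 16000)
print(vot, lag_category(vot))  # 30.0 short-lag
```

The category boundary varies by language and study; it is a parameter here precisely because no single cutoff is standard.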


Journal ArticleDOI
TL;DR: The perceptual salience of relative spectral change and formant transitions as cues to labial and alveolar/dental place of articulation was assessed in a conflicting cue paradigm; the results provide no support for the view that relative spectral change is a significant perceptual cue to stop consonant place of articulation.
Abstract: The perceptual salience of relative spectral change [Lahiri et al., J. Acoust. Soc. Am. 76, 391–404 (1984)] and formant transitions as cues to labial and alveolar/dental place of articulation was assessed in a conflicting cue paradigm. The prototype stimuli were produced by two English speakers. The stimuli with conflicting cues to place of articulation were created by altering the spectra of the signals so that the change in spectral energy from signal onset to voicing onset specified one place of articulation while the formant transitions specified the other place of articulation. Listeners' identification of these stimuli was determined principally by the information from formant transitions. This outcome provides no support for the view that the relative spectral change is a significant perceptual cue to stop consonant place of articulation.

29 citations


Journal ArticleDOI
01 Apr 1996

28 citations


Journal ArticleDOI
TL;DR: In order to analyze the articulatory ability of skilled esophageal speakers in terms of the voicing distinction in consonants, perceptual, acoustical and physiological studies were conducted and it was found that voiced-voiceless confusion was only 10.5% on average in the present subjects.
Abstract: In order to analyze the articulatory ability of skilled esophageal speakers in terms of the voicing distinction in consonants, perceptual, acoustical and physiological studies were conducted. It was found that voiced-voiceless confusion was only 10.5% on average in the present subjects. Acoustical analysis indicated that voice onset time (VOT) values in word-initial position and the presence or absence of voicing in word-medial position were the most important cues for the voicing distinction of consonants in esophageal speech. Fiber-optic and photoglottographic examinations revealed that there was a transient opening of the neoglottis for the production of voiceless sounds, while it stayed closed for voiced sounds. Immediately before the period of voicelessness, surface EMG obtained from the anterior neck appeared to show transient activation. There was a tendency for peak intraoral pressure values for the voiceless stops to be significantly higher in the esophageal speakers than in the normal subject, whereas the values for voiced cognates were rather close to those of the normal subject.

25 citations



Journal ArticleDOI
TL;DR: In this article, a phoneme identification experiment manipulating closure duration was carried out to investigate perception of the word-medial voicing contrast by Dutch four-year old, six-year-old, and 12-yearold children, and Dutch adults.

Journal ArticleDOI
TL;DR: The issue of temporal speech cues keeping their distinctiveness in the face of extrinsic transformations, such as those wrought by different speaking rates, is explored with respect to the perception, in Icelandic, of Voice Onset Time as a cue for word-initial stop voicing, word-initial aspiration as a cue for [h], and Voice Offset Time as a cue for pre-aspiration.
Abstract: Speech segments are highly context-dependent and acoustically variable. One factor that contributes heavily to the variability of speech is speaking rate. Some speech cues are temporal in nature—that is, the distinctions that they signify are defined over time. How can temporal speech cues keep their distinctiveness in the face of extrinsic transformations, such as those wrought by different speaking rates? This issue is explored with respect to the perception, in Icelandic, of Voice Onset Time as a cue for word-initial stop voicing, word-initial aspiration as a cue for [h], and Voice Offset Time as a cue for pre-aspiration. All the speech cues show rate-dependent perception, though to different degrees, with Voice Offset Time being most sensitive to rate changes and Voice Onset Time least sensitive. The differences in the behaviour of these speech cues are related to their different positions in the syllable.

Journal ArticleDOI
TL;DR: The main finding of this study indicates that previously reported group trends regarding aging effects on mean speaking fundamental frequency of the female voice cannot simply be attributed to all elderly individuals.


Journal ArticleDOI
TL;DR: This work has identified auditory grouping processes that contribute to voice segregation in at least three ways: periodicity or harmonicity in the composite signal provides a basis for grouping together signal components that stem from a target voice.
Abstract: Listeners with normal hearing can communicate successfully in noisy and reverberant conditions that distort temporal and spectral cues in the speech signal. Recent work has sought to identify auditory grouping processes that contribute to this ability. One property that has received attention is the fundamental frequency of voicing (F0). During voiced speech the pulsing of the vocal folds gives rise to a pattern of amplitude modulation in the waveform and harmonicity in the spectrum. When two or more voices compete for the attention of the listener, momentary differences in F0 can contribute to voice segregation in at least three ways. First, periodicity or harmonicity in the composite signal provides a basis for grouping together signal components that stem from a target voice. Second, waveform interactions generate moment‐to‐moment fluctuations in the signal‐to‐noise ratio that enable listeners to ‘‘glimpse’’ the acoustic features of the target voice. Third, time‐varying changes in F0 provide a basis fo...

Journal ArticleDOI
TL;DR: This article examined the interlanguage phonology of Malay-speaking Bruneian students attending Universiti Brunei Darussalam and found that markedness can explain why some features of the target language (TL) phonology are more easily acquired than others.
Abstract: This paper examines the interlanguage (IL) phonology of Malay-speaking Bruneian students attending Universiti Brunei Darussalam. A complete year group of 55 students with different proficiency levels was recorded interacting communicatively and the data analyzed in order to explore the extent to which markedness relationships in phonology can be used to predict learning difficulties and to investigate the extent to which this particular IL phonology has stabilized. The focus of the study was on consonant clusters, both final and initial, and the voicing contrast. The analysis of the data suggests that markedness can explain why some features of the target language (TL) phonology are more easily acquired than others. However, there was no significant difference between the IL phonology of the less proficient and more proficient students when lexical roots were examined. This suggests that the IL phonology in question has stabilized. Finally, this paper relates the findings to research into new varieties of English and suggests reasons for phonological stabilization.

Journal ArticleDOI
TL;DR: In this article, a description is provided, based on original experiments, of the final devoicing that curiously takes place in the English utterances of Tswana native speakers.
Abstract: The topic of this paper is the well-known phenomenon of final devoicing, which occurs in languages such as Afrikaans, Dutch, German and Russian. It is discussed against the background of second language acquisition. The aim of the discussion is threefold. First, a description is provided, based on original experiments, of the final devoicing that curiously takes place in the English utterances of Tswana native speakers. Second, the relevance of the results to the theoretical notion of interlanguage phonology is pointed out: this lies in the observation that both Tswana and English belong to the group of languages lacking final devoicing: English essentially inexplicably lacks it, and Tswana completely lacks closed syllables. Third, details of the experiments are discussed, with data arrangements along several dimensions, taking into account the length of the vowel preceding the target consonant; the manner of articulation of the target consonant; and the notion that the speaker's output may be influence...


Journal ArticleDOI
TL;DR: This paper investigated the effect of the increased segment durations associated with speech produced during simultaneous communication on the perception of final consonant voicing, and found no significant difference in identification accuracy between the speech-alone and simultaneous communication conditions.
Abstract: The temporal structure of speech is altered dramatically when a person uses sign language and speaks aloud simultaneously (i.e., simultaneous communication or SC). Notable among these temporal alterations are durational increases of vowels, pause times between words, and voice onset times. Temporal aspects of speech play a pivotal role in the perception of certain phonemic contrasts in spoken English. For example, cues for the perception of final consonant voicing are carried in the vowel that precedes the final consonant. The purpose of the present study was to investigate the effect of increased segment durations associated with speech produced during SC on final consonant voicing perception. Eight skilled SC users produced naturally spoken words that differed only in the voicing characteristic of the final consonant. The words were recorded under two conditions: (a) speech alone and (b) SC. Digital editing was used to remove the final consonant. The digitally altered words were played to 20 listeners who, in a forced‐choice paradigm, circled the word they thought they heard. The listeners accurately identified final consonant voicing of 69.2% of the target words produced in the speech alone condition and 77.0% of the target words in the SC condition, a nonsignificant difference.

01 Jan 1996
TL;DR: The authors report on the phonetic interaction of tone and consonant voicing in Kammu, a language where some dialects use F0 for producing distinctive word tones, while others do not have tones but rely on the contrastive voicing of initial consonants to distinguish words which the tonal dialects distinguish with tones.
Abstract: This is a preliminary report on the phonetic interaction of tone and consonant voicing in Kammu, a language where some dialects use F0 for producing distinctive word tones, while others do not have tones but rely on the contrastive voicing of initial consonants to distinguish words which the tonal dialects distinguish with tones. Speakers of non-tonal dialects produce no significant F0 differences in words which differ only in their tones in the tonal dialects, and a perception test showed that they did not use F0 to distinguish such words when listening to a tonal dialect.

Proceedings ArticleDOI
03 Oct 1996
TL;DR: It is suggested that lexical biasing in word recognition can account for the difference between the model and the behavioural results.
Abstract: Lipreading in combination with an acoustic indication of voice fundamental frequency (F0) has been shown to greatly improve word recognition accuracy with sentence stimuli. A possible explanation for this effect is that F0 delivers information for consonantal voicing. In experiment 1 we showed with a computational model how voicing information affects the uniqueness of lipread words in a large phonemically transcribed machine-readable lexicon. In experiment 2 the same computational methods were used to simulate the results obtained by McGrath and Summerfield (1985) for lipreading with and without acoustic F0. The model failed to account in full for the behaviourally observed enhancements. It is suggested that lexical biasing in word recognition can account for the difference between the model and the behavioural results.
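The computational model in experiment 1 measured word uniqueness in a large transcribed lexicon; the toy sketch below illustrates the underlying idea, that adding a voicing feature to visually confusable phoneme classes shrinks lipread confusion sets. The viseme groupings and the mini-lexicon are illustrative assumptions, not the paper's materials:

```python
from collections import Counter

# Map phonemes to coarse viseme classes (visually confusable groups).
# These groupings are a common simplification, not the paper's exact classes.
VISEME = {"p": "bilabial", "b": "bilabial", "m": "bilabial",
          "t": "alveolar", "d": "alveolar", "n": "alveolar",
          "a": "open-vowel"}
VOICED = {"b", "d", "m", "n", "a"}

def transcribe(word_phones, use_voicing):
    """Reduce a phoneme string to what a lipreader (optionally aided by an
    acoustic voicing cue) could distinguish."""
    key = []
    for ph in word_phones:
        cls = VISEME[ph]
        if use_voicing:
            cls += "+v" if ph in VOICED else "-v"
        key.append(cls)
    return tuple(key)

def unique_words(lexicon, use_voicing):
    """Words whose reduced transcription matches no other word's."""
    counts = Counter(transcribe(w, use_voicing) for w in lexicon)
    return [w for w in lexicon if counts[transcribe(w, use_voicing)] == 1]

lexicon = [("p", "a", "t"), ("b", "a", "t"), ("m", "a", "t"), ("p", "a", "d")]
print(unique_words(lexicon, use_voicing=False))  # [] -- all four are lipread-identical
print(unique_words(lexicon, use_voicing=True))   # [('p', 'a', 't'), ('p', 'a', 'd')]
```

With lipreading alone, every word in this mini-lexicon falls into one confusion set; adding voicing leaves only "bat"/"mat" confusable, which mirrors the kind of uniqueness gain the model quantified over a full lexicon.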

Patent
16 Jul 1996
TL;DR: In this patent, the brightness of the voice is set by a voice brightness switch and its strength by the force of a keyboard touch, and the corresponding consonant and vowel waveform data are used to drive a sound source.
Abstract: PURPOSE: To produce an expressive voice whose strength and brightness can be controlled in a voice generating device. CONSTITUTION: The brightness of the voice is set by a voice brightness switch, and the strength of the voice is determined by the strength of a touch on the keyboard 1. The consonant waveform data corresponding to the selected voice brightness and the vowel waveform data corresponding to the strength of the keyboard touch are then read out, these consonant and vowel waveform data are converted based on a key code and the velocity data, and voicing is instructed to a sound source 6 based on the result. Thus, strength and brightness, such as those of the sounds constituting language, are expressed.

Journal ArticleDOI
TL;DR: The cross‐linguistic pattern of post‐nasal voicing reflects the combined effects of nasal leak and rarefactive velar pumping; the absence of pre‐nasal voicing reflects the antagonistic effects of nasal leak and compressive velar pumping.
Abstract: Many languages systematically replace voiceless stops by voiced stops when the immediately preceding sound is nasal. This is estimated to occur in 7% of all languages, and thus is likely to reflect preferences of the speech production mechanism. Post‐nasal voicing can be explained by two factors. Coarticulation with an adjacent nasal induces nasal leak during a stop, which vents oral pressure and tends to maintain the transglottal pressure drop needed to sustain voicing. Coarticulation also causes the velum to rise or fall during the stop while the velar port is closed. This creates a pumping effect (rarefaction in post‐nasal stops, compression in pre‐nasal), which facilitates voicing for post‐nasal stops and inhibits voicing for pre‐nasal stops. The cross‐linguistic pattern of post‐nasal voicing reflects the combined effects of nasal leak and rarefactive velar pumping; the absence of pre‐nasal voicing reflects the antagonistic effects of nasal leak and compressive velar pumping. To test this hypothesis, ...

Journal ArticleDOI
TL;DR: In this article, the authors assess the validity of this assumption in normal English-speaking women, men, and 5-year-old children and find that some aspects of laryngeal behavior are not identical across these populations.
Abstract: Patterns of consonantal voicing, especially as measured by voice onset time (VOT), have played an important role in discussions of speech timing control across populations. For example, relatively late acquisition of adultlike VOT values for voiceless aspirated stops in normal children has been attributed to the precise laryngeal‐oral phasing required for these sounds. Implicit in such arguments is the assumption that the laryngeal and oral events themselves are comparable across subject groups. The present study attempts to assess the validity of this assumption in normal English‐speaking women, men, and 5‐year‐old children. Recordings of oral airflow, intraoral air pressure, and acoustics were made as subjects produced a variety of voiceless consonants in prestress position within a carrier phrase. Analysis here focuses on /h/, which involves no oral obstruction and therefore affords the clearest picture of events at the glottis during the consonant. In these data, the patterns of flow increase, voicing offset and voicing onset during abduction for /h/ show certain differences across groups and suggest that some aspects of laryngeal behavior are not identical across these populations. Results are discussed in terms of group‐related differences in anatomy and aerodynamic quantities.

Journal ArticleDOI
TL;DR: The authors reported on two studies of the acquisition of the voicing contrast in Taiwanese, showing that the children had already begun to acquire the contrast between the two voiceless stop types at the start of recording, although many tokens of voiceless aspirated stops were still produced with short-lag VOTs, and that 6-year-olds' production of voiced stops is not yet completely adult-like.
Abstract: Taiwanese (Amoy) is one of the few Chinese dialects with a three‐way contrast. There are three Taiwanese syllable‐initial voiceless unaspirated stops, /p,t,k/, three voiceless aspirated stops, /ph,th,kh/, and two voiced stops, /b,g/. This paper reports on two studies of the acquisition of the voicing contrast in Taiwanese. The longitudinal study followed two girls from about 28 months to 33 months and to 40 months, respectively. The cross‐sectional study compared VOTs in 54 children ranging from 30 months to 6 years. In the longitudinal study, the children had already begun to acquire the contrast between the two voiceless stop types at the beginning of recording, although there were still many tokens of voiceless aspirated stops produced with short‐lag VOTs. Later, the VOTs of the voiceless aspirated stops were hyperaspirated, with VOTs exceeding the adult norm, and there were fewer tokens with short‐lag VOTs. Both children began to acquire the voiced stops around 33 months. The cross‐sectional study confirms Kewley‐Port and Preston's (1974) claim and shows further that 6‐year‐olds' production of voiced stops is not completely adult‐like.

Journal Article
TL;DR: Speech analysis is an important complement for evaluating the phonatory behavior of laryngectomized patients; the automatic calculation of frequency parameters is promising, but errors occur due to the extent of altered voicing.
Abstract: Through a review of published acoustical analyses of alaryngeal voice and speech, our aim was to establish the interest and the limitations of a computerized acoustical system. One case of esophageal voice and one case of tracheoesophageal voice were analysed with the Kay Elemetrics Multidimensional Voice Program for voice and the Computerized Speech Lab for speech. All the acoustical parameters are altered in alaryngeal voice. The automatic calculation of frequency parameters is promising, but there are errors due to the extent of altered voicing. Speech analysis is an important complement for evaluating the phonatory behavior of laryngectomized patients.

Proceedings ArticleDOI
14 Oct 1996
TL;DR: In this article, a pre-classification method and its digital signal processing algorithm implemented by short time Fourier transform for Chinese voiceless consonant speech are proposed and tested on a data set of 910 C-V syllables from a database of 1267 Chinese allsyllable tokens spoken by a male speaker.
Abstract: A pre-classification method and its digital signal processing algorithm, implemented by short-time Fourier transform, for Chinese voiceless consonant speech are proposed. The important features of the spike fill of stops and the C-V (C denotes voiceless consonant) boundary are detected and marked automatically first. Then, the Chinese voiceless consonants are divided into stops and non-stops according to these distinctive features. The stops are further divided into unaspirated stops and aspirated-fricative stops. Testing on a data set of 910 C-V syllables from a database of 1267 Chinese all-syllable tokens spoken by a male speaker shows that the algorithm performs well with a 92.4% average correct rate. The proposed method and algorithm are of value for improving the mechanism and performance of a Chinese speech recognition system.
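A minimal sketch of the kind of short-time Fourier analysis used for landmark detection of this sort: per-frame spectral energies are computed from an STFT, and a stop-burst landmark is marked at the first abrupt energy jump. The frame size, hop, and jump criterion are generic assumptions, not the authors' algorithm:

```python
import numpy as np

def stft_energy(signal, frame_len=160, hop=80):
    """Per-frame spectral energy from a short-time Fourier transform."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        energies[i] = np.sum(np.abs(np.fft.rfft(frame)) ** 2)
    return energies

def detect_burst_frame(signal, frame_len=160, hop=80, ratio=10.0):
    """First frame whose energy jumps well above the maximum of all
    preceding frames -- a crude stop-burst landmark detector."""
    e = stft_energy(signal, frame_len, hop)
    floor = 1e-8
    for i in range(1, len(e)):
        if e[i] > ratio * max(floor, e[:i].max()):
            return i
    return None

# Synthetic example: 480 samples of silence, then a 1-kHz tone at 16 kHz.
sr = 16000
t = np.arange(480) / sr
signal = np.concatenate([np.zeros(480), np.sin(2 * np.pi * 1000 * t)])
frame = detect_burst_frame(signal)
print(frame * 80)  # 400 -- start sample of the first frame overlapping the onset
```

A real pre-classifier would follow the landmark with spectral shape features (e.g., aspiration or frication energy after the spike) to separate the consonant classes; only the landmark step is sketched here.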

Proceedings ArticleDOI
03 Oct 1996
TL;DR: The study investigates the patterns of devoicing in post-vocalic obstruents in Canadian French and attempts to verify the following functional hypothesis: a consonant will be more resistant to devoicing (absence of periodic structure) if no (or few) other cues to the voiced-voiceless distinction can be found either in the consonant itself or in the preceding vowel.
Abstract: In Canadian French, besides periodic phonation, other cues can be associated with the voiced-voiceless distinction due to the application of phonological rules. These cues, mainly duration and vowel quality, may be present in the consonant itself (voiced consonants are shorter than their voiceless counterparts) and in the preceding vowels (duration and vowel quality). The cues are related to the application of an allophonic rule and the presence in the phoneme inventory of intrinsically long and short vowels. The tendency towards devoicing of some portion of a normally voiced consonant in post-vocalic word-final positions is found in many languages. The study investigates the patterns of devoicing in post-vocalic obstruents in Canadian French and attempts to verify the following functional hypothesis: a consonant will be more resistant to devoicing (absence of periodic structure) if no (or few) other cues of the voiced-voiceless distinction can be found either in the consonant itself or in the preceding vowel. The data will serve as a reference for current studies on patients with apraxia of speech.

Journal ArticleDOI
TL;DR: This article investigated the effects of linguistic experience on the production of voice onset time (VOT) in syllable-initial stop consonants in Spanish and English monolinguals and Spanish-English bilinguals.
Abstract: The present study investigated the effects of linguistic experience on the production of voice onset time (VOT) in syllable‐initial stop consonants. Spanish and English realize the voicing distinction using different phonetic categories. Of interest in this study was how language background influenced VOT across different speaking rates. Three groups of subjects (English and Spanish monolinguals and Spanish–English bilinguals), produced sentences containing voiced and voiceless stops at different speaking rates. Care was taken to place the bilinguals into Spanish and English monolingual modes on separate occasions. Preliminary analyses of the bilinguals’ data suggest several findings. Comparing the VOT values for bilinguals’ /p/ tokens in both language modes provided evidence for separate phonetic categories. Changes in speaking rate affected the VOT values for the voiceless tokens when in English mode and the voiced tokens when in Spanish mode. Finally, differences were found among the talkers in the way in which they realized the voicing distinction in Spanish, but not in English mode. These results point to differential effects of speaking rate on phoneme categories as a function of language mode. Additional analyses will be presented comparing productions by Spanish and English monolinguals to Spanish–English bilinguals’ in their corresponding language modes. [Work supported by NSF.]

Patent
19 Feb 1996
TL;DR: In this article, the spectral magnitude and phase representation used in Multi-Band Excitation (MBE) based speech coding systems is improved. Butler et al. proposed a fast, FFT compatible method which produces a smooth set of spectral magnitudes without the sharp discontinuities introduced by voicing transitions.
Abstract: The spectral magnitude and phase representation used in Multi-Band Excitation (MBE) based speech coding systems is improved. At the encoder the digital speech signal is divided into frames, and a fundamental frequency, voicing information, and a set of spectral magnitudes are estimated for each frame. A spectral magnitude is computed at each harmonic frequency (ie. multiples of the estimated fundamental frequency) using a new estimation method which is independent of voicing state and which corrects for any offset between the harmonic and the frequency sampling grid. The result is a fast, FFT compatible method which produces a smooth set of spectral magnitudes without the sharp discontinuities introduced by voicing transitions as found in prior MBE based speech coders. Quantization efficiency is thereby improved, producing higher speech quality at lower bit rates. In addition, smoothing methods, typically used to reduce the effect of bit errors or to enhance formants, are more effective since they are not confused by false edges (i.e. discontinuities) at voicing transitions. Overall speech quality and intelligibility are improved. At the decoder a bit stream is received and then used to reconstruct a fundamental frequency, voicing information, and a set of spectral magnitudes for a sequence of frames. The voicing information is used to label each harmonic as either voiced or unvoiced, and for voiced harmonics an individual phase is regenerated as a function of the spectral magnitudes localized about that harmonic frequency. The decoder then synthesizes the voiced and unvoiced component and adds them to produce the synthesized speech. The regenerated phase more closely approximates actual speech in terms of peak-to-rms value relative to the prior art, thereby yielding improved dynamic range. In addition the synthesized speech is perceived as more natural and exhibits fewer phase related distortions.
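The encoder step described above, estimating one spectral magnitude at each harmonic of the fundamental, can be sketched as follows. Pooling windowed FFT energy in a band of width F0 around each harmonic is a generic textbook approximation; the patent's actual estimator, its grid-offset correction, and its voicing-independent design are not reproduced here:

```python
import numpy as np

def harmonic_magnitudes(frame, sample_rate, f0):
    """One spectral magnitude per harmonic of f0: RMS of the windowed FFT
    bins within +/- f0/2 of each harmonic frequency (generic sketch)."""
    n = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n)))
    bin_hz = sample_rate / n
    mags = []
    k = 1
    while k * f0 < sample_rate / 2:  # all harmonics below Nyquist
        lo = int(round((k * f0 - f0 / 2) / bin_hz))
        hi = int(round((k * f0 + f0 / 2) / bin_hz))
        band = spectrum[max(lo, 0):hi + 1]
        mags.append(np.sqrt(np.mean(band ** 2)))
        k += 1
    return np.array(mags)

# Synthetic voiced frame: 200-Hz fundamental plus a weaker 3rd harmonic.
sr, f0, n = 8000, 200.0, 400
t = np.arange(n) / sr
frame = 1.0 * np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 3 * f0 * t)
mags = harmonic_magnitudes(frame, sr, f0)
print(mags[0] > mags[2])  # True: fundamental stronger than the 3rd harmonic
```

In an MBE-style coder these per-harmonic magnitudes, together with the fundamental and per-band voicing decisions, form the quantized frame representation; the patent's contribution is making this magnitude set smooth across voicing transitions.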