
Showing papers in "Journal of the Acoustical Society of America in 1989"


Journal ArticleDOI
TL;DR: Techniques used to synthesize headphone-presented stimuli that simulate the ear-canal waveforms produced by free-field sources are described, showing that the simulations duplicate free-field waveforms within a few dB of magnitude and a few degrees of phase at frequencies up to 14 kHz.
Abstract: This article describes techniques used to synthesize headphone-presented stimuli that simulate the ear-canal waveforms produced by free-field sources. The stimulus synthesis techniques involve measurement of each subject's free-field-to-eardrum transfer functions for sources at a large number of locations in free field, and measurement of headphone-to-eardrum transfer functions with the subject wearing headphones. Digital filters are then constructed from the transfer function measurements, and stimuli are passed through these digital filters. Transfer function data from ten subjects and 144 source positions are described in this article, along with estimates of the various sources of error in the measurements. The free-field-to-eardrum transfer function data are consistent with comparable data reported elsewhere in the literature. A comparison of ear-canal waveforms produced by free-field sources with ear-canal waveforms produced by headphone-presented simulations shows that the simulations duplicate free-field waveforms within a few dB of magnitude and a few degrees of phase at frequencies up to 14 kHz.

724 citations
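
A minimal sketch of the synthesis chain described above: the headphone correction filter is (in the frequency domain) the ratio of the free-field-to-eardrum response to the headphone-to-eardrum response, and the stimulus is passed through it. The function names, FFT length, and regularization constant are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def synthesis_filter(ff_ir, hp_ir, nfft=1024, eps=1e-6):
    """Regularized frequency-domain division FF/HP -> time-domain FIR filter."""
    FF = np.fft.rfft(ff_ir, nfft)
    HP = np.fft.rfft(hp_ir, nfft)
    H = FF * np.conj(HP) / (np.abs(HP) ** 2 + eps)   # deconvolve the headphone path
    return np.fft.irfft(H, nfft)

def simulate_free_field(stimulus, ff_ir, hp_ir):
    """Filter a stimulus so that, over headphones, it approximates the
    ear-canal waveform the free-field source would have produced."""
    return np.convolve(stimulus, synthesis_filter(ff_ir, hp_ir))

# usage with made-up impulse responses for one ear and one source position
rng = np.random.default_rng(0)
x = rng.standard_normal(2048)                        # wideband noise burst
ff, hp = rng.standard_normal(256), rng.standard_normal(256)
y_left = simulate_free_field(x, ff, hp)
```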


Journal ArticleDOI
TL;DR: Listeners reported the apparent spatial positions of wideband noise bursts that were presented either by loudspeakers in free field or by headphones, with the aim of duplicating, at a listener's eardrums, the waveforms that were produced by the free-field stimuli.
Abstract: Listeners reported the apparent spatial positions of wideband noise bursts that were presented either by loudspeakers in free field or by headphones. The headphone stimuli were digitally processed with the aim of duplicating, at a listener’s eardrums, the waveforms that were produced by the free‐field stimuli. The processing algorithms were based on each subject’s free‐field‐to‐eardrum transfer functions that had been measured at 144 free‐field source locations. The headphone stimuli were localized by eight subjects in virtually the same positions as the corresponding free‐field stimuli. However, with headphone stimuli, there were more front–back confusions, and source elevation seemed slightly less well defined. One subject’s difficulty with elevation judgments, which was observed both with free‐field and with headphone stimuli, was traced to distorted features of the free‐field‐to‐eardrum transfer function.

720 citations


Journal ArticleDOI
TL;DR: In this article, the authors present a rigorous model for the propagation of pressure waves in bubbly liquids and show that the model works well up to volume fractions of 1% to 2% provided that bubble resonances play a negligible role.
Abstract: Recent work has rendered possible the formulation of a rigorous model for the propagation of pressure waves in bubbly liquids. The derivation of this model is reviewed heuristically, and the predictions for the small‐amplitude case are compared with the data sets of several investigators. The data concern the phase speed, attenuation, and transmission coefficient through a layer of bubbly liquid. It is found that the model works very well up to volume fractions of 1%–2% provided that bubble resonances play a negligible role. Such is the case in a mixture of many bubble sizes or, when only one or a few sizes are present, away from the resonant frequency regions for these sizes. In the presence of resonance effects, the accuracy of the model is severely impaired. Possible reasons for the failure of the model in this case are discussed.

649 citations
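
For orientation, the linear theory reviewed in this paper leads to a dispersion relation for the complex wavenumber k of the following standard form (notation assumed for this sketch: c is the sound speed of the pure liquid, f(a) da the number of bubbles per unit volume with equilibrium radius between a and a + da, ω₀(a) the bubble resonance frequency, and b a damping coefficient):

$$\left(\frac{ck}{\omega}\right)^{2} = 1 + 4\pi c^{2}\int_{0}^{\infty}\frac{a\,f(a)}{\omega_{0}^{2}(a)-\omega^{2}+2ib\omega}\,da .$$

The phase speed and attenuation compared with the data sets then follow from the real and imaginary parts of k, respectively.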


Journal ArticleDOI
Ingo R. Titze1
TL;DR: Comparison is drawn between male and female larynges on the basis of overall size, vocal fold membranous length, elastic properties of tissue, and prephonatory glottal shape; the simulated vocal fold contact area is used to infer male-female differences in the shape of the glottis.
Abstract: Comparison is drawn between male and female larynges on the basis of overall size, vocal fold membranous length, elastic properties of tissue, and prephonatory glottal shape. Two scale factors are proposed that are useful for explaining differences in fundamental frequency, sound power, mean airflow, and glottal efficiency. Fundamental frequency is scaled primarily according to the membranous length of the vocal folds (scale factor of 1.6), whereas mean airflow, sound power, glottal efficiency, and amplitude of vibration include another scale factor (1.2) that relates to overall larynx size. Some explanations are given for observed sex differences in glottographic waveforms. In particular, the simulated (computer-modeled) vocal fold contact area is used to infer male-female differences in the shape of the glottis. The female glottis appears to converge more linearly (from bottom to top) than the male glottis, primarily because of medial surface bulging of the male vocal folds.

504 citations
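
As illustrative arithmetic for the two scale factors (the input value is chosen for the example, not taken from the paper): with a typical male fundamental frequency of 110 Hz, the membranous-length scale factor gives

$$F_{0,\text{female}} \approx 1.6 \times F_{0,\text{male}} = 1.6 \times 110\ \text{Hz} \approx 176\ \text{Hz},$$

while quantities such as mean airflow, sound power, and glottal efficiency involve the additional overall-size factor of 1.2.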


Journal ArticleDOI
TL;DR: The authors provide a coherent and well documented frame of reference for a field of study that is becoming central to both linguistics and psycholinguistics, including a wide variety of approaches from the radical alternative of new connectionist models, through new developments in traditional symbolic approaches, to the reemphasis on linguistic concepts as a crucial input to psycholinguistic models.
Abstract: How do humans understand and produce language? "Lexical Representation and Process" is the first collection to cover the full range of lexical representations and their role in language processing. The 18 contributions focus on psychological models of lexical processing, the nature of the input, lexical structure and process, and parsing and interpretation. "Lexical Representation and Process" provides a coherent and well documented frame of reference for a field of study that is becoming central to both linguistics and psycholinguistics. It includes a wide variety of approaches from the radical alternative of new connectionist models, through new developments in traditional symbolic approaches, to the reemphasis on linguistic concepts as a crucial input to psycholinguistic models. The contributors are William Marslen-Wilson, Ken Forster, Mark Seidenberg, Gary Dell, Dennis Norris, Jeff Elman, Keith Rayner, David Balota, Derek Besner, James Johnston, Uli Frauenfelder, Aditi Lahiri, Anne Cutler, Leslie Henderson, Jorge Hankamer, Rob Schreuder, Giovanni Flores d'Arcais, Lyn Frazier, Lorraine Tyler, Mark Steedman, Mike Tanenhaus, and Greg Carlson. William Marslen-Wilson is a Senior Scientist at the Medical Research Council Applied Psychology Unit in Cambridge, England. A Bradford Book.

489 citations


Journal ArticleDOI
TL;DR: Possible hypotheses regarding the nature of the processes giving rise to these effects are discussed, with particular attention to the idea that the processing of information about the talker's voice is intimately related to early perceptual processes that extract acoustic-phonetic information from the speech signal.
Abstract: The perceptual consequences of trial‐to‐trial changes in the voice of the talker on spoken word recognition were examined. The results from a series of experiments using perceptual identification and naming tasks demonstrated that perceptual performance decreases when the voice of the talker changes from trial to trial compared to performance when the voice on each trial remains the same. In addition, the effects of talker variability on word recognition appeared to be more robust and less dependent on task than the effects of word frequency and lexical structure. Possible hypotheses regarding the nature of the processes giving rise to these effects are discussed, with particular attention to the idea that the processing of information about the talker’s voice is intimately related to early perceptual processes that extract acoustic–phonetic information from the speech signal.

460 citations


Journal ArticleDOI
TL;DR: The present work reviews theories and empirical findings that bear on the perception of English vowels, with an emphasis on the comparison of data analytic "machine recognition" approaches with results from speech perception experiments.
Abstract: The present work reviews theories and empirical findings, including results from two new experiments, that bear on the perception of English vowels, with an emphasis on the comparison of data analytic "machine recognition" approaches with results from speech perception experiments. Two major sources of variability (viz., speaker differences and consonantal context effects) are addressed from the classical perspective of overlap between vowel categories in F1 x F2 space. Various approaches to the reduction of this overlap are evaluated. Two types of speaker normalization are considered. "Intrinsic" methods based on relationships among the steady-state properties (F0, F1, F2, and F3) within individual vowel tokens are contrasted with "extrinsic" methods, involving the relationships among the formant frequencies of the entire vowel system of a single speaker. Evidence from a new experiment supports Ainsworth's (1975) conclusion [W. Ainsworth, Auditory Analysis and Perception of Speech (Academic, London, 1975)] that both types of information have a role to play in perception. The effects of consonantal context on formant overlap are also considered. A new experiment is presented that extends Lindblom and Studdert-Kennedy's finding [B. Lindblom and M. Studdert-Kennedy, J. Acoust. Soc. Am. 43, 840-843 (1967)] of perceptual effects of consonantal context on vowel perception to /dVd/ and /bVb/ contexts. Finally, the role of vowel-inherent dynamic properties, including duration and diphthongization, is briefly reviewed. All of the above factors are shown to have reliable influences on vowel perception, although the relative weight of such effects and the circumstances that alter these weights remain far from clear. It is suggested that the design of more complex perceptual experiments, together with the development of quantitative pattern recognition models of human vowel perception, will be necessary to resolve these issues.

448 citations
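
A minimal sketch of an "intrinsic" normalization of the kind contrasted here with "extrinsic" methods: each vowel token is represented by differences of its own F0-F3 on an auditory (Bark) scale, so that speaker-dependent offsets largely cancel. The Bark approximation used is Traunmüller's; the particular feature set is one common choice, not necessarily the one evaluated in the paper.

```python
# intrinsic vowel normalization sketch: bark-difference features
def hz_to_bark(f):
    """Traunmueller's approximation of the Bark scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

def intrinsic_features(f0, f1, f2, f3):
    """Represent a single token by within-token formant relationships."""
    z0, z1, z2, z3 = (hz_to_bark(f) for f in (f0, f1, f2, f3))
    return (z1 - z0, z2 - z1, z3 - z2)

# e.g., a token with F0=120, F1=730, F2=1090, F3=2440 Hz (an /a/-like vowel)
print(intrinsic_features(120.0, 730.0, 1090.0, 2440.0))
```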


Journal ArticleDOI
TL;DR: In this paper, a wave superposition integral is proposed for computing the acoustic fields of arbitrarily shaped radiators, which is shown to be equivalent to the Helmholtz integral, based on the idea that the combined fields of an array of sources interior to a radiator can be made to reproduce a velocity prescribed on the surface of the radiator.
Abstract: A method for computing the acoustic fields of arbitrarily shaped radiators is described that uses the principle of wave superposition. The superposition integral, which is shown to be equivalent to the Helmholtz integral, is based on the idea that the combined fields of an array of sources interior to a radiator can be made to reproduce a velocity prescribed on the surface of the radiator. The strengths of the sources that produce this condition can, in turn, be used to compute the corresponding surface pressures. The results of several numerical experiments are presented that demonstrate the simplicity of the method. Also, the advantages that the superposition method has over the more commonly used boundary‐element methods are discussed. These include simplicity of generating the matrix elements used in the numerical formulation and improved accuracy and speed, the latter two being due to the avoidance of uniqueness and singularity problems inherent in the boundary‐element formulation.

363 citations
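
The algorithm lends itself to a compact numerical sketch: place simple (monopole) sources inside the body, solve a linear system for the strengths that reproduce the prescribed normal surface velocity, then evaluate the surface pressure from those strengths. Everything below (geometry, retraction depth of the source layer, least-squares solve) is an illustrative assumption, not the authors' code.

```python
import numpy as np

rho, c, f = 1.21, 343.0, 500.0             # air density, sound speed, frequency
omega = 2 * np.pi * f
k = omega / c                               # wavenumber

def G(r):
    """Free-space Green's function e^{ikr} / (4 pi r)."""
    return np.exp(1j * k * r) / (4 * np.pi * r)

def dG_dn(x, y, n_hat):
    """Derivative of G along the surface normal n_hat at x, for a source at y."""
    d = x - y
    r = np.linalg.norm(d)
    return (1j * k - 1.0 / r) * G(r) * (d @ n_hat) / r

# surface points/normals of a unit sphere; interior sources retracted to r = 0.5
npts = 200
rng = np.random.default_rng(0)
normals = rng.normal(size=(npts, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
surf = normals.copy()
srcs = 0.5 * normals[::2]

# v_n(x_i) = (1 / (i rho omega)) * sum_j q_j dG/dn(x_i, y_j)
A = np.array([[dG_dn(x, y, nh) for y in srcs]
              for x, nh in zip(surf, normals)]) / (1j * rho * omega)

v_n = np.ones(npts)                         # prescribed velocity: pulsating sphere
q, *_ = np.linalg.lstsq(A, v_n.astype(complex), rcond=None)

# surface pressure follows from the same source strengths
P = np.array([[G(np.linalg.norm(x - y)) for y in srcs] for x in surf])
p_surf = P @ q
```

For the pulsating sphere, p_surf can be checked against the known analytic monopole solution, the same kind of validation the paper's numerical experiments perform on more complicated shapes.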


Journal ArticleDOI
Ingo R. Titze1
TL;DR: It is shown that the typical 2-6 Hz/cm H2O rise in fundamental frequency with subglottal pressure observed in human and canine larynges is a direct and predictable consequence of this amplitude-frequency dependence.
Abstract: The change in fundamental frequency with subglottal pressure in phonation is quantified on the basis of the ratio between vibrational amplitude and vocal fold length. This ratio is typically very small in stringed instruments, but becomes quite appreciable in vocal fold vibration. Tension in vocal fold tissues is, therefore, not constant over the vibratory cycle, and a dynamic tension gives rise to amplitude–frequency dependence. It is shown that the typical 2–6 Hz/cm H2O rise in fundamental frequency with subglottal pressure observed in human and canine larynges is a direct and predictable consequence of this amplitude–frequency dependence. Results are presently limited to phonation in the chest register.

347 citations
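
Illustrative arithmetic using the reported range (the slope and pressure increment are chosen for the example): a mid-range slope of 4 Hz/(cm H2O) and a subglottal pressure increase of 5 cm H2O give

$$\Delta F_0 \approx \frac{dF_0}{dP_s}\,\Delta P_s = 4 \times 5 = 20\ \text{Hz}.$$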


Journal ArticleDOI
TL;DR: In this paper, the authors compared three estimators, namely, the moment method (MM), the maximum likelihood (ML), and the moment/Newton step (MNS), for estimating the parameters of a three-parameter generalized Gaussian distribution.
Abstract: The primary objective of this paper is to compare the large‐sample as well as the small‐sample properties of different methods for estimating the parameters of a three‐parameter generalized Gaussian distribution. Three estimators, namely, the moment method (MM), the maximum‐likelihood (ML), and the moment/Newton‐step (MNS) estimators, are considered. The applicability of general asymptotic optimality results of the efficient ML and MNS estimation techniques is studied in the generalized Gaussian context. The asymptotic normal distributions of the estimators are obtained. The asymptotic relative superiority of the ML estimator or its variant, the MNS estimator, over the moment method is studied in terms of asymptotic relative efficiency. Based on this study, it is concluded that deviations from normality in the underlying distribution of the data necessitate the use of the efficient ML or MNS methods. In the small‐sample case, a detailed comparative study of the estimators is made possible by extensive Monte Carlo simulations. From this study, it is concluded that the maximum‐likelihood method is found to be significantly superior for heavy‐tailed distributions. In a region of the parameter space corresponding to the vicinity of the Gaussian distribution, the moment method compares well with the other methods. Further, the MNS estimator is shown to perform best for light‐tailed distributions. The simulation results are shown to lend support to analytically derived asymptotic results for each of the methods.

324 citations
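
A runnable sketch of the two simplest estimators compared here, for a zero-mean generalized Gaussian: the moment method inverts a ratio of absolute moments to recover the shape parameter, and the ML estimate uses SciPy's generic numerical fit. The bracketing interval and sample size are assumptions for illustration.

```python
import numpy as np
from scipy.special import gamma
from scipy.optimize import brentq
from scipy.stats import gennorm             # generalized Gaussian (shape beta)

rng = np.random.default_rng(1)
beta_true = 1.5                             # 2 = Gaussian, 1 = Laplace
x = gennorm.rvs(beta_true, size=5000, random_state=rng)

# moment method: (E|X|)^2 / E[X^2] = Gamma(2/b)^2 / (Gamma(1/b) Gamma(3/b)),
# a monotone function of the shape b, inverted numerically
def moment_ratio(b):
    return gamma(2 / b) ** 2 / (gamma(1 / b) * gamma(3 / b))

r_hat = np.mean(np.abs(x)) ** 2 / np.mean(x ** 2)
beta_mm = brentq(lambda b: moment_ratio(b) - r_hat, 0.2, 10.0)

# maximum likelihood via SciPy's generic MLE
beta_ml, loc_ml, scale_ml = gennorm.fit(x)

print(f"true {beta_true}, moment {beta_mm:.3f}, ML {beta_ml:.3f}")
```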


Journal ArticleDOI
TL;DR: A method is proposed to correct for unknown phase aberration, which uses speckle brightness as a quality factor, analogous to the correction technique used by Muller and Buffington to adaptively focus incoherent optical telescopes.
Abstract: Medical ultrasonic images are degraded by tissues with inhomogeneous acoustic velocities. The resulting phase aberration raises the off-peak response of the imaging system's point spread function (PSF), decreasing dynamic range. In extreme cases, multiple images of a single target are displayed. Phase aberration may become a limiting factor to image quality as ultrasonic frequency and aperture size are increased in order to improve spatial resolution. A method is proposed to correct for unknown phase aberration, which uses speckle brightness as a quality factor. The phase delays of a phased array transducer are modified, element by element, to maximize mean speckle brightness in a region of interest. The technique proposed is analogous to the correction technique used by Muller and Buffington [J. Opt. Soc. Am. 64 (9), 1200-1209 (1974)] to adaptively focus incoherent optical telescopes. The method is demonstrated using a computer model with several different simulated aberration profiles. With this model, mean speckle brightness is calculated using the two-dimensional PSF. Experiments have also been conducted in which speckle brightness is shown to increase as the phase delays of an ultrasonic scanner are modified in order to compensate for a rippled aberrating layer made of silicone rubber. The characteristics of the proposed method, and the possibility of employing it clinically to correct for unknown inhomogeneities in acoustic velocity, are discussed.
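
The proposed search is easy to caricature in one dimension: perturb each array element's phase over a trial grid, keep the value that maximizes image brightness, and sweep the array a few times. This toy uses a point focus rather than mean speckle brightness over a region as the quality factor, so it is a stand-in for the method, not a reproduction of it.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 32
aberr = rng.uniform(-np.pi, np.pi, N)       # unknown aberrating phase screen
phase = np.zeros(N)                          # correction to be estimated

def brightness(corr):
    """Focal intensity of an N-element array; maximal (N**2) when phases align."""
    return abs(np.sum(np.exp(1j * (aberr + corr)))) ** 2

trial = np.linspace(-np.pi, np.pi, 64)
for _ in range(3):                           # a few element-by-element sweeps
    for n in range(N):
        vals = []
        for t in trial:
            c = phase.copy()
            c[n] = t
            vals.append(brightness(c))
        phase[n] = trial[int(np.argmax(vals))]

print(brightness(np.zeros(N)), "->", brightness(phase), "ideal:", N ** 2)
```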

Journal ArticleDOI
TL;DR: Measurements of the transient cavitation threshold in water, in a fluid of higher viscosity, and in diluted whole blood elucidate the importance of ultrasound, host fluid, and nuclei parameters in determining these thresholds.
Abstract: Transient cavitation is a discrete phenomenon that relies on the existence of stabilized nuclei, or pockets of gas within a host fluid, for its genesis. A convenient descriptor for assessing the likelihood of transient cavitation is the threshold pressure, or the minimum acoustic pressure necessary to initiate bubble growth and subsequent collapse. An automated experimental apparatus has been developed to determine thresholds for cavitation produced in a fluid by short tone bursts of ultrasound at 0.76, 0.99, and 2.30 MHz. A fluid jet was used to convect potential cavitation nuclei through the focal region of the insonifying transducer. Potential nuclei tested include 1-micron polystyrene spheres, microbubbles in the 1- to 10-micron range that are stabilized with human serum albumin, and whole blood constituents. Cavitation was detected by a passive acoustical technique that is sensitive to sound scattered from cavitation bubbles. Measurements of the transient cavitation threshold in water, in a fluid of higher viscosity, and in diluted whole blood are presented. These experimental measurements of cavitation thresholds elucidate the importance of ultrasound, host fluid, and nuclei parameters in determining these thresholds. These results are interpreted in the context of an approximate analytical theory for the prediction of the onset of cavitation.

Journal ArticleDOI
TL;DR: The new method combines the advantages of the ray‐tracing process, namely, the relatively slow increase of computation time with the length of the impulse response, with the accuracy inherent to the image‐source model, which is even sufficient to calculate the Fourier transform.
Abstract: A new method for the calculation of room acoustical impulse responses is described, which is based on two well‐known computer algorithms, the ray‐tracing and the image‐source models. With the new method, the procedure of sieving the ‘‘visible’’ image sources out of the enormous quantity of possible sources is carried out by examination of the histories of sound particles. From the obtained list of visible image sources, the impulse response of the enclosure is easily constructed. The new method combines the advantages of the ray‐tracing process, namely, the relatively slow increase of computation time with the length of the impulse response, with the accuracy inherent to the image‐source model, which is even sufficient to calculate the Fourier transform, i.e., the steady‐state transmission function of the room, or to convolve the impulse response with sound signals.
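
Once the visible image sources have been sieved out, constructing the impulse response is the straightforward step. A hedged sketch under simplifying assumptions (ideal broadband reflections, spherical spreading only, no air absorption; names are illustrative):

```python
import numpy as np

fs, c = 44100, 343.0                         # sample rate (Hz), speed of sound (m/s)

def impulse_response(image_sources, receiver, T=1.0):
    """image_sources: iterable of (position, cumulative reflection factor)."""
    h = np.zeros(int(T * fs))
    for pos, refl in image_sources:
        r = np.linalg.norm(np.asarray(pos, float) - np.asarray(receiver, float))
        n = int(round(fs * r / c))           # propagation delay in samples
        if n < len(h):
            h[n] += refl / r                 # 1/r spherical spreading
    return h

# direct sound plus one first-order image source
h = impulse_response([((3, 4, 0), 1.0), ((7, 4, 0), 0.7)], (0, 0, 0))
```

The steady-state transmission function mentioned in the abstract then follows from np.fft.rfft(h), and convolving h with a dry signal auralizes the room.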

Journal ArticleDOI
TL;DR: The effects of primary-tone separation on the amplitude of distortion-product emissions (DPEs) at the 2f1-f2 frequency were systematically examined and a principal outcome reflected in the detailed structure of both average and individual ratio functions was a nonmonotonic change in DPE amplitude as the ratio of f2/f1 increased.
Abstract: The effects of primary-tone separation on the amplitude of distortion-product emissions (DPEs) at the 2f1-f2 frequency were systematically examined in ten ears of five subjects. All individuals had normal hearing and middle-ear function based upon standard clinical measures. Acoustic-distortion products were elicited at 1, 2.5, and 4 kHz by equilevel primaries at 65, 75, and 85 dB SPL, while f2/f1 ratios were varied in 0.02 increments from 1.01-1.41 (4 kHz), 1.01-1.59 (2.5 kHz), or 1.01-1.79 (1 kHz). A principal outcome reflected in the detailed structure of both average and individual ratio functions was a nonmonotonic change in DPE amplitude as the ratio of f2/f1 increased. Despite the presence of amplitude nonmonotonicities, there was clearly a region of f1 and f2 separation that generated a maximum DPE. The effects of primary-tone separation on DPE amplitudes were systematically related to DPE frequency and primary-tone level. For all three levels of stimulation, the f2/f1 ratio was inversely related to DPE frequency. Thus larger ratios reflecting a greater separation of f1 and f2 were more effective in generating DPEs at 1 kHz rather than at 4 kHz. The optimal ratio for 2.5 kHz fell at an intermediate value. Conversely, acoustic distortion-product amplitude as a function of primary-tone level was directly related to the frequency separation of the primary tones. Regardless of the frequency region of the primary tones, smaller f2/f1 ratios were superior in generating DPEs in response to 65-dB stimuli, whereas larger ratios elicited bigger DPEs with primaries at 75 and 85 dB SPL. Within any specific stimulus-parameter combination, individual variability in DPE amplitude was noted. When all stimulus conditions describing the variations in frequency and level were considered, an f2/f1 ratio of 1.22 was most effective in maximizing DPE amplitude.
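
To see what the optimal ratio implies in practice (illustrative arithmetic, not a stimulus set from the study): with f2/f1 = 1.22, the emission frequency is

$$2f_1 - f_2 = (2 - 1.22)\,f_1 = 0.78\,f_1,$$

so placing the distortion product at 1 kHz requires f1 ≈ 1282 Hz and f2 ≈ 1564 Hz.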

PatentDOI
TL;DR: An earmold, and a method of manufacturing it, for a hearing aid that conveys amplified sound from the hearing aid through the ear canal to a closed cavity adjacent to the tympanic membrane, while leaving the canal open.
Abstract: An earmold and a method of manufacturing an earmold for a hearing aid that conveys amplified sound from the hearing aid into the ear canal to a closed cavity adjacent the tympanic membrane. The earmold includes an acoustic conduction tube having an external diameter smaller than the ear canal and a flexible flanged tip that exerts negligible pressure on the wall of the canal. One end of the tube is held in place in the canal by the flanged tip. The opposite end of the tube may be positioned in the ear aperture by a fitting in the ear concha that may be integral with the tube. The hearing aid and the earmold leave the canal open preferably to a point past the canal isthmus.

Journal ArticleDOI
TL;DR: Compared to normal geriatric adults, Parkinsonian speakers had reduced durations of vocalic segments, reduced formant transitions, and increased voice onset time; these effects were greater for the more severe dysarthrics and were most apparent in the more complex vocalic gestures.
Abstract: Acoustic and kinematic analyses, as well as perceptual evaluation, were conducted on the speech of Parkinsonian and normal geriatric adults. As a group, the Parkinsonian speakers had very limited jaw movement compared to the normal geriatrics. For opening gestures, jaw displacements and velocities produced by the Parkinsonian subjects were about half those produced by the normal geriatrics. Lower lip movement amplitude and velocity also were reduced for the Parkinsonian speakers relative to the normal geriatrics, but the magnitude of the reduction was not as great as that seen in the jaw. Lower lip closing velocities expressed as a function of movement amplitude were greater for the Parkinsonian speakers than for the normal geriatrics. This increased velocity of lower lip movement may reflect a difference in the control of lip elevation for the Parkinsonian speakers, an effect that increased with the severity of dysarthria. Acoustically, the Parkinsonian subjects had reduced durations of vocalic segments, reduced formant transitions, and increased voice onset time compared to the normal geriatrics. These effects were greater for the more severe, compared to the milder, dysarthrics and were most apparent in the more complex, vocalic gestures.

PatentDOI
TL;DR: In this paper, a method for deriving acoustic word representations for use in speech recognition is presented, which involves using dynamic programming to derive a corresponding initial sequence of probabilistic acoustic sub-models for the word independently of any previously derived acoustic model particular to the word.
Abstract: A method is provided for deriving acoustic word representations for use in speech recognition. Initial word models are created, each formed of a sequence of acoustic sub-models. The acoustic sub-models from a plurality of word models are clustered, so as to group acoustically similar sub-models from different words, using, for example, the Kullback-Leibler information as a metric of similarity. Then each word is represented by a cluster spelling representing the clusters into which its acoustic sub-models were placed by the clustering. Speech recognition is performed by comparing sequences of frames from speech to be recognized against sequences of acoustic models associated with the clusters of the cluster spelling of individual word models. The invention also provides a method for deriving a word representation which involves receiving a first set of frame sequences for a word, using dynamic programming to derive a corresponding initial sequence of probabilistic acoustic sub-models for the word independently of any previously derived acoustic model particular to the word, using dynamic programming to time align each of a second set of frame sequences for the word into a succession of new sub-sequences corresponding to the initial sequence of models, and using these new sub-sequences to calculate new probabilistic sub-models.
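
A hedged sketch of the clustering step: treat each sub-model as a discrete output distribution, measure similarity by symmetrized Kullback-Leibler divergence, and greedily merge the closest pair until no pair is close enough. The greedy scheme and the threshold are assumptions; the patent only names KL information as one possible metric.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence; distributions must be strictly positive."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def sym_kl(p, q):
    return kl(p, q) + kl(q, p)

def cluster(models, threshold):
    """Greedy agglomeration of sub-model output distributions."""
    clusters = [[m] for m in models]
    centers = [np.asarray(m, float) for m in models]
    while len(clusters) > 1:
        dists = [(sym_kl(centers[i], centers[j]), i, j)
                 for i in range(len(centers)) for j in range(i + 1, len(centers))]
        best, i, j = min(dists)
        if best > threshold:
            break
        clusters[i] += clusters[j]
        centers[i] = np.mean([np.asarray(m, float) for m in clusters[i]], axis=0)
        del clusters[j], centers[j]
    return clusters

# the first two distributions merge; the third stays its own cluster
print(cluster([[0.7, 0.2, 0.1], [0.65, 0.25, 0.1], [0.1, 0.2, 0.7]], 0.5))
```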

Journal ArticleDOI
TL;DR: The crevice model for heterogeneous nucleation of bubbles in water in response to a decreasing liquid pressure is studied in this paper, where it is argued that previous work has overlooked the essential requirement of unstable growth of the interface in the crevice.
Abstract: The crevice model for heterogeneous nucleation of bubbles in water in response to a decreasing liquid pressure is studied. The model neglects gas‐diffusion effects and is therefore more suited for acoustic than for flow cavitation. It is argued that previous work has overlooked the essential requirement of unstable growth of the interface in the crevice. As a consequence, the available results are incorrect in some cases. Another feature of the model which is considered is the process by which the interface moves out of the crevice. It is concluded that, depending on circumstances, the conditions for this step may be more stringent than those for the initial expansion of the nucleus inside the crevice. Some numerical examples are given to illustrate the complex behavior of nuclei, depending on geometrical parameters, gas saturation, contact angles, and other quantities.

Journal ArticleDOI
TL;DR: The results indicate the types of horizontal and vertical spatial information that are available from sound level cues over various ranges of frequency and, within a small subject population, indicate the nature of intersubject variability.
Abstract: Changes in sound pressures measured in the ear canal are reported for broadband sound sources positioned at various locations about the subject. These location-dependent pressures are one source of acoustical cues for sound localization by human listeners. Sound source locations were tested with horizontal and vertical resolution of 10 degrees. Sound levels were measured with miniature microphones placed inside the two ear canals. Although the measured amplitude spectra varied with the position of the microphone in the ear canal, it is shown that the directional sensitivity at any particular frequency of the broadband stimulus is independent of microphone position anywhere within the ear canal. At any given frequency, the distribution of sound pressures as a function of sound source location formed a characteristic spatial pattern comprising one or two discrete areas from which sound sources produced maximum levels in the ear canal. The locations of these discrete areas varied in horizontal and vertical location according to sound frequency. For example, around 8 kHz, two areas of maximum sensitivity typically were found that were located laterally and were separated from each other vertically, whereas, around 12 kHz, two such areas were found located on the horizontal plane and separated horizontally. The spatial patterns of sound levels were remarkably similar among different subjects, although some frequency scaling was required to accommodate for differences in the subjects' physical sizes. Interaural differences in sound-pressure level (ILDs) at frequencies below about 8 kHz tended to increase monotonically with increasing distance of the sound source from the frontal midline and tended to be relatively constant as a function of vertical source location. At higher frequencies, however, ILDs varied both with the horizontal and with the vertical location of the sound source. At some frequencies, asymmetries between the left and right ears in a given subject resulted in substantial ILDs even for midline sound sources. These results indicate the types of horizontal and vertical spatial information that are available from sound level cues over various ranges of frequency and, within a small subject population, indicate the nature of intersubject variability.
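
The interaural level differences reported here have, at each frequency f, the usual definition in terms of the left- and right-ear canal pressures:

$$\mathrm{ILD}(f) = 20\log_{10}\frac{|P_L(f)|}{|P_R(f)|}\ \text{dB},$$

which is why the ILD patterns inherit the frequency- and direction-dependent structure of the two ears' pressure measurements.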

PatentDOI
TL;DR: In this article, a process of speech synthesis from diphones stored in a dictionary as waveforms, for text-to-speech conversion, comprises supplying a sequence of phoneme codes and respective prosodic information.
Abstract: A process of speech synthesis from diphones stored in a dictionary as waveforms, for text-to-speech conversion, comprises supplying a sequence of phoneme codes and respective prosodic information, and, for each phoneme, analyzing and synthesizing each phoneme, and then concatenating the synthesized phonemes. For each phoneme, two diphones are selected among the stored diphones and the presence of voicing is determined. For voiced phonemes, the respective waveforms of the two diphones constituting the phoneme are filtered by a window which is centered on a point of the selected waveform representative of the beginning of a pulse response of vocal cords to excitation thereof. The window has a width substantially equal to twice the greater of the original fundamental period and the fundamental synthesis period and has an amplitude progressively decreasing from the center of the window. The signals resulting from the filtering and obtained for each diphone are time shifted so as to be spaced apart by a time equal to the fundamental synthesis period. Synthesis is achieved by adding the displaced overlapping signals.
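
The pitch modification at the heart of this synthesis process can be sketched compactly: window each period around a pulse mark, with a window twice the larger of the two periods, then overlap-add the grains at the synthesis period. This is a minimal illustration under simplifying assumptions (fixed periods, precomputed pulse marks, a Hanning window), not the patented implementation.

```python
import numpy as np

def psola(x, marks, t0_orig, t0_synth):
    """x: voiced waveform; marks: sample indices of glottal-pulse onsets;
    t0_orig / t0_synth: original / desired fundamental period in samples."""
    wlen = 2 * max(t0_orig, t0_synth)        # window twice the greater period
    win = np.hanning(wlen)                   # amplitude decreasing from center
    marks = list(marks)
    out = np.zeros(len(marks) * t0_synth + wlen)
    t_out = 0
    for m in marks:
        a = m - wlen // 2                    # grain centered on the pulse mark
        grain = np.zeros(wlen)
        lo, hi = max(a, 0), min(a + wlen, len(x))
        grain[lo - a:hi - a] = x[lo:hi]
        out[t_out:t_out + wlen] += grain * win   # overlap-add at new spacing
        t_out += t0_synth                    # grains spaced by synthesis period
    return out

# lower a 100-Hz voice (period 100 samples at fs = 10 kHz) to about 83 Hz
fs = 10000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t)
y = psola(x, marks=range(0, len(x) - 100, 100), t0_orig=100, t0_synth=120)
```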

PatentDOI
Willem Bastiaan Kleijn
TL;DR: A parameter communication arrangement where a parameter that is transmitted over a channel using m-bit codewords or labels is quantized before transmission as one of only p levels, where, significantly, p
Abstract: A parameter communication arrangement where a parameter that is transmitted over a channel using m-bit codewords or labels is quantized before transmission as one of only p levels, where, significantly, p

PatentDOI
TL;DR: In this paper, an apparatus and method for correctly pronouncing proper names from text using a computer provides a dictionary which performs an initial search for the name, if the name is not in the dictionary, it is sent to a filter which either positively identifies a single language group or eliminates one or more language groups as the language group of origin for that word.
Abstract: An apparatus and method for correctly pronouncing proper names from text using a computer provides a dictionary which performs an initial search for the name. If the name is not in the dictionary, it is sent to a filter which either positively identifies a single language group or eliminates one or more language groups as the language group of origin for that word. When the filter cannot positively identify the language group of origin for the name, a list of possible language groups is sent to a grapheme analyzer which precedes a trigram analyzer. Using grapheme analysis, the most probable language group of origin for the name is determined and sent to a language-sensitive letter-to-sound section. In this section, the name is compared with language-sensitive rules to provide accurate phonemics and stress information for the name. The phonemics (including stress information) are sent to a voice realization unit for audio output of the name.
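
A hedged sketch of the trigram-analysis stage: train per-language trigram statistics on example names, then score an unknown name by summed smoothed log-probabilities and pick the best-scoring language group. The training lists, smoothing, and boundary-padding scheme are toy assumptions.

```python
import math
from collections import Counter

def train(names):
    """Trigram counts with boundary padding from example names."""
    counts = Counter()
    for w in names:
        w = f"##{w.lower()}#"
        counts.update(w[i:i + 3] for i in range(len(w) - 2))
    return counts, sum(counts.values())

def score(name, model, vocab=26 ** 3):
    counts, total = model
    w = f"##{name.lower()}#"
    # add-one smoothing so unseen trigrams do not zero out the score
    return sum(math.log((counts[w[i:i + 3]] + 1) / (total + vocab))
               for i in range(len(w) - 2))

models = {"irish": train(["murphy", "kelly", "o'brien"]),
          "italian": train(["rossi", "russo", "esposito"])}
print(max(models, key=lambda lang: score("rossini", models[lang])))  # italian
```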

Journal ArticleDOI
TL;DR: Although the number of data points from which a new relationship was inferred more than tripled, the 1978 relationship still provides a reasonable fit to the data.
Abstract: More than a decade has passed since a relationship between community noise exposure and the prevalence of annoyance was synthesized by Schultz [T. J. Schultz, J. Acoust. Soc. Am. 64, 377–405 (1978)] from the findings of a dozen social surveys. This quantitative dosage–effect relationship has been adopted as a standard means for predicting noise‐induced annoyance in environmental assessment documents. The present effort updates the 1978 relationship with findings of social surveys conducted since its publication. Although the number of data points from which a new relationship was inferred more than tripled, the 1978 relationship still provides a reasonable fit to the data.

Journal ArticleDOI
TL;DR: In this article, a holographic process is presented based on numerical methods that work for source surfaces or measurement surfaces that may have an arbitrary shape, and it is shown that this process can be used for nearfield acoustic holography.
Abstract: Nearfield acoustic holography has proven to be a useful tool for studying sound radiation. However, the analytic formulation and all current implementations of the technique require that the measurement and reconstruction surfaces be level surfaces of a separable coordinate system. In this article, a holographic process is presented based on numerical methods that work for source surfaces or measurement surfaces that may have an arbitrary shape.

Journal ArticleDOI
TL;DR: The only marked effect of the interaural difference in overall presentation level is a reduction of the gain due to ILD when the level at the ear with the better S/N ratio is decreased, which implies that an optimal monaural hearing aid (with a moderate gain) will hardly interfere with unmasking through ITD, while it may increase the gain from ILD by preventing or diminishing threshold effects.
Abstract: The effect of head-induced interaural time delay (ITD) and interaural level differences (ILD) on binaural speech intelligibility in noise was studied for listeners with symmetrical and asymmetrical sensorineural hearing losses. The material, recorded with a KEMAR manikin in an anechoic room, consisted of speech, presented from the front (0°), and noise, presented at azimuths of 0°, 30°, and 90°. Derived noise signals, containing either only ITD or only ILD, were generated using a computer. For both groups of subjects, speech-reception thresholds (SRT) for sentences in noise were determined as a function of: (1) noise azimuth; (2) binaural cue; and (3) an interaural difference in overall presentation level, simulating the effect of a monaural hearing aid. Comparison of the mean results with corresponding data obtained previously from normal-hearing listeners shows that the hearing impaired have a 2.5 dB higher SRT in noise when both speech and noise are presented from the front, and 2.6-5.1 dB less binaural gain when the noise azimuth is changed from 0° to 90°. The gain due to ILD varies among the hearing-impaired listeners between 0 dB and normal values of 7 dB or more. It depends on the high-frequency hearing loss at the side presented with the most favorable signal-to-noise (S/N) ratio. The gain due to ITD is nearly normal for the symmetrically impaired (4.2 dB, compared with 4.7 dB for the normal hearing), but only 2.5 dB in the case of asymmetrical impairment. When ITD is introduced in noise already containing ILD, the resulting gain is 2-2.5 dB for all groups. The only marked effect of the interaural difference in overall presentation level is a reduction of the gain due to ILD when the level at the ear with the better S/N ratio is decreased. This implies that an optimal monaural hearing aid (with a moderate gain) will hardly interfere with unmasking through ITD, while it may increase the gain due to ILD by preventing or diminishing threshold effects.

Journal ArticleDOI
TL;DR: It was observed that the acoustic durations of bilabial stops were shortened, whereas stressed vowels were lengthened during loud speech production, and two interpretations of the data are offered.
Abstract: A comparison was made between normal and loud productions of bilabial stops and stressed vowels. Simultaneous recordings of lip and jaw movement and the accompanying audio signal were made for four native speakers of Swedish. The stimuli consisted of 12 Swedish vowels appearing in an /i’b_b/ frame and were produced with both normal and increased vocal effort. The displacement, velocity, and relative timing associated with the individual articulators as well as their coarticulatory interactions were studied together with changes in acoustic segmental duration. It is shown that the production of loud as compared with normal speech is characterized by amplification of normal movement patterns that are predictable for the above articulatory parameters. In addition, it was observed that the acoustic durations of bilabial stops were shortened, whereas stressed vowels were lengthened during loud speech production. Two interpretations of the data are offered, viewing loud articulatory behavior as a response to production demands and perceptual constraints, respectively.

Journal ArticleDOI
TL;DR: In this article, a set of piezopolymer devices based on a composite laminate theory for piezoelectric polymer materials was developed, which exhibited both bending and torsion deformation under an applied electric field.
Abstract: A set of piezopolymer devices has been developed based on a composite laminate theory for piezoelectric polymer materials. By using different combinations of ply angles and electrode patterns, a piezopolymer/metal shim plate structure was built that exhibited both bending and torsion deformation under an applied electric field. A set of torsion‐beam sensor structures was also built that could distinguish between bending and torsion or between different vibration modes. These devices were based on a general theory of piezoelectric laminates. The experimental results agreed quite closely with the theoretical predictions. These integrated sensor–actuator devices may find application in the control of microactuators or may be used for modal control of larger continuous structures.

PatentDOI
Lynn D. Wilcox, A. Lawrence Spitz
TL;DR: Prior to character or phoneme recognition, a classifier provides a respective probability list for each of a sequence of sample characters or phonemes, each probability list indicating the respective sample's probability for each character or phoneme type.
Abstract: Prior to character or phoneme recognition, a classifier provides a respective probability list for each of a sequence of sample characters or phonemes, each probability list indicating the respective sample's probability for each character or phoneme type. These probability lists are clustered in character or phoneme probability space, in which each dimension corresponds to the probability that a character or phoneme candidate is an instance of a specific character or phoneme type. For each resulting cluster, data is stored indicating its cluster ID and a probability list indicating the probability of each type at the cluster's center. Then, during recognition, a probability cluster identifier compares the probability list for each candidate with the probability list for each cluster to find the nearest cluster. The cluster identifier then provides the nearest cluster's cluster ID to a constraint satisfier that attempts to recognize the candidate based on rules, patterns, or a combination of rules and patterns. If necessary, the constraint satisfier uses the cluster ID to retrieve the stored probability list of the cluster to assist it in recognition.
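
A minimal sketch of the recognition-time lookup: treat each stored cluster center as a point in character/phoneme probability space and return the ID of the nearest center for a candidate's probability list. Euclidean distance is an assumption; the patent does not commit to a particular metric here.

```python
import numpy as np

def nearest_cluster(prob_list, centers):
    """centers: dict mapping cluster ID -> center probability list."""
    p = np.asarray(prob_list, float)
    return min(centers,
               key=lambda cid: np.linalg.norm(p - np.asarray(centers[cid], float)))

centers = {"c1": [0.8, 0.1, 0.1], "c2": [0.1, 0.8, 0.1]}
print(nearest_cluster([0.7, 0.2, 0.1], centers))   # -> "c1"
```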

PatentDOI
TL;DR: A method of inputting Chinese characters into a computer directly from Mandarin speech, which recognizes a series of monosyllables by separately recognizing the base syllables and the Mandarin tones using Hidden Markov Models and assembling the recognized parts into monosyllables.
Abstract: A method of inputting Chinese characters into a computer directly from Mandarin speech which recognizes a series of monosyllables by separately recognizing base syllables and Mandarin tones using Hidden Markov Models and assembling the recognized parts into monosyllables. Each recognized monosyllable is used by a Markov Chinese Language Model in a linguistic decoder section to determine the corresponding Chinese character. Also disclosed is a Mandarin dictation machine which uses the above method, employing a speech input device to receive the Mandarin speech and digitize it so a personal computer can further process that information. A pitch frequency detector, a voice signal pre-processing unit, a Hidden Markov Model processor, and a training facility are all attached to the personal computer to perform their associated functions of the method above.

PatentDOI
Masafumi Nishimura
TL;DR: A speech recognition system measures the values of at least two classes of features of an utterance: (1) a first class whose value is related to the frequency spectrum of the utterance, and (2) a second class whosevalue isrelated to the variation with time of the "first class" value of the uttered utterance.
Abstract: A speech recognition system measures the values of at least two classes of features of an utterance: (1) a first class whose value is related to the frequency spectrum of the utterance, and (2) a second class whose value is related to the variation with time of the "first class" value of the utterance. Word baseforms are constructed from Markov model baseform units. Each output-producing transition of a baseform unit produces outputs from both classes. However, for each output-producing transition, the probabilities of producing outputs from the first class are independent of the probabilities of producing outputs from the second class.
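
The stated independence means the joint output probability on each output-producing transition t factors over the two feature classes:

$$P(o_1, o_2 \mid t) = P(o_1 \mid t)\,P(o_2 \mid t),$$

where o1 is the spectrum-derived output and o2 the output derived from its variation with time.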