
Showing papers in "Journal of the Acoustical Society of America in 1993"


Journal ArticleDOI
TL;DR: The voice parameters affected by emotion are found to be of three main types: voice quality, utterance timing, and utterance pitch contour.
Abstract: There has been considerable research into perceptible correlates of emotional state, but a very limited amount of the literature examines the acoustic correlates and other relevant aspects of emotion effects in human speech; in addition, the vocal emotion literature is almost totally separate from the main body of speech analysis literature. A discussion of the literature describing human vocal emotion and its principal findings is presented. The voice parameters affected by emotion are found to be of three main types: voice quality, utterance timing, and utterance pitch contour. These parameters are described both in general and in detail for a range of specific emotions. Current speech synthesizer technology is such that many of the parameters of human speech affected by emotion could be manipulated systematically in synthetic speech to produce a simulation of vocal emotion; application of the literature to construction of a system capable of producing synthetic speech with emotion is discussed.

1,063 citations


Journal ArticleDOI
TL;DR: Data suggest that while the interaural cues to horizontal location are robust, the spectral cues considered important for resolving location along a particular cone-of-confusion are distorted by a synthesis process that uses nonindividualized HRTFs.
Abstract: A recent development in human-computer interfaces is the virtual acoustic display, a device that synthesizes three-dimensional, spatial auditory information over headphones using digital filters constructed from head-related transfer functions (HRTFs). The utility of such a display depends on the accuracy with which listeners can localize virtual sound sources. A previous study [F. L. Wightman and D. J. Kistler, J. Acoust. Soc. Am. 85, 868-878 (1989)] observed accurate localization by listeners for free-field sources and for virtual sources generated from the subjects' own HRTFs. In practice, measurement of the HRTFs of each potential user of a spatial auditory display may not be feasible. Thus, a critical research question is whether listeners can obtain adequate localization cues from stimuli based on nonindividualized transforms. Here, inexperienced listeners judged the apparent direction (azimuth and elevation) of wideband noise bursts presented in the free-field or over headphones; headphone stimuli were synthesized using HRTFs from a representative subject of Wightman and Kistler. When confusions were resolved, localization of virtual sources was quite accurate and comparable to the free-field sources for 12 of the 16 subjects. Of the remaining subjects, 2 showed poor elevation accuracy in both stimulus conditions, and 2 showed degraded elevation accuracy with virtual sources. Many of the listeners also showed high rates of front-back and up-down confusions that increased significantly for virtual sources compared to the free-field stimuli. These data suggest that while the interaural cues to horizontal location are robust, the spectral cues considered important for resolving location along a particular cone-of-confusion are distorted by a synthesis process that uses nonindividualized HRTFs.
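
As a rough sketch of the synthesis process described above, a virtual source can be rendered by convolving a mono signal with the left- and right-ear head-related impulse responses (the time-domain form of the HRTFs). The impulse responses below are placeholders; in the study they come from measurements on a representative subject.

```python
import numpy as np
from scipy.signal import fftconvolve

def synthesize_virtual_source(mono, hrir_left, hrir_right):
    """Render a mono signal over headphones via an HRIR pair."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right])          # 2 x N binaural signal

fs = 44100
burst = np.random.randn(fs // 4)            # a wideband noise burst, 250 ms
hrir_l = np.zeros(256); hrir_l[10] = 1.0    # placeholder HRIRs; real ones are
hrir_r = np.zeros(256); hrir_r[25] = 0.6    # measured in-ear, per listener
binaural = synthesize_virtual_source(burst, hrir_l, hrir_r)
```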

910 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose a wave front synthesis method based on the Kirchhoff–Helmholtz integral, in which the wave fields of the sound sources on stage are measured by directive microphones, electronically extrapolated away from the stage, and finally re-emitted in the hall by one or more loudspeaker arrays.
Abstract: The acoustics in auditoria are determined by the properties of both the direct sound and the later arriving reflections. If electroacoustic means are used to repair disturbing deficiencies in the acoustics, one has to cope with unfavorable side effects such as localization problems and artificial impressions of the reverberant field (electronic flavor). To avoid those side effects, the concept of electroacoustic wave front synthesis is introduced. The underlying theory is based on the Kirchhoff–Helmholtz integral. In this new concept the wave fields of the sound sources on stage are measured by directive microphones; next they are electronically extrapolated away from the stage, and finally they are re‐emitted in the hall by one or more loudspeaker arrays. The proposed system aims at emitting wave fronts that are as close as possible to the real wave fields. Theoretically, there need not be any differences between the electronically generated wave fields and the real wave fields. By using the image source concept, reflections can be generated in the same way as direct sound.
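
In the simplest reading of this concept, each loudspeaker in the array re-emits the measured source signal with the delay and spherical-spreading attenuation appropriate to the virtual source position. The rigorous driving functions follow from the Kirchhoff–Helmholtz integral, so the delay-and-gain form below is an illustrative simplification, not the paper's exact operator.

```python
import numpy as np

C = 343.0  # speed of sound, m/s

def driving_signals(src_signal, fs, src_pos, speaker_positions):
    """Delay-and-gain driving signals for a virtual point source."""
    out = []
    for spk in speaker_positions:
        r = np.linalg.norm(np.asarray(spk) - np.asarray(src_pos))
        delay = int(round(fs * r / C))        # propagation delay in samples
        gain = 1.0 / max(r, 1e-3)             # 1/r spherical spreading loss
        sig = np.zeros(len(src_signal) + delay)
        sig[delay:] = gain * src_signal
        out.append(sig)
    n = max(len(s) for s in out)
    return np.stack([np.pad(s, (0, n - len(s))) for s in out])

# 8-speaker linear array, virtual source 3 m behind the stage edge:
speakers = [(x, 0.0) for x in np.linspace(-2, 2, 8)]
drive = driving_signals(np.random.randn(4800), 48000, (0.0, -3.0), speakers)
```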

865 citations


Journal ArticleDOI
TL;DR: A split-step Pade solution is derived for the parabolic equation (PE) method that remains valid for problems involving very wide propagation angles, large depth variations in the properties of the waveguide, and elastic ocean bottoms.
Abstract: A split‐step Pade solution is derived for the parabolic equation (PE) method. Higher‐order Pade approximations are used to reduce both numerical errors and asymptotic errors (e.g., phase errors due to wide‐angle propagation). This approach is approximately two orders of magnitude faster than solutions based on Pade approximations that account for asymptotic errors but not numerical errors. In contrast to the split‐step Fourier solution, which achieves similar efficiency for some problems, the split‐step Pade solution is valid for problems involving very wide propagation angles, large depth variations in the properties of the waveguide, and elastic ocean bottoms. The split‐step Pade solution is practical for global‐scale problems.
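
The gain from Pade over Taylor approximations of the square-root propagation operator sqrt(1 + X) is easy to see numerically: the Pade form stays accurate well outside the Taylor radius of convergence, which is what admits wide propagation angles. The sketch below shows only this approximation idea, not the paper's solution scheme.

```python
import numpy as np
from scipy.interpolate import pade
from scipy.special import binom

n_terms = 8
taylor = np.array([binom(0.5, k) for k in range(n_terms)])  # sqrt(1+x) series
p, q = pade(taylor, 4)              # [3/4] Pade approximant

x = 2.0                             # well outside the Taylor radius |x| < 1
exact = np.sqrt(1.0 + x)
taylor_val = np.polyval(taylor[::-1], x)
pade_val = p(x) / q(x)
print(exact, taylor_val, pade_val)  # Pade stays close; truncated Taylor does not
```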

682 citations


Journal ArticleDOI
TL;DR: The results of the present experiments suggest that variability plays an important role in perceptual learning and robust category formation, and that listeners develop talker-specific, context-dependent representations for new phonetic categories by selectively shifting attention toward the contrastive dimensions of the non-native phonetic categories.
Abstract: Two experiments were carried out to extend Logan et al.’s recent study [J. S. Logan, S. E. Lively, and D. B. Pisoni, J. Acoust. Soc. Am. 89, 874–886 (1991)] on training Japanese listeners to identify English /r/ and /l/. Subjects in experiment 1 were trained in an identification task with multiple talkers who produced English words containing the /r/–/l/ contrast in initial singleton, initial consonant clusters, and intervocalic positions. Moderate, but significant, increases in accuracy and decreases in response latency were observed between pretest and posttest and during training sessions. Subjects also generalized to new words produced by a familiar talker and novel words produced by an unfamiliar talker. In experiment 2, a new group of subjects was trained with tokens from a single talker who produced words containing the /r/–/l/ contrast in five phonetic environments. Although subjects improved during training and showed increases in pretest–posttest performance, they failed to generalize to tokens produced by a new talker. The results of the present experiments suggest that variability plays an important role in perceptual learning and robust category formation. During training, listeners develop talker‐specific, context‐dependent representations for new phonetic categories by selectively shifting attention toward the contrastive dimensions of the non‐native phonetic categories. Phonotactic constraints in the native language, similarity of the new contrast to distinctions in the native language, and the distinctiveness of contrastive cues all appear to mediate category acquisition.

637 citations


Journal ArticleDOI
Jean-Claude Junqua1
TL;DR: Both acoustic and perceptual analyses suggest that the influence of the Lombard effect on male and female speakers is different and bring to light that, even if some tendencies across speakers can be observed consistently, the Lombard reflex is highly variable from speaker to speaker.
Abstract: Automatic speech recognition experiments show that, depending on the task performed and how speech variability is modeled, automatic speech recognizers are more or less sensitive to the Lombard reflex. To gain an understanding about the Lombard effect with the prospect of improving performance of automatic speech recognizers, (1) an analysis was made of the acoustic‐phonetic changes occurring in Lombard speech, and (2) the influence of the Lombard effect on speech perception was studied. Both acoustic and perceptual analyses suggest that the influence of the Lombard effect on male and female speakers is different. The analyses also bring to light that, even if some tendencies across speakers can be observed consistently, the Lombard reflex is highly variable from speaker to speaker. Based on the results of the acoustic and perceptual studies, some ways of dealing with Lombard speech variability in automatic speech recognition are also discussed.

482 citations


Journal ArticleDOI
TL;DR: In this article, a theoretically optimal multichannel receiver for intersymbol interference communication channels is derived, and its suboptimal versions with linear and decision feedback equalizer are presented.
Abstract: A theoretically optimal multichannel receiver for intersymbol interference communication channels is derived, and its suboptimal versions with linear and decision feedback equalizer are presented. A practical receiver based on any of these structures encounters difficulties in the underwater acoustic channels in which the extended time‐varying multipath is accompanied by phase instabilities. A receiver that overcomes these problems by jointly performing adaptive mean‐squared error diversity combining, multichannel carrier‐phase synchronization and decision feedback equalization is proposed. Its performance is demonstrated on the experimental telemetry data from deep and shallow water long‐range acoustic channels. Presented results indicate superior quality of coherent PSK and QAM reception obtained through joint equalization of very few channels.
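
A single branch of such a receiver can be sketched as an adaptive decision feedback equalizer trained by LMS (the paper's receiver additionally performs multichannel diversity combining and carrier-phase synchronization, which are omitted here). A minimal BPSK sketch under those simplifying assumptions:

```python
import numpy as np

def dfe_lms(received, training=None, n_ff=8, n_fb=4, mu=0.01, n_train=500):
    """LMS-adapted decision feedback equalizer for real BPSK symbols."""
    ff = np.zeros(n_ff)                 # feedforward taps over received samples
    fb = np.zeros(n_fb)                 # feedback taps over past decisions
    past = np.zeros(n_fb)
    decisions = []
    for k in range(n_ff, len(received)):
        x = received[k - n_ff:k][::-1]
        y = ff @ x - fb @ past          # equalizer output
        # decision-directed after the training prefix is exhausted
        d = training[k] if (training is not None and k < n_train) else np.sign(y)
        e = d - y
        ff += mu * e * x                # stochastic-gradient (LMS) updates
        fb -= mu * e * past
        past = np.roll(past, 1); past[0] = d
        decisions.append(d)
    return np.array(decisions)
```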

454 citations


Journal ArticleDOI
TL;DR: The difficulties of interpretation of neonatal tympanograms are shown to be a consequence of ear-canal wall vibration, and impedance and reflectance measurements in the 2-4-kHz range are recommended as a potentially useful clinical tool for circumventing these difficulties.
Abstract: The ear-canal impedance and reflection coefficient were measured in an adult group and in groups of infants of age 1, 3, 6, 12, and 24 months over the frequency range 125-10,700 Hz. The development of the external ear canal and middle ear strongly affects input impedance and reflection coefficient responses, and this development is not yet complete at age 24 months. Contributing factors include growth of the area and length of the ear canal, a resonance in the ear-canal walls of younger infants, and a probable influence of growth of the middle-ear cavities. The middle-ear compliance is lower in infants than adults, and the middle-ear resistance is higher. The power transfer into the middle ear of the infant is much less than into that of the adult. Such differences in power transfer directly influence both behavioral and physiological measurements of hearing. The difficulties of interpretation of neonatal tympanograms are shown to be a consequence of ear-canal wall vibration. Impedance and reflectance measurements in the 2-4-kHz range are recommended as a potentially useful clinical tool for circumventing these difficulties.
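
For reference, the reflectance quantities reported here follow from the measured ear-canal impedance in the standard way; a minimal sketch, where Z0 = rho*c/S is the characteristic impedance of a canal of cross-sectional area S:

```python
import numpy as np

def power_reflectance(Z, Z0):
    R = (Z - Z0) / (Z + Z0)    # pressure reflection coefficient
    return np.abs(R) ** 2      # fraction of incident power reflected

# power absorbed by the middle ear is then 1 - power_reflectance(Z, Z0)
```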

400 citations


Journal ArticleDOI
TL;DR: In this article, the authors compared two different formulations for calculating the total acoustic power radiated by a structure, in terms of the amplitudes of the structural modes and the velocities of an array of elemental radiators on the surface of the structure.
Abstract: Two formulations for calculating the total acoustic power radiated by a structure are compared: one in terms of the amplitudes of the structural modes and the other in terms of the velocities of an array of elemental radiators on the surface of the structure. In both cases, the sound radiation due to the vibration of one structural mode or element is dependent on the vibration of other structural modes or elements. Either of these formulations can be used to describe the sound power radiation in terms of a set of velocity distributions on the structure whose sound power radiation is independent of the amplitudes of the other velocity distributions. These velocity distributions are termed ‘‘radiation modes.’’ Examples of the shapes and radiation efficiencies of these radiation modes are discussed in the cases of a baffled beam and a baffled panel. The implications of this formulation for the active control of sound radiation from structures are discussed. In particular, the radiation mode formulation can be used to provide an estimate of the number of independent parameters of the structural response which need to be measured and controlled to give a required attenuation of the radiated sound power.
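
A sketch of the elemental-radiator route to these radiation modes: for small elements on a rigid baffle, the radiation resistance matrix has the standard sin(kr)/(kr) form, and its eigenvectors are velocity distributions that radiate independently of one another, with eigenvalues proportional to their radiation efficiencies. Constants are dropped here, so only relative efficiencies are meaningful.

```python
import numpy as np

def radiation_modes(positions, k):
    """positions: (N, 2) element centers [m]; k: acoustic wavenumber [1/m]."""
    r = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    R = np.sinc(k * r / np.pi)        # np.sinc(x) = sin(pi x)/(pi x) -> sin(kr)/(kr)
    effs, modes = np.linalg.eigh(R)   # symmetric matrix: real eigendecomposition
    order = np.argsort(effs)[::-1]    # sort by radiation efficiency
    return effs[order], modes[:, order]

# 10 x 10 grid of elements on a 0.5-m square panel at 200 Hz:
xy = np.stack(np.meshgrid(np.linspace(0, .5, 10), np.linspace(0, .5, 10)),
              -1).reshape(-1, 2)
effs, modes = radiation_modes(xy, k=2 * np.pi * 200 / 343)
```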

391 citations


Journal ArticleDOI
TL;DR: In this article, the authors used methods that control for noise level and data quality to objectively evaluate the evidence on 22 personal and situational explanations for annoyance with environmental noise in residential areas.
Abstract: This study uses methods that control for noise level and data quality to objectively evaluate the evidence on 22 personal and situational explanations for annoyance with environmental noise in residential areas. The balance of the evidence from 464 findings drawn from 136 surveys suggests that annoyance is not affected to an important extent by ambient noise levels, the amount of time residents are at home, the type of interviewing method, or any of the nine demographic variables (age, sex, social status, income, education, home ownership, type of dwelling, length of residence, or receipt of benefits from the noise source). Annoyance is related to the amount of isolation from sound at home and to five attitudes (fear of danger from the noise source, noise prevention beliefs, general noise sensitivity, beliefs about the importance of the noise source, and annoyance with non‐noise impacts of the noise source). The evidence is too evenly divided to indicate whether changes in noise environments cause residents to be annoyed more, less, or about the same as would be expected in long‐established noise environments. The evidence shows that even at low noise levels (below DNL 55 dB), a small percentage are highly annoyed and that the extent of annoyance is related to noise exposure.

384 citations


Journal ArticleDOI
TL;DR: The S0 Lamb mode can propagate over distances of the order of 1 m in composite laminates and so has the potential to be used in long-range nondestructive inspection.
Abstract: The S0 Lamb mode can propagate over distances of the order of 1 m in composite laminates and so has the potential to be used in long‐range nondestructive inspection. This paper discusses the interaction of the S0 Lamb mode with delaminations. The dispersion curves and the corresponding stress and displacement mode shapes of the lower order Lamb modes are obtained analytically and the interaction of the S0 mode with delaminations at different interfaces in a composite laminate is then studied both by finite element analysis and by experiment. It is shown that the amplitude of the reflection of the S0 mode from a delamination is strongly dependent on the position of the delamination through the thickness of the laminate and that the delamination locations corresponding to the maximum and minimum reflectivity correspond to the locations of maximum and minimum shear stress across the interface in the S0 mode.

PatentDOI
TL;DR: In this article, an ultrasonic sensing system is incorporated into surgical instruments to monitor operational fields defined by distal ends of the instruments, and the direction for transmission and reception of ultrasonic energy is determined by acoustic lenses, angularly oriented transducer mounts or a combination of the two.
Abstract: Ultrasonic sensing systems are incorporated into surgical instruments to monitor operational fields defined by distal ends of the instruments. The instruments include proximal ends for their activation typically including one or a pair of handles which a surgeon grasps and operates, for example by squeezing the handles together or by pivotally moving a trigger portion of the handle relative to a fixed portion of the handle. Circuitry for performing ultrasonic sensing preferably is enclosed in housings defined within the handles of the proximal ends of the instruments. Wiring, preferably running through the instruments, connects the circuitry to transducers formed in or mounted on the distal ends of the surgical instruments. The transducers direct ultrasonic energy to the operational fields defined by the distal ends of the instruments and receive ultrasonic energy reflected from the operational fields. The direction for transmission and receipt of ultrasonic energy is determined by acoustic lenses, angularly oriented transducer mounts or a combination of the two. Signals representative of the tissue or contents of the operational fields of surgical instruments drive audible signal generators or preferably tactile transducers to inform the surgeon of the contents. Tactile transducers are mounted for access by the surgeon, preferably on the handles of the surgical instruments.

Journal ArticleDOI
TL;DR: Drawing on the measurements of Del Grosso and Mader, the dependence on temperature of the speed of sound in pure water is re-examined under ITS-90, and the change from the previous t68 scale to the t90 scale is found to be significant.
Abstract: In view of the adoption of the International Temperature Scale of 1990 (ITS‐90), which defines the International Celsius Temperatures, t90, the dependence on temperature of the speed of sound in pure water is examined. Drawing on the experimental data published previously [V. A. Del Grosso and C. W. Mader, ‘‘Speed of Sound in Pure Water,’’ J. Acoust. Soc. Am. 52, 1442–1446 (1972)], it is found that the change from the previous t68 scale is significant. At 100 °C, the difference between the two scales (t68−t90) is 0.026 °C, resulting in a difference of 0.022 m/s for the speed of sound. The speed of sound is fitted to a new fifth‐order polynomial applicable over the t90 range 0–100 °C.
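
A back-of-the-envelope sketch of the scale effect: converting the temperature argument shifts the tabulated sound speed by roughly (dc/dt)·(t90 − t68). The linear conversion below is a common approximation over 0–100 °C (an assumption, not from the paper), and the slope near 100 °C is inferred from the abstract's own numbers (0.022 m/s per 0.026 °C).

```python
def t68_to_t90(t68):
    # approximate ITS-90 conversion over 0-100 C (assumed linear form)
    return t68 - 2.6e-4 * t68

dt = t68_to_t90(100.0) - 100.0   # -0.026 C, as quoted in the abstract
dc_dt = -0.85                    # m/s per C near 100 C, inferred from 0.022/0.026
print(dt, dc_dt * dt)            # sound-speed shift of about +0.022 m/s
```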

PatentDOI
TL;DR: An ultrasound catheter comprising an elongate flexible catheter body having an ultrasound transmission member or wire extending longitudinally therethrough, where a distal head is formed on the distal end of the ultrasound transmission member or wire and is affixed to the catheter body.
Abstract: An ultrasound catheter for removing obstructions from tubular anatomic structures such as blood vessels, said catheter comprising an elongate flexible catheter body having an ultrasound transmission member or wire extending longitudinally therethrough. A distal head is formed on the distal end of the ultrasound transmission member or wire and is affixed to the catheter body. The ultrasound transmission member or wire may be formed of any material capable of transmitting ultrasonic energy including various superelastic metal alloys such as nickel titanium metal alloys. The distal portion of the ultrasound transmission member or wire may be of reduced diameter to provide enhanced flexibility and/or amplification of the ultrasonic energy through the distal portion of the ultrasound transmission member or wire. A coating or jacket may be disposed on all or portion(s) of the ultrasound transmission member or wire to reduce friction between the ultrasound transmission member or wire and surrounding structures.

PatentDOI
George M. White1
TL;DR: A computer system having speech recognition functionality, a display screen, a microphone, and a mouse whose pointer and voice buttons are used to signal the computer to display the recognized spoken command.
Abstract: A computer system having speech recognition functionality, a display screen, a microphone, and a mouse having pointer and voice buttons. The voice button located on the mouse is used to turn the microphone "on" and "off". The voice button in conjunction with the mouse is used to signal the computer to display the recognized spoken command. The pointer button located on the mouse is used to provide a standard "point and click" function so that a user can select text or object(s) on the display screen. The computer will apply recognized spoken commands only to the restricted selection. Voice icons are used to aid in the correction of any erroneous interpretation by the speech recognizer circuitry within the computer. A list of alternative commands is displayed in menu format associated with each icon so that the user can use the voice button and mouse to select the desired correct command. The computer then automatically corrects the erroneous interpretation. Each alternative has its own separate menu of synonyms and paraphrases to aid in locating and identifying the correct command.

PatentDOI
TL;DR: A horn-shaped transmitter of length that is a multiple of a half-wavelength μ/2 is proposed to amplify ultrasound displacement at a frequency f, where μ = c/f and c is the speed of sound in the high-Q material.
Abstract: A horn (12) connectable to an energy source (88) to amplify ultrasound displacement is connected to a transmitter (14) formed of material having relatively high mechanical Q for transmitting ultrasonic energy therethrough at a frequency f, the transmitter having a horn-shaped configuration of length that is a multiple of a half-wavelength μ/2, and preferably this horn-shaped configuration is comprised of multiple horn segments (16a and 16b) each of a length substantially equal to a multiple of μ/2, where μ = c/f (c is the speed of sound in the high Q material). The transmitter has a proximal end of cross-sectional diameter D1 connected to the horn and a distal end of cross-sectional diameter D2, where D1 > D2. Ultrasonic energy transmitted through the transmitter drives a tip which is coupled to the transmitter by means of a flexible connector (20).

PatentDOI
TL;DR: In this article, a speech recognition system has an acoustic processor for generating a sequence of coded representations of an utterance to be recognized, where each speech hypothesis is modeled with an acoustic model.
Abstract: A speech recognition system displays a source text of one or more words in a source language. The system has an acoustic processor for generating a sequence of coded representations of an utterance to be recognized. The utterance comprises a series of one or more words in a target language different from the source language. A set of one or more speech hypotheses, each comprising one or more words from the target language, are produced. Each speech hypothesis is modeled with an acoustic model. An acoustic match score for each speech hypothesis comprises an estimate of the closeness of a match between the acoustic model of the speech hypothesis and the sequence of coded representations of the utterance. A translation match score for each speech hypothesis comprises an estimate of the probability of occurrence of the speech hypothesis given the occurrence of the source text. A hypothesis score for each hypothesis comprises a combination of the acoustic match score and the translation match score. At least one word of one or more speech hypotheses having the best hypothesis scores is output as a recognition result.
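
The scoring the patent describes can be sketched as a combination of two log-domain scores per hypothesis; the log-linear form and the equal weighting below are illustrative assumptions.

```python
def hypothesis_score(log_p_acoustic, log_p_translation, weight=0.5):
    """Combine acoustic and translation evidence into one hypothesis score."""
    return weight * log_p_acoustic + (1.0 - weight) * log_p_translation

hyps = {
    "the cat": (-12.0, -3.1),  # (acoustic, translation) log-probabilities
    "the hat": (-11.5, -8.7),  # acoustically close, but unlikely given the source text
}
best = max(hyps, key=lambda h: hypothesis_score(*hyps[h]))
print(best)  # "the cat": translation evidence resolves the acoustic ambiguity
```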

PatentDOI
TL;DR: A speech recognition interface system capable of handling a plurality of application programs simultaneously and realizing convenient speech input and output modes suitable for applications such as window systems and speech mail systems is presented.
Abstract: A speech recognition interface system capable of handling a plurality of application programs simultaneously, and realizing convenient speech input and output modes which are suitable for the applications in the window systems and the speech mail systems. The system includes a speech recognition unit for carrying out a speech recognition processing for a speech input made by a user to obtain a recognition result; a program management table for managing program management data indicating a speech recognition interface function required by each application program; and a message processing unit for exchanging messages with the plurality of application programs in order to specify an appropriate recognition vocabulary to be used in the speech recognition processing of the speech input to the speech recognition unit, and to transmit the recognition result for the speech input obtained by the speech recognition unit by using the appropriate recognition vocabulary to appropriate ones of the plurality of application programs, according to the program management data managed by the program management table.

Journal ArticleDOI
TL;DR: In this article, a theory for attenuation and dispersion of compressional waves in inhomogeneous fluid-saturated materials is developed, and the wave speeds in the low and high frequency limits are associated with conditions of uniform pressure and of uniform no-flow, respectively.
Abstract: A theory is developed for the attenuation and dispersion of compressional waves in inhomogeneous fluid‐saturated materials. These effects are caused by material inhomogeneity on length scales of the order of centimeters and may be most significant at seismic wave frequencies, i.e., on the order of 100 Hz. The micromechanism involves diffusion of pore fluid between different regions, and is most effective in a partially saturated medium in which liquid can diffuse into regions occupied by gas. The local fluid flow effects can be replaced on the macroscopic scale by an effective viscoelastic medium, and the form of the viscoelastic creep function is illustrated for a compressional wave propagating normal to a layered medium. The wave speeds in the low‐ and high‐frequency limits are associated with conditions of uniform pressure and of uniform ‘‘no‐flow,’’ respectively. These correspond to the isothermal and isentropic wave speeds in a disordered thermoelastic medium.
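
The effective viscoelastic medium can be illustrated with a standard linear (Zener) solid: the complex modulus relaxes from a low-frequency "uniform pressure" modulus to a high-frequency "no-flow" modulus, producing dispersion between the two limiting wave speeds and an attenuation peak in between. The numbers below are illustrative, not the paper's.

```python
import numpy as np

rho, M_r, M_u, tau = 2000.0, 4e9, 5e9, 1.0 / (2 * np.pi * 100.0)
w = 2 * np.pi * np.logspace(0, 4, 200)           # 1 Hz - 10 kHz

M = M_r + (M_u - M_r) * (1j * w * tau) / (1.0 + 1j * w * tau)  # Zener modulus
c = np.sqrt(np.real(M) / rho)                    # phase speed (low-loss approx.)
Qinv = np.imag(M) / np.real(M)                   # attenuation measure

print(c[0], c[-1])                # ~sqrt(M_r/rho) at low f, ~sqrt(M_u/rho) at high f
print(w[np.argmax(Qinv)] / (2 * np.pi))          # Q^-1 peaks near 100 Hz
```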

Journal ArticleDOI
TL;DR: Responses to several different stimuli are presented to illustrate nonlinear temporal response properties that cannot be achieved with linear models for AN fibers.
Abstract: A computational model was developed for the responses of low-frequency auditory-nerve (AN) fibers in cat. The goal was to produce realistic temporal response properties and average discharge rates in response to simple and complex stimuli. Temporal and average-rate properties of AN responses change as a function of sound-pressure level due to nonlinearities in the auditory periphery. The input stage of the AN model is a narrow-band filter that simulates the mechanical tuning of the basilar membrane. The parameters of this filter vary continuously as a function of stimulus level via a feedback mechanism, simulating the compressive nonlinearity associated with the mechanics of the basilar membrane. A memoryless, saturating nonlinearity and two low-pass filters simulate transduction and membrane properties of the inner hair cell (IHC). A diffusion model for the IHC-AN synapse introduces adaptation. Finally, a nonhomogeneous Poisson process, modified by absolute and relative refractoriness, provides the output of the model.
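
The output stage lends itself to a compact sketch: draw spikes from a nonhomogeneous Poisson process by thinning, then impose an absolute dead time and an exponential relative-refractory recovery. The rate function and refractory constants below are assumptions for illustration, not the model's fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 100_000                                 # 10-us time steps
t = np.arange(0, 0.1, 1 / fs)
rate = 100.0 * (1.0 + np.sin(2 * np.pi * 250.0 * t))   # driven rate, spikes/s

abs_ref, rel_tau = 0.75e-3, 0.8e-3           # refractory constants (assumed)
spikes, last = [], -np.inf
for i, ti in enumerate(t):
    dt_last = ti - last
    if dt_last < abs_ref:
        continue                             # absolute refractoriness: no spike
    r = rate[i] * (1.0 - np.exp(-(dt_last - abs_ref) / rel_tau))  # recovery
    if rng.random() < r / fs:                # Bernoulli-per-bin thinning
        spikes.append(ti); last = ti

print(len(spikes), "spikes in 100 ms")
```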

Journal ArticleDOI
TL;DR: In this paper, the degradation of the full song of the Turdus merula was quantified by measuring excess attenuation, reduction of the signal-to-noise ratio, and blur ratio, the latter measure representing the degree of blurring of amplitude and frequency patterns over time.
Abstract: The habitat‐induced degradation of the full song of the blackbird (Turdus merula) was quantified by measuring excess attenuation, reduction of the signal‐to‐noise ratio, and blur ratio, the latter measure representing the degree of blurring of amplitude and frequency patterns over time. All three measures were calculated from changes of the amplitude functions (i.e., envelopes) of the degraded songs using a new technique which allowed a compensation for the contribution of the background noise to the amplitude values. Representative songs were broadcast in a deciduous forest without leaves and rerecorded. Speakers and microphones were placed at typical blackbird emitter and receiver positions. Analyses showed that the three degradation measures were mutually correlated, and that they varied with log distance. Their variation suggests that the broadcast song could be detected across more than four, and discriminated across more than two territories. The song’s high‐pitched twitter sounds were degraded more rapidly than its low‐pitched motif sounds. Motif sounds with a constant frequency projected best. The effect of microphone height was pronounced, especially on motif sounds, whereas the effect of speaker height was negligible. Degradation was inversely proportional to microphone height. Changing the reception site from a low to a high position reduced the degradation by the same amount as by approaching the sound source across one‐half or one‐whole territory. This suggests that the main reason for a male to sing from a high perch is to improve the singer’s ability to hear responses to its songs, rather than to maximize the transmission distance. The difference in degradation between low and high microphone heights may explain why females, which tend to perch on low brush, disregard certain degradable components of the song.

PatentDOI
TL;DR: The principle of minimum recognition error rate is applied by the present invention using discriminative training, and various issues related to the special structure of HMMs are presented.
Abstract: A system for pattern-based speech recognition, e.g., a hidden Markov model (HMM) based speech recognizer using Viterbi scoring, is disclosed. The principle of minimum recognition error rate is applied by the present invention using discriminative training. Various issues related to the special structure of HMMs are presented. Parameter update expressions for HMMs are provided.
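
The discriminative criterion can be sketched as follows: a smoothed misclassification measure compares the correct class's discriminant (e.g., a Viterbi log-likelihood) with a soft maximum over competitors, and a sigmoid maps it to a differentiable error count whose gradient drives the parameter updates. The constants here are illustrative.

```python
import numpy as np

def mce_loss(scores, correct, eta=2.0, gamma=1.0):
    """scores: per-class discriminant values g_k(x; params)."""
    g_c = scores[correct]
    others = np.delete(scores, correct)
    g_comp = np.log(np.mean(np.exp(eta * others))) / eta   # soft max of competitors
    d = -g_c + g_comp                                      # misclassification measure
    return 1.0 / (1.0 + np.exp(-gamma * d))                # smoothed 0/1 loss

print(mce_loss(np.array([2.0, 0.5, -1.0]), correct=0))  # well classified -> small loss
print(mce_loss(np.array([0.4, 0.5, -1.0]), correct=0))  # near the boundary -> ~0.5
```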

PatentDOI
James D. Johnston1
TL;DR: A perceptual filterbank coder for high quality stereophonic audio signals is presented, which exploits the interchannel redundancies and psychoacoustic properties of stereo audio signals.
Abstract: Coding of high quality stereophonic audio signals is accomplished in a perceptual filterbank coder which exploits the interchannel redundancies and psychoacoustic properties of the stereo signals. Using perceptual principles, switching between a normal and a short window of input samples improves output signal quality for certain input signals, particularly those having a rapid attack. Switching is also accomplished between coding of left and right channels and so-called sum and difference channels in response to particular signal conditions. A number of new perceptually based techniques, including improved threshold determinations, result in high quality output.
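
The left/right versus sum/difference switching can be sketched as a per-block decision: when the channels are strongly correlated, mid/side coding concentrates the energy in one channel and exposes the interchannel redundancy. The correlation threshold below is an assumption; the patent's actual decision also involves perceptual thresholds.

```python
import numpy as np

def choose_ms(left, right, thresh=0.8):
    """Pick L/R or M/S coding for one block of samples."""
    corr = np.corrcoef(left, right)[0, 1]
    if abs(corr) > thresh:
        mid, side = (left + right) / 2.0, (left - right) / 2.0
        return "MS", mid, side
    return "LR", left, right

t = np.arange(1024) / 44100.0
l = np.sin(2 * np.pi * 440 * t)
r = 0.9 * l + 0.05 * np.random.randn(t.size)
print(choose_ms(l, r)[0])   # "MS": the channels are nearly identical
```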

PatentDOI
TL;DR: In this article, a sound waveform is synthesized on the basis of the analysis data to which the controlled characteristic has been added, and the original sound waveform is represented by a combination of the thus-modified analysis data and the musical parameter.
Abstract: Analysis data are provided which are indicative of plural components making up an original sound waveform. The analysis data are analyzed to obtain a characteristic concerning a predetermined element, and then data indicative of the obtained characteristic is extracted as a sound or musical parameter. The characteristic corresponding to the extracted musical parameter is removed from the analysis data, and the original sound waveform is represented by a combination of the thus-modified analysis data and the musical parameter. These data are stored in a memory. The user can variably control the musical parameter. A characteristic corresponding to the controlled musical parameter is added to the analysis data. In this manner, a sound waveform is synthesized on the basis of the analysis data to which the controlled characteristic has been added. In such an analysis-type sound synthesis technique, free control can be applied to various sound elements such as a formant and a vibrato.

Journal ArticleDOI
TL;DR: The results indicate that the dynamic attributes of timbre are not only present at the onset, but also throughout, and that multiple acoustic attributes may contribute to the same perceptual dimensions.
Abstract: Three experiments examined the dynamic attributes of timbre by evaluating the role of onsets in similarity judgments. In separate experiments, subjects heard complete orchestral instrument tones, the onsets of those tones, and tones with the onsets removed (‘‘remainders’’). Ratings for complete tones corresponded to those for onsets, indicating that the salient acoustic attributes for complete tones are present at the onset. Ratings for complete tones also corresponded to those for remainders, indicating that the salient attributes for complete tones are present also in the absence of onsets. Subsequent acoustic analyses demonstrated that this pattern of similarity was due to the centroid frequencies and amplitude envelopes of the tones. The results indicate that the dynamic attributes of timbre are not only present at the onset, but also throughout, and that multiple acoustic attributes may contribute to the same perceptual dimensions.
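
Since the analyses trace the similarity structure to centroid frequencies and amplitude envelopes, it is worth noting that the spectral centroid itself is a one-line computation:

```python
import numpy as np

def spectral_centroid(x, fs):
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    return np.sum(freqs * mag) / np.sum(mag)   # amplitude-weighted mean frequency
```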

Journal ArticleDOI
TL;DR: It would appear that DPOAE measurements can be used to accurately identify the presence of high-frequency hearing loss, but are not accurate predictors of hearing status at lower frequencies, at least for the conditions of the present measurements.
Abstract: Distortion product otoacoustic emissions (DPOAE) were measured in normal‐hearing and hearing‐impaired human subjects. Analyses based on decision theory were used to evaluate DPOAE test performance. Specifically, relative operating characteristic (ROC) curves were constructed and the areas under these curves were used to estimate the extent to which normal and impaired ears could be correctly identified by these measures. DPOAE amplitude and DPOAE/noise measurements were able to distinguish between normal and impaired subjects at 4000, 8000, and, to a lesser extent, at 2000 Hz. The ability of these measures to distinguish between groups decreased, however, as frequency and audiometric criterion used to separate normal and hearing‐impaired ears decreased. At 500 Hz, performance was no better than chance, regardless of the audiometric criterion for normal hearing. Cumulative distributions of misses (hearing‐impaired ears incorrectly identified as normal hearing) and false alarms (normal‐hearing ears identified as hearing impaired) were constructed and used to evaluate test performance for a range of hit rates (i.e., the percentage of correctly identified hearing‐impaired ears). Depending on the desired hit rate, criterion values of −5 to −12 dB SPL for DPOAE amplitudes and 8 to 15 dB for DPOAE/noise accurately distinguished normal‐hearing ears from those with thresholds greater than 20 dB HL for the two frequencies at which performance was best (4000 and 8000 Hz). It would appear that DPOAE measurements can be used to accurately identify the presence of high‐frequency hearing loss, but are not accurate predictors of hearing status at lower frequencies, at least for the conditions of the present measurements.
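
The decision-theoretic evaluation can be sketched directly: sweep a criterion over the DPOAE amplitudes, collect (false-alarm, hit) pairs, and integrate for the ROC area. The score distributions below are simulated stand-ins, not the paper's data.

```python
import numpy as np

def roc_area(impaired_scores, normal_scores):
    crits = np.sort(np.concatenate([impaired_scores, normal_scores]))
    # an ear is flagged "impaired" when its DPOAE amplitude falls below criterion
    hits = np.array([(impaired_scores < c).mean() for c in crits])
    fas = np.array([(normal_scores < c).mean() for c in crits])
    return np.sum(np.diff(fas) * (hits[1:] + hits[:-1]) / 2)   # trapezoid rule

rng = np.random.default_rng(1)
normal = rng.normal(0.0, 5.0, 300)      # DPOAE amplitude, dB SPL (simulated)
impaired = rng.normal(-10.0, 5.0, 300)
print(roc_area(impaired, normal))       # ~0.92 for this 2-sigma separation
```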

Journal ArticleDOI
TL;DR: The pattern of significant and nonsignificant between-group differences, but not data for individual subjects, was consistent with the hypothesis that L2 (second language) production accuracy is limited by the adequacy of perceptual representations for sounds in the L2.
Abstract: Four experiments, all of which focused on vowel duration, assessed Chinese subjects’ production and perception of the contrast between /t/ and /d/ in the final position of English words. Vowel duration was measured in minimal pairs in the first experiment. The stimuli in natural‐edited beat–bead and bat–bad continua in which vowel duration varied in 20‐ms steps were then presented to native English and Chinese subjects in a forced‐choice test, in an experiment using the method of adjustment, and in an imitation task. The non‐natives who learned English in childhood closely resembled native speakers in all four experiments. Three groups of non‐natives who had learned English as a second language in adulthood, on the other hand, differed from the native speakers. The late learners produced significantly longer vowels in words ending in /d/ than /t/. However, the late learners’ vowel duration differences were much smaller than the native speakers’, and were correlated significantly with degree of foreign accent in English. The late learners differed from the native speakers in several ways in the two perception experiments, and also in the imitation task. The pattern of significant and nonsignificant between‐group differences, but not data for individual subjects, was consistent with the hypothesis that L2 (second language) production accuracy is limited by the adequacy of perceptual representations for sounds in the L2.

Journal ArticleDOI
TL;DR: In this article, a precise numerical calculation of the specific heat ratio and speed of sound in air as a function of temperature, pressure, humidity, and CO2 concentration is presented.
Abstract: This paper describes a precise numerical calculation of the specific heat ratio and speed of sound in air as a function of temperature, pressure, humidity, and CO2 concentration. The above parameters are calculated utilizing classical thermodynamic relationships and a real gas equation of state over the temperature range 0 °C–30 °C. The shortcomings of previous determinations are also discussed. For both parameters, the coefficients of an interpolating equation are given, which are suitable for use in applications requiring high precision. The overall uncertainty in the specific heat ratio is estimated to be less than 320 ppm and the uncertainty in the speed of sound is similarly estimated to be less than 300 ppm.
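
For contrast with the paper's real-gas treatment, the ideal-gas limit is a two-line computation in which humidity and CO2 would enter through the mixture's molar mass and heat-capacity ratio; the sketch below handles dry air only and is far from the paper's ppm-level precision.

```python
import numpy as np

R = 8.314462                           # J/(mol K)
M_dry, gamma_dry = 0.0289647, 1.400    # dry air: molar mass [kg/mol], heat-capacity ratio

def c_ideal(t_celsius, M=M_dry, gamma=gamma_dry):
    T = t_celsius + 273.15
    return np.sqrt(gamma * R * T / M)  # ideal-gas speed of sound

print(c_ideal(0.0))   # ~331.3 m/s; the paper refines this with a real-gas equation of state
```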

PatentDOI
TL;DR: In this article, a wideband speech signal (8 kHz, for example) of high quality is reconstructed from a narrowband speech signal (300 Hz to 3.4 kHz) by LPC-analyzing the input to obtain spectrum information parameters.
Abstract: A wideband speech signal (8 kHz, for example) of high quality is reconstructed from a narrowband speech signal (300 Hz to 3.4 kHz). The input narrowband speech signal is LPC-analyzed to obtain spectrum information parameters, and the parameters are vector-quantized using a narrowband speech signal codebook. For each code number of the narrowband speech signal codebook, the wideband speech waveform corresponding to the codevector concerned is extracted by one pitch for voiced speech and by one frame for unvoiced speech and prestored in a representative waveform codebook. Representative waveform segments corresponding to the respective output codevector numbers of the quantizer are extracted from the representative waveform codebook. Voiced speech is synthesized by pitch-synchronous overlapping of the extracted representative waveform segments and unvoiced speech is synthesized by randomly using waveforms of one frame length. By this, a wideband speech signal is produced. Then, frequency components below 300 Hz and above 3.4 kHz are extracted from the wideband speech signal and are added to an up-sampled version of the input narrowband speech signal to thereby reconstruct the wideband speech signal.

PatentDOI
TL;DR: In this paper, a high quality voice transformation system and method operates during a training mode to store voice signal characteristics representing target and source voices, and then during a real time transformation mode, a signal representing source speech is segmented into overlapping segments, analyzed to separate the excitation spectrum from the tone quality spectrum.
Abstract: A high quality voice transformation system and method operates during a training mode to store voice signal characteristics representing target and source voices. Thereafter, during a real time transformation mode, a signal representing source speech is segmented into overlapping segments, analyzed to separate the excitation spectrum from the tone quality spectrum. A stored target tone quality spectrum is substituted for the source spectrum and then convolved with the actual source speech excitation spectrum to produce a transformed speech signal having the word and excitation content of the source, but the acoustical characteristics of a target speaker. The system may be used to enable a talking, costumed character, or in other applications where a source speaker wishes to imitate the voice characteristics of a different, target speaker.
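
One standard way to realize the envelope swap the patent describes is with LPC standing in for its spectral separation: inverse-filter each source segment with its own LPC polynomial to recover the excitation, then filter that excitation with the stored target envelope. Treating the LPC envelope as the "tone quality spectrum" is an assumption for illustration; the LPC helper is a plain autocorrelation-method implementation.

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import lfilter

def lpc(x, order=12):
    """Autocorrelation-method LPC: returns [1, a1, ..., ap]."""
    r = np.correlate(x, x, "full")[len(x) - 1:len(x) + order]
    a = np.linalg.solve(toeplitz(r[:-1]), -r[1:])   # Yule-Walker equations
    return np.concatenate(([1.0], a))

def transform_segment(src, target_lpc, order=12):
    src_lpc = lpc(src, order)
    excitation = lfilter(src_lpc, [1.0], src)       # inverse-filter the source
    return lfilter([1.0], target_lpc, excitation)   # impose the target envelope

fs = 16000
frame = np.hanning(400) * np.random.randn(400)      # stand-in source segment
target_env = lpc(np.random.randn(400), order=12)    # stand-in for a stored target
out = transform_segment(frame, target_env)
```

In use, segments would be windowed, transformed, and overlap-added, matching the overlapping segmentation the patent describes.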