
Showing papers in "Acoustical Science and Technology in 2001"


Journal ArticleDOI
TL;DR: A method of segregating desired speech from concurrent sounds received by two microphones that improved the signal-to-noise ratio by over 18 dB and clarified the effect of frequency resolution on the proposed method.
Abstract: We have developed a method of segregating desired speech from concurrent sounds received by two microphones. In this method, which we call SAFIA, signals received by two microphones are analyzed by discrete Fourier transformation. For each frequency component, differences in the amplitude and phase between channels are calculated. These differences are used to select frequency components of the signal that come from the desired direction and to reconstruct these components as the desired source signal. To clarify the effect of frequency resolution on the proposed method, we conducted three experiments. First, we analyzed the relationship between frequency resolution and the power spectrum’s cumulative distribution. We found that the speech-signal power was concentrated on specific frequency components when the frequency resolution was about 10 Hz. Second, we determined whether a given frequency resolution decreased the overlap between the frequency components of two speech signals. A 10-Hz frequency resolution minimized the overlap. Third, we analyzed the relationship between sound quality and frequency resolution through subjective tests. The best frequency resolution in terms of sound quality corresponded to the frequency resolutions that concentrated the speech-signal power on specific frequency components and that minimized the degree of overlap. Finally, we demonstrated that this method improved the signal-to-noise ratio by over 18 dB.
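The per-bin selection step that SAFIA describes can be sketched as a binary mask over DFT bins. The sketch below is a minimal amplitude-only variant (the paper also uses inter-channel phase differences) using a naive DFT so it stays dependency-free; the function names and the two-tone demo signals are illustrative, not from the paper.

```python
import cmath, math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

def safia_select(ch1, ch2):
    """Keep each frequency bin only where channel 1 is louder than channel 2."""
    X1, X2 = dft(ch1), dft(ch2)
    masked = [x1 if abs(x1) > abs(x2) else 0.0 for x1, x2 in zip(X1, X2)]
    return idft(masked)

# Demo: two made-up "sources" at bins 3 and 5, mixed at different levels per channel
N = 16
ch1 = [math.sin(2 * math.pi * 3 * n / N) + 0.2 * math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
ch2 = [0.2 * math.sin(2 * math.pi * 3 * n / N) + math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
y = safia_select(ch1, ch2)   # keeps only the bin-3 component, dominant in channel 1
```

Because each bin is assigned wholly to one channel, the method relies on the paper's finding that concurrent speech signals overlap little at a suitable frequency resolution.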

144 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that visual temporal resolution can be either improved or degraded by accompanying sounds, depending on the sequence and delay among the auditory and visual stimuli, and that a single visual flash can be perceived as multiple flashes when accompanied by multiple sounds.
Abstract: Three sets of new findings with regard to modulation of visual perception by auditory stimuli are reviewed. First, we show that visual temporal resolution can be either improved or deteriorated by accompanying sounds, depending on the sequence and delay among the auditory and visual stimuli. Second, a single visual flash can be perceived as multiple flashes when accompanied by multiple sounds. Third, an ambiguous motion display consisting of two objects moving toward each other is perceived as streaming with or without an unsynchronized sound, but as bouncing with a synchronized sound. Based on these findings, we argue, against the traditional belief of visual dominance, that audition can modify vision particularly when it provides strong transient signal(s).

48 citations



Journal ArticleDOI
TL;DR: A physiological articulatory model has been developed to simulate the dynamic actions of speech organs during speech production and demonstrated realistic behaviors similar to coarticulation in human speech production.
Abstract: A physiological articulatory model has been developed to simulate the dynamic actions of speech organs during speech production. This model represents the midsagittal region of the tongue, jaw, hyoid bone, and the vocal tract wall in three dimensions. The soft tissue of the tongue is outlined in the midsagittal and parasagittal planes of MR images obtained from a male Japanese speaker, and constructed as a 2-cm thick layer. The palatal and pharyngeal walls are constructed as a hard shell of a 3-cm left-to-right width. The jaw and hyoid bone are modelled to yield rotation and translation motions. The muscle structure in the model is identified based on volumetric MR images of the same speaker. A fast simulation method is developed by modeling both the soft tissue and rigid organs using mass-points with two types of links: viscoelastic springs with a proper stiffness for connective tissue, and extremely high stiffness for bony organs. Muscle activation signals are generated by a model control strategy based on the target-reaching task, and then fed to drive the model to approach the targets. The model demonstrated realistic behaviors similar to coarticulation in human speech production (Dang and Honda, 1998, 1999, 2000).

34 citations


Journal ArticleDOI
TL;DR: The difficult but important challenge is to devise a stereophonic acoustic echo canceller that converges independently of variations in the transmission room that identifies the true echo path impulse response quickly with low computational complexity.
Abstract: A stereo teleconferencing system provides a more realistic presence compared to monaural systems. It helps listeners distinguish who is talking at the other end by means of spatial information. In such hands-free systems, stereophonic acoustic echo cancellers are absolutely necessary for full-duplex communication. The most significant problem with stereo echo cancellation using the conventional linear combiner structure is that the adaptive filter often misconverges or, even when it converges, the convergence is very slow because of the strong crosscorrelation between the stereo signals [1]. As a result, the conventional stereo echo canceller suffers from variation in both the near-end echo path and the far-end transmission path. The adaptive algorithm must track variations in not only the receiving room but also the transmission room. Accordingly, the performance of the stereo echo canceller degrades at the instant of any abrupt change in the environment in the transmission room. The difficult but important challenge is to devise a stereophonic acoustic echo canceller that converges independently of variations in the transmission room. For this aim, it is necessary for a stereo echo canceller to identify the true echo path impulse response quickly with low computational complexity. In this paper, this fundamental problem of stereophonic acoustic echo cancellation is discussed and recent solutions are reviewed.

30 citations


Journal ArticleDOI
TL;DR: Physiological and psychophysical evidence for temporal coding of sensory qualities in different modalities is considered and a space of pulse codes is outlined that includes channel-codes (across-neural activation patterns), temporal pattern codes (spike patterns), and spike latency codes (relative spike timings).
Abstract: Physiological and psychophysical evidence for temporal coding of sensory qualities in different modalities is considered. A space of pulse codes is outlined that includes 1) channel-codes (across-neural activation patterns), 2) temporal pattern codes (spike patterns), and 3) spike latency codes (relative spike timings). Temporal codes are codes in which spike timings (rather than spike counts) are critical to informational function. Stimulus-dependent temporal patterning of neural responses can arise extrinsically or intrinsically: through stimulus-driven temporal correlations (phase-locking), response latencies, or characteristic timecourses of activation. Phase-locking is abundant in audition, mechanoception, electroception, proprioception, and vision. In phase-locked systems, temporal differences between sensory surfaces can subserve representations of location, motion, and spatial form that can be analyzed via temporal cross-correlation operations. Up to the limits of phase-locking, the patterns of all-order interspike intervals that are produced reflect stimulus autocorrelation functions that can subserve representations of form. Stimulus-dependent intrinsic temporal response structure is found in all sensory systems. Characteristic temporal patterns that may encode stimulus qualities can be found in the chemical senses, the cutaneous senses, and some aspects of vision. In some modalities (audition, gustation, color vision, mechanoception, nociception), particular temporal patterns of electrical stimulation elicit specific sensory qualities.

29 citations


Journal ArticleDOI
TL;DR: In this article, a wide variety of percussion instruments are described, including tuned wind chimes, Caribbean steelpans, major-third bells, bass handbells, Choirchimes, and glass instruments.
Abstract: Recent research on the acoustics of percussion instruments has focussed on observing their modes of vibration and understanding how they radiate sound. Holographic interferometry, on account of its high resolution, is an especially useful method for modal analysis on a wide variety of percussion instruments. Several new percussion instruments as well as studies on some very ancient ones are described. New instruments include tuned wind chimes, Caribbean steelpans, major-third bells, bass handbells, Choirchimes, and glass instruments.

27 citations


Journal ArticleDOI
TL;DR: In this paper, a five-alternative, forced-choice (5AFC) test was administered to 104 Japanese students of English to examine the ability of native Japanese speakers to distinguish English voiceless fricatives.
Abstract: In order to examine the ability of native Japanese speakers to distinguish between English voiceless fricatives, a five-alternative, forced-choice (5AFC) test was administered to 104 Japanese students of English. The stimuli consisted of 75 nonsense syllables in which five fricatives (/f/, /s/, /∫/, /θ/, /h/) were presented in five vowel environments (/i e a o u/) and in three different consonant contexts, and were spoken by three native speakers of English. The identification rates were submitted to signal-detection-theoretic (SDT) analysis (measured by d′), multidimensional-scaling (MDS) analysis, and cluster analysis. Overall, identification rates ranged from a maximum of 88% for the /∫/ stimuli to 55% for the /θ/ stimuli. The results showed that both vowel environment and consonant context had an effect on the listeners’ perception of the stimuli in this study. A control group of six native English subjects took the same identification test using the same procedure. None of the control listeners had difficulty identifying the target fricatives except in the case of the /f/–/θ/ contrast.

26 citations


Journal ArticleDOI
TL;DR: In this paper, the authors numerically analyzed the sound pressure on an object in an ultrasonic cleaning vessel by considering the dissipation of cavitation bubbles, and found that the amount of energy dissipation increases proportionally to the number of bubbles.
Abstract: This paper numerically analyzes the sound pressure on an object in an ultrasonic cleaning vessel by considering the dissipation of cavitation bubbles. To clarify the effect of ultrasonic attenuation on the number of cavitation bubbles, the cavitation intensity on a brass object is measured experimentally by changing the quantity of water. Then, the analyzed sound pressure results are compared with the measured cavitation intensity results. The energy dissipation by the oscillation of bubbles is estimated by the irreversible process of heat and mass transfer. The calculation is carried out for the natural oscillation and forced oscillation of cavitation bubbles. It is found that the dissipation of thermal conduction results from the radial oscillation of bubbles by ultrasound. The sound pressure calculated by this dissipation agrees with the cavitation intensity profile estimated using experimental results from the erosion loss of aluminum foil. As the quantity of water in the cleaning vessel is increased, the sound pressure becomes lower. This is because the amount of energy dissipation of the ultrasonic wave increases proportionally to the number of bubbles. However, when the standing wave causes resonance between the ultrasonic generator and the block, the sound pressure at the bottom of the block is not affected by the water volume.

21 citations


Journal ArticleDOI
TL;DR: In this paper, the effect of Oriental lacquer (urushi) on the vibrational properties of wood was compared to those of conventional coatings, and the results suggested the possibility of urushi as a coating for the harp soundboard.
Abstract: In order to investigate the possibility of Oriental lacquer (urushi) as a coating for the wooden soundboard of musical instruments, the effects of urushi coatings on the vibrational properties of wood were compared to those of conventional coatings. By coating, the dynamic Young’s modulus of wood decreased slightly in its fiber direction whereas that in the radial direction increased. The most remarkable changes due to coating were recognized in the internal friction of wood (Q⁻¹), especially that in the radial direction. The effect of the urushi coating on the Q⁻¹ of wood was relatively small and very close to that of the polyurethane coating used for the soundboard of the harp. The viscoelastic and mechanical properties of urushi lacquer films were also similar to those of the polyurethane lacquer film. These results suggested the possibility of urushi as a coating for the harp soundboard. The effects of coatings on the vibrational properties of wood were explained using a model considering three layers: the uncoated wood, the coating layer, and a layer consisting of lacquer and wood cell wall.

20 citations


Journal ArticleDOI
TL;DR: In this paper, the frequency dependence of the acoustic radiation pressure on a solid elastic sphere placed freely in an incident plane progressive sound field in water has been investigated theoretically by taking into account the three components of the radiation pressure, namely, kinetic energy, potential energy, and tensor term.
Abstract: The frequency dependence of the acoustic radiation pressure on a solid elastic sphere placed freely in an incident plane progressive sound field in water has been investigated. In particular, the behavior of the acoustic radiation pressure at resonance frequencies of the elastic vibration of the sphere has been studied theoretically by taking into account the three components of the radiation pressure, namely, the kinetic energy, the potential energy, and the tensor term. It is shown that, at frequencies far from resonance, the contribution of the potential energy to the radiation pressure is the largest and positive, that of the kinetic energy is rather small and negative, and that of the tensor term is small enough to be negligible. At resonance frequencies, the potential energy falls off rapidly, while, with a few exceptions, the kinetic energy and the tensor term increase to larger positive values the softer the material is. As a result, at the resonance frequencies the frequency characteristic curves of the radiation pressure generally show a series of maxima for relatively soft materials such as lead, because the increases in the kinetic energy and the tensor term overcome the decrease in the potential energy, and a series of minima for relatively hard materials such as iron. Materials of intermediate hardness, such as brass, show frequency characteristic curves containing both maxima and minima.

Journal ArticleDOI
TL;DR: In this article, the effect of shaving the top surface of the shirabeguchi nut on the Chikuzen 5-stringed biwa has been investigated, and the results showed that a minute change in the shape of this surface results in a large difference in the quality of the resulting sawari tone.
Abstract: The sawari is an instrumental mechanism of a certain class of stringed instruments, arranged so that the vibrating string touches it repeatedly. The Chikuzen biwa, one of the Japanese plucked stringed instruments, is equipped with a sawari in the form of a narrow strip of surface on the top of the shirabeguchi (the nut). It is known that only a minute change in the shape of this surface results in a large difference in the quality of the resulting “sawari” tone. This paper studies the sawari tone under different grades, or strengths, of the sawari, created by shaving the top surface of the shirabeguchi differently with masterly craftsmanship, together with one condition without sawari (no shaving at all), using an excellent Chikuzen 5-stringed biwa, to compare quantitatively the effect of the degree of shaving on the resulting sound. The analysis shows the temporal development of the amplitudes of up to the 24th partial for open strings under each of the above-mentioned sawari conditions. The sawari effect appears in two aspects: (1) it intensifies the partials from the 6th to the 20th and above, and (2) it elongates their durations.

Journal ArticleDOI
TL;DR: In this article, the temporal control in equal-interval tapping is governed by a memory mechanism, which preserves the information of the preceding 20 intervals to determine the interval of the present tap, and the point at which the memory mechanism switches between the long and short time values is located around a tempo in which the short time value corresponds to 350 ms.
Abstract: In previous studies, it was shown that the temporal control in equal-interval tapping is governed by a memory mechanism, which preserves the information of the preceding 20 intervals to determine the interval of the present tap. In the first stage of the present study, an equal-interval tapping experiment was carried out. The results of the experiment confirmed that the 20-interval memory mechanism governs the temporal control of single-finger equal-interval tapping for various tempi. In the following stages of the present study, simple rhythmic patterns were constructed with long and short time values with a 2:1 ratio. The temporal fluctuation in repetitive tapping of these rhythmic patterns was analyzed using Fourier analysis and autoregressive models. The results showed that the 20-interval memory mechanism also governs the temporal control of the tapping for the simple rhythmic patterns: In the case of a rapid tempo, the 20-interval memory mechanism is active for the long time value, whereas in the case of a slow tempo, it is active for the short time value. The point at which the memory mechanism switches between the long and short time values is located around a tempo in which the short time value corresponds to 350 ms.

Journal ArticleDOI
TL;DR: A new method for fundamental frequency estimation from the noisy spectrum of a speech signal is introduced, which uses the MUSIC algorithm, which is an eigen-based subspace decomposition method.
Abstract: In this article a new method for fundamental frequency estimation from the noisy spectrum of a speech signal is introduced. The fundamental frequency is one of the most essential characteristics for speech recognition, speech coding and so on. The proposed method uses the MUSIC algorithm, which is an eigen-based subspace decomposition method.
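The idea behind MUSIC can be sketched on the simplest case of a single sinusoid: project candidate frequencies against the noise subspace of the signal's covariance matrix and pick the frequency where the projection vanishes. This is a generic single-tone sketch, not the paper's speech-specific estimator; all parameter values are invented for the demo.

```python
import numpy as np

def music_pitch(x, m=8):
    """Estimate the normalized frequency of a sinusoid via the MUSIC pseudospectrum."""
    snaps = np.array([x[i:i + m] for i in range(len(x) - m)])   # sliding m-sample snapshots
    R = snaps.T @ snaps / len(snaps)                            # sample covariance matrix
    _, vecs = np.linalg.eigh(R)                                 # eigenvalues in ascending order
    En = vecs[:, :m - 2]      # noise subspace: a real sinusoid spans a rank-2 signal subspace
    freqs = np.arange(0.01, 0.49, 0.0005)
    def pseudo(f):
        a = np.exp(-2j * np.pi * f * np.arange(m))              # steering vector
        return 1.0 / (np.linalg.norm(En.conj().T @ a) ** 2 + 1e-12)
    return max(freqs, key=pseudo)                               # peak of the pseudospectrum

# Demo: a pure tone at normalized frequency 0.1 (i.e., f0 = 0.1 * sampling rate)
x = np.sin(2 * np.pi * 0.1 * np.arange(400))
est = music_pitch(x)
```

The pseudospectrum peaks sharply where the steering vector is orthogonal to the noise subspace, which is what makes the subspace decomposition robust to additive noise.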

Journal ArticleDOI
TL;DR: In this paper, an IIR implementation of the gammachirp filter has been proposed to simulate basilar membrane motion efficiently (Irino and Unoki, 1999) and a reasonable filter response was provided by a combination of a gammatone filter and an IIRs asymmetric compensation (AC) filter.
Abstract: An IIR implementation of the gammachirp filter has been proposed to simulate basilar membrane motion efficiently (Irino and Unoki, 1999). A reasonable filter response was provided by a combination of a gammatone filter and an IIR asymmetric compensation (AC) filter. However, the rms error was high when the absolute values of the parameters were large, because the coefficients of the IIR-AC filter were selected heuristically. In this report, we show that this is due to the sign inversion of the phase of the poles and zeros in the conventional model. We propose a new definition of the IIR-AC filter and describe a method for systematically determining the optimum coefficients and the number of cascaded second-order filter sections. This reduces the error to about 1/3 of that produced by the conventional model.

Journal ArticleDOI
TL;DR: This paper presents several methods for the alignment of a music score (given in MIDI form) to a human performance of the same musical piece, and an efficient manual bootstrapping method was developed and shows the effectiveness of the alignment algorithms.
Abstract: This paper presents several methods for the alignment of a music score (given in MIDI form) to a human performance of the same musical piece. Two distinct cases are considered: first, MIDI-to-MIDI alignment, where rhythm changes, note time shifts, and player errors are handled by a Dynamic-Programming (DP) algorithm. Next, a method for MIDI-to-audio alignment is presented, incorporating spectral data into the DP method. Experiments on a music database showed the effectiveness of the alignment algorithms. For cases where the alignment process gets trapped in a wrong local minimum, an efficient manual bootstrapping method was developed and shown to lead to the selection of the correct alignment.
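The core DP recurrence behind such score-to-performance alignment can be illustrated with a dynamic time warping sketch over note sequences. This is a generic sketch under a plain absolute-difference cost; the paper's cost model for rhythm changes, time shifts, and player errors is richer, and the demo sequences are invented.

```python
def align(score, perf):
    """DP (DTW) alignment of two note sequences (e.g. MIDI pitch numbers)."""
    n, m = len(score), len(perf)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # match cost plus the cheapest way to reach this cell
            D[i][j] = abs(score[i - 1] - perf[j - 1]) + min(
                D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # backtrack to recover the warping path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, (i, j) = min((D[i - 1][j - 1], (i - 1, j - 1)),
                        (D[i - 1][j], (i - 1, j)),
                        (D[i][j - 1], (i, j - 1)))
    return D[n][m], path[::-1]

# Demo: the performance repeats the first note (a hypothetical player error)
dist, path = align([60, 62, 64], [60, 60, 62, 64])
```

The repeated note is absorbed by a vertical step in the path rather than derailing the whole alignment, which is exactly the robustness DP buys over greedy matching.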


Journal ArticleDOI
TL;DR: The purpose of this paper is to review what is known about the functional consequences of hair cell loss and subsequent hair cell regeneration in birds, to point out the relevance of this work for human hearing recovery, and to suggest some directions for future research.
Abstract: The discovery of hair cell regeneration in birds a little over a decade ago raises a number of obvious and exciting questions about basic functional and neural plasticity in the vertebrate auditory system. Because many birds must learn the complex, species-specific, acoustic signals they use for communication just as humans must learn the sounds of speech, the finding of hair cell regeneration in birds also raises other interesting questions. One of these questions concerns the relation between hearing loss and vocal production. Another question concerns the effect of full or partial hearing recovery on vocal behavior. The purpose of this paper is to review what is known about the functional (i.e. behavioral) consequences of hair cell loss and subsequent hair cell regeneration in birds, to point out the relevance of this work for human hearing recovery, and to suggest some directions for future research.

Journal ArticleDOI
TL;DR: In this article, a sound-synthesis model composed of an exciter, a one-dimensional vibrator, and a two-dimensional resonator is used, and smooth timbre conversion by parameter control is examined.
Abstract: Our goal is to develop sound-synthesis technology with which users can synthesize arbitrary sound timbres, including musical instrument sounds, natural sounds, and their interpolations/extrapolations, on demand. For this purpose, we investigated sound interpolation based on physical modeling. A sound-synthesis model composed of an exciter, a one-dimensional vibrator, and a two-dimensional resonator is used, and smooth timbre conversion by parameter control is examined. Piano and guitar sounds are simulated using this model, and interpolation between piano and guitar tones is investigated. A strategy for parameter control is proposed, and subjective tests were performed to evaluate the algorithm. A multidimensional scaling (MDS) technique is used, and perceptual characteristics are discussed. One of the axes of the timbre space is interpreted as spectral energy distribution, so the spectral centroid is used as a reference to adjust parameters for synthesis. By considering the centroids, smoothly interpolating timbre is achieved. These results suggest the possibility of developing a morphing system using a physical model.
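A minimal one-dimensional vibrator of the kind such physical models build on can be sketched with Karplus–Strong plucked-string synthesis: a noise-seeded delay line whose feedback loop contains a lowpass filter. This is a generic illustration, not the authors' exciter/vibrator/resonator model, and all parameter values are invented.

```python
import random

def karplus_strong(freq, sr=8000, dur=0.5, decay=0.996):
    """Plucked string: noise-seeded delay line with a two-tap average (lowpass) in the loop."""
    rng = random.Random(0)
    n = int(sr / freq)                             # delay-line length sets the pitch
    line = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    out = []
    for _ in range(int(sr * dur)):
        s = line.pop(0)
        line.append(decay * 0.5 * (s + line[0]))   # averaged, slightly damped feedback
        out.append(s)
    return out

out = karplus_strong(440.0)    # half a second of a decaying string-like tone
```

Varying the loop filter and the excitation is what moves such a model between timbres, which is the kind of parameter control the interpolation study exploits.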



Journal ArticleDOI
TL;DR: In this article, a single bubble is observed using a needle-type hydrophone and compared with Mie-scattering data, and it is suggested that the background emision is associated with multipolar source of acoustic waves in which vortices surrounding the bubble play an important role.
Abstract: Acoustic emission from a single bubble is observed using a needle-type hydrophone and compared with Mie-scattering data. The acoustic emission is composed of two signals, i.e., pulses occurring at the moment of collapse and a background emission ΔV. Based on the average of ΔV and the maximum bubble radius, the collapses can be grouped into two types. The average power of ΔV is proportional to, or increases more rapidly than, Rmax^3. It is suggested that the background emission is associated with a multipolar source of acoustic waves in which vortices surrounding the bubble play an important role.



Journal ArticleDOI
TL;DR: In this article, a motion tracking system for monitoring articulatory movements is presented, made up of two sensor units, magnetometer and optical sensors, which are attached to selected points of articulators.
Abstract: This paper introduces a motion tracking system useful for monitoring articulatory movements. The system combines two sensor units: a magnetometer unit and an optical sensor unit. The magnetometer unit consists of sensors having two amorphous alloy cores and small permanent magnet rods glued on the tongue surface. The measuring principle is based on the change in the intensity of the magnetic field with the distances between the rods and the sensors. The optical sensor unit consists of a position-sensitive device (PSD) and light-emitting diodes (LEDs) which are attached to several selected points on the articulators. Simultaneous measurements have been made with the magnetic sensing unit and the optical one in combination: two points on the tongue surface were measured with the magnetic sensing unit, and five points (two on the jaw, two on the lips, and one on the nose) were measured with the optical sensing unit.

Journal ArticleDOI
TL;DR: Speech intelligibility tests were conducted using sound fields with different energy concentration points and the results show that energy concentration at shorter delay times increases intelligibility, thus reconfirming the concept of importance of early energy, and indicates clear disagreement with the STI.
Abstract: In previous work, the authors examined the tendency of changes in the Speech Transmission Index (STI) using several modeled sound fields. The results showed that energy concentration due to strong reflections at any delay time, short or long, increases the STI. This means that the STI evaluates only the degree of energy concentration or dispersion in the time domain regardless of the delay time. As such, the STI fundamentally contradicts the generally accepted concept that early reflection is important in speech intelligibility. However, it has not yet been clarified as to whether such a property of the STI corresponds to intelligibility or merely reveals a defect in the STI. To examine the validity of the STI, speech intelligibility tests were conducted using sound fields with different energy concentration points. The results show that energy concentration at shorter delay times increases intelligibility, thus reconfirming the concept of importance of early energy, and indicates clear disagreement with the STI. The STI cannot be considered to correspond to intelligibility because it does not distinguish useful early energy from non-early energy, which does not contribute to intelligibility.

Journal ArticleDOI
TL;DR: In this article, the authors investigated the effect of pitch frequency in forming the attentional filters of complex tone and frequency-gliding tones in signal detection tasks, and they found that an attentional filter can be formed at the fundamental frequency region, where there is no real power or no power convergence on a single frequency.
Abstract: In this study, we investigated the effect of pitch frequency in forming the attentional filters of complex tones and frequency-gliding tones in signal detection tasks. The attentional filters were measured at spectral regions where there is no real power or no power convergence on a single frequency. In Experiment I, the attentional filter around the missing-fundamental frequency was measured by the probe-signal method. The cue tone was a complex tone composed of 13 components from 1,000 Hz up to 4,000 Hz, whose fundamental frequency is 250 Hz. In Experiment II, we also investigated the formation of attentional filters in relation to frequency-gliding tonal cues. The frequency was changed from 925 Hz to 1,075 Hz in an upward-frequency glide and from 1,075 Hz to 925 Hz in a downward-frequency glide. The overall pitch was first measured by pitch-matching and the attentional filter was then measured around the overall pitch frequency. In conclusion, an attentional filter can be formed at the fundamental-frequency region, where there is no real power, and at the frequency corresponding to the overall pitch of a frequency-gliding tone.



Journal ArticleDOI
TL;DR: In this paper, the authors reviewed the progress made over the past decade in understanding the mechanisms of sound production in music wind instruments and reviewed the one major exception being in loud playing on brass instruments where propagation nonlinearities in the air column are also important.
Abstract: Progress made over the past decade in understanding the mechanisms of sound production in music wind instruments is reviewed. The behavior of air columns, horns, and fingerholes is now fairly well understood, and most recent interest centers on details of the sound generator — the reed in woodwinds, the lips in brass instruments, and the air jet in flute-family instruments. Not only do these generators produce the sound, but they are also largely responsible, through their nonlinearity, for controlling the harmonic content and thus the musical timbre of the instrument, the one major exception being in loud playing on brass instruments where propagation nonlinearities in the air column are also important. Despite considerable progress, there remain important and interesting questions to be answered.