Journal ISSN: 0369-4232

Acoustical Science and Technology 

Acoustical Society of Japan
About: Acoustical Science and Technology is an academic journal published by the Acoustical Society of Japan. The journal publishes mainly in the areas of noise and sound. It has the ISSN identifier 0369-4232 and is open access. Over its lifetime, 1,496 publications have been published, receiving 11,195 citations. The journal is also known as Nihon Onkyo Gakkaishi and The Journal of the Acoustical Society of Japan.


Papers
Journal Article (DOI)
TL;DR: This paper examines the background, structure, challenges, and contributions of MIREX, and its post-hoc analyses indicate that groups of systems perform equally well within various MIR tasks.
Abstract: The Music Information Retrieval Evaluation eXchange (MIREX) is the community-based framework for the formal evaluation of Music Information Retrieval (MIR) systems and algorithms. By looking at the background, structure, challenges, and contributions of MIREX this paper provides some insights into the world of MIR research. Because MIREX tasks are defined by the community they reflect the interests, techniques, and research paradigms of the community as a whole. Both MIREX and MIR have a strong bias toward audio-based approaches as most MIR researchers have strengths in signal processing. Spectral-based approaches to MIR tasks have led to advancements in the MIR field but they now appear to be reaching their limits of effectiveness. This limitation is called the “glass ceiling” problem and the MIREX results data support its existence. The post-hoc analyses of MIREX results data indicate that there are groups of systems that perform equally well within various MIR tasks. There are many challenges facing MIREX and MIR research most of which have their root causes in the intellectual property issues surrounding music. The current inability of researchers to test their approaches against the MIREX test collections outside the annual MIREX cycle is hindering the rapid development of improved MIR systems.

310 citations

Journal Article (DOI)
Koichi Shinoda, Takao Watanabe
TL;DR: A method is proposed in which state clustering is accomplished by way of phonetic decision trees and in which the minimum description length (MDL) criterion is used to optimize the number of clusters.
Abstract: Context-dependent phone units, such as triphones, have recently come to be used to model subword units in speech recognition systems that are based on the use of hidden Markov models (HMMs). While most such systems employ clustering of the HMM parameters (e.g., subword clustering and state clustering) to control the HMM size, so as to avoid poor recognition accuracy due to a lack of training data, none of them provide any effective criteria for determining the optimal number of clusters. This paper proposes a method in which state clustering is accomplished by way of phonetic decision trees and in which the minimum description length (MDL) criterion is used to optimize the number of clusters. Large-vocabulary Japanese-language recognition experiments show that this method achieves higher accuracy than the maximum-likelihood approach.
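The core of the MDL criterion described above is a two-part code length: the negative log-likelihood of the training data plus a penalty that grows with the number of free parameters. A minimal sketch of that model-selection step is below; the function names and the candidate triples are illustrative assumptions, and the sketch omits the phonetic decision-tree construction that the paper actually uses to generate the candidate clusterings.

```python
import math

def mdl_score(log_likelihood, n_params, n_frames):
    """Two-part MDL: data code length plus parameter code length.

    Larger models fit better (higher log-likelihood) but pay a
    penalty of (n_params / 2) * log(n_frames).
    """
    return -log_likelihood + 0.5 * n_params * math.log(n_frames)

def choose_num_clusters(candidates, n_frames):
    """Pick the candidate clustering with the smallest MDL score.

    candidates: list of (n_clusters, log_likelihood, n_params) triples,
    one per candidate state clustering (hypothetical input format).
    """
    best = min(candidates, key=lambda c: mdl_score(c[1], c[2], n_frames))
    return best[0]

# Illustrative candidates: doubling the clusters barely improves the
# likelihood here, so MDL favors the smallest model.
candidates = [(2, -1000.0, 10), (4, -990.0, 20), (8, -985.0, 40)]
print(choose_num_clusters(candidates, n_frames=1000))  # → 2
```

Unlike a pure maximum-likelihood criterion, which would always pick the largest model, the penalty term stops the tree splitting once each additional cluster no longer buys enough likelihood to pay for its parameters.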

272 citations

Journal Article (DOI)
TL;DR: This review outlines historical backgrounds, architecture, underlying principles, and representative applications of STRAIGHT.
Abstract: STRAIGHT, a speech analysis, modification synthesis system, is an extension of the classical channel VOCODER that exploits the advantages of progress in information processing technologies and a new conceptualization of the role of repetitive structures in speech sounds. This review outlines historical backgrounds, architecture, underlying principles, and representative applications of STRAIGHT.

269 citations

Journal Article (DOI)
TL;DR: A review of the large body of empirical research demonstrating the importance of low-level spatiotemporal factors in the multisensory integration of auditory and visual stimuli (as indexed, for example, by research on the ventriloquism effect).
Abstract: Over the last 50 years or so, a large body of empirical research has demonstrated the importance of a variety of low-level spatiotemporal factors in the multisensory integration of auditory and visual stimuli (as, for example, indexed by research on the ventriloquism effect). Here, the evidence highlighting the contribution of both spatial and temporal factors to multisensory integration is briefly reviewed. The role played by the temporal correlation between auditory and visual signals, stimulus motion, intramodal versus crossmodal perceptual grouping, semantic congruency, and the unity assumption in modulating multisensory integration is also discussed. Taken together, the evidence now supports the view that a number of different factors, both structural and cognitive, conjointly contribute to the multisensory integration (or binding) of auditory and visual information.

202 citations

Journal Article (DOI)
TL;DR: A method of segregating desired speech from concurrent sounds received by two microphones that improves the signal-to-noise ratio by over 18 dB; the effect of frequency resolution on the method is also clarified.
Abstract: We have developed a method of segregating desired speech from concurrent sounds received by two microphones. In this method, which we call SAFIA, signals received by two microphones are analyzed by discrete Fourier transformation. For each frequency component, differences in the amplitude and phase between channels are calculated. These differences are used to select frequency components of the signal that come from the desired direction and to reconstruct these components as the desired source signal. To clarify the effect of frequency resolution on the proposed method, we conducted three experiments. First, we analyzed the relationship between frequency resolution and the power spectrum’s cumulative distribution. We found that the speech-signal power was concentrated on specific frequency components when the frequency resolution was about 10 Hz. Second, we determined whether a given frequency resolution decreased the overlap between the frequency components of two speech signals. A 10 Hz frequency resolution minimized the overlap. Third, we analyzed the relationship between sound quality and frequency resolution through subjective tests. The best frequency resolution in terms of sound quality corresponded to the frequency resolutions that concentrated the speech-signal power on specific frequency components and that minimized the degree of overlap. Finally, we demonstrated that this method improved the signal-to-noise ratio by over 18 dB.
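The selection step described above, comparing per-bin inter-channel differences and keeping only the bins that favor the desired direction, can be sketched as a simple binary spectral mask. This is an illustrative reconstruction, not the paper's implementation: the function name, the dB threshold, and the single-frame FFT are all assumptions (the paper works frame-by-frame at roughly 10 Hz resolution, i.e. about 0.1 s frames, and also uses phase differences).

```python
import numpy as np

def safia_like_mask(x_desired, x_other, n_fft=4096, threshold_db=6.0):
    """Sketch of SAFIA-style band selection on one frame.

    Both microphone signals are transformed by a DFT; for each frequency
    bin the inter-channel level difference is computed, and only bins
    whose level difference favors the desired channel are kept before
    inverse transformation.
    """
    X_d = np.fft.rfft(x_desired, n=n_fft)
    X_o = np.fft.rfft(x_other, n=n_fft)
    # Inter-channel amplitude difference in dB for each frequency bin
    eps = 1e-12
    level_diff_db = 20.0 * np.log10((np.abs(X_d) + eps) / (np.abs(X_o) + eps))
    # Binary mask: keep bins dominated by the desired direction
    mask = level_diff_db > threshold_db
    Y = np.where(mask, X_d, 0.0)
    return np.fft.irfft(Y, n=n_fft)
```

The method relies on the sparseness the experiments above measure: at a fine enough frequency resolution, two speech signals rarely occupy the same bins, so a hard keep-or-discard mask removes the interferer with little damage to the target.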

144 citations

Performance
Metrics
No. of papers from the Journal in previous years
Year  Papers
2023  49
2022  52
2021  44
2020  140
2019  52
2018  66