scispace - formally typeset
Author

Tetsuya Shimamura

Bio: Tetsuya Shimamura is an academic researcher from Saitama University. The author has contributed to research in the topics of Noise and Speech enhancement. The author has an h-index of 14 and has co-authored 211 publications receiving 781 citations.


Papers
Journal ArticleDOI
01 Feb 2015
TL;DR: Simulation results indicate that the hidden watermark data is robust against different attacks and the proposed scheme has high data payload and provides superior performance compared to the state-of-the-art watermarking schemes reported recently.
Abstract: This paper proposes a blind singular value decomposition (SVD) based audio watermarking scheme using entropy and log-polar transformation (LPT) for copyright protection of audio signals. In our proposed scheme, the original audio is first segmented into non-overlapping frames and the discrete cosine transform (DCT) is applied to each frame. The low-frequency DCT coefficients are divided into sub-bands and the entropy of each sub-band is calculated. Watermark data is embedded by quantization into the Cartesian components of the largest singular value obtained from the DCT sub-band with the highest entropy value of each frame. Simulation results indicate that the hidden watermark data is robust against different attacks. The comparison analysis shows that the proposed scheme has a high data payload and provides superior performance compared with state-of-the-art watermarking schemes reported recently.
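The embedding pipeline described in the abstract (frame, DCT, entropy-ranked sub-bands, SVD, quantization) can be sketched as follows. This is a minimal sketch, not the paper's implementation: the frame length, sub-band sizes, and quantization step are illustrative assumptions, the log-polar step is omitted, and the detector reuses the chosen sub-band index rather than recomputing the entropy ranking as a fully blind detector would.

```python
import numpy as np
from scipy.fft import dct, idct

def embed_bit(frame, bit, n_low=64, n_subbands=4, delta=0.02):
    """Embed one watermark bit into a frame: DCT, split the low-frequency
    coefficients into sub-bands, pick the highest-entropy sub-band, and
    quantize the largest singular value of that sub-band (in matrix form).
    Returns the watermarked frame and the chosen sub-band index."""
    coeffs = dct(frame, norm='ortho')
    bands = coeffs[:n_low].reshape(n_subbands, -1).copy()
    # Shannon entropy of each sub-band's normalized magnitude distribution
    p = np.abs(bands) / (np.abs(bands).sum(axis=1, keepdims=True) + 1e-12)
    ent = -(p * np.log2(p + 1e-12)).sum(axis=1)
    k = int(np.argmax(ent))
    mat = bands[k].reshape(4, -1)               # sub-band as a 4x4 matrix
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    # quantization-index modulation on the largest singular value
    s[0] = delta * (np.floor(s[0] / delta) + (0.75 if bit else 0.25))
    bands[k] = (u @ np.diag(s) @ vt).ravel()
    coeffs[:n_low] = bands.ravel()
    return idct(coeffs, norm='ortho'), k

def extract_bit(frame, k, n_low=64, n_subbands=4, delta=0.02):
    """Recover the bit from the quantized largest singular value.
    (Passing the band index k is a simplification for this sketch.)"""
    bands = dct(frame, norm='ortho')[:n_low].reshape(n_subbands, -1)
    s = np.linalg.svd(bands[k].reshape(4, -1), compute_uv=False)
    return int((s[0] / delta) % 1.0 >= 0.5)
```

Because the DCT and the SVD reconstruction are exact up to floating-point error, the embedded bit survives the round trip when no attack is applied; robustness against attacks comes from choosing the quantization step large enough.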

42 citations

Proceedings ArticleDOI
07 May 1996
TL;DR: A new method to estimate F0 utilizing the autocorrelation function of the log spectrum is proposed, named ACLOS (autocorrelation of log spectrum).
Abstract: Correct and robust measurement of the fundamental frequency F0 is required in the analysis of speech sounds, in speech enhancement systems, and in analysis-synthesis telephony employing F0. This article proposes a new method to estimate F0 utilizing the autocorrelation function of the log spectrum. The method is named ACLOS (autocorrelation of log spectrum). In this method, the fundamental frequency is estimated from the peak of the autocorrelation function taken along the frequency axis of the spectral pattern; that is, F0 is measured by enhancing the periodic feature of the harmonics in the spectrum. Experimental results indicate that ACLOS gives more robust and reasonable pitch information than other methods.
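The core idea, that harmonics spaced F0 apart along the frequency axis produce a peak in the autocorrelation of the log spectrum at a lag of F0, can be sketched in a few lines. The window, padding, and search range below are illustrative choices, not the paper's settings.

```python
import numpy as np

def aclos_f0(x, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 as the lag of the highest peak in the autocorrelation
    of the mean-removed log magnitude spectrum (the ACLOS idea).
    One spectral bin corresponds to fs/len(x) Hz, so a lag in bins
    converts directly to a frequency estimate."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    logspec = np.log(spec + 1e-10)
    logspec -= logspec.mean()
    ac = np.correlate(logspec, logspec, mode='full')[len(logspec) - 1:]
    df = fs / len(x)                    # Hz per spectral bin (= per lag)
    lo, hi = int(fmin / df), int(fmax / df)
    return (lo + int(np.argmax(ac[lo:hi]))) * df
```

The log compresses the dynamic range so that higher, weaker harmonics still contribute to the spectral periodicity, which is part of what makes the method robust.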

32 citations

Journal ArticleDOI
TL;DR: A blind audio watermarking scheme in the discrete cosine transform (DCT) domain based on singular value decomposition (SVD), exponential operation (EO), and logarithm operation (LO), which is highly robust against different attacks, has a high data payload, and shows low error probability rates.
Abstract: Digital watermarking has drawn extensive attention for copyright protection of multimedia data. This paper introduces a blind audio watermarking scheme in the discrete cosine transform (DCT) domain based on singular value decomposition (SVD), exponential operation (EO), and logarithm operation (LO). In our proposed scheme, the original audio is first segmented into non-overlapping frames and the DCT is applied to each frame. The low-frequency DCT coefficients are divided into sub-bands and the power of each sub-band is calculated. EO is performed on the sub-band with the highest power of the DCT coefficients of each frame. SVD is applied to the exponential coefficients of that highest-power sub-band, represented in matrix form. Watermark information is embedded into the largest singular value by using a quantization function. Simulation results indicate that the proposed watermarking scheme is highly robust against different attacks. In addition, it has a high data payload and shows low error probability rates. Moreover, it provides good performance in terms of imperceptibility, robustness, and data payload compared with some recent state-of-the-art watermarking methods.

27 citations

Journal ArticleDOI
TL;DR: Experimental results indicate that the proposed watermarking scheme is highly robust against various signal processing attacks, and it outperforms state-of-the-art audioWatermarking methods in terms of imperceptibility, robustness, and data payload.
Abstract: This paper presents an audio watermarking scheme in the fast Fourier transform (FFT) domain based on singular value decomposition (SVD) and Cartesian-polar transformation (CPT). In our proposed scheme, the original audio is first segmented into non-overlapping frames. The FFT is applied to each frame and the low-frequency FFT coefficients are selected. SVD is applied to the selected FFT coefficients of each frame, represented in matrix form. The highest singular value of each frame is selected and decomposed into two components using CPT. Watermark information is embedded into each of these CPT components using an embedding function. Experimental results indicate that the proposed watermarking scheme is highly robust against various signal processing attacks. In addition, the proposed scheme has a high data payload. Moreover, it outperforms state-of-the-art audio watermarking methods in terms of imperceptibility, robustness, and data payload.
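The CPT step, splitting a singular value into two Cartesian components and carrying the watermark in each, can be illustrated as below. The fixed 45-degree angle, the quantization step, and the quantizer offsets are illustrative assumptions, not the paper's embedding function.

```python
import numpy as np

def cpt_embed(s, bit, delta=0.1, theta=np.pi / 4):
    """Decompose a singular value s into two Cartesian components via a
    Cartesian-polar transformation, quantize each component to carry the
    watermark bit, and recombine into the watermarked singular value."""
    x, y = s * np.cos(theta), s * np.sin(theta)
    d = delta * (0.75 if bit else 0.25)        # quantizer offset per bit
    x = delta * np.floor(x / delta) + d
    y = delta * np.floor(y / delta) + d
    return float(np.hypot(x, y))               # back to a singular value

def cpt_extract(s_w, delta=0.1, theta=np.pi / 4):
    """Blind detection: re-derive a Cartesian component and read which
    quantizer cell it falls in."""
    x = s_w * np.cos(theta)
    return int((x / delta) % 1.0 >= 0.5)
```

Embedding the same bit in both components is redundant on purpose: after an attack perturbs the singular value, either component can still vote for the bit.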

25 citations

Journal ArticleDOI
TL;DR: A new noise suppression technique that executes iterative processing, with parameters suited to that processing, in the spectral subtraction method, a noise reduction technique for noise-corrupted speech.
Abstract: In this paper, the authors propose a new noise suppression technique that executes iterative processing in the spectral subtraction method, a noise reduction technique for noise-corrupted speech, and sets parameters suited to that processing. In iterative processing, the enhanced speech obtained from one pass of noise reduction is treated as the input signal again, so that a further reduction of the residual noise can be anticipated. By adjusting the parameters for each iteration, residual noise can be reduced further while speech degradation is kept under control. The authors also propose a technique for maintaining the real-time nature of spectral subtraction when the proposed technique is executed. They use actual speech, to which white noise, automobile noise, and crowd babble noise are added, to compare the characteristics of the two proposed methods with the conventional spectral subtraction method and its improvements. The authors verified by objective and subjective evaluations that each proposed technique showed superior results in all noisy environments. © 2006 Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 90(4): 39–51, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjc.20242
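The feedback loop can be sketched as below: one pass of magnitude spectral subtraction, re-applied to its own output with relaxed parameters. The over-subtraction factor, spectral floor, and the halving schedule per iteration are illustrative choices, not the paper's tuned parameter set.

```python
import numpy as np

def spectral_subtract(x, noise_mag, alpha=2.0, beta=0.02, nfft=512):
    """One pass of magnitude spectral subtraction: over-subtract the
    estimated noise magnitude (factor alpha), keep a small spectral
    floor (beta) to limit musical noise, and reuse the noisy phase."""
    hop, win = nfft // 2, np.hanning(nfft)
    out = np.zeros(len(x))
    for i in range(0, len(x) - nfft + 1, hop):
        spec = np.fft.rfft(x[i:i + nfft] * win)
        mag = np.maximum(np.abs(spec) - alpha * noise_mag,
                         beta * np.abs(spec))
        out[i:i + nfft] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)))
    return out

def iterative_ss(x, noise_mag, n_iter=3):
    """Iterative processing: the enhanced output of one pass becomes the
    input of the next, with the over-subtraction factor relaxed at each
    iteration so residual noise shrinks without excessive speech
    distortion."""
    y = x
    for i in range(n_iter):
        y = spectral_subtract(y, noise_mag, alpha=2.0 / (2 ** i))
        noise_mag = 0.5 * noise_mag   # assume residual noise also shrinks
    return y
```

Relaxing alpha across iterations is the key point of the abstract: an aggressive first pass removes most noise, and gentler later passes clean up the residual without compounding speech distortion.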

22 citations


Cited by
Journal ArticleDOI
TL;DR: The spectral smoothness principle is proposed as an efficient new mechanism for estimating the spectral envelopes of detected sounds; the method works robustly in noise and is able to handle sounds that exhibit inharmonicities.
Abstract: A new method for estimating the fundamental frequencies of concurrent musical sounds is described. The method is based on an iterative approach, where the fundamental frequency of the most prominent sound is estimated, the sound is subtracted from the mixture, and the process is repeated for the residual signal. For the estimation stage, an algorithm is proposed which utilizes the frequency relationships of simultaneous spectral components, without assuming ideal harmonicity. For the subtraction stage, the spectral smoothness principle is proposed as an efficient new mechanism for estimating the spectral envelopes of detected sounds. With these techniques, multiple fundamental frequency estimation can be performed quite accurately in a single time frame, without the use of long-term temporal features. The experimental data comprised recorded samples of 30 musical instruments from four different sources. Multiple fundamental frequency estimation was performed for random sound source and pitch combinations. Error rates for mixtures ranging from one to six simultaneous sounds were 1.8%, 3.9%, 6.3%, 9.9%, 14%, and 18%, respectively. In musical interval and chord identification tasks, the algorithm outperformed the average of ten trained musicians. The method works robustly in noise, and is able to handle sounds that exhibit inharmonicities. The inharmonicity factor and spectral envelope of each sound are estimated along with the fundamental frequency.
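The estimate-and-subtract loop can be sketched as below. This is a deliberately crude stand-in: the salience function is a simple harmonic sum rather than the paper's estimator, and partials are zeroed outright instead of subtracting a smoothed spectral envelope (the spectral smoothness principle), which is what lets the real method preserve partials shared between sounds.

```python
import numpy as np

def harmonic_salience_f0(mag, df, fmin=80.0, fmax=500.0, n_harm=8):
    """Pick the candidate F0 whose harmonic bins carry the most energy —
    a crude harmonic-sum stand-in for the paper's estimation stage."""
    best_f0, best_score = 0.0, -1.0
    for lag in range(int(fmin / df), int(fmax / df)):
        idx = np.arange(1, n_harm + 1) * lag
        score = mag[idx[idx < len(mag)]].sum()
        if score > best_score:
            best_f0, best_score = lag * df, score
    return best_f0

def iterative_f0_estimation(x, fs, n_sounds=2):
    """Iterative approach: estimate the most prominent F0, remove its
    partials from the magnitude spectrum, repeat on the residual."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    df = fs / len(x)
    f0s = []
    for _ in range(n_sounds):
        f0 = harmonic_salience_f0(mag, df)
        f0s.append(f0)
        lag = int(round(f0 / df))
        for h in range(1, 9):                 # crude partial removal
            k = h * lag
            if k + 2 < len(mag):
                mag[k - 2:k + 3] = 0.0
        # the paper instead subtracts only a smoothed spectral envelope
    return f0s
```

Even this sketch shows why the subtraction stage matters: whatever is removed for the first sound is no longer available to the estimator for the second, so over-subtraction directly harms the residual estimates.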

356 citations

Journal ArticleDOI
TL;DR: This work presents a robust algorithm for multipitch tracking of noisy speech that combines an improved channel and peak selection method, a new method for extracting periodicity information across different channels, and a hidden Markov model for forming continuous pitch tracks.
Abstract: An effective multipitch tracking algorithm for noisy speech is critical for acoustic signal processing. However, the performance of existing algorithms is not satisfactory. We present a robust algorithm for multipitch tracking of noisy speech. Our approach integrates an improved channel and peak selection method, a new method for extracting periodicity information across different channels, and a hidden Markov model (HMM) for forming continuous pitch tracks. The resulting algorithm can reliably track single and double pitch tracks in a noisy environment. We suggest a pitch error measure for the multipitch situation. The proposed algorithm is evaluated on a database of speech utterances mixed with various types of interference. Quantitative comparisons show that our algorithm significantly outperforms existing ones.

308 citations

Book
01 Jan 1989
TL;DR: This book presents the principal characteristics of speech, speech production models, speech analysis and analysis-synthesis systems, linear predictive coding (LPC) analysis, speech coding, speech synthesis, speech recognition, and future directions of speech processing.
Abstract: Principal characteristics of speech; speech production models; speech analysis and analysis-synthesis systems; linear predictive coding (LPC) analysis; speech coding; speech synthesis; speech recognition; future directions of speech processing. Appendices: convolution and the z-transform; vector quantization algorithm; neural nets.

307 citations