Journal Article

Fundamental frequency estimation based on the joint time-frequency analysis of harmonic spectral structure

TLDR
This paper proposes a pitch measure that detects the harmonic characteristics of voiced sounds in the spectrum of a speech signal, together with a fast adaptive representation (FAR) algorithm that reduces the computational complexity of the original adaptive representation algorithm by 50%.
Abstract
In this paper, we propose a new scheme that analyzes the spectral structure of speech signals for fundamental frequency estimation. First, we propose a pitch measure to detect the harmonic characteristics of voiced sounds in the spectrum of a speech signal. This measure exploits two properties: distinct impulses are located at the fundamental frequency and its harmonics, and the energy of a voiced sound is dominated by the energy of these harmonic impulses. The spectrum can be obtained by the fast Fourier transform (FFT); however, it may be corrupted when the speech is contaminated by additive noise. To enhance the robustness of the proposed scheme in noisy environments, we apply the joint time-frequency analysis (JTFA) technique to obtain an adaptive representation of the spectrum of speech signals. The adaptive representation can accurately extract the important harmonic structure of noisy speech signals, but at a high computational cost. To solve this problem, we further propose a fast adaptive representation (FAR) algorithm, which reduces the computational complexity of the original algorithm by 50%. The performance of the proposed fundamental frequency estimation scheme is evaluated on a large database, both with and without additive noise, and compared with that of other approaches on the same database. The experimental results show that the proposed scheme performs well on clean speech and is robust in noisy environments.
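
The abstract does not give the exact definition of the pitch measure, so the following is only a minimal sketch of the two properties it relies on: spectral peaks at the fundamental frequency and its harmonics, and voiced energy concentrated in those peaks. It scores each candidate F0 by the fraction of spectral power found near its first few harmonics. The function names, the number of harmonics, the candidate grid, and the synthetic test frame are all assumptions made for illustration; this is not the paper's measure, and it uses a plain windowed DFT rather than the JTFA adaptive representation.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum of a Hann-windowed frame (direct DFT, O(n^2))."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        power.append(abs(acc) ** 2)
    return power


def harmonic_energy_ratio(power, f0, fs, n_fft, n_harmonics=5, half_width=1):
    """Fraction of total power lying within half_width bins of the first
    n_harmonics multiples of f0 (hypothetical measure, not the paper's)."""
    total = sum(power)
    if total == 0.0:
        return 0.0
    captured = 0.0
    used = set()  # avoid counting a bin twice when harmonic windows overlap
    for h in range(1, n_harmonics + 1):
        center = round(h * f0 * n_fft / fs)
        for b in range(center - half_width, center + half_width + 1):
            if 0 <= b < len(power) and b not in used:
                used.add(b)
                captured += power[b]
    return captured / total


def estimate_f0(frame, fs, f_min=60.0, f_max=400.0, step=5.0):
    """Return (f0, score) for the candidate with the highest energy ratio."""
    power = power_spectrum(frame)
    n = len(frame)
    best_f0, best_score = 0.0, -1.0
    n_candidates = int((f_max - f_min) / step) + 1
    for i in range(n_candidates):
        f0 = f_min + i * step
        score = harmonic_energy_ratio(power, f0, fs, n)
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0, best_score


def synthetic_voiced_frame(f0, fs, n, n_harmonics=5):
    """Toy 'voiced' frame: harmonics of f0 with amplitudes 1/k (assumption)."""
    return [sum(math.sin(2.0 * math.pi * k * f0 * i / fs) / k
                for k in range(1, n_harmonics + 1))
            for i in range(n)]


# Illustrative run on a clean synthetic frame (8 kHz, 50 ms, F0 = 200 Hz).
fs = 8000
frame = synthetic_voiced_frame(200.0, fs, 400)
f0_hat, score = estimate_f0(frame, fs)
print(f0_hat, score)
```

Using a fixed number of harmonics is one simple way to discourage half-frequency and double-frequency candidates, since those capture only some of the true harmonics within the same budget. In a setting like the one the abstract describes, the score could also serve as a voicing indicator by comparing it with a threshold, although the threshold value would have to be tuned and is not specified in the source.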

Citations
Journal Article

A multipitch tracking algorithm for noisy speech

TL;DR: This work presents a robust algorithm for multipitch tracking of noisy speech that combines an improved channel and peak selection method, a new method for extracting periodicity information across different channels, and a hidden Markov model for forming continuous pitch tracks.
Journal Article

A spectral/temporal method for robust fundamental frequency tracking

TL;DR: A fundamental frequency (F0) tracking algorithm is presented that is extremely robust for both high-quality and telephone speech, at signal-to-noise ratios ranging from clean speech to very noisy speech.
Journal Article

Robust and accurate fundamental frequency estimation based on dominant harmonic components

TL;DR: The present method is better than previously reported methods in terms of both gross and fine F0 errors, and the fundamental frequency is more accurately estimated from reliable harmonic components, which are easy to select given the dominance spectra.
Journal Article

A method for fundamental frequency estimation and voicing decision: Application to infant utterances recorded in real acoustical environments

TL;DR: The ripple-enhanced power spectrum (REPS) based method and the use of instantaneous frequency (IF) make it possible to refine the accuracy of the F0 estimates, and the degree of dominance, defined based on the IF, is introduced as a robust voicing decision measure.
References
Book

Discrete-Time Signal Processing

TL;DR: The authors provide a thorough treatment of the fundamental theorems and properties of discrete-time linear systems, filtering, sampling, and discrete-time Fourier analysis.
Journal Article

Matching pursuits with time-frequency dictionaries

TL;DR: The authors introduce an algorithm, called matching pursuit, that decomposes any signal into a linear expansion of waveforms that are selected from a redundant dictionary of functions, chosen in order to best match the signal structures.
Book

Fundamentals of speech recognition

TL;DR: This book presents a meta-modelling framework for speech recognition that automates the very labor-intensive, time-consuming, and therefore expensive process of manually modeling speech.
Book

Discrete-Time Processing of Speech Signals

TL;DR: The preface to the IEEE Edition explains the background to speech production, coding, and quality assessment and introduces the Hidden Markov Model, the Artificial Neural Network, and Speech Enhancement.
Book

Speech Analysis, Synthesis and Perception

TL;DR: A second edition was begun in 1970; the aim was to retain the original format but to expand the content, especially in the areas of digital communications and computer techniques for speech signal processing.