Journal ArticleDOI

Improved phase vocoder time-scale modification of audio

01 May 1999-IEEE Transactions on Speech and Audio Processing (IEEE)-Vol. 7, Iss: 3, pp 323-332
TL;DR: This paper examines the problem of phasiness in the context of time-scale modification, provides new insights into its causes, and introduces two extensions to the standard phase vocoder algorithm that significantly improve the resulting sound quality.
Abstract: The phase vocoder is a well established tool for time scaling and pitch shifting speech and audio signals via modification of their short-time Fourier transforms (STFTs). In contrast to time-domain time-scaling and pitch-shifting techniques, the phase vocoder is generally considered to yield high quality results, especially for large modification factors and/or polyphonic signals. However, the phase vocoder is also known for introducing a characteristic perceptual artifact, often described as "phasiness", "reverberation", or "loss of presence". This paper examines the problem of phasiness in the context of time-scale modification and provides new insights into its causes. Two extensions to the standard phase vocoder algorithm are introduced, and the resulting sound quality is shown to be significantly improved. Moreover, the modified phase vocoder is shown to provide a factor-of-two decrease in computational cost.
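The paper's improvements build on the standard phase vocoder, whose core idea can be sketched as follows. This is a minimal illustration of conventional STFT phase propagation, not the authors' improved algorithm; the function name and parameters are illustrative.

```python
import numpy as np

def phase_vocoder(x, rate, n_fft=1024, hop=256):
    """Time-scale x by `rate` (>1 shortens) via STFT phase propagation."""
    win = np.hanning(n_fft)
    hop_a = hop * rate  # analysis hop; the synthesis hop stays at `hop`
    starts = np.arange(0, len(x) - n_fft, hop_a)
    stft = np.array([np.fft.rfft(win * x[int(s):int(s) + n_fft])
                     for s in starts])
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft  # bin freq, rad/sample
    phase = np.angle(stft[0])
    out = np.zeros(len(stft) * hop + n_fft)
    for i in range(len(stft)):
        # Resynthesize frame i with the accumulated phase, then overlap-add.
        frame = np.fft.irfft(np.abs(stft[i]) * np.exp(1j * phase))
        out[i * hop:i * hop + n_fft] += win * frame
        if i + 1 < len(stft):
            # Measured minus expected phase advance, wrapped to [-pi, pi),
            # gives each bin's true frequency; accumulate it over one
            # synthesis hop.
            dphi = np.angle(stft[i + 1]) - np.angle(stft[i]) - omega * hop_a
            dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
            phase += (omega + dphi / hop_a) * hop
    return out
```

Because synthesis phases in each bin evolve independently, their relationships across bins drift apart over time; that loss of phase coherence is the source of the "phasiness" the paper addresses.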


Citations
Book
23 Nov 2007
TL;DR: This new edition now contains essential information on steganalysis and steganography, and digital watermark embedding is given a complete update with new processes and applications.
Abstract: Digital audio, video, images, and documents are flying through cyberspace to their respective owners. Unfortunately, along the way, individuals may choose to intervene and take this content for themselves. Digital watermarking and steganography technology greatly reduces the instances of this by limiting or eliminating the ability of third parties to decipher the content that they have taken. The many techniques of digital watermarking (embedding a code) and steganography (hiding information) continue to evolve as applications that necessitate them do the same. The authors of this second edition provide an update on the framework for applying these techniques that they provided researchers and professionals in the first well-received edition. Steganography and steganalysis (the art of detecting hidden information) have been added to a robust treatment of digital watermarking, as many in each field research and deal with the other. New material includes watermarking with side information, QIM, and dirty-paper codes. The revision and inclusion of new material by these influential authors has created a must-own book for anyone in this profession. *This new edition now contains essential information on steganalysis and steganography *New concepts and new applications including QIM introduced *Digital watermark embedding is given a complete update with new processes and applications
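As a much-simplified illustration of steganography (hiding information in a host signal), the toy sketch below embeds bits in the least-significant bits of 16-bit audio samples. This is not a scheme from the book; robust methods such as QIM and dirty-paper codes are far more sophisticated, and the function names here are invented.

```python
import numpy as np

def lsb_embed(samples, bits):
    """Hide a bit sequence in the LSBs of 16-bit samples (toy example)."""
    out = samples.copy()
    out[:len(bits)] = (out[:len(bits)] & ~1) | np.asarray(bits, dtype=out.dtype)
    return out

def lsb_extract(samples, n):
    """Recover the first n hidden bits from the sample LSBs."""
    return (samples[:n] & 1).astype(np.uint8)
```

A one-bit-per-sample change is perceptually negligible at 16-bit resolution, but it is also trivially detectable and fragile, which is exactly why the book's steganalysis and watermarking machinery exists.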

1,773 citations

Book
26 Jan 2011
Abstract: CONCRETE

447 citations

Patent
15 Jun 2007
TL;DR: In this paper, a method to create new music by listening to a plurality of music, learning from the plurality, and performing concatenative synthesis based on the listening and the learning to create the new music is described.
Abstract: Automated creation of new music by listening is disclosed. A method to create new music may comprise listening to a plurality of music, learning from the plurality of music, and performing concatenative synthesis based on the listening and the learning to create the new music. The method may be performed on a computing device having an audio interface, such as a personal computer.
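The patent's listening and learning stages are not specified here; as a loose illustration of the final concatenative-synthesis step only, the toy sketch below (the function name and frame-matching feature are invented) rebuilds a target signal from the closest-matching frames of a corpus by magnitude-spectrum distance.

```python
import numpy as np

def concat_resynthesize(target, corpus, frame=256):
    """Rebuild `target` from the nearest corpus frames (toy concatenative
    synthesis using squared magnitude-spectrum distance)."""
    def frames(x):
        n = len(x) // frame
        return x[:n * frame].reshape(n, frame)
    corpus_frames = frames(corpus)
    corpus_specs = np.abs(np.fft.rfft(corpus_frames, axis=1))
    out = []
    for f in frames(target):
        spec = np.abs(np.fft.rfft(f))
        best = np.argmin(np.sum((corpus_specs - spec) ** 2, axis=1))
        out.append(corpus_frames[best])
    return np.concatenate(out)
```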

214 citations

Journal ArticleDOI
TL;DR: This work focuses on single-channel speech enhancement algorithms which rely on spectrotemporal properties, and can be employed when the miniaturization of devices only allows for using a single microphone.
Abstract: With the advancement of technology, both assisted listening devices and speech communication devices are becoming more portable and also more frequently used. As a consequence, users of devices such as hearing aids, cochlear implants, and mobile telephones, expect their devices to work robustly anywhere and at any time. This holds in particular for challenging noisy environments like a cafeteria, a restaurant, a subway, a factory, or in traffic. One way to make assisted listening devices robust to noise is to apply speech enhancement algorithms. To improve the corrupted speech, spatial diversity can be exploited by a constructive combination of microphone signals (so-called beamforming), and by exploiting the different spectro-temporal properties of speech and noise. Here, we focus on single-channel speech enhancement algorithms which rely on spectro-temporal properties. On the one hand, these algorithms can be employed when the miniaturization of devices only allows for using a single microphone. On the other hand, when multiple microphones are available, single-channel algorithms can be employed as a postprocessor at the output of a beamformer. To exploit the short-term stationary properties of natural sounds, many of these approaches process the signal in a time-frequency representation, most frequently the short-time discrete Fourier transform (STFT) domain. In this domain, the coefficients of the signal are complex-valued, and can therefore be represented by their absolute value (referred to in the literature both as STFT magnitude and STFT amplitude) and their phase. While the modeling and processing of the STFT magnitude has been the center of interest in the past three decades, phase has been largely ignored.
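The survey covers a family of STFT-magnitude-domain enhancers. A minimal sketch in the spirit of spectral subtraction, one representative of that family, could look like the following; it assumes the first few frames are noise-dominated (a strong simplification) and keeps the noisy phase unchanged, which is exactly the practice the survey questions.

```python
import numpy as np

def spectral_subtract(noisy, noise_frames=5, n_fft=512, hop=256):
    """Toy single-channel enhancement: subtract an estimated noise magnitude
    from each frame's STFT magnitude, reuse the noisy phase, overlap-add."""
    win = np.hanning(n_fft)
    starts = range(0, len(noisy) - n_fft, hop)
    specs = [np.fft.rfft(win * noisy[s:s + n_fft]) for s in starts]
    # Assume the opening frames contain only noise (a simplification).
    noise_mag = np.mean([np.abs(s) for s in specs[:noise_frames]], axis=0)
    out = np.zeros(len(noisy))
    for i, spec in enumerate(specs):
        # Subtract in magnitude, with a spectral floor to limit musical noise.
        mag = np.maximum(np.abs(spec) - noise_mag, 0.05 * np.abs(spec))
        frame = np.fft.irfft(mag * np.exp(1j * np.angle(spec)))
        out[i * hop:i * hop + n_fft] += win * frame
    return out
```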

210 citations

PatentDOI
TL;DR: In this paper, an audio signal is analyzed using multiple psychoacoustic criteria to identify a region of the signal in which time scaling and/or pitch shifting processing would be inaudible or minimally audible.
Abstract: In one alternative, an audio signal is analyzed using multiple psychoacoustic criteria to identify a region of the signal in which time scaling and/or pitch shifting processing would be inaudible or minimally audible, and the signal is time scaled and/or pitch shifted within that region. In another alternative, the signal is divided into auditory events, and the signal is time scaled and/or pitch shifted within an auditory event. In a further alternative, the signal is divided into auditory events, and the auditory events are analyzed using a psychoacoustic criterion to identify those auditory events in which the time scaling and/or pitch shifting processing of the signal would be inaudible or minimally audible. Further alternatives provide for multiple channels of audio.

171 citations

References
Journal ArticleDOI
TL;DR: An algorithm to estimate a signal from its modified short-time Fourier transform (STFT) by minimizing the mean squared error between the STFT of the estimated signal and the modified STFT magnitude is presented.
Abstract: In this paper, we present an algorithm to estimate a signal from its modified short-time Fourier transform (STFT). This algorithm is computationally simple and is obtained by minimizing the mean squared error between the STFT of the estimated signal and the modified STFT. Using this algorithm, we also develop an iterative algorithm to estimate a signal from its modified STFT magnitude. The iterative algorithm is shown to decrease, in each iteration, the mean squared error between the STFT magnitude of the estimated signal and the modified STFT magnitude. The major computation involved in the iterative algorithm is the discrete Fourier transform (DFT) computation, and the algorithm appears to be real-time implementable with current hardware technology. The algorithm developed in this paper has been applied to the time-scale modification of speech. The resulting system generates very high-quality speech, and appears to be better in performance than any existing method.
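The iterative magnitude-only estimation described in this reference is widely known as the Griffin-Lim algorithm. A compact sketch of the alternating projection it performs (impose the target magnitude, invert by least-squares overlap-add, re-analyze, repeat) is below; the parameter choices are illustrative.

```python
import numpy as np

def griffin_lim(target_mag, n_fft=512, hop=128, n_iter=32, seed=0):
    """Estimate a signal whose STFT magnitude approximates `target_mag`
    (frames x bins) by alternating projection from a random start."""
    rng = np.random.default_rng(seed)
    win = np.hanning(n_fft)
    n_frames = target_mag.shape[0]
    length = (n_frames - 1) * hop + n_fft
    x = rng.standard_normal(length)
    for _ in range(n_iter):
        # Project onto the set of STFTs with the target magnitude.
        specs = np.array([np.fft.rfft(win * x[i * hop:i * hop + n_fft])
                          for i in range(n_frames)])
        specs = target_mag * np.exp(1j * np.angle(specs))
        # Least-squares inversion: window-weighted overlap-add, normalized.
        x = np.zeros(length)
        norm = np.zeros(length)
        for i in range(n_frames):
            x[i * hop:i * hop + n_fft] += win * np.fft.irfft(specs[i])
            norm[i * hop:i * hop + n_fft] += win ** 2
        x /= np.maximum(norm, 1e-8)
    return x
```

Each iteration provably does not increase the squared error between the estimate's STFT magnitude and the target, which is the monotone-convergence property the abstract describes.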

1,899 citations

Journal ArticleDOI
TL;DR: A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves, which forms the basis for new approaches to the problems of speech transformations including time-scale and pitch-scale modification, and midrate speech coding.
Abstract: A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated from the short-time Fourier transform using a simple peak-picking algorithm. Rapid changes in the highly resolved spectral components are tracked using the concept of "birth" and "death" of the underlying sine waves. For a given frequency track a cubic function is used to unwrap and interpolate the phase such that the phase track is maximally smooth. This phase function is applied to a sine-wave generator, which is amplitude modulated and added to the other sine waves to give the final speech output. The resulting synthetic waveform preserves the general waveform shape and is essentially perceptually indistinguishable from the original speech. Furthermore, in the presence of noise the perceptual characteristics of the speech as well as the noise are maintained. In addition, it was found that the representation was sufficiently general that high-quality reproduction was obtained for a larger class of inputs including: two overlapping, superposed speech waveforms; music waveforms; speech in musical backgrounds; and certain marine biologic sounds. Finally, the analysis/synthesis system forms the basis for new approaches to the problems of speech transformations including time-scale and pitch-scale modification, and midrate speech coding [8], [9].
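The analysis side of this sinusoidal model starts from a simple peak-picking step. A crude version of that step alone is sketched below (frequency resolution limited to one bin; the paper's partial tracking and cubic phase interpolation are omitted, and the function name is invented).

```python
import numpy as np

def pick_peaks(frame, sr, n_fft=2048, max_peaks=5):
    """Crude sinusoidal analysis: return (frequency_hz, amplitude, phase)
    for the largest local maxima of one frame's magnitude spectrum."""
    spec = np.fft.rfft(np.hanning(len(frame)) * frame, n_fft)
    mag = np.abs(spec)
    # A bin is a peak if it exceeds both neighbors.
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
    bins = np.where(is_peak)[0] + 1
    bins = bins[np.argsort(mag[bins])[::-1][:max_peaks]]  # loudest first
    return [(b * sr / n_fft, mag[b], np.angle(spec[b])) for b in bins]
```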

1,659 citations

Proceedings ArticleDOI
14 Apr 1983
TL;DR: An algorithm to estimate a signal from its modified short-time Fourier transform (STFT) by minimizing the mean squared error between the STFT of the estimated signal and the modified STFT magnitude is presented.
Abstract: In this paper, we present an algorithm to estimate a signal from its modified short-time Fourier transform (STFT). This algorithm is computationally simple and is obtained by minimizing the mean squared error between the STFT of the estimated signal and the modified STFT. Using this algorithm, we also develop an iterative algorithm to estimate a signal from its modified STFT magnitude. The iterative algorithm is shown to decrease, in each iteration, the mean squared error between the STFT magnitude of the estimated signal and the modified STFT magnitude. The major computation involved in the iterative algorithm is the discrete Fourier transform (DFT) computation, and the algorithm appears to be real-time implementable with current hardware technology. The algorithm developed in this paper has been applied to the time-scale modification of speech. The resulting system generates very high-quality speech, and appears to be better in performance than any existing method.

532 citations

Proceedings ArticleDOI
S. Roucos1, A. Wilgus1
26 Apr 1985
TL;DR: A new and simple method for speech rate modification that yields high quality rate-modified speech and both objective and informal subjective results for the new and previous TSM methods are presented.
Abstract: We present a new and simple method for speech rate modification that yields high quality rate-modified speech. Earlier algorithms either required a significant amount of computation for good quality output speech or resulted in poor quality rate-modified speech. The algorithm we describe allows arbitrary linear or nonlinear scaling of the time axis. The algorithm operates in the time domain using a modified overlap-and-add (OLA) procedure on the waveform. It requires moderate computation and could be easily implemented in real time on currently available hardware. The algorithm works equally well on single voice speech, multiple-voice speech, and speech in noise. In this paper, we discuss an earlier algorithm for time-scale modification (TSM), and present both objective and informal subjective results for the new and previous TSM methods.
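A toy sketch loosely modeled on the synchronized overlap-add idea follows; it is not the authors' exact algorithm, and the function name and parameters are illustrative. Each analysis frame is shifted within a small search range to the lag that best correlates with the output tail before cross-fading, which is what keeps waveform periods aligned.

```python
import numpy as np

def sola(x, rate, frame=1024, hop=256, search=128):
    """Time-domain time-scale modification in the spirit of synchronized
    overlap-add: rate > 1 shortens the signal."""
    overlap = frame - hop
    fade = np.linspace(0.0, 1.0, overlap)
    out = list(x[:frame].astype(float))
    i = 1
    while True:
        start = int(i * hop * rate)  # nominal analysis position
        lo = max(start - search, 0)
        hi = min(start + search, len(x) - frame)
        if hi <= lo:
            break
        tail = np.array(out[-overlap:])
        # Pick the candidate offset whose frame head best matches the tail.
        corr = [float(np.dot(tail, x[s:s + overlap])) for s in range(lo, hi)]
        s = lo + int(np.argmax(corr))
        seg = x[s:s + frame].astype(float)
        # Cross-fade the overlapping region, then append the remainder.
        out[-overlap:] = (1 - fade) * tail + fade * seg[:overlap]
        out.extend(seg[overlap:])
        i += 1
    return np.array(out)
```

The per-frame correlation search is the "moderate computation" such methods trade for avoiding STFT analysis entirely.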

420 citations


"Improved phase vocoder time-scale m..." refers background in this paper

  • ...More recently, various authors have noted that the iterative process can be greatly accelerated by calculating good sets of initial STFT phase values [13]....


Journal ArticleDOI
TL;DR: This contribution reviews frequency-domain algorithms (phase-vocoder) and time-domain algorithms (Time-Domain Pitch-Synchronous Overlap/Add and the like) in the same framework and presents more recent variations of these schemes.

363 citations


"Improved phase vocoder time-scale m..." refers background or methods in this paper

  • ...A full discussion of time-domain time-scaling techniques and their shortcomings can be found in [7] or [5]....


  • ...Further elaboration of this point can be found in [4], [7]....


  • ...Finally, when speech signals are processed, all the above phase-locked techniques still exhibit more reverberation or phasiness than time-domain techniques such as the PSOLA technique [7]....
