Thomas Drugman

Researcher at Amazon.com

Publications: 166
Citations: 4479

Thomas Drugman is an academic researcher at Amazon.com. His work has focused on speech synthesis and speech processing. He has an h-index of 32 and has co-authored 159 publications receiving 3837 citations. His previous affiliations include the Faculté polytechnique de Mons and the University of Mons.

Papers
Proceedings Article

COVAREP — A collaborative voice analysis repository for speech technologies

TL;DR: Provides an overview of the current offerings of COVAREP and demonstrates its algorithms through an emotion classification experiment. The repository aims to enable more reproducible research by strengthening complex implementations through shared contributions and openly available code.
Proceedings Article

Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics

TL;DR: In this paper, a method exploiting harmonic information in the residual signal is presented for pitch tracking in noisy conditions. The harmonic structure is used both for pitch estimation and for determining the voiced segments of speech.
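
For intuition, here is a minimal Python sketch of pitch estimation from residual harmonics in the spirit of this paper: the frame is inverse-filtered with LPC, and each F0 candidate is scored by summing the residual's spectral amplitudes at harmonic positions while penalising inter-harmonic positions. The LPC order, FFT size, harmonic count, and voicing threshold below are illustrative assumptions, not the paper's published settings.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_inverse_filter(x, order):
    """Autocorrelation-method LPC; returns the inverse filter [1, a1, ..., ap]."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    return np.concatenate(([1.0], solve_toeplitz(r[:-1], -r[1:])))

def residual_harmonic_pitch(frame, fs, fmin=50.0, fmax=400.0,
                            lpc_order=12, n_harmonics=5, voicing_thresh=0.2):
    """Estimate (f0, voiced) for one frame from harmonics of the LP residual."""
    frame = frame - np.mean(frame)
    residual = lfilter(lpc_inverse_filter(frame, lpc_order), [1.0], frame)

    n_fft = 4096
    spec = np.abs(np.fft.rfft(residual * np.hanning(len(residual)), n=n_fft))
    spec /= spec.max() + 1e-12          # normalise the residual amplitude spectrum

    def amp(f):                         # spectral amplitude at frequency f (Hz)
        return spec[min(int(round(f * n_fft / fs)), len(spec) - 1)]

    # Score each F0 candidate: reward amplitudes at harmonics k*f and
    # penalise amplitudes between harmonics, at (k - 0.5)*f.
    candidates = np.arange(fmin, fmax, 1.0)
    scores = np.array([
        amp(f) + sum(amp(k * f) - amp((k - 0.5) * f)
                     for k in range(2, n_harmonics + 1))
        for f in candidates
    ])
    best = int(np.argmax(scores))
    # A strong harmonic score marks the frame as voiced; the threshold is a guess.
    if scores[best] > voicing_thresh:
        return float(candidates[best]), True
    return 0.0, False
```
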
Journal Article

Detection of Glottal Closure Instants From Speech Signals: A Quantitative Review

TL;DR: In this paper, five state-of-the-art GCI detection algorithms are compared on six databases containing many hours of speech from multiple speakers, with contemporaneous electroglottographic recordings serving as ground truth.
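
A rough illustration of how such a comparison can be scored: each reference GCI (e.g., derived from the electroglottographic signal) defines a local glottal cycle, and a cycle counts as identified when exactly one detected GCI falls inside it. The cycle boundaries and metrics below follow common conventions in the GCI literature and are not necessarily the paper's exact protocol.

```python
import numpy as np

def gci_scores(reference, detected, edge_margin=0.005):
    """Identification rate and timing errors for detected vs reference GCIs (seconds)."""
    reference = np.sort(np.asarray(reference, dtype=float))
    detected = np.sort(np.asarray(detected, dtype=float))
    hits, errors = 0, []
    for i, ref in enumerate(reference):
        # Each cycle spans halfway to the neighbouring reference GCIs;
        # a fixed margin (an assumption) is used at the signal edges.
        lo = (reference[i - 1] + ref) / 2 if i > 0 else ref - edge_margin
        hi = (ref + reference[i + 1]) / 2 if i + 1 < len(reference) else ref + edge_margin
        inside = detected[(detected >= lo) & (detected < hi)]
        if len(inside) == 1:            # exactly one detection: cycle identified
            hits += 1
            errors.append(abs(inside[0] - ref))
        # zero detections = a miss; several = false alarms within the cycle
    return hits / len(reference), np.asarray(errors)
```
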
Proceedings Article

Glottal closure and opening instant detection from speech signals.

TL;DR: In this paper, a new two-step procedure is proposed to detect glottal closure and opening instants (GCIs and GOIs) directly from the speech waveform.
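
The summary does not spell out the two steps, but a common two-step pattern for this task is: first locate approximate glottal-cycle intervals from a smoothed, mean-based version of the speech signal, then refine each GCI to the strongest peak of the linear-prediction residual within its interval. The sketch below follows that pattern; the window length, assumed mean F0, and LPC order are illustrative, and this is not claimed to be the paper's exact procedure.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_inverse_filter(x, order):
    """Autocorrelation-method LPC; returns the inverse filter [1, a1, ..., ap]."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    return np.concatenate(([1.0], solve_toeplitz(r[:-1], -r[1:])))

def detect_gcis(speech, fs, mean_f0=120.0, lpc_order=18):
    """Return rough GCI times in seconds via an interval-then-refine scheme."""
    # Step 1: a smoothed "mean-based" signal whose oscillation follows the
    # glottal cycle; window of ~1.75 assumed pitch periods (an assumption).
    win = int(1.75 * fs / mean_f0) | 1          # force an odd window length
    kernel = np.blackman(win)
    mbs = np.convolve(speech, kernel / kernel.sum(), mode='same')
    minima = [i for i in range(1, len(mbs) - 1)
              if mbs[i] < mbs[i - 1] and mbs[i] <= mbs[i + 1]]
    # Step 2: inverse-filter to the LP residual, then place one GCI at the
    # strongest residual peak in a short interval after each minimum.
    residual = lfilter(lpc_inverse_filter(speech, lpc_order), [1.0], speech)
    gcis = []
    for m in minima:
        hi = min(m + win // 2, len(residual))
        if hi > m + 1:
            gcis.append(m + int(np.argmax(np.abs(residual[m:hi]))))
    return np.asarray(gcis) / fs
```
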
Proceedings Article

A Deterministic plus Stochastic Model of the Residual Signal for Improved Parametric Speech Synthesis

TL;DR: In this paper, an adaptation of the Deterministic plus Stochastic Model (DSM) of the residual is proposed, in which the excitation is divided into two distinct spectral bands delimited by the maximum voiced frequency.
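
To make the two-band idea concrete, here is a toy sketch: below an assumed maximum voiced frequency Fm the excitation is deterministic (a low-passed impulse train stands in for the modelled waveform), and above Fm it is high-passed noise. Fm = 4 kHz, the Butterworth filter order, and the noise gain are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, lfilter

def dsm_excitation(f0, duration, fs, fm=4000.0, noise_gain=0.05):
    """Toy two-band excitation: deterministic below fm, stochastic above it."""
    n = int(duration * fs)
    # Deterministic band: an impulse train at F0, low-passed below Fm.
    pulses = np.zeros(n)
    pulses[::max(int(fs / f0), 1)] = 1.0
    b_lo, a_lo = butter(4, fm / (fs / 2), btype='low')
    deterministic = lfilter(b_lo, a_lo, pulses)
    # Stochastic band: white noise, high-passed above Fm.
    b_hi, a_hi = butter(4, fm / (fs / 2), btype='high')
    stochastic = lfilter(b_hi, a_hi, noise_gain * np.random.randn(n))
    return deterministic + stochastic   # feed this to the synthesis (vocal-tract) filter

# e.g. one second of excitation at F0 = 120 Hz:
# excitation = dsm_excitation(f0=120.0, duration=1.0, fs=16000)
```

In the model itself, the deterministic waveform is estimated from real pitch-synchronous residual frames rather than being an ideal pulse train; the sketch only illustrates the band split at the maximum voiced frequency.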