scispace - formally typeset
Author

Daniel Erro

Bio: Daniel Erro is an academic researcher from the University of the Basque Country. He has contributed to research on the topics of speech synthesis and speech processing. He has an h-index of 19 and has co-authored 70 publications receiving 1379 citations. His previous affiliations include the University of Barcelona and the Polytechnic University of Catalonia.


Papers
Journal ArticleDOI
TL;DR: Compared to standard probabilistic systems, Weighted Frequency Warping results in a significant increase in quality scores, whereas the conversion scores remain almost unaltered.
Abstract: Any modification applied to speech signals has an impact on their perceptual quality. In particular, voice conversion to modify a source voice so that it is perceived as a specific target voice involves prosodic and spectral transformations that produce significant quality degradation. Choosing among the current voice conversion methods represents a trade-off between the similarity of the converted voice to the target voice and the quality of the resulting converted speech, both rated by listeners. This paper presents a new voice conversion method termed Weighted Frequency Warping that has a good balance between similarity and quality. This method uses a time-varying piecewise-linear frequency warping function and an energy correction filter, and it combines typical probabilistic techniques and frequency warping transformations. Compared to standard probabilistic systems, Weighted Frequency Warping results in a significant increase in quality scores, whereas the conversion scores remain almost unaltered. This paper carefully discusses the theoretical aspects of the method and the details of its implementation, and the results of an international evaluation of the new system are also included.
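The piecewise-linear frequency warping at the heart of the method can be illustrated with a small sketch. This is a toy under assumed names and values: the anchor frequencies stand in for formant estimates, and `warp_spectrum` is a hypothetical helper, not the paper's implementation.

```python
import numpy as np

def warp_spectrum(spec, src_anchors, tgt_anchors, fs=16000):
    """Warp a magnitude spectrum with a piecewise-linear frequency mapping.

    spec        : magnitude spectrum on a uniform frequency grid [0, fs/2]
    src_anchors : source anchor frequencies in Hz (must include 0 and fs/2)
    tgt_anchors : corresponding target anchor frequencies in Hz
    """
    freqs = np.linspace(0.0, fs / 2, len(spec))
    # The warping function maps source frequency -> target frequency, so to
    # build the warped spectrum we evaluate its inverse at each output bin...
    src_freqs = np.interp(freqs, tgt_anchors, src_anchors)
    # ...and resample the source spectrum at those inverse-warped frequencies.
    return np.interp(src_freqs, freqs, spec)

# Toy example: move a single Gaussian "formant" from 500 Hz towards 700 Hz.
fs = 16000
freqs = np.linspace(0.0, fs / 2, 257)
spec = np.exp(-0.5 * ((freqs - 500.0) / 100.0) ** 2)
warped = warp_spectrum(spec, [0, 500, 4000, 8000], [0, 700, 4000, 8000], fs)
```

A real system would make the anchor pairs time-varying (per frame, per acoustic class) and add the energy correction filter the abstract mentions.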

185 citations

Journal ArticleDOI
TL;DR: This paper proposes a new iterative alignment method that allows pairing phonetically equivalent acoustic vectors from nonparallel utterances from different speakers, even under cross-lingual conditions, and it does not require any phonetic or linguistic information.
Abstract: Most existing voice conversion systems, particularly those based on Gaussian mixture models, require a set of paired acoustic vectors from the source and target speakers to learn their corresponding transformation function. The alignment of phonetically equivalent source and target vectors is not problematic when the training corpus is parallel, which means that both speakers utter the same training sentences. However, in some practical situations, such as cross-lingual voice conversion, it is not possible to obtain such parallel utterances. With an aim towards increasing the versatility of current voice conversion systems, this paper proposes a new iterative alignment method that allows pairing phonetically equivalent acoustic vectors from nonparallel utterances from different speakers, even under cross-lingual conditions. This method is based on existing voice conversion techniques, and it does not require any phonetic or linguistic information. Subjective evaluation experiments show that the performance of the resulting voice conversion system is very similar to that of an equivalent system trained on a parallel corpus.
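The alignment loop can be sketched with a toy 2-D example: pair vectors by nearest neighbour under the current conversion, re-estimate the conversion from those pairs, and repeat. A single global linear transform stands in for the full conversion function purely for illustration; all data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=(200, 2))                  # "target speaker" vectors
true_A = np.array([[0.9, 0.1], [-0.2, 1.1]])
source = target @ true_A.T + 0.05 * rng.normal(size=(200, 2))
source = source[rng.permutation(200)]               # nonparallel: pairing lost

W = np.eye(2)                                       # start from identity
for _ in range(10):
    converted = source @ W.T
    # 1) pair each converted source vector with its nearest target vector
    dist = ((converted[:, None, :] - target[None, :, :]) ** 2).sum(-1)
    pairs = dist.argmin(axis=1)
    # 2) re-estimate the conversion from the current pairing (least squares)
    X, *_ = np.linalg.lstsq(source, target[pairs], rcond=None)
    W = X.T

resid = np.abs(source @ W.T - target[pairs]).mean()  # alignment residual
```

The paper's method iterates an actual voice conversion model in this role and needs no phonetic labels; the loop structure is the point of the sketch.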

142 citations

Journal ArticleDOI
TL;DR: This article presents an extensive explanation of all the different alternatives considered during the design of the HNM-based vocoder, together with the corresponding objective and subjective experiments, and a careful description of its implementation details.
Abstract: This article explores the potential of the harmonics plus noise model of speech in the development of a high-quality vocoder applicable in statistical frameworks, particularly in modern speech synthesizers. It presents an extensive explanation of all the different alternatives considered during the design of the HNM-based vocoder, together with the corresponding objective and subjective experiments, and a careful description of its implementation details. Three aspects of the analysis have been investigated: refinement of the pitch estimation using quasi-harmonic analysis, study and comparison of several spectral envelope analysis procedures, and strategies to analyze and model the maximum voiced frequency. The performance of the resulting vocoder is shown to be similar to that of state-of-the-art vocoders in synthesis tasks.
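The harmonics-plus-noise decomposition itself is easy to sketch: a sum of harmonics of f0 below the maximum voiced frequency, plus crudely high-passed noise above it. All parameter values below are illustrative, not taken from the article.

```python
import numpy as np

fs = 16000
f0 = 200.0            # fundamental frequency (Hz)
mvf = 4000.0          # maximum voiced frequency (Hz)
t = np.arange(int(0.05 * fs)) / fs                 # one 50 ms frame

# Deterministic part: harmonics of f0 up to the maximum voiced frequency,
# with a simple 1/k amplitude roll-off.
n_harm = int(mvf // f0)
harmonic = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harm + 1))

# Stochastic part: white noise high-passed above the MVF by zeroing the
# low-frequency bins (a real vocoder shapes the noise spectrum properly).
noise = np.random.default_rng(1).normal(size=t.size)
spec = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(t.size, 1 / fs)
spec[freqs < mvf] = 0.0
noise_hp = np.fft.irfft(spec, n=t.size)

frame = harmonic + 0.1 * noise_hp                  # one synthesised frame
```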

133 citations

Journal ArticleDOI
TL;DR: This article presents a fully parametric formulation of a frequency warping plus amplitude scaling method in which bilinear frequency warping functions are used, and it achieves performance scores similar to those of state-of-the-art statistical methods involving dynamic features and global variance.
Abstract: Voice conversion methods based on frequency warping followed by amplitude scaling have been recently proposed. These methods modify the frequency axis of the source spectrum in such a manner that some significant parts of it, usually the formants, are moved towards their image in the target speaker's spectrum. Amplitude scaling is then applied to compensate for the differences between warped source spectra and target spectra. This article presents a fully parametric formulation of a frequency warping plus amplitude scaling method in which bilinear frequency warping functions are used. Introducing this constraint allows the conversion error to be described in the cepstral domain and minimized with respect to the parameters of the transformation through an iterative algorithm, even when multiple overlapping conversion classes are considered. The paper explores the advantages and limitations of this approach when applied to a cepstral representation of speech. We show that it achieves significant improvements in quality with respect to traditional methods based on Gaussian mixture models, with no loss in average conversion accuracy. Despite its relative simplicity, it achieves similar performance scores to state-of-the-art statistical methods involving dynamic features and global variance.
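The bilinear constraint means the warping curve is the phase response of a first-order all-pass filter, with a single parameter alpha in (-1, 1) controlling the direction and strength of the warp. A sketch of the curve (the alpha value is illustrative):

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Bilinear (first-order all-pass) frequency warping on [0, pi]."""
    return omega + 2.0 * np.arctan(
        alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))

omega = np.linspace(0.0, np.pi, 512)      # normalised source frequencies
warped = bilinear_warp(omega, alpha=0.3)  # alpha > 0 pushes frequencies upward
```

The endpoints 0 and pi stay fixed and the curve is monotonic for |alpha| < 1; in the cepstral domain this warp acts as a linear operator, which is what makes a closed-form treatment of the conversion error possible.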

85 citations

Proceedings Article
01 Jan 2011
TL;DR: This paper describes some recent improvements related to the excitation parameters, particularly the so-called maximum voiced frequency, whose estimation and explicit modelling lead to even better synthesis performance, as confirmed by subjective comparisons with other well-known methods.
Abstract: Statistical parametric synthesizers have achieved very good performance scores in recent years. Nevertheless, as they require the use of vocoders to parameterize speech (during training) and to reconstruct waveforms (during synthesis), the speech generated from statistical models lacks some degree of naturalness. In previous works we explored the usefulness of the harmonics plus noise model in the design of a high-quality speech vocoder. Quite promising results were achieved when this vocoder was integrated into a synthesizer. In this paper, we describe some recent improvements related to the excitation parameters, particularly the so-called maximum voiced frequency. Its estimation and explicit modelling lead to an even better synthesis performance, as confirmed by subjective comparisons with other well-known methods.
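A toy illustration of maximum-voiced-frequency estimation: synthesise a signal that is harmonic below 3 kHz and noisy elsewhere, then take the MVF as the highest harmonic of f0 whose spectral peak clearly exceeds the noise floor. The threshold and frame length are invented for the sketch and are not the paper's estimator.

```python
import numpy as np

fs, f0, true_mvf = 16000, 200.0, 3000.0
t = np.arange(4096) / fs
sig = sum(np.sin(2 * np.pi * k * f0 * t)
          for k in range(1, int(true_mvf // f0) + 1))
sig = sig + 0.05 * np.random.default_rng(2).normal(size=t.size)

spec = np.abs(np.fft.rfft(sig * np.hanning(t.size)))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
floor = np.median(spec)                     # crude noise-floor estimate

mvf_est = 0.0
for k in range(1, int((fs / 2) // f0)):
    bin_k = np.argmin(np.abs(freqs - k * f0))       # bin nearest harmonic k
    peak = spec[max(0, bin_k - 2):bin_k + 3].max()  # local peak search
    if peak > 20 * floor:                           # "clearly harmonic"?
        mvf_est = k * f0
```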

82 citations


Cited by
Proceedings ArticleDOI
04 May 2014
TL;DR: An overview of the current offerings of COVAREP is provided and a demonstration of the algorithms through an emotion classification experiment is included, to allow more reproducible research by strengthening complex implementations through shared contributions and openly available code.
Abstract: Speech processing algorithms are often developed demonstrating improvements over the state-of-the-art, but sometimes at the cost of high complexity. This makes algorithm reimplementations based on literature difficult, and thus reliable comparisons between published results and current work are hard to achieve. This paper presents a new collaborative and freely available repository for speech processing algorithms called COVAREP, which aims at fast and easy access to new speech processing algorithms and thus facilitating research in the field. We envisage that COVAREP will allow more reproducible research by strengthening complex implementations through shared contributions and openly available code which can be discussed, commented on and corrected by the community. Presently COVAREP contains contributions from five distinct laboratories and we encourage contributions from across the speech processing research field. In this paper, we provide an overview of the current offerings of COVAREP and also include a demonstration of the algorithms through an emotion classification experiment.

503 citations

Journal ArticleDOI
TL;DR: A survey of past work and priority research directions for the future is provided, showing that future research should address the lack of standard datasets and the over-fitting of existing countermeasures to specific, known spoofing attacks.

433 citations

Journal ArticleDOI
TL;DR: An efficient face spoof detection system for Android smartphones, based on the analysis of image distortion in spoof face images, is developed, and an unconstrained smartphone spoof attack database containing more than 1000 subjects is built.
Abstract: With the wide deployment of face recognition systems in applications from deduplication to mobile device unlocking, security against face spoofing attacks requires increased attention; such attacks can be easily launched via printed photos, video replays, and 3D masks of a face. We address the problem of face spoof detection against print (photo) and replay (photo or video) attacks based on the analysis of image distortion (e.g., surface reflection, moiré pattern, color distortion, and shape deformation) in spoof face images (or video frames). The application domain of interest is smartphone unlock, given that a growing number of smartphones have face unlock and mobile payment capabilities. We build an unconstrained smartphone spoof attack database (MSU USSA) containing more than 1000 subjects. Both the print and replay attacks are captured using the front and rear cameras of a Nexus 5 smartphone. We analyze the image distortion of the print and replay attacks using different 1) intensity channels (R, G, B, and grayscale); 2) image regions (entire image, detected face, and facial component between nose and chin); and 3) feature descriptors. We develop an efficient face spoof detection system on an Android smartphone. Experimental results on the public-domain Idiap Replay-Attack, CASIA FASD, and MSU-MFSD databases, and the MSU USSA database show that the proposed approach is effective in face spoof detection for both the cross-database and intra-database testing scenarios. User studies of our Android face spoof detection system involving 20 participants show that the proposed approach works very well in real application scenarios.
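One of the distortion cues, loss of sharpness in recaptured images, can be illustrated with a simple Laplacian-variance measure on synthetic data. This is a generic blur cue for illustration only, not the feature set used in the paper.

```python
import numpy as np

def sharpness(gray):
    """Variance of a 4-neighbour Laplacian, a common blurriness proxy."""
    lap = (-4 * gray[1:-1, 1:-1] + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return lap.var()

rng = np.random.default_rng(3)
live = rng.normal(size=(64, 64))          # stand-in for a sharp, live image
kernel = np.ones(5) / 5                   # 5-tap moving average (blur)
blur = np.apply_along_axis(lambda r: np.convolve(r, kernel, "same"), 0, live)
blur = np.apply_along_axis(lambda r: np.convolve(r, kernel, "same"), 1, blur)

# A recaptured (blurred) image scores much lower on this sharpness cue.
```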

375 citations

01 Jan 2014
TL;DR: In this paper, the authors provide a survey of spoofing countermeasures for automatic speaker verification, highlighting the need for more effort in the future to ensure adequate protection against spoofing attacks.
Abstract: While biometric authentication has advanced significantly in recent years, evidence shows the technology can be susceptible to malicious spoofing attacks. The research community has responded with dedicated countermeasures which aim to detect and deflect such attacks. Even if the literature shows that they can be effective, the problem is far from being solved; biometric systems remain vulnerable to spoofing. Despite a growing momentum to develop spoofing countermeasures for automatic speaker verification, now that the technology has matured sufficiently to support mass deployment in an array of diverse applications, greater effort will be needed in the future to ensure adequate protection against spoofing. This article provides a survey of past work and identifies priority research directions for the future. We summarise previous studies involving impersonation, replay, speech synthesis and voice conversion spoofing attacks and more recent efforts to develop dedicated countermeasures. The survey shows that future research should address the lack of standard datasets and the over-fitting of existing countermeasures to specific, known spoofing attacks.

371 citations