Home
/
Authors
/
Takayoshi Yoshimura

Author

Takayoshi Yoshimura

Other affiliations: Nagoya Institute of Technology

Bio: Takayoshi Yoshimura is an academic researcher from Toyota. The author has contributed to research in topics: Speech synthesis & Hidden Markov model. The author has an hindex of 16, co-authored 37 publications receiving 2783 citations. Previous affiliations of Takayoshi Yoshimura include Nagoya Institute of Technology.

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Speech parameter generation algorithms for HMM-based speech synthesis

[...]

Keiichi Tokuda¹, Takayoshi Yoshimura¹, Takashi Masuko², Takao Kobayashi², Tadashi Kitamura¹ - Show less +1 more•Institutions (2)

Nagoya Institute of Technology¹, Tokyo Institute of Technology²

05 Jun 2000

TL;DR: A speech parameter generation algorithm for HMM-based speech synthesis, in which the speech parameter sequence is generated from HMMs whose observation vector consists of a spectral parameter vector and its dynamic feature vectors, is derived.

...read moreread less

Abstract: This paper derives a speech parameter generation algorithm for HMM-based speech synthesis, in which the speech parameter sequence is generated from HMMs whose observation vector consists of a spectral parameter vector and its dynamic feature vectors. In the algorithm, we assume that the state sequence (state and mixture sequence for the multi-mixture case) or a part of the state sequence is unobservable (i.e., hidden or latent). As a result, the algorithm iterates the forward-backward algorithm and the parameter generation algorithm for the case where the state sequence is given. Experimental results show that by using the algorithm, we can reproduce clear formant structure from multi-mixture HMMs as compared with that produced from single-mixture HMMs.

...read moreread less

1,071 citations

Proceedings Article•

Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-Based Speech Synthesis

[...]

Takayoshi Yoshimura, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura - Show less +1 more

01 Jan 1999

TL;DR: An HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM is described.

...read moreread less

Abstract: In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for spectral parameter, pitch parameter and the state duration are clustered independently by using a decision-tree based context clustering technique. Synthetic speech is generated by using an speech parameter generation algorithm from HMM and a mel-cepstrum based vocoding technique. Through informal listening tests, we have confirmed that the proposed system successfully synthesizes natural-sounding speech which resembles the speaker in the training database.

...read moreread less

759 citations

Proceedings Article•

Mixed Excitation for HMM-based Speech Synthesis

[...]

Takayoshi Yoshimura¹, Keiichi Tokuda¹, Takashi Masuko¹, Takao Kobayashi², Tadashi Kitamura² - Show less +1 more•Institutions (2)

Nagoya Institute of Technology¹, Tokyo Institute of Technology²

01 Jan 2001

TL;DR: Improvements on the excitation model of an HMM-based text-to-speech system is described and the result of a listening test shows that the mixed excite model significantly improves quality of synthesized speech as compared with the traditional excited model.

...read moreread less

Abstract: This paper describes improvements on the excitation model of an HMM-based text-to-speech system. In our previous work, natural sounding speech can be synthesized from trained HMMs. However, it has a typical quality of “vocoded speech” since the system uses a traditional excitation model with either a periodic impulse train or white noise. In this paper, in order to reduce the synthetic quality, a mixed excitation model used in MELP is incorporated into the system. Excitation parameters used in mixed excitation are modeled by HMMs, and generated from HMMs by a parameter generation algorithm in the synthesis phase. The result of a listening test shows that the mixed excitation model significantly improves quality of synthesized speech as compared with the traditional excitation model.

...read moreread less

161 citations

Journal Article•DOI•

Speaker interpolation for HMM-based speech synthesis system.

[...]

Takayoshi Yoshimura¹, Keiichi Tokuda¹, Takashi Masuko², Takao Kobayashi², Tadashi Kitamura¹ - Show less +1 more•Institutions (2)

Nagoya Institute of Technology¹, Tokyo Institute of Technology²

01 Jan 2000-The Journal of The Acoustical Society of Japan (e)

TL;DR: An approach to voice characteristics conversion for an HMM-based text-to-speech synthesis system using speaker interpolation, which can synthesize speech with various voice quality without large database in synthesis phase.

...read moreread less

Abstract: This paper describes an approach to voice characteristics conversion for an HMM-based text-to-speech synthesis system using speaker interpolation.Although most text-to-speech synthesis systems which synthesize speech by concatenating speech units can synthesize speech with acceptable quality, they still cannot synthesize speech with various voice quality such as speaker individualities and emotions;In order to control speaker individualities and emotions, therefore, they need a large database, which records speech units with various voice characteristics in sythesis phase.On the other hand, our system synthesize speech with untrained speaker’s voice quality by interpolating HMM parameters among some representative speakers’ HMM sets.Accordingly, our system can synthesize speech with various voice quality without large database in synthesis phase.An HMM interpolation technique is derived from a probabilistic similarity measure for HMMs, and used to synthesize speech with untrained speaker’s voice quality by interpolating HMM parameters among some representative speakers’ HMM sets.The results of subjective experiments show that we can gradually change the voice quality of synthesized speech from one’s to the other’s by changing the interpolation ratio.

...read moreread less

140 citations

Proceedings Article•

Duration modeling for HMM-based speech synthesis.

[...]

Takayoshi Yoshimura, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura - Show less +1 more

01 Jan 1998

TL;DR: This paper takes account of contextual factors such as stressrelated factors and locational factors in addition to phone identity factors to synthesize good quality speech with natural timing and the speaking rate can be varied easily.

...read moreread less

Abstract: This paper proposes a new approach to state duration modeling for HMM-based speech synthesis. A set of state durations of each phoneme HMM is modeled by a multi-dimensional Gaussian distribution, and duration models are clustered using a decision tree based context clustering technique. In the synthesis stage, state durations are determined by using the state duration models. In this paper, we take account of contextual factors such as stressrelated factors and locational factors in addition to phone identity factors. Experimental results show that we can synthesize good quality speech with natural timing, and the speaking rate can be varied easily.

...read moreread less

140 citations

1
2
3
4
…
5
6
7
8

Collapse

Cited by

PDF

Open Access

More filters

Book Chapter•DOI•

System Identification I

[...]

Biao Huang¹, Yutong Qi, Akm Monjur Murshed²•Institutions (2)

University of Alberta¹, Shell Canada Limited²

11 Dec 2012

1,704 citations

Journal Article•DOI•

Statistical Parametric Speech Synthesis

[...]

Alan W. Black¹, Heiga Zen², Keiichi Tokuda²•Institutions (2)

Carnegie Mellon University¹, Nagoya Institute of Technology²

15 Apr 2007

TL;DR: This paper gives a general overview of techniques in statistical parametric speech synthesis, and contrasts these techniques with the more conventional unit selection technology that has dominated speech synthesis over the last ten years.

...read moreread less

Abstract: This paper gives a general overview of techniques in statistical parametric speech synthesis. One of the instances of these techniques, called HMM-based generation synthesis (or simply HMM-based synthesis), has recently been shown to be very effective in generating acceptable speech synthesis. This paper also contrasts these techniques with the more conventional unit selection technology that has dominated speech synthesis over the last ten years. Advantages and disadvantages of statistical parametric synthesis are highlighted as well as identifying where we expect the key developments to appear in the immediate future.

...read moreread less

1,270 citations

贝叶斯滤波与平滑 (Bayesian filtering and smoothing)

[...]

Simo Särkkä

01 Jan 2015

TL;DR: This compact, informal introduction for graduate students and advanced undergraduates presents the current state-of-the-art filtering and smoothing methods in a unified Bayesian framework and learns what non-linear Kalman filters and particle filters are, how they are related, and their relative advantages and disadvantages.

...read moreread less

Abstract: Filtering and smoothing methods are used to produce an accurate estimate of the state of a time-varying system based on multiple observational inputs (data). Interest in these methods has exploded in recent years, with numerous applications emerging in fields such as navigation, aerospace engineering, telecommunications, and medicine. This compact, informal introduction for graduate students and advanced undergraduates presents the current state-of-the-art filtering and smoothing methods in a unified Bayesian framework. Readers learn what non-linear Kalman filters and particle filters are, how they are related, and their relative advantages and disadvantages. They also discover how state-of-the-art Bayesian parameter estimation methods can be combined with state-of-the-art filtering and smoothing algorithms. The book’s practical and algorithmic approach assumes only modest mathematical prerequisites. Examples include MATLAB computations, and the numerous end-of-chapter exercises include computational assignments. MATLAB/GNU Octave source code is available for download at www.cambridge.org/sarkka, promoting hands-on work with the methods.

...read moreread less

1,102 citations

Proceedings Article•DOI•

Speech parameter generation algorithms for HMM-based speech synthesis

[...]

Keiichi Tokuda¹, Takayoshi Yoshimura¹, Takashi Masuko², Takao Kobayashi², Tadashi Kitamura¹ - Show less +1 more•Institutions (2)

Nagoya Institute of Technology¹, Tokyo Institute of Technology²

05 Jun 2000

...read moreread less

1,071 citations

Journal Article•DOI•

Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory

[...]

Tomoki Toda¹, Alan W. Black², Keiichi Tokuda³•Institutions (3)

Nara Institute of Science and Technology¹, Carnegie Mellon University², Nagoya Institute of Technology³

01 Nov 2007-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: In this article, a Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers, and a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory is proposed.

...read moreread less

Abstract: In this paper, we describe a novel spectral conversion method for voice conversion (VC). A Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers. The conventional method converts spectral parameters frame by frame based on the minimum mean square error. Although it is reasonably effective, the deterioration of speech quality is caused by some problems: 1) appropriate spectral movements are not always caused by the frame-based conversion process, and 2) the converted spectra are excessively smoothed by statistical modeling. In order to address those problems, we propose a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory. Not only static but also dynamic feature statistics are used for realizing the appropriate converted spectrum sequence. Moreover, the oversmoothing effect is alleviated by considering a global variance feature of the converted spectra. Experimental results indicate that the performance of VC can be dramatically improved by the proposed method in view of both speech quality and conversion accuracy for speaker individuality.

...read moreread less

914 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse