Tamás Gábor Csapó
Researcher at Budapest University of Technology and Economics
Publications - 93
Citations - 596
Tamás Gábor Csapó is an academic researcher from Budapest University of Technology and Economics. The author has contributed to research in the topics Speech synthesis and Computer science. The author has an h-index of 13 and has co-authored 82 publications receiving 472 citations. Previous affiliations of Tamás Gábor Csapó include Eötvös Loránd University.
Papers
Proceedings Article
DNN-Based Ultrasound-to-Speech Conversion for a Silent Speech Interface.
TL;DR: It is found that the representation that used several neighboring image frames in combination with a feature selection method was preferred both by the subjects in the listening experiments and in terms of the Normalized Mean Squared Error.
Journal Article
A comparative study on the contour tracking algorithms in ultrasound tongue images with automatic re-initialization
TL;DR: The results demonstrate that automatic re-initialization of contour tracking reduces the average tracking error from 5-6 pixels to about 4 pixels. The approach uses a large number of hand-labeled frames and similarity measurements to extract the contours.
Journal Article
Convolutional neural network-based automatic classification of midsagittal tongue gestural targets using B-mode ultrasound images
TL;DR: Speaker-dependent and speaker-independent tongue gestural target classification experiments are conducted, and the CNN-based method achieves state-of-the-art performance even though no pre-training of the CNN was carried out.
Journal Article
Speech-centric Multimodal Interaction for Easy-to-access Online Services – A Personal Life Assistant for the Elderly
Antônio Lúcio Teixeira, Annika Hämäläinen, Jairo Avelar, Nuno Almeida, Géza Németh, Tibor Fegyó, Csaba Zainkó, Tamás Gábor Csapó, Bálint Tóth, Andre Oliveira, Miguel Sales Dias +13 more
TL;DR: The multimodal architecture of the PLA, the services provided by the PLA, and the work done on the speech input and output modalities, which play a key role in the application, are presented.
Proceedings Article
F0 Estimation for DNN-Based Ultrasound Silent Speech Interfaces
TL;DR: Deep neural networks are experimented with to perform articulatory-to-acoustic conversion from ultrasound images, with an emphasis on estimating the voicing feature and the F0 curve from the ultrasound input, with a correlation rate of 0.74.