SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks

doi:10.1145/3290605.3300376

Proceedings Article

Naoki Kimura, +2 more

- pp 146

TLDR

This work proposes a system that detects a user's unvoiced utterance and recognizes its content without the user producing audible speech, and confirms that the audio signals generated by the system can control existing smart speakers.

Abstract:

The availability of digital devices operated by voice is expanding rapidly. However, the applications of voice interfaces are still restricted. For example, speaking in public places becomes an annoyance to the surrounding people, and secret information should not be uttered. Environmental noise may reduce the accuracy of speech recognition. To address these limitations, a system to detect a user's unvoiced utterance is proposed. From internal information observed by an ultrasonic imaging sensor attached to the underside of the jaw, our proposed system recognizes the utterance contents without the user's uttering voice. Our proposed deep neural network model is used to obtain acoustic features from a sequence of ultrasound images. We confirmed that audio signals generated by our system can control the existing smart speakers. We also observed that a user can adjust their oral movement to learn and improve the accuracy of their voice recognition.

Citations

Journal Article

Silent Speech Interfaces for Speech Restoration: A Review

Jose A. Gonzalez-Lopez, +4 more

- 24 Sep 2020 -

IEEE Access

TL;DR: A number of challenges remain to be addressed in future research before SSIs can be promoted to real-world applications, and future SSIs will improve the lives of persons with severe speech impairments by restoring their communication capabilities.

Journal Article

Combating Replay Attacks Against Voice Assistants

Swadhin Pradhan, +3 more

TL;DR: This work develops an end-to-end system that detects replay attacks without requiring the user to wear any device, and shows that the overall system achieves low false positive and false negative rates when evaluated against a range of attacks.

Journal Article

Endophasia: Utilizing Acoustic-Based Imaging for Issuing Contact-Free Silent Speech Commands

Yongzhao Zhang, +8 more

Journal Article

Biosignal Sensors and Deep Learning-Based Speech Recognition: A Review.

Wookey Lee, +5 more

- 17 Feb 2021 -

Sensors

TL;DR: This article surveys mouth interface technologies for speech recognition, production, and volitional control, and reviews the corresponding research on artificial mouth technologies based on various sensors, including electromyography (EMG), electroencephalography (EEG), electropalatography (EPG), electromagnetic articulography (EMA), permanent magnet articulography (PMA), gyros, images, and 3-axial magnetic sensors, especially in combination with deep learning techniques.

Proceedings Article

C-Face: Continuously Reconstructing Facial Expressions by Deep Learning Contours of the Face with Ear-mounted Miniature Cameras

Tuochao Chen, +5 more

TL;DR: This study implemented and evaluated C-Face for two applications: facial expression detection (outputting emojis) and silent speech recognition and found that the mean error of all 42 feature points was 0.77 mm for earphones and 0.74 mm for headphones.

References

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
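The update rule summarized in this entry can be illustrated with a minimal pure-Python sketch. It is not taken from the SottoVoce paper: the function names `adam_minimize` and `quadratic_grad`, the toy objective, and the step count and learning rate are all assumptions made for illustration. The decay rates and epsilon are commonly used defaults.

```python
import math


def adam_minimize(grad, theta, steps=2000, lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a function from its gradient using Adam-style updates.

    grad  : callable returning the gradient as a list of floats
    theta : initial parameters (list of floats)
    """
    theta = list(theta)
    m = [0.0] * len(theta)  # running estimate of the first moment (mean)
    v = [0.0] * len(theta)  # running estimate of the second raw moment
    for t in range(1, steps + 1):
        g = grad(theta)
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1.0 - beta1) * gi
            v[i] = beta2 * v[i] + (1.0 - beta2) * gi * gi
            # Bias correction compensates for the zero initialization.
            m_hat = m[i] / (1.0 - beta1 ** t)
            v_hat = v[i] / (1.0 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def quadratic_grad(theta):
    # Gradient of the toy objective f(x, y) = (x - 3)^2 + (y + 1)^2.
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]


# Hypothetical usage: start from the origin and approach the minimum (3, -1).
result = adam_minimize(quadratic_grad, [0.0, 0.0])
```

Each parameter gets its own effective step size, because the first-moment estimate is divided by the square root of the second-moment estimate.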

Book Chapter

U-Net: Convolutional Networks for Biomedical Image Segmentation

Olaf Ronneberger, +2 more

TL;DR: This work proposes a network and training strategy that relies on strong use of data augmentation to use the available annotated samples more efficiently. The network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.

Posted Content

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 22 Dec 2014 -

arXiv: Learning

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

Posted Content

U-Net: Convolutional Networks for Biomedical Image Segmentation

Olaf Ronneberger, +2 more

- 18 May 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: It is shown that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.

Journal Article

Signal estimation from modified short-time Fourier transform

D. Griffin, +1 more

- 01 Apr 1984 -

IEEE Transactions on Acoustics, Speech, ...

TL;DR: An algorithm to estimate a signal from its modified short-time Fourier transform (STFT) by minimizing the mean squared error between the STFT of the estimated signal and the modified STFT magnitude is presented.
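As a rough illustration of the iterative estimate summarized in this entry, the magnitude-only form of the update is commonly written as below. The notation is an assumption chosen for this sketch: $w$ is the analysis window, $S$ the hop size, $|Y_w(mS,\omega)|$ the given (modified) STFT magnitude, and $X^i_w$ the STFT of the $i$-th signal estimate $x^i$.

```latex
% Keep the current phase estimate, impose the target magnitude.
\hat{X}^i_w(mS,\omega) = |Y_w(mS,\omega)| \,
    \frac{X^i_w(mS,\omega)}{|X^i_w(mS,\omega)|}

% Least-squares overlap-add resynthesis of the next signal estimate.
x^{i+1}(n) =
    \frac{\displaystyle\sum_{m} w(mS-n)\,
          \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat{X}^i_w(mS,\omega)\,
          e^{j\omega n}\, d\omega}
         {\displaystyle\sum_{m} w^2(mS-n)}
```

Each iteration replaces the magnitude of the current estimate's STFT with the target magnitude while keeping its phase, then resynthesizes the signal by windowed overlap-add.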
