Proceedings Article

SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks

TL;DR: This work proposes a system that detects a user's unvoiced utterance and recognizes its content without the user vocalizing, and confirms that audio signals generated by the system can control existing smart speakers.
Abstract
The availability of digital devices operated by voice is expanding rapidly. However, the applications of voice interfaces are still restricted. For example, speaking in public places can annoy surrounding people, and secret information should not be spoken aloud. Environmental noise may also reduce the accuracy of speech recognition. To address these limitations, a system to detect a user's unvoiced utterance is proposed. From internal information observed by an ultrasonic imaging sensor attached to the underside of the jaw, our proposed system recognizes the utterance contents without the user producing any voice. Our proposed deep neural network model is used to obtain acoustic features from a sequence of ultrasound images. We confirmed that audio signals generated by our system can control existing smart speakers. We also observed that a user can adjust their oral movement to learn and improve the recognition accuracy.


Citations
Journal Article

Silent Speech Interfaces for Speech Restoration: A Review

TL;DR: A number of challenges remain to be addressed in future research before SSIs can be promoted to real-world applications, and future SSIs will improve the lives of persons with severe speech impairments by restoring their communication capabilities.
Journal Article

Combating Replay Attacks Against Voice Assistants

TL;DR: This work develops an end-to-end system to detect replay attacks without requiring a user to wear any wearable device and shows that the overall system offers low false positive and false negative rates when evaluated against a range of attacks.
Journal Article

Endophasia: Utilizing Acoustic-Based Imaging for Issuing Contact-Free Silent Speech Commands

TL;DR: This work uses acoustic-based imaging to issue contact-free silent speech commands.
Journal Article

Biosignal Sensors and Deep Learning-Based Speech Recognition: A Review.

TL;DR: This article surveys mouth interface technologies for speech recognition, production, and volitional control, along with research on artificial mouth technologies based on various sensors, including electromyography (EMG), electroencephalography (EEG), electropalatography (EPG), electromagnetic articulography (EMA), permanent magnet articulography (PMA), gyros, images, and 3-axial magnetic sensors, especially in combination with deep learning techniques.
Proceedings Article

C-Face: Continuously Reconstructing Facial Expressions by Deep Learning Contours of the Face with Ear-mounted Miniature Cameras

TL;DR: This study implemented and evaluated C-Face for two applications: facial expression detection (outputting emojis) and silent speech recognition and found that the mean error of all 42 feature points was 0.77 mm for earphones and 0.74 mm for headphones.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
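The update rule summarized above can be sketched in a few lines. This is a minimal illustration of the Adam update using its commonly cited default hyperparameters, not code from the SottoVoce paper; the function names `adam_minimize` and `quadratic_grad`, the learning rate, and the toy objective are assumptions chosen for the example.

```python
import math

def adam_minimize(grad, x0, steps=2000, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a function of a list of floats, given its gradient, with Adam updates."""
    x = list(x0)
    m = [0.0] * len(x)  # running estimate of the first moment (mean of gradients)
    v = [0.0] * len(x)  # running estimate of the second moment (mean of squared gradients)
    for t in range(1, steps + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            # Bias correction compensates for the zero initialization of m and v.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

def quadratic_grad(x):
    # Gradient of the toy objective f(x, y) = (x - 3)^2 + (y + 1)^2 (hypothetical example).
    return [2 * (x[0] - 3.0), 2 * (x[1] + 1.0)]

result = adam_minimize(quadratic_grad, [0.0, 0.0])
print(result)
```

On this toy objective the iterate should settle near the minimizer (3, -1); a real training loop would apply the same per-parameter update to network weights using mini-batch gradients.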
Book Chapter

U-Net: Convolutional Networks for Biomedical Image Segmentation

TL;DR: Ronneberger et al. propose a network and training strategy that relies heavily on data augmentation to use the available annotated samples more efficiently; the network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Posted Content

Adam: A Method for Stochastic Optimization

TL;DR: This article introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments.
Posted Content

U-Net: Convolutional Networks for Biomedical Image Segmentation

TL;DR: It is shown that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Journal Article

Signal estimation from modified short-time Fourier transform

TL;DR: An algorithm to estimate a signal from its modified short-time Fourier transform (STFT) by minimizing the mean squared error between the STFT of the estimated signal and the modified STFT magnitude is presented.
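The iterative form of this algorithm is commonly known as Griffin-Lim. The sketch below is a small pure-Python illustration of it, not the SottoVoce implementation: it alternates between imposing a target STFT magnitude and re-synthesizing with a least-squares overlap-add. The frame length, hop size, Hann window, test signal, and all function names are assumptions made for the example, and a naive DFT is used only to keep the block dependency-free.

```python
import cmath
import math
import random

def dft(frame):
    n = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]

def idft(spec):
    n = len(spec)
    return [(sum(spec[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n).real
            for k in range(n)]

def stft(x, win, hop):
    n = len(win)
    return [dft([x[s + k] * win[k] for k in range(n)])
            for s in range(0, len(x) - n + 1, hop)]

def istft(frames, win, hop, length):
    # Least-squares synthesis: windowed overlap-add divided by the summed squared window.
    n = len(win)
    num = [0.0] * length
    den = [0.0] * length
    for m, spec in enumerate(frames):
        y = idft(spec)
        for k in range(n):
            num[m * hop + k] += win[k] * y[k]
            den[m * hop + k] += win[k] ** 2
    return [a / d if d > 1e-12 else 0.0 for a, d in zip(num, den)]

def magnitude_error(x, target_mag, win, hop):
    # Squared distance between the STFT magnitude of x and the target magnitude.
    return sum((abs(c) - m) ** 2
               for srow, mrow in zip(stft(x, win, hop), target_mag)
               for c, m in zip(srow, mrow))

def griffin_lim(target_mag, win, hop, length, iters=30, seed=0):
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(length)]  # arbitrary starting signal
    for _ in range(iters):
        spec = stft(x, win, hop)
        # Keep the current phase, replace the magnitude with the target.
        combined = [[cmath.rect(m, cmath.phase(c)) for c, m in zip(srow, mrow)]
                    for srow, mrow in zip(spec, target_mag)]
        x = istft(combined, win, hop, length)
    return x

# Hypothetical test setup: 16-sample Hann frames, hop of 4, two-tone signal.
N, HOP, LENGTH = 16, 4, 64
window = [0.5 - 0.5 * math.cos(2 * math.pi * k / N) for k in range(N)]
original = [math.sin(2 * math.pi * 3 * i / N) + 0.5 * math.sin(2 * math.pi * 5 * i / N)
            for i in range(LENGTH)]
target = [[abs(c) for c in row] for row in stft(original, window, HOP)]

rng = random.Random(0)
start = [rng.uniform(-1.0, 1.0) for _ in range(LENGTH)]
err_before = magnitude_error(start, target, window, HOP)
estimate = griffin_lim(target, window, HOP, LENGTH)
err_after = magnitude_error(estimate, target, window, HOP)
print(err_before, err_after)
```

The magnitude error should shrink from the random start to the final estimate, although the recovered waveform may still differ from the original in sign or phase. In a pipeline like the one the abstract describes, a procedure of this kind could turn predicted magnitude-like acoustic features into a waveform, but the page does not state how the authors use this reference.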