scispace - formally typeset
Author

Yu Chun

Bio: Yu Chun is an academic researcher affiliated with the Chinese Ministry of Education. The author has contributed to research in the topics Direct voice input and Intelligent electronic device, has an h-index of 2, and has co-authored 7 publications receiving 9 citations.

Papers
Proceedings ArticleDOI
06 May 2021
TL;DR: FaceSight, as mentioned in this paper, is a computer vision-based hand-to-face gesture sensing technique for AR glasses: an infrared camera fixed onto the bridge of the glasses provides extra sensing of the lower face and of hand behaviors.
Abstract: We present FaceSight, a computer vision-based hand-to-face gesture sensing technique for AR glasses. FaceSight fixes an infrared camera onto the bridge of AR glasses to provide extra sensing capability of the lower face and of hand behaviors. We obtained 21 hand-to-face gestures and demonstrated their potential interaction benefits through five AR applications. We designed and implemented an algorithm pipeline that segments facial regions, detects hand-face contact (F1 score: 98.36%), and trains convolutional neural network (CNN) models to classify the hand-to-face gestures. The supported input features include gesture recognition, nose deformation estimation, and continuous fingertip movement. Our algorithm achieves a classification accuracy of 83.06% across all gestures, validated with data from 10 users. Given its compact form factor and rich gesture set, we regard FaceSight as a practical solution for augmenting the input capability of AR glasses in the future.
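The abstract's three-stage pipeline (segment regions, detect hand-face contact, classify the gesture) can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the paper's implementation: the fixed face region, the brightness threshold, and the size-based stand-in for the CNN classifier are all hypothetical.

```python
# Toy FaceSight-style pipeline: stage 1 segments bright (IR-reflective) pixels
# into a hand mask, stage 2 flags hand-face contact when the mask overlaps an
# assumed face region, stage 3 stands in for the paper's CNN classifier.

FACE_REGION = {(r, c) for r in range(2, 5) for c in range(2, 5)}  # assumed ROI

def segment_hand(frame, threshold=128):
    """Stage 1: pixels brighter than `threshold` are treated as hand."""
    return {(r, c)
            for r, row in enumerate(frame)
            for c, v in enumerate(row)
            if v > threshold}

def detect_contact(hand_mask, face_region=FACE_REGION, min_overlap=2):
    """Stage 2: contact = enough hand pixels inside the face region."""
    return len(hand_mask & face_region) >= min_overlap

def classify_gesture(hand_mask):
    """Stage 3 stand-in for the CNN: a dummy size-based label."""
    return "tap" if len(hand_mask) < 6 else "press"

frame = [[0] * 6 for _ in range(6)]
frame[3][3] = frame[3][4] = frame[2][3] = 200  # small bright blob on the face
mask = segment_hand(frame)
print(detect_contact(mask), classify_gesture(mask))  # True tap
```

The real system classifies 21 gestures from IR imagery; the point of the sketch is only the stage ordering: segmentation gates contact detection, and contact gates classification.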

19 citations

Patent
25 Jul 2017
TL;DR: In this article, a hand shape tracking method fixes depth sensors at two or more positions on the arm or wrist; each depth sensor obtains depth map information of its corresponding position.
Abstract: The invention provides a hand shape tracking method, a device, and a computer-readable medium. The method comprises fixing depth sensors at two or more positions on the arm or wrist, each sensor obtaining depth map information of its corresponding position, and reconstructing three-dimensional spatial information of the hand in real time from the obtained depth maps. By arranging the depth sensors on the arm or wrist, portable, cheap, and precise hand shape tracking can be achieved.
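The reconstruction step (merging per-sensor depth maps into one 3D representation) can be sketched as below. The orthographic back-projection and the `(depth_map, origin)` sensor pairs are assumptions for illustration; a real system would need camera intrinsics and extrinsic calibration between the wrist-mounted sensors.

```python
# Hypothetical sketch: fuse depth maps from several wrist/arm-mounted sensors
# into a single 3D point cloud.

def depth_to_points(depth_map, origin):
    """Back-project pixel (r, c) with depth d into a point offset by the
    sensor's assumed position `origin`; zero depth means 'no reading'."""
    ox, oy, oz = origin
    return [(ox + c, oy + r, oz + d)
            for r, row in enumerate(depth_map)
            for c, d in enumerate(row)
            if d > 0]

def fuse_sensors(sensors):
    """Merge the point clouds of all (depth_map, origin) sensor pairs."""
    cloud = []
    for depth_map, origin in sensors:
        cloud.extend(depth_to_points(depth_map, origin))
    return cloud

# Two toy 1x2 depth maps from sensors mounted 10 units apart:
cloud = fuse_sensors([([[3, 0]], (0, 0, 0)),
                      ([[0, 4]], (10, 0, 0))])
print(cloud)  # [(0, 0, 3), (11, 0, 4)]
```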

3 citations

Patent
08 Nov 2019
TL;DR: In this paper, an interaction method performs voice input when the user holds an intelligent electronic device close to the mouth; the interaction burden and difficulty are reduced, making interaction more natural.
Abstract: The invention provides electronic equipment configured with multiple microphones. The equipment comprises a storage and a central processing unit; a computer-executable instruction is stored in the storage. When the instruction is executed by the central processing unit, the following operations are performed: the sound signals acquired by the multiple microphones are analyzed; whether the user is speaking to the equipment at close range is judged; and, in response to determining that the user is speaking to the equipment at close range, the sound signals acquired by the microphones are processed as the user's voice input. The interaction method is suitable for voice input while the user carries the intelligent electronic equipment; operation is natural and simple, the voice input step is simplified, the interaction burden and difficulty are reduced, and interaction becomes more natural.
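One plausible way to judge close-range speech with multiple microphones is a level-difference test: a mouth a few centimeters from one mic makes that channel much louder than the others. This heuristic and its threshold are assumptions for illustration, not the patent's actual analysis.

```python
# Toy close-talk test: compare RMS levels of two microphone channels; a large
# ratio suggests the user is speaking right next to the near microphone.

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def is_close_speech(near_mic, far_mic, ratio=3.0):
    """Assumed rule: close speech when the near channel is `ratio`x louder."""
    return rms(near_mic) > ratio * max(rms(far_mic), 1e-9)

close = [0.9, -0.8, 0.7, -0.9]   # loud at the near mic
far   = [0.1, -0.1, 0.1, -0.1]   # faint at the far mic
print(is_close_speech(close, far))  # True
print(is_close_speech(far, far))    # False
```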

2 citations

Patent
15 Nov 2019
TL;DR: In this article, the sliding-touch input of a user on an input plane is adapted to display the sliding track and candidate words on a display plane, improving the user's input efficiency with a high error tolerance.
Abstract: The invention provides an input method, an input device, an input system, and electronic equipment. The input method is applied to first equipment with a display unit and comprises: receiving a first sliding track from second equipment, generated by the user performing a first sliding-touch operation on an input unit of the second equipment; displaying the first sliding track and a soft keyboard for text input on the display unit while the track is being received; matching the first sliding track against at least one candidate word in a word bank according to a matching algorithm; and displaying at least one candidate word on the display unit, wherein the starting point of the first sliding track is displayed at a fixed position on the soft keyboard, such as its center. In this way, the user's sliding-touch input on the input plane can be adapted to display the sliding track and candidate words on the display plane, thereby improving the user's input efficiency with a high error-tolerance rate.
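The matching step can be sketched with a deliberately simple rule: snap each track point to its nearest key, collapse repeats, and keep the vocabulary words whose letters appear in order in the key sequence. The one-row keyboard layout and the subsequence rule are assumptions for illustration, not the patent's matching algorithm.

```python
# Toy swipe-to-word matcher on an assumed one-row layout of four keys.

KEYS = {"c": (0, 0), "a": (1, 0), "r": (2, 0), "t": (3, 0)}

def nearest_key(pt):
    """Snap a track point to the closest key center."""
    return min(KEYS, key=lambda k: (KEYS[k][0] - pt[0]) ** 2
                                   + (KEYS[k][1] - pt[1]) ** 2)

def track_to_keys(track):
    """Convert the track to a key sequence, collapsing consecutive repeats."""
    keys = []
    for pt in track:
        k = nearest_key(pt)
        if not keys or keys[-1] != k:
            keys.append(k)
    return "".join(keys)

def candidates(track, vocab):
    """Keep words whose letters occur, in order, in the key sequence."""
    keys = track_to_keys(track)
    def in_order(word):
        it = iter(keys)
        return all(ch in it for ch in word)  # membership consumes the iterator
    return [w for w in vocab if in_order(w)]

# A swipe passing over c, a, t matches "cat" but not "car":
print(candidates([(0, 0), (0.9, 0.1), (1.2, 0), (3, 0)], ["cat", "car"]))
```

Production gesture keyboards score whole-path shape similarity against word templates rather than exact key subsequences, which is where the abstract's error tolerance comes from.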

1 citation

Patent
20 Sep 2019
TL;DR: In this paper, the authors propose a voice input triggering method for an intelligent electronic device equipped with a sensor system: when the device is located beside the user's mouth, voice input is automatically activated.
Abstract: The invention provides a voice input triggering method and an intelligent electronic device. The triggering method is applied to an intelligent electronic device with a sensor system: when the device is located beside the user's mouth, voice input is automatically activated. The mobile device captures signals through the sensor system and judges whether the device is close to the user's mouth; in response to determining that it is, voice input is activated. While the user performs voice input, the sensor system detects the signal that the user's mouth has left the device, ending the voice input. The method is therefore well suited to voice input on an intelligent electronic device; it improves the reception quality, efficiency, and privacy of voice input and makes interaction more natural.
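The activate-near-mouth / deactivate-on-leaving behavior amounts to a small state machine over a proximity reading. The distance thresholds and the hysteresis gap are assumptions for illustration; the patent does not specify them.

```python
# Minimal sketch of the trigger logic: start voice input when the device
# reaches the mouth, stop when it moves away; the gap between the two
# thresholds (hysteresis) avoids flapping near the boundary.

class MicTrigger:
    def __init__(self, near_cm=5.0, far_cm=10.0):
        self.near_cm, self.far_cm = near_cm, far_cm
        self.active = False

    def update(self, distance_cm):
        """Feed one proximity reading; return 'start', 'stop', or None."""
        if not self.active and distance_cm <= self.near_cm:
            self.active = True
            return "start"
        if self.active and distance_cm >= self.far_cm:
            self.active = False
            return "stop"
        return None

trig = MicTrigger()
print([trig.update(d) for d in (30, 4, 6, 25)])
# [None, 'start', None, 'stop']
```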

1 citation


Cited by
Patent
09 Feb 2018
TL;DR: Wang et al., as mentioned in this paper, proposed a 3D convolutional neural network sign language identification method integrating multi-modal data. The method accurately extracts limb movement track information from two different data formats, effectively reduces the computational complexity of the model, and resolves the misclassifications a single classifier makes when data is lost.
Abstract: The invention discloses a 3D convolutional neural network sign language identification method integrating multi-modal data. The method comprises: constructing a deep neural network; performing feature extraction on an infrared image and a contour image of a gesture along both the spatial and temporal dimensions of a video; and integrating the outputs of the two networks, each based on a different data format, for the final classification of sign language. The method accurately extracts limb movement track information from the two data formats and effectively reduces the computational complexity of the model. By using a deep learning strategy to integrate the classification results of the two networks, it effectively solves the problem of a single classifier misclassifying because of data loss, giving the model higher robustness to the illumination and background noise of different scenes.
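The final integration step is a form of late fusion. A common scheme (assumed here; the patent does not detail its fusion rule) is to average the per-class probabilities of the two streams and take the argmax:

```python
# Sketch of late fusion: average the class probabilities produced by the
# infrared-stream and contour-stream networks, then pick the best class.

def fuse_predictions(probs_ir, probs_contour):
    """Return (label, fused_probs) from two per-class probability dicts."""
    fused = {c: (probs_ir[c] + probs_contour[c]) / 2 for c in probs_ir}
    return max(fused, key=fused.get), fused

probs_ir      = {"hello": 0.6, "thanks": 0.3, "yes": 0.1}
probs_contour = {"hello": 0.4, "thanks": 0.5, "yes": 0.1}
label, fused = fuse_predictions(probs_ir, probs_contour)
print(label, fused["hello"])  # hello 0.5
```

Averaging lets a confident stream outvote a stream degraded by data loss, which is the robustness benefit the abstract claims.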

18 citations

Proceedings ArticleDOI
06 May 2021
TL;DR: Auth+Track, as discussed by the authors, uses sparse authentication and continuous tracking of the user's status, integrating body and near-field hand information for user tracking, to improve authentication efficiency and enhance the user experience with better usability.
Abstract: We propose Auth+Track, a novel authentication model that aims to reduce redundant authentication in everyday smartphone usage. By sparse authentication and continuous tracking of the user's status, Auth+Track eliminates the "gap" authentication between fragmented sessions and enables "Authentication Free when User is Around". To instantiate the Auth+Track model, we present PanoTrack, a prototype that integrates body and near-field hand information for user tracking. We install a fisheye camera on the top of the phone to achieve a panoramic vision that captures both the user's body and on-screen hands. Based on the captured video stream, we develop an algorithm that extracts 1) features for user tracking, including body keypoints with their temporal and spatial association and near-field hand status, and 2) features for user identity assignment. The results of our user studies validate the feasibility of PanoTrack and demonstrate that Auth+Track not only improves authentication efficiency but also enhances the user experience with better usability.
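The "sparse authentication + continuous tracking" model reduces to simple session logic: authenticate once, stay unlocked while the tracker still sees the authenticated user, and require re-authentication only after losing them. The state machine below is an illustrative assumption, not PanoTrack's implementation.

```python
# Toy sketch of the Auth+Track session model.

class AuthTrack:
    def __init__(self):
        self.authenticated = False

    def on_auth_success(self):
        """An explicit authentication (e.g., fingerprint) succeeded."""
        self.authenticated = True

    def on_tracking(self, user_visible):
        """Tracker update: losing the user revokes the authenticated state."""
        if not user_visible:
            self.authenticated = False

    def needs_auth(self):
        return not self.authenticated

s = AuthTrack()
assert s.needs_auth()        # first unlock requires authentication
s.on_auth_success()
s.on_tracking(True)
assert not s.needs_auth()    # user stayed around: no "gap" authentication
s.on_tracking(False)
assert s.needs_auth()        # user left: next session must authenticate
```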

13 citations

Proceedings ArticleDOI
06 May 2021
TL;DR: ProxiMic, as discussed by the authors, detects close-to-mic speech using the pop noise observed when a user speaks and blows air onto the microphone; it achieves 94.1% activation recall and 12.3 False Accepts per Week per User (FAWU) with a 68 KB memory size.
Abstract: Wake-up-free techniques (e.g., Raise-to-Speak) are important for improving the voice input experience. We present ProxiMic, a close-to-mic (within 5 cm) speech sensing technique using only one microphone. With ProxiMic, a user keeps a microphone-embedded device close to the mouth and speaks directly to the device without wake-up phrases or button presses. To detect close-to-mic speech, we use the feature from pop noise observed when a user speaks and blows air onto the microphone. Sound input is first passed through a low-pass adaptive threshold filter, then analyzed by a CNN which detects subtle close-to-mic features (mainly pop noise). Our two-stage algorithm achieves 94.1% activation recall and 12.3 False Accepts per Week per User (FAWU) with a 68 KB memory size, and runs at 352 fps on a smartphone. The user study shows that ProxiMic is efficient, user-friendly, and practical.
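The two-stage structure (a cheap first-stage filter so the CNN only sees candidate frames) can be sketched as below. The energy gate, its smoothing factor, and the margin are assumptions in the spirit of the paper's adaptive threshold filter, not its actual design.

```python
# Sketch of a two-stage gate: stage 1 is a cheap adaptive-threshold energy
# gate; only frames that pass it would be forwarded to the stage-2 CNN
# pop-noise detector (stubbed out here).

def adaptive_gate(frames, alpha=0.9, margin=2.0):
    """Pass a frame when its energy exceeds `margin` times a running
    (exponentially smoothed) noise-floor estimate; quiet frames update
    the floor instead."""
    floor, passed = None, []
    for i, frame in enumerate(frames):
        energy = sum(s * s for s in frame) / len(frame)
        if floor is None:
            floor = energy
        if energy > margin * floor:
            passed.append(i)          # candidate for the stage-2 CNN
        else:
            floor = alpha * floor + (1 - alpha) * energy
    return passed

quiet = [0.01] * 8
pop   = [0.9] * 8
print(adaptive_gate([quiet, quiet, pop, quiet]))  # [2]
```

Gating first is what makes the reported 68 KB / 352 fps budget plausible: the expensive model runs only on the rare frames that clear the threshold.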

12 citations

Patent
10 Dec 2020
TL;DR: In this paper, an electronic device configured with a microphone, a voice interactive wake-up method executed by such a device, and a computer-readable medium are presented. The method is suitable for voice input when the user carries a smart electronic device and is natural and simple in operation.
Abstract: An electronic device configured with a microphone, a voice interactive wake-up method executed by such a device, and a computer-readable medium. The electronic device comprises a memory and a central processor; the memory stores a computer-executable instruction. When the instruction is executed by the central processor, the following operations are performed: analyzing a sound signal acquired by the microphone and identifying whether it comprises both a person's speaking voice and the wind noise generated when the airflow of that speech hits the microphone (S101); and, in response to determining that the sound signal comprises both, using the sound signal as the user's voice input and processing it (S102). The method is suitable for voice input when the user carries a smart electronic device; operation is natural and simple, the steps of voice input are simplified, the load and difficulty of interaction are reduced, and interaction becomes more natural.
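The patent's test is a conjunction: accept the signal only when both a speech-like component and breath-induced wind noise are present. In the sketch below, a crude two-band split (smoothed signal as the low band, sample-to-sample residual as the high band) stands in for real speech and wind-noise detectors; the features and thresholds are assumptions for illustration.

```python
# Toy sketch of the two-condition gate in step S101/S102.

def band_energies(samples):
    """Crude split: smoothed signal ~ low band (wind/pop energy),
    first-difference residual ~ high band (speech-like energy)."""
    low = [(samples[i] + samples[i - 1]) / 2 for i in range(1, len(samples))]
    high = [(samples[i] - samples[i - 1]) / 2 for i in range(1, len(samples))]
    energy = lambda xs: sum(x * x for x in xs) / len(xs)
    return energy(low), energy(high)

def accept_as_voice_input(samples, low_thr=0.05, high_thr=0.05):
    """Accept only when both bands carry energy (speech AND wind noise)."""
    low, high = band_energies(samples)
    return low > low_thr and high > high_thr

voiced_with_pop = [0.8, -0.2, 0.9, -0.1, 1.0, 0.0]   # offset + oscillation
speech_only     = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]  # no low-band component
print(accept_as_voice_input(voiced_with_pop), accept_as_voice_input(speech_only))
# True False
```

Requiring both components is what distinguishes this wake-up trigger from a plain voice-activity detector: distant speech has the voice but not the breath noise.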

3 citations

DOI
TL;DR: SafeSense, a cross-subject face-touch prediction system, combines the sensing capabilities of smartwatches and smartphones and uses an autoencoder-based multi-task learning approach to learn a subject-invariant representation without any assumptions about the target subjects.
Abstract: The World Health Organization reported that face touching is a primary source of infection transmission of viral diseases, including COVID-19, seasonal influenza, swine flu, and the Ebola virus. People have therefore been advised to avoid this activity to break the viral transmission chain. However, empirical studies have shown that it is difficult or impossible to avoid, as it is an unconscious human habit. This gives rise to the need for means of automatically predicting the occurrence of such activity. In this paper, we propose SafeSense, a cross-subject face-touch prediction system that combines the sensing capabilities of smartwatches and smartphones. The system includes innovative modules for automatically labeling the smartwatches' sensor measurements using the smartphones' proximity sensors during normal phone use. Additionally, SafeSense uses a multi-task learning approach based on autoencoders to learn a subject-invariant representation without any assumptions about the target subjects. SafeSense also improves the deep model's generalization ability and incorporates different modules to boost per-subject accuracy and robustness at run time. We evaluated the proposed system on ten subjects using three different smartwatches and their connected phones. Results show that SafeSense obtains up to 97.9% prediction accuracy with an F1-score of 0.98, outperforming state-of-the-art techniques in all the considered scenarios without extra data-collection overhead. These results highlight the feasibility of the proposed system for boosting public safety.
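The automatic-labeling idea can be sketched as timestamp alignment: a smartwatch sensor window is labeled "face-touch" when the paired phone's proximity sensor fired inside that window during normal phone use (phone at the ear implies a hand at the face). The window length and label names below are assumptions for illustration.

```python
# Toy sketch of SafeSense-style automatic labeling of watch sensor windows
# from phone proximity-sensor events.

def label_windows(window_starts, window_len, proximity_events):
    """Label each [start, start + window_len) watch window by whether any
    phone proximity event timestamp falls inside it."""
    labels = []
    for start in window_starts:
        touched = any(start <= t < start + window_len
                      for t in proximity_events)
        labels.append("face-touch" if touched else "no-touch")
    return labels

# Watch windows start at t = 0, 5, 10 s (5 s long); proximity fired at t = 6 s.
print(label_windows([0, 5, 10], 5, [6]))
# ['no-touch', 'face-touch', 'no-touch']
```

Labels harvested this way supervise the autoencoder-based multi-task model without any manual annotation, which is why the paper can claim no extra data-collection overhead.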

2 citations