scispace - formally typeset
Author

Wen Chao

Bio: Wen Chao is an academic researcher. The author has contributed to research in the topics Time domain and Domain (software engineering), has an h-index of 1, and has co-authored 1 publication that has received 2 citations.

Papers
Patent
Bo Wu, Yu Meng, Lianwu Chen, Wen Chao, Su Dan, Yu Dong 
24 Apr 2020
TL;DR: In this article, a time-domain separation model is proposed for separating noise data from voice data in audio, so that the entire voice recognition process can be completed in the time domain without converting the audio to the frequency domain.
Abstract: The invention discloses a voice recognition method and device, computer equipment and a storage medium, and belongs to the field of data processing. The method comprises the steps of: inputting collected audio data into a time domain separation model, which performs prediction based on the audio data to obtain time domain separation information, the time domain separation information being used for separating noise data from voice data in the audio data; performing voice separation on the audio data based on the time domain separation information to obtain time domain voice data; performing feature extraction on the time domain voice data to obtain corresponding time domain voice feature information; and performing voice recognition on that feature information to determine the voice content corresponding to the time domain voice data. With this method, the computer equipment can complete the entire voice recognition process in the time domain, without first converting the audio information from the time domain to the frequency domain and then carrying out voice separation, thereby improving the speed of voice recognition.
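The pipeline claimed above (separate the voice in the time domain, then extract features and recognize) can be sketched roughly as follows. The mask-based separator, the frame sizes, and the log-energy features are illustrative assumptions for the sketch, not details taken from the patent:

```python
import numpy as np

def separate_time_domain(audio, model):
    """Apply a model's predicted time-domain separation information
    (here, a per-sample mask in [0, 1]) directly to the waveform --
    no conversion to the frequency domain."""
    mask = model(audio)          # same length as audio
    return audio * mask          # time-domain voice estimate

def frame_features(voice, frame_len=160, hop=80):
    """Extract simple framewise features (log energy) from the
    time-domain voice signal; a real system would use a learned
    feature extractor feeding the recognizer."""
    frames = [voice[i:i + frame_len]
              for i in range(0, len(voice) - frame_len + 1, hop)]
    return np.log(np.stack([np.sum(f ** 2) for f in frames]) + 1e-8)

# Toy "separation model": identity mask (passes everything through).
audio = np.random.default_rng(0).standard_normal(16000)
voice = separate_time_domain(audio, lambda x: np.ones_like(x))
feats = frame_features(voice)    # one feature vector per frame
```

The point of the sketch is that every step operates on the raw waveform, which is the speed advantage the abstract claims over frequency-domain separation.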

2 citations


Cited by
Patent
13 Nov 2020
TL;DR: In this paper, the authors propose converting acquired audio data into a corresponding spectrogram and inputting it into a multi-task convolutional neural network acoustic model.
Abstract: The embodiment of the invention provides a voice recognition method and device, equipment and a medium. The method comprises the steps of: converting acquired audio data into a corresponding spectrogram; judging whether the frame number of the spectrogram equals a preset frame number; if not, carrying out zero filling on the spectrogram so that the frame number of the resulting spectrogram to be identified equals the preset frame number; and inputting the spectrogram to be identified into a multi-task convolutional neural network acoustic model, which then recognizes the text of the audio data. Compared with the prior art, in which MFCC features cause information loss in the frequency domain, the method reduces the loss of input features, increases the recognizability of the audio data, and facilitates the acoustic model's extraction of feature information.
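The zero-filling step described above can be sketched as below. The preset frame number and the choice to truncate over-long inputs are assumptions for the sketch; the abstract only specifies zero-filling spectrograms that are too short:

```python
import numpy as np

PRESET_FRAMES = 300   # assumed fixed input length of the acoustic model

def pad_spectrogram(spec, preset=PRESET_FRAMES):
    """Zero-fill trailing frames so the spectrogram to be identified
    has exactly the preset frame number expected by the model."""
    n_frames, n_bins = spec.shape
    if n_frames == preset:
        return spec
    if n_frames > preset:
        return spec[:preset]          # assumption: truncate long inputs
    padded = np.zeros((preset, n_bins), dtype=spec.dtype)
    padded[:n_frames] = spec          # original frames, then zeros
    return padded

spec = np.random.default_rng(1).random((120, 80))  # 120 frames, 80 bins
fixed = pad_spectrogram(spec)                      # 300 frames, 80 bins
```

Fixing the frame count this way lets a convolutional acoustic model consume the raw spectrogram directly, which is the abstract's alternative to lossy MFCC features.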
Patent
25 Dec 2020
TL;DR: In this paper, a voiceprint identification model with a neural network architecture is obtained by training on the first sample feature information of the broadband spectrogram and the second sample feature information of the narrowband spectrogram.
Abstract: The invention provides a voiceprint identification model training method and device, a voiceprint identification method and device, equipment and a medium, and relates to the technical field of data processing. The training method comprises the steps of: determining a target phoneme sample from the phonemes of a voice sample, the phonemes having been labeled with speaker labels in advance; generating a broadband spectrogram and a narrowband spectrogram of the phonemes in the target phoneme sample; obtaining first sample feature information from the broadband spectrogram and second sample feature information from the narrowband spectrogram; and, according to the first and second sample feature information, performing model training with a preset neural network architecture to obtain a voiceprint identification model. Because the model is trained on features from both the broadband and the narrowband spectrograms, voiceprint identification can be performed on voice to be identified based on the model, the waste of human resources is reduced, and the objectivity and accuracy of voiceprint identification are improved.
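The broadband/narrowband distinction above comes from the STFT window length: short windows give a broadband spectrogram with fine time resolution, long windows a narrowband one with fine frequency resolution. A minimal sketch, with window and hop sizes chosen as illustrative assumptions:

```python
import numpy as np

def spectrogram(signal, win_len, hop):
    """Magnitude STFT. Short windows -> broadband spectrogram (good time
    resolution); long windows -> narrowband (good frequency resolution)."""
    win = np.hanning(win_len)
    frames = [signal[i:i + win_len] * win
              for i in range(0, len(signal) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

sr = 8000
# Toy stand-in for a labeled phoneme: 1 s of a 440 Hz tone.
phoneme = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
broadband = spectrogram(phoneme, win_len=48, hop=24)     # ~6 ms window
narrowband = spectrogram(phoneme, win_len=256, hop=128)  # ~32 ms window
```

Feature vectors computed from both spectrograms would then be fed jointly into the neural network during training, as the abstract describes.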