Patent

Voice recognition method facing specific crowd

TL;DR: This patent presents a voice recognition method for a specific crowd. The voice signal is sampled and converted from analogue to digital; the digital signal undergoes front-end processing (pre-emphasis, windowing, framing and endpoint detection); features are extracted with a discrete wavelet transform; and a discrete hidden Markov model, trained on samples, performs the recognition.
Abstract: The invention discloses a voice recognition method facing a specific crowd. The method comprises the following steps: first, sampling a voice signal and converting it from an analogue signal to a digital signal; then, performing front-end processing on the digital voice signal, namely pre-emphasis, windowing, framing and endpoint detection; next, extracting features from the voice signal with a discrete wavelet transform; and finally, after training on samples, recognizing the feature-extracted voice signal with a discrete hidden Markov model. During front-end processing and feature extraction, the spectrum features and pronunciation characteristics of the different target crowds are taken into account and the extraction of voice information is optimized, which simplifies both the processing and the information extraction. As a result, recognition precision is maintained while the computation and storage required for recognition are greatly reduced, making voice recognition feasible on an embedded platform.
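The front-end and feature-extraction steps named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the patent's method: the pre-emphasis coefficient 0.97, the Hamming window, the energy-ratio rule for endpoint detection, the Haar wavelet, and the use of per-band energies as features are common textbook choices, and all function names are hypothetical.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_and_window(signal, frame_len, hop):
    """Split the signal into overlapping frames and apply a Hamming window to each."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames

def detect_endpoints(frames, ratio=0.1):
    """Return (first, last) indices of frames whose energy exceeds
    ratio * max energy, or None when no frame carries energy.
    The energy-ratio rule is an assumed, simplified detector."""
    energies = [sum(s * s for s in frame) for frame in frames]
    if not energies or max(energies) == 0:
        return None
    threshold = ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return active[0], active[-1]

def haar_band_energies(frame, levels=3):
    """Energies of the Haar DWT detail bands, followed by the energy of the
    final approximation band. The Haar wavelet is an assumed choice."""
    approx = list(frame)
    energies = []
    s = math.sqrt(2.0)
    for _ in range(levels):
        if len(approx) < 2:
            break
        pairs = range(0, len(approx) - 1, 2)
        detail = [(approx[i] - approx[i + 1]) / s for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / s for i in pairs]
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies
```

In such a pipeline, the frames between the detected endpoints would each be reduced to a short band-energy vector, which could then be quantized into the symbols a discrete hidden Markov model consumes. The quantization and model training are outside this sketch.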
Citations
Patent
20 Nov 2013
TL;DR: This patent presents a media program interaction method and system for multimedia applications. A server builds an audio feature library in real time, storing audio feature values and program interactive information for all media programs; a mobile device records the audio of the currently played program, extracts its audio feature value and sends it to the server; the server matches the received value against the library to identify the program and pushes the corresponding interactive information to the mobile device.
Abstract: The invention provides a media program interaction method and system and is applicable to the field of multimedia application. The media program interaction method comprises the following steps: establishing an audio feature library by a server in real time; storing audio feature values and program interactive information of all media programs; recording an audio of a currently-played media program through a mobile device; extracting the audio feature value of the audio of the media program, and sending the extracted audio feature value to the server; carrying out similarity matching on the received audio feature value and the audio feature values of all the media programs in the audio feature library by the server, so as to identify the currently-played media program; and judging the type information of the currently-played media program by the server, and pushing the corresponding program interactive information to the mobile device according to the type information of the currently-played media program. With the adoption of the media program interaction method and system provided by the invention, the interactive information for the media program a user is watching can be accurately and rapidly fed back to the user. The method and system offer a large amount of interactive information of various types and a better user experience, and can meet the various demands of the user.
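The server-side similarity matching step can be sketched as a nearest-neighbour lookup. The abstract does not say how similarity is measured, so cosine similarity, the acceptance threshold, and the names `cosine_similarity` and `identify_program` are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify_program(query, library, min_similarity=0.8):
    """Return the id of the library entry most similar to the query,
    or None when nothing reaches min_similarity (an assumed threshold).

    library: dict mapping program id -> stored audio feature vector.
    """
    best_id, best_score = None, min_similarity
    for program_id, features in library.items():
        score = cosine_similarity(query, features)
        if score >= best_score:
            best_id, best_score = program_id, score
    return best_id
```

A real feature library would be far larger and would likely need an index rather than a linear scan; the loop above only shows the matching logic.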

36 citations

Patent
03 Dec 2014
TL;DR: This patent proposes a method for recommending music stars whose tones are similar to a singer's, which can increase singing pleasure and help users improve at imitating the tones of music stars.
Abstract: The invention provides a method for recommending music stars with tones similar to those of singers. The method includes the steps that pure human voice frequencies are obtained, preprocessing is conducted on the pure human voice frequencies, a voice feature coefficient set of each pure human voice frequency is extracted, and corresponding music star models are trained through the voice model algorithm; preprocessing is conducted on given user voice samples, and feature coefficient sets are extracted; the feature coefficient sets of the user voice samples are matched with all the music star models, and the music stars with the tones most similar to the tones of the singers are found out. The invention further provides a corresponding device. The method and device for recommending the music stars with the tones similar to those of the singers can be applied to KTV scenes, are used for recommending music stars with the tones similar to those of users, can increase singing pleasure, and improve the level of the users for simulating the tones of the music stars.

13 citations

Patent
10 Jun 2015
TL;DR: This patent proposes an identification method that lets intelligent robots accurately identify personnel. It includes the steps of 1) establishing an identification database, 2) inputting facial images and voice information to be identified, 3) respectively calculating feature vectors of the inputted facial images and voices, 4) performing identification of faces and voices, and 5) outputting identification results.
Abstract: The invention provides an identification method for intelligent robots in order to accurately identify personnel identity. The identification method includes the steps of: 1) establishing an identification database; 2) inputting facial images and voice information to be identified; 3) respectively calculating feature vectors of the inputted facial images and voices; 4) performing identification of faces and voices; and 5) outputting identification results.

11 citations

Patent
20 Jun 2017
TL;DR: This patent proposes a speech recognition method comprising the following steps: acquiring a characteristic classification result of a speech signal to be recognized, wherein the result includes pronunciations describing the pronunciation characteristics of the speech signal frames and the probability that each frame maps to the corresponding pronunciation; filtering out pronunciations based on those probabilities; and recognizing the speech signal from the filtered result.
Abstract: The embodiments of the invention provide a speech recognition method and a speech recognition device. The method comprises the following steps: acquiring a characteristic classification result of a speech signal to be recognized, wherein the characteristic classification result includes pronunciations used to describe the pronunciation characteristics of speech signal frames and the probability that the speech signal frames are mapped to the corresponding pronunciations; removing the pronunciations contained in the characteristic classification result through filtering based on the probability contained in the characteristic classification result; and recognizing the speech signal based on the characteristic classification result after filtering. According to the embodiments of the invention, there is no need to implement recognition operation related to the removed pronunciations in the process of speech signal recognition, for example, there is no need to search a recognition network for a path related to the removed pronunciations. Therefore, the time consumed by the speech recognition process can be reduced, and the efficiency of speech recognition can be improved.
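The probability-based filtering step can be sketched as per-frame pruning. The abstract does not give the filtering rule, so the fixed probability threshold, the rule of always keeping the most likely pronunciation, and the name `prune_pronunciations` are assumptions.

```python
def prune_pronunciations(frame_posteriors, min_prob=0.05):
    """Drop low-probability pronunciations from each frame's classification result.

    frame_posteriors: list of dicts, one per frame, mapping a pronunciation
    label to the probability that the frame maps to it.
    min_prob: assumed pruning threshold.
    """
    pruned = []
    for posteriors in frame_posteriors:
        kept = {label: prob for label, prob in posteriors.items()
                if prob >= min_prob}
        if not kept and posteriors:
            # Keep the single most likely label so the frame stays decodable.
            best = max(posteriors, key=posteriors.get)
            kept = {best: posteriors[best]}
        pruned.append(kept)
    return pruned
```

A decoder working from the pruned result would only need to search paths through the surviving pronunciations, which is where the time saving described in the abstract would come from.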

8 citations

Patent
30 Jan 2018
TL;DR: This patent describes a voice interaction system comprising a preprocessing module, an acoustic model library, a language model library (with N-Gram and rule-based model databases), an identification module (with an MFCC parameter feature extraction unit and an identification control unit) and an interaction center.
Abstract: The invention discloses a voice interaction system and a voice interaction method. The voice interaction system comprises a preprocessing module, an acoustic model library, a language model library, an identification module and an interaction center, wherein the preprocessing module comprises a voice preprocessing module and an endpoint detection module; the acoustic model library comprises an HMM model matching unit, a TDNN model matching unit, an HMM model database and an ANN model database; the language model library comprises an N-Gram model database and a Rule-based model database; the identification module comprises an MFCC parameter feature extraction unit and an identification control unit; and the interaction center comprises a semantic understanding module, an interaction processing module, a response information library and a semantic dictionary database. The voice interaction system and the voice interaction method utilize a feedback module to monitor identification information and a feedback instruction of a client, demonstrate the identification information to the client by means of the identification control unit, control and change a voice matching model and the language model library, and optimize the interactive identification accuracy rate of the voice interaction system.

8 citations

References
PatentDOI
TL;DR: A method and apparatus for first training and then recognizing speech, using subband cepstral features to improve the recognition string accuracy rates for speech inputs.
Abstract: A method and apparatus for first training and then recognizing speech. The method and apparatus use subband cepstral features to improve the recognition string accuracy rates for speech inputs.

107 citations

Patent
05 Sep 2007
TL;DR: This patent proposes an embedded voice identification method based on a sub-word hidden Markov model, which comprises detecting endpoints, extracting frame-synchronous acoustic features, calculating the acoustic feature vector sequence used for identification network decoding, and decoding with the identification network.
Abstract: An embedded voice identification method based on a sub-word hidden Markov model comprises detecting endpoints, extracting frame-synchronous acoustic features, calculating the acoustic feature vector sequence used for identification network decoding, and decoding with the identification network. A device for realizing the method is also disclosed.

39 citations

PatentDOI
Yifan Gong
TL;DR: This patent provides an estimate of a clean speech vector, typically Mel-Frequency Cepstral Coefficients (MFCC), given its noisy observation, using two Gaussian mixtures.
Abstract: An estimate of a clean speech vector, typically Mel-Frequency Cepstral Coefficients (MFCC), given its noisy observation is provided. The method makes use of two Gaussian mixtures: the first is trained on clean speech and the second is derived from the first using some noise samples. The method gives an estimate of a clean speech feature vector as the conditional expectation of clean speech given an observed noisy vector.
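One way to read the conditional-expectation estimate is as a posterior-weighted average of the clean mixture means, with the posteriors computed from the noisy mixture. The sketch below makes several simplifying assumptions that the abstract does not confirm: scalar features, paired components that share weights across the two mixtures, and the approximation that the expected clean value within a component is that component's clean mean. The function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def estimate_clean(y, weights, clean_means, noisy_means, noisy_vars):
    """Approximate E[x | y] as sum_k p(k | y) * clean_means[k].

    p(k | y) is the posterior of component k under the noisy mixture;
    component k of the noisy mixture is assumed to correspond to
    component k of the clean mixture.
    """
    likelihoods = [w * gaussian_pdf(y, m, v)
                   for w, m, v in zip(weights, noisy_means, noisy_vars)]
    total = sum(likelihoods)
    if total == 0.0:
        # Numerical underflow: fall back to the prior component weights.
        posteriors = list(weights)
    else:
        posteriors = [l / total for l in likelihoods]
    return sum(p * m for p, m in zip(posteriors, clean_means))
```

With MFCC vectors the same idea would apply per dimension under diagonal covariances, with the component posteriors computed from the full vector.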

26 citations

Patent
15 Apr 2009
TL;DR: This patent presents a method for identifying isolated words of standard Chinese on the basis of the fundamental frequency envelope.
Abstract: The invention discloses a method for identifying isolated words of standard Chinese on the basis of the fundamental frequency envelope. A vocabulary is stored in a template library in advance in the form of fundamental frequency envelopes. The method then comprises the following steps in sequence: 1) the speech is cut and the noise is reduced; 2) the fundamental frequency characteristics of the speech are extracted; 3) judgment: when the language material is used for training, that is to say, when no corresponding vocabulary exists in a corpus, step 4) is started; when the language material is used for identifying, step 5) is started; 4) the fundamental frequency characteristics of the language material are added to the template library; 5) the fundamental frequency envelope of the test speech is compared with that of each template, and the vocabulary represented by the template with the minimum distance to the test speech is taken as the identification result; 6) the identification result is output. By studying the tones of Chinese, the method is capable of identifying isolated words of standard Chinese by comparing the similarity between the fundamental frequency envelopes of the test speech and of the templates.

7 citations