Bio: H.F. Hammed is an academic researcher. The author has contributed to research on speaker recognition and word recognition, has an h-index of 1, and has co-authored 2 publications receiving 25 citations.
• 01 Jan 2003
TL;DR: A pattern matching algorithm based on HMM is implemented using Field Programmable Gate Array (FPGA) for isolated Arabic word recognition and achieved a recognition accuracy comparable with the powerful classical recognition system.
Abstract: In this work we propose a speech recognition system for Arabic speech based on a hardware/software co-design approach. Speech recognition is a computationally demanding task, especially the pattern matching stage. The Hidden Markov Model (HMM) is considered the most powerful modeling and matching technique across the different speech recognition tasks. Implementing the time-consuming pattern matching algorithm in dedicated hardware speeds up the recognition process. In this paper, a pattern matching algorithm based on HMM is implemented on a Field Programmable Gate Array (FPGA). The forward algorithm, the core of the matching stage in HMM, is analyzed and modified to be more suitable for FPGA implementation. Implementation results show that the recognition accuracy of the modified algorithm is very close to that of the classical algorithm, while achieving higher speed and occupying less area on the FPGA. The proposed approach is applied to isolated Arabic word recognition and achieves a recognition accuracy comparable to the powerful classical recognition system.
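The abstract does not detail the FPGA-oriented modifications, but the classical forward algorithm it starts from is standard. A minimal NumPy sketch of that baseline, for a discrete-observation HMM (variable names are illustrative, not from the paper):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Classical HMM forward algorithm: returns P(obs | model).

    pi  : (N,) initial state probabilities
    A   : (N, N) transition matrix, A[i, j] = P(state j | state i)
    B   : (N, M) emission matrix, B[i, k] = P(symbol k | state i)
    obs : sequence of observed symbol indices
    """
    alpha = pi * B[:, obs[0]]          # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction: sum over predecessors
    return alpha.sum()                 # termination
```

The repeated multiply-accumulate in the induction step is exactly the kind of regular datapath that maps well onto FPGA logic; the paper's modification presumably restructures this recursion, though the abstract does not say how.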
• 01 Dec 2005
TL;DR: A vector quantization classifier is implemented in an FPGA, reaching an identification rate of almost 100% in 18.8 μs while using only 22% of the slices of the Spartan-3 chip.
Abstract: Speaker identification is a challenging pattern classification task. It is used extensively in many applications such as security systems and information retrieval services. Portable identification systems are expected to be widely used in the future for many purposes, such as mobile applications. Implementing the identification technique in dedicated hardware can be very useful for building smart units. In this context, the Field Programmable Gate Array (FPGA) offers an efficient technology for realizing a pattern classification strategy. A speaker identification system can be implemented using many classification approaches; one of these is vector quantization (VQ), which is considered one of the most powerful classification techniques. In this paper a vector quantization classifier is implemented in an FPGA. We reach an identification rate of almost 100% in 18.8 μs using only 22% of the slices of the Spartan-3 chip.
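The abstract names VQ classification without spelling out the decision rule. In a typical VQ-based speaker identification scheme, each enrolled speaker has a codebook, and a test utterance is assigned to the speaker whose codebook quantizes its feature frames with the lowest average distortion. An illustrative software sketch (the paper's hardware datapath and distance metric are not specified, so Euclidean distance is an assumption):

```python
import numpy as np

def identify(feature_vectors, codebooks):
    """Return the speaker whose VQ codebook gives the lowest average
    quantization distortion on the test feature vectors.

    feature_vectors : (T, D) array of feature frames (e.g. MFCCs)
    codebooks       : dict mapping speaker id -> (K, D) codebook
    """
    def distortion(frames, codebook):
        # Euclidean distance from each frame to every codeword,
        # keeping only the nearest codeword per frame.
        d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
        return d.min(axis=1).mean()

    return min(codebooks, key=lambda s: distortion(feature_vectors, codebooks[s]))
```

The inner nearest-codeword search is a small, fixed-size comparison tree, which is one reason VQ classifiers fit comfortably into a fraction of an FPGA.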
TL;DR: A speech recognition system that allows arm‐disabled students to control computers by voice as a helping tool in the educational process and achieves higher recognition rates than other relevant approaches.
Abstract: Over the previous decades, a need has emerged to empower human-machine communication systems, which are essential not only for performing actions but also for obtaining information, especially in educational applications. Moreover, any communication system has to provide an efficient and easy way to interact with the lowest possible error rate. The keyboard, mouse, trackball, touch-screen, and joystick are all examples of tools built to provide mechanical human-to-machine interaction. However, a system able to use oral speech, the natural form of communication between humans, instead of mechanical communication can be more practical for typical students and even a necessity for arm-disabled students who cannot use their arms to handle traditional education tools like pens and notebooks. In this paper, we present a speech recognition system that allows arm-disabled students to control computers by voice as a helping tool in the educational process. When a student speaks through a microphone, the speech is divided into isolated words, which are compared against a large predefined database of spoken words to find a match. Each recognized word is then translated into its related task, which the computer performs, such as opening a teaching application or renaming a file. The speech recognition process discussed in this paper involves two separate approaches: the first is based on double-threshold voice activity detection and improved Mel-frequency cepstral coefficients (MFCC), while the second is based on the discrete wavelet transform along with a modified MFCC algorithm. Using the best parameter values for these techniques, our proposed system achieved a recognition rate of 98.7% with the first approach and 98.86% with the second; the second approach yields a higher rate but is slower, a critical drawback for a real-time system.
Both proposed approaches were compared with other relevant approaches and their recognition rates were noticeably higher.
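The abstract names double-threshold voice activity detection but does not define it. In the standard formulation, frames whose short-time energy exceeds a high threshold are marked as certain speech, and each speech region is then extended outward while the energy stays above a lower threshold. A minimal sketch of that energy-only idea (the paper's actual thresholds and its zero-crossing-rate refinement are not reproduced here):

```python
import numpy as np

def double_threshold_vad(signal, frame_len=256, low_ratio=0.05, high_ratio=0.2):
    """Energy-based double-threshold VAD sketch.

    Returns a boolean array with one speech/non-speech flag per frame.
    Thresholds are set as fractions of the peak frame energy
    (illustrative choices, not taken from the paper).
    """
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).sum(axis=1)
    high, low = high_ratio * energy.max(), low_ratio * energy.max()
    speech = energy > high                       # certain speech frames
    for i in range(1, n):                        # extend regions forward
        if energy[i] > low and speech[i - 1]:
            speech[i] = True
    for i in range(n - 2, -1, -1):               # and backward
        if energy[i] > low and speech[i + 1]:
            speech[i] = True
    return speech
```

The two thresholds trade off false triggering on background noise (the high threshold) against clipping low-energy word onsets and endings (the low threshold), which is why this scheme suits isolated-word endpoint detection.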
TL;DR: The benefit of the overall accuracy of the integrated system (e.g., translation) outweighs the WER increase for the Arabic ASR system and it is recommended to include diacritics for ASR systems when integrated with other systems such as voice-enabled translation.
Abstract: Arabic is the native language of over 300 million speakers and one of the official languages of the United Nations. It has a unique set of diacritics that can alter a word’s meaning. Arabic automatic speech recognition (ASR) has received little attention compared to other languages, and most research has ignored the diacritics. Omitting diacritics circumscribes the Arabic ASR system’s usability for several applications such as voice-enabled translation, text-to-speech, and speech-to-speech. In this paper, we study the effect of diacritics on Arabic ASR systems. Our approach is based on building and comparing diacritized and nondiacritized models for different corpus sizes. In particular, we build Arabic ASR models using state-of-the-art technologies for 1, 2, 5, 10, and 23 hours of speech. Each of these models was trained once with a diacritized corpus and again with a nondiacritized version of the same corpus. The KALDI toolkit and SRILM were used to build eight models for each corpus: GMM-SI, GMM-SAT, GMM-MPE, GMM-MMI, SGMM, SGMM-bMMI, DNN, and DNN-MPE. Eighty different models were created using this experimental setup. Our results show that Word Error Rates (WERs) ranged from 4.68% to 42%. Adding diacritics increased WER by 0.59% to 3.29%. Although diacritics increased WERs, it is recommended to include diacritics for ASR systems when integrated with other systems such as voice-enabled translation. We believe that the benefit to the overall accuracy of the integrated system (e.g., translation) outweighs the WER increase for the Arabic ASR system.
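All the comparisons above hinge on Word Error Rate, which is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A small self-contained sketch of the standard computation (KALDI's scoring tools do this internally):

```python
def wer(ref, hyp):
    """Word Error Rate = (S + D + I) / N, computed via edit distance
    between the reference and hypothesis word sequences."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(r)][len(h)] / len(r)
```

Note that for diacritized Arabic, every word carries extra symbols that must also match, which is one intuition for why the diacritized models show modestly higher WERs.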
• 22 Dec 2008
TL;DR: In this paper, a hardware implemented backend search stage for a speech recognition system is provided, which includes a number of pipelined stages including a fetch stage, an updating stage which may be a Viterbi stage, a transition and prune stage, and a language model stage.
Abstract: A hardware implemented backend search stage, or engine, for a speech recognition system is provided. In one embodiment, the backend search engine includes a number of pipelined stages including a fetch stage, an updating stage which may be a Viterbi stage, a transition and prune stage, and a language model stage. Each active triphone of each active word is represented by a corresponding triphone model. By being pipelined, the stages of the backend search engine are enabled to simultaneously process different triphone models, thereby providing high-rate backend searching for the speech recognition system. In one embodiment, caches may be used to cache frequently and/or recently accessed triphone information utilized by the fetch stage, frequently and/or recently accessed triphone-to-senone mappings utilized by the updating stage, or both.
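The "updating stage which may be a Viterbi stage" refers to the per-frame dynamic-programming recursion at the heart of decoding: for each state, pick the best-scoring predecessor and add the frame's acoustic score. A minimal NumPy sketch of one such update in log-space (this illustrates the classical Viterbi recursion, not the patent's pipelined hardware design; the backpointers would feed its transition and prune stage):

```python
import numpy as np

def viterbi_update(scores, logA, logB_t):
    """One time-step of a Viterbi updating stage.

    scores : (N,) best log-scores per state at time t-1
    logA   : (N, N) log transition matrix, logA[i, j] for i -> j
    logB_t : (N,) log acoustic (emission) scores for frame t
    Returns the updated scores and the best predecessor per state.
    """
    cand = scores[:, None] + logA  # score of reaching each state from each predecessor
    back = cand.argmax(axis=0)     # backpointer: best predecessor per state
    return cand.max(axis=0) + logB_t, back
```

Because each state's max-and-add is independent of the others, successive frames (or, as in the patent, different triphone models) can be streamed through such a stage in a pipeline.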
TL;DR: Field Programmable Gate Arrays (FPGAs) seem to provide a unique advantage for the implementation of Digital Signal Processing (DSP) systems and, by extension, ASR systems.
Abstract: Speech recognition is one of the next-generation technologies for human–computer interaction. Speech recognition has been researched since the late 1950s, but its progress has been impeded by its computational complexity and the limited computing capabilities of the past few decades. In laboratory settings, automatic speech recognition (ASR) systems have achieved high recognition accuracies, which tend to degrade in real-world environments. This paper analyses the basics of the speech recognition system. Major problems faced by ASR in real-world environments are discussed, with a focus on the techniques used in the development of noise-robust ASR. Throughout the years there have been different implementation media for ASR, but Field Programmable Gate Arrays (FPGAs) seem to provide a unique advantage for the implementation of Digital Signal Processing (DSP) systems and, by extension, ASR systems.