Author
A.E. Salama
Bio: A.E. Salama is an academic researcher from Cairo University. The author has contributed to research on speaker recognition and vector quantization, has an h-index of 2, and has co-authored 4 publications receiving 34 citations.
Papers
01 Jan 2003
TL;DR: A pattern matching algorithm based on HMM is implemented using Field Programmable Gate Array (FPGA) for isolated Arabic word recognition and achieved a recognition accuracy comparable with the powerful classical recognition system.
Abstract: In this work we propose a speech recognition system for Arabic speech based on a hardware/software co-design implementation approach. Speech recognition is a computationally demanding task, especially the pattern matching stage. The Hidden Markov Model (HMM) is considered the most powerful modeling and matching technique across the different speech recognition tasks. Implementing the time-consuming pattern matching algorithm in dedicated hardware speeds up the recognition process. In this paper, a pattern matching algorithm based on HMM is implemented using a Field Programmable Gate Array (FPGA). The forward algorithm, the core of the matching stage in HMM, is analyzed and modified to be more suitable for FPGA implementation. Implementation results showed that the recognition accuracy of the modified algorithm is very close to that of the classical algorithm, with the gains of higher speed and less occupied area on the FPGA. The proposed approach is used for isolated Arabic word recognition and achieved a recognition accuracy comparable with the powerful classical recognition system.
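The forward algorithm the paper modifies can be sketched as follows. This is the classical formulation only; the paper's FPGA-oriented modification is not reproduced here, and the toy model in the usage note is purely illustrative.

```python
import numpy as np

def forward(A, B, pi, obs):
    """Classical HMM forward algorithm: returns P(obs | model).

    A   : (N, N) state transition matrix (rows sum to 1)
    B   : (N, M) emission probabilities  (rows sum to 1)
    pi  : (N,)   initial state distribution
    obs : sequence of observation symbol indices
    """
    alpha = pi * B[:, obs[0]]            # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # induction step
    return alpha.sum()                   # termination
```

As a sanity check, the forward probabilities of all possible observation sequences of a fixed length must sum to 1 for any valid model.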
24 citations
13 Dec 2005
TL;DR: Vector quantization is implemented on an FPGA, reaching an almost 100% identification rate in 18.8 μs while using only 22% of the slices of a Spartan-3 chip.
Abstract: Speaker identification is a challenging pattern classification task. It is used extensively in many applications such as security systems, information retrieval services, etc. Portable identification systems are expected to be widely used in the future for many purposes, such as mobile applications. Implementing the identification technique in dedicated hardware could be very useful for building smart units. In this context, the Field Programmable Gate Array (FPGA) offers an efficient technology for realizing a pattern classification strategy. A speaker identification system can be implemented using many classification approaches; one of these, the vector quantization technique (VQ), is considered among the most powerful classification techniques. In this paper a vector quantizer is implemented on an FPGA. We have reached an almost 100% identification rate in 18.8 μs using only 22% of the slices inside the Spartan-3 chip.
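VQ-based speaker identification of the kind described typically scores each speaker's codebook by average quantization distortion over the test frames and picks the minimum. The sketch below assumes per-speaker codebooks were trained offline (e.g. by k-means); all names and dimensions are illustrative, not taken from the paper.

```python
import numpy as np

def quant_distortion(frames, codebook):
    """Average squared Euclidean distance from each feature frame
    to its nearest codeword in the speaker's codebook."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def identify(frames, codebooks):
    """Return the index of the speaker whose codebook yields the
    lowest quantization distortion for the given test frames."""
    scores = [quant_distortion(frames, cb) for cb in codebooks]
    return int(np.argmin(scores))
```

On an FPGA the inner distance computation is the part worth parallelizing, since every frame-to-codeword distance is independent.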
7 citations
05 Sep 2004
TL;DR: The nonlinear activation function is adapted to be more suitable for FPGA implementation, and an almost 100% identification rate is reached using a Multi-Layer Perceptron Neural Network (MLP NN).
Abstract: Speaker identification is a challenging pattern classification task. It is used extensively in many applications such as security systems, information retrieval services, etc. Portable identification systems are expected to be widely used in the future for many purposes, such as mobile applications. Implementing the identification technique in dedicated hardware could be very useful for building smart units. In this context, the FPGA offers an efficient technology for realizing a pattern classification strategy. A speaker identification system can be implemented using many classification approaches; one of these, the artificial neural network (ANN), is considered among the most powerful classification techniques. Implementing a neural network on an FPGA is a challenging task because of the complexity of the required arithmetic operations. In this paper the nonlinear activation function is adapted to be more suitable for FPGA implementation. We have reached an almost 100% identification rate using a Multi-Layer Perceptron Neural Network (MLP NN).
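The paper does not state which adaptation of the activation function it uses. One common FPGA-friendly choice is a piecewise-linear approximation of the logistic sigmoid (the PLAN approximation), whose coefficients are powers of two so the hardware needs only shifts and adds; the sketch below shows that scheme in plain Python as an assumed example, not necessarily the paper's method.

```python
def plan_sigmoid(x):
    """Piecewise-linear approximation of the logistic sigmoid (PLAN).

    All slopes and offsets are sums of powers of two, so in hardware
    each branch costs only a shift and an add. Odd symmetry around
    (0, 0.5) handles negative inputs.
    """
    neg = x < 0
    x = abs(x)
    if x >= 5:
        y = 1.0
    elif x >= 2.375:
        y = 0.03125 * x + 0.84375
    elif x >= 1:
        y = 0.125 * x + 0.625
    else:
        y = 0.25 * x + 0.5
    return 1.0 - y if neg else y
```

The maximum absolute error of this approximation against the true sigmoid is on the order of 0.02, which is usually tolerable for classification-stage MLPs.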
2 citations
01 Dec 2005
TL;DR: Vector quantization is implemented on an FPGA, reaching an almost 100% identification rate in 18.8 μs while using only 22% of the slices of a Spartan-3 chip.
Abstract: Speaker identification is a challenging pattern classification task. It is used extensively in many applications such as security systems, information retrieval services, etc. Portable identification systems are expected to be widely used in the future for many purposes, such as mobile applications. Implementing the identification technique in dedicated hardware could be very useful for building smart units. In this context, the Field Programmable Gate Array (FPGA) offers an efficient technology for realizing a pattern classification strategy. A speaker identification system can be implemented using many classification approaches; one of these, the vector quantization technique (VQ), is considered among the most powerful classification techniques. In this paper a vector quantizer is implemented on an FPGA. We have reached an almost 100% identification rate in 18.8 μs using only 22% of the slices inside the Spartan-3 chip.
1 citation
Cited by
TL;DR: This paper discusses the stages involved in the biometric system recognition process and further discusses multimodal systems in terms of their architecture, mode of operation, and algorithms used to develop the systems.
Abstract: Biometric systems are used for the verification and identification of individuals using their physiological or behavioral features. These features can be categorized into unimodal and multimodal systems, in which the former have several deficiencies that reduce the accuracy of the system, such as noisy data, inter-class similarity, intra-class variation, spoofing, and non-universality. However, multimodal biometric sensing and processing systems, which make use of the detection and processing of two or more behavioral or physiological traits, have proved to improve the success rate of identification and verification significantly. This paper provides a detailed survey of the various unimodal and multimodal biometric sensing types providing their strengths and weaknesses. It discusses the stages involved in the biometric system recognition process and further discusses multimodal systems in terms of their architecture, mode of operation, and algorithms used to develop the systems. It also touches on levels and methods of fusion involved in biometric systems and gives researchers in this area a better understanding of multimodal biometric sensing and processing systems and research trends in this area. It furthermore gives room for research on how to find solutions to issues on various unimodal biometric systems.
94 citations
TL;DR: A speech recognition system that allows arm‐disabled students to control computers by voice as a helping tool in the educational process and achieves higher recognition rates than other relevant approaches.
Abstract: Over the previous decades, a need has emerged to empower human-machine communication systems, which are essential not only to perform actions but also to obtain information, especially in educational applications. Moreover, any communication system has to offer an efficient and easy way of interacting with a minimum possible error rate. The keyboard, mouse, trackball, touch-screen, and joystick are all examples of tools built to provide mechanical human-to-machine interaction. However, a system able to use oral speech, the natural form of communication between humans, can be more practical than mechanical communication systems for typical students and even a necessity for arm-disabled students who cannot use their arms to handle traditional education tools like pens and notebooks. In this paper, we present a speech recognition system that allows arm-disabled students to control computers by voice as a helping tool in the educational process. When a student speaks through a microphone, the speech is divided into isolated words which are compared with a predefined database containing a large number of spoken words to find a match. After that, each recognized word is translated into its related tasks, which are performed by the computer, such as opening a teaching application or renaming a file. The speech recognition process discussed in this paper involves two separate approaches: the first is based on double-threshold voice activity detection and improved Mel-frequency cepstral coefficients (MFCC), while the second is based on the discrete wavelet transform along with a modified MFCC algorithm. Using the best values for all parameters in the aforementioned techniques, our proposed system achieved a recognition rate of 98.7% with the first approach and 98.86% with the second, which is more accurate than the first but slower in processing, a critical consideration for a real-time system. Both proposed approaches were compared with other relevant approaches and their recognition rates were noticeably higher.
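The double-threshold voice activity detection the first approach relies on is commonly realized with a high trigger threshold and a lower continuation threshold on short-time frame energy: a speech segment starts where energy crosses the high threshold and is extended while energy stays above the low one. The sketch below follows that common scheme under stated assumptions; the paper's exact features and threshold values are not specified here.

```python
import numpy as np

def frame_energy(signal, frame_len, hop):
    """Short-time energy of successive (possibly overlapping) frames."""
    n = 1 + (len(signal) - frame_len) // hop
    return np.array([np.sum(signal[i * hop:i * hop + frame_len] ** 2)
                     for i in range(n)])

def double_threshold_vad(energy, high, low):
    """Mark frames as speech: a segment is triggered where energy
    exceeds `high`, then grown backward and forward over every
    adjacent frame whose energy still exceeds `low`."""
    speech = np.zeros(len(energy), dtype=bool)
    above_low = energy > low
    i = 0
    while i < len(energy):
        if energy[i] > high:
            s = i
            while s > 0 and above_low[s - 1]:
                s -= 1                      # grow segment backward
            e = i
            while e + 1 < len(energy) and above_low[e + 1]:
                e += 1                      # grow segment forward
            speech[s:e + 1] = True
            i = e + 1
        else:
            i += 1
    return speech
```

The two thresholds let quiet word onsets and tails survive (above `low`) without letting background noise alone trigger a segment (which requires `high`).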
34 citations
TL;DR: The performance of speaker recognition systems can be improved and their execution time can be reduced by utilizing the finding that some Arabic phonemes have a strong effect on the recognition rate of such systems.
Abstract: In this paper, we investigate the effect of Arabic phonemes on the performance of speaker recognition systems. The investigation reveals that some Arabic phonemes have a strong effect on the recognition rate of such systems. The performance of speaker recognition systems can be improved and their execution time can be reduced by utilizing this finding. Additionally, this finding can be used by segmenting the most effective phonemes for speaker recognition from the utterance, using only the segmented part of the speech for speaker recognition. It can also be used in designing the text to be used in high-performance speaker recognition systems. From our investigation, we find that the recognition rates of Arabic vowels were all above 80%, whereas the recognition rates for the consonants varied from very low (14%) to very high (94%), with the latter achieved by a pharyngeal consonant followed by the two nasal phonemes, which achieved recognition rates above 80%. Four more consonants had recognition rates between 70% and 80%. We show that by utilizing these findings and by designing the text carefully, we can build a high-performance speaker recognition system.
21 citations
TL;DR: The proposed multi-directional local feature (MDLF) achieves excellent results both in text-dependent and text-independent speaker recognition, and in Arabic and English speech.
Abstract: A new feature called the multi-directional local feature (MDLF) is proposed and applied to automatic speaker recognition. To extract the MDLF, a windowed speech signal is processed by the fast Fourier transform, passed through a 24-channel mel-scaled filter bank, and log-compressed. A three-point linear regression is then applied in four different directions: horizontal (time axis), vertical (frequency axis), 45° (time-frequency) and 135° (time-frequency). The MDLF captures the characteristics of the speaker in the time-frequency plane and results in better performance. In the experiments conducted, a Gaussian mixture model (GMM) with a varying number of mixtures is used as the classifier. Experimental results show that the proposed MDLF has better recognition accuracy than traditional MFCC features. The MDLF achieves excellent results in both text-dependent and text-independent speaker recognition, and in both Arabic and English speech. The proposed technique is also language independent.
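The four directional regressions that define the MDLF can be sketched directly: the slope of a three-point linear regression over equally spaced samples reduces to (y[+1] − y[−1]) / 2, applied along each of the four directions of the log mel-spectrogram. The array layout below (time × mel channel) is an assumption for illustration.

```python
import numpy as np

def mdlf(logmel):
    """Multi-directional local features from a log mel-spectrogram.

    logmel : (T, F) array, time x mel-frequency channel.
    Returns (T-2, F-2, 4): the three-point regression slope at each
    interior point in the horizontal, vertical, 45-degree and
    135-degree directions.
    """
    h    = (logmel[2:,  1:-1] - logmel[:-2, 1:-1]) / 2   # time axis
    v    = (logmel[1:-1, 2:]  - logmel[1:-1, :-2]) / 2   # frequency axis
    d45  = (logmel[2:,  2:]   - logmel[:-2, :-2]) / 2    # 45 deg diagonal
    d135 = (logmel[2:,  :-2]  - logmel[:-2, 2:]) / 2     # 135 deg diagonal
    return np.stack([h, v, d45, d135], axis=-1)
```

On a plane S[t, f] = 2t + 3f the four slopes come out as 2, 3, 5 and −1, which is a convenient correctness check.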
19 citations
TL;DR: The aim is not only to provide the architecture of a speaker identification system but also to reduce the redundant frames at the pre-processing stage to lower the identification time and computation burden which are vital for real time implementation.
Abstract: Speaker recognition refers to the task of recognizing persons from their spoken speech. It belongs to the field of biometric person authentication which also includes authentication by fingerprints, face and iris. Implementing the identification technique using a dedicated hardware like field programmable gate arrays (FPGA) could be useful to achieve smart units. The computational complexity and identification time mainly depend on the number of speakers, the number of frame vectors, their dimensionality and the model order of the classifier. Due to the slow movement of the voice producing parts, the adjacent frame vectors do not vary much in information content. In this paper, we present the design of a speaker identification system with a distance metric based frame selection technique. The aim is not only to provide the architecture of a speaker identification system but also to reduce the redundant frames at the pre-processing stage to lower the identification time and computation burden which are vital for real time implementation.
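A distance-metric based frame selection of the kind described can be sketched as follows. The specific rule used here, keeping a frame only when it lies farther than a threshold from the most recently kept frame, is an illustrative assumption exploiting the observation that adjacent frames vary little, not necessarily the paper's exact criterion.

```python
import numpy as np

def select_frames(frames, threshold):
    """Drop a frame when it is within `threshold` (Euclidean distance)
    of the most recently kept frame; adjacent speech frames change
    slowly, so such frames are largely redundant for identification."""
    kept = [frames[0]]
    for f in frames[1:]:
        if np.linalg.norm(f - kept[-1]) > threshold:
            kept.append(f)
    return np.array(kept)
```

Because identification time scales with the number of test frames, even a modest reduction here cuts the classifier's workload proportionally, which matters for real-time hardware.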
18 citations