Author

A. M. Sharshar

Bio: A. M. Sharshar is an academic researcher from Menoufia University. The author has contributed to research on the topics of speaker recognition and pitch detection algorithms, has an h-index of 1, and has co-authored 2 publications receiving 2 citations.

Papers
Journal ArticleDOI
TL;DR: An extensive study of speaker recognition in both text-dependent and text-independent cases is presented, along with two proposed CNN models for efficient speaker recognition from clean and reverberant speech signals.
Abstract: Speaker recognition is one of the key biometric recognition systems owing to its high importance in numerous security and telecommunication applications. The key objective of speaker recognition systems is to determine who is speaking based on voice characteristics. This paper presents an extensive study of speaker recognition in both text-dependent and text-independent cases. Convolutional Neural Network (CNN) based feature extraction is extended to the text-dependent and text-independent speaker recognition tasks. In addition, the effect of reverberation on the speaker recognition system is addressed. All speech signals are converted into images by obtaining their spectrograms. Two proposed CNN models are presented for efficient speaker recognition from clean and reverberant speech signals. They depend on image processing concepts applied to the spectrograms of speech signals. One of the proposed models is compared with a conventional benchmark model in the text-independent scenario. The performance of the recognition system is measured by the recognition rate for both clean and reverberant speech.
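
As a rough illustration of the spectrogram-as-image pipeline described above, the sketch below converts a speech waveform into a fixed-size log-mel spectrogram and classifies it with a small 2D CNN. The layer sizes, mel resolution, and 10-speaker output are illustrative assumptions, not the paper's architectures.

```python
# Hypothetical sketch: spectrogram-image speaker recognition with a small
# 2D CNN. Layer sizes and input resolution are assumptions for illustration.
import numpy as np
import librosa
import tensorflow as tf

def speech_to_spectrogram(y, sr=16000, shape=(128, 128)):
    """Convert a speech waveform into a fixed-size log-mel spectrogram 'image'."""
    S = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=shape[0]),
        ref=np.max)
    # Pad or crop the time axis so every input image has the same width.
    if S.shape[1] < shape[1]:
        S = np.pad(S, ((0, 0), (0, shape[1] - S.shape[1])))
    return S[:, :shape[1], None]  # add a channel axis for the CNN

def build_cnn(n_speakers):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(128, 128, 1)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(n_speakers, activation="softmax"),
    ])

model = build_cnn(n_speakers=10)  # recognition rate = test accuracy
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```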

9 citations

Journal ArticleDOI
01 Jan 2020
TL;DR: Several methods for pitch frequency estimation are investigated and compared on clean and reverberant male and female speech signals to select the one least affected by reverberation.
Abstract: Reverberation is an effect that occurs regularly in closed rooms due to multiple reflections. This paper investigates the effect of reverberation on both male and female speech signals. This effect is reflected in the pitch frequency of the speech signals, a parameter that is important because it is usually used for speaker identification. Hence, several methods for pitch frequency estimation are investigated and compared on clean and reverberant male and female speech signals to select the one least affected by reverberation.
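
For context, one classical estimator that such a comparison would typically include is the autocorrelation method; a minimal sketch follows. The frame length and the 50-400 Hz search range are assumptions for illustration, not parameters taken from the paper.

```python
# Hypothetical sketch: autocorrelation-based pitch estimation for one
# voiced frame. The 50-400 Hz search range is an assumption.
import numpy as np

def pitch_autocorr(frame, sr, fmin=50.0, fmax=400.0):
    """Estimate the pitch frequency (Hz) of a voiced frame."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)  # lag range of plausible pitch
    lag = lo + np.argmax(ac[lo:hi])          # strongest periodicity peak
    return sr / lag

# A synthetic 120 Hz "voiced" frame should be recovered closely.
sr = 16000
t = np.arange(int(0.04 * sr)) / sr
print(pitch_autocorr(np.sin(2 * np.pi * 120 * t), sr))  # ~120 Hz
```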

Cited by
Journal ArticleDOI
TL;DR: In this paper, the authors propose a fatigue detection system based on the Fast Fourier Transform (FFT) and the Discrete Wavelet Transform (DWT) with machine learning and deep learning classifiers.
Abstract: As the number of road accidents increases, it is critical to avoid making driving mistakes. Driver fatigue detection is a concern that has prompted researchers to develop numerous algorithms to address this issue. The challenge is to identify sleepy drivers with accurate and speedy alerts. Several datasets have been used to develop fatigue detection algorithms, such as electroencephalogram (EEG), electrooculogram (EOG), electrocardiogram (ECG), and electromyogram (EMG) recordings of the driver’s activities, e.g., the DROZY dataset. This study proposes a fatigue detection system based on the Fast Fourier Transform (FFT) and the Discrete Wavelet Transform (DWT) with machine learning and deep learning classifiers. The FFT and DWT are used for feature extraction and noise removal tasks. In addition, the classification task is carried out on the combined EEG, EOG, ECG, and EMG signals using machine learning and deep learning algorithms, including 1D Convolutional Neural Networks (1D CNNs), Concatenated CNNs (C-CNNs), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), k-Nearest Neighbor (KNN), Quadratic Discriminant Analysis (QDA), Multi-layer Perceptron (MLP), and Logistic Regression (LR). The proposed methods are validated in two scenarios, multi-class and binary-class classification. The simulation results reveal that the proposed models achieve high performance for fatigue detection from medical signals, with detection accuracies of 90% and 96% for the multi-class and binary-class scenarios, respectively. The works in the literature achieved a maximum accuracy of 95%. Therefore, the proposed methods outperform similar efforts in terms of detection accuracy.
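
A minimal sketch of the FFT-plus-DWT feature idea follows: spectral band energies from the FFT are concatenated with per-sub-band statistics from a wavelet decomposition and passed to a classical classifier. The band edges, db4 wavelet, decomposition level, and Random Forest choice are assumptions for illustration, not the paper's tuned settings.

```python
# Hypothetical sketch: FFT band energies + DWT sub-band statistics as a
# feature vector for a classic classifier. All parameters are assumptions.
import numpy as np
import pywt
from sklearn.ensemble import RandomForestClassifier

def fft_dwt_features(signal, fs=256, wavelet="db4", level=4):
    # FFT part: power in a few physiological bands (delta/theta/alpha/beta).
    spec = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    bands = [(0.5, 4), (4, 8), (8, 13), (13, 30)]
    fft_feats = [spec[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands]
    # DWT part: mean and spread of each sub-band; ignoring noisy detail
    # coefficients downstream doubles as simple denoising.
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    dwt_feats = [s for c in coeffs for s in (c.mean(), c.std())]
    return np.array(fft_feats + dwt_feats)

# Usage on a batch X of 1-D signal epochs (n_epochs, n_samples) with labels y:
# feats = np.vstack([fft_dwt_features(x) for x in X])
# clf = RandomForestClassifier().fit(feats, y)
```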

3 citations

Journal ArticleDOI
TL;DR: In this paper, a comparative study of state-of-the-art speaker recognition techniques is presented, along with their design challenges, robustness issues, and performance evaluation methods. The experimental results serve as a benchmark for VQ/GMM/UBM-based methods on the IITG-MV SR database.
Abstract: An array of features and methods has been developed over the past six decades for Speaker Identification (SI) and Speaker Verification (SV), jointly known as Speaker Recognition (SR). Mel-Frequency Cepstral Coefficients (MFCCs) are generally used as feature vectors in most cases because they give higher accuracy than other features. This paper presents a comparative study of state-of-the-art SR techniques along with their design challenges, robustness issues, and performance evaluation methods. Rigorous experiments have been performed using the Gaussian Mixture Model (GMM) with variations like the Universal Background Model (UBM) and/or Vector Quantization (VQ) and/or VQ-based UBM-GMM (VQ-UBM-GMM), with detailed discussion. Other popular methods have been included for comparative study only, namely Linear Discriminant Analysis (LDA), Probabilistic LDA (PLDA), Gaussian PLDA (GPLDA), Multi-condition GPLDA (MGPLDA), and Identity Vectors (i-vectors). Three popular audio datasets have been used in the experiments, namely IITG-MV SR, Hyke-2011, and ELSDSR. Hyke-2011 and ELSDSR contain clean speech, while IITG-MV SR contains noisy audio data with variations in channel (device), environment, and spoken style. We propose a new data mixing approach for SR to make the system independent of recording device, spoken style, and environment. The accuracy obtained with the VQ- and GMM-based methods varies from 99.6% to 100% for the Hyke-2011 and ELSDSR databases, whereas the accuracy for IITG-MV SR is up to 98%. Indeed, in some cases the accuracies degrade drastically due to mismatch between training and testing data as well as the singularity problem of the GMM. The experimental results serve as a benchmark for VQ/GMM/UBM-based methods on the IITG-MV SR database.
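
For readers unfamiliar with this baseline, a minimal sketch of MFCC-plus-GMM speaker identification follows: one GMM is fitted per enrolled speaker, and a test utterance is assigned to the speaker whose model gives the highest log-likelihood. The 16-component diagonal GMMs and 13 MFCCs are illustrative assumptions, not the configurations benchmarked in the paper.

```python
# Hypothetical sketch: MFCC + GMM speaker identification (one GMM per
# speaker, maximum-likelihood decision). Settings are assumptions.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(y, sr):
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # (frames, 13)

def enroll(train_utts):
    """train_utts: {speaker: [(y, sr), ...]} -> {speaker: fitted GMM}."""
    models = {}
    for spk, utts in train_utts.items():
        X = np.vstack([mfcc_frames(y, sr) for y, sr in utts])
        # reg_covar guards against the GMM singularity problem noted above.
        models[spk] = GaussianMixture(n_components=16, covariance_type="diag",
                                      reg_covar=1e-4).fit(X)
    return models

def identify(models, y, sr):
    X = mfcc_frames(y, sr)
    return max(models, key=lambda spk: models[spk].score(X))
```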

2 citations

Journal ArticleDOI
TL;DR: This research paper proposes a novel voiceprint generation methodology for recognizing the speakers registered in a system using Mel-Spectrogram, Chromagram, MFCC and a new ensembled feature called Mel-Chroma.
Abstract: This research paper proposes a novel voiceprint generation methodology for recognizing the speakers registered in a system. The proposed methodology is a keyword-dependent, closed-set speaker classification task. The features used are the Mel-Spectrogram, the Chromagram, MFCCs, and a new ensembled feature called Mel-Chroma. Mel-Chroma is generated by combining the Mel-Spectrogram and the Chromagram. The generated Mel-Chroma spectrogram is converted into a binary image by using the average as the threshold. An LSTM recurrent neural network is used for the classification task, and the dataset used is FSDD. The proposed method has higher accuracy than state-of-the-art methods for the specific task. The accuracy obtained for the classification of speakers using a binary Mel-Chroma voiceprint is 98.33%.
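
A minimal sketch of the Mel-Chroma voiceprint idea follows, assuming the two features are normalized and stacked along the feature axis before binarization at the mean; the exact combination rule used in the paper may differ.

```python
# Hypothetical sketch: binary Mel-Chroma voiceprint. Stacking normalized
# features along the frequency axis is an assumption about "combination".
import numpy as np
import librosa

def mel_chroma_voiceprint(y, sr):
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr))
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    # Normalize each feature to [0, 1] so neither dominates the other.
    norm = lambda M: (M - M.min()) / (M.max() - M.min() + 1e-9)
    mel_chroma = np.vstack([norm(mel), norm(chroma)])  # (n_mels + 12, frames)
    # Binarize with the average as the threshold, as described above.
    return (mel_chroma > mel_chroma.mean()).astype(np.uint8)
```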

Journal ArticleDOI
16 Jun 2023-Symmetry
TL;DR: In this paper , a multi-stage system utilizing machine and deep learning techniques was proposed to detect asymmetric states, including tiredness and non-vigilance as well as yawning.
Abstract: Due to the widespread issue of road accidents, researchers have been drawn to investigate strategies to prevent them. One major contributing factor to these accidents is driver fatigue resulting from exhaustion. Various approaches have been explored to address this issue, with machine and deep learning proving to be effective in processing images and videos to detect asymmetric signs of fatigue, such as yawning, facial characteristics, and eye closure. This study proposes a multi-stage system utilizing machine and deep learning techniques. The first stage is designed to detect asymmetric states, including tiredness and non-vigilance, as well as yawning. The second stage is focused on detecting eye closure. The machine learning approach employs several algorithms, including Support Vector Machine (SVM), k-Nearest Neighbor (KNN), Multi-layer Perceptron (MLP), Decision Tree (DT), Logistic Regression (LR), and Random Forest (RF). Meanwhile, the deep learning approach utilizes 2D and 3D Convolutional Neural Networks (CNNs). The architectures of the proposed deep learning models were designed after several trials, and their parameters were selected to achieve optimal performance. The effectiveness of the proposed methods is evaluated using video and image datasets, where the video dataset is classified into three states (alert, tired, and non-vigilant), while the image dataset is classified based on four facial symptoms, including open or closed eyes and yawning. A more robust system is achieved by combining the image and video datasets, resulting in multiple classes for detection. Simulation results demonstrate that the 3D CNN proposed in this study outperforms the other methods, with detection accuracies of 99%, 99%, and 98% for the image, video, and mixed datasets, respectively. Notably, this achievement surpasses the highest accuracy of 97% found in the literature, suggesting that the proposed methods for detecting drowsiness are indeed effective solutions.
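
As a rough sketch of the 3D CNN idea, the snippet below builds a small Conv3D network over short clips of grayscale frames for the three video states. The clip shape and layer sizes are illustrative assumptions; the paper's architecture was tuned over several trials and is not reproduced here.

```python
# Hypothetical sketch: a small 3D CNN over video clips of shape
# (frames, height, width, channels). Layer sizes are assumptions.
import tensorflow as tf

def build_3d_cnn(n_classes, clip_shape=(16, 64, 64, 1)):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=clip_shape),
        tf.keras.layers.Conv3D(8, 3, activation="relu"),   # spatio-temporal
        tf.keras.layers.MaxPooling3D(),
        tf.keras.layers.Conv3D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling3D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

# Three video states: alert, tired, and non-vigilant.
model = build_3d_cnn(n_classes=3)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```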

Journal ArticleDOI
TL;DR: Wang et al. provide an in-depth analysis of timbre-speech spectrograms in instrumental music and design a model of rehabilitation occupational therapy techniques based on this analysis, modeling human articulation as a time-varying linear system consisting of excitation, vocal tract, and radiation models.
Abstract: This paper provides an in-depth analysis of timbre-speech spectrograms in instrumental music, designs a model of rehabilitation occupational therapy techniques based on this analysis, and tests the models for comparison. Starting from the mechanism of human articulation, the paper models the process of human expression as a time-varying linear system consisting of excitation, vocal tract, and radiation models. The system's overall architecture is designed according to the characteristics of Chinese speech and everyday speech rehabilitation theory (HSL theory). Phonetic length training is realized through the dual judgment of a temporal threshold and the short-time average energy. Tone and clear-tone training are achieved with the linear predictive coding (LPC) technique and the autocorrelation function. Isolated-word speech recognition is achieved with the DTW technique by extracting Mel-scale Frequency Cepstral Coefficient (MFCC) parameters from the speech signals. The system designs corresponding training scenes for each training module according to the extracted speech parameters, combines the multimedia speech spectrogram motion with the speech parameters, and finally presents the training content as a speech spectrogram, evaluating the training results through human-machine interaction to stimulate interest in rehabilitation therapy and realize speech rehabilitation training for patients. Analysis of the pre- and post-test data found that the p-values of all three groups were below 0.05, indicating significant differences, and all subjects' behavioral data changed during the treatment. Therefore, after summarizing the data, it was concluded that the music therapy technique can improve patients' active gaze communication, verbal command, and active question-answering abilities, i.e., the hypothesis of this experiment is valid. Hence, the technique of timbre-speech spectrogram analysis in instrumental music can achieve the effect of rehabilitation therapy to a certain extent.
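
To make the DTW step concrete, the sketch below aligns the MFCC sequence of a test word against stored word templates and returns the closest match. The plain O(nm) recurrence and Euclidean frame distance are assumptions for illustration, not the system's exact configuration.

```python
# Hypothetical sketch: isolated-word recognition via DTW over MFCC frames.
import numpy as np
import librosa

def dtw_distance(A, B):
    """A, B: (frames, n_mfcc) MFCC matrices; returns the alignment cost."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])  # frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(y, sr, templates):
    """templates: {word: MFCC matrix} -> the best-matching word."""
    test = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T
    return min(templates, key=lambda w: dtw_distance(test, templates[w]))
```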