Author

Obaidullah Sk

Bio: Obaidullah Sk is an academic researcher from Aliah University. The author has contributed to research in the topics Language identification and Spoken language. The author has an h-index of 1 and has co-authored 1 publication receiving 11 citations.

Papers
Journal ArticleDOI
TL;DR: This paper proposes to use speech signal patterns for spoken language identification with image-based features, and reports a highest accuracy of 99.96%, which outperforms the state-of-the-art reported results.
Abstract: Speech recognition-based applications are widely used in Western countries, but not on a similar scale in East Asia. Language complexity could potentially be one of the primary reasons behind this lag. Besides, multilingual countries like India need to be considered so that language identification (words and phrases) can be possible through speech signals. Unlike previous works, in this paper we propose to use speech signal patterns for spoken language identification, where image-based features are used. The concept is primarily inspired by the fact that a speech signal can be read/visualized. In our experiment, we use spectrograms (for image data) and deep learning for spoken language classification. Using the IIIT-H Indic speech database for Indic languages, we achieve the highest accuracy of 99.96%, which outperforms the state-of-the-art reported results. Furthermore, for a relative decrease of 4018.60% in the signal-to-noise ratio, a decrease of only 0.50% in accuracy indicates that our concept is fairly robust.

20 citations
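
The abstract above treats a speech signal as an image by converting it to a spectrogram. As a rough illustration of that step only, the following is a minimal sketch using the Python standard library. The frame length, hop size, Hann window, and the function name `spectrogram` are assumptions made for this example and are not taken from the paper, which does not state its settings here.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames; each frame holds magnitudes for bins 0..frame_len//2."""
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of one windowed frame (slow, but dependency-free).
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Toy input (hypothetical): a pure tone placed exactly on DFT bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each row of `spec` is one time frame and each column one frequency bin, so the nested list can be rendered as a grayscale image and passed to an image classifier. A practical pipeline would more likely use an FFT routine from a numerical library instead of the naive DFT shown here.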


Cited by
Journal ArticleDOI
TL;DR: A comprehensive bibliometric analysis of DL publications from 2007 to 2019 is conducted, providing preliminary knowledge of DL for researchers who are interested in this area and a conclusive and comprehensive analysis for those who want to do further research in it.
Abstract: As an emerging and applicable method, deep learning (DL) has attracted much attention in recent years. With the development of DL and the massive number of publications and research efforts in this direction, a comprehensive analysis of DL is necessary. In this paper, from the perspective of bibliometrics, a comprehensive analysis of publications of DL is conducted from 2007 to 2019 (the first publication with keywords “deep learning” and “machine learning” was published in 2007). After preprocessing, 5722 publications are exported from Web of Science and imported into the professional science mapping tools VOSviewer and CiteSpace. Firstly, the publication structures are analyzed based on annual publications and on the publications of the most productive countries/regions, institutions and authors. Secondly, using VOSviewer, the co-citation networks of countries/regions, institutions, authors and papers are depicted. Their citation structures and the most influential among them are further analyzed. Thirdly, the cooperation networks of countries/regions, institutions and authors are illustrated with VOSviewer. Time-line review and citation burst detection of keywords are exported from CiteSpace to detect the hotspots and research trends. Finally, some conclusions of this paper are given. This paper provides preliminary knowledge of DL for researchers who are interested in this area, and also makes a conclusive and comprehensive analysis of DL for those who want to do further research in this area.

34 citations

Journal ArticleDOI
TL;DR: A new nature-inspired feature selection (FS) algorithm is developed by hybridizing Binary Bat Algorithm with Late Acceptance Hill-Climbing (LAHC) to select the optimal subset from the said feature vectors in order to reduce the model complexity and help it train faster.
Abstract: With the recent advancements in the fields of machine learning and artificial intelligence, spoken language identification-based applications have been increasing in terms of the impact they have on the day-to-day lives of common people. Western countries have been enjoying the privilege of spoken language recognition-based applications for a while now, however, they have not gained much popularity in multi-lingual countries like India owing to various complexities. In this paper, we have addressed this issue by attempting to identify different Indian languages based on various well-known features like Mel-Frequency Cepstral Coefficient (MFCC), Linear Prediction Coefficient (LPC), Discrete Wavelet Transform (DWT), Gammatone Frequency Cepstral Coefficient (GFCC) as well as a few deep learning architecture based features like i-vector and x-vector extracted from the audio signals. After comparing the initial results, it is observed that the combination of MFCC and LPC produces the best results. Then we have developed a new nature-inspired feature selection (FS) algorithm by hybridizing Binary Bat Algorithm (BBA) with Late Acceptance Hill-Climbing (LAHC) to select the optimal subset from the said feature vectors in order to reduce the model complexity and help it train faster. Using Random Forest (RF) classifier, we have achieved an accuracy of 92.35% on Indic TTS database developed by IIT-Madras, and an accuracy of 100% on the Indic Speech database developed by the Speech and Vision Laboratory (SVL) IIIT-Hyderabad. The proposed algorithm is also found to outperform many standard meta-heuristic FS algorithms. The source code of this work is available at: https://github.com/CodeChef97dotcom/Feature-Selection

23 citations
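
The abstract above hybridizes a Binary Bat Algorithm with Late Acceptance Hill-Climbing (LAHC) for feature selection. The sketch below illustrates only the generic LAHC acceptance rule on a binary feature mask; it is not the authors' hybrid (their code is at the repository linked in the abstract). The function name `lahc_select`, the toy cost function, and all parameter values are hypothetical choices for this example.

```python
import random


def lahc_select(cost, n_features, history_len=20, iters=2000, seed=0):
    """Minimize cost(mask) over boolean feature masks with late acceptance."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_cost = cost(current)
    best, best_cost = current[:], current_cost
    # The history list stores the costs seen history_len iterations earlier.
    history = [current_cost] * history_len
    for i in range(iters):
        candidate = current[:]
        j = rng.randrange(n_features)
        candidate[j] = not candidate[j]  # flip one feature in or out
        cand_cost = cost(candidate)
        slot = i % history_len
        # Late acceptance: accept if no worse than the current solution
        # or no worse than the cost recorded history_len steps ago.
        if cand_cost <= current_cost or cand_cost <= history[slot]:
            current, current_cost = candidate, cand_cost
        history[slot] = current_cost
        if current_cost < best_cost:
            best, best_cost = current[:], current_cost
    return best, best_cost


def toy_cost(mask):
    """Hypothetical stand-in for classifier error plus a subset-size penalty."""
    informative = (0, 1, 2)  # assumed: only the first three features matter
    missing = sum(1 for f in informative if not mask[f])
    return 10 * missing + sum(mask)


best_mask, best_cost = lahc_select(toy_cost, n_features=10)
```

In a real setting `cost` would train and validate a classifier (the abstract uses Random Forest) on the selected columns, which is far more expensive than the toy function used here.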

Journal ArticleDOI
TL;DR: A new hybrid Feature Selection (FS) algorithm has been developed using the versatile Harmony Search (HS) algorithm and a new nature-inspired algorithm called Naked Mole-Rat (NMR) algorithm to select the best subset of features and reduce the model complexity to help it train faster.
Abstract: This era is dominated by artificial intelligence and its various applications - one of which is Spoken Language Identification (S-LID), which has always been a challenging issue and an important research area in the domain of speech signal processing. This paper deals with S-LID to be used for Human-Computer Interaction (HCI) based applications by attempting to classify various languages from three multi-lingual databases, namely CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages, VoxForge, and the Indian Institute of Technology, Madras (IIT-Madras) speech corpus database, by extracting their Mel-Spectrogram features and Relative Spectral Transform - Perceptual Linear Prediction (RASTA-PLP) features. A new hybrid Feature Selection (FS) algorithm has been developed using the versatile Harmony Search (HS) algorithm and a new nature-inspired algorithm called Naked Mole-Rat (NMR) algorithm to select the best subset of features and reduce the model complexity to help it train faster. This selected feature set is fed to five classifiers, namely Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Multi-layer Perceptron (MLP), Naive Bayes (NB) and Random Forest (RF). The evaluation measures used in this paper are precision, recall, f1-score, classification accuracy and number of selected features. An accuracy of 99.89% on CSS10, 98.22% on VoxForge and 99.75% on IIT-Madras speech corpus databases is achieved using RF. Furthermore, the proposed algorithm is found to outperform 15 standard meta-heuristic FS algorithms. The source code of this work is available at: https://github.com/CodeChef97dotcom/HS-NMR.git

21 citations
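
The abstract above lists precision, recall, f1-score and classification accuracy as evaluation measures. As a reminder of how these are computed per class, here is a small sketch in plain Python. The function name `per_class_report` and the language labels in the toy data are made up for illustration and do not come from the paper's experiments.

```python
def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1)} and the overall accuracy."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = (precision, recall, f1)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return report, accuracy


# Hypothetical predictions for five utterances in three languages.
y_true = ["hi", "hi", "bn", "bn", "ta"]
y_pred = ["hi", "bn", "bn", "bn", "ta"]
report, accuracy = per_class_report(y_true, y_pred)
```

In this toy case one "hi" utterance is misclassified as "bn", so "hi" keeps perfect precision but loses recall, while "bn" keeps perfect recall but loses precision.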

Journal ArticleDOI
TL;DR: This work proposes a model for the recognition of Indian and foreign languages, and augments data with noise of varying loudness taken from diverse environments to make the model robust to noise from everyday life.

9 citations
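
The summary above describes augmenting speech with noise of varying loudness. One common way to control loudness is to mix noise at a chosen signal-to-noise ratio (SNR) in decibels; whether the cited work parameterizes loudness this way is an assumption. The sketch below uses synthetic Gaussian noise in place of real environmental recordings, and the names `mean_power` and `add_noise` are hypothetical.

```python
import math
import random


def mean_power(samples):
    """Average squared amplitude of a signal."""
    return sum(v * v for v in samples) / len(samples)


def add_noise(clean, noise, snr_db):
    """Mix noise into clean so that 10*log10(P_clean / P_noise) equals snr_db."""
    target_noise_power = mean_power(clean) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    return [c + scale * n for c, n in zip(clean, noise)]


rng = random.Random(0)
# Assumed stand-ins: a sine tone for speech, Gaussian samples for noise.
clean = [math.sin(2 * math.pi * 5 * n / 200) for n in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Several loudness levels yield several augmented copies of one utterance.
augmented = {snr: add_noise(clean, noise, snr) for snr in (20, 10, 0)}

# Check the achieved SNR of the 10 dB copy from the residual.
residual = [a - c for a, c in zip(augmented[10], clean)]
measured_snr = 10 * math.log10(mean_power(clean) / mean_power(residual))
```

Lower SNR values correspond to louder noise relative to the speech, so sweeping the SNR list from high to low produces progressively harder training examples.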

Journal ArticleDOI
TL;DR: A CNN-based bidirectional long short-term memory (BiLSTM) model has been proposed for Indian language identification with special emphasis on Northeastern languages, and results show that the ResNet-50 based model has achieved accuracy up to 98.10% as compared to 97.70% for the VGG-16 based model.

5 citations