scispace - formally typeset
Author

Yassine BenAyed

Bio: Yassine BenAyed is an academic researcher from the University of Sfax. The author has contributed to research in the topics Support vector machine and Hidden Markov model. The author has an h-index of 5 and has co-authored 17 publications receiving 95 citations.

Papers
Journal Article
TL;DR: This paper proposes a backpropagation MLMKL framework in which the network is optimized with an adaptive backpropagation algorithm, using gradient ascent instead of a dual objective function or an estimate of the leave-one-out error, and reports high performance.
Abstract: The multiple kernel learning (MKL) approach has been proposed for kernel methods and has shown high performance on some real-world applications. It consists of learning the optimal kernel from one layer of multiple predefined kernels. Unfortunately, this approach is not rich enough to solve relatively complex problems. With the emergence and success of deep learning, multilayer multiple kernel learning (MLMKL) methods were inspired by the idea of deep architectures and were introduced to improve on conventional MKL methods. Such architectures learn deep kernel machines by exploring combinations of multiple kernels in a multilayer structure. However, existing MLMKL methods often have trouble optimizing networks with two or more layers. Additionally, they do not always outperform the simplest method of combining multiple kernels (i.e., MKL). To improve the effectiveness of MKL approaches, in this paper we introduce a novel backpropagation MLMKL framework. Specifically, we propose to optimize the network with an adaptive backpropagation algorithm. We use the gradient ascent method instead of the dual objective function or an estimate of the leave-one-out error. We test the proposed method through a large set of experiments on a variety of benchmark data sets, and we have successfully optimized the system over many layers. Empirical results show that our algorithm achieves high performance compared with the traditional MKL approach and existing MLMKL methods.
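A minimal Python sketch of a two-layer MKL network trained by backpropagation and gradient ascent may help make the idea concrete. The abstract does not state which objective is maximized or the exact layer structure, so both are assumptions here: the objective is kernel-target alignment, layer 1 is a softmax-weighted sum of base kernels (one linear, several RBF), and layer 2 is a softmax-weighted sum of elementwise transforms of the layer-1 kernel. All function names are hypothetical; this is not the authors' implementation.

```python
import math

# Assumption: layer-2 transforms g_s and their derivatives g_s'. Each maps a
# PSD kernel to a PSD kernel (identity, Schur square, elementwise exp).
TRANSFORMS = [
    (lambda k: k, lambda k: 1.0),
    (lambda k: k * k, lambda k: 2.0 * k),
    (math.exp, math.exp),
]


def softmax(v):
    top = max(v)
    e = [math.exp(t - top) for t in v]
    s = sum(e)
    return [t / s for t in e]


def gram(kernel, X):
    return [[kernel(a, b) for b in X] for a in X]


def base_kernels(X, gammas):
    """Layer 0: one linear Gram matrix plus one RBF Gram matrix per gamma."""
    lin = gram(lambda a, b: sum(p * q for p, q in zip(a, b)), X)
    rbfs = [gram(lambda a, b, g=g: math.exp(-g * sum((p - q) ** 2 for p, q in zip(a, b))), X)
            for g in gammas]
    return [lin] + rbfs


def forward(base, a, b):
    """Layer 1: K1 = sum_t mu_t K_t; layer 2: K2 = sum_s nu_s g_s(K1), elementwise."""
    mu, nu = softmax(a), softmax(b)
    n = len(base[0])
    K1 = [[sum(m * K[i][j] for m, K in zip(mu, base)) for j in range(n)] for i in range(n)]
    K2 = [[sum(v * g(K1[i][j]) for v, (g, _) in zip(nu, TRANSFORMS)) for j in range(n)]
          for i in range(n)]
    return mu, nu, K1, K2


def alignment(K, y):
    """Assumed objective: <K, yy^T>_F / (||K||_F * n) for labels in {-1, +1}."""
    n = len(y)
    f = sum(y[i] * y[j] * K[i][j] for i in range(n) for j in range(n))
    norm = math.sqrt(sum(K[i][j] ** 2 for i in range(n) for j in range(n)))
    return f / (norm * n)


def backprop(base, a, b, y):
    """Chain rule from the alignment back to the softmax logits a (layer 1) and b (layer 2)."""
    mu, nu, K1, K2 = forward(base, a, b)
    n = len(y)
    f = sum(y[i] * y[j] * K2[i][j] for i in range(n) for j in range(n))
    norm = math.sqrt(sum(K2[i][j] ** 2 for i in range(n) for j in range(n)))
    dK2 = [[y[i] * y[j] / (norm * n) - f * K2[i][j] / (norm ** 3 * n) for j in range(n)]
           for i in range(n)]
    dnu = [sum(dK2[i][j] * gs(K1[i][j]) for i in range(n) for j in range(n))
           for gs, _ in TRANSFORMS]
    dK1 = [[dK2[i][j] * sum(v * dg(K1[i][j]) for v, (_, dg) in zip(nu, TRANSFORMS))
            for j in range(n)] for i in range(n)]
    dmu = [sum(dK1[i][j] * K[i][j] for i in range(n) for j in range(n)) for K in base]

    def through_softmax(w, dw):
        inner = sum(p * d for p, d in zip(w, dw))
        return [p * (d - inner) for p, d in zip(w, dw)]

    return through_softmax(mu, dmu), through_softmax(nu, dnu)


def train(X, y, gammas, steps=100, lr=0.1):
    base = base_kernels(X, gammas)
    a, b = [0.0] * len(base), [0.0] * len(TRANSFORMS)
    start = alignment(forward(base, a, b)[3], y)
    for _ in range(steps):  # plain (non-adaptive) gradient ascent on the alignment
        da, db = backprop(base, a, b, y)
        a = [t + lr * d for t, d in zip(a, da)]
        b = [t + lr * d for t, d in zip(b, db)]
    mu, nu, _, K2 = forward(base, a, b)
    return mu, nu, start, alignment(K2, y)
```

On toy data such as `X = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [1.2, 0.9]]` with `y = [-1, -1, 1, 1]`, calling `train(X, y, gammas=[0.5, 5.0])` should end with an alignment at least as high as the starting value. The gradients are derived by hand, layer by layer, which is backpropagation in miniature; the fixed learning rate stands in for the adaptive scheme the abstract mentions, and a practical version would more likely rely on an autodiff library.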

36 citations

Journal Article
TL;DR: A text-to-speech (TTS) synthesis system for Modern Standard Arabic, based on a statistical parametric approach and mel-cepstral coefficients, is described; the proposed synthesis method can generate intelligible and natural speech.

26 citations

Journal Article
TL;DR: An efficient Arabic TTS system based on a statistical parametric approach and non-uniform unit speech synthesis is presented, together with a new, simple stacked neural network approach that improves the accuracy of the acoustic models.
Abstract: Text-to-speech (TTS) systems, also known as speech synthesizers, have become one of the most important technologies in recent years owing to their expanding range of applications. Much work on speech synthesis has targeted English and French, whereas many other languages, including Arabic, have only recently received attention. Arabic speech synthesis has not made sufficient progress and is still at an early stage, with low speech quality. Speech synthesis systems face several problems (e.g., speech quality and articulatory effects). Various methods have been proposed to address these issues, such as using large units and units of different sizes. This method is mainly used with the concatenative approach to improve speech quality, and several works have demonstrated its effectiveness. This paper presents an efficient Arabic TTS system based on a statistical parametric approach and non-uniform unit speech synthesis. Our system includes a diacritization engine: modern Arabic text is written without the short vowels, also called diacritic marks, yet these marks are essential for determining the correct pronunciation of the text, which is why a diacritization engine is incorporated into our system. In this work, we propose a simple approach based on deep neural networks, which are trained to predict the diacritic marks directly as well as the spectral and prosodic parameters. Furthermore, we propose a new, simple stacked neural network approach to improve the accuracy of the acoustic models. Experimental results show that our diacritization system generates fully diacritized text with high precision and that our synthesis system produces high-quality speech.

11 citations

Journal Article
TL;DR: Incorporating SVMs into an HMM yields a new ASR system, and the proposed SVM/HMM system performs best, achieving a recognition rate of 75.8%.
Abstract: Hidden Markov Models (HMM) are currently widely used in Automatic Speech Recognition (ASR) as the most effective models. However, they sometimes suffer from poor discrimination. Hybridizing Artificial Neural Networks (ANN), in particular Multi-Layer Perceptrons (MLP), with HMMs is a promising technique for overcoming these limitations. To improve recognition results, we use Support Vector Machines (SVM), which are characterized by high predictive and discriminative power. Incorporating SVMs into the HMM yields a new ASR system. Using 2800 occurrences of Arabic phonemes, this work presents a comparative study of our recognition systems: standard HMMs alone reach a recognition rate of 66.98%, the hybrid MLP/HMM system achieves 73.78%, and our proposed SVM/HMM system performs best, with a recognition rate of 75.8%.
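In hybrid ANN/HMM recognizers, a common way to plug a discriminative classifier into an HMM is to have it estimate state posteriors P(q | x_t) and divide them by the state priors P(q), giving scaled likelihoods that replace the HMM emission densities during Viterbi decoding. The abstract does not say exactly how the SVM is combined with the HMM, so the sketch below assumes this standard scheme (it matches the description of the SVM as a posterior estimator in the SVM/HMM paper listed under "Cited by"). Posteriors are passed in directly; obtaining them from an SVM (for example via Platt scaling) is left out, and all names are hypothetical.

```python
import math


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def hybrid_viterbi(posteriors, priors, trans, init):
    """Viterbi decoding where emissions are scaled likelihoods P(q|x_t) / P(q).

    posteriors[t][q]: classifier (e.g. SVM) posterior for state q at frame t
    priors[q]:        state prior P(q), e.g. relative frequency in training labels
    trans[p][q]:      HMM transition probability p -> q
    init[q]:          initial state probability
    """
    states = range(len(priors))

    def emit(t, q):
        # log P(q|x_t) - log P(q) = log P(x_t|q) - log P(x_t); the last term is
        # identical for every state, so it does not change the best path.
        return log(posteriors[t][q]) - log(priors[q])

    delta = [log(init[q]) + emit(0, q) for q in states]
    backptrs = []
    for t in range(1, len(posteriors)):
        ptr, new_delta = [], []
        for q in states:
            best = max(states, key=lambda p: delta[p] + log(trans[p][q]))
            ptr.append(best)
            new_delta.append(delta[best] + log(trans[best][q]) + emit(t, q))
        backptrs.append(ptr)
        delta = new_delta
    # Trace back from the best final state.
    last = max(states, key=lambda q: delta[q])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]
```

For two left-to-right states with posteriors `[[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]`, uniform priors `[0.5, 0.5]`, transitions `[[0.6, 0.4], [0.0, 1.0]]` and initial probabilities `[1.0, 0.0]`, the decoded state sequence is `[0, 0, 1, 1]`.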

11 citations

Proceedings Article
22 Jun 2013
TL;DR: An Arabic text-to-speech synthesis system based on statistical parametric synthesis is proposed; the neural network architecture for predicting MFCCs and an objective evaluation with the MFCC distortion measure are given.
Abstract: As the number of users of text-to-speech applications grows, high-quality speech synthesis is required. However, little research addresses Arabic text-to-speech applications, and compared with other languages such as English and French, the quality of synthesized Arabic speech is still poor. For these reasons, we propose in this paper an Arabic text-to-speech synthesis system based on statistical parametric synthesis. Mel Frequency Cepstral Coefficients (MFCC), energy and pitch are predicted using backpropagation artificial neural networks and then converted into speech using a Mel Log Spectrum Approximation (MLSA) filter. In written Arabic, the short vowels, called diacritic marks, are often omitted, so a diacritization system is proposed to solve this problem. Three unit sizes are considered in the speech database: phoneme, diphone and triphone. The neural network architecture for MFCC prediction and an objective evaluation with the MFCC distortion measure are presented in this paper.
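The abstract does not define its MFCC distortion measure. A widely used form is mel-cepstral distortion, MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2) dB per frame, averaged over time-aligned frames and usually excluding the energy coefficient c_0. The sketch below assumes that form and assumes the reference and synthesized frames are already aligned; the paper may use a different variant, and the function name is hypothetical.

```python
import math


def mel_cepstral_distortion(ref, syn, skip_c0=True):
    """Frame-averaged mel-cepstral distortion in dB (assumed standard MCD form).

    ref, syn: equal-length lists of MFCC vectors, already time-aligned.
    skip_c0:  drop the 0th (energy) coefficient, as is common practice.
    """
    if not ref or len(ref) != len(syn):
        raise ValueError("need two non-empty, time-aligned frame sequences")
    start = 1 if skip_c0 else 0
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for r, s in zip(ref, syn):
        sq = sum((a - b) ** 2 for a, b in zip(r[start:], s[start:]))
        total += scale * math.sqrt(2.0 * sq)
    return total / len(ref)
```

Identical cepstra give 0 dB, and with `skip_c0=True` a difference confined to c_0 (overall energy) also gives 0 dB, which is why the energy term is usually evaluated separately.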

9 citations


Cited by
Journal Article
TL;DR: The results demonstrate that the DTW-based similarity measure of the NDVI time series can be effectively used to map large-area rice cropping systems with diverse cultivation processes.
Abstract: The Normalized Difference Vegetation Index (NDVI) derived from Moderate Resolution Imaging Spectroradiometer (MODIS) time-series data has been widely used in crop and rice classification. The cloudy and rainy weather of the monsoon season greatly reduces the likelihood of obtaining high-quality optical remote sensing images. In addition, the diverse crop-planting systems in Vietnam hinder the comparison of NDVI across crop stages. To address these problems, we apply a Dynamic Time Warping (DTW) distance-based similarity measure and use the entire yearly NDVI time series, rather than a single image, to reduce classification error. We first de-noise the NDVI time series using Savitzky-Golay (S-G) filtering in the TIMESAT software. Then, a standard NDVI time series for rice growth is established from field survey data and Google Earth sample data. An NDVI time series is constructed for each pixel, and its DTW distance to the standard rice-growth NDVI time series is calculated. Finally, we apply thresholds to extract rice-growing areas. A qualitative assessment using statistical data and a spatial assessment using sampled data from the rice-cropping map reveal high mapping accuracy at the national scale, with an R² against the statistical data as high as 0.809; however, the mapping accuracy decreased at the provincial scale because each province contains fewer rice-planting areas. An analysis of the results indicates that the 500-m resolution MODIS data are of limited use for mapping scattered rice parcels. The results demonstrate that the DTW-based similarity measure of the NDVI time series can be effectively used to map large-area rice cropping systems with diverse cultivation processes.
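The per-pixel step described in this abstract (DTW distance to a standard rice NDVI curve, followed by a threshold) can be sketched as follows. This is the textbook DTW recurrence with an absolute-difference cost and no warping window; the paper's cost function, any window constraint, the reference curve and the threshold value are not given in the abstract, so they are assumptions here, and `is_rice_pixel` is a hypothetical name.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D series.

    D[i][j] = |a[i-1] - b[j-1]| + min(D[i-1][j], D[i][j-1], D[i-1][j-1]),
    so phenological curves that are shifted or stretched in time can still match.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def is_rice_pixel(ndvi_series, reference, threshold):
    """Hypothetical rule: label a pixel as rice if it is close enough to the reference."""
    return dtw_distance(ndvi_series, reference) <= threshold
```

Because DTW allows one point to align with several, a series that only lingers longer at one growth stage (for example `[1, 2, 3]` against `[1, 2, 2, 3]`) has distance 0, whereas a plain point-by-point distance would penalize the delay.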

123 citations

Journal Article
TL;DR: This article presents a comprehensive overview of the state-of-the-art approaches that bridge the MKL and deep learning techniques, systematically reviewing the typical hybrid models, training techniques, and their theoretical and practical benefits.

37 citations

Journal Article
TL;DR: The SVM/HMM hybrid model is found to be more efficient than standard HMMs and the hybrid Multi-Layer Perceptron (MLP)/HMM system.
Abstract: This paper presents a new hybrid method for continuous Arabic speech recognition based on triphone modelling. To do this, we use Support Vector Machines (SVM) to estimate posterior probabilities within standard Hidden Markov Models (HMM). In this work, we describe a new approach that categorises Arabic vowels into long and short vowels, applied in the labeling phase of the speech signals. Using this new labeling method, we find that the SVM/HMM hybrid model is more efficient than standard HMMs and the hybrid Multi-Layer Perceptron (MLP)/HMM system. The results obtained for the triphone-based Arabic speech recognition system are 64.68% with HMMs, 72.39% with MLP/HMM and 74.01% with the SVM/HMM hybrid model. For continuous speech recognition, the word error rate (WER) of the three systems confirms the performance of SVM/HMM, which obtains the lowest average WER over the 4 tested speakers: 11.42%.
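WER is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between the recognized transcription and the reference, divided by the number of reference words. The sketch below implements that standard definition with whitespace tokenization; the abstract does not say which scoring tool or text normalization the authors used, so those details are assumptions, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N via Levenshtein distance over whitespace-separated words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hypothesis
                cur[j - 1] + 1,          # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # match (0) or substitution (1)
            )
        prev = cur
    return prev[-1] / len(ref)
```

Since insertions count against the hypothesis, WER can exceed 100%; for example, a two-word reference recognized as four words with the first two correct yields a WER of 1.0.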

31 citations