Author

Ilyes Rebai

Bio: Ilyes Rebai is an academic researcher from the University of Sfax. The author has contributed to research in the topics of deep learning and kernel methods. The author has an h-index of 4 and has co-authored 11 publications receiving 107 citations.

Papers
Journal ArticleDOI
TL;DR: This work proposes a new Deep Neural Network (DNN) speech recognition architecture which takes advantage of both DA and EM approaches in order to improve the prediction accuracy of the system.

66 citations

Journal ArticleDOI
TL;DR: This paper proposes to optimize the network within an adaptive backpropagation MLMKL framework using the gradient ascent method, instead of the dual objective function or an estimate of the leave-one-out error, and achieves high performance.
Abstract: The multiple kernel learning (MKL) approach has been proposed for kernel methods and has shown high performance in solving some real-world applications. It consists of learning the optimal kernel from one layer of multiple predefined kernels. Unfortunately, this approach is not rich enough to solve relatively complex problems. With the emergence and success of the deep learning concept, multilayer multiple kernel learning (MLMKL) methods were inspired by the idea of deep architectures. They were introduced in order to improve the conventional MKL methods. Such architectures tend to learn deep kernel machines by exploring the combinations of multiple kernels in a multilayer structure. However, existing MLMKL methods often have trouble optimizing the network for two or more layers. Additionally, they do not always outperform the simplest method of combining multiple kernels (i.e., MKL). In order to improve the effectiveness of MKL approaches, we introduce, in this paper, a novel backpropagation MLMKL framework. Specifically, we propose to optimize the network with an adaptive backpropagation algorithm. We use the gradient ascent method instead of the dual objective function or the estimation of the leave-one-out error. We test our proposed method through a large set of experiments on a variety of benchmark data sets. We have successfully optimized the system over many layers. Empirical results over an extensive set of experiments show that our algorithm achieves high performance compared to the traditional MKL approach and existing MLMKL methods.

36 citations
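The abstract names gradient ascent but not the exact objective being ascended, so the following is only a minimal sketch of the single-layer core of the idea: tuning the weights of a combination of predefined kernels by gradient ascent, using kernel-target alignment as a hypothetical surrogate objective and a numerical gradient. All function names and parameters here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rbf(X, gamma):
    # pairwise RBF (Gaussian) kernel matrix on the rows of X
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def alignment(K, y):
    # kernel-target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F)
    Y = np.outer(y, y)
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))

def train_mkl(X, y, gammas=(0.1, 1.0, 10.0), lr=0.1, steps=100, eps=1e-4):
    Ks = [rbf(X, g) for g in gammas]
    w = np.ones(len(Ks)) / len(Ks)          # start from a uniform combination
    for _ in range(steps):
        base = alignment(sum(a * K for a, K in zip(w, Ks)), y)
        grad = np.zeros_like(w)
        for i in range(len(w)):             # numerical gradient of the alignment
            wp = w.copy()
            wp[i] += eps
            grad[i] = (alignment(sum(a * K for a, K in zip(wp, Ks)), y) - base) / eps
        w = np.clip(w + lr * grad, 1e-12, None)   # gradient *ascent* step
        w /= w.sum()                        # keep weights on the simplex
    return w, sum(a * K for a, K in zip(w, Ks))
```

The paper's contribution is backpropagating such updates through several kernel layers; this sketch shows only the one-layer weight update that the deeper architecture builds on.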

Journal ArticleDOI
TL;DR: A Text-To-Speech (TTS) synthesis system for Modern Standard Arabic based on a statistical parametric approach and Mel-cepstral coefficients is described; the proposed synthesis method can generate intelligible and natural speech.

26 citations

Journal ArticleDOI
TL;DR: An efficient Arabic TTS system based on a statistical parametric approach and non-uniform-unit speech synthesis is presented, along with a new simple stacked neural network approach to improve the accuracy of the acoustic models.
Abstract: The text-to-speech system (TTS), also known as a speech synthesizer, has become one of the most important technologies of recent years due to its expanding field of applications. Several works on speech synthesis have been carried out for English and French, whereas many other languages, including Arabic, have only recently been taken into consideration. The area of Arabic speech synthesis has not made sufficient progress and is still at an early stage, with low speech quality. In fact, speech synthesis systems face several problems (e.g. speech quality, the articulatory effect, etc.). Different methods have been proposed to solve these issues, such as the use of large and different unit sizes. This method is mainly implemented with the concatenative approach to improve speech quality, and several works have proved its effectiveness. This paper presents an efficient Arabic TTS system based on a statistical parametric approach and non-uniform-unit speech synthesis. Our system includes a diacritization engine. Modern Arabic text is written without the vowels, also called diacritic marks. Unfortunately, these marks are very important for defining the correct pronunciation of the text, which explains the incorporation of the diacritization engine into our system. In this work, we propose a simple approach based on deep neural networks. Deep neural networks are trained to directly predict the diacritic marks and to predict the spectral and prosodic parameters. Furthermore, we propose a new simple stacked neural network approach to improve the accuracy of the acoustic models. Experimental results show that our diacritization system allows the generation of fully diacritized text with high precision and that our synthesis system produces high-quality speech.

11 citations
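The stacking idea above, feeding a first model's prediction back in as an extra input to a second model that refines it, can be sketched with plain least-squares stages standing in for the paper's trained DNNs. The two-stage design and all names here are illustrative assumptions, not the authors' code.

```python
import numpy as np

def fit_linear(X, y):
    # least-squares "layer", standing in for a trained network stage
    return np.linalg.lstsq(X, y, rcond=None)[0]

def stacked_predict(X_train, y_train, X_test):
    # Stage 1: predict the targets from the input features alone.
    W1 = fit_linear(X_train, y_train)
    # Stage 2: refit on [features, stage-1 prediction] so the second
    # model can correct the first one's output.
    X2_train = np.hstack([X_train, X_train @ W1])
    W2 = fit_linear(X2_train, y_train)
    return np.hstack([X_test, X_test @ W1]) @ W2
```

In the paper's setting, each stage would be a deep network predicting spectral and prosodic parameters rather than a linear map, but the data flow between stages is the same.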

Proceedings ArticleDOI
22 Jun 2013
TL;DR: An Arabic text-to-speech synthesis system based on statistical parametric synthesis is presented; the MFCC neural network architecture and an objective evaluation using the MFCC distortion measure are given in this paper.
Abstract: With the increasing number of users of text-to-speech applications, high-quality speech synthesis is required. However, only a few studies concern Arabic text-to-speech applications. Compared with other languages such as English and French, the quality of Arabic synthesized speech is still poor. For these reasons, we propose in this paper an Arabic text-to-speech synthesis system based on statistical parametric synthesis. Mel Frequency Cepstral Coefficients (MFCC), energy, and pitch are predicted using backpropagation artificial neural networks and then transformed into speech using a Mel Log Spectrum Approximation filter. In written Arabic text, the short vowels, called diacritic marks, are often omitted, so a diacritization system is proposed to resolve this problem. Different unit sizes are considered in the speech database, namely phoneme, diphone, and triphone. The MFCC neural network architecture and an objective evaluation using the MFCC distortion measure are given in this paper.

9 citations
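The paper does not spell out its MFCC distortion measure; a standard choice for objective evaluation of parametric synthesis is mel-cepstral distortion (MCD), sketched here under that assumption, with time-aligned frame sequences taken as given.

```python
import math

def mel_cepstral_distortion(ref_frames, syn_frames):
    """Average mel-cepstral distortion (dB) between two time-aligned
    sequences of cepstral vectors; the 0th (energy) coefficient is
    excluded, as is conventional."""
    assert len(ref_frames) == len(syn_frames)
    const = (10.0 / math.log(10.0)) * math.sqrt(2.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        sq = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += const * math.sqrt(sq)
    return total / len(ref_frames)
```

Lower values indicate that the synthesized cepstra are closer to the natural-speech reference.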


Cited by
Journal ArticleDOI
TL;DR: The results demonstrate that the DTW-based similarity measure of the NDVI time series can be effectively used to map large-area rice cropping systems with diverse cultivation processes.
Abstract: Normalized Difference Vegetation Index (NDVI) derived from Moderate Resolution Imaging Spectroradiometer (MODIS) time-series data has been widely used in the fields of crop and rice classification. The cloudy and rainy weather characteristics of the monsoon season greatly reduce the likelihood of obtaining high-quality optical remote sensing images. In addition, the diverse crop-planting system in Vietnam also hinders the comparison of NDVI among different crop stages. To address these problems, we apply a Dynamic Time Warping (DTW) distance-based similarity measure approach and use the entire yearly NDVI time series to reduce the inaccuracy of classification using a single image. We first de-noise the NDVI time series using S-G filtering based on the TIMESAT software. Then, a standard NDVI time-series base for rice growth is established based on field survey data and Google Earth sample data. NDVI time-series data for each pixel are constructed and the DTW distance with the standard rice growth NDVI time series is calculated. Then, we apply thresholds to extract rice growth areas. A qualitative assessment using statistical data and a spatial assessment using sampled data from the rice-cropping map reveal a high mapping accuracy at the national scale against the statistical data, with the corresponding R² being as high as 0.809; however, the mapped rice accuracy decreased at the provincial scale due to the reduced number of rice planting areas per province. An analysis of the results indicates that the 500-m resolution MODIS data are limited in terms of mapping scattered rice parcels. The results demonstrate that the DTW-based similarity measure of the NDVI time series can be effectively used to map large-area rice cropping systems with diverse cultivation processes.

123 citations
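The DTW distance at the heart of the method above is standard and can be sketched directly; `classify_rice` and the fixed threshold are illustrative simplifications of the paper's per-pixel thresholding step, not its calibrated values.

```python
import numpy as np

def dtw_distance(a, b):
    # classic O(n*m) dynamic-time-warping distance between 1-D series
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify_rice(pixel_ndvi, standard_ndvi, threshold):
    # a pixel is labelled rice when its yearly NDVI curve is close
    # enough, under DTW, to the standard rice growth curve
    return dtw_distance(pixel_ndvi, standard_ndvi) <= threshold
```

Because DTW warps the time axis, a rice pixel whose transplanting date is shifted by a few compositing periods can still match the standard growth curve, which is exactly what the diverse cropping calendars in Vietnam require.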

Journal ArticleDOI
Yong He1, Hong Zeng1, Yangyang Fan1, Shuaisheng Ji1, Jianjian Wu1 
TL;DR: An approach to detect oilseed rape pests based on deep learning, which improves the mean average precision (mAP) to 77.14%; the results show that this approach clearly surpasses the original model and is helpful for integrated pest management.
Abstract: In this paper, we propose an approach to detect oilseed rape pests based on deep learning, which improves the mean average precision (mAP) to 77.14%, an increase of 9.7% over the original model. We port this model to a mobile platform so that every farmer is able to use this program, which diagnoses pests in real time and provides suggestions on pest control. We designed an oilseed rape pest imaging database with 12 typical oilseed rape pests and compared the performance of five models; SSD w/Inception was chosen as the optimal model. Moreover, to achieve a high mAP, we used data augmentation (DA) and added a dropout layer. The experiments were performed on the Android application we developed, and the results show that our approach clearly surpasses the original model and is helpful for integrated pest management. This application has improved environmental adaptability, response speed, and accuracy compared with past works, and has the advantages of low cost and simple operation, which are suitable for the pest monitoring missions of drones and the Internet of Things (IoT).

41 citations

Journal ArticleDOI
TL;DR: This article presents a comprehensive overview of the state-of-the-art approaches that bridge the MKL and deep learning techniques, systematically reviewing the typical hybrid models, training techniques, and their theoretical and practical benefits.

37 citations

Proceedings ArticleDOI
01 Nov 2019
TL;DR: A deep learning model called Lung Disease Classification (LDC), combined with advanced data normalization and data augmentation techniques, achieves high-performance classification in lung disease diagnosis and guarantees better performance than other previously reported approaches.
Abstract: Advanced technologies are essential to achieving improvements in medicine. More specifically, extensive investigation in partnership among researchers, health care providers, and patients is integral to delivering precise and customized treatment strategies for various diseases. This paper aims to assess the degree of accuracy acceptable in the medical field by applying deep learning to publicly available data. First, we extracted spectrogram features and labels from the annotated lung sound samples and used them as input to our 2D Convolutional Neural Network (CNN) model. Secondly, we normalized the lung sounds to remove peak values and noise from them. For deep learning classification, the publicly available data alone were not sufficient to conduct the learning process. Finally, we created a deep learning model called Lung Disease Classification (LDC), combined with advanced data normalization and data augmentation techniques, for high-performance classification in lung disease diagnosis. The final accuracy obtained after normalization and augmentation was approximately 97%. The proposed model paves the way for an adequate assessment of the degree of accuracy acceptable in the medical field and guarantees better performance than other previously reported approaches.

36 citations
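The spectrogram-plus-normalization front end described above can be sketched minimally with a plain short-time FFT and per-feature standardization; the frame length, hop size, and windowing here are assumed stand-ins, since the paper does not specify its exact pipeline.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    # magnitude spectrogram from a Hann-windowed short-time FFT
    window = np.hanning(frame_len)
    frames = [signal[s:s + frame_len] * window
              for s in range(0, len(signal) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def normalize(spec, eps=1e-8):
    # per-frequency-bin standardization across frames
    return (spec - spec.mean(axis=0)) / (spec.std(axis=0) + eps)
```

The resulting frames-by-bins matrix is what a 2D CNN such as the paper's LDC model would consume as an image-like input.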

Journal ArticleDOI
TL;DR: The application of a combined machine learning network and physics-based model to detect chatter in milling achieved a 98.90% success rate in chatter detection, while allowing the network to be further trained during production with the help of the physics-based, deterministic model.
Abstract: Unstable vibrations (chatter) in machining lead to poor surface finish and damage to the tool and machine. It is desirable to detect and avoid chatter on-line, without false alarms, for improved productivity. This paper presents the application of a combined machine learning network and physics-based model to detect chatter in milling. The vibration data collected during machining are converted into moving short-time frequency spectra, whose features are mapped to five machining states: air cut, entry into the workpiece, exit from the workpiece, stable cutting, and chatter. The machine learning network was trained and its architecture was reduced to a computationally optimal network with 3 convolution blocks followed by a neural network with one hidden layer. A parallel algorithm, which Kalman-filters the stable forced vibrations to isolate chatter signals in the raw data, is used to detect chatter and its frequency. The combination of the machine learning and physics-based models led to a 98.90% success rate in chatter detection, while allowing the network to be further trained during production with the help of the physics-based, deterministic model.

34 citations
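The paper isolates chatter by Kalman-filtering the stable forced vibrations; the sketch below substitutes a much simpler spectral heuristic, comparing the largest off-harmonic peak against the tooth-passing harmonics, only to illustrate the underlying idea that chatter appears at frequencies away from the forced-vibration harmonics. All parameters and thresholds are hypothetical.

```python
import numpy as np

def chatter_indicator(vib, fs, tooth_freq, n_harm=5, tol=2.0, ratio=3.0):
    # magnitude spectrum of one vibration window
    spec = np.abs(np.fft.rfft(vib * np.hanning(len(vib))))
    freqs = np.fft.rfftfreq(len(vib), 1.0 / fs)
    # mask bins near the tooth-passing harmonics (forced vibrations)
    near_harm = np.zeros(len(freqs), dtype=bool)
    for k in range(1, n_harm + 1):
        near_harm |= np.abs(freqs - k * tooth_freq) <= tol
    chatter_amp = spec[~near_harm].max()
    forced_amp = spec[near_harm].max() if near_harm.any() else 0.0
    peak = freqs[~near_harm][np.argmax(spec[~near_harm])]
    # chatter: an off-harmonic component dominating the forced response
    return bool(chatter_amp > ratio * forced_amp), peak
```

A Kalman filter, as in the paper, tracks and removes the forced harmonics in the time domain instead of masking spectral bins, which is more robust when spindle speed varies.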