scispace - formally typeset
Search or ask a question

Showing papers in "arXiv: Neural and Evolutionary Computing in 2013"


Posted Content
TL;DR: In this paper, deep recurrent neural networks (RNNs) are used to combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
Abstract: Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates \emph{deep recurrent neural networks}, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.

5,310 citations


Posted Content
TL;DR: With enhanced local modeling via the micro network, the proposed deep network structure NIN is able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.
Abstract: We propose a novel deep network structure called "Network In Network" (NIN) to enhance model discriminability for local patches within the receptive field. The conventional convolutional layer uses linear filters followed by a nonlinear activation function to scan the input. Instead, we build micro neural networks with more complex structures to abstract the data within the receptive field. We instantiate the micro neural network with a multilayer perceptron, which is a potent function approximator. The feature maps are obtained by sliding the micro networks over the input in a similar manner as CNN; they are then fed into the next layer. Deep NIN can be implemented by stacking mutiple of the above described structure. With enhanced local modeling via the micro network, we are able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers. We demonstrated the state-of-the-art classification performances with NIN on CIFAR-10 and CIFAR-100, and reasonable performances on SVHN and MNIST datasets.

3,905 citations


Posted Content
TL;DR: This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time.
Abstract: This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. The approach is demonstrated for text (where the data are discrete) and online handwriting (where the data are real-valued). It is then extended to handwriting synthesis by allowing the network to condition its predictions on a text sequence. The resulting system is able to generate highly realistic cursive handwriting in a wide variety of styles.

3,551 citations


Posted Content
TL;DR: In this paper, the authors show that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions.
Abstract: Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for the restricted case of deep linear neural networks. Despite the linearity of their input-output map, such networks have nonlinear gradient descent dynamics on weights that change with the addition of each new hidden layer. We show that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions. We provide an analytical description of these phenomena by finding new exact solutions to the nonlinear dynamics of deep learning. Our theoretical analysis also reveals the surprising finding that as the depth of a network approaches infinity, learning speed can nevertheless remain finite: for a special class of initial conditions on the weights, very deep networks incur only a finite, depth independent, delay in learning speed relative to shallow networks. We show that, under certain conditions on the training data, unsupervised pretraining can find this special class of initial conditions, while scaled random Gaussian initializations cannot. We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pre-training, enjoys depth independent learning times. We further show that these initial conditions also lead to faithful propagation of gradients even in deep nonlinear networks, as long as they operate in a special regime known as the edge of chaos.

702 citations


Posted Content
TL;DR: In this article, the authors explore different ways to extend a recurrent neural network (RNN) to a \textit{deep} RNN by carefully analyzing and understanding the architecture of an RNN.
Abstract: In this paper, we explore different ways to extend a recurrent neural network (RNN) to a \textit{deep} RNN. We start by arguing that the concept of depth in an RNN is not as clear as it is in feedforward neural networks. By carefully analyzing and understanding the architecture of an RNN, however, we find three points of an RNN which may be made deeper; (1) input-to-hidden function, (2) hidden-to-hidden transition and (3) hidden-to-output function. Based on this observation, we propose two novel architectures of a deep RNN which are orthogonal to an earlier attempt of stacking multiple recurrent layers to build a deep RNN (Schmidhuber, 1992; El Hihi and Bengio, 1996). We provide an alternative interpretation of these deep RNNs using a novel framework based on neural operators. The proposed deep RNNs are empirically evaluated on the tasks of polyphonic music prediction and language modeling. The experimental result supports our claim that the proposed deep RNNs benefit from the depth and outperform the conventional, shallow RNNs.

690 citations


Posted Content
TL;DR: A relatively comprehensive list of all the algorithms based on swarm intelligence, bio-inspired, physics-based and chemistry-based, depending on the sources of inspiration, that have become popular tools for solving real-world problems.
Abstract: Swarm intelligence and bio-inspired algorithms form a hot topic in the developments of new algorithms inspired by nature. These nature-inspired metaheuristic algorithms can be based on swarm intelligence, biological systems, physical and chemical systems. Therefore, these algorithms can be called swarm-intelligence-based, bio-inspired, physics-based and chemistry-based, depending on the sources of inspiration. Though not all of them are efficient, a few algorithms have proved to be very effi cient and thus have become popular tools for solving real-world problems. Some algorithms are insuffici ently studied. The purpose of this review is to present a relatively comprehensive list of all the algorithms in the literature, so as to inspire further research.

508 citations


Journal ArticleDOI
TL;DR: A comprehensive review of this living and evolving discipline of Swarm Intelligence shows that the firefly algorithm could be applied to every problem arising in practice and encourages new researchers and algorithm developers to use this simple and yet very efficient algorithm for problem solving.
Abstract: The firefly algorithm has become an increasingly important tool of Swarm Intelligence that has been applied in almost all areas of optimization, as well as engineering practice. Many problems from various areas have been successfully solved using the firefly algorithm and its variants. In order to use the algorithm to solve diverse problems, the original firefly algorithm needs to be modified or hybridized. This paper carries out a comprehensive review of this living and evolving discipline of Swarm Intelligence, in order to show that the firefly algorithm could be applied to every problem arising in practice. On the other hand, it encourages new researchers and algorithm developers to use this simple and yet very efficient algorithm for problem solving. It often guarantees that the obtained results will meet the expectations.

147 citations


Posted Content
TL;DR: The results showed that the proposed classifier has the ability of recognizing and classifying EEG signals efficiently and was evaluated using in total 300 EEG signals.
Abstract: In this paper, a wavelet-based neural network (WNN) classifier for recognizing EEG signals is implemented and tested under three sets EEG signals (healthy subjects, patients with epilepsy and patients with epileptic syndrome during the seizure). First, the Discrete Wavelet Transform (DWT) with the Multi-Resolution Analysis (MRA) is applied to decompose EEG signal at resolution levels of the components of the EEG signal (delta, theta, alpha, beta and gamma) and the Parsevals theorem are employed to extract the percentage distribution of energy features of the EEG signal at different resolution levels. Second, the neural network (NN) classifies these extracted features to identify the EEGs type according to the percentage distribution of energy features. The performance of the proposed algorithm has been evaluated using in total 300 EEG signals. The results showed that the proposed classifier has the ability of recognizing and classifying EEG signals efficiently.

137 citations


Posted Content
TL;DR: In this paper, a new swarm intelligence algorithm based on the bat algorithm is presented, which is hybridized with differential evolution strategies, showing promising results of the standard benchmark functions, this hybridization also significantly improves the original bat algorithm.
Abstract: Swarm intelligence is a very powerful technique to be used for optimization purposes. In this paper we present a new swarm intelligence algorithm, based on the bat algorithm. The Bat algorithm is hybridized with differential evolution strategies. Besides showing very promising results of the standard benchmark functions, this hybridization also significantly improves the original bat algorithm.

135 citations


Posted Content
TL;DR: In this paper, a novel approach for single channel source separation (SCSS) using a deep neural network (DNN) architecture is introduced, where the trained DNN is utilized to aid in estimation of each source in the mixed signal.
Abstract: In this paper, a novel approach for single channel source separation (SCSS) using a deep neural network (DNN) architecture is introduced. Unlike previous studies in which DNN and other classifiers were used for classifying time-frequency bins to obtain hard masks for each source, we use the DNN to classify estimated source spectra to check for their validity during separation. In the training stage, the training data for the source signals are used to train a DNN. In the separation stage, the trained DNN is utilized to aid in estimation of each source in the mixed signal. Single channel source separation problem is formulated as an energy minimization problem where each source spectra estimate is encouraged to fit the trained DNN model and the mixed signal spectrum is encouraged to be written as a weighted sum of the estimated source spectra. The proposed approach works regardless of the energy scale differences between the source signals in the training and separation stages. Nonnegative matrix factorization (NMF) is used to initialize the DNN estimate for each source. The experimental results show that using DNN initialized by NMF for source separation improves the quality of the separated signal compared with using NMF for source separation.

87 citations


Posted Content
TL;DR: This paper introduces bilevel evolutionary algorithm based on quadratic approximations (BLEAQ) of optimal lower level variables with respect to the upper level variables, capable of handling bileVEL problems with different kinds of complexities in relatively smaller number of function evaluations.
Abstract: Bilevel optimization problems are a class of challenging optimization problems, which contain two levels of optimization tasks. In these problems, the optimal solutions to the lower level problem become possible feasible candidates to the upper level problem. Such a requirement makes the optimization problem difficult to solve, and has kept the researchers busy towards devising methodologies, which can efficiently handle the problem. Despite the efforts, there hardly exists any effective methodology, which is capable of handling a complex bilevel problem. In this paper, we introduce bilevel evolutionary algorithm based on quadratic approximations (BLEAQ) of optimal lower level variables with respect to the upper level variables. The approach is capable of handling bilevel problems with different kinds of complexities in relatively smaller number of function evaluations. Ideas from classical optimization have been hybridized with evolutionary methods to generate an efficient optimization algorithm for generic bilevel problems. The efficacy of the algorithm has been shown on two sets of test problems. The first set is a recently proposed SMD test set, which contains problems with controllable complexities, and the second set contains standard test problems collected from the literature. The proposed method has been evaluated against two benchmarks, and the performance gain is observed to be significant.

Posted Content
TL;DR: The results show that HyperNEAT struggles with performing image classification by itself, but can be effective in training a feature extractor that other ML approaches can learn from, and NeuroEvolution combined with other ML methods provides an intriguing area of research that can replicate the processes in nature.
Abstract: An important goal for the machine learning (ML) community is to create approaches that can learn solutions with human-level capability. One domain where humans have held a significant advantage is visual processing. A significant approach to addressing this gap has been machine learning approaches that are inspired from the natural systems, such as artificial neural networks (ANNs), evolutionary computation (EC), and generative and developmental systems (GDS). Research into deep learning has demonstrated that such architectures can achieve performance competitive with humans on some visual tasks; however, these systems have been primarily trained through supervised and unsupervised learning algorithms. Alternatively, research is showing that evolution may have a significant role in the development of visual systems. Thus this paper investigates the role neuro-evolution (NE) can take in deep learning. In particular, the Hypercube-based NeuroEvolution of Augmenting Topologies is a NE approach that can effectively learn large neural structures by training an indirect encoding that compresses the ANN weight pattern as a function of geometry. The results show that HyperNEAT struggles with performing image classification by itself, but can be effective in training a feature extractor that other ML approaches can learn from. Thus NeuroEvolution combined with other ML methods provides an intriguing area of research that can replicate the processes in nature.

Journal ArticleDOI
TL;DR: A neural network (NN) based approach to predict customer churn in subscription of cellular wireless services is proposed and the results of experiments indicate that neural network based approach can Predict customer churn with accuracy more than 92%.
Abstract: Marketing literature states that it is more costly to engage a new customer than to retain an existing loyal customer. Churn prediction models are developed by academics and practitioners to effectively manage and control customer churn in order to retain existing customers. As churn management is an important activity for companies to retain loyal customers, the ability to correctly predict customer churn is necessary. As the cellular network services market becoming more competitive, customer churn management has become a crucial task for mobile communication operators. This paper proposes a neural network based approach to predict customer churn in subscription of cellular wireless services. The results of experiments indicate that neural network based approach can predict customer churn.

Posted Content
TL;DR: This work provides a general drift theorem that includes bounds on the upper and lower tail of the hitting time distribution and can be specialized into virtually all existing drift theorems with drift towards the target from the literature.
Abstract: Drift analysis is one of the state-of-the-art techniques for the runtime analysis of randomized search heuristics (RSHs) such as evolutionary algorithms (EAs), simulated annealing etc. The vast majority of existing drift theorems yield bounds on the expected value of the hitting time for a target state, e.g., the set of optimal solutions, without making additional statements on the distribution of this time. We address this lack by providing a general drift theorem that includes bounds on the upper and lower tail of the hitting time distribution. The new tail bounds are applied to prove very precise sharp-concentration results on the running time of a simple EA on standard benchmark problems, including the class of general linear functions. Surprisingly, the probability of deviating by an $r$-factor in lower order terms of the expected time decreases exponentially with $r$ on all these problems. The usefulness of the theorem outside the theory of RSHs is demonstrated by deriving tail bounds on the number of cycles in random permutations. All these results handle a position-dependent (variable) drift that was not covered by previous drift theorems with tail bounds. Moreover, our theorem can be specialized into virtually all existing drift theorems with drift towards the target from the literature. Finally, user-friendly specializations of the general drift theorem are given.

Posted ContentDOI
TL;DR: This paper explains genetic algorithm for novice in this field and basic philosophy of genetic algorithm and its flowchart are described.
Abstract: This paper explains genetic algorithm for novice in this field. Basic philosophy of genetic algorithm and its flowchart are described. Step by step numerical computation of genetic algorithm for solving simple mathematical equality problem will be briefly explained

Posted Content
TL;DR: A novel method is proposed to detect the prognostic biomarkers of survival in colorectal cancer patients using wavelet analysis, genetic algorithm, and Bayes classifier, and the corresponding protein markers were located based on the position of optimized features.
Abstract: Biomarkers which predict patient's survival can play an important role in medical diagnosis and treatment. How to select the significant biomarkers from hundreds of protein markers is a key step in survival analysis. In this paper a novel method is proposed to detect the prognostic biomarkers of survival in colorectal cancer patients using wavelet analysis, genetic algorithm, and Bayes classifier. One dimensional discrete wavelet transform (DWT) is normally used to reduce the dimensionality of biomedical data. In this study one dimensional continuous wavelet transform (CWT) was proposed to extract the features of colorectal cancer data. One dimensional CWT has no ability to reduce dimensionality of data, but captures the missing features of DWT, and is complementary part of DWT. Genetic algorithm was performed on extracted wavelet coefficients to select the optimized features, using Bayes classifier to build its fitness function. The corresponding protein markers were located based on the position of optimized features. Kaplan-Meier curve and Cox regression model were used to evaluate the performance of selected biomarkers. Experiments were conducted on colorectal cancer dataset and several significant biomarkers were detected. A new protein biomarker CD46 was found to significantly associate with survival time.

Proceedings ArticleDOI
TL;DR: This paper introduces a new dynamic neighborhood network for particle swarm optimization that is compared with other two algorithms having static neighborhood topologies on a set of classic benchmark problems and showed superior performance for C-PSO regarding escaping local optima and convergence speed.
Abstract: This paper introduces a new dynamic neighborhood network for particle swarm optimization. In the proposed Clubs-based Particle Swarm Optimization (C-PSO) algorithm, each particle initially joins a default number of what we call 'clubs'. Each particle is affected by its own experience and the experience of the best performing member of the clubs it is a member of. Clubs membership is dynamic, where the worst performing particles socialize more by joining more clubs to learn from other particles and the best performing particles are made to socialize less by leaving clubs to reduce their strong influence on other members. Particles return gradually to default membership level when they stop showing extreme performance. Inertia weights of swarm members are made random within a predefined range. This proposed dynamic neighborhood algorithm is compared with other two algorithms having static neighborhood topologies on a set of classic benchmark problems. The results showed superior performance for C-PSO regarding escaping local optima and convergence speed.

Posted Content
TL;DR: The results show that deep learning methods are able to learn physiologically important representations and detect latent relations in neuroimaging data.
Abstract: Deep learning methods have recently made notable advances in the tasks of classification and representation learning. These tasks are important for brain imaging and neuroscience discovery, making the methods attractive for porting to a neuroimager's toolbox. Success of these methods is, in part, explained by the flexibility of deep learning models. However, this flexibility makes the process of porting to new areas a difficult parameter optimization problem. In this work we demonstrate our results (and feasible parameter ranges) in application of deep learning methods to structural and functional brain imaging data. We also describe a novel constraint-based approach to visualizing high dimensional data. We use it to analyze the effect of parameter choices on data transformations. Our results show that deep learning methods are able to learn physiologically important representations and detect latent relations in neuroimaging data.

Book ChapterDOI
TL;DR: To characterise optimisation ability of algorithms, the expected first hitting time (FHT) is suggested, i.e., the time until a search point in the vicinity of the optimum is visited.
Abstract: We reconsider stochastic convergence analyses of particle swarm optimisation, and point out that previously obtained parameter conditions are not always sufficient to guarantee mean square convergence to a local optimum. We show that stagnation can in fact occur for non-trivial configurations in non-optimal parts of the search space, even for simple functions like SPHERE. The convergence properties of the basic PSO may in these situations be detrimental to the goal of optimisation, to discover a sufficiently good solution within reasonable time. To characterise optimisation ability of algorithms, we suggest the expected first hitting time (FHT), i.e., the time until a search point in the vicinity of the optimum is visited. It is shown that a basic PSO may have infinite expected FHT, while an algorithm introduced here, the Noisy PSO, has finite expected FHT on some functions.

Posted Content
TL;DR: A new computational approach to the quantum perceptron neural network can achieve learning in low-cost computation and is capable to construct its own set of activation operators to be applied widely in both quantum and classical applications to overcome the linearity limitation of classical perceptron.
Abstract: Recently, with the rapid development of technology, there are a lot of applications require to achieve low-cost learning. However the computational power of classical artificial neural networks, they are not capable to provide low-cost learning. In contrast, quantum neural networks may be representing a good computational alternate to classical neural network approaches, based on the computational power of quantum bit (qubit) over the classical bit. In this paper we present a new computational approach to the quantum perceptron neural network can achieve learning in low-cost computation. The proposed approach has only one neuron can construct self-adaptive activation operators capable to accomplish the learning process in a limited number of iterations and, thereby, reduce the overall computational cost. The proposed approach is capable to construct its own set of activation operators to be applied widely in both quantum and classical applications to overcome the linearity limitation of classical perceptron. The computational power of the proposed approach is illustrated via solving variety of problems where promising and comparable results are given.

Proceedings ArticleDOI
TL;DR: In this article, a modified PSO optimization is used to find out the best model parameter that minimizes the sum square error between the measured and the simulated currents, which is based on a simple startup test using a standard V/F inverter.
Abstract: This paper presents a new technique for induction motor parameter identification. The proposed technique is based on a simple startup test using a standard V/F inverter. The recorded startup currents are compared to that obtained by simulation of an induction motor model. A Modified PSO optimization is used to find out the best model parameter that minimizes the sum square error between the measured and the simulated currents. The performance of the modified PSO is compared with other optimization methods including line search, conventional PSO and Genetic Algorithms. Simulation results demonstrate the ability of the proposed technique to capture the true values of the machine parameters and the superiority of the results obtained using the modified PSO over other optimization techniques.

Posted Content
TL;DR: In this paper, an adaptive amoeba algorithm is proposed to address the shortest path tree (SPT) problem in dynamic graphs, where the edge weight updates consists of three categories: edge weight increases, edge weight decreases, the mixture of them.
Abstract: This paper presents an adaptive amoeba algorithm to address the shortest path tree (SPT) problem in dynamic graphs. In dynamic graphs, the edge weight updates consists of three categories: edge weight increases, edge weight decreases, the mixture of them. Existing work on this problem solve this issue through analyzing the nodes influenced by the edge weight updates and recompute these affected vertices. However, when the network becomes big, the process will become complex. The proposed method can overcome the disadvantages of the existing approaches. The most important feature of this algorithm is its adaptivity. When the edge weight changes, the proposed algorithm can recognize the affected vertices and reconstruct them spontaneously. To evaluate the proposed adaptive amoeba algorithm, we compare it with the Label Setting algorithm and Bellman-Ford algorithm. The comparison results demonstrate the effectiveness of the proposed method.

Posted Content
TL;DR: This work proposes a new benchmark for visual representations on which the neural representation in visual area IT is superior to visual area V4 and a recent supervised algorithm achieves performance comparable to that of IT for an intermediate level of image variation difficulty, and surpasses IT at a higher difficulty level.
Abstract: A key requirement for the development of effective learning representations is their evaluation and comparison to representations we know to be effective. In natural sensory domains, the community has viewed the brain as a source of inspiration and as an implicit benchmark for success. However, it has not been possible to directly test representational learning algorithms directly against the representations contained in neural systems. Here, we propose a new benchmark for visual representations on which we have directly tested the neural representation in multiple visual cortical areas in macaque (utilizing data from [Majaj et al., 2012]), and on which any computer vision algorithm that produces a feature space can be tested. The benchmark measures the effectiveness of the neural or machine representation by computing the classification loss on the ordered eigendecomposition of a kernel matrix [Montavon et al., 2011]. In our analysis we find that the neural representation in visual area IT is superior to visual area V4. In our analysis of representational learning algorithms, we find that three-layer models approach the representational performance of V4 and the algorithm in [Le et al., 2012] surpasses the performance of V4. Impressively, we find that a recent supervised algorithm [Krizhevsky et al., 2012] achieves performance comparable to that of IT for an intermediate level of image variation difficulty, and surpasses IT at a higher difficulty level. We believe this result represents a major milestone: it is the first learning algorithm we have found that exceeds our current estimate of IT representation performance. We hope that this benchmark will assist the community in matching the representational performance of visual cortex and will serve as an initial rallying point for further correspondence between representations derived in brains and machines.

Journal ArticleDOI
TL;DR: By using two-layer feedforward network with tan-sigmoid transmission function in input and output layers, this article can anticipate participation rate of public in kohgiloye and boyerahmad province in future presidential election of islamic republic of iran with 91% accuracy.
Abstract: It is approved that artificial neural networks can be considerable effective in anticipating and analyzing flows in which traditional methods and statics are not able to solve. in this article, by using two-layer feedforward network with tan-sigmoid transmission function in input and output layers, we can anticipate participation rate of public in kohgiloye and boyerahmad province in future presidential election of islamic republic of iran with 91% accuracy. the assessment standards of participation such as confusion matrix and roc diagrams have been approved our claims.

Posted Content
TL;DR: The application of artificial neural network backpropagation method for measuring the severity of the disease, where the observed X-ray range from wrist to fingers, and the main procedures of system are image processing, feature extraction, and artificial Neural network process.
Abstract: The examination of Osteoarthritis disease through X-ray by rheumatology can be classified into four grade of severity. This paper discusses about the application of artificial neural network backpropagation method for measuring the severity of the disease, where the observed X-ray range from wrist to fingers. The main procedures of system in this paper is divided into three, which are image processing, feature extraction, and artificial neural network process. First, an X-ray image digital (200x150 pixels and greyscale) will be thresholded, then extracted features based on probabilistic values of the color intensity of seven bit quantization result, and statistical textures. That feature values then will be normalizing to interval [0.1, 0.9], and then the result would be processing on backpropagation artificial neural network system as input to determine the severity of disease from an X-ray had input before it. From testing with learning rate 0.3, momentum 0.4, hidden units five pieces and about 132 feature vectors, this system had had a level of accuracy of 100% for learning data, 80% for learning and non-learning data, and 66.6% for non-learning data

Proceedings ArticleDOI
TL;DR: The comparison demonstrates that both PSO and BP based neural networks outperform SARIMA, HW and SVM models for all three time series datasets.
Abstract: Seasonality is a distinctive characteristic which is often observed in many practical time series. Artificial Neural Networks (ANNs) are a class of promising models for efficiently recognizing and forecasting seasonal patterns. In this paper, the Particle Swarm Optimization (PSO) approach is used to enhance the forecasting strengths of feedforward ANN (FANN) as well as Elman ANN (EANN) models for seasonal data. Three widely popular versions of the basic PSO algorithm, viz. Trelea-I, Trelea-II and Clerc-Type1 are considered here. The empirical analysis is conducted on three real-world seasonal time series. Results clearly show that each version of the PSO algorithm achieves notably better forecasting accuracies than the standard Backpropagation (BP) training method for both FANN and EANN models. The neural network forecasting results are also compared with those from the three traditional statistical models, viz. Seasonal Autoregressive Integrated Moving Average (SARIMA), Holt-Winters (HW) and Support Vector Machine (SVM). The comparison demonstrates that both PSO and BP based neural networks outperform SARIMA, HW and SVM models for all three time series datasets. The forecasting performances of ANNs are further improved through combining the outputs from the three PSO based models.

Posted Content
TL;DR: A framework for parallel GAs on the Hadoop platform, following the paradigm of MapReduce is described, to allow the user to focus on the aspects of GA that are specific to the problem to be addressed, being sure that this task is going to be correctly executed on the Cloud with a good performance.
Abstract: Genetic Algorithms (GAs) are powerful metaheuristic techniques mostly used in many real-world applications. The sequential execution of GAs requires considerable computational power both in time and resources. Nevertheless, GAs are naturally parallel and accessing a parallel platform such as Cloud is easy and cheap. Apache Hadoop is one of the common services that can be used for parallel applications. However, using Hadoop to develop a parallel version of GAs is not simple without facing its inner workings. Even though some sequential frameworks for GAs already exist, there is no framework supporting the development of GA applications that can be executed in parallel. In this paper is described a framework for parallel GAs on the Hadoop platform, following the paradigm of MapReduce. The main purpose of this framework is to allow the user to focus on the aspects of GA that are specific to the problem to be addressed, being sure that this task is going to be correctly executed on the Cloud with a good performance. The framework has been also exploited to develop an application for Feature Subset Selection problem. A preliminary analysis of the performance of the developed GA application has been performed using three datasets and shown very promising performance.

Posted ContentDOI
TL;DR: A novel weighted ensemble scheme which intelligently combines multiple training algorithms to increase the ANN forecast accuracy and achieves significantly better forecast accuracies than two other popular statistical models.
Abstract: Enhancing the robustness and accuracy of time series forecasting models is an active area of research. Recently, Artificial Neural Networks (ANNs) have found extensive applications in many practical forecasting problems. However, the standard backpropagation ANN training algorithm has some critical issues, e.g. it has a slow convergence rate and often converges to a local minimum, the complex pattern of error surfaces, lack of proper training parameters selection methods, etc. To overcome these drawbacks, various improved training methods have been developed in literature; but, still none of them can be guaranteed as the best for all problems. In this paper, we propose a novel weighted ensemble scheme which intelligently combines multiple training algorithms to increase the ANN forecast accuracies. The weight for each training algorithm is determined from the performance of the corresponding ANN model on the validation dataset. Experimental results on four important time series depicts that our proposed technique reduces the mentioned shortcomings of individual ANN training algorithms to a great extent. Also it achieves significantly better forecast accuracies than two other popular statistical models.

Posted Content
TL;DR: This paper explores how sparse constraints in the DBN affect the classification accuracy for digit recognition in three different datasets and provides initial estimations of their usefulness by altering different parameters such as the group size and overlap percentage.
Abstract: Deep Belief Networks (DBN) have been successfully applied on popular machine learning tasks. Specifically, when applied on hand-written digit recognition, DBNs have achieved approximate accuracy rates of 98.8%. In an effort to optimize the data representation achieved by the DBN and maximize their descriptive power, recent advances have focused on inducing sparse constraints at each layer of the DBN. In this paper we present a theoretical approach for sparse constraints in the DBN using the mixed norm for both non-overlapping and overlapping groups. We explore how these constraints affect the classification accuracy for digit recognition in three different datasets (MNIST, USPS, RIMES) and provide initial estimations of their usefulness by altering different parameters such as the group size and overlap percentage.

Posted Content
TL;DR: This work introduces several strategies to allow efficient storage of non-uniform messages in recently introduced sparse associative memories and analyzes and discusses the methods introduced.
Abstract: Associative memories are data structures that allow retrieval of stored messages from part of their content. They thus behave similarly to human brain that is capable for instance of retrieving the end of a song given its beginning. Among different families of associative memories, sparse ones are known to provide the best efficiency (ratio of the number of bits stored to that of bits used). Nevertheless, it is well known that non-uniformity of the stored messages can lead to dramatic decrease in performance. We introduce several strategies to allow efficient storage of non-uniform messages in recently introduced sparse associative memories. We analyse and discuss the methods introduced. We also present a practical application example.