
Showing papers in "arXiv: Neural and Evolutionary Computing in 2012"


Posted Content
TL;DR: The authors randomly omit half of the feature detectors on each training case to prevent complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors.
Abstract: When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.

6,899 citations
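
The random omission described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `dropout_forward`, the plain-list representation of activations, and the test-time scaling by the keep probability (equivalent to halving outgoing weights when `drop_prob=0.5`) are assumptions made for the example.

```python
import random

def dropout_forward(activations, drop_prob=0.5, training=True, rng=random):
    """Apply dropout to one layer's hidden activations for one training case.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if not training:
        # Assumed test-time rule: keep every unit and scale by the keep
        # probability so expected activations match those seen in training.
        return [a * (1.0 - drop_prob) for a in activations]
    # During training, each feature detector is omitted independently
    # with probability drop_prob (one half in the abstract).
    return [0.0 if rng.random() < drop_prob else a for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [0.2, 1.5, 0.7, 0.9]
    print(dropout_forward(hidden, rng=rng))          # a random half is zeroed
    print(dropout_forward(hidden, training=False))   # all units kept, scaled
```

Because a fresh mask is drawn for every training case, no unit can rely on a particular set of other units being present, which is the co-adaptation argument the abstract makes.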


Posted Content
TL;DR: This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence.
Abstract: Many machine learning tasks can be expressed as the transformation, or transduction, of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech, to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However, RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation since finding the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for phoneme recognition are provided on the TIMIT speech corpus.

1,448 citations


Posted Content
TL;DR: The results show that the performance of CSO is promising on unimodal and multimodal benchmark functions with different search space dimension sizes, and its results are compared with state-of-the-art optimization methods.
Abstract: Designing a fast and efficient optimization method with local optima avoidance capability on a variety of optimization problems is still an open problem for many researchers. In this work, the concept of a new global optimization method with an open implementation area is introduced as a Curved Space Optimization (CSO) method, which is a simple probabilistic optimization method enhanced by concepts of general relativity theory. To address global optimization challenges such as performance and convergence, this new method is designed based on transformation of a random search space into a new search space based on concepts of space-time curvature in general relativity theory. In order to evaluate the performance of our proposed method, an implementation of CSO is deployed and its results are compared on benchmark functions with state-of-the-art optimization methods. The results show that the performance of CSO is promising on unimodal and multimodal benchmark functions with different search space dimension sizes.

116 citations


Posted Content
TL;DR: A comparative analysis of different mutation operators is presented, accompanied by an extended discussion that justifies the relevance of the genetic operators chosen for solving the TSP.
Abstract: The genetic algorithm includes some parameters that should be adjusted so as to get reliable results. Choosing a representation of the problem addressed, an initial population, a method of selection, a crossover operator, a mutation operator, the probabilities of crossover and mutation, and the insertion method creates a variant of the genetic algorithm. Our work contributes part of the answer to the following question for this combinatorial problem: which parameters should be selected so that the resulting genetic algorithm variant solves the Travelling Salesman Problem (TSP) efficiently? In this paper, we present a comparative analysis of different mutation operators, accompanied by an extended discussion that justifies the relevance of the genetic operators chosen for solving the TSP.

101 citations
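
To make the notion of a mutation operator on TSP tours concrete, here is a small sketch of two common permutation mutations. The abstract above does not name the operators it compares, so swap and inversion are chosen here purely as illustrative assumptions, and all function names are hypothetical.

```python
import random

def swap_mutation(tour, rng=random):
    """Exchange the cities at two distinct random positions (assumed operator)."""
    i, j = rng.sample(range(len(tour)), 2)
    child = list(tour)
    child[i], child[j] = child[j], child[i]
    return child

def inversion_mutation(tour, rng=random):
    """Reverse the tour segment between two random cut points (assumed operator)."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return list(tour[:i]) + list(reversed(tour[i:j + 1])) + list(tour[j + 1:])

def tour_length(tour, dist):
    """Length of the closed tour under a square distance matrix."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


if __name__ == "__main__":
    rng = random.Random(42)
    cities = list(range(6))
    # Toy distance matrix: cities placed on a line at integer coordinates.
    dist = [[abs(a - b) for b in cities] for a in cities]
    for op in (swap_mutation, inversion_mutation):
        child = op(cities, rng)
        print(op.__name__, child, tour_length(child, dist))
```

Both operators return a valid permutation, so no repair step is needed; a comparison like the one in the paper would plug each operator into the same genetic algorithm and compare tour lengths over repeated runs.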


Posted Content
TL;DR: A recent implementation of genetic algorithms is employed to study a range of standard benchmark functions for global optimization, finding that some are not very useful as challenging test functions, since they neither allow for a discrimination between different variants of genetic operators nor exhibit a dimensionality scaling resembling that of real-world problems.
Abstract: We have employed a recent implementation of genetic algorithms to study a range of standard benchmark functions for global optimization. It turns out that some of them are not very useful as challenging test functions, since they neither allow for a discrimination between different variants of genetic operators nor exhibit a dimensionality scaling resembling that of real-world problems, for example that of global structure optimization of atomic and molecular clusters. The latter properties seem to be simulated better by two other types of benchmark functions. One type is designed to be deceptive, exemplified here by Lunacek's function. The other type offers additional advantages of markedly increased complexity and of broad tunability in search space characteristics. For the latter type, we use an implementation based on randomly distributed Gaussians. We advocate the use of the latter types of test functions for algorithm development and benchmarking.

88 citations
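
A test function built from randomly distributed Gaussians, as mentioned in the abstract above, might look roughly like the following. The abstract gives no formula, so taking the maximum over bumps, the sampling ranges for centre, width and height, and the name `make_gaussian_landscape` are all assumptions made for illustration.

```python
import math
import random

def make_gaussian_landscape(n_gaussians, dim, seed=0):
    """Build a maximisation test function from randomly placed Gaussian bumps.

    Hypothetical construction: ranges and the max-combination are assumptions.
    """
    rng = random.Random(seed)
    bumps = [
        (
            [rng.uniform(0.0, 1.0) for _ in range(dim)],  # centre in the unit cube
            rng.uniform(0.05, 0.3),                       # width (standard deviation)
            rng.uniform(0.1, 1.0),                        # height
        )
        for _ in range(n_gaussians)
    ]

    def fitness(x):
        # The landscape value is the tallest bump contribution at x.
        best = 0.0
        for centre, width, height in bumps:
            sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
            best = max(best, height * math.exp(-sq_dist / (2.0 * width ** 2)))
        return best

    return fitness, bumps


if __name__ == "__main__":
    f, bumps = make_gaussian_landscape(n_gaussians=10, dim=2, seed=7)
    print(f([0.5, 0.5]))
```

The number of bumps, their widths and the dimension are the tuning knobs; varying them is one way to obtain the broad tunability in search space characteristics that the abstract refers to.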


Posted Content
TL;DR: This paper picks up Hajek's line of thought to prove a drift theorem that is very easy to use in evolutionary computation and shows how previous analyses involving the complicated theorem can be redone in a much simpler and clearer way.
Abstract: This erratum points out an error in the simplified drift theorem (SDT) [Algorithmica 59(3), 369-386, 2011]. It is also shown that a minor modification of one of its conditions is sufficient to establish a valid result. In many respects, the new theorem is more general than before. We no longer assume a Markov process nor a finite search space. Furthermore, the proof of the theorem is more compact than the previous ones. Finally, previous applications of the SDT are revisited. It turns out that all of these either meet the modified condition directly or by means of few additional arguments.

82 citations


Posted Content
TL;DR: In this paper, the authors propose a novel mechanism for adapting surrogate-assisted population-based algorithms, which adjusts online the lifelength of the current surrogate model and the surrogate hyper-parameters.
Abstract: This paper presents a novel mechanism to adapt surrogate-assisted population-based algorithms. This mechanism is applied to ACM-ES, a recently proposed surrogate-assisted variant of CMA-ES. The resulting algorithm, saACM-ES, adjusts online the lifelength of the current surrogate model (the number of CMA-ES generations before learning a new surrogate) and the surrogate hyper-parameters. Both heuristics significantly improve the quality of the surrogate model, yielding a significant speed-up of saACM-ES compared to the ACM-ES and CMA-ES baselines. The empirical validation of saACM-ES on the BBOB-2012 noiseless testbed demonstrates the efficiency and the scalability w.r.t. the problem dimension and the population size of the proposed approach, which reaches new best results on some of the benchmark problems.

73 citations


Proceedings Article
TL;DR: In this article, a memetic ABC (MABC) algorithm was developed that is hybridized with two local search heuristics: the Nelder-Mead algorithm (NMA) and the random walk with direction exploitation (RWDE).
Abstract: Memetic computation (MC) has emerged recently as a new paradigm of efficient algorithms for solving the hardest optimization problems. On the other hand, artificial bee colony (ABC) algorithms demonstrate good performance when solving continuous and combinatorial optimization problems. This study tries to bring these technologies under the same roof. As a result, a memetic ABC (MABC) algorithm has been developed that is hybridized with two local search heuristics: the Nelder-Mead algorithm (NMA) and the random walk with direction exploitation (RWDE). The former is oriented more towards exploration, while the latter more towards exploitation of the search space. A stochastic adaptation rule was employed in order to control the balance between exploration and exploitation. This MABC algorithm was applied to a Special suite on Large Scale Continuous Global Optimization at the 2012 IEEE Congress on Evolutionary Computation. The results obtained by the MABC are comparable with the results of DECC-G, DECC-G*, and MLCC.

62 citations


Posted Content
TL;DR: In this article, the authors explore the theoretical basis of covariance matrix adaptation evolution strategy (CMA-ES) from the information geometry viewpoint and derive the range of learning rates such that a step in the direction of the exact natural gradient improves the parameters in the expected fitness.
Abstract: This paper explores the theoretical basis of the covariance matrix adaptation evolution strategy (CMA-ES) from the information geometry viewpoint. To establish a theoretical foundation for the CMA-ES, we focus on a geometric structure of a Riemannian manifold of probability distributions equipped with the Fisher metric. We define a function on the manifold which is the expectation of fitness over the sampling distribution, and regard the goal of update of the parameters of sampling distribution in the CMA-ES as maximization of the expected fitness. We investigate the steepest ascent learning for the expected fitness maximization, where the steepest ascent direction is given by the natural gradient, which is the product of the inverse of the Fisher information matrix and the conventional gradient of the function. Our first result is that we can obtain under some types of parameterization of multivariate normal distribution the natural gradient of the expected fitness without the need for inversion of the Fisher information matrix. We find that the update of the distribution parameters in the CMA-ES is the same as natural gradient learning for expected fitness maximization. Our second result is that we derive the range of learning rates such that a step in the direction of the exact natural gradient improves the parameters in the expected fitness. We see from the close relation between the CMA-ES and natural gradient learning that the default setting of learning rates in the CMA-ES seems suitable in terms of monotone improvement in expected fitness. Then, we discuss the relation to the expectation-maximization framework and provide an information geometric interpretation of the CMA-ES.

47 citations
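
The quantities named in the abstract above can be written out as follows. The symbols are notation chosen here for illustration and are not taken from the paper: $p_\theta$ is the sampling distribution, $f$ the fitness, $F(\theta)$ the Fisher information matrix, and $\eta$ a learning rate.

```latex
\[
\begin{aligned}
J(\theta) &= \mathbb{E}_{x \sim p_\theta}\!\left[ f(x) \right]
  && \text{(expected fitness)} \\
F(\theta) &= \mathbb{E}_{x \sim p_\theta}\!\left[
  \nabla_\theta \ln p_\theta(x)\, \nabla_\theta \ln p_\theta(x)^{\top} \right]
  && \text{(Fisher information matrix)} \\
\tilde{\nabla} J(\theta) &= F(\theta)^{-1}\, \nabla_\theta J(\theta)
  && \text{(natural gradient)} \\
\theta^{(t+1)} &= \theta^{(t)} + \eta\, \tilde{\nabla} J\big(\theta^{(t)}\big)
  && \text{(steepest ascent step)}
\end{aligned}
\]
```

In this notation, the abstract's first result says that for suitable parameterizations of the multivariate normal distribution the third line can be evaluated without explicitly inverting $F(\theta)$, and its second result bounds the values of $\eta$ for which the fourth line improves $J$.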


Posted Content
TL;DR: In this paper, an abstract model is presented that describes swarm performance depending on swarm density based on the dichotomy between cooperation and interference, and the effects of a positive feedback probability that increases over time in a decision making system are understood with the help of a parameter that controls the feedback based on the swarm's current consensus.
Abstract: Methods of general applicability are searched for in swarm intelligence with the aim of gaining new insights about natural swarms and developing design methodologies for artificial swarms. An ideal solution could be a `swarm calculus' that allows one to calculate key features of swarms such as expected swarm performance and robustness based on only a few parameters. To work towards this ideal, one needs to find methods and models with high degrees of generality. In this paper, we report two models that might be examples of exceptional generality. First, an abstract model is presented that describes swarm performance depending on swarm density based on the dichotomy between cooperation and interference. Typical swarm experiments are given as examples to show how the model fits several different results. Second, we give an abstract model of collective decision making that is inspired by urn models. The effects of a positive feedback probability that increases over time in a decision making system are understood with the help of a parameter that controls the feedback based on the swarm's current consensus. Several applicable methods, such as the description as a Markov process, calculation of splitting probabilities, mean first passage times, and measurements of positive feedback, are discussed and applications to artificial and natural swarms are reported.

46 citations


Journal Article
TL;DR: The hardness of fitness functions is an important research topic in the field of evolutionary computation; its study can help in understanding the ability of evolutionary algorithms and provide a guideline for the design of benchmarks.
Abstract: The hardness of fitness functions is an important research topic in the field of evolutionary computation. In theory, the study can help in understanding the ability of evolutionary algorithms. In practice, the study may provide a guideline for the design of benchmarks. The aim of this paper is to answer the following research questions: Given a fitness function class, which functions are the easiest with respect to an evolutionary algorithm? Which are the hardest? How are these functions constructed? The paper provides theoretical answers to these questions. The easiest and hardest fitness functions are constructed for an elitist (1+1) evolutionary algorithm to maximise a class of fitness functions with the same optima. It is demonstrated that the unimodal functions are the easiest and deceptive functions are the hardest in terms of the time-fitness landscape. The paper also reveals that the easiest fitness function for one algorithm may become the hardest for another algorithm, and vice versa.
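
For readers unfamiliar with the elitist (1+1) evolutionary algorithm analysed in the abstract above, a minimal sketch on bit strings follows. The bitwise mutation rate of 1/n, the OneMax example function, and the `target` stopping parameter are assumptions for illustration; the abstract itself does not specify them.

```python
import random

def one_max(bits):
    """Unimodal example fitness: the number of ones in the string."""
    return sum(bits)

def one_plus_one_ea(fitness, n, target=None, max_evals=10_000, seed=0):
    """Elitist (1+1) EA maximising `fitness` over bit strings of length n.

    Hypothetical sketch: one parent, one child per generation, and the
    child replaces the parent only if it is at least as fit.
    """
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n)]
    parent_fit = fitness(parent)
    evals = 1
    while evals < max_evals and (target is None or parent_fit < target):
        # Assumed mutation: flip each bit independently with probability 1/n.
        child = [(b ^ 1) if rng.random() < 1.0 / n else b for b in parent]
        child_fit = fitness(child)
        evals += 1
        # Elitist selection: never accept a strictly worse child.
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
    return parent, parent_fit, evals


if __name__ == "__main__":
    best, fit, evals = one_plus_one_ea(one_max, n=30, target=30, seed=1)
    print(fit, evals)
```

The number of evaluations returned is the kind of runtime measure with respect to which a fitness function is called easy or hard for this algorithm.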

Journal Article
TL;DR: A novel quantum genetic algorithm is introduced that has a quantum crossover procedure performing crossovers among all chromosomes in parallel for each generation and a quadratic speedup is achieved over its classical counterpart in the dominant factor of the run time to handle each generation.
Abstract: In the context of evolutionary quantum computing in the literal meaning, a quantum crossover operation has not been introduced so far. Here, we introduce a novel quantum genetic algorithm which has a quantum crossover procedure performing crossovers among all chromosomes in parallel for each generation. A complexity analysis shows that a quadratic speedup is achieved over its classical counterpart in the dominant factor of the run time to handle each generation.

Posted Content
TL;DR: To apply AOPS to (possibly recurrent) neural networks (NNs) and to efficiently teach a SLIM NN to solve many tasks, each connection keeps a list of tasks it is used for, which may be efficiently updated during training.
Abstract: Self-delimiting (SLIM) programs are a central concept of theoretical computer science, particularly algorithmic information & probability theory, and asymptotically optimal program search (AOPS). To apply AOPS to (possibly recurrent) neural networks (NNs), I introduce SLIM NNs. Neurons of a typical SLIM NN have threshold activation functions. During a computational episode, activations are spreading from input neurons through the SLIM NN until the computation activates a special halt neuron. Weights of the NN's used connections define its program. Halting programs form a prefix code. The reset of the initial NN state does not cost more than the latest program execution. Since prefixes of SLIM programs influence their suffixes (weight changes occurring early in an episode influence which weights are considered later), SLIM NN learning algorithms (LAs) should execute weight changes online during activation spreading. This can be achieved by applying AOPS to growing SLIM NNs. To efficiently teach a SLIM NN to solve many tasks, such as correctly classifying many different patterns, or solving many different robot control tasks, each connection keeps a list of tasks it is used for. The lists may be efficiently updated during training. To evaluate the overall effect of currently tested weight changes, a SLIM NN LA needs to re-test performance only on the efficiently computable union of tasks potentially affected by the current weight changes. Future SLIM NNs will be implemented on 3-dimensional brain-like multi-processor hardware. Their LAs will minimize task-specific total wire length of used connections, to encourage efficient solutions of subtasks by subsets of neurons that are physically close. The novel class of SLIM NN LAs is currently being probed in ongoing experiments to be reported in separate papers.

Posted Content
TL;DR: A new methodology is presented to study the structure of the configuration spaces of hard combinatorial problems; it consists in building the network that has as nodes the locally optimal configurations and as edges the weighted oriented transitions between their basins of attraction.
Abstract: In this work we present a new methodology to study the structure of the configuration spaces of hard combinatorial problems. It consists in building the network that has as nodes the locally optimal configurations and as edges the weighted oriented transitions between their basins of attraction. We apply the approach to the detection of communities in the optima networks produced by two different classes of instances of a hard combinatorial optimization problem: the quadratic assignment problem (QAP). We provide evidence indicating that the two problem instance classes give rise to very different configuration spaces. For the so-called real-like class, the networks possess a clear modular structure, while the optima networks belonging to the class of random uniform instances are less well partitionable into clusters. This is convincingly supported by using several statistical tests. Finally, we briefly discuss the consequences of the findings for heuristically searching the corresponding problem spaces.

Proceedings Article
TL;DR: The Echo State Queueing Network (ESQN) is a new Reservoir Computing model whose reservoir design uses ideas from the Random Neural Network (RandNN), a neural network designed using queueing theory.
Abstract: In the last decade, a new computational paradigm was introduced in the field of Machine Learning, under the name of Reservoir Computing (RC). RC models are neural networks with a recurrent part (the reservoir) that does not participate in the learning process, and the rest of the system where no recurrence (no neural circuit) occurs. This approach has grown rapidly due to its success in solving learning tasks and other computational applications. Some success was also observed with another recently proposed neural network designed using Queueing Theory, the Random Neural Network (RandNN). Both approaches have good properties and identified drawbacks. In this paper, we propose a new RC model called Echo State Queueing Network (ESQN), where we use ideas coming from RandNNs for the design of the reservoir. ESQNs are ESNs in which the reservoir has new dynamics inspired by recurrent RandNNs. The paper positions ESQNs in the global Machine Learning area, and provides examples of their use and performance. We show on widely used benchmarks that ESQNs are very accurate tools, and we illustrate how they compare with standard ESNs.

Book Chapter
TL;DR: In this article, the permutation flowshop problem is studied: a deep landscape analysis focused on the neutrality property is conducted, and proposals on how to use this neutrality to guide the search efficiently are given.
Abstract: Efficiently solving complex problems using metaheuristics, and in particular local searches, requires incorporating knowledge about the problem to solve. In this paper, the permutation flowshop problem is studied. It is well known that in such problems, several solutions may have the same fitness value. As this neutrality property is an important one, it should be taken into account during the design of optimization methods. Then, in the context of the permutation flowshop, a deep landscape analysis focused on the neutrality property is conducted, and proposals on how to use this neutrality to guide the search efficiently are given.

Book Chapter
TL;DR: In this article, an approach based on genetic programming and symbolic regression is proposed to identify variable interactions in large datasets, where multiple symbolic regression runs are executed for each variable of the dataset to find potentially interesting models.
Abstract: Macro-economic models describe the dynamics of economic quantities. The estimations and forecasts produced by such models play a substantial role for financial and political decisions. In this contribution we describe an approach based on genetic programming and symbolic regression to identify variable interactions in large datasets. In the proposed approach multiple symbolic regression runs are executed for each variable of the dataset to find potentially interesting models. The result is a variable interaction network that describes which variables are most relevant for the approximation of each variable of the dataset. This approach is applied to a macro-economic dataset with monthly observations of important economic indicators in order to identify potentially interesting dependencies of these indicators. The resulting interaction network of macro-economic indicators is briefly discussed and two of the identified models are presented in detail. The two models approximate the help wanted index and the CPI inflation in the US.

Posted Content
TL;DR: Results suggest that the proposed Mesh Learning approach can provide an effective algorithm for pattern analysis of brain activity during cognitive processing.
Abstract: A relatively recent advance in cognitive neuroscience has been multi-voxel pattern analysis (MVPA), which enables researchers to decode brain states and/or the type of information represented in the brain during a cognitive operation. MVPA methods utilize machine learning algorithms to distinguish among types of information or cognitive states represented in the brain, based on distributed patterns of neural activity. In the current investigation, we propose a new approach for representation of neural data for pattern analysis, namely a Mesh Learning Model. In this approach, at each time instant, a star mesh is formed around each voxel, such that the voxel corresponding to the center node is surrounded by its p-nearest neighbors. The arc weights of each mesh are estimated from the voxel intensity values by least squares method. The estimated arc weights of all the meshes, called Mesh Arc Descriptors (MADs), are then used to train a classifier, such as Neural Networks, k-Nearest Neighbor, Naïve Bayes and Support Vector Machines. The proposed Mesh Model was tested on neuroimaging data acquired via functional magnetic resonance imaging (fMRI) during a recognition memory experiment using categorized word lists, employing a previously established experimental paradigm (Öztekin & Badre, 2011). Results suggest that the proposed Mesh Learning approach can provide an effective algorithm for pattern analysis of brain activity during cognitive processing.

Posted Content
TL;DR: This paper considers generalizations of ORDER and MAJORITY and presents a computational complexity analysis of (1+1) GP using multi-criteria fitness functions that take into account the original objective and the complexity of a syntax tree as a secondary measure.
Abstract: The computational complexity analysis of genetic programming (GP) has been started recently by analyzing simple (1+1) GP algorithms for the problems ORDER and MAJORITY. In this paper, we study how taking the complexity as an additional criterion influences the runtime behavior. We consider generalizations of ORDER and MAJORITY and present a computational complexity analysis of (1+1) GP using multi-criteria fitness functions that take into account the original objective and the complexity of a syntax tree as a secondary measure. Furthermore, we study the expected time until population-based multi-objective genetic programming algorithms have computed the Pareto front when taking the complexity of a syntax tree as an equally important objective.

Posted Content
TL;DR: It is proved that choosing parents that guarantee feasible offspring results in an even better optimization time of $O(n^{3}\log n)$, and both results show that even simple adjustments of the recombination operator can asymptotically improve the runtime of evolutionary algorithms.
Abstract: The all-pairs shortest path problem is the first non-artificial problem for which it was shown that adding crossover can significantly speed up a mutation-only evolutionary algorithm. Recently, the analysis of this algorithm was refined and it was shown to have an expected optimization time (w.r.t. the number of fitness evaluations) of $\Theta(n^{3.25}(\log n)^{0.25})$. In contrast to this simple algorithm, evolutionary algorithms used in practice usually employ refined recombination strategies in order to avoid the creation of infeasible offspring. We study extensions of the basic algorithm by two such concepts which are central in recombination, namely repair mechanisms and parent selection. We show that repairing infeasible offspring leads to an improved expected optimization time of $O(n^{3.2}(\log n)^{0.2})$. As a second part of our study we prove that choosing parents that guarantee feasible offspring results in an even better optimization time of $O(n^{3}\log n)$. Both results show that even simple adjustments of the recombination operator can asymptotically improve the runtime of evolutionary algorithms.

Journal Article
TL;DR: In this article, the authors survey and classify the existing literature on the techniques used by the genetic programming research community to deal with bloat and over-fitting, and point out the limitations of these techniques, if any.
Abstract: In the field of empirical modeling using Genetic Programming (GP), it is important to evolve solutions with good generalization ability. The generalization ability of GP solutions is affected by two important issues: bloat and over-fitting. We surveyed and classified the existing literature related to the different techniques used by the GP research community to deal with these issues. We also point out the limitations of these techniques, if any. Moreover, the classification of different bloat control approaches and measures for bloat and over-fitting are also discussed. We believe that this work will be useful to GP practitioners in the following ways: (i) to better understand the concepts of generalization in GP, (ii) to compare existing bloat and over-fitting control techniques, and (iii) to select an appropriate approach to improve the generalization ability of GP-evolved solutions.

Posted Content
TL;DR: The ability of the Functional Link Neural Network (FLNN) to overcome the structural complexity of the MLP by using a single-layer architecture is presented, and an Artificial Bee Colony (ABC) optimization for training the FLNN is proposed.
Abstract: Artificial Neural Networks have emerged as an important tool for classification and have been widely used to classify non-linearly separable patterns. The most popular artificial neural network model is the Multilayer Perceptron (MLP), as it is able to perform classification tasks with significant success. However, the complexity of the MLP structure, together with problems such as local minima trapping, overfitting and weight interference, has made neural network training difficult. Thus, an easy way to avoid these problems is to remove the hidden layers. This paper presents the ability of the Functional Link Neural Network (FLNN) to overcome the structural complexity of the MLP by using a single-layer architecture, and proposes an Artificial Bee Colony (ABC) optimization for training the FLNN. The proposed technique is expected to provide a better learning scheme for a classifier in order to obtain more accurate classification results.

Posted Content
TL;DR: The copulaedas package for R provides a platform where EDAs based on copulas can be implemented and studied; newly developed EDAs can be easily integrated into the package by extending an S4 class with generic functions for their main components.
Abstract: The use of copula-based models in EDAs (estimation of distribution algorithms) is currently an active area of research. In this context, the copulaedas package for R provides a platform where EDAs based on copulas can be implemented and studied. The package offers complete implementations of various EDAs based on copulas and vines, a group of well-known optimization problems, and utility functions to study the performance of the algorithms. Newly developed EDAs can be easily integrated into the package by extending an S4 class with generic functions for their main components. This paper presents copulaedas by providing an overview of EDAs based on copulas, a description of the implementation of the package, and an illustration of its use through examples. The examples include running the EDAs defined in the package, implementing new algorithms, and performing an empirical study to compare the behavior of different algorithms on benchmark functions and a real-world problem.

Posted Content
TL;DR: A technique for the automated generation of cross-domain analogies based on a novel evolutionary algorithm is introduced and, using a fitness function based on Gentner's structure mapping theory of analogies, the feasibility of spontaneously generating semantic networks that are analogous to a given base network is demonstrated.
Abstract: Analogy plays an important role in creativity, and is extensively used in science as well as art. In this paper we introduce a technique for the automated generation of cross-domain analogies based on a novel evolutionary algorithm (EA). Unlike existing work in computational analogy-making restricted to creating analogies between two given cases, our approach, for a given case, is capable of creating an analogy along with the novel analogous case itself. Our algorithm is based on the concept of "memes", which are units of culture, or knowledge, undergoing variation and selection under a fitness measure, and represents evolving pieces of knowledge as semantic networks. Using a fitness function based on Gentner's structure mapping theory of analogies, we demonstrate the feasibility of spontaneously generating semantic networks that are analogous to a given base network.

Posted Content
TL;DR: The continuous time trajectories of the IGO are a deterministic continuous time model of the underlying evolution strategy in the limit of population size going to infinity and change rates going to zero; on functions that are the composite of a monotone and a convex-quadratic function, global convergence of the solution of the ODE towards the global optimum is proved.
Abstract: The Information-Geometric Optimization (IGO) has been introduced as a unified framework for stochastic search algorithms. Given a parametrized family of probability distributions on the search space, the IGO turns an arbitrary optimization problem on the search space into an optimization problem on the parameter space of the probability distribution family and defines a natural gradient ascent on this space. From the natural gradients defined over the entire parameter space we obtain continuous time trajectories which are the solutions of an ordinary differential equation (ODE). Via discretization, the IGO naturally defines an iterated gradient ascent algorithm. Depending on the chosen distribution family, the IGO recovers several known algorithms such as the pure rank-$\mu$ update CMA-ES. Consequently, the continuous time IGO-trajectory can be viewed as an idealization of the original algorithm. In this paper we study the continuous time trajectories of the IGO given the family of isotropic Gaussian distributions. These trajectories are a deterministic continuous time model of the underlying evolution strategy in the limit for population size to infinity and change rates to zero. On functions that are the composite of a monotone and a convex-quadratic function, we prove the global convergence of the solution of the ODE towards the global optimum. We extend this result to composites of monotone and twice continuously differentiable functions and prove local convergence towards local optima.

Posted Content
TL;DR: A simple regularization scheme is introduced that encourages the weight vectors associated with each hidden unit to have similar norms and can be easily combined with standard stochastic maximum likelihood to yield an effective training strategy for the simultaneous training of all layers of the deep Boltzmann machine.
Abstract: The deep Boltzmann machine (DBM) has been an important development in the quest for powerful "deep" probabilistic models. To date, simultaneous or joint training of all layers of the DBM has been largely unsuccessful with existing training methods. We introduce a simple regularization scheme that encourages the weight vectors associated with each hidden unit to have similar norms. We demonstrate that this regularization can be easily combined with standard stochastic maximum likelihood to yield an effective training strategy for the simultaneous training of all layers of the deep Boltzmann machine.
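The abstract does not state the exact form of the regularizer, so the following is only a hypothetical sketch of one penalty that would encourage similar norms: the mean squared deviation of each hidden unit's weight-vector norm from the average norm. The names `column_norms` and `norm_spread_penalty`, and the convention that each column of `W` holds the weights of one hidden unit, are assumptions for illustration.

```python
import math


def column_norms(W):
    """Euclidean norm of each column of W (assumed: one column per hidden unit)."""
    n_hidden = len(W[0])
    return [math.sqrt(sum(row[j] ** 2 for row in W)) for j in range(n_hidden)]


def norm_spread_penalty(W):
    """Mean squared deviation of the per-unit norms from their average.

    The value is zero when all hidden units have equal weight norms and grows
    as the norms diverge. This is a hypothetical penalty, not the paper's.
    """
    norms = column_norms(W)
    mean_norm = sum(norms) / len(norms)
    return sum((n - mean_norm) ** 2 for n in norms) / len(norms)


if __name__ == "__main__":
    balanced = [[1.0, 0.0], [0.0, 1.0]]    # both units have norm 1
    unbalanced = [[3.0, 0.0], [4.0, 0.0]]  # norms 5 and 0
    print(norm_spread_penalty(balanced), norm_spread_penalty(unbalanced))
```

In a training loop, a term like this would be scaled by a coefficient and added to the objective optimized by stochastic maximum likelihood; the coefficient and its schedule are not given in the abstract.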

Posted Content
TL;DR: It is shown that the power of higher arity operators is much stronger than what the previous O(n/k) bound by Doerr et al. suggests, and the key to this result is an encoding strategy, which might be of independent interest.
Abstract: We show that for all $1

Journal Article
TL;DR: In this paper, the comparative performance of two metaheuristic algorithms, the Clonal Selection Algorithm (CLONALG, a subset of Artificial Immune Systems) and Genetic Algorithms, is examined on benchmark functions; which of the two performs better depends on the type of function.
Abstract: Two metaheuristic algorithms, namely Artificial Immune Systems (AIS) and Genetic Algorithms, are classified as computational systems inspired by theoretical immunology and genetic mechanisms. In this work we examine the comparative performance of the two algorithms. A special selection algorithm, the Clonal Selection Algorithm (CLONALG), which is a subset of Artificial Immune Systems, and Genetic Algorithms are tested on certain benchmark functions. It is shown that, depending on the type of function, either the Clonal Selection Algorithm or the Genetic Algorithm performs better.

Posted Content
TL;DR: A layer-wise training procedure admitting a performance guarantee compared to the global optimum is proposed, based on an optimistic proxy of future performance, the best latent marginal; tested against a state-of-the-art method (stacked RBMs), it is found to improve performance.
Abstract: When using deep, multi-layered architectures to build generative models of data, it is difficult to train all layers at once. We propose a layer-wise training procedure admitting a performance guarantee compared to the global optimum. It is based on an optimistic proxy of future performance, the best latent marginal. We interpret auto-encoders in this setting as generative models, by showing that they train a lower bound of this criterion. We test the new learning procedure against a state of the art method (stacked RBMs), and find it to improve performance. Both theory and experiments highlight the importance, when training deep architectures, of using an inference model (from data to hidden variables) richer than the generative model (from hidden variables to data).

Journal Article
TL;DR: Artificial Neural Network has successfully been used for exchange rate forecasting and the effects of the number of inputs and hidden nodes and the size of the training sample on the in-sample and out-of-sample performance are examined.
Abstract: A large and growing part of the workforce is originally from India. India, with one of the largest populations in the world, has a lot to offer in terms of jobs. The sheer number of IT workers makes them a formidable travelling force as well, easily picking up employment in English-speaking countries. Since the beginning of the economic crisis in September 2008, many Indians have returned to their homeland, and this has had a substantial impact on the Indian Rupee (INR) relative to the US Dollar (USD). Numerical knowledge-based techniques for forecasting have proved highly successful in recent times. The purpose of this paper is to examine the effects of several important neural network factors on model fitting and forecasting behaviour. In this paper, an Artificial Neural Network is successfully used for exchange rate forecasting. The paper examines the effects of the number of inputs and hidden nodes and the size of the training sample on in-sample and out-of-sample performance. The Indian Rupee (INR) / US Dollar (USD) exchange rate is used for detailed examination. The number of input nodes has a greater impact on performance than the number of hidden nodes, while a large number of observations does reduce forecast errors.