
Showing papers on "Artificial neural network" published in 2011


Proceedings Article
14 Jun 2011
TL;DR: This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability.
Abstract: While logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks. This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability.
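
A minimal sketch of the contrast discussed above: the same tiny two-layer network evaluated with the rectifier max(0, x) and with tanh as the hidden activation. It assumes numpy; the layer sizes and random data are illustrative, not the paper's setup.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)  # hard non-linearity: zero below 0, identity above

def forward(x, W1, b1, W2, b2, act):
    h = act(x @ W1 + b1)       # hidden representation (sparse when act is relu)
    return h @ W2 + b2

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                       # 4 samples, 8 features (assumed sizes)
W1, b1 = 0.1 * rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = 0.1 * rng.normal(size=(16, 1)), np.zeros(1)

print(forward(x, W1, b1, W2, b2, relu))           # rectifier network output
print(forward(x, W1, b1, W2, b2, np.tanh))        # tanh network output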

6,790 citations


Journal ArticleDOI
TL;DR: This work proposes a novel dimensionality reduction framework for reducing the distance between domains in a latent space for domain adaptation, with both unsupervised and semisupervised feature extraction approaches that can dramatically reduce the distance between domain distributions by projecting data onto the learned transfer components.
Abstract: Domain adaptation allows knowledge from a source domain to be transferred to a different but related target domain. Intuitively, discovering a good feature representation across domains is crucial. In this paper, we first propose to find such a representation through a new learning method, transfer component analysis (TCA), for domain adaptation. TCA tries to learn some transfer components across domains in a reproducing kernel Hilbert space using maximum mean discrepancy. In the subspace spanned by these transfer components, data properties are preserved and data distributions in different domains are close to each other. As a result, with the new representations in this subspace, we can apply standard machine learning methods to train classifiers or regression models in the source domain for use in the target domain. Furthermore, in order to uncover the knowledge hidden in the relations between the data labels from the source and target domains, we extend TCA in a semisupervised learning setting, which encodes label information into transfer components learning. We call this extension semisupervised TCA. The main contribution of our work is that we propose a novel dimensionality reduction framework for reducing the distance between domains in a latent space for domain adaptation. We propose both unsupervised and semisupervised feature extraction approaches, which can dramatically reduce the distance between domain distributions by projecting data onto the learned transfer components. Finally, our approach can handle large datasets and naturally leads to out-of-sample generalization. The effectiveness and efficiency of our approach are verified by experiments on five toy datasets and two real-world applications: cross-domain indoor WiFi localization and cross-domain text classification.
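
As a hedged illustration of the quantity TCA minimizes, the sketch below computes an empirical maximum mean discrepancy between source and target samples. It assumes numpy; the RBF kernel, bandwidth, and synthetic domains are illustrative choices rather than the paper's setup.

import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    return np.exp(-gamma * d2)

def mmd2(Xs, Xt, gamma=1.0):
    # Empirical (biased) MMD^2 between two samples in an RKHS
    return (rbf_kernel(Xs, Xs, gamma).mean()
            + rbf_kernel(Xt, Xt, gamma).mean()
            - 2.0 * rbf_kernel(Xs, Xt, gamma).mean())

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(100, 5))   # source-domain samples
Xt = rng.normal(0.5, 1.0, size=(100, 5))   # shifted target-domain samples
print(mmd2(Xs, Xt))                        # larger values indicate more domain mismatch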

3,195 citations


Proceedings Article
12 Dec 2011
TL;DR: This work contributes novel techniques for making response surface models P(y|x) in which many elements of hyper-parameter assignment (x) are known to be irrelevant given particular values of other elements.
Abstract: Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyper-parameter optimization has been the job of humans because they can be very efficient in regimes where only a few trials are possible. Presently, computer clusters and GPU processors make it possible to run more trials and we show that algorithmic approaches can find better results. We present hyper-parameter optimization results on tasks of training neural networks and deep belief networks (DBNs). We optimize hyper-parameters using random search and two new greedy sequential methods based on the expected improvement criterion. Random search has been shown to be sufficiently efficient for learning neural networks for several datasets, but we show it is unreliable for training DBNs. The sequential algorithms are applied to the most difficult DBN learning problems from [1] and find significantly better results than the best previously reported. This work contributes novel techniques for making response surface models P(y|x) in which many elements of hyper-parameter assignment (x) are known to be irrelevant given particular values of other elements.
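
A hedged sketch of plain random search over a small hyper-parameter space, the baseline the paper compares its sequential methods against. The scikit-learn digits data and MLPClassifier are illustrative stand-ins for the networks and DBNs tuned in the paper.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)

best_params, best_score = None, -np.inf
for _ in range(10):                                       # 10 random trials (assumed budget)
    params = {
        "hidden_layer_sizes": (int(rng.integers(16, 129)),),
        "alpha": float(10 ** rng.uniform(-5, -1)),        # log-uniform L2 penalty
        "learning_rate_init": float(10 ** rng.uniform(-4, -1)),
    }
    score = cross_val_score(MLPClassifier(max_iter=300, **params), X, y, cv=3).mean()
    if score > best_score:
        best_params, best_score = params, score

print(best_params, best_score)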

3,088 citations


BookDOI
04 Aug 2011
TL;DR: This book discusses the challenges of dynamic programming, the three curses of dimensionality, approximate dynamic programming (ADP) methods including direct ADP for online applications, and experimental comparisons of stepsize formulas.
Abstract: Preface. Acknowledgments.
1. The challenges of dynamic programming. 1.1 A dynamic programming example: a shortest path problem. 1.2 The three curses of dimensionality. 1.3 Some real applications. 1.4 Problem classes. 1.5 The many dialects of dynamic programming. 1.6 What is new in this book? 1.7 Bibliographic notes.
2. Some illustrative models. 2.1 Deterministic problems. 2.2 Stochastic problems. 2.3 Information acquisition problems. 2.4 A simple modeling framework for dynamic programs. 2.5 Bibliographic notes. Problems.
3. Introduction to Markov decision processes. 3.1 The optimality equations. 3.2 Finite horizon problems. 3.3 Infinite horizon problems. 3.4 Value iteration. 3.5 Policy iteration. 3.6 Hybrid value-policy iteration. 3.7 The linear programming method for dynamic programs. 3.8 Monotone policies. 3.9 Why does it work? 3.10 Bibliographic notes. Problems.
4. Introduction to approximate dynamic programming. 4.1 The three curses of dimensionality (revisited). 4.2 The basic idea. 4.3 Sampling random variables. 4.4 ADP using the post-decision state variable. 4.5 Low-dimensional representations of value functions. 4.6 So just what is approximate dynamic programming? 4.7 Experimental issues. 4.8 Dynamic programming with missing or incomplete models. 4.9 Relationship to reinforcement learning. 4.10 But does it work? 4.11 Bibliographic notes. Problems.
5. Modeling dynamic programs. 5.1 Notational style. 5.2 Modeling time. 5.3 Modeling resources. 5.4 The states of our system. 5.5 Modeling decisions. 5.6 The exogenous information process. 5.7 The transition function. 5.8 The contribution function. 5.9 The objective function. 5.10 A measure-theoretic view of information. 5.11 Bibliographic notes. Problems.
6. Stochastic approximation methods. 6.1 A stochastic gradient algorithm. 6.2 Some stepsize recipes. 6.3 Stochastic stepsizes. 6.4 Computing bias and variance. 6.5 Optimal stepsizes. 6.6 Some experimental comparisons of stepsize formulas. 6.7 Convergence. 6.8 Why does it work? 6.9 Bibliographic notes. Problems.
7. Approximating value functions. 7.1 Approximation using aggregation. 7.2 Approximation methods using regression models. 7.3 Recursive methods for regression models. 7.4 Neural networks. 7.5 Batch processes. 7.6 Why does it work? 7.7 Bibliographic notes. Problems.
8. ADP for finite horizon problems. 8.1 Strategies for finite horizon problems. 8.2 Q-learning. 8.3 Temporal difference learning. 8.4 Policy iteration. 8.5 Monte Carlo value and policy iteration. 8.6 The actor-critic paradigm. 8.7 Bias in value function estimation. 8.8 State sampling strategies. 8.9 Starting and stopping. 8.10 A taxonomy of approximate dynamic programming strategies. 8.11 Why does it work? 8.12 Bibliographic notes. Problems.
9. Infinite horizon problems. 9.1 From finite to infinite horizon. 9.2 Algorithmic strategies. 9.3 Stepsizes for infinite horizon problems. 9.4 Error measures. 9.5 Direct ADP for online applications. 9.6 Finite horizon models for steady state applications. 9.7 Why does it work? 9.8 Bibliographic notes. Problems.
10. Exploration vs. exploitation. 10.1 A learning exercise: the nomadic trucker. 10.2 Learning strategies. 10.3 A simple information acquisition problem. 10.4 Gittins indices and the information acquisition problem. 10.5 Variations. 10.6 The knowledge gradient algorithm. 10.7 Information acquisition in dynamic programming. 10.8 Bibliographic notes. Problems.
11. Value function approximations for special functions. 11.1 Value functions versus gradients. 11.2 Linear approximations. 11.3 Piecewise linear approximations. 11.4 The SHAPE algorithm. 11.5 Regression methods. 11.6 Cutting planes. 11.7 Why does it work? 11.8 Bibliographic notes. Problems.
12. Dynamic resource allocation. 12.1 An asset acquisition problem. 12.2 The blood management problem. 12.3 A portfolio optimization problem. 12.4 A general resource allocation problem. 12.5 A fleet management problem. 12.6 A driver management problem. 12.7 Bibliographic references. Problems.
13. Implementation challenges. 13.1 Will ADP work for your problem? 13.2 Designing an ADP algorithm for complex problems. 13.3 Debugging an ADP algorithm. 13.4 Convergence issues. 13.5 Modeling your problem. 13.6 Online vs. offline models. 13.7 If it works, patent it!

2,300 citations


Journal ArticleDOI
TL;DR: This paper gives a survey on extreme learning machine (ELM) and its variants, especially on (1) the batch learning mode of ELM, (2) fully complex ELM, (3) online sequential ELM, (4) incremental ELM, and (5) ensembles of ELM.
Abstract: Computational intelligence techniques have been used in wide applications. Out of numerous computational intelligence techniques, neural networks and support vector machines (SVMs) have been playing the dominant roles. However, it is known that both neural networks and SVMs face some challenging issues such as: (1) slow learning speed, (2) non-trivial human intervention, and/or (3) poor computational scalability. Extreme learning machine (ELM), an emergent technology which overcomes some of the challenges faced by other techniques, has recently attracted attention from more and more researchers. ELM works for generalized single-hidden-layer feedforward networks (SLFNs). The essence of ELM is that the hidden layer of SLFNs need not be tuned. Compared with those traditional computational intelligence techniques, ELM provides better generalization performance at a much faster learning speed and with minimal human intervention. This paper gives a survey on ELM and its variants, especially on (1) the batch learning mode of ELM, (2) fully complex ELM, (3) online sequential ELM, (4) incremental ELM, and (5) ensembles of ELM.
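
A minimal sketch of the basic batch ELM described above, assuming numpy: the hidden layer is generated at random and never tuned, and the output weights are obtained in closed form from the Moore-Penrose pseudoinverse. The toy regression target and network size are assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.05 * rng.normal(size=200)     # toy target (assumed)

L = 50                                                    # number of hidden nodes
W = rng.normal(size=(X.shape[1], L))                      # random, untuned input weights
b = rng.normal(size=L)                                    # random, untuned biases
H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                    # sigmoid hidden-layer outputs

beta = np.linalg.pinv(H) @ y                              # output weights via pseudoinverse
print("train MSE:", np.mean((H @ beta - y) ** 2))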

1,767 citations


Proceedings ArticleDOI
11 Mar 2011
TL;DR: The paper presents a methodology for face recognition based on an information theory approach of coding and decoding the face image, using Principal Component Analysis for feature extraction and a feed-forward back-propagation neural network for recognition.
Abstract: Face is a complex multidimensional visual model and developing a computational model for face recognition is difficult. The paper presents a methodology for face recognition based on an information theory approach of coding and decoding the face image. The proposed methodology is a combination of two stages: feature extraction using Principal Component Analysis and recognition using a feed-forward back-propagation neural network. The goal is to implement the system (model) for a particular face and distinguish it from a large number of stored faces, with some real-time variations as well. The eigenface approach uses the Principal Component Analysis (PCA) algorithm for the recognition of the images. It gives an efficient way to find the lower-dimensional space.
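
A hedged sketch of the two-stage pipeline: PCA (eigenfaces) for dimensionality reduction followed by a feed-forward network trained with back-propagation. The Olivetti faces dataset, 50 components, and layer size are assumptions, not the paper's data or configuration.

from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

faces = fetch_olivetti_faces()                          # downloads the dataset on first use
X_tr, X_te, y_tr, y_te = train_test_split(
    faces.data, faces.target, test_size=0.25, random_state=0)

pca = PCA(n_components=50, whiten=True).fit(X_tr)       # eigenface subspace from training faces
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(pca.transform(X_tr), y_tr)                      # back-propagation on PCA features
print("test accuracy:", clf.score(pca.transform(X_te), y_te))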

1,727 citations


Proceedings ArticleDOI
22 May 2011
TL;DR: Several modifications of the original recurrent neural network language model are presented, showing approaches that lead to more than 15 times speedup for both training and testing phases, and possibilities for reducing the number of parameters in the model.
Abstract: We present several modifications of the original recurrent neural network language model (RNN LM). While this model has been shown to significantly outperform many competitive language modeling techniques in terms of accuracy, the remaining problem is the computational complexity. In this work, we show approaches that lead to more than 15 times speedup for both training and testing phases. Next, we show the importance of using the backpropagation through time algorithm. An empirical comparison with feedforward networks is also provided. In the end, we discuss possibilities for reducing the number of parameters in the model. The resulting RNN model can thus be smaller, faster both during training and testing, and more accurate than the basic one.
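
Much of the speed-up in this line of work comes from factorizing the output layer into word classes, so that only a class softmax and a within-class softmax are evaluated per word instead of the full vocabulary. The arithmetic sketch below uses illustrative sizes, not the paper's vocabulary or class count.

V = 200_000                          # vocabulary size (assumed)
C = 400                              # number of word classes (assumed)
full_cost = V                        # output units evaluated per word with a full softmax
factored_cost = C + V / C            # class softmax plus within-class softmax (uniform classes)
print(full_cost / factored_cost)     # roughly a 220x reduction in output-layer work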

1,675 citations


Proceedings Article
12 Dec 2011
TL;DR: This paper introduces an easy-to-implement stochastic variational method (or equivalently, minimum description length loss function) that can be applied to most neural networks and revisits several common regularisers from a variational perspective.
Abstract: Variational methods have been previously explored as a tractable approximation to Bayesian inference for neural networks. However the approaches proposed so far have only been applicable to a few simple network architectures. This paper introduces an easy-to-implement stochastic variational method (or equivalently, minimum description length loss function) that can be applied to most neural networks. Along the way it revisits several common regularisers from a variational perspective. It also provides a simple pruning heuristic that can both drastically reduce the number of network weights and lead to improved generalisation. Experimental results are provided for a hierarchical multidimensional recurrent neural network applied to the TIMIT speech corpus.
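
A hedged sketch of the general recipe, assuming PyTorch: each weight gets a diagonal Gaussian posterior, weights are sampled with the reparameterization trick, and a KL (description-length) penalty against a fixed Gaussian prior is added to the loss. This is a simplification for illustration, not the paper's exact parameterization or prior.

import torch

class BayesLinear(torch.nn.Module):
    def __init__(self, n_in, n_out, prior_sigma=0.1):
        super().__init__()
        self.mu = torch.nn.Parameter(0.01 * torch.randn(n_in, n_out))   # posterior means
        self.rho = torch.nn.Parameter(torch.full((n_in, n_out), -4.0))  # pre-softplus std devs
        self.prior_sigma = prior_sigma

    def forward(self, x):
        sigma = torch.nn.functional.softplus(self.rho)
        w = self.mu + sigma * torch.randn_like(sigma)          # sampled weights (reparameterized)
        # KL( N(mu, sigma^2) || N(0, prior_sigma^2) ), summed over all weights
        self.kl = (torch.log(self.prior_sigma / sigma)
                   + (sigma ** 2 + self.mu ** 2) / (2 * self.prior_sigma ** 2) - 0.5).sum()
        return x @ w

layer = BayesLinear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(layer(x), y) + 1e-3 * layer.kl   # data term plus coding cost
loss.backward()                                                      # usable with any optimizer
print(float(loss))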

1,341 citations


Proceedings ArticleDOI
16 Jul 2011
TL;DR: A fast, fully parameterizable GPU implementation of Convolutional Neural Network variants is presented, whose feature extractors are neither carefully designed nor pre-wired, but rather learned in a supervised way.
Abstract: We present a fast, fully parameterizable GPU implementation of Convolutional Neural Network variants. Our feature extractors are neither carefully designed nor pre-wired, but rather learned in a supervised way. Our deep hierarchical architectures achieve the best published results on benchmarks for object classification (NORB, CIFAR10) and handwritten digit recognition (MNIST), with error rates of 2.53%, 19.51%, 0.35%, respectively. Deep nets trained by simple back-propagation perform better than more shallow ones. Learning is surprisingly rapid. NORB is completely trained within five epochs. Test error rates on MNIST drop to 2.42%, 0.97% and 0.48% after 1, 3 and 17 epochs, respectively.

1,216 citations


Book ChapterDOI
14 Jun 2011
TL;DR: It is argued that neural networks can be used to learn features that output a whole vector of instantiation parameters and this is a much more promising way of dealing with variations in position, orientation, scale and lighting than the methods currently employed in the neural networks community.
Abstract: The artificial neural networks that are used to recognize shapes typically use one or more layers of learned feature detectors that produce scalar outputs. By contrast, the computer vision community uses complicated, hand-engineered features, like SIFT [6], that produce a whole vector of outputs including an explicit representation of the pose of the feature. We show how neural networks can be used to learn features that output a whole vector of instantiation parameters and we argue that this is a much more promising way of dealing with variations in position, orientation, scale and lighting than the methods currently employed in the neural networks community. It is also more promising than the hand-engineered features currently used in computer vision because it provides an efficient way of adapting the features to the domain.

1,120 citations


Journal ArticleDOI
21 Jul 2011-Nature
TL;DR: It is suggested that DNA strand displacement cascades could be used to endow autonomous chemical systems with the capability of recognizing patterns of molecular events, making decisions and responding to the environment.
Abstract: The impressive capabilities of the mammalian brain—ranging from perception, pattern recognition and memory formation to decision making and motor activity control—have inspired their re-creation in a wide range of artificial intelligence systems for applications such as face recognition, anomaly detection, medical diagnosis and robotic vehicle control. Yet before neuron-based brains evolved, complex biomolecular circuits provided individual cells with the ‘intelligent’ behaviour required for survival. However, the study of how molecules can ‘think’ has not produced an equal variety of computational models and applications of artificial chemical systems. Although biomolecular systems have been hypothesized to carry out neural-network-like computations in vivo and the synthesis of artificial chemical analogues has been proposed theoretically, experimental work has so far fallen short of fully implementing even a single neuron. Here, building on the richness of DNA computing and strand displacement circuitry, we show how molecular systems can exhibit autonomous brain-like behaviours. Using a simple DNA gate architecture that allows experimental scale-up of multilayer digital circuits, we systematically transform arbitrary linear threshold circuits (an artificial neural network model) into DNA strand displacement cascades that function as small neural networks. Our approach even allows us to implement a Hopfield associative memory with four fully connected artificial neurons that, after training in silico, remembers four single-stranded DNA patterns and recalls the most similar one when presented with an incomplete pattern. Our results suggest that DNA strand displacement cascades could be used to endow autonomous chemical systems with the capability of recognizing patterns of molecular events, making decisions and responding to the environment.
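
For intuition about the in-silico side of the result, the sketch below runs a tiny Hopfield associative memory in software rather than DNA: Hebbian weights store two bipolar patterns (illustrative, not the paper's four DNA patterns) and asynchronous threshold updates recall the stored pattern closest to a corrupted cue. It assumes numpy.

import numpy as np

patterns = np.array([[1, 1, 1, -1],
                     [1, -1, -1, 1]])                 # stored +/-1 patterns (assumed)
W = (patterns.T @ patterns).astype(float)             # Hebbian outer-product weights
np.fill_diagonal(W, 0.0)                              # no self-connections

state = np.array([1, -1, 1, -1])                      # first pattern with one bit corrupted
for _ in range(3):                                    # a few asynchronous update sweeps
    for i in range(len(state)):
        state[i] = 1 if W[i] @ state >= 0 else -1

print(state)                                          # recovers the closest stored pattern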

01 Jan 2011
TL;DR: This paper uses speech recognition as an example task, and shows that a real-time hybrid hidden Markov model / neural network (HMM/NN) large vocabulary system can be built with a 10× speedup over an unoptimized baseline and a 4× speedup over an aggressively optimized floating-point baseline at no cost in accuracy.
Abstract: Recent advances in deep learning have made the use of large, deep neural networks with tens of millions of parameters suitable for a number of applications that require real-time processing. The sheer size of these networks can represent a challenging computational burden, even for modern CPUs. For this reason, GPUs are routinely used instead to train and run such networks. This paper is a tutorial for students and researchers on some of the techniques that can be used to reduce this computational cost considerably on modern x86 CPUs. We emphasize data layout, batching of the computation, the use of SSE2 instructions, and particularly leverage SSSE3 and SSE4 fixed-point instructions which provide a 3× improvement over an optimized floating-point baseline. We use speech recognition as an example task, and show that a real-time hybrid hidden Markov model / neural network (HMM/NN) large vocabulary system can be built with a 10× speedup over an unoptimized baseline and a 4× speedup over an aggressively optimized floating-point baseline at no cost in accuracy. The techniques described extend readily to neural network training and provide an effective alternative to the use of specialized hardware.
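
A hedged sketch of the fixed-point idea behind those speed-ups: scale weights and activations to 8-bit integers, accumulate the matrix product in 32-bit integers, and rescale the result. It assumes numpy; the scaling scheme and sizes are illustrative, and the paper's SSSE3/SSE4 kernels are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(256, 128)).astype(np.float32)   # layer weights
x = rng.normal(size=(1, 256)).astype(np.float32)                # one input frame

w_scale = np.abs(W).max() / 127.0                               # per-tensor scales (assumed scheme)
x_scale = np.abs(x).max() / 127.0
Wq = np.round(W / w_scale).astype(np.int8)
xq = np.round(x / x_scale).astype(np.int8)

y_int = xq.astype(np.int32) @ Wq.astype(np.int32)               # integer accumulation
y_fix = y_int * (w_scale * x_scale)                             # rescale back to floating point
print("max abs error:", np.abs(y_fix - x @ W).max())            # quantization error stays small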

Journal ArticleDOI
TL;DR: This study attempted to develop two efficient models and compared their performances in predicting the direction of movement in the daily Istanbul Stock Exchange (ISE) National 100 Index, finding that the average performance of the ANN model was significantly better than that of the SVM model.
Abstract: Prediction of stock price index movement is regarded as a challenging task of financial time series prediction. An accurate prediction of stock price movement may yield profits for investors. Due to the complexity of stock market data, development of efficient models for prediction is very difficult. This study attempted to develop two efficient models and compared their performances in predicting the direction of movement in the daily Istanbul Stock Exchange (ISE) National 100 Index. The models are based on two classification techniques, artificial neural networks (ANN) and support vector machines (SVM). Ten technical indicators were selected as inputs of the proposed models. Two comprehensive parameter setting experiments for both models were performed to improve their prediction performances. Experimental results showed that the average performance of the ANN model (75.74%) was significantly better than that of the SVM model (71.52%).

Journal ArticleDOI
TL;DR: A new method for the detection of P300 waves is presented, based on a convolutional neural network (CNN), which provides a new way for analyzing brain activities due to the receptive field of the CNN models.
Abstract: A Brain-Computer Interface (BCI) is a specific type of human-computer interface that enables direct communication between humans and computers by analyzing brain measurements. Oddball paradigms are used in BCI to generate event-related potentials (ERPs), like the P300 wave, on targets selected by the user. A P300 speller is based on this principle, where the detection of P300 waves allows the user to write characters. The P300 speller is composed of two classification problems. The first classification is to detect the presence of a P300 in the electroencephalogram (EEG). The second one corresponds to the combination of different P300 responses for determining the right character to spell. A new method for the detection of P300 waves is presented. This model is based on a convolutional neural network (CNN). The topology of the network is adapted to the detection of P300 waves in the time domain. Seven classifiers based on the CNN are proposed: four single classifiers with different feature sets and three multiclassifiers. These models are tested and compared on Data set II of the third BCI competition. The best result is obtained with a multiclassifier solution, with a recognition rate of 95.5 percent, without channel selection before the classification. The proposed approach also provides a new way of analyzing brain activities due to the receptive field of the CNN models.

Journal ArticleDOI
TL;DR: The effectiveness of neural network models, which are known to be dynamic and effective in stock-market predictions, is evaluated, including the multi-layer perceptron (MLP), the dynamic artificial neural network (DAN2), and hybrid neural networks that use generalized autoregressive conditional heteroscedasticity (GARCH) to extract new input variables.
Abstract: Forecasting stock exchange rates is an important financial problem that is receiving increasing attention. During the last few years, a number of neural network models and hybrid models have been proposed for obtaining accurate prediction results, in an attempt to outperform the traditional linear and nonlinear approaches. This paper evaluates the effectiveness of neural network models which are known to be dynamic and effective in stock-market predictions. The models analysed are the multi-layer perceptron (MLP), the dynamic artificial neural network (DAN2), and hybrid neural networks which use generalized autoregressive conditional heteroscedasticity (GARCH) to extract new input variables. The comparison of the models is done from two viewpoints: Mean Square Error (MSE) and Mean Absolute Deviation (MAD), using real daily exchange rate values of the NASDAQ Stock Exchange index.
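
For reference, the two error measures used in the comparison are straightforward to compute; the numbers below are illustrative, not NASDAQ data.

import numpy as np

actual   = np.array([2740.0, 2755.0, 2731.0, 2762.0])
forecast = np.array([2735.0, 2760.0, 2740.0, 2750.0])

mse = np.mean((actual - forecast) ** 2)        # Mean Square Error
mad = np.mean(np.abs(actual - forecast))       # Mean Absolute Deviation
print(mse, mad)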

Journal ArticleDOI
01 Mar 2011
TL;DR: Empirical results with three well-known real data sets indicate that the proposed model can be an effective way to improve forecasting accuracy achieved by traditional hybrid models and also by either of the component models used separately.
Abstract: Improving forecasting accuracy, especially for time series forecasting, is an important yet often difficult task facing decision makers in many areas. Both theoretical and empirical findings have indicated that integration of different models can be an effective way of improving upon their predictive performance, especially when the models in combination are quite different. Artificial neural networks (ANNs) are flexible computing frameworks and universal approximators that can be applied to a wide range of forecasting problems with a high degree of accuracy. However, using ANNs to model linear problems has yielded mixed results, and hence it is not wise to apply ANNs blindly to any type of data. Autoregressive integrated moving average (ARIMA) models are one of the most popular linear models in time series forecasting, and they have been widely applied in order to construct more accurate hybrid models during the past decade. Although hybrid techniques that decompose a time series into its linear and nonlinear components have recently been shown to outperform single models, these hybrids have some disadvantages. In this paper, a novel hybridization of artificial neural networks and the ARIMA model is proposed in order to overcome the above-mentioned limitation of ANNs and yield a more general and more accurate forecasting model than traditional hybrid ARIMA-ANN models. In our proposed model, the unique advantages of ARIMA models in linear modeling are used to identify and magnify the existing linear structure in the data, and then a neural network is used to determine a model that captures the underlying data-generating process and makes predictions using the preprocessed data. Empirical results with three well-known real data sets indicate that the proposed model can be an effective way to improve the forecasting accuracy achieved by traditional hybrid models and also by either of the component models used separately.
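
A hedged sketch of the general ARIMA-plus-ANN hybridization this work builds on: an ARIMA model captures the linear structure, and a neural network is then fit on the residuals to capture what remains, with the two parts combined for the final forecast. The toy series, lag choice, and library calls (statsmodels, scikit-learn) are assumptions and do not reproduce the paper's specific formulation.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = np.arange(300)
y = 0.02 * t + np.sin(t / 8.0) + 0.1 * rng.normal(size=t.size)   # toy series (assumed)

arima = ARIMA(y, order=(2, 1, 1)).fit()
linear_fit = arima.fittedvalues                  # linear (ARIMA) component
residual = y - linear_fit                        # nonlinear remainder

lag = 3                                          # residual lags fed to the ANN (assumed)
X = np.column_stack([residual[i:len(residual) - lag + i] for i in range(lag)])
ann = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
ann.fit(X, residual[lag:])

hybrid = linear_fit[lag:] + ann.predict(X)       # combined in-sample forecast
print("hybrid MSE:", np.mean((y[lag:] - hybrid) ** 2))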

Proceedings ArticleDOI
01 Nov 2011
TL;DR: This work uses a state-of-the-art big and deep neural network combining convolution and max-pooling for supervised feature learning and classification of hand gestures given by humans to mobile robots using colored gloves.
Abstract: Automatic recognition of gestures using computer vision is important for many real-world applications such as sign language recognition and human-robot interaction (HRI). Our goal is a real-time hand gesture-based HRI interface for mobile robots. We use a state-of-the-art big and deep neural network (NN) combining convolution and max-pooling (MPCNN) for supervised feature learning and classification of hand gestures given by humans to mobile robots using colored gloves. The hand contour is retrieved by color segmentation, then smoothed by morphological image processing, which eliminates noisy edges. Our big and deep MPCNN classifies 6 gesture classes with 96% accuracy, nearly three times better than the nearest competitor. Experiments with mobile robots using an ARM 11 533MHz processor achieve real-time gesture recognition performance.

Proceedings ArticleDOI
01 Dec 2011
TL;DR: This work describes how to effectively train neural network based language models on large data sets and introduces a hash-based implementation of a maximum entropy model that can be trained as part of the neural network model.
Abstract: We describe how to effectively train neural network based language models on large data sets. Fast convergence during training and better overall performance is observed when the training data are sorted by their relevance. We introduce a hash-based implementation of a maximum entropy model that can be trained as part of the neural network model. This leads to a significant reduction of computational complexity. We achieved around 10% relative reduction of word error rate on an English Broadcast News speech recognition task, against a large 4-gram model trained on 400M tokens.

Journal ArticleDOI
TL;DR: The aim of this paper is to present, in a tutorial manner, an initial framework for the possible development of fully asynchronous STDP learning neuromorphic architectures exploiting two- or three-terminal memristive devices.
Abstract: In this paper we present a very exciting overlap between emerging nanotechnology and neuroscience. We link one type of memristive nanotechnology device to the biological synaptic update rule known as Spike-Timing-Dependent Plasticity (STDP) found in real biological synapses. Understanding this link allows neuromorphic engineers to develop circuit architectures that use this type of memristor to artificially emulate parts of the visual cortex. We focus on voltage-driven memristors and base our discussion on a behavioral macro-model for such devices. The implementations result in fully asynchronous architectures with neurons sending their action potentials not only forwards but also backwards. One critical aspect is to use neurons that generate spikes of specific shapes. By changing the shapes of the neuron action potential spikes we can tune and manipulate the STDP learning rules for both excitatory and inhibitory synapses. We show how neurons and memristors can be interconnected to achieve large-scale spiking learning systems that follow a type of multiplicative STDP learning rule. We briefly extend the architectures to use three-terminal transistors with similar memristive behavior. We illustrate how a V1 visual cortex layer can be assembled and how it is capable of learning to extract orientations from visual data coming from a real artificial CMOS spiking retina observing real-life scenes. Finally, we discuss limitations of currently available memristors. The results presented are based on behavioral simulations and do not take into account non-idealities of devices and interconnects. The aim of this paper is to present, in a tutorial manner, an initial framework for the possible development of fully asynchronous STDP learning neuromorphic architectures exploiting two- or three-terminal memristive devices. All files used for the simulations are made available through the journal web site.
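
As a rough software illustration of the learning rule these memristive synapses are meant to emulate, the sketch below evaluates a pair-based STDP window: the sign and size of the weight change depend on the relative timing of pre- and post-synaptic spikes. Time constants and amplitudes are illustrative, not derived from a device model; numpy is assumed.

import numpy as np

def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Weight change for a spike-time difference dt = t_post - t_pre (in ms)."""
    return np.where(dt >= 0,
                    a_plus * np.exp(-dt / tau),        # pre before post: potentiation
                    -a_minus * np.exp(dt / tau))       # post before pre: depression

for dt in (-40.0, -10.0, 5.0, 30.0):
    print(dt, float(stdp_dw(dt)))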

01 Jan 2011
TL;DR: The results of applying the artificial neural networks methodology to acute nephritis diagnosis based upon selected symptoms show the ability of the network to learn the patterns corresponding to the symptoms of the person.
Abstract: Artificial neural networks are finding many uses in the medical diagnosis application. The goal of this paper is to evaluate artificial neural networks in disease diagnosis. Two cases are studied. The first one is acute nephritis disease; the data are the disease symptoms. The second is heart disease; the data are cardiac Single Photon Emission Computed Tomography (SPECT) images. Each patient is classified into two categories: infected and non-infected. Classification is an important tool in medical diagnosis decision support. A feed-forward back-propagation neural network is used as a classifier to distinguish between infected and non-infected persons in both cases. The results of applying the artificial neural networks methodology to acute nephritis diagnosis based upon selected symptoms show the ability of the network to learn the patterns corresponding to the symptoms of the person. In this study, the data used to diagnose the diseases were obtained from the UCI machine learning repository. The data are separated into inputs and targets; the targets for the neural network are identified with 1's for infected and 0's for non-infected. In the diagnosis of acute nephritis disease, the percentage correctly classified in the simulation sample by the feed-forward back-propagation network is 99 percent, while in the diagnosis of heart disease it is 95 percent.

Journal ArticleDOI
TL;DR: Experimental results reveal that the three ensemble methods can substantially improve individual base learners, and in particular, Bagging performs better than Boosting across all credit datasets.
Abstract: Both statistical techniques and Artificial Intelligence (AI) techniques have been explored for credit scoring, an important finance activity. Although there are no consistent conclusions on which ones are better, recent studies suggest combining multiple classifiers, i.e., ensemble learning, may have a better performance. In this study, we conduct a comparative assessment of the performance of three popular ensemble methods, i.e., Bagging, Boosting, and Stacking, based on four base learners, i.e., Logistic Regression Analysis (LRA), Decision Tree (DT), Artificial Neural Network (ANN) and Support Vector Machine (SVM). Experimental results reveal that the three ensemble methods can substantially improve individual base learners. In particular, Bagging performs better than Boosting across all credit datasets. Stacking and Bagging DT get the best performance in our experiments in terms of average accuracy, type I error and type II error.
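
A hedged sketch of the kind of comparison reported: Bagging, Boosting, and Stacking built from simple base learners, scored by cross-validation on a binary classification task. The synthetic data and scikit-learn estimators are illustrative stand-ins for the credit datasets and exact configurations used in the study.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)  # stand-in credit data

models = {
    "Bagging(DT)": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    "Boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Stacking(LRA+DT)": StackingClassifier(
        estimators=[("lra", LogisticRegression(max_iter=1000)),
                    ("dt", DecisionTreeClassifier())],
        final_estimator=LogisticRegression(max_iter=1000)),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())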

Journal ArticleDOI
TL;DR: A neural network model is proposed and it is shown by a rigorous theoretical analysis that its neural activity implements MCMC sampling of a given distribution, both for the case of discrete and continuous time.
Abstract: The organization of computations in networks of spiking neurons in the brain is still largely unknown, in particular in view of the inherently stochastic features of their firing activity and the experimentally observed trial-to-trial variability of neural systems in the brain. In principle there exists a powerful computational framework for stochastic computations, probabilistic inference by sampling, which can explain a large number of macroscopic experimental data in neuroscience and cognitive science. But it has turned out to be surprisingly difficult to create a link between these abstract models for stochastic computations and more detailed models of the dynamics of networks of spiking neurons. Here we create such a link and show that under some conditions the stochastic firing activity of networks of spiking neurons can be interpreted as probabilistic inference via Markov chain Monte Carlo (MCMC) sampling. Since common methods for MCMC sampling in distributed systems, such as Gibbs sampling, are inconsistent with the dynamics of spiking neurons, we introduce a different approach based on non-reversible Markov chains that is able to reflect inherent temporal processes of spiking neuronal activity through a suitable choice of random variables. We propose a neural network model and show by a rigorous theoretical analysis that its neural activity implements MCMC sampling of a given distribution, both for the case of discrete and continuous time. This provides a step towards closing the gap between abstract functional models of cortical computation and more detailed models of networks of spiking neurons.

Proceedings ArticleDOI
03 Oct 2011
TL;DR: This work describes the approach that won the preliminary phase of the German traffic sign recognition benchmark with a better-than-human recognition rate, and obtains an even better recognition rate by further training the nets.
Abstract: We describe the approach that won the preliminary phase of the German traffic sign recognition benchmark with a better-than-human recognition rate of 98.98%. We obtain an even better recognition rate of 99.15% by further training the nets. Our fast, fully parameterizable GPU implementation of a Convolutional Neural Network does not require careful design of pre-wired feature extractors, which are rather learned in a supervised way. A CNN/MLP committee further boosts recognition performance.

Journal ArticleDOI
TL;DR: A recently developed algorithm for designing compact RBF networks and performing efficient training is introduced, and the main properties of RBF networks, including generalization ability, tolerance to input noise, and online learning ability, are tested.
Abstract: Radial basis function (RBF) networks have advantages of easy design, good generalization, strong tolerance to input noise, and online learning ability. The properties of RBF networks make them very suitable for designing flexible control systems. This paper presents a review of different approaches to designing and training RBF networks. A recently developed algorithm is introduced for designing compact RBF networks and performing an efficient training process. Finally, several problems are used to test the main properties of RBF networks, including their generalization ability, tolerance to input noise, and online learning ability. RBF networks are also compared with traditional neural networks and fuzzy inference systems.
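
A minimal sketch of the basic architecture the review covers, assuming numpy and scikit-learn: Gaussian centres placed by k-means and linear output weights solved by least squares. The review's construction and training algorithms are more sophisticated; the toy function, number of centres, and shared width below are assumptions.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sinc(X[:, 0]) + 0.05 * rng.normal(size=200)              # toy target (assumed)

centres = KMeans(n_clusters=15, n_init=10, random_state=0).fit(X).cluster_centers_
width = 0.5                                                     # shared Gaussian width (assumed)
Phi = np.exp(-((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1) / (2 * width ** 2))
Phi = np.hstack([Phi, np.ones((len(X), 1))])                    # add a bias column
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)                     # linear output weights
print("train MSE:", np.mean((Phi @ w - y) ** 2))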

Book
13 Oct 2011
TL;DR: Chapters on a neural processor for maze solving and on issues in analog VLSI and MOS techniques for neural computing are among the topics discussed.
Abstract: 1. A Neural Processor for Maze Solving. 2. Resistive Fuses: Analog Hardware for Detecting Discontinuities in Early Vision. 3. CMOS Integration of Herault-Jutten Cells for Separation of Sources. 4. Circuit Models of Sensory Transduction in the Cochlea. 5. Issues in Analog VLSI and MOS Techniques for Neural Computing. 6. Design and Fabrication of VLSI Components for a General Purpose Analog Neural Computer. 7. A Chip that Focuses an Image on Itself. 8. A Foveated Retina-Like Sensor Using CCD Technology. 9. Cooperative Stereo Matching Using Static and Dynamic Image Features. 10. Adaptive Retina.

Journal ArticleDOI
TL;DR: In this paper, passivity analysis is conducted for discrete-time stochastic neural networks with both Markovian jumping parameters and mixed time delays by introducing a Lyapunov functional that accounts for the Mixed time delays.
Abstract: In this paper, passivity analysis is conducted for discrete-time stochastic neural networks with both Markovian jumping parameters and mixed time delays. The mixed time delays consist of both discrete and distributed delays. The Markov chain in the underlying neural networks is finite piecewise homogeneous. By introducing a Lyapunov functional that accounts for the mixed time delays, a delay-dependent passivity condition is derived in terms of the linear matrix inequality approach. The case of Markov chain with partially unknown transition probabilities is also considered. All the results presented depend upon not only discrete delay but also distributed delay. A numerical example is included to demonstrate the effectiveness of the proposed methods.

Journal ArticleDOI
TL;DR: By using Lyapunov analysis, it is proven that all the signals in the closed-loop system are semi-globally uniformly ultimately bounded and the output errors converge to a compact set.
Abstract: This brief studies adaptive neural output feedback tracking control of uncertain nonlinear multi-input-multi-output (MIMO) systems in discrete-time form. The considered MIMO systems are composed of n subsystems with couplings of inputs and states among the subsystems. In order to solve the noncausal problem and decouple the couplings, the systems need to be transformed into a predictor form. Higher-order neural networks are utilized to approximate the desired controllers. By using Lyapunov analysis, it is proven that all the signals in the closed-loop system are semi-globally uniformly ultimately bounded and the output errors converge to a compact set. In contrast to existing results, the advantage of the scheme is that the number of adjustable parameters is greatly reduced. The effectiveness of the scheme is verified by a simulation example.

Proceedings ArticleDOI
20 Oct 2011
TL;DR: A new architecture is proposed to overcome the challenges of realizing scalable learning algorithms for networks of spiking neurons in silicon, by combining innovations in computation, memory, and communication to leverage robust digital neuron circuits and novel transposable SRAM arrays.
Abstract: Efforts to achieve the long-standing dream of realizing scalable learning algorithms for networks of spiking neurons in silicon have been hampered by (a) the limited scalability of analog neuron circuits; (b) the enormous area overhead of learning circuits, which grows with the number of synapses; and (c) the need to implement all inter-neuron communication via off-chip address-events. In this work, a new architecture is proposed to overcome these challenges by combining innovations in computation, memory, and communication, respectively, to leverage (a) robust digital neuron circuits; (b) novel transposable SRAM arrays that share learning circuits, which grow only with the number of neurons; and (c) crossbar fan-out for efficient on-chip inter-neuron communication. Through tight integration of memory (synapses) and computation (neurons), a highly configurable chip comprising 256 neurons and 64K binary synapses with on-chip learning based on spike-timing dependent plasticity is demonstrated in 45nm SOI-CMOS. Near-threshold, event-driven operation at 0.53V is demonstrated to maximize power efficiency for real-time pattern classification, recognition, and associative memory tasks. Future scalable systems built from the foundation provided by this work will open up possibilities for ubiquitous ultra-dense, ultra-low power brain-like cognitive computers.

Book ChapterDOI
11 Apr 2011
TL;DR: ANN models are too often developed without due consideration given to the effect that the choice of input variables has on model complexity, learning difficulty, and the performance of the subsequently trained ANN.
Abstract: The choice of input variables is a fundamental, and yet crucial consideration in identifying the optimal functional form of statistical models. The task of selecting input variables is common to the development of all statistical models, and is largely dependent on the discovery of relationships within the available data to identify suitable predictors of the model output. In the case of parametric, or semi-parametric empirical models, the difficulty of the input variable selection task is somewhat alleviated by the a priori assumption of the functional form of the model, which is based on some physical interpretation of the underlying system or process being modelled. However, in the case of artificial neural networks (ANNs), and other similarly data-driven statistical modelling approaches, there is no such assumption made regarding the structure of the model. Instead, the input variables are selected from the available data, and the model is developed subsequently. The difficulty of selecting input variables arises due to (i) the number of available variables, which may be very large; (ii) correlations between potential input variables, which creates redundancy; and (iii) variables that have little or no predictive power. Variable subset selection has been a longstanding issue in fields of applied statistics dealing with inference and linear regression (Miller, 1984), and the advent of ANN models has only served to create new challenges in this field. The non-linearity, inherent complexity and non-parametric nature of ANN regression make it difficult to apply many existing analytical variable selection methods. The difficulty of selecting input variables is further exacerbated during ANN development, since the task of selecting inputs is often delegated to the ANN during the learning phase of development. A popular notion is that an ANN is adequately capable of identifying redundant and noise variables during training, and that the trained network will use only the salient input variables. ANN architectures can be built with arbitrary flexibility and can be successfully trained using any combination of input variables (assuming they are good predictors). Consequently, allowances are often made for a large number of input variables, with the belief that the ability to incorporate such flexibility and redundancy creates a more robust model. Such pragmatism is perhaps symptomatic of the popularisation of ANN models through machine learning, rather than statistical learning theory. ANN models are too often developed without due consideration given to the effect that the choice of input variables has on model complexity, learning difficulty, and performance of the subsequently trained ANN.
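
One common data-driven step in this area is to rank candidate inputs by their estimated relevance to the output before any network is trained; the sketch below does this with a mutual-information estimate. The synthetic data, the scikit-learn estimator, and the cut-off are illustrative choices, not the chapter's recommended procedure.

import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                                   # 10 candidate input variables
y = 2 * X[:, 0] - np.sin(X[:, 3]) + 0.1 * rng.normal(size=500)   # only inputs 0 and 3 matter

mi = mutual_info_regression(X, y, random_state=0)
ranking = np.argsort(mi)[::-1]
print("ranked inputs:", ranking)         # inputs 0 and 3 should come out on top
print("selected:", ranking[:2])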

Journal ArticleDOI
TL;DR: A behavioral analysis of different numbers of hidden layers and different numbers of hidden neurons is discussed; it is very difficult to select the number of hidden layers and hidden neurons.
Abstract: The terms "Neural Network" (NN) and "Artificial Neural Network" (ANN) usually refer to a Multilayer Perceptron Network. It processes the records one at a time and "learns" by comparing its prediction of each record with the known actual record. The problem of model selection is considerably important for acquiring higher levels of generalization capability in supervised learning. This paper discusses a behavioral analysis of different numbers of hidden layers and different numbers of hidden neurons. It is very difficult to select the number of hidden layers and hidden neurons. Different methods, such as Akaike's Information Criterion, the inverse test method, and some traditional methods, are used to find a neural network architecture. The paper also considers what to do when a neural network does not train or its errors are not reduced, and what can be done with the network architecture to reduce those errors. These techniques are discussed, along with an experiment and its results. To solve different problems, a neural network should be trained to perform correct classification.
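
As a hedged illustration of one criterion the paper mentions, the sketch below fits networks with different hidden-layer sizes and scores each with an AIC-style value, n*log(MSE) + 2k, where k counts weights and biases. The toy data, scikit-learn network, and this particular AIC form are assumptions made for illustration.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))
y = X[:, 0] ** 2 - X[:, 1] + 0.05 * rng.normal(size=300)        # toy target (assumed)

for hidden in (2, 5, 10, 30):
    net = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=3000, random_state=0).fit(X, y)
    mse = np.mean((net.predict(X) - y) ** 2)
    k = sum(w.size for w in net.coefs_) + sum(b.size for b in net.intercepts_)
    aic = len(y) * np.log(mse) + 2 * k                          # smaller is better
    print(hidden, "hidden neurons, AIC:", round(aic, 1))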