
Showing papers on "Artificial neural network published in 1996"


Journal ArticleDOI
TL;DR: SExtractor (Source Extractor), as presented in this paper, is automated software that optimally detects, deblends, measures, and classifies sources from astronomical images, and is particularly suited to the analysis of large extragalactic surveys.
Abstract: We present the automated techniques we have developed for new software that optimally detects, deblends, measures and classifies sources from astronomical images: SExtractor (Source Extractor). We show that a very reliable star/galaxy separation can be achieved on most images using a neural network trained with simulated images. Salient features of SExtractor include its ability to work on very large images, with minimal human intervention, and to deal with a wide variety of object shapes and magnitudes. It is therefore particularly suited to the analysis of large extragalactic surveys.

10,983 citations


Journal ArticleDOI
TL;DR: A package of computer programs for analysis and visualization of three-dimensional human brain functional magnetic resonance imaging (FMRI) results is described, along with techniques for automatically generating transformed functional data sets from manually labeled anatomical data sets.

10,002 citations


Book
01 Jan 1996
TL;DR: In this self-contained account, Professor Ripley brings together two crucial ideas in pattern recognition: statistical methods and machine learning via neural networks.
Abstract: From the Publisher: Pattern recognition has long been studied in relation to many different (and mainly unrelated) applications, such as remote sensing, computer vision, space research, and medical imaging. In this book Professor Ripley brings together two crucial ideas in pattern recognition: statistical methods and machine learning via neural networks. Unifying principles are brought to the fore, and the author gives an overview of the state of the subject. Many examples are included to illustrate real problems in pattern recognition and how to overcome them. This is a self-contained account, ideal both as an introduction for non-specialist readers and as a handbook for the more expert reader.

5,632 citations


Journal ArticleDOI
01 Mar 1996
TL;DR: The article discusses the motivations behind the development of ANNs and describes the basic biological neuron and the artificial computational model, and outlines network architectures and learning processes, and presents some of the most commonly used ANN models.
Abstract: Artificial neural nets (ANNs) are massively parallel systems with large numbers of interconnected simple processors. The article discusses the motivations behind the development of ANNs and describes the basic biological neuron and the artificial computational model. It outlines network architectures and learning processes, and presents some of the most commonly used ANN models. It concludes with character recognition, a successful ANN application.
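As a concrete illustration of the basic artificial computational model the article describes, the sketch below implements a single neuron as a weighted sum plus bias passed through either a threshold (McCulloch-Pitts) or sigmoid activation; the weights and inputs are arbitrary illustrative values, not taken from the article.

```python
import numpy as np

def neuron(x, w, b=0.0, activation="sigmoid"):
    """One artificial neuron: weighted sum of inputs plus bias, then an activation."""
    z = np.dot(w, x) + b
    if activation == "threshold":          # McCulloch-Pitts / perceptron unit
        return 1.0 if z > 0 else 0.0
    return 1.0 / (1.0 + np.exp(-z))        # sigmoidal unit

# Example with three inputs and arbitrary weights.
x = np.array([0.5, -1.2, 0.3])
w = np.array([0.8, 0.4, -0.6])
print(neuron(x, w, b=0.1, activation="threshold"))
print(neuron(x, w, b=0.1, activation="sigmoid"))
```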

4,281 citations


Book
12 Jul 1996
TL;DR: The authors may not be able to make you love reading, but Neural Networks: A Systematic Introduction will lead you to love reading, starting now.
Abstract: We may not be able to make you love reading, but Neural Networks: A Systematic Introduction will lead you to love reading, starting now. A book is a window onto a new world, and the world you want is at a better stage and level; the world will always guide you toward the more prestigious stages of life. This is some of the kindness that reading gives you: the more books you read, the more you know, though it can also mean boredom when you have had your fill.

2,278 citations


Journal ArticleDOI
TL;DR: This article reviews how optimal data selection techniques have been used with feed-forward neural networks and shows how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression.
Abstract: For many types of machine learning algorithms, one can compute the statistically "optimal" way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance.
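As a rough illustration of the data-selection idea, the sketch below uses locally weighted regression on a 1-D toy problem and queries the candidate point where the local predictive variance is largest. This is a simplified uncertainty-driven proxy, not the exact expected-variance-minimization criterion derived in the paper, and all function and parameter choices here are assumptions.

```python
import numpy as np

def lwr_predict(xq, X, y, tau=0.3):
    """Locally weighted linear regression at query xq, with a rough
    weighted-residual variance estimate (Gaussian kernel weights)."""
    w = np.exp(-((X - xq) ** 2) / (2 * tau ** 2))
    A = np.vstack([np.ones_like(X), X]).T                  # design matrix [1, x]
    W = np.diag(w)
    beta = np.linalg.solve(A.T @ W @ A + 1e-6 * np.eye(2), A.T @ W @ y)
    resid = y - A @ beta
    var = (w @ resid ** 2) / max(w.sum(), 1e-12)
    return beta[0] + beta[1] * xq, var

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 10)                                 # small labeled set
y = np.sin(3 * X) + 0.1 * rng.standard_normal(10)

# Query where the learner is most uncertain among a grid of candidates.
candidates = np.linspace(-1, 1, 100)
variances = [lwr_predict(c, X, y)[1] for c in candidates]
print("next point to query:", candidates[int(np.argmax(variances))])
```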

2,122 citations


Journal ArticleDOI
TL;DR: An overview of the features of neural networks and logistic regression is presented, and the advantages and disadvantages of using this modeling technique are discussed.

1,564 citations


Book
01 May 1996
TL;DR: Neural Fuzzy Systems provides a comprehensive, up-to-date introduction to the basic theories of fuzzy systems and neural networks, as well as an exploration of how these two fields can be integrated to create Neural-Fuzzy systems.
Abstract: Neural Fuzzy Systems provides a comprehensive, up-to-date introduction to the basic theories of fuzzy systems and neural networks, as well as an exploration of how these two fields can be integrated to create Neural-Fuzzy Systems. It includes Matlab software, with a Neural Network Toolkit and a Fuzzy System Toolkit.

1,545 citations


Journal ArticleDOI
TL;DR: It is shown that one cannot say: if empirical misclassification rate is low, the Vapnik-Chervonenkis dimension of your generalizer is small, and the training set is large, then with high probability your OTS error is small.
Abstract: This is the first of two papers that use off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for any two algorithms A and B, there are “as many” targets (or priors over targets) for which A has lower expected OTS error than B as vice versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is “anti-cross-validation” (choose the learning algorithm with largest cross-validation error). This paper ends with a discussion of the implications of these results for computational learning theory. It is shown that one cannot say: if empirical misclassification rate is low, the Vapnik-Chervonenkis dimension of your generalizer is small, and the training set is large, then with high probability your OTS error is small. Other implications for “membership queries” algorithms and “punting” algorithms are also discussed.

1,371 citations


Book ChapterDOI
TL;DR: This chapter describes three prediction methods that use evolutionary information as input to neural network systems to predict secondary structure (PHDsec), relative solvent accessibility, and transmembrane helices (PHDhtm).
Abstract: Publisher Summary The first step in a PHD prediction is generating a multiple sequence alignment. The second step involves feeding the alignment into a neural network system. Correctness of the multiple sequence alignment is as crucial for prediction accuracy as is the fact that the alignment contains a broad spectrum of homologous sequences. This chapter describes three prediction methods that use evolutionary information as input to neural network systems to predict secondary structure (PHDsec), relative solvent accessibility (PHDacc), and transmembrane helices (PHDhtm). It illustrates the possibilities and limitations in practical applications of these methods with results from careful cross-validation experiments on large sets of unique protein structures. All predictions are made available by an automatic Email prediction service. The baseline conclusion after some 30,000 requests to the service is that 1-D predictions have become accurate enough to be used as a starting point for the expert-driven modeling of protein structure.

1,316 citations


Journal ArticleDOI
TL;DR: A design methodology is developed that expands the class of nonlinear systems that adaptive neural control schemes can be applied to and relaxes some of the restrictive assumptions that are usually made.
Abstract: Based on the Lyapunov synthesis approach, several adaptive neural control schemes have been developed during the last few years. So far, these schemes have been applied only to simple classes of nonlinear systems. This paper develops a design methodology that expands the class of nonlinear systems that adaptive neural control schemes can be applied to and relaxes some of the restrictive assumptions that are usually made. One such assumption is the requirement of a known bound on the network reconstruction error. The overall adaptive scheme is shown to guarantee semiglobal uniform ultimate boundedness. The proposed feedback control law is a smooth function of the state.

Journal Article
TL;DR: It is shown that networks of spiking neurons are, with regard to the number of neurons that are needed, computationally more powerful than other neural network models based on McCulloch-Pitts neurons (threshold gates) or sigmoidal gates.
Abstract: The computational power of formal models for networks of spiking neurons is compared with that of other neural network models based on McCulloch-Pitts neurons (i.e., threshold gates) or sigmoidal gates. In particular, it is shown that networks of spiking neurons are, with regard to the number of neurons that are needed, computationally more powerful than these other neural network models. A concrete biologically relevant function is exhibited which can be computed by a single spiking neuron (for biologically reasonable values of its parameters), but which requires hundreds of hidden units on a sigmoidal neural net. On the other hand, it is known that any function that can be computed by a small sigmoidal neural net can also be computed by a small network of spiking neurons. This article does not assume prior knowledge about spiking neurons, and it contains an extensive list of references to the currently available literature on computations in networks of spiking neurons and relevant results from neurobiology. © 1997 Elsevier Science Ltd. All rights reserved. Keywords: spiking neuron, integrate-and-fire neuron, computational complexity, sigmoidal neural nets, lower bounds. 1. DEFINITIONS AND MOTIVATIONS. If one classifies neural network models according to their computational units, one can distinguish three different generations. The first generation is based on McCulloch-Pitts neurons as computational units. These are also referred to as perceptrons or threshold gates. They give rise to a variety of neural network models such as multilayer perceptrons (also called threshold circuits), Hopfield nets, and Boltzmann machines. A characteristic feature of these models is that they can only give digital output. In fact they are universal for computations with digital input and output, and every boolean function can be computed by some multilayer perceptron with a single hidden layer. The second generation is based on computational units that apply an "activation function" with a continuous set of possible output values to a weighted sum (or polynomial) of the inputs. Common activation functions are the sigmoid function σ(y) = 1/(1 + e^(-y)) and the linear saturated function π with π(y) = y for 0 ≤ y ≤ 1, π(y) = 0 for y < 0, and π(y) = 1 for y > 1. Besides piecewise polynomial activation functions, this paper also considers "piecewise exponential" activation functions, whose pieces can be defined by expressions involving exponentiation (such as the definition of σ). Typical examples of networks from this second generation are feedforward and recurrent sigmoidal neural nets, as well as networks of radial basis function units. These nets are also able to compute (with the help of thresholding at the network output) arbitrary boolean functions. Actually it has been shown that neural nets from the second generation can compute certain boolean functions with fewer gates than neural nets from the first generation (Maass, Schnitger, & Sontag, 1991; DasGupta & Schnitger, 1993).
In addition, neural nets from the second generation are able to compute functions with analog input and output. In fact they are universal for analog computations in the sense that any continuous function with a compact domain and range can be approximated arbitrarily well (with regard to uniform convergence, i.e., the L∞ norm) by a network of this type with a single hidden layer. Another characteristic feature of this second generation of neural network models is that they support learning algorithms that are based on gradient descent, such as backpropagation.
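To make the contrast between the generations concrete, the sketch below compares a second-generation sigmoidal unit with a third-generation spiking unit modeled as a leaky integrate-and-fire neuron; the membrane parameters and inputs are illustrative assumptions, not values from the article.

```python
import numpy as np

def sigmoid_unit(x, w):
    """Second-generation unit: sigmoid applied to a weighted sum of the inputs."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def lif_spike_times(current, dt=1.0, tau=20.0, v_thresh=1.0, v_reset=0.0):
    """Third-generation unit: leaky integrate-and-fire neuron.
    Returns the time steps at which the membrane potential crosses threshold."""
    v, spikes = 0.0, []
    for t, i_t in enumerate(current):
        v += dt * (-v / tau + i_t)     # leaky integration of the input current
        if v >= v_thresh:
            spikes.append(t)
            v = v_reset                # reset after each spike
    return spikes

x = np.array([0.2, 0.8, -0.4])
w = np.array([1.0, 0.5, 0.3])
print("sigmoid output:", sigmoid_unit(x, w))
print("spike times:", lif_spike_times(np.full(100, 0.06)))
```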

Journal ArticleDOI
TL;DR: An eight-step procedure to design a neural network forecasting model is explained, including a discussion of tradeoffs in parameter selection, some common pitfalls, and points of disagreement among practitioners.

Journal ArticleDOI
TL;DR: A multilayer neural-net (NN) controller for a general serial-link rigid robot arm is developed using a filtered error/passivity approach and novel online weight tuning algorithms guarantee bounded tracking errors as well as bounded NN weights.
Abstract: A multilayer neural-net (NN) controller for a general serial-link rigid robot arm is developed. The structure of the NN controller is derived using a filtered error/passivity approach. No off-line learning phase is needed for the proposed NN controller and the weights are easily initialized. The nonlinear nature of the NN, plus NN functional reconstruction inaccuracies and robot disturbances, mean that the standard delta rule using backpropagation tuning does not suffice for closed-loop dynamic control. Novel online weight tuning algorithms, including correction terms to the delta rule plus an added robust signal, guarantee bounded tracking errors as well as bounded NN weights. Specific bounds are determined, and the tracking error bound can be made arbitrarily small by increasing a certain feedback gain. The correction terms involve a second-order forward-propagated wave in the backpropagation network. New NN properties including the notions of a passive NN, a dissipative NN, and a robust NN are introduced.
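To give a feel for what such online tuning laws look like, here is a representative e-modification-style weight-update rule in the spirit of the scheme described; the notation (F and G as positive-definite gain matrices, r as the filtered tracking error, σ̂ as the hidden-layer output, κ a small design constant) and the exact form of the correction terms are assumptions for illustration, not a transcription of the paper.

```latex
\dot{\hat{W}} = F\,\hat{\sigma}\,r^{\mathsf{T}} - \kappa\,F\,\lVert r\rVert\,\hat{W},
\qquad
\dot{\hat{V}} = G\,x\,\bigl(\hat{\sigma}'^{\mathsf{T}}\hat{W}\,r\bigr)^{\mathsf{T}} - \kappa\,G\,\lVert r\rVert\,\hat{V}
```

The first terms are backpropagation-like delta-rule updates driven by the tracking error; the terms proportional to the norm of r are the robustifying corrections that keep the weight estimates bounded.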

Book
11 Oct 1996
TL;DR: This paper presents a meta-modelling framework that automates the labor-intensive, and therefore time-consuming and expensive, process of supervised learning of neural networks.
Abstract: History of neural networks. Supervised learning: single-layer networks. Supervised learning: multilayer networks I. Supervised learning: multilayer networks II. Unsupervised learning. Associative models. Optimization methods. A little math. Data.

Journal ArticleDOI
TL;DR: The mathematical connection between the Expectation-Maximization (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite gaussian mixtures is built up, and an explicit expression is provided for the projection matrix that relates the gradient to the EM step.
Abstract: We build up the mathematical connection between the “Expectation-Maximization” (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite gaussian mixtures. We show that the EM step in parameter space is obtained from the gradient via a projection matrix P, and we provide an explicit expression for the matrix. We then analyze the convergence of EM in terms of special properties of P and provide new results analyzing the effect that P has on the likelihood surface. Based on these mathematical results, we present a comparative discussion of the advantages and disadvantages of EM and other algorithms for the learning of gaussian mixture models.
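The central relationship can be written as a single update equation: each EM step equals the log-likelihood gradient premultiplied by a positive-definite matrix (a sketch of the form established in the paper, with Θ denoting the mixture parameters and ℓ the log likelihood of the data X):

```latex
\Theta^{(k+1)} = \Theta^{(k)} + P\!\left(\Theta^{(k)}\right)
\left.\frac{\partial \ell(\mathbf{X}\mid\Theta)}{\partial \Theta}\right|_{\Theta=\Theta^{(k)}}
```

Because P is positive definite, each EM iteration moves in an ascent direction of the likelihood, which is what links its convergence behavior to that of gradient methods.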


Book
01 Jan 1996
TL;DR: A review of Linear Algebra, Principal Component Analysis, and VLSI Implementation.
Abstract: A Review of Linear Algebra. Principal Component Analysis. PCA Neural Networks. Channel Noise and Hidden Units. Heteroassociative Models. Signal Enhancement Against Noise. VLSI Implementation. Appendices. Bibliography. Index.

Book
01 Jan 1996
TL;DR: Techniques for building neural networks: introduction what are neural networks?
Abstract: Techniques for building neural networks: introduction what are neural networks? how does neural computing differ from traditional programming? how are neural networks built? how do neural networks learn? what do I need to build an MLP? the neural project life cycle the generalisation-accuracy trade-off implementation details activation and learning equations a simple example: modelling a pendulum. Data encoding and re-coding: introduction data type classification initial statistical calculations dimensionality reduction scaling a data set neural encoding methods temporal data when to carry out re-coding implementation details. Building a network: introduction designing the MLP training neural networks implementation details. Time varying systems: time varying data sets neural networks for predicting or classifying time series choosing the best method for the task predicting more than one step into the future learning separate paths through state space recurrent networks as models of finite state automata summary of temporal neural networks. Data collection and validation: data collection building the training and test sets data quality calculating entropy values for a data set using a forward-inverse model to solve ill-posed problems. Output and error analysis: introduction what do the errors mean? error bars and confidence limits methods for visualising errors novelty detection implementation details a simple two class example unbalanced data: a mail shot targeting example auto-associative network novelty detection training a network on confidence limits an example based on credit rating. Network use and analysis: introduction extracting reasons traversing a network summary calculating the derivatives personnel selection: a worked example. Managing a neural network based project: project context development platform project personnel project costs the benefits of neural computing the risks involved with neural computing alternatives to a neural computing approach project time scale project documentation system maintenance. Review of neural applications: introduction to part II. Neural networks and signal processing: introduction signal processing as data preparation pre-processing techniques for visual processing neural filters in the Fourier and temporal domains speech recognition production quality control an artistic style classifier fingerprint analysis summary. Financial and business modelling. (Part contents).

Journal ArticleDOI
TL;DR: A hybrid method of short-term traffic forecasting is introduced; the KARIMA method, which uses a Kohonen self-organizing map as an initial classifier; each class has an individually tuned ARIMA model associated with it.
Abstract: A hybrid method of short-term traffic forecasting is introduced; the KARIMA method. The technique uses a Kohonen self-organizing map as an initial classifier; each class has an individually tuned ARIMA model associated with it. Using a Kohonen map which is hexagonal in layout eases the problem of defining the classes. The explicit separation of the tasks of classification and functional approximation greatly improves forecasting performance compared to either a single ARIMA model or a backpropagation neural network. The model is demonstrated by producing forecasts of traffic flow, at horizons of half an hour and an hour, for a French motorway. Performance is similar to that exhibited by other layered models, but the number of classes needed is much smaller (typically between two and four). Because the number of classes is small, it is concluded that the algorithm could be easily retrained in order to track long-term changes in traffic flow and should also prove to be readily transferrable.
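A loose sketch of the two-stage KARIMA idea is shown below: classify the recent history of the series into a small number of regimes, then fit and use one ARIMA model per regime. For brevity, k-means stands in for the hexagonal Kohonen self-organizing map, the series is synthetic, and the per-class series construction is simplified; the ARIMA order and window length are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from statsmodels.tsa.arima.model import ARIMA

# Synthetic half-hourly "traffic flow" series (stand-in for the motorway data).
rng = np.random.default_rng(1)
t = np.arange(2000)
flow = 100 + 40 * np.sin(2 * np.pi * t / 48) + 5 * rng.standard_normal(len(t))

# Stage 1: classify each recent-history window into a few regimes.
# (k-means substitutes here for the Kohonen self-organizing map.)
window = 12
histories = np.array([flow[i - window:i] for i in range(window, len(flow))])
classifier = KMeans(n_clusters=3, n_init=10, random_state=0).fit(histories)
classes = classifier.predict(histories)

# Stage 2: fit one individually tuned ARIMA model per class.
models = {}
for c in np.unique(classes):
    idx = np.where(classes == c)[0] + window
    models[c] = ARIMA(flow[idx], order=(1, 0, 1)).fit()

# Forecast: classify the latest window, then forecast with that class's model.
latest = int(classifier.predict(flow[-window:].reshape(1, -1))[0])
print(models[latest].forecast(steps=2))    # e.g. half-hour and one-hour horizons
```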

Journal ArticleDOI
TL;DR: It is shown that the long-term dependencies problem is lessened for a class of architectures called nonlinear autoregressive models with exogenous (NARX) recurrent neural networks, which have powerful representational capabilities.
Abstract: It has previously been shown that gradient-descent learning algorithms for recurrent neural networks can perform poorly on tasks that involve long-term dependencies, i.e. those problems for which the desired output depends on inputs presented at times far in the past. We show that the long-term dependencies problem is lessened for a class of architectures called nonlinear autoregressive models with exogenous (NARX) recurrent neural networks, which have powerful representational capabilities. We have previously reported that gradient descent learning can be more effective in NARX networks than in recurrent neural network architectures that have "hidden states" on problems including grammatical inference and nonlinear system identification. Typically, the network converges much faster and generalizes better than other networks. The results in this paper are consistent with this phenomenon. We present some experimental results which show that NARX networks can often retain information for two to three times as long as conventional recurrent neural networks. We show that although NARX networks do not circumvent the problem of long-term dependencies, they can greatly improve performance on long-term dependency problems. We also describe in detail some of the assumptions regarding what it means to latch information robustly and suggest possible ways to loosen these assumptions.
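Concretely, a NARX model feeds tapped delay lines of both the exogenous input and the network's own output into a feedforward map (this is the standard formulation; the delay orders n_u and n_y are generic symbols, not values from the paper):

```latex
y(t) = \Psi\bigl(u(t-1), \ldots, u(t-n_u),\; y(t-1), \ldots, y(t-n_y)\bigr)
```

Because past outputs re-enter the network through explicit delay taps rather than only through a hidden state, error gradients have short paths back to earlier time steps, which is one intuition for the improved behavior on long-term dependency problems.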

Journal ArticleDOI
TL;DR: Fundamentals of Artificial Neural Networks provides the first systematic account of artificial neural network paradigms by identifying clearly the fundamental concepts and major methodologies underlying most of the current theory and practice employed by neural network researchers.
Abstract: From the Publisher: As book review editor of the IEEE Transactions on Neural Networks, Mohamad Hassoun has had the opportunity to assess the multitude of books on artificial neural networks that have appeared in recent years. Now, in Fundamentals of Artificial Neural Networks, he provides the first systematic account of artificial neural network paradigms by identifying clearly the fundamental concepts and major methodologies underlying most of the current theory and practice employed by neural network researchers. Such a systematic and unified treatment, although sadly lacking in most recent texts on neural networks, makes the subject more accessible to students and practitioners. Here, important results are integrated in order to more fully explain a wide range of existing empirical observations and commonly used heuristics. There are numerous illustrative examples, over 200 end-of-chapter analytical and computer-based problems that will aid in the development of neural network analysis and design skills, and a bibliography of nearly 700 references. Proceeding in a clear and logical fashion, the first two chapters present the basic building blocks and concepts of artificial neural networks and analyze the computational capabilities of the basic network architectures involved. Supervised, reinforcement, and unsupervised learning rules in simple nets are brought together in a common framework in chapter three. The convergence and solution properties of these learning rules are then treated mathematically in chapter four, using the "average learning equation" analysis approach. This organization of material makes it natural to switch into learning multilayer nets using backprop and its variants, described in chapter five. Chapter six covers most of the major neural network paradigms, while associative memories and energy minimizing nets are given detailed coverage in the next chapter. The final chapter takes up Boltzmann machines and Boltzmann learning along with other global search/optimization algorithms such as stochastic gradient search, simulated annealing, and genetic algorithms.

Journal ArticleDOI
TL;DR: This paper focuses on data selection and classifier training methods, in order to 'prepare' classifiers for combining, and discusses several methods that make the classifiers in an ensemble more complementary.
Abstract: Using an ensemble of classifiers, instead of a single classifier, can lead to improved generalization. The gains obtained by combining, however, are often affected more by the selection of what is presented to the combiner than by the actual combining method that is chosen. In this paper, we focus on data selection and classifier training methods, in order to 'prepare' classifiers for combining. We review a combining framework for classification problems that quantifies the need for reducing the correlation among individual classifiers. Then, we discuss several methods that make the classifiers in an ensemble more complementary. Experimental results are provided to illustrate the benefits and pitfalls of reducing the correlation among classifiers, especially when the training data are in limited supply.
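The flavor of the data-selection step can be seen in a short sketch: train each ensemble member on its own bootstrap sample so that their errors are less correlated, measure the pairwise error correlation, and combine by averaging. Decision trees and the synthetic dataset are stand-ins chosen for brevity, not the classifiers or data used in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, y_tr, X_te, y_te = X[:400], y[:400], X[400:], y[400:]

rng = np.random.default_rng(0)
members, errors = [], []
for _ in range(5):
    idx = rng.integers(0, len(X_tr), len(X_tr))            # bootstrap resample
    clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr[idx], y_tr[idx])
    members.append(clf)
    errors.append(clf.predict(X_te) != y_te)               # per-example error indicators

# Lower pairwise error correlation means more complementary classifiers.
corr = np.corrcoef(np.array(errors, dtype=float))
print("mean pairwise error correlation:", corr[np.triu_indices(5, k=1)].mean())

# Combine by averaging the members' class-probability estimates.
avg_proba = np.mean([m.predict_proba(X_te) for m in members], axis=0)
print("ensemble accuracy:", (avg_proba.argmax(axis=1) == y_te).mean())
```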

Proceedings ArticleDOI
03 Jun 1996
TL;DR: A connectionist architecture together with a novel supervised learning scheme which is capable of solving inductive inference tasks on complex symbolic structures of arbitrary size is presented.
Abstract: While neural networks are very successfully applied to the processing of fixed-length vectors and variable-length sequences, the current state of the art does not allow the efficient processing of structured objects of arbitrary shape (like logical terms, trees or graphs). We present a connectionist architecture together with a novel supervised learning scheme which is capable of solving inductive inference tasks on complex symbolic structures of arbitrary size. The most general structures that can be handled are labeled directed acyclic graphs. The major difference of our approach compared to others is that the structure-representations are exclusively tuned for the intended inference task. Our method is applied to tasks consisting in the classification of logical terms. These range from the detection of a certain subterm to the satisfaction of a specific unification pattern. Compared to previously known approaches we obtained superior results in that domain.

Journal ArticleDOI
TL;DR: In this paper, a series of numerical experiments, in which flow data were generated from synthetic storm sequences routed through a conceptual hydrological model consisting of a single nonlinear reservoir, has demonstrated the closeness of fit that can be achieved to such data sets using ANNs.
Abstract: A series of numerical experiments, in which flow data were generated from synthetic storm sequences routed through a conceptual hydrological model consisting of a single nonlinear reservoir, has demonstrated the closeness of fit that can be achieved to such data sets using Artificial Neural Networks (ANNs). The application of different standardization factors to both training and verification sequences has underlined the importance of such factors to network performance. Trials with both one and two hidden layers in the ANN have shown that, although improved performances are achieved with the extra hidden layer, the additional computational effort does not appear justified for data sets exhibiting the degree of nonlinear behaviour typical of rainfall and flow sequences from many catchment areas.
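A small numerical sketch of this experimental setup is given below: synthetic storm inputs are routed through a single nonlinear reservoir to generate flows, the data are standardized, and an ANN is fitted to predict the next flow value from recent rainfall and flow. The reservoir parameters, lag length, and network size are illustrative assumptions, not those used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Route a synthetic storm sequence through a single nonlinear reservoir:
# storage update S <- S + i - Q with outflow Q = k * S**m (assumed parameters).
rng = np.random.default_rng(0)
rain = rng.gamma(0.3, 4.0, 500)            # spiky synthetic storm inputs
k, m, S = 0.05, 1.5, 0.0
flow = []
for i in rain:
    Q = k * S ** m
    S = max(S + i - Q, 0.0)
    flow.append(Q)
flow = np.array(flow)

# Standardize, then fit an ANN mapping recent rainfall/flow history to the next flow.
lag = 3
X = np.array([np.r_[rain[t - lag:t], flow[t - lag:t]] for t in range(lag, len(rain))])
y = flow[lag:]
Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)
ys = (y - y.mean()) / (y.std() + 1e-9)

net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=3000, random_state=0)
net.fit(Xs[:400], ys[:400])
print("verification R^2:", net.score(Xs[400:], ys[400:]))
```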

Journal ArticleDOI
TL;DR: It is proved that with or without such knowledge both adaptive schemes can "learn" how to control the plant, provide for bounded internal signals, and achieve asymptotically stable tracking of a reference input.
Abstract: Stable direct and indirect adaptive controllers are presented, which use Takagi-Sugeno fuzzy systems, conventional fuzzy systems, or a class of neural networks to provide asymptotic tracking of a reference signal for a class of continuous-time nonlinear plants with poorly understood dynamics. The indirect adaptive scheme allows for the inclusion of a priori knowledge about the plant dynamics in terms of exact mathematical equations or linguistics, while the direct adaptive scheme allows for the incorporation of such a priori knowledge in specifying the controller. We prove that with or without such knowledge both adaptive schemes can "learn" how to control the plant, provide for bounded internal signals, and achieve asymptotically stable tracking of a reference input. In addition, for the direct adaptive scheme a technique is presented in which linguistic knowledge of the inverse dynamics of the plant may be used to accelerate adaptation. The performance of the indirect and direct adaptive schemes is demonstrated through the longitudinal control of an automobile within an automated lane.

Journal ArticleDOI
TL;DR: The literature review presented discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks.
Abstract: The literature review presented discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the different methodological communities, such as Bayesian, description length, and classical statistics. Basic concepts for learning and Bayesian networks are introduced and methods are then reviewed. Methods are discussed for learning parameters of a probabilistic network, for learning the structure, and for learning hidden variables. The article avoids formal definitions and theorems, as these are plentiful in the literature, and instead illustrates key concepts with simplified examples.
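As a toy illustration of the parameter-learning step covered in the review, the sketch below estimates one conditional probability table of a discrete Bayesian network by counting cases and adding a Dirichlet pseudo-count; the sprinkler example, binary variables, known structure, and pseudo-count value are all illustrative assumptions.

```python
from collections import Counter
import itertools

# Assumed structure: Sprinkler and Rain are the parents of WetGrass.
data = [
    {"Sprinkler": 0, "Rain": 1, "WetGrass": 1},
    {"Sprinkler": 1, "Rain": 0, "WetGrass": 1},
    {"Sprinkler": 0, "Rain": 0, "WetGrass": 0},
    {"Sprinkler": 1, "Rain": 1, "WetGrass": 1},
    {"Sprinkler": 0, "Rain": 1, "WetGrass": 1},
]

def learn_cpt(child, parents, records, alpha=1.0):
    """Estimate P(child = 1 | parents) from counts, smoothed by a Dirichlet
    pseudo-count alpha (alpha = 1 gives Laplace smoothing)."""
    counts = Counter((tuple(r[p] for p in parents), r[child]) for r in records)
    cpt = {}
    for parent_vals in itertools.product([0, 1], repeat=len(parents)):
        n1 = counts[(parent_vals, 1)] + alpha
        n0 = counts[(parent_vals, 0)] + alpha
        cpt[parent_vals] = n1 / (n0 + n1)
    return cpt

print(learn_cpt("WetGrass", ["Sprinkler", "Rain"], data))
```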

Patent
25 Jul 1996
TL;DR: The simultaneous multi access reasoning technology system, as discussed by the authors, utilizes both existing knowledge and implicit information that can be numerically extracted from training data to provide a method and apparatus for diagnosing disease and treating a patient.
Abstract: The simultaneous multi access reasoning technology system of the present invention utilizes both existing knowledge and implicit information that can be numerically extracted from training data to provide a method and apparatus for diagnosing disease and treating a patient. This technology further comprises a system for receiving patient data from another location, analyzing the data in a trained neural network, producing a diagnostic value, and optionally transmitting the diagnostic value to another location.

Journal ArticleDOI
TL;DR: The results indicate that customized neural networks offer a very promising avenue for building effective generic models when the measure of performance is the percentage of bad loans correctly classified, although logistic regression models are comparable to the neural network approach.

Proceedings Article
03 Jul 1996
TL;DR: The results show that the method can decrease the computational complexity of the decision rule by a factor of ten with no loss in generalization performance, making the SVM test speed competitive with that of other methods.
Abstract: A Support Vector Machine (SVM) is a universal learning machine whose decision surface is parameterized by a set of support vectors and by a set of corresponding weights. An SVM is also characterized by a kernel function. Choice of the kernel determines whether the resulting SVM is a polynomial classifier, a two-layer neural network, a radial basis function machine, or some other learning machine. SVMs are currently considerably slower in the test phase than other approaches with similar generalization performance. To address this, we present a general method to significantly decrease the complexity of the decision rule obtained using an SVM. The proposed method computes an approximation to the decision rule in terms of a reduced set of vectors. These reduced set vectors are not support vectors and can in some cases be computed analytically. We give experimental results for three pattern recognition problems. The results show that the method can decrease the computational complexity of the decision rule by a factor of ten with no loss in generalization performance, making the SVM test speed competitive with that of other methods. Further, the method allows the generalization performance/complexity trade-off to be directly controlled. The proposed method is not specific to pattern recognition and can be applied to any problem where the Support Vector algorithm is used, for example regression. 1. INTRODUCTION: SUPPORT VECTOR MACHINES. Consider a two-class classifier for which the decision rule takes the form
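The reduced-set idea itself can be stated compactly: the SVM weight vector in feature space, an expansion over the N_s support vectors, is approximated by a much smaller expansion over N_z new vectors chosen to minimize the feature-space approximation error. The following is the standard formulation in common SVM notation (a sketch, not a verbatim transcription of the paper):

```latex
\Psi = \sum_{i=1}^{N_s} \alpha_i y_i \Phi(\mathbf{x}_i)
\;\approx\;
\Psi' = \sum_{j=1}^{N_z} \beta_j \Phi(\mathbf{z}_j),
\qquad
\min_{\{\beta_j,\;\mathbf{z}_j\}} \lVert \Psi - \Psi' \rVert^2
```

The squared norm expands entirely in terms of kernel evaluations K(·,·), so no explicit feature map is needed; with N_z much smaller than N_s, the test-phase decision rule requires only N_z kernel evaluations per pattern instead of N_s.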