Author

Hava T. Siegelmann

Bio: Hava T. Siegelmann is an academic researcher from the University of Massachusetts Amherst. The author has contributed to research in the topics of artificial neural networks and recurrent neural networks. The author has an h-index of 34 and has co-authored 172 publications receiving 7,092 citations. Previous affiliations of Hava T. Siegelmann include Harvard University and the Technion – Israel Institute of Technology.


Papers
Journal ArticleDOI
TL;DR: In this paper, a clustering method based on support vector machines is proposed: data points are mapped by a Gaussian kernel to a high-dimensional feature space, where the minimal enclosing sphere is sought; mapped back to data space, this sphere can separate into several components, each enclosing a separate cluster of points.
Abstract: We present a novel clustering method using the approach of support vector machines. Data points are mapped by means of a Gaussian kernel to a high-dimensional feature space, where we search for the minimal enclosing sphere. This sphere, when mapped back to data space, can separate into several components, each enclosing a separate cluster of points. We present a simple algorithm for identifying these clusters. The width of the Gaussian kernel controls the scale at which the data is probed, while the soft margin constant helps cope with outliers and overlapping clusters. The structure of a dataset is explored by varying the two parameters, maintaining a minimal number of support vectors to assure smooth cluster boundaries. We demonstrate the performance of our algorithm on several datasets.

1,389 citations
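As a concrete illustration of the algorithm described above, the following sketch fits the minimal enclosing sphere using scikit-learn's OneClassSVM (which, for an RBF kernel, is equivalent to the enclosing-sphere formulation) and then labels clusters by testing whether the straight segment between two points stays inside the sphere, in the spirit of the paper's cluster-labeling step. The helper name support_vector_clustering, the synthetic data, and all parameter values are illustrative assumptions, not the authors' code.

import numpy as np
from sklearn.svm import OneClassSVM
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def support_vector_clustering(X, gamma=1.0, nu=0.1, n_interp=10):
    # Fit the sphere in feature space; decision_function(x) >= 0 means x lies
    # inside the sphere when mapped back to data space.
    svm = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(X)

    n = len(X)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            # Two points share a cluster if the segment between them never
            # leaves the sphere (sampled at n_interp points).
            ts = np.linspace(0.0, 1.0, n_interp)[:, None]
            segment = (1 - ts) * X[i] + ts * X[j]
            adj[i, j] = adj[j, i] = np.all(svm.decision_function(segment) >= 0)

    # Clusters are the connected components of this adjacency graph.
    _, labels = connected_components(csr_matrix(adj), directed=False)
    return labels

# Usage: two well-separated blobs should come out as two clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
print(support_vector_clustering(X, gamma=2.0, nu=0.1))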

Journal ArticleDOI
TL;DR: It is proved that finite recurrent neural nets with rational weights can simulate all Turing machines, and any multi-stack Turing machine in real time; in particular, there is a net of 886 processors that computes a universal partial recursive function.

837 citations

Journal ArticleDOI
01 Apr 1997
TL;DR: It is constructively proved that the NARX networks with a finite number of parameters are computationally as strong as fully connected recurrent networks and thus Turing machines, raising the issue of what amount of feedback or recurrence is necessary for any network to be Turing equivalent and what restrictions on feedback limit computational power.
Abstract: Recently, fully connected recurrent neural networks have been proven to be computationally rich: at least as powerful as Turing machines. This work focuses on another network which is popular in control applications and has been found to be very effective at learning a variety of problems. These networks are based upon Nonlinear AutoRegressive models with eXogenous Inputs (NARX models), and are therefore called NARX networks. As opposed to other recurrent networks, NARX networks have a limited feedback which comes only from the output neuron rather than from hidden states. They are formalized by y(t) = Ψ(u(t−n_u), ..., u(t−1), u(t), y(t−n_y), ..., y(t−1)), where u(t) and y(t) represent the input and output of the network at time t, n_u and n_y are the input and output orders, and the function Ψ is the mapping performed by a multilayer perceptron. We constructively prove that NARX networks with a finite number of parameters are computationally as strong as fully connected recurrent networks, and thus as Turing machines. We conclude that, in theory, one can use NARX models rather than conventional recurrent networks without any computational loss, even though their feedback is limited. Furthermore, these results raise the issue of what amount of feedback or recurrence is necessary for any network to be Turing equivalent, and what restrictions on feedback limit computational power.

462 citations
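A minimal sketch of the NARX recurrence defined above, with Ψ realized as a one-hidden-layer MLP. The weights are random here, purely to show the dataflow in which feedback comes only from past outputs rather than hidden states; the function names and layer sizes are illustrative assumptions.

import numpy as np

def mlp(z, W1, b1, W2, b2):
    # Psi: a one-hidden-layer perceptron with tanh units.
    return np.tanh(W2 @ np.tanh(W1 @ z + b1) + b2)

def narx_run(u, n_u=2, n_y=2, hidden=8, seed=0):
    rng = np.random.default_rng(seed)
    d_in = (n_u + 1) + n_y                # u(t-n_u..t) plus y(t-n_y..t-1)
    W1 = rng.normal(size=(hidden, d_in)); b1 = rng.normal(size=hidden)
    W2 = rng.normal(size=hidden);         b2 = rng.normal()

    y = np.zeros(len(u))
    for t in range(len(u)):
        # Build the tapped-delay windows; out-of-range taps are zero-padded.
        u_win = [u[t - k] if t - k >= 0 else 0.0 for k in range(n_u, -1, -1)]
        y_win = [y[t - k] if t - k >= 1 else 0.0 for k in range(n_y, 0, -1)]
        # Feedback enters only through the past outputs y(t-n_y)..y(t-1).
        y[t] = mlp(np.array(u_win + y_win), W1, b1, W2, b2)
    return y

print(narx_run(np.sin(np.linspace(0, 6, 50))))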

Book
01 Mar 1999
TL;DR: This book relates neural networks to automata and Turing machines, including the construction of networks from the explicit specification of a discrete-time Turing machine, and extends the analysis to real weights, stochastic dynamics, and computation beyond the Turing limit.
Abstract: Table of contents:
1 Computational Complexity
  1.1 Neural Networks
  1.2 Automata: A General Introduction
    1.2.1 Input Sets in Computability Theory
  1.3 Finite Automata
    1.3.1 Neural Networks and Finite Automata
  1.4 The Turing Machine
    1.4.1 Neural Networks and Turing Machines
  1.5 Probabilistic Turing Machines
    1.5.1 Neural Networks and Probabilistic Machines
  1.6 Nondeterministic Turing Machines
    1.6.1 Nondeterministic Neural Networks
  1.7 Oracle Turing Machines
    1.7.1 Neural Networks and Oracle Machines
  1.8 Advice Turing Machines
    1.8.1 Circuit Families
    1.8.2 Neural Networks and Advice Machines
  1.9 Notes
2 The Model
  2.1 Variants of the Network
    2.1.1 A "System Diagram" Interpretation
  2.2 The Network's Computation
  2.3 Integer Weights
3 Networks with Rational Weights
  3.1 The Turing Equivalence Theorem
  3.2 Highlights of the Proof
    3.2.1 Cantor-like Encoding of Stacks
    3.2.2 Stack Operations
    3.2.3 General Construction of the Network
  3.3 The Simulation
    3.3.1 P-Stack Machines
  3.4 Network with Four Layers
    3.4.1 A Layout of the Construction
  3.5 Real-Time Simulation
    3.5.1 Computing in Two Layers
    3.5.2 Removing the Sigmoid from the Main Layer
    3.5.3 One-Layer Network Simulates TM
  3.6 Inputs and Outputs
  3.7 Universal Network
  3.8 Nondeterministic Computation
4 Networks with Real Weights
  4.1 Simulating Circuit Families
    4.1.1 The Circuit Encoding
    4.1.2 A Circuit Retrieval
    4.1.3 Circuit Simulation by a Network
    4.1.4 The Combined Network
  4.2 Network Simulation by Circuits
    4.2.1 Linear Precision Suffices
    4.2.2 The Network Simulation by a Circuit
  4.3 Networks versus Threshold Circuits
  4.4 Corollaries
5 Kolmogorov Weights: Between P and P/poly
  5.1 Kolmogorov Complexity and Reals
  5.2 Tally Oracles and Neural Networks
  5.3 Kolmogorov Weights and Advice Classes
  5.4 The Hierarchy Theorem
6 Space and Precision
  6.1 Equivalence of Space and Precision
  6.2 Fixed Precision Variable Sized Nets
7 Universality of Sigmoidal Networks
  7.1 Alarm Clock Machines
    7.1.1 Adder Machines
    7.1.2 Alarm Clock and Adder Machines
  7.2 Restless Counters
  7.3 Sigmoidal Networks Are Universal
    7.3.1 Correctness of the Simulation
  7.4 Conclusions
8 Different-Limits Networks
  8.1 At Least Finite Automata
  8.2 Proof of the Interpolation Lemma
9 Stochastic Dynamics
  9.1 Stochastic Networks
    9.1.1 The Model
  9.2 The Main Results
    9.2.1 Integer Networks
    9.2.2 Rational Networks
    9.2.3 Real Networks
  9.3 Integer Stochastic Networks
  9.4 Rational Stochastic Networks
    9.4.1 Rational Set of Choices
    9.4.2 Real Set of Choices
  9.5 Real Stochastic Networks
  9.6 Unreliable Networks
  9.7 Nondeterministic Stochastic Networks
10 Generalized Processor Networks
  10.1 Generalized Networks: Definition
  10.2 Bounded Precision
  10.3 Equivalence with Neural Networks
  10.4 Robustness
11 Analog Computation
  11.1 Discrete Time Models
  11.2 Continuous Time Models
  11.3 Hybrid Models
  11.4 Dissipative Models
12 Computation Beyond the Turing Limit
  12.1 The Analog Shift Map
  12.2 Analog Shift and Computation
  12.3 Physical Relevance
  12.4 Conclusions

407 citations
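The Turing-equivalence proof outlined in Chapter 3 rests on the Cantor-like encoding of stacks (Section 3.2.1): a binary stack is stored as a single rational in [0, 1) whose base-4 digits are 1 (encoding bit 0) and 3 (encoding bit 1), so push, pop, and top each reduce to an affine map plus a threshold, exactly the operations a saturated-linear neuron computes. The sketch below illustrates that encoding with exact rationals rather than neurons; it is a didactic reconstruction, not the book's construction verbatim.

from fractions import Fraction

def push(q, bit):
    # Prepend the digit 2*bit + 1 in base 4 (1 encodes 0, 3 encodes 1).
    return q / 4 + Fraction(2 * bit + 1, 4)

def top(q):
    # The top bit is 1 iff q >= 3/4 (leading digit 3); an empty stack is q == 0.
    return 1 if q >= Fraction(3, 4) else 0

def pop(q):
    # Strip the leading digit: an affine map using the threshold above.
    return 4 * q - (2 * top(q) + 1)

q = Fraction(0)                # empty stack
for b in [1, 0, 1]:
    q = push(q, b)
print(top(q), q)               # top is 1 (the last bit pushed), q = 55/64
q = pop(q)
print(top(q), q)               # now 0, q = 7/16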

Journal ArticleDOI
TL;DR: The existence of a finite neural network of sigmoidal neurons that simulates a universal Turing machine is shown; the network is composed of fewer than 10^5 synchronously evolving processors, interconnected linearly.

388 citations


Cited by
Book
18 Nov 2016
TL;DR: Deep learning, as presented in this book, is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and video games.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Journal ArticleDOI
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning and evolutionary computation, and indirect search for short programs encoding deep and large networks.

14,635 citations

Book ChapterDOI
01 Jan 2014
TL;DR: This chapter provides an overview of the fundamentals of algorithms and their links to self-organization, exploration, and exploitation.
Abstract: Algorithms are important tools for solving problems computationally. All computation involves algorithms, and the efficiency of an algorithm largely determines its usefulness. This chapter provides an overview of the fundamentals of algorithms and their links to self-organization, exploration, and exploitation. A brief history of recent nature-inspired algorithms for optimization is outlined in this chapter.

8,285 citations

Journal ArticleDOI
TL;DR: In this article, a Support Vector Machine (SVM) method based on recursive feature elimination (RFE) was proposed to select a small subset of genes from broad patterns of gene expression data, recorded on DNA micro-arrays.
Abstract: DNA micro-arrays now permit scientists to screen thousands of genes simultaneously and determine whether those genes are active, hyperactive or silent in normal or cancerous tissue. Because these new micro-array devices generate bewildering amounts of raw data, new analytical methods must be developed to sort out whether cancer tissues have distinctive signatures of gene expression over normal tissues or other types of cancer tissues. In this paper, we address the problem of selection of a small subset of genes from broad patterns of gene expression data, recorded on DNA micro-arrays. Using available training examples from cancer and normal patients, we build a classifier suitable for genetic diagnosis, as well as drug discovery. Previous attempts to address this problem select genes with correlation techniques. We propose a new method of gene selection utilizing Support Vector Machine methods based on Recursive Feature Elimination (RFE). We demonstrate experimentally that the genes selected by our techniques yield better classification performance and are biologically relevant to cancer. In contrast with the baseline method, our method eliminates gene redundancy automatically and yields better and more compact gene subsets. In patients with leukemia our method discovered 2 genes that yield zero leave-one-out error, while 64 genes are necessary for the baseline method to get the best result (one leave-one-out error). In the colon cancer database, using only 4 genes our method is 98% accurate, while the baseline method is only 86% accurate.

7,939 citations
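In the spirit of the paper's SVM-RFE procedure, the sketch below uses scikit-learn's RFE wrapper around a linear SVM: at each iteration the features with the smallest-magnitude weights in the SVM's weight vector are dropped and the classifier is refit. The synthetic data stands in for the paper's gene-expression matrices; the sample and feature counts are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# 200 "patients" by 500 "genes", only 10 of which are informative.
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=10, n_redundant=0, random_state=0)

# Drop the lowest-|w| 10% of remaining features per iteration, then refit:
# the RFE ranking criterion with a linear SVM as the estimator.
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=10,
               step=0.1).fit(X, y)

print("selected feature indices:", list(selector.get_support(indices=True)))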

Journal ArticleDOI
TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications are illustrated in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts.
Abstract: Data analysis plays an indispensable role in understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. This diversity, on the one hand, equips us with many tools; on the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, namely proximity measures and cluster validation, are also discussed.

5,744 citations