scispace - formally typeset
Open accessPosted Content

Neural Production Systems

Abstract: Visual environments are structured, consisting of distinct objects or entities. These entities have properties -- both visible and latent -- that determine the manner in which they interact with one another. To partition images into entities, deep-learning researchers have proposed structural inductive biases such as slot-based architectures. To model interactions among entities, equivariant graph neural nets (GNNs) are used, but these are not particularly well suited to the task for two reasons. First, GNNs do not predispose interactions to be sparse, as relationships among independent entities are likely to be. Second, GNNs do not factorize knowledge about interactions in an entity-conditional manner. As an alternative, we take inspiration from cognitive science and resurrect a classic approach, production systems, which consist of a set of rule templates that are applied by binding placeholder variables in the rules to specific entities. Rules are scored on their match to entities, and the best fitting rules are applied to update entity properties. In a series of experiments, we demonstrate that this architecture achieves a flexible, dynamic flow of control and serves to factorize entity-specific and rule-based information. This disentangling of knowledge achieves robust future-state prediction in rich visual environments, outperforming state-of-the-art methods using GNNs, and allows for the extrapolation from simple (few object) environments to more complex environments.

... read more


9 results found

Open accessPosted Content
Dianbo Liu1, Alex Lamb2, Kenji Kawaguchi3, Anirudh Goyal2  +3 moreInstitutions (4)
06 Jul 2021-arXiv: Learning
Abstract: Deep learning has advanced from fully connected architectures to structured models organized into components, e.g., the transformer composed of positional elements, modular architectures divided into slots, and graph neural nets made up of nodes. In structured models, an interesting question is how to conduct dynamic and possibly sparse communication among the separate components. Here, we explore the hypothesis that restricting the transmitted information among components to discrete representations is a beneficial bottleneck. The motivating intuition is human language in which communication occurs through discrete symbols. Even though individuals have different understandings of what a "cat" is based on their specific experiences, the shared discrete token makes it possible for communication among individuals to be unimpeded by individual differences in internal representation. To discretize the values of concepts dynamically communicated among specialist components, we extend the quantization mechanism from the Vector-Quantized Variational Autoencoder to multi-headed discretization with shared codebooks and use it for discrete-valued neural communication (DVNC). Our experiments show that DVNC substantially improves systematic generalization in a variety of architectures -- transformers, modular architectures, and graph neural networks. We also show that the DVNC is robust to the choice of hyperparameters, making the method very useful in practice. Moreover, we establish a theoretical justification of our discretization process, proving that it has the ability to increase noise robustness and reduce the underlying dimensionality of the model.

... read more

Topics: Deep learning (55%), Autoencoder (54%), Graph (abstract data type) (54%) ... show more

2 Citations

Open accessPosted Content
28 Oct 2021-arXiv: Robotics
Abstract: Machine learning has long since become a keystone technology, accelerating science and applications in a broad range of domains. Consequently, the notion of applying learning methods to a particular problem set has become an established and valuable modus operandi to advance a particular field. In this article we argue that such an approach does not straightforwardly extended to robotics -- or to embodied intelligence more generally: systems which engage in a purposeful exchange of energy and information with a physical environment. In particular, the purview of embodied intelligent agents extends significantly beyond the typical considerations of main-stream machine learning approaches, which typically (i) do not consider operation under conditions significantly different from those encountered during training; (ii) do not consider the often substantial, long-lasting and potentially safety-critical nature of interactions during learning and deployment; (iii) do not require ready adaptation to novel tasks while at the same time (iv) effectively and efficiently curating and extending their models of the world through targeted and deliberate actions. In reality, therefore, these limitations result in learning-based systems which suffer from many of the same operational shortcomings as more traditional, engineering-based approaches when deployed on a robot outside a well defined, and often narrow operating envelope. Contrary to viewing embodied intelligence as another application domain for machine learning, here we argue that it is in fact a key driver for the advancement of machine learning technology. In this article our goal is to highlight challenges and opportunities that are specific to embodied intelligence and to propose research directions which may significantly advance the state-of-the-art in robot learning.

... read more

Topics: Robot learning (67%), Intelligent agent (52%), Embodied cognition (50%)

Open accessPosted Content
Abstract: Artificial Intelligence has been developed for decades with the achievement of great progress. Recently, deep learning shows its ability to solve many real world problems, e.g. image classification and detection, natural language processing, playing GO. Theoretically speaking, an artificial neural network can fit any function and reinforcement learning can learn from any delayed reward. But in solving real world tasks, we still need to spend a lot of effort to adjust algorithms to fit task unique features. This paper proposes that the reason of this phenomenon is the sparse feedback feature of the nature, and a single algorithm, no matter how we improve it, can only solve dense feedback tasks or specific sparse feedback tasks. This paper first analyses how sparse feedback affects algorithm perfomance, and then proposes a pattern that explains how to accumulate knowledge to solve sparse feedback problems.

... read more

Topics: Reinforcement learning (55%), Deep learning (54%), Artificial neural network (52%) ... show more

Open accessPosted Content
12 Oct 2021-arXiv: Learning
Abstract: Modern neural network architectures can leverage large amounts of data to generalize well within the training distribution. However, they are less capable of systematic generalization to data drawn from unseen but related distributions, a feat that is hypothesized to require compositional reasoning and reuse of knowledge. In this work, we present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules, which we call \emph{functions}. Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. The proposed architecture can flexibly compose computation along width and depth, and lends itself well to capacity extension after training. To demonstrate the versatility of Neural Interpreters, we evaluate it in two distinct settings: image classification and visual abstract reasoning on Raven Progressive Matrices. In the former, we show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner. In the latter, we find that Neural Interpreters are competitive with respect to the state-of-the-art in terms of systematic generalization

... read more

Topics: Artificial neural network (53%), Inference (51%), Generalization (50%)

Open accessPosted Content
Nuri Cingillioglu1, Alessandra Russo1Institutions (1)
14 Jun 2021-arXiv: Learning
Abstract: Humans have the ability to seamlessly combine low-level visual input with high-level symbolic reasoning often in the form of recognising objects, learning relations between them and applying rules. Neuro-symbolic systems aim to bring a unifying approach to connectionist and logic-based principles for visual processing and abstract reasoning respectively. This paper presents a complete neuro-symbolic method for processing images into objects, learning relations and logical rules in an end-to-end fashion. The main contribution is a differentiable layer in a deep learning architecture from which symbolic relations and rules can be extracted by pruning and thresholding. We evaluate our model using two datasets: subgraph isomorphism task for symbolic rule learning and an image classification domain with compound relations for learning objects, relations and rules. We demonstrate that our model scales beyond state-of-the-art symbolic learners and outperforms deep relational neural network architectures.

... read more

Topics: Deep learning (55%), Artificial neural network (54%), Connectionism (52%) ... show more


72 results found

Open accessProceedings Article
01 Jan 2015-
Abstract: Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consists of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.

... read more

15,992 Citations

Open accessPosted Content
Ashish Vaswani1, Noam Shazeer1, Niki Parmar2, Jakob Uszkoreit1  +4 moreInstitutions (2)
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

... read more

6,974 Citations

Open accessJournal ArticleDOI: 10.1080/01621459.1971.10482356
William M. Rand1Institutions (1)
Abstract: Many intuitively appealing methods have been suggested for clustering data, however, interpretation of their results has been hindered by the lack of objective criteria. This article proposes several criteria which isolate specific aspects of the performance of a method, such as its retrieval of inherent structure, its sensitivity to resampling and the stability of its results in the light of new data. These criteria depend on a measure of similarity between two different clusterings of the same set of data; the measure essentially considers how each pair of data points is assigned in each clustering.

... read more

Topics: Cluster analysis (62%), Rand index (56%)

5,541 Citations

Journal ArticleDOI: 10.1162/NECO.1991.3.1.79
01 Mar 1991-Neural Computation
Abstract: We present a new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases. The new procedure can be viewed either as a modular version of a multilayer supervised network, or as an associative version of competitive learning. It therefore provides a new link between these two apparently different approaches. We demonstrate that the learning procedure divides up a vowel discrimination task into appropriate subtasks, each of which can be solved by a very simple expert network.

... read more

Topics: Semi-supervised learning (61%), Supervised learning (61%), Competitive learning (60%) ... show more

3,861 Citations

Open accessJournal ArticleDOI: 10.1109/TNN.2008.2005605
Franco Scarselli1, Marco Gori1, Ah Chung Tsoi2, Markus Hagenbuchner3  +1 moreInstitutions (3)
Abstract: Many underlying relationships among data in several areas of science and engineering, e.g., computer vision, molecular chemistry, molecular biology, pattern recognition, and data mining, can be represented in terms of graphs. In this paper, we propose a new neural network model, called graph neural network (GNN) model, that extends existing neural network methods for processing the data represented in graph domains. This GNN model, which can directly process most of the practically useful types of graphs, e.g., acyclic, cyclic, directed, and undirected, implements a function tau(G,n) isin IRm that maps a graph G and one of its nodes n into an m-dimensional Euclidean space. A supervised learning algorithm is derived to estimate the parameters of the proposed GNN model. The computational cost of the proposed algorithm is also considered. Some experimental results are shown to validate the proposed learning algorithm, and to demonstrate its generalization capabilities.

... read more

Topics: Graph (abstract data type) (65%), Directed graph (62%), Moral graph (62%) ... show more

3,082 Citations

No. of citations received by the Paper in previous years
Network Information
Related Papers (5)
A probabilistic model for entity disambiguation using relationships01 Jan 2004

Dmitri V. Kalashnikov, Sharad Mehrotra

75% related
Inductive Biases for Deep Learning of Higher-Level Cognition30 Nov 2020, arXiv: Learning

Anirudh Goyal, Yoshua Bengio

75% related
From Entities to Geometry: Towards exploiting Multiple Sources to Predict Relevance01 Jan 2010

E. Di Buccio, Mounia Lalmas +1 more

73% related