Showing papers by "James L. McClelland" published in 2022


Proceedings ArticleDOI
05 Apr 2022
TL;DR: This work investigates whether explanations of few-shot examples can help in-context learning of large LMs on challenging tasks and finds that explanations can improve performance, even without tuning.
Abstract: Language Models (LMs) can perform new tasks by adapting to a few in-context examples. For humans, explanations that connect examples to task principles can improve learning. We therefore investigate whether explanations of few-shot examples can help LMs. We annotate questions from 40 challenging tasks with answer explanations, and various matched control explanations. We evaluate how different types of explanations, instructions, and controls affect zero- and few-shot performance. We analyze these results using statistical multilevel modeling techniques that account for the nested dependencies among conditions, tasks, prompts, and models. We find that explanations can improve performance -- even without tuning. Furthermore, explanations hand-tuned for performance on a small validation set offer substantially larger benefits, and building a prompt by selecting examples and explanations together substantially improves performance over selecting examples alone. Finally, even untuned explanations outperform carefully matched controls, suggesting that the benefits are due to the link between an example and its explanation, rather than lower-level features. However, only large models benefit. In summary, explanations can support the in-context learning of large LMs on challenging tasks.
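To make the setup concrete, here is a minimal, hypothetical sketch of how a few-shot prompt with answer explanations might be assembled, in the spirit of the study summarized above; the example items, field names, and function are illustrative rather than the authors' annotated materials.

```python
# Hypothetical sketch: assembling a few-shot prompt whose examples carry
# answer explanations, versus an examples-only prompt. The items and field
# names below are illustrative, not the study's annotated tasks.

def build_prompt(examples, query, with_explanations=True):
    """Concatenate few-shot examples (optionally with explanations) and a query."""
    parts = []
    for ex in examples:
        block = f"Q: {ex['question']}\nA: {ex['answer']}"
        if with_explanations and ex.get("explanation"):
            block += f"\nExplanation: {ex['explanation']}"
        parts.append(block)
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)

examples = [
    {
        "question": "If all blickets are fep and Max is a blicket, is Max fep?",
        "answer": "Yes",
        "explanation": "Max is a blicket, and every blicket is fep, so Max is fep.",
    },
]

print(build_prompt(examples, "If no wug is zorp and Tim is a wug, is Tim zorp?"))
```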

104 citations


Journal ArticleDOI
TL;DR: This work shows that two-layer transformers learn generalizable solutions to multi-level problems and develop signs of systematic task decomposition, providing key insights into how transformer models may decompose complex decisions into reusable, multi-level policies in tasks requiring structured behavior.
Abstract: Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional inputs. However, there is an ongoing debate about how and when transformers can acquire highly structured behavior and achieve systematic generalization. Here, we explore how well a causal transformer can perform a set of algorithmic tasks, including copying, sorting, and hierarchical compositions of these operations. We demonstrate strong generalization to sequences longer than those used in training by replacing the standard positional encoding typically used in transformers with labels arbitrarily paired with items in the sequence. We search for the layer and head configuration sufficient to solve these tasks, then probe for signs of systematic processing in latent representations and attention patterns. We show that two-layer transformers learn reliable solutions to multi-level problems, develop signs of task decomposition, and encode input items in a way that encourages the exploitation of shared computation across related tasks. These results provide key insights into how attention layers support structured computation both within a task and across multiple tasks.
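One plausible reading of the label-based position coding described above is sketched below: instead of fixed positional encodings, each item is paired with a randomly sampled but order-preserving label drawn from a pool larger than any training sequence, so nothing ties computation to absolute sequence length. The pool size, names, and sampling details are assumptions, not the paper's implementation.

```python
import numpy as np

# Assumed sketch of label-based position coding: items receive random but
# order-preserving labels drawn from a pool larger than any training sequence,
# in place of absolute positional encodings 0..L-1. Pool size and names are
# illustrative.

rng = np.random.default_rng(0)

def assign_position_labels(seq_len, label_pool=100):
    """Sample `seq_len` distinct labels from [0, label_pool) and sort them."""
    labels = rng.choice(label_pool, size=seq_len, replace=False)
    return np.sort(labels)

items = list("dacb")                      # toy input for a copy or sort task
labels = assign_position_labels(len(items))
print(list(zip(labels.tolist(), items)))  # e.g. [(13, 'd'), (47, 'a'), (76, 'c'), (85, 'b')]
```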

7 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examined the effectiveness of a hatchery at increasing hatching and emergence success from four seasons of data (2017–2021) and found that a greater percentage of in situ clutches was influenced by both crab activity and predation than relocated clutches.
Abstract: The critically endangered hawksbill sea turtle (Eretmochelys imbricata) is of conservation concern worldwide. Conservation actions which act to reduce high levels of embryonic mortality aid to boost hatchling production. At Cousine Island, Seychelles, a mixed management method has been adopted to maximize the number of hatchlings entering the ocean. We examined the effectiveness of a hatchery at increasing hatching and emergence success from four seasons of data (2017–2021). Hatchery nests had significantly higher hatching and emergence success across all years relative to nests left in situ (i.e., natural), with inter‐annual variation observed. A greater percentage of in situ clutches was found to be influenced by both crab activity and predation as compared to relocated clutches. Overall, the mixed management approach increased hatching success (mean = 75.1%) relative to the various nest management techniques previously used (2004/2005–2016/2017; mean = 63.5%). By mitigating external influences such as tidal flooding, beach erosion, and crab activity/predation, this study provides evidence for the success of a hatchery in directly increasing hatchling recruitment. Strict and careful hatchery management as well as timely and efficient relocation procedures are needed to minimize potential negative effects of nest relocation.

2 citations


Journal ArticleDOI
TL;DR: The authors suggest that neural networks must learn to exploit human-invented tools of thought and human-like ways of using them, and must engage in explicit goal-directed problem solving as exemplified in the activities of scientists and mathematicians and taught in advanced educational settings.

2 citations


Journal ArticleDOI
TL;DR: This work introduces new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state, and shows that models supplied with such sequences as prompts solve tasks with significantly higher accuracy, while models trained to produce such sequences solve problems better than those trained on previously used human-generated sequences and other baselines.
Abstract: Large language models have recently shown promising progress in mathematical reasoning when fine-tuned with human-generated sequences walking through a sequence of solution steps. However, the solution sequences are not formally structured and the resulting model-generated sequences may not reflect the kind of systematic reasoning we might expect an expert human to produce. In this paper, we study how to build stronger reasoning capability in language models using the idea of relational abstractions. We introduce new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state. We find that models that are supplied with such sequences as prompts can solve tasks with a significantly higher accuracy, and models that are trained to produce such sequences solve problems better than those that are trained with previously used human-generated sequences and other baselines. Our work thus takes several steps toward elucidating and improving how language models perform on tasks requiring multi-step mathematical reasoning.
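As a purely illustrative contrast (the abstract does not give the authors' exact sequence format), the sketch below compares a free-form solution trace with a more abstract, relation-level trace that names each transition between intermediate states, which is the general idea of relational abstraction described above.

```python
# Illustrative only: a free-form solution trace versus an abstract trace that
# makes each state transition explicit. The notation is invented for this
# sketch and is not the paper's sequence format.

problem = "Sam has 3 apples, buys 4 more, then gives away 2. How many are left?"

freeform_trace = [
    "Sam starts with 3 apples and buys 4, so he has 7.",
    "He gives away 2, leaving 5.",
]

relational_trace = [
    "state0: apples(Sam) = 3",
    "apply add(4): apples(Sam) = 3 + 4 = 7",
    "apply subtract(2): apples(Sam) = 7 - 2 = 5",
    "goal: apples(Sam) = 5",
]

for step in relational_trace:
    print(step)
```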

1 citation


07 Oct 2022
TL;DR: This paper explores OODG in small-scale transformers trained on examples from a known distribution, finding that successful generalization depends on positional alignment when absolute position encoding is used, but that suppressing sensitivity to absolute positions overcomes this limitation, a small step toward understanding and promoting systematic generalization in transformers.
Abstract: Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks. This challenge is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules could solve problems independently of the particular values of the variables, but networks tend to be tied to the range of values sampled in their training data. Large transformer-based language models have pushed the boundaries on how well neural networks can solve previously unseen problems, but their complexity and lack of clarity about the relevant content in their training data obfuscates how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in small scale transformers trained with examples from a known distribution. Using a reasoning task based on the puzzle Sudoku, we show that OODG can occur on a complex problem if the training set includes examples sampled from the whole distribution of simpler component tasks. Successful generalization depends on carefully managing positional alignment when absolute position encoding is used, but we find that suppressing sensitivity to absolute positions overcomes this limitation. Taken together our results represent a small step toward understanding and promoting systematic generalization in transformers.
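As a minimal sketch of one way "suppressing sensitivity to absolute positions" could be operationalized (the offset scheme, names, and range below are assumptions, not the paper's code), position indices can be shifted by a random offset per example so that only relative alignment carries information.

```python
import random

# Assumed sketch: shift each example's position indices by a random offset so
# the network cannot rely on absolute positions, only on relative alignment.
# Function name, offset range, and the toy tokens are illustrative.

def randomized_positions(seq_len, max_offset=64, rng=random.Random(0)):
    """Return position indices starting at a random offset instead of 0."""
    start = rng.randrange(max_offset)
    return list(range(start, start + seq_len))

row = ["cell_1", "cell_2", "cell_3", "cell_4"]  # toy stand-in for part of a Sudoku grid
print(randomized_positions(len(row)))           # e.g. [24, 25, 26, 27]
```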

Journal ArticleDOI
TL;DR: This article explored the question of out-of-distribution generalization in smaller-scale transformers using a reasoning task based on the puzzle Sudoku, and showed that OODG can occur on complex problems if the training set includes examples sampled from the whole distribution of simpler component tasks.
Abstract: Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks, and is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules can solve problems independently of the particular values of the variables. Large transformer-based language models have pushed the boundaries on how well neural networks can generalize to novel inputs, but their complexity obfuscates how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in smaller-scale transformers. Using a reasoning task based on the puzzle Sudoku, we show that OODG can occur on complex problems if the training set includes examples sampled from the whole distribution of simpler component tasks. Large transformer-based 'foundation' models Bommasani et al.