Showing papers by "James L. McClelland" published in 2022


Proceedings ArticleDOI
05 Apr 2022
TL;DR: This work investigates whether explanations of few-shot examples can help in-context learning of large LMs on challenging tasks and finds that explanations can improve performance, even without tuning.
Abstract: Language Models (LMs) can perform new tasks by adapting to a few in-context examples. For humans, explanations that connect examples to task principles can improve learning. We therefore investigate whether explanations of few-shot examples can help LMs. We annotate questions from 40 challenging tasks with answer explanations, and various matched control explanations. We evaluate how different types of explanations, instructions, and controls affect zero- and few-shot performance. We analyze these results using statistical multilevel modeling techniques that account for the nested dependencies among conditions, tasks, prompts, and models. We find that explanations can improve performance -- even without tuning. Furthermore, explanations hand-tuned for performance on a small validation set offer substantially larger benefits, and building a prompt by selecting examples and explanations together substantially improves performance over selecting examples alone. Finally, even untuned explanations outperform carefully matched controls, suggesting that the benefits are due to the link between an example and its explanation, rather than lower-level features. However, only large models benefit. In summary, explanations can support the in-context learning of large LMs on challenging tasks.
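To make the setup concrete, here is a minimal, hypothetical sketch of how a few-shot prompt with answer explanations might be assembled, in the spirit of the study summarized above; the example items, field names, and function are illustrative rather than the authors' annotated materials.

```python
# Hypothetical sketch: assembling a few-shot prompt whose examples carry
# answer explanations, versus an examples-only prompt. The items and field
# names below are illustrative, not the study's annotated tasks.

def build_prompt(examples, query, with_explanations=True):
    """Concatenate few-shot examples (optionally with explanations) and a query."""
    parts = []
    for ex in examples:
        block = f"Q: {ex['question']}\nA: {ex['answer']}"
        if with_explanations and ex.get("explanation"):
            block += f"\nExplanation: {ex['explanation']}"
        parts.append(block)
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)

examples = [
    {
        "question": "If all blickets are fep and Max is a blicket, is Max fep?",
        "answer": "Yes",
        "explanation": "Max is a blicket, and every blicket is fep, so Max is fep.",
    },
]

print(build_prompt(examples, "If no wug is zorp and Tim is a wug, is Tim zorp?"))
```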

104 citations


Journal ArticleDOI
TL;DR: This work shows that two-layer transformers learn generalizable solutions to multi-level problems and develop signs of systematic task decomposition, providing key insights into how transformer models may decompose complex decisions into reusable, multi-level policies in tasks requiring structured behavior.
Abstract: Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional inputs. However, there is an ongoing debate about how and when transformers can acquire highly structured behavior and achieve systematic generalization. Here, we explore how well a causal transformer can perform a set of algorithmic tasks, including copying, sorting, and hierarchical compositions of these operations. We demonstrate strong generalization to sequences longer than those used in training by replacing the standard positional encoding typically used in transformers with labels arbitrarily paired with items in the sequence. We search for the layer and head configuration sufficient to solve these tasks, then probe for signs of systematic processing in latent representations and attention patterns. We show that two-layer transformers learn reliable solutions to multi-level problems, develop signs of task decomposition, and encode input items in a way that encourages the exploitation of shared computation across related tasks. These results provide key insights into how attention layers support structured computation both within a task and across multiple tasks.
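One plausible reading of the label-based position coding described above is sketched below: instead of fixed positional encodings, each item is paired with a randomly sampled but order-preserving label drawn from a pool larger than any training sequence, so nothing ties computation to absolute sequence length. The pool size, names, and sampling details are assumptions, not the paper's implementation.

```python
import numpy as np

# Assumed sketch of label-based position coding: items receive random but
# order-preserving labels drawn from a pool larger than any training sequence,
# in place of absolute positional encodings 0..L-1. Pool size and names are
# illustrative.

rng = np.random.default_rng(0)

def assign_position_labels(seq_len, label_pool=100):
    """Sample `seq_len` distinct labels from [0, label_pool) and sort them."""
    labels = rng.choice(label_pool, size=seq_len, replace=False)
    return np.sort(labels)

items = list("dacb")                      # toy input for a copy or sort task
labels = assign_position_labels(len(items))
print(list(zip(labels.tolist(), items)))  # e.g. [(13, 'd'), (47, 'a'), (76, 'c'), (85, 'b')]
```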

7 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examined the effectiveness of a hatchery at increasing hatching and emergence success from four seasons of data (2017–2021) and found that a greater percentage of in situ clutches was influenced by both crab activity and predation than relocated clutches.
Abstract: The critically endangered hawksbill sea turtle (Eretmochelys imbricata) is of conservation concern worldwide. Conservation actions which act to reduce high levels of embryonic mortality aid to boost hatchling production. At Cousine Island, Seychelles, a mixed management method has been adopted to maximize the number of hatchlings entering the ocean. We examined the effectiveness of a hatchery at increasing hatching and emergence success from four seasons of data (2017–2021). Hatchery nests had significantly higher hatching and emergence success across all years relative to nests left in situ (i.e., natural), with inter‐annual variation observed. A greater percentage of in situ clutches was found to be influenced by both crab activity and predation as compared to relocated clutches. Overall, the mixed management approach increased hatching success (mean = 75.1%) relative to the various nest management techniques previously used (2004/2005–2016/2017; mean = 63.5%). By mitigating external influences such as tidal flooding, beach erosion, and crab activity/predation, this study provides evidence for the success of a hatchery in directly increasing hatchling recruitment. Strict and careful hatchery management as well as timely and efficient relocation procedures are needed to minimize potential negative effects of nest relocation.

2 citations


Journal ArticleDOI
TL;DR: The authors suggest that neural networks must learn to exploit human-invented tools of thought and human-like ways of using them, and must engage in explicit goal-directed problem solving as exemplified in the activities of scientists and mathematicians and taught in advanced educational settings.

2 citations


Journal ArticleDOI
TL;DR: This work introduces new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state, and shows that models supplied with such sequences as prompts solve tasks with significantly higher accuracy, while models trained to produce such sequences solve problems better than those trained on previously used human-generated sequences and other baselines.
Abstract: Large language models have recently shown promising progress in mathematical reasoning when fine-tuned with human-generated sequences walking through a sequence of solution steps. However, the solution sequences are not formally structured and the resulting model-generated sequences may not reflect the kind of systematic reasoning we might expect an expert human to produce. In this paper, we study how to build stronger reasoning capability in language models using the idea of relational abstractions. We introduce new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state. We find that models that are supplied with such sequences as prompts can solve tasks with a significantly higher accuracy, and models that are trained to produce such sequences solve problems better than those that are trained with previously used human-generated sequences and other baselines. Our work thus takes several steps toward elucidating and improving how language models perform on tasks requiring multi-step mathematical reasoning.
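As a purely illustrative contrast (the abstract does not give the authors' exact sequence format), the sketch below compares a free-form solution trace with a more abstract, relation-level trace that names each transition between intermediate states, which is the general idea of relational abstraction described above.

```python
# Illustrative only: a free-form solution trace versus an abstract trace that
# makes each state transition explicit. The notation is invented for this
# sketch and is not the paper's sequence format.

problem = "Sam has 3 apples, buys 4 more, then gives away 2. How many are left?"

freeform_trace = [
    "Sam starts with 3 apples and buys 4, so he has 7.",
    "He gives away 2, leaving 5.",
]

relational_trace = [
    "state0: apples(Sam) = 3",
    "apply add(4): apples(Sam) = 3 + 4 = 7",
    "apply subtract(2): apples(Sam) = 7 - 2 = 5",
    "goal: apples(Sam) = 5",
]

for step in relational_trace:
    print(step)
```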

1 citation


07 Oct 2022
TL;DR: This paper explores OODG in small-scale transformers trained on examples from a known distribution, finding that successful generalization depends on positional alignment when absolute position encoding is used, but that suppressing sensitivity to absolute positions overcomes this limitation, a small step toward understanding and promoting systematic generalization in transformers.
Abstract: Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks. This challenge is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules could solve problems independently of the particular values of the variables, but networks tend to be tied to the range of values sampled in their training data. Large transformer-based language models have pushed the boundaries on how well neural networks can solve previously unseen problems, but their complexity and lack of clarity about the relevant content in their training data obfuscates how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in small scale transformers trained with examples from a known distribution. Using a reasoning task based on the puzzle Sudoku, we show that OODG can occur on a complex problem if the training set includes examples sampled from the whole distribution of simpler component tasks. Successful generalization depends on carefully managing positional alignment when absolute position encoding is used, but we find that suppressing sensitivity to absolute positions overcomes this limitation. Taken together our results represent a small step toward understanding and promoting systematic generalization in transformers.
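As a minimal sketch of one way "suppressing sensitivity to absolute positions" could be operationalized (the offset scheme, names, and range below are assumptions, not the paper's code), position indices can be shifted by a random offset per example so that only relative alignment carries information.

```python
import random

# Assumed sketch: shift each example's position indices by a random offset so
# the network cannot rely on absolute positions, only on relative alignment.
# Function name, offset range, and the toy tokens are illustrative.

def randomized_positions(seq_len, max_offset=64, rng=random.Random(0)):
    """Return position indices starting at a random offset instead of 0."""
    start = rng.randrange(max_offset)
    return list(range(start, start + seq_len))

row = ["cell_1", "cell_2", "cell_3", "cell_4"]  # toy stand-in for part of a Sudoku grid
print(randomized_positions(len(row)))           # e.g. [24, 25, 26, 27]
```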

Journal ArticleDOI
TL;DR: This article explored the question of out-of-distribution generalization in smaller-scale transformers using a reasoning task based on the puzzle Sudoku, and showed that OODG can occur on complex problems if the training set includes examples sampled from the whole distribution of simpler component tasks.
Abstract: Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks, and is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules can solve problems independently of the particular values of the variables. Large transformer-based language models have pushed the boundaries on how well neural networks can generalize to novel inputs, but their complexity obfuscates how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in smaller-scale transformers. Using a reasoning task based on the puzzle Sudoku, we show that OODG can occur on complex problems if the training set includes examples sampled from the whole distribution of simpler component tasks. Large transformer-based 'foundation' models Bommasani et al.