Author

Guy Gur-Ari

Bio: Guy Gur-Ari is an academic researcher from Google. The author has contributed to research in topics including Chern–Simons theory and gauge theory. The author has an h-index of 22 and has co-authored 39 publications receiving 2,694 citations. Previous affiliations of Guy Gur-Ari include the Weizmann Institute of Science and Stanford University.

Papers
Journal Article
TL;DR: A 540-billion-parameter, densely activated Transformer language model called PaLM achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks and exceeding average human performance on the recently released BIG-bench benchmark.
Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.

1,429 citations

Journal ArticleDOI
TL;DR: In this paper, the authors show that the late time behavior of horizon fluctuations in large anti-de Sitter (AdS) black holes is governed by the random matrix dynamics characteristic of quantum chaotic systems.
Abstract: We argue that the late time behavior of horizon fluctuations in large anti-de Sitter (AdS) black holes is governed by the random matrix dynamics characteristic of quantum chaotic systems. Our main tool is the Sachdev-Ye-Kitaev (SYK) model, which we use as a simple model of a black hole. We use an analytically continued partition function |Z(β + it)|² as well as correlation functions as diagnostics. Using numerical techniques we establish random matrix behavior at late times. We determine the early time behavior exactly in a double scaling limit, giving us a plausible estimate for the crossover time to random matrix behavior. We use these ideas to formulate a conjecture about general large AdS black holes, like those dual to 4D super-Yang-Mills theory, giving a provisional estimate of the crossover time. We make some preliminary comments about challenges to understanding the late time dynamics from a bulk point of view.

553 citations
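The quantity |Z(β + it)|² used as a diagnostic in the paper above is often called the spectral form factor. As a minimal sketch (not the paper's code), the Python snippet below evaluates it on the eigenvalues of a single Gaussian Unitary Ensemble matrix, which is the random-matrix behavior the paper compares against; the matrix size, β, and the time grid are arbitrary illustrative choices.

import numpy as np

def gue_eigenvalues(n, rng):
    # Sample an n x n Gaussian Unitary Ensemble matrix and return its eigenvalues.
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    h = (a + a.conj().T) / 2
    return np.linalg.eigvalsh(h)

def spectral_form_factor(eigs, beta, times):
    # |Z(beta + i t)|^2 with Z(x) = sum_n exp(-x E_n).
    z = np.array([np.sum(np.exp(-(beta + 1j * t) * eigs)) for t in times])
    return np.abs(z) ** 2

rng = np.random.default_rng(0)
eigs = gue_eigenvalues(200, rng)      # illustrative matrix size
times = np.logspace(-1, 3, 400)       # illustrative time grid
sff = spectral_form_factor(eigs, beta=1.0, times=times)
# At late times the curve flattens to a plateau, the random-matrix signature
# discussed in the paper; averaging over many samples smooths the ramp.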

Journal ArticleDOI
TL;DR: In this paper, the authors studied three-dimensional O(N)_k and U(N)_k Chern-Simons theories coupled to a scalar field in the fundamental representation, in the large N limit.
Abstract: We study three-dimensional O(N)_k and U(N)_k Chern-Simons theories coupled to a scalar field in the fundamental representation, in the large N limit. For infinite k this is just the singlet sector of the O(N) (U(N)) vector model, which is conjectured to be dual to Vasiliev’s higher spin gravity theory on AdS_4. For large k and N we obtain a parity-breaking deformation of this theory, controlled by the ’t Hooft coupling λ = 4πN/k. For infinite N we argue (and show explicitly at two-loop order) that the theories with finite λ are conformally invariant, and also have an exactly marginal (ϕ²)³ deformation. For large but finite N and small ’t Hooft coupling λ, we show that there is still a line of fixed points parameterized by the ’t Hooft coupling λ. We show that, at infinite N, the interacting non-parity-invariant theory with finite λ has the same spectrum of primary operators as the free theory, consisting of an infinite tower of conserved higher-spin currents and a scalar operator with scaling dimension Δ = 1; however, the correlation functions of these operators do depend on λ. Our results suggest that there should exist a family of higher spin gravity theories, parameterized by λ, and continuously connected to Vasiliev’s theory. For finite N the higher spin currents are not conserved.

438 citations
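For concreteness, the planar limit referred to in the abstract above can be restated compactly (standard notation; nothing here goes beyond what the abstract says):

\[
  N \to \infty , \qquad k \to \infty , \qquad \lambda \equiv \frac{4\pi N}{k} \ \text{held fixed},
\]
\[
  \delta \mathcal{L} \;\propto\; \big(\phi^{\dagger}\phi\big)^{3} \qquad \text{(exactly marginal at infinite } N\text{)}.
\]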

Journal Article
TL;DR: Evaluation of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters finds that model performance and calibration both improve with scale, but are poor in absolute terms.
Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

376 citations
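To make the kind of evaluation described above concrete, here is a minimal sketch of a few-shot evaluation loop over (input, target) task examples. The generate() callable and the exemplar layout are placeholders of my own, not the actual BIG-bench API.

from typing import Callable, Iterable, Tuple

def few_shot_prompt(exemplars: Iterable[Tuple[str, str]], query: str) -> str:
    # Concatenate solved exemplars, then the unsolved query, as a single prompt.
    shots = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in exemplars)
    return shots + f"Q: {query}\nA:"

def exact_match_accuracy(generate: Callable[[str], str], exemplars, eval_set) -> float:
    # Score a model by exact string match against the target answers.
    correct = 0
    for query, target in eval_set:
        prediction = generate(few_shot_prompt(exemplars, query)).strip()
        correct += int(prediction == target.strip())
    return correct / len(eval_set)

# Usage with a hypothetical model wrapper:
# acc = exact_match_accuracy(my_model.generate, exemplars=train[:5], eval_set=test)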

Journal ArticleDOI
TL;DR: In this article, the authors consider the conformal field theory of N complex massless scalars coupled to a U(N) Chern-Simons theory at level k, and show that in the large N limit it is equivalent to the Legendre transform of the theory of k fermions coupled to a U(k)_N Chern-Simons theory.
Abstract: We consider the conformal field theory of N complex massless scalars in 2 + 1 dimensions, coupled to a U(N) Chern-Simons theory at level k. This theory has a ’t Hooft large N limit, keeping fixed λ ≡ N/k. We compute some correlation functions in this theory exactly as a function of λ, in the large N (planar) limit. We show that the results match with the general predictions of Maldacena and Zhiboedov for the correlators of theories that have high-spin symmetries in the large N limit. It has been suggested in the past that this theory is dual (in the large N limit) to the Legendre transform of the theory of fermions coupled to a Chern-Simons gauge field, and our results allow us to find the precise mapping between the two theories. We find that in the large N limit the theory of N scalars coupled to a U(N)_k Chern-Simons theory is equivalent to the Legendre transform of the theory of k fermions coupled to a U(k)_N Chern-Simons theory, thus providing a bosonization of the latter theory. We conjecture that perhaps this duality is valid also for finite values of N and k, where on the fermionic side we should now have (for N_f flavors) a $\mathrm{U}(k)_{N-N_f/2}$ theory. Similar results hold for real scalars (fermions) coupled to the O(N)_k Chern-Simons theory.

335 citations
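Schematically, the bosonization statement in the abstract above can be summarized as follows (restated from the abstract; the arrow notation is mine):

\[
  \mathrm{U}(N)_k \ \text{CS theory} + N \ \text{fundamental scalars}
  \;\longleftrightarrow\;
  \text{Legendre transform of } \mathrm{U}(k)_N \ \text{CS theory} + k \ \text{fundamental fermions},
\]
valid in the large $N$ limit with $\lambda \equiv N/k$ held fixed.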


Cited by

Journal ArticleDOI
TL;DR: In this paper, the authors discuss the role of perturbative renormalization group (RG) approaches and self-consistent renormalized spin fluctuation (SCR-SF) theories in understanding the quantum-classical crossover in the vicinity of the quantum critical point, with a generalization to the Kondo effect in heavy-fermion systems.
Abstract: We give a general introduction to quantum phase transitions in strongly-correlated electron systems. These transitions, which occur at zero temperature when a non-thermal parameter $g$ such as pressure, chemical composition or magnetic field is tuned to a critical value, are characterized by a dynamic exponent $z$ related to the energy and length scales $\Delta$ and $\xi$. Simple arguments based on an expansion to first order in the effective interaction allow one to define an upper-critical dimension $D_{C}=4$ (where $D=d+z$ and $d$ is the spatial dimension) below which the mean-field description is no longer valid. We emphasize the role of perturbative renormalization group (RG) approaches and self-consistent renormalized spin fluctuation (SCR-SF) theories to understand the quantum-classical crossover in the vicinity of the quantum critical point, with generalization to the Kondo effect in heavy-fermion systems. Finally we quote some recent inelastic neutron scattering experiments performed on heavy-fermions which lead to an unusual scaling law in $\omega /T$ for the dynamical spin susceptibility, revealing critical local modes beyond the itinerant magnetism scheme, and mention new attempts to describe this local quantum critical point.

1,347 citations
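The mean-field criterion quoted in the abstract above amounts to a simple dimension count; the relations below restate it (a standard counting, consistent with the definitions given in the abstract):

\[
  \Delta \sim \xi^{-z}, \qquad D = d + z, \qquad D_C = 4,
\]
so the mean-field description is expected to hold for $d + z \ge 4$ and to break down for $d + z < 4$, where the fluctuation corrections treated by the RG and SCR-SF approaches become important.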

Journal ArticleDOI
TL;DR: This article reviews axion cosmology, covering models of the QCD axion and axion-like particles, their cosmological production, their imprint on the CMB, large-scale structure and galaxy formation, axion-driven accelerated expansion, and gravitational and non-gravitational routes to detection.
Abstract: This review is organized as follows:
1. Introduction.
2. Models: the QCD axion; the strong CP problem; PQWW, KSVZ, DFSZ; anomalies, instantons and the potential; couplings; axions in string theory.
3. Production and initial conditions: SSB and non-perturbative physics; the axion field during inflation and PQ SSB; cosmological populations - decay of parent, topological defects, thermal production, vacuum realignment.
4. The cosmological field: action; background evolution; misalignment for the QCD axion and ALPs; cosmological perturbation theory - initial conditions, early-time treatment, axion sound speed and Jeans scale, transfer functions and WDM; the Schrodinger picture; simulating axions; BEC.
5. CMB and LSS: primary anisotropies; matter power; combined constraints; isocurvature and inflation.
6. Galaxy formation: halo mass function; high-z and the EOR; density profiles; the CDM small-scale crises.
7. Accelerated expansion: the cc problem; axion inflation (natural and monodromy).
8. Gravitational interactions with black holes and pulsars.
9. Non-gravitational interactions: stellar astrophysics; LSW; vacuum birefringence; axion forces; direct detection with ADMX and CASPEr; axion decays; dark radiation; astrophysical magnetic fields; cosmological birefringence.
10. Conclusions.
Appendices: A. Theta vacua of gauge theories; B. EFT for cosmologists; C. Friedmann equations; D. Cosmological fluids; E. Bayes theorem and priors; F. Degeneracies and sampling; G. Sheth-Tormen HMF.

1,282 citations

Proceedings ArticleDOI
23 May 2022
TL;DR: This work presents Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding, and finds that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.
Abstract: We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model. Imagen achieves a new state-of-the-art FID score of 7.27 on the COCO dataset, without ever training on COCO, and human raters find Imagen samples to be on par with the COCO data itself in image-text alignment. To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models. With DrawBench, we compare Imagen with recent methods including VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, and find that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment. See https://imagen.research.google/ for an overview of the results.

1,270 citations
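The FID score cited in the Imagen abstract above compares feature statistics of generated and reference images. As a rough illustration (not Imagen's evaluation code), the sketch below computes the Fréchet distance between two sets of precomputed feature vectors, assuming the features have already been extracted with an Inception-style network.

import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    # feats_*: (num_samples, feature_dim) arrays of image features.
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Matrix square root of the covariance product; discard small imaginary noise.
    cov_mean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(cov_mean):
        cov_mean = cov_mean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * cov_mean))

# Usage with hypothetical feature arrays:
# fid = frechet_distance(real_features, generated_features)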

Proceedings Article
28 Jan 2022
TL;DR: Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks.
Abstract: We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.

1,211 citations
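As a minimal illustration of the prompting method described above (the exemplar text and the generate() call are invented placeholders, not the paper's exact prompts):

# One worked exemplar whose answer spells out intermediate reasoning steps,
# followed by a new question for the model to answer in the same style.
COT_EXEMPLAR = (
    "Q: A basket has 3 apples. Two more baskets like it are added. "
    "How many apples are there?\n"
    "A: One basket has 3 apples. Three baskets have 3 * 3 = 9 apples. "
    "The answer is 9.\n\n"
)

def chain_of_thought_prompt(question: str, exemplars: str = COT_EXEMPLAR) -> str:
    # Prepend the solved exemplars so the model imitates step-by-step reasoning.
    return exemplars + f"Q: {question}\nA:"

# Usage with a hypothetical text-generation function:
# answer = generate(chain_of_thought_prompt("If I have 4 pens and buy 7 more, how many pens do I have?"))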