Author

Guy Gur-Ari

Bio: Guy Gur-Ari is an academic researcher from Google. The author has contributed to research in topics including Chern–Simons theory and gauge theory. The author has an h-index of 22 and has co-authored 39 publications receiving 2,694 citations. Previous affiliations of Guy Gur-Ari include the Weizmann Institute of Science and Stanford University.

Papers
Journal Article
TL;DR: A 540-billion-parameter, densely activated Transformer language model called PaLM achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks and exceeding average human performance on the recently released BIG-bench benchmark.
Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.

1,429 citations

Journal ArticleDOI
TL;DR: In this paper, the authors show that the late time behavior of horizon fluctuations in large anti-de Sitter (AdS) black holes is governed by the random matrix dynamics characteristic of quantum chaotic systems.
Abstract: We argue that the late time behavior of horizon fluctuations in large anti-de Sitter (AdS) black holes is governed by the random matrix dynamics characteristic of quantum chaotic systems. Our main tool is the Sachdev-Ye-Kitaev (SYK) model, which we use as a simple model of a black hole. We use an analytically continued partition function |Z(β + it)|² as well as correlation functions as diagnostics. Using numerical techniques we establish random matrix behavior at late times. We determine the early time behavior exactly in a double scaling limit, giving us a plausible estimate for the crossover time to random matrix behavior. We use these ideas to formulate a conjecture about general large AdS black holes, like those dual to 4D super-Yang-Mills theory, giving a provisional estimate of the crossover time. We make some preliminary comments about challenges to understanding the late time dynamics from a bulk point of view.

553 citations
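The quantity |Z(β + it)|² used as a diagnostic in the paper above is often called the spectral form factor. As a minimal sketch (not the paper's code), the Python snippet below evaluates it on the eigenvalues of a single Gaussian Unitary Ensemble matrix, which is the random-matrix behavior the paper compares against; the matrix size, β, and the time grid are arbitrary illustrative choices.

import numpy as np

def gue_eigenvalues(n, rng):
    # Sample an n x n Gaussian Unitary Ensemble matrix and return its eigenvalues.
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    h = (a + a.conj().T) / 2
    return np.linalg.eigvalsh(h)

def spectral_form_factor(eigs, beta, times):
    # |Z(beta + i t)|^2 with Z(x) = sum_n exp(-x E_n).
    z = np.array([np.sum(np.exp(-(beta + 1j * t) * eigs)) for t in times])
    return np.abs(z) ** 2

rng = np.random.default_rng(0)
eigs = gue_eigenvalues(200, rng)      # illustrative matrix size
times = np.logspace(-1, 3, 400)       # illustrative time grid
sff = spectral_form_factor(eigs, beta=1.0, times=times)
# At late times the curve flattens to a plateau, the random-matrix signature
# discussed in the paper; averaging over many samples smooths the ramp.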

Journal ArticleDOI
TL;DR: In this paper, the authors studied three-dimensional O(N)_k and U(N)_k Chern-Simons theories coupled to a scalar field in the fundamental representation, in the large N limit.
Abstract: We study three-dimensional O(N)_k and U(N)_k Chern-Simons theories coupled to a scalar field in the fundamental representation, in the large N limit. For infinite k this is just the singlet sector of the O(N) (U(N)) vector model, which is conjectured to be dual to Vasiliev’s higher spin gravity theory on AdS_4. For large k and N we obtain a parity-breaking deformation of this theory, controlled by the ’t Hooft coupling λ = 4πN/k. For infinite N we argue (and show explicitly at two-loop order) that the theories with finite λ are conformally invariant, and also have an exactly marginal (ϕ²)³ deformation. For large but finite N and small ’t Hooft coupling λ, we show that there is still a line of fixed points parameterized by the ’t Hooft coupling λ. We show that, at infinite N, the interacting non-parity-invariant theory with finite λ has the same spectrum of primary operators as the free theory, consisting of an infinite tower of conserved higher-spin currents and a scalar operator with scaling dimension Δ = 1; however, the correlation functions of these operators do depend on λ. Our results suggest that there should exist a family of higher spin gravity theories, parameterized by λ, and continuously connected to Vasiliev’s theory. For finite N the higher spin currents are not conserved.

438 citations
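For concreteness, the planar limit referred to in the abstract above can be restated compactly (standard notation; nothing here goes beyond what the abstract says):

\[
  N \to \infty , \qquad k \to \infty , \qquad \lambda \equiv \frac{4\pi N}{k} \ \text{held fixed},
\]
\[
  \delta \mathcal{L} \;\propto\; \big(\phi^{\dagger}\phi\big)^{3} \qquad \text{(exactly marginal at infinite } N\text{)}.
\]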

Journal Article
TL;DR: Evaluation of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters finds that model performance and calibration both improve with scale, but are poor in absolute terms.
Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

376 citations
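To make the kind of evaluation described above concrete, here is a minimal sketch of a few-shot evaluation loop over (input, target) task examples. The generate() callable and the exemplar layout are placeholders of my own, not the actual BIG-bench API.

from typing import Callable, Iterable, Tuple

def few_shot_prompt(exemplars: Iterable[Tuple[str, str]], query: str) -> str:
    # Concatenate solved exemplars, then the unsolved query, as a single prompt.
    shots = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in exemplars)
    return shots + f"Q: {query}\nA:"

def exact_match_accuracy(generate: Callable[[str], str], exemplars, eval_set) -> float:
    # Score a model by exact string match against the target answers.
    correct = 0
    for query, target in eval_set:
        prediction = generate(few_shot_prompt(exemplars, query)).strip()
        correct += int(prediction == target.strip())
    return correct / len(eval_set)

# Usage with a hypothetical model wrapper:
# acc = exact_match_accuracy(my_model.generate, exemplars=train[:5], eval_set=test)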

Journal ArticleDOI
TL;DR: In this article, the authors consider the conformal field theory of N complex massless scalars coupled to a U(N) Chern-Simons theory at level k, and show that in the large N limit it is equivalent to the Legendre transform of the theory of k fermions coupled to a U(k)_N Chern-Simons theory.
Abstract: We consider the conformal field theory of N complex massless scalars in 2 + 1 dimensions, coupled to a U(N) Chern-Simons theory at level k. This theory has a ’t Hooft large N limit, keeping fixed λ ≡ N/k. We compute some correlation functions in this theory exactly as a function of λ, in the large N (planar) limit. We show that the results match with the general predictions of Maldacena and Zhiboedov for the correlators of theories that have high-spin symmetries in the large N limit. It has been suggested in the past that this theory is dual (in the large N limit) to the Legendre transform of the theory of fermions coupled to a Chern-Simons gauge field, and our results allow us to find the precise mapping between the two theories. We find that in the large N limit the theory of N scalars coupled to a U(N)_k Chern-Simons theory is equivalent to the Legendre transform of the theory of k fermions coupled to a U(k)_N Chern-Simons theory, thus providing a bosonization of the latter theory. We conjecture that perhaps this duality is valid also for finite values of N and k, where on the fermionic side we should now have (for N_f flavors) a $\mathrm{U}(k)_{N-N_f/2}$ theory. Similar results hold for real scalars (fermions) coupled to the O(N)_k Chern-Simons theory.

335 citations
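Schematically, the bosonization statement in the abstract above can be summarized as follows (restated from the abstract; the arrow notation is mine):

\[
  \mathrm{U}(N)_k \ \text{CS theory} + N \ \text{fundamental scalars}
  \;\longleftrightarrow\;
  \text{Legendre transform of } \mathrm{U}(k)_N \ \text{CS theory} + k \ \text{fundamental fermions},
\]
valid in the large $N$ limit with $\lambda \equiv N/k$ held fixed.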


Cited by

Journal ArticleDOI
TL;DR: In this paper, the authors discuss the role of perturbative renormalization group (RG) approaches and self-consistent renormalized spin fluctuation (SCR-SF) theories in understanding the quantum-classical crossover in the vicinity of the quantum critical point, with a generalization to the Kondo effect in heavy-fermion systems.
Abstract: We give a general introduction to quantum phase transitions in strongly-correlated electron systems. These transitions, which occur at zero temperature when a non-thermal parameter $g$ such as pressure, chemical composition or magnetic field is tuned to a critical value, are characterized by a dynamic exponent $z$ related to the energy and length scales $\Delta$ and $\xi$. Simple arguments based on an expansion to first order in the effective interaction allow one to define an upper-critical dimension $D_{C}=4$ (where $D=d+z$ and $d$ is the spatial dimension) below which the mean-field description is no longer valid. We emphasize the role of perturbative renormalization group (RG) approaches and self-consistent renormalized spin fluctuation (SCR-SF) theories to understand the quantum-classical crossover in the vicinity of the quantum critical point, with generalization to the Kondo effect in heavy-fermion systems. Finally we quote some recent inelastic neutron scattering experiments performed on heavy-fermions which lead to an unusual scaling law in $\omega /T$ for the dynamical spin susceptibility, revealing critical local modes beyond the itinerant magnetism scheme, and mention new attempts to describe this local quantum critical point.

1,347 citations
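The mean-field criterion quoted in the abstract above amounts to a simple dimension count; the relations below restate it (a standard counting, consistent with the definitions given in the abstract):

\[
  \Delta \sim \xi^{-z}, \qquad D = d + z, \qquad D_C = 4,
\]
so the mean-field description is expected to hold for $d + z \ge 4$ and to break down for $d + z < 4$, where the fluctuation corrections treated by the RG and SCR-SF approaches become important.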

Journal ArticleDOI
TL;DR: This article reviews axion cosmology, covering models of the QCD axion and axion-like particles, their cosmological production, their imprint on the CMB, large-scale structure and galaxy formation, axion-driven accelerated expansion, and gravitational and non-gravitational routes to detection.
Abstract: This review is organized as follows:
1. Introduction.
2. Models: the QCD axion; the strong CP problem; PQWW, KSVZ, DFSZ; anomalies, instantons and the potential; couplings; axions in string theory.
3. Production and initial conditions: SSB and non-perturbative physics; the axion field during inflation and PQ SSB; cosmological populations - decay of parent, topological defects, thermal production, vacuum realignment.
4. The cosmological field: action; background evolution; misalignment for the QCD axion and ALPs; cosmological perturbation theory - initial conditions, early-time treatment, axion sound speed and Jeans scale, transfer functions and WDM; the Schrodinger picture; simulating axions; BEC.
5. CMB and LSS: primary anisotropies; matter power; combined constraints; isocurvature and inflation.
6. Galaxy formation: halo mass function; high-z and the EOR; density profiles; the CDM small-scale crises.
7. Accelerated expansion: the cc problem; axion inflation (natural and monodromy).
8. Gravitational interactions with black holes and pulsars.
9. Non-gravitational interactions: stellar astrophysics; LSW; vacuum birefringence; axion forces; direct detection with ADMX and CASPEr; axion decays; dark radiation; astrophysical magnetic fields; cosmological birefringence.
10. Conclusions.
Appendices: A. Theta vacua of gauge theories; B. EFT for cosmologists; C. Friedmann equations; D. Cosmological fluids; E. Bayes theorem and priors; F. Degeneracies and sampling; G. Sheth-Tormen HMF.

1,282 citations

Proceedings ArticleDOI
23 May 2022
TL;DR: This work presents Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding, and finds that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.
Abstract: We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model. Imagen achieves a new state-of-the-art FID score of 7.27 on the COCO dataset, without ever training on COCO, and human raters find Imagen samples to be on par with the COCO data itself in image-text alignment. To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models. With DrawBench, we compare Imagen with recent methods including VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, and find that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment. See https://imagen.research.google/ for an overview of the results.

1,270 citations
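The FID score cited in the Imagen abstract above compares feature statistics of generated and reference images. As a rough illustration (not Imagen's evaluation code), the sketch below computes the Fréchet distance between two sets of precomputed feature vectors, assuming the features have already been extracted with an Inception-style network.

import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    # feats_*: (num_samples, feature_dim) arrays of image features.
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Matrix square root of the covariance product; discard small imaginary noise.
    cov_mean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(cov_mean):
        cov_mean = cov_mean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * cov_mean))

# Usage with hypothetical feature arrays:
# fid = frechet_distance(real_features, generated_features)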

Proceedings Article
28 Jan 2022
TL;DR: Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks.
Abstract: We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.

1,211 citations
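As a minimal illustration of the prompting method described above (the exemplar text and the generate() call are invented placeholders, not the paper's exact prompts):

# One worked exemplar whose answer spells out intermediate reasoning steps,
# followed by a new question for the model to answer in the same style.
COT_EXEMPLAR = (
    "Q: A basket has 3 apples. Two more baskets like it are added. "
    "How many apples are there?\n"
    "A: One basket has 3 apples. Three baskets have 3 * 3 = 9 apples. "
    "The answer is 9.\n\n"
)

def chain_of_thought_prompt(question: str, exemplars: str = COT_EXEMPLAR) -> str:
    # Prepend the solved exemplars so the model imitates step-by-step reasoning.
    return exemplars + f"Q: {question}\nA:"

# Usage with a hypothetical text-generation function:
# answer = generate(chain_of_thought_prompt("If I have 4 pens and buy 7 more, how many pens do I have?"))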