
Azalia Mirhoseini

Researcher at Google

Publications: 78
Citations: 3,351

Azalia Mirhoseini is an academic researcher at Google. Her work focuses on reinforcement learning and computer science. She has an h-index of 19 and has co-authored 67 publications receiving 2,118 citations. Her previous affiliations include Microsoft and Rice University.

Papers
Posted Content

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

TL;DR: This work introduces a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks, and applies the MoE to the tasks of language modeling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora.
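The gating idea summarized above can be sketched in a few lines: a learned gate scores all experts, only the top-k experts actually run, and their outputs are mixed by softmax weights over the selected scores. This is a minimal NumPy sketch of that routing pattern; the expert count, dimensions, and single-matrix "experts" are illustrative assumptions, not the paper's implementation (which also adds noise to the gate and a load-balancing loss).

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_in, d_out, k = 4, 8, 8, 2

# Each "expert" is a bare weight matrix here; in the paper each is a
# feed-forward sub-network.
experts = [rng.standard_normal((d_in, d_out)) * 0.1 for _ in range(n_experts)]
W_gate = rng.standard_normal((d_in, n_experts)) * 0.1

def moe_forward(x):
    """Route x to the top-k experts and combine their outputs by gate weight."""
    logits = x @ W_gate                    # one score per expert
    top = np.argsort(logits)[-k:]          # indices of the k highest-scoring experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                   # softmax over the selected k only
    # Only the chosen experts execute: capacity scales with n_experts,
    # per-example compute scales with k.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

y = moe_forward(rng.standard_normal(d_in))
```

Because the unselected experts contribute nothing, adding more experts grows model capacity without a proportional increase in computation, which is the point the TL;DR makes.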
Posted Content

Device Placement Optimization with Reinforcement Learning

TL;DR: A method which learns to optimize device placement for TensorFlow computational graphs using a sequence-to-sequence model, which finds non-trivial device placements that outperform hand-crafted heuristics and traditional algorithmic methods.
Proceedings Article

Device placement optimization with reinforcement learning

TL;DR: In this article, a sequence-to-sequence model is used to predict which subsets of operations in a TensorFlow graph should run on which of the available devices, and the execution time of the predicted placements is then used as the reward signal to optimize the parameters of the sequence-to-sequence model.
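The training signal described above (measured runtime as reward) can be illustrated with a toy REINFORCE loop: a stochastic per-op placement policy is sampled, its runtime is measured, and the policy is nudged toward placements that beat a moving-average baseline. Everything here is a hypothetical stand-in, assuming a tabular policy and a made-up cost model instead of the paper's sequence-to-sequence network and real TensorFlow graph measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ops, n_devices = 6, 2
logits = np.zeros((n_ops, n_devices))   # per-op placement policy (tabular)

def simulated_runtime(placement):
    # Hypothetical cost model: device 0 is faster per op, but piling
    # every op on one device incurs a quadratic contention penalty.
    load = np.bincount(placement, minlength=n_devices)
    return load[0] * 1.0 + load[1] * 1.5 + 0.2 * load.max() ** 2

baseline = None
for step in range(200):
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    placement = np.array([rng.choice(n_devices, p=p) for p in probs])
    runtime = simulated_runtime(placement)
    baseline = runtime if baseline is None else 0.9 * baseline + 0.1 * runtime
    advantage = baseline - runtime       # lower runtime => positive advantage
    for op, dev in enumerate(placement):
        # REINFORCE: grad of log-softmax w.r.t. logits is (one-hot - probs).
        grad = -probs[op]
        grad[dev] += 1.0
        logits[op] += 0.1 * advantage * grad
```

The real system replaces the tabular `logits` with a sequence-to-sequence network over the op sequence and replaces `simulated_runtime` with wall-clock measurements on the actual devices, but the reward-driven update is the same shape.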
Proceedings Article

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

TL;DR: In this paper, a sparsely-gated mixture-of-experts (MoE) layer is proposed to increase the capacity of a neural network to absorb information without a proportional increase in computation.
Posted Content

Chip Placement with Deep Reinforcement Learning

TL;DR: This work presents a learning-based approach to chip placement, and shows that, in under 6 hours, this method can generate placements that are superhuman or comparable on modern accelerator netlists, whereas existing baselines require human experts in the loop and take several weeks.