Ashish Vaswani

Researcher at Google

Publications - 73
Citations - 70493

Ashish Vaswani is an academic researcher from Google. The author has contributed to research in topics: Machine translation & Transformer (machine learning model). The author has an h-index of 34, co-authored 70 publications receiving 35599 citations. Previous affiliations of Ashish Vaswani include Information Sciences Institute & University of Southern California.

Papers
Patent

Fully attentional computer vision

TL;DR: In this patent, a system, implemented as computer programs on one or more computers in one or more locations, is described that implements a computer vision model in which a positional local self-attention layer receives an input feature map and generates an output feature map.
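As a rough illustration of the idea, the sketch below implements a single-head local self-attention layer over a 2D feature map in plain NumPy. The function name, weight shapes, zero padding at the borders, and the additive relative positional embeddings are illustrative assumptions, not the patented design.

```python
import numpy as np

def local_self_attention_2d(x, wq, wk, wv, rel_pos, k=3):
    """Single-head local self-attention over a 2D feature map (a sketch).

    x:        (H, W, C) input feature map
    wq/wk/wv: (C, C) projection matrices for queries, keys, values
    rel_pos:  (k, k, C) relative positional embeddings added to the keys
    k:        odd spatial extent of the local neighborhood
    Returns an (H, W, C) output feature map.
    """
    H, W, C = x.shape
    pad = k // 2
    # Zero padding at the borders is a simplification; padded positions
    # still take part in the attention over edge pixels.
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    q = x @ wq                       # one query per output position
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k]           # (k, k, C) local window
            keys = patch @ wk + rel_pos            # position-aware keys
            vals = patch @ wv
            logits = (keys * q[i, j]).sum(-1) / np.sqrt(C)  # (k, k) scores
            attn = np.exp(logits - logits.max())   # numerically stable softmax
            attn /= attn.sum()
            out[i, j] = (attn[..., None] * vals).sum((0, 1))
    return out
```

A production implementation would vectorize the window extraction and use multiple heads; the nested loops here only make the per-position attention explicit.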

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

TL;DR: It is shown that, beyond model size alone, model shape matters for downstream fine-tuning and that scaling protocols operate differently at different compute regions, which means the widely adopted T5-base and T5-large sizes are Pareto-inefficient.
Journal Article

Documentary Linguistics and Computational Linguistics: A response to Brooks

TL;DR: In mid-2012, a two-week workshop was organized in Papua New Guinea to provide training in basic techniques and technologies for language documentation, and to gain an understanding of how these technologies might be improved in the future.
Posted Content

The Efficiency Misnomer

TL;DR: In this paper, the authors thoroughly discuss common cost indicators, their advantages and disadvantages, and how they can contradict each other; they also demonstrate how incomplete reporting of cost indicators can lead to partial conclusions and a blurred or incomplete picture of the practical considerations of different models.

Simple and Efficient ways to Improve REALM

TL;DR: REALM++, as discussed by the authors, improves upon the training and inference setups and introduces a better supervision signal for improving performance, without any architectural changes, achieving 5.5% absolute accuracy gains over the baseline while being faster to train.