Valerii Likhosherstov
Researcher at University of Cambridge
Publications - 30
Citations - 884
Valerii Likhosherstov is an academic researcher from the University of Cambridge. The author has contributed to research on topics including computer science and the Ising model. The author has an h-index of 5 and has co-authored 22 publications receiving 309 citations. Previous affiliations of Valerii Likhosherstov include Google.
Papers
Posted Content
Rethinking Attention with Performers
Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller +12 more
TL;DR: This paper introduces Performers, Transformer architectures that can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear space and time complexity, without relying on any priors such as sparsity or low-rankness.
Proceedings Article
Rethinking Attention with Performers
Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller +12 more
TL;DR: Performers, as described in this paper, use Fast Attention Via positive Orthogonal Random features (FAVOR+) to approximate softmax attention kernels, and can estimate regular (softmax) full-rank attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity.
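The core of FAVOR+ is a positive random-feature map that gives an unbiased estimate of the softmax kernel, so attention can be computed by multiplying an (L x m) feature matrix against the values instead of materialising the (L x L) attention matrix. Below is a minimal NumPy sketch under simplifying assumptions: plain i.i.d. Gaussian projections stand in for the paper's orthogonal ones, and all function names are illustrative rather than taken from any released code.

```python
# Minimal FAVOR+-style linear attention sketch (illustrative only).
import numpy as np

def positive_random_features(x, omega):
    # phi(x) = exp(omega @ x - |x|^2 / 2) / sqrt(m): non-negative features
    # whose inner products unbiasedly estimate the softmax kernel exp(q . k).
    m = omega.shape[0]
    proj = x @ omega.T                                   # (L, m)
    norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)  # (L, 1)
    return np.exp(proj - norm) / np.sqrt(m)

def favor_attention(Q, K, V, num_features=256, seed=0):
    L, d = Q.shape
    rng = np.random.default_rng(seed)
    omega = rng.normal(size=(num_features, d))  # i.i.d. Gaussian (orthogonal in the paper)
    scale = d ** -0.25                          # splits the 1/sqrt(d) softmax scaling
    q_prime = positive_random_features(Q * scale, omega)  # (L, m)
    k_prime = positive_random_features(K * scale, omega)  # (L, m)
    # Linear-complexity order of operations: (L x m)(m x d) instead of (L x L)(L x d).
    kv = k_prime.T @ V                                     # (m, d)
    normalizer = q_prime @ k_prime.sum(axis=0)             # (L,)
    return (q_prime @ kv) / normalizer[:, None]

# Quick check against exact softmax attention on toy data.
L, d = 64, 16
rng = np.random.default_rng(1)
Q, K, V = rng.normal(size=(3, L, d), scale=0.5)
exact = np.exp(Q @ K.T / np.sqrt(d))
exact = (exact / exact.sum(axis=-1, keepdims=True)) @ V
approx = favor_attention(Q, K, V, num_features=1024)
print(np.max(np.abs(exact - approx)))   # small approximation error
```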
Posted Content
Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers
Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Jared Davis, Tamas Sarlos, David Belanger, Lucy J. Colwell, Adrian Weller +8 more
TL;DR: A new Transformer architecture, Performer, based on Fast Attention Via Orthogonal Random features (FAVOR), demonstrates its effectiveness on the challenging task of protein sequence modeling and provides strong theoretical guarantees: unbiased estimation of the attention matrix and uniform convergence.
Posted Content
Sub-Linear Memory: How to Make Performers SLiM.
TL;DR: A thorough analysis of Performers, recent Transformer mechanisms with linear self-attention, reveals a remarkable computational flexibility: forward and backward propagation can be performed with no approximations using memory sublinear in the sequence length $L$ (in addition to negligible storage for the input sequence), at the cost of greater time complexity in the parallel setting.
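The observation behind the sub-linear memory result is that causal Performer attention can be written as a running prefix sum whose state size depends on the number of random features, not on $L$. The NumPy sketch below illustrates that recursion for the forward pass only; the paper's full scheme also covers backward propagation via recomputation, and all names here are illustrative.

```python
# Streaming causal linear attention with O(m*d) state, independent of L (illustrative).
import numpy as np

def causal_linear_attention_streaming(q_prime, k_prime, V):
    # q_prime, k_prime: non-negative feature maps of queries/keys, shape (L, m).
    L, m = q_prime.shape
    d = V.shape[-1]
    kv_state = np.zeros((m, d))   # running sum_{j<=t} phi(k_j) v_j^T
    norm_state = np.zeros(m)      # running sum_{j<=t} phi(k_j)
    out = np.empty((L, d))
    for t in range(L):            # sequential in time; memory does not grow with L
        kv_state += np.outer(k_prime[t], V[t])
        norm_state += k_prime[t]
        out[t] = (q_prime[t] @ kv_state) / (q_prime[t] @ norm_state + 1e-9)
    return out

# Tiny demo with random non-negative features standing in for phi(Q), phi(K).
L, m, d = 32, 16, 8
rng = np.random.default_rng(0)
q_prime = rng.random((L, m))
k_prime = rng.random((L, m))
V = rng.normal(size=(L, d))
print(causal_linear_attention_streaming(q_prime, k_prime, V).shape)   # (32, 8)
```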
Posted Content
An Ode to an ODE
Krzysztof Choromanski, Jared Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques E. Slotine, Jake Varley, Honglak Lee, Adrian Weller, Vikas Sindhwani +8 more
TL;DR: This paper proposes a new paradigm for Neural ODE algorithms, called ODEtoODE, in which the time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d), which provides stability and effectiveness of training and provably solves the vanishing/exploding gradient problem.
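One way to picture ODEtoODE is a discretised main flow whose weight matrix is multiplied at every step by an orthogonal factor, so the weights never leave O(d). The NumPy sketch below uses a Cayley-transform step for that orthogonal factor; it is an illustrative discretisation with assumed names, not the paper's exact parameterisation of the matrix flow.

```python
# ODEtoODE-flavoured update: hidden-state flow plus a weight flow on O(d) (illustrative).
import numpy as np

def cayley(A):
    # Maps a skew-symmetric matrix A to an orthogonal matrix (I - A)^{-1}(I + A).
    I = np.eye(A.shape[0])
    return np.linalg.solve(I - A, I + A)

def ode_to_ode_step(x, W, B, dt=0.1):
    # Main flow: dx/dt = tanh(W(t) x).  Weight flow: W stays orthogonal because
    # it is multiplied by a Cayley factor built from the skew-symmetric part of B.
    x_new = x + dt * np.tanh(W @ x)
    W_new = W @ cayley(dt * (B - B.T) / 2.0)
    return x_new, W_new

# Quick demo: orthogonality of W is preserved along the trajectory.
d = 6
rng = np.random.default_rng(0)
x = rng.normal(size=d)
W = np.linalg.qr(rng.normal(size=(d, d)))[0]   # start from an orthogonal matrix
B = rng.normal(size=(d, d))                    # learnable in the real model
for _ in range(50):
    x, W = ode_to_ode_step(x, W, B)
print(np.max(np.abs(W.T @ W - np.eye(d))))     # stays at numerical zero
```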