Elmoustapha Ould-Ahmed-Vall

Patent

Compression in machine learning and deep learning processing

TL;DR: This patent presents an embodiment of an apparatus for compressing untyped data. The apparatus comprises a graphics processing unit (GPU) with a data compression pipeline, and that pipeline includes a data port coupled with one or more shader cores.
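The summary does not say which compression scheme the pipeline uses. Purely as an illustration of lossless compression on untyped (raw byte) data, here is a minimal sketch of zero-value compression, a common technique for sparse machine-learning buffers. It is an assumption, not the patented method, and the function names `compress_zeros` and `decompress_zeros` are hypothetical.

```python
from typing import Tuple


def compress_zeros(data: bytes) -> Tuple[int, bytes]:
    """Hypothetical zero-value compression (not the patented scheme).

    Bit i of the returned mask is set when byte i is nonzero; only the
    nonzero bytes are kept, in order.
    """
    mask = 0
    packed = bytearray()
    for i, b in enumerate(data):
        if b != 0:
            mask |= 1 << i
            packed.append(b)
    return mask, bytes(packed)


def decompress_zeros(mask: int, packed: bytes, length: int) -> bytes:
    """Re-expand: take the next packed byte where the mask bit is set, else emit 0."""
    out = bytearray()
    pos = 0
    for i in range(length):
        if (mask >> i) & 1:
            out.append(packed[pos])
            pos += 1
        else:
            out.append(0)
    return bytes(out)
```

For the 8-byte input `bytes([0, 3, 0, 0, 7, 0, 1, 0])`, the sketch keeps a one-byte mask (`0b01010010`) plus three data bytes, so the savings grow with sparsity.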

Patent

Collapsing of multiple nested loops, methods and instructions

Mikhail Plotnikov, +2 more

TL;DR: This patent presents a multi-dimensional loop counter update instruction, along with decode logic and execution logic for it, and describes and claims methods for collapsing nested loops using such instructions.
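To illustrate what loop collapsing means, the sketch below replaces N nested loops with one flat loop whose iteration vector is advanced by a multi-dimensional counter update: increment the innermost counter and carry into outer counters when a bound is reached, like an odometer. This is a software analogy under that assumption, not the claimed instruction; `update_counters` and `collapsed_loop` are hypothetical names.

```python
from typing import Iterator, List, Sequence, Tuple


def update_counters(counters: List[int], bounds: Sequence[int]) -> None:
    """Hypothetical multi-dimensional counter update (odometer style).

    Increments the innermost counter; on reaching its bound it wraps to 0
    and carries into the next-outer counter.
    """
    for d in reversed(range(len(counters))):
        counters[d] += 1
        if counters[d] < bounds[d]:
            return
        counters[d] = 0  # wrap and carry into the next-outer dimension


def collapsed_loop(bounds: Sequence[int]) -> Iterator[Tuple[int, ...]]:
    """Yield the same index tuples as len(bounds) nested loops, using one loop."""
    total = 1
    for b in bounds:
        total *= b  # trip count of the collapsed loop
    counters = [0] * len(bounds)
    for _ in range(total):  # single flat loop replaces the nest
        yield tuple(counters)
        update_counters(counters, bounds)
```

The design choice is that the loop body sees exactly the index order of the original nest, while control flow reduces to a single trip count and one counter-update step per iteration.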

Patent

Compute optimizations for neural networks

Kevin Nealis, +10 more

TL;DR: This patent describes decoding a single instruction into a decoded instruction that specifies multiple operands, including an input value and a quantized weight value of a neural network, for an arithmetic logic unit that includes a barrel shifter, an adder, and an accumulator register.
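One way a barrel shifter, an adder, and an accumulator can stand in for a multiplier is when weights are quantized to signed powers of two, so that multiplying by a weight becomes a shift. The sketch below assumes that quantization (the summary does not state the format) and models a multiply-accumulate; `QuantWeight` and `shift_add_mac` are hypothetical names.

```python
from typing import Sequence, Tuple

# Assumed weight format: (sign, shift) encodes the value sign * 2**shift,
# with sign in {+1, -1} and shift a non-negative integer.
QuantWeight = Tuple[int, int]


def shift_add_mac(inputs: Sequence[int], weights: Sequence[QuantWeight]) -> int:
    """Multiply-accumulate in which every multiply is a left shift."""
    acc = 0  # models the accumulator register
    for x, (sign, shift) in zip(inputs, weights):
        shifted = x << shift  # barrel shifter replaces the multiplier
        acc += shifted if sign >= 0 else -shifted  # adder updates the accumulator
    return acc
```

For example, inputs `[3, -2, 5]` with weights `[(1, 2), (-1, 1), (1, 0)]` (that is, 4, -2, and 1) give 12 + 4 + 5 = 21, matching an ordinary dot product.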

Patent

Compute optimizations for low precision machine learning operations

Elmoustapha Ould-Ahmed-Vall, +16 more

TL;DR: This patent presents an accelerator module comprising a memory stack of multiple memory dies and a graphics processing unit (GPU) coupled with the memory stack via one or more memory controllers. The GPU includes a plurality of multiprocessors with a single instruction, multiple thread (SIMT) architecture, and at least one single instruction causes at least a portion of the GPU to perform a floating-point operation on inputs having differing precisions.
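A common example of a floating-point operation on inputs with differing precisions is a dot product with IEEE 754 half-precision (fp16) operands and a single-precision (fp32) accumulator. The sketch below emulates that pattern with Python's `struct` half (`e`) and single (`f`) formats. The choice of fp16/fp32 is an assumption, and `to_fp16`, `to_fp32`, and `mixed_precision_dot` are hypothetical helpers, not the patented hardware.

```python
import struct
from typing import Sequence


def to_fp16(x: float) -> float:
    """Round to IEEE 754 half precision (raises OverflowError beyond +/-65504)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round to IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_precision_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product with fp16 operands and an fp32 accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        # The product of two fp16 values fits exactly in fp32's 24-bit significand.
        prod = to_fp32(to_fp16(x) * to_fp16(y))
        acc = to_fp32(acc + prod)  # accumulate at the wider precision
    return acc
```

The wider accumulator matters for long reductions. Summing 4096 products of 1.0 yields 4096.0 here, whereas an fp16 accumulator stalls at 2048.0 because fp16 cannot represent 2049.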

Patent

Systems, apparatuses, and methods for performing mask bit compression

Bret L. Toll, +4 more

TL;DR: This patent describes a single mask bit compression instruction that includes a source writemask register operand, a destination writemask operand, and an opcode.
