Elmoustapha Ould-Ahmed-Vall

Researcher at Intel

Publications -  299
Citations -  1664

Elmoustapha Ould-Ahmed-Vall is an academic researcher from Intel. The author has contributed to research in topics: Operand & Opcode. The author has an h-index of 19 and has co-authored 299 publications receiving 1656 citations. Previous affiliations of Elmoustapha Ould-Ahmed-Vall include Georgia Institute of Technology & AMIT.

Papers
Patent

Apparatus and method for multiply, add/subtract, and accumulate of packed data elements

TL;DR: In this article, an apparatus and method for performing dual concurrent multiplications, subtraction/addition, and accumulation of packed data elements are presented. A decoder decodes an instruction to generate a decoded instruction, and execution circuitry comprises multiplier circuitry to multiply the first and third data elements to generate the first temporary product, and adder circuitry to concurrently add the second temporary product to a second accumulated packed data element from a third source register, which is at least twice as large as the first width.
Patent

Systems, methods, and apparatuses for tile matrix multiplication and accumulation

TL;DR: In this article, the source/destination matrix operands are associated with a source and a destination matrix operand, and decoding circuitry is used to decode an instruction having fields for an opcode, an identifier for a first source matrix operand, and an identifier for a second source matrix operand.
Patent

Instructions and logic for bit field address and insertion

TL;DR: In this article, a processor includes a core to execute an instruction to return an address of a bit field in a packed bit array. The core includes logic to identify an index of the bit field, identify a length, multiply the index and length, and return the address and bit offset based upon the product of the index and length.
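
The address computation summarized above can be illustrated with a minimal Python sketch. It is not the patented instruction's actual semantics: the function name `bit_field_address`, the `base_address` parameter, the byte-granular addressing, and the assumption that every field has the same bit length are all hypothetical choices made for illustration.

```python
def bit_field_address(base_address: int, index: int, length: int) -> tuple[int, int]:
    """Locate field number `index` in a packed array of `length`-bit fields.

    Assumptions (not from the source): memory is byte-addressed, all fields
    have the same length, and the array starts at bit 0 of `base_address`.
    """
    if index < 0 or length <= 0:
        raise ValueError("index must be >= 0 and length must be > 0")

    # Multiply the index by the field length to get the field's starting bit.
    bit_position = index * length

    # Split the product into a byte address and a bit offset within that byte.
    byte_address = base_address + bit_position // 8
    bit_offset = bit_position % 8
    return byte_address, bit_offset


if __name__ == "__main__":
    # Field 5 of a packed array of 3-bit fields starts at bit 15.
    address, offset = bit_field_address(0x1000, 5, 3)
    print(hex(address), offset)
```

Under these assumptions, the product of index and length is the only quantity needed: its quotient by 8 selects the byte and its remainder gives the bit offset.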
Patent

Policy-based system interface for a real-time autonomous system

TL;DR: In this paper, the authors present an embodiment of an apparatus for compression of untyped data that includes a graphics processing unit (GPU) with a data compression pipeline, where the data compression pipeline includes a data port coupled with one or more shader cores.
Patent

Systems and methods for computing dot products of nibbles in two tile operands

TL;DR: In this article, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify an M by N destination matrix, a first source identifier and a second source identifier for identifying a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element of the identified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the first source matrix by a corresponding nib