Elmoustapha Ould-Ahmed-Vall
Researcher at Intel
Publications - 299
Citations - 1664
Elmoustapha Ould-Ahmed-Vall is an academic researcher from Intel. The author has contributed to research on topics including Operand and Opcode. The author has an h-index of 19 and has co-authored 299 publications receiving 1656 citations. Previous affiliations of Elmoustapha Ould-Ahmed-Vall include Georgia Institute of Technology and AMIT.
Papers
Patent
Apparatus and method for multiply, add/subtract, and accumulate of packed data elements
TL;DR: An apparatus and method for performing dual concurrent multiplications, subtraction/addition, and accumulation of packed data elements. A decoder decodes an instruction to generate a decoded instruction, and execution circuitry comprises multiplier circuitry to multiply first and third data elements to generate a first temporary product, and adder circuitry to concurrently add a second temporary product to a second accumulated packed data element from a third source register, which is at least twice as large as the first width.
Patent
Systems, methods, and apparatuses for tile matrix multiplication and accumulation
Valentine Robert,Zeev Sperber,Mark J. Charney,Bret L. Toll,Rinat Rappoport,Stanislav Shwartsman,Baum Dan,Igor Yanover,Elmoustapha Ould-Ahmed-Vall,Menachem Adelman,Corbal Jesus,Yuri Gebil,Simon Rubanovich +12 more
TL;DR: Decoding circuitry decodes an instruction having fields for an opcode, an identifier for a first source matrix operand, an identifier for a second source matrix operand, and a source/destination matrix operand.
Patent
Instructions and logic for bit field address and insertion
TL;DR: A processor includes a core to execute an instruction that returns the address of a bit field in a packed bit array. The core includes logic to identify an index of the bit field, identify a length, multiply the index and length, and return the address and bit offset based on the product of the index and length.
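As a rough illustration of the arithmetic this summary describes, the sketch below multiplies the index by the field length and splits the product into an address and a bit offset. The function name `bit_field_address`, the `base_address` parameter, and the assumption of byte-granular addressing (8 bits per addressable unit) are hypothetical and not taken from the patent.

```python
def bit_field_address(base_address, index, length):
    """Locate bit field number `index` in an array of `length`-bit fields.

    Assumptions (not from the source): memory is byte-addressed and the
    packed array starts at bit 0 of `base_address`.
    """
    bit_position = index * length              # product of index and length
    address = base_address + bit_position // 8  # byte containing the first bit
    bit_offset = bit_position % 8               # position of that bit in the byte
    return address, bit_offset


# Field 5 of a packed array of 3-bit fields starts at bit 15 of the array.
address, bit_offset = bit_field_address(0x1000, 5, 3)
print(hex(address), bit_offset)
```

Under these assumptions, a field may straddle a byte boundary; a caller reading the field would need to fetch enough bytes to cover `bit_offset + length` bits.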
Patent
Policy-based system interface for a real-time autonomous system
Ray Joydeep,Ben Ashbaugh,Surti Prasoonkumar,Pradeep Ramani,Rama Harihara,Jerin C. Justin,Jing Huang,Xiaoming Cui,Timothy B. Costa,Ting Gong,Elmoustapha Ould-Ahmed-Vall,Kumar Balasubramanian,Anil Thomas,Oguz H. Elibol,Jayaram Bobba,Guozhong Zhuang,Bhavani Subramanian,Gokce Keskin,Chandrasekaran Sakthivel,Rajesh Poornachandran +19 more
TL;DR: An embodiment of an apparatus for compression of untyped data includes a graphics processing unit (GPU) with a data compression pipeline; the pipeline includes a data port coupled with one or more shader cores.
Patent
Systems and methods for computing dot products of nibbles in two tile operands
Raanan Sade,Rubanovich Simon,Amit Gradstein,Sperber Zeev,Alexander Heinecke,Valentine Robert,Charney Mark,Bret L. Toll,Corbal Jesus,Elmoustapha Ould-Ahmed-Vall,Menachem Adelman +10 more
TL;DR: A processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier and a second source identifier for identifying a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element of the identified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the first source matrix by a corresponding nib
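
The summary above is cut off, so the following is only a minimal sketch of the general idea suggested by the patent title: each 32-bit doubleword is treated as eight 4-bit nibbles, corresponding nibbles are multiplied, and the eight products are summed into the destination element across K steps. Treating nibbles as unsigned, accumulating into the existing destination value, wrapping to 32 bits, and the helper names `nibbles`, `nibble_dot`, and `tile_nibble_dot` are all assumptions, not details confirmed by the source.

```python
def nibbles(dword):
    """Split a 32-bit doubleword into eight unsigned 4-bit nibbles (low first)."""
    return [(dword >> (4 * i)) & 0xF for i in range(8)]


def nibble_dot(a, b):
    """Sum of the eight products of corresponding nibbles of a and b."""
    return sum(x * y for x, y in zip(nibbles(a), nibbles(b)))


def tile_nibble_dot(A, B, C):
    """Accumulate nibble dot products of A (M x K) and B (K x N) into C (M x N).

    Assumption: results wrap to 32 bits; the source does not say how
    overflow or signedness is handled.
    """
    M, K, N = len(A), len(B), len(B[0])
    for m in range(M):
        for n in range(N):
            acc = C[m][n]
            for k in range(K):  # the flow runs K times per destination element
                acc += nibble_dot(A[m][k], B[k][n])
            C[m][n] = acc & 0xFFFFFFFF
    return C


# 1 x 2 tile times 2 x 1 tile, accumulated into a 1 x 1 destination.
A = [[0x11111111, 0x00000003]]
B = [[0x22222222], [0x00000004]]
C = tile_nibble_dot(A, B, [[1]])
print(C)
```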