Elmoustapha Ould-Ahmed-Vall
Researcher at Intel
Publications - 299
Citations - 1664
Elmoustapha Ould-Ahmed-Vall is an academic researcher from Intel. The author has contributed to research on the topics Operand and Opcode. The author has an h-index of 19 and has co-authored 299 publications receiving 1656 citations. Previous affiliations of Elmoustapha Ould-Ahmed-Vall include Georgia Institute of Technology and AMIT.
Papers
Patent
Systems and methods to load a tile register pair
Raanan Sade,Simon Rubanovich,Amit Gradstein,Zeev Sperber,Alexander Heinecke,Robert Valentine,Mark J. Charney,Bret L. Toll,Corbal Jesus,Elmoustapha Ould-Ahmed-Vall,Menachem Adelman +10 more
TL;DR: In this article, the authors present decode circuitry to decode a load matrix pair instruction having fields for an opcode and for source and destination identifiers, each matrix having a PAIR parameter equal to TRUE.
Patent
Instruction and logic for partial reduction operations
TL;DR: In this article, a processor includes fetch logic to fetch instructions, including a partial reduction instruction; decode logic to decode the partial reduction instruction and provide the decoded partial reduction instruction to one or more execution units; and one or more execution units to, responsive to the decoding, perform N partial reduction operations to generate output data elements, where an input array comprises N lanes and each of the N partial reduction operations reduces the set of input data elements included in a corresponding lane of the N lanes.
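The per-lane behavior summarized above can be illustrated in software. The following is only a minimal sketch, assuming each lane is a plain Python list and the reduction is a binary operator such as addition; the name `partial_reduce` and the choice of operators are hypothetical and are not taken from the patent.

```python
from functools import reduce
import operator


def partial_reduce(lanes, op=operator.add):
    """Reduce each lane independently and return one result per lane.

    `lanes` is a list of N non-empty lists (an assumed software stand-in
    for the N lanes of the input array). `op` is an assumed binary
    reduction operator; the summary does not say which operations apply.
    """
    return [reduce(op, lane) for lane in lanes]


# Two lanes, each reduced on its own: N inputs lanes give N outputs.
sums = partial_reduce([[1, 2, 3, 4], [10, 20, 30, 40]])
products = partial_reduce([[2, 3], [4, 5]], operator.mul)
```

The point of the sketch is that the reduction never crosses lane boundaries, so the output holds one element per lane rather than a single value for the whole array.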
Patent
Fused Multiply-Add (FMA) low functional unit
Cristina S. Anderson,Marius A. Cornea-Hasegan,Elmoustapha Ould-Ahmed-Vall,Robert Valentine,Jesus Corbal,Nikita Astafev,Mark J. Charney,Milind B. Girkar,Amit Gradstein,Simon Rubanovich,Zeev Sperber +10 more
TL;DR: In this article, a register and a fused multiply-add (FMA) low functional unit are used to store first, second, and third floating point (FP) values.
Patent
Hardware cancellation monitor for floating point operations
TL;DR: In this article, a processor includes a plurality of cores, with at least one core including a cancellation monitor unit, which detects an execution of a floating point (FP) instruction in the core, wherein the execution of the FP instruction uses a set of FP inputs and generates an FP output.
Patent
Providing vector horizontal compare functionality within a vector register
TL;DR: In this article, an instruction specifies a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand; the values in the data fields corresponding to the mask are read and compared for equality.
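A masked horizontal compare of this kind can be modeled in a few lines. This is a rough sketch under stated assumptions: the source vector and the mask are equal-length Python lists, and the result is a single boolean. The function name `horizontal_compare_equal` and the return convention are hypothetical; the summary does not say what is written to the destination operand.

```python
def horizontal_compare_equal(source, mask):
    """Return True if all source elements selected by the mask are equal.

    `source` is a list of element values and `mask` a list of truthy or
    falsy flags of the same length (both assumed representations).
    An empty selection is treated as trivially equal, which is an
    assumption of this sketch.
    """
    selected = [value for value, keep in zip(source, mask) if keep]
    return all(value == selected[0] for value in selected)


# Only elements 0, 1, and 3 are selected; element 2 is ignored.
same = horizontal_compare_equal([7, 7, 9, 7], [1, 1, 0, 1])
different = horizontal_compare_equal([7, 8, 9, 7], [1, 1, 0, 1])
```

The comparison runs across the elements of one vector (horizontally) rather than between corresponding elements of two vectors.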