Author

Elmoustapha Ould-Ahmed-Vall

Other affiliations: Georgia Institute of Technology, AMIT

Bio: Elmoustapha Ould-Ahmed-Vall is an academic researcher from Intel. The author has contributed to research in topics including Operand and Opcode. The author has an h-index of 19 and has co-authored 299 publications receiving 1,656 citations. Previous affiliations of Elmoustapha Ould-Ahmed-Vall include Georgia Institute of Technology and AMIT.

Topics: Operand, Opcode, Execution unit, Matrix (mathematics), Data element, ...

[Figure: Papers published on a yearly basis, 2005 to 2021]

Papers

Patent

Instruction and logic to provide vector horizontal compare functionality

Elmoustapha Ould-Ahmed-Vall¹, Charles R. Yount¹, Suleyman Sair¹, Kshitij A. Doshi¹

Intel¹

30 Nov 2011

TL;DR: Responsive to an instruction specifying a destination operand, a size of vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand, logic reads the values from the data fields corresponding to the mask and compares them for equality.

Abstract: Instructions and logic provide vector horizontal compare functionality. In some embodiments, responsive to an instruction specifying a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand, the logic reads values from the data fields of the specified size in the source operand that correspond to the mask and compares the values for equality. In some embodiments, responsive to a detection of inequality, a trap may be taken. In some alternative embodiments, a flag may be set. In other alternative embodiments, a mask field may be set to a masked state for the corresponding unequal value(s). In some embodiments, responsive to all unmasked data fields of the source operand being equal to a particular value, that value may be broadcast to all data fields of the specified size in the destination operand.
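
The abstract describes the behavior only in claim language. The Python sketch below models one possible reading of it, under assumptions that are not stated in the source: the source operand is a 128-bit packed integer with lane 0 in the low bits, a set mask bit means the lane takes part in the comparison, the first unmasked lane serves as the reference value, and inequality is reported by clearing the mask bits of the differing lanes (one of the alternative embodiments; the trap and flag variants are omitted). The helper names `split_lanes`, `pack_lanes`, and `horizontal_compare` are hypothetical and do not correspond to any Intel instruction or intrinsic.

```python
from typing import List, Optional, Tuple


def split_lanes(vec: int, element_bits: int, vector_bits: int = 128) -> List[int]:
    """Split a packed vector (lane 0 in the low bits) into its element values."""
    lane_mask = (1 << element_bits) - 1
    return [(vec >> (i * element_bits)) & lane_mask
            for i in range(vector_bits // element_bits)]


def pack_lanes(lanes: List[int], element_bits: int) -> int:
    """Inverse of split_lanes: place lane i at bit offset i * element_bits."""
    out = 0
    for i, value in enumerate(lanes):
        out |= value << (i * element_bits)
    return out


def horizontal_compare(src: int, mask: int, element_bits: int,
                       vector_bits: int = 128) -> Tuple[Optional[int], int]:
    """Hypothetical model of a masked vector horizontal compare.

    Assumed convention: mask bit i set means lane i is compared.
    Returns (dest, new_mask). If all unmasked lanes are equal, dest holds that
    value broadcast to every lane and the mask is unchanged. Otherwise dest is
    None and new_mask has the bits of lanes differing from the first unmasked
    lane cleared (the reference-lane choice is an assumption).
    """
    lanes = split_lanes(src, element_bits, vector_bits)
    active = [i for i in range(len(lanes)) if (mask >> i) & 1]
    if not active:
        return None, mask
    reference = lanes[active[0]]
    new_mask = mask
    for i in active:
        if lanes[i] != reference:
            new_mask &= ~(1 << i)  # set the unequal lane to the masked state
    if new_mask == mask:
        return pack_lanes([reference] * len(lanes), element_bits), mask
    return None, new_mask


src = pack_lanes([7, 7, 3, 7], 32)           # four 32-bit lanes
print(horizontal_compare(src, 0b1011, 32))  # lane 2 excluded: broadcast of 7
print(horizontal_compare(src, 0b1111, 32))  # lane 2 differs: its mask bit is cleared
```

Modeling the mask as a plain integer mirrors how a per-lane predicate register is usually described, but the real encoding of the mask operand is not given in the abstract.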

135 citations

Patent

Vector friendly instruction format and execution thereof

Robert Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert Dale Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind B. Girkar, Edward T. Grochowski, Jonathan C. Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C. Abel, Mark J. Charney, Seth Abraham, Suleyman Sair, Andrew T. Forsyth, Lisa Wu, Charles R. Yount

30 Sep 2011

TL;DR: A vector friendly instruction format has a plurality of fields, including a base operation field, a modifier field, an augmentation operation field, and a data element width field; the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, and the alpha field.

Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.

63 citations

Proceedings Article

Using Model Trees for Computer Architecture Performance Analysis of Software Applications

Elmoustapha Ould-Ahmed-Vall¹, J. Woodlee¹, Charles R. Yount¹, K.A. Doshi¹, S. Abraham¹

Intel¹

25 Apr 2007

TL;DR: A model-tree approach based on the M5' algorithm, which accounts for event interactions and workload characteristics, is implemented and validated; its accuracy attests to its soundness for performance analysis of modern superscalar machines.

Abstract: The identification of performance issues on specific computer architectures has a variety of important benefits, such as tuning software to improve performance, comparing the performance of various platforms, and assisting in the design of new platforms. To enable this analysis, most modern microprocessors provide access to hardware-based event counters. Unfortunately, features such as out-of-order execution, prefetching, and speculation complicate the interpretation of the raw data. Thus, the traditional approach of assigning a uniform estimated penalty to each event does not accurately identify and quantify performance limiters. This paper presents a novel method employing a statistical regression-modeling approach to better achieve this goal. Specifically, a model-tree approach based on the M5' algorithm that accounts for event interactions and workload characteristics is implemented and validated. Data from a subset of the SPEC CPU2006 suite is used by the algorithm to automatically build a performance-model tree, identifying the unique performance classes (phases) found in the suite and associating with each class a unique, explanatory linear model of performance events. These models can be used to identify performance problems for a given workload and to estimate the potential gain from addressing each problem. This information can help performance optimization efforts focus the available time and resources on the techniques most likely to address the problems with the highest potential gain. The model tree exhibits high correlation (more than 0.98) and low relative absolute error (less than 8%) between predicted and measured performance, which attests to its soundness as an approach for performance analysis of modern superscalar machines.
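
In a model tree of the M5 family, internal nodes split on input features and each leaf holds a linear regression model. The Python sketch below shows only the prediction side of such a tree as the abstract describes it: a sample of event rates is routed to one performance class (leaf), and that leaf's linear model yields both a predicted cost and a per-event breakdown that hints at where the largest gain lies. Tree construction, pruning, and smoothing in M5' are omitted. The feature names (`l2_miss_rate`, `br_mispredict_rate`), the threshold, and every coefficient are hypothetical placeholders, not values from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """One performance class (phase): a linear model over event rates."""
    intercept: float
    coefs: Dict[str, float]  # hypothetical cost per unit of each event rate

    def contributions(self, x: Dict[str, float]) -> Dict[str, float]:
        # Per-event share of the prediction; large terms point at likely limiters.
        return {event: c * x.get(event, 0.0) for event, c in self.coefs.items()}

    def predict(self, x: Dict[str, float]) -> float:
        return self.intercept + sum(self.contributions(x).values())


@dataclass
class Split:
    """Internal node: routes a sample on a threshold over one event rate."""
    feature: str
    threshold: float
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def find_leaf(node: Node, x: Dict[str, float]) -> Leaf:
    # Walk from the root to the leaf (performance class) for this sample.
    while isinstance(node, Split):
        node = node.left if x.get(node.feature, 0.0) <= node.threshold else node.right
    return node


# Hypothetical two-class tree; names and numbers are illustrative only.
tree = Split(
    feature="l2_miss_rate",
    threshold=0.01,
    left=Leaf(intercept=0.5, coefs={"br_mispredict_rate": 20.0}),
    right=Leaf(intercept=0.8, coefs={"l2_miss_rate": 30.0, "br_mispredict_rate": 10.0}),
)

sample = {"l2_miss_rate": 0.02, "br_mispredict_rate": 0.005}
leaf = find_leaf(tree, sample)
print(leaf.predict(sample), leaf.contributions(sample))
```

Keeping a separate linear model per leaf is what lets the same event carry a different estimated penalty in different phases, in contrast to the uniform per-event penalty the abstract criticizes.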

53 citations

Journal Article

Distributed Fault-Tolerance for Event Detection Using Heterogeneous Wireless Sensor Networks

Elmoustapha Ould-Ahmed-Vall¹, Bonnie Ferri², George F. Riley²

Intel¹, Georgia Institute of Technology²

01 Dec 2012, IEEE Transactions on Mobile Computing

TL;DR: A general fault-tolerant event detection scheme that allows nodes to detect erroneous local decisions by leveraging the local decisions reported by their neighbors and is proven to be optimal under the maximum a posteriori (MAP) criterion.

Abstract: This paper presents a general fault-tolerant event detection scheme that allows nodes to detect erroneous local decisions by leveraging the local decisions reported by their neighbors. This detection scheme can handle cases where nodes have different accuracy levels. The derived fault-tolerant estimator is proven to be optimal under the maximum a posteriori (MAP) criterion. An equivalent weighted voting scheme is also derived. Further, two new error models are derived to take into account the neighbor distance and the geographical distributions of the two decision quorums. These models are particularly suitable for detection applications where the event under consideration is highly localized. The fault-tolerant estimator is simulated using a network of 1,024 nodes deployed randomly in a square region and assigned random probabilities of failure. Several estimation schemes that allow nodes to learn their error rates continuously are developed. These error rates are used in the distributed estimation schemes to assign appropriate weights to the nodes in the voting scheme.
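
For independent binary decisions with known, symmetric error probabilities, the MAP fusion rule is known to reduce to a weighted vote in which a decision from a node with error probability p carries weight log((1 - p) / p). The Python sketch below uses that standard form as an illustration of the "equivalent weighted voting scheme" the abstract mentions; whether it matches the paper's exact derivation (including its distance-aware error models) is an assumption. The function name `map_event_decision` and its parameters are hypothetical.

```python
import math
from typing import Sequence


def map_event_decision(decisions: Sequence[int], error_probs: Sequence[float],
                       prior_event: float = 0.5) -> int:
    """Fuse local binary decisions with a log-likelihood-weighted vote.

    decisions[i] is 1 (event) or 0 (no event) as reported by node i, which may
    include the deciding node itself; error_probs[i] is the assumed probability
    that node i's decision is wrong, independent across nodes. Returns 1 when
    the posterior favors the event, else 0 (ties go to 0).
    """
    if not 0.0 < prior_event < 1.0:
        raise ValueError("prior_event must be in (0, 1)")
    score = math.log(prior_event / (1.0 - prior_event))  # prior log-odds
    for d, p in zip(decisions, error_probs):
        if not 0.0 < p < 1.0:
            raise ValueError("error probabilities must be in (0, 1)")
        weight = math.log((1.0 - p) / p)  # reliable nodes get larger weights
        score += weight if d == 1 else -weight
    return 1 if score > 0 else 0


# One very reliable node outvotes two unreliable neighbors.
print(map_event_decision([1, 0, 0], [0.01, 0.3, 0.3]))
# With equal error rates, the rule reduces to a simple majority vote.
print(map_event_decision([1, 0, 0], [0.2, 0.2, 0.2]))
```

The abstract's estimation schemes that let nodes learn their error rates over time would, in this sketch, simply update the entries of `error_probs`.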

49 citations

Proceedings Article

Distributed unique global ID assignment for sensor networks

Elmoustapha Ould-Ahmed-Vall¹, Douglas M. Blough¹, Bonnie S. Heck¹, George F. Riley¹

Georgia Institute of Technology¹

12 Dec 2005

TL;DR: A distributed algorithm to solve the unique ID assignment problem is presented and it is demonstrated that a high percentage of nodes are assigned globally unique IDs at the termination of the algorithm when the algorithm parameters are set properly.

Abstract: A sensor network consists of a set of battery-powered nodes, which collaborate to perform sensing tasks in a given environment. It may contain one or more base stations to collect sensed data and possibly relay it to a central processing and storage system. These networks are characterized by scarcity of resources, in particular the available energy. We present a distributed algorithm to solve the unique ID assignment problem. The proposed solution starts by assigning long unique IDs and organizing nodes in a tree structure. This tree structure is used to compute the size of the network. Then, unique IDs are assigned using the minimum number of bytes. Globally unique IDs are useful in providing many network functions, e.g., configuration, monitoring of individual nodes, and various security mechanisms. Theoretical and simulation analyses of the proposed solution have been performed. The results demonstrate that a high percentage of nodes (more than 99%) are assigned globally unique IDs at the termination of the algorithm when the algorithm parameters are set properly. Furthermore, the algorithm terminates in a relatively short time that scales well with the network size. For example, the algorithm terminates in about 5 minutes for a network of 1,000 nodes.
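
The abstract outlines three phases: long IDs plus a tree, a tree-based count of the network size, and short IDs of the minimum byte length. The Python sketch below illustrates the last two phases under stated assumptions: the tree is already built and keyed by the long IDs, subtree sizes are aggregated the way a convergecast would report them, and short IDs are handed out as contiguous ranges sized by each child's subtree. This range-allocation step is a hypothetical stand-in chosen for clarity, not necessarily the paper's procedure, and it runs centrally here rather than as exchanged messages. The function names are hypothetical.

```python
from typing import Dict, List


def subtree_size(children: Dict[int, List[int]], node: int) -> int:
    # Convergecast: a node reports 1 + the counts reported by its children.
    return 1 + sum(subtree_size(children, c) for c in children.get(node, []))


def min_id_bytes(network_size: int) -> int:
    # Smallest b such that 256**b distinct IDs cover every node.
    b = 1
    while 256 ** b < network_size:
        b += 1
    return b


def assign_short_ids(children: Dict[int, List[int]], root: int) -> Dict[int, int]:
    """Hypothetical range allocation: each node keeps the first ID of its range
    and hands each child a contiguous block sized by that child's subtree."""
    ids: Dict[int, int] = {}

    def visit(node: int, base: int) -> None:
        ids[node] = base
        next_base = base + 1
        for c in children.get(node, []):
            visit(c, next_base)
            next_base += subtree_size(children, c)

    visit(root, 0)
    return ids


# Tree keyed by (hypothetical) long IDs; 0xA1 is the root.
tree = {0xA1: [0xB2, 0xC3], 0xB2: [0xD4, 0xE5]}
n = subtree_size(tree, 0xA1)
print(n, min_id_bytes(n), assign_short_ids(tree, 0xA1))
```

For the 1,000-node example in the abstract, `min_id_bytes(1000)` gives 2, since one byte distinguishes only 256 nodes.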

46 citations

Cited by

Journal Article

A survey on wireless multimedia sensor networks

Ian F. Akyildiz¹, Tommaso Melodia¹, Kaushik R. Chowdhury¹

Georgia Institute of Technology¹

01 Mar 2007, Computer Networks

TL;DR: Existing solutions and open research issues at the application, transport, network, link, and physical layers of the communication protocol stack are investigated, along with possible cross-layer synergies and optimizations.

2,311 citations

Journal Article

Habitat monitoring: Application driver for wireless communications technology

Alberto E. Cerpa, Jeremy Elson, Deborah Estrin, Lewis Girod, Michael Hamilton, Jerry Zhao¹

Information Sciences Institute¹

01 Jan 2001, Center for Embedded Network Sensing

TL;DR: This work proposes a tiered system architecture in which data collected at numerous, inexpensive sensor nodes is filtered by local processing on its way through to larger, more capable and more expensive nodes.

Abstract: As new fabrication and integration technologies reduce the cost and size of micro-sensors and wireless interfaces, it becomes feasible to deploy densely distributed wireless networks of sensors and actuators. These systems promise to revolutionize biological, earth, and environmental monitoring applications, providing data at granularities unrealizable by other means. In addition to the challenges of miniaturization, new system architectures and new network algorithms must be developed to transform the vast quantity of raw sensor data into a manageable stream of high-level data. To address this, we propose a tiered system architecture in which data collected at numerous, inexpensive sensor nodes is filtered by local processing on its way through to larger, more capable and more expensive nodes. We briefly describe habitat monitoring as our motivating application and introduce initial system building blocks designed to support this application. The remainder of the paper presents details of our experimental platform.

454 citations

Journal Article

Ultralow-Power Design in Near-Threshold Region

Dejan Markovic¹, Cheng C. Wang¹, Louis P. Alarcon², Tsung-Te Liu², Jan M. Rabaey²

University of California, Los Angeles¹, University of California, Berkeley²

22 Jan 2010

TL;DR: This paper explores how design in the moderate inversion region helps to recover some of that lost performance, while staying quite close to the minimum-energy point, and introduces a pass-transistor based logic family that excels in this operational region.

Abstract: Operation in the subthreshold region is most often synonymous with minimum-energy operation. Yet the performance penalty is huge. In this paper, we explore how design in the moderate inversion region helps to recover some of that lost performance while staying quite close to the minimum-energy point. An energy-delay modeling framework that extends over the weak, moderate, and strong inversion regions is developed. The impact of activity and of design parameters such as supply voltage and transistor sizing on the energy and performance in this operational region is derived. The quantitative benefits of operating in the near-threshold region are established using some simple examples. The paper shows that a 20% increase in energy from the minimum-energy point gives back ten times in performance. Based on these observations, a pass-transistor based logic family that excels in this operational region is introduced. The logic family operates most of its logic in the above-threshold mode (using low-threshold transistors), while confining leakage to only those transistors operating in subthreshold. Operation below the minimum-energy point of CMOS is demonstrated. In leakage-dominated ultralow-power designs, time-multiplexing is shown to yield not only area reduction but also energy reduction due to lower leakage. Finally, the paper demonstrates the use of ultralow-power design techniques in chip synthesis.

391 citations

Journal Article

Review: From wireless sensor networks towards cyber physical systems

Fang-Jing Wu¹, Yu-Fen Kao², Yu-Chee Tseng¹

National Chiao Tung University¹, Chung Hua University²

01 Aug 2011, Pervasive and Mobile Computing

TL;DR: This article reviews some research activities in WSN and reviews some CPS platforms and systems that have been developed recently, including health care, navigation, rescue, intelligent transportation, social networking, and gaming applications.

323 citations

Journal Article

A mechanistic performance model for superscalar out-of-order processors

Stijn Eyerman¹, Lieven Eeckhout¹, Tejas Karkhanis², James E. Smith³

Ghent University¹, Advanced Micro Devices², University of Wisconsin-Madison³

29 May 2009, ACM Transactions on Computer Systems

TL;DR: The mechanistic model provides several advantages over prior modeling approaches, and, when estimating performance, it differs from detailed simulation of a 4-wide out-of-order processor by an average of 7%.

Abstract: A mechanistic model for out-of-order superscalar processors is developed and then applied to the study of microarchitecture resource scaling. The model divides execution time into intervals separated by disruptive miss events such as branch mispredictions and cache misses. Each type of miss event results in characterizable performance behavior for the execution time interval. By considering an interval's type and length (measured in instructions), execution time can be predicted for the interval. Overall execution time is then determined by aggregating the execution time over all intervals. The mechanistic model provides several advantages over prior modeling approaches, and, when estimating performance, it differs from detailed simulation of a 4-wide out-of-order processor by an average of 7%. The mechanistic model is applied to the general problem of resource scaling in out-of-order superscalar processors. First, we use the model to determine size relationships among microarchitecture structures in a balanced processor design. Second, we use the mechanistic model to study scaling of both pipeline depth and width in balanced processor designs. We corroborate previous results in this area and provide new results. For example, we show that at optimal design points, the pipeline depth times the square root of the processor width is nearly constant. Finally, we consider the behavior of unbalanced, overprovisioned processor designs based on insight gained from the mechanistic model. We show that in certain situations an overprovisioned processor may lead to improved overall performance. Designs where a processor's dispatch width is wider than its issue width are of particular interest.
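
The interval idea can be illustrated with a deliberately simplified, first-order version: each interval contributes its instruction count divided by the dispatch width, plus a fixed penalty for the miss event that ends it, and the total is the sum over intervals. The published model derives penalties from interval characteristics (for example, pipeline refill and branch resolution time) rather than using constants, so the Python sketch below is an illustration under that simplifying assumption. The event names and penalty values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def estimate_cycles(intervals: List[Tuple[int, Optional[str]]],
                    dispatch_width: int,
                    penalties: Dict[str, float]) -> float:
    """First-order interval estimate of execution cycles.

    Each interval is (instruction count, miss event ending it or None).
    Between miss events the core is assumed to sustain dispatch_width
    instructions per cycle; each miss event adds a fixed penalty in cycles.
    """
    if dispatch_width <= 0:
        raise ValueError("dispatch_width must be positive")
    total = 0.0
    for n_instr, event in intervals:
        total += n_instr / dispatch_width  # steady-state dispatch time
        if event is not None:
            total += penalties[event]      # cost of the disruptive miss event
    return total


# Hypothetical penalties (cycles) for a 4-wide core.
penalties = {"branch_mispredict": 15.0, "llc_miss": 200.0}
intervals = [(400, "branch_mispredict"), (800, "llc_miss"), (200, None)]
print(estimate_cycles(intervals, 4, penalties))
```

Here 1,400 instructions take 350 cycles of steady-state dispatch plus 215 cycles of penalties, which makes visible why long-latency misses dominate once the dispatch width grows.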

168 citations
