Author

Tarek El-Ghazawi

Bio: Tarek El-Ghazawi is an academic researcher from George Washington University. The author has contributed to research topics including Reconfigurable computing & Field-programmable gate array. The author has an h-index of 35 and has co-authored 309 publications receiving 4716 citations. Previous affiliations of Tarek El-Ghazawi include George Washington University Virginia Campus & Florida Institute of Technology.


Papers
01 Jan 2003
TL;DR: The efforts of Brian Wibecan and Greg Fischer were invaluable in bringing these specifications to the final (version 1.0) state.
Abstract: Acknowledgments Many scientists have contributed to the ideas and concepts behind these specifications. They are too many to mention here, but we would like to cite the contributions of David who have contributed to the initial UPC language concepts and specifications. We also would like to acknowledge the role of the participants in the first UPC workshop, held in May 2000 in Bowie, Maryland, and in which the specifications of this version were discussed. In particular we would like to acknowledge the support and participation of Compaq, Cray, HP, Sun, and CSC. We would like also to acknowledge the abundant input of Kevin Harris and Sébastien Chauvin and the efforts of Lauren Smith. Finally, the efforts of Brian Wibecan and Greg Fischer were invaluable in bringing these specifications to the final (version 1.0) state.

228 citations

Journal ArticleDOI
TL;DR: It is shown that automatic wavelet reduction yields better or comparable classification accuracy for hyperspectral data, while achieving substantial computational savings.
Abstract: Hyperspectral imagery provides richer information about materials than multispectral imagery. The new, larger data volumes from hyperspectral sensors present a challenge for traditional processing techniques. For example, the identification of each ground surface pixel by its corresponding spectral signature is still difficult because of the immense volume of data. Conventional classification methods may not be used without dimension reduction preprocessing. This is due to the curse of dimensionality, which refers to the fact that the sample size needed to estimate a function of several variables to a given degree of accuracy grows exponentially with the number of variables. Principal component analysis (PCA) has been the technique of choice for dimension reduction. However, PCA is computationally expensive and does not eliminate anomalies that can be seen at one arbitrary band. Spectral data reduction using automatic wavelet decomposition could be useful, because it preserves the distinctions among spectral signatures, can be computed in an automatic fashion, and can filter data anomalies. This is due to the intrinsic properties of wavelet transforms, which preserve high- and low-frequency features and therefore retain the peaks and valleys found in typical spectra. Compared to PCA, for the same level of data reduction, we show that automatic wavelet reduction yields better or comparable classification accuracy for hyperspectral data, while achieving substantial computational savings.
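
The automatic wavelet reduction summarized above can be pictured as applying a 1-D discrete wavelet transform along each pixel's spectral signature and keeping the coarse (low-pass) coefficients, halving the number of bands at each level. The fragment below is a minimal sketch of one such level using a Haar filter under that reading; the function name, band count, and sample values are illustrative and not taken from the paper.

    #include <stdio.h>
    #include <math.h>

    /* One level of an orthonormal Haar wavelet decomposition applied along the
       spectral axis of a single pixel: keeping only the low-pass (approximation)
       coefficients halves the number of bands while retaining the overall shape
       of the spectrum. Repeating the step reduces the dimension further. */
    void haar_reduce_spectrum(const double *spectrum, int nbands, double *approx)
    {
        const double s = 1.0 / sqrt(2.0);            /* orthonormal scaling factor */
        for (int i = 0; i < nbands / 2; i++)
            approx[i] = s * (spectrum[2 * i] + spectrum[2 * i + 1]);
    }

    int main(void)
    {
        double pixel[8]  = {0.12, 0.14, 0.30, 0.33, 0.61, 0.60, 0.25, 0.22};
        double reduced[4];

        haar_reduce_spectrum(pixel, 8, reduced);     /* 8 bands -> 4 coefficients */
        for (int i = 0; i < 4; i++)
            printf("approx[%d] = %.3f\n", i, reduced[i]);
        return 0;
    }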

209 citations

BookDOI
01 Jul 2003
TL;DR: This tutorial jumps right into the power of UPC without dragging you through basic programming, with examples of both the UPC Programming Model and UPC Library in action.
Abstract: Preface.
1. Introductory Tutorial. 1.1 Getting Started. 1.2 Private and Shared Data. 1.3 Shared Arrays and Affinity of Shared Data. 1.4 Synchronization and Memory Consistency. 1.5 Work Sharing. 1.6 UPC Pointers. 1.7 Summary. Exercises.
2. Programming View and UPC Data Types. 2.1 Programming Models. 2.2 UPC Programming Model. 2.3 Shared and Private Variables. 2.4 Shared and Private Arrays. 2.5 Blocked Shared Arrays. 2.6 Compiling Environments and Shared Arrays. 2.7 Summary. Exercises.
3. Pointers and Arrays. 3.1 UPC Pointers. 3.2 Pointer Arithmetic. 3.3 Pointer Casting and Usage Practices. 3.4 Pointer Information and Manipulation Functions. 3.5 More Pointer Examples. 3.6 Summary. Exercises.
4. Work Sharing and Domain Decomposition. 4.1 Basic Work Distribution. 4.2 Parallel Iterations. 4.3 Multidimensional Data. 4.4 Distributing Trees. 4.5 Summary. Exercises.
5. Dynamic Shared Memory Allocation. 5.1 Allocating a Global Shared Memory Space Collectively. 5.2 Allocating Multiple Global Spaces. 5.3 Allocating Local Shared Spaces. 5.4 Freeing Allocated Spaces. 5.5 Summary. Exercises.
6. Synchronization and Memory Consistency. 6.1 Barriers. 6.2 Split-Phase Barriers. 6.3 Locks. 6.4 Memory Consistency. 6.5 Summary. Exercises.
7. Performance Tuning and Optimization. 7.1 Parallel System Architectures. 7.2 Performance Issues in Parallel Programming. 7.3 Role of Compilers and Run-Time Systems. 7.4 UPC Hand Optimization. 7.5 Case Studies. 7.6 Summary. Exercises.
8. UPC Libraries. 8.1 UPC Collective Library. 8.2 UPC-IO Library. 8.3 Summary.
References.
Appendix A: UPC Language Specifications, v1.1.1. Appendix B: UPC Collective Operations Specifications, v1.0. Appendix C: UPC-IO Specifications, v1.0. Appendix D: How to Compile and Run UPC Programs. Appendix E: Quick UPC Reference.
Index.
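
The early chapters listed above (shared arrays and affinity, work sharing, synchronization) come together in even the smallest UPC program. The fragment below is a minimal sketch in that spirit, not an example from the book; the array size and variable names are ours, and it needs a UPC compiler and runtime to build.

    #include <upc.h>
    #include <stdio.h>

    /* A minimal UPC sketch: a blocked shared array, a upc_forall loop that
       distributes iterations by affinity, and a barrier. Illustrative only. */
    #define NPER 256
    shared [NPER] double v[NPER * THREADS];   /* one block of NPER elements per thread */

    int main(void)
    {
        int i;

        /* Each thread initializes the elements that have affinity to it. */
        upc_forall (i = 0; i < NPER * THREADS; i++; &v[i])
            v[i] = (double)i;

        upc_barrier;                          /* all writes complete before reading */

        if (MYTHREAD == 0)
            printf("last element = %.1f, computed by %d threads\n",
                   v[NPER * THREADS - 1], THREADS);
        return 0;
    }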

201 citations

Journal ArticleDOI
TL;DR: The authors describe the two major contemporary HPRC architectures and explore the pros and cons of each using representative applications from remote sensing, molecular dynamics, bioinformatics, and cryptanalysis.
Abstract: Several high-performance computers now use field-programmable gate arrays as reconfigurable coprocessors. The authors describe the two major contemporary HPRC architectures and explore the pros and cons of each using representative applications from remote sensing, molecular dynamics, bioinformatics, and cryptanalysis.

163 citations

Proceedings ArticleDOI
15 Jun 2005
TL;DR: This paper compares CAF and UPC variants of these programs with the original Fortran+MPI code and accounts for the root causes limiting UPC performance such as the synchronization model, the communication efficiency of strided data, and source-to-source translation issues.
Abstract: Co-array Fortran (CAF) and Unified Parallel C (UPC) are two emerging languages for single-program, multiple-data global address space programming. These languages boost programmer productivity by providing shared variables for inter-process communication instead of message passing. However, the performance of these emerging languages still has room for improvement. In this paper, we study the performance of variants of the NAS MG, CG, SP, and BT benchmarks on several modern architectures to identify challenges that must be met to deliver top performance. We compare CAF and UPC variants of these programs with the original Fortran+MPI code. Today, CAF and UPC programs deliver scalable performance on clusters only when written to use bulk communication. However, our experiments uncovered some significant performance bottlenecks of UPC codes on all platforms. We account for the root causes limiting UPC performance such as the synchronization model, the communication efficiency of strided data, and source-to-source translation issues. We show that they can be remedied with language extensions, new synchronization constructs, and, finally, adequate optimizations by the back-end C compilers.
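
The communication-efficiency bottleneck mentioned above is easiest to see side by side: a loop that reads a neighbor's block element by element issues many fine-grained remote accesses, while a single bulk upc_memget of the same data is what current UPC compilers and runtimes handle efficiently. The sketch below illustrates the contrast; it is not code from the NAS benchmark variants studied in the paper, and the sizes and names are illustrative.

    #include <upc.h>
    #include <stdio.h>

    #define CHUNK 1024
    shared [CHUNK] double a[CHUNK * THREADS];   /* one contiguous block per thread */

    double local_copy[CHUNK];                   /* private destination buffer */

    int main(void)
    {
        int next = (MYTHREAD + 1) % THREADS;    /* fetch the neighbor's block */
        int i;

        upc_forall (i = 0; i < CHUNK * THREADS; i++; &a[i])
            a[i] = (double)i;                   /* initialize owned elements */
        upc_barrier;

        /* Fine-grained: each iteration may generate a separate remote get. */
        for (i = 0; i < CHUNK; i++)
            local_copy[i] = a[next * CHUNK + i];

        /* Bulk: a single upc_memget transfers the whole block at once. */
        upc_memget(local_copy, &a[next * CHUNK], CHUNK * sizeof(double));

        if (MYTHREAD == 0)
            printf("first fetched element = %.1f\n", local_copy[0]);
        return 0;
    }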

148 citations


Cited by
Journal ArticleDOI

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one, which seemed an odd beast at the time: an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

01 May 1993
TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.
Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently: those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90%, and a 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.
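
Of the three strategies above, the spatial-decomposition algorithm rests on a simple owner computation: the simulation box is divided into a grid of subdomains, one per processor, and an atom belongs to whichever subdomain contains its coordinates. The fragment below is a minimal sketch of that mapping; the box dimensions, grid shape, and rank numbering are illustrative and not taken from the paper.

    #include <stdio.h>

    /* Map coordinates (x,y,z) in a box of size (lx,ly,lz) to a processor rank
       on a px*py*pz grid of equal subdomains (row-major rank numbering). */
    int owner_rank(double x, double y, double z,
                   double lx, double ly, double lz,
                   int px, int py, int pz)
    {
        int ix = (int)(x / lx * px);  if (ix == px) ix = px - 1;
        int iy = (int)(y / ly * py);  if (iy == py) iy = py - 1;
        int iz = (int)(z / lz * pz);  if (iz == pz) iz = pz - 1;
        return (iz * py + iy) * px + ix;
    }

    int main(void)
    {
        /* An atom at (7.3, 1.2, 9.9) in a 10x10x10 box on a 2x2x2 processor grid. */
        printf("owner = %d\n", owner_rank(7.3, 1.2, 9.9, 10, 10, 10, 2, 2, 2));
        return 0;
    }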

29,323 citations

Journal ArticleDOI
TL;DR: A review of recent as well as classic image registration methods to provide a comprehensive reference source for the researchers involved in image registration, regardless of particular application areas.

6,842 citations

Journal ArticleDOI
TL;DR: This paper presents an overview of unmixing methods from the time of Keshava and Mustard's unmixing tutorial to the present, including signal-subspace, geometrical, statistical, sparsity-based, and spatial-contextual unmixing algorithms.
Abstract: Imaging spectrometers measure electromagnetic energy scattered in their instantaneous field of view in hundreds or thousands of spectral channels with higher spectral resolution than multispectral cameras. Imaging spectrometers are therefore often referred to as hyperspectral cameras (HSCs). Higher spectral resolution enables material identification via spectroscopic analysis, which facilitates countless applications that require identifying materials in scenarios unsuitable for classical spectroscopic analysis. Due to the low spatial resolution of HSCs, microscopic material mixing, and multiple scattering, the spectra measured by HSCs are mixtures of the spectra of the materials in a scene. Thus, accurate estimation requires unmixing. Pixels are assumed to be mixtures of a few materials, called endmembers. Unmixing involves estimating all or some of: the number of endmembers, their spectral signatures, and their abundances at each pixel. Unmixing is a challenging, ill-posed inverse problem because of model inaccuracies, observation noise, environmental conditions, endmember variability, and data set size. Researchers have devised and investigated many models searching for robust, stable, tractable, and accurate unmixing algorithms. This paper presents an overview of unmixing methods from the time of Keshava and Mustard's unmixing tutorial to the present. Mixing models are first discussed. Signal-subspace, geometrical, statistical, sparsity-based, and spatial-contextual unmixing algorithms are described. Mathematical problems and potential solutions are described. Algorithm characteristics are illustrated experimentally.
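
The simplest of the mixing models discussed above is the linear one, in which each pixel's spectrum is a weighted sum of the endmember spectra, with nonnegative abundances that typically sum to one, plus noise. The fragment below synthesizes one such mixed pixel as a minimal illustration; the band count, endmember signatures, and abundances are made-up values, and the noise term is omitted.

    #include <stdio.h>

    #define BANDS 4
    #define ENDMEMBERS 2

    int main(void)
    {
        /* Endmember signatures, one column per material (BANDS x ENDMEMBERS). */
        double M[BANDS][ENDMEMBERS] = {
            {0.10, 0.80}, {0.20, 0.70}, {0.60, 0.30}, {0.90, 0.10}
        };
        double a[ENDMEMBERS] = {0.3, 0.7};   /* abundances: >= 0, sum to 1 */
        double y[BANDS];                     /* simulated mixed-pixel spectrum */

        for (int b = 0; b < BANDS; b++) {
            y[b] = 0.0;
            for (int e = 0; e < ENDMEMBERS; e++)
                y[b] += M[b][e] * a[e];      /* y = M*a, noise omitted for brevity */
            printf("band %d: %.3f\n", b, y[b]);
        }
        return 0;
    }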

2,373 citations

01 Jan 2007

1,932 citations