Book Chapter

Reconstructing hardware transactional memory for workload optimized systems

26 Sep 2011, pp. 1-15
TL;DR: Argues that Hardware Transactional Memory (HTM) can be a suitable implementation choice for workload optimized systems, and that knowledge of the workload is extremely useful for making appropriate design choices in a workload optimized HTM.
Abstract: Workload optimized systems, consisting of a large number of general and special purpose cores and with support for shared memory programming, are slowly becoming prevalent. One of the major impediments to effective parallel programming on these systems is lock-based synchronization. An alternative synchronization solution called Transactional Memory (TM) is currently being explored. We observe that most TM design proposals in the literature are tailored to match the constraints of general purpose computing platforms. Given that workload optimized systems utilize wider hardware design spaces and on-chip parallelism, we argue that Hardware Transactional Memory (HTM) can be a suitable implementation choice for these systems. We re-evaluate the criteria to be satisfied by an HTM and identify possible scope for relaxations in the context of workload optimized systems. Based on the relaxed criteria, we demonstrate the scope for building HTM design variants, such that each variant caters to a specific workload requirement. We carry out suitable experiments to bring out the trade-offs between the design variants. Overall, we show how knowledge of the workload is extremely useful in making appropriate design choices in a workload optimized HTM.
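The abstract describes HTM at the design level and does not prescribe a programming interface. As a rough sketch of the programming model such systems expose, the fragment below contrasts a lock-based critical section with a hardware transaction that falls back to the lock on abort. It uses Intel's TSX/RTM intrinsics purely as a familiar stand-in (the paper targets custom workload optimized HTM designs, not TSX), and the fallback_lock elision pattern and all identifiers are illustrative assumptions.

/* Sketch: lock-based vs. hardware-transactional synchronization.
 * Intel TSX/RTM intrinsics stand in for an HTM here; the paper's
 * workload optimized HTM variants are hardware design proposals,
 * not this ISA. Compile with: gcc -O2 -mrtm htm_sketch.c */
#include <immintrin.h>
#include <stdatomic.h>

static atomic_int fallback_lock = 0;   /* 0 = free, 1 = held */
static long counter = 0;               /* shared data */

static void lock_acquire(void) {
    int expected = 0;
    while (!atomic_compare_exchange_weak(&fallback_lock, &expected, 1))
        expected = 0;                  /* spin until the lock is free */
}

static void lock_release(void) {
    atomic_store(&fallback_lock, 0);
}

void increment(void) {
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        /* Reading the lock adds it to the transaction's read set, so a
         * concurrent lock holder forces this transaction to abort. */
        if (atomic_load(&fallback_lock) != 0)
            _xabort(0xff);
        counter++;                     /* speculative; conflict-checked by hardware */
        _xend();                       /* commit the transaction */
    } else {
        lock_acquire();                /* abort path: plain lock-based fallback */
        counter++;
        lock_release();
    }
}

The transactional path touches the shared lock word only to read it, so uncontended transactions commit without serializing on the lock; only on an abort does a thread fall back to conventional locking.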
Citations
Journal Article
TL;DR: This is the first textbook to provide a comprehensive overview of the technical aspects of building parallel programs using BSP and BSPlib; it is contemporary, well presented, and balanced between concepts and the technical depth required for developing parallel algorithms.
Abstract: Parallel Scientific Computation: A Structured Approach using BSP and MPI. Rob H. Bisseling. Hardcover: 324 pages. Oxford University Press, USA (May 6, 2004). Language: English. ISBN: 0198529392.

In spite of many efforts, no solid framework exists for developing parallel software that is portable and efficient across various parallel architectures. The lack of such a framework is mostly due to the absence of a universal model of parallel computation that could play a role similar to the one the von Neumann model plays for sequential computing and rein in the diversity of existing parallel architectures and parallel programming models. Bulk Synchronous Parallel (BSP) is a parallel computing model proposed by Valiant in 1989, which provides a useful and elegant theoretical framework for bridging the gap between parallel hardware and software. The model comprises a computer architecture (the BSP computer), a class of algorithms (BSP algorithms), and a performance model (the BSP cost function).

The attraction of the BSP model lies in its simplicity. A BSP computer consists of a collection of processors, each with private memory, and a communication network. A BSP algorithm consists of a sequence of supersteps; a superstep contains either a number of computation steps or a number of communication steps, followed by a global barrier synchronization. The BSP cost function is based on four parameters: the number of processors (p), the processor computing rate (r), the communication cost per data word (g), and the synchronization cost (l).

In Parallel Scientific Computation: A Structured Approach using BSP and MPI, Rob Bisseling provides a practical introduction to numerical scientific computation, using the BSPlib communication library for parallel algorithm design and parallel programming. Each chapter contains an abstract; a brief discussion of the sequential algorithm, included to make the material self-contained; the design and analysis of a parallel algorithm; an annotated program text; illustrative experimental results of an implementation on a particular parallel computer; bibliographic notes; and theoretical and practical exercises. The source files of the printed program texts, together with a set of test programs that demonstrate their use, form a package called BSPedupack, which is available at the official home page of the book.

Researchers, students, and savvy professionals, schooled in hardware or software, will value Bisseling's self-study approach to parallel scientific programming. After all, this is the first textbook to provide a comprehensive overview of the technical aspects of building parallel programs using BSP. The book opens with an overview of the BSP model and BSPlib, which tells you how to get started with writing BSP programs and how to benchmark your computer as a BSP computer. Chapter 2, on dense LU decomposition, presents a regular computation with communication patterns that are common in matrix computations. Chapter 3, on the FFT, also treats a regular computation, but one with a more complex flow of data. Chapter 4 presents the multiplication of a sparse matrix and a dense vector. Appendix C presents MPI programs in the order in which the corresponding BSP programs appear in the main text. The book includes a reasonable number of real-world examples, which support the theoretical aspects of the discussion. It is easy to follow, with a logical and consistent exposition and clear descriptions of basic and advanced techniques.
Being a textbook, it contains various exercises and project assignments at the end of each chapter. However, sample solutions for these exercises are not available; an accompanying CD carrying sample solutions and tutorials for classroom use would have added to the academic value of the book. Still, the bibliographic notes at the end of each chapter, as well as the references at the end of the book, are quite useful for those interested in exploring the subject of BSP development further. The book is contemporary, well presented, and balanced between concepts and the technical depth required for developing parallel algorithms. Although the book takes a simple performance view of parallel algorithm design, readers should have some basic knowledge of parallel computing, data structures, and C programming. Overall, the book is suitable as a textbook for one-term undergraduate or graduate courses, as a self-study book, or as technical training material for professionals.

Ami Marowka, Department of Software Engineering, Shenkar College of Engineering and Design, Ramat-Gan, Israel.
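The review describes the BSP superstep structure and cost model only in prose. In the model, one superstep costs T = w + h*g + l, where w is the maximum local work and h is the maximum number of words sent or received by any processor. Below is a minimal two-superstep sketch in C against the standard BSPlib interface (bsp_begin, bsp_put, bsp_sync); the variable names are illustrative and the fragment is not taken from the book.

/* Minimal BSPlib-style sketch of supersteps: each processor computes
 * locally, then puts one value into the next processor's memory, with
 * a global barrier (bsp_sync) ending each superstep. Assumes the
 * BSPlib C interface; `value`/`received` are illustrative names. */
#include <stdio.h>
#include "bsp.h"

int main(int argc, char **argv) {
    bsp_begin(bsp_nprocs());           /* start SPMD section on all processors */

    int p = bsp_nprocs(), s = bsp_pid();
    int value = s * s;                 /* local computation step */
    int received = 0;

    bsp_push_reg(&received, sizeof(int));  /* register destination for puts */
    bsp_sync();                            /* superstep 0 ends; registration takes effect */

    /* Communication step: send our value to the next processor. */
    bsp_put((s + 1) % p, &value, &received, 0, sizeof(int));
    bsp_sync();                            /* superstep 1 ends; puts are now visible */

    printf("proc %d of %d received %d\n", s, p, received);
    bsp_pop_reg(&received);

    bsp_end();
    return 0;
}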

80 citations
