Book Chapter

Reconstructing hardware transactional memory for workload optimized systems

26 Sep 2011, pp. 1-15
TL;DR: Argues that Hardware Transactional Memory (HTM) can be a suitable implementation choice for workload optimized systems, and that knowledge of the workload is extremely useful for making appropriate design choices in a workload optimized HTM.
Abstract: Workload optimized systems, consisting of a large number of general and special purpose cores and with support for shared memory programming, are slowly becoming prevalent. One of the major impediments to effective parallel programming on these systems is lock-based synchronization. An alternative synchronization solution called Transactional Memory (TM) is currently being explored. We observe that most TM design proposals in the literature are tailored to match the constraints of general purpose computing platforms. Given that workload optimized systems utilize wider hardware design spaces and on-chip parallelism, we argue that Hardware Transactional Memory (HTM) can be a suitable implementation choice for these systems. We re-evaluate the criteria to be satisfied by an HTM and identify possible scope for relaxations in the context of workload optimized systems. Based on the relaxed criteria, we demonstrate the scope for building HTM design variants, such that each variant caters to a specific workload requirement. We carry out suitable experiments to bring out the trade-offs between the design variants. Overall, we show how knowledge of the workload is extremely useful in making appropriate design choices in a workload optimized HTM.
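The abstract describes HTM at the design level and does not prescribe a programming interface. As a rough sketch of the programming model such systems expose, the fragment below contrasts a lock-based critical section with a hardware transaction that falls back to the lock on abort. It uses Intel's TSX/RTM intrinsics purely as a familiar stand-in (the paper targets custom workload optimized HTM designs, not TSX), and the fallback_lock elision pattern and all identifiers are illustrative assumptions.

/* Sketch: lock-based vs. hardware-transactional synchronization.
 * Intel TSX/RTM intrinsics stand in for an HTM here; the paper's
 * workload optimized HTM variants are hardware design proposals,
 * not this ISA. Compile with: gcc -O2 -mrtm htm_sketch.c */
#include <immintrin.h>
#include <stdatomic.h>

static atomic_int fallback_lock = 0;   /* 0 = free, 1 = held */
static long counter = 0;               /* shared data */

static void lock_acquire(void) {
    int expected = 0;
    while (!atomic_compare_exchange_weak(&fallback_lock, &expected, 1))
        expected = 0;                  /* spin until the lock is free */
}

static void lock_release(void) {
    atomic_store(&fallback_lock, 0);
}

void increment(void) {
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        /* Reading the lock adds it to the transaction's read set, so a
         * concurrent lock holder forces this transaction to abort. */
        if (atomic_load(&fallback_lock) != 0)
            _xabort(0xff);
        counter++;                     /* speculative; conflict-checked by hardware */
        _xend();                       /* commit the transaction */
    } else {
        lock_acquire();                /* abort path: plain lock-based fallback */
        counter++;
        lock_release();
    }
}

The transactional path touches the shared lock word only to read it, so uncontended transactions commit without serializing on the lock; only on an abort does a thread fall back to conventional locking.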
Citations
Journal Article
TL;DR: This is the first textbook to provide a comprehensive overview of the technical aspects of building parallel programs using BSP and BSPlib; it is contemporary, well presented, and balanced between concepts and the technical depth required for developing parallel algorithms.
Abstract: Parallel Scientific Computation: A Structured Approach using BSP and MPI. Rob H. Bisseling. Hardcover: 324 pages. Oxford University Press, USA (May 6, 2004). Language: English. ISBN: 0198529392.

In spite of many efforts, no solid framework exists for developing parallel software that is portable and efficient across various parallel architectures. The lack of such a framework is mostly due to the absence of a universal model of parallel computation that could play a role similar to the one the von Neumann model plays for sequential computing and rein in the diversity of existing parallel architectures and parallel programming models. Bulk Synchronous Parallel (BSP) is a parallel computing model proposed by Valiant in 1989, which provides a useful and elegant theoretical framework for bridging the gap between parallel hardware and software. The model comprises a computer architecture (the BSP computer), a class of algorithms (BSP algorithms), and a performance model (the BSP cost function).

The attraction of the BSP model lies in its simplicity. A BSP computer consists of a collection of processors, each with private memory, and a communication network. A BSP algorithm consists of a sequence of supersteps; a superstep contains either a number of computation steps or a number of communication steps, followed by a global barrier synchronization. The BSP cost function is based on four parameters: the number of processors (p), the processor computing rate (r), the communication cost per data word (g), and the synchronization cost (l).

In Parallel Scientific Computation: A Structured Approach using BSP and MPI, Rob Bisseling provides a practical introduction to numerical scientific computation, using the BSPlib communication library for parallel algorithm design and parallel programming. Each chapter contains an abstract; a brief discussion of the sequential algorithm, included to make the material self-contained; the design and analysis of a parallel algorithm; an annotated program text; illustrative experimental results of an implementation on a particular parallel computer; bibliographic notes; and theoretical and practical exercises. The source files of the printed program texts, together with a set of test programs that demonstrate their use, form a package called BSPedupack, which is available at the official home page of the book.

Researchers, students, and savvy professionals, schooled in hardware or software, will value Bisseling's self-study approach to parallel scientific programming. After all, this is the first textbook to provide a comprehensive overview of the technical aspects of building parallel programs using BSP. The book opens with an overview of the BSP model and BSPlib, which tells you how to get started with writing BSP programs and how to benchmark your computer as a BSP computer. Chapter 2, on dense LU decomposition, presents a regular computation with communication patterns that are common in matrix computations. Chapter 3, on the FFT, also treats a regular computation, but one with a more complex flow of data. Chapter 4 presents the multiplication of a sparse matrix and a dense vector. Appendix C presents MPI programs in the order in which the corresponding BSP programs appear in the main text. The book includes a reasonable number of real-world examples, which support the theoretical aspects of the discussion. It is easy to follow, with a logical and consistent exposition and clear descriptions of basic and advanced techniques.
Being a textbook, it contains various exercises and project assignments at the end of each chapter. However, sample solutions for these exercises are not available; an accompanying CD carrying sample solutions and tutorials for classroom use would have added to the academic value of the book. Still, the bibliographic notes at the end of each chapter, as well as the references at the end of the book, are quite useful for those interested in exploring the subject of BSP development further. The book is contemporary, well presented, and balanced between concepts and the technical depth required for developing parallel algorithms. Although the book takes a simple performance view of parallel algorithm design, readers should have some basic knowledge of parallel computing, data structures, and C programming. Overall, the book is suitable as a textbook for one-term undergraduate or graduate courses, as a self-study book, or as technical training material for professionals.

Ami Marowka, Department of Software Engineering, Shenkar College of Engineering and Design, Ramat-Gan, Israel.
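The review describes the BSP superstep structure and cost model only in prose. In the model, one superstep costs T = w + h*g + l, where w is the maximum local work and h is the maximum number of words sent or received by any processor. Below is a minimal two-superstep sketch in C against the standard BSPlib interface (bsp_begin, bsp_put, bsp_sync); the variable names are illustrative and the fragment is not taken from the book.

/* Minimal BSPlib-style sketch of supersteps: each processor computes
 * locally, then puts one value into the next processor's memory, with
 * a global barrier (bsp_sync) ending each superstep. Assumes the
 * BSPlib C interface; `value`/`received` are illustrative names. */
#include <stdio.h>
#include "bsp.h"

int main(int argc, char **argv) {
    bsp_begin(bsp_nprocs());           /* start SPMD section on all processors */

    int p = bsp_nprocs(), s = bsp_pid();
    int value = s * s;                 /* local computation step */
    int received = 0;

    bsp_push_reg(&received, sizeof(int));  /* register destination for puts */
    bsp_sync();                            /* superstep 0 ends; registration takes effect */

    /* Communication step: send our value to the next processor. */
    bsp_put((s + 1) % p, &value, &received, 0, sizeof(int));
    bsp_sync();                            /* superstep 1 ends; puts are now visible */

    printf("proc %d of %d received %d\n", s, p, received);
    bsp_pop_reg(&received);

    bsp_end();
    return 0;
}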

80 citations
