
Showing papers in "Scalable Computing: Practice and Experience in 2009"


Journal ArticleDOI
TL;DR: The fundamentals of Bio-Inspired Artificial Intelligence are well demonstrated, allowing for a novice researcher in this area to develop the necessary skills and have a firm grasp on this topic.
Abstract: Bio-Inspired Artificial Intelligence: Theories, Methods, and Technologies by D. Floreano and C. Mattiussi This is a book that bridges biological systems and computer science. For digital-based researchers, having this book, which details the biological components of natural life and seamlessly integrates that knowledge into our digital realm, is an essential asset. Each chapter systematically introduces the reader to a biological system while easing them into its computational counterpart. There are seven chapters covering evolutionary, cellular, neural, developmental, immune, behavioral, and collective systems. Chapter 1 introduces the fundamental concept of computational evolution as related to biological systems. This chapter starts with the basic concepts of evolutionary theory and progresses, covering everything from fitness functions to analog circuits. The following chapter presents the next logical step upwards in biology, cellular structures and systems, again introducing the basics of life and progressing towards cellular automata. Chapter 3 covers Neural Networks by introducing the Biological Nervous System, then the Artificial Neural Network. The core concepts of Neural Networks are detailed in a systematic and common-sense manner, introducing unsupervised learning, supervised learning, and reinforcement learning, then progressing onto neural hardware and hybrid systems. In Chapter 4, the authors detail developmental systems, explaining how nature utilizes cellular structures and how engineers can mimic nature. This theme of progression from biological introduction to digital computation is reproduced as a single voice throughout each chapter. The fundamentals of Bio-Inspired Artificial Intelligence are well demonstrated, allowing a novice researcher in this area to develop the necessary skills and gain a firm grasp on this topic. Once the reader has a solid grasp of the building blocks of life, the authors present chapters related to larger systems. Of particular interest to my research is the chapter on Immune Systems. This chapter provides a fundamental understanding of the Human Immune System, detailing the finer points of immunological cellular structures, while introducing a slightly more than generalized immune response concept. After a lengthy introduction of human immunology, we are introduced to the core of Artificial Immune Systems, the Negative Selection Algorithm and the Clonal Selection Algorithm. Each of these algorithms is covered in enough depth that the reader is capable of understanding each respective algorithm's strengths and limitations. For researchers new to Artificial Immune Systems, days of reading journal articles are summarized in these sections, allowing for intelligent and efficient decision making in choosing your next step of research. Chapters 6 and 7 provide the audience with behavioral systems and collective systems, respectively. The behavioral systems covered in this book relate to aspects of AI, robots, and some machine learning. Once behavior is understood, collective and cooperative systems are covered. Optimization techniques of particle swarms, ant colonies, and topics derived from robotics are detailed and well explained. While this is not a textbook, it does cover the fundamental concepts required to research Bio-Inspired Artificial Intelligence. For myself, the quality of this book can simply be noted by the publisher, MIT Press.
Many of the best books I have encountered in my studies have been published by MIT Press, and here is another. Floreano and Mattiussi have not let me down in their quality, albeit I do have some complaints. First, while the topics cover a solid breadth, the depth in detailing the computational side is limited. I would like to have seen either more depth in each chapter or a broader look at each chapter's algorithms, but the book falls somewhere in the middle. My current research involves Danger Signals and their relationship to preventing Epidemic Attacks, so I would have liked to see more detail about Polly Matzinger's Danger Theory rather than one short paragraph saying that it is not universally accepted. While immunologists may debate Danger Theory, novel algorithms have been developed from the concept of Danger Theory and deserve a place in this book. Yet to counter my own argument, the authors do finish off each chapter with a Suggested Readings section outlining a series of excellent papers that supplement the chapter's topics and would eventually lead the reader to these novel topics. Overall, if you are interested in this field, buy this book. You can find it online at MIT Press for a discounted price. This book will make an excellent addition to any computer researcher's library. Anthony Kulis, Department of Computer Science, Southern Illinois University

292 citations



Journal ArticleDOI
TL;DR: This paper identifies clearly which heuristics need to be chosen and this allows a new technique for computing the conditioning which is ideally suited to BVODES to be introduced.
Abstract: Boundary value problems for ordinary differential equations (BVODES) occur in a great many practical situations and they are generally much harder to solve than initial value problems. Traditionally codes for BVODES did not take into account the conditioning of the problem and it was generally assumed that the problem being solved was well conditioned so that small local errors gave rise to correspondingly small global errors. Recently a new generation of codes which take account of conditioning has been developed. However most of these codes are based on a rather ad hoc approach with the need to choose several heuristics without any real guidance on how these choices can be made. In this paper we identify clearly which heuristics need to be chosen and we discuss different choices of monitor functions that are used in our codes. This has the important effect of unifying the various approaches that have recently been proposed. This in turn allows us, in the present paper, to introduce a new technique for computing the conditioning which is ideally suited to BVODES.

14 citations


Journal ArticleDOI
TL;DR: Measurements show that task granularity is a key factor in the performance of the decomposition/mapping strategy, which exhibits good scalability and improves performance on both cheaper commodity clusters and high-performance clusters with low-latency networks.
Abstract: In this paper we aim at exploiting the temporal coherence among successive phases of a computation, in order to implement a load-balancing technique in mesh-like computations to be mapped on a cluster of processors. A key concept, on which the load balancing scheme is built, is the use of a Predictor component that is in charge of providing an estimation of the unbalancing between successive phases. By using this information, our method partitions the computation into balanced tasks through the Prediction Binary Tree (PBT). At each new phase, the current PBT is updated by using each task's computing time in the previous phase as the cost estimate for the next phase. The PBT is designed so that it balances the load across the tasks as well as reduces dependency among processors for higher performance. Reducing dependency is obtained by using rectangular tiles of the mesh, of almost-square shape (i.e. one dimension is at most twice the other). By reducing dependency, one can reduce inter-processor communication or exploit local dependencies among tasks (such as data locality). Furthermore, we also provide two heuristics which take advantage of data locality. Our strategy has been assessed on a significant problem, Parallel Ray Tracing. Our implementation shows good scalability, and improves performance on both cheaper commodity clusters and high-performance clusters with low-latency networks. We report different measurements showing that task granularity is a key point for the performance of our decomposition/mapping strategy.
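The prediction-driven splitting idea lends itself to a short sketch. The following Python fragment is a minimal, hypothetical illustration (not the authors' implementation): each tile's measured time from the previous phase serves as its predicted cost for the next phase, and the heaviest tile is split along its longer side so that tiles remain almost square. The function name split_heaviest and the tile representation are assumptions made for this sketch.

```python
def split_heaviest(tiles, predicted_cost):
    """tiles: list of (x, y, w, h) rectangles; predicted_cost: dict tile -> seconds."""
    heaviest = max(tiles, key=lambda t: predicted_cost.get(t, 0.0))
    x, y, w, h = heaviest
    if w >= h:                                   # split along the longer side
        a, b = (x, y, w // 2, h), (x + w // 2, y, w - w // 2, h)
    else:
        a, b = (x, y, w, h // 2), (x, y + h // 2, w, h - h // 2)
    for child in (a, b):                         # children inherit half the parent's cost
        predicted_cost[child] = predicted_cost.get(heaviest, 0.0) / 2.0
    return [t for t in tiles if t != heaviest] + [a, b]

# Example: a 512x512 tile that proved expensive (1.8 s) in the previous phase.
tiles = [(0, 0, 512, 512)]
cost = {(0, 0, 512, 512): 1.8}
print(split_heaviest(tiles, cost))
```

Splitting along the longer dimension preserves the almost-square invariant: if a tile has one side at most twice the other, both halves do as well.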

11 citations


Journal ArticleDOI
TL;DR: It is shown that a class of Runge-Kutta methods investigated by Milne and Rosser that compute a block of new values at each step are well-suited to vectorization.
Abstract: Vectorization is very important to the efficiency of computation in the popular problem-solving environment Matlab. It is shown that a class of Runge-Kutta methods investigated by Milne and Rosser that compute a block of new values at each step are well-suited to vectorization. Local error estimates and continuous extensions that require no additional function evaluations are derived. A (7,8) pair is derived and implemented in a program BV78 that is shown to perform quite well when compared to the well-known Matlab ODE solver ode45 which is based on a (4,5) pair.
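As a rough illustration of why block methods vectorize well (this is not the BV78 code nor its (7,8) pair), the sketch below advances a whole block of output points with array operations, assuming the right-hand side accepts array arguments; the block abscissae in c are hypothetical.

```python
import numpy as np

def f(t, y):
    return -y                                    # test problem y' = -y, y(0) = 1

t0, y0, h = 0.0, 1.0, 0.1
c = np.array([0.25, 0.5, 0.75, 1.0])             # hypothetical block abscissae

# First-order block predictor: all output points of the step are obtained in
# one array expression instead of a loop of scalar function evaluations.
y_block = y0 + c * h * f(t0, y0)
print(y_block)                                   # rough approximations to exp(-c*h)
```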

11 citations


Journal ArticleDOI
TL;DR: This paper examines the problem of minimizing the variance of the download time in a particular Video on Demand system based on a Grid Delivery Network, a hybrid architecture based on P2P and Grid Computing concepts, and formulates it as an optimization problem.
Abstract: In this paper, we examine the problem of minimizing the variance of the download time in a particular Video on Demand System. This VOD system is based on a Grid Delivery Network which is a hybrid architecture based on P2P and Grid Computing concepts. In this system, videos are divided into blocks and replicated on hosts to decrease the average response time. The purpose of the paper is to study the impact of the block allocation scheme on the variance of the download time. We formulate this as an optimization problem, and show that this problem can be reduced to finding a Steiner System. We analyze different heuristics to solve it in practice, and validate through simulation that a random allocation is quasi-optimal.
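A toy simulation conveys the setting, although it is not the paper's model: blocks are replicated on randomly chosen hosts, each block is fetched from a random holder of one of its replicas, and the variance of the resulting download times is estimated. The host latencies and replication factor below are invented.

```python
import random
from statistics import pvariance

def random_allocation(n_blocks, n_hosts, replicas):
    # each block is replicated on `replicas` distinct, randomly chosen hosts
    return {b: random.sample(range(n_hosts), replicas) for b in range(n_blocks)}

def download_times(allocation, host_delay):
    # each block is served by a random host among those holding one of its replicas
    return [host_delay[random.choice(hosts)] for hosts in allocation.values()]

host_delay = [random.uniform(0.5, 2.0) for _ in range(20)]   # hypothetical latencies (s)
alloc = random_allocation(n_blocks=100, n_hosts=20, replicas=3)
print("variance of download time:", pvariance(download_times(alloc, host_delay)))
```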

7 citations


Journal ArticleDOI
TL;DR: A control engineering environment called CPDev for programming small controllers in the ST, FBD and IL languages of the IEC 61131-3 standard is presented; applications include two controllers for small DCS systems and a PC equipped with I/O boards.
Abstract: A control engineering environment called CPDev for programming small controllers in the ST, FBD and IL languages of the IEC 61131-3 standard is presented. The environment consists of a compiler, simulator and hardware configurer. It is open in the sense that: (1) code generated by the compiler can be executed by different processors, (2) low-level components of the controller runtime program are developed by hardware designers, (3) control programmers can define libraries with functions, function blocks and programs. Of the three IEC languages, ST (Structured Text) is the basis for CPDev. FBD diagrams are translated to ST. The IL compiler uses the same code generator. The runtime program has the form of a virtual machine which executes universal code generated by the compiler. The machine is an ANSI C program with some platform-dependent components. Machines for AVR, ARM, MCS51 and x86 processors have been developed so far. Applications include two controllers for small DCS systems and a PC equipped with I/O boards. CPDev may be downloaded from http://cpdev.prz-rzeszow.pl/demo.
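To illustrate the virtual-machine idea (the actual CPDev runtime is an ANSI C program with its own instruction set; the opcodes below are invented for this sketch), a toy stack interpreter in Python can execute processor-independent code such as might be compiled from a simple ST assignment.

```python
def run(program, memory):
    """Execute a list of (opcode, args...) tuples against a variable memory dict."""
    stack, pc = [], 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "PUSH":
            stack.append(args[0])
        elif op == "LOAD":
            stack.append(memory[args[0]])
        elif op == "STORE":
            memory[args[0]] = stack.pop()
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        pc += 1
    return memory

# e.g. code that a compiler might emit for the ST statement  OUT := IN1 + IN2;
program = [("LOAD", "IN1"), ("LOAD", "IN2"), ("ADD",), ("STORE", "OUT")]
print(run(program, {"IN1": 2, "IN2": 3}))        # {'IN1': 2, 'IN2': 3, 'OUT': 5}
```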

7 citations


Journal ArticleDOI
TL;DR: A selection of papers extending work presented at the 7th International Symposium on Parallel and Distributed Computing, 1–5 July 2008, in Krakow, Poland, chosen to present the flavour of the research reported at the conference and some of the most relevant topics currently in focus in research on parallel and distributed computing in general.
Abstract: Dear SCPE Reader, We present a selection of papers which are extensions of papers presented at the 7th International Symposium on Parallel and Distributed Computing, 1–5 July 2008, in Krakow, Poland. The motivation for publishing the selection in the SCPE Journal was, on the one hand, to present the flavour of the research reported at the conference and, on the other hand, to present some of the most relevant topics currently in focus in research on parallel and distributed computing in general. The selection contains only 6 papers out of about 60 presented at the conference, and thus is far from covering all relevant topics represented at ISPDC 2008. This is because not all of the invited authors were patient enough to accept a fairly long paper publishing process. Nevertheless, we hope that the presented papers will bring you closer to the research covered by the ISPDC conferences and will encourage you to participate in future ISPDC editions. The first paper, ``The Impact of Workload Variability on Load Balancing Algorithms'', is by Marta Beltran and Antonio Guzman from King Juan Carlos University in Spain. It concerns an important topic of load balancing in cluster systems, namely the adaptivity of load balancing algorithms to changes of the workload in the system. Adequate accounting for additional load in the hosting system is of great relevance for correct optimization effects. The paper presents a thorough formal analysis of workload variability metrics and their influence on the quality of load balancing algorithms. Four basic activities appearing in load balancing algorithms are identified, and based on them some algorithmic solutions are proposed to correctly deal with workload variability in system load balancing. The problem of the robustness of dynamic load balancing algorithms is also discussed. Two different robustness metrics, sensitive to the applied type of optimization (local task-oriented or global), enable selecting remote task execution or migration as load balancing operations. The proposed approach is illustrated with experiments. The second paper, ``Model-Driven Engineering and Formal Validation of High-Performance Embedded Systems'', is by Abdoulaye Gamatie, Eric Rutten, Huafeng Yu, Pierre Boulet and Jean-Luc Dekeyser, from the University of Lille and INRIA in France. The paper is concerned with a very advanced methodology for designing correct parallel embedded systems for intensive data-parallel computing. In their previous research, the authors of the paper designed the GASPARD embedded system design framework. It is based on the hardware/software co-design approach through model-driven engineering. The framework is based on a UML-like model specification language in which hardware and software elements are modelled using a component approach with special mechanisms for repetitive structures. This paper tries to combine the modelling framework of GASPARD with the mechanisms of synchronous languages to achieve the design verifiability provided for such languages. The paper shows how GASPARD models can be translated into synchronous models based on data flow equations in order to formally check their correctness. The proposed approach is illustrated with an example of a video processing system. The third paper, ``Relations Between Several Parallel Computational Models'', is by Stephan Bruda and Yuanqiao Zhang from Bishop's University in Canada.
The paper is concerned with theoretical aspects of shared memory systems described by the parallel random access machine (PRAM) model and aims at studying performance properties of different types of PRAM systems. The attention is focused on analysing the computational power of two more sophisticated PRAM models (Combining CRCW and Broadcast with Selective Reduction), which include data reduction in the case of concurrent writes. The paper shows that these two models have equivalent computational power, which is a new result compared to the existing literature. The performance of both models applied to reconfigurable multiple bus machines was studied as a possible architectural solution for current VLSI processor implementations. It was shown that in such systems, under reasonable assumptions, concurrent-write does not enhance performance compared to the exclusive-write model. Another result important for VLSI technology is that the Combining CRCW PRAM model (in which data of concurrent writes are arithmetically or logically combined before the write) and the exclusive-write model on directed reconfigurable buses perform in an equivalent way under strong real-time requirements. The fourth paper, ``Experiences with Mesh-Like Computations Using Prediction Binary Trees'', is by Gennaro Cordasco, Biagio Cosenza, Rosario de Chiara, Ugo Erra and Vittorio Scarano from the University ``degli Studi'' of Salerno and the University ``degli Studi della Basilicata'' of Potenza in Italy. The paper concerns optimization methods for mesh-like computations in clusters of processors. The computations are performed assuming a phase-like program execution control using a tiling approach which reduces inter-processor communication. Temporal coherence is also assumed, which means that task sizes provide similar execution times in consecutive phases. Temporally coherent computations are structured in a Prediction Binary Tree, in which leaves represent computing tiles to be mapped to processors. A phase-by-phase semi-static load balancing is introduced into the scheduling algorithm. The scheduling algorithm is equipped with a predictor, which estimates the computation time of next-phase tiles based on previous execution times and modifies the tiles to achieve balanced execution in phases. For this, two heuristics are used to leverage data locality in processors. The proposed approach is illustrated by the example of interactive rendering with the Parallel Ray Tracing algorithm. The fifth paper, ``The Influence of the IBM pSeries Servers Virtualization Mechanism on Dynamic Resource Allocation in AIX 5L'', is by Maciej Mlynski from ASpartner Limited in Poland. The paper concerns a very up-to-date problem of system virtualization and presents the results of research carried out on IBM pSeries servers. IBM is strongly developing the virtualization technique, especially on IBM pSeries servers, enabling an improved and flexible sharing of system resources between applications. The paper investigates novel facilities for dynamic resource management such as micro-partitioning and the partition load manager. They enable dynamic creation of workload logical partitions of system resources and their dynamic management. This includes run-time resource re-allocation between logical partitions, including setting of sharing specifications, as well as run-time adding, removing and setting of parameters of resources in the system.
It remains an open question how to properly tune parameters of the operating system using the provided virtualization facilities to obtain the best efficiency for a given application program. The paper presents the results of experiments which study the effects of tuning the disk subsystem parameters under the IBM AIX 5L operating system, with the use of the provided virtualization facilities, on the resulting application execution performance. The results show that even a small deterioration in the resource pool status requires an immediate adaptation of the operating system parameters to maintain the required performance. The sixth paper, ``HeteroPBLAS: A Set of Parallel Basic Linear Algebra Subprograms Optimized for Heterogeneous Computational Clusters'', is by Ravi Reddy, Alexey Lastovetsky and Pedro Alonso from University College Dublin in Ireland and the Polytechnic University of Valencia in Spain. The paper concerns the methodology for parallelization of linear algebra computations for execution in heterogeneous cluster environments. The design of the HeteroPBLAS library (Parallel Basic Linear Algebra Subprograms) for heterogeneous computational clusters is presented. The main contribution of the paper is the automation of the parallelization and optimization of the PBLAS, which is done by means of a special user interface and an underlying set of functions. An important element here is a performance model, based on program code instrumentation, which determines the parameters of the application and of the executing heterogeneous platform that are relevant for the execution performance of parallel code. The parameter values specified for or returned by execution of the performance model functions are then used for the generation and optimal mapping of the parallel code of the library subroutines. The proposed approach is illustrated by experimental results of execution of optimized HeteroPBLAS programs on homogeneous and heterogeneous computing clusters. Marek Tudruj

5 citations


Journal ArticleDOI
TL;DR: A combination of two models of computation (MoCs) within a framework, called Gaspard, in order to deal with the design and validation of high-performance embedded systems, specifically dedicated to intensive data-parallel processing.
Abstract: The study presented in this paper concerns the safe design of high-performance embedded systems, specifically dedicated to intensive data-parallel processing as found, for instance, in modern multimedia applications or radar/sonar signal processing. Among the important requirements of such systems are the efficient execution, reliability and quality of service. Unfortunately, the complexity of modern embedded systems makes it difficult to meet all these requirements. As an answer to this issue, this paper proposes a combination of two models of computation (MoCs) within a framework, called Gaspard, in order to deal with the design and validation of high-performance embedded systems. On the one hand, the repetitive MoC offers a powerful expression of the parallelism available in both system functionality and architecture. On the other hand, the synchronous MoC serves as a formal model on which a trustworthy verification can be applied. Here, the high-level models specified with the repetitive MoC are translated into an equational model in the synchronous MoC so as to be able to formally analyze different properties of the modeled systems. As an example, a clock synchronizability analysis is illustrated on a multimedia system in order to guarantee a correct interaction between its components. For the implementation and validation of our proposition, a Model-Driven Engineering (MDE) approach is adopted. MDE is well-suited to deal with design complexity and productivity issues. In our case, the OMG standard MARTE profile is used to model embedded systems. Then, automatic transformations are applied to these models to produce the corresponding synchronous programs for analysis.

5 citations


Journal ArticleDOI
TL;DR: An exhaustive analysis is presented to help users and administrators understand whether their load balancing mechanisms are sensitive to variability and to what degree, and solutions to deal with variability in load balancing algorithms are proposed.
Abstract: The workload on a cluster system can be highly variable, increasing the difficulty of balancing the load across its nodes. The general rule is that high variability leads to wrong load balancing decisions, taken with out-of-date information and difficult to correct in real time during application execution. In this work the workload variability is studied from the perspective of load balancing performance, focusing on the design of algorithms capable of dealing with this variability. In this paper an exhaustive analysis is presented to help users and administrators understand whether their load balancing mechanisms are sensitive to variability and to what degree. Furthermore, solutions to deal with variability in load balancing algorithms are proposed and their utilization is exemplified on a real load balancing algorithm.
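One simple, hypothetical way to make such sensitivity concrete (not the paper's analysis) is to measure the coefficient of variation of recent load samples and defer balancing actions when the readings are too unstable to trust.

```python
from statistics import mean, pstdev

def coefficient_of_variation(samples):
    # dispersion of recent load readings relative to their mean
    m = mean(samples)
    return pstdev(samples) / m if m else 0.0

window = [0.8, 1.4, 0.3, 2.1, 0.9]               # hypothetical recent load samples
cv = coefficient_of_variation(window)
# a balancer might distrust highly variable readings and postpone migrations
print("migrate now" if cv < 0.5 else "wait for more stable readings")
```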

5 citations


Journal ArticleDOI
TL;DR: The design of a Distributed File System (DFS) specialized for HPC applications that exploits the storage, computation and communication capabilities of Internet edge nodes and empowers users with the autonomy to flex file management strategies to suit their needs is discussed.
Abstract: The concept of using Internet edge nodes for High Performance Computing (HPC) applications has gained acceptance in recent times. Many of these HPC applications also have large I/O requirements. Consequently, an edge node file system that efficiently manages the large number of files involved can assist in improving application performance significantly. In this paper, we discuss the design of a Distributed File System (DFS) specialized for HPC applications that exploits the storage, computation and communication capabilities of Internet edge nodes and empowers users with the autonomy to flex file management strategies to suit their needs.

Journal ArticleDOI
TL;DR: Analysis of the influence of the virtualization mechanism of IBM pSeries servers on dynamic resource allocation in AIX 5L operating system raises issues of economic use of resources and distribution of processing power.
Abstract: This paper presents analysis of the influence of the virtualization mechanism of IBM pSeries servers on dynamic resource allocation in AIX 5L operating system. It raises issues of economic use of resources and distribution of processing power. Some innovative solutions are proposed as methods for dynamic resource allocation. These are: micro-partitioning and partition load manager utility. Some results of experiments are presented. The goal of these experiments was to estimate the influence of selected AIX 5L operating system adjustments on computation efficiency.

Journal ArticleDOI
TL;DR: In this work, a characterization of WSAN applications in the health, environmental, agricultural and industrial sectors is presented, along with a system for detecting heart arrhythmias in non-critical patients during rehabilitation sessions in confined spaces.
Abstract: Nowadays, developments in Wireless Sensor and Actuator Networks (WSAN) applications are determined by the fulfillment of constraints imposed by the application. For this reason, in this work a characterization of WSAN applications in the health, environmental, agricultural and industrial sectors is presented. Two case studies are presented: in the first, a system for detecting heart arrhythmias in non-critical patients during rehabilitation sessions in confined spaces is presented, and an architecture for the network and nodes in these applications is proposed; the second presents experimental and theoretical results on the effect produced by communication networks in a Networked Control System (NCS), specifically by the use of the Medium Access Control (MAC) algorithm implemented in IEEE 802.15.4.

Journal ArticleDOI
TL;DR: A software library which provides optimized parallel basic linear algebra subprograms for Heterogeneous Computational Clusters, written on top of HeteroMPI and PBLAS, whose building blocks, the de facto standard kernels for matrix and vector operations and message passing communication, are optimized for heterogeneous computational clusters.
Abstract: This paper presents a software library, called Heterogeneous PBLAS (HeteroPBLAS), which provides optimized parallel basic linear algebra subprograms for Heterogeneous Computational Clusters. This library is written on top of HeteroMPI and PBLAS, whose building blocks, the de facto standard kernels for matrix and vector operations (BLAS) and message passing communication (BLACS), are optimized for heterogeneous computational clusters. This is the first step towards the development of a parallel linear algebra package for Heterogeneous Computational Clusters. We show that the efficiency of the parallel routines is due to the most important feature of the library, which is the automation of the difficult optimization tasks of parallel programming on heterogeneous computing clusters. These tasks are the determination of accurate values of the platform parameters, such as the speeds of the processors and the latencies and bandwidths of the communication links connecting different pairs of processors; the optimal values of the algorithmic parameters, such as the data distribution blocking factor, the total number of processes and the 2D process grid arrangement; and the efficient mapping of the processes executing the parallel algorithm to the executing nodes of the heterogeneous computing cluster. We present the user interface and the software hierarchy of the first research implementation of HeteroPBLAS. We demonstrate the efficiency of the HeteroPBLAS programs on a homogeneous computing cluster and a heterogeneous computing cluster.
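One ingredient of such optimization can be sketched in a few lines, with the caveat that this is not the HeteroPBLAS interface: matrix rows are distributed in proportion to benchmarked processor speeds, so faster nodes receive proportionally more work. The speeds below are invented.

```python
def proportional_distribution(n_rows, speeds):
    # assign each process a row count proportional to its measured speed
    total = sum(speeds)
    shares = [int(n_rows * s / total) for s in speeds]
    shares[0] += n_rows - sum(shares)            # hand any rounding remainder to one process
    return shares

speeds = [300.0, 150.0, 150.0, 75.0]             # hypothetical benchmarked speeds
print(proportional_distribution(1000, speeds))   # -> [445, 222, 222, 111]
```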

Journal ArticleDOI
TL;DR: The Maximum Node Degree, which characterizes the dynamic membership property, impacts the Link and Path Duration mobility metrics of dynamic connectivity graphs, and lower and upper bounds for these metrics are provided.
Abstract: Mobility Metrics of connectivity graphs with a fixed number of nodes have been presented in several research studies over the last three years. More realistic mobility models that are derived from real user traces challenge the assumption of a connectivity graph with a fixed number of nodes, but rather show that wireless nodes possess dynamic membership (nodes join and leave the simulation dynamically based on some random variable). In this paper, we evaluate the mobility metrics of the dynamic connectivity graph on both the nodes and the links. The contributions of this paper are two-fold. First, we introduce the algorithm that computes the Maximum Node Degree. We show that the Maximum Node Degree, which characterizes the dynamic membership property, impacts the connectivity mobility metrics of the Link and Path Durations of the dynamic connectivity graphs. Second, we provide the lower and the upper bounds for these mobility metrics.
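A minimal sketch of computing a maximum node degree from a trace of connectivity snapshots is shown below; the paper's algorithm may differ, and the trace used here is invented.

```python
def maximum_node_degree(snapshots):
    """snapshots: iterable of link sets; each link is an undirected (u, v) pair."""
    best = 0
    for links in snapshots:
        degree = {}
        for u, v in links:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        if degree:
            best = max(best, max(degree.values()))
    return best

# three snapshots; node 3 joins only in the last one (dynamic membership)
trace = [{(0, 1), (1, 2)}, {(0, 2)}, {(0, 3), (1, 3), (2, 3)}]
print(maximum_node_degree(trace))                # -> 3
```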

Journal ArticleDOI
TL;DR: This paper provides a survey of the synthetic and real traffic models used in ad hoc network simulation studies, and shows that the real pedestrian and real vehicular model share common mobility characteristics.
Abstract: In order to provide credible and valid simulation results it is important to build simulation models that accurately represent the environments where ad hoc networks will be deployed. Recent research results have shown that there is a disparity of 30% between protocol performance in real test beds and that in simulation environments. In this paper we summarize the recent trends in simulation models for ad hoc networks. First, we provide a survey of the synthetic and real traffic models used in ad hoc network simulation studies. Second, we select a representative of the most used synthetic mobility model, the real pedestrian mobility model, and the real vehicular model (for mixed traffic). We show that the real pedestrian and real vehicular models share common mobility characteristics: a) The transition matrices of both models illustrate that wireless nodes do not move from one location to another at random, but rather based on activities (work, shopping, college, gym); b) The dynamic membership property illustrates that nodes join and leave the simulation based on some variable distribution or patterns. Lastly, via simulations we show that when using realistic simulation models the simulated protocol performance more closely reflects the real test bed protocol performance.
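The activity-based movement captured by a transition matrix can be illustrated with a toy Markov chain; the locations and probabilities below are invented and are not taken from the paper's traces.

```python
import random

locations = ["home", "work", "shop", "college", "gym"]
P = {                                            # hypothetical transition probabilities
    "home":    [0.10, 0.50, 0.20, 0.10, 0.10],
    "work":    [0.40, 0.20, 0.20, 0.10, 0.10],
    "shop":    [0.60, 0.20, 0.10, 0.05, 0.05],
    "college": [0.50, 0.10, 0.20, 0.10, 0.10],
    "gym":     [0.70, 0.10, 0.10, 0.05, 0.05],
}

def next_location(current):
    # pick the next activity according to the current row of the transition matrix
    return random.choices(locations, weights=P[current])[0]

state, walk = "home", []
for _ in range(5):                               # one node's simulated day
    state = next_location(state)
    walk.append(state)
print(walk)
```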

Journal ArticleDOI
TL;DR: This work contradicts the long held belief that the heavyweight models (namely, the Combining CRCW PRAM and the BSR) form a hierarchy, showing that they are identical in computational power with each other.
Abstract: We investigate the relative computational power of parallel models with shared memory. Based on feasibility considerations present in the literature, we split these models into lightweight and heavyweight, and then find that the heavyweight class is strictly more powerful than the lightweight class, as expected. On the other hand, we contradict the long held belief that the heavyweight models (namely, the Combining CRCW PRAM and the BSR) form a hierarchy, showing that they are identical in computational power with each other. We thus introduce the BSR into the family of practically meaningful massively parallel models. We also investigate the power of concurrent-write on models with reconfigurable buses, finding that it does not add computational power over exclusive-write under certain reasonable assumptions. Overall, the Combining CRCW PRAM and the CREW models with directed reconfigurable buses are found to be the simplest of the heavyweight models, which now also include the BSR and all the models with directed reconfigurable buses. These results also have significant implications in the area of real-time computations.

Journal ArticleDOI
TL;DR: In this article, the authors use a thread parallel model of parallel computing on CMPs and evolve metrics specific to generalized chip multicore processors (CMP) and use them for parallel performance modeling of numerical linear algebra.
Abstract: With the advent of multicore chips, new parallel computing metrics and models have become essential for re-designing traditional scientific application libraries tuned to a single chip. In this paper we evolve metrics specific to generalized chip multicore processors (CMP) and use them for parallel performance modeling of numerical linear algebra routines that are commonly available as shared object libraries tuned to a single processor chip. The study uses a thread parallel model of parallel computing on CMPs. POSIX threads (pthreads) have been used due to their wide acceptance and availability. The shortcoming of POSIX threads for numerical linear algebra in terms of data distribution has been overcome by tuning algorithms so that a particular thread will operate on a specific portion of the matrix. The paper studies tuned implementations of a few conventional parallel linear algebra methods as examples on a generalized CMP model. For formulating a speed-up metric, this work takes into consideration the power consumption and the effect of the memory cache hierarchy.
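The row-partitioning idea can be sketched as follows; this is a hypothetical illustration in Python, not the paper's tuned pthread kernels. Each thread is bound to a fixed block of rows, which is one way to compensate for the absence of an explicit data-distribution mechanism in a raw threading model.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def blocked_matvec(A, x, n_threads=4):
    # each thread owns a fixed block of rows and computes that block's product
    blocks = np.array_split(np.arange(A.shape[0]), n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = pool.map(lambda rows: A[rows] @ x, blocks)
    return np.concatenate(list(parts))

A = np.random.rand(1000, 1000)
x = np.random.rand(1000)
assert np.allclose(blocked_matvec(A, x), A @ x)  # same result as the unpartitioned product
```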

Journal ArticleDOI
TL;DR: An evaluation of the performance of the storage is presented by performing a GCC build on the trial production storage system both during trouble-free operation and failure.
Abstract: With the recent flood of data, one of the major issues is the storage thereof. Although commodity HDDs are now very cheap, appliance storage systems are still relatively expensive. As a result, we developed the VLSD (Virtual Large-Scale Disk) toolkit to assist in the construction of large-scale storage using only cheap commodity hardware and software. As an experiment in using the VLSD toolkit, storage was created for an educational environment. This paper presents an evaluation of the performance of the storage by performing a GCC build on the trial production storage system, both during trouble-free operation and failure.

Journal ArticleDOI
TL;DR: A component-based architecture, with generic components and algorithms, and a development methodology to manage QoS issues while developing embedded software; the obtained software is able to automatically adapt its behaviour to the physical resources, thanks to degraded modes.
Abstract: Even if hardware improvements have increased the performance of embedded systems in the last years, resource problems are still acute. The persisting problem is the constantly growing complexity of systems, which increases the need for reusable development frameworks and pieces of code. In the case of PDAs and smartphones, in addition to classical needs (safety, security), developers must deal with quality of service (QoS) constraints, such as resource management. Qinna was designed to face these problems. In this paper, we propose a complete framework to express resource constraints during the development process. We propose a component-based architecture, with generic components and algorithms, and a development methodology, to manage QoS issues while developing embedded software. The obtained software is then able to automatically adapt its behaviour to the physical resources, thanks to ``degraded modes''. We illustrate the methodology and the use of Qinna within a case study.
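The ``degraded modes'' idea can be conveyed with a deliberately simplified sketch (Qinna's actual components and contracts are richer than this): a component declares several operating modes with resource budgets and selects the best mode the platform can currently afford. The mode names and budgets below are invented.

```python
MODES = [                                        # hypothetical modes, best quality first
    {"name": "full_quality",   "memory_kb": 4096, "cpu_share": 0.50},
    {"name": "reduced_frames", "memory_kb": 2048, "cpu_share": 0.25},
    {"name": "audio_only",     "memory_kb":  512, "cpu_share": 0.10},
]

def select_mode(free_memory_kb, free_cpu_share):
    # pick the first (highest-quality) mode whose budget fits the free resources
    for mode in MODES:
        if mode["memory_kb"] <= free_memory_kb and mode["cpu_share"] <= free_cpu_share:
            return mode["name"]
    return "refuse_service"

print(select_mode(free_memory_kb=1024, free_cpu_share=0.30))   # -> audio_only
```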

Journal ArticleDOI
TL;DR: This paper evaluates the performance of an OpenMP shared-memory programming model that is integrated into Microsoft Visual Studio C++ 2005 and Intel C++ compilers on a multicore processor.
Abstract: The transition from sequential computing to parallel computing represents the next turning point in the way software engineers design and write software. This paradigm shift leads the integration of parallel programming standards for high-end shared-memory machine architectures into desktop programming environments. In this paper we present a performance study of these new systems. We evaluate the performance of an OpenMP shared-memory programming model that is integrated into Microsoft Visual Studio C++ 2005 and Intel C++ compilers on a multicore processor. We benchmarked using the NAS OpenMP high-level applications benchmarks and the EPCC OpenMP low-level benchmarks. We report the basic timings and runtime profiles of each benchmark and analyze the running results.

Journal ArticleDOI
TL;DR: This paper presents a new table driven approach to handle real-time capable Web services communication on embedded hardware through the Devices Profile for Web Services.
Abstract: Service-oriented architectures (SOA) are becoming more and more important in networked embedded systems. The main advantages of service-oriented architectures are a higher abstraction level and interoperability of devices. In this area, Web services have become an important standard for communication between devices. However, this upcoming technology is only available on devices with sufficient resources. Therefore, embedded devices are often excluded from the deployment of Web services due to a lack of computing power, insufficient memory space and limited bandwidth. Furthermore, embedded devices often require real-time capabilities for communication and process control. This paper presents a new table driven approach to handle real-time capable Web services communication on embedded hardware through the Devices Profile for Web Services.
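As a loose, hypothetical illustration of what table-driven request handling might look like (the paper's tables and the DPWS stack are considerably more involved), pre-generated message templates can be indexed by service and action so that the per-request work reduces to a bounded lookup plus parameter filling, which is easier to reason about for real-time behaviour.

```python
RESPONSE_TABLE = {                               # invented example entry
    ("TemperatureService", "GetTemperature"):
        "<s:Envelope>...<Temperature>{value}</Temperature>...</s:Envelope>",
}

def handle_request(service, action, **params):
    # bounded lookup of a pre-generated template; no runtime message construction
    template = RESPONSE_TABLE.get((service, action))
    if template is None:
        return None                              # unknown action: reject quickly
    return template.format(**params)

print(handle_request("TemperatureService", "GetTemperature", value=21.5))
```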

Journal ArticleDOI
TL;DR: This work proposes to bridge the gap between performance modeling and software engineering by incorporating UML; it aims to permit the graphical specification of performance models in a human-intuitive fashion on the one hand, and machine-efficient model evaluation on the other.
Abstract: We address the issue of the development of performance models for programs that may be executed on large-scale computing systems. The commonly used approaches apply non-standard notations for model specification and often require that the software engineer has a thorough understanding of the underlying performance modeling technique. We propose to bridge the gap between performance modeling and software engineering by incorporating UML. In our approach we aim to permit the graphical specification of performance models in a human-intuitive fashion on the one hand, while on the other hand aiming for machine-efficient model evaluation. The user graphically specifies the performance model using UML. Thereafter, the transformation of the performance model from the human-usable UML representation to the machine-efficient C++ representation is done automatically. We describe our methodology and illustrate it with the automatic transformation of a sample performance model. Furthermore, we demonstrate the usefulness of our approach by modeling and simulating a real-world material science program.