Showing papers on "Degree of parallelism published in 2004"


Proceedings ArticleDOI
Alexander Keller1, Joseph L. Hellerstein1, Joel L. Wolf1, Kun-Lung Wu1, Vijaya Krishnan1 
23 Apr 2004
TL;DR: The CHAMPS system, a prototype under development at IBM Research for Change Management with Planning and Scheduling, is discussed; it achieves a very high degree of parallelism for a set of tasks by exploiting detailed factual knowledge about the structure of a distributed system, derived from dependency information at runtime.
Abstract: Change management is a process by which IT systems are modified to accommodate considerations such as software fixes, hardware upgrades and performance enhancements. This paper discusses the CHAMPS system, a prototype under development at IBM Research for Change Management with Planning and Scheduling. The CHAMPS system achieves a very high degree of parallelism for a set of tasks by exploiting detailed factual knowledge about the structure of a distributed system, derived from dependency information at runtime. In contrast, today's systems expect an administrator to provide such insights, which is often not feasible. Furthermore, the optimization techniques we employ allow the CHAMPS system to produce a very high quality solution to a mathematically intractable problem in time that scales well with the problem size. We have implemented the CHAMPS system and applied it in a TPC-W environment that implements an on-line book store application.

151 citations
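
The abstract does not spell out the CHAMPS planning algorithms, but the core idea of deriving parallelism from dependency information can be illustrated with a small sketch: tasks whose prerequisites are already satisfied can be dispatched together. The task names and dependency graph below are hypothetical, not taken from the paper.

```python
# Hypothetical change tasks and their prerequisites (illustration only,
# not the actual CHAMPS planner).
deps = {
    "stop_app": [],
    "upgrade_db": ["stop_app"],
    "upgrade_web": ["stop_app"],
    "patch_os": [],
    "start_app": ["upgrade_db", "upgrade_web", "patch_os"],
}

def parallel_waves(deps):
    """Group tasks into waves; all tasks in a wave can run in parallel."""
    remaining = dict(deps)
    done = set()
    waves = []
    while remaining:
        ready = [t for t, pre in remaining.items() if all(p in done for p in pre)]
        if not ready:
            raise ValueError("cyclic dependencies")
        waves.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return waves

for i, wave in enumerate(parallel_waves(deps)):
    print(f"wave {i}: {wave}")   # degree of parallelism = len(wave)
```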


Book ChapterDOI
13 Jun 2004
TL;DR: This paper presents a comprehensive characterization of a multi-cluster supercomputer workload using twelve months of scientific research traces; the characterized metrics include system utilization, job arrival rate and interarrival time, job cancellation rate, job size, job runtime, memory usage, and user/group behavior.
Abstract: This paper presents a comprehensive characterization of a multi-cluster supercomputer workload using twelve months of scientific research traces. Metrics that we characterize include system utilization, job arrival rate and interarrival time, job cancellation rate, job size (degree of parallelism), job runtime, memory usage, and user/group behavior. Correlations between metrics (job runtime and memory usage, requested and actual runtime, etc.) are identified and extensively studied. Differences from previously reported workloads are recognized, and statistical distributions are fitted for generating synthetic workloads with the same characteristics. This study provides a realistic basis for experiments in resource management and for evaluations of different scheduling strategies in a multi-cluster research environment.

148 citations
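
As a rough illustration of the kind of characterization described (not the authors' actual analysis or data), the sketch below computes interarrival times from job submission timestamps and fits an exponential arrival rate by maximum likelihood; the trace values are made up.

```python
import statistics

# Hypothetical job-submission timestamps (seconds) and job sizes (CPUs);
# a real study would read these from accounting traces.
arrivals = [0.0, 12.5, 14.1, 40.0, 41.2, 90.7, 95.3, 120.0]
job_sizes = [1, 4, 8, 1, 16, 2, 32, 4]

interarrival = [b - a for a, b in zip(arrivals, arrivals[1:])]
mean_ia = statistics.mean(interarrival)

print(f"mean interarrival time: {mean_ia:.2f} s")
print(f"arrival rate (MLE of exponential lambda): {1.0 / mean_ia:.4f} jobs/s")
print(f"job size (degree of parallelism): "
      f"mean={statistics.mean(job_sizes):.1f}, median={statistics.median(job_sizes)}")
```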


Patent
27 Feb 2004
TL;DR: In this article, an improved method and system for preserving data constraints during parallel apply in asynchronous transaction replication in a database system have been disclosed, preserving secondary unique constraints and referential integrity constraints, while also allowing a high degree of parallelism in the application of asynchronous replication transactions.
Abstract: An improved method and system for preserving data constraints during parallel apply in asynchronous transaction replication in a database system are disclosed. The method and system preserve secondary unique constraints and referential integrity constraints while also allowing a high degree of parallelism in the application of asynchronously replicated transactions. The method and system also detect and resolve ordering problems introduced by referential integrity cascade deletes, and allow the parallel initial loading of parent and child tables of a referential integrity constraint.

77 citations
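
The patent abstract gives no algorithmic detail, so the following is only a generic sketch of the underlying idea: replicated transactions that touch disjoint constrained key sets can be applied in parallel, while conflicting ones are pushed into later batches to preserve source order. The table/key data are invented and this is not the patented method.

```python
# Generic sketch of constraint-aware parallel apply (illustration only).
transactions = [                       # hypothetical (table, key) sets per transaction
    {"id": 1, "keys": {("orders", 10), ("customers", 7)}},
    {"id": 2, "keys": {("orders", 11)}},
    {"id": 3, "keys": {("customers", 7)}},   # conflicts with transaction 1
    {"id": 4, "keys": {("orders", 12)}},
]

def conflict(a, b):
    return bool(a["keys"] & b["keys"])

batches = []   # batches are applied one after another; within a batch, apply in parallel
for tx in transactions:                      # iterate in source commit order
    last_conflict = -1
    for i, batch in enumerate(batches):
        if any(conflict(tx, other) for other in batch):
            last_conflict = i
    if last_conflict + 1 == len(batches):
        batches.append([])
    batches[last_conflict + 1].append(tx)

for i, batch in enumerate(batches):
    print(f"batch {i}: {[tx['id'] for tx in batch]}")
```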


Proceedings ArticleDOI
J. Rivoir1
14 Jul 2004
TL;DR: In this paper, the authors show quantitatively that parallel test is a much more effective test cost reduction method than low-cost ATE, because it reduces all test cost contributors, not only the capital cost of ATE.
Abstract: Today's manufacturers of high-volume consumer devices are under tremendous cost pressure and consequently under extreme pressure to reduce the cost of test. Low-cost ATE has often been promoted as the obvious solution. Parallel test is another well-known approach, where multiple devices are tested in parallel (multi-site test) and/or multiple blocks within one device are tested in parallel (concurrent test). This paper shows quantitatively that parallel test is a much more effective test cost reduction method than low-cost ATE, because it reduces all test cost contributors, not only the capital cost of ATE. It also shows that the optimum number of sites is relatively insensitive to ATE capital cost, operating cost, yield, and various limiting factors, but that the cost benefits diminish quickly if limited independent ATE resources reduce the degree of parallelism and force a partially sequential test.

44 citations
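
The paper's quantitative model is not reproduced in the abstract; the toy model below only illustrates why cost per device drops with the number of sites while a multi-site serialization penalty limits the benefit. All parameter values are invented.

```python
# Toy cost-per-device model for multi-site test (illustrative parameters only).
def cost_per_device(sites,
                    ate_cost_per_hour=400.0,   # assumed capital + operating cost rate
                    test_time_s=2.0,           # assumed single-site test time
                    multisite_overhead=0.05):  # per-extra-site serialization penalty
    # Effective parallelism degrades when limited ATE resources force a partly
    # sequential test (modeled here as a linear overhead per added site).
    effective_time = test_time_s * (1 + multisite_overhead * (sites - 1))
    return ate_cost_per_hour / 3600.0 * effective_time / sites

for n in (1, 2, 4, 8, 16, 32):
    print(f"{n:2d} sites: {cost_per_device(n) * 100:.3f} cents/device")
```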


Proceedings Article
10 Mar 2004
TL;DR: This paper deconstructs the notion of commit in an out-of-order processor, and examines the set of necessary conditions under which instructions can be permitted to retire out of program order, providing a detailed analysis of the frequency and relative importance of these conditions.
Abstract: Many modern processors execute instructions out of their original program order to exploit instruction-level parallelism and achieve higher performance. However, even though instructions can execute in an arbitrary order, they must eventually commit, or retire from execution, in program order. This constraint provides a safety mechanism to ensure that mis-speculated instructions are not inadvertently committed, but it can consume valuable processor resources and severely limit the degree of parallelism exposed in a program. We assert that such a constraint is overly conservative, and propose conditions under which it can be relaxed. This paper deconstructs the notion of commit in an out-of-order processor, and examines the set of necessary conditions under which instructions can be permitted to retire out of program order. It provides a detailed analysis of the frequency and relative importance of these conditions, and discusses microarchitectural modifications that relax the in-order commit requirement. Overall, we found that for a given set of processor resources our technique achieves speedups of up to 68% and 8% for floating-point and integer benchmarks, respectively. Conversely, because out-of-order commit allows more efficient utilization of cycle-time-limiting resources, it can alternatively enable simpler designs with potentially higher clock frequencies.

42 citations
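
The exact retirement conditions are given in the paper, not in the abstract; the sketch below only illustrates the general idea with a toy reorder buffer in which a completed instruction may retire ahead of older ones, provided every older instruction is already known to be free of faults and mis-speculation. The conditions and instruction records are simplified and hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RobEntry:
    name: str
    completed: bool       # has finished execution
    may_fault: bool       # could still raise an exception
    speculative: bool     # depends on an unresolved branch prediction

def retirable(rob):
    """Return entries that could retire under a relaxed (out-of-order) commit rule:
    the entry is done and cannot fault or be squashed, and no *older* entry can
    still fault or be squashed (otherwise rollback would be impossible)."""
    safe_prefix = True
    out = []
    for e in rob:            # rob[0] is the oldest instruction
        if e.completed and not e.may_fault and not e.speculative and safe_prefix:
            out.append(e.name)
        if e.may_fault or e.speculative:
            safe_prefix = False   # younger instructions must wait
    return out

rob = [
    RobEntry("mul  r1", completed=False, may_fault=False, speculative=False),  # long latency, cannot fault
    RobEntry("add  r2", completed=True,  may_fault=False, speculative=False),
    RobEntry("load r3", completed=True,  may_fault=True,  speculative=False),
]
print(retirable(rob))   # ['add  r2'] retires before the older mul, i.e. out of program order
```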


Journal ArticleDOI
19 Jun 2004
TL;DR: A steady-state memetic algorithm is compared with a transgenerational memetic algorithm using different crossover operators and hill-climbing methods to find the best number of processors and the best data distribution method for each stage of a parallel program.
Abstract: Determining the optimum data distribution, degree of parallelism and communication structure on distributed memory machines for a given algorithm is not a straightforward task. Assuming that a parallel algorithm consists of consecutive stages, a genetic algorithm is proposed to find the best number of processors and the best data distribution method to be used for each stage of the parallel algorithm. A steady-state genetic algorithm is compared with a transgenerational genetic algorithm using different crossover operators. Performance is evaluated in terms of the total execution time of the program, including communication and computation times. A computation-intensive, a communication-intensive and a mixed implementation are utilized in the experiments. The GA provides satisfactory results for these illustrative examples.

31 citations
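
The abstract leaves the GA encoding and cost model to the paper; the sketch below is only an illustration of the general approach, encoding one (processor count, distribution) pair per stage, using a made-up execution-time estimate as fitness, and applying steady-state replacement.

```python
import random

random.seed(1)

STAGES = 4
PROC_CHOICES = [1, 2, 4, 8, 16]
DISTS = ["block", "cyclic", "block-cyclic"]

def random_individual():
    return [(random.choice(PROC_CHOICES), random.choice(DISTS)) for _ in range(STAGES)]

def estimated_time(ind):
    """Made-up cost model: computation shrinks with processors, communication
    grows with processors and depends on the distribution (illustration only)."""
    comm_factor = {"block": 1.0, "cyclic": 1.6, "block-cyclic": 1.2}
    t = 0.0
    for procs, dist in ind:
        t += 100.0 / procs                    # computation
        t += 2.0 * procs * comm_factor[dist]  # communication / redistribution
    return t

def crossover(a, b):            # one-point crossover over stages
    cut = random.randrange(1, STAGES)
    return a[:cut] + b[cut:]

def mutate(ind, p=0.2):
    return [(random.choice(PROC_CHOICES), random.choice(DISTS)) if random.random() < p else g
            for g in ind]

# Steady-state GA: each step creates one child and replaces the worst individual.
pop = [random_individual() for _ in range(20)]
for _ in range(500):
    a, b = random.sample(pop, 2)
    child = mutate(crossover(a, b))
    worst = max(pop, key=estimated_time)
    if estimated_time(child) < estimated_time(worst):
        pop[pop.index(worst)] = child

best = min(pop, key=estimated_time)
print("best per-stage (processors, distribution):", best)
print("estimated time:", round(estimated_time(best), 2))
```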


Journal ArticleDOI
TL;DR: The analysis phase is responsible for driving the choice between the horizontal and the vertical partitioning techniques, or even the combination of both, in order to assist distribution designers in the fragmentation phase of object databases.
Abstract: The design of distributed databases involves making decisions on the fragmentation and placement of data and programs across the sites of a computer network. The first phase of distribution design in a top-down approach is the fragmentation phase, which clusters into fragments the information accessed simultaneously by applications. Most distribution design algorithms propose a horizontal or vertical class fragmentation. However, the user has no assistance in choosing between these techniques. In this work we present a detailed methodology for the design of distributed object databases that includes: (i) an analysis phase, to indicate the most adequate fragmentation technique to be applied to each class of the database schema; (ii) a horizontal class fragmentation algorithm; and (iii) a vertical class fragmentation algorithm. Basically, the analysis phase is responsible for driving the choice between the horizontal and the vertical partitioning techniques, or even the combination of both, in order to assist distribution designers in the fragmentation phase of object databases. Experiments using our methodology have resulted in fragmentation schemas offering a high degree of parallelism together with an important reduction of irrelevant data.

30 citations


Proceedings ArticleDOI
28 Jun 2004
TL;DR: An algorithm to efficiently schedule parallel task graphs (fork-join structures) that considers more than one factor at the same time: schedulability, reliability of the participating processors, and achieved degree of parallelism.
Abstract: Efficient task scheduling is essential for achieving high performance in distributed computing applications. Most existing real-time systems consider schedulability as the main goal and ignore other effects such as machine failures. In this work we develop an algorithm to efficiently schedule parallel task graphs (fork-join structures). Our scheduling algorithm considers more than one factor at the same time: schedulability, reliability of the participating processors, and achieved degree of parallelism. To achieve most of these goals, we compose an objective function that combines these different factors simultaneously. The proposed objective function is adjustable, providing the user with a way to prefer one factor over the others. The simulation results indicate that our algorithm produces schedules in which application deadlines are met, reliability is maximized and application parallelism is exploited.

24 citations
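
The abstract names the factors but not the exact formula; a hedged sketch of such an adjustable weighted objective (with invented weights and inputs, not the paper's actual function) might look like the following.

```python
def objective(deadline, finish_time, reliability, parallelism, max_parallelism,
              w_sched=0.5, w_rel=0.3, w_par=0.2):
    """Combine schedulability, reliability and achieved parallelism into one
    score; the weights let the user prefer one factor over the others.
    (Illustrative formula only, not the paper's exact objective.)"""
    slack = max(0.0, (deadline - finish_time) / deadline)   # 1 = lots of slack, 0 = deadline missed
    par = parallelism / max_parallelism                      # fraction of fork-join width exploited
    return w_sched * slack + w_rel * reliability + w_par * par

# Compare two candidate schedules for the same fork-join task graph.
print(objective(deadline=100, finish_time=80, reliability=0.99, parallelism=4, max_parallelism=8))
print(objective(deadline=100, finish_time=95, reliability=0.90, parallelism=8, max_parallelism=8))
```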


Proceedings ArticleDOI
08 Mar 2004
TL;DR: The system programming model is presented, which considers two different views of a component interface: one from the point of view of the application programmer and another intended to be used by a configuration tool in order to establish efficient implementations.
Abstract: SBASCO is a new programming environment for the development of parallel and distributed high-performance scientific applications. The approach integrates both skeleton-based and component technologies. The main goal of the proposal is to provide a high-level programmability system for the efficient development of numerical applications with performance portability across different platforms. We present the system programming model, which considers two different views of a component interface: one from the point of view of the application programmer and another intended to be used by a configuration tool in order to establish efficient implementations. This is made possible by interface-level knowledge of the data distribution and processor layout inside each component. The programming model borrows from software skeletons a cost model, enhanced by run-time analysis, which makes it possible to automatically establish a suitable degree of parallelism and replication for the internal structure of a component.

23 citations


01 Jan 2004
TL;DR: This work describes the use and implementation of skeletons in a distributed computation environment, with the Java-based system Lithium as the reference implementation, and proposes three optimizations based on an asynchronous, optimized RMI interaction mechanism, including improved result collection and work-load balancing.
Abstract: Skeletons are common patterns of parallelism, such as farm and pipeline, that can be abstracted and offered to the application programmer as programming primitives. We describe the use and implementation of skeletons in a distributed computation environment, with the Java-based system Lithium as our reference implementation. Our main contribution is a set of optimization techniques based on an asynchronous, optimized RMI interaction mechanism, which we integrated into the macro data flow (MDF) evaluation technology of Lithium. In detail, we show three different optimizations: 1) a lookahead mechanism that allows multiple tasks to be processed concurrently at each single server and thereby increases the overall degree of parallelism, 2) a lazy task-binding technique that reduces interactions between remote servers and the task dispatcher, and 3) dynamic improvements based on process monitoring that optimize the collection of results and the work-load balancing. We report experimental results that demonstrate the improvements achieved by the proposed optimizations on various testbeds, including heterogeneous environments.

20 citations
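
Lithium's optimizations are built on Java RMI; purely as an illustration of the lookahead idea (keeping several tasks outstanding per server so servers are never idle waiting for the dispatcher), here is a small Python sketch that uses a thread pool as a stand-in for remote servers. Task counts and timings are invented.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def server_execute(task):
    """Stand-in for a remote macro-data-flow task executed on a server."""
    time.sleep(0.1)
    return task * task

SERVERS = 3
LOOKAHEAD = 4   # tasks kept in flight per server (the lookahead window)

tasks = list(range(20))

# A single pool stands in for SERVERS servers, each allowed LOOKAHEAD
# outstanding tasks, so work is dispatched ahead of result collection.
with ThreadPoolExecutor(max_workers=SERVERS * LOOKAHEAD) as pool:
    futures = [pool.submit(server_execute, t) for t in tasks]
    results = [f.result() for f in futures]

print(results[:5])
```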


Proceedings ArticleDOI
17 Jun 2004
TL;DR: This paper presents a VLSI processor for reliable stereo matching that establishes correspondence between images by selecting a desirable window size for sum of absolute differences (SAD) computation, using a window-parallel and pixel-serial architecture.
Abstract: This paper presents a VLSI processor for reliable stereo matching that establishes correspondence between images by selecting a desirable window size for sum of absolute differences (SAD) computation. In SAD computation, the degree of parallelism between pixels in a window changes depending on the window size, while the degree of parallelism between windows is predetermined by the input-image size. Based on this consideration, a window-parallel and pixel-serial architecture is proposed to achieve 100% utilization of processing elements. Not only the 100% utilization but also a simple interconnection network between memory modules and processing elements makes the VLSI processor much superior to pixel-parallel-architecture-based VLSI processors.
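
As a software illustration of the SAD computation the processor implements in hardware (window-parallel in silicon, plain array operations here), a minimal disparity search at a single pixel might look like the following; the image data, window size and disparity range are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(64, 64)).astype(np.int32)
right = np.roll(left, -3, axis=1)          # synthetic stereo pair: true disparity = 3

def sad(left, right, y, x, disparity, win):
    """Sum of absolute differences over a (2*win+1)^2 window.
    The number of pixel terms (the per-window parallelism in hardware)
    grows with the window size, while the number of candidate windows is
    fixed by the image size."""
    lw = left[y - win:y + win + 1, x - win:x + win + 1]
    rw = right[y - win:y + win + 1, x - disparity - win:x - disparity + win + 1]
    return int(np.abs(lw - rw).sum())

y, x, win = 32, 32, 3
best = min(range(0, 16), key=lambda d: sad(left, right, y, x, d, win))
print("estimated disparity:", best)        # expected: 3
```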

01 Jan 2004
TL;DR: Comparisons between the most popular implementation alternatives of the Discrete Wavelet Transform, known as the lifting and filter-bank algorithms, show that the lifting algorithm can be efficiently tailored to provide best results despite the data dependencies involved in this scheme.
Abstract: The growing popularity of the Discrete Wavelet Transform (DWT) has boosted its tuning on all sorts of computer systems, from special-purpose hardware for embedded systems to general-purpose microprocessors and multiprocessors. In this paper we continue to investigate possibilities for the implementation of the DWT, focusing on state-of-the-art programmable graphics hardware. Current design trends have transformed these devices into powerful coprocessors with enough flexibility to perform intensive and complex floating-point calculations. This study concentrates on the comparison between the most popular implementation alternatives, known as the lifting and filter-bank algorithms. The characteristics of the filter-bank version suggest a better mapping on current graphics hardware, given its higher degree of parallelism. However, our experiments show that the lifting algorithm, which has lower computational demands, can be efficiently tailored to provide the best results despite the data dependencies involved in this scheme, which make the exploitation of data parallelism more difficult.
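
For reference (this is not the authors' GPU code), the lifting scheme referred to here can be sketched in 1D for the CDF 5/3 wavelet as two passes over the signal, a predict step and an update step; the dependency of the update step on the predict results is the kind of data dependency that makes data-parallel mapping harder than in the filter-bank form.

```python
def dwt53_lifting(x):
    """One level of the CDF 5/3 wavelet via lifting (predict + update).
    1D sketch with simple boundary clamping; even-length input assumed."""
    even = x[0::2]
    odd = x[1::2]

    # Predict: each odd (detail) sample minus the average of its even neighbours.
    d = [odd[i] - 0.5 * (even[i] + even[min(i + 1, len(even) - 1)])
         for i in range(len(odd))]

    # Update: each even (approximation) sample plus a quarter of the neighbouring details.
    s = [even[i] + 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(len(even))]
    return s, d

approx, detail = dwt53_lifting([float(v) for v in range(16)])
print("approx:", [round(v, 2) for v in approx])
print("detail:", [round(v, 2) for v in detail])   # near-zero details for a linear ramp
```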

Journal ArticleDOI
TL;DR: The hardware implementation of a spiking neuron model, which uses a spike-timing-dependent plasticity (STDP) rule that allows synaptic changes in discrete time steps, is described, and the serial implementation has also been realized.
Abstract: In this paper we describe the hardware implementation of a spiking neuron model which uses a spike-timing-dependent plasticity (STDP) rule that allows synaptic changes in discrete time steps. For this purpose an integrate-and-fire neuron with recurrent local connections is used. The connectivity of this model has been set to 24 neighbours, so there is a high degree of parallelism. After obtaining good results with the hardware implementation of the model, we proceed to simplify this hardware description while trying to keep the same behaviour. Experiments using dynamic grading patterns have been carried out in order to test the learning capabilities of the model. Finally, the serial implementation has been realized.
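
The hardware description itself is not shown in the abstract; as a behavioural illustration only, a discrete-time integrate-and-fire neuron with a step-wise STDP weight update could be sketched as follows (all constants, input rates and the learning window are made up).

```python
import random

random.seed(0)

N_INPUTS = 24            # mirrors the 24-neighbour connectivity mentioned; value illustrative
THRESHOLD = 1.0
LEAK = 0.9
DW = 0.01                # discrete STDP weight step
STDP_WINDOW = 5          # time steps within which pre/post spikes interact

weights = [0.1] * N_INPUTS
last_pre_spike = [-10**9] * N_INPUTS
potential = 0.0

for t in range(200):
    pre_spikes = [random.random() < 0.05 for _ in range(N_INPUTS)]
    for i, spiked in enumerate(pre_spikes):
        if spiked:
            potential += weights[i]
            last_pre_spike[i] = t

    potential *= LEAK                       # leaky integration
    if potential >= THRESHOLD:              # postsynaptic spike
        potential = 0.0
        for i in range(N_INPUTS):
            # discrete STDP: recent pre-before-post spikes are potentiated,
            # all other synapses are slightly depressed
            if t - last_pre_spike[i] <= STDP_WINDOW:
                weights[i] = min(1.0, weights[i] + DW)
            else:
                weights[i] = max(0.0, weights[i] - DW)

print("final weights (first 6):", [round(w, 3) for w in weights[:6]])
```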

Proceedings Article
01 Jan 2004
TL;DR: This work discusses three optimizations: a lookahead mechanism that allows multiple tasks to be processed concurrently at each grid server and thereby increases the overall degree of parallelism, a lazy task-binding technique that reduces interactions between grid servers and the task dispatcher, and dynamic improvements that optimize the collecting of results and the work-load balancing.
Abstract: Skeletons are common patterns of parallelism, such as farm and pipeline, that can be abstracted and offered to the application programmer as programming primitives. We describe the use and implementation of skeletons on emerging computational grids, with the skeleton system Lithium, based on Java and RMI, as our reference programming system. Our main contribution is the exploration of optimization techniques for implementing skeletons on grids based on an optimized, future-based RMI mechanism, which we integrate into the macro-dataflow evaluation mechanism of Lithium. We discuss three optimizations: 1) a lookahead mechanism that allows multiple tasks to be processed concurrently at each grid server and thereby increases the overall degree of parallelism, 2) a lazy task-binding technique that reduces interactions between grid servers and the task dispatcher, and 3) dynamic improvements that optimize the collecting of results and the work-load balancing. We report experimental results that demonstrate the improvements due to our optimizations on various testbeds, including a heterogeneous grid-like environment.

Journal ArticleDOI
TL;DR: This paper reports significant progress towards removing the bound on the degree of parallelism, by means anticipated in [5], namely by “internalising” protocol roles within the “intruder” process.
Abstract: We carry forward the work described in our previous papers [5,18,20] on the application of data independence to the model checking of security protocols using CSP [19] and FDR [10]. In particular, we showed how techniques based on data independence [12,19] could be used to justify, by means of a finite FDR check, systems where agents can perform an unbounded number of protocol runs. Whilst this allows for a more complete analysis, there was one significant incompleteness in the results we obtained: while each individual identity could perform an unlimited number of protocol runs sequentially, the degree of parallelism remained bounded (and small to avoid state space explosion). In this paper, we report significant progress towards the solution of this problem, by means anticipated in [5], namely by “internalising” protocol roles within the “intruder” process. The internalisation of protocol roles (initially only server-type roles) was introduced in [20] as a state-space reduction technique (for which it is usually spectacularly successful). It was quickly noticed that this had the beneficial side-effect of making the internalised server arbitrarily parallel, at least in cases where it did not generate any new values of data independent type. We now consider the case where internal roles do introduce fresh values and address the issue of capturing their state of mind (for the purposes of analysis).

Proceedings ArticleDOI
23 May 2004
TL;DR: This paper describes an efficient structure to implement a system consisting of an M-channel synthesis filterbank followed by an L-channel analysis filterbank that is very efficient in VLSI, FPGA or parallel processor implementation.
Abstract: This paper describes an efficient structure to implement a system consisting of an M-channel synthesis filterbank followed by an L-channel analysis filterbank (where M is a multiple of L or L is a multiple of M). The structure is very efficient for VLSI, FPGA or parallel processor implementation, requiring less area or fewer logic blocks, consuming less power and extending the degree of parallelism. The proposed method is applicable in situations where subband-based processing or encoding follows another subband-based processing or decoding stage and the intermediate synthesized signal is not a desired signal in itself.

Book ChapterDOI
22 Sep 2004
TL;DR: HTAs are implemented as a MATLAB toolbox, overloading conventional operators and array functions such that HTA operations appear to the programmer as extensions of MATLAB.
Abstract: In this paper, we describe our experience in writing parallel numerical algorithms using Hierarchically Tiled Arrays (HTAs). HTAs are classes of objects that encapsulate parallelism. HTAs allow the construction of single-threaded parallel programs where a master process distributes tasks to be executed by a collection of servers holding the components (tiles) of the HTAs. The tiled and recursive nature of HTAs facilitates the development of algorithms with a high degree of parallelism as well as locality. We have implemented HTAs as a MATLAB toolbox, overloading conventional operators and array functions such that HTA operations appear to the programmer as extensions of MATLAB. We have successfully used it to write some widely used parallel numerical programs. The resulting programs are easier to understand and maintain than their MPI counterparts.

Proceedings ArticleDOI
22 Feb 2004
TL;DR: This work describes a methodology for the scheduling and allocation of hardware contexts in applications with a high degree of parallelism, within a Run-Time Reconfiguration (RTR) process for a reconfigurable FPGA.
Abstract: This work describes a methodology for the scheduling and allocation of hardware contexts in applications with a high degree of parallelism, within a Run-Time Reconfiguration (RTR) process for a reconfigurable FPGA. The scheduling approach is based on the distribution of hardware resources in the FPGA architecture. The scheduler is modeled as a Petri net, and the schedule yielding the best performance is selected. Hardware context allocation is based on the left-edge algorithm principle to rationalize resource use in the scheduling approach. The adapted algorithm assumes that pre-located areas in the architecture are used for loading the contexts.
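
The abstract mentions allocation based on the left-edge principle; a generic left-edge sketch (with hypothetical context lifetimes, not the paper's data or its FPGA-specific adaptation) assigns each context to the first area whose previous occupant has already finished.

```python
# Classic left-edge allocation: sort intervals by start time, then greedily
# reuse the first "area" (track) that is free again. Intervals are hypothetical
# (context, load_time, end_time) triples.
contexts = [("A", 0, 4), ("B", 1, 3), ("C", 3, 7), ("D", 4, 9), ("E", 5, 6)]

areas = []           # areas[i] = end time of the last context placed in area i
assignment = {}

for name, start, end in sorted(contexts, key=lambda c: c[1]):
    for i, busy_until in enumerate(areas):
        if busy_until <= start:      # area i is free again: reuse it
            areas[i] = end
            assignment[name] = i
            break
    else:                            # no free area: allocate a new one
        assignment[name] = len(areas)
        areas.append(end)

print(assignment)                    # {'A': 0, 'B': 1, 'C': 1, 'D': 0, 'E': 2}
print("areas needed:", len(areas))
```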

Proceedings ArticleDOI
11 Jul 2004
TL;DR: This work shows the design of the IFFT module corresponding to the baseband processing of an OFDM transmitter according to the IEEE 802.11a-g and Hiperlan/2 standards.
Abstract: This work shows the design of the IFFT module corresponding to the baseband processing of an OFDM transmitter according to the IEEE 802.11a-g and Hiperlan/2 standards. This module will be included in a future OFDM demonstrator, which will be implemented in a programmable logic device. We have used our own algorithm for IFFT computation, based on the recursive property called decimation. This algorithm offers optimal characteristics for hardware implementation: a high degree of parallelism and exactly the same interconnection pattern between any of the algorithm stages. A new point of view on the prototyping design flow and the verification process comes from the use of the latest generation of system-level design environments for DSPs on FPGAs. These environments, called visual data flows, are ideally suited for modeling DSP systems since they allow a high level of functional abstraction with different data types and operators.
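
The authors' decimation-based algorithm is not given in the abstract; as a generic reference point only (not their hardware-oriented, constant-geometry formulation), a recursive radix-2 decimation-in-time IFFT can be sketched as follows.

```python
import cmath

def ifft(x):
    """Recursive radix-2 IFFT (decimation in time); length must be a power of two.
    Textbook formulation, unnormalised."""
    n = len(x)
    if n == 1:
        return list(x)
    even = ifft(x[0::2])
    odd = ifft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(2j * cmath.pi * k / n) * odd[k]   # +j sign for the inverse transform
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def inverse_fft(x):
    return [v / len(x) for v in ifft(x)]   # apply the 1/N normalisation once

# Round-trip check against a direct DFT of a small vector.
signal = [1, 2, 3, 4, 0, 0, 0, 0]
spectrum = [sum(s * cmath.exp(-2j * cmath.pi * k * n / len(signal))
                for n, s in enumerate(signal)) for k in range(len(signal))]
recovered = inverse_fft(spectrum)
print([round(v.real, 6) for v in recovered])   # recovers [1, 2, 3, 4, 0, 0, 0, 0] up to rounding
```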

Journal ArticleDOI
31 Jan 2004
TL;DR: The goal is to study the performance gain from parallel I/O under the constraints of different numbers of commodity storage devices in a Linux cluster, and evaluates two read performance optimisation techniques employed in CEFT-PVFS.
Abstract: In this work, we investigate parallel I/O efficiency in parallelised BLAST, the most popular tool for searching for similarity in biological databases, and implement two variations by incorporating the PVFS and CEFT-PVFS parallel I/O facilities. Our goal is to study the performance gain from parallel I/O under the constraints of different numbers of commodity storage devices in a Linux cluster. We also evaluate two read-performance optimisation techniques employed in CEFT-PVFS: (1) doubling the degree of parallelism is shown to give comparable read performance with respect to PVFS when both systems have the same number of servers; (2) skipping hot-spot nodes can reduce the performance penalty when I/O workloads are highly imbalanced. I/O resource contention between multiple applications running in the same cluster can degrade the performance of the original parallel BLAST and the PVFS version by up to 10- and 21-fold, respectively, whereas the version based on CEFT-PVFS, which has the ability to skip hot-spot nodes, suffered only a two-fold performance degradation.

Book ChapterDOI
20 Jun 2004
TL;DR: This work evaluates a parallel version of AMIGO (Advanced Multidimensional Interval Analysis Global Optimization) algorithm that makes an efficient use of all the available information in continuous differentiable problems to reduce the search domain and to accelerate the search.
Abstract: Interval global optimization based on the branch and bound (B&B) technique is a standard for searching for an optimal solution in the scope of continuous and discrete global optimization. It iteratively creates a search tree where each node represents a problem which is decomposed into several subproblems, provided that a feasible solution can be found by solving this set of subproblems. The enormous computational power needed to solve most B&B global optimization problems and their high degree of parallelism make them suitable candidates for a multiprocessing environment. This work evaluates a parallel version of the AMIGO (Advanced Multidimensional Interval Analysis Global Optimization) algorithm. AMIGO makes efficient use of all the available information in continuous differentiable problems to reduce the search domain and to accelerate the search. Our parallel version takes advantage of the capabilities offered by Charm++. Preliminary results show that our proposal is a good candidate for solving very hard global optimization problems.

Proceedings ArticleDOI
07 Sep 2004
TL;DR: A compiler-based methodology is introduced to speed up and simplify the customization of the data path platform for AS-DSPs, based on the SUIF compiler framework by Stanford University.
Abstract: In order to achieve high performance and low hardware overhead over application-specific integrated circuits (ASICs), application-specific DSPs (AS-DSPs) are more and more widely used. However, designing them is still a tedious, time-consuming and error-prone task since each application has to be analyzed thoroughly, which is usually done by hand. Recently, we proposed a platform approach to designing data paths for AS-DSPs. In this paper we introduce a compiler-based methodology to speed up and simplify the customization of the data path platform. Based on the SUIF compiler framework by Stanford University, we implemented analysis passes to determine the kind and useful number of functional units, the potential degree of parallelism, and the required connectivity between functional units.

Proceedings ArticleDOI
10 May 2004
TL;DR: From block factorizations for any nonsingular transform matrix, two types of parallel elementary reversible matrix (PERM) factorizations are introduced which are helpful for the parallelization of perfectly reversible integer transforms.
Abstract: Integer mapping is critical for lossless source coding, and such techniques have been used for image compression in the new international image compression standard, JPEG 2000. In this paper, starting from block factorizations of any nonsingular transform matrix, we introduce two types of parallel elementary reversible matrix (PERM) factorizations which are helpful for the parallelization of perfectly reversible integer transforms. With an improved degree of parallelism (DOP) and parallel performance, the cost of multiplication and addition can be reduced to O(log N) and O(log^2 N), respectively, for an N-by-N transform matrix. These results make PERM factorizations an effective means of developing parallel integer transforms for large matrices. We also present a scheme to block the matrix and allocate the load across processors for efficient transformation.

Proceedings ArticleDOI
25 Jul 2004
TL;DR: This work presents an adaptive construction of the bitonic balancing network, which tunes its width to the system size in a distributed and local way, and does this with the help of an efficient peer-to-peer lookup service.
Abstract: We present an adaptive construction of the bitonic balancing network. Our network tunes its width (the degree of parallelism) to the system size in a distributed and local way, and does this with the help of an efficient peer-to-peer lookup service. In contrast, all previously known constructions were static, and had the same width irrespective of the system size. Our technique is quite general: though we describe here the construction of the bitonic balancing network, this could be used in the adaptive construction of any distributed data structure which can be decomposed in a recursive manner.