
Showing papers on "Parallel algorithm published in 2005"


Proceedings ArticleDOI
27 Nov 2005
TL;DR: This paper designs incremental and parallel versions of a co-clustering algorithm, uses them to build an efficient real-time CF framework, and demonstrates that this approach provides accuracy comparable to that of correlation- and matrix-factorization-based approaches at a much lower computational cost.
Abstract: Collaborative filtering-based recommender systems have become extremely popular due to the increase in Web-based activities such as e-commerce and online content distribution. Current collaborative filtering (CF) techniques such as correlation and SVD based methods provide good accuracy, but are computationally expensive and can be deployed only in static off-line settings. However, a number of practical scenarios require dynamic real-time collaborative filtering that can allow new users, items and ratings to enter the system at a rapid rate. In this paper, we consider a novel CF approach based on a proposed weighted co-clustering algorithm (Banerjee et al., 2004) that involves simultaneous clustering of users and items. We design incremental and parallel versions of the co-clustering algorithm and use them to build an efficient real-time CF framework. Empirical evaluation demonstrates that our approach provides an accuracy comparable to that of the correlation and matrix factorization based approaches at a much lower computational cost.
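
As a rough illustration of the co-clustering idea described above, the Python sketch below shows one plausible prediction rule once user and item cluster assignments are available: the co-cluster average corrected by user and item biases. It is an illustrative approximation, not the paper's exact weighted co-clustering update.

```python
import numpy as np

def predict_rating(R, observed, user_cl, item_cl, u, i):
    """Illustrative co-clustering-based prediction (assumed form, not the
    paper's exact weighted formula).
    R: ratings matrix; observed: boolean mask of known entries;
    user_cl / item_cl: arrays mapping users / items to cluster ids."""
    in_g = (user_cl == user_cl[u])[:, None]      # users in u's cluster
    in_h = (item_cl == item_cl[i])[None, :]      # items in i's cluster
    co = observed & in_g & in_h
    if not co.any():
        return R[observed].mean()                # global-mean fallback
    cocluster_mean = R[co].mean()
    row_cluster_mean = R[observed & in_g].mean()
    col_cluster_mean = R[observed & in_h].mean()
    user_mean = R[u, observed[u]].mean() if observed[u].any() else row_cluster_mean
    item_mean = R[observed[:, i], i].mean() if observed[:, i].any() else col_cluster_mean
    return cocluster_mean + (user_mean - row_cluster_mean) + (item_mean - col_cluster_mean)
```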

445 citations


Journal ArticleDOI
TL;DR: An overview of the algorithms, design philosophy, and implementation techniques in the software SuperLU for solving sparse unsymmetric linear systems, with some examples of how the solver has been used in large-scale scientific applications and of its performance.
Abstract: We give an overview of the algorithms, design philosophy, and implementation techniques in the software SuperLU, for solving sparse unsymmetric linear systems. In particular, we highlight the differences between the sequential SuperLU (including its multithreaded extension) and parallel SuperLU_DIST. These include the numerical pivoting strategy, the ordering strategy for preserving sparsity, the ordering in which the updating tasks are performed, the numerical kernel, and the parallelization strategy. Because of the scalability concern, the parallel code is drastically different from the sequential one. We describe the user interfaces of the libraries, and illustrate how to use the libraries most efficiently depending on some matrix characteristics. Finally, we give some examples of how the solver has been used in large-scale scientific applications, and report its performance.

371 citations


Journal ArticleDOI
TL;DR: This paper presents a novel evolutionary optimization methodology for multiband and wide-band patch antenna designs that combines the particle swarm optimization and the finite-difference time-domain to achieve the optimum antenna satisfying a certain design criterion.
Abstract: This paper presents a novel evolutionary optimization methodology for multiband and wide-band patch antenna designs. The particle swarm optimization (PSO) and the finite-difference time-domain (FDTD) are combined to achieve the optimum antenna satisfying a certain design criterion. The antenna geometric parameters are extracted to be optimized by PSO, and a fitness function is evaluated by FDTD simulations to represent the performance of each candidate design. The optimization process is implemented on parallel clusters to reduce the computational time introduced by full-wave analysis. Two examples are investigated in the paper: first, the design of rectangular patch antennas is presented as a test of the parallel PSO/FDTD algorithm. The optimizer is then applied to design E-shaped patch antennas. It is observed that by using different fitness functions, both dual-frequency and wide-band antennas with desired performance are obtained by the optimization. The optimized E-shaped patch antennas are analyzed, fabricated, and measured to validate the robustness of the algorithm. The measured return loss of less than -18 dB (for the dual-frequency antenna) and 30.5% bandwidth (for the wide-band antenna) exhibit the prospect of the parallel PSO/FDTD algorithm in practical patch antenna designs.
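
The optimization loop itself is a standard global-best PSO; in the paper each fitness evaluation is a full FDTD simulation of a candidate geometry, which is why those evaluations are farmed out to a cluster. A minimal sketch of that loop, with the FDTD call left as a user-supplied placeholder:

```python
import numpy as np

def pso_optimize(fitness, lower, upper, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic global-best PSO sketch.  `fitness` would wrap a full-wave
    (e.g. FDTD) evaluation of one candidate antenna geometry; lower/upper
    bound the geometric parameters being optimized."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = rng.uniform(lower, upper, size=(n_particles, lower.size))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([fitness(p) for p in x])
    gbest = pbest[pbest_val.argmin()]
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)
        vals = np.array([fitness(p) for p in x])   # the expensive, parallelizable step
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()]
    return gbest, pbest_val.min()
```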

306 citations


Journal ArticleDOI
TL;DR: The employment of overset-grid techniques, coupled with high-order interpolation at overset boundaries, was found to be an effective way of applying the high-order algorithm to more complex geometries than was previously possible.

269 citations


Journal Article
TL;DR: A parallel version of the particle swarm optimization (PPSO) algorithm is presented together with three communication strategies that can be used according to the independence of the data; experimental results demonstrate the usefulness of the proposed PPSO algorithm.
Abstract: Particle swarm optimization (PSO) is an alternative population-based evolutionary computation technique. It has been shown to be capable of optimizing hard mathematical problems in continuous or binary space. We present here a parallel version of the particle swarm optimization (PPSO) algorithm together with three communication strategies which can be used according to the independence of the data. The first strategy is designed for solution parameters that are independent or are only loosely correlated, such as the Rosenbrock and Rastrigin functions. The second communication strategy can be applied to parameters that are more strongly correlated such as the Griewank function. In cases where the properties of the parameters are unknown, a third hybrid communication strategy can be used. Experimental results demonstrate the usefulness of the proposed PPSO algorithm.
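
One way to picture such communication strategies is a set of subswarms that evolve independently (one per processor) and periodically share their best positions. The sketch below simulates that structure sequentially with a single broadcast-the-best exchange rule; it illustrates the shape of the scheme rather than the paper's three specific strategies.

```python
import numpy as np

def parallel_pso(fitness, lower, upper, n_swarms=4, swarm_size=10, n_iter=200,
                 exchange_every=20, w=0.7, c=1.5, seed=1):
    """Coarse-grained parallel PSO sketch: each subswarm would run on its own
    processor; every `exchange_every` iterations the overall best position is
    broadcast to all subswarms (one simple communication strategy)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = rng.uniform(lower, upper, size=(n_swarms, swarm_size, lower.size))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_val = np.array([[fitness(p) for p in swarm] for swarm in x])
    sbest = pbest[np.arange(n_swarms), pbest_val.argmin(axis=1)]   # per-swarm best
    for t in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c * r1 * (pbest - x) + c * r2 * (sbest[:, None, :] - x)
        x = np.clip(x + v, lower, upper)
        vals = np.array([[fitness(p) for p in swarm] for swarm in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        sbest = pbest[np.arange(n_swarms), pbest_val.argmin(axis=1)]
        if (t + 1) % exchange_every == 0:            # communication step
            best_swarm = pbest_val.min(axis=1).argmin()
            sbest[:] = sbest[best_swarm]             # broadcast the global best
    flat_val = pbest_val.reshape(-1)
    return pbest.reshape(-1, lower.size)[flat_val.argmin()], flat_val.min()
```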

250 citations


Journal ArticleDOI
TL;DR: Two new parallel AMG coarsening schemes are proposed that are based solely on enforcing a maximum independent set property, resulting in sparser coarse grids; the performance of the new preconditioners is examined.
Abstract: Algebraic multigrid (AMG) is a very efficient iterative solver and preconditioner for large unstructured sparse linear systems. Traditional coarsening schemes for AMG can, however, lead to computational complexity growth as problem size increases, resulting in increased memory use and execution time, and diminished scalability. Two new parallel AMG coarsening schemes are proposed that are based solely on enforcing a maximum independent set property, resulting in sparser coarse grids. The new coarsening techniques remedy memory and execution time complexity growth for various large three-dimensional (3D) problems. If used within AMG as a preconditioner for Krylov subspace methods, the resulting iterative methods tend to converge fast. This paper discusses complexity issues that can arise in AMG, describes the new coarsening schemes, and examines the performance of the new preconditioners for various large 3D problems.
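
The coarsening schemes in question revolve around selecting an independent set of points to serve as the coarse grid. The sketch below shows a generic Luby-style randomized maximal independent set computation over a strength-of-connection graph; it is a sequential simulation of the parallelizable idea, not the paper's specific schemes.

```python
import random

def randomized_mis(adjacency, seed=0):
    """Luby-style maximal independent set over a graph given as
    {node: set(neighbours)}.  Each round, every undecided node whose random
    weight beats all undecided neighbours joins the set; its neighbours are
    then removed.  In AMG coarsening the selected nodes would become the
    coarse-grid points."""
    rng = random.Random(seed)
    undecided = set(adjacency)
    selected = set()
    while undecided:
        weight = {v: rng.random() for v in undecided}
        winners = {v for v in undecided
                   if all(weight[v] > weight[u]
                          for u in adjacency[v] if u in undecided)}
        selected |= winners
        undecided -= winners | {u for v in winners for u in adjacency[v]}
    return selected
```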

196 citations


Book ChapterDOI
30 Aug 2005
TL;DR: This study focuses on the high-performance, parallel application SMG2000, a much studied code whose variations in execution times are still not well understood, and employs multilayer neural networks trained on input data from executions on the target platform to predict performance.
Abstract: Accurately modeling and predicting performance for large-scale applications becomes increasingly difficult as system complexity scales dramatically. Analytic predictive models are useful, but are difficult to construct, usually limited in scope, and often fail to capture subtle interactions between architecture and software. In contrast, we employ multilayer neural networks trained on input data from executions on the target platform. This approach is useful for predicting many aspects of performance, and it captures full system complexity. Our models are developed automatically from the training input set, avoiding the difficult and potentially error-prone process required to develop analytic models. This study focuses on the high-performance, parallel application SMG2000, a much studied code whose variations in execution times are still not well understood. Our model predicts performance on two large-scale parallel platforms within 5%-7% error across a large, multi-dimensional parameter space.
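
The modeling approach itself is straightforward to reproduce in outline: collect (configuration, runtime) pairs from runs on the target platform and fit a small multilayer network to them. A hedged sketch using scikit-learn, purely as an illustration of the idea; the paper's feature set and network details are not reproduced here.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def fit_runtime_model(X, y):
    """X: rows of application/system parameters (e.g. problem dimensions,
    process counts) measured on the target platform; y: observed runtimes.
    Returns a model whose .predict() estimates runtimes for new points."""
    model = make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(16, 16),
                                       max_iter=5000, random_state=0))
    return model.fit(X, y)
```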

182 citations


Book ChapterDOI
12 Dec 2005
TL;DR: A novel “lazy” list-based implementation of a concurrent set object based on an optimistic locking scheme for inserts and removes, eliminating the need to use the equivalent of an atomically markable reference.
Abstract: List-based implementations of sets are a fundamental building block of many concurrent algorithms. A skiplist based on the lock-free list-based set algorithm of Michael will be included in the JavaTM Concurrency Package of JDK 1.6.0. However, Michael's lock-free algorithm has several drawbacks, most notably that it requires all list traversal operations, including membership tests, to perform cleanup operations of logically removed nodes, and that it uses the equivalent of an atomically markable reference, a pointer that can be atomically “marked,” which is expensive in some languages and unavailable in others. We present a novel “lazy” list-based implementation of a concurrent set object. It is based on an optimistic locking scheme for inserts and removes, eliminating the need to use the equivalent of an atomically markable reference. It also has a novel wait-free membership test operation (as opposed to Michael's lock-free one) that does not need to perform cleanup operations and is more efficient than that of all previous algorithms. Empirical testing shows that the new lazy-list algorithm consistently outperforms all known algorithms, including Michael's lock-free algorithm, throughout the concurrency range. At high load, with 90% membership tests, the lazy algorithm is more than twice as fast as Michael's. This is encouraging given that typical search structure usage patterns include around 90% membership tests. By replacing the lock-free membership test of Michael's algorithm with our new wait-free one, we achieve an algorithm that slightly outperforms our new lazy-list (though it may not be as efficient in other contexts as it uses Java's RTTI mechanism to create pointers that can be atomically marked).
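
The structure of the lazy list is compact enough to sketch: each node carries a lock and a `marked` flag for logical removal, `add`/`remove` lock only the two affected nodes and re-validate after locking, and the membership test traverses without locking or cleanup. The Python below is a simplified illustration of that scheme (the original is in Java; under CPython's GIL this version does not actually run in parallel).

```python
import threading

class _Node:
    def __init__(self, key, nxt=None):
        self.key, self.next = key, nxt
        self.marked = False                  # logical-removal flag
        self.lock = threading.Lock()

class LazyList:
    """Sorted linked-list set with sentinel head/tail nodes."""
    def __init__(self):
        self.head = _Node(float('-inf'), _Node(float('inf')))

    def _locate(self, key):
        pred = self.head
        curr = pred.next
        while curr.key < key:
            pred, curr = curr, curr.next
        return pred, curr

    @staticmethod
    def _validate(pred, curr):
        return not pred.marked and not curr.marked and pred.next is curr

    def add(self, key):
        while True:                          # optimistic retry loop
            pred, curr = self._locate(key)
            with pred.lock, curr.lock:
                if self._validate(pred, curr):
                    if curr.key == key:
                        return False         # already present
                    pred.next = _Node(key, curr)
                    return True

    def remove(self, key):
        while True:
            pred, curr = self._locate(key)
            with pred.lock, curr.lock:
                if self._validate(pred, curr):
                    if curr.key != key:
                        return False
                    curr.marked = True       # logical removal
                    pred.next = curr.next    # physical unlink
                    return True

    def contains(self, key):
        # wait-free membership test: no locks, no cleanup
        curr = self.head
        while curr.key < key:
            curr = curr.next
        return curr.key == key and not curr.marked
```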

178 citations


Book
15 May 2005
TL;DR: This work attempts to bridge the gap between theory and practice, concentrating on modern algorithmic implementation on parallel architecture machines, with the main focus on parallel algorithm development, often applied to industrial problems.
Abstract: The Monte Carlo method is inherently parallel and the extensive and rapid development in vector and parallel computers has resulted in renewed and increasing interest in this method. At the same time there has been an expansion in the application areas and the method is now widely used in many important areas of science including nuclear and semiconductor physics, statistical mechanics and heat and mass transfer. This work attempts to bridge the gap between theory and practice, concentrating on modern algorithmic implementation on parallel architecture machines. Although a suitable text for final-year or postgraduate mathematicians, it is principally aimed at applied scientists: only a small amount of mathematical knowledge is assumed and theorem proving is kept to a minimum, with the main focus on parallel algorithm development, often applied to industrial problems. Algorithms are developed both for MIMD machines with distributed memory and for SIMD machines; a selection of programs is provided.

175 citations


Journal ArticleDOI
TL;DR: The communication pattern and scalability of a distributed memory implementation of the multilevel fast multipole algorithm (MLFMA), called ScaleME, are analyzed; ScaleME uses the message passing interface (MPI) for communication between processors.
Abstract: In this paper, we analyze the communication pattern and study the scalability of a distributed memory implementation of the multilevel fast multipole algorithm (MLFMA) called ScaleME. ScaleME uses the message passing interface (MPI) for communication between processors. The parallelization of MLFMA uses a novel hybrid scheme for distributing the workload across the processors. We study the communication and computational behavior and demonstrate the effectiveness of the parallelization scheme using realistic problems.

153 citations


Journal ArticleDOI
TL;DR: An algorithmic extension of Powell's UOBYQA algorithm (Unconstrained Optimization BY Quadratic Approximation) is presented, along with a new, easily comprehensible, and fully stand-alone C++ implementation of the parallel algorithm.

Proceedings ArticleDOI
15 Jun 2005
TL;DR: This work develops a general framework for adaptive algorithm selection for use in the Standard Template Adaptive Parallel Library (STAPL), using machine learning techniques to analyze data collected by STAPL installation benchmarks and to determine tests that will select among algorithmic options at run-time.
Abstract: Writing portable programs that perform well on multiple platforms or for varying input sizes and types can be very difficult because performance is often sensitive to the system architecture, the run-time environment, and input data characteristics. This is even more challenging on parallel and distributed systems due to the wide variety of system architectures. One way to address this problem is to adaptively select the best parallel algorithm for the current input data and system from a set of functionally equivalent algorithmic options. Toward this goal, we have developed a general framework for adaptive algorithm selection for use in the Standard Template Adaptive Parallel Library (STAPL). Our framework uses machine learning techniques to analyze data collected by STAPL installation benchmarks and to determine tests that will select among algorithmic options at run-time. We apply a prototype implementation of our framework to two important parallel operations, sorting and matrix multiplication, on multiple platforms and show that the framework determines run-time tests that correctly select the best performing algorithm from among several competing algorithmic options in 86-100% of the cases studied, depending on the operation and the system.
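
The selection mechanism can be pictured as: benchmark each algorithmic option at installation time, fit a classifier that maps cheap input/system features to the fastest option, and query it before dispatching at run time. The sketch below uses scikit-learn and invented feature and algorithm names purely to show that shape; it is not STAPL's actual interface.

```python
from sklearn.tree import DecisionTreeClassifier

def train_selector(benchmark_features, fastest_option):
    """benchmark_features: rows like (input_size, presortedness, processors)
    gathered by installation benchmarks; fastest_option: name of the option
    that won each benchmark run.  Names are illustrative, not STAPL's."""
    return DecisionTreeClassifier(max_depth=4).fit(benchmark_features, fastest_option)

def estimate_presortedness(data, sample=1000):
    """Cheap run-time feature: fraction of adjacent pairs already in order."""
    s = list(data[:sample])
    pairs = list(zip(s, s[1:]))
    return sum(a <= b for a, b in pairs) / max(len(pairs), 1)

def adaptive_sort(selector, data, n_procs, implementations):
    """At run time, extract features, ask the trained model which registered
    implementation to use, and dispatch to it."""
    features = [[len(data), estimate_presortedness(data), n_procs]]
    choice = selector.predict(features)[0]       # e.g. 'sample_sort'
    return implementations[choice](data)
```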

Journal ArticleDOI
TL;DR: A series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized fulllocking, and cache-sensitive locking are developed, and a reduction-object-based interface for specifying a data mining algorithm is proposed.
Abstract: With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this work, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of popular data mining algorithms. In addition, we propose a reduction-object-based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the techniques we have developed starting from a common specification of the algorithm. We have carried out a detailed evaluation of the parallelization techniques and the programming interface. We have experimented with apriori and fp-tree-based association mining, k-means clustering, k-nearest neighbor classifier, and decision tree construction. The main results from our experiments are as follows: 1) Among full replication, optimized full locking, and cache-sensitive locking, there is no clear winner. Each of these three techniques can outperform others depending upon machine and dataset parameters. These three techniques perform significantly better than the other two techniques. 2) Good parallel efficiency is achieved for each of the four algorithms we experimented with, using our techniques and runtime system. 3) The overhead of the interface is within 10 percent in almost all cases. 4) In the case of decision tree construction, combining different techniques turned out to be crucial for achieving high performance.
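
The reduction-object idea can be shown with a tiny interface: the mining loop only ever calls `reduce(key, amount)`, and the runtime decides how that update is made thread-safe. The sketch below contrasts two simplified strategies, replication and locking, in that spirit (Python threading is used purely for illustration; this is not the paper's interface, and its locking schemes vary the number and placement of locks).

```python
import threading
from collections import Counter, defaultdict

class ReplicatedReduction:
    """Replication: each thread accumulates into a private counter and the
    copies are merged once at the end (no locking in the main loop)."""
    def __init__(self):
        self._local = defaultdict(Counter)        # one counter per thread id

    def reduce(self, key, amount=1):
        self._local[threading.get_ident()][key] += amount

    def merge(self):
        total = Counter()
        for counter in self._local.values():
            total.update(counter)
        return total

class LockedReduction:
    """Locking: updates to a shared counter are serialized by a lock."""
    def __init__(self):
        self._counts = Counter()
        self._lock = threading.Lock()

    def reduce(self, key, amount=1):
        with self._lock:
            self._counts[key] += amount

    def merge(self):
        return self._counts
```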

Journal ArticleDOI
TL;DR: Results suggest that shortening scheduling times leads to a higher guarantee ratio, and if parallel scheduling algorithms are applied to shorten scheduling times, the performance of heterogeneous clusters will be further enhanced.

Journal ArticleDOI
TL;DR: A parallel genetic simulated annealing (PGSA) algorithm has been developed and used to optimize the cutting parameters for the multi-pass milling process, and is shown to be more suitable and efficient for optimizing the cutting parameters for milling operations than GP+DP and PGA.
Abstract: This paper presents an approach to select the optimal machining parameters for multi-pass milling. It is based on two recent approaches, genetic algorithm (GA) and simulated annealing (SA), which have been applied to many difficult combinatorial optimization problems with certain strengths and weaknesses. In this paper, a hybrid of GA and SA (GSA) is presented to use the strengths of GA and SA and overcome their weaknesses. In order to improve the performance of GSA further, the parallel genetic simulated annealing (PGSA) has been developed and used to optimize the cutting parameters for the multi-pass milling process. For comparison, conventional parallel GA (PGA) is also chosen as another optimization method. An application example that has been solved previously using the geometric programming (GP) and dynamic programming (DP) method is presented. From the given results, PGSA is shown to be more suitable and efficient for optimizing the cutting parameters for the milling operation than GP+DP and PGA.
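
The GA/SA hybrid can be pictured as a genetic algorithm in which offspring replace their parents only under a simulated-annealing acceptance test, so worse candidates are occasionally accepted early on and acceptance tightens as the temperature drops. A generic sketch of that combination, with problem-specific operators supplied by the caller (not the paper's exact operators, encoding, or parallelization):

```python
import math
import random

def gsa_minimize(cost, random_solution, mutate, crossover,
                 pop_size=20, generations=200, t0=1.0, cooling=0.97, seed=0):
    """Generic GA + simulated-annealing hybrid: offspring are produced by
    crossover and mutation, and replace a parent with Metropolis acceptance."""
    rng = random.Random(seed)
    pop = [random_solution(rng) for _ in range(pop_size)]
    temperature = t0
    for _ in range(generations):
        for i in range(pop_size):
            mate = pop[rng.randrange(pop_size)]
            child = mutate(crossover(pop[i], mate, rng), rng)
            delta = cost(child) - cost(pop[i])
            # SA acceptance: always keep improvements, sometimes keep worse ones
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                pop[i] = child
        temperature *= cooling
    return min(pop, key=cost)
```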

Journal ArticleDOI
TL;DR: A new randomized algorithm and implementation with superior performance that for the first time achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree.

Journal ArticleDOI
01 Sep 2005
TL;DR: A Recovering Beam Search algorithm requiring polynomial time is developed for the unrelated parallel machine scheduling problem; it is able to generate approximate solutions for large instances within a few minutes of computation time.
Abstract: This paper considers the problem of scheduling jobs on unrelated parallel machines to minimize the makespan. Recovering Beam Search is a recently introduced method for obtaining approximate solutions to combinatorial optimization problems. A traditional Beam Search algorithm is a type of truncated branch-and-bound approach. However, Recovering Beam Search allows the possibility of correcting wrong decisions by replacing partial solutions with others. We develop a Recovering Beam Search algorithm for our unrelated parallel machine scheduling problem that requires polynomial time. Computational results show that it is able to generate approximate solutions for large instances (up to 1000 jobs) within a few minutes of computation time.
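
The underlying search is easy to outline: build schedules job by job, keep only the best few partial schedules (the beam) at each level, and, in the recovering variant, additionally try to repair each surviving partial schedule before expanding it. The sketch below shows a plain beam search for the unrelated-machines makespan problem; the recovering step is noted but omitted.

```python
import heapq

def beam_search_schedule(p, beam_width=5):
    """p[j][m]: processing time of job j on machine m (unrelated machines).
    Plain beam-search sketch for minimizing makespan; Recovering Beam Search
    would additionally try to improve each surviving partial schedule
    (e.g. by reassigning jobs) before the next level."""
    n_jobs, n_mach = len(p), len(p[0])
    beam = [((0,) * n_mach, ())]                 # (machine loads, assignment)
    for j in range(n_jobs):
        candidates = []
        for loads, assign in beam:
            for m in range(n_mach):
                new_loads = list(loads)
                new_loads[m] += p[j][m]
                candidates.append((tuple(new_loads), assign + (m,)))
        # keep the beam_width partial schedules with the smallest makespan
        beam = heapq.nsmallest(beam_width, candidates, key=lambda c: max(c[0]))
    loads, assign = min(beam, key=lambda c: max(c[0]))
    return max(loads), assign
```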

Proceedings ArticleDOI
18 Apr 2005
TL;DR: This work introduces an efficient "systolic injection" method for intelligently reporting unpredictably generated mid-array results to a controller without any chance of collision or excessive stalling in the Apriori algorithm.
Abstract: The Apriori algorithm is a popular correlation-based data mining kernel. However, it is a computationally expensive algorithm and the running times can stretch up to days for large databases, as database sizes can extend to Gigabytes. Through the use of a new extension to the systolic array architecture, time required for processing can be significantly reduced. Our array architecture implementation on a Xilinx Virtex-II Pro 100 provides a performance improvement that can be orders of magnitude faster than the state-of-the-art software implementations. The system is easily scalable and introduces an efficient "systolic injection" method for intelligently reporting unpredictably generated mid-array results to a controller without any chance of collision or excessive stalling.

Proceedings ArticleDOI
21 Aug 2005
TL;DR: An algorithm, called Par-CSP (Parallel Closed Sequential Pattern mining), to conduct parallel mining of closed sequential patterns on a distributed memory system by exploiting the divide-and-conquer property so that the overhead of interprocessor communication is minimized.
Abstract: Discovery of sequential patterns is an essential data mining task with broad applications. Among several variations of sequential patterns, closed sequential pattern is the most useful one since it retains all the information of the complete pattern set but is often much more compact than it. Unfortunately, there is no parallel closed sequential pattern mining method proposed yet. In this paper we develop an algorithm, called Par-CSP (Parallel Closed Sequential Pattern mining), to conduct parallel mining of closed sequential patterns on a distributed memory system. Par-CSP partitions the work among the processors by exploiting the divide-and-conquer property so that the overhead of interprocessor communication is minimized. Par-CSP applies dynamic scheduling to avoid processor idling. Moreover, it employs a technique, called selective sampling to address the load imbalance problem. We implement Par-CSP using MPI on a 64-node Linux cluster. Our experimental results show that Par-CSP attains good parallelization efficiencies on various input datasets.

Proceedings ArticleDOI
David A. Bader, Guojing Cong1, John Feo2
14 Jun 2005
TL;DR: This paper considers the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures (MTA)such as the Cray MTA-2.
Abstract: Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures (MTA) such as the Cray MTA-2. While previous studies have shown that parallel graph algorithms can speedup on SMPs, the systems' reliance on cache microprocessors limits performance. The MTA's latency tolerant processors and hardware support for fine-grain synchronization make performance a function of parallelism. Since parallel graph algorithms have an abundance of parallelism, they perform and scale significantly better on the MTA. We describe and give a performance model for each architecture. We analyze the performance of the two algorithms and discuss how the features of each architecture affect algorithm development, ease of programming, performance, and scalability.
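
For a feel of why such algorithms stress memory systems, the connected-components computation can be written as repeated data-parallel sweeps over the edge list with label shortcutting, where every access is irregular. The sequential simulation below shows that structure in a generic label-propagation form (not the exact variant evaluated in the paper); each inner loop is the step the SMP and MTA versions would execute in parallel.

```python
def connected_components(n, edges):
    """Generic label-propagation / pointer-jumping sketch: labels converge to
    one representative vertex id per connected component."""
    label = list(range(n))
    changed = True
    while changed:
        changed = False
        for u, v in edges:                       # data-parallel edge sweep
            if label[u] != label[v]:
                m = min(label[u], label[v])
                label[u] = label[v] = m
                changed = True
        for i in range(n):                       # pointer jumping / shortcutting
            label[i] = label[label[i]]
    return label
```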

Journal ArticleDOI
TL;DR: The analytical and experimental performance shows that the proposed parallel algorithm has better speed-up, less communication time, and a better space reduction factor than the earlier algorithm.
Abstract: This work presents an efficient mapping scheme for the multilayer perceptron (MLP) network trained using back-propagation (BP) algorithm on network of workstations (NOWs). Hybrid partitioning (HP) scheme is used to partition the network and each partition is mapped on to processors in NOWs. We derive the processing time and memory space required to implement the parallel BP algorithm in NOWs. The performance parameters like speed-up and space reduction factor are evaluated for the HP scheme and it is compared with earlier work involving vertical partitioning (VP) scheme for mapping the MLP on NOWs. The performance of the HP scheme is evaluated by solving optical character recognition (OCR) problem in a network of ALPHA machines. The analytical and experimental performance shows that the proposed parallel algorithm has better speed-up, less communication time, and better space reduction factor than the earlier algorithm. This work also presents a simple and efficient static mapping scheme on heterogeneous system. Using divisible load scheduling theory, a closed-form expression for number of neurons assigned to each processor in the NOW is obtained. Analytical and experimental results for static mapping problem on NOWs are also presented.

Book ChapterDOI
27 Aug 2005
TL;DR: This paper describes how fine-grained parallel genetic algorithms can be mapped to the programmable graphics hardware found in commodity PCs and demonstrates the effectiveness of the approach by comparing it with a compatible software implementation.
Abstract: Parallel genetic algorithms are usually implemented on parallel machines or distributed systems. This paper describes how fine-grained parallel genetic algorithms can be mapped to the programmable graphics hardware found in commodity PCs. Our approach stores chromosomes and their fitness values in texture memory on the graphics card. Both fitness evaluation and genetic operations are implemented entirely with fragment programs executed on the graphics processing unit in parallel. We demonstrate the effectiveness of our approach by comparing it with a compatible software implementation. The presented approach allows us to benefit from the advantages of parallel genetic algorithms on a low-cost platform.

Journal ArticleDOI
TL;DR: Factoring the product of two large prime numbers using basic biological operations on a molecular computer is a breakthrough that indicates that public-key cryptosystems are perhaps insecure, and presents clear evidence of the ability of molecular computing to perform complicated mathematical operations.
Abstract: The RSA public-key cryptosystem is an algorithm that converts input data to an unrecognizable encryption and converts the unrecognizable data back into its original decryption form. The security of the RSA public-key cryptosystem is based on the difficulty of factoring the product of two large prime numbers. This paper demonstrates how to factor the product of two large prime numbers, a breakthrough in basic biological operations using a molecular computer. In order to achieve this, we propose three DNA-based algorithms for a parallel subtractor, a parallel comparator, and parallel modular arithmetic that formally verify our designed molecular solutions for factoring the product of two large prime numbers. Furthermore, this work indicates that cryptosystems using public keys are perhaps insecure and also presents clear evidence of the ability of molecular computing to perform complicated mathematical operations.

Journal ArticleDOI
01 Jan 2005
TL;DR: A data-distributed parallel algorithm capable of aligning large-scale three-dimensional images of deformable objects; it requires less memory, aligning datasets of up to 1024x1024x590 voxels while reducing the execution time from hours to minutes, a clinically compatible time.
Abstract: Image registration is a technique for defining a geometric relationship between each point in images. This paper presents a data distributed parallel algorithm that is capable of aligning large-scale three-dimensional (3-D) images of deformable objects. The novelty of our algorithm is to overcome the limitations on the memory space as well as the execution time. In order to enable this, our algorithm incorporates data distribution, data-parallel processing, and load balancing techniques into Schnabel's registration algorithm that realizes robust and efficient alignment based on information theory and adaptive mesh refinement. We also present some experimental results obtained on a 128-CPU cluster of PCs interconnected by Myrinet and Fast Ethernet switches. The results show that our algorithm requires a smaller amount of memory, allowing it to align datasets of up to 1024x1024x590 voxel images while reducing the execution time from hours to minutes, a clinically compatible time.

01 Jan 2005
TL;DR: A parallel algorithm for distributed memory parallel computers for adaptive local refinement of tetrahedral meshes using bisection, part of PHG, Parallel Hierarchical Grid, a toolbox under development for parallel adaptive multigrid solution of PDEs.
Abstract: Local mesh refinement is one of the key steps in implementations of adaptive finite element methods. This paper presents a parallel algorithm for distributed memory parallel computers for adaptive local refinement of tetrahedral meshes using bisection. The algorithm is part of PHG, Parallel Hierarchical Grid, a toolbox under development for parallel adaptive multigrid solution of PDEs. The algorithm proposed is characterized by allowing simultaneous refinement of submeshes to arbitrary levels before synchronization between submeshes and without the need of a central coordinator process for managing new vertices. Some general properties on local refinement of conforming tetrahedral meshes using bisection are also discussed which are useful in analysing and validating the parallel refinement algorithm as well as in simplifying the implementation.

Book ChapterDOI
19 Jun 2005
TL;DR: This paper documents numerous pitfalls one may fall into when evaluating the performance of a complex system, with the hope of providing at least some help in avoiding them.
Abstract: There are many choices to make when evaluating the performance of a complex system. In the context of parallel job scheduling, one must decide what workload to use and what measurements to take. These decisions sometimes have subtle implications that are easy to overlook. In this paper we document numerous pitfalls one may fall into, with the hope of providing at least some help in avoiding them. Along the way, we also identify topics that could benefit from additional research.

Proceedings ArticleDOI
27 Dec 2005
TL;DR: The application of commodity GPUs to two kinds of ANN models, the self-organizing map (SOM) and the multilayer perceptron (MLP), is explored; the results show that ANN computation on a GPU is much faster than on a standard CPU when the neural network is large.
Abstract: Artificial neural networks (ANNs) are widely used in pattern recognition related areas. In some cases the computational load is very heavy; in others, real-time processing is required. So there is a need to apply a parallel algorithm, and the computation for an ANN is usually inherently parallel. In this paper, graphics hardware is used to speed up the computation of ANNs. In recent years, the graphics processing unit (GPU) has grown faster than the CPU, and graphics hardware vendors provide programmability on the GPU. In this paper, the application of commodity available GPUs to two kinds of ANN models is explored: one is the self-organizing map (SOM); the other is the multilayer perceptron (MLP). The computation results show that ANN computing on the GPU is much faster than on a standard CPU when the neural network is large, and some design rules for improving the efficiency on the GPU are given.
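
The reason the MLP in particular maps well to graphics hardware is that its forward pass is a chain of dense matrix products. The numpy sketch below shows that formulation; on a GPU the same products are what the fragment programs of the era (or modern shaders) would compute in parallel.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """MLP forward pass expressed as matrix products, the operation that maps
    directly onto GPU hardware.
    x: (batch, n_in); weights[k]: (n_k, n_{k+1}); biases[k]: (n_{k+1},)."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(a @ W + b)                   # hidden layers
    return a @ weights[-1] + biases[-1]          # linear output layer

# Tiny example with random parameters: 2 inputs -> 8 hidden -> 1 output
rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 8)), rng.standard_normal((8, 1))]
biases = [np.zeros(8), np.zeros(1)]
print(mlp_forward(rng.standard_normal((4, 2)), weights, biases).shape)   # (4, 1)
```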

Journal ArticleDOI
TL;DR: This paper studies the speed-up gained via adaptive mesh refinement and/or parallelization in multiphase flow in general geometries, through a parallelized, adaptive algorithm.

Journal ArticleDOI
TL;DR: The implementation of a recently proposed parallel algorithm that finds strongly connected components in distributed graphs is described, along with how it is used in a radiation transport solver.

Journal ArticleDOI
TL;DR: This paper presents an iterative list scheduling algorithm to deal with scheduling on heterogeneous computing systems and shows that in the majority of cases there is a significant improvement over the initial schedule.