
# Franklin Lowenthal

Bio: Franklin Lowenthal is an academic researcher at California State University. His research spans load balancing (computing) and differential equations. He has an h-index of 4 and has co-authored 15 publications receiving 46 citations.

##### Papers


15 Nov 2003

TL;DR: Performance results show that programs that use Dyn-MPI execute efficiently in non-dedicated environments, including up to almost a three-fold improvement compared to programs that do not redistribute data and a 25% improvement over standard adaptive load balancing techniques.

Abstract: Distributing data is a fundamental problem in implementing efficient distributed-memory parallel programs. The problem becomes more difficult in environments where the participating nodes are not dedicated to a parallel application. We are investigating the data distribution problem in non-dedicated environments in the context of explicit message-passing programs. To address this problem, we have designed and implemented an extension to MPI called Dynamic MPI (Dyn-MPI). The key component of Dyn-MPI is its run-time system, which efficiently and automatically redistributes data on the fly when there are changes in the application or the underlying environment. Dyn-MPI supports efficient memory allocation, precise measurement of system load and computation time, and node removal. Performance results show that programs that use Dyn-MPI execute efficiently in non-dedicated environments, including up to almost a three-fold improvement compared to programs that do not redistribute data and a 25% improvement over standard adaptive load balancing techniques.

15 citations
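Dyn-MPI's actual run-time system is not shown here, but the core redistribution idea described in the abstract, giving each node a share of the data in proportion to its measured compute rate, can be sketched as follows (illustrative code; the function name and interface are assumptions, not Dyn-MPI's API):

```python
def partition_by_speed(n_elems, compute_rates):
    """Split n_elems array elements across nodes in proportion to
    each node's measured compute rate (elements per second)."""
    total = sum(compute_rates)
    shares = [n_elems * r / total for r in compute_rates]  # ideal, fractional
    counts = [int(s) for s in shares]
    # Hand the leftover elements to the nodes with the largest remainders.
    leftover = n_elems - sum(counts)
    by_remainder = sorted(range(len(shares)),
                          key=lambda i: shares[i] - counts[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts

# A node that slows down (e.g. a competing job arrives) receives less data.
print(partition_by_speed(1000, [1.0, 1.0, 1.0, 1.0]))  # [250, 250, 250, 250]
print(partition_by_speed(1000, [1.0, 1.0, 0.5, 0.5]))  # [333, 333, 167, 167]
```

Re-running this partition whenever measured rates change is the "redistribute on the fly" step; the hard part Dyn-MPI handles, moving the data between nodes cheaply, is not modeled here.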


TL;DR: Performance results show that programs that use Dyn-MPI execute efficiently in non-dedicated environments, including up to almost a threefold improvement compared to programs that do not redistribute data and a 25% improvement over standard adaptive load balancing techniques.

Abstract: Distributing data is a fundamental problem in implementing efficient distributed-memory parallel programs. The problem becomes more difficult in environments where the participating nodes are not dedicated to a parallel application. We are investigating the data distribution problem in non-dedicated environments in the context of explicit message-passing programs. To address this problem, we have designed and implemented an extension to MPI called dynamic MPI (Dyn-MPI). The key component of Dyn-MPI is its run-time system, which efficiently and automatically redistributes data on the fly when there are changes in the application or the underlying environment. Dyn-MPI supports efficient memory allocation, precise measurement of system load and computation time, and node removal. Performance results show that programs that use Dyn-MPI execute efficiently in non-dedicated environments, including up to almost a threefold improvement compared to programs that do not redistribute data and a 25% improvement over standard adaptive load balancing techniques.

9 citations


TL;DR: A cost model is formulated to express the expected time to read the desired data as a function of the disk system's parameters (seek time, rotational latency, and reading speed) and the lengths of foreign keys, and an algorithm is provided for identifying the most desirable disk page size.

Abstract: This paper examines strategic arrangement of fact data in a data warehouse in order to answer analytical queries efficiently. Usually, the composite of foreign keys from dimension tables is defined as the fact table's primary key. We focus on analytical queries that specify a value for a randomly chosen foreign key. The desired data for answering a query are typically located at different parts of the disk, thus requiring multiple disk I/Os to read them from disk to memory. We formulate a cost model to express the expected time to read the desired data as a function of the disk system's parameters (seek time, rotational latency, and reading speed) and the lengths of foreign keys. For a predetermined disk page size, we search for an arrangement of the fact data that minimizes the expected time cost. An algorithm is then provided for identifying the most desirable disk page size. Finally, we present a heuristic for answering complex queries that specify values for multiple foreign keys.

7 citations
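The paper's exact cost model is not reproduced above; a simplified sketch of the same kind of accounting, with made-up parameter values, charges each random page read a seek plus rotational latency plus transfer time, and contrasts scattered versus contiguous placement of the matching fact rows:

```python
import math

def time_per_page(page_size, seek, rot_latency, transfer_rate):
    # Cost of one random page read: seek + rotational latency + transfer.
    return seek + rot_latency + page_size / transfer_rate

def scattered_cost(n_rows, page_size, seek, rot, rate):
    # Worst case: every matching row sits on a different page.
    return n_rows * time_per_page(page_size, seek, rot, rate)

def clustered_cost(n_rows, row_size, page_size, seek, rot, rate):
    # Best case: the matching rows are stored contiguously.
    pages = math.ceil(n_rows * row_size / page_size)
    return pages * time_per_page(page_size, seek, rot, rate)

# 200 matching 100-byte rows, 8 KiB pages, 5 ms seek,
# 3 ms rotational latency, 100 MB/s reading speed.
args = (0.005, 0.003, 100e6)
print(scattered_cost(200, 8192, *args))       # ~1.62 s
print(clustered_cost(200, 100, 8192, *args))  # ~0.024 s
```

The two-orders-of-magnitude gap is what makes the arrangement of fact data worth optimizing; larger pages amortize seek and latency but read more unneeded bytes, which is the trade-off behind the paper's page-size algorithm.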


TL;DR: Using Markov analysis, it is shown that this yields additional insight into the underlying concept of reciprocal service department cost allocation by proving that the “full service” department costs can be used to determine the price that should be paid to an external supplier of the same service currently supplied by the service department.

Abstract: In a manufacturing company, certain departments can be characterized as production departments and others as service departments. Examples of service departments are purchasing, computing services, repair and maintenance, security, food services, and so forth. The costs of such service departments must be allocated to the production departments, which in turn will allocate them to the product. It is known that one can view the cost allocation problem as an absorbing Markov process, with the production departments as the absorbing states and the service departments as the transient states. Using Markov analysis, we will show that this yields additional insight into the underlying concept of reciprocal service department cost allocation by proving that the “full service” department costs can be used to determine the price that should be paid to an external supplier of the same service currently supplied by the service department.

5 citations
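The reciprocal ("full") service-department costs described above are the unique solution of a linear system, which is the absorbing-Markov-chain view written in matrix form: each department's full cost is its direct cost plus its share of every other service department's full cost. A small illustrative sketch (the departments, costs, and percentages are invented, not taken from the paper):

```python
def full_service_costs(direct, use):
    """Reciprocal ("full") costs of service departments.

    direct[i] : direct cost of service department i
    use[j][i] : fraction of department j's service consumed by
                department i (the transient part of the Markov chain)

    Solves x_i = direct_i + sum_j use[j][i] * x_j, i.e.
    (I - U^T) x = direct, by Gauss-Jordan elimination.
    """
    n = len(direct)
    A = [[(1.0 if i == j else 0.0) - use[j][i] for j in range(n)] + [direct[i]]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Maintenance uses 10% of Computing; Computing uses 20% of Maintenance.
direct = [100.0, 50.0]            # direct costs (in $1000s)
use = [[0.0, 0.2], [0.1, 0.0]]    # use[j][i]
costs = full_service_costs(direct, use)
print(costs)   # ≈ [107.14, 71.43]
```

The full cost per unit of service (here about 107.14 for Maintenance against a direct cost of 100) is the figure the paper argues should be compared with an external supplier's price, since dropping the internal department also eliminates the reciprocal services it consumes.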


TL;DR: In this article, it is shown that a pair of functional equations f(rx) = kf(x) and f(sx) = jf(x), where r^n ≠ s^m for any positive integers n and m, suffices to uniquely determine the learning curve; compatibility of the two equations requires that log_r k = log_s j, or there will be no learning curve satisfying the pair of equations.

Abstract: A systematic mathematical analysis of learning curves is presented. It is shown that while a learning curve with learning factor k does necessarily satisfy the functional equation f(2x) = kf(x), this equation admits numerous other analytic, convex solutions as well so that it cannot be used to uniquely characterize learning curves. Rather, it is shown that a pair of functional equations f(rx) = kf(x) and f(sx) = jf(x), where r^n ≠ s^m for any positive integers n and m, suffices to uniquely determine the learning curve; compatibility of the two equations requires that log_r k = log_s j, or there will be no learning curve satisfying the pair of equations. Two classes of almost learning curves are generated and studied by means of suitable perturbation terms in the differential equation y' = by/x of the true or standard learning curve, and these curves are applied to describe data not exhibiting exact learning-curve behavior. Finally, the concept of average marginal hours or cost is introduced and its behavior is found to also exhibit the learning-curve phenomenon except for an initial deviation.

4 citations
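As a concrete instance of the abstract above: the standard power-law learning curve f(x) = a·x^b with b = log2(k) satisfies f(2x) = k·f(x), and for this curve any pair f(rx) = kf(x), f(sx) = jf(x) yields the same exponent b = log_r k = log_s j. A quick numerical check with illustrative values (an 80% curve, 100 hours for the first unit):

```python
import math

def learning_curve(a, k):
    """Power-law learning curve f(x) = a * x**b with b = log2(k):
    doubling cumulative output multiplies average unit hours by k,
    so f(2x) = k * f(x)."""
    b = math.log2(k)
    return lambda x: a * x ** b

f = learning_curve(100.0, 0.8)   # 80% curve, 100 h for the first unit
print(round(f(1), 2), round(f(2), 2), round(f(4), 2))  # 100.0 80.0 64.0

# Any pair f(rx) = k f(x), f(sx) = j f(x) fixes the same exponent:
# log_r(k) = log_s(j) = b, the compatibility condition from the paper.
b = math.log2(0.8)
r, s = 2.0, 3.0
k, j = r ** b, s ** b
assert math.isclose(math.log(k, r), math.log(j, s))
```

The abstract's uniqueness point is that f(2x) = 0.8·f(x) alone is satisfied by many convex functions besides this power law; adding a second scaling equation with an incommensurable ratio (r^n ≠ s^m) removes that freedom.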

##### Cited by


12 Nov 2005

TL;DR: This paper presents a system called Jitter, which reduces the frequency on nodes that are assigned less computation and therefore have slack time, and the goal of Jitter is to attempt to ensure that they arrive "just in time" so that they avoid increasing overall execution time.

Abstract: Recently, improving the energy efficiency of HPC machines has become important. As a result, interest in using power-scalable clusters, where frequency and voltage can be dynamically modified, has increased. On power-scalable clusters, one opportunity for saving energy with little or no loss of performance exists when the computational load is not perfectly balanced. This situation occurs frequently, as balancing load between nodes is one of the long-standing problems in parallel and distributed computing. In this paper we present a system called Jitter, which reduces the frequency on nodes that are assigned less computation and therefore have slack time. This saves energy on these nodes, and the goal of Jitter is to attempt to ensure that they arrive "just in time" so that they avoid increasing overall execution time. For example, in Aztec, from the ASCI Purple suite, our algorithm uses 8% less energy while increasing execution time by only 2.6%.

223 citations
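Jitter's real algorithm measures slack dynamically across iterations, but the underlying idea, running under-loaded nodes at the lowest frequency that still meets the bottleneck node's finish time, can be sketched as follows (illustrative code; the function and its inputs are assumptions, not Jitter's interface):

```python
def pick_frequencies(work, freqs):
    """For each node, pick the lowest available frequency that still
    finishes its work by the time the slowest node finishes.

    work[i] : node i's computation time at full speed (seconds)
    freqs   : available frequencies as fractions of the maximum,
              e.g. [0.6, 0.8, 1.0]
    """
    deadline = max(work)  # set by the bottleneck node at full speed
    chosen = []
    for w in work:
        # Running at fraction f stretches the time to w / f; take the
        # lowest f that still meets the deadline ("just in time").
        ok = [f for f in sorted(freqs) if w / f <= deadline]
        chosen.append(ok[0] if ok else max(freqs))
    return chosen

# Nodes 0 and 3 are bottlenecks; nodes 1 and 2 have slack to exploit.
print(pick_frequencies([10.0, 6.0, 7.5, 10.0], [0.6, 0.8, 1.0]))
# [1.0, 0.6, 0.8, 1.0]
```

Since dynamic power falls roughly with the cube of frequency (and voltage scales with it), the slowed nodes save substantial energy while overall execution time, fixed by the bottleneck, is unchanged in this idealized model.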


TL;DR: This paper presents a system called Jitter, which reduces the frequency on nodes that are assigned less computation and therefore have slack time, and the goal of Jitter is to attempt to ensure that they arrive "just in time" so that they avoid increasing overall execution time.

Abstract: Although users of high-performance computing are most interested in raw performance, both energy and power consumption have become critical concerns. As a result, improving energy efficiency of nodes on HPC machines has become important, and the prevalence of power-scalable clusters, where the frequency and voltage can be dynamically modified, has increased. On power-scalable clusters, one opportunity for saving energy with little or no loss of performance exists when the computational load is not perfectly balanced. This situation occurs frequently, as keeping the load balanced between nodes is one of the long-standing fundamental problems in parallel and distributed computing. Indeed, despite the large body of research aimed at balancing load both statically and dynamically, this problem is quite difficult to solve. This paper presents a system called Jitter that reduces the frequency and voltage on nodes that are assigned less computation and, therefore, have idle or slack time. This saves energy on these nodes, and the goal of Jitter is to attempt to ensure that they arrive "just in time" so that they avoid increasing overall execution time. Specifically, we dynamically determine which nodes have enough slack time such that they can execute at a reduced frequency with little performance cost, which will greatly reduce the consumed energy on that node. In particular, Jitter saves 12.8% energy with 0.4% time increase, which is essentially the same as a hand-tuned solution, on the Aztec benchmark.

184 citations


24 Oct 2005

TL;DR: Two types of scheduling algorithms are discussed and customized to the characteristics of parameter-sweep, and their effectiveness is compared under various scenarios.

Abstract: Parameter-sweep has been widely adopted in large numbers of scientific applications. Parameter-sweep features need to be incorporated into grid workflows so as to increase the scale and scope of such applications. New scheduling mechanisms and algorithms are required to provide optimized policy for resource allocation and task arrangement in such a case. This paper addresses scheduling sequential parameter-sweep tasks in a fine-grained manner. The optimization is produced by pipelining the subtasks and dispatching each of them onto well-selected resources. Two types of scheduling algorithms are discussed and customized to the characteristics of parameter-sweep, and their effectiveness has been compared under various scenarios.

63 citations
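The paper's scheduling algorithms are not reproduced here, but the benefit of pipelining sequential subtasks can be illustrated with the textbook pipeline makespan model, one resource per stage, with invented stage times:

```python
def serial_makespan(stage_times, n_tasks):
    # No pipelining: each task runs all its subtasks before the next starts.
    return n_tasks * sum(stage_times)

def pipelined_makespan(stage_times, n_tasks):
    """Makespan of n_tasks identical sequential tasks whose subtasks
    (stages) are pipelined: the first task flows through every stage,
    then the slowest (bottleneck) stage paces all remaining tasks."""
    return sum(stage_times) + (n_tasks - 1) * max(stage_times)

stages = [2.0, 5.0, 3.0]   # per-subtask times on their assigned resources
print(serial_makespan(stages, 100))     # 1000.0
print(pipelined_makespan(stages, 100))  # 505.0
```

In this model the sweep's throughput is limited by the bottleneck stage, which is why dispatching each subtask "onto well-selected resources", shrinking the largest stage time, matters more than speeding up the others.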


TL;DR: Heterogeneous MPI (HeteroMPI), an extension of MPI for programming high-performance computations on heterogeneous networks of computers, allows the application programmer to describe the performance model of the implemented algorithm in a generic form.

Abstract: The paper presents Heterogeneous MPI (HeteroMPI), an extension of MPI for programming high-performance computations on heterogeneous networks of computers. It allows the application programmer to describe the performance model of the implemented algorithm in a generic form. This model allows the specification of all the main features of the underlying parallel algorithm, which have an impact on its execution performance. These features include the total number of parallel processes, the total volume of computations to be performed by each process, the total volume of data to be transferred between each pair of the processes, and how exactly the processes interact during the execution of the algorithm. Given a description of the performance model, HeteroMPI tries to create a group of processes that executes the algorithm faster than any other group. The principal extensions to MPI are presented. We demonstrate the features of the library by performing experiments with parallel simulation of the interaction of electric and magnetic fields and parallel matrix multiplication.

61 citations


14 May 2007

TL;DR: This work implements malleability as an extension to the PCM (process checkpointing and migration) library, a user-level library for iterative MPI applications, integrated with a framework for middleware-driven dynamic application reconfiguration.

Abstract: Malleability enables a parallel application's execution system to split or merge processes modifying granularity. While process migration is widely used to adapt applications to dynamic execution environments, it is limited by the granularity of the application's processes. Malleability empowers process migration by allowing the application's processes to expand or shrink following the availability of resources. We have implemented malleability as an extension to the PCM (process checkpointing and migration) library, a user-level library for iterative MPI applications. PCM is integrated with the Internet operating system (IOS), a framework for middleware-driven dynamic application reconfiguration. Our approach requires minimal code modifications and enables transparent middleware-triggered reconfiguration. Experimental results using a two-dimensional data parallel program that has a regular communication structure demonstrate the usefulness of malleability.

51 citations