scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Integrating Physical Constraints in HW-SW Partitioning for Architectures With Partial Dynamic Reconfiguration

TL;DR: This work presents an exact approach for hardware-software (HW-SW) partitioning that guarantees correctness of implementation by considering placement implications as an integral aspect of HW-SW partitioning and presents a physically aware HW- SW partitioning heuristic that simultaneously partitions, schedules, and does linear placement of tasks on such devices.
Abstract: Partial dynamic reconfiguration is a key feature of modern reconfigurable architectures such as the Xilinx Virtex series of devices. However, this capability imposes strict placement constraints such that even exact system-level partitioning (and scheduling) formulations are not guaranteed to be physically realizable due to placement infeasibility. We first present an exact approach for hardware-software (HW-SW) partitioning that guarantees correctness of implementation by considering placement implications as an integral aspect of HW-SW partitioning. Our exact approach is based on integer linear programming (ILP) and considers key issues such as configuration prefetch for minimizing schedule length on the target single-context device. Next, we present a physically aware HW-SW partitioning heuristic that simultaneously partitions, schedules, and does linear placement of tasks on such devices. With the exact formulation, we confirm the necessity of physically-aware HW-SW partitioning for the target architecture. We demonstrate that our heuristic generates high-quality schedules by comparing the results with the exact formulation for small tests and with a popular, but placement-uanaware scheduling heuristic for a large set of over a hundred tests. Our final set of experiments is a case study of JPEG encoding-we demonstrate that our focus on physical considerations along with our consideration of multiple task implementation points enables our approach to be easily extended to handle heterogenous architectures (with specialized resources distributed between general purpose programmable logic columns). The execution time of our heuristic is very reasonable-task graphs with hundreds of nodes are processed (partitioned, scheduled, and placed) in a couple of minutes
Citations
More filters
Journal ArticleDOI
01 Jun 2010
TL;DR: This paper proposes an ant colony optimization (ACO) heuristic that, given a model of the target architecture and the application, efficiently executes both scheduling and mapping to optimize the application performance.
Abstract: To exploit the power of modern heterogeneous multiprocessor embedded platforms on partitioned applications, the designer usually needs to efficiently map and schedule all the tasks and the communications of the application, respecting the constraints imposed by the target architecture. Since the problem is heavily constrained, common methods used to explore such design space usually fail, obtaining low-quality solutions. In this paper, we propose an ant colony optimization (ACO) heuristic that, given a model of the target architecture and the application, efficiently executes both scheduling and mapping to optimize the application performance. We compare our approach with several other heuristics, including simulated annealing, tabu search, and genetic algorithms, on the performance to reach the optimum value and on the potential to explore the design space. We show that our approach obtains better results than other heuristics by at least 16% on average, despite an overhead in execution time. Finally, we validate the approach by scheduling and mapping a JPEG encoder on a realistic target architecture.

157 citations

Proceedings ArticleDOI
13 Jun 2005
TL;DR: A physically aware hardware-software (HW-SW) scheme for minimizing application execution time under HW resource constraints, where the HW is a reconfigurable architecture with partial dynamic reconfiguration capability.
Abstract: Many reconfigurable architectures offer partial dynamic configurability, but current system-level tools cannot guarantee feasible implementations when exploiting this feature. We present a physically aware hardware-software (HW-SW) scheme for minimizing application execution time under HW resource constraints, where the HW is a reconfigurable architecture with partial dynamic reconfiguration capability. Such architectures impose strict placement constraints that lead to implementation infeasibility of even optimal scheduling formulations that ignore the nature of these constraints. We propose an exact and a heuristic formulation that simultaneously partition, schedule, and do linear placement of tasks on such architectures. With our exact formulation, we prove the critical nature of placement constraints. We demonstrate that our heuristic generates high-quality schedules by comparing the results with the exact formulation for small tests and a popular, but placement-uanaware scheduling heuristic for larger tests. With a case study, we demonstrate extension of our approach to handle heterogenous architectures with specialized resources distributed between general purpose programmable logic columns. The execution time of our heuristic is very reasonable- task graphs with hundreds of nodes are processed in a couple of minutes.

112 citations

Journal ArticleDOI
TL;DR: A new model for the partitioning and scheduling of a specification on partially dynamically reconfigurable hardware is proposed based on a new graph-theoretic approach, which aims to obtain near optimality even if performed independently from the subsequent phase.
Abstract: This paper proposes a new model for the partitioning and scheduling of a specification on partially dynamically reconfigurable hardware. Although this problem can be solved optimally only by tackling its subproblems jointly, the exceeding complexity of such a task leads to a decomposition into two phases. The partitioning phase is based on a new graph-theoretic approach, which aims to obtain near optimality even if performed independently from the subsequent phase. For the scheduling phase, a new integer linear programming formulation and a heuristic approach are developed. Both take into account configuration prefetching and module reuse. The experimental results show that the proposed method compares favorably with existing solutions.

50 citations

Journal ArticleDOI
TL;DR: A scheduler that deals with task-graphs at run-time, steering its execution in the reconfigured resources while carrying out both prefetch and replacement techniques that cooperate to hide most of the reconfiguration delays, which clearly outperforms conventional run- time schedulers based on as-soon-as-possible techniques.
Abstract: New generation embedded systems demand high performance, efficiency, and flexibility. Reconfigurable hardware can provide all these features. However, the costly reconfiguration process and the lack of management support have prevented a broader use of these resources. To solve these issues we have developed a scheduler that deals with task-graphs at run-time, steering its execution in the reconfigurable resources while carrying out both prefetch and replacement techniques that cooperate to hide most of the reconfiguration delays. In our scheduling environment, task-graphs are analyzed at design-time to extract useful information. This information is used at run-time to obtain near-optimal schedules, escaping from local-optimum decisions, while only carrying out simple computations. Moreover, we have developed a hardware implementation of the scheduler that applies all the optimization techniques while introducing a delay of only a few clock cycles. In the experiments our scheduler clearly outperforms conventional run-time schedulers based on as-soon-as-possible techniques. In addition, our replacement policy, specially designed for reconfigurable systems, achieves almost optimal results both regarding reuse and performance.

48 citations


Cites background from "Integrating Physical Constraints in..."

  • ...This may be sometimes a major problem, since a sub-optimal placement can lead to infeasible schedules that do not meet the system constraints, as it is proved in [4]....

    [...]

Proceedings ArticleDOI
21 Jul 2008
TL;DR: This paper proposes an implementation of the algorithm that gradually constructs feasible solution instances and searches around them rather than exploring a structure that already considers all the possible solutions, and introduces a two-stage decision mechanism that simplifies the data structures but lets the ant perform correlated choices for both the mapping and the scheduling.
Abstract: Heterogeneous multiprocessor systems, assembled with off-the-shelf processors and augmented with reprogrammable devices, thanks to their performance, cost effectiveness and flexibility, have become a standard platform for embedded systems. To fully exploit the computational power offered by these systems, great care should be taken when deciding on which processing element (mapping) and when (scheduling) executing the program tasks. Unfortunately, both these problems are NP-complete, and, even if they are strictly interconnected, they are normally performed separately with exact or heuristic algorithms to simplify the search for the optimum points. In this paper we present an exploration algorithm based on Ant Colony Optimization (ACO) that tries to solve the two problems simultaneously. We propose an implementation of the algorithm that gradually constructs feasible solution instances and searches around them rather than exploring a structure that already considers all the possible solutions. We introduce a two-stage decision mechanism that simplifies the data structures but lets the ant perform correlated choices for both the mapping and the scheduling. We show that this algorithm provides better and more robust solutions in less time than the Simulated Annealing and the Tabu Search algorithms, extended to support the combined scheduling and mapping problems. In particular, our ACO formulation can find, on average, solutions between 64% and 55% better than Simulated Annealing and Tabu Search.

36 citations


Cites result from "Integrating Physical Constraints in..."

  • ...One of the first problems to which ACO has been successfully applied is the TSP [4], for which it gives competitive results when compared with traditional methods....

    [...]

References
More filters
Journal ArticleDOI
TL;DR: A heuristic method for partitioning arbitrary graphs which is both effective in finding optimal partitions, and fast enough to be practical in solving large problems is presented.
Abstract: We consider the problem of partitioning the nodes of a graph with costs on its edges into subsets of given sizes so as to minimize the sum of the costs on all edges cut. This problem arises in several physical situations — for example, in assigning the components of electronic circuits to circuit boards to minimize the number of connections between boards. This paper presents a heuristic method for partitioning arbitrary graphs which is both effective in finding optimal partitions, and fast enough to be practical in solving large problems.

5,082 citations

Proceedings ArticleDOI
01 Jan 1982
TL;DR: An iterative mincut heuristic for partitioning networks is presented whose worst case computation time, per pass, grows linearly with the size of the network.
Abstract: An iterative mincut heuristic for partitioning networks is presented whose worst case computation time, per pass, grows linearly with the size of the network. In practice, only a very small number of passes are typically needed, leading to a fast approximation algorithm for mincut partitioning. To deal with cells of various sizes, the algorithm progresses by moving one cell at a time between the blocks of the partition while maintaining a desired balance based on the size of the blocks rather than the number of cells per block. Efficient data structures are used to avoid unnecessary searching for the best cell to move and to minimize unnecessary updating of cells affected by each move.

2,463 citations

DOI
01 Mar 1998
TL;DR: A user-controllable, general-purpose, pseudorandom task graph generator called Task Graphs For Free, which has the ability to generate independent tasks as well as task sets which are composed of partially ordered task graphs.
Abstract: We present a user-controllable, general-purpose, pseudorandom task graph generator called Task Graphs For Free (TGFF). TGFF creates problem instances for use in allocation and scheduling research. It has the ability to generate independent tasks as well as task sets which are composed of partially ordered task graphs. A complete description of a scheduling problem instance is created, including attributes for processors, communication resources, tasks, and inter-task communication. The user may parametrically control the correlations between attributes. Sharing TGFF's parameter settings allows researchers to easily reproduce the examples used by others, regardless of the platform on which TGFF is run.

962 citations

Journal ArticleDOI
TL;DR: A Lagrangean relaxation of a zero-one integer programming formulation of the problem of cutting a number of rectangular pieces from a single large rectangle is developed and used as a bound in a tree search procedure.
Abstract: We consider the two-dimensional cutting problem of cutting a number of rectangular pieces from a single large rectangle so as to maximize the value of the pieces cut. We develop a Lagrangean relaxation of a zero-one integer programming formulation of the problem and use it as a bound in a tree search procedure. Subgradient optimization is used to optimize the bound derived from the Lagrangean relaxation. Problem reduction tests derived from both the original problem and the Lagrangean relaxation are given. Incorporating the bound and the reduction tests into a tree search procedure enables moderately sized problems to be solved.

467 citations

Proceedings ArticleDOI
01 Dec 1995
TL;DR: A P-admissible solution space where each packing is represented by a pair of module name sequences is proposed, and hundreds of modules could be successfully packed as demonstrated.
Abstract: The first and the most critical stage in VLSI layout design is the placement, the background of which is the rectangle packing problem: Given many rectangular modules of arbitrary size, place them without overlapping on a layer in the smallest bounding rectangle. Since the variety of the packing is infinite (two- dimensionally continuous) many, the key issue for successful optimization is in the introduction of a P-admissible solution space, which is a finite set of solutions at least one of which is optimal. This paper proposes such a solution space where each packing is represented by a pair of module name sequences. Searching this space by simulated annealing, hundreds of modules could be successfully packed as demonstrated. Combining a conventional wiring method, the biggest MCNC benchmark ami49 is challenged.

391 citations


"Integrating Physical Constraints in..." refers methods in this paper

  • ...Thus, with prefetch, we are unable to directly apply rectangular packing algorithms from work like [ 21 ]....

    [...]

  • ...In such work, the task reconfiguration is bundled along with task execution and treated as a single process—while such simplifications make the problem closer to rectangle packing [ 21 ], the proposed strategies are not applicable to single-context architectures with resource contention for reconfiguration and significant reconfiguration overheads....

    [...]