scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Path-based scheduling for synthesis

Raul Camposano1
TL;DR: A novel path-based scheduling algorithm that yields solutions with the minimum number of control steps, taking into account arbitrary constraints that limit the amount of operations in each control step, is presented.
Abstract: A novel path-based scheduling algorithm is presented. It yields solutions with the minimum number of control steps, taking into account arbitrary constraints that limit the amount of operations in each control step. The result is a finite state machine that implements the control. Although the complexity of the algorithm is proportional to the number of paths in the control-flow graph, it is shown to be practical for large examples with thousands of nodes. >
Citations
More filters
Journal ArticleDOI
Jason Cong, Bin Liu, Stephen Neuendorffer1, Juanjo Noguera1, Kees Vissers1, Zhiru Zhang 
TL;DR: AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx are used as an example to demonstrate the effectiveness of state-of-art C-to-FPGA synthesis solutions targeting multiple application domains.
Abstract: Escalating system-on-chip design complexity is pushing the design community to raise the level of abstraction beyond register transfer level. Despite the unsuccessful adoptions of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to HLS msystem-on-chip design complexityethodology is happening now, especially for field-programmable gate array (FPGA) designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advancement in core HLS algorithms, and a domain-specific approach. In this paper, we use AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx as an example to demonstrate the effectiveness of state-of-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including comparison of HLS solutions versus optimized manual designs. In particular, the experiment on a sphere decoder shows that the HLS solution can achieve an 11-31% reduction in FPGA resource usage with improved design productivity compared to hand-coded design.

728 citations


Additional excerpts

  • ...with conditional branches [12]....

    [...]

Book
11 Mar 2009
TL;DR: EDA/VLSI practitioners and researchers in need of fluency in an "adjacent" field will find this an invaluable reference to the basic EDA concepts, principles, data structures, algorithms, and architectures for the design, verification, and test of VLSI circuits.
Abstract: This book provides broad and comprehensive coverage of the entire EDA flow. EDA/VLSI practitioners and researchers in need of fluency in an "adjacent" field will find this an invaluable reference to the basic EDA concepts, principles, data structures, algorithms, and architectures for the design, verification, and test of VLSI circuits. Anyone who needs to learn the concepts, principles, data structures, algorithms, and architectures of the EDA flow will benefit from this book. Covers complete spectrum of the EDA flow, from ESL design modeling to logic/test synthesis, verification, physical design, and test - helps EDA newcomers to get "up-and-running" quickly Includes comprehensive coverage of EDA concepts, principles, data structures, algorithms, and architectures - helps all readers improve their VLSI design competence Contains latest advancements not yet available in other books, including Test compression, ESL design modeling, large-scale floorplanning, placement, routing, synthesis of clock and power/ground networks - helps readers to design/develop testable chips or products Includes industry best-practices wherever appropriate in most chapters - helps readers avoid costly mistakes Table of Contents Chapter 1: Introduction Chapter 2: Fundamentals of CMOS Design Chapter 3: Design for Testability Chapter 4: Fundamentals of Algorithms Chapter 5: Electronic System-Level Design and High-Level Synthesis Chapter 6: Logic Synthesis in a Nutshell Chapter 7: Test Synthesis Chapter 8: Logic and Circuit Simulation Chapter 9:?Functional Verification Chapter 10: Floorplanning Chapter 11: Placement Chapter 12: Global and Detailed Routing Chapter 13: Synthesis of Clock and Power/Ground Networks Chapter 14: Fault Simulation and Test Generation.

200 citations

Book
01 Jan 2008
TL;DR: This dissertation formulates the problem of computer-aided design of embedded systems using both application-specific as well as general-purpose reprogrammable components using both chip-level and system-level synthesis.
Abstract: As the complexity of systems being subject to computer-aided synthesis and optimization techniques increases, so does the need to find ways to incorporate predesigned components into final system implementation. In this context, a general-purpose microprocessor provides a sophisticated low-cost component that can be tailored to realize most system functions through appropriate software. This approach is particularly useful in the design of embedded systems that have a relatively simple target architecture, when compared to general-purpose computing systems such as workstations. In embedded systems the processor is used as a resource dedicated to implement specific functions. However, the design issues in embedded systems are complicated since most of these systems operate in a time-constrained environment. Recent advances in chip-level synthesis have made it possible to synthesize application-specific circuits under strict timing constraints. This dissertation formulates the problem of computer-aided design of embedded systems using both application-specific as well as general-purpose reprogrammable components. Given a specification of system functionality and constraints in a hardware description language, we model the system as a set of bilogic flow graphs, and formulate the co-synthesis problem as a partitioning problem under constraints. Timing constraints are used to determine the parts of the system functionality that are delegated to application-specific hardware and the software that runs on the processor. The software component of such a 'mixed' system poses an interesting problem due to its interaction with concurrently operating hardware. We address this problem by generating software as a set of concurrent fixed-latency serialized operations called threads. The satisfaction of the imposed performance constraints is then ensured by exploiting concurrency between program threads, achieved by an inter-leaved execution on a single processor system. This co-synthesis of hardware and software from behavioral specifications makes it possible to build time-constrained embedded systems by using off-the-shelf parts and application-specific circuitry. Due to the reduction in size of application-specific hardware needed compared to an all-hardware solution, the needed hardware component can be easily mapped to semicustom VLSI such as gate arrays, thus shortening the design time. In addition, the ability to perform a detailed analysis of timing performance provides an opportunity to improve the system definition by creating better prototypes. The algorithms and techniques described have been implemented in a framework called Vulcan, which is integrated with the Stanford Olympus Synthesis System and provides a path from chip-level synthesis to system-level synthesis.

175 citations

Proceedings ArticleDOI
24 Jul 2006
TL;DR: A new scheduler is described that converts a rich set of scheduling constraints into a system of difference constraints (SDC) and performs a variety of powerful optimizations under a unified mathematical programming framework and effectively optimize longest path latency, expected overall latency, and the slack distribution.
Abstract: Scheduling plays a central role in the behavioral synthesis process, which automatically compiles high-level specifications into optimized hardware implementations. However, most of the existing behavior-level scheduling heuristics either have a limited efficiency in a specific class of applications or lack general support of various design constraints. In this paper we describe a new scheduler that converts a rich set of scheduling constraints into a system of difference constraints (SDC) and performs a variety of powerful optimizations under a unified mathematical programming framework. In particular, we show that our SDC-based scheduling algorithm can efficiently support resource constraints, frequency constraints, latency constraints, and relative timing constraints, and effectively optimize longest path latency, expected overall latency, and the slack distribution. Experiments demonstrate that our proposed technique provides efficient solutions for a broader range of applications with higher quality of results (in terms of system performance) when compared to the state-of-the-art scheduling heuristics.

171 citations

Proceedings ArticleDOI
Kazutoshi Wakabayashi1, H. Tanaka
01 Jul 1992
TL;DR: An algorithm is proposed which generates a single finite state machine controller from parallel individual control sequences derived in the global parallelization process, which can parallelize multiple nests of conditional branches and optimize across the boundaries of basic blocks.
Abstract: The authors present a global scheduling method based on condition vectors. The proposed method exploits global parallelism. The technique can schedule operations independent of control dependencies. It transforms the control structure of the given behavior drastically, while preserving semantics to minimize the number of states in final schedule. The method can parallelize multiple nests of conditional branches and optimize across the boundaries of basic blocks. It can also optimize all possible execution paths. An algorithm is proposed which generates a single finite state machine controller from parallel individual control sequences derived in the global parallelization process. Experimental results prove that the global parallelization is very effective. >

148 citations


Cites methods from "Path-based scheduling for synthesis..."

  • ...Table.1 shows comparisons with “MAHA” in [l], “PATH” in [ 6 ], “KIM” in [5], “R*S” in [8] and our method “CVLS”....

    [...]

  • ...In this way, the proposed method can deal with the “false path” problem described in [ 6 ] by using such boolean relations among conditional test operations....

    [...]

References
More filters
Book
01 Jun 1979
TL;DR: A thoroughly revised second edition of Shimon Even's Graph Algorithms, with a foreword by Richard M. Karp and notes by Andrew V Goldberg, explains algorithms in a formal but simple language with a direct and intuitive presentation.
Abstract: Shimon Even's Graph Algorithms, published in 1979, was a seminal introductory book on algorithms read by everyone engaged in the field. This thoroughly revised second edition, with a foreword by Richard M. Karp and notes by Andrew V. Goldberg, continues the exceptional presentation from the first edition and explains algorithms in a formal but simple language with a direct and intuitive presentation. The book begins by covering basic material, including graphs and shortest paths, trees, depth-first-search, and breadth-first search. The main part of the book is devoted to network flows and applications of network flows, and it ends with chapters on planar graphs and testing graph planarity.

1,428 citations

Journal ArticleDOI
Fisher1
TL;DR: Compilation of high-level microcode languages into efficient horizontal microcode and good hand coding probably both require effective global compaction techniques.
Abstract: Microcode compaction is the conversion of sequential microcode into efficient parallel (horizontal) microcode. Local compaction techniques are those whose domain is basic blocks of code, while global methods attack code with a general flow control. Compilation of high-level microcode languages into efficient horizontal microcode and good hand coding probably both require effective global compaction techniques.

1,269 citations

Journal ArticleDOI
TL;DR: A general scheduling methodology is presented that can be integrated into specialized or general-purpose high-level synthesis systems and reduces the number of functional units, storage units, and buses required by balancing the concurrency of operations assigned to them.
Abstract: A general scheduling methodology is presented that can be integrated into specialized or general-purpose high-level synthesis systems. An initial version of the force-directed scheduling algorithm at the heart of this methodology was originally presented by the authors in 1987. The latest implementation of the logarithm introduced here reduces the number of functional units, storage units, and buses required by balancing the concurrency of operations assigned to them. The algorithm supports a comprehensive set of constraint types and scheduling modes. These include multicycle and chained operations; mutually exclusive operations; scheduling under fixed global timing constraints with minimization of functional unit costs, minimization of register costs, and minimization of global interconnect requirements; scheduling with local time constraints (on operation pairs); scheduling under fixed hardware resource constraints; functional pipelining; and structural pipeline (use of pipeline functional units). Examples from current literature, one of which was chosen as a benchmark for the 1988 High-Level Synthesis Workshop, are used to illustrate the effectiveness of the approach. >

1,093 citations

Journal ArticleDOI
TL;DR: An algorithm is presented which finds all the elementary circuits of a directed graph in time bounded by O(n + e)(c + 1) and space bounded by $O( n + e) where there are n vertices, e edges and c elementary circuits in the graph.
Abstract: An algorithm is presented which finds all the elementary circuits of a directed graph in time bounded by $O((n + e)(c + 1))$ and space bounded by $O(n + e)$, where there are n vertices, e edges and c elementary circuits in the graph. The algorithm resembles algorithms by Tiernan and Tarjan, but is faster because it considers each edge at most twice between any one circuit and the next in the output sequence.

834 citations

Journal ArticleDOI
01 Feb 1990
TL;DR: It is shown how the high-level synthesis task can be decomposed into a number of distinct but not independent subtasks.
Abstract: High-level synthesis systems start with an abstract behavioral specification of a digital system and find a register-transfer level structure that realizes the given behavior. The various tasks involved in developing a register-transfer level structure from an algorithmic level specification are described. In particular, it is shown how the high-level synthesis task can be decomposed into a number of distinct but not independent subtasks. The techniques that have been developed for solving those subtasks are presented. Areas related to high-level synthesis that are still open problems are examined. >

639 citations