Proceedings ArticleDOI

InSyn: Integrated Scheduling for DSP Applications

01 Jul 1993-pp 349-354
TL;DR: InSyn, an integrated allocation and scheduling approach for high-level synthesis applications, is presented; it considers functional units, busses and registers while performing time-step assignment.
Abstract: In this paper, we present InSyn, an integrated allocation and scheduling approach for high-level synthesis applications. The scheduler considers functional units, busses and registers while performing time-step assignment. The results show that incorporating all these features during scheduling can produce very good designs.
Citations
Journal ArticleDOI
TL;DR: The need for higher-level design automation tools is discussed first, basic techniques for various subtasks of high-level synthesis are described, and new synthesis objectives including testability, power efficiency, and reliability are surveyed.
Abstract: We survey recent developments in high level synthesis technology for VLSI design. The need for higher-level design automation tools is discussed first. We then describe some basic techniques for various subtasks of high-level synthesis. Techniques that have been proposed in the past few years (since 1994) for various subtasks of high-level synthesis are surveyed. We also survey some new synthesis objectives including testability, power efficiency, and reliability.

111 citations

Journal ArticleDOI
TL;DR: This work presents novel OS algorithms using the ant colony optimization approach for both timing-constrained scheduling (TCS) and resource-constrained scheduling (RCS) problems, using a unique hybrid approach that combines the MAX-MIN ant system metaheuristic with traditional scheduling heuristics.
Abstract: Operation scheduling (OS) is a fundamental problem in mapping an application to a computational device. It takes a behavioral application specification and produces a schedule to minimize either the completion time or the computing resources required to meet a given deadline. The OS problem is NP-hard; thus, effective heuristic methods are necessary to provide qualitative solutions. We present novel OS algorithms using the ant colony optimization approach for both timing-constrained scheduling (TCS) and resource-constrained scheduling (RCS) problems. The algorithms use a unique hybrid approach by combining the MAX-MIN ant system metaheuristic with traditional scheduling heuristics. We compiled a comprehensive testing benchmark set from real-world applications in order to verify the effectiveness and efficiency of our proposed algorithms. For TCS, our algorithm achieves better results compared with force-directed scheduling on almost all the testing cases with a maximum 19.5% reduction of the number of resources. For RCS, our algorithm outperforms a number of different list-scheduling heuristics with better stability and generates better results with up to 14.7% improvement. Our algorithms outperform the simulated annealing method for both scheduling problems in terms of quality, computing time, and stability.
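For orientation, the resource-constrained scheduling (RCS) problem named in this abstract can be illustrated with a plain list scheduler, the kind of baseline heuristic the abstract compares against. This is a minimal sketch and not the ant colony algorithm of the cited work. It assumes unit-delay operations, a single resource type, and a longest-path-to-sink priority; the function name `list_schedule` and its parameters are hypothetical.

```python
from collections import defaultdict


def list_schedule(ops, deps, num_units):
    """Resource-constrained list scheduling (illustrative sketch).

    Assumptions: every operation takes one time step, all operations use
    the same resource type, and `deps` describes an acyclic graph.

    ops: iterable of operation names.
    deps: list of (pred, succ) precedence edges.
    num_units: identical functional units available in each time step.
    Returns a dict mapping each operation to its 1-based time step.
    """
    ops = list(ops)
    succs = defaultdict(list)
    preds = defaultdict(list)
    for p, s in deps:
        succs[p].append(s)
        preds[s].append(p)

    # Priority: number of operations on the longest path to any sink.
    priority = {}

    def longest_path(op):
        if op not in priority:
            priority[op] = 1 + max((longest_path(s) for s in succs[op]), default=0)
        return priority[op]

    for op in ops:
        longest_path(op)

    schedule = {}
    unscheduled = set(ops)
    step = 0
    while unscheduled:
        step += 1
        # An operation is ready once all predecessors ran in an earlier step.
        ready = [op for op in unscheduled
                 if all(schedule.get(p, step) < step for p in preds[op])]
        # Highest priority first; ties broken by name for determinism.
        ready.sort(key=lambda op: (-priority[op], op))
        for op in ready[:num_units]:
            schedule[op] = step
            unscheduled.remove(op)
    return schedule


edges = [("a", "c"), ("b", "c"), ("c", "d")]
one_unit = list_schedule(["a", "b", "c", "d"], edges, num_units=1)
two_units = list_schedule(["a", "b", "c", "d"], edges, num_units=2)
```

With one unit the four operations run in four consecutive steps; with two units `a` and `b` share the first step and the schedule shortens to three steps.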

55 citations


Cites background from "InSyn: Integrated Scheduling for DS..."

  • ...If an operation can be performed by more than one resource type, we call it “heterogeneous” scheduling [7]....

    [...]

Proceedings ArticleDOI
06 Nov 1994
TL;DR: This paper presents a comprehensive technique for lower bound estimation (LBE) of resources from behavioral descriptions that accounts for storage resources in addition to functional resources and uses a finer granularity that permits the modeling of functional unit, register and interconnect delays.
Abstract: In this paper, we present a comprehensive technique for lower bound estimation (LBE) of resources from behavioral descriptions. Previous work has focused on LBE techniques that use very simple cost models which primarily focus on the functional unit resources. Our cost model accounts for storage resources in addition to functional resources. Our timing model uses a finer granularity that permits the modeling of functional unit, register and interconnect delays. We tested our LBE technique for both functional unit and storage requirements on several high-level synthesis benchmarks and observed near-optimal results.
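As a point of reference for what a "very simple cost model" for functional units looks like, the sketch below computes the naive occupancy bound: if operations of one type together occupy a unit for n × d steps and only T steps are available, at least ⌈n·d / T⌉ units are needed. This is not the LBE technique of the cited paper, which also covers storage and finer-grained timing. The function name, the non-pipelined-unit assumption, and the operation counts are all illustrative.

```python
def fu_lower_bound(op_counts, op_delays, num_steps):
    """Naive lower bound on functional units per operation type (sketch).

    Assumes non-pipelined units: an operation of type `kind` occupies one
    unit for op_delays[kind] consecutive time steps.

    op_counts: dict mapping operation type to number of operations.
    op_delays: dict mapping operation type to delay in time steps.
    num_steps: total time steps available for the schedule.
    """
    bounds = {}
    for kind, count in op_counts.items():
        busy_steps = count * op_delays[kind]
        # Integer ceiling of busy_steps / num_steps.
        bounds[kind] = (busy_steps + num_steps - 1) // num_steps
    return bounds


# Hypothetical workload: 26 one-step additions, 8 two-step multiplications.
counts = {"add": 26, "mul": 8}
delays = {"add": 1, "mul": 2}
loose = fu_lower_bound(counts, delays, num_steps=17)
tight = fu_lower_bound(counts, delays, num_steps=10)
```

The bound ignores precedence constraints, so a real schedule may need more units than it reports.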

39 citations


Cites background or result from "InSyn: Integrated Scheduling for DS..."

  • ...[17] A. Sharma and R. Jain, InSyn: Integrated Scheduling for DSP Applications, Proc....

    [...]

  • ...Basically we compared our results with OASIC [10], ILP approach [15], HAL [16], and InSyn [17]....

    [...]

  • ...Delay InSyn [17] Our Estimation (ns) FU Reg....

    [...]

  • ...340 3(+), 2(*p) 10 3(+), 2(*p) 12 3(+), 2(*p) 10 360 3(+), 1(*p) 10 3(+), 1(*p) - 3(+), 1(*p) 10 380 2(+), 1(*p) 9 2(+), 1(*p) 12 2(+), 1(*p) 9 *p: 2-stage pipelined multiplier (delay of 25.0 ns), +: adder (delay of 15.0 ns) Table 6: 5th order elliptic wave filter - design III Delay (ns) InSyn [17] Our Estimation FU Reg....

    [...]

Journal ArticleDOI
30 Oct 1995
TL;DR: This work defines and makes use of new graph-dependent constraints to obtain a lower bound estimate on the iteration period for any data-flow graph and incorporates implicit retiming and pipelining to generate optimal and near optimal schedules.
Abstract: We present a new algorithm for resource-constrained scheduling for digital signal processing (DSP) applications when the number of processors is fixed and the objective is to obtain a schedule with the minimum iteration period. This type of scheduling is best suited for moderate speed applications where conservation of area and power is more important than speed. We define and make use of new graph-dependent constraints to obtain a lower bound estimate on the iteration period for any data-flow graph. By satisfying these constraints before performing the scheduling task, we can restrict the design space and can generate valid schedules in less time than previously reported. The graph-dependent constraints provide a more accurate lower bound estimate on the iteration period than previously published results. This new scheduling algorithm exploits the iterative nature of DSP algorithms and uses an iterative-loop based scheduling approach. This resource scheduling algorithm has been incorporated in the Minnesota ARchitecture Synthesis (MARS) system. Our approach exploits inter-iteration and intra-iteration precedence constraints and incorporates implicit retiming and pipelining to generate optimal and near optimal schedules.

27 citations


Cites background or methods from "InSyn: Integrated Scheduling for DS..."

  • ...To obtain more optimal designs, both tasks should be performed simultaneously [4-6], [14-27]....

    [...]

  • ...In recent years many synthesis systems have been developed for automated design of high performance dedicated architectures, especially for digital signal processing (DSP) applications [1-27]....

    [...]

Journal ArticleDOI
TL;DR: This paper presents an integrated approach aimed at predicting lower bounds on hardware resources needed to implement a behavioral description within a given amount of time; the authors believe that this approach can lead to better quality HLS solutions in less time.
Abstract: The importance of effective lower bound estimation (LBE) techniques is well established in high-level synthesis (HLS) since it allows more efficient exploration of the design space while providing other HLS tools with the capability of predicting the effect of specific tools on the design space. Much of the previous work has focused on LBE techniques that use very simple cost models which primarily focus on the functional unit resources. With the push toward submicron technologies, simple models that use functional unit resources alone are not accurate enough to allow effective design space exploration since the effects of storage and interconnect can indeed dominate the cost function. In this paper, we present an integrated approach aimed at predicting lower bounds on hardware resources needed to implement a behavioral description within a given amount of time. Our area cost model accounts for storage (register) and interconnect resources (buses) in addition to functional resources. Our timing model uses a finer granularity that permits the modeling of functional unit, register, and interconnect delays. Our approach is integrated because we consider the dependencies between the different types of resources as well as the ordering in which the resources are allocated. We tested our technique for functional unit, storage, and interconnect requirements on several high-level synthesis benchmarks, and observed near-optimal results. We believe that our comprehensive LBE approach can lead to better quality HLS solutions in less time, and we demonstrate this approach in our paper.

26 citations

References
Journal ArticleDOI
TL;DR: A general scheduling methodology is presented that can be integrated into specialized or general-purpose high-level synthesis systems and reduces the number of functional units, storage units, and buses required by balancing the concurrency of operations assigned to them.
Abstract: A general scheduling methodology is presented that can be integrated into specialized or general-purpose high-level synthesis systems. An initial version of the force-directed scheduling algorithm at the heart of this methodology was originally presented by the authors in 1987. The latest implementation of the algorithm introduced here reduces the number of functional units, storage units, and buses required by balancing the concurrency of operations assigned to them. The algorithm supports a comprehensive set of constraint types and scheduling modes. These include multicycle and chained operations; mutually exclusive operations; scheduling under fixed global timing constraints with minimization of functional unit costs, minimization of register costs, and minimization of global interconnect requirements; scheduling with local time constraints (on operation pairs); scheduling under fixed hardware resource constraints; functional pipelining; and structural pipeline (use of pipeline functional units). Examples from current literature, one of which was chosen as a benchmark for the 1988 High-Level Synthesis Workshop, are used to illustrate the effectiveness of the approach.
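The idea of "balancing the concurrency of operations" can be made concrete with the two quantities force-directed scheduling starts from: each operation's time frame (its ASAP and ALAP steps) and the distribution graph that sums, per time step, the probability that an operation lands there. The sketch below covers only that first stage, not the force calculation or the full algorithm. It assumes unit-delay operations of a single type and a uniform probability over each frame; the function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def time_frames(ops, deps, num_steps):
    """Return {op: (asap, alap)} for unit-delay operations (sketch).

    Assumes `deps` is an acyclic list of (pred, succ) edges and that
    `num_steps` is at least the critical path length.
    """
    preds, succs = defaultdict(list), defaultdict(list)
    for p, s in deps:
        preds[s].append(p)
        succs[p].append(s)

    asap, alap = {}, {}

    def earliest(op):
        # One step after the latest-finishing predecessor.
        if op not in asap:
            asap[op] = 1 + max((earliest(p) for p in preds[op]), default=0)
        return asap[op]

    def latest(op):
        # One step before the earliest-starting successor.
        if op not in alap:
            alap[op] = min((latest(s) for s in succs[op]), default=num_steps + 1) - 1
        return alap[op]

    return {op: (earliest(op), latest(op)) for op in ops}


def distribution_graph(frames, num_steps):
    """Sum uniform placement probabilities per time step (index 0 = step 1)."""
    dg = [Fraction(0)] * num_steps
    for lo, hi in frames.values():
        width = hi - lo + 1
        for step in range(lo, hi + 1):
            dg[step - 1] += Fraction(1, width)
    return dg


frames = time_frames(["a", "b", "c"], [("a", "b")], num_steps=2)
dg = distribution_graph(frames, num_steps=2)
```

Here `a` and `b` are fixed on the critical path while `c` may go in either step, so it contributes one half to each entry of the distribution graph.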

1,093 citations

Journal ArticleDOI
01 Jun 1988
TL;DR: This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors, and proposes a hierarchical reduction scheme whereby entire control constructs are reduced to an object similar to an operation in a basic block.
Abstract: This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors. In software pipelining, iterations of a loop in the source program are continuously initiated at constant intervals, before the preceding iterations complete. The advantage of software pipelining is that optimal performance can be achieved with compact object code. This paper extends previous results of software pipelining in two ways: First, this paper shows that by using an improved algorithm, near-optimal performance can be obtained without specialized hardware. Second, we propose a hierarchical reduction scheme whereby entire control constructs are reduced to an object similar to an operation in a basic block. With this scheme, all innermost loops, including those containing conditional statements, can be software pipelined. It also diminishes the start-up cost of loops with a small number of iterations. Hierarchical reduction complements the software pipelining technique, permitting a consistent performance improvement to be obtained. The techniques proposed have been validated by an implementation of a compiler for Warp, a systolic array consisting of 10 VLIW processors. This compiler has been used for developing a large number of applications in the areas of image, signal and scientific processing.

936 citations

Journal ArticleDOI
TL;DR: Compilers for vector or multiprocessor computers must have certain optimization features to successfully generate parallel code.
Abstract: Compilers for vector or multiprocessor computers must have certain optimization features to successfully generate parallel code.

758 citations

Journal ArticleDOI
01 Feb 1990
TL;DR: It is shown how the high-level synthesis task can be decomposed into a number of distinct but not independent subtasks.
Abstract: High-level synthesis systems start with an abstract behavioral specification of a digital system and find a register-transfer level structure that realizes the given behavior. The various tasks involved in developing a register-transfer level structure from an algorithmic level specification are described. In particular, it is shown how the high-level synthesis task can be decomposed into a number of distinct but not independent subtasks. The techniques that have been developed for solving those subtasks are presented. Areas related to high-level synthesis that are still open problems are examined.

639 citations

Journal ArticleDOI
TL;DR: This paper presents a unifying procedure, called Facet, for the automated synthesis of data paths at the register-transfer level that minimizes the number of storage elements, data operators, and interconnection units.
Abstract: This paper presents a unifying procedure, called Facet, for the automated synthesis of data paths at the register-transfer level. The procedure minimizes the number of storage elements, data operators, and interconnection units. A design generator named Emerald, based on Facet, was developed and implemented to facilitate extensive experiments with the methodology. The input to the design generator is a behavioral description which is viewed as a code sequence. Emerald provides mechanisms for interactively manipulating the code sequence. Different forms of the code sequence are mapped into data paths of different cost and speed. Data paths for the behavioral descriptions of the AM2910, the AM2901, and the IBM System/370 were produced and analyzed. Designs for the AM2910 and the AM2901 are compared with commercial designs. Overall, the total number of gates required for Emerald's designs is about 15 percent more than the commercial designs. The design space spanned by the behavioral specification of the AM2901 is extensively explored.

567 citations