scispace - formally typeset
Search or ask a question
Topic

Software pipelining

About: Software pipelining is a research topic. Over the lifetime, 939 publications have been published within this topic receiving 18316 citations.


Papers
More filters
Journal ArticleDOI
01 Jun 1988
TL;DR: This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors, and proposes a hierarchical reduction scheme whereby entire control constructs are reduced to an object similar to an operation in a basic block.
Abstract: This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors. In software pipelining, iterations of a loop in the source program are continuously initiated at constant intervals, before the preceding iterations complete. The advantage of software pipelining is that optimal performance can be achieved with compact object code.This paper extends previous results of software pipelining in two ways: First, this paper shows that by using an improved algorithm, near-optimal performance can be obtained without specialized hardware. Second, we propose a hierarchical reduction scheme whereby entire control constructs are reduced to an object similar to an operation in a basic block. With this scheme, all innermost loops, including those containing conditional statements, can be software pipelined. It also diminishes the start-up cost of loops with small number of iterations. Hierarchical reduction complements the software pipelining technique, permitting a consistent performance improvement be obtained.The techniques proposed have been validated by an implementation of a compiler for Warp, a systolic array consisting of 10 VLIW processors. This compiler has been used for developing a large number of applications in the areas of image, signal and scientific processing.

936 citations

Proceedings ArticleDOI
30 Nov 1994
TL;DR: This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models and characterizes the algorithm in terms of the quality of the generated schedules as well the computational expense incurred.
Abstract: Module scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative module scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well the computational expense incurred.

696 citations

Proceedings ArticleDOI
20 Oct 2006
TL;DR: This paper demonstrates an end-to-end stream compiler that attains robust multicore performance in the face of varying application characteristics and exploits all types of parallelism in a unified manner in order to achieve this generality.
Abstract: As multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attractive way to expose coarse-grained parallelism, as streaming applications (image, video, DSP, etc.) are naturally represented by independent filters that communicate over explicit data channels.In this paper, we demonstrate an end-to-end stream compiler that attains robust multicore performance in the face of varying application characteristics. As benchmarks exhibit different amounts of task, data, and pipeline parallelism, we exploit all types of parallelism in a unified manner in order to achieve this generality. Our compiler, which maps from the StreamIt language to the 16-core Raw architecture, attains a 11.2x mean speedup over a single-core baseline, and a 1.84x speedup over our previous work.

590 citations

Journal ArticleDOI
TL;DR: A comparison of the alternative methods for software pipelining is presented, and the relationships between the methods are explored and possibilities for improvement highlighted.
Abstract: Utilizing parallelism at the instruction level is an important way to improve performance. Because the time spent in loop execution dominates total execution time, a large body of optimizations focuses on decreasing the time to execute each iteration. Software pipelining is a technique that reforms the loop so that a faster execution rate is realized. Iterations are executed in overlapped fashion to increase parallelism.Let {ABC}n represent a loop containing operations A, B, C that is executed n times. Although the operations of a single iteration can be parallelized, more parallelism may be achieved if the entire loop is considered rather than a single iteration. The software pipelining transformation utilizes the fact that a loop {ABC}n is equivalent to A{BCA}n−1BC. Although the operations contained in the loop do not change, the operations are from different iterations of the original loop.Various algorithms for software pipelining exist. A comparison of the alternative methods for software pipelining is presented. The relationships between the methods are explored and possibilities for improvement highlighted.

365 citations

Proceedings ArticleDOI
12 Nov 2005
TL;DR: The concept of an automatic approach to thread extraction, called decoupled software pipelining (DSWP), is proved by demonstrating significant gains for dual-core chip multiprocessor models running a variety of codes.
Abstract: Until recently, a steadily rising clock rate and other uniprocessor microarchitectural improvements could be relied upon to consistently deliver increasing performance for a wide range of applications. Current difficulties in maintaining this trend have lead microprocessor manufacturers to add value by incorporating multiple processors on a chip. Unfortunately, since decades of compiler research have not succeeded in delivering automatic threading for prevalent code properties, this approach demonstrates no improvement for a large class of existing codes. To find useful work for chip multiprocessors, we propose an automatic approach to thread extraction, called Decoupled Software Pipelining (DSWP). DSWP exploits the finegrained pipeline parallelism lurking in most applications to extract long-running, concurrently executing threads. Use of the non-speculative and truly decoupled threads produced by DSWP can increase execution efficiency and provide significant latency tolerance, mitigating design complexity by reducing inter-core communication and per-core resource requirements. Using our initial fully automatic compiler implementation and a validated processor model, we prove the concept by demonstrating significant gains for dual-core chip multiprocessor models running a variety of codes. We then explore simple opportunities missed by our initial compiler implementation which suggest a promising future for this approach.

303 citations


Network Information
Related Topics (5)
Compiler
26.3K papers, 578.5K citations
85% related
Cache
59.1K papers, 976.6K citations
84% related
Parallel algorithm
23.6K papers, 452.6K citations
84% related
Programming paradigm
18.7K papers, 467.9K citations
83% related
Scalability
50.9K papers, 931.6K citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20232
202211
20214
20208
20199
20186