Topic

Pipeline (computing)

About: Pipeline (computing) is a research topic. Over the lifetime, 26,760 publications have been published within this topic, receiving 204,305 citations. The topic is also known as: data pipeline & computational pipeline.


Papers
Proceedings ArticleDOI
Mathys C. Walma
24 Sep 2007
TL;DR: A method for pipelining the calculation of CRCs, such as the ISO-3309 CRC32, that allows circuit frequency and data throughput to be scaled independently by varying the data width and the number of pipeline stages, and that supports calculation over data narrower than the full input width.
Abstract: Traditional methods of calculating CRCs suffer from diminishing returns: doubling the data width does not double the maximum data throughput, because the worst-case timing path becomes slower. Feedback in the traditional implementation makes pipelining problematic. However, the on-chip data width used for high-throughput protocols is constantly increasing; the battle to reduce static power consumption is one factor driving this trend towards wider data paths. This paper discusses a method for pipelining the calculation of CRCs, such as the ISO-3309 CRC32. The method allows independent scaling of circuit frequency and data throughput by varying the data width and the number of pipeline stages; pipeline latency can be traded for area while only slightly affecting timing. Additionally, it allows calculation over data that is not the full width of the input. This often happens at the end of a packet, although it can also happen in the middle of a packet if data arrival is bursty. Finally, a fortunate side effect is the ability to efficiently update a known-good CRC value when a small subset of the data in the packet has changed, a function often desired in routers, for example when updating the TTL field in IPv4 packets.
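As an aside on that last point, the CRC-update trick rests on the fact that CRC32 is affine over GF(2): for equal-length messages, crc(a) ^ crc(b) ^ crc(zeros) == crc(a ^ b). The Python sketch below is only an illustration of that identity, not the paper's hardware pipeline; the function name crc32_patch and the toy packet are invented for the example. A real router would shift the changed field's CRC contribution past the trailing bytes with precomputed tables rather than scanning zero buffers as this sketch does for clarity.

    import zlib

    def crc32_patch(old_crc: int, packet_len: int, offset: int,
                    old_field: bytes, new_field: bytes) -> int:
        # CRC32 is affine over GF(2): for equal-length messages,
        # crc(a) ^ crc(b) ^ crc(zeros) == crc(a ^ b). The new packet is
        # the old packet XOR a sparse difference message, so the new CRC
        # follows from the old CRC plus the CRC of that difference.
        assert len(old_field) == len(new_field)
        delta = bytearray(packet_len)      # zeros except the changed field
        for i, (a, b) in enumerate(zip(old_field, new_field)):
            delta[offset + i] = a ^ b
        zeros = bytes(packet_len)
        # Note: this sketch scans the zero buffers; hardware computes
        # these zero-padded CRCs with shift tables instead.
        return old_crc ^ zlib.crc32(bytes(delta)) ^ zlib.crc32(zeros)

    # Usage: patch byte 4 (a stand-in for an IPv4 TTL field) from 0x40
    # to 0x3F and confirm the updated CRC matches a full recomputation.
    packet = bytearray(b"\x45\x00\x00\x54\x40" + b"\x00" * 59)
    old_crc = zlib.crc32(bytes(packet))
    new_crc = crc32_patch(old_crc, len(packet), 4, b"\x40", b"\x3f")
    packet[4] = 0x3F
    assert new_crc == zlib.crc32(bytes(packet))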

49 citations

Proceedings ArticleDOI
17 Feb 2021
TL;DR: Guo et al. propose AutoBridge, an automated framework that couples a coarse-grained floorplanning step with pipelining during HLS compilation, allowing HLS to more easily identify and pipeline long wires, especially those crossing die boundaries.
Abstract: Despite an increasing adoption of high-level synthesis (HLS) for its design productivity advantages, there remains a significant gap in the achievable clock frequency between an HLS-generated design and a handcrafted RTL one. A key factor that limits the timing quality of the HLS outputs is the difficulty in accurately estimating the interconnect delay at the HLS level. Unfortunately, this problem becomes even worse when large HLS designs are implemented on the latest multi-die FPGAs, where die-crossing interconnects incur a high delay penalty. To tackle this challenge, we propose AutoBridge, an automated framework that couples a coarse-grained floorplanning step with pipelining during HLS compilation. First, our approach provides HLS with a view on the global physical layout of the design, allowing HLS to more easily identify and pipeline the long wires, especially those crossing the die boundaries. Second, by exploiting the flexibility of HLS pipelining, the floorplanner is able to distribute the design logic across multiple dies on the FPGA device without degrading clock frequency. This prevents the placer from aggressively packing the logic on a single die which often results in local routing congestion that eventually degrades timing. Since pipelining may introduce additional latency, we further present analysis and algorithms to ensure the added latency will not compromise the overall throughput. AutoBridge can be integrated into the existing CAD toolflow for Xilinx FPGAs. In our experiments with a total of 43 design configurations, we improve the average frequency from 147 MHz to 297 MHz (a 102% improvement) with no loss of throughput and a negligible change in resource utilization. Notably, in 16 experiments we make the originally unroutable designs achieve 274 MHz on average. The tool is available at https://github.com/Licheng-Guo/AutoBridge.
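To see why the extra pipeline registers need not hurt performance on feed-forward paths, consider a latency-insensitive view: deepening a register chain on a wire raises latency but leaves steady-state throughput at one token per cycle. The toy cycle-based Python model below illustrates the principle under that assumption; it is not part of the AutoBridge tool, and all names in it are invented.

    from collections import deque

    def simulate(pipeline_stages: int, cycles: int = 100):
        # A chain of registers between a producer and a consumer. A deeper
        # chain models the extra registers a floorplan-aware flow inserts
        # on a long, die-crossing wire.
        regs = deque([None] * pipeline_stages)
        received = []
        for cycle in range(cycles):
            if pipeline_stages:
                regs.append(cycle)        # producer drives a new token
                token = regs.popleft()    # consumer samples the far end
            else:
                token = cycle             # combinational path: same cycle
            if token is not None:
                received.append((cycle, token))
        return received

    for stages in (0, 3):
        r = simulate(stages)
        latency = r[0][0] - r[0][1]                     # arrival - emission
        throughput = len(r) / (r[-1][0] - r[0][0] + 1)  # tokens per cycle
        print(f"stages={stages}: latency={latency}, "
              f"throughput={throughput:.2f} tokens/cycle")

Running this prints a latency that grows with the number of stages while throughput stays at 1.00 tokens per cycle, which is the property the paper's throughput analysis guarantees for its inserted pipelining.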

49 citations

Journal ArticleDOI
TL;DR: The drake R package (Landau 2018) is a workflow manager and computational engine for data science projects; it surpasses the analogous functionality in similar tools such as Make, remake, memoise, and knitr.
Abstract: The drake R package (Landau 2018) is a workflow manager and computational engine for data science projects. Its primary objective is to keep results up to date with the underlying code and data. When it runs a project, drake detects any pre-existing output and refreshes the pieces that are outdated or missing. Not every run-through starts from scratch, and the final answers are reproducible. With a user-friendly R-focused interface, comprehensive documentation, and extensive implicit parallel computing support, drake surpasses the analogous functionality in similar tools such as Make (Stallman 1998), remake (FitzJohn 2017), memoise (Wickham et al. 2017), and knitr (Xie 2017).
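For readers unfamiliar with this style of tool, the core idea (shared with Make) is to skip a target when a fingerprint of its code and inputs is unchanged. The minimal Python sketch below conveys that idea only; it is not drake's R API, and make, _key, and the .cache directory are invented for illustration.

    import hashlib, os, pickle

    CACHE = ".cache"

    def _key(name, fn, deps):
        # Fingerprint the step: its compiled code plus its input values.
        h = hashlib.sha256(fn.__code__.co_code)
        for d in deps:
            h.update(pickle.dumps(d))
        return os.path.join(CACHE, f"{name}-{h.hexdigest()}.pkl")

    def make(name, fn, *deps):
        # Run fn(*deps) only if no up-to-date cached result exists.
        os.makedirs(CACHE, exist_ok=True)
        path = _key(name, fn, deps)
        if os.path.exists(path):            # up to date: reuse
            with open(path, "rb") as f:
                return pickle.load(f)
        result = fn(*deps)                  # outdated or missing: rebuild
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result

    # Usage: re-running this script recomputes only the steps whose code
    # or inputs changed; anything downstream of a change is refreshed too.
    raw = make("raw", lambda: list(range(10)))
    clean = make("clean", lambda xs: [x * 2 for x in xs], raw)
    print(make("summary", lambda xs: sum(xs), clean))  # 90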

49 citations

Patent
20 Sep 1996
TL;DR: This patent presents a processor architecture in which each processor has its own memory, strategically distributed along the stages of its execution pipeline to provide fast access to frequently used information, such as the contents of the address and data registers, the program counter, etc.
Abstract: A computer system architecture in which each processor has its own memory, strategically distributed along the stages of an execution pipeline of the processor, to provide fast access to frequently used information, such as the contents of the address and data registers, the program counter, etc. Memory storage is located in close physical proximity to the stages of the execution pipeline at which memory is commonly or repeatedly accessed: coupled to the pipeline at various stages are small memory cells storing information that is consistently and repeatedly requested at that stage. The speed of the execution pipeline is critical to the overall performance of the processor and of the computer architecture as a whole. To that end, the clock rate at which the pipeline is operated is raised as high as the operating characteristics of the logic and associated circuitry will allow. Generally, however, memory access times are longer than the cycle times at which the pipeline logic can operate, so there is a point of diminishing returns beyond which raising the pipeline clock rate is less advantageous because the pipeline must wait for memory accesses to complete. The invention therefore provides two sets of strategically located memory cells distributed along the execution pipeline of a processor, and accesses the two sets alternately.
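The alternating access can be pictured with a toy model: if each memory cell needs two clock cycles per access but the pipeline advances every cycle, two duplicated banks accessed on even and odd cycles keep the stage fed. The Python sketch below is a hypothetical software illustration of that scheduling argument (the patent describes hardware); run_pipeline and the "pc" cell are invented names.

    def run_pipeline(cells: dict, cycles: int) -> int:
        # Two duplicated banks of per-stage memory cells. Each bank needs
        # two clock cycles per access, but the pipeline advances every
        # cycle, so the stage alternates between the banks.
        banks = [dict(cells), dict(cells)]
        busy_until = [0, 0]              # cycle at which each bank frees up
        reads = 0
        for cycle in range(cycles):
            bank = cycle % 2             # even cycles: bank 0; odd: bank 1
            assert busy_until[bank] <= cycle  # alternation avoids conflicts
            busy_until[bank] = cycle + 2      # bank is occupied for 2 cycles
            _ = banks[bank]["pc"]             # e.g. read the program counter
            reads += 1
        return reads

    print(run_pipeline({"pc": 0x1000}, cycles=8), "reads in 8 cycles")

The assertion never fires: each bank is touched only every other cycle, so its two-cycle access window always closes before it is addressed again, sustaining one memory access per pipeline cycle overall.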

49 citations


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations (86% related)
Scalability: 50.9K papers, 931.6K citations (85% related)
Server: 79.5K papers, 1.4M citations (82% related)
Electronic circuit: 114.2K papers, 971.5K citations (82% related)
CMOS: 81.3K papers, 1.1M citations (81% related)
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2022    18
2021    1,066
2020    1,556
2019    1,793
2018    1,754
2017    1,548