SciSpace (formerly Typeset)
Topic

Pipeline (computing)

About: Pipeline (computing) is a research topic. Over its lifetime, 26,760 publications have been published on this topic, receiving 204,305 citations. The topic is also known as data pipeline and computational pipeline.


Papers
Patent
07 Jun 2004
TL;DR: A distributed query engine pipeline architecture comprises cascaded analysis engines that accept an input query; each engine identifies a portion of the input query that it can pass on to an execution engine.
Abstract: A distributed query engine pipeline architecture comprises cascaded analysis engines that accept an input query and each identifies a portion of the input query that it can pass on to an execution engine. Each stage rewrites the input query to remove the portion identified and replaces it with a placeholder. The rewritten query is forwarded to the next analysis engine in the cascade. Each engine compiles the portion it identified so that an execution engine may process that portion. Execution preferably proceeds from the portion of the query compiled by the last analysis engine. The execution engine corresponding to the last analysis engine executes the query and makes a call to the next higher execution engine in the cascade for data from the preceding portion. The process continues until the results from the input query are fully assembled.

113 citations
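The cascade described in the patent abstract above can be illustrated with a minimal Python sketch. It assumes a hypothetical query representation (nested `(op, arg, child)` tuples), three hypothetical engines that each handle one operator kind, and an in-memory `TABLES` dict; "compiling" is reduced to storing the extracted sub-tree. Each analysis stage replaces the portion it claims with a placeholder such as `$0`, and execution starts at the last engine, which pulls data from the next higher engine through that placeholder. This illustrates the idea only; it is not the patented implementation.

```python
from typing import Any, Optional

Row = dict[str, Any]

# Hypothetical in-memory data source used by the "scan" operator.
TABLES: dict[str, list[Row]] = {
    "orders": [
        {"id": 1, "amount": 50},
        {"id": 2, "amount": 250},
        {"id": 3, "amount": 900},
    ],
}


def rewrite(node: Any, ops: set[str], placeholder: str) -> tuple[Any, Optional[tuple]]:
    """Replace the first sub-tree (searching from the root) whose operator is in
    `ops` with `placeholder`; return the rewritten query and the extracted portion."""
    if node is None or isinstance(node, str):  # empty child or earlier placeholder
        return node, None
    op, arg, child = node
    if op in ops:
        return placeholder, node
    new_child, portion = rewrite(child, ops, placeholder)
    return (op, arg, new_child), portion


class ExecutionEngine:
    """Executes one extracted portion; resolves a placeholder child by calling the
    execution engine registered for it (the next higher engine in the cascade)."""

    def __init__(self, portion: tuple, registry: dict[str, "ExecutionEngine"]):
        self.portion = portion
        self.registry = registry

    def execute(self) -> list[Row]:
        op, arg, child = self.portion
        # Pull data for the preceding portion from the engine that compiled it.
        rows = self.registry[child].execute() if isinstance(child, str) else []
        if op == "scan":
            return list(TABLES[arg])
        if op == "filter":
            return [r for r in rows if arg(r)]
        if op == "project":
            return [{c: r[c] for c in arg} for r in rows]
        raise ValueError(f"unsupported operator {op!r}")


def run_cascade(query: Any, engine_ops: list[set[str]]) -> list[Row]:
    registry: dict[str, ExecutionEngine] = {}
    for i, ops in enumerate(engine_ops):
        placeholder = f"${i}"
        query, portion = rewrite(query, ops, placeholder)  # analysis stage i
        if portion is None:
            raise ValueError(f"engine {i} found nothing it can handle")
        registry[placeholder] = ExecutionEngine(portion, registry)
    if query != f"${len(engine_ops) - 1}":
        raise ValueError("the cascade did not consume the whole query")
    # Execution starts from the portion compiled by the last analysis engine.
    return registry[query].execute()


if __name__ == "__main__":
    q = ("project", ("id",),
         ("filter", lambda r: r["amount"] > 100,
          ("scan", "orders", None)))
    print(run_cascade(q, [{"scan"}, {"filter"}, {"project"}]))  # [{'id': 2}, {'id': 3}]
```

The sketch assumes each engine's operators sit directly above the portion claimed by the previous engine, so every stage leaves a single placeholder behind; a real system would need to handle several placeholders per portion.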

Journal Article
TL;DR: In this article, the effect of corrosion defect size on the remaining pipeline strength is modeled by a Markov process. An analytical solution for the probability transition matrix is obtained by solving the Kolmogorov forward differential equation.

113 citations
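The summary above does not state the model's state space or transition rates, so the following Python sketch assumes the simplest case: defect size is discretized into `n_states` levels, the defect grows one level at a time at a constant hypothetical `rate`, and the largest level is absorbing (for example, a failure state). Under these assumptions the Kolmogorov forward equation dP(t)/dt = P(t)Q with P(0) = I has a closed-form solution: the number of growth steps by time t is Poisson-distributed with mean rate·t, and the probability mass that would move past the last transient level ends up in the absorbing state.

```python
import math


def transition_matrix(n_states: int, rate: float, t: float) -> list[list[float]]:
    """Closed-form P(t) for a hypothetical pure-birth model of defect growth.

    States 0 .. n_states - 2 are discretized defect-size levels; the defect
    grows one level at a time at constant `rate`. State n_states - 1 is
    absorbing. P(t) solves the Kolmogorov forward equation dP/dt = P Q with
    P(0) = I, where Q[i][i] = -rate and Q[i][i + 1] = rate for transient i.
    """
    lt = rate * t
    last = n_states - 1
    P = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == last:
            P[i][i] = 1.0  # once absorbed, the chain stays there
            continue
        for j in range(i, last):
            k = j - i  # number of growth steps needed to go from i to j
            P[i][j] = math.exp(-lt) * lt**k / math.factorial(k)  # Poisson pmf
        # The remaining mass corresponds to k >= last - i steps: absorbed.
        P[i][last] = 1.0 - sum(P[i][:last])
    return P


if __name__ == "__main__":
    for row in transition_matrix(n_states=5, rate=0.3, t=2.0):
        print(" ".join(f"{p:.4f}" for p in row))
```

Each row of P(t) sums to 1; a row-by-row numerical integration of dP/dt = PQ converges to the same matrix, which is a quick way to check the closed form.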

Journal Article
T. Asprey, G.S. Averill, E. DeLano, R. Mason, B. Weiner, J. Yetter
TL;DR: The PA7100 CPU, the first precision-architecture, reduced-instruction-set-computer (PA-RISC) architecture implementation to combine an integer core and floating-point coprocessor into a single-chip format, is described.
Abstract: The PA7100 CPU, the first precision-architecture, reduced-instruction-set-computer (PA-RISC) architecture implementation to combine an integer core and floating-point coprocessor into a single-chip format, is described. It incorporates superscalar execution and supports clock rates of up to 100 MHz in standard 0.8-μm CMOS. Features such as a flexible primary cache organization and multiprocessing capability allow the device to be scaled to a variety of system applications, price ranges, and performance levels. The microprocessor instruction execution pipeline, cache design, translation look-aside buffer (TLB) for virtual address translation, floating-point unit, and system interface bus are discussed. The design, test, and verification methods used in the development of the PA7100 are reviewed.

113 citations

Proceedings Article
Mingyu Gao, Xuan Yang, Jing Pu, Mark Horowitz, Christos Kozyrakis
04 Apr 2019
TL;DR: This work proposes dataflow optimizations to address the shortcomings of existing parallel dataflow techniques for tiled NN accelerators, and develops buffer sharing dataflow that turns the distributed buffers into an idealized shared buffer, eliminating excessive data duplication and the memory access overheads.
Abstract: The use of increasingly larger and more complex neural networks (NNs) makes it critical to scale the capabilities and efficiency of NN accelerators. Tiled architectures provide an intuitive scaling solution that supports both forms of coarse-grained parallelism in NNs: intra-layer parallelism, where all tiles process a single layer, and inter-layer pipelining, where multiple layers execute across tiles in a pipelined manner. This work proposes dataflow optimizations to address the shortcomings of existing parallel dataflow techniques for tiled NN accelerators. For intra-layer parallelism, we develop buffer sharing dataflow that turns the distributed buffers into an idealized shared buffer, eliminating excessive data duplication and the memory access overheads. For inter-layer pipelining, we develop alternate layer loop ordering that forwards the intermediate data in a more fine-grained and timely manner, reducing the buffer requirements and pipeline delays. We also make inter-layer pipelining applicable to NNs with complex DAG structures. These optimizations improve the performance of tiled NN accelerators by 2x and reduce their energy consumption by 45% across a wide range of NNs. The effectiveness of our optimizations also increases with the NN size and complexity.

113 citations
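A toy model can make inter-layer pipelining, and the benefit of forwarding intermediate data at a finer granularity, more concrete. The Python sketch below is not the paper's dataflow: it assumes one layer per tile, one time step per (layer, input) pair, and, for the latency function, that each layer splits its output into `chunks` equal pieces that the next tile can consume as soon as each piece arrives, with no buffer limits or transfer cost.

```python
from fractions import Fraction


def pipeline_schedule(num_layers: int, num_inputs: int) -> list[list[tuple[int, int]]]:
    """Toy inter-layer pipeline: layer l is mapped to tile l and each
    (layer, input) pair takes one time step. Returns, for every time step,
    the (tile, input) pairs that are busy in that step."""
    schedule = []
    for t in range(num_layers + num_inputs - 1):
        busy = [(l, t - l) for l in range(num_layers) if 0 <= t - l < num_inputs]
        schedule.append(busy)
    return schedule


def single_input_latency(num_layers: int, layer_time: int, chunks: int) -> Fraction:
    """Latency of one input when every layer emits its output in `chunks` equal
    pieces and the next tile starts on a piece as soon as it arrives
    (idealized: no buffering limits or transfer cost)."""
    chunk_time = Fraction(layer_time) / chunks
    # The last layer can start after (num_layers - 1) chunk delays and then
    # needs a full layer_time to process all of its chunks.
    return (num_layers - 1) * chunk_time + layer_time


if __name__ == "__main__":
    for t, busy in enumerate(pipeline_schedule(num_layers=3, num_inputs=4)):
        print(t, busy)
    print(single_input_latency(4, 1, 1), single_input_latency(4, 1, 4))  # 4 7/4
```

In this model, three layers processing four inputs finish in 6 steps rather than the 12 a layer-by-layer schedule would need, and with 4 layers and unit layer time, forwarding in 4 chunks cuts single-input latency from 4 to 7/4.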

Proceedings Article
Qing Yang, Xiaoxiao Li, Hongyi Yao, Ji Fang, Kun Tan, Wenjun Hu, Jiansong Zhang, Yongguang Zhang
27 Aug 2013
TL;DR: This paper presents BigStation, a scalable architecture that enables real-time signal processing in large-scale MIMO systems with tens or hundreds of antennas. BigStation parallelizes MU-MIMO processing with a distributed pipeline based on the computation and communication patterns of that processing.
Abstract: Multi-user multiple-input multiple-output (MU-MIMO) is the latest communication technology that promises to linearly increase the wireless capacity by deploying more antennas on access points (APs). However, the large number of MIMO antennas will generate a huge amount of digital signal samples in real time. This imposes a grand challenge on the AP design by multiplying the computation and the I/O requirements to process the digital samples. This paper presents BigStation, a scalable architecture that enables real-time signal processing in large-scale MIMO systems which may have tens or hundreds of antennas. Our strategy to scale is to extensively parallelize the MU-MIMO processing on many simple and low-cost commodity computing devices. Our design can incrementally support more antennas by proportionally adding more computing devices. To reduce the overall processing latency, which is a critical constraint for wireless communication, we parallelize the MU-MIMO processing with a distributed pipeline based on its computation and communication patterns. At each stage of the pipeline, we further use data partitioning and computation partitioning to increase the processing speed. As a proof of concept, we have built a BigStation prototype based on commodity PC servers and standard Ethernet switches. Our prototype employs 15 PC servers and can support real-time processing of 12 software radio antennas. Our results show that the BigStation architecture is able to scale to tens to hundreds of antennas. With 12 antennas, our BigStation prototype can increase wireless capacity by 6.8x with a low mean processing delay of 860μs. While this latency is not yet low enough for the 802.11 MAC, it already satisfies the real-time requirements of many existing wireless standards, e.g., LTE and WCDMA.

112 citations
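The structure described in the BigStation abstract, a pipeline of stages in which each stage further partitions its data across several workers, can be sketched in Python. This is a hedged illustration, not BigStation's code: the stage functions are hypothetical placeholders rather than real MIMO processing, samples are plain lists of numbers, and threads connected by `queue.Queue` stand in for the PC servers and Ethernet links of the prototype (because of CPython's GIL, the threads show the structure rather than a real speedup).

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Stage = Callable[[list[float]], list[float]]
_DONE = object()  # end-of-stream marker passed down the pipeline


def run_stage(fn: Stage, inbox: queue.Queue, outbox: queue.Queue,
              pool: ThreadPoolExecutor, num_parts: int) -> None:
    """One pipeline stage: take a frame, split its samples into `num_parts`
    partitions, process the partitions in parallel, reassemble, pass it on."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        frame_id, samples = item
        size = max(1, -(-len(samples) // num_parts))  # ceiling division
        parts = [samples[i:i + size] for i in range(0, len(samples), size)]
        results = pool.map(fn, parts)  # data partitioning within this stage
        outbox.put((frame_id, [x for part in results for x in part]))


def run_pipeline(frames: list[list[float]], stages: list[Stage],
                 workers_per_stage: int = 4) -> list[tuple[int, list[float]]]:
    """Stream frames through the stages; while stage k works on frame n,
    stage k - 1 can already work on frame n + 1."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    pools = [ThreadPoolExecutor(max_workers=workers_per_stage) for _ in stages]
    threads = [
        threading.Thread(target=run_stage,
                         args=(fn, queues[k], queues[k + 1], pools[k], workers_per_stage))
        for k, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for frame_id, samples in enumerate(frames):
        queues[0].put((frame_id, samples))
    queues[0].put(_DONE)

    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    for pool in pools:
        pool.shutdown()
    return results


if __name__ == "__main__":
    # Hypothetical placeholder stages standing in for real signal processing.
    stages = [lambda part: [x * 2 for x in part], lambda part: [x + 1 for x in part]]
    print(run_pipeline([[1, 2, 3, 4, 5], [10, 20]], stages))
```

Each stage handles frames one at a time and in order, so output frames keep their input order; parallelism comes from overlapping stages across frames and from splitting each frame's samples within a stage.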


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations, 86% related
Scalability: 50.9K papers, 931.6K citations, 85% related
Server: 79.5K papers, 1.4M citations, 82% related
Electronic circuit: 114.2K papers, 971.5K citations, 82% related
CMOS: 81.3K papers, 1.1M citations, 81% related
Performance Metrics
Number of papers in the topic in previous years
Year    Papers
2022    18
2021    1,066
2020    1,556
2019    1,793
2018    1,754
2017    1,548