Topic

Pipeline (computing)

About: Pipeline (computing) is a research topic. Over its lifetime, 26,760 publications have been published within this topic, receiving 204,305 citations. The topic is also known as: data pipeline & computational pipeline.


Papers
Proceedings ArticleDOI
22 Aug 2005
TL;DR: This paper shows that, compared to previous schemes, SDP is the only scheme that scales well in all five scalability requirements; it achieves scalability in throughput by simultaneously pipelining at the data-structure level and the hardware level.
Abstract: A truly scalable IP-lookup scheme must address five challenges of scalability, namely: routing-table size, lookup throughput, implementation cost, power dissipation, and routing-table update cost. Though several IP-lookup schemes have been proposed in the past, none of them does well in all five scalability requirements. Previous schemes pipeline tries by mapping trie levels to pipeline stages. We make the fundamental observation that because this mapping is static and oblivious of the prefix distribution, the schemes do not scale well when worst-case prefix distributions are considered. This paper is the first to meet all five requirements in the worst case. We propose scalable dynamic pipelining (SDP), which includes three key innovations: (1) We map trie nodes to pipeline stages based on the node height. Because the node height is directly determined by the prefix distribution, the node height succinctly provides sufficient information about the distribution. Our mapping enables us to prove a worst-case per-stage memory bound which is significantly tighter than those of previous schemes. (2) We exploit our mapping to propose a novel scheme for incremental route updates. In our scheme a route update requires exactly one write dispatched into the pipeline. This route-update cost is the optimum, and our scheme achieves it in the worst case. (3) We achieve scalability in throughput by simultaneously pipelining at the data-structure level and the hardware level. SDP naturally scales in power and implementation cost. We not only present a theoretical analysis but also evaluate SDP and a number of previous schemes using detailed hardware simulation. Compared to previous schemes, we show that SDP is the only scheme that scales well in all five requirements.
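To make the node-height mapping concrete, here is a minimal, runnable sketch (not the paper's implementation): it builds a binary trie from a toy prefix table, computes each node's height, and assigns the node to pipeline stage = height. The prefix table, class, and helper names are illustrative assumptions.

```python
# Minimal sketch of height-based stage assignment, the core idea behind SDP.
# Everything here (prefixes, names, stage layout) is illustrative, not the paper's code.

class TrieNode:
    def __init__(self):
        self.children = {}    # bit ('0' or '1') -> TrieNode
        self.is_prefix = False

def insert(root, prefix_bits):
    """Insert a prefix given as a bit string, e.g. '1011'."""
    node = root
    for bit in prefix_bits:
        node = node.children.setdefault(bit, TrieNode())
    node.is_prefix = True

def height(node):
    """Height = longest path from this node down to a leaf (leaves have height 0)."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children.values())

def map_to_stages(root):
    """Assign each trie node to pipeline stage = its height.

    Because height is determined by the prefix distribution itself, nodes of a
    long, skewed subtrie spread across many stages instead of piling into one.
    """
    stages = {}
    def visit(node, path):
        stages.setdefault(height(node), []).append(path or '(root)')
        for bit, child in node.children.items():
            visit(child, path + bit)
    visit(root, '')
    return stages

if __name__ == "__main__":
    root = TrieNode()
    for p in ["0", "10", "101", "1011", "11"]:   # toy prefix table
        insert(root, p)
    for stage, nodes in sorted(map_to_stages(root).items()):
        print(f"stage {stage}: {nodes}")
```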

56 citations

Proceedings ArticleDOI
25 Jun 2007
TL;DR: Division and square-root algorithms are also described that take advantage of high-precision linear-approximation hardware to obtain a reciprocal or reciprocal-square-root approximation.
Abstract: The floating point unit of the next generation PowerPC is detailed. It has been tested at over 5 GHz. The design supports an extremely aggressive cycle time of 13 FO4 using a technology-independent measure. For most dependent instructions, its fused multiply-add dataflow has only 6 effective pipeline stages. This is nearly equivalent to its predecessor, the POWER5, even though its technology-independent frequency has increased over 70%. Overall the frequency has improved over 100%. It achieves this high performance through aggressive feedback paths, circuit design and layout. The pipeline has 7 stages, but data may be fed back to dependent operations prior to rounding and complete normalization. Division and square root algorithms are also described which take advantage of high-precision linear approximation hardware for obtaining a reciprocal or reciprocal square root approximation.
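The division and square-root approach described above, a low-precision reciprocal (or reciprocal-square-root) seed refined by iteration, can be illustrated with a short sketch. This is a generic Newton-Raphson refinement under assumed seed precision and iteration counts, not the PowerPC unit's actual algorithm or hardware.

```python
# Illustrative sketch only: refine a coarse reciprocal / reciprocal-square-root
# seed with Newton-Raphson iterations to implement division and square root.
# Seed precision and iteration counts are assumptions, not the PowerPC design.

def recip_seed(d):
    """Low-precision seed, standing in for the linear-approximation hardware."""
    return round(1.0 / d, 2)

def divide(a, d, iterations=3):
    """a / d via reciprocal refinement: x <- x * (2 - d * x)."""
    x = recip_seed(d)
    for _ in range(iterations):
        x = x * (2.0 - d * x)      # error roughly squares each iteration
    return a * x

def rsqrt(d, iterations=3):
    """1 / sqrt(d) via x <- x * (3 - d * x^2) / 2."""
    x = round(d ** -0.5, 2)        # stand-in low-precision seed
    for _ in range(iterations):
        x = x * (3.0 - d * x * x) / 2.0
    return x

if __name__ == "__main__":
    print(divide(355.0, 113.0))    # ~3.14159...
    print(rsqrt(2.0) ** -2)        # ~2.0
```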

56 citations

Patent
15 Aug 1988
TL;DR: In this patent, a branch command initiates a delay interval that allows target instructions to be prefetched while the existing pipeline contents continue to execute, preventing a pipeline break and avoiding lost time; for a conditional branch, the jump (split) to the target instructions is performed only if the condition is met.
Abstract: In a pipeline computer, current instructions executed in sequence are monitored for conditional and unconditional branch commands, as well as the readiness of condition codes, the meeting of branch conditions and split commands. A branch command initiates an interval of delay which affords prefetching target instructions while using pipeline contents to prevent a pipeline break and avoid lost time. Detection of a branch command actuates a register to store a sequence of target instructions. Unless a branch command is conditional, subsequent detection (delayed) of a split command shifts the stored target instructions into operation as the current instructions. For a conditional branch command, a jump or split to the target instructions is performed only if the condition is met. Otherwise the current instruction sequence is restored pending another branch command. Dual instruction registers, program counters and address registers alternate to accommodate branch jumps with considerable time savings by effective programming.
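The control flow the abstract describes (keep executing the current stream during the delay interval, prefetch the target stream into a second buffer, and switch at the split point only when the branch is unconditional or its condition is met) can be sketched as a toy simulator. The instruction encoding and helper names below are invented for illustration; they are not the patent's circuitry.

```python
# Toy model of branch handling with a prefetched target buffer and a delayed
# "split" switch point. Instruction format is an assumption for the sketch.

def run(program, targets, condition_codes):
    """program/targets hold (op, arg) tuples; targets maps label -> target stream."""
    current, pc = list(program), 0
    prefetched, cond = None, None        # second instruction buffer + pending condition
    trace = []
    while pc < len(current):
        op, arg = current[pc]
        pc += 1
        if op == "BRANCH":               # unconditional: prefetch, always take at SPLIT
            prefetched, cond = list(targets[arg]), None
        elif op == "CBRANCH":            # conditional: arg = (target_label, condition_name)
            label, cond = arg
            prefetched = list(targets[label])
        elif op == "SPLIT":              # switch point after the delay interval
            if prefetched is not None and (cond is None or condition_codes[cond]):
                current, pc = prefetched, 0   # jump: target buffer becomes current stream
            prefetched, cond = None, None     # otherwise stay on the current stream
        else:                            # ordinary instruction (fills the delay interval)
            trace.append(arg)
    return trace

if __name__ == "__main__":
    prog = [("EXEC", "i1"), ("CBRANCH", ("L", "Z")), ("EXEC", "delay1"),
            ("EXEC", "delay2"), ("SPLIT", None), ("EXEC", "fallthru")]
    targets = {"L": [("EXEC", "t1"), ("EXEC", "t2")]}
    print(run(prog, targets, {"Z": True}))   # ['i1', 'delay1', 'delay2', 't1', 't2']
    print(run(prog, targets, {"Z": False}))  # ['i1', 'delay1', 'delay2', 'fallthru']
```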

56 citations

Proceedings ArticleDOI
12 Feb 2005
TL;DR: This paper examines the realistic benefits and limits of clock-gating in current-generation high-performance processors, identifies additional opportunities to avoid unnecessary clocking in real workload executions, and evaluates the power-reduction benefits of two newly invented schemes, transparent pipeline clock-gating and elastic pipeline clock-gating.
Abstract: Clock-gating has been introduced as the primary means of dynamic power management in recent high-end commercial microprocessors. The temperature drop resulting from active power reduction can result in additional leakage power savings in future processors. In this paper we first examine the realistic benefits and limits of clock-gating in current generation high-performance processors (e.g. of the POWER4™ or POWER5™ class). We then look beyond classical clock-gating: we examine additional opportunities to avoid unnecessary clocking in real workload executions. In particular, we examine the power reduction benefits of a couple of newly invented schemes called transparent pipeline clock-gating and elastic pipeline clock-gating. Based on our experiences with current designs, we try to bound the practical limits of clock gating efficiency in future microprocessors.
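As a rough intuition for why gating idle pipeline stages saves power, here is a back-of-the-envelope simulation of classical per-stage clock-gating, the baseline the paper starts from; it does not model the transparent or elastic schemes the paper proposes. The stage count, issue rate, and the assumption that gated clock pulses translate directly into saved clock power are simplifications for illustration.

```python
# Toy model: a stage's clock is suppressed in cycles when it holds no valid work,
# and the fraction of suppressed clock pulses approximates the clock power saved.

import random

def simulate(num_stages=7, cycles=10_000, issue_prob=0.4, seed=1):
    random.seed(seed)
    valid = [False] * num_stages            # does each stage hold a valid op this cycle?
    total_clocks = gated_clocks = 0
    for _ in range(cycles):
        for stage_valid in valid:           # clock (or gate) every stage this cycle
            total_clocks += 1
            if not stage_valid:
                gated_clocks += 1           # no valid data -> clock pulse gated off
        # advance the pipeline: shift valid bits down, maybe issue a new op
        valid = [random.random() < issue_prob] + valid[:-1]
    return gated_clocks / total_clocks

if __name__ == "__main__":
    print(f"fraction of clock pulses gated: {simulate():.2%}")  # ~60% at a 40% issue rate
```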

56 citations

Patent
15 Apr 2010
TL;DR: In this patent, a multi-mode Advanced Encryption Standard (MM-AES) module for a storage controller is adapted to perform interleaved processing of multiple data streams, i.e., to concurrently encrypt and/or decrypt string-data blocks from multiple data streams using, for each data stream, a corresponding cipher mode that is any one of a plurality of AES cipher modes.
Abstract: In one embodiment, a multi-mode Advanced Encryption Standard (MM-AES) module for a storage controller is adapted to perform interleaved processing of multiple data streams, i.e., concurrently encrypt and/or decrypt string-data blocks from multiple data streams using, for each data stream, a corresponding cipher mode that is any one of a plurality of AES cipher modes. The MM-AES module receives a string-data block with (a) a corresponding key identifier that identifies the corresponding module-cached key and (b) a corresponding control command that indicates to the MM-AES module what AES-mode-related processing steps to perform on the data block. The MM-AES module generates, updates, and caches masks to preserve inter-block information and allow the interleaved processing. The MM-AES module uses an unrolled and pipelined architecture where each processed data block moves through its processing pipeline in step with correspondingly moving key, auxiliary data, and instructions in parallel pipelines.
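The interleaving idea (round-robin processing of blocks from several streams, each carrying its own cached key, cipher mode, and chaining mask so that inter-block state survives) can be sketched in a few lines. The block cipher below is a trivial XOR placeholder standing in for the AES core, and the mode handling covers only ECB and CBC; the class names, stream setup, and mode behavior are assumptions for illustration, not the patent's design.

```python
# Conceptual sketch of interleaved multi-stream processing with per-stream state.
# The "cipher" is a XOR placeholder, NOT real AES; names are illustrative.

from itertools import zip_longest

BLOCK = 16  # bytes

def toy_block_cipher(block: bytes, key: bytes) -> bytes:
    """Placeholder for the AES datapath."""
    return bytes(b ^ k for b, k in zip(block, key))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class StreamCtx:
    """Per-stream state the module keeps cached: key, mode, chaining mask."""
    def __init__(self, key, mode, iv=bytes(BLOCK)):
        self.key, self.mode, self.mask = key, mode, iv

def encrypt_block(ctx: StreamCtx, plain: bytes) -> bytes:
    if ctx.mode == "ECB":
        return toy_block_cipher(plain, ctx.key)
    if ctx.mode == "CBC":
        cipher = toy_block_cipher(xor(plain, ctx.mask), ctx.key)
        ctx.mask = cipher                  # update cached mask for the next block
        return cipher
    raise ValueError("unsupported mode in this sketch")

def interleave(streams):
    """streams: list of (StreamCtx, [plaintext blocks]); process them round-robin."""
    out = [[] for _ in streams]
    for blocks in zip_longest(*[s[1] for s in streams]):
        for i, blk in enumerate(blocks):
            if blk is not None:
                out[i].append(encrypt_block(streams[i][0], blk))
    return out

if __name__ == "__main__":
    s0 = (StreamCtx(bytes(range(16)), "CBC"), [b"A" * 16, b"B" * 16])
    s1 = (StreamCtx(bytes(16), "ECB"), [b"C" * 16])
    print([[c.hex() for c in stream] for stream in interleave([s0, s1])])
```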

56 citations


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations (86% related)
Scalability: 50.9K papers, 931.6K citations (85% related)
Server: 79.5K papers, 1.4M citations (82% related)
Electronic circuit: 114.2K papers, 971.5K citations (82% related)
CMOS: 81.3K papers, 1.1M citations (81% related)
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2022    18
2021    1,066
2020    1,556
2019    1,793
2018    1,754
2017    1,548