Topic

Pipeline (computing)

About: Pipeline (computing) is a research topic. Over the lifetime, 26,760 publications have been published within this topic, receiving 204,305 citations. The topic is also known as: data pipeline & computational pipeline.


Papers
Proceedings ArticleDOI
04 Dec 2004
TL;DR: Proposes the dynamic detection and speculative execution of instruction strands (linear chains of dependent instructions without intermediate fan-out), which ease the IPC penalties of multicycle issue and bypass by reducing dependency pressure, providing opportunity for clock-frequency gains.
Abstract: In the modern era of wire-dominated architectures, specific effort must be made to reduce needless communication within out-of-order pipelines while still maintaining binary compatibility. To ease pressure on highly-connected elements such as the issue logic and bypass network, we propose the dynamic detection and speculative execution of instruction strands: linear chains of dependent instructions without intermediate fan-out. The hardware required for detecting these chains is simple and resides off the critical path of the pipeline, and the execution targets are the normal ALUs with a self-bypass mode. By collapsing these strings of dependencies into atomic macro-instructions, the efficiency of the issue queue and reorder buffer can be increased. Our results show that over 25% of all dynamic ALU instructions can be grouped, decreasing both the average reorder buffer occupancy and issue queue occupancy by over a third. Additionally, these strands have several properties which make them amenable to simple performance optimizations. Our experiments show average IPC increases of 17% on a four-wide machine and 20% on an eight-wide machine in Spec2000int and Mediabench applications. Finally, strands ease the IPC penalties of multicycle issue and bypass by reducing dependency pressures, providing opportunity for clock frequency gains as well.
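The grouping rule the abstract describes (chains of dependent instructions whose intermediate results have fan-out 1) can be sketched as a single pass over a dynamic instruction stream. This is an illustrative software model, not the paper's hardware detector; the `Instr` record and the greedy extend-the-tail policy are assumptions.

```python
from collections import namedtuple

Instr = namedtuple("Instr", ["id", "dest", "srcs"])

def find_strands(instrs):
    """Greedily group instructions into strands: linear chains of
    dependent instructions whose intermediate results have fan-out 1."""
    producers = {}   # register -> id of the last instruction writing it
    consumers = {}   # producer id -> ids of instructions reading its result
    dep = {}         # instr id -> producer ids it depends on
    for ins in instrs:
        dep[ins.id] = [producers[r] for r in ins.srcs if r in producers]
        for p in dep[ins.id]:
            consumers.setdefault(p, []).append(ins.id)
        producers[ins.dest] = ins.id

    strands, tail_of = [], {}   # tail_of: current strand ending in this instr
    for ins in instrs:
        ps = dep[ins.id]
        # Extend a strand only if we read exactly one in-flight value,
        # its producer is a strand tail, and that value has fan-out 1.
        if len(ps) == 1 and ps[0] in tail_of and len(consumers.get(ps[0], [])) == 1:
            s = tail_of.pop(ps[0])
            s.append(ins.id)
        else:
            s = [ins.id]
            strands.append(s)
        tail_of[ins.id] = s
    return strands
```

In a small example, a three-instruction dependent chain collapses into one strand (one "atomic macro-instruction" in the paper's terms), while an instruction that reads an already-consumed or multiply-consumed value starts a new strand.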

86 citations

Proceedings ArticleDOI
Yoonjin Kim, Mary Kiemb, Chulsoo Park, Jinyong Jung, Kiyoung Choi
07 Mar 2005
TL;DR: Suggests a reconfigurable array architecture template and a design space exploration flow for domain-specific optimization; experimental results show that this approach is much more efficient, in both performance and area, compared to existing reconfigurable array architectures.
Abstract: Coarse-grained reconfigurable architectures aim to achieve goals of both high performance and flexibility. However, existing reconfigurable array architectures require many resources without considering the specific application domain. Functional resources that take long latency and/or large area can be pipelined and/or shared among the processing elements. Therefore, the hardware cost and the delay can be effectively reduced without any performance degradation for some application domains. We suggest such a reconfigurable array architecture template and a design space exploration flow for domain-specific optimization. Experimental results show that our approach is much more efficient, in both performance and area, compared to existing reconfigurable architectures.

86 citations

Patent
25 Mar 1974
TL;DR: Control for overlapping instruction execution is provided by stepping a sequence of instructions through a plurality of registers connected in cascade and separately decoding each instruction in a register for control of a corresponding stage in one or more data processing paths, each comprising stages through which the data being processed is stepped, each stage corresponding to only one register of the control pipeline.
Abstract: Control for overlapping instruction execution in an arithmetic unit is provided by stepping a sequence of instructions through a plurality of registers connected in cascade and separately decoding each instruction in a register for control of a corresponding stage in one or more data processing paths, each comprising stages through which data being processed is stepped, each stage corresponding to only one register of the control pipeline. The output of the decoder of each instruction register controls the required operations in the corresponding stage of the data pipeline. Automatically indexed indirect addressing is provided by use of pointers for data sources and destinations as required in the execution of every instruction in order to facilitate highly iterative and structured operations on blocks or arrays of data.
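The control scheme in the abstract above (a cascade of instruction registers whose per-register decoders each drive one matching data-path stage) can be modeled in a few lines. This is a hedged behavioral sketch: `run_pipeline`, the per-stage functions, and the one-instruction-per-register framing are illustrative assumptions, not the patent's circuitry.

```python
def run_pipeline(program, stages):
    """Instructions step through a cascade of control registers; the
    decoder of register i drives data-path stage i. stages[i] stands in
    for that decoded control: a function (instr, datum_in) -> datum_out."""
    depth = len(stages)
    ctrl = [None] * depth   # control registers, one per stage
    data = [None] * depth   # matching data-path registers
    results = []
    for tick in range(len(program) + depth):
        # Retire whatever occupies the final stage.
        if ctrl[-1] is not None:
            results.append(data[-1])
        # Shift the cascade, deepest stage first: one stage per clock.
        for i in range(depth - 1, 0, -1):
            ctrl[i] = ctrl[i - 1]
            data[i] = stages[i](ctrl[i], data[i - 1]) if ctrl[i] is not None else None
        instr = program[tick] if tick < len(program) else None  # trailing bubbles drain the pipe
        ctrl[0] = instr
        data[0] = stages[0](instr, None) if instr is not None else None
    return results

# Example: stage 0 latches the operand, stage 1 adds 10, stage 2 passes through.
stages = [lambda ins, d: ins, lambda ins, d: d + 10, lambda ins, d: d]
assert run_pipeline([1, 2, 3], stages) == [11, 12, 13]
```

The key property the model preserves is the patent's one-to-one correspondence: at any clock, the instruction in control register i is exactly the one whose data occupies data-path stage i.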

86 citations

Proceedings ArticleDOI
Allan M. Hartstein, Thomas R. Puzak
03 Dec 2003
TL;DR: The theory shows that the more important power is to the metric, the shorter the optimum pipeline length that results; as dynamic power grows, the optimal design point shifts to shorter pipelines, while clock gating pushes the optimum to deeper pipelines.
Abstract: The impact of pipeline length on both the power and performance of a microprocessor is explored both theoretically and by simulation. A theory is presented for a wide range of power/performance metrics, BIPS^m/W. The theory shows that the more important power is to the metric, the shorter the optimum pipeline length that results. For typical parameters neither BIPS/W nor BIPS^2/W yields an optimum, i.e., a non-pipelined design is optimal. For BIPS^3/W the optimum, averaged over all 55 workloads studied, occurs at a 22.5 FO4 design point, a 7-stage pipeline, but this value is highly dependent on the assumed growth in latch count with pipeline depth. As dynamic power grows, the optimal design point shifts to shorter pipelines. Clock gating pushes the optimum to deeper pipelines. Surprisingly, as leakage power grows, the optimum is also found to shift to deeper pipelines. The optimum pipeline depth varies for different classes of workloads: SPEC95 and SPEC2000 integer applications, traditional (legacy) database and on-line transaction processing applications, modern (e.g. Web) applications, and floating point applications.

86 citations

Journal ArticleDOI
01 Mar 2012
TL;DR: Shows that interval type-2 fuzzy inference systems (IT2 FIS) can be used in applications that require high-speed processing, and that the iterative Karnik-Mendel (KM) method can be efficient if it is adequately implemented using the appropriate combination of hardware and software.
Abstract: The main goal of this paper is to show that interval type-2 fuzzy inference systems (IT2 FIS) can be used in applications that require high-speed processing. This is an important issue, since the use of IT2 FIS is still controversial for several reasons, one of the most important being the dramatic increase in computational complexity that type reducers, like the Karnik-Mendel (KM) iterative method, can cause even for small systems. Comparing our results against a typical implementation of an IT2 FIS in a high-level language on a computer, we show that in a hardware implementation the whole IT2 FIS (fuzzification, inference engine, type reducer and defuzzification) takes only four clock cycles; speedups of nearly 225,000 and 450,000 can be obtained for the Spartan 3 and Virtex 5 Field Programmable Gate Arrays (FPGAs), respectively. The proposal is also suitable for pipelined implementation, so the complete IT2 process can be obtained in just one clock cycle, with consequent speedups of 900,000 and 2,400,000 for the aforementioned FPGAs. This paper also shows that the iterative KM method can be efficient if it is adequately implemented using the appropriate combination of hardware and software. Comparative experiments of control surfaces and time response in the control of a real plant, using the IT2 FIS implemented on a computer versus the IT2 FIS on an FPGA, are shown.
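For reference, the Karnik-Mendel type reducer the abstract centers on is a short fixed-point iteration over a "switch point"; the cost the paper attacks comes from running it on every inference. A minimal sketch of one endpoint of the type-reduced interval follows; the function name, midpoint initialization, and tolerance are implementation choices, not the paper's FPGA design.

```python
def km_endpoint(x, lo, up, left=True, max_iter=100, tol=1e-12):
    """One endpoint of the Karnik-Mendel type-reduced interval.
    x: sorted domain points; lo/up: lower and upper membership grades
    of the interval type-2 fuzzy set (all equal-length sequences)."""
    # Start from the midpoint firing strengths.
    f = [(l + u) / 2.0 for l, u in zip(lo, up)]
    y = sum(fi * xi for fi, xi in zip(f, x)) / sum(f)
    for _ in range(max_iter):
        # Switch point k: largest index with x[k] <= y, clamped so
        # both sides of the switch stay non-empty.
        k = min(max(sum(1 for xi in x if xi <= y) - 1, 0), len(x) - 2)
        if left:   # minimize y: upper grades on the small-x side
            f = list(up[:k + 1]) + list(lo[k + 1:])
        else:      # maximize y: upper grades on the large-x side
            f = list(lo[:k + 1]) + list(up[k + 1:])
        y_new = sum(fi * xi for fi, xi in zip(f, x)) / sum(f)
        if abs(y_new - y) < tol:
            return y_new
        y = y_new
    return y
```

On a symmetric toy set (x = [1, 2, 3], grades in [0.4, 0.8]) the two endpoints come out symmetric about the crisp centroid, which is a quick sanity check for any reimplementation. The iteration usually converges in two or three passes, which is why a fixed small cycle budget in hardware is plausible.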

86 citations


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations (86% related)
Scalability: 50.9K papers, 931.6K citations (85% related)
Server: 79.5K papers, 1.4M citations (82% related)
Electronic circuit: 114.2K papers, 971.5K citations (82% related)
CMOS: 81.3K papers, 1.1M citations (81% related)
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2022    18
2021    1,066
2020    1,556
2019    1,793
2018    1,754
2017    1,548