Topic

Pipeline (computing)

About: Pipeline (computing) is a research topic. Over its lifetime, 26,760 publications have been published on this topic, receiving 204,305 citations. The topic is also known as: data pipeline & computational pipeline.


Papers
Proceedings ArticleDOI
01 Jun 1987
TL;DR: This paper presents Branch Folding, a method of implementing branch instructions that can reduce the apparent number of instructions needed to execute a program by the number of branches in that program, while also reducing or eliminating pipeline breakage.
Abstract: A new method of implementing branch instructions is presented. This technique has been implemented in the CRISP Microprocessor. With a combination of hardware and software techniques the execution time cost for many branches can be effectively reduced to zero. Branches are folded into other instructions, making their execution as separate instructions unnecessary. Branch Folding can reduce the apparent number of instructions needed to execute a program by the number of branches in that program, as well as reducing or eliminating pipeline breakage. Statistics are presented demonstrating the effectiveness of Branch Folding and associated techniques used in the CRISP Microprocessor.

129 citations
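
The folding idea in the abstract above can be illustrated with a toy model. This is a minimal sketch, not the CRISP implementation: it assumes a decoded-instruction form with an explicit next-address field, handles only unconditional jumps, and uses a hypothetical three-opcode ISA (`add`, `jmp`, `halt`). All names (`Instr`, `Decoded`, `decode_folded`, `run`) are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instr:
    op: str                        # "add", "jmp", or "halt" (hypothetical toy ISA)
    target: Optional[int] = None   # jump target index, used only by "jmp"


@dataclass
class Decoded:
    op: str
    next_pc: int                   # assumed explicit next-address field


def decode_plain(program: List[Instr]) -> Dict[int, Decoded]:
    """Every instruction, including each jump, occupies its own execution slot."""
    return {
        pc: Decoded(ins.op, ins.target if ins.op == "jmp" else pc + 1)
        for pc, ins in enumerate(program)
    }


def decode_folded(program: List[Instr]) -> Dict[int, Decoded]:
    """Fold a jump into the non-jump instruction that precedes it."""
    decoded = {}
    for pc, ins in enumerate(program):
        if ins.op == "jmp":
            # Kept so the jump still works if it is reached as a branch target.
            decoded[pc] = Decoded("jmp", ins.target)
            continue
        nxt = pc + 1
        if nxt < len(program) and program[nxt].op == "jmp":
            nxt = program[nxt].target   # predecessor now points past the jump
        decoded[pc] = Decoded(ins.op, nxt)
    return decoded


def run(decoded: Dict[int, Decoded], max_steps: int = 1000) -> int:
    """Count the instructions executed from address 0 until "halt"."""
    pc, executed = 0, 0
    while executed < max_steps:
        ins = decoded[pc]
        executed += 1
        if ins.op == "halt":
            break
        pc = ins.next_pc
    return executed


PROGRAM = [
    Instr("add"), Instr("jmp", 3), Instr("add"),
    Instr("add"), Instr("jmp", 6), Instr("add"), Instr("halt"),
]

plain_count = run(decode_plain(PROGRAM))
folded_count = run(decode_folded(PROGRAM))
```

In this toy program the folded version executes two fewer instructions than the plain one, which equals the number of jumps on the executed path. Conditional branches, which the paper also addresses, are outside the scope of this sketch.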

Proceedings ArticleDOI
04 Jun 2011
TL;DR: This work develops FabScalar, a toolset for automatically composing the synthesizable register-transfer-level (RTL) designs of arbitrary cores within a canonical superscalar template, which defines canonical pipeline stages and the interfaces among them.
Abstract: A growing body of work has compiled a strong case for the single-ISA heterogeneous multi-core paradigm. A single-ISA heterogeneous multi-core provides multiple, differently-designed superscalar core types that can streamline the execution of diverse programs and program phases. No prior research has addressed the Achilles' heel of this paradigm: design and verification effort is multiplied by the number of different core types. This work frames superscalar processors in a canonical form, so that it becomes feasible to quickly design many cores that differ in the three major superscalar dimensions: superscalar width, pipeline depth, and sizes of structures for extracting instruction-level parallelism (ILP). From this idea, we develop a toolset, called FabScalar, for automatically composing the synthesizable register-transfer-level (RTL) designs of arbitrary cores within a canonical superscalar template. The template defines canonical pipeline stages and interfaces among them. A Canonical Pipeline Stage Library (CPSL) provides many implementations of each canonical pipeline stage that differ in their superscalar width and depth of sub-pipelining. An RTL generation tool uses the template and CPSL to automatically generate an overall core of desired configuration. Validation experiments are performed along three fronts to evaluate the quality of RTL designs generated by FabScalar: functional and performance (instructions-per-cycle (IPC)) validation, timing validation (cycle time), and confirmation of suitability for standard ASIC flows. With FabScalar, a chip with many different superscalar core types is conceivable.

128 citations

Proceedings ArticleDOI
07 Sep 2017
TL;DR: A novel, accurate tightly-coupled visual-inertial odometry pipeline for event cameras that leverages their outstanding properties to estimate the camera ego-motion in challenging conditions, such as high-speed motion or high dynamic range scenes.
Abstract: Event cameras are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. They offer significant advantages over standard cameras, namely a very high dynamic range, no motion blur, and a latency in the order of microseconds. We propose a novel, accurate tightly-coupled visual-inertial odometry pipeline for such cameras that leverages their outstanding properties to estimate the camera ego-motion in challenging conditions, such as high-speed motion or high dynamic range scenes. The method tracks a set of features (extracted on the image plane) through time. To achieve that, we consider events in overlapping spatio-temporal windows and align them using the current camera motion and scene structure, yielding motion-compensated event frames. We then combine these feature tracks in a keyframe-based, visual-inertial odometry algorithm based on nonlinear optimization to estimate the camera’s 6-DOF pose, velocity, and IMU biases. The proposed method is evaluated quantitatively on the public Event Camera Dataset [19] and significantly outperforms the state-of-the-art [28], while being computationally much more efficient: our pipeline can run much faster than real-time on a laptop and even on a smartphone processor. Furthermore, we demonstrate qualitatively the accuracy and robustness of our pipeline on a large-scale dataset, and an extremely high-speed dataset recorded by spinning an event camera on a leash at 850 deg/s.

128 citations

Proceedings ArticleDOI
31 Jan 1998
TL;DR: A novel dynamic register renaming approach that delays the allocation of physical registers until a late stage in the pipeline, instead of doing it in the decode stage as conventional schemes do, which reduces register pressure and lets the processor exploit more instruction-level parallelism.
Abstract: A novel dynamic register renaming approach is proposed in this work. The key idea of the novel scheme is to delay the allocation of physical registers until a late stage in the pipeline, instead of doing it in the decode stage as conventional schemes do. In this way, the register pressure is reduced and the processor can exploit more instruction-level parallelism. Delaying the allocation of physical registers requires some additional artifact to keep track of dependences. This is achieved by introducing the concept of virtual-physical registers, which do not require any storage location and are used to identify dependences among instructions that have not yet allocated a register to their destination operand. Two alternative allocation strategies have been investigated that differ in the stage where physical registers are allocated: issue or write-back. The experimental evaluation has confirmed the higher performance of the latter alternative. We have evaluated the novel scheme through a detailed simulation of a dynamically scheduled processor. The results show a significant improvement (e.g., 19% increase in IPC for a machine with 64 physical registers in each file) when compared with the traditional register renaming approach.

128 citations
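
The virtual-physical register idea in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's design: tags are plain counters with no storage, physical registers are allocated at write-back, and freeing is an explicit call. Commit logic, misprediction recovery, and deadlock avoidance are omitted, and the class and method names (`LateAllocRenamer`, `decode`, `write_back`, `release`) are hypothetical.

```python
from itertools import count


class LateAllocRenamer:
    """Toy renamer that allocates physical registers at write-back, not decode."""

    def __init__(self, num_physical):
        self.free = list(range(num_physical))  # free physical registers
        self.next_tag = count()                # virtual-physical tags: names only
        self.map_table = {}                    # logical register -> latest tag
        self.tag_to_phys = {}                  # tag -> physical register

    def decode(self, dest, srcs):
        """Rename sources to producer tags and give dest a fresh tag.

        No physical register is consumed here; the tag alone tracks dependences.
        A source with no in-flight producer is reported as None.
        """
        src_tags = [self.map_table.get(s) for s in srcs]
        tag = next(self.next_tag)
        self.map_table[dest] = tag
        return tag, src_tags

    def write_back(self, tag):
        """Allocate a physical register for a completed result.

        Returns None when no register is free, so the caller must retry later.
        """
        if not self.free:
            return None
        phys = self.free.pop(0)
        self.tag_to_phys[tag] = phys
        return phys

    def release(self, tag):
        """Return the physical register bound to `tag` to the free list."""
        self.free.append(self.tag_to_phys.pop(tag))


# Usage: three instructions are renamed with only one physical register available.
renamer = LateAllocRenamer(num_physical=1)
t0, s0 = renamer.decode("r1", [])
t1, s1 = renamer.decode("r2", ["r1"])
t2, s2 = renamer.decode("r1", ["r2"])
free_after_decode = len(renamer.free)
```

Decoding three instructions leaves the single physical register untouched, which is the register-pressure benefit the abstract describes. A conventional decode-time allocator would have stalled at the second instruction in this setup.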

Patent
10 Mar 1986
TL;DR: A cellular processing system for analyzing an image comprising a matrix of points employs an image memory for storing digital data signals representative of each of the points, a plurality of special function processing units, each adapted to perform a specific operation on one or more images, and data bus means for selectively distributing image data from the image memory to preselected function processors for processing in a cascaded fashion and returning the processed data signals back to image memory.
Abstract: A cellular processing system for analyzing an image comprising a matrix of points employs an image memory for storing digital data signals representative of each of the points, a plurality of special function processing units, each adapted to perform a specific operation on one or more images, and data bus means for selectively distributing image data from the image memory to one or more preselected function processors for processing in a cascaded fashion and returning the processed data signals back to image memory. The special function processing units include a pipeline processor employing one or more programmable, substantially identical neighborhood transformation stages and an image combiner including means for performing arithmetic, logical, and conditional operations on one or more images.

125 citations
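
The cascade of identical, programmable neighborhood transformation stages described in the patent abstract above can be sketched in software. This is an illustrative model only: the 3x3 window, zero padding at the borders, and the choice of binary dilation and erosion as stage programs are assumptions, and the function names are hypothetical.

```python
def neighborhood_stage(rule):
    """Build one stage; `rule` maps the 9 values of a 3x3 window to one output pixel."""
    def stage(image):
        height, width = len(image), len(image[0])

        def pixel(r, c):
            # Assumed border handling: pixels outside the image read as 0.
            return image[r][c] if 0 <= r < height and 0 <= c < width else 0

        return [
            [
                rule([pixel(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)])
                for c in range(width)
            ]
            for r in range(height)
        ]
    return stage


# The same stage structure, "programmed" with two different rules.
dilate = neighborhood_stage(max)   # output is 1 if any neighbor is 1
erode = neighborhood_stage(min)    # output is 1 only if all neighbors are 1


def run_pipeline(image, stages):
    """Pass the image through each stage in order, as in a cascaded pipeline."""
    for stage in stages:
        image = stage(image)
    return image


IMAGE = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

dilated = run_pipeline(IMAGE, [dilate])
closed = run_pipeline(IMAGE, [dilate, erode])
```

Here `dilated` grows the single set pixel into a 3x3 block, and the two-stage cascade in `closed` shrinks it back to the original image. In hardware the stages would operate concurrently on a stream of pixels; this sequential version only shows the data flow.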


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations, 86% related
Scalability: 50.9K papers, 931.6K citations, 85% related
Server: 79.5K papers, 1.4M citations, 82% related
Electronic circuit: 114.2K papers, 971.5K citations, 82% related
CMOS: 81.3K papers, 1.1M citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    18
2021    1,066
2020    1,556
2019    1,793
2018    1,754
2017    1,548