Topic
Pipeline (computing)
About: Pipeline (computing) is a research topic. Over the lifetime, 26,760 publications have been published within this topic, receiving 204,305 citations. The topic is also known as: data pipeline & computational pipeline.
Papers published on a yearly basis
Papers
TL;DR: A recurrent neural network (RNN) accelerator design with resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architecture distinguished from prior ReRAM-based convolutional neural network accelerators is presented.
Abstract: We present a recurrent neural network (RNN) accelerator design with resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architecture. Distinguished from prior ReRAM-based convolutional neural network accelerators, we redesign the system to make it suitable for RNN acceleration. We measure the system throughput and energy efficiency with detailed circuit and device characterization. Our design enables reprogrammability, and an RNN-friendly pipeline is employed to increase system throughput. We observe that, on average, the proposed system achieves a 79× improvement in computing efficiency over a graphics processing unit baseline. Our simulation also indicates that to maintain high accuracy and computing efficiency, the read noise standard deviation should be less than 0.2, the device resistance should be at least 1 MΩ, and the device write latency should be minimized.
89 citations
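The read-noise constraint reported above can be made concrete with a small numeric sketch (a hypothetical model, not the paper's simulator; the weights, inputs, and trial count are invented for illustration): each ReRAM weight read is perturbed by Gaussian noise, and the average error of a dot product grows with the noise standard deviation.

```python
# Hypothetical model of ReRAM read noise (illustrative only): each weight
# read is perturbed by zero-mean Gaussian noise, so dot products -- the core
# RNN operation mapped onto the crossbar -- drift away from the exact value.
import random

random.seed(0)

w = [0.5, -0.3, 0.8]        # assumed stored weights
x = [1.0, 1.0, 1.0]         # assumed input vector
exact = sum(wi * xi for wi, xi in zip(w, x))

def noisy_dot(weights, inputs, noise_std):
    # every weight read picks up independent ReRAM read noise
    return sum((wi + random.gauss(0.0, noise_std)) * xi
               for wi, xi in zip(weights, inputs))

def mean_abs_error(noise_std, trials=2000):
    return sum(abs(noisy_dot(w, x, noise_std) - exact)
               for _ in range(trials)) / trials

# Errors in the paper's "safe" regime (std < 0.2) stay far smaller than
# errors well outside it.
print(mean_abs_error(0.05), mean_abs_error(1.0))
```

The gap between the two error figures is the intuition behind the paper's std < 0.2 requirement: past some noise level, the analog dot products no longer preserve RNN accuracy.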
26 Nov 1997
TL;DR: A method is presented for optimizing a program by inserting memory prefetch operations into a program executing on a computer system, where a program optimizer uses hardware-measured latencies to estimate the number of cycles that elapse before the data of a memory operation are available.
Abstract: A method is provided for optimizing a program by inserting memory prefetch operations in the program executing in a computer system. The computer system includes a processor and a memory. Latencies of instructions of the program are measured by hardware while the instructions are processed by a pipeline of the processor. Memory prefetch instructions are automatically inserted in the program based on the measured latencies to optimize execution of the program. The latencies measure the time from when a load instruction issues a request for data to the memory until the data are available in the processor. A program optimizer uses the measured latencies to estimate the number of cycles that elapse before data of a memory operation are available.
89 citations
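The latency-guided insertion described in this abstract can be sketched as a toy post-pass (the threshold, prefetch distance, and instruction encoding are assumptions for illustration, not the patent's specifics):

```python
# Toy post-pass that inserts prefetches using measured load latencies
# (THRESHOLD_CYCLES, PREFETCH_DISTANCE, and the tuple encoding are
# assumptions; the patent does not fix these values).
THRESHOLD_CYCLES = 20    # assumed: latencies above this suggest a cache miss
PREFETCH_DISTANCE = 4    # assumed lead distance, in instructions

def insert_prefetches(program, measured_latency):
    """program: list of (op, operand) tuples.
    measured_latency: dict mapping instruction index -> cycles from the
    load's request until its data were available in the processor."""
    out = list(program)
    # Walk original indices in reverse so an insertion never shifts a
    # position that is still to be processed.
    for i in range(len(program) - 1, -1, -1):
        op, operand = program[i]
        if op == "load" and measured_latency.get(i, 0) > THRESHOLD_CYCLES:
            pos = max(0, i - PREFETCH_DISTANCE)
            out.insert(pos, ("prefetch", operand))
    return out

prog = [("add", "r1"), ("mul", "r2"), ("load", "A"),
        ("add", "r3"), ("load", "B"), ("store", "C")]
latency = {2: 60, 4: 5}    # load A missed the cache; load B hit
optimized = insert_prefetches(prog, latency)
print(optimized[0])   # ('prefetch', 'A')
```

Only the load whose measured latency exceeds the threshold gets a prefetch hoisted ahead of it; the fast load is left alone, which mirrors the patent's idea of driving insertion from observed rather than predicted behavior.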
03 Mar 1998
TL;DR: In this paper, a processor includes an execution pipeline and a retire unit coupled to the end of the execution pipeline, with means for incrementing a register whenever an instruction is retired from the pipeline.
Abstract: A processor includes an execution pipeline and a retire unit coupled to an end of the execution pipeline. The processor executes instructions of a program. An apparatus for collecting performance data while the instructions are executing includes a register coupled to the retire unit of the processor. Means are provided for incrementing the register whenever an instruction is retired from the execution pipeline. In addition, the apparatus includes means for generating an interrupt to an interrupt handler whenever the register is incremented to a predetermined value.
88 citations
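A minimal software model of the apparatus above (the threshold value and the handler are stand-ins for the hardware's predetermined value and interrupt handler, chosen here for illustration):

```python
# Software model of the retire-count/interrupt apparatus (hypothetical
# threshold and handler; the real mechanism is a hardware register).
class RetireCounter:
    def __init__(self, threshold, handler):
        self.register = 0
        self.threshold = threshold     # the "predetermined value"
        self.handler = handler         # stands in for the interrupt handler

    def on_retire(self):
        """Called once per instruction retired from the pipeline."""
        self.register += 1
        if self.register == self.threshold:
            self.handler(self.register)   # raise the sampling interrupt
            self.register = 0             # rearm for the next window

interrupts = []
ctr = RetireCounter(threshold=3, handler=interrupts.append)
for _ in range(7):                     # retire seven instructions
    ctr.on_retire()
print(len(interrupts), ctr.register)   # 2 1
```

Periodically interrupting on retirement counts is the basis of sampling profilers: each interrupt gives the handler a chance to record where the program was, at a cost proportional to the threshold rather than to every instruction.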
06 Jun 2010
TL;DR: In this paper, a method for processing images for a first camera and a second camera of a mobile device using a shared pipeline is described, where the first set of images captured by the first camera of the mobile device is processed using a first configuration of the shared pipeline.
Abstract: Some embodiments provide a method of processing images for a first camera and a second camera of a mobile device using a shared pipeline. A method receives a first set of images captured by the first camera of the mobile device. The method processes the first set of images using a first configuration of the shared pipeline. The method also receives a second set of images captured by the second camera of the mobile device, and processes the second set of images using a second configuration of the shared pipeline different from the first configuration.
88 citations
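The single-pipeline, two-configurations idea can be sketched like this (the stage names and settings are invented for illustration; the patent does not specify them):

```python
# Hypothetical sketch of a shared image pipeline driven by per-camera
# configurations (stages and parameter values invented for illustration).
FRONT_CFG = {"scale": 0.5, "denoise": 2.0}   # first configuration
BACK_CFG  = {"scale": 1.0, "denoise": 1.0}   # second configuration

def run_pipeline(frames, cfg):
    """One pipeline body; behavior differs only via the configuration."""
    out = []
    for px in frames:                 # each 'frame' reduced to one brightness value
        px = px * cfg["scale"]        # scaling stage
        px = px - cfg["denoise"]      # toy noise-subtraction stage
        out.append(px)
    return out

front = run_pipeline([100.0, 120.0], FRONT_CFG)  # images from the first camera
back  = run_pipeline([100.0, 120.0], BACK_CFG)   # images from the second camera
print(front, back)   # [48.0, 58.0] [99.0, 119.0]
```

The design choice the abstract describes is visible here: one pipeline implementation serves both cameras, and only the configuration object is swapped, which avoids duplicating the processing hardware or code per camera.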
NEC
TL;DR: In this article, an instruction prefetching device is described for a data processing system that processes instructions under pipeline control in a plurality of stages including an executing stage; the device comprises a prediction checking circuit coupled to a predicting circuit and an instruction executing circuit, and a prefetch controlling circuit coupled to the predicting and checking circuits.
Abstract: In a data processing system capable of processing instructions under pipeline control in a plurality of stages including an executing stage, an instruction prefetching device comprises a prediction checking circuit (66, 67) coupled to a predicting circuit (52, 53) and an instruction executing circuit (32, 33, 37, 38), and a prefetch controlling circuit (47, 86) coupled to the predicting circuit and the checking circuit. In one of the stages prior to the executing stage, the checking circuit (66, 67) checks whether or not a prediction for a branch destination is correct. If the prediction is correct, prefetch continues according to the prediction. If the prediction is incorrect, prefetch continues according to a correct prediction, with the incorrect prediction corrected immediately after the executing stage. The prediction check may apply to an instruction other than a branch instruction, to either an unconditional branch instruction or a branch count instruction, to a branch destination address, or to a branch direction that becomes clear only after the executing stage.
88 citations
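A toy simulation of the checking/controlling interaction above (the instruction trace and the simple fall-through predictor are assumptions for illustration, not from the patent): the prefetch stream follows the prediction, and when the checking step flags a mismatch the stream is redirected to the correct target at once.

```python
# Toy model of early branch-prediction checking (illustrative; the trace
# and the "pc + 4" predictor are assumptions, not from the patent).
def simulate_prefetch(trace, predict):
    """trace: list of (pc, actual_target) pairs.
    predict: function pc -> predicted target.
    Returns (fetch_stream, mispredict_count)."""
    stream, mispredicts = [], 0
    for pc, actual in trace:
        guess = predict(pc)
        stream.append(guess)        # prefetch down the predicted path
        if guess != actual:         # checking circuit flags the mismatch
            mispredicts += 1
            stream[-1] = actual     # redirect prefetch to the correct target
    return stream, mispredicts

predict = lambda pc: pc + 4                      # fall-through predictor
trace = [(0x0, 0x4), (0x4, 0x8), (0x8, 0x20)]    # last entry is a taken branch
stream, mispredicts = simulate_prefetch(trace, predict)
print(stream, mispredicts)   # [4, 8, 32] 1
```

The value of checking the prediction in an early stage, as the abstract emphasizes, is exactly this: the wrong-path entry never propagates further down the prefetch stream before being replaced.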