Topic
Pipeline (computing)
About: Pipeline (computing) is a research topic. Over the lifetime, 26,760 publications have been published within this topic, receiving 204,305 citations. The topic is also known as: data pipeline & computational pipeline.
Papers published on a yearly basis
Papers
•
09 Apr 1990
TL;DR: In this paper, the authors describe a pipeline consisting of a multi-channel bi-directional video bus, a multi-channel bi-directional audio bus, and a digital interprocessor communications bus, where a software driver interconnects the multiple video and audio devices in different configurations.
Abstract: A system (10) has a pipeline (12) comprised of a multi-channel bi-directional video bus (14), multi-channel bi-directional audio bus (16), and a digital interprocessor communications bus (18). The pipeline (12) is equipped with a number of ports (20) where media controller (microprocessor) printed circuit cards (22) can be connected, thus providing a convenient method for connecting media devices (24) to the pipeline (12). In this manner, a media device's video input and output can be optionally connected to any of the video pipes (26) of the video bus (14). Similarly, the media device (24) audio inputs and outputs can be optionally connected to any of the audio bus (16) pipes (26). The switching is accomplished through a pair of analog multiplexers (28) whose connection options have been commanded by local microprocessor (30) resident on the media device microprocessor control board (22). The local microprocessor (30) receives instructions for the pipeline switch interconnections through the interprocessor serial communications bus (18 ). The pipeline (12) is constructed on a motherboard printed circuit board (32) that additionally contains a microprocessor (34) that serves as the local area network controller for the interprocessor communications. A software driver interconnects the multiple video and audio devices (24) in different configurations in response to user inputs to a host data processing system so that physical assignments of the device communications on the pipeline (12) are transparent to the user.
57 citations
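The software driver's role described above — mapping logical device connections onto physical pipes so that physical assignments stay transparent to the user — can be sketched as follows. This is a hypothetical illustration; all class and method names are invented, not taken from the patent:

```python
# Hypothetical sketch of a driver that allocates pipeline pipes to media
# devices. The real system commands analog multiplexers via a local
# microprocessor; here we only model the bookkeeping.

class PipelineDriver:
    def __init__(self, num_video_pipes, num_audio_pipes):
        self.free_video = list(range(num_video_pipes))
        self.free_audio = list(range(num_audio_pipes))
        self.routes = {}  # (src_device, dst_device) -> (bus, pipe)

    def connect(self, src, dst, bus):
        """Allocate a free pipe on the requested bus and record the route."""
        pool = self.free_video if bus == "video" else self.free_audio
        if not pool:
            raise RuntimeError(f"no free {bus} pipe")
        pipe = pool.pop(0)
        self.routes[(src, dst)] = (bus, pipe)
        return pipe  # the local microprocessor would now switch the muxes

    def disconnect(self, src, dst):
        """Release the pipe so it can be reassigned to another device pair."""
        bus, pipe = self.routes.pop((src, dst))
        pool = self.free_video if bus == "video" else self.free_audio
        pool.append(pipe)

driver = PipelineDriver(num_video_pipes=4, num_audio_pipes=4)
pipe = driver.connect("camera", "monitor", "video")
print(pipe)  # 0 — first free video pipe
```

The point of the abstraction is that a user asks for "camera to monitor" and never needs to know which physical pipe carries the signal.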
••
04 May 2015
TL;DR: This work uses an FPGA to design a deep learning accelerator; the accelerator focuses on the implementation of the prediction process, data access optimization, and pipeline structure, and achieves promising results.
Abstract: Recently, machine learning has been widely used in applications and cloud services. As an emerging field of machine learning, deep learning shows excellent ability in solving complex learning problems. To give users a better experience, high-performance implementations of deep learning applications are very important. As a common means of accelerating algorithms, FPGAs offer high performance, low power consumption, small size, and other desirable characteristics. We therefore use an FPGA to design a deep learning accelerator; the accelerator focuses on the implementation of the prediction process, data access optimization, and pipeline structure. Compared with a 2.3 GHz Core 2 CPU, our accelerator achieves promising results.
57 citations
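The benefit of the pipeline structure mentioned in the abstract can be shown with a simple back-of-envelope throughput model (my own illustration, not a calculation from the paper): with k single-cycle stages, n inputs complete in k + n - 1 cycles once the pipeline fills, instead of k*n cycles when each input runs all stages to completion before the next starts.

```python
# Toy cycle-count model of pipelined vs. sequential execution.
# Assumes k stages, each taking one cycle, and n independent inputs.

def sequential_cycles(n, k):
    """Each input passes through all k stages before the next begins."""
    return n * k

def pipelined_cycles(n, k):
    """Stages overlap: k cycles to fill the pipe, then one result per cycle."""
    return k + n - 1

print(sequential_cycles(1000, 4))  # 4000
print(pipelined_cycles(1000, 4))   # 1003
```

For large n the speedup approaches k, which is why deep pipelines are a standard technique in FPGA accelerator design.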
••
TL;DR: This paper designs an embedded face detection system for handheld digital cameras or camera phones that achieves about a 75-80% detection rate for group portraits, and proposes a hardware pipeline design for Haar-like feature calculation together with a system design exploiting several levels of parallelism.
57 citations
••
06 Nov 2004
TL;DR: A parallel visualization pipeline implemented at the Pittsburgh Supercomputing Center (PSC) for studying the largest earthquake simulation ever performed is presented, based on a parallel adaptive rendering algorithm coupled with a new parallel I/O strategy that effectively reduces interframe delay by dedicating some processors to I/O and preprocessing tasks.
Abstract: This paper presents a parallel visualization pipeline implemented at the Pittsburgh Supercomputing Center (PSC) for studying the largest earthquake simulation ever performed. The simulation employs 100 million hexahedral cells to model 3D seismic wave propagation of the 1994 Northridge earthquake. The time-varying dataset produced by the simulation requires terabytes of storage space. Our solution for visualizing such terascale simulations is based on a parallel adaptive rendering algorithm coupled with a new parallel I/O strategy which effectively reduces interframe delay by dedicating some processors to I/O and preprocessing tasks. In addition, a 2D vector field visualization method and a 3D enhancement technique are incorporated into the parallel visualization framework to help scientists better understand the wave propagation both on and under the ground surface. Our test results on the HP/Compaq AlphaServer operated at the PSC show that we can completely remove the I/O bottlenecks commonly present in time-varying data visualization. The high-performance visualization solution we provide to the scientists allows them to explore their data in the temporal, spatial, and variable domains at high resolution. The new high-resolution explorability, likely not available to most computational science groups, will help lead to many new insights.
57 citations
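The core idea behind the I/O strategy — dedicating some workers to reading and preprocessing so renderers never stall on disk — is a classic producer/consumer pipeline. A minimal thread-based sketch (my own illustration, not the PSC code, which uses dedicated processors rather than threads):

```python
# Minimal producer/consumer sketch: one worker prefetches frames (I/O),
# another renders them, connected by a bounded queue so the stages overlap.
import queue
import threading

def io_worker(frames, prefetched):
    """Dedicated I/O stage: read each timestep and hand it downstream."""
    for f in frames:
        data = f"data-for-{f}"  # stands in for reading a timestep from disk
        prefetched.put((f, data))
    prefetched.put(None)  # sentinel: no more frames

def render_worker(prefetched, rendered):
    """Rendering stage: consumes prefetched frames, never touches the disk."""
    while True:
        item = prefetched.get()
        if item is None:
            break
        frame, data = item
        rendered.append(f"rendered({data})")

prefetched = queue.Queue(maxsize=2)  # bounded buffer between the two stages
rendered = []
t = threading.Thread(target=io_worker, args=(range(4), prefetched))
t.start()
render_worker(prefetched, rendered)
t.join()
print(len(rendered))  # 4
```

Because the next frame is already in the buffer when rendering finishes, the interframe delay seen by the consumer approaches the render time alone rather than render time plus read time.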
••
30 Aug 1999
TL;DR: This paper describes the study of a new field programmable gate array architecture based on on-line arithmetic, dedicated to single chip implementation of numerical algorithms in low-power signal processing and digital control applications.
Abstract: This paper describes the study of a new field programmable gate array architecture based on on-line arithmetic. This architecture, called Field Programmable On-line oPerators (FPOP), is dedicated to single-chip implementation of numerical algorithms in low-power signal processing and digital control applications. FPOP is based on a reprogrammable array of on-line arithmetic operators. On-line arithmetic is a digit-serial arithmetic that processes the most significant digits first, using a redundant number system. The digit-level pipeline, the small number of communication wires between the operators, and the small size of the arithmetic operators lead to high-performance parallel computations. In FPOP, the basic elements are arithmetic operators such as adders, subtracters, multipliers, dividers, square-rooters, sine or cosine operators, and so on. An equation model is then sufficient to describe the mapping of the algorithm onto the circuit. The digit-serial communication mode also significantly reduces the necessary programmable routing resources compared to standard FPGAs.
57 citations