Topic

Pipeline (computing)

About: Pipeline (computing) is a research topic. Over its lifetime, 26,760 publications have been published within this topic, receiving 204,305 citations. The topic is also known as data pipeline and computational pipeline.


Papers
Journal Article
TL;DR: This work proposes a hierarchical model that combines a precedence graph model and a queuing network model to capture the intra-job synchronization constraints of MapReduce, and produces estimates of average job response time that deviate from measurements of a real setup by less than 15 %.
Abstract: MapReduce is a currently popular programming model to support parallel computations on large datasets. Among the several existing MapReduce implementations, Hadoop has attracted a lot of attention from both industry and research. In a Hadoop job, map and reduce tasks coordinate to produce a solution to the input problem, exhibiting precedence constraints and synchronization delays that are characteristic of a pipeline communication between maps (producers) and reduces (consumers). We here address the challenge of designing analytical models to estimate the performance of MapReduce workloads, notably Hadoop workloads, focusing particularly on the intra-job pipeline parallelism between map and reduce tasks belonging to the same job. We propose a hierarchical model that combines a precedence graph model and a queuing network model to capture the intra-job synchronization constraints. We first show how to build a precedence graph that represents the dependencies among multiple tasks of the same job. We then apply it jointly with an approximate Mean Value Analysis (aMVA) solution to predict mean job response time, throughput and resource utilization. We validate our solution against a queuing network simulator and a real setup in various scenarios, finding very close agreement in both cases. In particular, our model produces estimates of average job response time that deviate from measurements of a real setup by less than 15 %.

66 citations
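
The MapReduce abstract above relies on an approximate Mean Value Analysis (aMVA) solution. As a rough illustration of the underlying idea only, the sketch below implements the textbook exact MVA recursion for a closed, single-class queueing network. It is not the paper's approximate, precedence-aware variant, and the function names, station demands, and example values are assumptions chosen for illustration.

```python
from typing import List, Tuple


def exact_mva(
    demands: List[float], n_jobs: int, think_time: float = 0.0
) -> Tuple[float, float, List[float]]:
    """Exact MVA for a closed, single-class network of queueing stations.

    demands[k] is the total service demand (in seconds) of one job at
    station k. Returns (throughput, response_time, mean_queue_lengths)
    for a population of n_jobs circulating jobs.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    if any(d < 0 for d in demands) or think_time < 0:
        raise ValueError("demands and think_time must be non-negative")
    if sum(demands) + think_time == 0:
        raise ValueError("total demand plus think time must be positive")

    queue = [0.0] * len(demands)  # mean queue lengths with zero jobs
    throughput = 0.0
    response = 0.0
    for n in range(1, n_jobs + 1):
        # Arrival theorem: an arriving job sees the queue lengths of the
        # same network with one job fewer.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        # Little's law applied to the whole network.
        throughput = n / (think_time + response)
        # Little's law applied to each station.
        queue = [throughput * r for r in residence]
    return throughput, response, queue


def utilizations(demands: List[float], throughput: float) -> List[float]:
    """Utilization law: U_k = X * D_k for each station k."""
    return [throughput * d for d in demands]


if __name__ == "__main__":
    # Hypothetical two-station system (for example CPU and disk),
    # 1.0 s of demand per job at each station, two jobs in the system.
    x, r, q = exact_mva([1.0, 1.0], n_jobs=2)
    print(x, r, q, utilizations([1.0, 1.0], x))
```

With two balanced stations and two jobs, the recursion gives a mean response time of 3.0 s and a throughput of 2/3 jobs per second. Approximate variants replace the loop over populations with a fixed-point iteration, which trades accuracy for lower cost at large populations.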

Patent
18 Feb 1993
TL;DR: An improved method and apparatus are provided for performing parallel and pipeline processing of data sequences. The apparatus includes a plurality of memory circuits (200) and a plurality of data processors (204), where each data processor is constructed for parallel processing of the data sequences.
Abstract: Improved method and apparatus are provided for performing parallel and pipeline processing of data sequences (302). The apparatus includes a plurality of memory circuits (200) and a plurality of data processors wherein each data processor is constructed for parallel and pipeline processing of data sequences (302). Address controllers (202) are provided for routing data between the memory circuits (200) and the pixel processors (204). The address controllers (202) are capable of directly coupling any memory circuit (200) to any pixel processor (204) so that data may be simultaneously transferred from a plurality of memory circuits (200) to a plurality of pixel processors (204). Further, the pixel processors (204) are provided with processing elements for performing data processing on neighboring data words of a data sequence. The address controller (202) is constructed for providing data from the memory circuits (200) in a plurality of sequences so that the data may be provided to the pixel processor (204) first and second times in respective first and second sequences to enable two dimensional processing of the data sequence. A feature processor (206) is provided for extracting specific information from the processed image data, relating to features of objects contained therein.

66 citations

Patent
09 Apr 1999
TL;DR: An apparatus for processing data has a Single-Instruction-Multiple-Data (SIMD) architecture and a number of features that improve performance and programmability.
Abstract: An apparatus for processing data has a Single-Instruction-Multiple-Data (SIMD) architecture, and a number of features that improve performance and programmability. The apparatus includes a rectangular array of processing elements and a controller. In one aspect, each of the processing elements includes one or more addressable storage means and other elements arranged in a pipelined architecture. The controller includes means for receiving a high level instruction, and converting each instruction into a sequence of one or more processing element microinstructions for simultaneously controlling each stage of the processing element pipeline. In doing so, the controller detects and resolves a number of resource conflicts, and automatically generates instructions for registering image operands that are skewed with respect to one another in the processing element array. In another aspect, a programmer references images via pointers to image descriptors that include the actual addresses of various bits of multi-bit data. Other features facilitate and speed up the movement of data into and out of the apparatus. 'Hit' detection and histogram logic are also included.

66 citations

Journal Article
TL;DR: This approach not only demonstrates the record 1 million mass resolution for lipid imaging from brain tissue, but also explicitly shows that such mass resolution is required to resolve the complexity of the lipidome.
Abstract: Desorption electrospray ionisation-mass spectrometry imaging (DESI-MSI) is a powerful imaging technique for the analysis of complex surfaces. However, the often highly complex nature of biological samples is particularly challenging for MSI approaches, as options to appropriately address molecular complexity are limited. Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) offers superior mass accuracy and mass resolving power, but its moderate throughput inhibits broader application. Here we demonstrate the dramatic gains in mass resolution and/or throughput of DESI-MSI on an FT-ICR MS by developing and implementing a sophisticated data acquisition and data processing pipeline. The presented pipeline integrates, for the first time, parallel ion accumulation and detection, post-processing absorption mode Fourier transform and pixel-by-pixel internal re-calibration. To achieve that, first, we developed and coupled an external high-performance data acquisition system to an FT-ICR MS instrument to record the time-domain signals (transients) in parallel with the instrument’s built-in electronics. The recorded transients were then processed by the in-house developed computationally-efficient data processing and data analysis software. Importantly, the described pipeline is shown to be applicable even to extremely large, up to 1 TB, imaging datasets. Overall, this approach provides improved analytical figures of merits such as: (i) enhanced mass resolution at no cost in experimental time; and (ii) up to 4-fold higher throughput while maintaining a constant mass resolution. Using this approach, we not only demonstrate the record 1 million mass resolution for lipid imaging from brain tissue, but explicitly show such mass resolution is required to resolve the complexity of the lipidome.

66 citations


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations, 86% related
Scalability: 50.9K papers, 931.6K citations, 85% related
Server: 79.5K papers, 1.4M citations, 82% related
Electronic circuit: 114.2K papers, 971.5K citations, 82% related
CMOS: 81.3K papers, 1.1M citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    18
2021    1,066
2020    1,556
2019    1,793
2018    1,754
2017    1,548