Topic

Pipeline (computing)

About: Pipeline (computing) is a research topic. Over its lifetime, 26,760 publications have been published on this topic, receiving 204,305 citations. The topic is also known as: data pipeline and computational pipeline.


Papers
Proceedings Article
15 Apr 2014
TL;DR: A new protocol called P3 (Practical Packet Pipeline) is proposed that keeps its packet pipeline flowing despite the quality differences among channels, and achieves a minimum goodput of about 149 Kbps, while PIP's goodput reduces to zero in 65% of the cases.
Abstract: While high throughput is the key for a number of important applications of sensor networks, performance of the state-of-the-art approach is often poor in practice. This is because if even one of the channels used in its pipeline is bad, the pipeline stalls and throughput degrades significantly. In this paper, we propose a new protocol called P3 (Practical Packet Pipeline) that keeps its packet pipeline flowing despite the quality differences among channels. P3 exploits sender and receiver diversities through synchronous transmissions (constructive interference), involving concurrent transmissions from multiple senders to multiple receivers at every stage of its packet pipeline. To optimize throughput further, P3 uses node grouping to enable the source to transmit in every pipeline cycle, thus fully utilizing the transmission capacity of an underlying radio. Our evaluation results on a 139-node testbed show that P3 achieves an average goodput of 178.5 Kbps while goodput of the state-of-the-art high throughput protocol PIP (Packets In Pipeline) is only 31 Kbps. More interestingly, P3 achieves a minimum goodput of about 149 Kbps, while PIP's goodput reduces to zero in 65% of the cases.
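The bottleneck argument above can be made concrete with a back-of-the-envelope model. The sketch below uses made-up channel qualities and a simplified stage-success model (it is not the P3 protocol itself); it only illustrates why the worst channel bounds a packet pipeline's goodput and how per-stage sender/receiver diversity raises that bound.

```python
# Illustrative sketch (not the P3 protocol): in a multi-hop packet pipeline the
# steady-state throughput is roughly bounded by the worst stage, so one bad
# channel stalls the whole pipeline; per-stage sender/receiver diversity raises
# that bound. Channel qualities below are made-up numbers.

def stage_success(prr: float, diversity: int) -> float:
    """Probability that at least one of `diversity` concurrent copies gets through."""
    return 1.0 - (1.0 - prr) ** diversity

def pipeline_bound(stage_prrs, diversity=1):
    """Approximate relative goodput: the pipeline moves at the pace of its worst stage."""
    return min(stage_success(p, diversity) for p in stage_prrs)

stages = [0.95, 0.90, 0.20, 0.92]           # one poor-quality channel in the pipeline
print(pipeline_bound(stages, diversity=1))  # ~0.20 -> pipeline effectively stalls
print(pipeline_bound(stages, diversity=2))  # ~0.36 -> diversity keeps it flowing
```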

43 citations

Proceedings Article
17 Feb 2021
TL;DR: DAPPLE is a synchronous training framework that combines data parallelism and pipeline parallelism for large DNN models; it features a novel parallelization strategy planner to solve the partition and placement problems.
Abstract: It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However, there are still several tricky issues to address: improving computing efficiency while ensuring convergence, and reducing memory usage without incurring additional computing costs. We propose DAPPLE, a synchronous training framework which combines data parallelism and pipeline parallelism for large DNN models. It features a novel parallelization strategy planner to solve the partition and placement problems, and explores the optimal hybrid strategies of data and pipeline parallelism. We also propose a new runtime scheduling algorithm to reduce device memory usage, which is orthogonal to the re-computation approach and does not come at the expense of training throughput. Experiments show that the DAPPLE planner consistently outperforms strategies generated by PipeDream's planner by up to 3.23× speedup under synchronous training scenarios, and the DAPPLE runtime outperforms GPipe with a 1.6× speedup in training throughput while saving 12% of memory consumption.
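To give a feel for the kind of search a parallelization strategy planner performs, here is a toy planner that enumerates hybrid data/pipeline configurations for a fixed device count. The cost model, constants (FWD_BWD_TIME, ALLREDUCE_TIME, the memory limit), and the memory constraint are invented assumptions for illustration; this is not DAPPLE's actual algorithm.

```python
# A toy hybrid-parallelism planner: enumerate (data-parallel replicas x pipeline
# stages) splits of the available devices and pick the one with the lowest
# estimated per-iteration time under a crude, made-up cost model.

DEVICES = 8
MODEL_LAYERS = 32
MAX_LAYERS_PER_DEVICE = 16   # large model: it does not fit on a single device
MICRO_BATCHES = 16
FWD_BWD_TIME = 100.0         # ms for one micro-batch through the whole model
ALLREDUCE_TIME = 30.0        # ms for gradient synchronisation across replicas

def iteration_time(replicas: int, stages: int) -> float:
    """Crude per-iteration time for `replicas` data-parallel pipelines of `stages` stages."""
    per_stage = FWD_BWD_TIME / stages                 # layers split evenly over stages
    micro = MICRO_BATCHES // replicas                 # micro-batches per pipeline replica
    pipeline = (micro + stages - 1) * per_stage       # GPipe-style fill/drain bubble included
    sync = ALLREDUCE_TIME if replicas > 1 else 0.0    # data-parallel gradient sync
    return pipeline + sync

candidates = [
    (r, DEVICES // r)
    for r in (1, 2, 4, 8)
    if DEVICES % r == 0 and MODEL_LAYERS / (DEVICES // r) <= MAX_LAYERS_PER_DEVICE
]
for r, s in candidates:
    print(f"{r} replicas x {s} stages -> {iteration_time(r, s):.1f} ms")
print("chosen strategy:", min(candidates, key=lambda rs: iteration_time(*rs)))
```

Under these toy numbers the pure data-parallel layout is ruled out by the memory constraint and the hybrid 4-replicas-by-2-stages layout wins, which is the kind of trade-off such a planner is meant to discover.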

43 citations

Proceedings Article
22 May 2021
TL;DR: In this paper, a syntax-aware Transformer architecture is used to rank the potential code completions offered by the IDE at the cursor position, and it outperforms previous state-of-the-art next-token prediction systems by margins ranging from 14% to 18%.
Abstract: Code prediction, more specifically autocomplete, has become an essential feature in modern IDEs. Autocomplete is more effective when the desired next token is at (or close to) the top of the list of potential completions offered by the IDE at cursor position. This is where the strength of the underlying machine learning system that produces a ranked order of potential completions comes into play. We advance the state-of-the-art in the accuracy of code prediction (next token prediction) used in autocomplete systems. Our work uses Transformers as the base neural architecture. We show that by making the Transformer architecture aware of the syntactic structure of code, we increase the margin by which a Transformer-based system outperforms previous systems. With this, it outperforms the accuracy of several state-of-the-art next token prediction systems by margins ranging from 14% to 18%. We present in the paper several ways of communicating the code structure to the Transformer, which is fundamentally built for processing sequence data. We provide a comprehensive experimental evaluation of our proposal, along with alternative design choices, on a standard Python dataset, as well as on a company internal Python corpus. Our code and data preparation pipeline will be available in open source.
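As a much simpler illustration of next-token ranking for autocomplete, the sketch below scores completion candidates with an off-the-shelf causal language model from Hugging Face (GPT-2 as a stand-in, and an arbitrary code prefix). It is not the syntax-aware Transformer described in the paper, only the generic ranking step.

```python
# Rank next-token candidates at the "cursor position" with a generic causal LM.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # stand-in for a code LM
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

code_prefix = "def read_file(path):\n    with open(path) as f:\n        return f."
inputs = tokenizer(code_prefix, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                    # (1, seq_len, vocab_size)

next_token_logits = logits[0, -1]                      # distribution over the next token
top = torch.topk(next_token_logits.softmax(dim=-1), k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r:>12}  p={prob.item():.3f}")
```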

43 citations

Journal Article
TL;DR: A method is presented to estimate the power and energy consumption of an algorithm directly from its C program, together with a method to choose the processor and its operating frequency so as to minimize the overall energy consumption.
Abstract: We present a method to estimate the power and energy consumption of an algorithm directly from the C program. Three models are involved: a model for the targeted processor (the power model), a model for the algorithm, and a model for the compiler (the prediction model). A functional-level power analysis is performed to obtain the power model. Five power models have been developed so far, for different architectures, from the simple RISC ARM7 to the very complex VLIW DSP TI C64. Important phenomena are taken into account, like cache misses, pipeline stalls, and internal/external memory accesses. The model for the algorithm expresses the algorithm's influence over the processor's activity. The prediction model represents the behavior of the compiler, and how it will allow the algorithm to use the processor's resources. The data mapping is considered at that stage. We have developed a tool, SoftExplorer, which performs estimation both at the C-level and the assembly level. Estimations are performed on real-life digital signal processing applications with average errors of 4.2% at the C-level and 1.8% at the assembly level. We present how SoftExplorer can be used to optimize the consumption of an application. We first show how to find the best data mapping for an algorithm. Then we demonstrate a method to choose the processor and its operating frequency in order to minimize the global energy consumption.
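The flavour of a functional-level energy estimate can be shown with a linear, activity-based model. The helper `estimate_energy` and all coefficients below are invented for illustration; they are not SoftExplorer's calibrated power models.

```python
# Simplified functional-level energy estimate: power is modelled as a base core
# power plus penalties for cache misses and external memory accesses, minus the
# power saved while the pipeline is stalled (hypothetical coefficients).

def estimate_energy(cycles, f_mhz, base_mw, cache_miss_rate, stall_rate,
                    ext_mem_rate, miss_mw=120.0, stall_mw=-40.0, ext_mem_mw=200.0):
    """Return (power in mW, energy in mJ) from algorithm-level activity parameters."""
    power_mw = (base_mw
                + miss_mw * cache_miss_rate
                + stall_mw * stall_rate
                + ext_mem_mw * ext_mem_rate)
    exec_time_s = cycles / (f_mhz * 1e6)
    energy_mj = power_mw * exec_time_s        # mW * s = mJ
    return power_mw, energy_mj

# Compare two candidate processor/frequency operating points for the same algorithm.
for name, cycles, f, base in [("DSP @ 200 MHz", 2.0e6, 200, 350.0),
                              ("RISC @ 60 MHz", 9.0e6, 60, 45.0)]:
    p, e = estimate_energy(cycles, f, base, cache_miss_rate=0.05,
                           stall_rate=0.15, ext_mem_rate=0.10)
    print(f"{name}: {p:.0f} mW, {e:.2f} mJ")
```

The last loop mirrors the use case described above: once the per-processor models are in place, choosing the processor and frequency that minimize energy is a matter of evaluating each operating point.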

43 citations

Journal Article
TL;DR: In this paper, an MILP continuous formulation for pipeline scheduling is presented, which reduces the problem size with respect to available models and increases the accuracy of pipeline scheduling.
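For readers unfamiliar with the modelling style, the toy model below shows what an MILP with continuous start-time variables and binary sequencing variables looks like for a single pipeline. The batch data are invented and the PuLP package is assumed to be installed; this is not the formulation proposed in the paper.

```python
# Tiny scheduling MILP: sequence product batches through one pipeline so that no
# two batches overlap, minimising the makespan. Continuous variables give start
# times; binary variables decide the pumping order (big-M disjunctions).
from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, value

durations = {"A": 4.0, "B": 2.5, "C": 3.0}   # hours each batch occupies the pipeline
jobs = list(durations)
M = sum(durations.values())                  # big-M for the disjunctive constraints

prob = LpProblem("pipeline_scheduling", LpMinimize)
start = {j: LpVariable(f"start_{j}", lowBound=0) for j in jobs}
makespan = LpVariable("makespan", lowBound=0)
prob += makespan                             # objective: finish all pumping runs early

for j in jobs:
    prob += makespan >= start[j] + durations[j]

# Either batch i is pumped before batch j, or j before i.
for i in jobs:
    for j in jobs:
        if i < j:
            y = LpVariable(f"before_{i}_{j}", cat=LpBinary)
            prob += start[i] + durations[i] <= start[j] + M * (1 - y)
            prob += start[j] + durations[j] <= start[i] + M * y

prob.solve()
for j in jobs:
    print(j, "starts at", value(start[j]))
print("makespan:", value(makespan))
```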

43 citations


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations, 86% related
Scalability: 50.9K papers, 931.6K citations, 85% related
Server: 79.5K papers, 1.4M citations, 82% related
Electronic circuit: 114.2K papers, 971.5K citations, 82% related
CMOS: 81.3K papers, 1.1M citations, 81% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2022    18
2021    1,066
2020    1,556
2019    1,793
2018    1,754
2017    1,548