Proceedings Article

Reducing Queuing Impact in Irregular Data Streaming Applications

TLDR
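This article studies streaming applications with irregular data flow, in which the number of outputs a node produces per input is data-dependent and unknown a priori. It gives a dynamic programming approach for deciding which pairs of successive nodes should have queues between them, and an approximation with an analytical solution for sizing those queues so as to minimize the frequency of switching between nodes.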
Abstract
Throughput-oriented streaming applications on massive data sets are a prime candidate for parallelization on wide-SIMD platforms, especially when inputs are independent of one another. Many such applications are represented as a pipeline of compute nodes connected by directed edges. Here, we study applications with irregular data flow, i.e., those where the number of outputs produced per input to a node is data-dependent and unknown a priori. Moreover, we target these applications to architectures (GPUs) where different nodes of the pipeline execute cooperatively on a single wide-SIMD processor. To promote greater SIMD parallelism, irregular application pipelines can utilize queues to gather and compact multiple data items between nodes. However, the decision to introduce a queue between two nodes must trade off benefits to occupancy against costs associated with queue reading, writing, and management. Moreover, once queues are introduced to an application, their relative sizes impact the frequency with which the application switches between nodes, incurring scheduling and context-switching overhead. This work examines two optimization problems associated with queues. First, we consider which pairs of successive nodes in a pipeline should have queues between them to maximize overall application throughput. Second, given a fixed total budget for queue space, we consider how to choose the relative sizes of inter-node queues to minimize the frequency of switching between nodes. We formulate a dynamic programming approach to the first problem and give an empirically useful approximation to the second that allows for an analytical solution. Finally, we validate our theoretical results using real-world irregular streaming computations.
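
The abstract does not give the cost model behind the queue-placement problem, so the following is only a minimal sketch of how a dynamic program over a linear pipeline might be organized. Everything in it is an assumption made for illustration: the function `place_queues`, the callbacks `segment_cost` and `queue_cost`, and the choice to minimize a total per-input cost as a stand-in for maximizing throughput. It is a generic chain-partitioning recurrence, not the authors' formulation.

```python
from typing import Callable, List, Tuple


def place_queues(
    n_nodes: int,
    segment_cost: Callable[[int, int], float],
    queue_cost: Callable[[int], float],
) -> Tuple[float, List[int]]:
    """Choose which edges of a linear pipeline get a queue.

    Nodes are numbered 0..n_nodes-1; edge e joins node e to node e+1.
    segment_cost(i, j): hypothetical cost of running nodes i..j (inclusive)
        back to back with no queue between them.
    queue_cost(e): hypothetical overhead of placing a queue on edge e.
    Returns (minimum total cost, list of edges that receive a queue).
    """
    # best[j] holds (cost, queued edges) for the prefix of nodes 0..j-1.
    best: List[Tuple[float, List[int]]] = [(0.0, [])]
    for j in range(1, n_nodes + 1):
        candidates = []
        for i in range(j):  # the last queue-free segment covers nodes i..j-1
            prev_cost, prev_edges = best[i]
            cost = prev_cost + segment_cost(i, j - 1)
            edges = prev_edges
            if i > 0:  # a queue sits on edge i-1, just before this segment
                cost += queue_cost(i - 1)
                edges = prev_edges + [i - 1]
            candidates.append((cost, edges))
        # Keep the cheapest way to cover the first j nodes.
        best.append(min(candidates, key=lambda c: c[0]))
    return best[n_nodes]


if __name__ == "__main__":
    # Toy cost model (assumed): running k nodes without a queue costs k**2,
    # standing in for lost SIMD occupancy; each queue costs a flat 3.0.
    cost, queued = place_queues(4, lambda i, j: (j - i + 1) ** 2, lambda e: 3.0)
    print(cost, queued)
```

With the toy costs above, the call returns `(11.0, [1])`: a single queue on the middle edge splits the four nodes into two pairs, which is cheaper than either no queues (16) or a queue on every edge (13). The recurrence examines every possible start of the last queue-free segment, so it runs in O(n^2) evaluations of `segment_cost`.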


Citations
Proceedings Article

To move or not to move?: page migration for irregular applications in over-subscribed GPU memory systems with DynaMap

TL;DR: DynaMap uses a compiler pass to instrument off-the-shelf CUDA UVM applications for spatial utilization tracking, dynamically sets a spatial utilization threshold to determine migration based on memory pressure and access characteristics, and enhances the current NVIDIA UVM driver to dynamically migrate pages from host memory to the GPU.
References
Journal Article

Basic Local Alignment Search Tool

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
Journal Article

Synchronous data flow

TL;DR: A preliminary SDF software system for automatically generating assembly language code for DSP microcomputers is described, and two new efficiency techniques are introduced, static buffering and an extension to SDF to efficiently implement conditionals.
Book Chapter

StreamIt: A Language for Streaming Applications

TL;DR: The StreamIt language provides novel high-level representations to improve programmer productivity and program robustness within the streaming domain, and the StreamIt compiler aims to improve the performance of streaming applications via stream-specific analyses and optimizations.
Proceedings Article

Optimal latency-throughput tradeoffs for data parallel pipelines

TL;DR: This paper presents a new algorithm to determine a processor mapping of a chain of tasks that optimizes the latency in the presence of throughput constraints, and addresses optimization of the throughput with latency constraints.
Journal Article

Complexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflows

TL;DR: A simplified model with no communication cost is considered, and an exhaustive list of complexity results for different problem instances is provided. Some instances of this simple model are shown to be NP-hard, thereby exposing the inherent complexity of the mapping problem.