
Showing papers by "Tatiana Shpeisman" published in 2018


Patent
14 Mar 2018
TL;DR: In this article, a general-purpose graphics processing unit is presented, comprising a streaming multiprocessor with a single instruction, multiple thread (SIMT) architecture that includes hardware multithreading.
Abstract: One embodiment provides for a general-purpose graphics processing unit comprising a streaming multiprocessor having a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The streaming multiprocessor comprises multiple processing blocks including multiple processing cores. The processing cores include independent integer and floating-point data paths that are configurable to concurrently execute multiple independent instructions. A memory is coupled with the multiple processing blocks.
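
As a rough illustration of the dual-datapath idea the abstract describes, the sketch below models a processing core that can issue one integer and one floating-point instruction in the same cycle because the two datapaths are independent. All names here (Instruction, Core, issue_cycle) are invented for illustration and do not come from the patent.

    # Minimal sketch, assuming a simple one-cycle issue model; names are hypothetical.
    from collections import namedtuple

    Instruction = namedtuple("Instruction", ["kind", "op"])  # kind: "int" or "fp"

    class Core:
        """A core with independent integer and floating-point datapaths."""
        def issue_cycle(self, pending):
            int_slot, fp_slot = None, None
            for inst in pending:
                if inst.kind == "int" and int_slot is None:
                    int_slot = inst          # integer path takes one instruction
                elif inst.kind == "fp" and fp_slot is None:
                    fp_slot = inst           # FP path takes another, same cycle
            return [i for i in (int_slot, fp_slot) if i is not None]

    core = Core()
    issued = core.issue_cycle([Instruction("int", "add"), Instruction("fp", "fmul")])
    # Both instructions issue concurrently on the independent datapaths.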

20 citations


Patent
18 Oct 2018
TL;DR: In this article, an apparatus to facilitate optimization of a convolutional neural network (CNN) is presented. It includes optimization logic to receive a CNN model having a list of instructions, and pruning logic to optimize that list by eliminating branches whose weight value is 0.
Abstract: An apparatus to facilitate optimization of a convolutional neural network (CNN) is disclosed. The apparatus includes optimization logic to receive a CNN model having a list of instructions and including pruning logic to optimize the list of instructions by eliminating branches in the list of instructions that comprise a weight value of 0.
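
A minimal sketch of the pruning idea, assuming the instruction list is a plain Python list of dicts (the patent does not specify a representation): any branch whose weight is 0 contributes nothing to the output and can be dropped.

    # Hypothetical zero-weight pruning pass over a CNN instruction list.
    def prune_zero_weights(instructions):
        """Keep only instructions whose weight is nonzero."""
        return [inst for inst in instructions if inst["weight"] != 0]

    model = [
        {"op": "mul_add", "weight": 0.5},
        {"op": "mul_add", "weight": 0},     # eliminated: multiplies by zero
        {"op": "mul_add", "weight": -1.2},
    ]
    optimized = prune_zero_weights(model)   # two instructions remain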

4 citations


Patent
18 Oct 2018
TL;DR: In this paper, a difference between a first training dataset to be used for a neural network and a second training dataset for the same network is detected, and the second dataset is authenticated in response to the detection.
Abstract: Methods and apparatus relating to autonomous vehicle neural network optimization techniques are described. In an embodiment, the difference between a first training dataset to be used for a neural network and a second training dataset to be used for the neural network is detected. The second training dataset is authenticated in response to the detection of the difference. The neural network is used to assist in autonomous vehicle driving. Other embodiments are also disclosed and claimed.
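
One way to read the claimed flow, sketched in Python (the hash and HMAC choices are assumptions, not specified by the patent): fingerprint both training datasets, and only when the fingerprints differ, authenticate the new dataset before the network trains on it.

    import hashlib
    import hmac

    # Hypothetical sketch: detect a dataset change, then authenticate the new data.
    def fingerprint(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def is_authentic(data: bytes, tag: bytes, key: bytes) -> bool:
        """Verify a MAC over the dataset with a pre-shared key."""
        expected = hmac.new(key, data, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)

    def check_before_training(first: bytes, second: bytes, tag: bytes, key: bytes):
        if fingerprint(first) != fingerprint(second):   # difference detected
            if not is_authentic(second, tag, key):      # authenticate on change
                raise ValueError("second training dataset failed authentication")
        # ...safe to train the driving-assistance network on `second`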

3 citations


Patent
25 Oct 2018
TL;DR: In this paper, the authors describe an apparatus comprising a compute engine with a high precision component and a low precision component, and logic, at least partially including hardware logic, to receive instructions in the compute engine and select at least one of the high precision component or the low precision component to execute them.
Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
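
A toy rendering of the selection-and-gating logic (component names and the selection policy are invented for illustration): route each instruction to the high or low precision component and gate the one that is not used.

    # Illustrative precision selection with gating; names are hypothetical.
    class Component:
        def __init__(self, name):
            self.name, self.gated = name, False
        def gate(self):
            self.gated = True            # power-gate the unused datapath
        def ungate(self):
            self.gated = False
        def run(self, inst):
            assert not self.gated
            return f"{self.name} executes {inst['op']}"

    def execute(instructions, high, low):
        for inst in instructions:
            active, idle = (high, low) if inst["fp32"] else (low, high)
            idle.gate()                  # gate the component not selected
            active.ungate()
            active.run(inst)

    execute([{"op": "matmul", "fp32": False}], Component("fp32"), Component("fp16"))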

2 citations


Journal ArticleDOI
TL;DR: This work proposes two novel schedulers with distinct goals: a device-affinity, contention-aware scheduler that incorporates instrumentation-driven optimizations to improve the throughput of running diverse applications on integrated CPU–GPU servers, and a specialized, affinity-aware work-stealing scheduler that efficiently distributes work across all CPU and GPU cores for the same application.
Abstract: Integrated GPU systems are a cost-effective and energy-efficient option for accelerating data-intensive applications. While these platforms have reduced the overhead of offloading computation to the GPU and potential for fine-grained resource scheduling, there remain several open challenges: (1) the distinct execution models inherent in the heterogeneous devices present on such platforms drive the need to dynamically match workload characteristics to the underlying resources, (2) the complex architecture and programming models of such systems require substantial application knowledge to achieve high performance, and (3) as such systems become prevalent, there is a need to extend their utility from running known regular data-parallel applications to the broader set of input-dependent, irregular applications common in enterprise settings. The key contribution of our research is to enable runtime specialization on such integrated GPU platforms by matching application characteristics to the underlying heterogeneous resources for both regular and irregular workloads. Our approach enables profile-driven resource management and optimizations for such platforms, providing high application performance and system throughput. Toward this end, this work proposes two novel schedulers with distinct goals: (a) a device-affinity, contention-aware scheduler that incorporates instrumentation-driven optimizations to improve the throughput of running diverse applications on integrated CPU–GPU servers, and (b) a specialized, affinity-aware work-stealing scheduler that efficiently distributes work across all CPU and GPU cores for the same application, taking into account both application characteristics and architectural differences of the underlying devices.
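
The first scheduler's core decision can be sketched as follows; the profile table, contention measure, and scoring rule are assumptions for illustration, not the paper's actual implementation. The idea is to pick the device with the best profiled affinity for a kernel, discounted by the contention currently on that device.

    # Hypothetical device-affinity, contention-aware placement decision.
    def pick_device(kernel, profiles, load):
        """profiles[kernel][dev]: profiled throughput; load[dev]: queued work."""
        def score(dev):
            return profiles[kernel][dev] / (1 + load[dev])  # discount contention
        return max(("cpu", "gpu"), key=score)

    profiles = {"spmv": {"cpu": 2.0, "gpu": 5.0}}
    load = {"cpu": 0, "gpu": 3}                 # GPU queue is contended
    print(pick_device("spmv", profiles, load))  # -> "cpu" despite GPU affinity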

1 citation


Patent
21 Jun 2018
TL;DR: In this article, a dynamic runtime scheduling system is presented that includes task manager circuitry capable of detecting a correspondence between at least a portion of the output arguments from one or more first tasks and at least a portion of the input arguments to one or more second tasks.
Abstract: A dynamic runtime scheduling system includes task manager circuitry capable of detecting a correspondence in at least a portion of the output arguments from one or more first tasks with at least a portion of the input arguments to one or more second tasks. Upon detecting that the output arguments from the first task represent a superset of the second task input arguments, the task manager circuitry apportions the first task into a plurality of new subtasks. At least one of the new subtasks includes output arguments having a 1:1 correspondence to the second task input arguments. Upon detecting that the output arguments from a first task represent a subset of the second task input arguments, the task manager circuitry may autonomously apportion the second task into a plurality of new subtasks. At least one of the new subtasks may include input arguments having a 1:1 correspondence to first task output arguments.
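
A schematic version of the apportioning rule in the superset case (the set-based representation is an assumption for illustration): split the first task so that one subtask's outputs correspond 1:1 to the second task's inputs, leaving the remainder in a second subtask. The subset case splits the second task symmetrically.

    # Hypothetical sketch of superset-driven task splitting.
    def apportion_first_task(first_outputs, second_inputs):
        """Split the first task's outputs so one subtask matches 1:1."""
        first_outputs, second_inputs = set(first_outputs), set(second_inputs)
        assert second_inputs <= first_outputs        # superset case applies
        matched = first_outputs & second_inputs      # 1:1 with second task inputs
        remainder = first_outputs - second_inputs    # rest of the first task
        return [matched, remainder]

    subtasks = apportion_first_task({"a", "b", "c"}, {"a", "b"})
    # -> [{"a", "b"}, {"c"}]; the first subtask feeds the second task directly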