
Showing papers by "Tatiana Shpeisman" published in 2018


Patent
14 Mar 2018
TL;DR: In this article, a general-purpose graphics processing unit is presented, comprising a streaming multiprocessor with a single instruction, multiple thread (SIMT) architecture that includes hardware multithreading.
Abstract: One embodiment provides for a general-purpose graphics processing unit comprising a streaming multiprocessor having a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The streaming multiprocessor comprises multiple processing blocks including multiple processing cores. The processing cores include independent integer and floating-point data paths that are configurable to concurrently execute multiple independent instructions. A memory is coupled with the multiple processing blocks.
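
As a rough illustration of the dual-datapath idea the abstract describes, the sketch below models a processing core that can issue one integer and one floating-point instruction in the same cycle because the two datapaths are independent. All names here (Instruction, Core, issue_cycle) are invented for illustration and do not come from the patent.

    # Minimal sketch, assuming a simple one-cycle issue model; names are hypothetical.
    from collections import namedtuple

    Instruction = namedtuple("Instruction", ["kind", "op"])  # kind: "int" or "fp"

    class Core:
        """A core with independent integer and floating-point datapaths."""
        def issue_cycle(self, pending):
            int_slot, fp_slot = None, None
            for inst in pending:
                if inst.kind == "int" and int_slot is None:
                    int_slot = inst          # integer path takes one instruction
                elif inst.kind == "fp" and fp_slot is None:
                    fp_slot = inst           # FP path takes another, same cycle
            return [i for i in (int_slot, fp_slot) if i is not None]

    core = Core()
    issued = core.issue_cycle([Instruction("int", "add"), Instruction("fp", "fmul")])
    # Both instructions issue concurrently on the independent datapaths.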

20 citations


Patent
18 Oct 2018
TL;DR: In this article, an apparatus to facilitate optimization of a convolutional neural network (CNN) is presented. It includes optimization logic to receive a CNN model having a list of instructions, and pruning logic to optimize that list by eliminating branches whose weight value is 0.
Abstract: An apparatus to facilitate optimization of a convolutional neural network (CNN) is disclosed. The apparatus includes optimization logic to receive a CNN model having a list of instructions and including pruning logic to optimize the list of instructions by eliminating branches in the list of instructions that comprise a weight value of 0.
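
A minimal sketch of the pruning idea, assuming the instruction list is a plain Python list of dicts (the patent does not specify a representation): any branch whose weight is 0 contributes nothing to the output and can be dropped.

    # Hypothetical zero-weight pruning pass over a CNN instruction list.
    def prune_zero_weights(instructions):
        """Keep only instructions whose weight is nonzero."""
        return [inst for inst in instructions if inst["weight"] != 0]

    model = [
        {"op": "mul_add", "weight": 0.5},
        {"op": "mul_add", "weight": 0},     # eliminated: multiplies by zero
        {"op": "mul_add", "weight": -1.2},
    ]
    optimized = prune_zero_weights(model)   # two instructions remain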

4 citations


Patent
18 Oct 2018
TL;DR: In this paper, a difference between a first training dataset to be used for a neural network and a second training dataset for the same network is detected, and the second dataset is authenticated in response to the detection.
Abstract: Methods and apparatus relating to autonomous vehicle neural network optimization techniques are described. In an embodiment, the difference between a first training dataset to be used for a neural network and a second training dataset to be used for the neural network is detected. The second training dataset is authenticated in response to the detection of the difference. The neural network is used to assist in autonomous vehicle driving. Other embodiments are also disclosed and claimed.
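
One way to read the claimed flow, sketched in Python (the hash and HMAC choices are assumptions, not specified by the patent): fingerprint both training datasets, and only when the fingerprints differ, authenticate the new dataset before the network trains on it.

    import hashlib
    import hmac

    # Hypothetical sketch: detect a dataset change, then authenticate the new data.
    def fingerprint(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def is_authentic(data: bytes, tag: bytes, key: bytes) -> bool:
        """Verify a MAC over the dataset with a pre-shared key."""
        expected = hmac.new(key, data, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)

    def check_before_training(first: bytes, second: bytes, tag: bytes, key: bytes):
        if fingerprint(first) != fingerprint(second):   # difference detected
            if not is_authentic(second, tag, key):      # authenticate on change
                raise ValueError("second training dataset failed authentication")
        # ...safe to train the driving-assistance network on `second`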

3 citations


Patent
25 Oct 2018
TL;DR: In this paper, the authors describe an apparatus comprising a compute engine with a high precision component and a low precision component, and logic, at least partially including hardware logic, to receive instructions in the compute engine and select at least one of the high precision component or the low precision component to execute them.
Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
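
A toy rendering of the selection-and-gating logic (component names and the selection policy are invented for illustration): route each instruction to the high or low precision component and gate the one that is not used.

    # Illustrative precision selection with gating; names are hypothetical.
    class Component:
        def __init__(self, name):
            self.name, self.gated = name, False
        def gate(self):
            self.gated = True            # power-gate the unused datapath
        def ungate(self):
            self.gated = False
        def run(self, inst):
            assert not self.gated
            return f"{self.name} executes {inst['op']}"

    def execute(instructions, high, low):
        for inst in instructions:
            active, idle = (high, low) if inst["fp32"] else (low, high)
            idle.gate()                  # gate the component not selected
            active.ungate()
            active.run(inst)

    execute([{"op": "matmul", "fp32": False}], Component("fp32"), Component("fp16"))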

2 citations


Journal ArticleDOI
TL;DR: This work proposes two novel schedulers with distinct goals: a device-affinity, contention-aware scheduler that incorporates instrumentation-driven optimizations to improve the throughput of running diverse applications on integrated CPU–GPU servers, and a specialized, affinity-aware work-stealing scheduler that efficiently distributes work across all CPU and GPU cores for the same application.
Abstract: Integrated GPU systems are a cost-effective and energy-efficient option for accelerating data-intensive applications. While these platforms have reduced the overhead of offloading computation to the GPU and potential for fine-grained resource scheduling, there remain several open challenges: (1) the distinct execution models inherent in the heterogeneous devices present on such platforms drive the need to dynamically match workload characteristics to the underlying resources, (2) the complex architecture and programming models of such systems require substantial application knowledge to achieve high performance, and (3) as such systems become prevalent, there is a need to extend their utility from running known regular data-parallel applications to the broader set of input-dependent, irregular applications common in enterprise settings. The key contribution of our research is to enable runtime specialization on such integrated GPU platforms by matching application characteristics to the underlying heterogeneous resources for both regular and irregular workloads. Our approach enables profile-driven resource management and optimizations for such platforms, providing high application performance and system throughput. Toward this end, this work proposes two novel schedulers with distinct goals: (a) a device-affinity, contention-aware scheduler that incorporates instrumentation-driven optimizations to improve the throughput of running diverse applications on integrated CPU–GPU servers, and (b) a specialized, affinity-aware work-stealing scheduler that efficiently distributes work across all CPU and GPU cores for the same application, taking into account both application characteristics and architectural differences of the underlying devices.
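
The first scheduler's core decision can be sketched as follows; the profile table, contention measure, and scoring rule are assumptions for illustration, not the paper's actual implementation. The idea is to pick the device with the best profiled affinity for a kernel, discounted by the contention currently on that device.

    # Hypothetical device-affinity, contention-aware placement decision.
    def pick_device(kernel, profiles, load):
        """profiles[kernel][dev]: profiled throughput; load[dev]: queued work."""
        def score(dev):
            return profiles[kernel][dev] / (1 + load[dev])  # discount contention
        return max(("cpu", "gpu"), key=score)

    profiles = {"spmv": {"cpu": 2.0, "gpu": 5.0}}
    load = {"cpu": 0, "gpu": 3}                 # GPU queue is contended
    print(pick_device("spmv", profiles, load))  # -> "cpu" despite GPU affinity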

1 citation


Patent
21 Jun 2018
TL;DR: In this article, a dynamic runtime scheduling system is presented that includes task manager circuitry capable of detecting a correspondence between at least a portion of the output arguments from one or more first tasks and at least a portion of the input arguments to one or more second tasks.
Abstract: A dynamic runtime scheduling system includes task manager circuitry capable of detecting a correspondence in at least a portion of the output arguments from one or more first tasks with at least a portion of the input arguments to one or more second tasks. Upon detecting that the output arguments from the first task represent a superset of the second task input arguments, the task manager circuitry apportions the first task into a plurality of new subtasks. At least one of the new subtasks includes output arguments having a 1:1 correspondence to the second task input arguments. Upon detecting that the output arguments from a first task represent a subset of the second task input arguments, the task manager circuitry may autonomously apportion the second task into a plurality of new subtasks. At least one of the new subtasks may include input arguments having a 1:1 correspondence to first task output arguments.
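
A schematic version of the apportioning rule in the superset case (the set-based representation is an assumption for illustration): split the first task so that one subtask's outputs correspond 1:1 to the second task's inputs, leaving the remainder in a second subtask. The subset case splits the second task symmetrically.

    # Hypothetical sketch of superset-driven task splitting.
    def apportion_first_task(first_outputs, second_inputs):
        """Split the first task's outputs so one subtask matches 1:1."""
        first_outputs, second_inputs = set(first_outputs), set(second_inputs)
        assert second_inputs <= first_outputs        # superset case applies
        matched = first_outputs & second_inputs      # 1:1 with second task inputs
        remainder = first_outputs - second_inputs    # rest of the first task
        return [matched, remainder]

    subtasks = apportion_first_task({"a", "b", "c"}, {"a", "b"})
    # -> [{"a", "b"}, {"c"}]; the first subtask feeds the second task directly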