scispace - formally typeset
Search or ask a question
Topic

Multi-core processor

About: Multi-core processor is a research topic. Over the lifetime, 15435 publications have been published within this topic receiving 266049 citations. The topic is also known as: multi-core & multicore processor.


Papers
More filters
Proceedings ArticleDOI
02 Nov 2016
TL;DR: TensorFlow as mentioned in this paper is a machine learning system that operates at large scale and in heterogeneous environments, using dataflow graphs to represent computation, shared state, and the operations that mutate that state.
Abstract: TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. Tensor-Flow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.

10,913 citations

Posted Content
TL;DR: The TensorFlow dataflow model is described and the compelling performance that Tensor Flow achieves for several real-world applications is demonstrated.
Abstract: TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with particularly strong support for training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model in contrast to existing systems, and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.

5,542 citations

Proceedings ArticleDOI
Michael Isard1, Mihai Budiu1, Yuan Yu1, Andrew Birrell1, Dennis Fetterly1 
21 Mar 2007
TL;DR: The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.
Abstract: Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications. A Dryad application combines computational "vertices" with communication "channels" to form a dataflow graph. Dryad runs the application by executing the vertices of this graph on a set of available computers, communicating as appropriate through flies, TCP pipes, and shared-memory FIFOs.The vertices provided by the application developer are quite simple and are usually written as sequential programs with no thread creation or locking. Concurrency arises from Dryad scheduling vertices to run simultaneously on multiple computers, or on multiple CPU cores within a computer. The application can discover the size and placement of data at run time, and modify the graph as the computation progresses to make efficient use of the available resources.Dryad is designed to scale from powerful multi-core single computers, through small clusters of computers, to data centers with thousands of computers. The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.

2,867 citations

Proceedings ArticleDOI
04 Oct 2009
TL;DR: This characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.
Abstract: This paper presents and characterizes Rodinia, a benchmark suite for heterogeneous computing. To help architects study emerging platforms such as GPUs (Graphics Processing Units), Rodinia includes applications and kernels which target multi-core CPU and GPU platforms. The choice of applications is inspired by Berkeley's dwarf taxonomy. Our characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.

2,697 citations

Book
01 Jan 1994
TL;DR: Using MPI as mentioned in this paper provides a thoroughly updated guide to the MPI (Message-Passing Interface) standard library for writing programs for parallel computers, including a comparison of MPI with sockets.
Abstract: This book offers a thoroughly updated guide to the MPI (Message-Passing Interface) standard library for writing programs for parallel computers Since the publication of the previous edition of Using MPI, parallel computing has become mainstream Today, applications run on computers with millions of processors; multiple processors sharing memory and multicore processors with multiple hardware threads per core are common The MPI-3 Forum recently brought the MPI standard up to date with respect to developments in hardware capabilities, core language evolution, the needs of applications, and experience gained over the years by vendors, implementers, and users This third edition of Using MPI reflects these changes in both text and example code The book takes an informal, tutorial approach, introducing each concept through easy-to-understand examples, including actual code in C and Fortran Topics include using MPI in simple programs, virtual topologies, MPI datatypes, parallel libraries, and a comparison of MPI with sockets For the third edition, example code has been brought up to date; applications have been updated; and references reflect the recent attention MPI has received in the literature A companion volume, Using Advanced MPI, covers more advanced topics, including hybrid programming and coping with large data

2,666 citations


Network Information
Related Topics (5)
Scalability
50.9K papers, 931.6K citations
93% related
Cache
59.1K papers, 976.6K citations
91% related
Server
79.5K papers, 1.4M citations
90% related
Scheduling (computing)
78.6K papers, 1.3M citations
85% related
Network packet
159.7K papers, 2.2M citations
85% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023197
2022477
2021367
2020545
2019766
2018809