
Proceedings ArticleDOI

A predictable communication assist

17 May 2010, pp. 97-98



A Predictable Communication Assist
Ahsan Shabbir (1)  a.shabbir@tue.nl
Sander Stuijk (1)  s.stuijk@tue.nl
Akash Kumar (1,2)  akash@nus.edu.sg
Bart Theelen (3)  bart.theelen@esi.nl
Bart Mesman (1)  b.mesman@tue.nl
Henk Corporaal (1)  h.corporaal@tue.nl

(1) Eindhoven University of Technology, Eindhoven, The Netherlands
(2) National University of Singapore, Singapore
(3) Embedded Systems Institute, The Netherlands
ABSTRACT
Modern multi-processor systems need to provide guaranteed
services to their users. A communication assist (CA) helps in
achieving tight timing guarantees. In this paper, we present
a CA for a tile-based MP-SoC. Our CA has smaller memory
requirements and a lower latency than existing CAs. The
CA has been implemented in hardware. We compare it with
two existing DMA controllers. When compared with these
DMAs, our CA is up to 44% smaller in terms of equivalent
gate count.
Categories and Subject Descriptors
B.4.3 [Hardware]: Input/Output and Data Communication—Interconnections, interfaces
General Terms
Design, Performance
Keywords
CA, Predictable, FPGAs, Communication, MP-SoC, DMA
1. INTRODUCTION AND RELATED WORK
The number of applications that are executed concurrently
in an embedded system is increasing rapidly. To
meet the computational demands of these applications, a
multi-processor system-on-chip (MP-SoC) is used. In [2],
a multi-processor platform is introduced that decouples the
computation and communication of applications through a
communication assist (CA). This decoupling makes it easier
to provide tight timing guarantees on the computation and
communication tasks that are performed by the applications
running on the platform.
Several CA architectures [4, 5, 6] have been presented
before. These CAs use separate memory regions for stor-
ing data which needs to be communicated and data which
is being processed (i.e., separate communication and data
memories). This enables these CAs to provide timing guar-
antees on their operations, but at the cost of relatively high
latencies and large memory requirements.
The problem of large memory requirement has been solved
by a number of DMA architectures [1, 3, 7]. These DMAs
Copyright is held by the author/owner(s).
CF’10, May 17–19, 2010, Bertinoro, Italy.
ACM 978-1-4503-0044-5/10/05.
[Figure 1: Proposed CA-based platform. Two tiles T0 and T1, each containing a processor P and a CA, are connected through NI FIFOs and a network; a main memory MM is also attached. The numbers 1-5 mark the steps of a data transfer.]
transfer data between neighbouring tiles and between tiles
and the main memory. However, DMA controllers do not
provide any guarantees on their timing behaviour. A DMA
controller is a piece of hardware which performs memory
transfers on its own. A CA can be seen as an advanced
distributed DMA controller [5]. Distributed means in this
context that the CAs at both ends of the connection are
working together to execute a block transfer, using a
communication protocol on top of the network protocol.
In this paper, we introduce a novel CA architecture in
which a single memory region is used for data which is com-
municated and data which is processed. This leads to an
up to 50% lower memory requirement as compared to the
CA design presented in [4]. At the same time, our CA ar-
chitecture requires 44% less area when compared to existing
DMA architectures.
The rest of the paper is organized as follows. Section 2
introduces our CA in more detail. Section 3 presents ar-
chitectural details of our CA. The results of the hardware
implementation are presented in Section 4 and Section 5
concludes the paper.
2. COMMUNICATION ASSIST
Figure 1 shows the global view of our CA. It receives data
transfer requests from the processor (step 1 in Figure 1) and
moves the data to the Network Interface (NI) FIFOs (step
2). The data goes through the network (step 3) and the CA
at the receiving tile copies it into the local memory of the
tile (step 4). The processor P in tile T1 processes the data
and subsequently releases the space (step 5) so that the CA
can re-use this space for further transfers.
The CA presented in [4] has a separate data memory and
communication memory. These separate memories not only
cost additional area but also latency, as the processor has to
move the data from the data memory to the communication
memory and vice versa. Our CA does not require a separate
communication memory, resulting in a lower memory
requirement and latency. Following are the basic functions
of our CA:
1. It accepts data transfer requests from the attached pro-
cessor and splits them into local and remote memory
requests.
2. Local memory requests are simply bypassed to the
data memory.
3. Remote memory requests are handled through a round
robin arbiter. Every two cycles, a 32-bit word is trans-
ferred from the buffer in the memory to an NI FIFO
channel or vice versa.
4. The buffers implemented in the memory are circular
buffers. The number of NI FIFO channels can be
greater than or equal to the number of buffers in the
data memory. Our CA is programmable, so the same
buffer in the memory can be used as input or output,
depending on the port to which it is connected.
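The circular-buffer bookkeeping described above can be sketched in software. The following Python model is illustrative only: the class and method names are assumptions, and the single write/read pointer pair stands in for the four context registers of Section 3; it is not the hardware implementation.

```python
class CircularBuffer:
    """Software model of one CA-managed circular buffer (illustrative)."""

    def __init__(self, size):
        self.size = size    # capacity in 32-bit words
        self.write = 0      # next slot the CA writes into (step 2/4 in Figure 1)
        self.read = 0       # next slot the processor releases (step 5)
        self.count = 0      # words currently occupying the buffer

    def space(self):
        """Free words remaining in the buffer."""
        return self.size - self.count

    def push(self):
        """CA claims one slot for an incoming word; False if full."""
        if self.count == self.size:
            return False            # full: transfer must stall
        self.write = (self.write + 1) % self.size
        self.count += 1
        return True

    def release(self):
        """Processor releases one consumed word so the CA can reuse the slot."""
        if self.count == 0:
            return False
        self.read = (self.read + 1) % self.size
        self.count -= 1
        return True
```

Because the buffer lives directly in the data memory, no copy between a communication memory and a data memory is needed; release simply advances a pointer.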
Our CA acts as an interface that provides a link between the
NoC and the subsystems (processor and memory). It also
acts as a memory management unit that helps the processor
keep track of its data. As a result, it decouples communica-
tion from computation and relieves the processor from data
transfer functions.
[Figure 2: CA architecture. The CA connects the processor (P) and the data memory (DM) of a subsystem to the NI FIFOs and their data input/output ports; its internal components are the Address Translation unit (AT), the Pointer Store Unit (PSU) and the Memory Arbiter (MA).]

Figure 3: Context registers of a data buffer.

Offset  Content
0x00    base address of the buffer
0x02    size of the buffer
0x04    NI FIFO ID, direction
0x06    Write Start, W_S
0x08    Write End, W_E
0x0A    Read Start, R_S
0x0C    Read End, R_E
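The buffer context of Figure 3 can be mirrored as a small record in software. The field names and the assumption of seven consecutive 16-bit registers in offset order are illustrative; the paper does not specify register widths.

```python
from dataclasses import dataclass

@dataclass
class BufferContext:
    """Software mirror of one buffer context (Figure 3); widths assumed."""
    base: int         # 0x00  base address of the buffer
    size: int         # 0x02  size of the buffer
    fifo_id_dir: int  # 0x04  NI FIFO ID and direction
    w_start: int      # 0x06  Write Start, W_S
    w_end: int        # 0x08  Write End, W_E
    r_start: int      # 0x0A  Read Start, R_S
    r_end: int        # 0x0C  Read End, R_E

def decode_context(regs):
    """Build a BufferContext from seven registers in offset order
    0x00..0x0C (illustrative decode, not the RTL)."""
    return BufferContext(*regs)
```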
3. CA ARCHITECTURE
Figure 2 depicts the hardware components of our CA. The
CA is connected to the network through input/output ports.
Each data port has a FIFO buffer (NI FIFO) that connects
the Memory Arbiter (MA) to the network. The NI FIFOs
can be driven by two clocks: 1) the network clock and 2) sub-
system clock. Separate clock domains allow the integration
of subsystems with different clock frequencies. Following are
the main components of our CA.
The Address Translation Unit (AT) is connected to
the processor of a subsystem. The AT monitors the address
bus of the processor and distinguishes between local
memory accesses and buffer memory accesses: it passes
local memory accesses to the DM and translates the virtual
address of a buffer access into a physical memory address.
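The AT's decision logic can be sketched as follows. The address window constant and the way buffers are packed into a single virtual window are assumptions made for illustration; the paper does not specify the processor's memory map.

```python
# Virtual addresses at or above this boundary are treated as buffer
# accesses; everything below goes straight to the data memory (assumed).
BUFFER_WINDOW_BASE = 0x8000

def translate(addr, contexts):
    """Classify a processor address as a local or buffer access.

    contexts is a list of (base address, size) pairs, one per circular
    buffer, in the order the buffers appear in the virtual window.
    Returns ('local', addr) or ('buffer', physical_address).
    """
    if addr < BUFFER_WINDOW_BASE:
        return ('local', addr)                 # bypassed to the DM
    offset = addr - BUFFER_WINDOW_BASE
    for base, size in contexts:
        if offset < size:
            return ('buffer', base + offset)   # physical address in memory
        offset -= size                         # skip to the next buffer
    raise ValueError("address outside any configured buffer")
```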
The Pointer Store Unit (PSU) contains a set of reg-
isters (called buffer context) describing the status of each
buffer. A buffer context consists of 6 registers as shown in
Figure 3. The PSU selects one of the buffer contexts as in-
dicated by the MA, sends the selected context to the MA
and updates the registers for management of the circular
buffers. Possible configurations of the PSU include the size
of the buffer, the base address of the buffer in physical mem-
ory, and the id of the connected NI FIFO.
The Memory Arbiter (MA) receives an active context
from the PSU and executes it. The MA executes the data
transfer by generating a memory address, memory control
signal and NI FIFO control signals according to the received
context. The MA switches context every two clock cycles
and checks the next buffer's context.
Every context belongs to a buffer such that the MA trans-
fers one word between the NI FIFO and the buffer and then
moves on to the next buffer. The transfers are performed in
the same number of clock cycles every time and this gives
us a CA with predictable timing behaviour.
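The round-robin schedule of the MA, and the timing guarantee it yields, can be modelled directly: every buffer is serviced for exactly one two-cycle slot per round, so a request waits at most one full round. The function names are illustrative, not part of the design.

```python
def arbiter_schedule(num_buffers, num_cycles, cycles_per_word=2):
    """Round-robin schedule of the Memory Arbiter: one 32-bit word is
    moved for buffer i during its two-cycle slot, then the MA moves on.
    Returns the buffer index serviced in each slot (illustrative)."""
    slots = num_cycles // cycles_per_word
    return [slot % num_buffers for slot in range(slots)]

def worst_case_wait(num_buffers, cycles_per_word=2):
    """Worst-case cycles a buffer waits for its next slot: a request
    that just missed its slot waits one full round."""
    return num_buffers * cycles_per_word
```

With 8 buffers and two cycles per word, every buffer is guaranteed service within 16 cycles, which is the source of the CA's predictable timing behaviour.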
4. HARDWARE IMPLEMENTATION
The CA compares favorably to classical DMA controllers.
Table 1 shows the gate count (NAND2 equivalent) comparison
of our CA with other architectures. The CA is synthesized
for a clock frequency of 200 MHz. The design is implemented
using the Synopsys Design Compiler and a 0.18µm Chartered
standard cell library. The results show that our CA
is 44% smaller than a commercial DMA [1]. Hardware
results for the CA of [4] are not available in the literature.
Note that our CA does not require complex functionality like
“scatter and gather”; this makes our CA lightweight when
compared with the architectures shown in Table 1. All of
the designs have 8 channels.
Table 1: Gate count comparison with other DMAs.
Property              our CA    MSAP [7]   PrimeCell [1]
queue config. (word)  32bit*8   32bit*8    32bit*4
gate count            36.3k     68k        82k
The MSAP presented in [7] is very similar to our CA.
It uses a control network for the handshake between the
processors before the actual data transfer. Our CA does
not require a control network as it uses “backpressure” as
a flow control mechanism. This makes our CA more area
efficient when compared to [7].
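The backpressure flow control mentioned above can be sketched with a bounded FIFO: a full NI FIFO simply refuses the next word, stalling the producing CA until the consumer drains it, so no separate control network is needed. The class is an illustrative model under that assumption, not the NI hardware.

```python
from collections import deque

class NIFifo:
    """Bounded FIFO; a full FIFO exerts backpressure on the producer
    (illustrative model of the flow control described above)."""

    def __init__(self, depth):
        self.depth = depth
        self.q = deque()

    def push(self, word):
        """Producer offers a word; False means stall (backpressure)."""
        if len(self.q) >= self.depth:
            return False
        self.q.append(word)
        return True

    def pop(self):
        """Consumer drains the oldest word, or None if empty."""
        return self.q.popleft() if self.q else None
```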
5. CONCLUSION
This paper introduces a programmable CA which uses a
shared data and buffer memory. This leads to a lower memory
requirement for the overall system and to a lower communication
latency as compared to CAs in the literature. The CA
is up to 44% smaller in terms of area when compared with
similar architectures and commercial DMA controllers.
6. REFERENCES
[1] ARM. ARM PrimeCell™ DMA controller.
http://www.arm.com/armtech/PrimeCell?OpenDocument.
[2] Culler, D., et al. Parallel Computer Architecture: A
Hardware/Software Approach. Morgan Kaufmann Publishers,
Inc., 1998.
[3] Dave, C., and Charles, F. A scalable high-performance
DMA architecture for DSP applications. In ICCD '00, p. 414.
[4] Moonen, A., et al. A multi-core architecture for in-car
digital entertainment. In Proc. of GSPx Conference (2005).
[5] Niewland, et al. The impact of higher communication
layers on NoC supported MP-SoCs. In NOCS '07 (2007),
pp. 107–116.
[6] Nikolov, H., et al. Multi-processor system design with
ESPAM. In Proc. of CODES+ISSS (2006), pp. 211–216.
[7] Sang-Il, H., et al. An efficient scalable and flexible data
transfer architecture for multiprocessor SoC with massive
distributed memory. In DAC '04, pp. 250–255.
