CA-MPSoC: An automated design flow for predictable multi-processor architectures for multiple applications
Citations
Exploring Trade-Offs in Buffer Requirements and Throughput Constraints for Synchronous Dataflow Graphs
Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap
A methodology for automated design of hard-real-time embedded streaming systems
Dataflow formalisation of real-time streaming applications on a Composable and Predictable Multi-Processor SoC
An automated flow to map throughput constrained applications to a MPSoC
References
The Semantics of a Simple Language for Parallel Programming.
Parallel Computer Architecture: A Hardware/Software Approach
Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing
Blind Image Restoration Using a Block-Stationary Signal Model
Frequently Asked Questions (16)
Q2. What are the future works mentioned in the paper "CA-MPSoC: An automated design flow for predictable multi-processor architectures for multiple applications"?
In the future, the authors intend to include an NoC also in their design flow. The authors also want to extend the design flow with automated mapping decisions, so that mapping of the actors to the processors can also be optimized.
Q3. What is the process of finding the new processor probabilities?
The updated actor execution times, execution probabilities and waiting probabilities are used to find the new processor level probabilities.
Q4. What is the complexity of the analysis?
Since the number of combinations is exponential in the number of actors mapped on a resource, the analysis has an exponential complexity.
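As a rough illustration of why the state space is exponential, the following minimal Python sketch (actor names and the executing/waiting state labels are hypothetical, not taken from the paper) enumerates every combination the analysis would have to consider for actors sharing one resource:

```python
from itertools import product

# Hypothetical sketch: if each actor mapped on a shared resource can be in
# one of two states (e.g. executing or waiting) at an analysis step, the
# number of combinations to examine grows as 2^n for n actors.
def resource_states(actors):
    """Enumerate every executing/waiting combination for the given actors."""
    return [dict(zip(actors, combo))
            for combo in product(("executing", "waiting"), repeat=len(actors))]

states = resource_states(["a1", "a2", "a3"])
print(len(states))  # 2**3 = 8 combinations for three actors
```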
Q5. What is the effect of the buffer size on the actor?
When an actor writes data to such channels, the available size reduces; when the receiving actor consumes this data, the available buffer increases, modeled by an increase in the number of tokens.
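The token-based model of a bounded buffer described above can be sketched as follows (a behavioral Python sketch, not the paper's implementation; class and method names are illustrative):

```python
class BoundedChannel:
    """Sketch of an SDF channel with a bounded buffer: free space is modeled
    as tokens on a reverse edge. Writing consumes space tokens; reading
    releases them back, as described in the answer above."""
    def __init__(self, capacity):
        self.data = 0            # tokens available to the consumer
        self.space = capacity    # free slots, modeled as reverse-edge tokens

    def write(self, n):
        if self.space < n:
            return False         # not enough space tokens: producer must wait
        self.space -= n
        self.data += n
        return True

    def read(self, n):
        if self.data < n:
            return False
        self.data -= n
        self.space += n          # consuming data increases the available buffer
        return True

ch = BoundedChannel(capacity=4)
ch.write(3)
ch.read(2)
print(ch.data, ch.space)  # 1 data token left, 3 free slots
```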
Q6. How can the authors make use of tile-based platforms easier?
In order to make use of tile-based platforms easier, inter-tile communication for these architectures should be predictable, fast and easy to program.
Q7. How long does it take to generate the bit file?
The Xilinx tool takes about 36 minutes to generate the bit file together with the appropriate instruction and data memories for each core in the design.
Q8. What is the use-case merging technique used to generate the hardware?
Since the generated hardware supports multiple use-cases, the authors employ the use-case merging technique [26] and modify certain parts of it to incorporate CA buffers.
Q9. What is the preferred architecture for a preemptive system?
In high-performance embedded processors (like the SPEs in the Cell Broadband Engine and graphics processors), non-preemptive systems are preferred over preemptive systems.
Q10. What is the interesting way to find the throughput of an SDFG?
One of the methods to find the throughput of an SDFG is to convert it into HSDF graph and then find the throughput of the resulting graph.
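Once an HSDF graph is available, its throughput is the inverse of the maximum cycle mean (cycle execution time divided by the number of initial tokens on the cycle). The following brute-force Python sketch illustrates this on a made-up two-actor graph; the graph, execution times, and function names are all hypothetical, and real tools use far more efficient algorithms:

```python
def max_cycle_mean(exec_time, edges):
    """Brute-force maximum cycle mean of a small HSDF graph.
    exec_time: {actor: execution time}; edges: (src, dst, initial_tokens)."""
    best = 0.0

    def dfs(start, node, visited, t_sum, tok_sum):
        nonlocal best
        for u, v, tok in edges:
            if u != node:
                continue
            if v == start:
                if tok_sum + tok > 0:                      # a valid cycle found
                    best = max(best, t_sum / (tok_sum + tok))
            elif v not in visited and v > start:           # avoid re-enumerating
                dfs(start, v, visited | {v}, t_sum + exec_time[v], tok_sum + tok)

    for s in exec_time:
        dfs(s, s, {s}, exec_time[s], 0)
    return best

# Two actors on one cycle: total execution time 2 + 3 = 5, one initial token,
# so the maximum cycle mean is 5 and the throughput is 1/5.
mcm = max_cycle_mean({"a": 2, "b": 3}, [("a", "b", 0), ("b", "a", 1)])
print(1 / mcm)  # 0.2
```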
Q11. What is the way to compute the worst-case waiting time for non-preempt?
The worst-case waiting times for non-preemptive systems under FCFS, as mentioned in [16], are computed using the formula twait = ∑_{i=1}^{n} texec(ai) (Eq. 5), where actors ai for i = 1, 2, ..., n are mapped on the same resource (i.e., processor).
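Eq. (5) is simply the sum of the execution times of all actors sharing the resource, as this minimal Python sketch shows (the execution times below are made-up numbers for illustration):

```python
def worst_case_waiting_time(exec_times):
    """Worst-case FCFS waiting time on a non-preemptive resource, per Eq. (5):
    twait = sum of texec(ai) over all actors ai mapped on the same resource."""
    return sum(exec_times)

# Hypothetical execution times (in cycles) of three actors on one processor
print(worst_case_waiting_time([100, 250, 75]))  # 425
```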
Q12. How many pixels are sent to the DCT actor?
The DCT actor sends these 6 macro-blocks one by one (64 pixels each time) to the VLC actor where each of these macro-blocks is variable length encoded.
Q13. What is the number of CPUs and CA buffers needed for each use-case?
While the number of processors and CA buffers needed is updated with a max operation (line 10 and line 11 in Algorithm 1), the number of CA channels is added for each application (indicated by line 13 in Algorithm 1).
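The merge rule described above can be sketched as follows (a hedged illustration of the max/add combination, not the paper's Algorithm 1; the dictionary keys and numbers are hypothetical):

```python
# Sketch: processor and CA-buffer counts are combined across use-cases with a
# max operation, while CA-channel counts are accumulated by addition.
def merge_use_cases(use_cases):
    merged = {"processors": 0, "ca_buffers": 0, "ca_channels": 0}
    for uc in use_cases:
        merged["processors"] = max(merged["processors"], uc["processors"])
        merged["ca_buffers"] = max(merged["ca_buffers"], uc["ca_buffers"])
        merged["ca_channels"] += uc["ca_channels"]
    return merged

print(merge_use_cases([
    {"processors": 3, "ca_buffers": 4, "ca_channels": 5},
    {"processors": 2, "ca_buffers": 6, "ca_channels": 3},
]))  # {'processors': 3, 'ca_buffers': 6, 'ca_channels': 8}
```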
Q14. How many design points does it take to generate a new hardware?
As the number of design points is increased, the cost of generating the hardware becomes negligible and each iteration takes only about 25 seconds.
Q15. What is the reason for the non-blocking of the claimreadspace and claimwritespace?
To avoid this, the claimreadspace and claimwritespace commands have been implemented as non-blocking, so that if either claimspace command is unsuccessful, the processor is not blocked.
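The non-blocking behavior can be sketched behaviorally as follows (a hypothetical Python model; the real claimreadspace/claimwritespace operate on the hardware CA buffer, and the class and method names here are illustrative):

```python
class CABuffer:
    """Behavioral sketch of non-blocking claim calls: both return a success
    flag immediately instead of stalling the processor when space or data
    is unavailable."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.tokens = 0                                # data currently buffered

    def claim_write_space(self, n):
        return self.capacity - self.tokens >= n        # just report, never block

    def claim_read_space(self, n):
        return self.tokens >= n                        # just report, never block

buf = CABuffer(capacity=2)
if buf.claim_write_space(1):   # on failure the caller can do other work
    buf.tokens += 1            # instead of stalling, e.g. poll another channel
print(buf.claim_read_space(1))  # True: one token is now available
```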
Q16. How can a streaming application be described?
Streaming applications can be described in a dataflow-like manner, and the computational kernels of this flow can be easily mapped to suitable processing elements.