Run-time generation of partial FPGA configurations

Question

Q1. What are the contributions of "Run-time generation of partial FPGA configurations"?

Q2. What is the standard synthesis flow for a Virtex-II Pro?

Q3. How is the router for just-in-time mapping of a device-independent configuration description characterized?

Q4. What is the role of RTR in the creation of new bitstreams?

Q5. What is the role of the configuration memory controller?

Q6. How is the stripe density calculated?

Q7. What happens at the beginning of the first step of bitstream generation?

Q8. How would the physical interface adaptation be performed?

Q9. How long does it take to generate a bitstream?

Q10. How cost-effective is the route reuse heuristic?

Q11. What is the procedure used to find a path between a source and a sink?

Q12. What are the main tasks in creating a new partial bitstream?

Q13. How many temperature decreases does the algorithm make?

Q14. Is run-time generation of configurations feasible, and where is it useful?

Q15. What is the significant improvement in routing time for benchmark H8?

Q16. By how much is the average length of the connections improved?

Accepted Answer

This paper presents and evaluates a method of generating partial bitstreams at run-time for dynamic reconfiguration of sections of an FPGA.

Accepted Answer

The standard synthesis flow produces a full bitstream for each component, which must be post-processed in order to extract the relevant partial bitstream and to add more information for use at run-time.

Accepted Answer

A router for just-in-time mapping of a device-independent configuration description to a specific device architecture is described in [21]: it is able to produce good hardware circuits using 13 times less memory and executing 10 times faster than VPR [22] (running on a desktop computer).

Accepted Answer

The creation of the new bitstreams requires assigning positions of the reconfigurable area to components (placement), relocating and merging the individual component bitstreams, and interconnecting the components (routing) by modification of the merged bitstream.

Accepted Answer

The configuration memory controller is responsible for handling the actual transfer of the bitstreams to the configuration memory through the internal configuration access port (ICAP).

Accepted Answer

Since routes can cross stripes by going through the components or by using empty spaces, a stripe density value is calculated, which is given by the percentage of occupied CLBs.
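
The density calculation can be illustrated with a minimal sketch. The grid layout, the choice of vertical stripes, and the names `stripe_density` and `area` are assumptions made for illustration; the answer above only states that the density is the percentage of occupied CLBs.

```python
def stripe_density(occupied, stripe_columns):
    """Return the percentage of occupied CLBs within one stripe.

    occupied: 2D list of booleans; occupied[row][col] is True when the CLB
              at that position is used by a component (assumed layout).
    stripe_columns: column indices that form the stripe (assumed to be
                    vertical here; the source does not fix the orientation).
    """
    total = 0
    used = 0
    for row in occupied:
        for col in stripe_columns:
            total += 1
            if row[col]:
                used += 1
    return 100.0 * used / total if total else 0.0


# Hypothetical 3x4 dynamic area; True marks a CLB occupied by a component.
area = [
    [True, True, False, False],
    [True, False, False, False],
    [True, True, False, True],
]
left = stripe_density(area, range(0, 2))   # columns 0-1: 5 of 6 CLBs used
right = stripe_density(area, range(2, 4))  # columns 2-3: 1 of 6 CLBs used
```

In this toy area the left stripe is much denser than the right one, so a route crossing the right stripe would have more empty space available.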

Accepted Answer

At the beginning of the first step, the bitstream of the empty dynamic area is used to initialize the working array of configuration information.

Accepted Answer

With the current system, the physical interface adaptation might be performed at run-time by adding appropriate "glue" components to the design.

Accepted Answer

The results for a set of 29 benchmarks (both synthetic and application-derived) show that the time required for bitstream generation on a 300 MHz PowerPC embedded processor depends strongly on the complexity of the circuits, but is under 35 s for all benchmarks (average: 18 s).

Accepted Answer

In general, the route reuse heuristic is very cost effective, because it provides an improvement in running time at a very low implementation cost.

Accepted Answer

Procedure LinkPins (Alg. 3) is used to find a path from a source pin to one sink pin: it performs a breadth-first search for the shortest path between the source and sink terminals (line 4).
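
The breadth-first search mentioned above can be sketched generically as follows. This is not the paper's Alg. 3: the adjacency-dictionary graph, the node names, and the function `bfs_path` are hypothetical, and a real router would search the device's routing resources rather than a small dictionary.

```python
from collections import deque


def bfs_path(graph, source, sink):
    """Breadth-first search; returns a path with the fewest hops, or None."""
    parents = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            # Walk the parent links back to the source to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical routing graph: keys are nodes, values are reachable neighbours.
routing_graph = {
    "src": ["a", "b"],
    "a": ["c"],
    "b": ["sink"],
    "c": ["sink"],
}
path = bfs_path(routing_graph, "src", "sink")
```

Because the search expands nodes in order of distance from the source, the first time the sink is dequeued the recovered path uses the fewest hops; here that is the route through `b` rather than the longer one through `a` and `c`.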

Accepted Answer

In this work, the creation of a new partial bitstream involves three major tasks: placement, routing, and bitstream construction (Fig. 2).

Accepted Answer

The algorithm stops after a predefined number of temperature decreases (currently, 50), or if it fails to make any improvement for a fixed number of successive temperature decreases (currently, 10).
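
The two stopping criteria can be sketched as a loop condition. Only the limits 50 and 10 come from the answer above; the cooling schedule, the acceptance rule, the toy cost function, and all names are assumptions, and "improvement" is interpreted here as finding a new best cost.

```python
import math
import random

MAX_TEMP_DECREASES = 50     # limit stated in the text
MAX_STALLED_DECREASES = 10  # limit stated in the text


def anneal(cost, neighbor, state, temp=10.0, alpha=0.9, moves_per_temp=20, seed=0):
    """Generic simulated annealing with the two stopping criteria above."""
    rng = random.Random(seed)
    current, current_cost = state, cost(state)
    best, best_cost = current, current_cost
    decreases = 0  # temperature decreases performed so far
    stalled = 0    # successive decreases without a new best solution
    while decreases < MAX_TEMP_DECREASES and stalled < MAX_STALLED_DECREASES:
        improved = False
        for _ in range(moves_per_temp):
            cand = neighbor(current, rng)
            delta = cost(cand) - current_cost
            # Accept improvements always, worse moves with Boltzmann probability.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_cost = cand, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
                    improved = True
        temp *= alpha  # assumed geometric cooling schedule
        decreases += 1
        stalled = 0 if improved else stalled + 1
    return best, best_cost, decreases


# Toy problem (hypothetical): move an integer towards 7.
best, best_cost, decreases = anneal(
    cost=lambda x: (x - 7) ** 2,
    neighbor=lambda x, rng: x + rng.choice([-1, 1]),
    state=0,
)
```

Whichever limit is reached first ends the run, so the number of temperature decreases never exceeds 50.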

Accepted Answer

The working implementation described here shows that run-time generation of configurations is a feasible technique for use on highly adaptive embedded systems, where it may be used to provide precisely tailored hardware support for tasks whose computational needs exceed the computational power of the CPU.

Accepted Answer

The data of Table 3 show that the most significant improvement in routing time occurs for benchmark H8: with SA2D the routing time decreases by 19.3%, for a global improvement of 15.9%.

Accepted Answer

The average length of the connections is also systematically improved, in the best cases by more than one segment: M2 shows a reduction of 18.7% from 5.67 to 4.61 for the average length.
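
The figures quoted for M2 are consistent with each other, as a short check shows. The variable names are illustrative; the two average lengths are taken from the answer above.

```python
before, after = 5.67, 4.61  # average connection length for M2, in segments

reduction_segments = before - after
reduction_percent = 100.0 * reduction_segments / before

print(f"{reduction_segments:.2f} segments, {reduction_percent:.1f}%")
```

The absolute reduction is slightly above one segment, which matches the statement that the best cases improve by more than one segment.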

References

Optimization by Simulated Annealing

MediaBench: a tool for evaluating and synthesizing multimedia and communications systems

Architecture and CAD for Deep-Submicron FPGAs

Invited Paper: Enhanced Architectures, Design Methodologies and CAD Tools for Dynamic Reconfiguration of Xilinx FPGAs

Dynamic hardware plugins in an FPGA with partial run-time reconfiguration