A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware
Summary
- Modern digital systems are moving increasingly towards heterogeneity.
- The authors discuss which among the many conventional co-design/verification methods would best serve their purposes and why.
- The authors show the effectiveness of their methodology and draw out general lessons from it (Section 4), before they conclude in Section 5.
- The authors found that combining the software model with an interconnection simulator and an HDL simulator via a processor bus functional model (BFM) is the most effective method for functional verification.
- The authors also explain, however, that conventional ways of constructing a processor BFM must be revised for modern multi-core processors and interconnection architectures; in particular, it is important to accurately implement the memory consistency model and cache coherence protocol.
2.1 Target Design
- This section outlines the design of their system to provide enough background to understand the co-design and co-verification issues; the detailed design of the system is outside the scope of this paper.
- A typical software transactional memory (STM), an implementation of such a runtime system solely in software, tends to exhibit large performance overhead.
- (6) The TM hardware, based on all the read/write addresses received from all the cores, determines which cores have conflicting reads and writes and sends the requisite messages to those cores (see the sketch after this list).
- The system is composed of two quad-core x86 CPUs and an FPGA that is connected coherently via a chain of point-to-point links.
- Messages from the CPU are sent to their HW via a non-coherent interface, while responses to the CPU go through the coherent cache.
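As a concrete illustration of step (6), the following is a minimal C++ sketch of address-based conflict detection between per-core read/write sets. All names (CoreAccessLog, find_conflicting_cores) and the abort policy are hypothetical, chosen for illustration rather than taken from the paper.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Per-core record of the cache-line addresses a transaction touched.
struct CoreAccessLog {
    std::unordered_set<uint64_t> reads;
    std::unordered_set<uint64_t> writes;
};

// Two transactions conflict if one writes a line the other reads or writes.
bool conflicts(const CoreAccessLog& a, const CoreAccessLog& b) {
    for (uint64_t w : a.writes)
        if (b.reads.count(w) || b.writes.count(w)) return true;
    for (uint64_t w : b.writes)
        if (a.reads.count(w)) return true;
    return false;
}

// Cores that must receive an abort message; the "later core loses" policy
// here is only a placeholder for whatever the real TM hardware implements.
std::vector<int> find_conflicting_cores(const std::vector<CoreAccessLog>& cores) {
    std::vector<int> losers;
    for (size_t i = 0; i < cores.size(); ++i)
        for (size_t j = i + 1; j < cores.size(); ++j)
            if (conflicts(cores[i], cores[j]))
                losers.push_back(static_cast<int>(j));
    return losers;
}
```

Real TM hardware would check addresses at cache-line granularity in parallel; the set intersection above only conveys the logic.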
2.2 Our Initial Failure and Issues with Co-Verification
- Since their system was composed of tightly coupled hardware and software, the authors had to deal with a classic chicken-and-egg co-verification problem.
- A crash observed after one final memory access from one core (e.g. dereferencing a dangling pointer) could be the result of a write, falsely allowed to commit, from another core millions of cycles earlier.
- The new STM software needed to be intensively validated (together with the new hardware), especially under parallel execution, variable latency, and out-of-order message delivery, as shown in Figure 2 (see the sketch after this list).
- An alternative was to use a detailed architecture simulator, but its simulation speed was insufficient.
- This method brought its own challenges which the authors discuss in detail in the following section.
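The variable-latency, out-of-order delivery mentioned above (Figure 2) can be reproduced in simulation with a delivery queue keyed by a randomized arrival cycle. A minimal sketch follows; the latency range, seed, and names are assumptions for illustration.

```cpp
#include <cstdint>
#include <queue>
#include <random>
#include <vector>

// A message becomes visible to the receiver only once its delivery cycle
// arrives; back-to-back sends may therefore arrive out of order.
struct Message { uint64_t deliver_at; uint32_t payload; };
struct Later {
    bool operator()(const Message& a, const Message& b) const {
        return a.deliver_at > b.deliver_at;
    }
};

class RandomLatencyLink {
    std::priority_queue<Message, std::vector<Message>, Later> in_flight_;
    std::mt19937 rng_{12345};                                 // fixed seed: reproducible runs
    std::uniform_int_distribution<uint64_t> latency_{4, 40};  // assumed range, in cycles
public:
    void send(uint64_t now, uint32_t payload) {
        in_flight_.push({now + latency_(rng_), payload});
    }
    // Deliver every message whose arrival cycle has been reached.
    std::vector<uint32_t> deliver(uint64_t now) {
        std::vector<uint32_t> out;
        while (!in_flight_.empty() && in_flight_.top().deliver_at <= now) {
            out.push_back(in_flight_.top().payload);
            in_flight_.pop();
        }
        return out;
    }
};
```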
3.1 General Issues and Solutions
- The authors discuss general issues that arise when using a software model for HW/SW co-verification and how they overcame those issues.
- Hardware simulation can consist of two different components: a cycle-accurate interconnection network simulation and HDL simulation.
- Instead, the authors rely on the single-threaded simulator to interleave multiple software execution contexts.
- (Issue #4) The memory consistency model must be carefully considered.
- Otherwise, the contents of the store buffer, up to the entry that has been matched, are flushed before the new packet is injected (see the store-buffer sketch after this list).
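A minimal C++ sketch of a per-core FIFO store buffer of the kind Issue #4 calls for, under the assumption (not spelled out in this summary) that a newly injected packet whose address matches a buffered store forces the buffer to drain in program order up to and including that entry. The class and method names are hypothetical.

```cpp
#include <cstdint>
#include <deque>

// Buffered stores drain in FIFO order so the interconnect never observes
// them out of program order (x86-TSO-style behavior).
struct BufferedStore { uint64_t addr; uint64_t data; };

class StoreBuffer {
    std::deque<BufferedStore> fifo_;
public:
    void buffer_store(uint64_t addr, uint64_t data) { fifo_.push_back({addr, data}); }

    // If 'addr' matches a buffered entry, flush entries in FIFO order up to
    // and including that entry; inject(addr, data) stands in for handing a
    // packet to the network simulator.
    template <typename InjectFn>
    void flush_up_to(uint64_t addr, InjectFn inject) {
        bool matched = false;
        for (const auto& s : fifo_)
            if (s.addr == addr) { matched = true; break; }
        if (!matched) return;
        while (!fifo_.empty()) {
            BufferedStore s = fifo_.front();
            fifo_.pop_front();
            inject(s.addr, s.data);
            if (s.addr == addr) break;  // matched entry drained; stop
        }
    }
};
```

In the BFM, flush_up_to would be called just before a new packet to the same address is injected, so that the ordering guarantees of the modeled consistency model are preserved.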
3.2 Our Co-simulation Environment: Implementation
- This subsection details the implementation of their co-simulation environment, in which all the issues discussed in the previous subsection are resolved.
- Notably, the API provides separate methods for normal, non-coherent, and uncacheable accesses, as well as flush and atomic operations.
- Figure 5 shows how execution flows from a SW context (i.e. a fiber executing the SW model) to the simulator context (i.e. the main fiber for simulation).
- Instead of actually sending a transactional read message to the FPGA, the HAL part of the STM calls into the BFM API (SIM_Noncoh_write), which eventually injects a packet into the simulator (BUS_Inject) and switches context to simulator execution (SIM_return_simulator); a sketch of this path follows this list.
- The network simulator, which performs simple cycle-based simulation, calls the clock function of each BFM at each simulation cycle.
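The path from the HAL call down to the cycle loop can be sketched as below. The function names come from the text (SIM_Noncoh_write, BUS_Inject, SIM_return_simulator), but their signatures and the stub bodies are assumptions; this summary does not give the real interfaces.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

static void BUS_Inject(int core_id, uint64_t addr, uint64_t data) {
    // Hand the packet to the cycle-based network simulator (stubbed here).
    std::printf("inject core=%d addr=%#llx data=%#llx\n",
                core_id, (unsigned long long)addr, (unsigned long long)data);
}

static void SIM_return_simulator() {
    // In the real environment this switches from the SW-context fiber back
    // to the main simulator fiber; stubbed out in this sketch.
}

// HAL-to-BFM path: instead of sending a message to the FPGA, the STM's HAL
// injects a packet and yields to the simulator until a response is ready.
void SIM_Noncoh_write(int core_id, uint64_t addr, uint64_t data) {
    BUS_Inject(core_id, addr, data);
    SIM_return_simulator();
}

// The network simulator ticks every BFM once per simulated cycle.
struct BFM {
    void clock(uint64_t /*cycle*/) { /* drain buffers, retire packets */ }
};

void run(std::vector<BFM>& bfms, uint64_t cycles) {
    for (uint64_t c = 0; c < cycles; ++c)
        for (auto& b : bfms) b.clock(c);
}
```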
4. RESULTS AND DISCUSSION
- The authors' new co-simulation environment (Section 3.2) was extremely useful for verifying the functional correctness of their system.
- Randomized timing helped to explore corner cases in data-race conditions.
- Since most of the software model was executed natively, no valuable simulation cycles were wasted executing instructions that were unnecessary for functional verification.
- The co-simulation environment also provided a very helpful error detection mechanism, which would have been impossible with native execution on the FPGA.
- Note that the last row of the log points out which address violates serializability. From this log, the authors were able to relate SW context and HW status, since the simulation cycle is shared by both SW simulation and HDL simulation (see the sketch after this list).
- The authors presented their HW/SW co-verification experience on a commodity multi-processor system with custom hardware.
- For the sake of functional verification of such a system, it was most effective to combine native SW execution with cycle-based interconnect simulation and HDL simulation by means of a processor BFM.
- Their experiences showed that such BFMs should faithfully reflect the memory consistency models of their target processors and would benefit greatly from randomized packet injection timing in their network simulations.
- These requirements enable the co-simulation to generate a wide variety of data access interleavings, which is essential for co-verification of modern multi-processor systems.
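Because SW simulation and HDL simulation share one cycle counter, an error log can tag every event with that common cycle, which is what lets a violating address be tied back to a SW context. A minimal sketch with hypothetical names:

```cpp
#include <cstdint>
#include <cstdio>

// Every logged event carries the shared simulation cycle, so SW context and
// HW status can be cross-referenced after a failure.
struct Event { uint64_t cycle; int sw_context; uint64_t addr; bool is_write; };

void log_event(const Event& e) {
    std::printf("[cycle %llu] ctx %d %s addr %#llx\n",
                (unsigned long long)e.cycle, e.sw_context,
                e.is_write ? "W" : "R", (unsigned long long)e.addr);
}

// Mirrors the log's last row: name the address that breaks serializability.
void report_violation(const Event& e) {
    std::printf("[cycle %llu] VIOLATION: addr %#llx breaks serializability\n",
                (unsigned long long)e.cycle, (unsigned long long)e.addr);
}
```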
Frequently Asked Questions
Q1. What contributions do the authors mention in the paper "A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware"?
This paper presents a system-level co-design and co-verification case study of a non-trivial design in which multiple high-performance x86 processors and custom hardware are connected through a coherent interconnection fabric. The authors demonstrate how such a co-simulation environment can be constructed from existing tools and software. They also found, however, that significant extensions must be made to the conventional BFM methodology to capture in simulation the various data-race cases that eventually occur in modern multiprocessor systems.