A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware
TL;DR: This paper presents a system-level co-design and co-verification case study of a non-trivial design in which multiple high-performing x86 processors and custom hardware are connected through a coherent interconnection fabric. For functional verification, a processor bus functional model (BFM) combines native software execution with a cycle-accurate interconnect simulator and an HDL simulator.
Abstract: This paper presents an interesting system-level co-design and co-verification case study for a non-trivial design where multiple high-performing x86 processors and custom hardware were connected through a coherent interconnection fabric. In functional verification of such a system, we used a processor bus functional model (BFM) to combine native software execution with a cycle-accurate interconnect simulator and an HDL simulator. However, we found that significant extensions need to be made to the conventional BFM methodology in order to capture various data-race cases in simulation, which eventually happen in modern multi-processor systems. Especially essential were faithful implementations of the memory consistency model and cache coherence protocol, as well as timing randomization. We demonstrate how such a co-simulation environment can be constructed from existing tools and software. Lessons from our study can similarly be applied to design and verification of other tightly-coupled systems.
Summary
- Modern digital systems are moving increasingly towards heterogeneity.
- The authors discuss which among the many conventional co-design/verification methods would best serve their purposes and why.
- The authors show the effectiveness of their methodology and draw out general lessons from it (Section 4), before they conclude in Section 5.
- The authors found that combining the software model with an interconnection simulator and HDL simulator via a processor BFM is the most effective method for functional verification.
- The authors also explain, however, that conventional ways of constructing a processor BFM should be revised in accordance with modern multi-core processors and interconnection architecture; it is especially important to accurately implement the memory consistency model and cache coherence protocol.
2.1 Target Design
- This section outlines the design of their system to provide enough background to understand the co-design and co-verification issues; the detailed design of the system is outside the scope of this paper.
- A typical software transactional memory (STM), an implementation of such a runtime system solely with software, tends to exhibit huge performance overhead.
- (6) The TM hardware, based on the read/write addresses received from all the cores, now determines which cores have conflicting reads and writes and sends out the requisite messages to those cores.
- The system is composed of two quad-core x86 CPUs and an FPGA that are connected coherently via a chain of point-to-point links.
- Messages from the CPU are sent to their HW via a non-coherent interface, while responses to the CPU go through the coherent cache.
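
The conflict check described in step (6) can be illustrated with a small sketch. This is not the authors' hardware design: the per-core read and write sets, the function name `find_conflicts`, and the rule that a write conflicts with any read or write of the same address by another core are assumptions made purely for illustration.

```python
from itertools import combinations

def find_conflicts(read_sets, write_sets):
    """Return (core_a, core_b, address) triples for conflicting accesses.

    read_sets / write_sets map a core id to the set of addresses it has
    read / written (hypothetical data layout, not from the paper).
    """
    conflicts = []
    cores = sorted(set(read_sets) | set(write_sets))
    for a, b in combinations(cores, 2):
        ra, wa = read_sets.get(a, set()), write_sets.get(a, set())
        rb, wb = read_sets.get(b, set()), write_sets.get(b, set())
        # Write-write, write-read and read-write pairs conflict;
        # two reads of the same address do not.
        clashing = (wa & (rb | wb)) | (wb & ra)
        for addr in sorted(clashing):
            conflicts.append((a, b, addr))
    return conflicts

reads = {0: {0x100, 0x108}, 1: {0x200}, 2: {0x108}}
writes = {0: {0x200}, 1: {0x300}, 2: set()}
result = find_conflicts(reads, writes)
print(result)
```

Here cores 0 and 2 both read `0x108` without conflict, while core 0's write to `0x200` clashes with core 1's read of the same address.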
2.2 Our Initial Failure and Issues with Co-Verification
- Since their system was composed of tightly coupled hardware and software, the authors had to deal with a classic chicken-and-egg co-verification problem.
- A crash observed after one last memory access from one core (e.g. de-referencing a dangling pointer) could be the result of a write from another core that was falsely allowed to commit millions of cycles earlier.
- The new STM software needed to be intensely validated (with the new hardware), especially under the assumption of parallel execution, variable latency and out-of-order message delivery as shown in Figure 2.
- An alternative was to use a detailed architecture simulator, but its simulation speed was insufficient.
- This method brought its own challenges which the authors discuss in detail in the following section.
3.1 General Issues and Solutions
- The authors discuss general issues that arise when using a software model for HW/SW co-verification and how they overcame those issues.
- Hardware simulation can consist of two different components: a cycle-accurate interconnection network simulation and HDL simulation.
- Instead, the authors rely on the single-threaded simulator to interleave multiple software execution contexts.
- (Issue #4) The memory consistency model must be carefully considered.
- Otherwise, the contents in the store buffer, up to the entry that has been matched, are flushed before the new packet is injected.
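
A minimal sketch of the store-buffer behaviour mentioned in the last bullet follows. The condition that precedes "Otherwise" is not reproduced in this summary, so the sketch assumes it is "the load matches no buffered store, and is therefore injected ahead of the buffered stores". The class name `StoreBufferBFM`, its methods, and the packet tuples are hypothetical and not the authors' API.

```python
from collections import deque

class StoreBufferBFM:
    """Toy BFM front end with a FIFO store buffer (illustrative only)."""

    def __init__(self):
        self.store_buffer = deque()  # pending (addr, value), oldest first
        self.injected = []           # packets handed to the network, in order

    def store(self, addr, value):
        # Stores are buffered rather than injected immediately.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Find the youngest buffered store to the same address, if any.
        match = None
        for i, (a, _) in enumerate(self.store_buffer):
            if a == addr:
                match = i
        if match is not None:
            # Assumed "otherwise" case: flush the buffer, in order,
            # up to and including the matched entry.
            for _ in range(match + 1):
                a, v = self.store_buffer.popleft()
                self.injected.append(("WR", a, v))
        # With no match, the load bypasses the buffered stores.
        self.injected.append(("RD", addr))

bfm = StoreBufferBFM()
bfm.store(0xA, 1)
bfm.store(0xB, 2)
bfm.store(0xC, 3)
bfm.load(0xD)  # no match: injected ahead of all buffered stores
bfm.load(0xB)  # match: stores to 0xA and 0xB are flushed first
print(bfm.injected)
```

After the second load, the store to `0xC` is still buffered, so a later load from another address could again overtake it, which is the kind of reordering a consistency-aware BFM needs to expose.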
3.2 Our Co-simulation Environment: Implementation
- This subsection details the implementation of their co-simulation environment where all the issues discussed in previous subsections are resolved.
- Notably, the API provides separate methods for normal, non-coherent, and un accesses as well as flush and atomic operations.
- Figure 5 shows how execution flows from a SW context (i.e. a fiber executing the SW model) to the simulator context (i.e. the main fiber for simulation).
- Instead of actually sending a transactional read message to the FPGA, the HAL part of the STM calls into the BFM API (SIM_Noncoh_write) which eventually injects a packet into the simulator (BUS_Inject) and switches context to simulator execution (SIM_return_simulator).
- The network simulator, which performs simple cycle-based simulation, calls the clock function of each BFM at every simulation cycle.
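
The control flow described above (a software context injects a packet, yields to the simulator, and is resumed later from the BFM's clock function) can be sketched with Python generators standing in for fibers. Everything named here (`BFM`, `sw_model`, `run`, the fixed `latency`, the `"NONCOH_WRITE"` tag) is an assumption for illustration; the sketch does not reproduce the authors' `SIM_Noncoh_write`, `BUS_Inject`, or `SIM_return_simulator` functions.

```python
def sw_model(addrs, completed):
    """Software context: each yield injects a packet and suspends."""
    for addr in addrs:
        reply = yield ("NONCOH_WRITE", addr)  # switch to the simulator
        completed.append((addr, reply[1]))    # resumed with (ack, cycle)

class BFM:
    def __init__(self, context, latency):
        self.context = context   # generator acting as the SW fiber
        self.latency = latency   # assumed fixed round-trip latency
        self.pending = None      # (packet, completion cycle)
        self.done = False
        self._resume(None, cycle=0)  # run SW until its first packet

    def _resume(self, reply, cycle):
        try:
            packet = self.context.send(reply)
            self.pending = (packet, cycle + self.latency)
        except StopIteration:
            self.pending, self.done = None, True

    def clock(self, cycle):
        # Called once per simulated cycle by the network simulator.
        if self.pending is not None and cycle >= self.pending[1]:
            self._resume(("ACK", cycle), cycle)

def run(bfms, max_cycles=100):
    cycle = 0
    while not all(b.done for b in bfms) and cycle < max_cycles:
        cycle += 1
        for b in bfms:
            b.clock(cycle)
    return cycle

c0, c1 = [], []
bfms = [BFM(sw_model([0x10, 0x20], c0), latency=2),
        BFM(sw_model([0x30], c1), latency=3)]
end_cycle = run(bfms)
print(end_cycle, c0, c1)
```

Because a single thread drives every context, the interleaving of the two software models is fully determined by the simulated cycle at which each packet completes.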
4. RESULTS AND DISCUSSION
- The authors' new co-simulation environment (Section 3.2) was extremely useful for verifying the functional correctness of their system.
- On one hand, randomized timing helped to explore corner cases in data-race conditions.
- Since most of the software model was executed natively, there was no waste of valuable simulation cycles to execute instructions that were not necessary for functional verification.
- Fourth, the co-simulation environment provided a very helpful error detection mechanism, which was impossible in native execution on FPGA.
- Note that the last row points out which address is violating serializability. From this log, the authors were able to relate SW context and HW status, since the simulation cycle is shared by both SW simulation and HDL simulation.
- The authors presented their HW/SW co-verification experience on a commodity multi-processor system with custom hardware.
- For the sake of functional verification of such a system, it was most effective to combine native SW execution with cycle-based interconnect simulation and HDL simulation by means of a processor BFM.
- Their experiences showed that such BFMs should faithfully reflect the memory consistency models of their target processors and would benefit greatly from randomized packet injection timing in their network simulations.
- These requirements enable the co-simulation to generate a wide variety of data access interleavings, which is essential for co-verification of modern multi-processor systems.
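
The effect of randomized injection timing can be illustrated with a short sketch. It assumes a simplified model in which each core injects its next packet a random number of cycles after its previous one; the function `interleaving`, the `max_delay` parameter, and the two toy programs are hypothetical and are not taken from the paper.

```python
import random

def interleaving(seed, programs, max_delay=5):
    """Return the global packet order produced by one randomized run."""
    rng = random.Random(seed)  # seeded, so any run can be replayed
    events = []
    for core, packets in programs.items():
        cycle = 0
        for pkt in packets:
            cycle += rng.randint(1, max_delay)  # random injection delay
            events.append((cycle, core, pkt))
    events.sort()  # order by cycle, ties broken by core id
    return [(core, pkt) for _, core, pkt in events]

programs = {0: ["W x", "R y"], 1: ["W y", "R x"]}
seen = {tuple(interleaving(seed, programs)) for seed in range(200)}
print(len(seen), "distinct interleavings observed")
```

Per-core program order is preserved in every run, while the relative order of the two cores' accesses varies with the seed. Recording the seed makes a failing interleaving reproducible.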
Cites background from "A case of system-level hardware/sof..."
...Such approaches also incur highly coordinated design and verification effort by both CPU and GPU vendors that is challenging when multiple vendors wish to integrate existing CPU and GPU designs in a timely manner....
...Building scalable, high-performance cache coherence requires a holistic system that strikes a balance between directory storage overhead, cache probe bandwidth, and application characteristics [8, 24, 33, 36, 54, 55, 58]....
Cites methods from "A case of system-level hardware/sof..."
...As disjoint verification of hardware and software naturally suffers from the considerable manual effort needed for finding a good abstraction that could both be proved to be a refinement of the hardware and be used as a base for verifying the software, some research work has been done to verify hardware/software co-designs as a whole [20,23]....
...modules of embedded systems are realized in software ....
Cites background from "A case of system-level hardware/sof..."
...Another work presents a system-level codesign and coverification case study....
"A case of system-level hardware/sof..." refers background in this paper
...Transactional Memory (TM) is an abstract programming model that aims to greatly simplify parallel programming....