TL;DR: A synthesis framework that generates a formally verifiable RTL from a high level language using CDFGs from ANSI-C, LRH(+) and VHDL and guarantees detection of hardware redundancy and word-length mismatch related bugs by static code checking.
Abstract: This work presents a synthesis framework that generates a formally verifiable RTL from a high level language. We develop an estimation model for area, delay and power metrics of arithmetic components for Xilinx Spartan 3 FPGA family. Our estimation model works 300 times faster than Xilinx's toolchain with an average error of 6.57% for delay and 3.76% for area estimations. Our framework extracts CDFGs from ANSI-C, LRH(+) \cite{Kurumahmut2009} and VHDL. CDFGs are verified using the symbolic model checker NuSMV \cite{nusmv} with temporal logic properties. This method guarantees detection of hardware redundancy and word-length mismatch related bugs by static code checking.
Current system designs aim to pack as much capability as possible in order to meet application specific demands.
This has increased the importance of design automation and led to the emergence of technologies such as reconfigurable computing, which help cope with growing complexity in both algorithms and design size.
High Level Synthesis (HLS) tools map application descriptions written in a high-level language (HLL) to Register Transfer Level (RTL).
Hence, the uniqueness of their method is to apply static code analysis both to source code and RTL code.
The generation process is explained in Section V-B. Step 2 is their proposed RTL generation process.
II. RELATED WORK
There are several estimation tools for FPGA based implementations.
The method in [6] develops delay and area models at the data flow graph (DFG) level.
Unfortunately, the number of supported operations is limited and the estimation error is high for some applications.
The work in [21] suggests a verification methodology which converts data-path and control-path specification to a proof script.
The concepts of critical states and critical paths are introduced.
III. ESTIMATION MODEL OF ARITHMETIC COMPONENTS
Shrinking time to market and short product lifetimes increase the need for early performance estimation of DSP algorithms during the development and prototyping cycle.
An estimation model for their RH(+) HLS and RTL generation tool is proposed in this section.
Firstly, datapath components are selected according to their delay and area behaviors.
Then, the behavior of these components is measured with the Xilinx XST tool by synthesizing them for the FPGA while varying their parameters.
The estimation methodology is handled in two steps: node estimation and graph estimation.
A. Selection of Arithmetic Components
The estimation is modeled for four basic types of operators: adder, subtractor, multiplier and divider.
The subtypes of the adders/subtractors are inspected for design space exploration.
Unlike in ASICs, ripple-carry adders (RCAs) turn out to be the fastest and smallest adders in FPGAs, because FPGAs have dedicated logic for implementing RCAs.
The carry-select adder (CSLA) has a slight delay advantage over the RCA for bit widths above 128, but no area advantage.
The LogiCore IP Multiplier 11.1 and Divider 3.0 from Xilinx IP Core library [3] are used as multiplier and divider components.
B. Parametric Area and Latency Estimation for Nodes
The arithmetic components are synthesized in Xilinx ISE 13.2 with different port sizes.
For that reason, piecewise polynomial functions are applied for these components.
A hardware behavior description (hbd) file defines the area, delay and power models of arithmetic units.
The Property field specifies whether the model is a delay, area or power model.
The line after FunctionType describes the function in a specific format.
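The piecewise polynomial node models described in this subsection can be illustrated with a minimal Python sketch. It does not reproduce the hbd file format. The breakpoints, coefficients and function names below are hypothetical placeholders, not measured Spartan 3 data.

```python
from bisect import bisect_right

# Hypothetical piecewise model of one component property (e.g. delay in ns)
# as a function of port width. All numbers are made-up placeholders.
BREAKPOINTS = [16, 64]        # segments: width < 16, 16 <= width < 64, width >= 64
POLYS = [
    [1.0, 0.10],              # 1.0 + 0.10*w
    [1.8, 0.05],              # 1.8 + 0.05*w
    [3.4, 0.02, 0.0001],      # 3.4 + 0.02*w + 0.0001*w^2
]


def eval_poly(coeffs, w):
    """Evaluate a polynomial (lowest-order coefficient first) with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * w + c
    return result


def estimate(width):
    """Select the segment that contains `width` and evaluate its polynomial."""
    if width <= 0:
        raise ValueError("port width must be positive")
    segment = bisect_right(BREAKPOINTS, width)
    return eval_poly(POLYS[segment], width)
```

Calling `estimate(8)` uses the first segment, while `estimate(100)` falls into the quadratic segment. One table of this kind would be needed per component and per property (delay, area, power).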
C. Area and Latency Estimation of the CDFG
With node estimation, the latency and area of every vertex in the CDFG are estimated.
For the estimation of the overall performance, a complete estimation of the CDFG is necessary.
Their estimation model is compared with [6] and [7]. The model in [6] makes its estimates from general information, such as how many components are used and the average input length of the operators.
In their methodology, the authors calculate the delay and area costs of every component in the datapath, obtain the area estimate by summing the area costs, and obtain the delay estimate by finding the critical path through graph processing.
Their model's estimates are more accurate, especially when the application's critical path is not proportional to graph size, because Enzler's model [6] does not use the critical path in its calculations.
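The graph-level step described above (area as a sum over nodes, delay as the critical path) can be sketched as follows. This is an illustrative sketch assuming an acyclic graph and per-node estimates that are already available. The function name and data layout are assumptions, not the RH(+) tool's interface.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def estimate_graph(nodes, edges):
    """nodes: {name: (area, delay)}; edges: iterable of (src, dst) pairs.

    Returns (total_area, critical_path_delay) for an acyclic datapath graph.
    """
    preds = {name: set() for name in nodes}
    for src, dst in edges:
        preds[dst].add(src)

    # Area estimate: sum of the per-node area estimates.
    total_area = sum(area for area, _ in nodes.values())

    # Delay estimate: longest path, computed in topological order.
    arrival = {}
    for name in TopologicalSorter(preds).static_order():
        start = max((arrival[p] for p in preds[name]), default=0.0)
        arrival[name] = start + nodes[name][1]
    critical_delay = max(arrival.values(), default=0.0)
    return total_area, critical_delay


# Hypothetical example: two multipliers feeding a chain of two adders.
example_nodes = {"m1": (120, 9.0), "m2": (120, 9.0), "a1": (16, 3.0), "a2": (16, 3.0)}
example_edges = [("m1", "a1"), ("m2", "a1"), ("a1", "a2")]
```

In this example the area is the sum over all four nodes, while the delay follows only the longest chain (one multiplier, then both adders). An estimator based on component counts alone cannot capture that difference.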
IV. RTL GENERATION
The dataflow graphs that are generated by the RH(+) tool are synthesized to VHDL as Golden RTL.
Golden RTL is the datapath circuit without any multiplexers or resource sharing.
Golden RTLs are used for verification of design units.
In the graph, every vertex represents an arithmetic operation and edges represent the signals between the operations.
The Golden RTL generation program consists of two main functions.
A. HDL Generation
The authors generate necessary adder, subtractor, multiplier and divider VHDL files as components and connect them at the top level.
The user can integrate an operator to the RH(+) framework by putting template and parameter files in the library.
The parameter files and template files are used for generating the arithmetic components by the RTL generator software.
After every necessary component is generated, these components are connected at the top level.
Neither the inputs nor the outputs are registered.
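As a rough illustration of template-based generation and top-level connection, the following Python sketch emits a purely combinational structural VHDL top level. The entity names (`multiplier`, `adder`), the port names `a`, `b`, `y`, the `WIDTH` generic and the function `generate_top` are all hypothetical. The actual template and parameter file formats of the RH(+) library are not shown in the text.

```python
from string import Template

# Hypothetical instantiation template; a real library would load this from a file.
INSTANCE = Template(
    "  $label : entity work.$component\n"
    "    generic map (WIDTH => $width)\n"
    "    port map (a => $a, b => $b, y => $y);"
)


def generate_top(name, width, inputs, output, ops):
    """ops: list of (label, component, in_a, in_b, out_signal) tuples.

    Every signal uses the same width here; word-length growth is ignored.
    """
    sig_type = f"std_logic_vector({width - 1} downto 0)"
    ports = [f"    {p} : in  {sig_type}" for p in inputs]
    ports.append(f"    {output} : out {sig_type}")
    # Signals that connect two operators and are not the top-level output.
    internal = [op[4] for op in ops if op[4] != output]

    lines = [
        "library ieee;",
        "use ieee.std_logic_1164.all;",
        "",
        f"entity {name} is",
        "  port (",
        ";\n".join(ports),
        "  );",
        f"end entity {name};",
        "",
        f"architecture structural of {name} is",
    ]
    lines += [f"  signal {s} : {sig_type};" for s in internal]
    lines.append("begin")
    for label, component, a, b, y in ops:
        lines.append(INSTANCE.substitute(
            label=label, component=component, width=width, a=a, b=b, y=y))
    lines.append("end architecture structural;")
    return "\n".join(lines)
```

For example, `generate_top("top", 8, ["a", "b", "c"], "y", [("u_mul", "multiplier", "a", "b", "t1"), ("u_add", "adder", "t1", "c", "y")])` yields one multiplier feeding one adder, with no clock and no registers on inputs or outputs.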
V. FORMAL VERIFICATION OF GOLDEN RTL
CDFG represents all the paths that might be traversed through a program during its execution.
Each node in the CDFG represents operations or control structures.
Each edge represents the data dependency between operations.
Since it lies between the RTL and the high-level representation, it is where most of the optimizations, reductions and decisions about the system are made.
By using Computation Tree Logic (CTL) and Linear Temporal Logic (LTL) properties [14], bugs are detected right before converting the CDFG to RTL.
A. CDFG Structure
Vertex and Edge data structures are defined to represent the CDFG.
Vertex data structure holds the necessary information of operations and variables.
They can either be registers, internal or external memory based on the architecture.
It helps to identify dependencies between components.
Apart from pointers to and from Vertices, only a single property is necessary: Name, the name of the Edge.
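A minimal sketch of such Vertex and Edge structures is given below. Only the Edge's name and its pointers to and from Vertices come from the text. The Vertex fields (`kind`, `incoming`, `outgoing`) and the helper `connect` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through the graph
class Vertex:
    name: str
    kind: str  # hypothetical field: "operation" or "variable"
    incoming: List["Edge"] = field(default_factory=list, repr=False)
    outgoing: List["Edge"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Edge:
    name: str
    source: Optional[Vertex] = None
    target: Optional[Vertex] = None


def connect(name, source, target):
    """Create an Edge and register it on both of its end Vertices."""
    edge = Edge(name, source, target)
    source.outgoing.append(edge)
    target.incoming.append(edge)
    return edge
```

With these links, the dependencies of an operation can be read directly from `vertex.incoming`, and its consumers from `vertex.outgoing`.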
B. Generation of NuSMV Representation
NuSMV [2] is a symbolic model checker developed as the reimplementation and extension of SMV [22].
Hence, this flag is true whenever the variables are initialized or assigned and false otherwise.
Each operation has different contribution to wordlengths.
Based on the precedence levels and transitions in CDFG, the authors make next states assignments.
This routine makes all the initial assignments of the variables and the operators – GenerateState:.
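The idea of emitting initial and next-state assignments can be sketched in Python as follows. This is an assumption-laden illustration, not the authors' generator. The `_set` flag naming, the single `state` counter that steps through precedence levels, and the function names are all hypothetical.

```python
def generate_init(levels):
    """Emit init() assignments: inputs (level 0) start as already set."""
    lines = ["  init(state) := 0;"]
    for var, level in levels.items():
        value = "TRUE" if level == 0 else "FALSE"
        lines.append(f"  init({var}_set) := {value};")
    return lines


def generate_state(levels, last):
    """Emit next() assignments: a flag turns TRUE when its level is reached."""
    lines = [
        "  next(state) := case",
        f"    state < {last} : state + 1;",
        "    TRUE : state;",
        "  esac;",
    ]
    for var, level in levels.items():
        if level == 0:
            lines.append(f"  next({var}_set) := {var}_set;")
        else:
            lines += [
                f"  next({var}_set) := case",
                f"    state = {level - 1} : TRUE;",
                f"    TRUE : {var}_set;",
                "  esac;",
            ]
    return lines


def generate_model(levels):
    """levels: {variable: precedence level at which it is assigned}."""
    last = max(max(levels.values()), 1)
    decls = [f"  state : 0..{last};"]
    decls += [f"  {var}_set : boolean;" for var in levels]
    return "\n".join(
        ["MODULE main", "VAR"] + decls + ["ASSIGN"]
        + generate_init(levels) + generate_state(levels, last)
    )
```

For instance, `generate_model({"a": 0, "b": 0, "t1": 1, "y": 2})` produces a small NuSMV module in which `t1_set` becomes true after the first step and `y_set` after the second.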
C. Temporal Logic Properties
The temporal properties are written either in CTL or in LTL.
Similar properties are used for software verification in [23].
Therefore, by reducing unnecessary variables, power consumption, area and latency are improved.
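To make the flavor of such properties concrete, the sketch below builds a few NuSMV property lines in Python. These are illustrative examples only and are not the properties used in the paper. The `_set` and `_used` flag names are assumed boolean variables of a hypothetical model, and `make_specs` is a made-up helper.

```python
def make_specs(flags, outputs):
    """flags: {variable: (set_flag, used_flag)}; outputs: list of set-flags.

    Returns NuSMV property lines as strings.
    """
    specs = []
    for _var, (set_flag, used_flag) in flags.items():
        # CTL: on every path, a variable is never used before it is set.
        specs.append(f"SPEC AG ({used_flag} -> {set_flag})")
        # CTL: once set, a variable is used on some path; a failure
        # would hint at a redundant variable.
        specs.append(f"SPEC AG ({set_flag} -> EF {used_flag})")
    for out in outputs:
        # LTL: every output is eventually assigned.
        specs.append(f"LTLSPEC F {out}")
    return specs
```

When a property of this kind fails, the model checker reports a counterexample trace, which points at the variable and the step where the violation occurs.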
D. Backward Process: Generation of CDFG from RTL
The generation of CDFG from RTL and generation of NuSMV from the new CDFG enables the designer to verify the properties on the generated RTL.
Therefore, after this process, there will be two CDFGs generated: one from the HLL and the other from the generated RTL.
Hence, the same properties are expected to yield the same results.
The authors apply static analysis on the generated RTL.
By parsing structural mappings and signal/wire connections from the RTL, and by querying components from the RH(+) component database, the authors build the second CDFG.
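A simplified sketch of this backward step is shown below. It assumes a restricted structural VHDL style (direct entity instantiation with named port associations) and a hypothetical component database that records which ports are outputs. The regular expressions, `COMPONENT_DB` and `build_cdfg` are illustrative and are not the authors' parser.

```python
import re

# Hypothetical stand-in for the component database: output ports per component.
COMPONENT_DB = {
    "adder": {"outputs": {"y"}},
    "multiplier": {"outputs": {"y"}},
}

# Matches "label : entity work.component ... port map ( ... );"
INSTANCE_RE = re.compile(
    r"(\w+)\s*:\s*entity\s+work\.(\w+).*?port\s+map\s*\((.*?)\)\s*;",
    re.IGNORECASE | re.DOTALL,
)
# Matches one named association "port => signal".
ASSOC_RE = re.compile(r"(\w+)\s*=>\s*(\w+)")


def build_cdfg(vhdl):
    """Return sorted (producer, consumer, signal) edges between instances."""
    producers = {}   # signal name -> label of the instance driving it
    consumers = []   # (signal name, label of the instance reading it)
    for label, component, port_map in INSTANCE_RE.findall(vhdl):
        outputs = COMPONENT_DB[component.lower()]["outputs"]
        for port, signal in ASSOC_RE.findall(port_map):
            if port.lower() in outputs:
                producers[signal] = label
            else:
                consumers.append((signal, label))
    return sorted(
        (producers[sig], label, sig)
        for sig, label in consumers
        if sig in producers
    )
```

Signals without a producing instance are treated as primary inputs and yield no edge. A fuller implementation would add Vertices for them so that the reconstructed graph can be compared with the CDFG obtained from the HLL.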
VI. EXPERIMENTS
The authors use five benchmarks to evaluate their framework: Elliptic wave filter (EW), Auto Regression (AR), Discrete Cosine Transform (DCT), Infinite impulse response filter (IIR) and Differential Equation (DE).
These algorithms are coded in LRH(+) and CDFGs are generated by RH(+) compiler.
The authors compare their estimation of area and delay results with Xilinx ISE 13.2 synthesis, place and route reports.
The benchmarks have a maximum of twenty-eight operations.
By following the states in the counterexample output, one can trace the path that violates the property being verified.
VII. CONCLUSION
The authors show the process of a data-path synthesis framework which produces formally verifiable Register-Transfer-Level (RTL) logic from high level languages such as ANSI-C and LRH(+).
Estimation of delay, area and power for the Xilinx Spartan 3 FPGA was realized in order to speed up the design phase.
The authors benchmark five different applications for modelling, estimation and verification performance.
The created RTL is verified by four different temporal logic properties for checking redundant hardware creation and wordlength related mismatches.
The CPU run time of generation was observed to depend on both the number of operator vertices and the number of edges in the graphs.
TL;DR: The tool supports almost all ANSI-C language features, including pointer constructs, dynamic memory allocation, recursion, and the float and double data types, and is integrated into a graphical user interface.
Abstract: We present a tool for the formal verification of ANSI-C programs using Bounded Model Checking (BMC). The emphasis is on usability: the tool supports almost all ANSI-C language features, including pointer constructs, dynamic memory allocation, recursion, and the float and double data types. From the perspective of the user, the verification is highly automated: the only input required is the BMC bound. The tool is integrated into a graphical user interface. This is essential for presenting long counterexample traces: the tool allows stepping through the trace in the same way a debugger allows stepping through a program.
1,425 citations
"A Verifiable High Level Data Path S..." refers background in this paper
...Model checking of high level programs have existed for some time [15], [16]....
TL;DR: The symbolic model checking technique revealed subtle errors in this protocol, resulting from complex execution sequences that would occur with very low probability in random simulation runs, and an alternative method is developed for avoiding the state explosion in the case of asynchronous control circuits.
Abstract: Finite state models of concurrent systems grow exponentially as the number of components of the system increases. This is known widely as the state explosion problem in automatic verification, and has limited finite state verification methods to small systems. To avoid this problem, a method called symbolic model checking is proposed and studied. This method avoids building a state graph by using Boolean formulas to represent sets and relations. A variety of properties characterized by least and greatest fixed points can be verified purely by manipulations of these formulas using Ordered Binary Decision Diagrams.
Theoretically, a structural class of sequential circuits is demonstrated whose transition relations can be represented by polynomial space OBDDs, though the number of states is exponential. This result is borne out by experimental results on example circuits and systems. The most complex of these is the cache consistency protocol of a commercial distributed multiprocessor. The symbolic model checking technique revealed subtle errors in this protocol, resulting from complex execution sequences that would occur with very low probability in random simulation runs.
In order to model the cache protocol, a language was developed for describing sequential circuits and protocols at various levels of abstraction. This language has a synchronous dataflow semantics, but allows nondeterminism and supports interleaving processes with shared variables. A system called SMV can automatically verify programs in this language with respect to temporal logic formulas, using the symbolic model checking technique.
A technique for proving properties of inductively generated classes of finite state systems is also developed. The proof is checked automatically, but requires a user supplied process called a process invariant to act as an inductive hypothesis. An invariant is developed for the distributed cache protocol, allowing properties of systems with an arbitrary number of processors to be proved.
Finally, an alternative method is developed for avoiding the state explosion in the case of asynchronous control circuits. This technique is based on the unfolding of Petri nets, and is used to check for hazards in a distributed mutual exclusion circuit.
TL;DR: A method of compositional verification is presented that uses the combination of temporal case splitting and data type reductions to reduce types of infinite or unbounded range to small finite types, and arrays of infinite or unbounded size to small fixed-size arrays.
Abstract: A method of compositional verification is presented that uses the combination of temporal case splitting and data type reductions to reduce types of infinite or unbounded range to small finite types, and arrays of infinite or unbounded size to small fixed-size arrays. This supports the verification by model checking of systems with unbounded resources and uninterpreted functions. The method is illustrated by application to an implementation of Tomasulo's algorithm, for arbitrary or infinite word size, register file size, number of reservation stations and number of execution units.
191 citations
"A Verifiable High Level Data Path S..." refers background in this paper
...There also exist off the shelf and research-level hardware model checkers at the netlist level [17], [18] and at CDFG level [19] with standard logic assertions....
TL;DR: An area and delay estimator in the context of a compiler that takes in high level signal and image processing applications described in MATLAB and performs automatic design space exploration to synthesize hardware for a field programmable gate array (FPGA) which meets the user area and frequency specifications.
Abstract: We present an area and delay estimator in the context of a compiler that takes in high level signal and image processing applications described in MATLAB and performs automatic design space exploration to synthesize hardware for a field programmable gate array (FPGA) which meets the user area and frequency specifications. We present an area estimator which is used to estimate the maximum number of configurable logic blocks (CLBs) consumed by the hardware synthesized for the Xilinx XC4010 from the input MATLAB algorithm. We also present a delay estimator which finds out the delay in the logic elements in the critical path and the delay in the interconnects. The total number of CLBs predicted by us is within 16% of the actual CLB consumption and the synthesized frequency estimated by us is within an error of 13% of the actual frequency after synthesis through Synplify logic synthesis tools and after placement and routing through the XACT tools from Xilinx. Since the estimators proposed by us are fast and accurate enough, they can be used in a high level synthesis framework like ours to perform rapid design space exploration.
88 citations
"A Verifiable High Level Data Path S..." refers methods in this paper
...In [8], an estimation technique is proposed dealing with a MATLAB specification....
TL;DR: This paper proposes a high-level estimation methodology for area and performance parameters of regular FPGA designs to be found in multimedia, telecommunications or cryptography and presents the estimation approach as well as evaluation results that prove the suitability of the proposed estimation approach.
Abstract: Field-programmable gate arrays (FPGAs) have become increasingly interesting in system design and due to the rapid technological progress ever larger devices are commercially affordable. These trends make FPGAs an alternative in application areas where extensive data processing plays an important role. Consequently, the desire emerges for early performance estimation in order to quantify the FPGA approach and to compare it with traditional alternatives.
In this paper, we propose a high-level estimation methodology for area and performance parameters of regular FPGA designs to be found in multimedia, telecommunications or cryptography. The goal is to provide a means that allows early quantification of an FPGA design and that enables early trade-off considerations. We present our estimation approach as well as evaluation results, which are based on several implemented applications and prove the suitability of the proposed estimation approach.
87 citations
"A Verifiable High Level Data Path S..." refers background or methods in this paper
...Generation of Golden RTL program consist of two main functions....
[...]
...The method in [6] develops delay and area model at the Data flow graph level (DFG)....
[...]
...Golden RTLs are used for verification of design units....