A Reconfigurable Neural Network ASIC for Detector Front-End Data Compression at the HL-LHC
Citations
Which metric on the space of collider events?
Smart sensors using artificial intelligence for on-detector electronics and ASICs
Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml
Electronics for Fast Timing
References
Rectified Linear Units Improve Restricted Boltzmann Machines
Deep Sparse Rectifier Neural Networks
The CMS experiment at the CERN LHC
Machine learning and the physical sciences
The use of triple-modular redundancy to improve computer reliability
Related Papers (5)
Real-time clustering for pixel detectors: The DCE3 ASIC for the PXD detector in the Belle II experiment @KEK
Frequently Asked Questions (18)
Q2. What are the constraints for the compression algorithms?
The constraints for the compression algorithms are that they should accept new input data at 40 MHz and complete processing in 50 ns.
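As a rough illustration of what these two numbers imply, the sketch below converts the 40 MHz input rate into a period and counts how many input periods fit into the 50 ns budget. The function names are hypothetical and only the two figures come from the answer above.

```python
# Timing-budget sketch. Only the two constants below come from the text;
# the helper names are illustrative.
INPUT_RATE_HZ = 40e6        # new input data arrives at 40 MHz
LATENCY_BUDGET_NS = 50.0    # processing must complete within 50 ns


def clock_period_ns(rate_hz: float) -> float:
    """Time between two consecutive inputs, in nanoseconds."""
    return 1e9 / rate_hz


def cycles_in_budget(rate_hz: float, budget_ns: float) -> int:
    """Number of whole input periods that fit inside the latency budget."""
    return int(budget_ns // clock_period_ns(rate_hz))


period = clock_period_ns(INPUT_RATE_HZ)
cycles = cycles_in_budget(INPUT_RATE_HZ, LATENCY_BUDGET_NS)
print(f"input period = {period:.1f} ns, latency budget = {cycles} input periods")
```

Because the budget spans more than one input period, a new input arrives before the previous one has finished, which suggests the processing has to be pipelined rather than run strictly one input at a time.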
Q3. How many operations are required to perform the encoder architecture?
The encoder architecture set by the model requires approximately 225,000 multiply-accumulate operations to perform …
Q4. What are the elements needed to design their compression algorithm?
There are a number of elements needed to design their compression algorithm: a sample of events for training and validation, a preprocessing and normalization block, an optimized NN architecture, and metrics for evaluating the NN performance, both for determining the training loss and the final network evaluation.
Q5. What is the main reason why Mentor has developed an HLS tool?
In recent years, HLS has become an alternative for generating hardware modules from code written in programming languages such as C/C++. HLS comes with significant benefits: it raises the level of abstraction and reduces the simulation time; it simplifies the verification phases; and finally, it makes the exploration and evaluation of design alternatives easier.
Q6. What is the design of the digital converter?
The digital design consists of three major functional blocks: (i) a converter, which is a classical module designed with HLS; (ii) an encoder, which uses hls4ml; and (iii) an I2C peripheral, which uses SystemVerilog RTL code.
Q7. What is the newest version of the ECON-T ASIC?
The ECON-T ASIC is being developed for the LPCMOS (Low Power CMOS) 65 nm feature size technology and is under active development for CMS.
Q8. How can a fixed ASIC be designed?
The authors show that, in spite of a fixed ASIC implementation, ML algorithms can still be designed with sufficient flexibility to enable reconfiguration for new operational conditions.
Q9. How long does it take to implement a digital system?
The digital implementation stage is time intensive, requiring ~65 hours of design and verification to meet the speed and area constraints with fewer iterations.
Q10. What is the design of the ECON-T ASIC?
The design demonstrates how complex NN architectures can be implemented on the front-end ASICs with realistic area constraints, allowing for minimal loss of information in the trigger data stream.
Q11. What is the importance of a rapid co-design loop between the NN algorithm training and the implementation in hardware?
For the automated design tool flow, it is very important to have a rapid co-design loop between the NN algorithm training and the implementation in hardware in order to understand whether the algorithm is meeting system constraints for power, area, and performance simultaneously.
Q12. What is the range of the output bits?
The range of the output bits depends on the location of the sensor module in the detector and the number of links available for a given ECON-T ASIC to transmit the data.
Q13. What is the difference between QAT and PTQ?
QAT is known to be much more performant than post-training quantization (PTQ), where the training is done using 32-bit floating-point operations, which are then truncated post-training to fixed-point or integer representations.
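The post-training step described above can be sketched as follows: weights trained in floating point are truncated onto a fixed-point grid afterwards. This is a minimal illustration of PTQ only, not the authors' flow. The 6-bit width, the 4 fractional bits, and the sample weights are assumptions chosen for the example. In QAT, a quantizer of this kind would instead be applied during training so that the network learns around the rounding error.

```python
import math


def quantize_fixed_point(value: float, total_bits: int, frac_bits: int) -> float:
    """Truncate a float to a signed fixed-point grid; return the float it represents."""
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))        # most negative integer code
    highest = (1 << (total_bits - 1)) - 1    # most positive integer code
    code = math.floor(value * scale)         # truncation toward negative infinity
    code = max(lowest, min(highest, code))   # saturate instead of wrapping around
    return code / scale


# Hypothetical floating-point weights, standing in for a trained model.
weights_fp32 = [0.7312, -0.1189, 0.0421, -0.9876]

# Assumed format for illustration: 6 bits in total, 4 of them fractional.
weights_q = [quantize_fixed_point(w, total_bits=6, frac_bits=4) for w in weights_fp32]
errors = [abs(w - q) for w, q in zip(weights_fp32, weights_q)]

for w, q, e in zip(weights_fp32, weights_q, errors):
    print(f"{w:+.4f} -> {q:+.4f} (error {e:.4f})")
```

With 4 fractional bits the grid spacing is 1/16, so each truncation error stays below 0.0625 as long as the value does not saturate.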
Q14. What is the process for generating an accelerator in HDL RTL code?
The generated code is then fed into the Vivado HLS tool to generate an accelerator in HDL RTL code for the deployment on Xilinx FPGAs [16].
Q15. What is the NN encoder for a high-occupancy regime?
The autoencoder is robust across a variety of conditions and performs well in the high-occupancy regime, which poses the greatest challenge for trigger reconstruction.
Q16. How often are the vector multiplications performed?
… vector multiplications every 25 ns.
Q17. How are the normalized NN inputs quantized?
The normalized NN inputs are truncated to 8 bits to allow a more compact NN implementation, while ensuring that any omitted cells constitute less than 1% of the total energy recorded within a module.
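The effect of an 8-bit truncation on normalized inputs can be sketched as below. The normalization by the module sum, the unsigned 8-bit code, and the cell values are all assumptions made for illustration. The sketch only shows how one might check what fraction of the energy sits in cells that truncate to zero.

```python
def normalize(cells):
    """Divide each cell by the module total (assumed normalization scheme)."""
    total = sum(cells)
    if total <= 0:
        return [0.0] * len(cells)
    return [c / total for c in cells]


def truncate_to_bits(x: float, bits: int = 8) -> int:
    """Map x in [0, 1] to an unsigned integer code with `bits` bits, by truncation."""
    max_code = (1 << bits) - 1
    return min(max_code, int(x * (1 << bits)))


# Made-up cell energies for one module (arbitrary units).
cells = [120.0, 60.0, 15.0, 4.0, 0.5, 0.3, 0.2]

fractions = normalize(cells)
codes = [truncate_to_bits(f) for f in fractions]

# Energy fraction carried by cells whose code became zero, i.e. omitted cells.
omitted = sum(f for f, c in zip(fractions, codes) if c == 0)

print("codes:", codes)
print(f"omitted energy fraction: {omitted:.4f}")
```

In this toy module the three smallest cells fall below 1/256 of the total and are dropped, and together they carry well under 1% of the energy.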
Q18. How many weight parameters are used to configure the dense layer?
This leads to a total of 2,288 weight parameters (dominated by the 2,064 parameters used to configure the dense layer), each of which is specified with …
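The two totals quoted above can be reproduced with simple parameter counting. The layer shapes used below are assumptions, not taken from this page: a convolution with 8 filters of size 3×3 over 3 input channels, and a dense layer mapping 128 inputs to 16 outputs. They were picked because they are consistent with the stated counts.

```python
def conv2d_params(filters: int, kernel_h: int, kernel_w: int, in_channels: int) -> int:
    """Weights plus one bias per filter for a 2D convolution."""
    return filters * kernel_h * kernel_w * in_channels + filters


def dense_params(n_inputs: int, n_outputs: int) -> int:
    """Weights plus one bias per output for a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


# Assumed shapes (hypothetical, chosen to match the totals in the text).
conv = conv2d_params(filters=8, kernel_h=3, kernel_w=3, in_channels=3)
dense = dense_params(n_inputs=128, n_outputs=16)
total = conv + dense

print(f"conv = {conv}, dense = {dense}, total = {total}")
print(f"dense share of parameters = {dense / total:.1%}")
```

Under these assumed shapes the dense layer accounts for about 90% of the parameters, in line with the statement that it dominates the total.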