Accelerating thermal simulations of 3D ICs with liquid cooling using neural networks
Summary
1. INTRODUCTION
- GPUs have recently become a computing mainstay in many applications beyond graphics processing.
- The authors create and train a NN using 3D-ICE [4, 5], a conventional compact model-based thermal simulator for liquid-cooled 3D ICs.
- The authors further improve the reduction techniques to find the optimal configuration of the NN-based simulator and maximize the speed-ups.
- Experimental results are presented in Section 5.
2. THERMAL MODELING FOR LIQUID-COOLED 3D ICS
- Interlayer liquid cooling of 3D ICs is accomplished using microchannels that are etched behind each die that is stacked in the IC.
- Cold fluid is injected via a reservoir from one end (the inlet) and warm fluid exits into another reservoir at the other end (the outlet).
- The equivalent RC circuit that models the liquid-cooled 3D IC is represented by the following ordinary differential equations: GX(t) + CẊ(t) = U(t), (1) where G and C are, respectively, the conductance and capacitance matrices.
- Once the system has been formulated in this manner, it is solved via numerical integration using the backward Euler method.
- In order to reduce the problem size, a modified version of the above thermal model based on porous media [5] was used in this work.
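A backward Euler step for Eq. (1) amounts to solving (G + C/h) X_{k+1} = U_{k+1} + (C/h) X_k at each time step. The following is a minimal NumPy sketch with made-up 2-node matrices, not the paper's porous-medium model:

```python
import numpy as np

def backward_euler_step(G, C, x, u_next, h):
    """One backward Euler step of G x + C dx/dt = u:
    solve (G + C/h) x_{k+1} = u_{k+1} + (C/h) x_k."""
    A = G + C / h
    b = u_next + (C / h) @ x
    return np.linalg.solve(A, b)

# Tiny 2-node RC network (illustrative values, not from the paper)
G = np.array([[2.0, -1.0], [-1.0, 2.0]])   # conductance matrix
C = np.eye(2) * 0.5                         # capacitance matrix
x = np.zeros(2)                             # initial thermal state
u = np.array([1.0, 0.0])                    # constant power input
for _ in range(1000):
    x = backward_euler_step(G, C, x, u, h=0.01)
print(x)
```

The fixed point of this iteration satisfies G X = U, i.e. the steady-state temperatures, which is a quick sanity check for an implementation.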
3. REVIEW OF NN-BASED SIMULATOR FOR CONVENTIONAL ICS
- An NN-based thermal simulator for conventional air-cooled 2D/3D ICs was proposed in [9].
- The authors proposed to train a neural network to reproduce the dependence in Eq. (2): each neuron in the model represents the thermal state of a given point (node or thermal cell) in the volume of the IC.
- For this training, the neural network must be given training samples that describe, for each neuron, the dependencies between the inputs and the corresponding output.
- The key idea in this process is to remove from Eq. (4) all the terms in both the summations on the RHS that do not contribute significantly to the output.
- Another innovation in this simulator was the training set: instead of samples representing a real scenario with a specific floorplan, power traces with random values between zero and the maximum power density of the design are fed to the neural network during training.
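That training-set construction can be sketched as below; the cell count matches the 2520 neurons and the 90 W/cm² bound reported in the experimental setup, while the number of samples is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_power_traces(n_cells, n_steps, p_max):
    """Training inputs: independent random power values in [0, p_max]
    per cell per step, not tied to any specific floorplan."""
    return rng.uniform(0.0, p_max, size=(n_steps, n_cells))

traces = random_power_traces(n_cells=2520, n_steps=1000, p_max=90.0)  # W/cm^2
print(traces.shape)
```

Because the samples span the whole admissible power range, the trained weights are not biased toward any single floorplan.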
4. PROPOSED NN-BASED SIMULATOR FOR LIQUID-COOLED 3D ICS
- The NN-based simulator for liquid-cooled 3D ICs proposed in this work is trained in a similar manner to the model described in the preceding section.
- Thus, neurons represent the temperatures of nodes in the active layers.
- The neural network is constructed as mentioned before and trained to mimic the behavior of Eq. (2).
- The nature of heat flow in liquid-cooled 3D ICs is fundamentally different from conventional ICs and the model must be adapted to capture these different heat dissipation paths.
- The aspects of the proposed NN-based simulator are discussed in the ensuing subsections.
4.1 Training methodology
- There are several algorithms available in the literature for training neural networks.
- Since the objective here is to train the NN weights in Eq. (4) to mimic the behavior of the deterministic system in Eq. (2) from a set of inputs and outputs, a batch training algorithm such as RPROP, which was used in [9], must be used.
- Also, given that there are as many unknown weights to be computed per neuron as there are inputs for it, a minimum number of training samples must be provided to the training algorithm for accurate estimation of the weights.
- Since it is common to design a liquid-cooled 3D IC for variable flow rates, training must be repeated for each flow rate value intended in the final design, which makes iterative training costly.
- To keep this cost manageable, the authors instead compute the weights directly with the least-squares method using QR decomposition.
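Per neuron, the weight computation is then a linear least-squares problem solvable directly via QR factorization; a sketch with synthetic samples standing in for the real training data:

```python
import numpy as np

def train_neuron_weights(A, y):
    """Solve A w ~= y in the least-squares sense via QR factorization.
    A: (samples x inputs) training inputs for one neuron,
    y: (samples,) target output temperatures for that neuron."""
    Q, R = np.linalg.qr(A)           # thin QR factorization
    w = np.linalg.solve(R, Q.T @ y)  # triangular solve
    residual = np.linalg.norm(A @ w - y, ord=np.inf)
    return w, residual

# Synthetic check: recover known weights from noiseless samples
rng = np.random.default_rng(1)
w_true = rng.normal(size=8)
A = rng.normal(size=(64, 8))         # more samples than unknowns
w, res = train_neuron_weights(A, A @ w_true)
print(res)
```

The returned residual is the same kind of quantity the summary later uses as a training-time indicator of run-time accuracy.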
4.2 Proximity-based reduction
- The heat flow patterns in liquid-cooled 3D ICs are fundamentally different from those in conventional air-cooled ICs.
- In liquid-cooled 3D ICs, there are indeed two major paths of heat flow as shown in Figure 3: one vertical, from the active layers towards the microchannel heat sinks, and one along the microchannels, the path along which heat is carried by the coolant.
- Two cases are shown in Figure 3: for a neuron in the bottom-right corner of the IC, the proximity region is much smaller than that of the neuron in the top-left corner.
- This residual value serves as an indicator of the run-time accuracy of the NN-based simulator that has been constructed.
- Then, for all the remaining neurons, it extracts the target training output (line 6) and computes their weights as the unknowns of a linear system (line 7).
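As a rough sketch of the reduction idea, the selection of contributing inputs can be expressed as a distance mask; the 1-D geometry and the upstream-only assumption here are illustrative simplifications, not the paper's exact criterion:

```python
import numpy as np

def proximity_mask(positions, target_idx, proximity):
    """Keep only cells whose distance from the target cell along the
    channel is at most `proximity`, on the upstream side (the coolant
    carries heat from inlet to outlet). Hypothetical 1-D sketch."""
    d = positions[target_idx] - positions
    return (d >= 0) & (d <= proximity)

positions = np.arange(10) * 1.0      # cell centres along one channel
mask = proximity_mask(positions, target_idx=7, proximity=3.0)
print(mask)   # cells 4..7 retained
```

Dropping the masked-out terms from the summations in Eq. (4) is what shrinks the per-neuron linear system and, later, the sparse weight matrix.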
4.3 Optimal proximity profile
- Due to the sensible heat absorption in microchannel heat sinks, temperatures increase as the distance from the inlet increases.
- In addition to causing nodes near the outlet to be considerably hotter than those near the inlet, this phenomenon causes unequal lateral spreading of heat in the ICs from inlet to outlet.
- On the other hand, using extremely large constant proximity distances to compensate for these errors near the outlet increases the complexity of the entire simulator, while being an overkill for neurons near the inlet.
- The idea behind this algorithm is to use the given proximity to compute the weights only for the neurons in the last row and save the maximum value of the residual found in this first part (lines 1-10).
- Using Figure 3 as reference, MinLength is the value that turns the proximity region of the neuron in the bottom layer into a region as large as the one that belongs to the neuron in the top layer.
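The profile search described above can be sketched as follows; `residual_fn` is a hypothetical stand-in for retraining one row of neurons and returning its worst residual, and the toy model below merely mimics proximity requirements growing toward the outlet:

```python
def optimal_proximity_profile(residual_fn, rows, base_proximity,
                              min_length, max_res):
    """For each row of neurons (ordered inlet -> outlet), grow the
    proximity in steps of `min_length` until the training residual
    drops to at most `max_res` (the bound found on the last row)."""
    profile = []
    for row in rows:
        proximity = base_proximity
        while residual_fn(row, proximity) > max_res:
            proximity += min_length
        profile.append(proximity)
    return profile

# Toy residual model: rows farther from the inlet need larger proximity
toy = lambda row, prox: 1.0 / (1.0 + prox - 0.5 * row)
profile = optimal_proximity_profile(toy, rows=range(4), base_proximity=1.0,
                                    min_length=0.5, max_res=0.4)
print(profile)
```

The resulting profile is monotone toward the outlet, matching the unequal lateral heat spreading described above.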
4.4 Running the NN-based simulator
- Once the proposed NN-based simulator is trained, it can be used to simulate the temperatures of the target liquid-cooled 3D IC using matrix-vector multiplications.
- Essentially, the weights computed during the training are stored in the form of a sparse matrix in the GPU global memory.
- Then, the power traces and the thermal states are sent from the CPU memory whenever needed.
- At each time step, the simulator assembles a vector of inputs, which gets multiplied by the weights to obtain the thermal state in the next time step.
- In their implementation, the authors used the cuSPARSE library [11] for these computations on their GPU platform.
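The run-time loop thus reduces to repeated sparse matrix-vector products; below is a CPU sketch with SciPy standing in for cuSPARSE, with all sizes and values illustrative:

```python
import numpy as np
from scipy.sparse import random as sprandom, csr_matrix

def simulate(W, state0, power_traces):
    """Run the trained network: at each step the sparse weight matrix W
    multiplies the concatenated [previous state, current power] vector."""
    x = state0
    history = []
    for p in power_traces:
        x = W @ np.concatenate([x, p])
        history.append(x)
    return np.array(history)

n = 100                                   # thermal cells (illustrative)
rng = np.random.default_rng(2)
# Sparse weight matrix: n outputs, 2n inputs (previous state + power)
W = csr_matrix(sprandom(n, 2 * n, density=0.05, random_state=2) * 0.01)
out = simulate(W, np.zeros(n), rng.uniform(0, 1, size=(50, n)))
print(out.shape)
```

On the GPU, the same loop maps to cuSPARSE SpMV calls on a weight matrix kept resident in global memory, with only power traces and states transferred from the CPU.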
5. EXPERIMENTAL RESULTS
- The structural properties of the stack are shown in Table 1.
- This fixes the number of neurons in the network to 2520, as the authors define a neuron for each thermal cell in the three active layers.
- During the training phase, data was generated using a randomized floorplan with random power traces bounded by a maximum heat flux of 90 W/cm².
- In the following experiments, “LLL” refers to the case when all the dies have the floorplan configuration “L” in Figure 4.
5.1 Computation of Optimal Proximity
- During each training, the maximum value of residuals for each neuron was stored.
- As can be seen from this figure, the maximum residual from training and the maximum run-time error show the same trend.
- Figure 6 shows the optimal proximity profile computed both using Algorithm 2 and using run-time error.
- In each case, the authors measure the maximum runtime error as a function of distance along the channel.
- Hence, the authors conclude that Algorithm 2 is a reliable method to compute the optimal proximity profile during training, and the run-time errors resulting from it are less than the intended tolerance.
5.2 Error analysis
- Figure 7 shows the evolution of the maximum run-time error incurred by the proposed NN-based simulator (trained using Algorithm 2), when it is used to simulate Test 3D IC with CCC configuration.
- Errors in this figure are reported for the three different flow rates.
- As can be seen from these plots, the error decreases in each case with increasing proximity values.
- Figure 8 reports the maximum run-time error obtained when the NN-based simulator was used to simulate the Test 3D IC for the same flow rate (36 ml/min) but with different floorplan configurations.
- As can be seen, the run-time error is independent of the floorplan configuration.
5.3 Performance analysis
- To compare the performances of the two training techniques (Algorithms 1 and 2), the authors measured the simulation speed-ups of NN-based simulators trained with each technique running on the GPU, against 3D-ICE running on the CPU.
- This experiment was repeated for different maximum proximity values and flow rates.
- Figure 9 shows the increase (in percentage) of the speedup obtained with the introduction of training Algorithm 2 over Algorithm 1.
- Finally, to illustrate the scalability of their neural network-based thermal simulator, Figure 10 shows how the GPU speedup changes with increasing number of dies (in other words, the problem size), increasing flow rate, and training for various error tolerances (1.0°C, 0.5°C, and 0.1°C).
- In all these experiments, the dies are interleaved with channel cavities, similar to the Test 3D IC.
6. CONCLUSIONS
- In this work the authors presented a Neural Network-based thermal model that can be run on massively parallel architectures like GPUs to accelerate thermal simulations of 3D ICs with liquid cooling.
- The authors also introduced a new training technique that exploits the horizontal heat flow due to the coolant passing through the channels to reduce the simulation time without worsening the run time error.
- Results show that the speedups over a compact model running on a CPU range from 35x, when the run-time error must stay under 0.1°C, up to 106x when errors below 1.0°C are acceptable.