## Dynamic Thermal Management for FinFET-Based Circuits Exploiting the Temperature Effect Inversion Phenomenon

Woojoo Lee, Yanzhi Wang, Tiansong Cui, Shahin Nazarian and Massoud Pedram
University of Southern California, CA, USA
{woojoole, yanzhiwa, tcui, snazaria, pedram}@usc.edu

#### ABSTRACT

Due to limits on the availability of the energy source in many mobile user platforms (ranging from handheld devices to portable electronics to deeply embedded devices) and concerns about how much heat can effectively be removed from chips, minimizing the power consumption has become a primary driver for system-on-chip designers. Because of their superb characteristics, FinFETs have emerged as a promising replacement for planar CMOS devices in sub-20nm CMOS technology nodes. However, based on extensive simulations, we have observed that the delay vs. temperature characteristics of FinFET-based circuits are fundamentally different from those of conventional bulk CMOS circuits, i.e., the delay of a FinFET circuit decreases with increasing temperature even in the super-threshold supply voltage regime. Unfortunately, the leakage power dissipation of FinFET-based circuits increases exponentially with the temperature. These two trends give rise to a tradeoff between delay and leakage power as a function of the chip temperature, and hence lead to the definition of an optimum chip temperature operating point (i.e., one that balances concerns about the circuit speed and power efficiency). This paper presents the results of our investigations into the aforesaid temperature effect inversion (TEI) and proposes a novel dynamic thermal management (DTM) algorithm, which exploits this phenomenon to minimize the energy consumption of FinFET-based circuits without any appreciable performance penalty.
Experimental results demonstrate that a 40% energy saving (with no performance penalty) can be achieved by the proposed TEI-aware DTM approach compared to the best-in-class DTMs that are unaware of this phenomenon.

ISLPED'14, August 11–13, 2014, La Jolla, CA, USA. Copyright 2014 ACM 978-1-4503-2975-0/14/08.

## 1. INTRODUCTION

With the dramatic downscaling of layout geometries, the traditional bulk CMOS technology has hit critical roadblocks, namely increasing leakage current and power consumption induced by the short-channel effects (SCEs) and the increasing variability levels. To overcome such drawbacks, FinFET devices, a special kind of quasi-planar double gate (DG) devices, have been proposed as an alternative to bulk CMOS as technology scales down below the 20nm technology node [1, 2]. This is due to the more effective channel control, higher ON/OFF current ratios, and superior voltage scalability of FinFET devices.

DVFS (Dynamic Voltage and Frequency Scaling) is a well-known technique for minimizing power in VLSI designs by reducing the supply voltage and clock frequency to the minimum values that are needed to meet a given performance level. Indeed, a number of recent studies of ultra-voltage scaled designs (i.e., circuits that operate at near/sub-threshold supply voltage levels) have proven the value of voltage scaling to very low supply voltage levels, especially
when the performance targets are loose [3, 4]. The wide-range voltage scalability of FinFET devices enables them to outperform bulk CMOS devices in ultra-low power designs [5].

Meanwhile, as power density has continued to increase with technology scaling, the accompanying high rate of heat generation has become a growing concern. The leakage current of a circuit increases exponentially with increasing temperature [6], and this positive feedback mechanism between leakage power and temperature can result in a thermal runaway situation. Dynamic thermal management (DTM) has been proposed as an effective technique to control the over-heating of the circuit by maintaining the circuit temperature below a critical temperature threshold, while affecting circuit performance as little as possible. Several DTM response mechanisms (control knobs), e.g., fetch-toggling, dynamic thread migration, frequency throttling and DVFS, have been introduced [7, 8, 9]. A few researchers have focused on developing resource management, task assignment, and scheduling policies to achieve the highest performance [6, 10] or the minimum energy consumption [11] under the condition that the target system hardware remains temperature-safe.

The previous DTM works have tackled the question of how to limit the peak temperature on circuit substrates comprised of planar CMOS devices running in the super-threshold voltage regime to save power or maximize performance. To the best of our knowledge, no previous work has studied the question of optimal DTM policy design for FinFET-based VLSI circuits that can operate in any of the super-, near- or sub-threshold regimes. This is an important point because the delay versus temperature behavior of FinFET devices and circuits is different from that of conventional bulk CMOS devices operating in the super-threshold regime.
For a commercial bulk CMOS standard cell library operating at super-threshold $V_{dd}$ supply voltages, the worst-case (longest) path delay occurs at the highest temperature. However, in the near/sub-threshold regime [12, 13] or in high-vt devices [14], it has been reported that the delay of these circuits decreases with increasing temperature. On the other hand, for various circuits designed using the PTM-MG FinFET libraries at and below the 20nm node [15], a first observation from our SPICE simulations is that the circuits run faster at higher temperatures in all supply voltage regimes (including the super-threshold one). We call this the *Temperature Effect Inversion* (TEI) phenomenon. A second observation is that, in the near/sub-threshold regimes, the delay decrease for a fixed amount of die temperature increase is larger in FinFET-based designs than in planar CMOS based designs.

This paper starts by exploring the delay vs. temperature behavior of FinFET-based designs, which forces the worst-case delay of these circuits to occur at low temperatures (e.g., -25°C). Our objective is to minimize the circuit energy consumption without any performance penalty. Given a DVFS schedule derived from the worst-case (at, say, -25°C) delay at various voltage levels, the motivation is to scale down the voltage level when the circuit temperature is high enough that the delay at the lower voltage level is no larger than the worst-case delay at the original higher voltage level.
This method can achieve significant energy reduction without performance penalty for the following three reasons: (i) lowering the voltage level quadratically reduces the dynamic energy of the circuit and also reduces the leakage energy/power, (ii) lowering the voltage level may slow down the temperature rise, or may even reduce the temperature in the presence of a heatsink (e.g., the ambient environment for mobile devices), which exponentially reduces the leakage power, and (iii) the operating frequency determined by the worst-case delay of the original higher voltage can be maintained after the voltage scaling.

Based on in-depth studies of the influence of the TEI on the energy consumption of FinFET circuits and the key idea described above, we present a novel DVFS-based thermal management method to minimize energy consumption with no performance loss. In the proposed DTM, we find the optimal temperature point that maximizes the energy efficiency of the circuits, and introduce new voltage scaling policies to make the circuits operate at the optimal point.

Along with a detailed description of our experimental work, we validate the proposed thermal management algorithm on four different FinFET circuits designed with various PTM-MG technology libraries. We perform SPICE simulations on each circuit with various voltage levels over the full (possible) operating temperature range. Experimental results demonstrate that some 40% energy saving (with no performance penalty) can be achieved by the proposed TEI-aware DTM approach compared to the best-in-class DTMs that are unaware of this phenomenon.

## 2. TEMPERATURE EFFECT INVERSION (TEI) PHENOMENON IN FinFETs

For VLSI circuits, the delay of a logic gate is directly affected by the driving current ($I_{on}$). As $I_{on}$ increases, the logic gate switches faster, and vice versa.
For a conventional MOSFET operating at super-threshold $V_{dd}$ (e.g., 0.9 V), it is well known that rising temperature results in a reduced $I_{on}$ and eventually degrades the speed of the circuit. That is why the worst-case timing corner for a commercial MOSFET standard cell library at super-threshold $V_{dd}$ occurs at the highest temperature (e.g., 125°C).

It has been reported that fabricated FinFETs operating at super-threshold $V_{dd}$ show the opposite behavior to MOSFETs, i.e., $I_{on}$ increases as the die temperature rises [16]. Some FinFET-based circuits based on 32nm PTM have shown similar results when operating at super-threshold $V_{dd}$ [17]. Reference [18] analyzed this opposite temperature influence on $I_{on}$, illustrating that this effect results from the bandgap narrowing and carrier mobility changes, which are induced by the *tensile stress effect* of the insulator in the FinFET structure. As technology scales down (e.g., beyond 30nm), the tensile stress from the insulator layer to the fin body (cf. Figure 2) affects the device characteristics more significantly. In other words, because the thinner fin body has larger stress, the stress-induced bandgap narrowing results in a more significant decrease of the threshold voltage $V_{th}$. Moreover, as the temperature increases, the tensile stress becomes larger, which decreases $V_{th}$ and induces a slight change of the carrier mobility $\mu$ for FinFETs. Finally, the changes of $V_{th}$ and $\mu$ directly affect $I_{on}$ of the FinFET in the super-threshold operation regime.

Figure 2: Three-dimensional structure of the bulk FinFET.
Generally, $I_{on}(T)$ as a function of the temperature $T$ can be expressed as:

$$I_{on}(T) = \begin{cases} \mu(T)e^{\frac{V_{gs} - V_{th}(T)}{S(T)}} & : \text{if } V_{gs} < V_{th} \\ \mu(T)(V_{gs} - V_{th}(T))^{\beta} & : \text{otherwise,} \end{cases} \quad (1)$$

where $V_{gs}$ is the gate-source voltage, $S$ is the subthreshold swing, and $\beta$ is the velocity saturation effect factor. $S$, $\mu$, and $V_{th}$ are the temperature dependent parameters. Due to the tensile stress with rising $T$, the decreasing $V_{th}$ along with a slight change of $\mu$ results in an increasing $I_{on}$, thereby decreasing the delay of the logic gate.

Meanwhile, conventional MOSFETs operating in the sub/near-threshold regime or high-vt devices have shown a similar phenomenon (indeed, more significant than what was observed in FinFETs with super-threshold $V_{dd}$): the circuit delay decreases with increasing temperature [12, 13, 14]. As temperature increases, $\mu$ and $V_{th}$ of MOSFETs decrease while $S$ increases. From (1), $I_{on}$ in the sub-threshold regime is exponentially and dominantly dependent on $V_{th}$ and $S$, which is different from the super-threshold regime, where $I_{on}$ is a nearly linear function of $V_{th}$ and $\mu$. As a consequence, unlike the super-threshold regime, where the slightly stronger effect of $\mu$ than that of $V_{th}$ causes $I_{on}$ to decrease with increasing $T$, the changes of $V_{th}$ and $S$ considerably increase $I_{on}$ in the sub/near-threshold regime, and thus the gate can run much faster.

$I_{on}$ of FinFETs operating in the sub/near-threshold regime has the same exponential dependency on $V_{th}$ and $S$. Combined with the tensile stress effect, FinFETs in the sub/near-threshold regime exhibit a significant delay reduction as the temperature rises. We conclude that a temperature increase makes FinFETs run faster at all supply voltage levels.
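To make the competing temperature dependences in (1) concrete, the following is a minimal Python sketch of the piecewise model. Every parameter value in it is an assumption chosen for illustration only (it is not extracted from the PTM-MG libraries): $\mu(T)$ is modeled as a power law with a hypothetical exponent `mobility_exp`, $V_{th}(T)$ as a linear function with a hypothetical slope, and $S(T)$ as proportional to the thermal voltage. A weak mobility dependence stands in for the FinFET-like case, and a strong one for the bulk-like case.

```python
import math

K_B_OVER_Q = 8.617e-5  # Boltzmann constant over electron charge, in V/K


def on_current(v_gs, temp_k, mobility_exp, vth0=0.30, vth_slope=5e-4,
               swing_factor=1.2, beta=1.3):
    """Relative I_on from Eq. (1); all default values are illustrative guesses."""
    mu = (temp_k / 300.0) ** (-mobility_exp)     # mobility falls with T
    v_th = vth0 - vth_slope * (temp_k - 300.0)   # threshold voltage falls with T
    swing = swing_factor * K_B_OVER_Q * temp_k   # S(T) grows with T
    if v_gs < v_th:
        return mu * math.exp((v_gs - v_th) / swing)
    return mu * (v_gs - v_th) ** beta


def is_increasing(values):
    """True when every value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


temps_k = [248.15, 298.15, 348.15, 398.15]  # -25, 25, 75 and 125 deg C

# Super-threshold, weak mobility dependence: the V_th drop wins (TEI-like).
weak_mu_super = [on_current(0.75, t, mobility_exp=0.3) for t in temps_k]
# Super-threshold, strong mobility dependence: mobility loss wins (bulk-like).
strong_mu_super = [on_current(0.75, t, mobility_exp=1.5) for t in temps_k]
# Sub-threshold: the exponential term dominates even with strong mobility loss.
strong_mu_sub = [on_current(0.20, t, mobility_exp=1.5) for t in temps_k]
```

Under these assumed parameters, `weak_mu_super` and `strong_mu_sub` rise with temperature while `strong_mu_super` falls, which mirrors the qualitative argument above. The numbers themselves carry no physical significance.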
As stated earlier, we call this phenomenon temperature effect inversion (TEI) in FinFETs. Figure 1 shows simulated results from four FinFET technologies: 20nm, 16nm, 14nm and 10nm. We can observe that all the technologies beyond 20nm clearly show the TEI phenomenon. The delay results of each technology are normalized by the delay at the nominal $V_{dd}$ (in the super-threshold regime) at 125°C, which is shown as the dashed line in the figure. We can see that the delay at 125°C is no longer the worst case, but in fact the best case. Rather, the worst-case delay for each $V_{dd}$ level occurs at the lowest temperature (e.g., -25°C).

Figure 1: Delay at different temperatures and supply voltage levels, from FinFET-based FO4 inverter chain simulations.

Figure 3: Leakage power at different temperatures and supply voltage levels, based on the 20nm FinFET technology.

## 3. POWER AND THERMAL MODELS

The power consumption of VLSI circuits has two components: a dynamic part and a static (leakage) part. The dynamic power $P_{dynamic}$ is given by $P_{dynamic} = \alpha C V_{dd}^2 f$, where $\alpha$ is the activity factor, $C$ is the switching capacitance, and $f$ is the clock frequency. It is known that the static power $P_{static}$ depends on the die temperature $T_{die}$ and $V_{dd}$, which can be expressed as:

$$P_{static}(T_{die}, V_{dd}) = V_{dd} \left( c_1 T_{die}^2 e^{\frac{c_2 V_{dd} + c_3}{T_{die}}} + c_4 e^{c_5 V_{dd} + c_6} \right), \quad (2)$$

where the first term is the sub-threshold leakage and the second term is the gate leakage; $c_1$ to $c_6$ are technology dependent parameters [11]. Figure 3 shows the change of $P_{static}$ as a function of $T_{die}$ at different $V_{dd}$ levels, obtained from simulations based on the 20nm bulk FinFETs.

We use the conventional RC-circuit thermal model, which is shown in Figure 4 (a) [19].
In the figure, $P_{circuit}$ denotes the heat generated by the circuit, which is the sum of $P_{dynamic}$ and $P_{static}$; $P_{amb}$ is the heat dissipated to the ambient; $T_{amb}$ is the ambient temperature; and $C_{die}$ and $R_{die-amb}$ are the thermal capacitance of the circuit die and the thermal resistance from the die to the ambient, respectively. Because we target the whole mobile device, modeling the on-chip thermal variations within the device [20] is less critical. Thus we do not account for thermal variations in this paper. Additionally, we do not include a separate heat sink, because our target device has none. Notice that, if we target a large-scale chip equipped with heatsinks or coolers, the spatial thermal variations should be taken into consideration, which may require more sophisticated thermal models and accompanying control logic (e.g., a feedback controller) that is robust to modeling errors. However, these are beyond the scope of this paper.

Applying Kirchhoff equations to the RC-circuit thermal model in Figure 4 (a), we have:

$$C_{die}\frac{dT_{die}}{dt} = P_{circuit} - \frac{T_{die} - T_{amb}}{R_{die-amb}}. \quad (3)$$

Figure 4 (b) shows a conceptual relationship between $P_{circuit}$ and $P_{amb}$, where the two $P_{circuit}$ levels result from the high $V_{dd}$ and the low $V_{dd}$. When $P_{circuit} = P_{amb}$, i.e., $dT_{die}/dt = 0$ in (3), $T_{die}$ is stable. We call this point the equilibrium temperature $T_{eq}$. $T_{eq}^{high}$ and $T_{eq}^{low}$ in the figure denote the equilibrium temperatures for the high $V_{dd}$ case and the low $V_{dd}$ case, respectively. Due to the strong dependence of $P_{static}$ on $T_{die}$ and $V_{dd}$ from (2), the difference between the two $P_{circuit}$ levels for the high $V_{dd}$ and the low $V_{dd}$, which is indicated by the arrows in Figure 4 (b), increases super-linearly with increasing $T_{die}$ and $V_{dd}$.
Similarly, the difference between the two $T_{eq}$ values also follows the super-linear trend for the given $R_{die-amb}$, a fixed design parameter. Hence, for some high $V_{dd}$ levels, it is possible that the corresponding $T_{eq}$ exceeds the die temperature limit (e.g., 90°C), or that such a $T_{eq}$ does not exist at all.

Figure 4: (a) RC-circuit thermal model, and (b) the effect of the temperature and power variation.

While $R_{die-amb}$ directly affects $T_{eq}$, another design parameter, $C_{die}$, influences how fast the die temperature reaches either $T_{eq}$, if it exists, or the die temperature limit, otherwise. In particular, the time to reach the die temperature limit is an important design factor, because it determines how long a circuit can operate at the high voltage level. Table 1 shows the $T_{eq}$ levels and the times from 0°C to $T_{eq}$ or the die temperature limit, 90°C, from the 20nm bulk FinFET test circuits. From the measurement on ARM Cortex-A8, $R_{die\text{-}amb}$, $C_{die}$ and $T_{amb}$ are set to 35.8 K/W, 9.0 J/K and 25°C, respectively [19]. The power from the test circuit is scaled so that the circuit with $V_{dd}$ = 0.7V has the same trend of temperature increase that ARM Cortex-A8 shows with the measured $R_{die\text{-}amb}$, $C_{die}$ and $T_{amb}$. The details are explained in Section 5.

Table 1: Simulation results of $T_{eq}$ and the time to reach $T_{eq}$ or 90°C from 20nm FinFET test circuits

| $V_{dd}$ (V) | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 |
|---|---|---|---|---|---|---|---|
| $T_{eq}$ (°C) | 31.8 | 34.2 | 38.2 | 44.5 | N/A | N/A | N/A |
| Time (sec) | 1310 | 1465 | 1873 | 2375 | 3231 | 1600 | 1039 |
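The thermal model in (3) and the notion of an equilibrium temperature can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the $R_{die-amb}$, $C_{die}$ and $T_{amb}$ values are the ones quoted above, while the forward-Euler integrator, the upward scan used to locate $T_{eq}$, and the constant power levels in the usage lines are hypothetical choices made here, not the simulation flow used for Table 1.

```python
import math

R_DIE_AMB = 35.8  # K/W, thermal resistance quoted in the text
C_DIE = 9.0       # J/K, thermal capacitance quoted in the text
T_AMB = 25.0      # deg C, ambient temperature quoted in the text


def simulate_die_temperature(power_fn, t_init, duration, dt=0.1):
    """Forward-Euler integration of Eq. (3); power_fn maps T_die (deg C) to watts."""
    t_die = t_init
    for _ in range(int(round(duration / dt))):
        heat_out = (t_die - T_AMB) / R_DIE_AMB
        t_die += dt * (power_fn(t_die) - heat_out) / C_DIE
    return t_die


def constant_power_temperature(power, t_init, elapsed):
    """Closed-form solution of Eq. (3) when P_circuit is constant."""
    t_eq = T_AMB + power * R_DIE_AMB
    tau = R_DIE_AMB * C_DIE
    return t_eq + (t_init - t_eq) * math.exp(-elapsed / tau)


def find_equilibrium(power_fn, t_limit=90.0, step=0.01):
    """Scan upward from T_amb for the first T where heat in <= heat out.

    Returns None when no such point exists at or below t_limit.
    """
    n_steps = int(round((t_limit - T_AMB) / step))
    for i in range(n_steps + 1):
        t_die = T_AMB + i * step
        if power_fn(t_die) <= (t_die - T_AMB) / R_DIE_AMB:
            return t_die
    return None


# Hypothetical constant power levels, used only to exercise the functions.
t_after_500s = simulate_die_temperature(lambda t: 0.39, T_AMB, 500.0)
t_eq_low_power = find_equilibrium(lambda t: 0.39)
t_eq_high_power = find_equilibrium(lambda t: 2.0)  # no equilibrium below 90 deg C
```

With a temperature-dependent `power_fn` (for example one built from (2) once $c_1$ to $c_6$ are known), the same scan reports `None` for voltage levels whose heat generation outpaces dissipation over the whole range, which corresponds to the N/A entries in Table 1.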
The previous work on DTM has mainly focused on cases where $T_{eq}$ does not exist, and on how to avoid exceeding the die temperature limit with inevitable performance penalties: for example, lowering the clock frequency, or both the frequency and the voltage level, to reduce $P_{circuit}$ and thereby cool down $T_{die}$. Different from the previous work, we present a novel DTM algorithm in the following section, which exploits the TEI phenomenon to improve the energy efficiency of the circuit while neither exceeding the die temperature limit nor losing any performance.

## 4. TEI-AWARE DTM

## 4.1 Influence of TEI on energy consumption

Due to the TEI phenomenon, the worst-case delays occur at low temperature in FinFET circuits. Therefore, for a given target clock frequency, the corresponding voltage level of the circuit should be set according to the worst-case circuit delay, which occurs at the lowest die temperature. This is needed to guarantee correct circuit operation over the full range of operating temperatures. We call this voltage level the *base voltage level*, $V_{base}$, associated with a target clock frequency, $f_{target}$.

Consider a FinFET-based circuit running at $f_{target}$. As time goes by, the die temperature $T_{die}$ rises. Because of the TEI phenomenon, the FinFET-based circuit gets faster with rising temperature, which allows us to drop the supply voltage level below $V_{base}$ while maintaining $f_{target}$. Of course, we have to wait for $T_{die}$ to reach a predetermined level (which we will call the threshold temperature, $T_{th}$) before we can drop the supply voltage level. This is because we have a finite number of discrete supply voltage levels, so the move from a higher initial voltage level to the next lower voltage level can only happen when the delay decrease due to the temperature rise is large enough to ensure correct circuit operation at the lower voltage level.
Note that if $T_{th}$ exists, then this can significantly reduce the power consumption of the circuit due to the quadratic dependence of $P_{dynamic}$ and the exponential dependence of $P_{static}$ on the supply voltage level. Furthermore, unlike the conventional DTM methods, our approach does not scale down the clock frequency, so there will be no performance loss. Finally, note that because power dissipation is going down, the temperature rise in the substrate will be curbed.

Figure 5: (a) Threshold temperatures ($T_{th}$'s) at different voltage levels, and two different cases after lowering the voltage level at $T_{th}$: (b) $T_{die}$ increases, and (c) $T_{die}$ decreases, based on the 20nm FinFET based FO4 inverter chain simulation.

Figure 5 (a) shows an example of $T_{th}$ levels for multiple different voltage levels, based on the delay values with 20nm FinFET technology. Note that, for the figure and the remaining part of this paper, we assume the lowest temperature of the test circuits is -25°C, and the die temperature limit is 90°C. We also assume that a fine-grained (0.05V) input voltage control can be supported, similar to existing voltage controllers that power the Intel CORE2 E6850 processor and ARM CORTEX-A8 with a 0.05V difference between adjacent voltage levels. Then, in the figure, the operating frequency is set by the worst-case delay of the base voltage level, 0.75V, at -25°C. We use the notation $T_{th}^{base \ voltage \rightarrow target \ voltage}$ in the figure to denote $T_{th}$ in each case. While $T_{th}^{0.75 \rightarrow 0.6}$ exceeds the die temperature limit, the other threshold temperatures can be exploited in DTM.

Lowering the voltage level right after the increasing temperature reaches $T_{th}$ leads to two possible cases: (Case I) $T_{die}$ keeps increasing, or (Case II) $T_{die}$ begins decreasing.
Case I occurs because the equilibrium temperature $T_{eq}$ of the lowered voltage level is higher than $T_{th}$, or such a $T_{eq}$ does not exist. Case II occurs because $T_{eq}$ of the lowered voltage level lies below $T_{th}$.

For Case I, it is intuitive that the immediate voltage change at $T_{th}$ will not degrade the performance of the circuit, but gives us the opportunity to save energy. This is illustrated in Figure 5 (b). Because the 0.7V voltage level does not have a $T_{eq}$ (from Table 1), lowering the voltage from 0.75V to 0.7V at $T_{th}^{0.75 \to 0.7}$ = 18°C allows the circuit to operate at the scheduled frequency but consume significantly less energy.

On the other hand, the immediate voltage change at $T_{th}$ for Case II will result in a timing violation because the temperature will begin to decrease. Therefore, we have to wait for a certain amount of time, until $T_{die}$ exceeds $T_{th}$ by a certain amount. Then, we can lower the voltage level to reduce the power consumption, and keep the lowered voltage level until the decreasing temperature reaches $T_{th}$. This is illustrated in Figure 5 (c). Because $T_{th}^{0.75 \rightarrow 0.65}$ equals 61°C, and $T_{eq}$ corresponding to the 0.65V voltage level is 44.5°C (from Table 1), $T_{die}$ decreases after the voltage change.

Unlike Figure 5 (b) and (c), each of which considers only two available voltage levels, in reality there can be more than two available voltage levels that meet the scheduled frequency condition in some part of the temperature range. The availability of multiple voltage levels requires a more detailed analysis and a more elaborate DTM policy. The following subsections discuss all the possible cases in a DVFS schedule to complete a given task. The proposed optimal DTM policy can be generalized to arbitrary DVFS schedules.
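The threshold-temperature search and the Case I / Case II distinction described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: `gate_delay` is a hypothetical alpha-power-style delay model with made-up parameters that merely exhibits TEI, standing in for the SPICE-characterized delay tables, and the function names, the 1°C scan step and the tie-breaking rule are assumptions introduced here.

```python
def gate_delay(v_dd, temp_c, vth_25=0.30, vth_slope=5e-4, beta=1.3):
    """Hypothetical delay model: V_th falls with temperature, so delay falls too."""
    v_th = vth_25 - vth_slope * (temp_c - 25.0)
    return v_dd / (v_dd - v_th) ** beta


def threshold_temperature(delay_fn, v_base, v_target,
                          t_min=-25.0, t_limit=90.0, step=1.0):
    """Lowest temperature at which v_target meets the worst-case delay of v_base.

    The worst case is taken at t_min, as in the text. Returns None when the
    threshold temperature lies above t_limit.
    """
    worst_case = delay_fn(v_base, t_min)
    n_steps = int(round((t_limit - t_min) / step))
    for i in range(n_steps + 1):
        temp_c = t_min + i * step
        if delay_fn(v_target, temp_c) <= worst_case:
            return temp_c
    return None


def classify_case(t_th, t_eq):
    """Case I: T_die keeps rising after the switch. Case II: T_die starts falling.

    t_eq is None when the lowered voltage level has no equilibrium temperature.
    A tie (t_eq == t_th) is treated as Case II here, a conservative assumption.
    """
    if t_eq is None or t_eq > t_th:
        return "I"
    return "II"


t_th_070 = threshold_temperature(gate_delay, 0.75, 0.70)
t_th_060 = threshold_temperature(gate_delay, 0.75, 0.60)  # above the 90 deg C limit

# Values quoted in the text: T_th(0.75->0.7) = 18, no T_eq at 0.7V;
# T_th(0.75->0.65) = 61, T_eq at 0.65V = 44.5.
case_b = classify_case(18.0, None)
case_c = classify_case(61.0, 44.5)
```

The threshold temperatures produced by the hypothetical delay model will not match Figure 5 (a). Only the two `classify_case` calls use numbers taken from the text.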
## 4.2 Energy optimization

With the given deadline specification of a task, the required (minimum) operating frequency $f_{target}$ and the corresponding base voltage level $V_{base}$ can be determined in order to finish task execution by the deadline. Conventional DTMs try not to exceed the temperature limit $T_{limit}$ by lowering the frequency or stopping execution, both of which incur performance penalties. Our proposed DTM method aims to minimize the energy consumption for a given task, or a given set of tasks, without violating the operating frequency of the initial schedule, and thereby without any performance loss. Simultaneously, our DTM slows down the temperature increase, or makes the die temperature stable at a certain point below $T_{limit}$, thereby avoiding the performance loss that arises when the conventional DTMs inevitably lower the frequency or stop execution.

Among all the possible voltage levels, if one voltage level $V_i$ has a threshold temperature such that $T_{th}^{V_{base} \rightarrow V_i} < T_{limit}$, then $V_i$ may be exploited instead of $V_{base}$ in a certain temperature range. For the remainder of the paper, we use the simple notation $T_{th}^{V_i}$ to denote $T_{th}^{V_{base} \rightarrow V_i}$. Then, we can separate the operating temperature range into regions delimited by each available $T_{th}$. More specifically, the $i^{th}$ region is $R_i \triangleq [T_{th}^{V_i}, T_{th}^{V_{i+1}}]$ for $1 \le i \le N$, where $N$ is the number of candidate voltage levels for the target frequency $f_{target}$. We have $V_1 = V_{base} > V_2 > ... > V_N$.

Figure 6 (a) and (b) show an example that has three candidate voltage levels, $V_{High} = V_{base}$, $V_{Mid}$ and $V_{Low}$, and thus the temperature range is divided into three regions. The red curves in both figures show the minimum energy consumption at each temperature, according to the lowest voltage level that makes the circuit work at $f_{target}$ at that temperature point.
As can be seen from the figure, the minimum energy point in each region is located at the temperature point where the voltage level is changed, i.e., the threshold temperature level. Furthermore, from extensive simulations based on various FinFET libraries, we find that the energy consumption at $T_{th}^{V_{i+1}}$ is always higher than that at $T_{th}^{V_i}$. This is because the leakage power increases quickly as the temperature rises. Therefore, we start the optimization process from a premise:

▶ The minimum energy point in $R_i$ is always at $T_{th}^{V_i}$, and the corresponding energy consumption is smaller than that at $T_{th}^{V_{i+1}}$.

The equilibrium temperature level $T_{eq}^{V_i}$ depends on the ambient temperature, and hence it is an uncontrollable factor. The potential inequality between $T_{eq}^{V_i}$ and $T_{th}^{V_i}$ will not let the circuit operate at the stable temperature $T_{th}^{V_i}$. Suppose that the initial die temperature is $T_{init}$, which is in region $R_i$. Then the movement of the die temperature $T_{die}$ follows two rules:

- ▶ If $T_{init} < T_{eq}^{V_i}$, $T_{die}$ will increase, until $T_{die} = \min\{T_{eq}^{V_i}, T_{th}^{V_{i+1}}\}$.
- ▶ If $T_{init} > T_{eq}^{V_i}$, $T_{die}$ will decrease, until $T_{die} = \max\{T_{eq}^{V_i}, T_{th}^{V_i}\}$.

Figure 6: Case studies for (a) Policy I, and (b) Policy II.

We use Figure 6 (a) as an example of the above rules. Suppose that $T_{init}$ is in $R_{Mid}$; the temperature will eventually be stable at $T_{eq}^{Mid}$, which is also in $R_{Mid}$. Then, $T_{eq}^{Mid}$ is the optimal temperature point where the circuit can achieve the maximum energy saving for the given task. Similarly, suppose that $T_{init}$ is in $R_{Low}$, and $T_{eq}^{Mid}$ is still in $R_{Mid}$. Then $T_{eq}^{Mid}$ is still the optimal point. That is because $T_{eq}^{Low}$ is lower than $T_{eq}^{Mid}$, so $T_{die}$ with initial voltage $V_{Low}$ decreases from $T_{init}$ to $T_{th}^{Low}$.
Then the voltage level switches to $V_{Mid}$ in order to maintain the speed of the circuit. Finally, $T_{die}$ will be stable at $T_{eq}^{Mid}$. The opposite case, in which $T_{init}$ is in $R_{High}$, results in the same outcome, because $T_{eq}^{High}$ is higher than $T_{eq}^{Mid}$ and this fact makes $T_{die}$ move to $T_{eq}^{Mid}$. Therefore, we propose the following policy:

▶ Policy I: Check if there exists a $k$ such that $T_{eq}^{V_k} \in R_k$ for $1 \le k \le N$: we have proved that at most one such $k$ exists. If $k$ exists, the optimal voltage level is $V_k$ and the optimal and stable temperature is $T_{eq}^{V_k}$. Whatever region $T_{init}$ starts in, we need to use the corresponding voltage level of the region, i.e., the lowest voltage in the region that meets the frequency condition, and then keep changing the voltage level whenever $T_{die}$ reaches a region boundary. Eventually the die temperature will arrive at $T_{eq}^{V_k}$.

Now we discuss the case when no such $k$ exists. In this case, $T_{die}$ keeps increasing in all the regions until it reaches the region $i$ with $T_{eq}^{V_i}$ lower than $T_{th}^{V_i}$. In this region $R_i$, $T_{die}$ should decrease. Then, the minimum energy consumption of the circuit is at $T_{th}^{V_i}$, because (i) using a voltage level higher than $V_i$ only makes $T_{die}$ increase, thus consuming more energy, and (ii) $T_{die}$ cannot decrease further than $T_{th}^{V_i}$. This case is illustrated in Figure 6 (b). In the figure, $T_{eq}^{Mid}$ is located higher than $T_{th}^{Mid}$, and $T_{eq}^{High}$ should be higher than $T_{eq}^{Mid}$. Hence, $T_{die}$ always increases in both $R_{High}$ and $R_{Mid}$. But, because $T_{eq}^{Low}$ lies below the region $R_{Low}$ (in $R_{Mid}$ in the figure), $T_{die}$ will decrease in $R_{Low}$ if $V_{Low}$ is applied. Finally, the optimal temperature is $T_{th}^{Low}$.

Although we know the optimal temperature point in Figure 6 (b), it is impossible to keep operating at this point during circuit operation.
Therefore, we propose to use $V_{Mid}$ for a certain amount of time to warm up. This process is indicated by ②. Then we continue to perform: ③ lower the voltage to $V_{Low}$, and ④ maintain voltage $V_{Low}$ until $T_{die}$ decreases to $T_{th}^{Low}$, where we need to ⑤ increase the voltage to $V_{Mid}$. Repeating this process makes the circuit operate near the optimal temperature without timing violation. We call this region the *stationary region* because when $T_{die}$ enters this region, it continues to stay in that region through proper voltage switching. The blue-colored region in Figure 6 (b) shows an example of the stationary region.

Meanwhile, the amount of time for the warm-up process affects how far the circuit operates from the optimal point. The shorter the time is, the higher the energy efficiency that is achieved. The minimum warm-up time is determined by the voltage switching time (i.e., the voltage transition latency of DC-DC converters) that the voltage controller can provide. However, this responsiveness issue of the voltage controller is beyond the scope of this paper.

Based on the previous discussion, we propose the second policy of our DTM:

▶ Policy II: If Policy I cannot be applied, check whether there exists a $k$ such that $T_{eq}^{V_k} < T_{th}^{V_k}$. Find the smallest $k$ value if such a $k$ exists, and then the optimal temperature point should be $T_{th}^{V_{\min(k)}}$. Whatever region $T_{init}$ starts in, use the corresponding lowest voltage level of the region. Keep changing the voltage level whenever $T_{die}$ reaches a region boundary until $T_{die}$ enters the stationary region. In the stationary region, we keep performing ② → ③ → ④ → ⑤.

Finally, we point out that if there exists no $k$ such that $T_{eq}^{V_k} < T_{th}^{V_{k+1}}$, $T_{die}$ will eventually exceed $T_{limit}$ and the task will fail to finish in time.
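The selection between Policy I, Policy II and the overheating fallback can be summarized in a short Python sketch. It is a minimal illustration under assumptions made here: the function name, the zero-based indexing, the tuple return value and the use of $T_{limit}$ as the upper boundary of the last region are hypothetical conventions, and the voltage switching at region boundaries and inside the stationary region is not modeled.

```python
def select_policy(t_th, t_eq, t_limit=90.0):
    """Pick the operating point for candidate voltages V_1 > V_2 > ... > V_N.

    t_th[i] is the threshold temperature of the (i+1)-th candidate voltage
    (t_th[0] is the lowest operating temperature, since V_1 is V_base).
    t_eq[i] is its equilibrium temperature, or None when none exists.
    Returns (policy, zero-based voltage index, target temperature).
    """
    n = len(t_th)
    upper = list(t_th[1:]) + [t_limit]  # upper boundary of each region R_i

    # Policy I: an equilibrium that lies inside its own region is stable.
    for k in range(n):
        if t_eq[k] is not None and t_th[k] <= t_eq[k] <= upper[k]:
            return ("I", k, t_eq[k])

    # Policy II: smallest k whose equilibrium lies below its own region.
    for k in range(n):
        if t_eq[k] is not None and t_eq[k] < t_th[k]:
            return ("II", k, t_th[k])

    # Otherwise T_die keeps rising in every region and reaches t_limit.
    return ("overheat", None, t_limit)


# Threshold and equilibrium values quoted in the text for 0.75V, 0.7V, 0.65V.
from_text = select_policy([-25.0, 18.0, 61.0], [None, None, 44.5])
# Hypothetical equilibrium values in which the middle level settles in its region.
hypothetical = select_policy([-25.0, 18.0, 61.0], [None, 40.0, 35.0])
```

With the quoted values, the sketch selects Policy II with a target of 61°C, so the controller would alternate between 0.7V and 0.65V around that point, consistent with Figure 5 (c). The second call uses invented numbers purely to exercise the Policy I branch.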
When no such k exists, conventional DTMs that use only the base voltage level of the task will of course make $T_{die}$ reach $T_{limit}$ even earlier. Compared to conventional DTMs, the proposed DTM can save a considerable amount of energy before $T_{die}$ reaches $T_{limit}$, because it always selects the lowest possible voltage level in each region. Furthermore, using lower voltage levels slows down the temperature rise, so the circuit can operate at a high frequency for a longer time, whereas a circuit controlled by a conventional DTM would have to reduce its frequency earlier.

## 5. EXPERIMENTAL WORK

We validated the proposed DTM on various FinFET-based circuits, namely a chain of 50 FO4 inverters, a 16-bit carry-select adder, a 16-bit multiplier, and a 16-bit comparator, based on the 10nm, 14nm, 16nm, and 20nm PTM-MG bulk FinFET libraries. All the circuits are designed in the shorted-gate mode. We performed HSPICE simulations to obtain the delay and power consumption of each circuit for different $V_{dd}$ setups and different temperatures. The delays were obtained from the worst-case inputs of the circuits. Note that we did not consider interconnect delays in our simulations, because the characteristics of the interconnects used in deeply scaled FinFET-based circuit fabrics are unknown (i.e., although the R and C parasitic values of the interconnect go up with temperature, the current strength of the driver also improves, which can reduce the wire delay). We set the minimum and maximum operating temperatures of the circuits to -25°C and 90°C, respectively. Based on the worst delay at -25°C for each $V_{dd}$, we found the available voltage levels, which are lower than the base $V_{dd}$ but have smaller delays than the worst delay of the base $V_{dd}$ in some higher temperature regions.

Table 2: Simulation results from the four kinds of FinFET-based circuits based on the four different FinFET technology libraries. The number of \* indicates different reasons why 0.1V below $V_{base}$ cannot be used; details are explained at the end of Section 5.

| Tech. | Gain from $V_{base}$ = 0.75V and $V_i$, the lower voltage level(s) | | | | Tech. | Gain from $V_{base}$ = 0.55V and $V_i$, the lower voltage level(s) | | | |
|-------|----------------|----------------|------------|----------------|-------|----------------|----------------|------------|----------------|
|       | Inverter chain | Adder          | Multiplier | Comparator     |       | Inverter chain | Adder          | Multiplier | Comparator     |
| 20nm  | 32.44%         | 22.60%         | 22.51%     | 20.14%         | 20nm  | 38.19%         | 40.82%         | 33.85%     | 42.20%         |
|       | 0.7V and 0.65V | 0.7V and 0.65V | 0.7V \*\*  | 0.7V \*        |       | 0.5V and 0.45V | 0.5V and 0.45V | 0.5V \*\*  | 0.5V and 0.45V |
| 16nm  | 22.17%         | 32.56%         | 8%         | 35.22%         | 16nm  | 28.51%         | 19.95%         | 20.91%     | 17.99%         |
|       | 0.7V and 0.65V | 0.7V and 0.65V | 0.7V \*    | 0.7V and 0.65V |       | 0.5V \*\*\*    | 0.5V \*\*\*    | 0.5V \*\*\* | 0.5V \*\*\*   |
| 14nm  | 28.62%         | 17.01%         | 22.95%     | 30.06%         | 14nm  | 19.25%         | 18.93%         | 19.51%     | 18.46%         |
|       | 0.7V and 0.65V | 0.7V \*\*\*    | 0.7V \*\*\* | 0.7V and 0.65V |      | 0.5V \*\*\*    | 0.5V \*\*\*    | 0.5V \*\*\* | 0.5V \*\*\*   |
| 10nm  | 16.49%         | 13.65%         | 8.96%      | 15.60%         | 10nm  | 14.82%         | 15.81%         | 16.33%     | 15.48%         |
|       | 0.7V and 0.65V | 0.7V \*\*\*    | 0.7V \*\*\* | 0.7V \*       |       | 0.5V \*\*\*    | 0.5V \*\*\*    | 0.5V \*\*\* | 0.5V \*\*\*   |

For the power and thermal modeling, we used measurements of an ARM Cortex-A8, which yielded $R_{die-amb}$ = 35.8 K/W and $C_{die}$ = 9.0 J/K, with the chip temperature rising from 25°C to 36°C in 500 sec. The measurement is explained in detail in [19]. Finally, we scaled the obtained power data of the test circuits to make them compatible with the measured $R_{die-amb}$ and $C_{die}$.
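To make the role of $R_{die-amb}$ and $C_{die}$ concrete, the following sketch evaluates a first-order lumped RC thermal model with the values quoted above. The model form is an assumption on our part (this section does not restate it), the power is assumed constant, and the helper names `equilibrium_temperature`, `die_temperature`, and `power_from_rise` are hypothetical.

```python
import math

R_DIE_AMB = 35.8  # K/W, die-to-ambient thermal resistance quoted in the text
C_DIE = 9.0       # J/K, die thermal capacitance quoted in the text
T_AMB = 25.0      # degC, ambient temperature used in the experiments


def equilibrium_temperature(power_w, t_amb=T_AMB, r=R_DIE_AMB):
    """Steady-state die temperature for a constant power draw (assumed model)."""
    return t_amb + power_w * r


def die_temperature(time_s, power_w, t_init=T_AMB, t_amb=T_AMB,
                    r=R_DIE_AMB, c=C_DIE):
    """Die temperature after time_s seconds at constant power (first-order RC)."""
    t_eq = equilibrium_temperature(power_w, t_amb, r)
    return t_eq + (t_init - t_eq) * math.exp(-time_s / (r * c))


def power_from_rise(t_final, time_s, t_start=T_AMB, r=R_DIE_AMB, c=C_DIE):
    """Constant power that heats the die from ambient t_start to t_final in time_s."""
    rise_at_equilibrium = (t_final - t_start) / (1.0 - math.exp(-time_s / (r * c)))
    return rise_at_equilibrium / r


if __name__ == "__main__":
    p = power_from_rise(36.0, 500.0)
    print(f"time constant: {R_DIE_AMB * C_DIE:.1f} s")
    print(f"implied power: {p:.2f} W")
    print(f"equilibrium:   {equilibrium_temperature(p):.1f} degC")
```

Under this assumed model the thermal time constant is $R_{die-amb} \cdot C_{die} \approx 322$ sec, and the quoted rise from 25°C to 36°C in 500 sec corresponds to a constant power of roughly 0.39 W with an equilibrium temperature near 39°C. These figures are derived from the model, not measurements reported in the paper.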
To perform this scaling, we found the scaling factor s such that multiplying the power data of the 20nm inverter chain by s makes the temperature rise of the circuit (operating at 0.7V) follow the same trend as that of the ARM Cortex-A8. The derived s was then applied to the other circuits. Based on the scaled power and an ambient temperature of 25°C, we derived the equilibrium temperature for each circuit and each voltage level.

We define $Gain = \frac{\text{Energy saved with the proposed DTM}}{\text{Energy consumption with the conventional DTM}} \times 100\ (\%)$. The simulation conditions are as follows: (i) the base $V_{dd}$ is assumed to be the minimum voltage level at which the circuit controlled by the conventional DTM can finish a given task before the temperature exceeds 90°C or its $T_{eq}$, and (ii) the circuit starts operating at the ambient temperature (25°C). Note that the gains obtained under these conditions are close to the minimum gains that the proposed DTM can achieve, because (i) there can be cases in which the proposed DTM lets the circuit finish a given task in time while the conventional DTM does not, and (ii) if the circuit starts at a higher temperature, *Gain* may be significantly higher than when starting at the ambient temperature.

Table 2 shows the simulation results, which include *Gain* and the possible voltage levels under the given operating conditions. We set the base $V_{dd}$ to 0.75V and 0.55V. In some cases in the table the test circuits can lower the voltage level by 0.1V, i.e., two levels down, while in the others they cannot.
In the latter cases, the reason is one of the following: (i) the voltage level 0.1V below $V_{base}$ could not satisfy $T_{th}^{V_i} < T_{limit}$ (indicated by \* in the table); (ii) the setup for the conventional DTM made the temperature rise so fast that it reached $T_{limit}$ and the simulation ended early, whereas our DTM could otherwise have exploited the voltage level 0.1V below $V_{base}$ (indicated by \*\*); or (iii) under the given thermal conditions, the circuit operating at 0.05V below $V_{base}$ has an equilibrium temperature that is below $T_{th}$ of the level 0.1V below $V_{base}$ (indicated by \*\*\*). Reasons (ii) and (iii) show that our DTM has the potential to achieve greater energy savings than those reported in the table. Overall, the results demonstrate that the proposed DTM significantly improves the energy efficiency of FinFET-based circuits.

## 6. CONCLUSION

This paper first presented a key observation about the TEI phenomenon: the delay of a FinFET gate decreases with increasing die temperature in both the near- and super-threshold voltage regimes, unlike planar CMOS devices operating at a super-threshold $V_{dd}$. Next, it introduced the TEI-aware DTM algorithm to minimize the energy consumption of FinFET-based circuits without any appreciable performance penalty. More precisely, instead of choosing the smallest possible voltage to complete a task within its specified deadline, the proposed DTM algorithm dynamically adjusts the supply voltage of the chip so as to maintain the chip temperature at or near its optimum operating point. Experimental results showed that up to 40% energy saving (with no performance penalty) can be achieved by the proposed TEI-aware DTM approach compared to the best-in-class DTMs that are unaware of this phenomenon.

## 7. ACKNOWLEDGEMENTS

This research is supported by grants from the PERFECT program of the Defense Advanced Research Projects Agency and the Semiconductor Research Corporation.

## 8. REFERENCES

- [1] E. J. N.
et al., "Turning silicon on its edge," *IEEE Circuits and Devices Magazine*, 2004.
- [2] T. Sairam, W. Zhao, and Y. Cao, "Optimizing FinFET technology for high-speed and low-power design," *GLSVLSI*, 2007.
- [3] B. Zhai et al., "Energy efficient subthreshold processor design," *IEEE T. on VLSI*, 2009.
- [4] R. Dreslinski et al., "Near-threshold computing: reclaiming Moore's law through energy efficient integrated circuits," *Proceedings of the IEEE*, 2010.
- [5] F. Crupi et al., "Understanding the basic advantages of bulk FinFETs for sub- and near-threshold logic circuits from device measurements," *IEEE T. on CAS II*, 2012.
- [6] W. Liao et al., "Temperature and supply voltage aware performance and power modeling at micro architecture level," *IEEE T. on CAD*, 2005.
- [7] D. Brooks and M. Martonosi, "Dynamic thermal management for high performance microprocessors," *HPCA*, 2001.
- [8] R. Jayaseelan and T. Mitra, "Temperature aware task sequencing and voltage scaling," *ICCAD*, 2008.
- [9] H. Jung, P. Rong, and M. Pedram, "Stochastic modeling of a thermally-managed multi-core system," *DAC*, 2008.
- [10] R. Jayaseelan and T. Mitra, "Dynamic thermal management via architectural adaptation," *DAC*, 2009.
- [11] D. Shin et al., "Energy-optimal dynamic thermal management: Computation and cooling power co-optimization," *IEEE T. on Industrial Informatics*, 2010.
- [12] Y. Pu et al., "Misleading energy and performance claims in sub/near threshold digital systems," *ICCAD*, 2010.
- [13] M. Ashouei et al., "Novel wide voltage range level shifter for near-threshold designs," *ICECS*, 2010.
- [14] A. Calimera et al., "Reducing leakage power by accounting for temperature inversion dependence in dual-vt synthesized circuits," *ISLPED*, 2008.
- [15] "PTM," available at http://ptm.asu.edu.
- [16] X. Huang et al., "Sub-50 nm P-Channel FinFET," *IEEE T. on Electron Devices*, 2001.
- [17] S. Soleimani, A. Afzali-Kusha, and B.
Forouzandeh, "Temperature dependence of propagation delay characteristic in FinFET circuits," *ICM*, 2008.
- [18] S. Kim et al., "Temperature dependence of substrate and drain-currents in bulk FinFETs," *IEEE T. on Electron Devices*, 2007.
- [19] Q. Xie et al., "Dynamic thermal management in mobile devices considering the thermal coupling between battery and application processor," *ICCAD*, 2013.
- [20] A. Bansal et al., "Compact thermal models for estimation of temperature-dependent power/performance in FinFET technology," *ASPDAC*, 2006.