A thermally-aware performance analysis of vertically integrated (3-D) processor-memory hierarchy
Citations
Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0
3D-Stacked Memory Architectures for Multi-core Processors
Die Stacking (3D) Microarchitecture
A novel architecture of the 3D stacked MRAM L2 cache for CMPs
References
The SimpleScalar tool set, version 2.0
Hitting the memory wall: implications of the obvious
Parameter variations and impact on circuits and microarchitecture
3-D ICs: a novel chip design for improving deep-submicrometer interconnect performance and systems-on-chip integration
Frequently Asked Questions (19)
Q2. What is the reason for the slowing down of device scaling?
Continued device scaling is slowing down due to severe short-channel effects [1], increasing variability [2, 3], and power dissipation [4].
Q3. What is the thermal resistance of the active layers?
Assuming HSQ dielectric (thermal conductivity Kth = 0.4 W/mK), Cu metal (Kth = 385 W/mK) with 50% metallization density, and interconnect dimensions from the ITRS [1] specifications for the 130 nm technology node, the effective thermal resistance of the back-end corresponding to each layer is θCuILD = 0.048 K/W, while that of the thinned Si active layers is θSi = 0.0037 K/W.
Q4. What is the effect of increasing the frequency on the thermal management of 3D chips?
With the increasing power density of nanometer scale chips [1], die temperatures and on-chip thermal gradients are expected to rise substantially.
Q5. How many layers of memory are used for the 2-D system?
The 64 MB SDRAM main memory is off-chip in the 2-D system, while in the 3-D system the same SDRAM is divided into 16 layers of area 1 cm² each in order to accommodate 64 MB on the same chip.
Q6. What are the thermal resistance values for each layer?
In order to calculate the vertical thermal profile, the active and standby power dissipation for each layer and the thermal resistance values between active layers as well as for the entire 3-D stack are needed.
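As a rough illustration of how those inputs combine, the sketch below computes a steady-state vertical temperature profile with a simple 1-D thermal resistance ladder. It is not the paper's own model: it assumes all heat flows toward a single heat sink, that the processor is the layer nearest the sink, and that each inter-layer resistance is one back-end stack plus one thinned Si layer (θCuILD + θSi from the answer to Q3). The per-layer powers, heat-sink resistance, and ambient temperature are hypothetical placeholders.

```python
# Minimal 1-D thermal ladder sketch (illustrative assumptions, not the paper's method).

THETA_CU_ILD = 0.048   # K/W, back-end per layer (value quoted in Q3)
THETA_SI = 0.0037      # K/W, thinned Si active layer (value quoted in Q3)


def vertical_profile(powers_w, r_between_k_per_w, r_sink_k_per_w, t_ambient_c):
    """Return the temperature (deg C) of each layer; index 0 is nearest the heat sink."""
    if not powers_w:
        return []
    # All dissipated power must cross the heat-sink interface.
    heat_through = sum(powers_w)
    temp = t_ambient_c + r_sink_k_per_w * heat_through
    temps = [temp]
    for j in range(1, len(powers_w)):
        # Heat crossing the interface below layer j comes from layer j and all layers above it.
        heat_through -= powers_w[j - 1]
        temp += r_between_k_per_w * heat_through
        temps.append(temp)
    return temps


# Hypothetical example: one processor layer plus 16 memory layers.
layer_powers = [60.0] + [1.0] * 16          # W, assumed values
profile = vertical_profile(
    layer_powers,
    r_between_k_per_w=THETA_CU_ILD + THETA_SI,
    r_sink_k_per_w=0.5,                     # K/W, assumed heat-sink resistance
    t_ambient_c=25.0,                       # deg C, assumed ambient
)
for index, temp_c in enumerate(profile):
    print(f"layer {index:2d}: {temp_c:.2f} C")
```

In this model the layer farthest from the heat sink is always the hottest, since every interface adds a non-negative temperature rise.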
Q7. What is the reason why the main memory bus has limited bandwidth?
The main memory bus has limited bandwidth because of the high capacitance of the external bus and the limited number of input/output (I/O) pads, which limits the bus width.
Q8. What is the common method of achieving the ever-standing requirements of faster and smaller chips?
The classical way to meet the long-standing demand for faster and smaller chips has been device scaling as per Moore’s Law.
Q9. Why does the performance of mcf application decrease with increasing bus size?
This is because the number of L2 cache misses decreases with increasing L2 cache size, which reduces the number of accesses to the off-chip bus and off-chip main memory.
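The effect of fewer L2 misses can be illustrated with the standard average memory access time (AMAT) formula for a two-level cache. This is a textbook sketch rather than the paper's simulation setup, and every latency and miss rate below is a hypothetical value chosen only to show the trend.

```python
# AMAT sketch for a two-level cache hierarchy (all numbers are assumed).

def amat_cycles(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_penalty):
    """Average memory access time in cycles.

    AMAT = L1 hit time + L1 miss rate * (L2 hit time + L2 miss rate * memory penalty)
    """
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_penalty)


# Hypothetical sweep: a larger L2 is modeled simply as a lower L2 miss rate.
for l2_miss_rate in (0.40, 0.20, 0.10):
    cycles = amat_cycles(
        l1_hit=1,            # cycles, assumed
        l1_miss_rate=0.05,   # assumed
        l2_hit=10,           # cycles, assumed
        l2_miss_rate=l2_miss_rate,
        mem_penalty=200,     # cycles for off-chip bus plus main memory, assumed
    )
    print(f"L2 miss rate {l2_miss_rate:.2f}: AMAT = {cycles:.2f} cycles")
```

The memory penalty is weighted by the L2 miss rate, so the off-chip bus and main memory matter less as the L2 cache grows.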
Q10. What is the reason for the temperature constraint on a chip?
The temperature constraint on a chip (which arises from reliability concerns) limits the maximum operating frequency of the system.
Q11. What is the reason why they claim that thermal management of such systems is not a significant issue?
They claim that thermal management of such systems will not be a significant issue, as their analysis shows a very small vertical temperature gradient (less than 1.3 °C), possibly because leakage power (which is highly temperature sensitive) was not a big concern at the old 0.5 µm technology node.
Q12. What are the thermal considerations for a twolf chip?
Thermal considerations are critical even for conventional chips because most system failures and reliability mechanisms for VLSI chips are strongly temperature sensitive [24, 25].
Q13. What is the main reason for the increasing complexity of interconnecting devices?
The increasing number of devices and functionality on a single chip leads to increasing complexity in interconnecting devices with a large number of metal layers.
Q14. What is the effect of vertical integration of active layers on the thermal profile of the microprocessor?
The process of vertical integration of active layers not only adds to the power density that needs to be dealt with by the heat sink, but also increases the distance of the additional layers from the heat sink, as shown in Fig.
Q15. What is the performance improvement shown in the figure?
The performance improvement shown here demonstrates that the memory bus width and frequency play a significant role in the overall performance of the system.
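To make the role of bus width and frequency concrete, the sketch below computes peak bus bandwidth and the time to move one cache line. It assumes one transfer per bus clock cycle and ignores protocol overhead. The two bus configurations and the line size are hypothetical and are not taken from the paper.

```python
# Peak bus bandwidth sketch (assumes one transfer per clock, no protocol overhead).

def peak_bandwidth_bytes_per_s(width_bits, freq_hz):
    """Peak bandwidth = bytes moved per cycle * cycles per second."""
    return (width_bits / 8) * freq_hz


def line_transfer_time_s(line_bytes, width_bits, freq_hz):
    """Time to move one cache line at peak bandwidth."""
    return line_bytes / peak_bandwidth_bytes_per_s(width_bits, freq_hz)


# Hypothetical configurations: a narrow, slow off-chip bus versus a wide, fast on-chip bus.
configs = {
    "narrow off-chip bus (assumed)": (64, 200e6),
    "wide on-chip bus (assumed)": (1024, 1e9),
}
LINE_BYTES = 64  # assumed cache line size

for name, (width_bits, freq_hz) in configs.items():
    bandwidth = peak_bandwidth_bytes_per_s(width_bits, freq_hz)
    transfer = line_transfer_time_s(LINE_BYTES, width_bits, freq_hz)
    print(f"{name}: {bandwidth / 1e9:.1f} GB/s, line transfer {transfer * 1e9:.2f} ns")
```

Width and frequency enter the bandwidth as a product, so widening the bus and raising its clock compound each other.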
Q16. How is the power dissipation of the processor layer calculated?
The power dissipation (active and leakage) of the processor layer (Alpha 21264) is derived from [29], which provides the power dissipation of an Alpha 21264 microprocessor at the 350 nm technology node, by assuming a full scaling scheme down to the 130 nm node.
Q17. Why is there a performance improvement on increasing L2 cache size in a 3-D chip?
It is also found that, due to the large bandwidth of the memory bus, increasing the L2 cache size in a 3-D chip yields minimal performance improvement.
Q18. What is the effect of increasing the processor speed on the overall performance of the system?
On the other hand, applications with less memory intensive behavior really reap the benefits from increasing processor clock speed.
Q19. What is the difference between a true 3-D integration scheme and a traditional planar chip?
A true 3-D integration scheme involves monolithic stacking of multiple active layers and leads to a considerable reduction in the number and average lengths of the longest global wires seen in traditional planar (2-D) chips by providing shorter “vertical” paths for connection (Fig. 1).