3-D-DATE: A Circuit-Level Three-Dimensional DRAM Area, Timing, and Energy Model
Summary (7 min read)
Introduction
- This work presents a circuit-level three-dimensional DRAM Area, Timing, and Energy model, also known as 3D-DATE.
- A few studies have offered models for power and access latency calculations of DRAM designs in limited ranges.
1.1 Motivation
- Three-dimensional die stacking involves connecting multiple silicon dies with a vertical interconnect, such as through-silicon vias (TSVs) or micro-bumps.
- Three-dimensional die stacking reduces global wire routing inside integrated circuits [1].
- Samsung has shown that Wide-I/O has a read operating power of 330.6 mW in a 50 nm process, which is almost equal to the LPDDR2 read power at the same process node.
- Many studies have shown that 3D DRAM provides higher bandwidth with lower power consumption, and have proposed methods to utilize 3D DRAM in memory hierarchies [2, 7–10].
- Few studies have offered models for power and access latency calculations of custom designs.
1.2 Original Contributions
- The goal of this work is to provide a 3D DRAM Area, Timing and Energy (DATE) model.
- DATE can be used not only to model existing standard planar DRAM, but also to model custom 3D DRAM designs or to find the optimal 3D DRAM design for architectures under exploration using traditional or emerging devices.
- DATE presents four different transistor models for modeling DRAM.
- DATE demonstrates a new core design to support the emerging VCAT-based cell array layout depicted in [11].
- A more detailed comparison with other models is presented in Section 3.4.
1.4 Organization of Dissertation
- Chapter 2 presents DRAM process node characterization.
- Transistors, wires, and through silicon via (TSV) models, modeled from 90 nm to 16 nm technology nodes, are discussed.
- Chapter 3 presents the circuit-level model and the architectural-level model of 3D DRAM.
- Chapter 4 presents the first case study, which explores the benefits of 3D design space using a 1 Gb standard double-data-rate DRAM.
1.5 Abbreviations
- ASC: Asymmetric Channel Doping; BL: BitLine; DATE: DRAM Area, Timing, and Energy model; DDR: Double Data Rate; DRAM: Dynamic Random Access Memory; F: minimum Feature size; FEOL: Front-End-Of-Line; FinFET: Fin Field Effect Transistor; ITRS: International Technology Roadmap for Semiconductors; JEDEC: Joint Electron Device Engineering Council; LPDDR: Low Power Double Data Rate; MASTAR: Model for Assessment of CMOS Technology And Roadmaps.
- The ITRS 2005 roadmap provides partial information on the gate transistor, including the wordline voltage, from the 80 nm node.
- The roadmap does not provide any resistance and current information to calculate speed.
- DATE presents a DRAM roadmap starting from the 90 nm technology node.
2.1 Transistor Model and Scaling
- In DRAM, a gate transistor is required to reduce the leakage current and to retain the stored data in the cell capacitor during the required data retention time.
- SRCAT provides more recessed channel effect than RCAT [23].
- Thus, FinFET can be used as a gate transistor in a smaller technology node rather than a planar transistor [26, 29].
- Vertical channel access transistor (VCAT) is another transistor that has been proposed as a bitcell transistor alternative for DRAMs [11, 24].
- This allows the bitcell transistor to be placed at the cross section of the bitline and wordline, and also allows a denser, VCAT-dedicated cell layout such as 4F².
2.1.1 Gate Transistor Model and Scaling
- The authors' gate transistor roadmap is built with the Synopsys Technology Computer-Aided Design (TCAD) device simulator.
- DATE provides a roadmap from 90 nm for comparison, since vendors fabricate test chips in larger technology nodes [11].
- Among these empirical data, RCAT threshold voltages are collected as shown in Figure 2.5.
2.1.2 High-Voltage and Peripheral Transistor
- Peripheral and high-voltage transistor roadmaps are built with the Model for Assessment of CMOS Technology And Roadmaps (MASTAR) from ITRS [47].
- Figure 2.6 shows the graphical user interface of MASTAR.
- MASTAR has high performance (HP), low stand-by power (LSTP) and low operating power (LOP) process roadmaps with physical models of planar bulk, double gate (DG) and silicon on insulator (SOI) transistors.
- From this assembly, the authors rely upon MASTAR process assumptions along with Rambus size projections.
- DATE adopts the ITRS projection for adjusting channel doping concentration.
2.2.1 Wire
- For the wire resistance and wire capacitance calculation, DATE adopts the Horowitz wire model [48].
- For the general metal wire material, Horowitz and ITRS expected the technology to migrate from aluminum to copper, because aluminum wires have a resistivity of about 2.82 µΩ·cm while copper wires have a resistivity of about 1.70 µΩ·cm at 20 °C [48–50].
- In the technical report [51], DRAM uses a metal size similar to the global wire size of a microprocessor process.
- There is a significant difference in the choice of inter-cell routing materials assumed between DATE and the technical report [51].
- For the other metal layers, DATE adopts similar width sizes and aspect ratios from the cross-sectional report [51].
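To make the link between wire geometry, material, and resistance concrete, the following is a minimal sketch that uses only the textbook relation R = ρ·L/(W·T). It is not the Horowitz model [48] itself; the 100 nm width and the aspect ratio of 2 are hypothetical values chosen for illustration, and the resistivities are bulk textbook values at 20 °C.

```python
# Bulk resistivities at 20 degrees C in ohm-metres (textbook values).
RHO_AL = 2.82e-8
RHO_CU = 1.70e-8

def wire_resistance_per_um(rho_ohm_m, width_nm, thickness_nm):
    """Resistance of a 1 um long rectangular wire: R = rho * L / (W * T)."""
    length_m = 1e-6
    area_m2 = (width_nm * 1e-9) * (thickness_nm * 1e-9)
    return rho_ohm_m * length_m / area_m2

# Hypothetical wire: 100 nm wide, aspect ratio 2 (200 nm thick).
r_al = wire_resistance_per_um(RHO_AL, 100, 200)
r_cu = wire_resistance_per_um(RHO_CU, 100, 200)
print(f"Al: {r_al:.2f} ohm/um, Cu: {r_cu:.2f} ohm/um")
```

Under these assumptions the copper wire has roughly 60% of the resistance of an aluminum wire with the same cross-section, and a wider or thicker layer lowers the resistance per unit length in direct proportion to its cross-sectional area.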
2.2.2 Through Silicon Via
- TSVs are classified into different categories according to the fabricated order compared to the metal layer.
- Figure 2.11 shows a top view of the FEOL TSV bundles along with coupled capacitance.
- For the detailed calculation at each technology node, DATE follows the CACTI-3DD size roadmap for conservative size scaling; ITRS also provides a size roadmap for TSVs.
- DATE includes TSVs in the driving circuits.
- The area calculation for TSVs includes the buffer chain unless the buffer chain can fit into the pitch of the TSV.
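As a rough illustration of the electrical load a TSV presents to its driving circuit, the sketch below treats the TSV as a copper cylinder with a thin oxide liner. These are generic first-order formulas (cylinder resistance and coaxial liner capacitance), not the DATE equations; the 50 µm height, 5 µm diameter, and 100 nm liner are hypothetical, and coupling capacitance between neighbouring TSVs is ignored.

```python
import math

RHO_CU = 1.70e-8      # ohm*m, copper at 20 degrees C (textbook value)
EPS0 = 8.854e-12      # F/m, vacuum permittivity
EPS_R_SIO2 = 3.9      # relative permittivity of silicon dioxide

def tsv_resistance(height_um, diameter_um, rho=RHO_CU):
    """Resistance of a solid cylindrical via: R = rho * h / (pi * r^2)."""
    radius_m = diameter_um * 1e-6 / 2
    return rho * height_um * 1e-6 / (math.pi * radius_m ** 2)

def tsv_liner_capacitance(height_um, diameter_um, liner_nm):
    """Coaxial capacitance across the oxide liner."""
    r_in = diameter_um * 1e-6 / 2
    r_out = r_in + liner_nm * 1e-9
    return 2 * math.pi * EPS0 * EPS_R_SIO2 * height_um * 1e-6 / math.log(r_out / r_in)

# Hypothetical TSV: 50 um tall, 5 um diameter, 100 nm oxide liner.
print(f"R = {tsv_resistance(50, 5) * 1e3:.1f} mohm")
print(f"C = {tsv_liner_capacitance(50, 5, 100) * 1e15:.0f} fF")
```

With these placeholder dimensions the resistance is tens of milliohms while the capacitance is a few hundred femtofarads, which suggests why the capacitive load and the buffer chain that drives it matter more than the via resistance.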
2.3.1 Gate Transistor
- RCAT, SRCAT and VCAT have been simulated with Synopsys TCAD under the conditions proposed in Table 2.1 and Figure 2.4.
- Figure 2.13 shows a three-dimensional view and a cross-section of the VCAT TCAD simulation.
- Both Rambus and CACTI-3DD assume a similar capacitance scaling projection, as shown in Figure 2.14.
- $I_{off}$ is kept below 5 fA with the lowest possible channel doping density, while the threshold voltage meets the threshold voltage trend within the standard deviation (0.0665 V).
2.3.2 High Voltage and Peripheral Transistor
- Table 2.5 shows capacitance and turn-on current roadmap of high-voltage (HV) transistors.
- For the turn-on current, CACTI-3DD uses a fixed value at each node regardless of temperature.
- DATE follows the ITRS roadmap at 25 °C and reflects the change in turn-on current due to temperature based on the MASTAR calculation.
- Compared with CACTI-3DD, DATE exhibits more device capacitance because it adopts a higher side-wall capacitance, as discussed for the HV transistor, and also expects more gate capacitance, mainly due to the longer channel length projected by the Rambus roadmap.
- The Rambus and CACTI-3DD projections were derived and calculated from the source code or data provided by the author.
2.3.3 Wire
- Wire capacitance and resistance per µm have been calculated using the Horowitz equation for DATE.
- The CACTI-3DD roadmap is derived from the source code.
- In the M3 layer roadmap, DATE expects the smallest resistance at all nodes, mainly because it assumes the largest physical dimensions compared with ITRS and CACTI-3DD.
- The normalized values of wire capacitance and resistance across three anonymous processes with those of DATE, ITRS, and CACTI-3DD, are presented in Table 2.8.
- The anonymous processes are for the general logic design.
2.3.4 Through Silicon Via
- Through silicon via (TSV) is mainly made by etching or laser drilling.
- In DATE, the authors adopt CACTI-3DD TSV roadmap since they assume TSV size would scale due to technology advancement.
- Table 2.11 shows the DATE TSV roadmap calculated as described in Section 2.2.2.
- Circuit level models can be expanded upon to calculate the resistance, capacitance, and area of the logic composed of multiple transistors.
- Examples of circuit level modeling of the DRAM memory system are CACTI, CACTI-D, CACTI-3DD, and Rambus models introduced in Section 1.3.
3.1.1 General Layout and Drain Capacitance
- In the circuit level model, the logic gate area, turn-on resistance, and capacitance are derived from the physical geometry obtained from a transistor layout.
- DATE assumes ideal layout design rules as shown in Figure 3.2 and Figure 3.3.
- Only one of the two regions is considered.
- The drain capacitance of the series-connected transistors is also calculated by adding all the capacitance shown in Figure 2.15 for the gray areas.
3.1.2 Digital Logic and Driving Buffer
- DATE assumes that the buffer followed by the digital logic is driving the following logic or wire as shown in Figure 3.8.
- Table 3.1 shows logical effort for inputs of logic gates.
- For the driving buffer chain, Nils et al. [57] showed the optimum fanout of each inverter, that is, the optimum stage effort to achieve the least delay is within a range of 2.7 to 5.3 according to the technology dependency.
- The transistor size for the logic in Figure 3.8 is decided by the number of inputs and also the kind of logic.
- For the energy calculation of the gate, DATE accounts for drain-out charges by adding the capacitance of every node, since the dissipated energy is given by the equation [55]: $E = C_L \times V^2 \times P_{0 \to 1}$ (3.20), where $C_L$ is the sum of the intrinsic capacitance of the gate and the loaded capacitance of the output.
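The buffer sizing and energy relations in this section can be sketched in a few lines. The snippet below is an illustrative sketch, not DATE's implementation: it evaluates Eq. (3.20) directly and picks the number of inverter stages so that each stage has a fanout near an assumed stage effort of 4, which lies inside the 2.7 to 5.3 range quoted above. The capacitance, voltage, and activity values are hypothetical.

```python
import math

def switching_energy(c_load_f, vdd, p_0_to_1):
    """Dynamic energy per Eq. (3.20): E = C_L * V^2 * P(0->1)."""
    return c_load_f * vdd ** 2 * p_0_to_1

def buffer_stages(c_load_f, c_in_f, stage_effort=4.0):
    """Number of inverter stages giving roughly `stage_effort` fanout per stage."""
    path_effort = c_load_f / c_in_f
    return max(1, round(math.log(path_effort) / math.log(stage_effort)))

# Hypothetical case: a 1 fF input inverter driving a 100 fF load.
stages = buffer_stages(100e-15, 1e-15)
# Hypothetical node: 10 fF total capacitance, 1.2 V supply, 0.5 switching probability.
energy = switching_energy(10e-15, 1.2, 0.5)
print(stages, energy)
```

A full model would sum the energy over every node in the chain, each with its own capacitance, as the text describes.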
3.1.3 Repeater for Wire
- When wire length linearly increases, the delay of wire increases quadratically since both resistance and capacitance increase linearly.
- Also, large wire load on the driver leads to excessive short-circuit power dissipation on the last stage of the driver, which is due to the degrading of the waveform shape [60].
- The general design approach is to introduce a repeater to resolve the problems caused by large wire loads.
- In the DATE model, Rabaey’s approach is adopted for the repeater model [55].
- Γ stands for the ratio of input capacitance and output capacitance of a minimum size inverter.
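The quadratic growth of wire delay and the effect of repeaters can be illustrated with a simplified sketch. It uses the distributed-RC approximation 0.38·r·c·L² for each wire segment and adds a fixed delay per repeater; it deliberately ignores the interaction between repeater output resistance and wire capacitance, so it is a simplification of the repeater model rather than the full formulation. The per-µm resistance and capacitance and the 20 ps repeater delay are hypothetical.

```python
import math

def unrepeated_delay(r_per_um, c_per_um, length_um):
    """Distributed RC wire delay, approximately 0.38 * r * c * L^2."""
    return 0.38 * r_per_um * c_per_um * length_um ** 2

def repeated_delay(r_per_um, c_per_um, length_um, segments, t_repeater):
    """Delay of a wire split into equal segments, one repeater per segment."""
    segment_um = length_um / segments
    return segments * (unrepeated_delay(r_per_um, c_per_um, segment_um) + t_repeater)

def best_segment_count(r_per_um, c_per_um, length_um, t_repeater):
    """Segment count minimizing the simplified delay: L * sqrt(0.38 r c / t_rep)."""
    m = length_um * math.sqrt(0.38 * r_per_um * c_per_um / t_repeater)
    return max(1, round(m))

# Hypothetical wire: 1 ohm/um, 0.2 fF/um, 20 ps per repeater.
R, C, T_REP = 1.0, 0.2e-15, 20e-12
for length in (2500, 5000):
    m = best_segment_count(R, C, length, T_REP)
    print(length, m, unrepeated_delay(R, C, length),
          repeated_delay(R, C, length, m, T_REP))
```

Doubling the length quadruples the unrepeated delay, while the repeated delay grows roughly linearly because the optimal segment count scales with the length.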
3.1.4 Address Decoder
- DATE provides a two-stage address decoder for both row and column address decoding as shown in CACTI5.1 [62] and described in Rambus model [13].
- After the MWL, there is a sub-wordline (SWL) which is driven by the inverter buffer.
- The row address path consists of the predecoder stage and following second stage decoder blocks.
- The outputs of these base decoders generate the final predecoder signal output by using NAND gates.
- In the row address path, the NOR gate drains out the stored internal charge when the driving output is not selected.
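The two-stage decoding idea can be sketched at the logical level. The example below assumes a hypothetical 9-bit address split into three 3-bit groups; it models only the Boolean behaviour (one-hot predecoding, then a second stage that combines one line from each group) and says nothing about the NAND/NOR gate sizing, delay, or energy that DATE actually calculates.

```python
def predecode(address, total_bits=9, group_bits=3):
    """Split the address into groups and one-hot decode each group."""
    groups = []
    for g in range(total_bits // group_bits):
        field = (address >> (g * group_bits)) & ((1 << group_bits) - 1)
        groups.append([1 if i == field else 0 for i in range(1 << group_bits)])
    return groups

def second_stage(groups, row):
    """A row is selected when its predecoded line in every group is high."""
    group_bits = (len(groups[0]) - 1).bit_length()
    mask = len(groups[0]) - 1
    return all(groups[g][(row >> (g * group_bits)) & mask]
               for g in range(len(groups)))

address = 0b101010011  # hypothetical row address (339)
lines = predecode(address)
selected = [row for row in range(512) if second_stage(lines, row)]
print(selected)
```

Predecoding keeps the second stage small: each of the 512 final gates needs only three inputs instead of nine.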
3.1.5 Bitline and Bitline Sense Amplifier
- Figure 3.16 shows the schematic of the bitline sense amplifier.
- The bitline and complement bitline are precharged by an equalizer to half of the voltage stored on the storage capacitor.
- All the equations in this section are taken and derived from Sections 6.1 and 9.3 of CACTI 5.1.
- DATE takes the bitline delay, including the effect of the wordline rise time, from the equation given in CACTI [12].
- For the energy calculation, DATE calculates the drain capacitance of the transistors that make up the sense amplifier and are connected to the bitline, and multiplies it by the bitline voltage and the supply voltage as discussed in Section 3.1.2.
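To show why the bitline is precharged to a mid-level voltage, the sketch below applies the standard charge-sharing relation between the cell capacitor and the bitline. It is a generic first-order illustration rather than the CACTI or DATE delay equation, and the 25 fF cell capacitance, 75 fF bitline capacitance, and 1.2 V supply are hypothetical.

```python
def bitline_swing(v_cell, vdd, c_cell_f, c_bitline_f):
    """Voltage change on a bitline precharged to Vdd/2 after charge sharing."""
    v_precharge = vdd / 2
    return (v_cell - v_precharge) * c_cell_f / (c_cell_f + c_bitline_f)

VDD = 1.2  # hypothetical supply voltage
swing_one = bitline_swing(VDD, VDD, 25e-15, 75e-15)   # cell stores a 1
swing_zero = bitline_swing(0.0, VDD, 25e-15, 75e-15)  # cell stores a 0
print(swing_one, swing_zero)
```

The mid-level precharge gives swings of equal magnitude and opposite sign for a stored 1 and a stored 0, which the sense amplifier then resolves to full rail; a longer bitline (larger capacitance) shrinks the swing.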
3.2 Architecture Level Modeling
- Column logic is also placed at the other edge of the bank to decode column address and to drive column select signal.
- The basic floor plan concept of DDR DRAM is not different, which places the banks and shares the control logic among the banks.
- DATE follows a general floor plan between banks and peripheral logic, as shown in Figure 3.17.
- Figure 3.19 shows the schematic diagram of the subarray for the conventional 6F² DRAM.
3.3 Validation
- For the validation, the DATE model results are compared to the energy and speed published in the data sheets of several commodity DRAMs across various technologies and different DRAM generations [43, 69–83].
- Table 3.2 shows the comparison of DATE energy results with the calculated energy from the specification, based on the system level model [84].
- $t_{RCD}$ represents the row-address-to-column-address delay, i.e., the period between the issuing of the active command and the read/write command.
- Table 3.6 shows the comparison of DATE model area results against the derived areas of the VCAT-based DRAM and 3D DRAM.
- Reducing the oxide thickness by 10% increases the gate capacitance by about 10%.
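The sensitivity of gate capacitance to oxide thickness follows from a parallel-plate view of the gate, where capacitance is inversely proportional to thickness. The short sketch below works through that arithmetic; the parallel-plate assumption is ours and ignores fringing and other second-order effects that a device simulator would capture.

```python
def gate_cap_change(thickness_reduction):
    """Fractional capacitance increase when C is proportional to 1 / t_ox."""
    return 1.0 / (1.0 - thickness_reduction) - 1.0

change = gate_cap_change(0.10)
print(f"{change * 100:.1f}% more gate capacitance")
```

Under this assumption a 10% thinner oxide gives roughly 11% more capacitance, consistent with the "about 10%" figure quoted above.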
3.4 Comparison with Other Models
- Table 3.9 shows circuit level model comparison of CACTI-3DD, Rambus, and DATE.
- All three models calculate area and energy.
- Moreover, DATE supports an emerging device, i.e., VCAT.
4 Case Study: DRAM Design Space Exploration
- To evaluate the effect of the design change on each component, the authors start from the most basic design case, i.e., 2D single bank.
4.1.1 Single Bank Design Space
- The single bank is not a practical DRAM design option, but a large bank size such as 1 Gb helps us to understand the tradeoff of the design elements that support and make up the bank.
- In detail, Table 4.1 shows the row and column address bits matched to each page size.
- As the page size changes, the authors also vary the subarray size in each of the wordline and bitline directions from $2^5$ bits to $2^{12}$ bits.
- The geometry change caused wordline and bitline length to change along with the shift in driving peripheral circuit size.
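The size of this single-bank design space can be sketched by enumerating it. The snippet below is an illustrative sketch under stated assumptions: the bank holds 1 Gb, the number of rows is simply capacity divided by page size, and the subarray dimension in each direction takes every power of two from 2^5 to 2^12. The 8 Kb page size is a hypothetical example; the actual address-bit assignments are those in Table 4.1 and are not reproduced here.

```python
from itertools import product

CAPACITY_BITS = 1 << 30  # 1 Gb single bank

def row_address_bits(page_bits):
    """Row address bits needed when each row (page) holds `page_bits` bits."""
    rows = CAPACITY_BITS // page_bits
    return rows.bit_length() - 1

def subarray_shapes(min_exp=5, max_exp=12):
    """All (wordline-direction, bitline-direction) subarray sizes in bits."""
    sizes = [1 << e for e in range(min_exp, max_exp + 1)]
    return list(product(sizes, sizes))

shapes = subarray_shapes()
print(row_address_bits(1 << 13), len(shapes), shapes[0], shapes[-1])
```

Each shape changes the wordline and bitline lengths, so every point in this grid implies a different driver and peripheral circuit size.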
4.2 3D Design Space Exploration in 35 nm Node
- A DRAM rank is a group of DRAM devices which respond and operate at the same time by the single command.
- Figure 4.8a shows a traditional planar DRAM in which the rank is a single die.
- Compared to the TSV area in Figure 4.9, this makes single bank splitting more expensive than the fine-grained rank-level stacking in terms of area and power.
- The logic die could be designed and fabricated by using another process to achieve best operation performance.
- This is beyond DATE model’s evaluation range.
4.2.1 Area Efficiency
- As discussed in Section 4.1.1.1, as space for the peripheral circuit and additional functions is increased, area efficiency is decreased.
- As the number of layers increases, the memory size to be allocated per die to maintain 1 Gb becomes smaller (i.e., the cell array size per die is reduced).
- The number of data and control signals is similar even if the number of dies increases.
- Thus, the number of TSVs for data and control signals is similar even as the number of dies increases.
- Since the TSV area for data and control signals is almost the same while the cell array area is reduced, the area efficiency deteriorates as the number of dies increases in both the 6F² and 4F² layout cases.
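The trend described above can be reproduced with a toy calculation. The sketch below assumes area efficiency is the cell array area divided by the total die area, that the cell array is split evenly across dies, and that TSV and peripheral areas per die stay constant; all area values are hypothetical placeholders rather than DATE results.

```python
def area_efficiency(total_cell_area_mm2, dies, tsv_area_mm2, periph_area_mm2):
    """Cell array area over total area for one die of an evenly split stack."""
    cell_per_die = total_cell_area_mm2 / dies
    return cell_per_die / (cell_per_die + tsv_area_mm2 + periph_area_mm2)

# Hypothetical areas: 40 mm^2 of cells in total, 2 mm^2 TSV, 8 mm^2 periphery per die.
efficiencies = [area_efficiency(40, n, 2, 8) for n in (1, 2, 4, 8)]
print([round(e, 3) for e in efficiencies])
```

Because the fixed overhead is amortized over a shrinking cell array, the efficiency falls monotonically with the die count in this simple picture.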
4.2.2 Energy Efficiency
- Table 4.19 shows the best energy efficiency on each 3D stacked DRAM configuration as the number of layers is increased.
- In the other cases, the best energy efficiency design had a 2 Mb bank size, which was the smallest bank size examined during the evaluation.
- In all cases, wire energy and TSV energy accounted for approximately 88 to 95 percent of the entire read energy consumption, which indicates that the logic energy was optimized.
- As the number of stacked dies increased, the TSV energy started to dominate overall wire energy, since the TSV energy increased in proportion to the number of layers.
4.2.3 Throughput
- Table 4.20 shows the best throughput on each 3D stacked DRAM as the number of stacked dies is increased.
- Beyond eight layers, the DRAM throughput exhibited diminishing returns.
- As discussed in Section 4.1.2.3, the smaller the bank size, the higher the observed throughput.
- I/O and miscellaneous indicates I/O transceiver and control signal delay.
- Decoder indicates the sum of row and column address decoder delay.
4.2.4 Product of Design Metric
- Figure 4.13 shows the tendency of the best result of the combinations of area efficiency (AE), energy efficiency (EE), and throughput (TH) as the die count is varied.
- This tendency of the area efficiency also affects the combined design metric.
- Between the planar design and the four-layer DRAM design, the area efficiency decreased approximately 48% while energy efficiency increased approximately 50%.
- The VCAT-based design displayed better throughput than the RCAT-based design in all cases, and the RCAT-based design displayed better area efficiency than the VCAT-based design in all cases.
- Thus, the RCAT-based design exhibited better performance after that point.
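A combined metric of this kind can be sketched as the product of the three normalized figures of merit. In the snippet below every number is a hypothetical placeholder normalized to the planar design, not a result from Figure 4.13; the point is only to show how a falling area efficiency can be outweighed, up to a point, by gains in energy efficiency and throughput.

```python
def combined_metric(candidate):
    """Product of area efficiency, energy efficiency, and throughput."""
    return candidate["ae"] * candidate["ee"] * candidate["th"]

def best_by_product(candidates):
    """Candidate with the largest AE x EE x TH product."""
    return max(candidates, key=combined_metric)

# Hypothetical values, normalized to the planar (1-die) design.
candidates = [
    {"dies": 1, "ae": 1.00, "ee": 1.0, "th": 1.0},
    {"dies": 4, "ae": 0.52, "ee": 1.5, "th": 1.6},
    {"dies": 8, "ae": 0.35, "ee": 1.6, "th": 1.9},
]
best = best_by_product(candidates)
print(best["dies"], round(combined_metric(best), 3))
```

With these placeholder numbers the product peaks at an intermediate die count and then declines as the area penalty dominates.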
4.2.5 Design Metric Comparison in Different Technology
- Figure 4.14 shows the design metric peak value comparison between the 16 nm, 35 nm, and 68 nm nodes as the die count is increased.
- Figure 4.14c presents the best throughput for the various process nodes as the die count is increased.
- In the case of area efficiency, the 4F² cell layout DRAM design exhibited the smaller area, with 68.0% area efficiency, compared to the 6F² layout.
- For the throughput, the RCAT-based design exhibited the optimum point when the die count was eight while VCAT-based design also exhibited the optimum with eight dies.
5.1 Summary of Contributions
- The authors have presented the three-dimensional DRAM Area, Timing, and Energy model.
- The authors have proposed a wire roadmap using the material parameters provided through ITRS roadmap [20] and the physical dimensions presented in the ITRS roadmap and the cross-sectional die report [51].
- Metal layers 2 and 3 are predicted to be larger; therefore, the resistance values of the DATE roadmap are smaller and the capacitance values are larger than those of the logic processes.
- The authors have implemented and verified circuit level modeling: the logic and buffer sizes were determined using logical effort [54].
Frequently Asked Questions (2)
Q2. What are the future works in this paper?
There are several interesting directions for future research. It would be interesting to extend the DATE model to evaluate fine-grained 3D DRAM designs by adding a high-performance transistor roadmap. For the device roadmap, the authors believe it would be interesting to update the current roadmap with emerging devices such as FinFET-based gate transistors or emerging materials for the wire, which would impact overall speed.