Showing papers by "Nathan Binkert" published in 2004


01 Jan 2004
TL;DR: Proposes a new register-file architecture that virtualizes logical register contexts, achieving a 10% increase in performance over the baseline architecture, even with fewer physical than logical registers, while also reducing data cache bandwidth.
Abstract: This paper proposes a new register-file architecture that virtualizes logical register contexts. This architecture makes the number of active register contexts—representing different threads or activation records—independent of the number of physical registers. The physical register file is treated as a cache of a potentially much larger memory-mapped logical register space. The implementation modifies the rename stage of the pipeline to trigger the movement of register values between the physical register file and the data cache. We exploit the fact that the logical register mapping can be easily updated—simply by changing the base memory pointer—to construct an efficient implementation of register windows. This reduces the execution time by 8% while generating 20% fewer data cache accesses. We also use the large logical register space to avoid the cost of a large physical register file normally required for a multithreaded processor, allowing us to create an SMT core with fewer physical registers than logical registers that still performs within 2% of the baseline. Finally, the two are combined to create a simultaneous multithreaded processor that supports register windows. This architecture achieves a 10% increase in performance over the baseline architecture even with fewer physical than logical registers while also reducing data cache bandwidth. Thus we are able to combine the advantages of register windows with multithreading.

8 citations
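
The central idea of the abstract above, a physical register file acting as a cache over a larger memory-mapped logical register space, with register windows implemented by moving a base memory pointer, can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's implementation: the class name `VirtualRegisterFile`, the 8-byte register size, the LRU replacement policy, and the `spills`/`fills` counters are all hypothetical, and the pipeline's rename stage is not modeled.

```python
from collections import OrderedDict

WORD = 8  # bytes per register (assumed size, not taken from the paper)


class VirtualRegisterFile:
    """Toy model: physical registers cache a memory-mapped logical register space."""

    def __init__(self, num_physical, window_size):
        self.num_physical = num_physical  # capacity of the physical register file
        self.window_size = window_size    # logical registers per context (window)
        self.base = 0                     # base memory pointer of the active context
        self.cache = OrderedDict()        # address -> value, kept in LRU order
        self.memory = {}                  # backing store, standing in for the data cache
        self.spills = 0                   # values moved from registers to memory
        self.fills = 0                    # values moved from memory to registers

    def _address(self, reg):
        # A logical register is named by its memory address: base + offset.
        return self.base + reg * WORD

    def _allocate(self, addr, fill):
        if addr in self.cache:
            self.cache.move_to_end(addr)  # hit: mark as most recently used
            return
        if len(self.cache) >= self.num_physical:
            # Miss with a full file: spill the least recently used register
            # (LRU is an assumed policy for this sketch).
            victim, value = self.cache.popitem(last=False)
            self.memory[victim] = value
            self.spills += 1
        if fill:
            self.cache[addr] = self.memory.get(addr, 0)
            self.fills += 1
        else:
            self.cache[addr] = 0  # about to be overwritten, no fill needed

    def read(self, reg):
        addr = self._address(reg)
        self._allocate(addr, fill=True)
        return self.cache[addr]

    def write(self, reg, value):
        addr = self._address(reg)
        self._allocate(addr, fill=False)
        self.cache[addr] = value

    def call(self):
        # Entering a new window only changes the base pointer.
        self.base += self.window_size * WORD

    def ret(self):
        # Returning restores the caller's window the same way.
        self.base -= self.window_size * WORD


rf = VirtualRegisterFile(num_physical=4, window_size=4)
for r in range(4):
    rf.write(r, 100 + r)   # caller's window fills the physical file
rf.call()
for r in range(4):
    rf.write(r, 200 + r)   # callee's window evicts the caller's values
rf.ret()
value = rf.read(0)         # caller's register 0 is filled back on demand
```

In this model a call or return costs nothing by itself; register values move between the physical file and memory only when a miss forces a spill or fill, which is the property that lets the number of logical contexts exceed the number of physical registers.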


01 Jan 2004
TL;DR: Tighter integration of the network interface can provide benefits in TCP/IP throughput and latency, and the interaction of the NIC with the on-chip memory hierarchy has a greater impact on performance than the raw improvements in bandwidth and latency that come from integration.
Abstract: High-bandwidth TCP/IP networking is a core component of current and future computer systems. Though networking is central to computing today, the vast majority of end-host networking research focuses on the current paradigm of the network interface being merely a peripheral device. Most optimizations focus solely on software changes or on moving some of the computation from the primary CPU to the off-chip network interface controller (NIC). We present an alternative approach for achieving high-performance networking. Rather than increasing the complexity of the NIC, we directly integrate a conventional NIC on the CPU die. To evaluate this approach, we have developed a simulation environment specifically targeted for networked systems. It simulates server and client systems along with a network in a single process. Full-system simulation captures the execution of both application and OS code. Our model includes a detailed out-of-order CPU, event-driven memory hierarchy, and Ethernet interface device. Using this simulator, we find that tighter integration of the network interface can provide benefits in TCP/IP throughput and latency. We also see that the interaction of the NIC with the on-chip memory hierarchy has a greater impact on performance than the raw improvements in bandwidth and latency that come from integration.

5 citations