The future of microprocessors
Summary (1 min read)
20 Years of Exponential Performance Gains
- For the past 20 years, rapid growth in microprocessor performance has been enabled by three key technology drivers: transistor-speed scaling, core microarchitecture techniques, and cache memories, each discussed in turn in the following sections.
- This might sound simple but is increasingly difficult to continue for reasons discussed later.
- Classical transistor scaling provided three major benefits that made possible rapid growth in compute performance.
- Figure 4b outlines the increasing speed disparity, growing from 10s to 100s of processor clock cycles per memory access.
The Next 20 Years
- Microprocessor technology has delivered three orders of magnitude of performance improvement over the past two decades, so continuing this trajectory would require at least a 30x performance increase by 2020.
- With limited supply-voltage scaling, energy and power reduction is limited, adversely affecting further integration of transistors.11
- Multiple cores and customization will be the major drivers for future microprocessor performance (total chip performance).
- In extreme cases, high-performance computing and embedded applications may even manage these complexities explicitly.
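The 30x target above follows directly from the stated trajectory: 1,000x over 20 years implies a steady compound rate, and a decade at that rate gives roughly 30x. A minimal sketch of that arithmetic (illustrative only; the 1,000x and 30x figures come from the text):

```python
# The sustained growth rate implied by 1,000x performance in 20 years,
# and the further improvement it implies for the following decade.

total_gain = 1000.0   # three orders of magnitude over 20 years
years = 20

annual_rate = total_gain ** (1 / years)   # ~1.41x per year
decade_gain = annual_rate ** 10           # compound gain over the next 10 years

print(f"annual rate: {annual_rate:.2f}x")
print(f"implied 10-year gain: {decade_gain:.1f}x")  # ~31.6x, i.e. "at least 30x"
```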
Conclusion
- The past 20 years were truly the great old days for Moore's Law scaling and microprocessor performance; dramatic improvements in transistor density, speed, and energy, combined with microarchitecture and memory-hierarchy techniques, delivered a 1,000-fold microprocessor performance improvement.
- The next 20 years (the pretty good new days, as progress continues) will be more difficult, with Moore's Law scaling producing continuing improvement in transistor density but comparatively little improvement in transistor speed and energy.
- The scaling challenges that processor design faces today are helping prepare us for these new challenges.
- The authors thank him and the members of and presenters to the working groups for valuable and insightful discussions over the past few years.
Frequently Asked Questions (19)
Q2. What are the future works in this paper?
Because the future winners are far from clear today, it is too early to predict whether some form of scaling (perhaps energy) will continue or there will be no scaling at all. Moreover, the challenges processor design will face in the next decade will be dwarfed by the challenges posed by these alternative technologies, rendering today's challenges a warm-up exercise for what lies ahead.
Q3. What is the way to use the unused transistor-integration capacity for logic?
Aggressive voltage scaling provides an avenue for utilizing the unused transistor-integration capacity for logic to deliver higher performance.
Q4. What is the effect of the transistor on the supply voltage?
As the transistor scales, supply voltage scales down, and the threshold voltage of the transistor (when the transistor starts conducting) also scales down.
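Supply-voltage scaling is what made classical scaling so energy-efficient, because dynamic switching energy goes as E = C·V². A minimal sketch of why stalled voltage scaling hurts (the ~0.7x-per-node factors are the standard textbook values, used here purely for illustration):

```python
# Dynamic switching energy E = C * V^2 per transition.
# Classical scaling shrank both capacitance (~0.7x per node) and supply
# voltage (~0.7x per node); with voltage scaling stalled, per-switch
# energy improves far more slowly.

def switch_energy(c, v):
    """Dynamic energy dissipated per switching event (normalized units)."""
    return c * v * v

C, V = 1.0, 1.0  # normalized capacitance and supply voltage

classical = switch_energy(0.7 * C, 0.7 * V) / switch_energy(C, V)
stalled   = switch_energy(0.7 * C, 1.0 * V) / switch_energy(C, V)

print(f"classical node-to-node energy: {classical:.3f}x")  # ~0.343x per node
print(f"with stalled voltage scaling:  {stalled:.3f}x")    # only ~0.7x per node
```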
Q5. How many bits can be connected to a cluster?
The clusters could be connected through wide (high-bandwidth), low-swing (low-energy) busses or through packet- or circuit-switched networks, depending on distance.
Q6. What is the effect of frequency of a well-tuned system?
When faster transistors allow a higher frequency of operation, the performance of a well-tuned system generally increases with frequency, subject to the performance limits of other parts of the system.
Q7. What is the advantage of using the unused cache?
The transistor budget from the unused cache could be used to integrate even more cores operating at the power density of the cache.
Q8. What is the way to achieve the highest performance and energy efficiency?
Aggressive use of customized accelerators will yield the highest performance and greatest energy efficiency on many applications.
Q9. What is the challenge for chip architects?
Chip architects must limit frequency and number of cores to keep power within reasonable bounds, but doing so severely limits improvement in microprocessor performance.
Q10. How many watts of power can be saved by limiting the data movement over the network?
In the future, data movement over these networks must be limited to conserve energy; more important, because of the large size of local storage, data-bandwidth demand on the network will be reduced.
Q11. How many transistors can be integrated into a die?
For 65 watts, the die could integrate 50 million transistors for logic and about 6MB of cache (Case C). Traditional wisdom suggests investing the maximum transistors in the 90% case, with the goal of using precious transistors to increase single-thread performance that can be applied broadly.
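The Case C split above is a budget-allocation exercise: logic transistors and cache dissipate power at very different rates, and the mix must fit under the cap. A hypothetical sketch of that bookkeeping (the 65 W, 50M-transistor, and 6MB figures come from the text; the per-unit power coefficients below are assumptions chosen only so the arithmetic lands on that split):

```python
# Fitting a logic/cache mix under a fixed power budget.
# Per-unit coefficients are HYPOTHETICAL, for illustration only.

POWER_BUDGET_W = 65.0
LOGIC_W_PER_MTRANSISTOR = 1.22  # assumed: watts per million logic transistors
CACHE_W_PER_MB = 0.67           # assumed: watts per MB of cache (far cooler than logic)

logic_mt = 50.0  # million transistors devoted to logic (Case C)
cache_mb = 6.0   # MB of cache (Case C)

total = logic_mt * LOGIC_W_PER_MTRANSISTOR + cache_mb * CACHE_W_PER_MB
print(f"estimated power: {total:.1f} W of {POWER_BUDGET_W:.0f} W budget")  # ~65 W
```

The point of the sketch is the trade-off, not the coefficients: because cache burns far less power per unit area, shifting budget from logic to cache is one way to stay under the cap.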
Q12. How many cores can be hardwired to a particular data representation or computational algorithm?
In some cases, units hardwired to a particular data representation or computational algorithm can achieve 50x–500x greater energy efficiency than a general-purpose register organization.
Q13. What is the effect of variation on the speed of the core?
Variation in the threshold voltage manifests as variation in the speed of the core; the slowest circuit in a core determines its frequency of operation, and a larger core is more susceptible to lower frequency of operation due to variations.
Q14. What is the effect of the faster transistors on the performance of a system?
The faster transistors provide an additional 40% performance (increased frequency), almost doubling overall performance within the same power envelope (per scaling theory).
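The "almost doubling" above is the product of two standard node-to-node factors: ~1.4x frequency from faster transistors, and ~sqrt(2)x microarchitecture gain from investing the doubled transistor count per Pollack's Rule. A minimal sketch of that multiplication (both factors are the textbook scaling-theory values, used illustratively):

```python
# Why a classical process node "almost doubles" performance in the same
# power envelope: 1.4x frequency times sqrt(2)x microarchitecture gain.
import math

freq_gain = 1.4              # faster transistors: +40% frequency per node
uarch_gain = math.sqrt(2.0)  # Pollack's Rule applied to 2x transistors

total = freq_gain * uarch_gain
print(f"node-to-node performance: {total:.2f}x")  # ~1.98x, "almost doubling"
```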
Q15. What are the common examples of extreme energy-efficient systems?
Some studies27,38 suggest that aggressive high-performance and extreme-energy-efficient systems may go further, eschewing the overhead of programmability features that software engineers have come to take for granted; for example, these future systems may drop hardware support for a single flat address space (which normally wastes energy on address manipulation/computing), a single memory hierarchy (coherence and monitoring energy overhead), and a steady rate of execution (adapting instead to the available energy budget).
Q16. What is the main reason for the rapid growth in microprocessor performance?
For the past 20 years, rapid growth in microprocessor performance has been enabled by three key technology drivers: transistor-speed scaling, core microarchitecture techniques, and cache memories.
Q17. How many transistors can be integrated into a single processor core?
Applying Pollack's Rule, a single processor core with 150 million transistors will provide only about 2.5x microarchitecture performance improvement over today's 25-million-transistor core, well shy of the 30x goal, while 80MB of cache is probably more than enough for the cores (see Table 3).
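Pollack's Rule says single-core performance grows roughly with the square root of transistor count, which is where the ~2.5x estimate above comes from. A minimal sketch:

```python
# Pollack's Rule: microarchitecture performance ~ sqrt(transistor count).
import math

def pollack_speedup(new_transistors, old_transistors):
    """Relative single-core performance estimate per Pollack's Rule."""
    return math.sqrt(new_transistors / old_transistors)

speedup = pollack_speedup(150e6, 25e6)  # 150M-transistor core vs. today's 25M
print(f"estimated speedup: {speedup:.2f}x")  # ~2.45x, the "about 2.5x" in the text
```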
Q18. How many parallel machines used irregular and circuit-switched networks?
Many older parallel machines used irregular and circuit-switched networks31,41; Figure 12 describes a return to hybrid switched networks for on-chip interconnects.
Q19. What is the difference between a customized CPU and a GPU?
Another customization approach constrains the types of parallelism that can be executed efficiently, enabling a simpler core, coordination, and memory structures; for example, many CPUs increase energy efficiency by restricting memory access structure and control flexibility in single-instruction, multiple-data or vector (SIMD) structures,1,2 while GPUs encourage programs to express structured sets of threads that can be aligned and executed efficiently.