Amdahl's Law in the Multicore Era
Citations
Roofline: an insightful visual performance model for multicore architectures
Dark silicon and the end of multicore scaling
The multikernel: a new OS architecture for scalable multicore systems
A view of the parallel computing landscape
Memory and Information Processing in Neuromorphic Systems
References
Computer Architecture: A Quantitative Approach
Validity of the single processor approach to achieving large scale computing capabilities
The Landscape of Parallel Computing Research: A View from Berkeley
Reevaluating Amdahl's law
Frequently Asked Questions (16)
Q2. What is the corollary of Amdahl’s law?
Augmenting Amdahl’s law with a corollary for multicore hardware makes it relevant to future generations of chips with multiple processor cores.
Q3. Why should architects increase core resources when perf(r) > r?
Architects should always increase core resources when perf(r) > r because doing so speeds up both sequential and parallel execution.
Q4. How much money did the US National Science Foundation give to this work?
The US National Science Foundation supported this work in part through grants EIA/CNS-0205286, CCR-0324878, CNS-0551401, and CNS-0720565.
Q5. How many cores can be harnessed for sequential mode?
For f = 0.99 and n = 256, for example, effectively harnessing all 256 cores would achieve a speedup of 223, which is much greater than the comparable asymmetric speedup of 165.
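The dynamic figure quoted above can be checked with a minimal sketch. It assumes the article's dynamic model, whose formula is not reproduced in this excerpt: sequential code runs on r BCEs at perf(r), parallel code uses all n BCEs, and perf(r) = sqrt(r). The function names are illustrative.

```python
import math


def perf(r: float) -> float:
    """Assumed sequential performance of a core built from r BCEs."""
    return math.sqrt(r)


def speedup_dynamic(f: float, n: int, r: int) -> float:
    """Speedup of a dynamic chip: r BCEs fused for sequential code, all n used in parallel."""
    sequential_time = (1.0 - f) / perf(r)
    parallel_time = f / n
    return 1.0 / (sequential_time + parallel_time)


if __name__ == "__main__":
    # Case from the text: f = 0.99, n = 256, all 256 BCEs harnessed for sequential mode.
    print(round(speedup_dynamic(0.99, 256, 256)))
```

Rounding the result for f = 0.99 and n = r = 256 reproduces the speedup of 223 stated above.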
Q6. What is the speedup of a symmetric multicore chip?
Under Amdahl’s law, the speedup of a symmetric multicore chip (relative to using one single-BCE core) depends on the software fraction that is parallelizable (f), the total chip resources in BCEs (n), and the BCE resources (r) devoted to increasing each core’s performance.
Q7. What is the implication of this article?
Implication 6. Researchers should continue to investigate methods that approximate a dynamic multicore chip, such as thread-level speculation and helper threads.
Q8. What is the simplest way to calculate the speedup of a multicore chip?
Most computer scientists learn Amdahl’s law in school: Let speedup be the original execution time divided by an enhanced execution time.
Q9. What is the effect of Amdahl’s law on asymmetric multicore?
With a resource budget of n = 16 BCEs, for example, an asymmetric multicore chip can have one four-BCE core and 12 one-BCE cores, one nine-BCE core and seven one-BCE cores, and so on.
Q10. What is the important corollary of Amdahl’s law?
The modern version of Amdahl’s law states that if you enhance a fraction f of a computation by a speedup S, the overall speedup is:

$$\text{Speedup}_{\text{enhanced}}(f, S) = \frac{1}{(1 - f) + \frac{f}{S}}$$

Amdahl’s law applies broadly and has important corollaries such as: Attack the common case:
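As a quick numeric check of this law, speedup = 1 / ((1 - f) + f/S), here is a minimal Python sketch. The function name `amdahl_speedup` and the sample values are illustrative assumptions, not taken from the article.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the work is sped up by a factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


if __name__ == "__main__":
    # Hypothetical case: 90% of the work is enhanced by a factor of 10.
    print(amdahl_speedup(0.9, 10.0))
    # A very large enhancement still cannot exceed 1 / (1 - f).
    print(amdahl_speedup(0.9, 1e12))
```

The second call shows why the unenhanced fraction dominates: however large S becomes, the result stays below 1 / (1 - f).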
Q11. What is the alternative to a symmetric multicore chip?
An alternative to a symmetric multicore chip is an asymmetric (or heterogeneous) multicore chip, in which one or more cores are more powerful than the others.
Q12. How many cores are used in asymmetric multicore chips?
For f = 0.975 and n = 1,024, an example not shown in their graphs, the best speedup is at a hypothetical design with one core of 345 BCEs and 679 single-BCE cores.
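The design point above can be explored with a short sketch. It assumes the article's asymmetric model, whose formula is not reproduced in this excerpt: one large core of r BCEs runs the sequential code at perf(r), the parallel phase uses that core plus the n - r single-BCE cores, and perf(r) = sqrt(r). The function names are illustrative.

```python
import math


def perf(r: float) -> float:
    """Assumed sequential performance of a core built from r BCEs."""
    return math.sqrt(r)


def speedup_asymmetric(f: float, n: int, r: int) -> float:
    """Speedup with one r-BCE core plus (n - r) single-BCE cores."""
    sequential_time = (1.0 - f) / perf(r)
    parallel_time = f / (perf(r) + n - r)
    return 1.0 / (sequential_time + parallel_time)


def best_big_core(f: float, n: int) -> int:
    """Exhaustively search for the big-core size r (in BCEs) that maximizes speedup."""
    return max(range(1, n + 1), key=lambda r: speedup_asymmetric(f, n, r))


if __name__ == "__main__":
    f, n = 0.975, 1024
    r = best_big_core(f, n)
    print(r, n - r, speedup_asymmetric(f, n, r))
```

An exhaustive search is used because n is small. Under these assumptions it returns the size of the large core and the number of remaining single-BCE cores for the f = 0.975, n = 1,024 case.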
Q13. What is the important aspect of Amdahl’s law?
Without presenting an equation, he noted that the speedup on n processors is governed by:

$$\text{Speedup}_{\text{parallel}}(f, n) = \frac{1}{(1 - f) + \frac{f}{n}}$$

Finally, Amdahl argued that typical values of 1 – f were large enough to favor single processors.
Q14. How can the authors make a dynamic multicore chip faster?
Hardware designers can’t build cores that achieve arbitrarily high performance by adding more resources, nor do they know how to dynamically harness many cores for sequential use without undue performance and hardware resource overhead.
Q15. What is the simplest way to calculate perf(r)?
Published by the IEEE Computer Society in Computer, July 2008. Their equations allow perf(r) to be an arbitrary function, but all their graphs follow Shekhar Borkar [3] and assume perf(r) = √r.
Q16. How many cores are used to execute in parallel?
It uses all n/r cores to execute in parallel at performance perf(r) × n/r. Overall, the authors get:

$$\text{Speedup}_{\text{symmetric}}(f, n, r) = \frac{1}{\frac{1 - f}{\text{perf}(r)} + \frac{f \cdot r}{\text{perf}(r) \cdot n}}$$

Consider Figure 2a.
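The symmetric model can be sketched in a few lines of Python: sequential time is (1 - f) / perf(r), and parallel time is f divided by the aggregate performance perf(r) × n/r. The sketch assumes perf(r) = sqrt(r). The function names and the n = 16 sweep are illustrative choices, not the article's code.

```python
import math


def perf(r: float) -> float:
    """Assumed sequential performance of a core built from r BCEs."""
    return math.sqrt(r)


def speedup_symmetric(f: float, n: int, r: int) -> float:
    """Speedup of n/r identical cores, each built from r BCEs."""
    sequential_time = (1.0 - f) / perf(r)
    parallel_time = f * r / (perf(r) * n)
    return 1.0 / (sequential_time + parallel_time)


def best_core_size(f: float, n: int, sizes: list[int]) -> int:
    """Return the candidate core size r that gives the highest symmetric speedup."""
    return max(sizes, key=lambda r: speedup_symmetric(f, n, r))


if __name__ == "__main__":
    n = 16
    sizes = [1, 2, 4, 8, 16]  # core sizes that divide the 16-BCE budget evenly
    for f in (0.5, 0.9, 0.975):
        r = best_core_size(f, n, sizes)
        print(f, r, speedup_symmetric(f, n, r))
```

With r = 1 the function reduces to the classic n-processor form of Amdahl's law, which is a useful sanity check on the implementation.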