Computer Architecture and Design
Frequently Asked Questions (17)
Q2. What is the primary advantage of lengthening a vector register?
For loops where few temporary values exist, longer vector registers can be used to reduce instruction bandwidth and stripmining overhead, while for loops where many temporary values exist, the number of shorter vector registers can be increased to reduce the number of vector register spills and, hence, the data memory bandwidth required.
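The trade-off can be made concrete with a little arithmetic. The sketch below assumes a hypothetical register file with a fixed storage budget of 2048 elements that can be split into more, shorter registers or fewer, longer ones; the budget, the register counts, and the function names are illustrative assumptions, not figures from the source.

```python
import math

# Hypothetical total vector register storage, in elements (an assumption).
STORAGE_ELEMENTS = 2048


def configurations(storage=STORAGE_ELEMENTS, register_counts=(16, 32, 64, 128)):
    """Return (number of registers, elements per register) for each split."""
    return [(count, storage // count) for count in register_counts]


def stripmine_iterations(n, vector_length):
    """Number of loop segments needed to cover n elements."""
    return math.ceil(n / vector_length)


for count, length in configurations():
    # Longer registers need fewer segments (less stripmining overhead);
    # more registers leave more room for temporaries (fewer spills).
    print(count, "registers x", length, "elements:",
          stripmine_iterations(1000, length), "segments for n = 1000")
```

Under these assumptions, the same storage covers a 1000-element loop in 8 segments when configured as 16 long registers, but needs 63 segments when configured as 128 short ones.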
Q3. How much main memory bandwidth does a modern high-end vector supercomputer provide?
A modern high-end vector supercomputer provides over 50 GB/s of main memory bandwidth per CPU, while high-end microprocessor systems provide only around 1 GB/s per CPU.
Q4. How are vectors that are longer than a vector register handled?
If the application requires vectors longer than will fit into a vector register, a process called strip mining is used to construct a vector loop that executes the application code loop in segments that each fit into the machine’s vector registers.
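A minimal sketch of strip mining, written in Python for readability rather than in any real vector ISA. The maximum vector length of 64 elements and the names `vector_axpy` and `stripmined_axpy` are assumptions made for illustration; the inner function stands in for a short sequence of vector instructions operating on one register-sized segment.

```python
# Assumed maximum vector length (elements per vector register); hypothetical.
MVL = 64


def vector_axpy(a, x, y):
    """Stand-in for vector instructions applied to one register-sized segment."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def stripmined_axpy(a, x, y, mvl=MVL):
    """Compute a*x + y for vectors of any length, one segment at a time.

    Returns the result and the number of segments that were executed.
    """
    n = len(x)
    result = []
    segments = 0
    start = 0
    while start < n:
        # Set the vector length for this segment: a full register,
        # or whatever remains at the end of the application vector.
        vl = min(mvl, n - start)
        result.extend(vector_axpy(a, x[start:start + vl], y[start:start + vl]))
        start += vl
        segments += 1
    return result, segments
```

With these assumptions, a 150-element loop runs as three segments of 64, 64, and 22 elements.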
Q5. What is the main disadvantage of a configurable vector register file?
The main disadvantage of a configurable vector register file is the increase in control logic complexity and the increase in machine state to hold the configuration information.
Q6. What is the basic architecture of a vector processor?
The vector processing unit includes a set of vector registers and a set of vector functional units that operate on the vector registers.
Q7. How has vector processing been used in the world’s fastest supercomputers?
For nearly 30 years, vector processing has been used in the world’s fastest supercomputers to accelerate applications in scientific and technical computing.
Q8. What is the advantage of software prefetching?
Software prefetching can be very accurate as the compiler knows the reference patterns of each piece of code, but the software prefetch instructions have to be carefully scheduled so that data are not brought in too early, perhaps evicting useful data, or too late, which will leave some memory latency exposed.
Q9. What is the simplest form of vector load and store?
The simplest form of vector load and store transfers a set of elements that are contiguous in memory to successive elements of a vector register.
Q10. What is one of the most important factors in determining the performance of data parallel programs?
One of the most important factors in determining the performance of data parallel programs is the range of vector lengths observed for typical data sets.
Q11. What is the basic vector register architecture?
Vector processors contain a conventional scalar processor that executes general-purpose code together with a vector processing unit that handles data parallel code.
Q12. Why are architects motivated to include data parallel instructions?
Architects are motivated to include data parallel instructions because they enable large increases in performance at much lower cost than alternative approaches to exploiting application parallelism.
Q13. What is an alternative way to achieve high throughput on data parallel applications?
An alternative approach to attaining high throughput on data parallel applications is to add more CPUs each with vector units and to parallelize loops at the thread level.
Q14. Why is there no need for communication between the lanes?
Because of the way the vector ISA is designed, there is no need for communication between the lanes except via the memory system.
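The independence of the lanes can be sketched as follows. This example assumes that element i of every vector register is held in lane i mod NUM_LANES, a common interleaving that the source does not spell out; the lane count of 4 and the function name are likewise hypothetical.

```python
# Hypothetical number of lanes (an assumption for illustration).
NUM_LANES = 4


def vector_add_by_lanes(a, b, num_lanes=NUM_LANES):
    """Element-wise add in which each lane touches only its own elements."""
    assert len(a) == len(b)
    result = [None] * len(a)
    for lane in range(num_lanes):
        # Assumed mapping: this lane holds elements lane, lane + num_lanes, ...
        # of both source registers and of the destination register.
        partial = [x + y for x, y in zip(a[lane::num_lanes], b[lane::num_lanes])]
        result[lane::num_lanes] = partial
    return result
```

Because an element-wise operation combines only elements with the same index, every operand a lane needs already sits in that lane under this mapping, so the loop body never reads another lane's data.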
Q15. How many PEs could the original Connection Machine design contain?
Large arrays of simple PEs can be constructed, for example, up to 65,536 single-bit PEs in the original Connection Machine design.
Q16. What techniques have microprocessor designers adopted to hide memory latency?
Various forms of hardware and software prefetching schemes have become popular with microprocessor designers to hide memory latency.
Q17. Why do vector machines have a smaller number of highly pipelined PEs?
Because it is difficult to construct machines that allow a large number of simple processors to share a large central memory, vector machines typically have a smaller number of highly pipelined PEs.