Rodinia: A benchmark suite for heterogeneous computing
Summary (2 min read)
INTRODUCTION
- This article focusses on the interaction between internal and external constraints on linguistic variation and change as reflected in data from Arabic dialects spoken in the Arabian Peninsula and the Levant.
- This is because language change has been an object of linguistic inquiry much longer than variation has.
- He argues that even the earliest treatises of Arabic grammar, dating back to the 8th century, had sociolinguistic material embedded in them (see e.g. Owens 2001: 421), and that their understanding of the development of Arabic relies heavily on the knowledge the authors possess of the social reality of its speakers.
OVERVIEW OF THE AVAILABLE DATA
- The range of features the authors analyse in the research presented here represents several changes in progress in Arabic dialects.
- These features are listed below (all notations are in IPA).
- Cambridge University Press, Language in Society, For Peer Review.
- J. Milroy (1993: 220) makes the following compelling observation:
- The phonology of the Medina dialect is thus being restructured vis-à-vis this feature, as no such allophony had existed in the traditional dialects of either social group.
- She found that [j] has become the main variant of the young Baḥārna and that their traditional variant [ʤ] is rarely used in daily interactions.
PALATALISATION AND DEPALATALISATION
- Evidence from a vocalic morphophonemic feature: the feminine ending. In Arabic dialects, there is a suffix that denotes feminine grammatical gender in many nouns and most adjectives.
- Contemporary research on depalatalisation of etymological velar stops suggests that gender distinction has an effect on the change in the opposite direction, [ʧ] > [k].
- The 6% of the affricate tokens reported by Al-Essa all come from four speakers (out of a sample of 61).
GENDER AS A SOCIAL FACTOR
- Al-Qahtani (2015) studied two isolated villages in Tihāmat Qahṭān, in southern Arabia.
- On the other hand, there is a steep increase in the use of the innovative variant from old to young female speakers (16% to 69%).
- Evidence of drastic social changes with regard to gendered behaviour is found in field notes by Al-Qahtani from her work in southern Saudi Arabia.
- The authors' experience conducting fieldwork across the Arab World has provided them with many narratives such as the one quoted above.
CONCLUSION
- In what is perhaps the foundational text of sociolinguistic theory, Weinreich, Labov & Herzog (1968) deal with a number of ‘problems’ or ‘riddles’, the most difficult of which is the actuation problem (or actuation riddle).
- Steering the discussion away from the hitherto received wisdom that variation in language can be ‘free variation’ and that subsequent language change can be ‘random’, Weinreich et al. (1968: 112) critique earlier accounts of language change, especially that of Hermann Paul (1880).
- The youngest age group actually patterned alongside the oldest group in resisting the change, and the middle age group was the one who seemed to be leading it.
- It is unlikely that factoring in a speaker’s age into the analysis of their speech would have been possible, or even taken seriously, had it not been for the incorporation of social factors in general into the study of language change.
- These may involve or affect the speaker’s state of mind, and that state of mind may very well bear upon the behavior that implements the change’.
Citations
695 citations
Cites methods from "Rodinia: A benchmark suite for hete..."
...The Rodinia benchmarks published by the University of Virginia [2] are very similar in philosophy and development to the Parboil benchmarks....
[...]
620 citations
558 citations
541 citations
Cites background from "Rodinia: A benchmark suite for hete..."
...It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data-dependent....
[...]
441 citations
Cites methods from "Rodinia: A benchmark suite for hete..."
...We created parallel applications adapted from existing benchmark suites including Rodinia [5], MineBench [18], and NVIDIA’s CUDA SDK [19] in addition to creating one of our own (blackjack)....
[...]
...[5] S. Che et al. Rodinia: A benchmark suite for heterogeneous computing....
[...]
References
4,019 citations
"Rodinia: A benchmark suite for hete..." refers methods in this paper
...While developing and characterizing these benchmarks, we have experienced first-hand the following challenges of the GPU platform: Data Structure Mapping: Programmers must find efficient mappings of their applications’ data structures to CUDA’s hierarchical (grid of thread blocks) domain model....
[...]
4,002 citations
"Rodinia: A benchmark suite for hete..." refers background in this paper
...A diverse, multi-platform benchmark suite helps software, middleware, and hardware researchers in a variety of ways: • Accelerators offer significant performance and efficiency benefits compared to CPUs for many applications....
[...]
3,514 citations
"Rodinia: A benchmark suite for hete..." refers background or methods in this paper
...Needleman-Wunsch uses 16 threads per block as discussed earlier, and Leukocyte uses different thread block sizes (128 and 256) for its two kernels because it operates on different working sets in the detection and tracking phases....
[...]
...• Fused CPU-GPU processors and other heterogeneous multiprocessor SoCs are likely to become common in PCs, servers and HPC environments....
[...]
2,262 citations
"Rodinia: A benchmark suite for hete..." refers background in this paper
...Our decision to choose CUDA and OpenMP actually provides a real benefit....
[...]
...Each application or kernel is carefully chosen to represent different types of behavior according to the Berkeley dwarves [1]....
[...]
2,216 citations
"Rodinia: A benchmark suite for hete..." refers methods in this paper
...For GPU implementations, the Rodinia suite uses CUDA [22], an extension to C for GPUs....
[...]
Frequently Asked Questions (15)
Q2. What are the future works mentioned in the paper "Rodinia: a benchmark suite for heterogeneous computing"?
Directions for future work include adding new applications to cover further dwarves, such as sparse matrix, sorting, etc. The authors plan to provide different download versions of applications for steps where they add major incremental optimizations. The authors plan to extend the Rodinia benchmarks to support more platforms, such as FPGAs, STI Cell, etc. The authors plan to extend their diversity analysis by using the clustering analysis performed by Joshi et al. [15], which requires a principal components analysis (PCA) that they have left to future work.
Q3. What are the two widely used benchmark suites for general purpose computing?
SPEC CPU [31] and EEMBC [6] are two widely used benchmark suites for evaluating general purpose CPUs and embedded processors, respectively.
Q4. What are the important optimizations for a GPU?
The most important optimizations are to reduce CPU-GPU communication and to maximize locality of memory accesses within each warp (ideally allowing a single, coalesced memory transaction to fulfill an entire warp’s loads).
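The coalescing point in this answer can be illustrated with a small model. The sketch below is not taken from the paper: it assumes a 32-thread warp (the 32-wide SIMD organization mentioned elsewhere in this FAQ) and a hypothetical 128-byte transaction size, and it simply counts how many distinct segments one warp touches for a given access stride. The function name `transactions_per_warp` and the constant `SEGMENT_BYTES` are illustrative only.

```python
# Toy model of memory coalescing; not the paper's code.
WARP_SIZE = 32        # 32-wide SIMD organization, as stated in the FAQ
SEGMENT_BYTES = 128   # ASSUMED transaction size; real hardware varies


def transactions_per_warp(stride_elems, elem_bytes=4, base_elem=0):
    """Count the distinct SEGMENT_BYTES-sized segments touched when
    thread t of one warp loads element base_elem + t * stride_elems."""
    segments = set()
    for t in range(WARP_SIZE):
        byte_addr = (base_elem + t * stride_elems) * elem_bytes
        segments.add(byte_addr // SEGMENT_BYTES)
    return len(segments)


if __name__ == "__main__":
    for stride in (1, 2, 32):
        print(stride, transactions_per_warp(stride))
```

Under these assumptions, a unit-stride load of 4-byte elements falls into a single segment, while a stride of 32 elements spreads the warp over 32 segments, which is the situation the answer recommends avoiding.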
Q5. How fast is Needleman-Wunsch compiled with icc?
For the single-threaded CPU implementation, for instance, Needleman-Wunsch compiled with icc is 3% faster than when compiled with gcc, and SRAD compiled with icc is 23% slower than when compiled with gcc.
Q6. What are the basic requirements of a benchmark suite for general purpose computing?
The basic requirements of a benchmark suite for general purpose computing include supporting diverse applications with various computation patterns, employing state-of-the-art algorithms, and providing input sets for testing different situations.
Q7. What is the limit on registers and shared memory available per SM?
The limit on registers and shared memory available per SM can constrain the number of active threads, sometimes exposing memory latency [29].
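How per-SM limits cap the number of active threads can be sketched with simple integer arithmetic. The limits below (1024 threads, 8 blocks, 16384 registers, and 16 KB of shared memory per SM) are assumed illustrative values, not figures quoted in this FAQ, and the model ignores allocation granularity. The names `SMLimits` and `active_threads` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    # ASSUMED per-SM limits, for illustration only.
    max_threads: int = 1024
    max_blocks: int = 8
    registers: int = 16384
    shared_bytes: int = 16 * 1024


def active_threads(threads_per_block, regs_per_thread, shared_per_block,
                   lim=SMLimits()):
    """Return how many threads can be resident on one SM.

    Each resource allows some number of whole blocks; the tightest
    resource decides. Allocation granularity is deliberately ignored.
    """
    by_threads = lim.max_threads // threads_per_block
    by_regs = lim.registers // (regs_per_thread * threads_per_block)
    if shared_per_block > 0:
        by_shared = lim.shared_bytes // shared_per_block
    else:
        by_shared = lim.max_blocks  # shared memory is not a constraint
    blocks = min(lim.max_blocks, by_threads, by_regs, by_shared)
    return blocks * threads_per_block


def occupancy(threads_per_block, regs_per_thread, shared_per_block,
              lim=SMLimits()):
    """Fraction of the SM's thread slots that are occupied."""
    resident = active_threads(threads_per_block, regs_per_thread,
                              shared_per_block, lim)
    return resident / lim.max_threads
```

With these assumed limits, raising register use from 10 to 32 per thread at 256 threads per block halves the resident threads, and a 16-thread block is capped by the block limit rather than by registers or shared memory.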
Q8. How do the authors extend their diversity analysis?
The authors plan to extend their diversity analysis by using the clustering analysis performed by Joshi et al. [15], which requires a principal components analysis (PCA) that the authors have left to future work.
Q9. Why does Needleman-Wunsch exhibit an L2 miss rate of 41.2%?
Needleman-Wunsch exhibits an L2 miss rate of 41.2% due to its unconventional memory access patterns (diagonal strips) which are poorly handled by prefetching.
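The diagonal-strip access pattern can be made concrete with a minimal sketch of Needleman-Wunsch scoring that fills the table one anti-diagonal at a time. This is not the Rodinia implementation: the scoring values (match +1, mismatch -1, gap -1) and the function name `nw_score_wavefront` are assumptions chosen for illustration. Cells on the same anti-diagonal depend only on earlier diagonals, so they can be computed independently; in a row-major table, however, neighbouring cells on a diagonal are roughly one row apart in memory, which is one plausible reason such accesses are hard to prefetch.

```python
def nw_score_wavefront(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b, computed by anti-diagonals.

    The scoring parameters are ASSUMED defaults, not the paper's.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Boundary cells: aligning a prefix against an empty string.
    for i in range(rows):
        score[i][0] = i * gap
    for j in range(cols):
        score[0][j] = j * gap
    # Anti-diagonal d contains the interior cells with i + j == d.
    for d in range(2, rows + cols - 1):
        i_lo = max(1, d - (cols - 1))
        i_hi = min(rows - 1, d - 1)
        # Every cell in this loop is independent of the others on the
        # same diagonal, so the loop body could run in parallel.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    return score[rows - 1][cols - 1]
```

For example, `nw_score_wavefront("ACGT", "AGT")` aligns three matching characters and one gap under the assumed scoring.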
Q10. Why do applications such as Leukocyte have relatively low overhead?
Applications such as SRAD and Leukocyte exhibit relatively low overhead because the majority of their computations are independent.
Q11. What is the purpose of a diverse, multi-platform benchmark suite?
A diverse, multi-platform benchmark suite helps software, middleware, and hardware researchers in a variety of ways: • Accelerators offer significant performance and efficiency benefits compared to CPUs for many applications.
Q12. Why do the authors choose different number of threads per thread block for different applications?
The authors also choose different number of threads per thread block for different applications; generally block sizes are chosen to maximize thread occupancy, although in some cases smaller thread blocks and reduced occupancy provide improved performance.
Q13. What are some constraints that affect the performance of the GPU?
Other constraints include the fact that threads cannot fork new threads, the architecture presents a 32-wide SIMD organization, and the fact that only one kernel can run at a time.
Q14. What is the metric used to visualize the behavior of each benchmark?
The authors use Kiviat plots to visualize each benchmark’s inherent behavior, with each axis representing one of the eight microarchitecture-independent characteristics.
Q15. How did the compiler parallelize the two applications?
The compiler was able to automatically parallelize two of the Rodinia applications, HotSpot and SRAD, after the authors made minimal modifications to the code.