Yuanrui Zhang

Researcher at Pennsylvania State University

Publications -  29
Citations -  349

Yuanrui Zhang is an academic researcher from Pennsylvania State University. The author has contributed to research in topics: Cache & Data access. The author has an h-index of 11 and has co-authored 29 publications receiving 332 citations. Previous affiliations of Yuanrui Zhang include Intel.

Papers
Proceedings Article

A compiler framework for extracting superword level parallelism

TL;DR: This paper proposes a novel automated compiler framework for improving superword level parallelism exploitation; it increases SIMD parallelism and captures more superword reuses among the superword statements through global data access and reuse pattern analysis.
Journal Article

Accurate Area, Time and Power Models for FPGA-Based Implementations

TL;DR: This paper presents accurate area, time, and power estimation models for implementations using FPGAs from the Xilinx Virtex-2Pro family (Deng et al. 2008) to facilitate efficient design space exploration in an automated algorithm-architecture codesign framework.
Proceedings Article

Optimizing shared cache behavior of chip multiprocessors

TL;DR: The proposed data locality optimization scheme reduces inter-core conflict misses in the shared cache by 67% on average when both allocation and scheduling are used, and the resulting execution time improvements are very close to the optimal savings that a hypothetical scheme could achieve.
Proceedings Article

Optimizing Data Layouts for Parallel Computation on Multicores

TL;DR: This work explores an automatic data layout transformation for multithreaded applications running on multicores; the transformation determines a customized memory layout for each target array to minimize potential cache conflicts across threads.
Proceedings Article

Studying inter-core data reuse in multicores

TL;DR: This paper presents a novel, compiler-based data locality optimization strategy for multicores that carefully balances inter-core and intra-core reuse optimizations to maximize the benefits that can be extracted from shared caches, and finds it very effective in optimizing data locality in multicores.