Yuan Tang
Researcher at Fudan University
Publications - 32
Citations - 566
Yuan Tang is an academic researcher at Fudan University. The author has contributed to research on the topics Cache-oblivious algorithm and Cache. The author has an h-index of 9 and has co-authored 31 publications receiving 521 citations. Previous affiliations of Yuan Tang include the University of Electronic Science and Technology of China and the University of Tennessee.
Papers
Proceedings Article
The Pochoir stencil compiler
TL;DR: The Pochoir stencil compiler allows a programmer to write a simple specification of a stencil in a domain-specific stencil language embedded in C++, which the Pochoir compiler then translates into high-performing Cilk code that employs an efficient parallel cache-oblivious algorithm.
Proceedings Article
Cache-oblivious wavefront: improving parallelism of recursive dynamic programming algorithms without losing cache-efficiency
TL;DR: Techniques are applied to a set of widely known dynamic programming problems, such as Floyd-Warshall's All-Pairs Shortest Paths, Stencil, and LCS, to remove artificial dependencies while preserving cache-optimality by inheriting the divide-and-conquer (DAC) strategy.
Proceedings Article
VNET/P: bridging the cloud and high performance computing through fast overlay networking
TL;DR: The design, implementation, and evaluation of a layer 2 virtual networking system that has negligible latency and bandwidth overheads in 1-10 Gbps networks are described, suggesting it is feasible to extend a software-based overlay network designed for computing at wide-area scales into tightly coupled environments.
Proceedings Article
AUTOGEN: automatic discovery of cache-oblivious parallel recursive algorithms for solving dynamic programs
Rezaul Chowdhury, Pramod Ganapathi, Jesmin Jahan Tithi, Charles Bachmeier, Bradley C. Kuszmaul, Charles E. Leiserson, Armando Solar-Lezama, Yuan Tang
TL;DR: The experimental results show that several autodiscovered algorithms significantly outperform parallel looping and tiled loop-based algorithms. They are also less sensitive to fluctuations in memory and bandwidth than their looping counterparts, and their running times and energy profiles remain more stable.
Proceedings Article
Provably Efficient Scheduling of Cache-oblivious Wavefront Algorithms
TL;DR: This paper systematically transforms standard cache-oblivious recursive divide-and-conquer algorithms into recursive wavefront algorithms to achieve optimal parallel cache complexity and high parallelism under state-of-the-art schedulers for fork-join programs.