Yin Huai
Researcher at Ohio State University
Publications - 12
Citations - 1969
Yin Huai is an academic researcher from Ohio State University. The author has contributed to research on the topics of big data and SQL. The author has an h-index of 8 and has co-authored 12 publications receiving 1744 citations.
Papers
Proceedings Article
Spark SQL: Relational Data Processing in Spark
Michael Armbrust, Reynold Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, Matei Zaharia
TL;DR: Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API, and includes a highly extensible optimizer, Catalyst, built using features of the Scala programming language.
Proceedings Article
RCFile: A fast and space-efficient data placement structure in MapReduce-based warehouse systems
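TL;DR: This paper presents a big data placement structure called RCFile (Record Columnar File) and its implementation in the Hadoop system, and shows the effectiveness of RCFile in satisfying four requirements: fast data loading, fast query processing, highly efficient storage space utilization, and strong adaptivity to highly dynamic workload patterns.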
Proceedings Article
YSmart: Yet Another SQL-to-MapReduce Translator
TL;DR: YSmart, a correlation-aware SQL-to-MapReduce translator that applies a set of rules to use the minimal number of MapReduce jobs to execute multiple correlated operations in a complex query, can significantly reduce redundant computations, I/O operations, and network transfers compared to existing translators.
Proceedings Article
Major Technical Advancements in Apache Hive
Yin Huai, Ashutosh Chauhan, Alan Gates, Günther Hagleitner, Eric N. Hanson, Owen O'Malley, Jitendra Pandey, Yuan Yuan, Rubao Lee, Xiaodong Zhang
TL;DR: A community-based effort on technical advancements in Hive provides significant improvements on storage efficiency and query execution performance and shows how academic research lays a foundation for Hive to improve its daily operations.
Journal Article
Accelerating pathology image data cross-comparison on CPU-GPU hybrid systems
TL;DR: The solution consists of an efficient GPU algorithm and a pipelined system framework with task migration support that improves the performance of spatial cross-comparison by over 18 times compared with a parallelized spatial database approach.