Jiajia Li
Researcher at Pacific Northwest National Laboratory
Publications: 45
Citations: 1073
Jiajia Li is an academic researcher from Pacific Northwest National Laboratory. The author has contributed to research on topics including sparse matrices and speedup, has an h-index of 14, and has co-authored 40 publications receiving 716 citations. Previous affiliations of Jiajia Li include the Chinese Academy of Sciences and the Georgia Institute of Technology.
Papers
Proceedings ArticleDOI
Bridging the gap between deep learning and sparse matrix format selection
TL;DR: This work describes how to effectively bridge the gap between deep learning and the special needs of sparse matrix format selection, a pillar HPC problem, through a set of techniques for matrix representations, deep-learning network structure, and cross-architecture model migration.
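One of the representation challenges is that a CNN needs fixed-size input while sparse matrices vary in size. A common way to handle this (a minimal illustrative sketch, not the paper's exact representation; the function name and resolution are assumptions) is to downsample the nonzero pattern into a small density image:

```python
import numpy as np

def density_map(rows, cols, shape, res=8):
    """Downsample a sparse matrix's nonzero pattern into a res x res
    density image, giving a fixed-size, CNN-friendly input.

    rows, cols: coordinates of the nonzeros
    shape:      (n_rows, n_cols) of the original matrix
    """
    img = np.zeros((res, res))
    # Map each nonzero coordinate to a coarse bin.
    r_bin = (np.asarray(rows) * res // shape[0]).clip(0, res - 1)
    c_bin = (np.asarray(cols) * res // shape[1]).clip(0, res - 1)
    # Accumulate counts per bin (handles repeated bins correctly).
    np.add.at(img, (r_bin, c_bin), 1.0)
    return img / max(len(rows), 1)  # normalize by the nonzero count
```

A diagonal matrix, for instance, yields an image whose mass lies on the diagonal, which is exactly the kind of structural cue a format-selection classifier can learn from.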
Proceedings ArticleDOI
Model-Driven Sparse CP Decomposition for Higher-Order Tensors
TL;DR: A novel, adaptive tensor memoization algorithm, AdaTM, which allows a user to make a space-time tradeoff by automatically tuning algorithmic and machine parameters using a model-driven framework, making its performance more scalable for higher-order data problems.
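The space-time tradeoff that AdaTM tunes automatically can be illustrated in miniature: memoizing shared intermediate products with a bounded cache trades memory for recomputation. This sketch is not AdaTM itself; the helper names and the trivial stand-in computation are assumptions for illustration only.

```python
from functools import lru_cache

def make_partial_product(maxsize, computed):
    """Return a memoized 'intermediate product' with a capped cache.

    maxsize:  cache capacity (the space side of the tradeoff)
    computed: list recording every actual recomputation (the time side)
    """
    @lru_cache(maxsize=maxsize)
    def partial(i):
        computed.append(i)  # record each real (non-cached) computation
        return i * i        # stand-in for an expensive tensor contraction
    return partial
```

With a larger cache, repeated requests hit memoized results; with a smaller one, evicted intermediates must be recomputed — the dial a model-driven framework can turn per machine and per dataset.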
Proceedings ArticleDOI
Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning
TL;DR: The toolchain automatically cracks different GPU ISA encodings and adaptively builds an assembler for each, enabling bare-metal performance tuning of GPU applications.
Proceedings ArticleDOI
A pattern based algorithmic autotuner for graph processing on GPUs
TL;DR: Gswitch is a pattern-based algorithmic auto-tuning system that dynamically switches between optimization variants with negligible overhead and provides a simple programming interface that conceals low-level tuning details from the user.
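Gswitch's selection is pattern-based and profile-guided on GPUs; the generic idea of empirically choosing among optimization variants can be sketched as follows (a toy illustration only; the function names are not Gswitch's API):

```python
import time

def pick_fastest(variants, sample_input):
    """Return the variant that runs fastest on a sample input.

    variants:     list of callables implementing the same computation
    sample_input: representative workload used for the trial runs
    """
    best, best_time = None, float("inf")
    for variant in variants:
        start = time.perf_counter()
        variant(sample_input)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = variant, elapsed
    return best
```

A real system amortizes this choice: it profiles once (or adaptively, with negligible overhead) and then dispatches all subsequent calls to the winning variant.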
Proceedings ArticleDOI
Optimizing sparse tensor times matrix on multi-core and many-core architectures
TL;DR: The optimized design and implementation of sparse tensor-times-dense matrix multiply (SpTTM) for CPU and GPU platforms is presented, which is a critical bottleneck in data analysis and mining applications based on tensor methods, such as the Tucker decomposition.
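The SpTTM kernel itself is easy to state: a mode-n product of a sparse tensor with a dense factor matrix. A minimal reference sketch in COO format (not the paper's optimized CPU/GPU implementation; the function name and dense output layout are simplifying assumptions) looks like this:

```python
import numpy as np

def sparse_ttm(coords, vals, shape, U, mode):
    """Sparse tensor-times-dense-matrix (TTM) along `mode`.

    coords: (nnz, ndim) array of nonzero indices (COO format)
    vals:   (nnz,) nonzero values
    shape:  full shape of the sparse tensor
    U:      dense factor matrix of shape (shape[mode], R)
    Returns a dense tensor with dimension `mode` replaced by R.
    """
    R = U.shape[1]
    out_shape = list(shape)
    out_shape[mode] = R
    Y = np.zeros(out_shape)
    # Each nonzero scatters a scaled row of U into the output.
    for idx, v in zip(coords, vals):
        out_idx = list(idx)
        for r in range(R):
            out_idx[mode] = r
            Y[tuple(out_idx)] += v * U[idx[mode], r]
    return Y
```

The output is only semi-sparse (dense along the product mode), which is exactly why data layout and parallelization matter so much in the optimized CPU and GPU designs.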