
Lansong Diao

Researcher at Alibaba Group

Publications - 12
Citations - 147

Lansong Diao is an academic researcher from Alibaba Group. The author has contributed to research in the topics of computer science and speedup. The author has an h-index of 3 and has co-authored 7 publications receiving 37 citations.

Papers
Posted Content

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models

TL;DR: This work proposes DAPPLE, a synchronous training framework that combines data parallelism and pipeline parallelism for large DNN models. It features a novel parallelization strategy planner that solves the partition and placement problems and explores optimal hybrid strategies of data and pipeline parallelism.
Proceedings Article

DAPPLE: a pipelined data parallel approach for training large models

TL;DR: DAPPLE is a synchronous training framework that combines data parallelism and pipeline parallelism for large DNN models. It features a novel parallelization strategy planner that solves the partition and placement problems.
Posted Content

FusionStitching: Boosting Memory Intensive Computations for Deep Learning Workloads

TL;DR: This work proposes FusionStitching, a deep learning compiler that automatically fuses memory-intensive operators with varied data dependencies and non-homogeneous parallelism into large GPU kernels, reducing global memory access and operation scheduling overhead. It efficiently tunes the optimal stitching scheme just-in-time using a domain-specific cost model.
Proceedings Article

DISC: A Dynamic Shape Compiler for Machine Learning Workloads

TL;DR: DISC enriches a set of IRs to form a fully dynamic shape representation and generates the runtime flow at compile time to support dynamic-shape-based logic. This avoids interpretation overhead at runtime and enlarges the opportunity for host-device co-optimization.
Proceedings Article

PAI-FCNN: FPGA Based Inference System for Complex CNN Models

TL;DR: This paper presents the design of PAI-FCNN, an FPGA-based CNN inference system that supports modern complex CNN models. PAI-FCNN achieves better throughput and power efficiency than GPU solutions.