Ningyi Xu

Researcher at Microsoft

Publications -  47
Citations -  2878

Ningyi Xu is an academic researcher from Microsoft. The author has contributed to research in topics: Speedup & Ranking (information retrieval). The author has an h-index of 18 and has co-authored 42 publications receiving 2361 citations. Previous affiliations of Ningyi Xu include Baidu & Shanghai Jiao Tong University.

Papers
Proceedings Article

Going Deeper with Embedded FPGA Platform for Convolutional Neural Network

TL;DR: This paper presents an in-depth analysis of state-of-the-art CNN models, shows that convolutional layers are computational-centric and fully-connected layers are memory-centric, and proposes a CNN accelerator design on embedded FPGA for ImageNet large-scale image classification.
Proceedings Article

FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates

TL;DR: This paper proposes FP-DNN (Field Programmable DNN), an end-to-end framework that takes TensorFlow-described DNNs as input and automatically generates hardware implementations on FPGA boards with RTL-HLS hybrid templates.
Proceedings Article

ClickNP: Highly Flexible and High Performance Network Processing with Reconfigurable Hardware

TL;DR: To the best of the authors' knowledge, ClickNP is the first FPGA-accelerated platform for NFs that is written completely in a high-level language, achieving 40 Gbps line rate at any packet size and reducing latency by 10x.
Proceedings Article

FPMR: MapReduce framework on FPGA

TL;DR: This paper presents FPMR, a MapReduce framework on FPGA that provides programming abstraction, hardware architecture, and basic building blocks so that developers can pay more attention to the application itself, and demonstrates the speedup of the framework.
Proceedings Article

ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture

TL;DR: This paper proposes ForeGraph, a large-scale graph processing framework based on the multi-FPGA architecture, which outperforms state-of-the-art FPGA-based large-scale graph processing systems by 4.54x when executing PageRank on the Twitter graph.