Open Access Proceedings Article

Azure accelerated networking: SmartNICs in the public cloud

TLDR
The design of AccelNet is presented, including the hardware/software codesign model, performance results on key workloads, and experiences and lessons learned from developing and deploying AccelNet on FPGA-based Azure SmartNICs.
Abstract
Modern cloud architectures rely on each server running its own networking stack to implement policies such as tunneling for virtual networks, security, and load balancing. However, these networking stacks are becoming increasingly complex as features are added and as network speeds increase. Running these stacks on CPU cores takes away processing power from VMs, increasing the cost of running cloud services, and adding latency and variability to network performance. We present Azure Accelerated Networking (AccelNet), our solution for offloading host networking to hardware, using custom Azure SmartNICs based on FPGAs. We define the goals of AccelNet, including programmability comparable to software, and performance and efficiency comparable to hardware. We show that FPGAs are the best current platform for offloading our networking stack as ASICs do not provide sufficient programmability, and embedded CPU cores do not provide scalable performance, especially on single network flows. Azure SmartNICs implementing AccelNet have been deployed on all new Azure servers since late 2015 in a fleet of >1M hosts. The AccelNet service has been available for Azure customers since 2016, providing consistent <15µs VM-VM TCP latencies and 32Gbps throughput, which we believe represents the fastest network available to customers in the public cloud. We present the design of AccelNet, including our hardware/software codesign model, performance results on key workloads, and experiences and lessons learned from developing and deploying AccelNet on FPGA-based Azure SmartNICs.


Citations
Proceedings Article

An Open-Source Benchmark Suite for Microservices and Their Hardware-Software Implications for Cloud & Edge Systems

TL;DR: This paper presents DeathStarBench, a novel, open-source benchmark suite built with microservices that is representative of large end-to-end services, modular and extensible, and uses it to study the architectural characteristics of microservices, their implications in networking and operating systems, their challenges with respect to cluster management, and their trade-offs in terms of application design and programming frameworks.
Proceedings Article

Seer: Leveraging Big Data to Navigate the Complexity of Performance Debugging in Cloud Microservices

TL;DR: Seer is presented, an online cloud performance debugging system that leverages deep learning and the massive amount of tracing data cloud systems collect to learn spatial and temporal patterns that translate to QoS violations.
Proceedings Article

Understanding PCIe performance for end host networking

TL;DR: A theoretical model for PCIe and pcie-bench, an open-source suite, are presented, allowing developers to gain an accurate and deep understanding of the PCIe substrate; the resulting insights guided software and future hardware architectures for both commercial and research-oriented network cards and DMA engines.
Proceedings Article

Shenango: Achieving High CPU Efficiency for Latency-sensitive Datacenter Workloads

TL;DR: Shenango achieves tail latency and throughput comparable to ZygOS, a state-of-the-art, kernel-bypass network stack, but can linearly trade latency-sensitive application throughput for batch processing application throughput, vastly increasing CPU efficiency.
Posted Content

Datacenter RPCs can be General and Fast

TL;DR: eRPC is a new general-purpose remote procedure call (RPC) library that offers performance comparable to specialized systems, while running on commodity CPUs in traditional datacenter networks based on either lossy Ethernet or lossless fabrics.
References
Journal Article

P4: programming protocol-independent packet processors

TL;DR: This paper proposes P4 as a strawman for how OpenFlow should evolve, and describes how to use P4 to configure a switch to add a new hierarchical label.
Proceedings Article

A cloud-scale acceleration architecture

TL;DR: A new cloud architecture is presented that uses reconfigurable logic to accelerate both network-plane functions and applications, and is much more scalable than prior work, which used secondary rack-scale networks for inter-FPGA communication.
Proceedings Article

SilkRoad: Making Stateful Layer-4 Load Balancing Fast and Cheap Using Switching ASICs

TL;DR: The system, called SilkRoad, is defined in a 400-line P4 program; when compiled to a state-of-the-art switching ASIC, it can load-balance ten million connections simultaneously at line rate.