Azure accelerated networking: SmartNICs in the public cloud

Open Access · Proceedings Article

Daniel Firestone, +31 more

- pp 51-64

TLDR

The design of AccelNet is presented, including the hardware/software codesign model, performance results on key workloads, and experiences and lessons learned from developing and deploying AccelNet on FPGA-based Azure SmartNICs.

Abstract:

Modern cloud architectures rely on each server running its own networking stack to implement policies such as tunneling for virtual networks, security, and load balancing. However, these networking stacks are becoming increasingly complex as features are added and as network speeds increase. Running these stacks on CPU cores takes away processing power from VMs, increasing the cost of running cloud services, and adding latency and variability to network performance. We present Azure Accelerated Networking (AccelNet), our solution for offloading host networking to hardware, using custom Azure SmartNICs based on FPGAs. We define the goals of AccelNet, including programmability comparable to software, and performance and efficiency comparable to hardware. We show that FPGAs are the best current platform for offloading our networking stack as ASICs do not provide sufficient programmability, and embedded CPU cores do not provide scalable performance, especially on single network flows. Azure SmartNICs implementing AccelNet have been deployed on all new Azure servers since late 2015 in a fleet of >1M hosts. The AccelNet service has been available for Azure customers since 2016, providing consistent <15µs VM-VM TCP latencies and 32Gbps throughput, which we believe represents the fastest network available to customers in the public cloud. We present the design of AccelNet, including our hardware/software codesign model, performance results on key workloads, and experiences and lessons learned from developing and deploying AccelNet on FPGA-based Azure SmartNICs.

Citations

Proceedings Article

An Open-Source Benchmark Suite for Microservices and Their Hardware-Software Implications for Cloud & Edge Systems

Yu Gan, +23 more

TL;DR: This paper presents DeathStarBench, a novel, open-source benchmark suite built with microservices that is representative of large end-to-end services, modular and extensible, and uses it to study the architectural characteristics of microservices, their implications in networking and operating systems, their challenges with respect to cluster management, and their trade-offs in terms of application design and programming frameworks.

Proceedings Article

Seer: Leveraging Big Data to Navigate the Complexity of Performance Debugging in Cloud Microservices

Yu Gan, +6 more

TL;DR: Seer is presented, an online cloud performance debugging system that leverages deep learning and the massive amount of tracing data cloud systems collect to learn spatial and temporal patterns that translate to QoS violations.

Proceedings Article

Understanding PCIe performance for end host networking

Rolf Neugebauer, +5 more

TL;DR: A theoretical model for PCIe and pcie-bench, an open-source suite, are presented that allow developers to gain an accurate and deep understanding of the PCIe substrate; the insights gained guided software and future hardware architectures for both commercial and research-oriented network cards and DMA engines.

Proceedings Article

Shenango: Achieving High CPU Efficiency for Latency-sensitive Datacenter Workloads

Amy Ousterhout, +4 more

TL;DR: Shenango achieves tail latency and throughput comparable to ZygOS, a state-of-the-art, kernel-bypass network stack, but can linearly trade latency-sensitive application throughput for batch processing application throughput, vastly increasing CPU efficiency.

Posted Content

Datacenter RPCs can be General and Fast

Anuj Kalia, +2 more

- 02 Jun 2018 -

arXiv: Operating Systems

TL;DR: eRPC is a new general-purpose remote procedure call (RPC) library that offers performance comparable to specialized systems, while running on commodity CPUs in traditional datacenter networks based on either lossy Ethernet or lossless fabrics.

References

Journal Article

P4: programming protocol-independent packet processors

Pat Bosshart, +10 more

TL;DR: This paper proposes P4 as a strawman proposal for how OpenFlow should evolve in the future, and describes how to use P4 to configure a switch to add a new hierarchical label.

Journal Article

A reconfigurable fabric for accelerating large-scale datacenter services

Andrew Putnam, +22 more

- 28 Oct 2016 -

Communications of The ACM

TL;DR: The authors deployed the reconfigurable fabric in a bed of 1,632 servers and FPGAs in a production datacenter and successfully used it to accelerate the ranking portion of the Bing Web search engine by nearly a factor of two.

Journal Article

A reconfigurable fabric for accelerating large-scale datacenter services

Andrew Putnam, +22 more

TL;DR: The requirements and architecture of the fabric are described, the critical engineering challenges and solutions needed to make the system robust in the presence of failures are detailed, and the performance, power, and resilience of the system when ranking candidate documents are measured.

Proceedings Article

SilkRoad: Making Stateful Layer-4 Load Balancing Fast and Cheap Using Switching ASICs

Rui Miao, +4 more

TL;DR: The system, called SilkRoad, is defined in a 400-line P4 program; when compiled to a state-of-the-art switching ASIC, it can load-balance ten million connections simultaneously at line rate.
