Open Access Proceedings Article
Azure accelerated networking: SmartNICs in the public cloud
Daniel Firestone, Andrew Putnam, Mundkur Sambhrama Madhusudhan, Derek Chiou, Alireza Dabagh, Mike Andrewartha, Hari Angepat, Vivek Bhanu, Adrian M. Caulfield, Eric S. Chung, Chandrappa Harish Kumar, Chaturmohta Somesh, Matt Humphrey, Jack Lavier, Lam Norman C, Fengfen Liu, Kalin Ovtcharov, Jitu Padhye, Gautham Popuri, Shachar Raindel, Tejas Sapre, Mark Shaw, Gabriel Silva, Madhan Sivakumar, Nisheeth Srivastava, Anshuman Verma, Qasim Zuhair, Deepak Bansal, Doug Burger, Kushagra Vaid, David A. Maltz, Albert Greenberg
- pp 51-64
TLDR
The design of AccelNet is presented, including the hardware/software codesign model, performance results on key workloads, and experiences and lessons learned from developing and deploying AccelNet on FPGA-based Azure SmartNICs.
Abstract:
Modern cloud architectures rely on each server running its own networking stack to implement policies such as tunneling for virtual networks, security, and load balancing. However, these networking stacks are becoming increasingly complex as features are added and as network speeds increase. Running these stacks on CPU cores takes away processing power from VMs, increasing the cost of running cloud services, and adding latency and variability to network performance.
We present Azure Accelerated Networking (AccelNet), our solution for offloading host networking to hardware, using custom Azure SmartNICs based on FPGAs. We define the goals of AccelNet, including programmability comparable to software, and performance and efficiency comparable to hardware. We show that FPGAs are the best current platform for offloading our networking stack as ASICs do not provide sufficient programmability, and embedded CPU cores do not provide scalable performance, especially on single network flows.
Azure SmartNICs implementing AccelNet have been deployed on all new Azure servers since late 2015 in a fleet of >1M hosts. The AccelNet service has been available for Azure customers since 2016, providing consistent <15µs VM-VM TCP latencies and 32Gbps throughput, which we believe represents the fastest network available to customers in the public cloud. We present the design of AccelNet, including our hardware/software codesign model, performance results on key workloads, and experiences and lessons learned from developing and deploying AccelNet on FPGA-based Azure SmartNICs.
Citations
Proceedings Article
An Open-Source Benchmark Suite for Microservices and Their Hardware-Software Implications for Cloud & Edge Systems
Yu Gan, Yanqi Zhang, Dailun Cheng, Ankitha Shetty, Priyal Rathi, Nayan Katarki, Ariana Bruno, Justin Hu, Brian Ritchken, Brendon Jackson, Kelvin Hu, Meghna Pancholi, Yuan He, Brett Clancy, Chris Colen, Fukang Wen, Catherine Leung, Siyuan Wang, Leon Zaruvinsky, Mateo Espinosa, Rick Lin, Zhongling Liu, Jake Padilla, Christina Delimitrou
TL;DR: This paper presents DeathStarBench, a novel, open-source benchmark suite built with microservices that is representative of large end-to-end services, modular and extensible, and uses it to study the architectural characteristics of microservices, their implications in networking and operating systems, their challenges with respect to cluster management, and their trade-offs in terms of application design and programming frameworks.
Proceedings Article
Seer: Leveraging Big Data to Navigate the Complexity of Performance Debugging in Cloud Microservices
TL;DR: Seer, an online cloud performance debugging system, is presented; it leverages deep learning and the massive amount of tracing data that cloud systems collect to learn spatial and temporal patterns that translate to QoS violations.
Proceedings Article
Understanding PCIe performance for end host networking
Rolf Neugebauer, Gianni Antichi, Jose Fernando Zazo, Yury Audzevich, Sergio Lopez-Buedo, Andrew W. Moore
TL;DR: A theoretical model for PCIe and pcie-bench, an open-source suite, are presented that together allow developers to gain an accurate and deep understanding of the PCIe substrate; the resulting insights guided software and future hardware architectures for both commercial and research-oriented network cards and DMA engines.
Proceedings Article
Shenango: Achieving High CPU Efficiency for Latency-sensitive Datacenter Workloads
TL;DR: Shenango achieves tail latency and throughput comparable to ZygOS, a state-of-the-art, kernel-bypass network stack, but can linearly trade latency-sensitive application throughput for batch processing application throughput, vastly increasing CPU efficiency.
Posted Content
Datacenter RPCs can be General and Fast
TL;DR: eRPC is a new general-purpose remote procedure call (RPC) library that offers performance comparable to specialized systems, while running on commodity CPUs in traditional datacenter networks based on either lossy Ethernet or lossless fabrics.
References
Journal Article
P4: programming protocol-independent packet processors
Pat Bosshart, Daniel P. Daly, Glen Gibb, Martin J. Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Daniel Talayco, Amin Vahdat, George Varghese, David Walker
TL;DR: This paper proposes P4 as a strawman proposal for how OpenFlow should evolve in the future, and describes how to use P4 to configure a switch to add a new hierarchical label.
Journal Article
A reconfigurable fabric for accelerating large-scale datacenter services
Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen F. Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James R. Larus, Eric C. Peterson, Simon Pope, Aaron L. Smith, Jason Thong, Phillip Yi Xiao, Doug Burger
TL;DR: The authors deployed the reconfigurable fabric in a bed of 1,632 servers and FPGAs in a production datacenter and successfully used it to accelerate the ranking portion of the Bing Web search engine by nearly a factor of two.
Journal Article
A reconfigurable fabric for accelerating large-scale datacenter services
Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen F. Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James R. Larus, Eric C. Peterson, Simon Pope, Aaron L. Smith, Jason Thong, Phillip Yi Xiao, Doug Burger
TL;DR: The requirements and architecture of the fabric are described, the critical engineering challenges and solutions needed to make the system robust in the presence of failures are detailed, and the performance, power, and resilience of the system when ranking candidate documents are measured.
Proceedings Article
A cloud-scale acceleration architecture
Adrian M. Caulfield, Eric S. Chung, Andrew Putnam, Hari Angepat, Jeremy Fowers, Michael Haselman, Stephen F. Heil, Matt Humphrey, Puneet Kaur, Joo-Young Kim, Lo Daniel, Todd Massengill, Kalin Ovtcharov, Michael K. Papamichael, Lisa Woods, Sitaram Lanka, Derek Chiou, Doug Burger
TL;DR: A new cloud architecture that uses reconfigurable logic to accelerate both network plane functions and applications, and is much more scalable than prior work which used secondary rack-scale networks for inter-FPGA communication.
Proceedings Article
SilkRoad: Making Stateful Layer-4 Load Balancing Fast and Cheap Using Switching ASICs
TL;DR: The system, called SilkRoad, is defined in a 400-line P4 program and, when compiled to a state-of-the-art switching ASIC, can load-balance ten million connections simultaneously at line rate.