Proceedings Article
Understanding network failures in data centers: measurement, analysis, and implications
Phillipa Gill, Navendu Jain, Nachiappan Nagappan
Vol. 41, Iss. 4, pp. 350-361
TLDR
The first large-scale analysis of failures in a data center network is presented, finding that data center networks show high reliability, commodity switches such as ToRs and AggS are highly reliable, and network redundancy is only 40% effective in reducing the median impact of failure.
Abstract:
We present the first large-scale analysis of failures in a data center network. Through our analysis, we seek to answer several fundamental questions: which devices/links are most unreliable, what causes failures, how do failures impact network traffic and how effective is network redundancy? We answer these questions using multiple data sources commonly collected by network operators. The key findings of our study are that (1) data center networks show high reliability, (2) commodity switches such as ToRs and AggS are highly reliable, (3) load balancers dominate in terms of failure occurrences with many short-lived software related faults, (4) failures have potential to cause loss of many small packets such as keep alive messages and ACKs, and (5) network redundancy is only 40% effective in reducing the median impact of failure.
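The abstract does not state how the 40% figure is computed. As a rough illustration only, the sketch below assumes one plausible definition: impact is the ratio of median traffic during a failure to median traffic before it, computed once for the failed link and once for its whole redundancy group, and effectiveness is the share of the link's lost traffic that the group still carries. The function names `normalized_traffic` and `redundancy_effectiveness`, the metric definition, and the sample numbers are hypothetical, not taken from the paper.

```python
from statistics import median


def normalized_traffic(before, during):
    """Ratio of median traffic during a failure to median traffic before it.

    Assumption: a plausible way to normalize failure impact, not the paper's exact method.
    """
    base = median(before)
    if base == 0:
        raise ValueError("no traffic before the failure; ratio is undefined")
    return median(during) / base


def redundancy_effectiveness(link_ratio, group_ratio):
    """Hypothetical metric: fraction of the failed link's lost traffic that the
    redundancy group still carries. 1.0 means the failure is fully masked,
    0.0 means redundancy did not help at all."""
    lost_on_link = 1.0 - link_ratio
    if lost_on_link <= 0:
        return 1.0  # the link lost nothing, so there is nothing left to mask
    return (group_ratio - link_ratio) / lost_on_link


if __name__ == "__main__":
    # Made-up traffic samples (e.g. bytes per interval), for illustration only.
    link = normalized_traffic([100, 100, 100], [40, 40, 40])
    group = normalized_traffic([200, 200, 200], [152, 152, 152])
    print(round(redundancy_effectiveness(link, group), 3))
```

With these made-up values, a link that drops to 40% of its usual traffic inside a group that stays at 76% scores 0.6, meaning redundancy recovers 60% of what the link lost.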
Citations
Proceedings Article
SIMPLE-fying middlebox policy enforcement using SDN
TL;DR: SIMPLE, an SDN-based policy enforcement layer for efficient middlebox-specific "traffic steering", is presented; it is a significant step toward addressing industry concerns about whether SDN can integrate with existing infrastructure and support L4-L7 capabilities.
Proceedings Article
CONGA: distributed congestion-aware load balancing for datacenters
Mohammad Alizadeh, Tom Edsall, Sarang Dharmapurikar, Ramanan Vaidyanathan, Kevin Chu, Andy Fingerhut, Francis Matus, Rong Pan, Navindra Yadav, George Varghese
TL;DR: The authors argue that datacenter fabric load balancing is best done in the network and requires global schemes such as CONGA to handle asymmetry; CONGA is nearly as effective as a centralized scheduler while being able to react to congestion in microseconds.
Journal Article
Exascale computing and big data
Daniel A. Reed, Jack Dongarra
TL;DR: This work unifies traditionally separated high-performance computing and big data analytics in one place to accelerate scientific discovery and engineering innovation and foster new ideas in science and engineering.
Proceedings Article
Integrating scale out and fault tolerance in stream processing using operator state management
TL;DR: The key idea is to expose internal operator state explicitly to the stream processing system (SPS) through a set of state management primitives; with them, the system can scale automatically to a load factor of L=350 with 50 VMs while recovering quickly from failures.
Proceedings Article
Dynamic scheduling of network updates
Xin Jin, Hongqiang Harry Liu, Rohan Gandhi, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Jennifer Rexford, Roger Wattenhofer
TL;DR: Dionysus encodes the consistency-related dependencies among updates at individual switches as a graph and then schedules these updates dynamically, based on runtime differences in the update speeds of different switches, which makes network updates complete faster.
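The idea of scheduling updates from a dependency graph at runtime can be illustrated with a much simplified, event-driven sketch. It assumes each update is a node whose prerequisites must finish first, and that each switch's apply time is known; the `latency` values stand in for runtime measurements and are hypothetical. Every update is issued as soon as its prerequisites complete, so a slow switch delays only the updates that depend on it. This is not Dionysus's actual algorithm, which handles additional constraints; the function `schedule_updates` is illustrative only.

```python
import heapq
from collections import defaultdict


def schedule_updates(deps, latency):
    """Dynamically schedule updates over a dependency graph.

    deps:    dict mapping every update to the set of updates that must complete
             before it may be issued (roots map to an empty set).
    latency: dict mapping each update to the time its switch takes to apply it
             (hypothetical values standing in for runtime measurements).
    Returns a dict mapping each update to its completion time.
    """
    remaining = {u: set(pre) for u, pre in deps.items()}
    dependents = defaultdict(list)
    for u, pre in deps.items():
        for p in pre:
            dependents[p].append(u)

    # Min-heap of (completion_time, update); roots are issued at time 0.
    events = [(latency[u], u) for u, pre in remaining.items() if not pre]
    heapq.heapify(events)

    done = {}
    while events:
        t, u = heapq.heappop(events)
        done[u] = t
        # Release any dependent whose last prerequisite just finished.
        for v in dependents[u]:
            remaining[v].discard(u)
            if not remaining[v]:
                heapq.heappush(events, (t + latency[v], v))

    if len(done) != len(deps):
        raise ValueError("dependency cycle: some updates can never be issued")
    return done
```

For example, with `deps = {"a": set(), "b": set(), "c": {"a", "b"}}` and latencies 1, 5 and 2, update `c` is issued at time 5, when the slower prerequisite `b` finishes, and completes at 7.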
References
Journal Article
OpenFlow: enabling innovation in campus networks
Nick McKeown, Thomas Anderson, Hari Balakrishnan, Guru Parulkar, Larry L. Peterson, Jennifer Rexford, Scott Shenker, Jonathan S. Turner
TL;DR: This whitepaper proposes OpenFlow: a way for researchers to run experimental protocols in the networks they use every day, based on an Ethernet switch, with an internal flow-table, and a standardized interface to add and remove flow entries.
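The flow-table abstraction described above can be sketched as a small in-memory table: each entry pairs match fields with a priority and a list of actions, and a controller-facing interface adds and removes entries. This is a hypothetical model for illustration only; it is not the OpenFlow wire protocol or any switch's API. The class and method names, the string-valued actions, and the choice to return `send_to_controller` on a table miss are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    match: dict    # header field -> required value; absent fields act as wildcards
    priority: int  # higher priority wins when several entries match
    actions: list  # e.g. ["output:2"] (hypothetical action strings)


@dataclass
class FlowTable:
    entries: list = field(default_factory=list)

    def add_flow(self, match, priority, actions):
        """Install a new entry (copies inputs so callers cannot mutate it later)."""
        self.entries.append(FlowEntry(dict(match), priority, list(actions)))

    def remove_flow(self, match):
        """Remove every entry whose match fields are exactly `match`."""
        self.entries = [e for e in self.entries if e.match != match]

    def lookup(self, packet):
        """Return the actions of the highest-priority entry matching `packet`."""
        hits = [e for e in self.entries
                if all(packet.get(k) == v for k, v in e.match.items())]
        if not hits:
            return ["send_to_controller"]  # assumed table-miss behavior
        return max(hits, key=lambda e: e.priority).actions
```

Fields absent from `match` act as wildcards, so an empty match is a catch-all rule; giving it a low priority keeps it below more specific entries.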
Journal Article
A scalable, commodity data center network architecture
TL;DR: This paper shows how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements and argues that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions.
Proceedings Article
VL2: a scalable and flexible data center network
Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, Sudipta Sengupta
TL;DR: VL2 is a practical network architecture that scales to support huge data centers with uniform high capacity between servers, performance isolation between services, and Ethernet layer-2 semantics; the authors built a working prototype of it.
Proceedings Article
Network traffic characteristics of data centers in the wild
TL;DR: An empirical study of the network traffic in 10 data centers belonging to three different categories (university, enterprise campus, and cloud data centers), where the cloud category includes not only data centers employed by large online service providers offering Internet-facing applications but also data centers used to host data-intensive (MapReduce-style) applications.
Proceedings Article
Data center TCP (DCTCP)
Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye, Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, Murari Sridharan
TL;DR: DCTCP enables the applications to handle 10X the current background traffic, without impacting foreground traffic, thus largely eliminating incast problems, and delivers the same or better throughput than TCP, while using 90% less buffer space.