
Enhancing TCP's Loss Recovery Using Limited Transmit

01 Jan 2001 - RFC 3042, pp. 1-9
TL;DR: This document proposes Limited Transmit, a TCP mechanism that sends a new data segment in response to each of the first two duplicate acknowledgments, increasing the chance of recovering a lost segment via fast retransmit rather than a costly retransmission timeout.
Abstract: This document proposes a new Transmission Control Protocol (TCP) mechanism that can be used to more effectively recover lost segments when a connection's congestion window is small, or when a large number of segments are lost in a single transmission window. The "Limited Transmit" algorithm calls for sending a new data segment in response to each of the first two duplicate acknowledgments that arrive at the sender. Transmitting these segments increases the probability that TCP can recover from a single lost segment using the fast retransmit algorithm, rather than using a costly retransmission timeout. Limited Transmit can be used both in conjunction with, and in the absence of, the TCP selective acknowledgment (SACK) mechanism.
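As a rough illustration, here is a minimal Python sketch of the rule described above; the names are hypothetical and the normative conditions live in the RFC itself (e.g., the receiver's advertised window must permit the new segment, and outstanding data must stay within cwnd plus two segments).

```python
# Minimal sketch of the Limited Transmit rule described in the abstract.
# Names are illustrative, not taken from the RFC's text.

DUPTHRESH = 3  # duplicate ACKs needed to trigger fast retransmit

def on_duplicate_ack(dupack_count):
    """Decide the sender's action for the dupack_count-th duplicate ACK."""
    if dupack_count < DUPTHRESH:
        # Limited Transmit: send one previously unsent segment, without
        # changing cwnd, to keep the ACK clock running.
        return "send one new segment (Limited Transmit)"
    # Standard fast retransmit takes over on the third duplicate ACK.
    return "fast retransmit the presumed-lost segment"

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "->", on_duplicate_ack(n))
```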


Citations
01 Apr 2004
Abstract: The purpose of this document is to advance NewReno TCP's Fast Retransmit and Fast Recovery algorithms in RFC 2582 from Experimental to Standards Track status.

1,602 citations

Proceedings ArticleDOI
16 Aug 2009
TL;DR: This paper uses high-resolution timers to enable microsecond-granularity TCP timeouts and shows that eliminating the minimum retransmission timeout bound is safe for all environments, including the wide-area.
Abstract: This paper presents a practical solution to a problem facing high-fan-in, high-bandwidth synchronized TCP workloads in datacenter Ethernets: the TCP incast problem. In these networks, receivers can experience a drastic reduction in application throughput when simultaneously requesting data from many servers using TCP. Inbound data overfills small switch buffers, leading to TCP timeouts lasting hundreds of milliseconds. For many datacenter workloads that have a barrier synchronization requirement (e.g., filesystem reads and parallel data-intensive queries), throughput is reduced by up to 90%. For latency-sensitive applications, TCP timeouts in the datacenter impose delays of hundreds of milliseconds in networks with round-trip times in microseconds. Our practical solution uses high-resolution timers to enable microsecond-granularity TCP timeouts. We demonstrate that this technique is effective in avoiding TCP incast collapse in simulation and in real-world experiments. We show that eliminating the minimum retransmission timeout bound is safe for all environments, including the wide-area.
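To make the timer change concrete, here is a hedged Python sketch of the standard RTO estimator (per RFC 6298) with a configurable minimum bound; the paper's proposal corresponds to fine-grained timers with that bound effectively removed. The class and parameter names are assumptions, not the authors' code.

```python
# Standard RTO estimator (Jacobson/Karn style, as in RFC 6298) with a
# configurable floor. Removing the floor is safe only with timers fine
# enough to act on microsecond-scale RTTs, which is the paper's point.

ALPHA, BETA, K = 1/8, 1/4, 4   # standard smoothing gains

class RtoEstimator:
    def __init__(self, rto_min=0.2):   # 200 ms is a common default floor
        self.srtt = None
        self.rttvar = None
        self.rto_min = rto_min

    def on_rtt_sample(self, r):
        if self.srtt is None:          # first measurement
            self.srtt, self.rttvar = r, r / 2
        else:
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        return max(self.rto_min, self.srtt + K * self.rttvar)

# Datacenter setting in the spirit of the paper: no floor, microsecond RTTs.
est = RtoEstimator(rto_min=0.0)
print(est.on_rtt_sample(100e-6))   # ~300 microseconds instead of 200 ms
```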

483 citations


Cites background from "Enhancing TCP's Loss Recovery Using..."

  • ...TCP mechanisms such as Limited Transmit [1] were specifically designed to help TCP recover from packet loss when window sizes are small—exactly the problem that occurs during incast collapse....


  • ...Prior work characterizing TCP incast collapse ended on a somewhat down note, finding that TCP improvements (NewReno, SACK [22], RED [13], ECN [30], Limited Transmit [1], and modifications to Slow Start) sometimes increased throughput, but did not substantially prevent TCP incast collapse because the majority of timeouts were caused by full window losses [28]....


Journal ArticleDOI
01 Jan 2002
TL;DR: This paper illustrates the impact of reordering on TCP performance, and proposes several alternatives to dynamically make the fast retransmission algorithm more tolerant of the reordering observed in the network.
Abstract: Previous research indicates that packet reordering is not a rare event on some Internet paths. Reordering can cause performance problems for TCP's fast retransmission algorithm, which uses the arrival of duplicate acknowledgments to detect segment loss. Duplicate acknowledgments can be caused by the loss of a segment or by the reordering of segments by the network. In this paper we illustrate the impact of reordering on TCP performance. In addition, we show the performance of a conservative approach to "undo" the congestion control state changes made in conjunction with spurious retransmissions. Finally, we propose several alternatives to dynamically make the fast retransmission algorithm more tolerant of the reordering observed in the network and assess these algorithms.
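The following Python sketch (not the paper's exact algorithm; names are illustrative) shows the core idea behind one such alternative: adapt the duplicate-ACK threshold upward when a retransmission turns out to have been spurious.

```python
# Sketch of a reordering-tolerant fast retransmit: raise the duplicate-ACK
# threshold after a retransmission proves spurious, so reordering of similar
# depth no longer triggers a false fast retransmit. Illustrative only.

class AdaptiveDupThresh:
    def __init__(self):
        self.dupthresh = 3   # standard initial value

    def on_spurious_retransmit(self, reordering_depth):
        # The segment arrived late rather than being lost: next time, wait
        # for at least that many duplicate ACKs before retransmitting.
        self.dupthresh = max(self.dupthresh, reordering_depth + 1)

    def should_fast_retransmit(self, dupacks):
        return dupacks >= self.dupthresh
```

The trade-off, as the paper's framing suggests, is between avoiding spurious retransmissions (higher threshold) and reacting quickly to genuine loss (lower threshold).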

322 citations

Proceedings Article
26 Feb 2008
TL;DR: This paper analyzes the Incast problem, explores its sensitivity to various system parameters, and examines the effectiveness of alternative TCP- and Ethernet-level strategies in mitigating the TCP throughput collapse.
Abstract: Cluster-based and iSCSI-based storage systems rely on standard TCP/IP-over-Ethernet for client access to data. Unfortunately, when data is striped over multiple networked storage nodes, a client can experience a TCP throughput collapse that results in much lower read bandwidth than should be provided by the available network links. Conceptually, this problem arises because the client simultaneously reads fragments of a data block from multiple sources that together send enough data to overload the switch buffers on the client's link. This paper analyzes this Incast problem, explores its sensitivity to various system parameters, and examines the effectiveness of alternative TCP- and Ethernet-level strategies in mitigating the TCP throughput collapse.
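A back-of-envelope calculation makes the buffer-overload mechanism concrete; the buffer and fragment sizes below are assumptions chosen for illustration, not measurements from the paper.

```python
# Illustration (assumed numbers): why synchronized striped reads overflow
# a shallow switch buffer as the number of storage servers grows.

buffer_bytes = 64 * 1024        # assumed per-port switch buffer: 64 KB
fragment_bytes = 32 * 1024      # assumed per-server response fragment: 32 KB

for n_servers in (2, 4, 8, 16):
    burst = n_servers * fragment_bytes
    print(f"{n_servers:2d} servers -> {burst // 1024} KB burst "
          f"({'overflows' if burst > buffer_bytes else 'fits in'} "
          f"{buffer_bytes // 1024} KB buffer)")
```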

280 citations



Journal ArticleDOI
01 Apr 2005
TL;DR: Measurement results showing the impact of the current network environment on a number of traditional and proposed protocol mechanisms are provided and can be used to guide the definition of more realistic Internet modeling scenarios.
Abstract: In this paper we explore the evolution of both the Internet's most heavily used transport protocol, TCP, and the current network environment with respect to how the network's evolution ultimately impacts end-to-end protocols. The traditional end-to-end assumptions about the Internet are increasingly challenged by the introduction of intermediary network elements (middleboxes) that intentionally or unintentionally prevent or alter the behavior of end-to-end communications. This paper provides measurement results showing the impact of the current network environment on a number of traditional and proposed protocol mechanisms (e.g., Path MTU Discovery, Explicit Congestion Notification, etc.). In addition, we investigate the prevalence and correctness of implementations using proposed TCP algorithmic and protocol changes (e.g., selective acknowledgment-based loss recovery, congestion window growth based on byte counting, etc.). We present results of measurements taken using an active measurement framework to study web servers and a passive measurement survey of clients accessing information from our web server. We analyze our results to gain further understanding of the differences between the behavior of the Internet in theory versus the behavior we observed through measurements. In addition, these measurements can be used to guide the definition of more realistic Internet modeling scenarios. Finally, we present several lessons that will benefit others taking Internet measurements.

242 citations

References
Journal ArticleDOI
01 Aug 1988
TL;DR: Measurements and the reports of beta testers suggest that the seven new algorithms added to 4BSD TCP are fairly good at dealing with congested conditions on the Internet.
Abstract: In October of '86, the Internet had the first of what became a series of 'congestion collapses'. During this period, the data throughput from LBL to UC Berkeley (sites separated by 400 yards and three IMP hops) dropped from 32 Kbps to 40 bps. Mike Karels and I were fascinated by this sudden factor-of-thousand drop in bandwidth and embarked on an investigation of why things had gotten so bad. We wondered, in particular, if the 4.3BSD (Berkeley UNIX) TCP was mis-behaving or if it could be tuned to work better under abysmal network conditions. The answer to both of these questions was "yes". Since that time, we have put seven new algorithms into the 4BSD TCP: (i) round-trip-time variance estimation; (ii) exponential retransmit timer backoff; (iii) slow-start; (iv) more aggressive receiver ack policy; (v) dynamic window sizing on congestion; (vi) Karn's clamped retransmit backoff; (vii) fast retransmit. Our measurements and the reports of beta testers suggest that the final product is fairly good at dealing with congested conditions on the Internet. This paper is a brief description of (i) - (v) and the rationale behind them. (vi) is an algorithm recently developed by Phil Karn of Bell Communications Research, described in [KP87]. (vii) is described in a soon-to-be-published RFC. Algorithms (i) - (v) spring from one observation: the flow on a TCP connection (or ISO TP-4 or Xerox NS SPP connection) should obey a 'conservation of packets' principle. And, if this principle were obeyed, congestion collapse would become the exception rather than the rule. Thus congestion control involves finding places that violate conservation and fixing them. By 'conservation of packets' I mean that for a connection 'in equilibrium', i.e., running stably with a full window of data in transit, the packet flow is what a physicist would call 'conservative': a new packet isn't put into the network until an old packet leaves. The physics of flow predicts that systems with this property should be robust in the face of congestion. Observation of the Internet suggests that it was not particularly robust. Why the discrepancy? There are only three ways for packet conservation to fail: (1) the connection doesn't get to equilibrium, or (2) a sender injects a new packet before an old packet has exited, or (3) the equilibrium can't be reached because of resource limits along the path. In the following sections, we treat each of these in turn.
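Algorithm (i) is often summarized by the scaled-integer filter widely reproduced from this paper's appendix; the sketch below restates it in Python. Treat it as a sketch following the customary sa/sv convention, not the paper's verbatim code.

```python
# Scaled-integer RTT estimator in the style of Jacobson's appendix: srtt is
# kept scaled by 8 and the mean deviation by 4, so the filter needs only
# shifts and adds.

def rtt_update(sa, sv, m):
    """sa: srtt * 8, sv: mean deviation * 4, m: new RTT sample (ticks).
    Returns updated (sa, sv, rto)."""
    m -= sa >> 3          # error term: sample minus current estimate
    sa += m               # srtt += error / 8  (in scaled units)
    if m < 0:
        m = -m            # |error|
    m -= sv >> 2
    sv += m               # mdev += (|error| - mdev) / 4  (scaled units)
    rto = (sa >> 3) + sv  # rto = srtt + 4 * mdev  (unscaled)
    return sa, sv, rto

# Example: srtt = 500 ticks (sa = 4000), mdev = 125 ticks (sv = 500); a
# 700-tick sample yields (4200, 575, 1100), i.e. srtt 525, rto 1100 ticks.
print(rtt_update(8 * 500, 4 * 125, 700))
```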

5,620 citations

01 Mar 1997
TL;DR: This document defines the requirement keywords used in IETF specifications (MUST, SHOULD, MAY, and their variants), states how they should be interpreted, and gives authors a boilerplate phrase to incorporate near the beginning of their documents.
Abstract: In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. Authors who follow these guidelines should incorporate this phrase near the beginning of their document: "The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'MAY', and 'OPTIONAL' in this document are to be interpreted as described in RFC 2119."

3,501 citations


Additional excerpts

  • ...The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this document, are to be interpreted as described in [B97]....


01 Sep 1981
Transmission Control Protocol (RFC 793)

3,411 citations

01 Apr 1999
TL;DR: This document defines TCP's four intertwined congestion control algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery, as well as discussing various acknowledgment generation methods.
Abstract: This document defines TCP's four intertwined congestion control algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery. In addition, the document specifies how TCP should begin transmission after a relatively long idle period, as well as discussing various acknowledgment generation methods.
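As a compact illustration of how the four algorithms fit together, here is a simplified Python sketch; constants and window accounting are simplified, and RFC 2581 and its successors remain normative.

```python
# Simplified sketch of slow start, congestion avoidance, fast retransmit,
# and fast recovery working on a single congestion window variable.

MSS = 1460

class CongestionControl:
    def __init__(self):
        self.cwnd = 2 * MSS        # initial window (simplified)
        self.ssthresh = 64 * 1024

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += MSS                      # slow start: exponential
        else:
            self.cwnd += MSS * MSS // self.cwnd   # congestion avoidance: ~linear

    def on_triple_dupack(self, flight_size):
        # fast retransmit: resend the missing segment immediately; then
        # fast recovery: halve the window instead of collapsing to 1 MSS.
        self.ssthresh = max(flight_size // 2, 2 * MSS)
        self.cwnd = self.ssthresh + 3 * MSS       # inflate by the 3 dup ACKs

    def on_timeout(self):
        self.ssthresh = max(self.cwnd // 2, 2 * MSS)
        self.cwnd = MSS                           # restart in slow start
```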

2,237 citations

01 Oct 1996
TL;DR: TCP may experience poor performance when multiple packets are lost from one window of data because of the limited information available from cumulative acknowledgments.
Abstract: TCP may experience poor performance when multiple packets are lost from one window of data. With the limited information available from cumulative acknowledgments, a TCP sender can only learn about a single lost packet per round trip time. An aggressive sender could choose to retransmit packets early, but such retransmitted segments may have already been successfully received.
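The sketch below illustrates the contrast: given SACK blocks (RFC 2018), a sender can enumerate every hole in the window from a single ACK, where a cumulative ACK alone reveals only the first. The helper function and its names are illustrative.

```python
# With SACK, one ACK exposes all holes in the window at once; with only a
# cumulative ACK, the sender learns about a single hole per round trip.

def holes_from_sack(cum_ack, sack_blocks, high_seq):
    """Return sequence ranges not yet reported received by the peer.
    sack_blocks: list of (start, end) ranges the receiver holds."""
    missing, edge = [], cum_ack
    for start, end in sorted(sack_blocks):
        if start > edge:
            missing.append((edge, start))   # a hole before this SACK block
        edge = max(edge, end)
    if edge < high_seq:
        missing.append((edge, high_seq))
    return missing

# Two losses in one window are visible from a single ACK:
print(holes_from_sack(1000, [(2000, 3000), (4000, 5000)], 6000))
# -> [(1000, 2000), (3000, 4000), (5000, 6000)]
```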

1,639 citations