TCP
- Nagle Algorithm
- Bandwidth-Delay product
- TCP streams and push
- TCP header
- tcpdump and wireshark
Nagle Algorithm and sender SWS
- the usable window is the number of bytes I could send given the
ack, window, and the next sequence number to send:
U = ack + wnd - nxt
- D is the amount of data I am ready to send
- to avoid silly window syndrome (SWS), the sender sends only if it
can send one full MSS, or at least 1/2 the (estimated) receive buffer:
min(D, U) >= MSS or min(D, U) >= rcvbuff / 2
- Nagle suggests, for D < MSS, only sending if the window is
large enough to send everything, and no acks are pending:
nxt = ack and D <= U
- or send if there is data to send, and an override timeout occurs
- in-class exercise: what is the benefit of this, that is, the
benefit of not sending when (a) the window is large enough to send
everything, but (b) acks are pending
(i.e. if D < MSS and nxt != ack)?
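The rules above can be put together in a short sketch. This is only an illustration of the conditions on this slide, not code from any real TCP stack: the function names and the `timer_expired` flag are made up here, and sequence-number wraparound is ignored.

```python
def usable_window(ack: int, wnd: int, nxt: int) -> int:
    """U = ack + wnd - nxt: bytes the advertised window still allows."""
    return ack + wnd - nxt


def should_send(ack: int, wnd: int, nxt: int, d: int, mss: int,
                rcvbuff: int, timer_expired: bool = False) -> bool:
    """Return True if the sender should transmit a segment now."""
    u = usable_window(ack, wnd, nxt)
    sendable = min(d, u)
    if sendable <= 0:
        return False  # no data ready, or the window is closed
    # Sender-side SWS avoidance: a full MSS, or half the receive buffer.
    if sendable >= mss or sendable >= rcvbuff / 2:
        return True
    # Nagle: small data goes out only if nothing is unacked and it all fits.
    if d < mss and nxt == ack and d <= u:
        return True
    # Otherwise wait, unless the override timeout has fired.
    return timer_expired


# 100 bytes ready, nothing outstanding: sent immediately.
idle = should_send(ack=1000, wnd=4000, nxt=1000, d=100, mss=1460, rcvbuff=4000)
# 100 bytes ready, but 100 earlier bytes are still unacked: held back.
busy = should_send(ack=1000, wnd=4000, nxt=1100, d=100, mss=1460, rcvbuff=4000)
```

With these inputs `idle` is True and `busy` is False, which is the case the in-class exercise asks about.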
Nagle Algorithm Overview
- two conflicting goals:
- send data as fast as possible, and
- don't send small packets, which have higher overhead (only significant
if the network is busy/congested)
- solutions:
- always send if the network is not busy, that is, send the first
packet once everything sent before has been acked
- always send any maximum sized packet (either MSS, or at least
1/2 of the estimated receive window)
- otherwise wait a little while to send, until either one of the
above holds, or a timer expires
TCP delayed acks
- a receiver does not know when the sender is going to send the
next segment
- for fastest performance, a receiver should ack every segment
- in a two-way stream an ack is piggybacked on every data segment
going in the other direction
- typically, in a one-way stream TCPs should send one ack packet for
every two full-size segments
- a timer on the receiver should send an ACK for the latest data
if it has not been acked within a short delay (RFC 1122 requires
less than 500ms; about 200ms is a common choice).
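As a rough sketch of the receiver-side decision described above: the function and parameter names below are invented for illustration, and the timer limit is passed in as a parameter rather than fixed, since implementations differ.

```python
def should_ack_now(pending_segments: int,
                   has_reverse_data: bool,
                   ms_since_first_pending: float,
                   ack_delay_ms: float) -> bool:
    """Decide whether a receiver should emit an ACK right now."""
    if pending_segments == 0:
        return False  # nothing new to acknowledge
    if has_reverse_data:
        return True   # two-way stream: piggyback the ack on outgoing data
    if pending_segments >= 2:
        return True   # one-way stream: one ack per two full-size segments
    # A single unacked segment waits for the delayed-ack timer.
    return ms_since_first_pending >= ack_delay_ms
```

For example, with one segment pending and no data to send back, the ack goes out only once `ms_since_first_pending` reaches `ack_delay_ms`.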
Bandwidth-Delay Product
- How big do I make the window?
- Suppose I have a small buffer of size s = 512 bytes;
I could set the window to size s
- if I can send 100,000 bytes/second (800Kbps), and the round-trip
delay is 10ms, I can only send 512 bytes 100 times per second, or at
most 51,200 bytes/second
- if I set my window to the bandwidth-delay product,
s = 100KB/s x 0.01s = 1,000 bytes,
I can send at full speed
- this affects high-bandwidth applications (gigabit and 10-gig ethernet,
microwave) and high-latency applications (especially satellites)
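The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes at most one window of data is sent per round trip; the function names are illustrative, and the round-trip time is given in whole milliseconds to keep the numbers exact.

```python
def max_throughput(window_bytes: int, rtt_ms: int, link_bytes_per_s: int) -> float:
    """Bytes/second achievable: one window per RTT, capped by the link rate."""
    return min(link_bytes_per_s, window_bytes * 1000 / rtt_ms)


def bandwidth_delay_product(link_bytes_per_s: int, rtt_ms: int) -> float:
    """Window size (bytes) needed to keep the link full for one RTT."""
    return link_bytes_per_s * rtt_ms / 1000


small = max_throughput(window_bytes=512, rtt_ms=10, link_bytes_per_s=100_000)
bdp = bandwidth_delay_product(link_bytes_per_s=100_000, rtt_ms=10)
full = max_throughput(window_bytes=int(bdp), rtt_ms=10, link_bytes_per_s=100_000)
```

Here `small` is 51,200 bytes/second, `bdp` is 1,000 bytes, and `full` reaches the link rate of 100,000 bytes/second.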
TCP window scaling
- the window field in the TCP header is 16 bits, so the largest
window is 65,535 bytes
- this is not enough for full bandwidth on a 100ms (RTT) gigabit
ethernet, with a bandwidth-delay product of 100Mb = 12.5MB
- so TCP provides a window scaling option, sent with the
SYN packet
- the option only takes effect if both sides send a window scaling
option with their SYN packet
- if I send a window scale with a value of n, where
0 <= n <= 14, then I must divide the windows
that I send by 2^n
- correspondingly, if I receive a window scale with a value
of n, then I must multiply the windows that I receive by
2^n
- since n <= 14, the maximum window size is
65,535 x 2^14 < 2^30 bytes
- window scaling is defined in
RFC 1323 (since obsoleted by RFC 7323),
"TCP Extensions for High Performance", which also provides protection
against wrapped sequence numbers.
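To make the divide and multiply steps concrete, here is a small sketch using the 12.5MB window from the 100ms gigabit example. The helper names are hypothetical; division and multiplication by 2^n are written as bit shifts.

```python
MAX_SHIFT = 14          # largest window scale value allowed
MAX_FIELD = 0xFFFF      # largest value of the 16-bit window field


def scale_for(window_bytes: int) -> int:
    """Smallest n (0..14) for which the 16-bit field can express window_bytes."""
    for n in range(MAX_SHIFT + 1):
        if (MAX_FIELD << n) >= window_bytes:
            return n
    return MAX_SHIFT  # window exceeds the maximum; clamp to the largest scale


def encode_window(window_bytes: int, n: int) -> int:
    """Sender side: divide by 2^n before writing the header field."""
    return min(window_bytes >> n, MAX_FIELD)


def decode_window(field: int, n: int) -> int:
    """Receiver side: multiply the received field by 2^n."""
    return field << n


n = scale_for(12_500_000)              # 8
field = encode_window(12_500_000, n)   # 48828
window = decode_window(field, n)       # 12499968
```

The decoded window is slightly smaller than the original because the low n bits are lost in the division, so scaled windows have a granularity of 2^n bytes.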
TCP Streams and push
- TCP actually has a segmentation bit: PSH, or push
- when the application "pushes" the data, that information could be
conveyed all the way to the application at the other end
- if TCP can coalesce several user segments (each with PSH) into one
TCP segment, that TCP segment can only carry one PSH bit
- so passing PSH to the application is optional, and TCP has
no record boundaries
- so push is an advisory bit only: it encourages TCP (and the
application) to assume that the data must be sent (and presumably
replied to) before more data will be sent
TCP header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |       |C|E|U|A|P|R|S|F|                               |
| Offset|Reservd|W|C|R|C|S|S|Y|I|            Window             |
|       |       |R|E|G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
TCP Header Format
TCP Header fields
- Source and Destination port: demultiplexing
- Sequence and acknowledgement: reliable delivery
- Data Offset: header size, options
- Window: flow control
- Checksum: correctness
- Urgent Pointer: "special place" in the data stream
TCP Header bits
- SYN: I want to establish a connection
- FIN: I will never again send data on this connection
- RST: kill this connection
- PSH: immediate delivery of this data is probably a good idea
- URG: the urgent pointer is valid
- ACK: the acknowledgement field is valid (set in all but the
first SYN packet)
- ECE: this packet acknowledges a packet received with the IP
"congestion experienced" bit set
- CWR: the sender of this packet has reduced its congestion window
- the last two bits will be discussed in the context of congestion
control
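The fixed 20-byte part of the header shown in the diagram can be decoded with Python's `struct` module. The sketch below is illustrative only: `parse_tcp_header` and the dictionary keys are names chosen here, options are not parsed, and the sample SYN is built by hand from the port, sequence number, and window of the first packet in the ssh trace (with ssh assumed to be port 22).

```python
import struct

# Flag names from the least significant bit (FIN) to the most significant (CWR).
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"]


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (network byte order)."""
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    flag_bits = off_flags & 0xFF
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,  # data offset counts 32-bit words
        "flags": [name for bit, name in enumerate(FLAG_NAMES)
                  if flag_bits & (1 << bit)],
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }


# A hand-built SYN: data offset 5 (no options), only the SYN bit (0x02) set.
syn = struct.pack("!HHIIHHHH", 1022, 22, 185741093, 0, (5 << 12) | 0x02, 512, 0, 0)
info = parse_tcp_header(syn)
```

A `header_len` greater than 20 would indicate that options follow the fixed header.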
tcpdump and wireshark
- tcpdump is a utility to look at all the packets on the network
and print out the headers
- usually run as root
- Wireshark (formerly called Ethereal) is similar but (a) window-based, (b) newer
16:41:58.905998 maru.ics.hawaii.edu.14407 >
volcano.telnet: S 2671654129:2671654129(0)
win 512 [tos 0x10]
16:41:59.115893 volcano.telnet >
maru.ics.hawaii.edu.14407: R 0:0(0)
ack 2671654130 win 0
tcpdump example
16:47:02.285753 maru.1022 > volcano.ssh:
S 185741093:185741093(0) win 512
16:47:02.495648 volcano.ssh > maru.1022:
S 3829593384:3829593384(0)
ack 185741094 win 16352
16:47:02.495648 maru.1022 > volcano.ssh:
. ack 1 win 32120 (DF)
16:47:07.183328 volcano.ssh > maru.1022:
P 1:16(15) ack 1 win 16352 (DF)
16:47:07.183328 maru.1022 > volcano.ssh:
P 1:16(15) ack 16 win 32120 (DF) [tos 0x10]
16:47:07.433203 volcano.ssh > maru.1022:
P 16:292(276) ack 16 win 16352 (DF)
16:47:08.502673 volcano.ssh > maru.1022:
P 16:292(276) ack 16 win 16352 (DF)
16:47:08.522663 maru.1022 > volcano.ssh:
. ack 292 win 32120 (DF) [tos 0x10]
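In the trace above, volcano sends bytes 16:292 twice, about one second apart, with no ack from maru in between; this is consistent with a retransmission. The sketch below spots such repeats. The data segments are transcribed by hand from the trace, and the function name and tuple layout are invented for this example.

```python
def find_retransmissions(segments):
    """Return data segments whose (sender, start, end) was already seen."""
    seen = set()
    repeats = []
    for time, sender, start, end in segments:
        if end == start:
            continue  # no data carried (pure ack), nothing to retransmit
        key = (sender, start, end)
        if key in seen:
            repeats.append((time, sender, start, end))
        seen.add(key)
    return repeats


# (timestamp, sender, first byte, byte after last) for each data segment.
TRACE = [
    ("16:47:07.183328", "volcano.ssh", 1, 16),
    ("16:47:07.183328", "maru.1022", 1, 16),
    ("16:47:07.433203", "volcano.ssh", 16, 292),
    ("16:47:08.502673", "volcano.ssh", 16, 292),
]

repeats = find_retransmissions(TRACE)
```

Only the 16:47:08.502673 segment is reported; the two 1:16 segments travel in opposite directions and so are not duplicates of each other.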