Congestion Control
- TCP timer computation
- congestion control:
- delay (Vegas)
- loss (Reno)
- additive increase/multiplicative decrease
- project
- available bit rate in ATM
TCP timer computation
- TCP designed to run on a wide variety of networks
- to accurately detect packet loss, TCP keeps a running average
of round-trip times
- if the current average is t[i], and an ack is received
now for packets sent dt time ago, the new average is
t[i+1] = a t[i] + (1 - a) dt
- a is usually 7/8, so we multiply by 7 and divide by 8 (shift right
3 places)
- some routes have low average round-trip time but high variability,
e.g. packet round trip times of 100ms, 101ms, 500ms, 100ms, 100ms...
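A minimal sketch of the running average above, using the shift-based arithmetic with a = 7/8. The function names and the starting average of 100 ms are assumptions made for illustration. Truncating to whole milliseconds loses the fraction at every step, so treat this as an illustration of the formula rather than production code.

```python
def update_average(avg_ms: int, dt_ms: int) -> int:
    """One step of t[i+1] = (7/8) t[i] + (1/8) dt in integer arithmetic."""
    # Multiply the old average by 7, add the new sample, then divide by 8
    # with a 3-bit right shift.
    return (7 * avg_ms + dt_ms) >> 3


def smooth(samples_ms, initial_ms):
    """Feed a sequence of round-trip samples through the running average."""
    avg = initial_ms  # assumed starting value, not given in the notes
    history = []
    for dt in samples_ms:
        avg = update_average(avg, dt)
        history.append(avg)
    return history


if __name__ == "__main__":
    # The high-variability route from the notes: one 500 ms outlier.
    print(smooth([100, 101, 500, 100, 100], initial_ms=100))
```

The single 500 ms sample lifts the average to 150 ms, and it then decays only slowly, which is why the average alone is a poor basis for a timeout on such a route.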
TCP timer computation
- as well as a running average of round-trip times, TCP also keeps a
running average of the difference between the actual and the average
round-trip time.
- if the current average is t[i] with current average difference
D[i], and an ack is received now for packets sent dt time ago, the new
average difference is D[i+1] = b D[i] + (1 - b) |dt - t[i]|
- b is usually 3/4, so we multiply by 3 and divide by 4 (shift right
2 places)
- packet is considered lost if no ack is received within t[i] + 4 D[i]
- running averages weigh recent values exponentially more than
previous values
- cannot use retransmitted packets as samples: an ack for one might be
for the original transmission or for the retransmission
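The two running averages and the loss timeout can be sketched together as follows. The class and method names are hypothetical, and starting the average difference at zero is an assumption made to keep the example short; real TCP stacks initialize it differently.

```python
class RttEstimator:
    """Running average t and average difference D of round-trip times."""

    def __init__(self, first_sample, a=7 / 8, b=3 / 4):
        self.a = a
        self.b = b
        self.avg = first_sample  # t[i]
        self.dev = 0.0           # D[i]; zero start is an assumption

    def on_ack(self, dt, retransmitted=False):
        """Fold in a sample dt, skipping acks for retransmitted packets."""
        if retransmitted:
            return  # ambiguous sample: ignore it
        # The difference uses the old average t[i], as in the formula,
        # so it must be updated before the average itself.
        self.dev = self.b * self.dev + (1 - self.b) * abs(dt - self.avg)
        self.avg = self.a * self.avg + (1 - self.a) * dt

    def timeout(self):
        """A packet counts as lost if unacked for t + 4D."""
        return self.avg + 4 * self.dev


if __name__ == "__main__":
    est = RttEstimator(first_sample=100.0)
    est.on_ack(500.0)
    print(est.avg, est.dev, est.timeout())
```

After one 500 ms sample on a 100 ms route the timeout grows well beyond the average, so a route with high variability is given more slack before a packet is declared lost.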
Goals of congestion control
- efficiency: allow people to send as fast as possible
- fairness: distribute bandwidth approximately evenly among
connections (hosts? networks?)
- collapse prevention: if everyone sends as fast as possible,
routers will spend all their time receiving and discarding packets,
rather than transmitting data. Hosts will spend all their time
retransmitting discarded packets, adding to the congestion.
window-based congestion control
- flow control window Wf is set by receiver
- sender uses a congestion window W <= Wf
- W varies with time
delay-based congestion control
- TCP Vegas
- measure the round-trip time T while the window is W, so that the
sending rate is R = W/T
- call the minimum T so far D
- RD packets are "stored" in the network, the remainder
x = W - RD are "stored" in the routers' buffers
- increase W if x <= 1
- decrease W if x >= 3
- if we have at most 3 packets in the routers' buffers, we cannot
be creating congestion
- if we have more than 1 packet in the routers' buffers, our rate
R will increase when more bandwidth becomes available
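A rough sketch of the Vegas decision rule above, with W measured in packets. The function name and the step of one packet per adjustment are assumptions; the notes only say when to increase or decrease, not by how much.

```python
def vegas_adjust(w, rtt, base_rtt, low=1.0, high=3.0):
    """Return the next window given the current round-trip time.

    w        -- current window W, in packets
    rtt      -- measured round-trip time T
    base_rtt -- minimum round-trip time seen so far, D
    """
    # R = W / T, so the packets "stored" in the network are R * D.
    in_network = w * base_rtt / rtt
    # The remainder x = W - R * D sits in the routers' buffers.
    queued = w - in_network
    if queued <= low:
        return w + 1  # step size of 1 packet is an assumption
    if queued >= high:
        return w - 1
    return w


if __name__ == "__main__":
    print(vegas_adjust(10, rtt=100, base_rtt=100))  # nothing queued
    print(vegas_adjust(10, rtt=125, base_rtt=100))  # 2 packets queued
    print(vegas_adjust(10, rtt=200, base_rtt=100))  # 5 packets queued
```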
loss-based congestion control
- TCP Reno
- increase W on every ack we receive:
- linear increase: W increases by MTU*MTU/W bytes (1/W if
measuring in packets) for every ack, i.e. about 1 MTU per round trip
- exponential increase: W increases by MTU (1) for every ack, so W
doubles every round trip
- if we detect packet loss by timeout, reduce W to 1:
- exponential increase back to half the old W ("slow" start)
- linear increase thereafter
- if we see a 3rd duplicate ack, we retransmit (fast
retransmission) and set W to W/2
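The Reno rules above can be sketched as a small state machine, with W measured in packets. The class name, the method names, and the name `ssthresh` for the remembered W/2 threshold are assumptions for illustration. The sketch also treats the "reduce W to 1" case as loss detected by timeout.

```python
class RenoWindow:
    """Congestion window W in packets, following the rules in the notes."""

    def __init__(self, ssthresh=64.0):
        self.w = 1.0
        # Threshold between exponential and linear increase; the initial
        # value is an arbitrary assumption.
        self.ssthresh = ssthresh

    def on_ack(self):
        if self.w < self.ssthresh:
            self.w += 1.0           # exponential: +1 per ack ("slow" start)
        else:
            self.w += 1.0 / self.w  # linear: +1/W per ack

    def on_timeout(self):
        self.ssthresh = self.w / 2  # grow exponentially back to W/2
        self.w = 1.0

    def on_third_duplicate_ack(self):
        # Fast retransmission: halve the window and skip slow start.
        self.ssthresh = self.w / 2
        self.w = self.ssthresh


if __name__ == "__main__":
    reno = RenoWindow(ssthresh=8.0)
    for _ in range(8):
        reno.on_ack()
    print(reno.w)
    reno.on_third_duplicate_ack()
    print(reno.w)
    reno.on_timeout()
    print(reno.w, reno.ssthresh)
```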
fast recovery
- don't want to empty the pipe when a single packet is lost
- fast retransmit the packet
- set window to W0/2+3, where W0 is the window when the loss was detected
- for every further duplicate ack, increase window by MTU
- after W0/2 worth of acks, can start transmitting new packets
- by the time the lost packet is acked, there are W0/2 more
packets in the pipe
- allows incoming acks to "clock" new outgoing packets, avoids
slow start
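To see how the inflated window lets acks clock out new packets, here is a small simulation in units of packets. The function name is hypothetical. It assumes the sender counts all W0 packets as outstanding until the lost packet is acked, and that the window grows by one for each duplicate ack after the third.

```python
def fast_recovery_sends(w0, further_dup_acks):
    """New packets allowed by each duplicate ack after the third one."""
    outstanding = w0       # unacked packets, including the lost one
    window = w0 // 2 + 3   # window right after the fast retransmit
    sends = []
    for _ in range(further_dup_acks):
        window += 1        # each duplicate ack inflates the window
        allowed = max(0, window - outstanding)
        sends.append(allowed)
        outstanding += allowed
    return sends


if __name__ == "__main__":
    # W0 = 16: 15 packets arrive and produce duplicate acks; 3 of them
    # triggered the fast retransmit, leaving 12 further duplicate acks.
    print(fast_recovery_sends(16, 12))
```

With W0 = 16 the first few duplicate acks send nothing, and after that each one releases exactly one new packet, so roughly W0/2 new packets are in the pipe by the time the retransmission is acked.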
additive increase/multiplicative decrease
- additive increase: each round-trip time, add a fixed amount (e.g. 1 MTU)
to the window
- multiplicative decrease: when congestion is detected, reduce window
by constant factor
- reduces large windows more than small windows
- gives everyone equal opportunity to increase (modulo different
round-trip times)
- multiplicative increase would be unstable and unfair
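A toy simulation of two connections sharing one link shows why additive increase with multiplicative decrease tends toward fairness. The function name is hypothetical. It assumes both connections have the same round-trip time and both detect congestion in the same round, which is a simplification.

```python
def aimd(w1, w2, capacity, rounds, add=1.0, factor=0.5):
    """Run two AIMD windows for a number of round trips."""
    for _ in range(rounds):
        if w1 + w2 > capacity:
            # Congestion: both reduce by the same factor, so the larger
            # window gives up more and the gap between them shrinks.
            w1 *= factor
            w2 *= factor
        else:
            # No congestion: both add the same amount, so the gap
            # between them stays the same.
            w1 += add
            w2 += add
    return w1, w2


if __name__ == "__main__":
    print(aimd(40.0, 2.0, capacity=50.0, rounds=200))
```

The gap between the windows is untouched by each increase and halved by each decrease, so the two windows converge on equal shares regardless of where they started.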
Project
- groups (up to 4 people)
- pick your language
- start early
- reliable multicast (based on ABP+sequence numbers)
- link-state routing
- reverse-path routing for multicast
- client and router
- test against your own, other groups' projects
ATM ABR congestion control
- switch tells each of N sources the rate E at which to send
- together the sources want to transmit at rate C' > C, where C is
the output capacity of the switch
- switch selects a threshold (high watermark) B' on its buffer occupancy B
- when B > B', switch sets E = 0
- senders react after different times t < T, so switch may still
receive up to NTE more data after setting E = 0 (E here is the old rate)
- switch picks B' so that B' + NTE < buffer size
- senders also take up to T to start up again, so to maintain
utilization the switch must set E = C'/N while buffer occupancy is
still at least CT (the amount the output link drains in time T)
- output link will stay full if B' > CT
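The two constraints on the threshold B' can be checked with a few lines of arithmetic. The function and parameter names are hypothetical. The lower bound is taken here to be the product of C and T, on the assumption that the buffer must hold what the output link drains during the senders' reaction delay (a rate times a time gives an amount of data).

```python
def abr_threshold_bounds(n_sources, rate_e, delay_t, capacity_c, buffer_size):
    """Return (lowest B', highest B', feasible) for the switch.

    n_sources   -- N, number of sources
    rate_e      -- E, per-source rate before the switch sets E = 0
    delay_t     -- T, longest time a source takes to react
    capacity_c  -- C, output capacity of the switch
    buffer_size -- total buffer in the switch
    """
    # Data that can still arrive after the switch says stop: N * T * E.
    overshoot = n_sources * delay_t * rate_e
    # Data the output link drains while sources restart: C * T
    # (assumed reading of the lower bound).
    drain = capacity_c * delay_t
    lowest = drain
    highest = buffer_size - overshoot
    return lowest, highest, lowest < highest


if __name__ == "__main__":
    print(abr_threshold_bounds(4, 100.0, 0.25, 200.0, 400.0))
```

If the bounds cross, no threshold can both avoid overflow and keep the output link busy, and the switch needs a larger buffer or a lower rate E.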