a monolithic implementation of a networking protocol stack would
look at IP source and destination, protocol number, and TCP/UDP ports to
select a socket and TCB corresponding to the packet
in a layered implementation:
the IP layer uses source, destination, and protocol
to identify "connection"
TCP/UDP (transport) layer uses source and destination port
to identify TCB/socket
in a microprotocol implementation:
demux layer looks at protocol to choose TCP or UDP upper layer
demux layer uses source IP to determine "connection" (corresponding
to all TCP or UDP packets from that source IP)
check layer checks destination
demux layer uses source port to determine "connection" (which includes
all packets from the given source IP and the given source port)
demux layer uses destination port to finally determine socket
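The three lookup styles above can be compared in a small sketch. It is only an illustration, not a real stack: the `Packet` fields, the dictionary-based tables, and all function names are assumptions made for this example.

```python
from typing import NamedTuple, Optional

class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: str      # "tcp" or "udp"
    src_port: int
    dst_port: int

def monolithic_lookup(table: dict, p: Packet) -> Optional[str]:
    # Monolithic: one table keyed on every header field at once.
    return table.get((p.src_ip, p.dst_ip, p.proto, p.src_port, p.dst_port))

def layered_lookup(ip_table: dict, p: Packet) -> Optional[str]:
    # Layered: IP picks a "connection" from (source, destination, protocol),
    # then the transport layer picks the socket from the two ports.
    transport_table = ip_table.get((p.src_ip, p.dst_ip, p.proto))
    if transport_table is None:
        return None
    return transport_table.get((p.src_port, p.dst_port))

def micro_lookup(root: dict, local_ip: str, p: Packet) -> Optional[str]:
    # Microprotocol: each step examines exactly one field.
    by_src_ip = root.get(p.proto)                # demux on protocol
    if by_src_ip is None:
        return None
    by_src_port = by_src_ip.get(p.src_ip)        # demux on source IP
    if by_src_port is None:
        return None
    if p.dst_ip != local_ip:                     # check destination
        return None
    by_dst_port = by_src_port.get(p.src_port)    # demux on source port
    if by_dst_port is None:
        return None
    return by_dst_port.get(p.dst_port)           # demux on destination port
```

All three find the same socket for the same packet; they differ only in how many fields each step inspects.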
Router Congestion
assume a fast router
two Ethernet links receiving lots of outgoing data
one (relatively) slow T-1 link (1.5 Mb/s) sending the outgoing data
if the two links send more than 1.5 Mb/s over an extended period,
the router buffers begin to fill up
eventually the router will have to discard data due to congestion:
more data being sent than the line can carry
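As a rough illustration of how quickly the buffers fill, the arithmetic can be sketched as below. The function name, the input rates, and the buffer size are all hypothetical; only the 1.5 Mb/s outgoing rate comes from the notes.

```python
def seconds_until_drop(in_rate_bps: float, out_rate_bps: float,
                       buffer_bits: float) -> float:
    """Time for an initially empty buffer to fill; inf if it never fills."""
    surplus = in_rate_bps - out_rate_bps
    if surplus <= 0:
        return float("inf")
    return buffer_bits / surplus

# Hypothetical numbers: two links offering 1 Mb/s each, a 1.5 Mb/s T-1,
# and 1 Mbit of buffering in the router.
t = seconds_until_drop(2 * 1_000_000, 1_500_000, 1_000_000)
print(t)  # 2.0
```

A larger buffer only delays the first drop; it does not help if the overload is sustained.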
Congestion Collapse
assume a fixed timeout
if I have n bytes/second to send, I send them
if they get dropped, I retransmit them (total 2n bytes/second,
3n bytes/second, ...)
when there is congestion, packets get dropped
if everybody retransmits with fixed timeout, the amount of data
sent grows, increasing congestion
eventually, very little data gets through, most is discarded
the network is (nearly) down
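The growth in offered load can be illustrated with a deliberately crude model. It assumes one shared link, a fixed timeout of one round, and that everything dropped in one round is retransmitted in the next on top of the new data; the function name and the numbers are hypothetical.

```python
def offered_load(new: float, capacity: float, rounds: int) -> list:
    """Offered load per round when every drop is retransmitted next round."""
    backlog = 0.0
    history = []
    for _ in range(rounds):
        offered = new + backlog
        delivered = min(offered, capacity)
        backlog = offered - delivered   # dropped data comes back next round
        history.append(offered)
    return history

# Senders generate 120 units of new data per round on a link that carries 100.
print(offered_load(120, 100, 3))  # [120.0, 140.0, 160.0]
```

In this model the link still delivers its full capacity each round, but the fraction of offered traffic that gets through keeps shrinking; with a real fixed timeout, duplicate copies of data that was not actually lost make things worse.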
TCP Reno
exponential backoff: if retransmit timer was t before
retransmission, it becomes 2t after retransmission
careful RTT measurements allow retransmission as soon as possible,
but no sooner
keep a congestion window:
effective window is lesser of: (1) flow control window, and
(2) congestion window
congestion window is kept only on the sender, and never communicated
between the peers
congestion window (cwin) starts at 1 MSS, grows by 1 MSS for every MSS
acked: this is the exponential growth phase of the congestion window, called
slow start
on a retransmission, thresh = cwin / 2, and cwin = 1 MSS
then, use slow start while cwin < thresh
then (after cwin >= thresh) for each ack, add to the
window the value MSS * newly-acked/window: this adds one MSS
to the window for each whole window that is acked (typically, once
every RTT) resulting in linear growth
fast retransmit is similar -- interesting details are in
RFC 2001.
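The window rules above can be sketched as follows. This is a simplified model, not a TCP implementation: the window is counted in segments (so MSS = 1), every ACK covers one segment, and the names `on_ack`, `on_retransmit`, and `run_rtt` are assumptions for the example.

```python
MSS = 1  # count the window in segments for simplicity (an assumption)

def on_ack(cwin, thresh, newly_acked=MSS):
    """Return the new congestion window after an ACK."""
    if cwin < thresh:
        return cwin + newly_acked               # slow start: +1 MSS per MSS acked
    return cwin + MSS * newly_acked / cwin      # linear phase: ~1 MSS per window

def on_retransmit(cwin):
    """Return (cwin, thresh) after a retransmission."""
    return MSS, cwin / 2

def effective_window(cwin, flow_window):
    """The sender may have at most this much unacknowledged data in flight."""
    return min(cwin, flow_window)

def run_rtt(cwin, thresh):
    """Model one RTT in which every segment of the current window is acked."""
    for _ in range(int(cwin)):
        cwin = on_ack(cwin, thresh)
    return cwin

cwin, thresh = MSS, 8
for _ in range(3):
    cwin = run_rtt(cwin, thresh)
print(cwin)  # 8: the window doubled each RTT (1 -> 2 -> 4 -> 8)
```

Once `cwin` reaches `thresh`, a further call to `run_rtt` adds a little less than one segment, which is the linear growth described above.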
RTT estimate
RFC 1122, section 4.2.3.1
RTT estimate must be accurate, or TCP will incorrectly assume
that the network is congested
Karn/Partridge algorithm: don't use retransmitted segments
for RTT estimation.
for accurate RTT estimate, keep a running average of RTTs:
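RTTaverage_x = (1 - alpha) * RTTaverage_{x-1} + alpha * RTT_x
Dev_x = (1 - alpha) * Dev_{x-1} + alpha * |RTT_x - RTTaverage_{x-1}|
(typically, alpha = 1/8 for RTTaverage,
and 1/4 for Dev)
timeout = u * RTTaverage + phi * Dev, with
u = 1
phi = 4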
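A minimal sketch of such an estimator is shown below. It assumes the usual form of the calculation: a running average of the RTT, a running average of the deviation from it (Dev), and a timeout of u * RTTaverage + phi * Dev. The class name, the method names, and the choice to start Dev at zero are assumptions for this example.

```python
class RttEstimator:
    def __init__(self, first_rtt, gain_avg=1/8, gain_dev=1/4, u=1, phi=4):
        self.avg = first_rtt
        self.dev = 0.0            # starting value is an assumption
        self.gain_avg = gain_avg
        self.gain_dev = gain_dev
        self.u = u
        self.phi = phi

    def sample(self, rtt, retransmitted=False):
        """Fold one RTT measurement into the running averages."""
        if retransmitted:
            return                # Karn/Partridge: ignore ambiguous samples
        diff = abs(rtt - self.avg)
        self.avg = (1 - self.gain_avg) * self.avg + self.gain_avg * rtt
        self.dev = (1 - self.gain_dev) * self.dev + self.gain_dev * diff

    def timeout(self):
        return self.u * self.avg + self.phi * self.dev

est = RttEstimator(first_rtt=100.0)     # milliseconds, hypothetical
est.sample(200.0)
print(est.avg, est.dev, est.timeout())  # 112.5 25.0 212.5
```

Including the deviation means a path with highly variable RTTs gets a more generous timeout than a stable path with the same average.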
Congestion Control
TCP Vegas
other ways of detecting congestion
addressing congestion
router intervention
Internet Explicit Congestion Notification
TCP Vegas
Reno detects congestion after it happens
Reno also causes congestion by increasing the window until
congestion occurs
early congestion detection:
as queues get filled in the router, packets take longer, so the RTT increases
when RTT gets bigger, we can slow our sending
when RTT gets back to minimum, we can increase our sending
not standard, but tested to work well
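The idea can be sketched as a per-RTT window adjustment in the spirit of the published Vegas rule: compare the throughput expected at the minimum RTT with the throughput actually achieved, and treat the difference as data sitting in router queues. The function name and the thresholds `low` and `high` (in segments) are assumed values for illustration.

```python
def vegas_adjust(cwnd, base_rtt, rtt, low=1.0, high=3.0):
    """Return the new window (in segments) after one RTT measurement."""
    expected = cwnd / base_rtt                # rate if no queueing at all
    actual = cwnd / rtt                       # rate actually achieved
    queued = (expected - actual) * base_rtt   # estimated segments in queues
    if queued < low:
        return cwnd + 1                       # RTT near minimum: speed up
    if queued > high:
        return max(cwnd - 1, 1)               # RTT has grown: slow down
    return cwnd                               # in between: hold steady

print(vegas_adjust(10, base_rtt=100, rtt=100))  # 11
print(vegas_adjust(10, base_rtt=100, rtt=200))  # 9
```

Both the increase and the decrease are additive here, which matches the additive increase/additive decrease behaviour attributed to Vegas.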
Detecting and Addressing Congestion
detecting congestion:
queues get longer
RTT gets bigger
data / RTT (power) starts to drop as you try to send more
addressing congestion:
additive increase/multiplicative decrease (needed for stability if
congestion is occurring)
additive increase/additive decrease (TCP Vegas) -- works as long
as congestion can be avoided
setting flow rate
bandwidth reservation
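Additive increase/multiplicative decrease can be illustrated with two flows sharing one link. In this sketch both flows see congestion whenever their combined rate exceeds the capacity; the function name, the rates, and the step sizes are hypothetical.

```python
def aimd(x, y, capacity, steps, add=1.0, mult=0.5):
    """Evolve two sending rates under AIMD and return the final pair."""
    for _ in range(steps):
        if x + y > capacity:
            x, y = x * mult, y * mult      # congestion: both cut back
        else:
            x, y = x + add, y + add        # no congestion: both probe upward
    return x, y

x, y = aimd(10.0, 80.0, capacity=100.0, steps=200)
print(abs(x - y) < 1.0)  # True: the two rates have nearly equalised
```

Each multiplicative decrease halves the gap between the two flows while additive increase leaves it unchanged, so the rates drift toward an equal share. With additive decrease instead, the gap would never shrink.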
Router Intervention to Avoid or React to Congestion
Random Early Discard -- causes TCP to back off
information feed-forward -- the receiver must then return
congestion information to the sender (see Internet ECN, below)
information feedback -- requires a route back to the sender, does not
work in the Internet (except ICMP source quench, which is deprecated);
feedback from router to sender may arrive too late if the
sender is sending a lot of data. Also, stability issues -- all senders
could increase their sending rate at the same time
credits: can only send as much as we have in the "bank", automatically
(but not immediately) replenished
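The credit scheme in the last item behaves much like a token bucket. A minimal sketch, assuming credits are counted in bytes and replenished by a fixed amount per clock tick (the class and method names are hypothetical):

```python
class CreditBank:
    def __init__(self, capacity, refill_per_tick):
        self.capacity = capacity
        self.credits = capacity
        self.refill_per_tick = refill_per_tick

    def tick(self):
        """Replenish credits gradually, never beyond the bank's capacity."""
        self.credits = min(self.capacity, self.credits + self.refill_per_tick)

    def try_send(self, size):
        """Send only if enough credit is in the bank; spend it if so."""
        if size > self.credits:
            return False
        self.credits -= size
        return True

bank = CreditBank(capacity=10, refill_per_tick=2)
print(bank.try_send(8))   # True: 2 credits left
print(bank.try_send(5))   # False: must wait for replenishment
```

The capacity bounds the largest burst a sender can emit, and the refill rate bounds its long-term average rate.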
Internet Explicit Congestion Notification
in ECN, two bits of the IP Type of Service (ToS) field are
used to indicate (a) whether congestion notification is requested (ECT),
and (b) whether the packet experienced congestion (CE).
TCP uses two new header bits:
ECE (ECN-Echo, the bit before URG) reports that a packet
was received with the CE bit set
CWR (Congestion Window Reduced, the bit before ECE)
indicates that the ECE bit was received
compatible with hosts and routers that don't do ECN
typical usage of ECN:
senders can set ECT
routers can change ECT to CE to record that congestion
was experienced, perhaps instead of dropping a packet
receiver's transport layer is informed of CE, sends an ECE
receiver of ECE (the original sender) reduces its congestion window, sends CWR
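The exchange above can be traced with a toy model in which packets and ACKs are plain dictionaries. The function names are hypothetical, and halving the window on ECE is an assumption; the notes only say that the window is reduced.

```python
def router_forward(pkt, congested):
    """Forward a packet; under congestion, mark ECN-capable packets."""
    if not congested:
        return pkt
    if pkt["ect"]:
        return dict(pkt, ce=True)   # record congestion instead of dropping
    return None                     # not ECN-capable: the packet is dropped

def receiver_ack(pkt):
    """Build an ACK that echoes the CE mark back as ECE."""
    return {"ece": pkt["ce"]}

def sender_on_ack(cwin, ack):
    """Return (new cwin, CWR flag to set on the next data segment)."""
    if ack["ece"]:
        return max(cwin / 2, 1), True   # assumed reaction: halve the window
    return cwin, False

pkt = {"ect": True, "ce": False}
marked = router_forward(pkt, congested=True)
ack = receiver_ack(marked)
print(sender_on_ack(10, ack))  # (5.0, True)
```

A packet without ECT is simply dropped by the congested router, which is how the scheme stays compatible with hosts that do not use ECN.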