Outline: Transport Protocols
- Unix API Review
- UDP Review
- TCP header review
- TCP connection management review
- End-to-end Issues
- Project 3
Unix API
- socket returns a file descriptor:
- must specify address family: unix or inet
- must specify connection type: stream, "dgram", or "raw"
- must specify protocol: TCP, UDP, other
- why might socket fail?
- server then calls:
- bind to specify a local port
- listen to specify a queue size
- accept to create new connection(s)
- client then calls:
- bind (optional) to specify a local port
- connect to establish the connection
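The call sequence above can be sketched with Python's socket module, which wraps the same Unix calls. This is a minimal illustration, assuming a loopback server that upper-cases one message; the function name run_echo_once and the upper-casing behavior are made up for the example.

```python
import socket
import threading


def run_echo_once(message: bytes) -> bytes:
    """Run one server and one client on the loopback interface; return the reply."""
    # socket: address family inet, connection type stream, protocol TCP.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
    server.bind(("127.0.0.1", 0))  # bind: port 0 lets the system pick a free local port
    server.listen(1)               # listen: queue size for pending connections
    port = server.getsockname()[1]

    def serve() -> None:
        conn, _peer = server.accept()  # accept: returns a new socket for this connection
        with conn:
            conn.sendall(conn.recv(1024).upper())

    worker = threading.Thread(target=serve)
    worker.start()

    # Client side: bind is skipped, so the system assigns the local port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.connect(("127.0.0.1", port))  # connect: establishes the connection
        client.sendall(message)
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(run_echo_once(b"hello"))
```

The server keeps two sockets: the listening one, and the one returned by accept that carries the data for a single connection.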
Unix Port numbers
- Port numbers under 1024 are reserved -- only the root user may
bind to them.
- Port numbers 1024 and above are freely available.
- If you do not call bind, you are assigned an unpredictable port
number (not a random port number!).
- On Unix, "well-known" ports, for standard servers, are listed in
/etc/services and obtained with getservbyname.
- Systems other than Unix might have mechanisms other than
getservbyname and /etc/services, but they generally do need some
mechanism.
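Both lookups can be tried from Python, whose socket module exposes getservbyname directly. This is a small sketch; the helper names are hypothetical, and the destination port 9 is an arbitrary choice (no packet is sent, because connect on a UDP socket only records the peer and assigns a local port).

```python
import socket


def ephemeral_port() -> int:
    """Return the local port the system assigns when bind is never called."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # connect on a datagram socket sends nothing; it triggers the implicit bind.
        s.connect(("127.0.0.1", 9))
        return s.getsockname()[1]


def well_known_port(service: str, protocol: str):
    """Look up a service in the system database; None if there is no entry."""
    try:
        return socket.getservbyname(service, protocol)
    except OSError:
        return None


if __name__ == "__main__":
    print("assigned port:", ephemeral_port())
    print("http/tcp:", well_known_port("http", "tcp"))
```

The result of the second call depends on the contents of the local services database, so a minimal system may return None even for common services.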
Transport Protocols
- In book: "end-to-end" protocols.
- Mediate between functions provided by network-level protocols and
functions required by applications.
- Topmost protocol concerned with transmission/reception.
- main protocols on top of IP are ICMP, UDP, TCP (complete list
in RFC 1700)
Transport Protocols: Example Functions
- guaranteed message delivery
- correctness.
- in-order delivery
- at-most-once delivery
- flow control
- demultiplexing (multiple applications/sockets on each host)
- arbitrarily large messages
- synchronization
- authentication/encryption.
UDP: packet format
- UDP only provides demultiplexing and an optional data checksum.
- Source and destination port together are the demultiplexing key.
Each is allocated independently on each of the end-systems.
- Optional checksum covers header, pseudo-header, and data.
- if we are not checksumming, we send the checksum field as 0x0000;
a computed checksum whose numerical value is 0 is sent as 0xffff
instead
- on older systems, checksum was expensive, often turned off for
NFS over ethernet
- What is the purpose of the length field?
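- the pseudo-header (source and destination IP addresses, protocol
number, UDP length) is included in the checksum but is not transmitted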
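The checksum rules can be made concrete with a short sketch of the 16-bit one's-complement sum over pseudo-header, header, and data for UDP over IPv4. The function names and the sample addresses in the usage line are made up for illustration.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum; odd-length data is padded with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def _summed_bytes(src_ip, dst_ip, src_port, dst_port, payload, checksum):
    length = 8 + len(payload)  # the UDP header is 8 bytes
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, socket.IPPROTO_UDP, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, checksum)
    return pseudo + header + payload


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    """Checksum value to place in the header (computed with the field set to 0)."""
    total = ones_complement_sum(
        _summed_bytes(src_ip, dst_ip, src_port, dst_port, payload, 0))
    checksum = ~total & 0xFFFF
    return checksum or 0xFFFF  # 0x0000 on the wire means "no checksum"


def udp_checksum_ok(src_ip, dst_ip, src_port, dst_port, payload, checksum) -> bool:
    """Receiver check; the caller skips it when the field is 0x0000."""
    total = ones_complement_sum(
        _summed_bytes(src_ip, dst_ip, src_port, dst_port, payload, checksum))
    return total == 0xFFFF


if __name__ == "__main__":
    print(hex(udp_checksum("10.0.0.1", "10.0.0.2", 1234, 53, b"hello")))
```

Because the pseudo-header carries the IP addresses, a datagram delivered to the wrong host fails the check even if its UDP header and data are intact.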
Who uses UDP?
- RIP
- DNS
- NFS
- NTP
- any others?
TCP header
- port numbers for demultiplexing
- sequence and ACK numbers count bytes (SYN and FIN each count as one as well)
- header length allows options:
- winscale:
window scaling, in case our bandwidth * delay product
is greater than 64 KB
- timestamp: time the segment was sent, returned in the ack
- sequence number extension, by combining the timestamp option
with the regular sequence number
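The arithmetic behind window scaling is easy to work through. The sketch below computes a bandwidth * delay product and the smallest shift count that lets the 16-bit window field cover it; the function names are hypothetical, and the cap of 14 is the maximum shift the window scale option allows.

```python
def bandwidth_delay_product(bits_per_second: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep the path full."""
    return bits_per_second / 8 * rtt_seconds


def window_scale_shift(window_bytes: float) -> int:
    """Smallest shift s such that (65535 << s) covers window_bytes, capped at 14."""
    shift = 0
    while (0xFFFF << shift) < window_bytes and shift < 14:
        shift += 1
    return shift


if __name__ == "__main__":
    bdp = bandwidth_delay_product(100e6, 0.1)  # 100 Mbit/s link, 100 ms round trip
    print(bdp, window_scale_shift(bdp))
```

For the link in the usage lines the product is about 1.25 MB, far more than an unscaled window can advertise, and a shift of 5 is the first that covers it.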
TCP connection invariants
- a connection is established after we have sent a
syn and received an ack for the syn, and the remote host has
done the same.
- When we send a fin, the connection is no longer established,
and we can no longer send data.
- When we receive a fin, remote host is done sending data.
- When we have sent and received fin and acks for each
fin, the connection can be closed (after a possible timeout).
- When we receive a rst, the remote host no longer shares our
state of the connection, and the connection is void.
TCP 3-way Handshake, Close
- With the three-way handshake, the two end-systems
- agree to open a connection between the stated ports.
- agree on each other's initial sequence numbers (ISS).
- negotiate any other start-time options.
- after the first fin has been received, the connection is
half-closed: only one side can still send data.
- the host that sends the last ack has to keep the connection state
for a time in case the ack gets lost and the other host resends its fin.
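A half-closed connection can be observed from user code with shutdown. The following is a sketch on the loopback interface, assuming a toy server that counts the bytes it receives; the function name and the reply text are made up for the example.

```python
import socket
import threading


def half_close_demo(request: bytes) -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    def serve() -> None:
        conn, _peer = server.accept()
        with conn:
            received = 0
            while True:
                data = conn.recv(1024)
                if not data:  # empty read: the client's fin has arrived
                    break
                received += len(data)
            # The client can no longer send, but this side still can.
            conn.sendall(b"got %d bytes" % received)
        # Leaving the with-block closes the socket, which sends the server's fin.

    worker = threading.Thread(target=serve)
    worker.start()

    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(request)
        client.shutdown(socket.SHUT_WR)  # send fin: done sending, can still receive
        reply = b""
        while True:
            data = client.recv(1024)
            if not data:  # the server's fin
                break
            reply += data

    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(half_close_demo(b"hello"))
```

The server only learns the length of the request from the client's fin, yet it can still reply, because the fin closes one direction of the connection only.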
In-class exercise
- what does the system do when I call accept?
- what does the system do when I call bind?
- what does the system do when I call connect?
- what does the system do when I call listen?
- what does the system do when I call socket?
End-to-End Argument: Premises
- computer hardware is more trustworthy than networks
- for example, memory is unlikely to corrupt or lose our data
- in contrast, the network is likely to corrupt or lose our data:
- built from software (untrustworthy?), not just hardware
- many components and events that we have no control over (collisions,
congestion, lightning, denial-of-service attacks)
- interoperation of components made by different manufacturers,
possibly never tested together
End-to-End Argument
- If an application needs a guarantee (for example, reliability, error
detection), it cannot expect it to be satisfied by the network, and
should implement it in the end system.
- The network may provide the guarantee anyway, and that may lead to
better performance.
- example of application that doesn't need reliability or error
detection: clock distribution using a network (NTP)
- example of application that needs both: file transfer
End-to-End Example
- Even if a network guarantees reliable, in-order message delivery at
each hop, software bugs on a router can make that guarantee
meaningless (re-entrant code can be hard to make correct).
- Therefore, it is a good idea for the transport protocols or applications
to also ensure reliability (if needed).
- A broken router can affect many other nodes. A broken end-system
protocol only affects that node and the ones talking to it.
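One way an application can ensure correctness itself is to attach its own check to the data and verify it at the receiving end system. The sketch below uses a SHA-256 digest, which is an assumption made for illustration; the notes do not name a mechanism, and the names wrap and unwrap are hypothetical.

```python
import hashlib

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def wrap(data: bytes) -> bytes:
    """Sender side: append a digest of the data before handing it to the network."""
    return data + hashlib.sha256(data).digest()


def unwrap(message: bytes) -> bytes:
    """Receiver side: verify the digest, whatever the network in between promised."""
    if len(message) < DIGEST_SIZE:
        raise ValueError("message too short to contain a digest")
    data, digest = message[:-DIGEST_SIZE], message[-DIGEST_SIZE:]
    if hashlib.sha256(data).digest() != digest:
        raise ValueError("end-to-end check failed; ask the sender to retransmit")
    return data


if __name__ == "__main__":
    message = wrap(b"file contents")
    print(unwrap(message))
```

The check runs only at the two end systems, so it catches corruption introduced anywhere along the path, including inside a router that believes it delivered the data correctly.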
Project 3
- limited web search engine
- command line operation
- similar to fgrep (no patterns, no wildcards)
- must support:
- -i for case independence
- -l for list only, don't show matching line(s)
- -v for only non-matching lines
- -d=N to search only to a depth of N links (defaults to N=0
if not present)
- any reasonable general-purpose programming language (check with
instructor if other than C, C++, Java, Pascal, Perl, SML, Lisp,
Haskell, Fortran, Ada)
Project 3 Basic Algorithm
- parse command line to determine URLs, options
- repeatedly:
- fetch page using HTTP/1.0 (RFC 1945)
- search for matches in page, print as appropriate
- search for links in page, add to list of links to traverse if depth
is less than N
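The loop above might be sketched as follows. This is one possible shape, not a reference solution: the function names, the output format, and the seen set (added to avoid fetching a page twice) are assumptions. The fetch function is passed in by the caller so the traversal can be tried without a network. The Host header is not required by HTTP/1.0 but many servers expect it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


def http10_request(url: str) -> bytes:
    """Bytes of a minimal HTTP/1.0 GET request for url."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"GET {path} HTTP/1.0\r\nHost: {parts.netloc}\r\n\r\n".encode("ascii")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def search(start_urls, pattern, fetch, max_depth=0,
           ignore_case=False, invert=False, list_only=False):
    """Return output lines; fetch(url) must return the page text."""
    needle = pattern.lower() if ignore_case else pattern
    queue = [(url, 0) for url in start_urls]
    seen = set(start_urls)  # assumption: visit each URL at most once
    output = []
    while queue:
        url, depth = queue.pop(0)
        page = fetch(url)
        hits = []
        for line in page.splitlines():
            haystack = line.lower() if ignore_case else line
            if (needle in haystack) != invert:  # -v flips the match
                hits.append(line)
        if hits:
            output.extend([url] if list_only else [f"{url}: {h}" for h in hits])
        if depth < max_depth:  # follow links only while depth is less than N
            collector = LinkCollector()
            collector.feed(page)
            for href in collector.links:
                link = urljoin(url, href)
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return output
```

A real fetch would open a TCP connection to the host, send the bytes from http10_request, and read until the server closes the connection, which is how HTTP/1.0 marks the end of a response.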