Outline: Transport Protocols

Unix API

socket returns a file descriptor:
- must specify address family: unix or inet
- must specify socket type: stream, dgram (datagram), or raw
- must specify protocol: TCP, UDP, other
why might socket fail?
server then calls:
- bind to specify a local port
- listen to specify a queue size
- accept to create new connection(s)
client then calls:
- bind (optional) to specify a local port
- connect to establish the connection
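
The call sequence above can be sketched with Python's socket module, which wraps the same Unix calls. This is a minimal loopback example, not a production server; the function names run_server and echo_once are made up for illustration.

```python
import socket
import threading

def run_server(listener: socket.socket) -> None:
    """Accept one connection and echo back whatever arrives."""
    conn, _peer = listener.accept()      # accept returns a new connected socket
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)

def echo_once(message: bytes) -> bytes:
    # Server side: socket, bind, listen.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(5)                   # queue size for pending connections
    port = listener.getsockname()[1]
    worker = threading.Thread(target=run_server, args=(listener,))
    worker.start()

    # Client side: socket, no bind, connect.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        client.connect(("127.0.0.1", port))
        client.sendall(message)
        reply = client.recv(1024)
    finally:
        client.close()
        worker.join()
        listener.close()
    return reply

if __name__ == "__main__":
    print(echo_once(b"hello"))
```

Each of these calls raises OSError on failure, for example when the address family is unsupported or the process is out of file descriptors.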

Port numbers under 1024 are reserved -- only the root user may bind to them.
Port numbers 1024 and above are freely available.
If you do not call bind, you are assigned an unpredictable port number (not a random port number!).
On Unix, "well-known" ports for standard servers are listed in {\tt /etc/services} and looked up with {\tt getservbyname}.
Systems other than Unix might have mechanisms other than {\tt getservbyname} and {\tt /etc/services}, but they generally do need some mechanism.
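
A small sketch of both lookups, using Python's wrapper around getservbyname. The helper names lookup_port and ephemeral_port are hypothetical. Binding to port 0 is used here to make the OS-assigned port visible; the result of the service lookup depends on the host's services database.

```python
import socket

def lookup_port(service: str, protocol: str = "tcp"):
    """Return the well-known port for a service, or None if the host has no entry."""
    try:
        return socket.getservbyname(service, protocol)
    except OSError:
        return None

def ephemeral_port() -> int:
    """Bind to port 0 and report which port the OS assigned."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

if __name__ == "__main__":
    print("http ->", lookup_port("http"))
    print("assigned port:", ephemeral_port())
```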

In book: "end-to-end" protocols.
Mediate between functions provided by network-level protocols and functions required by applications.
Topmost protocol concerned with transmission/reception.
The main protocols on top of IP are ICMP, UDP, and TCP (complete list in RFC 1700).

UDP only provides demultiplexing and an optional data checksum.
Source and destination port together are the demultiplexing key. Each is allocated independently on each of the end-systems.
Optional checksum covers header, pseudo-header, and data.
If we are not checksumming, we send the checksum field as 0x0000; a computed checksum equal to 0 is sent as 0xffff instead.
On older systems, checksumming was expensive and was often turned off for NFS over Ethernet.
What is the purpose of the length field?
pseudo-header included in checksum
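
The checksum rules above can be sketched as follows, assuming IPv4 (4-byte addresses, protocol number 17 for UDP). The function names are made up for illustration, and this is a sketch, not a replacement for the kernel's implementation.

```python
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data, padded with a zero byte if odd."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return total

def udp_checksum(src_ip: bytes, dst_ip: bytes,
                 src_port: int, dst_port: int, payload: bytes) -> int:
    length = 8 + len(payload)                      # UDP header is 8 bytes
    # Pseudo-header: source address, destination address, zero, protocol, UDP length.
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, length)
    # Real header with the checksum field set to zero for the computation.
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    return checksum or 0xFFFF                      # 0x0000 is reserved for "no checksum"
```

A receiver can verify a datagram by summing the pseudo-header, header (with the received checksum in place), and data; the result should be 0xffff.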

TCP also uses port numbers for demultiplexing
sequence and ACK numbers count bytes (SYN and FIN each also consume one sequence number)
header length allows options:
1. winscale: window scaling, in case our bandwidth*delay product is greater than 64K
2. timestamp: time the segment was sent, returned in the ack
3. sequence number extension, by combining the timestamp option with the regular sequence number
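
A worked example of when window scaling is needed. The function names are hypothetical, and the link speeds in the usage lines are illustrative figures, not measurements. The cap of 14 on the shift comes from RFC 1323.

```python
def bandwidth_delay_product(bits_per_second: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to keep the path full."""
    return bits_per_second * rtt_ms // 8000    # bits to bytes, ms to seconds

def window_scale_shift(bdp_bytes: int) -> int:
    """Smallest shift such that a scaled 16-bit window covers bdp_bytes."""
    shift = 0
    while (0xFFFF << shift) < bdp_bytes and shift < 14:
        shift += 1
    return shift

if __name__ == "__main__":
    lan = bandwidth_delay_product(10_000_000, 10)       # 10 Mbit/s, 10 ms RTT
    wan = bandwidth_delay_product(100_000_000, 100)     # 100 Mbit/s, 100 ms RTT
    print(lan, window_scale_shift(lan))
    print(wan, window_scale_shift(wan))
```

In the first case the product fits in the unscaled 16-bit window; in the second it does not, so a nonzero shift is required to keep the path full.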

A connection is established after we have sent a {\tt syn} and received an ack for the syn, and the remote host has done the same.
When we send a fin, the connection is no longer established, and we can no longer send data.
When we receive a fin, remote host is done sending data.
When we have sent and received fin and acks for each fin, the connection can be closed (after a possible timeout).
When we receive a rst, the remote host no longer shares our state of the connection, and the connection is void.
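
The rules above can be written down as a small model of one end-system's view of a connection. This is a deliberately simplified sketch, not the full TCP state machine; the class and field names are made up, and it assumes we acknowledge a syn or fin as soon as we receive it.

```python
from dataclasses import dataclass

@dataclass
class ConnectionView:
    """One end-system's view of a TCP connection (simplified, hypothetical)."""
    syn_sent_acked: bool = False   # our syn was sent and acknowledged
    syn_received: bool = False     # remote syn received and acknowledged by us
    fin_sent: bool = False         # we have sent a fin
    fin_sent_acked: bool = False   # our fin was acknowledged
    fin_received: bool = False     # remote fin received and acknowledged by us
    reset: bool = False            # a rst was received

    def established(self) -> bool:
        return (self.syn_sent_acked and self.syn_received
                and not self.fin_sent and not self.reset)

    def can_send_data(self) -> bool:
        return self.established()

    def remote_may_send(self) -> bool:
        return self.syn_received and not self.fin_received and not self.reset

    def can_close(self) -> bool:
        return self.reset or (self.fin_sent_acked and self.fin_received)
```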

With the three-way handshake, the two end-systems
- agree to open a connection between the stated ports.
- agree on each other's initial sequence numbers (ISS).
- negotiate any other start-time options.
After the first fin has been received, the connection is {\em half-closed}: only one side can still send data.
The host that sends the last ack has to keep the connection state alive for a time, in case that ack gets lost and the other host resends its fin.
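
A half-closed connection can be observed from an application with shutdown. In this loopback sketch the client sends its fin and then still receives the server's reply. The function names and the uppercase "service" are hypothetical.

```python
import socket
import threading

def serve_uppercase(listener: socket.socket) -> None:
    conn, _ = listener.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:                 # empty read: the client's fin arrived
                break
            chunks.append(data)
        conn.sendall(b"".join(chunks).upper())   # the server can still send

def half_close_demo(message: bytes) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        worker = threading.Thread(target=serve_uppercase, args=(listener,))
        worker.start()
        with socket.create_connection(listener.getsockname()) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)      # send fin; we can no longer send
            reply = b""
            while True:
                data = client.recv(1024)
                if not data:                     # server's fin arrived
                    break
                reply += data
        worker.join()
    return reply

if __name__ == "__main__":
    print(half_close_demo(b"half-closed"))
```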

computer hardware is more trustworthy than networks
for example, memory is unlikely to corrupt or lose our data
in contrast, the network is likely to corrupt or lose our data:
- built from software (untrustworthy?), not just hardware
- many components and events that we have no control over (collisions, congestion, lightning, denial-of-service attacks)
- interoperation of components made by different manufacturers, possibly never tested together

If an application needs a guarantee (for example, reliability, error detection), it cannot expect it to be satisfied by the network, and should implement it in the end system.
The network may provide the guarantee anyway, and that can lead to better performance.
example of application that doesn't need reliability or error detection: clock distribution using a network (NTP)
example of application that needs both: file transfer

Even if a network guarantees reliable, in-order message delivery at each hop, software bugs on a router can make that guarantee meaningless (re-entrant code can be hard to make correct).
Therefore, it is a good idea for the transport protocols or applications to also ensure reliability (if needed).
A broken router can affect many other nodes. A broken end-system protocol only affects that node and the ones talking to it.
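
A sketch of an end-to-end check for the file transfer example: the sender computes a digest over the original data and the receiver recomputes it over what actually arrived, regardless of what the hops in between promise. SHA-256 is an arbitrary choice here, and flip_bit stands in for a hypothetical faulty router.

```python
import hashlib

def send_file(data: bytes):
    """Sender: attach a SHA-256 digest computed over the original data."""
    return data, hashlib.sha256(data).digest()

def receive_file(data: bytes, digest: bytes) -> bool:
    """Receiver: recompute the digest over what actually arrived."""
    return hashlib.sha256(data).digest() == digest

def flip_bit(data: bytes, index: int) -> bytes:
    """Hypothetical faulty hop that corrupts one bit of the data."""
    damaged = bytearray(data)
    damaged[index] ^= 0x01
    return bytes(damaged)

if __name__ == "__main__":
    data, digest = send_file(b"contents of the file")
    print(receive_file(data, digest))
    print(receive_file(flip_bit(data, 3), digest))
```

On a mismatch the receiver would request the file again; that retry logic is left out of the sketch.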

limited web search engine
command line operation
similar to fgrep (no patterns, no wildcards)
must support:
- -i for case independence
- -l for list only, don't show matching line(s)
- -v for only non-matching lines
- -d=N to search only to a depth of N links (defaults to N=0 if not present)
Use any reasonable general-purpose programming language (check with the instructor if other than C, C++, Java, Pascal, Perl, SML, Lisp, Haskell, Fortran, Ada).
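
One possible way to parse these options, written by hand so that the -d=N form is handled explicitly. The Options class and parse_args are hypothetical names, and the layout of the positional arguments (search string first, then one or more URLs, as with fgrep) is an assumption, not part of the specification.

```python
from dataclasses import dataclass, field

@dataclass
class Options:
    ignore_case: bool = False    # -i
    list_only: bool = False      # -l
    invert: bool = False         # -v
    depth: int = 0               # -d=N, defaults to 0
    pattern: str = ""
    urls: list = field(default_factory=list)

def parse_args(argv: list) -> Options:
    opts = Options()
    positional = []
    for arg in argv:
        if arg == "-i":
            opts.ignore_case = True
        elif arg == "-l":
            opts.list_only = True
        elif arg == "-v":
            opts.invert = True
        elif arg.startswith("-d="):
            opts.depth = int(arg[3:])            # ValueError if N is not a number
        elif arg.startswith("-"):
            raise ValueError(f"unknown option: {arg}")
        else:
            positional.append(arg)
    if len(positional) < 2:
        raise ValueError("expected a search string and at least one URL")
    # Assumption: first positional argument is the string, the rest are URLs.
    opts.pattern, opts.urls = positional[0], positional[1:]
    return opts
```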

parse command line to determine URLs, options
repeatedly:
1. fetch page using HTTP/1.0 (RFC 1945)
2. search for matches in page, print as appropriate
3. search for links in page, add to list of links to traverse if depth is less than N
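
The loop above might be sketched as follows. This is one possible structure, not a reference solution: fetch_http10, LINK, and search are made-up names, the link pattern is a rough regular expression and not an HTML parser, and the Host header is an addition that RFC 1945 does not require. The fetch function is passed in as a parameter so the traversal can be tried without a network.

```python
import re
import socket
from urllib.parse import urljoin, urlsplit

def fetch_http10(url: str) -> str:
    """Fetch a page with a bare HTTP/1.0 GET (no redirects, no HTTPS)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = f"GET {path} HTTP/1.0\r\nHost: {parts.hostname}\r\n\r\n"
    with socket.create_connection((parts.hostname, parts.port or 80), timeout=10) as s:
        s.sendall(request.encode("ascii"))
        chunks = []
        while True:                      # HTTP/1.0: server closes when done
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    _headers, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    return body.decode("latin-1")

LINK = re.compile(r'href\s*=\s*["\']([^"\'#]+)', re.IGNORECASE)

def search(urls, pattern, max_depth=0, fetch=fetch_http10):
    """Return (url, line) pairs for lines containing pattern."""
    queue = [(u, 0) for u in urls]
    seen = set(urls)
    matches = []
    while queue:
        url, depth = queue.pop(0)
        page = fetch(url)
        for line in page.splitlines():
            if pattern in line:
                matches.append((url, line))
        if depth < max_depth:            # only follow links below the depth limit
            for link in LINK.findall(page):
                target = urljoin(url, link)
                if target.startswith("http://") and target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
    return matches
```

The seen set keeps pages that link to each other from being fetched repeatedly.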