The domain name protocol itself gives us a motivation for doing this -- we want to be able to send packets to a domain name server and get replies. This transmission can be best-effort, since DNS itself will retransmit if no reply has been obtained.
In order to leverage our point-to-point network into an internet, we need to have machines with multiple interfaces that will forward packets among those interfaces. Such machines are called routers or gateways. The first name reflects the function of these packet forwarders: they must decide which interface constitutes the best route for the packet, and forward the packet along that route. The second name is more historical, and reflects the idea that a shared network at a company, office, or lab (typically an ethernet) would have a single machine connecting it to the Internet, such that all packets to or from the Internet would pass through this gateway machine. At present, router is the accepted term.
When a router receives a packet, it must look at the packet's destination address and determine whether that packet is for itself. If the packet is not for itself, the router must look in a table to find a matching route. A routing table has entries of the form:
destination | interface |
---|---|
128.171.1.1 | /dev/ttyS0 |
128.171.1.2 | /dev/ttyS0 |
128.171.1.3 | /dev/ttyS0 |
1.2.3.4 | /dev/ttyS1 |
9.7.5.3 | /dev/ttyS1 |
In this table, destinations are listed by IP address, in dotted-decimal notation: each of the four bytes of the 32-bit IPv4 address is given in decimal. The hexadecimal numbers corresponding to these IP addresses would be 0x80ab0101, 0x80ab0102, 0x80ab0103, 0x01020304, and 0x09070503. This means that if I receive a packet with a 32-bit destination address 0x80ab0102 (i.e., 128.171.1.2), I must send it out on interface /dev/ttyS0. This is the core function of a router, that is, the routing function: to look up the destination address of a packet, and forward it on the appropriate interface.
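The lookup just described can be sketched in a few lines of Python. This is only an illustration: the names `ROUTES`, `dotted_to_int`, and `lookup` are made up for this example, and a real router would match address prefixes rather than store one entry per host.

```python
# Hypothetical routing table mirroring the example entries above.
ROUTES = {
    "128.171.1.1": "/dev/ttyS0",
    "128.171.1.2": "/dev/ttyS0",
    "128.171.1.3": "/dev/ttyS0",
    "1.2.3.4": "/dev/ttyS1",
    "9.7.5.3": "/dev/ttyS1",
}


def dotted_to_int(addr: str) -> int:
    """Convert dotted-decimal notation to the 32-bit number it encodes."""
    value = 0
    for part in addr.split("."):
        value = (value << 8) | int(part)
    return value


# Key the table by the 32-bit address, as it would appear in a packet.
TABLE = {dotted_to_int(addr): iface for addr, iface in ROUTES.items()}


def lookup(destination: int):
    """Return the outgoing interface, or None if no route matches."""
    return TABLE.get(destination)


print(hex(dotted_to_int("128.171.1.2")), lookup(0x80AB0102))
```

A destination that is not in the table yields `None`; what a router should do in that case is a separate question.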
It is easy to imagine that
The remainder of this chapter is devoted to making sure that these two conditions hold, with occasional modifications to allow most of the Internet to continue to behave should some of these conditions fail in localized areas of our Internet.
A host wishing to send a packet to another IP host must have a way of communicating to the next router both the contents of the packet, and the intended destination. We have seen that we can use SLIP to transfer packets of bytes. What we need in addition is a way of interpreting the packets we receive to figure out what the destination is and what the contents are. A standard way of communicating information is referred to as a protocol. Such protocols exist in daily life -- for example, there is a protocol for signaling to other drivers that one wishes to make a left turn (some drivers do not follow the protocol, but that is a different issue). A networking protocol allows us to "understand" received data to mean something specific.
As an example of a very simplified IP protocol, when we have data to send on our network, we could place the IP destination address in the first 4 bytes of a packet, immediately followed by all the bytes of the data. The IP destination address should be placed with the "first" byte first, that is, 128.171.1.2 should be encoded so that 0x80 is the first byte, 0xAB the second, 0x01 the third, and 0x02 the fourth. The receiver of such a packet, assuming there were no errors, would then look up the first four bytes in its routing table, and forward the packet on towards its destination. The final destination would recognize its own IP address and understand that the packet is for itself. Perhaps a further protocol might then tell the recipient what to do with the data just received.
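As a rough sketch of this simplified protocol (not of real IP), the sender and receiver could be written as follows. The function names `encode` and `decode` are invented for the example.

```python
def encode(destination: str, data: bytes) -> bytes:
    """Prefix the data with the 4-byte destination, first byte first."""
    header = bytes(int(part) for part in destination.split("."))
    if len(header) != 4:
        raise ValueError("expected a dotted-decimal IPv4 address")
    return header + data


def decode(packet: bytes):
    """Split a received packet into (destination, data)."""
    destination = ".".join(str(b) for b in packet[:4])
    return destination, packet[4:]


packet = encode("128.171.1.2", b"hello")
print(packet[:4].hex(), decode(packet))
```

The first four bytes of `packet` are 0x80, 0xAB, 0x01, 0x02, in that order, which is exactly what a receiver would look up in its routing table.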
This very simplified protocol has two things in common with the true IP version 4 (IPv4) protocol: there is a separate header placed in the packet right before the data, and the header includes the destination IP address.
In IPv4, the header is 20 bytes long (or more). The header includes both the destination address, which is used for routing, and the source address, which the recipient can use to reply to the message. These two addresses account for the last 8 of the minimum 20 bytes.
The header actually begins with a 4-bit field recording the IP version number -- this is the value "4" for IP version 4. The next 4 bits are the IP header length, in units of 32 bits (4 bytes). For a 20-byte header, this value is therefore 5, and the first byte takes the value 0x45 (decimal 69). If the header is longer than 20 bytes, it must be extended in a way that keeps the length a multiple of 4 bytes. For example, a header length of 24 bytes would be encoded by a first byte of 0x46, a header length of 28 bytes would be encoded by a first byte of 0x47, and the maximum header length of 60 bytes would be encoded by a first byte of 0x4F. The header length might be longer than 20 bytes if the sender has added options at the end of the header. A number of options have been defined: some are documented in RFC 791, at http://www.ietf.org/rfc/rfc791.txt, the original RFC defining the Internet Protocol; some are documented in RFC 1122, at http://www.ietf.org/rfc/rfc1122.txt, the RFC updating the TCP and IP protocols; and some are defined in their own RFCs. Some of these options are potentially useful, but many cause security concerns and complicate the implementation of routers, so the general consensus in the Internet community is to not send IP packets with options, and to not support received options. This is an example of a standard which is not generally supported.
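The encoding of the first header byte can be checked with a short sketch. The helper names `first_byte` and `split_first_byte` are hypothetical, chosen only for this illustration.

```python
def first_byte(header_length: int, version: int = 4) -> int:
    """Pack the version and the header length (in bytes) into one byte."""
    if header_length % 4 != 0 or not 20 <= header_length <= 60:
        raise ValueError("header length must be a multiple of 4 in 20..60")
    return (version << 4) | (header_length // 4)


def split_first_byte(value: int):
    """Return (version, header length in bytes) from the first byte."""
    return value >> 4, (value & 0x0F) * 4


for length in (20, 24, 28, 60):
    print(length, hex(first_byte(length)))
```

Because the length field is only 4 bits wide and counts 32-bit units, the largest header it can describe is 15 * 4 = 60 bytes.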
The options may require an odd number of bytes. If the final header length is not a multiple of four, the header is extended to a 4-byte boundary. Keeping the header a multiple of four bytes has a benefit beyond reducing the number of bits used to store the header length. Most computers have a preferred alignment for data. In the most general case, bytes can have any alignment, 2-byte units (for example, values of type short int in C) should be aligned on a 2-byte boundary, 4-byte units on a 4-byte boundary, and so on. The penalty for mis-alignment may be a loss of performance (typical on CISC architectures such as the x86) or even bus errors (typical on RISC architectures such as SPARC or MIPS). However, if the buffer in which the IP packet and header are stored is aligned on a 4-byte boundary, then the IP payload is also aligned on a 4-byte boundary, which is a good thing.
The second byte of the header is called type of service, and is another example of a standard which is not widely supported. The basic idea of type of service is to label important packets so they are only discarded as a last resort, or packets which must be delivered quickly so they can take priority, or packets which need to be sent on high-throughput links if possible. Unfortunately, this scheme was never widely implemented, so the type of service field is ignored by most routers, and many hosts simply send 0 in this field.
The third and fourth bytes of the header are the total length of the IP packet. This is the length (in bytes) of the entire packet, including the header and the payload which follows the header. In our system, such a length field could help us detect packets where one byte has been lost (we know this is a possible problem on serial lines). The length is stored as a 16-bit number, with the more significant byte stored first and the less significant byte stored second.
This storage strategy for the total length -- more significant bytes sent before less significant bytes -- is used throughout the Internet protocols and is referred to as big-endian binary encoding. The opposite strategy is called little-endian. People in the field sometimes strongly feel that one or the other approach is better, perhaps because they are accustomed to a machine architecture that favors one or the other. This is similar to the religious wars satirized in Swift's "Gulliver's Travels", in which the inhabitants of one island, who eat eggs from the big end, fight terrible wars with the inhabitants of a neighboring island, who eat eggs from the little end, over which is the "right" end of the egg to open. There is, of course, no right or wrong end to an egg, just as there is no right or wrong way of ordering the bytes of a word. On most computers, the function (sometimes defined as a C macro) htons will convert a Short (a 16-bit integer) from Host format TO Network format, that is, to big-endian format. htonl does the same for 32-bit integers, and ntohs and ntohl perform the opposite conversion from network format to host format. On machines whose native format is big-endian, these operations do nothing, but portable code which must run on both kinds of machines uses these functions to place values in the correct byte order.
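To make the two byte orders concrete, here is a small sketch using Python's standard `struct` module, in which the format prefix `!` selects network (big-endian) order and `<` selects little-endian order. The value 4116 is just an illustrative total length.

```python
import struct

total_length = 4116  # 0x1014

# Network order: the more significant byte (0x10) comes first.
network_order = struct.pack("!H", total_length)

# Little-endian order, shown only for contrast.
little_endian = struct.pack("<H", total_length)

# The same big-endian encoding done by hand with shifts and masks.
by_hand = bytes([total_length >> 8, total_length & 0xFF])

# Decoding reverses the process.
decoded = struct.unpack("!H", network_order)[0]

print(network_order.hex(), little_endian.hex(), decoded)
```

Code that always packs and unpacks with an explicit byte order, as here, behaves the same on big-endian and little-endian machines.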
The next 32 bits of the IP header have to do with fragmentation, and will be discussed below.
The 9th byte of the IP header is called Time-To-Live, and usually referred to as TTL. The original intent of this field was that it be decremented once a second. If the value should ever reach zero, the packet should be discarded. This keeps a single packet from "living forever", and forever consuming resources such as memory and bandwidth on transmission lines.
The clocks of different routers, however, are not necessarily synchronized. In addition, most of the time a router will forward a packet in a small fraction of a second. It was decided, therefore, that the TTL would be decremented once for every entire second that a packet spends on a host or router, and also once for every time that a packet is forwarded by a router. The TTL field, therefore, is a combination of maximum hop counter and actual time to live.
The TTL field is usually set to a standard value, for example 60 or 64, by the sender. Special applications can set particular values for the TTL. For example, an application called traceroute, described below, uses the TTL to automatically find out the route traversed by packets in the Internet. As another example, applications that want to make sure packets do not leave the local network (for security purposes, for example) can set a TTL of 1 so that no router will ever forward such packets.
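The TTL rule can be sketched as a small function that a router might apply before forwarding. The name `ttl_after_forwarding` and the `seconds_held` parameter are assumptions made for this illustration, not part of any real router interface.

```python
def ttl_after_forwarding(ttl: int, seconds_held: int = 0):
    """Return the TTL to send on, or None if the packet must be discarded.

    The TTL drops by one for the hop itself, plus one for every whole
    second the packet spent waiting on this router.
    """
    remaining = ttl - 1 - seconds_held
    return remaining if remaining > 0 else None


print(ttl_after_forwarding(64))     # ordinary packet, forwarded at once
print(ttl_after_forwarding(1))      # TTL of 1: never forwarded
print(ttl_after_forwarding(64, 2))  # packet that waited two seconds
```

A packet sent with a TTL of 1 reaches zero at the first router, which is why such packets never leave the local network.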
The 10th byte of the IPv4 header is the protocol field. This field is used to answer the question, hinted at above, of what to do with the payload. The number in this field identifies a protocol that knows how to interpret the payload. A value of 1 here identifies the Internet Control Message Protocol, ICMP, discussed below. A value of 6 identifies the Transmission Control Protocol, TCP, and a value of 17 (0x11) identifies the User Datagram Protocol, UDP, both of which are described in the next chapter. Many other values have been assigned, as documented starting on page 8 of RFC 1700, at http://www.ietf.org/rfc/rfc1700.txt, but most of these values are not in common use.
The next two bytes of the header are a checksum for the header itself. The basic idea of a checksum is to add all the bytes in the header, and record the sum in the header itself. The receiver can then perform the same computation and verify that the header was received without errors. Checksums are easy to compute (just add the numbers together), but do not provide as much protection against errors as stronger algorithms such as Cyclic Redundancy Checks, CRCs, which are discussed in Chapter 4. For example, if two numbers in a sum are exchanged, the total will remain the same even though the header has changed and may now be meaningless.
The Internet checksum adds all pairs of bytes using 1's complement arithmetic. When checksumming an odd number of bytes, a 0 byte is inserted at the end for the calculation, but this is never a concern with the IP header.
In 1's complement arithmetic, the complement (negation) of a number is obtained by subtracting each bit of the number from 1 or, in other words, inverting the bit. Specifically, each 0 bit becomes 1 - 0 = 1, and each 1 bit becomes 1 - 1 = 0. Some of the mathematical properties of this checksum are discussed in RFC 1071, at http://www.ietf.org/rfc/rfc1071.txt. For this discussion, suffice it to say that most modern computers do not provide 1's complement arithmetic, but can implement it relatively easily. The 16-bit, 1's complement sum of a set of numbers is the (normal) 16-bit sum of those numbers, added to any carry from the 16-bit sum. For example, the sum of the 16-bit numbers 0x89AB + 0xCDEF = 0x1579A, a 17-bit number. Adding the carry back in gives us 0x579B, which in this case would be the 1's complement sum of the two numbers.
A sender of an IP packet takes the following steps:
The receiver performs steps 2 and 3 and, if the result is 0xFFFF, knows the header has no detected errors.
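One way to compute and verify the header checksum, following the description in RFC 1071, might look like the sketch below. The function names are made up for this example, and the sketch assumes the checksum occupies bytes 10 and 11 of the header (counting from 0), as in the IPv4 layout.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit 1's complement sum of the data, taken two bytes at a time."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # add the carry back in
    return total


def header_checksum(header: bytes) -> int:
    """Checksum to store: complement of the sum with the field zeroed."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    return ~ones_complement_sum(zeroed) & 0xFFFF


def header_is_valid(header: bytes) -> bool:
    """A header received intact sums to 0xFFFF, checksum included."""
    return ones_complement_sum(header) == 0xFFFF


# Sample 20-byte header with a zero checksum field, for illustration only.
sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = header_checksum(sample)
complete = sample[:10] + checksum.to_bytes(2, "big") + sample[12:]
print(hex(checksum), header_is_valid(complete))
```

Changing any single byte of `complete` makes `header_is_valid` return `False`, although, as noted earlier, swapping two 16-bit words would go undetected.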
The overall structure of the IP header is (from RFC 791):
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Version|  IHL  |Type of Service|          Total Length         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         Identification        |Flags|      Fragment Offset    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Time to Live |    Protocol   |         Header Checksum       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                       Source Address                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Destination Address                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Options                    |    Padding    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
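As an illustration of how this layout maps onto code, the fixed 20-byte part of the header can be unpacked with Python's `struct` module. The `IPv4Header` tuple and `parse_ipv4_header` function are names invented for this sketch, which ignores any options.

```python
import struct
from collections import namedtuple

IPv4Header = namedtuple(
    "IPv4Header",
    "version header_length tos total_length identification flags "
    "fragment_offset ttl protocol checksum source destination",
)


def parse_ipv4_header(packet: bytes) -> IPv4Header:
    """Decode the first 20 bytes of an IPv4 packet (options are skipped)."""
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    return IPv4Header(
        version=ver_ihl >> 4,
        header_length=(ver_ihl & 0x0F) * 4,   # stored in 32-bit units
        tos=tos,
        total_length=total_length,
        identification=ident,
        flags=flags_frag >> 13,               # top 3 bits
        fragment_offset=flags_frag & 0x1FFF,  # low 13 bits, 8-byte units
        ttl=ttl,
        protocol=protocol,
        checksum=checksum,
        source=".".join(str(b) for b in src),
        destination=".".join(str(b) for b in dst),
    )


# A sample header, used here purely for illustration.
sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
print(parse_ipv4_header(sample))
```

The `!` in the format string asks for network byte order, so the multi-byte fields come out correctly regardless of the host's native format.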
IP version six, IPv6, has much the same core functionality as IPv4: the packet header has a destination address which is used for routing and to which the packet is delivered. The header also has a source address. Since the number of bits in an IP address is considered to be the major limitation as the Internet expands beyond 100 million hosts, the addresses are 128 bits (16 bytes) each. The total header size is fixed and is 40 bytes (320 bits).
Fields that are similar but not identical include a 2-byte payload length (instead of total length), a 1-byte hop limit (instead of TTL), an 8-bit traffic class (instead of type of service), and a 1-byte next-header field (instead of protocol). The last two are further discussed below, together with a totally new 20-bit field called a flow label.
Notably missing is any support for fragmentation and header checksums. Fragmentation will be discussed below. Header checksums were thought to be of limited use, and omitted to simplify the implementation of routing functions directly in hardware.
The header format is described in RFC 2460, which is the basic specification for IPv6.
    +++++++++++++++++++++++++++++++++
    |Ver| Class |    Flow Label     |
    +++++++++++++++++++++++++++++++++
    |Payload Length |Nxt Hdr|Hop Lmt|
    +++++++++++++++++++++++++++++++++
    |                               |
    +                               +
    |                               |
    +        Source Address         +
    |                               |
    +                               +
    |                               |
    +++++++++++++++++++++++++++++++++
    |                               |
    +                               +
    |                               |
    +      Destination Address      +
    |                               |
    +                               +
    |                               |
    +++++++++++++++++++++++++++++++++
The version number is always six and takes up the first 4 bits in the datagram, as in IPv4. The next 8 bits are the traffic class (this is the only header field that is not aligned, spanning two different bytes). The traffic class is not defined by the RFC, except to say that zero should be the default.
The flow label takes up the next 20 bits. The idea of a flow label is to allow routers and hosts to negotiate special properties for a sequence of related packets -- for example, special routing, special low-latency delivery, and possibly others. The combination of flow label, source address, and destination address identifies packets belonging to a given flow.
Payload length is the length, in bytes, of the datagram, minus the forty bytes of the IPv6 header. The maximum length is just under 2^16 bytes (65,535, the largest value a 16-bit field can hold), and there are also mechanisms for supporting longer datagrams.
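Since the traffic class and flow label do not fall on byte boundaries, extracting them takes some shifting and masking. The following sketch shows one way to do it; `parse_ipv6_header` is a hypothetical name, and the addresses are returned as raw 16-byte strings.

```python
import struct


def parse_ipv6_header(packet: bytes) -> dict:
    """Decode the fixed 40-byte IPv6 header into a dictionary."""
    if len(packet) < 40:
        raise ValueError("an IPv6 header is 40 bytes long")
    first_word, payload_length, next_header, hop_limit = struct.unpack(
        "!IHBB", packet[:8])
    return {
        "version": first_word >> 28,                  # top 4 bits
        "traffic_class": (first_word >> 20) & 0xFF,   # next 8 bits
        "flow_label": first_word & 0xFFFFF,           # low 20 bits
        "payload_length": payload_length,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "source": packet[8:24],
        "destination": packet[24:40],
    }


# An all-zero header apart from the version, for illustration only.
sample = struct.pack("!IHBB", 6 << 28, 0, 0, 0) + bytes(32)
print(parse_ipv6_header(sample)["version"])
```

The total datagram size is the payload length plus the 40 bytes of this fixed header.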
Next header identifies the first header in the payload. This could be a transport-level protocol, such as ICMP, TCP, or IP, or an IP extension header. Since the IPv6 header size is constant, there is no room for options, as there is in IPv4. Instead, IPv6 has defined a number of extension headers, including hop-by-hop headers, which must be processed by every router, and end-to-end headers for fragmentation, security, and authentication. The headers must appear in the specified order, i.e. first the hop-by-hop headers, then fragmentation, security, authentication, and finally any other "destination" (end-to-end) options. The hop-by-hop and destination options are specified in an extensible manner, with a length and a code field that specify whether routers and hosts should
Fragmentation means that IP packets up to the maximum size can be sent even if the underlying network does not support this. Let us use as an example our SLIP network with an MTU of 1006 bytes, and an application wanting to send a UDP packet that is 4096 bytes long. Adding the standard 20-byte IP header gives a total of 4116 bytes. Each fragment will need its own IP header, so the maximum number of payload bytes that a single fragment can carry is 1006 - 20 = 986. 4096 / 986 is about 4.15, so at least 5 fragments will be needed to carry the entire payload. How many bytes each fragment carries is up to the implementation, as long as the following rules are observed. In what follows, L is the total length of the IP packet, 4116 in our example, h is the header length, 20 in our example, and M is the MTU of the network, 1006 in our example. Subscript i identifies fragments of the original datagram, and n the number of fragments, 5 in our example.
Note that with an IP header length of 20 bytes, the total datagram length will not be a multiple of 8 (since 20 bytes is not a multiple of 8). It is the payload length that matters. Again, the final fragment may have an odd length. This last fragment has the MF (More Fragments) flag set to 0, whereas every other fragment carries MF=1.
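One possible way to split a payload, among the many an implementation could choose, is to fill every fragment except the last as full as the MTU allows while keeping its payload a multiple of 8 bytes. The sketch below does this; the function name `plan_fragments` and its return format are assumptions made for the example.

```python
def plan_fragments(payload_length: int, mtu: int, header_length: int = 20):
    """Return a list of (offset in 8-byte units, payload bytes, MF flag)."""
    # Largest payload that fits under the MTU, rounded down to a multiple of 8.
    max_payload = (mtu - header_length) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")
    fragments = []
    offset = 0
    while offset < payload_length:
        size = min(max_payload, payload_length - offset)
        more_fragments = offset + size < payload_length
        fragments.append((offset // 8, size, more_fragments))
        offset += size
    return fragments


for fragment in plan_fragments(4096, 1006):
    print(fragment)
```

With the 4096-byte payload and 1006-byte MTU of the running example, this strategy yields four fragments of 984 payload bytes each, with MF set, followed by a final fragment of 160 bytes with MF clear.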
Any datagram or fragment may have the Don't Fragment (DF) bit set. If the bit is set, the datagram/fragment should not be further fragmented, and if fragmentation is required, the packet should be discarded instead. The IPv6 extension header that supports fragmentation does not have this bit -- instead, all IPv6 datagrams can only be fragmented by the original sender, and not by routers. The routers therefore can only discard a received packet that does not fit in the MTU.
Fragments belonging to the same original datagram should all have the same ID field, the same source and destination address, and the same protocol field. Fragments belonging to different original datagrams should differ in at least one of these fields, most commonly the packet ID. -- ping of death and other odd IP fragmentation things such as overlapping and re-transmitted fragments
Although so far we have only been supporting point-to-point connections along serial lines, we want our routing tables to support more general configurations. In particular, we want to be able to use arbitrary "lower layer" networks to connect our computers. Some of these will be described in Chapters 4 and 5, but for now we simply need to be aware that these networks allow the interconnection of more than two computers, and hence we may need to record the IP address of the next router to which we are forwarding packets.