Computer Networks Notes 4: Broadcast Networks in the Data-Link Layer


1. Introduction. Broadcast Networks

In all the examples so far, we have used the point-to-point serial line connection, which can carry individual bytes. We use the serial line to send individual bytes, as well as framing information encoded according to the SLIP (Serial Line IP) protocol.

The task of framing packets belongs to the data-link layer, which is responsible for communication of data across a single hop.

Note that many data-link layer protocols have been adapted to work over multiple physical hops; switched Ethernet, for example, can be carried over any number of hops.
In contrast, the task of carrying individual bits and bytes of data belongs to the physical layer -- in the above example, the serial line, with its bit/baud rate and line discipline, is the physical layer. Line discipline refers in part to whether both sides can transmit at once -- a full duplex connection -- or whether only one sender can use the medium at a given time -- a half duplex connection.

These notes do not much focus on the physical layer, beyond noting that modern communication networks use three main classes of physical layer:

  1. copper, in which copper conductors relay electrical signals, and bits and bytes are encoded using a variety of modulation techniques including amplitude modulation (AM), frequency modulation (FM), phase modulation (PM), or some combination of these.

    AM includes modulation of either voltage or current. AM can be used in on/off mode, where one bit corresponds to a voltage or current level and the other bit to no voltage or current, or with a number of steps.

  2. fiber optics, in which glass fibers relay optical signals, and bits and bytes are encoded generally by turning the light on or off.
  3. wireless, in which radio- or microwave-frequency electromagnetic signals in free space are used to encode the bits and bytes. Again, a variety of modulation techniques are used.

Each of these classes includes many different specific physical layers.

There is also a wide variety of data-link layers, each of which may be compatible with a number of different physical layers. For example, SLIP can be used not only over serial lines, but also over modems and over serial radios. The same physical layers will support a different data-link layer called the Point-to-Point Protocol, or PPP (RFC 1661).

These physical layers may be point-to-point, that is, connect (at most) two computer interfaces, or point-to-multipoint, also known as broadcast. Even with point-to-point technologies, most of the networks we use on a daily basis can be used to connect more than two computers or devices.

The simplest physical model is to imagine each computer having as an interface a digital radio that can be used to send and receive bits. All other radios within range can exchange data with this computer. Such a setup is called a broadcast network, because every receiver can hear all the messages sent by every sender. Typical examples of broadcast networks include Aloha, the original form of Ethernet -- which broadcast on a wire -- and satellite-based networks.

The major challenge with broadcast networks is that they cannot be full-duplex -- when one unit is sending, all other units must avoid sending to avoid disrupting the communication in progress. If this is not done, the result is a collision of two packets, which normally results in neither packet being received correctly. Negotiating access to this shared medium is called medium access control, or MAC, and some people use the term MAC layer to refer to the data-link layer.

Whether the medium is shared or point-to-point, all data-link layer protocols provide sufficient services to carry IP packets. In each case, the IP header and payload is given to the data-link layer protocol. This layer may add its own header and sometimes its own trailer, and then sends it to the directly-connected next hop (on a broadcast medium, the frame is sent to all connected interfaces, including the next hop). When the frame is received, the data-link layer may perform some checking, then deliver the original packet back to the IP layer for IP processing. In all this, note that the data-link layer generally does not have to route -- routing is a network layer function. However, we will see some exceptions to this rule when we talk about Wireless Ad-Hoc networks and about Ethernet.

2. Wireless and Satellite networks

As mentioned, a wireless network is relatively easy to build given suitable digital radios.

Two basic kinds of wireless networks are:

  1. wireless networks where every sender is in range of every receiver.
  2. wireless networks where some senders and some receivers cannot communicate.
In the second case, connectivity can be provided if some of the intermediate network nodes agree to retransmit data for those hosts that are not in range. The more common case, however, is the first.

2.1 Aloha Protocols and Applications

The original Aloha Network was designed by a team at the University of Hawaii, with most of the credit going to Norman Abramson.

In an Aloha Network, a number of communication units are connected via radios. One communication unit is called a hub. There is a channel reserved for the hub, and the hub is the only system to transmit on this channel. There is also one other channel reserved for all the other communication units. We can imagine each channel as being a distinct radio frequency, though in a satellite network, the hub might be the satellite, and the channel can simply be the transmission from the satellite to the ground stations. As long as ground stations cannot communicate with each other directly, the satellite is the only one that can transmit on the satellite-to-earth channel, and the earth stations use the earth-to-satellite channel, which may be on the same or different frequencies.

The MAC algorithm in an Aloha Network is very simple. Each unit sends data whenever it has data to send. The hub listens on the inbound channel, and retransmits everything it receives on the outbound channel. If a unit sends data, and doesn't receive what it sent, it knows the transmission failed, and it sends again. In other words, this Medium Access Control actually does not control anything, only providing for notification and retransmission in case of failure.

A slight elaboration on this scheme is to add a CRC to each frame sent. Then, the hub verifies the CRC of any received packet before sending it. If the CRC is not correct, most likely a collision has occurred, and the hub need not retransmit anything.

A practical Aloha scheme must also include some sort of header to identify the intended recipient of the message. Although each unit will receive each message, most of them will discard messages not directly addressed to them.
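The framing just described -- an address header plus a CRC trailer -- can be sketched as follows. The one-byte addresses, the 0xFF broadcast address, and the frame layout here are hypothetical, chosen only for illustration; real Aloha framing differed.

```python
import struct
import zlib

def build_frame(dst, src, payload):
    """Build a hypothetical frame: 1-byte dst, 1-byte src, payload,
    then a 4-byte CRC-32 computed over everything before it."""
    body = struct.pack("!BB", dst, src) + payload
    return body + struct.pack("!I", zlib.crc32(body))

def check_frame(frame, my_addr):
    """Return the payload if the CRC matches and the frame is addressed
    to us (or to the broadcast address 0xFF); otherwise None."""
    body, (crc,) = frame[:-4], struct.unpack("!I", frame[-4:])
    if zlib.crc32(body) != crc:
        return None              # likely a collision: drop silently
    dst, src = struct.unpack("!BB", body[:2])
    if dst not in (my_addr, 0xFF):
        return None              # not addressed to us: discard
    return body[2:]

frame = build_frame(dst=7, src=3, payload=b"hello")
assert check_frame(frame, my_addr=7) == b"hello"
assert check_frame(frame, my_addr=9) is None
```

A hub would apply only the CRC check before retransmitting; the address check happens at each receiving unit.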

This Aloha protocol is used widely in satellite networks, including telephone satellite networks. In these networks, the travel time from the ground to the satellite is generally quite large.

Unfortunately, the Aloha protocol is rather inefficient. The maximum theoretical rate at which the channel can be used by a large number of units is about 18% -- that is, a 1Mb/s channel can carry about 180,000 bits per second. If units attempt to send more than that, collisions become much more likely, and the channel is kept busy delivering packets that participate in collisions and therefore cannot reach the intended recipients. This theoretical limit only applies if there is a large number of senders -- when there is a single sender, channel utilization can reach 100%. However, the utilization when there are multiple senders is sufficiently low that people have designed schemes to improve it.

The first scheme is Slotted Aloha, in which the units are synchronized, perhaps by a signal from the hub. Time is divided into time slots, each slot being sufficient for transmission of one packet by one unit. Each unit agrees to begin packet transmission only at the beginning of a slot. The advantage of this scheme is that it avoids approximately 1/2 of the collisions -- if a collision is going to happen, it is only going to happen at the beginning of a slot, and there is no possibility that the end of one packet can collide with the beginning of another, as can happen with regular Aloha. As a result of this increased efficiency, theoretical network utilization can be as high as approximately 37% -- twice the 18% of pure Aloha -- which is a substantial improvement. This theoretical result assumes that all packets are the same length and the same as the length of a slot, however, which may not be true. As a result, practical utilization is likely to be lower.
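These limits come from the standard Poisson-arrival analysis: with offered load G (attempted frames per frame time), pure Aloha delivers throughput G * e^(-2G) and slotted Aloha G * e^(-G). A quick check of both maxima:

```python
import math

def pure_aloha(G):
    """Expected throughput of pure Aloha at offered load G:
    a frame survives only if no other frame starts within a
    two-frame-time vulnerable window."""
    return G * math.exp(-2 * G)

def slotted_aloha(G):
    """Slotted Aloha halves the vulnerable window to one slot."""
    return G * math.exp(-G)

# Pure Aloha peaks at G = 0.5, slotted Aloha at G = 1.
print(round(pure_aloha(0.5), 3))     # 0.184, i.e. about 18%
print(round(slotted_aloha(1.0), 3))  # 0.368, i.e. about 37%
```

Pushing the load past the peak only makes things worse: pure_aloha(1.5) is under 8%, illustrating how an overloaded channel wastes its time on collisions.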

The second scheme is to use Aloha to reserve the channel, and then transmit the data at the reserved time. A restaurant reservation, for example, takes relatively little effort, and can avoid the bigger effort of going to a restaurant and finding that the restaurant is too busy. In the same way, an Aloha reservation simply communicates to the hub that the unit wishes to send a packet, usually also communicating the size of the packet the unit wants to send. If the reservation goes through, the hub will respond giving a time at which the unit may send the packet. If the reservation collides with another reservation, the unit does not get a response from the hub and must try to retransmit the request. If a reservation request requires much fewer bytes than a data packet, and therefore takes much less time on the channel than a data packet, collisions now affect only the short reservation requests, and most of the channel time can be spent successfully delivering data.

In the extreme, we can imagine the contention period being used to request a telephone connection in a satellite telephone system. The satellite must then see if bandwidth is available to carry this new call, and if it is, assign a time slot to this telephone call. This slot is then available for the entire duration of the call, requiring no additional contention.

2.2 802.11

In the second half of the 1990s, researchers developed radios that could be used for communication among laptops. The main standard for this communication has been called 802.11, and by now includes a large family of related standards. 802.11 was originally designed to communicate at 1Mb/s on the 2.4GHz band, which is the same frequency used by microwave ovens, and can be used worldwide without a license. The first extension to 802.11 allowed communication at 2Mb/s. Further standards have been defined: 802.11b supports 11Mb/s on the 2.4GHz band, 802.11a up to 54Mb/s on the 5GHz band, 802.11g up to 54Mb/s on the 2.4GHz band, and 802.11ac up to several hundred Mb/s per stream on the 5GHz band. Be aware that the faster technologies still occasionally fall back to transmitting some of their data at the slower speeds, especially in the presence of interference.

802.11 is also referred to as Wi-Fi or Wireless Local Area Network (Wireless LAN). 802.11 is mostly used for local communication, since most 802.11 radios only have a range of 200m (600ft), or usually much less.

The 802.11 protocol resembles the Aloha reservation system described in Section 2.1, but does not require participation by a central hub, and does not assume that all units are within range of each other. Instead, it assumes that any interfering traffic will be within range of the sender, of the receiver, or both. In other words, a unit that is too far to receive a message from the sender or a message from the receiver will be too far to interfere with a communication. The MAC protocol, originally called MACA and then MACAW, is therefore designed to make sure that all nodes within range of the intended sender and receiver are told to avoid transmission before the sender begins to transmit the data frame.

To achieve this "silence", the sender initiates the transmission by sending a request to send, or RTS packet to the receiver, indicating the address of the receiver and the length in bytes of the packet it wishes to send. If this transmission fails, the sender assumes there has been a collision and waits for a while before trying again. If the RTS is sent successfully, however, the receiver will reply with a clear to send, also listing the packet length. Again, the CTS may be lost due to a collision, and since the sender has no way to distinguish this event from a loss of the RTS, the sender again waits before sending another RTS.

If both RTS and CTS are sent and received correctly, the sender assumes that the channel is available and starts to send its data packet. If the transmission is successful, the receiver will send an ACK to confirm, otherwise the sender will again wait and try again later.

If the RTS/CTS exchange is successful, then everyone within range of both the sender and the receiver has been informed that a transmission is likely to take place, and so knows both that it should not transmit and knows for how long it should not transmit.
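The "how long" part can be sketched as a per-node timer: every node that overhears an RTS or CTS extends its do-not-transmit deadline by the duration announced in the frame. This is a simplified model of 802.11's Network Allocation Vector (NAV); the function name and time units here are illustrative.

```python
def defer_until(now, current_nav, overheard_duration):
    """Update a bystander's do-not-transmit deadline from the duration
    announced in an overheard RTS or CTS (times in microseconds)."""
    return max(current_nav, now + overheard_duration)

# A bystander hears the RTS at t=100 and the CTS at t=150; each frame
# announces how long the rest of the exchange will occupy the channel.
nav = 0
nav = defer_until(100, nav, 300)   # RTS: exchange busy until t=400
nav = defer_until(150, nav, 260)   # CTS: deadline refined to t=410
assert nav == 410                  # stay silent until t=410
```

A node in range of only the sender sets its deadline from the RTS; a node in range of only the receiver sets it from the CTS; either way it stays silent for the whole data transfer.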

The RTS/CTS exchange is specifically designed to solve the problem where one node is in range of the sender or the receiver, but not both. This problem is known as the hidden terminal problem. However, this exchange cannot be used in the case of broadcast packets, which are intended for all recipients in range, and is not entirely successful if the interference range of a radio is greater than the communication range of the same radio, as is usually the case -- a radio limited to a communication range of 200m might interfere with another radio that is 300m away, for example.

Note that the RTS/CTS exchange replaces collisions of large packets with collisions of the small RTS and CTS packets. This only increases efficiency if the packet size is large. Most implementations have an RTS threshold parameter, below which the packet is sent without first sending an RTS.

802.11 is most commonly used to form wireless LANs. In this most common usage, one of the units is called a Wireless Access Point, and has a wired interface (usually, an Ethernet interface) as well as a radio. The wireless access point will forward all of the frames it receives on its radio to the attached wired network, and all the frames it receives on its wired network to the wireless LAN. In this, the wireless access point functions as a bridge. The bridging function includes packet forwarding, which routers also perform, but uses at best a very primitive form of routing -- all packets received on one interface are forwarded to the other. Some wireless access points might be more elaborate and try to remember which interface provides access to which destination addresses. For example, if host A sends data using 802.11, and host B sends data on the Ethernet (A and B are MAC addresses), the access point might conclude that host A can be reached using the wireless LAN and host B using the Ethernet, and might decide to only forward packets for B on the wired side, and packets for A on the wireless side. A system performing this function is called a learning bridge or learning switch, and can help reduce the amount of traffic on both networks. Algorithms for learning bridges and learning switches are discussed in more detail in Section 3.2 below.
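The learning behavior just described can be sketched as a small forwarding table keyed by source address. This is a minimal illustration, not a complete switch (real learning switches also age out stale entries and handle loops):

```python
class LearningBridge:
    """Minimal learning-bridge sketch: remember which port each source
    address was last seen on, and forward accordingly."""

    def __init__(self, ports):
        self.ports = ports          # e.g. ["wireless", "wired"]
        self.table = {}             # MAC address -> port last seen on

    def handle(self, src, dst, in_port):
        self.table[src] = in_port   # learn: src is reachable via in_port
        out = self.table.get(dst)
        if out is None or out == in_port:
            # destination unknown (or on the same side): flood the rest
            return [p for p in self.ports if p != in_port]
        return [out]                # destination known: forward only there

bridge = LearningBridge(["wireless", "wired"])
assert bridge.handle("A", "B", "wireless") == ["wired"]  # B unknown: flood
assert bridge.handle("B", "A", "wired") == ["wireless"]  # A was learned
assert bridge.handle("A", "B", "wireless") == ["wired"]  # B now learned
```

Note that the very first frame for an unknown destination is still flooded; the traffic savings appear only once both sides have been heard from.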

A bridge also performs any necessary protocol conversions, for example sending RTS and CTS as needed to transfer data on the wireless side. 802.11 and Ethernet are similar protocols using the same format of addresses and compatible packet formats (Ethernet is defined by standard 802.3), so the conversion is relatively simple. Other bridges have been designed to perform protocol conversions that are considerably more complex.

Finally, a wireless access point is actually a distinct kind of node, and if present, it can be used to coordinate traffic among the other 802.11 units in its range. 802.11, for example, can be used to transport real-time data, and the access point is responsible for telling the other units when it is time for each unit to transmit their real-time data. In principle, such real-time data transfer is similar to the reservation system discussed in Section 2.1.

2.3 Ad-Hoc Wireless Networks, MANETs, and VANETs

An 802.11 network using an access point can only function within range of this access point. Many people are interested in building wireless networks where no node may be in range of every other node. On the Internet, for example, special nodes called routers can forward data among networks. In a wireless network, each node could forward packets to other nodes until the packets reach the intended destination. Such a network is called an Ad-Hoc wireless network, with ad-hoc referring to the fact that the network need not be planned since every node is willing to forward data, and that configuration ought to be minimal since each node should be self-configuring. In theory, if there is any way to forward data to a destination, such a network should be able to do so. Ad-hoc wireless networks are often designed for environments where the nodes may move on a frequent or infrequent basis -- such a network is a Mobile Ad-hoc NETwork or MANET. More information on MANETs is available in RFC 2501. A MANET involving vehicles and perhaps road-side equipment is called a VANET (Vehicular Ad-Hoc NETwork).

There are also many uses of ad-hoc networks where the nodes are not intended to move (a fixed ad-hoc network), or where only a few nodes are intended to move, and most are stationary, leading to the fixed-mobile communication problem. The fixed and fixed-mobile architectures are often suitable for sensor networks, where most of the nodes may be stationary, but a few may be self-propelled or may be carried by a human.

The major challenge in an Ad-Hoc network is routing. For a fixed ad-hoc network, many of the concepts from Internet routing can be used. Unlike in the Internet, there is no natural hierarchy, though some algorithms begin by artificially grouping nodes into clusters so that hierarchical routing is feasible. For a MANET, protocols such as link-state and distance-vector often do not perform well in the presence of high mobility, failing to converge, providing wrong routes, or having other problems.

Many MANET routing protocols have been proposed, and will not be covered here, though some of the principles are worth mentioning. The first principle is that in the absence of hierarchy or other structures, the only way to find a node is by broadcasting or by using distance-vector diffusion, described in Section 7.2 of Chapter 2. Broadcasting is usually implemented by flooding, as described in Chapter 2, Section 7.4.2. This limits the scalability of the network in that every destination connected to the network must result in some information being sent to every node in the network.

One notable exception to the need for broadcasts is given by reverse paths. Consider for example a sensor network where each node is designed to send data to one or a few base stations. Each base station can do a periodic broadcast. A sensor node receiving such a broadcast knows that the node from which it received the broadcast is one step closer to the base station, and can forward data destined for the base station along this gradient. The base station can respond to the sensor node along the reverse of the path followed by the sensor data, as long as intermediate nodes remember from which immediate neighbor they received data from each individual sensor.
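Assuming each node remembers its first-heard upstream neighbor (the gradient) and the last-seen neighbor for each sensor (the reverse path), the scheme can be sketched as follows; all names here are hypothetical.

```python
class SensorNode:
    """Sketch of gradient forwarding with reverse-path memory."""

    def __init__(self):
        self.parent = None     # next hop toward the base station
        self.reverse = {}      # sensor id -> neighbor its data came from

    def on_base_broadcast(self, neighbor):
        if self.parent is None:      # first copy heard is one hop closer
            self.parent = neighbor

    def on_sensor_data(self, sensor_id, neighbor):
        self.reverse[sensor_id] = neighbor   # remember the reverse path
        return self.parent                   # forward up the gradient

    def on_base_reply(self, sensor_id):
        return self.reverse.get(sensor_id)   # retrace the data's path

node = SensorNode()
node.on_base_broadcast("n1")                 # n1 is one hop closer
assert node.on_sensor_data("s42", "n7") == "n1"
assert node.on_base_reply("s42") == "n7"     # reply retraces via n7
```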

Another principle used in the design of ad-hoc networks is to provide data link layer reliability. Whether or not the 802.11 RTS/CTS handshake is used, the likelihood of data loss in a congested 802.11 ad-hoc network can be high. Individual nodes can keep track of a neighbor's reception of each packet in one of two ways, either with an explicit ACK, or by seeing if the neighbor retransmits the packet to a further hop -- this is called an implicit ACK. Either way, a node can tell if a packet transmission failed, and can retransmit if necessary. While packet loss in the network is still possible, for example if a node's buffers become full, it is less likely, and this leads to higher performance for congestion-aware protocols such as TCP, and to higher data delivery rates for other protocols.
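The retransmit-until-acknowledged behavior, whether the acknowledgment is explicit or implicit, can be sketched as a simple loop. The callback structure here is hypothetical, used only to keep the sketch self-contained.

```python
def send_reliably(transmit, saw_ack, payload, max_tries=4):
    """Retry until confirmed: saw_ack may report an explicit ACK, or an
    implicit one (our neighbor heard forwarding the payload onward)."""
    for _ in range(max_tries):
        transmit(payload)
        if saw_ack(payload):
            return True        # the neighbor received it
    return False               # give up; let higher layers recover

# Toy deterministic medium: the first two transmissions are "lost".
attempts = []
def transmit(p):
    attempts.append(p)
def saw_ack(p):
    return len(attempts) >= 3

assert send_reliably(transmit, saw_ack, "reading-17") is True
assert len(attempts) == 3      # two losses, then success
```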

2.4 Comparison with IP

Some wireless ad-hoc networks carry IP traffic, and some don't. Either way, it seems that the wireless ad-hoc network is performing some of the functions of IP, for example routing and multi-hop forwarding, and even some functions of TCP, such as retransmission.

When such a network is used to carry IP data, the entire ad-hoc network is a single "hop" for IP. In other words, when nodes forward IP packets, they should not decrement the Time To Live field, and should not expect their routing protocols to interoperate with the Internet routing protocols. The reason for this is simple -- a wireless network does not follow the same model as an IP network. In an IP network, if interface A is on the same network as interface B, and interface B is on the same network as interface C, then A must be on the same network as C. Likewise if D is not directly connected to C, then it cannot directly communicate with A or B. None of these statements apply to wireless networks, since a single interface (a single radio) can communicate with two other interfaces which cannot communicate with each other. Because of this, the IP notion of "network", as a collection of interfaces all sharing the same network prefix and all able to communicate with each other, can only be applied to the entire ad-hoc wireless network, and not (gracefully) to any subset of nodes in such a network.

Chapter 3 described the ISO/OSI 7-layer model, which states that a network-layer protocol should be layered on top of data-link layer protocols. We have just seen that this is not true, that in a wireless ad-hoc network a network-layer protocol (IP) can be layered over another network-layer protocol (the ad-hoc network routing and transport protocol). We will see another example of this when considering ATM in Chapter 5.

3. Ethernet

3.1 Basic Ethernet

One way to think of an Aloha protocol is that the hub acts as a broadcast medium, reflecting to all units all the messages it gets, perhaps including the packets which have collisions. Such a broadcast medium is possible not only using wireless, but also using wired networks. After all, if a specific voltage is placed on a pair of conductors, any "receiver" measuring the voltage between the two conductors should get the same result. This is the basic principle behind the original design of Ethernet: the cable, i.e. the pair of conductors, forms an Ether, i.e. a medium for carrying signals. Unlike wireless networks, units will only interfere if they are connected to the same cable.

A time-varying voltage such as a transmission of a number of bits can also be detected by all receivers, but the same voltage is not detected simultaneously at all points on the wire. In the first place, the signal strength decreases slowly with distance, resulting in a signal that is hard to detect if the wire is overly long -- about 500m is the limit for the original 10Mb/s Ethernet (historically, there was a 3Mb/s Ethernet, but the first widespread adoption was of the 10Mb/s Ethernet). Also, signals travel at finite speed on any medium, including a pair of wires. This speed is the speed of light in the medium. In a vacuum this speed is about 300,000 km/s, 300,000,000 m/s, or about a foot per nanosecond. In any other medium, the speed of light will be less than the speed of light in vacuum (though many physicists keep trying to disprove this statement), and for a typical cable or fiber, 200,000 km/s is a reasonable approximation. In other words, to cross the 500m maximum distance of the original 10Mb/s Ethernet a signal would take about 0.0000025s, or 2.5 microseconds (2.5 us). Since a 10Mb/s Ethernet puts 10 bits on the network every us, such a maximum-size network "stores" about 25 bits, in transit from a sender to a receiver. Note that in such a network, a bit is about 0.1us x 200 m/us = 20m long.
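The arithmetic in this paragraph can be checked directly, assuming the 200,000 km/s signal speed used above:

```python
SPEED = 200_000_000    # m/s, signal speed in a typical cable
RATE = 10_000_000      # b/s, original 10Mb/s Ethernet
LENGTH = 500           # m, maximum segment length

delay = LENGTH / SPEED           # one-way propagation delay, seconds
bits_in_flight = RATE * delay    # bits "stored" on the wire in transit
bit_length = SPEED / RATE        # physical length of one bit, meters

print(round(delay * 1e6, 1))     # 2.5 (microseconds)
print(round(bits_in_flight))     # 25 (bits in transit)
print(bit_length)                # 20.0 (meters per bit)
```

The same three lines, rerun with a 100Mb/s rate, show why faster Ethernets shrank the maximum cable length: ten times the bits are in flight over the same wire.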

Because the power on a pair of conductors only decreases slowly with distance, if two stations are transmitting simultaneously, each unit's signal will be substantially distorted by the other signal. For example, if unit A puts a "high" voltage, say 5 volts, and unit B puts a "low" voltage of zero volts, both of these units will be able to detect something close to 2.5V rather than the signal they placed on the wire. With a quickly time-varying signal the detection is more complex, but the basic principle is that each node can detect that others are interfering with its transmission. Since others interfere with transmissions only when there is a collision, this is called collision detection, or CD. Collision Detection is one major advantage of Ethernet over Aloha networks -- because each station can tell very quickly that a collision has occurred, each station can stop transmitting quickly, and avoid keeping the network busy with a corrupted transmission. Because of this, Ethernet can achieve much higher efficiencies than either Aloha or Slotted Aloha.

Because Ethernet distances and times are limited, each sender listens for other transmissions (or a carrier) before sending. If another unit is transmitting, the sender waits until the end of the transmission before sending its own data. Collisions are therefore most likely just after the end of each packet, and can only happen at the beginning of a new packet.

The original Ethernet was designed primarily by Robert Metcalfe, who was aware of the work on the Aloha Network. The designers of Ethernet felt it was important that Ethernets be reliable, that is, that a sender be able to tell whether or not a packet was transmitted successfully, and retransmit if necessary.

Collision detection increases reliability without acknowledgments, in that a detected collision behaves as an unreliable NAK. However, collisions could occur that are not detected by the sender. Imagine a packet that is only 16 bits long -- such a packet might be completely transmitted by unit A, with no collisions, before the first bit even reaches unit B, which therefore begins to transmit. As a result, the bits are received by B after B has begun to transmit, and B detects a collision, but A mistakenly believes the packet was sent successfully. To avoid this, Ethernet requires that each packet have a minimum size of 64 bytes (512 bits), and a limited maximum propagation delay, enforced by limiting cable lengths.
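The reasoning behind the minimum can be checked numerically: the frame must outlast twice the worst-case one-way delay, so the sender is still transmitting when news of a far-end collision gets back. The sketch below assumes the 200,000 km/s signal speed used earlier; note that a single 500m segment alone would call for far fewer than 512 bits -- the 64-byte figure covers the largest multi-segment topology, with repeaters, that the standard allows.

```python
SPEED = 200_000_000    # m/s, signal speed in a typical cable
RATE = 10_000_000      # b/s, original 10Mb/s Ethernet

def min_frame_bits(max_distance_m, extra_delay_s=0.0):
    """Smallest frame that is still being sent when a collision at the
    far end propagates back: frame time >= 2 x worst-case one-way delay
    (extra_delay_s stands in for repeater and electronics delays)."""
    one_way = max_distance_m / SPEED + extra_delay_s
    return RATE * 2 * one_way

# One 500 m segment alone would only require about 50 bits:
print(round(min_frame_bits(500)))    # 50
# Longer multi-segment paths push the requirement toward 512 bits.
```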

The benefit of reliable delivery is questionable. An undetected collision will simply lead to packet loss, which causes TCP to retransmit, or a real-time protocol to adapt to the packet loss. Retransmission at the Ethernet layer does have advantages however. Chapter 3 presented the TCP adaptive timer, which is used to predict likely round-trip times and therefore suitable timeouts. These timeouts can be on the order of 100s of milliseconds (ms), and therefore a single packet loss can significantly delay a TCP stream. In contrast, an Ethernet can detect packet loss and retransmit in at most the time it takes to send a maximum-sized packet, which is 1500 bytes or 12,000 bits -- 1.2ms on a 10Mb/s Ethernet. Retransmission is therefore much faster than TCP can achieve, and the result is the TCP packet is delivered relatively quickly.

The problem with retransmissions is that if a retransmission is needed due to a collision, which is likely due to congestion, retransmitting is likely to increase the amount of traffic on the network, resulting in a decreased overall throughput. To avoid this, Ethernet employs an exponential backoff strategy, in which a sender that has collided must wait a specific time after the first collision, then twice as long after the second, and so on up to 10 collisions. After the tenth collision, the time is no longer increased, and after 16 collisions, the packet is dropped by the sender.

This scheme is not perfect. In particular, if the packet was dropped due to a collision, there is at least one other sender on the network waiting to send a packet, and the other sender is likely to be waiting exactly the same length of time that we are waiting. To get around this, Ethernet uses a randomized exponential backoff, where the exponentially increasing wait time is used as a range within which to randomly select a time for the sender to wait.
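The randomized backoff can be sketched as follows. This is an illustration of the idea rather than the standard's precise procedure; for scale, one slot time on 10Mb/s Ethernet is 51.2 microseconds, the time to send 512 bits.

```python
import random

def backoff_slots(collisions):
    """Randomized binary exponential backoff: after the k-th collision,
    wait a uniformly random number of slot times in [0, 2^min(k,10) - 1].
    After 16 collisions the frame is dropped."""
    if collisions > 16:
        raise RuntimeError("too many collisions: drop the frame")
    return random.randrange(2 ** min(collisions, 10))

random.seed(42)
# The *range* doubles after each collision, capped after the 10th,
# so two colliding senders become ever less likely to collide again.
waits = [backoff_slots(k) for k in range(1, 17)]
assert all(0 <= w < 1024 for w in waits)
```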

Ethernet is frequently described as a Carrier Sense Multiple Access technology with Collision Detection, or CSMA/CD.

3.2 Ethernet Format, Addressing, and Processing

Each Ethernet packet starts with a preamble, a fixed 64 bit (8 Byte) pattern that helps the receiver synchronize on the sender. This preamble is generally handled by the hardware both on the sender and on the receivers, and cannot be modified by the software. The preamble is followed by the Ethernet header, by the payload, by padding if necessary, and finally by a 32-bit CRC. Padding is only used when the payload is less than 46 bytes, meaning the packet would be less than 64 bytes altogether.

The Ethernet header is 14 bytes long, and has a 6-byte destination address, a 6-byte source address, and a 2-byte field known as the Ethernet Type, or EtherType.

Although the original Ethernet design had 2-byte (16-bit) addresses, the currently accepted standards require that an Ethernet address have 6 bytes. Ethernet addresses are generally configured into each Ethernet device by the manufacturer, so that such a device can be used without the software having to assign an address. Ethernet equipment manufacturers purchase 3-byte numbers and select their own values for the remaining three bytes, meaning for example that an Ethernet card with address 00:01:03:DE:31:51 and a different Ethernet card with address 00:01:03:AF:32:57 are likely to have been built by the same manufacturer. The benefit of this scheme for assigning addresses is that each Ethernet card is guaranteed to have a different Ethernet address from every other Ethernet card on a network, and this address can therefore be used as a unique ID for this interface.
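Checking the manufacturer prefix of the two example addresses above is straightforward:

```python
def oui(mac):
    """First three bytes of a MAC address: the manufacturer prefix."""
    return mac.upper().split(":")[:3]

a = "00:01:03:DE:31:51"
b = "00:01:03:AF:32:57"
assert oui(a) == oui(b) == ["00", "01", "03"]   # same manufacturer prefix
```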

Ethernet addresses have been so successful that many other technologies, including 802.11, use similar 6-byte addresses, and some even use Ethernet addresses for their own addressing. Ethernet addresses are assigned in a hierarchical manner, but the hierarchy cannot be used for routing, since cards from the same manufacturer are often found on different networks and cards from different manufacturers are often used on the same network. Primarily as a result of this lack of routable hierarchy, Ethernet cannot scale to the size of the Internet. Limitations on the scale of an Ethernet are also built into the collision detection mechanism, the signal attenuation, and the fact that all receivers on a network will have to process all the messages, and it is for these reasons that Ethernet is considered purely a Local Area Network, or LAN, technology.

Recent forms of Ethernet address some of these limitations. For example, full-duplex Ethernet does not have the size limitation imposed by collision detection, and optical fiber can be used instead of electrical cable to remove the attenuation limitations. However, the address format generally prevents Ethernet from being used for networks the size of today's Internet.

The final limitation we mentioned, that every receiver on the network must process every packet on the network, is also being addressed, as we will see below. Let us first consider the basic steps a system must perform in processing an incoming (received) packet.

  1. The preamble is used to determine that a packet is about to be transmitted, and therefore start listening for each bit at the appropriate time.
  2. The first 6 bytes are received, and compared to both the interface's own address, and to the broadcast address (FF:FF:FF:FF:FF:FF), and perhaps to one or more configured multicast addresses. If there is no match, the interface can stop receiving the packet, meaning there is no need to store the data into a buffer on the receiving system.
  3. If the destination address matched, the packet is read from the network and stored into memory.
  4. As the packet is received, the CRC is computed, usually in hardware. By the time the entire packet has been received, the CRC should match, and if not, the packet is discarded.
  5. The EtherType field is then examined to determine what further processing, if any, is needed. For example, for an IPv4-over-Ethernet packet, the EtherType should be 0x0800. If this is the case, the Ethernet header and CRC are discarded, and IP processing begins. For IPv6 packets, the EtherType should be 0x86DD.

The second step in this algorithm was crucial for Ethernet's acceptance on early computers which had very limited memory. One continuing advantage is that if the packet is not for this interface, the entire packet can be ignored, which means the CPU need not be involved in examining the IP header and discarding the packet.
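The receive steps above can be sketched as follows. The addresses, constants, and function name are illustrative, and the CRC check is assumed to have already happened in hardware:

```python
OWN_ADDR = bytes.fromhex("000103de3151")   # example interface address
BROADCAST = b"\xff" * 6
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD

def process_frame(frame: bytes, multicast=()):
    # Step 2: compare the destination to our own, broadcast, and multicast addresses.
    dst = frame[0:6]
    if dst != OWN_ADDR and dst != BROADCAST and dst not in multicast:
        return None                        # not for us: stop receiving, buffer nothing
    # Steps 3-4: the rest of the frame is stored; the CRC is checked in hardware.
    ethertype = int.from_bytes(frame[12:14], "big")
    payload = frame[14:]
    # Step 5: dispatch on the EtherType.
    if ethertype == ETHERTYPE_IPV4:
        return ("ipv4", payload)
    if ethertype == ETHERTYPE_IPV6:
        return ("ipv6", payload)
    return ("other", payload)
```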

The last step is a little bit more complicated than it sounds. Ideally, for IP processing it would be good for the IP header to be aligned on an address that is a multiple of 4 bytes. However, since the Ethernet header is 14 bytes, if the packet as a whole is aligned, the IP and TCP headers will be mis-aligned. The most common work-around for this is to allocate 16 bytes in memory for the Ethernet header, and start storing the Ethernet header at the third byte.
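The 2-byte offset trick can be illustrated with a short sketch (the buffer size and names are arbitrary):

```python
ETH_HDR_LEN = 14                     # destination + source + EtherType
HEADROOM = 16                        # bytes reserved in front of the payload

buf = bytearray(2048)                # receive buffer for one packet
eth_start = HEADROOM - ETH_HDR_LEN   # 2: the Ethernet header starts at the third byte
ip_start = eth_start + ETH_HDR_LEN   # 16: the IP header lands on a 4-byte boundary

print(eth_start, ip_start, ip_start % 4)  # 2 16 0
```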

Another limitation of this scheme is that there is no explicit indication of payload length. While the hardware can detect the end of the transmission, giving an upper bound on the number of payload bytes, if the packet is a minimum-sized packet there is no way to tell if the packet was padded or not. If the packet is an IP packet, the total length field will provide this information, but for other protocols, this information might not be available.

To get around these last two problems, a standard has been developed whereby if the value in the EtherType field is 1500 (0x5DC) or less, it is considered to mark the payload length. In this case the next header following the Ethernet header is assumed to belong to the Logical Link Control, or LLC protocol. The LLC header is three bytes long, and is normally followed by another three-byte header called SNAP -- the combination usually referred to as LLC/SNAP. In this case, the LLC header has the value AA:AA:03 (with 03 indicating SNAP), and the SNAP header has the value 00:00:00, indicating that the following two bytes are a valid EtherType, for example 0x800 if the payload is an IP packet. The payload then immediately follows the EtherType. With LLC/SNAP, the overall data-link header is 14 + 8 = 22 bytes, so the total overhead is 26 bytes including the CRC or 32 bytes including also the preamble.
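A sketch of how a receiver might distinguish the two framing standards; the function name is hypothetical and error handling is minimal:

```python
def payload_info(frame: bytes):
    """Interpret bytes 12-13 of an Ethernet frame as an EtherType or a length."""
    value = int.from_bytes(frame[12:14], "big")
    if value > 1500:                       # above 0x5DC: an EtherType, e.g. 0x0800
        return value, frame[14:]
    # 1500 or less: a payload length; LLC/SNAP headers are expected to follow
    length = value
    llc, snap = frame[14:17], frame[17:20]
    if llc == b"\xaa\xaa\x03" and snap == b"\x00\x00\x00":
        ethertype = int.from_bytes(frame[20:22], "big")
        # the length counts the LLC/SNAP headers (8 bytes) plus the payload
        return ethertype, frame[22:22 + length - 8]
    raise ValueError("unrecognized LLC header")
```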

Because the overhead is greater and the benefit for IP and other protocols is limited, it is not clear which of the two standards is in wider use.

3.3 Hubs and Switches

The original Ethernet, called 10Base5 (10Mb/s, 500m) or thickwire, used a single pair of wires to connect all the computers on the network, and was therefore quite limited. An improvement used thinner wire, and was therefore more limited in distance and called 10Base2, but allowed T-junctions that gave more flexibility -- no longer did one single wire have to reach every computer in a network.

Eventually, people started connecting computers to special devices called hubs which perform almost the same function as Aloha hubs: they rebroadcast everything they receive. Because these are wired rather than wireless hubs, they have a number of distinct interfaces, and when they receive a signal on one of the interfaces, they rebroadcast on all other interfaces. This algorithm is similar to the flooding algorithm described in Chapter 2, and works well as long as the network does not have any loops, that is, as long as there is exactly one path between any two systems in the network. The wiring in this system directly connects a computer to a hub, with no other computers connected, and so the wiring is not a shared medium -- the only shared medium is the hub itself, and the hub must detect collisions as well as forward any externally generated indications of collisions.

The wiring in a hub-based system is a collection of twisted pairs known as CAT-5, and this variant of Ethernet is 10BaseT (T for Twisted pair). A similar 100Mb/s version is known as 100BaseT, and is currently the standard for Ethernet deployments.

Physically, 10Base5, 10Base2, and 10BaseT are very different. 10Base5 is a single pair of wires that must reach every computer. Also, this technology uses vampire taps to access the wire without breaking it.

A vampire tap is shown here -- the device that looks like a short bolt, with a sharp conductor in the middle surrounded by an insulator, makes contact with the center conductor of the orange coaxial cable, and the outside makes contact with the shielding, or outer conductor, of the cable.

10Base2 has smaller and more flexible wires, and the wire can be tapped not only to connect a computer, but also to extend the conductors in a new direction. Although the medium is still shared at an electrical level, the topology is arbitrary, unlike 10Base5 where the physical topology is a bus.

10BaseT and 100BaseT have even smaller and more flexible (and cheaper) wires, and the physical topology is a star, or point-to-point, topology. There is no physically shared medium, but the hubs implement medium sharing by detecting collisions when data is received from more than one interface at the same time.

A hub is still considered to be on the physical layer, since it does not look at the Ethernet header but simply forwards each bit that it receives. One of the things a hub does is restore the signal, allowing connections to be longer overall -- a two-port hub is simply known as a repeater. A hub necessarily introduces a delay, however, which increases the number of bits stored by the network, and also limits the overall size of a local area network -- the limit is on the diameter, that is, the maximum distance or delay between any two systems on the network. Specifically, if a signal has to cross many hubs, it cannot also cross as much distance, since any round-trip delay of 51.2us or more (for 10Mb/s Ethernet) will lead to unreliable collision detection. This rule has been summarized as the 4-repeater rule, allowing at most 4 "repeaters" between any pair of systems. In practice, as discussed above, reliable collision detection is not required by higher-level protocols, and uninformed network administrators routinely violate the 4-repeater rule without negative consequences, especially on lightly-loaded networks with few collisions.

As a network gets large, some of the limitations of hubs become apparent. In particular, an entire hub-connected network forms not only a single broadcast network, but also a single collision domain, meaning that any sender will collide with any other sender.

As a way to break up collision domains, we use Ethernet Switches. An Ethernet switch is somewhat like a hub in rebroadcasting each received packet on all its other ports, and a little bit like a router in that it can store and queue entire packets, and therefore works on layer 2 -- an Etherswitch can also check the Ethernet header, as we will see below.

There are more aspects to a switch than what we cover here. For example, a switch is expected to use hardware to forward frames quickly, since a device with the same functions as a switch but without hardware forwarding is generally known as a bridge, which we mentioned above in the discussion of Wireless Access Points.

Unlike a hub, a switch breaks the collision domain, since it can receive packets on multiple interfaces at once. Unlike an IP router, the switch does not perform a routing protocol designed to figure out which port a packet should be forwarded on -- instead, the switch can send the packet to all ports except the one it was received on.

Consider an Ethernet switch with two ports, each connected to a large network. The first network, connected on the first port, has hosts with arbitrary (but unique) Ethernet addresses we will describe as A, B, C, ... Z, and the second network on the second port has Ethernet addresses such as 0, 1, 2, ... n. A hub or a simple switch will function as a repeater, forwarding all packets from each port to the other port. The switch may add a delay before forwarding if that is necessary to avoid collisions.

However, a better way of forwarding these packets would be to look at the destination address. A packet on the first port that is for a host A...Z really does not need to be forwarded to the second port, whereas a packet for a host in 0..n really does.

To achieve this, Ethernet switches look not only at the destination of each packet, but also the source address. If a packet from A to 10 is received on port 1, the switch makes a note of the fact that address A is reachable on port 1. The packet is still forwarded on all ports other than port 1. At a later time, the switch may receive a packet from 10 on port 2, perhaps addressed to Ethernet address A. The switch notes that Ethernet address 10 can be reached through port 2, and only forwards the packet on port 1, on which A was last heard from. The algorithm follows:

  1. When receiving a packet from source S to destination D on interface I, record that S is reachable through I.
  2. If there is no record for D, send the packet on all interfaces except the interface I that the packet was received on. Otherwise, there is a record for D being reachable through interface I'. If I' != I, send the packet to I', whereas if I and I' are the same interface, drop the packet.
  3. If there is a record for a destination X that has not been updated in the last 30 seconds or so, delete this record.

A switch performing this algorithm is known as a learning switch, since over time it learns how to reach each of the hosts it is connected to. The algorithm works well even if there are multiple switches in the network, as long as there are no loops. The reader is encouraged to simulate the algorithm by hand on a simple network with multiple switches and verify that this is indeed the case.
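A minimal simulation of the learning-switch algorithm, with hypothetical class and method names, might look like this:

```python
import time

class LearningSwitch:
    TIMEOUT = 30.0                       # forget a source after ~30 seconds (step 3)

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.table = {}                  # source address -> (port, time last seen)

    def forward(self, src, dst, in_port):
        """Return the list of ports a frame from src to dst goes out on."""
        now = time.monotonic()
        self.table[src] = (in_port, now)                 # step 1: learn the source
        entry = self.table.get(dst)
        if entry is not None and now - entry[1] < self.TIMEOUT:
            out_port = entry[0]
            # step 2: known destination: send to its port, or drop if same port
            return [] if out_port == in_port else [out_port]
        # step 2: unknown (or stale) destination: flood all other ports
        return [p for p in range(self.num_ports) if p != in_port]
```

For example, a 3-port switch first floods a frame from A to 10, then, having learned where 10 is, forwards later frames to a single port.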

Learning switches (commonly simply referred to as switches) break up the collision domain, and also reduce the amount of traffic present on a given Ethernet segment. As a result, they can provide an increase in throughput as well as some measure of confidentiality -- for example, data from the accounting department to the servers need never be present on the segment which the remainder of the company uses to access the Internet.

If there are loops, a broadcast packet will loop forever, and the learning algorithm may flap and continue to forward even a non-broadcast packet. This consumes bandwidth on the Ethernet, so should be avoided. Keeping the network loop-free is difficult for large networks, however. Also, it is frequently desirable to have redundant links so that if one link gets disconnected, other links can take over the traffic with minimal disruption to network users. The algorithm used at present is to have switches use a special protocol (the Spanning Tree Protocol) to build a spanning tree of the network -- a loop-free connected subset of the links, and hence a tree, that reaches every individual Ethernet segment, and hence a spanning tree. This spanning tree can be recomputed quickly in case the network topology changes, and broadcast packets are only forwarded if they are received on links which belong to the spanning tree, completely avoiding broadcast loops.

3.4 Full-duplex Ethernet

Ancient terminals were able to send data and receive data, but could not do both at the same time. After terminals were developed that could do both at the same time, the first kind was referred to as half-duplex, and the second kind as full-duplex. The terminology has spread to the networking world, where the Ethernets we have discussed so far are referred to as half-duplex since no node may transmit a packet and receive a different packet at the same time.

With a hub- or switch-connected network, physical connections are all point-to-point. If a network is connected entirely via switches, the only possible collisions are when a switch transmits at the same time as the computer or switch that is directly connected to it. There is enough bandwidth in a CAT-5 connection to carry 10Mb/s or 100Mb/s in both directions simultaneously, however, and full-duplex Ethernet takes advantage of this to achieve 100Mb/s throughput in each direction rather than 100Mb/s total throughput. This is also a popular strategy with fiber connections, since fiber is naturally full duplex -- most fiber connections consist of two separate fibers, one for traffic in each direction, and theoretically even a single strand of fiber can carry two non-interfering streams of traffic in opposite directions.

Fiber-based Ethernets have been standardized at both 1Gb/s and 10Gb/s. Such a network has essentially no distance limitations -- fiber can go at least 25km, and if repeaters are used, can go anywhere on the planet -- and can be used to connect two or more distant routers, perhaps as part of an Internet or Intranet backbone.

4. IP and data-link layer technology interactions.

In chapters 1 and 2, we assumed IP was layered over a point-to-point network technology such as SLIP over serial line. Another common protocol for serial lines and modems is the point-to-point protocol, or PPP. PPP provides the same framing function as SLIP, but also has a large number of extensions to provide additional functionality, including authentication and on-demand IP address assignment. As far as IP data transfer is concerned, SLIP and PPP over serial lines or modems perform the same function in the same way, and therefore are equivalent.

4.1 Address Resolution Protocol.

The same is not true for IP over Ethernet or IP over wireless. The main distinction is that if a packet is sent over a broadcast network, it must carry a destination address -- in the case of Ethernet or 802.11, a 6 byte hardware address identifying the physical interface that is intended to receive the packet. IP of course knows the IP address, or protocol address of the interface that it wishes to receive the packet -- this is either the packet's destination address, for a directly connected destination, or the Next Hop address stored in the routing table. Obtaining a valid hardware address corresponding to a given protocol address is called address resolution.

The simplest way to do address resolution is to send every IP packet to the hardware broadcast address. This has the advantage of simplicity, but requires every host on a network to process each and every incoming packet, which may cause noticeable slowdowns on many hosts. It also forfeits any optimizations and security improvements that would be obtained from using learning switches. Finally, it is not practical if a network has multiple routers connecting it to the remainder of the Internet, since all of these routers will happily forward the packet, resulting in packet duplication and consequent waste of network resources.

Another way to do address resolution is to have a function mapping protocol (IP) addresses to hardware (MAC) addresses. For example, assume a network administrator has a network with fewer than 256 hosts. The administrator purchases 256 network cards with Ethernet addresses that are identical in the first 5 bytes, and only differ in the last byte. The administrator could then assign IP addresses such that the last byte of the IP address matches the last byte of the Ethernet address, and address resolution provides a hardware address by copying the initial 5 bytes from the machine's own Ethernet address, and the last byte from the destination's IP address.
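This hypothetical scheme is easy to sketch (the addresses and function name are illustrative):

```python
def resolve(own_mac: str, dest_ip: str) -> str:
    # Copy the first 5 bytes of our own Ethernet address, then append
    # the last byte of the destination's IP address.
    prefix = own_mac.split(":")[:5]
    last = int(dest_ip.split(".")[-1])
    return ":".join(prefix + ["%02X" % last])

print(resolve("00:01:03:00:00:07", "10.0.0.42"))  # 00:01:03:00:00:2A
```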

Unfortunately, this solution is not very practical, in part because few network administrators are in a position to assign network interface hardware (many laptops, for example, cannot change their network interface), and in part because sometimes IP addresses are to be assigned independently of hardware -- for example, some organizations may run an active and a standby web server, and when the active server goes down, simply assign the IP address of the active server to the standby server. This might be done also with other servers, for example mail servers. While this could also be achieved by changing the DNS mapping, some DNS mappings are long-lived (to avoid having to refresh the mapping too frequently), so changing the IP address is more practical. IP addresses also change in systems that use DHCP, including many home networks connected to ISPs. In short, while this solution may be appropriate for special cases, it does not have general applicability.

A better way of providing address resolution is to maintain a table listing a hardware address for each protocol address. When IP needs to forward a packet, it looks up the next hop IP address in this table, and places the corresponding hardware address in the Ethernet or wireless header. Since in most systems IP is implemented within the operating system, this table must also be in the operating system, and is commonly referred to as the Address Resolution Protocol table, or ARP table. The only challenge in having an ARP table is initializing and maintaining it. This could be done manually, but the task is tedious and error-prone. Instead, the Address Resolution Protocol (ARP) is used to query the network itself for translations that are needed, but not present in the table.

When ARP needs to resolve an address, it broadcasts a request specifying the destination protocol address. All machines receiving this broadcast message are supposed to see if they have been assigned the given destination address, and if so, they unicast a reply directly to the sender. The sender then uses this information to transmit the IP packet that was queued waiting for the ARP reply. The information is also added to the ARP table, and can be used for future packets using the same next hop. The ARP table is generally maintained as a cache, with the least-recently used entry (or a random entry) ejected when the table is full, and with entries timed out after a few minutes, typically 15.
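A sketch of such a cache, with illustrative names, a small fixed capacity, and the roughly 15-minute timeout mentioned above:

```python
import time

class ArpTable:
    MAX_ENTRIES = 128
    TIMEOUT = 15 * 60                    # entries expire after ~15 minutes

    def __init__(self):
        self.entries = {}                # IP address -> (MAC address, time added)

    def add(self, ip, mac):
        if len(self.entries) >= self.MAX_ENTRIES and ip not in self.entries:
            # table full: evict the oldest entry to make room
            oldest = min(self.entries, key=lambda k: self.entries[k][1])
            del self.entries[oldest]
        self.entries[ip] = (mac, time.monotonic())

    def lookup(self, ip):
        entry = self.entries.get(ip)
        if entry is None or time.monotonic() - entry[1] > self.TIMEOUT:
            return None                  # a miss: the caller must send an ARP request
        return entry[0]
```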

If host A needs to send IP data to host B, there is a high likelihood that host B will soon need to send data back to host A -- whether that data is the SYN+ACK of the TCP three-way handshake, or any other statement that host B actually exists and has received the packet. As an optimization, therefore, ARP packets carry not only the destination protocol address (and, for replies, the destination hardware address), but also the source protocol and hardware addresses. If an ARP request is sent by host A with a destination protocol matching the IP address of host B, host B will generally place the protocol and hardware addresses of host A in its own cache.

ARP is defined by RFC 826, which however gives few details. An example ARP header for use over Ethernet follows:

The hardware type 1 indicates this is an Ethernet address resolution. The hardware address length is 6, since Ethernet addresses are 6 bytes long, and the protocol address length is 4, since IPv4 addresses (indicated by a protocol number of 0x0800) are 4 bytes long. The opcode is 1 for a request, and 2 for a reply. Requests are usually broadcast, replies usually directed to a single Ethernet address.
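Based on this field layout, an ARP request for Ethernet/IPv4 can be sketched as a 28-byte packet; the function name is hypothetical:

```python
import struct, socket

def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                               # hardware type: Ethernet
        0x0800,                          # protocol type: IPv4
        6,                               # hardware address length
        4,                               # protocol address length
        1,                               # opcode: 1 = request, 2 = reply
        sender_mac,                      # sender hardware address
        socket.inet_aton(sender_ip),     # sender protocol address
        b"\x00" * 6,                     # target hardware address: unknown
        socket.inet_aton(target_ip),     # target protocol address
    )

pkt = build_arp_request(b"\x00\x01\x03\xde\x31\x51", "10.0.0.1", "10.0.0.2")
print(len(pkt))  # 28
```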

If an IP address is reassigned, e.g. to a backup server, the hardware address will be different, and any existing ARP entries will provide misleading information until the entries time out. This is particularly severe if the incorrect ARP entries are present on the default router. The system administrator can use the arp command (on Unix-like systems) to flush specific table entries, but there are alternatives. For example, if the backup server sends an IP packet outside the local network, that packet will have the default router as its next hop, and the backup server will send an ARP request to the default router. This request will update the entry in the router's table, and communication will be re-established.

4.2 IP over Wireless Networks

ARP as we have discussed it so far works unaltered on a wireless 802.11 LAN using an access point. Broadcasts work in much the same way as on Ethernets (though without collision detection), and ARP thus produces a hardware address that will bring data to a given destination.

The process is more complex on a multi-hop wireless network. To begin with, there are different ways of implementing IP over multi-hop wireless. For example, if the maximum distance between any two nodes in the network is two hops, the wireless network could be subdivided into separate IP "networks", each with its own IP address, and one or more nodes that can communicate with both sides could be designated to be a router and to forward packets between the two sides. This scheme is not used in practice because it requires careful assignment of IP addresses corresponding with node reachability, and in wireless networks node reachability can change dynamically (even if the nodes don't move), or at least may not be known at configuration time.

Instead, and as described in Section 2.3, an entire wireless ad-hoc network is usually seen as a single IP network, and a routing protocol specific to the wireless ad-hoc network provides for packet forwarding. However, this means the wireless ad-hoc network must either route to IP addresses, or support network-wide broadcasts that can be used to implement ARP. To implement such a broadcast, it is necessary that each node rebroadcast each broadcast packet it receives, and that each node do this exactly once (to avoid broadcast storms). As in OSPF, sequence numbers can be used to achieve this.
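The rebroadcast-exactly-once rule can be sketched with a per-node set of (origin, sequence number) pairs; the names are illustrative:

```python
class FloodingNode:
    """Rebroadcast each broadcast exactly once, tracking (origin, seq) pairs."""

    def __init__(self, neighbors):
        self.neighbors = neighbors
        self.seen = set()

    def receive_broadcast(self, origin, seq, from_node):
        if (origin, seq) in self.seen:
            return []                    # already rebroadcast: drop to avoid storms
        self.seen.add((origin, seq))
        return [n for n in self.neighbors if n != from_node]
```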

4.3 IP and ARP over other technologies

On networks where hardware addresses are not used, such as SLIP and PPP, ARP is not necessary. Where ARP is needed, the network must support broadcasting to all other hosts on the network. This is very common, but there are exceptions. One of the protocols described in Chapter 5 is ATM, which requires the use of ATM addresses but does not generally support broadcasts. ATM does support arbitrary multicasts, and can thus be configured as a broadcast network, but this configuration can be nontrivial and may negate some of the advantages of using ATM, so we now consider how IP can be implemented over ATM without using broadcasts. This is part of the Classical IP over ATM, or CLIP, protocol.

The purpose of the broadcast ARP is to locate a node that has the required information, namely the translation from protocol address to hardware address. For ARP used over a broadcast medium, the destination host self-selects as the host that should reply. It is however possible that other hosts would have the required information and could reply. If a centralized host were to do this, we would think of it as an ARP server. The CLIP solution is therefore to configure each system with the ATM address of an ARP server. The server accepts ARP requests for all the hosts in the network, and will respond if it has a translation (there could be multiple ARP servers in a network). The ARP server also caches the translation in the ARP request, so that it can be returned in further ARP requests. To speed up the process, each system in the network can send an (arbitrary) ARP request to the server when it first comes on-line, implicitly requesting that the ARP server cache its information.

4.4 TCP congestion control and data link technologies

TCP congestion control is designed assuming that data loss implies congestion. This assumption is safe in the sense that it may cause congestion control to be invoked unnecessarily, resulting in lowered performance, but it always causes congestion control to be invoked when it is actually needed.

Unfortunately, this loss in performance can be very significant, and might even fail to address the problem. Consider for example a node on a multi-hop wireless network connected to the Internet via a router. The node is sending TCP data to a distant host, so that the round-trip time might be hundreds of milliseconds. On the wireless network, however, interference might last on the order of a few milliseconds. By the time TCP detects the packet loss (at least one round-trip-time later) and reacts by slowing down, the local congestion might no longer be relevant, but the performance will suffer. If this happens frequently, TCP's congestion window will remain small, severely impacting the performance of the connection.

To get around this problem, most broadcast data-link layer technologies, including both Ethernet and 802.11, provide a limited number of retransmissions in case of collision. If a packet suffers a collision, it can be retransmitted quickly. Ethernet has hardware collision detection, whereas 802.11 uses the collision avoidance (CA) method of sending an RTS and replying with a CTS, and finally sending an ACK after the packet has been received successfully. For both 802.11 and Ethernet, retransmission takes place relatively quickly, in many cases avoiding the TCP retransmission. In cases where the repeated retransmissions also fail, the network is suffering from longer-term congestion and the TCP retransmission algorithm takes the appropriate actions.
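The give-up-after-a-few-retries behavior can be sketched as follows; the retry limit is illustrative (Ethernet hardware gives up after 16 attempts, with exponential backoff between them), and `transmit` stands for one attempt on the medium:

```python
MAX_RETRIES = 7                          # illustrative; real hardware gives up eventually

def send_with_retries(transmit, max_retries=MAX_RETRIES):
    """Retry a frame quickly at the data-link layer.

    transmit() models one attempt on the medium and returns True on success
    (no collision detected, or an 802.11 ACK received).
    """
    for _ in range(max_retries):
        if transmit():
            return True                  # delivered: TCP never sees a loss
    return False                         # persistent failure: let higher layers react
```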

The usefulness of data-link retransmission in the presence of TCP retransmission illustrates an important point: that even though TCP provides end-to-end retransmission, which guarantees correctness, hop-by-hop retransmission can still be useful as an optimization to improve performance. Because hop-by-hop retransmission is not required to guarantee correctness, it does not have to solve all possible problems. In particular, it does not have to deal with the case where many successive transmissions all fail. Instead, the data-link layer can simply give up, stating that packet transmission failed, and let the higher layers deal with the problem. For example, if a link fails repeatedly, IP (or the lower-layer routing protocol, e.g. for wireless multi-hop networks) may be able to use an alternate route to the destination.

5. Principles

  1. broadcast-style networks can connect many computers on a single network
  2. Medium Access Control (MAC) describes how different hosts can gain access to a shared medium. The MAC of different broadcast networks is different, but has the same goals.
  3. a broadcast network is fundamentally only a good idea if it is a good match of need and resources. If more bandwidth is needed, a dedicated (non-shared) network may be more cost-effective. Ethernet hubs and switches leave this delicate balance to the purchaser.
  4. on a broadcast network, we need to specify the next hop's address. If this is not derived from the IP address, then we need a mechanism such as ARP.