The main purpose of computer networking is to allow distributed programs to communicate.
This is a sweeping definition, covering local- and wide-area networks (LANs and WANs) and modems, the world-wide web and Internet telephony, wireless and fiber optics. It excludes older communications technologies such as traditional telephony and the broadcast media, which were not originally designed for computer-to-computer communication.
What separates computer communication from the other technologies is the need to exchange bytes and collections of bytes among programs. This imposes a number of requirements:
The applications themselves may use different protocols. These application-specific protocols belong in the application layer. Application layer protocols depend on the lower-layer protocol to transfer the data, and add services such as security, file sharing, directory maintenance, email, web access, network debugging, and much more. These protocols use the transport, network, and data link layer protocols to actually transfer the data.
Two or more programs that are communicating are called peers. The word peer implies equality, and in most cases communication between peers is bidirectional and in some sense equivalent.
One common model of computer communication is highly asymmetric: one peer, the client, sends requests (encoded as bytes) to another peer, the server, which sends replies. In the client-server model, requests and replies are matched, with one reply for every request. The client chooses the server, sends the request, and waits for the reply. The server has an infinite loop which accepts requests, does whatever work is necessary, then sends the appropriate reply back to the client. Each server can serve many clients, and each client can access many servers.
One example of the client-server model is the world-wide web. The client, usually a web browser or a web robot, sends a request to a web server using a protocol called HTTP. This protocol allows the client to specify which file from the remote system is desired. The server receives the request, checks that the desired file is available, and if so sends back the contents, again using HTTP. If the file is not available, HTTP provides mechanisms for the server to communicate this to the client.
Other client-server applications include:
The APIs do support peer-to-peer transfers, letting the programmer implement any necessary client or server interactions. Although these APIs are available for a number of languages, what follows describes the APIs for the C language. Due to its efficiency and flexibility, and due to the large volume of programs already written in C, C is still the most commonly used language for implementing network applications, although Java, C++, and other languages keep gaining in popularity.
int socket(int domain, int type, int protocol);
int bind(int s, struct sockaddr *my_addr, socklen_t addrlen);
int listen(int s, int backlog);
int accept(int s, struct sockaddr *addr, socklen_t *addrlen);
int connect(int s, struct sockaddr *server_address, socklen_t addrlen);
int shutdown(int s, int how);
int close(int s);
int send(int s, const void *msg, int len, int flags);
int sendto(int s, const void *msg, int len, unsigned int flags,
const struct sockaddr *to, socklen_t tolen);
int write(int s, const void *buf, int count);
On Windows, write cannot be used to send data on sockets.
int recv(int s, void *buf, int len, int flags);
int recvfrom(int s, void *buf, int len, unsigned int flags,
struct sockaddr *from, socklen_t *fromlen);
int read(int s, void *buf, int count);
On Windows, read cannot be used to receive data on sockets.
int gethostname(char *name, int len);
struct hostent *gethostbyname(const char *name);
struct protoent *getprotobyname(const char *name);
int WSAStartup(int version, WSADATA *implementation);
int WSACleanup();
Another reference is Comer and Stevens' "Internetworking with TCP/IP -- Volume III -- Client-Server Programming and Applications". The copy I have seen was published in 1997 by Prentice Hall, ISBN 0-13-848714-6, and proclaims on the cover "Winsock Version of Client-Server Programming". However, at least one student in this course has told me this book covers the material but not in sufficient detail -- he had the Linux/Posix edition from 2000.
The following is a list of books that some of my ICS 451 students suggested, in no particular order:
If you look at the functions bind, listen, accept, connect, shutdown, close, send, sendto, write, recv, recvfrom, and read you will see that all take as first argument an integer file descriptor such as returned by the call to socket. If C were object oriented, it is possible that this design would have been somewhat different, with an object of class socket having methods bind, listen, accept, and so on. Perhaps socket would be a subclass of another class, "file descriptor", which would provide the operations close, write, etc. In such a hypothetical object-oriented design, we might have an object s of type socket and call:
s->send (...)
Instead, in C, we call
send (s, ...)
The two notations are formally equivalent, and in fact most compilers internally convert the first form to the second. This notation can generally be used to express any object-oriented concept in languages that are not object oriented.
For examples of how these functions can be used in actual practice, refer to Homework 1, which uses the TCP protocol. The alternative is to use the UDP protocol. Programs that use UDP are similar to those using TCP but we specify "udp" instead of "tcp" (of course), and SOCK_DGRAM instead of SOCK_STREAM.
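As a rough illustration of that difference, the sketch below creates either kind of socket from a protocol name. The helper open_socket is hypothetical (it is not part of the sockets API or of Homework 1), and it assumes IPv4 (AF_INET). If getprotobyname cannot find the name, it falls back to protocol 0, which asks the system for the default protocol of the given socket type.

```c
#include <netdb.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper, not part of the sockets API: create an IPv4 socket
   for the named protocol, "tcp" or "udp".  Returns the descriptor, or -1. */
int open_socket (const char *protocol_name)
{
  int type;
  struct protoent *pe;

  if (strcmp (protocol_name, "tcp") == 0) {
    type = SOCK_STREAM;   /* stream-oriented, reliable */
  } else if (strcmp (protocol_name, "udp") == 0) {
    type = SOCK_DGRAM;    /* datagram-oriented, unreliable */
  } else {
    return -1;            /* this sketch only knows tcp and udp */
  }
  /* look up the protocol number; if the lookup fails, 0 lets the system
     choose the default protocol for this socket type */
  pe = getprotobyname (protocol_name);
  return socket (AF_INET, type, (pe != NULL) ? pe->p_proto : 0);
}
```

A caller would then use bind or connect on the returned descriptor exactly as for any other socket, and close it when done.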
There are many differences between the TCP and UDP protocols, but for the application programmers two of the differences are essential: TCP is stream-oriented and reliable, UDP is packet (datagram) oriented and unreliable. We describe these terms in the remainder of this section.
A packet-oriented protocol is one in which each send or write operation corresponds to exactly one read or receive (recv) operation. If the buffer given to the receive operation is too small for the packet (or datagram), the packet is truncated to fit, and the bytes that do not fit are discarded.
In contrast, a stream-oriented protocol is one in which the bytes are treated as being in sequence, with no boundaries introduced by the send operation. For example, a sequence of send operations may be combined and all the data received in a single recv operation. Conversely, the data sent using a single send operation may be received by multiple read operations. The system is free to combine or partition the stream at will, as long as the bytes are delivered in the correct order, and the application has no control over how many bytes are received each time.
With a reliable protocol, either all the data sent is correctly received, or the sender is informed that at least some of the data was not delivered. With an unreliable protocol, some of the data may be lost. With unreliable datagram protocols, datagrams might even be delivered out of order.
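The datagram behavior described above (one receive per send, with truncation) can be observed without a network. The following sketch is only an illustration: it uses a local AF_UNIX datagram socket pair as a stand-in for UDP, and the function name datagram_boundary_demo and the 4-byte buffer are our own choices.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send two datagrams (10 bytes, then 2 bytes) over a local datagram socket
   pair and receive each one into a 4-byte buffer.  The number of bytes
   returned by each receive is stored in sizes, and the first byte of each
   received datagram in first_bytes.  Returns 0 on success, -1 on error. */
int datagram_boundary_demo (ssize_t sizes [2], char first_bytes [2])
{
  int sv [2];
  int i, result = 0;
  char small [4];   /* deliberately smaller than the first datagram */

  if (socketpair (AF_UNIX, SOCK_DGRAM, 0, sv) < 0) return -1;
  if (send (sv [0], "0123456789", 10, 0) != 10 ||
      send (sv [0], "ab", 2, 0) != 2) {
    result = -1;
  }
  for (i = 0; i < 2 && result == 0; i++) {
    /* each recv consumes exactly one datagram, however small the buffer */
    sizes [i] = recv (sv [1], small, sizeof (small), 0);
    if (sizes [i] <= 0) {
      result = -1;
    } else {
      first_bytes [i] = small [0];
    }
  }
  close (sv [0]);
  close (sv [1]);
  return result;
}
```

The first receive returns only 4 of the 10 bytes and the remaining 6 are discarded. The second receive returns the 2 bytes of the second datagram, not the leftover bytes of the first.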
For most applications we use TCP. The reliability is exactly what is needed for most systems. The stream-oriented nature generally means we have to check whether we've received all the data we expected, and loop back and read again if we haven't -- this is a potential pitfall that programmers need to be aware of. The program may test fine on selected inputs, and fail unexpectedly and surprisingly on a larger range of inputs. For all but the most trivial applications, always have recv inside a loop, and continue looping until all the expected data has been received.
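A minimal sketch of such a loop follows. The helper name recv_all is hypothetical, and the sketch assumes the caller already knows how many bytes to expect (for example, from a fixed-size header).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: keep calling recv until exactly `wanted` bytes have
   been read from the stream socket s.  Returns `wanted` on success, a
   smaller count if the peer closed the connection early, or -1 on error. */
ssize_t recv_all (int s, void *buf, size_t wanted)
{
  size_t got = 0;

  while (got < wanted) {
    ssize_t n = recv (s, (char *) buf + got, wanted - got, 0);
    if (n < 0) {
      if (errno == EINTR) continue;   /* interrupted by a signal, retry */
      return -1;
    }
    if (n == 0) break;                /* peer closed the connection */
    got += (size_t) n;                /* may be fewer bytes than requested */
  }
  return (ssize_t) got;
}
```

A caller compares the return value with the requested size. Anything smaller means the connection ended before all the expected data arrived.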
UDP is only used by specific, advanced applications for which the stream model or the reliability of TCP are inappropriate. For example, in a real-time situation such as Internet Telephony it is better to lose the occasional packet than for TCP to slow down while lost data is being retransmitted.
Networking protocols are usually implemented as part of the operating system. The sockets API is essentially used to transfer control to the operating system to perform the required functions, for example, send data. Within this general model are two possible implementations.
One way to implement the API is to provide the functions of the API directly as system calls. A system call executes a special trap instruction which transfers control to the operating system (referred to as the OS or the kernel). The OS examines the parameters of the trap and decides which internal function to execute, then uses another special instruction to return to the caller. Functions which are internal to the OS but can be called by application-level programs are therefore system calls. System calls are documented in Chapter 2 of the Unix manual, so the man page for bind can be read by typing man 2 bind. Linux implements the sockets API as system calls.
Another possible implementation is as library functions. A library is simply a collection of related application-level code that can be called by application programs (in Java, a library is called an "archive"). The library functions need some way to send and receive the data, perhaps via other system calls, but as long as the OS provides such mechanisms, the OS itself need not implement the specific sockets API (it still does need to implement the underlying protocols -- contact your instructor if this is confusing to you). Unlike system calls, library calls are not automatically available to programs but have to be explicitly linked in. The linking is the final stage of the compilation process in which the executable file is created. On systems where the sockets API is available as a library, the final compilation step must include -lsocket to get the socket functions, and -lnsl to get the name server functions. C library calls are documented in Chapter 3C of the Unix manual, so the man page for bind can be read by typing man -s 3c bind. Solaris/SunOS implements the sockets API as library functions.
Whichever way the sockets API is implemented, the user programs themselves are the same, and the only noticeable difference is in the compilation -- which libraries, if any, must be linked in to get the program to work.
Almost every non-trivial implementation of a networking system is multithreaded. To see why this is fundamental, consider that the networking code must be able to react to packets coming from two different sources: one is the network (data that the system receives), and the other is the user code (data that this system wishes to send). One relatively simple way of reacting to data arriving from two different sources (i.e. the network and the application) is to have one thread handle data from one source, and the other thread handle data from the other source. Typically, the part of the program that handles data from the user program is considered the top half (also called upper half), and the part of the code that handles data from the network (or other devices) is the bottom half (lower half). The top half runs when a user program does a system call, and performs any operations needed to get the data to its destination. The bottom half runs whenever the device signals that data is available. The two halves may have to communicate. For example, the receive system call (i.e. the part of the top half that takes data received from the network and gives it to the application) has to check whether any data has already been received from the network, and if not, must block until the data arrives. When the data arrives, the bottom half must somehow make the data available to the top half and unblock the top half so the receive system call can complete.
While two threads is generally the minimum, some networking systems have many more than two threads.
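The hand-off between the two halves can be sketched with a mutex and a condition variable. This is only one possible arrangement, not a description of any particular operating system: the names bottom_half_deliver and top_half_receive, and the fixed queue size, are assumptions made for the illustration.

```c
#include <pthread.h>

#define QUEUE_SIZE 64   /* arbitrary capacity chosen for this sketch */

static char queue [QUEUE_SIZE];
static int head = 0;    /* index of the oldest byte in the queue */
static int count = 0;   /* number of bytes currently queued */
static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t data_available = PTHREAD_COND_INITIALIZER;

/* bottom half: called when the device delivers a byte.
   Returns 0 on success, -1 if the queue is full and the byte is dropped. */
int bottom_half_deliver (char c)
{
  int result = -1;
  pthread_mutex_lock (&queue_mutex);
  if (count < QUEUE_SIZE) {
    queue [(head + count) % QUEUE_SIZE] = c;
    count++;
    result = 0;
    /* unblock a top half that may be waiting in top_half_receive */
    pthread_cond_signal (&data_available);
  }
  pthread_mutex_unlock (&queue_mutex);
  return result;
}

/* top half: called on behalf of the application.
   Blocks until at least one byte has arrived, then returns it. */
char top_half_receive (void)
{
  char c;
  pthread_mutex_lock (&queue_mutex);
  while (count == 0) {
    /* releases the mutex while waiting, reacquires it before returning */
    pthread_cond_wait (&data_available, &queue_mutex);
  }
  c = queue [head];
  head = (head + 1) % QUEUE_SIZE;
  count--;
  pthread_mutex_unlock (&queue_mutex);
  return c;
}
```

The while loop around pthread_cond_wait matters: a waiting thread can wake up without data being present, so the condition must be rechecked each time.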
A crucial task of the networking system is controlling the network device (or devices). Network devices can be quite complex, and often perform many operations automatically once they have been set up to do so. In most computers, a device is able to interrupt the main processor when it is ready for the next task. Once the processor is interrupted, it begins to execute the bottom half for that device. Code that interacts with a hardware device is called the device driver. The specific portion of the device driver that is responsible for responding to an interrupt is called the interrupt handler.
In these notes, we build a network from the ground up. To do so, we start with simple and understandable hardware: the computer serial line. Most computers have one or more serial ports. The USB interface found on many of the newer computers is simply a more advanced version of a serial port. A serial port can be connected via a serial line to another computer or to a hardware device. The serial port and serial line can communicate one data bit at a time in each direction, hence the name. The hardware is designed to take sequences of 8 bits and give them to the computer as a byte. Most serial ports can be configured to communicate at different speeds. In general, longer wires will only work with lower speeds, and shorter wires can support higher speeds. For two computers to communicate across the serial line, they have to configure their serial ports to run at the same speed.
The following program, tty.c, can be used to read and write data across a serial line. This is a user-level Unix program. There are some system calls (especially write and read) that request that the operating system transfer data to or from the serial line. The system calls also set the line speed to 9600 baud and request that the line be set into raw mode, so the operating system should not do any special processing for characters that are meaningful on terminals. An example of processing that we do not want to see happening is erasing received characters when the backspace character or the delete character is received. Since we will be sending binary data, and backspace and delete both have binary encodings, if we allowed the operating system to do terminal processing we would lose bytes whenever we happened to transmit those particular bit sequences.
/* tty.c: write to and read from a serial port (ttyS0) */
#include <stdio.h>
#include <sys/fcntl.h>
#include <termios.h>
#include <unistd.h>
#define MESSAGE "hello world\r\n\0"
#define BUFSIZE 1000
int main (void)
{
char buf [BUFSIZE];
int i, j, fd = open("/dev/ttyS0", O_RDWR);
struct termios tio;
printf ("fd = %d\n", fd);
if (fd < 0) { perror ("open"); return 1; } /* open returns -1 on failure */
i = tcgetattr (fd, &tio);
if (i < 0) { perror ("tcgetattr"); return 0; }
if (cfgetispeed (&tio) != B9600) {
printf ("ispeed was %d, != 9600 baud (%d)\n", (int) cfgetispeed (&tio), (int) B9600);
cfsetispeed (&tio, B9600);
}
if (cfgetospeed (&tio) != B9600) {
printf ("ospeed was %d, != 9600 baud (%d)\n", (int) cfgetospeed (&tio), (int) B9600);
cfsetospeed (&tio, B9600);
}
cfmakeraw (&tio);
i = tcsetattr (fd, TCSANOW, &tio);
if (i < 0) { perror ("tcsetattr"); return 0; }
i = write (fd, MESSAGE, sizeof(MESSAGE));
if (i < 0) { perror ("write"); } else { printf ("write = %d\n", i); }
i = read (fd, buf, BUFSIZE - 1); /* leave room for the terminating null */
if (i < 0) { perror ("read"); i = 0; } else { printf ("read = %d\n", i); }
buf [i] = '\0';
printf ("%s", buf);
close (fd);
}
tty.c only has a single thread, and only writes and reads once. We really need a somewhat more complex program:
The following program, ttynet.c, shows an implementation of these requirements.
/* ttynet.c: provide serial-line send and receive */
/* link with -lpthread */
/* released under the GPL */
#include <stdio.h>
#include <stdlib.h> /* exit, malloc, free */
#include <sys/fcntl.h>
#include <termios.h>
#include <unistd.h>
#include <pthread.h>
/* exported functions, could be in a .h file */
int install_tty_data_handler (int tty, void (*) (int, char));
int write_tty_data (int tty, char data);
/* this should definitely be in a .h file */
#define MAX_TTYS 100
/* keep a mapping from tty numbers to unix file descriptor numbers */
static int tty_fds [MAX_TTYS] = {0, };
/* any static function is NOT exported */
static int initialize_tty (int tty_number)
{
/* assume no TTY number has more than 100 digits */
char tty_name [sizeof("/dev/ttyS0") + 100];
int i, fd;
struct termios tio;
if (tty_number < 0 || tty_number >= MAX_TTYS) { fprintf (stderr, "bad tty number %d\n", tty_number); exit(1); }
if (tty_fds [tty_number] != 0) { perror ("tty already open"); exit(1); }
sprintf (tty_name, "/dev/ttyS%d", tty_number);
fd = open(tty_name, O_RDWR);
printf ("fd = %d\n", fd);
if (fd <= 0) { perror ("open"); exit(1); }
tty_fds [tty_number] = fd;
i = tcgetattr (fd, &tio);
if (i < 0) { perror ("tcgetattr"); exit(1); }
if (cfgetispeed (&tio) != B9600) {
printf ("ispeed was %d, != 9600 baud (%d)\n", (int) cfgetispeed (&tio), (int) B9600);
cfsetispeed (&tio, B9600);
}
if (cfgetospeed (&tio) != B9600) {
printf ("ospeed was %d, != 9600 baud (%d)\n", (int) cfgetospeed (&tio), (int) B9600);
cfsetospeed (&tio, B9600);
}
cfmakeraw (&tio);
i = tcsetattr (fd, TCSANOW, &tio);
if (i < 0) { perror ("tcsetattr"); exit (1); }
return tty_number;
}
struct receive_thread_arg {
void (* data_handler) (int, char);
int tty;
};
static void * tty_receive_thread (void * argument)
{
/* cast the argument back to a pointer to the receive_thread_arg */
struct receive_thread_arg * rta = (struct receive_thread_arg *) argument;
void (* data_handler) (int, char) = rta->data_handler;
int tty = rta->tty;
printf ("tty_receive_thread is starting\n");
/* we have read the argument, it won't be used ever again, so free it */
free (argument);
/* set the argument to NULL to guarantee it won't ever be used again */
argument = NULL;
/* loop forever, and whenever data is received, call the data handler */
/* when no data is available, the loop blocks on read. */
while (1) {
char buffer [1];
int i = read (tty_fds [tty], buffer, 1);
if (i == -1) {
perror ("read");
exit (1);
}
if (i == 1) {
/* deliver the data with an upcall */
data_handler (tty, buffer [0]);
} else {
printf ("ttynet error: got value %d from 'read', expected 1\n", i);
}
}
/* we never return, but if we ever did, we'd want to return a void * */
return NULL;
}
/* returns the identifier (an integer >= 0) to be used for write_tty_data */
int install_tty_data_handler (int tty, void (* data_handler) (int, char))
{
pthread_t thread;
int actual_tty = initialize_tty (tty);
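struct receive_thread_arg * arg =
(struct receive_thread_arg *) malloc (sizeof (struct receive_thread_arg));
if (arg == NULL) { perror ("malloc"); exit (1); }
arg->tty = actual_tty;
arg->data_handler = data_handler;
/* pthread_create returns 0 on success and an error number (not -1) on failure */
if (pthread_create (&thread, NULL, &tty_receive_thread, (void *) arg) != 0) {
printf ("pthread_create failed\n");
exit (1);
}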
return actual_tty;
}
int write_tty_data (int tty, char data)
{
char buffer [1];
buffer [0] = data;
return write (tty_fds [tty], buffer, 1);
}
#ifdef RUN_THIS_TEST
/* this is a sample program to exercise the above code */
/* my data handler simply prints any received data to the screen */
static void my_test_data_handler (int tty, char c)
{
putchar (c);
}
int main (void)
{
int tty = install_tty_data_handler (0, my_test_data_handler);
char data_to_send [] = "this is my test data\n123\n";
int i;
for (i = 0; i < sizeof (data_to_send); i++) {
write_tty_data (tty, data_to_send [i]);
}
/* wait and see if we receive anything */
printf ("sleeping 100 seconds\n");
sleep (100);
}
#endif /* RUN_THIS_TEST */
A few things to note in this code:
This scheme has one fatal flaw. On a serial line, characters are occasionally lost. Assuming that one character is lost, our receive routine would keep reading until the first byte of the next packet is received. This byte is the first byte of the length, but the receiver does not know that. So the receiver returns that packet to the application, and reads the next two bytes, expecting to find the length of the next packet. Unfortunately, these two bytes are, respectively: the second byte of the length, and the first byte of the data. Putting them together as a 16-bit integer produces nonsense, and the receiver and sender get out of synch. The two may accidentally resynchronize later, but the chances are small, and meanwhile, communication is lost.
The alternative is to use a special character to mark the beginning of a frame. Since all possible characters may be present in the data, we also need some escape mechanism to help us distinguish the special character marking the beginning of the frame from any and all occurrences of this special character in the data. Once such a scheme is available we can send packets on serial lines.
One very simple scheme for framing packets is described by the SLIP protocol, documented by RFC 1055, available at http://www.ietf.org/rfc/rfc1055.txt (all RFCs, that is, all Internet protocol definitions, are available from this site). SLIP defines not one, but two special characters, END and ESC. To quote the RFC,
The SLIP protocol defines two special characters: END and ESC. END is octal 300 (decimal 192) and ESC is octal 333 (decimal 219) not to be confused with the ASCII ESCape character; for the purposes of this discussion, ESC will indicate the SLIP ESC character. To send a packet, a SLIP host simply starts sending the data in the packet. If a data byte is the same code as END character, a two byte sequence of ESC and octal 334 (decimal 220) is sent instead. If it the same as an ESC character, an two byte sequence of ESC and octal 335 (decimal 221) is sent instead. When the last byte in the packet has been sent, an END character is then transmitted.
Phil Karn suggests a simple change to the algorithm, which is to begin as well as end packets with an END character.
This strategy is called byte stuffing -- replacing one byte in the data by multiple transmitted bytes.
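To make the encoding concrete, here is a small buffer-to-buffer sketch of the sending side, including Phil Karn's leading END. The function slip_encode and its conservative buffer-size check are our own illustration, not code from the RFC.

```c
#include <stddef.h>

#define END     0300   /* marks the end (and, here, the start) of a frame */
#define ESC     0333   /* introduces a two-byte escape sequence */
#define ESC_END 0334   /* ESC ESC_END stands for a data byte equal to END */
#define ESC_ESC 0335   /* ESC ESC_ESC stands for a data byte equal to ESC */

/* Hypothetical helper: write the SLIP frame for the len bytes in data into
   out.  Returns the number of bytes written, or 0 if out_size is smaller
   than the worst case of 2 * len + 2 bytes. */
size_t slip_encode (const unsigned char *data, size_t len,
                    unsigned char *out, size_t out_size)
{
  size_t i, n = 0;

  /* worst case: every data byte is escaped, plus the two END bytes */
  if (out_size < 2 * len + 2) return 0;
  out [n++] = END;
  for (i = 0; i < len; i++) {
    if (data [i] == END) {
      out [n++] = ESC;
      out [n++] = ESC_END;
    } else if (data [i] == ESC) {
      out [n++] = ESC;
      out [n++] = ESC_ESC;
    } else {
      out [n++] = data [i];
    }
  }
  out [n++] = END;
  return n;
}
```

For example, the three data bytes 'A', END, ESC (hex 41 C0 DB) are framed as the seven bytes C0 41 DB DC DB DD C0.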
Note that the END character, octal 300, is hex C0, the ESC character, octal 333, is hex DB, and the two encoding characters, octal 334 and 335, are hex DC and DD. In general, you should be comfortable converting between hex and other formats. To do this manually we need to write down the binary representation of a number, which can then conveniently be converted to any other format.
The RFC itself is worth reading -- it is only six pages long, including a complete implementation of the byte stuffing algorithm which replaces a single ESC or END character with the appropriate two-character sequence.
The byte stuffing in SLIP is a special case of a more general principle. In general, we want a symbol to mark boundaries within a stream of symbols. If we have a symbol that we can transmit that is not a valid data item, then we can use that symbol to mark the boundary. That is what C does in using the null character (which is not a valid character in a C string) to mark the end of a string. In networking, however, we use binary encodings and we want to be able to transmit arbitrary binary data. As an extreme example, we could actually transmit 9 bits for each byte of data sent, and use the 9th bit to mark the beginning or end of a frame. Likewise we could transmit 8-bit bytes, but only send 7 bits of actual data in each transmitted byte, reserving the 8th bit for this signaling. These two strategies have a relatively large overhead of 1/9 (11%) or approximately 1/8 (12.5% -- the overhead may be larger if we are only sending 8 bits of data, since we would then have to transmit two bytes), and such an overhead is usually undesirable.
With byte stuffing, we only add overhead in relatively rare cases: at the beginning or end of the frame, and when the escape or end symbols are transmitted. If the data has a uniform random distribution, then we will only stuff one byte 2 times out of every 256 bytes sent, for an average overhead of approximately 1%, though in the worst case, the data will consist entirely of escape and end bytes, and the overhead will be 100%. Additional overhead is introduced for very small packets -- again, a packet of size 1 has 100% overhead if its data does not have to be escaped, and 200% overhead if it does. While these are large numbers, it would be unusual to transmit very many escape or end bytes, and if we are only transmitting small packets, we don't usually send too many of them. TCP, for example, consolidates many small transmissions into a single large segment, so that small packets are only sent when absolutely necessary.
Bit stuffing is similar to byte stuffing, but on the bit level. In one example, at least one fifth of the bits in any given transmission are required to be "0" to guarantee receiver synchronization (in this scheme, a "0" is encoded by a transition in the signal, and a "1" is encoded as no change in the signal). That means if we are sending lots of zeros, the receiver can synchronize on the data. However, if we are sending long sequences of "1" bits, the receiver may lose synchronization and have a significant chance of either inserting or deleting a bit.
In this case, the sender monitors the data as it is sent, and after any sequence of 4 data bits with a value of "1", it automatically inserts ("stuffs") a "0" bit. This means the receiver can rely on the fact that a sequence "11110" represents only four (not five) actual bits of data. To give concrete examples, the four bits "1110" are sent unchanged as "1110", the five bits "11110" are sent as "111100", and the five bits "11111" are sent as "111101".
In this kind of bit stuffing, the worst-case overhead is 25%, but in most cases, the overhead is about 1/16 of 25%, or 1/64 -- 1.6%.
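The stuffing rule described above can be sketched as follows. To keep the example readable, bits are represented as the characters '0' and '1' in a string, which is an assumption of this illustration (real hardware works on the signal itself). The function name stuff_bits is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: copy the bit string `bits` (characters '0' and '1')
   to `out`, inserting a '0' after every run of four consecutive '1' data
   bits.  `out` must have room for the input length, plus one quarter of
   that length, plus the terminating null.  Returns the output length. */
size_t stuff_bits (const char *bits, char *out)
{
  size_t n = 0;
  int ones = 0;   /* number of consecutive '1' data bits just sent */

  for (; *bits != '\0'; bits++) {
    out [n++] = *bits;
    if (*bits == '1') {
      if (++ones == 4) {
        out [n++] = '0';   /* the stuffed bit */
        ones = 0;
      }
    } else {
      ones = 0;
    }
  }
  out [n] = '\0';
  return n;
}
```

With this sketch, "1110" is copied unchanged, "11110" becomes "111100", and "11111" becomes "111101", matching the examples in the text.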
/* slipnet.c: provide serial-line send and receive of packets */
/* link with ttynet and pthreads */
/* released under the GPL */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#define MAX_SLIP_SIZE 1006
#define MAX_SLIP_SEND 1006
#define END 0300 /* indicates end of packet */
#define ESC 0333 /* indicates byte stuffing */
#define ESC_END 0334 /* ESC ESC_END means END data byte */
#define ESC_ESC 0335 /* ESC ESC_ESC means ESC data byte */
#define MAX_TTYS 100
/* exported functions, could be in a .h file */
int install_slip_data_handler (int, void (*) (int, char *, int));
int write_slip_data (int, char *, int);
/* buffers for the data */
static char receive_buffer [MAX_TTYS] [MAX_SLIP_SIZE];
/* this is the position to which we add newly received characters */
static int receive_position [MAX_TTYS];
/* record whether the last character for this buffer was an escape character */
static int escaped [MAX_TTYS];
/* true if an error was detected in the current frame */
static int error_frame [MAX_TTYS];
/* serialize all access to the buffers */
static pthread_mutex_t receive_mutex [MAX_TTYS];
static pthread_mutex_t send_mutex [MAX_TTYS];
/* serialize access to the global data */
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
/* the data handlers are also global. */
typedef void (* my_data_handler) (int, char *, int);
static my_data_handler slip_data_handler [MAX_TTYS];
static void print_packet (char * string, char * data, int numbytes)
{
int i;
printf ("%s:\n", string);
for (i = 0; i < numbytes; i++) {
/* must mask the byte with 0xff, since otherwise bytes greater
than 0x80 will be converted to negative integers */
printf ("%02x", (data [i]) & 0xff);
if ((i == (numbytes - 1)) || (i % 20 == 19)) {
printf ("\n");
} else {
printf (".");
}
}
}
static void put_char_in_buffer (int tty, unsigned char c)
{
if (receive_position [tty] < MAX_SLIP_SIZE) { /* room for a full-size packet */
receive_buffer [tty] [(receive_position [tty])++] = c;
} else {
printf ("error: slip framing error on port %d, maybe lost END\n", tty);
/* discard the character -- basically, we don't save it anywhere. */
/* also make sure the current frame is discarded */
error_frame [tty] = 1;
}
}
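/* functions provided by ttynet.c */
int install_tty_data_handler (int tty, void (*) (int, char));
int write_tty_data (int tty, char data);
/* the parameter types match the handler type that ttynet expects */
static void data_handler_for_tty (int tty, char ch)
{
/* compare as unsigned, since END and ESC are greater than 127 */
unsigned char c = (unsigned char) ch;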
#ifdef DEBUG
printf (" received character %x/%o on port %d\n", c, c, tty);
#endif /* DEBUG */
/* make sure we have been initialized */
pthread_mutex_lock (&global_mutex);
/* we have been initialized, so proceed */
pthread_mutex_unlock (&global_mutex);
/* acquire the lock for the receive buffer */
pthread_mutex_lock (&(receive_mutex [tty]));
if (error_frame [tty]) {
if (c == END) {
error_frame [tty] = 0;
receive_position [tty] = 0;
escaped [tty] = 0;
}
} else {
if (escaped [tty]) { /* last character was an escape */
escaped [tty] = 0;
if (c == ESC_END) {
put_char_in_buffer (tty, END);
} else if (c == ESC_ESC) {
put_char_in_buffer (tty, ESC);
} else { /* this may be a legitimate oversight in the sender */
printf ("warning: accepting illegal character after ESC\n");
put_char_in_buffer (tty, c);
}
} else { /* last character was not ESC */
if (c == END) { /* done, give packet to data handler. */
if (receive_position [tty] > 0) { /* packet is not empty */
if (slip_data_handler [tty] == NULL) {
printf ("error: received packet, but no slip data handler\n");
print_packet ("received packet", receive_buffer [tty],
receive_position [tty]);
receive_position [tty] = 0;
} else {
#ifdef DEBUG
printf ("received %d bytes\n", receive_position [tty]);
print_packet ("received packet", receive_buffer [tty],
receive_position [tty]);
#endif /* DEBUG */
/* note the receive buffer remains locked while we call the
slip data handler. If the slip data handler never returns,
slip will deadlock, i.e., be unable to ever again receive data.
This would also block the receive thread in ttynet. */
slip_data_handler [tty] (tty, receive_buffer [tty],
receive_position [tty]);
} /* if packet is empty, silently ignore */
/* get ready to start receiving a new packet */
receive_position [tty] = 0;
} /* else: silently ignore packets of size 0 */
} else if (c == ESC) { /* signal for the next character */
escaped [tty] = 1;
} else { /* 'normal' character */
put_char_in_buffer (tty, c);
}
}
}
/* finally make the buffer available to other threads. */
pthread_mutex_unlock (&(receive_mutex [tty]));
}
/* returns the identifier (an integer >= 0) to be used for write_slip_data */
int install_slip_data_handler (int tty,
void (* data_handler) (int, char *, int))
{
int fd;
/* keep thread from executing until we are done initializing */
pthread_mutex_lock (&(global_mutex));
fd = install_tty_data_handler (tty, data_handler_for_tty);
if (fd < 0) {
pthread_mutex_unlock (&global_mutex);
return fd;
}
receive_position [fd] = 0;
escaped [fd] = 0;
error_frame [fd] = 0;
/* initialize each mutex in place; POSIX does not define copying a mutex */
pthread_mutex_init (&(receive_mutex [fd]), NULL);
pthread_mutex_init (&(send_mutex [fd]), NULL);
slip_data_handler [fd] = data_handler;
pthread_mutex_unlock (&global_mutex);
return fd;
}
/* this is a macro so the return statement returns from write_slip_data */
#define WRITE_BYTE(fd, c) \
if (write_tty_data (fd, c) != 1) { \
pthread_mutex_unlock (&(send_mutex [fd])); \
printf ("slip: error writing tty data\n"); \
return -1; \
}
int write_slip_data (int fd, char * data, int numbytes)
{
int byte;
if ((numbytes <= 0) || (numbytes > MAX_SLIP_SEND)) {
printf ("slip: bad size %d\n", numbytes);
return -1;
}
#ifdef DEBUG
printf ("acquiring send lock for tty %d\n", fd);
#endif /* DEBUG */
pthread_mutex_lock (&(send_mutex [fd]));
#ifdef DEBUG
print_packet ("sending packet", data, numbytes);
#endif /* DEBUG */
WRITE_BYTE (fd, END);
for (byte = 0; byte < numbytes; byte++) {
unsigned char c = (data [byte]) & 0xff;
if (c == END) {
WRITE_BYTE (fd, ESC);
WRITE_BYTE (fd, ESC_END);
} else if (c == ESC) {
WRITE_BYTE (fd, ESC);
WRITE_BYTE (fd, ESC_ESC);
} else { /* normal byte */
WRITE_BYTE (fd, c);
}
}
WRITE_BYTE (fd, END);
pthread_mutex_unlock (&(send_mutex [fd]));
return numbytes;
}
#ifdef RUN_SLIP_TEST
/* this is a sample program to exercise the above code */
/* my data handler simply prints any received data to the screen */
static void my_test_data_handler (int tty, char * data, int numbytes)
{
printf ("tty %d, ", tty);
print_packet ("slip received packet", data, numbytes);
}
int main (void)
{
int slip0 = install_slip_data_handler (0, my_test_data_handler);
int slip1 = install_slip_data_handler (1, my_test_data_handler);
int slip2 = install_slip_data_handler (2, my_test_data_handler);
char data1 [] = "123\300\333\334\335xxx\300\334\335 321";
char data2 [] = "\300";
char data3 [] = "\333\334\335";
write_slip_data (slip0, data1, sizeof (data1) - 1);
if (slip1 >= 0) write_slip_data (slip1, data1, sizeof (data1) - 1);
if (slip2 >= 0) write_slip_data (slip2, data1, sizeof (data1) - 1);
sleep (10);
write_slip_data (slip0, data2, sizeof (data2) - 1);
if (slip1 >= 0) write_slip_data (slip1, data2, sizeof (data2) - 1);
if (slip2 >= 0) write_slip_data (slip2, data2, sizeof (data2) - 1);
sleep (10);
write_slip_data (slip0, data3, sizeof (data3) - 1);
if (slip1 >= 0) write_slip_data (slip1, data3, sizeof (data3) - 1);
if (slip2 >= 0) write_slip_data (slip2, data3, sizeof (data3) - 1);
/* wait and see if we receive anything */
printf ("sleeping 100 seconds\n");
sleep (100);
}
#endif /* RUN_SLIP_TEST */
A few things of note:
Threads and buffer management are part of what makes "real" networking and operating system code harder to specify and code than "toy" implementations.
Identifications don't, strictly speaking, need to identify hosts. The Internet Protocol (IP), for example, uses addresses to identify network interfaces. In our example network, multiple serial ports on a single machine would each be connected to a different system, and each would have its own distinct IP address. Even though in everyday usage we talk of "a host's IP address", this is only correct when that host only has a single network interface. A host with multiple interfaces is called a multi-homed host -- it has a "home" on multiple separate networks.
IP has two different versions. The older version is IPv4, the newer version is IPv6. Addresses in both versions have the following properties:
We contrast IP addresses with domain names in the domain name system, DNS. (Domain names are often called DNS names or DNS addresses). A domain name is a human-readable variable-length string identifying a host, for example, mail.ics.hawaii.edu or www.cs.cmu.edu. Properties of domain names include:
"Identifying a host" is a loose term. More accurately, there is a mapping from domain names to IP addresses. Not all possible domain names have such a mapping, nor do all assigned domain names have such a mapping (for example, www.hawaii.edu might have such a mapping, but hawaii.edu might not). Several domain names may map to a single IP address. A single domain name may map to multiple IP addresses. Some IP addresses may not correspond to any domain names at all.
This mapping is maintained by a distributed database, the domain name system. Each individual or organization that has ownership of a domain name is responsible for maintaining the portion of the database corresponding to that domain name and all domain names that are below it in the hierarchy. This means Information Technology Services (ITS) at the University of Hawaii is responsible for maintaining the portion of the database for all names ending in hawaii.edu. They may delegate some of this responsibility to others. For example, the responsibility for maintaining the portion of the database for all names ending in ics.hawaii.edu has been delegated to the system and network administrators of the Information and Computer Sciences department.
In DNS, each contiguous part of the hierarchy which is under the control of one individual or organization is called a zone.
The main purpose of this distributed database is to translate ("resolve") domain names to IP addresses and vice versa. The details are available in RFC 1034 and RFC 1035. Documentation that is both more accessible and more thorough is at http://www.dns.net/dnsrd/.
Domain names themselves consist of labels separated by periods. A label consists of one or more letters, digits (not at the beginning of a label), and hyphens (neither at the beginning nor at the end of a label). The maximum length of a single label is 63 characters, and the maximum length of a domain name is 255 characters.
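The label rules above can be checked mechanically. The following is a minimal sketch, not part of any standard library: the function name is_valid_domain_name is an assumption made for illustration, and it applies the rules exactly as stated here (so it rejects a leading digit and does not accept a trailing period).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_LABEL_LENGTH 63
#define MAX_NAME_LENGTH 255

/* hypothetical helper: returns 1 if name obeys the label rules
   described in the text, 0 otherwise */
int is_valid_domain_name (const char * name)
{
  size_t total, start, i;
  assert (name != NULL);
  total = strlen (name);
  if ((total == 0) || (total > MAX_NAME_LENGTH))
    return 0;
  start = 0;                      /* index of the first character of the current label */
  for (i = 0; i <= total; i++) {
    if ((i == total) || (name [i] == '.')) {   /* end of one label */
      size_t len = i - start;
      if ((len == 0) || (len > MAX_LABEL_LENGTH))
        return 0;
      /* a label may not begin with a digit or a hyphen */
      if (! isalpha ((unsigned char) name [start]))
        return 0;
      /* a label may not end with a hyphen */
      if (name [i - 1] == '-')
        return 0;
      start = i + 1;
    } else if ((! isalnum ((unsigned char) name [i])) && (name [i] != '-')) {
      return 0;                   /* only letters, digits, and hyphens */
    }
  }
  return 1;
}
```

With these rules, mail.ics.hawaii.edu is accepted, while names with an empty label (a..b) or a label ending in a hyphen are rejected.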
The process of domain name resolution requires a resolver (someone who wants to resolve a domain name) to query a domain name server. This DNS server may or may not be the same as the authoritative server for the domain that the resolver is in. The resolver must be configured with the DNS server's IP address (not its domain name, since that would cause a bootstrapping problem). To quote RFC 1035,
The resolver starts with knowledge of at least one name server. When the resolver processes a user query it asks a known name server for the information; in return, the resolver either receives the desired information or a referral to another name server. Using these referrals, resolvers learn the identities and contents of other name servers. Resolvers are responsible for dealing with the distribution of the domain space and dealing with the effects of name server failure by consulting redundant databases in other servers.
A domain name server may have the translation, either because the server is authoritative for the zone that the DNS request is seeking, or because the server has cached the result of a previous request. Because the data changes infrequently -- much less frequently, and more predictably, than web pages, for example -- caching can be very effective. Translations received from a server carry an indication of the length of time they may be cached.
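To illustrate how the cache lifetime might be honored, here is a small sketch of a fixed-size cache keyed by domain name. Everything in it (the names cache_store and cache_lookup, the table size, storing one IPv4 address per name as a 32-bit integer) is an assumption made for this example rather than how any particular server works. The current time is passed in explicitly so the behavior is easy to follow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define CACHE_SIZE 16
#define MAX_NAME 255

struct cache_entry {
  char name [MAX_NAME + 1];
  uint32_t address;      /* one IPv4 address, as a 32-bit number */
  time_t expires;        /* the entry is stale at or after this time */
  int used;
};

static struct cache_entry cache [CACHE_SIZE];

/* remember a translation for ttl seconds, starting at time now */
void cache_store (const char * name, uint32_t address, uint32_t ttl, time_t now)
{
  int i, slot = -1;
  assert ((name != NULL) && (strlen (name) <= MAX_NAME));
  for (i = 0; i < CACHE_SIZE; i++) {
    if (cache [i].used && (strcmp (cache [i].name, name) == 0)) {
      slot = i;          /* refresh an existing entry for this name */
      break;
    }
    if ((slot < 0) && ((! cache [i].used) || (cache [i].expires <= now)))
      slot = i;          /* remember the first free or expired slot */
  }
  if (slot < 0)
    slot = 0;            /* cache full: overwrite an arbitrary entry */
  strcpy (cache [slot].name, name);
  cache [slot].address = address;
  cache [slot].expires = now + (time_t) ttl;
  cache [slot].used = 1;
}

/* returns 1 and fills in *address if an unexpired entry exists, else 0.
   simplification: names are compared case-sensitively here */
int cache_lookup (const char * name, time_t now, uint32_t * address)
{
  int i;
  assert ((name != NULL) && (address != NULL));
  for (i = 0; i < CACHE_SIZE; i++) {
    if (cache [i].used && (cache [i].expires > now) &&
        (strcmp (cache [i].name, name) == 0)) {
      *address = cache [i].address;
      return 1;
    }
  }
  return 0;
}
```

An entry stored at time 1000 with a lifetime of 300 seconds is returned by a lookup at time 1100 but not by one at time 1300, at which point the server would have to ask again.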
If a domain name server does not have the translation, it must be configured with the IP address of a server that is closer (in the hierarchy) to the specified domain name. This means that each domain name server for a zone Z must be configured with the IP addresses of all the servers for all the zones below Z, and the IP address of at least one server for the zone above Z. If such a server receives a query for a domain name for which it is not authoritative and which it has not cached, the selection of the next server to query is automatic, depending on whether the name is reached by traversing the domain name tree downwards towards the leaves or upwards towards the root.
The Domain Name System database is designed to distribute resource records (RRs). The most common RR type is "A", which provides the IP address listed for a given domain name. Another RR type is "CNAME", which provides the canonical domain name listed for a given "alias" domain name.
The header of a DNS query or response always has a fixed format. It starts with a 16-bit ID, followed by one bit to distinguish queries from responses, four bits to specify the type of query, a bit to specify whether this response is authoritative, a bit to record that the data had to be truncated, a bit each to specify whether recursion is desired or available, a 3-bit reserved field, and a 4-bit response code, followed by four 16-bit integers. These integers record the number of entries in the question section, the number of RRs in the answers section, the number of RRs in the name server section, and the number of RRs in the additional records section. Again quoting from RFC 1035,
  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      ID                       |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    QDCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ANCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    NSCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ARCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
This header is followed by the specified number of questions, answers, name server records, and additional records.
A question is always encoded with the domain name, followed by the question type (e.g. "A" for an address translation), followed by the question class (usually Internet).
A domain name is encoded by the concatenation of its labels. Each label is encoded as a one-byte length field followed by that number of characters. The last label must have length 0. For a specific example, the domain name "abab.bbb" would be encoded as follows, where "a" is ASCII 61 (hex) and "b" is ASCII 62 (hex):
0x04 0x61 0x62 0x61 0x62 0x03 0x62 0x62 0x62 0x00
The length of the first label, 4 characters, is encoded first, followed by the four characters of the label. The length of the next label follows, then the characters of that label. Finally, a label of length zero marks the end of the domain name. Note that no periods (".") are used!
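The encoding just described is short enough to write out. This is a sketch only: encode_domain_name is a name chosen for the example, and it takes the dotted form of a name and produces the length-prefixed form without any compression.

```c
#include <assert.h>
#include <string.h>

/* hypothetical helper: encodes a dotted name such as "abab.bbb" into out.
   Returns the number of bytes written, or -1 if a label is empty or too
   long, the name is too long, or out is too small. */
int encode_domain_name (const char * name, unsigned char * out, size_t out_size)
{
  size_t pos = 0;
  assert ((name != NULL) && (out != NULL));
  while (*name != '\0') {
    const char * dot = strchr (name, '.');
    size_t len = (dot != NULL) ? (size_t) (dot - name) : strlen (name);
    /* need room for the length byte, the label, and the final zero byte */
    if ((len == 0) || (len > 63) || (pos + 1 + len >= out_size))
      return -1;
    out [pos++] = (unsigned char) len;   /* one-byte length field */
    memcpy (out + pos, name, len);       /* the characters of the label */
    pos += len;
    name += len;
    if (*name == '.')
      name++;                            /* periods are not encoded */
  }
  if ((pos >= out_size) || (pos + 1 > 255))
    return -1;
  out [pos++] = 0;                       /* zero-length label ends the name */
  return (int) pos;
}
```

Called with "abab.bbb", this writes the ten bytes shown above and returns 10.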
A label length must be 63 or less, meaning the first two bits of the length byte must be zero. To compress resource records, a sender may specify, instead of an 8-bit label length followed by characters, a 16-bit number beginning with two "1" bits. The remaining 14 bits specify an offset, in bytes from the beginning of the DNS packet header, to the location of the encoded domain name that logically belongs here. For example, if the domain name "hawaii.edu" is encoded starting at position 29 (hex 1D) from the beginning of the DNS packet, the name "www.hawaii.edu" in a subsequent (or earlier!) resource record can be specified as:
0x03 0x77 0x77 0x77 0xC0 0x1D
Where the ASCII for "w" is hex 77.
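A receiver has to follow these pointers when it decodes a name. The sketch below shows one way to do so. The function name decode_domain_name and the limit of 16 pointer jumps (a guard against pointers that form a loop) are assumptions of this example, not requirements from the RFC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_JUMPS 16   /* assumed limit, to stop on pointer loops */

/* hypothetical helper: decodes the name starting at packet [offset] into
   dotted form in out. Returns the number of bytes the name occupies at
   offset (so the caller can skip over it), or -1 on malformed input. */
int decode_domain_name (const unsigned char * packet, size_t packet_len,
                        size_t offset, char * out, size_t out_size)
{
  size_t pos = offset, out_pos = 0;
  int consumed = -1;   /* fixed once the first pointer is followed */
  int jumps = 0;
  assert ((packet != NULL) && (out != NULL));
  for (;;) {
    unsigned int len;
    if (pos >= packet_len)
      return -1;
    len = packet [pos];
    if ((len & 0xC0) == 0xC0) {          /* two "1" bits: a pointer */
      if (pos + 1 >= packet_len)
        return -1;
      if (consumed < 0)
        consumed = (int) (pos + 2 - offset);
      pos = ((len & 0x3F) << 8) | packet [pos + 1];   /* 14-bit offset */
      if (++jumps > MAX_JUMPS)
        return -1;
      continue;
    }
    if (len > 63)                        /* other prefixes are not handled */
      return -1;
    pos++;
    if (len == 0)                        /* zero-length label: end of name */
      break;
    if ((pos + len > packet_len) || (out_pos + len + 2 > out_size))
      return -1;
    if (out_pos > 0)
      out [out_pos++] = '.';             /* periods are restored on output */
    memcpy (out + out_pos, packet + pos, len);
    out_pos += len;
    pos += len;
  }
  if (out_pos >= out_size)
    return -1;
  out [out_pos] = '\0';
  if (consumed < 0)
    consumed = (int) (pos - offset);
  return consumed;
}
```

In the example above, decoding the six bytes 0x03 0x77 0x77 0x77 0xC0 0x1D yields "www.hawaii.edu" and reports that the name occupies 6 bytes at that position, even though the decoded string is longer.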
All the resource records that are not questions have the following format (from RFC 1035):
  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
/                                               /
/                      NAME                     /
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      TYPE                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     CLASS                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      TTL                      |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   RDLENGTH                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|
/                     RDATA                     /
/                                               /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
The name, type, and class match the corresponding question. The TTL, time to live, is a 32-bit number of seconds that this answer may be cached. RDLENGTH is the length in bytes of the RDATA field, which contains the desired answer.
The format of the RDATA field depends on the type and class of the response. For "A" queries in the Internet class, the RDATA field is 4 bytes wide and contains the IPv4 address of the translation (if a translation fails, that is indicated in the response code). For "CNAME" queries, the RDATA field contains the canonical DNS name.
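Once the NAME has been skipped, the rest of a resource record is ten fixed bytes followed by RDATA. The following sketch reads those fields and extracts the address from an "A" record. The struct and function names are assumptions for this example. The numeric codes (type 1 for A, type 5 for CNAME, class 1 for Internet) are the values assigned in RFC 1035.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DNS_TYPE_A 1
#define DNS_TYPE_CNAME 5
#define DNS_CLASS_IN 1

/* hypothetical in-memory form of the fields that follow NAME */
struct dns_rr_fixed {
  uint16_t type;
  uint16_t rr_class;
  uint32_t ttl;                  /* seconds this answer may be cached */
  uint16_t rdlength;             /* length of rdata in bytes */
  const unsigned char * rdata;   /* points into the packet, not copied */
};

/* p points just past the NAME; remaining is the number of bytes left in
   the packet from p. Returns the number of bytes used, or -1 if short. */
int parse_rr_fixed (const unsigned char * p, size_t remaining,
                    struct dns_rr_fixed * rr)
{
  assert ((p != NULL) && (rr != NULL));
  if (remaining < 10)
    return -1;
  rr->type = (uint16_t) ((p [0] << 8) | p [1]);
  rr->rr_class = (uint16_t) ((p [2] << 8) | p [3]);
  rr->ttl = ((uint32_t) p [4] << 24) | ((uint32_t) p [5] << 16) |
            ((uint32_t) p [6] << 8) | (uint32_t) p [7];
  rr->rdlength = (uint16_t) ((p [8] << 8) | p [9]);
  if (remaining - 10 < rr->rdlength)
    return -1;
  rr->rdata = p + 10;
  return 10 + (int) rr->rdlength;
}

/* for an Internet-class "A" record, stores the IPv4 address as a 32-bit
   number (first byte most significant) and returns 0; otherwise -1 */
int a_rdata_to_address (const struct dns_rr_fixed * rr, uint32_t * address)
{
  assert ((rr != NULL) && (address != NULL));
  if ((rr->type != DNS_TYPE_A) || (rr->rr_class != DNS_CLASS_IN) ||
      (rr->rdlength != 4))
    return -1;
  *address = ((uint32_t) rr->rdata [0] << 24) | ((uint32_t) rr->rdata [1] << 16) |
             ((uint32_t) rr->rdata [2] << 8) | (uint32_t) rr->rdata [3];
  return 0;
}
```

A record whose fixed part is 00 01 00 01 00 00 0E 10 00 04 followed by C0 00 02 01 is an Internet-class "A" record that may be cached for 3600 seconds and carries the address 192.0.2.1.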
DNS queries and replies can be carried over either TCP or UDP, in either case using port 53 on the server. It is more common for queries and responses to be carried over UDP, and for zone transfers to use TCP. The format is the same in both cases, except that