Computer Networks Project 1


The goals of this project are:
  1. to implement a web server
  2. to learn HTTP
  3. to practice using the sockets interface

This is an individual project. You may discuss concepts and ideas with other students, but you should be the sole author of all your code. All your code must be written in C.

Submission is electronic (see below) and must be on time: late submissions will not be graded and will receive no credit. So please submit whatever you have by the deadline, 11:59pm HST on September 28th.

The assignment is to write a simple HTTP/1.0 web server. The server is unusual in that it does not serve the file specified in the URL. Instead, it accesses the specified file, searches through it for references, randomly selects one of the references, and forwards the contents of that reference to the client.

Forwarding the contents is similar to what a web proxy might do.

Selecting one of the links at random is similar to what the quiz server does.

HTTP version 1.0

HTTP version 1.0 is described in RFC 1945. Do NOT print out the entire RFC on the departmental printer -- there are large sections you do not need. Please read it online and only print out the sections you really need (or print it on a printer you own with paper you bought).

Note: RFC 1945 is a long document. Part of this assignment is to read and understand the document, and identify which parts are relevant to this project and useful to you.

Project Deliverables

I will expect the following from each student:
  1. The source code (source and makefile only; no objects, binaries, or random files) for your implementation. The source code must compile and run on uhunix2. Compilation must be achieved by running ``make'' with no arguments (see make(1S)). You should also include a file named status that describes whether your program works and, if not, what problems you are having.
  2. Create a web page with links to each of your source files, makefile, and status file, and make it readable to the entire world. You do not need to publish the web page.
  3. Send the URL for that web page to the instructor. Your web page must remain available for at least one week, giving the instructor, TAs, and your colleagues plenty of time to download it and review it. If we are unable to download your page automatically on the deadline, you will get no credit for the course. I suggest posting the web page and placing the files in your public_html directory on uhunix, and making sure you can access them through http://www2.hawaii.edu/~yourlogin.
  4. Please follow the naming instructions: name your makefile makefile (all lower case), and your status file status (no extension).
  5. Please turn in your project by the deadline. After the deadline you will be asked to review others' projects, and therefore you can get no credit for any work turned in after the deadline.

Program

I want you to write a forwarding web server (called webselect). The web server loads the specified html file (only files ending in .html or .htm should be loaded in this first step) and parses it.

The path /home/1/esb/public_html/ (or, for your own testing, /home/n/yourlogin -- but remember to change it back before submitting) should be placed before whatever path is specified in the URL itself, so that all html files specified to your server are relative to /home/1/esb/public_html/.
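The path mapping and the .html/.htm restriction can be combined into one small check. The following is a minimal sketch, assuming the URL path has already been extracted from the request; `map_url_to_file` and `has_html_extension` are hypothetical helper names, and the extension test is case-sensitive because the assignment only names lowercase extensions.

```c
#include <stdio.h>
#include <string.h>

/* Document root from the assignment; change it to your own directory while testing. */
#define DOC_ROOT "/home/1/esb/public_html/"

/* Returns 1 if name ends in ".html" or ".htm", else 0. */
static int has_html_extension(const char *name)
{
    size_t len = strlen(name);
    if (len >= 5 && strcmp(name + len - 5, ".html") == 0)
        return 1;
    if (len >= 4 && strcmp(name + len - 4, ".htm") == 0)
        return 1;
    return 0;
}

/* Builds DOC_ROOT + url_path into out (capacity cap).
 * Returns 0 on success, -1 if the path is not an HTML file or does not fit. */
static int map_url_to_file(const char *url_path, char *out, size_t cap)
{
    int n;
    while (*url_path == '/')      /* DOC_ROOT already ends in '/' */
        url_path++;
    if (!has_html_extension(url_path))
        return -1;
    n = snprintf(out, cap, "%s%s", DOC_ROOT, url_path);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return 0;
}
```

For example, a request for /foo/index.html maps to /home/1/esb/public_html/foo/index.html, while a request for /image.gif is rejected at this first step.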

The purpose of parsing is to extract all the HTML references in the page. References are identified by the string "href=", which may have arbitrary capitalization. webselect must then return to the client the contents of one randomly selected reference.
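One way to extract the references is a single pass over the page that matches "href=" regardless of case and then copies the value that follows. This is a sketch under stated assumptions: values may be double-quoted, single-quoted, or unquoted (ending at whitespace or '>'), and `find_hrefs` is a hypothetical name. It both counts the references and retrieves the one you select.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compares the first n bytes of s with the lowercase pattern p, ignoring case in s. */
static int match_ci(const char *s, const char *p, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)s[i]) != p[i])
            return 0;
    return 1;
}

/* Scans html[0..len) for "href=" in any capitalization.  If which > 0, copies the
 * value of reference number `which` (1-based) into out, NUL-terminated and
 * truncated to fit.  Returns the total number of references found. */
static int find_hrefs(const char *html, size_t len, int which, char *out, size_t cap)
{
    size_t i = 0;
    int count = 0;
    while (i + 5 <= len) {
        if (!match_ci(html + i, "href=", 5)) {
            i++;
            continue;
        }
        i += 5;
        char quote = 0;
        if (i < len && (html[i] == '"' || html[i] == '\''))
            quote = html[i++];                /* remember the opening quote */
        size_t start = i;
        while (i < len && (quote ? html[i] != quote
                                 : !isspace((unsigned char)html[i]) && html[i] != '>'))
            i++;
        count++;
        if (count == which && cap > 0) {
            size_t n = i - start;
            if (n >= cap)
                n = cap - 1;
            memcpy(out, html + start, n);
            out[n] = '\0';
        }
    }
    return count;
}
```

Calling it once with which set to 0 gives the count; a second call with the chosen number (random or from the command line) fills in the reference.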

The references themselves may point to pages that are not HTML pages, for example images or plain text. For this project, you must be able to process references to files with the following extensions and MIME types:

The references may be to files that are on the local machine or on other servers. If the files are on other servers, webselect must act as a proxy, retrieving the file from the other server and sending the contents to the client. The header sent by the other server may be forwarded, unchanged, directly to the client.
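To act as a proxy, webselect first has to split a reference such as http://host:port/path into the host to connect to, the port, and the path to request (port 80 if none is given). A minimal sketch, assuming only absolute http:// URLs reach this function; `parse_http_url` is a hypothetical name. After a successful parse, the server would connect to host:port and send an HTTP/1.0 request line of the form "GET path HTTP/1.0" followed by an empty line.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Splits an absolute URL of the form http://host[:port][/path] into its parts.
 * The port defaults to 80 and the path to "/".  Returns 0 on success, -1 otherwise. */
static int parse_http_url(const char *url, char *host, size_t host_cap,
                          int *port, char *path, size_t path_cap)
{
    const char *p, *host_end;
    size_t host_len;

    if (strncasecmp(url, "http://", 7) != 0)   /* scheme is case-insensitive */
        return -1;
    p = url + 7;
    host_end = p + strcspn(p, ":/");           /* host ends at ':' or '/' */
    host_len = (size_t)(host_end - p);
    if (host_len == 0 || host_len >= host_cap)
        return -1;
    memcpy(host, p, host_len);
    host[host_len] = '\0';

    *port = 80;
    p = host_end;
    if (*p == ':') {
        char *end;
        long v = strtol(p + 1, &end, 10);
        if (end == p + 1 || v <= 0 || v > 65535)
            return -1;
        *port = (int)v;
        p = end;
    }
    if (*p == '\0')
        p = "/";                               /* no path: request the root */
    else if (*p != '/')
        return -1;
    if (strlen(p) >= path_cap)
        return -1;
    strcpy(path, p);
    return 0;
}
```

With the example URL from the Testing hint, http://www.ics.hawaii.edu:99/path yields host www.ics.hawaii.edu, port 99, and path /path.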

If the references are to files that are on the local machine, webselect must act as a server, returning a header to the client, reading the file, and sending it back. The file reference might be absolute, if it begins with a "/" character, or relative to the directory in which the original html file was found.
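Resolving a local reference means either using it as-is (if it starts with "/") or replacing the last path component of the original page's URL with it. The sketch below assumes the result is then mapped under the same document root as the original page, and it does not handle "." or ".." segments; `resolve_local_ref` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Resolves ref against base_url, the URL path of the HTML file it was found in.
 * A ref starting with '/' is absolute; otherwise it replaces everything after
 * the last '/' of base_url.  Returns 0 on success, -1 if out is too small. */
static int resolve_local_ref(const char *base_url, const char *ref, char *out, size_t cap)
{
    int n;
    if (ref[0] == '/') {
        n = snprintf(out, cap, "%s", ref);
    } else {
        const char *slash = strrchr(base_url, '/');
        int dir_len = slash ? (int)(slash - base_url + 1) : 0;  /* keep the directory */
        n = snprintf(out, cap, "%.*s%s", dir_len, base_url, ref);
    }
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

For a page at /course/proj/index.html, the reference img/a.gif resolves to /course/proj/img/a.gif, while /top.txt stays /top.txt.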

In summary, webselect should return the contents of one of the references, randomly chosen.
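When webselect serves a local file itself, the header it returns consists of a status line, a few header fields, and an empty line, each line ending in CRLF as RFC 1945 specifies. A minimal sketch, assuming the caller already knows the file size and has looked up the MIME type from the extension list above ("text/plain" in the usage below is only an example); `format_ok_header` is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Formats a minimal HTTP/1.0 success header into buf (capacity cap): the status
 * line, Content-Type, Content-Length, and the empty line that ends the header.
 * Returns the header length in bytes, or -1 if it does not fit. */
static int format_ok_header(char *buf, size_t cap, const char *mime_type, long content_length)
{
    int n = snprintf(buf, cap,
                     "HTTP/1.0 200 OK\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %ld\r\n"
                     "\r\n",
                     mime_type, content_length);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

For example, format_ok_header(buf, sizeof buf, "text/plain", 42) produces a header announcing 42 bytes of plain text; the file contents follow immediately after it.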

Calling Sequence: webselect must take one or two arguments. The first argument, the port number, is required, and webselect should terminate if it is not present. The second argument is optional. If it is present, it is an integer n giving the number of the selection to return: the first time a client connects, webselect must return the nth reference (n can take any value from 1, 2, 3, ...). If n is out of range (i.e. there is no nth reference), webselect must terminate the first time a client connects to it with a valid file. After the first connection, webselect must return to using the random selection.

Synopsis of the calling sequence:

usage: webselect port [selection]

webselect will terminate if the arguments are not integers,
if the port is already in use, or if the selection (if specified)
fails to identify a valid reference in the URL specified in the first
access to the server.  The first reference is number 1.
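The "arguments are not integers" condition is easiest to check with strtol(3C) and its end pointer, rejecting anything with trailing characters. A sketch under these assumptions: the port must be in 1..65535, a missing selection is recorded as 0, and `parse_args` and `parse_int_arg` are hypothetical names. On a -1 return, main would print the usage line and exit.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parses a decimal integer; succeeds only if the whole string is a number
 * in [min, max].  Returns 0 and stores it in *value, or returns -1. */
static int parse_int_arg(const char *s, long min, long max, long *value)
{
    char *end;
    long v;
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < min || v > max)
        return -1;
    *value = v;
    return 0;
}

/* Handles "webselect port [selection]".  Stores the port, and the selection
 * (or 0 if none was given).  Returns 0 on success, -1 on a usage error. */
static int parse_args(int argc, char **argv, long *port, long *selection)
{
    *selection = 0;
    if (argc < 2 || argc > 3)
        return -1;
    if (parse_int_arg(argv[1], 1, 65535, port) != 0)
        return -1;
    if (argc == 3 && parse_int_arg(argv[2], 1, LONG_MAX, selection) != 0)
        return -1;   /* the first reference is number 1 */
    return 0;
}
```

The remaining termination conditions (port already in use, selection out of range) can only be detected later, when bind fails or when the first page has been parsed.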

Hints

  1. Solaris Warning. On Solaris, you should be aware that the accept system call will return -1 if it is interrupted by a signal. You must then compare the global variable errno (see man errno) to EINTR to see whether it failed because of the interrupt, and if so, repeat the call. Other Unixes may not have this property, but if you write code that works on Solaris, it should work well on other Unixes as well.
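    The retry described above fits in a short loop. A minimal sketch, assuming the listening socket was set up elsewhere; `accept_retry` is a hypothetical helper name. Any error other than EINTR is passed back to the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Calls accept(2), repeating the call while it fails with EINTR (interrupted
 * by a signal).  Returns the new socket, or -1 with errno set on other errors. */
static int accept_retry(int listen_fd, struct sockaddr *addr, socklen_t *addr_len)
{
    int fd;
    do {
        fd = accept(listen_fd, addr, addr_len);
    } while (fd == -1 && errno == EINTR);
    return fd;
}
```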
  2. Random selection. Please use random(3C) to make your random selection. Do not use rand(3C). I suggest you do NOT call srandom(3C), initstate(3C), or setstate(3C), so that your program will be repeatable and easier for you to test.
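    Turning random() into a reference number in 1..count takes one modulo. A minimal sketch; `pick_reference` is a hypothetical name. The modulo slightly favors low numbers when count does not divide 2^31, which is negligible for pages with a handful of links.

```c
#include <stdlib.h>

/* Returns a reference number in 1..count chosen with random(3C),
 * or 0 if the page has no references.  No seed is set, so runs repeat. */
static int pick_reference(int count)
{
    if (count <= 0)
        return 0;
    return (int)(random() % count) + 1;
}
```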
  3. Telnet Program. The telnet program on Unix accepts an optional second argument: a port number. If you telnet to uhunix2 (or whatever machine you are running your server on) and specify the port number your server is listening on, you will be able to enter text and see the responses. Telnet on Unix generally terminates lines with the single character \n, so in order to test in this manner, your program will need to be flexible about how lines are terminated.

    If you want to use telnet to connect to a server on the same machine, you can simply run telnet localhost portnumber.

    Port numbers below 1024 are generally reserved for system servers (run by the root user on Unix), so use higher numbers.

  4. Testing

    After testing using telnet, you should test using a regular web browser (or two, or 10). Most machines have a specific web browser, for example Netscape or IE, and uhunix2 also has the lynx text-only browser. Be sure your server works both with Netscape or IE and with lynx. You can specify port numbers in URLs by putting a colon after the machine name followed by the port:

     http://www.ics.hawaii.edu:99/path 
    does an HTTP request on port 99 of www.ics.hawaii.edu, requesting /path.

    You should also test using the TA's test program, which should be available at least a week before the deadline. Plan for this debugging phase to take at least a week.

    When testing a network program, there is always the question of knowing exactly what your program is sending, and exactly what it is receiving from the peer. I strongly suggest that you add to your program a compile-time option (disable it before turning in the program) that allows you to see, and maybe save in a file, the entire exchange between your server and the client.
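    One way to build such an option is a tracing function whose body is compiled in only when a macro is defined. A sketch under stated assumptions: the macro name WEBSELECT_TRACE and the function `trace_bytes` are hypothetical, and output goes to stderr so it can be saved with a shell redirect such as 2> trace.log. You would call it after every read and before every write, and build the tracing version with -DWEBSELECT_TRACE added to CFLAGS in your makefile.

```c
#include <stddef.h>
#include <stdio.h>

#ifdef WEBSELECT_TRACE
/* Debug build: prints a labeled copy of every buffer sent or received. */
static void trace_bytes(const char *direction, const char *buf, size_t len)
{
    fprintf(stderr, "---- %s %lu bytes ----\n", direction, (unsigned long)len);
    fwrite(buf, 1, len, stderr);   /* raw bytes, not assumed NUL-terminated */
    fputc('\n', stderr);
}
#else
/* Normal build (the one you turn in): tracing compiles to nothing. */
static void trace_bytes(const char *direction, const char *buf, size_t len)
{
    (void)direction;
    (void)buf;
    (void)len;
}
#endif
```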

  5. Strings.

    Data read from the network is not in C string format (even though HTTP uses ASCII encoding); specifically, it does not include the terminating null ('\0') character.

    When sending a C string, you can use strlen to decide how many bytes to write (because you are sending ASCII -- if you were sending binary, you would be unable to use strlen). When reading from the network, specify the buffer length and use the return value from read to determine how many bytes were read.
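    The same caution applies when writing: write(2) on a socket may transfer fewer bytes than requested, or be interrupted by a signal. A minimal sketch of a loop that keeps writing until everything is sent; `write_all` and `write_string` are hypothetical names. Binary data (such as an image being forwarded) should go through write_all with the byte count from read, never through strlen.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Writes all len bytes of buf to fd, looping over short writes and
 * retrying after EINTR.  Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sends a NUL-terminated ASCII string; strlen is safe only for text. */
static int write_string(int fd, const char *s)
{
    return write_all(fd, s, strlen(s));
}
```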

    Remember that your program may or may not get all the data in a single read. The test program will specifically test to make sure your program handles the case where not all the data is received at once. You may have to do multiple reads to get the entire request header, and you may have to do multiple reads to get the entire contents when you are acting as a proxy. Be sure you have received the entire request header before processing the request (however, when acting as a proxy you may start to forward data as soon as you receive it -- just make sure you don't stop until you are done).
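    A request header is complete once an empty line has arrived, so the server can keep appending reads to one buffer and rescan it after each read. A minimal sketch, assuming a fixed-size buffer is acceptable and treating CRLF, LF, or CR alone as a line terminator; `read_request_header` and `header_complete` are hypothetical names. Rescanning the whole buffer after every read is quadratic, but request headers are short.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Returns 1 if buf[0..len) contains an empty line after at least one
 * non-empty line, accepting CRLF, LF, or CR as line terminators. */
static int header_complete(const char *buf, size_t len)
{
    size_t i = 0, line_len = 0;
    int saw_line = 0;
    while (i < len) {
        if (buf[i] == '\r' || buf[i] == '\n') {
            if (line_len == 0 && saw_line)
                return 1;                      /* empty line: header ends */
            if (line_len > 0)
                saw_line = 1;
            line_len = 0;
            if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
                i++;                           /* CRLF counts as one terminator */
        } else {
            line_len++;
        }
        i++;
    }
    return 0;
}

/* Reads from fd into buf until the whole header has arrived, however many
 * reads that takes.  Returns the byte count, or -1 on error, on EOF before
 * the header ends, or if the header does not fit in cap bytes. */
static ssize_t read_request_header(int fd, char *buf, size_t cap)
{
    size_t used = 0;
    while (!header_complete(buf, used)) {
        ssize_t n;
        if (used == cap)
            return -1;
        n = read(fd, buf + used, cap - used);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        used += (size_t)n;
    }
    return (ssize_t)used;
}
```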

  6. Line Termination.

    HTTP uses CRLF (\r\n) as a line terminator, but not all browsers and servers implement that, and your program should work correctly whether lines are terminated with CR, LF, or CRLF. Remember that you should be generous in what you accept, and strict in what you send.

    Part of this project is to parse data received from the network. Parsing in C is often done with "lex" and "yacc". Both of these are available on uhunix2. I suggest that you study them to decide whether they might be the best way of implementing your parser. There is no requirement that you use lex or yacc -- you may do your own parsing if you prefer.
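    If you do your own parsing, the simplest way to stay flexible about line termination is to stop at the first CR or LF, whichever comes first. A hand-written sketch (not lex or yacc) that extracts the method and URL from the request line; `parse_request_line` is a hypothetical name, and the HTTP version is ignored because the server does not need it.

```c
#include <stddef.h>
#include <string.h>

/* Parses the first line of buf[0..len), e.g. "GET /a.html HTTP/1.0", ending
 * at the first CR or LF so any terminator is accepted.  Copies the method
 * and URL into the caller's buffers.  Returns 0 on success, -1 otherwise. */
static int parse_request_line(const char *buf, size_t len,
                              char *method, size_t method_cap,
                              char *url, size_t url_cap)
{
    size_t end = 0, i = 0, start;

    while (end < len && buf[end] != '\r' && buf[end] != '\n')
        end++;                                /* end of the request line */

    start = i;
    while (i < end && buf[i] != ' ')
        i++;
    if (i == start || i - start >= method_cap)
        return -1;
    memcpy(method, buf + start, i - start);
    method[i - start] = '\0';

    while (i < end && buf[i] == ' ')
        i++;                                  /* tolerate repeated spaces */
    start = i;
    while (i < end && buf[i] != ' ')
        i++;
    if (i == start || i - start >= url_cap)
        return -1;
    memcpy(url, buf + start, i - start);
    url[i - start] = '\0';
    return 0;
}
```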

  7. Server.

    The server I gave in homework 1 in a past course may give you ideas for webselect (I also encourage you to study that homework for more ideas about client-server programming). Feel free to use both that server and the code in the textbook as a basis for this project. Do not use any other code, unless you have written it yourself.

  8. Debugging.

    If you haven't done so already, you may want to learn to use the gdb debugger, or any other debugger available on your system. If you use gdb, you may want to check the GDB Manual.

  9. String Searching.

    You are welcome to use this case-independent string matching function.
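    As an illustration only, and not the function offered above, here is a case-independent search that works on a length-delimited buffer, which is what read gives you, so neither argument needs a terminating null character. `ci_memsearch` is a hypothetical name; it compares ASCII letters without regard to case.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns a pointer to the first occurrence of needle (needle_len bytes)
 * in hay (hay_len bytes), ignoring case, or NULL if there is none. */
static const char *ci_memsearch(const char *hay, size_t hay_len,
                                const char *needle, size_t needle_len)
{
    size_t i, j;
    if (needle_len == 0)
        return hay;
    if (needle_len > hay_len)
        return NULL;
    for (i = 0; i + needle_len <= hay_len; i++) {
        for (j = 0; j < needle_len; j++)
            if (tolower((unsigned char)hay[i + j]) != tolower((unsigned char)needle[j]))
                break;
        if (j == needle_len)
            return hay + i;                   /* every byte matched */
    }
    return NULL;
}
```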