Operating Systems Project 1


The goals of this project are:
  1. to learn about shells
  2. to implement a simple shell
  3. to practice using C, fork, and exec

This is an individual or group project, at your choice, as described in the course administration page. You may use any standard library functions -- check with the instructor if you think you might want to use non-standard library functions other than strlcpy and strlcat, which are mentioned below.

The project is due Thursday, January 25th 2007, any time. Submission is by email. Please tar (and, if you wish, gzip) your source files and mail them to the instructor as an attachment. Your files must include a makefile such that typing "make" builds the executable, which should be named shell. Your code should also include a brief README file reporting whether your code works or not, and anything else (e.g. any unusual library functions) that I need to know. If you are unclear about one or more of these requirements, please contact the instructor or ask your fellow students on the mailing list.

Please send in your project on time -- late submissions will not be accepted, and I prefer to have partially-working projects rather than no project at all.

Deliverables

Your program must implement the shell described in Section 1.3.3 of the textbook -- that is, it must be able to execute commands, redirect the standard input or standard output of commands to files, pipe the output of commands to other commands, and put commands in the background. The shell must also print a prompt when it is ready to accept commands.

To simplify things, your shell should look for commands in exactly two directories, /bin and /usr/bin. Also, in case of error your shell may fail, or may print an error message and quit, without having to recover gracefully. Your shell also does not need to catch any signals, nor modify the terminal settings, both of which a real shell must do. Finally, it is very reasonable to have limits on the number and size of commands, arguments, input lines, etc., that your shell will handle. Typical limitations might require any single token (command, argument, file name) to be 100 characters or less, the entire command line to be 1,000 characters or less, and other limitations accordingly.
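As a sketch of the two-directory lookup, the helper below tries /bin first and then /usr/bin, and uses access(2) with X_OK to check that the candidate file exists and is executable. The name find_command and the MAX_TOKEN limit are illustrative assumptions, not part of the assignment.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_TOKEN 100   /* assumed per-token limit, as suggested above */

/* Hypothetical helper: look for `name` in /bin, then in /usr/bin.
 * On success, writes the full path into `path` and returns 0;
 * returns -1 if the name is empty or too long, or nothing is found. */
int find_command(const char *name, char *path, size_t size)
{
    static const char *const dirs[] = { "/bin", "/usr/bin" };

    if (name[0] == '\0' || strlen(name) > MAX_TOKEN)
        return -1;
    for (size_t i = 0; i < sizeof dirs / sizeof dirs[0]; i++) {
        int n = snprintf(path, size, "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= size)
            return -1;                  /* path buffer too small */
        if (access(path, X_OK) == 0)    /* exists and is executable */
            return 0;
    }
    return -1;
}
```

For example, find_command("ls", path, sizeof path) would typically leave "/bin/ls" in path.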

Your shell should use the fork(2) system call and the execv(2) system call (or one of its variants) to execute commands. It should also use waitpid(2) (or wait(2), if you prefer) to wait for a program to complete execution (unless the program is in the background).
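A minimal sketch of the fork/execv/waitpid sequence, assuming the full path of the program has already been found (run_program is a hypothetical name). Having the child exit with status 127 when execv fails mirrors a common shell convention; it is an assumption, not a requirement of this project.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch: run the program at `path` with the argument list `argv`
 * (argv[0] is the command name, and the list ends with NULL).
 * Foreground: wait, then return the exit status (or -1 on error).
 * Background: return 0 immediately, without waiting. */
int run_program(const char *path, char *const argv[], int background)
{
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image; execv returns only on error. */
        execv(path, argv);
        perror(path);
        _exit(127);   /* _exit avoids flushing the parent's stdio buffers */
    }
    if (background)
        return 0;     /* parent does not wait for background commands */

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Usage: with char *args[] = { "ls", "-l", NULL };, the call run_program("/bin/ls", args, 0) runs ls -l in the foreground.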

The shell should recognize the command exit to mean the shell program itself should terminate.
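One way to combine the prompt and the exit command is a helper that prints a prompt, reads one line, and reports whether the shell should keep going. The prompt string "shell> ", the name next_command, the MAX_LINE limit, and treating end of input (e.g. Ctrl-D) like exit are all assumptions for illustration. The fflush matters because the prompt has no trailing newline and might otherwise stay in the stdout buffer.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 1000   /* assumed limit on the command-line length */

/* Hypothetical helper: print a prompt and read one line from `in` into
 * `line` (size bytes, e.g. MAX_LINE + 2 to leave room for "\n\0").
 * Returns 1 if a command line (possibly empty) was read, or 0 on end of
 * input or when the line is the "exit" command. */
int next_command(FILE *in, char *line, size_t size)
{
    fputs("shell> ", stdout);
    fflush(stdout);                     /* show the prompt right away */
    if (fgets(line, (int)size, in) == NULL)
        return 0;                       /* end of input: treat as exit */
    line[strcspn(line, "\n")] = '\0';   /* drop the trailing newline */
    if (strcmp(line, "exit") == 0)
        return 0;
    return 1;
}
```

The main loop could then be while (next_command(stdin, line, sizeof line)) { /* parse and run line */ }.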

Your shell must work under Linux or on uhunix -- please indicate which you tested on. If you would like an account on esb-course (a Linux machine), please inform the instructor. If you had an account on this machine last semester, you still have it.

Implementation

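A very simple shell such as this needs at least the following components: a command-line parser, code to create processes with fork and run programs with execv, and code to redirect input and output to files and pipes.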

The command-line parser for this shell should be very simple, and you may benefit from using the strpbrk(3) library function to implement it.
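strpbrk(3) returns a pointer to the first character of a string that matches any character in a given set, which makes it easy to find the next operator. A small illustrative helper (the name split_at_operator is hypothetical) might cut the string at that operator:

```c
#include <string.h>

/* Hypothetical helper: find the first shell operator ('<', '>', '|' or
 * '&') in `s` with strpbrk. The operator is stored in *op and overwritten
 * with '\0', so `s` afterwards holds only the text before it.
 * Returns a pointer to the text after the operator, or NULL (with *op
 * set to '\0') if `s` contains no operator. */
char *split_at_operator(char *s, char *op)
{
    char *p = strpbrk(s, "<>|&");

    if (p == NULL) {
        *op = '\0';
        return NULL;
    }
    *op = *p;
    *p = '\0';
    return p + 1;
}
```

Calling it repeatedly on the returned remainder walks through the operators from left to right.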

The fork system call creates a second process that is an exact duplicate of the first, except for the return value from the fork call itself. In the child process this return value is 0; in the parent process it is the child's process ID (which is never 0). If fork fails, it returns -1 in the parent and no child is created. The parent may then wait on the child's pid (unless the process is in the background).
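Because the parent does not wait for background commands, they remain as zombie processes after they exit until something collects their status. The assignment does not require handling this, but one possible approach (an assumption, not part of the handout) is to reap finished children without blocking, using waitpid with WNOHANG. It should only be called while no foreground command is running, for example just before printing the prompt, so that it cannot collect a foreground child's status. The name reap_background is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch: collect every child that has already exited, without blocking.
 * Returns the number of children reaped. */
int reap_background(void)
{
    int count = 0;
    int status;

    /* waitpid(-1, ...) accepts any child. With WNOHANG it returns 0 if
     * children exist but none has exited yet, and -1 (errno ECHILD) if
     * there are no children at all; either way the loop stops. */
    while (waitpid(-1, &status, WNOHANG) > 0)
        count++;
    return count;
}
```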

The main challenge of calling execv is to build the argument list correctly. If you use execv, remember that the first argument in the list is the name of the command itself, and the last argument must be a null pointer.
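As an illustration of building the argument list, this hypothetical helper splits a command string in place on blanks with strtok(3) and appends the required null pointer. argv[0] holds the command name (e.g. "ls"), while the full path (e.g. "/bin/ls") is passed separately as execv's first parameter. Since strtok keeps internal state, this sketch assumes no other strtok scan is in progress while it runs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `cmd` in place on spaces and tabs, store the
 * words in argv[0..argc-1], and set argv[argc] = NULL as execv requires.
 * argv must have room for max_args + 1 entries.
 * Returns argc, or -1 if there are more than max_args words. */
int build_argv(char *cmd, char *argv[], int max_args)
{
    int argc = 0;

    for (char *tok = strtok(cmd, " \t"); tok != NULL;
         tok = strtok(NULL, " \t")) {
        if (argc == max_args)
            return -1;
        argv[argc++] = tok;
    }
    argv[argc] = NULL;   /* execv needs a NULL-terminated list */
    return argc;
}
```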

The easiest way to redirect input and output is to follow these steps in order:

  1. open (or create) the input or output file (or pipe).
  2. close the corresponding standard file descriptor (stdin or stdout); with dup2 this step is optional, because dup2 closes the target descriptor itself if it is open
  3. use dup2 to make file descriptor 0 or 1 correspond to your newly opened file
  4. close the newly opened file (without closing the standard file descriptor)
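The steps above can be sketched as a single helper meant to be called in the child, between fork and execv. The name redirect_fd and the 0644 creation mode are assumptions. Because dup2 closes the target descriptor first if it is open, step 2 is folded into the dup2 call here; the fd != target_fd check covers the case where open happens to return the target descriptor itself, in which case nothing needs to be duplicated or closed.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch, for use in the child after fork and before execv:
 * make descriptor `target_fd` (0 for stdin, 1 for stdout) refer to `file`,
 * opened with `flags` (mode 0644 if created, subject to the umask).
 * Returns 0 on success, -1 on error. */
int redirect_fd(const char *file, int target_fd, int flags)
{
    int fd = open(file, flags, 0644);       /* step 1: open or create */

    if (fd < 0) {
        perror(file);
        return -1;
    }
    if (fd != target_fd) {
        /* steps 2 and 3: dup2 closes target_fd if needed, then makes it
         * refer to the same open file as fd */
        if (dup2(fd, target_fd) < 0) {
            perror("dup2");
            close(fd);
            return -1;
        }
        close(fd);                          /* step 4: target_fd stays open */
    }
    return 0;
}
```

Typical calls would be redirect_fd(infile, STDIN_FILENO, O_RDONLY) for "<" and redirect_fd(outfile, STDOUT_FILENO, O_WRONLY | O_CREAT | O_TRUNC) for ">".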

When executing a command line that requires a pipe, the pipe must be created before forking the child processes, so that the children inherit it. Also, if there are multiple pipes, the command(s) in the middle may have both input and output redirected to pipes. Finally, be sure both ends of the pipe are closed in the parent process (and that each child closes the ends it does not use): a process reading from a pipe only gets EOF once every copy of the pipe's write end has been closed, so if the parent keeps its copy open, the reader will never see EOF, even after the writing process terminates.

Any pipe or file opened in the parent process may be closed in the parent as soon as every child that uses it has been forked -- this will not affect the child's copy of the open file descriptor.
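A sketch of how the pipes might be plumbed for a pipeline of n commands, assuming for brevity that each command's argv[0] already holds the full path (run_pipeline and MAX_STAGES are hypothetical names). All pipes are created before any fork; command i reads from pipe i-1 and writes into pipe i, so a middle command has both its input and its output redirected; and every process, including the parent, closes the pipe descriptors it does not use. For simplicity, the parent closes its copies only after all children have been forked.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_STAGES 16   /* assumed limit on commands per pipeline */

/* Sketch: run cmds[0] | cmds[1] | ... | cmds[n-1] and wait for all of them.
 * Each cmds[i] is a NULL-terminated argument list whose first element is
 * assumed to be a full path such as "/bin/ls".
 * Returns the exit status of the last command, or -1 on error. */
int run_pipeline(char *const *cmds[], int n)
{
    int fds[MAX_STAGES - 1][2];
    pid_t pids[MAX_STAGES];

    if (n < 1 || n > MAX_STAGES)
        return -1;
    /* Create every pipe before forking, so each child inherits them. */
    for (int i = 0; i < n - 1; i++) {
        if (pipe(fds[i]) < 0) {
            perror("pipe");
            return -1;
        }
    }
    for (int i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] < 0) {
            perror("fork");
            return -1;
        }
        if (pids[i] == 0) {
            if (i > 0)                   /* read from the previous pipe */
                dup2(fds[i - 1][0], STDIN_FILENO);
            if (i < n - 1)               /* write into the next pipe */
                dup2(fds[i][1], STDOUT_FILENO);
            /* Close all original pipe descriptors; 0 and 1 stay open. */
            for (int j = 0; j < n - 1; j++) {
                close(fds[j][0]);
                close(fds[j][1]);
            }
            execv(cmds[i][0], cmds[i]);
            perror(cmds[i][0]);          /* only reached if execv failed */
            _exit(127);
        }
    }
    /* Parent: close every pipe end, or the readers would never see EOF. */
    for (int j = 0; j < n - 1; j++) {
        close(fds[j][0]);
        close(fds[j][1]);
    }
    int result = -1;
    for (int i = 0; i < n; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i] && i == n - 1 &&
            WIFEXITED(status))
            result = WEXITSTATUS(status);
    }
    return result;
}
```

Leaving out the parent's closing loop reproduces the hang described above: the last reader keeps waiting for EOF because the parent still holds a write end.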

It may also be a good idea to use the strlcpy and strlcat functions to deal with strings safely, even though they are not in all C libraries -- see here to get your own. See this paper for an explanation of why it might be a good idea to use them -- they might help prevent inadvertent buffer overflows.
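If your C library lacks them, a portable fallback can follow the documented semantics of strlcpy and strlcat: copy or append at most size - 1 bytes in total, always NUL-terminate when size is nonzero, and return the length of the string that was attempted, so that a return value >= size signals truncation. This is a sketch; the my_ prefix is an assumption, chosen to avoid clashing with a C library that already provides these functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback with strlcpy semantics: copy at most size - 1
 * bytes of src into dst, NUL-terminate if size > 0, and return
 * strlen(src). A return value >= size means the copy was truncated. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Hypothetical fallback with strlcat semantics: append src to the string
 * in dst (a buffer of total size `size`), truncating if necessary, and
 * return the length of the string it tried to create. */
size_t my_strlcat(char *dst, const char *src, size_t size)
{
    size_t dlen = 0;

    while (dlen < size && dst[dlen] != '\0')
        dlen++;
    if (dlen == size)        /* dst not terminated within size: no room */
        return size + strlen(src);
    return dlen + my_strlcpy(dst + dlen, src, size - dlen);
}
```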

It took the instructor approximately 3.5 hours (the first time) to write and debug a solution to this project (269 lines long). Please assume that it will take you longer. I found it useful to debug with both gdb (compile with gcc's "-g" switch so that gdb can show source lines and variable names) and printf statements. My worst bug involved losing debugging information after I redirected stdout -- my printf statements were no longer producing any output! It took me a while to realize that this did not mean my code was failing; instead, the problem was that I still had the write end of the pipe open in the parent process.

Another interesting twist for me was figuring out how to set up the data structures to hold the result of parsing, and how to actually start each sub-command and plumb the pipe(s) appropriately. So even though the parsing is relatively simple, and even though you do not have to check for errors, it may be to your advantage to be a little bit careful when designing your program.

Also note there is code for a very simple shell on page 29 of the textbook. While studying this may be helpful, by itself it is not sufficient for this project. Figure 1.13 may also be helpful as an example of creating pipes.

Grammar and Parsing

The grammar for the command line is approximately as follows:
commandline := pipecommand | pipecommand "&" commandline | empty

pipecommand := redirectcommand | pipecommand "|" redirectcommand

redirectcommand := command ">" outfile |
                   command "<" infile |
                   command "<" infile ">" outfile |
                   command ">" outfile "<" infile |
                   command

command := program | command argument

This simplified grammar would allow output to go to both a pipe and a file, and likewise for input. Supporting that is not required for this project.


One simple way to parse such a grammar is to write one function per grammar rule. A function to parse a commandline invokes a subfunction to parse each pipecommand it encounters; that subfunction invokes another to parse each redirectcommand, which in turn calls a subfunction to parse a command.

Each such function has relatively few decisions to make -- e.g. the commandline parser must test for emptiness of the string and for the existence of one or more "&"s, then call pipecommand on the resulting substrings (if any). So the parsing itself is likely to be easy -- what might be more challenging is deciding what to do with the result of the parsing, i.e., how to actually arrange all the arguments, pipes, and input and output files when executing the program.
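Following that structure, here is a minimal sketch of such a parser together with one possible layout for its result. The struct names, field names, and MAX_* limits are assumptions, not a prescribed design. Each function handles one grammar rule and cuts its input in place with strchr, so the resulting argv and file-name pointers all point into the original line. Error checking is minimal: an empty command, a missing file name, or too many words makes the parse fail, and extra words after a file name are ignored.

```c
#include <string.h>

#define MAX_ARGS 32        /* assumed limit on words per command */
#define MAX_CMDS 8         /* assumed limit on commands per pipeline */
#define MAX_PIPELINES 8    /* assumed limit on "&"-separated pipelines */
#define BLANKS " \t\n"

struct command {                 /* result of one redirectcommand */
    char *argv[MAX_ARGS + 1];    /* NULL-terminated, ready for execv */
    char *infile;                /* NULL if there is no "<" */
    char *outfile;               /* NULL if there is no ">" */
};

struct pipeline {                /* result of one pipecommand */
    struct command cmds[MAX_CMDS];
    int ncmds;
    int background;              /* 1 if followed by "&" */
};

/* First blank-separated word of s, terminated in place; NULL if s is
 * blank. Any further words are ignored in this sketch. */
static char *first_word(char *s)
{
    s += strspn(s, BLANKS);
    size_t len = strcspn(s, BLANKS);
    if (len == 0)
        return NULL;
    s[len] = '\0';
    return s;
}

/* command := program | command argument */
static int parse_command(char *s, struct command *c)
{
    int argc = 0;
    for (char *t = strtok(s, BLANKS); t != NULL; t = strtok(NULL, BLANKS)) {
        if (argc == MAX_ARGS)
            return -1;
        c->argv[argc++] = t;
    }
    c->argv[argc] = NULL;
    return argc > 0 ? 0 : -1;    /* a command needs at least a program */
}

/* redirectcommand := command [ "<" infile ] [ ">" outfile ], either order */
static int parse_redirect(char *s, struct command *c)
{
    char *in = strchr(s, '<'), *out = strchr(s, '>');

    if (in) *in++ = '\0';        /* cut at both operators first so that */
    if (out) *out++ = '\0';      /* each file name is a separate string */
    c->infile = in ? first_word(in) : NULL;
    c->outfile = out ? first_word(out) : NULL;
    if ((in && !c->infile) || (out && !c->outfile))
        return -1;               /* operator without a file name */
    return parse_command(s, c);
}

/* pipecommand := redirectcommand | pipecommand "|" redirectcommand */
static int parse_pipecommand(char *s, struct pipeline *p)
{
    for (p->ncmds = 0; s != NULL; p->ncmds++) {
        char *bar = strchr(s, '|');
        if (bar)
            *bar = '\0';
        if (p->ncmds == MAX_CMDS || parse_redirect(s, &p->cmds[p->ncmds]) < 0)
            return -1;
        s = bar ? bar + 1 : NULL;
    }
    return 0;
}

/* commandline := pipecommand | pipecommand "&" commandline | empty
 * Fills out[0..max-1]; returns the number of pipelines, or -1 on error. */
int parse_commandline(char *line, struct pipeline out[], int max)
{
    int n = 0;
    while (line != NULL) {
        char *amp = strchr(line, '&');
        if (amp)
            *amp = '\0';
        if (line[strspn(line, BLANKS)] == '\0') {    /* empty */
            if (amp)
                return -1;       /* "&" with no command before it */
            break;
        }
        if (n == max || parse_pipecommand(line, &out[n]) < 0)
            return -1;
        out[n++].background = (amp != NULL);
        line = amp ? amp + 1 : NULL;
    }
    return n;
}
```

With this layout, executing a pipeline amounts to forking one child per entry in cmds, applying infile to the first command's standard input and outfile to the last command's standard output, and waiting only when background is 0.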