thesis

DESIGN AND IMPLEMENTATION OF AN OPERATING SYSTEM IN STANDARD ML

A THESIS SUBMITTED TO THE GRADUATE DIVISION OF THE UNIVERSITY OF HAWAII IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

We certify that we have read this thesis and that, in our opinion, it is satisfactory in scope and quality as a thesis for the degree of Master of Science in Information and Computer Sciences.

I still remember how we began to talk about doing this project on one afternoon at the end of 1997, right after the final exam week. I did not recognize this project would be so significant for the system and language research. I even did not think it would be a challenging project at that time. But it turns out to be a great one.

I did learn a lot from the project. Thank you, Edo, for giving me such a wonderful project, for teaching me so much, and for being my advisor!

I also owe a lot to the SML people in Bell labs, CMU, and Princeton, and numerous Linux developers around the world. Without their continuous contribution to SML and Linux development, our project would not have been possible.

Thanks also go to Zach Kent, an undergraduate student here, and an assembly language master. His knowledge and experience in assembly language helped me understand how the Linux interrupt handler works.

Finally, thanks go to my parents. Though they were not able to give any directly help on this project and they are far away from me, their encouragement and their love give me the best moral support to help me go through all these years. They have more confidence in me than I have in myself.

The Hello project aims to design and implement an operating system in Standard ML of New Jersey(SML/NJ). We give an empirical study on the interaction between advanced programming languages and systems programming. This thesis presents the design and implementation of the Hello operating system, including porting the runtime system of SML/NJ, constructing the kernel image on a bare PC, and writing SML software to access and manage the hardware and system resources.

ACKNOWLEDGMENTS

ABSTRACT

LIST OF TABLES

LIST OF FIGURES

CHAPTER 1. INTRODUCTION

1.1 Objectives

1.2 Background

1.3 Contributions of this work

1.4 Structure of this document

CHAPTER 2. DESIGN

2.1 System architecture overview

2.2 Integrating the SML heap image with the operating system

2.3 Communication between SML/NJ and its runtime system

2.4 Handling interrupts with the SML/NJ signaling mechanism

CHAPTER 3. IMPLEMENTATION

3.1 Implementation environment

3.2 Porting the runtime system of SML/NJ to a bare PC

3.3 System size

3.4 Interrupt handler module

3.5 Keyboard module

3.6 Timer module

3.7 Network adaptor module

3.8 Machine interface module

CHAPTER 4. RELATED WORK

4.1 Express project

4.2 Fox project

4.3 LISP Machine

4.4 SPIN project

4.5 JavaOS

4.6 Smart-card project

4.7 STING project

CHAPTER 5. EVALUATION

5.1 Strength of SML/NJ

5.2 Can SML/NJ be better?

5.3 Analysis of the Hello project

CHAPTER 6. FUTURE WORK AND CONCLUSIONS

6.1 Future work

6.2 Conclusions

BIBLIOGRAPHY

1. System code size(in lines) 15
2. Lines of code by language (in lines) 16
3. Image size(in bytes) 16

1. Architecture Overview 9
2. Communication between the co-routines 11
3. Signature of interrupt handler module 17
4. Signature of keyboard module 19
5. Signature of timer module 20
6. Signature of eexpress internal functions 23
7. Signature of eexpress interface functions 24
8. Signature of eexpress interface functions for foxnet 25
9. Signature of machine interface module 27

Over the years, much research has been done on the design of new languages with more and more advanced features for programming. At the same time, a lot of other projects build new operating systems with some novel characteristics, but most of these operating systems are written in C or a similar language. Not much effort has been done to fully deploy advanced language features to build a good operating system.

There are many reasons for functional languages are less popular than imperative languages [24]. One of the most important reasons is that functional programming researchers place far more emphasis on construction of optimizing compilers than on applying the language for practical usage. In spite of this, there are quite few real world applications developed in functional languages [31]. We describe one more application built using an advanced functional language.

Because functional languages have been thought too inefficient, they have not been used widely in systems programming, though they have many good features for system software. From what we have done, we think operating system software written in advanced functional languages such as SML can achieve comparable performance to software written in C.

In this chapter, we will explain the objectives of this project and give a brief introduction to SML, which is the language we choose to implement the system.

The goal is to build a stand-alone operating system on a bare machine in Standard ML. As part of this project, the author has implemented a simple operating system to study the interaction between this advanced programming language and systems programming. The result of this project is a test-bed for measuring and comparing the functionality and performance of an operating system in an advanced language with traditional operating systems such as UNIX.

Standard ML is an advanced functional programming language and has many good features for developing systems software.

Standard ML supports automatic garbage collection, higher-order functions, and first-class continuations. All these features are very good for building systems software:

The process of reclaiming memory not used by the program is called garbage collection. Garbage collection can free programmers from controlling the dynamic allocation and deallocation of memory, so programmers need not remember to free the unused memory and there will be no danger of freeing memory that is still in use.

Higher-order functions means functions are created at runtime and can be passed as parameters, returned as results, and defined in a data structure, that is, functions are first-class values in ML.

Higher-order functions provide more flexibility compared to first order functions. In ML, functions can be nested, be called recursively, and be created at run time. Run-time creation is possible is because ML uses closures to represent the function. A closure is a record that contains the machine-code pointer and a way to access any required nonlocal variables [4]. This is different from C, in which the function is simply represented as the address of the machine code.

A continuation is a function that expresses ``what to do next in the computation'' [3]. This is a type also supported by Scheme, in which the dynamic calling context of a function can be abstracted as another function [2].

In ML, continuations are first-class values, the same as functions. By using continuations, writing a thread scheduler is very easy and efficient without having to code in assembly language.

ML is a strongly typed language, that is, all the type errors can be detected at compile time and no implicit type conversion will occur if a mismatch happens during type checking. With automatic garbage collection and a strong type system, Standard ML is a safe language, which means "programs cannot corrupt the runtime system so that further execution of the program is not faithful to the language semantics" [2] Runtime errors, and especially errors that lead to execution that is not faithful to the semantics of the language(e.g. accessing past the end of an array, casting integer values to illegal pointer), are hard to debug and test, so using a safe language should make the software easier to develop and more reliable. 3. Easy to maintain

ML supports polymorphic types and has an advanced module system. These features provide flexibility in the function interface definition and make programs easier to maintain, since the compiler can perform many checks for mis-matched definition and use without restricting code re-use. In general, ML programs are more reliable, portable and reusable than C programs.

SML is a functional language but not a pure functional language. As a functional language, its values are immutable, which makes the program much easier to trace and understand. SML also has mutable objects, so when necessary, variables can be both read and written. With these two properties, SML can be a practical programming language for real-world projects.

For further study about SML, readers may refer to the book by Jeffrey D. Ullman [23] and the paper by Andrew W. Appel [2].

The runtime system of Standard ML of New Jersey(SML/NJ) provides an interface between the hardware, the operating system and the compiler. It includes the garbage collector, an interface for operating system calls, a mechanism for handling asynchronous events and exceptions, and language primitives implemented in assembly language [1].

Interrupts are a common mechanism in operating systems to make the CPU stop processing its current task and begin to respond to asynchronous events such as hardware signals and software traps. The CPU normally returns to process the interrupted job after processing the asynchronous events. Interrupts make the CPU work more efficiently. There is no need to poll devices to check whether some action is needed. The operating system has different interrupt handlers for the different interrupts, and these interrupt handlers are installed at system boot time. At a fixed location in memory there is an interrupt vector, which is used to record the entry addresses of the handler programs. For example, on the x86 the interrupt vector occupies the 16 4-byte words starting at address 0x20(32 in decimal). In general, different devices correspond to different interrupt numbers. This number is called an IRQ number. For example, on the x86 IRQ(Interrupt ReQuest) numbers range from 0 to 15. When an interrupt happens, some registers in the machine will be set automatically to the IRQ number. By reading from those registers the CPU will know which interrupt happened and will jump to the handler program for that interrupt.

A device driver is a program used to control the device. A device driver must initialize and configure the device and must control the device registers. Basically, the functions provided by the driver program should include:

$\bullet$ the read function to get data from the device if it is a readable device

$\bullet$ the write function to send data to the device if it is a writable device

Most devices are both readable and writable, but some devices can only be read or written to. For example, on a PC the console can only be written to and the keyboard can only be read from. Device drivers may also contain configuration functions and other special functions which depend on the different functionalities of devices. The device will interrupt the CPU only when it needs attention from the CPU: for example when the device finishes its task or something abnormal has happened in the device. Normally there are three kinds of registers for each device: the command register, the status register and the data register. By writing to the device command register, the CPU can control and configure the device. By reading from the device status register, the CPU will know what happens on the device. The CPU and the device exchange data by reading from or writing to the device data register. The driver initialization program should allocate memory for all the device registers at system boot time.

There are many books about operating system concepts. The best way to really understand operating systems is to study the source code of an existing system or to build an operating system. Linux and XINU [10] are two good examples to study. Both are open source and can be downloaded freely from the Internet.

The first contribution of this thesis includes the design of the Hello project and the lessons we learned from the implementation. The Hello project is the first project to implement an operating system in Standard ML that runs on a bare machine. These first-hand experiences may give some insight for further research on system software and language design.

$\bullet$ handling interrupt events with a language that has automatic garbage collecting and is heap-allocated.

$\bullet$ taking advantage of the parametrizable module system of SML to produce a reliable, reusable, and simple software architecture design.

$\bullet$ scheduling threads with continuations. This basic idea has been fully explained before [9], so we won't present much of this idea in the thesis. The difference between our implementation and the example stated in the paper is that we use a preemptive scheduler, while they use a cooperative scheduler.

The second contribution is we demonstrate that SML is a suitable language for the operating system implementations. We believe other advanced languages with optimizing compilers can also be languages for system programming, but the lack of an advanced module system for some languages will make it hard to maintain good structure.

As part of the implementation of the Hello system, we provide a way to integrate an SML heap image and its runtime system into the operating system kernel space and load it as a booting image when the system starts up. This integration method should be interesting to other non-ML researchers as well.

This thesis is documentation for the Hello operating system. We present the signatures of all the implemented modules in chapter 3 with a detailed explanation. It gives a good start point to study the Hello system.

The Hello operating system is at an early stage. We hope that by making the source code available to the public, other people will be motivated to join in this project, which will make the software useful to a larger number of people [33].

This thesis focuses on the design and implementation of the system, especially on the interrupt handling and device control.

Chapter 2 presents the design of this project. Chapter 3 describes the implementation. Chapter 4 contains a summary of other work that is related to this project. Chapter 5 gives an evaluation of this work. Chapter 6 discusses future work to be done on this project and gives some conclusions we have drawn.

In this version of the Hello system, we have built a simple operating system with only one process. Since we use the Linux boot loader(LILO) to load our system and have built the system by modifying the Linux kernel, our system partly depends on Linux code such as the system start up and memory management.

Since we use LILO to load our system, our first problem was how to integrate the SML heap image and its runtime system into the Linux kernel space. There are at least two ways to do this. One is to load the SML heap image produced by the compiler as the data into the runtime system, and have the Linux start up function call the runtime system main function. The other way is to embed the SML heap image into an ELF format object file and link it into the Linux kernel space. The Hello system is implemented by using the first method, since it is simpler. The limitation of these two methods is the boot loader has a maximum value for the kernel size, for example, the maximum size of the kernel image size in LILO is 1023K(0xFFFEF) bytes. But this is good enough for a small kernel. Right now, the kernel image size of the Hello system is 214K bytes.

The runtime system and the code generated by the SML/NJ compiler execute as a pair of co-routines, that is, there is a convention between SML and its runtime system that allow control to be transfered back and forth. There are two assembly functions in the runtime system for transferring control: restoreregs transfers the control from the runtime system to SML; saveregs transfers control from SML to the runtime system.

RunML is a function defined in the runtime system of SML/NJ. In RunML, control is explicitly transferred to the suspended ML thread after the garbage collection or other runtime calls have finished.

There are two cases in which control is transferred from the ML co-routine to the runtime. One is any calls to the runtime system functions in the ML code. The other one is when compiler-generated code at the beginning of each ML function detects that insufficient memory is available and that garbage collection is needed.

The communication between these two co-routines totally depends on the registers, including three special-purpose registers and several machine dependent root registers [21]. These registers are described in the next section.

The SML/NJ compiler uses continuation-passing style(CPS) to generate code. The continuations and the generated code are heap allocated, so the major problem with handing asynchronous signals is to avoid corrupting the heap [21]. The heap is only synchronized at the beginning of functions, so suspending ML code and then at some point calling GC would lead to the garbage collector mis-interpreting the values in the heap. Handling interrupts in SML has the same problem. We use a mechanism very similar to the SML/NJ signal handler to handle interrupts, and again we must keep the heap uncorrupted.

The SML/NJ compiler generates code to compare the limit pointer to the allocation requirement at the beginning of each function. If the comparison shows the free space is less than what the function needs, control goes to the runtime for garbage collection.

One of the registers which are visible to both co-routines is the limit pointer, which is the upper limit of the allocation space. Another register visible to both co-routines is the free memory pointer. The memory space between the free memory pointer and the limit pointer is the free memory. If the size of the free memory is less than the memory allocation requirement, a garbage collection trap will happen, and control will transfer from ML to the runtime at this point. The runtime system provides a mechanism to handle the asynchronous signal. When a signal comes, the runtime system does not really process it but instead sets the limit pointer register to show there is no free space. At the next check there will be no free memory, so a GC trap happens and control will transfer from the ML co-routine to the runtime co-routine. The runtime routine will see that this is not a true GC trap, so it will transfer control back to another ML co-routine to handle the signal. We use this signal mechanism to handle interrupts in the Hello system.

We have added some functions in the runtime system to handle interrupts. A C interrupt handler is installed at boot time and device initialization time. The handler is the same for all the entries of the interrupt vector. When an interrupt happens, the CPU executes the instructions in our C interrupt handler, which do the following: $\bullet$ disable interrupts

$\bullet$ get the interrupt number(IRQ number) from the memory at address 0x21 and 0xA1.

$\bullet$ call the C signal handler, which is part of the runtime system of SML/NJ.

The C signal handler inserts the signal into a pending queue and if the system was running in ML space before being interrupted, the handler sets the limit pointer register to zero.

After the C interrupt handler completes, the interrupted code continues to run. If the interrupted code was generated by the ML compiler, a GC trap happens within a finite time, so control transfers back to the runtime system. From there the ML signal dispatcher is resumed. The ML signal dispatcher calls our ML interrupt dispatcher. Our ML interrupt dispatcher gets the first IRQ number from the pending interrupt queue, and calls the interrupt handler corresponding to the IRQ number. Our ML interrupt dispatcher loops until all interrupts have been processed and removed from the pending queue.

In this chapter, we explain how we have integrated the SML heap image and the runtime system into the operating system kernel space and present the signature of each module we implemented with a detailed explanation of it.

Signatures are an important mechanism in the module system of SML. Besides their theoretical properties, signatures serve as the ``type'' interface of structures and functors, which are the basic units of executable code in SML.

The hardware of our testing environment includes a PC with Pentium 133MHz CPU and 12MB RAM, and an Intel Ether Express card as the network adaptor. The software we used to implement our project includes the Standard ML of New Jersey version 110.7 and Debian Linux with kernel version 2.0.30.

Since the runtime system provides an interface between the hardware, the operating system and the compiler, we needed to modify the runtime system of SML/NJ for Linux on x86 and build a runtime system to support our Hello system. The basic rule is to keep the architecture-related code, since we don't want to change the register allocation interface between the compiler and its runtime system, and to remove any code that depends on the operating system.

At first, we identified the operating system dependent parts of the runtime system and eliminated them. The parts we eliminated include the code for supporting Linux system calls, socket programming, and functions that implement the Posix standard.

Besides eliminating code from the runtime system of the SML/NJ, we also eliminated large parts of the code for Linux, only keeping its start-up code and memory management part. For example, we eliminated the Linux file system, virtual memory, all device drivers, network protocols, and the scheduler.

We built the operating system kernel in ML and wrote the kernel into an SML/NJ heap image by using the function exportFn provided by SML/NJ. We wrote a simple program to read the heap image from a file and write it to a C source file as a very long C string. Once the executable image is loaded into the memory by the Linux boot loader(LILO), the runtime system of SML/NJ loads data from the string.

In the Table 1 we showed the Linux kernel and the runtime system code size before the minimization and after the minimization. We counted the code size in lines, including the code in C and in assembly language.

In the Table 2 we showed the line numbers of code in 3 different languages in the Hello system. Most part of the assembly code is used for compressing the kernel and booting the kernel and switching context between the two co-routines. Most part of the C code is used for memory management and handling the architecture-dependent parts of initialization. SML code includes all the modules we introduced from Section 3.3 to Section 3.7.

In the Table 3 we showed the original compressed Linux kernel image size, the the original stand-alone runtime system image size, and the compressed Hello system image size.

This module is used for installing or removing the interrupt handler for the given interrupt, masking or unmasking the given interrupt, printing to the screen all the installed interrupt handlers. This module also includes a signal handler function, dispatch. As we said before, we use the signal mechanism in the SML/NJ to handle the interrupt. dispatch is called by the SML/NJ signal dispatcher whenever an interrupt happens. dispatch calls the related interrupt handler according to the interrupt ID, which is an input parameter of this function. The module satisfies the DEV_INT signature shown in Figure 3.

**Figure 3:** Signature of interrupt handler module.
signature DEV_INT = sig exception INVALID_INTERRUPT_ID exception NO_ASSOCIATED_INTERRUPT_HANDLER type IRQ = int type handler = IRQ -> unit datatype dev_status = UNINSTALLED \| INSTALLED of handler type dev_info = dev_status * bool * int * string val listHandlers: unit -> unit val setHandler: (IRQ * dev_status * int * string) -> unit val clearHandler: IRQ -> unit val mask: IRQ -> unit val unmask: IRQ -> unit val dispatch: IRQ -> unit end

Figure 3: Signature of interrupt handler module.

signature DEV_INT = sig
    exception INVALID_INTERRUPT_ID
    exception NO_ASSOCIATED_INTERRUPT_HANDLER
    type IRQ = int
    type handler = IRQ -> unit
    datatype dev_status = UNINSTALLED  |  INSTALLED of handler 
    type dev_info = dev_status * bool * int * string
    val listHandlers: unit  ->  unit 
    val setHandler: (IRQ * dev_status * int * string)  ->  unit
    val clearHandler: IRQ  ->  unit
    val mask: IRQ  ->  unit
    val unmask: IRQ  ->  unit
    val dispatch: IRQ  ->  unit
end

$\bullet$ type handler is the type of functions from IRQ to unit. It is the type of the device interrupt handler functions.

$\bullet$ datatype dev_status defines two data constructors.These two constructors identify the two states of the interrupt handler, either installed or uninstalled. If the status is installed, a handler is defined.

$\bullet$ type dev_info is a tuple of 4 components. The first component indicates the interrupt handler status. The second component indicates whether the given interrupt is masked or not. The third component is the priority of the interrupt handler thread. The last component is the device name. The maximum interrupt number is a constant value defined in this module. It is a machine dependent value. For example in the X86 architecture, it is 16.

$\bullet$ When the interrupt number is out of the range from 0 to maxim interrupt number - 1, exception INVALID_INTERRUPT_ID will be raised.

$\bullet$ When the system is going to remove, mask, unmask or dispatch to a non-existent interrupt handler, exception NO_ASSOCIATED_INTERRUPT_HANDLER will be raised.

The keyboard module provides two functions: one to initialize and the other to manage the keyboard. The initial function should be called at system boot time.It will install the keyboard interrupt handler, which is a function internal to the module. When a keyboard interrupt happens, the interrupt function will be called. The handler will obtain from the keyboard data register the scan code of the key. The keyboard data register is machine dependent and found at location 0x60 in the x86 architecture. The handler then converts the scancode to the ascii value of the key, and adds the ascii value to the end of a queue. The queue is maintained by the keyboard module to store all the characters which have not been read by the user.

Function read_char is used to get the first character in the queue. The interrupt handler and read_char run as the producer and the consumer of the queue, respectively. When the queue is empty, read_char returns immediately with the value NONE.

**Figure 4:** Signature of keyboard module.
signature KEYBOARD = sig (* keyboard interrupt handler function, for kernel internal use only val interrupt: IRQ -> unit *) val read_char: unit -> char option val init: unit -> unit end

The timer module provides two functions: one to initialize the module and the other to get the time from the timer. There is an internal variable, jiffies, to record how many times the timer interrupt has been called in the timer module. The initial function should be called at system boot time. It will clear jiffies to zero and install the timer interrupt handler, which is defined as an internal interrupt function. When a timer interrupt happens, the interrupt function will be called. It will increase jiffies by one. The gettime function will return the current jiffies value to the caller.

**Figure 5:** Signature of timer module.
signature TIMER = sig (* timer interrupt handler function, for kernel internal use only val interrupt: int -> unit *) val init: unit -> unit val gettime: unit -> int end

We have implemented a driver program in SML for the Intel Ether Express card. Our implementation is a translation into SML of the Linux Ether Express driver program.

The Intel Ether Express card is an Intel i82586 coprocessor based board. It has the following features and functions related to our implementations [17].

$\bullet$ automatically generates and adds the preamble bits for the outgoing frame, fetches the destination address and length field (or type field if using the IEEE802.3 standard) from the transmit command, inserts its own hardware address into the source field, and gets the data from the transmit command, and finally computes and appends the CRC to the end of the frame.

$\bullet$ automatically strips the preamble bits and the ending CRC bits of the received frame.

$\bullet$ on-chip memory management, including automatic buffer chaining and reclamation of bad frames.

For the detailed specification of the Intel i82586 coprocessor, readers may refer to the Intel Microcommunications Databook Vol.1 1990 and Intel Networking book 1998 [17]. Our network adaptor module provides functions to probe whether the card is physically installed and write some initialization code to the card to start it. It also provides functions to open and close the device, that is, to install or uninstall an interrupt handler for the device. As any network driver, it has functions to get the data from the device and to send the data to the device. It also provides a function to get the physical address of the adaptor, which is the ethernet address for the Ether Express card.

In this module, we define three data types that are independent of the specific network card we use.

$\bullet$ Type STAT is used to record network statistics, including how many frames have been received and sent and how many error frames have been received.

$\bullet$ A value of type T represents an EtherExpress device. Type T includes both device-dependent and device-independent information. The device-dependent part records the configuration information of the adaptor. It includes the starting and ending addresses of the incoming and outgoing frame area, the next available frame in the incoming and outgoing frame area, the size of each frame and numbers of all the frames, status of whether the device is busy or not, the device startup time, and whether the device is in promiscuous mode or not. Most of this information is read or specified at device startup time. The device-independent part includes information such as the name of the device, the IRQ number assigned to the device, the device's ethernet address, broadcast address and hardware address, the header length of the physical layer, the MTU size, the hardware type, the interface type, the status of whether the interrupt for this device is enabled or not, and whether the device is up.

$\bullet$ Type iftype defines the network interface type. It can be in one of three: AUI, BNC, or TP type. For example, the Intel Ether Express has type TP(Twisted Pair).

In this module, we implemented 26 functions. 12 of them are for internal use only. Other functions are divided in two groups: the first is the general interface provided by this module and is a translation of the Linux device driver interface to the kernel, the second group is provided for integrating with the FoxNet and is implemented by calling functions from the first group. The FoxNet is a network protocol stack written in SML and developed at CMU [8].

**Figure 6:** Signature of eexpress internal functions.
datatype iftype = AUI \| BNC \| TP val hw_readeeprom: (int * int ) -> word val hw_ASICrst: (int * int ) -> int val hw_probe: (string * int) -> T option val hw_txinit: T -> unit val hw_rxinit: T -> unit val hw_txrestart: T -> unit val hw_tx: (Word8Vector.vector * T ) -> unit val hw_rx: T -> unit val hw_lasttxstat: T -> int val hw_init586: T -> unit val newBuf: ( int * int ) -> Word8Array.array val delBuf: int -> Word8Array.array

Figure 6: Signature of eexpress internal functions.

	datatype iftype = AUI | BNC | TP  
	val hw_readeeprom: (int *  int )  ->  word 
	val hw_ASICrst: (int *  int ) -> int
	val hw_probe: (string *  int)  ->  T option
	val hw_txinit: T  ->  unit
	val hw_rxinit: T  ->  unit
	val hw_txrestart: T -> unit
	val hw_tx: (Word8Vector.vector *  T )  ->  unit
	val hw_rx: T  ->  unit
	val hw_lasttxstat: T  ->  int
	val hw_init586: T  ->  unit
	val newBuf: ( int *  int )  ->  Word8Array.array
	val delBuf: int  ->  Word8Array.array

**Figure 7:** Signature of eexpress interface functions.
signature EEXPRESS = sig (* a Linux-like interface ) type T type STAT val probe:(string int) -> T option val open: T -> unit val close: T -> unit val xmit: ( Word8Vector.vector * T ) -> unit val receive: T -> (Word8Array.array * int ) val stats: T -> STAT val set_multicast: T -> unit val irq: int -> unit val etheraddress : T -> Word8Vector.vector end

Figure 7: Signature of eexpress interface functions.

signature EEXPRESS = sig
    (*  a Linux-like interface *)   
	type T
	type STAT
	val probe:(string *  int)  ->  T option
	val open: T  ->  unit
	val close: T  ->  unit
	val xmit: ( Word8Vector.vector *  T )  ->  unit
	val receive: T -> (Word8Array.array * int )
	val stats: T  ->  STAT
	val set_multicast: T  ->  unit
	val irq: int  ->  unit
	val etheraddress : T -> Word8Vector.vector
end

$\bullet$ irq: the interrupt handler. When the adaptor gets a frame or finishes sending a frame or something abnormal has happened, an interrupt take place, and this handler function will be called to handle the interrupt.

$\bullet$ probe: initialize and configure the ethernet card. It is a function called at system startup time.

$\bullet$ open: install the interrupt handler in the system interrupt vector. Though the system may get the interrupt right after the network card is initialized, it is only after this function is called that the system can handle the network interrupt correctly,

$\bullet$ xmit: send out an ethernet packet to the network interface. The packet should be at least 60 bytes and no more than 1514 bytes including the ethernet header.

$\bullet$ receive: get a packet from the network interface. The received packet is appended at the end of the system buffer. The buffer is a byte array and defined in the type T. It is allocated when the device is initialized. Whenever the kernel code calls this function, the first packet in the buffer and the packet length will be returned.

$\bullet$ set_multicast: we provide the interface to support the multicast but have not implemented it.

The functions provided for the FoxNet have almost the same functionality as the functions we list above, except in the interface for the FoxNet there is no function to get the network statistics and no interface for setting up the multicast. The main difference between these two groups of functions is the type of the functions' parameters and their return values. We hide the device-dependent type T and STAT from the interface for FoxNet by using the IRQ number to select the related device instead. We think the interface for the FoxNet should be general enough for common use.

**Figure 8:** Signature of eexpress interface functions for foxnet.
signature EEXPRESS = sig (* an interface for integration with CMU FoxNet ) val probe: ( string int ) -> int val dev_open: int -> unit val get_ethernet_address: int -> Word8Array.array val dev_close: int -> unit val send: ( int * Word8Vector.vector ) -> unit val receive: (int * Word8Array.array * int ) -> int end

Figure 8: Signature of eexpress interface functions for foxnet.

signature EEXPRESS = sig
    (*  an interface for integration with CMU FoxNet  *)
	val probe:  ( string *  int )  ->  int
	val dev_open: int  ->  unit
	val get_ethernet_address: int  ->  Word8Array.array
	val dev_close: int  ->  unit
	val send: ( int *  Word8Vector.vector )  ->  unit
	val receive: (int *  Word8Array.array *  int )  ->  int  
end

ML does not provide direct access to the hardware, including registers and the memory, so we have to depend on C functions in the runtime system of SML/NJ to access the hardware. We provide an interface module in ML for calling these functions in ML code. We call this module ASM.

The x86 has two instructions, sti and cli, which are used to set and clear the interrupt bit in the system flag register [15]. We provide two functions sti and cli in ASM to call the Linux assembly macros to enable and disable interrupts.

The x86 provides several instructions, inb, inw, insw and outb, outw, outsw for reading data from or writing data to the specified physical address. The data can be transfered either in a byte, word(16 bits) or in a 16-bit string format [15]. Besides providing the assembly macros to use these instructions directly, Linux also provides inb_p, inw_p, outb_p and outw_p, which are functions and will pause a while after calling the corresponding macros. We implement these functions in ASM.

The x86 also has two instructions, bts and btr, for testing and setting or resetting a bit in the specified address [15]. In ASM, we provide set_bit and clear_bit functions to call the Linux assembly macros to set or reset an assigned bit at a specified address. These functions are used to configure the device.

All the assembly macros mentioned above are defined in a Linux header file at include/asm/bitops.h.

getID is a C function we implemented to communicate with our C interrupt handler in the SML runtime system to get the first IRQ number in the queue of pending interrupts.

request_region, release_region, request_IRQ, and release_IRQ are C functions we use to call the corresponding Linux C functions. request_region and release_region are used to get and free memory associated with the device. request_IRQ and release_IRQ are used to install and uninstall our C interrupt handler in the system interrupt vector.

After we implement our own memory management module, we will be much less dependent on the Linux code and won't need some of these C functions.

**Figure 9:** Signature of machine interface module.
signature ASM = sig (* an interface based on Intel Pentium instruction set ) val getID:unit -> int val sti:unit -> unit val cli:unit -> unit val inb:int -> int ( port -> value * ) val inb_p: int -> int val outb: ( int * int ) -> unit (* (value, port) -> unit * ) val outb_p: ( int * int ) -> unit val set_bit: ( int * word) -> int val clear_bit: ( int * word) -> int val inw: int -> int (* port -> value * ) val outw: (int * int) -> unit (* (value, port) -> unit * ) val inw_p: int -> int (* port -> value * ) val outw_p: (int * int) -> unit (* (value, port) -> unit * ) val insw: (int * Word32.word * int ) -> unit (* (port, addr, count) -> unit ) val outsw:(int Word32.word * int ) -> unit (* (port, addr, count) -> unit ) val request_region:(int int * string) -> unit val release_region:(int * int) -> unit val request_irq:(int * int * string) -> int val free_irq: int -> unit end

Figure 9: Signature of machine interface module.

signature ASM = sig
	
   (* an interface based on Intel Pentium instruction set *) 	
	val getID:unit  ->  int
	val sti:unit  ->  unit
	val cli:unit  ->  unit
	val inb:int  ->  int  (*   port -> value * )
	val inb_p: int  ->  int
	val outb: ( int *   int )   ->  unit (*  (value, port) -> unit *  )
	val outb_p: ( int *   int )  ->  unit 
	val set_bit: ( int *   word)  ->  int
	val clear_bit: ( int *   word)  ->  int 
	val inw: int  ->  int (*   port  ->  value * ) 
	val outw: (int *   int)  ->  unit (*   (value, port) -> unit * )
	val inw_p: int  ->  int (*  port  ->  value * ) 
	val outw_p: (int *  int)  ->  unit (*  (value, port) -> unit * )
	val insw: (int *  Word32.word *  int )  ->  unit (* (port, addr, count) -> unit *) 
	val outsw:(int *  Word32.word *  int )  ->  unit (* (port, addr, count) -> unit *)
	val request_region:(int *  int *  string)  ->  unit  
	val release_region:(int *  int) -> unit 	    
	val request_irq:(int *  int *  string)  ->  int  
	val free_irq: int -> unit 

end

The Express project at MIT aims at exploring the interaction between advanced programming languages, operating systems, and compilers, and developing the software technology to make advanced programming languages practical useful tools for systems programming. They are currently developing an experimental kernel constructed by porting the SML/NJ implementation of Standard ML to run on a bare PC. They use the University of Utah's OSKit library for the low-level machine management, and they are planning to run the CMU FoxNet Web Server to test their experimental SML/NJ kernel.

The Express project has goals very similar to those of Hello, but the approach is different. The main difference is that Hello does not use the Utah OSkit at all but depends a lot directly on the Linux kernel, while Express depends more heavily on Utah's OSkit for the low level operation of the hardware. So, in the Hello implementations, we did a lot of work to port the runtime system of SML/NJ to a bare machine and to shrink the Linux kernel size, while most of their job is to integrate the runtime system of SML/NJ with the OSkit. For example, we use the Linux boot loader(LILO) to load Hello, while Express uses the multi loader of Utah's OSkit to load the system. Another important difference is we build the device drivers in SML while they use OSkit device drivers to access the hardware. OSkit builds its device driver modules by importing and encapsulating the driver programs from Linux and FreeBSD. The reason for Express to build an operating system based on Utah's OSkit is their current research interest in modeling concurrency using semantic features of advanced languages, especially continuations. The reason we are interested in implementing the device driver part in SML is that we think with less layer glue we may get higher performance.

Another interesting similarity between these two projects is both project leaders are from CMU and both projects have been implemented by master's students with the help of undergraduate students [13].

The objective of the Fox Project in the School of Computer Science at Carnegie Mellon University is to advance the art of programming-language design and implementation, and simultaneously to apply advanced programming languages to advance the art of systems building. The work of the project includes theoretical studies of programming languages and their properties, development of new compiler and run-time technology, and empirical studies of the application of advanced language techniques to real-world programming problems, especially in the areas of high-performance networks and operating systems.

Since the Fox project was proposed in the early 90's, it has made a lot of progress in the system software design and the study of advanced programming languages, especially in ML. Some products of the Fox project includes the design and implementation of the FoxNet, which is a network protocol stack in SML, the improvement of SML semantics and its runtime system, and intensive research on the system extensibility and safety. One of Fox project goals proposed in 1991 [11] was to build an operating system on a bare Nectar communication processor. The Nectar communication processor is basically a SPARC_based CPU board and embedded in a host machine. It is the interface between the host machine and a hub in the Nectar network [12]. There is no further released software or published papers about this operating system project. Though the Hello implementation environment is different from theirs, some of the goals are the same.

The LISP Machine is a computer system specially designed to execute LISP programs efficiently. The idea was first promoted by the MIT AI Lab and Xerox PARC in late 70's [5]. There were several commercial products after it, such as the Xerox 1100 series built in 1981, the Symbolics 3600 series in 1983, the LMI Lambda in 1983,the Fujitsu ALPHA in 1983, and the Texas Instruments Explorer in 1984. One of the interesting things in the Symbolics machine is that all the systems software is written in LISP, so there is no need to use an intermediate language such as assembly language or C to access the hardware. The system software includes the operating systems and the runtime system of the compiler. The Symbolics 3600 series uses real time, incremental, and copying garbage collection [18]. The memory is divided into three spaces: the newspace, the oldspace, and the copyspace. New objects are created in the newspace. When it is time to do the collection, the collector changes the newspace and the copyspace into oldspace, then creates a fresh newspace and a fresh copyspace. After the flip, accessible objects in oldspace are evacuated by copying them into the copyspace. After all of the accessible objects have been evacuated and all of the references to the objects in the oldspace have been replaced by the reference to the copyspace, the garbage collector reclaims the oldspace [18]. The garbage collector doesn't delay the interrupt handler and doesn't stop the interrupt handler in the middle to do the garbage collection. The Symbolics garbage collector works well with the interrupt handlers is because it is incremental and never need to pause. In contrast, SML/NJ uses a stop-and-copy, generational garbage collector, and the memory in SML/NJ is divided into two spaces, the to-space and the from-space. One consequence is that our Hello system has to delay interrupt processing when the system is doing garbage collection. Symbolics also depends on the special design in hardware such as tagged architecture to make the garbage collecting more efficient.

The SPIN operating system was developed at the University of Washington. SPIN combines research in systems, languages, and compilers to achieve the three fundamental goals of modern operating systems:

$\bullet$ Flexibility: Arbitrary users may customize SPIN by writing and installing new kernel code. User-defined extensions are linked into the kernel's address space and dynamically integrated with the executing system.

$\bullet$ Safety: Ignorant and malicious kernel extensions are isolated from critical kernel interfaces via restricted dynamic linking. Restrictions are enforced using the type-safe properties of Modula-3, the programming language in which SPIN and its extensions are written.

$\bullet$ Performance: Application-specific extensions access system resources and services with low latency since there are no expensive protection boundaries within the kernel.

SPIN is designed around a core set of services including threads, virtual memory primitives, device drivers, and the extension mechanism. These modules are the foundation for the rest of the system and are required for the machine to boot. Additional features, such as file systems and networking, are implemented in SPIN extensions and dynamically installed into the system while it is running.

Though SPIN and Hello have similar goals, the two systems have different focuses and are developed in quite different languages. Modula-3 is a PASCAL-like language with automatic garbage collection, type safety, interfaces, objects, threads and exceptions [6]. Right now, most of our efforts are focused on exploring the hardware accessibility of SML and handling interrupts, while developers of SPIN are more interested in extending the system dynamically and safely. We have interest in finding some way to provide safe extensibility of our system and we think the strong type system and module system of SML are very promising places to start.

The JavaOS for Business operating system is a Java-centric operating system with a feature set and architecture designed to provide a complete Java environment for a wide range of products, including network computers. JavaOS for Business operating system is developed by the Sun Microsystems Inc. JavaOS for Business software eliminates the need for a host operating system to run Java technology-based applications.

JavaOS uses a different approach from Hello to achieve hardware accessibility from a high level language. Their device driver program are platform-independent and implemented in Java. JavaOS provides JavaOS Platform Interface(JPI) between the Java Virtual Machine(JVM) and the JavaOS Microkernel, which are platform-dependent, and the JavaOS Device Interface(JDI). Java has one more layer abstracted from hardware than SML. SML does not run at any virtual machine layer and totally depends on the its runtime system to provide the interface between the low-level hardware and the SML application.

Smart-card is a prototype of a smart card operating system. It was implemented in Miranda and at the University of Amsterdam in 1993. Miranda is a purely functional language.

Smart-card has the same structure as a real operating system. The prototype offers most of the functionality of a real system. The program comprises a card terminal emulator and can be executed to perform cryptographically secure card-terminal interactions for a real life card application. With the prototype implementation as reference, the reliability of the product implementation can be assessed in detail.

Sting is an experimental operating system designed to serve as an efficient customizable substrate for modern programming languages. The base language used in the current implementation is Scheme, but Sting's core ideas could be incorporated into any reasonably high-level language. STING was developed at the NEC Research Institute. The ultimate goal in this project is to build a unified programming environment for parallel and distributed computing. To this end, Sting provides mechanisms for:

$\bullet$ creating extremely lightweight first-class asynchronous threads of control

$\bullet$ building customized scheduling, migration, and load-balancing protocols, specifying first-class virtual processors and virtual topologies

$\bullet$ supporting a range of execution strategies from fully eager to completely lazy evaluation

$\bullet$ experimenting with diverse storage allocation policies, and implementing persistent multiple address spaces and other features of a modern micro-kernel software architecture such as non-blocking I/O and user-level exception handling.

Sting was developed at the same time as the Smart-card project. It is also just a prototype of an operating system, not a real one. The Sting project was focused on exploring scheduling concurrent threads on a distributed system entirely relying on the features provided by the language itself.

This comment from the MIT Express project group helps describe the significance of our work.

''Strong groups of advanced programming language researchers have been intending, for years, to implement a sophisticated functional language directly on a raw hardware platform. For example, the Fox project at CMU and the Programming Principles group at Bell Labs have at different times begun efforts to port SML/NJ to run on bare hardware. But the details of doing so have been sufficiently forbidding as to prevent these efforts from ever being completed. '' [13]

In this chapter, we will present some experiences and lessons we learned from this project.

Besides the features we mentioned in Chapter 1 as the reasons why we chose SML as the developing language for this project, the Bell Labs implementation of SML has provided some very good and useful tools which make SML a practical programming language.

$\bullet$ SML/NJ provides a tool named Compilation Manager(CM) [29]. The functionality of CM is similar to that of the UNIX make program. For large software, there is always some hierarchy to the modules. By organizing all the SML program modules into a hierarchical group structure, CM can perform separate compilation, automatic dependency analysis, and selective re-compilation.

$\bullet$ SML/NJ provides a function named exportFn to dump a given function into a heap image in a file. By loading the heap image into the runtime system, the computation will begin from this function. Without this tool, our project would be impossible.

$\bullet$ SML/NJ provides an interface to call the C functions in the runtime library from ML code. It is similar to the Java Native Interface(JNI).

In order to test the context switch time between the two co-routines, we added a empty C function into the SML/NJ C interface and added 3 timers into the SML/NJ runtime system to check:

From subtracting the garbage collection time from the time taken for the runtime co-routine, we can get the total time used for the context switch, since the main jobs of the runtime system are garbage collection and context switching. Our network device driver spends about 2.5% of the CPU time in runtime processing and 0.3% time on garbage collection, so overall the driver program spends 2.2% of the time on switching the context between the co-routines. Dividing the total context-switch time by the total switch times, we found that each context-switch between the two co-routines takes about 20 s. We also found sometimes a major garbage collection may last several hundreds of milliseconds. By using a pending interrupt queue to temporarily delay the interrupt handling when the system is doing garbage collection, we won't miss any interrupt but can not handle all of them right away. We think the current garbage collector in SML/NJ has pauses that are too long to meet a real time system requirement.

There is no debugger for SML/NJ. Some SML people think since there are very few runtime errors in SML programs, there is no need to have a debugger. But, these few bugs can bring a lot of headaches when developing large software systems. If we had had a SML/NJ debugger, we would have spent less time doing this project.

Andrew Tolmach and Andrew Appel developed a debugger for SML/NJ in 1990. But since then the SML/NJ has evolved a lot and the debugger was not maintained, so there is no debugger available for the current version of SML/NJ.

Our Intel Ether Express driver program is almost an exact translation from its Linux implementation. This is not a good idea, since the programming structure of the functional language is quite different from the imperative language. Our code looks awkward. But it is not a bad idea for us either, since we didn't have any prior device driver programming experience and don't have enough specification for the card itself. If we were to implement the driver again, we could write it in a much more SML-oriented style.

Since we only implemented three driver programs for different kinds of devices and we have quite limited knowledge about device driver programming, our device driver signature could be significantly improved. We should give a better design interface design for different types of devices. Linux classifies devices as character devices, block devices, SCSI devices, network devices, and so on. Devices can also be classified as read-only, write-only, or read-write. Depending on the type of the device, the system should provide different interfaces.

One of the weakness of SML as a system programming language is its lack of accessibility to the hardware. Device driver programs must access the device registers or memory ports explicitly. As we said in Chapter 3, we added some functions in the SML runtime system to call the machine instructions to fulfill the requirement. There are two limitations of doing this. One is it makes the runtime system more architecture-dependent. Another one is our code does a context-switch between the SML and its runtime system whenever the driver program needs to access the hardware. In our implementation, the system needs to switch context between the SML and its runtime co-routine at least twice to handle an interrupt.

Except for the above problems, we think we made a good starting point on implementing a stand alone operating system in SML.

In this chapter, first we will discuss the future work which this project may take, then we will draw some conclusions from what we have done.

Building an operating system in an advanced language is a fairly complicated topic and has not gotten enough attention. There is a huge amount of work which can be done to improve the system.

We have not done much work in measuring system performance. There are two ways to evaluate and improve the system performance. One way is to find which module in the system has worst performance and improve it.

Besides finding and improving the bottlenecks of the current system, another way is to compare our system with other systems, e.g. Linux, and find which modules of our system are less efficient than corresponding modules of other systems. Our system might need to be improved and optimized.

Besides to improve the performance of the system, another part of the work that still needs to be done is to add more functionality to the system. For example, right now we use the memory management functions of Linux to allocate and deallocate memory. We are interested in building a memory management module in SML. Other functionalities which might be interesting to add into the system are IPC and RPC.

One problem in our implementation is that when an interrupt happens while the runtime co-routine is running, the interrupt can not be handled right away. This is because in our implementation the SML interrupt handler is called by fooling the SML co-routine into believing no more free memory is available. We have no such mechanism for cleanly interrupting the runtime and especially the garbage collector. In some cases, the runtime co-routine may need tens of milliseconds to finish the garbage collection. If control transfers back to the SML co-routine in the middle of a garbage collection, the heap will be corrupted. So, we use a queue to store all the un-handled interrupts in the C interrupt handler in the runtime routine. When the garbage collection is finished, all these queued interrupts will be processed. An elegant way to solve this problem might be to separate the device driver memory space from other processes' memory space, so the system could be able to handle the interrupt on time if the runtime co-routine is not collecting the garbage in the device driver memory space. Right now, we have not implemented this.

Before this system can be useful and successful, it needs applications. It should be possible to run a wide range of applications on this system. Right now, since we have built a driver program for the ethernet network interface card, we are going to run the CMU Foxnet on top of this system. Getting more applications running on this system may reveal some deficiencies in the system.

As stated in Chapter 5, this project has the potential to make a significant impact in the fields of system software programming research and language design research. We believe the work we have done is very exciting, and we hope it will inspire other researchers.

We have found that programming in advanced languages can bring great improvements in software design, implementation and maintenance. This conclusion supports a claim made by the Fox project at CMU, that advanced languages such as SML can represent the architecture of the software more naturally and directly [7] than traditional languages. The relatively recent popularity of Java also confirms this, though we note that SML has more advanced features than Java.

With automatic garbage collection, there is no need to allocate and deallocate memory explicitly in the SML code, so there are fewer potential bugs in the code than in a C program. For example, in our SML operating system we have been unable to detect any memory leaks so far.

Undetected runtime errors are much less likely in a system written in SML than in operating systems written in C or C++. Since there is no way to cast pointers in SML, there is no risk of dangling pointer errors, whereas in an operating system in C or C++, programmers have to expend considerable effort to avoid or detect such errors. The only way to break language invariants is to have an error either in a C interface function or in the call to a C interface function. As described in Section 5.1, SML/NJ provides a mechanism for users to call a C function from SML code. This allows us to control hardware resources from SML, but also makes this the most unsafe part of our program.

We believe an operating system written in SML - or any other safe language - can achieve safety and reliability benefits similar to those provided by a micro-kernel operating system. Instead of running different subsystems in different address spaces, we use the module system of SML to keep separate subsystems from unintentionally affecting each other. This avoids the cost of the context switches that a micro-kernel operating system has to pay. All of our modules (and in the future, user programs as well) run in the same memory space. Separate subsystems can only get service from other subsystems by calling the functions provided in the interface of each module. Separate subsystems have no direct access to the internal state of other subsystems. As long as all modules are compiled using the SML compiler, and as long as the SML compiler and runtime system do not introduce errors, the SML module system is guaranteed to protect internal variables from access by other modules.

The availability in the language of first-class continuations gives us a hardware-independent means for writing code to create and schedule threads. There is no assembly language anywhere in our scheduler code.

One of the goals of this project has also been to explore possible weaknesses of SML as a system programming language for real world projects. We have found that the code generated by the current SML/NJ compiler requires too much memory, and the garbage collection pauses are too long, to write an operating system capable of running an embedded real time system. On the other hand, for a regular operating system we have found the garbage collection pauses to be acceptable - interrupts are queued, but the 32-interrupt queue does not overflow - but the memory requirements are still overly large - we are unable to run any but the most trivial OS on a system with 12M bytes of memory, i.e. with less than 8M bytes of memory dedicated to the SML heap. For the rest, we have been generally satisfied with the features provided by both the SML language and (except for the lack of a debugger) with the SML/NJ compiler.

Our work shows that a real and practical operating system can be programmed in SML/NJ, with performance comparable to that of other real-world operating systems.

	Linux code size	Runtime size
Before	385,201	22,732
After	23,495	12,770

SML	C	Assembly
2,005	32,190	4,075