Today's plan
- Minix RAM disk
- Minix hard disk driver
Minix RAM disk
- RAM disk is block-oriented, just like the real disks, even the
block size is the same
- RAM disk is stored in main memory: a 1MB RAM disk takes up 1MB
of main memory
- write requests write directly to memory
- read requests copy data directly from memory
- four different Minix RAM devices:
- minor device 0: /dev/ram, a true RAM disk on which the root
file system is usually mounted. Memory for this is allocated at boot time
- minor device 1: /dev/mem, a pseudo RAM disk corresponding
to the physical memory of the PC
- minor device 2: /dev/kmem, a pseudo RAM disk corresponding
to the kernel data memory of Minix
- minor device 3: /dev/null, a pseudo RAM disk that throws
all its data away
- the first three RAM disks have some overlap (aliasing -- different
names for the same underlying objects)
Minix RAM disk implementation
- many functions (e.g. close, cleanup) are no-ops
- mem_task calls m_init (which sets up /dev/mem and /dev/kmem
sizes) and then goes to the shared driver_task function
- open checks the device number, then also gives permission to
read/write I/O ports to any process that opens (and therefore has
enough privilege to open) /dev/mem or /dev/kmem
- 3 ioctls are supported:
- set the size of a RAM disk (only FS may do this) -- code on lines 9896
implements a first-fit memory allocation algorithm
- set the address of the MM or FS part of the process table
- get the address of the process table
- prepare records the device's minor number and returns the (completely
fake) geometry
- finish is a nop -- schedule does all the work
- reading or writing of /dev/null is a question of returning
the correct number of bytes read or written (0 read, all written)
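The first-fit allocation mentioned for the set-size ioctl can be sketched as follows. This is a minimal illustration, not the Minix code: the function name first_fit and the representation of free memory as a list of (base, length) chunks are assumptions made for the example.

```python
def first_fit(free_chunks, size):
    """Allocate `size` units from the first chunk that is large enough.

    free_chunks: list of (base, length) tuples in address order
                 (a hypothetical representation of free memory).
    Returns (base, new_free_chunks), or (None, unchanged copy) on failure.
    """
    for i, (base, length) in enumerate(free_chunks):
        if length >= size:
            # Take the allocation from the start of this chunk.
            remaining = list(free_chunks[:i])
            if length > size:
                # Keep whatever is left of the chunk on the free list.
                remaining.append((base + size, length - size))
            remaining.extend(free_chunks[i + 1:])
            return base, remaining
    return None, list(free_chunks)
```

Note that first-fit takes the first chunk that is big enough even when a later chunk would fit exactly; it trades tighter packing for a simpler, faster search.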
Reading or writing a RAM disk block
- clear the optional bit -- all operations on a RAM disk complete immediately
- check for address legality (within the RAM disk) and truncate
the transfer if necessary
- determine the physical address for the transfer, for both the user
buffer and the RAM disk buffer
- perform the physical copy
- compute the return value, the number of bytes transferred
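The steps above can be sketched roughly as follows. This is an illustration only: the class names RamDisk and NullDevice are hypothetical, a bytearray stands in for the memory set aside for the disk, and the physical-address computation is reduced to a slice into that bytearray.

```python
class RamDisk:
    """Toy RAM disk: the whole device is one region of main memory."""

    def __init__(self, size):
        self.data = bytearray(size)

    def _clip(self, position, nbytes):
        # Legality check: nothing is transferred outside the disk, and a
        # transfer that runs past the end is truncated.
        if position < 0 or position >= len(self.data):
            return 0
        return min(nbytes, len(self.data) - position)

    def read(self, position, nbytes):
        count = self._clip(position, nbytes)
        return bytes(self.data[position:position + count])

    def write(self, position, buf):
        count = self._clip(position, len(buf))
        if count > 0:
            self.data[position:position + count] = buf[:count]
        return count  # number of bytes actually transferred


class NullDevice:
    """Toy /dev/null: reads return nothing, writes claim full success."""

    def read(self, position, nbytes):
        return b""

    def write(self, position, buf):
        return len(buf)
```

A write of 100 bytes starting 24 bytes before the end of the disk transfers only 24 bytes, and the caller learns this from the return value.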
Disk drives
- Disk drive controllers can be simple (e.g. for floppies) or
advanced (e.g. for hard disks)
- more advanced controllers allow for simpler drivers, and
simpler controllers require more complex drivers
- on a hard disk with variable geometry, the controller may
hide the actual geometry, in which case the device driver can simply
ignore all the details and address the blocks in linear order
- timing characteristics may vary a lot, e.g. for sample devices
(book p. 202)
| parameter | floppy | hard disk |
| seek time (best) | 6 ms | 4 ms |
| seek time (avg) | 77 ms | 11 ms |
| rotation time | 200 ms | 13 ms |
| motor start time | 250 ms | 9 s |
| 1 sector transfer time | 22 ms | 53 us |
- the time to transfer one block is the sum of the seek time,
the rotational delay (on average, 1/2 the rotation time), and
the block transfer time
- the time to transfer successive blocks is much less than the
time to seek to a given position, so successive blocks should
be read together whenever possible
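As a worked example of the sum above, the following sketch plugs in the figures from the table. It assumes, for simplicity, that one block is one sector; the dictionary and function names are made up for the example.

```python
# Figures from the table above, all converted to milliseconds.
FLOPPY = {"seek_avg": 77.0, "rotation": 200.0, "sector_transfer": 22.0}
HARD_DISK = {"seek_avg": 11.0, "rotation": 13.0, "sector_transfer": 0.053}


def random_access_ms(disk, sectors=1):
    """Average seek + half a rotation + transfer time for `sectors` sectors."""
    return (disk["seek_avg"]
            + disk["rotation"] / 2
            + sectors * disk["sector_transfer"])


def sequential_ms(disk, sectors):
    """Transfer time alone, once the head is already in position."""
    return sectors * disk["sector_transfer"]
```

With these numbers a random one-sector access costs 199 ms on the floppy and about 17.6 ms on the hard disk, while each additional successive sector on the hard disk adds only 0.053 ms.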
Disk scheduling
- because seek and rotational latency can dominate transfer time,
it can be best to reorder block transfers to minimize these
- this can only be done if the device driver has multiple
requests to satisfy, so they can be reordered
- the following algorithms can be adapted to two-dimensional
geometries, but are most easily explained in terms of a 1-dimensional
array of disk blocks
- first-come-first-served (FCFS): no reordering, latency is
one average seek plus half a rotation
- shortest-seek-first (SSF): satisfy next whatever request is nearest
the last request.
- elevator algorithm: satisfy the nearest request in the current
direction. Once there are no more, either start over from the opposite
end, or (elevator-like), reverse direction
- SSF is usually efficient, but may lead to starvation if newly
arriving requests near the head keep being served ahead of older,
more distant requests
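The three policies can be compared on a 1-dimensional array of blocks with a short sketch. The function names and the use of plain block numbers are assumptions for the example; the elevator variant shown is the one that reverses direction.

```python
def fcfs(start, requests):
    """First-come-first-served: serve requests in arrival order."""
    return list(requests)


def ssf(start, requests):
    """Shortest-seek-first: always serve the request nearest the head."""
    pending = list(requests)
    order = []
    position = start
    while pending:
        nearest = min(pending, key=lambda block: abs(block - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


def elevator(start, requests, going_up=True):
    """Serve everything in the current direction, then reverse."""
    if going_up:
        ahead = sorted(b for b in requests if b >= start)
        behind = sorted((b for b in requests if b < start), reverse=True)
    else:
        ahead = sorted((b for b in requests if b <= start), reverse=True)
        behind = sorted(b for b in requests if b > start)
    return ahead + behind


def total_seek(start, order):
    """Total head movement, in blocks, for a given service order."""
    distance = 0
    position = start
    for block in order:
        distance += abs(block - position)
        position = block
    return distance
```

With the head at block 11 and pending requests 1, 36, 16, 34, 9, 12, the total head movement is 111 blocks for FCFS, 61 for SSF, and 60 for the elevator moving upward first.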
Linux I/O scheduling
- anticipatory I/O scheduler predicts whether the next read will
be sequential (based on process statistics) and if so,
does nothing for 6ms after a read, optimizing for the case where
another read will request a nearby block
- deadline I/O scheduler keeps three queues of requests:
- sorted requests (for elevator), ordered by block position
- read requests, in FIFO order, each satisfied after at most 0.5s
- write requests, in FIFO order, each satisfied after at most 5s
- reads and writes are taken from the sorted queue (and removed
from the corresponding read or write queue), until/unless one of
the deadlines has expired, in which case the corresponding read/write
takes priority
- so elevator is used in most cases, but reads or writes are
not delayed too long
- Linux I/O scheduling is implemented in a device-independent
fashion, unlike Minix where the I/O scheduling is device-dependent
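The interplay of the deadline scheduler's three queues can be sketched as below. This is a simplified model, not the Linux implementation: the class and method names are hypothetical, the expiry times are passed in as parameters, times are plain numbers in seconds, and the sorted queue always yields the lowest-numbered pending block (a one-directional stand-in for the elevator).

```python
import bisect
from collections import deque


class DeadlineScheduler:
    def __init__(self, read_expire, write_expire):
        self.expire = {"read": read_expire, "write": write_expire}
        # Requests are (block, kind, deadline) tuples, kept in block order.
        self.sorted_queue = []
        # The same requests again, in arrival order, one queue per kind.
        self.fifo = {"read": deque(), "write": deque()}

    def add(self, kind, block, now):
        request = (block, kind, now + self.expire[kind])
        bisect.insort(self.sorted_queue, request)
        self.fifo[kind].append(request)

    def next_request(self, now):
        # An expired request at the head of a FIFO queue takes priority.
        for kind in ("read", "write"):
            queue = self.fifo[kind]
            if queue and queue[0][2] <= now:
                request = queue.popleft()
                self.sorted_queue.remove(request)
                return request
        if not self.sorted_queue:
            return None
        # Normal case: take the next request in block order.
        request = self.sorted_queue.pop(0)
        self.fifo[request[1]].remove(request)
        return request
```

Because each request sits in both the sorted queue and one FIFO queue, removing it from one must also remove it from the other, which is the bookkeeping the notes describe.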
Minix hard disk driver
- BIOS supports hard disk operations, but without multithreading
and only in 16-bit real mode (designed for MS-DOS)
- instead, Minix supplies several hard disk drivers of its own:
- BIOS driver, uses BIOS support
- ESDI driver
- XT driver
- AT driver
- each of these may be compiled in or out, and, if compiled in,
may be selected at boot time (first one is used if none selected)
- probing is possible, but Minix does not support it due to the
variety of devices that could be present
- BIOS may be most portable, but also slowest
- AT driver supports drives from 80286 computers through EIDE (GB
capacity)
- driver initializes without accessing the disk, so booting a system
with a hard disk driver but no actual hard disk drive is possible as
long as the drive is never accessed: the disk is first accessed
when w_do_open is called.
AT hard disk control flow
- w_do_open is called to open the device; it calls w_prepare,
which sets global variables to refer to the disk being accessed,
then w_identify, which fills in the geometry and sets up the
interrupt handler (w_handler), and finally loads the partition
information
- w_schedule groups adjacent requests into a single
request (much simpler than an elevator), up to a given maximum,
then waits for the existing request to complete, then loops to
make the next request
- w_finish resets the controller (to enter a new command),
then tells it to actually transfer the bytes (bytes transferred using
PIO, not DMA)
- on a read, w_finish calls w_intr_wait, which calls
receive (HARDWARE, &mess) in a loop, until the returned
status is idle (i.e. not BUSY) -- so this returns after the first
hard disk interrupt at which the disk is not busy
- on a write, w_finish:
- calls waitfor (or, equivalently, w_waitfor) which busy-waits until a bit
(STATUS_DRQ, for Data transfer ReQuest) is set in the controller
status register (or until a 32s timeout is detected), then
- copies the data to the device, then
- calls w_intr_wait to wait until the controller interrupts
to signal it is ready for the next command
- for either reads or writes, the next I/O descriptor is then
selected if this I/O descriptor has been satisfied
- w_finish also checks for errors, resetting the device
(and giving up if the read was optional) after two failures, and failing
the request after four failures
- the reset is only flagged at this point; it is actually performed
when the next I/O operation begins
- in case of timeout, the maximum read size is decreased (from n to
8 sectors, and from 8 sectors to 1), in case that was what tripped up
the drive
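Two of the policies above, grouping adjacent requests up to a maximum and shrinking that maximum after a timeout, can be sketched as follows. This is only an illustration of the idea: the function names, the (position, nbytes) request representation, and the 512-byte sector size are assumptions for the example, not the driver's actual data structures.

```python
SECTOR_SIZE = 512  # bytes per sector, assumed for this example


def coalesce(requests, max_bytes):
    """Merge runs of adjacent requests into single larger requests.

    requests: list of (position, nbytes) pairs in arrival order.
    A request is merged into the previous one only if it starts exactly
    where the previous one ends and the result stays within max_bytes.
    """
    merged = []
    for position, nbytes in requests:
        if merged:
            last_pos, last_len = merged[-1]
            if (last_pos + last_len == position
                    and last_len + nbytes <= max_bytes):
                merged[-1] = (last_pos, last_len + nbytes)
                continue
        merged.append((position, nbytes))
    return merged


def shrink_after_timeout(max_bytes):
    """Reduce the maximum transfer size: n -> 8 sectors -> 1 sector."""
    if max_bytes > 8 * SECTOR_SIZE:
        return 8 * SECTOR_SIZE
    return SECTOR_SIZE
```

Unlike an elevator, this never reorders anything: requests that are not adjacent to their predecessor are simply left as separate transfers.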