Today's Outline
- filesystem and disk access layers
- the Virtual File System (VFS) layer
- VFS objects: superblocks, inodes, dentries, file objects, open file tables
- Block I/O
Implementation layers for disk and file access
- disks serve a number of purposes, including
implementing file systems and swap (and memory mapping a file combines
the two)
- implementers find it convenient to think in terms of abstractions
- low-level abstractions, e.g. "read disk block n", are used to
implement higher-level abstractions, e.g. "read bytes a through b of
file x".
- in an OS, the highest-level abstraction implements the system
calls, the lowest-level abstraction controls the hardware
- the following is the layering for the Linux disk and file
access system (note that this picture is not in the book):
The Virtual File System layer (VFS)
- The Virtual File System layer (VFS) supports file system
operations across a variety of file systems, including
- native file systems such as ext2 and ext3
- foreign file systems such as FAT
- distributed file systems such as NFS, Coda, and Andrew
- in-memory filesystems such ramfs
- the VFS is unix-oriented, in that it expects (for example)
that each file will be represented by an inode and that a directory
will be stored in a file
- if a file system does not have inodes (for example, the file
metadata is stored in the file itself, rather than in a separate block),
the inode object in memory is built by the file system code to reflect
the information in the file itself
VFS objects
- the superblock object describes
a mounted filesystem. There is one per mounted filesystem
- the inode object describes a file. There is one per file
being accessed, and inodes may be cached while files are not accessed
- the dentry object is a cached file name, with a pointer to the
corresponding inode
- the file object represents an open file. It includes a pointer to
a dentry which has a pointer to the inode
- each process has a files_struct (open file table)
object which represents
all files opened by that process (p. 208). This includes
an array of file descriptors, the index into which is the fd
returned by an open call
- each process also has an fs object that records the root and current
directories, and a namespace object which allows each process to
have a different set of mounted devices
VFS object-oriented philosophy
- each structure includes a pointer to an _op structure
(e.g. i_op for the inodes) which implements functions
to be used for that structure
- each such function pointer defaults to NULL, in which case the
system default is used
- if the function pointer is defined (e.g. i_op->rename != NULL
in the inode structure), then that function is used
- this is a form of object-oriented programming in C: each specific
file system implementation can use the default methods or override them
- this is implemented entirely by the programmers, not by the language,
and so (a) may allow for more flexibility and efficiency, and (b) permits
more mistakes
- as in many object oriented systems, memory is allocated and deallocated
frequently, e.g. whenever a file is opened. This memory comes from
the slab layer
VFS requirements
- implementing a new file system under Linux simply means implementing
the required VFS functions
- for example:
- write_super should save a copy of the superblock
- truncate should make the file corresponding to the
given inode have size zero
- d_revalidate should confirm that the dentry is still
valid (the default NULL simply returns that dentries are valid)
- aio_read starts an asynchronous read operation
- writev writes data that is spread across several
buffers (i.e. in a vector of buffers)
- the VFS code makes certain assumptions about what kind of
things the specific functions will do, but is not dependent on
how exactly they are implemented
- for a disk-based file system, the VFS calls eventually have
to be mapped to calls to the block I/O subsystem, which is shared
among all parts of the kernel which use the disk
Block I/O
- the essential function of block I/O is to transfer
data blocks from the disk (disk blocks) to memory
(buffers)
- specific aspects of data storage on disk are handled outside
the block I/O system: for example, finding the correct block (file
system), allocating buffers (memory allocation), and actually
accessing the disk (device driver)
- buffers in memory can be clean, that is, identical
to the corresponding block on disk, or dirty, that is,
different from the corresponding block on disk
- Linux also has locked buffers, whose data is in
the process of being transferred to or from disk
Block I/O requirements
- reads are probably holding up an application process, so should
complete as soon as possible
- writes must be completed for correctness and to free up memory,
but probably do not affect performance as much as reads
- for highest throughput, it is best to minimize seeks:
- minimize the number of times the head is moved by reading the
largest possible unit at once: merge adjacent requests so they
can be done simultaneously
- minimize the distance the head must seek: sort requests so
near requests are satisfied before far requests
- maximizing throughput is not sufficient, since it is important
that no single request be starved for too long -- this could
be rephrased as, "if you wait long enough, every process is a real-time
process" (analogously, no process would need special real-time treatment
on an infinitely fast machine)
- these are conflicting requirements, leading to tradeoffs and
approximate solutions