Overview
- Minix fork, exit, and wait
- Minix exec
- modules
- Unix/Posix signals
- Minix signals
Minix fork
- fork can fail if:
- there is no process table entry (last few table entries reserved
for root), or
- the required memory cannot be allocated
if either condition holds, fork fails before allocating
any resources (similar to, but not the same as, two-phase commit
-- check first, then allocate)
- copy the data and stack segments (if the text is not shared, it
is included in the data segment) to the newly allocated memory
- copy the parent's process table entry to the child's
- modify the child's parent, traced flags, exit and signal status,
and PID (next free ID < 30,000: get_free_pid(), from
servers/pm/utility.c, not in book)
- ask the kernel to fork the process
- ask the FS to duplicate the file descriptors
- ask the kernel to map the process
- send a message to the child with return value 0 (p. 882, line 18501 --
setreply is on p. 877)
- return to the parent with the child's PID
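Seen from user space, the last two steps are the two return values of fork. The following is a minimal sketch on any POSIX system, not Minix PM code; the helper name fork_and_collect is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, let the child exit with `code`, and
 * return the exit status the parent collects (or -1 on failure). */
int fork_and_collect(int code)
{
    int status;
    pid_t pid;

    assert(code >= 0 && code <= 255);
    pid = fork();
    if (pid < 0)
        return -1;          /* fork failed: no table slot or no memory */
    if (pid == 0)
        _exit(code);        /* child: fork returned 0 */

    /* parent: fork returned the child's PID */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, fork_and_collect(7) should yield 7 in the parent, because the child sees a return value of 0 from fork and exits immediately.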
Minix exit
- a process terminates when it is killed (depending on the signal
and the signal handler) or when it calls exit
- no error checking on exit -- exit always succeeds
- cancel alarms and signals
- tell FS to close all the open files and free process slot
- tell kernel to free process slot and take care of canceling message
sends
- free the memory of the process
- if the parent is waiting, wake up the parent and free the process slot
- if no parent is waiting, leave the process in a zombie state,
with only a process table slot still allocated, and signal the parent
- either way, reparent any of this process's children to init
- in-class discussion: is it more important for exit to be fast, or
for fork to be fast, or are they equally important?
Minix wait
- wait can specify one of three possible arguments to wait for:
- for a specific process (PID)
- for any child
- for any child within a given process group
- a wait completes when the given process (or one of the given processes)
has terminated, or is being traced and has stopped
- first loop through the process table to find a child that matches
the arguments and is a zombie -- if found, clean up this process and
send a message back to the caller and return
- if found a child that matches and is stopped, send a message
back to the caller and return
- if no child is ready to report, but one or more matching processes
are running, leave the caller waiting (on receive in the system
call), mark the caller as waiting, and return, unless WNOHANG
was specified
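The WNOHANG case can be observed from user space. The sketch below uses the POSIX waitpid call rather than the Minix PM internals, and the function name wnohang_demo is hypothetical. The first argument of waitpid selects the three forms above: a positive PID for a specific process, -1 for any child, and 0 or a negative process group ID for any child in a process group.

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if a WNOHANG wait reports "nothing
 * ready" while the child is still running, and a later blocking wait
 * reaps the child with exit code 3. Returns -1 on setup failure. */
int wnohang_demo(void)
{
    int fds[2], status = 0;
    pid_t pid, polled, reaped;

    if (pipe(fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char c;
        close(fds[1]);
        /* blocks until the parent closes its write end (EOF) */
        if (read(fds[0], &c, 1) < 0)
            _exit(1);
        _exit(3);
    }
    close(fds[0]);

    /* child is still blocked in read, so WNOHANG returns 0 at once */
    polled = waitpid(pid, &status, WNOHANG);

    close(fds[1]);                       /* let the child exit */
    reaped = waitpid(pid, &status, 0);   /* blocking wait for this PID */
    assert(reaped == pid || reaped == -1);

    return polled == 0 && reaped == pid &&
           WIFEXITED(status) && WEXITSTATUS(status) == 3;
}
```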
Minix exec
- part of exec is implemented in the exec library call,
/usr/src/lib/posix/_execve.c -- this builds the initial
stack in a buffer
- it might be dangerous to trust a user library to build the stack
right, but the stack is (over)writable by the user code anyway, so there
is no loss of security
Minix exec implementation
- check to see that the stack size is reasonable
- copy the stack into an internal buffer (before freeing the memory)
- check to see that the file is accessible, by asking the file system,
including using system calls -- the file system server has some special
code to make file system calls from the PM look like they are from the
process itself
- read the executable file header (see p. 889), which also checks
to see whether this is a script (line 18938)
- scripts require executing a different file, with this one as an
argument (line 18801)
- look to see if the text can be shared (find_share, p 895)
- allocate the new memory (19019 on p. 891, called from 18812
on p. 887) -- this is the commit point,
if this succeeds, we can never return, because there is no longer an old
process image to return to
- relocate the stack (see below -- patch_ptr, not
patch_stack) and copy it to the newly allocated
stack segment
- read in the text (unless shared) and the initialized data
- change the effective user/group ID if executing a setuid/setgid file
- set default signal handlers
- ask FS to close "close-on-exec" files
- ask the system to initialize the new stack pointer, with the
initial valid return address and to make the process ready to execute
(sys_exec, line 18878, and kernel/system/do_exec.c, p. 764)
- return, no need to send any messages
Minix exec allocation
- new_mem, page 890
- compute needed sizes, all in multiples of clicks (look at the
arithmetic carefully -- ceiling computation)
- make sure sizes are reasonable, e.g. the data+bss segment does
not overlap the stack segment (gap < 0, line 19016)
- allocate the new memory, failing if it is not available. This is
overly conservative, since the new memory might become available if we
returned the old memory first, but once the old memory has been
returned, exec can no longer fail.
- free the current text (unless shared) and data + bss + stack segments
- initialize the three segment descriptors (T,
D, and S)
- ask the kernel to map these segments
- clear the uninitialized parts of these segments
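The ceiling computation and the gap check can be sketched as follows. This is an illustration, not the new_mem source: the click size of 4096 bytes and the names bytes_to_clicks and layout_fits are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define CLICK_SHIFT 12                          /* assumed click size: 4096 bytes */
#define CLICK_SIZE  ((size_t)1 << CLICK_SHIFT)

/* Round a byte count up to a whole number of clicks (ceiling division). */
size_t bytes_to_clicks(size_t bytes)
{
    assert(bytes <= (size_t)-1 - (CLICK_SIZE - 1));   /* no overflow */
    return (bytes + CLICK_SIZE - 1) >> CLICK_SHIFT;
}

/* Hypothetical check: returns 1 if data+bss and the initial stack fit in
 * the total allocation, i.e. the gap between them is not negative. */
int layout_fits(size_t data_bytes, size_t bss_bytes,
                size_t stack_bytes, size_t total_bytes)
{
    long data_clicks  = (long)bytes_to_clicks(data_bytes + bss_bytes);
    long stack_clicks = (long)bytes_to_clicks(stack_bytes);
    long total_clicks = (long)bytes_to_clicks(total_bytes);
    long gap = total_clicks - data_clicks - stack_clicks;

    return gap >= 0;    /* gap < 0 means data+bss would overlap the stack */
}
```

Adding CLICK_SIZE - 1 before shifting is what turns the truncating shift into a ceiling: a request of one byte still costs one full click.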
Minix stack relocation
- patch_ptr, page 892
- stack built by user library assumes the initial
stack pointer will be zero
- the initial stack pointer is not at virtual address zero -- the
virtual address depends on the size of the text, data, bss, gap, and
stack segments
- PM assumes that anything in the initial stack is either a 0 --
a null pointer, used e.g. to mark the end of an arguments list -- or
a relative pointer
- starting from the top of the stack
- patch_ptr adds the segment base to any nonzero entry (pointer)
- until it has seen two null pointers, one to terminate the end of
the args list, the other to terminate the environment list
- sanity check is done to make sure this update doesn't run off the
end of the stack
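The relocation loop can be sketched in a few lines. This is a simplified illustration, not the patch_ptr source: the function name relocate_stack is hypothetical, the stack is modeled as an array of words, and the first word is assumed to hold argc and is therefore skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add `base` to every nonzero word (a relative pointer) until two null
 * words have been seen: the end of the argument list and the end of the
 * environment list. Returns the number of pointers patched, or -1 if
 * the scan would run off the end of the stack image. */
int relocate_stack(uintptr_t *stack, size_t nwords, uintptr_t base)
{
    size_t i = 1;           /* assumption: word 0 is argc, not a pointer */
    int nulls = 0, patched = 0;

    assert(stack != NULL);
    while (nulls < 2) {
        if (i >= nwords)
            return -1;      /* sanity check: ran off the end */
        if (stack[i] != 0) {
            stack[i] += base;   /* relative pointer becomes absolute */
            patched++;
        } else {
            nulls++;            /* null pointer: end of a list */
        }
        i++;
    }
    return patched;
}
```

Words after the second null (the strings themselves) are left untouched, which is why the loop only needs to count terminators rather than know the argument count.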
Program execution
- when a C program begins execution, it starts at a C runtime startup
routine, crtso, whose entire function is to call main
- crtso pushes the addresses of the three arguments to
main, argc, argv, and envp, then calls
main
- the final stack is as in Figure 4-38d (page 436), remembering
that the stack grows downward
Modules
- exec and the stack relocation are somewhat similar to what
is needed for a module
- a module is a loadable piece of code that executes in kernel
space
- the module must be relocated when loaded, or must be position-independent
code, since it is loaded dynamically at an arbitrary location in kernel space
- entry points for each module must be recorded inside the kernel,
for example an initialization routine which sets everything up, including
e.g. interrupt handlers and read/write functions
- however, the module is not (e.g. in Linux) a separate process/task,
so it does not need its own stack -- it executes on the kernel stack that
is active when it is called
- if the kernel is multithreaded, however, the module must be
coded in a thread-safe way, e.g. locking global data structures
before modifying them
- because it executes with kernel privilege, a module can be a
correctness or security risk
- modules make it much easier to install new drivers, since no
kernel recompilation is needed
Unix signals
- any process may set a signal handler for a given signal
typedef void (*sighandler_t)(int);
sighandler_t signal(int signum, sighandler_t handler);
- the argument to the signal handler is the signal number, so
the same signal handler can handle multiple signals
- the return value of signal is the old signal handler
- the signal handlers default to SIG_DFL, which:
- aborts if the signal is HUP, INT, PIPE, ALRM, TERM, USR1, USR2
(and other non-Posix signals such as POLL)
- aborts and dumps core if the signal is QUIT, ILL, ABRT, FPE, SEGV, etc
- returns (ignoring the signal) if the signal is CHLD, URG, or others
See signal(7) for details
- the signal handlers can also be set to SIG_IGN, which ignores
the signal (except that KILL and STOP cannot be caught, blocked, or ignored)
- "Unix" signal handling varies among systems, but typically the
signal is blocked during execution of the signal handler
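Because the handler receives the signal number, one handler can serve several signals. The sketch below shows this with the standard signal and raise calls; the names last_signal, on_signal, and raise_and_observe are hypothetical.

```c
#include <assert.h>
#include <signal.h>

/* Written by the handler, read by normal code. */
static volatile sig_atomic_t last_signal = 0;

/* One handler shared by several signals: the argument says which one. */
static void on_signal(int signum)
{
    last_signal = signum;
}

/* Hypothetical helper: install the shared handler for `signum`, raise
 * the signal, and return the number the handler recorded (-1 on error).
 * The previous handler, returned by signal(), is restored afterwards. */
int raise_and_observe(int signum)
{
    void (*old)(int);
    int seen;

    assert(signum == SIGUSR1 || signum == SIGUSR2);
    old = signal(signum, on_signal);
    if (old == SIG_ERR)
        return -1;

    last_signal = 0;
    raise(signum);          /* handler runs before raise() returns */
    seen = last_signal;

    signal(signum, old);    /* put the old handler back */
    return seen;
}
```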
Unix/Posix sigaction
struct sigaction {
void (*sa_handler)(int);
void (*sa_sigaction)(int, siginfo_t *, void *);
sigset_t sa_mask;
int sa_flags;
void (*sa_restorer)(void);
};
int sigaction(int signum, const struct sigaction *act,
struct sigaction *oldact);
int sigprocmask(int how, const sigset_t *set, sigset_t *oldset);
int sigpending(sigset_t *set);
int sigsuspend(const sigset_t *mask);
- sigaction is defined portably across Unix-like systems
- the second parameter (if non-null) is used to define a signal
handler, either a sa_handler or a
sa_sigaction, but not both
- the sa_handler can be set to SIG_DFL or SIG_IGN
- the mask specifies which signals should be blocked during execution
of the handler
- the flags can be used to specify
- whether we are specifying a sigaction or a handler
- that certain signals should be ignored (e.g. child stop signals)
- whether the signal handler should be permanently installed, or
executed at most once
- whether system calls should continue across a signal
- whether further instances of the signal being handled should be deferred
while the signal handler is executing
- sigprocmask lets us block or unblock the given signals
- sigpending returns raised signals that are currently blocked
- sigsuspend suspends the process after temporarily installing the
given signal mask
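The interaction of sigaction, sigprocmask, and sigpending can be sketched as follows: a blocked signal stays pending until it is unblocked, and only then does the handler run. The names deliveries, count_usr1, and pending_then_delivered are hypothetical, and the sketch assumes SIGUSR1 is not already blocked by the caller.

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations */
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t deliveries = 0;

static void count_usr1(int signum)
{
    (void)signum;
    deliveries++;
}

/* Returns 1 if SIGUSR1 was reported pending while blocked and was
 * delivered exactly once after being unblocked; -1 on setup failure. */
int pending_then_delivered(void)
{
    struct sigaction act, old;
    sigset_t block, prev, pend;
    int was_pending, delivered;

    memset(&act, 0, sizeof act);
    act.sa_handler = count_usr1;        /* a handler, not a sa_sigaction */
    assert(sigemptyset(&act.sa_mask) == 0);
    sigaddset(&act.sa_mask, SIGUSR2);   /* also block USR2 in the handler */
    act.sa_flags = 0;
    if (sigaction(SIGUSR1, &act, &old) < 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &prev);

    deliveries = 0;
    raise(SIGUSR1);                     /* blocked, so it becomes pending */
    sigpending(&pend);
    was_pending = sigismember(&pend, SIGUSR1) == 1 && deliveries == 0;

    sigprocmask(SIG_SETMASK, &prev, NULL);  /* unblock: handler runs now */
    delivered = deliveries == 1;

    sigaction(SIGUSR1, &old, NULL);     /* restore the previous action */
    return was_pending && delivered;
}
```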
Minix signal handling
- each process table entry in PM has the following sets of signals:
- signals to block (the sigmask)
- signals to ignore
- signals pending (i.e. previously blocked, not yet delivered)
- signals to catch (handle)
- for each signal, a mask of signals to block while the signal handler
is executing
- a second copy of the sigmask (which signals to block) so sigsuspend
can replace it with a temporary sigmask
- in addition, each process table entry has an optional signal
handler for each signal
Signal handling and system calls
- a process making a system call is blocked on receive
- what is the appropriate behavior of a long-running system call
(e.g. read from a pipe or a socket) when its process gets
a signal?
- some OSs think the system call should be ended with
errno = EINTR, so the caller can (or must) call it again
- some OSs execute the signal handler while the main part of
the process is blocked on the system call -- a system call return
then has to make sure it is returning to the right place
- Linux and Solaris (at least) give a choice via the
SA_RESTART flag (in sigaction), which if
set, means slow system calls are automatically restarted
- Minix completes/aborts the system call, by sending the process a
reply message with result code EINTR
- how about other OSs?
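The EINTR behavior, and the retry loop it forces on the caller, can be sketched on a POSIX system as follows. The handler is installed without SA_RESTART, so a blocking read on an empty pipe is expected to fail with EINTR when the timer signal arrives. The names on_alarm, read_retry, and read_sees_eintr are hypothetical, and the 20 ms timer period is an arbitrary choice.

```c
#define _XOPEN_SOURCE 700   /* request POSIX and XSI declarations */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static void on_alarm(int signum) { (void)signum; }  /* only needs to run */

/* The "call it again" idiom: retry a read interrupted by a signal. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);
    return n;
}

/* Returns 1 if a blocking read fails with EINTR when SIGALRM is handled
 * without SA_RESTART; -1 on setup failure. */
int read_sees_eintr(void)
{
    int fds[2], err;
    char c;
    ssize_t n;
    struct sigaction act, old;
    struct itimerval tick, off;

    if (pipe(fds) < 0)
        return -1;
    memset(&act, 0, sizeof act);
    act.sa_handler = on_alarm;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;                   /* SA_RESTART deliberately not set */
    if (sigaction(SIGALRM, &act, &old) < 0)
        return -1;

    memset(&tick, 0, sizeof tick);
    tick.it_value.tv_usec = 20000;      /* first signal after 20 ms */
    tick.it_interval.tv_usec = 20000;   /* repeat, in case one fires early */
    memset(&off, 0, sizeof off);

    setitimer(ITIMER_REAL, &tick, NULL);
    n = read(fds[0], &c, 1);            /* pipe is empty: blocks until signal */
    err = errno;
    setitimer(ITIMER_REAL, &off, NULL); /* stop the timer */

    sigaction(SIGALRM, &old, NULL);
    close(fds[0]);
    close(fds[1]);
    return n == -1 && err == EINTR;
}
```

Setting SA_RESTART in sa_flags would instead make this particular read resume after the handler returns, so read_retry is only needed when that flag is absent or unsupported.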
Minix signal handling implementation
- sig_proc (p. 904) delivers a signal, and may be invoked for signals
originating in the kernel or in the process manager (PM)
- sig_proc
- makes sure the process is not dead,
- returns if the signal is ignored or blocked
(if blocked, after recording it in the pending signals), then
- checks for a handler, and if so, executes it, or
- if no handler, kills the process (calling pm_exit), optionally
dumping core
- to execute a signal handler, sig_proc builds two large
structures on the stack (Fig 4-49 on p. 392):
- a sigcontext structure holds a copy of significant
parts of the kernel process table entry, particularly all the saved registers
- a sigframe structure holds a valid stack frame for
the execution of sigreturn, including a return address and some parameters
because these structures are large, the stack may overflow, in which
case the entire process is killed
- the signal is removed from the set of pending signals
- if the process is paused on a system call (including but
not limited to pause), it is unblocked by sending it
a reply -- the stack is set up so that the signal handler will execute
first, and only then will the system call complete
- the signal itself is delivered by sys_sigsend, p.759.
Since the PM set up the stack correctly, all that is needed is the
appropriate context switch
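The decision sequence in sig_proc can be sketched with the per-process signal sets modeled as bit masks. This is an illustration of the logic described above, not the Minix source: the struct proc_sigs, the outcome names, and sig_proc_sketch are all hypothetical, and building the sigcontext and sigframe structures is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t sigbits;                       /* one bit per signal 1..32 */
#define BIT(s) ((sigbits)1 << ((s) - 1))

/* Hypothetical subset of the per-process signal state kept by the PM. */
struct proc_sigs {
    sigbits ignore;     /* signals to ignore */
    sigbits blocked;    /* signals to block (the sigmask) */
    sigbits pending;    /* blocked signals not yet delivered */
    sigbits caught;     /* signals with a handler installed */
};

enum outcome { OUTCOME_IGNORED, OUTCOME_PENDING, OUTCOME_HANDLER, OUTCOME_KILL };

/* Decide what happens to signal `signo` for process `p`. */
enum outcome sig_proc_sketch(struct proc_sigs *p, int signo)
{
    assert(p != NULL && signo >= 1 && signo <= 32);

    if (p->ignore & BIT(signo))
        return OUTCOME_IGNORED;         /* ignored: nothing to do */
    if (p->blocked & BIT(signo)) {
        p->pending |= BIT(signo);       /* blocked: record for later */
        return OUTCOME_PENDING;
    }
    p->pending &= ~BIT(signo);          /* about to be delivered */
    if (p->caught & BIT(signo))
        return OUTCOME_HANDLER;         /* run the user's handler */
    return OUTCOME_KILL;                /* default action: kill the process */
}
```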
Minix signal handling functions
- check_pending is called whenever the set of signals for
a process may have changed, and calls sig_proc as appropriate
- do_sigaction, do_sigpending,
do_sigprocmask, do_sigsuspend,
and do_sigreturn
do the bit set and handler table manipulation, contacting the
kernel or calling check_pending as appropriate
- check_sig is called to make sure the signal can be sent,
and to send it to a group of processes if appropriate (e.g. by the kernel
when rebooting)
- do_kill and ksig_pending are called to send a signal
(from user space and from the kernel, respectively), and eventually
call check_sig. The major difference between them is the kernel
may send several signals at once
- do_alarm and set_alarm turn alarms on and off
by calling pm_set_timer and pm_expire_timers and,
if necessary, contacting the system task
- dump_core uses several system calls to create
and write a core file
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 License.