SPARC Subroutines
- Open and Closed Subroutines
- Call and Return
- Parameter Passing
- Register Organization
- Stack Frames
- Calling Conventions
- Return Conventions
- Returning structs
- Pointer Arguments
Open Subroutines
- Macros
- Defined using m4, a macro assembler, C #define macros,
C++ "inline" functions
- No separate code is produced for the subroutine
- The body is in-lined at each point of "call"
- Very efficient, very flexible
- Cannot be used recursively
- Can lead to large code (one copy per use)
Open Subroutine (Macro) Examples
- C #define macros
- #define max(a, b) (((a)>(b))?(a):(b))
- cmul macro
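- cmul(%l0, 100, %g1, %l0) expands to (%g1 is a scratch register):
(100 decimal = 0x64 = 0110 0100 binary = 128 - 32 + 4)
sll %l0, 2, %l0     ! %l0 = 4x
sll %l0, 3, %g1     ! %g1 = 32x
sub %l0, %g1, %l0   ! %l0 = 4x - 32x = -28x
sll %g1, 2, %g1     ! %g1 = 128x
add %l0, %g1, %l0   ! %l0 = -28x + 128x = 100x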
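The expansion can be checked with a small C sketch that mirrors it one statement per instruction. The function name cmul100 is hypothetical, and uint32_t stands in for a 32-bit SPARC register so that shifts and the subtract wrap modulo 2^32 the same way.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors cmul(%l0, 100, %g1, %l0): one C statement per instruction.
   uint32_t plays the role of a 32-bit register, so intermediate
   "negative" values simply wrap modulo 2^32. */
uint32_t cmul100(uint32_t l0)
{
    uint32_t g1;            /* scratch, like %g1 */
    l0 = l0 << 2;           /* sll %l0, 2, %l0  : l0 = 4x              */
    g1 = l0 << 3;           /* sll %l0, 3, %g1  : g1 = 32x             */
    l0 = l0 - g1;           /* sub %l0, %g1, %l0: l0 = 4x - 32x = -28x */
    g1 = g1 << 2;           /* sll %g1, 2, %g1  : g1 = 128x            */
    l0 = l0 + g1;           /* add %l0, %g1, %l0: l0 = -28x + 128x     */
    return l0;              /* = 100x (mod 2^32)                      */
}
```

For example, assert(cmul100(7) == 700) holds: the subtract is fine because 4x - 32x + 128x = 100x even though %l0 is temporarily negative.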
Closed Subroutines
- "functions", "procedures", "methods"
- Defined using labels (for entry point), ret to exit
- Code is produced for the subroutine
- call is used to go to the subroutine
- Less efficient, less flexible, more general
- Can be used recursively
- Compact code
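To contrast the two kinds of subroutine in C: the max macro from the earlier example is an open subroutine pasted in at its use, while max_of (a hypothetical name) is a closed subroutine that calls itself. The macro could not do that, because the C preprocessor does not re-expand a macro inside its own expansion.

```c
#include <assert.h>

/* Open subroutine: the preprocessor pastes this text at every use. */
#define max(a, b) (((a)>(b))?(a):(b))

/* Closed subroutine: one copy of the code, entered by a call.
   Each call gets its own frame, so it may call itself. */
int max_of(const int *v, int n)
{
    assert(n > 0);
    if (n == 1)
        return v[0];
    int rest = max_of(v + 1, n - 1);   /* recursive call */
    return max(v[0], rest);            /* macro expanded in place */
}
```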
Call to a Label
- call .mul places the address of the call itself into
%o7, and jumps to .mul
- the instruction after call (the instruction in the
delay slot) is executed before the branch takes effect
- (no "annulled" calls)
- if call is at label x:, %o7 holds x and the subroutine
returns to x + 8 (past the call and its delay slot)
- first instruction in a function is usually save
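The delay slot and the x + 8 return point can be traced with a toy C model of SPARC's two program counters, pc and npc. The cpu_t type and exec_* functions are hypothetical and model only what is needed here; the return uses retl (jmpl %o7 + 8, %g0), the form used by a routine that never executed save.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of SPARC's pc/npc pair. A control transfer only changes npc,
   so the instruction at the old npc (the delay slot) still executes. */
typedef struct {
    uint32_t pc;   /* instruction executing now */
    uint32_t npc;  /* instruction executing next */
    uint32_t o7;   /* written by call */
} cpu_t;

/* call target: %o7 gets the call's own address; the delay slot runs next. */
void exec_call(cpu_t *c, uint32_t target)
{
    assert((target & 3u) == 0);   /* instructions are word aligned */
    c->o7 = c->pc;
    c->pc = c->npc;               /* delay-slot instruction */
    c->npc = target;              /* then the subroutine */
}

/* Any instruction that does not transfer control. */
void exec_plain(cpu_t *c)
{
    c->pc = c->npc;
    c->npc += 4;
}

/* retl = jmpl %o7 + 8, %g0: return past the call and its delay slot. */
void exec_retl(cpu_t *c)
{
    c->pc = c->npc;               /* retl's own delay slot */
    c->npc = c->o7 + 8;
}
```

Starting with a call at x = 0x1000 to a one-instruction subroutine (a retl) at 0x2000, the model executes 0x1004 (the delay slot), 0x2000, 0x2004 (retl's delay slot), and resumes at 0x1008 = x + 8.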
SPARC registers
- The SPARC programming model has 32 (32-bit) general purpose registers:
- 8 global (g0..g7, r0..r7, r0 is always zero)
- 8 output (o0..o7, r8..r15, o6 is stack pointer, o7 is
return address of called subroutines)
- 8 local (l0..l7, r16..r23)
- 8 input (i0..i7, r24..r31, i6 is frame pointer, i7 is return address)
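- The SPARC hardware actually has more: a typical implementation with 8 windows has 128 windowed (32-bit) registers plus the 8 globals (the number of windows varies by implementation)
- The global registers are shared by all windows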
- All other registers are relative to a window pointer (CWP,
Current Window Pointer)
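The r-numbers above follow from each bank occupying a block of 8. A small C helper (reg_number is a hypothetical name) makes the arithmetic explicit.

```c
#include <assert.h>

/* Architectural number r0..r31 for a register such as %o6:
   bank is 'g', 'o', 'l' or 'i' and n is 0..7. */
int reg_number(char bank, int n)
{
    assert(n >= 0 && n <= 7);
    switch (bank) {
    case 'g': return 0 + n;    /* globals */
    case 'o': return 8 + n;    /* outputs */
    case 'l': return 16 + n;   /* locals  */
    case 'i': return 24 + n;   /* inputs  */
    }
    assert(!"unknown register bank");
    return -1;
}
```

So %sp (%o6) is r14, %fp (%i6) is r30 and %i7 is r31.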
save
- save moves the CWP by 16 registers so:
- The caller's input and local registers become unavailable
- The caller's output registers are mapped to input registers
- The subroutine has a fresh set of local and output registers
- Note %sp (%o6) is mapped to
%fp (%i6), so save %sp, -N, %sp reads the caller's %sp and writes %sp - N into the callee's new %sp
restore
- restore moves the CWP back by 16 registers so:
- The callee's local and output registers become unavailable
- The callee's input registers are mapped back to the caller's output registers
- The caller's input and local registers are available again
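The overlap that save and restore rely on can be modelled in C as a circular array of 128 windowed registers. This is a sketch under stated assumptions: 8 windows, save decrementing the CWP and restore incrementing it (as in SPARC V8), and a per-window slot layout (outs, locals, ins) chosen for illustration rather than taken from any hardware manual. It does not force %g0 to zero and does not model overflow; regfile_t, reg, win_save and win_restore are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum { NWINDOWS = 8, NWINREGS = 16 * NWINDOWS };  /* 128 windowed registers */

typedef struct {
    uint32_t global[8];      /* %g0..%g7: shared, not windowed */
    uint32_t win[NWINREGS];  /* circular file of windowed registers */
    int cwp;                 /* current window pointer, 0..NWINDOWS-1 */
} regfile_t;

/* Storage for architectural register r (0..31) in the current window.
   Assumed layout per window w: outs at 16w+0..7, locals at 16w+8..15,
   ins at 16w+16..23 (mod 128), so the ins of window w-1 are the same
   slots as the outs of window w. */
uint32_t *reg(regfile_t *rf, int r)
{
    assert(r >= 0 && r < 32);
    if (r < 8)
        return &rf->global[r];
    int base = 16 * rf->cwp;
    int slot;
    if (r < 16)
        slot = base + (r - 8);         /* %o0..%o7 */
    else if (r < 24)
        slot = base + 8 + (r - 16);    /* %l0..%l7 */
    else
        slot = base + 16 + (r - 24);   /* %i0..%i7 */
    return &rf->win[slot % NWINREGS];
}

/* save: move to the next window (CWP - 1, wrapping). */
void win_save(regfile_t *rf)    { rf->cwp = (rf->cwp + NWINDOWS - 1) % NWINDOWS; }

/* restore: move back to the caller's window (CWP + 1, wrapping). */
void win_restore(regfile_t *rf) { rf->cwp = (rf->cwp + 1) % NWINDOWS; }
```

A caller's %o0 written before win_save reads back as the callee's %i0, a value the callee leaves in %i0 shows up in the caller's %o0 after win_restore, and the caller's %l0 is untouched throughout.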
Deep calls
- 128 registers (8 sets) is nice, but can be exhausted at a call
nesting level of 7
- Call the register sets A (oldest, outermost), B, C, D, E, F, G, H
(newest, innermost)
- a save when all register sets are in use:
- saves the 16 input/local registers from A
- into the 64 bytes pointed to by A's stack pointer (the same register as B's %fp), and
- reuses A's registers for I, the new window (the register file is circular, so nothing is physically shifted)
- the hardware actually traps to the OS (a window overflow trap), which does all of the above
and returns
- the converse (a window underflow trap) happens only on restore, when the caller's registers were saved to memory
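A rough C model of when these traps fire, counting windows rather than individual registers. Assumptions (not stated in the slides): 8 windows with one kept invalid, so at most 7 windows are resident at once, matching the nesting level of 7 above. winstate_t, do_save and do_restore are hypothetical names.

```c
#include <assert.h>

/* Assumed: 8 windows, one kept invalid, so 7 can hold live registers. */
enum { NWINDOWS = 8, RESIDENT_MAX = NWINDOWS - 1 };

typedef struct {
    int depth;       /* call nesting level (1 = outermost) */
    int resident;    /* windows currently held in registers */
    int overflows;   /* save traps: oldest window spilled to its frame */
    int underflows;  /* restore traps: caller's window reloaded from memory */
} winstate_t;

void do_save(winstate_t *w)
{
    w->depth++;
    if (w->resident == RESIDENT_MAX)
        w->overflows++;   /* spill the oldest window; count stays the same */
    else
        w->resident++;
}

void do_restore(winstate_t *w)
{
    assert(w->depth > 1);
    w->depth--;
    if (w->resident == 1)
        w->underflows++;  /* caller's window was spilled: fill it back */
    else
        w->resident--;
}
```

Recursing from depth 1 to depth 10 and back gives 3 overflow traps on the way down and 3 underflow traps on the way up; shallower call chains never trap.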
Code Example
void f (int a, int b, int c) {
...
}
...
f(1, 99, 77);
...
f: save %sp, -64, %sp
! a is in i0, b in i1, c in i2
...
ret                   ! jmpl %i7 + 8, %g0
restore               ! delay slot: back to the caller's window
...
mov 1, %o0            ! a
mov 99, %o1           ! b
call f
mov 77, %o2           ! delay slot: c is set before f starts
...
Call and Return
- jmpl r1 + (r2 or imm), r3 places its own address into
r3, and jumps to r1 + (r2 or imm)
- call label is a native instruction
- call %r1 is jmpl %r1 + %g0, %o7, i.e. jmpl %r1, %o7
- ret is jmpl %i7 + 8, %g0 (the link value is discarded into %g0)
Arguments
- Arguments can be:
- on the stack (slow, but any number of arguments is OK)
- in-line after the call (fast if arguments are literals, but
does not support recursion -- Fortran)
- in registers (fast, but limited number)
- arguments 0..5 are in %o0..%o5
- all arguments have space reserved for them on the stack
- arguments 6 and up are only on the stack
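The argument rules can be captured by two small C helpers; arg_register and arg_stack_offset are hypothetical names. The offsets assume the standard 32-bit SPARC frame layout, in which the six register arguments have reserved words at %sp + 68 .. %sp + 91 and arguments 6 and up start at %sp + 92, relative to the caller's %sp (the callee's %fp).

```c
#include <assert.h>

/* %o register number holding argument i (0-based) at the call,
   or -1 if the argument travels only on the stack. */
int arg_register(int i)
{
    assert(i >= 0);
    return i < 6 ? i : -1;
}

/* Byte offset of argument i's stack word from the caller's %sp.
   Every argument has one: words 0..5 are reserved (the callee may store
   its register arguments there), words 6 and up carry the values. */
int arg_stack_offset(int i)
{
    assert(i >= 0);
    return 68 + 4 * i;
}
```

For an eight-argument call, the seventh and eighth arguments (i = 6, 7) go to %sp + 92 and %sp + 96.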
Stack frame
The stack frame is delimited by %fp and %sp
(%fp > %sp):
- local variables (%fp + offset, with offset < 0)
- arguments (at least 6 4-byte words, possibly more):
argument i is at %sp + 68 + 4*i
- return structure pointer (4 bytes: %sp + 64)
- 64 bytes reserved for saving the input and local registers if necessary
(%sp .. %sp + 63)
- the frame size is rounded up to a multiple of 8 (%sp must stay 8-byte aligned)
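Putting the pieces of the frame together, a C sketch of the minimum size to pass to save (frame_size is a hypothetical helper). It assumes the 8-byte %sp alignment of the SPARC ABI, which is why 64 + 4 + 24 = 92 becomes 96.

```c
#include <assert.h>

/* Minimum N for save %sp, -N, %sp: 64-byte register save area,
   4-byte struct-return pointer, 24 bytes for arguments 0..5,
   4 bytes per outgoing argument beyond six, plus locals,
   rounded up to a multiple of 8. */
int frame_size(int local_bytes, int max_outgoing_args)
{
    assert(local_bytes >= 0 && max_outgoing_args >= 0);
    int extra_args = max_outgoing_args > 6 ? max_outgoing_args - 6 : 0;
    int size = 64 + 4 + 24 + 4 * extra_args + local_bytes;
    return (size + 7) & ~7;
}
```

frame_size(0, 0) gives 96, and a caller passing eight arguments needs frame_size(0, 8) = 104.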
Return Values
- Integer return values are in the callee's %i0, which is the caller's %o0 once restore runs
- Structures don't fit in a register:
- Caller allocates memory to store the structure
- Caller puts the address of that memory onto the stack (at %sp + 64)
- Callee reads the address from the stack (at %fp + 64)
- Callee puts the result into that memory
- what if the caller didn't know the callee was going to return a structure?
- caller puts the intended size of the structure (as an unimp instruction) right after the call's delay slot
- callee returns to %i7 + 12, skipping it (%i7 + 8 holds the structure size)
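In C the hidden pointer is invisible, but the idea can be sketched by lowering a struct-returning function by hand. make_point_lowered and ret_target are hypothetical: the first shows the caller-allocated result (whose address SPARC passes at %sp + 64), the second the return-address arithmetic with and without the structure-size word.

```c
#include <assert.h>

struct point { int x, y; };

/* As written in C: the struct comes back by value. */
struct point make_point(int x, int y)
{
    struct point p = { x, y };
    return p;
}

/* Hand-lowered equivalent: the caller owns the storage and passes its
   address; the callee writes the result through it. */
void make_point_lowered(struct point *result, int x, int y)
{
    assert(result != 0);
    result->x = x;
    result->y = y;
}

/* Where the callee returns, given %i7 (the call's address): past the
   call and its delay slot (+8), and past the size word (+4) for a struct. */
unsigned ret_target(unsigned i7, int returns_struct)
{
    return i7 + 8 + (returns_struct ? 4u : 0u);
}
```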
Example
int foo(int a, int b, int c, int d,
int e, int f, int g, int h) {
return (a + b + c + d + e + f + g + h);
}
int main(void)
{
foo(1, 2, 3, 4, 5, 6, 7, 8);
}
foo (m4)
define(hs, argd(8))
define(gs, argd(7))
define(fr, i5)
define(er, i4)
define(dr, i3)
define(cr, i2)
define(br, i1)
define(ar, i0)
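foo: save %sp, -96, %sp     ! 64 + 4 + 24 = 92, rounded up to 96 (8-byte alignment)
ld [%fp + hs], %o0          ! h: 8th argument, in the caller's frame (caller's %sp = our %fp)
ld [%fp + gs], %o1          ! g: 7th argument
add %o1, %o0, %o0
add %fr, %o0, %o0
...
add %ar, %o0, %i0           ! final sum in %i0, which the caller sees as %o0
ret
restore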
main (m4)
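main: save %sp, -104, %sp   ! 64 + 4 + 24 + 8 (two stack args) = 100, rounded up to 104
mov 8, %o0
st %o0, [%sp + 96]          ! h (argument index 7): %sp + 68 + 4*7
mov 7, %o0
st %o0, [%sp + 92]          ! g (argument index 6): %sp + 68 + 4*6
mov 1, %o0                  ! a..f go in %o0..%o5
mov 2, %o1
mov 3, %o2
mov 4, %o3
mov 5, %o4
call foo
mov 6, %o5                  ! delay slot: runs before foo's first instruction
ret
restore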