Outline
- instruction fetch
- arithmetic instructions
- load and store instructions
- branch and jump instructions
- combining the different instructions
Instruction Fetch
- Figure 5.5 gives one possible implementation
- a 32-bit register holds the PC
- the output of the register is connected to the address
lines of the instruction memory:
for now, assume the instruction memory is read-only and separate from
the data memory (Harvard architecture)
- the output of the register is also one of the inputs to an adder
(or an ALU hard-wired to add) -- the other input is the constant 4, and
the output is the input for the register
- on every clock edge, the PC loads its input; the new contents address
the memory, the memory data lines deliver the instruction, and
PC+4 is computed and ready to be loaded on the next clock
- alternatives and thoughts:
- are 30 bits sufficient for the PC?
- do we have to use an adder/ALU? What alternatives do we have?
- how are the branch and jump operations implemented?
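The fetch step described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit in the figure: the function name fetch, the list imem, and the three instruction words are assumptions made for the example. The memory is modelled as a list of words, so the byte address in the PC is divided by 4 to index it.

```python
MASK32 = 0xFFFFFFFF  # keep values within 32 bits, like the hardware register

def fetch(pc, imem):
    """Model one clock cycle of the fetch unit (hypothetical helper).

    pc   -- current contents of the 32-bit PC register (a byte address)
    imem -- read-only instruction memory, modelled as a list of 32-bit words
    Returns (instruction, next_pc).
    """
    instruction = imem[pc >> 2]     # word-aligned: the two low address bits are dropped
    next_pc = (pc + 4) & MASK32     # the adder hard-wired to add the constant 4
    return instruction, next_pc

# Three arbitrary 32-bit words standing in for a program.
imem = [0x012A4020, 0x8D280004, 0x1109FFFE]

pc = 0
first, pc = fetch(pc, imem)    # first instruction, PC becomes 4
second, pc = fetch(pc, imem)   # second instruction, PC becomes 8
```

Since this model only ever uses pc >> 2 to address the memory, it also gives a hint for the question about the width of the PC.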
R-type instructions
- arithmetic-logical: add, sub, and, or, slt
- read two registers, compute a result, and store it into a third register
(the registers do not have to be different)
- a register file holds 32 32-bit registers (1024 bits) and allows reading
from up to two at a time and writing up to one at the end of the cycle
- the register file needs four sets of control lines: a write enable line,
and three 5-bit selectors for the two registers to read and the register to
write (if any)
- note the register select control lines can come directly from the
instruction
- the ALU is the same as studied in Chapter 4, and includes 3 control
lines, 2 32-bit inputs, and one 32-bit output
- the complete implementation is shown in Figure 5.7
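As a rough software analogue of the R-type path, the sketch below pulls the three 5-bit register selectors straight out of the instruction word, reads two registers, runs the ALU, and writes the third. The field positions and funct codes are the standard MIPS encoding rather than something stated in these notes, and the helper names (decode_r, alu, execute_r) are made up for the example. The ALU is selected by the funct field directly; the mapping from funct to the 3 ALU control lines is left out.

```python
MASK32 = 0xFFFFFFFF

def decode_r(inst):
    """Split a 32-bit R-type word into register selectors and the funct field."""
    return {
        "rs": (inst >> 21) & 0x1F,   # first register to read
        "rt": (inst >> 16) & 0x1F,   # second register to read
        "rd": (inst >> 11) & 0x1F,   # register to write
        "funct": inst & 0x3F,        # selects the ALU operation
    }

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def alu(funct, a, b):
    """32-bit ALU for the five operations listed above (standard MIPS funct codes)."""
    if funct == 0x20:                              # add
        return (a + b) & MASK32
    if funct == 0x22:                              # sub
        return (a - b) & MASK32
    if funct == 0x24:                              # and
        return a & b
    if funct == 0x25:                              # or
        return a | b
    if funct == 0x2A:                              # slt: signed comparison
        return int(to_signed(a) < to_signed(b))
    raise ValueError("unsupported funct code")

def execute_r(inst, regs):
    """Read two registers, compute, and write the result to the third."""
    f = decode_r(inst)
    regs[f["rd"]] = alu(f["funct"], regs[f["rs"]], regs[f["rt"]])

regs = [0] * 32
regs[9], regs[10] = 7, 5
execute_r(0x012A4020, regs)   # encodes add $8, $9, $10
```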
Register File
- each bit of a register is an edge-triggered D flip-flop
- the output of each register goes to two 32-input, 32-bit wide
multiplexers (each has 5 control lines, for a total of 10 control lines)
- the register file has an additional 5 control lines which determine
which register gets written
- these last control lines control a demultiplexer:
- all but one of the outputs of a demultiplexer are zero
- the selected output line follows the input
- for this register file, we have a 1-to-32 demultiplexer
- the data input to the demultiplexer is the clock, used to tell
registers when to accept new data
- this write clock is gated by an external write-enable signal, which
prevents writes when the instruction does not write a register (so far,
anything that is not an R operation)
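A behavioural sketch of this register file follows. It is an assumption-laden model, not the gate-level design: the class and method names are invented, the two read multiplexers are modelled as list indexing, and the demultiplexer is modelled as a list of 32 gated clock lines of which at most one is 1. The sketch does not special-case any register.

```python
MASK32 = 0xFFFFFFFF

class RegisterFile:
    """32 registers of 32 bits, two read ports, one write port (hypothetical model)."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, sel_a, sel_b):
        # Two 32-input multiplexers, each driven by its own 5-bit selector.
        return self.regs[sel_a], self.regs[sel_b]

    def write_clocks(self, sel_w, write_enable):
        # 1-to-32 demultiplexer: the gated clock reaches only the selected
        # register; every other output stays at zero.
        return [1 if (write_enable and i == sel_w) else 0 for i in range(32)]

    def clock_edge(self, sel_w, data, write_enable):
        # Each register accepts new data only if its own clock line is pulsed.
        for i, clk in enumerate(self.write_clocks(sel_w, write_enable)):
            if clk:
                self.regs[i] = data & MASK32

rf = RegisterFile()
rf.clock_edge(sel_w=8, data=42, write_enable=1)   # written
rf.clock_edge(sel_w=9, data=99, write_enable=0)   # blocked by the gate
```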
Load and Store operations
- lw $r1, offset($r2)
- sw $r1, offset($r2)
- read one register, add it to the offset, and use the result as
an address for the data memory
- the second register either provides data to the memory (sw), or
gets data from the memory (lw) -- both capabilities (reading a register
and writing a register) already exist for the R instructions; each of
lw/sw needs only one of them
- the 16-bit offset from the instruction must be sign-extended before
being used in a 32-bit ALU
- the complete implementation is shown in Figure 5.9
- how would you build a sign-extend unit?
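One possible answer to the sign-extension question, in software terms: copy bit 15 of the offset into the upper 16 bits. The sketch below uses that to form the effective address and then models lw and sw against a data memory held in a dictionary. All names (sign_extend16, effective_address, dmem) are assumptions for the example, and unwritten memory locations are assumed to read as 0.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm):
    """Extend a 16-bit field to 32 bits by replicating bit 15 into bits 31..16."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if (imm & 0x8000) else imm

def effective_address(base, imm):
    """ALU add of the base register and the sign-extended offset."""
    return (base + sign_extend16(imm)) & MASK32

def lw(regs, dmem, rt, rs, imm):
    # The second register (rt) receives data from the memory.
    regs[rt] = dmem.get(effective_address(regs[rs], imm), 0)

def sw(regs, dmem, rt, rs, imm):
    # The second register (rt) provides data to the memory.
    dmem[effective_address(regs[rs], imm)] = regs[rt]

regs = [0] * 32
dmem = {}
regs[2] = 0x1000
regs[1] = 0xDEADBEEF
sw(regs, dmem, 1, 2, 0xFFFC)   # sw $1, -4($2)
lw(regs, dmem, 3, 2, 0xFFFC)   # lw $3, -4($2)
```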
Branch/Jump instructions
- Branch if equal:
- beq $r1, $r2, offset
- if we branch, we branch to: offset * 4 + (PC + 4)
- PC + 4 has already been computed by the instruction fetch unit
- to multiply the offset by four, we sign-extend it to 32 bits and
shift it left by 2 bits
- the ALU can subtract the two registers -- if the result is zero
(tested with a 32-input NOR gate), the branch should be taken, and
the computed branch target should be stored in the PC
- Jump: the 26 bits from the instruction are loaded into the
corresponding bits of the PC
- see Figure 5.10
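The address arithmetic for beq and j can be checked with a small sketch. The function names are hypothetical. The jump target is assembled the way the datapath section of these notes describes it: the top 4 bits from PC+4, then the 26 instruction bits, then two zero bits.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm):
    """Replicate bit 15 of a 16-bit field into the upper 16 bits."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if (imm & 0x8000) else imm

def is_zero(value):
    """32-input NOR: the output is 1 only when every input bit is 0."""
    return int((value & MASK32) == 0)

def branch_target(pc_plus_4, imm16):
    """offset * 4 + (PC + 4), with the offset sign-extended and shifted left by 2."""
    return (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32

def jump_target(pc_plus_4, target26):
    """Top 4 bits of PC+4, then the 26-bit field, then two zero bits."""
    return (pc_plus_4 & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)

# beq with offset -2 located at address 8: PC+4 is 12, so the target is 12 - 8 = 4.
a, b = 5, 5
taken = is_zero((a - b) & MASK32)      # the ALU subtracts, the NOR gate tests for zero
target = branch_target(12, 0xFFFE)
```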
Combining all the instructions: principles
- simple implementation: one clock cycle per instruction
- hence, each component can be used at most once while an instruction
executes
- this means we need two separate memories, one for data, one for
instructions: to relax this, we would need to allow an instruction to
execute over multiple clock cycles
- multiplexers can allow us to provide different inputs to
the same component, to implement different instructions
Datapath for all the instructions
- see Figure 5.13
- same ALU can be used for:
- R instructions and testing for equality (inputs are two registers),
as well as for
- load/store instructions, for adding a sign-extended offset to
a register
so use a 32-bit wide, 2-1 mux to select the second ALU input
- value to be written to a register comes from memory (lw), or
the ALU (R-instructions),
so use a 32-bit wide, 2-1 mux to select the register file data input
- the PC should be loaded from either PC+4 (most instructions),
or PC+4+4*offset (beq if the condition is true) -- offset is
sign-extended to 32 bits and shifted left by 2 bits --
so use a 32-bit wide, 2-1 mux to select the PC register next value
- for the jump instruction (j):
- the topmost 4 bits come from PC+4
- the next 26 bits come from the instruction
- the last 2 bits are zero
so use another 32-bit wide, 2-1 mux to select between the output of the
previous mux and the above bits
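The multiplexers listed above can be modelled as one small function applied four times. The sketch below is a guess at how the selections fit together, not a transcription of the figure: the control signal names (alu_src, mem_to_reg, branch, jump) are assumptions chosen for the example.

```python
MASK32 = 0xFFFFFFFF

def mux2(sel, in0, in1):
    """32-bit wide 2-to-1 multiplexer: output in1 when sel is 1, else in0."""
    return in1 if sel else in0

def sign_extend16(imm):
    """Replicate bit 15 of a 16-bit field into the upper 16 bits."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if (imm & 0x8000) else imm

def alu_second_input(alu_src, reg_value, imm16):
    # R instructions and beq use a register; lw/sw use the sign-extended offset.
    return mux2(alu_src, reg_value, sign_extend16(imm16))

def register_write_data(mem_to_reg, alu_result, mem_data):
    # R instructions write the ALU result; lw writes the data read from memory.
    return mux2(mem_to_reg, alu_result, mem_data)

def next_pc(pc, imm16, target26, branch, zero, jump):
    pc_plus_4 = (pc + 4) & MASK32
    branch_addr = (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32
    jump_addr = (pc_plus_4 & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)
    # First mux: PC+4 or the branch target (only when beq and the ALU result is zero).
    after_branch = mux2(branch and zero, pc_plus_4, branch_addr)
    # Second mux: output of the previous mux or the jump address.
    return mux2(jump, after_branch, jump_addr)

sequential = next_pc(8, 0, 0, branch=0, zero=0, jump=0)        # ordinary instruction
taken = next_pc(8, 0xFFFE, 0, branch=1, zero=1, jump=0)        # beq, registers equal
not_taken = next_pc(8, 0xFFFE, 0, branch=1, zero=0, jump=0)    # beq, registers differ
jumped = next_pc(8, 0, 0x10, branch=0, zero=0, jump=1)         # j
```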