Outline
- combining the different instructions
- ALU control
- overall control
- evaluating the single-cycle implementation
Combining all the instructions: principles
- simple implementation: one clock cycle per instruction
- hence, each component can be used for at most one phase of the
execution
- this means we need two separate memories, one for instructions and one
for data; relaxing this would require allowing an instruction to
execute over multiple clock cycles
- multiplexers can allow us to provide different inputs to
the same component, to implement different instructions
Datapath for all the instructions
- see Figure 5.13
- same ALU can be used for:
- R instructions and testing for equality (inputs are two registers),
as well as for
- load/store instructions, for adding a sign-extended offset to
a register
so use a 32-bit wide, 2-1 mux to select the second ALU input
- the value to be written to a register comes from memory (lw) or from
the ALU (R-instructions),
so use a 32-bit wide, 2-1 mux to select the register file data input
- the PC should be loaded from either PC+4 (most instructions)
or PC+4+offset (beq if the condition is true), where the offset is
sign-extended to 32 bits and shifted left by 2 to turn a word offset
into a byte offset,
so use a 32-bit wide, 2-1 mux to select the PC register next value
- for the jump instruction (j):
- the topmost 4 bits come from PC+4
- the next 26 bits come from the instruction
- the last 2 bits are zero
so use another 32-bit wide, 2-1 mux to select between the output of the
previous mux and the above bits
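The two PC multiplexers can be sketched in Python as a minimal model of the bit arithmetic. The function names (`sign_extend16`, `mux2`, `next_pc`) are hypothetical, and the control inputs `branch`, `zero` (the ALU's equality output) and `jump` are assumed to be single bits supplied by the control unit and the ALU.

```python
MASK32 = 0xFFFFFFFF  # keep all results to 32 bits


def sign_extend16(value: int) -> int:
    """Sign-extend the low 16 bits of value to a 32-bit two's-complement pattern."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def mux2(sel: int, in0: int, in1: int) -> int:
    """A 2-1 multiplexer: returns in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def next_pc(pc: int, instr: int, branch: int, zero: int, jump: int) -> int:
    """Hypothetical model of the next-PC logic for one instruction."""
    pc_plus_4 = (pc + 4) & MASK32
    # Branch target: PC+4 plus the sign-extended offset shifted left by 2.
    offset = sign_extend16(instr)
    branch_target = (pc_plus_4 + (offset << 2)) & MASK32
    # First mux: taken only for beq (Branch=1) when the registers are equal (Zero=1).
    after_branch_mux = mux2(branch & zero, pc_plus_4, branch_target)
    # Jump target: top 4 bits of PC+4, 26 bits from the instruction, two zero bits.
    jump_target = (pc_plus_4 & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)
    # Second mux: the jump target for j, otherwise the first mux's output.
    return mux2(jump, after_branch_mux, jump_target)
```

For example, a taken beq at address 0x1000 with offset 3 lands at 0x1004 + 12 = 0x1010, and an offset of 0xFFFF (that is, -1) lands back at 0x1000.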
ALU control
- The ALU needs 3 bits to control its operation (it can ADD,
SUB, AND, OR, or Set on Less Than)
- looking at the machine instruction, these bits must come from:
- the instruction opcode (bits 31-26 of the instruction), for
everything but R instructions: specifically, ADD for lw and sw, SUB for beq
- the function field (bits 5-0 of the instruction) for R instructions
- assume that the ALU control receives two bits, ALUOp, from the main
control unit, which derives them from the opcode
- for load or store, ALUOp = 00
- for beq, ALUOp = 01
- for R instructions, ALUOp = 10
- then the truth table for the ALU control unit is relatively simple
(Figure 5.15):
ALUOp1 | ALUOp0 | F5 | F4 | F3 | F2 | F1 | F0 | Operation
0      | 0      | X  | X  | X  | X  | X  | X  | 010 (add)
X      | 1      | X  | X  | X  | X  | X  | X  | 110 (subtract)
1      | X      | X  | X  | 0  | 0  | 0  | 0  | 010 (add)
1      | X      | X  | X  | 0  | 0  | 1  | 0  | 110 (subtract)
1      | X      | X  | X  | 0  | 1  | 0  | 0  | 000 (AND)
1      | X      | X  | X  | 0  | 1  | 0  | 1  | 001 (OR)
1      | X      | X  | X  | 1  | 0  | 1  | 0  | 111 (set on less than)
- this control unit can be implemented by hand, or automatically with
logic synthesis tools.
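To make "by hand" concrete, here are two Python versions of the ALU control: one read directly off the truth table, and one using a set of hand-minimized logic equations. The equations are an assumption derived here by exploiting don't-cares (ALUOp = 11 and unlisted function codes never occur); they are worth checking against the table rather than taking on trust. Function names are hypothetical.

```python
def alu_control_table(alu_op: int, funct: int) -> int:
    """ALU control read directly off the truth table; returns the 3-bit operation."""
    alu_op1, alu_op0 = (alu_op >> 1) & 1, alu_op & 1
    if alu_op1 == 0 and alu_op0 == 0:   # lw/sw: add
        return 0b010
    if alu_op0 == 1:                     # beq: subtract
        return 0b110
    # R-format: only F3..F0 distinguish add, sub, AND, OR, slt
    by_low_bits = {0b0000: 0b010, 0b0010: 0b110, 0b0100: 0b000,
                   0b0101: 0b001, 0b1010: 0b111}
    try:
        return by_low_bits[funct & 0b1111]
    except KeyError:
        raise ValueError(f"unsupported funct field {funct:#08b}") from None


def alu_control_gates(alu_op: int, funct: int) -> int:
    """Hand-minimized equations for the same table (assumed, using don't-cares)."""
    alu_op1, alu_op0 = (alu_op >> 1) & 1, alu_op & 1
    f = [(funct >> i) & 1 for i in range(6)]  # f[i] holds bit Fi
    op2 = alu_op0 | (alu_op1 & f[1])
    op1 = (1 - alu_op1) | (1 - f[2])
    op0 = alu_op1 & (f[3] | f[0])
    return (op2 << 2) | (op1 << 1) | op0
```

Comparing the two on every case that actually occurs (ALUOp 00 and 01 with any function field, and ALUOp 10 with add, sub, and, or, slt) is a quick way to confirm the hand design matches the table.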
Main Control Unit
- control is a combinational circuit
- for the ALU control, inputs are the function field and the two ALUOp
bits, output is the 3 ALU control bits
- for the main control unit, inputs are the 6 bits of the opcode,
outputs are the control lines for the datapath, i.e. the lines
for each of the multiplexers, the ALUOp bits, and bits to read or
write memory or registers
- in the machine word:
- the two registers to read are always in positions rs and rt. This
includes the two source registers for R instructions, the base register
(rs) for load and store, and the source register (rt) for store
- the destination register is in position rt for a load, and position
rd for R instructions
- the 16-bit offset is always in the low 16 bits of the word, for
beq, lw, and sw
- this information leads to a slightly updated datapath design, as in
Figure 5.19
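Because the fields sit at fixed positions, decoding is just shifting and masking. A minimal Python sketch follows; the function names are hypothetical, and `write_register` models the RegDst mux that chooses between rt and rd.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26
        "rs":     (word >> 21) & 0x1F,  # bits 25-21: first register read, base for lw/sw
        "rt":     (word >> 16) & 0x1F,  # bits 20-16: second read, or lw destination
        "rd":     (word >> 11) & 0x1F,  # bits 15-11: R-format destination
        "funct":  word & 0x3F,          # bits 5-0: R-format function field
        "imm16":  word & 0xFFFF,        # bits 15-0: offset for beq, lw, sw
    }


def write_register(fields: dict, reg_dst: int) -> int:
    """RegDst mux: rd for R-format (RegDst=1), rt for lw (RegDst=0)."""
    return fields["rd"] if reg_dst else fields["rt"]
```

For example, add $t0, $t1, $t2 encodes as 0x012A4020, which decodes to rs = 9, rt = 10, rd = 8, funct = 0x20; with RegDst high the write register is 8.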
Main Control Unit Specification
- For an R instruction:
- the destination register is specified by bits 15-11 of the instruction,
the rd field (line RegDst high)
- the ALU source MUX should select the register input (line ALUSrc is low)
- the register write input MUX should select the ALU output
(line MemtoReg is low)
- the register write selector should select writing (RegWrite is high)
- the memory read and write selectors should be off (MemRead and MemWrite
are low)
- the PC should be loaded from PC+4 (Branch is low)
- the ALU operation should be selected from the function field (ALUOp is 10)
- different combinations implement the other instructions (see Figure 5.20)
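The main control can be viewed as a lookup from opcode to control signals. In this Python sketch only the R-format row is taken from the specification above; the lw, sw and beq rows use the standard values for this datapath and should be checked against Figure 5.20. `None` stands for a don't-care (X), and the names `CONTROL_TABLE` and `main_control` are hypothetical.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# opcode -> control values in SIGNALS order; None marks a don't-care (X)
CONTROL_TABLE = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 1, 0),           # R-format
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0, 0),           # lw
    0b101011: (None, 1, None, 0, 0, 1, 0, 0, 0),     # sw: no register write
    0b000100: (None, 0, None, 0, 0, 0, 1, 0, 1),     # beq: ALU subtracts
}


def main_control(opcode: int) -> dict:
    """Return the datapath control signals for a 6-bit opcode."""
    try:
        row = CONTROL_TABLE[opcode]
    except KeyError:
        raise ValueError(f"unsupported opcode {opcode:#08b}") from None
    return dict(zip(SIGNALS, row))
```

Usage: `main_control(0b100011)["MemtoReg"]` gives 1, so a load writes the memory output rather than the ALU result into the register file.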
Overall Control Unit
- Truth table in Figure 5.27
- yet another combinational circuit, which can be implemented
in a variety of ways, most of them using logic synthesis tools to
design the circuit automatically
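One common way to build this circuit is a two-level, PLA-style design: an AND plane recognizes each opcode, and an OR plane combines those terms into each output. This Python sketch assumes the table covers R-format, lw, sw, beq and j (opcode 000010) and sets don't-cares to 0; the function name `pla_control` and the Jump output name are assumptions.

```python
def pla_control(opcode: int) -> dict:
    """Two-level (AND plane, then OR plane) model of the main control."""
    op = [(opcode >> i) & 1 for i in range(6)]  # op[i] holds bit Opi

    def matches(pattern: str) -> int:
        """AND-plane term: 1 if Op5..Op0 equal the pattern, e.g. '100011'."""
        return int(all(op[5 - i] == int(bit) for i, bit in enumerate(pattern)))

    # AND plane: one product term per instruction
    r_format = matches("000000")
    lw = matches("100011")
    sw = matches("101011")
    beq = matches("000100")
    j = matches("000010")

    # OR plane: each control output is a sum of the relevant terms
    return {
        "RegDst": r_format,
        "ALUSrc": lw | sw,
        "MemtoReg": lw,
        "RegWrite": r_format | lw,
        "MemRead": lw,
        "MemWrite": sw,
        "Branch": beq,
        "ALUOp1": r_format,
        "ALUOp0": beq,
        "Jump": j,
    }
```

An unsupported opcode matches no product term, so every output is 0 and no state is written.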
Evaluating the Single-Cycle Implementation
- CPI is always 1, and all operations take the same time
- cycle time must be sufficient for the slowest operation to complete
- drawbacks:
- adding hardware for a slow operation, such as floating-point
multiply, would stretch the clock cycle of every instruction
- frequent operations cannot be made faster on their own; resources
have to be "wasted" on making the slowest operation faster, since it sets
the cycle time
- each functional unit can be used at most once per clock cycle, hence
one ALU plus two separate adders, and two separate memories
Computing the clock in a single-cycle implementation
- assume:
- memory takes 2ns to read or write
- the ALU and adders take 2ns to compute their results
- the register file takes 1ns to read or write
- the load word instruction uses the following functional units:
- instruction fetch: 1 memory access, 2ns
- register access: 1 register file access, 1ns
- adding base to offset: 1 ALU operation, 2ns
- data memory access: 2ns
- storing the result: 1 register file access, 1ns
- so the minimum clock cycle would be 2 + 1 + 2 + 2 + 1 = 8ns
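The same unit-by-unit sum can be applied to every instruction class, with the clock set by the slowest. This Python sketch uses the delays assumed above; the unit lists for the instructions other than lw are assumptions based on this datapath, and the PC+4 and branch-target adders are assumed to run in parallel with other steps, so they are not counted on the critical path.

```python
# Assumed delays in ns: memory, ALU/adders, register file
DELAY_NS = {"mem": 2, "alu": 2, "reg": 1}

# Functional units each instruction class uses in sequence (assumed for all but lw)
UNITS = {
    "R-format": ["mem", "reg", "alu", "reg"],          # fetch, read, ALU, write
    "lw":       ["mem", "reg", "alu", "mem", "reg"],   # as listed above
    "sw":       ["mem", "reg", "alu", "mem"],          # no register write
    "beq":      ["mem", "reg", "alu"],                 # compare in the ALU
    "j":        ["mem"],                               # fetch only
}


def path_delay(units: list) -> int:
    """Total delay of the functional units an instruction uses in sequence."""
    return sum(DELAY_NS[u] for u in units)


times = {name: path_delay(units) for name, units in UNITS.items()}
clock_ns = max(times.values())  # single cycle: the slowest instruction sets the clock
```

Under these assumptions lw takes 8ns, sw 7ns, R-format 6ns, beq 5ns and j 2ns, so only lw actually needs the full 8ns cycle.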
- but most other operations use fewer functional units
- with a variable clock cycle, or (more practically), with
different CPIs for different instructions, we can let the other
instructions perform faster.
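A rough comparison of a fixed 8ns clock against an idealized variable clock, in which each instruction takes only its own path delay, can be computed as a weighted average. The per-instruction times follow from the delays assumed above; the instruction mix is purely hypothetical and illustrative, not measured data, and the names are assumptions.

```python
# Per-instruction times in ns, from the delays assumed above
INSTR_TIME_NS = {"lw": 8, "sw": 7, "R": 6, "beq": 5, "j": 2}

# Hypothetical instruction mix (fractions must sum to 1)
MIX = {"lw": 0.24, "sw": 0.12, "R": 0.44, "beq": 0.18, "j": 0.02}


def average_time(times: dict, mix: dict) -> float:
    """Average time per instruction when each one takes only its own delay."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix fractions must sum to 1")
    return sum(fraction * times[name] for name, fraction in mix.items())


fixed = max(INSTR_TIME_NS.values())          # single-cycle clock period
variable = average_time(INSTR_TIME_NS, MIX)  # idealized variable clock
speedup = fixed / variable
```

With this mix the variable clock averages 6.34ns per instruction, roughly a 1.26x improvement over the fixed 8ns cycle; a real design pays some overhead for a variable clock, which is why multiple CPIs with a shorter fixed clock are the more practical route.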