Outline
- Performance scaling
- Amdahl's law
- Benchmark evolution
Performance scaling
- what happens if you double processor speed?
- the answer depends on the benchmarks and on the rest of the system
- given the benchmarks, the result depends on the rest of the system
- if the system is CPU bound, the benchmark performance should
double
- if the system is memory bound, the benchmark performance might
not change at all (both cases are worked through in the sketch at the end
of this section)
- most systems are somewhere in-between:
- reasonable designers put effort/money where it will do some good
- if increasing the CPU speed does no good, no point in spending the
extra money
- while performance scaling is in many cases close to linear
(even if the proportionality factor is less than 1), this is not necessarily the case
- a memory bound system can be improved by adding well-designed
cache memories
- memory hierarchy: the main memory is a cache for the disk, the
cache caches the memory, the registers cache the cache
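- a minimal sketch (Python, with made-up times and fractions) of how the
benefit of doubling CPU speed depends on how CPU bound the system is:

    # hypothetical workload: a fraction of the run time is CPU bound, the rest
    # is memory bound and does not benefit from a faster CPU
    def scaled_runtime(total_time, cpu_fraction, cpu_speedup):
        cpu_time = total_time * cpu_fraction
        other_time = total_time * (1.0 - cpu_fraction)
        return cpu_time / cpu_speedup + other_time

    for cpu_fraction in (1.0, 0.8, 0.5, 0.0):  # fully CPU bound ... fully memory bound
        new_time = scaled_runtime(10.0, cpu_fraction, 2.0)  # double the CPU speed
        print(f"CPU-bound fraction {cpu_fraction:.1f}: 10.0 s -> {new_time:.1f} s")

- fully CPU bound: the time halves; fully memory bound: no change; anything
in between gives a partial benefit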
Amdahl's Law
- the law:
- if component A takes x% of the time
- no matter how much we improve component A,
- we cannot save more than x% of the time
- sounds straightforward, but easy to forget or overlook
- this means: spend time and effort improving components that
take large percentages of the time (also applies to programming)
- often results in different components later taking a larger
percentage of the time, making them better candidates for optimization
- speedup: ratio of performance after to performance before improvement
- equivalently: ratio of time before to time after improvement
- if it takes 10 seconds before improvement and 9 seconds after, the speedup
is 10/9 (about 1.11), i.e. an 11% improvement (worked out in the sketch below)
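- a sketch of both calculations in Python, writing f for the fraction of time
spent in component A and s for the factor by which A is improved:

    # Amdahl's law: if component A accounts for a fraction f of the time and is
    # sped up by a factor s, the overall speedup is 1 / ((1 - f) + f / s)
    def overall_speedup(f, s):
        return 1.0 / ((1.0 - f) + f / s)

    # even an enormous speedup of A cannot save more than f of the time:
    print(overall_speedup(0.25, 1e9))  # ~1.33, bounded by 1 / (1 - 0.25)

    # speedup as time before / time after: 10 s -> 9 s
    print(10.0 / 9.0)                  # ~1.11, about an 11% improvement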
No hardware-independent metrics
- it would be nice to have a way to predict performance from
general considerations
- for example, larger code sizes for the same program often
imply slower execution
- however, this is not always true
- in modern processors (e.g. RISC), larger code size can go together with
higher processing speed
- measured (or computed) execution time really is the best measure
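- a minimal sketch of measuring execution time directly (the workload here is
a made-up stand-in for a real benchmark program):

    import time

    def toy_workload(n):
        # stand-in for a real benchmark program
        return sum(i * i for i in range(n))

    start = time.perf_counter()
    toy_workload(1_000_000)
    elapsed = time.perf_counter() - start
    print(f"execution time: {elapsed:.3f} s")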
Not useful: MIPS
- Million Instructions Per Second
- different instructions do different things!
- different instructions have different CPI, so different
programs on the same machine have different MIPS
- a program which executes more (simpler) instructions and takes more time
can still show a higher MIPS rating!
- also referred to as MOPS
- sometimes peak MIPS are quoted, which is even less helpful
- relative MIPS: given a time t1 on a reference n-MIP machine and t2
on the target machine,
relative MIPS = (t1 / t2) * n
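- a small worked example of the formula above (the times are made up; the
reference machine is rated at 1 MIPS):

    # relative MIPS = (t1 / t2) * n, with t1 measured on the n-MIP reference machine
    def relative_mips(t1, t2, n):
        return (t1 / t2) * n

    # made-up numbers: 50 s on the 1-MIP reference machine, 5 s on the
    # machine under test -> rated at 10 relative MIPS
    print(relative_mips(50.0, 5.0, 1.0))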
Marginally useful: MFLOPS
- some user programs (typically used on supercomputers) have a need
to perform a certain number of floating point operations
- hence, machines that perform more FLoating point Operations Per Second
are more desirable
- problem: some computers might take more or less time for different
floating point operations, so the mix of operations may affect the number
of MFLOPS
- problem: if an operation (e.g. square root) can be implemented either in
hardware or in software, the number of floating point operations will
not be the same on different machines for a given program
- problem: says nothing about the performance of programs that
are not floating-point intensive
- problem: peak MFLOPS...
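- a sketch of how MFLOPS is computed for a toy kernel; only the nominal
multiplies and adds are counted, which is exactly why the operation mix and
software-implemented operations (the problems above) distort the number:

    import time

    def dot(a, b):
        # nominal floating point work: one multiply and one add per element
        s = 0.0
        for x, y in zip(a, b):
            s += x * y
        return s

    n = 1_000_000
    a = [1.0] * n
    b = [2.0] * n

    start = time.perf_counter()
    dot(a, b)
    elapsed = time.perf_counter() - start

    flop_count = 2 * n  # nominal count: n multiplies + n adds
    print(f"{flop_count / elapsed / 1e6:.1f} MFLOPS")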
Benchmark evolution
- the VAX-11/780 was considered a 1-MIP machine
- running programs on our machine and the VAX-11/780 gives us a number
of MIPS, but:
- the VAX-11/780 was later reclassified as a 1/2-MIP machine
- do you have access to a VAX? the baseline machine needs to be available
for comparison
- as compilers evolve, do you evolve the
baseline compiler? (in-class discussion)
- continuously evolving numbers: hard to maintain a good historical perspective
- SPEC uses a mix of programs: SPEC (1988), SPEC 92, SPEC 95, SPEC 2000
(a sketch of SPEC-style scoring follows at the end of this section)
- SPEC synthetic benchmarks which include I/O:
- SPEC system development multitasking: compiling, editing,
execution, system commands
- SPEC system level file server: file server performance
- manufacturers need to look good, consumers need accurate information
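- a sketch of SPEC-style scoring (not the exact SPEC procedure; program names
and times are made up): each program's time is normalized to the reference
machine and the ratios are combined with a geometric mean:

    from math import prod

    # reference-machine and measured times per benchmark program, in seconds
    ref_times      = {"progA": 100.0, "progB": 80.0, "progC": 120.0}
    measured_times = {"progA": 25.0,  "progB": 40.0, "progC": 30.0}

    ratios = [ref_times[p] / measured_times[p] for p in ref_times]
    score = prod(ratios) ** (1.0 / len(ratios))
    print(f"per-program ratios: {ratios}")
    print(f"overall score (geometric mean): {score:.2f}")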
Other benchmarks
- synthetic benchmarks:
- whetstone: scientific and engineering programs, first in Algol, now
in Fortran
- dhrystone: systems programs, first in Ada, now in C
- program kernels:
- livermore loops: 21 small loop fragments,
representative of supercomputer loads
- linpack: linear algebra benchmark
- small programs:
- sieve of Eratosthenes
- quicksort
- do not really reflect real workloads
- major drawback: too easy to optimize, and these optimizations do not
benefit other programs
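- one of the small programs above, the sieve of Eratosthenes, as a Python
sketch; a kernel this small is exactly the kind of code a compiler can
special-case, which is the drawback noted above:

    def sieve(limit):
        # sieve of Eratosthenes: returns all primes <= limit
        is_prime = [True] * (limit + 1)
        is_prime[0:2] = [False, False]
        for i in range(2, int(limit ** 0.5) + 1):
            if is_prime[i]:
                for j in range(i * i, limit + 1, i):
                    is_prime[j] = False
        return [i for i, prime in enumerate(is_prime) if prime]

    print(len(sieve(10_000)))  # 1229 primes below 10,000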