Problem

Assume that you are involved in the NASA Extra-Terrestrial Mission and working on the reliability analysis of a multiprocessor system for motion planning of an autonomous mobile robot.

Assume that the robot has 3 identical CPUs executing the same algorithm, which selects the robot's next move from 3 possible actions.

Let p be the probability that a CPU produces a correct output. The robot employs so-called majority consensus, i.e., it chooses the action output by the majority of the 3 CPUs. When no majority exists, all outputs are regarded as invalid (although a real system might retry executing the algorithm on the CPUs until it obtains a valid output). For simplicity, we suppose that there is always exactly one correct action.
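The majority-consensus rule can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the mission's implementation: the action labels 0, 1, 2, the choice of action 0 as the correct one, the function names, and the assumption that a faulty CPU picks either wrong action with equal probability are all hypothetical. simulate_ps gives an empirical estimate of p_s that can be compared with the derivation in the Solution.

```python
import random
from collections import Counter
from typing import Optional, Sequence

ACTIONS = (0, 1, 2)  # hypothetical labels for the 3 possible actions
CORRECT = 0          # assumption: action 0 is the single correct action


def majority_vote(outputs: Sequence[int]) -> Optional[int]:
    """Return the action output by at least 2 of the 3 CPUs, or None if no majority exists."""
    action, count = Counter(outputs).most_common(1)[0]
    return action if count >= 2 else None


def cpu_output(p: float, rng: random.Random) -> int:
    """One CPU: correct with probability p; otherwise one of the two wrong
    actions, chosen uniformly at random (an assumption not made in the problem)."""
    if rng.random() < p:
        return CORRECT
    return rng.choice([a for a in ACTIONS if a != CORRECT])


def simulate_ps(p: float, trials: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the probability that the majority vote picks the correct action."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        outputs = [cpu_output(p, rng) for _ in range(3)]
        if majority_vote(outputs) == CORRECT:  # None (no majority) never equals CORRECT
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(majority_vote([0, 0, 2]))  # 0: two CPUs agree on action 0
    print(majority_vote([0, 1, 2]))  # None: all three outputs differ
    print(simulate_ps(0.9))          # an estimate, not an exact value
```

With p = 0.9 the estimate should land close to 0.972, the exact value computed in the Solution.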

Show how you derive the probability p_s that the entire system takes a correct action. Does the multiprocessor system improve reliability compared with a single CPU?

Solution

Let p be the probability that a CPU produces a correct output. Since there is always exactly one correct action out of the 3 possible actions, let c denote the correct output and w denote either of the two wrong outputs. The following table enumerates all possible combinations of the first outputs of the 3 CPUs.
----------------------------------------------------------------
Outputs of      | Majority  | Correct?  | Probability
CPU1 CPU2 CPU3  |           |           |
----------------------------------------------------------------
All 3 outputs are correct
  c    c    c   |    c      |    Y      | p^3
----------------------------------------------------------------
2 of the 3 outputs are correct
  c    c    w   |    c      |    Y      | p^2(1-p)
  c    w    c   |    c      |    Y      | p^2(1-p)
  w    c    c   |    c      |    Y      | p^2(1-p)
----------------------------------------------------------------
Only 1 of the 3 outputs is correct
(irrespective of which wrong outputs)
                | w or none |    N      | 3p(1-p)^2
----------------------------------------------------------------
All 3 outputs are wrong
(irrespective of which wrong outputs)
                |    w      |    N      | (1-p)^3
----------------------------------------------------------------
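The table can be checked by brute force. The sketch below, under stated assumptions, enumerates all 3^3 = 27 ordered output combinations and sums the probabilities of those whose majority is the correct action. The labels "c", "w1", "w2", the function names, and the even split of 1 - p between the two wrong actions are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical labels: "c" is the correct action, "w1" and "w2" the two wrong ones.
ACTIONS = ("c", "w1", "w2")


def outcome_prob(outputs, p):
    """Probability of one ordered combination of the 3 CPU outputs.

    Assumes the CPUs fail independently and that a failing CPU outputs
    w1 or w2 with probability (1 - p) / 2 each.
    """
    prob = 1.0
    for o in outputs:
        prob *= p if o == "c" else (1 - p) / 2
    return prob


def exact_ps(p):
    """Sum the probabilities of all 27 combinations whose majority is "c"."""
    total = 0.0
    for outputs in product(ACTIONS, repeat=3):
        action, count = Counter(outputs).most_common(1)[0]
        if count >= 2 and action == "c":
            total += outcome_prob(outputs, p)
    return total


print(round(exact_ps(0.9), 6))  # 0.972
```

Splitting 1 - p unevenly between w1 and w2 would change which wrong majority appears, but not the result: every combination with at least two c outputs contributes p^2 times the total probability of a wrong output.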
The total probability p_s that the 3 CPUs produce a correct output at the first trial is the sum of the Y rows: p_s = p^3 + 3p^2(1-p) = p^2(p + 3(1-p)) = p^2(3 - 2p) = p^2(1 + 2(1 - p)).
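As a quick sanity check of the algebra, the sketch below evaluates the four equivalent expressions for p_s with exact rational arithmetic (Python's fractions module), so the comparison is free of floating-point round-off. The function name ps_forms is hypothetical.

```python
from fractions import Fraction


def ps_forms(p):
    """The four equivalent expressions for p_s from the derivation, evaluated at p."""
    return (
        p**3 + 3 * p**2 * (1 - p),
        p**2 * (p + 3 * (1 - p)),
        p**2 * (3 - 2 * p),
        p**2 * (1 + 2 * (1 - p)),
    )


# Check that all four forms agree exactly for p = 0, 0.1, ..., 1.
for k in range(11):
    p = Fraction(k, 10)
    forms = ps_forms(p)
    assert len(set(forms)) == 1, (p, forms)

print(ps_forms(Fraction(9, 10))[0])  # 243/250, i.e. 0.972
```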

For p = 0.9 (so 1 - p = 0.1), p_s = 0.9 × 0.9 × (1 + 2 × 0.1) = 0.81 × 1.2 = 0.972. The multiprocessor system thus improves on a single CPU by 0.072, i.e., by 8% relative to p (0.072 / 0.9 = 0.08).

For p = 0.999 (so 1 - p = 0.001), p_s = 0.999 × 0.999 × (1 + 2 × 0.001) = 0.998001 × 1.002 = 0.999997002. The improvement is about 0.000997, i.e., roughly 0.1% relative to a single CPU.
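The two worked examples can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function name system_reliability is hypothetical and simply implements p_s = p^2(3 - 2p).

```python
def system_reliability(p: float) -> float:
    """p_s = p^2 (3 - 2p): probability that the 3-CPU majority vote is correct."""
    return p * p * (3 - 2 * p)


for p in (0.9, 0.999):
    ps = system_reliability(p)
    absolute_gain = ps - p            # improvement over a single CPU
    relative_gain = absolute_gain / p  # improvement relative to p
    print(f"p = {p}: p_s = {ps:.9f}, gain = {absolute_gain:.6f} ({relative_gain:.2%} of p)")
```

For p = 0.9 this reports a relative gain of 8.00%, and for p = 0.999 about 0.10%.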

Note that the relative improvement decreases toward 0 as p approaches 1.
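The trend can be made precise with a short derivation, a sketch using p_s = p^2(3 - 2p):

```latex
\frac{p_s - p}{p} = \frac{p^2(3 - 2p) - p}{p} = 3p - 2p^2 - 1 = (2p - 1)(1 - p), \qquad 0 < p \le 1.
```

For p = 0.9 this gives 0.8 × 0.1 = 0.08, and for p = 0.999 it gives 0.998 × 0.001 = 0.000998, matching the figures of 8% and roughly 0.1%. The factor (1 - p) drives the ratio to 0 as p approaches 1.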

However, such an improvement is not always achievable. We show more examples below.
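To see where the improvement breaks down, the sketch below tabulates p_s against p for p = 0.1, 0.2, ..., 0.9, including values below 0.5. The function name system_reliability is hypothetical; it implements p_s = p^2(3 - 2p).

```python
def system_reliability(p: float) -> float:
    """p_s = p^2 (3 - 2p) for the 3-CPU majority vote."""
    return p * p * (3 - 2 * p)


print(" p     p_s     p_s - p")
for k in range(1, 10):
    p = k / 10
    ps = system_reliability(p)
    print(f"{p:.1f}  {ps:.4f}  {ps - p:+.4f}")  # sign shows gain (+) or loss (-) vs. one CPU
```

The last column is negative for p = 0.1 to 0.4 (for example, p = 0.3 gives p_s = 0.216), zero at p = 0.5, and positive for p = 0.6 to 0.9.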

It is now apparent that an improvement is possible only when p > 0.5; when p < 0.5, the system is less reliable than a single CPU. The break-even point p = 0.5 can be derived analytically by solving the following equation:

p = p^2(1 + 2(1 - p))
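Solving this equation step by step (a sketch of the algebra):

```latex
p = p^2\bigl(1 + 2(1 - p)\bigr)
\iff p\,(2p^2 - 3p + 1) = 0
\iff p\,(2p - 1)(p - 1) = 0
\iff p \in \{0,\ \tfrac{1}{2},\ 1\}.
```

The roots p = 0 and p = 1 are the trivial cases in which every CPU is always wrong or always right, so p = 0.5 is the only break-even point strictly between them. Since p_s - p = p(2p - 1)(1 - p), the difference is positive for 0.5 < p < 1 and negative for 0 < p < 0.5.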

Remark: You may consider p < 0.5 unrealistic. However, in extreme environments with a lot of noise, very high or very low temperatures, or high levels of radiation, p may well be low.