Fault (Tolerance) Ideas
Murphy's Law says that if anything can go wrong, it will (ref: Captain Edward A. Murphy, http://www.murphys-laws.com/murphy/murphy-true.html ). In our world of computing this includes:

Network switches and wiring: they can be disabled or, worse, flip bits in the data sent through the network.

TCP checksum: a single flipped bit normally causes a checksum error, and the packet gets retransmitted. A double bit flip, however, can corrupt the packet in a way that still passes the checksum, so the TCP layer does not know that the packet is corrupted.

HDD wiring: we can install the wrong cable, or wrongly install a correct cable. Swapping a good Ultra DMA ATA cable for a bad one (one that is still detected as Ultra DMA) gives us a high Ultra DMA CRC error rate. We also have a CRC-undetectable error rate of something like 5x10^-13 (illogically taken from http://doc.utwente.nl/64267/1/schiphorst.pdf ); this corrupt data (on average one bit per two terabytes of data) will get stored on our disks.

HDD failure: commodity disks will fail in 2-3 years, an