Principle: Fault tolerance
Fault tolerance is based on redundancy
Redundancy is achieved by replicating components or actions
- Duplicate components, detect errors, and ignore bad components (replicate in space)
- Detect errors and retry (replicate in time)
- Checkpoint, detect errors, crash, reconfigure without the bad components, and restart from the checkpoint (hybrid replication)