Hardware errors must be classified before they can be handled.
In a first step, they are therefore classified according to the action required for recovery:
- Corrected Errors (CE): These are automatically detected and corrected by the hardware, meaning that the application does not need to take any action.
- Uncorrected Recoverable Errors (UCR): These are non-fatal, but the application must take a recovery action to survive them.
- Uncorrected Errors (UE): These are fatal and always require a full system reboot.
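
The classification above can be sketched as a lookup from error class to required recovery action. This is a minimal illustration only: the names `ErrorClass`, `Action`, `required_action`, and `is_fatal` are hypothetical and do not correspond to any particular hardware or operating system interface.

```python
from enum import Enum


class ErrorClass(Enum):
    """The three error classes from the list above."""
    CE = "corrected"
    UCR = "uncorrected recoverable"
    UE = "uncorrected"


class Action(Enum):
    """Hypothetical labels for the recovery action each class requires."""
    NONE = "no application action"
    APPLICATION_RECOVERY = "application-level recovery"
    SYSTEM_REBOOT = "full system reboot"


# Mapping taken directly from the classification: the class alone
# determines which recovery action is needed.
RECOVERY_ACTION = {
    ErrorClass.CE: Action.NONE,
    ErrorClass.UCR: Action.APPLICATION_RECOVERY,
    ErrorClass.UE: Action.SYSTEM_REBOOT,
}


def required_action(error_class: ErrorClass) -> Action:
    """Return the recovery action required for the given error class."""
    return RECOVERY_ACTION[error_class]


def is_fatal(error_class: ErrorClass) -> bool:
    """Only uncorrected errors (UE) are fatal in this classification."""
    return error_class is ErrorClass.UE
```

Under this sketch, a handler would first determine the class of a reported error and then dispatch on `required_action`; only the UCR case involves the application itself.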
