- [1] Defined as Time To Recover (TTR), it is the elapsed time from the date the failure reaches production to the date the incident is resolved. The unit of measurement for <elapsed_time> must be defined as follows:
- if elapsed_time < 1 hour, then X minutes.
- if 1 day > elapsed_time > 1 hour, then X hours.
- if 1 month > elapsed_time > 1 day, then X days.
Always with a precision of up to one decimal place.
Term | Definition | |
COMMUNICATION_FAILURE | Occurs when it was an expected behavior, but it was not specified in the documentation, or there were insufficient instructions to perform the task correctly. | |
DATA_QUALITY | Refers to a situation in which the data used within a process is deemed inadequate or insufficient in accuracy, completeness, or relevancy, potentially leading to erroneous results or compromised outcomes. | |
FAILED_LINTER | Occurs when the linter and its configured rules do not identify a syntax error. | |
FAILED_MIGRATION | Denotes an error during a migration process, negatively impacting processes or data integrity. | |
IMPOSSIBLE_TO_TEST | Occurs when the flow or nature of the operation does not allow for testing. | |
INCOMPLETE_PERSPECTIVE | Refers to a situation where certain aspects were not considered during the planning or development of the functionality/process, resulting in failures or unexpected behavior. | |
INFRASTRUCTURE_ERROR | Refers to a situation when misconfigurations within the infrastructure as code setup result in component failures, reduced performance, or even complete service interruptions. | |
LACK_OF_TRACEABILITY | Occurs when insufficient loggers or tools are available to identify the root cause of the error. | |
MISSING_ALERT | Occurs when technologies in the development environment and the continuous integration do not trigger an alert on the error, allowing it to go unnoticed in production. | |
MISSING_TEST | Refers to the absence of specific tests, which can lead to undetected errors and result in product failures. | |
NO_SPECIFIED | Refers to an error that, regarding its nature, has no reasonable or clear explanation. | |
ROTATION_FAILURE | Occurs when failures in the credential rotation process lead to specific components losing access to third-party services, compromising the operability, performance, or availability of our products. | |
THIRD_PARTY_CHANGE | Refers to a situation wherein a third-party service or technology provider implements alterations in their infrastructure, potentially resulting in operational disruptions or unanticipated behaviors for their end-users. | |
THIRD_PARTY_ERROR |
| |
UNHANDLED_EXCEPTION | Occurs when an exception happens without an associated error-handling mechanism. |