Asymmetric denial of service - ReDoS - Python

Need

Prevention of Regular Expression Denial of Service (ReDoS) attacks

Context

Usage of Python 3 for scripting and application development
Usage of the re library for regular expression pattern matching and manipulation

Description

Non compliant code

        import re

def validate_input(user_input):
    # This is a dangerous regular expression that can cause ReDoS
    pattern = re.compile("^(a+)+$")
    return bool(pattern.match(user_input))

In the code above, the function validate_input(user_input) is used to validate the input from the user using a regular expression. The regular expression ^(a+)+$ is a dangerous regex because it can cause Regular expression Denial of Service (ReDoS).

This regular expression is dangerous because it uses nested quantifiers. The outer + means "one or more of the preceding element" and the inner + means "one or more of 'a'". This kind of regular expression can lead to a catastrophic backtracking when trying to find a match.

For example, if an attacker sends a string of 'a's followed by a single different character (e.g., 'aaaaaaaaaaaaaaaab'), the regex engine will start backtracking to find a match, which can cause the CPU to spike and the server to become unresponsive, leading to a denial of service.

This kind of vulnerability can be exploited by an attacker to crash the server, making it unavailable to legitimate users.

Steps

Identify the dangerous regular expression in the code. In this case, it is '^(a+)+$'. This pattern can cause ReDoS because it has nested quantifiers, which can lead to excessive backtracking.
Simplify the regular expression to reduce the risk of ReDoS. In this case, the regular expression can be simplified to '^(a+)$'. This pattern will match one or more 'a' characters, but it will not cause excessive backtracking because it does not have nested quantifiers.
Replace the dangerous regular expression in the code with the simplified regular expression.
Test the updated code to ensure that it still functions as expected and that the risk of ReDoS has been mitigated.

Compliant code

        import re

def validate_input(user_input):
    # This is a simplified regular expression that mitigates the risk of ReDoS
    pattern = re.compile("^(a+)$")
    return bool(pattern.match(user_input))

The original code contained a dangerous regular expression ^(a+)+$ which could lead to Regular Expression Denial of Service (ReDoS) attacks. This is because the pattern had nested quantifiers ( + inside another +), which can cause excessive backtracking when matching certain strings. This excessive backtracking can consume a lot of CPU resources, potentially leading to a denial of service.

The fixed code replaces the dangerous regular expression with a simplified one: ^(a+)$. This pattern will match one or more 'a' characters, but it will not cause excessive backtracking because it does not have nested quantifiers. This mitigates the risk of ReDoS attacks.

The function validate_input now uses this safer regular expression to validate user input. It compiles the regular expression pattern, then checks if the user input matches this pattern. The function returns True if the input matches the pattern, and False otherwise.

This updated code should be tested with a variety of inputs to ensure it still functions as expected. It's also recommended to monitor the application's CPU usage during testing, to confirm that the risk of ReDoS has been effectively mitigated.

References

211. Asymmetric denial of service - ReDoS