Insufficient data authenticity validation - Checksum verification - Python

Need

Ensure proper data authenticity validation through checksum verification

Context

  • Usage of Python 3 for writing and executing Python code
  • Usage of requests for making HTTP requests in Python

Description

Non-compliant code

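import requests


def load_resource(url):
    # The response body is returned as-is: nothing checks that it is
    # the data the server was supposed to deliver.
    response = requests.get(url)
    data = response.content
    return data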

In this piece of code, the function load_resource(url) loads a resource from an external server: it sends a GET request to the given URL and returns the raw bytes of the response body.

This code is vulnerable because it does not validate the authenticity of the data it receives from the external server. In other words, it does not ensure that the data has not been tampered with in transit or on the serving host. This lack of validation can lead to serious security issues; for example, if the resource is a script, package, or configuration file, a modified copy could make the application execute malicious code or expose sensitive information.

The vulnerability lies in the fact that the application implicitly trusts the data from the external server without checking its integrity. A common way to ensure data integrity is to use a checksum: a small value derived from the data that changes whenever the data changes. To detect deliberate tampering, and not only accidental corruption, the checksum must be computed with a cryptographic hash function such as SHA-256, because simple error-detecting codes like CRC32 can be forged by an attacker. The application should compute the checksum of the received data and compare it to an expected checksum obtained from a trusted source. If the two do not match, the data should be considered corrupted or tampered with and must not be used.
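The comparison described above can be kept separate from the networking code so that it is easy to test on its own. The following is a minimal sketch using only the standard library; the helper name verify_sha256 is hypothetical, and it assumes the expected checksum is a hex-encoded SHA-256 digest. Changing even one byte of the input produces a different digest, so the check fails.

```python
import hashlib
import hmac


def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of data matches expected_hex.

    Hypothetical helper: expected_hex is assumed to be a hex-encoded
    SHA-256 digest obtained from a trusted source.
    """
    calculated = hashlib.sha256(data).hexdigest()
    # Hex digests are case-insensitive, and copied values often carry
    # surrounding whitespace, so normalize before comparing.
    expected = expected_hex.strip().lower()
    # compare_digest compares the two strings in constant time.
    return hmac.compare_digest(calculated, expected)


if __name__ == "__main__":
    original = b"hello world"
    good = hashlib.sha256(original).hexdigest()
    print(verify_sha256(original, good))        # True
    print(verify_sha256(b"hello worle", good))  # False: one byte changed
```

Keeping the check in a pure function like this makes it straightforward to reuse for every downloaded resource, independently of how the bytes were fetched.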

Steps

  • Introduce a checksum verification process for the data received from the external server.
  • Calculate the checksum of the received data using a secure hash algorithm such as SHA-256.
  • Obtain the expected checksum from a trusted source. This could be a separate secure channel or included with the data if it's signed by a trusted entity.
  • Compare the calculated checksum with the expected checksum.
  • If the checksums do not match, reject the data as it may have been tampered with during transmission.
  • Ensure that the checksum verification process is applied to all data received from external servers, not just specific resources.
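One common way to publish expected checksums is a manifest in the format written by the GNU coreutils sha256sum tool, where each line holds a hex digest, a space, a mode character (a space for text mode or * for binary mode), and a file name. The sketch below assumes such a manifest has already been obtained over a trusted channel or had its signature verified; the function name parse_sha256sums is hypothetical. It only parses the text and does not by itself establish that the manifest is trustworthy.

```python
import re

# Matches "<64 hex digits><space><' ' or '*'><file name>", the line format
# written by GNU sha256sum in text (' ') or binary ('*') mode.
_LINE = re.compile(r"^([0-9a-fA-F]{64}) [ *](.+)$")


def parse_sha256sums(manifest: str) -> dict[str, str]:
    """Map each file name in a sha256sum-style manifest to its digest."""
    checksums: dict[str, str] = {}
    for lineno, line in enumerate(manifest.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        match = _LINE.match(line)
        if match is None:
            # Reject malformed input instead of silently skipping it.
            raise ValueError(f"malformed manifest line {lineno}: {line!r}")
        digest, name = match.groups()
        # Store digests in lowercase so later comparisons are uniform.
        checksums[name] = digest.lower()
    return checksums
```

The digest looked up for a given file name can then be passed as the expected_checksum argument of load_resource. Lines that do not match the expected format raise an error, so a truncated or corrupted manifest is not silently accepted.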

Compliant code

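import hashlib
import hmac

import requests


def load_resource(url, expected_checksum):
    # A timeout prevents the request from hanging indefinitely
    response = requests.get(url, timeout=10)
    # Fail early on HTTP errors (4xx/5xx) instead of hashing an error page
    response.raise_for_status()
    data = response.content

    # Calculate the SHA-256 checksum of the received data
    sha256 = hashlib.sha256()
    sha256.update(data)
    calculated_checksum = sha256.hexdigest()

    # Compare the calculated checksum with the expected checksum;
    # hex digests are case-insensitive, so normalize before comparing
    if not hmac.compare_digest(calculated_checksum, expected_checksum.strip().lower()):
        raise ValueError('Checksum does not match, data may have been tampered with.')

    return data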

The updated code introduces a checksum verification process to validate the integrity of data received from external servers.

The load_resource function now takes an additional parameter, expected_checksum, which holds the expected checksum of the data to be loaded. This checksum must be obtained from a trusted source.

The function retrieves the data from the specified URL and calculates its checksum using the SHA-256 hash algorithm. This is done by creating a new SHA-256 hash object, updating it with the received data, and then getting the resulting checksum as a hexadecimal string.
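Reading response.content holds the whole resource in memory before it is hashed. For large downloads, the same SHA-256 computation can be done incrementally, because calling update repeatedly with consecutive chunks yields the same digest as hashing the concatenated data in one call. A minimal sketch, assuming the chunks come from an iterator such as the one returned by requests' Response.iter_content when the request is made with stream=True; the helper name sha256_of_chunks is hypothetical:

```python
import hashlib
from collections.abc import Iterable


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks incrementally and return the hex digest."""
    sha256 = hashlib.sha256()
    for chunk in chunks:
        # Feeding consecutive chunks to update() gives the same result as
        # hashing the concatenation of all chunks at once.
        sha256.update(chunk)
    return sha256.hexdigest()


if __name__ == "__main__":
    parts = [b"hello ", b"wor", b"ld"]
    # Same digest as hashlib.sha256(b"hello world").hexdigest()
    print(sha256_of_chunks(parts))
```

With requests, this could be called as sha256_of_chunks(response.iter_content(chunk_size=65536)) on a response obtained with requests.get(url, stream=True, timeout=10). If the chunks are also written to disk while streaming, the resulting file should only be used after the final digest has been verified.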

The calculated checksum is then compared with the expected checksum. If they do not match, the function raises a ValueError indicating that the data may have been tampered with, so tampered data never reaches the rest of the application.

This checksum verification process is a crucial step in ensuring the integrity of data loaded from external sources, and it also establishes authenticity as long as the expected checksum comes from a source the attacker cannot modify. It helps protect the application against attacks that tamper with the data during transmission.
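A plain checksum only proves that the data matches the expected value; it does not prove who produced that value. If the expected checksum traveled over the same channel as the data, an attacker able to modify the data could replace the checksum as well. When client and server share a secret key, a keyed MAC such as HMAC-SHA256 (Python's standard hmac module) is one way to bind the digest to that key; public-key signatures are the usual alternative when no shared secret exists. The sketch below swaps in HMAC for illustration only; the function names sign and verify_tag, and the way the key is obtained, are assumptions rather than part of the original example.

```python
import hashlib
import hmac


def sign(data: bytes, key: bytes) -> str:
    """Compute a hex-encoded HMAC-SHA256 tag for data (producer side)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_tag(data: bytes, key: bytes, received_tag: str) -> bool:
    """Check a received HMAC-SHA256 tag in constant time (consumer side)."""
    expected_tag = sign(data, key)
    return hmac.compare_digest(expected_tag, received_tag.strip().lower())


if __name__ == "__main__":
    # Hypothetical shared secret; in practice it would come from a secrets store.
    key = b"example-shared-secret"
    payload = b"resource contents"
    tag = sign(payload, key)
    print(verify_tag(payload, key, tag))               # True
    print(verify_tag(b"tampered contents", key, tag))  # False
```

Unlike a bare SHA-256 checksum, a valid tag cannot be recomputed by someone who does not hold the key, so the tag can travel alongside the data without losing its value as an authenticity check.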
