Insecurely generated token - Python

Need

Secure token generation mechanism

Context

  • Usage of Python 3 for writing and executing Python code
  • Usage of hashlib for cryptographic hashing operations

Description

Non-compliant code

        import hashlib

        def generate_token(username, password):
            # Insecure: the token is a deterministic, unsalted MD5 digest of the credentials
            token = hashlib.md5((username + password).encode()).hexdigest()
            return token

In the code above, the generate_token function takes a username and a password, concatenates them, encodes the result to bytes, and hashes it with the MD5 function from the hashlib module. The resulting hexadecimal digest is returned as the token.

The main weakness is the use of MD5, which is cryptographically broken: practical collision attacks exist, so two different inputs can be crafted to produce the same digest. MD5 is also very fast to compute, and here the token is a deterministic function of the username and password. An attacker who obtains a token and knows the username can therefore guess passwords offline until one reproduces the token, recovering the password the token was meant to mask.
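
To make the risk concrete, here is a minimal sketch of an offline dictionary attack. It assumes the attacker has captured one token and knows or guesses the username; the username, password, and guess list are hypothetical, and crack_token is an illustrative helper. Because the token is a deterministic, unsalted MD5 digest of username + password, each guess costs a single fast hash.

```python
import hashlib
from typing import Iterable, Optional


def generate_token(username: str, password: str) -> str:
    # Same non-compliant scheme as above: deterministic, unsalted MD5
    return hashlib.md5((username + password).encode()).hexdigest()


def crack_token(token: str, username: str, wordlist: Iterable[str]) -> Optional[str]:
    # Hypothetical attacker helper: recompute the token for each candidate password
    for candidate in wordlist:
        if generate_token(username, candidate) == token:
            return candidate
    return None


# Hypothetical captured token and guess list
captured = generate_token("alice", "sunshine")
print(crack_token(captured, "alice", ["123456", "password", "sunshine", "qwerty"]))
```

With a real word list of millions of entries the loop is the same; MD5's speed is what makes it cheap.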

Moreover, the code does not use any salt in the hashing process. A salt is random data that is used as an additional input to a one-way function that hashes data, a password or passphrase. Salts are used to safeguard passwords in storage. Without a salt, an attacker can precompute a table of hash values for common passwords (a 'rainbow table') and then just look up the hash to quickly find the password that produced it.

This is a serious security vulnerability because if an attacker can obtain these tokens, they can potentially impersonate a user or gain access to sensitive information that the token is intended to protect.

Steps

  • Replace the insecure MD5 hash function with a secure hash function such as SHA-256.
  • Use a salt, that is, random data supplied as an additional input to the hash function, so that identical inputs do not produce identical outputs.
  • For anything derived from passwords, consider a library designed specifically for password hashing, such as bcrypt, scrypt or Argon2.
  • Do not derive tokens directly from user-supplied secrets such as passwords. If user data must be involved, use user-specific but non-sensitive data.
  • Consider using a cryptographically secure random number generator (in Python, the secrets module) to generate a unique token for each user session.
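
The last step can be sketched as follows. This is a different technique from the compliant code in the next section: the token is pure randomness from Python's secrets module rather than a hash of user data. The function names and the in-memory dictionary are hypothetical stand-ins for a real session store, which would also track expiry.

```python
import hashlib
import secrets
from typing import Dict, Optional

# Hypothetical in-memory store: SHA-256 digest of the token -> username.
# A real service would use a database with an expiry time per entry.
_sessions: Dict[str, str] = {}


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


def issue_session_token(username: str) -> str:
    # 32 bytes from the OS CSPRNG, URL-safe Base64 encoded (about 43 characters)
    token = secrets.token_urlsafe(32)
    # Keep only a digest, so a leaked store does not hand out usable tokens
    _sessions[_digest(token)] = username
    return token


def user_for_token(token: str) -> Optional[str]:
    # Look the presented token up by its digest; unknown tokens return None
    return _sessions.get(_digest(token))


token = issue_session_token("alice")
print(user_for_token(token))
```

The token carries no information about the user or password, so capturing it reveals nothing beyond the session it grants.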

Compliant code

        import binascii
        import hashlib
        import os

        def generate_token(username):
            # Draw a 16-byte salt from the operating system's secure random source
            salt = os.urandom(16)
            # Derive a 32-byte key with PBKDF2-HMAC-SHA256 from the username and salt
            derived_key = hashlib.pbkdf2_hmac('sha256', username.encode(), salt, 100000)
            # Convert the binary key to a hexadecimal string for storage
            token = binascii.hexlify(derived_key).decode()
            return token

The updated code no longer uses MD5. It derives the token with PBKDF2 using HMAC-SHA256 (hashlib.pbkdf2_hmac with 'sha256'). SHA-256 is currently considered secure, with no known practical collision or preimage attacks.

The os.urandom function generates a 16-byte salt from the operating system's cryptographically secure random source. A salt is random data used as an additional input to the hash function. Because a fresh salt is drawn on every call, the token cannot be predicted or precomputed from the username alone, which defeats precomputed tables (rainbow tables).

The hashlib.pbkdf2_hmac function derives the token using PBKDF2 (Password-Based Key Derivation Function 2) with HMAC-SHA256 as its pseudorandom function. HMAC (Hash-based Message Authentication Code) is a keyed hash construction; inside PBKDF2, the input (here, the username) serves as the HMAC key. PBKDF2 hashes the salt with HMAC, then repeats the HMAC step for the given number of iterations (100,000 in this code) and combines the intermediate results into the derived key. The iteration count makes each derivation deliberately slow, which raises the cost of brute-force guessing.
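
The iteration structure can be sketched directly with the hmac module, following the PBKDF2 definition in RFC 8018. This is an educational re-implementation, not a replacement for hashlib.pbkdf2_hmac; the function name pbkdf2_sha256 is illustrative, and the small iteration count only keeps the check fast.

```python
import hashlib
import hmac


def pbkdf2_sha256(password: bytes, salt: bytes, iterations: int, dklen: int = 32) -> bytes:
    """Educational PBKDF2-HMAC-SHA256; use hashlib.pbkdf2_hmac in real code."""
    hlen = hashlib.sha256().digest_size  # 32 bytes per output block
    n_blocks = -(-dklen // hlen)  # ceiling division
    output = b""
    for i in range(1, n_blocks + 1):
        # U_1 = HMAC(password, salt || INT(i)), with i as a 4-byte big-endian integer
        u = hmac.new(password, salt + i.to_bytes(4, "big"), hashlib.sha256).digest()
        block = int.from_bytes(u, "big")
        # U_j = HMAC(password, U_{j-1}); the block is U_1 XOR U_2 XOR ... XOR U_c
        for _ in range(iterations - 1):
            u = hmac.new(password, u, hashlib.sha256).digest()
            block ^= int.from_bytes(u, "big")
        output += block.to_bytes(hlen, "big")
    return output[:dklen]


# Matches the standard library for the same inputs
assert pbkdf2_sha256(b"alice", b"somesalt", 1000) == hashlib.pbkdf2_hmac(
    "sha256", b"alice", b"somesalt", 1000
)
```

Each extra iteration is one more HMAC call, which is why 100,000 iterations make every guess about 100,000 times more expensive than a single hash.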

The binascii.hexlify function is used to convert the binary hash into a hexadecimal string for easier storage and handling.
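
A quick check of what the conversion produces, assuming the same parameters as the compliant code: the 32-byte PBKDF2-HMAC-SHA256 output becomes a 64-character hexadecimal string, and bytes.hex() gives the same result without the binascii import.

```python
import binascii
import hashlib
import os

derived = hashlib.pbkdf2_hmac("sha256", b"alice", os.urandom(16), 100000)
token = binascii.hexlify(derived).decode()

print(len(derived), len(token))  # 32 raw bytes become 64 hex characters
print(token == derived.hex())  # same string via the built-in bytes method
```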

The password is no longer an input to token generation, which reduces the risk of password exposure. Instead, the token is derived from the username and a fresh random salt, so two tokens generated for the same username are different. Because the salt is neither returned nor stored, the token cannot be recomputed later; the application has to store the issued token (or a hash of it) in order to validate it.
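
The effect of the random salt can be observed by calling the compliant function twice for the same username. This sketch repeats the compliant scheme so that it runs on its own; the username is hypothetical.

```python
import binascii
import hashlib
import os


def generate_token(username: str) -> str:
    # Same scheme as the compliant code: PBKDF2-HMAC-SHA256 with a fresh 16-byte salt
    salt = os.urandom(16)
    derived_key = hashlib.pbkdf2_hmac("sha256", username.encode(), salt, 100000)
    return binascii.hexlify(derived_key).decode()


first = generate_token("alice")
second = generate_token("alice")
print(first != second)  # a new salt is drawn on every call, so the tokens differ
```

Since the salt is discarded inside the function, neither token can be regenerated from the username; validation has to rely on a stored copy.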