CSV injection - Python

Need

Prevention of CSV injection attacks

Context

  • Usage of Python 3 for scripting and application development
  • Usage of csv for parsing and manipulating CSV files
  • Usage of Django for building web applications in Python

Description

Non compliant code

import csv

from django.contrib.auth.models import User  # assumes Django's built-in User model
from django.http import HttpResponse


def export_users_csv(request):
    response = HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename="users.csv"'

    writer = csv.writer(response)
    users = User.objects.all().values_list('username', 'first_name', 'last_name', 'email')
    for user in users:
        # Vulnerable: every field is written exactly as stored
        writer.writerow(user)

    return response

In the above Django view, we're exporting all users in the system to a CSV file. The exported fields are: username, first_name, last_name, and email.

The vulnerability lies in the fact that we're directly writing user data into the CSV file without any sanitization or escaping. If any of these fields contain malicious data, it will be written as is into the CSV file.

For instance, if a user's first_name is =1+1, Excel will interpret the cell as a formula and evaluate it when the CSV file is opened. This is a CSV Injection vulnerability, also known as Formula Injection. The consequences can be serious when the formula is harmful: a Dynamic Data Exchange (DDE) payload such as =cmd|' /C calc'!A0 can launch the calculator app or, worse, run arbitrary commands on installations where DDE is enabled and the user accepts Excel's security prompts.
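The behavior can be reproduced outside Django. The following is a minimal sketch, assuming an in-memory buffer in place of the HTTP response and a made-up user row; it shows that csv.writer only quotes fields for CSV syntax and passes a leading = through untouched.

```python
import csv
import io

# Hypothetical user row; the first_name field carries a formula payload.
row = ["jdoe", "=1+1", "Doe", "jdoe@example.com"]

buffer = io.StringIO()  # stands in for the HttpResponse object
csv.writer(buffer).writerow(row)

exported = buffer.getvalue()
print(exported)  # the second field is still the raw formula
```

Any spreadsheet application that opens this output sees =1+1 as the content of the second cell.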

Steps

  • Sanitize the data before writing it to the CSV file. Create a function that checks whether a field begins with '=', '+', '-', or '@' and, if so, prepends a single quote ('), which makes Excel treat the field as text rather than evaluate it as a formula.
  • Apply this sanitization function to each field of each user before writing it to the CSV file.

Compliant code

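import csv

from django.contrib.auth.models import User  # assumes Django's built-in User model
from django.http import HttpResponse


def sanitize(data):
    # Prefix formula-like strings with a single quote so Excel treats them as text.
    # startswith() is safe on empty strings, whereas data[0] raises IndexError.
    if isinstance(data, str) and data.startswith(('=', '+', '-', '@')):
        return "'" + data
    return data


def export_users_csv(request):
    response = HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename="users.csv"'

    writer = csv.writer(response)
    users = User.objects.all().values_list('username', 'first_name', 'last_name', 'email')
    for user in users:
        sanitized_user = [sanitize(field) for field in user]
        writer.writerow(sanitized_user)

    return response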

The original code was vulnerable to CSV Injection, as it directly wrote user data into a CSV file without sanitizing it. This could allow an attacker to inject formulas into fields that are later exported as part of CSV files and can be interpreted by Excel.

The fixed code includes a sanitize function that checks if the first character of the data is one of '=', '+', '-', '@'. If it is, the function prepends the field with a single quote ('), which makes Excel treat the field as a string and not execute it as a formula.

In the export_users_csv function, this sanitize function is applied to each field of each user before writing it to the CSV file. This ensures that any potentially harmful data is neutralized before it is written to the CSV file, thus preventing CSV Injection.
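One way to make sure no export path forgets the sanitization step is to wrap the writer itself. The sketch below is an illustration under stated assumptions, not part of the original fix: SanitizingWriter and FORMULA_PREFIXES are hypothetical names, and an in-memory buffer stands in for the HTTP response.

```python
import csv
import io

FORMULA_PREFIXES = ("=", "+", "-", "@")


class SanitizingWriter:
    """Hypothetical wrapper that neutralizes formula-like fields on every row."""

    def __init__(self, fileobj, **kwargs):
        self._writer = csv.writer(fileobj, **kwargs)

    def writerow(self, row):
        self._writer.writerow([self._sanitize(field) for field in row])

    @staticmethod
    def _sanitize(value):
        # Only strings can carry a formula; startswith() is safe on empty strings.
        if isinstance(value, str) and value.startswith(FORMULA_PREFIXES):
            return "'" + value
        return value


buffer = io.StringIO()  # stands in for the HttpResponse object
writer = SanitizingWriter(buffer)
writer.writerow(["jdoe", "=1+1", "Doe", "jdoe@example.com"])
writer.writerow(["asmith", "", "Smith", 42])  # empty and non-string fields pass through

lines = buffer.getvalue().splitlines()
```

Here the first row is written as jdoe,'=1+1,Doe,jdoe@example.com. The trade-off is that legitimate values starting with one of the prefixes, such as a negative number stored as text, also gain a leading quote.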

References