XPath injection - Python

Need

Prevention of XPath injection attacks

Context

Usage of Python 3 for scripting and application development
Usage of Django for building web applications in Python
Usage of lxml for parsing and manipulating XML documents

Description

Non compliant code

from django.http import HttpResponse
from lxml import etree
import os

def user_search(request):
    username = request.GET['username']
    tree = etree.parse(os.path.join(os.getcwd(), 'users.xml'))
    users = tree.xpath(f"//user[username='{username}']")
    return HttpResponse(users)

This code snippet is a Django view function that fetches user data from an XML file based on a username passed as a GET parameter.

The vulnerability lies in the user_search function, specifically in the line:

users = tree.xpath(f"//user[username='{username}']")

Here, the username parameter is directly embedded into an XPath expression without any sanitization or validation. This means that an attacker can manipulate the 'username' GET parameter to alter the XPath expression, leading to an XPath Injection vulnerability.

For instance, an attacker could submit the username ' or '1'='1, which turns the XPath expression into //user[username='' or '1'='1']. Because '1'='1' is always true, the predicate matches every user element, so the view returns all users regardless of the name requested, bypassing any intended access controls.
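
To make the mechanics concrete, the following sketch reproduces only the string formatting from the view, without touching any XML. The helper name build_query is hypothetical and exists only for illustration; the crafted value is the classic ' or '1'='1 payload.

```python
def build_query(username: str) -> str:
    """Mimic the vulnerable f-string from the view (illustration only)."""
    return f"//user[username='{username}']"


# A benign value yields the query the developer intended.
benign = build_query("alice")

# A crafted value closes the string literal early and appends its own logic.
malicious = build_query("' or '1'='1")

print(benign)     # //user[username='alice']
print(malicious)  # //user[username='' or '1'='1']
```

The second expression is still valid XPath, but its predicate no longer depends on the username at all.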

Steps

Use parameterized XPath queries to prevent injection.
Avoid using string concatenation to create dynamic XPath queries.
Validate and sanitize user input before using it in an XPath query.
Consider using a safe API which avoids the use of the interpreter or provides a parameterized interface.
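
The validation step can be sketched as an allow-list check that runs before any query is built. The pattern and the helper name is_valid_username are assumptions for illustration; the real rule depends on which usernames the application accepts. Validation of this kind complements parameterized queries and is not a substitute for them.

```python
import re

# Hypothetical allow-list: letters, digits, dot, underscore, hyphen; 1 to 32 chars.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9._-]{1,32}")


def is_valid_username(value: str) -> bool:
    """Return True only if the whole value matches the allow-list."""
    return USERNAME_PATTERN.fullmatch(value) is not None
```

A view could reject the request (for example with an HTTP 400 response) whenever is_valid_username returns False, before the value reaches the XPath engine.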

Compliant code

from django.http import HttpResponse
import os
from defusedxml import lxml as dlxml

def user_search(request):
    username = request.GET['username']
    tree = dlxml.parse(os.path.join(os.getcwd(), 'users.xml'))
    # The value is bound to the $username variable, not spliced into the query text
    users = tree.xpath("//user[username=$username]", username=username)
    return HttpResponse(users)

The original code was vulnerable to XPath injection because it used string formatting to insert a user-provided value directly into an XPath query. This could allow an attacker to manipulate the query, potentially accessing data they shouldn't be able to.

The fixed code avoids this vulnerability by using a parameterized XPath query. Instead of inserting the username into the query text, it uses a placeholder ($username) and passes the value as a separate keyword argument (username=username). lxml binds the value as an XPath variable, so it is always treated as a string to compare against and is never parsed as part of the expression.
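
As a rough check of this behaviour, the sketch below runs the same payload through a formatted query and a parameterized one against a small in-memory document. The XML content and the names XML, formatted, and bound are made up for the example, and it assumes lxml is installed.

```python
from lxml import etree

# Made-up sample data standing in for users.xml.
XML = b"""
<users>
  <user><username>alice</username></user>
  <user><username>bob</username></user>
</users>
"""

root = etree.fromstring(XML)
payload = "' or '1'='1"

# String formatting: the payload becomes part of the XPath syntax.
formatted = root.xpath(f"//user[username='{payload}']")

# Variable binding: the payload is compared as a plain string value.
bound = root.xpath("//user[username=$username]", username=payload)

print(len(formatted))  # 2, every user matches
print(len(bound))      # 0, no user has that literal name
```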

Additionally, the fixed code parses the XML file with the defusedxml.lxml module, a wrapper around lxml that adds protections against XML-based attacks such as malicious entity declarations. This hardens the parsing of untrusted XML but is independent of the XPath injection fix, which comes from the parameterized query. Recent defusedxml releases mark defusedxml.lxml as deprecated, so check the library's current guidance before relying on it.

References

021. XPath injection