Lack of data validation - HTML code - Scala

Need

Ensure proper data validation and sanitization of HTML code to prevent code injection vulnerabilities.

Context

Usage of Scala for building scalable and high-performance applications
Usage of play.api.mvc for handling HTTP requests and building web applications in Play Framework
Usage of play.api.libs.json for JSON parsing and manipulation in Play Framework
Usage of jsoup (org.jsoup) for parsing and sanitizing untrusted HTML

Description

Non-compliant code

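import javax.inject.Inject
import play.api.mvc._
import play.api.libs.json._

class ApplicationController @Inject()(cc: ControllerComponents) extends AbstractController(cc) {

  // VULNERABLE: the raw request body is echoed back without validation or sanitization
  def unsafeEndpoint = Action { request =>
    val userSuppliedData = request.body.asText.getOrElse("")
    val response = Json.obj("message" -> userSuppliedData)
    Ok(response)
  }

}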

In the code above, the unsafeEndpoint action takes user input directly from the request body and inserts it into a JSON object without any validation or sanitization. This is a classic example of a lack of data validation vulnerability.

The user input is read with request.body.asText.getOrElse("") and passed unchanged to Json.obj("message" -> userSuppliedData). The resulting JSON object is then sent back as the response.

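The problem is that any HTML markup in the user-supplied data is included as-is in the JSON response. JSON encoding protects the structure of the JSON document, not the page that later displays the value: once the client parses the response, the string contains the original markup unchanged. Play serves the response as application/json, so the browser does not render it directly; the risk depends on how the front end uses it. If client-side code inserts the value into the page as HTML (for example through innerHTML), the browser will interpret and execute the injected markup.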

This could lead to multiple types of attacks, including Cross-Site Scripting (XSS) attacks, where an attacker could inject malicious scripts to steal sensitive information, manipulate web content, or perform actions on behalf of the user.

To mitigate this vulnerability, we should never trust user-supplied data and always validate or sanitize it before use.

Steps

Implement input validation to ensure that the user-supplied data is safe and does not contain any malicious HTML code.
Use a library or framework feature to sanitize the user-supplied data before including it in the JSON response.
Consider using a whitelist approach to allow only specific HTML tags and attributes that are necessary for the application.
Escape any user-supplied data for the context in which it is finally rendered (for example, HTML-encode it before it is inserted into a page) so that it cannot be interpreted as HTML code.
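The first and third steps (validation against an allowlist) can be sketched with the Scala standard library alone. The object name MessageValidation, the 500-character limit and the allowed character set are hypothetical choices for illustration; the real rules depend on what the message field must accept. Rejecting bad input is a different strategy from sanitizing it, and the two can be combined.

```scala
import scala.util.matching.Regex

// Hypothetical validation rules for the "message" field. The length limit and
// the allowed character set are illustrative assumptions, not Play defaults.
object MessageValidation {
  val MaxLength: Int = 500

  // Allowlist of characters: letters, digits, whitespace and basic punctuation.
  // Anything outside it, including '<', '>', '"' and '&', is rejected.
  private val Allowed: Regex = """[\p{L}\p{N}\s.,!?'()-]*""".r

  // Returns Right(trimmedValue) when the input passes every rule,
  // or Left(reason) describing the first rule it breaks.
  def validate(input: String): Either[String, String] = {
    val trimmed = input.trim
    if (trimmed.isEmpty) Left("message must not be empty")
    else if (trimmed.length > MaxLength) Left(s"message must be at most $MaxLength characters")
    else if (!Allowed.pattern.matcher(trimmed).matches()) Left("message contains disallowed characters")
    else Right(trimmed)
  }
}
```

In a controller, the result could be matched before Json.obj is built, returning BadRequest for a Left and Ok for a Right, so markup never reaches the response in the first place.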

Compliant code

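import javax.inject.Inject
import play.api.mvc._
import play.api.libs.json._
import org.jsoup.Jsoup
import org.jsoup.safety.Safelist // jsoup 1.14.1+; older versions use org.jsoup.safety.Whitelist

class ApplicationController @Inject()(cc: ControllerComponents) extends AbstractController(cc) {

  def safeEndpoint = Action { request =>
    val userSuppliedData = request.body.asText.getOrElse("")
    // Safelist.none() allows no tags: all markup is removed before the value is used
    val sanitizedData = Jsoup.clean(userSuppliedData, Safelist.none())
    val response = Json.obj("message" -> sanitizedData)
    Ok(response)
  }

}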

The above code fixes the vulnerability by sanitizing the user-supplied data before including it in the JSON response.

The Jsoup.clean method sanitizes the user-supplied data. It takes two parameters: the untrusted HTML input and a safelist object (the Safelist class, named Whitelist before jsoup 1.14.1) that defines which HTML elements and attributes are allowed. Here none() is used, so no tags are allowed and all markup is removed. The result is returned as HTML, so characters such as < and & that remain in the text are encoded as entities (&lt;, &amp;).

The sanitized data is then included in the JSON response. Because every tag has been removed, no user-supplied markup reaches the client to be interpreted as HTML.

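This approach prevents HTML injection through this endpoint, as any HTML code supplied by the user is stripped out before the response is sent. It does not replace careful output handling on the client, which should still treat the value as untrusted.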
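Escaping (the fourth step) is an alternative to stripping tags when the message should be displayed exactly as the user typed it, angle brackets included. The sketch below is a minimal, hypothetical HtmlEscape.escape helper that covers HTML element content and quoted attribute values only; other contexts (URLs, JavaScript, CSS) need different encoding, and a maintained encoder such as the OWASP Java Encoder is a safer choice in production. It should be used instead of Jsoup.clean, not after it: clean already returns entity-encoded text, so escaping its output again would double-encode characters such as &.

```scala
// Hypothetical helper: HTML escaping for element content and quoted attribute values.
object HtmlEscape {
  // Replaces the five characters that are significant in HTML markup with
  // character references, so the browser displays them instead of parsing them.
  def escape(input: String): String = {
    val sb = new StringBuilder(input.length)
    input.foreach {
      case '&'  => sb.append("&amp;")
      case '<'  => sb.append("&lt;")
      case '>'  => sb.append("&gt;")
      case '"'  => sb.append("&quot;")
      case '\'' => sb.append("&#x27;")
      case c    => sb.append(c)
    }
    sb.toString
  }
}
```

Used as Json.obj("message" -> HtmlEscape.escape(userSuppliedData)), this assumes the client inserts the value as HTML; a client that inserts it as plain text needs no server-side escaping at all.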

References

321. Lack of data validation - HTML code