
Fix request: Automatic fixes can be requested from Retrieves via the 'Get Custom Fix' and 'Apply Suggested Fix' functionalities, or from Views via the 'How to fix' option in the vulnerability modal.
Subscriptions: All of the entry points above go through one of two GraphQL subscriptions, getCustomFix or getSuggestedFix, passing the parameters each one requires.
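As a rough illustration, the sketch below shows how a client such as Retrieves or Views could open one of these subscriptions using the Python gql library; the endpoint URL, argument names, and selected fields are assumptions, since only the subscription names are given here.

```python
# Hypothetical client-side sketch: opening the getCustomFix subscription.
# Only the subscription name comes from the docs; the endpoint, arguments,
# and selected fields are illustrative placeholders.
import asyncio

from gql import Client, gql
from gql.transport.websockets import WebsocketsTransport

GET_CUSTOM_FIX = gql(
    """
    subscription GetCustomFix($groupName: String!, $vulnerabilityId: ID!) {
      getCustomFix(groupName: $groupName, vulnerabilityId: $vulnerabilityId) {
        chunk
      }
    }
    """
)


async def request_custom_fix() -> None:
    transport = WebsocketsTransport(url="wss://app.example.com/api")  # placeholder
    async with Client(transport=transport) as session:
        async for event in session.subscribe(
            GET_CUSTOM_FIX,
            variable_values={"groupName": "example-group", "vulnerabilityId": "uuid"},
        ):
            # Chunks arrive progressively; print them as they come in.
            print(event["getCustomFix"]["chunk"], end="", flush=True)


if __name__ == "__main__":
    asyncio.run(request_custom_fix())
```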
Validation and prompt construction: After validating the provided inputs, the backend (Integrates) gathers the vulnerability context, which includes the URL of the S3 object where the vulnerable file is located.
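For orientation, the minimal sketch below shows what that context could look like; every field name is an assumption, since the only documented detail is that the context includes the S3 object URL of the vulnerable file.

```python
# Illustrative only: the shape of the vulnerability context is not documented
# here, so every field below is an assumed placeholder.
from typing import TypedDict


class VulnerabilityContext(TypedDict):
    vulnerability_id: str
    finding_title: str
    file_s3_url: str    # URL of the S3 object holding the vulnerable file
    specific_line: int  # line reported as vulnerable


def build_context(vulnerability_id: str) -> VulnerabilityContext:
    # In the real backend these values come from the stored vulnerability record;
    # here they are hard-coded placeholders.
    return VulnerabilityContext(
        vulnerability_id=vulnerability_id,
        finding_title="SQL injection",
        file_s3_url="https://example-bucket.s3.amazonaws.com/repo/app.py?X-Amz-Signature=REDACTED",
        specific_line=42,
    )
```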
Fixes API: Integrates sends a request to the Fixes WebSocket API Gateway, using a pre-signed URL for authentication and to transmit the previously obtained vulnerability context.
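One way that call could work is sketched below, assuming the pre-signed URL is a SigV4 query-signed wss URL and the context travels as JSON; the endpoint, signing details, and message format are assumptions.

```python
# Hedged sketch: sign a WebSocket API Gateway URL with SigV4 query parameters
# and send the vulnerability context as JSON. Endpoint and payload are assumed.
import asyncio
import json

import botocore.session
import websockets
from botocore.auth import SigV4QueryAuth
from botocore.awsrequest import AWSRequest

FIXES_WSS_URL = "wss://fixes.example.execute-api.us-east-1.amazonaws.com/prod"  # placeholder


def presign_wss_url(wss_url: str, region: str = "us-east-1", expires: int = 300) -> str:
    # Sign the equivalent https URL, then switch the scheme back to wss.
    credentials = botocore.session.get_session().get_credentials()
    request = AWSRequest(method="GET", url=wss_url.replace("wss://", "https://", 1))
    SigV4QueryAuth(credentials, "execute-api", region, expires=expires).add_auth(request)
    return request.url.replace("https://", "wss://", 1)


async def send_context(context: dict) -> None:
    signed_url = presign_wss_url(FIXES_WSS_URL)
    async with websockets.connect(signed_url) as ws:
        await ws.send(json.dumps(context))
        async for message in ws:
            print(message)  # streamed chunks coming back from the Fixes Lambda


if __name__ == "__main__":
    asyncio.run(send_context({"file_s3_url": "https://example", "specific_line": 42}))
```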
Fixes Lambda: Through the API, the Fixes Lambda is instructed to carry out the following steps, sketched in code after the list:
Retrieve the vulnerable code from S3.
Analyze the code.
Extract the vulnerable snippet.
Generate a prompt with instructions for the AI model, either to produce a remediation guide or to directly remediate the vulnerable code snippet.
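A hedged sketch of those four steps, with an assumed snippet window and prompt wording, is shown below.

```python
# Hedged sketch of the four Lambda steps; the snippet window, prompt wording,
# and helper names are assumptions rather than the real implementation.
import urllib.request


def get_vulnerable_code(file_s3_url: str) -> str:
    """Step 1: retrieve the vulnerable file through the S3 object URL."""
    with urllib.request.urlopen(file_s3_url) as response:  # pre-signed URL assumed
        return response.read().decode("utf-8")


def extract_snippet(code: str, vulnerable_line: int, context_lines: int = 10) -> str:
    """Steps 2-3: analyze the file and keep only the lines around the reported one."""
    lines = code.splitlines()
    start = max(vulnerable_line - 1 - context_lines, 0)
    end = min(vulnerable_line + context_lines, len(lines))
    return "\n".join(lines[start:end])


def build_prompt(snippet: str, finding_title: str, want_guide: bool) -> str:
    """Step 4: instruct the model to write a guide or to remediate the snippet."""
    task = (
        "Write a step-by-step remediation guide in Markdown."
        if want_guide
        else "Return only the remediated version of the snippet."
    )
    return f"Finding: {finding_title}\n\nVulnerable snippet:\n{snippet}\n\n{task}"
```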
Sending the prompt to the LLM: From the Lambda, the prompt is sent through the boto3 client to a large language model (LLM) hosted on Amazon Bedrock, using an inference profile.
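A minimal sketch of that call, passing an inference profile ARN as the model id of the boto3 bedrock-runtime client, could look like this; the ARN, the region, and the choice of the Converse streaming API are assumptions.

```python
# Hedged sketch of the Bedrock call; the inference profile ARN and region are
# placeholders, and the Converse streaming API is one possible choice.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")  # placeholder region

INFERENCE_PROFILE_ARN = (
    "arn:aws:bedrock:us-east-1:123456789012:inference-profile/example-profile"
)


def stream_model_response(prompt: str):
    """Yield the model output as text chunks instead of one blocking response."""
    response = bedrock.converse_stream(
        modelId=INFERENCE_PROFILE_ARN,  # inference profiles are passed as the model id
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    for event in response["stream"]:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            yield delta["text"]
```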
LLM response: The LLM processes the input and generates a response.
Since the complete output may take several seconds to generate, it is streamed over WebSockets and delivered progressively in chunks.
This technique improves the user experience by enabling partial results to be displayed as they are generated.
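That chunked delivery could look like the sketch below, assuming the Lambda pushes each partial result back through the API Gateway connection; the endpoint URL, connection id, and message shape are assumptions.

```python
# Hedged sketch of pushing the streamed chunks back through the WebSocket
# connection; the endpoint and connection id are placeholders supplied by API Gateway.
import json

import boto3

apigw = boto3.client(
    "apigatewaymanagementapi",
    endpoint_url="https://abc123.execute-api.us-east-1.amazonaws.com/prod",  # placeholder
)


def relay_chunks(connection_id: str, chunks) -> None:
    """Send each partial result to the caller as soon as the model produces it."""
    for chunk in chunks:
        apigw.post_to_connection(
            ConnectionId=connection_id,
            Data=json.dumps({"chunk": chunk}).encode("utf-8"),
        )
    # Signal the end of the stream so the client can stop listening.
    apigw.post_to_connection(
        ConnectionId=connection_id,
        Data=json.dumps({"done": True}).encode("utf-8"),
    )
```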
Transmission to the final client: Integrates relays the streamed response to the Retrieves or Views client through the initial GraphQL subscription.
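As an illustration of that relay, the sketch below forwards the incoming chunks from a subscription resolver; Strawberry is used only for demonstration and may not be the framework behind the real API, and the bridge helper is hypothetical.

```python
# Hedged sketch of relaying the stream through a GraphQL subscription resolver;
# Strawberry is used only for illustration and may not match the real backend.
from typing import AsyncGenerator

import strawberry


async def receive_chunks_from_fixes(vulnerability_id: str) -> AsyncGenerator[str, None]:
    """Hypothetical bridge that reads the chunks coming from the Fixes API."""
    for chunk in ("Use parameterized ", "queries instead of ", "string concatenation."):
        yield chunk


@strawberry.type
class Subscription:
    @strawberry.subscription
    async def get_custom_fix(self, vulnerability_id: str) -> AsyncGenerator[str, None]:
        # Forward each chunk to the subscribed client as soon as it arrives.
        async for chunk in receive_chunks_from_fixes(vulnerability_id):
            yield chunk
```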
Displaying the result: The response is shown to the user either
as a Markdown-formatted remediation guide, or
as structured text containing the remediated code snippet, with placeholders for replacing the vulnerable code.
Fixes includes two LLM-as-a-Judge evaluators within its testing pipeline.
An LLM-as-a-Judge evaluator is a workflow that uses an AI model to grade the quality of responses produced by another AI-powered system. In the case of Fixes, every time a commit is pushed, the evaluator runs the getCustomFix and getSuggestedFix functionalities multiple times, using different inputs taken from a collection of test cases that simulate real scenarios encountered in production.
Each output generated by Fixes is evaluated individually by a language model, which determines whether the response meets the quality criteria. To do this, the evaluator relies on a set of rubrics that define what a valid response must satisfy. Based on these rules, the LLM assigns a score of 0 or 1 for each execution, depending on whether the response meets the established quality standards.
Once all test cases have been processed, the evaluator computes the average of all the scores (zeros and ones). This average becomes the final score of the evaluation.
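In sketch form, and with an assumed rubric and stubbed helpers, that scoring loop looks like this:

```python
# Hedged sketch of the scoring logic: every test case is judged 0 or 1 against a
# rubric and the final score is the average. The rubric text, the stub helpers,
# and the dataset shape are assumptions.
from statistics import mean

RUBRIC = (
    "Score 1 if the response remediates the reported vulnerability, keeps the "
    "surrounding code working, and follows the requested format; otherwise 0."
)


def run_fixes(case: dict) -> str:
    """Stub standing in for a real getCustomFix/getSuggestedFix execution."""
    return "cursor.execute(query, params)"


def call_judge_model(rubric: str, case: dict, output: str) -> str:
    """Stub standing in for the evaluator LLM; always returns a passing verdict."""
    return "pass"


def judge(case: dict, output: str) -> int:
    """Grade one Fixes response against the rubric; returns 0 or 1."""
    return 1 if call_judge_model(RUBRIC, case, output) == "pass" else 0


def final_score(test_cases: list[dict]) -> float:
    """Average of the binary scores over every test case."""
    return mean(judge(case, run_fixes(case)) for case in test_cases)
```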
It is important to note that because both Fixes and the evaluators are powered by LLMs, the results are not fully deterministic. Two identical executions do not guarantee the same score. For this reason, although the goal is to set the threshold as close to 1 as possible, achieving a perfectly consistent score of 1 is difficult in practice.
These evaluations run as jobs within the CI pipeline. Each job is configured with a threshold between 0 and 1. If the final score of any evaluator does not exceed this threshold, the job will fail and prevent the changes from moving to production.
This ensures that every deployment of Fixes maintains or improves the quality of the generated responses. If a degradation in quality is detected, the deployment is blocked and cannot proceed to production.
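A minimal sketch of such a gate, with an illustrative threshold value, is shown below:

```python
# Hedged sketch of the CI gate: a non-zero exit fails the job when the final
# score does not exceed the configured threshold. The threshold value is illustrative.
import sys

THRESHOLD = 0.8  # each evaluator job defines its own value between 0 and 1


def gate(score: float) -> None:
    if score <= THRESHOLD:
        print(f"Score {score:.2f} did not exceed threshold {THRESHOLD:.2f}; blocking deploy")
        sys.exit(1)
    print(f"Score {score:.2f} exceeds threshold {THRESHOLD:.2f}; deploy may proceed")


if __name__ == "__main__":
    gate(0.92)
```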
To manage these evaluations, we rely on LangSmith, which allows us to store and review the full history of all executions. You can browse this history directly at smith.langchain.com.
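A hedged sketch of how an evaluation could be wired through the LangSmith SDK, with placeholder dataset, target, and evaluator names, is shown below; it assumes a LANGSMITH_API_KEY is available in the environment.

```python
# Hedged sketch of running an evaluation through the LangSmith SDK so every
# execution is stored and browsable at smith.langchain.com; the dataset name,
# target, and evaluator are placeholders rather than the real Fixes setup.
from langsmith import evaluate


def run_fixes(inputs: dict) -> dict:
    """Stub target standing in for a real getCustomFix/getSuggestedFix call."""
    return {"fix": "cursor.execute(query, params)"}


def rubric_judge(run, example) -> dict:
    """Stub LLM-as-a-Judge evaluator returning a binary score."""
    return {"key": "meets_rubric", "score": 1}


evaluate(
    run_fixes,
    data="fixes-test-cases",          # dataset holding the simulated test cases
    evaluators=[rubric_judge],
    experiment_prefix="fixes-judge",  # groups the executions in the LangSmith UI
)
```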
