Data Security and Privacy
Matches does not handle personally identifiable information (PII) or any other compromising or sensitive data, as it only processes custom vulnerabilities provided by the client. These custom vulnerabilities are stored in DynamoDB and S3, both within Fluid Attacks’ AWS account.
Voyage AI
Voyage AI, by default, utilizes customer data for training and improving AI models. However, Fluid Attacks explicitly opts out of this default setting, ensuring that client data provided to Voyage AI is used solely for generating embeddings and is not leveraged for any model training or improvement purposes.
(Voyage AI Privacy Policy)
Some additional points to consider:
- Voyage AI hosts its infrastructure in the USA.
- Since Fluid Attacks opted out of data collection, the data is not stored on Voyage AI’s servers (zero storage time).
- Data transmitted to Voyage AI is encrypted in transit with SSL on all of its APIs.
- Voyage AI has GDPR, SOC 2 and HIPAA compliance certifications.
Chroma vector database
Chroma is an open-source vector database that can be self-managed locally, which means that the data is not sent to any cloud provider.
However, Chroma includes an anonymous telemetry feature that is enabled by default. We have opted out of this feature by setting anonymized_telemetry to False in the Settings object.
Contributing
All matches commands must be run from the matches root directory:
This sets up the development shell with the environment needed to run matches, logged in as dev.
Running lint
We use ruff to lint the code and mypy to type-check it. Both quality checks can be run with the following command:
Running unit tests
Tests are based on pytest and can be run with the following command:
These tests are expected to be pure with respect to third-party services: they must not call external services or communicate with the outside world in any way. This is why the pytest-socket plugin is configured to block all socket connections. File system I/O is allowed; using pytest's tmp_path fixture is the recommended way to avoid persistent side effects.
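The guidelines above can be illustrated with a minimal sketch of a pure unit test. write_results is a hypothetical helper, not part of the matches codebase. The test only touches the file system through pytest's built-in tmp_path fixture, so it leaves no persistent files behind and opens no sockets, which means it would also pass with pytest-socket blocking all connections.

```python
import json
from pathlib import Path


def write_results(results: list[dict[str, str]], output_dir: Path) -> Path:
    """Hypothetical helper: persist results as JSON under output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / "results.json"
    path.write_text(json.dumps(results), encoding="utf-8")
    return path


def test_write_results(tmp_path: Path) -> None:
    # pytest injects tmp_path: a fresh, per-test temporary directory, so no
    # files persist in the repository and no network access is involved.
    path = write_results([{"finding": "F001"}], tmp_path / "out")
    assert path.exists()
    assert json.loads(path.read_text(encoding="utf-8")) == [{"finding": "F001"}]
```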
For mocking LangChain chat models, we use the GenericFakeChatModel class.
Debugging
If you use a VS Code-based IDE, you can run the Debug: Start Debugging (F5) command to start a debug session and choose any of the debug configurations available in the launch.json file. Set breakpoints in the code and start debugging!
Running matches
Matches can be run against the prod or dev environment with the following command:
nix run .#matches <environment> <command>
or, in an active development shell (where the environment defaults to dev):
Matches extract command
nix run .#matches <environment> extract <group_name>
This uploads the matches results to S3 and pushes an SQS message to the integrates platform, which then stores the matches results as records in the integrates database. This asynchronous layer avoids coupling the matches processing CLI to the platform's database DAL.
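The decoupling described above can be sketched as follows. This is an illustration only: publish_matches, the S3 key layout, and the message schema are hypothetical, not the real matches implementation or the integrates contract. The function receives the S3 and SQS clients as parameters, typed with Protocols that mirror the boto3 put_object and send_message calls, so a test can pass in-memory fakes (included below) instead of real AWS clients.

```python
import json
from typing import Any, Protocol


class S3Client(Protocol):
    """Subset of the boto3 S3 client used here."""

    def put_object(self, *, Bucket: str, Key: str, Body: bytes) -> Any: ...


class SQSClient(Protocol):
    """Subset of the boto3 SQS client used here."""

    def send_message(self, *, QueueUrl: str, MessageBody: str) -> Any: ...


def publish_matches(
    s3: S3Client,
    sqs: SQSClient,
    *,
    bucket: str,
    queue_url: str,
    group_name: str,
    results: list[dict[str, Any]],
) -> str:
    """Upload results to S3, then notify the platform through SQS.

    Returns the S3 key the results were written to.
    """
    key = f"matches/{group_name}/results.json"  # hypothetical key layout
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(results).encode("utf-8"))
    # The message only points at the uploaded object; the platform-side
    # consumer loads it and writes the database records, so this CLI never
    # talks to the platform database directly.
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps({"group_name": group_name, "s3_key": key}),
    )
    return key


class FakeS3:
    """In-memory stand-in for the S3 client in tests."""

    def __init__(self) -> None:
        self.objects: dict[tuple[str, str], bytes] = {}

    def put_object(self, *, Bucket: str, Key: str, Body: bytes) -> None:
        self.objects[(Bucket, Key)] = Body


class FakeSQS:
    """In-memory stand-in for the SQS client in tests."""

    def __init__(self) -> None:
        self.messages: list[tuple[str, str]] = []

    def send_message(self, *, QueueUrl: str, MessageBody: str) -> None:
        self.messages.append((QueueUrl, MessageBody))
```

In this sketch the message is sent only after the upload call returns, so the consumer is never handed a pointer to an object that was not written.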
Other commands
The main matches CLI also includes other commands, which can be listed by running matches --help:
Usage: matches [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  augment           Augment training data.
  benchmark         Run end-to-end benchmark.
  embed-defines     Embed Fluid Attacks criteria in the vector database.
  eval-model        Evaluate a model with the given MODEL_NAME.
  eval-refinement   Evaluate refinement quality.
  eval-translation  Evaluate translation quality.
  extract           Extract matches for a group.
  gen-unlabeled     Generate unlabeled data.
  label             Label data for a target (train/test).
  list-profiles     List bedrock inference profiles.
  semantic-compare  Run semantic comparison experiment.
  train             Train a model with the given MODEL_NAME.
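The help output above has the layout produced by the Click library, so the following sketch assumes the CLI is a Click command group. This is an inference from the output format, not something the source confirms. Only two commands are stubbed, and their bodies are placeholders rather than the real implementations.

```python
import click


@click.group()
def matches() -> None:
    # Hypothetical entry point; subcommands are registered below.
    pass


@matches.command()
@click.argument("group_name")
def extract(group_name: str) -> None:
    """Extract matches for a group."""
    # Placeholder: the real command processes the group and publishes results.
    click.echo(f"Extracting matches for {group_name}")


@matches.command(name="list-profiles")
def list_profiles() -> None:
    """List bedrock inference profiles."""
    click.echo("Listing inference profiles")  # placeholder
```

Exposed as the matches entry point, this group would print a usage block in the same shape as the one above, listing only the two stubbed commands; the first sentence of each command's docstring becomes its short help text.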