Rationale
SageMaker is the platform we use for developing solutions involving Machine Learning.
The main reasons why we chose it over other alternatives are:
- It integrates with EC2, allowing us to easily provision cloud computing resources. This feature is essential for horizontal autoscaling.
- It complies with several certifications from ISO and CSA. Many of these certifications focus on ensuring that the entity follows best practices regarding secure cloud-based environments and information security.
- It integrates with S3, allowing us to easily store raw data, datasets, and training outputs in our S3 bucket.
- It supports a wide range of EC2 ML-specific machines for training models.
- It supports EC2 spot machines, allowing us to considerably reduce machine costs.
- Thanks to its horizontal autoscaling capabilities, it is very easy to implement parallelism by running several models or feature combinations on separate machines, greatly reducing overall training time.
- It supports hyperparameter tuning, allowing us to concurrently train several instances of a model using different parameter values. This feature is essential for optimizing our most accurate model.
- It integrates with IAM, allowing us to keep a least-privilege approach regarding authentication and authorization.
- It supports a wide range of frameworks, including scikit-learn, the one that Sorts uses.
- The performance of EC2 workers can be monitored via CloudWatch.
- Logs for training jobs can be monitored via CloudWatch.
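The parallelism described in the list above amounts to fanning out one training job per feature combination. A minimal sketch of that fan-out in plain Python follows. The feature names, the `min_size` cutoff, and the job-naming scheme are assumptions made for illustration, not taken from Sorts, and the step that actually submits each job to SageMaker is intentionally left out.

```python
from itertools import combinations


def feature_combinations(features, min_size=2):
    """Return every subset of `features` with at least `min_size` items."""
    return [
        combo
        for size in range(min_size, len(features) + 1)
        for combo in combinations(features, size)
    ]


def plan_training_jobs(features, min_size=2, prefix="sorts-train"):
    """Map a hypothetical job name to the feature subset it would train on.

    Each entry is meant to be submitted as an independent training job,
    so every combination can run on its own machine.
    """
    return {
        f"{prefix}-{index}": combo
        for index, combo in enumerate(feature_combinations(features, min_size))
    }


# Hypothetical feature names, used only for illustration.
jobs = plan_training_jobs(["commit_frequency", "file_age", "num_lines"])
```

Because every entry in `jobs` is independent of the others, the number of machines can grow with the number of combinations instead of lengthening a single sequential run.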
Alternatives
- IBM Watson Studio: It does not integrate with EC2 or S3, increasing overall complexity. Pending review.
- GCP Vertex AI: It does not integrate with EC2 or S3, increasing overall complexity. Pending review.
- Azure Machine Learning: It does not integrate with EC2 or S3, increasing overall complexity. Pending review.
Usage
- We use SageMaker as the Machine Learning platform for training Sorts, our ML-based software vulnerability scanner.
- We do not use SageMaker spot instances yet. Implementation is pending.
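Since spot instances are still pending, the following is only a sketch of what enabling them might involve, not our implementation. It builds the request that could be passed to `boto3.client("sagemaker").create_training_job(**request)`. The helper name, the default instance type, the volume size, the time limits, and every placeholder value are assumptions. Managed spot training expects the maximum wait time to be at least as long as the maximum runtime, and a checkpoint location lets an interrupted job resume.

```python
def build_spot_training_request(
    job_name,
    image_uri,
    role_arn,
    output_s3_uri,
    checkpoint_s3_uri,
    instance_type="ml.m5.xlarge",  # assumed default, not our real setting
    max_runtime_seconds=3600,
    max_wait_seconds=7200,
):
    """Build CreateTrainingJob parameters with managed spot training on."""
    if max_wait_seconds < max_runtime_seconds:
        raise ValueError("max_wait_seconds must be >= max_runtime_seconds")
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,  # assumed size
        },
        # Spot capacity can be reclaimed, so the job may wait for capacity
        # longer than it actually runs.
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_seconds,
            "MaxWaitTimeInSeconds": max_wait_seconds,
        },
        "EnableManagedSpotTraining": True,
        # Checkpoints allow training to resume after an interruption.
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }


# All values below are placeholders.
request = build_spot_training_request(
    job_name="sorts-spot-example",
    image_uri="<training-image-uri>",
    role_arn="<execution-role-arn>",
    output_s3_uri="s3://<bucket>/output",
    checkpoint_s3_uri="s3://<bucket>/checkpoints",
)
```

Keeping the request construction separate from the API call makes the spot-specific settings easy to review before any job is launched.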