DynamoDB

DynamoDB is a critical component in our ecosystem, fulfilling three primary roles:

Integrates: Storage of entities, vulnerabilities, findings, and operational state.
Sifts: Analysis state and processing data.
Streams: Event source for updating OpenSearch indices (DynamoDB Streams → Lambda → OpenSearch).

Important

Unlike OpenSearch and many other AWS services, DynamoDB does not write logs to CloudWatch, so there are no DynamoDB logs to consult. Monitoring and debugging instead rely on CloudWatch metrics and on the logs of the components that interact with DynamoDB, such as Integrates and the stream-processing Lambda functions.

DynamoDB Architecture

Key Tables and Configurations

Sifts State Table

Name: sifts_state
Configuration:

Billing Mode: PAY_PER_REQUEST (on-demand).
Primary Key: hash_key (pk), range_key (sk).
Protection: Deletion protection enabled.
Backups: Point-in-time recovery enabled.
Security: Server-side encryption enabled.

Integrates Tables

Multiple tables for different entity types (vulns, findings, etc.).
Stream-enabled tables connected to Lambda functions.
On-demand capacity (PAY_PER_REQUEST).

Implementation

DynamoDB tables are defined in Terraform:

resource "aws_dynamodb_table" "sifts_state" {
  name                        = "sifts_state"
  billing_mode                = "PAY_PER_REQUEST"
  hash_key                    = "pk"
  range_key                   = "sk"
  deletion_protection_enabled = true

  point_in_time_recovery {
    enabled = true
  }

  server_side_encryption {
    enabled = true
  }
}

Stream configuration with Lambda triggers:

resource "aws_lambda_event_source_mapping" "integrates_streams" {
  for_each = local.triggers

  event_source_arn               = aws_dynamodb_table.integrates_vms.stream_arn
  function_name                  = aws_lambda_function.integrates_streams[each.key].arn
  batch_size                     = each.value.batch_size
  bisect_batch_on_function_error = true           # split a failing batch in two to isolate bad records
  maximum_retry_attempts         = -1             # retry until the record expires from the stream
  starting_position              = "TRIM_HORIZON" # start from the oldest available record
}
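
The mapping above reads stream_arn from the table, and that attribute is only populated when the table has a stream enabled. The following is a minimal sketch of what the table side might look like. The table name, key names, attribute types, and stream_view_type are assumptions for illustration, not the actual integrates_vms definition.

```hcl
# Hypothetical sketch: the real integrates_vms table is defined elsewhere
# and may use different keys and settings.
resource "aws_dynamodb_table" "integrates_vms" {
  name         = "integrates_vms" # assumed table name
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "pk" # assumed key names, mirroring sifts_state
  range_key    = "sk"

  attribute {
    name = "pk"
    type = "S" # assumed string key
  }

  attribute {
    name = "sk"
    type = "S" # assumed string key
  }

  # Both arguments are needed for the table to expose a stream_arn
  # that an event source mapping can attach to.
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES" # assumed; KEYS_ONLY, NEW_IMAGE, and OLD_IMAGE also exist
}
```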

Monitoring with CloudWatch

Understanding DynamoDB Monitoring Sources

Since DynamoDB doesn't generate CloudWatch Logs, monitoring comes from two sources:

CloudWatch Metrics (namespace AWS/DynamoDB):

Shows capacity consumption, throttling events, errors, and latencies.
Access via: CloudWatch → Metrics → AWS/DynamoDB → filter by TableName.

Application and Lambda Logs:

Your application logs: SDK errors, retries, and conditional check failures.
Lambda function logs: Stream processing errors, batching issues.
Access via: CloudWatch → Logs → [your-application-log-group] or /aws/lambda/integrates_streams_*.

Key Metrics to Monitor

Capacity and Throttling Metrics

ConsumedReadCapacityUnits / ConsumedWriteCapacityUnits
ReadThrottleEvents / WriteThrottleEvents
ThrottledRequests (at API level)

Error and Latency Metrics

SystemErrors / UserErrors
SuccessfulRequestLatency

For Stream-enabled Tables (Lambda Metrics)

IteratorAge - Age of last processed event (critical for detecting lag)
Errors - Failed executions
Duration - Processing time
Throttles - Lambda concurrency issues

How to View Metrics

Navigate to CloudWatch → Metrics → AWS/DynamoDB.
Filter by TableName dimension (e.g., "sifts_state").
Select desired metrics.
For stream metrics, go to AWS/Lambda namespace and filter by FunctionName.
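
The console steps above could also be captured as a dashboard so the same graphs are always one click away. This is only a sketch: the dashboard name, the widget layout, and the var.region variable are hypothetical and do not come from the existing infrastructure code.

```hcl
# Hypothetical variable: the AWS region where the table lives.
variable "region" {
  type = string
}

resource "aws_cloudwatch_dashboard" "sifts_state" {
  dashboard_name = "sifts-state-dynamodb" # assumed name

  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "sifts_state consumed capacity"
          region = var.region
          stat   = "Sum"
          period = 300
          # Each entry is [namespace, metric, dimension name, dimension value].
          metrics = [
            ["AWS/DynamoDB", "ConsumedReadCapacityUnits", "TableName", "sifts_state"],
            ["AWS/DynamoDB", "ConsumedWriteCapacityUnits", "TableName", "sifts_state"],
          ]
        }
      },
      {
        type   = "metric"
        x      = 12
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "sifts_state throttle events"
          region = var.region
          stat   = "Sum"
          period = 60
          metrics = [
            ["AWS/DynamoDB", "ReadThrottleEvents", "TableName", "sifts_state"],
            ["AWS/DynamoDB", "WriteThrottleEvents", "TableName", "sifts_state"],
          ]
        }
      },
    ]
  })
}
```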

Note

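Even with on-demand capacity (PAY_PER_REQUEST), you can still experience throttling, for example when a hot partition key exceeds the per-partition throughput limit or when traffic spikes faster than on-demand scaling can absorb.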

Debugging with CloudWatch Logs

Finding DynamoDB-Related Issues in Logs

Since DynamoDB doesn't generate logs, you need to search in:

Application Logs (e.g., Integrates log group):

SDK errors and exceptions.
Throttling events.
Conditional failures.
Network/timeout issues.

Lambda Function Logs (for stream processing):

Log groups: /aws/lambda/integrates_streams_*.
Stream batch processing errors.
Retries and failures.
Indexing issues to downstream systems (e.g., OpenSearch).

Useful Logs Insights Queries

Throttling and Common Errors (Run in application logs):

fields @timestamp, @message
| filter @message like /ProvisionedThroughputExceededException|Throttling|TransactionCanceledException/
| sort @timestamp desc
| limit 100

Conditional Failures (Run in application logs):

fields @timestamp, @message
| filter @message like /ConditionalCheckFailedException/
| sort @timestamp desc
| limit 100

Network and SDK Issues (Run in application logs):

fields @timestamp, @message
| filter @message like /ClientError|ReadTimeout|ConnectTimeout|Retry/
| sort @timestamp desc
| limit 100

Stream Processing Errors (Run in Lambda logs):

fields @timestamp, @message
| filter @message like /BulkIndexError|bisect|retry|validation|UnprocessedItems/
| sort @timestamp desc
| limit 100
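
Because these errors only appear in application logs, one way to track them continuously rather than by ad hoc queries is a log metric filter that turns matching log lines into a custom metric. The sketch below assumes a hypothetical var.app_log_group_name variable, and the filter name, metric name, and namespace are illustrative choices.

```hcl
# Hypothetical variable: the log group of the application that calls DynamoDB.
variable "app_log_group_name" {
  type = string
}

resource "aws_cloudwatch_log_metric_filter" "dynamodb_throttling" {
  name           = "dynamodb-throttling-errors" # assumed name
  log_group_name = var.app_log_group_name

  # In filter pattern syntax, "?a ?b" matches events containing either term.
  pattern = "?ProvisionedThroughputExceededException ?ThrottlingException"

  metric_transformation {
    name      = "DynamoDBThrottlingErrors" # assumed custom metric name
    namespace = "Custom/DynamoDB"          # assumed custom namespace
    value     = "1"                        # count one per matching log event
  }
}
```

The resulting metric can then be graphed or alarmed on like any other CloudWatch metric.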

CloudWatch Alarms for DynamoDB

Universe defines centralized alarms for selected DynamoDB tables in runs/vpc/infra/alarms.tf:

resource "aws_cloudwatch_metric_alarm" "dynamodb_write_capacity_alarm" {
  for_each      = toset(var.dynamodb_table_names)
  metric_name   = "ConsumedWriteCapacityUnits"
  namespace     = "AWS/DynamoDB"
  period        = 300
  statistic     = "Sum"
  threshold     = var.write_capacity_threshold
  alarm_actions = [aws_sns_topic.central_alarms.arn]
}

To add your table to these alarms, include it in the var.dynamodb_table_names variable.
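
As an illustration, adding sifts_state might look like the sketch below. The declaration shape, the description, and the use of a default are assumptions; the real variable may be declared or populated differently (for example, through a tfvars file).

```hcl
# Hypothetical declaration: check the actual variable definition before editing.
variable "dynamodb_table_names" {
  type        = list(string)
  description = "Tables covered by the centralized DynamoDB alarms."
  default = [
    "sifts_state", # table added to the centralized alarms
  ]
}
```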

Recommended Alarms

Capacity Alarms

High Write Capacity:

Metric: ConsumedWriteCapacityUnits
Statistic: Sum
Period: 5 minutes
Threshold: > your expected maximum

High Read Capacity:

Metric: ConsumedReadCapacityUnits
Statistic: Sum
Period: 5 minutes
Threshold: > your expected maximum

Throttling Alarms

Write Throttling Events:

Metric: WriteThrottleEvents
Statistic: Sum
Period: 1 minute
Threshold: > 0 (or acceptable baseline)

Read Throttling Events:

Metric: ReadThrottleEvents
Statistic: Sum
Period: 1 minute
Threshold: > 0 (or acceptable baseline)
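
The write throttling recommendation above could be expressed in Terraform roughly as follows. This sketch reuses var.dynamodb_table_names and aws_sns_topic.central_alarms from the centralized alarms; the resource label, alarm naming scheme, evaluation_periods, and treat_missing_data values are assumptions.

```hcl
resource "aws_cloudwatch_metric_alarm" "dynamodb_write_throttle" {
  for_each = toset(var.dynamodb_table_names)

  alarm_name          = "dynamodb-${each.value}-write-throttle" # assumed naming scheme
  namespace           = "AWS/DynamoDB"
  metric_name         = "WriteThrottleEvents"
  statistic           = "Sum"
  period              = 60 # 1 minute
  evaluation_periods  = 1  # assumed: alarm on the first throttled minute
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0

  # Assumed: treat periods without data points as healthy, since
  # throttle metrics are typically absent when nothing is throttled.
  treat_missing_data = "notBreaching"

  dimensions = {
    TableName = each.value
  }

  alarm_actions = [aws_sns_topic.central_alarms.arn]
}
```

A read throttling alarm would follow the same shape with metric_name set to "ReadThrottleEvents".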

Lambda Stream Processing Alarms

High Iterator Age:

Metric: IteratorAge (AWS/Lambda)
Statistic: Maximum
Period: 1 minute
Threshold: > 300000 (5 minutes in ms)

Lambda Errors:

Metric: Errors (AWS/Lambda)
Statistic: Sum
Period: 1 minute
Threshold: > 0
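
The iterator age recommendation above might translate into Terraform along these lines. The sketch reuses local.triggers, aws_lambda_function.integrates_streams, and aws_sns_topic.central_alarms from the snippets in this document; the resource label, alarm naming scheme, and evaluation_periods are assumptions.

```hcl
resource "aws_cloudwatch_metric_alarm" "streams_iterator_age" {
  for_each = local.triggers

  alarm_name          = "integrates-streams-${each.key}-iterator-age" # assumed naming scheme
  namespace           = "AWS/Lambda"
  metric_name         = "IteratorAge"
  statistic           = "Maximum"
  period              = 60 # 1 minute
  evaluation_periods  = 5  # assumed: require sustained lag before alarming
  comparison_operator = "GreaterThanThreshold"
  threshold           = 300000 # 5 minutes, expressed in milliseconds

  dimensions = {
    FunctionName = aws_lambda_function.integrates_streams[each.key].function_name
  }

  alarm_actions = [aws_sns_topic.central_alarms.arn]
}
```

A Lambda errors alarm would use the same dimensions with metric_name set to "Errors", statistic "Sum", and threshold 0.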