DynamoDB

DynamoDB is a critical component in our ecosystem, fulfilling three primary roles:
  1. Integrates: Storage of entities, vulnerabilities, findings, and operational state.
  2. Sifts: Analysis state and processing data.
  3. Streams: Event source for updating OpenSearch indices (DynamoDB Streams → Lambda → OpenSearch).
Important
Unlike OpenSearch and many other AWS services, DynamoDB does not write logs to CloudWatch Logs, so there are no DynamoDB log groups to consult. Monitoring and debugging rely instead on CloudWatch metrics and on the logs of the components that interact with DynamoDB, such as Integrates and the stream-processing Lambda functions.

DynamoDB Architecture

Key Tables and Configurations

  1. Sifts State Table
    1. Name: sifts_state
    2. Configuration: 
      1. Billing Mode: PAY_PER_REQUEST (on-demand).
      2. Primary Key: hash_key (pk), range_key (sk).
      3. Protection: Deletion protection enabled.
      4. Backups: Point-in-time recovery enabled.
      5. Security: Server-side encryption enabled.
  2. Integrates Tables
    1. Multiple tables for different entity types (vulns, findings, etc.).
    2. Stream-enabled tables connected to Lambda functions.
    3. On-demand capacity (PAY_PER_REQUEST).

Implementation

DynamoDB tables are defined in Terraform:

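resource "aws_dynamodb_table" "sifts_state" {
  name         = "sifts_state"
  billing_mode = "PAY_PER_REQUEST" # on-demand capacity
  hash_key     = "pk"
  range_key    = "sk"

  # Excerpt: the attribute blocks that declare the types of pk and sk
  # (required by the AWS provider) are not shown here.

  deletion_protection_enabled = true

  point_in_time_recovery {
    enabled = true
  }

  server_side_encryption {
    enabled = true
  }
}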



Stream configuration with Lambda triggers:

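resource "aws_lambda_event_source_mapping" "integrates_streams" {
  for_each = local.triggers

  event_source_arn  = aws_dynamodb_table.integrates_vms.stream_arn
  function_name     = aws_lambda_function.integrates_streams[each.key].arn
  batch_size        = each.value.batch_size
  starting_position = "TRIM_HORIZON" # start from the oldest record still in the stream

  # On a function error, split the batch in two and retry each half,
  # which helps isolate a bad record.
  bisect_batch_on_function_error = true

  # -1 retries a failing batch until its records expire from the stream.
  maximum_retry_attempts = -1
}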
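
The mapping above reads stream_arn from aws_dynamodb_table.integrates_vms, an attribute that is only populated when streams are enabled on the table. A minimal sketch of such a table is shown below. The table name, key names, key types, and stream view type are assumptions for illustration and may differ from the real definition.

```hcl
# Sketch only: values marked "assumed" are not taken from the real infrastructure.
resource "aws_dynamodb_table" "integrates_vms" {
  name         = "integrates_vms" # assumed table name
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "pk" # assumed key names, mirroring sifts_state
  range_key    = "sk"

  attribute {
    name = "pk"
    type = "S" # assumed string keys
  }

  attribute {
    name = "sk"
    type = "S"
  }

  # Without stream_enabled, stream_arn is empty and the event source
  # mapping has nothing to read from.
  stream_enabled = true

  # Assumed view type: each stream record carries the item before and after
  # the change. Other valid values are KEYS_ONLY, NEW_IMAGE, and OLD_IMAGE.
  stream_view_type = "NEW_AND_OLD_IMAGES"
}
```

The view type determines what the Lambda function receives. A consumer that rebuilds a full OpenSearch document would need at least the new image, while KEYS_ONLY would force it to read the item back from the table.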


Monitoring with CloudWatch

Understanding DynamoDB Monitoring Sources

Since DynamoDB doesn't generate CloudWatch Logs, monitoring comes from two sources:
  1. CloudWatch Metrics (namespace AWS/DynamoDB):
    1. Shows capacity consumption, throttling events, errors, and latencies.
    2. Access via: CloudWatch → Metrics → AWS/DynamoDB → filter by TableName.
  2. Application and Lambda Logs:
    1. Your application logs: SDK errors, retries, and conditional check failures.
    2. Lambda function logs: Stream processing errors, batching issues.
    3. Access via: CloudWatch → Logs → [your-application-log-group] or /aws/lambda/integrates_streams_*.

Key Metrics to Monitor 

  1. Capacity and Throttling Metrics
    1. ConsumedReadCapacityUnits / ConsumedWriteCapacityUnits
    2. ReadThrottleEvents / WriteThrottleEvents
    3. ThrottledRequests (at API level)
  2. Error and Latency Metrics
    1. SystemErrors / UserErrors
    2. SuccessfulRequestLatency 
  3. For Stream-enabled Tables (Lambda Metrics)
    1. IteratorAge - Age of last processed event (critical for detecting lag)
    2. Errors - Failed executions
    3. Duration - Processing time
    4. Throttles - Lambda concurrency issues

How to View Metrics

  1. Navigate to CloudWatch → Metrics → AWS/DynamoDB.
  2. Filter by TableName dimension (e.g., "sifts_state").
  3. Select desired metrics.
  4. For stream metrics, go to AWS/Lambda namespace and filter by FunctionName.
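
As a complement to the console steps above, the same metrics could be pinned to a CloudWatch dashboard defined in Terraform. The following is a sketch under stated assumptions: the dashboard name and the aws_region variable are hypothetical, and only the sifts_state table is charted.

```hcl
# Hypothetical input: the region that hosts the table.
variable "aws_region" {
  type = string
}

resource "aws_cloudwatch_dashboard" "dynamodb_sifts_state" {
  dashboard_name = "dynamodb-sifts-state" # hypothetical name

  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "sifts_state consumed capacity"
          region = var.aws_region
          stat   = "Sum"
          period = 300
          # Each entry is [namespace, metric name, dimension name, dimension value].
          metrics = [
            ["AWS/DynamoDB", "ConsumedReadCapacityUnits", "TableName", "sifts_state"],
            ["AWS/DynamoDB", "ConsumedWriteCapacityUnits", "TableName", "sifts_state"],
          ]
        }
      },
      {
        type   = "metric"
        x      = 12
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "sifts_state throttle events"
          region = var.aws_region
          stat   = "Sum"
          period = 60
          metrics = [
            ["AWS/DynamoDB", "ReadThrottleEvents", "TableName", "sifts_state"],
            ["AWS/DynamoDB", "WriteThrottleEvents", "TableName", "sifts_state"],
          ]
        }
      },
    ]
  })
}
```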
Note
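Even with on-demand capacity (PAY_PER_REQUEST), throttling can still occur: a hot partition key can exceed the per-partition limits (3,000 read units or 1,000 write units per second), and traffic that grows to more than double the table's previous peak within 30 minutes can outpace on-demand scaling.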

Debugging with CloudWatch Logs

Since DynamoDB doesn't generate logs, you need to search in:
  1. Application Logs (e.g., Integrates log group):
    1. SDK errors and exceptions.
    2. Throttling events.
    3. Conditional failures.
    4. Network/timeout issues.
  2. Lambda Function Logs (for stream processing):
    1. Log groups: /aws/lambda/integrates_streams_*.
    2. Stream batch processing errors.
    3. Retries and failures.
    4. Indexing issues to downstream systems (e.g., OpenSearch).

Useful Logs Insights Queries

Throttling and Common Errors (Run in application logs):

fields @timestamp, @message
| filter @message like /ProvisionedThroughputExceededException|Throttling|TransactionCanceledException/
| sort @timestamp desc
| limit 100



Conditional Failures (Run in application logs):

fields @timestamp, @message
| filter @message like /ConditionalCheckFailedException/
| sort @timestamp desc
| limit 100



Network and SDK Issues (Run in application logs):

fields @timestamp, @message
| filter @message like /ClientError|ReadTimeout|ConnectTimeout|Retry/
| sort @timestamp desc
| limit 100



Stream Processing Errors (Run in Lambda logs):

fields @timestamp, @message
| filter @message like /BulkIndexError|bisect|retry|validation|UnprocessedItems/
| sort @timestamp desc
| limit 100
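
To keep queries like these at hand in the Logs Insights console, they could be stored as saved queries through Terraform. The sketch below does this for the stream processing query. The query name is hypothetical, and the log group names assume the Lambda default of /aws/lambda/ followed by the function name.

```hcl
resource "aws_cloudwatch_query_definition" "stream_processing_errors" {
  name = "dynamodb/stream-processing-errors" # hypothetical name

  # One log group per stream-processing function. Saved queries take explicit
  # log group names, so the integrates_streams_* wildcard is expanded here.
  log_group_names = [
    for fn in aws_lambda_function.integrates_streams :
    "/aws/lambda/${fn.function_name}"
  ]

  query_string = <<-EOT
    fields @timestamp, @message
    | filter @message like /BulkIndexError|bisect|retry|validation|UnprocessedItems/
    | sort @timestamp desc
    | limit 100
  EOT
}
```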



CloudWatch Alarms for DynamoDB

Universe has centralized alarms for selected DynamoDB tables, defined in runs/vpc/infra/alarms.tf:

resource "aws_cloudwatch_metric_alarm" "dynamodb_write_capacity_alarm" {
  for_each = toset(var.dynamodb_table_names)

  metric_name   = "ConsumedWriteCapacityUnits"
  namespace     = "AWS/DynamoDB"
  period        = 300 # seconds
  statistic     = "Sum"
  threshold     = var.write_capacity_threshold
  alarm_actions = [aws_sns_topic.central_alarms.arn]

  # Excerpt: alarm_name, comparison_operator, and evaluation_periods
  # (required by the AWS provider) are not shown here.
}




To add a table to these alarms, add its name to var.dynamodb_table_names. Suggested settings for each alarm type:
  1. Capacity Alarms
    1. High Write Capacity:
      1. Metric: ConsumedWriteCapacityUnits
      2. Statistic: Sum
      3. Period: 5 minutes
      4. Threshold: > your expected maximum
    2. High Read Capacity:
      1. Metric: ConsumedReadCapacityUnits
      2. Statistic: Sum
      3. Period: 5 minutes
      4. Threshold: > your expected maximum
  2. Throttling Alarms
    1. Write Throttling Events:
      1. Metric: WriteThrottleEvents
      2. Statistic: Sum
      3. Period: 1 minute
      4. Threshold: > 0 (or acceptable baseline)
    2. Read Throttling Events:
      1. Metric: ReadThrottleEvents
      2. Statistic: Sum
      3. Period: 1 minute
      4. Threshold: > 0 (or acceptable baseline)
  3. Lambda Stream Processing Alarms
    1. High Iterator Age:
      1. Metric: IteratorAge (AWS/Lambda)
      2. Statistic: Maximum
      3. Period: 1 minute
      4. Threshold: > 300000 (5 minutes in ms)
    2. Lambda Errors:
      1. Metric: Errors (AWS/Lambda)
      2. Statistic: Sum
      3. Period: 1 minute
      4. Threshold: > 0
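
The throttling and iterator-age settings listed above could be expressed in Terraform roughly as follows. This is a sketch, not the contents of alarms.tf: the alarm names, evaluation_periods, and treat_missing_data values are assumptions, while var.dynamodb_table_names, aws_sns_topic.central_alarms, and aws_lambda_function.integrates_streams are reused from the snippets earlier on this page.

```hcl
# Write throttling: Sum over 1 minute, alarm when greater than 0.
resource "aws_cloudwatch_metric_alarm" "dynamodb_write_throttle_alarm" {
  for_each = toset(var.dynamodb_table_names)

  alarm_name          = "dynamodb-write-throttle-${each.key}" # hypothetical naming scheme
  namespace           = "AWS/DynamoDB"
  metric_name         = "WriteThrottleEvents"
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 1 # assumed
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0

  # Assumed: treat periods without data points as healthy, so the alarm
  # does not sit in INSUFFICIENT_DATA while nothing is throttled.
  treat_missing_data = "notBreaching"

  dimensions = {
    TableName = each.key
  }

  alarm_actions = [aws_sns_topic.central_alarms.arn]
}

# Stream lag: Maximum IteratorAge over 1 minute, alarm above 5 minutes.
resource "aws_cloudwatch_metric_alarm" "streams_iterator_age_alarm" {
  for_each = aws_lambda_function.integrates_streams

  alarm_name          = "lambda-iterator-age-${each.key}" # hypothetical naming scheme
  namespace           = "AWS/Lambda"
  metric_name         = "IteratorAge"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 1 # assumed
  comparison_operator = "GreaterThanThreshold"
  threshold           = 300000 # milliseconds (5 minutes)

  dimensions = {
    FunctionName = each.value.function_name
  }

  alarm_actions = [aws_sns_topic.central_alarms.arn]
}
```

The read throttling and Lambda error alarms follow the same shape, with only the metric name, namespace, and threshold changed.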