OpenSearch

OpenSearch is a critical component in our Universe ecosystem, used by several key services:
Service      OpenSearch Usage
-----------  --------------------------------------------------------
Integrates   Vulnerability search, code lines, packages, ports, roots
Sifts        Code analysis and similar vulnerability search
Streams      Processing DynamoDB events to OpenSearch

OpenSearch Architecture

OpenSearch Domains

  1. integrates Domain 

    • Version: OpenSearch 2.19
    • Configuration: 3 dedicated master nodes + 3 data nodes
    • Instance: r6gd.large.search (instances with NVMe storage)
    • Log group: opensearch (retention: 90 days)
  2. sifts Domain

    • Version: OpenSearch 2.15
    • Configuration: 3 dedicated master nodes + 3 data nodes
    • Instance: r6gd.large.search
    • Log group: sifts-opensearch (retention: 90 days)

Logs Configuration

Configured log types

Log Type             Integrates   Sifts        Description
-------------------  -----------  -----------  ------------------------
INDEX_SLOW_LOGS      ❌ Disabled  ❌ Disabled  Slow indexing logs
SEARCH_SLOW_LOGS     ✅ Enabled   ❌ Disabled  Slow search logs
ES_APPLICATION_LOGS  ✅ Enabled   ✅ Enabled   General application logs
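
The domain and log settings above can be captured as data, which helps when a script needs to pick the right log group or skip slow-search queries for a domain that does not publish them. This is a minimal sketch: `DomainInfo`, `DOMAINS`, and `has_log_type` are hypothetical names introduced here for illustration, and the values are copied from the configuration listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainInfo:
    """Static facts about one OpenSearch domain, copied from this page."""
    version: str
    log_group: str
    enabled_log_types: frozenset


# Hypothetical inventory; values mirror the documented configuration.
DOMAINS = {
    "integrates": DomainInfo(
        version="2.19",
        log_group="opensearch",
        enabled_log_types=frozenset({"SEARCH_SLOW_LOGS", "ES_APPLICATION_LOGS"}),
    ),
    "sifts": DomainInfo(
        version="2.15",
        log_group="sifts-opensearch",
        enabled_log_types=frozenset({"ES_APPLICATION_LOGS"}),
    ),
}


def has_log_type(domain: str, log_type: str) -> bool:
    """Return True if the given log type is published for the domain."""
    return log_type in DOMAINS[domain].enabled_log_types
```

For example, `has_log_type("sifts", "SEARCH_SLOW_LOGS")` is false, so slow-search queries only make sense against the `opensearch` log group.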

Main Indices

Integrates:
  1. vulns_index - Vulnerabilities
  2. findings_index - Findings
  3. lines_index - Code lines
  4. packages_index - Packages
  5. ports_index - Ports
  6. roots_index - Code roots
  7. inputs_index - User inputs
  8. events_index - System events
  9. executions_index - Executions
Sifts:
  1. vulnerabilities_candidates_v1 - Vulnerability candidates for analysis
  2. pkgs_index - Packages for analysis

Monitoring Strategies with CloudWatch

  1. Application Logs

    Location: CloudWatch → Logs → Log groups → opensearch (integrates) or sifts-opensearch (sifts)

    Useful queries:

    # General errors
    fields @timestamp, @message | filter @message like /ERROR/
    | sort @timestamp desc | limit 100

    # Connection errors
    fields @timestamp, @message | filter @message like /ConnectionError|connection.*failed|timeout/
    | sort @timestamp desc | limit 50

    # Cluster errors
    fields @timestamp, @message | filter @message like /cluster.*error|node.*failed|shard.*failed/
    | sort @timestamp desc
    | limit 50
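
The three queries above share one shape and differ only in the regex and the limit, so they can be generated and submitted from a script. The sketch below assumes a boto3 CloudWatch Logs client is passed in by the caller; `build_error_query` and `start_query` are hypothetical helper names, while `start_query` on the client and its `logGroupName`, `startTime`, `endTime`, and `queryString` parameters are part of the boto3 API.

```python
def build_error_query(pattern: str, limit: int = 100) -> str:
    """Build a Logs Insights query that filters messages by a regex."""
    return (
        "fields @timestamp, @message"
        f" | filter @message like /{pattern}/"
        " | sort @timestamp desc"
        f" | limit {limit}"
    )


def start_query(logs_client, log_group: str, query: str, start: int, end: int) -> str:
    """Start the query and return its query ID.

    `logs_client` is expected to be a boto3 CloudWatch Logs client;
    `start` and `end` are epoch seconds.
    """
    response = logs_client.start_query(
        logGroupName=log_group,
        startTime=start,
        endTime=end,
        queryString=query,
    )
    return response["queryId"]
```

Results are not returned by `start_query` itself; they would be fetched afterwards with the client's `get_query_results(queryId=...)` call once the query has completed.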

  2. Slow Search Logs - Integrates Only

    Useful queries:

    # Slow searches (took[...] may be fractional or in seconds, e.g. 1.2s, so match on took_millis)
    fields @timestamp, @message | filter @message like /took_millis\[[0-9]+\]/
    | sort @timestamp desc | limit 100
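
Once slow-search lines have been exported, ranking them by duration is often more useful than reading them in time order. The sketch below assumes each message contains a `took_millis[N]` field, as the query above does; the helper names are hypothetical and any sample lines used with it are made up for illustration.

```python
import re

# Matches the took_millis[N] field of a slow search log line.
TOOK_MILLIS = re.compile(r"took_millis\[(\d+)\]")


def slow_search_millis(message: str):
    """Return the took_millis value from a slow search log line, or None."""
    match = TOOK_MILLIS.search(message)
    return int(match.group(1)) if match else None


def slowest(messages, top: int = 5):
    """Return the `top` (millis, message) pairs with the highest took_millis."""
    timed = [(slow_search_millis(m), m) for m in messages]
    timed = [(ms, m) for ms, m in timed if ms is not None]
    return sorted(timed, key=lambda pair: pair[0], reverse=True)[:top]
```

Lines without a `took_millis` field are skipped rather than treated as zero, so unrelated log entries do not distort the ranking.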

  3. OpenSearch Metrics

    Key metrics in CloudWatch (AWS/ES namespace):

    1. ClusterStatus - Cluster status (published as ClusterStatus.green, ClusterStatus.yellow, and ClusterStatus.red)
    2. CPUUtilization - CPU usage
    3. FreeStorageSpace - Free space
    4. SearchLatency - Search latency
    5. IndexingLatency - Indexing latency
    6. JVMMemoryPressure - JVM memory pressure

    How to visualize:

    1. CloudWatch → Metrics → All Metrics 
    2. Filter by AWS/ES
    3. Select Per-Domain, Per-Client Metrics
    4. Filter by DomainName: integrates or sifts
    5. Select relevant metrics
    6. Group by ClientId
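
The console steps above map onto a single CloudWatch API request: the `AWS/ES` namespace plus the `DomainName` and `ClientId` dimensions. The sketch below only builds the request parameters; `metric_request` is a hypothetical helper, the 5-minute period and the chosen statistics are assumptions, and `ClientId` is the AWS account ID that owns the domain.

```python
from datetime import datetime, timedelta, timezone


def metric_request(domain: str, client_id: str, metric: str, hours: int = 3) -> dict:
    """Build keyword arguments for CloudWatch's get_metric_statistics call."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ES",
        "MetricName": metric,
        "Dimensions": [
            {"Name": "DomainName", "Value": domain},
            {"Name": "ClientId", "Value": client_id},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # assumed 5-minute granularity
        "Statistics": ["Average", "Maximum"],
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("cloudwatch").get_metric_statistics(**metric_request("integrates", "123456789012", "SearchLatency"))`, where the account ID shown is a placeholder.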

    Debugging Common Issues:

    1. No Results in Searches:
      1. Check application logs

        fields @timestamp, @message
        | filter @message like /search.*error|no.*results|zero.*results/
        | sort @timestamp desc
        | limit 20

      2. Check streams logs (indexing)
        1. Navigate to: CloudWatch → Logs → Log groups → /aws/lambda/integrates_streams_*
        2. Look for errors in specific index processors:

          fields @timestamp, @message | filter @message like /indexing.*failed|BulkIndexError/
          | sort @timestamp desc

      3. Check indexing metrics
        1. Review IndexingRate and IndexingLatency.
        2. Check for recent drops in the indexing rate.
      4. Check cluster status
        1. Verify ClusterStatus (red indicates serious problems).
        2. Review ShardAllocationStatus if available.
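
The "recent drops in the indexing rate" check can be made less subjective with a simple comparison of recent samples against the earlier baseline. This is only a sketch under stated assumptions: `indexing_rate_dropped` is a hypothetical helper, the input is a time-ordered list of IndexingRate values exported from CloudWatch, and the window size and 50% threshold are arbitrary starting points rather than tuned values.

```python
from statistics import mean


def indexing_rate_dropped(rates, recent: int = 3, threshold: float = 0.5) -> bool:
    """Return True if the mean of the last `recent` samples fell below
    `threshold` times the mean of the earlier samples."""
    if len(rates) <= recent:
        return False  # not enough history to compare
    baseline = mean(rates[:-recent])
    if baseline == 0:
        return False  # nothing was being indexed before either
    return mean(rates[-recent:]) < threshold * baseline
```

A positive result is a hint to look at the streams logs, not proof of a fault, since indexing volume also falls when there is simply less DynamoDB activity.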

    2. Indexing Errors from DynamoDB Streams:
      1. Check Lambda streams logs

        # In log group /aws/lambda/integrates_streams_*
        fields @timestamp, @message | filter @message like /Error|Exception|failed/
        | parse @message "Error * - *" as errorType, errorMessage
        | sort @timestamp desc
        | limit 50

      2. Check specific bulk operation errors

        fields @timestamp, @message | filter @message like /BulkIndexError|bulk.*error/
        | sort @timestamp desc

      3. Check Streams application logs

        fields @timestamp, @message | filter @message like /bulk.*rejected|EsRejectedExecutionException/
        | sort @timestamp desc
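
The `parse` step in the first query splits each message into an error type and an error message. When working with exported log lines, the same split can be approximated with a regex and then tallied per type. This sketch assumes messages follow the `Error <type> - <message>` shape implied by that query; `parse_error` and `count_error_types` are hypothetical names.

```python
import re
from collections import Counter

# Approximates the Logs Insights glob "Error * - *".
ERROR_PATTERN = re.compile(r"Error (.*?) - (.*)")


def parse_error(message: str):
    """Return (error_type, error_message), or None if the line does not match."""
    match = ERROR_PATTERN.search(message)
    return (match.group(1), match.group(2)) if match else None


def count_error_types(messages) -> Counter:
    """Count how often each error type appears in the given messages."""
    parsed = (parse_error(m) for m in messages)
    return Counter(p[0] for p in parsed if p is not None)
```

Calling `count_error_types(lines).most_common(3)` then shows which error types dominate a given time window.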

    3. Connectivity Issues:
      1. Check Lambda streams logs

        # In log group /aws/lambda/integrates_streams_*
        fields @timestamp, @message | filter @message like /Error|Exception|failed/
        | parse @message "Error * - *" as errorType, errorMessage
        | sort @timestamp desc
        | limit 50

      2. Check specific bulk operation errors

        fields @timestamp, @message | filter @message like /BulkIndexError|bulk.*error/
        | sort @timestamp desc

      3. Check Streams application logs

        fields @timestamp, @message | filter @message like /bulk.*rejected|EsRejectedExecutionException/
        | sort @timestamp desc
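
Because connectivity problems and indexing errors surface in the same log groups, it can help to sort exported messages into categories before reading them. The sketch below reuses the filter patterns from the queries on this page; the category names and the `classify` helper are hypothetical, and the order of the checks (connectivity first) is an assumption about which signal matters most.

```python
import re

# Patterns are taken from the Logs Insights filters on this page.
# The first matching category wins, so order expresses priority.
CATEGORIES = [
    ("connectivity", re.compile(r"ConnectionError|connection.*failed|timeout")),
    ("cluster", re.compile(r"cluster.*error|node.*failed|shard.*failed")),
    ("bulk_rejection", re.compile(r"bulk.*rejected|EsRejectedExecutionException")),
    ("bulk_error", re.compile(r"BulkIndexError|bulk.*error")),
]


def classify(message: str) -> str:
    """Return the first category whose pattern matches, else "other"."""
    for name, pattern in CATEGORIES:
        if pattern.search(message):
            return name
    return "other"
```

Matching is case-sensitive, as it is for `like /.../` filters in Logs Insights, so a message that only contains `Timeout` with a capital T would fall through to "other".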