Search Engine

To overcome the main tradeoff due to DynamoDB’s limitations, we use OpenSearch as our primary search engine to provide fast and efficient search capabilities across the data in our system. OpenSearch is deployed in a highly available configuration, ensuring reliability and performance.

Key Features

High Availability: The OpenSearch cluster is configured with zone awareness enabled, distributing nodes across multiple availability zones for maximum uptime.
Scalability: The cluster uses r6gd.large.search instances, providing a good balance of memory and compute resources.
Monitoring: Comprehensive logging is implemented using OpenTelemetry and a custom implementation for OpenSearch, tracking application logs and index operations.

Technical Implementation

The system uses the official Async OpenSearch Python client, providing:

Asynchronous operations for better performance
Automatic retry mechanisms for handling transient failures
Telemetry integration for monitoring and debugging

The search functionality is implemented with a focus on:

Full-text search capabilities
Fields filtering
Listing unique values
Pagination and sorting
Aggregation support

Development and Testing

For development and testing purposes, we maintain:

Local OpenSearch instance for development.
Mock implementations for testing using openmock.
Comprehensive logging and monitoring with Jaeger.

All the test data is populated and updated automatically with Streams once the application is started. You can see this behavior locally with:

integrates-local

A folder structure is used to organize the code for entities that have an index. See an example with the vulnerabilities index:

▶ 📁 vulnerabilities/

▶ 📁 index/

enums.py

filter.py

search.py

sort.py

types.py

types.py and enums.py contains important types for the search engine.

filter.py contains the methods to sanitize, parse, and validate the filters received from the API.

sort.py contains the methods to build valid OpenSearch sort parameters using the sort received from the API.

search.py contains the methods to build and perform a valid OpenSearch query using the filters and sorting parameters.

All filters, sort, and search will have an apply() method to use all its features in a single place and facilitate implementation like this:

from integrates.vulnerabilities.index import filter as vulns_filters
from integrates.vulnerabilities.index import sort as vulns_sort
from integrates.vulnerabilities.index import search as vulns_search

...

def resolve(parent: Vulnerability, info: GraphQLResolveInfo, **kwargs: dict) -> list[Vulnerability]:
    formatted_filters = vulns_filters.apply(kwargs.get("filters", {}))    
    formatted_sort = vulns_sort.apply(kwargs.get("sort", {}))

    results = await vulns_search.apply(formatted_filters, formatted_sort)

    return [format_vuln(result) for result in results]

Caution

Exceptions must be handled when apply() methods are used. Refer to the current implementations to see examples.

Unit testing

Test files can be found beside the implementation files.

▶ 📁 vulnerabilities/

▶ 📁 index/

filter_test.py

search_test.py

sort_test.py

Filters tests must test the most important sanitization, parsing, and authorization cases.

Sort tests must test the valid construction of sort parameters.

Search tests must test the valid construction of OpenSearch queries using the filters and sort parameters.

Note

Due to [openmock][openmock] limitations, complex queries with aggregations and other advanced features are not supported yet, and we are not able to test search.apply() completely. However, we can test the other search methods using snapshot testing.

Functional testing

When the application is started locally, the OpenSearch instance will be running in http://localhost:9200/. You can make requests to this endpoint using tools like Insomnia or Postman.

Refer to OpenSearch's official documentation to get more information about queries.

Tip