1.2 Architecture

All services leverage Docker containers to ensure isolation and enable the creation of loosely coupled services that can be deployed independently. This approach simplifies and accelerates workflows, while providing flexibility to innovate with custom toolsets, application stacks, and deployment environments tailored to each service.
By bundling software, libraries, and configuration files within containers, components remain isolated from one another. Communication between components is facilitated through RESTful methods, ensuring interoperability. REST's architectural constraints promote scalability of interactions, generality of interfaces, and independent deployment of individual components.
External communication is secured through an nginx-based gateway.
Data Context Hub
The following sections provide a brief overview of the Data Context Hub components. For more detailed information, refer to the linked sections.
Graph Builder Services (GBS)
Graph Builder Services is the central component of Data Context Hub. It provides all the necessary logic to import data using Intake Agents and process it into a knowledge graph stored in Neo4j.
To manage asynchronous tasks, GBS uses RabbitMQ for message handling and Apache Airflow to automate and schedule data import workflows.
It also serves as a backend for GBS UI, a frontend targeted at data engineers. This UI offers full access to all functionalities provided by GBS, enabling users to:
- Define data structures tailored to the data being imported via Intake Agents,
- Create and execute load plans through Apache Airflow,
- Set access rules for the Graph Security Layer.
Additionally, GBS Beta is an upcoming frontend currently in development, designed to offer an enhanced and modern user experience.
Linked Data API
Linked Data API provides a REST API with endpoints to query data from the knowledge graph, enabling the development of custom frontends. These graph-related endpoints enforce rules defined in the Graph Security Layer.
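A custom frontend would typically compose query URLs against such endpoints. The endpoint and parameter names below are hypothetical; the sketch only shows the composition pattern.

```python
from urllib.parse import urlencode


def graph_query_url(base_url: str, endpoint: str, **params: str) -> str:
    """Compose a query URL for a graph endpoint.

    The endpoint path and parameters are illustrative assumptions,
    not the documented Linked Data API surface.
    """
    query = urlencode(sorted(params.items()))
    return f"{base_url.rstrip('/')}{endpoint}" + (f"?{query}" if query else "")


url = graph_query_url("https://ldapi.example.com/", "/api/graph/nodes",
                      label="Part", limit="10")
```

Whatever the frontend requests, the response is filtered by the rules defined in the Graph Security Layer, so clients never see nodes they are not entitled to.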
Intake Agents
Intake agents are independent applications used by the Data Context Hub to import data from various sources and manage problem-specific models. For more details, refer to Intake Agents.
Workers
Workers are independent applications used by the Data Context Hub to perform specialized tasks. They can operate through different services like GBS or Linked Data API.
Memory For Your AI (M4AI)
M4AI enables the creation and management of specialized engineering agents designed to deliver domain-specific insights and streamline the automation of complex, industry-specific tasks. This allows organizations to enhance efficiency and focus on resolving specialized challenges.
Dependencies
Internally developed components leverage 3rd-party packages, and each component provides a method to display its respective licenses:
| Component | Licenses |
|---|---|
| UI | {{ui-url}}/LICENSES.html |
| API | {{api-url}}/api/system/licenses |
| Workers | {{worker-url}}/licenses |
| Intake Agents | Visible in the Admin UI of each agent |
External Software
Data Context Hub uses the following 3rd-party software components.
PostgreSQL
PostgreSQL is a powerful, open source object-relational database system with a strong reputation for reliability, feature robustness, and performance. All relational data for Data Context Hub is stored in PostgreSQL.
Neo4j
Neo4j is an ACID-compliant transactional graph database management system featuring native graph storage and processing. Data Context Hub uses the enterprise edition to store data as a schema-free knowledge graph, providing access through a unified access layer.
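Graph queries against Neo4j are written in Cypher, and parameterized queries are the recommended practice (they prevent injection and let the server cache query plans). The label and property names below are illustrative; the sketch only builds the query, since executing it requires a running database and the `neo4j` driver package.

```python
def find_nodes_cypher(label: str, prop: str, value):
    """Build a parameterized Cypher query for nodes with a given property.

    label and prop are illustrative; validating them guards against
    injecting arbitrary Cypher, since labels cannot be parameterized.
    """
    if not (label.isidentifier() and prop.isidentifier()):
        raise ValueError("label and prop must be plain identifiers")
    query = f"MATCH (n:{label} {{{prop}: $value}}) RETURN n"
    return query, {"value": value}


q, params = find_nodes_cypher("Part", "name", "bolt")
# Running it would look roughly like this (requires `pip install neo4j`):
# from neo4j import GraphDatabase
# with GraphDatabase.driver(uri, auth=(user, password)) as driver:
#     records, summary, keys = driver.execute_query(q, params)
```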
RabbitMQ
RabbitMQ is a lightweight open source message broker that implements the Advanced Message Queuing Protocol (AMQP). GBS utilizes RabbitMQ to send messages and trigger asynchronous tasks in the GBS Worker.
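The message shape and queue name below are illustrative assumptions, not the actual GBS protocol; the sketch shows the general pattern of serializing a task trigger and publishing it via the `pika` client.

```python
import json


def task_message(task: str, **payload) -> bytes:
    """Serialize a task trigger for a worker queue (shape is hypothetical)."""
    return json.dumps({"task": task, "payload": payload}).encode("utf-8")


body = task_message("import", source="erp")
# Publishing requires a running broker and the pika package (pip install pika):
# import pika
# conn = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq"))
# channel = conn.channel()
# channel.queue_declare(queue="gbs-worker", durable=True)
# channel.basic_publish(exchange="", routing_key="gbs-worker", body=body)
# conn.close()
```

Because the broker persists the message, the worker can pick it up whenever it has capacity, which is what decouples GBS from its asynchronous tasks.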
Apache Airflow
Apache Airflow is an open-source workflow management platform designed for data engineering pipelines. Workflows are represented as Directed Acyclic Graphs (DAGs) of tasks. GBS utilizes Apache Airflow to automate and schedule data import processes through these workflows.
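The essence of a DAG is that each task declares its upstream dependencies and the scheduler runs tasks in a dependency-respecting order. The task names below are made up, and this standard-library sketch illustrates only the ordering idea, not Airflow's own API.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on, as in a load plan
# where data must be extracted and transformed before the graph is built.
load_plan = {
    "extract": set(),
    "transform": {"extract"},
    "load_graph": {"transform"},
    "refresh_index": {"load_graph"},
}

# static_order() yields every task after all of its predecessors.
order = list(TopologicalSorter(load_plan).static_order())
```

In Airflow the same dependency information is expressed with operators and `>>` chaining inside a DAG definition file, and the scheduler additionally handles retries, timing, and parallelism.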
Keycloak
Keycloak is an open source identity and access management solution designed for modern applications and services. It simplifies the process of securing applications and services with minimal code changes. For more details, refer to the Keycloak section.
OpenTelemetry
OpenTelemetry is a collection of open source tools, APIs, and SDKs to instrument, generate, collect, and export telemetry data. It uses open-standard semantic conventions to ensure vendor-agnostic data collection that can be sent to different observability backends.
OpenSearch
OpenSearch is an open source enterprise-grade search and observability suite. Data Prepper (license) is a server-side data collector capable of filtering, enriching, transforming, normalizing, and aggregating data for downstream analysis and visualization. OpenSearch Dashboards (license) is a flexible, fully integrated data visualization toolset for visually exploring and querying data. For more information, refer to the Deployment & Maintenance section.
Weaviate
Weaviate is an open-source vector database that simplifies the development of AI applications. Built-in vector and hybrid search, easy-to-connect machine learning models, and a focus on data privacy enable developers of all levels to build, iterate, and scale AI capabilities faster.
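At its core, vector search ranks stored embeddings by their similarity to a query embedding. The toy vectors below are made up, and the sketch uses plain cosine similarity from the standard library; Weaviate performs the same ranking at scale with approximate nearest-neighbor indexes.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(query, corpus, k=2):
    """Return the k items whose vectors are most similar to the query."""
    return sorted(corpus, key=lambda item: cosine(query, item[1]), reverse=True)[:k]


# Hypothetical 2-d embeddings; real embeddings have hundreds of dimensions.
docs = [("bolt", [1.0, 0.0]), ("nut", [0.9, 0.1]), ("manual", [0.0, 1.0])]
top = nearest([1.0, 0.05], docs, k=2)
```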
Celery
Celery is an open source asynchronous task queue based on distributed message passing, with a focus on real-time operation.
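The pattern behind Celery is a producer putting work on a queue and a worker consuming it independently. This in-process standard-library sketch shows only that pattern; Celery distributes it across processes and machines via a message broker such as RabbitMQ.

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()
results = []


def worker() -> None:
    # Consume tasks until the sentinel None arrives, mirroring a worker loop.
    while (item := tasks.get()) is not None:
        func, args = item
        results.append(func(*args))
        tasks.task_done()


t = threading.Thread(target=worker)
t.start()
tasks.put((pow, (2, 8)))          # producer enqueues work...
tasks.put((len, ("graph",)))
tasks.put(None)                   # ...and signals shutdown
t.join()
```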