1.2 Architecture
All services leverage Docker containers to ensure isolation and enable the creation of loosely coupled services that can be deployed independently. This approach simplifies and accelerates workflows, while providing flexibility to innovate with custom toolsets, application stacks, and deployment environments tailored to each service.
By bundling software, libraries, and configuration files within containers, components remain isolated from one another. Communication between components is facilitated through RESTful methods, ensuring interoperability. REST's architectural constraints promote scalability of interactions, generality of interfaces, and independent deployment of individual components.
External communication passes through an nginx-based gateway, which secures access to the services.
Data Context Hub
The following sections provide a brief overview of the Data Context Hub components. For more detailed information, refer to the linked sections.
Graph Builder Services (GBS)
Graph Builder Services is the central component of Data Context Hub. It provides all the necessary logic to import data using Data Pumps and process it into a knowledge graph stored in Neo4j.
To manage asynchronous tasks, GBS utilizes RabbitMQ for message handling and Apache Airflow to automate and schedule data import workflows.
It also serves as a backend for GBS UI, a frontend targeted at data engineers. This UI offers full access to all functionalities provided by GBS, enabling users to:
- Define data structures tailored to the data being imported via Data Pumps,
- Create and execute load plans through Apache Airflow,
- Set access rules for the Graph Security Layer.
Additionally, GBS Beta is an upcoming frontend currently in development, designed to offer an enhanced and modern user experience.
Backend Services
Backend Services provide a REST API with endpoints to query data from the knowledge graph, enabling the development of custom frontends. These graph-related endpoints enforce rules defined in the Graph Security Layer.
Explorer & Explorer Backend
The Explorer allows users to navigate the knowledge graph. With its dedicated Explorer Backend, users can effortlessly traverse the graph and examine how data is interconnected.
Data Pumps
Data Pumps are specialized, isolated plug-and-play components used to import data from various sources. For more details, refer to Data Pumps.
Workers
Workers are independent applications utilized by the Data Context Hub. They can operate through either GBS or Data Pumps to perform specific tasks. Designed for maximum independence, Workers can be deployed and run separately from the Data Context Hub stack.
Memory For Your AI (M4AI)
M4AI enables the creation and management of specialized engineering agents designed to deliver domain-specific insights and streamline the automation of complex, industry-specific tasks. This allows organizations to enhance efficiency and focus on resolving specialized challenges.
Dependencies
Internally developed components leverage 3rd-party packages, and each component provides a method to display its respective licenses:
Component | Licenses |
---|---|
GBS UI and Explorer | {{ui-url}}/LICENSES.html |
GBS and EBS API | {{api-url}}/api/system/licenses |
Workers | {{worker-url}}/licenses |
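As an illustration of the table above, the license locations can be derived from a deployment's base URLs. The following is a minimal sketch, not part of the product: the paths come from the table, while the `LICENSE_PATHS` keys, the `license_url` helper, and the example host `dch.example.com` are assumptions made for this example.

```python
from urllib.parse import urljoin

# Paths taken from the table above. The dictionary keys are hypothetical labels
# standing in for {{ui-url}}, {{api-url}}, and {{worker-url}}.
LICENSE_PATHS = {
    "ui": "LICENSES.html",
    "api": "api/system/licenses",
    "worker": "licenses",
}


def license_url(component: str, base_url: str) -> str:
    """Join a component's base URL with its license path."""
    # A trailing slash makes urljoin append the path instead of
    # replacing the last segment of the base URL.
    base = base_url if base_url.endswith("/") else base_url + "/"
    return urljoin(base, LICENSE_PATHS[component])


if __name__ == "__main__":
    # dch.example.com is a placeholder host, not a real deployment.
    for name in LICENSE_PATHS:
        print(name, license_url(name, "https://dch.example.com"))
```

Normalizing the trailing slash keeps the helper correct when a component is served under a sub-path rather than at the root of the host.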
External Software
Data Context Hub uses the following third-party software components.
PostgreSQL
PostgreSQL is a powerful, open source object-relational database system. It has a strong reputation for reliability, feature robustness, and performance. All the relational data for Data Context Hub are stored in PostgreSQL.
Neo4j
Neo4j (license) is a graph database management system. It is an ACID-compliant transactional database featuring native graph storage and processing. Neo4j is not included or embedded as part of our solution. To use all Data Context Hub features, a dedicated Neo4j instance must be provided separately. In the context of the Data Context Hub, data is stored in Neo4j as a schema-free knowledge graph and is accessible through a unified access layer.
RabbitMQ
RabbitMQ is a lightweight open source message broker that implements the Advanced Message Queuing Protocol (AMQP). GBS utilizes RabbitMQ to send messages and trigger asynchronous tasks in the GBS Worker.
Apache Airflow
Apache Airflow is an open-source workflow management platform designed for data engineering pipelines. Workflows are represented as Directed Acyclic Graphs (DAGs) of tasks. GBS utilizes Apache Airflow to automate and schedule data import processes through these workflows.
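To make the DAG idea concrete, the sketch below orders a small import workflow by its dependencies using only the Python standard library. It does not use Airflow's API and is not how GBS defines its load plans; the task names and both helper functions are hypothetical and chosen for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical import workflow: each task maps to the set of tasks
# that must finish before it can start.
workflow = {
    "extract": set(),
    "transform": {"extract"},
    "load_graph": {"transform"},
    "apply_access_rules": {"load_graph"},
}


def execution_order(dag):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def is_valid_workflow(dag):
    """A workflow is only a DAG if it contains no dependency cycle."""
    try:
        execution_order(dag)
    except CycleError:
        return False
    return True


if __name__ == "__main__":
    print(execution_order(workflow))
    # Two tasks that depend on each other form a cycle and are rejected.
    print(is_valid_workflow({"a": {"b"}, "b": {"a"}}))
```

The acyclic constraint is what lets a scheduler decide which task can run next: a task becomes eligible once all of its upstream tasks have completed.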
Keycloak
Keycloak is an open source identity and access management solution designed for modern applications and services. It simplifies the process of securing applications and services with minimal code changes. For more details, refer to the Keycloak section.
OpenTelemetry
OpenTelemetry is a collection of open source tools, APIs, and SDKs used to instrument, generate, collect, and export telemetry data. It uses open-standard semantic conventions to keep data collection vendor-agnostic, so telemetry can be sent to different observability backends.
OpenSearch
OpenSearch is an open source enterprise-grade search and observability suite. Data Prepper (license) is a server-side data collector capable of filtering, enriching, transforming, normalizing, and aggregating data for downstream analysis and visualization. OpenSearch Dashboards (license) is a flexible, fully integrated data visualization toolset for visually exploring and querying data. For more information, refer to the Deployment & Maintenance section.
Weaviate
Weaviate is an open-source vector database that simplifies the development of AI applications. Built-in vector and hybrid search, easy-to-connect machine learning models, and a focus on data privacy enable developers of all levels to build, iterate, and scale AI capabilities faster.
Celery
Celery is an open source asynchronous task queue (job queue) based on distributed message passing. It focuses on real-time operation.