Version: 3.1

5.1 Installation

Kubernetes is the target platform for running Data Context Hub. This guide assumes that a cluster exists and can be accessed for deployments. Data Context Hub ships with all the components and services it needs to run entirely in Kubernetes, without the need to set up external services. However, external services such as PostgreSQL and Neo4j can be used instead, which is recommended in production environments. More information on how to configure Data Context Hub to run with external services can be found in "Using External Services" below.

tip

We recommend the adoption of ArgoCD as a continuous delivery solution for managing Kubernetes applications.

Transfer Docker Images

Docker images are available on our registry and can be pulled from there. However, we recommend that you transfer the images to your own registry as we do not guarantee high availability of our registry.

tip

You can get all available images with this command:

curl --header "PRIVATE-TOKEN: <token>" "https://gitlab.c64.ai/api/v4/projects/4/registry/repositories" | jq
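
Transferring an image usually comes down to a pull, a retag, and a push. The helper below is a minimal sketch of that sequence: it only prints the docker commands so that they can be reviewed first. The function name and both arguments are hypothetical placeholders; take the real source image paths from the registry listing and substitute your own registry as the target.

```shell
# Print the docker commands needed to copy one image to another registry.
# $1 = full source image reference, as reported by the source registry
# $2 = target registry (and optional path) in your own infrastructure
mirror_image_commands() (
  src_image="$1"
  target_registry="$2"
  # Keep only the final "<image>:<tag>" component of the source reference.
  name_and_tag="${src_image##*/}"
  target_image="${target_registry}/${name_and_tag}"
  echo "docker pull ${src_image}"
  echo "docker tag ${src_image} ${target_image}"
  echo "docker push ${target_image}"
)
```

After logging in to both registries with docker login, the printed commands can be reviewed and then piped to sh to execute them.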

Install Helm

We provide a Helm chart repository for deploying Data Context Hub on Kubernetes. Its stable channel contains the stable release packages of Data Context Hub. Use the following commands to add the Data Context Hub Helm repository and refresh the local index:

helm repo add dch-stable https://gitlab.c64.ai/api/v4/projects/4/packages/helm/stable --username __token__ --password <token>
helm repo update

You can now deploy using helm install. Before doing so, make sure to read the next section on how to configure Data Context Hub.

helm install datacontexthub dch-stable/explore --namespace <namespace>

Configuring Data Context Hub

Before deploying a Data Context Hub instance, read the following sections to identify any required configuration.

note

It is highly recommended to create your own values.yaml file.
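
One possible workflow for maintaining your own values file is sketched below: export the chart defaults for reference, then pass a file with your overrides to helm install. The function names and the file name default-values.yaml are illustrative assumptions; the release name, chart name, and repository alias are the ones used earlier in this guide.

```shell
# Write the chart's default values to a file for reference.
export_default_values() {
  helm show values dch-stable/explore > default-values.yaml
}

# Install the release using your own values file.
# $1 = namespace, $2 = path to your values file
install_with_values() {
  helm install datacontexthub dch-stable/explore --namespace "$1" --values "$2"
}
```

Keeping only the keys you actually change in your own file makes later chart upgrades easier to review, because every unlisted key falls back to the chart default.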

Development Mode

The Development Mode is intended for use in non-production environments, such as testing or development setups. When activated, certificate validation is disabled.

warning

If certificate validation is deactivated, this is done at the user's own risk. In this case, Context64 GmbH assumes no warranty or liability for risks that may arise from insecure or expired certificates, to the extent permitted by law.

To enable Development Mode, set the "mode" value in the values.yaml to "development".
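
As a minimal sketch, the setting can be kept in a small separate overrides file so that it never ends up in a production values file by accident. The helper name and the file name are hypothetical, and the snippet assumes that "mode" is a top-level key in the chart's values; check the chart's values.yaml to confirm.

```shell
# Write a values overlay that switches on Development Mode.
# $1 = path of the overlay file to create
write_dev_values() {
  # Assumption: "mode" is a top-level key in values.yaml.
  printf 'mode: development\n' > "$1"
}
```

The overlay can then be appended to the install command, for example with --values values.yaml --values dev-values.yaml. When several values files are given, Helm gives precedence to the last one.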

Versions of 3rd Party Services

The values.yaml lists the versions of the third-party services that Data Context Hub was tested with. While we do not expect any problems with different minor or patch versions, it is highly recommended to use the provided versions.

Storage

Data Context Hub requires a set of volumes for persistence and data exchange between services in the system. Volumes are used by the following services:

  • Airflow
  • Postgres
  • Neo4j
  • Weaviate
  • OpenSearch

By default, storage is configured to create Persistent Volume Claims (PVCs) automatically. Configure the appropriate storage class and volume size for each service by setting <service>.persistence.storageClassName and <service>.persistence.size in values.yaml. Because OpenSearch quickly takes up a lot of space, its volumes are optional and deactivated by default. To enable persistence for such a service, set <service>.persistence.enabled: true.
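
The dotted keys above map to nested YAML in values.yaml. The helper below is a sketch that prints such a fragment for one service. The function name is hypothetical, and the service key, storage class, and size passed to it are illustrative; use the service keys that actually appear in the chart's values.yaml and a storage class that exists in your cluster.

```shell
# Print a values.yaml fragment with the persistence settings of one service.
# $1 = service key, $2 = storage class name, $3 = volume size
persistence_values() {
  printf '%s:\n  persistence:\n    enabled: true\n    storageClassName: %s\n    size: %s\n' \
    "$1" "$2" "$3"
}
```

For example, persistence_values postgres standard 20Gi prints a postgres block that can be merged into your values.yaml. The storage class "standard" and the size "20Gi" are placeholders, not recommendations.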

Optionally, you can use local persistent volumes, which are written to the file system. To do so, set storage.create_local_storage_pv: true and <service>.persistence.storageClassName: local-storage, and adapt the pv_path_* variables. Remember to manually create all folders referenced by the pv_path_* variables, with sufficient permissions, before starting an instance.
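
Creating the folders by hand can be scripted. The following sketch creates each directory and applies a permission mode that you pass in. The function name is hypothetical, and the guide does not state which mode or owner is required, so both are left to the operator.

```shell
# Create the directories used by the pv_path_* variables.
# $1 = permission mode to apply (your choice; not specified by this guide)
# remaining arguments = the directory paths configured in pv_path_*
prepare_pv_dirs() (
  mode="$1"
  shift
  for dir in "$@"; do
    mkdir -p "$dir" && chmod "$mode" "$dir"
  done
)
```

Run the helper on the host that backs the local volumes, passing exactly the paths configured in your values.yaml.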

SMTP Configuration in Airflow

Airflow can send emails when tasks are retried or fail. To enable this, set the following variables in your values.yaml:

  • airflow__smtp__smtp_host
  • airflow__smtp__smtp_mail_from
  • airflow__smtp__smtp_port

info

Additionally, username and password of the SMTP user have to be configured in Airflow's UI:

  • Create a new connection called "smtp_default"
  • Fill in Login (= username) and Password and save. It doesn't matter which Connection Type is chosen because only credentials are used.

Using External Services

Please check values.yaml to see which versions of the following services Data Context Hub was tested with.

PostgreSQL

The system can optionally work with a previously provisioned PostgreSQL instance (e.g. on a bare metal server or AWS Aurora). In this case, postgres.enabled has to be set to false and the following configuration variables have to be adapted:

  • postgres.host
  • postgres.port
  • postgres.user
  • postgres.password
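
In values.yaml, these settings could look like the fragment printed by the sketch below. The function name is hypothetical and all argument values are placeholders. The nesting under a postgres key follows from the dotted variable names above.

```shell
# Print a values.yaml fragment that points the chart at an external PostgreSQL.
# $1 = host, $2 = port, $3 = user, $4 = password
external_postgres_values() {
  printf 'postgres:\n  enabled: false\n  host: %s\n  port: %s\n  user: %s\n  password: "%s"\n' \
    "$1" "$2" "$3" "$4"
}
```

The password is wrapped in double quotes so that most special characters survive YAML parsing. A password that itself contains a double quote or a backslash would need additional escaping.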

Depending on which database user is provided to Data Context Hub, several required databases are created automatically by the system. Otherwise, all databases have to be present before starting Data Context Hub for the first time. The following table lists all databases required by Data Context Hub as well as the corresponding configuration variables in values.yaml.

Database    Configuration                Notes
airflow     airflow.database
keycloak    keycloak.database
dch_cont    environment.cont_database    Created automatically
dch_repo    environment.repo_database    Created automatically
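
If databases have to be created by hand before the first start, the statements can be generated as in the sketch below. The function name is hypothetical; the database names passed to it should match the values configured in values.yaml.

```shell
# Print one CREATE DATABASE statement per database name given as an argument.
create_database_sql() {
  for db in "$@"; do
    printf 'CREATE DATABASE "%s";\n' "$db"
  done
}
```

The output can be piped to psql, for example create_database_sql airflow keycloak | psql -h <host> -p <port> -U <user> -d postgres. Add dch_cont and dch_repo to the list if the configured user is not able to create them automatically.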

Neo4j

Neo4j is not included in the system and must be installed separately.
To integrate Neo4j, you need to set up and manage your own Neo4j or Neo4j AuraDB instance.
A sample configuration for this integration is available in the Helm chart.