Skip to main content

Investigate: Dagster Orchestration Platform for UIS

IMPLEMENTATION RULES: Before implementing this plan, read and follow:

Status: Backlog

Goal: Deploy Dagster as the data orchestration platform in UIS, providing scheduling, observability, and lineage for data pipelines across applications — starting with Atlas, with reusability for future apps.

Last Updated: 2026-04-21

Depends on: PostgreSQL (042), Traefik ingress. Authentik (070-079) is optional — Dagster UI can run without authentication on internal networks, or behind Traefik OIDC middleware when Authentik is deployed.

Request origin: Atlas (atlas-data application) — see ~/learn/helpers/atlas/docs/stack/ for the context in which the need arose. Atlas has ~24 public data sources to ingest, with mixed cadences (annual SSB tables, daily NGO scrapes, live weather feeds for future Storm Mode extension). One operator view across all ingestion jobs is a hard requirement.


Questions to Answer

  1. Which Dagster distribution — Dagster OSS (free, self-hosted) or Dagster+ (SaaS)?
  2. Which deployment mechanism — official Helm chart or roll our own manifests?
  3. How to deploy user code — the official "user-code-deployments" pattern (separate gRPC server per application) vs. co-located?
  4. Metadata storage — reuse shared UIS PostgreSQL (new database + user), or give Dagster a dedicated instance?
  5. Auth at day 1 — no auth (internal only), or behind Traefik + Authentik OIDC from the start?
  6. Run pod execution — Kubernetes executor (spawns a pod per run step) or in-process?
  7. Resource footprint — is Dagster light enough for the target UIS laptop profile, or does it require a dedicated host tier?
  8. Multi-tenant — how do we isolate Atlas's code from a future second application's code? One deployment per app, or shared deployment with code locations?

Problem Statement

UIS today has no data-pipeline orchestrator. Applications that need to ingest, transform, and catalogue data have three options — each with drawbacks:

  • Hand-rolled Kubernetes CronJobs — works for 5 jobs. Does not scale to 20+: no single pane of view, no freshness tracking, no DAG dependencies, no UI for re-runs.
  • Application-level schedulers — each app builds its own. Duplication across apps, no shared pattern, no shared observability.
  • Airbyte — commercial-style ingestion tool, but only its catalogue of pre-built connectors earns its keep; for Norwegian public-sector APIs the connectors don't exist and we'd write all Custom Connectors ourselves (evaluated and rejected for Atlas; see ~/learn/helpers/atlas/docs/stack/ conversation log).

Dagster addresses this at the platform level: one orchestrator serving many applications, each shipping its own user-code image that Dagster loads as a "code location."


What is Dagster?

Dagster is an open-source (Apache 2.0) data orchestrator built around the concept of software-defined assets. Each data product (a table, a file, a model) is an "asset" with explicit code, dependencies, schedules, freshness policies, and metadata. The Dagster UI shows every asset across every pipeline in one place.

Distribution: Apache 2.0, GitHub dagster-io/dagster, maintained by Dagster Labs.

Why it fits UIS:

  • Platform-shaped, not application-shaped — one install can host many applications as separate "code locations."
  • Language-agnostic execution — user code can be Python (native), or via Dagster Pipes any language that writes structured stdout (TypeScript, Go, Rust, shell). Atlas's TypeScript ingestion lands cleanly here.
  • dbt integration is first-class — every dbt model becomes a Dagster asset automatically with lineage intact.
  • Kubernetes-native — designed to run on K8s, spawning one pod per run step.
  • Observability out of the box — a web UI showing asset status, run history, lineage graph, schedules, freshness. Alerts via Slack / email / webhooks. Metrics as Prometheus.

Architecture

Dagster is a multi-component system. Understanding each piece is essential for the UIS deployment.

ComponentWhat it doesShapePersistence
Webserver (dagster-webserver)Serves the UI and GraphQL API. Stateless.Long-running K8s DeploymentNone
Daemon (dagster-daemon)Runs schedules, sensors, and the run queue. Triggers runs.Long-running K8s Deployment (singleton)None
User code deploymentA gRPC server hosting application-specific asset/job definitions. One per code location.Long-running K8s DeploymentNone
Metadata DBRun history, event log, schedule state, asset catalogue. The source of truth for "what happened."PostgreSQLYes — must be durable
Run podsEphemeral K8s pods spawned by the daemon to execute a specific run step.Short-lived K8s PodsNone (state in metadata DB)

In practice, UIS installs the first four once. Each application (Atlas today, possibly others tomorrow) contributes a Docker image with its user code; Dagster loads it as a code location. Run pods are spawned on-demand from user code images.

Code locations — the multi-application pattern

A code location is Dagster's unit of user-code deployment. Each is:

  • A Docker image containing the application's asset/job definitions (Python package + any supporting tools like dbt, node runtime)
  • Registered in the Dagster Helm chart's workspace section via a Kubernetes user-code deployment
  • Loaded by the webserver and daemon on startup (or on reload)

For UIS this is the clean multi-application boundary: platform owns the Dagster install; applications ship their own user-code image.


Deployment in UIS

Follow the Adding a Service guide. This section maps Dagster to the UIS service conventions.

Distribution: Dagster OSS (self-hosted)

Not Dagster+. Dagster+ is the SaaS offering ($800+/month team tier for its managed webserver and hosted metadata). UIS is a sovereign-infrastructure platform; self-hosted OSS is the correct match. All features relevant to Atlas (asset graph, schedules, freshness policies, dbt integration, alerts) are in OSS. Dagster+ adds branch deployments, usage analytics, and a managed UI — none of which outweigh the sovereignty requirement.

Deployment mechanism: official Helm chart

dagster/dagster from https://dagster-io.github.io/helm — maintained by Dagster Labs, tracks the core release cadence, includes all components. This is the default for production deployments and has been battle-tested.

helm repo add dagster https://dagster-io.github.io/helm
helm install dagster dagster/dagster -n dagster -f values.yaml

The chart deploys webserver, daemon, and optionally user-code deployments. Rolling our own manifests would duplicate the chart's ~1,500 lines without adding value.

Namespace

New namespace: dagster. Matches UIS's one-namespace-per-service-category convention (Backstage has backstage, Grafana is in observability, etc.).

PostgreSQL: reuse shared UIS PostgreSQL

Dagster needs PostgreSQL for its metadata (run history, event log, schedule state, asset catalogue). Reuse the existing UIS shared PostgreSQL — create a new database and user:

CREATE DATABASE dagster;
CREATE USER dagster_user WITH PASSWORD '...';
GRANT ALL PRIVILEGES ON DATABASE dagster TO dagster_user;

Dagster's metadata is modest (tens of MB for a typical install; low hundreds of MB for a year of run history at high volume). No reason to provision a dedicated instance.

Ingress: Traefik

Same pattern as Backstage, Grafana, etc. The Dagster webserver exposes HTTP on port 80 inside the cluster; Traefik routes dagster.sovereignsky.no (or similar) to it.

Auth strategy

Two options, both viable:

  • Day 1 — no auth. Acceptable if the Dagster UI is reachable only from the UIS internal network or via Tailscale. Lowest complexity.
  • Day 2 — Authentik OIDC via Traefik middleware. Adds a Traefik forwardAuth middleware that redirects unauthenticated requests to Authentik. Dagster OSS has no built-in auth, so this is the standard pattern.

Recommended: start with no auth (Dagster is an operator tool, not public), move to OIDC when Authentik deployment is stable.

Metrics and logging

  • Metrics: Dagster exposes Prometheus metrics on the daemon and webserver. Add a ServiceMonitor for Prometheus to scrape. Metrics include run counts, scheduler lag, asset freshness violations.
  • Logs: Structured JSON to stdout, scraped by Loki via the existing UIS log pipeline. No special config needed.

Resource footprint (estimated)

Based on Dagster's published sizing guide and typical OSS deployments:

ComponentCPU RequestCPU LimitMemory RequestMemory LimitReplicas
Webserver100m500m256 Mi512 Mi1–2
Daemon100m500m512 Mi1 Gi1 (singleton)
User-code (per code location, e.g. atlas-data)100m500m256 Mi512 Mi1
Run pods100m1000m256 Mi1 Giephemeral

Baseline (webserver + daemon + one user-code + metadata in shared PG): ~1 Gi memory, 300m CPU. Within the UIS laptop profile.

Run pods add bursty load during pipeline execution; for Atlas's expected load (ingestion runs typically finish in seconds–minutes) this is modest.


How Atlas will use it

Atlas (the first consumer) has its own repo atlas-data containing:

  • TypeScript ingestion code (ingest/src/sources/*.ts) — one file per data source
  • dbt project (dbt/) — transformations from raw landing to serving tables
  • Dagster user code (dagster/atlas_data/) — @asset definitions that invoke the ingestion via Dagster Pipes and auto-load the dbt models as downstream assets

CI builds one Docker image with all three. UIS ArgoCD picks up the image tag and reconfigures Dagster's user-code deployment to load it.

Once deployed, the Dagster UI at dagster.sovereignsky.no (or equivalent) shows:

  • ~24 Atlas assets (one per data source) with status, last run, freshness
  • ~10–20 dbt models as downstream assets with lineage back to the sources
  • Schedules for each source (annual, monthly, daily depending on the upstream)
  • Sensors for SSB release-calendar polling

This is "Atlas's data platform" as seen from inside Dagster. A future second application on UIS contributes its own user-code image and adds its own asset group without touching Atlas's.

Reference: the full data journey for one Atlas source is documented in ~/learn/helpers/atlas/docs/stack/data-journey-ssb-08764.md.


Options

Option A: Dagster OSS + official Helm chart + shared Postgres + no auth (day 1)

Pros:

  • Matches the Backstage pattern already used in UIS
  • Low operational overhead; fewest moving parts
  • Reuses UIS PostgreSQL and observability
  • Rolls forward cleanly to Authentik OIDC later

Cons:

  • No auth on day 1 — must rely on network-level access control
  • One singleton daemon is a single point of failure (manageable; restarts are seconds)

Option B: Dagster OSS + official Helm chart + dedicated Postgres

Pros:

  • Metadata isolation from other UIS workloads
  • Can tune Postgres for Dagster-specific workload (high write rate during heavy runs)

Cons:

  • Extra service to operate for marginal benefit
  • Shared PG is already production-scale; Dagster metadata is modest

Option C: Dagster+ (SaaS)

Pros:

  • Zero install
  • Managed UI, hosted metadata
  • Branch deployments for testing

Cons:

  • Per-user pricing (~$800+/month for team tier)
  • Violates UIS's sovereignty principle — metadata and orchestration state leave the sovereign environment
  • Unavailable if Dagster Labs has an outage

Option D: Roll our own manifests (no Helm chart)

Pros:

  • No dependency on chart maintainer

Cons:

  • Reimplements ~1,500 lines of Helm chart
  • Higher maintenance burden on every Dagster version bump
  • No material benefit

Recommendation

Option A — Dagster OSS + official Helm chart + shared Postgres + no auth day 1, with Traefik + Authentik OIDC middleware added as a follow-up.

This mirrors the proven Backstage pattern, reuses every existing UIS component, and stays entirely within the sovereign infrastructure.


Next Steps

Following the PLANS.md guidance on splitting investigations into ordered plans:

  • PLAN-001-dagster-deployment.md — Deploy Dagster OSS via the official Helm chart into the dagster namespace, using shared UIS PostgreSQL, with Traefik ingress and no auth. Verify webserver reachable and daemon healthy.
  • PLAN-002-dagster-code-location-atlas.md — Register the atlas-data application as the first Dagster code location. Reload from webserver UI; verify assets visible.
  • PLAN-003-dagster-auth-authentik.md (optional, after Authentik is stable) — Put Dagster UI behind Traefik OIDC middleware.
  • PLAN-004-dagster-alerts.md (optional) — Wire Dagster alerts to Slack / email via webhooks.

Before writing the PLAN files, answer Questions 1–8 above (recommendations noted inline).


Files to Modify

Scoped to UIS repo:

  • manifests/###-dagster.yaml (TBD numbering per UIS conventions; likely ###–### range for orchestration category)
  • provision-host/uis/services/<dagster>/ — Ansible playbook, defaults, verification
  • website/docs/services/<appropriate-category>/dagster.md — service documentation page
  • templates/uis.secrets/dagster-secrets.yml — PG credentials, webhook secrets
  • Cross-link from the Atlas project's atlas-data/deploy/ (future — a separate PLAN will cover code-location registration)