
INVESTIGATE: Version Pinning for Helm Charts and Container Images

IMPLEMENTATION RULES: Before implementing this plan, read and follow:

Created: 2026-02-27
Status: Backlog
Related: INVESTIGATE-system-service-version-metadata, which takes a different angle on versions (in-service display metadata rather than upstream chart/image pins). The two can be designed together: a pinned chart version and a version declared in a service script are the same fact seen from two surfaces.

Problem Statement

Everything works today, but 16 of 21 Helm charts and several container images have no version pinning. Any upstream release, intentional or accidental, can break the system without warning. A single ./uis deploy could pull a new chart version with breaking changes.


Current State

Helm Charts — Version Pinning Status

| Service | Chart | Version | Status |
|---|---|---|---|
| argocd | argo/argo-cd | 7.8.26 | PINNED |
| gravitee | graviteeio/apim | 4.8.4 | PINNED |
| authentik | authentik/authentik | 2025.8.1 | PINNED |
| prometheus | prometheus-community/prometheus | | UNPINNED |
| tempo | grafana/tempo | | UNPINNED |
| loki | grafana/loki | | UNPINNED |
| otel-collector | open-telemetry/opentelemetry-collector | | UNPINNED |
| grafana | grafana/grafana | | UNPINNED |
| postgresql | bitnami/postgresql | | UNPINNED |
| redis | bitnami/redis | | UNPINNED |
| rabbitmq | bitnami/rabbitmq | | UNPINNED |
| elasticsearch | elastic/elasticsearch | 9.3.0 | PINNED |
| qdrant | qdrant/qdrant | | UNPINNED |
| tika | tika/tika | | UNPINNED |
| open-webui | open-webui/open-webui | | UNPINNED |
| litellm | oci://ghcr.io/berriai/litellm-helm | | UNPINNED |
| spark | spark-kubernetes-operator/spark-kubernetes-operator | | UNPINNED |
| jupyterhub | jupyterhub/jupyterhub | | UNPINNED |
| pgadmin | runix/pgadmin4 | | UNPINNED |
| redisinsight | redisinsight/redisinsight | | UNPINNED |
| openmetadata | open-metadata/openmetadata | 1.12.1 | PINNED |
| mysql | (manifest, no helm) | | N/A |

Summary: 5 pinned, 16 unpinned out of 21 Helm charts (mysql is deployed from a manifest and is not counted).
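The tally can be derived from the table instead of being maintained by hand. The following is a minimal sketch, assuming the inventory is transcribed into a Python dictionary; the names `CHARTS` and `pin_summary` are hypothetical and do not exist in the repository. mysql is left out because it is deployed from a manifest rather than a chart.

```python
# Chart inventory transcribed from the table above:
# service -> (chart, pinned version or None when no version is set).
CHARTS = {
    "argocd": ("argo/argo-cd", "7.8.26"),
    "gravitee": ("graviteeio/apim", "4.8.4"),
    "authentik": ("authentik/authentik", "2025.8.1"),
    "prometheus": ("prometheus-community/prometheus", None),
    "tempo": ("grafana/tempo", None),
    "loki": ("grafana/loki", None),
    "otel-collector": ("open-telemetry/opentelemetry-collector", None),
    "grafana": ("grafana/grafana", None),
    "postgresql": ("bitnami/postgresql", None),
    "redis": ("bitnami/redis", None),
    "rabbitmq": ("bitnami/rabbitmq", None),
    "elasticsearch": ("elastic/elasticsearch", "9.3.0"),
    "qdrant": ("qdrant/qdrant", None),
    "tika": ("tika/tika", None),
    "open-webui": ("open-webui/open-webui", None),
    "litellm": ("oci://ghcr.io/berriai/litellm-helm", None),
    "spark": ("spark-kubernetes-operator/spark-kubernetes-operator", None),
    "jupyterhub": ("jupyterhub/jupyterhub", None),
    "pgadmin": ("runix/pgadmin4", None),
    "redisinsight": ("redisinsight/redisinsight", None),
    "openmetadata": ("open-metadata/openmetadata", "1.12.1"),
}


def pin_summary(charts):
    """Return (pinned, unpinned) lists of service names, each sorted."""
    pinned = sorted(name for name, (_, version) in charts.items() if version)
    unpinned = sorted(name for name, (_, version) in charts.items() if not version)
    return pinned, unpinned


if __name__ == "__main__":
    pinned, unpinned = pin_summary(CHARTS)
    print(f"{len(pinned)} pinned, {len(unpinned)} unpinned "
          f"out of {len(CHARTS)} Helm charts")
```

Running the script prints the summary line, so the totals cannot drift from the rows they describe.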

Container Images — Version Pinning Status

Images explicitly set in manifests or config files:

| Service | Image | Tag | Status |
|---|---|---|---|
| whoami | traefik/whoami | v1.10.2 | PINNED |
| mongodb | mongo | 8.0.5 | PINNED |
| rabbitmq | bitnamilegacy/rabbitmq | 3.13.7-debian-12-r5 | PINNED |
| tika | apache/tika | 3.0.0.0 | PINNED |
| elasticsearch | docker.elastic.co/elasticsearch/elasticsearch | 9.3.0 | PINNED |
| openmetadata | docker.getcollate.io/openmetadata/server | 1.12.1 | PINNED |
| redis | redis | 7.4 | FLOATING (minor) |
| mysql | mysql | 8.0 | FLOATING (minor) |
| postgresql | ghcr.io/terchris/urbalurba-postgresql | latest | UNPINNED |
| unity-catalog | unitycatalog/unitycatalog | latest | UNPINNED |
| cloudflare-tunnel | cloudflare/cloudflared | latest | UNPINNED |
| pgadmin init | busybox | latest | UNPINNED |
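The three statuses in the table follow a simple rule that could be automated. The sketch below is one possible reading of that rule, not an existing tool: a tag with three or more numeric components counts as PINNED, a major.minor tag as FLOATING (minor), and `latest` or a missing tag as UNPINNED. The function name `classify_tag` is hypothetical, and the FLOATING (major) status does not appear in the table; it is added only so that a tag such as `8` has a defined result.

```python
def classify_tag(tag):
    """Classify an image tag using the statuses from the table above."""
    if not tag or tag == "latest":
        return "UNPINNED"
    # Drop an optional leading "v" (v1.10.2) and any build suffix
    # after the first hyphen (3.13.7-debian-12-r5 -> 3.13.7).
    core = tag[1:] if tag.startswith("v") else tag
    core = core.split("-", 1)[0]
    parts = core.split(".")
    if not all(part.isdigit() for part in parts):
        # Non-numeric tags such as "stable" can move, so treat them as unpinned.
        return "UNPINNED"
    if len(parts) >= 3:
        return "PINNED"
    if len(parts) == 2:
        return "FLOATING (minor)"
    # Assumption: a major-only tag is not in the table; labelled separately.
    return "FLOATING (major)"


if __name__ == "__main__":
    for tag in ["v1.10.2", "3.13.7-debian-12-r5", "7.4", "latest"]:
        print(f"{tag}: {classify_tag(tag)}")
```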

Images controlled by Helm chart (not explicitly set in our config — chart decides):

  • prometheus, grafana, tempo, loki, otel-collector, qdrant, open-webui, litellm, spark, jupyterhub, pgadmin, redisinsight, authentik, argocd

Questions to Investigate

Q1: What is the right pinning strategy?

Options:

  • Pin everything — maximum stability, requires manual updates
  • Pin Helm charts only — charts control image versions, so pinning charts is sufficient
  • Pin charts + explicit images — pin what we control, let pinned charts manage their own images

Q2: Where should versions live?

Options:

  • In each playbook: chart_version parameter in Ansible helm tasks (current pattern for argocd/gravitee/authentik)
  • In a central versions file — single file listing all versions, sourced by playbooks
  • In config manifests — alongside other service config in manifests/*-config.yaml
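To make the central-file option concrete, here is a minimal sketch, assuming a hypothetical `service=version` text format and hypothetical helper names (`parse_versions`, `helm_args`). It also assumes the Helm release name equals the service name. The versions shown are the three existing pins from the chart table. The point of the sketch is the failure mode: a service with no entry raises an error instead of silently installing the newest chart.

```python
# Hypothetical central versions file content ("service=version" per line).
EXAMPLE_VERSIONS = """
# Helm chart versions
argocd=7.8.26
gravitee=4.8.4
authentik=2025.8.1
"""


def parse_versions(text):
    """Parse 'service=version' lines; blank lines and '#' comments are ignored."""
    versions = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("=")
        if not sep or not name.strip() or not version.strip():
            raise ValueError(f"malformed line: {raw!r}")
        versions[name.strip()] = version.strip()
    return versions


def helm_args(service, chart, versions):
    """Build a helm command line; refuse to build one without a pinned version."""
    if service not in versions:
        raise KeyError(f"no pinned version for {service}")
    return ["helm", "upgrade", "--install", service, chart,
            "--version", versions[service]]


if __name__ == "__main__":
    versions = parse_versions(EXAMPLE_VERSIONS)
    print(" ".join(helm_args("argocd", "argo/argo-cd", versions)))
```

The same lookup could feed the chart_version parameter of an Ansible task instead of a command line; the file format and the fail-fast behaviour are the parts that matter for the decision.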

Q3: How do we handle updates?

Options:

  • Manual — developer checks for updates periodically, updates versions, tests
  • Automated detection — script/CI that checks for newer versions and reports
  • Dependabot/Renovate — GitHub-native dependency update PRs
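The automated-detection option comes down to comparing each pin against the versions a repository offers. The sketch below shows only that comparison, under stated assumptions: the function names are hypothetical, the list of available versions is supplied by the caller (for example, collected from `helm search repo <chart> --versions`), and the sample versions in the usage block are invented for illustration. Versions that are not purely numeric, such as release candidates, are skipped rather than compared.

```python
def parse_version(text):
    """Turn '7.8.26' into (7, 8, 26); return None if any part is non-numeric."""
    core = text[1:] if text.startswith("v") else text
    parts = core.split(".")
    if not all(part.isdigit() for part in parts):
        return None
    return tuple(int(part) for part in parts)


def newer_versions(pinned, available):
    """Return the versions in `available` that are newer than `pinned`, ascending."""
    current = parse_version(pinned)
    if current is None:
        raise ValueError(f"cannot parse pinned version: {pinned!r}")
    candidates = []
    for version in available:
        parsed = parse_version(version)
        # Numeric comparison, so 7.10.1 sorts after 7.9.0.
        if parsed is not None and parsed > current:
            candidates.append((parsed, version))
    return [version for _, version in sorted(candidates)]


if __name__ == "__main__":
    # Illustrative data only; these are not real upstream releases.
    available = ["7.8.26", "7.9.0", "7.8.3", "8.0.0-rc1", "7.10.1"]
    print(newer_versions("7.8.26", available))
```

A report built this way only detects updates; deciding whether to adopt one still requires the manual test step from the first option.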

Q4: Helm repos — RESOLVED

05-install-helm-repos.yml was the original approach. The current pattern is that each playbook manages its own helm repo. The 2 repos still in 05-install-helm-repos.yml (bitnami, runix) are legacy — they should move into the playbooks that use them. No further investigation needed.

Q5: Bitnami subscription changes

Bitnami changed its distribution model in August 2025. RabbitMQ already uses the bitnamilegacy image. Are the other Bitnami charts (postgresql, redis) affected? Will future updates break them?


Helm Repos Inventory

| Repository | URL | Where Added |
|---|---|---|
| bitnami | https://charts.bitnami.com/bitnami | 05-install-helm-repos.yml |
| runix | https://helm.runix.net | 05-install-helm-repos.yml |
| graviteeio | https://helm.gravitee.io | 090-setup-gravitee.yml |
| prometheus-community | https://prometheus-community.github.io/helm-charts | 030-setup-prometheus.yml |
| grafana | https://grafana.github.io/helm-charts | multiple playbooks |
| open-telemetry | https://open-telemetry.github.io/opentelemetry-helm-charts | 033-setup-otel-collector.yml |
| argo | https://argoproj.github.io/argo-helm | 220-setup-argocd.yml |
| elastic | https://helm.elastic.co | 060-setup-elasticsearch.yml |
| qdrant | https://qdrant.github.io/qdrant-helm | 044-setup-qdrant.yml |
| open-webui | https://helm.openwebui.com/ | 200-setup-open-webui.yml |
| jupyterhub | https://hub.jupyter.org/helm-chart/ | 350-setup-jupyterhub.yml |
| authentik | https://charts.goauthentik.io | 070-setup-authentik.yml |
| redisinsight | https://mrnim94.github.io/redisinsight/ | 651-adm-redisinsight.yml |
| spark-kubernetes-operator | https://apache.github.io/spark-kubernetes-operator | 330-setup-spark.yml |
| open-metadata | https://open-metadata.github.io/openmetadata-helm-charts/ | 340-setup-openmetadata.yml |

Risk Assessment

High risk (unpinned chart + critical service):

  • postgresql (all data services depend on it)
  • redis (authentik depends on it)
  • elasticsearch (critical service; its chart is already listed as pinned at 9.3.0 in the Helm chart table)

Medium risk (unpinned chart + important service):

  • grafana, prometheus, loki, tempo, otel-collector (observability stack)
  • open-webui, litellm (AI stack)
  • jupyterhub, spark (data science stack)

Lower risk (unpinned chart + admin/utility):

  • pgadmin, redisinsight, qdrant, tika

:latest images (highest breakage risk):

  • postgresql (custom image — we control this)
  • unity-catalog
  • cloudflare-tunnel
  • busybox (pgadmin init container)

Next Step

Investigate the questions above, then create a PLAN with a phased approach to pin versions across all services.