
PLAN-008: Service Migration Status & Remaining Work

IMPLEMENTATION RULES: Before implementing this plan, read and follow:

Status: Backlog

Goal: Track migration status of all 26 UIS services and complete remaining work for services that are not fully migrated.

Last Updated: 2026-02-27 (password architecture fixed: orphaned defaults connected, email validation added)

Priority: Medium — core services work, remaining items are edge cases

Blocks: PLAN-004-secrets-cleanup — old path references in playbooks must be fixed before backwards compatibility code can be removed


Service Migration Status

All 26 services have service scripts (provision-host/uis/services/*/service-*.sh) and deploy playbooks. The tables below track full migration status per category, including remove playbooks, verified deployment, and legacy path dependencies.

Legend

  • Service script: provision-host/uis/services/<category>/service-<id>.sh — metadata for ./uis list, ./uis deploy, etc.
  • Deploy playbook: Ansible playbook for ./uis deploy <service>
  • Remove playbook: Ansible playbook for ./uis undeploy <service>
  • Verified: Service has been deployed and tested in the new UIS system
  • Old paths: ⚠️ = playbook still references topsecret/, secrets/, or cloud-init/
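As a rough illustration of the legend, a service script could carry metadata along the following lines. This is only a sketch: SCRIPT_REMOVE_PLAYBOOK and SCRIPT_REQUIRES are the variable names this plan mentions, while SCRIPT_ID, SCRIPT_PLAYBOOK, the core/ category directory, and the has_remove_playbook helper are assumptions made for the example and may not match the real scripts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of provision-host/uis/services/core/service-nginx.sh.
# SCRIPT_ID and SCRIPT_PLAYBOOK are assumed names; only SCRIPT_REMOVE_PLAYBOOK
# and SCRIPT_REQUIRES appear in this plan.
SCRIPT_ID="nginx"
SCRIPT_PLAYBOOK="020-setup-nginx.yml"           # used by ./uis deploy nginx
SCRIPT_REMOVE_PLAYBOOK="020-remove-nginx.yml"   # used by ./uis undeploy nginx
SCRIPT_REQUIRES=""                              # space-separated service ids

# Hypothetical helper: succeeds only when a remove playbook is configured.
has_remove_playbook() {
  [ -n "${SCRIPT_REMOVE_PLAYBOOK:-}" ]
}
```

A check like has_remove_playbook would make a missing remove playbook (the gravitee case) detectable from metadata alone.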

Core (000-029)

| Service | Service Script | Deploy | Remove | Verified | Notes |
|---|---|---|---|---|---|
| whoami | | 025-setup-whoami-testpod.yml | ✅ via -e operation=delete | | Same playbook handles both deploy and remove via operation parameter |
| nginx | | 020-setup-nginx.yml | 020-remove-nginx.yml | | Verified in talk9.md |
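The whoami row relies on a single playbook switched by an extra variable. A minimal sketch of how the two invocations might differ is shown here; the -e operation=delete flag and the playbook name come from the table, but the whoami_playbook_cmd helper is hypothetical and the way ./uis actually builds the command is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ansible-playbook command for whoami.
# Only "-e operation=delete" and the playbook file name are taken from the plan.
whoami_playbook_cmd() {
  local action="$1"
  local playbook="ansible/playbooks/025-setup-whoami-testpod.yml"
  case "$action" in
    deploy) echo "ansible-playbook $playbook" ;;
    remove) echo "ansible-playbook $playbook -e operation=delete" ;;
    *)      echo "unknown action: $action" >&2; return 1 ;;
  esac
}
```

Printing the command rather than running it keeps the sketch safe to try outside the provision host.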

Monitoring (030-039)

| Service | Service Script | Deploy | Remove | Verified | Notes |
|---|---|---|---|---|---|
| prometheus | | 030-setup-prometheus.yml | 030-remove-prometheus.yml | | Verified in talk9.md |
| tempo | | 031-setup-tempo.yml | 031-remove-tempo.yml | | Verified in talk9.md |
| loki | | 032-setup-loki.yml | 032-remove-loki.yml | | Verified in talk9.md |
| otel-collector | | 033-setup-otel-collector.yml | 033-remove-otel-collector.yml | | Verified in talk9.md |
| grafana | | 034-setup-grafana.yml | 034-remove-grafana.yml | | Verified in talk9.md |

Databases (040-059)

| Service | Service Script | Deploy | Remove | Verified | Notes |
|---|---|---|---|---|---|
| postgresql | | 040-database-postgresql.yml | 040-remove-database-postgresql.yml | | Required by authentik |
| mysql | | 040-database-mysql.yml | 040-remove-database-mysql.yml | | Verified in talk9.md |
| mongodb | | 040-setup-mongodb.yml | 040-remove-database-mongodb.yml | | Verified in talk9.md |
| qdrant | | 044-setup-qdrant.yml | 044-remove-qdrant.yml | | Verified in talk9.md |
| redis | | 050-setup-redis.yml | 050-remove-redis.yml | | Required by authentik |

Search (060-069)

| Service | Service Script | Deploy | Remove | Verified | Notes |
|---|---|---|---|---|---|
| elasticsearch | | 060-setup-elasticsearch.yml | 060-remove-elasticsearch.yml | | Verified in talk9.md |

Authentication (070-079)

| Service | Service Script | Deploy | Remove | Verified | Notes |
|---|---|---|---|---|---|
| authentik | | 070-setup-authentik.yml | 070-remove-authentik.yml | | Fully tested with 5 E2E auth tests (PLAN-007) |

Queues (080-089)

| Service | Service Script | Deploy | Remove | Verified | Notes |
|---|---|---|---|---|---|
| rabbitmq | | 080-setup-rabbitmq.yml | 080-remove-rabbitmq.yml | | Verified in talk9.md |

Management (090, 220+)

| Service | Service Script | Deploy | Remove | Verified | Notes |
|---|---|---|---|---|---|
| gravitee | | 090-setup-gravitee.yml | ❌ Missing | | Was not working before migration. Needs new setup; deploy playbook may need rewrite |
| argocd | | 220-setup-argocd.yml | 220-remove-argocd.yml | | Verified in talk9.md |
| pgadmin | | 641-adm-pgadmin.yml | 641-remove-pgadmin.yml | | Verified in talk10.md. Auto-login TODO (pgpass works but pgAdmin ignores it) |
| redisinsight | | 651-adm-redisinsight.yml | 651-remove-redisinsight.yml | | Verified in talk10.md |

AI (200-219)

| Service | Service Script | Deploy | Remove | Verified | Notes |
|---|---|---|---|---|---|
| openwebui | | 200-setup-open-webui.yml | 200-remove-open-webui.yml | | Verified in talk9.md |
| litellm | | 210-setup-litellm.yml | 210-remove-litellm.yml | | Verified in talk9.md |

Data Science (320-350)

| Service | Service Script | Deploy | Remove | Verified | Notes |
|---|---|---|---|---|---|
| unity-catalog | | 320-setup-unity-catalog.yml | 320-remove-unity-catalog.yml | | Verified in talk9.md. Fixed: wrong image, security context, API version, no curl |
| spark | | 330-setup-spark.yml | 330-remove-spark.yml | | Verified in talk9.md |
| jupyterhub | | 350-setup-jupyterhub.yml | 350-remove-jupyterhub.yml | | Verified in talk9.md |

Network (800+)

| Service | Service Script | Deploy | Remove | Verified | Notes |
|---|---|---|---|---|---|
| tailscale-tunnel | | 802-deploy-network-tailscale-tunnel.yml | 801-remove-network-tailscale-tunnel.yml | | Fully verified in PLAN-009/010/011. CLI: uis tailscale expose/unexpose/verify |
| cloudflare-tunnel | | 820-deploy-network-cloudflare-tunnel.yml | 821-remove-network-cloudflare-tunnel.yml | | Fully verified: deploy, undeploy, E2E connectivity (PLAN-cloudflare-tunnel-undeploy) |

Summary

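| Category | Total | Verified | Issues |
|---|---|---|---|
| Core | 2 | 2 | None |
| Monitoring | 5 | 5 | None |
| Databases | 5 | 5 | None |
| Search | 1 | 1 | None |
| Authentication | 1 | 1 | None |
| Queues | 1 | 1 | None |
| Management | 4 | 3 | gravitee broken before migration |
| AI | 2 | 2 | None |
| Data Science | 3 | 3 | None |
| Network | 2 | 2 | None |
| Total | 26 | 25 | 1 not verified (gravitee broken before migration) |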

Automated Integration Test (PLAN-013)

./uis test-all automates deploy/undeploy for all 23 testable services (47 operations). First run: 47/47 PASS in 38m 40s. Also supports --dry-run and --clean.

Playbooks with Old Path References (2026-02-18 scan)

Scanned all playbooks in ansible/playbooks/ for references to topsecret/, secrets/, and cloud-init/:

| Playbook | Line | Reference | Impact | Fixed |
|---|---|---|---|---|
| 01-configure_provision-host.yml | 30 | ansible/secrets/id_rsa_ansible.secret-key | Hardcoded old SSH key path | ✅ PR #35 |
| 350-setup-jupyterhub.yml | 65 | topsecret/kubernetes/kubernetes-secrets.yml | Breaks if topsecret/ removed | ✅ PR #35 |
| 802-deploy-network-tailscale-tunnel.yml | 193-194 | topsecret/kubernetes/kubernetes-secrets.yml | Error message text only | ✅ PR #35 |

All old path references in playbooks are now fixed. Also fixed: ansible/ansible.cfg and provision-host/provision-host-vm-create.sh (PR #35).
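A scan like the one described in this section could be repeated with grep. The sketch below is one possible approach, not the command that was actually used; the scan_old_paths name is hypothetical, and the pattern deliberately skips secrets/ when it is preceded by a dot so that the new .uis.secrets/ path is not reported as a legacy reference.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list lines that still reference legacy paths.
# Prints file:line:content for each hit; exits non-zero when nothing matches.
scan_old_paths() {
  local dir="${1:-ansible/playbooks}"
  # topsecret/ and cloud-init/ match anywhere. secrets/ matches only at the
  # start of a line or after a character that is not ".", alphanumeric, "_"
  # or "-", so ".uis.secrets/" is ignored.
  grep -rnE '(topsecret|cloud-init)/|(^|[^.[:alnum:]_-])secrets/' "$dir"
}
```

Running scan_old_paths with no argument checks ansible/playbooks; an empty result with a non-zero exit status would indicate that no legacy references remain there.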

Topsecret Cleanup Beyond Playbooks (talk11)

In addition to the playbook fixes above (PR #35), a full topsecret cleanup was completed across the codebase:

| Category | Files Changed | What Changed |
|---|---|---|
| Legacy scripts | 7 | Removed topsecret/ and secrets/ fallback paths from networking scripts and provision-host container creation |
| Config files | 3 | .github/workflows/build-uis-container.yml, .dockerignore, manifests/220-litellm-config.yaml |
| Documentation | 17 | All docs updated from topsecret/kubernetes/kubernetes-secrets.yml to .uis.secrets/generated/kubernetes/kubernetes-secrets.yml |
| Deleted obsolete scripts | 2 | install-rancher.sh and copy2provisionhost.sh, fully replaced by ./uis CLI |
| Docs updated for script removal | 15+ | All references to deleted scripts replaced with ./uis start, ./uis provision, ./uis shell |

Remaining topsecret references: Only in 7 remote deployment target scripts (provision-host/provision-host-vm-create.sh, hosts/azure-aks/ scripts). These are deferred until remote deployment targets are tested with real infrastructure.

New Reference Documentation (talk11)

  • Created website/docs/reference/factory-reset.md — user-facing runbook for factory reset, recovery, service deployment order, and verification checklist. Extracted from INVESTIGATE-rancher-reset findings.

Completed Investigations Closed (talk11)

  • INVESTIGATE-rancher-reset-and-full-verification.md → moved to completed/
  • INVESTIGATE-unity-catalog-crashloop.md → moved to completed/

Password Architecture Fix (PR #44)

Fixed default-secrets.env single-source-of-truth pattern — 8 of 11 DEFAULT_ variables were orphaned (never applied to templates). See PLAN-fix-password-architecture.

| What | Change |
|---|---|
| Removed redundant variables | 4 removed from default-secrets.env (DEFAULT_DATABASE_ROOT_PASSWORD, DEFAULT_POSTGRES_PASSWORD, DEFAULT_MONGODB_ROOT_PASSWORD, DEFAULT_AUTHENTIK_BOOTSTRAP_EMAIL) |
| Connected orphaned defaults | Extended sed replacements from 5→8 in first-run.sh |
| Removed hardcoded credentials | Replaced in 00-common-values.env.template and 00-master-secrets.yml.template |
| Email consolidation | Removed 2 orphaned email variables, kept single DEFAULT_ADMIN_EMAIL |
| Validation | Extended from 3→7 variables, added email format check, weak-password detection |
| Self-healing init | Fixed bug where fresh .uis.secrets/ didn't get templates when .uis.extend/ already existed |

Tested: postgresql, redis, pgadmin, authentik, openwebui — all deploy/undeploy clean with correct credentials.
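To make the validation item more concrete, here is a minimal sketch of what an email format check and a weak-password check might look like in shell. The function names, the 12-character minimum, and the list of weak prefixes are all assumptions for illustration; the actual rules added in PR #44 are not described in this plan and may differ.

```shell
#!/usr/bin/env bash
# Hypothetical check: exactly one "@", no whitespace, and a dot in the domain.
is_valid_email() {
  case "$1" in
    *[[:space:]]*|*@*@*) return 1 ;;
    ?*@?*.?*)            return 0 ;;
    *)                   return 1 ;;
  esac
}

# Hypothetical check: succeeds (exit 0) when the password looks weak.
# The 12-character minimum and the prefix list are assumed values.
is_weak_password() {
  local pw="$1" lower
  lower=$(printf '%s' "$pw" | tr '[:upper:]' '[:lower:]')
  if [ "${#pw}" -lt 12 ]; then
    return 0
  fi
  case "$lower" in
    password*|changeme*|admin*|secret*) return 0 ;;
  esac
  return 1
}
```

For example, is_valid_email "$DEFAULT_ADMIN_EMAIL" could gate first-run setup before any templates are rendered.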


Phase 1: Quick Fixes (metadata)

Tasks

  • 1.1 Fix ArgoCD: set SCRIPT_REMOVE_PLAYBOOK="220-remove-argocd.yml" in service-argocd.sh ✓ (PLAN-argocd-migration)

Phase 2: Fix Old Path References in Playbooks

These must be fixed before PLAN-004-secrets-cleanup can remove backwards compatibility.

Tasks

  • 2.1 Fix 350-setup-jupyterhub.yml line 65: replace topsecret/kubernetes/kubernetes-secrets.yml with new .uis.secrets/ path ✓ (PR #35)
  • 2.2 Fix 01-configure_provision-host.yml line 30: replace ansible/secrets/id_rsa_ansible.secret-key with new path ✓ (PR #35)
  • 2.3 Fix 802-deploy-network-tailscale-tunnel.yml lines 193-194: update error message text to reference .uis.secrets/ ✓ (PR #35)
  • 2.4 Fix ansible/ansible.cfg: update private_key_file to new .uis.secrets/ path ✓ (PR #35)
  • 2.5 Fix provision-host/provision-host-vm-create.sh: update SSH key copy destination, remove legacy fallback ✓ (PR #35)

Phase 3: Missing Remove Playbooks

Tasks

  • 3.1 Create 801-remove-network-tailscale-tunnel.yml — tear down Tailscale tunnel deployment and namespace ✓ (PLAN-009)
  • 3.2 Create 821-remove-network-cloudflare-tunnel.yml — tear down Cloudflare tunnel deployment ✓ (PLAN-cloudflare-tunnel-undeploy, PR #43)
  • 3.3 Update service-tailscale-tunnel.sh with SCRIPT_REMOVE_PLAYBOOK ✓ (PLAN-009)
  • 3.4 service-cloudflare-tunnel.sh already had SCRIPT_REMOVE_PLAYBOOK set ✓

Phase 4: Gravitee (New Setup)

Gravitee was not working before the migration. This is effectively a fresh setup, not a migration.

Tasks

  • 4.1 Investigate current state of 090-setup-gravitee.yml — does it work at all?
  • 4.2 If broken, rewrite the deploy playbook or disable the service
  • 4.3 Create 090-remove-gravitee.yml
  • 4.4 Test deploy and remove cycle

Phase 5: Deployment Verification — COMPLETE

23/26 services verified. All deploy and undeploy cleanly.

Automated testing (PLAN-013): ./uis test-all runs 47 operations (deploy + undeploy + verify) for all 23 services. First full automated run: 47/47 PASS in 38m 40s (2026-02-26).

Service dependency fixes during PLAN-013:

  • service-otel-collector.sh: Added SCRIPT_REQUIRES="prometheus loki tempo" (E2E needs backends)
  • service-grafana.sh: Added SCRIPT_REQUIRES="prometheus loki tempo otel-collector" (E2E sends data via OTEL)
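The effect of these SCRIPT_REQUIRES values can be sketched as a depth-first walk that emits each dependency before the service that needs it. The two dependency lists below are the ones quoted above; the requires_of and deploy_order functions and the VISITED variable are hypothetical, and how ./uis actually resolves dependencies is not shown in this plan.

```shell
#!/usr/bin/env bash
# Dependency lists taken from the SCRIPT_REQUIRES values quoted in this plan.
requires_of() {
  case "$1" in
    otel-collector) echo "prometheus loki tempo" ;;
    grafana)        echo "prometheus loki tempo otel-collector" ;;
    *)              echo "" ;;
  esac
}

# Hypothetical resolver: print services in deploy order, dependencies first.
# VISITED is a space-separated list that prevents duplicates and cycles.
VISITED=""
deploy_order() {
  local svc="$1" dep
  case " $VISITED " in
    *" $svc "*) return 0 ;;   # already handled
  esac
  VISITED="$VISITED $svc"
  for dep in $(requires_of "$svc"); do
    deploy_order "$dep"
  done
  echo "$svc"
}
```

With these lists, deploy_order grafana prints prometheus, loki, tempo, otel-collector, and grafana, one per line, in that order.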

Tasks

  • 5.1 Verify monitoring stack: prometheus, grafana, loki, tempo, otel-collector ✓ (talk9.md)
  • 5.2 Verify databases: mysql, mongodb, qdrant ✓ (talk9.md)
  • 5.3 Verify AI stack: openwebui, litellm ✓ (talk9.md)
  • 5.4 Verify data science stack: jupyterhub, spark, unity-catalog ✓ (talk9.md)
  • 5.5 Verify other: nginx, elasticsearch, rabbitmq ✓ (talk9.md)
  • 5.6 Verify management: pgadmin, redisinsight ✓ (talk10.md)
  • 5.7 Verify tailscale-tunnel ✓ (PLAN-009/010/011 — 12+ rounds of testing)
  • 5.8 Verify cloudflare-tunnel ✓ (PLAN-cloudflare-tunnel-undeploy — deploy, undeploy, E2E connectivity all passed)

Skipped: gravitee (broken before migration).


Files to Modify

| File | Change |
|---|---|
| provision-host/uis/services/management/service-argocd.sh | ✅ Done: Add SCRIPT_REMOVE_PLAYBOOK |
| provision-host/uis/services/network/service-tailscale-tunnel.sh | ✅ Done: SCRIPT_REMOVE_PLAYBOOK added (PLAN-009) |
| provision-host/uis/services/network/service-cloudflare-tunnel.sh | ✅ Done: Already had SCRIPT_REMOVE_PLAYBOOK |
| ansible/playbooks/350-setup-jupyterhub.yml | ✅ Done: Replace hardcoded topsecret/ path (PR #35) |
| ansible/playbooks/01-configure_provision-host.yml | ✅ Done: Replace hardcoded secrets/ SSH key path (PR #35) |
| ansible/playbooks/802-deploy-network-tailscale-tunnel.yml | ✅ Done: Update error message text (PR #35) |

Files to Create

| File | Purpose |
|---|---|
| ansible/playbooks/801-remove-network-tailscale-tunnel.yml | ✅ Done (PLAN-009) |
| ansible/playbooks/821-remove-network-cloudflare-tunnel.yml | ✅ Done (PLAN-cloudflare-tunnel-undeploy, PR #43) |
| ansible/playbooks/090-remove-gravitee.yml | Gravitee removal (if service is fixed) |