PLAN: per-instance rows in ./uis status + ./uis list for multi-instance services
IMPLEMENTATION RULES: Before implementing this plan, read and follow:
- WORKFLOW.md - The implementation process
- PLANS.md - Plan structure and best practices
Status: Completed (talk56 R1-R7 all PASS after R4 fix)
Goal: Make multi-instance service deployments individually visible in ./uis status and ./uis list. After this PLAN ships, deploying postgrest --app atlas + postgrest --app railway produces two rows in the status table (atlas-postgrest, railway-postgrest) instead of a single binary postgrest ✅ Healthy row, so the user can identify each instance by its Kubernetes Service name — the same string they need for ./uis network expose tailscale <name>.
Last Updated: 2026-05-15
Investigation: INVESTIGATE-cli-status-multi-instance — all 5 open questions answered; C-1 through C-8 locked.
Prerequisites: None — independent of any in-flight work. Touches provision-host/uis/lib/ + provision-host/uis/manage/uis-cli.sh only.
Priority: Medium — operationally useful for the customer-onboarding flow (per INVESTIGATE-docs-customer-onboarding-database), no production-blocking impact.
Problem Summary
cmd_status and cmd_list both iterate the service registry and run each service's SCRIPT_CHECK_COMMAND. For multi-instance services like postgrest, that check is a single binary "is any matching deployment ready?" — collapsing N deployed instances into one row. The user cannot see from the status output how many instances exist, which apps own them, or what Kubernetes Service name to type into ./uis network expose tailscale <name>.
The fix lands per-instance iteration for services flagged SCRIPT_MULTI_INSTANCE="true", using the Kubernetes deployment name (<app>-<service>) as the row ID — the actionable identifier in every downstream CLI.
See the INVESTIGATE for evidence, root-cause confirmation, and the rejected alternatives.
Phase 1: Helper consolidation (C-1)
Today there are two functionally identical helpers checking multiInstance on a service's services.json entry: _is_service_multi_instance (in provision-host/uis/lib/service-deployment.sh:77) and _is_multi_instance (in provision-host/uis/lib/configure.sh:42). Consolidate into one canonical helper before touching status/list code, so the iteration logic in later phases has exactly one source of truth.
Tasks
- 1.1 Delete _is_multi_instance from provision-host/uis/lib/configure.sh:42-52.
- 1.2 Update the call site at provision-host/uis/lib/configure.sh:216 from _is_multi_instance "$service_id" to _is_service_multi_instance "$service_id".
- 1.3 Ensure lib/configure.sh sources lib/service-deployment.sh (or that both are loaded by the same caller). Check provision-host/uis/manage/uis-cli.sh to confirm load order; if configure.sh is loaded before service-deployment.sh, swap them so _is_service_multi_instance is defined when configure.sh:216 runs.
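The load-order concern in 1.3 can be checked at runtime rather than by reading source order alone. The following is a minimal sketch, assuming a hypothetical guard named require_function (not part of the existing libraries) that a caller could run before the call site is reached.

```shell
# Hypothetical guard (an assumption for illustration, not an existing UIS helper):
# succeeds only if the named shell function is already defined.
require_function() {
    local fn="$1"
    if ! declare -F "$fn" >/dev/null; then
        echo "error: $fn is not defined; load service-deployment.sh before configure.sh" >&2
        return 1
    fi
    return 0
}
```

A caller might use it as `require_function _is_service_multi_instance || exit 1` right after sourcing the libraries, so a wrong load order fails loudly instead of surfacing later as "command not found".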
Validation
bash -n provision-host/uis/lib/configure.sh
bash -n provision-host/uis/lib/service-deployment.sh
bash provision-host/uis/tests/run-tests.sh
grep -nR "_is_multi_instance\b" provision-host/uis/ # expect: no matches
grep -nR "_is_service_multi_instance\b" provision-host/uis/ # expect: 1 def + ≥3 callers
User confirms phase is complete.
Phase 2: Per-instance iteration helper (C-2)
Add a new helper in lib/service-scanner.sh (next to check_service_deployed) that lists the actual Kubernetes Deployments backing a multi-instance service. The helper is a display-side override — only called by cmd_status and cmd_list for rendering rows. SCRIPT_CHECK_COMMAND on the service script and the existing check_service_deployed path are unchanged (deploy/undeploy/dep-check paths still use them).
Tasks
- 2.1 Add get_multi_instance_deployments <service_id> to provision-host/uis/lib/service-scanner.sh. Reads SCRIPT_NAMESPACE and SCRIPT_ID from the service script (same source-parse pattern as check_service_deployed). Runs:
    kubectl get deploy -n "$SCRIPT_NAMESPACE" -l "app.kubernetes.io/name=$SCRIPT_ID" --no-headers 2>/dev/null
  Emits one tab-separated line per deployment: <name>\t<ready> (e.g., atlas-postgrest\t2/2). Returns 0 on success (including the zero-row case); returns non-zero only on internal failure (e.g., service script not found).
- 2.2 Add a tiny health-classifier helper _classify_ready_count <ready> (also in lib/service-scanner.sh) that returns:
  - 0 (healthy) iff input matches ^([1-9][0-9]*)/\1$
  - 1 (degraded) iff input matches ^[0-9]+/[0-9]+$ but not the healthy regex
  - 2 (unknown: kubectl returned an unexpected shape) otherwise
  Used by both cmd_status and cmd_list to decide what icon to print.
- 2.3 Document the helpers' contract in the file header comment of service-scanner.sh (which already documents check_service_deployed and get_all_service_ids).
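The two helpers in 2.1 and 2.2 could look roughly like the sketch below. Backreferences such as \1 are not part of POSIX extended regular expressions, which is what bash's =~ uses, so this version compares the two captured numbers as strings instead. The name _parse_deploy_rows is a hypothetical stand-in for the parsing half of get_multi_instance_deployments, and it assumes READY is the second column of `kubectl get deploy --no-headers` output.

```shell
# Classify a READY value such as "2/2".
# Returns 0 = healthy, 1 = degraded, 2 = unknown (unexpected shape).
_classify_ready_count() {
    local ready="${1:-}"
    local shape='^([0-9]+)/([0-9]+)$'
    [[ "$ready" =~ $shape ]] || return 2
    local have="${BASH_REMATCH[1]}" want="${BASH_REMATCH[2]}"
    # Healthy only when both sides are the same string and start with 1-9,
    # which mirrors the plan's ^([1-9][0-9]*)/\1$ rule.
    if [[ "$have" == "$want" && "$have" =~ ^[1-9][0-9]*$ ]]; then
        return 0
    fi
    return 1
}

# Hypothetical parser: reads `kubectl get deploy --no-headers` lines on stdin
# (assumed columns: NAME READY ...) and prints "<name>\t<ready>" per row.
_parse_deploy_rows() {
    local name ready _rest
    while read -r name ready _rest; do
        [[ -n "$name" && -n "$ready" ]] || continue
        printf '%s\t%s\n' "$name" "$ready"
    done
}
```

Under these assumptions the kubectl command from 2.1 would simply be piped into `_parse_deploy_rows`; an empty kubectl result yields zero rows and a zero exit status, matching the "zero-row case is success" contract.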
Validation
bash -n provision-host/uis/lib/service-scanner.sh
bash provision-host/uis/tests/run-tests.sh
# Manual smoke (inside provision-host container with a deployed postgrest):
source provision-host/uis/lib/service-scanner.sh
source provision-host/uis/lib/integration-testing.sh # for SERVICES_DIR
get_multi_instance_deployments postgrest
# Expected output for two-app deploy:
# atlas-postgrest 2/2
# railway-postgrest 2/2
User confirms phase is complete.
Phase 3: cmd_status integration (C-3, partial C-4)
Wire the new helper into cmd_status so multi-instance services emit per-instance rows. Single-instance services keep their existing path — no change to single-instance behaviour.
Tasks
- 3.1 In provision-host/uis/manage/uis-cli.sh:262 (cmd_status), after the source "$script" line that loads service metadata, branch on _is_service_multi_instance "$service_id":
  - single-instance (today's path, unchanged): run check_service_deployed, emit one row with SCRIPT_ID.
  - multi-instance (new path): call get_multi_instance_deployments "$service_id", iterate the tab-separated output, classify each row's ready count, and emit one row per healthy deployment using the deployment name as the ID:
      printf "%-18s %-20s %-12s %s\n" "$deployment_name" "${SCRIPT_NAME:0:20}" "${SCRIPT_CATEGORY:0:12}" "✅ Healthy"
  - Skip degraded deployments and the zero-row case, matching cmd_status's current behaviour of "only show healthy services."
- 3.2 Bump the ID column width from %-15s to %-18s (in both the header line at line 275 and the row print at line 291). atlas-postgrest (15 chars) and railway-postgrest (17 chars) fit cleanly at 18, which leaves headroom for typical <app>-<service> names. Update the underline separator on line 276 to match the new width if it depends on the format.
- 3.3 Verify the has_deployed flag still flips to true when any multi-instance row is emitted, so the "No deployed services found" fallback doesn't fire incorrectly.
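The multi-instance branch of cmd_status described in 3.1 and 3.3 might be sketched as follows. The function name _emit_status_rows is hypothetical, and the compact classifier is a stand-in for the Phase 2 helper so the block runs on its own. The renderer reports through its exit status whether any row was printed, which is one way to keep has_deployed correct even when the loop runs on the receiving end of a pipe (and therefore in a subshell).

```shell
# Stand-in for the Phase 2 classifier (0 healthy, 1 degraded, 2 unknown).
_classify_ready_count() {
    [[ "${1:-}" =~ ^([0-9]+)/([0-9]+)$ ]] || return 2
    [[ "${BASH_REMATCH[1]}" == "${BASH_REMATCH[2]}" && "${BASH_REMATCH[1]}" != 0* ]] && return 0
    return 1
}

# Hypothetical renderer: reads "<name>\t<ready>" lines on stdin and prints one
# status row per healthy deployment. Returns 0 if at least one row was printed.
_emit_status_rows() {
    local script_name="$1" script_category="$2"
    local deployment_name ready emitted=1
    while IFS=$'\t' read -r deployment_name ready; do
        # Degraded and unknown rows are skipped, as cmd_status only shows healthy services.
        _classify_ready_count "$ready" || continue
        printf "%-18s %-20s %-12s %s\n" "$deployment_name" \
            "${script_name:0:20}" "${script_category:0:12}" "✅ Healthy"
        emitted=0
    done
    return "$emitted"
}
```

A caller could then write `if get_multi_instance_deployments "$service_id" | _emit_status_rows "$SCRIPT_NAME" "$SCRIPT_CATEGORY"; then has_deployed=true; fi`, which sets the flag in the parent shell. Note that %-18s is a minimum width: a deployment name longer than 18 characters is printed in full and pushes the remaining columns right.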
Validation
bash -n provision-host/uis/manage/uis-cli.sh
bash provision-host/uis/tests/run-tests.sh
# Manual smoke inside the container with a fresh build:
./uis stop && ./uis build && ./uis pull # (contributor side — tester runs the actual deploys later)
User confirms phase is complete (visual review of the status output format on a local cluster).
Phase 4: cmd_list integration (C-4 + degraded/zero cases per C-2 table)
Same iteration helper, different presentation policy. cmd_list always emits a row for every service in the registry, so the multi-instance path needs explicit handling for the degraded and zero-instance cases.
Tasks
- 4.1 In provision-host/uis/manage/uis-cli.sh:195 (cmd_list), in the per-service block (currently around lines 234-244), branch on _is_service_multi_instance "$service_id":
  - single-instance (today's path, unchanged): existing check_service_deployed → emit ✅ Deployed / ❌ Not deployed / ○ No check.
  - multi-instance (new path): call get_multi_instance_deployments "$service_id", iterate the output:
    - For each row classified healthy (e.g., 2/2): emit one row with deployment name as ID, status ✅ Deployed.
    - For each row classified degraded (e.g., 1/2): emit one row with deployment name as ID, status ⚠ Degraded (<ready>/<replicas>).
    - If the helper returned zero rows: emit one row with $SCRIPT_ID as ID, status ❌ Not deployed (so the service-type stays visible in the registry).
- 4.2 Reuse the same %-18s column width bump from 3.2.
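The presentation policy in 4.1 can be illustrated with a small sketch. The function name _emit_list_rows, the three-column row format, and the "○ Unknown" label for unclassifiable rows are all assumptions made for illustration; the plan does not specify how cmd_list renders the unknown case, and the real cmd_list column layout may differ.

```shell
# Stand-in for the Phase 2 classifier (0 healthy, 1 degraded, 2 unknown).
_classify_ready_count() {
    [[ "${1:-}" =~ ^([0-9]+)/([0-9]+)$ ]] || return 2
    [[ "${BASH_REMATCH[1]}" == "${BASH_REMATCH[2]}" && "${BASH_REMATCH[1]}" != 0* ]] && return 0
    return 1
}

# Hypothetical renderer: reads "<name>\t<ready>" lines on stdin and prints one
# list row per deployment, or a single service-type row when there are none.
_emit_list_rows() {
    local script_id="$1" script_name="$2"
    local deployment_name ready status rc rows=0
    while IFS=$'\t' read -r deployment_name ready; do
        [[ -n "$deployment_name" ]] || continue
        rc=0
        _classify_ready_count "$ready" || rc=$?
        case "$rc" in
            0) status="✅ Deployed" ;;
            1) status="⚠ Degraded ($ready)" ;;
            *) status="○ Unknown ($ready)" ;;  # assumption: not specified in the plan
        esac
        printf "%-18s %-20s %s\n" "$deployment_name" "${script_name:0:20}" "$status"
        rows=$((rows + 1))
    done
    if (( rows == 0 )); then
        # Zero instances: keep the service-type visible in the registry listing.
        printf "%-18s %-20s %s\n" "$script_id" "${script_name:0:20}" "❌ Not deployed"
    fi
}
```

Unlike the status path, every input row produces output here, so the zero-row fallback is the only place the bare service ID appears.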
Validation
bash -n provision-host/uis/manage/uis-cli.sh
bash provision-host/uis/tests/run-tests.sh
User confirms phase is complete (visual review of the list output on a local cluster: deployed instance, degraded instance, undeployed service-type all render correctly).
Phase 5: Tests (C-8)
Add a focused static test for the helper output parsing. Integration coverage is deferred to tester verification.
Tasks
- 5.1 Add provision-host/uis/tests/static/test-multi-instance-parsing.sh. Tests:
  - _classify_ready_count "2/2" → 0 (healthy)
  - _classify_ready_count "1/2" → 1 (degraded)
  - _classify_ready_count "0/2" → 1 (degraded: zero replicas ready is degraded, not unknown)
  - _classify_ready_count "0/0" → 1 (degraded: counts as not-fully-ready)
  - _classify_ready_count "" → 2 (unknown)
  - _classify_ready_count "garbage" → 2 (unknown)
  - A sample kubectl-output fixture (saved as a heredoc in the test) parses into the expected tab-separated rows.
- 5.2 Wire the new test into provision-host/uis/tests/run-tests.sh if it doesn't auto-discover static/test-*.sh files (check current discovery behaviour first).
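The classifier cases in 5.1 translate naturally into a table of expected return codes. The sketch below assumes a hypothetical assert_rc helper and inlines a stand-in classifier so it runs on its own; the real test would presumably source lib/service-scanner.sh and use whatever assertion helpers the existing test suite already provides.

```shell
# Stand-in for the helper under test; the real test would source
# provision-host/uis/lib/service-scanner.sh instead.
_classify_ready_count() {
    [[ "${1:-}" =~ ^([0-9]+)/([0-9]+)$ ]] || return 2
    [[ "${BASH_REMATCH[1]}" == "${BASH_REMATCH[2]}" && "${BASH_REMATCH[1]}" != 0* ]] && return 0
    return 1
}

failures=0

# Hypothetical assertion helper: compare the classifier's return code
# against the expected value and count mismatches.
assert_rc() {
    local expected="$1" input="$2" actual=0
    _classify_ready_count "$input" || actual=$?
    if [[ "$actual" -eq "$expected" ]]; then
        echo "PASS: '$input' -> $actual"
    else
        echo "FAIL: '$input' -> $actual (expected $expected)"
        failures=$((failures + 1))
    fi
}

assert_rc 0 "2/2"       # healthy
assert_rc 1 "1/2"       # degraded
assert_rc 1 "0/2"       # degraded, not unknown
assert_rc 1 "0/0"       # degraded: counts as not fully ready
assert_rc 2 ""          # unknown
assert_rc 2 "garbage"   # unknown

echo "failures: $failures"
```

A standalone test file would typically finish by exiting non-zero when the failure count is above zero, so run-tests.sh can detect the result.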
Validation
bash provision-host/uis/tests/static/test-multi-instance-parsing.sh # explicit run
bash provision-host/uis/tests/run-tests.sh # full suite
User confirms tests pass.
Phase 6: Local verification + build for tester fast-loop
- 6.1 bash -n clean on all touched files: service-scanner.sh, service-deployment.sh, configure.sh, uis-cli.sh, the new test.
- 6.2 bash provision-host/uis/tests/run-tests.sh: all test scripts pass.
- 6.3 cd website && npm run build reports [SUCCESS] (no docs touched in this PLAN, but build catches any accidental sidebar / markdown breakage).
- 6.4 Build the local image for tester consumption: run ./uis build from the repo root. Produces uis-provision-host:local on the local Docker daemon, the same daemon the tester's ./uis invocations talk to. This is the fast-loop pattern: tester runs UIS_IMAGE=uis-provision-host:local ./uis ... and sees this PLAN's code immediately, no GHCR wait.
- 6.5 Quick contributor-side smoke (optional, separate from the tester's round): with the running container swapped to :local, run ./uis status + ./uis list against the contributor's rancher-desktop cluster as a quick sanity check that startup doesn't error.
Validation
User confirms phase is complete (./uis build finishes clean; docker images uis-provision-host:local shows the new image with a recent timestamp).
Phase 7: Tester verification round (on uis-provision-host:local, not :latest)
A talk round against the tester's rancher-desktop cluster covering the deploy → expose flow, running against the locally-built image from Phase 6.4 — no GHCR rebuild wait.
Tasks
- 7.1 Archive current testing/uis1/talk/talk.md → talkNN.md per the talk.md naming protocol.
- 7.2 Write a fresh talk.md for this round. Brief covers:
  - Pre-flight (local-build fast-loop):
      # Confirm the contributor's locally-built image is on this Docker daemon
      # (docker images accepts one repository argument per call):
      docker images ghcr.io/helpers-no/uis-provision-host
      docker images uis-provision-host
      # Expect: a `uis-provision-host:local` tag with a recent timestamp.
      # Stop the running container and start it on the local image:
      ./uis stop
      UIS_IMAGE=uis-provision-host:local ./uis start
      # All subsequent commands in this round must run with the same env var
      # (the wrapper passes UIS_IMAGE through to docker exec).
    Do NOT ./uis pull for this round: :latest on GHCR is stale until this PLAN merges; we're testing the locally-built image against the cluster.
  - R1: Deploy two postgrest instances. UIS_IMAGE=uis-provision-host:local ./uis configure postgrest --app atlas --database atlas --schemas api_v1 + UIS_IMAGE=... ./uis deploy postgrest --app atlas. Repeat for --app railway. Verify both deployments come up.
  - R2: ./uis status shows two rows. Confirm output contains atlas-postgrest ... ✅ Healthy and railway-postgrest ... ✅ Healthy on separate rows. Column alignment holds.
  - R3: ./uis list shows the same two rows under INTEGRATION. Single-instance services (postgresql, nginx) still render unchanged.
  - R4: Degraded case. kubectl scale deploy -n postgrest atlas-postgrest --replicas=0. Verify ./uis list shows atlas-postgrest ... ⚠ Degraded (0/0) or 0/2; ./uis status shows no atlas-postgrest row (only railway-postgrest if it's healthy). Restore with --replicas=2.
  - R5: Zero-instance case. ./uis undeploy postgrest --app atlas + ./uis undeploy postgrest --app railway. Verify ./uis list shows a single postgrest ... ❌ Not deployed row for the service-type; ./uis status shows no postgrest row.
  - R6: Single-instance regression. ./uis status + ./uis list still render single-instance services (nginx, traefik, postgresql) exactly as before this PLAN. No drift.
  - R7: Tailscale expose flow (proves the team-share use case the INVESTIGATE motivated). Deploy two postgrest instances; run ./uis status; pick railway-postgrest from the output; run ./uis network expose tailscale railway-postgrest. Verify it works end-to-end against the public Funnel URL.
- 7.3 Iterate on findings as small follow-up commits on the same branch if R1–R7 surface anything. The contributor re-runs ./uis build after each commit; tester's next round picks up the new :local image immediately (the wrapper re-creates the container on start).
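Rounds R2 through R5 all come down to "is there a row for this ID with this status, or no row at all". A tester who prefers a mechanical check over eyeballing could use something like the sketch below. Both helper names are hypothetical, and the sketch assumes each row starts with the ID at column 0 followed by a space and ends with the status text; the real output may carry extra leading or trailing characters.

```shell
# Hypothetical check: succeed if the captured output contains a row that
# starts with "<id> " and ends with the given status text.
expect_row() {
    local output="$1" id="$2" status="$3" line
    while IFS= read -r line; do
        if [[ "$line" == "$id "* && "$line" == *"$status" ]]; then
            return 0
        fi
    done <<<"$output"
    return 1
}

# Hypothetical check: succeed only if no row starts with "<id> ".
expect_no_row() {
    local output="$1" id="$2" line
    while IFS= read -r line; do
        [[ "$line" == "$id "* ]] && return 1
    done <<<"$output"
    return 0
}
```

For R2, the tester could capture the status output into a variable and call `expect_row` once per deployment name; for R4, `expect_no_row` against the status output covers the "degraded instance is hidden" expectation. Plain string matching is used instead of a regex so that status text containing parentheses, such as a degraded count, needs no escaping.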
Validation
Tester closes the round with all R1–R7 PASS. Any FAIL findings filed as F-findings in the talk.md and resolved on the same branch before this PLAN merges. After merge, the GHCR Build UIS Container workflow produces a fresh :latest carrying the verified code — no separate re-verification round needed since :local and the post-merge :latest are built from the same commit.
Acceptance Criteria
- ./uis status shows one row per healthy multi-instance deployment, with the deployment name (e.g., atlas-postgrest) as the row ID.
- ./uis status skips degraded multi-instance deployments (consistent with today's single-instance "check failed → no row" behaviour).
- ./uis list shows one row per multi-instance deployment, with explicit ✅ Deployed / ⚠ Degraded / ❌ Not deployed (service-type) states per the C-2 table.
- Single-instance services in both commands render exactly as they did before this PLAN, with no drift.
- _is_service_multi_instance is the single canonical helper; _is_multi_instance is deleted.
- SCRIPT_CHECK_COMMAND on service-postgrest.sh is unchanged; check_service_deployed behaviour for deploy/undeploy/dep-check paths is unchanged.
- Local bash -n + bash provision-host/uis/tests/run-tests.sh + cd website && npm run build all pass.
- Tester round R1–R7 closes PASS.
- This plan is in completed/.
Files to Modify
- provision-host/uis/lib/service-scanner.sh: add get_multi_instance_deployments + _classify_ready_count; update header doc.
- provision-host/uis/lib/configure.sh: delete _is_multi_instance (lines 42-52); update call site at line 216.
- provision-host/uis/lib/service-deployment.sh: no functional change (existing _is_service_multi_instance stays canonical).
- provision-host/uis/manage/uis-cli.sh: branch on multi-instance in cmd_status (line 262) and cmd_list (line 195); bump ID column width to %-18s.
- provision-host/uis/tests/static/test-multi-instance-parsing.sh: new test (Phase 5).
- testing/uis1/talk/talk.md: fresh round per Phase 7 (handled at PLAN-execution time, not in the code PR).
Implementation Notes
- Helper placement. get_multi_instance_deployments belongs in lib/service-scanner.sh alongside check_service_deployed because both are "introspect a service against the current cluster" primitives. Keep them together so future readers see the single/multi pair as the two display paths.
- Reading service metadata. check_service_deployed parses SCRIPT_CHECK_COMMAND by line-scanning the service script with while IFS= read -r line; it avoids source-ing the script to skirt side effects. The new get_multi_instance_deployments needs SCRIPT_NAMESPACE and SCRIPT_ID, which are also simple =-assignments at the top of the script. Use the same line-scan pattern for consistency. (cmd_status itself does source the script, but that's already a known cost in the calling context.)
- Column width choice. %-18s covers railway-postgrest (17) with one char of headroom. Longer <app>-<service> names will still overflow without truncation (bash printf minimum-width semantics). If the PLAN's tester round surfaces a real name that's wider, the format is easy to bump again; don't pre-optimise.
- Degraded vs unknown ready-count. A deployment can show 0/0 transiently right after a scale-to-zero or a fresh apply. Treating 0/0 as "degraded" (rather than "unknown") matches the operator's intent: "I deployed this and it isn't ready." Only truly malformed kubectl output (<no value>, garbage) maps to unknown.
- Don't add a new metadata field. SCRIPT_MULTI_INSTANCE="true" + SCRIPT_NAMESPACE, both already on the service script, are sufficient. The PLAN keeps the contract narrow.
- Backwards compatibility. Documented in C-5: scripts that grep ^postgrest will no longer match after this lands. Document the migration in the PR body; the cost is acceptable since ./uis status is interactive, not script-driven.
- Per-app namespace future case (C-7). If a future multi-instance service ever needs to deploy each instance into its own namespace, the iteration in get_multi_instance_deployments will need extending (it'd need to query across namespaces, or read namespace from a different field). Out of scope here; flagged so the contract doesn't get reused incorrectly.
- Local-build fast-loop for the tester round (Phase 7). Don't merge → wait 12+ min for the GHCR Build UIS Container workflow → tell tester to ./uis pull. Instead: run ./uis build locally (Phase 6.4); the resulting uis-provision-host:local image is on the same Docker daemon the tester uses, so they run UIS_IMAGE=uis-provision-host:local ./uis ... and see this PLAN's code with zero CI wait. Iterations within the round (fix → re-build → tester re-tests) take seconds instead of minutes. After Phase 7 closes green, merge the PR; the GHCR rebuild then produces a :latest from the same commit the tester already verified, so no separate post-merge regression round is needed.
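The line-scan pattern from the "Reading service metadata" note can be sketched as below. The helper name _read_script_var is hypothetical, and the sketch assumes the assignment sits at the start of a line in the form VAR="value" with no trailing comment; single-quoted values and inline comments are not handled.

```shell
# Hypothetical helper: print the value of a simple VAR="value" assignment
# from a service script without sourcing it (so no side effects run).
_read_script_var() {
    local script="$1" var="$2" line value
    [[ -r "$script" ]] || return 1
    while IFS= read -r line; do
        if [[ "$line" == "$var="* ]]; then
            value="${line#*=}"      # drop everything up to the first "="
            value="${value%\"}"     # strip one trailing double quote, if any
            value="${value#\"}"     # strip one leading double quote, if any
            printf '%s\n' "$value"
            return 0
        fi
    done < "$script"
    return 1
}
```

With this shape, get_multi_instance_deployments could read SCRIPT_NAMESPACE and SCRIPT_ID in two calls and return non-zero when the script is missing, which matches the "non-zero only on internal failure" contract from Phase 2.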