INVESTIGATE: ./uis status doesn't show multi-instance services
Status: Investigation complete — ready for PLAN
Created: 2026-05-14
Updated: 2026-05-15 (decisions locked in; root cause confirmed)
Surfaced by: talk53 F5 (Tailscale CLI port verification) — tester noticed railway-postgrest deployment running healthily but absent from ./uis status output.
Related to: INVESTIGATE-docs-customer-onboarding-database (the Railway customer onboarding flow that motivated multi-instance PostgREST in the first place), PLAN-002 / customer-onboarding work expanding the --app <name> pattern to more services.
Problem Statement
./uis status and ./uis list don't surface per-instance detail for multi-instance services. When a service like postgrest is deployed via the --app <name> pattern (e.g. ./uis deploy postgrest --app atlas + ./uis deploy postgrest --app railway), each app gets its own Kubernetes Deployment + Service in the shared postgrest namespace (atlas-postgrest, railway-postgrest). But the status/list commands collapse all of them into a single postgrest row — the user cannot see from the official "what's healthy" surface:
- How many instances are running.
- Which apps own them.
- What Kubernetes Service name to type into ./uis network expose tailscale <name> (this is the team-share use case: pick an instance from the status output and expose it via Tailscale Funnel).
See "Root cause (confirmed)" below for what the code is actually doing.
Symptom — talk53 evidence (original report, partially superseded)
The tester originally reported postgrest as missing entirely from the status output:
$ ./uis status
ID NAME CATEGORY HEALTH
nginx Nginx MANAGEMENT ✅ Healthy
whoami Whoami MANAGEMENT ✅ Healthy
postgresql PostgreSQL DATABASES ✅ Healthy
tailscale-tunnel Tailscale Tunnel NETWORKING ✅ Healthy
traefik Traefik NETWORKING ✅ Healthy
…while the deployments were demonstrably healthy:
$ kubectl -n postgrest get pods
NAME READY STATUS RESTARTS AGE
atlas-postgrest-b945447b5-shr5c 1/1 Running 0 8d
atlas-postgrest-b945447b5-wt6dp 1/1 Running 0 8d
railway-postgrest-7dc674c4f9-jk66h 1/1 Running 0 47h
railway-postgrest-7dc674c4f9-kn57x 1/1 Running 0 47h
$ kubectl get ingressroutes -A | grep postgrest
postgrest atlas-postgrest 8d
postgrest railway-postgrest 47h
$ curl -o /dev/null -w "%{http_code}\n" http://api-railway.localhost/
200
Reading the symptom against the current code (per "Root cause" below): the "no postgrest row at all" output likely reflects a pre-label state on the 8-day-old atlas-postgrest deployment (the app.kubernetes.io/name=postgrest label was added to the template at a date that may post-date that deployment). On a freshly-deployed cluster today, ./uis status would show a single postgrest ✅ Healthy row — and the gap shifts from "invisible" to "one binary row regardless of instance count." Either framing motivates the same fix.
Root cause (confirmed 2026-05-15)
cmd_status (in provision-host/uis/manage/uis-cli.sh:262) and cmd_list (line 195) both iterate get_all_service_ids() — which scans provision-host/uis/services/ and returns every SCRIPT_ID — and run each service's SCRIPT_CHECK_COMMAND. Postgrest IS in this iteration today.
The actual gap is not "postgrest is invisible." The actual gap is one row regardless of instance count:
- Postgrest's check command is:
  kubectl get deploy -n postgrest -l app.kubernetes.io/name=postgrest --no-headers | grep -qE '\s([1-9][0-9]*)/\1\s'
- This matches any deployment in the postgrest namespace carrying the app.kubernetes.io/name=postgrest label.
- When atlas-postgrest and railway-postgrest are both running, the check passes and ./uis status shows a single postgrest ✅ Healthy row.
- The user cannot see from the status output how many instances are running, which apps own them, or what Kubernetes Service name to type into ./uis network expose tailscale <name>.
The talk53 evidence ("no postgrest row at all") likely reflects a pre-label state on a deployment that's 8 days older than the current app.kubernetes.io/name=postgrest template — not a current-code defect. The principle this investigation solves is real regardless: multi-instance services need per-instance visibility, not a single binary-OK summary.
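The collapse described above can be reproduced without a cluster. The sketch below is illustrative only: it pipes hand-written sample rows, shaped like `kubectl get deploy --no-headers` output, through the same grep expression that postgrest's check command uses. The `service_check` wrapper and the sample data are assumptions made for the example, and the pattern relies on GNU grep (both `\s` and backreferences under `-E` are extensions).

```shell
# Hypothetical wrapper: the tail of postgrest's SCRIPT_CHECK_COMMAND,
# reading rows from stdin instead of from kubectl.
service_check() {
  grep -qE '\s([1-9][0-9]*)/\1\s'
}

# Sample rows (invented for illustration): NAME READY UP-TO-DATE AVAILABLE AGE
both_healthy='atlas-postgrest 2/2 2 2 8d
railway-postgrest 2/2 2 2 47h'
one_degraded='atlas-postgrest 2/2 2 2 8d
railway-postgrest 0/2 2 0 47h'

# grep -q succeeds as soon as ANY row has READY n/n, so both samples
# produce the same single service-level answer.
for sample in "$both_healthy" "$one_degraded"; do
  if printf '%s\n' "$sample" | service_check; then
    echo "postgrest: healthy"
  else
    echo "postgrest: not healthy"
  fi
done
```

Under these assumptions both samples report healthy, which suggests the single row hides not only the instance count but also a degraded instance sitting next to a healthy one.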
Why it matters
For solo development the gap is cosmetic — the user knows what they deployed and can kubectl directly.
For customer onboarding (per the INVESTIGATE-docs-customer-onboarding-database flow with the Railway customer) the gap is misleading in either failure mode:
- If postgrest appears as one binary ✅ Healthy row (the today-on-fresh-deploy case), the novice can't tell whether their atlas-postgrest is running, whether railway-postgrest is also there, or what string to type to expose theirs via Tailscale. The signal is technically accurate but operationally useless.
- If postgrest doesn't appear at all (the talk53 case, attributable to a stale-label state), the novice assumes their deployment failed and starts debugging the wrong thing: the same false-negative flavour as talk52 F4 ("Tailscale deploy reported FAILED but actually worked").
The fix lands per-instance visibility, eliminating both modes at once.
The --app <name> pattern is also the direction PLAN-002 / customer-onboarding work is expanding (likely to redis --app foo, future per-customer postgresql namespaces, etc.). Each new multi-instance service that adopts the pattern inherits the same gap unless we fix it at the framework level.
Fix candidates
1. Extend ./uis status + ./uis list to iterate multi-instance services ← CHOSEN
For each multi-instance service type (postgrest today; future redis, etc.), enumerate <app>-<service> deployments in the service's namespace and print each as its own row, using the Kubernetes deployment name as the row ID (so the same string can be typed into ./uis network expose tailscale ..., kubectl describe deploy -n <ns> ..., etc.):
ID NAME CATEGORY HEALTH
postgresql PostgreSQL DATABASES ✅ Healthy ← single-instance, unchanged
atlas-postgrest PostgREST INTEGRATION ✅ Healthy ← multi-instance row
railway-postgrest PostgREST INTEGRATION ✅ Healthy ← multi-instance row
- Pro: matches user mental model ("what's running?" → "I want to see every running thing").
- Pro: ID column is the actionable identifier, the same string the user types into ./uis network expose tailscale <name> or kubectl describe deploy. Critical for the team-share flow where a user reads the status output, picks an instance, and exposes it via Tailscale Funnel.
- Pro: rows sort naturally: atlas-postgrest and railway-postgrest cluster alphabetically.
- Pro: single-instance services unchanged.
- Pro: same iteration helper feeds ./uis list (parallel gap fixed in the same PR).
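The proposed table can be previewed with a few lines of shell. This is a sketch, not the real `cmd_status` code: `print_row` is a hypothetical helper, and the format string is the `%-15s %-20s %-12s %s` one that this document attributes to the existing output.

```shell
# Hypothetical helper: prints one status row with the existing column format.
print_row() {
  printf '%-15s %-20s %-12s %s\n' "$1" "$2" "$3" "$4"
}

print_row ID NAME CATEGORY HEALTH
print_row postgresql PostgreSQL DATABASES '✅ Healthy'
# atlas-postgrest is exactly 15 characters, so it fills the ID column.
print_row atlas-postgrest PostgREST INTEGRATION '✅ Healthy'
# railway-postgrest is 17 characters: printf does not truncate it, so the
# remaining columns on this row shift two characters to the right.
print_row railway-postgrest PostgREST INTEGRATION '✅ Healthy'
```

Running this shows the misalignment on the longest row, which is a quick way to judge whether the ID column width needs to grow before more `<app>-<service>` names appear.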
2. Add ./uis status --apps opt-in flag — rejected
Default output unchanged; opt-in flag shows multi-instance variants. Rejected: novices won't discover the flag, so the discoverability gap stays the default. Defeats the false-negative-prevention principle.
3. Single-line summary at the bottom — rejected
Under the per-service table, add a "Multi-instance:" line listing instance counts. Rejected: instances don't get health-state info, so it's not actually "status." Half-measure.
Open questions — answered
- Where does multi-instance metadata live? ✅ Already exists:
  - SCRIPT_MULTI_INSTANCE="true" on the service script (e.g. provision-host/uis/services/integration/service-postgrest.sh:32).
  - Propagates to services.json as multiInstance: true.
  - Two helpers query it today: _is_service_multi_instance (provision-host/uis/lib/service-deployment.sh:77) and _is_multi_instance (provision-host/uis/lib/configure.sh:42). They're functionally identical; the PLAN consolidates these into one canonical helper (_is_service_multi_instance) and updates both call sites.
  - No new fields needed.

- Health-check command per instance. ✅ The existing SCRIPT_CHECK_COMMAND stays as-is on multi-instance service scripts; it's still used by check_service_deployed from lib/service-scanner.sh:106, which is called by deploy/undeploy/dep-check code paths in lib/service-deployment.sh:199, 297, 340. Changing it would have wider blast radius than this PLAN wants.

  For status/list display only, the runtime composes a per-instance check by listing the actual deployments and parsing kubectl's standard output:

  kubectl get deploy -n "$SCRIPT_NAMESPACE" -l "app.kubernetes.io/name=$SCRIPT_ID" --no-headers

  Output (one row per deployment):

  atlas-postgrest 2/2 2 2 8d
  railway-postgrest 2/2 2 2 47h

  Parse column 1 (NAME, already <app>-<service> by deployment naming convention; this becomes the row ID) and column 2 (READY, e.g. 2/2). A deployment is healthy iff column 2 matches ^([1-9][0-9]*)/\1$, the same regex shape postgrest's current SCRIPT_CHECK_COMMAND uses, just applied per row.

- ./uis list parity. ✅ YES, same gap. Both cmd_status and cmd_list iterate get_all_service_ids() and run SCRIPT_CHECK_COMMAND. Fix bundles both; same iteration helper feeds both commands.

- Naming in output. ✅ Use the Kubernetes deployment/Service name (atlas-postgrest) as the ID column. NAME column = $SCRIPT_NAME only (no (atlas) parenthetical; the app discriminator is already in the ID).
  - Rationale: the ID is the actionable identifier, the same string a user types into ./uis network expose tailscale <name>, kubectl describe deploy -n postgrest <name>, etc. The team-share flow ("see what's running → expose one via Tailscale Funnel") depends on this string being visible.
  - Asymmetry-with-other-verbs noted: ./uis deploy postgrest --app atlas uses <service> --app <name> form, while ./uis network expose tailscale atlas-postgrest uses the Service name. The status output exposes the Kubernetes-real name; the deploy/configure verbs translate --app → <app>-<service> internally. A future cleanup could make ./uis deploy atlas-postgrest a synonym, but that's a separate INVESTIGATE and does not block this fix.

- Cross-namespace pattern. ✅ Read SCRIPT_NAMESPACE from the service script. Postgrest sets SCRIPT_NAMESPACE="postgrest". The iteration logic queries kubectl get deploy -n $SCRIPT_NAMESPACE -l app.kubernetes.io/name=$SCRIPT_ID. This works for any multi-instance service that shares a single namespace across instances (postgrest's pattern). A future multi-instance service that uses per-app namespaces (e.g. atlas and railway namespaces instead of all-in-postgrest) wouldn't fit a single SCRIPT_NAMESPACE value; see C-7 for that out-of-scope future case.
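The per-instance parsing described in the answers above could look roughly like the following. This is a minimal sketch under stated assumptions: `classify_instances` is a hypothetical name, the input rows are invented samples rather than captured output, and the health rule is expressed as a string comparison of the two halves of READY because awk has no backreferences (it is intended to accept the same values as `^([1-9][0-9]*)/\1$`).

```shell
# Hypothetical helper: reads `kubectl get deploy --no-headers` rows on stdin
# and prints "<deployment-name> <healthy|degraded> <READY>" for each row.
classify_instances() {
  awk '
    NF >= 2 {
      n = split($2, parts, "/")
      # Healthy iff READY is n/n with n >= 1; the "" suffix forces a
      # string comparison so "2" and "02" are not treated as equal.
      ok = (n == 2 && parts[1] ~ /^[1-9][0-9]*$/ && (parts[1] "") == (parts[2] ""))
      print $1, (ok ? "healthy" : "degraded"), $2
    }'
}

# In the real command the input would come from something like:
#   kubectl get deploy -n "$SCRIPT_NAMESPACE" \
#     -l "app.kubernetes.io/name=$SCRIPT_ID" --no-headers 2>/dev/null
printf '%s\n' \
  'atlas-postgrest 2/2 2 2 8d' \
  'railway-postgrest 1/2 2 1 47h' | classify_instances
```

With the sample rows, the first deployment is reported as healthy and the second as degraded. Keeping the parser as a stdin filter is a deliberate choice here: it can be exercised with canned rows and no cluster.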
Outcomes — decided
- Confirm the root cause by reading cmd_status + the postgrest service's metadata: see "Root cause (confirmed)" section above.
- Decide on metadata vs. convention for multi-instance detection: metadata, via the existing SCRIPT_MULTI_INSTANCE / multiInstance flag.
- Pick a fix candidate (1 / 2 / 3 / hybrid): Option 1, with the deployment name as the row ID.
- Verify whether ./uis list has the same gap and bundle the fix if so: YES; bundled.
- Decide naming convention for the status row: ID = <app>-<service> (deployment/Service name); NAME = $SCRIPT_NAME.
Implementation Contracts — locked
- C-1: Multi-instance metadata. Use the existing SCRIPT_MULTI_INSTANCE="true" flag (already on service-postgrest.sh; already in services.json as multiInstance: true). Consolidate the two duplicate helpers _is_service_multi_instance (lib/service-deployment.sh) and _is_multi_instance (lib/configure.sh) into one canonical _is_service_multi_instance in lib/service-deployment.sh; update the lib/configure.sh:216 call site.

- C-2: Per-instance iteration (status + list display only). For each service with SCRIPT_MULTI_INSTANCE="true", run:

  kubectl get deploy -n "$SCRIPT_NAMESPACE" -l "app.kubernetes.io/name=$SCRIPT_ID" --no-headers 2>/dev/null

  Parse column 1 (deployment name = <app>-<service>) as the row ID; column 2 (READY, e.g. 2/2) for health: healthy iff column 2 matches ^([1-9][0-9]*)/\1$.

  Behaviour for each case, matching today's single-instance asymmetry between cmd_status (only-healthy) and cmd_list (always-show-row):

  | Case | cmd_status | cmd_list |
  |---|---|---|
  | Deployment exists, healthy (e.g., 2/2) | one ✅ Healthy row | one ✅ Deployed row |
  | Deployment exists, degraded (e.g., 1/2, 0/2) | no row (same as today's check-failed behaviour for single-instance) | one ⚠ Degraded (<ready>/<replicas>) row |
  | Zero deployments matching the label selector | no row | one ❌ Not deployed row for the service-type (ID = $SCRIPT_ID) |
  | kubectl error / no cluster | no row | no row for that service (kubectl-error path same as single-instance) |

  SCRIPT_CHECK_COMMAND on the service script stays unchanged. It's still used by check_service_deployed (lib/service-scanner.sh:106), which is called by lib/service-deployment.sh:199, 297, 340 for deploy/undeploy/dep-check paths. Those paths only care about "is the service-class active at all," which the current check correctly answers. The per-instance iteration is a display-side override used only by cmd_status and cmd_list.

- C-3: Status / list output format.
  - Single-instance row (unchanged): <SCRIPT_ID> <SCRIPT_NAME> <SCRIPT_CATEGORY> ✅ Healthy
  - Multi-instance row (new): <deployment-name> <SCRIPT_NAME> <SCRIPT_CATEGORY> ✅ Healthy
  - <deployment-name> comes directly from .metadata.name on each Deployment (e.g. atlas-postgrest). No label parsing needed: the deployment name is the actionable identifier the user types into ./uis network expose tailscale <name> or kubectl describe deploy -n postgrest <name>.
  - Header row unchanged. Column widths unchanged: the existing %-15s %-20s %-12s %s format handles atlas-postgrest (15 chars) and railway-postgrest (17 chars; the overflow is visible, but nothing is truncated, because %-15s left-aligns and pads short values on the right without cutting long ones, so the later columns on that row shift right). PLAN should validate column widths against the widest expected <app>-<service> name and bump the format if needed.
  - Note about the app.kubernetes.io/instance label: it's set to just <app> (e.g. atlas) per the postgrest template (088-postgrest-config.yml.j2:20), NOT <app>-<service>. The PLAN's iteration uses .metadata.name instead because that's the user-facing string.

- C-4: ./uis list parity. Same iteration helper feeds both cmd_list and cmd_status; behaviour per state is documented in the table under C-2. Bundled in the same PR. Single-instance services in cmd_list are unchanged (still show Deployed / Not deployed / No check from the existing SCRIPT_CHECK_COMMAND path).

- C-5: Backwards compatibility. Interactive readers see more useful detail (a strict improvement). Scripts doing ./uis status | grep '^postgrest' no longer match, because the row is now atlas-postgrest / railway-postgrest. Mitigation: document the change in the PR body; suggest ./uis status | awk '$1 ~ /-postgrest$/' as the migration pattern. Acceptable cost: the current users of ./uis status are interactive, not scripts.

- C-6: talk53 mystery, explicitly out of scope. The PLAN does not include a phase to reproduce the talk53 "no postgrest row at all" symptom. That output likely reflected a pre-label state on an 8-day-old atlas-postgrest deployment, not a current-code defect. Tester verification on the new code path will catch any label-mismatch on freshly-deployed instances.

- C-7: Single-namespace assumption. The iteration in C-2 assumes all instances of a multi-instance service share a single SCRIPT_NAMESPACE. This holds for postgrest today and matches the convention documented in service-postgrest.sh ("UIS deploys one PostgREST instance per consuming application; all instances share a namespace"). If a future multi-instance service deploys per-app namespaces (atlas / railway namespaces instead of all-in-postgrest), the iteration shape needs revisiting. Out of scope for this PLAN; flagged as a known assumption.

- C-8: Tests. No tests cover cmd_status or cmd_list today (verified by grep). The PLAN should decide whether to add unit/integration coverage for the multi-instance iteration path, especially the zero-instance / unreachable-kubectl edge cases. Recommendation: add at least one static test that asserts the format of the kubectl-output parsing (mockable without a cluster); defer integration coverage to tester verification.
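As one possible shape for the static test recommended in C-8, the sketch below encodes the per-deployment rows of the C-2 behaviour table and checks them with plain shell assertions, so no cluster is needed. Everything named here is hypothetical: `health_cell` and `check` are invented for illustration and are not existing UIS functions, and the zero-deployment and kubectl-error cases are left to the caller because they do not involve a READY value.

```shell
# Hypothetical helper: prints the HEALTH cell for one multi-instance row,
# or nothing when the C-2 table says "no row".
#   $1 = command ("status" or "list"), $2 = READY column (e.g. "2/2")
health_cell() {
  cmd=$1
  ready_col=$2
  ready=${ready_col%%/*}      # text before the slash
  replicas=${ready_col##*/}   # text after the slash
  case $ready in
    ''|0*|*[!0-9]*) healthy=no ;;   # empty, zero or leading zero, non-numeric
    *) if [ "$ready" = "$replicas" ]; then healthy=yes; else healthy=no; fi ;;
  esac
  case "$cmd:$healthy" in
    status:yes) echo '✅ Healthy' ;;
    list:yes)   echo '✅ Deployed' ;;
    list:no)    echo "⚠ Degraded ($ready_col)" ;;
    status:no)  : ;;                # cmd_status shows no row when degraded
  esac
}

# Static checks: compare actual output with the expected cell text.
check() { [ "$1" = "$2" ] || { echo "FAIL: got '$1', want '$2'"; exit 1; }; }
check "$(health_cell status 2/2)" '✅ Healthy'
check "$(health_cell list 2/2)"   '✅ Deployed'
check "$(health_cell status 1/2)" ''
check "$(health_cell list 0/2)"   '⚠ Degraded (0/2)'
echo "all checks passed"
```

A test in this style pins the display contract without touching kubectl; the real implementation may well structure the logic differently.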