Plan: ./uis platform up azure-aks chain wrapper
IMPLEMENTATION RULES: Before implementing this plan, read and follow:
- WORKFLOW.md - The implementation process
- PLANS.md - Plan structure and best practices
Status: ✅ Completed (2026-05-11)
Shipped in: PR #156 (bundled with PLAN #4).
Verified end-to-end: talk44 (UIS_IMAGE=:local), talk45 + talk46 (post-merge against CI-built :latest) — cold cycle ran 00→01→02 successfully, F1–F5 surfaced and fixed in the same PR before merge, F11/F12 in the follow-on status command fixed in PR #158.
Goal: Add ./uis platform up azure-aks — a thin chain wrapper that runs the three existing lifecycle scripts (00-bootstrap-state.sh → 01-apply.sh → 02-post-apply.sh) in order with visible inter-step banners. This is PLAN #3 of 4 spawned by INVESTIGATE-platform-aks-novice-onboarding.md. Trivial once PLAN #2's init ships — the heavy lifting (sub discovery, role check, region pick, provider registration, env-file write) is done; up just executes the IaC.
Last Updated: 2026-05-11 — bundled with PLAN #4 (./uis platform down azure-aks) in PR #156 so the AKS wrapper sequence ships as one logical change. Tester round at testing/uis1/talk/talk.md covers both wrappers.
Source: INVESTIGATE-platform-aks-novice-onboarding.md. Implements Q8 (three-layer split), Q9 (naive chain), Q10 (always have output), Q11 (refuse-with-pointer if env missing).
Problem Summary
After PLAN #2's init ships, the novice flow is at step 8 of 8:
uis tools install azure-aks # PR #154 ✅
uis platform init azure-aks # PR #155 ✅
# Now: run three scripts manually:
./platforms/azure-aks/scripts/00-bootstrap-state.sh
./platforms/azure-aks/scripts/01-apply.sh
./platforms/azure-aks/scripts/02-post-apply.sh
The novice has to know which scripts to run, in which order, that they're idempotent, and where they live (./platforms/azure-aks/scripts/... is an unfamiliar path inside an unfamiliar shape). This PLAN replaces the three manual invocations with:
uis platform up azure-aks
Per Q9: naive chain — all three lifecycle scripts run on every invocation. All three are idempotent today, so warm runs are fast no-ops with visible logging per Q10 (no --force / --skip-* flags needed).
Out of Scope
- The
downwrapper — PLAN #4. - The
cleancommand for wiping.uis.secrets/cloud-accounts/azure-default.envpost-down— deferred per Q12. --non-interactiveflag — there are no interactive prompts inup(everything reads from the env file written byinit), so this is moot. The destroy-confirmation pattern from03-destroy.shdoesn't apply.- Changing the underlying lifecycle scripts.
upcalls them as-is. Any improvements to00-bootstrap-state.sh/01-apply.sh/02-post-apply.share separate PRs. - A progress UX over
tofu apply's output — Q10 says always have output, no spinners.tofu applyalready streams per-resource output;uponly adds inter-step banners. - Adding
upfor other platforms (gke/eks/azure-microk8s/microk8s-rpi). The dispatcher infrastructure (cmd_platform_updiscoveringplatforms/<provider>/scripts/up.sh) makes future additions cheap, but only AKS is in scope now.
Phase 1: Create platforms/azure-aks/scripts/up.sh
The chain orchestrator. Mirrors init.sh's shape (banner + preflight + delegate + summary) but the delegation is to the three existing lifecycle scripts in sequence.
Tasks
-
1.1 Create
platforms/azure-aks/scripts/up.sh:#!/bin/bash
# up.sh — Provision an AKS cluster end-to-end (PLAN #3 of INVESTIGATE-platform-aks-novice-onboarding.md).
#
# Entry point: uis platform up azure-aks
# Chains the three existing lifecycle scripts in order, with inter-step
# banners per the always-have-output principle. All three are idempotent,
# so warm runs are fast no-ops with visible logging.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="${UIS_REPO_ROOT:-$(cd "$SCRIPT_DIR/../../.." && pwd)}"
ENV_FILE="$REPO_ROOT/.uis.secrets/cloud-accounts/azure-default.env"
# Q11 — refuse with a pointer if init has not been run.
if [[ ! -f "$ENV_FILE" ]]; then
echo "✗ No config file found at $ENV_FILE" >&2
echo " Run 'uis platform init azure-aks' first to set one up." >&2
exit 1
fi
# Make AZURE_* available to the lifecycle scripts.
set -a
# shellcheck source=/dev/null
source "$ENV_FILE"
set +a
echo "═══════════════════════════════════════════════════════════"
echo " AKS cluster provisioning"
echo " (uis platform up azure-aks)"
echo " Subscription: ${AZURE_SUBSCRIPTION_ID:-unset}"
echo " Region: ${AZURE_REGION:-unset}"
echo "═══════════════════════════════════════════════════════════"
echo
echo "⚠ This will create or update Azure resources and may incur cost (~€1/day)."
echo " Run 'uis platform down azure-aks' to tear down when finished."
echo
echo "▶ 1/3 Bootstrap remote tofu state (Azure storage account + container)..."
"$SCRIPT_DIR/00-bootstrap-state.sh"
echo
echo "▶ 2/3 Apply cluster (tofu apply against platforms/azure-aks/tofu/)..."
"$SCRIPT_DIR/01-apply.sh"
echo
echo "▶ 3/3 Post-apply (kubeconfig merge + storage-class aliases + Traefik)..."
"$SCRIPT_DIR/02-post-apply.sh"
echo
echo "═══════════════════════════════════════════════════════════"
echo " ✓ AKS cluster is up"
echo "═══════════════════════════════════════════════════════════"
echo " Try: kubectl get nodes"
echo " uis deploy nginx"
echo
echo " Tear down: uis platform down azure-aks (PLAN #4 — not yet shipped)"
echo " ./platforms/azure-aks/scripts/03-destroy.sh (works today)" -
1.2
chmod +x platforms/azure-aks/scripts/up.sh
Validation (Phase 1)
- 1.3
bash -n platforms/azure-aks/scripts/up.shparses cleanly. - 1.4 Script runs the three lifecycle scripts in order. Verified at Phase 3 by the tester. — talk46 R3
▶ 1/3 / ▶ 2/3 / ▶ 3/3banners fired in order, cluster came up successfully.
Phase 2: Wire up into the dispatcher
cmd_platform in provision-host/uis/manage/uis-cli.sh currently handles up/down with a "not yet implemented" placeholder (PLAN #2 c780a74). This phase removes the placeholder for up and routes it through a new cmd_platform_up that mirrors cmd_platform_init.
Tasks
-
2.1 In
cmd_platform, replace theup|down)joint placeholder with two distinct cases:up)
cmd_platform_up "$@"
;;
down)
log_error "'uis platform down' is not yet implemented"
{ ... } >&2
exit "$EXIT_GENERAL_ERROR"
;;(The
downplaceholder gets its own block now thatupanddownship separately. Keep the same lifecycle-script fallback hint indown's placeholder.) -
2.2 Add
cmd_platform_upaftercmd_platform_init. It's structurally identical tocmd_platform_initbut dispatches toup.sh:cmd_platform_up() {
local provider="${1:-}"
local repo_root
repo_root="$(cd "$SCRIPT_DIR/../../.." && pwd)"
if [[ -z "$provider" ]]; then
log_error "Usage: uis platform up <provider>"
{ _list_available_platforms_with_script up.sh "$repo_root"; } >&2
exit "$EXIT_GENERAL_ERROR"
fi
local script="$repo_root/platforms/$provider/scripts/up.sh"
if [[ ! -f "$script" ]]; then
log_error "Platform '$provider' has no up.sh (looked at $script)"
{ _list_available_platforms_with_script up.sh "$repo_root"; } >&2
exit "$EXIT_GENERAL_ERROR"
fi
export UIS_REPO_ROOT="$repo_root"
exec "$script"
}Note: lifting the "list available platforms with script X" logic into a helper
_list_available_platforms_with_scriptavoids duplicating the loop acrosscmd_platform_init(looks forinit.sh) andcmd_platform_up(looks forup.sh). Tiny refactor — same shape, new arg for the script name:_list_available_platforms_with_script() {
local target_script="$1"
local repo_root="$2"
echo "Available platforms:"
local p script_path
for script_path in "$repo_root"/platforms/*/scripts/"$target_script"; do
[[ -f "$script_path" ]] || continue
p=$(basename "$(dirname "$(dirname "$script_path")")")
echo " - $p"
done
}Refactor
cmd_platform_init's two suggestion blocks to use this helper too (soinitlists platforms-with-init.sh anduplists platforms-with-up.sh, both via the same code). -
2.3 Update
cmd_help'sPlatform:section to remove the "PLAN #3 — not yet implemented" tag from theupline. Thedownline keeps its "PLAN #4 — not yet implemented" tag.
Validation (Phase 2)
- 2.4
bash -n provision-host/uis/manage/uis-cli.shparses cleanly. - 2.5 Smoke-test inside the rebuilt local container:
uis platform up(no provider) — usage + available platforms (azure-aks only), exit non-zero.uis platform up nonexistent— "Platform 'nonexistent' has no up.sh..." + available list, exit non-zero.uis platform up azure-akswith no env file — refuses with theuis platform init azure-akspointer, exit non-zero. (can be self-tested by deletingazure-default.envbefore invoking; no Azure call attempted.)uis platform down azure-aks— still prints "not yet implemented" placeholder.uis helpshowsplatform up <provider> Provision the clusterwithout the "not yet implemented" tag;downstill tagged.
Phase 3: Tester verification (talk.md round)
This is the load-bearing test of the entire AKS path. Up until now, every PR has been verifiable without provisioning a real cluster. PLAN #3 changes that: up creates an actual AKS cluster, which costs real money (~€1/day for a 1-node Standard_B2s_v2 cluster). The tester must spend ~€1–2 to fully verify cold-run + warm-run + (Phase 4 will tear it down at the end).
Tasks
- 3.1 File the verification round at
testing/uis1/talk/talk.md, archiving the current talk.md (theinitround) astalk43.md. - 3.2 Tester rounds (talk44 pre-merge against
:local, talk45/talk46 post-merge against:latest):- R0 — local image preflight; confirm
platforms/azure-aks/scripts/up.shis in the running container. - R1 — dispatcher error paths:
uis platform up(no provider),uis platform up nonexistent,uis platform up azure-akswith the env file deleted (Q11 refusal). All non-zero exits with the expected messages. - R2 — cold run against a real Azure subscription. Run
uis platform up azure-aks. Expect ~10–15 minutes:- Inter-step banners
▶ 1/3 Bootstrap...,▶ 2/3 Apply...,▶ 3/3 Post-apply... 00-bootstrap-state.sh: ~10s if state RG exists from prior work, ~30-60s otherwise.01-apply.sh:tofu applystreams per-resource output. RG creation → cluster creation → wait for cluster Ready → ~8–12 minutes.02-post-apply.sh: kubeconfig merge, storage-class aliases applied, Traefik installed.- Final banner with
Try: kubectl get nodeshint. Verify after:kubectl get nodeslists the one Standard_B2s_v2 node,kubectl get pods -n traefikshows Traefik running.
- Inter-step banners
- R3 — warm run immediately after R2. Same command. Expect:
- All three scripts run again (Q9 naive chain).
00-bootstrap-state.sh: "state RG already exists, skipping creation".01-apply.sh:tofu apply — no changesin ~30s.02-post-apply.sh: kubeconfig already merged, storage classes already aliased, Traefik already installed.- Final banner. Total elapsed should be ~1–2 minutes.
- R4 — deploy verification:
uis deploy nginxagainst the live AKS cluster. Confirm the cluster-config flip from R2 routed the deploy to AKS (per INVESTIGATE-active-cluster-visibility-ux.md, this is currently a silent-failure mode; layered visibility lands later).kubectl get pods -n nginxshows the pod running. - R5 — cost reassurance + tear-down hint visible: confirm the wizard printed
⚠ This will create or update Azure resources and may incur cost (~€1/day)before any actual API call. After R2-R4, run./platforms/azure-aks/scripts/03-destroy.shmanually (PLAN #4'sdownwrapper isn't shipped yet). Cluster goes away, cost stops.
- R0 — local image preflight; confirm
Validation (Phase 3)
- 3.3 Tester closes R0–R3 green (R2 is the load-bearing one). R4 + R5 nice-to-have. — All closed across talk44/45/46; talk46 R3 ran R2 (cold cycle) + R4 (
uis deploy nginxon AKS) + R5 (real tear-down).
Verification gate before merge
- All Phase 1/2
bash -nchecks pass. - Tester closes Phase 3 R2 (cold run end-to-end against real Azure) at minimum. — talk46 R3.
- Local Docusaurus build clean for this PLAN file.
- PR description includes the cold-run transcript from R2 (proves the chain works end-to-end).
- PR description includes the tear-down output from R5 (proves the verification environment is clean — no cluster left running and incurring cost).
What this PLAN deliberately does NOT do
- Add a
--forceor--skip-*flag. Q9: naive chain is the answer. The few seconds of warm-run "already exists" output is the cost of not having a flag the novice has to learn. - Wrap
tofu apply's output in a spinner or progress bar. Q10. Stream the per-resource output through unchanged. - Try to detect "is this a cold or warm run" upfront to skip steps. Each script self-determines that internally; the wrapper just runs all three.
- Cache kubeconfig flips across runs.
02-post-apply.shre-flips on every invocation. If the user manually changed kubectl context betweenupruns, the post-apply will restore it — acceptable behavior sinceupis the "I want this AKS cluster to be the active target" command. - Add cost-confirmation Y/N prompts. The cold-run cost is ~€1/day, visible in the banner before the API calls. A prompt every time is friction; users who don't want the cost don't run
up.
Related
- INVESTIGATE-platform-aks-novice-onboarding.md — parent investigation. Q8, Q9, Q10, Q11 directly inform this PLAN.
- PLAN-uis-tools-install-azure-aks.md — PLAN #1 (PR #154, merged).
- PLAN-uis-platform-init-azure-aks.md — PLAN #2 (PR #155, merged).
initwrites the env file thatupreads. - INVESTIGATE-active-cluster-visibility-ux.md — once
uplands and the operator has 2+ clusters (rancher-desktop + azure-aks), Layer 1's per-command banner + Layer 4'suis platform list/usebecome the next safety problem. R4 of Phase 3 explicitly notes this. platforms/azure-aks/scripts/{00-bootstrap-state,01-apply,02-post-apply}.sh— the three lifecycle scriptsup.shchains. Unchanged in this PR.- Next: PLAN #4 —
./uis platform down azure-akspass-through to03-destroy.sh. Trivial; under a day.