Plan: Move AKS config to .uis.secrets/cloud-accounts/azure-default.env
IMPLEMENTATION RULES: Before implementing this plan, read and follow:
- WORKFLOW.md - The implementation process
- PLANS.md - Plan structure and best practices
Status: ✅ Completed (2026-05-08, verification reframed 2026-05-11)
Shipped in: PR #146 (Phases 1–5 — the file-structure restructure).
Verified end-to-end: talk46 R3 (2026-05-11) ran uis platform up azure-aks → uis deploy nginx → uis platform down azure-aks against the post-restructure file layout. The wizard (PLAN-uis-platform-init-azure-aks.md, PR #155) writes .uis.secrets/cloud-accounts/azure-default.env directly, replacing this PLAN's originally-manual cp + nano flow in Phase 6.
Goal: Replace the bash-file-in-tree config (platforms/azure-aks/azure-aks-config.sh) with the documented .uis.secrets/cloud-accounts/azure-default.env convention. Single user-edited file; defaults visible inline as commented overrides; scripts use ${VAR:-default} shell fallback. Aligns AKS with the cluster-secret pattern that secrets.md already documents.
Last Updated: 2026-05-11
Related:
- INVESTIGATE-system-platform-provisioning-layer.md — Step 1 scope.
- PLAN-001-aks-step1-verification.md — sibling PLAN, both shipped 2026-05-11. PLAN-001's Phase 2 verified the new file structure end-to-end.
- PLAN-platform-aks-001b-manual-setup.md — Phase 4 references the bash file; this PLAN updated it.
- Secrets architecture doc: names cloud-accounts/azure-default.env as the existing pattern.
Sequence: this PLAN is the small restructure that lands first. The follow-up wizard (./uis target add aks) is a separate larger PLAN that builds on top of the file structure this PLAN locks in.
Problem Summary
platforms/azure-aks/azure-aks-config.sh is a bash file the operator copies from azure-aks-config.sh-template, fills in, and saves in-tree (gitignored at the platform-aks level only). It mixes Azure-account identity (tenant/subscription IDs that should sit alongside cluster-secret overrides under .uis.secrets/) with cluster-shape defaults (node size, autoscaler bounds), and uses unprefixed variable names (TENANT_ID) that don't match the AZURE_* convention the existing cloud-accounts/azure.env.template already uses.
The secrets architecture doc (website/docs/contributors/architecture/secrets.md) documents cloud-accounts/azure-default.env as the canonical home for Azure cloud-account values, complete with a path helper (get_cloud_credentials_path "azure"). AKS is the first concrete consumer of that pattern; this PLAN slots it in.
No split: a single user-edited file (.uis.secrets/cloud-accounts/azure-default.env) holds everything from required identity values down to optional cluster-shape overrides, with defaults commented inline so the operator sees them at the point of editing. Scripts source the file then use ${VAR:-default} to fall back when the operator leaves something out.
Phase 1: Extend azure.env.template with AKS-specific additions
The existing template at provision-host/uis/templates/uis.secrets/cloud-accounts/azure.env.template only carries AZURE_TENANT_ID + AZURE_SUBSCRIPTION_ID (plus a commented service-principal block we don't use yet). Extend it.
Tasks
- 1.1 Edit provision-host/uis/templates/uis.secrets/cloud-accounts/azure.env.template. Keep the existing two required vars and the service-principal commented block. Add the following sections, each clearly grouped:
# === REQUIRED — fill these in =====================================
AZURE_TENANT_ID=""
AZURE_SUBSCRIPTION_ID=""
# Globally-unique state storage account name (3-24 lowercase chars).
# Verify availability with: az storage account check-name --name <candidate>
AZURE_STATE_STORAGE_ACCOUNT=""
# === OPTIONAL — Azure tags for cost tracking ======================
# Defaults to your sign-in email if left empty.
# AZURE_TAG_BUSINESS_OWNER=""
# AZURE_TAG_IT_OWNER=""
# AZURE_TAG_COST_CENTER="helpers-no"
# === OPTIONAL — AKS cluster-shape overrides =======================
# Uncomment to override the defaults shown alongside.
# AZURE_AKS_LOCATION="westeurope"
# AZURE_AKS_RESOURCE_GROUP="rg-urbalurba-aks-weu"
# AZURE_AKS_CLUSTER_NAME="azure-aks"
# AZURE_AKS_NODE_SIZE="Standard_B2ms"
# AZURE_AKS_NODE_COUNT=1
# AZURE_AKS_MIN_COUNT=1
# AZURE_AKS_MAX_COUNT=3
# AZURE_AKS_OS_DISK_SIZE=30
# === OPTIONAL — OpenTofu state backend layout =====================
# AZURE_AKS_STATE_RESOURCE_GROUP="rg-urbalurba-tfstate"
# AZURE_AKS_STATE_CONTAINER="tfstate"
# AZURE_AKS_STATE_KEY="aks/terraform.tfstate"
- 1.2 Update the file's leading comment to reflect: copy to .uis.secrets/cloud-accounts/azure-default.env, edit, save. Mention that the AKS sections only matter if the user is provisioning AKS; Azure CLI itself only needs the tenant/subscription pair.
Validation
Static check: bash -n provision-host/uis/templates/uis.secrets/cloud-accounts/azure.env.template parses clean. Visual review: every AKS variable from the current platforms/azure-aks/azure-aks-config.sh-template has a counterpart with AZURE_AKS_ prefix (or AZURE_ for genuinely account-scoped ones).
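The visual review above could be backed by a small script. The following is a minimal sketch, assuming every template assignment starts its line, optionally behind a leading "# " for commented-out overrides. The helper names template_vars and unprefixed_template_vars are hypothetical and not part of the repo.

```shell
#!/usr/bin/env bash
# Sketch only: template_vars and unprefixed_template_vars are hypothetical helpers.

# List every variable a template assigns, including commented-out overrides
# such as '# AZURE_AKS_LOCATION="westeurope"'.
template_vars() {
  # Optional '#', optional whitespace, then NAME= at the start of the line.
  sed -nE 's/^#?[[:space:]]*([A-Z][A-Z0-9_]*)=.*/\1/p' "$1" | sort -u
}

# Print the names that lack the AZURE_ prefix; prints nothing when all conform.
unprefixed_template_vars() {
  template_vars "$1" | grep -v '^AZURE_' || true
}
```

Calling unprefixed_template_vars on the extended template should print nothing if every variable follows the AZURE_ or AZURE_AKS_ convention.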
Phase 2: Update platforms/azure-aks/scripts/*.sh to source from cloud-accounts/
All four scripts currently source $SCRIPT_DIR/../azure-aks-config.sh. Replace with the path-helper-resolved location plus inline defaults.
Tasks
- 2.1 In each of 00-bootstrap-state.sh, 01-apply.sh, 02-post-apply.sh, 03-destroy.sh, replace the existing config-source block with:
# Source the cloud-accounts helper for the path resolver
source "/mnt/urbalurbadisk/provision-host/uis/lib/paths.sh"
CONFIG_FILE="$(get_cloud_credentials_path azure)"
if [[ ! -f "$CONFIG_FILE" ]]; then
print_error "Azure cloud-account config not found: $CONFIG_FILE"
echo "Copy the template first:"
echo " cp provision-host/uis/templates/uis.secrets/cloud-accounts/azure.env.template $CONFIG_FILE"
echo " # then fill in AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID, AZURE_STATE_STORAGE_ACCOUNT"
exit 1
fi
source "$CONFIG_FILE"
# Validate required values are set
: "${AZURE_TENANT_ID:?Required in $CONFIG_FILE}"
: "${AZURE_SUBSCRIPTION_ID:?Required in $CONFIG_FILE}"
: "${AZURE_STATE_STORAGE_ACCOUNT:?Required in $CONFIG_FILE}"
# Apply inline defaults for optional cluster-shape values
AZURE_AKS_LOCATION="${AZURE_AKS_LOCATION:-westeurope}"
AZURE_AKS_RESOURCE_GROUP="${AZURE_AKS_RESOURCE_GROUP:-rg-urbalurba-aks-weu}"
AZURE_AKS_CLUSTER_NAME="${AZURE_AKS_CLUSTER_NAME:-azure-aks}"
AZURE_AKS_NODE_SIZE="${AZURE_AKS_NODE_SIZE:-Standard_B2ms}"
AZURE_AKS_NODE_COUNT="${AZURE_AKS_NODE_COUNT:-1}"
AZURE_AKS_MIN_COUNT="${AZURE_AKS_MIN_COUNT:-1}"
AZURE_AKS_MAX_COUNT="${AZURE_AKS_MAX_COUNT:-3}"
AZURE_AKS_OS_DISK_SIZE="${AZURE_AKS_OS_DISK_SIZE:-30}"
AZURE_AKS_STATE_RESOURCE_GROUP="${AZURE_AKS_STATE_RESOURCE_GROUP:-rg-urbalurba-tfstate}"
AZURE_AKS_STATE_CONTAINER="${AZURE_AKS_STATE_CONTAINER:-tfstate}"
AZURE_AKS_STATE_KEY="${AZURE_AKS_STATE_KEY:-aks/terraform.tfstate}"
# Tags default to the signed-in user's email if not overridden
if [[ -z "${AZURE_TAG_BUSINESS_OWNER:-}" ]] || [[ -z "${AZURE_TAG_IT_OWNER:-}" ]]; then
_SIGNED_IN_EMAIL=$(az ad signed-in-user show --query userPrincipalName -o tsv 2>/dev/null || echo "")
AZURE_TAG_BUSINESS_OWNER="${AZURE_TAG_BUSINESS_OWNER:-${_SIGNED_IN_EMAIL}}"
AZURE_TAG_IT_OWNER="${AZURE_TAG_IT_OWNER:-${_SIGNED_IN_EMAIL}}"
fi
AZURE_TAG_COST_CENTER="${AZURE_TAG_COST_CENTER:-helpers-no}"
# Derived (do not change)
KUBECONFIG_FILE="/mnt/urbalurbadisk/kubeconfig/${AZURE_AKS_CLUSTER_NAME}-kubeconf"
- 2.2 Search-and-replace the rest of each script: every reference to the old unprefixed variables ($TENANT_ID, $SUBSCRIPTION_ID, $RESOURCE_GROUP, $CLUSTER_NAME, $LOCATION, $NODE_COUNT, $NODE_SIZE, $MIN_COUNT, $MAX_COUNT, $OS_DISK_SIZE, $STATE_*, $TAG_*) becomes the prefixed equivalent ($AZURE_TENANT_ID, $AZURE_AKS_RESOURCE_GROUP, etc.).
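To see how the two expansions in task 2.1 behave, here is a minimal sketch run against a throwaway file rather than the real config. The names demo_config and load_demo_config are hypothetical, and the values are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: demo_config and load_demo_config are hypothetical names.
demo_config="$(mktemp)"
cat > "$demo_config" <<'EOF'
AZURE_TENANT_ID="00000000-0000-0000-0000-000000000000"
AZURE_SUBSCRIPTION_ID=""
# AZURE_AKS_NODE_SIZE="Standard_B2ms"
EOF

load_demo_config() (
  # The parentheses run the body in a subshell, so a failed ':?' check
  # ends only this subshell and not the calling script.
  . "$demo_config"
  # ':?' treats an empty value the same as an unset one.
  : "${AZURE_TENANT_ID:?Required in $demo_config}"
  : "${AZURE_SUBSCRIPTION_ID:?Required in $demo_config}"
  # ':-' supplies the default because the override is still commented out.
  echo "node size: ${AZURE_AKS_NODE_SIZE:-Standard_B2ms}"
)

if load_demo_config 2>/dev/null; then
  echo "config accepted"
else
  echo "config rejected: a required value is unset or empty"
fi
```

Because AZURE_SUBSCRIPTION_ID is empty in the sample file, the run takes the "config rejected" branch. Filling it in lets the function reach the ${VAR:-default} line.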
Validation
shellcheck platforms/azure-aks/scripts/*.sh reports no findings. grep -rnE "\\\$(TENANT_ID|SUBSCRIPTION_ID|CLUSTER_NAME|RESOURCE_GROUP|LOCATION|NODE_COUNT|NODE_SIZE|MIN_COUNT|MAX_COUNT|OS_DISK_SIZE|STATE_(RESOURCE_GROUP|STORAGE_ACCOUNT|CONTAINER|KEY)|TAG_)" platforms/azure-aks/scripts/ returns no hits, meaning every var is now AZURE_*-prefixed. The -r flag is required because the target is a directory.
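The grep check only catches the bare $VAR form. A sketch of a scan that also catches the braced ${VAR} form follows. It assumes the old names are the ones listed in task 2.2. find_unprefixed_refs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: find_unprefixed_refs is a hypothetical helper.
find_unprefixed_refs() {
  local dir="$1"
  local names='TENANT_ID|SUBSCRIPTION_ID|RESOURCE_GROUP|CLUSTER_NAME|LOCATION|NODE_COUNT|NODE_SIZE|MIN_COUNT|MAX_COUNT|OS_DISK_SIZE|STATE_[A-Z_]+|TAG_[A-Z_]+'
  # '\$\{?' matches both $VAR and ${VAR}. The name must follow the '$'
  # (or '${') directly, so $AZURE_TENANT_ID does not match.
  grep -rnE '\$\{?('"${names}"')' "$dir"
}
```

grep exits non-zero when nothing matches, so the function can be used directly in a condition: a zero exit status means leftovers were found.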
Phase 3: Bash → tofu variable translation at the apply boundary
tofu/main.tf and tofu/variables.tf keep their existing unprefixed names (tenant_id, subscription_id, node_count, cluster_name, etc.) per Q-P — no tofu rename. The translation happens in 01-apply.sh when generating tofu/terraform.tfvars.
Tasks
- 3.1 In 01-apply.sh, update the cat > "$TFVARS_FILE" <<EOF block so each tfvars line maps from the prefixed bash var to the unprefixed tofu var:
cat > "$TFVARS_FILE" <<EOF
# Auto-generated by 01-apply.sh from .uis.secrets/cloud-accounts/azure-default.env — do not edit manually
tenant_id = "$AZURE_TENANT_ID"
subscription_id = "$AZURE_SUBSCRIPTION_ID"
resource_group = "$AZURE_AKS_RESOURCE_GROUP"
cluster_name = "$AZURE_AKS_CLUSTER_NAME"
location = "$AZURE_AKS_LOCATION"
node_count = $AZURE_AKS_NODE_COUNT
node_size = "$AZURE_AKS_NODE_SIZE"
min_count = $AZURE_AKS_MIN_COUNT
max_count = $AZURE_AKS_MAX_COUNT
os_disk_size_gb = $AZURE_AKS_OS_DISK_SIZE
tag_cost_center = "$AZURE_TAG_COST_CENTER"
tag_project = "urbalurba-infrastructure"
tag_environment = "Sandbox"
tag_business_owner = "$AZURE_TAG_BUSINESS_OWNER"
tag_it_owner = "$AZURE_TAG_IT_OWNER"
EOF
tag_project and tag_environment are baked into the script: they're code-level defaults, not user-overridable for now. If a contributor later wants to override them, expand the override surface then.
- 3.2 Update 01-apply.sh's tofu init backend-config args and ARM_ACCESS_KEY fetch to use the prefixed bash var names:
export ARM_ACCESS_KEY=$(az storage account keys list \
--resource-group "$AZURE_AKS_STATE_RESOURCE_GROUP" \
--account-name "$AZURE_STATE_STORAGE_ACCOUNT" \
--query "[0].value" -o tsv)
tofu init \
-backend-config="resource_group_name=$AZURE_AKS_STATE_RESOURCE_GROUP" \
-backend-config="storage_account_name=$AZURE_STATE_STORAGE_ACCOUNT" \
-backend-config="container_name=$AZURE_AKS_STATE_CONTAINER" \
-backend-config="key=$AZURE_AKS_STATE_KEY" \
-reconfigure
Validation
shellcheck platforms/azure-aks/scripts/01-apply.sh parses clean. The generated tofu/terraform.tfvars (after a dry run sourcing a sample config) has unprefixed keys, matching tofu/variables.tf.
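The dry run mentioned above might look like the sketch below. It renders a trimmed-down tfvars file from placeholder values and lists the resulting keys. render_sample_tfvars and tfvars_keys are hypothetical helpers, and only three of the variables are included to keep the example short.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and sample values are placeholders.
render_sample_tfvars() {
  local out="$1"
  local AZURE_TENANT_ID="sample-tenant"
  local AZURE_AKS_CLUSTER_NAME="azure-aks"
  local AZURE_AKS_NODE_COUNT=1
  # Prefixed bash names on the right, unprefixed tofu names on the left.
  cat > "$out" <<EOF
tenant_id    = "$AZURE_TENANT_ID"
cluster_name = "$AZURE_AKS_CLUSTER_NAME"
node_count   = $AZURE_AKS_NODE_COUNT
EOF
}

tfvars_keys() {
  # Print the key on the left of each 'key = value' line.
  sed -nE 's/^[[:space:]]*([a-z_][a-z0-9_]*)[[:space:]]*=.*/\1/p' "$1"
}

tfvars_file="$(mktemp)"
render_sample_tfvars "$tfvars_file"
if tfvars_keys "$tfvars_file" | grep -qiE '^azure_'; then
  echo "prefixed key leaked into tfvars"
else
  echo "all keys unprefixed"
fi
```

The key list from tfvars_keys can then be compared by eye, or with diff, against the variable names declared in tofu/variables.tf.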
Phase 4: Delete the obsolete azure-aks-config.sh-template
Once Phases 1–3 ship, platforms/azure-aks/azure-aks-config.sh-template is unused. Delete it so contributors don't accidentally edit the wrong file.
Tasks
- 4.1 git rm platforms/azure-aks/azure-aks-config.sh-template.
- 4.2 Update platforms/azure-aks/README.md (if it references the old template) to point at .uis.secrets/cloud-accounts/azure-default.env and the new template at provision-host/uis/templates/uis.secrets/cloud-accounts/azure.env.template.
- 4.3 Search-and-replace any other in-tree references: grep -rn "azure-aks-config\.sh" --include="*.sh" --include="*.md". Every hit becomes a reference to either the new template path or the new user-file path.
Validation
grep -rn "azure-aks-config.sh" . returns no hits anywhere outside this PLAN's history. The old template file is gone.
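Since the plan documents themselves legitimately mention the old file name, the check is easier to automate if they are excluded. This is a minimal sketch, assuming plan history lives under a directory named plans (as in website/docs/ai-developer/plans/). find_old_config_refs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: find_old_config_refs is a hypothetical helper.
find_old_config_refs() {
  # -F treats the file name as a literal string, so the dots are not wildcards.
  grep -rnF \
    --include='*.sh' --include='*.md' \
    --exclude-dir='.git' --exclude-dir='plans' \
    'azure-aks-config.sh' "${1:-.}"
}
```

Run from the repository root, the function prints any remaining reference and exits non-zero when there are none.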
Phase 5: Update PLAN-001b Phase 4 + variable-mapping table
PLAN-001b's Phase 4 ("Configuration") still describes the old platforms/azure-aks/azure-aks-config.sh flow. Update it to reflect the new file location and prefixed variable names. The variable-mapping table in Phase 3 (where each Phase 3 step's output maps to a Phase 4 variable) also needs the AZURE_* rename.
Tasks
- 5.1 In PLAN-platform-aks-001b-manual-setup.md Phase 4, replace the cp ... azure-aks-config.sh instruction with cp provision-host/uis/templates/uis.secrets/cloud-accounts/azure.env.template .uis.secrets/cloud-accounts/azure-default.env and the corresponding edit-and-source flow. Update the example values block to use AZURE_* prefixed names.
- 5.2 In Phase 3's "What you should now have written down" table, rename the variable column entries (TENANT_ID → AZURE_TENANT_ID, SUBSCRIPTION_ID → AZURE_SUBSCRIPTION_ID, LOCATION → AZURE_AKS_LOCATION, NODE_SIZE → AZURE_AKS_NODE_SIZE, STATE_STORAGE_ACCOUNT → AZURE_STATE_STORAGE_ACCOUNT, TAG_* → AZURE_TAG_*).
- 5.3 In Phase 4's git-ignore check, replace git check-ignore -v platforms/azure-aks/azure-aks-config.sh with git check-ignore -v .uis.secrets/cloud-accounts/azure-default.env (which is gitignored as part of the whole .uis.secrets/ tree).
Validation
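grep -nP "azure-aks-config\.sh|(?<!AZURE_)TENANT_ID" website/docs/ai-developer/plans/backlog/PLAN-platform-aks-001b-manual-setup.md (or equivalent inspection) shows zero hits: every mention is the new path/name. The -P flag (GNU grep) is needed because lookarounds are PCRE syntax that -E does not support, and the lookbehind keeps AZURE_TENANT_ID itself from counting as a hit.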
Phase 6: Verification — local build of the feature branch, before merge
The merge gate for this PLAN is "the AKS run-through actually works against the new file structure." We verify that by building the feature branch's image locally on the host, recreating the running container against uis-provision-host:local, and walking through PLAN-001 Phase 2.4–2.8 against an Azure subscription. The PR merges only after the run succeeds; any failure becomes a fix on the same branch.
This is faster than waiting for CI to publish to GHCR (CI build + push takes ~12 minutes) and means the change is verified end-to-end before it ever reaches main.
Tasks
- 6.1 Switch the host checkout to the feature branch so ./uis build picks up the new code. Status: done before PR #146 merge.
git fetch && git checkout feature/aks-config-cloud-accounts
- 6.2 Build the local image with the updated scripts + template baked in. Status: done before PR #146 merge; subsequent talk44/46 rounds also rebuilt against :latest after every fix.
./uis build
Produces uis-provision-host:local.
- 6.3 Recycle the running container against the new image. Status: done.
UIS_IMAGE=uis-provision-host:local ./uis restart
./uis restart is stop + start; start_container does docker rm -f first, so the container is freshly created from :local (not the cached old :latest). The Azure CLI token in ~/.azure/ is wiped, so re-login in step 6.4.
- 6.4 Re-login to Azure inside the new container. Status: the manual az login --use-device-code flow was used in talk44; it was subsequently absorbed into uis platform init azure-aks (PR #155), which handles login + sub-select + role-check + region-pick + provider-register + env-file write in one wizard.
./uis shell
cd /mnt/urbalurbadisk
az login --use-device-code
az account set --subscription <YOUR_SUBSCRIPTION_ID>
- 6.5 Set up the new config file. Status: the manual cp + nano workflow specified here was the original Phase 6 plan, but by talk46 it was fully replaced by the uis platform init azure-aks wizard (PLAN-uis-platform-init-azure-aks.md), which writes .uis.secrets/cloud-accounts/azure-default.env directly. The outcome (a populated env file with AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID, AZURE_STATE_STORAGE_ACCOUNT, AZURE_REGION) matches.
cp provision-host/uis/templates/uis.secrets/cloud-accounts/azure.env.template \
.uis.secrets/cloud-accounts/azure-default.env
nano .uis.secrets/cloud-accounts/azure-default.env
Fill in the three required values: AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID, AZURE_STATE_STORAGE_ACCOUNT. Leave optional sections commented unless overriding.
- 6.6 Run the AKS provisioning chain end-to-end. Status: talk46 R3 ran uis platform up azure-aks (which chains the same three scripts) and uis deploy nginx against the resulting AKS cluster. Nginx's in-cluster connectivity tests passed.
./platforms/azure-aks/scripts/00-bootstrap-state.sh
./platforms/azure-aks/scripts/01-apply.sh
./platforms/azure-aks/scripts/02-post-apply.sh
./uis deploy nginx
Expected: nginx playbook's in-cluster connectivity tests (steps 13 + 15) succeed against the AKS cluster.
- 6.7 Tear down to close the cost gate. Status: talk46 R3 closed via uis platform down azure-aks. az aks list -o table was empty afterward; state RG preserved as designed.
./platforms/azure-aks/scripts/03-destroy.sh
- 6.8 If anything in 6.4–6.6 fails, fix on this same feature branch. Status: gaps that surfaced later (F1–F13) went into follow-up PRs (#155–#158) rather than feature/aks-config-cloud-accounts because they belonged to different scopes (wizard, wrappers, status command).
- 6.9 Once the run completes cleanly, squash-merge PR #146. Status: done.
gh pr merge 146 --squash --delete-branch
Then on host: git checkout main && git pull and prune the deleted remote branch. CI will rebuild and publish the merged image to GHCR; subsequent contributors use ./uis pull for their copy.
Validation
./uis deploy nginx succeeds against an AKS cluster provisioned via uis-provision-host:local built from the feature branch. Cluster cleanly destroyed afterward (cost gate). PR #146 merged only after this passes.
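The cost-gate part of this validation could be checked mechanically instead of by reading the az aks list table. This is a minimal sketch, assuming az is logged in and set to the subscription used for the run. assert_no_aks_clusters is a hypothetical helper, not an existing uis command.

```shell
#!/usr/bin/env bash
# Sketch only: assert_no_aks_clusters is a hypothetical helper.
assert_no_aks_clusters() {
  local count
  # 'length(@)' is a JMESPath expression that counts the returned clusters.
  count="$(az aks list --query 'length(@)' -o tsv)" || return 2
  if [ "$count" = "0" ]; then
    echo "cost gate closed: no AKS clusters listed"
  else
    echo "cost gate OPEN: $count AKS cluster(s) still listed" >&2
    return 1
  fi
}
```

The check covers every AKS cluster in the subscription, not just the one this run created, so a shared subscription may need a narrower query.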
Acceptance Criteria
- provision-host/uis/templates/uis.secrets/cloud-accounts/azure.env.template extended with the AKS-specific sections.
- All four platforms/azure-aks/scripts/*.sh source $(get_cloud_credentials_path azure) and use ${VAR:-default} for optional values.
- tofu/terraform.tfvars generation maps prefixed bash → unprefixed tofu names; tofu/variables.tf and tofu/main.tf unchanged.
- platforms/azure-aks/azure-aks-config.sh-template is deleted; no in-tree references remain.
- PLAN-platform-aks-001b-manual-setup.md Phase 3 + Phase 4 reflect the new location and variable names.
- Tester can complete PLAN-001 Phase 2 (the AKS run-through) end-to-end against the new structure. Status: talk46 R3.
- This plan is in completed/. Status: done 2026-05-11.
Files to Modify
- provision-host/uis/templates/uis.secrets/cloud-accounts/azure.env.template (extend)
- platforms/azure-aks/scripts/00-bootstrap-state.sh
- platforms/azure-aks/scripts/01-apply.sh
- platforms/azure-aks/scripts/02-post-apply.sh
- platforms/azure-aks/scripts/03-destroy.sh
- platforms/azure-aks/azure-aks-config.sh-template (delete)
- platforms/azure-aks/README.md (if it cites the old template)
- website/docs/ai-developer/plans/backlog/PLAN-platform-aks-001b-manual-setup.md
- website/docs/ai-developer/plans/active/PLAN-aks-config-cloud-accounts.md → completed/ (Phase 6)
Implementation Notes
- Why no platforms/azure-aks/defaults.env. Per the discussion that produced this PLAN: a separate "platform defaults" file added a layer without a load-bearing reason. Defaults live commented-inline in the template (so the operator sees them at the point of editing) and as ${VAR:-default} in the scripts (so they apply when the operator leaves things out). One file for the operator to read, one place per script for the fallback.
- Why AZURE_AKS_* prefix on AKS-specific values. The single-file-per-provider model means future Azure-but-not-AKS work (e.g. Azure Container Apps if anyone needs them) would also live in azure-default.env. Distinct prefixes keep the namespaces clean. Account-level values (tenant/subscription/state-SA-name/tags) keep the shorter AZURE_* prefix since they're shared across any Azure work.
- Why no tofu rename. tofu/variables.tf is internal to the OpenTofu module; it has its own namespace. The bash-to-tofu translation already happens via terraform.tfvars generation; renaming inside tofu would be churn without payoff.
- Why KUBECONFIG_FILE derives from $AZURE_AKS_CLUSTER_NAME. Old template hard-coded azure-aks-kubeconf. New form accommodates a contributor running multiple clusters with different AZURE_AKS_CLUSTER_NAME values without overwriting kubeconfigs. Path stays under /mnt/urbalurbadisk/kubeconfig/ (unchanged); moving to .uis.secrets/generated/kubeconfig/ is a separate concern tied to the kubeconfig-merge work flagged in secrets.md.
- Sequencing: verify before merge. Phase 6 builds the feature branch locally on the host (./uis build) and runs the entire AKS provisioning chain against uis-provision-host:local before the PR merges. This is faster than the CI loop (~12 min for GHCR build/push) and means the change is verified end-to-end before it lands on main. Any gap surfaced in Phase 6 is a fix on the same branch + a fresh ./uis build cycle. Merge happens at task 6.9 only after task 6.6 (./uis deploy nginx) succeeds and 6.7 (destroy) closes the cost gate.
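The kubeconfig note can be illustrated with a short sketch of the per-cluster path derivation. kubeconfig_path_for is a hypothetical helper and aks-dev is a made-up second cluster name. The directory and the -kubeconf suffix are the ones used in task 2.1.

```shell
#!/usr/bin/env bash
# Sketch only: kubeconfig_path_for is a hypothetical helper.
kubeconfig_path_for() {
  # Fall back to the default cluster name when none is passed.
  local cluster_name="${1:-azure-aks}"
  echo "/mnt/urbalurbadisk/kubeconfig/${cluster_name}-kubeconf"
}

kubeconfig_path_for            # default cluster name
kubeconfig_path_for "aks-dev"  # a second cluster gets its own file
```

Two different AZURE_AKS_CLUSTER_NAME values therefore resolve to two different kubeconfig files, which is the overwrite protection the note describes.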