INVESTIGATE: Platform Provisioning Layer
IMPLEMENTATION RULES: Before implementing this plan, read and follow:
- WORKFLOW.md - The implementation process
- PLANS.md - Plan structure and best practices
Created: 2026-04-09
Updated: 2026-04-09
Status: ACTIVE — AKS Step 1 in progress on feature/platform-aks-opentofu
Background
UIS currently has a hosts/ directory containing bash scripts and az CLI commands
that set up Kubernetes clusters on various targets. These scripts predate a consistent
platform model and have grown organically — each target has different conventions,
different secret handling approaches, and different levels of automation maturity.
The idea is to introduce a platforms/ layer that replaces hosts/ with a consistent,
OpenTofu-based approach for cloud targets and an Ansible/cloud-init approach for VM and
bare-metal targets. Every platform gets a working Kubernetes cluster that is ready to
receive UIS services.
The first concrete output of this investigation is platforms/aks/ (Step 1),
which provisions AKS via OpenTofu with Azure Blob Storage remote state.
What is the Platform Layer?
The platform layer is everything that must exist before ./uis deploy <service> can run.
It is not part of UIS service management — it is the substrate that UIS runs on.
For a cloud-managed cluster (AKS, GKE, EKS), the platform layer:
- Creates cloud resources (resource group, cluster, networking)
- Writes a kubeconfig
- Applies cross-cluster compatibility shims (storage class aliases)
- Installs Traefik (ingress)
For a VM or bare-metal cluster (MicroK8s), the platform layer:
- Generates and applies cloud-init
- Creates the VM or configures the physical machine
- Bootstraps MicroK8s
- Connects via Ansible for post-boot configuration
The platform layer is intentionally separate from the UIS service layer. Once
02-post-apply.sh (or equivalent) finishes, the cluster is identical from UIS's
perspective regardless of what created it.
Target Platforms
| Target | Type | Tooling | Status |
|---|---|---|---|
aks | Azure managed K8s | OpenTofu + bash | Step 1 in progress |
gke | GCP managed K8s | OpenTofu + bash | Not started |
eks | AWS managed K8s | OpenTofu + bash | Not started |
microk8s-vm | Ubuntu VM (any cloud) | cloud-init + Ansible | Exists in hosts/ — needs migration |
microk8s-metal | Ubuntu bare metal | cloud-init + Ansible | Exists partially in hosts/ |
microk8s-rpi | Raspberry Pi | cloud-init + Ansible | Exists partially in hosts/ |
Directory Structure (Target)
platforms/
aks/ ← Azure Kubernetes Service (OpenTofu)
azure-aks-config.sh-template
tofu/
main.tf
variables.tf
outputs.tf
backend.tf
terraform.tfvars.example
manifests/
000-storage-class-azure-alias.yaml
scripts/
00-bootstrap-state.sh
01-apply.sh
02-post-apply.sh
03-destroy.sh
README.md
gke/ ← Google Kubernetes Engine (OpenTofu)
gke-config.sh-template
tofu/
scripts/
README.md
eks/ ← Elastic Kubernetes Service (OpenTofu)
eks-config.sh-template
tofu/
scripts/
README.md
microk8s-vm/ ← Ubuntu VM on any cloud (Ansible + cloud-init)
config.sh-template
cloud-init/
ansible/
scripts/
README.md
microk8s-metal/ ← Ubuntu bare metal (Ansible + cloud-init)
config.sh-template
cloud-init/
ansible/
scripts/
README.md
microk8s-rpi/ ← Raspberry Pi (Ansible + cloud-init)
config.sh-template
cloud-init/
ansible/
scripts/
README.md
Design Decisions
OpenTofu for cloud targets, Ansible/cloud-init for VM/bare-metal
Cloud-managed clusters (AKS, GKE, EKS) are a natural fit for OpenTofu:
- Declarative resource definitions
- Remote state (Azure Blob, GCS, S3)
- Plan/apply/destroy lifecycle
- Managed identity and role assignments are first-class
VM and bare-metal targets are not a good fit for OpenTofu because:
- The "resource" is a physical machine or a VM that already exists
- Configuration is imperative (run these commands in this order)
- Ansible and cloud-init are already proven for this use case
- The existing
hosts/scripts for MicroK8s are largely correct and just need migration
Single config file per platform
Each platform has one *-config.sh file (git-ignored) that is sourced by all scripts.
This mirrors the existing azure-aks-config.sh pattern and means there is one place
to change values — no separate tfvars editing, no env file juggling.
State backend from day one
All OpenTofu platforms use remote state (Azure Blob, GCS bucket, S3 bucket).
Local state is never used in production. State storage is created by a one-time
00-bootstrap-state.sh script and survives cluster destroy/recreate cycles.
Script numbering convention
00-bootstrap-state.sh ← one-time pre-requisite
01-apply.sh ← create / update
02-post-apply.sh ← configure (storage, ingress)
03-destroy.sh ← tear down
This makes the execution order self-documenting and consistent across all cloud platforms.
Relationship to hosts/
hosts/ is not deleted. It is kept as a reference and fallback while platforms/ is built
out. Migration happens platform by platform:
platforms/aks/replaceshosts/azure-aks/andhosts/install-azure-aks.shplatforms/microk8s-vm/replaceshosts/azure-microk8s/,hosts/multipass-microk8s/platforms/microk8s-rpi/replaceshosts/raspberry-microk8s/hosts/is archived or deleted after all platforms are migrated
AKS — Step-by-Step Plan
AKS is the first platform. It is being built iteratively:
Step 1 (current): Minimal working cluster ✅
- Resource group
- AKS cluster (matches original
az aks createflags) - Azure Blob remote state
- Storage class aliases
- Traefik via Helm in
02-post-apply.sh
Step 2: ACR + Key Vault
- Azure Container Registry
- Role assignment: AKS managed identity → ACR pull
- Azure Key Vault
- Key Vault access: RBAC model
- Config additions to
azure-aks-config.sh-template
Step 3: Workload Identity
- Enable OIDC issuer on AKS
- Federated credential for service account
- Annotated Kubernetes service account
- Key Vault secret access via Workload Identity (replaces pod-level managed identity)
Step 4: Networking
- Dedicated VNet and subnet
- Private cluster option
- Network Security Group rules
GKE — Questions to Answer
Before building platforms/gke/:
- Which GCP project and region?
- Standard or Autopilot cluster?
- State backend: GCS bucket in same project?
- Node pool sizing — same as AKS defaults?
- Workload Identity for GKE uses a different mechanism than AKS — needs separate investigation
- Does Red Cross have a GCP subscription or is this helpers.no only?
EKS — Questions to Answer
Before building platforms/eks/:
- Which AWS account and region?
- EKS managed node groups or Fargate?
- State backend: S3 bucket + DynamoDB lock table
- VPC: reuse existing or create new?
- IAM roles for service accounts (IRSA) vs EKS Pod Identity
MicroK8s VM — Migration Questions
Before migrating hosts/azure-microk8s/ and hosts/multipass-microk8s/:
- Should
platforms/microk8s-vm/be cloud-agnostic, or separatemicrok8s-azure-vm/andmicrok8s-multipass/? - The existing scripts reference
topsecret/viapaths.shfallback — this must be cleaned up as part of migration - Ansible playbooks live in
ansible/playbooks/— should the platform have its own playbooks or reference shared ones? - Cloud-init templates are in
cloud-init/— same question - Tailscale bootstrap is required for VM targets — the source of truth for the Tailscale auth key needs to be resolved (see INVESTIGATE-remote-deployment-targets.md)
Shared Concerns Across All Platforms
kubeconfig merge
All platforms write a *-kubeconf file. The 04-merge-kubeconf.yml Ansible playbook
merges these into kubeconf-all. This must keep working as platforms are added.
The current ./uis wrapper bypasses this by symlinking kubeconf-all to the host
kubeconfig. This should be restored to a real merge as part of the target management
work (see INVESTIGATE-remote-deployment-targets.md).
.gitignore
All platform config files and generated tofu files must be git-ignored:
# Platform configs (contain secrets)
platforms/*/azure-aks-config.sh
platforms/*/gke-config.sh
platforms/*/eks-config.sh
platforms/*/config.sh
# OpenTofu generated files
platforms/*/tofu/terraform.tfvars
platforms/*/tofu/tfplan
platforms/*/tofu/.terraform/
platforms/*/tofu/.terraform.lock.hcl
Traefik values
02-post-apply.sh references /mnt/urbalurbadisk/manifests/003-traefik-config.yaml.
This is shared across all cluster types. It should remain in manifests/ and be
referenced from all platform post-apply scripts.
Open Questions
- Should
platforms/have its ownREADME.mdat the top level explaining the concept? - Should OpenTofu modules be shared across
aks/,gke/,eks/(e.g., a sharedmodules/traefik-helm/) or kept separate for simplicity? - What is the right home for the Ansible merge playbook — should each platform call it, or should there be a shared
platforms/common/area? - Should
hosts/be deleted immediately after a platform is migrated, or kept until all platforms are done? - How does this interact with the
./uis targetcommand work described in INVESTIGATE-remote-deployment-targets.md? Theplatforms/scripts are the implementation behind./uis target createand./uis target bootstrap.
Related
- INVESTIGATE-remote-deployment-targets.md — target management UX (
./uis targetcommands) - INVESTIGATE-provision-host-tools-and-auth.md — tool installation and cloud auth inside provision-host
platforms/aks/README.md— AKS platform documentationhosts/azure-aks/— original bash scripts being replaced