Implementation Plans

How we plan, track, and implement features and fixes.

Related: WORKFLOW.md - End-to-end flow from idea to implementation

Folder Structure

```
website/docs/ai-developer/plans/
├── backlog/      # Approved plans waiting for implementation
├── active/       # Currently being worked on (max 1-2 at a time)
└── completed/    # Done - kept for reference
```

Flow

```
Idea/Problem → PLAN file in backlog/ → active/ → completed/
                       ↓
              (or INVESTIGATE file first if unclear)
```

File Types

PLAN-*.md

For work that is ready to implement. The scope is clear, the approach is known.

When to create:

Bug fix with known solution
Feature request with clear requirements
Infrastructure change with defined scope

Naming Conventions:

Format	Use Case	Example
`PLAN-<short-name>.md`	Standalone plan, no specific order	`PLAN-postgres-backup-cronjob.md`
`PLAN-<nnn>-<short-name>.md`	Ordered sequence, indicates execution order	`PLAN-001-monitoring-foundation.md`
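Both formats are regular enough to check mechanically. The sketch below is a hypothetical helper (`is_valid_plan_name` is not known to exist in this repository). It assumes `<short-name>` is lowercase kebab-case starting with a letter, which matches the examples but is not spelled out by the convention itself.

```bash
# Hypothetical helper: succeeds if a file name follows one of the two PLAN formats.
# Assumption: <short-name> is lowercase kebab-case and starts with a letter.
is_valid_plan_name() {
    local name=$1
    # Optional "nnn-" prefix (exactly three digits), then the short name, then ".md".
    local re='^PLAN-([0-9]{3}-)?[[:lower:]][[:lower:][:digit:]]*(-[[:lower:][:digit:]]+)*\.md$'
    [[ $name =~ $re ]]
}
```

For example, `is_valid_plan_name PLAN-001-monitoring-foundation.md && echo ok`. Requiring a letter after the optional prefix is what rejects a name like `PLAN-1-monitoring.md`, which would otherwise be read as a standalone plan called `1-monitoring`.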

Ordered Plans (PLAN-nnn-*)

When an investigation produces multiple related plans that should be executed in a specific order, use three-digit numbering to indicate the sequence:

```
PLAN-001-monitoring-foundation.md      # Must be done first (critical foundation)
PLAN-002-prometheus-config.md          # Can start after 001
PLAN-003-grafana-dashboards.md         # Depends on 002
PLAN-004-alerting-rules.md             # Depends on 003
```

Benefits of ordered numbering:

Clear execution sequence at a glance
Dependencies are implicit in the number order
Easy to track progress through a large initiative
Files sort naturally in file explorers

When to use ordered numbering:

Investigation produces 3+ related plans
Plans have sequential dependencies
Work is part of a larger initiative (e.g., monitoring stack overhaul)

When NOT to use ordered numbering:

Standalone bug fix or small feature
Plans can be executed in any order
Single plan from an investigation
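Because the numbers are zero-padded to three digits, sorting by name is the same as sorting by execution order, and bash already sorts the results of a glob. A minimal sketch of a hypothetical `next_ordered_plan` helper, assuming the folder layout above and that finished plans have already been moved out of `backlog/`:

```bash
# Hypothetical helper: prints the lowest-numbered PLAN-nnn-* file still in backlog/.
# Returns 1 if there is no ordered plan left.
next_ordered_plan() {
    local plans_dir=$1 f
    # Glob results are sorted, so zero-padded numbers come out in execution order.
    for f in "$plans_dir"/backlog/PLAN-[0-9][0-9][0-9]-*.md; do
        [[ -e $f ]] || continue   # the glob matched nothing
        basename "$f"
        return 0
    done
    return 1
}
```

Standalone plans such as `PLAN-postgres-backup-cronjob.md` are ignored because they do not match the three-digit pattern.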

Splitting Investigations into Multiple Plans

When an investigation covers a large initiative (e.g., deploying a new platform service with multiple phases), split it into separate ordered plans rather than one monolithic plan. Each plan should be independently completable and deliverable.

How to split:

Group by dependency and risk — phases that need different prerequisites (e.g., "no cluster needed" vs "requires running cluster") should be separate plans
Group by completeness — each plan should deliver something useful on its own, even if later plans aren't started yet
Keep optional/deferred work separate — don't mix required work with nice-to-haves in the same plan

Example: Deploying a new service with catalog generation

```
INVESTIGATE-backstage.md                    ← Research and decisions
  ↓ produces:
PLAN-001-backstage-metadata-and-generator.md  ← No cluster needed, low risk
PLAN-002-backstage-deployment.md              ← Cluster needed, medium risk
PLAN-003-backstage-auth-and-plugins.md        ← Optional, after deployment works
```

PLAN-001 adds metadata fields and builds the generator — pure code, no cluster, can be tested locally
PLAN-002 deploys Backstage following the adding-a-service guide — requires a running cluster
PLAN-003 adds Authentik SSO and extra plugins — optional, only if Authentik is deployed

Each plan references the investigation and the previous plan in its header:

```markdown
**Investigation**: [INVESTIGATE-backstage.md](../backlog/INVESTIGATE-backstage.md)
**Prerequisites**: PLAN-001 must be complete first
```

Benefits:

Earlier plans can be completed and merged while later plans are still being refined
Risk is isolated — a deployment failure in PLAN-002 doesn't block the metadata/generator work in PLAN-001
Optional work (auth, plugins) can stay in backlog indefinitely without blocking core functionality
Each plan is small enough to review and validate in one session
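The numbering and the cross-references between split plans follow a fixed pattern, so they can be generated. A minimal sketch of a hypothetical `scaffold_ordered_plans` helper: it writes only the title, status, investigation link and prerequisite line, and assumes the rest of each header is filled in from the plan templates.

```bash
# Hypothetical helper: creates PLAN-001-*, PLAN-002-*, ... in backlog/ for one
# investigation. Only the cross-reference lines are written; the rest of the
# plan comes from the templates.
scaffold_ordered_plans() {
    local plans_dir=$1 investigation=$2
    shift 2
    local n=1 prev="" id name file
    mkdir -p "$plans_dir/backlog"
    for name in "$@"; do
        id=$(printf 'PLAN-%03d' "$n")
        file="$plans_dir/backlog/$id-$name.md"
        {
            printf '# %s\n\n## Status: Backlog\n\n' "$name"
            printf '**Investigation**: [%s](../backlog/%s)\n' "$investigation" "$investigation"
            # Every plan after the first depends on the one before it.
            if [[ -n $prev ]]; then
                printf '**Prerequisites**: %s must be complete first\n' "$prev"
            fi
        } > "$file"
        prev=$id
        n=$((n + 1))
    done
}
```

For the Backstage example, the call would be `scaffold_ordered_plans website/docs/ai-developer/plans INVESTIGATE-backstage.md backstage-metadata-and-generator backstage-deployment backstage-auth-and-plugins`.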

INVESTIGATE-*.md

For work that needs research first. The problem exists but the solution is unclear.

When to create:

Complex infrastructure where options need evaluation
Bug with unknown root cause
Feature requiring architectural decisions

Naming: INVESTIGATE-<topic>.md

Examples:

INVESTIGATE-monitoring-architecture.md
INVESTIGATE-multi-cluster-networking.md

After investigation: Create one or more PLAN files with the chosen approach.

Plan Structure

Every plan has these sections:

1. Header (Required)

```markdown
# Plan Title

> **IMPLEMENTATION RULES:** Before implementing this plan, read and follow:
> - [WORKFLOW.md](../../WORKFLOW.md) - The implementation process
> - [PLANS.md](../../PLANS.md) - Plan structure and best practices

## Status: Backlog | Active | Blocked | Completed

**Goal**: One sentence describing what this achieves.

**Last Updated**: 2026-01-18

**GitHub Issue**: #42 (optional - if tracking with issues)
```

The IMPLEMENTATION RULES blockquote tells Claude Code to read the workflow and plan guidelines before it starts work.
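The required header lines have fixed prefixes, so their presence can be checked before a plan moves out of `backlog/`. A minimal sketch with a hypothetical `check_plan_header` helper; it assumes a real plan picks exactly one status value and writes dates as `YYYY-MM-DD`, as in the header above.

```bash
# Hypothetical helper: reports which required header lines are missing or malformed.
check_plan_header() {
    local plan_file=$1 missing=0 pattern
    local -a required=(
        '^> \*\*IMPLEMENTATION RULES:\*\*'
        '^## Status: (Backlog|Active|Blocked|Completed)$'
        '^\*\*Goal\*\*: '
        '^\*\*Last Updated\*\*: [0-9]{4}-[0-9]{2}-[0-9]{2}$'
    )
    for pattern in "${required[@]}"; do
        if ! grep -qE -- "$pattern" "$plan_file"; then
            echo "Missing or malformed: $pattern" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

The optional `**GitHub Issue**` line is deliberately not checked.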

2. Dependencies (If applicable)

```markdown
**Prerequisites**: PLAN-001 must be complete first
**Blocks**: PLAN-003 cannot start until this is done
**Priority**: High | Medium | Low
```

For ordered plans (PLAN-nnn-*), dependencies are often implicit in the number order. Only add explicit dependency notes when the relationship is non-obvious.
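When a plan does carry an explicit `**Prerequisites**:` line, whether it can start is a simple lookup in `completed/`. A sketch with a hypothetical `prereqs_done` helper; it assumes prerequisites are named by their `PLAN-nnn` prefix, as in the example above, and treats a prerequisite as done once a matching `PLAN-nnn-*.md` file exists in `completed/`.

```bash
# Hypothetical helper: succeeds if every PLAN-nnn named on the plan's
# "**Prerequisites**:" line has a matching file in completed/.
prereqs_done() {
    local plan_file=$1 plans_dir=$2 line id
    local -a matches
    # No Prerequisites line means nothing to wait for.
    line=$(grep -m1 '^\*\*Prerequisites\*\*:' "$plan_file") || return 0
    for id in $(grep -oE 'PLAN-[0-9]{3}' <<<"$line"); do
        matches=("$plans_dir"/completed/"$id"-*.md)
        if [[ ! -e ${matches[0]} ]]; then
            echo "Missing prerequisite: $id" >&2
            return 1
        fi
    done
    return 0
}
```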

3. Problem Summary (Required)

What's wrong or what's needed. Be specific.

4. Phases with Tasks (Required)

Break work into phases. Each phase has:

Numbered tasks
A validation step at the end (usually user confirmation)

```markdown
## Phase 1: Setup

### Tasks

- [ ] 1.1 Create the ConfigMap
- [ ] 1.2 Add validation rules
- [ ] 1.3 Test with dry-run

### Validation

User confirms phase is complete.

---

## Phase 2: Implementation

### Tasks

- [ ] 2.1 Create the deployment manifest
- [ ] 2.2 Add the service manifest
- [ ] 2.3 Apply and verify deployment

### Validation

User confirms deployment works correctly.
```
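Because tasks are numbered checkboxes, progress can be read straight from the file. A minimal sketch with a hypothetical `plan_progress` helper; it assumes tasks follow the `- [ ] 1.1 ...` format above, so unnumbered checkboxes such as acceptance criteria are not counted.

```bash
# Hypothetical helper: prints "<done>/<total> tasks done" for numbered tasks only.
plan_progress() {
    local plan_file=$1 total completed
    # "|| true" because grep -c exits 1 (while still printing 0) when nothing matches.
    total=$(grep -cE '^- \[[ xX]\] [0-9]+\.[0-9]+ ' "$plan_file" || true)
    completed=$(grep -cE '^- \[[xX]\] [0-9]+\.[0-9]+ ' "$plan_file" || true)
    echo "$completed/$total tasks done"
}
```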

5. Acceptance Criteria (Required)

```markdown
## Acceptance Criteria

- [ ] Manifests apply without errors
- [ ] Pods are running and healthy
- [ ] Service is accessible
- [ ] Documentation is updated
```

6. Implementation Notes (Optional)

Technical details, gotchas, code patterns to follow.

7. Files to Modify (Optional but helpful)

```markdown
## Files to Modify

- `manifests/250-new-service.yaml`
- `docs/services/new-service.md`
```

Status Values

Status	Meaning	Location
`Backlog`	Approved, waiting to start	`plans/backlog/`
`Active`	Currently being worked on	`plans/active/`
`Blocked`	Waiting on something else	`plans/backlog/` or `plans/active/`
`Completed`	Done	`plans/completed/`
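The table gives a one-to-one rule (except for `Blocked`) between a plan's status and its folder, which a small check can enforce. A sketch with a hypothetical `check_status_location` helper, assuming the plan sits directly inside one of the three folders:

```bash
# Hypothetical helper: succeeds if the plan's "## Status:" value matches its folder.
check_status_location() {
    local plan_file=$1 status folder
    status=$(sed -n 's/^## Status: \([[:alpha:]]*\)$/\1/p' "$plan_file" | head -n1)
    folder=$(basename "$(dirname "$plan_file")")
    case "$status:$folder" in
        Backlog:backlog | Active:active | Completed:completed | Blocked:backlog | Blocked:active)
            return 0 ;;
        *)
            echo "Status '$status' does not belong in $folder/" >&2
            return 1 ;;
    esac
}
```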

Updating Plans During Implementation

Critical: Plans are living documents. Update them as you work.

When starting a phase:

```markdown
## Phase 2: Implementation — IN PROGRESS
```

When completing a task:

```markdown
- [x] 2.1 Update the manifest ✓
- [ ] 2.2 Add the service
```

When a phase is done:

```markdown
## Phase 2: Implementation — ✅ DONE
```

When blocked:

```markdown
## Status: Blocked

**Blocked by**: Waiting for decision on approach
```

When complete:

Update status: `## Status: Completed`
Add completion date: `**Completed**: 2026-01-18`
Move file: `mv website/docs/ai-developer/plans/active/PLAN-xyz.md website/docs/ai-developer/plans/completed/`
(Optional) Close GitHub issue if using issue tracking
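The first three steps are mechanical and can be scripted. A minimal sketch with a hypothetical `complete_plan` helper, assuming the plan has a single `## Status:` line; it places the `**Completed**` date directly under that line and uses a temporary file instead of `sed -i`, whose flags differ between GNU and BSD sed.

```bash
# Hypothetical helper: marks a plan Completed, adds today's date, moves it to completed/.
complete_plan() {
    local plan_file=$1 plans_dir=$2 today tmp
    today=$(date +%Y-%m-%d)
    tmp=$(mktemp)
    # Replace the status line and add the completion date right below it.
    awk -v today="$today" '
        /^## Status: / { print "## Status: Completed"; print ""; print "**Completed**: " today; next }
        { print }
    ' "$plan_file" > "$tmp" && cat "$tmp" > "$plan_file"
    rm -f "$tmp"
    mkdir -p "$plans_dir/completed"
    mv "$plan_file" "$plans_dir/completed/"
}
```

In a git checkout, `git mv` can replace the final `mv` so the move is recorded as a rename. Closing the GitHub issue stays a manual step.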

Validation

Every phase ends with validation. The simplest form is asking the user to confirm.

Default: User Confirmation

Claude asks: "Phase 1 complete. Does this look good to continue?"

In the plan, this can be written as:

```markdown
### Validation

User confirms phase is complete.
```

Optional: Automated Check

When a command can verify the work, include it:

````markdown
### Validation

```bash
kubectl apply --dry-run=client -f manifests/xxx-new-service.yaml
kubectl get pods -n namespace -l app=new-service
```

User confirms output is correct.
````
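When a phase lists several validation commands, a small runner can execute them in order and stop at the first failure before handing over to the user. A sketch with a hypothetical `run_validation` helper; each argument is run as a separate `bash -c` command.

```bash
# Hypothetical helper: runs each validation command in order, stops at the first failure.
run_validation() {
    local cmd
    for cmd in "$@"; do
        echo "+ $cmd"
        if ! bash -c "$cmd"; then
            echo "Validation failed: $cmd" >&2
            return 1
        fi
    done
    echo "All checks passed. Ask the user to confirm the output."
}
```

For the example above, pass the two `kubectl` commands as two quoted arguments. The runner does not replace the user's confirmation; it only makes sure the commands ran cleanly first.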

Key Point

Don't force automated validation when it's impractical. User confirmation is valid and often the best approach.

Plan Templates

Simple Bug Fix

```markdown
# Fix: [Bug Description]

> **IMPLEMENTATION RULES:** Before implementing this plan, read and follow:
> - [WORKFLOW.md](../../WORKFLOW.md) - The implementation process
> - [PLANS.md](../../PLANS.md) - Plan structure and best practices

## Status: Backlog

**Goal**: [One sentence]

**GitHub Issue**: #XX (optional)

**Last Updated**: YYYY-MM-DD

---

## Problem

[What's broken]

## Solution

[How to fix it]

---

## Phase 1: Fix

### Tasks

- [ ] 1.1 [Specific change]
- [ ] 1.2 [Another change]

### Validation

User confirms fix is correct.

---

## Acceptance Criteria

- [ ] Bug is fixed
- [ ] No regressions
- [ ] Manifests apply cleanly
```

Feature Implementation

```markdown
# Feature: [Feature Name]

> **IMPLEMENTATION RULES:** Before implementing this plan, read and follow:
> - [WORKFLOW.md](../../WORKFLOW.md) - The implementation process
> - [PLANS.md](../../PLANS.md) - Plan structure and best practices

## Status: Backlog

**Goal**: [One sentence]

**GitHub Issue**: #XX (optional)

**Last Updated**: YYYY-MM-DD

---

## Overview

[What this feature does and why]

---

## Phase 1: [Setup/Preparation]

### Tasks

- [ ] 1.1 [Task]
- [ ] 1.2 [Task]

### Validation

User confirms phase is complete.

---

## Phase 2: [Core Implementation]

### Tasks

- [ ] 2.1 [Task]
- [ ] 2.2 [Task]

### Validation

User confirms phase is complete.

---

## Acceptance Criteria

- [ ] [Criterion]
- [ ] Deployment succeeds
- [ ] Services are accessible
- [ ] Documentation updated

---

## Files to Modify

- `manifests/xxx-new-feature.yaml`
```

Investigation

```markdown
# Investigate: [Topic]

> **IMPLEMENTATION RULES:** Before implementing this plan, read and follow:
> - [WORKFLOW.md](../../WORKFLOW.md) - The implementation process
> - [PLANS.md](../../PLANS.md) - Plan structure and best practices

## Status: Backlog

**Goal**: Determine the best approach for [topic]

**Last Updated**: YYYY-MM-DD

---

## Questions to Answer

1. [Question 1]
2. [Question 2]

---

## Current State

[What exists now]

---

## Options

### Option A: [Name]

**Pros:**
-

**Cons:**
-

### Option B: [Name]

**Pros:**
-

**Cons:**
-

---

## Recommendation

[After investigation, what do we do?]

---

## Next Steps

- [ ] Create PLAN-xyz.md with chosen approach
  - For multiple related plans, use ordered naming: PLAN-001-*, PLAN-002-*, etc.
```

Working with Claude Code

See WORKFLOW.md for the complete flow from idea to implementation.

Best Practices

One active plan at a time (two at most) - finish before starting another
Small phases - easier to validate and recover from errors
Specific tasks - "Update line 42 in manifests/xyz.yaml" not "Fix the thing"
Runnable validation where practical - prefer commands over descriptions, but fall back to user confirmation when no command fits
Update as you go - the plan is the source of truth
Keep completed plans - they're documentation
Check existing lib/ before creating new code - see Library Reuse Rules below

Library Reuse Rules

CRITICAL: Before writing any new code in provision-host/uis/lib/, you MUST:

1. Check Existing Libraries

Review these files for existing functionality:

Library	Purpose
`paths.sh`	All path detection - TEMPLATES_DIR, EXTEND_DIR, SECRETS_DIR, etc.
`utilities.sh`	Base utilities - get_base_path(), die(), config_* functions
`logging.sh`	All logging - log_info(), log_error(), print_section()
`first-run.sh`	Initialization - check_first_run(), generate_ssh_keys()

2. Use Existing Functions

DO NOT create duplicate path functions. Use paths.sh:

```bash
# Good - use paths.sh functions
source "$LIB_DIR/paths.sh"
templates_dir=$(get_templates_dir)
secrets_dir=$(get_secrets_dir)

# Bad - creating your own path detection
_my_detect_templates_dir() { ... }  # WRONG! Duplicates get_templates_dir
```

3. If New Functionality is Needed

Ask these questions before creating new functions:

Does this already exist in another library?
Should this be added to an existing library instead?
Will multiple libraries need this? → Add to shared library
Is this truly specific to this feature? → OK to add locally
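The first question can be answered with a quick search of the existing libraries. A minimal sketch with a hypothetical `find_lib_function` helper; it assumes functions are defined either as `name() {` or `function name {` and that the name contains only letters, digits and underscores (it is inserted into the regex unescaped).

```bash
# Hypothetical helper: lists definitions of a function name across lib/*.sh files.
# Exit status is 0 if at least one definition is found.
find_lib_function() {
    local name=$1 lib_dir=$2
    grep -HnE \
        -e "^[[:space:]]*${name}[[:space:]]*\(\)" \
        -e "^[[:space:]]*function[[:space:]]+${name}([[:space:](]|$)" \
        "$lib_dir"/*.sh
}
```

For example, `find_lib_function get_templates_dir provision-host/uis/lib` should point at `paths.sh` before anyone writes a second version.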

4. Centralized Path Functions

All paths are managed by paths.sh. Available functions:

```
get_templates_dir()           # provision-host/uis/templates/
get_extend_dir()              # .uis.extend/
get_secrets_dir()             # .uis.secrets/
get_services_dir()            # provision-host/uis/services/
get_tools_dir()               # provision-host/uis/tools/
get_hosts_templates_dir()     # templates/uis.extend/hosts/
get_secrets_templates_dir()   # templates/uis.secrets/
get_cloud_init_templates_dir() # templates/ubuntu-cloud-init/
```

Why This Matters

Code duplication leads to:

Inconsistent behavior (different functions return different values)
Maintenance burden (fix bugs in multiple places)
Confusion (which function should I use?)
