INVESTIGATE: Unity Catalog CrashLoopBackOff
Related: INVESTIGATE-rancher-reset-and-full-verification Created: 2026-02-20 Status: COMPLETE Resolved: 2026-02-20
Problem
Unity Catalog server pod enters CrashLoopBackOff / RunContainerError after deployment via ./uis deploy unity-catalog.
- PostgreSQL dependency was deployed and healthy
- The Ansible playbook runs without errors
- The pod fails to start (never becomes ready, times out after 180s)
- Undeploy works correctly
Context
Discovered during full service verification (talk9.md, Round 6, Step 9). All other 20 testable services deploy and undeploy successfully from a clean slate after factory reset.
Root Causes Found (3 issues)
1. Wrong container image
File: manifests/320-unity-catalog-deployment.yaml
godatadriven/unity-catalog:latestimage is broken — missing jars directory, SBT build cache under/root/, broken classpath- Fix: Changed to official
unitycatalog/unitycatalog:latest
2. Wrong security context (permission denied)
File: manifests/320-unity-catalog-deployment.yaml
bin/start-uc-serveris owned by UID 100 (unitycatalog user) with permissions-r-xr-x---- Container was configured to run as root (UID 0), which has no execute permission on the file
- Fix: Changed
runAsUser: 0torunAsUser: 100,runAsGroup: 0torunAsGroup: 101, setrunAsNonRoot: true
3. Wrong API version in health probes
Files: manifests/320-unity-catalog-deployment.yaml, ansible/playbooks/320-setup-unity-catalog.yml
- Health probes and API calls used
/api/1.0/unity-catalog/catalogswhich returns 404 - Actual API endpoint is
/api/2.1/unity-catalog/catalogs - Fix: Updated all API paths from
/api/1.0/to/api/2.1/(3 probes in manifest + 7 occurrences in playbook)
4. No curl in container (playbook API tests broken)
File: ansible/playbooks/320-setup-unity-catalog.yml
- The
unitycatalog/unitycatalog:latestimage uses BusyBox which haswgetbut notcurl - All
kubectl exec ... -- curlcommands failed with "executable file not found" - Fixed 30-second pause was insufficient for Unity Catalog startup (needs 2-3 minutes)
- Fix: Replaced
curlwithwget -Sfor HTTP status checks andwget --post-datafor catalog creation. Replaced fixed pause with retry loop (18 retries, 10s apart).
Verification
After all 4 fixes, Unity Catalog deploys successfully:
- Pod status: 1/1 Running, 0 restarts
- Health check passes on
/api/2.1/unity-catalog/catalogs - PostgreSQL connection works correctly
- API connectivity test:
Working (HTTP 200) - Catalog creation test:
Success - Verified by tester in talk9.md Rounds 7 and 8