LiteLLM AI Proxy Setup Guide

This document explains how LiteLLM is configured in the Urbalurba infrastructure and how to add/configure AI models.

Overview

LiteLLM is deployed as a unified AI model proxy that provides OpenAI-compatible API endpoints for multiple model sources, including:

  • Local Ollama instances (in-cluster and external)
  • Cloud AI providers (OpenAI, Anthropic, Google, etc.)
  • Custom model endpoints

Architecture

Applications → LiteLLM Proxy → Model Sources
                    │
             Shared PostgreSQL

Key Components

  • LiteLLM Pod: Main proxy service (ai namespace)
  • Shared PostgreSQL: Database for configuration, keys, and usage tracking
  • ConfigMap: Model configuration and routing rules
  • Ingress: External access via http://litellm.localhost

Database Setup

LiteLLM uses a dedicated database on the shared PostgreSQL instance:

  • Database: litellm
  • User: litellm
  • Host: postgresql.default.svc.cluster.local:5432
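
To confirm the database and user exist before deploying, you can connect directly on the shared PostgreSQL pod. A minimal check; the pod name (postgresql-0) and a locally set $DB_PASSWORD are assumptions for this sketch:

# Connect as the litellm user and print connection info
# (adjust the pod name via: kubectl get pods -n default)
kubectl exec -it postgresql-0 -n default -- \
  env PGPASSWORD="$DB_PASSWORD" psql -U litellm -d litellm -c '\conninfo'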

Database Management

Create database:

cd /mnt/urbalurbadisk
ansible-playbook ansible/playbooks/utility/u10-litellm-create-postgres.yml -e operation=create

Delete database (⚠️ DESTRUCTIVE):

ansible-playbook ansible/playbooks/utility/u10-litellm-create-postgres.yml -e operation=delete -e force_delete=true

Configuration Management

LiteLLM configuration is managed via an external ConfigMap defined in topsecret/kubernetes/kubernetes-secrets.yml. The Helm chart is configured to use this existing ConfigMap rather than creating its own.
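
To inspect the configuration the proxy is actually running with, read config.yaml straight out of the deployed ConfigMap:

# Print the rendered config.yaml from the cluster
kubectl get configmap litellm-config -n ai -o jsonpath='{.data.config\.yaml}'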

Helm Configuration (manifests/220-litellm-config.yaml):

# Use existing ConfigMap instead of inline config
configMapRef:
  name: litellm-config
  key: config.yaml

# Disable Helm-managed ConfigMap creation
proxyConfigMap:
  create: false
  name: litellm-config

ConfigMap Definition (topsecret/kubernetes/kubernetes-secrets.yml):

apiVersion: v1
kind: ConfigMap
metadata:
  name: litellm-config
  namespace: ai
data:
  config.yaml: |
    general_settings:
      master_key: os.environ/LITELLM_PROXY_MASTER_KEY
    model_list:
      - model_name: mac-gpt-oss-balanced
        litellm_params:
          model: ollama/gpt-oss:20b
          api_base: "http://host.lima.internal:11434"
          temperature: 0.7

Adding New Models

1. Ollama Models (Local)

In-cluster Ollama:

- model_name: qwen3-0.6b-incluster
  litellm_params:
    model: ollama/qwen3:0.6b
    api_base: "http://ollama.ai.svc.cluster.local:11434"

External Ollama (Mac/Host):

- model_name: external-llama3
  litellm_params:
    model: ollama/llama3:8b
    api_base: "http://host.lima.internal:11434"
    temperature: 0.7
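
Before adding either kind of Ollama model, it is worth verifying that the endpoint answers. Ollama exposes a standard /api/tags endpoint that lists installed models; the throwaway curl pod below is just one way to test from inside the cluster:

# External Ollama: run this on the Mac host itself
curl http://localhost:11434/api/tags

# In-cluster Ollama: test from a temporary pod
kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
  curl http://ollama.ai.svc.cluster.local:11434/api/tags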

2. Cloud Providers

OpenAI:

- model_name: gpt-4o
  litellm_params:
    model: gpt-4o
    api_key: "os.environ/OPENAI_API_KEY"

Anthropic Claude:

- model_name: claude-3-sonnet
  litellm_params:
    model: anthropic/claude-3-sonnet-20240229
    api_key: "os.environ/ANTHROPIC_API_KEY"

Google Gemini:

- model_name: gemini-pro
  litellm_params:
    model: gemini/gemini-pro
    api_key: "os.environ/GOOGLE_API_KEY"

3. Model Variants with Different Temperatures

- model_name: mac-gpt-oss-creative
  litellm_params:
    model: ollama/gpt-oss:20b
    api_base: "http://host.lima.internal:11434"
    temperature: 0.9

- model_name: mac-gpt-oss-precise
  litellm_params:
    model: ollama/gpt-oss:20b
    api_base: "http://host.lima.internal:11434"
    temperature: 0.3
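
Each variant is selected purely by its model_name. For example, after the port-forward and MASTER_KEY steps described under Deployment Process, the creative variant can be exercised like this:

curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -d '{"model": "mac-gpt-oss-creative", "messages": [{"role": "user", "content": "Write a haiku"}]}'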

4. Fallback Configuration

- model_name: gpt-4-with-fallback
  litellm_params:
    model: gpt-4
    api_key: "os.environ/OPENAI_API_KEY"
  fallbacks:
    - model: ollama/llama3:8b
      api_base: "http://host.lima.internal:11434"

Deployment Process

1. Update Configuration

Edit the ConfigMap in topsecret/kubernetes/kubernetes-secrets.yml

2. Apply Changes

# Copy files to provision-host
./copy2provisionhost.sh

# Apply from provision-host container
docker exec -it provision-host bash -c "cd /mnt/urbalurbadisk && kubectl apply -f topsecret/kubernetes/kubernetes-secrets.yml"

3. Restart LiteLLM

kubectl rollout restart deployment/litellm -n ai

4. Verify Models

# Port forward to access API
kubectl port-forward svc/litellm 4000:4000 -n ai

# Get master key
MASTER_KEY=$(kubectl get secret urbalurba-secrets -n ai -o jsonpath="{.data.LITELLM_PROXY_MASTER_KEY}" | base64 --decode)

# List available models
curl -X GET http://localhost:4000/v1/models -H "Authorization: Bearer $MASTER_KEY"
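
If jq is available, the same call can be reduced to a plain list of model IDs:

# Print just the model names (assumes jq is installed)
curl -s http://localhost:4000/v1/models \
  -H "Authorization: Bearer $MASTER_KEY" | jq -r '.data[].id'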

Full Installation

Use the Ansible playbook for complete setup:

cd /mnt/urbalurbadisk
ansible-playbook ansible/playbooks/210-setup-litellm.yml

This playbook:

  1. Creates the PostgreSQL database
  2. Deploys LiteLLM via Helm
  3. Applies ingress configuration
  4. Verifies installation

API Usage

Authentication

All requests require the master key:

Authorization: Bearer $MASTER_KEY

List Models

curl -X GET http://localhost:4000/v1/models \
  -H "Authorization: Bearer $MASTER_KEY"

Chat Completion

curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -d '{
    "model": "mac-gpt-oss-balanced",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
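
LiteLLM also supports the OpenAI streaming protocol; adding "stream": true to the same request returns the answer as server-sent event chunks:

curl -N -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -d '{
    "model": "mac-gpt-oss-balanced",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'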

Environment Variables

Required secrets in urbalurba-secrets:

  • LITELLM_PROXY_MASTER_KEY: API authentication (secure random key)
  • LITELLM_POSTGRESQL__USER: Database username (litellm)
  • LITELLM_POSTGRESQL__PASSWORD: Database password (secure random password)
  • OPENAI_API_KEY: OpenAI API access (if using OpenAI models)
  • ANTHROPIC_API_KEY: Anthropic API access (if using Claude)
  • GOOGLE_API_KEY: Google API access (if using Gemini)
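
To check which of these keys are actually set, list the key names in the secret without printing their values (assumes jq is installed):

# List key names present in urbalurba-secrets
kubectl get secret urbalurba-secrets -n ai -o json | jq -r '.data | keys[]'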

Troubleshooting

Check Pod Status

kubectl get pods -n ai
kubectl logs -f deployment/litellm -n ai

Database Connection Issues

# Test database connectivity
kubectl exec -it litellm-xxx -n ai -- psql postgresql://litellm:$DB_PASSWORD@postgresql.default.svc.cluster.local:5432/litellm
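
Replace litellm-xxx with the actual pod name from kubectl get pods -n ai. If psql is not bundled in the LiteLLM image, a disposable client pod is an alternative; the postgres:16 image here is an assumption:

# One-off client pod for connectivity testing ($DB_PASSWORD expands in your local shell)
kubectl run pg-client --rm -it --image=postgres:16 --restart=Never -n ai -- \
  psql "postgresql://litellm:$DB_PASSWORD@postgresql.default.svc.cluster.local:5432/litellm" -c '\conninfo'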

Model Not Available

  1. Verify model configuration in ConfigMap
  2. Check API keys for cloud providers
  3. Ensure Ollama is running and accessible
  4. Review LiteLLM logs for specific errors

Configuration Reload

kubectl rollout restart deployment/litellm -n ai
kubectl rollout status deployment/litellm -n ai

Access Points

  • Internal: http://litellm.ai.svc.cluster.local:4000
  • External: http://litellm.localhost (via Traefik)
  • Port Forward: kubectl port-forward svc/litellm 4000:4000 -n ai
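
The external route can be verified from the host without a port-forward, assuming Traefik is serving litellm.localhost as described:

# Verify external access through Traefik
curl http://litellm.localhost/v1/models -H "Authorization: Bearer $MASTER_KEY"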

LiteLLM Admin UI Access

⚠️ IMPORTANT: The LiteLLM Admin UI with authentication is an Enterprise/Premium feature only.

Free Version (Current Setup):

  • API Access: Full API functionality available
  • Model Management: Via API calls and OpenWebUI interface
  • Web Admin UI: no authenticated admin UI in the free version (authentication requires an Enterprise license)

Enterprise Version (Paid):

  • Authenticated Web UI: Username/password protection
  • Advanced Features: SSO, RBAC, audit logging
  • Dashboard Access: Full web-based management interface

Alternative Access:

Since the web UI requires a paid license, use OpenWebUI (http://openwebui.localhost) as your primary interface for:

  • Model selection and management
  • Chat interface with all LiteLLM models
  • User authentication via Authentik integration

Best Practices

  1. Model Naming: Use descriptive names indicating source and characteristics
  2. Temperature Variants: Create separate model entries for different use cases
  3. Fallbacks: Configure local models as fallbacks for cloud models
  4. API Keys: Store sensitive keys in Kubernetes secrets, reference as os.environ/KEY_NAME
  5. Testing: Always verify model availability after configuration changes
  6. Monitoring: Check logs regularly for authentication and connectivity issues

Complete AI Infrastructure Setup

Using the Orchestration Script

For a complete AI infrastructure deployment with both LiteLLM and OpenWebUI:

# From host machine
scripts/packages/ai.sh

# This runs the complete orchestration inside provision-host container

The orchestration performs:

  1. LiteLLM Setup: Database creation + Helm deployment + ConfigMap configuration
  2. OpenWebUI Setup: Database setup + Tika deployment + OpenWebUI with LiteLLM integration
  3. Ingress Configuration: External access via openwebui.localhost and litellm.localhost

Manual Component Installation

LiteLLM Only:

cd /mnt/urbalurbadisk
ansible-playbook ansible/playbooks/210-setup-litellm.yml

OpenWebUI Only (requires LiteLLM running):

cd /mnt/urbalurbadisk
ansible-playbook ansible/playbooks/200-setup-open-webui.yml -e deploy_ollama_incluster=false

Final Configuration

After deployment, configure OpenWebUI to use LiteLLM:

  1. Access OpenWebUI: http://openwebui.localhost
  2. Create Admin User: First login creates admin account
  3. Configure LiteLLM Connection:
    • Go to Settings → Connections
    • URL: http://litellm.ai.svc.cluster.local:4000/v1
    • Auth: Bearer
    • API Key: $(kubectl get secret urbalurba-secrets -n ai -o jsonpath="{.data.LITELLM_PROXY_MASTER_KEY}" | base64 --decode)
  4. Save and Refresh: All LiteLLM models will appear in OpenWebUI
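
Before saving the connection, you can confirm that the exact URL OpenWebUI will call is reachable from inside the cluster. A sketch using a temporary curl pod; $MASTER_KEY expands in your local shell:

# Test the in-cluster endpoint OpenWebUI will use
kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -n ai -- \
  curl -s http://litellm.ai.svc.cluster.local:4000/v1/models -H "Authorization: Bearer $MASTER_KEY"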

Integration with OpenWebUI

LiteLLM integrates seamlessly with OpenWebUI:

  1. OpenWebUI configured to use LiteLLM as OpenAI-compatible backend
  2. All LiteLLM models appear in OpenWebUI model dropdown
  3. Arena mode available for model comparison
  4. Single authentication point for all AI providers
  5. Shared PostgreSQL database for both services
  6. Unified ingress access via Traefik