AI Package
The AI package is a comprehensive self-hosted AI platform that enables organizations to build and deploy AI applications with advanced document processing and knowledge management capabilities. This implementation is based on the Open WebUI project, which provides a powerful foundation for building AI applications. We've enhanced and customized it to better suit enterprise needs and specific use cases.
Getting started
On your host computer, run the following command to install the AI package:
./scripts/packages/ai.sh
The script will install the AI package and start the Open WebUI frontend. You can then access the Open WebUI frontend at http://openwebui.localhost.
The install takes a while to complete as it sets up the LiteLLM proxy and Open WebUI components. No models are downloaded into the cluster; all models are accessed through the LiteLLM proxy.
To use models, install Ollama on your host computer and configure the LiteLLM proxy to access it. Running on the host lets Ollama use the GPU and more of your machine's memory, which allows larger, more capable models.
Checking installation progress
You don't need to know anything about Kubernetes: we provide a script you can run on your host computer to check the progress of the installation.
./scripts/manage/k9s.sh
This shows an overview of what is running in the cluster. Wait until every item in the list reports Running.
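If you are comfortable with kubectl and want a one-shot check instead of the interactive view, you can watch the pods directly (the stack installs into the ai namespace, as described under Deployment Process below):

```sh
# Watch the AI stack's pods until they all report Running
kubectl get pods -n ai --watch
```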
Installing Ollama on your host computer
Go to ollama.com and download the Ollama binary for your platform.
Platform-specific installation instructions are available in the Ollama documentation on GitHub.
Once Ollama is installed, verify it by pulling a model and running it locally.
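A minimal session, using llama3 as an example (any model from the Ollama library works the same way):

```sh
# Download a model to your host
ollama pull llama3
# Chat with it interactively to confirm it runs
ollama run llama3
```

Ollama also serves an HTTP API on http://localhost:11434, which is the endpoint the LiteLLM proxy can be pointed at.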
Technical stuff
Implementation Differences from Open WebUI
While our implementation is based on Open WebUI, we've made several significant modifications to enhance its capabilities and better suit enterprise needs:
1. Vector Database
- Original: Uses ChromaDB as the default vector database
- Our Implementation: Uses PostgreSQL with pgvector extension (see the sketch after this list)
- Leverages existing shared database infrastructure
- Simplified deployment and maintenance
- Better integration with enterprise databases
- Reduced resource requirements
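To make the pgvector choice concrete, here is a minimal sketch of the primitives it provides. The table, column, and host names are illustrative only; the real schema is managed by Open WebUI:

```sh
# Illustrative only: toy 3-dimensional vectors; the real pipeline stores
# 384-dimensional embeddings produced by all-MiniLM-L6-v2.
psql -h postgres -U openwebui -d openwebui <<'SQL'
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS demo_chunks (id serial PRIMARY KEY, embedding vector(3));
INSERT INTO demo_chunks (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
-- nearest-neighbour search by L2 distance: the core RAG retrieval operation
SELECT id FROM demo_chunks ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
SQL
```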
2. Document Processing
- Original: Uses embedded Tika server
- Our Implementation: Deploys a standalone Tika server (example request after this list)
- Better resource isolation
- Improved scalability
- Independent scaling of document processing
- Enhanced reliability
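As a quick illustration of what the standalone server offers, Tika exposes a plain HTTP API. The in-cluster address http://tika:9998 comes from this stack's configuration; use a port-forwarded address when testing from your host:

```sh
# Extract plain text from a document
curl -T mydocument.pdf http://tika:9998/tika
# Extract document metadata instead
curl -T mydocument.pdf http://tika:9998/meta
```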
3. LLM Integration
- Original: Direct integration with Ollama and OpenAI-compatible APIs
- Our Implementation: Uses LiteLLM as a central proxy (example request after this list)
- Unified interface for all LLM providers
- Advanced fallback mechanisms
- Better cost tracking and monitoring
- Enhanced rate limiting and access control
- Support for multiple API providers through a single interface
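Because LiteLLM exposes an OpenAI-compatible API, every provider behind it is reachable with the same request shape. A sketch, assuming a model name that is defined in the LiteLLM configuration and the in-cluster address used by this stack:

```sh
curl http://litellm:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_PROXY_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
```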
4. Storage Architecture
- Original: Uses embedded storage solutions
- Our Implementation: Kubernetes-native persistent storage
- Better data persistence
- Improved backup capabilities
- Enhanced scalability
- Better resource management
5. Deployment Architecture
- Original: Designed for simpler deployments
- Our Implementation: Kubernetes-native deployment
- Better scalability
- Enhanced reliability
- Improved resource management
- Better integration with enterprise infrastructure
6. Security Enhancements
- Original: Basic security features
- Our Implementation: Enhanced security features
- Centralized API key management
- Advanced access control
- Better secret management
- Enhanced audit capabilities
7. Monitoring and Management
- Original: Basic monitoring capabilities
- Our Implementation: Enhanced monitoring and management
- Detailed cost tracking
- Usage analytics
- Better resource monitoring
- Enhanced troubleshooting capabilities
Key features include:
- Knowledge Base Management:
- Document ingestion and processing through a RAG (Retrieval-Augmented Generation) pipeline
- Support for various document formats (PDF, Word, Excel, PowerPoint)
- Integration with Apache Tika for advanced document extraction
- Vector database storage using PostgreSQL with pgvector extension
- Chat Interface:
- ChatGPT-like interface for querying knowledge bases
- Support for multiple LLM providers through LiteLLM proxy
- Local model support via Ollama
- Markdown and LaTeX support for rich text interactions
- Code execution capabilities via Pyodide
- Mermaid diagram rendering for visualizations
- Collaboration Features:
- Multi-user support with role-based access control
- Team-based knowledge sharing
- User groups and granular permissions
- Shared workspaces and chat histories
- Webhook integrations for notifications (Discord, Slack, Teams)
- Mobile & Accessibility:
- Progressive Web App (PWA) support for mobile devices
- Responsive design for desktop and mobile
- Speech-to-text integration
- Offline capabilities when hosted locally
- Security & Administration:
- Granular user permissions and access control
- LDAP authentication support
- API key management
- Model whitelisting
- Rate limiting and usage monitoring
- Toxic content filtering
- Integration Capabilities:
- Support for multiple OpenAI-compatible APIs
- Custom database integration (SQLite, Postgres)
- External speech-to-text services
- Web search integration for RAG
- Custom pipeline support for extended functionality
- Organizational Features:
- Centralized API Key Management:
- Secure sharing of API keys across departments
- Support for multiple LLM providers (OpenAI, Anthropic, Azure, etc.)
- Virtual key management for different teams/projects (see the sketch below)
- Rate limiting and usage quotas per department
- Cost Management:
- Detailed cost tracking per department/user
- Usage monitoring and analytics
- Budget management and alerts
- Integration with logging tools (Lunary, MLflow, Langfuse, Helicone)
- LLM Gateway Features:
- Unified interface for accessing 100+ LLM models
- Automatic retry and fallback logic
- Consistent output format across providers
- Load balancing across multiple deployments
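The virtual key workflow mentioned above can be sketched with LiteLLM's key management API; the field values here (budget, team name) are illustrative:

```sh
# Create a scoped virtual key with a model allow-list and spending budget.
# Requires the master key; the request goes to the LiteLLM proxy.
curl http://litellm:4000/key/generate \
  -H "Authorization: Bearer $LITELLM_PROXY_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"models": ["gpt-4o"], "max_budget": 50, "metadata": {"team": "finance"}}'
```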
The platform is designed to operate entirely offline while maintaining enterprise-grade security and scalability features. It provides organizations with a secure, cost-effective way to leverage multiple LLM providers while maintaining control over usage and costs.
System Architecture
The AI platform uses LiteLLM as a central proxy for all LLM interactions, providing unified access to multiple AI providers through a single interface.
Use Cases
1. Document Processing and Knowledge Base Creation
2. Querying Knowledge Bases with Different LLMs
3. Multi-User Collaboration
The platform supports flexible LLM configuration per knowledge base:
- Internal documents can be configured to use local Ollama models for enhanced privacy
- Public documents can leverage external LLMs (OpenAI, Anthropic, Azure) through LiteLLM
- Department-specific knowledge bases can have custom LLM configurations
- All configurations are managed through the LiteLLM proxy, providing unified access and cost tracking (see the configuration sketch below)
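A sketch of what such a mixed configuration looks like in LiteLLM's model_list format. The host address and model names are assumptions; in this stack the real configuration lives in the external ConfigMap referenced in the LiteLLM component notes below:

```sh
# Sketch only: a LiteLLM config mixing a host-local Ollama model with a
# cloud model. host.docker.internal and the model names are assumptions.
cat <<'EOF' > litellm-config.yaml
model_list:
  - model_name: local-llama            # internal documents stay local
    litellm_params:
      model: ollama/llama3
      api_base: http://host.docker.internal:11434
  - model_name: gpt-4o                 # public documents may use a cloud model
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
EOF
```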
Open WebUI Stack Setup
The AI stack is set up using an Ansible playbook (200-setup-open-webui.yml) that deploys a complete AI infrastructure on Kubernetes. The stack consists of several key components:
Core Components
- Persistent Storage
- Provides persistent storage for all AI components
- Ensures data persistence across pod restarts
- Kubernetes Persistent Volumes Documentation
- Apache Tika
- Document processing and extraction server
- Used for handling various document formats
- Helm chart: tika/tika
- Apache Tika Official Website
- Tika Helm Chart
- LiteLLM
- LLM proxy service and gateway
- Acts as a central dispatcher for all LLM requests
- Enables integration with various LLM providers
- Supports OpenAI, Anthropic, Azure, and other providers
- Provides detailed cost tracking and usage analytics
- Manages API keys and access control
- Implements rate limiting and fallback strategies
- Configuration: Uses external ConfigMap in topsecret/kubernetes/kubernetes-secrets.yml
- Database: Uses shared PostgreSQL (database: litellm, user: litellm)
- Helm chart: oci://ghcr.io/berriai/litellm-helm
- LiteLLM Official Website
- LiteLLM Documentation
- LiteLLM Helm Chart
- Open WebUI
- An extensible, feature-rich, and user-friendly self-hosted AI platform
- Designed to operate entirely offline
- Supports various LLM runners like Ollama and OpenAI-compatible APIs
- Features a built-in inference engine for RAG (Retrieval-Augmented Generation)
- Provides a powerful AI deployment solution with enterprise capabilities
- Helm chart: open-webui/open-webui
- Open WebUI Official Website
- Open WebUI Documentation
- Open WebUI GitHub
- Open WebUI Helm Chart
Open WebUI Custom Configuration
The default Open WebUI Helm chart has been customized to better integrate with our AI stack:
Disabled Components
- Embedded Ollama (using LiteLLM proxy instead)
- Built-in Tika server (using standalone Tika deployment)
- WebSocket support (not required for our setup)
- Redis cluster (not required for our setup)
Enabled Features
- Document processing pipelines
- Persistent storage using existing PVC
- Integration with LiteLLM proxy for LLM access
- PostgreSQL with pgvector for vector database storage
- Standalone Tika server for document extraction
Resource Configuration
- Memory: 768Mi request, 1.5Gi limit
- CPU: 300m request, 600m limit
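In standard Helm values form, these settings correspond to a resources block like the following sketch (the file name is illustrative):

```sh
cat <<'EOF' > openwebui-resources.yaml
resources:
  requests:
    memory: 768Mi
    cpu: 300m
  limits:
    memory: 1.5Gi
    cpu: 600m
EOF
```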
Key Integrations
- LiteLLM Proxy
  - Connected via OPENAI_API_BASE: http://litellm:4000
  - Uses master key from Kubernetes secrets
  - Enables access to multiple LLM providers
- Document Processing
  - Uses standalone Tika server at http://tika:9998
  - Configured for document extraction and processing
- Vector Database
  - Uses PostgreSQL with pgvector extension
  - Database: openwebui
  - Connection via shared PostgreSQL service
- Embedding Model
  - Uses all-MiniLM-L6-v2 for RAG embeddings
  - Configured for efficient document processing
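Pulled together, the integration points above amount to a configuration along these lines. Exact variable names depend on the Open WebUI release and chart version, so treat this as a sketch:

```sh
# Assumed Open WebUI environment, assembled from the integration points above.
export OPENAI_API_BASE="http://litellm:4000"   # LiteLLM proxy
export TIKA_SERVER_URL="http://tika:9998"      # standalone Tika server
export VECTOR_DB="pgvector"                    # PostgreSQL + pgvector backend
export DATABASE_URL="postgresql://openwebui:<password>@postgres:5432/openwebui"
export RAG_EMBEDDING_MODEL="all-MiniLM-L6-v2"
```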
Technical Notes
- Model Access via LiteLLM Proxy:
- No models are deployed in the cluster
- All model access is routed through LiteLLM proxy
- Configure Ollama on your host computer for local model access
- Cloud models (OpenAI, Anthropic, etc.) are accessed via API keys in LiteLLM configuration
Configuration and Requirements
The setup requires:
- A Kubernetes cluster
- Helm package manager
- Required API keys stored in Kubernetes secrets:
  - LITELLM_PROXY_MASTER_KEY
  - OPENAI_API_KEY
  - ANTHROPIC_API_KEY
  - AZURE_API_KEY
  - AZURE_API_BASE
- PostgreSQL database credentials (automatically configured)
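A sketch of creating these secrets up front; the secret name, key layout, and namespace here are assumptions, so match them to what the playbook actually expects:

```sh
kubectl create secret generic ai-api-keys -n ai \
  --from-literal=LITELLM_PROXY_MASTER_KEY='sk-...' \
  --from-literal=OPENAI_API_KEY='sk-...' \
  --from-literal=ANTHROPIC_API_KEY='sk-ant-...' \
  --from-literal=AZURE_API_KEY='...' \
  --from-literal=AZURE_API_BASE='https://<resource>.openai.azure.com'
```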
Deployment Process
- Creates an ai namespace in Kubernetes
- Verifies required secrets exist
- Sets up persistent storage
- Adds required Helm repositories
- Deploys components in sequence:
- LiteLLM proxy (first - required dependency)
- Tika server
- Open WebUI frontend
Each component is deployed with appropriate timeouts and readiness checks to ensure proper initialization.
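For orientation, the sequence roughly corresponds to the following manual steps. The Helm repository URLs are assumptions, and the Ansible playbook remains the supported path:

```sh
kubectl create namespace ai
helm repo add tika https://apache.jfrog.io/artifactory/tika
helm repo add open-webui https://helm.openwebui.com
helm repo update
# LiteLLM first: the other components depend on it
helm install litellm oci://ghcr.io/berriai/litellm-helm -n ai --wait
helm install tika tika/tika -n ai --wait
helm install open-webui open-webui/open-webui -n ai --wait
```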
Experiments and notes
RAG pipeline notes
Norwegian BERT Models for RAG
The following Norwegian BERT models are particularly well-suited for use in Retrieval-Augmented Generation (RAG) pipelines, providing strong Norwegian language understanding and generation capabilities:
| Model Name | Developer/Source | Main Use Case | Notes |
|---|---|---|---|
| NorBERT | University of Oslo | General Norwegian NLP | Trained from scratch on Norwegian |
| NorBERT 3 Large | Dataloop/NorwAI | Advanced NLP tasks in Norwegian | Large model, versatile |
| Klinisk NorBERT | eHealthResearch | Clinical/medical Norwegian text | Fine-tuned for healthcare |
| NB-BERT | National Library (NB) | General Norwegian & Scandinavian NLP | Trained on 200 years of text |
| Norwegian BERT | Certainly AI | General Norwegian NLP | Open-source, community-driven |
These models can be integrated into the RAG pipeline to enhance Norwegian language processing capabilities, particularly useful for:
- Document understanding and retrieval in Norwegian
- Question answering systems
- Text summarization
- Information extraction from Norwegian documents