AI Package

The AI package is a comprehensive self-hosted AI platform that enables organizations to build and deploy AI applications with advanced document processing and knowledge management capabilities. It is based on the Open WebUI project, which we've enhanced and customized to better suit enterprise needs and specific use cases.

Getting started

On your host computer, run the following command to install the AI package:

./scripts/packages/ai.sh

The script installs the AI package and starts the Open WebUI frontend, which you can then access at http://openwebui.localhost.

The install takes a while to complete as it sets up the LiteLLM proxy and Open WebUI components. No models are downloaded into the cluster; all models are accessed through the LiteLLM proxy.

To use models, install Ollama on your host computer and configure the LiteLLM proxy to access it. Running on the host lets Ollama use the GPU and more memory, so it can serve larger, more capable models.

Checking installation progress

You don't need to know anything about Kubernetes to follow the installation; run this script on your host computer to check its progress:

./scripts/manage/k9s.sh 

This shows what's running in the cluster. Wait until every item in the list shows Running.
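
If you prefer a one-shot check over the interactive view, a plain kubectl query also works. A minimal sketch, assuming kubectl is available on your host (the AI components are deployed into the ai namespace):

# List all pods in the ai namespace; the installation is complete
# when every pod reports STATUS "Running" (or "Completed").
kubectl get pods -n ai

# Or keep watching until everything settles:
kubectl get pods -n ai --watch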

Installing Ollama on your host computer

Go to ollama.com and download the Ollama binary for your platform.

Platform-specific installation instructions are available in the Ollama documentation on GitHub.

Once Ollama is installed, you can pull a model and run it locally.
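
A minimal session might look like this (llama3.2 is an illustrative model name; any model from the Ollama library works):

# Download a model from the Ollama library
ollama pull llama3.2

# Chat with it interactively in the terminal
ollama run llama3.2

# Confirm the Ollama API answers on its default port (11434); this is
# the endpoint the LiteLLM proxy will be configured to talk to
curl http://localhost:11434/api/tags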

Technical stuff

Implementation Differences from Open WebUI

While our implementation is based on Open WebUI, we've made several significant modifications to enhance its capabilities and better suit enterprise needs:

1. Vector Database

  • Original: Uses ChromaDB as the default vector database
  • Our Implementation: Uses PostgreSQL with pgvector extension
    • Leverages existing shared database infrastructure
    • Simplified deployment and maintenance
    • Better integration with enterprise databases
    • Reduced resource requirements
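
As a quick sanity check that pgvector is actually available, something like the following can be run against the shared PostgreSQL instance. This is a sketch: the deployment name, namespace, user, and database name are assumptions based on the configuration described later, so adjust them to your deployment.

# Query the installed extensions in the openwebui database
# (deployment name "postgres" and namespace "ai" are assumptions)
kubectl exec deploy/postgres -n ai -- \
  psql -U openwebui -d openwebui \
  -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"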

2. Document Processing

  • Original: Uses embedded Tika server
  • Our Implementation: Deployed standalone Tika server
    • Better resource isolation
    • Improved scalability
    • Independent scaling of document processing
    • Enhanced reliability

3. LLM Integration

  • Original: Direct integration with Ollama and OpenAI-compatible APIs
  • Our Implementation: Uses LiteLLM as a central proxy
    • Unified interface for all LLM providers
    • Advanced fallback mechanisms
    • Better cost tracking and monitoring
    • Enhanced rate limiting and access control
    • Support for multiple API providers through a single interface
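
Because LiteLLM exposes an OpenAI-compatible API, any OpenAI client or plain curl can talk to it. A minimal sketch from inside the cluster (the endpoint and master key come from the configuration described below; the model alias is illustrative):

# Send an OpenAI-style chat completion through the LiteLLM proxy.
# "gpt-4o" must match a model alias defined in the LiteLLM config.
curl http://litellm:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_PROXY_MASTER_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello from the AI package"}]
  }'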

4. Storage Architecture

  • Original: Uses embedded storage solutions
  • Our Implementation: Kubernetes-native persistent storage
    • Better data persistence
    • Improved backup capabilities
    • Enhanced scalability
    • Better resource management

5. Deployment Architecture

  • Original: Designed for simpler deployments
  • Our Implementation: Kubernetes-native deployment
    • Better scalability
    • Enhanced reliability
    • Improved resource management
    • Better integration with enterprise infrastructure

6. Security Enhancements

  • Original: Basic security features
  • Our Implementation: Enhanced security features
    • Centralized API key management
    • Advanced access control
    • Better secret management
    • Enhanced audit capabilities

7. Monitoring and Management

  • Original: Basic monitoring capabilities
  • Our Implementation: Enhanced monitoring and management
    • Detailed cost tracking
    • Usage analytics
    • Better resource monitoring
    • Enhanced troubleshooting capabilities

Key features include:

  • Knowledge Base Management:

    • Document ingestion and processing through a RAG (Retrieval-Augmented Generation) pipeline
    • Support for various document formats (PDF, Word, Excel, PowerPoint)
    • Integration with Apache Tika for advanced document extraction
    • Vector database storage using PostgreSQL with pgvector extension
  • Chat Interface:

    • ChatGPT-like interface for querying knowledge bases
    • Support for multiple LLM providers through LiteLLM proxy
    • Local model support via Ollama
    • Markdown and LaTeX support for rich text interactions
    • Code execution capabilities via Pyodide
    • Mermaid diagram rendering for visualizations
  • Collaboration Features:

    • Multi-user support with role-based access control
    • Team-based knowledge sharing
    • User groups and granular permissions
    • Shared workspaces and chat histories
    • Webhook integrations for notifications (Discord, Slack, Teams)
  • Mobile & Accessibility:

    • Progressive Web App (PWA) support for mobile devices
    • Responsive design for desktop and mobile
    • Speech-to-text integration
    • Offline capabilities when hosted locally
  • Security & Administration:

    • Granular user permissions and access control
    • LDAP authentication support
    • API key management
    • Model whitelisting
    • Rate limiting and usage monitoring
    • Toxic content filtering
  • Integration Capabilities:

    • Support for multiple OpenAI-compatible APIs
    • Custom database integration (SQLite, Postgres)
    • External speech-to-text services
    • Web search integration for RAG
    • Custom pipeline support for extended functionality
  • Organizational Features:

    • Centralized API Key Management:
      • Secure sharing of API keys across departments
      • Support for multiple LLM providers (OpenAI, Anthropic, Azure, etc.)
      • Virtual key management for different teams/projects
      • Rate limiting and usage quotas per department
    • Cost Management:
      • Detailed cost tracking per department/user
      • Usage monitoring and analytics
      • Budget management and alerts
      • Integration with logging tools (Lunary, MLflow, Langfuse, Helicone)
    • LLM Gateway Features:
      • Unified interface for accessing 100+ LLM models
      • Automatic retry and fallback logic
      • Consistent output format across providers
      • Load balancing across multiple deployments
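
As a concrete illustration of the virtual key management mentioned above, LiteLLM can mint scoped keys per team or project from its key management API. A sketch, with illustrative values:

# Create a virtual key with its own budget and model allow-list,
# authorized by the master key (all values are illustrative)
curl -X POST http://litellm:4000/key/generate \
  -H "Authorization: Bearer $LITELLM_PROXY_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"team_id": "marketing", "max_budget": 50.0, "models": ["gpt-4o"]}'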

The platform is designed to operate entirely offline while maintaining enterprise-grade security and scalability features. It provides organizations with a secure, cost-effective way to leverage multiple LLM providers while maintaining control over usage and costs.

System Architecture

The AI platform uses LiteLLM as a central proxy for all LLM interactions, providing unified access to multiple AI providers through a single interface.

Use Cases

1. Document Processing and Knowledge Base Creation

2. Querying Knowledge Bases with Different LLMs

3. Multi-User Collaboration

The platform supports flexible LLM configuration per knowledge base:

  • Internal documents can be configured to use local Ollama models for enhanced privacy
  • Public documents can leverage external LLMs (OpenAI, Anthropic, Azure) through LiteLLM
  • Department-specific knowledge bases can have custom LLM configurations
  • All configurations are managed through the LiteLLM proxy, providing unified access and cost tracking

Open WebUI Stack Setup

The AI stack is set up using an Ansible playbook (200-setup-open-webui.yml) that deploys a complete AI infrastructure on Kubernetes. The stack consists of several key components:

Core Components

  1. Persistent Storage

  2. Apache Tika

  3. LiteLLM

    • LLM proxy service and gateway
    • Acts as a central dispatcher for all LLM requests
    • Enables integration with various LLM providers
    • Supports OpenAI, Anthropic, Azure, and other providers
    • Provides detailed cost tracking and usage analytics
    • Manages API keys and access control
    • Implements rate limiting and fallback strategies
    • Configuration: Uses external ConfigMap in topsecret/kubernetes/kubernetes-secrets.yml
    • Database: Uses shared PostgreSQL (database: litellm, user: litellm)
    • Helm chart: oci://ghcr.io/berriai/litellm-helm
    • LiteLLM Official Website
    • LiteLLM Documentation
    • LiteLLM Helm Chart
  4. Open WebUI

    • An extensible, feature-rich, and user-friendly self-hosted AI platform
    • Designed to operate entirely offline
    • Supports various LLM runners like Ollama and OpenAI-compatible APIs
    • Features a built-in inference engine for RAG (Retrieval-Augmented Generation)
    • Provides a powerful AI deployment solution with enterprise capabilities
    • Helm chart: open-webui/open-webui
    • Open WebUI Official Website
    • Open WebUI Documentation
    • Open WebUI GitHub
    • Open WebUI Helm Chart
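
Run by hand, the chart installations correspond roughly to the following sketch (release names, namespace, and values files are illustrative, and the repo URL assumes the official Open WebUI charts; the playbook is authoritative):

# LiteLLM proxy first, since the other components depend on it
helm install litellm oci://ghcr.io/berriai/litellm-helm -n ai -f litellm-values.yaml

# Open WebUI from its chart repository
helm repo add open-webui https://helm.openwebui.com
helm repo update
helm install open-webui open-webui/open-webui -n ai -f open-webui-values.yaml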

Open WebUI Custom Configuration

The default Open WebUI Helm chart has been customized to better integrate with our AI stack:

Disabled Components

  • Embedded Ollama (using LiteLLM proxy instead)
  • Built-in Tika server (using standalone Tika deployment)
  • WebSocket support (not required for our setup)
  • Redis cluster (not required for our setup)

Enabled Features

  • Document processing pipelines
  • Persistent storage using existing PVC
  • Integration with LiteLLM proxy for LLM access
  • PostgreSQL with pgvector for vector database storage
  • Standalone Tika server for document extraction

Resource Configuration

  • Memory: 768Mi request, 1.5Gi limit
  • CPU: 300m request, 600m limit
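
Expressed as Helm overrides, these figures correspond to something like the sketch below (the key paths assume the chart's standard Kubernetes resources block; verify against the chart's values):

# Apply the resource requests and limits listed above
helm upgrade open-webui open-webui/open-webui -n ai \
  --set resources.requests.memory=768Mi \
  --set resources.requests.cpu=300m \
  --set resources.limits.memory=1.5Gi \
  --set resources.limits.cpu=600m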

Key Integrations

  1. LiteLLM Proxy

    • Connected via OPENAI_API_BASE: http://litellm:4000
    • Uses master key from Kubernetes secrets
    • Enables access to multiple LLM providers
  2. Document Processing

    • Uses standalone Tika server at http://tika:9998
    • Configured for document extraction and processing
  3. Vector Database

    • Uses PostgreSQL with pgvector extension
    • Database: openwebui
    • Connection via shared PostgreSQL service
  4. Embedding Model

    • Uses all-MiniLM-L6-v2 for RAG embeddings
    • Configured for efficient document processing
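
In environment-variable terms, the wiring above amounts to something like this sketch. The names follow Open WebUI's documented configuration, but verify them against the deployed chart version; the database password is a placeholder:

# Environment passed to the Open WebUI container (a sketch, not the chart values)
export OPENAI_API_BASE="http://litellm:4000"     # LiteLLM proxy, as configured above
export CONTENT_EXTRACTION_ENGINE="tika"
export TIKA_SERVER_URL="http://tika:9998"        # standalone Tika server
export VECTOR_DB="pgvector"
export PGVECTOR_DB_URL="postgresql://openwebui:<password>@postgres/openwebui"
export RAG_EMBEDDING_MODEL="all-MiniLM-L6-v2"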

Technical Notes

  • Model Access via LiteLLM Proxy:
    • No models are deployed in the cluster
    • All model access is routed through LiteLLM proxy
    • Configure Ollama on your host computer for local model access
    • Cloud models (OpenAI, Anthropic, etc.) are accessed via API keys in LiteLLM configuration
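
A sketch of what the corresponding LiteLLM model configuration might look like, written here as a shell heredoc (the host address, model names, and file path are all illustrative; the real configuration lives in the ConfigMap mentioned earlier):

# One local model served by Ollama on the host, one cloud model via API key
cat <<'EOF' > litellm-config.yaml
model_list:
  - model_name: local-llama                # alias that clients request
    litellm_params:
      model: ollama/llama3.2               # model served by Ollama on the host
      api_base: http://host.docker.internal:11434   # host address varies by setup
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY   # read from the environment/secret
EOF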

Configuration and Requirements

The setup requires:

  • A Kubernetes cluster
  • Helm package manager
  • Required API keys stored in Kubernetes secrets:
    • LITELLM_PROXY_MASTER_KEY
    • OPENAI_API_KEY
    • ANTHROPIC_API_KEY
    • AZURE_API_KEY
    • AZURE_API_BASE
    • PostgreSQL database credentials (automatically configured)
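
These can be created ahead of time with kubectl; a sketch, where the secret name is an assumption (use whatever name the playbook's verification step expects) and the values are placeholders:

# Store the API keys as a Kubernetes secret (once the ai namespace exists)
kubectl create secret generic ai-api-keys -n ai \
  --from-literal=LITELLM_PROXY_MASTER_KEY='sk-...' \
  --from-literal=OPENAI_API_KEY='sk-...' \
  --from-literal=ANTHROPIC_API_KEY='sk-ant-...' \
  --from-literal=AZURE_API_KEY='...' \
  --from-literal=AZURE_API_BASE='https://<resource>.openai.azure.com'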

Deployment Process

  1. Creates an ai namespace in Kubernetes
  2. Verifies required secrets exist
  3. Sets up persistent storage
  4. Adds required Helm repositories
  5. Deploys components in sequence:
    • LiteLLM proxy (first - required dependency)
    • Tika server
    • Open WebUI frontend

Each component is deployed with appropriate timeouts and readiness checks to ensure proper initialization.

Experiments and notes

RAG pipeline notes

Norwegian BERT Models for RAG

The following Norwegian BERT models are particularly well-suited for use in Retrieval-Augmented Generation (RAG) pipelines, providing strong Norwegian language understanding and generation capabilities:

Model Name         Developer/Source         Main Use Case                          Notes
NorBERT            University of Oslo       General Norwegian NLP                  Trained from scratch on Norwegian
NorBERT 3 Large    Dataloop/NorwAI          Advanced NLP tasks in Norwegian        Large model, versatile
Klinisk NorBERT    eHealthResearch          Clinical/medical Norwegian text        Fine-tuned for healthcare
NB-BERT            National Library (NB)    General Norwegian & Scandinavian NLP   Trained on 200 years of text
Norwegian BERT     Certainly AI             General Norwegian NLP                  Open-source, community-driven

These models can be integrated into the RAG pipeline to enhance Norwegian language processing capabilities, particularly useful for:

  • Document understanding and retrieval in Norwegian
  • Question answering systems
  • Text summarization
  • Information extraction from Norwegian documents