Ollama Integration - Open WebUI

Overview

Ollama is a local LLM runner that allows you to run large language models on your own hardware. Open WebUI provides native integration with Ollama, supporting both local and remote Ollama instances.

Quick Start

Install Ollama

Download and install Ollama from ollama.ai

Pull a Model

ollama pull llama2

Configure Open WebUI

Set the Ollama base URL in your environment or admin settings

Configuration

Environment Variables

OLLAMA_BASE_URL=http://localhost:11434
ENABLE_OLLAMA_API=True

Admin Panel Configuration

Navigate to Admin Panel > Settings > Connections to configure Ollama:

Enable Ollama API: Toggle to enable/disable Ollama integration
Base URLs: Add one or more Ollama server URLs
API Configurations: Configure advanced settings per instance

Advanced Configuration

Multiple Ollama Instances

Open WebUI supports load balancing across multiple Ollama instances:

{
  "OLLAMA_BASE_URLS": [
    "http://localhost:11434",
    "http://gpu-server-1:11434",
    "http://gpu-server-2:11434"
  ],
  "OLLAMA_API_CONFIGS": {
    "0": {
      "enable": true,
      "key": "",
      "prefix_id": "",
      "tags": [],
      "connection_type": "local"
    },
    "1": {
      "enable": true,
      "key": "your-api-key",
      "prefix_id": "gpu1",
      "tags": ["gpu", "fast"],
      "connection_type": "external"
    }
  }
}

Authentication

For secured Ollama instances:

{
  "OLLAMA_API_CONFIGS": {
    "0": {
      "enable": true,
      "key": "your-bearer-token"
    }
  }
}

The API key will be sent as: Authorization: Bearer {key}

Model Filtering

Filter specific models from an Ollama instance:

{
  "OLLAMA_API_CONFIGS": {
    "0": {
      "model_ids": ["llama2", "mistral", "codellama"]
    }
  }
}

Model Prefixing

Add prefixes to distinguish models from different instances:

{
  "OLLAMA_API_CONFIGS": {
    "0": {
      "prefix_id": "local"
    },
    "1": {
      "prefix_id": "remote"
    }
  }
}

Models will appear as local.llama2 and remote.llama2.

API Endpoints

Open WebUI proxies the following Ollama API endpoints:

Model Management

GET /ollama/api/tags - List available models
File: backend/open_webui/routers/ollama.py:448
POST /ollama/api/pull - Pull a model from registry
File: backend/open_webui/routers/ollama.py:708
POST /ollama/api/create - Create a model from Modelfile
File: backend/open_webui/routers/ollama.py:784
DELETE /ollama/api/delete - Delete a model
File: backend/open_webui/routers/ollama.py:874
POST /ollama/api/show - Show model information
File: backend/open_webui/routers/ollama.py:943

Inference

POST /ollama/api/generate - Generate completion
File: backend/open_webui/routers/ollama.py:1192
POST /ollama/api/chat - Chat completion
File: backend/open_webui/routers/ollama.py:1281
POST /ollama/api/embed - Generate embeddings
File: backend/open_webui/routers/ollama.py:1014

OpenAI Compatible

POST /ollama/v1/chat/completions - OpenAI-compatible chat
File: backend/open_webui/routers/ollama.py:1496
POST /ollama/v1/completions - OpenAI-compatible completions
File: backend/open_webui/routers/ollama.py:1412

Docker Integration

All-in-One Container

# With bundled Ollama
docker run -d -p 3000:8080 \
  --gpus=all \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:ollama

Separate Containers

docker-compose.yml

version: '3'
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama:
  open-webui:

Troubleshooting

Connection Errors

Cannot connect to Ollama

Docker Network Issues:If using Docker, ensure you’re using the correct hostname:

Same machine: http://host.docker.internal:11434
Different container: http://ollama:11434
Network mode host: http://localhost:11434

Firewall: Ensure port 11434 is accessible

Models not appearing

Verify Ollama is running: ollama list
Check ENABLE_OLLAMA_API is set to True
Refresh the models list in the UI
Check browser console for errors

Port 11434 in use

Open WebUI will automatically try port 12434 as fallback.File: backend/open_webui/config.py:1046

Performance Optimization

Load Balancing: The current implementation uses random selection for routing requests. For production deployments, consider implementing weighted round-robin or least-connections algorithms.File: backend/open_webui/routers/ollama.py:1

User Info Forwarding

Forward user information to Ollama for logging and access control:

ENABLE_FORWARD_USER_INFO_HEADERS=true

Headers sent:

X-OpenWebUI-User-Name
X-OpenWebUI-User-Id
X-OpenWebUI-User-Email
X-OpenWebUI-User-Role
X-OpenWebUI-Chat-Id

File: backend/open_webui/routers/ollama.py:93

Best Practices

Use Model Prefixes

Distinguish models from different instances with prefixes

Monitor Resources

Use GET /ollama/api/ps to see loaded models and memory usage

Enable Caching

Models are cached for better performance (default: 5 minutes TTL)

GPU Allocation

Configure model-specific GPU allocation in Ollama Modelfile

​Overview

​Quick Start

​Configuration

​Environment Variables

​Admin Panel Configuration

​Advanced Configuration

​Multiple Ollama Instances

​Authentication

​Model Filtering

​Model Prefixing

​API Endpoints

​Model Management

​Inference

​OpenAI Compatible

​Docker Integration

​All-in-One Container

​Separate Containers

​Troubleshooting

​Connection Errors

​Performance Optimization

​User Info Forwarding

​Best Practices

Use Model Prefixes

Monitor Resources

Enable Caching

GPU Allocation

Overview

Quick Start

Configuration

Environment Variables

Admin Panel Configuration

Advanced Configuration

Multiple Ollama Instances

Authentication

Model Filtering

Model Prefixing

API Endpoints

Model Management

Inference

OpenAI Compatible

Docker Integration

All-in-One Container

Separate Containers

Troubleshooting

Connection Errors

Performance Optimization

User Info Forwarding

Best Practices