Skip to main content

Overview

Ollama is a local LLM runner that allows you to run large language models on your own hardware. Open WebUI provides native integration with Ollama, supporting both local and remote Ollama instances.

Quick Start

1

Install Ollama

Download and install Ollama from ollama.ai
2

Pull a Model

ollama pull llama2
3

Configure Open WebUI

Set the Ollama base URL in your environment or admin settings

Configuration

Environment Variables

OLLAMA_BASE_URL=http://localhost:11434
ENABLE_OLLAMA_API=True

Admin Panel Configuration

Navigate to Admin Panel > Settings > Connections to configure Ollama:
  1. Enable Ollama API: Toggle to enable/disable Ollama integration
  2. Base URLs: Add one or more Ollama server URLs
  3. API Configurations: Configure advanced settings per instance

Advanced Configuration

Multiple Ollama Instances

Open WebUI supports load balancing across multiple Ollama instances:
{
  "OLLAMA_BASE_URLS": [
    "http://localhost:11434",
    "http://gpu-server-1:11434",
    "http://gpu-server-2:11434"
  ],
  "OLLAMA_API_CONFIGS": {
    "0": {
      "enable": true,
      "key": "",
      "prefix_id": "",
      "tags": [],
      "connection_type": "local"
    },
    "1": {
      "enable": true,
      "key": "your-api-key",
      "prefix_id": "gpu1",
      "tags": ["gpu", "fast"],
      "connection_type": "external"
    }
  }
}

Authentication

For secured Ollama instances:
{
  "OLLAMA_API_CONFIGS": {
    "0": {
      "enable": true,
      "key": "your-bearer-token"
    }
  }
}
The API key will be sent as: Authorization: Bearer {key}

Model Filtering

Filter specific models from an Ollama instance:
{
  "OLLAMA_API_CONFIGS": {
    "0": {
      "model_ids": ["llama2", "mistral", "codellama"]
    }
  }
}

Model Prefixing

Add prefixes to distinguish models from different instances:
{
  "OLLAMA_API_CONFIGS": {
    "0": {
      "prefix_id": "local"
    },
    "1": {
      "prefix_id": "remote"
    }
  }
}
Models will appear as local.llama2 and remote.llama2.

API Endpoints

Open WebUI proxies the following Ollama API endpoints:

Model Management

  • GET /ollama/api/tags - List available models
    File: backend/open_webui/routers/ollama.py:448
  • POST /ollama/api/pull - Pull a model from registry
    File: backend/open_webui/routers/ollama.py:708
  • POST /ollama/api/create - Create a model from Modelfile
    File: backend/open_webui/routers/ollama.py:784
  • DELETE /ollama/api/delete - Delete a model
    File: backend/open_webui/routers/ollama.py:874
  • POST /ollama/api/show - Show model information
    File: backend/open_webui/routers/ollama.py:943

Inference

  • POST /ollama/api/generate - Generate completion
    File: backend/open_webui/routers/ollama.py:1192
  • POST /ollama/api/chat - Chat completion
    File: backend/open_webui/routers/ollama.py:1281
  • POST /ollama/api/embed - Generate embeddings
    File: backend/open_webui/routers/ollama.py:1014

OpenAI Compatible

  • POST /ollama/v1/chat/completions - OpenAI-compatible chat
    File: backend/open_webui/routers/ollama.py:1496
  • POST /ollama/v1/completions - OpenAI-compatible completions
    File: backend/open_webui/routers/ollama.py:1412

Docker Integration

All-in-One Container

# With bundled Ollama
docker run -d -p 3000:8080 \
  --gpus=all \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:ollama

Separate Containers

docker-compose.yml
version: '3'
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama:
  open-webui:

Troubleshooting

Connection Errors

Docker Network Issues:If using Docker, ensure you’re using the correct hostname:
  • Same machine: http://host.docker.internal:11434
  • Different container: http://ollama:11434
  • Network mode host: http://localhost:11434
Firewall: Ensure port 11434 is accessible
  1. Verify Ollama is running: ollama list
  2. Check ENABLE_OLLAMA_API is set to True
  3. Refresh the models list in the UI
  4. Check browser console for errors
Open WebUI will automatically try port 12434 as fallback.File: backend/open_webui/config.py:1046

Performance Optimization

Load Balancing: The current implementation uses random selection for routing requests. For production deployments, consider implementing weighted round-robin or least-connections algorithms.File: backend/open_webui/routers/ollama.py:1

User Info Forwarding

Forward user information to Ollama for logging and access control:
ENABLE_FORWARD_USER_INFO_HEADERS=true
Headers sent:
  • X-OpenWebUI-User-Name
  • X-OpenWebUI-User-Id
  • X-OpenWebUI-User-Email
  • X-OpenWebUI-User-Role
  • X-OpenWebUI-Chat-Id
File: backend/open_webui/routers/ollama.py:93

Best Practices

Use Model Prefixes

Distinguish models from different instances with prefixes

Monitor Resources

Use GET /ollama/api/ps to see loaded models and memory usage

Enable Caching

Models are cached for better performance (default: 5 minutes TTL)

GPU Allocation

Configure model-specific GPU allocation in Ollama Modelfile