Overview

Open WebUI supports multiple LLM providers through OpenAI-compatible API endpoints, with provider-specific handling for Anthropic Claude, Google Gemini, and others.

Supported Providers

Anthropic Claude

Claude 3 Opus, Sonnet, and Haiku models

Google AI

Gemini Pro, Gemini Ultra, and PaLM 2

OpenRouter

Access to 100+ models from multiple providers

Groq

Ultra-fast LLM inference

Mistral AI

Mistral, Mixtral, and specialized models

Together AI

Open source models with fast inference

Anthropic Claude

Configuration

OPENAI_API_BASE_URL=https://api.anthropic.com/v1
OPENAI_API_KEY=sk-ant-...
ENABLE_OPENAI_API=True

Available Models

  • claude-3-opus-20240229 - Most powerful model
  • claude-3-sonnet-20240229 - Balanced performance and speed
  • claude-3-haiku-20240307 - Fastest and most affordable
  • claude-2.1 - Previous generation

Open WebUI automatically detects Anthropic URLs and applies appropriate model mappings.

File: backend/open_webui/utils/anthropic.py
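
The URL detection mentioned above could be sketched as a hostname check. This is a minimal illustration only: `is_anthropic_url` is a hypothetical helper, not a function taken from `backend/open_webui/utils/anthropic.py`, and the real detection logic may differ.

```python
from urllib.parse import urlparse


def is_anthropic_url(base_url: str) -> bool:
    """Hypothetical check: does the base URL point at an Anthropic host?"""
    host = urlparse(base_url).hostname or ""
    # Match the API host itself or any subdomain of anthropic.com.
    return host == "api.anthropic.com" or host.endswith(".anthropic.com")


print(is_anthropic_url("https://api.anthropic.com/v1"))
print(is_anthropic_url("https://api.groq.com/openai/v1"))
```

Comparing the parsed hostname rather than searching the raw string avoids false positives from look-alike domains or from "anthropic" appearing in a path.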

Features

  • System prompts
  • Tool/function calling
  • Vision capabilities (Claude 3)
  • Long context (200K tokens)

Google AI (Gemini)

Configuration

GEMINI_API_KEY=your-api-key
GEMINI_API_BASE_URL=https://generativelanguage.googleapis.com/v1

Available Models

  • gemini-1.5-pro - Latest and most capable
  • gemini-1.5-flash - Optimized for speed
  • gemini-1.0-pro - Previous generation

Features

  • Multimodal (text, image, video, audio)
  • 2M context window (Gemini 1.5 Pro)
  • Code execution
  • Function calling

OpenRouter

Access 100+ models from multiple providers through a single API.

Configuration

OPENAI_API_BASE_URL=https://openrouter.ai/api/v1
OPENAI_API_KEY=sk-or-v1-...

OpenRouter identifies the calling application through the HTTP-Referer and X-Title headers. Open WebUI automatically adds these when it detects an OpenRouter URL.

File: backend/open_webui/routers/openai.py:134

Models

  • anthropic/claude-3-opus
  • google/gemini-pro-1.5
  • meta-llama/llama-3-70b-instruct
  • mistralai/mixtral-8x7b-instruct
  • openai/gpt-4-turbo
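
The header handling described for OpenRouter URLs might look roughly like the sketch below. `build_headers` and its default `referer` and `title` values are hypothetical placeholders chosen for illustration; they are not copied from `backend/open_webui/routers/openai.py`.

```python
from urllib.parse import urlparse


def build_headers(
    base_url: str,
    api_key: str,
    referer: str = "https://example.com",  # placeholder site URL (assumption)
    title: str = "Example App",            # placeholder app name (assumption)
) -> dict[str, str]:
    """Build request headers, adding the OpenRouter-specific ones when needed."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Only OpenRouter endpoints get the extra attribution headers.
    if (urlparse(base_url).hostname or "") == "openrouter.ai":
        headers["HTTP-Referer"] = referer
        headers["X-Title"] = title
    return headers


print(sorted(build_headers("https://openrouter.ai/api/v1", "sk-or-v1-demo")))
```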

Groq

Ultra-fast inference for open source models.

Configuration

OPENAI_API_BASE_URL=https://api.groq.com/openai/v1
OPENAI_API_KEY=gsk_...

Available Models

  • llama-3.1-70b-versatile
  • llama-3.1-8b-instant
  • mixtral-8x7b-32768
  • gemma-7b-it

Features

  • Extremely fast inference (500+ tokens/sec)
  • Free tier available
  • OpenAI-compatible API

Mistral AI

Configuration

OPENAI_API_BASE_URL=https://api.mistral.ai/v1
OPENAI_API_KEY=your-mistral-key

Available Models

  • mistral-large-latest - Most capable
  • mistral-medium-latest - Balanced
  • mistral-small-latest - Fast and efficient
  • mixtral-8x7b - Open source MoE
  • codestral-latest - Code generation specialist

Together AI

Open source models with fast inference.

Configuration

OPENAI_API_BASE_URL=https://api.together.xyz/v1
OPENAI_API_KEY=your-together-key

Features

  • 50+ open source models
  • Fine-tuning support
  • Fast inference
  • Competitive pricing

Perplexity AI

Configuration

OPENAI_API_BASE_URL=https://api.perplexity.ai
OPENAI_API_KEY=pplx-...

Models

  • llama-3.1-sonar-large-128k-online - With web search
  • llama-3.1-sonar-small-128k-online - Faster with web search
  • llama-3.1-70b-instruct - Base model

Cohere

Configuration

OPENAI_API_BASE_URL=https://api.cohere.ai/v1
OPENAI_API_KEY=your-cohere-key

Models

  • command-r-plus - Most capable
  • command-r - Balanced
  • command-light - Fast and efficient

Hugging Face Inference

Configuration

OPENAI_API_BASE_URL=https://api-inference.huggingface.co/models
OPENAI_API_KEY=hf_...

Usage

Access any model on Hugging Face:
model: meta-llama/Llama-2-70b-chat-hf
model: mistralai/Mixtral-8x7B-Instruct-v0.1

Multiple Provider Setup

Configure multiple providers simultaneously:
OPENAI_API_BASE_URLS="https://api.openai.com/v1;https://api.anthropic.com/v1;https://api.groq.com/openai/v1"
OPENAI_API_KEYS="sk-openai-key;sk-ant-key;gsk-groq-key"
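
Because the two variables are parallel semicolon-separated lists, the URL at each position is paired with the key at the same position. The sketch below shows that pairing under the assumption that both lists must have the same length; `pair_providers` is a hypothetical helper, not Open WebUI's own parser.

```python
def pair_providers(urls: str, keys: str) -> list[tuple[str, str]]:
    """Pair each base URL with the API key at the same position."""
    url_list = [u.strip() for u in urls.split(";") if u.strip()]
    # Keep empty keys in place so positions stay aligned with the URLs.
    key_list = [k.strip() for k in keys.split(";")]
    if len(url_list) != len(key_list):
        raise ValueError(
            f"{len(url_list)} base URLs but {len(key_list)} API keys"
        )
    return list(zip(url_list, key_list))


pairs = pair_providers(
    "https://api.openai.com/v1;https://api.groq.com/openai/v1",
    "sk-openai-key;gsk-groq-key",
)
for url, key in pairs:
    print(url, "->", key)
```

A mismatch in list lengths is the usual cause of a key being sent to the wrong provider, so the sketch fails loudly instead of truncating silently.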

Provider-Specific Features

Vision Models

Providers with vision support:
  • OpenAI: gpt-4-vision-preview, gpt-4-turbo
  • Anthropic: claude-3-opus, claude-3-sonnet, claude-3-haiku
  • Google: gemini-1.5-pro, gemini-1.5-flash

Function Calling

Providers with function/tool calling:
  • OpenAI: All GPT models
  • Anthropic: Claude 3 models
  • Google: Gemini models
  • Mistral: Most models

Streaming

All providers support streaming responses through Server-Sent Events (SSE).
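
As an illustration of what an SSE stream looks like to a client, the sketch below extracts text from OpenAI-style streaming chunks, where each event is a `data:` line carrying JSON and the stream ends with `data: [DONE]`. It assumes the provider follows that chunk shape; `iter_sse_text` is a hypothetical helper and the sample lines are hand-written.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-style SSE chunks."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_text(sample)))  # prints "Hello"
```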

Cost Optimization

Model Selection

Use smaller models (haiku, flash, small) for simple tasks

Provider Comparison

Compare costs across providers for equivalent capabilities

Caching

Enable model caching to reduce duplicate API calls

Free Tiers

Leverage free tiers from Groq, Hugging Face, etc.
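
To illustrate the caching idea above, here is a minimal time-based cache for a provider's model list. It shows the general technique only; `CachedModelList` is a hypothetical class and does not describe how Open WebUI implements or enables model caching.

```python
import time
from typing import Callable, Optional


class CachedModelList:
    """Cache the result of a model-list fetch for a fixed number of seconds."""

    def __init__(
        self,
        fetch: Callable[[], list[str]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._models: Optional[list[str]] = None
        self._fetched_at = 0.0

    def get(self) -> list[str]:
        now = self._clock()
        # Refetch on first use or once the cached copy is older than the TTL.
        if self._models is None or now - self._fetched_at >= self._ttl:
            self._models = self._fetch()
            self._fetched_at = now
        return self._models


cache = CachedModelList(lambda: ["model-name-1", "model-name-2"])
print(cache.get())
```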

Troubleshooting

  1. Verify API key is valid
  2. Check base URL is correct
  3. Ensure provider service is operational
  4. Check firewall/network restrictions

Some providers require specific model IDs in configuration:
{
  "OPENAI_API_CONFIGS": {
    "0": {
      "model_ids": ["model-name-1", "model-name-2"]
    }
  }
}
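
One way to read this configuration is that `model_ids` acts as an allowlist for the connection at the given index. The sketch below assumes that interpretation, and also assumes an empty or missing list means "no restriction"; `allowed_models` is a hypothetical helper, and Open WebUI's actual handling may differ.

```python
def allowed_models(configs: dict, index: int, discovered: list[str]) -> list[str]:
    """Filter discovered model IDs by the connection's model_ids allowlist."""
    model_ids = configs.get(str(index), {}).get("model_ids") or []
    if not model_ids:
        return list(discovered)  # no allowlist configured: keep everything
    allowed = set(model_ids)
    return [m for m in discovered if m in allowed]


configs = {"0": {"model_ids": ["model-name-1", "model-name-2"]}}
print(allowed_models(configs, 0, ["model-name-1", "model-name-3"]))
```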

Each provider enforces different rate limits. To stay within them:
  • Use multiple API keys
  • Implement exponential backoff
  • Monitor usage through provider dashboard
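
Exponential backoff, mentioned in the list above, can be sketched as follows. `RateLimitError` is a stand-in exception defined here for illustration; in practice you would catch whatever error your HTTP client raises for a 429 response. The delay values are arbitrary defaults, not provider recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for a provider's 'too many requests' error (assumption)."""


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Delays that double on each retry, capped at `cap` seconds."""
    return [min(cap, base * 2**i) for i in range(retries)]


def call_with_backoff(
    fn: Callable[[], T],
    retries: int = 5,
    base: float = 1.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, sleeping with jittered exponential delays after rate-limit errors."""
    for delay in backoff_delays(retries, base, cap):
        try:
            return fn()
        except RateLimitError:
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    return fn()  # final attempt; any error now propagates to the caller


print(backoff_delays(4))  # prints [1.0, 2.0, 4.0, 8.0]
```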

Best Practices

  1. API Key Security: Store keys in environment variables
  2. Model Prefixing: Use prefixes to distinguish provider models
  3. Cost Monitoring: Track usage across providers
  4. Fallback Providers: Configure multiple providers for redundancy
  5. Model Tagging: Use tags to categorize models by capability
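
Practices 1 and 4 can be combined in client code: read each key from an environment variable and try providers in order until one succeeds. The sketch below is illustrative only; `complete_with_fallback`, the `send` callback, and the variable names in the usage note are hypothetical and not part of Open WebUI.

```python
import os
from typing import Callable, Mapping, Sequence


def complete_with_fallback(
    providers: Sequence[tuple[str, str]],
    send: Callable[[str, str], str],
    env: Mapping[str, str] = os.environ,
) -> tuple[str, str]:
    """Try each (provider name, key variable) in order; return the first success."""
    failures = []
    for name, key_var in providers:
        api_key = env.get(key_var)
        if not api_key:
            failures.append(f"{name}: {key_var} is not set")
            continue
        try:
            return name, send(name, api_key)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(failures))
```

A caller would pass an ordered list such as `[("primary", "PRIMARY_API_KEY"), ("backup", "BACKUP_API_KEY")]` together with a `send` function that performs the actual request.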