Overview

Open WebUI’s web search integration allows AI models to access real-time information from the internet, enhancing responses with current data, facts, and sources.

Supported Search Providers

Open WebUI integrates with 15+ search providers, offering flexibility for different use cases and privacy preferences.

Self-Hosted Options

SearXNG

Privacy-focused metasearch engine
  • Aggregates results from multiple sources
  • Self-hosted, no tracking
  • Highly customizable
  • Free and open source

YaCy

Decentralized search
  • P2P search network
  • Full control over infrastructure
  • No central authority

Commercial APIs

High-quality paid services:
  • Google PSE (Programmable Search Engine)
    • Reliable, comprehensive results
    • Custom search engines
    • Usage-based pricing
  • Brave Search
    • Privacy-focused
    • Independent index
    • Competitive pricing
  • Kagi
    • Premium search quality
    • No ads or tracking
    • Advanced features

Configuration

Basic Setup

Enable and configure web search through the admin panel:
// From routers/retrieval.py:534-597
{
  "ENABLE_WEB_SEARCH": true,
  "WEB_SEARCH_ENGINE": "searxng",  // Choose your provider
  "WEB_SEARCH_RESULT_COUNT": 5,
  "WEB_SEARCH_CONCURRENT_REQUESTS": 3
}

Provider-Specific Configuration

Each provider takes its own settings. For example, for SearXNG:
{
  "WEB_SEARCH_ENGINE": "searxng",
  "SEARXNG_QUERY_URL": "http://searxng:8080/search",
  "SEARXNG_LANGUAGE": "en"
}
Deploy SearXNG using Docker for easy self-hosting:
docker run -d -p 8080:8080 searxng/searxng
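
Once the container is up, it can help to confirm the query URL by hand before pointing Open WebUI at it. The sketch below only builds the URL. It assumes the SearXNG instance has the JSON output format enabled in its settings, and `build_searxng_url` is a hypothetical helper name, not part of Open WebUI.

```python
from urllib.parse import urlencode, urlsplit


def build_searxng_url(query_url: str, query: str, language: str = "en") -> str:
    """Return a SearXNG search URL that asks for JSON results.

    `q`, `format`, and `language` are SearXNG search parameters; JSON output
    only works if the instance allows the "json" format (assumption).
    """
    params = {"q": query, "format": "json", "language": language}
    # Append to an existing query string if the base URL already has one.
    separator = "&" if urlsplit(query_url).query else "?"
    return f"{query_url}{separator}{urlencode(params)}"


url = build_searxng_url("http://searxng:8080/search", "open webui", "en")
print(url)
```

Opening the printed URL in a browser or with `curl` from a host that can reach the container should return JSON if the instance is configured as assumed.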

Advanced Settings

Performance

{
  "WEB_SEARCH_CONCURRENT_REQUESTS": 3,
  "WEB_LOADER_CONCURRENT_REQUESTS": 5,
  "WEB_LOADER_TIMEOUT": "30"
}

Security

{
  "WEB_SEARCH_TRUST_ENV": true,
  "ENABLE_WEB_LOADER_SSL_VERIFICATION": true
}

Filtering

{
  "WEB_SEARCH_DOMAIN_FILTER_LIST": [
    "wikipedia.org",
    "github.com"
  ]
}

Processing

{
  "BYPASS_WEB_SEARCH_WEB_LOADER": false,
  "BYPASS_WEB_SEARCH_EMBEDDING_AND_RETRIEVAL": false
}

Web Content Loading

Loading Engines

Choose how web pages are fetched and processed. The default engine uses standard HTTP requests:
  • Fast and lightweight
  • Works for most sites
  • No JavaScript execution
{
  "WEB_LOADER_ENGINE": "default"
}

Using Web Search in Chat

Basic Usage

Trigger web search by mentioning URLs or topics:
# Search and cite sources
What are the latest developments in quantum computing?

# Direct URL loading
Summarize https://example.com/article

# Multiple URLs
Compare these articles:
- https://source1.com
- https://source2.com

Search Workflow

1. Query Detection

The AI determines whether a web search would enhance the response.

2. Search Execution

The query is sent to the configured search provider.
  • Returns top N results (configurable)
  • Concurrent requests for multiple queries

3. Content Fetching

The web loader retrieves page content.
  • Parallel loading of multiple URLs
  • Timeout protection
  • SSL verification

4. Processing

Content is prepared for RAG:
  • Text extraction
  • Chunking (if enabled)
  • Embedding generation (if enabled)

5. Response Generation

The AI uses the retrieved content to formulate an answer.
  • Cites sources
  • Combines multiple perspectives
  • Provides attribution
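
To make the chunking part of the processing step concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration only: `chunk_text`, the character-based sizes, and the default values are assumptions, and Open WebUI's actual splitter may work differently.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at
    a boundary still appears intact in one of the two chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

Each chunk would then be embedded separately, so smaller chunks give finer-grained retrieval at the cost of more embedding calls.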

YouTube Integration

Extract and search YouTube video transcripts:
{
  "YOUTUBE_LOADER_LANGUAGE": ["en", "es", "fr"],
  "YOUTUBE_LOADER_PROXY_URL": "http://proxy:8080",
  "YOUTUBE_LOADER_TRANSLATION": "en"
}
Features:
  • Automatic transcript extraction
  • Multi-language support
  • Translation capabilities
  • Proxy support for restricted regions
Usage:
Summarize this video: https://youtube.com/watch?v=VIDEO_ID
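
Before a transcript can be fetched, the video ID has to be pulled out of the URL. The following sketch shows one way to do that with the standard library. `extract_video_id` is a hypothetical helper, and the set of accepted hosts is an assumption rather than Open WebUI's own list.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the YouTube video ID from a watch or short URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    # Short links: https://youtu.be/VIDEO_ID
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None

    # Watch links: https://youtube.com/watch?v=VIDEO_ID
    if host in {"youtube.com", "m.youtube.com"} and parsed.path == "/watch":
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    return None


print(extract_video_id("https://youtube.com/watch?v=VIDEO_ID"))
```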

Domain Filtering

Control which domains are searched:
{
  "WEB_SEARCH_DOMAIN_FILTER_LIST": [
    "wikipedia.org",       // Encyclopedic content
    "github.com",          // Code repositories
    "stackoverflow.com",   // Technical Q&A
    "arxiv.org"            // Academic papers
  ]
}
When the filter list is populated, only results from these domains are returned. Leave it empty to search all domains.
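
The allow-list behaviour described here can be sketched as a hostname check. This is not Open WebUI's implementation: `is_allowed` and `filter_results` are hypothetical names, and treating subdomains such as `en.wikipedia.org` as matches for `wikipedia.org` is an assumption.

```python
from typing import Dict, List
from urllib.parse import urlparse


def is_allowed(url: str, allowed_domains: List[str]) -> bool:
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    if not allowed_domains:
        return True  # An empty list means "search all domains".
    host = (urlparse(url).hostname or "").lower()
    for domain in allowed_domains:
        domain = domain.lower()
        # Require an exact match or a dot boundary, so that
        # "evilwikipedia.org" does not match "wikipedia.org".
        if host == domain or host.endswith("." + domain):
            return True
    return False


def filter_results(results: List[Dict[str, str]], allowed_domains: List[str]) -> List[Dict[str, str]]:
    """Keep only results whose URL passes the domain check."""
    return [r for r in results if is_allowed(r["url"], allowed_domains)]


results = [
    {"title": "A", "url": "https://en.wikipedia.org/wiki/Search"},
    {"title": "B", "url": "https://evilwikipedia.org/page"},
    {"title": "C", "url": "https://github.com/open-webui/open-webui"},
]
kept = filter_results(results, ["wikipedia.org", "github.com"])
print([r["title"] for r in kept])
```

Matching on a dot boundary rather than a plain substring avoids letting look-alike domains through the filter.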

Performance Optimization

Concurrent Requests

Balance speed and resource usage:
{
  "WEB_SEARCH_CONCURRENT_REQUESTS": 3,    // Parallel searches
  "WEB_LOADER_CONCURRENT_REQUESTS": 5     // Parallel page loads
}
Recommendations:
  • Small deployments: 2-3 concurrent searches, 3-5 loaders
  • Medium deployments: 3-5 concurrent searches, 5-10 loaders
  • Large deployments: 5-10 concurrent searches, 10-20 loaders
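
A concurrency limit like `WEB_LOADER_CONCURRENT_REQUESTS` is commonly enforced with a semaphore. The sketch below illustrates the idea with `asyncio`. `load_all` and `fake_fetch` are hypothetical, and nothing here is taken from Open WebUI's loader code.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


async def load_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrent: int = 5,
) -> List[str]:
    """Fetch every URL, keeping at most `max_concurrent` fetches in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as `urls`.
    return list(await asyncio.gather(*(bounded(u) for u in urls)))


async def _demo() -> Tuple[List[str], int]:
    in_flight = 0
    peak = 0

    async def fake_fetch(url: str) -> str:
        # Stand-in for a real page load; tracks how many run at once.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return f"content of {url}"

    urls = [f"https://example.com/{i}" for i in range(10)]
    pages = await load_all(urls, fake_fetch, max_concurrent=3)
    return pages, peak


pages, peak = asyncio.run(_demo())
print(len(pages), "pages loaded; peak concurrency:", peak)
```

Raising the limit shortens total wall-clock time until the network, the target sites, or the host's memory becomes the bottleneck, which is why the recommendations scale with deployment size.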

Timeouts

{
  "WEB_LOADER_TIMEOUT": "30",           // Seconds
  "PLAYWRIGHT_TIMEOUT": 30000,          // Milliseconds
  "FIRECRAWL_TIMEOUT": "60"             // Seconds
}
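Set reasonable timeouts to prevent resource exhaustion: without one, a slow site can hold a request open indefinitely. Note the units: PLAYWRIGHT_TIMEOUT is in milliseconds, while the other two values are in seconds.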
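
Because the settings mix strings, seconds, and milliseconds, it helps to normalise them before applying a limit. The sketch below converts a value to seconds and enforces it with `asyncio.wait_for`. `to_seconds`, `fetch_with_timeout`, and `slow_fetch` are hypothetical names used only for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Union


def to_seconds(value: Union[str, int, float], unit: str = "s") -> float:
    """Convert a timeout setting to seconds; `unit` is "s" or "ms"."""
    if unit not in ("s", "ms"):
        raise ValueError("unit must be 's' or 'ms'")
    number = float(value)
    return number / 1000.0 if unit == "ms" else number


async def fetch_with_timeout(
    fetch: Callable[[str], Awaitable[str]], url: str, timeout_s: float
) -> Optional[str]:
    """Return the fetched content, or None if the fetch exceeds the timeout."""
    try:
        return await asyncio.wait_for(fetch(url), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None


async def slow_fetch(url: str) -> str:
    # Stand-in for a site that responds too slowly.
    await asyncio.sleep(1.0)
    return "never returned in the demo"


print(to_seconds("30"), to_seconds(30000, unit="ms"))
print(asyncio.run(fetch_with_timeout(slow_fetch, "https://example.com", 0.05)))
```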

Bypass Options

Skip Web Loader

Use search results without fetching full content:
{
  "BYPASS_WEB_SEARCH_WEB_LOADER": true
}
When to use:
  • Search summaries sufficient
  • Reduce API calls/bandwidth
  • Faster responses needed

Skip Embedding

Disable RAG processing for web content:
{
  "BYPASS_WEB_SEARCH_EMBEDDING_AND_RETRIEVAL": true
}
When to use:
  • Direct content injection
  • Avoid embedding costs
  • Real-time freshness required

External Search Integration

Connect custom search services:
{
  "WEB_SEARCH_ENGINE": "external",
  "EXTERNAL_WEB_SEARCH_URL": "http://your-search-api:8000",
  "EXTERNAL_WEB_SEARCH_API_KEY": "your-key"
}
API Contract:
// Request
POST /search
{
  "queries": ["search query 1", "query 2"],
  "count": 5
}

// Response
{
  "results": [
    {
      "title": "Result Title",
      "url": "https://example.com",
      "content": "Snippet..."
    }
  ]
}
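
A service implementing this contract needs to parse the request, run each query, and return results in the documented shape. The sketch below shows that logic as a plain function so it can sit behind any web framework. It follows the contract as written above. `handle_search` and `fake_search` are hypothetical, and applying `count` per query is an assumption.

```python
import json
from typing import Callable, Dict, List

SearchFn = Callable[[str, int], List[Dict[str, str]]]


def handle_search(body: bytes, search: SearchFn) -> Dict[str, List[Dict[str, str]]]:
    """Validate a POST /search body and build the response payload."""
    payload = json.loads(body)
    queries = payload.get("queries")
    count = payload.get("count", 5)

    if not isinstance(queries, list) or not all(isinstance(q, str) for q in queries):
        raise ValueError("'queries' must be a list of strings")
    if isinstance(count, bool) or not isinstance(count, int) or count < 1:
        raise ValueError("'count' must be a positive integer")

    results: List[Dict[str, str]] = []
    for query in queries:
        # Assumption: `count` limits results per query, not in total.
        for item in search(query, count)[:count]:
            results.append(
                {"title": item["title"], "url": item["url"], "content": item["content"]}
            )
    return {"results": results}


def fake_search(query: str, count: int) -> List[Dict[str, str]]:
    # Stand-in backend returning predictable results.
    return [
        {"title": f"{query} #{i}", "url": f"https://example.com/{i}", "content": "Snippet..."}
        for i in range(count)
    ]


request_body = json.dumps({"queries": ["search query 1", "query 2"], "count": 2}).encode("utf-8")
response = handle_search(request_body, fake_search)
print(json.dumps(response, indent=2))
```

A real service would also check the API key sent by Open WebUI before running any query.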

API Endpoints

Configuration

# Get web search config
GET /api/v1/retrieval/config

# Response includes web search settings
{
  "web": {
    "ENABLE_WEB_SEARCH": true,
    "WEB_SEARCH_ENGINE": "searxng",
    ...
  }
}

Update Configuration

POST /api/v1/retrieval/config/update
{
  "web": {
    "ENABLE_WEB_SEARCH": true,
    "WEB_SEARCH_ENGINE": "brave",
    "BRAVE_SEARCH_API_KEY": "new-key"
  }
}
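
For scripted updates, the request can be built with the standard library. The sketch below only constructs the request and leaves sending it to the caller. The base URL `http://localhost:3000`, the `OPEN_WEBUI_TOKEN` environment variable, and Bearer-token authentication are assumptions, and `build_config_update` is a hypothetical helper.

```python
import json
import os
import urllib.request
from typing import Any, Dict


def build_config_update(base_url: str, token: str, web_settings: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST to the retrieval config update endpoint."""
    body = json.dumps({"web": web_settings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/retrieval/config/update",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # Assumed auth scheme.
            "Content-Type": "application/json",
        },
    )


# Read secrets from the environment rather than hard-coding them.
token = os.environ.get("OPEN_WEBUI_TOKEN", "YOUR_TOKEN")
brave_key = os.environ.get("BRAVE_SEARCH_API_KEY", "new-key")

request = build_config_update(
    "http://localhost:3000",
    token,
    {"ENABLE_WEB_SEARCH": True, "WEB_SEARCH_ENGINE": "brave", "BRAVE_SEARCH_API_KEY": brave_key},
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```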

Best Practices

Choose the Right Provider

Consider:
  • Privacy requirements
  • Budget constraints
  • Result quality needs
  • Geographic coverage
  • API rate limits

Optimize Result Count

Balance:
  • More results = better coverage
  • Fewer results = faster responses
  • Recommended: 3-7 results
  • Adjust based on use case

Use Domain Filtering

Benefits:
  • Improve result quality
  • Reduce irrelevant content
  • Focus on trusted sources
  • Faster processing

Monitor Costs

Track:
  • API usage per provider
  • Bandwidth consumption
  • Processing time
  • Set up alerts for limits

Troubleshooting

If searches return no results, check:
  • Provider API key is valid
  • Search engine is properly configured
  • Rate limits not exceeded
  • Domain filter not too restrictive
  • Network connectivity to provider
If searches are slow, try:
  • Reduce WEB_LOADER_TIMEOUT
  • Increase concurrent requests
  • Use faster web loader engine
  • Enable BYPASS_WEB_SEARCH_WEB_LOADER
  • Check network latency
For SSL certificate errors:
  • Ensure ENABLE_WEB_LOADER_SSL_VERIFICATION is true
  • Update CA certificates
  • Configure proxy with valid certs
  • For internal sites, use custom CA
If YouTube transcripts fail to load, verify:
  • Video has captions/transcript
  • Language is in YOUTUBE_LOADER_LANGUAGE
  • Proxy configured if in restricted region
  • Video is publicly accessible

Security Considerations

Important security practices:
  1. SSL Verification: Always enable in production
  2. API Keys: Use environment variables; never commit them to version control
  3. Domain Filtering: Prevent access to internal networks
  4. Rate Limiting: Implement to prevent abuse
  5. Timeout Settings: Prevent resource exhaustion

URL Validation

Open WebUI validates URLs to prevent SSRF attacks:
  • Blocks private IP ranges (10.0.0.0/8, 192.168.0.0/16, etc.)
  • Prevents localhost access
  • Validates URL format
  • Enforces timeout limits
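
The checks listed above can be approximated with the standard `ipaddress` module. The sketch below is an illustration of the technique, not Open WebUI's validation code. `is_safe_url` and `resolve` are hypothetical names, and the injectable resolver exists only so the check can be exercised without real DNS lookups.

```python
import ipaddress
import socket
from typing import Callable, List
from urllib.parse import urlparse


def resolve(host: str) -> List[str]:
    """Resolve a hostname to its IP addresses using the system resolver."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_safe_url(url: str, resolver: Callable[[str], List[str]] = resolve) -> bool:
    """Return False for malformed URLs and for hosts on internal networks."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        addresses = resolver(parsed.hostname)
    except OSError:
        return False  # Unresolvable hosts are rejected.
    if not addresses:
        return False
    for address in addresses:
        # Drop any IPv6 zone suffix such as "%eth0" before parsing.
        ip = ipaddress.ip_address(address.split("%")[0])
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_multicast or ip.is_unspecified):
            return False
    return True


# Fake resolver for the demo; a real deployment would use `resolve`.
fake_dns = {"example.com": ["93.184.216.34"], "intranet.local": ["10.0.0.5"], "localhost": ["127.0.0.1"]}
lookup = lambda host: fake_dns.get(host, [])

for candidate in ["https://example.com/a", "http://intranet.local/", "http://localhost:8080/", "ftp://example.com/"]:
    print(candidate, is_safe_url(candidate, lookup))
```

Checking the resolved addresses rather than the hostname string matters because a public-looking name can point at a private address.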