Open WebUI integrates with OpenTelemetry to support production monitoring through distributed tracing, metrics collection, and centralized logging.

Overview

OpenTelemetry integration provides:
  • Distributed Tracing - Track requests across services and workers
  • Metrics Collection - Monitor performance, usage, and system health
  • Structured Logging - Centralized log aggregation and analysis
  • Auto-Instrumentation - Automatic instrumentation of FastAPI, SQLAlchemy, Redis, and HTTP clients

Configuration

Enable OpenTelemetry

Set these environment variables to enable observability:
# Enable OpenTelemetry
ENABLE_OTEL=true

# Enable specific signals
ENABLE_OTEL_TRACES=true
ENABLE_OTEL_METRICS=true
ENABLE_OTEL_LOGS=true

# Service identification
OTEL_SERVICE_NAME=open-webui
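
As a rough illustration of how these flags combine, the sketch below treats ENABLE_OTEL as a master switch and each ENABLE_OTEL_<SIGNAL> variable as a per-signal switch. The `enabled_signals` helper and the set of values it accepts as "true" are assumptions made for this example, not Open WebUI's actual implementation.

```python
import os

# Assumption for this sketch: which strings count as "true" is not specified by the docs.
TRUTHY = {"true", "1", "yes"}


def _flag(env, name):
    """Read a boolean-style flag from a mapping of environment variables."""
    return env.get(name, "").strip().lower() in TRUTHY


def enabled_signals(env=None):
    """Return the signals that would be exported for the given environment."""
    env = os.environ if env is None else env
    if not _flag(env, "ENABLE_OTEL"):
        return []  # master switch off: nothing is exported
    return [
        signal
        for signal in ("traces", "metrics", "logs")
        if _flag(env, f"ENABLE_OTEL_{signal.upper()}")
    ]


print(enabled_signals({"ENABLE_OTEL": "true", "ENABLE_OTEL_TRACES": "true"}))
```

A helper like this can be handy in a startup check to log which signals a deployment will emit.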

OTLP Exporter Configuration

Configure the OpenTelemetry Protocol (OTLP) endpoint:
# Single endpoint for all signals
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_INSECURE=true

# Or separate endpoints for each signal
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317  # Traces
OTEL_METRICS_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317  # Metrics
OTEL_LOGS_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317  # Logs

Transport Protocol

Choose between HTTP and gRPC:
# gRPC (default, port 4317)
OTEL_OTLP_SPAN_EXPORTER=grpc
OTEL_METRICS_OTLP_SPAN_EXPORTER=grpc
OTEL_LOGS_OTLP_SPAN_EXPORTER=grpc

# HTTP (port 4318)
OTEL_OTLP_SPAN_EXPORTER=http
OTEL_METRICS_OTLP_SPAN_EXPORTER=http
OTEL_LOGS_OTLP_SPAN_EXPORTER=http
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
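
A common misconfiguration is pairing one protocol with the other protocol's port. The following minimal sketch flags that case using the conventional ports listed above (4317 for gRPC, 4318 for HTTP). The `check_endpoint` function is a hypothetical helper, not part of Open WebUI.

```python
from urllib.parse import urlparse

# Conventional OTLP ports, as used in the configuration above.
DEFAULT_PORTS = {"grpc": 4317, "http": 4318}


def check_endpoint(protocol, endpoint):
    """Return a warning string if the port looks wrong for the protocol, else None."""
    if protocol not in DEFAULT_PORTS:
        return f"unknown protocol {protocol!r}; expected 'grpc' or 'http'"
    port = urlparse(endpoint).port
    expected = DEFAULT_PORTS[protocol]
    # Only warn when the port is the *other* protocol's default; custom ports such as 443 are allowed.
    other_ports = [p for name, p in DEFAULT_PORTS.items() if name != protocol]
    if port in other_ports:
        return f"{endpoint} uses port {port}, but {protocol} normally uses {expected}"
    return None


print(check_endpoint("http", "http://otel-collector:4317"))
```

The check deliberately stays silent for non-default ports, since hosted backends often serve OTLP on 443.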

Authentication

For authenticated endpoints:
# Basic authentication
OTEL_BASIC_AUTH_USERNAME=your-username
OTEL_BASIC_AUTH_PASSWORD=your-password

# Per-signal authentication
OTEL_METRICS_BASIC_AUTH_USERNAME=metrics-user
OTEL_METRICS_BASIC_AUTH_PASSWORD=metrics-password
OTEL_LOGS_BASIC_AUTH_USERNAME=logs-user
OTEL_LOGS_BASIC_AUTH_PASSWORD=logs-password
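
For reference, HTTP Basic authentication (RFC 7617) sends the username and password as a Base64-encoded `Authorization` header. The sketch below shows that standard encoding; how Open WebUI attaches the header to its exporters internally is not covered here, and `basic_auth_header` is a hypothetical helper.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic `Authorization` header (RFC 7617)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


print(basic_auth_header("your-username", "your-password"))
```

Because Base64 is an encoding and not encryption, these credentials are only protected when the connection itself uses TLS.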

Resource Attributes

Add custom resource attributes:
# Comma-separated key=value pairs
OTEL_RESOURCE_ATTRIBUTES=environment=production,region=us-east-1,version=1.0.0
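
To make the comma-separated format concrete, here is a minimal sketch of how such a string can be split into key/value pairs. The `parse_resource_attributes` function is illustrative only; the OpenTelemetry SDK performs its own parsing.

```python
def parse_resource_attributes(raw):
    """Parse 'k1=v1,k2=v2' into a dict, skipping empty or malformed pairs."""
    attributes = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if sep and key.strip():
            attributes[key.strip()] = value.strip()
    return attributes


attrs = parse_resource_attributes("environment=production,region=us-east-1,version=1.0.0")
print(attrs)
# {'environment': 'production', 'region': 'us-east-1', 'version': '1.0.0'}
```

Note that values containing commas or equals signs would need escaping, which this sketch does not handle.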

Sampling Configuration

# Trace sampler: always_on, always_off, traceidratio, or parentbased_always_on
OTEL_TRACES_SAMPLER=parentbased_always_on

# Alternatively, sample 10% of traces (for high-traffic deployments)
OTEL_TRACES_SAMPLER=traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
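
Ratio-based sampling is typically deterministic: the decision is derived from the trace ID, so every service that sees the same trace makes the same choice. The sketch below shows one common way to implement that idea by comparing the low 64 bits of the trace ID against a threshold. It illustrates the technique under that assumption and is not the SDK's own code.

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1  # mask selecting the low 64 bits of a trace ID


def should_sample(trace_id, ratio):
    """Deterministic ratio-based decision for an integer trace ID."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


# Simulate 10,000 random 128-bit trace IDs with a 10% ratio.
rng = random.Random(0)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(t, 0.1) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

With a ratio of 0.1, roughly one in ten traces is kept, and repeating the decision for the same trace ID always gives the same answer.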

Observability Backend Integration

Datadog

ENABLE_OTEL=true
ENABLE_OTEL_TRACES=true
ENABLE_OTEL_METRICS=true
ENABLE_OTEL_LOGS=true

OTEL_SERVICE_NAME=open-webui
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.datadoghq.com:4317
OTEL_OTLP_SPAN_EXPORTER=grpc

# Datadog API key as header
OTEL_EXPORTER_OTLP_HEADERS=api-key=your-datadog-api-key

OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production,service.version=1.0.0

New Relic

ENABLE_OTEL=true
ENABLE_OTEL_TRACES=true
ENABLE_OTEL_METRICS=true

OTEL_SERVICE_NAME=open-webui
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4317
OTEL_OTLP_SPAN_EXPORTER=grpc

# New Relic license key
OTEL_EXPORTER_OTLP_HEADERS=api-key=your-new-relic-license-key

Honeycomb

ENABLE_OTEL=true
ENABLE_OTEL_TRACES=true

OTEL_SERVICE_NAME=open-webui
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io:443
OTEL_OTLP_SPAN_EXPORTER=grpc

# Honeycomb API key and dataset
OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=your-api-key,x-honeycomb-dataset=open-webui

Grafana Cloud

ENABLE_OTEL=true
ENABLE_OTEL_TRACES=true
ENABLE_OTEL_METRICS=true
ENABLE_OTEL_LOGS=true

OTEL_SERVICE_NAME=open-webui
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-east-0.grafana.net:443
OTEL_OTLP_SPAN_EXPORTER=grpc

# Grafana Cloud credentials
OTEL_BASIC_AUTH_USERNAME=your-instance-id
OTEL_BASIC_AUTH_PASSWORD=your-api-token

Self-Hosted OpenTelemetry Collector

ENABLE_OTEL=true
ENABLE_OTEL_TRACES=true
ENABLE_OTEL_METRICS=true
ENABLE_OTEL_LOGS=true

OTEL_SERVICE_NAME=open-webui
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_INSECURE=true
OTEL_OTLP_SPAN_EXPORTER=grpc

Docker Compose with OpenTelemetry Collector

version: '3'

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      # OpenTelemetry Configuration
      - ENABLE_OTEL=true
      - ENABLE_OTEL_TRACES=true
      - ENABLE_OTEL_METRICS=true
      - ENABLE_OTEL_LOGS=true
      - OTEL_SERVICE_NAME=open-webui
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
      - OTEL_EXPORTER_OTLP_INSECURE=true
      - OTEL_RESOURCE_ATTRIBUTES=environment=production,region=us-east-1
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - otel-collector
    restart: always

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"  # OTLP gRPC
      - "4318:4318"  # OTLP HTTP
      - "8888:8888"  # Prometheus metrics
      - "13133:13133"  # Health check
    restart: always

  # Optional: Local Jaeger for trace visualization
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # Jaeger UI
      - "14250:14250"  # gRPC
    restart: always

  # Optional: Prometheus for metrics
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    restart: always

volumes:
  open-webui:

OpenTelemetry Collector Configuration

otel-collector-config.yaml:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024
  
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

exporters:
  # Jaeger for traces
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  
  # Prometheus for metrics
  prometheus:
    endpoint: "0.0.0.0:8888"
  
  # Debug exporter for troubleshooting
  # (the older `logging` exporter has been removed from recent Collector releases)
  debug:
    verbosity: normal

  # Example: Export to Datadog
  # datadog:
  #   api:
  #     key: ${DD_API_KEY}
  #     site: datadoghq.com

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger, debug]

    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus, debug]

    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug]
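
A pipeline can only reference receivers, processors, and exporters that are defined in the top-level sections of the same file. The following sketch checks that rule against an already-parsed configuration. The `undefined_components` helper and the trimmed-down `config` dict (including the deliberately undefined `zipkin` exporter) are illustrative assumptions, not part of the Collector.

```python
def undefined_components(config):
    """Return (pipeline, kind, name) tuples for components used but never defined."""
    problems = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for kind in ("receivers", "processors", "exporters"):
            defined = config.get(kind, {})
            for name in pipeline.get(kind, []):
                if name not in defined:
                    problems.append((pipeline_name, kind, name))
    return problems


# A reduced dict with the same shape as a parsed collector config.
config = {
    "receivers": {"otlp": {}},
    "processors": {"batch": {}, "memory_limiter": {}},
    "exporters": {"otlp/jaeger": {}, "prometheus": {}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["memory_limiter", "batch"],
                "exporters": ["otlp/jaeger"],
            },
            "metrics": {
                "receivers": ["otlp"],
                "processors": ["memory_limiter", "batch"],
                "exporters": ["prometheus", "zipkin"],  # "zipkin" is not defined above
            },
        }
    },
}

print(undefined_components(config))
# [('metrics', 'exporters', 'zipkin')]
```

In practice the dict would come from loading the YAML file with a YAML parser before running the check.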

Auto-Instrumented Components

Open WebUI automatically instruments:

FastAPI

  • HTTP request/response traces
  • Endpoint latency metrics
  • Error rates and status codes
  • Request attributes (method, path, status)

SQLAlchemy

  • Database query traces
  • Query execution time
  • Connection pool metrics
  • Database type and operations

Redis

  • Redis command traces
  • Command execution time
  • Connection metrics
  • Cache hit/miss rates

HTTP Clients (httpx, aiohttp, requests)

  • External API call traces
  • Request/response attributes
  • Latency and error tracking

Custom Instrumentation

Add custom spans and metrics in your code:
from opentelemetry import metrics, trace

# Placeholders so the snippet runs on its own; replace with your application code
user_id = "user-123"

def process_data():
    return {"status": "ok"}

# Get a tracer
tracer = trace.get_tracer(__name__)

# Create a custom span
with tracer.start_as_current_span("custom_operation") as span:
    span.set_attribute("operation.type", "data_processing")
    span.set_attribute("user.id", user_id)
    span.add_event("Processing started")
    result = process_data()
    span.add_event("Processing completed")

# Record a custom metric
meter = metrics.get_meter(__name__)
counter = meter.create_counter(
    "custom.requests",
    description="Custom request counter",
)
counter.add(1, {"endpoint": "/api/chat"})

Structured Logging

Enable JSON-formatted logs for centralized log aggregation:
# Enable JSON logging
LOG_FORMAT=json

# Set log level
GLOBAL_LOG_LEVEL=INFO  # DEBUG, INFO, WARNING, ERROR, CRITICAL

JSON log output:
{
  "ts": "2024-03-02T10:30:45.123Z",
  "level": "info",
  "msg": "User authenticated via LDAP",
  "caller": "open_webui.routers.auths",
  "user_id": "user-123",
  "auth_method": "ldap"
}
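
To show how log lines of this shape can be produced, here is a minimal sketch of a JSON formatter built on Python's standard `logging` module. The `JsonFormatter` class and the `fields` convention for extra attributes are assumptions for illustration and are not Open WebUI's actual logging code.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with ts, level, msg, and caller keys."""

    def format(self, record):
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "ts": ts.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
            "caller": record.name,
        }
        # Extra fields arrive via extra={"fields": {...}}; the "fields" name is this sketch's convention.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("open_webui.routers.auths")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "User authenticated via LDAP",
    extra={"fields": {"user_id": "user-123", "auth_method": "ldap"}},
)
```

Emitting one JSON object per line keeps the output easy for log shippers and aggregators to parse.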

Metrics

Key metrics collected:
  • HTTP Metrics
    • http.server.duration - Request latency
    • http.server.active_requests - Concurrent requests
    • http.server.request.size - Request body size
    • http.server.response.size - Response body size
  • Database Metrics
    • db.client.connections.usage - Connection pool usage
    • db.client.operation.duration - Query execution time
  • Redis Metrics
    • redis.command.duration - Command execution time
    • redis.connections - Active connections

Troubleshooting

Traces Not Appearing

Check the following:
  • ENABLE_OTEL=true and ENABLE_OTEL_TRACES=true are both set
  • The collector endpoint is reachable: telnet otel-collector 4317
  • The collector logs show no connection errors
  • The authentication credentials are correct
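
If telnet is not available in the container, the same reachability test can be done from Python. This is a minimal sketch using only the standard library; `can_connect` is a hypothetical helper, and a successful TCP connection only shows that the port is open, not that OTLP export works end to end.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_connect("otel-collector", 4317)` checks the gRPC port used in the examples above, and `can_connect("otel-collector", 4318)` checks the HTTP port.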

High Overhead

Lower the trace sampling ratio:
OTEL_TRACES_SAMPLER=traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1  # Sample 10% of traces

Collector Connection Refused

Make sure the endpoint port matches the transport protocol.

For gRPC:
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_INSECURE=true

For HTTP:
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
OTEL_OTLP_SPAN_EXPORTER=http

TLS Certificate Errors

# Use insecure connection for testing (NOT for production)
OTEL_EXPORTER_OTLP_INSECURE=true
OTEL_METRICS_EXPORTER_OTLP_INSECURE=true
OTEL_LOGS_EXPORTER_OTLP_INSECURE=true

Implementation Details

  • Telemetry setup: backend/open_webui/utils/telemetry/setup.py
  • Metrics configuration: backend/open_webui/utils/telemetry/metrics.py
  • Logs configuration: backend/open_webui/utils/telemetry/logs.py
  • Auto-instrumentation: backend/open_webui/utils/telemetry/instrumentors.py
  • Uses OpenTelemetry SDK 1.39.1 with OTLP exporters

Security Considerations

Observability Security:
  1. Sensitive Data - Ensure traces/logs don’t contain passwords, API keys, or PII
  2. Authentication - Use Basic Auth or API keys for collector endpoints
  3. TLS/SSL - Enable TLS for production collector connections
  4. Network Isolation - Restrict collector access to application network
  5. Sampling - Use sampling to reduce data volume and costs
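
One way to act on the first point is to mask sensitive values before attaching them to spans or log records. The sketch below redacts attributes whose keys look sensitive. The `redact_attributes` helper and its key patterns are illustrative assumptions and should be adapted to the data your deployment actually handles.

```python
import re

# Key patterns treated as sensitive; this list is illustrative, not exhaustive.
SENSITIVE_KEY = re.compile(
    r"(password|passwd|secret|token|api[_-]?key|authorization)", re.IGNORECASE
)


def redact_attributes(attributes, placeholder="[REDACTED]"):
    """Return a copy of the attributes with values of sensitive-looking keys masked."""
    return {
        key: placeholder if SENSITIVE_KEY.search(key) else value
        for key, value in attributes.items()
    }


safe = redact_attributes(
    {
        "user.id": "user-123",
        "auth.password": "hunter2",
        "http.request.header.api-key": "abc",
    }
)
print(safe)
```

Key-based masking does not catch secrets embedded in free-text values, so message contents still need separate review.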

Next Steps