Memories in Open WebUI provide long-term storage for important information, preferences, and context. This enables your AI models to remember details across conversations and provide more personalized, context-aware responses.

Overview

Memories provide:
  • Persistent user-specific storage
  • Vector-based semantic search
  • Automatic embedding generation
  • Context retrieval for conversations
  • Privacy-focused per-user isolation
Memories must be explicitly enabled in your Open WebUI configuration. They require a vector database and embedding function to be configured.

Architecture

Memories use a two-tier storage system:
  1. Database: Stores the actual memory content and metadata
  2. Vector Store: Stores embeddings for semantic search
When you query memories, the system:
  1. Generates an embedding of your query
  2. Searches the vector database for similar memories
  3. Returns the most relevant memories
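
Here is a self-contained toy sketch of the three query steps. It is not Open WebUI's implementation. A bag-of-words counter stands in for the embedding model, and a brute-force cosine scan stands in for the vector database. The function names are illustrative only.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-count vector (stand-in for a real model)."""
    return Counter(word.strip(".,?!:()").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def query_memories(query: str, memories: list[str], k: int = 3) -> list[tuple[float, str]]:
    # 1. Embed the query
    q = embed(query)
    # 2. Score every stored memory (a real vector DB uses an index instead of a full scan)
    scored = [(cosine(q, embed(m)), m) for m in memories]
    # 3. Return the k most similar memories, best first
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

A real embedding model also captures synonyms and paraphrases that plain word overlap misses. The ranking step works the same way in both cases.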

Configuration

Enable memories in your environment:
# Enable memories feature
ENABLE_MEMORIES=true

# Vector database (required)
VECTOR_DB=chroma  # or qdrant, milvus, etc.

# Embedding function (required)
EMBEDDING_FUNCTION=sentence-transformers
EMBEDDING_MODEL=all-MiniLM-L6-v2
Memories require a configured vector database and embedding function. Without these, the feature will not work.
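
Before starting the server, you can catch a missing setting with a small pre-flight check. The helper below is hypothetical and not part of Open WebUI. It uses the variable names from the configuration above, treats only "true" (case-insensitive) as enabled, and checks only that values are present. It does not check that the vector database is reachable.

```python
REQUIRED_SETTINGS = ("VECTOR_DB", "EMBEDDING_FUNCTION", "EMBEDDING_MODEL")


def memory_config_problems(env: dict[str, str]) -> list[str]:
    """Return human-readable problems with the memory-related settings in env."""
    problems = []
    # Assumption of this sketch: only "true" (any case) counts as enabled
    if env.get("ENABLE_MEMORIES", "").strip().lower() != "true":
        problems.append("ENABLE_MEMORIES is not set to true")
    for name in REQUIRED_SETTINGS:
        if not env.get(name, "").strip():
            problems.append(f"{name} is missing or empty")
    return problems
```

Call it with dict(os.environ) and print any problems it returns. An empty list means the settings are at least present.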

Managing Memories

Add Memory

Store a new memory:
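import requests

BASE_URL = "http://localhost:8080"
# Placeholder: an Open WebUI API key or JWT. Memory endpoints act on the
# authenticated user, so real requests need this header (the later
# examples omit it for brevity).
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

url = f"{BASE_URL}/api/memories/add"
payload = {
    "content": "The user prefers Python for backend development and React for frontend."
}

response = requests.post(url, json=payload, headers=HEADERS)
response.raise_for_status()  # fail loudly on 401/403/500 instead of parsing an error body
memory = response.json()

print(f"Memory ID: {memory['id']}")
print(f"Content: {memory['content']}")
print(f"Created: {memory['created_at']}")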
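
If you would rather avoid the requests dependency, the same call can be made with the Python standard library. Memory endpoints operate on the current user, so this sketch assumes an Open WebUI API key sent as a bearer token. The helper name and BASE_URL are illustrative. The function only builds the request and does not send it, which keeps it easy to inspect.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # adjust to your deployment


def build_add_memory_request(content: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to the add-memory endpoint."""
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/memories/add",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumption: bearer-token auth
        },
        method="POST",
    )
```

To send it, use: with urllib.request.urlopen(req) as resp: memory = json.load(resp).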

List All Memories

Retrieve all memories for the current user:
import requests

response = requests.get("http://localhost:8080/api/memories/")
memories = response.json()

for memory in memories:
    print(f"[{memory['created_at']}] {memory['content']}")
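
The raw list is easier to read when it is sorted newest first and the timestamps are formatted. This sketch assumes created_at is a Unix timestamp in seconds. If your server returns a different format, adjust the parsing. The function name is hypothetical.

```python
from datetime import datetime, timezone


def format_memories(memories: list[dict]) -> list[str]:
    """Newest-first display lines, assuming created_at is Unix seconds."""
    lines = []
    for m in sorted(memories, key=lambda m: m["created_at"], reverse=True):
        stamp = datetime.fromtimestamp(m["created_at"], tz=timezone.utc)
        lines.append(f"[{stamp:%Y-%m-%d %H:%M} UTC] {m['content']}")
    return lines
```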

Query Memories

Search for relevant memories using semantic search:
import requests

url = "http://localhost:8080/api/memories/query"
payload = {
    "content": "What programming languages does the user like?",
    "k": 3  # Return top 3 most relevant memories
}

response = requests.post(url, json=payload)
results = response.json()

for result in results:
    print(f"Score: {result.get('score', 'N/A')}")
    print(f"Memory: {result['text']}")
    print("---")
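
Because the example treats score as optional, code that consumes query results should handle both cases. The hypothetical helper below ranks scored results and can drop weak matches. It assumes a higher score means more similar. If your vector backend reports a distance instead, invert the comparison. Results without a score are kept and placed last.

```python
from typing import Optional


def relevant_texts(results: list[dict], min_score: Optional[float] = None) -> list[str]:
    """Texts of query results, best score first; unscored results go last."""
    scored = [r for r in results if r.get("score") is not None]
    unscored = [r for r in results if r.get("score") is None]
    if min_score is not None:
        # Assumption: higher score = more similar
        scored = [r for r in scored if r["score"] >= min_score]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return [r["text"] for r in scored + unscored]
```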

Update Memory

Modify an existing memory:
import requests

memory_id = "memory-uuid-here"
url = f"http://localhost:8080/api/memories/{memory_id}/update"
payload = {
    "content": "The user prefers Python and TypeScript for development."
}

response = requests.post(url, json=payload)
updated_memory = response.json()
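
A common pattern is "update if it exists, otherwise add". This avoids storing two conflicting versions of the same fact, such as "User timezone: EST" and "User timezone: PST". The sketch below plans the request using the add and update endpoints shown on this page. The prefix-matching convention and the function names are assumptions, and the input is the list returned by the List All Memories call.

```python
from typing import Optional


def find_memory_to_update(memories: list[dict], prefix: str) -> Optional[str]:
    """Return the id of the first memory whose content starts with prefix, else None."""
    for memory in memories:
        if memory["content"].startswith(prefix):
            return memory["id"]
    return None


def plan_upsert(memories: list[dict], prefix: str, new_content: str,
                base_url: str = "http://localhost:8080") -> tuple[str, str, dict]:
    """Return (method, url, payload) for either an update or an add."""
    payload = {"content": new_content}
    memory_id = find_memory_to_update(memories, prefix)
    if memory_id is None:
        return ("POST", f"{base_url}/api/memories/add", payload)
    return ("POST", f"{base_url}/api/memories/{memory_id}/update", payload)
```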

Delete Memory

Remove a specific memory:
import requests

memory_id = "memory-uuid-here"
response = requests.delete(f"http://localhost:8080/api/memories/{memory_id}")

if response.json():
    print("Memory deleted successfully")
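
Single-memory deletion is the building block for a retention policy. The hypothetical helper below selects memories older than a cutoff. Each returned ID can then be passed to the DELETE call above. It assumes created_at is a Unix timestamp in seconds. The now parameter exists so the cutoff can be tested deterministically.

```python
import time
from typing import Optional


def expired_memory_ids(memories: list[dict], max_age_days: int,
                       now: Optional[float] = None) -> list[str]:
    """IDs of memories older than max_age_days (created_at assumed to be Unix seconds)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    return [m["id"] for m in memories if m["created_at"] < cutoff]
```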

Delete All Memories

Clear all memories for the current user:
import requests

response = requests.delete("http://localhost:8080/api/memories/delete/user")

if response.json():
    print("All memories deleted")

Reset Memory Embeddings

Regenerate the embeddings for all of your memories (useful after changing the embedding model):
import requests

response = requests.post("http://localhost:8080/api/memories/reset")

if response.json():
    print("Memory embeddings regenerated")
The reset operation regenerates embeddings for every memory belonging to the current user, so it can take a while if you have many memories.

Memory Workflow

1. User Interaction

User shares information during a conversation:
  • “I prefer dark mode”
  • “My timezone is EST”
  • “I work with Python and JavaScript”

2. Memory Creation

Important information is stored as memories:
import requests

memories = [
    "User prefers dark mode interface",
    "User timezone: EST (UTC-5)",
    "User works with Python and JavaScript"
]

for content in memories:
    requests.post(
        "http://localhost:8080/api/memories/add",
        json={"content": content}
    )

3. Embedding Generation

The system automatically generates embeddings and stores them in the vector database.

4. Context Retrieval

When user asks a question, relevant memories are retrieved:
import requests

response = requests.post(
    "http://localhost:8080/api/memories/query",
    json={
        "content": "What theme should I use?",
        "k": 1
    }
)
# Expected top match: "User prefers dark mode interface"

5. Personalized Response

The AI model uses the retrieved memories to provide personalized answers.
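
This page does not say how Open WebUI decides which statements become memories. The sketch below is a deliberately simple, hypothetical rule-based heuristic. It only illustrates how first-person statements from step 1 could be turned into third-person memory strings like those in step 2. The rules and names are illustrative, not the product's logic.

```python
import re
from typing import Optional

# Hypothetical rewrite rules: (pattern on the user's sentence, memory template)
RULES = [
    (re.compile(r"^I prefer (.+)$", re.IGNORECASE), "User prefers {0}"),
    (re.compile(r"^I work with (.+)$", re.IGNORECASE), "User works with {0}"),
    (re.compile(r"^My (\w+) is (.+)$", re.IGNORECASE), "User {0}: {1}"),
]


def to_memory(statement: str) -> Optional[str]:
    """Turn a first-person statement into a third-person memory, or None if no rule fits."""
    # Drop surrounding quotes (straight or curly) and trailing sentence punctuation
    text = statement.strip().strip("\"\u201c\u201d").rstrip(".!")
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            return template.format(*match.groups())
    return None
```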

Use Cases

User Preferences

Store user preferences for personalized experiences:
import requests

preferences = [
    "User prefers concise responses",
    "User wants code examples in Python",
    "User is interested in machine learning",
    "User works in healthcare industry"
]

for pref in preferences:
    requests.post(
        "http://localhost:8080/api/memories/add",
        json={"content": pref}
    )
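
Nothing in the loop above checks for existing entries, so running it twice can store the same preference again. A minimal sketch that filters candidates against existing memory contents is shown below. For example, you can pass the content fields from the List All Memories call. It compares case- and whitespace-insensitively, which is a choice of this sketch. The function name is hypothetical.

```python
def new_memories(candidates: list[str], existing: list[str]) -> list[str]:
    """Drop candidates that duplicate an existing memory (ignoring case and whitespace)."""
    def key(text: str) -> str:
        return " ".join(text.lower().split())

    seen = {key(text) for text in existing}
    fresh = []
    for text in candidates:
        k = key(text)
        if k not in seen:
            seen.add(k)  # also removes duplicates within the candidate list itself
            fresh.append(text)
    return fresh
```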

Project Context

Remember ongoing project details:
import requests

project_info = [
    "Current project: Building a REST API for inventory management",
    "Tech stack: FastAPI, PostgreSQL, Docker",
    "Deployment target: AWS ECS",
    "Team size: 3 developers"
]

for info in project_info:
    requests.post(
        "http://localhost:8080/api/memories/add",
        json={"content": info}
    )

Learning Progress

Track learning journey and achievements:
import requests

learning_memories = [
    "Completed Python basics course on 2024-01-15",
    "Struggling with async/await concepts",
    "Built first REST API successfully",
    "Next goal: Learn Docker containerization"
]

for memory in learning_memories:
    requests.post(
        "http://localhost:8080/api/memories/add",
        json={"content": memory}
    )

Vector Database Integration

Memories are stored in user-specific collections:
# Collection naming pattern
collection_name = f"user-memory-{user_id}"

# Example: user-memory-123e4567-e89b-12d3-a456-426614174000
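
The naming pattern can be wrapped in a small helper. This sketch assumes user IDs are UUIDs, as in the example above. The UUID validation and lowercase normalization are additions for illustration, not documented Open WebUI behavior.

```python
import uuid


def memory_collection_name(user_id: str) -> str:
    """Per-user collection name; raises ValueError if user_id is not a UUID."""
    # uuid.UUID() validates the ID and str() normalizes it to lowercase hyphenated form
    return f"user-memory-{uuid.UUID(user_id)}"
```
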
Supported vector databases:
  • ChromaDB (default)
  • Qdrant
  • Milvus
  • Elasticsearch
  • OpenSearch
  • Pinecone
  • PGVector
  • S3Vector
  • Oracle 23ai

Privacy and Security

Memories are user-isolated. Each user has their own private memory collection that other users cannot access.
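
The isolation rule can be illustrated with a toy in-memory store. This is not Open WebUI's code. The class is hypothetical and only demonstrates the invariant: collections are keyed by user, and every read is keyed by the caller's own ID.

```python
class PerUserMemoryStore:
    """Toy store: one collection per user, and reads are scoped to the caller."""

    def __init__(self) -> None:
        self._collections: dict[str, list[str]] = {}

    @staticmethod
    def _collection(user_id: str) -> str:
        # Same naming pattern as the vector-database collections
        return f"user-memory-{user_id}"

    def add(self, user_id: str, content: str) -> None:
        self._collections.setdefault(self._collection(user_id), []).append(content)

    def recall(self, user_id: str) -> list[str]:
        # Only the caller's own collection is consulted; there is no cross-user lookup
        return list(self._collections.get(self._collection(user_id), []))
```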

Best Practices

  1. Sensitive Data: Avoid storing passwords or API keys in memories
  2. PII: Be cautious with personally identifiable information
  3. Retention: Implement a memory lifecycle policy
  4. Encryption: Use encrypted vector databases for production
  5. Access Control: Ensure the memories feature is permission-gated (features.memories)
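
To follow the first practice, you can screen content on the client before it reaches the add endpoint. The patterns below are hypothetical and intentionally incomplete. They catch obvious "password:" or "api_key=" phrasing and OpenAI-style "sk-" keys. Treat a match as a reason to refuse or redact the content, and do not treat a non-match as proof the text is safe.

```python
import re

# Hypothetical, deliberately incomplete patterns for secret-looking text
SECRET_PATTERNS = [
    re.compile(r"\bpassword\s*[:=]", re.IGNORECASE),
    re.compile(r"\b(api[_-]?key|secret|token)\s*[:=]", re.IGNORECASE),
    re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),  # OpenAI-style key prefix
]


def looks_sensitive(content: str) -> bool:
    """True if the memory text matches any secret-like pattern."""
    return any(p.search(content) for p in SECRET_PATTERNS)
```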

Performance Considerations

Embedding Generation

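Embedding generation is asynchronous, so computing a memory's embedding does not block the server's event loop:
# Server-side excerpt: the add-memory handler awaits the embedding call,
# so other requests keep being served while the embedding is computed
vector = await EMBEDDING_FUNCTION(memory.content, user=user)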

Bulk Operations

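When adding many memories, consider sending the requests concurrently:
import asyncio

import requests

async def add_memories_bulk(memories):
    # requests is synchronous, so run each POST in a worker thread;
    # asyncio.gather can only await coroutines/futures, not Response objects
    tasks = [
        asyncio.to_thread(
            requests.post,
            "http://localhost:8080/api/memories/add",
            json={"content": content},
        )
        for content in memories
    ]
    return await asyncio.gather(*tasks)

# Usage: responses = asyncio.run(add_memories_bulk(["memory one", "memory two"]))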
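
Launching every request at once can overload a small server. A semaphore caps how many requests are in flight at the same time. In this sketch the sender is a parameter, so the concurrency logic can be tested without a server. The function name and the default limit of 5 are arbitrary choices.

```python
import asyncio
from typing import Awaitable, Callable


async def add_in_batches(
    contents: list[str],
    send: Callable[[str], Awaitable[object]],
    max_concurrency: int = 5,
) -> list[object]:
    """Run send(content) for every item, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(content: str) -> object:
        async with semaphore:
            return await send(content)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(c) for c in contents))
```

To use it against the server, pass send=lambda c: asyncio.to_thread(requests.post, "http://localhost:8080/api/memories/add", json={"content": c}).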

Vector Search Optimization

Adjust the k parameter to trade speed for context:
import requests

# Fewer results = faster search
response = requests.post(
    "http://localhost:8080/api/memories/query",
    json={"content": "query", "k": 3}
)

# More results = better context
response = requests.post(
    "http://localhost:8080/api/memories/query",
    json={"content": "query", "k": 10}
)

Integration Example

Memory-Augmented Chat

import requests

def chat_with_memory(user_message, model):
    # model: the ID of a model available in your Open WebUI instance
    # 1. Query relevant memories
    memory_response = requests.post(
        "http://localhost:8080/api/memories/query",
        json={"content": user_message, "k": 3}
    )
    memory_response.raise_for_status()
    relevant_memories = memory_response.json()

    # 2. Build context from memories
    context = "\n".join(f"- {mem['text']}" for mem in relevant_memories)

    # 3. Build the system prompt (concatenated so source indentation doesn't leak in)
    system_prompt = (
        "You have access to the following memories about the user:\n"
        f"{context}\n\n"
        "Use this information to provide personalized responses."
    )

    # 4. Send to the OpenAI-compatible chat completions endpoint ("model" is required)
    chat_response = requests.post(
        "http://localhost:8080/api/chat/completions",
        json={
            "model": model,
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_message}
            ]
        }
    )
    chat_response.raise_for_status()

    return chat_response.json()
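
Factoring the prompt assembly into a pure function makes it testable without a running server. It also gives you one place to handle the case where no memory is relevant. In this sketch, returning a neutral prompt for an empty list is a design choice, and the function name is hypothetical.

```python
def build_system_prompt(memories: list[str]) -> str:
    """Assemble a system prompt from retrieved memory texts."""
    if not memories:
        # Choice of this sketch: say so explicitly rather than send an empty list
        return "No stored memories are relevant to this message."
    bullet_list = "\n".join(f"- {text}" for text in memories)
    return (
        "You have access to the following memories about the user:\n"
        f"{bullet_list}\n\n"
        "Use this information to provide personalized responses."
    )
```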

Troubleshooting

Memories Not Saving

  • Verify ENABLE_MEMORIES=true in configuration
  • Check that the vector database is running and accessible
  • Ensure the embedding function is configured
  • Review user permissions for features.memories

Search Returns No Results

  • Confirm memories exist for the user
  • Try broader search queries
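  • Check that memories were embedded with the current embedding model (reset memory embeddings after switching models)
  • Verify that the user's vector database collection (user-memory-{user_id}) exists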

Slow Performance

  • Reduce the k parameter in queries
  • Consider faster embedding models
  • Optimize vector database configuration
  • Use local embedding models instead of API calls

Next Steps

  • Learn about Artifacts for persistent storage
  • Explore Skills for reusable capabilities
  • Configure Functions for custom logic