Memories in Open WebUI provide long-term storage for important information, preferences, and context. This enables your AI models to remember details across conversations and provide more personalized, context-aware responses.
Overview
Memories provide:
- Persistent user-specific storage
- Vector-based semantic search
- Automatic embedding generation
- Context retrieval for conversations
- Privacy-focused per-user isolation
Memories must be explicitly enabled in your Open WebUI configuration. They require a vector database and embedding function to be configured.
Architecture
Memories use a two-tier storage system:
- Database: Stores the actual memory content and metadata
- Vector Store: Stores embeddings for semantic search
When you query memories, the system:
- Generates an embedding of your query
- Searches the vector database for similar memories
- Returns the most relevant memories
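The three query steps above can be illustrated with a toy sketch that has no dependencies. It assumes a bag-of-words vector as a stand-in for a real embedding model and a plain list as a stand-in for the vector store. The names `embed`, `cosine`, and `search` are hypothetical and are not Open WebUI functions.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, memories, k=3):
    """Embed the query, score every stored memory, and return the top k."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(m)), m) for m in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

memories = [
    "user prefers dark mode",
    "user timezone is EST",
    "user works with python and javascript",
]
top = search("which mode does the user prefer", memories, k=1)
print(top[0][1])
```

A real deployment replaces `embed` with the configured embedding model and the list scan with a vector database lookup, but the flow of embed, compare, and rank stays the same.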
Configuration
Enable memories in your environment:
# Enable memories feature
ENABLE_MEMORIES=true
# Vector database (required): chroma, qdrant, milvus, etc.
VECTOR_DB=chroma
# Embedding function (required)
EMBEDDING_FUNCTION=sentence-transformers
EMBEDDING_MODEL=all-MiniLM-L6-v2
Memories require a configured vector database and embedding function. Without these, the feature will not work.
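Because the feature depends on several settings at once, a quick pre-flight check can catch a missing one before the server starts. The sketch below only tests that the variables named on this page are present. It does not verify that the values are accepted by your Open WebUI version, and `memory_config_problems` is a hypothetical helper.

```python
import os

# Variable names are taken from the configuration shown above.
REQUIRED_SETTINGS = ("VECTOR_DB", "EMBEDDING_FUNCTION", "EMBEDDING_MODEL")

def memory_config_problems(env):
    """Return a list of problems; an empty list means the settings look complete."""
    problems = []
    if env.get("ENABLE_MEMORIES", "").strip().lower() != "true":
        problems.append("ENABLE_MEMORIES is not set to true")
    for name in REQUIRED_SETTINGS:
        if not env.get(name, "").strip():
            problems.append(f"{name} is missing or empty")
    return problems

# Check the current process environment and report anything missing.
for problem in memory_config_problems(os.environ):
    print(problem)
```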
Managing Memories
Add Memory
Store a new memory:
import requests
url = "http://localhost:8080/api/memories/add"
payload = {
    "content": "The user prefers Python for backend development and React for frontend."
}
response = requests.post(url, json=payload)
memory = response.json()
print(f"Memory ID: {memory['id']}")
print(f"Content: {memory['content']}")
print(f"Created: {memory['created_at']}")
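The request above is sent without credentials. If your deployment requires authentication, one common pattern is a bearer token in the `Authorization` header. That is an assumption here, since this page does not describe authentication. The sketch uses only the standard library, and `build_add_memory_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

def build_add_memory_request(content, base_url="http://localhost:8080", token=None):
    """Build (but do not send) a POST request for the add-memory endpoint."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the server accepts a bearer token in this header.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/memories/add",
        data=body,
        headers=headers,
        method="POST",
    )

def send(request, timeout=10):
    """Send the request and decode the JSON body; HTTP errors raise an exception."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes it easy to inspect headers and payload before anything reaches the server.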
List All Memories
Retrieve all memories for the current user:
import requests
response = requests.get("http://localhost:8080/api/memories/")
memories = response.json()
for memory in memories:
    print(f"[{memory['created_at']}] {memory['content']}")
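When the list grows, it often helps to show the newest memories first with a readable date. The sketch below assumes `created_at` is a Unix timestamp in seconds, which this page does not state. Adjust the parsing if your server returns a different format. `newest_first` and `format_memory` are hypothetical helpers.

```python
from datetime import datetime, timezone

def newest_first(memories):
    """Return memories sorted by created_at, most recent first."""
    return sorted(memories, key=lambda m: m["created_at"], reverse=True)

def format_memory(memory):
    """Render one memory with a readable UTC timestamp."""
    # Assumption: created_at is a Unix timestamp in seconds.
    created = datetime.fromtimestamp(memory["created_at"], tz=timezone.utc)
    return f"[{created:%Y-%m-%d %H:%M} UTC] {memory['content']}"

# Sample data standing in for the JSON returned by the list endpoint.
sample = [
    {"created_at": 1700000000, "content": "User prefers dark mode"},
    {"created_at": 1700086400, "content": "User timezone: EST (UTC-5)"},
]
lines = [format_memory(m) for m in newest_first(sample)]
print("\n".join(lines))
```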
Query Memories
Search for relevant memories using semantic search:
import requests
url = "http://localhost:8080/api/memories/query"
payload = {
    "content": "What programming languages does the user like?",
    "k": 3  # Return top 3 most relevant memories
}
response = requests.post(url, json=payload)
results = response.json()
for result in results:
    print(f"Score: {result.get('score', 'N/A')}")
    print(f"Memory: {result['text']}")
    print("---")
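Semantic search always returns the nearest matches, even when none of them is actually relevant, so a client-side cutoff can be useful. The sketch below assumes each result carries an optional numeric `score` where higher means a closer match. If your vector database reports a distance instead, the comparison must be inverted. `filter_by_score` is a hypothetical helper.

```python
def filter_by_score(results, min_score):
    """Keep results scoring at least min_score; results without a score are dropped."""
    # Assumption: a higher score means a closer match. If the backend
    # reports a distance, flip the comparison to `score <= max_distance`.
    kept = []
    for result in results:
        score = result.get("score")
        if score is not None and score >= min_score:
            kept.append(result)
    return kept

# Sample data shaped like the results printed in the example above.
sample_results = [
    {"score": 0.82, "text": "User prefers Python for backend development"},
    {"score": 0.31, "text": "User timezone: EST (UTC-5)"},
    {"text": "Result without a score"},
]
relevant = filter_by_score(sample_results, min_score=0.5)
```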
Update Memory
Modify an existing memory:
import requests
memory_id = "memory-uuid-here"
url = f"http://localhost:8080/api/memories/{memory_id}/update"
payload = {
    "content": "The user prefers Python and TypeScript for development."
}
response = requests.post(url, json=payload)
updated_memory = response.json()
Delete Memory
Remove a specific memory:
import requests
memory_id = "memory-uuid-here"
response = requests.delete(f"http://localhost:8080/api/memories/{memory_id}")
if response.json():
    print("Memory deleted successfully")
Delete All Memories
Clear all memories for the current user:
import requests
response = requests.delete("http://localhost:8080/api/memories/delete/user")
if response.json():
    print("All memories deleted")
Reset Memory Embeddings
Regenerate all embeddings (useful after changing embedding models):
import requests
response = requests.post("http://localhost:8080/api/memories/reset")
if response.json():
    print("Memory embeddings regenerated")
The reset operation regenerates embeddings for all user memories. This can take time if you have many memories.
Memory Workflow
User Interaction
User shares information during a conversation:
- “I prefer dark mode”
- “My timezone is EST”
- “I work with Python and JavaScript”
Memory Creation
Important information is stored as memories:
import requests
memories = [
    "User prefers dark mode interface",
    "User timezone: EST (UTC-5)",
    "User works with Python and JavaScript"
]
for content in memories:
    requests.post(
        "http://localhost:8080/api/memories/add",
        json={"content": content}
    )
Embedding Generation
The system automatically generates embeddings and stores them in the vector database.
Context Retrieval
When the user asks a question, relevant memories are retrieved:
response = requests.post(
    "http://localhost:8080/api/memories/query",
    json={
        "content": "What theme should I use?",
        "k": 1
    }
)
# Returns: "User prefers dark mode interface"
Personalized Response
The AI uses the retrieved memories to provide personalized answers.
Use Cases
User Preferences
Store user preferences for personalized experiences:
import requests
preferences = [
    "User prefers concise responses",
    "User wants code examples in Python",
    "User is interested in machine learning",
    "User works in healthcare industry"
]
for pref in preferences:
    requests.post(
        "http://localhost:8080/api/memories/add",
        json={"content": pref}
    )
Project Context
Remember ongoing project details:
import requests
project_info = [
    "Current project: Building a REST API for inventory management",
    "Tech stack: FastAPI, PostgreSQL, Docker",
    "Deployment target: AWS ECS",
    "Team size: 3 developers"
]
for info in project_info:
    requests.post(
        "http://localhost:8080/api/memories/add",
        json={"content": info}
    )
Learning Progress
Track learning journey and achievements:
import requests
learning_memories = [
    "Completed Python basics course on 2024-01-15",
    "Struggling with async/await concepts",
    "Built first REST API successfully",
    "Next goal: Learn Docker containerization"
]
for memory in learning_memories:
    requests.post(
        "http://localhost:8080/api/memories/add",
        json={"content": memory}
    )
Vector Database Integration
Memories are stored in user-specific collections:
# Collection naming pattern
collection_name = f"user-memory-{user_id}"
# Example: user-memory-123e4567-e89b-12d3-a456-426614174000
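The naming pattern is what keeps one user's vectors apart from another's: each user ID maps to exactly one collection. The sketch below shows the mapping in both directions. `memory_collection_name` and `owner_of` are hypothetical helpers written for illustration, not functions from the Open WebUI codebase.

```python
COLLECTION_PREFIX = "user-memory-"

def memory_collection_name(user_id):
    """Build the per-user collection name from the pattern shown above."""
    if not user_id:
        raise ValueError("user_id must not be empty")
    return f"{COLLECTION_PREFIX}{user_id}"

def owner_of(collection_name):
    """Return the user ID encoded in a collection name, or None if it does not match."""
    if not collection_name.startswith(COLLECTION_PREFIX):
        return None
    return collection_name[len(COLLECTION_PREFIX):]

name = memory_collection_name("123e4567-e89b-12d3-a456-426614174000")
print(name)
```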
Supported vector databases:
- ChromaDB (default)
- Qdrant
- Milvus
- Elasticsearch
- OpenSearch
- Pinecone
- PGVector
- S3Vector
- Oracle 23ai
Privacy and Security
Memories are user-isolated. Each user has their own private memory collection that other users cannot access.
Best Practices
- Sensitive Data: Avoid storing passwords or API keys in memories
- PII: Be cautious with personally identifiable information
- Retention: Implement a memory lifecycle policy
- Encryption: Use encrypted vector databases for production
- Access Control: Ensure the memories feature is permission-gated
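The first two practices can be partly enforced in code by screening content before it is stored. The sketch below is a rough heuristic under stated assumptions: the patterns are illustrative, will miss many secrets, and may flag harmless text. `looks_sensitive` is a hypothetical helper, not part of Open WebUI.

```python
import re

# Illustrative patterns only; extend them for your own data.
SENSITIVE_PATTERNS = [
    # Labels such as "password:" or "api_key=" followed by a value
    re.compile(r"\b(password|passwd|secret|api[_ -]?key|token)\b\s*[:=]", re.IGNORECASE),
    # Key-like strings with an "sk-" prefix
    re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
    # Email addresses, as one example of PII
    re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
]

def looks_sensitive(content):
    """Return True if the text matches any of the patterns above."""
    return any(pattern.search(content) for pattern in SENSITIVE_PATTERNS)

candidates = [
    "User prefers dark mode interface",
    "password: hunter2",
]
safe_to_store = [c for c in candidates if not looks_sensitive(c)]
```

Content that fails the check can be rejected or sent for manual review instead of being posted to the add-memory endpoint.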
Embedding Generation
Embeddings are generated asynchronously, so the server can handle other requests while an embedding is being computed:
# Awaiting the call suspends only this request; the event loop stays free
vector = await EMBEDDING_FUNCTION(memory.content, user=user)
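To see what awaiting an embedding call means in practice, the sketch below runs a simulated embedding next to unrelated work on the same event loop. `fake_embedding` and `other_request` are stand-ins invented for this example. The sleep represents time spent waiting on a model or remote API.

```python
import asyncio

async def fake_embedding(text, log):
    """Stand-in for an embedding call that waits on a model or remote API."""
    log.append("embedding started")
    await asyncio.sleep(0.01)  # control returns to the event loop here
    log.append("embedding finished")
    return [float(len(text))]

async def other_request(log):
    """Stand-in for unrelated work handled by the same event loop."""
    log.append("other request handled")

async def main():
    log = []
    vector, _ = await asyncio.gather(
        fake_embedding("dark mode", log),
        other_request(log),
    )
    return vector, log

vector, log = asyncio.run(main())
print(log)
```

The log shows the other request completing between the start and end of the embedding, which is the behavior the await is meant to provide.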
Bulk Operations
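When adding many memories, consider sending the requests concurrently rather than one at a time:
import asyncio
import requests

def add_memory(content):
    # requests is synchronous, so each call is run in a worker thread
    return requests.post(
        "http://localhost:8080/api/memories/add",
        json={"content": content}
    )

async def add_memories_bulk(memories):
    # asyncio.to_thread requires Python 3.9 or later
    tasks = [asyncio.to_thread(add_memory, content) for content in memories]
    return await asyncio.gather(*tasks)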
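Launching every request at once can overwhelm the server or the embedding backend when the list is long. A minimal sketch of one way to cap that, assuming an async `send` callable that you supply (for example, one that wraps the HTTP POST), is a semaphore that limits how many calls are in flight. `add_memories_bounded` and `demo_send` are hypothetical names.

```python
import asyncio

async def add_memories_bounded(contents, send, limit=5):
    """Call send(content) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(content):
        async with semaphore:
            return await send(content)

    # gather preserves input order in its result list
    return await asyncio.gather(*(guarded(c) for c in contents))

# Demo with a fake sender that records how many calls overlap.
stats = {"active": 0, "peak": 0}

async def demo_send(content):
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # simulated network latency
    stats["active"] -= 1
    return {"content": content}

results = asyncio.run(
    add_memories_bounded(["a", "b", "c", "d", "e"], demo_send, limit=2)
)
```

The limit of 5 is an arbitrary default; a suitable value depends on your server and embedding backend.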
Vector Search Optimization
Adjust the k parameter based on your needs:
# Fewer results = faster search
response = requests.post(
    "http://localhost:8080/api/memories/query",
    json={"content": "query", "k": 3}
)
# More results = better context
response = requests.post(
    "http://localhost:8080/api/memories/query",
    json={"content": "query", "k": 10}
)
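A larger k gives the model more context but also adds more text to every prompt. One way to balance the two, sketched below, is to request a generous k and then trim the results to a size budget on the client. The sketch assumes results arrive ordered from most to least relevant, and `fit_to_budget` is a hypothetical helper.

```python
def fit_to_budget(memories, max_chars):
    """Keep memories in ranked order until the character budget is spent."""
    selected, used = [], 0
    for text in memories:
        if used + len(text) > max_chars:
            break  # stop at the first memory that does not fit
        selected.append(text)
        used += len(text)
    return selected

ranked = [
    "User prefers dark mode interface",
    "User timezone: EST (UTC-5)",
    "User works with Python and JavaScript",
]
trimmed = fit_to_budget(ranked, max_chars=60)
```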
Integration Example
Memory-Augmented Chat
import requests

def chat_with_memory(user_message):
    # 1. Query relevant memories
    memory_response = requests.post(
        "http://localhost:8080/api/memories/query",
        json={"content": user_message, "k": 3}
    )
    relevant_memories = memory_response.json()
    # 2. Build context from memories
    context = "\n".join([
        f"- {mem['text']}" for mem in relevant_memories
    ])
    # 3. Send to chat with context
    system_prompt = f"""
    You have access to the following memories about the user:
    {context}
    Use this information to provide personalized responses.
    """
    # Send to chat API with enhanced context
    chat_response = requests.post(
        "http://localhost:8080/api/chat",
        json={
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_message}
            ]
        }
    )
    return chat_response.json()
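The function above always sends a system prompt, even when the query returns no memories. A minimal sketch of a more defensive prompt builder, assuming each result exposes its content under a `text` key as in the example, skips the system message when there is nothing to include. `build_system_prompt` and `build_messages` are hypothetical helpers.

```python
def build_system_prompt(memories):
    """Return a system prompt listing memory texts, or None when there are none."""
    texts = [(m.get("text") or "").strip() for m in memories]
    texts = [t for t in texts if t]
    if not texts:
        return None
    bullet_list = "\n".join(f"- {t}" for t in texts)
    return (
        "You have access to the following memories about the user:\n"
        f"{bullet_list}\n"
        "Use this information to provide personalized responses."
    )

def build_messages(user_message, memories):
    """Build the chat message list, adding a system message only when useful."""
    prompt = build_system_prompt(memories)
    messages = [{"role": "system", "content": prompt}] if prompt else []
    messages.append({"role": "user", "content": user_message})
    return messages

messages = build_messages(
    "What theme should I use?",
    [{"text": "User prefers dark mode interface"}],
)
```

Keeping prompt construction in a pure function also makes it easy to test without a running server.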
Troubleshooting
Memories Not Saving
- Verify ENABLE_MEMORIES=true in configuration
- Check that the vector database is running and accessible
- Ensure an embedding function is configured
- Review user permissions for features.memories
Search Returns No Results
- Confirm memories exist for the user
- Try broader search queries
- Check embedding model compatibility
- Verify vector database collection exists
Slow Performance
- Reduce the k parameter in queries
- Consider faster embedding models
- Optimize vector database configuration
- Use local embedding models instead of API calls
Next Steps
- Learn about Artifacts for persistent storage
- Explore Skills for reusable capabilities
- Configure Functions for custom logic