curl -X POST "https://your-domain.com/api/v1/files/?process=true" \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "file=@/path/to/document.pdf" \
-F 'metadata={"source":"user_upload","category":"documentation"}'
{
"status": true,
"id": "file_abc123",
"filename": "product_guide.pdf",
"path": "files/abc123_product_guide.pdf",
"data": {
"status": "pending"
},
"meta": {
"name": "product_guide.pdf",
"content_type": "application/pdf",
"size": 245678,
"data": {
"source": "user_upload",
"category": "documentation"
}
},
"user_id": "user_456def",
"created_at": 1678901234,
"updated_at": 1678901234
}
Upload Documents for RAG
Upload and process documents for retrieval-augmented generation
POST /api/v1/files
Upload a document file and optionally process it for embedding into a knowledge base. When processing is enabled (process=true), the file is chunked, embedded, and stored in the vector database for semantic search.
Processing progress can be followed as a Server-Sent Events (SSE) stream of status updates, as described under Monitoring Processing Status.
Request
Form Data
- file: The document file to upload. Supported formats depend on your configuration (PDF, DOCX, TXT, Markdown, etc.)
- metadata: JSON string or object with additional metadata about the file. Can include custom fields for your application.
Query Parameters
- process: Whether to process the file for RAG (extract text, chunk, and embed)
- Whether to process the file asynchronously in the background
Headers
- Authorization: Bearer token for authentication, in the form Bearer YOUR_TOKEN
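The request described above can also be assembled without curl. The following is a minimal sketch using only Python's standard library. The helper name `build_upload_request` is hypothetical, and the host and token are the same placeholders used in the curl example. Only the endpoint, the `file` and `metadata` form fields, the `process` query parameter, and the Bearer header are taken from this page. The multipart layout follows the generic `multipart/form-data` format rather than anything specific to this API.

```python
import json
import mimetypes
import urllib.parse
import urllib.request
import uuid


def build_upload_request(base_url, token, filename, content, metadata=None, process=True):
    """Build (but do not send) a multipart POST to /api/v1/files/.

    Hypothetical helper for illustration; `content` is the file body as bytes.
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    # Guess the MIME type from the file name, falling back to a generic type.
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    parts = [
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode("utf-8"),
        f"Content-Type: {mime}".encode("ascii"),
        b"",
        content,
    ]
    if metadata is not None:
        # The metadata field is sent as a JSON string, as in the curl example.
        parts += [
            delimiter,
            b'Content-Disposition: form-data; name="metadata"',
            b"",
            json.dumps(metadata).encode("utf-8"),
        ]
    parts += [delimiter + b"--", b""]
    body = b"\r\n".join(parts)

    query = urllib.parse.urlencode({"process": "true" if process else "false"})
    return urllib.request.Request(
        f"{base_url}/api/v1/files/?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Nothing is sent until the request is passed to `urllib.request.urlopen(req)`. The JSON body of the response can then be read with `json.load(resp)`.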
Response
- status: Whether the upload was successful
- id: Unique identifier for the uploaded file
- filename: Original filename of the uploaded file
- path: Storage path of the uploaded file
- user_id: ID of the user who uploaded the file
- created_at: Unix timestamp when the file was uploaded
- updated_at: Unix timestamp when the file was last updated
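To make the response shape concrete, here is a small sketch that maps the JSON body onto a typed record. The `UploadResult` class and `parse_upload_response` function are hypothetical names. Reading the processing state from `data.status` is an assumption based on the example response on this page, since that field is not described in the list above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class UploadResult:
    ok: bool
    file_id: str
    filename: str
    path: str
    user_id: str
    created_at: int  # Unix timestamp
    updated_at: int  # Unix timestamp
    processing_status: Optional[str]  # assumed to come from data.status


def parse_upload_response(raw):
    """Parse the upload response body (a JSON string) into an UploadResult."""
    doc = json.loads(raw)
    return UploadResult(
        ok=doc["status"],
        file_id=doc["id"],
        filename=doc["filename"],
        path=doc["path"],
        user_id=doc["user_id"],
        created_at=doc["created_at"],
        updated_at=doc["updated_at"],
        # The nested data object may be absent, so fall back to None.
        processing_status=(doc.get("data") or {}).get("status"),
    )
```

The `file_id` value is what the knowledge base endpoints expect as `file_id` in their request bodies.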
Add File to Knowledge Base
After uploading a file, add it to a knowledge base:
POST /api/v1/knowledge/{knowledge_id}/file/add
Request Body
- file_id: ID of the uploaded file to add to the knowledge base
curl -X POST "https://your-domain.com/api/v1/knowledge/kb_123/file/add" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"file_id": "file_abc123"}'
Batch Upload Files
Add multiple previously uploaded files to a knowledge base at once:
POST /api/v1/knowledge/{knowledge_id}/files/batch/add
Request Body
[
{"file_id": "file_1"},
{"file_id": "file_2"},
{"file_id": "file_3"}
]
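A batch request can be built from a list of file IDs. This is a minimal sketch with a hypothetical helper name, `build_batch_add_request`. It assumes the same Bearer authentication and `application/json` content type as the single-file endpoint, which this page does not state explicitly for the batch route.

```python
import json
import urllib.request


def build_batch_add_request(base_url, token, knowledge_id, file_ids):
    """Build (but do not send) a POST to the batch-add endpoint.

    Hypothetical helper: the body is a JSON array with one
    {"file_id": ...} object per file, as in the example above.
    """
    payload = [{"file_id": file_id} for file_id in file_ids]
    return urllib.request.Request(
        f"{base_url}/api/v1/knowledge/{knowledge_id}/files/batch/add",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumed to match the single-file endpoint.
            "Content-Type": "application/json",
        },
    )
```

For example, `build_batch_add_request("https://your-domain.com", "YOUR_TOKEN", "kb_123", ["file_1", "file_2", "file_3"])` produces the request body shown above.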
Processing Pipeline
- Upload: File is stored and assigned a unique ID
- Extraction: Text content is extracted based on file type (PDF, DOCX, etc.)
- Chunking: Content is split into chunks (configured via CHUNK_SIZE and CHUNK_OVERLAP)
- Embedding: Each chunk is embedded using the configured embedding model
- Storage: Embeddings are stored in the vector database for retrieval
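The chunking step can be illustrated with a simple sliding window. This sketch is not the server's splitter. It assumes CHUNK_SIZE and CHUNK_OVERLAP are measured in characters, which this page does not specify, and a real pipeline may split on tokens or document structure instead. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so consecutive chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text, so no trailing
        # chunk consisting only of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

The overlap means the last character of each chunk is repeated at the start of the next one. This helps when a sentence straddles a chunk boundary, at the cost of storing some text more than once.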
Monitoring Processing Status
Check the processing status of a file:
GET /api/v1/files/{file_id}/process/status?stream=true
data: {"status": "pending"}
data: {"status": "completed"}
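A client can read the stream line by line and decode each `data:` payload. The sketch below is a simplified reader, not a full SSE parser. It assumes every `data:` line carries one complete JSON object, as in the sample above, and the function name `iter_sse_statuses` is hypothetical. Only the `pending` and `completed` values appear on this page, so any other status should be treated as unknown.

```python
import json


def iter_sse_statuses(lines):
    """Yield the "status" value from each `data:` line of an SSE stream.

    `lines` may be any iterable of str or bytes lines, for example an
    open HTTP response object.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        # Skip blank separator lines, comments, and other SSE fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        event = json.loads(payload)
        yield event.get("status")


sample = [
    'data: {"status": "pending"}\n',
    "\n",
    'data: {"status": "completed"}\n',
]
print(list(iter_sse_statuses(sample)))
# prints ['pending', 'completed']
```

With the standard library, the response returned by `urllib.request.urlopen` can be passed directly as `lines`, since it iterates over the body one line at a time.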
Notes
- Supported file types are configurable via ALLOWED_FILE_EXTENSIONS
- Maximum file size is controlled by the FILE_MAX_SIZE setting
- Processing extracts text using various engines (PyMuPDF, Tika, Docling, etc.)
- Audio files are transcribed using the configured STT engine
- Files are automatically chunked and embedded if process=true