Upload Documents for RAG

curl -X POST "https://your-domain.com/api/v1/files/?process=true" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@/path/to/document.pdf" \
  -F 'metadata={"source":"user_upload","category":"documentation"}'

{
  "status": true,
  "id": "file_abc123",
  "filename": "product_guide.pdf",
  "path": "files/abc123_product_guide.pdf",
  "data": {
    "status": "pending"
  },
  "meta": {
    "name": "product_guide.pdf",
    "content_type": "application/pdf",
    "size": 245678,
    "data": {
      "source": "user_upload",
      "category": "documentation"
    }
  },
  "user_id": "user_456def",
  "created_at": 1678901234,
  "updated_at": 1678901234
}

POST /api/v1/files/


Upload a document file and, optionally, process it so that it can be embedded into a knowledge base. When processing is enabled, the file's text is extracted, split into chunks, embedded, and stored in the vector database for semantic search.

Request

Form Data

file

required

The document file to upload. Supported formats depend on your configuration (for example PDF, DOCX, TXT, and Markdown).

metadata

string | object

JSON string or object with additional metadata about the file. It can include custom fields for your application.
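To make the two form fields concrete, the following Python sketch builds a multipart/form-data body by hand using only the standard library. The helper name build_upload_body is hypothetical, and the sketch assumes metadata is sent as a JSON string in its own form field, as in the curl example. In practice an HTTP client library would normally handle this encoding.

```python
import json
import uuid


def build_upload_body(filename, file_bytes, content_type, metadata=None):
    """Return (body, content_type_header) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    lines = [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        file_bytes,
    ]
    if metadata is not None:
        # Assumption: metadata travels as a JSON string in a "metadata" field.
        lines += [
            f"--{boundary}".encode(),
            b'Content-Disposition: form-data; name="metadata"',
            b"",
            json.dumps(metadata).encode(),
        ]
    # Closing boundary, followed by a final CRLF.
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


body, content_type_header = build_upload_body(
    "document.pdf",
    b"%PDF-1.4 example bytes",
    "application/pdf",
    {"source": "user_upload", "category": "documentation"},
)
print(content_type_header.split(";")[0])
```

The returned header value would go in the request's Content-Type header, next to the Authorization header described under Headers.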

Query Parameters

process

boolean

default:"true"

Whether to process the file for RAG (extract text, chunk, and embed)

process_in_background

boolean

default:"true"

Whether to process the file asynchronously in the background
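As a small sketch of how the two query parameters combine, the Python helper below builds the upload URL. The function name build_upload_url is hypothetical, and the lowercase true/false spelling is assumed from the curl example rather than confirmed for every deployment.

```python
from urllib.parse import urlencode


def build_upload_url(base_url, process=True, process_in_background=True):
    """Build the file upload URL with both query parameters set explicitly."""
    query = urlencode({
        # Assumption: booleans are sent as lowercase "true"/"false".
        "process": str(process).lower(),
        "process_in_background": str(process_in_background).lower(),
    })
    return f"{base_url.rstrip('/')}/api/v1/files/?{query}"


url = build_upload_url(
    "https://your-domain.com", process=True, process_in_background=False
)
print(url)
# https://your-domain.com/api/v1/files/?process=true&process_in_background=false
```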

Headers

Authorization

string

required

Bearer token for authentication

Response

status

boolean

Whether the upload was successful

id

string

Unique identifier for the uploaded file

filename

string

Original filename of the uploaded file

path

string

Storage path of the uploaded file

data

object

File processing data, with the following nested fields:

status

string

Processing status: "pending", "completed", or "failed"

content

string

Extracted text content (after processing)

error

string

Error message if processing failed
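The response fields above can be read as in the following Python sketch, which reports the processing state of an upload response. The helper name processing_state and the "unknown" fallback are illustrative assumptions. The sample payload is a shortened copy of the example response.

```python
import json

SAMPLE_RESPONSE = """
{
  "status": true,
  "id": "file_abc123",
  "filename": "product_guide.pdf",
  "data": {"status": "pending"}
}
"""


def processing_state(upload_response):
    """Summarize the nested data.status field of an upload response."""
    data = upload_response.get("data") or {}
    status = data.get("status", "unknown")  # "unknown" is a local fallback
    if status == "failed":
        # data.error carries the failure message when processing failed.
        return f"failed: {data.get('error', 'no error message')}"
    return status


response = json.loads(SAMPLE_RESPONSE)
print(response["id"], processing_state(response))
```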

Add File to Knowledge Base

After uploading a file, add it to a knowledge base:

POST /api/v1/knowledge/{knowledge_id}/file/add

Request Body

file_id

string

required

ID of the uploaded file to add to the knowledge base

curl -X POST "https://your-domain.com/api/v1/knowledge/kb_123/file/add" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"file_id": "file_abc123"}'
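The same request could be prepared from Python roughly as follows. This is a sketch using only the standard library. The helper name build_add_file_request is hypothetical, and the request is built but not sent.

```python
import json
from urllib.request import Request


def build_add_file_request(base_url, knowledge_id, file_id, token):
    """Prepare (but do not send) the request that adds a file to a knowledge base."""
    url = f"{base_url.rstrip('/')}/api/v1/knowledge/{knowledge_id}/file/add"
    return Request(
        url,
        data=json.dumps({"file_id": file_id}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_add_file_request(
    "https://your-domain.com", "kb_123", "file_abc123", "YOUR_TOKEN"
)
print(req.get_method(), req.full_url)
```

Sending is left to the caller, for example with urllib.request.urlopen(req).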

Batch Upload Files

Add multiple previously uploaded files to a knowledge base in a single request:

POST /api/v1/knowledge/{knowledge_id}/files/batch/add

Request Body

[
  {"file_id": "file_1"},
  {"file_id": "file_2"},
  {"file_id": "file_3"}
]
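A request body of this shape can be generated from a list of file IDs, as in the sketch below. The helper name build_batch_payload is hypothetical, and dropping duplicate IDs is a client-side choice made for this example, not documented server behavior.

```python
import json


def build_batch_payload(file_ids):
    """Turn file IDs into the batch body, keeping order and skipping repeats."""
    seen = set()
    payload = []
    for file_id in file_ids:
        if file_id in seen:
            continue  # assumption: sending the same ID twice is not useful
        seen.add(file_id)
        payload.append({"file_id": file_id})
    return payload


payload = build_batch_payload(["file_1", "file_2", "file_3", "file_1"])
print(json.dumps(payload))
```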

Processing Pipeline

1. Upload: File is stored and assigned a unique ID
2. Extraction: Text content is extracted based on file type (PDF, DOCX, etc.)
3. Chunking: Content is split into chunks (configured via CHUNK_SIZE and CHUNK_OVERLAP)
4. Embedding: Each chunk is embedded using the configured embedding model
5. Storage: Embeddings are stored in the vector database for retrieval
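To illustrate how a chunk size and an overlap interact in the chunking step, here is a minimal character-based sliding-window sketch in Python. It is an assumption for illustration only: the server's actual splitter may count tokens or split on document structure, and chunk_text is a hypothetical name.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters sharing chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the role of the overlap: context that straddles a boundary appears in both neighboring chunks.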

Monitoring Processing Status

Check the processing status of a file:

GET /api/v1/files/{file_id}/process/status?stream=true

With stream=true, this returns a Server-Sent Events (SSE) stream of status updates:

data: {"status": "pending"}
data: {"status": "completed"}
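A client could consume such a stream line by line, as in the Python sketch below. It parses only the data: lines shown above and treats "completed" and "failed" as terminal states. The helper names are hypothetical, and the input here is a hard-coded list standing in for lines read from the HTTP response.

```python
import json

TERMINAL_STATES = {"completed", "failed"}


def iter_statuses(lines):
    """Yield the status value from each 'data:' line of an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any other SSE fields
        payload = line[len("data:"):].strip()
        if payload:
            yield json.loads(payload).get("status")


def final_status(lines):
    """Return the first terminal status seen, or None if the stream ends early."""
    for status in iter_statuses(lines):
        if status in TERMINAL_STATES:
            return status
    return None


stream = ['data: {"status": "pending"}', "", 'data: {"status": "completed"}']
print(final_status(stream))
# completed
```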

Notes

Supported file types are configurable via ALLOWED_FILE_EXTENSIONS.
Maximum file size is controlled by the FILE_MAX_SIZE setting.
Text is extracted using one of several engines (PyMuPDF, Tika, Docling, etc.).
Audio files are transcribed using the configured STT engine.
Files are automatically chunked and embedded if process=true.
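Because uploads can be rejected on type or size, a client may want to check a file before sending it. The Python sketch below is one way to do that, under stated assumptions: the format and units of ALLOWED_FILE_EXTENSIONS and FILE_MAX_SIZE are not specified here, so the caller passes an extension list and a byte limit explicitly, and an empty list is treated as "no restriction". The name precheck_upload is hypothetical.

```python
from pathlib import Path


def precheck_upload(filename, size_bytes, allowed_extensions, max_size_bytes=None):
    """Return a list of problems found before upload (empty list means OK)."""
    problems = []
    extension = Path(filename).suffix.lower().lstrip(".")
    allowed = {ext.lower().lstrip(".") for ext in allowed_extensions}
    # Assumption: an empty allow-list means every extension is accepted.
    if allowed and extension not in allowed:
        problems.append(f"extension '{extension}' is not allowed")
    # Assumption: the limit is given in bytes by the caller.
    if max_size_bytes is not None and size_bytes > max_size_bytes:
        problems.append(f"file is {size_bytes} bytes, limit is {max_size_bytes}")
    return problems


print(precheck_upload("product_guide.PDF", 245678, ["pdf", "docx"], 1_000_000))
# []
```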
