POST /api/audio/transcriptions
Transcribe Audio
curl --request POST \
  --url https://api.example.com/api/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --form 'file=@<path-to-audio-file>' \
  --form 'language=<string>'
Response
{
  "text": "<string>",
  "filename": "<string>"
}
Transcribes an audio file to text using the configured speech-to-text (STT) engine. Supported engines are local Whisper (the default), the OpenAI Whisper API, Deepgram, Azure Speech Services, and Mistral.

Request

Headers

Authorization
string
required
Bearer token for authentication, sent as Bearer YOUR_TOKEN

Body

file
file
required
Audio file to transcribe. Supported formats: flac, m4a, mp3, mp4, mpeg, wav, webm.
Maximum file size varies by engine:
  • Default engines: 20 MB
  • Azure: 200 MB
language
string
Language code for transcription (e.g., en, es, fr). Used as a hint to improve accuracy.
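The format list and size limits above can be checked on the client before uploading. The following is a minimal sketch, not part of the API: the function name check_upload and the engine labels "default" and "azure" are hypothetical, and it assumes 1 MB means 1024 * 1024 bytes, which the documentation does not specify. The server may still convert or compress a file that fails this check, so treat the result as advisory.

```python
from pathlib import Path

# Formats and size limits copied from the parameter description above.
SUPPORTED_FORMATS = {"flac", "m4a", "mp3", "mp4", "mpeg", "wav", "webm"}
# Hypothetical engine labels; the limits are the documented ones.
MAX_SIZE_MB = {"default": 20, "azure": 200}


def check_upload(filename: str, size_bytes: int, engine: str = "default") -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    # Unknown engines fall back to the default limit.
    limit_mb = MAX_SIZE_MB.get(engine, MAX_SIZE_MB["default"])
    # Assumption: 1 MB is taken as 1024 * 1024 bytes.
    if size_bytes > limit_mb * 1024 * 1024:
        problems.append(f"file exceeds {limit_mb} MB limit for engine '{engine}'")
    return problems
```

An empty list means neither documented constraint was violated; a non-empty list names each violation so it can be shown to the user before any bytes are sent.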

Response

text
string
The transcribed text from the audio file
filename
string
Name of the processed audio file
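A client may want to validate the two documented response fields before using them. This is a minimal sketch using only the Python standard library; the Transcription class and parse_transcription function are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Transcription:
    text: str      # The transcribed text from the audio file
    filename: str  # Name of the processed audio file


def parse_transcription(body: str) -> Transcription:
    """Parse a response body, failing loudly if a documented field is missing."""
    data = json.loads(body)
    missing = [key for key in ("text", "filename") if key not in data]
    if missing:
        raise ValueError(f"response is missing fields: {', '.join(missing)}")
    return Transcription(text=data["text"], filename=data["filename"])
```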

Example

curl -X POST https://your-domain.com/api/audio/transcriptions \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@recording.mp3" \
  -F "language=en"
Response
{
  "text": "Hello, this is a sample audio transcription. The quick brown fox jumps over the lazy dog.",
  "filename": "550e8400-e29b-41d4-a716-446655440000.mp3"
}
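The same request can be assembled in Python without third-party packages. The sketch below builds the multipart/form-data body by hand and returns an unsent urllib.request.Request. The function name build_transcription_request is hypothetical, the file part is labeled application/octet-stream as an assumption (the API may expect a specific audio MIME type), and filenames containing double quotes are not escaped.

```python
import urllib.request
import uuid


def build_transcription_request(base_url, token, filename, audio_bytes, language=None):
    """Build (but do not send) a multipart/form-data POST for the endpoint."""
    boundary = uuid.uuid4().hex
    parts = []
    if language is not None:
        # Optional language hint, sent as a plain form field.
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="language"\r\n\r\n'
            f'{language}\r\n'.encode()
        )
    # Required file field; the content type here is an assumption.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + audio_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the returned object to urllib.request.urlopen and decode the JSON body of the response. Keeping construction separate from sending makes the request easy to inspect or test without network access.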

Supported Engines

Configure the STT engine in Admin Settings > Audio:

Local Whisper (Default)

  • Uses a Faster Whisper model running locally
  • Configurable model size and compute type
  • Supports VAD filtering and multilingual mode

OpenAI Whisper API

  • Cloud-based transcription using OpenAI’s API
  • Requires OpenAI API key
  • Supports language parameter

Deepgram

  • High-accuracy transcription API
  • Smart formatting enabled by default
  • Requires Deepgram API key

Azure Speech Services

  • Uses Microsoft Azure Cognitive Services
  • Supports speaker diarization (up to 3 speakers by default)
  • Multi-locale detection
  • Requires Azure subscription key and region

Mistral

  • Uses Voxtral models for transcription
  • Two methods: dedicated transcriptions API or chat completions
  • Requires Mistral API key

Audio Processing

The API automatically handles:
  1. Format conversion: Unsupported formats are converted to MP3
  2. Compression: Large files are compressed to reduce size
  3. Chunking: Files exceeding size limits are split into chunks
  4. Parallel processing: Multiple chunks are processed concurrently
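The chunking and parallel-processing steps can be illustrated with a small sketch. This is not the server's implementation: it assumes chunks are planned as time windows of a fixed maximum length (the documentation only says files are split when they exceed size limits), and plan_chunks, transcribe_in_parallel, and the transcribe_chunk callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_chunks(duration_s: float, max_chunk_s: float) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows."""
    if max_chunk_s <= 0:
        raise ValueError("max_chunk_s must be positive")
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def transcribe_in_parallel(chunks, transcribe_chunk, max_workers: int = 4) -> str:
    """Transcribe chunks concurrently and join the results in original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even if chunks finish out of order.
        texts = list(pool.map(transcribe_chunk, chunks))
    return " ".join(t.strip() for t in texts if t.strip())
```

The ordering guarantee of map() is the important design point: chunks may complete in any order, but the final transcript has to be reassembled in playback order.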

Permissions

Requires the chat.stt permission. Admin users have access by default.

Error Responses

400
Invalid file format or transcription failed
403
User does not have permission to use transcription
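On the client side, the two documented status codes can be mapped to distinct exceptions so callers can tell a bad file from a missing permission. This is a minimal sketch; the exception classes and the check_status function are hypothetical, and any status code not listed above is reported generically.

```python
class TranscriptionError(Exception):
    """Base class for errors returned by the transcription endpoint."""


class InvalidAudioError(TranscriptionError):
    """HTTP 400: invalid file format or transcription failed."""


class PermissionDeniedError(TranscriptionError):
    """HTTP 403: user does not have permission to use transcription."""


def check_status(status: int, detail: str = "") -> None:
    """Raise a specific exception for documented error codes; return on success."""
    if 200 <= status < 300:
        return
    if status == 400:
        raise InvalidAudioError(detail or "Invalid file format or transcription failed")
    if status == 403:
        raise PermissionDeniedError(
            detail or "User does not have permission to use transcription"
        )
    # Codes not documented above are surfaced without interpretation.
    raise TranscriptionError(f"unexpected HTTP status {status}: {detail}")
```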

Configuration

To configure STT settings:
POST /api/audio/config/update
Admin-only endpoint to update STT engine, model, and API credentials.