Transcribe Audio
POST /api/audio/transcriptions
curl --request POST \
--url https://api.example.com/api/audio/transcriptions \
--header 'Authorization: Bearer <token>' \
--form 'file=@<path-to-audio-file>' \
--form 'language=<string>'
{
"text": "<string>",
"filename": "<string>"
}
Transcribe audio files to text using the configured speech-to-text (STT) engine. Supported STT providers include OpenAI Whisper, Deepgram, Azure, and Mistral.
The STT engine, model, and API credentials are updated through a separate admin-only endpoint (see Configuration).
Request
Headers
Authorization: Bearer token for authentication.
Body
file: Audio file to transcribe. Supported formats: flac, m4a, mp3, mp4, mpeg, wav, webm.
Maximum file size varies by engine:
- Default engines: 20 MB
- Azure: 200 MB
language: Language code for transcription (e.g., en, es, fr). Used as a hint to improve accuracy.
Response
text: The transcribed text from the audio file.
filename: Name of the processed audio file.
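The format list and size limits above can be checked on the client before uploading. The following is a minimal sketch, not part of the API: the function name, the "azure" engine identifier, and the reading of "MB" as 2^20 bytes are all assumptions. Because the server also converts and chunks files on its own, a reported problem is a warning rather than a guaranteed rejection.

```python
from pathlib import Path

# Formats and limits copied from the parameter descriptions above.
SUPPORTED_FORMATS = {"flac", "m4a", "mp3", "mp4", "mpeg", "wav", "webm"}
MB = 1024 * 1024  # assumption: the documented "MB" means 2**20 bytes
DEFAULT_LIMIT_BYTES = 20 * MB
AZURE_LIMIT_BYTES = 200 * MB


def check_upload(filename: str, size_bytes: int, engine: str = "default") -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    # "azure" is a hypothetical engine identifier used only in this sketch.
    limit = AZURE_LIMIT_BYTES if engine == "azure" else DEFAULT_LIMIT_BYTES
    if size_bytes > limit:
        problems.append(f"file is {size_bytes} bytes, limit is {limit} bytes")
    return problems
```

For example, check_upload("recording.mp3", 5 * MB) returns an empty list, while a 50 MB file is flagged unless the engine is "azure".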
Example
curl -X POST https://your-domain.com/api/audio/transcriptions \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "file=@recording.mp3" \
-F "language=en"
Response
{
"text": "Hello, this is a sample audio transcription. The quick brown fox jumps over the lazy dog.",
"filename": "550e8400-e29b-41d4-a716-446655440000.mp3"
}
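The same request can be sketched in Python using only the standard library. This is an illustrative client, not an official SDK: the function names are hypothetical, and the field names (file, language) and Bearer header are taken from the curl example above. The request is built separately from sending so the multipart body can be inspected without a network call.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_transcription_request(base_url, token, filename, audio_bytes, language=None):
    """Build (without sending) a multipart/form-data POST for the endpoint."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b""
    if language is not None:
        # Optional language hint, sent as a plain form field.
        body += (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="language"\r\n\r\n'
            f"{language}\r\n"
        ).encode("utf-8")
    # The audio itself, sent as the "file" form field.
    body += (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    body += audio_bytes + b"\r\n"
    body += f"--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(base_url, token, file_path, language=None):
    """Send the file and return the parsed JSON response as a dict."""
    path = Path(file_path)
    request = build_transcription_request(
        base_url, token, path.name, path.read_bytes(), language
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Under these assumptions, transcribe("https://your-domain.com", "YOUR_TOKEN", "recording.mp3", language="en")["text"] would hold the transcribed text.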
Supported Engines
Configure the STT engine in Admin Settings > Audio:
Local Whisper (Default)
- Uses Faster Whisper model running locally
- Configurable model size and compute type
- Supports VAD filtering and multilingual mode
OpenAI Whisper API
- Cloud-based transcription using OpenAI’s API
- Requires OpenAI API key
- Supports language parameter
Deepgram
- High-accuracy transcription API
- Smart formatting enabled by default
- Requires Deepgram API key
Azure Speech Services
- Microsoft Azure cognitive services
- Supports speaker diarization (up to 3 speakers by default)
- Multi-locale detection
- Requires Azure subscription key and region
Mistral
- Uses Voxtral models for transcription
- Two methods: dedicated transcriptions API or chat completions
- Requires Mistral API key
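The credential requirements listed above can be summarized as a small lookup. This is a sketch only: the engine identifiers and setting names below are hypothetical labels chosen for illustration and are not the keys used by the actual admin settings.

```python
# Engine identifiers and setting names are hypothetical labels for this sketch;
# only the requirements themselves come from the engine list above.
REQUIRED_SETTINGS = {
    "local-whisper": [],  # runs locally, no credentials listed
    "openai": ["api_key"],
    "deepgram": ["api_key"],
    "azure": ["subscription_key", "region"],
    "mistral": ["api_key"],
}


def missing_settings(engine: str, settings: dict) -> list[str]:
    """Return the required setting names that are absent or empty for an engine."""
    if engine not in REQUIRED_SETTINGS:
        raise ValueError(f"unknown engine: {engine}")
    return [name for name in REQUIRED_SETTINGS[engine] if not settings.get(name)]
```

A check like this could run before saving a configuration, so that an Azure setup without a region is caught early.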
Audio Processing
The API automatically handles:
- Format conversion: Unsupported formats are converted to MP3
- Compression: Large files are compressed to reduce size
- Chunking: Files exceeding size limits are split into chunks
- Parallel processing: Multiple chunks are processed concurrently
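The chunking and parallel-processing steps can be illustrated with a short sketch. It is not the server's implementation: the source describes splitting by file size, while this example splits by duration as a simpler stand-in, and transcribe_window is a hypothetical callable that returns the text for one window.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_chunks(duration_s: float, max_chunk_s: float) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive windows of at most max_chunk_s."""
    if max_chunk_s <= 0:
        raise ValueError("max_chunk_s must be positive")
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        windows.append((start, end))
        start = end
    return windows


def transcribe_in_parallel(windows, transcribe_window, max_workers=4):
    """Transcribe windows concurrently and join the text in original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, even if chunks
        # finish out of order.
        texts = list(pool.map(transcribe_window, windows))
    return " ".join(text.strip() for text in texts if text.strip())
```

Preserving input order matters here: chunks may complete in any order, but the final transcript has to read in the order of the original audio.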
Permissions
Requires the chat.stt permission. Admin users have access by default.
Error Responses
400: Invalid file format or transcription failed
403: User does not have permission to use transcription
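A client can map these status codes to a single exception type. The sketch below is one possible approach: the class and function names are hypothetical, and only the 400 and 403 meanings come from the list above. Any other error status is reported as undocumented rather than guessed at.

```python
class TranscriptionError(Exception):
    """Raised for a non-success response from the transcription endpoint."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status


# Meanings taken from the error list above.
DOCUMENTED_ERRORS = {
    400: "Invalid file format or transcription failed",
    403: "User does not have permission to use transcription",
}


def raise_for_status(status: int) -> None:
    """Raise TranscriptionError for statuses of 400 and above; otherwise return None."""
    if status < 400:
        return
    message = DOCUMENTED_ERRORS.get(status, "Undocumented error status")
    raise TranscriptionError(status, message)
```

Keeping the status on the exception lets callers distinguish a permission problem (403), which retrying will not fix, from a failed transcription (400).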
Configuration
To configure STT settings:
POST /api/audio/config/update