API Reference
Proxy Endpoint
Connect via WebSocket to start streaming audio.
wss://api.router.audio/v1/listen
Authentication
There are three ways to authenticate with router.audio.
| Method | Format | Notes |
|---|---|---|
| HTTP header | x-api-key: YOUR_KEY | This is the most common method. Pass either a router.audio API key or a provider-specific key depending on your use case. If both are provided, the header takes precedence over query parameters. |
| Query parameter | api_key=YOUR_KEY | Useful for browser clients that can't set custom HTTP headers. Pass either a router.audio API key or a provider-specific key depending on your use case. |
| Query parameter | jwt_token=TOKEN | For client-facing applications, exchange your router.audio API key for a short-lived JWT token and pass it on the WebSocket connection. This keeps your API key secure and allows for more granular access control. |
JWT Tokens
For client-facing applications, exchange your router.audio API key for a short-lived, single-use JWT token. The token is generated server-side and passed on the WebSocket connection, so your API key isn't exposed to the client.
curl -X POST https://api.router.audio/v1/token \
  -H "x-api-key: router_YOUR_KEY"
Use the token on the WebSocket:
wss://api.router.audio/v1/listen?provider=deepgram&jwt_token=TOKEN
JWT tokens are ideal for browser-based applications, mobile apps, or any client where you want to avoid exposing your API key. They also allow for more granular access control, such as restricting usage to specific providers or applying rate limits.
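The exchange above can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names are hypothetical, and the shape of the token response is not documented on this page, so the `token` field name is an assumption you should check against the real response.

```python
import json
import urllib.request
from urllib.parse import urlencode

TOKEN_URL = "https://api.router.audio/v1/token"
LISTEN_URL = "wss://api.router.audio/v1/listen"


def build_token_request(api_key: str) -> urllib.request.Request:
    """Build the server-side POST that exchanges an API key for a JWT."""
    return urllib.request.Request(
        TOKEN_URL,
        method="POST",
        headers={"x-api-key": api_key},
    )


def fetch_token(api_key: str, field: str = "token") -> str:
    """Send the request and read the token from the JSON body.

    ASSUMPTION: the response is a JSON object that carries the token
    under `field`. The field name is not specified in this reference.
    """
    with urllib.request.urlopen(build_token_request(api_key), timeout=10) as resp:
        return json.load(resp)[field]


def listen_url_for_client(token: str, provider: str) -> str:
    """Return a URL the client can open without ever seeing the API key."""
    query = urlencode({"provider": provider, "jwt_token": token})
    return f"{LISTEN_URL}?{query}"
```

Only `fetch_token` touches the network and it should run on your server; the client receives just the URL returned by `listen_url_for_client`.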
Bring Your Own Key
Pass your provider's API keys directly. This is useful if you have a direct agreement with a provider at a lower price, or if you want to use router.audio purely as a proxy without a router.audio account.
Query Parameters
Pass these as query parameters in the WebSocket URL.
| Parameter | Required | Type | Default | Description |
|---|---|---|---|---|
| provider | Yes | string | - | The speech-to-text provider to route audio to. One of: deepgram, assemblyai, elevenlabs, openai, speechmatics, soniox, gradium. |
| encoding | Yes | string | - | Encoding of the input audio. One of: pcm_s16le, pcm_mulaw, ogg_opus, webm, mp3. Providers accept different encodings; when the chosen encoding is not compatible with the provider, the audio is converted to pcm_s16le at 16 kHz mono (24 kHz for OpenAI). |
| sample_rate | Yes | number | - | Sample rate in Hz. One of: 8000, 16000, 22050, 24000, 44100, 48000. |
| channels | No | number | 1 | Number of audio channels in the input stream. |
| model | No | string | Depends | Provider-specific model. See the Providers page for options. The default models are chosen for a wider range of language availability. |
| language | No | string | en | Language code for recognition. Some providers may detect the language automatically or have multilingual/code-switching capabilities. The full list of supported languages and their codes can be found on the Providers page. |
| diarize | No | boolean | false | Enable word-level speaker diarization where the provider supports it (Deepgram, Speechmatics and, in some cases, ElevenLabs). |
| punctuate | No | boolean | true | Enable automatic punctuation in transcripts. Note that some providers only support punctuated transcripts. |
| partial_results | No | boolean | false | Whether to receive partial (non-final) transcripts. Not all providers support this feature, and behavior may vary. |
| endpointing | No | number | 500 | Silence duration in milliseconds before a transcript segment is finalized. |
| jwt_token | No | string | - | Short-lived JWT token that can be used in place of an API key. |
| parse_json | No | boolean | true | Normalize provider responses into a unified JSON format. Set to false for raw provider output. |
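As an illustration of how these parameters combine, the sketch below assembles a connection URL and rejects values outside the documented lists before any connection is attempted. The helper name `build_listen_url` is hypothetical, and writing booleans as lowercase `true`/`false` is an assumption based on the `parse_json=false` and `partial_results=true` spellings used on this page.

```python
from urllib.parse import urlencode

LISTEN_URL = "wss://api.router.audio/v1/listen"

# Allowed values copied from the parameter table.
PROVIDERS = {"deepgram", "assemblyai", "elevenlabs", "openai",
             "speechmatics", "soniox", "gradium"}
ENCODINGS = {"pcm_s16le", "pcm_mulaw", "ogg_opus", "webm", "mp3"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}


def build_listen_url(provider: str, encoding: str, sample_rate: int,
                     **optional) -> str:
    """Validate the three required parameters and return the WebSocket URL."""
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    if encoding not in ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate!r}")

    params = {"provider": provider, "encoding": encoding,
              "sample_rate": sample_rate}
    for name, value in optional.items():
        # ASSUMPTION: booleans are sent as lowercase true/false.
        params[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{LISTEN_URL}?{urlencode(params)}"
```

For example, `build_listen_url("deepgram", "pcm_s16le", 16000, partial_results=True)` yields a URL ending in `sample_rate=16000&partial_results=true`. Validating locally is useful because an invalid configuration may otherwise only surface as a dropped connection.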
Response Messages
Transcripts are sent as text frames containing JSON. By default, only
finalized transcripts are delivered, and responses from all providers
are normalized into a unified format. Set parse_json=false to receive
raw, unmodified responses from the upstream provider instead.
{
"type": "transcript",
"provider": "deepgram",
"transcript": "Hello, how are you?",
"start_time": 0.0,
"end_time": 0.95,
"is_partial": false,
"words": [
{
"text": "hello",
"start_time": 0.0,
"end_time": 0.42,
"speaker": null
},
{
"text": "how",
"start_time": 0.48,
"end_time": 0.61,
"speaker": null
},
{
"text": "are",
"start_time": 0.62,
"end_time": 0.75,
"speaker": null
},
{
"text": "you",
"start_time": 0.76,
"end_time": 0.90,
"speaker": null
}
]
}
| Field | Type | Description |
|---|---|---|
| type | string | Message type. Either "transcript" for recognized speech segments or "turn_end" for speaker turn end events. |
| provider | string | The provider that generated this transcript. |
| transcript | string | The recognized text. |
| start_time | number | Segment start time in seconds. |
| end_time | number | Segment end time in seconds. |
| words[] | array or null | Word-level details. |
| words[].text | string | The recognized word. |
| words[].start_time | number | Word start time in seconds. |
| words[].end_time | number | Word end time in seconds. |
| words[].speaker | string or null | Speaker ID when diarization is enabled, otherwise null. |
| is_partial | boolean | Whether this is a partial (non-final) transcript. Only present when partial_results is enabled. |
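The unified format maps naturally onto typed records. The following is a minimal sketch of decoding one text frame; the `Word` and `Transcript` classes and the `parse_transcript` function are hypothetical names, and the code assumes only the fields listed in the table (treating `words` as possibly null and `is_partial` as possibly absent).

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Word:
    text: str
    start_time: float
    end_time: float
    speaker: Optional[str] = None


@dataclass
class Transcript:
    provider: str
    transcript: str
    start_time: float
    end_time: float
    is_partial: bool = False
    words: list[Word] = field(default_factory=list)


def parse_transcript(frame: str) -> Optional[Transcript]:
    """Decode a text frame; return None for non-transcript messages."""
    msg = json.loads(frame)
    if msg.get("type") != "transcript":
        return None
    words = [
        Word(w["text"], w["start_time"], w["end_time"], w.get("speaker"))
        for w in (msg.get("words") or [])  # words may be null
    ]
    return Transcript(
        provider=msg["provider"],
        transcript=msg["transcript"],
        start_time=msg["start_time"],
        end_time=msg["end_time"],
        is_partial=msg.get("is_partial", False),  # absent unless partial_results is on
        words=words,
    )
```

This only applies when parse_json is left at its default; raw provider output would need provider-specific parsing.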
Partial Responses
When partial_results=true, you will also receive non-final transcripts
with is_partial: true. Each partial result replaces the previous one for
the current utterance, so you should overwrite (not append) when
displaying them. Once the utterance is finalized, a message with
is_partial: false is sent, at which point you can commit the transcript
and begin awaiting the next one.
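The overwrite-then-commit rule can be sketched as a small state holder. `LiveCaption` is a hypothetical helper, not part of any router.audio SDK; it takes already-decoded messages (dicts) in the unified format.

```python
class LiveCaption:
    """Tracks committed text plus the latest partial for the current utterance."""

    def __init__(self) -> None:
        self.committed: list[str] = []
        self.pending = ""

    def on_message(self, msg: dict) -> str:
        """Apply one message and return the text to display."""
        if msg.get("type") == "transcript":
            if msg.get("is_partial", False):
                # A partial replaces the previous partial; never append.
                self.pending = msg["transcript"]
            else:
                # Final result: commit it and clear the in-progress slot.
                self.committed.append(msg["transcript"])
                self.pending = ""
        return self.text()

    def text(self) -> str:
        parts = [*self.committed, self.pending]
        return " ".join(p for p in parts if p)
```

Feeding it the partials "Hello" and "Hello, how" followed by the final "Hello, how are you?" displays each in turn, with the final replacing the last partial rather than being added after it.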
Turn End
Some providers emit a turn end event to signal that a speaker turn
has completed. When supported, router.audio forwards this as a
separate message with type: "turn_end". This
can be used to detect a natural pause in speech and trigger
downstream processing, such as sending the final transcript to an
LLM.
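One way to use this event is to buffer finalized segments and hand them off as a single string when the turn ends. The sketch below assumes a turn_end message carries no transcript text of its own (its payload is not described here); `TurnCollector` and the `on_turn` callback are hypothetical names.

```python
from typing import Callable


class TurnCollector:
    """Buffers finalized segments and flushes them when a turn ends."""

    def __init__(self, on_turn: Callable[[str], None]) -> None:
        self._on_turn = on_turn
        self._segments: list[str] = []

    def handle(self, msg: dict) -> None:
        kind = msg.get("type")
        if kind == "transcript" and not msg.get("is_partial", False):
            # Keep only finalized text; partials are superseded later.
            self._segments.append(msg["transcript"])
        elif kind == "turn_end" and self._segments:
            # Hand the whole turn to downstream processing (e.g. an LLM call).
            self._on_turn(" ".join(self._segments))
            self._segments.clear()
```

Because not every provider emits turn end events, a production client would likely pair this with a timeout-based fallback.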
Error Handling
Errors are communicated via WebSocket close codes. The server may also
send a text frame with an error field before closing
the connection.
| Close Code | Description |
|---|---|
| 1000 | Normal closure. The connection completed successfully. |
| 1006 | Connection lost unexpectedly. Typically a network issue between the client and the server. Note that some APIs only validate the configuration after the connection is established, so you may also receive this code if the query parameters are invalid (e.g. unsupported provider or encoding). |
| 3000 | Unauthorized. Invalid or missing API key / JWT token. |
| 4040 | Provider error. The upstream STT provider returned an error or could not be reached. |
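A client might translate these codes into a reconnect decision along the following lines. The retry policy is an assumption for illustration, not documented behavior: it treats 1006 and 4040 as possibly transient and 3000 as requiring new credentials. The type of the error field is also not specified here, so `extract_error` returns whatever value it finds.

```python
import json
from typing import Any, Optional

CLOSE_CODES = {
    1000: "Normal closure",
    1006: "Connection lost unexpectedly (or invalid query parameters)",
    3000: "Unauthorized: invalid or missing API key / JWT token",
    4040: "Provider error: upstream STT provider failed or was unreachable",
}

# ASSUMPTION: only these codes are worth an automatic reconnect attempt.
RETRYABLE = {1006, 4040}


def describe_close(code: int) -> str:
    return CLOSE_CODES.get(code, f"Unknown close code {code}")


def should_retry(code: int) -> bool:
    return code in RETRYABLE


def extract_error(frame: str) -> Optional[Any]:
    """Return the error field of a text frame, or None if there is none."""
    try:
        msg = json.loads(frame)
    except json.JSONDecodeError:
        return None
    return msg.get("error") if isinstance(msg, dict) else None
```

Since 1006 can also indicate invalid query parameters, any retry loop built on `should_retry` should cap the number of attempts rather than reconnect indefinitely.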