API Reference

Proxy Endpoint

Connect via WebSocket to start streaming audio.

wss://api.router.audio/v1/listen

Authentication

There are three ways to authenticate with router.audio.

| Method | Format | Notes |
| --- | --- | --- |
| HTTP header | x-api-key: YOUR_KEY | The most common method. Pass either a router.audio API key or a provider-specific key depending on your use case. If both the header and a query parameter are provided, the header takes precedence. |
| Query parameter | api_key=YOUR_KEY | Useful for browser clients that can't set custom HTTP headers. Accepts either a router.audio API key or a provider-specific key. |
| Query parameter | jwt_token=TOKEN | For client-facing applications, exchange your router.audio API key for a short-lived JWT token and pass it on the WebSocket connection. This keeps your API key secure and allows for more granular access control. |

JWT Tokens

For client-facing applications, exchange your router.audio API key for a short-lived, single-use JWT token. The token is generated server-side and passed on the WebSocket connection, so your API key isn't exposed to the client.

curl -X POST https://api.router.audio/v1/token \
  -H "x-api-key: router_YOUR_KEY"

Use the token on the WebSocket:

wss://api.router.audio/v1/listen?provider=deepgram&jwt_token=TOKEN

JWT tokens are ideal for browser-based applications, mobile apps, or any client where you want to avoid exposing your API key. They also allow for more granular access control, such as restricting usage to specific providers or rate limits.
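As a sketch of the flow above in Python (stdlib only): the token is fetched server-side, then passed to the client, which appends it to the WebSocket URL. The name of the field holding the token in the response body ("token") is an assumption, not confirmed by this reference.

```python
import json
import urllib.request

def fetch_jwt(api_key: str) -> str:
    """Server-side: exchange a router.audio API key for a short-lived JWT.

    Assumes the response body is JSON with a "token" field.
    """
    req = urllib.request.Request(
        "https://api.router.audio/v1/token",
        method="POST",
        headers={"x-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["token"]

def jwt_listen_url(provider: str, token: str) -> str:
    """Client-side: attach the JWT to the WebSocket URL as a query parameter."""
    return f"wss://api.router.audio/v1/listen?provider={provider}&jwt_token={token}"
```

The API key never leaves your server; only the short-lived token is handed to the client.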

Bring Your Own Key

Pass your provider's API keys directly. This is useful if you have a direct agreement with a provider at a lower price, or if you want to use router.audio purely as a proxy without a router.audio account.

Query Parameters

Pass these as query parameters in the WebSocket URL.

| Parameter | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| provider | Yes | string | - | The Speech-To-Text API provider to route audio to. One of: deepgram, assemblyai, elevenlabs, openai, speechmatics, soniox, gradium. |
| encoding | Yes | string | - | Audio encoding of the input stream. One of: pcm_s16le, pcm_mulaw, ogg_opus, webm, mp3. If the chosen provider does not support the encoding, audio is resampled to pcm_s16le at 16 kHz mono (24 kHz for OpenAI). |
| sample_rate | Yes | number | - | Sample rate in Hz. One of: 8000, 16000, 22050, 24000, 44100, 48000. |
| channels | No | number | 1 | Number of audio channels in the input stream. |
| model | No | string | Depends | Provider-specific model; see the Providers page for options. The defaults are chosen for broad language availability. |
| language | No | string | en | Language code for recognition. Some providers detect the language automatically or support multilingual/code-switching. The full list of supported languages and codes is on the Providers page. |
| diarize | No | boolean | false | Enable word-level speaker diarization where supported (Deepgram, Speechmatics, and sometimes ElevenLabs). |
| punctuate | No | boolean | true | Enable automatic punctuation in transcripts. Note that some providers only return punctuated transcripts. |
| partial_results | No | boolean | false | Whether to receive partial (non-final) transcripts. Not all providers support this, and behavior may vary. |
| endpointing | No | number | 500 | Silence duration in milliseconds before a transcript segment is finalized. |
| jwt_token | No | string | - | Temporary JWT token, used in place of an API key. |
| parse_json | No | boolean | true | Normalize provider responses into a unified JSON format. Set to false for raw provider output. |
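A minimal sketch of assembling the WebSocket URL from these parameters (Python stdlib; the helper name is illustrative):

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.router.audio/v1/listen"

def listen_url(provider: str, encoding: str, sample_rate: int, **options) -> str:
    """Build the /v1/listen WebSocket URL.

    Required parameters are positional; any optional parameter from the
    table above (language, diarize, jwt_token, ...) goes in **options.
    """
    params = {
        "provider": provider,
        "encoding": encoding,
        "sample_rate": sample_rate,
        **options,
    }
    return f"{BASE_URL}?{urlencode(params)}"
```

urlencode also takes care of percent-escaping values such as JWT tokens.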

Response Messages

Transcripts are sent as text frames containing JSON. By default, only finalized transcripts are delivered, and responses from all providers are normalized into a unified format. Set parse_json=false to receive raw, unmodified responses from the upstream provider instead.

{
  "type": "transcript",
  "provider": "deepgram",
  "transcript": "Hello, how are you?",
  "start_time": 0.0,
  "end_time": 0.95,
  "is_partial": false,
  "words": [
    {
      "text": "hello",
      "start_time": 0.0,
      "end_time": 0.42,
      "speaker": null
    },
    {
      "text": "how",
      "start_time": 0.48,
      "end_time": 0.61,
      "speaker": null
    },
    {
      "text": "are",
      "start_time": 0.62,
      "end_time": 0.75,
      "speaker": null
    },
    {
      "text": "you",
      "start_time": 0.76,
      "end_time": 0.90,
      "speaker": null
    }
  ]
}
| Field | Type | Description |
| --- | --- | --- |
| type | string | Message type. Either "transcript" for recognized speech segments or "turn_end" for speaker turn end events. |
| provider | string | The provider that generated this transcript. |
| transcript | string | The recognized text. |
| start_time | number | Segment start time in seconds. |
| end_time | number | Segment end time in seconds. |
| words[] | array \| null | Word-level details. |
| words[].text | string | The recognized word. |
| words[].start_time | number | Word start time in seconds. |
| words[].end_time | number | Word end time in seconds. |
| words[].speaker | string \| null | Speaker ID when diarization is enabled, otherwise null. |
| is_partial | boolean | Whether this is a partial (non-final) transcript. Only present when partial_results is enabled. |
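As an illustration of consuming this schema, the following sketch groups the word-level results of one message by speaker ID (useful once diarize=true; the function name is illustrative):

```python
import json

def words_by_speaker(frame: str) -> dict:
    """Parse one unified transcript frame and group word texts by speaker.

    With diarization disabled every word has speaker null, so all words
    end up under the single key None.
    """
    msg = json.loads(frame)
    groups: dict = {}
    for word in msg.get("words") or []:
        groups.setdefault(word["speaker"], []).append(word["text"])
    return groups
```

Running this over the sample message above yields `{None: ["hello", "how", "are", "you"]}`.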

Partial Responses

When partial_results=true, you will also receive non-final transcripts with is_partial: true. Each partial result replaces the previous one for the current utterance — you should overwrite (not append) when displaying them. Once the utterance is finalized, a message with is_partial: false is sent, at which point you can commit the transcript and begin awaiting the next one.
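The replace-then-commit behavior described above can be sketched as a small buffer (Python; the class name is illustrative):

```python
class TranscriptBuffer:
    """Accumulates finalized segments while overwriting the in-flight partial."""

    def __init__(self):
        self.committed = []  # finalized transcript segments
        self.partial = ""    # current partial; each new partial replaces it

    def feed(self, msg: dict) -> None:
        if msg.get("type") != "transcript":
            return
        if msg.get("is_partial"):
            self.partial = msg["transcript"]      # overwrite, never append
        else:
            self.committed.append(msg["transcript"])
            self.partial = ""                     # await the next utterance

    def display(self) -> str:
        parts = self.committed + ([self.partial] if self.partial else [])
        return " ".join(parts)
```

Each partial simply replaces the previous one, so the display never duplicates words within an utterance.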

Turn End

Some providers emit a turn end event to signal that a speaker turn has completed. When supported, router.audio forwards this as a separate message with type: "turn_end". This can be used to detect a natural pause in speech and trigger downstream processing, such as sending the final transcript to an LLM.
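A minimal dispatcher for the two message types might look like this (the callback names are illustrative; for example, on_turn_end could send the accumulated transcript to an LLM):

```python
def handle_frame(msg: dict, on_final, on_turn_end) -> None:
    """Route a unified message to the appropriate callback.

    Final transcripts go to on_final; turn end events to on_turn_end.
    Partials are ignored here for brevity.
    """
    if msg.get("type") == "turn_end":
        on_turn_end()
    elif msg.get("type") == "transcript" and not msg.get("is_partial"):
        on_final(msg["transcript"])
```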

Error Handling

Errors are communicated via WebSocket close codes. The server may also send a text frame with an error field before closing the connection.

| Close Code | Description |
| --- | --- |
| 1000 | Normal closure. The connection completed successfully. |
| 1006 | Connection lost unexpectedly. Typically a network issue between the client and the server. Note that some provider APIs only validate the configuration after the connection is established, so you may receive this code if the query parameters are invalid (e.g. unsupported provider or encoding). |
| 3000 | Unauthorized. Invalid or missing API key / JWT token. |
| 4040 | Provider error. The upstream STT provider returned an error or could not be reached. |
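One possible client-side retry policy based on these codes, as a sketch (the policy itself is an assumption, not a recommendation from this reference): transient network and upstream provider failures are worth retrying, while authorization failures and clean closes are not.

```python
CLOSE_CODES = {
    1000: "normal closure",
    1006: "connection lost unexpectedly (or invalid configuration)",
    3000: "unauthorized: invalid or missing API key / JWT token",
    4040: "provider error: upstream STT provider failed or unreachable",
}

def should_retry(close_code: int) -> bool:
    """Retry on transient failures; give up on auth errors and clean closes."""
    return close_code in (1006, 4040)
```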