API Reference

Proxy Endpoint

Connect via WebSocket to start streaming audio.

wss://api.router.audio/v1/listen

Authentication

There are three ways to authenticate with router.audio.

| Method | Format | Notes |
| --- | --- | --- |
| HTTP header | x-api-key: YOUR_KEY | The most common method. Pass either a router.audio API key or a provider-specific key depending on your use case. If both the header and a query parameter are provided, the header takes precedence. |
| Query parameter | api_key=YOUR_KEY | Useful for browser clients that can't set custom HTTP headers. Accepts either a router.audio API key or a provider-specific key. |
| Query parameter | jwt_token=TOKEN | For client-facing applications, exchange your router.audio API key for a short-lived JWT token and pass it on the WebSocket connection. This keeps your API key secure and allows for more granular access control. |

JWT Tokens

For client-facing applications, exchange your router.audio API key for a short-lived, single-use JWT token. The token is generated server-side and passed on the WebSocket connection, so your API key isn't exposed to the client.

curl -X POST https://api.router.audio/v1/token \
  -H "x-api-key: router_YOUR_KEY"

Use the token on the WebSocket:

wss://api.router.audio/v1/listen?provider=deepgram&jwt_token=TOKEN

JWT tokens are ideal for browser-based applications, mobile apps, or any client where you want to avoid exposing your API key. They also allow for more granular access control, such as restricting usage to specific providers or rate limits.
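As a sketch of the flow above in Python (stdlib only): the token is fetched server-side, then passed to the client, which appends it to the WebSocket URL. The name of the field holding the token in the response body ("token") is an assumption, not confirmed by this reference.

```python
import json
import urllib.request

def fetch_jwt(api_key: str) -> str:
    """Server-side: exchange a router.audio API key for a short-lived JWT.

    Assumes the response body is JSON with a "token" field.
    """
    req = urllib.request.Request(
        "https://api.router.audio/v1/token",
        method="POST",
        headers={"x-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["token"]

def jwt_listen_url(provider: str, token: str) -> str:
    """Client-side: attach the JWT to the WebSocket URL as a query parameter."""
    return f"wss://api.router.audio/v1/listen?provider={provider}&jwt_token={token}"
```

The API key never leaves your server; only the short-lived token is handed to the client.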

Bring Your Own Key

Pass your provider's API keys directly. This is useful if you have a direct agreement with a provider at a lower price, or if you want to use router.audio purely as a proxy without a router.audio account.

Query Parameters

Pass these as query parameters in the WebSocket URL.

| Parameter | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| provider | Yes | string | - | The Speech-To-Text API provider to route audio to. One of: deepgram, assemblyai, elevenlabs, openai, speechmatics, soniox, gradium. |
| encoding | Yes | string | - | Audio encoding of the input stream. One of: pcm_s16le, pcm_mulaw, ogg_opus, webm, mp3. If the chosen provider does not support the encoding, audio is resampled to pcm_s16le at 16 kHz mono (24 kHz for OpenAI). |
| sample_rate | Yes | number | - | Sample rate in Hz. One of: 8000, 16000, 22050, 24000, 44100, 48000. |
| channels | No | number | 1 | Number of audio channels in the input stream. |
| model | No | string | Depends | Provider-specific model; see the Providers page for options. The defaults are chosen for broad language availability. |
| language | No | string | en | Language code for recognition. Some providers detect the language automatically or support multilingual/code-switching. The full list of supported languages and codes is on the Providers page. |
| diarize | No | boolean | false | Enable word-level speaker diarization where supported (Deepgram, Speechmatics, and sometimes ElevenLabs). |
| punctuate | No | boolean | true | Enable automatic punctuation in transcripts. Note that some providers only return punctuated transcripts. |
| partial_results | No | boolean | false | Whether to receive partial (non-final) transcripts. Not all providers support this, and behavior may vary. |
| endpointing | No | number | 500 | Silence duration in milliseconds before a transcript segment is finalized. |
| jwt_token | No | string | - | Temporary JWT token, used in place of an API key. |
| parse_json | No | boolean | true | Normalize provider responses into a unified JSON format. Set to false for raw provider output. |
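A minimal sketch of assembling the WebSocket URL from these parameters (Python stdlib; the helper name is illustrative):

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.router.audio/v1/listen"

def listen_url(provider: str, encoding: str, sample_rate: int, **options) -> str:
    """Build the /v1/listen WebSocket URL.

    Required parameters are positional; any optional parameter from the
    table above (language, diarize, jwt_token, ...) goes in **options.
    """
    params = {
        "provider": provider,
        "encoding": encoding,
        "sample_rate": sample_rate,
        **options,
    }
    return f"{BASE_URL}?{urlencode(params)}"
```

urlencode also takes care of percent-escaping values such as JWT tokens.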

Response Messages

Transcripts are sent as text frames containing JSON. By default, only finalized transcripts are delivered, and responses from all providers are normalized into a unified format. Set parse_json=false to receive raw, unmodified responses from the upstream provider instead.

{
  "type": "transcript",
  "provider": "deepgram",
  "transcript": "Hello, how are you?",
  "start_time": 0.0,
  "end_time": 0.95,
  "is_partial": false,
  "words": [
    {
      "text": "hello",
      "start_time": 0.0,
      "end_time": 0.42,
      "speaker": null
    },
    {
      "text": "how",
      "start_time": 0.48,
      "end_time": 0.61,
      "speaker": null
    },
    {
      "text": "are",
      "start_time": 0.62,
      "end_time": 0.75,
      "speaker": null
    },
    {
      "text": "you",
      "start_time": 0.76,
      "end_time": 0.90,
      "speaker": null
    }
  ]
}
| Field | Type | Description |
| --- | --- | --- |
| type | string | Message type. Either "transcript" for recognized speech segments or "turn_end" for speaker turn end events. |
| provider | string | The provider that generated this transcript. |
| transcript | string | The recognized text. |
| start_time | number | Segment start time in seconds. |
| end_time | number | Segment end time in seconds. |
| words[] | array \| null | Word-level details. |
| words[].text | string | The recognized word. |
| words[].start_time | number | Word start time in seconds. |
| words[].end_time | number | Word end time in seconds. |
| words[].speaker | string \| null | Speaker ID when diarization is enabled, otherwise null. |
| is_partial | boolean | Whether this is a partial (non-final) transcript. Only present when partial_results is enabled. |
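As an illustration of consuming this schema, the following sketch groups the word-level results of one message by speaker ID (useful once diarize=true; the function name is illustrative):

```python
import json

def words_by_speaker(frame: str) -> dict:
    """Parse one unified transcript frame and group word texts by speaker.

    With diarization disabled every word has speaker null, so all words
    end up under the single key None.
    """
    msg = json.loads(frame)
    groups: dict = {}
    for word in msg.get("words") or []:
        groups.setdefault(word["speaker"], []).append(word["text"])
    return groups
```

Running this over the sample message above yields `{None: ["hello", "how", "are", "you"]}`.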

Partial Responses

When partial_results=true, you will also receive non-final transcripts with is_partial: true. Each partial result replaces the previous one for the current utterance — you should overwrite (not append) when displaying them. Once the utterance is finalized, a message with is_partial: false is sent, at which point you can commit the transcript and begin awaiting the next one.
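The replace-then-commit behavior described above can be sketched as a small buffer (Python; the class name is illustrative):

```python
class TranscriptBuffer:
    """Accumulates finalized segments while overwriting the in-flight partial."""

    def __init__(self):
        self.committed = []  # finalized transcript segments
        self.partial = ""    # current partial; each new partial replaces it

    def feed(self, msg: dict) -> None:
        if msg.get("type") != "transcript":
            return
        if msg.get("is_partial"):
            self.partial = msg["transcript"]      # overwrite, never append
        else:
            self.committed.append(msg["transcript"])
            self.partial = ""                     # await the next utterance

    def display(self) -> str:
        parts = self.committed + ([self.partial] if self.partial else [])
        return " ".join(parts)
```

Each partial simply replaces the previous one, so the display never duplicates words within an utterance.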

Turn End

Some providers emit a turn end event to signal that a speaker turn has completed. When supported, router.audio forwards this as a separate message with type: "turn_end". This can be used to detect a natural pause in speech and trigger downstream processing, such as sending the final transcript to an LLM.
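A minimal dispatcher for the two message types might look like this (the callback names are illustrative; for example, on_turn_end could send the accumulated transcript to an LLM):

```python
def handle_frame(msg: dict, on_final, on_turn_end) -> None:
    """Route a unified message to the appropriate callback.

    Final transcripts go to on_final; turn end events to on_turn_end.
    Partials are ignored here for brevity.
    """
    if msg.get("type") == "turn_end":
        on_turn_end()
    elif msg.get("type") == "transcript" and not msg.get("is_partial"):
        on_final(msg["transcript"])
```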

Error Handling

Errors are communicated via WebSocket close codes. The server may also send a text frame with an error field before closing the connection.

| Close Code | Description |
| --- | --- |
| 1000 | Normal closure. The connection completed successfully. |
| 1006 | Connection lost unexpectedly. Typically a network issue between the client and the server. Note that some provider APIs only validate the configuration after the connection is established, so you may receive this code if the query parameters are invalid (e.g. unsupported provider or encoding). |
| 3000 | Unauthorized. Invalid or missing API key / JWT token. |
| 4040 | Provider error. The upstream STT provider returned an error or could not be reached. |
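One possible client-side retry policy based on these codes, as a sketch (the policy itself is an assumption, not a recommendation from this reference): transient network and upstream provider failures are worth retrying, while authorization failures and clean closes are not.

```python
CLOSE_CODES = {
    1000: "normal closure",
    1006: "connection lost unexpectedly (or invalid configuration)",
    3000: "unauthorized: invalid or missing API key / JWT token",
    4040: "provider error: upstream STT provider failed or unreachable",
}

def should_retry(close_code: int) -> bool:
    """Retry on transient failures; give up on auth errors and clean closes."""
    return close_code in (1006, 4040)
```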