# router.audio

> router.audio is a WebSocket proxy for real-time streaming speech-to-text (STT). It provides a single endpoint that routes audio to multiple STT providers — Deepgram, AssemblyAI, ElevenLabs, OpenAI, Speechmatics, Soniox, and Gradium — using a consistent request and response format. Developers connect once and switch providers by changing a query parameter, with no additional SDKs or integrations required.

## Pages

- [Home](https://router.audio/): Overview of router.audio, code examples in Python, TypeScript/React, Go, Elixir, and Ruby, and a list of supported providers.
- [Speech to Text](https://router.audio/speech-to-text): Product page covering what router.audio does, key features, and links to documentation.
- [Which STT APIs support your language?](https://router.audio/speech-to-text/language-support): A comparison table of language support across all providers.
- [Language Checker](https://router.audio/speech-to-text/language-finder): Interactive search — type a language name and instantly see which providers and models support it for real-time streaming transcription.
- [Enterprise](https://router.audio/enterprise): Enterprise plans covering volume pricing, SLA guarantees, dedicated support, priority routing, compliance options, and a contact form.

## Documentation

- [Getting Started](https://router.audio/docs): How to connect to router.audio, obtain an API key, and send your first audio stream.
- [Models and Languages](https://router.audio/docs/providers): Full list of supported STT providers, available models per provider, and which languages each model supports.
- [API Reference](https://router.audio/docs/api_reference): WebSocket endpoint, authentication, query parameters, response message format, partial (interim) transcripts, and error codes.

## API

The WebSocket endpoint is:

`wss://api.router.audio/v1/listen`

Authentication is via the `x-api-key` header or the `api_key` query parameter, using the provider's own API key.
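As a minimal sketch, the endpoint and documented query parameters can be assembled into a connection URL like this (Python standard library only). The endpoint, auth options, and parameter names come from the docs above; the specific model name is an illustrative placeholder, and the resulting URL works with any WebSocket client library.

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.router.audio/v1/listen"

def build_listen_url(provider: str, api_key: str, **params: str) -> str:
    """Build a router.audio WebSocket URL with auth and routing parameters.

    Passing the key as the `api_key` query parameter is one of the two
    documented auth options (the other is the `x-api-key` header).
    """
    query = {"provider": provider, "api_key": api_key, **params}
    return f"{BASE_URL}?{urlencode(query)}"

url = build_listen_url(
    "deepgram",
    "YOUR_PROVIDER_API_KEY",
    model="example-model",   # provider-specific; placeholder value
    language="en",
    encoding="pcm_s16le",
    sample_rate="16000",
)
print(url)
```

Switching providers is then a one-argument change (`"deepgram"` to `"assemblyai"`, say) rather than a new SDK integration.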
Key query parameters:

- `provider` (required): One of `deepgram`, `assemblyai`, `elevenlabs`, `openai`, `speechmatics`, `soniox`, `gradium`
- `model`: Provider-specific model name
- `language`: BCP-47 language code (e.g. `en`, `fr`, `zh`)
- `encoding`: Audio encoding — `pcm_s16le`, `webm`, or `mulaw`
- `sample_rate`: Sample rate in Hz (e.g. `16000`)

Responses are JSON with a consistent schema across all providers:

```json
{
  "provider": "deepgram",
  "transcript": "Hello, how are you?",
  "start_time": 0.0,
  "end_time": 0.95,
  "is_partial": false,
  "words": [
    { "text": "hello", "start_time": 0.0, "end_time": 0.42, "speaker": null }
  ]
}
```