Real-time voice with Gradium TTS & STT in TEN
We just shipped two new voice integrations for TEN: Gradium speech-to-text (STT/ASR) and Gradium text-to-speech (TTS). They unlock low-latency, ultra-realistic audio for assistants, copilots, and any product that needs to sound human while responding in real time. Explore Gradium at gradium.ai.
TL;DR
- Gradium STT streams transcriptions over WebSockets with interim and final results, plus VAD support to keep latency low.
- Gradium TTS generates lifelike PCM audio (48 kHz by default) and streams it back instantly for smooth turn-taking.
- Both ship as TEN extensions (`gradium_asr_python` and `gradium_tts_python`) with simple JSON config and region-aware endpoints (US/EU).
- Designed for real-time loops: microphone -> Gradium STT -> reasoning -> Gradium TTS -> speaker, all in one TEN app graph.
Why Gradium voice inside TEN
TEN already handles streaming I/O, session state, and routing between extensions. Gradium fits perfectly into that pipeline: its WebSocket APIs keep round trips tight, while the voice quality feels natural enough for production-grade customer experiences. Combining the two lets you prototype and ship responsive voice agents without gluing together separate services.
What we shipped
Gradium STT (ASR)
- WebSocket streaming for real-time transcription with interim and final results.
- Multi-region endpoints (`wss://us.api.gradium.ai/api/speech/asr`, `wss://eu.api.gradium.ai/api/speech/asr`).
- VAD-aware flow so you can trim silence and speed up responses.
- Flexible inputs: PCM, WAV, or Opus with 16-bit mono audio (24 kHz recommended).
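Both the STT input and the TTS output described in this post are 16-bit mono PCM. If your capture pipeline produces floating-point samples, a minimal sketch of packing them into that format might look like this. It assumes little-endian byte order (the usual PCM convention, but not something this post states), and `floats_to_pcm16` is a hypothetical helper name.

```python
import struct
from typing import Sequence


def floats_to_pcm16(samples: Sequence[float]) -> bytes:
    """Pack mono float samples in [-1.0, 1.0] into 16-bit PCM bytes.

    Assumption: little-endian byte order ("<"), the common PCM convention.
    """
    # Clamp first so loud input clips instead of wrapping around.
    clamped = (max(-1.0, min(1.0, s)) for s in samples)
    # Scale to the signed 16-bit range; 32767 keeps +1.0 inside int16.
    ints = [int(round(s * 32767)) for s in clamped]
    # "<{n}h" = n little-endian signed 16-bit integers.
    return struct.pack(f"<{len(ints)}h", *ints)
```

At the recommended 24 kHz, one second of audio packs into 24,000 samples, or 48,000 bytes.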
Gradium TTS
- Streaming TTS that emits 16-bit PCM audio at 48 kHz by default, with options for 16 kHz and 24 kHz PCM.
- Works with any Gradium voice ID and model name, so you can keep one consistent voice across product surfaces.
- Region-aware WebSocket endpoint (`wss://<region>.api.gradium.ai/api/speech/tts`) plus simple `GRADIUM_API_KEY` auth.
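The ASR and TTS endpoints above differ only in region and path suffix, so a small helper can build either one, and the TTS output size maps directly to playback time. This is a sketch under stated assumptions: `gradium_ws_url`, `gradium_api_key`, and `pcm16_duration_s` are hypothetical names, only the `us` and `eu` regions mentioned in this post are accepted, and how the key is attached to the WebSocket handshake is not covered here, so check Gradium's own docs for that.

```python
import os

# Regions and services named in this post; anything else is rejected, not guessed.
REGIONS = {"us", "eu"}
SERVICES = {"asr", "tts"}
# TTS PCM sample rates mentioned in this post (48 kHz is the default).
TTS_PCM_RATES = {16000, 24000, 48000}


def gradium_ws_url(region: str, service: str) -> str:
    """Build the region-aware Gradium WebSocket URL for 'asr' or 'tts'."""
    region = region.lower()
    if region not in REGIONS:
        raise ValueError(f"unsupported region: {region!r}")
    if service not in SERVICES:
        raise ValueError(f"unsupported service: {service!r}")
    return f"wss://{region}.api.gradium.ai/api/speech/{service}"


def gradium_api_key() -> str:
    """Read the key from GRADIUM_API_KEY, failing loudly if it is unset."""
    key = os.environ.get("GRADIUM_API_KEY", "")
    if not key:
        raise RuntimeError("GRADIUM_API_KEY is not set")
    return key


def pcm16_duration_s(num_bytes: int, sample_rate: int = 48000) -> float:
    """Playback time of 16-bit mono PCM: 2 bytes per sample."""
    if sample_rate not in TTS_PCM_RATES:
        raise ValueError(f"unsupported TTS sample rate: {sample_rate}")
    return num_bytes / (2 * sample_rate)
```

For example, `gradium_ws_url("eu", "tts")` yields the EU TTS endpoint, and 96,000 bytes of default 48 kHz output is one second of speech.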
Quick start: wire both into a TEN app
- Set your API key:
```shell
export GRADIUM_API_KEY=your_gradium_key
```
- Add Gradium STT and TTS nodes to your TEN graph (simplified example: the `microphone` and `llm_router` nodes, plus the STT -> LLM and TTS -> speaker connections, are omitted for brevity):
```json
{
  "nodes": [
    {
      "type": "extension",
      "name": "gradium_asr",
      "addon": "gradium_asr_python",
      "extension_group": "gradium_asr_group",
      "property": {
        "params": {
          "api_key": "${env:GRADIUM_API_KEY|}",
          "region": "us",
          "model_name": "default",
          "input_format": "pcm",
          "sample_rate": 24000
        }
      }
    },
    {
      "type": "extension",
      "name": "gradium_tts",
      "addon": "gradium_tts_python",
      "extension_group": "gradium_tts_group",
      "property": {
        "params": {
          "api_key": "${env:GRADIUM_API_KEY}",
          "region": "us",
          "model_name": "default",
          "voice_id": "YOUR_GRADIUM_VOICE_ID",
          "output_format": "pcm"
        }
      }
    }
  ],
  "connections": [
    {
      "extension_group": "microphone_group",
      "extension": "microphone",
      "audio_frame": [
        {
          "name": "pcm_frame",
          "dest": [
            {
              "extension_group": "gradium_asr_group",
              "extension": "gradium_asr"
            }
          ]
        }
      ]
    },
    {
      "extension_group": "llm_router_group",
      "extension": "llm_router",
      "data": [
        {
          "name": "reply_text",
          "dest": [
            {
              "extension_group": "gradium_tts_group",
              "extension": "gradium_tts"
            }
          ]
        }
      ]
    }
  ]
}
```
- Start your TEN app and you have a full-duplex loop: mic -> Gradium STT -> LLM/tooling -> Gradium TTS -> speakers.
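The two `api_key` values in the config use slightly different placeholders: `${env:GRADIUM_API_KEY|}` and `${env:GRADIUM_API_KEY}`. As we understand TEN's property substitution, the `|` form supplies a default (here, an empty string) when the variable is unset, while the bare form treats the variable as required. The sketch below only illustrates those assumed semantics on a parsed JSON tree; `resolve_env` is a hypothetical helper, not TEN's actual loader.

```python
import os
import re
from typing import Any, Mapping

# Matches ${env:NAME} and ${env:NAME|default}; the default may be empty.
_ENV_REF = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)(\|([^}]*))?\}")


def resolve_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively substitute env placeholders (illustrative, not TEN's code)."""
    if isinstance(value, dict):
        return {k: resolve_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env(v, env) for v in value]
    if not isinstance(value, str):
        return value  # numbers, booleans, None pass through unchanged

    def _sub(m: re.Match) -> str:
        name, has_default, default = m.group(1), m.group(2), m.group(3)
        if name in env:
            return env[name]
        if has_default is not None:
            return default  # "${env:NAME|}" falls back to ""
        raise KeyError(f"environment variable {name} is not set")

    return _ENV_REF.sub(_sub, value)
```

Under these assumed semantics, the ASR node would start with an empty key when `GRADIUM_API_KEY` is missing, while the TTS node would fail at load time, which is usually the more helpful behavior.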
Optimization tips for low latency
- Keep sample rates fixed: feed Gradium STT 24 kHz mono PCM and keep TTS output at 48 kHz PCM for fidelity; resample only once, at the edge.
- Use VAD: let Gradium STT handle silence trimming to shorten end-of-utterance delays.
- Cache voices: pick a single Gradium `voice_id` per session to avoid extra lookups.
- Stay regional: choose the closest Gradium region (US/EU) for lower round-trip time.
- Stream small chunks: send ~80 ms PCM chunks (1,920 samples at 24 kHz) to keep transcripts flowing smoothly.
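The chunk-size arithmetic from the last tip, as a sketch: at 24 kHz, 80 ms is 24,000 × 0.08 = 1,920 samples, and at 2 bytes per 16-bit sample that is 3,840 bytes per chunk. `chunk_size_bytes` and `iter_pcm_chunks` are hypothetical helpers; how each chunk is framed on the Gradium WebSocket is not shown.

```python
from typing import Iterator

BYTES_PER_SAMPLE = 2  # 16-bit mono PCM


def chunk_size_bytes(sample_rate: int = 24000, chunk_ms: int = 80) -> int:
    """Bytes in one chunk of 16-bit mono PCM."""
    samples = sample_rate * chunk_ms // 1000  # 24000 * 80 // 1000 = 1920
    return samples * BYTES_PER_SAMPLE         # 1920 * 2 = 3840 bytes


def iter_pcm_chunks(
    pcm: bytes, sample_rate: int = 24000, chunk_ms: int = 80
) -> Iterator[bytes]:
    """Yield consecutive ~chunk_ms slices; the final chunk may be shorter."""
    size = chunk_size_bytes(sample_rate, chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

Because the chunk size is always a multiple of 2 bytes, every chunk boundary falls between samples, so no 16-bit sample is split across two messages.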
Where to use it
- Customer support and sales agents that need natural prosody and tight response loops.
- Real-time copilots in productivity or creative tools where hands-free operation matters.
- Multilingual kiosks, IVRs, and embedded devices that rely on reliable ASR plus high-quality playback.
Ready to try?
Gradium voice is live in TEN today. Grab an API key from gradium.ai, drop your config into property.json, and ship a voice experience that sounds human and responds fast. If you build something cool with Gradium TTS or STT, let us know; we'd love to feature it next.