Python - Cascade Main
Building on the Main Extension
The file `extension.py` is the control center of the agent.
If you want to build your own solution on top of the TEN Framework, this is the file you should understand first.
It shows how runtime messages (ASR, LLM, tools, user events) are captured, normalized, and redirected into the agent workflow. By following its patterns, you can easily extend or replace parts of the pipeline with your own logic.
Quick File Layout
You mostly need to know `extension.py`. The other files provide typed events and execution helpers.
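Based on the files referenced on this page, the layout is roughly:

```
extension.py   # event routing and conversation logic (start here)
agent.py       # supporting agent helpers
llm_exec.py    # LLM execution helpers
events.py      # typed event definitions (e.g. ASRResultEvent)
```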
Architecture Overview
How `extension.py` Works
1. Event Routing
The extension listens to runtime messages and turns them into typed events. For example, an ASR result from the ASR extension becomes an `ASRResultEvent` object that the agent can use.
Here’s what happens:
- Partial speech → streamed to transcript/logging
- Final speech → sent into the LLM queue
- Overlap detected → `_interrupt()` flushes ongoing TTS/LLM so the user can speak naturally
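A minimal sketch of this routing pattern in Python. Only `ASRResultEvent` and `_interrupt()` are names from this page; the event fields and helpers below are simplified assumptions, not the real TEN API:

```python
# Sketch of the ASR routing pattern; fields and helpers are assumptions.
import asyncio
from dataclasses import dataclass

@dataclass
class ASRResultEvent:
    text: str
    is_final: bool

class MainExtension:
    def __init__(self) -> None:
        self.llm_queue: asyncio.Queue[str] = asyncio.Queue()
        self.assistant_speaking = False

    async def _on_asr_result(self, event: ASRResultEvent) -> None:
        if self.assistant_speaking and event.text:
            await self._interrupt()               # barge-in: flush TTS/LLM
        if event.is_final:
            await self.llm_queue.put(event.text)  # final speech → LLM queue
        else:
            self._stream_transcript(event.text)   # partial → transcript/logging

    async def _interrupt(self) -> None: ...       # see the Interruption section
    def _stream_transcript(self, text: str) -> None: ...
```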
2. Handling LLM Responses
When the LLM responds, the extension splits sentences and forwards them to TTS while also recording transcripts (see the sketch after the key points below).
Key points:
- Streaming: You can send partial outputs sentence by sentence to TTS for natural speech.
- Final output: Marked and sent to the transcript collector.
- Reasoning traces: Can be separated if you want to show them differently in your UI.
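A rough illustration of the sentence-by-sentence streaming idea; `send_to_tts` and `record_transcript` are hypothetical callbacks, not the real TEN API:

```python
# Illustrative streaming loop: speak each complete sentence as soon as it
# arrives, then record the full text for the transcript collector.
import re

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def split_complete_sentences(buffer: str) -> tuple[list[str], str]:
    """Return finished sentences plus the leftover partial text."""
    parts = _SENTENCE_END.split(buffer)
    return parts[:-1], parts[-1]

async def on_llm_chunks(chunks, send_to_tts, record_transcript):
    buffer, full_text = "", ""
    async for chunk in chunks:                   # streamed LLM tokens
        buffer += chunk
        full_text += chunk
        sentences, buffer = split_complete_sentences(buffer)
        for sentence in sentences:               # speak full sentences early
            await send_to_tts(sentence)
    if buffer.strip():                           # flush the trailing fragment
        await send_to_tts(buffer)
    record_transcript(full_text, is_final=True)  # mark final for the collector
```

Splitting on sentence boundaries keeps latency low: the first sentence can start playing while the LLM is still generating the rest.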
3. Tool Registration
The extension lets the LLM register tools dynamically.
To add your own tool, define its metadata in `events.py`, then handle it in your solution.
This makes it easy to plug in APIs, databases, or custom functions that the LLM can call.
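Hypothetically, a tool registration might look like the sketch below; `ToolRegisterEvent` is named on this page, but its fields here are assumptions modeled on common JSON-schema tool definitions:

```python
# Hypothetical tool metadata; the field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class ToolRegisterEvent:
    name: str
    description: str
    parameters: dict = field(default_factory=dict)

weather_tool = ToolRegisterEvent(
    name="get_weather",                      # hypothetical example tool
    description="Look up current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)
```

The extension would forward such an event so the LLM knows the tool exists, then dispatch the LLM’s tool calls back to your handler.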
4. Interruption
Natural conversation needs interruption (barge-in). If the user speaks while the assistant is still generating, `_interrupt()` flushes LLM and TTS. This ensures the assistant doesn’t “talk over” the user.
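A sketch of what a barge-in flush could look like, assuming the LLM call runs as an asyncio task and the TTS side exposes a hypothetical `flush()`:

```python
import asyncio

class TTSStub:
    async def flush(self) -> None: ...        # stand-in for the real TTS flush

class MainExtension:
    def __init__(self) -> None:
        self.llm_task: asyncio.Task | None = None
        self.tts = TTSStub()
        self.assistant_speaking = False

    async def _interrupt(self) -> None:
        # Cancel in-flight generation so no stale sentences reach TTS.
        if self.llm_task and not self.llm_task.done():
            self.llm_task.cancel()
        # Drop any queued or currently playing audio.
        await self.tts.flush()
        self.assistant_speaking = False       # user speech now flows to the LLM
```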
How to Extend for Your Own Solution
- Custom ASR behavior: Edit `_on_asr_result` to filter text, add punctuation, or preprocess before sending to the LLM (see the sketch after this list).
- Custom LLM logic: Change `_on_llm_response` to transform text before TTS, or enrich transcripts with metadata.
- Add tools: Use `ToolRegisterEvent` to expose your own APIs or functions.
- Custom interruption policy: Tweak `_interrupt()` to make the agent more or less tolerant of overlapping speech.
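As an example of the first item, here is a hypothetical override of `_on_asr_result` that strips filler words before text reaches the LLM. It builds on the `MainExtension` and `ASRResultEvent` sketches above; the exact override point in the real code is an assumption:

```python
# Hypothetical preprocessing override; the hook name follows the sketch above.
FILLERS = {"uh", "um", "hmm"}

class MyExtension(MainExtension):
    async def _on_asr_result(self, event: ASRResultEvent) -> None:
        event.text = " ".join(
            word for word in event.text.split()
            if word.lower().strip(",.") not in FILLERS
        )
        await super()._on_asr_result(event)   # keep the default routing
```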
Summary
- `extension.py` is the heart of the agent: it routes runtime messages into typed events and applies the core conversation logic.
- By modifying or extending these handlers, you can quickly build your own conversational AI solution on top of TEN.
- Everything else (`agent.py`, `llm_exec.py`, `events.py`) supports this routing, but the patterns in `extension.py` are what you’ll reuse the most.