Customize via Code
Configuring extensions via TMAN Designer should get you started with basic modifications. But what happens when you need to implement more complex changes that the visual interface doesn't support?
Before making any code changes, it's essential to understand the existing structure and functionality of the main extension.
In the TEN Framework, connections are used to define how video and audio data flow between extensions. This approach reduces unnecessary complexity and ensures high performance—a core strength of TEN.
For events and text-based data, which are typically less resource-intensive, handling them directly in code offers greater flexibility and efficiency for implementing complex logic.
To make this workflow easier, the latest version of the default TEN Agent app introduces a built-in “main” extension in every graph. This extension acts as a central hub for orchestrating components and managing their connections.
With this design:

- Video/audio data always flows through connections between extensions. This lets you take advantage of the framework's performance optimizations without touching low-level media processing logic.
- The "main" extension can communicate with any other extension and collect the event/text data it needs.
  - For inbound events/data, you must register connections from the source extensions to the "main" extension in the `property.json` file.
  - For outbound events/data, you can call `setDests` at runtime to specify destination extensions, without needing predefined connections in `property.json`.
- All application-specific logic can reside inside the "main" extension.
- Other extensions should remain stateless and independent, making them easy to reuse.
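As a rough illustration, an inbound connection routing an event from a source extension into the "main" extension might look like the fragment below. The extension names (`stt`, `main`) and the event name (`asr_result`) are hypothetical, and the exact `property.json` schema can differ between framework versions, so treat this as a sketch and consult the TEN Agent examples for the authoritative shape:

```json
{
  "connections": [
    {
      "extension": "stt",
      "data": [
        {
          "name": "asr_result",
          "dest": [{ "extension": "main" }]
        }
      ]
    }
  ]
}
```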
The "main" extension can be written in any language supported by the TEN Framework. For example, you could implement the main extension in Node.js while keeping other extensions in Python or C++. This design neatly avoids Node.js performance bottlenecks with media processing, particularly in production environments.
Comparison of Main Extension Variants
Choose the right architecture for your use case:
| Aspect | Python Cascade | Python Realtime V2V | Node.js Cascade |
|---|---|---|---|
| Language | Python | Python | TypeScript/JavaScript |
| Use Case | Traditional voice agent with ASR → LLM → TTS pipeline | Direct multimodal (MLLM) with real-time vision/audio | High-performance voice agent with media streaming |
| Event Model | Decorator-based @agent_event_handler() | Match/case with on_data() method | Direct event handlers in class methods |
| Message Flow | S2C: Receives typed events from runtime | S2C: Receives all events via single on_data | Direct: Routes messages as they arrive |
| LLM Type | Standard LLM (text-only) | Multimodal LLM (vision + audio + text) | Standard LLM (text-only) |
| Streaming Support | Text streaming with sentence splitting | MLLM streaming with modality support | Streaming with sentence parsing |
| Interruption | Barge-in detection via _interrupt() | Event-based via ServerInterruptEvent | Barge-in with sentence fragment reset |
| Tool Support | Via @tool_handler() decorator | Via match/case FunctionCallEvent | Via direct agent tool registration |
| Configuration | Via config.yaml in Python | Via config.py with greeting | Via manifest and property.json |
| Performance | Good for standard workloads | Optimized for real-time multimodal | Best for media-heavy workloads |
| Complexity | Moderate (decorators + event bus) | Moderate (match/case dispatch) | Moderate (direct method routing) |
| Best For | Learning TEN fundamentals | Modern AI with vision + speech | Production voice agents |
Quick Decision Guide
Choose Python Cascade if you want to learn how the main extension works or build a standard voice agent in Python.
Choose Python Realtime V2V if you need multimodal capabilities (camera input, vision processing) alongside voice, or want to experiment with advanced LLMs that support multiple modalities.
Choose Node.js Cascade if you're building a production voice agent and want the performance benefits of Node.js while avoiding media processing bottlenecks through the connection-based audio flow.
Main Extension Documentation
Python Cascade Pattern
Traditional ASR → LLM → TTS pipeline with decorator-based event handling. Ideal for learning TEN fundamentals or building standard voice agents.
- Python Cascade Main — Complete guide with decorator patterns, event routing, and LLM integration
- Best for: Learning, standard voice agents, Python-only environments
Python Realtime V2V Pattern
Multimodal MLLM integration with real-time vision and audio. Modern approach for advanced AI capabilities.
- Python Realtime V2V Main — Event routing with match/case, C2S primitives, tool handling, and function calls
- Best for: Vision + speech, multimodal AI, modern LLMs with vision
Node.js Cascade Pattern
High-performance voice agent using TypeScript/JavaScript with optimized media streaming.
- Node.js Cascade Main — ASR handling, LLM response streaming, tool registration, and natural interruption
- Best for: Production voice agents, performance-critical applications, Node.js teams
Common Tasks
Understanding Event Flow
- Python Cascade: Decorator-based event handlers (`@agent_event_handler()`) - See ASR event handling section
- Python Realtime V2V: Match/case dispatch - See Event routing section
- Node.js Cascade: Direct method handlers - See ASR handling section
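The decorator pattern behind `@agent_event_handler()` can be sketched as a registry mapping event types to handler functions. This is a simplified stand-in, not the actual TEN Agent implementation:

```python
from typing import Callable, Optional

# Registry populated at import time by the decorator (sketch).
_HANDLERS: dict[str, Callable] = {}


def agent_event_handler(event_type: str):
    """Register the decorated function as the handler for one event type."""
    def decorator(fn: Callable) -> Callable:
        _HANDLERS[event_type] = fn
        return fn
    return decorator


@agent_event_handler("asr_result")
def on_asr(payload: dict) -> str:
    # A real main extension would forward the transcript to the LLM here.
    return f"transcript: {payload['text']}"


def dispatch(event_type: str, payload: dict) -> Optional[str]:
    """Look up and invoke the registered handler, if any."""
    handler = _HANDLERS.get(event_type)
    return handler(payload) if handler else None
```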
Handling Tool Calls
- Python Cascade: Tool decorator pattern
- Python Realtime V2V: Function call event handling - See C2S primitives
- Node.js Cascade: Agent tool registration
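Across all three variants, tool support boils down to a name-to-function registry that the LLM's function calls are resolved against. A hedged sketch of the idea behind something like `@tool_handler()`, with a hypothetical `get_time` tool:

```python
import json
from typing import Callable

_TOOLS: dict[str, Callable] = {}


def tool_handler(name: str):
    """Register the decorated function as a callable tool (sketch)."""
    def decorator(fn: Callable) -> Callable:
        _TOOLS[name] = fn
        return fn
    return decorator


@tool_handler("get_time")
def get_time(city: str) -> str:
    # A real tool would query a service; this stub returns a fixed answer.
    return f"12:00 in {city}"


def handle_function_call(name: str, arguments_json: str) -> str:
    """Decode the LLM's function-call arguments and invoke the matching tool."""
    args = json.loads(arguments_json)
    return _TOOLS[name](**args)
```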
Implementing Interruption
Learn how to handle user interruption (barge-in) in your pattern:
- Python Cascade: `_interrupt()` method in the extension
- Python Realtime V2V: `ServerInterruptEvent` handling - See Patterns section
- Node.js Cascade: Natural barge-in with sentence fragment reset - See LLM handling section
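The "sentence fragment reset" idea can be illustrated with a small self-contained sketch: streamed LLM tokens are buffered until a sentence boundary (so TTS receives whole sentences), and a barge-in discards the unfinished fragment so stale text is never spoken. This is a toy model of the pattern, not the Node.js extension's code:

```python
class SentenceBuffer:
    """Accumulate streamed text and emit complete sentences (sketch)."""

    TERMINATORS = ".!?"

    def __init__(self) -> None:
        self.fragment = ""

    def feed(self, chunk: str) -> list[str]:
        """Append a streamed chunk; return any sentences completed by it."""
        self.fragment += chunk
        sentences = []
        while True:
            idx = next((i for i, ch in enumerate(self.fragment)
                        if ch in self.TERMINATORS), None)
            if idx is None:
                break
            sentences.append(self.fragment[:idx + 1].strip())
            self.fragment = self.fragment[idx + 1:]
        return sentences

    def interrupt(self) -> None:
        """On barge-in, drop the unfinished fragment (the reset step)."""
        self.fragment = ""
```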