NodeJS - Cascade Main
Main Agent Extension for Node.js
The file index.ts defines the MainControlExtension — the entry point for Node.js agents in the TEN Framework.
This is the class that wires together ASR results, LLM responses, tool registration, and interruption handling. If you're building your own solution on top of TEN (with Node.js), start here.
Quick File Layout
```
.
├── index.ts    → Main extension: message routing + interruption
├── helper.ts   → Utilities for sending Cmd/Data, sentence parsing
└── agent/
    ├── agent.ts     → Event bus + orchestration
    ├── events.ts    → Event classes (ASR, LLM, Tools, User)
    ├── llm_exec.ts  → LLM execution queue and handlers
    └── struct.ts    → Zod schemas for TTS/ASR/LLM messages
```
Architecture Overview
Core Architecture
MainControlExtension Class
Extends Extension from ten-runtime-nodejs. Normalizes runtime messages into typed events (ASRResultEvent, LLMResponseEvent, etc.). Routes events into the Agent class. Manages interruption when users talk over the assistant.
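The normalization step can be pictured as a small dispatch: raw runtime messages are mapped to typed events before they reach the Agent. This is only an illustrative sketch — the names `normalize` and the payload fields shown here are assumptions, not the framework's actual API; the real event classes live in `events.ts`.

```typescript
// Illustrative sketch of message normalization (names are assumptions,
// not the framework's actual API). Raw runtime messages become typed
// events before being pushed into the Agent's event bus.
type AgentEvent =
  | { kind: "asr_result"; text: string; final: boolean }
  | { kind: "llm_response"; delta: string; is_final: boolean };

function normalize(name: string, payload: Record<string, unknown>): AgentEvent | null {
  switch (name) {
    case "asr_result":
      return {
        kind: "asr_result",
        text: String(payload.text ?? ""),
        final: Boolean(payload.final),
      };
    case "llm_response":
      return {
        kind: "llm_response",
        delta: String(payload.delta ?? ""),
        is_final: Boolean(payload.is_final),
      };
    default:
      return null; // unknown messages are ignored in this sketch
  }
}
```

A discriminated union like this lets downstream handlers switch on `kind` with full type narrowing, which is the main benefit of normalizing early.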
Key Properties
```typescript
class MainControlExtension extends Extension {
  tenEnv!: TenEnv;
  agent!: Agent;
  config!: MainControlConfig;
  joinedUserCount: number = 0;
  session_id: string = "0";
  turn_id: number = 0;
  sentenceFragment: string = "";
}
```
session_id and turn_id track conversation state. sentenceFragment accumulates partial output for sentence splitting.
ASR Event Handling
When ASR detects speech, it's turned into an ASRResultEvent.
```typescript
if (event.final || event.text.length > 2) {
  await this._interrupt(); // flush LLM/TTS if user is overlapping
}
if (event.final) {
  this.turn_id += 1;
  await this.agent.queueLLMInput(event.text);
}
await this._sendTranscript("user", event.text, event.final, Number(this.session_id));
```
Behavior
Partial speech is logged as transcript and may trigger interrupt if overlapping. Final speech is queued for LLM input and increments turn counter. Overlap detection via _interrupt() cancels current LLM/TTS tasks.
Adjust interrupt sensitivity in _onASRResult based on your use case (stricter vs. looser barge-in).
LLM Response Handling
LLM responses are streamed sentence by sentence, then forwarded to TTS and transcript collectors.
```typescript
if (!event.is_final && event.type === "message") {
  const [sentences, fragment] = parseSentences(this.sentenceFragment, event.delta);
  this.sentenceFragment = fragment;
  for (const s of sentences) {
    await this._sendToTTS(s, false);
  }
}
await this._sendTranscript(
  "assistant", event.text, event.is_final, 100,
  event.type === "reasoning" ? "reasoning" : "text"
);
```
Behavior
Streaming sends short sentences to TTS for natural, fluent speech. parseSentences() splits the output at natural pauses. Partial transcripts are logged as they arrive; final transcripts are marked complete with a reasoning/text type.
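The real parseSentences in helper.ts may differ in detail; the following is only a minimal sketch of a splitter with the same call shape — carry-over fragment and new delta in, complete sentences plus the unfinished remainder out:

```typescript
// Minimal sketch of a streaming sentence splitter (the actual
// parseSentences in helper.ts may behave differently). It appends the
// new delta to the carried-over fragment, emits each complete sentence
// (ending in ASCII or CJK terminal punctuation), and returns the
// unfinished tail to carry into the next call.
function splitSentences(fragment: string, delta: string): [string[], string] {
  const buffer = fragment + delta;
  const sentences: string[] = [];
  let start = 0;
  for (let i = 0; i < buffer.length; i++) {
    if (".!?。！？".includes(buffer[i])) {
      const sentence = buffer.slice(start, i + 1).trim();
      if (sentence.length > 0) sentences.push(sentence);
      start = i + 1;
    }
  }
  return [sentences, buffer.slice(start)];
}
```

Each LLM delta is fed through this with the previous remainder, so TTS receives whole sentences even though the model streams arbitrary chunks.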
Tool Registration
The LLM can register tools dynamically.
```typescript
async _onToolRegister(event: ToolRegisterEvent) {
  await this.agent.registerLLMTool(event.tool, event.source);
}
```
Define your tool schema in events.ts and handle execution in your Agent logic.
Interruption Handling
Natural barge-in is handled by _interrupt().
```typescript
async _interrupt() {
  this.sentenceFragment = "";
  await this.agent.flushLLM();
  await sendData(this.tenEnv, "tts_flush", "tts", { flush_id: uuidv4() });
  await sendCmd(this.tenEnv, "flush", "agora_rtc");
}
```
This flushes pending LLM generations, cancels in-flight TTS playback, and signals RTC to stop streaming, so the user can always cut in naturally.
Implementation Patterns
Custom ASR Flow
Edit _onASRResult to filter or preprocess speech:
```typescript
async _onASRResult(event: ASRResultEvent) {
  // Custom preprocessing
  let text = event.text.toLowerCase().trim();
  // Filter unwanted phrases
  if (text.includes("ignore")) return;
  // Then continue with normal flow
  await this._handleASRNormally(text, event.final);
}
```
Custom LLM Output
Modify _onLLMResponse to change text before TTS or UI:
```typescript
async _onLLMResponse(event: LLMResponseEvent) {
  // Transform response
  event.delta = event.delta.replace("technical_term", "simplified_term");
  // Continue normally
  await this._handleLLMNormally(event);
}
```
Add a Tool
- Extend ToolRegisterEvent in events.ts
- Register in Agent via registerLLMTool()
- Handle execution in the tool handler
Tweak Responsiveness
Adjust _interrupt() for stricter or looser barge-in:
```typescript
// Stricter (require more speech)
if (event.text.length > 5) {
  await this._interrupt();
}

// Looser (interrupt on any sound)
if (event.text.length > 0) {
  await this._interrupt();
}
```
Node.js extension changes require a build step. Run task build to rebuild all Node.js extensions.
Supporting Components
agent.ts manages event queues for ASR and LLM, plus handler registration. llm_exec.ts provides an async queue that sends requests to the LLM and handles streaming responses, aborts, and tool calls. events.ts defines event classes such as ASRResultEvent, LLMResponseEvent, and ToolRegisterEvent. helper.ts provides sendCmd, sendData, and parseSentences for splitting output into natural phrases.
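The queue pattern used by llm_exec.ts can be illustrated with a minimal serial task queue: tasks run one at a time in FIFO order, so a new LLM request never starts while the previous one is still streaming. This sketch is an assumption about the pattern, not llm_exec.ts itself, and omits abort and tool-call handling:

```typescript
// Minimal serial task queue in the spirit of llm_exec.ts (a sketch, not
// the real implementation): each enqueued task starts only after every
// previously enqueued task has settled.
class SerialQueue {
  private chain: Promise<void> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const result = this.chain.then(task);
    // Keep the chain alive even if a task rejects, so later tasks still run.
    this.chain = result.then(() => undefined, () => undefined);
    return result;
  }
}
```

Serializing requests this way keeps transcript and TTS output ordered even when users submit turns faster than the model responds.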
Architecture Summary
index.ts (MainControlExtension) is the entry point and router. It normalizes runtime messages into typed events and controls interruption. Supporting files (agent/, helper.ts) provide queues, event models, and execution helpers. The Cascade pattern flows: ASR → LLM → TTS with natural interruption handling.
By following this structure, you can quickly extend or replace parts of the flow to build your own Node.js-based conversational agent on TEN.