NodeJS - Cascade Main
Main Agent Extension for Node.js
The file index.ts defines the MainControlExtension — the entry point for Node.js agents in the TEN Framework.
This is the class that wires together ASR results, LLM responses, tool registration, and interruption handling. If you're building your own solution on top of TEN (with Node.js), start here.
Quick File Layout
```
.
├── index.ts    → Main extension: message routing + interruption
├── helper.ts   → Utilities for sending Cmd/Data, sentence parsing
└── agent/
    ├── agent.ts     → Event bus + orchestration
    ├── events.ts    → Event classes (ASR, LLM, Tools, User)
    ├── llm_exec.ts  → LLM execution queue and handlers
    └── struct.ts    → Zod schemas for TTS/ASR/LLM messages
```
Architecture Overview
Core Architecture
MainControlExtension Class
Extends Extension from ten-runtime-nodejs. Normalizes runtime messages into typed events (ASRResultEvent, LLMResponseEvent, etc.). Routes events into the Agent class. Manages interruption when users talk over the assistant.
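The normalization step can be pictured as a small dispatch: raw runtime messages are mapped to typed events before they reach the Agent. This is only an illustrative sketch — the names `normalize` and the payload fields shown here are assumptions, not the framework's actual API; the real event classes live in `events.ts`.

```typescript
// Illustrative sketch of message normalization (names are assumptions,
// not the framework's actual API). Raw runtime messages become typed
// events before being pushed into the Agent's event bus.
type AgentEvent =
  | { kind: "asr_result"; text: string; final: boolean }
  | { kind: "llm_response"; delta: string; is_final: boolean };

function normalize(name: string, payload: Record<string, unknown>): AgentEvent | null {
  switch (name) {
    case "asr_result":
      return {
        kind: "asr_result",
        text: String(payload.text ?? ""),
        final: Boolean(payload.final),
      };
    case "llm_response":
      return {
        kind: "llm_response",
        delta: String(payload.delta ?? ""),
        is_final: Boolean(payload.is_final),
      };
    default:
      return null; // unknown messages are ignored in this sketch
  }
}
```

A discriminated union like this lets downstream handlers switch on `kind` with full type narrowing, which is the main benefit of normalizing early.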
Key Properties
```typescript
class MainControlExtension extends Extension {
  tenEnv!: TenEnv;
  agent!: Agent;
  config!: MainControlConfig;
  joinedUserCount: number = 0;
  session_id: string = "0";
  turn_id: number = 0;
  sentenceFragment: string = "";
}
```
session_id and turn_id track conversation state. sentenceFragment accumulates partial output for sentence splitting.
ASR Event Handling
When ASR detects speech, it's turned into an ASRResultEvent.
```typescript
if (event.final || event.text.length > 2) {
  await this._interrupt(); // flush LLM/TTS if user is overlapping
}
if (event.final) {
  this.turn_id += 1;
  await this.agent.queueLLMInput(event.text);
}
await this._sendTranscript("user", event.text, event.final, Number(this.session_id));
```
Behavior
Partial speech is logged as transcript and may trigger interrupt if overlapping. Final speech is queued for LLM input and increments turn counter. Overlap detection via _interrupt() cancels current LLM/TTS tasks.
Adjust interrupt sensitivity in _onASRResult based on your use case (stricter vs. looser barge-in).
LLM Response Handling
LLM responses are streamed sentence by sentence, then forwarded to TTS and transcript collectors.
```typescript
if (!event.is_final && event.type === "message") {
  const [sentences, fragment] = parseSentences(this.sentenceFragment, event.delta);
  this.sentenceFragment = fragment;
  for (const s of sentences) {
    await this._sendToTTS(s, false);
  }
}
await this._sendTranscript(
  "assistant", event.text, event.is_final, 100,
  event.type === "reasoning" ? "reasoning" : "text"
);
```
Behavior
Streaming sends short sentences to TTS for natural, fluent speech. parseSentences() splits the output at natural pauses. Partial transcripts are logged as they arrive; final transcripts are marked complete with a reasoning/text type.
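The real parseSentences in helper.ts may differ in detail; the following is only a minimal sketch of a splitter with the same call shape — carry-over fragment and new delta in, complete sentences plus the unfinished remainder out:

```typescript
// Minimal sketch of a streaming sentence splitter (the actual
// parseSentences in helper.ts may behave differently). It appends the
// new delta to the carried-over fragment, emits each complete sentence
// (ending in ASCII or CJK terminal punctuation), and returns the
// unfinished tail to carry into the next call.
function splitSentences(fragment: string, delta: string): [string[], string] {
  const buffer = fragment + delta;
  const sentences: string[] = [];
  let start = 0;
  for (let i = 0; i < buffer.length; i++) {
    if (".!?。！？".includes(buffer[i])) {
      const sentence = buffer.slice(start, i + 1).trim();
      if (sentence.length > 0) sentences.push(sentence);
      start = i + 1;
    }
  }
  return [sentences, buffer.slice(start)];
}
```

Each LLM delta is fed through this with the previous remainder, so TTS receives whole sentences even though the model streams arbitrary chunks.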
Tool Registration
The LLM can register tools dynamically.
```typescript
async _onToolRegister(event: ToolRegisterEvent) {
  await this.agent.registerLLMTool(event.tool, event.source);
}
```
Define your tool schema in events.ts and handle execution in your Agent logic.
Interruption Handling
Natural barge-in is handled by _interrupt().
```typescript
async _interrupt() {
  this.sentenceFragment = "";
  await this.agent.flushLLM();
  await sendData(this.tenEnv, "tts_flush", "tts", { flush_id: uuidv4() });
  await sendCmd(this.tenEnv, "flush", "agora_rtc");
}
```
This flushes pending LLM generations, cancels in-flight TTS playback, and signals RTC to stop streaming, so the user can always cut in naturally.
Implementation Patterns
Custom ASR Flow
Edit _onASRResult to filter or preprocess speech:
```typescript
async _onASRResult(event: ASRResultEvent) {
  // Custom preprocessing
  let text = event.text.toLowerCase().trim();
  // Filter unwanted phrases
  if (text.includes("ignore")) return;
  // Then continue with normal flow
  await this._handleASRNormally(text, event.final);
}
```
Custom LLM Output
Modify _onLLMResponse to change text before TTS or UI:
```typescript
async _onLLMResponse(event: LLMResponseEvent) {
  // Transform response
  event.delta = event.delta.replace("technical_term", "simplified_term");
  // Continue normally
  await this._handleLLMNormally(event);
}
```
Add a Tool
- Extend ToolRegisterEvent in events.ts
- Register in Agent via registerLLMTool()
- Handle execution in the tool handler
Tweak Responsiveness
Adjust _interrupt() for stricter or looser barge-in:
```typescript
// Stricter (require more speech)
if (event.text.length > 5) {
  await this._interrupt();
}

// Looser (interrupt on any sound)
if (event.text.length > 0) {
  await this._interrupt();
}
```
Node.js extension changes require a build step. Run task build to rebuild all Node.js extensions.
Supporting Components
agent.ts manages event queues for ASR and LLM, plus handler registration. llm_exec.ts provides an async queue that sends requests to the LLM and handles streaming responses, aborts, and tool calls. events.ts defines event classes such as ASRResultEvent, LLMResponseEvent, and ToolRegisterEvent. helper.ts provides sendCmd, sendData, and parseSentences for splitting output into natural phrases.
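The queue pattern used by llm_exec.ts can be illustrated with a minimal serial task queue: tasks run one at a time in FIFO order, so a new LLM request never starts while the previous one is still streaming. This sketch is an assumption about the pattern, not llm_exec.ts itself, and omits abort and tool-call handling:

```typescript
// Minimal serial task queue in the spirit of llm_exec.ts (a sketch, not
// the real implementation): each enqueued task starts only after every
// previously enqueued task has settled.
class SerialQueue {
  private chain: Promise<void> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const result = this.chain.then(task);
    // Keep the chain alive even if a task rejects, so later tasks still run.
    this.chain = result.then(() => undefined, () => undefined);
    return result;
  }
}
```

Serializing requests this way keeps transcript and TTS output ordered even when users submit turns faster than the model responds.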
Architecture Summary
index.ts (MainControlExtension) is the entry point and router. It normalizes runtime messages into typed events and controls interruption. Supporting files (agent/, helper.ts) provide queues, event models, and execution helpers. The Cascade pattern flows: ASR → LLM → TTS with natural interruption handling.
By following this structure, you can quickly extend or replace parts of the flow to build your own Node.js-based conversational agent on TEN.