Get Started with TEN Framework in 5 Minutes
A quick start guide to building your first real-time multi-language AI application with TEN Framework. Set up a voice transcription demo using Go, Python, and TypeScript in under 5 minutes.
Building real-time AI applications often means wrestling with complex architectures, language barriers, and integration headaches. The TEN Framework eliminates these pain points by letting you orchestrate multi-language extensions through a unified runtime—and you can have your first app running in under 5 minutes.
This guide walks through the quick start process using the transcriber demo, a real-world example that showcases Go for WebSocket handling, Python for speech recognition, and TypeScript for subtitle generation—all working together seamlessly.
Why TEN Framework?
Before diving into the setup, here's what makes TEN different:
Multi-Language by Design → Use the best tool for each job: Go for networking, Python for AI, TypeScript for UI logic
Real-Time First → Built-in support for audio/video streaming and low-latency data flows
Extension-Based → Modular architecture lets you swap components without rewriting your entire stack
Production Ready → Handles cross-language communication, memory management, and concurrency out of the box
The framework abstracts away the complexity of multi-language orchestration while preserving the performance characteristics of each language.
System Requirements
Before you begin, verify your system meets these requirements:
Supported Platforms
Linux (x64)
macOS Intel (x64)
macOS Apple Silicon (arm64)
Required Software
Python 3.10 → For AI extensions and speech processing
Go 1.20+ → For WebSocket and networking extensions
Node.js & npm → For frontend and TypeScript extensions
Quick Verification
Run these commands to check your setup:
python --version # Should show 3.10.x
go version # Should show 1.20 or higher
node --version # Should show a recent version
Python Environment Setup
Use a virtual environment to avoid conflicts with your system Python. Either pyenv or venv works:
# Using pyenv
pyenv install 3.10
pyenv local 3.10
# Or using venv
python3.10 -m venv .venv
source .venv/bin/activate
Installation Process
Step 1: Install TEN Manager
The TEN Manager handles project creation, dependency management, and builds. Install it with a single command:
curl -fsSL https://get.theten.ai/install.sh | bash
After installation, verify it's in your PATH:
tman --version
If the command isn't found, add /usr/local/bin to your PATH:
export PATH="/usr/local/bin:$PATH"
echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.bashrc # or ~/.zshrc
Step 2: Create Your First Project
Now generate the transcriber demo application:
tman create transcriber_demo
cd transcriber_demo
This scaffolds a complete project structure with all necessary configuration files, extension definitions, and the multi-language runtime graph.
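The exact layout varies by template version, but the scaffold typically looks something like this (file names are illustrative, so treat this as a sketch rather than a contract):
transcriber_demo/
├── manifest.json        # app metadata and package dependencies (illustrative)
├── property.json        # runtime graph: which extensions connect to which
└── ten_packages/
    └── extension/       # Go, Python, and TypeScript extensions live here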
Step 3: Install Dependencies
The framework needs to install two types of dependencies:
# Install TEN packages and programming language dependencies
tman install
This step typically takes 1-2 minutes and:
Downloads required TEN framework packages
Installs Python dependencies (speech recognition libraries)
Sets up Go modules (WebSocket handling)
Configures TypeScript dependencies (subtitle generation)
Step 4: Build the Project
Compile all extensions across all languages:
tman build
The build process takes approximately 30 seconds and compiles:
Go extensions → Native binaries for WebSocket functions
Python extensions → Bytecode and dependency linking
TypeScript extensions → Transpiled JavaScript modules
Configuration
Before running the demo, you need to configure your speech service credentials. The transcriber demo uses Azure Speech Service by default.
Create Environment File
Create a .env file in your project root:
cp .env.example .env
Add Azure Credentials
Open .env and add your Azure Speech Service credentials:
AZURE_SPEECH_KEY=your_azure_speech_key_here
AZURE_SPEECH_REGION=your_region # e.g., eastus, westus2
Don't have Azure credentials? You can:
Sign up for a free Azure account (includes free tier for Speech Service)
Or swap in a different STT provider by modifying the extension configuration
Running Your First TEN Application
With everything configured, start the application:
tman start
The framework will:
Initialize the multi-language runtime
Load and connect all extensions (Go, Python, TypeScript)
Start the WebSocket server
Launch the web interface on port 8080
Access the Demo
Open your browser and navigate to:
http://localhost:8080
You should see the transcriber interface with two main features:
Real-Time Voice Transcription → Click to allow microphone access, then start speaking. Your speech appears as text in real-time.
Audio File Upload → Upload pre-recorded audio files and watch as subtitles are generated with timestamps.
Understanding the Demo Architecture
The transcriber demo showcases TEN's multi-language orchestration capabilities. Here's how data flows through the system:
┌─────────────────┐
│ Browser/Client  │
└────────┬────────┘
         │ WebSocket Audio Stream
         ▼
┌─────────────────┐
│  Go Extension   │  ← WebSocket handling & audio routing
└────────┬────────┘
         │ PCM Audio Frames
         ▼
┌─────────────────┐
│ Python Extension│  ← Azure Speech Recognition
└────────┬────────┘
         │ Transcription Events
         ▼
┌─────────────────┐
│ TypeScript Ext. │  ← Subtitle formatting & timestamps
└────────┬────────┘
         │ Formatted Subtitles
         ▼
┌─────────────────┐
│ Browser/Client  │
└─────────────────┘
Why This Architecture Matters
Go handles I/O → Efficient WebSocket connections and audio streaming
Python processes AI → Leverages Azure's Python SDK and ML libraries
TypeScript manages UI logic → Format subtitles, manage timestamps, handle display
Each component runs in its native runtime but communicates through TEN's unified messaging system. You get the performance of Go, the AI ecosystem of Python, and the web integration of TypeScript—without manual IPC, serialization, or protocol design.
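To make that concrete, here is a minimal sketch of what the Python STT extension's message handling could look like. It assumes an asyncio-style extension API in the spirit of TEN's Python binding; the import path, class, and method names are illustrative, so consult the framework docs for the exact interface.
# Illustrative sketch only: the real TEN Python API may differ in names.
from ten import AsyncExtension, AsyncTenEnv, AudioFrame, Data  # hypothetical import path


class SttExtension(AsyncExtension):
    """Consumes PCM audio frames from the Go extension and emits transcripts."""

    async def on_audio_frame(self, ten_env: AsyncTenEnv, frame: AudioFrame) -> None:
        pcm = frame.get_buf()             # raw PCM bytes routed in by the graph
        text = await self.recognize(pcm)  # hypothetical helper, stubbed below
        if not text:
            return
        # Publish the transcript; TEN routes it to whatever extension the
        # graph connects to this output (the TypeScript formatter, here).
        out = Data.create("transcription")
        out.set_property_string("text", text)
        await ten_env.send_data(out)

    async def recognize(self, pcm: bytes) -> str:
        # Placeholder: the real demo wraps Azure's streaming recognizer here.
        raise NotImplementedError
The Go and TypeScript extensions follow the same pattern in their own languages: implement lifecycle callbacks, then send and receive typed messages.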
Common Issues and Solutions
macOS Library Loading Failures
Symptom: Error loading dynamic libraries on macOS
Solution: Remove the macOS quarantine attribute from the binary:
xattr -d com.apple.quarantine /path/to/tman
Network Connectivity Problems
Symptom: Can't reach Azure Speech Service
Solutions:
Check your firewall settings
Verify internet connectivity
Confirm Azure credentials in .env
Test with a different network
Port Already in Use
Symptom: "Address already in use: port 8080"
Solution: Either stop the conflicting process or change the port:
# Find what's using port 8080
lsof -i :8080
# Or change the port in config/server.json
{
"port": 8081
}
Build Errors
Symptom: Compilation fails with missing dependencies
Solutions:
Re-run tman install to ensure all dependencies are present
Check that Go, Python, and Node.js versions meet requirements
Clear build cache:
tman clean && tman build
Dependency Installation Challenges
Symptom: tman install fails with package resolution errors
Solutions:
Use a virtual environment for Python isolation
Check network access to package registries
Try clearing package cache:
rm -rf ~/.tman/cache
What's Actually Happening Under the Hood
When you run the demo, TEN Framework:
Loads the Runtime Graph → Parses your configuration to understand which extensions connect to which
Spawns Language Runtimes → Starts separate processes for Go, Python, and TypeScript
Establishes Message Channels → Creates high-performance IPC channels between languages
Routes Data → Forwards audio frames from Go → Python, transcriptions from Python → TypeScript
Handles Lifecycle → Manages startup, shutdown, and crash recovery across all components
All of this happens transparently. Your extensions just send and receive messages through the TEN API—no manual process management, no custom serialization, no protocol design.
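For a sense of what that runtime graph looks like on disk, here is a simplified sketch of a graph definition. The shape follows the style of a TEN property.json, but the example_* addon names are placeholders rather than the demo's actual configuration:
{
  "predefined_graphs": [
    {
      "name": "transcriber",
      "auto_start": true,
      "nodes": [
        { "type": "extension", "name": "ws", "addon": "example_go_websocket" },
        { "type": "extension", "name": "stt", "addon": "example_azure_stt" },
        { "type": "extension", "name": "subtitles", "addon": "example_ts_subtitles" }
      ],
      "connections": [
        { "extension": "ws",
          "audio_frame": [ { "name": "pcm_frame", "dest": [ { "extension": "stt" } ] } ] },
        { "extension": "stt",
          "data": [ { "name": "transcription", "dest": [ { "extension": "subtitles" } ] } ] }
      ]
    }
  ]
}
Swapping the STT provider, as mentioned earlier, amounts to pointing the stt node at a different addon.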
Next Steps: Building Your Own Extensions
Once you've run the demo successfully, you're ready to build custom extensions. The framework makes it easy to:
Swap Out Components
Replace Azure Speech with a different STT provider:
tman extension add deepgram_stt
# Update graph configuration to use new extension
Add New Languages
TEN supports C++, Rust, and more. Add a Rust extension for audio processing:
tman extension create my_audio_processor --language rust
Extend Functionality
Add sentiment analysis to transcriptions:
tman extension create sentiment_analyzer --language python
# Connect it after the STT extension in your graph
Create Custom Workflows
Build a complete voice assistant by chaining:
STT → Transcription
LLM → Response generation
TTS → Voice synthesis
WebSocket → Real-time delivery
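Using the same scaffolding command from above, the skeleton for that chain might look like this (the extension names are placeholders):
tman extension create stt_transcriber --language python
tman extension create llm_responder --language python
tman extension create tts_synthesizer --language python
# Then wire STT → LLM → TTS → WebSocket in your graph configuration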
Performance Characteristics
The transcriber demo showcases TEN's real-time capabilities:
Audio Latency → Sub-100ms from microphone to STT extension (Go WebSocket handling)
Transcription Speed → Real-time processing (Azure's streaming API via Python)
End-to-End → Text appears in browser typically within 200-300ms of speech
Cross-Language Overhead → Minimal—TEN uses zero-copy message passing where possible
This performance holds as you add more extensions; routing overhead grows roughly linearly with the number of connections in your graph.
Beyond the Quick Start
This 5-minute guide gets you running, but TEN Framework offers much more:
Visual Graph Designer → Build extension graphs with TMAN Designer's drag-and-drop interface
Production Deployment → Docker support, Kubernetes orchestration, and cloud-native tooling
Advanced Patterns → Implement pub/sub, request/response, and streaming patterns
Extension Marketplace → Reuse community extensions for common tasks
The transcriber demo is intentionally minimal to show the core concepts. In production, you'd add error handling, state management, authentication, and monitoring—all supported by the framework's extension API.
Why This Matters
Traditional approaches to multi-language integration force you to choose:
Single Language → Use one language and accept its limitations for certain tasks
Microservices → Build separate services with REST/gRPC, accept network latency
FFI/Bindings → Write complex C bindings, manage memory across language boundaries
TEN Framework gives you a fourth option: language-native extensions orchestrated by a real-time runtime. You write idiomatic code in each language, and the framework handles the rest.
The result? Go's concurrency, Python's AI ecosystem, TypeScript's web integration—all in a single application, with real-time performance.
Conclusion
In five minutes, you've:
Installed the TEN Framework toolchain
Created a multi-language AI application
Configured external services
Run a real-time transcription system
Understood the extension architecture
The transcriber demo is just the beginning. Use it as a template for:
Live captioning systems
Voice assistants
Meeting transcription tools
Real-time translation services
Accessibility features
The framework abstracts the hard parts—process management, serialization, language interop—so you can focus on building features that matter.
Ready to build something real?
👉 Explore the TEN Framework Documentation
💬 Join the Discord Community to connect with other developers
📦 Browse Extension Marketplace for reusable components