A modern, sci-fi themed web interface for the open-source Chatterbox TTS models by Resemble AI. Features a futuristic "camera wall" background with live webcam integration and a floating glassmorphism control panel.
✨ Features
🎭 Dual Model Support: Switch between Chatterbox-Turbo (English, fast) and Chatterbox-Multilingual (23+ languages)
🌍 Multi-language: Support for Chinese, Japanese, Korean, Spanish, French, German, Arabic, and more
🎨 Sci-Fi UI: Dynamic camera grid background with live webcam feeds and CCTV-style effects
🎙️ Voice Cloning: Upload 5-10 second audio samples to clone any voice
🎵 Audio Preview: Listen to reference audio before generation
🏷️ Paralinguistic Tags: Use [laugh], [cough], [sigh] for expressive speech
💎 Glassmorphism Design: Modern, sleek floating interface with neon accents
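To make the tag syntax concrete, here is a minimal sketch that scans input text for bracketed tags before it is sent for synthesis. It only assumes the three tags named above; `KNOWN_TAGS`, `find_tags`, and `unknown_tags` are hypothetical helper names and are not part of this repository or of Chatterbox. The models may accept tags that are not listed here.

```python
import re

# Tags named in the feature list above; this set is an assumption and may be incomplete.
KNOWN_TAGS = {"laugh", "cough", "sigh"}

# Matches a lowercase word in square brackets, e.g. "[laugh]".
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def find_tags(text: str) -> list[str]:
    """Return every bracketed tag in the text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def unknown_tags(text: str) -> list[str]:
    """Return tags that are not in KNOWN_TAGS (likely typos)."""
    return [tag for tag in find_tags(text) if tag not in KNOWN_TAGS]


if __name__ == "__main__":
    sample = "That was close [sigh] but we made it [laugh]. Sorry [coff]."
    print(find_tags(sample))     # ['sigh', 'laugh', 'coff']
    print(unknown_tags(sample))  # ['coff']
```

A check like this can catch a mistyped tag before it is read aloud as literal text.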
📋 Prerequisites
Python 3.10+
Node.js 18+ and npm
8GB+ RAM (16GB+ recommended for multilingual model)
Hugging Face account and token (for gated models)
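The list above can be partly checked from a script. The sketch below is an illustration only and is not shipped with this repository: `check_prerequisites` is a hypothetical helper. It verifies the Python version, looks for `node` and `npm` on the PATH, and checks that the `HF_TOKEN` environment variable used in Backend Setup is set. It does not verify the Node.js version or the available RAM.

```python
import os
import shutil
import sys


def check_prerequisites(min_python=(3, 10), env=None, which=shutil.which) -> list[str]:
    """Return a list of problems found; an empty list means every check passed.

    `env` and `which` are injectable so the function can be exercised in tests.
    """
    env = os.environ if env is None else env
    problems = []

    # Python 3.10+ is required by the backend.
    if sys.version_info < min_python:
        found = sys.version.split()[0]
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")

    # The frontend needs Node.js and npm on the PATH (version is not checked here).
    for tool in ("node", "npm"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    # The token is needed to download gated models from Hugging Face.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("MISSING:", problem)
```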
🚀 Quick Start
1. Clone Repository
git clone https://github.com/quyangminddock/chatterbox_demo.git
cd chatterbox_demo
2. Backend Setup
# Install Python dependencies
pip install -e .
# Additional dependencies for API
pip install fastapi uvicorn python-multipart librosa soundfile
# Set your Hugging Face token (get one from https://huggingface.co/settings/tokens)
export HF_TOKEN=your_huggingface_token_here
# Start the API server (loads both Turbo and Multilingual models)
python api.py
The server will start on http://localhost:8000. Model loading takes 2-3 minutes on first run.
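Because the first start can take a few minutes, it may help to wait until the port is reachable before opening the UI. The following is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, not part of this repository. Note the assumption: an open port only shows that the process is listening. Whether `api.py` opens the port before or after the models finish loading is not stated here, so treat a successful result as a hint rather than a guarantee.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 300.0, interval_s: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


if __name__ == "__main__":
    if wait_for_port("localhost", 8000):
        print("Backend port is open")
    else:
        print("Backend did not come up in time")
```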
3. Frontend Setup
cd ui
# Install dependencies
npm install
# Start development server
npm run dev
Open http://localhost:3000 in your browser.
🎯 Usage
Select Model: Choose between Turbo (English only, faster) and Multilingual (23+ languages)
Choose Language: If using Multilingual, select your target language (e.g., "中文" for Chinese)
Enter Text: Type or paste the text you want to synthesize
Upload Audio (Optional): Upload a 5-10 second clear audio sample for voice cloning
Preview Audio: Listen to your uploaded reference before generation
Generate: Click "INITIATE_CLONE" to generate speech
Play: Listen to the generated audio directly in the interface
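The 5-10 second guideline for the reference clip can be checked locally before uploading. This sketch assumes the clip is an uncompressed WAV file and uses only Python's standard `wave` module; `wav_duration_seconds` and `is_good_reference_length` are hypothetical helper names, and the interface itself may accept other formats that this check does not cover.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def is_good_reference_length(path: str, min_s: float = 5.0, max_s: float = 10.0) -> bool:
    """Return True if the clip length falls within the suggested 5-10 second range."""
    return min_s <= wav_duration_seconds(path) <= max_s


if __name__ == "__main__":
    import sys

    clip = sys.argv[1]
    print(f"{clip}: {wav_duration_seconds(clip):.1f} s, ok={is_good_reference_length(clip)}")
```

Run it with the path to the clip as the only argument.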
UI Design: Camera surveillance theme with sci-fi aesthetics
CCTV Images: Generated using AI for demonstration purposes
📄 License
This demo interface is released under the MIT License. The underlying Chatterbox models are licensed under Apache 2.0 by Resemble AI.
🤝 Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
⭐ Acknowledgments
Special thanks to Resemble AI for open-sourcing the amazing Chatterbox TTS models!