Chatterbox Demo - Modern Web Interface

A modern, sci-fi themed web interface for the open-source Chatterbox TTS models by Resemble AI. Features a futuristic "camera wall" background with live webcam integration and a floating glassmorphism control panel.

✨ Features

🎭 Dual Model Support: Switch between Chatterbox-Turbo (English, fast) and Chatterbox-Multilingual (23 languages)
🌍 Multi-language: Support for Chinese, Japanese, Korean, Spanish, French, German, Arabic, and more
🎨 Sci-Fi UI: Dynamic camera grid background with live webcam feeds and CCTV-style effects
🎙️ Voice Cloning: Upload 5-10 second audio samples to clone any voice
🎵 Audio Preview: Listen to reference audio before generation
🏷️ Paralinguistic Tags: Use [laugh], [cough], [sigh] for expressive speech
💎 Glassmorphism Design: Modern, sleek floating interface with neon accents
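
The paralinguistic tags are plain bracketed words inside the input text, so they can be checked before a request is sent. The snippet below is a minimal sketch, not code from this repository: `split_tags`, `KNOWN_TAGS`, and the regular expression are hypothetical names, and the tag set only contains the three tags listed above (the models may accept more).

```python
import re

# Tags named in the feature list; the real models may accept others (assumption).
KNOWN_TAGS = {"laugh", "cough", "sigh"}

# Matches a lowercase word in square brackets, e.g. "[laugh]".
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def split_tags(text: str) -> tuple[list[str], list[str]]:
    """Return (known, unknown) bracketed tags in order of appearance."""
    known: list[str] = []
    unknown: list[str] = []
    for name in TAG_PATTERN.findall(text):
        (known if name in KNOWN_TAGS else unknown).append(name)
    return known, unknown


if __name__ == "__main__":
    print(split_tags("Well [sigh] that was funny [laugh]"))
```

A check like this makes it easy to flag a misspelled tag before it reaches the model.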

📋 Prerequisites

Python 3.10+
Node.js 18+ and npm
8GB+ RAM (16GB+ recommended for multilingual model)
Hugging Face account and token (for gated models)

🚀 Quick Start

1. Clone Repository

git clone https://github.com/quyangminddock/chatterbox_demo.git
cd chatterbox_demo

2. Backend Setup

# Install Python dependencies
pip install -e .

# Additional dependencies for API
pip install fastapi uvicorn python-multipart librosa soundfile

# Set your Hugging Face token (get one from https://huggingface.co/settings/tokens)
export HF_TOKEN=your_huggingface_token_here

# Start the API server (loads both Turbo and Multilingual models)
python api.py

The server will start on http://localhost:8000. Model loading takes 2-3 minutes on first run.
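
Because the gated models need a token, it can help to fail fast when `HF_TOKEN` is missing. The following is a small sketch of such a check, assuming only that the token is read from the `HF_TOKEN` environment variable as shown above. The function name `read_hf_token` is hypothetical and is not part of `api.py`.

```python
import os
from typing import Mapping, Optional

# Placeholder value used in the setup instructions above.
PLACEHOLDER = "your_huggingface_token_here"


def read_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return HF_TOKEN, or raise if it is unset, empty, or still the placeholder."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token or token == PLACEHOLDER:
        raise RuntimeError("HF_TOKEN is not set; export it before running api.py")
    return token


if __name__ == "__main__":
    try:
        read_hf_token()
        print("HF_TOKEN found")
    except RuntimeError as exc:
        print(exc)
```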

3. Frontend Setup

cd ui

# Install dependencies
npm install

# Start development server
npm run dev

Open http://localhost:3000 in your browser.

🎯 Usage

1. Select Model: Choose Turbo (English only, faster) or Multilingual (23 languages)
2. Choose Language: If using Multilingual, select your target language (e.g., "中文" for Chinese)
3. Enter Text: Type or paste the text you want to synthesize
4. Upload Audio (Optional): Upload a clear 5-10 second audio sample for voice cloning
5. Preview Audio: Listen to your uploaded reference before generation
6. Generate: Click "INITIATE_CLONE" to generate speech
7. Play: Listen to the generated audio directly in the interface

Supported Languages (Multilingual Model)

Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, Turkish
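
The model and language rules above (Turbo is English only, Multilingual covers the listed languages) can be expressed as a small validation step. This is an illustrative sketch: the function `resolve_language`, the model names `"turbo"` and `"multilingual"`, and the use of ISO 639-1 codes as identifiers are assumptions, not the API defined in `api.py`.

```python
# ISO 639-1 codes for the languages listed above (using these codes is an assumption).
LANGUAGES = {
    "ar": "Arabic", "zh": "Chinese", "da": "Danish", "nl": "Dutch",
    "en": "English", "fi": "Finnish", "fr": "French", "de": "German",
    "el": "Greek", "he": "Hebrew", "hi": "Hindi", "it": "Italian",
    "ja": "Japanese", "ko": "Korean", "ms": "Malay", "no": "Norwegian",
    "pl": "Polish", "pt": "Portuguese", "ru": "Russian", "es": "Spanish",
    "sw": "Swahili", "sv": "Swedish", "tr": "Turkish",
}


def resolve_language(model: str, language: str = "en") -> str:
    """Return the language code to use, or raise ValueError for an invalid combination."""
    model = model.lower()
    if model == "turbo":
        # Turbo is described as English only.
        if language != "en":
            raise ValueError("Turbo supports English only")
        return "en"
    if model == "multilingual":
        if language not in LANGUAGES:
            raise ValueError(f"Unsupported language code: {language}")
        return language
    raise ValueError(f"Unknown model: {model}")


if __name__ == "__main__":
    print(resolve_language("multilingual", "zh"))
```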

📁 Project Structure

chatterbox_demo/
├── api.py                 # Unified FastAPI server (both models)
├── api_multilingual.py    # Standalone multilingual API (optional)
├── ui/                    # Next.js frontend
│   ├── components/
│   │   ├── CameraBackground.tsx  # Surveillance-style grid background
│   │   └── FloatingHUD.tsx       # Main control interface
│   ├── app/
│   │   └── globals.css          # Sci-fi theme styling
│   └── public/            # CCTV images and assets
├── src/chatterbox/        # Modified Chatterbox source (dtype fixes)
└── README.md

🔧 Technical Details

Backend (FastAPI)

Loads both Turbo and Multilingual models on startup
CPU-optimized with float32 dtype fixes for compatibility
Audio sanitization using librosa for consistent format
CORS enabled for local development

Frontend (Next.js + TypeScript)

React with TypeScript and Tailwind CSS
Framer Motion for smooth animations
Live webcam integration in corner cells
Model and language selection dropdowns
Audio preview and playback

Key Fixes Applied

Float32 Dtype Consistency: Modified tts_turbo.py and mtl_tts.py to ensure float32 throughout pipeline
S3Tokenizer Fix: Added .float() cast in mel spectrogram computation
CPU Map Location: Added map_location='cpu' for model loading on non-CUDA devices
Librosa Audio Loading: Switched from torchaudio to librosa for consistent audio handling

🐛 Troubleshooting

Models not loading

Multilingual model fails: Ensure you have enough RAM (16GB+)
Token errors: Verify your HF_TOKEN is set and has access to gated models
map_location errors: Make sure you are running the modified sources in src/chatterbox/, which pass map_location='cpu' when loading models on non-CUDA devices

Voice cloning not working

Use clear, 5-10 second audio samples
WAV format recommended
Avoid background noise in reference audio
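
To confirm that a reference sample falls in the recommended 5-10 second range before uploading, the length of a WAV file can be read with Python's standard `wave` module. This is a minimal sketch with hypothetical helper names. It only handles uncompressed PCM WAV files and is independent of the librosa-based loading used by the backend.

```python
import wave


def wav_duration_seconds(source) -> float:
    """Return the duration of a PCM WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference_length(source, low: float = 5.0, high: float = 10.0) -> bool:
    """Check the clip against the 5-10 second guideline for voice cloning."""
    return low <= wav_duration_seconds(source) <= high


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(path, f"{wav_duration_seconds(path):.2f}s", is_good_reference_length(path))
```

Run it with one or more WAV paths as arguments to print each clip's duration and whether it meets the guideline.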

UI issues

Check that both backend (port 8000) and frontend (port 3000) are running
Clear browser cache if encountering stale UI
Verify camera permissions if webcam feeds don't appear
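
A quick way to check whether both services are up is to attempt a TCP connection to each port. The sketch below uses only the standard library. The helper names are hypothetical, and the ports are the defaults mentioned above (8000 for the backend, 3000 for the frontend).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Report whether the default backend and frontend ports are reachable."""
    return {
        "backend (8000)": port_is_open(host, 8000),
        "frontend (3000)": port_is_open(host, 3000),
    }


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'running' if up else 'not reachable'}")
```

An open port only shows that some process is listening. It does not prove the models have finished loading.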

📝 Credits

Chatterbox Models: Resemble AI
UI Design: Camera surveillance theme with sci-fi aesthetics
CCTV Images: Generated using AI for demonstration purposes

📄 License

This demo interface is released under the MIT License. The underlying Chatterbox models are licensed under Apache 2.0 by Resemble AI.

🤝 Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

⭐ Acknowledgments

Special thanks to Resemble AI for open-sourcing the amazing Chatterbox TTS models!
