Chatterbox Demo - Modern Web Interface


A modern, sci-fi-themed web interface for the open-source Chatterbox TTS models by Resemble AI. It features a futuristic "camera wall" background with live webcam integration and a floating glassmorphism control panel.

✨ Features

🎭 Dual Model Support: Switch between Chatterbox-Turbo (English, fast) and Chatterbox-Multilingual (23 languages)
🌍 Multi-language: Support for Chinese, Japanese, Korean, Spanish, French, German, Arabic, and more
🎨 Sci-Fi UI: Dynamic camera grid background with live webcam feeds and CCTV-style effects
🎙️ Voice Cloning: Upload a 5-10 second audio sample to clone a speaker's voice
🎵 Audio Preview: Listen to reference audio before generation
🏷️ Paralinguistic Tags: Use [laugh], [cough], [sigh] for expressive speech
💎 Glassmorphism Design: Modern, sleek floating interface with neon accents
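Tag usage can be checked before text is sent to the model. The sketch below is a hypothetical helper, not part of the repository: `KNOWN_TAGS` contains only the three tags named above, and `find_tags` simply flags any other bracketed word. The model may accept tags beyond these three, so treat "unknown" as "not listed here" rather than "invalid".

```python
import re

# Hypothetical allow-list: only the tags named in this README.
# The model may support more; extend this set as needed.
KNOWN_TAGS = {"laugh", "cough", "sigh"}

# Matches a bracketed lowercase word such as "[laugh]".
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def find_tags(text: str) -> tuple[list[str], list[str]]:
    """Return (known, unknown) tag names in the order they appear."""
    known, unknown = [], []
    for match in TAG_PATTERN.finditer(text):
        name = match.group(1)
        (known if name in KNOWN_TAGS else unknown).append(name)
    return known, unknown


if __name__ == "__main__":
    sample = "Well [sigh] that went badly [laugh] [shout]"
    print(find_tags(sample))  # (['sigh', 'laugh'], ['shout'])
```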

📋 Prerequisites

Python 3.10+
Node.js 18+ and npm
8GB+ RAM (16GB+ recommended for the multilingual model)
Hugging Face account and token (for gated models)
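Before installing anything, a small preflight script can confirm some of the requirements above. This is a hypothetical helper, not a file in the repository. It checks the Python version, whether `HF_TOKEN` is set (not whether the token actually has access to the gated models), and total RAM using POSIX `os.sysconf`, which is unavailable on Windows. It does not check Node.js.

```python
import os
import sys

MIN_PYTHON = (3, 10)
MIN_RAM_GB, RECOMMENDED_RAM_GB = 8, 16  # from the prerequisites list


def python_ok(version=sys.version_info) -> bool:
    """True if the interpreter meets the Python 3.10+ requirement."""
    return tuple(version[:2]) >= MIN_PYTHON


def hf_token_set(env=os.environ) -> bool:
    """True if HF_TOKEN is present and not blank (access is not verified)."""
    return bool(env.get("HF_TOKEN", "").strip())


def total_ram_gb():
    """Physical RAM in GiB via POSIX sysconf, or None where unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("HF_TOKEN set:", hf_token_set())
    ram = total_ram_gb()
    if ram is None:
        print("RAM: could not be determined on this platform")
    elif ram < MIN_RAM_GB:
        print(f"RAM: {ram:.1f} GiB (below the {MIN_RAM_GB}GB minimum)")
    else:
        print(f"RAM: {ram:.1f} GiB ({RECOMMENDED_RAM_GB}GB+ recommended for multilingual)")
```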

🚀 Quick Start

1. Clone Repository


```bash
git clone https://github.com/quyangminddock/chatterbox_demo.git
cd chatterbox_demo
```

2. Backend Setup


```bash
# Install Python dependencies
pip install -e .

# Additional dependencies for the API
pip install fastapi uvicorn python-multipart librosa soundfile

# Set your Hugging Face token (get one from https://huggingface.co/settings/tokens)
export HF_TOKEN=your_huggingface_token_here

# Start the API server (loads both Turbo and Multilingual models)
python api.py
```

The server starts on http://localhost:8000. Model loading takes 2-3 minutes on the first run.
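Because startup can take a few minutes, scripts that depend on the API can poll until it answers. The sketch below is a hypothetical helper using only the standard library. It treats any HTTP response, including an error status such as 404, as "server is listening", since the root path `/` may not be a defined route. Whether the server accepts connections before the models have finished loading depends on how `api.py` loads them, which this sketch does not assume.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:8000", timeout: float = 300.0,
                    interval: float = 2.0) -> bool:
    """Poll `url` until the server responds or `timeout` seconds pass.

    Any HTTP response, even an error status such as 404, means something is
    listening; only connection-level failures count as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError as err:
            err.close()
            return True  # the server answered, just not with a 2xx status
        except OSError:
            pass  # URLError (e.g. connection refused) or a socket timeout
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    print("API ready:", wait_for_server())
```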

3. Frontend Setup


```bash
cd ui

# Install dependencies
npm install

# Start development server
npm run dev
```

Open http://localhost:3000 in your browser.

🎯 Usage

Select Model: Choose between Turbo (English only, faster) or Multilingual (23 languages)
Choose Language: If using Multilingual, select your target language (e.g., "中文" for Chinese)
Enter Text: Type or paste the text you want to synthesize
Upload Audio (Optional): Upload a clear 5-10 second audio sample for voice cloning
Preview Audio: Listen to your uploaded reference before generation
Generate: Click "INITIATE_CLONE" to generate speech
Play: Listen to the generated audio directly in the interface
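To catch clips outside the recommended 5-10 second range before uploading, the length of a WAV file can be read with Python's standard `wave` module. This is a hypothetical pre-upload check, not part of the UI. It only handles uncompressed PCM WAV; other formats (MP3, M4A) would need a decoder such as librosa. The `make_silent_wav` helper exists only to demonstrate usage.

```python
import io
import wave

MIN_SECONDS, MAX_SECONDS = 5.0, 10.0  # recommended clip length from the steps above


def wav_duration_seconds(source) -> float:
    """Duration of a PCM WAV file given as a path or binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(source) -> str:
    """Return 'ok', 'too short', or 'too long' for a candidate voice sample."""
    seconds = wav_duration_seconds(source)
    if seconds < MIN_SECONDS:
        return "too short"
    if seconds > MAX_SECONDS:
        return "too long"
    return "ok"


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit silent WAV (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_reference_clip(make_silent_wav(6)))  # ok
    print(check_reference_clip(make_silent_wav(3)))  # too short
```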

Supported Languages (Multilingual Model)

Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, Turkish
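If requests are scripted rather than made through the UI dropdown, the display names above need converting to whatever identifiers the backend expects. The sketch below assumes ISO 639-1 two-letter codes; that is an assumption, so check `api.py` for the exact format. `LANGUAGE_CODES` and `language_code` are hypothetical names.

```python
# Hypothetical mapping from the display names above to ISO 639-1 codes.
# Assumption: the backend identifies languages by these two-letter codes.
LANGUAGE_CODES = {
    "Arabic": "ar", "Chinese": "zh", "Danish": "da", "Dutch": "nl",
    "English": "en", "Finnish": "fi", "French": "fr", "German": "de",
    "Greek": "el", "Hebrew": "he", "Hindi": "hi", "Italian": "it",
    "Japanese": "ja", "Korean": "ko", "Malay": "ms", "Norwegian": "no",
    "Polish": "pl", "Portuguese": "pt", "Russian": "ru", "Spanish": "es",
    "Swahili": "sw", "Swedish": "sv", "Turkish": "tr",
}


def language_code(name: str) -> str:
    """Look up a language by display name (case-insensitive, trims spaces)."""
    wanted = name.strip().lower()
    for display, code in LANGUAGE_CODES.items():
        if display.lower() == wanted:
            return code
    raise ValueError(f"Unsupported language: {name!r}")


if __name__ == "__main__":
    print(language_code("Chinese"))  # zh
```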

📁 Project Structure


```text
chatterbox_demo/
├── api.py                 # Unified FastAPI server (both models)
├── api_multilingual.py    # Standalone multilingual API (optional)
├── ui/                    # Next.js frontend
│   ├── components/
│   │   ├── CameraBackground.tsx  # Surveillance-style grid background
│   │   └── FloatingHUD.tsx       # Main control interface
│   ├── app/
│   │   └── globals.css          # Sci-fi theme styling
│   └── public/            # CCTV images and assets
├── src/chatterbox/        # Modified Chatterbox source (dtype fixes)
└── README.md
```

🔧 Technical Details

Backend (FastAPI)

Loads both Turbo and Multilingual models on startup
CPU-optimized with float32 dtype fixes for compatibility
Audio sanitization using librosa for consistent format
CORS enabled for local development
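Since both models are loaded once at startup, each request only has to pick one of the preloaded instances and reject combinations that cannot work, such as a non-English language with Turbo. The sketch below illustrates that dispatch under stated assumptions: `TTSRequest`, `route`, and the `"turbo"`/`"multilingual"` keys are hypothetical, the language is assumed to be an ISO 639-1 code, and placeholder strings stand in for loaded model objects. It is not the actual code in `api.py`.

```python
from dataclasses import dataclass

# Hypothetical identifiers; api.py may name its models differently.
TURBO, MULTILINGUAL = "turbo", "multilingual"


@dataclass(frozen=True)
class TTSRequest:
    text: str
    model: str = TURBO
    language: str = "en"  # ISO 639-1 code (assumed request format)


def route(request: TTSRequest, models: dict) -> object:
    """Pick a preloaded model, enforcing Turbo's English-only limit."""
    if not request.text.strip():
        raise ValueError("text must not be empty")
    if request.model not in models:
        raise ValueError(f"unknown model: {request.model!r}")
    if request.model == TURBO and request.language != "en":
        raise ValueError("Turbo is English-only; use the multilingual model")
    return models[request.model]


if __name__ == "__main__":
    # Placeholders standing in for models loaded once at startup.
    models = {TURBO: "turbo-model", MULTILINGUAL: "mtl-model"}
    print(route(TTSRequest("你好", MULTILINGUAL, "zh"), models))  # mtl-model
```

Loading both models up front trades memory (hence the 16GB+ recommendation) for no per-request load delay when switching models in the UI.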

Frontend (Next.js + TypeScript)

React with TypeScript and Tailwind CSS
Framer Motion for smooth animations
Live webcam integration in corner cells
Model and language selection dropdowns
Audio preview and playback

Key Fixes Applied

Float32 Dtype Consistency: Modified tts_turbo.py and mtl_tts.py to keep data in float32 throughout the pipeline
S3Tokenizer Fix: Added a .float() cast in the mel spectrogram computation
CPU Map Location: Added map_location='cpu' when loading model weights on machines without CUDA
Librosa Audio Loading: Switched from torchaudio to librosa for consistent audio handling
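The float32 fixes target a common mismatch: NumPy arrays built from Python lists, or combined with float64 data, default to float64, and passing float64 input to a model whose weights are float32 can raise a dtype error in PyTorch. The NumPy sketch below shows the idea of normalizing audio to mono float32 before it reaches the model. The `sanitize_audio` name and the peak-normalization step are illustrative assumptions, not the repository's code. On the PyTorch side, the corresponding steps are `tensor.float()` and `torch.load(path, map_location="cpu")`.

```python
import numpy as np


def sanitize_audio(samples) -> np.ndarray:
    """Downmix to mono, peak-normalize, and return float32 samples.

    Accepts shape (n,) or (channels, n). Intermediate math runs in float64,
    and the result is cast back to float32 at the end.
    """
    audio = np.asarray(samples, dtype=np.float64)
    if audio.ndim == 2:
        audio = audio.mean(axis=0)  # (channels, n) -> (n,)
    peak = np.abs(audio).max() if audio.size else 0.0
    if peak > 0:
        audio = audio / peak  # scale into [-1, 1]
    # Final cast so float32 model weights receive matching input.
    return audio.astype(np.float32)


if __name__ == "__main__":
    stereo = np.array([[0.5, -1.0], [0.5, 0.0]])  # float64 by default
    out = sanitize_audio(stereo)
    print(out.dtype)  # float32
```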

📸 Screenshots

[Screenshot: UI Demo]

🐛 Troubleshooting

Models not loading

Multilingual model fails: Ensure you have enough RAM (16GB+)
Token errors: Verify your HF_TOKEN is set and has access to gated models
map_location errors: Make sure you're using the modified source files with CPU fixes