This demo system is officially provided by the MiniCPM-o 4.5 model training team. It pairs a PyTorch + CUDA inference backend with a lightweight frontend-backend design to demonstrate the audio-video omnimodal full-duplex capabilities of MiniCPM-o 4.5 in a transparent, concise, and lossless manner.
| Mode | Features | I/O Modalities | Paradigm |
|---|---|---|---|
| Turn-based Chat | Low-latency streaming interaction; requires button or VAD (Voice Activity Detection) to trigger responses; high response accuracy; strong basic capabilities | Audio + Text input, Audio + Text output | Turn-based |
| Omnimodal Full-Duplex | Real-time omnimodal full-duplex interaction; visual and voice input with simultaneous voice output; model autonomously decides when to speak; powerful cutting-edge capabilities | Vision + Audio input, Text + Voice output | Full-duplex |
| Audio Full-Duplex | Real-time audio full-duplex interaction; voice input and voice output happen simultaneously; model autonomously decides when to speak; powerful cutting-edge capabilities | Audio input, Text + Voice output | Full-duplex |
All three currently supported modes share a single model instance, with sub-millisecond hot-switching between them (< 0.1 ms). More modes will be supported soon.
Architecture:

```
Frontend (HTML/JS)
      | HTTPS / WSS
Gateway (:8006, HTTPS)
      | HTTP / WS (internal)
Worker Pool (:22400+)
  +-- Worker 0 (GPU 0)
  +-- Worker 1 (GPU 1)
  +-- ...
```
FFmpeg is required for video frame extraction and inference result visualization. For more information, visit the official FFmpeg website.
macOS (Homebrew):
brew install ffmpeg
Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg
Verify installation:
ffmpeg -version
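For reference, a typical standalone frame-extraction command of the kind used here looks like the following (the input filename and 1 fps sampling rate are illustrative, not this project's internals):

```
# Extract one frame per second from a video into numbered JPEGs
ffmpeg -i input.mp4 -vf fps=1 frame_%04d.jpg
```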
1. Install Python 3.10
We recommend using miniconda to install Python 3.10.
mkdir -p ./miniconda3_install_tmp
# Download the miniconda3 installation script
wget https://repo.anaconda.com/miniconda/Miniconda3-py310_25.11.1-1-Linux-x86_64.sh -O ./miniconda3_install_tmp/miniconda.sh
# Install miniconda3 into the project directory
bash ./miniconda3_install_tmp/miniconda.sh -b -u -p ./miniconda3
After installation, you will have an empty base environment. Activate this base environment, which uses Python 3.10 by default.
source ./miniconda3/bin/activate
python --version # Should display 3.10.x
2. Install Dependencies for MiniCPM-o 4.5
The fastest way is to use the install.sh script in the project directory. It creates a venv virtual environment named base under .venv in the project directory and installs all dependencies.
source ./miniconda3/bin/activate
bash ./install.sh
If you have a good network connection, the entire installation process takes about 5 minutes. If you are in China, consider using a third-party PyPI mirror such as the Tsinghua mirror.
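For example, to route the dependency install through the Tsinghua mirror (the URL below is the mirror's publicly documented PyPI index, not something specific to this project):

```
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```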
You can also install the dependencies manually in three steps:
# First, prepare an empty Python 3.10 environment
source ./miniconda3/bin/activate
python -m venv .venv/base
source .venv/base/bin/activate
# Install PyTorch
pip install "torch==2.8.0" "torchaudio==2.8.0"
# Install the remaining dependencies
pip install -r requirements.txt
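After installation, a quick sanity check using standard PyTorch calls (not specific to this repo) confirms the expected version and that CUDA is visible:

```
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```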
3. Create Configuration File
Copy config.example.json to config.json in the project directory.
cp config.example.json config.json
The model path (model_path) defaults to openbmb/MiniCPM-o-4_5. If you have access to Hugging Face, no modification is needed — the model will be automatically pulled from Hugging Face.
(Optional) If you prefer to download model weights to a fixed location, or cannot access Hugging Face, you can modify model_path to your local model path.
# Install huggingface cli
pip install -U huggingface_hub
# Download the model
huggingface-cli download openbmb/MiniCPM-o-4_5 --local-dir /path/to/your/MiniCPM-o-4_5
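Then point model_path in config.json at that directory, e.g.:

```
{"model": {"model_path": "/path/to/your/MiniCPM-o-4_5"}}
```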
If you cannot access Hugging Face, either of the following two methods lets you download the model in advance.

Method 1: Hugging Face mirror

pip install -U huggingface_hub
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download openbmb/MiniCPM-o-4_5 --local-dir /path/to/your/MiniCPM-o-4_5

Method 2: ModelScope

pip install modelscope
modelscope download --model OpenBMB/MiniCPM-o-4_5 --local_dir /path/to/your/MiniCPM-o-4_5
Modify "gateway_port": 8006 to change the deployment port. The default is 8006.
4. Start the Service
CUDA_VISIBLE_DEVICES=0,1,2,3 bash start_all.sh
After the service starts, visit https://localhost:8006. The self-signed certificate will trigger a browser warning — click "Advanced" → "Proceed" to continue.
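The same check works from the command line with curl; -k skips verification of the self-signed certificate:

```
curl -k https://localhost:8006
```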
The following advanced startup options are currently intended for developer reference.
CUDA_VISIBLE_DEVICES=0,1 bash start_all.sh # Specify GPUs
bash start_all.sh --compile # torch.compile acceleration (experimental, unstable)
bash start_all.sh --http # Downgrade to HTTP (not recommended, mic/camera APIs require HTTPS)
Manual Startup (step by step):
# Worker (one per GPU)
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. .venv/base/bin/python worker.py --worker-index 0 --gpu-id 0
# Gateway
PYTHONPATH=. .venv/base/bin/python gateway.py --port 10024 --workers localhost:22400
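A hypothetical two-GPU variant of the same manual startup, assuming worker ports follow worker_base_port + index (22400, 22401):

```
# Workers on GPU 0 and GPU 1 (ports assumed to be 22400 and 22401)
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. .venv/base/bin/python worker.py --worker-index 0 --gpu-id 0 &
CUDA_VISIBLE_DEVICES=1 PYTHONPATH=. .venv/base/bin/python worker.py --worker-index 1 --gpu-id 1 &
# Gateway pointed at both workers
PYTHONPATH=. .venv/base/bin/python gateway.py --port 8006 --workers localhost:22400,localhost:22401
```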
5. Stop the Service
pkill -f "gateway.py|worker.py"
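To confirm both processes are gone (a standard process check, not project-specific):

```
pgrep -f "gateway.py|worker.py" || echo "all stopped"
```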
Project Code Structure
```
minicpmo45_service/
├── config.json            # Service config (copied from config.example.json, gitignored)
├── config.example.json    # Config example (full fields + defaults)
├── config.py              # Config loading logic (Pydantic definition + JSON loading)
├── requirements.txt       # Python dependencies
├── start_all.sh           # One-click startup script
│
├── gateway.py             # Gateway (routing, queuing, WS proxy)
├── worker.py              # Worker (inference service)
├── gateway_modules/       # Gateway business modules
│   ├── core/              # Core encapsulation
│   ├── schemas/           # Pydantic schemas (request/response)
│   └── processors/        # Inference processors (UnifiedProcessor)
│
├── MiniCPMO45/            # Model core inference code
├── static/                # Frontend pages
├── resources/             # Resource files (reference audio, etc.)
├── tests/                 # Tests
└── tmp/                   # Runtime logs and PID files
```
Frontend Routes
| Page | URL |
|---|---|
| Turn-based Chat | https://localhost:8006 |
| Omnimodal Full-Duplex | https://localhost:8006/omni |
| Audio Full-Duplex | https://localhost:8006/audio_duplex |
| Dashboard | https://localhost:8006/admin |
| API Docs | https://localhost:8006/docs |
All configurations are centralized in config.json (copied from config.example.json).
config.json is gitignored and will not be committed.
Configuration Priority: CLI arguments > config.json > Pydantic defaults
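For instance (hypothetical values): if config.json sets "gateway_port": 8006, a CLI flag still wins:

```
# CLI argument takes precedence over the config.json value
PYTHONPATH=. .venv/base/bin/python gateway.py --port 10025
```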
| Group | Field | Default | Description |
|---|---|---|---|
| model | model_path | (required) | HuggingFace format model directory |
| model | pt_path | null | Additional .pt weight override |
| model | attn_implementation | "auto" | Attention implementation: "auto"/"flash_attention_2"/"sdpa"/"eager" |
| audio | ref_audio_path | assets/ref_audio/ref_minicpm_signature.wav | Default TTS reference audio |
| audio | playback_delay_ms | 200 | Frontend audio playback delay (ms); higher = smoother but more latency |
| audio | chat_vocoder | "token2wav" | Chat mode vocoder: "token2wav" (default) or "cosyvoice2" |
| service | gateway_port | 8006 | Gateway port |
| service | worker_base_port | 22400 | Worker base port |
| service | max_queue_size | 100 | Maximum queued requests |
| service | request_timeout | 300.0 | Request timeout (seconds) |
| service | compile | false | torch.compile acceleration |
| service | data_dir | "data" | Data directory |
| duplex | pause_timeout | 60.0 | Duplex pause timeout (seconds) |
Minimal Configuration (only model path required):
{"model": {"model_path": "/path/to/model"}}
CLI flags override the corresponding config.json fields:

# Worker
python worker.py --model-path /alt/model --pt-path /alt/weights.pt --ref-audio-path /alt/ref.wav --compile
# Gateway
python gateway.py --port 10025 --workers localhost:22400,localhost:22401 --http
| Resource | Token2Wav (default) |
|---|---|
| VRAM (per Worker, after initialization) | ~21.5 GB |
| Model loading time | ~16s |
| Mode switching latency | < 0.1ms |
Compile mode incurs an additional ~60s compilation time on the first inference.
# Schema unit tests (no GPU required)
PYTHONPATH=. .venv/base/bin/python -m pytest tests/test_schemas.py -v
# Processor tests (GPU required)
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. .venv/base/bin/python -m pytest tests/test_chat.py tests/test_streaming.py tests/test_duplex.py -v -s
# API integration tests (service must be running)
PYTHONPATH=. .venv/base/bin/python -m pytest tests/test_api.py -v -s