CyberVerse is an open-source digital human agent platform with real-time video calling. Create an AI agent you can see and talk to, face to face, just like a video call.
Ever dreamed of having your own J.A.R.V.I.S. — an AI that truly sees you, hears you, and talks back in real time?
Want to see someone you've lost again, hear their voice, watch them smile at you?
Or maybe there's a character you've always wished you could bring to life?
Just one photo. CyberVerse brings them to life.
Not pre-recorded. Not turn-based. Unlimited-duration, live, low-latency video calls with a digital human — first frame in ~1.5s. Built on WebRTC with P2P streaming and embedded TURN/NAT traversal.
Every digital human is more than an avatar you can talk to: it is an AI agent that actually does things.
Upload a single photo to create your digital human. State-of-the-art avatar models deliver real-time facial animation, natural lip-sync, and subtle idle breathing — no 3D modeling or motion capture.
Brain, face, voice, ears — every component is a swappable plugin. Mix and match LLMs, TTS engines, ASR models, and avatar backends via YAML config.
Real-time video conversation requires GPU acceleration. Below are benchmarks for FlashHead and LiveAct avatar models:
| Model | Quality | GPU | GPU Count | Resolution | FPS | Real-time? |
|---|---|---|---|---|---|---|
| FlashHead 1.3B | Pro | RTX 5090 | 2 | 512×512 | 25+ | ✅ Yes |
| FlashHead 1.3B | Pro | RTX PRO 6000 | 1 | 512×512 | 20 | ✅ Yes |
| FlashHead 1.3B | Pro | RTX 4090 | 1 | 512×512 | ~10.8 | ❌ No |
| FlashHead 1.3B | Lite | RTX 4090 | 1 | 512×512 | 25+ | ✅ Yes |
| LiveAct 18B | — | RTX PRO 6000 | 2 | 320×480 | 20 | ✅ Yes |
| LiveAct 18B | — | RTX PRO 6000 | 1 | 256×417 | 20 | ✅ Yes |
Pro favors visual quality; Lite favors speed. The table reflects typical quality–compute balances — more GPU headroom lets you push higher quality; tighter hardware calls for lower settings (resolution, Pro vs Lite, etc.) to stay real-time.
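One way to read the "Real-time?" column: a setup counts as real-time when the model generates frames at least as fast as they are played back. The sketch below assumes a 20 FPS playback target, matching the `tgt_fps` / `fps` values in the sample config later in this document. The function name and the threshold are illustrative and are not part of CyberVerse.

```python
def is_real_time(measured_fps: float, target_fps: float = 20.0) -> bool:
    """Return True when frames are generated at least as fast as they are played.

    target_fps=20.0 is an assumption taken from the sample config, not a
    documented CyberVerse threshold.
    """
    return measured_fps >= target_fps


# Two rows from the table above: (model, GPU, measured FPS).
rows = [
    ("FlashHead 1.3B Pro", "RTX PRO 6000", 20.0),
    ("FlashHead 1.3B Pro", "RTX 4090", 10.8),
]
for model, gpu, fps in rows:
    verdict = "real-time" if is_real_time(fps) else "not real-time"
    print(f"{model} on {gpu}: {fps} FPS -> {verdict}")
```

If your own measurement falls below the target, the options described above apply: switch from Pro to Lite, lower the resolution, or add GPU headroom.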
Prerequisites: Node.js, Go, Conda, and FFmpeg (built with libvpx for video encoding). To verify, use:
node --version
go version
ffmpeg -version | grep libvpx
conda --version
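The same checks can be scripted. The following is a minimal sketch, assuming the four tools are expected on `PATH` under the names used in the commands above. The helper names are hypothetical and not part of CyberVerse.

```python
import shutil
import subprocess

# Tool names taken from the verification commands above.
REQUIRED_TOOLS = ["node", "go", "ffmpeg", "conda"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def ffmpeg_has_libvpx(run=subprocess.run):
    """Return True if the `ffmpeg -version` output mentions libvpx."""
    result = run(["ffmpeg", "-version"], capture_output=True, text=True, check=False)
    return "libvpx" in result.stdout


if __name__ == "__main__":
    missing = missing_tools()
    print("missing tools:", missing or "none")
    # Only query ffmpeg when it is actually installed.
    if "ffmpeg" not in missing:
        print("ffmpeg built with libvpx:", ffmpeg_has_libvpx())
```

The `which` and `run` parameters exist only so the checks can be exercised without the real tools installed.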
git clone https://github.com/dsd2077/CyberVerse.git
cd CyberVerse
conda create -n cyberverse python=3.10
conda activate cyberverse
cp infra/.env.example .env
Edit .env and fill in your API keys:
DOUBAO_ACCESS_TOKEN=your_doubao_access_token # ByteDance Doubao voice LLM
DOUBAO_APP_ID=your_doubao_app_id
Doubao Voice: obtain the App ID and API Key by following the Volcengine quick start, then set them as DOUBAO_APP_ID and DOUBAO_ACCESS_TOKEN respectively.
After the stack is running, you can also change these values (and other API keys / service endpoints) from the web UI at /settings rather than only by editing .env.
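Before starting the stack, it can help to confirm that the placeholders in .env were actually replaced. The sketch below is a small standalone parser, not CyberVerse's own loader. It assumes simple `KEY=value` lines with optional trailing `# comments`, and it treats any value that still starts with `your_` as unset, since that is the placeholder pattern in the example above.

```python
from pathlib import Path

# Keys shown in the .env example above.
REQUIRED_KEYS = ("DOUBAO_ACCESS_TOKEN", "DOUBAO_APP_ID")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks and full-line comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "value # note".
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


def unset_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return required keys that are missing, empty, or still placeholders."""
    return [
        key for key in required
        if not values.get(key) or values[key].startswith("your_")
    ]


def load_env(path: str = ".env") -> dict[str, str]:
    """Read and parse an env file from disk."""
    return parse_env(Path(path).read_text(encoding="utf-8"))
```

For example, `unset_keys(load_env(".env"))` returns an empty list once both keys hold real values.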
CyberVerse currently supports FlashHead and LiveAct; download only what you need. More backends are planned.
pip install "huggingface_hub[cli]"
| Model Component | Description | Link |
|---|---|---|
| SoulX-FlashHead-1_3B | 1.3B FlashHead weights | Hugging Face, ModelScope |
| wav2vec2-base-960h | Audio feature extractor | Hugging Face, ModelScope |
# If you are in mainland China, you can use a mirror first:
# export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download Soul-AILab/SoulX-FlashHead-1_3B \
--local-dir ./checkpoints/SoulX-FlashHead-1_3B
huggingface-cli download facebook/wav2vec2-base-960h \
--local-dir ./checkpoints/wav2vec2-base-960h
| Model Name | Download |
|---|---|
| SoulX-LiveAct | Hugging Face, ModelScope |
| chinese-wav2vec2-base | Hugging Face, ModelScope |
huggingface-cli download Soul-AILab/LiveAct \
--local-dir ./checkpoints/LiveAct
huggingface-cli download TencentGameMate/chinese-wav2vec2-base \
--local-dir ./checkpoints/chinese-wav2vec2-base
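After downloading, a quick check that the checkpoint directories exist and are not empty can catch an interrupted download early. This is a minimal sketch that uses the `--local-dir` paths from the commands above. It only checks for presence and does not validate file contents, and the helper name is hypothetical.

```python
from pathlib import Path

# Directories created by the download commands above.
FLASH_HEAD_DIRS = ["checkpoints/SoulX-FlashHead-1_3B", "checkpoints/wav2vec2-base-960h"]
LIVE_ACT_DIRS = ["checkpoints/LiveAct", "checkpoints/chinese-wav2vec2-base"]


def missing_checkpoints(dirs, root="."):
    """Return the directories that are absent or contain no entries."""
    missing = []
    for rel in dirs:
        path = Path(root) / rel
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Check only the model you actually downloaded.
    print("FlashHead missing:", missing_checkpoints(FLASH_HEAD_DIRS) or "none")
    print("LiveAct missing:", missing_checkpoints(LIVE_ACT_DIRS) or "none")
```

Since you only need the weights for the avatar model you plan to run, a non-empty result for the other model is expected.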
Edit cyberverse_config.yaml and update the model paths to match your local checkpoints:
inference:
  avatar:
    default: "flash_head" # selects which avatar model to start; if set to live_act, fill the live_act section below
    runtime:
      cuda_visible_devices: 0 # shared GPU ID(s), e.g. 0,1 for multi-GPU
      world_size: 1 # shared GPU count, set to 2 for dual-GPU
    flash_head:
      checkpoint_dir: "./checkpoints/SoulX-FlashHead-1_3B" # ← your path
      wav2vec_dir: "./checkpoints/wav2vec2-base-960h" # ← your path
      model_type: "lite" # "pro" for higher quality (needs more GPU)
      compile_model: true
      compile_vae: true
      dist_worker_main_thread: true
      infer_params:
        frame_num: 33
        motion_frames_latent_num: 2
        tgt_fps: 20
        sample_rate: 16000
        sample_shift: 5
        color_correction_strength: 1.0
        cached_audio_duration: 8
        num_heads: 12
        height: 512
        width: 512
    live_act:
      ckpt_dir: "./checkpoints/LiveAct" # ← your path
      wav2vec_dir: "./checkpoints/chinese-wav2vec2-base" # ← your path
      seed: 42
      compile_wan_model: false
      compile_vae_decode: false
      dist_worker_main_thread: true
      default_prompt: "一个人在说话" # "A person is talking"
      infer_params:
        size: "320*480"
        fps: 20
        audio_cfg: 1.0
You can skip editing paths here for now and adjust these options later in the web UI.
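The two `runtime` values are easy to set inconsistently when moving between single-GPU and dual-GPU setups. The sketch below checks them against each other. It assumes each worker needs its own visible GPU, so `world_size` should not exceed the number of IDs in `cuda_visible_devices`. That rule is an assumption inferred from the config comments, not documented CyberVerse behavior, and the function names are hypothetical.

```python
def parse_device_ids(cuda_visible_devices) -> list[int]:
    """Accept 0, "0", or "0,1" and return the GPU IDs as integers."""
    parts = str(cuda_visible_devices).split(",")
    return [int(part) for part in parts if part.strip()]


def check_runtime(runtime: dict) -> list[str]:
    """Return human-readable problems found in the runtime settings."""
    problems = []
    ids = parse_device_ids(runtime.get("cuda_visible_devices", 0))
    world_size = int(runtime.get("world_size", 1))
    if len(set(ids)) != len(ids):
        problems.append(f"duplicate GPU IDs in cuda_visible_devices: {ids}")
    # Assumption: one visible GPU per worker.
    if world_size > len(ids):
        problems.append(
            f"world_size={world_size} but only {len(ids)} GPU ID(s) are visible"
        )
    return problems


# The single-GPU defaults from the sample config pass the check.
print(check_runtime({"cuda_visible_devices": 0, "world_size": 1}))
# A dual-GPU world_size with one visible device is reported.
print(check_runtime({"cuda_visible_devices": 0, "world_size": 2}))
```

The function takes a plain dictionary so it stays independent of how the YAML file is loaded.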
# SageAttention
pip install sageattention==2.2.0 --no-build-isolation
# FlashAttention (optional)
pip install ninja
pip install flash_attn==2.8.0.post2 --no-build-isolation
If compilation is slow, download a prebuilt wheel from the flash-attention releases and install it with pip install <wheel>.whl.
make setup
This installs the base editable package ([dev,inference]), generates gRPC stubs, and installs frontend dependencies. For extra Python packages, either install everything (large) or cherry-pick extras listed under [project.optional-dependencies] in pyproject.toml:
# all optional groups at once
pip install -e ".[all]"
# or pick what you need, e.g.:
pip install -e ".[voice_llm,flash_head]"
pip install -e ".[live_act]"
Terminal 1 — Python inference server:
conda activate cyberverse
make inference
make inference reads inference.avatar.default from cyberverse_config.yaml and initializes only that avatar model in the current inference process. The startup logs print the active avatar model.
Wait until you see:
Active avatar model initialized: <model_name>
CyberVerse Inference Server started on port 50051
Terminal 2 — Go API server:
make server
Terminal 3 — Frontend:
make frontend
# Check API health
curl -s http://localhost:8080/api/v1/health
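Because the API server may take a moment to come up, a short polling loop can replace repeated manual `curl` calls. This is a minimal sketch using only the standard library. It assumes an HTTP 200 response means the server is healthy and does not inspect the response body, whose format is not described here. The function names and retry defaults are illustrative.

```python
import time
import urllib.error
import urllib.request

# Endpoint from the curl command above.
HEALTH_URL = "http://localhost:8080/api/v1/health"


def is_healthy(url=HEALTH_URL, timeout=2.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx HTTP error.
        return False


def wait_until_healthy(url=HEALTH_URL, attempts=30, delay=1.0,
                       check=is_healthy, sleep=time.sleep) -> bool:
    """Poll the endpoint until it is healthy or the attempts run out."""
    for _ in range(attempts):
        if check(url):
            return True
        sleep(delay)
    return False
```

Calling `wait_until_healthy()` after `make server` returns `True` once the endpoint responds, or `False` after roughly 30 seconds with the defaults.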
When streaming_mode: direct uses the embedded TURN server, the browser must be able to reach port 8443/TCP on the server. If the page loads but audio/video never connects, or the server logs show ICE connection state: failed or publish timeout waiting for connection, first check whether your machine can reach that port:
nc -vz <server-ip> 8443
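If `nc` is not available on your machine, the same TCP reachability test can be done with Python's standard library. This is a minimal sketch, and the function name is hypothetical. It only confirms that a TCP connection to the port can be opened, not that the TURN service behind it is working.

```python
import socket


def port_reachable(host: str, port: int = 8443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Run it from the machine where the browser runs, for example `port_reachable("<server-ip>")` with your server's address substituted.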
If 8443 is not reachable, the usual cause is a cloud security group, firewall, or NAT restriction. In that case, you can forward your local 8443 to the server through an SSH tunnel:
ssh -L 8443:127.0.0.1:8443 user@host -p port
After the tunnel is established, the browser will access the remote TURN service through local 127.0.0.1:8443.
If you want the browser to connect to the remote server directly instead of through an SSH tunnel, set pipeline.ice_public_ip in cyberverse_config.yaml to the server's public IP or domain. If you are using an SSH tunnel, you can keep the default value (127.0.0.1).
Open http://localhost:5173 in your browser — you're ready to go.
Configure characters, inference, and launch real-time digital-human sessions.
Turn digital humans into agents with memory, tools, and task execution.
Connect multiple agents so they can communicate, collaborate, and form networks.
GNU General Public License v3.0 — see LICENSE
SoulX-FlashHead — Avatar model by Soul AI Lab
SoulX-LiveAct - Avatar model by Soul AI Lab
Pion — Go WebRTC implementation