Tan Yu*, Qian Qiao*✉, Le Shen*, Ke Zhou, Jincheng Hu, Dian Sheng, Bo Hu, Haoming Qin, Jun Gao, Changhai Zhou, Shunshun Yin, Siyuan Liu ✉
*Equal Contribution ✉Corresponding Author
More examples are available on the project page.
conda create -n flashhead python=3.10
conda activate flashhead
pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
pip install ninja
pip install flash_attn==2.8.0.post2 --no-build-isolation
# If the flash_attn build takes a long time, we recommend installing SageAttention below instead.
pip install sageattention==2.2.0 --no-build-isolation
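Before moving on, it can help to verify which attention backend actually ended up in your environment. The snippet below is a minimal, stdlib-only sketch (not part of the repository); the package names match the `pip install` commands above.

```python
import importlib.util

# Candidate attention packages, matching the pip installs above.
ATTENTION_BACKENDS = ("flash_attn", "sageattention")

def available_backends(candidates=ATTENTION_BACKENDS):
    """Return the candidate packages that are importable in this environment."""
    return [name for name in candidates if importlib.util.find_spec(name) is not None]

if __name__ == "__main__":
    found = available_backends()
    print("Installed attention backends:", ", ".join(found) if found else "none")
```

If neither backend is listed, re-run the corresponding `pip install` command before attempting inference.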
# Ubuntu / Debian
apt-get install ffmpeg
# CentOS / RHEL
yum install ffmpeg ffmpeg-devel
or
# Conda (no root required)
conda install -c conda-forge ffmpeg==7
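Whichever route you choose, you can confirm that ffmpeg is actually on your PATH before running inference. This is an illustrative check, not part of the repository:

```python
import shutil
import subprocess

def ffmpeg_path():
    """Return the resolved path of the ffmpeg binary, or None if it is not on PATH."""
    return shutil.which("ffmpeg")

if __name__ == "__main__":
    path = ffmpeg_path()
    if path is None:
        print("ffmpeg not found; install it via apt, yum, or conda as shown above.")
    else:
        # Print the version banner to confirm the binary actually runs.
        banner = subprocess.run([path, "-version"], capture_output=True, text=True)
        print(banner.stdout.splitlines()[0])
```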
| Model Component | Description | Link |
|---|---|---|
| SoulX-FlashHead-1_3B | Our 1.3B model | 🤗 Huggingface |
| wav2vec2-base-960h | Speech encoder (facebook/wav2vec2-base-960h) | 🤗 Huggingface |
# If you are in mainland China, run this first: export HF_ENDPOINT=https://hf-mirror.com
pip install "huggingface_hub[cli]"
huggingface-cli download Soul-AILab/SoulX-FlashHead-1_3B --local-dir ./models/SoulX-FlashHead-1_3B
huggingface-cli download facebook/wav2vec2-base-960h --local-dir ./models/wav2vec2-base-960h
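After the downloads finish, a quick way to confirm both checkpoints landed where the inference scripts expect them. The directory names mirror the `--local-dir` arguments above; this helper is an illustrative sketch, not part of the repo:

```python
from pathlib import Path

# Expected checkpoint directories, matching the --local-dir arguments above.
EXPECTED_MODELS = ("SoulX-FlashHead-1_3B", "wav2vec2-base-960h")

def check_models(root="./models", expected=EXPECTED_MODELS):
    """Map each expected model directory under `root` to whether it exists."""
    return {name: (Path(root) / name).is_dir() for name in expected}

if __name__ == "__main__":
    for name, ok in check_models().items():
        print(f"{'OK     ' if ok else 'MISSING'} models/{name}")
```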
# Infer with [Pro-Model] on single GPU
bash inference_script_single_gpu_pro.sh
# Infer with [Pro-Model] on multiple GPUs
bash inference_script_multi_gpu_pro.sh
# Real-time inference with the Pro-Model requires two RTX 5090 GPUs with SageAttention.
# Infer with [Lite-Model] on single GPU
bash inference_script_single_gpu_lite.sh
# Real-time inference runs on a single RTX 4090 (up to 3 concurrent streams).
# The Gradio demos require gradio==5.50.0; Chrome is recommended.
# standard Gradio demo
python gradio_app.py
# streaming Gradio demo (single GPU only)
python gradio_app_streaming.py
Click here to experience the real-time streaming demo on HuggingFace Spaces.
For a real-time interactive experience, scan the QR code to enter the event link. [2026.2.12~2026.3.11]
Real-time Online Experience (SoulApp 实时在线体验) |
If you would like to get in touch about our work, feel free to email yutan@soulapp.cn, qiaoqian@soulapp.cn, le.shen@mail.dhu.edu.cn, zhouke@soulapp.cn, or liusiyuan@soulapp.cn.
We have also opened a WeChat group. On behalf of SoulApp, we warmly welcome everyone to download the app and join our Soul group for further technical discussions and updates!
Join WeChat Group (加入微信技术群) |
Download SoulApp & Join Group (下载SoulApp加入群组) |
If you find our work useful in your research, please consider citing:
@misc{yu2026soulxflashheadoracleguidedgenerationinfinite,
title={SoulX-FlashHead: Oracle-guided Generation of Infinite Real-time Streaming Talking Heads},
author={Tan Yu and Qian Qiao and Le Shen and Ke Zhou and Jincheng Hu and Dian Sheng and Bo Hu and Haoming Qin and Jun Gao and Changhai Zhou and Shunshun Yin and Siyuan Liu},
year={2026},
eprint={2602.07449},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.07449},
}
TIP
If you find our work useful, please also consider starring the original repositories of these foundational methods.