Le Shen*, Qian Qiao*, Tan Yu*, Ke Zhou, Tianhang Yu, Yu Zhan, Zhenjie Wang, Dingcheng Zhen, Ming Tao, Shunshun Yin, Siyuan Liu ✉
*Equal Contribution ✉Corresponding Author
A 4-GPU version of SoulX-FlashTalk and a new open-source real-time streaming digital human model designed specifically for consumer-grade GPUs such as the RTX 4090.
conda create -n flashtalk python=3.10
conda activate flashtalk
pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
pip install ninja
pip install flash_attn==2.8.0.post2 --no-build-isolation
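After installation, it can help to confirm that the pinned versions actually landed in the active environment. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `check_pins` are hypothetical, and it assumes the distribution names match the names used in the `pip install` commands above.

```python
from importlib import metadata

# Pins copied from the install commands above.
EXPECTED = {
    "torch": "2.7.1",
    "torchvision": "0.22.1",
    "flash_attn": "2.8.0.post2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(expected=EXPECTED, lookup=installed_version):
    """Map each package to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found is None:
            report[package] = "missing"
        # Wheels from the cu128 index carry a local suffix such as '+cu128',
        # so only the part before '+' is compared against the pin.
        elif found.split("+")[0] == wanted:
            report[package] = "ok"
        else:
            report[package] = f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for package, status in check_pins().items():
        print(f"{package}: {status}")
```

Running the script inside the `flashtalk` environment prints one status line per package. A `missing` entry for `flash_attn` usually means the build step did not complete.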
# Ubuntu / Debian
apt-get install ffmpeg
# CentOS / RHEL
yum install ffmpeg ffmpeg-devel
or
# Conda (no root required)
conda install -c conda-forge ffmpeg==7
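Whichever install route is used, a quick check that `ffmpeg` is reachable on `PATH` can save a failed run later. This is a small sketch under the assumption that `ffmpeg -version` prints a first line beginning with `ffmpeg version`; the function names are illustrative and not part of this project.

```python
import re
import shutil
import subprocess


def parse_ffmpeg_version(banner):
    """Extract the version token from the first line of `ffmpeg -version`."""
    match = re.match(r"ffmpeg version (\S+)", banner)
    return match.group(1) if match else None


def ffmpeg_version():
    """Return the version of the ffmpeg found on PATH, or None if absent."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True)
    return parse_ffmpeg_version(result.stdout)


if __name__ == "__main__":
    version = ffmpeg_version()
    print(f"ffmpeg {version}" if version else "ffmpeg not found on PATH")
```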
| Model Component | Description | Link |
|---|---|---|
| SoulX-FlashTalk-14B | Our 14B model | 🤗 Huggingface |
| chinese-wav2vec2-base | chinese-wav2vec2-base | 🤗 Huggingface |
# If you are in mainland China, run this first: export HF_ENDPOINT=https://hf-mirror.com
pip install "huggingface_hub[cli]"
huggingface-cli download Soul-AILab/SoulX-FlashTalk-14B --local-dir ./models/SoulX-FlashTalk-14B
huggingface-cli download TencentGameMate/chinese-wav2vec2-base --local-dir ./models/chinese-wav2vec2-base
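The same downloads can be scripted from Python with `snapshot_download` from `huggingface_hub`. The sketch below is an alternative to the CLI commands, not something shipped with the repository; `download_models` and its `use_mirror` flag are hypothetical names. The mirror endpoint is set before the import because the library reads `HF_ENDPOINT` when it is first imported, so the flag has no effect if `huggingface_hub` was already loaded earlier in the same process.

```python
import os

# Repository IDs and target directories copied from the CLI commands above.
MODELS = {
    "Soul-AILab/SoulX-FlashTalk-14B": "./models/SoulX-FlashTalk-14B",
    "TencentGameMate/chinese-wav2vec2-base": "./models/chinese-wav2vec2-base",
}


def download_models(models=MODELS, use_mirror=False):
    """Download every repository in `models` into its local directory."""
    if use_mirror:
        # Set before importing huggingface_hub, which reads it at import time.
        os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
    from huggingface_hub import snapshot_download

    paths = []
    for repo_id, local_dir in models.items():
        paths.append(snapshot_download(repo_id=repo_id, local_dir=local_dir))
    return paths


if __name__ == "__main__":
    for path in download_models():
        print("downloaded to", path)
```

Pass `use_mirror=True` when working from mainland China.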
# Infer on a single GPU
# Requires more than 64 GB of VRAM. Use --cpu_offload to reduce VRAM usage to 40 GB.
bash inference_script_single_gpu.sh
# Infer on multiple GPUs
# Real-time inference speed is only achievable on 8xH800 or higher-end GPUs
bash inference_script_multi_gpu.sh
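To decide which single-GPU mode fits a given card, the VRAM figures quoted in the comments above can be turned into a small check. This is only a sketch: the function names are hypothetical, the thresholds are read directly from those comments, and treating a card with at least 40 GB as sufficient for `--cpu_offload` is an assumption rather than a tested guarantee. The reported memory is in GiB, which may differ slightly from the figures the README intends.

```python
def suggest_single_gpu_mode(vram_gb):
    """Suggest a single-GPU mode from total VRAM (thresholds from the README)."""
    if vram_gb > 64:
        return "single GPU, no offload"
    # Assumption: 40 GB or more is enough once --cpu_offload is enabled.
    if vram_gb >= 40:
        return "single GPU with --cpu_offload"
    return "insufficient VRAM for single-GPU inference"


def detect_vram_gb(device=0):
    """Total memory of a CUDA device in GiB, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device).total_memory / 1024**3


if __name__ == "__main__":
    vram = detect_vram_gb()
    if vram is None:
        print("No CUDA device detected")
    else:
        print(f"{vram:.1f} GiB: {suggest_single_gpu_mode(vram)}")
```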
Coming Soon!
If you would like to leave a message about our work, feel free to email le.shen@mail.dhu.edu.cn, qiaoqian@soulapp.cn, yutan@soulapp.cn, zhouke@soulapp.cn, or liusiyuan@soulapp.cn.
Due to Group 1 reaching its capacity, we have opened a new WeChat group. Additionally, we represent SoulApp and warmly welcome everyone to download the app and join our Soul group for further technical discussions and updates!
Join WeChat Group
Download SoulApp & Join Group
If you find our work useful in your research, please consider citing:
@misc{shen2025soulxflashtalktechnicalreport,
  title={SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation},
  author={Le Shen and Qian Qiao and Tan Yu and Ke Zhou and Tianhang Yu and Yu Zhan and Zhenjie Wang and Ming Tao and Shunshun Yin and Siyuan Liu},
  year={2025},
  eprint={2512.23379},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.23379},
}
TIP
If you find our work useful, please also consider starring the original repositories of these foundational methods.