ComfyUI custom nodes for SoulX-FlashHead — generate real-time streaming talking head videos from a reference image and audio.
Supports two model variants: **pro** (higher quality) and **lite** (faster, real-time capable).

Search for `ComfyUI_RH_FlashHead` in ComfyUI Manager and install, or install manually:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/HM-RunningHub/ComfyUI_RH_FlashHead.git
cd ComfyUI_RH_FlashHead
pip install -r requirements.txt
```
This node requires **ffmpeg** as a system-level dependency:

```bash
# Ubuntu / Debian
apt-get install ffmpeg

# CentOS / RHEL
yum install ffmpeg ffmpeg-devel

# Conda (no root required)
conda install -c conda-forge ffmpeg==7
```
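To confirm the dependency is satisfied before running the node, a quick check (plain Python, nothing FlashHead-specific) can verify that an `ffmpeg` binary is discoverable on `PATH`:

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg binary is discoverable on PATH."""
    return shutil.which("ffmpeg") is not None

print("ffmpeg found" if ffmpeg_available() else "ffmpeg missing - install it first")
```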
All models must be placed under `ComfyUI/models/` with the following structure:

```
ComfyUI/
└── models/
    ├── Soul-AILab/
    │   └── SoulX-FlashHead-1_3B/    # FlashHead model checkpoint
    │       ├── config.json
    │       ├── model_lite/
    │       └── model_pro/
    └── wav2vec/
        └── facebook/
            └── wav2vec2-base-960h/  # Audio encoder
```
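A small sketch to verify this layout before launching ComfyUI. The paths are taken from the tree above; the `models_dir` default assumes you run the script from the directory containing `ComfyUI/`:

```python
from pathlib import Path

# Required checkpoint locations, relative to ComfyUI/models/ (from the tree above).
REQUIRED = [
    "Soul-AILab/SoulX-FlashHead-1_3B/config.json",
    "Soul-AILab/SoulX-FlashHead-1_3B/model_lite",
    "Soul-AILab/SoulX-FlashHead-1_3B/model_pro",
    "wav2vec/facebook/wav2vec2-base-960h",
]

def check_models(models_dir: str = "ComfyUI/models") -> list:
    """Return the list of required paths that are missing under models_dir."""
    root = Path(models_dir)
    return [rel for rel in REQUIRED if not (root / rel).exists()]

for rel in check_models():
    print("missing:", rel)
```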
pip install "huggingface_hub[cli]"
# Download FlashHead model
huggingface-cli download Soul-AILab/SoulX-FlashHead-1_3B \
--local-dir ComfyUI/models/Soul-AILab/SoulX-FlashHead-1_3B
# Download wav2vec2 audio encoder
huggingface-cli download facebook/wav2vec2-base-960h \
--local-dir ComfyUI/models/wav2vec/facebook/wav2vec2-base-960h
If huggingface.co is unreachable from your network, download through the mirror endpoint instead:

```bash
export HF_ENDPOINT=https://hf-mirror.com
pip install "huggingface_hub[cli]"
huggingface-cli download Soul-AILab/SoulX-FlashHead-1_3B \
  --local-dir ComfyUI/models/Soul-AILab/SoulX-FlashHead-1_3B
huggingface-cli download facebook/wav2vec2-base-960h \
  --local-dir ComfyUI/models/wav2vec/facebook/wav2vec2-base-960h
```
| Model | Link | Description |
|---|---|---|
| SoulX-FlashHead-1_3B | [HuggingFace](https://huggingface.co/Soul-AILab/SoulX-FlashHead-1_3B) | FlashHead 1.3B model (pro + lite) |
| wav2vec2-base-960h | [HuggingFace](https://huggingface.co/facebook/wav2vec2-base-960h) | Facebook wav2vec2 audio encoder |
| Your GPU VRAM | Recommended Model | Performance |
|---|---|---|
| ≥ 24GB | pro | Higher quality, ~10.8 FPS on RTX 4090 |
| ≥ 8GB | lite | Real-time capable, ~96 FPS on RTX 4090 |
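The table above can be expressed as a tiny helper for picking the `model_type` value. Purely illustrative; `recommend_variant` is not part of the node's API:

```python
def recommend_variant(vram_gb: float) -> str:
    """Map available GPU VRAM (in GB) to a model variant, per the table above."""
    if vram_gb >= 24:
        return "pro"   # higher quality, ~10.8 FPS on RTX 4090
    if vram_gb >= 8:
        return "lite"  # real-time capable, ~96 FPS on RTX 4090
    raise ValueError("At least 8 GB of VRAM is required")

print(recommend_variant(24.0))  # pro
```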
This package provides two ComfyUI nodes.

**Loader node** — loads the FlashHead pipeline into memory.

| Parameter | Type | Default | Description |
|---|---|---|---|
| model_type | pro / lite | lite | Model variant to load |

Output: FlashHead Pipeline object
**Generator node** — generates a talking head video from a pipeline, reference image, and audio.

| Parameter | Type | Default | Description |
|---|---|---|---|
| pipeline | FlashHead Pipeline | — | Pipeline from the Loader node |
| ref_audio | AUDIO | — | Driving audio |
| avatar_image | IMAGE | — | Reference face image |
| seed | INT | 42 | Random seed |
| width | INT | 512 | Output video width |
| height | INT | 512 | Output video height |

Output: VIDEO — generated talking head video with audio
Download the example workflow from `workflows/example_workflow_api.json` and import it into ComfyUI. It demonstrates the basic setup: the Loader node feeding the Generator node, together with a reference image and driving audio.
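If you drive ComfyUI programmatically, an API-format workflow like this one can be queued over ComfyUI's HTTP interface. A minimal sketch, assuming a server on the default port 8188; `queue_workflow` and `build_payload` are hypothetical helper names, not part of this package:

```python
import json
import urllib.request
from pathlib import Path

def build_payload(prompt: dict) -> bytes:
    """Wrap an API-format workflow dict in the body ComfyUI's /prompt endpoint expects."""
    return json.dumps({"prompt": prompt}).encode("utf-8")

def queue_workflow(workflow_path: str, server: str = "http://127.0.0.1:8188") -> dict:
    """Queue the workflow JSON on a running ComfyUI server and return its response."""
    prompt = json.loads(Path(workflow_path).read_text())
    req = urllib.request.Request(
        f"{server}/prompt",
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would be `queue_workflow("workflows/example_workflow_api.json")` with ComfyUI already running.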
This project is licensed under the Apache License 2.0.
This project is based on SoulX-FlashHead, developed by Soul-AILab.
We also acknowledge the foundational work from: