
ContentV: Efficient Training of Video Generation Models with Limited Compute

This project presents ContentV, an efficient framework for accelerating the training of DiT-based video generation models through three key innovations:

  • A minimalist architecture that maximizes reuse of pre-trained image generation models for video synthesis
  • A systematic multi-stage training strategy leveraging flow matching for enhanced efficiency
  • A cost-effective reinforcement learning with human feedback framework that improves generation quality without requiring additional human annotations
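To make the flow matching objective mentioned above concrete, here is a minimal sketch of one common formulation: a straight-line path between a data sample and noise, as in rectified flow. The source does not say which variant ContentV trains with, so the interpolation convention and the helper names `flow_matching_pair` and `mse` are illustrative assumptions, not the project's code.

```python
def flow_matching_pair(x0, noise, t):
    """Build one training pair on a linear path from data (t=0) to noise (t=1).

    Assumed convention: x_t = (1 - t) * x0 + t * noise, and the regression
    target is the constant velocity along that path, noise - x0.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]
    target_velocity = [b - a for a, b in zip(x0, noise)]
    return x_t, target_velocity


def mse(prediction, target):
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(prediction, target)) / len(target)


# Toy example with 2-element "latents"; a real model would predict the velocity from (x_t, t).
x_t, velocity = flow_matching_pair([1.0, 2.0], [3.0, 6.0], t=0.5)
loss = mse(velocity, velocity)  # a perfect prediction gives zero loss
```

In training, the network would receive `x_t` and `t` and be penalized by `mse` against `target_velocity`; sampling then integrates the predicted velocity from noise back to data.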

Our open-source 8B model (based on Stable Diffusion 3.5 Large and Wan-VAE) achieves a state-of-the-art result (85.14 on VBench) after only 4 weeks of training on 256 × 64GB NPUs.

⚡ Quickstart

Recommended PyTorch Version

  • GPU: torch >= 2.3.1 (CUDA >= 12.2)
  • NPU: torch and torch-npu >= 2.1.0 (CANN >= 8.0.RC2). Please refer to Ascend Extension for PyTorch for the installation of torch-npu.
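A quick way to check an installed PyTorch against the minimums above is to compare version strings numerically. The sketch below is an illustration rather than part of the repository; `parse_version` and `meets_minimum` are hypothetical helpers, and they only handle numeric `major.minor.patch` prefixes (so they are not suitable for CANN strings such as `8.0.RC2`).

```python
import re


def parse_version(text):
    """Extract (major, minor, patch) from strings like '2.3.1+cu121' or '2.4'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)
```

For example, on a GPU machine one could call `meets_minimum(torch.__version__, "2.3.1")` after `import torch`; on an NPU machine the threshold would be `"2.1.0"`.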

Installation

git clone https://github.com/bytedance/ContentV.git
cd ContentV
pip3 install -r requirements.txt

T2V Generation

## For GPU
python3 demo.py
## For NPU
USE_ASCEND_NPU=1 python3 demo.py
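The `USE_ASCEND_NPU=1` prefix sets an environment variable for the single `demo.py` invocation. How `demo.py` actually reads it is not shown here, so the following is only a sketch of one plausible pattern; `select_device` is a hypothetical helper, and the device strings `"npu"` and `"cuda"` are assumptions.

```python
import os


def select_device(env=None):
    """Pick a device name from the USE_ASCEND_NPU flag (assumed semantics)."""
    env = os.environ if env is None else env
    # Assumption: the flag is enabled only by the exact value "1".
    return "npu" if env.get("USE_ASCEND_NPU", "0") == "1" else "cuda"


device = select_device()
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real process environment.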

📊 VBench

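| Model | Total Score | Quality Score | Semantic Score | Human Action | Scene | Dynamic Degree | Multiple Objects | Appearance Style |
|---|---|---|---|---|---|---|---|---|
| Wan2.1-14B | 86.22 | 86.67 | 84.44 | 99.20 | 61.24 | 94.26 | 86.59 | 21.59 |
| ContentV (Long) | 85.14 | 86.64 | 79.12 | 96.80 | 57.38 | 83.05 | 71.41 | 23.02 |
| Goku† | 84.85 | 85.60 | 81.87 | 97.60 | 57.08 | 76.11 | 79.48 | 23.08 |
| Open-Sora 2.0 | 84.34 | 85.40 | 80.12 | 95.40 | 52.71 | 71.39 | 77.72 | 22.98 |
| Sora† | 84.28 | 85.51 | 79.35 | 98.20 | 56.95 | 79.91 | 70.85 | 24.76 |
| ContentV (Short) | 84.11 | 86.23 | 75.61 | 89.60 | 44.02 | 79.26 | 74.58 | 21.21 |
| EasyAnimate 5.1 | 83.42 | 85.03 | 77.01 | 95.60 | 54.31 | 57.15 | 66.85 | 23.06 |
| Kling 1.6† | 83.40 | 85.00 | 76.99 | 96.20 | 55.57 | 62.22 | 63.99 | 20.75 |
| HunyuanVideo | 83.24 | 85.09 | 75.82 | 94.40 | 53.88 | 70.83 | 68.55 | 19.80 |
| CogVideoX-5B | 81.61 | 82.75 | 77.04 | 99.40 | 53.20 | 70.97 | 62.11 | 24.91 |
| Pika-1.0† | 80.69 | 82.92 | 71.77 | 86.20 | 49.83 | 47.50 | 43.08 | 22.26 |
| VideoCrafter-2.0 | 80.44 | 82.20 | 73.42 | 95.00 | 55.29 | 42.50 | 40.66 | 25.13 |
| AnimateDiff-V2 | 80.27 | 82.90 | 69.75 | 92.60 | 50.19 | 40.83 | 36.88 | 22.42 |
| OpenSora 1.2 | 79.23 | 80.71 | 73.30 | 85.80 | 42.47 | 47.22 | 58.41 | 23.89 |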
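The Total Score column is consistent with a 4:1 weighted average of the Quality and Semantic scores, which is the usual VBench aggregation. The README itself does not state the weights, so treat them as an assumption; `vbench_total` is a hypothetical helper that reproduces the ContentV rows from the table.

```python
def vbench_total(quality, semantic, quality_weight=4.0, semantic_weight=1.0):
    """Weighted average of the two sub-scores (weights are an assumed 4:1 split)."""
    weighted_sum = quality_weight * quality + semantic_weight * semantic
    return weighted_sum / (quality_weight + semantic_weight)


# ContentV (Long): Quality 86.64, Semantic 79.12
long_total = round(vbench_total(86.64, 79.12), 2)   # 85.14
# ContentV (Short): Quality 86.23, Semantic 75.61
short_total = round(vbench_total(86.23, 75.61), 2)  # 84.11
```

Because quality carries four times the weight, a model can rank well on Total Score with a comparatively low Semantic Score, which is the pattern visible in both ContentV rows.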

✅ Todo List

  • Inference code and checkpoints
  • Training code of RLHF

🧾 License

This code repository and part of the model weights are licensed under the Apache 2.0 License. Please note that:

❤️ Acknowledgement

🔗 Citation

@article{contentv2025,
  title   = {ContentV: Efficient Training of Video Generation Models with Limited Compute},
  author  = {Bytedance Douyin Content Team},
  journal = {arXiv preprint arXiv:2506.05343},
  year    = {2025}
}