This model card focuses on the LTX-2 model; the codebase is available here.
LTX-2 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.
| Name | Notes |
|---|---|
| ltx-2-19b-dev | The full model, flexible and trainable in bf16 |
| ltx-2-19b-dev-fp8 | The full model in fp8 quantization |
| ltx-2-19b-dev-fp4 | The full model in nvfp4 quantization |
| ltx-2-19b-distilled | A distilled version of the full model; runs in 8 steps with CFG=1 |
| ltx-2-19b-distilled-lora-384 | A LoRA version of the distilled model, applicable to the full model |
| ltx-2-spatial-upscaler-x2-1.0 | A 2x spatial upscaler for the LTX-2 latents, used in multi-stage (multiscale) pipelines for higher resolution |
| ltx-2-temporal-upscaler-x2-1.0 | A 2x temporal upscaler for the LTX-2 latents, used in multi-stage (multiscale) pipelines for higher FPS |
LTX-2 is accessible right away.
You can use the models - full, distilled, upscalers, and any derivatives - for purposes permitted under the license.
We recommend using the built-in LTXVideo nodes, available through the ComfyUI Manager. For manual installation, please refer to our documentation site.
The LTX-2 codebase is a monorepo with several packages, from the model definition in `ltx-core` to pipelines in `ltx-pipelines` and training capabilities in `ltx-trainer`. The codebase was tested with Python >= 3.12 and CUDA > 12.7, and supports PyTorch ~= 2.7.
```bash
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2

# From the repository root
uv sync
source .venv/bin/activate
```
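After activating the environment, an optional sanity check can confirm the interpreter and PyTorch build match the stated requirements; this is a minimal sketch, not part of the official setup:

```python
# Optional environment check against the stated requirements
# (Python >= 3.12, PyTorch ~= 2.7, CUDA-enabled build).
import sys
import torch

print(sys.version)                  # expect 3.12 or newer
print(torch.__version__)            # expect a 2.7.x build
print(torch.cuda.is_available())    # expect True on a CUDA machine
```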
To use our model, please follow the instructions in our `ltx-pipelines` package.
LTX-2 is supported in the Diffusers Python library for image-to-video generation.
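As a rough sketch of what that looks like: the checkpoint id, resolution, and frame count below are illustrative placeholders, and the exact pipeline class should be checked against the current Diffusers documentation. `DiffusionPipeline.from_pretrained` dispatches to the matching pipeline class from the repository's metadata.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video, load_image

# Model id below is a placeholder; check the Diffusers docs for the
# published LTX-2 checkpoint name.
pipe = DiffusionPipeline.from_pretrained(
    "Lightricks/LTX-2",  # hypothetical id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = load_image("first_frame.png")
frames = pipe(
    image=image,
    prompt="A sailboat gliding across a calm bay at sunset",
    height=512,
    width=768,
    num_frames=121,
    # For the distilled checkpoint, the table above suggests
    # num_inference_steps=8 and guidance_scale=1.0.
).frames[0]

export_to_video(frames, "output.mp4", fps=25)
```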
The base (dev) model is fully trainable.
The LoRAs and IC-LoRAs we publish with the model are easy to reproduce by following the instructions in the LTX-2 Trainer README.
Training for motion, style, or likeness (sound + appearance) can take less than an hour in many settings.
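Once trained, a LoRA can often be applied at inference through the standard Diffusers LoRA loader. This is a hedged continuation of the sketch above: the file path is a placeholder, and LTX-2 LoRA support in Diffusers should be verified against the current documentation.

```python
# Continuing from the pipeline created in the earlier sketch.
# load_lora_weights is the generic Diffusers LoRA entry point; the path
# below is a placeholder for a LoRA trained with ltx-trainer.
pipe.load_lora_weights("path/to/trained_lora.safetensors")

frames = pipe(
    image=image,
    prompt="A sailboat gliding across a calm bay, in the trained style",
).frames[0]
export_to_video(frames, "output_lora.mp4", fps=25)
```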