High-level pipeline implementations for generating audio-video content with Lightricks' LTX-2 model. This package provides ready-to-use pipelines for text-to-video, image-to-video, video-to-video, and keyframe interpolation tasks.
Pipelines are built using building blocks from ltx-core (schedulers, guiders, noisers, patchifiers) and handle the complete inference flow including model loading, encoding, decoding, and file I/O.
LTX-2 Pipelines provides production-ready implementations that abstract away the complexity of the diffusion process, model loading, and memory management. Each pipeline is optimized for specific use cases and offers different trade-offs between speed, quality, and memory usage.
Key Features:
- Ready-made inference pipelines for text-to-video, image-to-video, video-to-video, and keyframe interpolation
- Built from ltx-core building blocks (schedulers, guiders, noisers, patchifiers)
- Complete inference flow handled end to end: model loading, encoding, decoding, and file I/O
# From the repository root
uv sync --frozen
# Or install as a package
pip install -e packages/ltx-pipelines
All pipelines can be run directly from the command line. Each pipeline module is executable:
# Run a pipeline (example: two-stage text-to-video)
python -m ltx_pipelines.ti2vid_two_stages \
--checkpoint-path path/to/checkpoint.safetensors \
--distilled-lora path/to/distilled_lora.safetensors 0.8 \
--spatial-upsampler-path path/to/upsampler.safetensors \
--gemma-root path/to/gemma \
--prompt "A beautiful sunset over the ocean" \
--output-path output.mp4
# View all available options for any pipeline
python -m ltx_pipelines.ti2vid_two_stages --help
Available pipeline modules:
- ltx_pipelines.ti2vid_two_stages - Two-stage text/image-to-video (recommended)
- ltx_pipelines.ti2vid_one_stage - Single-stage text/image-to-video
- ltx_pipelines.distilled - Fast text/image-to-video pipeline using only the distilled model
- ltx_pipelines.ic_lora - Video-to-video with IC-LoRA
- ltx_pipelines.keyframe_interpolation - Keyframe interpolation

Use --help with any pipeline module to see all available options and parameters.
Do you need to condition on existing images/videos?
├─ YES → Do you have reference videos for video-to-video?
│   ├─ YES → Use ICLoraPipeline
│   └─ NO → Do you have multiple keyframe images to interpolate?
│       ├─ YES → Use KeyframeInterpolationPipeline
│       └─ NO → Use TI2VidTwoStagesPipeline (image conditioning only)
└─ NO → Text-to-video only
    ├─ Do you need best quality?
    │   └─ YES → Use TI2VidTwoStagesPipeline (recommended for production)
    └─ Do you need fastest inference?
        └─ YES → Use DistilledPipeline (with 8 predefined sigmas)
Note:
TI2VidOneStagePipeline is primarily for educational purposes. For best quality, use two-stage pipelines (TI2VidTwoStagesPipeline, ICLoraPipeline, KeyframeInterpolationPipeline, or DistilledPipeline).
| Pipeline | Stages | CFG | Upsampling | Conditioning | Best For |
|---|---|---|---|---|---|
| TI2VidTwoStagesPipeline | 2 | ✅ | ✅ | Image | Production quality (recommended) |
| TI2VidOneStagePipeline | 1 | ✅ | ❌ | Image | Educational, prototyping |
| DistilledPipeline | 2 | ❌ | ✅ | Image | Fastest inference (8 sigmas) |
| ICLoraPipeline | 2 | ✅ | ✅ | Image + Video | Video-to-video transformations |
| KeyframeInterpolationPipeline | 2 | ✅ | ✅ | Keyframes | Animation, interpolation |
Best for: High-quality text/image-to-video generation with upsampling. Recommended for production use.
Source: src/ltx_pipelines/ti2vid_two_stages.py
Two-stage generation: Stage 1 generates low-resolution video with CFG guidance; Stage 2 upsamples to 2x resolution with distilled LoRA refinement. Supports image conditioning. Slower than the one-stage pipeline, but produces significantly higher-quality output.
Use when: Production-quality video generation, higher resolution is needed, quality matters more than speed, or text-to-video with image conditioning. A complete worked example appears at the end of this document.
Best for: Educational purposes and quick prototyping.
Source: src/ltx_pipelines/ti2vid_one_stage.py
⚠️ Important: This pipeline is primarily for educational purposes. For production-quality results, use
TI2VidTwoStagesPipeline or other two-stage pipelines.
Single-stage generation (no upsampling) with CFG guidance and image conditioning support. Faster inference but lower resolution output (typically 512x768).
Use when: Learning how the pipeline works, quick prototyping, testing, or when high resolution is not needed.
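For orientation, a minimal invocation might look like the sketch below. The constructor and call arguments are assumptions, inferred by analogy from the two-stage example at the end of this document; check python -m ltx_pipelines.ti2vid_one_stage --help for the actual interface.

# Hypothetical sketch: arguments are assumed by analogy with
# TI2VidTwoStagesPipeline (no upsampler or distilled LoRA in one stage).
from ltx_pipelines.ti2vid_one_stage import TI2VidOneStagePipeline

pipeline = TI2VidOneStagePipeline(
    checkpoint_path="/path/to/checkpoint.safetensors",
    gemma_root="/path/to/gemma",
    loras=[],
)
pipeline(
    prompt="A beautiful sunset over the ocean",
    output_path="output.mp4",
    seed=42,
    height=512,   # typical single-stage output resolution
    width=768,
    num_inference_steps=40,
    cfg_guidance_scale=3.0,
)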
Best for: Fastest inference with good quality, using a distilled model with a predefined sigma schedule.
Source: src/ltx_pipelines/distilled.py
Two-stage generation with 8 predefined sigmas (8 steps in stage 1, 4 steps in stage 2). No CFG guidance required. Fastest inference among all pipelines. Supports image conditioning. Requires spatial upsampler.
Use when: Fastest inference is critical, batch processing many videos, or when you have a distilled model checkpoint.
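A minimal sketch, assuming DistilledPipeline shares the two-stage constructor shape; because the 8-sigma schedule is predefined and no CFG is used, no step count or guidance scale appears in the call. Verify names with python -m ltx_pipelines.distilled --help.

# Hypothetical sketch: constructor arguments are assumed by analogy with
# TI2VidTwoStagesPipeline; the sigma schedule and guidance are built in.
from ltx_pipelines.distilled import DistilledPipeline

pipeline = DistilledPipeline(
    checkpoint_path="/path/to/distilled_checkpoint.safetensors",
    spatial_upsampler_path="/path/to/upsampler.safetensors",  # required
    gemma_root="/path/to/gemma",
    loras=[],
)
pipeline(
    prompt="A beautiful sunset over the ocean",
    output_path="output.mp4",
    seed=42,
)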
Best for: Video-to-video and image-to-video transformations using IC-LoRA.
Source: src/ltx_pipelines/ic_lora.py
Two-stage generation with IC-LoRA support. Can condition on reference videos (video-to-video) or on images at specific frames. CFG guidance in stage 1, upsampling in stage 2. Requires an IC-LoRA-trained model.
Use when: Video-to-video transformations, image-to-video with strong control, or when you have reference videos to guide generation.
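As a sketch only: the videos parameter name below is purely illustrative (the real interface may differ), and the constructor is assumed to match the two-stage pipeline with the IC-LoRA weights passed via loras. See python -m ltx_pipelines.ic_lora --help for the actual options.

# Hypothetical sketch: `videos` is an illustrative name for reference-video
# conditioning; constructor arguments are assumed by analogy.
from ltx_core.loader import LTXV_LORA_COMFY_RENAMING_MAP, LoraPathStrengthAndSDOps
from ltx_pipelines.ic_lora import ICLoraPipeline

ic_lora = LoraPathStrengthAndSDOps(
    "/path/to/ic_lora.safetensors", 1.0, LTXV_LORA_COMFY_RENAMING_MAP
)
pipeline = ICLoraPipeline(
    checkpoint_path="/path/to/checkpoint.safetensors",
    spatial_upsampler_path="/path/to/upsampler.safetensors",
    gemma_root="/path/to/gemma",
    loras=[ic_lora],  # assumed: IC-LoRA weights loaded like any other LoRA
)
pipeline(
    prompt="The same footage rendered as a watercolor painting",
    output_path="output.mp4",
    seed=42,
    videos=[("reference.mp4", 0, 1.0)],  # illustrative: reference video, start frame, strength
)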
Best for: Generating videos by interpolating between keyframe images.
Source: src/ltx_pipelines/keyframe_interpolation.py
Two-stage generation with keyframe interpolation. Uses guiding latents (additive conditioning) instead of replacing latents for smoother transitions. CFG guidance in stage 1, upsampling in stage 2.
Use when: You have keyframe images and want to interpolate between them, creating smooth transitions, or animation/motion interpolation tasks.
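A sketch of keyframe usage, reusing the (path, frame_index, strength) tuple format from the two-stage example at the end of this document; the constructor arguments are assumed by analogy, and a stage-2 distilled LoRA may also be required. Check python -m ltx_pipelines.keyframe_interpolation --help for the real interface.

# Hypothetical sketch: constructor and image-tuple format are assumed by
# analogy with TI2VidTwoStagesPipeline; a distilled LoRA for stage 2 may
# also be needed.
from ltx_pipelines.keyframe_interpolation import KeyframeInterpolationPipeline

pipeline = KeyframeInterpolationPipeline(
    checkpoint_path="/path/to/checkpoint.safetensors",
    spatial_upsampler_path="/path/to/upsampler.safetensors",
    gemma_root="/path/to/gemma",
    loras=[],
)
pipeline(
    prompt="A smooth transition between the two scenes",
    output_path="output.mp4",
    seed=42,
    num_frames=121,
    images=[
        ("keyframe_a.jpg", 0, 1.0),    # first keyframe at frame 0
        ("keyframe_b.jpg", 120, 1.0),  # last keyframe at frame 120
    ],
)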
Pipelines use different conditioning methods from ltx-core for controlling generation. See the ltx-core conditioning documentation for details.
All pipelines support image conditioning, but with different methods:
Replacing Latents (image_conditionings_by_replacing_latent):

- TI2VidOneStagePipeline, TI2VidTwoStagesPipeline, DistilledPipeline, ICLoraPipeline

Guiding Latents (image_conditionings_by_adding_guiding_latent):

- KeyframeInterpolationPipeline
- Uses VideoConditionByKeyframeIndex from ltx-core

FP8 Transformer (Lower Memory Footprint):
For a smaller GPU memory footprint, pass the --enable-fp8 flag and set the PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True environment variable.
CLI:
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python -m ltx_pipelines.ti2vid_one_stage --enable-fp8 --checkpoint-path=...
Programmatically:
When authoring custom scripts, pass the fp8transformer flag to the pipeline class, or construct your own pipeline by analogy:
pipeline = TI2VidTwoStagesPipeline(
checkpoint_path=ltx_model_path,
distilled_lora=distilled_lora,
spatial_upsampler_path=upsampler_path,
gemma_root=gemma_root_path,
loras=[],
fp8transformer=True,
)
pipeline(...)
You still need to set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True when launching:
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python my_denoising_pipeline.py
Memory Cleanup Between Stages:
By default, pipelines clean GPU memory (especially transformer weights) between stages. If you have enough memory, you can skip this cleanup to reduce running time:
# In pipeline implementations, memory cleanup happens automatically
# between stages. For custom pipelines, you can skip:
# utils.cleanup_memory() # Comment out if you have enough VRAM
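A rough sketch of the pattern in a custom script; the stage functions below are hypothetical placeholders for your own stage code, and the import path of cleanup_memory is an assumption based on the utils.cleanup_memory() reference above.

# Sketch only: run_stage_one / run_stage_two are hypothetical placeholders.
from ltx_pipelines import utils  # assumed location of cleanup_memory

def run_stage_one(prompt):   # placeholder: stage-1 low-res generation (CFG)
    ...

def run_stage_two(latents):  # placeholder: stage-2 upsampling + refinement
    ...

latents = run_stage_one("A beautiful sunset over the ocean")
utils.cleanup_memory()  # frees transformer weights; skip if you have enough VRAM
video = run_stage_two(latents)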
Gradient Estimation Denoising Loop:
Instead of the standard Euler denoising loop, you can use gradient estimation for fewer steps (~20-30 instead of 40):
from ltx_pipelines.utils.helpers import gradient_estimating_euler_denoising_loop
# Use gradient estimation denoising loop
def denoising_loop(sigmas, video_state, audio_state, stepper):
return gradient_estimating_euler_denoising_loop(
sigmas=sigmas,
video_state=video_state,
audio_state=audio_state,
stepper=stepper,
denoise_fn=your_denoise_function,  # placeholder: your model-forward wrapper
ge_gamma=2.0, # Gradient estimation coefficient
)
This lets you use 20-30 steps instead of 40 while maintaining quality; the gradient-estimating loop is imported from ltx_pipelines.utils.helpers as shown above.
The distilled LoRA is specified as a path to a .safetensors file together with a strength and a state-dict renaming map:

from ltx_core.loader import LTXV_LORA_COMFY_RENAMING_MAP, LoraPathStrengthAndSDOps
from ltx_pipelines.ti2vid_two_stages import TI2VidTwoStagesPipeline
distilled_lora = [
LoraPathStrengthAndSDOps(
"/path/to/distilled_lora.safetensors",
0.6,
LTXV_LORA_COMFY_RENAMING_MAP
),
]
pipeline = TI2VidTwoStagesPipeline(
checkpoint_path="/path/to/checkpoint.safetensors",
distilled_lora=distilled_lora,
spatial_upsampler_path="/path/to/upsampler.safetensors",
gemma_root="/path/to/gemma",
loras=[],
)
# Generate video from image
pipeline(
prompt="A serene landscape with mountains in the background",
output_path="output.mp4",
seed=42,
height=512,
width=768,
num_frames=121,
frame_rate=25.0,
num_inference_steps=40,
cfg_guidance_scale=3.0,
images=[("input_image.jpg", 0, 1.0)], # Image at frame 0, strength 1.0
)