An fp8_e4m3fn conversion of the text encoder at https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it.safetensors, which is used for LTX-2.
gemma_3_12B_it_fp8_e4m3fn.safetensors - the fp8-converted text encoder from the Comfy-Org repo; goes in the CLIP folder
ltx-2-19b-dev-fp4_projections_only.safetensors - projections extracted from the LTX-2 model so it can be loaded with the DualClipLoader node; goes in the CLIP folder
ltx-2-19b-dev-fp4_video_vae.safetensors - the video VAE; can be loaded with the VaeLoader node; goes in the VAE folder
ltx-2-19b-dev-fp4_vocoder.safetensors - the vocoder model; not currently useful on its own
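The placement above can be sketched as a small shell snippet. This is a sketch only: `COMFY_DIR` is an assumption for a default ComfyUI install path, and the `touch` lines stand in for the real downloads so the snippet is self-contained.

```shell
# Sketch: where the files go in a typical ComfyUI install.
# COMFY_DIR is an assumption -- point it at your actual install.
COMFY_DIR="${COMFY_DIR:-$HOME/ComfyUI}"

mkdir -p "$COMFY_DIR/models/clip" "$COMFY_DIR/models/vae"

# Placeholder files stand in for the real downloads from this repo.
touch gemma_3_12B_it_fp8_e4m3fn.safetensors \
      ltx-2-19b-dev-fp4_projections_only.safetensors \
      ltx-2-19b-dev-fp4_video_vae.safetensors

# Text encoder and projections -> CLIP folder
mv gemma_3_12B_it_fp8_e4m3fn.safetensors "$COMFY_DIR/models/clip/"
mv ltx-2-19b-dev-fp4_projections_only.safetensors "$COMFY_DIR/models/clip/"

# Video VAE -> VAE folder (optional)
mv ltx-2-19b-dev-fp4_video_vae.safetensors "$COMFY_DIR/models/vae/"
```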
The video VAE and vocoder currently seem to have issues; both are optional.
When using a ComfyUI workflow that uses the original fp16 Gemma 3 12B IT model, simply select the text encoder from here instead.
Right now, ComfyUI memory offloading has issues with the text encoder loaded by the LTX-2 text encoder loader node. As a workaround, if you're getting an OOM error, launch ComfyUI with the --novram flag. This slightly slows down generation, so revert it once a fix is released.
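For reference, the workaround launch looks like this (assuming a source install started via `main.py`; adjust for your setup):

```shell
# Temporary workaround for the offloading OOM; remove --novram once fixed
python main.py --novram
```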
--lowvram isn't needed when using the DualClipLoader, since it can be set to run on the CPU only.
Use the vanilla ComfyUI DualClipLoader node: replace the LTXV Audio Text Encoder Loader node with it, and as the clip models select gemma_3_12B_it_fp8_e4m3fn.safetensors and ltx-2-19b-dev-fp4_projections_only.safetensors from this repo.