Full Fine-Tune • Rich Aesthetics • Strong Diversity • Full Negative Prompt Support
BF16 & FP8 & GGUF & AIO • Natural Language Prompts • 8GB VRAM
Z-Anime is a full fine-tune of Alibaba's Z-Image Base architecture: not a LoRA merge, but a fully trained, anime-focused model family.
Built on the S3-DiT (Single-Stream Diffusion Transformer, 6B parameters), Z-Anime inherits the strong foundation of Z-Image Base: rich diversity, strong controllability, full negative prompt support, and a high ceiling for fine-tuning — now adapted for anime-style generation.
This repository contains the full Z-Anime family:
| Variant | Focus | Best For |
|---|---|---|
| 🎌 Z-Anime Base | Highest quality | Final renders, full control |
| ⚡ Z-Anime Distill-8-Step | Speed + quality balance | Everyday generation |
| 🚀 Z-Anime Distill-4-Step | Maximum speed | Fast iteration, batches |
| 📦 GGUF Variants | Lower memory usage | Low VRAM / CPU / AMD-friendly workflows |
| 📦 AIO Variants | Single-file convenience | Easy ComfyUI setup |
| 🐍 Diffusers Folder | from_pretrained() ready | Python pipelines, further fine-tuning |
- Z-Anime Base: full fine-tune on Z-Image Base, in BF16 & FP8.
- Z-Anime Distill-8-Step: BF16 & FP8, fast anime generation in 8 steps at CFG 1.0.
- Z-Anime Distill-4-Step: BF16 & FP8, ultra-fast anime generation in 4 steps at CFG 1.0.
- GGUF Variants: available for low VRAM, CPU inference, and AMD-friendly workflows.
- AIO Variants: all-in-one checkpoints with image model + VAE + Text Encoder integrated in a single file. Available for Base, Distill-4-Step and Distill-8-Step, each in BF16 & FP8.
The required VAE (ae.safetensors) and Text Encoder (qwen_3_4b.safetensors) are also included in this repository for users running the standard (non-AIO) variants.
The full Diffusers-format folder (diffusers/) is included — drop-in compatible with ZImagePipeline.from_pretrained() for Python users who want to run inference outside ComfyUI or use Z-Anime as a starting point for further fine-tuning.
More updates coming — follow to stay notified! 🎌
- BF16: Maximum precision. BFloat16 format with minimal quality compromise. Best for final renders, careful work, and LoRA training.
- FP8: Recommended for most users. Smaller files, faster downloads, and excellent quality with only minor tradeoffs compared to BF16.
- GGUF: Optimized for lightweight inference setups, especially useful for low VRAM, CPU inference, or alternative backends.
- AIO: All-in-one checkpoints with image model + Text Encoder + VAE integrated into a single file for the easiest setup. Available for Base, Distill-4-Step and Distill-8-Step.
Z-Anime Base is the foundation of the Z-Anime family: a full fine-tune with the highest quality ceiling, the widest creative range, and full negative prompt support.

    steps: 28-50
    cfg: 3.0-5.0  # up to 9.0 possible
    sampler: euler_ancestral
    scheduler: beta
    negative_prompt: strongly recommended

Negative prompts have full effect on Z-Anime Base and are highly recommended.
Z-Anime Distill-8-Step is the sweet spot of the family. Distilled from Z-Anime Base, this version delivers strong anime results in just 8 steps while keeping most of the quality.

    steps: 8
    cfg: 1.0  # max ~1.5
    sampler: euler_ancestral
    scheduler: beta
    negative_prompt: limited effect

Negative prompts have only limited effect at this distillation level. If your workflow includes ConditioningZeroOut, prefer that instead of a large negative prompt.
Z-Anime Distill-4-Step is the fastest Z-Anime variant, built for maximum throughput: ideal for rapid prototyping, quick batch generation, and speed-focused workflows.

    steps: 4
    cfg: 1.0  # max ~1.5
    sampler: euler_ancestral
    scheduler: beta
    negative_prompt: limited effect

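The recommended settings for the three variants can be gathered into one lookup, which makes it easy to catch a Base-style configuration being sent to a distilled model. The following is a minimal sketch: the variant keys (`"base"`, `"distill-8"`, `"distill-4"`) and the `check_settings` helper are illustrative names, not part of the repository, and the ranges are simply the values listed above.

```python
# Recommended ranges transcribed from the settings listed for each variant.
# Variant keys and this helper are hypothetical names used for illustration.
RECOMMENDED = {
    "base": {"steps": (28, 50), "cfg": (3.0, 5.0)},
    "distill-8": {"steps": (8, 8), "cfg": (1.0, 1.5)},
    "distill-4": {"steps": (4, 4), "cfg": (1.0, 1.5)},
}

# All three variants list the same sampler and scheduler.
SAMPLER = "euler_ancestral"
SCHEDULER = "beta"


def check_settings(variant, steps, cfg):
    """Return a list of warnings for values outside the recommended ranges."""
    rec = RECOMMENDED[variant]
    warnings = []
    lo, hi = rec["steps"]
    if not lo <= steps <= hi:
        warnings.append(f"steps={steps} outside recommended {lo}-{hi}")
    lo, hi = rec["cfg"]
    if not lo <= cfg <= hi:
        warnings.append(f"cfg={cfg} outside recommended {lo}-{hi}")
    return warnings
```

A warning is advisory only; for example, Z-Anime Base is stated to tolerate CFG values up to 9.0, which this sketch would still flag as outside the recommended 3.0-5.0 band.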
| Use Case | Resolution |
|---|---|
| Portrait / character art | 832 × 1216 |
| Landscape / scenes / backgrounds | 1216 × 832 |
| Square / general purpose | 1024 × 1024 |
| Tall / full body / wallpaper | 768 × 1344 |
| Cinematic / wide scenes | 1920 × 1088 |
| Detailed portraits | 1024 × 1536 |
Supported range: approximately 512 × 512 to 2048 × 2048, any aspect ratio.
All main variants are designed to run on 8GB VRAM.
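When a target size is not one of the listed presets, one option is to pick the preset with the nearest aspect ratio. The sketch below assumes that approach; the `closest_preset` helper and the short labels are hypothetical, and only the width and height values come from the table above.

```python
# (label, width, height) taken from the resolution table; labels are shortened.
PRESETS = [
    ("portrait", 832, 1216),
    ("landscape", 1216, 832),
    ("square", 1024, 1024),
    ("tall", 768, 1344),
    ("cinematic", 1920, 1088),
    ("detailed portrait", 1024, 1536),
]


def closest_preset(width, height):
    """Return the preset whose width/height ratio is nearest to the request."""
    target = width / height
    return min(PRESETS, key=lambda p: abs(p[1] / p[2] - target))
```

For instance, a 1920 × 1080 request maps to the cinematic 1920 × 1088 preset. Since any aspect ratio in the supported range is stated to work, this is a convenience rather than a requirement.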
Natural language works best, not tag lists.

Recommended (descriptive sentences):

    A young anime girl with long silver hair and golden eyes, wearing a traditional shrine maiden outfit with white haori and red hakama. She stands in a sunlit bamboo forest, cherry blossoms falling softly around her. Warm afternoon light filtering through the trees, detailed fabric shading, expressive face, calm serene expression, high quality anime illustration with fine line work.

Avoid (bare tag list):

    anime girl, silver hair, shrine maiden, bamboo, cherry blossom, warm light

Prompt templates:
- Detailed anime portrait of [character], soft rim lighting, expressive eyes with detailed reflections, fine hair strands, clean linework, professional anime illustration quality.
- Dynamic anime [scene], dramatic angle, motion energy, speed lines, particle effects, cinematic composition, detailed shading, high quality anime art.
- Anime [location] at [time of day], [lighting], [atmosphere], beautiful background art, wallpaper quality, highly detailed environment.
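The bracketed placeholders in the templates can be filled programmatically. This is a small sketch, assuming placeholders are written as `[name]` and that spaces in a placeholder map to underscores in the keyword argument; the `fill_template` helper is a hypothetical name, not something shipped with the model.

```python
import re

# Background template copied from the prompt templates above.
BACKGROUND_TEMPLATE = (
    "Anime [location] at [time of day], [lighting], [atmosphere], "
    "beautiful background art, wallpaper quality, highly detailed environment."
)


def fill_template(template, **values):
    """Replace each [placeholder] with the matching keyword argument."""
    def substitute(match):
        # "[time of day]" is looked up as the keyword "time_of_day".
        key = match.group(1).replace(" ", "_")
        if key not in values:
            raise KeyError(f"missing value for [{match.group(1)}]")
        return values[key]

    return re.sub(r"\[([^\]]+)\]", substitute, template)


prompt = fill_template(
    BACKGROUND_TEMPLATE,
    location="harbor town",
    time_of_day="dusk",
    lighting="warm lantern glow",
    atmosphere="light sea mist",
)
```

Raising on a missing placeholder keeps a literal `[lighting]` from silently reaching the text encoder.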
Choose between the standard BF16 / FP8 files and the GGUF files.

BF16 / FP8:

    ComfyUI/models/diffusion_models/
    ├── z-anime-base-bf16.safetensors
    ├── z-anime-base-fp8.safetensors
    ├── z-anime-distill-8step-bf16.safetensors
    ├── z-anime-distill-8step-fp8.safetensors
    ├── z-anime-distill-4step-bf16.safetensors
    └── z-anime-distill-4step-fp8.safetensors

GGUF:

    ComfyUI/models/unet/
    ├── z-anime-base-q8_0.gguf
    └── z-anime-base-q4_k_s.gguf

Two text encoders are included; pick one:

    ComfyUI/models/clip/
    └── qwen_3_4b-bf16.safetensors              # default (Z-Image standard, BF16)
    or
    └── qwen_3_4b-fp8.safetensors               # default (Z-Image standard, FP8)
    or
    └── qwen_3_4b-engineer-v4-bf16.safetensors  # alternative (Engineer V4, BF16)
    or
    └── qwen_3_4b-engineer-v4-fp8.safetensors   # alternative (Engineer V4, FP8)

- Default (qwen_3_4b-*): the standard Z-Image text encoder, repackaged as a single .safetensors file (BF16 + FP8). This is what the model was trained against.
- Alternative (qwen_3_4b-engineer-v4-*): an alternative full fine-tune of the Z-Image text encoder by BennyDaBall, drop-in compatible. Often produces more varied outputs from the same seed. See Credits below for the original repo.

VAE:

    ComfyUI/models/vae/
    └── ae.safetensors

For the AIO versions, you only need the single checkpoint file; no extra VAE or Text Encoder is required:

    ComfyUI/models/checkpoints/
    ├── z-anime-base-aio-bf16.safetensors
    ├── z-anime-base-aio-fp8.safetensors
    ├── z-anime-distill-8step-aio-bf16.safetensors
    ├── z-anime-distill-8step-aio-fp8.safetensors
    ├── z-anime-distill-4step-aio-bf16.safetensors
    └── z-anime-distill-4step-aio-fp8.safetensors

GGUF files are loaded from the models/unet/ folder.
For AIO checkpoints, use a standard Checkpoint Loader; no extra CLIP or VAE loading is required.
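The folder layout described in this section can be summarized as a lookup from file name to ComfyUI subfolder. The sketch below is one way to express it, assuming the file naming used in this repository (`-aio-` in AIO checkpoints, a `qwen_3_4b` prefix for text encoders); the `comfyui_folder` helper is a hypothetical name.

```python
def comfyui_folder(filename):
    """Map a Z-Anime file name to its ComfyUI models subfolder."""
    name = filename.lower()
    if name.endswith(".gguf"):
        return "models/unet"              # GGUF quantizations
    if "-aio-" in name:
        return "models/checkpoints"       # all-in-one checkpoints
    if name.startswith("qwen_3_4b"):
        return "models/clip"              # text encoders
    if name == "ae.safetensors":
        return "models/vae"               # VAE
    if name.startswith("z-anime-"):
        return "models/diffusion_models"  # standard BF16 / FP8 models
    raise ValueError(f"unrecognized file: {filename}")
```

The AIO check runs before the generic `z-anime-` check because AIO file names share that prefix.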
For Python users, the full Diffusers-format folder is included under diffusers/ and can be loaded directly with the subfolder argument:

    import torch
    from diffusers import ZImagePipeline

    # Load the Diffusers-format weights from the diffusers/ subfolder of the repo.
    pipe = ZImagePipeline.from_pretrained(
        "SeeSee21/Z-Anime",
        subfolder="diffusers",
        torch_dtype=torch.bfloat16,
    ).to("cuda")

    # Step count and CFG fall within the ranges recommended for Z-Anime Base.
    image = pipe(
        prompt="A young anime girl with long silver hair and golden eyes, "
               "shrine maiden outfit, sunlit bamboo forest, cherry blossoms, "
               "professional anime illustration, fine line work.",
        num_inference_steps=40,
        guidance_scale=4.0,
    ).images[0]
    image.save("z-anime-output.png")

This format is also a clean starting point for further fine-tuning (LoRA or full fine-tune) with frameworks like OneTrainer, diffusers, or kohya-ss.
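Since negative prompts are strongly recommended for Z-Anime Base, a Python call will usually want one, along with an explicit resolution. The sketch below only builds the keyword arguments; it assumes `ZImagePipeline` accepts `negative_prompt`, `width`, and `height` as Diffusers text-to-image pipelines commonly do, so check the pipeline's signature in your installed version. The `base_call_kwargs` helper is a hypothetical name.

```python
def base_call_kwargs(prompt, negative_prompt="", width=832, height=1216,
                     steps=40, cfg=4.0):
    """Build keyword arguments for a Z-Anime Base generation call.

    Defaults follow the portrait preset and the recommended Base ranges.
    The parameter names passed to the pipeline are assumptions (see above).
    """
    kwargs = {
        "prompt": prompt,
        "width": width,
        "height": height,
        "num_inference_steps": steps,
        "guidance_scale": cfg,
    }
    # Only pass a negative prompt when one was actually provided.
    if negative_prompt:
        kwargs["negative_prompt"] = negative_prompt
    return kwargs


kwargs = base_call_kwargs(
    "A young anime girl reading under a cherry tree, soft afternoon light.",
    negative_prompt="blurry, low quality, distorted anatomy",
)
# With a loaded pipeline: image = pipe(**kwargs).images[0]
```

Keeping the arguments in a plain dictionary makes it straightforward to log or reuse the exact settings behind an image.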
A ready-to-use ComfyUI workflow that supports all variants (Base / Distill-8 / Distill-4, BF16 / FP8 / GGUF / AIO) is included in workflows/Z-Anime-Workflow-v1.json.
It includes:
Repository structure:

    Z-Anime/
    ├── README.md
    ├── config.json
    │
    ├── diffusion_models/
    │   ├── z-anime-base-bf16.safetensors
    │   ├── z-anime-base-fp8.safetensors
    │   ├── z-anime-distill-8step-bf16.safetensors
    │   ├── z-anime-distill-8step-fp8.safetensors
    │   ├── z-anime-distill-4step-bf16.safetensors
    │   └── z-anime-distill-4step-fp8.safetensors
    │
    ├── gguf/
    │   ├── z-anime-base-q8_0.gguf
    │   └── z-anime-base-q4_k_s.gguf
    │
    ├── aio/
    │   ├── z-anime-base-aio-bf16.safetensors
    │   ├── z-anime-base-aio-fp8.safetensors
    │   ├── z-anime-distill-8step-aio-bf16.safetensors
    │   ├── z-anime-distill-8step-aio-fp8.safetensors
    │   ├── z-anime-distill-4step-aio-bf16.safetensors
    │   └── z-anime-distill-4step-aio-fp8.safetensors
    │
    ├── text_encoder/
    │   ├── qwen_3_4b-bf16.safetensors              # default
    │   ├── qwen_3_4b-fp8.safetensors               # default
    │   ├── qwen_3_4b-engineer-v4-bf16.safetensors  # alternative (BennyDaBall)
    │   └── qwen_3_4b-engineer-v4-fp8.safetensors   # alternative (BennyDaBall)
    │
    ├── vae/
    │   └── ae.safetensors
    │
    ├── diffusers/
    │   ├── model_index.json
    │   ├── scheduler/
    │   ├── tokenizer/
    │   ├── text_encoder/
    │   ├── transformer/   (sharded safetensors + index)
    │   └── vae/
    │
    ├── images/
    │   ├── cover.png
    │   ├── workflow-cover.png
    │   ├── workflow-overview.png
    │   ├── 1.png
    │   ├── 2.png
    │   ├── 3.png
    │   ├── 4.png
    │   ├── 5.png
    │   ├── 6.png
    │   ├── 7.png
    │   ├── 8.png
    │   └── 9.png
    └── workflows/
        └── Z-Anime-Workflow-v1.json

- VAE (ae.safetensors) and Text Encoder (qwen_3_4b.safetensors) included

Credits:
- Tongyi-MAI/Z-Image
- BennyDaBall/Qwen3-4b-Z-Image-Engineer-V4: full fine-tune with SMART training, included as alternative text encoder

Z-Anime is an experimental anime-focused model family built to explore what a full fine-tune on Z-Image Base can achieve in this space.
It is already strong for anime aesthetics, character work, and fast iteration, and future versions will continue to improve diversity, character handling, prompting flexibility, and overall quality.
Z-Anime — anime at its finest, powered by Z-Image Base. 🎌