This is the XL (4B) Base variant of ACE-Step 1.5 — a larger DiT decoder with ~4B parameters for higher audio quality. It is the foundation model supporting all tasks: text-to-music, cover, repaint, extract, lego, and complete.
## XL Architecture

| Parameter | Value |
|---|---|
| DiT Decoder hidden_size | 2560 |
| DiT Decoder layers | 32 |
| DiT Decoder attention heads | 32 |
| Encoder hidden_size | 2048 |
| Encoder layers | 8 |
| Total params | ~4B |
| Weights size (bf16) | ~18.8 GB |
| Inference steps | 50 (with CFG) |
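As a rough sanity check on the ~4B total, the classic back-of-envelope estimate of about 12·hidden² parameters per transformer layer (attention plus a 4× MLP) puts the decoder and encoder stacks in the right ballpark. This sketch assumes standard attention/MLP shapes and ignores embeddings, conditioning modules, and normalization, which account for the remainder:

```python
# Back-of-envelope parameter estimate for the XL configuration.
# Assumes standard attention (4*h^2) plus a 4x-expansion MLP (8*h^2)
# per layer; embeddings and conditioning layers are not counted.
def layer_params(hidden: int) -> int:
    return 12 * hidden * hidden

decoder = 32 * layer_params(2560)  # DiT decoder: 32 layers, hidden 2560
encoder = 8 * layer_params(2048)   # encoder: 8 layers, hidden 2048
total = decoder + encoder

print(f"decoder ≈ {decoder / 1e9:.2f}B")  # ≈ 2.52B
print(f"encoder ≈ {encoder / 1e9:.2f}B")  # ≈ 0.40B
print(f"core    ≈ {total / 1e9:.2f}B")    # ≈ 2.92B, before embeddings etc.
```

The core stacks alone land near 3B; the rest of the ~4B budget sits in components this estimate deliberately skips.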
## GPU Requirements

| VRAM | Support |
|---|---|
| ≥12 GB | With CPU offload + INT8 quantization |
| ≥16 GB | With CPU offload |
| ≥20 GB | Without offload |
| ≥24 GB | Full quality (XL + 4B LM) |
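The table above maps directly to a small selection rule: pick the least constrained mode your GPU can hold. A minimal sketch, where `recommended_mode` is an illustrative helper and not part of the ACE-Step API:

```python
# Illustrative helper encoding the VRAM table above; not part of the
# ACE-Step codebase. Returns the least constrained mode that fits.
def recommended_mode(vram_gb: float) -> str:
    if vram_gb >= 24:
        return "full quality (XL + 4B LM)"
    if vram_gb >= 20:
        return "no offload"
    if vram_gb >= 16:
        return "CPU offload"
    if vram_gb >= 12:
        return "CPU offload + INT8 quantization"
    return "insufficient VRAM (<12 GB)"

print(recommended_mode(16))  # CPU offload
print(recommended_mode(24))  # full quality (XL + 4B LM)
```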
All LM models (0.6B / 1.7B / 4B) are fully compatible with XL.
## Key Features

- 💰 **Commercial-Ready**: Trained on legally compliant datasets. Generated music can be used for commercial purposes.
- 📚 **Safe Training Data**: Licensed music, royalty-free/public domain, and synthetic (MIDI-to-Audio) data.
- 🎯 **Full Task Support**: Text2Music, Cover, Repaint, Extract, Lego, Complete.
- 🔮 **Higher Quality**: 4B parameters provide richer audio quality than the 2B variants.
## Quick Start

```bash
# Install ACE-Step
git clone https://github.com/ace-step/ACE-Step-1.5.git
cd ACE-Step-1.5
pip install -e .

# Download this model
huggingface-cli download ACE-Step/acestep-v15-xl-base --local-dir ./checkpoints/acestep-v15-xl-base

# Run with the Gradio UI
python acestep --config-path acestep-v15-xl-base
```