
This repository provides a number of ControlNet models trained for use with Stable Diffusion 3.5 Large.
The following control types are available:
All currently released ControlNets are compatible only with Stable Diffusion 3.5 Large (8B).
Additional ControlNet models, including 2B versions of the variants above and other control types, will be added to this repository in the future.
Please note: This model is released under the Stability Community License. Visit Stability AI to learn more, or contact us for commercial licensing details.
Here are the key components of the license:
For organizations with annual revenue of more than $1M, please contact us here to inquire about an Enterprise License.
For local or self-hosted use, we recommend ComfyUI for node-based UI inference, or the standalone SD3.5 repo for programmatic use.
Please see the READMEs in the standalone repos for Diffusers support:
Please see the ComfyUI announcement blog post for details on usage within Comfy, including example workflows.
Install the repo:
```sh
git clone git@github.com:Stability-AI/sd3.5.git
pip install -r requirements.txt
```
Then, download the models and sample images like so:
```
input/canny.png
models/clip_g.safetensors
models/clip_l.safetensors
models/t5xxl.safetensors
models/sd3.5_large.safetensors
models/sd3.5_large_controlnet_canny.safetensors
```
and then you can run:

```sh
python sd3_infer.py --controlnet_ckpt models/sd3.5_large_controlnet_canny.safetensors --controlnet_cond_image input/canny.png --prompt "An adorable fluffy pastel creature"
```

This should give you an image like the one below:

The conditioning image must already be preprocessed before it is passed to the standalone repo; sd3.5 itself does not implement the preprocessing code shown below.
Below are code snippets for preprocessing the various control image types.
Canny:

```python
import cv2
import numpy as np
import torchvision.transforms.functional as F

# assuming img is a PIL image
img = F.to_tensor(img)
# convert the CHW float tensor to an HWC uint8 array for OpenCV
img = (img.numpy().transpose(1, 2, 0) * 255).astype(np.uint8)
img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
img = cv2.Canny(img, 100, 200)
```
Blur:

```python
import torchvision.transforms as transforms

# assuming img is a PIL image
# note: kernel_size must be odd; 51 approximates the intended blur strength
gaussian_blur = transforms.GaussianBlur(kernel_size=51)
blurred_image = gaussian_blur(img)
```
Depth:

```python
# install depthfm from https://github.com/CompVis/depth-fm
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF
from depthfm.dfm import DepthFM

depthfm_model = DepthFM(ckpt_path=checkpoint_path)
depthfm_model.eval()

# assuming img is a PIL image
img = TF.to_tensor(img)
c, h, w = img.shape
# add a batch dimension and resize to the model's expected input size
img = F.interpolate(img.unsqueeze(0), (512, 512), mode='bilinear', align_corners=False)
with torch.no_grad():
    img = depthfm_model(img, num_steps=2, ensemble_size=4)
img = F.interpolate(img, (h, w), mode='bilinear', align_corners=False)
```
Optionally, use `--text_encoder_device <device_name>` to load the text encoders directly to VRAM, which can speed up the full inference loop at the cost of extra VRAM usage.

All uses of the model must be in accordance with our Acceptable Use Policy.
The model was not trained to produce factual or true representations of people or events. As such, using the model to generate such content is out of scope for this model.
These models were trained on a wide variety of data, including synthetic data and filtered publicly available data.
We believe in safe, responsible AI practices and take deliberate measures to ensure integrity starts at the early stages of development. This means we have taken, and continue to take, reasonable steps to prevent the misuse of Stable Diffusion 3.5 by bad actors. For more information about our approach to safety, please visit our Safety page.
Our integrity evaluation methods include structured evaluations and red-teaming for certain harms. Testing was conducted primarily in English and may not cover all possible harms.
Please report any issues with the model or contact us: