
How to Set Up Z-Image Turbo in ComfyUI: Complete Workflow Guide
Step-by-step instructions for installing and configuring Z-Image Turbo in ComfyUI, including model downloads, directory structure, node setup, and optimization tips.
This guide covers how to set up Z-Image Turbo in ComfyUI for local image generation. All instructions are based on the official ComfyUI documentation and Hugging Face model repositories.
No GPU? Use Z-Image Turbo online — generate images directly in your browser without local installation.
Prerequisites
Before starting, ensure you have:
- ComfyUI installed (nightly version recommended)
- GPU with at least 16GB VRAM (RTX 4090, RTX 3090, or similar)
- Python 3.10 or later
- Sufficient disk space (~15GB for model files)
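Part of this checklist can be verified from a script. The following is a minimal sketch, assuming the Python 3.10, ~15GB disk, and 16GB VRAM figures quoted above; the function names are illustrative, and the VRAM check only reports a value if PyTorch is importable and a CUDA device is visible.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)
MIN_FREE_DISK_GB = 15  # approximate model size quoted in this guide


def python_ok(version=None):
    """Return True if the interpreter is Python 3.10 or later."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def disk_ok(path=".", required_gb=MIN_FREE_DISK_GB):
    """Return True if the filesystem holding `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


def vram_gb():
    """Return total VRAM of GPU 0 in GB, or None if it cannot be determined."""
    try:
        import torch  # optional: present in a ComfyUI environment, not required here
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1e9
```

Run `python_ok()`, `disk_ok("path/to/ComfyUI")`, and `vram_gb()` from the Python environment that ComfyUI uses, and compare the VRAM figure against the 16GB guideline.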
Required Model Files
Z-Image Turbo requires three model files to function in ComfyUI:
1. Text Encoder
Download qwen_3_4b.safetensors from the Comfy-Org repository.
This is the Qwen3 4B language model that processes your text prompts. It enables the strong prompt understanding that Z-Image Turbo is known for.
2. Diffusion Model
Download z_image_turbo_bf16.safetensors from the Comfy-Org Z-Image-Turbo repository.
This is the main 6B parameter diffusion model that generates images.
3. VAE (Variational Autoencoder)
Download ae.safetensors—the FLUX VAE that works with Z-Image Turbo.
The VAE handles encoding and decoding between latent space and pixel space.
Directory Structure
Place downloaded files in your ComfyUI installation:
ComfyUI/
└── models/
    ├── text_encoders/
    │   └── qwen_3_4b.safetensors
    ├── diffusion_models/
    │   └── z_image_turbo_bf16.safetensors
    └── vae/
        └── ae.safetensors

Create any missing directories before copying files.
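The layout can also be created and checked programmatically. This is a minimal sketch using only the folder and file names listed in this guide; `prepare_model_dirs` is an illustrative helper, not part of ComfyUI.

```python
from pathlib import Path

# Folder -> expected file, taken from the directory layout in this guide.
MODEL_LAYOUT = {
    "text_encoders": "qwen_3_4b.safetensors",
    "diffusion_models": "z_image_turbo_bf16.safetensors",
    "vae": "ae.safetensors",
}


def prepare_model_dirs(comfy_root):
    """Create the model folders if needed and return the paths of missing files."""
    models = Path(comfy_root) / "models"
    missing = []
    for folder, filename in MODEL_LAYOUT.items():
        target_dir = models / folder
        target_dir.mkdir(parents=True, exist_ok=True)
        if not (target_dir / filename).is_file():
            missing.append(target_dir / filename)
    return missing
```

Calling `prepare_model_dirs("ComfyUI")` before starting the server returns an empty list once all three files are in place.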
Loading the Workflow
ComfyUI provides official workflow templates for Z-Image Turbo:
- Open ComfyUI in your browser
- Navigate to Workflow Templates in the menu
- Search for "Z-Image" or "Z-Image-Turbo"
- Load the text-to-image workflow
Alternatively, download workflow JSON files from the ComfyUI examples repository.
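A workflow can also be queued without the browser. The sketch below assumes ComfyUI is running at its default local address, `http://127.0.0.1:8188`, and that the workflow file was exported in API format (the regular UI-format JSON will not work with this endpoint). The helper names and the file name in the usage note are hypothetical.

```python
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"  # assumption: default local server address


def build_prompt_request(workflow, base_url=COMFYUI_URL):
    """Wrap an API-format workflow dict in a POST request for the /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(path, base_url=COMFYUI_URL):
    """Load an API-format workflow file and submit it to a running ComfyUI server."""
    with open(path, "r", encoding="utf-8") as f:
        workflow = json.load(f)
    with urllib.request.urlopen(build_prompt_request(workflow, base_url)) as resp:
        return json.loads(resp.read())
```

For example, `queue_workflow("z_image_turbo_api.json")` would submit a workflow saved under that (hypothetical) name and return the server's JSON response.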
Node Configuration
The basic Z-Image Turbo workflow uses these nodes:
Load Text Encoder Node
- Select qwen_3_4b.safetensors
- This loads the language model for prompt processing
Load Diffusion Model Node
- Select z_image_turbo_bf16.safetensors
- Model type: Diffusion Transformer
Load VAE Node
- Select ae.safetensors
- Used for final image decoding
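In ComfyUI's API-format JSON, the three loaders might look roughly like the fragment below. This is a sketch under assumptions: the class names (`CLIPLoader`, `UNETLoader`, `VAELoader`), the input keys, and especially the `type` value `lumina2` are not stated in this guide and should be confirmed against the official template. The node IDs are arbitrary.

```python
# Assumed API-format representation of the three loader nodes.
# Verify class names, input keys, and the "type" value against the official template.
LOADER_NODES = {
    "1": {
        "class_type": "CLIPLoader",  # text encoder loader
        "inputs": {"clip_name": "qwen_3_4b.safetensors", "type": "lumina2"},
    },
    "2": {
        "class_type": "UNETLoader",  # diffusion model loader
        "inputs": {
            "unet_name": "z_image_turbo_bf16.safetensors",
            "weight_dtype": "default",
        },
    },
    "3": {
        "class_type": "VAELoader",
        "inputs": {"vae_name": "ae.safetensors"},
    },
}


def selected_files(nodes):
    """Return the model filename chosen in each loader node, keyed by class type."""
    result = {}
    for node in nodes.values():
        for key, value in node["inputs"].items():
            if key.endswith("_name"):
                result[node["class_type"]] = value
    return result
```

`selected_files(LOADER_NODES)` gives a quick way to compare the filenames a workflow expects with the files actually present in the models folders.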
Sampler Settings
Z-Image Turbo uses specific sampler parameters:
| Parameter | Value |
|---|---|
| Steps | 9 |
| CFG Scale | 0.0 |
| Sampler | euler |
| Scheduler | simple |
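The guidance scale is 0.0 because Z-Image Turbo is a distilled model that does not require classifier-free guidance. Note that ComfyUI's KSampler treats a CFG of 1.0 as guidance disabled, so if the official template ships with 1.0, keep the template's value.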
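Expressed as an API-format KSampler node, the settings could be sketched as follows. The values are copied from the table in this guide; the input keys follow ComfyUI's built-in KSampler node, the `ksampler_node` helper is illustrative, and the node IDs in the links are placeholders. If the official template uses different values, prefer the template.

```python
# Values copied from the sampler table in this guide.
SAMPLER_SETTINGS = {
    "steps": 9,
    "cfg": 0.0,
    "sampler_name": "euler",
    "scheduler": "simple",
}


def ksampler_node(model, positive, negative, latent, seed=0):
    """Build a KSampler node dict in ComfyUI's API format.

    Each link argument is a [node_id, output_index] pair pointing at another node.
    """
    inputs = dict(SAMPLER_SETTINGS)
    inputs.update(
        {
            "seed": seed,
            "denoise": 1.0,  # full denoising for text-to-image
            "model": model,
            "positive": positive,
            "negative": negative,
            "latent_image": latent,
        }
    )
    return {"class_type": "KSampler", "inputs": inputs}
```

For instance, `ksampler_node(["2", 0], ["4", 0], ["5", 0], ["6", 0], seed=42)` wires the sampler to placeholder nodes 2, 4, 5, and 6.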
Resolution Settings
Z-Image Turbo supports various resolutions. Recommended options:
- 1024 × 1024 — Standard square format
- 1280 × 720 — 16:9 landscape
- 720 × 1280 — 9:16 portrait
- 1024 × 576 — Cinematic widescreen
Higher resolutions require more VRAM. If you encounter out-of-memory errors, reduce dimensions or enable memory optimizations.
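To compare the options, or to pick a smaller size after an out-of-memory error, a small helper is enough. This sketch assumes that image sides should stay multiples of 16, which holds for every resolution listed above but is not stated by the guide; `describe` and `scale_down` are illustrative names.

```python
from math import gcd

# Recommended resolutions from this guide, as (width, height).
RESOLUTIONS = [(1024, 1024), (1280, 720), (720, 1280), (1024, 576)]


def describe(width, height):
    """Return (aspect_ratio, megapixels) for a resolution."""
    d = gcd(width, height)
    return f"{width // d}:{height // d}", width * height / 1_000_000


def scale_down(width, height, factor=0.75, multiple=16):
    """Shrink a resolution by `factor`, snapping each side down to a multiple of `multiple`."""

    def snap(value):
        return max(multiple, int(value * factor) // multiple * multiple)

    return snap(width), snap(height)


for w, h in RESOLUTIONS:
    ratio, mp = describe(w, h)
    print(f"{w}x{h}: {ratio}, {mp:.2f} MP")
```

Pixel count is a rough proxy for memory cost: 1024 × 576 has a little over half the pixels of 1024 × 1024, so it is a reasonable first fallback.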
Memory Optimization
For GPUs with limited VRAM, apply these optimizations:
Enable Attention Slicing
In ComfyUI settings, enable attention slicing to reduce peak memory usage at the cost of slightly slower generation.
Use FP8 Quantized Model
If 16GB VRAM is insufficient, use the FP8 quantized version:
- Download z_image_turbo_fp8.safetensors instead of the bf16 version
- Reduces memory usage to ~10GB
- Minor quality reduction
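The saving comes from storing each weight in one byte instead of two. The sketch below estimates the size of the weights alone for the 6B diffusion model; it is back-of-the-envelope arithmetic, not a measurement. Actual usage is higher because the text encoder, VAE, and activations also need memory, which is presumably why the ~10GB figure above exceeds the raw FP8 weight size.

```python
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_memory_gb(params_billion, dtype):
    """Approximate size of the weights alone, in GB (1 GB = 10**9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


for dtype in ("bf16", "fp8"):
    size = weight_memory_gb(6, dtype)
    print(f"6B diffusion model in {dtype}: ~{size} GB of weights")
```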
CPU Offloading
Enable model offloading in ComfyUI to move inactive model components to CPU RAM during generation.
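In many ComfyUI installations, memory behavior is set with launch flags rather than an in-app toggle. The sketch below builds a launch command under that assumption. The flags `--lowvram` and `--use-split-cross-attention` are believed to exist in ComfyUI but should be confirmed with `python main.py --help` for your version; `comfyui_command` is an illustrative helper.

```python
import sys


def comfyui_command(low_vram=False, split_attention=False, main_script="main.py"):
    """Build the argument list for launching ComfyUI with memory-saving flags."""
    args = [sys.executable, main_script]
    if low_vram:
        args.append("--lowvram")  # assumed flag: offload more aggressively to system RAM
    if split_attention:
        args.append("--use-split-cross-attention")  # assumed flag: lower peak attention memory
    return args
```

The resulting list can be passed to `subprocess.run(..., cwd="path/to/ComfyUI")` or simply printed and copied into a terminal.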
ControlNet Integration (Optional)
Z-Image Turbo supports ControlNet for guided generation:
Download ControlNet Model
Get Z-Image-Turbo-Fun-Controlnet-Union.safetensors and place it in:
ComfyUI/models/controlnet/
Workflow with ControlNet
- Add a "Load ControlNet Model" node
- Select the Z-Image ControlNet Union model
- Connect your reference image through a preprocessor (Canny, Depth, Pose)
- Connect ControlNet output to the sampler
Common ControlNet modes:
- Canny — Edge detection for structural guidance
- Depth — Depth maps for spatial composition
- DWPose — Human pose estimation
Performance Expectations
Based on user reports and official documentation:
| GPU | Resolution | Generation Time |
|---|---|---|
| RTX 4090 | 1024×1024 | ~5 seconds |
| RTX 3090 | 1024×1024 | ~13 seconds |
| RTX 4070 Ti | 1024×1024 | ~20 seconds |
| RTX 3080 (10GB) | 1024×1024 | ~30 seconds (with FP8) |
Times are approximate and vary based on system configuration.
Troubleshooting
"Model not found" Error
Verify file paths match exactly. Check that:
- Files are in correct directories
- Filenames match what ComfyUI expects
- No extra extensions or typos
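These checks can be automated by looking for files that resemble the expected name without matching it exactly, such as a browser-added `.txt` suffix or a ` (1)` duplicate marker. The following is a minimal sketch; `find_near_misses` is an illustrative helper, and its prefix match is deliberately loose, so short names like `ae` may produce false positives.

```python
from pathlib import Path


def find_near_misses(folder, expected_name):
    """List files that look like `expected_name` but do not match it exactly."""
    stem = expected_name.lower().rsplit(".", 1)[0]
    hits = []
    for path in Path(folder).iterdir():
        if not path.is_file() or path.name == expected_name:
            continue
        # Loose match: same leading stem, ignoring case.
        if path.name.lower().startswith(stem):
            hits.append(path.name)
    return sorted(hits)
```

For example, `find_near_misses("ComfyUI/models/vae", "ae.safetensors")` would report a file saved as `ae.safetensors.txt`.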
Out of Memory Error
- Reduce resolution
- Use FP8 quantized model
- Enable attention slicing
- Close other GPU applications
Slow First Generation
The first generation after loading models is slower due to CUDA kernel compilation. Subsequent generations run at normal speed.
Black or Corrupted Output
- Ensure VAE is loaded correctly
- Check that sampler settings match (steps: 9, CFG: 0.0)
- Verify bfloat16 is supported by your GPU
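The bfloat16 check can be done from the Python environment that ComfyUI runs in. This is a minimal sketch that relies on PyTorch's `torch.cuda.is_bf16_supported()`; it returns `None` when PyTorch is not installed or no CUDA device is visible, and the wrapper name is illustrative.

```python
def bf16_supported():
    """Return True or False if PyTorch can report bfloat16 support, else None."""
    try:
        import torch  # present in a ComfyUI environment
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return bool(torch.cuda.is_bf16_supported())


print("bfloat16 supported:", bf16_supported())
```

If this reports `False`, the FP8 variant described under Memory Optimization may be worth trying instead of the bf16 file.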
Next Steps
Once your basic workflow runs correctly:
- Experiment with different prompts to test text rendering
- Try ControlNet for consistent character poses
- Train custom LoRAs for specific styles
Related Resources
- Use Z-Image Turbo Online — No installation required
- LoRA Training Guide — Create custom style and character models
- Train LoRAs on our Platform — Web-based training, no GPU needed
- Z-Image Turbo vs FLUX — Technical comparison
- View Pricing — Check our plans for online generation