
URL: https://willitrunai.com/image-models

Image Generation Models: VRAM Requirements & GPU Compatibility | WillItRunAI


๐Ÿ‘ Sample images generated by FLUX.2 Dev showing high-quality text-to-image generation
๐Ÿ‘ Black Forest Labs
Black Forest Labs
Flux.2 Dev
32B params · up to 1024×1024 · ~73.8 GB VRAM · frontier
photorealistic · art · design
Top tier

Next-generation text-to-image model from Black Forest Labs. 32B-parameter DiT with a Mistral-Small-3.2-24B text encoder. The transformer weights alone need ~64 GB of VRAM at native BF16 precision; consumer GPUs need Q4/Q8 quantization.

28 inference steps · DiT
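The ~64 GB figure is parameter count times bytes per parameter (32B × 2 bytes at BF16). The sketch below applies the same arithmetic to quantized formats for the 32B transformer only. It assumes GGUF-style Q8_0 and Q4_0 block formats (32 weights per block plus one FP16 scale, i.e. 8.5 and 4.5 bits per weight); the card's "Q4/Q8" may refer to other schemes, and the text encoder, VAE, activations, and runtime overhead are all ignored, so real usage is higher.

```python
# Bits stored per weight. Assumption: GGUF-style Q8_0 / Q4_0 blocks of 32
# weights, each block carrying one FP16 scale (34 and 18 bytes per block).
BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "q8_0": 34 * 8 / 32,  # 8.5 bits per weight
    "q4_0": 18 * 8 / 32,  # 4.5 bits per weight
}


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes).

    Counts weights only: no activations, text encoder, VAE, or runtime overhead.
    """
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


transformer_params = 32e9  # Flux.2 Dev transformer only
for fmt in ("bf16", "q8_0", "q4_0"):
    print(f"{fmt}: {weight_memory_gb(transformer_params, fmt):.1f} GB")
# bf16: 64.0 GB
# q8_0: 34.0 GB
# q4_0: 18.0 GB
```

Even at Q4_0, ~18 GB for the transformer exceeds a 16 GB card before the 24B text encoder is counted, which is why such setups generally also need the encoder offloaded or quantized.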
๐Ÿ‘ Photorealistic image generated by Flux.1 Dev
๐Ÿ‘ Black Forest Labs
Black Forest Labs
Flux.1 Dev
12B params · up to 1024×1024 · ~33.8 GB VRAM · frontier
photorealistic · art · design
Top tier

State-of-the-art text-to-image model from Black Forest Labs. Excels at photorealism, text rendering, and prompt adherence. 12B-parameter DiT architecture with dual text encoders: T5-XXL (4.7B) and CLIP-L (0.12B).

28 inference steps · DiT
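A pipeline loads more than its headline parameter count. Summing the components listed for Flux.1 Dev (12B transformer, 4.7B T5-XXL, 0.12B CLIP-L) at 2 bytes per parameter gives ~33.6 GB, close to the ~33.8 GB on the card; the VAE and runtime buffers plausibly make up the difference. This is a rough sketch that assumes every component stays resident on the GPU in BF16, not necessarily how WillItRunAI computes its figures.

```python
from typing import Dict


def pipeline_weight_gb(components: Dict[str, float], bytes_per_param: float = 2.0) -> float:
    """Sum component sizes (billions of params) and convert to decimal GB.

    Assumes all components are on the GPU at the same precision
    (2 bytes/param = BF16/FP16). Activations and caches are ignored.
    """
    total_params_b = sum(components.values())
    return total_params_b * bytes_per_param  # 1e9 params * bytes / 1e9 = GB


flux1_dev = {
    "transformer": 12.0,  # DiT backbone
    "t5_xxl": 4.7,        # text encoder
    "clip_l": 0.12,       # text encoder
}

print(f"{pipeline_weight_gb(flux1_dev):.1f} GB")  # 33.6 GB
```

The same sum shows why encoder offloading helps: moving T5-XXL off the GPU removes about 9.4 GB (4.7B × 2 bytes) from the resident total.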
๐Ÿ‘ Sample images generated by HunyuanImage 3.0 showcasing diverse visual styles
๐Ÿ‘ Tencent
Tencent
HunyuanImage 3.0
84B params · up to 1024×1024 · ~168.2 GB VRAM · frontier
photorealistic · art · design
Top tier

Massive MoE-based text-to-image model from Tencent. 84B total parameters with ~14B active (Mixture of Experts). Autoregressive + diffusion hybrid architecture. Excellent quality and Chinese/English text rendering. One of the largest open image generation models.

30 inference steps · MoE-DiT
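With a Mixture of Experts, memory and compute scale differently: all 84B parameters must be stored, but only ~14B participate for each token. The sketch below uses the card's figures and assumes BF16 weights; storage alone comes to ~168 GB, matching the listed requirement, while the active share is about one sixth of the model.

```python
from dataclasses import dataclass


@dataclass
class MoEModel:
    total_params: float   # every expert must be resident in memory
    active_params: float  # parameters actually used per token

    def weight_gb(self, bytes_per_param: float = 2.0) -> float:
        """Storage for all weights in decimal GB (BF16 by default)."""
        return self.total_params * bytes_per_param / 1e9

    def active_fraction(self) -> float:
        """Share of parameters that take part in each token's forward pass."""
        return self.active_params / self.total_params


hunyuan = MoEModel(total_params=84e9, active_params=14e9)
print(f"weights: {hunyuan.weight_gb():.0f} GB")               # weights: 168 GB
print(f"active per token: {hunyuan.active_fraction():.1%}")   # active per token: 16.7%
```

Per-token compute therefore tracks the ~14B active parameters roughly, but the memory footprint tracks the full 84B, which is why the model needs multi-GPU or heavily quantized setups.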
๐Ÿ‘ Inpainting example generated by FLUX.1 Fill Dev
๐Ÿ‘ Black Forest Labs
Black Forest Labs
Flux.1 Fill Dev
12B params · up to 1024×1024 · ~33.8 GB VRAM · frontier
inpainting · outpainting · image-editing
Top tier

Inpainting and outpainting specialist built on the Flux.1 architecture. Designed for masked-region generation: object removal, replacement, and image extension. Uses a higher guidance scale (30) and more steps (50) than standard Flux.1 Dev for better mask adherence.

50 inference steps · DiT
๐Ÿ‘ Sample images generated by Qwen-Image showcasing photorealistic and artistic styles
Alibaba / Qwen
Qwen Image
20.4B params · up to 1024×1024 · ~57.6 GB VRAM · frontier
photorealistic · art · design
Top tier

State-of-the-art text-to-image model from the Qwen team. 20.4B DiT with a Qwen2.5-VL (8.3B) text encoder. Excels at photorealism, Chinese/English text rendering, and complex compositions. Apache 2.0 licensed.

30 inference steps · DiT
๐Ÿ‘ Photorealistic images generated by Z-Image Turbo
Alibaba Tongyi-MAI
Z-Image Turbo
6B params · up to 1536×1536 · ~20.2 GB VRAM · frontier
photorealistic · art · fast-generation
Top tier

Ultra-fast image generation model from Alibaba Tongyi-MAI using the S3-DiT architecture. 6B parameters, only 8 inference steps. Fits in 16 GB of VRAM.

8 inference steps · DiT
๐Ÿ‘ Image editing and generation examples from FLUX.1 Kontext Dev
๐Ÿ‘ Black Forest Labs
Black Forest Labs
Flux.1 Kontext Dev
12B params · up to 1024×1024 · ~33.8 GB VRAM · frontier
image-editing · style-transfer · character-consistency
Top tier

Context-aware image editing model from Black Forest Labs. Based on the FLUX.1 DiT architecture, Kontext excels at in-context image editing: style transfer, character consistency across images, text modifications, and object manipulation using natural-language instructions.

28 inference steps · DiT
๐Ÿ‘ Image editing examples from Qwen-Image-Edit showing character consistency and creative editing
Alibaba / Qwen
Qwen Image Edit
20.4B params · up to 1024×1024 · ~57.6 GB VRAM · frontier
image-editing · inpainting · style-transfer
Top tier

Instruction-based image editing model from the Qwen team. Same 20.4B DiT backbone as Qwen-Image, but fine-tuned for image editing tasks: inpainting, style transfer, object removal, and text-guided modifications. Apache 2.0 licensed.

30 inference steps · DiT
๐Ÿ‘ Image generated by FLUX.1 Schnell
๐Ÿ‘ Black Forest Labs
Black Forest Labs
Flux.1 Schnell
12B params · up to 1024×1024 · ~33.8 GB VRAM · frontier
photorealistic · art · fast-generation
High

Distilled version of Flux.1 Dev optimized for speed. Only 4 steps needed (vs 28 for Dev). Same architecture but ~7x faster generation. Apache 2.0 licensed.

4 inference steps · DiT
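The ~7x figure is the step ratio (28 / 4). End-to-end speedup is somewhat lower in practice, because text encoding and VAE decoding cost the same regardless of step count. A minimal linear cost model, where the per-step time and fixed overhead are hypothetical inputs rather than measured values:

```python
def end_to_end_speedup(
    steps_slow: int,
    steps_fast: int,
    seconds_per_step: float,
    fixed_overhead_s: float,
) -> float:
    """Ratio of total generation times under a simple linear cost model.

    total = fixed_overhead (text encoders, VAE decode) + steps * seconds_per_step.
    Assumes identical per-step cost for both models (same architecture).
    """
    slow = fixed_overhead_s + steps_slow * seconds_per_step
    fast = fixed_overhead_s + steps_fast * seconds_per_step
    return slow / fast


# With no fixed overhead, the speedup equals the step ratio.
print(end_to_end_speedup(28, 4, seconds_per_step=1.0, fixed_overhead_s=0.0))  # 7.0
# Hypothetical timings: 0.5 s per step plus 1 s of fixed work.
print(end_to_end_speedup(28, 4, seconds_per_step=0.5, fixed_overhead_s=1.0))  # 5.0
```

The same reasoning applies to SD 3.5 Large Turbo, which also drops from 28 to 4 steps.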
๐Ÿ‘ Image generated by FLUX.2 Klein 9B
๐Ÿ‘ Black Forest Labs
Black Forest Labs
Flux.2 Klein 9B
9B params · up to 1024×1024 · ~27.8 GB VRAM · frontier
photorealistic · art · design
High

Mid-range 9B variant of the FLUX.2 Klein family. Sub-second generation on an H100. DiT architecture with T5-XXL + CLIP-L text encoders (4.82B combined). Higher quality than the 4B sibling while remaining efficient.

20 inference steps · DiT
๐Ÿ‘ High-quality image generated by Stable Diffusion 3.5 Large
๐Ÿ‘ Stability AI
Stability AI
Stable Diffusion 3.5 Large
8.1B params · up to 1024×1024 · ~16.2 GB VRAM · stable
photorealistic · art · text-rendering
High

8.1B MMDiT transformer with a triple text encoder (5.5B combined: T5-XXL 4.7B + CLIP-L 0.123B + OpenCLIP-G 0.695B). Improved text rendering and composition over SDXL.

28 inference steps · MMDiT
๐Ÿ‘ Image generated by Stable Diffusion 3.5 Large Turbo
๐Ÿ‘ Stability AI
Stability AI
Stable Diffusion 3.5 Large Turbo
8.1B params · up to 1024×1024 · ~16.2 GB VRAM · stable
photorealistic · art · fast-generation
High

Distilled version of SD 3.5 Large requiring only 4 inference steps. Same 8.1B MMDiT architecture, but ~7x faster (4 steps vs 28). Good for rapid iteration and previewing.

4 inference steps · MMDiT
๐Ÿ‘ Photorealistic image generated by RealVisXL V5.0
SG161222
RealVisXL v5.0
2.6B params · up to 1024×1024 · ~7 GB VRAM · stable
photorealistic · portrait · landscape
High

The most popular photorealistic SDXL fine-tune on CivitAI. Excels at lifelike portraits, landscapes, and product photography. Compatible with all SDXL ControlNets and LoRAs.

25 inference steps · UNet
๐Ÿ‘ Photorealistic images generated by FLUX.2 Klein 4B
๐Ÿ‘ Black Forest Labs
Black Forest Labs
Flux.2 Klein 4B
4B params · up to 1024×1024 · ~17.8 GB VRAM · frontier
photorealistic · fast-generation · lightweight
High

Lightweight 4B variant of FLUX.2 for efficient generation. Distilled from FLUX.2-dev for faster inference on consumer GPUs. Apache 2.0 licensed, making it the most accessible Flux model for commercial use.

20 inference steps · DiT
๐Ÿ‘ Image generated by Chroma model
Lodestones
Chroma
8.9B params · up to 1024×1024 · ~27.6 GB VRAM · stable
photorealistic · art · fast-generation
High

Community-distilled 8.9B model based on FLUX.1-schnell architecture. Apache 2.0 licensed alternative to Flux with competitive quality. Available in HD and Flash variants for different quality/speed tradeoffs.

4 inference steps · DiT
๐Ÿ‘ High-quality image generated by Juggernaut XL
RunDiffusion
Juggernaut XL v9
2.6B params · up to 1024×1024 · ~7 GB VRAM · stable
photorealistic · portrait · cinematic
High

Premium photorealistic SDXL fine-tune focused on cinematic quality. Known for exceptional skin textures, lighting, and composition. Popular for portrait and fashion photography.

30 inference steps · UNet
๐Ÿ‘ Aesthetic image generated by Playground v2.5
๐Ÿ‘ Playground AI
Playground AI
Playground v2.5
3.5B params · up to 1024×1024 · ~8.8 GB VRAM · stable
photorealistic · aesthetic · art
High

SDXL-based model fine-tuned for exceptional aesthetic quality. Ranks highly in human-preference evaluations. Excellent at photorealism and artistic compositions. Inherits SDXL ControlNet compatibility: canny, depth, and openpose ControlNets work with varying degrees of success.

50 inference steps · UNet
๐Ÿ‘ Creative image generated by DreamShaper XL
Lykon
DreamShaper XL
2.6B params · up to 1024×1024 · ~7 GB VRAM · stable
art · photorealistic · fantasy
High

Versatile SDXL fine-tune known for handling diverse styles, from photorealism to digital art, fantasy, and anime. One of the most downloaded community models.

8 inference steps · UNet
๐Ÿ‘ Anime-style images generated by Animagine XL 4.0
Cagliostro Lab
Animagine XL 4.0
2.6B params · up to 1024×1024 · ~7 GB VRAM · stable
anime · illustration · character-design
High

Latest version of the popular anime-focused SDXL fine-tune from Cagliostro Lab. Successor to Animagine XL 3.1. Improved anime/illustration quality with better character consistency, more accurate tag-based prompting, and cleaner outputs.

28 inference steps · UNet
๐Ÿ‘ Anime-style image generated by Animagine XL 3.1
Cagliostro Lab
Animagine XL 3.1
2.6B params · up to 1024×1024 · ~7 GB VRAM · stable
anime · illustration · character
High

Top anime SDXL fine-tune using Danbooru tag-based prompting. Excellent character generation with consistent anatomy and style. One of the most downloaded anime models on Hugging Face.

28 inference steps · UNet
๐Ÿ‘ Anime illustration generated by Illustrious XL
OnomaAIResearch
Illustrious XL
2.6B params · up to 1024×1024 · ~7 GB VRAM · stable
anime · illustration · character-design
High

SDXL-based anime and illustration foundation model. Trained on a massive curated anime/illustration dataset. Spawned a huge derivative ecosystem on CivitAI with hundreds of fine-tunes.

28 inference steps · UNet
๐Ÿ‘ Detailed image generated by Stable Diffusion XL 1.0
๐Ÿ‘ Stability AI
Stability AI
Stable Diffusion XL 1.0
2.6B params · up to 1024×1024 · ~7 GB VRAM · stable
photorealistic · art · anime
High

Industry standard image generation model. 2.6B UNet with dual text encoder (CLIP ViT-L 0.123B + OpenCLIP ViT-bigG 0.695B). Massive ecosystem of LoRAs, ControlNets, and community resources.

30 inference steps · UNet
๐Ÿ‘ Sample images generated by SDXL-Lightning in just a few inference steps
๐Ÿ‘ ByteDance
ByteDance
SDXL Lightning
2.6B params · up to 1024×1024 · ~7 GB VRAM · stable
fast-generation · real-time · photorealistic
High

Progressive adversarial distillation of SDXL from ByteDance. Available in 1-step, 2-step, 4-step, and 8-step variants via LoRA or full UNet checkpoints. Achieves near-SDXL quality in as few as 2-4 steps, significantly faster than SDXL's standard 25-50 steps.

4 inference steps · UNet
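The LoRA variants of Lightning work by adding a low-rank update to the base UNet's weights. The NumPy sketch below shows the common LoRA merge, W' = W + (alpha / r) · B A, on a single toy matrix. The sizes, rank, and alpha are illustrative assumptions, not Lightning's actual layer shapes; real checkpoints apply this per layer, and LoRA loaders in common inference tools do it for you.

```python
import numpy as np


def merge_lora(W: np.ndarray, A: np.ndarray, B: np.ndarray, alpha: float) -> np.ndarray:
    """Fold a LoRA update into a weight matrix.

    W: (out, in) base weight
    A: (r, in)   down-projection
    B: (out, r)  up-projection
    Uses the common alpha / r scaling convention.
    """
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)


rng = np.random.default_rng(0)
out_dim, in_dim, rank = 640, 640, 8  # toy sizes for illustration only
W = rng.standard_normal((out_dim, in_dim))
A = rng.standard_normal((rank, in_dim))
B = rng.standard_normal((out_dim, rank))

W_merged = merge_lora(W, A, B, alpha=8.0)  # alpha / r = 1 here

# The LoRA stores far fewer numbers than the matrix it modifies,
# and the update it applies has rank at most r.
lora_params = A.size + B.size
print(lora_params, W.size)                  # 10240 409600
print(np.linalg.matrix_rank(W_merged - W))  # 8
```

Merging once up front avoids the extra matmul per step at inference time; keeping the LoRA separate instead lets you swap step-count variants without reloading the base weights.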
๐Ÿ‘ Stylized illustration generated by Pony Diffusion V6 XL
PurpleSmartAI
Pony Diffusion V6 XL
2.6B params · up to 1024×1024 · ~7 GB VRAM · stable
furry · anthropomorphic · stylized
High

Specialized SDXL fine-tune primarily for furry, anthropomorphic, and stylized character art. Uses a score-based prompt system (score_9, score_8_up). Also capable of anime and general illustration, but requires specific prompting syntax.

25 inference steps · UNet
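Pony's score tags are ordinary prompt text placed at the front of the prompt. A small helper makes the convention explicit. The `score_9, score_8_up` prefix comes from the card above; the helper name and the longer `score_7_up` chain in the second call are illustrative assumptions based on common community usage, not an official specification.

```python
from typing import Iterable

# Prefix from the card above; longer score_N_up chains are a community habit.
DEFAULT_SCORE_TAGS = ("score_9", "score_8_up")


def build_pony_prompt(subject_tags: Iterable[str], score_tags: Iterable[str] = DEFAULT_SCORE_TAGS) -> str:
    """Return a comma-separated prompt: score tags first, then subject tags."""
    tags = [t.strip() for t in [*score_tags, *subject_tags]]
    return ", ".join(t for t in tags if t)  # drop empty entries


print(build_pony_prompt(["anthro fox", "forest", "digital painting"]))
# score_9, score_8_up, anthro fox, forest, digital painting

# Hypothetical longer prefix, as seen in many community prompts:
print(build_pony_prompt(["1girl", "solo"], score_tags=["score_9", "score_8_up", "score_7_up"]))
```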
๐Ÿ‘ Image generated by Kwai Kolors
๐Ÿ‘ Kwai
Kwai
Kolors
2.6B params · up to 1024×1024 · ~17.8 GB VRAM · stable
photorealistic · art · multilingual
High

Bilingual Chinese + English text-to-image model from Kwai. Uses SDXL UNet (2.6B) with ChatGLM3-6B (6.2B) as text encoder instead of CLIP, enabling strong multilingual prompt understanding. Apache 2.0 licensed.

50 inference steps · UNet
๐Ÿ‘ Image generated by Stable Diffusion 3.5 Medium
๐Ÿ‘ Stability AI
Stability AI
Stable Diffusion 3.5 Medium
2.5B params · up to 1024×1024 · ~15.2 GB VRAM · stable
photorealistic · art · design
High

Lightweight 2.5B MMDiT-X model balancing quality and accessibility. Runs on consumer GPUs with 8GB+ VRAM. Good prompt adherence with a triple text encoder (5.5B combined: T5-XXL + CLIP-L + OpenCLIP-G).

28 inference steps · MMDiT
๐Ÿ‘ Sample images generated by Stable Cascade
๐Ÿ‘ Stability AI
Stability AI
Stable Cascade
3.6B params · up to 1024×1024 · ~9 GB VRAM · stable
photorealistic · art · design
High

Cascade pipeline from Stability AI built on the Würstchen architecture. Stage C (~3.6B) generates in a very small latent space, Stage B (~1.5B) decodes that into a larger latent, and a small Stage A decoder (~20M) produces the full-resolution image. More VRAM-efficient than single-stage models of similar quality.

20 inference steps · DiT
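The "very small latent space" is what keeps Stage C cheap. Stability AI describes Stable Cascade's compression factor as 42 (a 1024×1024 image encodes to a 24×24 latent), versus a factor of 8 for the SD 1.5/SDXL VAE; both factors come from Stability's own descriptions, not from this page. The sketch compares the number of spatial positions each diffusion model must process. Channel counts differ between the two latents, so this compares spatial size only.

```python
def latent_side(image_side: int, compression: int) -> int:
    """Spatial size of the latent for a square image (floor division)."""
    return image_side // compression


image = 1024
sd_latent = latent_side(image, 8)         # SD 1.5 / SDXL VAE: 128
cascade_latent = latent_side(image, 42)   # Stable Cascade Stage C: 24

sd_positions = sd_latent ** 2             # 16384
cascade_positions = cascade_latent ** 2   # 576
print(sd_latent, cascade_latent)          # 128 24
print(f"{sd_positions / cascade_positions:.1f}x fewer spatial positions")  # 28.4x fewer spatial positions
```

Since attention cost grows with the number of positions, a roughly 28x smaller grid is a large saving for the stage that does most of the generative work.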
๐Ÿ‘ Image generated by AuraFlow v0.3
๐Ÿ‘ fal
fal
AuraFlow v0.3
6.35B params · up to 1536×1536 · ~15.3 GB VRAM · beta
art · photorealistic
Mid

Open-source DiT model from fal.ai combining MMDiT and single DiT blocks in a Flux-like hybrid architecture. 6.35B transformer with a T5-XL (~1.2B) text encoder. Apache 2.0 licensed, so fully open for commercial use.

50 inference steps · DiT
๐Ÿ‘ Image generated by SDXL Turbo in a single step
๐Ÿ‘ Stability AI
Stability AI
SDXL Turbo
2.6B params · up to 512×512 · ~7 GB VRAM · stable
fast-generation · real-time · prototyping
Mid

Adversarial distillation of SDXL for near-real-time image generation. 2.6B UNet; only 1-4 steps needed. Quality is lower than SDXL base, but generation is almost instant. Great for real-time previewing.

1 inference step · UNet
๐Ÿ‘ Detailed image generated by PixArt-Sigma
๐Ÿ‘ PixArt
PixArt
PixArt-Sigma
0.611B params · up to 1024×1024 · ~10.8 GB VRAM · stable
art · design · fast-generation
Mid

Ultra-lightweight DiT model with only 0.6B parameters. Generates 1024px images with surprisingly good quality for its size. Uses a T5-XXL text encoder for strong prompt adherence despite its small transformer.

20 inference steps · DiT
๐Ÿ‘ Photorealistic portrait generated by Realistic Vision v5.1
SG161222
Realistic Vision v5.1
0.86B params · up to 768×768 · ~2.1 GB VRAM · stable
photorealistic · portrait
Mid

The gold standard for photorealism on SD 1.5. Generates remarkably lifelike portraits with only 4GB VRAM. Massive LoRA and ControlNet ecosystem inherited from SD 1.5.

25 inference steps · UNet
๐Ÿ‘ Image generated by DreamShaper 8
Lykon
DreamShaper 8
0.86B params · up to 768×768 · ~2.1 GB VRAM · stable
art · anime · photorealistic
Mid

Versatile SD 1.5 fine-tune handling diverse styles from photorealism to anime and fantasy art. One of the most popular community checkpoints, runs on 4GB+ VRAM.

25 inference steps · UNet
๐Ÿ‘ Sample images generated by SD Turbo in a single inference step
๐Ÿ‘ Stability AI
Stability AI
SD Turbo
0.86B params · up to 512×512 · ~2.1 GB VRAM · stable
fast-generation · real-time · prototyping
Mid

Adversarial distillation of SD 2.1 for single-step image generation. Its 0.86B UNet makes it one of the smallest and fastest Stable Diffusion variants. Quality is lower than SD 2.1, but generation is nearly instant. Ideal for real-time interactive use.

1 inference step · UNet
๐Ÿ‘ Images generated by LCM DreamShaper v7 in 4 steps
SimianLuo
LCM DreamShaper v7
0.86B params · up to 768×768 · ~2.1 GB VRAM · stable
fast-generation · real-time · art
Mid

Pioneer of Latent Consistency Models (LCM). SD 1.5 based model that generates images in only 1-4 steps, enabling near-real-time generation. Runs on 4GB+ VRAM. MIT licensed.

4 inference steps · UNet
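LCM's 1-4 step generation relies on multistep consistency sampling: at each step the model predicts the clean latent directly, and the sampler re-noises that prediction to the next, lower noise level before predicting again. The NumPy sketch below shows that loop. `consistency_fn` is a hypothetical stand-in for the distilled UNet, and the four signal levels are illustrative values, not LCM's actual timestep schedule.

```python
from typing import Callable, Sequence

import numpy as np


def multistep_consistency_sample(
    consistency_fn: Callable[[np.ndarray, float], np.ndarray],
    alphas_cumprod: Sequence[float],
    shape: tuple,
    seed: int = 0,
) -> np.ndarray:
    """Multistep consistency sampling.

    consistency_fn(z_t, alpha_bar) predicts the clean latent x0 from noisy z_t.
    alphas_cumprod lists signal levels from noisiest to cleanest, one per step.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(shape)             # start from pure noise
    x0 = consistency_fn(z, alphas_cumprod[0])  # one-shot prediction
    for a_bar in alphas_cumprod[1:]:
        noise = rng.standard_normal(shape)
        z = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise  # re-noise to this level
        x0 = consistency_fn(z, a_bar)                           # predict x0 again
    return x0


# Toy check: if the "dataset" is a single point, the ideal consistency
# function maps every noisy input to that point.
target = np.full((4, 4), 0.5)
calls = []


def toy_consistency_fn(z: np.ndarray, a_bar: float) -> np.ndarray:
    calls.append(a_bar)  # record one model evaluation per step
    return target


schedule = [0.01, 0.3, 0.7, 0.95]  # 4 steps, noisiest first (illustrative)
result = multistep_consistency_sample(toy_consistency_fn, schedule, shape=(4, 4))
print(len(calls), np.allclose(result, target))  # 4 True
```

Each step costs one model evaluation, so a 4-entry schedule means 4 UNet calls; passing a single-entry schedule gives one-step generation at lower quality.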
๐Ÿ‘ Image generated by Stable Diffusion 1.5
๐Ÿ‘ Stability AI
Stability AI
Stable Diffusion 1.5
0.86B params · up to 512×512 · ~2.1 GB VRAM · legacy
art · anime · fast-generation
Mid

The original widely adopted image generation model. Extremely lightweight: runs on 4GB VRAM. Massive legacy ecosystem of checkpoints, LoRAs, and tools. Still preferred for speed and low-VRAM scenarios.

20 inference steps · UNet