Black Forest Labs
Flux.2 Dev
Next-generation text-to-image model from Black Forest Labs. 32B parameter DiT with Mistral-Small-3.2-24B text encoder. Requires ~64GB+ VRAM at full precision; consumer GPUs need Q4/Q8 quantization.
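The VRAM figure above follows from simple arithmetic on the parameter count. A minimal sketch, assuming nominal sizes of 2 bytes per parameter at 16-bit precision, 1 byte at Q8, and 0.5 bytes at Q4; real quantized formats add some overhead, and the text encoder, VAE, and activations are not counted. The `weight_gb` helper is hypothetical and only illustrates the estimate.

```python
# Rough weight-only memory estimate for a diffusion transformer.
# Assumed nominal bytes per parameter; real quantized formats carry extra overhead.
BYTES_PER_PARAM = {"bf16": 2.0, "q8": 1.0, "q4": 0.5}


def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * BYTES_PER_PARAM[precision]


for precision in ("bf16", "q8", "q4"):
    print(f"32B DiT at {precision}: ~{weight_gb(32, precision):.0f} GB")
```

Under these assumptions the 32B transformer alone needs about 64 GB at 16-bit precision, which is why quantized weights are the practical route on consumer GPUs.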
Black Forest Labs
Flux.1 Dev
State-of-the-art text-to-image model from Black Forest Labs. Excels at photorealism, text rendering, and prompt adherence. 12B parameter DiT architecture with dual text encoders: T5-XXL (4.7B) and CLIP-L (0.12B).
Tencent
HunyuanImage 3.0
Massive MoE-based text-to-image model from Tencent. 84B total parameters with ~14B active (Mixture of Experts). Autoregressive + diffusion hybrid architecture. Excellent quality and Chinese/English text rendering. One of the largest open image generation models.
Black Forest Labs
Flux.1 Fill Dev
Inpainting and outpainting specialist built on the Flux.1 architecture. Designed for masked-region generation: object removal, replacement, and image extension. Uses a higher guidance scale (30) and more steps (50) than standard Flux.1 Dev for optimal mask adherence.
Alibaba / Qwen
Qwen Image
State-of-the-art text-to-image model from the Qwen team. 20.4B DiT with Qwen2.5-VL (8.3B) text encoder. Excels at photorealism, Chinese/English text rendering, and complex compositions. Apache 2.0 licensed.
Alibaba Tongyi-MAI
Z-Image Turbo
Ultra-fast image generation model from Alibaba Tongyi-MAI using S3-DiT architecture. 6B parameters, only 8 inference steps. Fits in 16GB VRAM.
Black Forest Labs
Flux.1 Kontext Dev
Context-aware image editing model from Black Forest Labs. Based on FLUX.1 DiT architecture, Kontext excels at in-context image editing: style transfer, character consistency across images, text modifications, and object manipulation using natural language instructions.
Alibaba / Qwen
Qwen Image Edit
Instruction-based image editing model from the Qwen team. Same 20.4B DiT backbone as Qwen-Image but fine-tuned for image editing tasks: inpainting, style transfer, object removal, and text-guided modifications. Apache 2.0 licensed.
Black Forest Labs
Flux.1 Schnell
Distilled version of Flux.1 Dev optimized for speed. Only 4 steps needed (vs 28 for Dev). Same architecture but ~7x faster generation. Apache 2.0 licensed.
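The ~7x figure matches the ratio of sampling steps. A minimal sketch, assuming each denoising step costs roughly the same for both models (they share an architecture) and ignoring fixed costs such as text encoding and VAE decoding; `step_speedup` is a hypothetical helper.

```python
def step_speedup(baseline_steps: int, distilled_steps: int) -> float:
    """Speedup from running fewer steps, assuming equal cost per step."""
    return baseline_steps / distilled_steps


# Step counts taken from the entry above: 28 for Dev, 4 for Schnell.
print(f"~{step_speedup(28, 4):.0f}x fewer steps")
```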
Black Forest Labs
Flux.2 Klein 9B
Mid-range 9B variant of FLUX.2 Klein family. Sub-second generation on H100. DiT architecture with T5-XXL + CLIP-L text encoders (4.82B combined). Higher quality than the 4B sibling while remaining efficient.
Stability AI
Stable Diffusion 3.5 Large
8.1B MMDiT transformer with triple text encoder (5.5B combined: T5-XXL 4.7B + CLIP-L 0.123B + OpenCLIP-G 0.695B). Improved text rendering and composition over SDXL.
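The combined text-encoder figure can be checked by summing the three sizes listed in the entry. A small sketch; the dictionary and helper names are illustrative only.

```python
# Text encoder sizes in billions of parameters, as listed in the entry above.
ENCODERS_B = {"T5-XXL": 4.7, "CLIP-L": 0.123, "OpenCLIP-G": 0.695}


def combined_params_b(encoders: dict[str, float]) -> float:
    """Sum the parameter counts (in billions) of all text encoders."""
    return sum(encoders.values())


total = combined_params_b(ENCODERS_B)
print(f"{total:.3f}B combined, ~{total:.1f}B rounded")
```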
Stability AI
Stable Diffusion 3.5 Large Turbo
Distilled version of SD 3.5 Large requiring only 4 inference steps. Same 8.1B MMDiT architecture but ~7x faster. Good for rapid iteration and previewing.
SG161222
RealVisXL v5.0
The most popular photorealistic SDXL fine-tune on CivitAI. Excels at lifelike portraits, landscapes, and product photography. Compatible with all SDXL ControlNets and LoRAs.
Black Forest Labs
Flux.2 Klein 4B
Lightweight 4B variant of FLUX.2 for efficient generation. Distilled from FLUX.2-dev for faster inference on consumer GPUs. Apache 2.0 licensed, making it the most accessible Flux model for commercial use.
Lodestones
Chroma
Community-distilled 8.9B model based on FLUX.1-schnell architecture. Apache 2.0 licensed alternative to Flux with competitive quality. Available in HD and Flash variants for different quality/speed tradeoffs.
RunDiffusion
Juggernaut XL v9
Premium photorealistic SDXL fine-tune focused on cinematic quality. Known for exceptional skin textures, lighting, and composition. Popular for portrait and fashion photography.
Playground AI
Playground v2.5
SDXL-based model fine-tuned for exceptional aesthetic quality. Consistently ranked top on human preference benchmarks. Excellent at photorealism and artistic compositions. Inherits SDXL ControlNet compatibility: canny, depth, and openpose ControlNets work with varying degrees of success.
Lykon
DreamShaper XL
Versatile SDXL fine-tune known for handling diverse styles, from photorealism to digital art, fantasy, and anime. One of the most downloaded community models.
Cagliostro Lab
Animagine XL 4.0
Latest version of the popular anime-focused SDXL fine-tune from Cagliostro Lab. Successor to Animagine XL 3.1. Improved anime/illustration quality with better character consistency, more accurate tag-based prompting, and cleaner outputs.
Cagliostro Lab
Animagine XL 3.1
Top anime SDXL fine-tune using Danbooru tag-based prompting. Excellent character generation with consistent anatomy and style. One of the most downloaded anime models on HuggingFace.
OnomaAIResearch
Illustrious XL
SDXL-based anime and illustration foundation model. Trained on a massive curated anime/illustration dataset. Spawned a huge derivative ecosystem on CivitAI with hundreds of fine-tunes.
Stability AI
Stable Diffusion XL 1.0
Industry standard image generation model. 2.6B UNet with dual text encoder (CLIP ViT-L 0.123B + OpenCLIP ViT-bigG 0.695B). Massive ecosystem of LoRAs, ControlNets, and community resources.
ByteDance
SDXL Lightning
Progressive distillation of SDXL from ByteDance. Available in 1-step, 2-step, 4-step, and 8-step variants via LoRA or full UNet checkpoints. Achieves near SDXL quality in as few as 2-4 steps, significantly faster than SDXL's standard 25-50 steps.
PurpleSmartAI
Pony Diffusion V6 XL
Specialized SDXL fine-tune primarily for furry, anthropomorphic, and stylized character art. Uses a score-based prompt system (score_9, score_8_up). Also capable of anime and general illustration but requires specific prompting syntax.
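The score-based syntax amounts to prefixing the prompt with quality tags. A minimal sketch using only the two tags named in the entry; the `pony_prompt` helper and the example description are hypothetical, and the model's own documentation may recommend a longer tag chain.

```python
# Quality tags named in the entry above; other tags would be an assumption.
SCORE_TAGS = ("score_9", "score_8_up")


def pony_prompt(description: str, score_tags: tuple[str, ...] = SCORE_TAGS) -> str:
    """Prefix a prompt with comma-separated quality score tags."""
    return ", ".join([*score_tags, description.strip()])


print(pony_prompt("anthro fox knight, detailed armor"))
```

The resulting string is passed as the ordinary text prompt; the tags steer the model toward images it learned to associate with higher quality ratings.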
Kwai
Kolors
Bilingual Chinese + English text-to-image model from Kwai. Uses the SDXL UNet (2.6B) with ChatGLM3-6B (6.2B) as the text encoder instead of CLIP, enabling strong multilingual prompt understanding. Apache 2.0 licensed.
Stability AI
Stable Diffusion 3.5 Medium
Lightweight 2.5B MMDiT-X model balancing quality and accessibility. Runs on consumer GPUs with 8GB+ VRAM. Good prompt adherence with triple text encoder (5.5B combined: T5-XXL + CLIP-L + OpenCLIP-G).
Stability AI
Stable Cascade
Two-stage cascade pipeline from Stability AI using the Würstchen architecture. Stage C (~3.6B) generates in a very small latent space, then Stage B (~1.5B) decodes to full resolution. More VRAM-efficient than single-stage models of similar quality.
fal
AuraFlow v0.3
Open-source DiT model from fal.ai combining MMDiT and single DiT blocks in a Flux-like hybrid architecture. 6.35B transformer with T5-XL (~1.2B) text encoder. Apache 2.0 licensed and fully open for commercial use.
Stability AI
SDXL Turbo
Adversarial distillation of SDXL for near real-time image generation. 2.6B UNet, only 1-4 steps needed. Quality is lower than SDXL base but generation is almost instant. Great for real-time previewing.
PixArt
PixArt-Sigma
Ultra-lightweight DiT model with only 0.6B parameters. Generates 1024px images with surprisingly good quality for its size. Uses a T5-XXL text encoder for strong prompt adherence despite the small transformer.
SG161222
Realistic Vision v5.1
The gold standard for photorealism on SD 1.5. Generates remarkably lifelike portraits with only 4GB VRAM. Massive LoRA and ControlNet ecosystem inherited from SD 1.5.
Lykon
DreamShaper 8
Versatile SD 1.5 fine-tune handling diverse styles from photorealism to anime and fantasy art. One of the most popular community checkpoints, runs on 4GB+ VRAM.
Stability AI
SD Turbo
Adversarial distillation of Stable Diffusion 2.1 for single-step image generation. Only a 0.86B UNet, the smallest and fastest Stable Diffusion variant. Quality is lower than the base model but generation is nearly instant. Ideal for real-time interactive use.
SimianLuo
LCM DreamShaper v7
Pioneer of Latent Consistency Models (LCM). SD 1.5 based model that generates images in only 1-4 steps, enabling near-real-time generation. Runs on 4GB+ VRAM. MIT licensed.
Stability AI
Stable Diffusion 1.5
The original widely-adopted image generation model. Extremely lightweight: runs on 4GB VRAM. Massive legacy ecosystem of checkpoints, LoRAs, and tools. Still preferred for speed and low VRAM scenarios.
