PRXPixel (text-to-image, pixel space)
PRXPixel is a pixel-space variant of PRX:
it denoises raw RGB pixels directly (no VAE), conditions on a Qwen3-VL text encoder (rather than T5Gemma),
and feeds the generation resolution into the timestep modulation. The denoiser is a ~7B-parameter
PRXTransformer2DModel with a bottleneck patch projection and a resolution embedder.
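The paragraph above names two components without specifying them: a bottleneck patch projection and a resolution embedder that feeds the timestep modulation. The toy sketch below shows one plausible reading, in pure Python so it runs anywhere. Raw RGB patches are projected down to a narrow bottleneck and back up to the model width, and the target height and width are embedded and added to the timestep embedding before it produces adaLN-style scale and shift values. The patch size, widths, sinusoidal embeddings, summing scheme, and every name here (`ToyPixelEmbedder`, `patchify`, and so on) are hypothetical illustrations, not the actual layers of PRXTransformer2DModel.

```python
import math
import random


def init_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows (rows x cols)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def patchify(image, p):
    """Split an H x W x C image (nested lists) into flattened p*p*C patches, row-major."""
    h, w = len(image), len(image[0])
    if h % p or w % p:
        raise ValueError("image size must be divisible by the patch size")
    return [
        [c for di in range(p) for dj in range(p) for c in image[i + di][j + dj]]
        for i in range(0, h, p)
        for j in range(0, w, p)
    ]


def linear(weight, x):
    """y = W x, with W stored as a list of rows (out_dim x in_dim)."""
    return [sum(w_k * x_k for w_k, x_k in zip(row, x)) for row in weight]


def sinusoidal(value, dim):
    """Sinusoidal embedding of a scalar; dim must be even."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * k / half) for k in range(half)]
    return [math.sin(value * f) for f in freqs] + [math.cos(value * f) for f in freqs]


class ToyPixelEmbedder:
    """Hypothetical stand-in for the patch projection and resolution-aware conditioning."""

    def __init__(self, patch=4, channels=3, bottleneck=8, dim=16, seed=0):
        rng = random.Random(seed)
        self.patch, self.dim = patch, dim
        self.down = init_matrix(bottleneck, patch * patch * channels, rng)  # p*p*C -> bottleneck
        self.up = init_matrix(dim, bottleneck, rng)                         # bottleneck -> model width
        self.to_mod = init_matrix(2 * dim, dim, rng)                        # conditioning -> scale, shift

    def tokens(self, image):
        # Bottleneck patch projection: two thin linear maps instead of one wide one.
        return [linear(self.up, linear(self.down, p)) for p in patchify(image, self.patch)]

    def conditioning(self, t, height, width):
        # Assumed resolution embedder: embed height and width (concatenated halves)
        # and add the result to the timestep embedding.
        half = self.dim // 2
        res_emb = sinusoidal(height, half) + sinusoidal(width, half)  # list concatenation
        return [a + b for a, b in zip(sinusoidal(1000.0 * t, self.dim), res_emb)]

    def modulate(self, tokens, t, height, width):
        # adaLN-style modulation: the conditioning vector yields a per-channel scale and shift.
        mod = linear(self.to_mod, self.conditioning(t, height, width))
        scale, shift = mod[: self.dim], mod[self.dim:]
        return [[x * (1.0 + s) + b for x, s, b in zip(tok, scale, shift)] for tok in tokens]


rng = random.Random(1)
image = [[[rng.random() for _ in range(3)] for _ in range(8)] for _ in range(8)]  # 8x8 RGB
emb = ToyPixelEmbedder()
toks = emb.tokens(image)
print(len(toks), len(toks[0]))  # 4 16  (a 2x2 grid of 4x4 patches, width 16)
hi_res = emb.modulate(toks, t=0.5, height=1024, width=1024)
lo_res = emb.modulate(toks, t=0.5, height=512, width=512)
print(hi_res == lo_res)  # False: same timestep, different resolution, different modulation
```

In this reading, the design choice is that the same denoising timestep can be treated differently depending on the requested output size, which is one way to interpret "feeds the generation resolution into the timestep modulation".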
- Resolution: 1024
- Transformer: ~7B params, torch.bfloat16
- Text encoder: Qwen3-VL text tower (Qwen3VLTextModel)
- VAE: none (pixel space)
- Scheduler: FlowMatchEulerDiscreteScheduler
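The component list implies a few back-of-the-envelope numbers. The sketch below computes the weights-only memory of a ~7B-parameter transformer stored in bfloat16 (2 bytes per parameter), and how many patch tokens a 1024 image produces when raw pixels are patchified with no VAE. It assumes a square 1024x1024 image, and the patch sizes 16 and 32 are hypothetical: the card does not state PRXPixel's patch size. The estimate excludes the text encoder and activations.

```python
def bf16_weight_gib(num_params: float) -> float:
    """Weights-only footprint in GiB at 2 bytes per bfloat16 parameter."""
    return num_params * 2 / 2**30


def pixel_tokens(resolution: int, patch_size: int) -> int:
    """Patch tokens for a square image when raw pixels are patchified directly (no VAE)."""
    if resolution % patch_size:
        raise ValueError("resolution must be divisible by patch_size")
    return (resolution // patch_size) ** 2


def values_per_patch(patch_size: int, channels: int = 3) -> int:
    """Raw RGB values that a single patch token has to encode."""
    return patch_size * patch_size * channels


# ~7B transformer params; excludes the text encoder and activations.
print(f"{bf16_weight_gib(7e9):.1f} GiB")  # 13.0 GiB

# Hypothetical patch sizes, for illustration only.
for p in (16, 32):
    print(p, pixel_tokens(1024, p), values_per_patch(p))
# 16 4096 768
# 32 1024 3072
```

Larger patches cut the token count quadratically, but each token must then carry more raw pixel values. That trade-off is one plausible motivation for a bottleneck patch projection in a pixel-space model.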
Requirements
PRXPixelPipeline is not yet in a released version of diffusers. Install diffusers from the branch that adds it, and use transformers >= 4.57 (the version that introduced Qwen3VLTextModel):
pip install "transformers>=4.57"
pip install "git+https://github.com/huggingface/diffusers.git@prx-pixel-pipeline"
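Because the pipeline only exists on a branch, a quick preflight check can save a confusing ImportError later. This is a hedged sketch using only the standard library. Its simplified version parser compares the leading numeric parts and ignores pre-release tags such as `.dev0`, and probing for `PRXPixelPipeline` with `hasattr` is just one way to detect the branch install; the function names are illustrative.

```python
from importlib import metadata

MIN_TRANSFORMERS = "4.57"


def version_tuple(version: str) -> tuple:
    """Leading numeric (major, minor, patch). Simplified: ignores pre-release/dev suffixes."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    # Tuple comparison avoids the string-comparison trap where "4.9" > "4.57".
    return version_tuple(installed) >= version_tuple(minimum)


def preflight() -> list:
    """Return human-readable problems; an empty list means the environment looks ready."""
    problems = []
    try:
        installed = metadata.version("transformers")
        if not meets_minimum(installed):
            problems.append(f"transformers {installed} < {MIN_TRANSFORMERS}")
    except metadata.PackageNotFoundError:
        problems.append("transformers is not installed")
    try:
        import diffusers
    except ImportError:
        problems.append("diffusers is not installed")
    else:
        # diffusers exposes pipelines lazily; a missing name means this build lacks the pipeline.
        if not hasattr(diffusers, "PRXPixelPipeline"):
            problems.append("this diffusers build has no PRXPixelPipeline; install the branch")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "environment looks ready")
```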
Usage
import torch
from diffusers import PRXPixelPipeline

# Load the pipeline with bfloat16 weights and move it to the GPU
pipe = PRXPixelPipeline.from_pretrained("Photoroom/prxpixel-t2i", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A front-facing portrait of a lion in the golden savanna at sunset."
# 28 sampling steps; guidance_scale sets how strongly the prompt steers generation
image = pipe(prompt, num_inference_steps=28, guidance_scale=5.0).images[0]
image.save("prxpixel_output.png")
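In the usage example, num_inference_steps and guidance_scale parameterize a sampling loop. The toy below sketches what such a loop does under stated assumptions: an evenly spaced sigma schedule (the real FlowMatchEulerDiscreteScheduler may shift or rescale it), the rectified-flow convention x = (1 - sigma) * x0 + sigma * noise with the model predicting velocity, and the standard classifier-free guidance formula. None of this is taken from PRXPixelPipeline's source. The hypothetical `oracle` function stands in for the transformer with an exact analytic velocity so the result can be checked.

```python
def flow_match_sigmas(num_steps: int) -> list:
    """Noise levels from 1.0 (pure noise) to 0.0 (clean), evenly spaced.

    Simplification: the real scheduler may shift or rescale this schedule.
    """
    return [1.0 - i / num_steps for i in range(num_steps + 1)]


def apply_cfg(v_uncond, v_cond, scale):
    """Classifier-free guidance: extrapolate from the unconditional toward the conditional prediction."""
    return [u + scale * (c - u) for u, c in zip(v_uncond, v_cond)]


def sample(velocity_fn, noise, num_inference_steps=28, guidance_scale=5.0):
    """Euler integration of a flow-matching ODE from noise (sigma=1) to data (sigma=0)."""
    x = list(noise)
    sigmas = flow_match_sigmas(num_inference_steps)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = apply_cfg(
            velocity_fn(x, sigma, conditional=False),
            velocity_fn(x, sigma, conditional=True),
            guidance_scale,
        )
        # sigma_next < sigma, so each step moves x from the noise end toward the data end.
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x


def oracle(cond_target, uncond_target):
    """Toy 'model': exact velocity (x - x0) / sigma for the path x = (1 - sigma) * x0 + sigma * noise."""
    def velocity_fn(x, sigma, conditional):
        x0 = cond_target if conditional else uncond_target
        return [(xi - ti) / sigma for xi, ti in zip(x, x0)]
    return velocity_fn


noise = [1.0, 0.3, -0.7]
target = [0.2, -0.5, 0.9]
# Identical conditional/unconditional predictions: guidance has no effect.
print([round(v, 6) for v in sample(oracle(target, target), noise)])  # [0.2, -0.5, 0.9]
# Different predictions: scale 5.0 overshoots to u + 5 * (c - u) = 0 + 5 * (1 - 0).
print([round(v, 6) for v in sample(oracle([1.0], [0.0]), [0.4])])  # [5.0]
```

Because the toy velocity is exact and the path is linear, Euler lands on the target regardless of step count. With a real network, more steps reduce integration error, and a higher guidance_scale pushes samples further from the unconditional prediction.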
License
Released under the Apache 2.0 license. See LICENSE and NOTICE.