FastVideo/
$0.0350 / second
A fast, step-distilled build of Lightricks' LTX-2.3 diffusion-transformer video model (distilled by FastVideo). Generates high-fidelity text-to-video and image-to-video in just a few denoising steps.
Prompt
Text prompt describing the video content.
Negative Prompt
Negative text prompt (optional); leave blank to fall back to the model's default negative prompt. (Default: uncanny face, mask-like, plastic skin, doll-like, waxy, mannequin, cgi, 3d render, deformed face, distorted face, extra fingers, deformed hands, blurry, washed out, vintage, 1970s, sepia, grainy, low quality)
Seconds
Clip duration: always 5 seconds (fixed/required for this model).
Resolution
Output resolution: always 1080p (fixed/required for this model).
Orientation
Output orientation: always landscape (fixed/required for this model).
Seed
Specify a seed for reproducible output (Default: empty)
LTX-2.3 is a diffusion-transformer (DiT) audio-video foundation model from Lightricks that generates high-fidelity video with synchronized audio from text or a starting image. This endpoint serves the distilled variant, accelerated with FastVideo (Hao AI Lab, UCSD) to produce results in only a few denoising steps.
Provide a descriptive prompt. For image-to-video, also pass image_url (an http(s) URL or a data: URI). Use negative_prompt to steer away from unwanted artifacts and seed for reproducible results. Detailed, concrete prompts (subject, action, setting, lighting, camera motion, and any sound or dialogue) produce the strongest results; for image-to-video, describe the motion you want applied to the supplied image.
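The inputs described above can be assembled into a request body along these lines. This is a minimal sketch, not the documented client: the field names prompt, image_url, negative_prompt, and seed come from this page, while the helper names build_payload and to_data_uri, the JSON body shape, and the integer seed type are assumptions made for illustration. The endpoint URL and authentication are not given here, so the sketch stops at building the payload and does not send it.

```python
import base64
import json
import mimetypes
from typing import Optional


def to_data_uri(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a data: URI (hypothetical helper).

    The MIME type is guessed from the filename extension.
    """
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(
    prompt: str,
    image_url: Optional[str] = None,
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
) -> dict:
    """Build a request body from the inputs listed on this page (hypothetical helper).

    Duration, resolution, and orientation are fixed for this model,
    so they are not included.
    """
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    payload = {"prompt": prompt.strip()}

    if image_url is not None:
        # The page allows an http(s) URL or a data: URI for image-to-video.
        if not image_url.lower().startswith(("http://", "https://", "data:")):
            raise ValueError("image_url must be an http(s) URL or a data: URI")
        payload["image_url"] = image_url

    # Omitting negative_prompt lets the model fall back to its default negative prompt.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt

    # Assumption: the seed is sent as an integer.
    if seed is not None:
        payload["seed"] = int(seed)

    return payload


if __name__ == "__main__":
    body = build_payload(
        "A red fox trots through fresh snow at dawn, slow dolly-in, soft wind audio",
        image_url="https://example.com/fox.png",  # placeholder URL
        seed=42,
    )
    print(json.dumps(body, indent=2))
```

Leaving out optional fields, rather than sending empty strings, keeps the model's defaults in effect; reusing the same seed with the same prompt is what the page describes as giving reproducible output.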
© 2026 DeepInfra. All rights reserved.