Browse DeepInfra models:

Bria/

video_foreground_mask

Automatically identify and segment foreground objects across video frames and generate a mask. No prompts, just a video.

Partner

$0.1400 / second

Bria/

video_increase_resolution

Increase video resolution up to 8K with advanced AI upscaling. Bring your videos to the big screen, ready for the screens of tomorrow.

Partner

$0.1400 / second

Bria/

video_mask_by_key_points

Identify and segment objects across video frames using specific coordinate points. Just point in the right direction and the model will figure out by itself which object should be masked.

Partner

$0.1400 / second

Bria/

video_mask_by_prompt

Identify and segment objects across video frames using a text prompt. The easiest way to create a mask to modify your videos.

Partner

$0.1400 / second

Bria/

video_remove_background

Light and fast. Remove the background of your videos to bring the foreground elements to focus. No more unwanted distractions.

Partner

$0.0042 / second

ByteDance/

Seedance-1.5-Pro

ByteDance's Seedance 1.5 Pro is a professional video model that uses native V2A generation to produce integrated, synchronized audio-visual output, improving the efficiency of professional video creation.

Partner

$1.200 / 1M tokens

ByteDance/

Seedance-2.0

A new-generation, professional-grade multimodal video creation model developed by ByteDance. It supports video generation with multimodal reference inputs, including images, videos, and audio.

Partner

$4.300 / 1M tokens

FastVideo/

LTX-2.3-Distilled-Diffusers

A fast, step-distilled build of Lightricks' LTX-2.3 diffusion-transformer video model (distilled by FastVideo). Generates high-fidelity video from text or image inputs in just a few denoising steps.

$0.0350 / second

Pixverse/

Pixverse-6-I2V

PixVerse V6 redefines AI video by shifting from isolated generation to a unified, model-driven workflow. Key upgrades include 15-second durations at 1080p resolution and a multi-shot engine. This transition allows creators to move beyond short clips toward meaningful narrative production and professional-grade marketing assets suitable for 2026 digital distribution standards.

Partner

$0.045 / second

Pixverse/

Pixverse-6-T2V

Partner

$0.045 / second

Pixverse/

Pixverse-T2V

PixVerse's 720p resolution offers a fast and reliable option for generating standard HD videos, ideal for quick previews and social media content where generation speed is prioritized over maximum detail.

Partner

$0.20 / video

Pixverse/

Pixverse-T2V-HD

The 1080p high-fidelity mode in PixVerse renders videos with significantly enhanced sharpness and visual clarity, capturing intricate details and providing a crisp, professional-grade quality suitable for more polished projects.

Partner

$0.40 / video

PrunaAI/

p-video

Real-time AI video generation from text, images, and audio. Supports up to 1080p at 48 FPS with built-in audio generation, draft mode for 4x faster previews, and prompt upsampling.

Partner

$0.02 / second

PrunaAI/

p-video-avatar

Pruna's talking-head video generation model. Provide a portrait image and either a speech script or an audio file, and the model generates a realistic video of the person speaking. Supports multiple voices, languages, and output resolutions.

Partner

$0.025 / second

Wan-AI/

Wan2.2-T2V-A14B

Wan2.2 T2V A14B is a next-generation 14B-parameter video foundation model by Wan-AI featuring a novel two-stage denoising architecture. It produces 480p videos with improved visual coherence and detail, generating 2- or 5-second clips at 16 fps from text prompts.

$0.0360 / second

Wan-AI/

Wan2.6-I2V

Turn any image into a video. Intelligent shot scheduling supports multi-shot storytelling, producing narrative videos with consistent subjects, scenes, and atmosphere.

Partner

$0.10 / second

Wan-AI/

Wan2.6-T2V

Turn any prompt into a smooth video. Intelligent shot scheduling supports multi-shot storytelling, producing narrative videos with consistent subjects, scenes, and atmosphere.

Partner

$0.10 / second

Wan-AI/

Wan2.7-I2V

Generates video content from images while stably preserving details such as subject, style, and text elements. Ensures visual consistency and information fidelity throughout dynamic transitions.

Partner

$0.10 / second

Wan-AI/

Wan2.7-R2V

Accurately preserve the look and voice of people or objects from a reference video, supporting multi-reference co-creation.

Partner

$0.10 / second

google/

veo-3.1

Veo 3.1 is the latest text-to-video model from Google that generates high-fidelity, cinematic videos with synchronized audio from a simple text prompt. It excels at creating realistic and imaginative scenes with a deep understanding of natural language and visual dynamics.

Partner

$0.4000 / second
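
For the models priced per second, a rough cost estimate is simple arithmetic. The sketch below is illustrative only: the `estimate_cost` helper and the `PRICE_PER_SECOND` table are hypothetical names, the prices are copied from the listing above, and it assumes billing scales linearly with the billed seconds, with no minimum charge or rounding. How the service actually measures a billed second is not stated here.

```python
from decimal import Decimal

# Per-second prices in USD, copied from the listing above.
# Only a few per-second models are included for illustration.
PRICE_PER_SECOND = {
    "Bria/video_remove_background": Decimal("0.0042"),
    "Wan-AI/Wan2.6-T2V": Decimal("0.10"),
    "google/veo-3.1": Decimal("0.4000"),
}


def estimate_cost(model: str, seconds: int) -> Decimal:
    """Estimate the USD cost for a number of billed seconds.

    Assumption: cost = price per second * seconds, with no minimum
    charge or rounding applied by the provider.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return PRICE_PER_SECOND[model] * seconds


if __name__ == "__main__":
    # An 8-second clip on google/veo-3.1 at $0.4000 / second.
    print(estimate_cost("google/veo-3.1", 8))  # prints 3.2000
```

Decimal is used instead of float so that small per-second prices multiply without binary rounding noise. Models priced per video or per 1M tokens would need a different formula.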