Browse DeepInfra models: all the categories and models you can try out and use directly on DeepInfra.

zero-shot-image-classification

openai/clip-vit-base-patch32

The CLIP model was developed by OpenAI to investigate the robustness of computer vision models. It pairs a Vision Transformer (ViT-B/32) image encoder with a Transformer text encoder, trained contrastively on a large dataset of image-caption pairs. The model shows promise across a range of computer vision tasks but also has limitations, including difficulty with fine-grained classification and potential biases in certain applications.

$0.0005 / second

image-classification
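CLIP classifies images zero-shot by embedding the image and a text prompt for each candidate label into a shared space, then converting their cosine similarities into probabilities with a softmax. Below is a minimal, dependency-free sketch of that scoring step, assuming the embeddings have already been computed. The helper names (`normalize`, `zero_shot_probs`), the toy 3-dimensional vectors, and the fixed `logit_scale` of 100 (the cap CLIP places on its learned logit scale) are illustrative assumptions, not DeepInfra's API.

```python
import math
from typing import Dict, List, Sequence


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(
    image_emb: Sequence[float],
    label_embs: Dict[str, Sequence[float]],
    logit_scale: float = 100.0,  # assumption: fixed scale instead of a learned one
) -> Dict[str, float]:
    """Score each candidate label against one image, CLIP-style.

    logits = logit_scale * cos(image, text); probabilities = softmax(logits).
    """
    img = normalize(image_emb)
    logits = {}
    for label, emb in label_embs.items():
        txt = normalize(emb)
        logits[label] = logit_scale * sum(a * b for a, b in zip(img, txt))
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits.values())
    exps = {label: math.exp(v - peak) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical 3-d toy embeddings; real CLIP ViT-B/32 embeddings are 512-d
# and come from the model's image and text encoders.
image = [0.9, 0.1, 0.0]
labels = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
    "a photo of a car": [0.0, 0.0, 1.0],
}
probs = zero_shot_probs(image, labels)
print(max(probs, key=probs.get))
```

In practice the embeddings would come from the model itself, for example via Hugging Face `transformers`' `CLIPModel.get_image_features` and `CLIPModel.get_text_features`. Wrapping each label in a prompt such as "a photo of a {label}" follows the prompt style used in the CLIP paper.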

openai/clip-vit-large-patch14-336

A zero-shot image classification model released by OpenAI. The clip-vit-large-patch14-336 model was trained from scratch on an unknown dataset, and its results on the evaluation set are not reported. Its intended uses and limitations, as well as its training and evaluation data, are not documented. The optimizer and training precision are also unspecified; the framework versions used were Transformers 4.21.3, TensorFlow 2.8.2, and Tokenizers 0.12.1.

$0.0005 / second
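Both models on this page are listed at $0.0005 per second. The sketch below turns that rate into a cost estimate, assuming billing is linear in seconds of inference time with no minimum charge or rounding; the page does not state these details, and `estimate_cost` is a hypothetical helper. It uses `Decimal` to avoid floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

# Listed rate for both CLIP models on this page, in USD per second.
PRICE_PER_SECOND = Decimal("0.0005")


def estimate_cost(seconds: float, price_per_second: Decimal = PRICE_PER_SECOND) -> Decimal:
    """Estimate cost, assuming billing is linear in seconds of inference time."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # Convert via str() so a float like 1.5 becomes Decimal("1.5"), not its binary expansion.
    return Decimal(str(seconds)) * price_per_second


print(estimate_cost(3600))  # one hour of inference time
```

Under these assumptions, one hour (3,600 seconds) of inference time comes to $1.80.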
