URL: https://docs.bentoml.com/en/latest/

BentoML Documentation


BentoML is a Unified Inference Platform for deploying and scaling AI systems with any model, on any cloud.

Featured examples

Serve large language models with OpenAI-compatible APIs and vLLM inference backend.

LLM inference: vLLM

Deploy private RAG systems with open-source embedding and large language models.

RAG: Document ingestion and search

Deploy image generation APIs with flexible customization and optimized batch processing.

Stable Diffusion XL Turbo

Automate reproducible workflows with queued execution using ComfyUI pipelines.

ComfyUI: Deploy workflows as APIs

Build a phone calling agent with end-to-end streaming capabilities using open-source models and Twilio.

https://github.com/bentoml/BentoTwilioConversationRelay

Protect your LLM API endpoint from harmful input using Google's safety content moderation model.

LLM safety: ShieldGemma

Explore what developers are building with BentoML.

Overview

What is BentoML

BentoML is a Unified Inference Platform for deploying and scaling AI models with production-grade reliability, all without the complexity of managing infrastructure. It enables your developers to build AI systems 10x faster with custom models, scale efficiently in your cloud, and maintain complete control over security and compliance.

Figure: The architecture diagram of the BentoML unified inference platform

To get started with BentoML:

How-tos

Build your custom AI APIs with BentoML.

Create online API Services

Deploy your AI application to production with one command.

Create Deployments

Configure fast autoscaling to achieve optimal performance.

Concurrency and autoscaling

Run model inference on GPUs with BentoML.

Work with GPUs

Develop with powerful cloud GPUs using your favorite IDE.

Develop with Codespaces

Load and serve your custom models with BentoML.

Load and manage models

Stay informed

The BentoML team uses the following channels to announce important updates, such as major product releases, and to share tutorials, case studies, and community news.

To receive release notifications, star and watch the BentoML project on GitHub. For release notes and detailed changelogs, see the Releases page.