Serve Examples

Below are tutorials for exploring Ray Serve capabilities and learning how to integrate different modeling frameworks.

ML Applications

Serve ML Models

Serve a Stable Diffusion Model

Serve a Text Classification Model

Serve an Object Detection Model

Serve a Chatbot with Request and Response Streaming

Video Analysis Inference Pipeline

AI Accelerators

Serve an Inference Model on AWS NeuronCores Using FastAPI

Serve a Stable Diffusion Model on AWS NeuronCores Using FastAPI

Serve a Model on an Intel Gaudi Accelerator

Integrations

Scale a Gradio App with Ray Serve

Serve a Text Generator with Request Batching

Serve Models with Triton Server in Ray Serve

Serve a Java App

Asynchronous Inference Using Ray Serve

Integrate with MLflow Model Registry

LLM Applications

Serve DeepSeek

Deploy a small-sized LLM

Deploy a medium-sized LLM

Deploy a large-sized LLM

Deploy a vision LLM

Deploy a reasoning LLM

Deploy a hybrid reasoning LLM

Deploy gpt-oss

Deployment Patterns

Model Multiplexing with Forecasting Models

Model Composition for Recommendation Systems