Ray Serve

ML library for model deployment and serving. Anyscale supports and further optimizes Ray Serve for improved performance, reliability, and scale.

~50% reduction in total ML inferencing costs for Samsara

240,000 cores for model serving deployed with Ray Serve at Ant Group

Up to 60% higher QPS serving with optimized version of Ray Serve (vs. open source Ray Serve)

Up to 50% fewer nodes with features like Replica Compaction (compared to open source Ray)

What is Ray Serve?

Ray Serve is a scalable model serving library for building online inference applications, offering features like model composition, model multiplexing, and built-in autoscaling.
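To make the idea of request-based autoscaling concrete, here is a minimal sketch of a scaling rule: pick enough replicas that each one handles roughly a target number of in-flight requests, then clamp to a configured range. The function and parameter names are chosen for illustration, and the rule is a simplified assumption, not Ray Serve's actual implementation (which also smooths and delays its decisions).

```python
import math


def desired_replicas(ongoing_requests, target_per_replica,
                     min_replicas=1, max_replicas=10):
    """Hypothetical scaling rule: replicas needed so that each one
    handles about `target_per_replica` in-flight requests."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # Round up so the per-replica load never exceeds the target.
    raw = math.ceil(ongoing_requests / target_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, raw))
```

With a target of 10 requests per replica, 45 in-flight requests would call for 5 replicas, while zero traffic would fall back to the configured minimum.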

Because Ray Serve is framework-agnostic, you can use a single toolkit to serve deep learning models built with any ML framework, such as PyTorch or TensorFlow.

Plus, Ray Serve has several features and performance optimizations for serving LLMs such as response streaming, dynamic request batching, multi-node/multi-GPU serving, and more.
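Dynamic request batching can be illustrated without any serving framework. The sketch below uses only Python's asyncio: individual requests are queued, and a background worker groups them into a batch until either a size limit or a short timeout is reached, then runs the model once per batch. The class `DynamicBatcher`, the stand-in model `double_all`, and all parameter names are hypothetical; Ray Serve exposes batching through its own `@serve.batch` decorator, which this sketch does not reproduce.

```python
import asyncio


class DynamicBatcher:
    """Hypothetical helper: groups single requests into batches."""

    def __init__(self, batch_fn, max_batch_size=4, wait_timeout_s=0.01):
        self.batch_fn = batch_fn              # takes a list, returns a list
        self.max_batch_size = max_batch_size
        self.wait_timeout_s = wait_timeout_s
        self.queue = asyncio.Queue()
        self.batch_sizes = []                 # recorded for inspection

    async def submit(self, item):
        """Enqueue one request and wait for its individual result."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((item, future))
        return await future

    async def run(self):
        """Worker loop: build a batch, call the model once, fan results out."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until the first request
            deadline = loop.time() + self.wait_timeout_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(
                        await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            results = self.batch_fn([item for item, _ in batch])
            self.batch_sizes.append(len(batch))
            for (_, future), result in zip(batch, results):
                future.set_result(result)


def double_all(inputs):
    """Stand-in for a model that is cheaper to call on a whole batch."""
    return [x * 2 for x in inputs]


async def demo():
    batcher = DynamicBatcher(double_all, max_batch_size=4, wait_timeout_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(6)))
    worker.cancel()
    return results, batcher.batch_sizes
```

Running `asyncio.run(demo())` sends six concurrent requests through the batcher; each caller gets back only its own result, while the model function is invoked on groups of at most four inputs. The timeout trades a small amount of latency for larger batches.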

Ray Serve Feature Highlights

Model Composition

Integrate multiple ML models with separate resource requirements and autoscaling needs within one deployment. Orchestrate processing workflows at scale with Ray Serve.
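As a rough illustration of the composition pattern, the sketch below wires three independent stages behind one entry point using plain asyncio: a preprocessing step feeds two models that run concurrently, and their outputs are merged into a single response. All class names and the toy "models" are hypothetical stand-ins; in Ray Serve each stage would be its own deployment with its own resources and replica count, which this framework-free sketch does not model.

```python
import asyncio


class Preprocessor:
    async def __call__(self, text):
        # Normalize the raw input before it reaches the models.
        return text.strip().lower()


class SentimentModel:
    async def __call__(self, text):
        # Toy stand-in for a real classifier.
        return "positive" if "good" in text else "negative"


class LengthModel:
    async def __call__(self, text):
        # Toy stand-in for a second, unrelated model.
        return len(text.split())


class Pipeline:
    """Single entry point that orchestrates the other stages."""

    def __init__(self, preprocessor, sentiment, length):
        self.preprocessor = preprocessor
        self.sentiment = sentiment
        self.length = length

    async def __call__(self, raw_text):
        clean = await self.preprocessor(raw_text)
        # Fan out to both models concurrently, then combine the results.
        sentiment, num_words = await asyncio.gather(
            self.sentiment(clean), self.length(clean))
        return {"sentiment": sentiment, "num_words": num_words}


pipeline = Pipeline(Preprocessor(), SentimentModel(), LengthModel())
```

Calling `asyncio.run(pipeline("  This is GOOD  "))` returns one combined result. Because the stages only communicate through calls, each could be scaled or replaced independently of the others.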

Supercharge Ray Serve with Anyscale

Node Startup in 60s or Less

We know how important it is to serve your models quickly, which is why Anyscale nodes start up in 60 seconds or less, compared with a 5+ minute average for competitors.

How Samsara Reduced LLM Inference Costs by ~50% with Ray Serve

Discover how introducing Ray Serve dramatically improved Samsara’s production ML pipeline performance and led to a nearly 50% reduction in total LLM inferencing costs.

Feature Comparison

Feature	Amazon SageMaker	Ray	Anyscale
Runtime: Performance and Cost (scale from your laptop to 1,000s of nodes easily)	N/A	–	
Production Readiness (production services support for model training and deployment)		Limited	
Cloud and GPU Support (launch your cluster on any cloud with any accelerator)	N/A	Limited	
Many Model Patterns	Limited		
Support (led by the creators and maintainers of Ray)	–	Limited	

Out-of-the-Box Templates & App Accelerators

Jumpstart your development process with custom-made templates, only available on Anyscale.

Deploy LLMs

Base models, LoRA adapters, and embedding models. Deploy with optimized RayLLM.

Deploy Stable Diffusion

Text-to-image generation model by Stability AI. Deploy with Ray Serve.

Ray Serve with Triton

Optimize performance for Stable Diffusion with Triton on Ray Serve.

👁 Canva Logo Black

“We have no ceiling on scale, and an incredible opportunity to bring AI features and value to our 170 million users.”

Greg Roodt
ML Lead, Canva

Related Resources

Learn more about why Anyscale’s Ray Serve is the leader for distributed model deployment and serving.

Free Template: Deploy LLMs

Try Anyscale’s free template which includes base models, LoRA adapters, and embedding models.

Learn More about Ray Serve

Get an in-depth look at Ray Serve, including 4 main benefits and frequently asked questions.

Deploy Ray Serve with up to 50% Fewer Nodes

Learn how Anyscale’s Replica Compaction feature can help you solve resource fragmentation and optimize resource usage.

Ray Serve Docs

Explore in-depth documentation on how to get started and use Ray Serve.

FAQs

Try Anyscale Today

Build, deploy, and manage scalable AI and Python applications on the leading AI platform. Unlock your AI potential with Anyscale.

URL: https://www.anyscale.com/product/library/ray-serve
