LLM Model Storage with NFS: Download Once, Infer Everywhere

Published on December 23, 2025

By Joe Keegan and Anish Singh Walia

Your vLLM pods are probably downloading the same massive model file every time they start.

If you’ve deployed LLM inference on Kubernetes, you may have taken the straightforward path: point vLLM at HuggingFace and let it download the model when the pod starts. It works. But here’s what happens next:

A pod crashes at 2 AM. The replacement pod spends several minutes downloading gigabytes of model weights from HuggingFace before it can serve a single request.
You need to scale up during a traffic spike. Each new pod downloads the model independently, competing for bandwidth and delaying your response to demand.
HuggingFace rate-limits your requests or has an outage. Your pods can’t start.

There’s a better way: download the model once to shared storage, then let every pod load directly from that source. No redundant downloads. No external runtime dependencies. Fast access for any new pod you deploy.
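The "download once" idea can be sketched as a small guard around the download step. This is a minimal illustration, not the guide's actual setup: the marker file name, the `ensure_model` helper, and the `fake_download` stand-in are all hypothetical, and a temporary directory plays the role of the NFS mount. In a real deployment the `download` callable might wrap something like `huggingface_hub.snapshot_download` writing into the mounted path.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical marker file name; any name the download step never writes would do.
MARKER_NAME = ".download-complete"


def ensure_model(model_dir: Path, download: Callable[[Path], None]) -> bool:
    """Populate model_dir at most once; return True if this call ran the download."""
    marker = model_dir / MARKER_NAME
    if marker.exists():
        # A previous run finished, so callers can load straight from model_dir.
        return False
    model_dir.mkdir(parents=True, exist_ok=True)
    download(model_dir)
    # Written only after download() returns, so a crashed download is retried.
    marker.touch()
    return True


def fake_download(target: Path) -> None:
    """Stand-in for a real download; writes a tiny placeholder file."""
    (target / "model.safetensors").write_bytes(b"placeholder weights")


if __name__ == "__main__":
    # A temporary directory stands in for the shared NFS mount (an assumption).
    with tempfile.TemporaryDirectory() as nfs_mount:
        model_dir = Path(nfs_mount) / "example-model"
        print(ensure_model(model_dir, fake_download))  # first caller downloads
        print(ensure_model(model_dir, fake_download))  # later callers skip it
```

The sketch does not guard against two callers starting the download at the same moment. Running the download as a single one-off step before any inference pod starts is one simple way to sidestep that.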

In this guide, you’ll deploy vLLM on DigitalOcean Kubernetes Service (DOKS) using Managed NFS for model storage.

We’ll use a single H100 GPU node to keep things simple, but the pattern scales to as many nodes as you need, and that’s the point. Once your model is on NFS, a pod on any new GPU node can load it straight from shared storage instead of starting another lengthy download.

About the author(s)

Joe Keegan
Author, Sr. Solutions Architect
A Senior Solutions Architect at DigitalOcean focusing on Cloud Architecture, Kubernetes, Automation and Infrastructure-as-Code.

Anish Singh Walia
Editor, Sr Technical Content Strategist and Team Lead
I help Businesses scale with AI x SEO x (authentic) Content that revives traffic and keeps leads flowing | 3,000,000+ Average monthly readers on Medium | Sr Technical Writer (Team Lead) @ DigitalOcean | Ex-Cloud Consultant @ AMEX | Ex-Site Reliability Engineer (DevOps) @ Nutanix

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

URL: https://www.digitalocean.com/community/tutorials/llm-model-storage-nfs-kubernetes